malva.indexes module
- class malva.indexes.BackgroundModel
Bases:
object
Models background k-mer frequencies for filtering common sequences.
Maintains counts of k-mer occurrences in reference sequences to identify and filter out highly abundant or common k-mers during searches.
- total_mers
Total number of k-mers processed
- Type:
int
- kmer_size
Length of k-mers being counted
- Type:
int
- verbose
Whether to print detailed logging information
- Type:
bool
- create_from_reference(filename, consecutive_genes=True)
Build background model from a reference sequence file.
- Parameters:
filename (str) – Path to reference sequence file
consecutive_genes (bool) – Whether to group consecutive genes
Processes reference sequences to count k-mer frequencies, optionally combining counts for consecutive genes with the same name.
- export_fasta(filename)
- import_jellyfish_fasta(filename)
- is_mer_above_cutoff(kmer, cutoff)
- load(filename)
- save(filename)
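The idea described for BackgroundModel, counting k-mers in reference sequences and flagging the highly abundant ones, can be sketched in plain Python. This is a minimal illustration and not malva's implementation: the helper names `count_kmers` and `is_above_cutoff` are hypothetical, and the cutoff is assumed to be an absolute occurrence count.

```python
from collections import Counter


def count_kmers(sequences, kmer_size):
    """Count every k-mer of length kmer_size across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - kmer_size + 1):
            counts[seq[i:i + kmer_size]] += 1
    return counts


def is_above_cutoff(counts, kmer, cutoff):
    """Return True when kmer occurs more than cutoff times (assumed semantics)."""
    return counts[kmer.upper()] > cutoff


reference = ["ACGTACGTAC", "TTACGTTT"]  # toy reference sequences
counts = count_kmers(reference, kmer_size=3)
total_mers = sum(counts.values())  # analogous to the total_mers attribute
common = sorted(k for k in counts if is_above_cutoff(counts, k, cutoff=2))
```

K-mers listed in `common` would be the ones a search could skip as uninformative background.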
- class malva.indexes.MalvaIndex
Bases:
object
A spatial indexing system for k-mer sequences.
Manages creation, storage, and querying of k-mer indices with associated spatial coordinates. Supports both memory-efficient and performance-optimized indexing strategies.
- index_dir
Directory where index files are stored
- Type:
str
- kmer_size
Length of k-mers used in the index
- Type:
int
- verbose
Whether to print detailed logging information
- Type:
bool
- n_chunks
Number of chunks the index is split into
- Type:
int
- add_reads(reads_in, bam_tags='CB:{cell}', cell='r1[2:27]', n_report=10000000, chunksize=100000000, threads=1)
- close()
- coord_lims
- data_lengths
- get_cell_id(cell_id)
Extract cell ID without project ID.
- get_project_id(cell_id)
Extract project ID from a cell ID.
- get_whole_sliding_sequence(string, k)
- get_whole_sliding_sequence_chunk(string, sliding_size)
- index
- index_dir
- static index_exists(self)
- index_file
- initialize(kmer_size=8)
- initialize_kmer_index()
- jump_amount
- kmer_size
- load_index_to_memory(chunk_id=0, chunk_size=50000000, max_mem=None, force=False, count_at_most=10000, count_at_least=10)
- merge_chunks(file_out, merge_projects=False, sample_percentage=0.05)
- n_chunks
- n_spatial
- open(mode='r+', blosc_load_to_memory=False)
- project_mapping
- set_background_model(background_model)
- set_barcode_index(sindex)
- set_spatial_coords(coords)
- set_spatial_index(sindex)
- spatial_coord
- verbose
- where(sequence, sliding_size=128, pct_threshold=0.65, count_at_most=10000, count_at_least=10, chunk_id=0, single_count=False, max_mem='1M', force_reload=False, use_background_model=True, use_batched=False, *args, **kwargs)
Locate spatial positions where a sequence or set of sequences appear.
- Parameters:
sequence (Union[str, List[str], List[List[str]]]) – Query sequence(s) to search for. If List[List[str]], each sublist represents isoforms of the same gene that should be quantified together.
sliding_size (int) – Size of the sliding window for k-mer generation. When negative, sliding windows are disabled and the whole sequence is used; in that case set it to -(read_length).
pct_threshold (float) – Minimum fraction of matching k-mers required, expressed between 0 and 1 (the default 0.65 corresponds to 65%)
count_at_most (int) – Maximum count threshold for k-mer consideration
count_at_least (int) – Minimum count threshold for k-mer consideration
chunk_id (int) – Index chunk to search in
single_count (bool) – Whether to count each match only once
max_mem (str) – Maximum memory constraint
force_reload (bool) – Force index reload
use_background_model (bool) – Use background model for filtering
use_batched (bool) – For lazy loading, whether to use batching. When False, batching is chosen automatically; when True, batching is forced.
- Returns:
- List of tuples, one per group (or single tuple if input is str/List[str]), each containing:
np.ndarray: Spatial locations where sequences were found
np.ndarray: Count of occurrences at each location
List: Matching details for sequence positions
- Return type:
List[Tuple]
- write()
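The role of `pct_threshold` in `where` can be illustrated with a toy k-mer index. The sketch below is an assumption-laden illustration, not malva's algorithm: `build_toy_index` and `locate` are hypothetical helpers, the index is an in-memory dict, and a location is kept when the fraction of query k-mers found there reaches the threshold. The count limits, sliding windows, and background filtering of the real method are left out.

```python
from collections import defaultdict


def build_toy_index(reads_by_location, k):
    """Map each k-mer to the set of location ids whose reads contain it."""
    index = defaultdict(set)
    for loc, reads in reads_by_location.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                index[read[i:i + k]].add(loc)
    return index


def locate(query, index, k, pct_threshold=0.65):
    """Return {location: matched k-mer count} for locations passing the threshold."""
    query_kmers = [query[i:i + k] for i in range(len(query) - k + 1)]
    hits = defaultdict(int)
    for kmer in query_kmers:
        for loc in index.get(kmer, ()):
            hits[loc] += 1
    # A location must match at least this many of the query k-mers.
    needed = pct_threshold * len(query_kmers)
    return {loc: n for loc, n in hits.items() if n >= needed}


reads = {0: ["ACGTAC"], 1: ["ACGAAA", "TTTTAC"]}  # toy reads per location
index = build_toy_index(reads, k=3)
result = locate("ACGTAC", index, k=3)
```

With the default threshold only location 0 is reported, since it contains all four query k-mers while location 1 contains two of them; lowering `pct_threshold` to 0.5 would admit location 1 as well.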
- class malva.indexes.SpatialIndex
Bases:
object
Manages spatial coordinates and their associated cell barcodes.
Provides efficient lookup and storage of spatial positions indexed by cell barcodes. Supports both standard spatial transcriptomics and STOmics data formats.
- get_coords()
- get_coords_stomics()
- load_binary(filename)
- load_binary_stomics(filename, barcode_length=25)
- num_items()
- save_binary(filename)
- malva.indexes.create_singlecell_index(whitelist_file)
Create an index from a single-cell barcode whitelist.
- Parameters:
whitelist_file (str) – Path to file containing valid cell barcodes
- Returns:
Index containing valid cell barcodes
- Return type:
Processes a whitelist file containing one barcode per line to create a lookup structure for valid cell identifiers.
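Reading a one-barcode-per-line whitelist into a lookup structure, as described for `create_singlecell_index`, might look like the sketch below. The helper `load_whitelist` is hypothetical and returns a plain dict; the object the real function returns is not shown in this reference.

```python
import os
import tempfile


def load_whitelist(path):
    """Read one barcode per line and map each to a stable integer id."""
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            barcode = line.strip()
            # Skip blank lines and keep the first id for repeated barcodes.
            if barcode and barcode not in barcodes:
                barcodes[barcode] = len(barcodes)
    return barcodes


# Demonstrate on a throwaway file containing a blank line and a duplicate.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AAACCTG\nAAACGGG\n\nAAACCTG\n")
whitelist = load_whitelist(tmp.name)
os.remove(tmp.name)
```

Membership tests such as `"AAACCTG" in whitelist` then run in constant time on average.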
- malva.indexes.create_spatial_index(spatial_barcode_file)
Create a spatial index from a barcode coordinate file.
- Parameters:
spatial_barcode_file (str) – Path to file containing barcode coordinates
- Returns:
Index mapping barcodes to spatial coordinates
- Return type:
Processes a CSV/TSV file containing cell barcodes and their x,y coordinates to build an efficient lookup structure.
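Parsing a barcode coordinate file of the kind `create_spatial_index` expects can be sketched as follows. Several details are assumptions: the columns are taken to be barcode, x, y in that order, a non-numeric header row is skipped, and the hypothetical helper `parse_barcode_coords` returns a plain dict instead of a SpatialIndex.

```python
import csv
import io


def parse_barcode_coords(handle):
    """Parse rows of barcode, x, y from a CSV or TSV stream into a dict."""
    first_line = handle.readline()
    delimiter = "\t" if "\t" in first_line else ","
    handle.seek(0)
    coords = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        barcode, x, y = row[0], row[1], row[2]
        try:
            coords[barcode] = (float(x), float(y))
        except ValueError:
            continue  # assumed to be a header row such as "barcode,x,y"
    return coords


tsv = io.StringIO("barcode\tx\ty\nAAACCTG\t10\t20.5\nAAACGGG\t11\t21\n")
coords = parse_barcode_coords(tsv)
```

The same function accepts comma-separated input, because the delimiter is guessed from the first line.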