malva.indexes module¶

class malva.indexes.BackgroundModel¶

Bases: object

Models background k-mer frequencies for filtering common sequences.

Maintains counts of k-mer occurrences in reference sequences to identify and filter out highly abundant or common k-mers during searches.
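The idea of a k-mer background model can be illustrated without the library. The sketch below is not malva's implementation: `count_kmers` and `is_above_cutoff` are hypothetical helpers, and the strict `>` comparison is an assumption about how a cutoff might be applied.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across the given sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def is_above_cutoff(counts, kmer, cutoff):
    """Return True when a k-mer occurs more often than the cutoff (assumed strict)."""
    return counts[kmer] > cutoff


# Toy reference sequences, for illustration only.
reference = ["ACGTACGT", "TTACGTAA"]
counts = count_kmers(reference, k=4)
total_mers = sum(counts.values())  # total number of k-mers processed

# K-mers that a search might skip because they are too common.
common = [kmer for kmer in counts if is_above_cutoff(counts, kmer, cutoff=2)]
```

In this toy input only `ACGT` occurs more than twice, so it is the only k-mer a filter with `cutoff=2` would discard.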

total_mers¶

Total number of k-mers processed

Type: int

kmer_size¶

Length of k-mers being counted

Type: int

verbose¶

Whether to print detailed logging information

Type: bool

create_from_reference(filename, consecutive_genes=True)¶

Build background model from a reference sequence file.

Parameters:

filename (str) – Path to reference sequence file
consecutive_genes (bool) – Whether to group consecutive genes

Processes reference sequences to count k-mer frequencies, optionally combining the counts of consecutive genes that share the same name.
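One plausible reading of `consecutive_genes` is that adjacent records with the same name are merged before their k-mers are tallied. The sketch below illustrates that reading only; the `(name, sequence)` record layout and the helper `counts_per_gene` are assumptions, not part of malva.

```python
from collections import Counter
from itertools import groupby


def count_kmers(seq, k):
    """Count overlapping k-mers in a single sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def counts_per_gene(records, k, consecutive_genes=True):
    """Tally k-mers per record, merging runs of adjacent same-name records.

    `records` is a list of (name, sequence) pairs (hypothetical layout).
    """
    if not consecutive_genes:
        return [(name, count_kmers(seq, k)) for name, seq in records]
    merged = []
    # groupby only groups *adjacent* items, which matches "consecutive".
    for name, group in groupby(records, key=lambda rec: rec[0]):
        total = Counter()
        for _, seq in group:
            total.update(count_kmers(seq, k))
        merged.append((name, total))
    return merged


records = [("geneA", "ACGTA"), ("geneA", "CGTAC"), ("geneB", "ACGTA")]
merged = counts_per_gene(records, k=4)
separate = counts_per_gene(records, k=4, consecutive_genes=False)
```

With merging enabled the two `geneA` records collapse into one entry; with it disabled each record keeps its own counts.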

export_fasta(filename)¶

import_jellyfish_fasta(filename)¶

is_mer_above_cutoff(kmer, cutoff)¶

load(filename)¶

save(filename)¶

class malva.indexes.MalvaIndex¶

Bases: object

A spatial indexing system for k-mer sequences.

Manages creation, storage, and querying of k-mer indices with associated spatial coordinates. Supports both memory-efficient and performance-optimized indexing strategies.

index_dir¶

Directory where index files are stored

Type: str

kmer_size¶

Length of k-mers used in the index

Type: int

verbose¶

Whether to print detailed logging information

Type: bool

n_chunks¶

Number of chunks the index is split into

Type: int

add_reads(reads_in, bam_tags='CB:{cell}', cell='r1[2:27]', n_report=10000000, chunksize=100000000, threads=1)¶

close()¶

coord_lims¶

data_lengths¶

get_cell_id(cell_id)¶

Extract the cell ID without the project ID.

get_project_id(cell_id)¶

Extract the project ID from a cell ID.

get_whole_sliding_sequence(string, k)¶

get_whole_sliding_sequence_chunk(string, sliding_size)¶

index¶

index_dir¶

static index_exists(self)¶

index_file¶

initialize(kmer_size=8)¶

initialize_kmer_index()¶

jump_amount¶

kmer_size¶

load_index_to_memory(chunk_id=0, chunk_size=50000000, max_mem=None, force=False, count_at_most=10000, count_at_least=10)¶

merge_chunks(file_out, merge_projects=False, sample_percentage=0.05)¶

n_chunks¶

n_spatial¶

open(mode='r+', blosc_load_to_memory=False)¶

project_mapping¶

set_background_model(background_model)¶

set_barcode_index(sindex)¶

set_spatial_coords(coords)¶

set_spatial_index(sindex)¶

spatial_coord¶

verbose¶

where(sequence, sliding_size=128, pct_threshold=0.65, count_at_most=10000, count_at_least=10, chunk_id=0, single_count=False, max_mem='1M', force_reload=False, use_background_model=True, use_batched=False, *args, **kwargs)¶

Locate spatial positions where a sequence or set of sequences appear.

Parameters:

sequence (Union[str, List[str], List[List[str]]]) – Query sequence(s) to search for. If List[List[str]], each sublist represents isoforms of the same gene that should be quantified together.
sliding_size (int) – Size of the sliding window used for k-mer generation. When negative, no sliding window is used and the whole sequence is searched; in that case set it to -(read_length).
pct_threshold (float) – Minimum fraction of matching k-mers required (the default 0.65 corresponds to 65%)
count_at_most (int) – Maximum count threshold for k-mer consideration
count_at_least (int) – Minimum count threshold for k-mer consideration
chunk_id (int) – Index chunk to search in
single_count (bool) – Whether to count each match only once
max_mem (str) – Maximum memory constraint
force_reload (bool) – Force index reload
use_background_model (bool) – Use background model for filtering
use_batched (bool) – For lazy loading, whether to use batching. When False, batching is chosen automatically; when True, batching is forced.
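How `sliding_size` and `pct_threshold` might interact can be sketched in plain Python. This is a conceptual illustration, not malva's search code: the helpers, the stride of one position, and the `>=` comparison are all assumptions.

```python
def kmers(seq, k):
    """Return the overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sliding_windows(sequence, sliding_size):
    """Split a query into windows; a negative size means 'use the whole sequence'."""
    if sliding_size < 0 or len(sequence) <= sliding_size:
        return [sequence]
    # Assumed stride of 1; the real stride is not documented here.
    return [sequence[i:i + sliding_size]
            for i in range(len(sequence) - sliding_size + 1)]


def window_matches(window, indexed_kmers, k, pct_threshold):
    """True when enough of the window's k-mers are present in the index."""
    mers = kmers(window, k)
    if not mers:
        return False
    hits = sum(1 for mer in mers if mer in indexed_kmers)
    return hits / len(mers) >= pct_threshold


# Toy "index": the set of 4-mers seen in some reads.
indexed = set(kmers("ACGTACGTAC", 4))
query = "ACGTACGA"

windows = sliding_windows(query, sliding_size=6)
loose = [window_matches(w, indexed, 4, pct_threshold=0.65) for w in windows]
strict = [window_matches(w, indexed, 4, pct_threshold=0.7) for w in windows]
```

The last window has two of its three k-mers in the index, so it passes at 0.65 but fails at 0.7, which shows how the threshold trades sensitivity for specificity.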

Returns:

List of tuples, one per group (or a single tuple if the input is a str or List[str]), each containing:

np.ndarray: Spatial locations where sequences were found
np.ndarray: Count of occurrences at each location
List: Matching details for sequence positions

Return type:

List[Tuple]
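Because the result is either a single tuple or a list of tuples depending on the input shape, calling code may want to normalize it before iterating. A minimal sketch, assuming the three-element layout described above; `as_groups` is a hypothetical helper and the plain lists stand in for the NumPy arrays a real call would return.

```python
def as_groups(result):
    """Wrap a single (locations, counts, details) tuple in a list."""
    if isinstance(result, tuple):
        return [result]
    return list(result)


# Stand-in values shaped like the documented return; not real output.
single = ([[10, 20], [11, 20]], [3, 1], ["details"])
grouped = [single, ([[5, 5]], [2], ["details"])]

for locations, counts, details in as_groups(single):
    # Each group exposes one count per spatial location.
    assert len(locations) == len(counts)

n_single = len(as_groups(single))
n_grouped = len(as_groups(grouped))
```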

write()¶

class malva.indexes.SpatialIndex¶

Bases: object

Manages spatial coordinates and their associated cell barcodes.

Provides efficient lookup and storage of spatial positions indexed by cell barcodes. Supports both standard spatial transcriptomics and STOmics data formats.

get_coords()¶

get_coords_stomics()¶

load_binary(filename)¶

load_binary_stomics(filename, barcode_length=25)¶

num_items()¶

save_binary(filename)¶

malva.indexes.create_singlecell_index(whitelist_file)¶

Create an index from a single-cell barcode whitelist.

Parameters: whitelist_file (str) – Path to file containing valid cell barcodes
Returns: Index containing valid cell barcodes
Return type: SpatialIndex

Processes a whitelist file containing one barcode per line to create a lookup structure for valid cell identifiers.
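The one-barcode-per-line format can be read into a lookup structure along these lines. This is a standalone sketch, not the library's loader: `read_whitelist` is hypothetical, and skipping blank lines and duplicates is an assumption.

```python
import io


def read_whitelist(handle):
    """Map each distinct barcode to a running integer ID."""
    barcodes = {}
    for line in handle:
        barcode = line.strip()
        # Assumed behaviour: ignore blank lines and repeated barcodes.
        if barcode and barcode not in barcodes:
            barcodes[barcode] = len(barcodes)
    return barcodes


# In-memory stand-in for a whitelist file.
whitelist = io.StringIO("AAACCCAAG\nAAACCCACA\n\nAAACCCAAG\n")
lookup = read_whitelist(whitelist)
is_valid = "AAACCCACA" in lookup
```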

malva.indexes.create_spatial_index(spatial_barcode_file)¶

Create a spatial index from a barcode coordinate file.

Parameters: spatial_barcode_file (str) – Path to file containing barcode coordinates
Returns: Index mapping barcodes to spatial coordinates
Return type: SpatialIndex

Processes a CSV/TSV file containing cell barcodes and their x,y coordinates to build an efficient lookup structure.
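A barcode coordinate file of this kind could be parsed as sketched below. The column order (barcode, x, y), the optional header row, and the helper `read_barcode_coords` are assumptions for illustration; the format the library actually expects may differ.

```python
import csv
import io


def read_barcode_coords(handle, delimiter=","):
    """Build a barcode -> (x, y) lookup from delimited text."""
    coords = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        barcode, x, y = row[0], row[1], row[2]
        try:
            coords[barcode] = (float(x), float(y))
        except ValueError:
            continue  # non-numeric coordinates, e.g. a header row
    return coords


# In-memory stand-ins for a CSV and a TSV file.
csv_text = io.StringIO("barcode,x,y\nAAAC,1,2\nAAAG,3.5,4\n")
tsv_text = io.StringIO("AAAC\t1\t2\n")

coords = read_barcode_coords(csv_text)
coords_tsv = read_barcode_coords(tsv_text, delimiter="\t")
```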