malva.indexes module

class malva.indexes.BackgroundModel

Bases: object

Models background k-mer frequencies for filtering common sequences.

Maintains counts of k-mer occurrences in reference sequences to identify and filter out highly abundant or common k-mers during searches.

total_mers

Total number of k-mers processed

Type:

int

kmer_size

Length of k-mers being counted

Type:

int

verbose

Whether to print detailed logging information

Type:

bool

create_from_reference(filename, consecutive_genes=True)

Build background model from a reference sequence file.

Parameters:
  • filename (str) – Path to reference sequence file

  • consecutive_genes (bool) – Whether to combine counts for consecutive genes that share the same name

Processes reference sequences to count k-mer frequencies, optionally combining counts for consecutive genes that share the same name.
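
The counting step described above can be illustrated with a minimal, self-contained sketch. It does not call malva and is not the library's implementation: the helper names count_kmers and build_background, the (gene_name, sequence) record layout, and the way consecutive records are merged are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def count_kmers(sequence, kmer_size):
    """Count every overlapping k-mer of length kmer_size in one sequence."""
    return Counter(
        sequence[i:i + kmer_size]
        for i in range(len(sequence) - kmer_size + 1)
    )


def build_background(records, kmer_size, consecutive_genes=True):
    """Hypothetical helper: records is a list of (gene_name, sequence) pairs.

    Returns (per_gene, total_mers), where per_gene is a list of
    (gene_name, Counter) pairs in input order.
    """
    if consecutive_genes:
        # Assumption: adjacent records with the same name form one group.
        grouped = [
            (name, [seq for _, seq in group])
            for name, group in groupby(records, key=lambda rec: rec[0])
        ]
    else:
        grouped = [(name, [seq]) for name, seq in records]

    per_gene = []
    total_mers = 0
    for name, seqs in grouped:
        counts = Counter()
        for seq in seqs:
            counts.update(count_kmers(seq, kmer_size))
        total_mers += sum(counts.values())
        per_gene.append((name, counts))
    return per_gene, total_mers


records = [("geneA", "ACGTAC"), ("geneA", "GTACGT"), ("geneB", "TTTTTT")]
per_gene, total_mers = build_background(records, kmer_size=4)
```

With consecutive_genes=True the two geneA records share one counter; with consecutive_genes=False each record keeps its own.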

export_fasta(filename)
import_jellyfish_fasta(filename)
is_mer_above_cutoff(kmer, cutoff)
load(filename)
save(filename)

class malva.indexes.MalvaIndex

Bases: object

A spatial indexing system for k-mer sequences.

Manages creation, storage, and querying of k-mer indices with associated spatial coordinates. Supports both memory-efficient and performance-optimized indexing strategies.

index_dir

Directory where index files are stored

Type:

str

kmer_size

Length of k-mers used in the index

Type:

int

verbose

Whether to print detailed logging information

Type:

bool

n_chunks

Number of chunks the index is split into

Type:

int

add_reads(reads_in, bam_tags='CB:{cell}', cell='r1[2:27]', n_report=10000000, chunksize=100000000, threads=1)
close()
coord_lims
data_lengths
get_cell_id(cell_id)

Extract cell ID without project ID.

get_project_id(cell_id)

Extract project ID from a cell ID.

get_whole_sliding_sequence(string, k)
get_whole_sliding_sequence_chunk(string, sliding_size)
index
index_dir
static index_exists(self)
index_file
initialize(kmer_size=8)
initialize_kmer_index()
jump_amount
kmer_size
load_index_to_memory(chunk_id=0, chunk_size=50000000, max_mem=None, force=False, count_at_most=10000, count_at_least=10)
merge_chunks(file_out, merge_projects=False, sample_percentage=0.05)
n_chunks
n_spatial
open(mode='r+', blosc_load_to_memory=False)
project_mapping
set_background_model(background_model)
set_barcode_index(sindex)
set_spatial_coords(coords)
set_spatial_index(sindex)
spatial_coord
verbose
where(sequence, sliding_size=128, pct_threshold=0.65, count_at_most=10000, count_at_least=10, chunk_id=0, single_count=False, max_mem='1M', force_reload=False, use_background_model=True, use_batched=False, *args, **kwargs)

Locate spatial positions where a sequence or set of sequences appear.

Parameters:
  • sequence (Union[str, List[str], List[List[str]]]) – Query sequence(s) to search for. If List[List[str]], each sublist represents isoforms of the same gene that should be quantified together.

  • sliding_size (int) – Size of the sliding window used for k-mer generation. A negative value disables sliding windows and uses the whole sequence instead; in that case it should be set to -(read_length)

  • pct_threshold (float) – Minimum fraction of matching k-mers required (the default of 0.65 corresponds to 65%)

  • count_at_most (int) – Maximum count threshold for k-mer consideration

  • count_at_least (int) – Minimum count threshold for k-mer consideration

  • chunk_id (int) – Index chunk to search in

  • single_count (bool) – Whether to count each match only once

  • max_mem (str) – Maximum memory constraint

  • force_reload (bool) – Force index reload

  • use_background_model (bool) – Use background model for filtering

  • use_batched (bool) – For lazy loading, whether to use batching. When False, the need for batching is detected automatically. When True, batching is forced.

Returns:

List of tuples, one per group (or a single tuple if the input is str or List[str]), each containing:
  • np.ndarray: Spatial locations where sequences were found

  • np.ndarray: Count of occurrences at each location

  • List: Matching details for sequence positions

Return type:

List[Tuple]
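
To make the role of pct_threshold concrete, here is a toy sketch of threshold-based k-mer matching. It is not malva's algorithm and does not use a real index: toy_where, toy_index, and the dictionary layout (k-mer mapped to a set of location IDs) are hypothetical, and sliding windows, count limits, and the background model are left out.

```python
from collections import Counter


def toy_where(sequence, index, k, pct_threshold=0.65):
    """Return (locations, counts) for locations matching enough query k-mers.

    index is a hypothetical dict mapping each k-mer to a set of location IDs.
    """
    # Distinct k-mers of the query sequence.
    query = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

    # For every location, count how many distinct query k-mers it contains.
    hits = Counter()
    for kmer in query:
        for location in index.get(kmer, ()):
            hits[location] += 1

    # Keep locations whose matched fraction reaches the threshold.
    needed = pct_threshold * len(query)
    kept = sorted(loc for loc, n in hits.items() if n >= needed)
    return kept, [hits[loc] for loc in kept]


toy_index = {
    "ACGT": {0, 1},
    "CGTA": {0},
    "GTAC": {0, 2},
}
locations, counts = toy_where("ACGTAC", toy_index, k=4)
```

In this toy data only location 0 contains at least 65% of the three query k-mers, so the other two locations are dropped; lowering pct_threshold admits them.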

write()

class malva.indexes.SpatialIndex

Bases: object

Manages spatial coordinates and their associated cell barcodes.

Provides efficient lookup and storage of spatial positions indexed by cell barcodes. Supports both standard spatial transcriptomics and STOmics data formats.

get_coords()
get_coords_stomics()
load_binary(filename)
load_binary_stomics(filename, barcode_length=25)
num_items()
save_binary(filename)

malva.indexes.create_singlecell_index(whitelist_file)

Create an index from a single-cell barcode whitelist.

Parameters:

whitelist_file (str) – Path to file containing valid cell barcodes

Returns:

Index containing valid cell barcodes

Return type:

SpatialIndex

Processes a whitelist file containing one barcode per line to create a lookup structure for valid cell identifiers.
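
The one-barcode-per-line layout can be read with a few lines of plain Python. The sketch below is only an illustration of that file format; read_whitelist and the sequential integer IDs are assumptions, not the structure that SpatialIndex actually stores.

```python
import io


def read_whitelist(handle):
    """Map each barcode (one per line) to a sequential integer ID.

    Blank lines and repeated barcodes are skipped.
    """
    barcode_to_id = {}
    for line in handle:
        barcode = line.strip()
        if barcode and barcode not in barcode_to_id:
            barcode_to_id[barcode] = len(barcode_to_id)
    return barcode_to_id


# An in-memory stand-in for a whitelist file; a real file would be opened
# with open(whitelist_file) instead.
whitelist = io.StringIO(
    "AAACCCAAGAAACACT\nAAACCCAAGAAACCAT\n\nAAACCCAAGAAACACT\n"
)
valid = read_whitelist(whitelist)
```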

malva.indexes.create_spatial_index(spatial_barcode_file)

Create a spatial index from a barcode coordinate file.

Parameters:

spatial_barcode_file (str) – Path to file containing barcode coordinates

Returns:

Index mapping barcodes to spatial coordinates

Return type:

SpatialIndex

Processes a CSV/TSV file containing cell barcodes and their x,y coordinates to build an efficient lookup structure.
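
As a rough illustration of the input this function expects, the sketch below parses a delimited barcode file into a dictionary. The column order (barcode, x, y), the optional header row, and the helper name read_barcode_coords are assumptions; the actual file layout accepted by malva may differ.

```python
import csv
import io


def read_barcode_coords(handle, delimiter=","):
    """Map each barcode to an (x, y) tuple of floats.

    Assumes the first three columns are barcode, x, y. Rows that are too
    short or whose coordinates are not numeric (such as a header) are skipped.
    """
    coords = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 3:
            continue
        barcode, x, y = row[0], row[1], row[2]
        try:
            coords[barcode] = (float(x), float(y))
        except ValueError:
            continue  # header row or malformed coordinates
    return coords


# An in-memory stand-in for a TSV file with a header row.
tsv = io.StringIO("barcode\tx\ty\nAAACGT\t10\t20.5\nTTTGCA\t11\t21\n")
coords = read_barcode_coords(tsv, delimiter="\t")
```

Passing delimiter="," handles the CSV variant of the same layout.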