Examples
Hands-on tutorials for analyzing single-cell RNA-seq data with Malva Tools.
The tutorials use the 1k Human PBMCs dataset from 10x Genomics (v3 chemistry), a standard benchmark of approximately 1,000 peripheral blood mononuclear cells.
Tutorial Overview
Step 1: Build Index
Download data, build the k-mer index, and quantify gene expression.
Step 2: Analyze
Load results, cluster cells, and identify marker genes.
Step 3: Query
Search for custom sequences and visualize results.
Prerequisites
Python Packages
scanpy, matplotlib, pandas, numpy, dnaio
Disk Space
~20 GB for example data
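Before downloading anything, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch using only the Python standard library; the helper names `missing_packages` and `free_gb` are illustrative and not part of Malva Tools.

```python
import importlib.util
import shutil

# Packages listed under "Python Packages" above.
REQUIRED_PACKAGES = ["scanpy", "matplotlib", "pandas", "numpy", "dnaio"]
# Approximate space needed for the example data, per "Disk Space" above.
REQUIRED_GB = 20


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def free_gb(path="."):
    """Return the free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


missing = missing_packages(REQUIRED_PACKAGES)
print("Missing packages:", ", ".join(missing) if missing else "none")
print(f"Free disk space: {free_gb():.1f} GB (about {REQUIRED_GB} GB needed)")
```

Any package reported as missing can be installed with your usual package manager before continuing.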
What You’ll Learn
Build a Malva index
Quantify gene expression
Cluster cells and find markers
Query custom sequences
Step 1: Build Index and Quantify
Download the example data
# Create directory structure
mkdir -p malva_example/{barcodes,reads,references,indices,quant}
cd malva_example
# Download 10x cell barcode whitelist (v3 chemistry)
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/3M-february-2018.txt \
  -O barcodes/3M-february-2018.txt
# Download human transcriptome reference
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/human_cdna_ncrna_masked.fa.gz \
  -O references/human_cdna_ncrna_masked.fa.gz
# Download PBMC 1k v3 sequencing reads
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/pbmc_1k_v3_S1_R1_001.fastq.gz \
  -O reads/pbmc_1k_v3_S1_R1_001.fastq.gz
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/pbmc_1k_v3_S1_R2_001.fastq.gz \
  -O reads/pbmc_1k_v3_S1_R2_001.fastq.gz
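Large downloads occasionally arrive truncated, so a quick sanity check of the read files may save time before indexing. The sketch below assumes it is run from inside `malva_example`; it reads only the first record of each gzipped FASTQ file with the standard library. The helper `first_fastq_record` is a hypothetical name, not a Malva function, and the check does not validate the whole file.

```python
import gzip
from pathlib import Path

# Read files downloaded above, relative to the malva_example directory.
READ_FILES = [
    "reads/pbmc_1k_v3_S1_R1_001.fastq.gz",
    "reads/pbmc_1k_v3_S1_R2_001.fastq.gz",
]


def first_fastq_record(path):
    """Return (header, sequence) of the first record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        header = handle.readline().rstrip("\n")
        sequence = handle.readline().rstrip("\n")
    if not header.startswith("@"):
        raise ValueError(f"{path} does not look like a FASTQ file")
    return header, sequence


for path in map(Path, READ_FILES):
    if not path.exists():
        print(f"{path}: missing")
        continue
    header, sequence = first_fastq_record(path)
    print(f"{path}: first read {header!r}, {len(sequence)} bases")
```

For a full integrity check of the compressed stream, `gzip -t <file>` on the command line is a common alternative.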
Build the index
malva index \
  --reads-in reads/pbmc_1k_v3_S1_R1_001.fastq.gz reads/pbmc_1k_v3_S1_R2_001.fastq.gz \
  --flavor sc_10x_v3 \
  --spatial-bc-in barcodes/3M-february-2018.txt \
  --index-out indices/pbmc_1k_v3 \
  --kmer-length 24 \
  --chunksize 100000000 \
  --merge-chunks
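The `--kmer-length 24` option sets the k-mer size used for the index. As a rough illustration of what a k-mer is (this is not Malva's implementation, which this page does not describe), the sketch below enumerates the overlapping 24-mers of a toy read. The `kmers` helper and the toy sequence are hypothetical.

```python
def kmers(sequence, k=24):
    """Yield every overlapping substring of length k from sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


# Toy 40-base read; real reads come from the FASTQ files downloaded above.
read = "ACGT" * 10
count = sum(1 for _ in kmers(read))
print(f"{len(read)}-base read yields {count} k-mers of length 24")  # 40 - 24 + 1 = 17
```

A read of length n contains n - k + 1 overlapping k-mers, so reads shorter than k contribute none.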
Quantify gene expression
malva quant \
  --index-in indices/pbmc_1k_v3 \
  --reference references/human_cdna_ncrna_masked.fa.gz \
  --folder-out quant/pbmc_1k_v3 \
  --h5ad \
  --pct-threshold 0.99 \
  --kmer-min 0 \
  --kmer-max 1000 \
  --sliding-size 90
Tip
Output: quant/pbmc_1k_v3/pseudoquant.h5ad, a scanpy-compatible file ready for analysis.
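To confirm that quantification produced a usable file, the output can be opened with scanpy's `read_h5ad`. This is a minimal sketch that assumes it is run from inside `malva_example` with scanpy installed; `load_quant` is a hypothetical helper, and the contents of the resulting object are not described on this page, so the sketch only prints its summary.

```python
from pathlib import Path

# Output path reported by the quantification step above.
H5AD_PATH = Path("quant/pbmc_1k_v3/pseudoquant.h5ad")


def load_quant(path=H5AD_PATH):
    """Load the quantification output as an AnnData object."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run `malva quant` first")
    # Imported here so the path check above works even without scanpy installed.
    import scanpy as sc
    return sc.read_h5ad(path)


if H5AD_PATH.exists():
    adata = load_quant()
    print(adata)  # summary of observations, variables, and annotations
else:
    print(f"{H5AD_PATH} not found; run the quantification step first")
```

By scanpy convention, observations (`adata.obs`) are typically cells and variables (`adata.var`) are typically genes.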
Analysis Notebooks
Continue with the Jupyter notebooks below: