Examples

Hands-on tutorials for analyzing single-cell RNA-seq data with Malva Tools.

These tutorials use the 1k Human PBMCs dataset from 10x Genomics (v3 chemistry), a standard benchmark of approximately 1,000 peripheral blood mononuclear cells.


Tutorial Overview

Step 1: Build Index and Quantify

Download data, build the k-mer index, and quantify gene expression.

Step 2: Analyze

Load results, cluster cells, and identify marker genes. See: Single-Cell Analysis of Malva Quantification.

Step 3: Query

Search for custom sequences and visualize results. See: Sequence Search with Malva.

Prerequisites

Malva Tools: installed via wheel (see Installation)
Python Packages: scanpy, matplotlib, pandas, numpy, dnaio
Disk Space: ~20 GB for example data
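
A quick pre-flight check can catch a full disk or a missing package before the downloads start. The sketch below is not part of Malva Tools: the helper names `free_gib` and `missing_modules` are made up for illustration, and it assumes that `python3` is the interpreter the packages were installed into.

```shell
# Free space, in whole GiB, on the filesystem that holds a directory.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gib() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Print the name of every listed Python module that cannot be imported.
# Assumption: python3 is the interpreter used for the tutorial.
missing_modules() {
    for mod in "$@"; do
        python3 -c "import $mod" 2>/dev/null || echo "$mod"
    done
}

required_gib=20   # figure taken from the prerequisites above
if [ "$(free_gib .)" -lt "$required_gib" ]; then
    echo "Warning: less than ${required_gib} GiB free in $(pwd)" >&2
fi

# Prints nothing when every package is importable.
missing_modules scanpy matplotlib pandas numpy dnaio
```

Any module name printed by the last command still needs to be installed.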


What You’ll Learn

Build a Malva index
Quantify gene expression
Cluster cells and find markers
Query custom sequences

Step 1: Build Index and Quantify

Download the example data

# Create directory structure
mkdir -p malva_example/{barcodes,reads,references,indices,quant}
cd malva_example

# Download 10x cell barcode whitelist (v3 chemistry)
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/3M-february-2018.txt \
    -O barcodes/3M-february-2018.txt

# Download human transcriptome reference
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/human_cdna_ncrna_masked.fa.gz \
    -O references/human_cdna_ncrna_masked.fa.gz

# Download PBMC 1k v3 sequencing reads
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/pbmc_1k_v3_S1_R1_001.fastq.gz \
    -O reads/pbmc_1k_v3_S1_R1_001.fastq.gz
wget https://bimsbstatic.mdc-berlin.de/rajewsky/malva/examples/malva_tools/pbmc_1k_v3_S1_R2_001.fastq.gz \
    -O reads/pbmc_1k_v3_S1_R2_001.fastq.gz
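
Large downloads are sometimes truncated, so it can be worth testing the compressed files before building the index. The following is a minimal sketch using `gzip -t`, which only checks archive integrity and does not validate the FASTA or FASTQ content; the helper name `check_gz` is hypothetical.

```shell
# Succeed only if every given file is non-empty and passes the gzip integrity test.
check_gz() {
    rc=0
    for f in "$@"; do
        if [ -s "$f" ] && gzip -t "$f" 2>/dev/null; then
            echo "ok       $f"
        else
            echo "PROBLEM  $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Run from inside malva_example/ after the downloads have finished.
check_gz references/human_cdna_ncrna_masked.fa.gz \
         reads/pbmc_1k_v3_S1_R1_001.fastq.gz \
         reads/pbmc_1k_v3_S1_R2_001.fastq.gz \
    || echo "At least one file is missing or corrupt; consider downloading it again." >&2
```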

Build the index

malva index \
    --reads-in reads/pbmc_1k_v3_S1_R1_001.fastq.gz reads/pbmc_1k_v3_S1_R2_001.fastq.gz \
    --flavor sc_10x_v3 \
    --spatial-bc-in barcodes/3M-february-2018.txt \
    --index-out indices/pbmc_1k_v3 \
    --kmer-length 24 \
    --chunksize 100000000 \
    --merge-chunks
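
To make the `--kmer-length` setting concrete: a k-mer is a substring of length k, and a sequence is usually decomposed into all of its overlapping k-mers. The toy function below only illustrates that general idea; it is not Malva's implementation, and how Malva stores or filters k-mers internally is not covered here. The name `kmers` is made up for this example.

```shell
# Print every overlapping k-mer of a sequence, one per line.
# Usage: kmers SEQUENCE K
kmers() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        for (i = 1; i + k - 1 <= length($0); i++)
            print substr($0, i, k)
    }'
}

kmers ACGTAC 4
# ACGT
# CGTA
# GTAC
```

A sequence of length L contains L - k + 1 overlapping k-mers, so with k = 24 a 91-base read, for example, would contain 68 of them.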

Quantify gene expression

malva quant \
    --index-in indices/pbmc_1k_v3 \
    --reference references/human_cdna_ncrna_masked.fa.gz \
    --folder-out quant/pbmc_1k_v3 \
    --h5ad \
    --pct-threshold 0.99 \
    --kmer-min 0 \
    --kmer-max 1000 \
    --sliding-size 90

Tip

Output: quant/pbmc_1k_v3/pseudoquant.h5ad, a scanpy-compatible file ready for analysis.
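
Before opening the notebooks, a quick sanity check can confirm that the output file exists and looks like an HDF5 container, which is the on-disk format behind `.h5ad`. The sketch below compares the first 8 bytes against the standard HDF5 signature. It only inspects offset 0, so a file written with a user block would not be recognized, and it says nothing about the contents. The helper name `is_hdf5` is hypothetical.

```shell
# Succeed if the file starts with the 8-byte HDF5 signature (89 48 44 46 0d 0a 1a 0a).
is_hdf5() {
    sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "894844460d0a1a0a" ]
}

out=quant/pbmc_1k_v3/pseudoquant.h5ad
if is_hdf5 "$out"; then
    echo "$out looks like an HDF5 file"
else
    echo "$out is missing or is not an HDF5 file" >&2
fi
```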


Analysis Notebooks

Continue with the Jupyter notebooks: Single-Cell Analysis of Malva Quantification (Step 2) and Sequence Search with Malva (Step 3).