Examples Overview
Practical examples demonstrating DiffBio's differentiable bioinformatics operators, from single-operator basics to full pipeline composition with ecosystem integration.
Example Tiers
Basic
Single-operator examples. One config, one apply(), one gradient check. Target audience: first-time DiffBio users.
| Example | Duration | Key Operators | Description |
|---|---|---|---|
| Operator Pattern | 5 min | SoftKMeansClustering | The universal Config -> Construct -> Apply pattern |
| MolNet Data Loading | 5 min | MolNetSource | Load MoleculeNet benchmark datasets |
| Molecular Fingerprints | 10 min | CircularFingerprint | Generate ECFP and neural fingerprints |
| Molecular Similarity | 5 min | TanimotoSimilarity | Compare molecules with similarity metrics |
| Scaffold Splitting | 5 min | ScaffoldSplitter | Proper train/test splits for drug discovery |
| DNA Encoding | 5 min | one-hot encoding | One-hot encode DNA sequences |
| Sequence Alignment | 10 min | SmoothSmithWaterman | Smith-Waterman local alignment |
| Pileup Generation | 10 min | DifferentiablePileup | Generate pileups from aligned reads |
| Single-Cell Clustering | 10 min | SoftKMeansClustering | Soft k-means with training loop |
| RNA Structure | 10 min | McCaskill algorithm | Predict RNA secondary structure |
| Protein Structure | 10 min | DSSP prediction | Predict protein secondary structure |
| HMM Sequence Model | 10 min | DifferentiableHMM | Hidden Markov Models for sequences |
| Preprocessing | 10 min | QualityFilter, AdapterRemoval | Read preprocessing pipeline |
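The "one config, one apply(), one gradient check" shape of a basic example can be sketched without DiffBio itself. The block below is a minimal stand-in written in plain JAX: `ToySoftKMeansConfig`, `ToySoftKMeans`, the config field names, and the dict-shaped `data` are all assumptions made for illustration and are not DiffBio's actual classes or signatures. Only the three-value return of `apply(data, {}, None)` is taken from this page.

```python
from dataclasses import dataclass

import jax
import jax.numpy as jnp


@dataclass(frozen=True)
class ToySoftKMeansConfig:
    """Hypothetical config; field names are illustrative, not DiffBio's."""
    n_clusters: int = 2
    temperature: float = 1.0


class ToySoftKMeans:
    """Stand-in operator that mimics the (result, state, metadata) contract."""

    def __init__(self, config, centroids):
        self.config = config
        self.centroids = centroids  # shape (n_clusters, n_features)

    def apply(self, data, state, metadata):
        x = data["x"]  # shape (n_points, n_features)
        # Squared Euclidean distance from every point to every centroid.
        d2 = jnp.sum((x[:, None, :] - self.centroids[None, :, :]) ** 2, axis=-1)
        # Softmax over negative distances gives differentiable assignments.
        assignments = jax.nn.softmax(-d2 / self.config.temperature, axis=-1)
        return {**data, "assignments": assignments}, state, metadata


# 1. Config
config = ToySoftKMeansConfig(n_clusters=2, temperature=1.0)
# 2. Construct
centroids = jnp.array([[0.0, 0.0], [3.0, 3.0]])
operator = ToySoftKMeans(config, centroids)
# 3. Apply
data = {"x": jnp.array([[0.1, -0.2], [2.9, 3.2], [1.5, 1.5]])}
result, state, metadata = operator.apply(data, {}, None)


def loss(c):
    """Expected squared distance to the centroids under the soft assignments."""
    out, _, _ = ToySoftKMeans(config, c).apply(data, {}, None)
    d2 = jnp.sum((data["x"][:, None, :] - c[None, :, :]) ** 2, axis=-1)
    return jnp.sum(out["assignments"] * d2)


# Gradient check: the gradient w.r.t. the centroids should be finite and non-zero.
grads = jax.grad(loss)(centroids)
print(result["assignments"].shape, grads.shape)
```

In this sketch the gradient check is the last step: if `grads` contained NaNs or was all zeros, the operator would not be usable inside a training loop. The real examples presumably perform an equivalent check against DiffBio's own operators.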
Intermediate
Multi-operator workflows with two or three operators chained, parameter sweeps, or evaluation against ground truth. Target audience: users building custom pipelines.
| Example | Duration | Key Operators | Description |
|---|---|---|---|
| Imputation | 15 min | DifferentiableDiffusionImputer | MAGIC-style diffusion imputation for dropout recovery |
| Trajectory | 20 min | DifferentiablePseudotime, FateProbability, SwitchDE | Pseudotime ordering and fate probability estimation |
| Cell Annotation | 15 min | DifferentiableCellAnnotator (3 modes) | Cell type annotation: celltypist, cellassign, scanvi |
| Doublet Detection | 15 min | DoubletScorer, SoloDetector | Scrublet-style and Solo-style doublet detection |
| Batch Correction | 20 min | Harmony, MMD, WGAN | Three batch correction strategies compared |
Advanced
Full pipeline composition with ecosystem integration (calibrax metrics, artifex losses, opifex training). Training loops, benchmarking, and multi-operator chains. Target audience: researchers adapting DiffBio for their data.
| Example | Duration | Key Operators | Description |
|---|---|---|---|
| Spatial Analysis | 25 min | SpatialDomain, PASTEAlignment | STAGATE domain identification and PASTE slice alignment |
| GRN Inference | 25 min | DifferentiableGRN | Gene regulatory network inference via GATv2 attention |
| Single-Cell Pipeline | 30 min | Simulator, AmbientRemoval, Imputer, Clustering, Pseudotime | Five-operator end-to-end pipeline |
| Calibrax Metrics | 25 min | SoftKMeansClustering, DifferentiableAUROC | Training vs evaluation metric split with calibrax |
| scVI Benchmark | 30 min | VAENormalizer, MultiOmicsVAE | scVI-style VAE training with calibrax evaluation |
| Drug Discovery Workflow | 30 min | CircularFingerprint, PropertyPredictor | End-to-end drug discovery pipeline |
| ADMET Prediction | 25 min | ADMETPredictor | Multi-task ADMET property prediction |
| AttentiveFP GNN | 25 min | AttentiveFPOperator | Attention-based molecular fingerprints |
| Variant Calling Pipeline | 30 min | Full variant calling pipeline | End-to-end variant calling with CNN classifier |
| Single-Cell Batch Correction | 20 min | DifferentiableHarmony | Harmony-style batch correction |
| Differential Expression | 25 min | NB-GLM | DESeq2-style statistical testing |
| RNA Velocity | 25 min | Neural ODE velocity | RNA velocity trajectory inference |
| Epigenomics Analysis | 25 min | Peak calling, chromatin states | ChIP-seq and ATAC-seq analysis |
| Multi-omics Integration | 30 min | Spatial deconvolution, Hi-C | Multi-omics data integration |
Running Examples
All examples are self-contained Python scripts that generate synthetic data and produce verifiable outputs.
```shell
# Setup
./setup.sh
source ./activate.sh

# Run any example
uv run python examples/basics/operator_pattern.py
uv run python examples/singlecell/clustering.py
uv run python examples/ecosystem/scvi_benchmark.py
```
Key Features Demonstrated
All examples showcase DiffBio's core capabilities:
- Differentiability -- every operator supports `jax.grad` for gradient computation
- JIT Compilation -- all operators work with `jax.jit` for accelerated execution
- apply() Contract -- consistent `result, state, metadata = operator.apply(data, {}, None)` interface
- Synthetic Data -- self-contained examples with no external data dependencies
- Ecosystem Integration -- calibrax metrics, artifex losses, and opifex training utilities
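The differentiability and JIT points in the list above can be illustrated on any pure JAX function. The sketch below does not use DiffBio: `soft_pick` and `hard_pick` are hypothetical helpers, chosen to show why a temperature-softened selection yields useful gradients where a hard `argmax` does not, and to show that a `jax.jit`-compiled call agrees with eager execution.

```python
import jax
import jax.numpy as jnp


def soft_pick(scores, values, temperature=0.5):
    """Softmax-weighted average of values: a smooth relaxation of argmax selection."""
    weights = jax.nn.softmax(scores / temperature)
    return jnp.sum(weights * values)


def hard_pick(scores, values):
    """Non-relaxed counterpart: select the value at the argmax index."""
    return values[jnp.argmax(scores)]


scores = jnp.array([1.0, 2.0, 0.5])
values = jnp.array([10.0, 20.0, 30.0])

# Differentiability: gradients with respect to the scores.
soft_grad = jax.grad(soft_pick)(scores, values)  # non-zero, usable for training
hard_grad = jax.grad(hard_pick)(scores, values)  # all zeros: argmax is piecewise constant

# JIT compilation: the compiled function should agree with eager execution.
eager = soft_pick(scores, values)
compiled = jax.jit(soft_pick)(scores, values)

print(soft_grad, hard_grad, eager, compiled)
```

Lower temperatures make `soft_pick` behave more like `hard_pick` but also shrink the gradient away from decision boundaries, which is the usual trade-off when relaxing a discrete step.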
Contributing Examples
See the Contributing Guide and the Example Documentation Design Guide for details on adding new examples.