Variant Operators API

Differentiable operators for variant calling and analysis.

DeepVariantStylePileup

diffbio.operators.variant.deepvariant_pileup.DeepVariantStylePileup

DeepVariantStylePileup(
    config: DeepVariantPileupConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: TemperatureOperator

DeepVariant-style multi-channel pileup image generator.

Generates pileup images compatible with DeepVariant's CNN architecture while maintaining full differentiability for end-to-end training.

The pileup image has shape (max_reads, window_size, num_channels) where each read occupies a row and each column represents a base position.
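To make that layout concrete, the NumPy sketch below places one-hot reads into a (max_reads, window_size, 4) grid: one read per row, starting at its window position, with unused rows left at zero. It is a hard, non-differentiable illustration of the image geometry only; the helper `layout_reads` is hypothetical, and the real operator builds additional channels and keeps the computation differentiable.

```python
import numpy as np


def layout_reads(reads, positions, window_size, max_reads):
    """Hypothetical helper: place one-hot reads into a (max_reads, window_size, 4) grid.

    reads:     (num_reads, read_length, 4) one-hot bases
    positions: (num_reads,) start column of each read inside the window
    """
    num_reads, read_length, n_bases = reads.shape
    image = np.zeros((max_reads, window_size, n_bases), dtype=reads.dtype)
    for row in range(min(num_reads, max_reads)):  # one read per row; extra reads are dropped
        start = int(positions[row])
        for offset in range(read_length):
            col = start + offset
            if 0 <= col < window_size:  # bases falling outside the window are clipped
                image[row, col] = reads[row, offset]
    return image


# Two 3-bp reads ("ACG" and "GTT") in a 6-bp window, image height 4.
eye = np.eye(4)  # rows encode A, C, G, T
reads = np.stack([eye[[0, 1, 2]], eye[[2, 3, 3]]])
image = layout_reads(reads, positions=np.array([0, 4]), window_size=6, max_reads=4)
print(image.shape)  # (4, 6, 4)
```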

Inherits from TemperatureOperator to get:

  • _temperature property for temperature-controlled smoothing
  • soft_max() for logsumexp-based smooth maximum
  • soft_argmax() for soft position selection
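The formulas behind these helpers are not spelled out here. Assuming the standard definitions, the smooth maximum is `temperature * logsumexp(x / temperature)` and the soft argmax is the softmax-weighted mean index; both approach the hard max and argmax as the temperature goes to 0. A NumPy sketch (not the `TemperatureOperator` source):

```python
import numpy as np


def soft_max(x, temperature):
    """Smooth maximum: temperature * logsumexp(x / temperature)."""
    z = np.asarray(x, dtype=float) / temperature
    m = z.max()  # subtract the max for numerical stability
    return temperature * (m + np.log(np.exp(z - m).sum()))


def soft_argmax(x, temperature):
    """Expected index under softmax(x / temperature)."""
    z = np.asarray(x, dtype=float) / temperature
    weights = np.exp(z - z.max())
    weights /= weights.sum()
    return float((weights * np.arange(len(z))).sum())


scores = np.array([0.1, 2.0, 0.3, 1.9])
for t in (1.0, 0.1, 0.01):
    print(t, soft_max(scores, t), soft_argmax(scores, t))
```

At high temperature the soft argmax is pulled toward the near-tie at index 3; at low temperature it settles on index 1.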
Example
config = DeepVariantPileupConfig(window_size=101, max_reads=50)
pileup = DeepVariantStylePileup(config)
data = {
    "reads": reads,  # (num_reads, read_length, 4)
    "reference": reference,  # (window_size, 4)
    "base_qualities": qualities,  # (num_reads, read_length)
    "mapping_qualities": mapq,  # (num_reads,)
    "strands": strands,  # (num_reads,)
    "positions": positions,  # (num_reads,)
}
result, _, _ = pileup.apply(data, {}, None)
pileup_image = result["pileup_image"]  # (50, 101, num_channels)

Parameters:

  • config (DeepVariantPileupConfig, required): Pileup configuration.
  • rngs (Rngs | None, default: None): Random number generators (optional).
  • name (str | None, default: None): Optional operator name.

num_channels property

num_channels: int

Return the number of output channels.

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply DeepVariant-style pileup generation.

Parameters:

  • data (PyTree, required): Dictionary containing:
      - "reads": one-hot encoded reads (num_reads, read_length, 4)
      - "reference": one-hot encoded reference (window_size, 4)
      - "base_qualities": Phred quality scores (num_reads, read_length)
      - "mapping_qualities": mapping quality scores (num_reads,)
      - "strands": strand orientation, 0 = forward, 1 = reverse (num_reads,)
      - "positions": read start positions in the window (num_reads,)
  • state (PyTree, required): Element state (passed through unchanged).
  • metadata (dict[str, Any] | None, required): Element metadata (passed through unchanged).
  • random_params (Any, default: None): Not used (deterministic operator).
  • stats (dict[str, Any] | None, default: None): Not used.

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      - transformed_data contains the input data plus "pileup_image"
      - state is passed through unchanged
      - metadata is passed through unchanged

compute_pileup_image

compute_pileup_image(
    reads: Float[Array, "num_reads read_length 4"],
    reference: Float[Array, "window_size 4"],
    base_qualities: Float[Array, "num_reads read_length"],
    mapping_qualities: Float[Array, "num_reads"],
    strands: Float[Array, "num_reads"],
    positions: Int[Array, "num_reads"],
) -> Float[Array, "max_reads window_size num_channels"]

Compute DeepVariant-style pileup image.

Parameters:

  • reads (Float[Array, "num_reads read_length 4"], required): One-hot encoded reads (num_reads, read_length, 4).
  • reference (Float[Array, "window_size 4"], required): One-hot encoded reference (window_size, 4).
  • base_qualities (Float[Array, "num_reads read_length"], required): Phred quality scores (num_reads, read_length).
  • mapping_qualities (Float[Array, "num_reads"], required): Mapping quality scores (num_reads,).
  • strands (Float[Array, "num_reads"], required): Strand orientation, 0 = forward, 1 = reverse (num_reads,).
  • positions (Int[Array, "num_reads"], required): Starting position of each read in the window (num_reads,).

Returns:

  • Float[Array, "max_reads window_size num_channels"]: Pileup image of shape (max_reads, window_size, num_channels).

DeepVariantPileupConfig

diffbio.operators.variant.deepvariant_pileup.DeepVariantPileupConfig dataclass

DeepVariantPileupConfig(
    temperature: float = DEFAULT_TEMPERATURE,
    learnable_temperature: bool = False,
    window_size: int = 221,
    max_reads: int = 100,
    channels: tuple[
        str, ...
    ] = _DEFAULT_DEEPVARIANT_CHANNELS,
    quality_max: float = 40.0,
    mapq_max: float = 60.0,
)

Bases: TemperatureConfig

Configuration for DeepVariant-style pileup generation.

Inherits from TemperatureConfig to get the temperature parameter used by soft (differentiable) operations.

Attributes:

  • window_size (int): Width of the pileup image in base pairs (default: 221).
  • max_reads (int): Height of the pileup image, i.e. the maximum number of reads included (default: 100).
  • channels (tuple[str, ...]): Ordered set of channels to emit in the pileup image (default: _DEFAULT_DEEPVARIANT_CHANNELS).
  • quality_max (float): Maximum base quality score used for normalization (default: 40).
  • mapq_max (float): Maximum mapping quality used for normalization (default: 60).
  • temperature (float): Temperature inherited from TemperatureConfig (default: DEFAULT_TEMPERATURE).
  • learnable_temperature (bool): Whether the temperature is treated as a learnable parameter (default: False).
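`quality_max` and `mapq_max` are described only as normalization maxima. A plausible reading, assumed here rather than taken from the implementation, is that raw scores are clipped to [0, max] and divided by the maximum so both channels lie in [0, 1]. `normalize_scores` is a hypothetical helper:

```python
import numpy as np


def normalize_scores(scores, max_value):
    """Assumed normalization: clip to [0, max_value], then scale into [0, 1]."""
    return np.clip(np.asarray(scores, dtype=float), 0.0, max_value) / max_value


base_q = np.array([[10, 30, 45], [40, 0, 20]])  # Phred base qualities
mapq = np.array([60, 15])  # mapping qualities

print(normalize_scores(base_q, 40.0))  # quality_max default
print(normalize_scores(mapq, 60.0))  # mapq_max default
```

Clipping keeps occasional out-of-range values (such as Q45 above) from pushing a channel past 1.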

CNNVariantClassifier

diffbio.operators.variant.cnn_classifier.CNNVariantClassifier

CNNVariantClassifier(
    config: CNNVariantClassifierConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: OperatorModule

CNN classifier for DeepVariant-style variant calling.

This operator implements a convolutional neural network that processes pileup images to classify genomic positions as reference, SNV, or indel.

Architecture:

  • Multiple Conv2D layers with batch normalization and ReLU
  • Max pooling for spatial reduction
  • Global average pooling before the fully connected layers
  • Fully connected layers with dropout
  • Softmax output for class probabilities
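The final stages of that architecture (global average pooling, fully connected layers, softmax) can be sketched in NumPy as below. Convolutions, batch normalization, and dropout are omitted, the weights are random placeholders, ReLU between the fully connected layers is an assumption, and `classifier_head` is a hypothetical name rather than part of diffbio. Global average pooling is what gives the FC layers a fixed-size input regardless of the feature map's spatial size.

```python
import numpy as np


def softmax(logits, axis=-1):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def classifier_head(feature_map, fc_weights):
    """feature_map: (batch, height, width, channels) output of the conv stack."""
    x = feature_map.mean(axis=(1, 2))  # global average pooling -> (batch, channels)
    for i, (w, b) in enumerate(fc_weights):
        x = x @ w + b
        if i < len(fc_weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden FC layers only
    return x, softmax(x)  # (logits, class_probs)


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 12, 27, 256))  # illustrative spatial size, 256 channels
dims = [256, 256, 128, 3]  # channels -> fc_dims (256, 128) -> num_classes
fc_weights = [
    (rng.normal(scale=0.05, size=(d_in, d_out)), np.zeros(d_out))
    for d_in, d_out in zip(dims[:-1], dims[1:])
]
logits, class_probs = classifier_head(features, fc_weights)
print(logits.shape, class_probs.sum(axis=-1))
```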

Parameters:

  • config (CNNVariantClassifierConfig, required): CNNVariantClassifierConfig with model parameters.
  • rngs (Rngs | None, default: None): Flax NNX random number generators used for parameter initialization.
  • name (str | None, default: None): Optional operator name.

Example
config = CNNVariantClassifierConfig(num_classes=3)
classifier = CNNVariantClassifier(config, rngs=nnx.Rngs(42))
data = {"pileup_image": image_batch}  # (B, H, W, C)
result, state, meta = classifier.apply(data, {}, None)

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply CNN classification to pileup images.

Parameters:

  • data (PyTree, required): Dictionary containing:
      - "pileup_image": pileup images (batch, height, width, channels)
  • state (PyTree, required): Element state (passed through unchanged).
  • metadata (dict[str, Any] | None, required): Element metadata (passed through unchanged).
  • random_params (Any, default: None): Not used.
  • stats (dict[str, Any] | None, default: None): Not used.

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      - transformed_data contains:
          - "pileup_image": original input
          - "logits": raw classification scores
          - "class_probs": softmax probabilities
      - state is passed through unchanged
      - metadata is passed through unchanged
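To turn `class_probs` into discrete calls, one option is an argmax with a confidence floor. This sketch assumes the class order REF, SNV, INDEL (suggested by the `num_classes` default, but not documented as an index order) and uses a hypothetical `min_confidence` cutoff and `NOCALL` label:

```python
import numpy as np

CLASS_NAMES = ("REF", "SNV", "INDEL")  # assumed index order


def decode_calls(class_probs, min_confidence=0.5):
    """Return (label, probability) per position; low-confidence calls become 'NOCALL'."""
    best = class_probs.argmax(axis=-1)
    best_p = class_probs.max(axis=-1)
    return [
        (CLASS_NAMES[k] if p >= min_confidence else "NOCALL", float(p))
        for k, p in zip(best, best_p)
    ]


probs = np.array([[0.90, 0.05, 0.05], [0.20, 0.70, 0.10], [0.40, 0.35, 0.25]])
print(decode_calls(probs))  # [('REF', 0.9), ('SNV', 0.7), ('NOCALL', 0.4)]
```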

CNNVariantClassifierConfig

diffbio.operators.variant.cnn_classifier.CNNVariantClassifierConfig dataclass

CNNVariantClassifierConfig(
    num_classes: int = DEFAULT_NUM_CLASSES,
    input_height: int = 100,
    input_width: int = 221,
    num_channels: int = 6,
    hidden_channels: tuple[int, ...] = (64, 128, 256),
    fc_dims: tuple[int, ...] = (256, 128),
    dropout_rate: float = DEFAULT_DROPOUT_RATE,
)

Bases: OperatorConfig

Configuration for CNNVariantClassifier.

Attributes:

  • num_classes (int): Number of variant classes (default: 3, for REF/SNV/INDEL).
  • input_height (int): Height of the pileup image (coverage depth).
  • input_width (int): Width of the pileup image (context window).
  • num_channels (int): Number of input channels (A, C, G, T, quality, strand).
  • hidden_channels (tuple[int, ...]): Number of channels in each conv layer.
  • fc_dims (tuple[int, ...]): Dimensions of the fully connected layers.
  • dropout_rate (float): Dropout rate for regularization.

DifferentiableCNVSegmentation

diffbio.operators.variant.cnv_segmentation.DifferentiableCNVSegmentation

DifferentiableCNVSegmentation(
    config: CNVSegmentationConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: TemperatureOperator

Soft CNV segmentation using attention-based changepoint detection.

This operator identifies segment boundaries in coverage data using attention mechanisms, replacing hard Circular Binary Segmentation with differentiable soft assignments.

Algorithm:

  1. Project the coverage signal into a hidden space.
  2. Use self-attention to identify changepoint positions.
  3. Compute soft segment assignments via attention.
  4. Compute segment means as weighted averages.
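Steps 3 and 4 amount to two matrix products once soft assignments exist. In this NumPy sketch the assignment matrix is given directly (in the operator it comes from attention). Each segment mean is a membership-weighted average of coverage, and projecting the means back through the assignments yields a smoothed, nearly piecewise-constant signal. The function name is illustrative only.

```python
import numpy as np


def segment_means_from_assignments(coverage, assignments, eps=1e-8):
    """coverage: (n_positions,); assignments: (n_positions, n_segments), rows sum to 1."""
    mass = assignments.sum(axis=0)  # soft size of each segment
    means = (assignments.T @ coverage) / (mass + eps)  # weighted average per segment
    smoothed = assignments @ means  # soft piecewise-constant reconstruction
    return means, smoothed


coverage = np.array([1.0, 1.1, 0.9, 3.0, 3.2, 2.8])
# Nearly hard assignments: first three positions in segment 0, last three in segment 1.
assignments = np.array([[0.99, 0.01]] * 3 + [[0.01, 0.99]] * 3)
means, smoothed = segment_means_from_assignments(coverage, assignments)
print(means, smoothed)
```

Because both products are linear in the assignments, gradients flow back to whatever produced them, which is the point of replacing hard segmentation.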

Inherits from TemperatureOperator to get:

  • _temperature property for temperature-controlled smoothing
  • soft_max() for logsumexp-based smooth maximum
  • soft_argmax() for soft position selection

Parameters:

  • config (CNVSegmentationConfig, required): CNVSegmentationConfig with model parameters.
  • rngs (Rngs | None, default: None): Flax NNX random number generators used for parameter initialization.
  • name (str | None, default: None): Optional operator name.

Example
config = CNVSegmentationConfig(max_segments=50)
segmenter = DifferentiableCNVSegmentation(config, rngs=nnx.Rngs(42))
data = {"coverage": coverage_signal}  # (n_positions,)
result, state, meta = segmenter.apply(data, {}, None)

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply CNV segmentation to coverage data.

Parameters:

Name Type Description Default
data PyTree

Dictionary containing: - "coverage": Coverage signal (n_positions,)

required
state PyTree

Element state (passed through unchanged)

required
metadata dict[str, Any] | None

Element metadata (passed through unchanged)

required
random_params Any

Not used

None
stats dict[str, Any] | None

Not used

None

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      - transformed_data contains:
          - "coverage": original coverage
          - "boundary_probs": soft boundary probabilities
          - "segment_assignments": soft segment memberships
          - "segment_means": mean value per segment
          - "smoothed_coverage": segmented/smoothed signal
      - state is passed through unchanged
      - metadata is passed through unchanged

CNVSegmentationConfig

diffbio.operators.variant.cnv_segmentation.CNVSegmentationConfig dataclass

CNVSegmentationConfig(
    max_segments: int = 100,
    hidden_dim: int = 64,
    attention_heads: int = 4,
    temperature: float = 1.0,
)

Bases: OperatorConfig

Configuration for DifferentiableCNVSegmentation.

Attributes:

  • max_segments (int): Maximum number of segments to detect.
  • hidden_dim (int): Hidden dimension for attention layers.
  • attention_heads (int): Number of attention heads.
  • temperature (float): Temperature for softmax operations.

SoftVariantQualityFilter

diffbio.operators.variant.quality_recalibration.SoftVariantQualityFilter

SoftVariantQualityFilter(
    config: VariantQualityFilterConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: TemperatureOperator

Differentiable variant quality filter using a Gaussian mixture model (GMM).

This operator implements VQSR-style variant quality recalibration using a learnable GMM. Variants are scored by their likelihood under the GMM, and soft filtering is applied via sigmoid thresholds.

Algorithm:

  1. Compute GMM component responsibilities (E-step style).
  2. Score variants by weighted log-likelihood.
  3. Apply a sigmoid threshold for soft filtering.
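A minimal NumPy sketch of these three steps for a diagonal-covariance GMM follows. Two mappings are assumptions, not documented behavior: `quality_scores` is taken as the sigmoid of the mixture log-likelihood, and the filter weight as `sigmoid((quality - threshold) / temperature)`. The GMM parameters are fixed placeholders (in the operator they are learnable), and a temperature of 0.1 is used instead of the default 1.0 so the soft filter is visibly sharp.

```python
import numpy as np


def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))  # numerically stable logistic


def logsumexp(a, axis=-1):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.exp(a - m).sum(axis=axis))


def gmm_quality_filter(features, means, log_vars, log_weights, threshold=0.5, temperature=1.0):
    """features: (n_variants, n_features); means, log_vars: (n_components, n_features)."""
    diff = features[:, None, :] - means[None, :, :]  # (n, k, d)
    log_dens = -0.5 * (diff**2 / np.exp(log_vars) + log_vars + np.log(2 * np.pi)).sum(-1)
    joint = log_dens + log_weights  # log pi_k + log N(x | mu_k, var_k), shape (n, k)
    log_lik = logsumexp(joint, axis=-1)  # mixture log-likelihood per variant
    component_probs = np.exp(joint - log_lik[:, None])  # E-step responsibilities
    quality_scores = sigmoid(log_lik)  # assumed squashing into [0, 1]
    filter_weights = sigmoid((quality_scores - threshold) / temperature)
    return quality_scores, filter_weights, component_probs


means = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [-1.0, 0.5, 0.0, 2.0]])
log_vars = np.full((3, 4), np.log(0.01))  # small placeholder variances
log_weights = np.log(np.full(3, 1.0 / 3))  # uniform mixture weights
features = np.array([
    [0.05, -0.05, 0.0, 0.1],  # close to component 0
    [3.0, -3.0, 3.0, -3.0],  # far from every component
])
q, w, r = gmm_quality_filter(features, means, log_vars, log_weights, temperature=0.1)
print(q.round(3), w.round(3), r.sum(axis=1))
```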

Parameters:

  • config (VariantQualityFilterConfig, required): VariantQualityFilterConfig with model parameters.
  • rngs (Rngs | None, default: None): Flax NNX random number generators used for parameter initialization.
  • name (str | None, default: None): Optional operator name.

Example
config = VariantQualityFilterConfig(n_components=3)
filter_op = SoftVariantQualityFilter(config, rngs=nnx.Rngs(42))
data = {"variant_features": features}  # (n_variants, n_features)
result, state, meta = filter_op.apply(data, {}, None)

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply quality filtering to variants.

Parameters:

Name Type Description Default
data PyTree

Dictionary containing: - "variant_features": Feature vectors (n_variants, n_features)

required
state PyTree

Element state (passed through unchanged)

required
metadata dict[str, Any] | None

Element metadata (passed through unchanged)

required
random_params Any

Not used

None
stats dict[str, Any] | None

Not used

None

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      - transformed_data contains:
          - "variant_features": original features
          - "quality_scores": computed quality scores in [0, 1]
          - "filter_weights": soft filter weights in [0, 1]
          - "component_probs": GMM component responsibilities
      - state is passed through unchanged
      - metadata is passed through unchanged

VariantQualityFilterConfig

diffbio.operators.variant.quality_recalibration.VariantQualityFilterConfig dataclass

VariantQualityFilterConfig(
    n_components: int = 3,
    n_features: int = 4,
    threshold: float = 0.5,
    temperature: float = 1.0,
)

Bases: OperatorConfig

Configuration for SoftVariantQualityFilter.

Attributes:

  • n_components (int): Number of GMM components.
  • n_features (int): Number of variant features.
  • threshold (float): Quality score threshold for filtering.
  • temperature (float): Temperature for softmax/sigmoid operations.

CellTypeAwareVariantClassifier

diffbio.operators.variant.classifier.CellTypeAwareVariantClassifier

CellTypeAwareVariantClassifier(
    config: CellTypeAwareVariantClassifierConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: OperatorModule

Cell-type-aware variant classifier with per-type classification heads.

Uses separate classification heads for each cell type, weighted by soft cell-type assignment probabilities. This allows different variant calling thresholds per cell type, enabling more accurate variant detection in heterogeneous cell populations (e.g., single-cell sequencing).

Architecture
  1. Shared feature encoder: pileup -> flatten -> Linear -> ReLU -> hidden features
  2. Per-type classification heads: n_cell_types separate Linear(hidden, n_classes)
  3. Each head produces type-specific variant logits -> softmax probabilities
  4. Final aggregation: sum_t(cell_type_weights[:, t] * head_t_probs)
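The four steps above map onto a handful of array operations. This NumPy sketch uses random placeholder weights and a single Linear + ReLU encoder; `cell_type_aware_probs` is a hypothetical helper, not the operator's code. Because each head outputs a probability vector and each row of cell-type weights sums to 1, the aggregate is itself a valid distribution.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cell_type_aware_probs(pileup, cell_type_weights, enc_w, enc_b, head_w, head_b):
    """pileup: (n, channels, width); cell_type_weights: (n, n_cell_types)."""
    flat = pileup.reshape(len(pileup), -1)
    h = np.maximum(flat @ enc_w + enc_b, 0.0)  # shared encoder: flatten -> Linear -> ReLU
    logits = np.einsum("nh,thc->ntc", h, head_w) + head_b  # one Linear head per cell type
    per_type = softmax(logits, axis=-1)  # (n, n_cell_types, n_classes)
    agg = np.einsum("nt,ntc->nc", cell_type_weights, per_type)  # sum_t w[:, t] * p_t
    return agg, per_type


rng = np.random.default_rng(0)
n, channels, width, hidden, n_types, n_classes = 4, 6, 100, 64, 5, 3
pileup = rng.normal(size=(n, channels, width))
weights = softmax(rng.normal(size=(n, n_types)))  # soft cell-type assignments
enc_w = rng.normal(scale=0.02, size=(channels * width, hidden))
enc_b = np.zeros(hidden)
head_w = rng.normal(scale=0.1, size=(n_types, hidden, n_classes))
head_b = np.zeros((n_types, n_classes))
variant_probs, per_type_probs = cell_type_aware_probs(pileup, weights, enc_w, enc_b, head_w, head_b)
print(variant_probs.shape, per_type_probs.shape)  # (4, 3) (4, 5, 3)
```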

Parameters:

  • config (CellTypeAwareVariantClassifierConfig, required): CellTypeAwareVariantClassifierConfig with model parameters.
  • rngs (Rngs | None, default: None): Flax NNX random number generators used for parameter initialization.
  • name (str | None, default: None): Optional operator name.

Example
config = CellTypeAwareVariantClassifierConfig(n_classes=3, n_cell_types=5)
classifier = CellTypeAwareVariantClassifier(config, rngs=nnx.Rngs(42))
data = {
    "pileup": pileup_batch,              # (n, channels, width)
    "cell_type_assignments": assignments,  # (n, n_cell_types)
}
result, state, meta = classifier.apply(data, {}, None)
# result["variant_probabilities"]   -> (n, n_classes)
# result["per_type_probabilities"]  -> (n, n_cell_types, n_classes)

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply cell-type-aware variant classification.

Computes per-type variant probabilities and aggregates them using cell-type assignment weights.

Parameters:

Name Type Description Default
data PyTree

Dictionary containing: - "pileup": Pileup data, shape (n, channels, width). - "cell_type_assignments": Soft cell-type weights, shape (n, n_cell_types).

required
state PyTree

Element state (passed through unchanged).

required
metadata dict[str, Any] | None

Element metadata (passed through unchanged).

required
random_params Any

Not used.

None
stats dict[str, Any] | None

Not used.

None

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      - transformed_data contains all input keys plus:
          - "variant_probabilities": aggregated probabilities (n, n_classes)
          - "per_type_probabilities": per-type probabilities (n, n_cell_types, n_classes)
      - state is passed through unchanged
      - metadata is passed through unchanged

CellTypeAwareVariantClassifierConfig

diffbio.operators.variant.classifier.CellTypeAwareVariantClassifierConfig dataclass

CellTypeAwareVariantClassifierConfig(
    n_classes: int = 3,
    hidden_dim: int = 64,
    n_cell_types: int = 5,
    pileup_channels: int = 6,
    pileup_width: int = 100,
)

Bases: OperatorConfig

Configuration for cell-type-aware variant classifier.

This classifier uses separate classification heads per cell type, weighted by soft cell-type assignments to produce cell-type-specific variant calling thresholds.

Attributes:

  • n_classes (int): Number of variant types (e.g., SNP, indel, ref).
  • hidden_dim (int): Hidden layer dimension of the shared feature encoder.
  • n_cell_types (int): Number of cell types, i.e. the number of per-type heads.
  • pileup_channels (int): Number of channels in the pileup input.
  • pileup_width (int): Width of the pileup input.

EnhancedCNVSegmentation

diffbio.operators.variant.cnv_segmentation.EnhancedCNVSegmentation

EnhancedCNVSegmentation(
    config: EnhancedCNVSegmentationConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: DifferentiableCNVSegmentation

Enhanced CNV segmentation with multi-signal fusion and pyramidal smoothing.

Inherits from DifferentiableCNVSegmentation and adds:

  1. Multi-signal fusion -- learnable linear combination of log-ratio coverage, BAF, and SNP density signals.
  2. Pyramidal smoothing -- infercnvpy-style triangular convolution for spatial noise reduction.
  3. Dynamic thresholding -- threshold_scale * std(smoothed) filters low-amplitude noise.
  4. HMM state mapping -- soft copy-number posteriors (0-somy to 4-somy by default) via learned emission model.
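Steps 1 to 3 can be sketched in NumPy as follows. The fixed fusion weights, the triangular kernel construction, the zero-padded edges of `np.convolve`, and the sigmoid gate used in place of a hard cutoff (so the step stays differentiable) are all assumptions for illustration; the operator learns its fusion weights and may handle these details differently. A 21-bp window is used instead of the default 100 to keep the example small.

```python
import numpy as np


def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))  # numerically stable logistic


def pyramidal_kernel(window):
    """Triangular weights 1, 2, ..., peak, ..., 2, 1, normalized to sum to 1."""
    half = (window + 1) // 2
    ramp = np.arange(1, half + 1, dtype=float)
    weights = np.concatenate([ramp, ramp[::-1][window % 2:]])
    return weights / weights.sum()


def fuse_smooth_threshold(signals, fusion_weights, window, threshold_scale, gate_temperature=0.05):
    """signals: (n_signals, n_positions), e.g. log-ratio coverage, BAF, SNP density."""
    fused = fusion_weights @ signals  # step 1: linear signal fusion
    smoothed = np.convolve(fused, pyramidal_kernel(window), mode="same")  # step 2
    threshold = threshold_scale * smoothed.std()  # step 3: dynamic threshold
    # Soft gate: near 1 above the threshold, near 0 below it.
    gate = sigmoid((np.abs(smoothed) - threshold) / gate_temperature)
    return fused, smoothed, threshold, smoothed * gate


rng = np.random.default_rng(0)
n = 400
coverage = rng.normal(scale=0.1, size=n)
coverage[150:250] += 1.0  # simulated copy-number gain
baf = rng.normal(scale=0.05, size=n)  # placeholder centered BAF-derived signal
snp_density = rng.normal(scale=0.05, size=n)
signals = np.stack([coverage, baf, snp_density])
fused, smoothed, threshold, cleaned = fuse_smooth_threshold(
    signals, fusion_weights=np.array([1.0, 0.3, 0.1]), window=21, threshold_scale=1.5
)
print(round(float(threshold), 3), cleaned[:100].max(), cleaned[180:220].min())
```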

Parameters:

  • config (EnhancedCNVSegmentationConfig, required): EnhancedCNVSegmentationConfig with model parameters.
  • rngs (Rngs | None, default: None): Flax NNX random number generators used for parameter initialization.
  • name (str | None, default: None): Optional operator name.

Example
config = EnhancedCNVSegmentationConfig(
    max_segments=50, use_baf=True, smoothing_window=100,
)
op = EnhancedCNVSegmentation(config, rngs=nnx.Rngs(0))
data = {"coverage": cov, "baf_signal": baf, "snp_density": snp}
result, state, meta = op.apply(data, {}, None)

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply enhanced CNV segmentation to genomic signal data.

Parameters:

Name Type Description Default
data PyTree

Dictionary containing: - "coverage": Log-ratio coverage signal (n_positions,) - "baf_signal" (optional): B-allele frequency (n_positions,) - "snp_density" (optional): SNP density (n_positions,)

required
state PyTree

Element state (passed through unchanged).

required
metadata dict[str, Any] | None

Element metadata (passed through unchanged).

required
random_params Any

Not used.

None
stats dict[str, Any] | None

Not used.

None

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      - transformed_data contains:
          - "coverage": original coverage
          - "fused_signal": fused multi-signal output
          - "pyramidal_smoothed": signal after pyramidal convolution
          - "thresholded_signal": signal after dynamic noise filtering
          - "dynamic_threshold": scalar threshold value
          - "boundary_probs": soft boundary probabilities
          - "segment_assignments": soft segment memberships
          - "segment_means": mean value per segment
          - "smoothed_coverage": final segmented/smoothed signal
          - "copy_number_posteriors": per-position copy-number state posteriors
          - "expected_copy_number": expected copy number per position
      - state is passed through unchanged
      - metadata is passed through unchanged
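Given per-position posteriors over the `n_copy_states` states, an expected copy number is the posterior-weighted average of the state indices. The sketch assumes state k means copy number k (consistent with the 0-somy to 4-somy description); the posteriors are made-up values, not operator output.

```python
import numpy as np


def expected_copy_number(posteriors):
    """posteriors: (n_positions, n_copy_states); column k holds P(copy number = k)."""
    states = np.arange(posteriors.shape[-1])  # 0, 1, ..., n_copy_states - 1
    return posteriors @ states


posteriors = np.array([
    [0.00, 0.05, 0.90, 0.05, 0.00],  # mostly diploid
    [0.00, 0.00, 0.10, 0.80, 0.10],  # likely gain
    [0.10, 0.80, 0.10, 0.00, 0.00],  # likely loss
])
print(expected_copy_number(posteriors))
```

Unlike an argmax over states, the expectation stays differentiable with respect to the posteriors.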

EnhancedCNVSegmentationConfig

diffbio.operators.variant.cnv_segmentation.EnhancedCNVSegmentationConfig dataclass

EnhancedCNVSegmentationConfig(
    max_segments: int = 100,
    hidden_dim: int = 64,
    attention_heads: int = 4,
    temperature: float = 1.0,
    use_baf: bool = False,
    baf_weight: float = 0.3,
    smoothing_window: int = 100,
    threshold_scale: float = 1.5,
    n_copy_states: int = 5,
)

Bases: CNVSegmentationConfig

Configuration for EnhancedCNVSegmentation.

Extends the base CNV segmentation with multi-signal fusion, pyramidal smoothing, dynamic thresholding, and HMM copy-number state mapping.

Attributes:

  • max_segments (int): Maximum number of segments to detect (default: 100).
  • hidden_dim (int): Hidden dimension for attention layers (default: 64).
  • attention_heads (int): Number of attention heads (default: 4).
  • temperature (float): Temperature for softmax operations (default: 1.0).
  • use_baf (bool): Whether to incorporate the B-allele frequency signal (default: False).
  • baf_weight (float): Initial weight of the BAF signal in fusion (default: 0.3).
  • smoothing_window (int): Window size for the pyramidal smoothing convolution (default: 100).
  • threshold_scale (float): Multiplier on the standard deviation of the smoothed signal for the dynamic threshold (default: 1.5).
  • n_copy_states (int): Number of discrete copy-number states, from 0-somy to (n_copy_states - 1)-somy; the default of 5 covers 0-somy to 4-somy.

Usage Examples

CNN Variant Classification

import jax.numpy as jnp
from flax import nnx
from diffbio.operators.variant import CNNVariantClassifier, CNNVariantClassifierConfig

config = CNNVariantClassifierConfig(
    num_classes=3,
    input_height=100,
    input_width=221,
    num_channels=6,
)
classifier = CNNVariantClassifier(config, rngs=nnx.Rngs(42))

# Batch of pileup images: (batch, input_height, input_width, num_channels)
pileup = jnp.zeros((8, 100, 221, 6))
data = {"pileup_image": pileup}
result, _, _ = classifier.apply(data, {}, None)
logits = result["logits"]  # raw classification scores
class_probs = result["class_probs"]  # softmax class probabilities

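CNV Segmentation

from flax import nnx
from diffbio.operators.variant import DifferentiableCNVSegmentation, CNVSegmentationConfig

config = CNVSegmentationConfig(max_segments=100, hidden_dim=64)
cnv_seg = DifferentiableCNVSegmentation(config, rngs=nnx.Rngs(42))

data = {"coverage": coverage}  # coverage signal, (n_positions,)
result, _, _ = cnv_seg.apply(data, {}, None)
segment_means = result["segment_means"]  # mean value per segment
smoothed = result["smoothed_coverage"]  # segmented/smoothed signal
# Per-position copy-number estimates ("expected_copy_number") come from EnhancedCNVSegmentation.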

DeepVariant-style Pileup Images

import jax.numpy as jnp
from diffbio.operators.variant import DeepVariantStylePileup, DeepVariantPileupConfig

config = DeepVariantPileupConfig(
    window_size=101,
    max_reads=50,
    channels=(
        "base",
        "base_quality",
        "mapping_quality",
        "strand",
        "supports_variant",
        "differs_from_ref",
    ),
)
pileup = DeepVariantStylePileup(config)

# Prepare data
data = {
    "reads": reads,  # (num_reads, read_length, 4)
    "reference": reference,  # (window_size, 4)
    "base_qualities": qualities,  # (num_reads, read_length)
    "mapping_qualities": mapq,  # (num_reads,)
    "strands": strands,  # (num_reads,)
    "positions": positions,  # (num_reads,)
}

result, _, _ = pileup.apply(data, {}, None)
pileup_image = result["pileup_image"]  # (50, 101, 9)

Variant Quality Filtering

from flax import nnx
from diffbio.operators.variant import SoftVariantQualityFilter, VariantQualityFilterConfig

config = VariantQualityFilterConfig(
    n_components=3,
    n_features=4,
    threshold=0.5,
    temperature=1.0,
)
qf = SoftVariantQualityFilter(config, rngs=nnx.Rngs(42))

data = {"variant_features": features}  # (n_variants, n_features)
result, _, _ = qf.apply(data, {}, None)
quality = result["quality_scores"]  # quality scores in [0, 1]
filter_weights = result["filter_weights"]  # soft filter weights in [0, 1]