
Differential Expression Pipeline API

DESeq2-style differential expression analysis pipeline.

DifferentialExpressionPipeline

diffbio.pipelines.differential_expression.DifferentialExpressionPipeline

DifferentialExpressionPipeline(
    config: DEPipelineConfig, *, rngs: Rngs | None = None
)

Bases: OperatorModule

End-to-end differentiable differential expression analysis.

This pipeline implements a DESeq2-style analysis with:

  1. Size factor normalization (median-of-ratios)
  2. Negative binomial GLM fitting
  3. Wald test for significance
  4. Multiple testing correction (soft approximation)

All steps maintain gradient flow for end-to-end learning.
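As an illustration of step 1, the median-of-ratios idea can be sketched in plain Python. This is not diffbio's implementation: the function name `median_of_ratios_size_factors` is hypothetical, the code uses only the standard library, and it does not carry gradients. It only shows what quantity a size factor estimates.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a (n_samples, n_genes) count table.

    Plain-Python illustration of the median-of-ratios idea; not diffbio's code.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])

    # Log geometric mean of each gene across samples. Genes with a zero count
    # in any sample are skipped, because their geometric mean would be zero.
    log_geo_means = {}
    for g in range(n_genes):
        column = [counts[s][g] for s in range(n_samples)]
        if all(c > 0 for c in column):
            log_geo_means[g] = sum(math.log(c) for c in column) / n_samples

    if not log_geo_means:
        raise ValueError("every gene has a zero count in at least one sample")

    # A sample's size factor is the median ratio of its counts to the
    # per-gene geometric means, computed on the log scale.
    size_factors = []
    for s in range(n_samples):
        log_ratios = [
            math.log(counts[s][g]) - lgm for g, lgm in log_geo_means.items()
        ]
        size_factors.append(math.exp(median(log_ratios)))
    return size_factors


# Toy data: the second sample is the first one sequenced at twice the depth.
toy_counts = [
    [10, 20, 30, 40],
    [20, 40, 60, 80],
]
factors = median_of_ratios_size_factors(toy_counts)
```

In this toy case the second size factor is exactly twice the first, so dividing each sample's counts by its factor puts both samples on the same scale.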

Example
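from flax import nnx
from diffbio.pipelines import (
    DifferentialExpressionPipeline,
    DEPipelineConfig,
)

rngs = nnx.Rngs(42)
config = DEPipelineConfig(
    n_genes=5000,
    n_conditions=2,
)
pipeline = DifferentialExpressionPipeline(config, rngs=rngs)

# count_matrix and design_matrix are user-supplied arrays.
data = {
    "counts": count_matrix,  # (n_samples, n_genes)
    "design": design_matrix,  # (n_samples, n_conditions)
}
# apply takes (data, state, metadata) and returns the same triple.
result, state, metadata = pipeline.apply(data, {}, None)
lfc = result["log_fold_change"]
pvals = result["p_values"]
significant = result["significant"]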

Parameters:

  • config (DEPipelineConfig, required): Configuration for the pipeline.
  • rngs (Rngs | None, default None): Random number generators for initialization.

apply

apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]

Apply differential expression analysis.

Parameters:

  • data (dict[str, Any], required): Dictionary containing:
      - 'counts': Count matrix of shape (n_samples, n_genes)
      - 'design': Design matrix of shape (n_samples, n_conditions)
  • state (dict[str, Any], required): Operator state dictionary.
  • metadata (dict | None, required): Optional metadata dictionary.
  • random_params (dict | None, default None): Optional random parameters (unused).
  • stats (dict | None, default None): Optional statistics dictionary (unused).

Returns:

tuple[dict, dict, dict | None]: Tuple of (output_data, state, metadata), where output_data contains:

  • 'counts': Original count matrix
  • 'design': Original design matrix
  • 'size_factors': Computed size factors
  • 'predicted_mean': Predicted mean expression
  • 'log_fold_change': Log2 fold change estimates
  • 'wald_statistic': Wald test statistics
  • 'standard_error': Standard errors
  • 'p_values': P-values for differential expression
  • 'significant': Soft significance indicators
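The sketch below shows one conventional way a Wald statistic, a standard error, and a p-value relate to each other. It assumes the statistic is the coefficient estimate divided by its standard error and that the p-value is two-sided against a standard normal reference, as in a DESeq2-style Wald test. Whether diffbio computes these outputs exactly this way is not stated here, and `wald_p_value` is a hypothetical helper name.

```python
import math


def wald_p_value(estimate, standard_error):
    """Return (z, p) for H0: estimate == 0 under a standard normal reference."""
    z = estimate / standard_error
    # For Z ~ N(0, 1), P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = wald_p_value(estimate=1.96, standard_error=1.0)
```

With an estimate of 1.96 standard errors from zero, `p` comes out very close to 0.05, the familiar two-sided threshold.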

DEPipelineConfig

diffbio.pipelines.differential_expression.DEPipelineConfig dataclass

DEPipelineConfig(
    n_genes: int = 1000,
    n_conditions: int = 2,
    alpha: float = 0.05,
    use_size_factors: bool = True,
)

Bases: OperatorConfig

Configuration for differential expression pipeline.

Attributes:

  • n_genes (int): Number of genes to analyze.
  • n_conditions (int): Number of conditions (covariates) in design matrix.
  • alpha (float): Significance threshold for differential expression.
  • use_size_factors (bool): Whether to compute and use size factors.
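To give a sense of how a threshold such as `alpha` can produce a soft rather than a hard significance call, here is a minimal sketch of one common relaxation: a logistic function centred on `alpha`. This is an assumption for illustration, not a description of diffbio's internals. The function `soft_significance` and its `temperature` parameter are hypothetical, and `temperature` is not a DEPipelineConfig field.

```python
import math


def soft_significance(p_value, alpha=0.05, temperature=0.01):
    """Smooth stand-in for the hard test `p_value < alpha`.

    `temperature` is a hypothetical sharpness knob for this sketch; smaller
    values make the transition around `alpha` steeper.
    """
    x = (alpha - p_value) / temperature
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


scores = [soft_significance(p) for p in (0.001, 0.05, 0.5)]
```

P-values well below `alpha` map close to 1, a p-value equal to `alpha` maps to 0.5, and large p-values map close to 0. Unlike a hard cutoff, the score changes smoothly with the p-value, which is what allows gradients to pass through it.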

Usage Examples

Basic Differential Expression

from flax import nnx
from diffbio.pipelines import (
    DifferentialExpressionPipeline,
    DEPipelineConfig,
)

config = DEPipelineConfig(
    n_genes=2000,
    n_conditions=2,
    alpha=0.05,
)

pipeline = DifferentialExpressionPipeline(config, rngs=nnx.Rngs(42))

data = {
    "counts": count_matrix,      # (n_samples, n_genes)
    "design": design_matrix,     # (n_samples, n_conditions)
}
result, _, _ = pipeline.apply(data, {}, None)

log2fc = result["log_fold_change"]
pvalues = result["p_values"]
significant = result["significant"]
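The example above assumes `count_matrix` and `design_matrix` already exist. The sketch below builds toy inputs with the documented shapes, using only the standard library. The two-column layout (intercept plus a treatment indicator) is one common encoding and is an assumption here; this page does not say which encoding the pipeline expects. The names `conditions`, `design_rows`, and `count_rows` are illustrative.

```python
import random

conditions = ["control", "control", "control", "treated", "treated", "treated"]
n_genes = 5

# Column 0: intercept (all ones). Column 1: indicator for the treated group.
design_rows = [[1.0, 1.0 if c == "treated" else 0.0] for c in conditions]

# Toy integer counts with shape (n_samples, n_genes). Real counts would come
# from a quantification step, not from a random generator.
rng = random.Random(0)
count_rows = [[rng.randint(0, 200) for _ in range(n_genes)] for _ in conditions]

shape_counts = (len(count_rows), len(count_rows[0]))
shape_design = (len(design_rows), len(design_rows[0]))
```

These nested lists would typically be converted to arrays before being passed as `"counts"` and `"design"`, with `n_genes` and `n_conditions` in the config matching the second dimension of each.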

Access Intermediate Results

# Size factors
size_factors = result["size_factors"]

# Predicted mean expression
predicted_mean = result["predicted_mean"]

# Log fold change estimates
log_fold_change = result["log_fold_change"]

# Wald test statistics and standard errors
wald_statistic = result["wald_statistic"]
standard_error = result["standard_error"]

# P-values and significance indicators
p_values = result["p_values"]
significant = result["significant"]
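A typical next step is to combine the effect size and the significance indicator into a list of candidate genes. The sketch below uses toy lists in place of the pipeline outputs. It assumes the soft significance values lie in [0, 1], and the cutoffs (`min_abs_lfc=1.0`, `min_score=0.5`) are illustrative choices, not values prescribed by diffbio. `select_hits` is a hypothetical helper.

```python
def select_hits(log2fc, significant, min_abs_lfc=1.0, min_score=0.5):
    """Return indices of genes passing an effect-size and a soft-significance cutoff."""
    return [
        i
        for i, (lfc, score) in enumerate(zip(log2fc, significant))
        if abs(lfc) >= min_abs_lfc and score >= min_score
    ]


# Toy stand-ins for result["log_fold_change"] and result["significant"].
toy_log2fc = [2.3, -0.2, -1.7, 0.9]
toy_significant = [0.98, 0.10, 0.91, 0.75]
hits = select_hits(toy_log2fc, toy_significant)
```

Here genes 0 and 2 pass both cutoffs. Gene 3 has a high significance score but a fold change below the effect-size cutoff, so it is excluded.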