Differential Expression Pipeline API
DESeq2-style differential expression analysis pipeline.
DifferentialExpressionPipeline
diffbio.pipelines.differential_expression.DifferentialExpressionPipeline

```python
DifferentialExpressionPipeline(
    config: DEPipelineConfig, *, rngs: Rngs | None = None
)
```
Bases: OperatorModule
End-to-end differentiable differential expression analysis.
This pipeline implements a DESeq2-style analysis in four steps:

1. Size factor normalization (median-of-ratios)
2. Negative binomial GLM fitting
3. Wald test for significance
4. Multiple testing correction (soft approximation)

All steps maintain gradient flow for end-to-end learning.
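Step 1, median-of-ratios normalization, is the standard DESeq2 estimator: each sample's size factor is the median, over genes, of the ratio between that sample's count and the gene's geometric mean across all samples. The plain-Python sketch below illustrates that estimator on a list-of-lists count matrix. It is not diffbio's code, and the function name `median_of_ratios_size_factors` is hypothetical.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a (n_samples, n_genes) count matrix.

    `counts` is a list of per-sample rows. Genes with a zero count in any
    sample are skipped, because their geometric mean is zero. Raises
    statistics.StatisticsError if no gene is nonzero in every sample.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])

    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_means = []
    for g in range(n_genes):
        column = [counts[s][g] for s in range(n_samples)]
        if min(column) > 0:
            log_geo_means.append(sum(math.log(c) for c in column) / n_samples)
        else:
            log_geo_means.append(None)  # excluded from every median

    size_factors = []
    for s in range(n_samples):
        # Log ratio of this sample to the pseudo-reference, per usable gene.
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        # Median in log space, mapped back to the count scale.
        size_factors.append(math.exp(statistics.median(log_ratios)))
    return size_factors
```

For example, if every count in the second sample is exactly double the corresponding count in the first, the second sample's size factor comes out twice as large as the first's.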
Example

```python
from flax import nnx
from diffbio.pipelines import DEPipelineConfig, DifferentialExpressionPipeline

config = DEPipelineConfig(
    n_genes=5000,
    n_conditions=2,
)
rngs = nnx.Rngs(0)
pipeline = DifferentialExpressionPipeline(config, rngs=rngs)

# count_matrix and design_matrix are your own arrays.
data = {
    "counts": count_matrix,  # (n_samples, n_genes)
    "design": design_matrix,  # (n_samples, n_conditions)
}
result, state, metadata = pipeline.apply(data, {}, None)
lfc = result["log_fold_change"]
pvals = result["p_values"]
significant = result["significant"]
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `DEPipelineConfig` | Configuration for the pipeline. | required |
| `rngs` | `Rngs \| None` | Random number generators for initialization. | `None` |
apply

```python
apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]
```
Apply differential expression analysis.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary containing `'counts'`, a count matrix of shape `(n_samples, n_genes)`, and `'design'`, a design matrix of shape `(n_samples, n_conditions)`. | required |
| `state` | `dict[str, Any]` | Operator state dictionary. | required |
| `metadata` | `dict \| None` | Metadata dictionary, or `None`. | required |
| `random_params` | `dict \| None` | Optional random parameters (unused). | `None` |
| `stats` | `dict \| None` | Optional statistics dictionary (unused). | `None` |
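Returns:

| Type | Description |
|---|---|
| `tuple[dict, dict, dict \| None]` | Tuple of `(output_data, state, metadata)`. `output_data` holds the analysis results, including `size_factors`, `predicted_mean`, `log_fold_change`, `wald_statistic`, `standard_error`, `p_values`, and `significant` (see Access Intermediate Results below). |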
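The `wald_statistic`, `standard_error`, and `p_values` outputs are related in the usual Wald-test way: the statistic is the coefficient divided by its standard error, and under the null hypothesis it is approximately standard normal, which gives a two-sided p-value. A minimal sketch of that relationship follows; `wald_test` is a hypothetical helper, not part of diffbio.

```python
import math


def wald_test(log_fold_change, standard_error):
    """Two-sided Wald test, one coefficient per gene.

    Returns (wald_statistic, p_values) as lists. Under the null hypothesis
    (coefficient == 0) the statistic is approximately N(0, 1).
    """
    wald, pvals = [], []
    for beta, se in zip(log_fold_change, standard_error):
        w = beta / se
        # P(|Z| >= |w|) for Z ~ N(0, 1) equals erfc(|w| / sqrt(2)).
        p = math.erfc(abs(w) / math.sqrt(2.0))
        wald.append(w)
        pvals.append(p)
    return wald, pvals
```

The ratio is unchanged if both inputs are expressed on the log2 rather than the natural-log scale, so the sketch does not depend on which scale `log_fold_change` uses.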
DEPipelineConfig
diffbio.pipelines.differential_expression.DEPipelineConfig
dataclass

```python
DEPipelineConfig(
    n_genes: int = 1000,
    n_conditions: int = 2,
    alpha: float = 0.05,
    use_size_factors: bool = True,
)
```
Bases: OperatorConfig
Configuration for differential expression pipeline.
Attributes:

| Name | Type | Description |
|---|---|---|
| `n_genes` | `int` | Number of genes to analyze. |
| `n_conditions` | `int` | Number of conditions (covariates) in the design matrix. |
| `alpha` | `float` | Significance threshold for differential expression. |
| `use_size_factors` | `bool` | Whether to compute and use size factors. |
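This page does not spell out how `alpha` interacts with the pipeline's soft multiple testing correction. As a reference point, DESeq2's default correction is the Benjamini-Hochberg procedure, after which genes are called significant when the adjusted p-value falls below `alpha`. The sketch below implements that standard hard procedure. The helper names are hypothetical, and applying `alpha` to adjusted rather than raw p-values is an assumption about diffbio.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so that adjusted
    # values never decrease with rank (the cumulative-minimum step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_significant(p_values, alpha=0.05):
    """Hard calls: True where the BH-adjusted p-value is below alpha."""
    return [q < alpha for q in benjamini_hochberg(p_values)]
```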
Usage Examples
Basic Differential Expression
```python
from flax import nnx
from diffbio.pipelines import (
    DifferentialExpressionPipeline,
    DEPipelineConfig,
)

config = DEPipelineConfig(
    n_genes=2000,
    n_conditions=2,
    alpha=0.05,
)
pipeline = DifferentialExpressionPipeline(config, rngs=nnx.Rngs(42))

# count_matrix and design_matrix are your own arrays;
# count_matrix needs n_genes (here 2000) columns.
data = {
    "counts": count_matrix,  # (n_samples, n_genes)
    "design": design_matrix,  # (n_samples, n_conditions)
}
result, _, _ = pipeline.apply(data, {}, None)

log2fc = result["log_fold_change"]
pvalues = result["p_values"]
significant = result["significant"]
```
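Because every step is meant to keep gradients flowing, a plain `p < alpha` comparison cannot sit on the gradient path: a hard threshold has zero gradient almost everywhere. One common relaxation replaces the step function with a sigmoid of the log-scale gap between `alpha` and each p-value. The sketch below shows that generic technique; whether diffbio's `significant` output takes this form is an assumption, and `temperature` is a hypothetical parameter, not a `DEPipelineConfig` field.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_significance(p_values, alpha=0.05, temperature=0.1):
    """Smooth stand-in for the hard test p < alpha.

    Scores are near 1 when p is far below alpha, exactly 0.5 at p == alpha,
    and near 0 when p is far above alpha. Smaller temperatures sharpen the
    transition. `temperature` is a hypothetical knob, not a documented
    DEPipelineConfig field.
    """
    scores = []
    for p in p_values:
        # Clamp p away from 0 so the log stays finite.
        log_p = math.log(max(p, 1e-300))
        scores.append(_sigmoid((math.log(alpha) - log_p) / temperature))
    return scores
```

Written with `jax.nn.sigmoid` in a JAX program, the same expression would have nonzero gradients with respect to the p-values, which the hard comparison lacks.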
Access Intermediate Results

```python
# `result` is the output_data dict returned by pipeline.apply above.

# Size factors
size_factors = result["size_factors"]
# Predicted mean expression
predicted_mean = result["predicted_mean"]
# Log fold change estimates
log_fold_change = result["log_fold_change"]
# Wald test statistics and standard errors
wald_statistic = result["wald_statistic"]
standard_error = result["standard_error"]
# P-values and significance indicators
p_values = result["p_values"]
significant = result["significant"]
```
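The `size_factors` and `predicted_mean` outputs fit the negative binomial GLM that DESeq2 uses: a gene's mean count in a sample is that sample's size factor times the exponential of the linear predictor `x · beta`, and counts follow a negative binomial with variance `mu + dispersion * mu**2`. The sketch below evaluates that mean and the per-count log-likelihood that a GLM fit maximizes. It assumes a natural-log link (DESeq2 writes its model with a log2 link, which rescales the coefficients by ln 2), and the helper names and the intercept-plus-indicator design row are illustrative assumptions, not diffbio's documented behavior.

```python
import math


def nb_predicted_mean(size_factor, design_row, beta):
    """mu = s * exp(x . beta): NB GLM mean under a natural-log link."""
    eta = sum(x * b for x, b in zip(design_row, beta))
    return size_factor * math.exp(eta)


def nb_log_likelihood(count, mu, dispersion):
    """Log-pmf of an NB with mean mu and variance mu + dispersion * mu**2."""
    r = 1.0 / dispersion  # NB "size" parameter
    return (
        math.lgamma(count + r)
        - math.lgamma(r)
        - math.lgamma(count + 1)
        + r * math.log(r / (r + mu))
        + count * math.log(mu / (r + mu))
    )


# Example: size factor 2.0, design row [intercept, condition indicator]
# (an assumed encoding), and coefficients log(3) and log(2).
example_mu = nb_predicted_mean(2.0, [1.0, 1.0], [math.log(3.0), math.log(2.0)])
example_ll = nb_log_likelihood(4, example_mu, dispersion=0.1)
```

A GLM fit chooses `beta` per gene to maximize the sum of these log-likelihoods over samples; the log fold change is then the condition coefficient.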