Epigenomics Operators API
Differentiable operators for epigenomic analysis including peak calling and chromatin state annotation.
DifferentiablePeakCaller
diffbio.operators.epigenomics.peak_calling.DifferentiablePeakCaller
DifferentiablePeakCaller(
    config: PeakCallerConfig, *, rngs: Rngs | None = None
)
Bases: TemperatureOperator
Differentiable peak caller for ChIP-seq and ATAC-seq data.
This operator uses a CNN-based approach to detect peaks in coverage signals, with soft thresholding for end-to-end differentiability.
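As a rough illustration of what soft thresholding means here, the sketch below replaces the hard comparison `score > threshold` with a temperature-scaled sigmoid so that gradients reach the scores. It is not the library's implementation: the helper name `soft_threshold` is hypothetical, and the defaults only mirror the `threshold` and `temperature` defaults of `PeakCallerConfig`.

```python
import jax
import jax.numpy as jnp


def soft_threshold(scores, threshold=0.5, temperature=1.0):
    """Smooth stand-in for (scores > threshold); hypothetical helper."""
    return jax.nn.sigmoid((scores - threshold) / temperature)


scores = jnp.array([0.1, 0.5, 0.9])
probs = soft_threshold(scores)

# Gradient of the summed probabilities with respect to the scores.
# A hard threshold would give a zero gradient almost everywhere.
grads = jax.grad(lambda s: soft_threshold(s).sum())(scores)
```

Lowering the temperature makes the sigmoid steeper, so the output approaches a hard 0/1 decision while staying differentiable.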
Optionally applies VAE-based denoising before peak detection, using a Poisson decoder (per SCALE) to model count data. When VAE denoising is enabled, the pipeline is: coverage -> VAE encoder -> latent -> Poisson decoder -> denoised -> CNN -> peaks
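A Poisson decoder is usually trained with a Poisson negative log-likelihood on the raw counts. The sketch below shows that reconstruction term only, assuming the decoder emits a log-rate per position. The function name `poisson_nll` and the log-rate parameterization are assumptions for illustration, not the operator's confirmed internals.

```python
import jax.numpy as jnp


def poisson_nll(counts, log_rate):
    """Poisson negative log-likelihood, dropping the constant log(counts!) term."""
    rate = jnp.exp(log_rate)
    return jnp.sum(rate - counts * log_rate)


counts = jnp.array([0.0, 3.0, 10.0])            # observed coverage counts
good_fit = poisson_nll(counts, jnp.log(counts + 1e-3))
poor_fit = poisson_nll(counts, jnp.zeros(3))    # rate of 1 at every position
```

A rate close to the observed counts yields a lower loss than a constant rate, which is what pushes the decoder toward a denoised version of the coverage.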
The operator processes coverage data and outputs:
- Peak probabilities at each position
- Peak boundaries (soft)
- Peak summits
- Denoised coverage (when VAE is enabled)
Example
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `PeakCallerConfig` | Configuration for the peak caller. | required |
| `rngs` | `Rngs \| None` | Random number generators for initialization. | `None` |
apply
apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]
Apply peak calling to coverage data.
When VAE denoising is enabled, the coverage signal is first denoised through a VAE with Poisson decoder before peak detection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary containing `'coverage'`: coverage signal of shape `(batch, length)` or `(length,)`. | required |
| `state` | `dict[str, Any]` | Operator state dictionary. | required |
| `metadata` | `dict \| None` | Optional metadata dictionary. | required |
| `random_params` | `dict \| None` | Optional random parameters. | `None` |
| `stats` | `dict \| None` | Optional statistics dictionary. | `None` |
Returns:
| Type | Description |
|---|---|
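| `tuple[dict, dict, dict \| None]` | Tuple of `(output_data, state, metadata)`, where `output_data` holds the peak-calling results described for the class: peak probabilities, soft peak boundaries, peak summits, and the denoised coverage when VAE denoising is enabled. |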
PeakCallerConfig
diffbio.operators.epigenomics.peak_calling.PeakCallerConfig
dataclass
PeakCallerConfig(
    learnable_temperature: bool = True,
    min_peak_width: int = 50,
    use_vae_denoising: bool = False,
    vae_latent_dim: int = 16,
    vae_hidden_dim: int = 64,
    window_size: int = 200,
    num_filters: int = 32,
    kernel_sizes: tuple[int, ...] = (5, 11, 21),
    threshold: float = 0.5,
    temperature: float = 1.0,
)
Bases: _PeakDetectionConfig, _PeakDenoisingConfig, OperatorConfig
Configuration for differentiable peak caller.
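This page does not say how `kernel_sizes` is used internally. One plausible reading, sketched below purely as an assumption, is that the coverage is filtered at several scales and the results are stacked as features. Fixed box filters stand in for learned CNN filters, and `multi_scale_features` is a hypothetical name.

```python
import jax.numpy as jnp


def multi_scale_features(coverage, kernel_sizes=(5, 11, 21)):
    """Stack moving averages of `coverage`, one per kernel size."""
    features = []
    for k in kernel_sizes:
        box = jnp.ones(k) / k                        # fixed filter, not learned
        features.append(jnp.convolve(coverage, box, mode="same"))
    return jnp.stack(features, axis=-1)              # (length, len(kernel_sizes))


coverage = jnp.zeros(100).at[40:60].set(10.0)        # one synthetic 20-position peak
features = multi_scale_features(coverage)
```

Small kernels preserve sharp, narrow peaks, while larger kernels respond to broad enrichment, which is the usual motivation for combining several sizes.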
ChromatinStateAnnotator
diffbio.operators.epigenomics.chromatin_state.ChromatinStateAnnotator
ChromatinStateAnnotator(
    config: ChromatinStateConfig,
    *,
    rngs: Rngs | None = None,
)
Bases: TemperatureOperator
Differentiable chromatin state annotator using HMM.
This operator implements a differentiable Hidden Markov Model (HMM) for annotating chromatin states from histone modification data. It runs the forward algorithm in log-space for numerical stability and provides soft Viterbi decoding for end-to-end differentiability.
The HMM has:
- Learnable transition probabilities between states
- Learnable emission probabilities for each histone mark per state
- Learnable initial state distribution
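The log-space forward recursion over these three parameter groups can be sketched as follows. This is a generic textbook formulation written with `jax.lax.scan`, not the operator's actual code. The function name `log_forward` and the choice to store parameters as logits normalized with `log_softmax` are assumptions.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def log_forward(log_init, log_trans, log_emit):
    """Log-likelihood of one sequence under an HMM.

    log_init:  (S,)   log initial state distribution
    log_trans: (S, S) log transition matrix, rows index the source state
    log_emit:  (T, S) log emission likelihood of each observation per state
    """
    def step(log_alpha, log_e_t):
        # alpha_t(j) = e_t(j) * sum_i alpha_{t-1}(i) * A[i, j], in log-space
        log_alpha = log_e_t + logsumexp(log_alpha[:, None] + log_trans, axis=0)
        return log_alpha, log_alpha

    log_alpha0 = log_init + log_emit[0]
    log_alpha_last, _ = jax.lax.scan(step, log_alpha0, log_emit[1:])
    return logsumexp(log_alpha_last)


# Unconstrained logits are normalized so each distribution sums to 1.
num_states, length = 3, 5
log_init = jax.nn.log_softmax(jnp.zeros(num_states))
log_trans = jax.nn.log_softmax(jnp.zeros((num_states, num_states)), axis=-1)
log_emit = jnp.log(jnp.full((length, num_states), 0.5))

log_lik = log_forward(log_init, log_trans, log_emit)
# The whole recursion is differentiable with respect to the parameters.
trans_grad = jax.grad(log_forward, argnums=1)(log_init, log_trans, log_emit)
```

Working with log-probabilities and `logsumexp` avoids the underflow that a product of many small probabilities would cause on long genomic tracks.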
When cell-type conditioning is enabled:
- Each state has per-cell-type Gaussian emission parameters (mean, variance)
- The cell type vector is used to blend emission parameters
- GMM-style soft assignment (gamma) is computed via softmax over log-likelihoods, per SCALE
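A minimal sketch of the conditioning steps above, under stated assumptions: parameters are stored with shape `(num_states, num_cell_types, num_marks)`, blending is a weighted sum using the soft cell type vector, and variances are blended in log-space. None of these layout choices, nor the name `soft_state_assignment`, is confirmed by this page.

```python
import jax
import jax.numpy as jnp


def soft_state_assignment(marks, cell_type, means, log_vars):
    """marks: (T, M); cell_type: (C,); means, log_vars: (S, C, M)."""
    # Blend per-cell-type parameters with the soft cell type vector -> (S, M)
    mean = jnp.einsum("c,scm->sm", cell_type, means)
    var = jnp.exp(jnp.einsum("c,scm->sm", cell_type, log_vars))
    # Diagonal Gaussian log-likelihood of each position under each state -> (T, S)
    diff = marks[:, None, :] - mean[None, :, :]
    log_lik = -0.5 * jnp.sum(diff**2 / var + jnp.log(2 * jnp.pi * var), axis=-1)
    # gamma: softmax over states, so each row sums to 1
    return jax.nn.softmax(log_lik, axis=-1)


# Two states (means 0 and 5), two cell types, three marks.
means = jnp.stack([jnp.zeros((2, 3)), jnp.full((2, 3), 5.0)])
log_vars = jnp.zeros((2, 2, 3))
cell_type = jnp.array([0.7, 0.3])
marks = jnp.stack([jnp.zeros(3), jnp.full(3, 5.0)])

gamma = soft_state_assignment(marks, cell_type, means, log_vars)
```

Each row of `gamma` is a probability distribution over states, so it can be used as a differentiable replacement for a hard state label.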
Inherits from TemperatureOperator to get:
- _temperature property for temperature-controlled smoothing
- soft_max() for logsumexp-based smooth maximum
- soft_argmax() for soft Viterbi decoding
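The two inherited helpers follow well-known relaxations, which can be sketched independently of the library. The functions below are stand-ins with hypothetical names (`smooth_max`, `soft_argmax_index`). The actual signatures of `soft_max()` and `soft_argmax()` are not shown on this page and may differ.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def smooth_max(x, temperature=1.0):
    """Smooth upper bound on max(x); approaches max(x) as temperature -> 0."""
    return temperature * logsumexp(x / temperature)


def soft_argmax_index(x, temperature=1.0):
    """Expected index under softmax(x / temperature)."""
    weights = jax.nn.softmax(x / temperature)
    return jnp.sum(weights * jnp.arange(x.shape[0]))


scores = jnp.array([1.0, 4.0, 2.0])
warm = smooth_max(scores, temperature=1.0)       # noticeably above the true max
cold = smooth_max(scores, temperature=0.01)      # close to the true max of 4.0
index = soft_argmax_index(scores, temperature=0.01)
```

The temperature trades off fidelity against gradient smoothness: low values recover the hard maximum and argmax, while higher values spread gradient across all entries.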
Example
from flax import nnx
from diffbio.operators.epigenomics import ChromatinStateAnnotator, ChromatinStateConfig

config = ChromatinStateConfig(
    num_states=15,
    num_marks=6,
    use_cell_type_conditioning=True,
    num_cell_types=5,
)
rngs = nnx.Rngs(42)
annotator = ChromatinStateAnnotator(config, rngs=rngs)
# marks and cell_type are user-supplied arrays with the shapes noted below
data = {
    "histone_marks": marks,  # (length, num_marks)
    "cell_type": cell_type,  # (num_cell_types,) soft vector
}
result, state, metadata = annotator.apply(data, {}, None)
state_probs = result["state_probabilities"]
gamma = result["gamma"]  # soft state assignment
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `ChromatinStateConfig` | Configuration for the annotator. | required |
| `rngs` | `Rngs \| None` | Random number generators for initialization. | `None` |
apply
apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]
Apply chromatin state annotation to histone mark data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary containing `'histone_marks'`: signals of shape `(length, num_marks)` or `(batch, length, num_marks)`; and `'cell_type'`: optional cell type vector of shape `(num_cell_types,)`, used when conditioning is enabled. | required |
| `state` | `dict[str, Any]` | Operator state dictionary. | required |
| `metadata` | `dict \| None` | Optional metadata dictionary. | required |
| `random_params` | `dict \| None` | Optional random parameters (unused). | `None` |
| `stats` | `dict \| None` | Optional statistics dictionary (unused). | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[dict, dict, dict \| None]` | Tuple of `(output_data, state, metadata)`. The examples on this page read `state_probabilities` and `gamma` from `output_data`. |
ChromatinStateConfig
diffbio.operators.epigenomics.chromatin_state.ChromatinStateConfig
dataclass
ChromatinStateConfig(
    num_states: int = 15,
    num_marks: int = 6,
    temperature: float = 1.0,
    use_cell_type_conditioning: bool = False,
    num_cell_types: int = 1,
)
Bases: OperatorConfig
Configuration for differentiable chromatin state annotator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `num_states` | `int` | Number of chromatin states to learn. |
| `num_marks` | `int` | Number of histone marks in input. |
| `temperature` | `float` | Temperature for soft operations. |
| `use_cell_type_conditioning` | `bool` | Whether to condition emission probabilities on cell type. When enabled, each state has per-cell-type Gaussian emission parameters. |
| `num_cell_types` | `int` | Number of cell types for conditioning. |
| `stream_name` | `str` | Name of the data stream to process. |
Usage Examples
Peak Calling
from flax import nnx
from diffbio.operators.epigenomics import DifferentiablePeakCaller, PeakCallerConfig

# kernel_sizes is typed tuple[int, ...], so pass a tuple rather than a list
config = PeakCallerConfig(num_filters=32, kernel_sizes=(3, 5, 7))
peak_caller = DifferentiablePeakCaller(config, rngs=nnx.Rngs(42))
# apply() reads the signal from the 'coverage' key
data = {"coverage": signal_track}  # (length,)
result, _, _ = peak_caller.apply(data, {}, None)
peaks = result["peak_scores"]
Chromatin State Annotation
from flax import nnx
from diffbio.operators.epigenomics import ChromatinStateAnnotator, ChromatinStateConfig

config = ChromatinStateConfig(num_states=15, num_marks=6)
annotator = ChromatinStateAnnotator(config, rngs=nnx.Rngs(42))
data = {"histone_marks": marks}  # (length, num_marks)
result, _, _ = annotator.apply(data, {}, None)
states = result["state_probabilities"]