Epigenomics Operators API¤

Differentiable operators for epigenomic analysis including peak calling and chromatin state annotation.

DifferentiablePeakCaller¤

diffbio.operators.epigenomics.peak_calling.DifferentiablePeakCaller ¤

DifferentiablePeakCaller(
    config: PeakCallerConfig, *, rngs: Rngs | None = None
)

Bases: TemperatureOperator

Differentiable peak caller for ChIP-seq and ATAC-seq data.

This operator uses a CNN-based approach to detect peaks in coverage signals, with soft thresholding for end-to-end differentiability.
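One common way to make a threshold differentiable is to replace the hard comparison with a temperature-scaled sigmoid. The sketch below illustrates that idea on a plain Python list; the function name `soft_threshold` and the exact formula are assumptions for illustration and are not taken from the diffbio source.

```python
import math


def soft_threshold(scores, threshold=0.5, temperature=1.0):
    """Map raw scores to soft peak probabilities with a temperature-scaled sigmoid.

    Hypothetical helper: the library's actual formula may differ.
    """
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature)) for s in scores]


scores = [-2.0, 0.5, 3.0]
smooth = soft_threshold(scores, threshold=0.5, temperature=1.0)
sharp = soft_threshold(scores, threshold=0.5, temperature=0.1)
```

A score exactly at the threshold maps to 0.5, and lowering the temperature pushes the output closer to a hard 0/1 decision while keeping gradients defined everywhere.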

Optionally, VAE-based denoising is applied before peak detection, using a Poisson decoder (per SCALE) to model count data. When VAE denoising is enabled, the pipeline is:

coverage -> VAE encoder -> latent -> Poisson decoder -> denoised coverage -> CNN -> peaks
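A Poisson decoder is usually trained by minimizing the Poisson negative log-likelihood of the observed counts under the decoded rates. The following is a minimal sketch of that loss in plain Python; `poisson_nll` is a hypothetical name, and whether the operator averages or sums over positions is an assumption.

```python
import math


def poisson_nll(counts, rates):
    """Mean negative log-likelihood of counts under per-position Poisson rates.

    Hypothetical helper; rates must be strictly positive.
    """
    total = 0.0
    for k, lam in zip(counts, rates):
        # -log P(k | lam) = lam - k * log(lam) + log(k!)
        total += lam - k * math.log(lam) + math.lgamma(k + 1)
    return total / len(counts)


counts = [0, 3, 12, 4]
good = poisson_nll(counts, [0.5, 3.0, 11.0, 4.0])  # rates close to the counts
bad = poisson_nll(counts, [5.0, 5.0, 5.0, 5.0])    # flat rates ignore the signal
```

Rates that track the observed counts give a lower loss than a flat rate, which is what drives the decoder toward a denoised coverage profile.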

The operator processes coverage data and outputs:

  • Peak probabilities at each position
  • Peak boundaries (soft)
  • Peak summits
  • Denoised coverage (when VAE is enabled)

Example
config = PeakCallerConfig(
    window_size=200,
    num_filters=32,
    threshold=0.5,
    use_vae_denoising=True,
)
peak_caller = DifferentiablePeakCaller(config, rngs=rngs)

data = {"coverage": coverage_signal}
result, state, metadata = peak_caller.apply(data, {}, None)
peak_probs = result["peak_probabilities"]

Parameters:

Name Type Description Default
config PeakCallerConfig

Configuration for the peak caller.

required
rngs Rngs | None

Random number generators for initialization.

None

apply ¤

apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]

Apply peak calling to coverage data.

When VAE denoising is enabled, the coverage signal is first denoised through a VAE with Poisson decoder before peak detection.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary containing:

  • 'coverage': Coverage signal of shape (batch, length) or (length,)

required
state dict[str, Any]

Operator state dictionary.

required
metadata dict | None

Optional metadata dictionary.

required
random_params dict | None

Optional random parameters.

None
stats dict | None

Optional statistics dictionary.

None

Returns:

Type Description
tuple[dict, dict, dict | None]

Tuple of (output_data, state, metadata) where output_data contains:

  • 'coverage': Original coverage signal
  • 'peak_scores': Raw peak detection scores
  • 'peak_probabilities': Soft peak probabilities
  • 'peak_summits': Soft summit indicators
  • 'peak_starts': Soft peak start indicators
  • 'peak_ends': Soft peak end indicators
  • 'denoised_coverage': Denoised signal (only when VAE enabled)
  • 'vae_kl_loss': KL divergence loss (only when VAE enabled)
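For reference, the KL term of a standard VAE has a closed form when the approximate posterior is a diagonal Gaussian and the prior is a standard normal. The sketch below computes that quantity; the prior, the parameterization by log-variance, and the name `gaussian_kl` are assumptions, since this page does not specify how 'vae_kl_loss' is computed.

```python
import math


def gaussian_kl(mean, log_var):
    """KL( N(mean, exp(log_var)) || N(0, 1) ), summed over latent dimensions.

    Hypothetical helper assuming a diagonal Gaussian posterior and a
    standard normal prior.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )


kl_at_prior = gaussian_kl([0.0, 0.0], [0.0, 0.0])  # posterior equals the prior
kl_shifted = gaussian_kl([1.0, 0.0], [0.0, 0.0])   # one mean moved away from 0
```

In a training loop, a term like this is typically added to the reconstruction loss with a weighting factor.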

PeakCallerConfig¤

diffbio.operators.epigenomics.peak_calling.PeakCallerConfig dataclass ¤

PeakCallerConfig(
    learnable_temperature: bool = True,
    min_peak_width: int = 50,
    use_vae_denoising: bool = False,
    vae_latent_dim: int = 16,
    vae_hidden_dim: int = 64,
    window_size: int = 200,
    num_filters: int = 32,
    kernel_sizes: tuple[int, ...] = (5, 11, 21),
    threshold: float = 0.5,
    temperature: float = 1.0,
)

Bases: _PeakDetectionConfig, _PeakDenoisingConfig, OperatorConfig

Configuration for differentiable peak caller.

learnable_temperature class-attribute instance-attribute ¤

learnable_temperature: bool = True

min_peak_width class-attribute instance-attribute ¤

min_peak_width: int = 50

use_vae_denoising class-attribute instance-attribute ¤

use_vae_denoising: bool = False

vae_latent_dim class-attribute instance-attribute ¤

vae_latent_dim: int = 16

vae_hidden_dim class-attribute instance-attribute ¤

vae_hidden_dim: int = 64

window_size class-attribute instance-attribute ¤

window_size: int = 200

num_filters class-attribute instance-attribute ¤

num_filters: int = 32

kernel_sizes class-attribute instance-attribute ¤

kernel_sizes: tuple[int, ...] = (5, 11, 21)

threshold class-attribute instance-attribute ¤

threshold: float = 0.5

temperature class-attribute instance-attribute ¤

temperature: float = 1.0

ChromatinStateAnnotator¤

diffbio.operators.epigenomics.chromatin_state.ChromatinStateAnnotator ¤

ChromatinStateAnnotator(
    config: ChromatinStateConfig,
    *,
    rngs: Rngs | None = None,
)

Bases: TemperatureOperator

Differentiable chromatin state annotator using HMM.

This operator implements a differentiable Hidden Markov Model for annotating chromatin states from histone modification data. It uses the forward algorithm in log-space for numerical stability and provides soft Viterbi decoding for end-to-end differentiability.
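To make the log-space forward algorithm concrete, here is a minimal pure-Python sketch. It takes log initial probabilities, log transition probabilities, and per-position log emission probabilities, and returns the sequence log-likelihood. The function names and argument layout are illustrative assumptions, not the operator's internals.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log_likelihood(log_init, log_trans, log_emit):
    """Forward algorithm in log-space.

    log_init[k]: log P(state_0 = k)
    log_trans[j][k]: log P(state_t = k | state_{t-1} = j)
    log_emit[t][k]: log P(observation_t | state_t = k)
    """
    num_states = len(log_init)
    alpha = [log_init[k] + log_emit[0][k] for k in range(num_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            logsumexp([alpha[j] + log_trans[j][k] for j in range(num_states)])
            + log_emit[t][k]
            for k in range(num_states)
        ]
    return logsumexp(alpha)


half = math.log(0.5)
uniform_ll = forward_log_likelihood(
    [half, half], [[half, half], [half, half]], [[half, half]] * 3
)
tiny_ll = forward_log_likelihood(
    [half, half], [[half, half], [half, half]], [[-1000.0, -1000.0]] * 3
)
```

Working with sums of logs instead of products of probabilities keeps long sequences from underflowing: the second call involves a likelihood of exp(-3000), which is not representable as a float but is handled without trouble in log-space.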

The HMM has:

  • Learnable transition probabilities between states
  • Learnable emission probabilities for each histone mark per state
  • Learnable initial state distribution

When cell-type conditioning is enabled:

  • Each state has per-cell-type Gaussian emission parameters (mean, variance)
  • The cell type vector is used to blend emission parameters
  • GMM-style soft assignment (gamma) is computed via softmax over log-likelihoods, per SCALE
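The conditioning steps above can be sketched for a single scalar observation as follows. Blending by a weighted average over cell types and all helper names (`blend`, `gaussian_log_pdf`, `soft_assignment`) are assumptions made for illustration; the operator itself works on arrays with one value per histone mark.

```python
import math


def blend(per_cell_type, cell_type):
    """Weighted average of per-cell-type parameters using a soft cell type vector."""
    return sum(w * p for w, p in zip(cell_type, per_cell_type))


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def soft_assignment(x, means, variances, cell_type):
    """Softmax over per-state Gaussian log-likelihoods (gamma).

    means[k][c] and variances[k][c] hold the parameters for state k and
    cell type c. Hypothetical layout, chosen for readability.
    """
    log_liks = [
        gaussian_log_pdf(x, blend(means[k], cell_type), blend(variances[k], cell_type))
        for k in range(len(means))
    ]
    peak = max(log_liks)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


means = [[0.0, 1.0], [4.0, 5.0]]      # two states, two cell types
variances = [[1.0, 1.0], [1.0, 1.0]]
gamma = soft_assignment(0.5, means, variances, cell_type=[0.5, 0.5])
```

With an even mix of the two cell types, the blended state means are 0.5 and 4.5, so an observation of 0.5 is assigned almost entirely to the first state.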

Inherits from TemperatureOperator to get:

  • _temperature property for temperature-controlled smoothing
  • soft_max() for logsumexp-based smooth maximum
  • soft_argmax() for soft Viterbi decoding
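The two smoothing primitives named in the list can be illustrated with standalone functions. These are stand-ins written for this page, not the library's methods: the signatures, the explicit `temperature` argument, and the choice to return softmax weights from the soft argmax are all assumptions.

```python
import math


def smooth_max(values, temperature=1.0):
    """Smooth maximum: temperature * logsumexp(values / temperature)."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    return temperature * (peak + math.log(sum(math.exp(s - peak) for s in scaled)))


def soft_argmax_weights(values, temperature=1.0):
    """Softmax weights over indices; approaches a one-hot argmax as temperature -> 0."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


values = [1.0, 2.0, 3.0]
warm = smooth_max(values, temperature=1.0)            # upper bound on max(values)
cold = smooth_max(values, temperature=0.01)           # very close to max(values)
weights = soft_argmax_weights(values, temperature=0.01)
```

The smooth maximum is never below the true maximum and converges to it as the temperature decreases, which is the trade-off between gradient smoothness and fidelity that the temperature controls.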

Example
config = ChromatinStateConfig(
    num_states=15,
    num_marks=6,
    use_cell_type_conditioning=True,
    num_cell_types=5,
)
annotator = ChromatinStateAnnotator(config, rngs=rngs)

data = {
    "histone_marks": marks,  # (length, num_marks)
    "cell_type": cell_type,  # (num_cell_types,) soft vector
}
result, state, metadata = annotator.apply(data, {}, None)
state_probs = result["state_probabilities"]
gamma = result["gamma"]  # soft state assignment

Parameters:

Name Type Description Default
config ChromatinStateConfig

Configuration for the annotator.

required
rngs Rngs | None

Random number generators for initialization.

None

apply ¤

apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]

Apply chromatin state annotation to histone mark data.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary containing:

  • 'histone_marks': Signals of shape (length, num_marks) or (batch, length, num_marks)
  • 'cell_type': Optional cell type vector of shape (num_cell_types,) when conditioning is enabled

required
state dict[str, Any]

Operator state dictionary.

required
metadata dict | None

Optional metadata dictionary.

required
random_params dict | None

Optional random parameters (unused).

None
stats dict | None

Optional statistics dictionary (unused).

None

Returns:

Type Description
tuple[dict, dict, dict | None]

Tuple of (output_data, state, metadata) where output_data contains:

  • 'histone_marks': Original histone mark signals
  • 'state_probabilities': State probabilities at each position
  • 'state_posteriors': Posterior state probabilities
  • 'viterbi_path': Soft Viterbi decoding result
  • 'log_likelihood': Log likelihood of the sequence
  • 'gamma': Soft state assignment (only when conditioning enabled)
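For downstream reporting it is often useful to turn the soft outputs into discrete segments. The sketch below assumes that 'state_posteriors' has shape (length, num_states) and has been converted to a nested Python list (for example with `.tolist()`); both helper names are hypothetical.

```python
def hard_states(state_posteriors):
    """Pick the most probable state index at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in state_posteriors]


def run_lengths(labels):
    """Collapse a label sequence into (state, start, end) segments, end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
labels = hard_states(posteriors)
segments = run_lengths(labels)
```

This discretization is not differentiable, so it belongs after training or in evaluation code rather than inside a loss.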

ChromatinStateConfig¤

diffbio.operators.epigenomics.chromatin_state.ChromatinStateConfig dataclass ¤

ChromatinStateConfig(
    num_states: int = 15,
    num_marks: int = 6,
    temperature: float = 1.0,
    use_cell_type_conditioning: bool = False,
    num_cell_types: int = 1,
)

Bases: OperatorConfig

Configuration for differentiable chromatin state annotator.

Attributes:

Name Type Description
num_states int

Number of chromatin states to learn.

num_marks int

Number of histone marks in input.

temperature float

Temperature for soft operations.

use_cell_type_conditioning bool

Whether to condition emission probabilities on cell type. When enabled, each state has per-cell-type Gaussian emission parameters.

num_cell_types int

Number of cell types for conditioning.

stream_name str

Name of the data stream to process.

Usage Examples¤

Peak Calling¤

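import jax.numpy as jnp
from flax import nnx
from diffbio.operators.epigenomics import DifferentiablePeakCaller, PeakCallerConfig

# kernel_sizes is typed tuple[int, ...], so pass a tuple rather than a list
config = PeakCallerConfig(num_filters=32, kernel_sizes=(3, 5, 7))
peak_caller = DifferentiablePeakCaller(config, rngs=nnx.Rngs(42))

signal_track = jnp.ones(1000)  # placeholder coverage track, shape (length,)
data = {"coverage": signal_track}  # apply() reads the 'coverage' key
result, _, _ = peak_caller.apply(data, {}, None)
peaks = result["peak_scores"]  # raw peak detection scores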
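Once training is done, the soft 'peak_probabilities' output can be converted into discrete intervals. The sketch below thresholds a probability track and drops runs shorter than a minimum width. Treating `threshold` and `min_peak_width` this way is an assumption for illustration: this page does not describe how the operator uses them internally, and `call_peaks` is a hypothetical helper.

```python
def call_peaks(probabilities, threshold=0.5, min_peak_width=50):
    """Return (start, end) intervals, end exclusive, where probability >= threshold.

    Runs shorter than min_peak_width positions are discarded.
    """
    peaks = []
    start = None
    for i, p in enumerate(probabilities):
        if p >= threshold and start is None:
            start = i  # a new run begins
        elif p < threshold and start is not None:
            if i - start >= min_peak_width:
                peaks.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the end of the track
    if start is not None and len(probabilities) - start >= min_peak_width:
        peaks.append((start, len(probabilities)))
    return peaks


probs = [0.1] * 10 + [0.9] * 60 + [0.1] * 10 + [0.8] * 20 + [0.1] * 5
intervals = call_peaks(probs)
```

Here the 60-position run is kept and the 20-position run is filtered out as too narrow.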

Chromatin State Annotation¤

import jax.numpy as jnp
from flax import nnx
from diffbio.operators.epigenomics import ChromatinStateAnnotator, ChromatinStateConfig

config = ChromatinStateConfig(num_states=15, num_marks=6)
annotator = ChromatinStateAnnotator(config, rngs=nnx.Rngs(42))

marks = jnp.ones((1000, 6))  # placeholder histone mark signals, shape (length, num_marks)
data = {"histone_marks": marks}
result, _, _ = annotator.apply(data, {}, None)
states = result["state_probabilities"]