RNA-seq Operators API

Differentiable operators for RNA-seq analysis, including splicing PSI calculation and motif discovery.

SplicingPSI

diffbio.operators.rnaseq.splicing_psi.SplicingPSI

SplicingPSI(
    config: SplicingPSIConfig, *, rngs: Rngs | None = None
)

Bases: TemperatureOperator

Differentiable PSI calculation for alternative splicing analysis.

PSI (Percent Spliced In) quantifies alternative splicing by computing the fraction of transcripts that include a specific exon or splice site.

The standard PSI formula is

PSI = inclusion_reads / (inclusion_reads + exclusion_reads)

This operator adds:

  • Learnable pseudocount for regularization
  • Confidence estimation based on read coverage
  • Full differentiability for end-to-end training
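The effect of a pseudocount on the PSI formula above can be sketched in plain Python. This is a minimal illustration, not the operator's implementation: it assumes the pseudocount is added symmetrically to both counts, and `psi_with_pseudocount` is a hypothetical helper name.

```python
def psi_with_pseudocount(inclusion, exclusion, pseudocount=1.0):
    """Return PSI for one splicing event with smoothed counts.

    Assumption: the pseudocount is added to each count symmetrically,
    which pulls low-coverage estimates toward 0.5.
    """
    inc = inclusion + pseudocount
    exc = exclusion + pseudocount
    return inc / (inc + exc)


# (inclusion_reads, exclusion_reads) for three hypothetical events
events = [(30, 10), (3, 1), (0, 0)]
psi_values = [psi_with_pseudocount(inc, exc) for inc, exc in events]
```

The first two events have the same raw PSI (0.75), but under this assumption the low-coverage event is shrunk further toward 0.5, and the event with no reads yields 0.5 instead of a division by zero.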

Example
config = SplicingPSIConfig(
    pseudocount=1.0,
    min_total_reads=10,
)
psi_op = SplicingPSI(config, rngs=rngs)

data = {
    "inclusion_counts": inclusion_reads,  # Junction reads supporting inclusion
    "exclusion_counts": exclusion_reads,  # Junction reads supporting exclusion
}
result, state, metadata = psi_op.apply(data, {}, None)
psi_values = result["psi"]
confidence = result["psi_confidence"]

Parameters:

Name Type Description Default
config SplicingPSIConfig

Configuration for the operator.

required
rngs Rngs | None

Random number generators for initialization.

None

apply

apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]

Apply PSI calculation to junction read counts.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary containing:

  • 'inclusion_counts': Reads supporting exon inclusion
  • 'exclusion_counts': Reads supporting exon exclusion

required
state dict[str, Any]

Operator state dictionary.

required
metadata dict | None

Optional metadata dictionary.

required
random_params dict | None

Optional random parameters (unused).

None
stats dict | None

Optional statistics dictionary (unused).

None

Returns:

Type Description
tuple[dict, dict, dict | None]

Tuple of (output_data, state, metadata) where output_data contains:

  • 'inclusion_counts': Original inclusion counts
  • 'exclusion_counts': Original exclusion counts
  • 'psi': Computed PSI values
  • 'psi_confidence': Confidence in PSI estimates
  • 'psi_variance': Variance of PSI estimates
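This page does not state how 'psi_confidence' and 'psi_variance' are derived. The sketch below shows one plausible reading, offered only as an assumption: confidence as a sigmoid of total coverage around `min_total_reads`, scaled by `temperature`, and variance as the binomial variance of a proportion. Both function names are hypothetical, and the operator may compute these quantities differently.

```python
import math


def coverage_confidence(total_reads, min_total_reads=10, temperature=1.0):
    """Smooth stand-in for the hard test `total_reads >= min_total_reads`.

    Assumption: a sigmoid centred on min_total_reads. It is written with
    tanh so that very large or very small arguments do not overflow.
    """
    x = (total_reads - min_total_reads) / temperature
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def binomial_psi_variance(psi, total_reads, pseudocount=1.0):
    """Binomial variance of a proportion, psi * (1 - psi) / n.

    Assumption: n counts the reads plus the pseudocount added to each side.
    """
    n = total_reads + 2.0 * pseudocount
    return psi * (1.0 - psi) / n


low = coverage_confidence(2)    # well below the threshold
edge = coverage_confidence(10)  # exactly at the threshold
high = coverage_confidence(40)  # well above the threshold
variance = binomial_psi_variance(0.5, total_reads=8)
```

A smooth threshold of this kind keeps gradients flowing for low-coverage events, which a hard cutoff would not.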

SplicingPSIConfig

diffbio.operators.rnaseq.splicing_psi.SplicingPSIConfig dataclass

SplicingPSIConfig(
    pseudocount: float = 1.0,
    temperature: float = 1.0,
    learnable_temperature: bool = True,
    min_total_reads: int = 10,
)

Bases: OperatorConfig

Configuration for differentiable PSI calculation.

Attributes:

Name Type Description
pseudocount float

Pseudocount added for numerical stability and regularization.

temperature float

Temperature for confidence calculation.

min_total_reads int

Minimum total reads for reliable PSI estimation.

stream_name str

Name of the data stream to process.

DifferentiableMotifDiscovery

diffbio.operators.rnaseq.motif_discovery.DifferentiableMotifDiscovery

DifferentiableMotifDiscovery(
    config: MotifDiscoveryConfig,
    *,
    rngs: Rngs | None = None,
)

Bases: TemperatureOperator

Differentiable motif discovery with PWM learning.

This operator implements a simplified differentiable version of MEME-style motif discovery. It learns Position Weight Matrices (PWMs) that represent sequence motifs and scans sequences to find motif occurrences.

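The motif score at position i is computed as

score(i) = sum_j log(PWM[j, seq[i+j]])

where j runs over the motif positions and seq[i+j] is the symbol index at sequence position i+j. For one-hot encoded sequences, this is equivalent to

score(i) = sum_j sum_k seq[i+j, k] * log(PWM[j, k])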
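The one-hot form of the score can be checked with a small plain-Python scan. This is a reference sketch for a single motif, not the operator's code: it assumes a valid scan without padding (so there are length - width + 1 positions), and `one_hot` and `scan_scores` are hypothetical helper names.

```python
import math

ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a string as a (length, alphabet_size) nested list."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in sequence]


def scan_scores(seq_one_hot, pwm):
    """score(i) = sum_j sum_k seq[i+j, k] * log(PWM[j, k])."""
    width = len(pwm)
    num_positions = len(seq_one_hot) - width + 1  # assumed: no padding
    scores = []
    for i in range(num_positions):
        score = 0.0
        for j in range(width):
            for k, prob in enumerate(pwm[j]):
                score += seq_one_hot[i + j][k] * math.log(prob)
        scores.append(score)
    return scores


# A width-3 PWM whose rows are probability distributions over A, C, G, T
pwm = [
    [0.7, 0.1, 0.1, 0.1],  # position 0 prefers A
    [0.1, 0.1, 0.7, 0.1],  # position 1 prefers G
    [0.1, 0.1, 0.1, 0.7],  # position 2 prefers T
]
scores = scan_scores(one_hot("CAGTA"), pwm)
best_position = max(range(len(scores)), key=scores.__getitem__)
```

The window starting at index 1 ("AGT") matches the preferred base at every motif position, so it receives the highest score.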

Example
config = MotifDiscoveryConfig(
    motif_width=12,
    num_motifs=3,
)
motif_op = DifferentiableMotifDiscovery(config, rngs=rngs)

data = {"sequence": one_hot_sequence}  # (length, alphabet_size)
result, state, metadata = motif_op.apply(data, {}, None)
motif_scores = result["motif_scores"]  # (num_positions, num_motifs)
pwm = result["pwm"]  # (num_motifs, motif_width, alphabet_size)

Parameters:

Name Type Description Default
config MotifDiscoveryConfig

Configuration for the operator.

required
rngs Rngs | None

Random number generators for initialization.

None

apply

apply(
    data: dict[str, Any],
    state: dict[str, Any],
    metadata: dict | None,
    random_params: dict | None = None,
    stats: dict | None = None,
) -> tuple[dict, dict, dict | None]

Apply motif discovery to sequence data.

Parameters:

Name Type Description Default
data dict[str, Any]

Dictionary containing:

  • 'sequence': One-hot encoded sequence(s) of shape (length, alphabet_size) or (batch, length, alphabet_size)

required
state dict[str, Any]

Operator state dictionary.

required
metadata dict | None

Optional metadata dictionary.

required
random_params dict | None

Optional random parameters (unused).

None
stats dict | None

Optional statistics dictionary (unused).

None

Returns:

Type Description
tuple[dict, dict, dict | None]

Tuple of (output_data, state, metadata) where output_data contains:

  • 'sequence': Original sequence data
  • 'motif_scores': Log-odds scores at each position
  • 'motif_positions': Soft motif occurrence indicators
  • 'pwm': Current Position Weight Matrix
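How 'motif_positions' is derived from 'motif_scores' is not specified here. One common way to obtain soft occurrence indicators is a temperature-scaled softmax over positions, sketched below as an assumption. `soft_positions` is a hypothetical name, and the operator may instead use another relaxation, such as a per-position sigmoid.

```python
import math


def soft_positions(scores, temperature=1.0):
    """Temperature-scaled softmax over scan positions.

    Low temperatures approach a one-hot argmax; high temperatures
    approach a uniform distribution over positions.
    """
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


example_scores = [-6.9, -1.1, -6.9]  # illustrative scores, best at index 1
sharp = soft_positions(example_scores, temperature=0.1)
smooth = soft_positions(example_scores, temperature=10.0)
```

With the low temperature, nearly all weight falls on index 1. With the high temperature, the weights are spread almost evenly, which is the trade-off a learnable temperature can tune during training.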

MotifDiscoveryConfig

diffbio.operators.rnaseq.motif_discovery.MotifDiscoveryConfig dataclass

MotifDiscoveryConfig(
    motif_width: int = 12,
    num_motifs: int = 1,
    alphabet_size: int = 4,
    temperature: float = 1.0,
    learnable_temperature: bool = True,
    background_prior: float = 0.25,
)

Bases: OperatorConfig

Configuration for differentiable motif discovery.

Attributes:

Name Type Description
motif_width int

Width of the motif (number of positions).

num_motifs int

Number of motifs to discover.

alphabet_size int

Size of the sequence alphabet (4 for DNA).

temperature float

Temperature for soft operations.

background_prior float

Prior probability for background model.

stream_name str

Name of the data stream to process.
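The 'motif_scores' output is described as log-odds, and `background_prior` defaults to 0.25, the uniform probability over a four-letter alphabet. The sketch below shows the standard log-odds transform of one PWM column against such a background. Whether the operator applies exactly this transform is an assumption, and `log_odds_column` is a hypothetical helper name.

```python
import math


def log_odds_column(probabilities, background_prior=0.25):
    """Return log(p / background) for each symbol in one PWM column."""
    return [math.log(p / background_prior) for p in probabilities]


column = [0.7, 0.1, 0.1, 0.1]  # one PWM position that prefers A
log_odds = log_odds_column(column)
uniform = log_odds_column([0.25, 0.25, 0.25, 0.25])
```

Symbols that are more likely than the background get positive values and less likely symbols get negative values. A column equal to the background contributes zero at every symbol.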

Usage Examples

Splicing PSI Calculation

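import jax.numpy as jnp
from flax import nnx
from diffbio.operators.rnaseq import SplicingPSI, SplicingPSIConfig

config = SplicingPSIConfig(temperature=1.0, pseudocount=1.0)
psi_calc = SplicingPSI(config, rngs=nnx.Rngs(42))

# Illustrative junction read counts, one value per splicing event (assumed shape)
inclusion = jnp.array([30.0, 3.0, 0.0])
exclusion = jnp.array([10.0, 1.0, 0.0])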

data = {
    "inclusion_counts": inclusion,
    "exclusion_counts": exclusion,
}
result, _, _ = psi_calc.apply(data, {}, None)
psi_values = result["psi"]

Motif Discovery

from flax import nnx
from diffbio.operators.rnaseq import DifferentiableMotifDiscovery, MotifDiscoveryConfig

config = MotifDiscoveryConfig(num_motifs=10, motif_width=8)
motif_finder = DifferentiableMotifDiscovery(config, rngs=nnx.Rngs(42))

# `sequences` is a one-hot array supplied by the caller
data = {"sequence": sequences}  # (n_seqs, seq_len, alphabet_size)
result, _, _ = motif_finder.apply(data, {}, None)
pwms = result["pwm"]  # (num_motifs, motif_width, alphabet_size)