RNA-seq Operators API
Differentiable operators for RNA-seq analysis including splicing PSI and motif discovery.
SplicingPSI
diffbio.operators.rnaseq.splicing_psi.SplicingPSI
SplicingPSI(
config: SplicingPSIConfig, *, rngs: Rngs | None = None
)
Bases: TemperatureOperator
Differentiable PSI calculation for alternative splicing analysis.
PSI (Percent Spliced In) quantifies alternative splicing by computing the fraction of transcripts that include a specific exon or splice site.
The standard PSI formula is
PSI = inclusion_reads / (inclusion_reads + exclusion_reads)
This operator adds:
- Learnable pseudocount for regularization
- Confidence estimation based on read coverage
- Full differentiability for end-to-end training
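The effect of a pseudocount on the PSI formula can be sketched in plain Python. This is a minimal illustration, not the operator's implementation. It assumes the pseudocount is added to both counts, and `coverage_confidence` is a hypothetical sigmoid of total coverage relative to `min_total_reads`. The operator's actual confidence formula is not documented here.

```python
import math


def psi_with_pseudocount(inclusion, exclusion, pseudocount=1.0):
    """PSI with the pseudocount added to both counts (assumed placement)."""
    inc = inclusion + pseudocount
    exc = exclusion + pseudocount
    return inc / (inc + exc)


def coverage_confidence(inclusion, exclusion, min_total_reads=10, temperature=1.0):
    """Hypothetical confidence: sigmoid of coverage relative to the threshold."""
    total = inclusion + exclusion
    return 1.0 / (1.0 + math.exp(-(total - min_total_reads) / temperature))


# Well-covered event: the pseudocount barely moves the estimate.
print(psi_with_pseudocount(90, 10))  # 91 / 102
# Uncovered event: the estimate falls back to 0.5 instead of dividing by zero.
print(psi_with_pseudocount(0, 0))    # 0.5
# Total coverage exactly at the threshold gives a confidence of 0.5.
print(coverage_confidence(5, 5))     # 0.5
```

Both functions are smooth in their inputs, which is the property that lets a quantity like PSI sit inside a gradient-based training loop.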
Example
config = SplicingPSIConfig(
pseudocount=1.0,
min_total_reads=10,
)
psi_op = SplicingPSI(config, rngs=rngs)
data = {
"inclusion_counts": inclusion_reads, # Junction reads supporting inclusion
"exclusion_counts": exclusion_reads, # Junction reads supporting exclusion
}
result, state, metadata = psi_op.apply(data, {}, None)
psi_values = result["psi"]
confidence = result["psi_confidence"]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | SplicingPSIConfig | Configuration for the operator. | required |
| rngs | Rngs \| None | Random number generators for initialization. | None |
apply
apply(
data: dict[str, Any],
state: dict[str, Any],
metadata: dict | None,
random_params: dict | None = None,
stats: dict | None = None,
) -> tuple[dict, dict, dict | None]
Apply PSI calculation to junction read counts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary containing 'inclusion_counts' (reads supporting exon inclusion) and 'exclusion_counts' (reads supporting exon exclusion). | required |
| state | dict[str, Any] | Operator state dictionary. | required |
| metadata | dict \| None | Optional metadata dictionary. | required |
| random_params | dict \| None | Optional random parameters (unused). | None |
| stats | dict \| None | Optional statistics dictionary (unused). | None |
Returns:
| Type | Description |
|---|---|
| tuple[dict, dict, dict \| None] | Tuple of (output_data, state, metadata). In the class example, output_data is read through the keys 'psi' and 'psi_confidence'. |
SplicingPSIConfig
diffbio.operators.rnaseq.splicing_psi.SplicingPSIConfig
dataclass
SplicingPSIConfig(
pseudocount: float = 1.0,
temperature: float = 1.0,
learnable_temperature: bool = True,
min_total_reads: int = 10,
)
Bases: OperatorConfig
Configuration for differentiable PSI calculation.
Attributes:
| Name | Type | Description |
|---|---|---|
| pseudocount | float | Pseudocount added for numerical stability and regularization. |
| temperature | float | Temperature for confidence calculation. |
| min_total_reads | int | Minimum total reads for reliable PSI estimation. |
| stream_name | str | Name of the data stream to process. |
DifferentiableMotifDiscovery
diffbio.operators.rnaseq.motif_discovery.DifferentiableMotifDiscovery
DifferentiableMotifDiscovery(
config: MotifDiscoveryConfig,
*,
rngs: Rngs | None = None,
)
Bases: TemperatureOperator
Differentiable motif discovery with PWM learning.
This operator implements a simplified differentiable version of MEME-style motif discovery. It learns Position Weight Matrices (PWMs) that represent sequence motifs and scans sequences to find motif occurrences.
The motif score at position i is the sum of the log PWM entries for the symbols observed in the window starting at i:
score(i) = sum_j log(PWM[j, seq[i+j]])
For one-hot encoded sequences, this is equivalent to:
score(i) = sum_j sum_k seq[i+j, k] * log(PWM[j, k])
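The equivalence between the indexed and one-hot forms of the score can be checked with a small sketch. It uses plain Python lists and a single hand-written PWM whose rows are probabilities. The function names and the toy PWM are illustrative assumptions, not part of the library.

```python
import math


def scan_index_form(seq_idx, pwm):
    """score(i) = sum_j log(PWM[j, seq[i+j]]) for integer-encoded sequences."""
    width = len(pwm)
    return [
        sum(math.log(pwm[j][seq_idx[i + j]]) for j in range(width))
        for i in range(len(seq_idx) - width + 1)
    ]


def scan_one_hot_form(one_hot, pwm):
    """score(i) = sum_j sum_k seq[i+j, k] * log(PWM[j, k]) for one-hot input."""
    width, alphabet = len(pwm), len(pwm[0])
    return [
        sum(
            one_hot[i + j][k] * math.log(pwm[j][k])
            for j in range(width)
            for k in range(alphabet)
        )
        for i in range(len(one_hot) - width + 1)
    ]


# Toy PWM of width 2 over a 4-letter alphabet; it prefers symbol 0 then symbol 1.
pwm = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
seq_idx = [0, 1, 2, 0, 1]
one_hot = [[1.0 if k == s else 0.0 for k in range(4)] for s in seq_idx]

index_scores = scan_index_form(seq_idx, pwm)
one_hot_scores = scan_one_hot_form(one_hot, pwm)
print(index_scores)
print(one_hot_scores)
```

The one-hot form is written entirely as multiplications and sums, so it stays differentiable with respect to the PWM entries, and with respect to the input when a relaxed (soft) encoding is used.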
Example
config = MotifDiscoveryConfig(
motif_width=12,
num_motifs=3,
)
motif_op = DifferentiableMotifDiscovery(config, rngs=rngs)
data = {"sequence": one_hot_sequence} # (length, alphabet_size)
result, state, metadata = motif_op.apply(data, {}, None)
motif_scores = result["motif_scores"] # (num_positions, num_motifs)
pwm = result["pwm"] # (num_motifs, motif_width, alphabet_size)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | MotifDiscoveryConfig | Configuration for the operator. | required |
| rngs | Rngs \| None | Random number generators for initialization. | None |
apply
apply(
data: dict[str, Any],
state: dict[str, Any],
metadata: dict | None,
random_params: dict | None = None,
stats: dict | None = None,
) -> tuple[dict, dict, dict | None]
Apply motif discovery to sequence data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary containing 'sequence': one-hot encoded sequence(s) of shape (length, alphabet_size) or (batch, length, alphabet_size). | required |
| state | dict[str, Any] | Operator state dictionary. | required |
| metadata | dict \| None | Optional metadata dictionary. | required |
| random_params | dict \| None | Optional random parameters (unused). | None |
| stats | dict \| None | Optional statistics dictionary (unused). | None |
Returns:
| Type | Description |
|---|---|
| tuple[dict, dict, dict \| None] | Tuple of (output_data, state, metadata). In the class example, output_data is read through the keys 'motif_scores' and 'pwm'. |
MotifDiscoveryConfig
diffbio.operators.rnaseq.motif_discovery.MotifDiscoveryConfig
dataclass
MotifDiscoveryConfig(
motif_width: int = 12,
num_motifs: int = 1,
alphabet_size: int = 4,
temperature: float = 1.0,
learnable_temperature: bool = True,
background_prior: float = 0.25,
)
Bases: OperatorConfig
Configuration for differentiable motif discovery.
Attributes:
| Name | Type | Description |
|---|---|---|
| motif_width | int | Width of the motif (number of positions). |
| num_motifs | int | Number of motifs to discover. |
| alphabet_size | int | Size of the sequence alphabet (4 for DNA). |
| temperature | float | Temperature for soft operations. |
| background_prior | float | Prior probability for background model. |
| stream_name | str | Name of the data stream to process. |
Usage Examples
Splicing PSI Calculation
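import jax.numpy as jnp
from flax import nnx
from diffbio.operators.rnaseq import SplicingPSI, SplicingPSIConfig

config = SplicingPSIConfig(temperature=1.0, pseudocount=1.0)
psi_calc = SplicingPSI(config, rngs=nnx.Rngs(42))

# Illustrative junction read counts, one entry per splicing event
# (the 1-D float layout is an assumption)
inclusion = jnp.array([30.0, 5.0, 0.0])
exclusion = jnp.array([10.0, 45.0, 2.0])
data = {
    "inclusion_counts": inclusion,
    "exclusion_counts": exclusion,
}
result, _, _ = psi_calc.apply(data, {}, None)
psi_values = result["psi"]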
Motif Discovery
from flax import nnx
from diffbio.operators.rnaseq import DifferentiableMotifDiscovery, MotifDiscoveryConfig

config = MotifDiscoveryConfig(num_motifs=10, motif_width=8)
motif_finder = DifferentiableMotifDiscovery(config, rngs=nnx.Rngs(42))

# `sequences` is a one-hot encoded array prepared by the caller
data = {"sequence": sequences}  # (batch, length, alphabet_size)
result, _, _ = motif_finder.apply(data, {}, None)
pwms = result["pwm"]  # (num_motifs, motif_width, alphabet_size)
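The motif operator expects one-hot input of shape (length, alphabet_size). A minimal sketch of producing that layout from a DNA string is shown here. `one_hot_encode` is a hypothetical helper, and the A, C, G, T channel order is an assumption, since the library's expected ordering is not stated in this reference.

```python
ALPHABET = "ACGT"  # assumed channel order; check against your data pipeline


def one_hot_encode(sequence, alphabet=ALPHABET):
    """Return a (length, alphabet_size) nested list with one 1.0 per row.

    Raises KeyError for symbols outside the alphabet (for example 'N').
    """
    index = {base: k for k, base in enumerate(alphabet)}
    return [
        [1.0 if index[base] == k else 0.0 for k in range(len(alphabet))]
        for base in sequence.upper()
    ]


encoded = one_hot_encode("ACGTAC")
print(len(encoded), len(encoded[0]))  # 6 4
```

The nested list can be passed to `jnp.asarray` to obtain an array of shape (length, alphabet_size). Stacking several equal-length sequences gives the batched (batch, length, alphabet_size) form.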