Biological Regularization API
Regularization losses based on biological constraints and priors.
BiologicalPlausibilityLoss
diffbio.losses.biological_regularization.BiologicalPlausibilityLoss
BiologicalPlausibilityLoss(
config: BiologicalRegularizationConfig,
*,
rngs: Rngs | None = None,
)
Bases: Module
Combined biological plausibility regularization.
Combines multiple regularization terms to encourage biologically plausible sequences and alignments during differentiable optimization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `BiologicalRegularizationConfig` | Configuration with loss weights and GC content targets. | required |
| `rngs` | `Rngs \| None` | Flax NNX random number generators (optional). | None |
gc_loss
instance-attribute
gc_loss = GCContentRegularization(
target_gc=target_gc_content,
tolerance=target_gc_tolerance,
rngs=rngs,
)
complexity_loss
instance-attribute
complexity_loss = SequenceComplexityLoss(
min_entropy=1.0, rngs=rngs
)
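The weight fields in `BiologicalRegularizationConfig` suggest that the combined loss is a weighted sum of its individual terms. The sketch below assumes exactly that rule. `RegWeights` and `combine_terms` are hypothetical names, not part of diffbio, and the library may combine or normalize the terms differently.

```python
from dataclasses import dataclass

import jax.numpy as jnp


@dataclass(frozen=True)
class RegWeights:
    """Hypothetical mirror of the weight fields in BiologicalRegularizationConfig."""

    gc_content_weight: float = 1.0
    gap_pattern_weight: float = 1.0
    complexity_weight: float = 1.0


def combine_terms(gc_term, gap_term, complexity_term, weights: RegWeights):
    """Weighted sum of per-term losses (assumed combination rule)."""
    return (
        weights.gc_content_weight * gc_term
        + weights.gap_pattern_weight * gap_term
        + weights.complexity_weight * complexity_term
    )


# Example: switch off the gap term, e.g. when designing ungapped sequences.
total = combine_terms(
    jnp.array(0.2),
    jnp.array(0.5),
    jnp.array(0.1),
    RegWeights(gap_pattern_weight=0.0),
)
```

Setting a weight to zero removes that term from the objective without changing the other terms.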
BiologicalRegularizationConfig
diffbio.losses.biological_regularization.BiologicalRegularizationConfig
dataclass
BiologicalRegularizationConfig(
gc_content_weight: float = 1.0,
gap_pattern_weight: float = 1.0,
complexity_weight: float = 1.0,
target_gc_content: float = 0.5,
target_gc_tolerance: float = 0.2,
)
Configuration for biological regularization losses.
Attributes:

| Name | Type | Description |
|---|---|---|
| `gc_content_weight` | `float` | Weight for the GC content regularization term. |
| `gap_pattern_weight` | `float` | Weight for the gap pattern regularization term. |
| `complexity_weight` | `float` | Weight for the sequence complexity loss. |
| `target_gc_content` | `float` | Target GC content as a fraction (typically 0.4 to 0.6). |
| `target_gc_tolerance` | `float` | Tolerance around the target GC content. |
GCContentRegularization
diffbio.losses.biological_regularization.GCContentRegularization
GCContentRegularization(
target_gc: float = 0.5,
tolerance: float = 0.2,
*,
rngs: Rngs | None = None,
)
Bases: Module
Regularization loss for GC content.
Penalizes sequences whose GC content deviates from `target_gc` by more than `tolerance`. Genomic GC content in most organisms falls roughly between 25% and 75%.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `target_gc` | `float` | Target GC content as a fraction (0.5 means balanced). | 0.5 |
| `tolerance` | `float` | Tolerance around the target before the penalty applies. | 0.2 |
| `rngs` | `Rngs \| None` | Flax NNX random number generators (optional). | None |
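The parameters above describe a target with a tolerance band. Below is a minimal sketch of one such penalty. It assumes soft one-hot input with the last axis ordered A, C, G, T, and a squared hinge outside the band. `gc_content_penalty` is a hypothetical helper, not diffbio's implementation.

```python
import jax
import jax.numpy as jnp

# Assumption: the last axis holds probabilities for A, C, G, T in that order.
C_IDX, G_IDX = 1, 2


def gc_content_penalty(probs, target_gc=0.5, tolerance=0.2):
    """Squared-hinge GC penalty with a dead zone of +/- tolerance (sketch).

    probs: soft one-hot array of shape (length, 4) whose rows sum to 1.
    """
    # Expected GC fraction: mean probability mass on C and G.
    gc = jnp.mean(probs[:, C_IDX] + probs[:, G_IDX])
    # Zero inside [target_gc - tolerance, target_gc + tolerance], quadratic outside.
    excess = jnp.maximum(jnp.abs(gc - target_gc) - tolerance, 0.0)
    return excess**2


# All-G sequence: GC = 1.0, so excess = 1.0 - 0.5 - 0.2 = 0.3 and the penalty is about 0.09.
all_g = jax.nn.one_hot(jnp.full((20,), G_IDX), 4)
penalty = gc_content_penalty(all_g)

# Every step is differentiable, so gradients flow back to the sequence logits.
logits = jnp.zeros((20, 4)).at[:, G_IDX].set(2.0)  # G-rich soft sequence
grad = jax.grad(lambda z: gc_content_penalty(jax.nn.softmax(z, axis=-1)))(logits)
```

Inside the dead zone, a sequence is not pushed toward exactly `target_gc`. For the G-rich logits above, the gradient is positive on the G column, so gradient descent lowers the G logits.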
GapPatternRegularization
diffbio.losses.biological_regularization.GapPatternRegularization
Bases: Module
Regularization loss for gap patterns in alignments.
Penalizes unrealistic gap patterns, such as:

- very long consecutive gaps
- many scattered small gaps
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_gap_length` | `int` | Maximum expected gap length before penalizing. | 10 |
| `rngs` | `Rngs \| None` | Flax NNX random number generators (optional). | None |
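The sketch below shows one way to make both gap failure modes differentiable. It assumes the input is a vector of per-column gap probabilities. Long gaps are scored as the expected number of all-gap windows of length `max_gap_length + 1`. Scattering is scored as the expected number of gap openings. `gap_pattern_penalty` and `opening_weight` are hypothetical, not part of diffbio.

```python
import jax.numpy as jnp


def gap_pattern_penalty(gap_probs, max_gap_length=10, opening_weight=0.1, eps=1e-8):
    """Soft penalty on over-long gap runs and on many gap openings (sketch).

    gap_probs: shape (length,), probability that each alignment column is a gap.
    """
    w = max_gap_length + 1
    # Probability that w consecutive columns are all gaps, computed as a product
    # via differences of cumulative log-probabilities (empty if length < w).
    cum = jnp.concatenate([jnp.zeros(1), jnp.cumsum(jnp.log(gap_probs + eps))])
    window_log = cum[w:] - cum[:-w]
    long_runs = jnp.sum(jnp.exp(window_log))
    # Expected number of gap openings: a gap column not preceded by a gap.
    openings = gap_probs[0] + jnp.sum(gap_probs[1:] * (1.0 - gap_probs[:-1]))
    return long_runs + opening_weight * openings


long_gap = jnp.ones(15)  # one 15-column gap: 5 windows of length 11, 1 opening
scattered = jnp.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])  # 3 openings, no long run
long_score = gap_pattern_penalty(long_gap)
scattered_score = gap_pattern_penalty(scattered)
```

A run of length L > `max_gap_length` contributes L - `max_gap_length` windows, so the long-gap term grows linearly with how far the run exceeds the limit.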
SequenceComplexityLoss
diffbio.losses.biological_regularization.SequenceComplexityLoss
Bases: Module
Regularization loss for sequence complexity.
Penalizes low-complexity sequences that might arise from adversarial optimization (e.g., all-A sequences, repetitive patterns).
Uses entropy as a measure of complexity.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `min_entropy` | `float` | Minimum expected entropy per position. | 1.0 |
| `rngs` | `Rngs \| None` | Flax NNX random number generators (optional). | None |
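The description above says only that complexity is measured by entropy. The sketch below assumes one concrete choice: the Shannon entropy, in bits, of the sequence's average base composition, penalized when it falls below `min_entropy`. `composition_entropy_penalty` is hypothetical. The library's "entropy per position" may be computed differently.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import xlogy


def composition_entropy_penalty(probs, min_entropy=1.0):
    """Penalize low Shannon entropy (bits) of the overall base composition (sketch).

    probs: soft one-hot array of shape (length, 4).
    """
    composition = jnp.mean(probs, axis=0)  # average base distribution
    # xlogy(0, 0) == 0, so absent bases contribute nothing instead of NaN.
    entropy_bits = -jnp.sum(xlogy(composition, composition)) / jnp.log(2.0)
    return jnp.maximum(min_entropy - entropy_bits, 0.0)


all_a = jax.nn.one_hot(jnp.zeros(12, dtype=jnp.int32), 4)  # composition entropy 0 bits
balanced = jax.nn.one_hot(jnp.arange(12) % 4, 4)  # composition entropy 2 bits
```

Composition entropy catches homopolymers such as all-A sequences. A two-letter repeat like ACAC... scores exactly 1 bit, however, and so passes a `min_entropy` of 1.0. Flagging such repeats would need an entropy over k-mers instead.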
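Usage Example

```python
import jax
from diffbio.losses import (
    BiologicalPlausibilityLoss,
    GCContentRegularization,
    SequenceComplexityLoss,
)

# Placeholder input (assumption): soft one-hot sequences over A, C, G, T with
# shape (sequence_length, 4); substitute your model's predicted sequences.
predicted_sequences = jax.nn.softmax(
    jax.random.normal(jax.random.PRNGKey(0), (100, 4)), axis=-1
)

# GC content regularization
gc_reg = GCContentRegularization(target_gc=0.5, tolerance=0.2)
gc_loss = gc_reg(sequences=predicted_sequences)

# Sequence complexity loss
complexity = SequenceComplexityLoss()
comp_loss = complexity(sequences=predicted_sequences)
```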