Smith-Waterman API
Differentiable Smith-Waterman local alignment operator.
SmoothSmithWaterman
diffbio.operators.alignment.smith_waterman.SmoothSmithWaterman
SmoothSmithWaterman(
    config: SmithWatermanConfig,
    scoring_matrix: Array,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)
Bases: TemperatureOperator
Differentiable Smith-Waterman local alignment.
This operator implements a smooth version of the Smith-Waterman algorithm where max operations are replaced with logsumexp, enabling gradient flow through the alignment computation.
The smoothness is controlled by the temperature parameter:
- temperature -> 0: Approaches hard max (standard Smith-Waterman)
- temperature -> inf: Candidate scores are weighted uniformly (uniform averaging)
Inherits from TemperatureOperator to get:
- Learnable temperature parameter management
- soft_max() method using logsumexp relaxation
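To make the temperature behaviour concrete, here is a minimal standalone sketch of a logsumexp-based smooth maximum in JAX. It assumes the standard relaxation `temperature * logsumexp(values / temperature)`; the function below is illustrative and is not the library's `soft_max()` implementation.

```python
import jax
import jax.numpy as jnp


def soft_max(values, temperature):
    """Smooth maximum: temperature * logsumexp(values / temperature)."""
    return temperature * jax.nn.logsumexp(values / temperature)


candidates = jnp.array([1.0, 2.0, 3.0])

# Low temperature: the result is close to the hard max of the candidates.
sharp = soft_max(candidates, 0.01)

# The gradient with respect to the candidates is a softmax over them.
# It is nearly one-hot at low temperature and nearly uniform at high temperature.
weights_sharp = jax.grad(soft_max)(candidates, 0.01)
weights_flat = jax.grad(soft_max)(candidates, 100.0)
```

Because the gradient weights are never exactly zero, every candidate move in the dynamic program receives some gradient signal, which is what allows training through the alignment.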
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | SmithWatermanConfig | SmithWatermanConfig with alignment parameters. | required |
| scoring_matrix | Array | Scoring matrix for matches/mismatches, shape (alphabet_size, alphabet_size). | required |
| rngs | Rngs \| None | Flax NNX random number generators (optional). | None |
| name | str \| None | Optional operator name. | None |

Example
align
align(
    seq1: Float[Array, "len1 alphabet"],
    seq2: Float[Array, "len2 alphabet"],
) -> AlignmentResult
Perform smooth Smith-Waterman local alignment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seq1 | Float[Array, 'len1 alphabet'] | First sequence, one-hot encoded (len1, alphabet_size). | required |
| seq2 | Float[Array, 'len2 alphabet'] | Second sequence, one-hot encoded (len2, alphabet_size). | required |

Returns:

| Type | Description |
|---|---|
| AlignmentResult | AlignmentResult with score, alignment matrix, and soft alignment. |
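Since `align` expects one-hot encoded inputs, a small encoding helper is usually needed first. The sketch below is one way to write it; `one_hot_dna` and `DNA_ALPHABET` are hypothetical names introduced for illustration, and the A, C, G, T column order is assumed to match the DNA scoring matrix documented on this page.

```python
import jax.numpy as jnp

# Assumed column order, chosen to match the DNA scoring matrix (A, C, G, T).
DNA_ALPHABET = "ACGT"


def one_hot_dna(sequence):
    """Encode a DNA string as a (len, 4) one-hot array.

    Raises ValueError for characters outside DNA_ALPHABET.
    """
    indices = jnp.array([DNA_ALPHABET.index(base) for base in sequence])
    return jnp.eye(len(DNA_ALPHABET))[indices]


seq1 = one_hot_dna("ACGT")  # shape (4, 4)
seq2 = one_hot_dna("ACAT")  # shape (4, 4)
```

The resulting arrays have the `(len, alphabet_size)` layout described for `seq1` and `seq2` above.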
apply
apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]
Apply alignment to sequence pair data.
This method implements the OperatorModule interface for batch processing. It expects data containing two sequences and returns alignment results.
Note: Output preserves input keys for Datarax vmap compatibility, while adding alignment result keys.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | PyTree | Dictionary containing "seq1" (first sequence, one-hot encoded, shape (len1, alphabet_size)) and "seq2" (second sequence, one-hot encoded, shape (len2, alphabet_size)). | required |
| state | PyTree | Element state (passed through unchanged). | required |
| metadata | dict[str, Any] \| None | Element metadata (passed through unchanged). | required |
| random_params | Any | Not used (deterministic operator). | None |
| stats | dict[str, Any] \| None | Not used. | None |

Returns:

| Type | Description |
|---|---|
| tuple[PyTree, PyTree, dict[str, Any] \| None] | Tuple of (transformed_data, state, metadata). transformed_data contains the input sequences plus the alignment results (score, alignment_matrix, soft_alignment); state and metadata are passed through unchanged. |
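The key-preserving, dictionary-in and dictionary-out shape of `apply` is what lets a per-element function be mapped over a batch. The sketch below illustrates that pattern with plain `jax.vmap`; `toy_apply` is a hypothetical stand-in that counts matching positions, not the library's operator, and how Datarax actually invokes `apply` is not shown here.

```python
import jax
import jax.numpy as jnp


def toy_apply(data):
    """Stand-in for an alignment step (NOT the library's operator).

    Keeps the input keys and adds a "score" key, mirroring the
    documented output layout of apply().
    """
    # Count positions where both one-hot sequences hold the same symbol.
    score = jnp.sum(data["seq1"] * data["seq2"])
    return {**data, "score": score}


eye = jnp.eye(4)
batch = {
    # Two sequence pairs stacked on a leading batch axis: (batch, len, alphabet).
    "seq1": jnp.stack([eye[jnp.array([0, 1, 2, 3])], eye[jnp.array([3, 2, 1, 0])]]),
    "seq2": jnp.stack([eye[jnp.array([0, 1, 0, 3])], eye[jnp.array([3, 3, 1, 0])]]),
}

# vmap maps over the leading axis of every array in the dictionary.
out = jax.vmap(toy_apply)(batch)
```

Note that `jax.vmap` requires all sequences in a batch to share the same shape, so variable-length inputs would need padding before batching.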
SmithWatermanConfig
diffbio.operators.alignment.smith_waterman.SmithWatermanConfig
dataclass
SmithWatermanConfig(
    temperature: float = DEFAULT_TEMPERATURE,
    learnable_temperature: bool = False,
    cacheable: bool = True,
    gap_open: float = DEFAULT_GAP_OPEN,
    gap_extend: float = DEFAULT_GAP_EXTEND,
)
Bases: TemperatureConfig
Configuration for SmoothSmithWaterman.
Attributes:

| Name | Type | Description |
|---|---|---|
| temperature | float | Temperature for logsumexp smoothing. Lower = sharper (closer to hard max), higher = smoother. |
| gap_open | float | Penalty for opening a gap. |
| gap_extend | float | Penalty for extending a gap. |
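As a rough illustration of how the two gap parameters interact, the sketch below scores a single gap under one common affine convention: the first gapped position costs `gap_open` and each further position costs `gap_extend`. This convention is an assumption, as are the example values (`gap_open` follows the negative sign used in the usage example on this page; `gap_extend=-1.0` is made up). The library may define the penalties differently.

```python
def affine_gap_score(length, gap_open=-10.0, gap_extend=-1.0):
    """Score contribution of one gap of `length` positions.

    Assumed convention: gap_open for the first position,
    gap_extend for each additional position.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (length - 1) * gap_extend


single = affine_gap_score(1)  # one gapped position
longer = affine_gap_score(4)  # one opening plus three extensions
```

Under this convention one long gap is cheaper than several short ones of the same total length, which is the usual motivation for separating the two penalties.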
AlignmentResult
diffbio.operators.alignment.smith_waterman.AlignmentResult
Bases: NamedTuple
Result of a smooth alignment.
Attributes:

| Name | Type | Description |
|---|---|---|
| score | Float[Array, ''] | The soft alignment score. |
| alignment_matrix | Float[Array, 'len1_plus1 len2_plus1'] | The DP matrix H[i,j] of shape (len1+1, len2+1). |
| soft_alignment | Float[Array, 'len1 len2'] | Soft alignment matrix showing position correspondences. |
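To show how the three fields relate, here is a simplified, self-contained sketch of a smooth local-alignment recursion in JAX. It is not the library's implementation: it uses a single linear `gap` penalty instead of `gap_open`/`gap_extend`, plain Python loops instead of an optimized scan, and it assumes that a soft alignment can be obtained as the gradient of the score with respect to the pairwise similarity matrix, which is one common construction.

```python
import jax
import jax.numpy as jnp


def smooth_sw(similarity, gap=-1.0, temperature=1.0):
    """Return (score, H) for a smooth local alignment with a linear gap penalty."""
    len1, len2 = similarity.shape

    def soft_max(values):
        return temperature * jax.nn.logsumexp(values / temperature)

    # DP matrix with an extra zero row and column, shape (len1 + 1, len2 + 1).
    H = jnp.zeros((len1 + 1, len2 + 1))
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            candidates = jnp.stack([
                jnp.array(0.0),                              # start a new local alignment
                H[i - 1, j - 1] + similarity[i - 1, j - 1],  # match or mismatch
                H[i - 1, j] + gap,                           # gap in the second sequence
                H[i, j - 1] + gap,                           # gap in the first sequence
            ])
            H = H.at[i, j].set(soft_max(candidates))
    # Local alignment: smooth maximum over all cells instead of a hard max.
    score = soft_max(H[1:, 1:].ravel())
    return score, H


def score_only(similarity):
    return smooth_sw(similarity)[0]


seq1 = jnp.eye(4)[jnp.array([0, 1, 2, 3])]       # ACGT
seq2 = jnp.eye(4)[jnp.array([0, 1, 0, 3])]       # ACAT
scoring = jnp.where(jnp.eye(4) == 1, 2.0, -1.0)  # match 2.0, mismatch -1.0
similarity = seq1 @ scoring @ seq2.T             # shape (len1, len2)

score, H = smooth_sw(similarity)
# Gradient of the score w.r.t. each pairwise similarity: how much each
# (i, j) pairing contributes to the smooth score.
soft_alignment = jax.grad(score_only)(similarity)
```

In this sketch `H` has shape `(len1 + 1, len2 + 1)` and `soft_alignment` has shape `(len1, len2)`, matching the shapes listed in the table above.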
Scoring Matrices
create_dna_scoring_matrix
diffbio.operators.alignment.scoring.create_dna_scoring_matrix
Create a simple DNA scoring matrix.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| match | float | Score for matching nucleotides (diagonal). | 2.0 |
| mismatch | float | Score for mismatching nucleotides (off-diagonal). | -1.0 |

Returns:

| Type | Description |
|---|---|
| Float[Array, '4 4'] | 4x4 scoring matrix for DNA (A, C, G, T order). |
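The description above (match score on the diagonal, mismatch score off the diagonal) can be reproduced in a few lines, which also shows how such a matrix combines with one-hot sequences. `simple_scoring_matrix` is a hypothetical helper written for illustration; it is assumed, not confirmed, to produce the same values as `create_dna_scoring_matrix`.

```python
import jax.numpy as jnp


def simple_scoring_matrix(match=2.0, mismatch=-1.0, alphabet_size=4):
    """Match score on the diagonal, mismatch score everywhere else."""
    identity = jnp.eye(alphabet_size)
    return match * identity + mismatch * (1.0 - identity)


scoring = simple_scoring_matrix()

seq1 = jnp.eye(4)[jnp.array([0, 1, 2, 3])]  # ACGT
seq2 = jnp.eye(4)[jnp.array([0, 1, 0, 3])]  # ACAT

# Entry (i, j) is the score of pairing seq1[i] with seq2[j].
similarity = seq1 @ scoring @ seq2.T
```

Because the sequences are one-hot, the double matrix product simply looks up the scoring entry for each pair of symbols, and it stays differentiable with respect to both the sequences and the scoring matrix.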
create_rna_scoring_matrix
diffbio.operators.alignment.scoring.create_rna_scoring_matrix
Create a simple RNA scoring matrix.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| match | float | Score for matching nucleotides (diagonal). | 2.0 |
| mismatch | float | Score for mismatching nucleotides (off-diagonal). | -1.0 |

Returns:

| Type | Description |
|---|---|
| Float[Array, '4 4'] | 4x4 scoring matrix for RNA (A, C, G, U order). |
ScoringMatrix
diffbio.operators.alignment.scoring.ScoringMatrix
Pre-defined Matrices
from diffbio.operators.alignment import (
    get_dna_simple,    # 4x4 DNA scoring matrix
    get_rna_simple,    # 4x4 RNA scoring matrix
    get_blosum62,      # 20x20 protein substitution matrix
    PROTEIN_ALPHABET,  # "ARNDCQEGHILKMFPSTWYV"
)
Usage Examples
Basic Alignment
import jax.numpy as jnp
from diffbio.operators.alignment import (
    SmoothSmithWaterman,
    SmithWatermanConfig,
    create_dna_scoring_matrix,
)

# Setup
config = SmithWatermanConfig(temperature=1.0, gap_open=-10.0)
scoring = create_dna_scoring_matrix(match=2.0, mismatch=-1.0)
aligner = SmoothSmithWaterman(config, scoring_matrix=scoring)

# One-hot encode sequences
seq1 = jnp.eye(4)[jnp.array([0, 1, 2, 3])]  # ACGT
seq2 = jnp.eye(4)[jnp.array([0, 1, 0, 3])]  # ACAT

# Align
data = {"seq1": seq1, "seq2": seq2}
result, _, _ = aligner.apply(data, {}, None)
print(f"Score: {result['score']}")
Datarax Interface
data = {"seq1": seq1, "seq2": seq2}
result_data, state, metadata = aligner.apply(data, {}, None)
print(result_data["score"])