Smith-Waterman API¤

Differentiable Smith-Waterman local alignment operator.

SmoothSmithWaterman¤

diffbio.operators.alignment.smith_waterman.SmoothSmithWaterman ¤

SmoothSmithWaterman(
    config: SmithWatermanConfig,
    scoring_matrix: Array,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: TemperatureOperator

Differentiable Smith-Waterman local alignment.

This operator implements a smooth version of the Smith-Waterman algorithm where max operations are replaced with logsumexp, enabling gradient flow through the alignment computation.

The smoothness is controlled by the temperature parameter:

temperature -> 0: Approaches hard max (standard Smith-Waterman)
temperature -> inf: Uniform averaging

Inherits from TemperatureOperator to get:

Learnable temperature parameter management
soft_max() method using logsumexp relaxation
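The relaxation described above can be illustrated with a minimal sketch in plain Python. The function name `soft_max_sketch` is hypothetical and is not the library's `soft_max()` method, whose signature is not documented here; the sketch only assumes the common form T * logsumexp(x / T).

```python
import math


def soft_max_sketch(values, temperature):
    """Temperature-scaled logsumexp: T * log(sum(exp(v / T))).

    Hypothetical helper for illustration; not the library's soft_max().
    """
    m = max(values)  # subtract the maximum for numerical stability
    total = sum(math.exp((v - m) / temperature) for v in values)
    return m + temperature * math.log(total)


scores = [1.0, 2.0, 3.0]
sharp = soft_max_sketch(scores, temperature=0.01)  # very close to max(scores)
smooth = soft_max_sketch(scores, temperature=1.0)  # strictly above max(scores)
```

Under this form the result is never below `max(values)` and exceeds it by at most `temperature * log(len(values))`, which is why lowering the temperature recovers the hard maximum.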

Parameters:

Name	Type	Description	Default
`config`	`SmithWatermanConfig`	SmithWatermanConfig with alignment parameters.	required
`scoring_matrix`	`Array`	Scoring matrix for matches/mismatches.	required
`rngs`	`Rngs \| None`	Flax NNX random number generators (optional).	`None`
`name`	`str \| None`	Optional operator name.	`None`

Example

# seq1 and seq2 are one-hot encoded arrays of shape (length, 4)
config = SmithWatermanConfig(temperature=1.0)
scoring = create_dna_scoring_matrix(match=2.0, mismatch=-1.0)
aligner = SmoothSmithWaterman(config, scoring_matrix=scoring)
result = aligner.align(seq1, seq2)
print(result.score)

Parameters:

Name	Type	Description	Default
`config`	`SmithWatermanConfig`	Alignment configuration.	required
`scoring_matrix`	`Array`	Scoring matrix (alphabet_size, alphabet_size).	required
`rngs`	`Rngs \| None`	Random number generators (optional).	`None`
`name`	`str \| None`	Optional operator name.	`None`

align ¤

align(
    seq1: Float[Array, "len1 alphabet"],
    seq2: Float[Array, "len2 alphabet"],
) -> AlignmentResult

Perform smooth Smith-Waterman local alignment.

Parameters:

Name	Type	Description	Default
`seq1`	`Float[Array, 'len1 alphabet']`	First sequence, one-hot encoded (len1, alphabet_size).	required
`seq2`	`Float[Array, 'len2 alphabet']`	Second sequence, one-hot encoded (len2, alphabet_size).	required

Returns:

Type	Description
`AlignmentResult`	AlignmentResult with score, alignment matrix, and soft alignment.
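Both arguments must be one-hot encoded. As a rough sketch, a sequence string could be encoded as follows. The helper `one_hot_encode` is hypothetical (it is not part of the documented API), and the default column order A, C, G, T is assumed to match the DNA scoring matrix.

```python
def one_hot_encode(sequence, alphabet="ACGT"):
    """Return a (len(sequence), len(alphabet)) one-hot matrix as nested lists.

    Hypothetical helper; the column order follows the alphabet string.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    return [
        [1.0 if index[symbol] == j else 0.0 for j in range(len(alphabet))]
        for symbol in sequence
    ]


encoded = one_hot_encode("ACAT")
# Each row holds a single 1.0 in the column of its symbol.
```

The nested lists would still need converting to an array, for example with `jnp.asarray(encoded)`, before being passed to `align`.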

apply ¤

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply alignment to sequence pair data.

This method implements the OperatorModule interface for batch processing. It expects data containing two sequences and returns alignment results.

Note: Output preserves input keys for Datarax vmap compatibility, while adding alignment result keys.

Parameters:

Name	Type	Description	Default
`data`	`PyTree`	Dictionary containing "seq1" (first sequence, one-hot encoded, shape (len1, alphabet_size)) and "seq2" (second sequence, one-hot encoded, shape (len2, alphabet_size)).	required
`state`	`PyTree`	Element state (passed through unchanged)	required
`metadata`	`dict[str, Any] \| None`	Element metadata (passed through unchanged)	required
`random_params`	`Any`	Not used (deterministic operator)	`None`
`stats`	`dict[str, Any] \| None`	Not used	`None`

Returns:

Type	Description
`tuple[PyTree, PyTree, dict[str, Any] \| None]`	Tuple of (transformed_data, state, metadata). transformed_data contains the input sequences plus the alignment results (score, alignment_matrix, soft_alignment); state and metadata are passed through unchanged.

SmithWatermanConfig¤

diffbio.operators.alignment.smith_waterman.SmithWatermanConfig `dataclass` ¤

SmithWatermanConfig(
    temperature: float = DEFAULT_TEMPERATURE,
    learnable_temperature: bool = False,
    cacheable: bool = True,
    gap_open: float = DEFAULT_GAP_OPEN,
    gap_extend: float = DEFAULT_GAP_EXTEND,
)

Bases: TemperatureConfig

Configuration for SmoothSmithWaterman.

Attributes:

Name	Type	Description
`temperature`	`float`	Temperature for logsumexp smoothing. Lower = sharper (closer to hard max), Higher = smoother.
`gap_open`	`float`	Penalty for opening a gap.
`gap_extend`	`float`	Penalty for extending a gap.

temperature `class-attribute` `instance-attribute` ¤

temperature: float = DEFAULT_TEMPERATURE

learnable_temperature `class-attribute` `instance-attribute` ¤

learnable_temperature: bool = False

AlignmentResult¤

diffbio.operators.alignment.smith_waterman.AlignmentResult ¤

Bases: NamedTuple

Result of a smooth alignment.

Attributes:

Name	Type	Description
`score`	`Float[Array, '']`	The soft alignment score.
`alignment_matrix`	`Float[Array, 'len1_plus1 len2_plus1']`	The DP matrix H[i,j] of shape (len1+1, len2+1).
`soft_alignment`	`Float[Array, 'len1 len2']`	Soft alignment matrix showing position correspondences.
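To make the (len1+1, len2+1) shape of the DP matrix concrete, here is a minimal pure-Python sketch of a smoothed Smith-Waterman recursion. It is not the library's implementation. It assumes a single linear gap penalty (rather than separate `gap_open` and `gap_extend`), takes plain strings instead of one-hot arrays, and does not compute the soft alignment. All function names are hypothetical.

```python
import math


def soft_max_sketch(values, temperature):
    """Temperature-scaled logsumexp used in place of a hard max."""
    m = max(values)
    total = sum(math.exp((v - m) / temperature) for v in values)
    return m + temperature * math.log(total)


def smooth_sw_matrix(a, b, match=2.0, mismatch=-1.0, gap=-1.0, temperature=1.0):
    """Fill H with shape (len(a)+1, len(b)+1); row 0 and column 0 stay zero."""
    H = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                0.0,                  # restart the local alignment
                H[i - 1][j - 1] + s,  # extend with a match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            ]
            H[i][j] = soft_max_sketch(candidates, temperature)
    return H


H = smooth_sw_matrix("ACGT", "ACAT", temperature=0.01)
best = max(max(row) for row in H)  # near the hard Smith-Waterman score at low temperature
```

Because every cell is a smooth function of its neighbours, gradients can flow from a score derived from `H` back to the scoring parameters. How the library reduces the matrix to the scalar `score` is not specified in this reference.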

Scoring Matrices¤

create_dna_scoring_matrix¤

diffbio.operators.alignment.scoring.create_dna_scoring_matrix ¤

create_dna_scoring_matrix(
    match: float = 2.0, mismatch: float = -1.0
) -> Float[Array, "4 4"]

Create a simple DNA scoring matrix.

Parameters:

Name	Type	Description	Default
`match`	`float`	Score for matching nucleotides (diagonal).	`2.0`
`mismatch`	`float`	Score for mismatching nucleotides (off-diagonal).	`-1.0`

Returns:

Type	Description
`Float[Array, '4 4']`	4x4 scoring matrix for DNA (A, C, G, T order).
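The layout of such a matrix can be sketched in plain Python: `match` on the diagonal and `mismatch` everywhere else. The helper `simple_scoring_matrix` is hypothetical and returns nested lists rather than a JAX array; the A, C, G, T ordering follows the description above.

```python
def simple_scoring_matrix(alphabet="ACGT", match=2.0, mismatch=-1.0):
    """Build an n x n matrix with match on the diagonal, mismatch elsewhere."""
    n = len(alphabet)
    return [[match if i == j else mismatch for j in range(n)] for i in range(n)]


matrix = simple_scoring_matrix()
a, g = "ACGT".index("A"), "ACGT".index("G")
score_aa = matrix[a][a]  # diagonal entry: identical nucleotides
score_ag = matrix[a][g]  # off-diagonal entry: different nucleotides
```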

create_rna_scoring_matrix¤

diffbio.operators.alignment.scoring.create_rna_scoring_matrix ¤

create_rna_scoring_matrix(
    match: float = 2.0, mismatch: float = -1.0
) -> Float[Array, "4 4"]

Create a simple RNA scoring matrix.

Parameters:

Name	Type	Description	Default
`match`	`float`	Score for matching nucleotides (diagonal).	`2.0`
`mismatch`	`float`	Score for mismatching nucleotides (off-diagonal).	`-1.0`

Returns:

Type	Description
`Float[Array, '4 4']`	4x4 scoring matrix for RNA (A, C, G, U order).

ScoringMatrix¤

diffbio.operators.alignment.scoring.ScoringMatrix ¤

Bases: NamedTuple

Scoring matrix with metadata.

Attributes:

Name	Type	Description
`matrix`	`Float[Array, 'alphabet alphabet']`	The scoring matrix array.
`alphabet`	`str`	The alphabet string (e.g., "ACGT" for DNA).
`name`	`str`	Optional name for the matrix.

matrix `instance-attribute` ¤

matrix: Float[Array, 'alphabet alphabet']

alphabet `instance-attribute` ¤

alphabet: str

name `class-attribute` `instance-attribute` ¤

name: str = ''
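As an illustration of how the three fields fit together, the sketch below mirrors them in a stand-in `NamedTuple` and looks up a pair score through the alphabet string. `ScoringMatrixSketch` and `pair_score` are hypothetical names, and the assumption that row and column order follow `alphabet` is not confirmed by this reference.

```python
from typing import NamedTuple


class ScoringMatrixSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields."""

    matrix: list
    alphabet: str
    name: str = ""


def pair_score(scoring, x, y):
    """Look up the score for symbols x and y via their alphabet positions."""
    return scoring.matrix[scoring.alphabet.index(x)][scoring.alphabet.index(y)]


dna = ScoringMatrixSketch(
    matrix=[[2.0 if i == j else -1.0 for j in range(4)] for i in range(4)],
    alphabet="ACGT",
    name="dna_simple",
)
same = pair_score(dna, "G", "G")
different = pair_score(dna, "G", "T")
```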

Pre-defined Matrices¤

from diffbio.operators.alignment import (
    get_dna_simple,     # 4x4 DNA scoring matrix
    get_rna_simple,     # 4x4 RNA scoring matrix
    get_blosum62,       # 20x20 protein substitution matrix
    PROTEIN_ALPHABET,   # "ARNDCQEGHILKMFPSTWYV"
)

Usage Examples¤

Basic Alignment¤

import jax.numpy as jnp
from diffbio.operators.alignment import (
    SmoothSmithWaterman,
    SmithWatermanConfig,
    create_dna_scoring_matrix,
)

# Setup
config = SmithWatermanConfig(temperature=1.0, gap_open=-10.0)
scoring = create_dna_scoring_matrix(match=2.0, mismatch=-1.0)
aligner = SmoothSmithWaterman(config, scoring_matrix=scoring)

# One-hot encode sequences
seq1 = jnp.eye(4)[jnp.array([0, 1, 2, 3])]  # ACGT
seq2 = jnp.eye(4)[jnp.array([0, 1, 0, 3])]  # ACAT

# Align
data = {"seq1": seq1, "seq2": seq2}
result, _, _ = aligner.apply(data, {}, None)
print(f"Score: {result['score']}")

Datarax Interface¤

data = {"seq1": seq1, "seq2": seq2}
result_data, state, metadata = aligner.apply(data, {}, None)
print(result_data["score"])

Gradient Computation¤

import jax

def alignment_loss(aligner, seq1, seq2):
    data = {"seq1": seq1, "seq2": seq2}
    result, _, _ = aligner.apply(data, {}, None)
    return -result["score"]

grads = jax.grad(alignment_loss)(aligner, seq1, seq2)
