Smith-Waterman API

Differentiable Smith-Waterman local alignment operator.

SmoothSmithWaterman

diffbio.operators.alignment.smith_waterman.SmoothSmithWaterman

SmoothSmithWaterman(
    config: SmithWatermanConfig,
    scoring_matrix: Array,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: TemperatureOperator

Differentiable Smith-Waterman local alignment.

This operator implements a smooth version of the Smith-Waterman algorithm where max operations are replaced with logsumexp, enabling gradient flow through the alignment computation.

The smoothness is controlled by the temperature parameter:

  • temperature -> 0: Approaches hard max (standard Smith-Waterman)
  • temperature -> inf: Uniform averaging

Inherits from TemperatureOperator to get:

  • Learnable temperature parameter management
  • soft_max() method using logsumexp relaxation
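The logsumexp relaxation described above can be illustrated with a minimal sketch in plain JAX. The function name `soft_max_sketch` is hypothetical and is not the library's `soft_max()` method; it only assumes the common form `temperature * logsumexp(values / temperature)`.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def soft_max_sketch(values, temperature):
    """Smooth stand-in for max(values); hypothetical helper, not the DiffBio API."""
    return temperature * logsumexp(values / temperature)


scores = jnp.array([1.0, 2.0, 4.0])

# Low temperature: the value is close to the hard max of the inputs.
sharp_value = soft_max_sketch(scores, 0.01)

# The gradient with respect to the inputs is softmax(values / temperature).
# Low temperature concentrates the weight on the largest entry;
# high temperature spreads it almost uniformly.
sharp_weights = jax.grad(soft_max_sketch)(scores, 0.01)
smooth_weights = jax.grad(soft_max_sketch)(scores, 100.0)
```

Under this form, the "uniform averaging" limit shows up in the gradient weights rather than in the value itself, which grows with the temperature.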

Parameters:

  • config (SmithWatermanConfig, required): SmithWatermanConfig with alignment parameters.
  • scoring_matrix (Array, required): Scoring matrix for matches/mismatches.
  • rngs (Rngs | None, default None): Flax NNX random number generators (optional).
  • name (str | None, default None): Optional operator name.

Example
# seq1 and seq2 are one-hot encoded arrays of shape (length, alphabet_size)
config = SmithWatermanConfig(temperature=1.0)
scoring = create_dna_scoring_matrix(match=2.0, mismatch=-1.0)
aligner = SmoothSmithWaterman(config, scoring_matrix=scoring)
result = aligner.align(seq1, seq2)
print(result.score)

Parameters:

  • config (SmithWatermanConfig, required): Alignment configuration.
  • scoring_matrix (Array, required): Scoring matrix (alphabet_size, alphabet_size).
  • rngs (Rngs | None, default None): Random number generators (optional).
  • name (str | None, default None): Optional operator name.

align

align(
    seq1: Float[Array, "len1 alphabet"],
    seq2: Float[Array, "len2 alphabet"],
) -> AlignmentResult

Perform smooth Smith-Waterman local alignment.

Parameters:

  • seq1 (Float[Array, 'len1 alphabet'], required): First sequence, one-hot encoded (len1, alphabet_size).
  • seq2 (Float[Array, 'len2 alphabet'], required): Second sequence, one-hot encoded (len2, alphabet_size).

Returns:

  • AlignmentResult: AlignmentResult with score, alignment matrix, and soft alignment.
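Because align expects one-hot encoded inputs of shape (length, alphabet_size), a small encoding helper is often convenient. The sketch below is one possible way to build such arrays; `one_hot_encode` is a hypothetical name, not part of the documented API, and it assumes the "ACGT" ordering used by the DNA scoring matrix.

```python
import jax
import jax.numpy as jnp


def one_hot_encode(sequence, alphabet="ACGT"):
    """Hypothetical helper: map a string to a (len, alphabet_size) one-hot array."""
    # str.index raises ValueError for characters outside the alphabet.
    indices = jnp.array([alphabet.index(symbol) for symbol in sequence])
    return jax.nn.one_hot(indices, len(alphabet))


seq1 = one_hot_encode("ACGT")  # shape (4, 4)
seq2 = one_hot_encode("ACAT")  # shape (4, 4)
```

Arrays built this way have the layout described for seq1 and seq2 above.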

apply

apply(
    data: PyTree,
    state: PyTree,
    metadata: dict[str, Any] | None,
    random_params: Any = None,
    stats: dict[str, Any] | None = None,
) -> tuple[PyTree, PyTree, dict[str, Any] | None]

Apply alignment to sequence pair data.

This method implements the OperatorModule interface for batch processing. It expects data containing two sequences and returns alignment results.

Note: Output preserves input keys for Datarax vmap compatibility, while adding alignment result keys.

Parameters:

  • data (PyTree, required): Dictionary containing:
      • "seq1": First sequence, one-hot encoded (len1, alphabet_size)
      • "seq2": Second sequence, one-hot encoded (len2, alphabet_size)
  • state (PyTree, required): Element state (passed through unchanged)
  • metadata (dict[str, Any] | None, required): Element metadata (passed through unchanged)
  • random_params (Any, default None): Not used (deterministic operator)
  • stats (dict[str, Any] | None, default None): Not used

Returns:

  • tuple[PyTree, PyTree, dict[str, Any] | None]: Tuple of (transformed_data, state, metadata):
      • transformed_data contains input sequences plus alignment results (score, alignment_matrix, soft_alignment)
      • state is passed through unchanged
      • metadata is passed through unchanged
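The key-preserving output convention described above can be illustrated without the library. In this sketch `apply_sketch` is a hypothetical stand-in that computes a toy score (a count of matching position pairs) instead of a real alignment; it only shows how returning the input keys plus result keys yields a dictionary that `jax.vmap` can batch over.

```python
import jax
import jax.numpy as jnp


def apply_sketch(data, state, metadata):
    """Hypothetical stand-in for apply(): keeps input keys, adds a result key."""
    # (len1, len2) indicator of matching symbols for one-hot inputs.
    match_indicator = data["seq1"] @ data["seq2"].T
    toy_score = match_indicator.sum()  # toy value, not an alignment score
    transformed = {**data, "score": toy_score}
    return transformed, state, metadata


# A batch of three sequence pairs, each of length 4 over a 4-letter alphabet.
indices1 = jnp.array([[0, 1, 2, 3], [0, 0, 0, 0], [3, 2, 1, 0]])
indices2 = jnp.array([[0, 1, 0, 3], [1, 1, 1, 1], [3, 2, 1, 0]])
batch = {
    "seq1": jax.nn.one_hot(indices1, 4),
    "seq2": jax.nn.one_hot(indices2, 4),
}

# vmap maps over the leading axis of every leaf in the dictionary.
batched = jax.vmap(lambda d: apply_sketch(d, {}, None)[0])(batch)
```

Each output leaf gains the same leading batch axis as the inputs, so the per-element dictionary structure is unchanged.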

SmithWatermanConfig

diffbio.operators.alignment.smith_waterman.SmithWatermanConfig dataclass

SmithWatermanConfig(
    temperature: float = DEFAULT_TEMPERATURE,
    learnable_temperature: bool = False,
    cacheable: bool = True,
    gap_open: float = DEFAULT_GAP_OPEN,
    gap_extend: float = DEFAULT_GAP_EXTEND,
)

Bases: TemperatureConfig

Configuration for SmoothSmithWaterman.

Attributes:

  • temperature (float): Temperature for logsumexp smoothing. Lower = sharper (closer to hard max), higher = smoother.
  • gap_open (float): Penalty for opening a gap.
  • gap_extend (float): Penalty for extending a gap.

temperature class-attribute instance-attribute

temperature: float = DEFAULT_TEMPERATURE

learnable_temperature class-attribute instance-attribute

learnable_temperature: bool = False
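To make the roles of gap_open and gap_extend concrete, the sketch below scores a single gap under one common affine convention: the first gapped position pays gap_open and each further position pays gap_extend, with penalties given as negative numbers. This convention, the helper name `affine_gap_score`, and the value used for gap_extend are assumptions for illustration; the library's actual defaults and convention may differ.

```python
import jax.numpy as jnp


def affine_gap_score(length, gap_open=-10.0, gap_extend=-1.0):
    """Hypothetical helper: score contribution of one gap of the given length."""
    length = jnp.asarray(length, dtype=jnp.float32)
    # A gap of length k costs gap_open + (k - 1) * gap_extend; no gap costs nothing.
    return jnp.where(length > 0, gap_open + (length - 1.0) * gap_extend, 0.0)


gap_lengths = jnp.arange(5)
gap_scores = affine_gap_score(gap_lengths)
```

With a large opening penalty and a small extension penalty, one long gap is cheaper than several short ones, which is the usual motivation for affine gap models.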

AlignmentResult

diffbio.operators.alignment.smith_waterman.AlignmentResult

Bases: NamedTuple

Result of a smooth alignment.

Attributes:

  • score (Float[Array, '']): The soft alignment score.
  • alignment_matrix (Float[Array, 'len1_plus1 len2_plus1']): The DP matrix H[i,j] of shape (len1+1, len2+1).
  • soft_alignment (Float[Array, 'len1 len2']): Soft alignment matrix showing position correspondences.
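The three fields above can be related to one another with a simplified smooth Smith-Waterman recursion. This is a sketch under stated assumptions, not the library's implementation: it uses a single linear gap penalty instead of separate gap_open and gap_extend values, plain Python loops instead of a compiled scan, and it takes the soft alignment to be the gradient of the score with respect to the pairwise substitution scores, which is one common choice. The name `smooth_sw_sketch` is hypothetical.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def smooth_sw_sketch(sim, gap=-1.0, temperature=1.0):
    """Simplified smooth local alignment; sim has shape (len1, len2)."""
    len1, len2 = sim.shape
    H = jnp.zeros((len1 + 1, len2 + 1))  # DP matrix with a zero border
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            candidates = jnp.stack([
                jnp.asarray(0.0),                    # start a new local alignment
                H[i - 1, j - 1] + sim[i - 1, j - 1],  # align position i with j
                H[i - 1, j] + gap,                   # gap in the second sequence
                H[i, j - 1] + gap,                   # gap in the first sequence
            ])
            # The hard max of Smith-Waterman is replaced by scaled logsumexp.
            H = H.at[i, j].set(temperature * logsumexp(candidates / temperature))
    # Local alignment may end anywhere, so smooth-max over all interior cells.
    score = temperature * logsumexp(H[1:, 1:].ravel() / temperature)
    return score, H


# Pairwise substitution scores for ACGT vs ACAT with match=2, mismatch=-1.
scoring = jnp.where(jnp.eye(4, dtype=bool), 2.0, -1.0)
seq1 = jax.nn.one_hot(jnp.array([0, 1, 2, 3]), 4)
seq2 = jax.nn.one_hot(jnp.array([0, 1, 0, 3]), 4)
sim = seq1 @ scoring @ seq2.T

score, H = smooth_sw_sketch(sim)
# Gradient of the score with respect to each pairwise score: shape (len1, len2).
soft_alignment = jax.grad(lambda s: smooth_sw_sketch(s)[0])(sim)
# At low temperature the smooth score approaches the hard Smith-Waterman score.
sharp_score, _ = smooth_sw_sketch(sim, temperature=0.01)
```

In this sketch H has shape (len1+1, len2+1) and the gradient has shape (len1, len2), matching the shapes listed for alignment_matrix and soft_alignment.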

Scoring Matrices

create_dna_scoring_matrix

diffbio.operators.alignment.scoring.create_dna_scoring_matrix

create_dna_scoring_matrix(
    match: float = 2.0, mismatch: float = -1.0
) -> Float[Array, "4 4"]

Create a simple DNA scoring matrix.

Parameters:

  • match (float, default 2.0): Score for matching nucleotides (diagonal).
  • mismatch (float, default -1.0): Score for mismatching nucleotides (off-diagonal).

Returns:

  • Float[Array, '4 4']: 4x4 scoring matrix for DNA (A, C, G, T order).
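A match/mismatch matrix of this kind has the match score on the diagonal and the mismatch score everywhere else. The sketch below builds such a matrix in plain JAX and uses it to score every position pair of two one-hot sequences; `simple_scoring_matrix` is a hypothetical name used for illustration and may not reflect how create_dna_scoring_matrix is implemented.

```python
import jax
import jax.numpy as jnp


def simple_scoring_matrix(match=2.0, mismatch=-1.0, alphabet_size=4):
    """Hypothetical helper: match on the diagonal, mismatch off the diagonal."""
    on_diagonal = jnp.eye(alphabet_size, dtype=bool)
    return jnp.where(on_diagonal, match, mismatch)


scoring = simple_scoring_matrix()

seq1 = jax.nn.one_hot(jnp.array([0, 1, 2, 3]), 4)  # ACGT
seq2 = jax.nn.one_hot(jnp.array([0, 1, 0, 3]), 4)  # ACAT

# Entry (i, j) is the substitution score of seq1[i] against seq2[j].
pairwise = seq1 @ scoring @ seq2.T  # shape (len1, len2)
```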

create_rna_scoring_matrix

diffbio.operators.alignment.scoring.create_rna_scoring_matrix

create_rna_scoring_matrix(
    match: float = 2.0, mismatch: float = -1.0
) -> Float[Array, "4 4"]

Create a simple RNA scoring matrix.

Parameters:

  • match (float, default 2.0): Score for matching nucleotides (diagonal).
  • mismatch (float, default -1.0): Score for mismatching nucleotides (off-diagonal).

Returns:

  • Float[Array, '4 4']: 4x4 scoring matrix for RNA (A, C, G, U order).

ScoringMatrix

diffbio.operators.alignment.scoring.ScoringMatrix

Bases: NamedTuple

Scoring matrix with metadata.

Attributes:

  • matrix (Float[Array, 'alphabet alphabet']): The scoring matrix array.
  • alphabet (str): The alphabet string (e.g., "ACGT" for DNA).
  • name (str): Optional name for the matrix.

matrix instance-attribute

matrix: Float[Array, 'alphabet alphabet']

alphabet instance-attribute

alphabet: str

name class-attribute instance-attribute

name: str = ''
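Pairing the matrix with its alphabet makes it possible to look up scores by symbol rather than by index. The sketch below mirrors the three documented fields in a local NamedTuple; `ScoringMatrixSketch` and `pair_score` are hypothetical names introduced for illustration and are not part of the documented API.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class ScoringMatrixSketch(NamedTuple):
    """Local stand-in with the same three fields as the documented ScoringMatrix."""
    matrix: jax.Array
    alphabet: str
    name: str = ""


def pair_score(scoring, a, b):
    """Hypothetical helper: look up the score for symbols a and b."""
    # The position of each symbol in the alphabet string is its row/column index.
    return scoring.matrix[scoring.alphabet.index(a), scoring.alphabet.index(b)]


dna = ScoringMatrixSketch(
    matrix=jnp.where(jnp.eye(4, dtype=bool), 2.0, -1.0),
    alphabet="ACGT",
    name="dna_simple_sketch",
)

match_score = pair_score(dna, "G", "G")
mismatch_score = pair_score(dna, "A", "T")
```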

Pre-defined Matrices

from diffbio.operators.alignment import (
    get_dna_simple,     # 4x4 DNA scoring matrix
    get_rna_simple,     # 4x4 RNA scoring matrix
    get_blosum62,       # 20x20 protein substitution matrix
    PROTEIN_ALPHABET,   # "ARNDCQEGHILKMFPSTWYV"
)

Usage Examples

Basic Alignment

import jax.numpy as jnp
from diffbio.operators.alignment import (
    SmoothSmithWaterman,
    SmithWatermanConfig,
    create_dna_scoring_matrix,
)

# Setup
config = SmithWatermanConfig(temperature=1.0, gap_open=-10.0)
scoring = create_dna_scoring_matrix(match=2.0, mismatch=-1.0)
aligner = SmoothSmithWaterman(config, scoring_matrix=scoring)

# One-hot encode sequences
seq1 = jnp.eye(4)[jnp.array([0, 1, 2, 3])]  # ACGT
seq2 = jnp.eye(4)[jnp.array([0, 1, 0, 3])]  # ACAT

# Align
data = {"seq1": seq1, "seq2": seq2}
result, _, _ = aligner.apply(data, {}, None)
print(f"Score: {result['score']}")

Datarax Interface

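# Reuses aligner, seq1, and seq2 from the Basic Alignment example.
# apply() returns (data, state, metadata); input keys are kept alongside the results.
data = {"seq1": seq1, "seq2": seq2}
result_data, state, metadata = aligner.apply(data, {}, None)
print(result_data["score"])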

Gradient Computation

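import jax

# Reuses aligner, seq1, and seq2 from the Basic Alignment example.
def alignment_loss(aligner, seq1, seq2):
    data = {"seq1": seq1, "seq2": seq2}
    result, _, _ = aligner.apply(data, {}, None)
    # Negate the soft score so that minimizing the loss maximizes the alignment score.
    return -result["score"]

# jax.grad differentiates with respect to the first argument (the aligner) by default.
grads = jax.grad(alignment_loss)(aligner, seq1, seq2)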