Alignment Operators
DiffBio provides advanced differentiable alignment operators for multiple sequence alignment and profile-based homology detection.
Overview
Beyond pairwise alignment (Smith-Waterman), DiffBio implements:
- SoftProgressiveMSA: Multiple sequence alignment with a soft guide tree
- ProfileHMMSearch: Profile Hidden Markov Models for sequence family detection
SoftProgressiveMSA
Differentiable progressive multiple sequence alignment using neural sequence encoders.
Quick Start
import jax
import jax.numpy as jnp
from flax import nnx
from diffbio.operators.alignment import SoftProgressiveMSA, SoftProgressiveMSAConfig
# Configure MSA operator
config = SoftProgressiveMSAConfig(
    max_seq_length=100,
    hidden_dim=64,
    num_layers=2,
    alphabet_size=4,  # DNA
    temperature=1.0,
    gap_open_penalty=-10.0,
    gap_extend_penalty=-1.0,
)
# Create operator
rngs = nnx.Rngs(42)
msa = SoftProgressiveMSA(config, rngs=rngs)
# Placeholder input: three random one-hot DNA sequences, each (seq_len, alphabet_size).
# Replace these with real one-hot encoded sequences.
tokens = jax.random.randint(jax.random.PRNGKey(0), (3, 100), 0, 4)
seq1, seq2, seq3 = jax.nn.one_hot(tokens, 4)
# Prepare sequences (n_seqs, seq_len, alphabet_size)
sequences = jnp.stack([seq1, seq2, seq3])
# Perform MSA
data = {"sequences": sequences}
result, state, metadata = msa.apply(data, {}, None)
# Access results
aligned = result["aligned_sequences"]  # Aligned sequences
distances = result["pairwise_distances"]  # Guide tree distances
consensus = result["consensus_profile"]  # Consensus profile
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_seq_length` | `int` | 100 | Maximum sequence length |
| `hidden_dim` | `int` | 64 | Hidden dimension for neural encoder |
| `num_layers` | `int` | 2 | Number of encoder layers |
| `alphabet_size` | `int` | 4 | Size of sequence alphabet |
| `temperature` | `float` | 1.0 | Temperature for softmax operations |
| `gap_open_penalty` | `float` | -10.0 | Gap opening penalty |
| `gap_extend_penalty` | `float` | -1.0 | Gap extension penalty |
Algorithm
- Sequence Encoding: Neural network encodes each sequence to a fixed-size embedding
- Distance Computation: Pairwise distances are computed from the cosine similarity of the embeddings
- Progressive Alignment: Sequences aligned progressively using soft attention
- Consensus Building: Weighted profile built from aligned sequences
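The first two steps above can be sketched in plain JAX. This is a minimal illustration, not DiffBio's implementation: the `encode` function is a hypothetical stand-in for the neural encoder (a single projection followed by mean pooling), and the distance is assumed to be one minus the cosine similarity.

```python
import jax
import jax.numpy as jnp

def encode(seq_one_hot, weights):
    """Toy encoder (assumption): project each position, then mean-pool
    to a fixed-size embedding that does not depend on sequence length."""
    return jnp.tanh(seq_one_hot @ weights).mean(axis=0)

def cosine_distance_matrix(embeddings, eps=1e-8):
    """Pairwise cosine distance, defined here as 1 - cosine similarity."""
    norms = jnp.linalg.norm(embeddings, axis=-1, keepdims=True)
    unit = embeddings / jnp.maximum(norms, eps)
    return 1.0 - unit @ unit.T

key_w, key_s = jax.random.split(jax.random.PRNGKey(0))
weights = jax.random.normal(key_w, (4, 8))         # (alphabet_size, hidden_dim)
tokens = jax.random.randint(key_s, (3, 20), 0, 4)  # three DNA sequences of length 20
tokens = tokens.at[1].set(tokens[0])               # make sequence 1 a copy of sequence 0
sequences = jax.nn.one_hot(tokens, 4)              # (3, 20, 4)

# Encode every sequence, then compare all pairs of embeddings.
embeddings = jax.vmap(encode, in_axes=(0, None))(sequences, weights)  # (3, 8)
distances = cosine_distance_matrix(embeddings)                        # (3, 3)
```

Because every operation is a smooth array function, gradients of any downstream loss can flow back into `weights`, which is what makes a learned guide tree possible.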
ProfileHMMSearch
Profile Hidden Markov Model for detecting sequence homology to a family profile.
Quick Start
import jax
from flax import nnx
from diffbio.operators.alignment import ProfileHMMSearch, ProfileHMMConfig
# Configure Profile HMM
config = ProfileHMMConfig(
    profile_length=50,
    alphabet_size=4,
    num_match_states=50,
    temperature=1.0,
)
# Create operator
rngs = nnx.Rngs(42)
profile_hmm = ProfileHMMSearch(config, rngs=rngs)
# Placeholder input: one random one-hot DNA sequence; replace with real data.
tokens = jax.random.randint(jax.random.PRNGKey(0), (50,), 0, 4)
sequence = jax.nn.one_hot(tokens, 4)
# Score sequence against profile
data = {"sequence": sequence}  # (seq_len, alphabet_size)
result, state, metadata = profile_hmm.apply(data, {}, None)
# Get homology score
score = result["log_likelihood"]
viterbi_path = result["viterbi_path"]
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `profile_length` | `int` | 50 | Length of the profile (number of match states) |
| `alphabet_size` | `int` | 4 | Size of sequence alphabet |
| `num_match_states` | `int` | 50 | Number of match states |
| `temperature` | `float` | 1.0 | Temperature for soft operations |
HMM States
The Profile HMM has three state types:
- Match (M): Position emits from learned distribution
- Insert (I): Handles insertions relative to profile
- Delete (D): Handles deletions (silent states)
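To make the state types concrete, the sketch below shows one way emission scores could be computed for match and insert states. The parameter names (`match_logits`, `insert_logits`) and the uniform insert distribution are assumptions for illustration, not DiffBio's internals. Delete states are silent, so they contribute no emission term.

```python
import jax
import jax.numpy as jnp

profile_length, alphabet_size = 5, 4
key_m, key_s = jax.random.split(jax.random.PRNGKey(0))

# Hypothetical learnable parameters: one logit vector per profile position.
match_logits = jax.random.normal(key_m, (profile_length, alphabet_size))
insert_logits = jnp.zeros((profile_length, alphabet_size))  # uniform background

# Normalise logits into log-probabilities over the alphabet.
log_match = jax.nn.log_softmax(match_logits, axis=-1)
log_insert = jax.nn.log_softmax(insert_logits, axis=-1)

tokens = jax.random.randint(key_s, (7,), 0, alphabet_size)
sequence = jax.nn.one_hot(tokens, alphabet_size)  # (seq_len, alphabet_size)

# Entry [t, k] is the log-probability that state k emits the symbol at position t.
match_scores = sequence @ log_match.T    # (seq_len, profile_length)
insert_scores = sequence @ log_insert.T  # (seq_len, profile_length)
```

Using a one-hot (or soft) sequence with a matrix product keeps the emission lookup differentiable with respect to both the sequence and the logits.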
Differentiability Techniques
Soft Guide Tree
Instead of discrete UPGMA clustering, SoftProgressiveMSA uses:
- Neural embeddings for sequence similarity
- Soft attention for pairwise alignment
- Differentiable weighted consensus
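The last two ideas can be sketched as follows. This is a simplified illustration under stated assumptions: `soft_align` and `weighted_consensus` are hypothetical helpers, the attention is purely content-based (it ignores position order and gap penalties), and the per-sequence weights are free parameters rather than values derived from a guide tree.

```python
import jax
import jax.numpy as jnp

def soft_align(query, target, temperature=1.0):
    """Soft-align `target` onto the positions of `query`.

    query: (Lq, A), target: (Lt, A). Returns the aligned target (Lq, A)
    and the attention weights (Lq, Lt), whose rows sum to one.
    """
    scores = query @ target.T  # match score between every pair of positions
    weights = jax.nn.softmax(scores / temperature, axis=-1)
    return weights @ target, weights

def weighted_consensus(aligned, seq_logits):
    """aligned: (n, L, A), seq_logits: (n,). Returns an (L, A) profile."""
    w = jax.nn.softmax(seq_logits)  # normalised per-sequence weights
    return jnp.einsum("n,nla->la", w, aligned)

a = jax.nn.one_hot(jnp.array([0, 1, 2, 3, 0]), 4)  # reference sequence
b = jax.nn.one_hot(jnp.array([0, 1, 3, 0]), 4)     # shorter sequence

aligned_b, attn = soft_align(a, b, temperature=0.5)
aligned = jnp.stack([a, aligned_b])                # (2, 5, 4)
seq_logits = jnp.zeros(2)                          # equal weights
consensus = weighted_consensus(aligned, seq_logits)

# Gradients flow through the consensus back to the sequence weights.
grads = jax.grad(lambda w: weighted_consensus(aligned, w)[0, 0])(seq_logits)
```

Lowering `temperature` sharpens the attention toward a single best-matching position, while higher values spread weight across several candidates.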
Soft Viterbi
ProfileHMMSearch uses a temperature-scaled logsumexp in place of the hard maximum in the Viterbi recursion.
This maintains gradient flow through the decoding process.
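A common form of the temperature-scaled logsumexp is `temperature * logsumexp(x / temperature)`, which approaches `max(x)` as the temperature goes to zero. The sketch below applies it to a generic HMM rather than a full profile HMM with silent delete states, and the function names are illustrative; whether DiffBio uses exactly this form is an assumption.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def soft_max(x, temperature, axis=0):
    """Smooth maximum: T * logsumexp(x / T); approaches max(x) as T -> 0."""
    return temperature * logsumexp(x / temperature, axis=axis)

def viterbi_score(log_init, log_trans, log_emit, reduce_fn):
    """Best-path score with a pluggable reduction over previous states.

    log_init: (S,), log_trans: (S, S) indexed [from, to], log_emit: (T, S).
    """
    def step(v, emit_t):
        # candidates[i, j]: score of being in state i, then moving to state j
        candidates = v[:, None] + log_trans
        return reduce_fn(candidates) + emit_t, None

    v_final, _ = jax.lax.scan(step, log_init + log_emit[0], log_emit[1:])
    return reduce_fn(v_final)

def soft_viterbi_score(log_init, log_trans, log_emit, temperature=1.0):
    return viterbi_score(log_init, log_trans, log_emit,
                         lambda x: soft_max(x, temperature, axis=0))

def hard_viterbi_score(log_init, log_trans, log_emit):
    return viterbi_score(log_init, log_trans, log_emit,
                         lambda x: jnp.max(x, axis=0))

# Toy 3-state HMM over 6 steps with random scores (illustrative values only).
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
log_init = jax.nn.log_softmax(jax.random.normal(k1, (3,)))
log_trans = jax.nn.log_softmax(jax.random.normal(k2, (3, 3)), axis=-1)
log_emit = jax.nn.log_softmax(jax.random.normal(k3, (6, 3)), axis=-1)

hard = hard_viterbi_score(log_init, log_trans, log_emit)
soft = soft_viterbi_score(log_init, log_trans, log_emit, temperature=0.01)

# Unlike the hard maximum, the soft score gives dense gradients for every entry.
grads = jax.grad(soft_viterbi_score, argnums=2)(log_init, log_trans, log_emit, 1.0)
```

At `temperature=1.0` the recursion reduces to the forward algorithm and returns the sequence log-likelihood; at low temperatures it approximates the hard Viterbi score from above.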
Use Cases
| Application | Operator | Description |
|---|---|---|
| Protein family detection | ProfileHMMSearch | Score query against family profile |
| Homology search | ProfileHMMSearch | Find similar sequences |
| Multiple alignment | SoftProgressiveMSA | Align related sequences |
| Phylogenetic analysis | SoftProgressiveMSA | Build evolutionary relationships |
| Consensus building | SoftProgressiveMSA | Extract conserved features |
Next Steps
- See the Smith-Waterman operator for pairwise alignment
- Explore Statistical Operators for more HMM applications