Protein Structure Operators¤
DiffBio provides differentiable operators for protein structure analysis, including secondary structure prediction using the DSSP algorithm.
Protein Fully Differentiable
Overview¤
Protein structure operators enable gradient-based optimization for structural bioinformatics:
- DifferentiableSecondaryStructure: PyDSSP-style DSSP algorithm with continuous hydrogen bond matrix
Protein foundation-model scope¤
Experimental protein foundation-model support is deliberately narrow. DiffBio now
uses the shared sequence foundation substrate for protein sequence context in
the secondary-structure benchmark and provides a strict adapter contract for
precomputed protein-LM artifacts. This means exported embedding matrices can be
aligned by sequence_ids through the same sequence adapter path used for other
foundation-model benchmarks.
Post-DTI stable boundary: benchmark-backed operator support is separate from imported foundation-model promotion.
This is not stable imported protein-LM support. It does not claim external protein-LM checkpoint loading, tokenizer interchangeability, or broad protein task promotion. The current benchmark evidence is limited to synthetic secondary-structure scaffold context with stable scope excluded.
DifferentiableSecondaryStructure¤
Differentiable implementation of the DSSP algorithm for assigning secondary structure (helix, strand, loop) to protein backbone atoms.
Algorithm¤
The DSSP algorithm (Kabsch & Sander, 1983) assigns secondary structure based on hydrogen bonding patterns:
graph TB
A["Backbone Coordinates<br/>(N, CA, C, O)"] --> B["Compute H-bond Energy<br/>E = q₁q₂f(1/rON + 1/rCH - 1/rOH - 1/rCN)"]
B --> C["Continuous H-bond Matrix<br/>(1+sin((cutoff-E-m)/m*π/2))/2"]
C --> D["Pattern Detection<br/>Helix: i→i+4 H-bonds<br/>Strand: non-local H-bonds<br/>Loop: remaining residues"]
D --> E["Soft SS Assignments<br/>(H, E, -)"]
style A fill:#d1fae5,stroke:#059669,color:#064e3b
style B fill:#e0e7ff,stroke:#4338ca,color:#312e81
style C fill:#e0e7ff,stroke:#4338ca,color:#312e81
style D fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
style E fill:#d1fae5,stroke:#059669,color:#064e3b
Quick Start¤
from flax import nnx
import jax
import jax.numpy as jnp
from diffbio.operators.protein import (
DifferentiableSecondaryStructure,
SecondaryStructureConfig,
create_secondary_structure_predictor,
)
# Create operator
predictor = create_secondary_structure_predictor(
margin=1.0, # Smoothing margin
cutoff=-0.5, # H-bond energy threshold (kcal/mol)
temperature=1.0, # Softmax temperature
)
# Prepare backbone coordinates (N, CA, C, O atoms)
# Shape: (batch, n_residues, 4, 3)
n_residues = 50
coords = jax.random.uniform(jax.random.PRNGKey(0), (1, n_residues, 4, 3)) * 10
# Apply operator
data = {"coordinates": coords}
result, state, metadata = predictor.apply(data, {}, None)
# Get results
ss_probs = result["ss_onehot"] # (1, 50, 3) - soft probabilities
ss_indices = result["ss_indices"] # (1, 50) - hard assignments
hbond_map = result["hbond_map"] # (1, 50, 50) - H-bond matrix
Configuration¤
| Parameter | Type | Default | Description |
|---|---|---|---|
margin |
float | 1.0 | Smoothing margin for H-bond transformation |
cutoff |
float | -0.5 | H-bond energy threshold (kcal/mol) |
min_helix_length |
int | 4 | Minimum residues for helix assignment |
temperature |
float | 1.0 | Softmax temperature for assignments |
Input/Output Formats¤
Input
| Key | Shape | Description |
|---|---|---|
coordinates |
(batch, length, 4, 3) | Backbone atom coordinates (N, CA, C, O) |
Output
| Key | Shape | Description |
|---|---|---|
coordinates |
(batch, length, 4, 3) | Original coordinates |
ss_onehot |
(batch, length, 3) | Soft SS probabilities (loop, helix, strand) |
ss_indices |
(batch, length) | Hard assignments (0=loop, 1=helix, 2=strand) |
hbond_map |
(batch, length, length) | Continuous H-bond matrix in [0, 1] |
Hydrogen Bond Energy¤
The hydrogen bond energy is computed using the Kabsch-Sander electrostatic formula:
Where:
- \(q_1 q_2 = 0.084\) (partial charges in electron units)
- \(f = 332\) (conversion factor to kcal/mol)
- \(r_{XY}\) = distance between atoms X and Y
Donor: N-H from residue i, Acceptor: C=O from residue j
Continuous H-bond Matrix¤
The energy is transformed to a continuous [0,1] matrix using:
This smooth transformation enables gradient flow through secondary structure prediction.
Secondary Structure Classes¤
| Index | Symbol | Name | Pattern |
|---|---|---|---|
| 0 | - |
Loop/Coil | No characteristic H-bonds |
| 1 | H |
α-Helix | i→i+4 backbone H-bonds |
| 2 | E |
β-Strand | Non-local parallel/antiparallel H-bonds |
Training Example¤
import optax
from flax import nnx
predictor = create_secondary_structure_predictor()
optimizer = optax.adam(1e-3)
opt_state = optimizer.init(nnx.state(predictor, nnx.Param))
def loss_fn(model, coords, target_ss):
"""Cross-entropy loss for SS prediction."""
data = {"coordinates": coords}
result, _, _ = model.apply(data, {}, None)
log_probs = jnp.log(result["ss_onehot"] + 1e-10)
return -jnp.mean(jnp.sum(target_ss * log_probs, axis=-1))
@nnx.jit
def train_step(model, opt_state, coords, target_ss):
loss, grads = nnx.value_and_grad(loss_fn)(model, coords, target_ss)
params = nnx.state(model, nnx.Param)
updates, opt_state = optimizer.update(grads, opt_state, params)
nnx.update(model, optax.apply_updates(params, updates))
return loss, opt_state
Accessing the H-bond Map¤
# Get continuous H-bond matrix for analysis
result, _, _ = predictor.apply({"coordinates": coords}, {}, None)
hbond_map = result["hbond_map"] # (batch, length, length)
# Find strong H-bonds (threshold at 0.5 matches original DSSP)
strong_hbonds = hbond_map > 0.5
# Visualize H-bond pattern
import matplotlib.pyplot as plt
plt.imshow(hbond_map[0], cmap='viridis')
plt.xlabel('Acceptor residue')
plt.ylabel('Donor residue')
plt.colorbar(label='H-bond probability')
plt.title('Hydrogen Bond Matrix')
Use Cases¤
| Application | Operator | Description |
|---|---|---|
| Structure validation | DifferentiableSecondaryStructure | Validate predicted structures |
| Fold recognition | DifferentiableSecondaryStructure | Compare SS patterns |
| Protein design | DifferentiableSecondaryStructure | Optimize for target SS |
| MD analysis | DifferentiableSecondaryStructure | Track SS changes during simulation |
References¤
-
Kabsch, W. & Sander, C. (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features." Biopolymers 22, 2577-2637.
-
Minami, S. (2023). "PyDSSP: A simplified implementation of DSSP algorithm for PyTorch and NumPy." GitHub repository.
Next Steps¤
- See Alignment Operators for sequence-structure alignment
- Explore Statistical Operators for related analysis methods