Data Sources API

Data source modules for loading bioinformatics and drug discovery datasets, extending Datarax's DataSourceModule.

Genomics Sources

BAMSource

diffbio.sources.bam.BAMSource

BAMSource(
    config: BAMSourceConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: IndexedBatchSourceMixin, DataSourceModule

BAM/CRAM file data source extending Datarax DataSourceModule.

Provides efficient access to aligned sequencing reads with:

Lazy loading using pysam iterators
Indexed random access via BAI/CRAI files
Quality filtering at load time
One-hot encoded sequence output

Inherits from DataSourceModule (StructuralModule) because:

Non-parametric: BAM reading is deterministic
Frozen config: file parameters don't change
Domain-specific: requires genomics-specific handling

Example

from pathlib import Path
from diffbio.sources import BAMSource, BAMSourceConfig

config = BAMSourceConfig(file_path=Path("sample.bam"))
source = BAMSource(config)
for element in source:
    print(element.data["read_name"], element.data["sequence"].shape)

Performance Tips (from pysam best practices):

Use indexed BAM files for random access
Filter by region to reduce data loading
Set min_mapping_quality to filter at read time
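
A minimal sketch of what the load-time filters amount to, written as a per-read predicate. It is a hypothetical illustration rather than BAMSource's code: it assumes reads expose pysam `AlignedSegment`-style `is_unmapped` and `mapping_quality` attributes, uses `SimpleNamespace` stand-ins so it runs without pysam, and assumes (unconfirmed) that unmapped reads are governed only by `include_unmapped`.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def keep_read(read: Any, include_unmapped: bool = False,
              min_mapping_quality: int | None = None) -> bool:
    """Hypothetical per-read filter mirroring the BAMSourceConfig options.

    `read` is assumed to expose pysam-style `is_unmapped` and
    `mapping_quality` attributes.
    """
    if read.is_unmapped:
        # Unmapped reads have no meaningful MAPQ, so only the flag decides
        return include_unmapped
    if min_mapping_quality is not None and read.mapping_quality < min_mapping_quality:
        return False
    return True


reads = [
    SimpleNamespace(name="r1", is_unmapped=False, mapping_quality=60),
    SimpleNamespace(name="r2", is_unmapped=False, mapping_quality=5),
    SimpleNamespace(name="r3", is_unmapped=True, mapping_quality=0),
]
kept = [r.name for r in reads if keep_read(r, min_mapping_quality=20)]
print(kept)  # ['r1']
```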

Parameters:

Name	Type	Description	Default
`config`	`BAMSourceConfig`	BAM source configuration	required
`rngs`	`Rngs | None`	Random number generators (unused for data loading)	`None`
`name`	`str | None`	Optional module name	`None`

Raises:

Type	Description
`FileNotFoundError`	If BAM file not found
`ImportError`	If pysam is not installed

__len__

__len__() -> int

Return the number of reads in the source.

__getitem__

__getitem__(idx: int) -> Element | None

Get read by index.

Parameters:

Name	Type	Description	Default
`idx`	`int`	Index of the read	required

Returns:

Type	Description
`Element | None`	Element at the given index, or None if out of bounds

__iter__

__iter__() -> Iterator[Element]

Return iterator over reads.

reset

reset(seed: int | None = None) -> None

Reset iteration state, optionally with a new seed.

get_batch

get_batch(
    batch_size: int, key: Array | None = None
) -> list[Element]

Return the next batch of elements, advancing the internal index.
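
`reset` and `get_batch` together describe a cursor over the source. The sketch below models only that cursor behaviour with a hypothetical `CursorBatcher` class; it ignores the `key` argument and assumes (not confirmed by this page) that a final short batch is returned as-is and an exhausted cursor yields an empty list.

```python
from __future__ import annotations

from typing import Any, Sequence


class CursorBatcher:
    """Hypothetical stand-in for the get_batch/reset cursor behaviour."""

    def __init__(self, elements: Sequence[Any]) -> None:
        self._elements = elements
        self._index = 0  # position of the next element to hand out

    def reset(self) -> None:
        # Rewind so the next get_batch starts from the first element again
        self._index = 0

    def get_batch(self, batch_size: int) -> list[Any]:
        start = self._index
        stop = min(start + batch_size, len(self._elements))
        self._index = stop  # advance past everything returned
        return list(self._elements[start:stop])


batcher = CursorBatcher(["a", "b", "c", "d", "e"])
print(batcher.get_batch(2))  # ['a', 'b']
print(batcher.get_batch(2))  # ['c', 'd']
print(batcher.get_batch(2))  # ['e']
batcher.reset()
print(batcher.get_batch(3))  # ['a', 'b', 'c']
```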

BAMSourceConfig

diffbio.sources.bam.BAMSourceConfig `dataclass`

BAMSourceConfig(
    file_path: Path = None,
    reference_path: Path | None = None,
    include_unmapped: bool = False,
    min_mapping_quality: int | None = None,
    region: str | None = None,
    handle_n: Literal["uniform", "zero"] = "uniform",
)

Bases: StructuralConfig

Configuration for BAM/CRAM data source.

Attributes:

Name	Type	Description
`file_path`	`Path`	Path to BAM/CRAM file
`reference_path`	`Path | None`	Optional path to reference FASTA (required for CRAM)
`include_unmapped`	`bool`	Whether to include unmapped reads (default: False)
`min_mapping_quality`	`int | None`	Minimum mapping quality to include (default: None)
`region`	`str | None`	Optional genomic region to query (e.g., "chr1:1000-2000")
`handle_n`	`Literal['uniform', 'zero']`	How to handle N nucleotides in sequences ("uniform" or "zero")
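
Both genomics sources emit one-hot encoded sequences of shape (length, 4), and `handle_n` decides what row an ambiguous `N` gets. This page does not spell out the encoding, so the sketch below assumes an A, C, G, T column order, that "uniform" spreads N as 0.25 per base, and that "zero" leaves the row all zeros. `one_hot_encode` is a hypothetical helper written with plain lists so it runs without JAX.

```python
from __future__ import annotations

BASES = "ACGT"  # assumed column order


def one_hot_encode(seq: str, handle_n: str = "uniform") -> list[list[float]]:
    """Hypothetical one-hot encoder illustrating the handle_n options."""
    if handle_n not in ("uniform", "zero"):
        raise ValueError(f"unknown handle_n: {handle_n!r}")
    n_row = [0.25] * 4 if handle_n == "uniform" else [0.0] * 4
    rows = []
    for base in seq.upper():
        if base in BASES:
            row = [0.0] * 4
            row[BASES.index(base)] = 1.0
            rows.append(row)
        else:
            # N (and, in this sketch, any other non-ACGT symbol)
            rows.append(list(n_row))
    return rows


print(one_hot_encode("ACN"))
# [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
print(one_hot_encode("ACN", handle_n="zero")[2])  # [0.0, 0.0, 0.0, 0.0]
```

With "uniform" every row still sums to 1, which keeps N positions on the probability simplex; "zero" instead makes them contribute nothing to downstream dot products.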

FastaSource

diffbio.sources.fasta.FastaSource

FastaSource(
    config: FastaSourceConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: IndexedBatchSourceMixin, DataSourceModule

FASTA file data source extending Datarax DataSourceModule.

Provides efficient access to DNA/RNA sequences with:

Lazy loading using samtools-compatible .fai index
Dictionary-like access by sequence name
One-hot encoded sequence output
Support for compressed BGZF files

Inherits from DataSourceModule (StructuralModule) because:

Non-parametric: FASTA reading is deterministic
Frozen config: file parameters don't change
Domain-specific: requires genomics-specific handling

Example

from pathlib import Path
from diffbio.sources import FastaSource, FastaSourceConfig

config = FastaSourceConfig(file_path=Path("genome.fasta"))
source = FastaSource(config)
elem = source.get_by_name("chr1")
print(elem.data["sequence"].shape)

Performance Tips (from pyfaidx best practices):

Use indexed FASTA files (.fai) for random access
Access regions with slicing for large chromosomes
BGZF compression reduces disk space while maintaining random access
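
Random access through a `.fai` index works because each index line records where a sequence starts and how its lines are wrapped. The samtools `.fai` format has tab-separated columns NAME, LENGTH, OFFSET (byte offset of the first base), LINEBASES (bases per line) and LINEWIDTH (bytes per line, including the newline). A minimal sketch of the offset arithmetic for an uncompressed FASTA; `parse_fai_line` and `byte_offset` are hypothetical helpers, not part of FastaSource or pyfaidx.

```python
from __future__ import annotations

from typing import NamedTuple


class FaiEntry(NamedTuple):
    name: str
    length: int      # sequence length in bases
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per full line
    line_width: int  # bytes per full line, including the newline


def parse_fai_line(line: str) -> FaiEntry:
    name, length, offset, line_bases, line_width = line.rstrip("\n").split("\t")[:5]
    return FaiEntry(name, int(length), int(offset), int(line_bases), int(line_width))


def byte_offset(entry: FaiEntry, pos: int) -> int:
    """Byte offset of 0-based position `pos` within the sequence."""
    if not 0 <= pos < entry.length:
        raise IndexError(pos)
    full_lines, column = divmod(pos, entry.line_bases)
    return entry.offset + full_lines * entry.line_width + column


# ">chr1\n" is 6 bytes, then 60 bases per line plus "\n"
entry = parse_fai_line("chr1\t248956422\t6\t60\t61")
print(byte_offset(entry, 0))   # 6
print(byte_offset(entry, 60))  # 67
```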

Parameters:

Name	Type	Description	Default
`config`	`FastaSourceConfig`	FASTA source configuration	required
`rngs`	`Rngs | None`	Random number generators (unused for data loading)	`None`
`name`	`str | None`	Optional module name	`None`

Raises:

Type	Description
`FileNotFoundError`	If FASTA file not found
`ImportError`	If pyfaidx is not installed

sequence_names `property`

sequence_names: list[str]

Get list of all sequence names in the FASTA file.

__len__

__len__() -> int

Return the number of sequences in the source.

__getitem__

__getitem__(idx: int) -> Element | None

Get sequence by index.

Parameters:

Name	Type	Description	Default
`idx`	`int`	Index of the sequence	required

Returns:

Type	Description
`Element | None`	Element at the given index, or None if out of bounds

__iter__

__iter__() -> Iterator[Element]

Return iterator over sequences.

reset

reset(seed: int | None = None) -> None

Reset iteration state, optionally with a new seed.

get_batch

get_batch(
    batch_size: int, key: Array | None = None
) -> list[Element]

Return the next batch of elements, advancing the internal index.

get_by_name

get_by_name(name: str) -> Element | None

Get sequence by name/ID.

Parameters:

Name	Type	Description	Default
`name`	`str`	Sequence identifier (e.g., "chr1", "seq1")	required

Returns:

Type	Description
`Element | None`	Element for the sequence, or None if not found

FastaSourceConfig

diffbio.sources.fasta.FastaSourceConfig `dataclass`

FastaSourceConfig(
    file_path: Path = None,
    handle_n: Literal["uniform", "zero"] = "uniform",
    create_index: bool = True,
)

Bases: StructuralConfig

Configuration for FASTA data source.

Attributes:

Name	Type	Description
`file_path`	`Path`	Path to FASTA file
`handle_n`	`Literal['uniform', 'zero']`	How to handle N nucleotides ("uniform" or "zero")
`create_index`	`bool`	Whether to create .fai index if not exists (default: True)

MolNet Benchmark Source

MolNetSource

diffbio.sources.molnet.MolNetSource

MolNetSource(
    config: MolNetSourceConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: DataSourceModule

MolNet benchmark data source extending Datarax DataSourceModule.

Provides standardized access to MoleculeNet benchmark datasets with proper train/valid/test splits. Supports automatic downloading and caching.

Inherits from DataSourceModule (StructuralModule) because:

Non-parametric: data loading is deterministic
Frozen config: dataset parameters don't change
Domain-specific: requires molecular data handling

Example

from diffbio.sources import MolNetSource, MolNetSourceConfig

config = MolNetSourceConfig(dataset_name="bbbp", split="train")
source = MolNetSource(config)
for element in source:
    print(element.data["smiles"], element.data["y"])

References

Wu et al., "MoleculeNet: A Benchmark for Molecular Machine Learning," Chemical Science, 2018.

Parameters:

Name	Type	Description	Default
`config`	`MolNetSourceConfig`	MolNet source configuration	required
`rngs`	`Rngs | None`	Random number generators (unused for data loading)	`None`
`name`	`str | None`	Optional module name	`None`

Raises:

Type	Description
`ValueError`	If dataset_name is unknown
`FileNotFoundError`	If data not found and download=False

task_type `property`

task_type: str

Get the task type for this dataset.

n_tasks `property`

n_tasks: int

Get the number of tasks for this dataset.

__len__

__len__() -> int

Return the number of elements in the source.

__getitem__

__getitem__(idx: int) -> Element | None

Get element by index.

Parameters:

Name	Type	Description	Default
`idx`	`int`	Index of the element	required

Returns:

Type	Description
`Element | None`	Element at the given index, or None if out of bounds

__iter__

__iter__() -> Iterator[Element]

Return iterator over elements.

MolNetSourceConfig

diffbio.sources.molnet.MolNetSourceConfig `dataclass`

MolNetSourceConfig(
    dataset_name: str = "",
    split: Literal["train", "valid", "test"] = "train",
    data_dir: Path | None = None,
    download: bool = True,
)

Bases: StructuralConfig

Configuration for MolNet benchmark data source.

Attributes:

Name	Type	Description
`dataset_name`	`str`	Name of the MolNet dataset (e.g., "bbbp", "tox21", "esol")
`split`	`Literal['train', 'valid', 'test']`	Which split to load ("train", "valid", or "test")
`data_dir`	`Path | None`	Directory to store downloaded data (default: ~/.diffbio/molnet)
`download`	`bool`	Whether to download if data not found (default: True)

Indexed View Source

IndexedViewSource

diffbio.sources.indexed_view.IndexedViewSource

IndexedViewSource(
    config: IndexedViewSourceConfig,
    source: DataSourceModule,
    indices: ndarray,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: DataSourceModule

Lazy-loading view into a data source using index mapping.

This source wraps an existing DataSourceModule and exposes only the elements at the specified indices. Elements are fetched on demand from the underlying source, which keeps large datasets lazily loaded.

Key Features:

Lazy loading: elements are fetched from the underlying source only when accessed
Memory efficient: stores only the index array, not the data itself
Preserves the underlying source's lazy-loading behavior
Supports shuffling of the view indices (the underlying data is not reordered)

Example

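import jax.numpy as jnp
from diffbio.sources import IndexedViewSource, IndexedViewSourceConfig

# Create a view of the first 1000 elements
# (original_source is any existing DataSourceModule with >= 1000 elements)
indices = jnp.arange(1000)
config = IndexedViewSourceConfig()
view = IndexedViewSource(config, original_source, indices)
view[0]  # Fetches original_source[indices[0]] lazily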
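
The view's core is an index indirection: view position `i` reads `source[indices[i]]` only at access time, and shuffling permutes the small index array rather than the data. A pure-Python sketch of that idea, using a hypothetical `LazyView` class and a counting list in place of a real DataSourceModule; how IndexedViewSource actually seeds its shuffle is not documented here, so `random.Random(seed)` is an assumption.

```python
from __future__ import annotations

import random
from typing import Any, Iterator, Sequence


class CountingSource:
    """Stand-in source that records how many elements were actually read."""

    def __init__(self, data: Sequence[Any]) -> None:
        self._data = data
        self.reads = 0

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, idx: int) -> Any:
        self.reads += 1
        return self._data[idx]


class LazyView:
    """Hypothetical sketch of an index-mapped, lazily loaded view."""

    def __init__(self, source: Any, indices: Sequence[int],
                 shuffle: bool = False, seed: int | None = None) -> None:
        self._source = source
        self._indices = list(indices)  # only indices are stored, never data
        if shuffle:
            random.Random(seed).shuffle(self._indices)  # reorders the view only

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, idx: int) -> Any | None:
        if not 0 <= idx < len(self._indices):
            return None  # mirrors the documented "None if out of bounds"
        return self._source[self._indices[idx]]  # fetched on access

    def __iter__(self) -> Iterator[Any]:
        for i in range(len(self)):
            yield self[i]


source = CountingSource([f"mol{i}" for i in range(10_000)])
view = LazyView(source, range(0, 10_000, 2))
print(len(view), source.reads)  # 5000 0
print(view[3], source.reads)    # mol6 1
```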

Parameters:

Name	Type	Description	Default
`config`	`IndexedViewSourceConfig`	Configuration for the view source	required
`source`	`DataSourceModule`	Underlying data source to wrap	required
`indices`	`ndarray`	Array of indices into the source to expose	required
`rngs`	`Rngs | None`	Random number generators for shuffling	`None`
`name`	`str | None`	Optional name for the module	`None`

__len__

__len__() -> int

Return number of elements in the view.

__getitem__

__getitem__(idx: int) -> Element | None

Get element at view index (LAZY - fetches from underlying source).

Parameters:

Name	Type	Description	Default
`idx`	`int`	Index into the VIEW (0 to len(view)-1)	required

Returns:

Type	Description
`Element | None`	Element from underlying source at mapped index, or None if out of bounds

__iter__

__iter__() -> Iterator[Element]

Iterate over view elements (LAZY - fetches on demand).

IndexedViewSourceConfig

diffbio.sources.indexed_view.IndexedViewSourceConfig `dataclass`

IndexedViewSourceConfig(
    shuffle: bool = False, seed: int | None = None
)

Bases: StructuralConfig

Configuration for IndexedViewSource.

Attributes:

Name	Type	Description
`shuffle`	`bool`	Whether to shuffle the view indices on initialization and reset
`seed`	`int | None`	Random seed for shuffling (optional)

AnnData Source

AnnDataSource

diffbio.sources.anndata_source.AnnDataSource

AnnDataSource(
    config: AnnDataSourceConfig,
    *,
    rngs: Rngs | None = None,
    name: str | None = None,
)

Bases: DataSourceModule

Eager-loading AnnData source for single-cell RNA-seq data.

Loads all data from .h5ad files to JAX arrays at initialization, then provides pure JAX iteration, batching, and indexed access. Follows the same eager-loading pattern as datarax's HFEagerSource.

Provides:

Full dataset loading via load()
Per-cell indexed access via __getitem__
Iteration via __iter__ with optional O(1) memory shuffling
Batch retrieval via get_batch(batch_size)
Automatic sparse-to-dense conversion
JAX array output for count matrices and embeddings

Output dictionary keys:

counts: Dense JAX array of shape (n_cells, n_genes) from .X
obs: Dict of cell metadata columns from .obs
var: Dict of gene metadata columns from .var
obsm: Dict of embedding JAX arrays from .obsm (empty if absent)

Example

from diffbio.sources.anndata_source import AnnDataSource, AnnDataSourceConfig

config = AnnDataSourceConfig(file_path="pbmc3k.h5ad")
source = AnnDataSource(config)
print(len(source))                # 2700
print(source.load()["counts"].shape)  # (2700, 32738)

for cell in source:
    print(cell["counts"].shape)   # (32738,)
    break

batch = source.get_batch(32)
print(batch["counts"].shape)      # (32, 32738)

Loads all data to JAX arrays at construction time.

Parameters:

Name	Type	Description	Default
`config`	`AnnDataSourceConfig`	AnnDataSourceConfig with file path and options.	required
`rngs`	`Rngs | None`	Optional RNG state for shuffling.	`None`
`name`	`str | None`	Optional module name.	`None`

Raises:

Type	Description
`FileNotFoundError`	If the file does not exist.
`ImportError`	If anndata is not installed.

__len__

__len__() -> int

Return the number of cells in the dataset.

__getitem__

__getitem__(idx: int) -> dict[str, Any]

Get data for a single cell by index.

Supports negative indexing.

Parameters:

Name	Type	Description	Default
`idx`	`int`	Cell index (supports negative indexing).	required

Returns:

Type	Description
`dict[str, Any]`	Dictionary with `counts`, `obs`, `obsm` keys.

Raises:

Type	Description
`IndexError`	If idx is out of bounds.

__iter__

__iter__() -> Iterator[dict[str, Any]]

Iterate over cells with optional O(1) memory shuffling.

Yields:

Type	Description
`dict[str, Any]`	Per-cell dictionaries with `counts`, `obs`, `obsm` keys.

AnnDataSourceConfig

diffbio.sources.anndata_source.AnnDataSourceConfig `dataclass`

AnnDataSourceConfig(
    file_path: str | None = None,
    backed: bool = False,
    shuffle: bool = False,
    seed: int = 42,
    split: str | None = None,
)

Bases: StructuralConfig

Configuration for AnnDataSource.

Attributes:

Name	Type	Description
`file_path`	`str | None`	Path to the .h5ad file (string or Path object).
`backed`	`bool`	Whether to open in backed mode (memory-mapped).
`shuffle`	`bool`	Whether to shuffle during iteration.
`seed`	`int`	Integer seed for Grain's index_shuffle.
`split`	`str | None`	Optional split name for pipeline integration.

AnnData Interop

to_anndata

diffbio.sources.anndata_interop.to_anndata

to_anndata(data_dict: dict[str, Any]) -> AnnData

Convert a DiffBio data dict to an AnnData object.

Translates the standard DiffBio dictionary format (as produced by AnnDataSource.load()) into an anndata.AnnData object for use with scanpy, scvi-tools, and other AnnData-based tools.

JAX arrays in counts and obsm are converted to numpy via np.asarray(). The obs and var dicts become pandas DataFrames.

Parameters:

Name	Type	Description	Default
`data_dict`	`dict[str, Any]`	Dictionary with keys: - `counts`: JAX or numpy array of shape (n_cells, n_genes). - `obs`: Dict mapping column names to per-cell arrays. - `var`: Dict mapping column names to per-gene arrays. - `obsm` (optional): Dict mapping embedding names to arrays.	required

Returns:

Type	Description
`AnnData`	AnnData object with `.X`, `.obs`, `.var`, and `.obsm` populated from the input dictionary.

Raises:

Type	Description
`ImportError`	If anndata or pandas is not installed.
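
`to_anndata` expects the arrays in the dictionary to agree on the cell and gene axes. A hypothetical pre-flight check, not part of diffbio, that applies the shape rules from the parameter table (per-cell arrays in `obs` and `obsm`, per-gene arrays in `var`); nested lists stand in for JAX arrays so it runs standalone.

```python
from __future__ import annotations

from typing import Any


def check_data_dict(data: dict[str, Any]) -> tuple[int, int]:
    """Return (n_cells, n_genes) or raise ValueError on inconsistent axes."""
    counts = data["counts"]
    n_cells = len(counts)
    n_genes = len(counts[0]) if n_cells else 0
    if any(len(row) != n_genes for row in counts):
        raise ValueError("counts rows have different lengths")
    for column, values in data["obs"].items():
        if len(values) != n_cells:
            raise ValueError(f"obs[{column!r}] has {len(values)} entries, expected {n_cells}")
    for column, values in data["var"].items():
        if len(values) != n_genes:
            raise ValueError(f"var[{column!r}] has {len(values)} entries, expected {n_genes}")
    for key, embedding in data.get("obsm", {}).items():  # obsm is optional
        if len(embedding) != n_cells:
            raise ValueError(f"obsm[{key!r}] has {len(embedding)} rows, expected {n_cells}")
    return n_cells, n_genes


example = {
    "counts": [[0, 3, 1], [2, 0, 0]],
    "obs": {"cell_type": ["B", "T"]},
    "var": {"gene_symbol": ["CD19", "CD3E", "MS4A1"]},
    "obsm": {"X_pca": [[0.1, -0.2], [0.4, 0.3]]},
}
print(check_data_dict(example))  # (2, 3)
```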

from_anndata

diffbio.sources.anndata_interop.from_anndata

from_anndata(adata: AnnData) -> dict[str, Any]

Convert an AnnData object to a DiffBio data dict.

Translates an anndata.AnnData object into the standard DiffBio dictionary format compatible with AnnDataSource.load() output.

Sparse .X matrices are converted to dense before wrapping in a JAX array. .obs and .var DataFrames become plain dicts of numpy arrays. .obsm entries become JAX arrays.
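
For a CSR matrix (a common sparse layout for `.X`), densifying means scattering each row's stored values into a zero-filled row. In practice this is scipy's `toarray()`; the pure-Python sketch below, built around a hypothetical `csr_to_dense` helper, only shows what that conversion computes from the `data`, `indices` and `indptr` arrays.

```python
from __future__ import annotations


def csr_to_dense(data: list[float], indices: list[int], indptr: list[int],
                 n_cols: int) -> list[list[float]]:
    """Expand CSR arrays into a dense row-major matrix."""
    n_rows = len(indptr) - 1
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # Stored entries for `row` live in data[indptr[row]:indptr[row + 1]]
        for k in range(indptr[row], indptr[row + 1]):
            dense[row][indices[k]] += data[k]  # duplicates are summed, as in scipy
    return dense


# 2 cells x 4 genes with three non-zero counts
print(csr_to_dense([5.0, 1.0, 2.0], [1, 3, 0], [0, 2, 3], n_cols=4))
# [[0.0, 5.0, 0.0, 1.0], [2.0, 0.0, 0.0, 0.0]]
```

The dense result costs n_cells × n_genes floats, so for large atlases this conversion, not the file read, usually dominates memory.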

Parameters:

Name	Type	Description	Default
`adata`	`AnnData`	AnnData object to convert.	required

Returns:

Type	Description
`dict[str, Any]`	Dictionary with keys: - `counts`: Dense JAX array of shape (n_cells, n_genes). - `obs`: Dict mapping column names to numpy arrays. - `var`: Dict mapping column names to numpy arrays. - `obsm`: Dict mapping embedding names to JAX arrays.

Usage Examples

Reading BAM/CRAM Files

from pathlib import Path
from diffbio.sources import BAMSource, BAMSourceConfig

# Load aligned reads from a BAM file
config = BAMSourceConfig(
    file_path=Path("sample.bam"),
    min_mapping_quality=20,  # Filter low-quality alignments
    include_unmapped=False,  # Skip unmapped reads
)
source = BAMSource(config)

print(f"Number of reads: {len(source)}")

# Access reads
for element in source:
    # One-hot encoded sequence (length, 4)
    sequence = element.data["sequence"]
    # Phred quality scores (length,)
    quality = element.data["quality_scores"]
    # Read name
    name = element.data["read_name"]

    print(f"Read {name}: {sequence.shape}, avg quality: {quality.mean():.1f}")
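
The `quality_scores` are Phred-scaled, Q = -10 * log10(P_error), so averaging them as the example does weights errors logarithmically. When an error-probability view is more useful, the conversion is short; the sketch below assumes the scores arrive as a plain sequence of numbers, and both helper names are hypothetical.

```python
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P) to get P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def expected_errors(qualities: Sequence[float]) -> float:
    """Expected number of miscalled bases in a read."""
    return sum(phred_to_error_prob(q) for q in qualities)


print(phred_to_error_prob(20))                 # roughly 0.01
print(round(expected_errors([30] * 100), 6))   # 0.1
```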

Reading FASTA Files

from pathlib import Path
from diffbio.sources import FastaSource, FastaSourceConfig

# Load sequences from a FASTA file
config = FastaSourceConfig(
    file_path=Path("genome.fasta"),
    handle_n="uniform",  # or "zero" for N nucleotides
    create_index=True,   # Create .fai index for random access
)
source = FastaSource(config)

print(f"Number of sequences: {len(source)}")
print(f"Sequence names: {source.sequence_names}")

# Iterate over all sequences in index order
for element in source:
    seq_id = element.data["sequence_id"]
    sequence = element.data["sequence"]  # One-hot encoded
    print(f"{seq_id}: {sequence.shape[0]} bp")

# Access by name
chr1 = source.get_by_name("chr1")
if chr1 is not None:
    print(f"Chromosome 1 length: {chr1.data['sequence'].shape[0]}")

Region-Based BAM Access

from pathlib import Path
from diffbio.sources import BAMSource, BAMSourceConfig

# Load only reads from a specific region
config = BAMSourceConfig(
    file_path=Path("sample.bam"),
    region="chr1:1000000-2000000",  # 1Mb region on chr1
)
source = BAMSource(config)

print(f"Reads in region: {len(source)}")
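
The `region` string follows the samtools convention `contig:start-end` with 1-based, inclusive coordinates, so `chr1:1000000-2000000` spans 1,000,001 bases. The hypothetical `parse_region` below only makes the coordinate conversion explicit, returning the 0-based, half-open interval pysam uses for its `start`/`stop` arguments; it does not handle contig names that themselves contain `:`.

```python
from __future__ import annotations


def parse_region(region: str) -> tuple[str, int | None, int | None]:
    """Parse 'contig' or 'contig:start-end' (1-based, inclusive).

    Returns (contig, start, stop) as a 0-based, half-open interval;
    start/stop are None when only a contig is given.
    """
    contig, sep, span = region.rpartition(":")
    if not sep:
        return region, None, None
    start_text, dash, end_text = span.replace(",", "").partition("-")
    if not dash:
        raise ValueError(f"expected contig:start-end, got {region!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid interval in {region!r}")
    return contig, start - 1, end


print(parse_region("chr1:1000000-2000000"))  # ('chr1', 999999, 2000000)
print(parse_region("chrM"))                  # ('chrM', None, None)
```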

Loading MolNet Benchmarks

from diffbio.sources import MolNetSource, MolNetSourceConfig

# Load BBBP (Blood-Brain Barrier Penetration) dataset
config = MolNetSourceConfig(
    dataset_name="bbbp",
    split="train",  # "train", "valid", or "test"
    download=True,  # Auto-download if not found
)
source = MolNetSource(config)

print(f"Dataset size: {len(source)}")
print(f"Task type: {source.task_type}")  # "classification"
print(f"Number of tasks: {source.n_tasks}")  # 1

# Iterate over elements
for element in source:
    smiles = element.data["smiles"]
    label = element.data["y"]
    print(f"{smiles}: {label}")

Available MolNet Datasets

Dataset	Task Type	Tasks	Description
`bbbp`	classification	1	Blood-brain barrier penetration
`tox21`	classification	12	Toxicity across 12 assays
`hiv`	classification	1	HIV replication inhibition
`bace`	classification	1	BACE-1 inhibitor activity
`clintox`	classification	2	Clinical trial toxicity
`sider`	classification	27	Drug side effects
`esol`	regression	1	Aqueous solubility
`freesolv`	regression	1	Hydration free energy
`lipophilicity`	regression	1	Octanol/water distribution coefficient (logD at pH 7.4)
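
The table maps directly onto the `task_type` and `n_tasks` properties. A hypothetical registry sketch, not the diffbio implementation, that mirrors the table and, like MolNetSource, rejects unknown names with a `ValueError`:

```python
from __future__ import annotations

# (task_type, n_tasks) per dataset, mirroring the Available MolNet Datasets table
MOLNET_TASKS: dict[str, tuple[str, int]] = {
    "bbbp": ("classification", 1),
    "tox21": ("classification", 12),
    "hiv": ("classification", 1),
    "bace": ("classification", 1),
    "clintox": ("classification", 2),
    "sider": ("classification", 27),
    "esol": ("regression", 1),
    "freesolv": ("regression", 1),
    "lipophilicity": ("regression", 1),
}


def task_info(dataset_name: str) -> tuple[str, int]:
    """Return (task_type, n_tasks), raising ValueError for unknown datasets."""
    try:
        return MOLNET_TASKS[dataset_name]
    except KeyError:
        known = ", ".join(sorted(MOLNET_TASKS))
        raise ValueError(
            f"Unknown MolNet dataset {dataset_name!r}; expected one of: {known}"
        ) from None


print(task_info("tox21"))  # ('classification', 12)
```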

Using IndexedViewSource for Lazy Loading

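from diffbio.sources import IndexedViewSource, IndexedViewSourceConfig
from diffbio.sources import MolNetSource, MolNetSourceConfig
from diffbio.splitters import ScaffoldSplitter, ScaffoldSplitterConfig

# Source to split; any source whose elements carry a "smiles" field works
data_source = MolNetSource(MolNetSourceConfig(dataset_name="bbbp", split="train"))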

# Create a splitter
splitter_config = ScaffoldSplitterConfig(smiles_key="smiles")
splitter = ScaffoldSplitter(splitter_config)

# Get split indices
result = splitter.split(data_source)

# Create lazy view for training data
view_config = IndexedViewSourceConfig(shuffle=True, seed=42)
train_view = IndexedViewSource(
    view_config,
    data_source,
    result.train_indices,
)

# Iterate - elements loaded on demand
for element in train_view:
    print(element.data["smiles"])
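
Scaffold splitting keeps every molecule that shares a scaffold in the same split, so evaluation sees chemotypes absent from training. A common recipe (used, for example, by DeepChem's `ScaffoldSplitter`) sorts scaffold groups from largest to smallest and fills train, then valid, then test. The sketch below assumes scaffold keys were already computed (for instance with RDKit); `group_split` is a hypothetical helper and is not necessarily what `diffbio.splitters.ScaffoldSplitter` does.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Sequence


def group_split(keys: Sequence[Hashable], frac_train: float = 0.8,
                frac_valid: float = 0.1) -> tuple[list[int], list[int], list[int]]:
    """Assign whole groups (e.g. scaffolds) to train/valid/test, largest first."""
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    # Largest groups first; tie-break on first index for determinism
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(keys)
    train_cut, valid_cut = frac_train * n, (frac_train + frac_valid) * n
    train: list[int] = []
    valid: list[int] = []
    test: list[int] = []
    for group in ordered:
        if len(train) + len(group) <= train_cut:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cut:
            valid.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(valid), sorted(test)


keys = ["a", "a", "a", "a", "b", "b", "c", "d", "e", "f"]
print(group_split(keys))  # ([0, 1, 2, 3, 4, 5, 6, 7], [8], [9])
```

Because whole groups move together, the resulting split sizes only approximate the requested fractions.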

Integration with Datarax Samplers

from diffbio.sources import MolNetSource, MolNetSourceConfig
from diffbio.splitters import ScaffoldSplitter, ScaffoldSplitterConfig
from datarax.samplers import ShuffleSampler, ShuffleSamplerConfig

# Load dataset
source_config = MolNetSourceConfig(dataset_name="bbbp", split="train")
source = MolNetSource(source_config)

# Split by scaffold
splitter_config = ScaffoldSplitterConfig(smiles_key="smiles")
splitter = ScaffoldSplitter(splitter_config)

# Create split sources (lazy loading)
train_source, valid_source, test_source = splitter.create_split_sources(
    source,
    lazy=True,
)

# Use with Datarax sampler
sampler_config = ShuffleSamplerConfig(batch_size=32)
train_sampler = ShuffleSampler(sampler_config, data_source=train_source)

# Training loop
for batch in train_sampler:
    # Process batch
    pass

Custom Data Directory

from pathlib import Path
from diffbio.sources import MolNetSource, MolNetSourceConfig

# Specify custom data directory
config = MolNetSourceConfig(
    dataset_name="tox21",
    split="train",
    data_dir=Path("/custom/path/to/data"),
    download=True,
)
source = MolNetSource(config)

Accessing Individual Elements

from diffbio.sources import MolNetSource, MolNetSourceConfig

config = MolNetSourceConfig(dataset_name="esol", split="train")
source = MolNetSource(config)

# Access by index
element = source[0]
if element is not None:
    smiles = element.data["smiles"]
    solubility = element.data["y"]
    metadata = element.metadata  # {"idx": 0, "dataset": "esol"}

Data Element Format

MolNetSource Elements

Each element from MolNetSource contains:

Field	Type	Description
`data["smiles"]`	str	SMILES representation of molecule
`data["y"]`	float or jnp.ndarray	Label(s) for the molecule
`state`	dict	Empty state dictionary
`metadata["idx"]`	int	Index within the split
`metadata["dataset"]`	str	Dataset name

IndexedViewSource Elements

Elements from IndexedViewSource are passed through from the underlying source, with indices remapped to the view's subset.

Configuration Options

MolNetSourceConfig

Parameter	Type	Default	Description
`dataset_name`	str	required	Name of MolNet dataset
`split`	str	"train"	Which split: "train", "valid", "test"
`data_dir`	Path | None	None (uses ~/.diffbio/molnet)	Data storage directory
`download`	bool	True	Auto-download if missing

IndexedViewSourceConfig

Parameter	Type	Default	Description
`shuffle`	bool	False	Shuffle view indices on initialization and reset
`seed`	int | None	None	Random seed for shuffling
