Annotated sequence data

API

Readers

These classes read data from a variety of file formats into a SeqData object.

BAM(name, bams, samples, batch_size[, ...])

VCF(name, vcf, fasta, batch_size[, samples, ...])

BigWig(name, bigwigs, samples, batch_size[, ...])

FlatFASTA(name, fasta, batch_size[, n_threads])

GenomeFASTA(name, fasta, batch_size[, ...])

Table(name, tables, seq_col, batch_size, ...)

Composing readers

These functions compose one or more readers to load data from a variety of file formats into a single SeqData object.

from_flat_files(*readers, path, fixed_length)

Save a SeqData to disk and open it (without loading it into memory).

from_region_files(*readers, path, ...[, ...])

Save a SeqData to disk and open it (without loading it into memory).

Default readers

These functions are special cases of the composable readers that cover common use cases.

read_bam(seq_name, cov_name, out, fasta, ...)

Return type: Dataset

read_bigwig(seq_name, cov_name, out, fasta, ...)

Return type: Dataset

read_flat_fasta(name, out, fasta, ...[, ...])

Return type: Dataset

read_genome_fasta(name, out, fasta, bed, ...)

Return type: Dataset

read_table(name, out, tables, seq_col, ...)

Return type: Dataset

read_vcf(name, out, vcf, fasta, samples, ...)

Return type: Dataset

read_bedlike(path)

Reads a bed-like (BED3+) file as a pandas DataFrame.
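To make the BED3+ layout concrete, here is a minimal standard-library sketch of parsing such a file. It is not how read_bedlike is implemented, and it returns a plain list of dicts rather than a pandas DataFrame. The function name parse_bed3_plus, the sample data, and the "extra" key are hypothetical; the column names chrom, chromStart, and chromEnd follow the BED format convention.

```python
import csv
import io

# Hypothetical BED3+ content: chrom, chromStart, chromEnd, then optional extra columns.
BED_TEXT = (
    "chr1\t100\t200\tpeak_a\n"
    "chr1\t150\t400\tpeak_b\n"
    "chr2\t0\t50\tpeak_c\n"
)


def parse_bed3_plus(handle):
    """Parse tab-separated BED3+ lines into a list of dicts (illustrative only)."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and common header or comment lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        records.append(
            {
                "chrom": row[0],
                "chromStart": int(row[1]),  # 0-based start
                "chromEnd": int(row[2]),    # exclusive end
                "extra": row[3:],           # any columns beyond the first three
            }
        )
    return records


records = parse_bed3_plus(io.StringIO(BED_TEXT))
```

The first three columns are required; anything after them is kept as-is, which is what the "+" in BED3+ refers to.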

Writing

SeqData writes all datasets to disk as Zarr stores. The following function opens a Zarr store:

open_zarr(store[, group, synchronizer, ...])

PyTorch dataloading

SeqData provides a unified interface for converting SeqData objects into PyTorch dataloaders.

get_torch_dataloader(sdata, sample_dims, ...)

Get a PyTorch DataLoader for this SeqData.

Utilities

Utility functions for working with SeqData objects.

add_bed_to_sdata(sdata, bed[, col_prefix, ...])

label_overlapping_regions(sdata, targets, mode)

Label regions for binary or multitask classification based on whether they overlap with another set of regions.
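The idea behind overlap-based labeling can be sketched in plain Python. This is an illustration of the concept only, not the library's implementation: the helper names, the tuple representation of regions, and the split into a binary and a multitask function are assumptions, and the values that the mode parameter actually accepts are not stated here.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def label_binary(regions, targets):
    """One 0/1 label per region: 1 if it overlaps any target region."""
    return [int(any(overlaps(r, t) for t in targets)) for r in regions]


def label_multitask(regions, targets_by_task):
    """One 0/1 label per (region, task) pair; tasks are ordered by name."""
    tasks = sorted(targets_by_task)
    labels = [
        [int(any(overlaps(r, t) for t in targets_by_task[task])) for task in tasks]
        for r in regions
    ]
    return tasks, labels


# Hypothetical example data.
regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
targets = [("chr1", 150, 250), ("chr2", 50, 80)]
targets_by_task = {
    "ctcf": [("chr1", 150, 250)],
    "h3k27ac": [("chr1", 350, 360), ("chr2", 10, 20)],
}

binary_labels = label_binary(regions, targets)
tasks, multitask_labels = label_multitask(regions, targets_by_task)
```

With half-open coordinates, a region ending at position 50 and a target starting at position 50 are adjacent and do not overlap. The nested loop is quadratic in the number of regions; a real implementation would typically sort the intervals or use an interval index.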

merge_obs(sdata, obs[, on, left_on, ...])