Annotated sequence data

API

Readers

These classes read data from a variety of file formats into a SeqData object.

BAM(name, bams, samples, batch_size[, ...])

VCF(name, vcf, fasta, batch_size[, samples, ...])

BigWig(name, bigwigs, samples, batch_size[, ...])

FlatFASTA(name, fasta, batch_size[, n_threads])

GenomeFASTA(name, fasta, batch_size[, ...])

Table(name, tables, seq_col, batch_size, ...)

Composing readers

These functions combine one or more readers to load data from a variety of file formats into a single SeqData object.

from_flat_files(*readers, path, fixed_length)

Composable function to create a SeqData object from flat files.

from_region_files(*readers, path, ...[, ...])

Composable function to create a SeqData object from region based files.

Default readers

These functions are special cases of the composable readers, provided for common use cases.

read_bam(seq_name, cov_name, out, fasta, ...)

Read in sequences with coverage from a BAM file.

read_bigwig(seq_name, cov_name, out, fasta, ...)

Read a bigWig file and return a Dataset.

read_flat_fasta(name, out, fasta, ...[, ...])

Reads sequences from a "flat" FASTA file into xarray.
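
To make the term "flat" FASTA concrete, here is a minimal sketch that assumes it means a FASTA file in which every record is an independent sequence. The helper `parse_flat_fasta` is hypothetical and uses only the standard library; it is not SeqData's implementation and does not call `read_flat_fasta`.

```python
import io


def parse_flat_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle.

    Hypothetical helper for illustration only; not part of SeqData.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the record name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


example = io.StringIO(">seq1 first record\nACGT\nAC\n>seq2\nGGCC\n")
records = list(parse_flat_fasta(example))
```

Each record becomes one sequence, which is the unit a flat-file reader would store along the sequence axis.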

read_genome_fasta(name, out, fasta, bed, ...)

Reads sequences from a "genome" FASTA file into xarray.
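
As a rough illustration of region-based reading, the sketch below assumes that a "genome" FASTA holds whole chromosomes or contigs and that the BED file selects 0-based, half-open intervals from them. The function `extract_regions` and the toy data are hypothetical; this is not how `read_genome_fasta` is implemented.

```python
def extract_regions(genome, regions):
    """Slice each (chrom, start, end) region out of an in-memory genome.

    Hypothetical helper: `genome` maps contig names to sequence strings and
    `regions` uses BED-style 0-based, half-open coordinates.
    """
    return [genome[chrom][start:end] for chrom, start, end in regions]


# Toy genome and regions, made up for the example.
genome = {"chr1": "ACGTACGTAC", "chr2": "GGGCCC"}
bed_regions = [("chr1", 2, 6), ("chr2", 0, 3)]
seqs = extract_regions(genome, bed_regions)
```

Here every BED row yields one extracted sequence, in contrast to a flat FASTA where every record is already a sequence.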

read_table(name, out, tables, seq_col, ...)

Reads sequences and metadata from tabular files.

read_vcf(name, out, vcf, fasta, samples, ...)

Read a VCF file and return a Dataset.

read_bedlike(path)

Reads a bed-like (BED3+) file as a pandas DataFrame.
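
For readers unfamiliar with the BED3+ layout, the following sketch parses a tab-separated file whose first three columns are the contig, start, and end, keeping any further columns as extras. It uses only the standard library and returns plain dictionaries rather than a pandas DataFrame; `parse_bedlike` and its field names are assumptions for illustration, not the behaviour of `read_bedlike`.

```python
import csv
import io


def parse_bedlike(handle):
    """Parse BED3+ rows into dictionaries (hypothetical helper)."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comments, and UCSC-style header lines.
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got {len(fields)}")
        rows.append(
            {
                "chrom": fields[0],
                "chromStart": int(fields[1]),
                "chromEnd": int(fields[2]),
                "extra": fields[3:],  # any BED4+ columns, left unparsed
            }
        )
    return rows


bed_text = "# comment\nchr1\t10\t20\tpeak1\nchr2\t5\t15\n"
rows = parse_bedlike(io.StringIO(bed_text))
```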

Zarr

SeqData reads and writes all datasets to disk as Zarr stores using the following functions:

to_zarr(sdata, store[, chunk_store, mode, ...])

Write an xarray object to disk as a Zarr store.

open_zarr(store[, group, synchronizer, ...])

Open a SeqData object from disk.

PyTorch dataloading

SeqData provides a unified interface for converting SeqData objects into PyTorch dataloaders.

get_torch_dataloader(sdata, sample_dims, ...)

Get a PyTorch DataLoader for this SeqData.

Utilities

Utility functions for working with SeqData objects.

add_bed_to_sdata(sdata, bed[, col_prefix, ...])

Add a BED-like DataFrame to a Dataset.

label_overlapping_regions(sdata, targets, mode)

Label regions for binary or multitask classification based on whether they overlap with another set of regions.
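
The idea of overlap-based labelling can be sketched in a few lines. The example below covers only the binary case and assumes 0-based, half-open intervals; the function `label_overlaps` is hypothetical and does not reproduce the signature or the `mode` semantics of `label_overlapping_regions`.

```python
def label_overlaps(regions, targets):
    """Return 1 for each region that overlaps any target, else 0.

    Hypothetical helper: both inputs are iterables of (chrom, start, end)
    with 0-based, half-open coordinates.
    """
    # Group targets by contig so each region is only compared to its own contig.
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))

    labels = []
    for chrom, start, end in regions:
        hit = any(
            start < t_end and t_start < end
            for t_start, t_end in by_chrom.get(chrom, [])
        )
        labels.append(int(hit))
    return labels


query_regions = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
target_regions = [("chr1", 90, 120), ("chr2", 50, 80)]
labels = label_overlaps(query_regions, target_regions)
```

Note that under the half-open convention assumed here, intervals that merely touch (such as `chr2:0-50` and `chr2:50-80`) do not count as overlapping. A multitask variant would produce one such label column per target set.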

merge_obs(sdata, obs[, on, left_on, ...])

Merge observations into a SeqData object along the sequence axis.