seqdata.read_table

seqdata.read_table(name, out, tables, seq_col, batch_size, fixed_length, overwrite=False, **kwargs)

Reads sequences and metadata from tabular files (e.g. CSV or TSV) into an xarray Dataset, which is saved to a Zarr store.

Uses polars under the hood to read the table files.
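To show what "sequences and metadata" means here, the following stdlib-only sketch splits a small TSV into the sequence column and the remaining metadata columns. It is not seqdata's implementation, which uses polars. The helper split_table and the column names seq and target are hypothetical. Unlike polars, the csv module does not infer column types, so metadata values stay strings in this sketch.

```python
import csv
import io


def split_table(text: str, seq_col: str, delimiter: str = "\t"):
    """Hypothetical helper: return (sequences, metadata) from a delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    sequences = []
    metadata = {}
    for row in reader:
        # The sequence column becomes the sequence variable.
        sequences.append(row[seq_col])
        # Every other column is collected as per-sequence metadata.
        for col, value in row.items():
            if col != seq_col:
                metadata.setdefault(col, []).append(value)
    return sequences, metadata


table = "seq\ttarget\nACGT\t0.5\nGGCA\t1.2\n"
seqs, meta = split_table(table, seq_col="seq")
print(seqs)  # ['ACGT', 'GGCA']
print(meta)  # {'target': ['0.5', '1.2']}
```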

Parameters:
  • name (str) – Name of the sequence variable in the output dataset.

  • out (PathType) – Path to the output Zarr store where the data will be saved, typically of the form /path/to/dataset_name.zarr.

  • tables (Union[PathType, ListPathType]) – Path to the input table file(s). Can be a single file or a list of files.

  • seq_col (str) – Name of the column in the table that contains the sequence.

  • batch_size (int) – Number of sequences (table rows) to read at a time. Use the largest value that fits in memory.

  • fixed_length (bool) – Whether all sequences have the same length. If True, the sequences are stored as a 2D array of bytes (one row per sequence, one column per position); if False, they are stored as unicode strings.

  • overwrite (bool) – Whether to overwrite the output Zarr store if it already exists.

  • **kwargs – Additional keyword arguments to pass to the polars read_csv function.
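The effect of batch_size and fixed_length can be pictured with the sketch below. It assumes numpy and uses two hypothetical helpers, iter_batches and encode_batch; it is not seqdata's actual implementation, and the dtype seqdata uses for variable-length sequences may differ from the object array used here.

```python
import numpy as np


def iter_batches(seqs, batch_size):
    """Hypothetical helper: yield consecutive slices of at most batch_size sequences."""
    for start in range(0, len(seqs), batch_size):
        yield seqs[start:start + batch_size]


def encode_batch(batch, fixed_length):
    """Hypothetical helper: encode one batch the way fixed_length describes."""
    if fixed_length:
        length = len(batch[0])
        if any(len(s) != length for s in batch):
            raise ValueError("fixed_length=True but sequence lengths differ")
        # One byte per position: shape (n_sequences, length), dtype |S1.
        joined = "".join(batch).encode("ascii")
        return np.frombuffer(joined, dtype="S1").reshape(len(batch), length)
    # Variable length: keep one Python str per sequence (assumed object dtype).
    return np.array(batch, dtype=object)


seqs = ["ACGT", "GGCA", "TTAA"]
for batch in iter_batches(seqs, batch_size=2):
    arr = encode_batch(batch, fixed_length=True)
    print(arr.shape)  # (2, 4), then (1, 4)
```

Storing fixed-length sequences as a rectangular byte array lets every position be indexed across all sequences at once, which variable-length strings do not allow.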

Returns:

The output dataset.

Return type:

xr.Dataset