seqdata.read_table
- seqdata.read_table(name, out, tables, seq_col, batch_size, fixed_length, overwrite=False, **kwargs)
Reads sequences and metadata from tabular files (e.g. CSV or TSV) into an xarray Dataset.
Uses polars under the hood to read the table files.
- Parameters:
name (str) – Name of the sequence variable in the output dataset.
out (PathType) – Path to the output Zarr store where the data will be saved. Usually something like /path/to/dataset_name.zarr.
tables (Union[PathType, ListPathType]) – Path to the input table file(s). Can be a single file or a list of files.
seq_col (str) – Name of the column in the table that contains the sequence.
batch_size (int) – Number of sequences to read at a time. Use the largest value that fits in memory.
fixed_length (bool) – Whether your sequences have a fixed length or not. If they do, the data will be stored in a 2D array as bytes; otherwise, it will be stored as unicode strings.
overwrite (bool) – Whether to overwrite the output Zarr store if it already exists.
**kwargs – Additional keyword arguments to pass to the polars read_csv function.
- Returns:
The output dataset.
- Return type:
xr.Dataset
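Example:
The following is a minimal usage sketch, not taken from the library's own documentation. It writes a small CSV file with the standard library and then calls read_table with the parameters listed above. The file names, the column names (seq, label), and the batch size are made up for illustration. The seqdata import is done lazily and guarded, so the file-writing part still runs when the package is not installed.

```python
import csv
import tempfile
from pathlib import Path

# Write a tiny comma-separated table: one sequence column, one metadata column.
# Column names and values are illustrative only.
workdir = Path(tempfile.mkdtemp())
table_path = workdir / "example.csv"
rows = [
    {"seq": "ACGT", "label": "a"},
    {"seq": "TTGA", "label": "b"},
    {"seq": "GGCC", "label": "c"},
]
with open(table_path, "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["seq", "label"])
    writer.writeheader()
    writer.writerows(rows)


def load_example(table, out):
    """Call read_table with the documented parameters (requires seqdata)."""
    import seqdata as sd  # lazy import: only needed for this call

    return sd.read_table(
        name="seq",          # name of the sequence variable in the output dataset
        out=out,             # output Zarr store
        tables=table,        # a single file; a list of files is also accepted
        seq_col="seq",       # column holding the sequences
        batch_size=2,        # deliberately small for this toy table
        fixed_length=True,   # every sequence above has the same length
        overwrite=True,      # replace the store if it already exists
    )


try:
    sdata = load_example(table_path, workdir / "example.zarr")
    print(sdata)
except ImportError:
    print("seqdata is not installed; skipping the read_table call")
```

For sequences of varying length, the same call with fixed_length=False would apply. Reader options such as a non-default delimiter would go through **kwargs, subject to whatever the polars read_csv function accepts.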