seqdata.label_overlapping_regions
- seqdata.label_overlapping_regions(sdata, targets, mode, label_dim=None, fraction_overlap=None)
Label regions for binary or multitask classification based on whether they overlap with another set of regions.
- Parameters:
sdata (xr.Dataset) – Dataset containing the regions to label.
targets (Union[str, Path, pd.DataFrame, List[str]]) – Either a DataFrame (or a path to one) of target regions, or a list of variable names in sdata that correspond to the [‘chrom’, ‘chromStart’, ‘chromEnd’] columns, in that order. The list form is useful if, for example, another set of regions is already in the sdata object under a different set of column names. Binary classification requires at least the columns [‘chrom’, ‘chromStart’, ‘chromEnd’] (BED3 format); multitask classification additionally requires the ‘name’ column (BED4 format).
mode (Literal["binary", "multitask"]) – Whether to label regions for binary classification (does the region intersect any of the target regions?) or multitask classification (which target region does it intersect?).
label_dim (str, optional) – Name of the label dimension. Only needed for multitask classification.
fraction_overlap (float, optional) – Minimum fraction of each region’s length that must overlap for the pair to count as overlapping. This is the “reciprocal minimal overlap fraction” described in the [bedtools documentation](https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html#r-and-f-requiring-reciprocal-minimal-overlap-fraction).
- Return type:
DataArray
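
The labeling semantics described above can be illustrated with a small pure-Python sketch. This is not seqdata's implementation and does not call the library. The helper names (`overlap_bp`, `is_overlap`, `label_binary`, `label_multitask`) and the example coordinates are hypothetical. The sketch assumes half-open BED coordinates, that multitask mode yields one boolean label per distinct ‘name’ value, and that omitting `fraction_overlap` means any shared base counts as an overlap.

```python
from typing import Dict, List, Optional, Tuple

# (chrom, chromStart, chromEnd), half-open as in BED. Hypothetical type alias.
Region = Tuple[str, int, int]


def overlap_bp(a: Region, b: Region) -> int:
    """Number of bases shared by two regions (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def is_overlap(a: Region, b: Region, fraction_overlap: Optional[float] = None) -> bool:
    """Any-overlap test, or reciprocal minimal overlap when a fraction is given."""
    shared = overlap_bp(a, b)
    if fraction_overlap is None:
        return shared > 0
    # Reciprocal: the shared bases must cover the fraction of BOTH regions.
    return (
        shared > 0
        and shared >= fraction_overlap * (a[2] - a[1])
        and shared >= fraction_overlap * (b[2] - b[1])
    )


def label_binary(
    regions: List[Region],
    targets: List[Region],
    fraction_overlap: Optional[float] = None,
) -> List[bool]:
    """One label per region: does it overlap any target?"""
    return [any(is_overlap(r, t, fraction_overlap) for t in targets) for r in regions]


def label_multitask(
    regions: List[Region],
    named_targets: List[Tuple[str, int, int, str]],  # BED4: region plus 'name'
    fraction_overlap: Optional[float] = None,
) -> Dict[str, List[bool]]:
    """One label per (name, region): does the region overlap a target with that name?"""
    names = sorted({t[3] for t in named_targets})
    return {
        name: [
            any(
                is_overlap(r, t[:3], fraction_overlap)
                for t in named_targets
                if t[3] == name
            )
            for r in regions
        ]
        for name in names
    }


# Made-up example data.
regions = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 100)]
bed4 = [("chr1", 50, 150, "peakA"), ("chr1", 290, 400, "peakB")]
bed3 = [t[:3] for t in bed4]

binary_any = label_binary(regions, bed3)
binary_half = label_binary(regions, bed3, fraction_overlap=0.5)
multitask = label_multitask(regions, bed4)
```

In this example the second region shares only 10 bases with its target, so it is labeled as overlapping by default but not once a reciprocal fraction of 0.5 is required.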