pyprobound.table.sample_dataframe
- sample_dataframe(dataframe, frac=0.1, random_state=None, n_bin=128)
Randomly samples from a dataframe evenly by enrichment.
To make validation or test splits representative of the training data, bin sequences by their overall enrichment and sample evenly within each bin.
- Parameters:
  - dataframe (DataFrame) – The input dataframe to be sampled from.
  - frac (float) – The proportion of reads to be sampled from the dataframe.
  - random_state (int | None) – A seed used to make the output reproducible.
  - n_bin (int) – The bin size used to sample sequences from.
- Return type:
tuple[DataFrame, DataFrame]
- Returns:
A tuple of two dataframes, the first containing frac of the original dataframe, the second containing 1 - frac of the original dataframe.
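
The binning idea described above can be illustrated with a minimal sketch. This is not the pyprobound implementation: the function name `split_evenly_by_enrichment`, the enrichment score (log-ratio of the last count column to the first, with a pseudocount of 1), and the reading of `n_bin` as the number of rows per bin are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_evenly_by_enrichment(df, frac=0.1, random_state=None, n_bin=128):
    """Hypothetical sketch: return (sampled, remainder) dataframes.

    Rows are sorted by an assumed enrichment score, cut into consecutive
    bins of `n_bin` rows, and `frac` of each bin is drawn at random.
    """
    # Assumption: enrichment = log((last column + 1) / (first column + 1)).
    first = df.iloc[:, 0].to_numpy()
    last = df.iloc[:, -1].to_numpy()
    enrichment = np.log((last + 1) / (first + 1))

    # Row positions ordered from least to most enriched.
    order = np.argsort(enrichment, kind="stable")

    rng = np.random.default_rng(random_state)
    mask = np.zeros(len(df), dtype=bool)
    for start in range(0, len(order), n_bin):
        chunk = order[start:start + n_bin]
        n_take = int(round(frac * len(chunk)))
        # Draw without replacement so no row is picked twice.
        mask[rng.choice(chunk, size=n_take, replace=False)] = True

    return df[mask], df[~mask]


# Toy count table with two hypothetical selection rounds.
gen = np.random.default_rng(0)
counts = pd.DataFrame(
    {
        "round_0": gen.poisson(5, size=1000),
        "round_1": gen.poisson(5, size=1000),
    }
)
held_out, training = split_evenly_by_enrichment(
    counts, frac=0.1, random_state=0, n_bin=100
)
```

Because every bin contributes the same fraction of its rows, the held-out split covers the full range of enrichment rather than being dominated by the most common, weakly enriched sequences. With the documented function, the corresponding call would take the same arguments, `sample_dataframe(dataframe, frac=0.1, random_state=0)`, and return the two dataframes in the order described under Returns.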