pyprobound.table.CountTable
- class CountTable(dataframe, alphabet, transliterate=None, transliterate_flanks=False, left_flank='', right_flank='', left_flank_length=0, right_flank_length=0, max_left_flank_length=None, max_right_flank_length=None, wildcard_pad=False, min_variable_length=None, max_variable_length=None)
Bases: Table[CountBatch], CountBatch
A tensor encoding of a count table with flank management.
- seqs
A sequence tensor of shape \((\text{minibatch},\text{length})\) or \((\text{minibatch},\text{in_channels},\text{length})\).
- Type:
Tensor
- target
A count tensor of shape \((\text{minibatch},\text{rounds})\).
- Type:
Tensor
- left_flank
The prepended sequence.
- Type:
str
- right_flank
The appended sequence.
- Type:
str
- left_flank_length
The scored length of the left flank.
- Type:
int
- right_flank_length
The scored length of the right flank.
- Type:
int
- counts_per_round
The number of probes in each round of the count table, as a count tensor of shape \((\text{rounds})\).
- Type:
Tensor
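As a rough illustration of how counts_per_round relates to target, the sketch below assumes it is the column-wise sum of the count tensor, giving one total per round. It uses plain Python lists instead of tensors. The toy counts and the helper name column_totals are hypothetical and not part of pyprobound.

```python
# Hypothetical toy count table: one row per probe, one column per round.
# This stands in for a target tensor of shape (minibatch, rounds).
target = [
    [5, 0],
    [2, 3],
    [0, 7],
]


def column_totals(rows):
    """Sum each column, giving one total per round (assumed definition)."""
    return [sum(column) for column in zip(*rows)]


totals = column_totals(target)
print(totals)  # one entry per round, i.e. shape (rounds,)
```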
- __init__(dataframe, alphabet, transliterate=None, transliterate_flanks=False, left_flank='', right_flank='', left_flank_length=0, right_flank_length=0, max_left_flank_length=None, max_right_flank_length=None, wildcard_pad=False, min_variable_length=None, max_variable_length=None)
Initializes the count table.
- Parameters:
dataframe (DataFrame) – The dataframe used to initialize the count table.
alphabet (Alphabet) – The alphabet used to encode sequences into tensors.
transliterate (dict[str, str] | None) – A mapping of strings to be replaced before encoding.
transliterate_flanks (bool) – Whether to apply transliteration to flanks.
left_flank (str) – The prepended sequence.
right_flank (str) – The appended sequence.
left_flank_length (int) – The scored length of the left flank.
right_flank_length (int) – The scored length of the right flank.
max_left_flank_length (int | None) – The maximum allowed length of the prepended sequence.
max_right_flank_length (int | None) – The maximum allowed length of the appended sequence.
wildcard_pad (bool) – Whether to append a wildcard character (e.g., N for DNA) to all sequences to make them the same length.
min_variable_length (int | None) – The minimum possible length of the sequences (needed if using train/test splits on variable-length data).
max_variable_length (int | None) – The maximum possible length of the sequences (needed if using train/test splits on variable-length data).
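The flank and padding parameters can be pictured with a small string-level sketch. This is not pyprobound's implementation. It assumes that the scored part of each flank is the part adjacent to the variable region (the last left_flank_length characters of left_flank and the first right_flank_length characters of right_flank), and that wildcard padding appends N until all sequences share the longest length. The helper names add_flanks and pad_with_wildcard are hypothetical.

```python
def add_flanks(variable, left_flank="", right_flank="",
               left_flank_length=0, right_flank_length=0):
    """Attach the scored part of each flank to a variable region.

    Assumption: the scored flank positions are those nearest the variable region.
    """
    if not 0 <= left_flank_length <= len(left_flank):
        raise ValueError("left_flank_length exceeds the left flank")
    if not 0 <= right_flank_length <= len(right_flank):
        raise ValueError("right_flank_length exceeds the right flank")
    # Slice from an explicit start index so a length of 0 yields an empty string.
    left = left_flank[len(left_flank) - left_flank_length:]
    right = right_flank[:right_flank_length]
    return left + variable + right


def pad_with_wildcard(seqs, wildcard="N"):
    """Append wildcard characters so every sequence has the longest length."""
    longest = max(len(s) for s in seqs)
    return [s + wildcard * (longest - len(s)) for s in seqs]


flanked = add_flanks("ACGT", left_flank="GGTTAA", right_flank="CCAAGG",
                     left_flank_length=2, right_flank_length=3)
padded = pad_with_wildcard(["ACG", "ACGTT"])
print(flanked)
print(padded)
```

With both flank lengths left at 0, add_flanks returns the variable region unchanged, which mirrors the constructor defaults listed above.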
Methods
batchlen() – Batch length.
get_setup_string() – A description used when printing the output of an optimizer.
set_flank_length([left, right]) – Updates the length of flanks included in sequences.
Attributes
input_shape – The number of elements in the length dimension.
left_flank – The prepended sequence.
left_flank_length – The scored length of the left flank.
max_read_length – The maximum number of finite elements in the length dimension.
min_read_length – The minimum number of finite elements in the length dimension.
right_flank – The appended sequence.
right_flank_length – The scored length of the right flank.
Non-Inherited Members
- property input_shape: int
The number of elements in the length dimension.
- property min_read_length: int
The minimum number of finite elements in the length dimension.
- property max_read_length: int
The maximum number of finite elements in the length dimension.
- get_setup_string()
A description used when printing the output of an optimizer.
- Return type:
str
- set_flank_length(left=0, right=0)
Updates the length of flanks included in sequences.
- Parameters:
left (int) – The new length of the prepended sequence.
right (int) – The new length of the appended sequence.
- Return type:
None