pyprobound.table.CountTable

class CountTable(dataframe, alphabet, transliterate=None, transliterate_flanks=False, left_flank='', right_flank='', left_flank_length=0, right_flank_length=0, max_left_flank_length=None, max_right_flank_length=None, wildcard_pad=False, min_variable_length=None, max_variable_length=None)

Bases: Table[CountBatch], CountBatch

A tensor encoding of a count table with flank management.

seqs

A sequence tensor of shape \((\text{minibatch},\text{length})\) or \((\text{minibatch},\text{in\_channels},\text{length})\).

Type:

Tensor

target

A count tensor of shape \((\text{minibatch},\text{rounds})\).

Type:

Tensor
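The shapes of seqs and target can be made concrete with a minimal sketch. It assumes a toy table keyed by sequence, with one count per round. The encode_table helper and the "ACGT" ordering are hypothetical, and plain lists stand in for tensors. The real class builds its tensors from a DataFrame and an Alphabet.

```python
# Hypothetical sketch: plain lists stand in for torch tensors, and the
# "ACGT" ordering is an assumed alphabet, not pyprobound's Alphabet class.
from typing import Dict, List, Tuple

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def encode_table(table: Dict[str, List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (seqs, target) with shapes (minibatch, length) and (minibatch, rounds)."""
    seqs = [[INDEX[base] for base in seq] for seq in table]
    target = [list(counts) for counts in table.values()]
    return seqs, target


# Toy data: two probes, each counted in three rounds.
table = {"ACGT": [5, 2, 0], "TTGA": [3, 4, 9]}
seqs, target = encode_table(table)
print(seqs)    # [[0, 1, 2, 3], [3, 3, 2, 0]]
print(target)  # [[5, 2, 0], [3, 4, 9]]
```

Each row of seqs lines up with the same row of target, so the minibatch dimension is shared between the two.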

left_flank

The prepended sequence.

Type:

str

right_flank

The appended sequence.

Type:

str

left_flank_length

The scored length of the left flank.

Type:

int

right_flank_length

The scored length of the right flank.

Type:

int
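A flank length scores only part of each flank. The sketch below assumes the scored part is the segment adjacent to the variable region: the end of the left flank and the start of the right flank. The scored_sequence helper and its range checks are hypothetical illustrations of that assumption, not the library's implementation.

```python
# Hypothetical sketch: assumes the scored part of each flank is the part
# adjacent to the variable region (end of the left flank, start of the right).
def scored_sequence(seq: str, left_flank: str, right_flank: str,
                    left_flank_length: int, right_flank_length: int) -> str:
    """Return the variable region with the scored flank segments attached."""
    if not 0 <= left_flank_length <= len(left_flank):
        raise ValueError("left_flank_length must be between 0 and len(left_flank)")
    if not 0 <= right_flank_length <= len(right_flank):
        raise ValueError("right_flank_length must be between 0 and len(right_flank)")
    # Slice with an explicit start index so a length of 0 keeps nothing
    # (left_flank[-0:] would keep the whole flank).
    left = left_flank[len(left_flank) - left_flank_length:]
    right = right_flank[:right_flank_length]
    return left + seq + right


print(scored_sequence("ACGT", "GGGCC", "TTAAA", 2, 3))  # CCACGTTTA
print(scored_sequence("ACGT", "GGGCC", "TTAAA", 0, 0))  # ACGT
```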

counts_per_round

The number of probes in each round of the count table, as a count tensor of shape \((\text{rounds})\).

Type:

Tensor
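counts_per_round can be read as the total count in each column of target. That reading is an assumption. Under it, the tensor version would be target.sum(dim=0). The same reduction on plain lists, with a hypothetical helper name:

```python
# Hypothetical sketch: assumes counts_per_round is the column sum of target,
# i.e. the total count of all probes in each round. With a torch tensor this
# would be target.sum(dim=0).
from typing import List


def counts_per_round(target: List[List[int]]) -> List[int]:
    """Sum a (minibatch, rounds) count table over the minibatch dimension."""
    if not target:
        return []
    n_rounds = len(target[0])
    return [sum(row[r] for row in target) for r in range(n_rounds)]


print(counts_per_round([[5, 2, 0], [3, 4, 9]]))  # [8, 6, 9]
```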

__init__(dataframe, alphabet, transliterate=None, transliterate_flanks=False, left_flank='', right_flank='', left_flank_length=0, right_flank_length=0, max_left_flank_length=None, max_right_flank_length=None, wildcard_pad=False, min_variable_length=None, max_variable_length=None)

Initializes the count table.

Parameters:
  • dataframe (DataFrame) – The dataframe used to initialize the count table.

  • alphabet (Alphabet) – The alphabet used to encode sequences into tensors.

  • transliterate (dict[str, str] | None) – A mapping from substrings to their replacements, applied to sequences before encoding.

  • transliterate_flanks (bool) – Whether to apply transliteration to flanks.

  • left_flank (str) – The prepended sequence.

  • right_flank (str) – The appended sequence.

  • left_flank_length (int) – The scored length of the left flank.

  • right_flank_length (int) – The scored length of the right flank.

  • max_left_flank_length (int | None) – The maximum allowed length of the prepended sequence.

  • max_right_flank_length (int | None) – The maximum allowed length of the appended sequence.

  • wildcard_pad (bool) – Whether to pad all sequences to the same length by appending wildcard characters (e.g., N for DNA).

  • min_variable_length (int | None) – The minimum possible length of the sequences (needed when using train/test splits of variable-length data).

  • max_variable_length (int | None) – The maximum possible length of the sequences (needed when using train/test splits of variable-length data).
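transliterate and wildcard_pad both act on the raw sequence strings. The sketch below makes some assumptions that the parameter descriptions do not confirm. Replacements are applied in insertion order, and padding is added on the right. A shared max_variable_length presumably keeps separately built train and test tables at the same length. Both helper names are hypothetical. According to the parameter list, transliteration reaches the flanks only when transliterate_flanks=True.

```python
# Hypothetical sketch of two preprocessing options; helper names and the
# exact semantics (replacement order, right-padding) are assumptions.
from typing import Dict, List, Optional


def transliterate(seq: str, mapping: Optional[Dict[str, str]]) -> str:
    """Replace each key of mapping with its value, in insertion order."""
    for old, new in (mapping or {}).items():
        seq = seq.replace(old, new)
    return seq


def wildcard_pad(seqs: List[str], wildcard: str = "N",
                 max_variable_length: Optional[int] = None) -> List[str]:
    """Append wildcards so that every sequence has the same length."""
    longest = max(len(s) for s in seqs)
    length = longest if max_variable_length is None else max_variable_length
    if longest > length:
        raise ValueError("a sequence is longer than max_variable_length")
    return [s + wildcard * (length - len(s)) for s in seqs]


# RNA-style input mapped onto a DNA alphabet, then padded.
seqs = [transliterate(s, {"U": "T"}) for s in ["ACGU", "UUGAC"]]
print(seqs)                                       # ['ACGT', 'TTGAC']
print(wildcard_pad(seqs))                         # ['ACGTN', 'TTGAC']
print(wildcard_pad(seqs, max_variable_length=7))  # ['ACGTNNN', 'TTGACNN']
```

Padding to a fixed maximum, rather than to the longest sequence present, makes the padded length independent of which sequences end up in a given split.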

Methods

batchlen()

Batch length.

get_setup_string()

A description used when printing the output of an optimizer.

set_flank_length([left, right])

Updates the length of flanks included in sequences.

Attributes

input_shape

The number of elements in the length dimension.

left_flank

The prepended sequence.

left_flank_length

The scored length of the left flank.

max_read_length

The maximum number of finite elements in the length dimension.

min_read_length

The minimum number of finite elements in the length dimension.

right_flank

The appended sequence.

right_flank_length

The scored length of the right flank.

seqs

target

Non-Inherited Members

property input_shape: int

The number of elements in the length dimension.

property min_read_length: int

The minimum number of finite elements in the length dimension.

property max_read_length: int

The maximum number of finite elements in the length dimension.
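These three properties can be related on a small padded batch. The sketch assumes that padding positions are encoded as -inf, so "finite elements" counts only real positions. Under that assumption, input_shape is the full length dimension, and the read lengths are the smallest and largest numbers of finite positions in any row. The read_lengths helper is hypothetical.

```python
# Hypothetical sketch: assumes padding positions are encoded as -inf, so
# "finite elements" are the real (sequence or scored flank) positions.
import math
from typing import List, Tuple


def read_lengths(rows: List[List[float]]) -> Tuple[int, int, int]:
    """Return (input_shape, min_read_length, max_read_length) for padded rows."""
    input_shape = len(rows[0])
    if any(len(row) != input_shape for row in rows):
        raise ValueError("all rows must share the length dimension")
    finite = [sum(1 for x in row if math.isfinite(x)) for row in rows]
    return input_shape, min(finite), max(finite)


NEG_INF = -math.inf
rows = [
    [0.0, 1.0, 2.0, 3.0, NEG_INF, NEG_INF],  # 4 real positions
    [3.0, 3.0, 2.0, 0.0, 1.0, 2.0],          # 6 real positions
]
print(read_lengths(rows))  # (6, 4, 6)
```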

get_setup_string()

A description used when printing the output of an optimizer.

Return type:

str

set_flank_length(left=0, right=0)

Updates the length of flanks included in sequences.

Parameters:
  • left (int) – The new length of the prepended sequence.

  • right (int) – The new length of the appended sequence.

Return type:

None
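set_flank_length changes the width of the encoded sequences. The hypothetical stand-in class below is not pyprobound.table.CountTable. It tracks only the flank bookkeeping, and it assumes each scored length is capped by the flank string and by the optional max_*_flank_length value. Because both parameters default to 0, calling set_flank_length() with no arguments removes both flanks from the scored sequence.

```python
# Hypothetical stand-in, not pyprobound.table.CountTable: it only tracks
# flank lengths and assumes they are capped by the flank strings and by the
# optional max_*_flank_length limits.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlankedTable:
    variable: List[str]
    left_flank: str = ""
    right_flank: str = ""
    left_flank_length: int = 0
    right_flank_length: int = 0
    max_left_flank_length: Optional[int] = None
    max_right_flank_length: Optional[int] = None

    @staticmethod
    def _limit(flank: str, maximum: Optional[int]) -> int:
        return len(flank) if maximum is None else min(len(flank), maximum)

    def set_flank_length(self, left: int = 0, right: int = 0) -> None:
        """Update the scored flank lengths after checking them against the limits."""
        if not 0 <= left <= self._limit(self.left_flank, self.max_left_flank_length):
            raise ValueError(f"invalid left flank length {left}")
        if not 0 <= right <= self._limit(self.right_flank, self.max_right_flank_length):
            raise ValueError(f"invalid right flank length {right}")
        self.left_flank_length = left
        self.right_flank_length = right

    @property
    def input_shape(self) -> int:
        longest = max(len(s) for s in self.variable)
        return self.left_flank_length + longest + self.right_flank_length


table = FlankedTable(["ACGT", "TTGAC"], left_flank="GGGCC", right_flank="TTAAA",
                     max_left_flank_length=3)
print(table.input_shape)   # 5
table.set_flank_length(left=2, right=3)
print(table.input_shape)   # 10
table.set_flank_length()   # defaults drop both flanks
print(table.input_shape)   # 5
```

Both lengths are validated before either is assigned. A rejected call therefore leaves the previous flank lengths intact.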