pyprobound.alphabets.Codon

class Codon

Bases: Alphabet

Stores the codon encoding of sequences into tensors.

Three sequence symbols are reserved: ' ' is -infinity (not scored), '***' is the IUPAC wildcard NNN, and '---' is zero.

alphabet

All \(4^{3} = 64\) three-letter words (codons) over the DNA alphabet.

Type:

tuple[str]
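As an illustration of what such a tuple contains, the following minimal sketch enumerates every three-letter word over the four DNA bases. The base order `A, C, G, T` and the lexicographic ordering of codons are assumptions made for the example; the ordering PyProBound actually uses may differ.

```python
from itertools import product

# Assumed base order; the library's own ordering may differ.
DNA = ("A", "C", "G", "T")

# Every length-3 word over the DNA alphabet, with repetition allowed.
codons = tuple("".join(triplet) for triplet in product(DNA, repeat=3))

print(len(codons))  # 64
print(codons[:5])   # ('AAA', 'AAC', 'AAG', 'AAT', 'ACA')
```

Because bases may repeat within a codon (for example `AAA`), the count is 4 × 4 × 4 = 64, which matches the 64 embedding indices referred to elsewhere on this page.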

get_index

A mapping of monomers in the alphabet to indices in the embedding matrix.

Type:

dict[str, int]
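A mapping of this shape can be sketched by enumerating the codons and recording each position. The name `codon_to_index` is hypothetical, and the lexicographic codon order is an assumption; only the `dict[str, int]` shape is taken from the documentation above.

```python
from itertools import product

# Assumed lexicographic codon order over A, C, G, T.
codons = tuple("".join(t) for t in product("ACGT", repeat=3))

# Hypothetical stand-in for a codon -> embedding-row lookup.
codon_to_index = {codon: i for i, codon in enumerate(codons)}

print(codon_to_index["AAA"])  # 0
print(codon_to_index["TTT"])  # 63
```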

get_encoding

Maps each IUPAC-encoded monomer to the tuple of embedding-matrix indices it represents; for example, '***' maps to (0, 1, …, 63).

Type:

dict[str, tuple[int, ...]]
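The idea of expanding an ambiguous symbol into a tuple of indices can be sketched as follows. This is not the library's implementation: the helper `expand`, the `IUPAC` table, the codon ordering, and the support for partially ambiguous codons such as `AAN` are all assumptions for illustration. Only the wildcard behaviour (`'***'` covering indices 0 through 63) is stated in the documentation.

```python
from itertools import product

# Standard IUPAC nucleotide codes, plus '*' as the wildcard used on this page.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "*": "ACGT",
}

# Assumed lexicographic codon order over A, C, G, T.
codons = tuple("".join(t) for t in product("ACGT", repeat=3))
codon_to_index = {codon: i for i, codon in enumerate(codons)}


def expand(symbol: str) -> tuple[int, ...]:
    """Return the indices of every concrete codon matched by `symbol`."""
    options = [IUPAC[ch] for ch in symbol]
    return tuple(sorted(codon_to_index["".join(t)] for t in product(*options)))


print(expand("***") == tuple(range(64)))  # True
print(expand("AAN"))                      # (0, 1, 2, 3)
print(expand("RAA"))                      # (0, 32)
```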

__init__()

Initializes the alphabet.

Methods

embed(seqs)

Embeds sequences from a dense to a one-hot representation.

pairwise_embed(seqs, dist)

Embeds sequences into a one-hot pairwise representation.

translate(sequence)

Translates a sequence into a tensor.
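The relationship between a dense (index) representation and a one-hot representation can be sketched in plain Python. The functions `translate_sketch` and `one_hot_sketch` are hypothetical stand-ins, not the methods listed above: the real methods operate on tensors, and reading the sequence as consecutive non-overlapping codons in lexicographic order is an assumption. Reserved symbols are not handled here.

```python
from itertools import product

# Assumed lexicographic codon order over A, C, G, T.
codons = tuple("".join(t) for t in product("ACGT", repeat=3))
codon_to_index = {codon: i for i, codon in enumerate(codons)}


def translate_sketch(sequence: str) -> list[int]:
    """Split a sequence into consecutive codons and look up each index."""
    if len(sequence) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [codon_to_index[sequence[i:i + 3]] for i in range(0, len(sequence), 3)]


def one_hot_sketch(dense: list[int], size: int = 64) -> list[list[int]]:
    """Turn each index into a row with a single 1 at that position."""
    return [[int(j == idx) for j in range(size)] for idx in dense]


dense = translate_sketch("ATGAAATAG")
print(dense)  # [14, 0, 50]

rows = one_hot_sketch(dense)
print(len(rows), len(rows[0]))  # 3 64
print(rows[0].index(1))         # 14
```

The dense form stores one integer per codon, while the one-hot form stores one row of length 64 per codon with a single nonzero entry.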

Non-Inherited Members