pyprobound.alphabets.Codon
- class Codon
Bases:
Alphabet
Stores the codon encoding of sequences into tensors.
Three sequence characters are reserved: ‘ ’ is -infinity (not scored), ‘***’ is the IUPAC wildcard character NNN, and ‘---’ is zero.
- alphabet
All \(4^{3} = 64\) three-letter words (codons) over the DNA alphabet.
- Type:
tuple[str]
- get_index
A mapping of monomers in the alphabet to indices in the embedding matrix.
- Type:
dict[str, int]
- get_encoding
IUPAC encoding of monomers to tuples of indices in the embedding matrix; for example, ‘***’ maps to (0, 1, …, 63).
- Type:
dict[str, tuple[int, ...]]
- __init__()
Initializes the alphabet.
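As a rough illustration of the attributes above, the following standalone sketch rebuilds a codon alphabet and the two lookup tables in plain Python. It does not import pyprobound. The local names mirror the attribute names for readability only. The base order (A, C, G, T) and the handling of only the ‘***’ wildcard are assumptions, not confirmed details of the library. The table has 64 entries, which agrees with the wildcard mapping to (0, 1, …, 63).

```python
import itertools

# Assumed base order; the library's actual ordering may differ.
BASES = ("A", "C", "G", "T")

# All 4**3 = 64 three-letter words over the DNA alphabet.
alphabet = tuple("".join(c) for c in itertools.product(BASES, repeat=3))

# Map each codon to its (assumed) index in an embedding matrix.
get_index = {codon: i for i, codon in enumerate(alphabet)}

# Each plain codon encodes to its own index; the wildcard covers every index.
get_encoding = {codon: (i,) for codon, i in get_index.items()}
get_encoding["***"] = tuple(range(len(alphabet)))

print(len(alphabet))            # 64
print(get_index["AAA"])         # 0
print(get_encoding["***"][-1])  # 63
```

Keeping the encoding as tuples of indices lets an ambiguous symbol stand for several rows of the embedding matrix at once, while an unambiguous codon is simply a one-element tuple.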
Methods
- embed(seqs)
Embeds sequences from a dense to a one-hot representation.
- pairwise_embed(seqs, dist)
Embeds sequences into a one-hot pairwise representation.
- translate(sequence)
Translates a sequence into a tensor.
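To make the dense versus one-hot distinction concrete, here is a minimal pure-Python sketch of the general idea. It is not the library's implementation: `one_hot` is a hypothetical helper, it works on lists rather than tensors, and it ignores the reserved characters. It assumes "dense" means one integer codon index per position and "one-hot" means one length-64 indicator row per position.

```python
def one_hot(dense, num_classes=64):
    """Convert a list of integer indices into one-hot rows (hypothetical helper)."""
    rows = []
    for idx in dense:
        if not 0 <= idx < num_classes:
            raise ValueError(f"index {idx} is outside 0..{num_classes - 1}")
        row = [0.0] * num_classes
        row[idx] = 1.0  # mark the single active codon at this position
        rows.append(row)
    return rows


# Three positions, given as dense codon indices.
dense = [0, 63, 5]
encoded = one_hot(dense)

print(len(encoded), len(encoded[0]))  # 3 64
print(encoded[1].index(1.0))          # 63
```

The dense form is compact for storage, while the one-hot form can be multiplied directly against an embedding or weight matrix.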
Non-Inherited Members