pyprobound.alphabets.Protein
- class Protein
Bases: Alphabet
Stores the protein encoding of sequences into tensors.
Three sequence characters are reserved: ‘ ’ (space) is -infinity (not scored), ‘*’ is treated as the IUPAC wildcard character X, and ‘-’ is zero.
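A minimal sketch of what the ‘ ’ and ‘-’ conventions mean for scoring, assuming an additive per-position score where `weights[pos][i]` scores amino acid `i` at position `pos`. The `score_window` helper, the alphabetical residue order, and the additive model are hypothetical illustrations, not pyprobound's implementation. The wildcard ‘*’ is not handled in this sketch.

```python
import math

# Hypothetical residue order; pyprobound's actual index order may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def score_window(window: str, weights: list[list[float]]) -> float:
    """Hypothetical additive score of one window.

    ' ' makes the whole window -infinity, so exp(score) is 0 and the window
    adds nothing to a sum of affinities. '-' contributes zero.
    """
    total = 0.0
    for pos, char in enumerate(window):
        if char == " ":
            return -math.inf  # padding: the window is effectively not scored
        if char == "-":
            continue  # reserved zero: no contribution at this position
        total += weights[pos][INDEX[char]]
    return total
```

With every weight set to 1.0, `"AC-"` scores 2.0 because the gap adds nothing, while `"A C"` scores -infinity, and `math.exp(-math.inf)` is 0.0.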
- alphabet
All 20 one-letter amino acid codes.
- Type:
tuple[str, ...]
- get_index
A mapping of monomers in the alphabet to indices in the embedding matrix.
- Type:
dict[str, int]
- get_encoding
IUPAC encoding of monomers to tuples of indices in the embedding matrix; for example, ‘X’ maps to (0, 1, …, 19).
- Type:
dict[str, tuple[int, ...]]
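A sketch of how `get_index` and `get_encoding` relate, assuming the 20 codes are stored in alphabetical order. The helper names `build_get_index` and `build_get_encoding` are hypothetical, and mapping ‘*’ to the same tuple as ‘X’ is an assumption based on ‘*’ being the wildcard. Other IUPAC protein ambiguity codes, such as B (D or N), are not modeled here.

```python
# Hypothetical ordering of the 20 standard one-letter codes; the real
# pyprobound ordering may differ.
AMINO_ACIDS: tuple[str, ...] = tuple("ACDEFGHIKLMNPQRSTVWY")


def build_get_index(alphabet: tuple[str, ...]) -> dict[str, int]:
    """Map each monomer to its row in the embedding matrix."""
    return {monomer: i for i, monomer in enumerate(alphabet)}


def build_get_encoding(alphabet: tuple[str, ...]) -> dict[str, tuple[int, ...]]:
    """Map each code to the tuple of embedding rows it stands for."""
    get_index = build_get_index(alphabet)
    # An unambiguous monomer stands for exactly one row.
    encoding = {monomer: (i,) for monomer, i in get_index.items()}
    every_row = tuple(range(len(alphabet)))
    encoding["X"] = every_row  # IUPAC wildcard: any amino acid
    encoding["*"] = every_row  # assumption: '*' plays the role of X
    return encoding
```

Here `build_get_encoding(AMINO_ACIDS)["X"]` is `(0, 1, ..., 19)`, matching the example above, while a single residue such as `"A"` maps to a one-element tuple.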
- __init__()
Initializes the alphabet.
Methods
- embed(seqs): Embeds sequences from a dense (index) representation into a one-hot representation.
- pairwise_embed(seqs, dist): Embeds sequences into a one-hot pairwise representation.
- translate(sequence): Translates a sequence into a tensor.
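The three methods can be sketched in plain Python lists instead of the tensors the real methods use. Everything here is hypothetical: the residue order, the `GAP` code for ‘-’, and especially `pairwise_embed`, which assumes that `dist` pairs position `i` with position `i + dist` and one-hot encodes the 20 × 20 residue pairs. This is one plausible reading, not pyprobound's documented behavior.

```python
# Hypothetical residue order; pyprobound's actual ordering may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N = len(AMINO_ACIDS)
GAP = -1  # hypothetical dense code for '-', which embeds as an all-zero row


def translate(sequence: str) -> list[int]:
    """Map each residue to its index; '-' maps to the GAP code."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [GAP if ch == "-" else index[ch] for ch in sequence]


def embed(dense: list[int]) -> list[list[float]]:
    """Turn a dense index list into one-hot rows of length N."""
    rows = []
    for code in dense:
        row = [0.0] * N
        if code != GAP:
            row[code] = 1.0  # gaps stay all zero
        rows.append(row)
    return rows


def pairwise_embed(dense: list[int], dist: int) -> list[list[float]]:
    """One-hot over the N * N residue pairs at positions (i, i + dist)."""
    rows = []
    for i in range(len(dense) - dist):
        a, b = dense[i], dense[i + dist]
        row = [0.0] * (N * N)
        if GAP not in (a, b):
            row[a * N + b] = 1.0  # pair (a, b) gets its own column
        rows.append(row)
    return rows
```

For example, `translate("AC-")` gives `[0, 1, -1]` under this ordering, and `embed` turns the trailing gap into a row of zeros. Separating the dense step from the one-hot step keeps stored sequences compact and expands them only when a model needs the one-hot form.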
Non-Inherited Members