pyprobound.alphabets.Protein

class Protein

Bases: Alphabet

Stores the protein encoding of sequences into tensors.

Three sequence characters are reserved: a space (‘ ’) encodes as -infinity and is not scored, ‘*’ is equivalent to the IUPAC wildcard character X, and ‘-’ encodes as zero.

alphabet

All 20 one-letter amino acid codes.

Type: tuple[str]

get_index

A mapping of monomers in the alphabet to indices in the embedding matrix.

Type: dict[str, int]

get_encoding

IUPAC encoding of monomers to tuples of indices in the embedding matrix; for example, ‘X’ maps to (0, 1, …, 19).

Type: dict[str, tuple[int, ...]]
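The shapes of the three attributes above can be pictured with plain Python containers. This is a minimal sketch rather than the library's own code: the upper-case names are hypothetical stand-ins, the alphabetical residue ordering is an assumption, and the entries for ‘B’ and ‘Z’ are assumed from the standard IUPAC ambiguity codes, not confirmed for this class.

```python
# Illustrative stand-ins for the documented attributes; not pyprobound code.
# Assumption: residues are stored in alphabetical order of their codes.
ALPHABET = tuple("ACDEFGHIKLMNPQRSTVWY")  # 20 one-letter amino acid codes

# Monomer -> index in the embedding matrix (the role of get_index).
GET_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

# Monomer -> tuple of indices (the role of get_encoding).
# A standard residue encodes to a one-element tuple holding its own index.
GET_ENCODING = {aa: (i,) for aa, i in GET_INDEX.items()}

# The wildcard X covers every residue, matching the (0, 1, ..., 19) example.
GET_ENCODING["X"] = tuple(range(len(ALPHABET)))

# Assumed extras from the IUPAC ambiguity codes: B = D or N, Z = E or Q.
GET_ENCODING["B"] = (GET_INDEX["D"], GET_INDEX["N"])
GET_ENCODING["Z"] = (GET_INDEX["E"], GET_INDEX["Q"])

print(GET_INDEX["A"], GET_ENCODING["X"][:3], GET_ENCODING["B"])
```

Keeping the encoding as tuples of indices lets a single lookup table serve both unambiguous residues and ambiguity codes.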

__init__()

Initializes the alphabet.

Methods

embed(seqs)

Embeds sequences from a dense to a one-hot representation.

pairwise_embed(seqs, dist)

Embeds sequences into a one-hot pairwise representation.

translate(sequence)

Translates a sequence into a tensor.
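The relationship between translate and embed can be sketched without tensors, using nested lists. This is an illustration under stated assumptions, not the library's implementation: `translate_sketch` and `embed_sketch` are hypothetical helpers, the integer codes chosen for the reserved characters are invented for the example, and the row values for ‘*’, ‘-’, and the space follow the class description (wildcard, zero, and -infinity respectively).

```python
import math

# Assumption: alphabetical residue order; not taken from pyprobound.
ALPHABET = tuple("ACDEFGHIKLMNPQRSTVWY")
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

# Hypothetical dense codes for the three reserved characters.
WILDCARD = len(ALPHABET)      # '*'
GAP = len(ALPHABET) + 1       # '-'
UNSCORED = len(ALPHABET) + 2  # ' '
DENSE = {**INDEX, "*": WILDCARD, "-": GAP, " ": UNSCORED}


def translate_sketch(sequence):
    """Map each character to one integer code (a dense representation)."""
    return [DENSE[ch] for ch in sequence]


def embed_sketch(dense):
    """Expand each dense code into a row with one entry per residue."""
    width = len(ALPHABET)
    rows = []
    for code in dense:
        if code == WILDCARD:
            rows.append([1.0] * width)        # matches every residue
        elif code == GAP:
            rows.append([0.0] * width)        # contributes zero
        elif code == UNSCORED:
            rows.append([-math.inf] * width)  # never scored
        else:
            row = [0.0] * width
            row[code] = 1.0                   # ordinary one-hot row
            rows.append(row)
    return rows


dense = translate_sketch("AC*- ")
one_hot = embed_sketch(dense)
print(dense)
print(len(one_hot), len(one_hot[0]))
```

Splitting the work into a compact integer form and a later expansion keeps stored sequences small and defers the wider one-hot form until it is needed.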

Non-Inherited Members