pyprobound.alphabets.DNA
- class DNA
Bases: Alphabet
Stores the DNA encoding of sequences into tensors.
Three sequence characters are reserved: ‘ ’ (a space) is encoded as -infinity (not scored), ‘*’ stands for the IUPAC wildcard character N, and ‘-’ is encoded as zero.
- alphabet
(‘A’, ‘C’, ‘G’, ‘T’).
- Type: tuple[str]
- get_index
A mapping of monomers in the alphabet to indices in the embedding matrix.
- Type: dict[str, int]
- get_encoding
IUPAC encoding of monomers to tuples of indices in the embedding matrix; for example, ‘N’ maps to (0, 1, 2, 3).
- Type: dict[str, tuple[int, ...]]
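The two mappings described above can be pictured with a minimal standalone sketch. It does not import PyProBound and is not its source code: the name `IUPAC_BASES` is hypothetical, and the assumption that every standard IUPAC nucleotide code is included is ours. Only the alphabet order (‘A’, ‘C’, ‘G’, ‘T’) and the ‘N’ example come from the documentation.

```python
# Illustrative sketch only; not PyProBound's implementation.
alphabet = ("A", "C", "G", "T")

# Monomer -> index in the embedding matrix (A=0, C=1, G=2, T=3).
get_index = {monomer: i for i, monomer in enumerate(alphabet)}

# Standard IUPAC nucleotide codes and the bases each one stands for.
# IUPAC_BASES is a hypothetical helper name used for this example.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# IUPAC code -> tuple of indices in the embedding matrix.
get_encoding = {
    code: tuple(get_index[base] for base in bases)
    for code, bases in IUPAC_BASES.items()
}
```

With these definitions, `get_encoding["N"]` evaluates to `(0, 1, 2, 3)`, matching the example given for the attribute, and a two-base code such as ‘R’ (A or G) maps to `(0, 2)`.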
- __init__()
Initializes the DNA alphabet.
Methods
embed(seqs): Embeds sequences from a dense to a one-hot representation.
pairwise_embed(seqs, dist): Embeds sequences into a one-hot pairwise representation.
translate(sequence): Translates a sequence into a tensor.
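To illustrate the difference between the dense and one-hot representations mentioned for embed, here is a small pure-Python sketch. The helpers `to_dense` and `to_one_hot` are hypothetical and are not PyProBound methods; the (length x channels) layout is an assumption, the library itself works with tensors rather than lists, and the reserved characters ‘ ’, ‘*’, and ‘-’ are left out.

```python
# Illustrative sketch only; these helpers are hypothetical, not PyProBound's API.
ALPHABET = ("A", "C", "G", "T")
INDEX = {monomer: i for i, monomer in enumerate(ALPHABET)}


def to_dense(sequence):
    """Map each base to its integer index (the dense representation)."""
    return [INDEX[base] for base in sequence]


def to_one_hot(dense):
    """Expand each index into a row holding a single 1.0.

    Assumed layout: one row per position, one column per monomer.
    """
    return [
        [1.0 if channel == idx else 0.0 for channel in range(len(ALPHABET))]
        for idx in dense
    ]


dense = to_dense("ACGT")      # one integer per position
one_hot = to_one_hot(dense)   # one row of four floats per position
```

The dense form stores one integer per position, while the one-hot form stores one channel per monomer, which is the shape a per-base scoring matrix can be multiplied against.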
Non-Inherited Members