pyprobound.alphabets.DNA

class DNA

Bases: Alphabet

Defines the encoding of DNA sequences into tensors.

Three sequence characters are reserved: ‘ ’ (a space) is -infinity (not scored), ‘*’ is the IUPAC wildcard character N, and ‘-’ is zero.
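
As an illustration only, the reserved characters can be read as special rows in a four-channel representation. The sketch below does not use PyProBound; the helper `reserved_row` and the row layout (all ones for the wildcard, all zeros for ‘-’, all -infinity for the space) are assumptions made for this example, not the library's internals.

```python
import math

# Assumed channel order, taken from the documented alphabet tuple.
ALPHABET = ("A", "C", "G", "T")


def reserved_row(char):
    """Return a hypothetical 4-channel row for a base or a reserved character."""
    if char == " ":
        # Space: -infinity in every channel, i.e. the position is not scored.
        return [-math.inf] * len(ALPHABET)
    if char == "*":
        # Wildcard: behaves like IUPAC N, so every base is allowed.
        return [1.0] * len(ALPHABET)
    if char == "-":
        # Dash: contributes zero in every channel.
        return [0.0] * len(ALPHABET)
    # Ordinary base: a standard one-hot row.
    row = [0.0] * len(ALPHABET)
    row[ALPHABET.index(char)] = 1.0
    return row


rows = [reserved_row(c) for c in "A*- "]
```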

alphabet

(‘A’, ‘C’, ‘G’, ‘T’).

get_index

A mapping of monomers in the alphabet to indices in the embedding matrix.

get_encoding

IUPAC encoding of monomers to tuples of indices in the embedding matrix; for example, ‘N’ maps to (0, 1, 2, 3).
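
The two mappings above can be pictured with plain dictionaries. This is a minimal sketch that does not import PyProBound: the names `GET_INDEX`, `GET_ENCODING`, and `IUPAC` are stand-ins chosen for the example, and the index order A=0, C=1, G=2, T=3 is an assumption inferred from the alphabet tuple and the documented ‘N’ example. The IUPAC code table itself is the standard nucleotide ambiguity table.

```python
# Assumed index order, following the documented alphabet tuple.
ALPHABET = ("A", "C", "G", "T")

# Monomer -> row index in the embedding matrix (analogous to get_index).
GET_INDEX = {base: i for i, base in enumerate(ALPHABET)}

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# IUPAC code -> tuple of indices (analogous to get_encoding).
GET_ENCODING = {
    code: tuple(GET_INDEX[base] for base in bases)
    for code, bases in IUPAC.items()
}

print(GET_ENCODING["N"])  # (0, 1, 2, 3), matching the documented example
print(GET_ENCODING["R"])  # (0, 2): purines A and G
```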

Methods

`embed`(seqs)	Embeds sequences from a dense to a one-hot representation.
`pairwise_embed`(seqs, dist)	Embeds sequences into a one-hot pairwise representation.
`translate`(sequence)	Translates a sequence into a tensor.
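
To make the dense and one-hot representations concrete, here is a rough sketch in pure Python. It is not the library's implementation: `translate_sketch` and `embed_sketch` are hypothetical helpers, they handle only the four unambiguous bases, and they return nested lists where the real methods work with tensors.

```python
# Assumed index order, following the documented alphabet tuple.
ALPHABET = ("A", "C", "G", "T")
GET_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def translate_sketch(sequence):
    """Turn a string of bases into a dense list of integer indices."""
    return [GET_INDEX[char] for char in sequence]


def embed_sketch(seqs):
    """Turn dense index sequences into one-hot rows, one row per position."""
    return [
        [[1.0 if channel == idx else 0.0 for channel in range(len(ALPHABET))]
         for idx in seq]
        for seq in seqs
    ]


dense = translate_sketch("ACGT")
one_hot = embed_sketch([dense])

print(dense)          # [0, 1, 2, 3]
print(one_hot[0][3])  # [0.0, 0.0, 0.0, 1.0], the row for T
```

The dense form stores one integer per position, while the one-hot form spends four channels per position so that each base selects exactly one channel.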

Non-Inherited Members