pyprobound.utils.count_kmers
- count_kmers(sequences, kmer_length=3, vocabulary=None)
Returns a sparse count matrix of k-mers in a list of sequences.
- Parameters:
sequences (Iterable[Sequence[Hashable]]) – The sequences to count the k-mers in.
kmer_length (int) – The k-mer length to be counted.
vocabulary (dict[Sequence[Hashable], int] | None) – Mapping of k-mers to indices.
- Return type:
tuple[csc_array, dict[Sequence[Hashable], int]]
- Returns:
A tuple (matrix, vocabulary), where matrix is a sparse CSC matrix of the count of each k-mer in each sequence, and vocabulary is the mapping of k-mers to their respective indices in the matrix.
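To make the shape of the return value concrete, here is a minimal sketch of how such a (matrix, vocabulary) pair could be assembled with SciPy. It is not pyprobound's implementation. The name count_kmers_sketch is hypothetical, and two behaviors are assumptions made for illustration: when no vocabulary is passed, indices are assigned in order of first appearance, and when one is passed, k-mers missing from it are skipped.

```python
import numpy as np
from scipy.sparse import csc_array


def count_kmers_sketch(sequences, kmer_length=3, vocabulary=None):
    """Illustrative stand-in for count_kmers (hypothetical, not the library code)."""
    grow = vocabulary is None  # assumption: build the vocabulary on the fly
    vocab = {} if grow else dict(vocabulary)
    rows, cols = [], []
    n_rows = 0
    for row, seq in enumerate(sequences):
        n_rows = row + 1
        # Slide a window of width kmer_length over the sequence.
        # Slices must be hashable, so this sketch expects str or tuple inputs.
        for start in range(len(seq) - kmer_length + 1):
            kmer = seq[start:start + kmer_length]
            if grow:
                col = vocab.setdefault(kmer, len(vocab))
            else:
                col = vocab.get(kmer)
                if col is None:
                    continue  # assumption: ignore k-mers outside the vocabulary
            rows.append(row)
            cols.append(col)
    n_cols = max(vocab.values()) + 1 if vocab else 0
    data = np.ones(len(rows), dtype=np.int64)
    # Duplicate (row, col) entries are summed when the CSC array is built,
    # which turns one entry per occurrence into a count.
    matrix = csc_array((data, (rows, cols)), shape=(n_rows, n_cols))
    return matrix, vocab


matrix, vocabulary = count_kmers_sketch(["ACGT", "ACGA"], kmer_length=3)
print(vocabulary)        # {'ACG': 0, 'CGT': 1, 'CGA': 2}
print(matrix.toarray())  # rows: [1 1 0] and [1 0 1]
```

Each row of the matrix corresponds to one input sequence and each column to one vocabulary index, so passing the returned vocabulary back in on a second call keeps the columns aligned across datasets in this sketch.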