ushuffle
A wrapper around the ushuffle C implementation.
- bpreveal.ushuffle.shuffleString(sequence, kmerSize, numShuffles=1, seed=None)
Given a string sequence, perform a shuffle that maintains the kmer distribution.
This is adapted from ushuffle.
sequence should be an ASCII string. In theory it also works on multi-byte encoded UTF-8 characters, so long as kmerSize is at least as long as the longest byte sequence for any character in the input. (Please don’t rely on this random fact!)
Returns a list of shuffled strings.
- Parameters:
sequence (str)
kmerSize (int)
numShuffles (int)
seed (int | None)
- Return type:
list[str]
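One way to sanity-check a shuffle is to compare k-mer counts before and after. The helper below is a minimal sketch and is not part of bpreveal: kmerCounts is a hypothetical name, and it assumes that "maintains the kmer distribution" means the count of every substring of length kmerSize is unchanged. The two strings in the example are hand-picked rearrangements, not output from shuffleString.

```python
from collections import Counter


def kmerCounts(sequence, kmerSize):
    """Count every overlapping substring of length kmerSize (hypothetical helper)."""
    lastStart = len(sequence) - kmerSize + 1
    return Counter(sequence[i:i + kmerSize] for i in range(lastStart))


# Hand-picked example: both strings contain exactly AC, CA, AG and GA once each,
# so one is a valid 2-mer-preserving rearrangement of the other.
original = "ACAGA"
rearranged = "AGACA"
sameDistribution = kmerCounts(original, 2) == kmerCounts(rearranged, 2)

# A plain permutation of the letters usually does not preserve 2-mers.
permuted = "AACAG"
differs = kmerCounts(original, 2) != kmerCounts(permuted, 2)
```

With the library installed, each string returned by shuffleString(sequence, kmerSize) could be passed through the same helper and compared against the counts for the input sequence.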
- bpreveal.ushuffle.shuffleOHE(sequence, kmerSize, numShuffles=1, seed=None)
Given a one-hot sequence, perform a shuffle that maintains the kmer distribution.
sequence should have shape (length, alphabetLength). For DNA, alphabetLength == 4. It is an error to have an alphabet length of more than 8. Internally, this function packs the bits at each position into a character, and the resulting string is shuffled and then unpacked. For this reason, it is possible to have more than one letter be hot at one position, or even to have no letters hot at a position. For example, this one-hot encoded sequence is valid input:
Pos  A  C  G  T
0    1  0  0  0
1    0  1  0  0
2    1  0  1  0
3    0  1  1  1
4    0  0  0  0
This is adapted from ushuffle. Returns an array of shape (numShuffles, length, alphabetLength).
- Parameters:
sequence (ndarray[tuple[int, ...], dtype[uint8]])
kmerSize (int)
numShuffles (int)
seed (int | None)
- Return type:
ndarray[tuple[int, ...], dtype[uint8]]
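The bit packing described for shuffleOHE can be illustrated with a short sketch. This is not the library's implementation: packRows and unpackRows are hypothetical names, the bit order (first alphabet column in the lowest bit) is an assumption, and plain Python lists stand in for the uint8 array to keep the example dependency-free. It shows why the alphabet is limited to 8 letters (one bit each in a single byte) and why rows with several hot letters, or none, survive a round trip.

```python
def packRows(ohe):
    """Pack each row of a one-hot matrix into one integer, one bit per letter.

    Assumption: column 0 maps to bit 0; the real library may order bits differently.
    """
    packed = []
    for row in ohe:
        if len(row) > 8:
            raise ValueError("alphabet length must be at most 8 to fit in one byte")
        value = 0
        for bitIndex, hot in enumerate(row):
            if hot:
                value |= 1 << bitIndex
        packed.append(value)
    return packed


def unpackRows(packed, alphabetLength):
    """Invert packRows: expand each integer back into a row of 0/1 values."""
    return [[(value >> bitIndex) & 1 for bitIndex in range(alphabetLength)]
            for value in packed]


# The example sequence from the text, columns ordered A, C, G, T.
example = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
packedExample = packRows(example)
roundTrip = unpackRows(packedExample, 4)
```

Under these assumptions each position becomes a single symbol, so shuffling the packed symbols and unpacking them moves whole rows around without ever altering the bits within a row.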