tools.addNoiseUtils
Utility functions for adding noise.
This module exists for a reason that I don’t quite understand: gmstar is not picklable if it is defined inside addNoise.py.
- bpreveal.tools.addNoiseUtils.loadFile(h5Fname, numHeads)
Initializer for a multiprocessing pool, loads up a global hdf5 file.
- Parameters:
h5Fname (str) – The name of the input hdf5 file that the Pool workers will read from.
numHeads (int) – The number of heads of your model.
- Return type:
None
- bpreveal.tools.addNoiseUtils.gmstar(getMutArgs)
A wrapper around getMutated that is to be used with a Pool of workers.
- Parameters:
getMutArgs (tuple[str, list[int], int]) – A 3-tuple containing: the configuration json as a string, a list giving the number of tasks for each head, and an integer that will be used to seed the RNG.
- Returns:
See getMutated.
- bpreveal.tools.addNoiseUtils.applyAddSub(maxReads, minReads, fracMut, headData, maxChange, add, rng)
Add random noise to headData, or subtract random noise from it.
- Parameters:
maxReads (int) – If a locus has more reads than this, don’t add noise.
minReads (int) – If a locus has fewer reads than this, don’t add noise.
fracMut (float) – What fraction of bases should be noised?
headData (ndarray) – An array of data from a single head, of shape ((output-length + jitter * 2) x num-tasks)
maxChange (int) – Add or subtract a random number of bases up to this maximum.
add (bool) – True to add noise, False to subtract it.
rng – A numpy random number generator.
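One possible reading of these parameters is sketched below. It is an illustration under stated assumptions, not the code in addNoiseUtils: the name `add_sub_noise` is hypothetical, the read-count limits are assumed to apply to each position separately, and subtracted counts are assumed to be clamped at zero.

```python
import numpy as np


def add_sub_noise(head_data, min_reads, max_reads, frac_mut, max_change, add, rng):
    """Return a noised copy of head_data (shape: length x num_tasks)."""
    out = head_data.astype(np.float64)  # astype copies, so the input is untouched
    # Assumption: the read-count limits are applied to each position separately.
    eligible = (head_data >= min_reads) & (head_data <= max_reads)
    # Pick roughly frac_mut of the positions, then keep only the eligible ones.
    chosen = eligible & (rng.random(head_data.shape) < frac_mut)
    # Draw a change between 1 and max_change (inclusive) for every position.
    delta = rng.integers(1, max_change + 1, size=head_data.shape)
    if add:
        out[chosen] += delta[chosen]
    else:
        # Assumption: counts are clamped at zero rather than going negative.
        out[chosen] = np.maximum(out[chosen] - delta[chosen], 0.0)
    return out
```

Passing a generator built with `np.random.default_rng(seed)` makes the result reproducible for a given seed.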
- bpreveal.tools.addNoiseUtils.applyShift(maxDistance, shiftIndependently, fracMut, headData, rng)
Shift the data randomly but keep it near its source.
- Parameters:
rng – A numpy random number generator.
maxDistance (int) – What is the furthest that a read is allowed to drift?
shiftIndependently (bool) – Should each read be moved on its own, or should all of the reads at one position move together?
fracMut (float) – What fraction of bases should be subject to the shuffling?
headData (ndarray) – An array containing the data for a specific head.
- bpreveal.tools.addNoiseUtils.runShiftAlgo(shiftInputs, maxDistance, headData, shiftIndependently)
Actually perform the shifting.
- Parameters:
shiftInputs (ndarray) – The read indexes that should have their data shifted.
maxDistance (int) – What’s the furthest that a read can be shifted?
headData (ndarray) – The array of data for a head.
shiftIndependently (bool) – Should each read be shifted independently?
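The selection step of applyShift and the shifting step of runShiftAlgo can be pictured with the sketch below. It is a guess at the general idea rather than the actual algorithm: `pick_positions` and `shift_reads` are hypothetical names, read counts are assumed to be non-negative integers, and reads that would move past either end of the window are assumed to be clipped to the edge.

```python
import numpy as np


def pick_positions(length, frac_mut, rng):
    """Choose roughly frac_mut of the positions to be shifted."""
    return np.flatnonzero(rng.random(length) < frac_mut)


def shift_reads(head_data, positions, max_distance, shift_independently, rng):
    """Move the reads at the given positions by at most max_distance bases."""
    length, num_tasks = head_data.shape
    out = head_data.astype(np.int64)  # assumes integer counts; astype copies
    for pos in positions:
        for task in range(num_tasks):
            num_reads = int(head_data[pos, task])
            if num_reads == 0:
                continue
            # Take the original reads away from their source position.
            out[pos, task] -= num_reads
            if shift_independently:
                # Each read draws its own offset.
                offsets = rng.integers(
                    -max_distance, max_distance + 1, size=num_reads)
            else:
                # All reads at this position share one offset.
                shared = rng.integers(-max_distance, max_distance + 1)
                offsets = np.full(num_reads, shared)
            # Assumption: reads that would leave the window stop at its edge.
            targets = np.clip(pos + offsets, 0, length - 1)
            # add.at accumulates correctly when several reads share a target.
            np.add.at(out[:, task], targets, 1)
    return out
```

Under these assumptions the total number of reads in each task is unchanged by the shift, which gives a simple sanity check on the output.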
- bpreveal.tools.addNoiseUtils.mutateProfile(mutationTypes, headData, rng)
Given a single head’s data (that may contain multiple tasks), perturb it.
- Parameters:
mutationTypes (list[dict]) – Straight from the config json, these are the profile-mutation-types.
headData (ndarray) – An array containing the data for the given head, from the input H5.
rng – A numpy random number generator.
- Returns:
A new array with the same shape as headData that has been perturbed.
- Return type:
ndarray
- bpreveal.tools.addNoiseUtils.getMutated(config, tasksPerHead, seed)
Performs the requested mutations.
- Parameters:
config (dict) – The json configuration.
tasksPerHead (list[int]) – A list giving, for each head, how many tasks it has.
seed (int) – An integer that will be used to seed the RNG.
- Returns:
A tuple. The first element is a dataset containing one-hot-encoded sequences. The second element is a list containing profile data for each of the heads, in order.
- Return type:
tuple[ndarray, list[ndarray]]
- bpreveal.tools.addNoiseUtils.writeOutput(outFname, sequences, heads)
Writes the given datasets to an hdf5 file that can be used to train BPReveal models.
- Parameters:
outFname (str) – The name of the hdf5 file to write.
sequences (ndarray) – An array of shape (num-regions x (input-length + 2 * jitter) x 4), containing one-hot encoded sequences.
heads (list[ndarray]) – A list of profile data for each head. Each element of this list has shape (num-regions x (output-length + 2 * jitter) x num-tasks).
- Return type:
None
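Before calling writeOutput, it can help to confirm that the arrays have the shapes listed above. The helper below is a hypothetical convenience, not part of BPReveal; it checks only the number of dimensions, the four one-hot channels, and that every head has the same number of regions as the sequences.

```python
import numpy as np


def check_training_shapes(sequences, heads):
    """Raise ValueError if the arrays disagree with the documented shapes."""
    if sequences.ndim != 3 or sequences.shape[2] != 4:
        raise ValueError(
            f"sequences should be (regions, length, 4), got {sequences.shape}")
    num_regions = sequences.shape[0]
    for i, head in enumerate(heads):
        if head.ndim != 3:
            raise ValueError(
                f"head {i} should be (regions, length, tasks), got {head.shape}")
        if head.shape[0] != num_regions:
            raise ValueError(
                f"head {i} has {head.shape[0]} regions, expected {num_regions}")
    return num_regions
```

The input and output lengths are not compared with each other, since they generally differ.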