tools.addNoiseUtils
Utility functions for adding noise.
Warning
This tool is deprecated and will be removed in BPReveal 6.0. It turns out that it’s not very useful.
This program is needed for a reason that I don’t quite understand - gmstar is not picklable if I put it inside addNoise.py.
- bpreveal.tools.addNoiseUtils.loadFile(h5Fname, numHeads)
Initializer for a multiprocessing pool, loads up a global hdf5 file.
- Parameters:
h5Fname (str) – The name of the input hdf5 file that the Pool workers will read from.
numHeads (int) – The number of heads of your model.
- Return type:
None
- bpreveal.tools.addNoiseUtils.gmstar(getMutArgs)
A wrapper around getMutated that is to be used with a Pool of workers.
- Parameters:
getMutArgs (tuple[str, list[int], int]) – A 3-tuple containing: the configuration json as a string, a list giving the number of tasks for each head, and an integer that will be used to seed the rng.
- Returns:
See getMutated.
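The initializer-plus-wrapper pattern described for loadFile and gmstar can be sketched as follows. This is a minimal illustration under stated assumptions, not the BPReveal code: load_state, get_result, and get_result_star are hypothetical stand-ins for loadFile, getMutated, and gmstar, the "scale" config key is invented, and the sketch assumes the wrapper parses the JSON string before calling the real worker.

```python
import json
import multiprocessing

# Hypothetical per-process state, filled in once by the Pool initializer.
_WORKER_STATE = {}


def load_state(label, num_heads):
    """Pool initializer: runs once in every worker process."""
    _WORKER_STATE["label"] = label
    _WORKER_STATE["num_heads"] = num_heads


def get_result(config, tasks_per_head, seed):
    """Stand-in worker that takes three separate arguments."""
    total = config["scale"] * sum(tasks_per_head)
    return (_WORKER_STATE["label"], total, seed)


def get_result_star(args):
    """Unpack one 3-tuple so the worker can be used with Pool.map.

    Defined at module level so that it can be pickled by reference.
    """
    config_str, tasks_per_head, seed = args
    return get_result(json.loads(config_str), tasks_per_head, seed)


def run_pool(num_workers=2):
    """Run a few jobs, each with its own seed, through a worker pool."""
    config_str = json.dumps({"scale": 3})
    jobs = [(config_str, [2, 1], seed) for seed in range(4)]
    with multiprocessing.Pool(num_workers, initializer=load_state,
                              initargs=("demo", 2)) as pool:
        return pool.map(get_result_star, jobs)
```

When this is saved as a script, run_pool() should be called under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on import.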
- bpreveal.tools.addNoiseUtils.applyAddSub(maxReads, minReads, fracMut, headData, maxChange, add, rng)
Either add random noise to headData or subtract it.
- Parameters:
maxReads (int) – If a locus has more reads than this, don’t add noise.
minReads (int) – If a locus has fewer reads than this, don’t add noise.
fracMut (float) – What fraction of bases should be noised?
headData (ndarray) – An array of data from a single head, of shape ((output-length + jitter * 2) x num-tasks)
maxChange (int) – Add or subtract a random number of bases up to this maximum.
add (bool) – Do you want to add noise or subtract? True to add, False to subtract.
rng – A numpy random number generator.
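One way to read the parameters above is sketched below. It is a guess at the behaviour, not the BPReveal implementation: add_sub_sketch is a hypothetical name, and the sketch assumes that the read thresholds apply to each position separately, that the change is drawn uniformly from 1 to maxChange, and that subtraction is clamped at zero.

```python
import numpy as np


def add_sub_sketch(max_reads, min_reads, frac_mut, head_data, max_change,
                   add, rng):
    """Return a noised copy of head_data (positions x tasks)."""
    out = head_data.astype(np.int64)  # astype returns a copy by default
    # Assumption: only entries within [min_reads, max_reads] are eligible.
    eligible = (out >= min_reads) & (out <= max_reads)
    # Pick roughly frac_mut of the eligible entries.
    chosen = eligible & (rng.random(out.shape) < frac_mut)
    # Draw a change between 1 and max_change (inclusive) for every entry.
    delta = rng.integers(1, max_change + 1, size=out.shape)
    if add:
        out[chosen] += delta[chosen]
    else:
        # Assumption: counts are never allowed to go negative.
        out[chosen] = np.maximum(out[chosen] - delta[chosen], 0)
    return out
```

Passing the generator in explicitly, as the documented signature does, keeps the result reproducible for a given seed.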
- bpreveal.tools.addNoiseUtils.applyShift(maxDistance, shiftIndependently, fracMut, headData, rng)
Shift the data randomly but keep it near its source.
- Parameters:
rng – A numpy random number generator.
maxDistance (int) – What is the furthest that a read is allowed to drift?
shiftIndependently (bool) – Should each read be moved on its own, or should all of the reads at one position move together?
fracMut (float) – What fraction of bases should be subject to the shuffling?
headData (ndarray) – An array containing the data for a specific head.
- bpreveal.tools.addNoiseUtils.runShiftAlgo(shiftInputs, maxDistance, headData, shiftIndependently)
Actually perform the shifting.
- Parameters:
shiftInputs (ndarray) – The read indexes that should have their data shifted.
maxDistance (int) – What’s the furthest that a read can be shifted?
headData (ndarray) – The array of data for a head.
shiftIndependently (bool) – Should each read be shifted independently?
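The shifting described by applyShift and runShiftAlgo can be illustrated with the sketch below. It is not the BPReveal algorithm: shift_sketch is a hypothetical name, the offsets are assumed to be uniform over -maxDistance to +maxDistance, and reads that would leave the array are assumed to be clamped to its ends.

```python
import numpy as np


def shift_sketch(max_distance, shift_independently, frac_mut, head_data, rng):
    """Return a copy of head_data (positions x tasks) with reads moved."""
    length, num_tasks = head_data.shape
    out = head_data.astype(np.int64)  # astype returns a copy by default
    for task in range(num_tasks):
        # Pick roughly frac_mut of the positions to shift.
        positions = np.flatnonzero(rng.random(length) < frac_mut)
        for pos in positions:
            reads = int(head_data[pos, task])
            if reads == 0:
                continue
            # One offset per read, or a single offset shared by all of them.
            n_draws = reads if shift_independently else 1
            offsets = rng.integers(-max_distance, max_distance + 1,
                                   size=n_draws)
            # Assumption: clamp targets to the ends of the array.
            targets = np.clip(pos + offsets, 0, length - 1)
            out[pos, task] -= reads
            # np.add.at accumulates correctly when targets repeat.
            np.add.at(out[:, task], targets, reads // n_draws)
    return out
```

Because the read counts are taken from the original array rather than the partially shifted one, no read moves more than max_distance from its source, and the total number of reads in each task is preserved.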
- bpreveal.tools.addNoiseUtils.mutateProfile(mutationTypes, headData, rng)
Given a single head’s data (that may contain multiple tasks), perturb it.
- Parameters:
mutationTypes (list[dict]) – Straight from the config json, these are the profile-mutation-types.
headData (ndarray) – An array containing the data for the given head, from the input H5.
rng – A numpy random number generator.
- Returns:
A new array with the same shape as headData that has been perturbed.
- Return type:
ndarray
- bpreveal.tools.addNoiseUtils.getMutated(config, tasksPerHead, seed)
Performs the requested mutations.
- Parameters:
config (dict) – The json configuration.
tasksPerHead (list[int]) – A list giving, for each head, how many tasks it has.
seed (int) – An integer that will be used to seed the RNG.
- Returns:
A tuple. The first element is a dataset containing one-hot-encoded sequences. The second element is a list containing profile data for each of the heads, in order.
- Return type:
tuple[ndarray, list[ndarray]]
- bpreveal.tools.addNoiseUtils.writeOutput(outFname, sequences, heads)
Writes the given datasets to an hdf5 file that can be used to train BPReveal models.
- Parameters:
outFname (str) – The name of the hdf5 file to write.
sequences (ndarray) – An array of shape (num-regions x (input-length + 2*jitter) x 4), containing one-hot encoded sequences.
heads (list[ndarray]) – A list of profile data for each head. Each element of this list has shape (num-regions x (output-length + 2 * jitter) x num-tasks).
- Return type:
None
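The array shapes expected by writeOutput can be checked before writing with a small helper such as the one below. check_shapes is a hypothetical function, not part of BPReveal, and it only verifies the shapes documented above.

```python
import numpy as np


def check_shapes(sequences, heads):
    """Validate the documented shapes and return the task count per head."""
    # sequences: (num-regions x (input-length + 2*jitter) x 4)
    if sequences.ndim != 3 or sequences.shape[2] != 4:
        raise ValueError("sequences must have shape (regions, length, 4)")
    num_regions = sequences.shape[0]
    for i, head in enumerate(heads):
        # each head: (num-regions x (output-length + 2*jitter) x num-tasks)
        if head.ndim != 3 or head.shape[0] != num_regions:
            raise ValueError(f"head {i} must have shape (regions, length, tasks)")
    return [head.shape[2] for head in heads]


# Example values (hypothetical): 3 regions, input length 20,
# output length 10, jitter 2, and two heads with 2 and 1 tasks.
sequences = np.zeros((3, 20 + 2 * 2, 4))
heads = [np.zeros((3, 10 + 2 * 2, 2)), np.zeros((3, 10 + 2 * 2, 1))]
tasks_per_head = check_shapes(sequences, heads)
```

The returned list has the same form as the tasksPerHead argument of getMutated.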