tools.addNoiseUtils

Utility functions for adding noise.

This module exists because gmstar cannot be pickled when it is defined inside addNoise.py; I don’t fully understand why.

bpreveal.tools.addNoiseUtils.loadFile(h5Fname, numHeads)

Initializer for a multiprocessing pool. Loads the input hdf5 file into a global so that the Pool workers can read from it.

Parameters:

h5Fname (str) – The name of the input hdf5 File that the Pool workers will read from.
numHeads (int) – The number of heads of your model.

Return type:

None

bpreveal.tools.addNoiseUtils.gmstar(getMutArgs)

A wrapper around getMutated that is to be used with a Pool of workers.

Parameters:

getMutArgs (tuple[str, list[int], int]) – A 3-tuple containing: the configuration json as a string, a list giving the number of tasks for each head, and an integer that will be used to seed the rng.

Returns:

See getMutated.
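
A wrapper of this kind typically unpacks a single tuple so that Pool.map, which passes one argument per task, can drive a function that takes several. The sketch below assumes the wrapper parses the json string before forwarding it; starWrapper and getMutatedStandIn are hypothetical names, and the stand-in only echoes its inputs.

```python
import json


def getMutatedStandIn(config, tasksPerHead, seed):
    """Hypothetical stand-in for getMutated; it just echoes its inputs."""
    return (config["name"], sum(tasksPerHead), seed)


def starWrapper(args):
    """Unpack one picklable tuple and forward it as separate arguments."""
    configStr, tasksPerHead, seed = args
    # Assumption: the json string is parsed into a dict at this point.
    return getMutatedStandIn(json.loads(configStr), tasksPerHead, seed)


# One tuple per job; only the seed differs between jobs.
jobs = [(json.dumps({"name": "demo"}), [2, 1], seed) for seed in range(3)]
# A Pool would call pool.map(starWrapper, jobs); a plain loop shows the same flow.
results = [starWrapper(job) for job in jobs]
```

Passing the configuration as a string keeps every element of the tuple trivially picklable.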

bpreveal.tools.addNoiseUtils.applyAddSub(maxReads, minReads, fracMut, headData, maxChange, add, rng)

Either add random noise to headData or subtract random noise from it.

Parameters:

maxReads (int) – If a locus has more reads than this, don’t add noise.
minReads (int) – If a locus has fewer reads than this, don’t add noise.
fracMut (float) – What fraction of bases should be noised?
headData (ndarray) – An array of data from a single head, of shape ((output-length + jitter * 2) x num-tasks)
maxChange (int) – Add or subtract a random number of bases up to this maximum.
add (bool) – Do you want to add noise or subtract? True to add, False to subtract.
rng – A numpy random number generator.
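
One way the parameters above could fit together is sketched below. This is an illustration under stated assumptions, not the BPReveal code: addSubSketch is a hypothetical name, read depth is judged per position summed over tasks, each chosen cell changes by 1 to maxChange, and subtraction is clipped at zero.

```python
import numpy as np


def addSubSketch(maxReads, minReads, fracMut, headData, maxChange, add, rng):
    """Return a noised copy of headData, which has shape (length, numTasks)."""
    out = headData.copy()
    # Assumption: depth is measured per position, summed over tasks.
    depth = headData.sum(axis=1)
    eligible = (depth >= minReads) & (depth <= maxReads)
    # Pick roughly fracMut of the positions, then keep only eligible ones.
    chosen = (rng.random(headData.shape[0]) < fracMut) & eligible
    # Draw a change of 1..maxChange for every chosen (position, task) cell.
    delta = rng.integers(1, maxChange + 1, size=out[chosen].shape)
    if add:
        out[chosen] += delta
    else:
        # Never let a subtraction push a count below zero.
        out[chosen] = np.maximum(out[chosen] - delta, 0)
    return out
```

The input array is copied first, so the caller's data is left untouched.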

bpreveal.tools.addNoiseUtils.applyShift(maxDistance, shiftIndependently, fracMut, headData, rng)

Shift the data randomly but keep it near its source.

Parameters:

maxDistance (int) – What is the furthest that a read is allowed to drift?
shiftIndependently (bool) – Should each read be moved on its own, or should all of the reads at one position move together?
fracMut (float) – What fraction of bases should be subject to the shuffling?
headData (ndarray) – An array containing the data for a specific head.
rng – A numpy random number generator.

bpreveal.tools.addNoiseUtils.runShiftAlgo(shiftInputs, maxDistance, headData, shiftIndependently)

Actually perform the shifting.

Parameters:

shiftInputs (ndarray) – The read indexes that should have their data shifted.
maxDistance (int) – What’s the furthest that a read can be shifted?
headData (ndarray) – The array of data for a head.
shiftIndependently (bool) – Should each read be shifted independently?
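
The case where all reads at one position move together might look like the sketch below. It is only an illustration: shiftSketch is a hypothetical name, it takes an rng argument that runShiftAlgo's signature does not show, and clamping targets to the array bounds is an assumption.

```python
import numpy as np


def shiftSketch(positions, maxDistance, headData, rng):
    """Move all reads at each listed position to a nearby position."""
    out = headData.copy()
    length = headData.shape[0]
    # np.unique guards against moving the same position twice.
    for pos in np.unique(positions):
        # Draw an offset in [-maxDistance, maxDistance], clamped to the array.
        offset = int(rng.integers(-maxDistance, maxDistance + 1))
        target = min(max(int(pos) + offset, 0), length - 1)
        if target == pos:
            continue
        # Move the reads originally at pos, for every task at once.
        moved = headData[pos]
        out[pos] -= moved
        out[target] += moved
    return out
```

Because reads are only relocated, the total count in each task column is the same before and after the shift.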

bpreveal.tools.addNoiseUtils.mutateProfile(mutationTypes, headData, rng)

Given a single head’s data (that may contain multiple tasks), perturb it.

Parameters:

mutationTypes (list[dict]) – Straight from the config json, these are the profile-mutation-types.
headData (ndarray) – An array containing the data for the given head, from the input H5.
rng – A numpy random number generator.

Returns:

A new array with the same shape as headData that has been perturbed.

Return type:

ndarray

bpreveal.tools.addNoiseUtils.getMutated(config, tasksPerHead, seed)

Performs the requested mutations.

Parameters:

config (dict) – The json configuration.
tasksPerHead (list[int]) – A list giving, for each head, how many tasks it has.
seed (int) – An integer that will be used to seed the RNG.

Returns:

A tuple. The first element is a dataset containing one-hot-encoded sequences. The second element is a list containing profile data for each of the heads, in order.

Return type:

tuple[ndarray, list[ndarray]]
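
Seeding from a plain integer makes each job reproducible on its own, whichever worker runs it. The sketch below shows the idea with numpy's default_rng; noisyCopy is a hypothetical helper and the noise it adds is arbitrary.

```python
import numpy as np


def noisyCopy(values, seed):
    """Perturb values using a generator seeded only by the given integer."""
    rng = np.random.default_rng(seed)
    return values + rng.integers(0, 5, size=values.shape)


values = np.zeros(8, dtype=np.int64)
first = noisyCopy(values, seed=1234)
again = noisyCopy(values, seed=1234)  # same seed, so the same noise
```

Giving every job a distinct seed keeps the jobs from producing identical noise while still allowing any single job to be rerun exactly.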

bpreveal.tools.addNoiseUtils.writeOutput(outFname, sequences, heads)

Writes the given datasets to an hdf5 file that can be used to train BPReveal models.

Parameters:

outFname (str) – The name of the hdf5 file to write.
sequences (ndarray) – An array of shape (num-regions x (input-length + 2 * jitter) x 4), containing one-hot encoded sequences.
heads (list[ndarray]) – A list of profile data for each head. Each element of this list has shape (num-regions x (output-length + 2 * jitter) x num-tasks).

Return type:

None
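
Before writing, it can help to confirm that the arrays match the shapes listed above. The check below is a hypothetical helper, not part of BPReveal, and it only verifies dimensions, not that the sequences are valid one-hot encodings.

```python
import numpy as np


def checkShapes(sequences, heads):
    """Raise ValueError if the arrays do not match the documented shapes."""
    if sequences.ndim != 3 or sequences.shape[2] != 4:
        raise ValueError("sequences must be (num-regions x length x 4)")
    for i, head in enumerate(heads):
        # Every head needs one row of profile data per region.
        if head.ndim != 3 or head.shape[0] != sequences.shape[0]:
            raise ValueError(
                f"head {i} must be (num-regions x length x num-tasks)")
    return True
```

For example, three regions with two heads would pass as sequences of shape (3, 100, 4) and heads of shapes (3, 60, 2) and (3, 60, 1).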