tools.addNoiseUtils

Utility functions for adding noise.

This module exists for a reason that I don’t quite understand: gmstar is not picklable if I put it inside addNoise.py.

bpreveal.tools.addNoiseUtils.loadFile(h5Fname, numHeads)

Initializer for a multiprocessing pool, loads up a global hdf5 file.

Parameters:
  • h5Fname (str) – The name of the input hdf5 file that the Pool workers will read from.

  • numHeads (int) – The number of heads in the input hdf5 file.

Return type:

None

bpreveal.tools.addNoiseUtils.gmstar(getMutArgs)

A wrapper around getMutated that is to be used with a Pool of workers.

Parameters:

getMutArgs (tuple[str, list[int], int]) – A 3-tuple containing: the configuration json as a string, a list giving the number of tasks for each head, and an integer that will be used to seed the rng.

Returns:

See getMutated.
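
The wrapper pattern described above can be sketched as follows. This is a minimal illustration and not BPReveal's code: fakeGetMutated is a hypothetical stand-in for getMutated, the "name" key is invented for the example, and decoding the json string inside the wrapper is an assumption based on the parameter types documented here.

```python
import json


def fakeGetMutated(config, tasksPerHead, seed):
    """Hypothetical stand-in for getMutated; it only echoes its inputs."""
    return (config["name"], sum(tasksPerHead), seed)


def starWrapper(args):
    """Unpack one tuple into the three arguments of the real worker.

    Pool.map and Pool.imap hand a single object to the worker function,
    so the three arguments travel together as one tuple.
    """
    configJson, tasksPerHead, seed = args
    # Assumption: the configuration arrives as a json string and is decoded here.
    config = json.loads(configJson)
    return fakeGetMutated(config, tasksPerHead, seed)


# Build one argument tuple per job, each with its own seed.
configJson = json.dumps({"name": "demo"})
jobs = [(configJson, [2, 1], seed) for seed in range(3)]
# The built-in map stands in for Pool.map so the sketch runs without workers.
results = list(map(starWrapper, jobs))
```

With a real multiprocessing Pool, the same list of tuples would be passed to the pool's map or imap method in place of the built-in map.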

bpreveal.tools.addNoiseUtils.applyAddSub(maxReads, minReads, fracMut, headData, maxChange, add, rng)

Either add random noise to headData or subtract it.

Parameters:
  • maxReads (int) – If a locus has more reads than this, don’t add noise.

  • minReads (int) – If a locus has fewer reads than this, don’t add noise.

  • fracMut (float) – What fraction of bases should be noised?

  • headData (ndarray) – An array of data from a single head, of shape ((output-length + jitter * 2) x num-tasks)

  • maxChange (int) – Add or subtract a random number of bases up to this maximum.

  • add (bool) – Do you want to add noise or subtract? True to add, False to subtract.

  • rng – A numpy random number generator.
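
One reading of these parameters is sketched below. It is an illustration under stated assumptions and not the BPReveal implementation: plain nested lists and Python's random.Random stand in for the ndarray and the numpy generator, the read thresholds are applied to each value separately, and subtraction is clamped at zero. The name addSubSketch is hypothetical.

```python
import random


def addSubSketch(maxReads, minReads, fracMut, headData, maxChange, add, rng):
    """Return a noised copy of headData, a list of per-position rows."""
    out = [list(row) for row in headData]
    for row in out:
        for task, reads in enumerate(row):
            # Values outside [minReads, maxReads] are left untouched.
            if reads < minReads or reads > maxReads:
                continue
            # Only about fracMut of the eligible values are changed.
            if rng.random() >= fracMut:
                continue
            change = rng.randint(1, maxChange)
            if add:
                row[task] = reads + change
            else:
                # Assumption: counts are clamped at zero when subtracting.
                row[task] = max(0, reads - change)
    return out


rng = random.Random(1234)
data = [[0, 5], [10, 50], [3, 7], [200, 4]]
noised = addSubSketch(maxReads=100, minReads=1, fracMut=1.0,
                      headData=data, maxChange=3, add=True, rng=rng)
```

With fracMut set to 1.0 every eligible value changes, while the 0 and the 200 stay as they are because they fall outside the read thresholds.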

bpreveal.tools.addNoiseUtils.applyShift(maxDistance, shiftIndependently, fracMut, headData, rng)

Shift the data randomly but keep it near its source.

Parameters:
  • rng – A numpy random number generator.

  • maxDistance (int) – What is the furthest that a read is allowed to drift?

  • shiftIndependently (bool) – Should each read be moved on its own, or should all of the reads at one position move together?

  • fracMut (float) – What fraction of bases should be subject to the shuffling?

  • headData (ndarray) – An array containing the data for a specific head.

bpreveal.tools.addNoiseUtils.runShiftAlgo(shiftInputs, maxDistance, headData, shiftIndependently)

Actually perform the shifting.

Parameters:
  • shiftInputs (ndarray) – The read indexes that should have their data shifted.

  • maxDistance (int) – What’s the furthest that a read can be shifted?

  • headData (ndarray) – The array of data for a head.

  • shiftIndependently (bool) – Should each read be shifted independently?
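
The shifting described by applyShift and runShiftAlgo could look roughly like the sketch below, which handles a single task. It is not the BPReveal algorithm: shiftSketch is a hypothetical name, a plain list and random.Random stand in for the ndarray and the numpy generator, offsets are drawn uniformly from -maxDistance to maxDistance, and reads that would fall off either end are assumed to pile up at the edge.

```python
import random


def shiftSketch(maxDistance, shiftIndependently, fracMut, counts, rng):
    """Return a shifted copy of counts, a list of read counts per position."""
    length = len(counts)
    out = list(counts)
    # Choose which positions have their reads moved.
    chosen = [pos for pos in range(length) if rng.random() < fracMut]
    for pos in chosen:
        reads = counts[pos]
        if reads == 0:
            continue
        if shiftIndependently:
            # Every read at this position draws its own offset.
            offsets = [rng.randint(-maxDistance, maxDistance)
                       for _ in range(reads)]
        else:
            # All reads at this position share a single offset.
            offsets = [rng.randint(-maxDistance, maxDistance)] * reads
        # Remove the original reads, then deposit them at their targets.
        out[pos] -= reads
        for offset in offsets:
            # Assumption: reads that would leave the array stop at its edge.
            target = min(length - 1, max(0, pos + offset))
            out[target] += 1
    return out


rng = random.Random(7)
counts = [0, 4, 0, 0, 9, 0, 1, 0]
together = shiftSketch(2, False, 1.0, counts, rng)
apart = shiftSketch(2, True, 1.0, counts, rng)
```

In both modes the total number of reads is conserved. When the reads at a position move together, each pile lands intact on a single target, so the result has at most as many occupied positions as the input.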

bpreveal.tools.addNoiseUtils.mutateProfile(mutationTypes, headData, rng)

Given a single head’s data (that may contain multiple tasks), perturb it.

Parameters:
  • mutationTypes (list[dict]) – Straight from the config json, these are the profile-mutation-types.

  • headData (ndarray) – An array containing the data for the given head, from the input H5.

Returns:

A new array with the same shape as headData that has been perturbed.

Return type:

ndarray
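
A function of this kind is often structured as a dispatch over the list of requested mutations. The sketch below shows that pattern only. The key names "kind", "amount" and "frac", the two handlers, and the name mutateSketch are all invented for illustration and are not BPReveal's configuration schema; nested lists and random.Random stand in for the ndarray and the numpy generator.

```python
import copy
import random


def addConstant(data, spec, rng):
    """Hypothetical mutation: add a fixed amount to every value."""
    return [[v + spec["amount"] for v in row] for row in data]


def dropSome(data, spec, rng):
    """Hypothetical mutation: zero each value with probability spec['frac']."""
    return [[0 if rng.random() < spec["frac"] else v for v in row]
            for row in data]


# Dispatch table keyed on an invented "kind" field.
HANDLERS = {"add-constant": addConstant, "drop-some": dropSome}


def mutateSketch(mutationTypes, headData, rng):
    """Apply each requested mutation in order and return a new array."""
    # Work on a copy so the caller's data is never modified.
    out = copy.deepcopy(headData)
    for spec in mutationTypes:
        out = HANDLERS[spec["kind"]](out, spec, rng)
    return out


original = [[1, 2], [3, 4]]
steps = [{"kind": "add-constant", "amount": 2},
         {"kind": "drop-some", "frac": 0.0}]
perturbed = mutateSketch(steps, original, random.Random(0))
```

The result has the same shape as the input, and the input itself is left unchanged.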

bpreveal.tools.addNoiseUtils.getMutated(config, tasksPerHead, seed)

Performs the requested mutations.

Parameters:
  • config (dict) – The json configuration.

  • tasksPerHead (list[int]) – A list giving, for each head, how many tasks it has.

  • seed (int) – An integer that will be used to seed the RNG.

Returns:

A tuple. The first element is a dataset containing one-hot-encoded sequences. The second element is a list containing profile data for each of the heads, in order.

Return type:

tuple[ndarray, list[ndarray]]

bpreveal.tools.addNoiseUtils.writeOutput(outFname, sequences, heads)

Writes the given datasets to an hdf5 file that can be used to train BPReveal models.

Parameters:
  • outFname (str) – The name of the hdf5 file to write.

  • sequences (ndarray) – An array of shape (num-regions x (input-length + 2*jitter) x 4), containing one-hot encoded sequences.

  • heads (list[ndarray]) – A list of profile data for each head. Each element of this list has shape (num-regions x (output-length + 2 * jitter) x num-tasks).

Return type:

None
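
The shape arithmetic for the two arguments can be spelled out with a small helper. This is a sketch, assuming the last axis of each head array is num-tasks, as it is for headData in the functions above. The function name and the example numbers are hypothetical.

```python
def expectedShapes(numRegions, inputLength, outputLength, jitter, tasksPerHead):
    """Return the shapes that the sequences and heads arrays should have."""
    # Sequences are one-hot encoded, so the last axis has one slot per base.
    seqShape = (numRegions, inputLength + 2 * jitter, 4)
    # Each head covers the output window plus jitter padding on both sides.
    headShapes = [(numRegions, outputLength + 2 * jitter, numTasks)
                  for numTasks in tasksPerHead]
    return seqShape, headShapes


seqShape, headShapes = expectedShapes(numRegions=100, inputLength=3000,
                                      outputLength=1000, jitter=50,
                                      tasksPerHead=[2, 1])
```

Comparing these tuples against the .shape attribute of each array before writing is a cheap way to catch a mismatched jitter or task count.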