generators

A class to load up hdf5 files.

These files will have been generated by prepareTrainingData.

class bpreveal.generators.H5BatchGenerator(*args, **kwargs)

Loads up training data and presents it to the model.

Parameters:
  • headList (dict) – The list of heads straight from the configuration JSON. This gets mutated when mean counts are added; see addMeanCounts().

  • dataH5 (File) – The (opened) hdf5 file generated by prepareTrainingData.

  • inputLength (int) – The input length of your model.

  • outputLength (int) – The output length of your model.

  • maxJitter (int) – How much random offset can the generator apply when it creates a batch?

  • batchSize (int) – How many samples will the model be trained on in each batch?
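To make the relationship between inputLength, outputLength, and maxJitter concrete, here is a minimal sketch of jittered cropping. It is not BPReveal's implementation. It assumes that each stored region is padded by maxJitter positions on both sides and that the same offset is applied to the input and the output so the two stay aligned. The helper name jitterCrop and the toy data are hypothetical.

```python
import random


def jitterCrop(paddedSeq, paddedProfile, inputLength, outputLength,
               maxJitter, rng):
    """Crop one padded region at a random offset (hypothetical helper).

    Assumes paddedSeq has inputLength + 2 * maxJitter entries and
    paddedProfile has outputLength + 2 * maxJitter entries.
    """
    # One offset in [0, 2 * maxJitter] is shared by input and output so
    # that the two windows shift together.
    offset = rng.randint(0, 2 * maxJitter)
    seq = paddedSeq[offset:offset + inputLength]
    profile = paddedProfile[offset:offset + outputLength]
    return offset, seq, profile


inputLength, outputLength, maxJitter = 12, 4, 3
rng = random.Random(0)  # Seeded only to make the sketch repeatable.

paddedSeq = "ACGTACGTACGTACGTAC"      # 12 + 2 * 3 = 18 bases
paddedProfile = list(range(10))       # 4 + 2 * 3 = 10 positions

offset, seq, profile = jitterCrop(paddedSeq, paddedProfile, inputLength,
                                  outputLength, maxJitter, rng)
print(offset, seq, profile)
```

Whatever offset is drawn, the cropped sequence always has inputLength entries and the cropped profile always has outputLength entries. Only the position of the window within the padded region changes.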

addMeanCounts()

For all heads, calculate the average number of reads over all regions.

In the BPNet paper, it was shown that \(\lambda_{1/2} = \hat{c}/2\), where \(\hat{c}\) is the average number of counts in each region. If that value of \(\lambda\) is used as the counts loss weight, then the profile and counts losses will be given equal weight.

For each head in self.headList, adds a new field INTERNAL_mean-counts that contains the average counts over the output windows. For a target counts loss weight fraction f, you can calculate an initial \(\lambda\) value for the counts loss as \(\lambda = f \cdot \hat{c}\).

Return type:

None
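The arithmetic above can be illustrated with a small worked example. This is a sketch on toy data, not the library's code. The helpers meanCounts and countsLossWeight are hypothetical, and the sketch assumes a single track per head, since how counts are combined across tracks is not described here.

```python
def meanCounts(profiles):
    """Average total read count per region (hypothetical helper).

    profiles is a list of regions, each a list of per-base read counts
    over the output window.
    """
    totals = [sum(region) for region in profiles]
    return sum(totals) / len(totals)


def countsLossWeight(cHat, f):
    """Initial counts loss weight: lambda = f * cHat."""
    return f * cHat


# Toy data: three regions with an output window of four bases each.
profiles = [
    [0, 2, 5, 1],   # 8 reads
    [3, 3, 0, 0],   # 6 reads
    [1, 0, 0, 9],   # 10 reads
]

cHat = meanCounts(profiles)         # (8 + 6 + 10) / 3 = 8.0
lam = countsLossWeight(cHat, 0.5)   # 0.5 * 8.0 = 4.0
print(cHat, lam)
```

With f = 0.5 the result matches the equal-weight case from the BPNet paper, where the weight is half of the mean counts.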

loadData()

Read in the hdf5 file and load all of its data into memory.

Called only once.

Return type:

None

refreshData()

Go over all the data, apply a fresh random jitter, and load the result into the data structures set up by loadData.

Called once every epoch.

Return type:

None

on_epoch_end()

When the epoch is done, re-jitter the data by calling refreshData.

Return type:

None
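The call order of loadData, refreshData, and on_epoch_end can be sketched with a toy class. This is a hypothetical stand-in, not H5BatchGenerator itself: it holds plain Python lists instead of an hdf5 file, and it assumes that the constructor calls loadData once and then refreshData once before training starts.

```python
import random


class ToyJitterGenerator:
    """Hypothetical stand-in that mimics the documented call order."""

    def __init__(self, paddedRegions, outputLength, maxJitter, seed=0):
        self.paddedRegions = paddedRegions
        self.outputLength = outputLength
        self.maxJitter = maxJitter
        self.rng = random.Random(seed)
        self.loadCalls = 0
        self.refreshCalls = 0
        self.loadData()      # Assumed: happens once, at construction.
        self.refreshData()   # Assumed: first jitter before epoch one.

    def loadData(self):
        # Copy everything into memory and set up the window storage.
        self.loadCalls += 1
        self.inMemory = [list(region) for region in self.paddedRegions]
        self.windows = [None] * len(self.inMemory)

    def refreshData(self):
        # Draw a new offset for every region and re-crop its window.
        self.refreshCalls += 1
        for i, region in enumerate(self.inMemory):
            offset = self.rng.randint(0, 2 * self.maxJitter)
            self.windows[i] = region[offset:offset + self.outputLength]

    def on_epoch_end(self):
        # Re-jitter so the next epoch sees shifted windows.
        self.refreshData()


regions = [list(range(10)), list(range(100, 110))]
gen = ToyJitterGenerator(regions, outputLength=4, maxJitter=3)
for _ in range(3):       # Stand-in for three training epochs.
    gen.on_epoch_end()
print(gen.loadCalls, gen.refreshCalls)
```

In this sketch the expensive read happens once, while the cheap re-cropping runs once at construction and once more after every epoch.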