generators
A class to load up hdf5 files.
These files will have been generated by prepareTrainingData.
- class bpreveal.generators.H5BatchGenerator(*args, **kwargs)
Loads up training data and presents it to the model.
- Parameters:
headList (dict) – The list of heads straight from the configuration JSON. This gets mutated when mean counts are added; see addMeanCounts().
dataH5 (File) – The (opened) hdf5 file generated by prepareTrainingData.
inputLength (int) – The input length of your model.
outputLength (int) – The output length of your model.
maxJitter (int) – How much random offset can the generator apply when it creates a batch?
batchSize (int) – How many samples will the model be trained on in each batch?
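As an illustration of what maxJitter means, here is a minimal sketch of drawing one jittered sample. It is not the bpreveal implementation: the function jitterCrop is hypothetical, and the sketch assumes that each stored region carries maxJitter extra positions on both sides of the input and output windows, which the text above does not state.

```python
import random

def jitterCrop(paddedSeq, paddedProfile, inputLength, outputLength, maxJitter, rng):
    """Cut one jittered input/output pair out of padded data (hypothetical helper)."""
    # Layout assumption (not confirmed by the documentation): both arrays carry
    # maxJitter extra positions on each side of the un-jittered window.
    assert len(paddedSeq) == inputLength + 2 * maxJitter
    assert len(paddedProfile) == outputLength + 2 * maxJitter
    # randint is inclusive at both ends, so (offset - maxJitter) ranges over
    # -maxJitter .. +maxJitter.
    offset = rng.randint(0, 2 * maxJitter)
    # The same offset is applied to both arrays so input and output stay aligned.
    seq = paddedSeq[offset:offset + inputLength]
    profile = paddedProfile[offset:offset + outputLength]
    return offset, seq, profile

rng = random.Random(1234)
inputLength, outputLength, maxJitter = 12, 4, 3
paddedSeq = ("ACGT" * 5)[:inputLength + 2 * maxJitter]      # 18 bases
paddedProfile = list(range(outputLength + 2 * maxJitter))   # 10 positions
offset, seq, profile = jitterCrop(paddedSeq, paddedProfile,
                                  inputLength, outputLength, maxJitter, rng)
print("shift:", offset - maxJitter, "sequence:", seq, "profile:", profile)
```

Whatever shift is drawn, the cropped sequence always has inputLength positions and the cropped profile always has outputLength positions.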
- addMeanCounts()
For all heads, calculate the average number of reads over all regions.
In the BPNet paper, it was shown that \({\lambda}_{1/2}\) = ĉ/2, where ĉ is the average number of counts in each region. If that value of \({\lambda}\) is used as the counts loss weight, then the profile and counts losses will be given equal weight.
For each head in self.headList, adds a new field INTERNAL_mean-counts that contains the average counts over the output windows. For a target counts loss weight fraction f, you can calculate an initial \({\lambda}\) value for the counts loss based on: \({\lambda}\) = f * ĉ
- Return type:
None
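The arithmetic behind the formula above can be sketched as follows. The helper countsLossWeight and the per-region counts are made up for illustration; they are not part of bpreveal, and the real method works on the heads in self.headList rather than on a plain list.

```python
def countsLossWeight(regionCounts, fraction):
    """Return (mean counts, lambda) for a target counts loss fraction.

    Hypothetical helper: regionCounts holds the total reads in the output
    window of each region, and fraction is the target f.
    """
    if not regionCounts:
        raise ValueError("need at least one region")
    meanCounts = sum(regionCounts) / len(regionCounts)
    # lambda = f * c-hat, as in the formula above.
    return meanCounts, fraction * meanCounts

# Made-up read totals for four regions.
regionCounts = [120, 80, 100, 100]
meanCounts, lam = countsLossWeight(regionCounts, 0.5)
print(meanCounts, lam)  # prints: 100.0 50.0
```

With f = 0.5 this gives ĉ/2, the equal-weight case.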
- loadData()
Read in the hdf5 file and suck all the data into memory.
Called only once.
- Return type:
None
- refreshData()
Go over all the data and load it into the data structures from loadData.
Called once every epoch.
- Return type:
None
- on_epoch_end()
When the epoch is done, re-jitter the data by calling refreshData.
- Return type:
None
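The division of labor between loadData, refreshData and on_epoch_end can be illustrated with a toy class. This is a sketch of the pattern only, not the bpreveal code: ToyBatchGenerator, its constructor arguments and its counters are hypothetical, and calling refreshData once from the constructor is an assumption.

```python
import random

class ToyBatchGenerator:
    """Toy stand-in for the load-once, refresh-every-epoch pattern."""

    def __init__(self, paddedRegions, outputLength, maxJitter, seed=0):
        self.outputLength = outputLength
        self.maxJitter = maxJitter
        self.rng = random.Random(seed)
        self.loadCalls = 0
        self.refreshCalls = 0
        self.loadData(paddedRegions)
        self.refreshData()  # assumption: prepare data for the first epoch

    def loadData(self, paddedRegions):
        # Expensive step, done once: keep all the padded data in memory.
        self.padded = [list(region) for region in paddedRegions]
        self.loadCalls += 1

    def refreshData(self):
        # Cheap step, done every epoch: draw new offsets and re-crop.
        self.current = []
        for region in self.padded:
            offset = self.rng.randint(0, 2 * self.maxJitter)
            self.current.append(region[offset:offset + self.outputLength])
        self.refreshCalls += 1

    def on_epoch_end(self):
        # Re-jitter so the next epoch sees shifted windows.
        self.refreshData()

# Two regions of 10 positions each: outputLength 6 plus 2 * maxJitter padding.
regions = [list(range(10)), list(range(100, 110))]
gen = ToyBatchGenerator(regions, outputLength=6, maxJitter=2)
for _ in range(3):          # simulate three epochs finishing
    gen.on_epoch_end()
print(gen.loadCalls, gen.refreshCalls)  # prints: 1 4
```

Keeping the padded data in memory means each epoch only pays for new random offsets and slicing, not for reading the file again.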