Programs
Main CLI
These are the core programs of BPReveal. Each one takes a JSON configuration file.
interpretFlat: Generates shap scores of the same type as BPNet. Hypothetical contributions for each base are written to a modisco-compatible hdf5 file.
interpretPisa: Runs an all-to-all shap analysis on the given bed regions or fasta sequences.
makePredictions: Takes a trained model (solo, combined, residual, or even transformation models work) and predicts over the given regions.
motifScan: Scans the genome for patterns of contribution scores that match motifs identified by modiscolite.
motifSeqletCutoffs: Loads the output from modiscolite and calculates cutoff values to use during motif scanning.
prepareBed: Given a set of regions and data tracks, rejects regions that have too few (or too many) reads, or that have unmapped bases in the genome.
prepareTrainingData: Takes in bed and bigwig files and a genome, and generates an hdf5-format file containing the samples used for training.
trainCombinedModel: Takes a transformation model and experimental data and builds a model to explain the residuals. Saves both the combined model and the residual model alone to disk.
trainSoloModel: Takes in a training input configuration and trains a model to predict the given data, with no bias correction. Saves the model to disk, along with information from the training phase.
trainTransformationModel: Takes in a bias (i.e., solo) model and the actual experimental (i.e., biology + bias) data. Derives a relation that best fits the bias profile onto the experimental data. Saves a new model to disk, adding a simple layer or two to do the regression.
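The region filtering that prepareBed performs can be pictured with a small standalone sketch. This is not BPReveal's implementation: the Region type, the function filter_regions, and its min_counts/max_counts thresholds are hypothetical, the genome and per-base read counts are assumed to already be in memory, and an "unmapped base" is assumed to mean an N in the genome sequence.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical bed-style region, half-open [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def filter_regions(regions, genome, counts, min_counts, max_counts):
    """Keep regions whose total read count lies in [min_counts, max_counts]
    and whose genomic sequence contains no N (assumed unmapped) bases.

    genome: dict of chromosome name -> sequence string (assumed in memory).
    counts: dict of chromosome name -> list of per-base read counts.
    """
    kept = []
    for region in regions:
        seq = genome[region.chrom][region.start:region.end]
        if "N" in seq.upper():
            continue  # reject regions that overlap unmapped bases
        total = sum(counts[region.chrom][region.start:region.end])
        if min_counts <= total <= max_counts:
            kept.append(region)  # neither too few nor too many reads
    return kept


if __name__ == "__main__":
    genome = {"chr1": "ACGTNACGTACGTACG"}
    counts = {"chr1": [0, 1, 2, 0, 0, 5, 5, 5, 0, 0, 1, 1, 1, 1, 0, 0]}
    regions = [
        Region("chr1", 0, 4),    # 3 reads, no N: kept
        Region("chr1", 2, 6),    # contains N: rejected
        Region("chr1", 5, 9),    # 15 reads: too many
        Region("chr1", 8, 10),   # 0 reads: too few
        Region("chr1", 10, 16),  # 4 reads, no N: kept
    ]
    for r in filter_regions(regions, genome, counts, min_counts=3, max_counts=10):
        print(r)
```

Checking for N bases before summing the counts means a region is rejected for unmapped sequence regardless of its coverage.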
Utility CLI
These are small tools and utilities that help in working with models. Unlike the main programs, they take arguments on the command line rather than a JSON configuration file.
lengthCalc: Given the parameters of a network, such as input filter width and number of layers, determines the input width or output width.
makeLossPlots: Once you’ve trained a model, run this on the history file to get plots of all of the components of the loss.
metrics: Calculates a suite of metrics about how good a model’s predictions are.
motifAddQuantiles: Takes the output from motifScan and adds quantile information for determining how good your motif matches were.
predictToBigwig: Takes the hdf5 file generated by the predict step and converts one track from it into a bigwig file.
shapToBigwig: Converts a shap hdf5 file (from interpretFlat) into a bigwig track for visualization.
shapToNumpy: Takes the interpretations from interpretFlat and converts them to numpy arrays that can be read in by modiscolite.
checkJson: Takes a json file and makes sure that it’s valid input for one of the BPReveal programs. Can also be used to identify which BPReveal program a json file belongs to.
showTrainingProgress: Reads the log files generated by the training programs (when verbosity is INFO or DEBUG) and shows you how well the model is doing in real time.
showModel: (DEPRECATED, will be removed in 6.0.0) Makes a pretty picture of your model.
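The arithmetic behind a tool like lengthCalc can be sketched as follows. This is a hypothetical illustration, not lengthCalc's code: it assumes a BPNet-style network in which every convolution is unpadded ("valid"), the dilated layers use kernel size 3 with dilation 2**i for i = 1..n, and the function names are made up for this example. Each unpadded convolution with kernel size k and dilation d trims (k - 1) * d positions, so the input and output widths differ by a fixed amount.

```python
def _positions_lost(input_filter_width, output_filter_width, n_dil_layers, kernel_size=3):
    """Positions trimmed between input and output, assuming every
    convolution uses 'valid' (unpadded) mode."""
    lost = input_filter_width - 1              # initial convolution
    for i in range(1, n_dil_layers + 1):       # dilated convolution stack
        dilation = 2 ** i                      # assumed dilation schedule
        lost += (kernel_size - 1) * dilation
    lost += output_filter_width - 1            # final profile convolution
    return lost


def input_length_for(output_len, input_filter_width, output_filter_width, n_dil_layers):
    """Input width needed to produce output_len predicted bases."""
    return output_len + _positions_lost(input_filter_width, output_filter_width, n_dil_layers)


def output_length_for(input_len, input_filter_width, output_filter_width, n_dil_layers):
    """Output width produced from input_len bases of sequence."""
    out = input_len - _positions_lost(input_filter_width, output_filter_width, n_dil_layers)
    if out <= 0:
        raise ValueError("input is too short to produce any output")
    return out


if __name__ == "__main__":
    print(input_length_for(1000, 25, 75, 9))   # prints 3142
    print(output_length_for(3142, 25, 75, 9))  # prints 1000
```

Because the trimmed amount does not depend on the width itself, converting in either direction is a single addition or subtraction; a real architecture with a different dilation schedule or padding would change only _positions_lost.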
API
These are Python libraries that do most of the heavy lifting, and can be imported to do useful things in your code.
gaOptimize: Contains tools for evolving sequences that lead to desired profiles. It implements a genetic algorithm that supports insertions and deletions.
utils: Contains general-use utilities and a high-performance tool to generate predictions for many sequences.
bedUtils: Useful functions for manipulating bed files, particularly for tiling the genome with regions.
motifUtils: Functions for dealing with motif scanning and modisco files.
logUtils: Functions used to log information. It’s basically TensorFlow’s wrapper around the logging module in the standard library.
interpretUtils: Functions for getting interpretation scores. Contains a streaming system for calculating pisa and flat importance scores.
schema: A set of JSON schemas that validate the inputs to the BPReveal programs. These are used to make sure that incorrect inputs trigger errors early, and that those errors are clearer to the user.
training: A very simple module that actually runs the training loop for trainSoloModel, trainTransformationModel, and trainCombinedModel.
jaccard: Contains wrappers around C functions that calculate the sliding Jaccard similarity used to scan for motifs.
ushuffle: A wrapper around the ushuffle library, used to perform shuffles of sequences that preserve k-mer distributions.
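To give a feel for what a sliding Jaccard scan computes, here is a pure-Python sketch. It is illustrative only and not the jaccard module's API: the function names are hypothetical, and it assumes the continuous Jaccard form popularized by TF-MoDISco (sign-aware sum of minimum magnitudes divided by sum of maximum magnitudes). Any normalization of the windows that a production scanner may apply is omitted. Both the motif and the contribution scores are lists of per-position rows, for example one value each for A, C, G, and T.

```python
def _sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def continuous_jaccard(a, b):
    """Sign-aware continuous Jaccard similarity of two equal-length
    lists of per-position rows (assumed TF-MoDISco-style definition)."""
    intersection = 0.0
    union = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            intersection += min(abs(x), abs(y)) * _sign(x) * _sign(y)
            union += max(abs(x), abs(y))
    return intersection / union if union > 0 else 0.0


def sliding_jaccard(motif, scores):
    """Similarity of the motif to every window of scores, one value per offset."""
    m = len(motif)
    return [continuous_jaccard(motif, scores[j:j + m])
            for j in range(len(scores) - m + 1)]


if __name__ == "__main__":
    motif = [[0, 0, 1, 0], [1, 0, 0, 0]]              # a "GA" pattern
    scores = [[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, -1, 0]]
    print(sliding_jaccard(motif, scores))             # prints [0.0, 1.0, 0.0]
```

A window that matches the motif exactly scores 1.0, and contributions of opposite sign pull the score below zero. Computing this at every offset across a genome is the hot loop that makes a compiled implementation attractive.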
Tools
These are miscellaneous programs that are not part of BPReveal proper, but that I have found useful. They are not actively maintained, and tend to have subpar documentation.