Programs
========

Main CLI
--------

These are the core programs of BPReveal. Each one takes a JSON configuration file.

Before you train
''''''''''''''''

:py:mod:`prepareBed<bpreveal.prepareBed>`
    Given a set of regions and data tracks, reject regions that have too few
    (or too many) reads, or that have unmapped bases in the genome.

:py:mod:`prepareTrainingData<bpreveal.prepareTrainingData>`
    Takes in bed and bigwig files and a genome, and generates an hdf5-format
    file containing the samples used for training.

Training your models
''''''''''''''''''''

The first step is always ``trainSoloModel``. If you're not doing fancy bias
correction, then it's also the last step. ``trainTransformationModel`` and
``trainCombinedModel`` are only needed if you're doing ChromBPNet-style bias
removal.

:py:mod:`trainSoloModel<bpreveal.trainSoloModel>`
    Takes in a training input configuration and trains up a model to predict
    the given data, with no bias correction. Saves the model to disk, along
    with information from the training phase.

:py:mod:`trainTransformationModel<bpreveal.trainTransformationModel>`
    Takes in a bias (i.e., solo) model and the actual experimental (i.e.,
    biology + bias) data. Derives a relation to best fit the bias profile onto
    the experimental data. Saves a new model to disk, adding a simple layer or
    two to do the regression.

:py:mod:`trainCombinedModel<bpreveal.trainCombinedModel>`
    Takes a transformation model and experimental data and builds a model to
    explain the residuals. Saves both a combined model and the residual model
    alone to disk.

Working with models
'''''''''''''''''''

These are the tools that you'll use after you've trained your models.

:py:mod:`makePredictions<bpreveal.makePredictions>`
    Takes a trained model (solo, combined, residual, or even transformation
    models work) and predicts over the given regions or sequences.

:py:mod:`interpretFlat<bpreveal.interpretFlat>`
    Generates shap scores of the same type as BPNet. Hypothetical contributions
    for each base are written to a modisco-compatible h5.

:py:mod:`interpretPisa<bpreveal.interpretPisa>`
    Runs an all-to-all shap analysis on the given bed regions or fasta
    sequences.

:py:mod:`makePisaFigure<bpreveal.makePisaFigure>`
    Generates a handsome-looking PISA graph or plot. This is a thin wrapper around
    :py:func:`plotting.plotPisa<bpreveal.plotting.plotPisa>` and
    :py:func:`plotting.plotPisaGraph<bpreveal.plotting.plotPisaGraph>`.

Looking for motifs
''''''''''''''''''

After you run MoDISco, you can use BPReveal to scan for motif instances
in the genome.

:py:mod:`motifScan<bpreveal.motifScan>`
    Scan the genome for patterns of contribution scores that match motifs
    identified by modiscolite.

:py:mod:`motifSeqletCutoffs<bpreveal.motifSeqletCutoffs>`
    Loads the output from ``modiscolite`` and calculates cutoff values to use
    during motif scanning.


Utility CLI
-----------

These are little tools and utilities that help in dealing with models. Unlike
the main programs, they take their arguments on the command line.

Before you train
''''''''''''''''

:py:mod:`checkJson<bpreveal.checkJson>`
    Take a json file and make sure that it's valid input for one of the
    BPReveal programs. Can also be used to identify which BPReveal program a
    json belongs to.

:py:mod:`lengthCalc<bpreveal.lengthCalc>`
    Given the parameters of a network, like input filter width, number of
    layers &c., determine the input width or output width.
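
As a rough sketch of the arithmetic behind this kind of calculation: a
convolution without padding shortens its input by ``dilation * (kernel - 1)``
bases, so the input length is the output length plus the sum of those losses.
The layer layout below (one input convolution, a stack of width-3 dilated
layers, and one output convolution) and every name in it are assumptions made
for illustration; use ``lengthCalc`` itself for real models.

.. code-block:: python

    def inputLengthFor(outputLength, inputFilterWidth, dilations,
                       outputFilterWidth, dilatedKernel=3):
        """Return the input length needed to get outputLength bases out.

        Assumes every convolution is unpadded ("valid"), so each one trims
        dilation * (kernel - 1) positions from its input.
        """
        trimmed = inputFilterWidth - 1
        for dilation in dilations:
            trimmed += dilation * (dilatedKernel - 1)
        trimmed += outputFilterWidth - 1
        return outputLength + trimmed

    # Hypothetical network: eight dilated layers with dilations 2, 4, ..., 256.
    exampleDilations = [2 ** i for i in range(1, 9)]
    exampleInputLength = inputLengthFor(1000, 21, exampleDilations, 75)
    print(exampleInputLength)  # prints 2114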

Training your models
''''''''''''''''''''

:py:mod:`showTrainingProgress<bpreveal.showTrainingProgress>`
    Read in the log files generated by the training programs (when verbosity is
    ``INFO`` or ``DEBUG``) and show you how well the model's doing in real time.

:py:mod:`showModel<bpreveal.showModel>`
    (DEPRECATED, will be removed in 6.0.0) Make a pretty picture of your model.

:py:mod:`makeLossPlots<bpreveal.makeLossPlots>`
    Once you've trained a model, you can run this on the history file to get
    plots of all of the components of the loss.

Working with models
'''''''''''''''''''

:py:mod:`predictToBigwig<bpreveal.predictToBigwig>`
    Takes the hdf5 file generated by the predict step and converts one track
    from it into a bigwig file.

:py:mod:`shapToBigwig<bpreveal.shapToBigwig>`
    Converts a shap hdf5 file (from
    :py:mod:`interpretFlat<bpreveal.interpretFlat>`) into a bigwig track for
    visualization.

:py:mod:`shapToNumpy<bpreveal.shapToNumpy>`
    Takes the interpretations from
    :py:mod:`interpretFlat<bpreveal.interpretFlat>` and converts them to numpy
    arrays that can be read in by modiscolite.

:py:mod:`metrics<bpreveal.metrics>`
    Calculates a suite of metrics about how good a model's predictions are.

Looking for motifs
''''''''''''''''''

:py:mod:`motifAddQuantiles<bpreveal.motifAddQuantiles>`
    Takes the output from :py:mod:`motifScan<bpreveal.motifScan>` and adds
    quantile information for determining how good your motif matches were.


API
---

These are Python libraries that do most of the heavy lifting, and can be imported
to do useful things in your code.

:py:mod:`bedUtils<bpreveal.bedUtils>`
    Useful functions for manipulating bed files, particularly for tiling the
    genome with regions and
    :py:func:`calculating metapeaks<bpreveal.bedUtils.metapeak>` over very
    large data sets.

:py:mod:`gaOptimize<bpreveal.gaOptimize>`
    Tools for evolving sequences that lead to desired profiles. It
    implements a genetic algorithm that supports insertions and deletions.
    You can also use the :py:class:`Organism<bpreveal.gaOptimize.Organism>`
    class on its own to apply mutations, including insertions and
    deletions, to sequences.

:py:mod:`interpretUtils<bpreveal.internal.interpretUtils>`
    Functions for getting interpretation scores. Contains a streaming system
    for calculating PISA and flat importance scores. You should not normally
    need to interact with this module. Instead, use
    :py:mod:`interpretFlat<bpreveal.interpretFlat>`,
    :py:mod:`interpretPisa<bpreveal.interpretPisa>`, or
    :py:func:`easyInterpretFlat<bpreveal.utils.easyInterpretFlat>`.

:py:mod:`jaccard<bpreveal.jaccard>`
    Contains wrappers around C functions that calculate the sliding Jaccard similarity
    used to scan for motifs. You almost certainly don't need to use this.

:py:mod:`logUtils<bpreveal.logUtils>`
    Functions used to log information. It's basically a wrapper
    around the ``logging`` module in the standard library. You probably don't need
    to use the logging functions yourself, but you may want to use the
    :py:func:`setVerbosity<bpreveal.logUtils.setVerbosity>` and
    :py:func:`setBooleanVerbosity<bpreveal.logUtils.setBooleanVerbosity>` functions.


:py:mod:`motifUtils<bpreveal.motifUtils>`
    Functions for dealing with motif scanning and modisco files. You probably don't need
    to use this directly.

:py:mod:`plotting<bpreveal.plotting>`
    Utilities for making high-quality plots of your results. For PISA, you will probably
    want to use :py:func:`plotPisa<bpreveal.plotting.plotPisa>` or
    :py:func:`plotPisaGraph<bpreveal.plotting.plotPisaGraph>`.
    For MoDISco results, there's
    :py:func:`plotModiscoPattern<bpreveal.plotting.plotModiscoPattern>`.

:py:mod:`schema<bpreveal.schema>`
    A set of JSON schemas that validate the inputs to the BPReveal programs.
    These are used to make sure that incorrect inputs trigger errors early, and
    that those errors are clearer to the user. You do not need to use this.

:py:mod:`training<bpreveal.training>`
    A very simple module that actually runs the training loop for
    :py:mod:`trainSoloModel<bpreveal.trainSoloModel>`,
    :py:mod:`trainTransformationModel<bpreveal.trainTransformationModel>`, and
    :py:mod:`trainCombinedModel<bpreveal.trainCombinedModel>`.
    You should not need to use this directly.

:py:mod:`ushuffle<bpreveal.ushuffle>`
    A wrapper around the ushuffle library, used to perform shuffles of sequences that
    preserve k-mer distributions.

:py:mod:`utils<bpreveal.utils>`
    Contains general-use utilities and a high-performance tool to generate
    predictions for many sequences.

Useful API features
-------------------

Much of the BPReveal API is dedicated to supporting the CLI tools and a typical
user won't need to interact with it. But there are a few functions here and
there that you might find helpful. Here are a few you should know about.


Data processing
'''''''''''''''

To tile the genome with regions, you can use
:py:func:`bedUtils.makeWhitelistSegments<bpreveal.bedUtils.makeWhitelistSegments>` and
:py:func:`bedUtils.tileSegments<bpreveal.bedUtils.tileSegments>`, or you can use
:py:func:`bedUtils.createTilingRegions<bpreveal.bedUtils.createTilingRegions>`, which
just wraps the two former functions.
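
The idea behind tiling can be sketched in plain Python. This is not the
``bedUtils`` implementation and does not use its signatures; the function
name, the ``(chrom, start, end)`` tuples, and the choice to drop a trailing
partial window are all assumptions for illustration.

.. code-block:: python

    def tileSegment(chrom, start, end, windowSize, spacing):
        """Yield fixed-width windows that fit entirely inside [start, end)."""
        pos = start
        while pos + windowSize <= end:
            yield (chrom, pos, pos + windowSize)
            pos += spacing

    # Hypothetical allowed segments, e.g. the genome minus excluded regions.
    segments = [("chr1", 0, 2500), ("chr2", 100, 1200)]
    tiles = [tile
             for segment in segments
             for tile in tileSegment(*segment, windowSize=1000, spacing=1000)]
    for tile in tiles:
        print(tile)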

For bed intervals, you can resize them with
:py:func:`bedUtils.resize<bpreveal.bedUtils.resize>`.

For working with bigwigs, you can use
:py:func:`utils.loadChromSizes<bpreveal.utils.loadChromSizes>`,
:py:func:`utils.blankChromosomeArrays<bpreveal.utils.blankChromosomeArrays>`, and
:py:func:`utils.writeBigwig<bpreveal.utils.writeBigwig>` to easily write
data to a new bigwig file.

You can use
:py:func:`bedUtils.metapeak<bpreveal.bedUtils.metapeak>` to get the average
profile over many regions, which is useful for plotting.
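
Conceptually, a metapeak is the position-by-position mean of equal-width
windows. Here is a minimal sketch with made-up data. It does not call
``bedUtils.metapeak``, and the helper name is hypothetical.

.. code-block:: python

    def averageProfile(profiles):
        """Average equal-length profiles position by position."""
        width = len(profiles[0])
        if any(len(profile) != width for profile in profiles):
            raise ValueError("All profiles must have the same length.")
        return [sum(column) / len(profiles) for column in zip(*profiles)]

    # Three hypothetical 5-bp windows of read counts.
    windows = [[0, 1, 4, 1, 0],
               [0, 2, 6, 2, 0],
               [0, 0, 2, 3, 1]]
    meta = averageProfile(windows)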

Making predictions
''''''''''''''''''

If you want to do this the easy way, use
:py:func:`utils.easyPredict<bpreveal.utils.easyPredict>`.
This function will load up a model, make predictions, and then give you the
profiles. It also cleans up after itself and releases the GPU.

For more intense predictions, or if you need the raw model outputs, use
:py:class:`utils.ThreadedBatchPredictor<bpreveal.utils.ThreadedBatchPredictor>`.
This spawns background threads that can run predictions at blinding speed, with
multiple processes sharing the GPU for maximum throughput.
This class supports streaming data, so you can make terabytes of predictions and
process them as they come, letting your program run with a minimal memory
footprint.

If you have model outputs (logits and logcounts) and want a predicted profile, use
:py:func:`utils.logitsToProfile<bpreveal.utils.logitsToProfile>`.
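
For intuition, the usual BPNet-style conversion is a softmax over the logits,
scaled by the exponentiated logcounts. The sketch below assumes that
convention and a single track. It is a plain-Python stand-in with a
hypothetical name, not the code of ``utils.logitsToProfile``.

.. code-block:: python

    import math

    def profileFromLogits(logits, logcounts):
        """Softmax the logits, then scale so the profile sums to exp(logcounts)."""
        peak = max(logits)  # Subtract the maximum for numerical stability.
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        counts = math.exp(logcounts)
        return [counts * weight / total for weight in weights]

    # Made-up logits for a 5-bp window, with 100 total predicted reads.
    profile = profileFromLogits([0.0, 1.0, 3.0, 1.0, 0.0], math.log(100))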

To efficiently convert DNA sequences to and from one-hot-encoded form, use
:py:func:`utils.oneHotEncode<bpreveal.utils.oneHotEncode>` and
:py:func:`utils.oneHotDecode<bpreveal.utils.oneHotDecode>`.
These functions are optimized and can perform their calculations far faster than a naive
implementation with dictionary lookups.
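
To make the encoding concrete, here is the kind of naive dictionary-lookup
version that those functions improve on. The A, C, G, T column order and the
all-zero row for any other character are assumptions for this sketch; check
the function documentation for the real conventions.

.. code-block:: python

    BASES = "ACGT"
    BASE_TO_ROW = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }

    def naiveOneHotEncode(sequence):
        """Return one four-element row per base; unknown bases become zeros."""
        return [list(BASE_TO_ROW.get(base, [0, 0, 0, 0]))
                for base in sequence.upper()]

    def naiveOneHotDecode(encoded):
        """Invert naiveOneHotEncode, writing N for all-zero rows."""
        return "".join(BASES[row.index(1)] if 1 in row else "N"
                       for row in encoded)

    encoded = naiveOneHotEncode("acgTN")
    decoded = naiveOneHotDecode(encoded)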

For applying mutations to sequences, I suggest using the
:py:class:`Organism<bpreveal.gaOptimize.Organism>` class in the
:py:mod:`gaOptimize<bpreveal.gaOptimize>` module. While it is designed to be
part of a genetic algorithm optimization, it can easily be used on its own to
apply corruptors (called "corruptors" to avoid confusion with the genetic
algorithm operation called "mutation") to a single sequence.


Getting importance scores
'''''''''''''''''''''''''

If the :py:mod:`interpretFlat<bpreveal.interpretFlat>` CLI tool doesn't do what you need,
you can use
:py:func:`utils.easyInterpretFlat<bpreveal.utils.easyInterpretFlat>` to get
importance scores.
If you need something even more custom, you'll have to wade through the arcane and
complex :py:mod:`interpretUtils<bpreveal.internal.interpretUtils>` module and
I'm sorry for you.

Working with motifs
'''''''''''''''''''

The :py:mod:`motifUtils<bpreveal.motifUtils>` module contains helpers for working with
MoDISco pattern objects. Typically, you create a
:py:class:`motifUtils.Pattern<bpreveal.motifUtils.Pattern>` object and then call
:py:func:`loadCwm<bpreveal.motifUtils.Pattern.loadCwm>` and then
:py:func:`loadSeqlets<bpreveal.motifUtils.Pattern.loadSeqlets>` to load in the
relevant data.
Just about the only time you'd need to create a Pattern object is to plot it.

Showing off your results
''''''''''''''''''''''''

There are a bunch of nifty tools for making high-quality plots in the
:py:mod:`plotting<bpreveal.plotting>` module. You can make PISA plots, PISA
graph plots, and motif summary plots.

Tools
-----

These are miscellaneous programs that are not part of BPReveal proper, but that
I have found useful. They are not actively maintained, and tend to have subpar
documentation.

..
    Copyright 2022-2025 Charles McAnany. This file is part of BPReveal. BPReveal is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version. BPReveal is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with BPReveal. If not, see <https://www.gnu.org/licenses/>.