Workflows
=========

Without bias removal
--------------------

For cases where you don't have a bias track you want to regress out, here are
the steps:

1. Prepare bigwig tracks and select the regions you are interested in.
   There are some utilities for this in BPReveal (:py:mod:`prepareBed`),
   but you are mostly on your own for this stage.
2. Prepare training data files with :py:mod:`prepareTrainingData`.
   These hdf5 files contain the sequences and experimental profiles for all
   the heads in your model.
3. Train a solo model with :py:mod:`trainSoloModel`.
4. Measure the performance of your model with :py:mod:`metrics`.
5. Make predictions from the model with :py:mod:`makePredictions`.
6. Generate importance scores, either one-dimensionally, with
   :py:mod:`interpretFlat`, or by making two-dimensional PISA plots with
   :py:mod:`interpretPisa`.
7. Run MoDISco to extract motifs (using :py:mod:`shapToNumpy` and the external
   ``tfmodisco-lite`` package).
8. Use the motif mapping tools :py:mod:`motifSeqletCutoffs`,
   :py:mod:`motifScan` and :py:mod:`motifAddQuantiles` to map the discovered
   motifs back to the genome.

With bias removal
-----------------

If you do have strong experimental biases, you will need to regress them out.
In that case, the workflow is the following:

1. Prepare bigwig tracks and select the regions you are interested in with
   :py:mod:`prepareBed`.
2. Prepare a data file containing bias using :py:mod:`prepareTrainingData`.
   The bias regions may be uninteresting regions in the genome, or you may
   train on your regions of interest but use an experimental control for the
   data.
3. Train a bias (AKA solo) model with :py:mod:`trainSoloModel`.
4. Train a transformation model to match the bias model to the experimental
   data, using :py:mod:`trainTransformationModel`.
5. Train a residual model to explain non-bias parts of the experimental data,
   using :py:mod:`trainCombinedModel`.
6. Measure the performance of the full model with :py:mod:`metrics`.
7. Make predictions from the full model and residual model with
   :py:mod:`makePredictions`.
8. Generate importance scores from the residual model with
   :py:mod:`interpretFlat`.
9. Run MoDISco, using scores generated by :py:mod:`shapToNumpy`.
10. Map the discovered motifs back to the genome using
    :py:mod:`motifSeqletCutoffs`, :py:mod:`motifScan` and
    :py:mod:`motifAddQuantiles`.

..
    Copyright 2022, 2023, 2024 Charles McAnany. This file is part of BPReveal. BPReveal is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version. BPReveal is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with BPReveal. If not, see .  # noqa  # pylint: disable=line-too-long