models
Functions to build BPNet-style models.
The model architectures are generally derived from the basepairmodels
repository, which is released under an MIT-style license. You can find a copy
at etc/basepairmodels_license.txt.
The arithmetic for residual models is derived from ChromBPNet, but the code is not derived from that project.
- bpreveal.models.soloModel(inputLength, outputLength, numFilters, numLayers, inputFilterWidth, outputFilterWidth, headList, modelName)
Generate a model using the classic BPNet architecture.
- Parameters:
inputLength (int) – The length of the one-hot encoded DNA sequence.
outputLength (int) – The length of the predicted profile.
numFilters (int) – The number of convolutional filters used at each layer.
numLayers (int) – The number of dilated convolutions.
inputFilterWidth (int) – The width of the first convolutional layer, the one looking for motifs.
outputFilterWidth (int) – The width of the profile head convolutional filter at the very bottom of the network.
headList (list[dict]) – Taken directly from a <head-list> in the configuration JSON.
modelName (str) – The name you want this model to have when saved.
- Returns:
A TF model.
- Return type:
keras.models.Model
Input to this model is a (batch x inputLength x NUM_BASES) tensor of one-hot encoded DNA. Output is a list of (profilePreds, profilePreds, profilePreds, …, countPreds, countPreds, countPreds, …). Each profilePreds is a tensor of shape (batch x numTasks x outputLength), containing the logits of the profile values for each task. Each countPreds is a tensor of shape (batch x numTasks) containing the log counts for each task.
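As a rough illustration of that output layout, the sketch below splits a flat prediction list into per-head pairs. The helper name splitPredictions is hypothetical and is not part of bpreveal; it assumes only the ordering described above, with all profile outputs first and all counts outputs second, and one of each per head.

```python
def splitPredictions(preds, numHeads):
    """Pair up profile and counts outputs by head.

    Hypothetical helper, not part of bpreveal. Assumes preds is ordered as
    [profile_0, ..., profile_{n-1}, counts_0, ..., counts_{n-1}].
    """
    if len(preds) != 2 * numHeads:
        raise ValueError(
            f"Expected {2 * numHeads} outputs for {numHeads} heads, got {len(preds)}")
    profiles = preds[:numHeads]   # logits, one entry per head
    counts = preds[numHeads:]     # log counts, one entry per head
    return list(zip(profiles, counts))
```

With two heads, a list such as [p0, p1, c0, c1] would be regrouped as [(p0, c0), (p1, c1)].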
If you call this with an inputLength that is too long for the model, it will automatically crop the input window down to the correct size. Since this could be an indication that your architecture is inconsistent, this will emit a warning. It is an error to call this with an inputLength that is too short to satisfy the model.
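The relationship between inputLength and outputLength can be sketched with some simple arithmetic. The function names below are hypothetical, and the sketch rests on assumptions that this section does not state: every convolution is unpadded, the dilated layers use width-3 kernels, the dilation rate doubles at each layer starting from 2, and any excess input is cropped symmetrically.

```python
def requiredInputLength(outputLength, numLayers, inputFilterWidth, outputFilterWidth):
    """Input length needed to produce outputLength (hypothetical helper).

    ASSUMPTIONS: unpadded convolutions, width-3 dilated kernels, and a
    dilation rate of 2 ** (layer + 1) for layer = 0 .. numLayers - 1.
    """
    # An unpadded convolution of width w shortens its input by (w - 1).
    length = outputLength + (outputFilterWidth - 1) + (inputFilterWidth - 1)
    for layer in range(numLayers):
        dilation = 2 ** (layer + 1)
        # A width-3 kernel with this dilation spans 2 * dilation extra bases.
        length += 2 * dilation
    return length


def cropPerSide(inputLength, neededLength):
    """Bases to trim from the (left, right) of an over-long input.

    ASSUMPTION: the crop is symmetric. A too-short input is an error.
    """
    if inputLength < neededLength:
        raise ValueError(
            f"Input of {inputLength} is too short; need at least {neededLength}")
    extra = inputLength - neededLength
    return extra // 2, extra - extra // 2
```

Under these assumptions, a model with nine dilated layers, input and output filter widths of 25, and an output length of 1000 would need an input of 3092 bases; an input of 3100 would be cropped by 4 bases on each side.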
- bpreveal.models.transformationModel(soloModelIn, profileArchitectureSpecification, countsArchitectureSpecification, headList)
Construct a simple model used to regress out the solo model from experimental data.
Given a solo model (typically representing bias), generate a simple network that transforms the solo model’s output to match the experimental data. That is, experimental = f(bias), where f is a simple function such as y = mx + b. When you train the model returned by this function, you are training the parameters of f (here, m and b). Note that this function sets the solo model to non-trainable: the goal is not to improve the bias model, but to transform its output so that it looks like the experimental data.
- Parameters:
soloModelIn (keras.models.Model) – A Keras model that you’d like to transform.
profileArchitectureSpecification (dict) – Straight from the config JSON.
countsArchitectureSpecification (dict) – Straight from the config JSON.
headList (list[dict]) – Also from the config JSON.
- Returns:
A Keras model with the same output shape as the soloModel.
- Return type:
keras.models.Model
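To make the idea of fitting experimental = f(bias) concrete, here is a minimal sketch of the y = mx + b case. It is not how bpreveal trains the transformation (that happens through normal Keras training against the model's loss); the function name fitLinearTransform is hypothetical, and a closed-form least-squares fit stands in for training. The point is that the bias values stay fixed and only m and b are fitted.

```python
def fitLinearTransform(biasValues, experimentalValues):
    """Least-squares fit of experimental ~ m * bias + b.

    Hypothetical stand-in for training the transformation: the bias
    values are treated as frozen inputs, and only m and b are estimated.
    """
    n = len(biasValues)
    if n != len(experimentalValues) or n < 2:
        raise ValueError("Need two equal-length sequences with at least 2 points")
    meanX = sum(biasValues) / n
    meanY = sum(experimentalValues) / n
    varX = sum((x - meanX) ** 2 for x in biasValues)
    if varX == 0:
        raise ValueError("Bias values are constant; slope is undefined")
    covXY = sum((x - meanX) * (y - meanY)
                for x, y in zip(biasValues, experimentalValues))
    m = covXY / varX
    b = meanY - m * meanX
    return m, b
```

For example, if the experimental values are exactly twice the bias values plus one, the fit recovers m = 2 and b = 1.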
- bpreveal.models.combinedModel(inputLength, outputLength, numFilters, numLayers, inputFilterWidth, outputFilterWidth, headList, biasModel)
Build a combined model.
This builds a standard BPNet model, but then adds in the bias at the very end:
        ,-----------------SEQUENCE------------------,
        V                                           ,
    Cropdown step                                   V
    _____________                           _________________
    | SOLO MODEL|                           | RESIDUAL MODEL|
    |___________|                           |_______________|
         |                                          |
    _____V_______          _______                  |
    | TRANSFORM |--------> | ADD |<-----------------'
    |___________|          |_____|
                              |
                         _____V_____
                         |COMBINED |
                         |_________|
Since you’ll usually want to isolate the bias-free model (AKA residual model), that is returned separately.
- Parameters:
inputLength (int) – The length of the one-hot encoded DNA sequence (which must be the same for the bias model and the residual model).
outputLength (int) – The length of the predicted profile.
numFilters (int) – The number of convolutional filters used at each layer in the residual model.
numLayers (int) – The number of dilated convolutions in the residual model.
inputFilterWidth (int) – The width of the first convolutional layer in the residual model, the one looking for motifs.
outputFilterWidth (int) – The width of the profile head convolutional filter in the residual model at the very bottom of the network.
headList (list[dict]) – Taken straight from the configuration JSON.
biasModel (keras.models.Model) – A Keras model that goes from sequence to transformed bias. This is the model that is saved when you generate the transformation model; internally, it comprises both the solo model and a transformation.
- Returns:
Three Keras models.
The first is the combined output, i.e., the COMBINED node in the graph above. Input to this model is a (batch x inputLength x NUM_BASES) tensor of one-hot encoded DNA. Output is a list of (profilePreds, profilePreds, profilePreds, …, countPreds, countPreds, countPreds, …). Each profilePreds is a tensor of shape (batch x numTasks x outputLength), containing the logits of the profile values for each task. Each countPreds is a tensor of shape (batch x numTasks) containing the log counts for each task.
The second is the bias-free model, RESIDUAL MODEL in the graph above. It has the same input and output shapes as the COMBINED model.
The final model is the solo model, just in case you need it.
- Return type:
tuple[keras.models.Model, keras.models.Model, keras.models.Model]
It is an error to call this function with an inconsistent network structure, such as an input that is too short.
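The ADD node in the graph above can be sketched as follows. This section does not spell out the arithmetic, so the sketch relies on an assumption based on the ChromBPNet-style approach mentioned at the top of this page: profile logits are added elementwise, while counts are added in linear space and then returned to log space. The function names are hypothetical and are not part of bpreveal.

```python
import math


def combineProfileLogits(biasLogits, residualLogits):
    """Elementwise sum of profile logits (ASSUMED combination rule)."""
    if len(biasLogits) != len(residualLogits):
        raise ValueError("Profile lengths must match")
    return [b + r for b, r in zip(biasLogits, residualLogits)]


def combineLogCounts(biasLogCounts, residualLogCounts):
    """log(exp(bias) + exp(residual)), computed stably.

    ASSUMED rule: counts add in linear space, so the log counts are
    combined with a log-sum-exp rather than a plain sum.
    """
    hi = max(biasLogCounts, residualLogCounts)
    lo = min(biasLogCounts, residualLogCounts)
    # Factor out the larger term so the exponent is never positive.
    return hi + math.log1p(math.exp(lo - hi))
```

Under this assumption, a bias prediction of 3 reads and a residual prediction of 5 reads combine to 8 reads, i.e., combineLogCounts(log 3, log 5) equals log 8 up to rounding.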