motifScan
Scans for motifs given importance scores.
This program scans the contribution scores you calculated with
interpretFlat and looks for matches to motifs called by modiscolite.
The cutoffs it needs can be supplied in one of two ways. If you already have a
quantile JSON from motifSeqletCutoffs, give its name in seqlet-cutoff-json.
Otherwise, include a seqlet-cutoff-settings block holding the settings for
motifSeqletCutoffs; motifScan will then perform the seqlet analysis first and
save those results for you. You must provide exactly one of the two: it is an
error to specify both seqlet-cutoff-settings and seqlet-cutoff-json.
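The either/or rule above can be checked before launching a scan. This is a minimal sketch, assuming the configuration has already been loaded into a Python dict; the helper name cutoff_source is hypothetical and is not part of BPReveal.

```python
def cutoff_source(config: dict) -> str:
    """Return the name of the cutoff key that a motifScan config uses.

    Hypothetical helper, not part of BPReveal. Raises ValueError when the
    config names both cutoff sources or neither of them.
    """
    has_json = "seqlet-cutoff-json" in config
    has_settings = "seqlet-cutoff-settings" in config
    # Exactly one of the two keys may be present.
    if has_json == has_settings:
        raise ValueError(
            "Specify exactly one of seqlet-cutoff-json "
            "and seqlet-cutoff-settings."
        )
    return "seqlet-cutoff-json" if has_json else "seqlet-cutoff-settings"
```

Calling it on a config that contains only seqlet-cutoff-json returns that key name, so the caller can tell whether the seqlet analysis will run as part of the scan.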
Configuration JSON
BNF
<motif-scan-config> ::=
    { <scan-quantile-settings>
      "scan-settings" : {
          "scan-contrib-h5" : <string>,
          "hits-tsv" : <string>,
          "num-threads" : <integer> },
      <verbosity-section> }
<scan-quantile-settings> ::=
    "seqlet-cutoff-json" : <string>,
  | "seqlet-cutoff-settings" : <seqlet-scanning-settings>,
Parameter Notes
- scan-contrib-h5
The output of interpretFlat, which contains contribution scores. All of the regions in this file will be scanned.
- num-threads
The number of parallel workers to use. Due to the streaming architecture of this program, the minimum value of num-threads is 3. I have found that this program scales very well up to 70 cores, and haven’t tested it beyond that.
- hits-tsv
Where would you like the hit data stored?
- seqlet-cutoff-settings
See motifSeqletCutoffs for the specification of this block.
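As an illustration of the grammar above, here is a sketch that builds a configuration using a pre-computed cutoff file and serializes it. Every file name is a placeholder, and the "INFO" verbosity value is an assumption; check the base schema for the values that verbosity actually accepts.

```python
import json

# All file names here are placeholders, not files shipped with BPReveal.
config = {
    # Cutoffs come from an earlier motifSeqletCutoffs run, so there is
    # no seqlet-cutoff-settings block in this config.
    "seqlet-cutoff-json": "cutoffs.json",
    "scan-settings": {
        "scan-contrib-h5": "contrib.h5",
        "hits-tsv": "hits.tsv",
        # Must be at least 3 because of the streaming architecture.
        "num-threads": 8,
    },
    # Assumed value; see the base schema for the permitted settings.
    "verbosity": "INFO",
}

# Serialize to the text you would save as the configuration file.
config_text = json.dumps(config, indent=4)
print(config_text)
```

To run the seqlet analysis as part of the scan instead, replace the seqlet-cutoff-json entry with a seqlet-cutoff-settings block as specified by motifSeqletCutoffs.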
Quantile JSON
If you don’t run the seqlet cutoffs during the scan, you need to provide a JSON
file containing the cutoff information for each pattern. This file is generated
by motifSeqletCutoffs and saved under the name given by
quantile-json in the configuration for that script.
BNF
<quantile-json> ::= [<list-of-scan-patterns> ]
<list-of-scan-patterns> ::= <scan-pattern>«, <list-of-scan-patterns>»
<scan-pattern> ::=
    { "metacluster-name" : <string>,
      "pattern-name" : <string>,
      "short-name" : <string>,
      "cwm" : <motif-array>,
      "pssm" : <motif-array>,
      "seq-match-cutoff" : <number-or-null>,
      "contrib-match-cutoff" : <number-or-null>,
      "contrib-magnitude-cutoff" : <number-or-null> }
<motif-array> ::= [ <list-of-base-arrays> ]
<list-of-base-arrays> ::= <base-array>«, <list-of-base-arrays>»
<base-array> ::= [<number>, <number>, <number>, <number>]
Parameter notes
In the quantile JSON, we find the actual numerical cutoffs for scanning.
- metacluster-name, pattern-name
These are from the modisco hdf5 file.
- short-name
A convenient name for this motif, entirely up to you. The short name will be used to populate the name column in the generated bed and csv files.
- cwm
An array of shape (length, NUM_BASES) that contains the cwm of the motif. It is used to calculate the Jaccard similarity and the L1 score.
- pssm
An array holding the sequence-based information content at each position. It is used to calculate sequence match scores.
- seq-match-cutoff, contrib-match-cutoff, contrib-magnitude-cutoff
The three cutoff values are the actual scores, not quantile values. These are calculated by motifSeqletCutoffs. You could set these manually, but why would you?
seq-match: Cutoff where a sequence must have a PSSM score higher than the original TF-MoDISco pattern seqlets’ quantile value.
contrib-match: Cutoff where a sequence must have a CWM score higher than the original TF-MoDISco pattern seqlets’ quantile value.
contrib-magnitude: Cutoff where a sequence must have contribution (L1 magnitude) higher than the original TF-MoDISco pattern seqlets’ quantile value.
Note: Setting any of these cutoff values to 0 will mean that no value can be less than the lowest seqlet. Setting a cutoff to 0 is not the same as setting a cutoff to null (None in Python).
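The quantile JSON grammar can be turned into a structural check, which may be useful if you do edit the file by hand. This is a sketch under the assumption that the file has already been parsed with json.load; the function names are hypothetical and not part of BPReveal, and the check covers only the shapes and types given in the BNF, not whether the cutoffs are sensible.

```python
from numbers import Real

NUM_BASES = 4
NAME_KEYS = ("metacluster-name", "pattern-name", "short-name")
ARRAY_KEYS = ("cwm", "pssm")
CUTOFF_KEYS = ("seq-match-cutoff", "contrib-match-cutoff",
               "contrib-magnitude-cutoff")


def is_number(value):
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    return isinstance(value, Real) and not isinstance(value, bool)


def check_motif_array(array, key):
    """Require a non-empty list of rows, each holding NUM_BASES numbers."""
    if not isinstance(array, list) or not array:
        raise ValueError(f"{key} must be a non-empty array")
    for row in array:
        if (not isinstance(row, list) or len(row) != NUM_BASES
                or not all(is_number(v) for v in row)):
            raise ValueError(f"each row of {key} needs {NUM_BASES} numbers")


def check_scan_pattern(pattern):
    """Check one <scan-pattern> object against the grammar."""
    if not isinstance(pattern, dict):
        raise ValueError("each scan pattern must be an object")
    for key in NAME_KEYS:
        if not isinstance(pattern.get(key), str):
            raise ValueError(f"{key} must be a string")
    for key in ARRAY_KEYS:
        check_motif_array(pattern.get(key), key)
    for key in CUTOFF_KEYS:
        if key not in pattern:
            raise ValueError(f"{key} is missing")
        value = pattern[key]
        # null (None) is allowed and is distinct from a cutoff of 0.
        if value is not None and not is_number(value):
            raise ValueError(f"{key} must be a number or null")


def check_quantile_json(patterns):
    """Check a parsed quantile JSON: a non-empty list of scan patterns."""
    if not isinstance(patterns, list) or not patterns:
        raise ValueError("quantile JSON must be a non-empty array")
    for pattern in patterns:
        check_scan_pattern(pattern)
```

The check deliberately treats a cutoff of 0 and a cutoff of null as two separate, valid cases, matching the note above.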
Output Specification
For the format of the generated tsv file, see
motifAddQuantiles. If you include a
quantile-json in your seqlet-cutoff-settings, then running
motifScan will also save the cutoff JSON.
API
- bpreveal.motifScan.motifScan(config)
Run the scan.
- Parameters:
config (dict) – A JSON object matching the motifScan specification.
- Return type:
None
- bpreveal.motifScan.main()
A zero-argument wrapper around the main function.
- Return type:
None
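Calling the scan from Python might look like the sketch below. It assumes BPReveal is installed and that the configuration lives in a JSON file; load_config and run_scan are hypothetical helper names, and only bpreveal.motifScan.motifScan(config) comes from the API listed above.

```python
import json


def load_config(path):
    """Read a motifScan configuration file into a dict."""
    with open(path) as fp:
        return json.load(fp)


def run_scan(path):
    """Load the configuration at path and run the scan with it."""
    # Imported here so that load_config works even without BPReveal installed.
    from bpreveal import motifScan
    motifScan.motifScan(load_config(path))
```

Since motifScan returns None, the results are read back from the file named by hits-tsv rather than from a return value.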
Schema
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "motifScan",
    "description": "Schema for motifScan.py",
    "type": "object",
    "properties": {
        "seqlet-cutoff-json": {
            "type": "string"
        },
        "seqlet-cutoff-settings": {
            "$ref": "/schema/motifSeqletCutoffs#/definitions/seqlet-scanning-settings"
        },
        "scan-settings": {
            "type": "object",
            "properties": {
                "scan-contrib-h5": {
                    "type": "string"
                },
                "hits-tsv": {
                    "type": "string"
                },
                "num-threads": {
                    "type": "integer",
                    "minimum": 3
                }
            },
            "required": [
                "scan-contrib-h5",
                "hits-tsv",
                "num-threads"
            ]
        },
        "verbosity": {"$ref": "/schema/base#/definitions/verbosity"}
    },
    "oneOf": [
        {
            "required": [
                "seqlet-cutoff-json"
            ]
        },
        {
            "required": [
                "seqlet-cutoff-settings"
            ]
        }
    ],
    "required": [
        "verbosity",
        "scan-settings"
    ]
}
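The scan-settings part of the schema is simple enough to check by hand. The sketch below mirrors its required keys, types, and the num-threads minimum in plain Python. The function name is hypothetical and not part of BPReveal. The sketch does not resolve the two $ref entries or the oneOf clause, so it is not a substitute for full schema validation.

```python
MIN_THREADS = 3  # "minimum" for num-threads in the schema


def check_scan_settings(config):
    """Check the top-level required keys and the scan-settings block.

    Hypothetical helper. Raises ValueError on the first problem found.
    """
    for key in ("verbosity", "scan-settings"):
        if key not in config:
            raise ValueError(f"missing required key {key}")
    settings = config["scan-settings"]
    if not isinstance(settings, dict):
        raise ValueError("scan-settings must be an object")
    for key in ("scan-contrib-h5", "hits-tsv"):
        if not isinstance(settings.get(key), str):
            raise ValueError(f"{key} must be a string")
    threads = settings.get("num-threads")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threads, bool) or not isinstance(threads, int):
        raise ValueError("num-threads must be an integer")
    if threads < MIN_THREADS:
        raise ValueError(f"num-threads must be at least {MIN_THREADS}")
```

A config with num-threads set to 2 fails this check, which catches the most likely mistake before any contribution data is read.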