motifScan

Scans for motifs given importance scores.

This program scans the contribution scores you calculated with interpretFlat and looks for matches to motifs called by modiscolite. The cutoffs it needs can be supplied in one of two ways. If you include a seqlet-cutoff-settings block in the config, it runs the motifSeqletCutoffs analysis first and saves those results for you. If you don’t include that block, you must include seqlet-cutoff-json, a quantile JSON file from motifSeqletCutoffs with the appropriate cutoffs. It is an error to specify both seqlet-cutoff-settings and seqlet-cutoff-json.

Configuration JSON

BNF

<motif-scan-config> ::=
    {<scan-quantile-settings>
     "scan-settings" : {
         "scan-contrib-h5" : <string>,
         "hits-tsv" : <string>,
         "num-threads" : <integer>}
     <verbosity-section>}
<scan-quantile-settings> ::=
    "seqlet-cutoff-json" : <string>,
  | "seqlet-cutoff-settings" : <seqlet-scanning-settings>,
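
As a rough illustration of the grammar above, the following sketch builds a configuration that uses the seqlet-cutoff-json branch and serializes it. All file names are hypothetical placeholders, and the form of the verbosity section (a single "verbosity" key holding a log-level string) is an assumption based on the schema reference, not something this page spells out.

```python
import json

# Every path below is a hypothetical placeholder; substitute your own files.
config = {
    # Exactly one of "seqlet-cutoff-json" or "seqlet-cutoff-settings" may appear.
    "seqlet-cutoff-json": "cutoffs.json",
    "scan-settings": {
        "scan-contrib-h5": "contrib.h5",  # output of interpretFlat
        "hits-tsv": "hits.tsv",           # where the hits will be written
        "num-threads": 8,                 # must be at least 3
    },
    # Assumption: verbosity is given as a log-level string.
    "verbosity": "INFO",
}

config_text = json.dumps(config, indent=4)
print(config_text)
```

The printed text can be saved to a file and used as the configuration for motifScan.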

Parameter Notes

scan-contrib-h5

This file is the output of interpretFlat and contains contribution scores. All of the regions in this file will be scanned.

num-threads

The number of parallel workers to use. Due to the streaming architecture of this program, the minimum value of num-threads is 3. I have found that this program scales very well up to 70 cores, and haven’t tested it beyond that.

hits-tsv

Where would you like the hit data stored?

seqlet-cutoff-settings

See motifSeqletCutoffs for the specification of this block.

Quantile JSON

If you don’t run the seqlet cutoffs during the scan, you need to provide a JSON file containing the cutoff information for each pattern. This file is generated by motifSeqletCutoffs, which saves it under the file name given by quantile-json in that script’s configuration.

BNF

<quantile-json> ::=
    [<list-of-scan-patterns> ]
<list-of-scan-patterns> ::=
    <scan-pattern>
    | <scan-pattern>, <list-of-scan-patterns>
<scan-pattern> ::=
    {"metacluster-name" : <string>,
    "pattern-name" : <string>,
    "short-name" : <string>,
    "cwm" : <motif-array>,
    "pssm" : <motif-array>,
    "seq-match-cutoff" : <number-or-null>,
    "contrib-match-cutoff" : <number-or-null>,
    "contrib-magnitude-cutoff" : <number-or-null>}
<motif-array> ::=
    [ <list-of-base-arrays> ]
<list-of-base-arrays> ::=
    <base-array>
    | <base-array>, <list-of-base-arrays>
<base-array> ::=
    [<number>, <number>, <number>, <number>]
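
To make the grammar above concrete, here is a minimal sketch that builds one scan pattern and checks its structure. The numbers are made up for illustration (a real file comes from motifSeqletCutoffs), the names are only examples, and the helper functions are hypothetical, not part of BPReveal.

```python
import json

NUM_BASES = 4  # one number per base in each <base-array>

REQUIRED_KEYS = [
    "metacluster-name", "pattern-name", "short-name", "cwm", "pssm",
    "seq-match-cutoff", "contrib-match-cutoff", "contrib-magnitude-cutoff",
]


def check_motif_array(arr):
    """Return the motif length, or raise if a row does not have NUM_BASES entries."""
    for i, row in enumerate(arr):
        if len(row) != NUM_BASES:
            raise ValueError(f"row {i} has {len(row)} entries, expected {NUM_BASES}")
    return len(arr)


def check_scan_pattern(pattern):
    """Check that all keys are present and both arrays are well formed."""
    missing = [key for key in REQUIRED_KEYS if key not in pattern]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    check_motif_array(pattern["pssm"])
    return check_motif_array(pattern["cwm"])


# Toy values only; a two-position motif is unrealistically short.
example_pattern = {
    "metacluster-name": "pos_patterns",
    "pattern-name": "pattern_0",
    "short-name": "example_motif",
    "cwm": [[0.10, -0.02, 0.00, 0.30], [0.05, 0.40, -0.01, 0.00]],
    "pssm": [[0.5, -1.0, -0.8, 1.2], [-0.3, 1.5, -1.1, -0.9]],
    "seq-match-cutoff": 1.7,
    "contrib-match-cutoff": 0.4,
    "contrib-magnitude-cutoff": None,  # serialized as null
}

quantile_json_text = json.dumps([example_pattern], indent=4)
motif_length = check_scan_pattern(json.loads(quantile_json_text)[0])
```

Note that the top level of the file is a list, so a single pattern is still wrapped in brackets.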

Parameter Notes

In the quantile JSON, we find the actual numerical cutoffs for scanning.

metacluster-name, pattern-name

These are from the modisco hdf5 file.

short-name

A convenient name for this motif, entirely up to you. The short name will be used to populate the name column in the generated bed and csv files.

cwm

An array of shape (length, NUM_BASES) that contains the cwm of the motif. It is used to calculate the Jaccard similarity and the L1 score.

pssm

The sequence-based information content at each position. It is used to calculate sequence match scores.

seq-match-cutoff, contrib-match-cutoff, contrib-magnitude-cutoff

The three cutoff values are the actual scores, not quantile values. These are calculated by motifSeqletCutoffs. You could set these manually, but why would you?

seq-match: Cutoff where a sequence must have a PSSM score higher than the original TF-MoDISco pattern seqlets’ quantile value.

contrib-match: Cutoff where a sequence must have a CWM score higher than the original TF-MoDISco pattern seqlets’ quantile value.

contrib-magnitude: Cutoff where a sequence must have contribution (L1 magnitude) higher than the original TF-MoDISco pattern seqlets’ quantile value.

Note: Setting any of these cutoff values to 0 will mean that no value can be less than the lowest seqlet. Setting a cutoff to 0 is not the same as setting a cutoff to null.
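
The distinction in the note matters when handling these values in Python, where both 0 and a missing cutoff are falsy. The sketch below is not motifScan's implementation. It assumes that a missing cutoff means the criterion is not applied, and that a score passes when it is at least the cutoff; neither detail is stated on this page.

```python
import json


def passes_cutoff(score, cutoff):
    """Hypothetical helper: apply one cutoff to one score.

    Assumption: a cutoff of None means this criterion is skipped.
    """
    if cutoff is None:  # explicit identity test; `if not cutoff` would also catch 0
        return True
    return score >= cutoff


# JSON null becomes Python None when loaded; 0 stays a number.
pattern = json.loads('{"seq-match-cutoff": 0, "contrib-match-cutoff": null}')
seq_cutoff = pattern["seq-match-cutoff"]
contrib_cutoff = pattern["contrib-match-cutoff"]

low_score = -1.0
filtered_by_zero = not passes_cutoff(low_score, seq_cutoff)
filtered_by_null = not passes_cutoff(low_score, contrib_cutoff)
```

Under these assumptions a cutoff of 0 still rejects the negative score, while a missing cutoff lets it through.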

Output Specification

For the generated tsv file, see motifAddQuantiles. If you include a quantile-json in your seqlet-cutoff-settings, then running motifScan will save out the cutoff JSON.

API

bpreveal.motifScan.main(config)

Run the scan.

Parameters:

config (dict) – A JSON object matching the motifScan specification.

Return type:

None
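
Since main takes the configuration as a dict, one way to call it from Python is to load the JSON file yourself and pass the result in. This is a minimal sketch: load_config and run_motif_scan are hypothetical helper names, and the import is deferred so that the file can be loaded without BPReveal installed.

```python
import json


def load_config(path):
    """Read a motifScan configuration file into a dict."""
    with open(path) as fp:
        return json.load(fp)


def run_motif_scan(path):
    """Load the configuration at `path` and run the scan (requires BPReveal)."""
    # Imported here so that load_config works even without BPReveal installed.
    from bpreveal import motifScan
    motifScan.main(load_config(path))
```

A call such as run_motif_scan("motifScan.json"), where the file name is a placeholder, would then run the scan and return nothing.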

Schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "motifScan",
    "description": "Schema for motifScan.py",
    "type": "object",
    "properties": {
        "seqlet-cutoff-json": {
            "type": "string"
        },
        "seqlet-cutoff-settings": {
            "$ref": "/schema/motifSeqletCutoffs#/definitions/seqlet-scanning-settings"
        },
        "scan-settings": {
            "type": "object",
            "properties": {
                "scan-contrib-h5": {
                    "type": "string"
                },
                "hits-tsv": {
                    "type": "string"
                },
                "num-threads": {
                    "type": "integer",
                    "minimum" : 3
                }
            },
            "required": [
                "scan-contrib-h5",
                "hits-tsv",
                "num-threads"
            ]
        },
        "verbosity": {"$ref": "/schema/base#/definitions/verbosity"}
    },
    "oneOf": [
        {
            "required": [
                "seqlet-cutoff-json"
            ]
        },
        {
            "required": [
                "seqlet-cutoff-settings"
            ]
        }
    ],
    "required": [
        "verbosity",
        "scan-settings"
    ]
}
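
The main constraints in this schema can also be checked by hand before launching a long scan. The following sketch mirrors the required keys, the oneOf rule, and the num-threads minimum in plain Python. It is a simplified stand-in for real schema validation; find_config_problems is a hypothetical helper and the file names are placeholders.

```python
def find_config_problems(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in ("verbosity", "scan-settings"):
        if key not in config:
            problems.append(f"missing required key: {key}")

    # oneOf: exactly one of the two cutoff sources must be present.
    has_json = "seqlet-cutoff-json" in config
    has_settings = "seqlet-cutoff-settings" in config
    if has_json == has_settings:
        problems.append(
            "specify exactly one of seqlet-cutoff-json or seqlet-cutoff-settings")

    scan = config.get("scan-settings", {})
    for key in ("scan-contrib-h5", "hits-tsv", "num-threads"):
        if key not in scan:
            problems.append(f"missing scan-settings key: {key}")

    threads = scan.get("num-threads")
    if threads is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(threads, int) or isinstance(threads, bool):
            problems.append("num-threads must be an integer")
        elif threads < 3:
            problems.append("num-threads must be at least 3")
    return problems


good_config = {
    "seqlet-cutoff-json": "cutoffs.json",
    "scan-settings": {
        "scan-contrib-h5": "contrib.h5",
        "hits-tsv": "hits.tsv",
        "num-threads": 3,
    },
    "verbosity": "INFO",
}
print(find_config_problems(good_config))
```

This check does not look inside seqlet-cutoff-settings or verbosity, which are defined by the referenced schemas.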