tools.bestMotifsOnly

Help Info

Read in a tsv file from the motif scanner and limit each position to only have one motif.

usage: bestMotifsOnly [-h] [--metric METRIC]
                      [--filter FILTER] [--in-tsv INTSV]
                      [--in-bed INBED] [--out-bed OUTBED]
                      [--out-tsv OUTTSV]
                      [--no-match-names]
                      [--max-offset MAXOFFSET] [--verbose]

Named Arguments

--metric

A Python expression giving the value used to compare motif instances and decide which one is better. (For bed input, use the ‘score’ column.)

Default: 'score'

--filter

Only consider motifs that satisfy this filter. Format: Any valid Python expression where the identifiers are column names in the tsv. (Don’t forget to quote comparison operators on the shell!)

Default: 'True'
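As a rough illustration of how --metric and --filter expressions relate to column names, the sketch below evaluates an expression against a single row. It assumes that each row is a dict whose numeric columns have already been converted from strings, and it uses eval with builtins disabled; how the tool itself evaluates these expressions is not stated here. The helper evaluate_expression and the column names in the example row are hypothetical.

```python
def evaluate_expression(expression, row):
    """Evaluate a Python expression with column names as identifiers.

    Hypothetical helper for illustration only, not part of bpreveal.
    """
    # Builtins are disabled so only the row's columns are visible.
    return eval(expression, {"__builtins__": {}}, dict(row))


# Illustrative row; these column names are assumptions.
row = {
    "short_name": "motifA",
    "seq_match_quantile": 0.93,
    "contrib_match_quantile": 0.71,
}

# A metric expression yields the value used to rank competing motifs.
metric = evaluate_expression("seq_match_quantile * contrib_match_quantile", row)

# A filter expression yields a truth value; rows that fail are ignored.
keep = evaluate_expression("contrib_match_quantile > 0.5", row)
```

On the command line, an expression such as the filter above needs to be quoted so that the shell does not interpret the > character as a redirection.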

--in-tsv

The input file, in TSV format. One of --in-tsv or --in-bed is required.

--in-bed

The input file, in bed format. One of --in-tsv or --in-bed is required.

--out-bed

(Optional) The name of the output file, in bed format.

--out-tsv

(Optional) The name of the output file, in tsv format. Cannot be used with --in-bed.

--no-match-names

If provided, then don’t require motif names to match. Default: motifs will only be removed if there is a better instance of a motif with the same name at that locus.

Default: True

--max-offset

Instead of removing motifs that overlap at all, only compare motif instances that are offset by this amount or less.

Default: 99999

--verbose

Show progress.

Default: False

Usage

Read in a tsv file from motif scanning and remove overlapping motifs.

For each position, if two motifs claim it, remove the motif with the lower value of the specified metric.

bpreveal.tools.bestMotifsOnly.getParser()

Generates the parser, but does not call parse_args().

Return type:

ArgumentParser

bpreveal.tools.bestMotifsOnly.removeOverlaps(entries, nameCol, maxOffset)

Scan over the (sorted) motif hits and keep the best ones.

For every pair of motifs (m1, m2):

  1. If they don’t overlap at all, continue.

  2. If the center of m1 is more than maxOffset bp away from the center of m2, continue.

  3. If nameCol is given (i.e., is not None) and the name of m1 is different from the name of m2, continue.

  4. We have established that m1 and m2 are competing. We must mark one of them as bad.

  5. Select the motif that has the lower metric. Mark it as bad. If they have the same metric value, mark one at random.

Once all of the marking is complete, return a list of all motifs that are NOT marked as bad.
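The steps above can be sketched as follows. This is a minimal illustration and not the bpreveal implementation: it assumes each entry carries hypothetical chrom, start, end, and metric keys, treats intervals as half-open (bed-style), and compares every pair directly instead of exploiting the sort order.

```python
import random


def remove_overlaps_sketch(entries, name_col, max_offset):
    """Return the entries that never lose a pairwise comparison.

    Illustrative only; key names 'chrom', 'start', 'end', and 'metric'
    are assumptions, not the tool's documented column names.
    """
    bad = [False] * len(entries)
    for i, m1 in enumerate(entries):
        for j in range(i + 1, len(entries)):
            m2 = entries[j]
            # Step 1: motifs that share no bases are not competing.
            if m1["chrom"] != m2["chrom"]:
                continue
            if m1["end"] <= m2["start"] or m2["end"] <= m1["start"]:
                continue
            # Step 2: skip pairs whose centers are too far apart.
            center1 = (m1["start"] + m1["end"]) / 2
            center2 = (m2["start"] + m2["end"]) / 2
            if abs(center1 - center2) > max_offset:
                continue
            # Step 3: when names must match, skip differently named motifs.
            if name_col is not None and m1[name_col] != m2[name_col]:
                continue
            # Steps 4 and 5: mark the motif with the lower metric as bad.
            if m1["metric"] < m2["metric"]:
                bad[i] = True
            elif m2["metric"] < m1["metric"]:
                bad[j] = True
            else:
                # Equal metrics: mark one of the two at random.
                bad[random.choice((i, j))] = True
    return [entry for entry, is_bad in zip(entries, bad) if not is_bad]


# Made-up hits for demonstration.
hits = [
    {"chrom": "chr1", "start": 0, "end": 10, "short_name": "A", "metric": 0.8},
    {"chrom": "chr1", "start": 5, "end": 15, "short_name": "A", "metric": 0.7},
    {"chrom": "chr1", "start": 6, "end": 16, "short_name": "B", "metric": 0.9},
]
same_name_only = remove_overlaps_sketch(hits, "short_name", 99999)
any_name = remove_overlaps_sketch(hits, None, 99999)
```

With name matching enabled, only the weaker of the two "A" hits is dropped. With nameCol set to None, all three hits compete and only the single best one remains.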

Parameters:
  • entries (list[dict]) – A list of motif tsv (or bed) entries. Each entry is a dict keyed by column name.

  • nameCol (str | None) – The name of the column used to check to see if motifs have the same name, for example short_name. If None, then don’t compare motifs by name, and only return the single best hit at each locus.

  • maxOffset (int) – If two entries have midpoints that are separated by more than this distance, then they are considered non-overlapping. The larger this value, the more aggressive the culling will be.

Returns:

A list of entries, of the same type as the input entries.

Return type:

list[dict]

There are some quirks with this algorithm. It works based on the order of the bed file, and so the following situation gives a result you might not expect:

|---a:0.8---|     |---c:0.2---|
         |---b:0.7---|   |---d:0.1---|

Here we have four motif calls: a, b, c, and d. By the above algorithm, the motif pairs (a, b), (b, c), and (c, d) have overlap, and so we’ll mark the lower-scoring motif from each pair. Between a and b, b has the lower score, so b will be marked. For b and c, c will be marked, even though b has itself already been marked. For c and d, d will be marked. This means that ONLY motif a will be returned because every other motif instance had overlap with a better instance.
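The chain effect can be reproduced in a few lines. The coordinates below are invented so that only neighbouring motifs overlap, as in the diagram; the scores are the ones shown. This sketch checks overlap and score only and ignores names and the maximum offset.

```python
# name -> (start, end, score); coordinates are made up for illustration.
motifs = {
    "a": (0, 12, 0.8),
    "b": (9, 21, 0.7),
    "c": (18, 30, 0.2),
    "d": (27, 39, 0.1),
}

names = list(motifs)
marked = set()
for i, first in enumerate(names):
    for second in names[i + 1:]:
        start1, end1, score1 = motifs[first]
        start2, end2, score2 = motifs[second]
        # Skip pairs that share no bases (half-open intervals).
        if end1 <= start2 or end2 <= start1:
            continue
        # A marked motif still takes part in later comparisons.
        marked.add(first if score1 < score2 else second)

survivors = [name for name in names if name not in marked]
print(survivors)  # prints ['a']
```

Motifs c and d are removed even though neither overlaps a, because marking never removes a motif from later comparisons.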

bpreveal.tools.bestMotifsOnly.loadEntries(inTsv, inBed, matchNames)

Read in the bed or TSV file containing motif calls.

Parameters:
  • inTsv (str | None) – The name of the input tsv file, or None if one wasn’t provided.

  • inBed (str | None) – The name of the input bed file, or None if one wasn’t provided.

  • matchNames (bool) – Should the culling only consider motifs that have the same name?

Returns:

A tuple containing three items. The first is a list of dicts, containing all of the entries from the input data file. The second is a list of the field names in each entry. The third is a string giving the name of the field that should be used to get the name of the motif.

Return type:

tuple[list[dict], list[str], str | None]
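For orientation, a three-item return value of this shape could be produced from a TSV as in the sketch below. It is not the bpreveal implementation: the function load_tsv_sketch is hypothetical, it covers only TSV input, it takes an open file-like object instead of a file name, and it assumes the motif name lives in a short_name column.

```python
import csv
import io


def load_tsv_sketch(handle, match_names):
    """Read tab-separated motif hits into (entries, field names, name column).

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Each row becomes a dict keyed by column name; values stay as strings.
    entries = list(reader)
    field_names = list(reader.fieldnames or [])
    # Assumed name column; None disables name matching downstream.
    name_col = "short_name" if match_names else None
    return entries, field_names, name_col


# A tiny in-memory TSV standing in for a motif scanner output file.
demo = io.StringIO("chrom\tstart\tend\tshort_name\nchr1\t10\t20\tmotifA\n")
entries, fields, name_col = load_tsv_sketch(demo, match_names=True)
```

Numeric columns would still need to be converted from strings before a metric expression could compare them.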

bpreveal.tools.bestMotifsOnly.main()

Run the culling algorithm.

Return type:

None