metrics

Help Info

Take two bigwig-format files and calculate an assortment of metrics on their contents.

usage: metrics [-h] --reference REFERENCE --predicted
               PREDICTED --regions REGIONS [--verbose]
               [--threads THREADS] [--skip-zeroes]
               [--json-output] [--output-file OUTPUTFILE]
               [--apply-abs]

Named Arguments

--reference

The name of the reference (i.e., experimental) bigwig.

--predicted

The name of the bigwig file generated by a model.

--regions

The name of a bed file containing the regions to use to calculate the metrics.

--verbose

Display progress as the file is being written. Cannot be used with --json-output.

Default: False

--threads

Number of parallel threads to use.

Default: 1

--skip-zeroes

When a region has zero counts, the default behavior is to poison the results with NaN, to indicate that a problem has occurred. If this flag is set, then regions with zero counts in either bigwig will be silently skipped.

Default: False

--json-output

Instead of producing a human-readable output, generate a machine-readable json file. Cannot be used with --verbose.

Default: False

--output-file

Instead of writing to stdout, write to this file.

--apply-abs

Use the absolute value of the entries in the bigwig. Useful if one bigwig contains negative values.

Default: False

Usage

Calculates useful metrics of model performance.

This little helper program reads in two bigwig files and a bed file of regions. For each region, it calculates four metrics based on the profiles: mnll, jsd, pearsonr, and spearmanr. It then displays the distribution of each metric over the regions at five quantiles: the minimum, the 25th, 50th, and 75th percentiles, and the maximum. The 50th percentile is the median value, and is likely to be the most useful for typical work.

Additionally, this tool calculates the total counts over each region in both bigwigs and then, across all regions, the Pearson and Spearman correlations between the reference totals and the predicted totals.

This program uses command-line arguments rather than a configuration json, since it doesn’t do very much. The arguments are given by running metrics --help.
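As an illustration of the arguments listed above, the following sketch assembles a command line for the tool from Python. The helper name buildMetricsCommand and the file names are hypothetical; only the flags themselves come from the help text, and the executable is assumed to be on your PATH as metrics.

```python
import shlex


def buildMetricsCommand(reference, predicted, regions, threads=1,
                        skipZeroes=False, jsonOutput=False,
                        outputFile=None, applyAbs=False):
    """Assemble an argv list using only the flags shown in the help text."""
    cmd = ["metrics",
           "--reference", reference,
           "--predicted", predicted,
           "--regions", regions,
           "--threads", str(threads)]
    if skipZeroes:
        cmd.append("--skip-zeroes")
    if jsonOutput:
        # --json-output cannot be combined with --verbose, so this
        # helper never adds --verbose.
        cmd.append("--json-output")
    if outputFile is not None:
        cmd += ["--output-file", outputFile]
    if applyAbs:
        cmd.append("--apply-abs")
    return cmd


# Hypothetical file names, for illustration only.
cmd = buildMetricsCommand("experimental.bw", "predicted.bw", "peaks.bed",
                          threads=4, jsonOutput=True,
                          outputFile="metrics.json")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the files exist.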

The metrics are:

mnll: The multinomial log-likelihood of seeing the observed data given the predicted probability distribution. This is the profile loss function.
jsd: The Jensen-Shannon distance. This technique comes from information theory, and determines how similar two probability distributions are. This is a distance metric, so lower values indicate a better match.
pearsonr: The Pearson correlation coefficient of the profiles.
spearmanr: The Spearman correlation coefficient of the profiles.
Counts pearson: The Pearson correlation coefficient of total counts in every region.
Counts spearman: The Spearman correlation coefficient of the total counts in every region.
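To make the definitions above concrete, here is a minimal standard-library sketch of the four profile metrics for a single region. It is not the tool's implementation: the function names are hypothetical, and the normalization, logarithm base, sign convention, and tie handling used by metrics itself may differ.

```python
import math


def normalize(profile):
    """Scale a non-negative profile so that it sums to one."""
    total = sum(profile)
    return [v / total for v in profile]


def multinomialLogLikelihood(observed, predicted):
    """Log-probability of the observed counts under the predicted distribution.

    Assumes every position with observed counts has a nonzero predicted value.
    """
    probs = normalize(predicted)
    logCoefficient = math.lgamma(sum(observed) + 1) - sum(
        math.lgamma(k + 1) for k in observed)
    return logCoefficient + sum(
        k * math.log(p) for k, p in zip(observed, probs) if k > 0)


def jensenShannonDistance(p, q):
    """Square root of the Jensen-Shannon divergence (natural logarithm)."""
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """One-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

The counts metrics apply the same pearson and spearman functions, but to one total per region rather than to the per-base profile of a single region.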

Output specification

Normal

If you don’t specify --json-output, then you get a table of metrics. The first row gives the percentile cutoffs for the various metrics. The following rows give the value of each metric at each percentile value. For example, if the 75% column for jsd is 0.72, that means that 75% of your regions have a jsd value under 0.72.

The last two rows give the statistics for the counts performance. Since counts metrics are not evaluated region-by-region, there are no quantile statistics for them.
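The way one row of the table relates to the per-region values can be sketched as follows. The jsd values are made up for illustration, the helper name quantile is hypothetical, and linear interpolation between neighboring values is an assumption; the tool may interpolate differently.

```python
import math

# The five cutoffs that head the table.
QUANTILE_CUTOFFS = [0.0, 0.25, 0.5, 0.75, 1.0]


def quantile(values, q):
    """Value at quantile q, interpolating linearly between sorted neighbors."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


# Made-up jsd values, one per region.
jsdPerRegion = [0.31, 0.55, 0.42, 0.72, 0.60]

# One table row: the value of the metric at each cutoff.
row = [quantile(jsdPerRegion, q) for q in QUANTILE_CUTOFFS]
print("jsd", row)
```

With these five made-up regions, the 50% entry is the middle value of the sorted list, which is the median the text above recommends looking at first.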

json

If you request a JSON output with --json-output, then you will get a json with the following format:

<metrics-json> ::= {
    "reference" : <string>,
    "predicted" : <string>,
    "regions" : <string>,
    "mnll" : <metrics-quantile-section>,
    "jsd" : <metrics-quantile-section>,
    "pearsonr" : <metrics-quantile-section>,
    "spearmanr" : <metrics-quantile-section>,
    "counts-pearson" : <number>,
    "counts-spearman" : <number>
    }

<metrics-quantile-section> ::= {
    "quantile-cutoffs" : [<list-of-numbers>],
    "quantiles" : [<list-of-numbers>]
    }

Output notes

quantile-cutoffs: A list of the quantile thresholds used for the metrics. These will be [0.0, 0.25, 0.5, 0.75, 1.0]
quantiles: A list of numbers giving the value of the metric at each quantile cutoff, in the same order as quantile-cutoffs. For example, quantiles[1] gives the 25th percentile of that metric.
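A short sketch of reading this format from Python follows. The document embedded in the example is a hand-written placeholder that follows the grammar above; its file names and numbers are arbitrary and do not reflect realistic magnitudes or signs. The helper name valueAtCutoff is hypothetical.

```python
import json

CUTOFFS = [0.0, 0.25, 0.5, 0.75, 1.0]


def section(quantiles):
    """Build a placeholder quantile section matching the grammar above."""
    return {"quantile-cutoffs": CUTOFFS, "quantiles": quantiles}


# Placeholder document; in practice this text would come from the file
# named by --output-file, or from stdout.
sampleText = json.dumps({
    "reference": "experimental.bw",
    "predicted": "predicted.bw",
    "regions": "peaks.bed",
    "mnll": section([-900.0, -400.0, -250.0, -150.0, -50.0]),
    "jsd": section([0.31, 0.42, 0.55, 0.60, 0.72]),
    "pearsonr": section([0.1, 0.4, 0.6, 0.7, 0.9]),
    "spearmanr": section([0.1, 0.3, 0.5, 0.6, 0.8]),
    "counts-pearson": 0.8,
    "counts-spearman": 0.75,
})

report = json.loads(sampleText)


def valueAtCutoff(report, metric, cutoff):
    """Look up a metric's value at a cutoff instead of hard-coding an index."""
    metricSection = report[metric]
    index = metricSection["quantile-cutoffs"].index(cutoff)
    return metricSection["quantiles"][index]


medianJsd = valueAtCutoff(report, "jsd", 0.5)
print("median jsd:", medianJsd)
print("counts pearson:", report["counts-pearson"])
```

Looking the index up through quantile-cutoffs keeps the reader correct even if the list of cutoffs were ever to change.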

API

class bpreveal.metrics.Region(line)

A simple container built from one line of a bed file.

Parameters: line (str)

class bpreveal.metrics.MetricsCalculator(referenceBwFname, predictedBwFname, applyAbs, inQueue, outQueue, tid)

Calculates metrics as it receives queries from inQueue, puts results in outQueue.

Parameters:

referenceBwFname (str) – The file name of the reference bigwig.
predictedBwFname (str) – The file name of the predicted bigwig.
applyAbs (bool) – Should the values in the bigwig be made positive?
inQueue (Queue) – The queue that will provide queries.
outQueue (Queue) – The queue where results will be put.
tid (int) – The thread ID of this process.

runRegions(regionReference, regionPredicted, regionID)

Run the calculation on a single region.

Parameters:

regionReference (Region) – A region in the reference bigwig.
regionPredicted (Region) – A region in the predicted bigwig.
regionID (Any) – A tag passed into the output queue.

Return type:

None

Given a region, loads up profiles from the reference and predicted bigwigs and calculates the various metrics. Puts its results into the output queue.

run()

Watch the input queue and run queries until you get the stop signal.

Return type: None

finish()

Wrap up shop, close the bigwigs.

Return type: None

bpreveal.metrics.calculatorThread(referenceBwFname, predictedBwFname, applyAbs, inQueue, outQueue, tid)

Just spawns a MetricsCalculator and runs it.

Parameters:

referenceBwFname (str) – The file name of the reference bigwig.
predictedBwFname (str) – The file name of the predicted bigwig.
applyAbs (bool) – Should the values in the bigwig be made positive?
inQueue (Queue) – The queue that will provide queries.
outQueue (Queue) – The queue where results will be put.
tid (int) – The thread ID of this process.

Return type:

None

bpreveal.metrics.regionGenThread(regionsFname, regionQueue, numThreads, numberQueue)

A thread to generate regions and stuff them in the regionQueue.

Parameters:

regionsFname (str) – The bed file to read in.
regionQueue (Queue) – The queue that the calculator threads will be getting their queries from.
numThreads (int) – How many calculator threads will be running?
numberQueue (Queue) – A queue that will hear the number of regions in the regions file.

Return type:

None

numberQueue is needed so that the parent thread can know how many results to expect. This thread counts the number of regions in regionsFname and then puts that number in numberQueue before it starts putting regions into regionQueue.
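The counting handshake described above can be sketched with the standard library. This is an illustration of the pattern only, not the module's code: it uses threading rather than whatever worker type the module uses, the stop signal (None) and the function names are assumptions, and the worker computes a region length instead of real metrics.

```python
import queue
import threading

STOP = None  # Hypothetical stop signal; the real sentinel is not documented.


def regionGen(bedLines, regionQueue, numThreads, numberQueue):
    """Report the region count first, then feed regions and stop signals."""
    regions = [line.split()[:3] for line in bedLines if line.strip()]
    numberQueue.put(len(regions))      # Parent learns how many results to expect.
    for region in regions:
        regionQueue.put(region)
    for _ in range(numThreads):        # One stop signal per calculator thread.
        regionQueue.put(STOP)


def calculator(regionQueue, resultQueue):
    """Stand-in worker: emits each region's length instead of metrics."""
    while (region := regionQueue.get()) is not STOP:
        chrom, start, end = region
        resultQueue.put((chrom, int(end) - int(start)))


bedLines = ["chr1\t100\t200", "chr1\t500\t650", "chr2\t0\t50"]
numThreads = 2
regionQueue, resultQueue, numberQueue = queue.Queue(), queue.Queue(), queue.Queue()

workers = [threading.Thread(target=regionGen,
                            args=(bedLines, regionQueue, numThreads, numberQueue))]
workers += [threading.Thread(target=calculator, args=(regionQueue, resultQueue))
            for _ in range(numThreads)]
for worker in workers:
    worker.start()

# The parent reads the count, then collects exactly that many results.
numRegions = numberQueue.get()
results = [resultQueue.get() for _ in range(numRegions)]
for worker in workers:
    worker.join()
print(numRegions, sorted(results))
```

Because the count arrives before any results are needed, the parent never has to guess when the output queue is exhausted.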

bpreveal.metrics.percentileStats(name, vector, jsonDict, header=False, write=True, outputFp=sys.stdout)

Given a vector of statistics, calculate percentile values.

Parameters:

name (str) – The name of the statistic being calculated. Used for output.
vector (ndarray) – The data that you want processed.
jsonDict (dict) – A dict where you want quantile information stored.
header (bool) – Should a header be written? If so, prints a row with the quantile cutoff values.
write (bool) – Should the results be written at all? If not, they will still be added to the jsonDict.
outputFp (TextIO) – The (opened) file object where output should be written.

Return type:

None

Doesn’t return anything, but does put information in jsonDict.

bpreveal.metrics.receiveThread(numRegions, outputQueue, skipZeroes, jsonOutput, jsonDict, outputFile)

Listen to the output from the calculator threads and process it.

Parameters:

numRegions (int) – How many total regions will be calculated? This is calculated inside regionGenThread and passed back through numberQueue.
outputQueue (Queue) – The queue that the calculator threads are putting their results in.
skipZeroes (bool) – Should regions where the metrics are undefined be filtered out? If not, your results will be contaminated with NaN, but this can also be a good indication that something is wrong.
jsonOutput (bool) – Should a json file be written? If so, the normal tabular output will not be printed.
jsonDict (dict) – Any additional information you’d like included in your json file, like the names of the files that were processed.
outputFile (str | None) – The name of the file where the output should be saved. If this is None, then write to stdout.

Return type:

None
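The effect of skipZeroes on the summary statistics can be illustrated with a small sketch. The function names and the per-region values are hypothetical, and the explicit NaN check only mimics how a NaN propagates through a quantile calculation; it is not taken from the module.

```python
import math


def collectMetric(values, skipZeroes):
    """Drop undefined (NaN) entries only when skipZeroes is set."""
    if skipZeroes:
        return [v for v in values if not math.isnan(v)]
    return list(values)


def median(values):
    """Median that is poisoned by NaN, as the default behavior describes."""
    if any(math.isnan(v) for v in values):
        return math.nan
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Made-up per-region values; the middle region had an undefined metric.
perRegion = [0.4, math.nan, 0.6]

poisoned = median(collectMetric(perRegion, skipZeroes=False))
filtered = median(collectMetric(perRegion, skipZeroes=True))
print(poisoned, filtered)
```

A NaN in the default output is therefore a prompt to inspect the regions file, while skipZeroes trades that warning for a usable summary.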

bpreveal.metrics.runMetrics(reference, predicted, regions, threads, applyAbs, skipZeroes, jsonOutput, outputFile)

Run the calculation.

Parameters:

reference (str) – The name of the bigwig file with reference data.
predicted (str) – The name of the bigwig file with predictions.
regions (str) – The name of the bed file with regions to analyze.
threads (int) – How many parallel workers should be used?
applyAbs (bool) – Should all values in the bigwigs be made positive?
skipZeroes (bool) – If one of the metrics from one region is NaN, should it be ignored (skipZeroes = True) or should all the results be contaminated with NaN (skipZeroes = False)?
jsonOutput (bool) – Should json output be written instead of a table?
outputFile (str | None) – If not None, gives the name of a file that the output should be saved to.

Return type:

None

Doesn’t return anything; the output is written to stdout, or to outputFile if one is given.

bpreveal.metrics.getParser()

Generate the argument parser.

Return type: ArgumentParser

bpreveal.metrics.main()

Run the whole thing.

Return type: None