metrics
Help Info
Take two bigwig-format files and calculate an assortment of metrics on their contents.
usage: metrics [-h] --reference REFERENCE --predicted PREDICTED
               --regions REGIONS [--verbose] [--threads THREADS]
               [--skip-zeroes] [--json-output] [--output-file OUTPUTFILE]
               [--apply-abs]
Named Arguments
- --reference
The name of the reference (i.e., experimental) bigwig.
- --predicted
The name of the bigwig file generated by a model.
- --regions
The name of a bed file containing the regions to use to calculate the metrics.
- --verbose
Display progress as the file is being written. Cannot be used with --json-output.
Default: False
- --threads
Number of parallel threads to use.
Default: 1
- --skip-zeroes, --skip-zeros
When a region has zero counts, the default behavior is to poison the results with NaN, to indicate that a problem has occurred. If this flag is set, then regions with zero counts in either bigwig will be silently skipped.
Default: False
- --json-output
Instead of producing human-readable output, generate a machine-readable json file. Cannot be used with --verbose.
Default: False
- --output-file
Instead of writing to stdout, write to this file.
- --apply-abs
Use the absolute value of the entries in the bigwig. Useful if one bigwig contains negative values.
Default: False
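As an illustration of how the options above fit together, here is a minimal argparse sketch that reproduces the documented flags. The names buildParser and parseArgs are hypothetical, and the explicit check that rejects --verbose together with --json-output is an assumption about how that restriction might be enforced; the real getParser() may differ.

```python
import argparse


def buildParser() -> argparse.ArgumentParser:
    # Hypothetical re-creation of the documented options, not bpreveal's code.
    parser = argparse.ArgumentParser(
        prog="metrics",
        description="Take two bigwig-format files and calculate an "
                    "assortment of metrics on their contents.")
    parser.add_argument("--reference", required=True,
                        help="The reference (experimental) bigwig.")
    parser.add_argument("--predicted", required=True,
                        help="The bigwig file generated by a model.")
    parser.add_argument("--regions", required=True,
                        help="Bed file with the regions to score.")
    parser.add_argument("--verbose", action="store_true",
                        help="Display progress.")
    parser.add_argument("--threads", type=int, default=1,
                        help="Number of parallel threads to use.")
    # Both spellings set the same attribute.
    parser.add_argument("--skip-zeroes", "--skip-zeros", action="store_true",
                        dest="skipZeroes",
                        help="Silently skip regions with zero counts.")
    parser.add_argument("--json-output", action="store_true",
                        dest="jsonOutput",
                        help="Write machine-readable json instead of a table.")
    parser.add_argument("--output-file", dest="outputFile", default=None,
                        help="Write to this file instead of stdout.")
    parser.add_argument("--apply-abs", action="store_true", dest="applyAbs",
                        help="Use the absolute value of the bigwig entries.")
    return parser


def parseArgs(argv):
    parser = buildParser()
    args = parser.parse_args(argv)
    # --verbose and --json-output are documented as incompatible.
    if args.verbose and args.jsonOutput:
        parser.error("--verbose cannot be used with --json-output")
    return args
```

For example, parseArgs(["--reference", "ref.bw", "--predicted", "pred.bw", "--regions", "peaks.bed", "--skip-zeros"]) sets args.skipZeroes to True. Because argparse derives an optional argument's metavar from dest.upper(), dest="outputFile" displays as OUTPUTFILE, matching the usage line above.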
Usage
Calculates useful metrics of model performance.
This little helper program reads in two bigwig files and a bed of regions. For
each region, it calculates four metrics based on the profiles: mnll, jsd,
pearsonr, and spearmanr. It then displays the quantiles (minimum, 25th
percentile, median, 75th percentile, and maximum) of each metric over all
regions. The 50th percentile is the median value, and is likely to be the most
useful for typical work.
Additionally, this tool sums the counts in each region of both bigwigs and reports the Pearson and Spearman correlations between the reference totals and the predicted totals across all regions.
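The counts correlation described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not bpreveal's implementation: the helper names pearson, ranks, spearman, and countsCorrelations are hypothetical, and each region's profile is assumed to already be loaded as a list of numbers. Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # constant input has no defined correlation
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        averageRank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = averageRank
        i = j + 1
    return result


def spearman(xs, ys):
    return pearson(ranks(xs), ranks(ys))


def countsCorrelations(referenceProfiles, predictedProfiles):
    """Correlate total counts per region between reference and prediction."""
    refTotals = [sum(profile) for profile in referenceProfiles]
    predTotals = [sum(profile) for profile in predictedProfiles]
    return pearson(refTotals, predTotals), spearman(refTotals, predTotals)
```

In practice a library routine would normally be used instead of hand-written correlations; the point here is only that both numbers come from one total per region, which is why they have no per-region quantiles.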
This program uses command-line arguments rather than a configuration json,
since it doesn’t do very much. The arguments are given by running
metrics --help.
The metrics are:
- mnll
The multinomial negative log-likelihood of the observed data given the predicted probability distribution. This is the profile loss function, so lower values indicate a better match.
- jsd
The Jensen-Shannon distance. This measure comes from information theory and quantifies how similar two probability distributions are. It is a distance metric, so lower values indicate a better match.
- pearsonr
The Pearson correlation coefficient of the profiles.
- spearmanr
The Spearman correlation coefficient of the profiles.
- Counts pearson
The Pearson correlation coefficient of total counts in every region.
- Counts spearman
The Spearman correlation coefficient of the total counts in every region.
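The two distribution-based profile metrics in the list above can be sketched as follows. This is a hedged illustration under stated assumptions, not the tool's code: normalize, mnll, and jsd are hypothetical names, the MNLL includes the multinomial coefficient and is not normalized by region length or count (the real tool may scale it differently), and the Jensen-Shannon distance uses base-2 logarithms so it lies between 0 and 1. An all-zero profile yields NaN, which is the situation --skip-zeroes addresses. The pearsonr and spearmanr metrics are the ordinary correlations of the two profiles' bin values and are not repeated here.

```python
import math


def normalize(profile):
    """Scale a profile to sum to one; an all-zero profile becomes all-NaN."""
    total = sum(profile)
    if total == 0:
        # No probability distribution exists for an empty region.
        return [math.nan] * len(profile)
    return [value / total for value in profile]


def mnll(observedCounts, predictedProfile):
    """Multinomial negative log-likelihood of observedCounts under the
    distribution obtained by normalizing predictedProfile."""
    probs = normalize(predictedProfile)
    total = sum(observedCounts)
    # Log of the multinomial coefficient: log(N!) - sum(log(k_i!)).
    logLik = math.lgamma(total + 1)
    for k, p in zip(observedCounts, probs):
        logLik -= math.lgamma(k + 1)
        if k > 0:
            if p == 0:
                return math.inf  # counts observed where the model predicts none
            logLik += k * math.log(p)  # a NaN p propagates as NaN
    return -logLik


def jsd(referenceProfile, predictedProfile):
    """Jensen-Shannon distance with base-2 logs, bounded by 0 and 1."""
    p = normalize(referenceProfile)
    q = normalize(predictedProfile)
    if any(math.isnan(v) for v in p + q):
        return math.nan
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Zero-probability terms contribute nothing; y > 0 wherever x > 0
        # because y is the mixture m.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding
```

For example, two identical profiles give a distance of 0, and two profiles with no overlapping signal, such as [1, 0] and [0, 1], give the maximum distance of 1.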
Output specification
Normal
If you don’t specify --json-output, then you get a table of metrics. The
first row gives the percentile cutoffs for the various metrics. The following
rows give the value of each metric at each percentile value. For example, if
the 75% column for jsd is 0.72, that means that 75% of your regions have a jsd
value under 0.72.
The last two rows give the statistics for the counts performance. Since the counts metrics are computed across all regions rather than region by region, there are no quantile statistics for them.
json
If you request a JSON output with --json-output, then you will get
a json with the following format:
<metrics-json> ::= {
"reference" : <string>,
"predicted" : <string>,
"regions" : <string>,
"mnll" : <metrics-quantile-section>,
"jsd" : <metrics-quantile-section>,
"pearsonr" : <metrics-quantile-section>,
"spearmanr" : <metrics-quantile-section>,
"counts-pearson" : <number>,
"counts-spearman" : <number>
}
<metrics-quantile-section> ::= {
"quantile-cutoffs" : [<list-of-numbers>],
"quantiles" : [<list-of-numbers>]
}
Output notes
- quantile-cutoffs
A list of the quantile thresholds used for the metrics. These will be
[0.0, 0.25, 0.5, 0.75, 1.0].
- quantiles
A list of numbers giving the value of the given metric at the corresponding quantile cutoff. For example, quantiles[1] gives the 25th percentile of that metric, and quantiles[2] gives the median.
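To show how a script might consume the json output, here is a small sketch that parses a document shaped like the grammar above and looks up a metric at a given cutoff. The numbers are made-up placeholders used only to show the structure, and quantileValue is a hypothetical helper, not part of bpreveal.

```python
import json

# Made-up placeholder numbers, used only to show the structure.
exampleText = """
{
  "reference": "reference.bw",
  "predicted": "predicted.bw",
  "regions": "regions.bed",
  "mnll": {"quantile-cutoffs": [0.0, 0.25, 0.5, 0.75, 1.0],
           "quantiles": [10.0, 20.0, 30.0, 40.0, 50.0]},
  "jsd": {"quantile-cutoffs": [0.0, 0.25, 0.5, 0.75, 1.0],
          "quantiles": [0.1, 0.2, 0.3, 0.4, 0.5]},
  "pearsonr": {"quantile-cutoffs": [0.0, 0.25, 0.5, 0.75, 1.0],
               "quantiles": [-0.1, 0.2, 0.4, 0.6, 0.9]},
  "spearmanr": {"quantile-cutoffs": [0.0, 0.25, 0.5, 0.75, 1.0],
                "quantiles": [-0.2, 0.1, 0.3, 0.5, 0.8]},
  "counts-pearson": 0.5,
  "counts-spearman": 0.4
}
"""


def quantileValue(report, metric, cutoff):
    """Look up a metric's value at one of the reported quantile cutoffs."""
    section = report[metric]
    index = section["quantile-cutoffs"].index(cutoff)
    return section["quantiles"][index]


report = json.loads(exampleText)
medianJsd = quantileValue(report, "jsd", 0.5)  # same as report["jsd"]["quantiles"][2]
```

With a real output file, replace json.loads(exampleText) with json.load(fp) on the opened file. Looking up the cutoff by value, rather than hard-coding index 2, keeps the script correct even if the list of cutoffs ever changes.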
API
- class bpreveal.metrics.Region(line)
A simple container for a line from a bed file.
- Parameters:
line (str)
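A container of this kind can be sketched as below. The class name BedRegion and its attributes chrom, start, and end are hypothetical; the documentation only states that Region is built from a single bed line. The parsing follows the standard BED convention of tab-separated columns with 0-based, half-open coordinates, so the region length is simply end minus start.

```python
class BedRegion:
    """Hypothetical stand-in for bpreveal.metrics.Region."""

    def __init__(self, line: str):
        # BED columns are tab-separated: chrom, start, end, then optional extras.
        fields = line.rstrip("\r\n").split("\t")
        self.chrom = fields[0]
        self.start = int(fields[1])  # 0-based, inclusive
        self.end = int(fields[2])    # exclusive

    def __len__(self) -> int:
        return self.end - self.start

    def __repr__(self) -> str:
        return f"BedRegion({self.chrom}:{self.start}-{self.end})"
```

For example, BedRegion("chr1\t100\t250\tpeak1\n") covers 150 bases; the optional name column is ignored.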
- class bpreveal.metrics.MetricsCalculator(referenceBwFname, predictedBwFname, applyAbs, inQueue, outQueue, tid)
Calculates metrics as it receives queries from inQueue, puts results in outQueue.
- Parameters:
referenceBwFname (str) – The file name of the reference bigwig.
predictedBwFname (str) – The file name of the predicted bigwig.
applyAbs (bool) – Should the values in the bigwig be made positive?
inQueue (CrashQueue) – The queue that will provide queries.
outQueue (CrashQueue) – The queue where results will be put.
tid (int) – The thread ID of this process.
- runRegions(regionReference, regionPredicted, regionID)
Run the calculation on a single region.
- Parameters:
- Return type:
None
Given a region, loads up profiles from the reference and predicted bigwigs and calculates the various metrics. Puts its results into the output queue.
- run()
Watch the input queue and run queries until you get the stop signal.
- Return type:
None
- finish()
Wrap up shop, close the bigwigs.
- Return type:
None
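The run loop described for MetricsCalculator follows a common worker pattern: take a query from the input queue, process it, put the result on the output queue, and exit on a stop signal. The sketch below assumes a standard queue.Queue in place of CrashQueue, None as the stop signal, and profiles passed in directly rather than loaded from bigwigs; the class SketchCalculator and its stand-in metric (absolute difference of totals) are hypothetical.

```python
import queue
import threading

STOP = None  # Assumed stop signal; the real sentinel may differ.


class SketchCalculator:
    """Hypothetical worker mirroring MetricsCalculator's run loop."""

    def __init__(self, inQueue, outQueue, tid):
        self.inQueue = inQueue
        self.outQueue = outQueue
        self.tid = tid

    def runRegions(self, reference, predicted, regionID):
        # Stand-in metric: absolute difference of total counts.
        self.outQueue.put((regionID, abs(sum(reference) - sum(predicted))))

    def run(self):
        # Keep pulling queries until the stop signal arrives.
        while True:
            query = self.inQueue.get()
            if query is STOP:
                break
            self.runRegions(*query)

    def finish(self):
        # The real class closes its bigwig files here.
        pass
```

A driver would start one worker per thread, for example threading.Thread(target=worker.run).start(), and enqueue one stop signal per worker once all regions have been queued so that every loop exits.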
- bpreveal.metrics.calculatorThread(referenceBwFname, predictedBwFname, applyAbs, inQueue, outQueue, tid)
Just spawns a MetricsCalculator and runs it.
- Parameters:
referenceBwFname (str) – The file name of the reference bigwig.
predictedBwFname (str) – The file name of the predicted bigwig.
applyAbs (bool) – Should the values in the bigwig be made positive?
inQueue (CrashQueue) – The queue that will provide queries.
outQueue (CrashQueue) – The queue where results will be put.
tid (int) – The thread ID of this process.
- Return type:
None
- bpreveal.metrics.regionGenThread(regionsFname, regionQueue, numThreads, numberQueue)
A thread to generate regions and stuff them in the regionQueue.
- Parameters:
regionsFname (str) – The bed file to read in.
regionQueue (CrashQueue) – The queue that the calculator threads will be getting their queries from.
numThreads (int) – How many calculator threads will be running?
numberQueue (CrashQueue) – A queue that will hear the number of regions in the regions file.
- Return type:
None
numberQueue is needed so that the parent thread can know how many results to expect. This thread counts the number of regions in regionsFname and then puts that number in numberQueue before it starts putting regions into regionQueue.
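The ordering described above (count first, then enqueue regions, then stop the workers) can be sketched as follows. This hypothetical regionGenSketch takes bed lines rather than a file name to stay self-contained, uses queue.Queue instead of CrashQueue, assumes None as the stop signal, and skips blank lines as a choice of the sketch rather than documented behavior.

```python
import queue

STOP = None  # Assumed stop signal.


def regionGenSketch(bedLines, regionQueue, numThreads, numberQueue):
    """Hypothetical producer: report the count, enqueue regions, stop workers."""
    # Skipping blank lines is a choice of this sketch.
    regions = [line for line in bedLines if line.strip()]
    # The parent learns how many results to expect before any are produced.
    numberQueue.put(len(regions))
    for regionID, line in enumerate(regions):
        regionQueue.put((regionID, line))
    # One stop signal per calculator so every worker exits its loop.
    for _ in range(numThreads):
        regionQueue.put(STOP)
```

Sending exactly numThreads stop signals matters: each worker consumes one and exits, so a missing signal would leave a worker blocked on an empty queue.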
- bpreveal.metrics.percentileStats(name, vector, jsonDict, header=False, write=True, outputFp=sys.stdout)
Given a vector of statistics, calculate percentile values.
- Parameters:
name (str) – The name of the statistic being calculated. Used for output.
vector (ndarray) – The data that you want processed.
jsonDict (dict) – A dict where you want quantile information stored.
header (bool) – Should a header be written? If so, prints a row with the quantile cutoff values.
write (bool) – Should the results be written at all? If not, they will still be added to the jsonDict.
outputFp (TextIO) – The (opened) file object where output should be written.
- Return type:
None
Doesn’t return anything, but does put information in jsonDict.
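A function with this contract can be sketched without numpy. The sketch assumes linear interpolation between sorted values (the default method of numpy.quantile) and that any NaN in the vector makes every quantile NaN, which matches the "poisoning" behavior described for --skip-zeroes. The names percentileStatsSketch, quantile, and CUTOFFS, and the column widths in the printed table, are hypothetical.

```python
import io
import math
import sys

CUTOFFS = [0.0, 0.25, 0.5, 0.75, 1.0]


def quantile(sortedValues, q):
    """Linear interpolation between neighboring order statistics."""
    pos = q * (len(sortedValues) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sortedValues[lo] + (sortedValues[hi] - sortedValues[lo]) * (pos - lo)


def percentileStatsSketch(name, vector, jsonDict, header=False, write=True,
                          outputFp=sys.stdout):
    if not vector or any(math.isnan(v) for v in vector):
        # An empty vector or a single NaN leaves every quantile undefined.
        quantiles = [math.nan] * len(CUTOFFS)
    else:
        ordered = sorted(vector)
        quantiles = [quantile(ordered, q) for q in CUTOFFS]
    # The json entry is filled in even when nothing is printed.
    jsonDict[name] = {"quantile-cutoffs": CUTOFFS, "quantiles": quantiles}
    if write:
        if header:
            print("metric".ljust(12) + "".join(f"{q:>10.2f}" for q in CUTOFFS),
                  file=outputFp)
        print(name.ljust(12) + "".join(f"{v:>10.3f}" for v in quantiles),
              file=outputFp)
```

For the values [4, 1, 3, 2], the five quantiles are 1, 1.75, 2.5, 3.25, and 4.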
- bpreveal.metrics.receiveThread(numRegions, outputQueue, skipZeroes, jsonOutput, jsonDict, outputFile)
Listen to the output from the calculator threads and process it.
- Parameters:
numRegions (int) – How many total regions will be calculated? This is calculated inside regionGenThread and passed back through numberQueue.
outputQueue (CrashQueue) – The queue that the calculator threads are putting their results in.
skipZeroes (bool) – Should regions where the metrics are undefined be filtered out? If not, your results will be contaminated with NaN, but this can also be a good indication that something is wrong.
jsonOutput (bool) – Should a json file be written? If so, the normal tabular output will not be printed.
jsonDict (dict) – Any additional information you’d like included in your json file, like the names of the files that were processed.
outputFile (str | None) – The name of the file where the output should be saved. If this is None, then write to stdout.
- Return type:
None
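The skipZeroes decision can be illustrated with a small receiver that reads exactly numRegions results and optionally drops any region whose metrics contain NaN. The name collectResults and the result format, a (regionID, dict of metric values) tuple, are assumptions for this sketch. The real receiveThread also handles the counts correlations and the output formatting, which are omitted here.

```python
import math
import queue


def collectResults(numRegions, outputQueue, skipZeroes):
    """Hypothetical receiver: gather exactly numRegions results."""
    columns = {}
    skipped = 0
    # Knowing numRegions in advance tells the loop when every result is in.
    for _ in range(numRegions):
        regionID, metrics = outputQueue.get()
        if skipZeroes and any(math.isnan(v) for v in metrics.values()):
            skipped += 1  # drop the undefined region instead of poisoning results
            continue
        for name, value in metrics.items():
            columns.setdefault(name, []).append(value)
    return columns, skipped
```

Without skipZeroes, the NaN values stay in the per-metric lists, so any quantile computed from them also becomes NaN. That is the intended signal that some region had no counts.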
- bpreveal.metrics.runMetrics(reference, predicted, regions, threads, applyAbs, skipZeroes, jsonOutput, outputFile)
Run the calculation.
- Parameters:
reference (str) – The name of the bigwig file with reference data.
predicted (str) – The name of the bigwig file with predictions.
regions (str) – The name of the bed file with regions to analyze.
threads (int) – How many parallel workers should be used?
applyAbs (bool) – Should all values in the bigwigs be made positive?
skipZeroes (bool) – If one of the metrics from one region is NaN, should it be ignored (skipZeroes = True) or should all the results be contaminated with NaN (skipZeroes = False)?
jsonOutput (bool) – Should json output be written instead of a table?
outputFile (str | None) – If not None, gives the name of a file that the output should be saved to.
- Return type:
None
Doesn’t return anything; the results are printed to stdout, or written to outputFile if one is given.
- bpreveal.metrics.getParser()
Generate the argument parser.
- Returns:
An ArgumentParser, ready to call parse_args()
- Return type:
ArgumentParser
- bpreveal.metrics.main()
Run the whole thing.
- Return type:
None