predictToBigwig

Help Info

Take an hdf5-format file generated by the predict script and render it to a bigwig.

usage: predictToBigwig [-h] [--h5 H5] [--bw BW]
                       [--head-id HEADID]
                       [--task-id TASKID] [--mode MODE]
                       [--verbose] [--negate]
                       [--threads NUMTHREADS]

Named Arguments

--h5

The name of the hdf5-format file to be read in.

--bw

The name of the bigwig file that should be written.

--head-id

Which head number do you want data for?

--task-id

Which task in that head do you want?

--mode

What do you want written? The options are:
  • ‘profile’ – softmax(logits) * exp(logcounts).

  • ‘logits’ – just the logits.

  • ‘mnlogits’ – the logits, mean-normalized (for easier display).

  • ‘logcounts’ – the log counts for every region.

  • ‘counts’ – exp(logcounts).

You will usually want ‘profile’.
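The arithmetic behind the five modes can be sketched with NumPy. This is an illustration, not the script's own code. It assumes the logits for one task form a 1D array and the log counts are a single scalar per region. It also assumes that mean normalization means subtracting the mean logit, and that the per-region scalars are spread across the whole region. How several tasks within one head share a normalization is not described here, so the sketch ignores it. The name renderMode is hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1D array."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


def renderMode(logits, logcounts, mode):
    """Hypothetical helper: build the track for one region in a given mode.

    logits: 1D array of profile logits for one task (assumed layout).
    logcounts: scalar log counts for the region (assumed layout).
    """
    logits = np.asarray(logits, dtype=np.float64)
    if mode == "profile":
        return softmax(logits) * np.exp(logcounts)
    if mode == "logits":
        return logits
    if mode == "mnlogits":
        # Assumption: mean-normalized means the mean logit is subtracted.
        return logits - np.mean(logits)
    if mode == "logcounts":
        # Assumption: the scalar is repeated across the region.
        return np.full(logits.shape, logcounts)
    if mode == "counts":
        return np.full(logits.shape, np.exp(logcounts))
    raise ValueError(f"Unrecognized mode: {mode}")


exampleLogits = np.array([0.0, 1.0, 2.0])
exampleProfile = renderMode(exampleLogits, np.log(10.0), "profile")
```

In this sketch a ‘profile’ track sums to exp(logcounts) over the region, which is why it is the mode that reads most directly as predicted coverage.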

--verbose

Display progress as the file is being written.

Default: False

--negate

Negate all of the values written to the bigwig. Used for negative-strand predictions.

Default: False

--threads

Number of threads to use for calculating profile tracks (at most the number of chromosomes).

Default: 24

Usage

A script to take the predictions hdf5 file and turn it into a bigwig.

class bpreveal.predictToBigwig.Region(chromIdx, start, end, h5Idx)

Represents a single region.

It knows where it is in the genome and also what index it occupies in the hdf5.

Parameters:
  • chromIdx (int) – The chromosome index, corresponding to chrom_names in the hdf5.

  • start (int) – The genomic start coordinate, inclusive.

  • end (int) – The genomic end coordinate, exclusive.

  • h5Idx (int) – The index in the hdf5 file where this region is found.

getValues(h5fp, mode, head, taskID)

Get the values in the hdf5 at this region.

Parameters:
  • h5fp (File) – An opened hdf5 file containing predictions.

  • mode (str) – One of profile, logits, mnlogits, logcounts, or counts.

  • head (int) – Which head index do you want the data from?

  • taskID (int) – Which task do you want the data for?

Raises:

ValueError – if you asked for an invalid mode.

Returns:

A vector containing the requested data.

Return type:

ndarray

bpreveal.predictToBigwig.getChromInserts(arg)

Takes all of its arguments packed into a single tuple so that it is easier to use with pool.map().

Parameters:

arg (tuple[list[Region], str, int, int, str]) – In order, regionList, h5Fname, headID, taskID, mode.

Returns:

The inserts from vectorToListOfInserts.

Return type:

list[tuple[ndarray, int]]
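The single-tuple pattern is needed because pool.map() passes exactly one argument to the worker. A minimal sketch of the idea follows, using a hypothetical worker unrelated to BPReveal. It uses a ThreadPool so that it stays self-contained; which pool type the script itself uses is not stated here.

```python
from multiprocessing.pool import ThreadPool


def scaleAndSum(values, factor, offset):
    """Hypothetical worker that takes several separate arguments."""
    return sum(v * factor for v in values) + offset


def scaleAndSumPacked(arg):
    """Unpack one tuple so the worker fits pool.map's one-argument form."""
    values, factor, offset = arg
    return scaleAndSum(values, factor, offset)


# One tuple per job, in the order the packed function expects.
jobs = [([1, 2, 3], 2, 0), ([4, 5], 10, 1)]
with ThreadPool(2) as pool:
    results = pool.map(scaleAndSumPacked, jobs)
print(results)  # [12, 91]
```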

bpreveal.predictToBigwig.getChromVector(regionList, h5fp, headID, taskID, mode)

Map the values at each Region onto a vector representing the chromosome.

regionList should only contain regions from one chromosome, as regionList[0].chromIdx is used to determine the size of the returned vector.

Parameters:
  • regionList (list[Region]) – The regions you want data for.

  • h5fp (File) – The (open) hdf5 file of predictions.

  • headID (int) – The head you want data for.

  • taskID (int) – The task within that head that you want data for.

  • mode (str) – One of profile, logits, mnlogits, logcounts, or counts.

Returns:

An array as long as the chromosome, with zeros everywhere that regionList did not cover, and the values of the data wherever the regions do exist. For overlapping regions, the predictions are averaged.

Return type:

ndarray[tuple[Any, ...], dtype[_ScalarT]]
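One way to average overlapping regions onto a chromosome-length vector is to accumulate a running sum and a coverage count, then divide where coverage is nonzero. The sketch below assumes each region is given as a (start, end, values) tuple with an exclusive end. The name averageOntoChrom is hypothetical, and the real function may do this differently.

```python
import numpy as np


def averageOntoChrom(chromSize, regions):
    """Hypothetical helper: average per-region values onto one vector.

    regions: list of (start, end, values) tuples, with end exclusive.
    """
    total = np.zeros(chromSize, dtype=np.float64)
    coverage = np.zeros(chromSize, dtype=np.int64)
    for start, end, values in regions:
        total[start:end] += values
        coverage[start:end] += 1
    out = np.zeros(chromSize, dtype=np.float64)
    covered = coverage > 0
    # Divide only where at least one region contributed a value.
    out[covered] = total[covered] / coverage[covered]
    return out


chromVector = averageOntoChrom(
    8,
    [(1, 4, np.array([2.0, 4.0, 6.0])),
     (3, 6, np.array([8.0, 1.0, 1.0]))],
)
```

Position 3 is covered by both regions here, so its value is the mean of 6.0 and 8.0, while positions that no region covers stay at zero.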

bpreveal.predictToBigwig.vectorToListOfInserts(dataVector)

Convert a chromosome vector to a list of regions that actually have data.

Given a vector of data from getChromVector, strip out the runs of zeros and return a list of the regions that contain actual data. For example:

vectorToListOfInserts([0,0,0,1,2,3,0,0,0,5,6,7])
[([1, 2, 3], 3), ([5, 6, 7], 9)]

Parameters:

dataVector (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – An array representing the data along an entire chromosome.

Returns:

A list of tuples. The first element of each tuple is an array of data, and the second is the genomic coordinate where that dataset starts.

Return type:

list[tuple[ndarray, int]]
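The run-splitting behavior in the example can be reproduced by looking for the positions where the vector switches between zero and nonzero. This is a sketch of one possible approach, not the function's actual implementation, and the name nonzeroRuns is hypothetical. It treats any exact zero as a gap, so a genuine zero inside a region would split that region in two.

```python
import numpy as np


def nonzeroRuns(dataVector):
    """Hypothetical helper: split a vector into (values, start) runs."""
    data = np.asarray(dataVector)
    isData = data != 0
    # Pad with False on both sides so every run has a start and an end edge.
    padded = np.concatenate(([False], isData, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [(data[s:e], int(s)) for s, e in zip(starts, ends)]


runs = nonzeroRuns([0, 0, 0, 1, 2, 3, 0, 0, 0, 5, 6, 7])
print([(values.tolist(), start) for values, start in runs])
# [([1, 2, 3], 3), ([5, 6, 7], 9)]
```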

bpreveal.predictToBigwig.buildRegionList(inH5)

Builds a list of Region objects for each chromosome in the hdf5.

Parameters:

inH5 (File) – The (open) h5py file containing predictions.

Returns:

A dict mapping chromosome ID to a list of Regions on that chromosome.

Return type:

dict[int, list[Region]]
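The grouping step can be sketched without an hdf5 file by working from three parallel sequences. The input names chromIdxs, starts, and ends are assumptions made for illustration, since the layout of the coordinate data in the hdf5 is not described here. RegionSketch is a hypothetical stand-in for the Region class.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RegionSketch:
    """Hypothetical stand-in for Region, holding the same four fields."""
    chromIdx: int
    start: int
    end: int
    h5Idx: int


def groupRegions(chromIdxs, starts, ends):
    """Group regions by chromosome index, remembering each row's position."""
    byChrom = defaultdict(list)
    for h5Idx, (c, s, e) in enumerate(zip(chromIdxs, starts, ends)):
        byChrom[int(c)].append(RegionSketch(int(c), int(s), int(e), h5Idx))
    return dict(byChrom)


grouped = groupRegions([0, 1, 0], [10, 5, 50], [20, 15, 60])
```

Keeping h5Idx alongside the coordinates is what lets each region later fetch its own row of predictions, even after the regions have been regrouped by chromosome.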

bpreveal.predictToBigwig.writeBigWig(inH5Fname, outFname, headID, taskID, mode, verbose, negate, numThreads)

Load the hdf5 file and write the predictions to a bigwig file.

Parameters:
  • inH5Fname (str) – The name of an hdf5 file on disk containing predictions.

  • outFname (str) – The name of the bigwig file to write.

  • headID (int) – The head you want predictions from.

  • taskID (int) – The task within that head that you want predictions for.

  • mode (str) – One of profile, logits, mnlogits, logcounts, or counts.

  • verbose (bool) – Should the program emit logging information?

  • negate (bool) – Should the predictions be negated in the output bigwig? Useful for negative-strand predictions, as in ChIP-nexus.

  • numThreads (int) – How many threads should be used?

Return type:

None
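A typical use of negate is to write one bigwig per strand from the same predictions file. The sketch below builds keyword arguments that match the documented signature. The file names, the helper names strandJobs and writeBothStrands, and the assumption that task 0 is the positive strand and task 1 the negative strand are all hypothetical, so check them against your own model.

```python
def strandJobs(h5Name, prefix, headID, posTaskID, negTaskID, numThreads=4):
    """Hypothetical helper: keyword arguments for one call per strand."""
    common = dict(inH5Fname=h5Name, headID=headID, mode="profile",
                  verbose=False, numThreads=numThreads)
    return [
        dict(common, outFname=f"{prefix}_positive.bw",
             taskID=posTaskID, negate=False),
        # Negating the negative strand draws it below the axis in a browser.
        dict(common, outFname=f"{prefix}_negative.bw",
             taskID=negTaskID, negate=True),
    ]


def writeBothStrands(h5Name, prefix, headID=0):
    """Write both strands. Requires BPReveal to be installed."""
    # Imported here so strandJobs can be used without BPReveal available.
    from bpreveal.predictToBigwig import writeBigWig
    # Assumption: task 0 is the positive strand and task 1 the negative.
    for kwargs in strandJobs(h5Name, prefix, headID, posTaskID=0, negTaskID=1):
        writeBigWig(**kwargs)


jobs = strandJobs("predictions.h5", "sample", 0, 0, 1)
```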

bpreveal.predictToBigwig.getParser()

Generate the argument parser.

Returns:

An ArgumentParser, ready to call parse_args()

Return type:

ArgumentParser

bpreveal.predictToBigwig.main()

Run the program.

Return type:

None