internal.constants

Types that are used throughout BPReveal.

bpreveal.internal.constants.ONEHOT_T

Data type for elements of a one-hot encoded sequence.

bpreveal.internal.constants.ONEHOT_AR_T

Data type for an array of one-hot encoded sequences.

alias of ndarray[Any, dtype[uint8]]
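As a rough illustration of how a sequence might be stored with this element type, the sketch below encodes a DNA string into a (length, 4) uint8 array. The helper name `oneHotEncode`, the A, C, G, T column order, and the all-zero handling of other bases are assumptions made for this example, not something taken from BPReveal.

```python
import numpy as np

# Assumed element type, chosen to match the uint8 array alias documented above.
ONEHOT_T = np.uint8


def oneHotEncode(sequence: str) -> np.ndarray:
    """Hypothetical helper: encode DNA as a (length, 4) uint8 array."""
    # Assumed column order A, C, G, T. Any other base (such as N) stays all-zero.
    columns = {"A": 0, "C": 1, "G": 2, "T": 3}
    ret = np.zeros((len(sequence), 4), dtype=ONEHOT_T)
    for i, base in enumerate(sequence.upper()):
        if base in columns:
            ret[i, columns[base]] = 1
    return ret


encoded = oneHotEncode("ACGTN")
```

Storing one byte per element keeps large sequence arrays small compared with a floating point representation.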

bpreveal.internal.constants.PRED_T

Data type for coverage.

bpreveal.internal.constants.PRED_AR_T

Data type for an array of predictions.

alias of ndarray[Any, dtype[float32]]

bpreveal.internal.constants.LOGIT_T

Data type for logits from the model.

bpreveal.internal.constants.LOGIT_AR_T

Data type for an array of logits.

alias of ndarray[Any, dtype[float32]]

bpreveal.internal.constants.LOGCOUNT_T

Data type for logcount values.

bpreveal.internal.constants.IMPORTANCE_T

Store importance scores with 16 bits of precision.

Since importance scores (particularly PISA values) take up a lot of space, I use a small floating point type and compression to reduce the amount of data that has to be stored.

bpreveal.internal.constants.IMPORTANCE_AR_T

Data type for an array of importance values.

alias of ndarray[Any, dtype[float16]]
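To make the space trade-off concrete, here is a minimal sketch comparing the same values stored as float32 and float16. The scores are synthetic numbers invented for the example, and the variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for importance scores; real values come from a model.
scores32 = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
scores16 = scores32.astype(np.float16)

# Half-precision storage uses two bytes per value instead of four.
bytes32 = scores32.nbytes
bytes16 = scores16.nbytes

# The cost is rounding error, measured here after converting back to float32.
maxError = float(np.max(np.abs(scores32 - scores16.astype(np.float32))))
```

The array shrinks by half, at the price of a small rounding error in each value.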

bpreveal.internal.constants.MODEL_ONEHOT_T

Inside the models, we use floating point numbers to represent one-hot sequences.

For reasons I don’t understand, setting this to uint8 DESTROYS PISA values.

bpreveal.internal.constants.MOTIF_FLOAT_T

The type used to represent CWMs and PWMs, and also the type used by the Jaccard code.

If you change this, be sure to change libJaccard.c and libJaccard.pyf (and run make) so that the Jaccard library uses the correct data type.

bpreveal.internal.constants.MOTIF_FLOAT_AR_T

An array of motif data.

alias of ndarray[Any, dtype[float32]]

bpreveal.internal.constants.H5_CHUNK_SIZE: int = 128

When saving large hdf5 files, store the data in compressed chunks.

This constant sets the number of entries in each chunk that gets compressed. For good performance, whenever you read a compressed hdf5 file, it really helps to read out whole chunks at a time and buffer them. See shapToBigwig for an example of a chunked reader.
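The idea of reading whole chunks can be sketched without any hdf5 library by computing chunk-aligned index ranges. The helper `chunkRanges` below is hypothetical and is not how shapToBigwig is implemented; it only shows the arithmetic.

```python
from collections.abc import Iterator

H5_CHUNK_SIZE = 128  # Value documented above.


def chunkRanges(numEntries: int,
                chunkSize: int = H5_CHUNK_SIZE) -> Iterator[tuple[int, int]]:
    """Hypothetical helper: yield (start, stop) ranges aligned to chunks."""
    for start in range(0, numEntries, chunkSize):
        # The last chunk may be shorter than chunkSize.
        yield start, min(start + chunkSize, numEntries)


# A dataset with 300 entries is covered by three reads.
ranges = list(chunkRanges(300))
```

A reader would then fetch each range in a single slice (for instance `dataset[start:stop]` on an h5py dataset) and serve individual entries from that buffer, so every compressed chunk is decompressed only once.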

bpreveal.internal.constants.QUEUE_TIMEOUT: int = 240

How long should a queue wait before crashing?

In parallel code, if something goes wrong, a queue could stay stuck forever. Python’s queues have a nifty timeout parameter that makes a blocked call raise an exception if it waits too long. If a queue has been blocking for longer than this timeout, the program crashes.

This is measured in seconds.
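A minimal sketch of the timeout pattern, using the standard library's `queue.Queue`. The wrapper `getOrCrash` is a hypothetical name, and BPReveal's own code may pass the timeout to its queues differently.

```python
import queue

QUEUE_TIMEOUT = 240  # Seconds, as documented above.


def getOrCrash(q: queue.Queue, timeout: float = QUEUE_TIMEOUT):
    """Hypothetical helper: block on the queue, but fail loudly if it stalls."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty as e:
        # queue.Empty is what get() raises once the timeout has elapsed.
        raise RuntimeError(f"Queue was stuck for more than {timeout} seconds.") from e


q = queue.Queue()
q.put("work item")
item = getOrCrash(q)
```

Calling `getOrCrash` on a queue that never receives data raises after the timeout instead of hanging forever.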

bpreveal.internal.constants.GLOBAL_TENSORFLOW_LOADED: bool = False

Has Tensorflow been loaded in this process?

This gets set to True if you use any of the tensorflow-importing functions in this file. If you import tensorflow in a parent process, child processes will not be able to use tensorflow, because tensorflow is dumb like that. Tools like the easy functions and the threaded batcher check to see if Tensorflow has been loaded in the parent process before they spawn children.

bpreveal.internal.constants.GENOME_NUCLEOTIDE_FREQUENCY: dict[str, list[float]] = {'danRer11': [0.316952, 0.183272, 0.183253, 0.31652], 'dm6': [0.290034, 0.210142, 0.209919, 0.289903], 'hg38': [0.295182, 0.203906, 0.204783, 0.296127], 'mm10': [0.291497, 0.208327, 0.208343, 0.291831], 'sacCer3': [0.309806, 0.190882, 0.190596, 0.308714]}

The frequency of A, C, G, and T (in that order) in common reference genomes.
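One possible use of these frequencies is drawing a random background sequence with realistic base composition. The sketch below is an illustration only; the variable names are hypothetical and the seed is arbitrary. Note that the tabulated values are rounded, so they are normalized before being used as probabilities.

```python
import numpy as np

# hg38 entry copied from GENOME_NUCLEOTIDE_FREQUENCY above (A, C, G, T order).
hg38Freq = [0.295182, 0.203906, 0.204783, 0.296127]

# The rounded values do not sum to exactly 1, and numpy rejects
# probability vectors that are off by that much, so normalize first.
probs = np.array(hg38Freq, dtype=np.float64)
probs = probs / probs.sum()

rng = np.random.default_rng(seed=1234)
indexes = rng.choice(4, size=50, p=probs)
background = "".join("ACGT"[i] for i in indexes)
```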

bpreveal.internal.constants.setTensorflowLoaded()

Call this when you first load tensorflow.

bpreveal.internal.constants.getTensorflowLoaded()

Returns True if this process has ever loaded tensorflow.

Return type:

bool
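The flag and its two accessors follow a common module-level pattern. The sketch below is a standalone illustration of that pattern, not BPReveal's source: `_TENSORFLOW_LOADED` stands in for `GLOBAL_TENSORFLOW_LOADED`, and `checkSafeToSpawn` is a hypothetical guard of the kind a tool might run before creating child processes.

```python
_TENSORFLOW_LOADED = False  # Stand-in for GLOBAL_TENSORFLOW_LOADED.


def setTensorflowLoaded() -> None:
    """Record that this process has loaded Tensorflow."""
    global _TENSORFLOW_LOADED
    _TENSORFLOW_LOADED = True


def getTensorflowLoaded() -> bool:
    """Report whether this process has ever loaded Tensorflow."""
    return _TENSORFLOW_LOADED


def checkSafeToSpawn() -> None:
    """Hypothetical guard: refuse to spawn children after loading Tensorflow."""
    if getTensorflowLoaded():
        raise RuntimeError("Tensorflow is already loaded in the parent process.")


checkSafeToSpawn()     # Fine: nothing has loaded Tensorflow yet.
setTensorflowLoaded()  # Pretend this process just imported Tensorflow.
try:
    checkSafeToSpawn()
    spawnAllowed = True
except RuntimeError:
    spawnAllowed = False
```

Because the flag is only ever switched on, a process that has loaded Tensorflow once is treated as unsafe to fork from for the rest of its lifetime.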