internal.constants

Types that are used throughout BPReveal.

bpreveal.internal.constants.ONEHOT_T: Data type for elements of a one-hot encoded sequence.

bpreveal.internal.constants.ONEHOT_AR_T

Data type for an array of one-hot encoded sequences.

alias of ndarray[Any, dtype[uint8]]
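
As an illustration of what an array of this type might hold, here is a minimal sketch of one-hot encoding into uint8. The helper name `one_hot_encode`, the A, C, G, T column order, and the all-zero treatment of other bases are assumptions made for this example, not something this page documents.

```python
import numpy as np

ONEHOT_T = np.uint8  # assumed to match the uint8 alias documented above


def one_hot_encode(sequence: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) uint8 array.

    Column order A, C, G, T is an assumption; bases other than ACGT
    (for example N) are left as all-zero rows.
    """
    alphabet = "ACGT"
    encoded = np.zeros((len(sequence), 4), dtype=ONEHOT_T)
    for position, base in enumerate(sequence.upper()):
        index = alphabet.find(base)
        if index >= 0:
            encoded[position, index] = 1
    return encoded


example = one_hot_encode("ACGTN")
```

Storing one byte per element keeps large sequence arrays a quarter of the size they would be as float32.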

bpreveal.internal.constants.PRED_T: Data type for coverage.

bpreveal.internal.constants.PRED_AR_T

Data type for an array of predictions.

alias of ndarray[Any, dtype[float32]]

bpreveal.internal.constants.LOGIT_T: Data type for logits from the model.

bpreveal.internal.constants.LOGIT_AR_T

Data type for an array of logits.

alias of ndarray[Any, dtype[float32]]

bpreveal.internal.constants.LOGCOUNT_T: Data type for logcount values.

bpreveal.internal.constants.IMPORTANCE_T

Store importance scores with 16 bits of precision.

Since importance scores (particularly PISA values) take up a lot of space, I use a small floating point type and compression to reduce the amount of data.

bpreveal.internal.constants.IMPORTANCE_AR_T

Data type for an array of importance values.

alias of ndarray[Any, dtype[float16]]
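
To make the space trade-off concrete, the sketch below compares the same values stored as float32 and as float16. The array contents are made up for illustration, and only NumPy is assumed.

```python
import numpy as np

IMPORTANCE_T = np.float16  # assumed to match the float16 alias above

scores32 = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
scores16 = scores32.astype(IMPORTANCE_T)

# Half precision uses two bytes per value instead of four.
bytes32 = scores32.nbytes
bytes16 = scores16.nbytes

# The price is rounding error; float16 keeps about three decimal digits.
max_error = float(np.max(np.abs(scores16.astype(np.float32) - scores32)))
```

For values in this range the rounding error stays below 0.001, which is the kind of loss you accept in exchange for halving the storage.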

bpreveal.internal.constants.MODEL_ONEHOT_T

Inside the models, we use floating point numbers to represent one-hot sequences.

For reasons I don’t understand, setting this to uint8 DESTROYS PISA values.

bpreveal.internal.constants.MOTIF_FLOAT_T

The type used to represent CWMs and PWMs, and also the type used by the Jaccard code.

If you change this, be sure to change libJaccard.c and libJaccard.pyf (and run make) so that the jaccard library uses the correct data type.

bpreveal.internal.constants.MOTIF_FLOAT_AR_T

An array of motif data.

alias of ndarray[Any, dtype[float32]]

bpreveal.internal.constants.H5_CHUNK_SIZE: int = 128

When saving large hdf5 files, store the data in compressed chunks.

This constant sets the number of entries in each chunk that gets compressed. For good performance, whenever you read a compressed hdf5 file, it really helps to read out whole chunks at a time and buffer them. See shapToBigwig for an example of a chunked reader.
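
The idea of reading whole chunks and buffering them can be sketched as follows. This is not the reader used by shapToBigwig. `ChunkedReader` and `CountingStore` are hypothetical names, and the store is a plain-Python stand-in for an HDF5 dataset so the example runs without any extra libraries.

```python
H5_CHUNK_SIZE = 128  # local copy of the documented value


class ChunkedReader:
    """Serve single rows while reading the backing store one chunk at a time."""

    def __init__(self, dataset, chunk_size=H5_CHUNK_SIZE):
        self._dataset = dataset
        self._chunk_size = chunk_size
        self._chunk_start = None  # index of the first row in the buffer
        self._buffer = None

    def __getitem__(self, index):
        start = (index // self._chunk_size) * self._chunk_size
        if start != self._chunk_start:
            # One slice read per chunk; for a compressed file this means
            # each chunk is decompressed once rather than once per row.
            self._buffer = self._dataset[start:start + self._chunk_size]
            self._chunk_start = start
        return self._buffer[index - start]


class CountingStore:
    """Hypothetical stand-in for an HDF5 dataset that counts slice reads."""

    def __init__(self, values):
        self._values = values
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self._values[key]


store = CountingStore(list(range(300)))
reader = ChunkedReader(store)
total = sum(reader[i] for i in range(300))
```

Reading 300 rows in order touches the store only three times (chunks starting at 0, 128, and 256). The sketch assumes non-negative indices and mostly sequential access; random access would defeat the single-chunk buffer.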

bpreveal.internal.constants.QUEUE_TIMEOUT: int = 240

How long should a queue wait before crashing?

In parallel code, if something goes wrong, a queue could stay stuck forever. Python’s queues have a nifty timeout parameter so that they’ll crash if they wait too long. If a queue has been blocked for longer than this timeout, the program crashes.

This is measured in seconds.
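
A minimal sketch of the timeout behavior, using the standard library `queue.Queue` for simplicity (`multiprocessing` queues raise the same `queue.Empty` exception on timeout). The helper `get_or_crash` is a hypothetical name, and the demonstration uses a short timeout so it finishes quickly.

```python
import queue

QUEUE_TIMEOUT = 240  # seconds, the documented value


def get_or_crash(work_queue, timeout=QUEUE_TIMEOUT):
    """Return the next item, or raise if nothing arrives within the timeout."""
    try:
        return work_queue.get(timeout=timeout)
    except queue.Empty as error:
        raise RuntimeError(
            f"Queue blocked for more than {timeout} seconds") from error


demo_queue = queue.Queue()
demo_queue.put("batch 0")
first = get_or_crash(demo_queue)

# The queue is now empty, so a short timeout shows the failure path.
try:
    get_or_crash(demo_queue, timeout=0.1)
    timed_out = False
except RuntimeError:
    timed_out = True
```

Failing loudly after a bounded wait turns a silent deadlock into an error you can actually see and debug.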

bpreveal.internal.constants.GLOBAL_TENSORFLOW_LOADED: bool = False

Has Tensorflow been loaded in this process?

This gets set to True if you use any of the tensorflow-importing functions in this file. If you import tensorflow in a parent process, child processes will not be able to use tensorflow, because tensorflow is dumb like that. Tools like the easy functions and the threaded batcher check to see if Tensorflow has been loaded in the parent process before they spawn children.

bpreveal.internal.constants.GENOME_NUCLEOTIDE_FREQUENCY: dict[str, list[float]] = {
    'danRer11': [0.316952, 0.183272, 0.183253, 0.31652],
    'dm6': [0.290034, 0.210142, 0.209919, 0.289903],
    'hg38': [0.295182, 0.203906, 0.204783, 0.296127],
    'mm10': [0.291497, 0.208327, 0.208343, 0.291831],
    'sacCer3': [0.309806, 0.190882, 0.190596, 0.308714]}

The frequency of A, C, G, and T (in that order) in common reference genomes.
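
As an example of how these frequencies might be used, the sketch below derives GC content and background entropy from the hg38 entry. The values are copied from the documented dictionary; the variable names and the choice of calculation are illustrative assumptions.

```python
import math

# hg38 entry copied from the documented dictionary, in A, C, G, T order.
HG38_FREQUENCY = [0.295182, 0.203906, 0.204783, 0.296127]

frequency_a, frequency_c, frequency_g, frequency_t = HG38_FREQUENCY

# GC content is the combined frequency of C and G.
gc_content = frequency_c + frequency_g

# Shannon entropy of the background in bits; a uniform background gives 2.0.
background_entropy = -sum(p * math.log2(p) for p in HG38_FREQUENCY)
```

Because the genome is AT-rich, the background entropy comes out slightly under 2 bits, which matters when computing information content for motifs.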

bpreveal.internal.constants.setTensorflowLoaded(): Call this when you first load tensorflow.

bpreveal.internal.constants.getTensorflowLoaded()

Returns True if this process has ever loaded tensorflow.

Return type: bool
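
The flag-and-check pattern described for GLOBAL_TENSORFLOW_LOADED could look roughly like the sketch below. All names here are hypothetical and written in a standalone style. The real setTensorflowLoaded and getTensorflowLoaded may be implemented differently, and raising an error in the check is an assumption about how a caller might react.

```python
_TENSORFLOW_LOADED = False  # hypothetical module-level flag


def set_tensorflow_loaded():
    """Record that this process has imported tensorflow."""
    global _TENSORFLOW_LOADED
    _TENSORFLOW_LOADED = True


def get_tensorflow_loaded():
    """Report whether this process has ever imported tensorflow."""
    return _TENSORFLOW_LOADED


def check_safe_to_spawn():
    """Refuse to start tensorflow workers if the parent already loaded it."""
    if get_tensorflow_loaded():
        raise RuntimeError(
            "tensorflow was already loaded in the parent process")


check_safe_to_spawn()  # passes: nothing has been loaded yet
set_tensorflow_loaded()
try:
    check_safe_to_spawn()
    blocked = False
except RuntimeError:
    blocked = True
```

The flag only ever moves from False to True, so a single check before spawning children is enough to catch the problem early.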