tools.plots
- bpreveal.tools.plots.getCoordinateTicks(start, end, numTicks, zeroOrigin)
Given a start and end coordinate, return the x-ticks that should be used for plotting. The returned ticks and tick labels:
1. Include exactly the start and end coordinates.
2. Contain approximately numTicks positions and labels.
3. Try to fall on easy multiples: 1, 2, and 5 times powers of ten.
4. Are formatted to reduce redundant label noise by omitting repeated initial digits.
- Parameters:
start (int)
end (int)
numTicks (int)
zeroOrigin (bool)
- Return type:
tuple[list[float], list[str]]
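The "easy multiples" rule can be illustrated with a short, self-contained sketch. This is not the BPReveal implementation: the helper names niceStep and sketchTicks are hypothetical, and the sketch ignores zeroOrigin and the label formatting entirely. It only shows one common way to choose a step of 1, 2, or 5 times a power of ten while keeping the exact start and end coordinates.

```python
import math


def niceStep(span: int, numTicks: int) -> int:
    """Return a step of the form 1, 2, or 5 times a power of ten.

    Hypothetical helper: picks the smallest such step that is at least
    span / numTicks, so the tick count is roughly numTicks or fewer.
    """
    rawStep = max(span / max(numTicks, 1), 1.0)
    base = 10 ** math.floor(math.log10(rawStep))
    for mult in (1, 2, 5, 10):
        if mult * base >= rawStep:
            return mult * base
    return 10 * base


def sketchTicks(start: int, end: int, numTicks: int) -> list[int]:
    """Return start, the round multiples strictly between, and end."""
    step = niceStep(end - start, numTicks)
    # Round start up to the next multiple of step.
    firstMultiple = -(-start // step) * step
    inner = [p for p in range(firstMultiple, end, step) if start < p < end]
    return [start] + inner + [end]


print(sketchTicks(12345, 12845, 5))
# prints [12345, 12400, 12500, 12600, 12700, 12800, 12845]
```

Note that the endpoints are kept even when they are not round numbers, which is why the first and last gaps can be shorter than the step.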
- bpreveal.tools.plots.plotLogo(values, width, ax, colors='seq', spaceBetweenLetters=0)
A convenience function to plot an array of sequence data (like a pwm) on a matplotlib axes object.
Arguments: values is an (N,4) array of sequence data. This could be, for example, a pwm or a one-hot encoded sequence.
width is the width of the total logo, useful for aligning axis labels.
ax is a matplotlib axes object on which the logo will be drawn.
- Colors, if provided, can have several meanings:
- Give an explicit rgba color for each base. colors should be an array of shape (N, 4, 4), where the first dimension is the sequence position, the second is the base (A, C, G, T, in that order), and the third gives an rgba color to use for that base at that position.
- Give a color for each base type. In this case, colors will be a dict of tuples: {"A": (220, 38, 127), "C": (120, 94, 240), "G": (254, 97, 0), "T": (255, 176, 0)}. This will make each instance of a particular base have the same color.
- Give a matplotlib colormap and a min and max value. Each base will be colored based on its magnitude. For example, to highlight bases with large negative values, you might specify ('Blues_r', -2, 0) to draw all bases with negative scores as blue; bases with less negative scores will be drawn lighter. Bases with scores outside the limits you provide will be clipped to the limit values.
- The string 'seq' means A will be drawn green, C will be blue, G will be orange, and T will be red. These colors are drawn from a colorblind-aware palette.
- Parameters:
values (ndarray[Any, dtype[float32]])
width (float)
spaceBetweenLetters (float)
- Return type:
None
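The clipping behavior of the colormap form of colors can be sketched as follows. This is an illustration under an assumption, not the library's code: it assumes scores are mapped linearly between the min and max values, and the helper name colormapPosition is hypothetical.

```python
def colormapPosition(score: float, vmin: float, vmax: float) -> float:
    """Clip score to [vmin, vmax] and rescale it to a position in [0, 1].

    Hypothetical helper assuming a linear mapping. In matplotlib, the
    result could then be looked up in a colormap, for example
    matplotlib.colormaps["Blues_r"](position), to get an rgba color.
    """
    clipped = min(max(score, vmin), vmax)
    return (clipped - vmin) / (vmax - vmin)


# With a spec like ('Blues_r', -2, 0):
for score in (-3.5, -2.0, -1.0, 0.0, 1.2):
    print(score, colormapPosition(score, -2.0, 0.0))
```

Under this assumption, scores at or below -2 land at position 0.0 (the dark end of Blues_r), while scores at or above 0 land at position 1.0 (the light end), so positive scores are indistinguishable from a score of zero.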
- bpreveal.tools.plots.plotPisaWithFiles(pisaDats, cutMiddle, cutLengthX, cutLengthY, receptiveField, genomeWindowStart, genomeWindowChrom, genomeFastaFname, importanceBwFname, motifScanBedFname, profileDats, nameColors, fig, bbox, colorSpan=1.0, boxHeight=0.1, fontsize=5, mini=False)
Given the names of files, make a pisa plot.
- Parameters:
pisaDats (str | ndarray[Any, dtype[float16]]) – Either a string naming an hdf5 file or an array from loadPisa.
cutMiddle (int) – The midpoint of the pisa plot, relative to the start of the profile.
cutLengthX (int) – How wide should the X axis be? If 99 or less, a sequence will be plotted.
cutLengthY (int) – How tall should the plot be?
receptiveField (int) – What is the model’s receptive field?
genomeWindowStart (int) – Where in the genome does pisaDats start? This is used to generate the x axis.
genomeWindowChrom (str) – What chromosome is the sequence on?
genomeFastaFname (str) – Name of the fasta file containing the genome.
importanceBwFname (str) – The bigwig of importance scores from interpretFlat.
motifScanBedFname (str) – The bed file containing mapped motifs.
profileDats (str) – The bigwig file containing predicted profile.
nameColors (dict[str, tuple[float, float, float]]) – A dict containing the color to be used for each motif name. If a motif is encountered that is not in this dict, then it is added with a color taken from the IBM palette.
fig (Figure) – The matplotlib figure to put this plot on.
bbox (tuple[float, float, float, float]) – The bounding box to use for drawing the figure. Lets you put multiple pisa plots on a single matplotlib Figure.
colorSpan (float) – What are the maximum and minimum values in the color map?
boxHeight (float) – How tall should the boxes containing motif names be?
fontsize (int) – How big should the font be?
mini (bool) – Would you like a smaller plot, suitable for one-column printing? Default: False.
- Returns:
Same as plotPisa
- bpreveal.tools.plots.plotPisa(pisaDats, cutMiddle, cutLengthX, cutLengthY, receptiveField, genomeWindowStart, seq, impScores, annotations, profile, nameColors, fig, bbox, colorSpan=1.0, boxHeight=0.1, fontsize=5, showGrid=True, showDiag=True, mini=False)
Given the actual vectors to show, make a pretty pisa plot.
- Parameters:
pisaDats (str | ndarray[Any, dtype[float16]]) – Either a string naming an hdf5 file, or an array from loadPisa.
cutMiddle (int) – Where should the midpoint of the plot be, relative to the pisaDats array?
cutLengthX (int) – How wide should the plot be?
cutLengthY (int) – How tall should the plot be?
receptiveField (int) – What is the model’s receptive field?
genomeWindowStart (int) – Where in the genome does this sequence start?
seq (str | None) – The sequence of the region.
impScores (ndarray[Any, dtype[float16]]) – The importance scores.
annotations (tuple[tuple[int, int], str, tuple[float, float, float]]) – A list of annotations, each of the form ((start, stop), name, color).
profile (ndarray[Any, dtype[float32]]) – A vector containing profile information.
nameColors (dict[str, tuple[float, float, float]]) – A dict mapping motif name to color. This is ignored.
fig (Figure) – The matplotlib Figure onto which the plot should be drawn.
bbox (tuple[float, float, float, float]) – The bounding box on the figure that will be used.
colorSpan (float) – The limit of the color scale.
boxHeight (float) – How tall should the motif name boxes be?
fontsize (int) – How large should the font be?
showGrid (bool) – Should the grid be plotted? Default: True
showDiag (bool) – Should a dotted line be plotted along the diagonal? Default: True
mini (bool) – Should a small-scale plot be made? This shrinks down the border elements for small printing and display.
- Returns:
(axPisa, axSeq, axProfile, nameColors, axCbar)
- bpreveal.tools.plots.plotSequenceHeatmap(hmap, ax, upsamplingFactor=10)
Show a sequence heatmap from an array of one-hot encoded sequences.
- Parameters:
hmap (ndarray[Any, dtype[uint8]]) – An array of sequences of shape (numSequences, length, 4)
ax (Axes) – A matplotlib Axes object upon which the heatmap will be drawn.
upsamplingFactor (int) – How much should the x-axis be sharpened? If upsamplingFactor * hmap.shape[1] >> ax.width_in_pixels then you may get aliasing artifacts. If upsamplingFactor * hmap.shape[1] << ax.width_in_pixels then you will get blurry borders.
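The trade-off described for upsamplingFactor suggests aiming for upsamplingFactor * hmap.shape[1] to be close to the width of the axes in pixels. A minimal sketch of that arithmetic follows; the helper name suggestUpsamplingFactor is hypothetical and not part of BPReveal. In matplotlib, the pixel width of an axes can be read from ax.get_window_extent().width.

```python
def suggestUpsamplingFactor(seqLength: int, axesWidthPixels: float) -> int:
    """Pick an integer factor so that factor * seqLength is near the pixel width.

    Hypothetical helper: the factor never drops below 1, since the
    heatmap cannot be drawn with fewer columns than bases.
    """
    return max(1, round(axesWidthPixels / seqLength))


# A 100 bp heatmap on axes 1000 pixels wide.
print(suggestUpsamplingFactor(100, 1000))  # prints 10
# A 2000 bp heatmap on axes 800 pixels wide: no upsampling is useful.
print(suggestUpsamplingFactor(2000, 800))  # prints 1
```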
- bpreveal.tools.plots.plotModiscoPattern(pattern, fig, sortKey=None)
Create a plot showing a pattern’s seqlets and their match scores.
- Parameters:
sortKey (None or ndarray) – Either None (do not sort) or an array of shape (numSeqlets,) giving the order in which the seqlets should be displayed. See example below for common use cases.
pattern (Pattern) – The pattern to plot. This pattern must have already had its seqlets loaded.
fig (Figure) – The matplotlib figure upon which the plots should be drawn.
Example:
    import h5py
    import matplotlib.pyplot as plt
    from bpreveal import motifUtils
    from bpreveal.tools.plots import plotModiscoPattern

    # Background ACGT frequency
    bgProbs = [0.29, 0.21, 0.21, 0.29]
    patZld = motifUtils.Pattern("pos_patterns", "pattern_1", "Zld")
    with h5py.File("modisco_results.h5", "r") as fp:
        patZld.loadCwm(fp, 0.3, 0.3, bgProbs)
        patZld.loadSeqlets(fp)
    fig = plt.figure()
    # Sort the seqlets by their contribution match.
    sortKey = [x.contribMatch for x in patZld.seqlets]
    plotModiscoPattern(patZld, fig, sortKey=sortKey)