jaccard

Thin wrapper for the internal libjaccard C library.

bpreveal.jaccard.slidingJaccard(importanceScores, cwm)

Calculate the sliding Jaccard similarity.

Parameters:
  • importanceScores (ndarray[Any, dtype[float16]]) – An array of shape (M, 4) giving hypothetical importance scores.

  • cwm (ndarray[Any, dtype[float32]]) – An array of shape (N, 4), giving a motif’s CWM.

Returns:

A tuple of two arrays, each of shape (M - N + 1,). The first gives the sliding Jaccard similarities and the second gives the contribution magnitudes.

Return type:

tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]

The first returned array (the sliding Jaccard similarities) is defined by:

slidingJaccard(A,B)[i] = jaccardRegion(A[i:i+N],B)
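
The windowing in the definition above can be sketched in pure NumPy. This is an illustrative reference only, not the libjaccard implementation: the names continuous_jaccard and sliding_jaccard_reference are hypothetical, each window is assumed to be compared with a scale factor of 1.0 (the definition above does not state one), and the second returned array (the contribution magnitudes) is not reproduced.

```python
import numpy as np


def continuous_jaccard(v1, v2):
    """Continuous Jaccard similarity of two equally shaped arrays."""
    # Signed intersection: the smaller magnitude, carrying the product of signs.
    intersection = np.minimum(np.abs(v1), np.abs(v2)) * np.sign(v1) * np.sign(v2)
    # Union: the larger magnitude at each position.
    union = np.maximum(np.abs(v1), np.abs(v2))
    return intersection.sum() / union.sum()


def sliding_jaccard_reference(importance_scores, cwm):
    """Hypothetical reference: score every length-N window of an (M, 4) array.

    Assumption: no rescaling is applied (scale factor 1.0 for every window).
    """
    scores = np.asarray(importance_scores, dtype=np.float32)
    motif = np.asarray(cwm, dtype=np.float32)
    motif_len = motif.shape[0]
    num_windows = scores.shape[0] - motif_len + 1  # M - N + 1
    out = np.zeros((num_windows,), dtype=np.float32)
    for i in range(num_windows):
        # Window i covers rows i .. i + N - 1 of the importance scores.
        out[i] = continuous_jaccard(scores[i:i + motif_len], motif)
    return out
```

As long as the CWM contains at least one nonzero entry, the union sum is positive for every window, so the division is well defined.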

bpreveal.jaccard.jaccardRegion(importanceScores, scaleFactor, cwm)

For a given region’s importance scores, calculate the continuous Jaccard similarity.

Parameters:
  • importanceScores (ndarray[Any, dtype[float16]]) – An array of shape (length, 4) giving a region’s hypothetical importance scores.

  • scaleFactor (float) – A constant that the importance scores should be multiplied by.

  • cwm (ndarray[Any, dtype[float32]]) – An array of shape (length, 4) giving the CWM for a motif.

Returns:

A single float giving the Jaccard match.

This implements the formula in the modisco paper, namely that

\[J(v_1,v_2) = \frac{\sum_i (v_{1,i} \cap v_{2,i})}{\sum_i (v_{1,i} \cup v_{2,i})}\]

where:

\[\begin{split}x \cap y &= \min(|x|, |y|) \cdot \operatorname{sign}(x) \cdot \operatorname{sign}(y) \\ x \cup y &= \max(|x|, |y|)\end{split}\]

The scaleFactor is a number that the importanceScores array should be multiplied by. If you just want the continuous Jaccard metric, set this to 1.0.
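
As a way to check the formula by hand, here is a minimal NumPy sketch of the same calculation. It is not the library's C implementation: the function name jaccard_region_reference is hypothetical, the argument order simply mirrors the signature documented above, and the sketch assumes that at least one entry of either array is nonzero so that the denominator is not zero.

```python
import numpy as np


def jaccard_region_reference(importance_scores, scale_factor, cwm):
    """Hypothetical pure-NumPy version of the continuous Jaccard formula."""
    # Promote to float32 before scaling so float16 inputs do not lose precision.
    v1 = np.asarray(importance_scores, dtype=np.float32) * scale_factor
    v2 = np.asarray(cwm, dtype=np.float32)
    abs1, abs2 = np.abs(v1), np.abs(v2)
    # x "cap" y = min(|x|, |y|) * sign(x) * sign(y)
    intersection = np.minimum(abs1, abs2) * np.sign(v1) * np.sign(v2)
    # x "cup" y = max(|x|, |y|)
    union = np.maximum(abs1, abs2)
    # Assumption: union.sum() is nonzero (no guard against all-zero inputs).
    return float(intersection.sum() / union.sum())
```

With a scale factor of 1.0, two identical nonzero arrays score 1.0 and an array compared against its own negation scores -1.0, which follows directly from the sign terms in the intersection.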