jaccard
Thin wrapper for the internal libjaccard C library.
- bpreveal.jaccard.slidingJaccard(importanceScores, cwm)
Calculate the sliding Jaccard similarity.
- Parameters:
importanceScores (ndarray[Any, dtype[float16]]) – An array of shape (M, 4) giving hypothetical importance scores.
cwm (ndarray[Any, dtype[float32]]) – An array of shape (N, 4), giving a motif’s CWM.
- Returns:
A tuple of two arrays, each of shape (M - N + 1,). The first gives the sliding Jaccard similarities and the second gives the contribution magnitudes.
- Return type:
tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]
- The returned array of similarities is defined by:
slidingJaccard(A,B)[i] = jaccardRegion(A[i:i+N],B)
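The windowing in this definition can be illustrated with a small pure-NumPy sketch. It is not the libjaccard implementation: `sliding_jaccard_reference` is a hypothetical name, it computes only the similarity array (not the contribution magnitudes), and it assumes a scale factor of 1.0 for every window, because the definition above does not say which scale factor the C library uses. It applies the continuous Jaccard formula documented under jaccardRegion.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_jaccard_reference(importance_scores, cwm):
    """Hypothetical reference: similarity of every length-N window to the CWM.

    Assumes a scale factor of 1.0 and a non-zero union for every window.
    """
    scores = np.asarray(importance_scores, dtype=np.float32)  # shape (M, 4)
    motif = np.asarray(cwm, dtype=np.float32)                 # shape (N, 4)
    n = motif.shape[0]
    # sliding_window_view yields shape (M - N + 1, 4, N); move the window
    # axis so that each window has shape (N, 4), matching the CWM.
    windows = np.moveaxis(sliding_window_view(scores, n, axis=0), -1, 1)
    intersection = (np.minimum(np.abs(windows), np.abs(motif))
                    * np.sign(windows) * np.sign(motif))
    union = np.maximum(np.abs(windows), np.abs(motif))
    return intersection.sum(axis=(1, 2)) / union.sum(axis=(1, 2))


# Toy data: M = 5 positions, and a CWM (N = 2) copied from rows 1..2.
scores = np.array([[1, 0, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, -1, 0],
                   [0, 0, 0, 3],
                   [1, 0, 0, 0]], dtype=np.float16)
cwm = scores[1:3].astype(np.float32)
sims = sliding_jaccard_reference(scores, cwm)
print(sims.shape)  # (4,), i.e. M - N + 1
print(sims)        # the window starting at index 1 matches the CWM exactly
```

With this toy input the window starting at index 1 is identical to the CWM, so its similarity is 1.0, while the other windows share no non-zero positions with the CWM and score 0.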
- bpreveal.jaccard.jaccardRegion(importanceScores, scaleFactor, cwm)
For a given region’s importance scores, calculate the continuous Jaccard similarity.
- Parameters:
importanceScores (ndarray[Any, dtype[float16]]) – An array of shape (length, 4) giving a region’s hypothetical importance scores.
scaleFactor (float) – A constant that the importance scores should be multiplied by.
cwm (ndarray[Any, dtype[float32]]) – An array of shape (length, 4) giving the CWM for a motif.
- Returns:
A single float giving the Jaccard match.
This implements the formula in the modisco paper, namely
\[J(v_1,v_2) = \frac{\sum_i (v_{1,i} \cap v_{2,i})}{\sum_i (v_{1,i} \cup v_{2,i})}\]
where:
\[\begin{split}x \cap y &= \min(|x|, |y|) \cdot \operatorname{sign}(x) \cdot \operatorname{sign}(y) \\ x \cup y &= \max(|x|, |y|)\end{split}\]
The scaleFactor is a number that the importanceScores array should be multiplied by. If you just want the continuous Jaccard metric, set this to 1.0.
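As a way to check the formula by hand, the following is a minimal NumPy sketch of the same calculation. The function name `jaccard_region_reference` is hypothetical and this is not the C implementation. It assumes that the scale factor is applied to the importance scores before the intersection and union are taken, as the parameter description suggests, and that the union sum is non-zero.

```python
import numpy as np


def jaccard_region_reference(importance_scores, scale_factor, cwm):
    """Hypothetical reference for the continuous Jaccard similarity."""
    # Scale the importance scores first (assumed order of operations).
    v1 = np.asarray(importance_scores, dtype=np.float32) * scale_factor
    v2 = np.asarray(cwm, dtype=np.float32)
    # x ∩ y = min(|x|, |y|) * sign(x) * sign(y)
    intersection = np.minimum(np.abs(v1), np.abs(v2)) * np.sign(v1) * np.sign(v2)
    # x ∪ y = max(|x|, |y|)
    union = np.maximum(np.abs(v1), np.abs(v2))
    return float(intersection.sum() / union.sum())


motif = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]], dtype=np.float32)
region = np.array([[0.5, 0.0, 0.0, 0.0],
                   [0.0, 0.5, 0.0, 0.0]], dtype=np.float16)

unscaled = jaccard_region_reference(region, 1.0, motif)  # 0.5
scaled = jaccard_region_reference(region, 2.0, motif)    # 1.0
flipped = jaccard_region_reference(-motif, 1.0, motif)   # -1.0
print(unscaled, scaled, flipped)
```

The example shows that the metric is sensitive to magnitude (scores at half the CWM's magnitude give 0.5 until they are scaled by 2.0) and to sign (a region that is the exact negative of the CWM gives -1.0).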