Peak Detection for Data Visualization
Note: This was originally published as an Observable notebook.
Peaks are spike-shaped patterns in time series data. Detecting them is often useful, since peaks can represent anomalies and sudden events.
Peak detection can be put to good use in a data visualization where it can direct attention to areas of potential value.
Example
Let's look at data on the number of visitors to Wikipedia pages. The peaks in this data represent times when there was a sudden spike of interest in a particular page.
Here are the daily pageviews of the Wikipedia page for pumpkins in 2018:
There's a large peak near the end of the year. What do you think that is? If we hover over the chart we see that interest in pumpkins peaked on Halloween.
It would be nice if the chart guided us towards this conclusion instead of making us work for it. Here it is again, this time with the peaks automatically detected and marked:
Now the insight is immediate. We can see that the large peak falls on Halloween, and that the peak following it is on Thanksgiving.
Easter hunt
Peak detection can also be used for completely frivolous purposes, such as determining the date of Easter purely from the pageviews of a curated set of Wikipedia pages (which is much more fun than the traditional method).
In addition to
Peak Detection Library
In an Observable Notebook, you can use peak detection in your own applications by using the detectPeaks
function provided by this notebook, which accepts a time series and returns the indices of any detected peaks.
import {detectPeaks} from '@yurivish/peak-detection'
Call detectPeaks
with an array of numbers to get back an array of peak indices:
detectPeaks([2, 0, 10, 2, 1])
You can pass an accessor function to transform each element of the array before peak detection occurs:
detectPeaks([{value: 2}, {value: 0}, {value: 10}, {value: 3}, {value: 1}], d => d.value)
Configuration options can fine-tune the behavior of the algorithm:
detectPeaks([1, 3, 2, 0, 10, 3, 1, 4], {
lookaround: 2, // the number of neighbors to compare to on each side
sensitivity: 1, // sensitivity, in terms of standard deviations above the mean
coalesce: 2 // coalesce together peaks within this distance of each other
})
The Algorithm
The approach to peak-detection used in this piece is a modification of one presented in a paper by Girish Palshikar titled Simple Algorithms for Peak Detection in Time-Series.
-
Compute a peak score for each data sample by comparing it to its shortest neighbors on each side (lookaround). The bigger the difference, the higher the score.
-
Each data sample now has a score, which is used to filter the peaks. We apply a sensitivity threshold (sensitivity) to the scores to pick out the globally strongest peaks.
-
To prevent peaks that span multiple samples from appearing multiple times in the final list, coalesce peaks that are close together (coalesce), grouping them and keeping the highest peak from each group.
Explore the Parameter Space
The sliders below control the peak detection in all of the charts on this page. They can help you get a feel for how the algorithm works and expose some of its more subtle aspects — there are nonlinear interactions due to peak coalescing.
For example, lowering the sensitivity may introduce a new candidate peak that lies between two existing peaks, causing all three to get coalesced and result in a net reduction in the number of peaks.
Thanks for reading!
Appendix: Code
function detectPeaks(data, accessor, options) {
let {lookaround, sensitivity, coalesce, full} = Object.assign({
lookaround: 2,
sensitivity: 1.4,
coalesce: 0,
full: false
}, options || accessor)
let values = typeof accessor == "function" ? data.map(accessor) : data
// Compute a peakiness score for every sample value in `data`
// We normalize the scale of the scores by mean-centering and dividing by the standard deviation
// to get a dimensionless quantity such that can be used as a sensitivity parameter
// across different scales of data (s. t. normalize(x) == normalize(k*x))
let scores = normalize(
values.map(
(value, index) => peakiness(
values.slice(max(0, index - lookaround), index),
value,
values.slice(index + 1, index + lookaround + 1)
)
)
)
// Candidate peaks are indices whose score is above the sensitivity threshold
let candidates = d3.range(scores.length).filter(index => scores[index] > sensitivity)
// If we have multiple peaks, coalesce those that are close together
let groups = candidates.length ? [[candidates[0]]] : []
d3.pairs(candidates).forEach(([a, b]) => {
if (b - a < coalesce) {
groups[groups.length - 1].push(b)
} else {
groups.push([b])
}
})
// Represent every group of peaks by the highest peak in the group
let peaks = groups.map(
group => group[d3.scan(group, (a, b) => values[b] - values[a])]
)
return full ? { data, values, scores, candidates, groups, peaks } : peaks
}
// Assigns a spikiness score to `value`, based on its left and right neighbors
const peakiness = (left, value, right) => {
// assume zero outside the boundary
return value - d3.max([d3.min(left) || 0, d3.min(right) || 0]) // this can be max or mean.
}
const normalize = xs => {
let mean = d3.mean(xs)
let stdev = d3.deviation(xs)
return xs.map(x => (x - mean) / stdev)
}
Appendix: Fonts
The fonts used for some visual elements of this piece are Valkyrie and Concourse, commercial fonts available for purchase from MB Type.