Pareto Frontier
Status: This was written in a hurry. I should find some way to label these with how much effort went into them / how much attention should be put on them…
The Pareto frontier can help you find a subset of your data points that are the ones that represent trade-offs between maximizing the various dimensions.
Imagine you’re looking at data about thousands of cars, and for each car you know its cost, mileage, and top speed. Maybe you’re trying to get a general sense of which cars perform the best, but this can be difficult when there are multiple attributes along which they can be compared.
All else being equal, though, you want a lower cost, higher mileage, and higher top speed, and it could be the case that some cars are strictly better than others on all three of these aspects. That set of those points is called the Pareto frontier.
The Pareto frontier will reveal the points that represent interesting trade-offs: no point on the frontier is strictly better than any other. Of course, there may be important real-world dimensions not captured in your data – perhaps you also care about the size of the car, for example – and in that case, looking at only the frontier of cost, mileage, and top speed can be a dangerous thing since it would cause you to ignore potentially relevant points along those other dimensions or skew your data (since size is correlated with all three dimensions). But if you can quantify these “hidden” aspects and add them to your data, the new higher-dimensional frontier might prove to be a source of insight.
Implementation
Here’s a simple JavaScript implementation of a Pareto frontier algorithm from the paper Fast Linear Expected-Time Algorithms for Computing Maxima and Convex Hulls., coauthored by Jon Bentley of Programming Pearls fame.
It takes an array of points, each represented by an array, and returns the indices of the points that are on the “max” Pareto frontier (intuitively, these points are generally “strictly larger” than the others when considering all dimensions).
// Return the indices of points on the max Pareto frontier (maxima of a point set) using algorithm
// M3 from "Fast Linear Expected-Time Algorithms for Computing Maxima and Convex Hulls".
// Note that if multiple equal points are potentially on the frontier, only one will be returned.
// p1 dominates p2 if it is greater than it in some components and equal
// in the remainder (incomparable in no components). Allows for dimensions
// s. t. neither p1 nor p2 is less than the other, eg. if the values
// are distinct Objects.