It depends what race you’re running
Imagine there are three teams of runners who are about to compete against each other in a team-vs-team race.
The first team has some very fast runners, but also some that are very slow. Despite the slowpokes, this team has highest average speed.
🯅 🯅 🯅 🯅 🯅 🯅 🯅
-----------------------------
Slow Fast
The second team is made up entirely of unremarkable runners, all of whom run at roughly the same middling pace.
🯅🯅🯅🯅🯅🯅🯅
-----------------------------
Slow Fast
The third team consists of runners who are far below average. But it has one incredible athlete – a world-class talent who’s easily the fastest runner on any of the teams.
🯅🯅🯅🯅🯅🯅 🯅
-----------------------------
Slow Fast
Which of these teams is most likely to win the race? Well, it depends entirely on what kind of race it is.
- It might be a relay race, in which the total time taken by a team is the sum of the times taken by the individual runners. In this scenario, the first team would be most likely to win since it has the highest average speed.
- Or maybe the race is one in which all of the runners start at the same time and teams are scored by how long it takes everyone on the team to finish the race. In this case, the time for a team depends exclusively on its slowest runner, which would give the second team the advantage since the other two teams both have slower runners.
- Or maybe teams are scored not by their slowest runner, but by their fastest. Then the third team would be the favorite since it has the fastest individual runner.
This analogy is an attempt to explain some simple but deep ideas in software performance that I came to relatively recently, despite writing software for more than a decade:
- It’s useful to think of the execution time of a program not as a number, but as a distribution.
- Which aspects of this distribution are worth optimizing depends not on the program, but on the context in which the program is invoked.
In the analogy, each team represents a computer program, and each runner represents a single execution of that program. The differences in the runner’s “run times” captures the idea of performance variation across multiple executions of the same program.
- The relay race is like a plain
for
loop. Executions of the program occur serially, and the total time taken is the sum of the times of the individual executions. To win this race, it makes sense to optimize the mean execution time, which is just a scaled version of the sum. - The second race, in which all of the runners have to cross the finish line, is like invoking a program many times in parallel, then waiting for all of them to complete, as one might in a map-reduce computation. For example, imagine you want to search a partitioned dataset. Individual partitions can be searched in parallel, and then the results combined, with the total run time depending on the maximum time it takes to conduct an individual search (since you need to wait for the slowest search to finish before returning the combined results).
- The third race, in which the team with the fastest runner wins, is like issuing a bunch of parallel calls and but only waiting around for the first result. A practical example of this pattern is the idea of hedged requests where, simplifying somewhat, to improve the response times from a service you send it multiple identical requests and take the response that arrives first, ie. the one with the minimum run time.
References
Just as fault-tolerant computing aims to create a reliable whole out of less-reliable parts, large online services need to create a predictably responsive whole out of less-predictable parts; we refer to such systems as “latency tail-tolerant,” or simply “tail-tolerant.” Here, we outline some common causes for high-latency episodes in large online services and describe techniques that reduce their severity or mitigate their effect on whole-system performance.
Minimum Times Tend to Mislead When Benchmarking
In this short post, I want to make a simple argument against using the minimum time of a benchmark as a proxy for that benchmark’s performance.
Tail Latency Might Matter More Than You Think
I continue to believe that if you’re going to measure just one thing, make it the mean. However, you probably want to measure more than one thing.
This draft is based on ideas I learned about in discussions with Yao Yue, Dan Luu, and Rebecca Isaacs over the last few years, all of whom know way more about this stuff than I do and have written articles and papers that cover the same ground but in much more conceptual depth and technical detail.
Also, all this assumes that you care about winning the race in the first place.