# Euclidean Distance and Pearson Correlation

*Note: This was originally published as an Observable notebook.*

I recently learned there's a useful connection between Euclidean distance and Pearson correlation: the squared Euclidean distance between two z-normalized vectors equals their Pearson distance, up to a scale factor.

This short note walks through a derivation.

### Z-normalization

Let's say we have two z-normalized vectors

Z-normalizing a vector involves subtracting its mean then dividing by its standard deviation, so we know that the means

and their standard deviations

We're going to introduce some statistical concepts and see how they simplify under these constraints. Expressions in color are simplifications valid only under z-normalization.

Summations implicity sum over all terms:

### Covariance

Since the means

### Standard deviation

The standard deviations

### Pearson correlation

The denominator reduces to one, leaving us with just the covariance:

### Euclidean distance

Now let's look at squared Euclidean distance between z-normalized vectors.

We can expand out the squared differences inside the sum:`

then separate out the terms into their own sums so we can tackle them one by one:`

Let's examine this. The two sum-of-squares terms,

Meanwhile, the middle term is

Putting these pieces together, our formula becomes:`

which can be simplified to:`

showing that the squared Euclidean distance between two z-normalized vectors is

### Why does this matter?

A common trick to improve Euclidean nearest-neighbor searches is to z-normalize the vectors that you're searching over. The connection to Pearson correlation gives us an intuition for why this works so well in practice.

Additionally, representing correlation as a distance metric allows us to use it with algorithms that rely on properties of a distance metric, such as the triangle inequality. Despite the name, Pearson distance is not actually a distance.

### References

For time series comparisons, it has often been observed that z-score normalized Euclidean distances far outperform the unnormalized variant. In this paper we show that a z-score normalized, squared Euclidean Distance is, in fact, equal to a distance based on Pearson Correlation.

Thanks to Dillon Shook for comments on an early draft of this note.