Covariance -- A Visual Walk Through

In a previous post, I walked through the calculation of variance and standard deviation, visualizing each step. This post does the same for another statistic: covariance.

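Covariance is a measure of the joint variability of two random variables. It is positive when the two tend to sit above or below their means together, and negative when one tends to be above its mean while the other is below.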

Let’s have a look at the sample covariance equation as a whole:

\(cov(x,y) = \frac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{n-1}\)

And now let’s apply the equation to the following case:

Ready? Okay, now let’s walk through the calculation; there are 7 small steps:

Step 1: find the mean of x


Step 2: find the mean of y
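Steps 1 and 2 can be sketched in R roughly as follows. The vectors `x` and `y` here are made-up illustration data, not the values plotted in this post, so the numbers will not match the figures.

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

mean_x <- mean(x)  # step 1: mean of x, here 3
mean_y <- mean(y)  # step 2: mean of y, here 4
```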


Step 3: calculate the difference between each x and the mean of x


Step 4: calculate the difference between each y and the mean of y
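A minimal sketch of steps 3 and 4, assuming a small invented data set (the names `dev_x` and `dev_y` are mine, not from the original code). R subtracts the scalar mean from every element of the vector, so each step is one line.

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

dev_x <- x - mean(x)  # step 3: deviations of x from its mean: -2 -1 0 1 2
dev_y <- y - mean(y)  # step 4: deviations of y from its mean: -2 0 1 0 1
```

By construction, each set of deviations sums to zero.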


Step 5: multiply these differences (observation-wise)
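In code, step 5 is an element-wise product of the two deviation vectors. The sketch below uses invented data and a hypothetical variable name, `products`. Each product can be read as the signed area of a rectangle whose sides are the two deviations: positive when both deviations share a sign, negative when they differ.

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Step 5: multiply the deviations observation by observation
products <- (x - mean(x)) * (y - mean(y))  # 4 0 0 0 2
```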


Step 6: add these areas (the products from step 5)

\(\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})\)
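The sum in step 6 might look like this in R, again with invented data rather than the post's own (`sum_products` is an illustrative name):

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Step 6: add up the products of the paired deviations
sum_products <- sum((x - mean(x)) * (y - mean(y)))  # 4 + 0 + 0 + 0 + 2 = 6
```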

Step 7: divide by the number of observations minus 1 (the result will be a bit larger in magnitude than the plain average of the products)

\(cov(x,y) = \frac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{n-1}\)
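To make the effect of the \(n-1\) divisor concrete, here is a small sketch with invented data (not the post's). It computes both the sample covariance and the plain average of the products; the names `sample_cov` and `avg_products` are illustrative.

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

n <- length(x)
sum_products <- sum((x - mean(x)) * (y - mean(y)))  # 6

sample_cov   <- sum_products / (n - 1)  # step 7: 6 / 4 = 1.5
avg_products <- sum_products / n        # plain average: 6 / 5 = 1.2
```

With only five observations the gap between the two is noticeable; it shrinks as n grows.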

That’s it.

Now we can compare this visualized result to what we would get if we simply trust the R covariance function to calculate this for us.

## [1] 0.4766744
cov(x,y) # Calculation for **sample** covariance
## [1] 0.4766744
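The by-hand expression is not shown above, so the following is only one way such a check could be written. It uses made-up data, which means the value differs from the 0.4766744 reported for the post's data.

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Covariance by hand, following the seven steps
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

# Built-in sample covariance from the stats package
builtin_cov <- cov(x, y)

# all.equal() compares with a numerical tolerance, which is safer than ==
# for floating-point results
match <- isTRUE(all.equal(manual_cov, builtin_cov))
```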

Great. It’s a match!

Discussion question

What would the units of the unadjusted covariance be between life expectancy in years and per capita GDP in dollars?

Note: The normalized version of covariance is Pearson’s correlation coefficient.
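Here "normalized" means divided by the product of the two sample standard deviations, which cancels the units. A quick sketch with invented data (not the post's) checks this against R's `cor()`, which computes Pearson's coefficient by default:

```r
# Hypothetical data, chosen only for illustration (not the post's data)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Covariance rescaled by both standard deviations
normalized_cov <- cov(x, y) / (sd(x) * sd(y))

# Pearson's correlation coefficient from the stats package
pearson_r <- cor(x, y)

same <- isTRUE(all.equal(normalized_cov, pearson_r))
```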


R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Evangeline Reynolds
Visiting Teaching Assistant Professor

My research interests include international institutions, causal inference, data visualization, and computational social science and pedagogy.