How to Calculate the Covariance Between Two Variables
Use this interactive covariance calculator to measure how two variables move together. Enter paired X and Y values, choose sample or population covariance, and instantly see the result, means, interpretation, and a visual scatter chart.
Covariance Calculator
Results
Enter paired values above, then click Calculate Covariance.
Expert Guide: How to Calculate the Covariance Between Two Variables
Covariance is one of the core ideas in statistics, finance, economics, data science, and social research because it measures how two variables move together. If one variable tends to rise when the other rises, covariance is positive. If one variable tends to rise when the other falls, covariance is negative. If the variables do not show a consistent joint movement, covariance tends to be near zero. While that definition sounds simple, many learners still struggle with the actual calculation, how to interpret the sign, and how covariance differs from correlation. This guide walks through the full process in practical terms so you can compute covariance confidently and explain what it means.
What covariance tells you
Suppose you are studying hours worked and weekly pay, advertising spend and sales, rainfall and crop yield, or study time and exam score. In each case, you are looking at paired observations. For every value of X, there is a corresponding value of Y. Covariance uses those pairs to estimate whether the variables move in the same direction or in opposite directions.
- Positive covariance: X and Y tend to move in the same direction.
- Negative covariance: X and Y tend to move in opposite directions.
- Covariance near zero: there is little linear co-movement in the data.
Key idea: covariance is about direction of movement, not standardized strength. The numeric value depends on the units of the two variables, which is why correlation is often used alongside covariance.
The covariance formula
There are two common formulas. Use the population version when your data includes the entire population of interest. Use the sample version when your data is a sample drawn from a larger population.
Sample covariance: sxy = Σ[(xᵢ – x̄)(yᵢ – ȳ)] / (n – 1)
Here:
- xᵢ = each observed X value
- yᵢ = each observed Y value
- x̄ = mean of X
- ȳ = mean of Y
- n = number of paired observations
Step by step: how to calculate covariance manually
- List your paired data. Make sure each X value is matched with the correct Y value. Covariance only works correctly when the observations are properly paired.
- Find the mean of X and the mean of Y. Add all X values and divide by n. Do the same for Y.
- Compute each deviation from the mean. For every pair, subtract x̄ from xᵢ and subtract ȳ from yᵢ.
- Multiply the paired deviations. For each row, calculate (xᵢ – x̄)(yᵢ – ȳ).
- Add all products. This gives the total co-movement across the dataset.
- Divide by n or n – 1. Use n for population covariance and n – 1 for sample covariance.
Worked example
Imagine you want to measure the covariance between hours studied and exam score for five students:
| Student | Hours Studied (X) | Exam Score (Y) |
|---|---|---|
| 1 | 2 | 65 |
| 2 | 4 | 70 |
| 3 | 6 | 75 |
| 4 | 8 | 88 |
| 5 | 10 | 95 |
First calculate the means:
- Mean of X = (2 + 4 + 6 + 8 + 10) / 5 = 6
- Mean of Y = (65 + 70 + 75 + 88 + 95) / 5 = 78.6
Then calculate the paired deviation products:
- (2 – 6)(65 – 78.6) = 54.4
- (4 – 6)(70 – 78.6) = 17.2
- (6 – 6)(75 – 78.6) = 0
- (8 – 6)(88 – 78.6) = 18.8
- (10 – 6)(95 – 78.6) = 65.6
The sum of products is 156. For a population covariance, divide by 5 to get 31.2. For a sample covariance, divide by 4 to get 39.0. The covariance is positive, which means more study hours are associated with higher exam scores in this dataset.
How to interpret the sign and magnitude
The sign is the most direct part of interpretation. A positive sign means the variables tend to move together. A negative sign means they tend to move in opposite directions. A value near zero suggests weak linear co-movement. The magnitude is trickier because covariance is not standardized. If X is measured in dollars and Y is measured in percentages, the covariance will be in dollar-percent units. If you change the units, the covariance changes too.
That is why analysts often use covariance to understand direction and to support later calculations, but use correlation when they need a standardized scale from negative 1 to positive 1. In portfolio theory, for example, covariance is central because the joint movement of asset returns affects diversification. In regression and machine learning, covariance also helps describe how features vary together.
Covariance vs correlation
| Feature | Covariance | Correlation |
|---|---|---|
| What it measures | Direction of joint movement | Direction and standardized strength of linear relationship |
| Units | Depends on the units of X and Y | Unitless |
| Scale | Unbounded | Always between -1 and 1 |
| Best use | Variance-covariance matrices, finance, multivariate analysis | Easy comparison across different variable pairs |
Real statistics example: unemployment and inflation
To see how covariance works with public economic data, consider U.S. annual unemployment and inflation rates. The table below uses rounded annual values that are commonly reported in federal statistical releases. Because these are paired by year, they can be used to examine whether unemployment and inflation moved together across this period.
| Year | U.S. Unemployment Rate (%) | U.S. Inflation Rate (%) |
|---|---|---|
| 2019 | 3.7 | 1.8 |
| 2020 | 8.1 | 1.2 |
| 2021 | 5.3 | 4.7 |
| 2022 | 3.6 | 8.0 |
| 2023 | 3.6 | 4.1 |
If you compute covariance for these observations, the result is negative, reflecting that in this short period higher unemployment did not move together with higher inflation. This is a good reminder that covariance is sample-specific and period-specific. A relationship observed over five years may differ from a relationship observed over fifty years.
Real statistics example: wages and consumer prices
Now consider another real-world pairing where we might expect positive co-movement: average hourly earnings and inflation. Over time, wages and price levels often rise together, although not always at the same rate.
| Year | Average Hourly Earnings, Private Nonfarm ($) | Inflation Rate (%) |
|---|---|---|
| 2019 | 28.16 | 1.8 |
| 2020 | 29.66 | 1.2 |
| 2021 | 30.95 | 4.7 |
| 2022 | 32.99 | 8.0 |
| 2023 | 34.00 | 4.1 |
Here the covariance is positive because larger wage values tend to appear in years with higher inflation than at the beginning of the period. Again, the sign tells you the direction of movement, but not the standardized strength.
When to use sample covariance vs population covariance
This distinction matters. If your data includes every observation in the population you care about, use population covariance and divide by n. If your data is only a sample, use sample covariance and divide by n – 1. Dividing by n – 1 corrects bias when estimating the population covariance from a sample. In practical business and research settings, most data you analyze is a sample, so sample covariance is often the better default.
Common mistakes to avoid
- Mismatched pairs: if the X and Y values are not aligned correctly, the covariance is meaningless.
- Using the wrong denominator: population covariance and sample covariance are not interchangeable.
- Overinterpreting magnitude: a larger covariance does not necessarily mean a stronger relationship unless the units are comparable.
- Ignoring outliers: extreme values can dominate covariance because deviations are multiplied.
- Assuming causation: covariance only measures co-movement, not whether one variable causes changes in the other.
Why visualization matters
A covariance number is informative, but a scatter plot often reveals the real story. You may find a positive covariance driven by one outlier, a curved relationship that covariance does not capture well, or clusters that suggest the data contains separate groups. That is why the calculator above includes a chart. A good analyst always checks both the number and the pattern.
Where covariance is used in practice
- Finance: asset return covariance helps estimate portfolio risk and diversification benefits.
- Economics: analysts compare inflation, wages, employment, production, and spending series.
- Data science: covariance matrices describe feature relationships and feed into methods like principal component analysis.
- Quality control: manufacturers examine how process inputs move with product outcomes.
- Education and social science: researchers measure whether variables such as attendance and performance move together.
Authoritative sources for deeper study
If you want official statistical context and reference material, these sources are excellent starting points:
- U.S. Bureau of Labor Statistics
- U.S. Census Bureau research and working papers
- Penn State Online Statistics Program
Final takeaway
To calculate covariance between two variables, start with paired data, compute each variable’s mean, find every deviation from the mean, multiply the paired deviations, add those products, and divide by n or n – 1 depending on whether you have a population or a sample. The sign gives you the direction of co-movement. The magnitude depends on the units, so use caution when comparing values across different datasets. When you need a clearer measure of strength, complement covariance with correlation. Most importantly, always pair the numeric result with a visual check of the data. That combination leads to better interpretation and better decisions.