How to Calculate Correlation Between Two Variables in R
Use this interactive calculator to compute Pearson, Spearman, or Kendall correlation from two numeric vectors, visualize the relationship, and instantly generate matching R code for your analysis.
Expert Guide: How to Calculate Correlation Between Two Variables in R
Correlation is one of the most common statistical tools used to measure how strongly two variables move together. In R, correlation is straightforward to calculate, but the quality of your result depends on choosing the correct method, checking assumptions, and interpreting the number carefully. If you are learning how to calculate correlation between two variables in R, the good news is that base R already includes everything you need. With a few lines of code, you can estimate the strength and direction of a relationship, handle missing values, and even run a formal correlation test.
At its core, a correlation coefficient tells you whether larger values of one variable tend to occur with larger values of another variable, with smaller values, or with no consistent pattern at all. Positive correlation means both variables tend to increase together. Negative correlation means one tends to decrease as the other increases. A value near zero means there is little or no monotonic or linear association, depending on the method used.
Quick answer: In R, the most direct command is cor(x, y). By default, this calculates the Pearson correlation coefficient. You can switch methods with method = "spearman" or method = "kendall".
1. The Basic R Syntax
Suppose you have two numeric vectors named x and y. The most basic calculation in R is:
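A minimal sketch of that call, assuming two hypothetical numeric vectors of equal length (the values are made up for illustration):

```r
# Hypothetical example data: two numeric vectors of equal length
x <- c(2, 4, 6, 8, 10, 12)
y <- c(1.5, 3.9, 6.1, 7.8, 10.4, 11.9)

# Pearson is the default method, so no method argument is needed
r <- cor(x, y)
print(r)
```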
This returns the Pearson correlation coefficient, which measures linear association. If your variables are continuous, their relationship is roughly linear, and the data are not dominated by extreme outliers, Pearson is usually the default choice.
To calculate other types of correlation in R, use:
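As a short sketch, again with made-up vectors (here one pair of y values is deliberately out of order so the rank-based coefficients are not exactly 1):

```r
# Hypothetical example data; the 2nd and 3rd y values are out of order
x <- c(2, 4, 6, 8, 10, 12)
y <- c(1.5, 3.9, 3.2, 7.8, 10.4, 11.9)

# Spearman rank correlation
rho <- cor(x, y, method = "spearman")

# Kendall rank correlation
tau <- cor(x, y, method = "kendall")

print(c(spearman = rho, kendall = tau))
```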
These alternatives are useful when the relationship is not strictly linear, when ranks matter more than actual distances, or when the data contain influential outliers.
2. Choosing Between Pearson, Spearman, and Kendall
Not all correlation methods measure the same thing. Pearson correlation focuses on linear relationships. Spearman correlation transforms the data to ranks, then measures how consistently the ranking of one variable matches the ranking of the other. Kendall correlation also works from ranks, but it is based on concordant and discordant pairs, making it a robust choice for ordinal data and smaller samples.
| Method | Best for | What it measures | Typical R syntax |
|---|---|---|---|
| Pearson | Continuous variables with linear association | Linear relationship | cor(x, y, method = "pearson") |
| Spearman | Ranked, skewed, or non-normal data | Monotonic relationship using ranks | cor(x, y, method = "spearman") |
| Kendall | Ordinal data or smaller samples | Association from ordered pairs | cor(x, y, method = "kendall") |
A practical rule is simple: use Pearson when linearity is plausible and the variables are measured on a continuous scale; use Spearman when the relationship may be curved but still consistently increasing or decreasing; use Kendall when you want a conservative rank-based estimate, especially with many ties or a modest sample size.
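To make the rank-based definitions concrete, the sketch below checks two facts on hypothetical data: Spearman's coefficient equals Pearson's coefficient computed on ranks, and (when there are no ties) Kendall's coefficient equals the number of concordant pairs minus discordant pairs, divided by the total number of pairs. The vectors and the helper name `idx` are illustrative assumptions.

```r
# Hypothetical data: increasing overall but strongly curved, with one swap
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(1, 4, 9, 15, 60, 30, 200)

# Spearman is Pearson applied to the ranks
spearman_builtin <- cor(x, y, method = "spearman")
spearman_by_rank <- cor(rank(x), rank(y))

# Kendall from concordant and discordant pairs (valid only without ties)
idx <- combn(length(x), 2)                  # every pair of observations
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
kendall_manual  <- sum(s) / ncol(idx)       # (concordant - discordant) / pairs
kendall_builtin <- cor(x, y, method = "kendall")

# Pearson on the raw values is pulled down by the curvature
pearson <- cor(x, y)

print(c(pearson = pearson, spearman = spearman_builtin, kendall = kendall_builtin))
```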
3. How to Read the Correlation Coefficient
The correlation coefficient always falls between -1 and 1. A value of 1 indicates a perfect positive relationship. A value of -1 indicates a perfect negative relationship. A value of 0 indicates no linear association (Pearson) or no monotonic association (Spearman, Kendall); it does not rule out other kinds of dependence.
- 0.90 to 1.00: very strong positive association
- 0.70 to 0.89: strong positive association
- 0.40 to 0.69: moderate positive association
- 0.10 to 0.39: weak positive association
- -0.09 to 0.09: little to no association
- -0.39 to -0.10: weak negative association
- -0.69 to -0.40: moderate negative association
- -0.89 to -0.70: strong negative association
- -1.00 to -0.90: very strong negative association
These cutoffs are only rough guidelines. In many fields, a smaller correlation can still be scientifically meaningful, especially when the measurements are noisy or the sample is large.
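If you want to apply these rough labels consistently, a small helper can do the lookup. The function name `describe_r` is hypothetical, and the sketch assumes the boundaries listed above, treating each band as starting at its lower bound (so 0.095, which falls between two listed bands, is labeled "little to no").

```r
# Hypothetical helper: map a correlation coefficient to the rough labels above
describe_r <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1, na.rm = TRUE))
  strength <- cut(
    abs(r),
    breaks = c(0, 0.10, 0.40, 0.70, 0.90, 1),
    labels = c("little to no", "weak", "moderate", "strong", "very strong"),
    right = FALSE,          # intervals are [lower, upper)
    include.lowest = TRUE   # so that |r| = 1 falls in the last interval
  )
  # No direction is reported when the association is negligible
  direction <- ifelse(abs(r) < 0.10, "", ifelse(r > 0, "positive ", "negative "))
  paste0(strength, " ", direction, "association")
}

print(describe_r(c(-0.8677, 0.05, 0.9955)))
```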
4. Running a Significance Test in R
If you want more than just the coefficient, R also provides cor.test(). This function gives you the estimated correlation, a p-value, and a confidence interval when available.
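A minimal sketch, using the built-in mtcars dataset as a stand-in for your own vectors:

```r
# Pearson correlation test between car weight and fuel efficiency
result <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
print(result)

# Individual components of the result object
result$estimate   # estimated correlation coefficient
result$p.value    # p-value for the null hypothesis of zero correlation
result$conf.int   # 95% confidence interval (reported for Pearson)
```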
A typical output includes:
- The test statistic
- The p-value
- The estimated correlation coefficient
- A confidence interval for Pearson correlation
This is useful when you are conducting inferential analysis rather than just descriptive analysis. For example, if your p-value is less than 0.05, you may conclude there is statistical evidence of association under the assumptions of the chosen method.
5. Real Dataset Examples in R
One of the easiest ways to understand correlation is to look at real values from well-known built-in R datasets. The examples below are often reproduced in teaching materials because they show clearly different strengths and directions of association.
| Dataset | Variables | Approximate Pearson r | Interpretation |
|---|---|---|---|
| mtcars | mpg vs wt | -0.8677 | Strong negative linear relationship. Heavier cars tend to have lower fuel efficiency. |
| mtcars | mpg vs hp | -0.7762 | Strong negative relationship. More horsepower is associated with lower mpg. |
| iris | Sepal.Length vs Petal.Length | 0.8718 | Strong positive relationship across all species combined. |
| women | height vs weight | 0.9955 | Near-perfect positive linear relationship in this small teaching dataset. |
You can reproduce these in R with commands such as:
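The sketch below uses only datasets that ship with base R; the vector name `r_values` is just an illustrative container for the four results.

```r
# Pearson correlations for the four dataset pairs in the table above
r_values <- c(
  mtcars_mpg_wt     = cor(mtcars$mpg, mtcars$wt),
  mtcars_mpg_hp     = cor(mtcars$mpg, mtcars$hp),
  iris_sepal_petal  = cor(iris$Sepal.Length, iris$Petal.Length),
  women_height_wt   = cor(women$height, women$weight)
)

print(round(r_values, 4))
```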
6. Handling Missing Values Correctly
One of the most common mistakes when learning how to calculate correlation between two variables in R is forgetting about missing values. If either variable contains NA, cor() returns NA by default (use = "everything") unless you specify how missing observations should be treated with the use argument.
A safe, common option for two vectors is to keep only the complete pairs:
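A short sketch with hypothetical vectors that each contain one missing value:

```r
# Hypothetical data with missing values in different positions
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 4, 6, NA, 10, 13)

# Default behavior: the result is NA because NAs propagate
r_default <- cor(x, y)

# Use only the pairs where both values are present
r_complete <- cor(x, y, use = "complete.obs")

# Number of pairs that actually contributed to the estimate
n_used <- sum(complete.cases(x, y))

print(c(default = r_default, complete = r_complete, n_used = n_used))
```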
For two vectors, use = "complete.obs" is usually the clearest choice; with only two vectors, "pairwise.complete.obs" gives the same result. Either setting tells R to use only the pairs where both values are present. This matters because mismatched missingness can change both your sample size and your estimate.
7. Correlation Matrix in R
If you have a data frame with multiple numeric variables, you do not need to calculate each pair separately. R can generate a full correlation matrix:
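A minimal sketch, using four numeric columns of mtcars as a stand-in for your own data frame (the column selection is an arbitrary example):

```r
# Select a few numeric columns
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Full Pearson correlation matrix; the use argument only matters if NAs exist
cor_matrix <- cor(vars, use = "pairwise.complete.obs")

print(round(cor_matrix, 2))
```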
This produces a square table of correlations between every pair of variables. It is especially useful in exploratory data analysis, feature selection, multicollinearity checks, and descriptive reporting.
8. Visualizing Correlation
You should never rely on the coefficient alone. A scatter plot can reveal nonlinearity, outliers, clusters, and subgroup patterns that a single number can hide. In R, a basic plot is easy:
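A minimal base-graphics sketch, again using mtcars as a stand-in; the fitted line is an optional addition to help judge linearity.

```r
# Scatter plot of car weight against fuel efficiency
plot(
  mtcars$wt, mtcars$mpg,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  main = "mpg vs wt",
  pch = 19
)

# Overlay a least-squares line as a visual guide
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)
```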
If the points follow a roughly straight upward or downward pattern, Pearson correlation may be appropriate. If the relationship is curved but still consistently ordered, Spearman may tell a more accurate story. If a few points dominate the line, you should investigate outliers before reporting the result.
9. Assumptions and Practical Warnings
Correlation is simple to compute, but interpretation can go wrong quickly if assumptions are ignored. Here are the main points to remember:
- Correlation does not imply causation. Two variables can be highly correlated because of a third factor.
- Pearson correlation assumes linearity. A curved relationship can produce a misleadingly low Pearson value.
- Outliers matter. A single unusual observation can heavily distort Pearson correlation.
- Restricted range reduces correlation. If your sample covers only a narrow range of values, the estimate may look weaker than the true relationship.
- Subgroups can reverse the story. Aggregated data can hide or reverse patterns seen within groups.
When in doubt, inspect the data visually, calculate more than one method if appropriate, and document exactly how missing values were handled.
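Two of these warnings are easy to demonstrate with small made-up datasets. The sketch below shows a perfectly determined but U-shaped relationship that yields a Pearson coefficient of essentially zero, and a single extreme point that turns a near-zero correlation into a very high one. All values are hypothetical.

```r
# 1. Curved relationship: y is fully determined by x, yet Pearson is ~0
x_curved <- -5:5
y_curved <- x_curved^2
r_curved <- cor(x_curved, y_curved)

# 2. Outlier: eight points with no clear pattern, then one extreme point
x_base <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_base <- c(5, 3, 6, 2, 7, 4, 5, 3)
r_without_outlier <- cor(x_base, y_base)
r_with_outlier    <- cor(c(x_base, 30), c(y_base, 40))

print(c(
  curved          = r_curved,
  without_outlier = r_without_outlier,
  with_outlier    = r_with_outlier
))
```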
10. Step-by-Step Workflow in R
- Load or create your numeric vectors.
- Check their length and inspect for missing values.
- Make a scatter plot to assess shape and outliers.
- Choose Pearson, Spearman, or Kendall based on the data structure.
- Run cor() for the coefficient.
- Run cor.test() if you need inference and a p-value.
- Report the coefficient, method, sample size, and any missing-data rule used.
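The steps above can be sketched end to end as follows. The example assumes mtcars as a stand-in dataset and Pearson as the chosen method; substitute your own vectors and method as needed.

```r
# 1. Load or create the numeric vectors
x <- mtcars$wt
y <- mtcars$mpg

# 2. Check lengths and count incomplete pairs
stopifnot(length(x) == length(y))
n_missing <- sum(!complete.cases(x, y))

# 3. Inspect the shape of the relationship
plot(x, y, xlab = "wt", ylab = "mpg")

# 4-5. Compute the coefficient with the chosen method and missing-data rule
r <- cor(x, y, method = "pearson", use = "complete.obs")

# 6. Run the significance test
test <- cor.test(x, y, method = "pearson")

# 7. Report coefficient, method, sample size, and missing-data rule
n <- sum(complete.cases(x, y))
cat(sprintf(
  "Pearson r = %.3f, p = %.3g, n = %d (complete pairs only; %d dropped)\n",
  r, test$p.value, n, n_missing
))
```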
11. Example Write-Up
Here is a clean reporting template you can adapt:
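One way to fill in such a write-up consistently is to build the sentence directly from the cor.test() output. The sketch below uses mtcars as a stand-in dataset; the sentence wording and the p-value formatting are assumptions for illustration, not a required reporting standard.

```r
# Run the test and collect the pieces of the report
test <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
n <- sum(complete.cases(mtcars$wt, mtcars$mpg))

direction <- if (test$estimate < 0) "negatively" else "positively"
p_text <- if (test$p.value < 0.001) "< .001" else sprintf("= %.3f", test$p.value)

# Assemble the sentence; %% prints a literal percent sign
report <- sprintf(
  "Car weight and fuel efficiency were %s correlated, Pearson r(%d) = %.2f, 95%% CI [%.2f, %.2f], p %s, n = %d.",
  direction,
  as.integer(test$parameter),   # degrees of freedom (n - 2)
  unname(test$estimate),
  test$conf.int[1],
  test$conf.int[2],
  p_text,
  n
)

cat(report, "\n")
```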
If you used Spearman or Kendall, replace the method name and coefficient symbol appropriately. The key is to be explicit about what was measured and how it was measured.
12. Helpful Official and Academic References
For readers who want trustworthy statistical guidance, these sources are excellent starting points:
- University of California, Berkeley: correlation concepts and interpretation
- NIST.gov: engineering statistics guidance on correlation and scatter plots
- Penn State University: correlation overview and practical examples
13. Final Takeaway
If you want to know how to calculate correlation between two variables in R, the essential answer is straightforward: use cor(x, y) for the coefficient and cor.test(x, y) for statistical testing. But expert use goes beyond syntax. You should pick the right method, inspect the data visually, handle missing values deliberately, and avoid overinterpreting a single number. When used carefully, correlation is one of the fastest and most useful tools for understanding relationships in data.
The calculator above helps you do exactly that. It computes the coefficient instantly, plots your paired values, and generates R code you can paste directly into your script or report. That means you can move from understanding the concept to applying it in real analysis with far fewer mistakes.