How to Calculate a Correlation Coefficient Between Two Dichotomous Variables

Use this interactive phi coefficient calculator to measure the strength and direction of association between two binary variables, such as yes/no, pass/fail, exposed/not exposed, or purchased/did not purchase.

Understanding how to calculate a correlation coefficient between two dichotomous variables

When both variables are dichotomous, meaning each one has only two possible categories, the most common correlation-style statistic is the phi coefficient. A dichotomous variable might be coded as yes/no, success/failure, smoker/non-smoker, exposed/not exposed, male/female, treatment/control, or purchased/did not purchase. Phi is numerically identical to the Pearson correlation computed on the two variables coded 0 and 1, but in practice you rarely work from the raw codes. Instead, you summarize the data in a 2 x 2 contingency table and compute phi from the four cell counts.

The phi coefficient tells you two things: direction and strength. A positive phi suggests that the presence of one category is associated with the presence of the other category more often than expected under independence. A negative phi suggests an inverse pattern: when one variable is in its first category, the other tends to be in its second category. A phi near zero indicates little or no linear association in the 2 x 2 table.

This is especially useful in medical research, behavioral science, education, survey analysis, public health, and marketing analytics. For example, you might ask whether vaccination status is associated with disease status, whether ad exposure is associated with purchase behavior, or whether passing an exam is associated with attendance status. In each case, both variables have two categories, so a phi coefficient is a natural summary measure.

The 2 x 2 table structure

To calculate the association between two dichotomous variables, start by arranging the observed counts in a 2 x 2 table:

	Variable Y = 1	Variable Y = 0	Row total
Variable X = 1	a	b	a + b
Variable X = 0	c	d	c + d
Column total	a + c	b + d	a + b + c + d

Here, a is the count for cases where both variables are in their first category, b is where X is in the first category and Y is in the second, c is where X is in the second category and Y is in the first, and d is where both are in their second category. Once those four counts are known, the phi coefficient can be computed directly.
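If the data arrive as raw paired observations rather than as a finished table, the four counts have to be tallied first. The following is a minimal sketch, assuming each variable is already coded 1 for its first category and 0 for its second; the function name `tabulate_2x2` and the sample data are illustrative, not part of the calculator.

```python
from collections import Counter

def tabulate_2x2(x, y):
    """Count the four cells of a 2 x 2 table from paired 0/1 observations.

    Returns (a, b, c, d) using the layout of the table above:
    a: X=1, Y=1   b: X=1, Y=0
    c: X=0, Y=1   d: X=0, Y=0
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    counts = Counter(zip(x, y))
    unexpected = set(counts) - {(1, 1), (1, 0), (0, 1), (0, 0)}
    if unexpected:
        raise ValueError(f"values must be coded 0 or 1, got {unexpected}")
    # Counter returns 0 for combinations that never occur.
    return counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]

# Hypothetical data: 8 paired observations.
x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0]
a, b, c, d = tabulate_2x2(x, y)
print(a, b, c, d)  # 3 1 1 3
```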

The phi coefficient formula

The formula is:

phi = (a x d - b x c) / sqrt((a + b)(c + d)(a + c)(b + d))

The numerator compares the products of the two diagonals. If the main diagonal product, a x d, is larger than the off-diagonal product, b x c, the relationship is positive. If the reverse is true, the relationship is negative. The denominator standardizes the result so that the coefficient always lies between -1 and +1, like other correlation coefficients. It reaches exactly -1 or +1 only when every observation falls on a single diagonal.
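The formula translates almost line for line into code. The sketch below is one possible implementation, with `phi_coefficient` as an illustrative name; it raises an error when a row or column total is zero, since the denominator is then zero and phi is undefined.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return numerator / denominator

print(phi_coefficient(20, 0, 0, 20))    # 1.0: everything on the main diagonal
print(phi_coefficient(0, 20, 20, 0))    # -1.0: everything on the off diagonal
print(phi_coefficient(10, 10, 10, 10))  # 0.0: no association
```

The three calls show the boundary cases: the extremes occur only when one diagonal is empty, and equal cell counts give zero.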

Step by step example

Suppose a researcher wants to study whether attending a prep course is associated with passing a certification exam. The observed data are:

	Passed	Failed	Total
Attended course	35	15	50
Did not attend	10	40	50
Total	45	55	100

In this example:

a = 35
b = 15
c = 10
d = 40

Plug the numbers into the formula:

Compute the numerator: (35 x 40) - (15 x 10) = 1400 - 150 = 1250
Compute the denominator parts:
- (a+b) = 50
- (c+d) = 50
- (a+c) = 45
- (b+d) = 55
Multiply those terms: 50 x 50 x 45 x 55 = 6,187,500
Take the square root: sqrt(6,187,500) ≈ 2487.469
Divide numerator by denominator: 1250 / 2487.469 ≈ 0.503

The phi coefficient is approximately 0.503. By commonly used cut-offs this is a large positive association: 70% of those who attended the prep course passed (35 of 50), compared with 20% of those who did not (10 of 50).
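The same arithmetic can be reproduced in a few lines, which is a convenient way to check a hand calculation. The sketch below repeats each step for the prep-course table and then, as a cross-check, expands the table into 100 paired 0/1 records and computes an ordinary Pearson correlation on them. The helper `pearson_r` and the 1 = attended / 1 = passed coding are assumptions made for illustration.

```python
import math

a, b, c, d = 35, 15, 10, 40  # counts from the prep-course table

numerator = a * d - b * c                        # 1400 - 150
product = (a + b) * (c + d) * (a + c) * (b + d)  # 50 * 50 * 45 * 55
denominator = math.sqrt(product)
phi = numerator / denominator

# Cross-check: rebuild the 100 individual records and correlate them.
x = [1] * (a + b) + [0] * (c + d)          # 1 = attended, 0 = did not attend
y = [1] * a + [0] * b + [1] * c + [0] * d  # 1 = passed, 0 = failed

def pearson_r(u, v):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    var_u = sum((ui - mean_u) ** 2 for ui in u)
    var_v = sum((vi - mean_v) ** 2 for vi in v)
    return cov / math.sqrt(var_u * var_v)

print(numerator, product)                    # 1250 6187500
print(round(denominator, 3), round(phi, 3))  # 2487.469 0.503
print(round(pearson_r(x, y), 3))             # 0.503
```

The two routes agree because phi is the Pearson correlation of the 0/1 codes; the table formula is simply a shortcut.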

How to interpret the coefficient

Interpretation should be tied to both context and effect size. A coefficient of 0.10 may be meaningful in large population studies, while 0.30 or 0.50 may be considered substantial in applied social science. There is no universal rule, but common conventions can help.

Absolute phi value	Common interpretation	Practical meaning
0.00 to 0.09	Negligible	Very little observable relationship
0.10 to 0.29	Small	Weak but potentially meaningful association
0.30 to 0.49	Medium	Noticeable relationship
0.50 and above	Large	Strong practical association

The sign also matters. A positive coefficient means the categories align in the same direction based on your coding. A negative coefficient means the categories tend to oppose each other. Because coding choices affect the sign, always state clearly how the categories were defined.
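The convention table can be encoded as a small lookup so that every result is labeled consistently. This is a sketch under stated assumptions: the function name `interpret_phi` is illustrative, the cut-offs are the conventional ones from the table rather than universal rules, and values between 0.09 and 0.10 are treated as negligible.

```python
def interpret_phi(phi):
    """Label phi by magnitude (conventional cut-offs) and direction."""
    if not -1.0 <= phi <= 1.0:
        raise ValueError("phi must lie between -1 and +1")
    size = abs(phi)
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        strength = "small"
    elif size < 0.50:
        strength = "medium"
    else:
        strength = "large"
    direction = "positive" if phi > 0 else "negative"
    return f"{strength} {direction}"

print(interpret_phi(0.503))  # large positive
print(interpret_phi(-0.18))  # small negative
print(interpret_phi(0.05))   # negligible
```

The direction label is only meaningful once the category coding has been stated, for the reason given above.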

Real-world examples with interpreted statistics

Example 1: Smoking status and chronic cough

Imagine a sample of 200 adults. Of 90 smokers, 54 report chronic cough and 36 do not. Of 110 non-smokers, 22 report chronic cough and 88 do not. This produces a positive phi coefficient because cough is more common among smokers (60%) than among non-smokers (20%). Working through the formula gives phi ≈ 0.41, a medium association by common conventions. In public health, even a modest association can be important because the exposure may affect a large number of people.

Example 2: Website ad exposure and purchase

Suppose 500 visitors are tracked. Among 220 visitors who saw an ad, 66 make a purchase and 154 do not. Among 280 who did not see the ad, 42 purchase and 238 do not. The resulting phi is positive because conversion is more common among exposed users (30%) than among unexposed users (15%). The formula gives phi ≈ 0.18, a small association by common conventions. In a commercial setting, a phi that seems numerically small can still be financially valuable if it scales across large traffic volume.
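Both examples can be checked by feeding their counts into the phi formula. The sketch below assumes the rows are the exposure groups (smoker or saw the ad first) and the columns are the outcomes (cough or purchase first); `phi_coefficient` is an illustrative helper name.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Example 1: rows = smoker / non-smoker, columns = cough / no cough
phi_smoking = phi_coefficient(54, 36, 22, 88)

# Example 2: rows = saw ad / did not see ad, columns = purchased / did not purchase
phi_ads = phi_coefficient(66, 154, 42, 238)

print(round(phi_smoking, 3))  # 0.41
print(round(phi_ads, 3))      # 0.181
```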

Relationship between phi and chi-square

The phi coefficient is closely related to the chi-square test for independence in a 2 x 2 table. Specifically:

|phi| = sqrt(chi-square / n)

where n is the total sample size and chi-square is the Pearson statistic computed without a continuity correction. The square root recovers only the magnitude; the sign of phi comes from the numerator, a x d - b x c. This connection matters because researchers often want both an effect size and a significance test. The chi-square test answers whether the variables appear statistically independent, while phi answers how strong the association is. Statistical significance alone does not tell you whether the relationship is practically important. Effect size helps fill that gap.
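The relationship can be verified numerically on the prep-course counts. The sketch below computes the Pearson chi-square from observed and expected counts with no continuity correction, then compares sqrt(chi-square / n) with the magnitude of phi from the cell formula. The function name `chi_square_2x2` is illustrative. The p-value line relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper tail equals erfc(sqrt(chi-square / 2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table, without continuity correction."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

a, b, c, d = 35, 15, 10, 40
n = a + b + c + d
chi2 = chi_square_2x2(a, b, c, d)
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom

print(round(chi2, 3))                 # 25.253
print(round(math.sqrt(chi2 / n), 3))  # 0.503
print(round(abs(phi), 3))             # 0.503
print(p_value < 0.001)                # True
```

A continuity-corrected chi-square would give a slightly smaller statistic and would no longer match phi exactly.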

Important assumptions and cautions

Both variables should be dichotomous. If one or both variables have more than two categories, phi is no longer the best summary measure.
Counts should represent independent observations. Repeated measurements on the same individual may require different methods.
Watch sparse cells. Very small counts can make results unstable and can affect inferential procedures.
Coding changes sign. If you reverse the coding of one variable, the magnitude stays the same but the sign flips. Reversing the coding of both variables leaves phi unchanged.
Association is not causation. A positive phi does not prove that one binary variable causes the other.
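The effect of coding on the sign is easy to demonstrate. In the sketch below, recoding one variable corresponds to swapping the two rows of the table, and recoding both corresponds to swapping rows and columns; the counts are the prep-course example and `phi_coefficient` is an illustrative helper name.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

original = phi_coefficient(35, 15, 10, 40)

# Recode X only: the rows trade places, so the cells become c, d, a, b.
rows_swapped = phi_coefficient(10, 40, 35, 15)

# Recode both X and Y: rows and columns trade places, so the cells become d, c, b, a.
both_swapped = phi_coefficient(40, 10, 15, 35)

print(round(original, 3))      # 0.503
print(round(rows_swapped, 3))  # -0.503
print(round(both_swapped, 3))  # 0.503
```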

Common mistakes when calculating correlation for binary variables

Using percentages instead of counts. The formula uses raw cell frequencies. Row or column percentages rescale each row or column separately and will generally change the result.
Entering the wrong cell arrangement. Be consistent about which category is assigned to rows and columns.
Ignoring zero marginals. If any row or column total is zero, the denominator is zero and phi is undefined.
Overinterpreting small values. Context, sample size, and study design matter.
Confusing phi with odds ratio. Both use a 2 x 2 table, but they answer different questions.
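Some of these mistakes can be caught before any arithmetic happens. The sketch below is one possible guard, with `safe_phi` as an illustrative name: it rejects negative or fractional cells, which usually signal proportions entered in place of counts, and returns None when a zero row or column total makes phi undefined. It cannot detect whole-number percentages or a wrong cell arrangement, so those still need a manual check.

```python
import math

def safe_phi(a, b, c, d):
    """Return phi, or None when it is undefined; reject inputs that are not counts."""
    for value in (a, b, c, d):
        if value < 0 or value != int(value):
            raise ValueError("cells must be non-negative whole-number counts")
    marginals = (a + b, c + d, a + c, b + d)
    if 0 in marginals:
        return None  # a zero row or column total makes the denominator zero
    return (a * d - b * c) / math.sqrt(math.prod(marginals))

print(safe_phi(35, 15, 10, 40))  # about 0.503
print(safe_phi(35, 15, 0, 0))    # None: the second row is empty
```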

Phi coefficient versus other related measures

It helps to know when phi is appropriate and when another statistic may be better:

Phi coefficient: Best when both variables are dichotomous.
Point-biserial correlation: Used when one variable is dichotomous and the other is continuous.
Tetrachoric correlation: Used when both observed dichotomies are believed to come from underlying continuous variables cut at thresholds.
Cramér’s V: Used for larger contingency tables with more than two categories; in a 2 x 2 table it equals the absolute value of phi.
Odds ratio: Common in epidemiology and logistic modeling for comparing odds between groups.
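To see that phi and the odds ratio answer different questions, it helps to compute both on the same table and then change only the group sizes. The sketch below uses the prep-course counts and a hypothetical variant in which ten times as many attendees were sampled with the same pass rate; the variable names are illustrative.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def odds_ratio(a, b, c, d):
    """Odds of the first column in row 1 divided by the same odds in row 2."""
    return (a * d) / (b * c)

a, b, c, d = 35, 15, 10, 40
phi_1 = phi_coefficient(a, b, c, d)
or_1 = odds_ratio(a, b, c, d)

# Hypothetical variant: row 1 is ten times larger, pass rate unchanged at 70%.
phi_2 = phi_coefficient(10 * a, 10 * b, c, d)
or_2 = odds_ratio(10 * a, 10 * b, c, d)

print(round(phi_1, 3), round(or_1, 3))  # 0.503 9.333
print(round(phi_2, 3), round(or_2, 3))  # 0.302 9.333
```

The odds ratio is unchanged because it depends only on the within-row odds, while phi drops because it also depends on how evenly the observations are spread across the marginal totals.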

How to report the result in academic or business writing

A clear reporting format might look like this: “Attendance at the prep course was positively associated with passing the exam, phi = 0.503, indicating a large relationship by conventional cut-offs.” If you also perform a chi-square test, report both the significance and the effect size, for example: chi-square(1, N = 100) = 25.25, p < .001, phi = 0.503. In business reporting, you might say that ad exposure had a positive association with purchase behavior, with a small but meaningful effect size.

Why calculators help

Manual calculation is useful for understanding the method, but calculators reduce entry errors and make it easier to test alternative category definitions. This tool automatically computes the denominator, total sample size, and a readable interpretation. It also displays the four cells visually so you can inspect whether the pattern is being driven by strong agreement along one diagonal or by mismatch across categories.

Bottom line

To calculate a correlation coefficient between two dichotomous variables, place the observed frequencies in a 2 x 2 table and compute the phi coefficient. The value summarizes how strongly the two binary variables move together and in which direction. It is easy to calculate, easy to interpret, and tightly connected to the chi-square framework used in categorical data analysis. If your variables are truly binary and your goal is to quantify association, phi is usually the right starting point.
