Calculate Mean of Two Variables in Stata

Enter two numeric variable series to compute row means, overall means, paired summaries, and a chart that mirrors the logic of using Stata to average two variables such as test scores, blood pressure measures, costs, or survey responses.

Stata equivalent for row means is commonly egen mean2 = rowmean(var1 var2). For separate variable means, use summarize var1 var2.

Expert Guide: How to Calculate Mean of Two Variables in Stata

When people search for how to calculate the mean of two variables in Stata, they are usually trying to do one of two things. First, they may want the average of each variable separately, such as the mean of income and the mean of expenses. Second, they may want a row-wise mean, which creates a new variable that averages two values on the same observation, such as two systolic blood pressure readings, pretest and posttest scales, or two ratings collected for the same person. The distinction matters because Stata uses different commands depending on the goal.

If your objective is to calculate one mean for each variable, the standard Stata approach is summarize var1 var2. If your goal is to create a new variable that equals the average of two variables for every row, the more practical command is egen newvar = rowmean(var1 var2). This page explains both approaches, shows when each is appropriate, and gives you a calculator that helps verify your numbers before you write the final Stata code.

What “mean of two variables” means in practice

In data analysis, the word mean can refer to several related but different operations. Understanding the precise operation prevents coding mistakes and helps you interpret your output correctly.

  • Separate means: You want the average of variable A and the average of variable B as two independent summary statistics.
  • Row mean: You want one new variable for each observation, computed as the average of A and B for that case.
  • Grand mean across all values: You combine the values from both variables and compute a single average over the full pooled set. This is less common, but sometimes used in quality checks.
  • Mean with missing handling: You need to decide whether missing values should cause a missing result or whether the available value should be used.

For example, imagine a student dataset with math_score and reading_score. If you run summarize math_score reading_score, Stata gives you a mean for each subject separately. If you run egen academic_avg = rowmean(math_score reading_score), Stata creates one average score per student. Those are not the same question, and they should not be interpreted the same way.
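The two questions can be put side by side in a short sketch. The four students and their scores below are invented for illustration; only the variable names math_score, reading_score, and academic_avg come from the example above.

```stata
* Hypothetical scores for four students (values invented for illustration)
clear
input math_score reading_score
78 84
65 70
92 88
55 61
end

* Question 1: one mean per subject (two numbers in total)
summarize math_score reading_score

* Question 2: one mean per student (a new variable with four values)
egen academic_avg = rowmean(math_score reading_score)
list math_score reading_score academic_avg
```

The summarize output has one line per subject, while the list output shows a separate average on every row.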

Fastest ways to do it in Stata

Below are the most common and correct approaches.

  1. Get the mean of each variable separately
    Use summarize var1 var2. This returns the number of observations, mean, standard deviation, minimum, and maximum for each variable.
  2. Create a row mean variable
    Use egen mean_two = rowmean(var1 var2). This is the safest way to average two variables across each row, especially when missing values are present.
  3. Generate a direct arithmetic average
    Use generate mean_two = (var1 + var2) / 2. This works when both variables are complete and you want strict arithmetic averaging. If one value is missing, the result becomes missing.
  4. Inspect the new variable
    Use summarize mean_two or tabstat mean_two, stats(mean sd min max n) to confirm your result.
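Important: generate mean_two = (var1 + var2) / 2 and egen mean_two = rowmean(var1 var2) can produce different results if either variable contains missing values. The generate version returns missing whenever either value is missing, while the egen version averages whatever nonmissing values are present, which is why it is often preferred for applied research.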
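A minimal sketch of that difference, using a made-up four-row dataset that covers every missing pattern. The names mean_strict and mean_avail are hypothetical labels chosen here to keep the two results apart.

```stata
* Toy data: complete row, var2 missing, var1 missing, both missing
clear
input var1 var2
10 12
20 .
.  33
.  .
end

* Direct formula: any missing input makes the result missing
generate mean_strict = (var1 + var2) / 2

* rowmean(): averages the nonmissing values in each row
egen mean_avail = rowmean(var1 var2)

* mean_strict is 11, ., ., .
* mean_avail  is 11, 20, 33, .
list var1 var2 mean_strict mean_avail
```

Only the first row agrees across the two columns, and only the last row is missing under both rules.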

Worked example with paired observations

Suppose a health researcher records two related measures for each patient, morning blood pressure and evening blood pressure. They want a daily midpoint value for each patient. In Stata, this is a classic row mean task. The logic looks like this:

  1. Each patient has one value in variable 1 and one value in variable 2.
  2. The two values belong to the same row, so they should be averaged within observation.
  3. The output should be a new variable, not just two separate descriptive means.

Observation | Variable 1 | Variable 2 | Row Mean
1           | 10         | 12         | 11.0
2           | 20         | 18         | 19.0
3           | 30         | 33         | 31.5
4           | 40         | 39         | 39.5
5           | 50         | 60         | 55.0

From this small example, the mean of variable 1 is 30.0 and the mean of variable 2 is 32.4. The mean of the row means is 31.2. When every row has both values, as here, the mean of the row means equals the simple average of the two variable means: (30.0 + 32.4) / 2 = 31.2. This gives you a sanity check on your Stata output. If your calculator and your Stata commands disagree, the most likely cause is missing values, misaligned observations, or stray text characters in the source data.
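The same check can be reproduced in Stata with the five observations from the table. This is a sketch: the scalar names m1, m2, and mrow and the variable name row_mean are arbitrary choices, not part of any required syntax.

```stata
* The five paired observations from the worked example
clear
input var1 var2
10 12
20 18
30 33
40 39
50 60
end

egen row_mean = rowmean(var1 var2)

* summarize stores the mean in r(mean); copy it before the next call overwrites it
quietly summarize var1
scalar m1 = r(mean)
quietly summarize var2
scalar m2 = r(mean)
quietly summarize row_mean
scalar mrow = r(mean)

* Expect 30, 32.4, 31.2, and again 31.2
display "mean of var1           = " m1
display "mean of var2           = " m2
display "mean of row means      = " mrow
display "average of m1 and m2   = " (m1 + m2) / 2
```

The last two lines match only because no row has a missing value; with incomplete rows they can differ.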

Using Stata commands correctly

The simplest syntax patterns are easy to remember.

  • summarize var1 var2 for separate means.
  • egen mean_two = rowmean(var1 var2) for a row-wise mean.
  • generate mean_two = (var1 + var2)/2 when you want strict pair averaging and complete cases only.
  • list var1 var2 mean_two in 1/10 to inspect the first ten rows.

In production work, many analysts favor egen rowmean() because it is safer when real data are messy. If a student has a math score but a missing reading score, a direct arithmetic formula returns missing for the average. By contrast, rowmean() calculates the mean of the available nonmissing values within the row and returns missing only when both values are missing. That behavior is often desirable in survey work, psychometrics, and panel data cleaning, but it should always be documented in your methods section.

Comparison of common Stata approaches

Method                             | Typical Stata Command           | What It Returns                      | Best Use Case
Separate means                     | summarize var1 var2             | One mean for each variable           | Descriptive statistics and quick overview
Row mean, flexible missing handling | egen avg2 = rowmean(var1 var2)  | New variable with one mean per row   | Composite indicators, paired measures
Row mean, strict arithmetic        | generate avg2 = (var1 + var2)/2 | New variable with one mean per row   | Clean data where both values must exist
Post-check summary                 | summarize avg2                  | Mean, SD, min, max, N of new variable | Quality control and reporting

How missing values affect the mean

Missing values are the most common reason analysts think Stata is calculating the wrong mean. In reality, Stata is doing exactly what the command requests. The issue is usually command choice. Consider this distinction:

  • If you use generate avg = (a + b)/2 and either a or b is missing, the result is missing.
  • If you use egen avg = rowmean(a b), Stata averages the nonmissing values in the row, so a single available value is returned as is. The result is missing only when both a and b are missing.

That difference can change sample size, the reported mean of your new variable, and downstream regression results. For example, with survey responses to two related items, the generate formula sets the average to missing for every respondent who skipped either item, while rowmean() still returns a value for respondents who answered only one. This is one reason analysts often prefer egen for index construction.
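One way to see how much the choice matters in a given dataset is to count the usable rows under each rule and record how many items each respondent answered. The sketch below assumes two hypothetical survey items, item1 and item2, with invented responses; n_answered, idx_avail, and idx_strict are illustrative names.

```stata
* Hypothetical responses to two survey items (values invented for illustration)
clear
input item1 item2
4 5
3 .
. 2
5 5
. .
end

* rownonmiss() counts the nonmissing values in each row
egen n_answered = rownonmiss(item1 item2)

egen idx_avail = rowmean(item1 item2)
generate idx_strict = (item1 + item2) / 2

* Rows kept by the available-values rule (4 in this toy data)
count if !missing(idx_avail)

* Rows kept by the strict rule (2 in this toy data)
count if !missing(idx_strict)

* Distribution of answered items, useful when documenting the missing-data rule
tabulate n_answered
```

Reporting the tabulate result alongside the index makes it clear how many averages rest on a single item.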

When to use row means in applied research

Calculating the mean of two variables in Stata is common across many fields:

  • Public health: averaging repeated blood pressure or symptom measures.
  • Education: combining math and reading indicators into a broad performance score.
  • Economics: averaging two quarterly values into a simplified annual midpoint.
  • Psychology: averaging two related scale items before building a larger index.
  • Operations: averaging forecast and actual values to compare midpoint bias.

In all of these contexts, the key requirement is that the two variables represent values that can legitimately be averaged. You should not average variables on incompatible scales without transforming them first. For example, averaging age and income makes no substantive sense because the units are different. However, averaging two Likert items that measure the same construct often does make sense, especially when internal consistency is acceptable.
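If two measures of the same construct are recorded on different scales, one common transformation is to standardize each variable before averaging. The sketch below is an illustration under assumptions, not a recommendation for any particular study: stress_5pt and stress_100 are hypothetical variables with invented values, and z-scoring is only one of several possible transformations.

```stata
* Hypothetical: the same construct measured on a 1-5 scale and a 0-100 scale
clear
input stress_5pt stress_100
2 35
4 80
3 55
5 90
1 20
end

* egen's std() rescales a variable to mean 0 and standard deviation 1
egen z_5pt = std(stress_5pt)
egen z_100 = std(stress_100)

* The z-scores share a common unit, so a row mean weights both measures equally
egen stress_avg = rowmean(z_5pt z_100)

summarize z_5pt z_100 stress_avg
```

Averaging the raw columns instead would let the 0-100 variable dominate the result simply because its numbers are larger.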

Practical validation strategy

A strong workflow is to validate your Stata result in three steps:

  1. Inspect a small number of rows manually and compute the arithmetic mean yourself.
  2. Use a calculator like the one above to confirm row-by-row averages and overall means.
  3. Run the Stata command and compare the first few observations with list.
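The steps above can be sketched as a short do-file fragment. The three rows reuse the first observations of the worked example, and the id variable is a hypothetical row identifier added here so that each check names the row it refers to.

```stata
* First three rows of the worked example, with a hypothetical id column
clear
input id var1 var2
1 10 12
2 20 18
3 30 33
end

egen mean_two = rowmean(var1 var2)

* Compare the stored values with the hand-computed means 11, 19, and 31.5
list id var1 var2 mean_two in 1/3

* assert stops with an error if a stored value differs from the manual result
assert mean_two == 11   if id == 1
assert mean_two == 19   if id == 2
assert mean_two == 31.5 if id == 3

* On complete rows, rowmean() should agree with the direct formula
assert abs(mean_two - (var1 + var2) / 2) < 1e-6 if !missing(var1, var2)
```

Because assert halts execution on failure, leaving these lines in a do-file turns the manual check into one that runs every time the data are rebuilt.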

This process catches pairing errors quickly. For instance, if your imported CSV shifted one column or sorted one variable independently of the other, your row means will look numerically plausible but substantively wrong. The calculator on this page is especially helpful because it visualizes the paired rows and instantly shows whether the pattern of averages matches your expectation.

Example interpretation of summary statistics

Suppose your output shows the following descriptive summary: variable 1 mean = 30.0, variable 2 mean = 32.4, and mean of the row means = 31.2. This tells you that the second variable runs slightly higher than the first on average. If your row means range from 11.0 to 55.0, the paired averages vary widely across observations, so the overall figure of 31.2 summarizes a broad spread and not a tight cluster. In a research report, you might write that the average combined score across the two measures was 31.2, with the second measure slightly exceeding the first on average.

Always remember that a mean is sensitive to extreme values. If one observation contains a large outlier, the mean of the row means is pulled toward it. In Stata, it is often smart to follow your mean calculations with a quick graph or distribution check. Even a simple histogram or box plot can reveal whether the mean reflects the center of the data well.
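A quick distribution check might look like the sketch below. The data are the worked example with the last value of var2 deliberately changed from 60 to 600 to create an outlier, so the numbers are illustrative only.

```stata
* Worked-example data with one artificial outlier (600 in place of 60)
clear
input var1 var2
10 12
20 18
30 33
40 39
50 600
end

egen mean_two = rowmean(var1 var2)

* The detail option adds percentiles, so the mean can be compared with the median
summarize mean_two, detail

* Mean is 85.2 but the median is 31.5: one row is driving the mean
display "mean = " r(mean) "   median = " r(p50)

* Visual checks of the same distribution
histogram mean_two
graph box mean_two
```

A large gap between r(mean) and r(p50), as here, is a signal to inspect the rows behind it before reporting the mean.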

Common mistakes to avoid

  • Using summarize when you really need a new row-wise average variable.
  • Averaging variables that are not measured on comparable scales.
  • Ignoring missing values and assuming generate and egen rowmean() behave the same way.
  • Sorting one variable differently from the other before averaging.
  • Reporting the mean without the number of observations or missing-data rule.

Bottom line

To calculate mean of two variables in Stata, start by deciding whether you want separate descriptive means or a row-wise average. Use summarize for separate means, and use egen rowmean() when you need a new variable that averages the two measures for each observation. If you have complete, perfectly paired data and want strict arithmetic behavior, generate with a direct formula also works. The calculator above gives you a quick validation step, including row means, summary outputs, and a chart, so you can move into Stata with confidence and fewer avoidable errors.
