Calculating Median of a Variable in Stata
This guide explains how median calculation works in Stata, when to use it, and how to report it correctly in research.
What does it mean to calculate the median of a variable in Stata?
The median is the middle value of a numeric variable after the observations are sorted from smallest to largest. If there is an odd number of valid observations, the median is the single middle value. If there is an even number, the median is the average of the two middle values. In Stata, this matters because many real-world datasets are skewed, include outliers, or contain values that make the mean less representative than the median. Income, medical spending, home prices, emergency room wait times, and survey response amounts are all classic examples where the median often gives a more realistic summary of the typical case.
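As a quick check of the odd and even cases, the sketch below types two small made-up lists into Stata and reads the median back from `r(p50)`, which `summarize` stores when `detail` is specified. The variable name `x` and the values are illustrative only.

```stata
* Odd number of observations: the median is the single middle value
clear
input x
1
3
7
end
quietly summarize x, detail
display r(p50)

* Even number of observations: the median averages the two middle values
clear
input x
1
3
7
9
end
quietly summarize x, detail
display r(p50)
```

The first `display` should show 3 (the middle of 1, 3, 7) and the second should show 5 (the average of 3 and 7).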
When analysts say they want to calculate the median of a variable in Stata, they usually mean one of three things. First, they may want a quick descriptive statistic for a variable across the full sample. Second, they may want the median within groups such as sex, region, year, or treatment status. Third, they may want to store the median in a new dataset, scalar, or table for later reporting. Stata supports all three workflows, but the command you choose depends on whether you want a one-time display, grouped summaries, or programmable output.
Fastest ways to compute a median in Stata
1. Using summarize with the detail option
The fastest method for a single variable is:
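A minimal form, using `price` from the `auto` example dataset that ships with Stata as a stand-in for your own variable:

```stata
* Load the example dataset and request detailed statistics
sysuse auto, clear
summarize price, detail
```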
This command displays a detailed distribution table that includes percentiles, the four smallest and four largest values, the mean, variance, standard deviation, skewness, and kurtosis. The median appears on the row labeled 50%. This is often the best option for exploratory analysis because it gives you more context than a single statistic.
2. Using tabstat for compact tables
If you want a cleaner summary table with just the statistics you need, use tabstat:
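A minimal sketch on the `auto` example dataset, where `price` and `mpg` stand in for your own variables:

```stata
sysuse auto, clear

* Median only
tabstat price, statistics(median)

* Several variables and statistics in one table
tabstat price mpg, statistics(n mean median)
```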
This approach is useful in reports because it keeps the output compact and easy to interpret. You can also extend it to multiple variables in the same command.
3. Using centile when you want exact percentiles
The median is the 50th percentile, so you can also calculate it with:
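A minimal sketch, again using `price` from the `auto` example dataset as a placeholder variable:

```stata
sysuse auto, clear

* Median only
centile price, centile(50)

* Quartiles and median together
centile price, centile(25 50 75)
```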
This command is especially useful when you want several percentiles at once, such as the 25th, 50th, and 75th percentiles for an interquartile range summary. It also reports a confidence interval for each requested percentile.
How Stata handles missing values when calculating the median
By default, Stata excludes missing values from numeric summaries, which is usually what you want. The system missing value (.) and the extended missing values (.a through .z) are not treated as data points, so the sample size for the median is the count of nonmissing observations only. A common slip is to compare a median computed on the reduced, nonmissing sample with a mean computed on a different set of observations. The safest practice is to confirm the number of observations reported in the summary output each time.
If you need to compute the median only for a subset, add an if condition:
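A minimal sketch with a small made-up dataset; the `income` and `region` values are illustrative only:

```stata
* Toy data: five observations in two regions (values are invented)
clear
input income region
32000 1
41000 2
36000 2
48000 2
39000 1
end

* Detailed statistics for region 2 only
summarize income if region == 2, detail
```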
That command gives the median of income only for observations where region == 2. This is cleaner than generating a temporary subset dataset and helps your workflow stay reproducible.
Grouped medians in Stata
Research often requires medians by category rather than one median for the whole sample. For example, you might want median wages by industry, median age by state, or median test scores by school type. In Stata, grouped medians are commonly generated with tabstat, table, egen combined with bysort, or collapse.
tabstat by groups
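A minimal sketch, assuming a numeric `income` variable and a numeric `sex` indicator; the toy values are invented for illustration:

```stata
* Toy data: income by sex (values are invented)
clear
input income sex
32000 0
41000 0
36000 1
48000 1
39000 1
end

* One median per category of sex
tabstat income, statistics(median) by(sex)
```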
This displays separate medians for each category of sex. It is efficient when you want a quick comparison across groups.
collapse when building a summary dataset
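A minimal sketch, assuming variables named `income`, `region`, and `year`; the toy values are invented, and `preserve` and `restore` are added here so the full data come back afterward:

```stata
* Toy data: income by region and year (values are invented)
clear
input income region year
30000 1 2020
34000 1 2020
50000 1 2021
41000 2 2020
45000 2 2021
47000 2 2021
end

* Keep a copy of the full data, collapse to group medians, then restore
preserve
collapse (median) income, by(region year)
list
restore
```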
This command replaces the active data in memory with a collapsed dataset containing one observation for each region and year combination, plus the median income for that group. It is ideal for building publication tables or graphs, but remember that collapse changes the dataset in memory, so save your original data first or wrap the step in preserve and restore.
egen for within group medians
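A minimal sketch, assuming variables named `hospital` and `cost`; the new variable name `med_cost` and the toy values are illustrative only:

```stata
* Toy data: cost by hospital (values are invented)
clear
input hospital cost
1 100
1 300
1 200
2 900
2 500
end

* Attach each hospital's median cost to every one of its observations
bysort hospital: egen med_cost = median(cost)
list, sepby(hospital)
```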
This creates a new variable where each observation within the same hospital receives the group median of cost. It is very useful for later modeling, normalization, or diagnostics.
Why the median is often better than the mean
The median is resistant to outliers. Suppose one respondent reports annual income of 10,000,000 while the rest of the sample ranges between 30,000 and 90,000. The mean can jump upward sharply, but the median changes very little because it depends on rank order rather than total magnitude. In applied work, this makes the median especially valuable in economics, epidemiology, education, policy analysis, and public administration.
| Dataset example | Mean | Median | Interpretation |
|---|---|---|---|
| Sample incomes: 32,000; 36,000; 39,000; 41,000; 44,000; 48,000; 950,000 | 170,000 | 41,000 | The mean is dominated by one extreme income, while the median reflects the typical observation. |
| Classic Stata auto dataset, mpg | 21.30 | 20.00 | The median is close to the center of the ordered fuel economy values and is less influenced by the highest mileage cars. |
| Classic Stata auto dataset, price | 6,165.26 | 5,006.50 | Price is right skewed, so the median gives a lower and often more representative central value than the mean. |
The table above shows a key practical point: when a variable is skewed, the median can differ substantially from the mean. This is why many government and public health reports prefer median household income, median home value, or median wait time instead of relying on the arithmetic mean alone.
Interpreting median output from summarize, detail
After running summarize varname, detail, Stata shows percentiles in the left portion of the output. The line labeled 50% is your median. The same output also gives the 25th percentile and 75th percentile, which together help you report the interquartile range. A strong reporting pattern is:
- Median
- 25th percentile
- 75th percentile
- Number of nonmissing observations
That format is common in biomedical, social science, and policy papers because it provides a robust summary of both center and spread.
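One way to assemble these four numbers is to read them from the results that `summarize` stores when `detail` is specified. The sketch below uses `price` from the `auto` example dataset as a placeholder variable.

```stata
sysuse auto, clear
quietly summarize price, detail

* r(p50), r(p25), r(p75), and r(N) are stored by summarize, detail
display "Median = " r(p50) ", IQR: " r(p25) " to " r(p75) ", n = " r(N)
```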
Weighted medians and advanced situations
Many analysts eventually ask whether Stata can compute a weighted median. The answer depends on the command and the type of weight. summarize with detail, tabstat, and collapse accept analytic and frequency weights, whereas centile and egen accept no weights at all. If you are working with survey data, you may need to use survey design aware procedures or community contributed commands depending on your exact objective. Before applying weights, confirm whether you need a design based median, a replicate weight approach, or a simple weighted percentile. The concept is statistical, not just computational, so your method should match the data collection design.
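As a simple illustration of the frequency weight case only, the sketch below uses a made-up table in which each row stands for `freq` identical observations. The variable names and values are hypothetical, and this is not a substitute for a design based survey estimate.

```stata
* Toy frequency table: 5 people at 30000, 3 at 40000, 1 at 90000
clear
input income freq
30000 5
40000 3
90000 1
end

* Frequency weights expand each row to freq observations
summarize income [fweight = freq], detail
display r(p50)
```

With nine weighted observations the median is the fifth value in sorted order, so `r(p50)` should be 30000.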
For official guidance on descriptive statistics and robust summaries, it helps to review references from authoritative institutions such as the National Institute of Standards and Technology, the UCLA Statistical Methods and Data Analytics resources, and the U.S. Census Bureau publications library.
Step by step workflow for calculating median in Stata
- Inspect the variable type with describe to confirm it is numeric.
- Check missing values with count if missing(varname).
- Run summarize varname, detail for a complete overview.
- Read the 50% percentile as the median.
- If needed, compute group specific medians using tabstat, egen, or collapse.
- Report the median with sample size and, when relevant, the interquartile range.
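The steps above can be sketched as a short do-file. This version uses `rep78` from the `auto` example dataset because it contains some missing values, and `foreign` is an arbitrary choice of grouping variable.

```stata
sysuse auto, clear

* 1. Confirm the variable is numeric
describe rep78

* 2. Count missing values
count if missing(rep78)

* 3-4. Detailed summary; the 50% row is the median
summarize rep78, detail
display "Median = " r(p50) "  (n = " r(N) ")"

* 5-6. Group medians with quartiles and sample sizes
tabstat rep78, statistics(n p25 median p75) by(foreign)
```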
Comparison of common Stata commands for medians
| Command | Best use case | Median shown directly? | Strength |
|---|---|---|---|
| summarize var, detail | Single variable exploratory review | Yes, as 50% | Rich diagnostic context including percentiles, extremes, skewness, and kurtosis |
| tabstat var, statistics(median) | Clean descriptive table | Yes | Compact output and easy grouping with by() |
| centile var, centile(50) | Percentile focused work | Yes | Ideal when you also need quartiles or other percentiles |
| bysort group: egen med = median(var) | Create a reusable variable | Stored in a new variable | Excellent for later modeling and within group analysis |
| collapse (median) var, by(group) | Build summary datasets | Yes | Produces a dataset ready for charts and final tables |
Common mistakes when calculating medians in Stata
Using summarize without detail
The plain summarize command does not display the median. Many beginners run it and settle for the mean, but if you need the median you must add the detail option or use another command that reports medians explicitly.
Forgetting the sample restriction
If your analysis includes if or in qualifiers in one command but not another, your medians can differ simply because the underlying observations changed. Always document your restriction logic.
Ignoring data type problems
Sometimes a variable that looks numeric is actually stored as a string. In that case summarize reports zero observations and no median, so convert the variable first with a command such as destring where appropriate.
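A minimal sketch of the conversion, using a hypothetical string variable `income_str` with one non-numeric entry. The `force` option turns entries that cannot be converted into missing values without warning about each one, so inspect the data before relying on it.

```stata
* Toy data: numbers stored as text, plus one non-numeric entry
clear
input str10 income_str
"32000"
"41000"
"n/a"
"36000"
end

* Convert to a new numeric variable; "n/a" becomes missing under force
destring income_str, generate(income) force
summarize income, detail
```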
Reporting the median without spread
A lone median is useful, but many readers also want to know variability. Reporting the 25th and 75th percentiles alongside the median creates a much stronger descriptive summary.
How to report median results in a paper or dashboard
A clean reporting sentence might read: “Median monthly medical expenditure was 214 dollars (IQR: 96 to 487; n = 4,832).” In Stata terms, those values could come from the 50th, 25th, and 75th percentiles shown by summarize, detail or from a custom tabstat table. In dashboards, medians are often placed in KPI cards because they remain stable even when new extreme values arrive in the data feed.
If your audience is technical, include the exact command used. If your audience is nontechnical, explain why median was chosen. For example, “Median was preferred to the mean because the variable was right skewed.” This simple sentence helps readers trust your method and understand your choice.
Practical takeaway
If you need a fast answer, use summarize varname, detail and read the 50% line. If you need a polished table, use tabstat. If you need group level medians stored in your data, use egen or collapse. In every case, verify missing values, sample restrictions, and whether the distribution is skewed enough that the median is the preferred measure of central tendency.