How to Calculate Frequency of a Variable in Stata
Use this interactive calculator to build a frequency table, percentages, and cumulative percentages for any categorical variable. Enter categories and counts exactly as you might summarize them from a Stata dataset, then compare the output to the Stata commands you would use in practice.
Frequency Calculator
Paste category labels and frequencies line by line using the format Category, Count. The calculator will compute totals, percentages, cumulative percentages, the mode, and a chart.
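The Category, Count input format is simple to parse. The sketch below is a minimal Python illustration of that step, not the calculator's actual code. The function name parse_counts is hypothetical, and summing repeated labels is an assumption about how duplicates should be handled.

```python
def parse_counts(text: str) -> dict[str, int]:
    """Parse 'Category, Count' lines into an ordered {category: count} dict.

    Hypothetical helper: blank lines are skipped and repeated labels are summed.
    """
    counts: dict[str, int] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the LAST comma so labels like "Yes, sometimes" survive.
        label, sep, number = line.rpartition(",")
        label = label.strip()
        if not sep or not label:
            raise ValueError(f"line {line_no}: expected 'Category, Count'")
        count = int(number)  # int() tolerates surrounding whitespace
        if count < 0:
            raise ValueError(f"line {line_no}: counts must be non-negative")
        counts[label] = counts.get(label, 0) + count  # sum repeated labels
    return counts


example = """Low, 18
Medium, 32
High, 25
Missing, 5"""
print(parse_counts(example))  # {'Low': 18, 'Medium': 32, 'High': 25, 'Missing': 5}
```

Splitting on the last comma, rather than the first, keeps category labels that themselves contain commas intact.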
Expert Guide: How to Calculate Frequency of a Variable in Stata
Calculating the frequency of a variable in Stata is one of the most important basic skills in data analysis. A frequency table tells you how often each value or category appears in a dataset. This is essential because before you run models, estimate effects, or produce formal reports, you should always understand the distribution of the variables you are working with. In Stata, frequency analysis is commonly used for survey responses, demographic variables, education levels, disease categories, regions, treatment groups, and any other variable where you want a count of observations by category.
If you are asking how to calculate frequency of a variable in Stata, the short answer is that you will usually use the tabulate command, often abbreviated as tab. But understanding the command alone is not enough. You should know what the frequency means, how percentages are computed, how cumulative percentages are built, how missing values are handled, and when a one-way frequency table should be preferred over a two-way crosstab. This guide walks through all of that in a practical, analyst-friendly way.
What frequency means in Stata
A frequency is simply the number of observations that take on a specific value. Imagine you have a variable called education_level coded as Low, Medium, and High. If 18 observations are Low, 32 are Medium, and 25 are High, then those counts are the frequencies. Stata presents these in a table so you can immediately see the distribution of the variable.
In most cases, a frequency table includes four useful pieces of information:
- Frequency: the count of records in each category.
- Percent: the share of the total sample represented by that category.
- Cumulative frequency or cumulative percent: the running total across ordered categories.
- Total: the number of non-missing observations, unless you add the missing option.
The most common Stata command for frequencies
The standard syntax is tabulate variable_name, usually abbreviated to tab variable_name. For example, tab education_level tells Stata to display a one-way table of frequencies for the variable education_level. Stata returns a table showing the count, percent, and cumulative percent for each observed category. If the variable is numeric and has value labels attached, Stata displays the labels instead of the raw numbers.
How Stata calculates the percent
The percentage in a frequency table is calculated with a simple formula:
Percent = (frequency of the category / total observations in the table) × 100
Using the calculator example above:
- Low = 18
- Medium = 32
- High = 25
- Missing = 5
If you exclude missing values, the valid total is 18 + 32 + 25 = 75. The percent for Medium is:
32 / 75 × 100 = 42.67%
This matches what tab education_level reports, because tabulate leaves missing values out of the table by default. If you add the missing option, missing values get their own row and the denominator becomes 80, so Medium would be reported as 32 / 80 × 100 = 40.00%.
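The effect of the denominator choice can be checked with a short Python sketch. This is not Stata code. The label "Missing" is a stand-in for Stata's missing values, and the function name percents is hypothetical.

```python
def percents(counts: dict[str, int], include_missing: bool = False,
             missing_label: str = "Missing") -> dict[str, float]:
    """Percent per category, rounded to two decimals.

    `missing_label` stands in for Stata's missing values. With
    include_missing=False the missing row is dropped before totalling
    (like a plain tab); with True it is kept (like tab with the missing option).
    """
    rows = {label: n for label, n in counts.items()
            if include_missing or label != missing_label}
    total = sum(rows.values())  # the denominator: 75 or 80 in this example
    return {label: round(100 * n / total, 2) for label, n in rows.items()}


counts = {"Low": 18, "Medium": 32, "High": 25, "Missing": 5}
print(percents(counts)["Medium"])                        # 42.67
print(percents(counts, include_missing=True)["Medium"])  # 40.0
```

The only difference between the two calls is which rows are included in the total.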
Basic step-by-step process in Stata
- Load or open your dataset in Stata.
- Identify the variable you want to summarize.
- Run tab variable_name.
- Review counts, percentages, and cumulative percentages.
- Check whether missing values are excluded or included.
- If necessary, sort categories, recode values, or apply labels before producing the final table.
Example with real interpretation
Suppose you have a public health dataset with a variable smoking_status containing three categories: Never, Former, and Current. If your frequency table shows that 52 percent of respondents are Never smokers, 28 percent are Former smokers, and 20 percent are Current smokers, you immediately learn the sample is dominated by people who have never smoked. That tells you something important before you run any regression or subgroup analysis.
| Smoking status | Frequency | Percent | Cumulative percent |
|---|---|---|---|
| Never | 520 | 52.0% | 52.0% |
| Former | 280 | 28.0% | 80.0% |
| Current | 200 | 20.0% | 100.0% |
The numbers in the table above are illustrative. Your exact counts will differ, but the logic is the same: the frequency gives the count, the percent gives the relative size, and the cumulative percent helps when categories are ordered.
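Here is a minimal Python sketch of how the three columns in the smoking table relate, assuming the categories are supplied in display order. This is not Stata code, and the function name frequency_table is hypothetical.

```python
from itertools import accumulate


def frequency_table(counts: dict[str, int]) -> list[tuple[str, int, float, float]]:
    """Rows of (category, frequency, percent, cumulative percent).

    Cumulative percent is computed from the running COUNT, not by adding
    rounded percents, so the last row always reaches exactly 100.
    """
    total = sum(counts.values())
    running_counts = accumulate(counts.values())  # 520, 800, 1000 here
    return [
        (label, n, round(100 * n / total, 1), round(100 * cum / total, 1))
        for (label, n), cum in zip(counts.items(), running_counts)
    ]


smoking = {"Never": 520, "Former": 280, "Current": 200}
for row in frequency_table(smoking):
    print(row)
# ('Never', 520, 52.0, 52.0)
# ('Former', 280, 28.0, 80.0)
# ('Current', 200, 20.0, 100.0)
```

Building the cumulative column from counts avoids small rounding drift that can appear when rounded percentages are added together.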
How to include missing values
By default, tabulate excludes missing values from one-way tables. If you want missing values listed as their own category, add the missing option, as in tab variable_name, missing.
This is especially important in data quality checks. A variable with a high percentage of missing values may not be suitable for certain analyses without imputation, restriction, or redesign. Imagine a survey variable where 18 percent of responses are missing. That alone might change how much confidence you place in the observed distribution.
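As a quick data-quality check, the missing share is computed over all observations, including the missing ones. The Python sketch below uses the calculator example values. The function name missing_share and the "Missing" label are illustrative assumptions, and the sketch does not decide what share counts as too high.

```python
def missing_share(counts: dict[str, int], missing_label: str = "Missing") -> float:
    """Percent of ALL observations (missing included) that are missing."""
    total = sum(counts.values())
    return 100 * counts.get(missing_label, 0) / total


counts = {"Low": 18, "Medium": 32, "High": 25, "Missing": 5}
print(f"{missing_share(counts):.2f}% missing")  # 6.25% missing
```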
How to sort categories by frequency
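If you want to see the largest categories first, add the sort option, as in tab variable_name, sort. Stata then lists the categories in descending order of frequency.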
This is very useful when a variable has many labels and you want to identify the dominant categories at a glance. Sorting by frequency can reveal concentration patterns that are hard to see in the natural numeric coding order.
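The same ordering can be reproduced in Python with a descending sort on the counts. This is a sketch with a hypothetical function name, using the education counts from this guide. The sketch does not claim that Stata breaks ties the same way.

```python
def sort_by_frequency(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Categories from most to least frequent, similar in spirit to tab with sort.

    Python's sort is stable even with reverse=True, so tied categories keep
    their input order here.
    """
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


education = {"Primary or less": 145, "Secondary": 410, "College": 300, "Graduate": 145}
print(sort_by_frequency(education))
# [('Secondary', 410), ('College', 300), ('Primary or less', 145), ('Graduate', 145)]
```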
When cumulative percentages matter
Cumulative percentages are most useful for ordered variables. For instance, if you have income bands, education levels, or Likert scale responses from strongly disagree to strongly agree, cumulative percentages help you answer questions like “What percentage is at or below this category?” In unordered nominal variables, cumulative percentages are less meaningful because the category order is arbitrary.
| Education category | Frequency | Percent | Cumulative percent |
|---|---|---|---|
| Primary or less | 145 | 14.5% | 14.5% |
| Secondary | 410 | 41.0% | 55.5% |
| College | 300 | 30.0% | 85.5% |
| Graduate | 145 | 14.5% | 100.0% |
From this table, you can say that 55.5 percent of respondents have secondary education or less, and 85.5 percent have college or less. That kind of interpretation is often useful in policy, education, and labor market analysis.
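The "at or below" reading can be expressed as a running total that stops at the chosen category. The Python sketch below assumes the categories are listed from lowest to highest. The function name percent_at_or_below is hypothetical.

```python
def percent_at_or_below(counts: dict[str, int], category: str) -> float:
    """Cumulative percent up to and including `category`.

    Assumes `counts` is ordered from the lowest to the highest category.
    """
    total = sum(counts.values())
    running = 0
    for label, n in counts.items():
        running += n
        if label == category:
            return round(100 * running / total, 1)
    raise KeyError(category)


education = {"Primary or less": 145, "Secondary": 410, "College": 300, "Graduate": 145}
print(percent_at_or_below(education, "Secondary"))  # 55.5
print(percent_at_or_below(education, "College"))    # 85.5
```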
Frequency table versus summary statistics
Many new users confuse frequency tables with numeric summaries like mean, standard deviation, minimum, and maximum. The two tools answer different questions. Frequency tables are best for categorical variables and discrete counts. Summary statistics are best for continuous variables like age, income, blood pressure, or test scores.
- Use tab variable for categories.
- Use summarize variable for continuous variables.
- Use both if a numeric variable has a small number of discrete categories and you want both distribution and central tendency.
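For a numeric variable with only a few distinct values, both views are informative. The Python sketch below uses invented values (a hypothetical count of children per household) to show a one-way frequency count next to a mean. It reports the mean only, whereas Stata's summarize also shows other statistics.

```python
from collections import Counter
from statistics import mean

# Hypothetical discrete numeric variable, invented for illustration only
children = [0, 1, 2, 2, 1, 0, 3, 2, 1, 2]

freq = Counter(children)
for value in sorted(freq):        # one-way frequencies, in the spirit of tab
    print(value, freq[value])
print("mean:", mean(children))    # central tendency, in the spirit of summarize
```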
How to calculate frequencies for two variables
If you want to compare frequencies across two variables, use a two-way table, as in tab row_variable column_variable.
This gives you counts for each combination of the row and column categories. You can add the row, col, and chi2 options depending on whether you need row percentages, column percentages, or a chi-square test of association. For example, tab row_variable column_variable, row chi2 adds row percentages and a chi-square test to the table.
That output is not the same as a one-way frequency table, but it is built from the same counting logic.
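The counting logic behind row percentages can be sketched in Python. The observations and the variables they represent (sex and smoker) are hypothetical and invented for illustration. The function name crosstab_row_percents is also hypothetical, and no chi-square test is computed.

```python
from collections import Counter


def crosstab_row_percents(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Row percentages for each (row, column) cell of a two-way table."""
    cells = Counter(pairs)                     # count each (row, col) combination
    row_totals = Counter(r for r, _ in pairs)  # observations per row category
    return {(r, c): round(100 * n / row_totals[r], 2) for (r, c), n in cells.items()}


# Hypothetical (sex, smoker) observations, invented for illustration only
obs = ([("Male", "Yes")] * 3 + [("Male", "No")] * 7
       + [("Female", "Yes")] * 2 + [("Female", "No")] * 8)
pct = crosstab_row_percents(obs)
print(pct[("Male", "Yes")])   # 30.0
print(pct[("Female", "No")])  # 80.0
```

Each cell is divided by its row total, which is what makes these row percentages rather than column or overall percentages.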
Working with value labels
In Stata, many variables are stored as numbers but displayed with labels. For instance, a variable might code 1 = Male and 2 = Female, or 1 = Low, 2 = Medium, 3 = High. A frequency table becomes much easier to interpret when labels are attached. If your output is showing numbers only, check whether value labels have been applied correctly. Labeling variables well is part of good statistical workflow.
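The idea of value labels can be mimicked in Python with a code-to-label mapping. In this sketch, the label set and the codes are hypothetical, and the sketch shows the raw code whenever no label exists for it.

```python
from collections import Counter

# Hypothetical value-label set: codes 1-3 map to education labels
education_labels = {1: "Low", 2: "Medium", 3: "High"}


def labeled_counts(codes: list[int], labels: dict[int, str]) -> dict[str, int]:
    """Count codes, showing the label when one exists and the raw code otherwise."""
    counts = Counter(codes)
    return {labels.get(code, str(code)): counts[code] for code in sorted(counts)}


codes = [1, 2, 2, 3, 2, 1, 9]  # 9 has no label, so it is shown as "9"
print(labeled_counts(codes, education_labels))
# {'Low': 2, 'Medium': 3, 'High': 1, '9': 1}
```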
Common mistakes when calculating frequency in Stata
- Ignoring missing values: this can make percentages look larger than they really are relative to the full sample.
- Using unordered categories with cumulative percentages: cumulative results may be technically correct but substantively meaningless.
- Forgetting sample restrictions: if you used if or in, your frequencies apply only to the subset.
- Not checking labels: unlabeled numeric codes can lead to misinterpretation.
- Assuming percentages always use the full dataset: Stata typically computes percentages from the valid observations included in the table.
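The effect of a sample restriction on the denominator can be shown with a Python sketch. The records are hypothetical and invented for illustration. The list filter plays the role that an if condition plays in Stata, and the function name percent_of is also hypothetical.

```python
from collections import Counter

# Hypothetical (region, smoking_status) records, invented for illustration only
records = ([("North", "Never")] * 6 + [("North", "Current")] * 4
           + [("South", "Never")] * 9 + [("South", "Current")] * 1)


def percent_of(status: str, rows: list[tuple[str, str]]) -> float:
    """Percent of `rows` whose status equals `status`."""
    counts = Counter(s for _, s in rows)
    return round(100 * counts[status] / len(rows), 1)


print(percent_of("Current", records))              # 25.0 (all 20 records)
north = [r for r in records if r[0] == "North"]    # restrict to one region
print(percent_of("Current", north))                # 40.0 (10 records)
```

The same category yields a different percent because the restricted subset has a different denominator.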
Useful variations of the command
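A handful of options cover a large share of practical frequency work in Stata: missing lists missing values as their own category, sort orders categories by frequency, and row, col, and chi2 add percentages and a test to two-way tables. Once you know them, you can move quickly from raw data inspection to polished descriptive analysis.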
Why visualizing frequencies helps
Tables are precise, but charts reveal patterns instantly. A bar chart is especially useful for showing which categories dominate and how wide the differences are. That is why this calculator includes a chart beside the frequency table. In Stata, graphical alternatives such as bar charts can complement your frequency tables when preparing presentations or reports for nontechnical audiences.
How the calculator on this page relates to Stata
The interactive calculator above does not replace Stata, but it mirrors the same arithmetic Stata uses in a one-way frequency table. You enter category counts, and the tool computes:
- Total observations.
- Percent share for each category.
- Cumulative percentage.
- The modal category, meaning the category with the highest frequency.
That means you can use the calculator to verify hand calculations, teach students how tabulations work, or quickly prepare a frequency table before coding the same analysis in Stata.
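The total and the modal category can be sketched in Python as follows. This is an illustration, not the calculator's code. Returning every tied category is an assumption, since how the calculator reports ties is not described here, and the function name modal_categories is hypothetical.

```python
def modal_categories(counts: dict[str, int]) -> list[str]:
    """All categories sharing the highest frequency (more than one if tied)."""
    top = max(counts.values())
    return [label for label, n in counts.items() if n == top]


education = {"Primary or less": 145, "Secondary": 410, "College": 300, "Graduate": 145}
print(sum(education.values()))                     # 1000
print(modal_categories(education))                 # ['Secondary']
print(modal_categories({"A": 5, "B": 5, "C": 1}))  # ['A', 'B']
```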
Best practice workflow
A strong workflow for frequency analysis in Stata usually looks like this:
- Inspect the variable with codebook or describe.
- Run a one-way frequency with tab variable.
- Check missing values using tab variable, missing.
- Confirm labels and coding make sense.
- If needed, recode sparse or invalid categories.
- Use charts or export tables for reporting.
Following those steps will save time and reduce mistakes. Frequency analysis is often seen as simple, but it is one of the best ways to catch coding errors, identify data quality problems, and understand your sample before moving into more advanced statistical work.
Final takeaway
To calculate the frequency of a variable in Stata, use tab variable_name. That single command gives you the distribution of a categorical variable, including counts and percentages. Add options like missing or sort when needed. Most importantly, do not treat the frequency table as just a mechanical output. Read it carefully. It tells you how your data are distributed, whether categories are balanced, whether missingness is a concern, and whether your variable is ready for deeper analysis.