How to Calculate Percentage of Variable in Stata
Expert Guide: How to Calculate Percentage of Variable in Stata
If you are learning how to calculate the percentage of a variable in Stata, the core idea is straightforward: divide the value of interest by the appropriate total and multiply by 100. The arithmetic is not the hard part. The real challenge is choosing the correct denominator, deciding whether you need observation-level or group-level percentages, and writing efficient Stata code that is easy to audit later.
In Stata, percentages can be produced in several ways depending on your goal. You might want the percentage of one numeric variable relative to another numeric variable, the percentage distribution of a categorical variable, or the percentage share of a subgroup within a panel, household, firm, county, or survey domain. Each use case looks similar on the surface, but the command pattern differs. That is why understanding both the formula and the data structure matters.
The basic percentage formula
The standard percentage formula is:
percentage = (part / total) * 100
In Stata, a direct observation-level calculation usually looks like this:
gen pct = (part_variable / total_variable) * 100
For example, if you have employee sales and company total sales recorded on the same row, you can compute the employee share as:
gen sales_pct = (employee_sales / company_sales) * 100
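To try the observation-level pattern without your own data, here is a minimal sketch. The three rows and their values are invented for illustration; only the variable names come from the example above.

```stata
* Toy data: values are made up for illustration
clear
input employee_sales company_sales
200 1000
350 1000
450 1000
end

* Each employee's share of the company total recorded on the same row
gen sales_pct = (employee_sales / company_sales) * 100
list, noobs
```

Because the denominator is already on every row, no grouping or egen step is needed here.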
When to use generate versus egen
Many Stata users confuse generate and egen. Use generate when both the numerator and denominator already exist row by row. Use egen when you need to create a total before calculating the percentage. For instance, if each observation is a person and you need a household total first, you would often use egen with a grouping variable:
bysort household_id: egen household_income = total(income)
gen income_share = (income / household_income) * 100
This pattern is extremely common in microdata analysis, labor data, firm records, and household surveys. The first line creates the denominator within each household. The second line converts each person’s income into a share of household income.
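A quick way to confirm the grouping worked is to check that the shares add up to 100 inside every household. The sketch below uses invented toy data, and share_check is a hypothetical helper variable that is not part of the original example.

```stata
* Toy data: two households, values made up for illustration
clear
input household_id income
1 30000
1 20000
2 40000
2 10000
2 50000
end

bysort household_id: egen household_income = total(income)
gen income_share = (income / household_income) * 100

* Shares should sum to 100 within each household (small tolerance for float storage)
bysort household_id: egen share_check = total(income_share)
assert abs(share_check - 100) < 0.001
```

If the assert fails, the usual culprits are a wrong grouping variable or missing incomes within a household.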
Three common ways to calculate percentages in Stata
1. Percentage of one numeric variable relative to another
This is the easiest case. Suppose a dataset contains profit and revenue. You want profit margin as a percentage:
gen profit_margin = (profit / revenue) * 100
Always check for zero or missing denominators before running this on production data:
gen profit_margin = .
replace profit_margin = (profit / revenue) * 100 if revenue > 0
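This makes the handling of invalid denominators explicit. Stata returns a missing value, not an error or an infinite value, when you divide by zero, so the condition mainly documents your intent and also excludes negative denominators. Because Stata treats missing values as larger than any number, revenue > 0 is also true when revenue is missing; the result is still missing in that case.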
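One detail worth seeing in action is how Stata compares missing values. The sketch below uses invented toy data to show that a missing revenue passes a "greater than zero" test, and that adding !missing() narrows the condition to genuinely positive denominators.

```stata
* Toy data: one valid row, one zero denominator, one missing denominator
clear
input profit revenue
50 200
10 0
20 .
end

* Missing counts as larger than any number, so this counts 2 rows
count if revenue > 0

* Restricting to nonmissing values counts only the valid row
count if revenue > 0 & !missing(revenue)

gen profit_margin = (profit / revenue) * 100 if revenue > 0 & !missing(revenue)
list, noobs
```

The generated variable is missing for the second and third rows either way; the difference matters most when you count or summarize with the same condition.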
2. Percentage distribution of a categorical variable
If you want percentages for categories such as sex, region, industry, education level, or response type, the simplest route is often a frequency table:
tabulate region
Stata automatically displays frequencies and percentages. If you need a two-way percentage table, use:
tabulate region sex, row
tabulate region sex, col
tabulate region sex, cell
row gives row percentages, col gives column percentages, and cell gives the percentage of the whole table. This is one of the most useful distinctions in applied work, especially when analyzing survey distributions or cross-tab summaries.
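As a quick illustration, the sketch below uses the auto dataset that ships with Stata in place of the region and sex variables, which are only placeholders in the text. The nofreq option suppresses the counts so that only the percentages are displayed.

```stata
* Example dataset shipped with Stata; foreign and rep78 stand in for region and sex
sysuse auto, clear

* Row percentages together with frequencies
tabulate foreign rep78, row

* Row percentages only
tabulate foreign rep78, row nofreq
```

Observations with a missing value in either variable are left out of the table by default, so the table total can be smaller than the number of rows in the dataset.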
3. Percentage share within groups
Many analysts need percentages within a subgroup, such as the percentage of household expenditure by category, vote share within district, or student share within school. In these cases, use bysort and egen:
bysort district: egen district_total = total(votes)
gen vote_share = (votes / district_total) * 100
This method is ideal when the denominator is not a fixed dataset total but a dynamic total that changes by group.
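To make the difference between a dataset total and a group total concrete, the following sketch computes both from the same invented toy data. The variables overall_total, pct_of_overall, and pct_of_district are hypothetical names chosen for this illustration.

```stata
* Toy data: two districts with two candidates each, values made up
clear
input district votes
1 100
1 300
2 250
2 350
end

* Fixed denominator: total across the whole dataset
egen overall_total = total(votes)
gen pct_of_overall = (votes / overall_total) * 100

* Dynamic denominator: total within each district
bysort district: egen district_total = total(votes)
gen pct_of_district = (votes / district_total) * 100

list, noobs sepby(district)
```

The first candidate holds 10 percent of all votes but 25 percent of the votes in district 1, which is why the choice of denominator has to match the question being asked.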
Worked examples
Example 1: Individual percentage of total
Suppose 125 survey respondents out of 500 selected a given option. The percentage is:
(125 / 500) * 100 = 25
In Stata, if count_yes is 125 and count_total is 500 for a row or a subset, the code pattern remains:
gen yes_pct = (count_yes / count_total) * 100
Example 2: Household expenditure shares
Imagine each row is one expense item within a household. You have variables household_id and expense. To calculate what percentage each item contributes to total household spending:
bysort household_id: egen household_expense_total = total(expense)
gen expense_share = (expense / household_expense_total) * 100
This is a textbook case for calculating a percentage in Stata because the denominator must be created inside each group.
Example 3: Category percentages from tabulate
If your variable is categorical, for example employment_status, and you simply want the percentage in each category, use:
tab employment_status
Stata outputs a table showing counts, percents, and cumulative percentages. This is often faster and cleaner than manually generating indicator variables unless you specifically need a new percentage variable stored in the dataset.
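If you do need the category percentage stored as a variable, one possible approach is to count observations per category with _N. The sketch below uses the auto dataset shipped with Stata, with foreign standing in for employment_status; cat_count and cat_pct are hypothetical names. It assumes the categorical variable has no missing values, because _N counts every observation while tabulate drops missing ones by default.

```stata
sysuse auto, clear

* Number of observations in each category of foreign
bysort foreign: gen cat_count = _N

* Outside a by-group, _N is the total number of observations
gen cat_pct = (cat_count / _N) * 100

* Compare the stored values with the displayed table
tabulate foreign
```

Every row in a category carries the same cat_pct, so the variable can be used in later merges, graphs, or exports.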
Comparison table: best Stata method by use case
| Use case | Recommended command pattern | Why it works | Typical output |
|---|---|---|---|
| Variable divided by another variable | gen pct = (x / y) * 100 | Both numerator and denominator already exist by observation | Observation-level percentage |
| Share within group | bysort group: egen total_y = total(y) followed by gen pct = (y / total_y) * 100 | Creates a group-specific denominator | Within-group share |
| Category frequency percentage | tab variable | Stata computes counts and percents automatically | Frequency table |
| Two-way percentage table | tab rowvar colvar, row (or col) | Useful for cross-tab comparison | Row or column percentages |
Real statistics context: why percentage calculations matter
Percentages are used constantly in policy, education, labor, and public health datasets. According to the U.S. Census Bureau, percentage shares are central to reporting demographic, housing, and economic distributions across groups and geographies. In higher education data, institutions often report percentages of enrollment by race, sex, or attendance status rather than raw counts because proportions allow fair comparison between differently sized groups. Labor and health datasets similarly rely on percent distributions to compare categories across states, years, or demographic segments.
| Illustrative data context | Raw count | Total | Percentage |
|---|---|---|---|
| Students enrolled part time in a college sample | 1,250 | 5,000 | 25.0% |
| Households with broadband in a county sample | 18,400 | 23,000 | 80.0% |
| Workers in services in a local labor sample | 7,350 | 14,700 | 50.0% |
| Survey respondents selecting option A | 125 | 500 | 25.0% |
Important data quality checks before calculating percentages
- Confirm the denominator. A wrong denominator creates a wrong percentage even when the formula is correct.
- Check for zero denominators. Use conditional replacement if totals may be zero.
- Inspect missing values. Missing numerator or denominator values can propagate into the result.
- Verify grouping logic. If you use bysort, make sure the grouping variable truly identifies the intended unit such as household, school, or district.
- Decide on scale. Some workflows store shares between 0 and 1, while others store percentages between 0 and 100.
Example of safe coding for missing or zero totals
gen pct = .
replace pct = (x / y) * 100 if !missing(x) & !missing(y) & y > 0
This pattern is safer in real datasets because it explicitly protects the result from invalid denominators.
How to calculate percentages after collapse or contract
Two powerful Stata workflows are collapse and contract. Both replace the dataset in memory with a summarized version, so save or preserve your data first. If you need category percentages from the dataset itself, contract can create frequencies and then percentages:
contract occupation
egen total_freq = total(_freq)
gen pct = (_freq / total_freq) * 100
If you are summarizing data before percentage calculation, collapse can be equally helpful. For example, you can collapse to district totals first and then compute each district’s share of the national total.
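The collapse route described above might look like the following sketch. The toy data, and the names national_votes and district_share, are invented for illustration.

```stata
* Toy data: several rows per district, values made up
clear
input district votes
1 100
1 300
2 250
2 350
end

* Reduce the data to one row per district holding the district total
collapse (sum) votes, by(district)

* Each district's share of the national total
egen national_votes = total(votes)
gen district_share = (votes / national_votes) * 100
list, noobs
```

After collapse, the original rows are gone from memory, so wrap the block in preserve and restore if you still need them.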
Row, column, and cell percentages explained
When working with two-way tables, analysts often misuse row and column percentages. Here is the difference:
- Row percentages add to 100 across each row. They answer: within this row category, how are observations distributed across columns?
- Column percentages add to 100 down each column. They answer: within this column category, how are observations distributed across rows?
- Cell percentages are based on the grand total of the table. They answer: what percent of the entire dataset is represented by this cell?
In Stata, these correspond to:
tabulate var1 var2, row
tabulate var1 var2, col
tabulate var1 var2, cell
Common mistakes users make in Stata
- Omitting parentheses in more complex formulas, so that the multiplication by 100 applies to only part of the expression.
- Using the overall dataset total when a group-level total is required.
- Forgetting that tabulate percentages are display output, not automatically saved variables.
- Creating percentages from already rounded totals, which can distort results.
- Comparing percentages from groups with very different sample sizes without checking the counts.
Recommended workflow for analysts
- Inspect the variable with summarize, tabulate, or codebook.
- Define whether your denominator is observation-level, group-level, or dataset-level.
- Create totals if needed using egen total() within the correct bysort structure.
- Generate the percentage variable with clear naming, such as income_pct or vote_share.
- Validate the result using a few hand-checked observations.
- Format or round only after the calculation, not before.
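The steps above can be sketched end to end as follows. This is only one way to arrange them, using the auto dataset shipped with Stata; group_price_total, price_pct, and pct_check are hypothetical names.

```stata
sysuse auto, clear

* Inspect the variable and the grouping variable
summarize price
tabulate foreign

* The denominator here is group-level: total price within each origin group
bysort foreign: egen group_price_total = total(price)

* Generate the percentage with a descriptive name
gen price_pct = (price / group_price_total) * 100

* Validate: shares should sum to 100 within each group
bysort foreign: egen pct_check = total(price_pct)
assert abs(pct_check - 100) < 0.01

* Change the display format only; the stored values keep full precision
format price_pct %6.2f
list make foreign price price_pct in 1/5, noobs
```

Using format instead of round() keeps the underlying values intact, which matters if the percentages feed into later calculations.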
Authoritative references for methodology and data reporting
For deeper reference material on statistics, survey reporting, and tabular percentages, review guidance from the U.S. Census Bureau, the National Center for Education Statistics, and the U.S. Bureau of Labor Statistics.
Final takeaway
To calculate the percentage of a variable in Stata, you generally divide a part by a total and multiply by 100. If the total already exists, use generate. If the total must be built within groups, use bysort with egen total(). If you only need category percentages for a table, use tabulate. The command is simple, but the denominator decision is everything. Once you understand that distinction, percentage calculations in Stata become consistent, scalable, and easy to explain in reports, appendices, and reproducible code files.