Variable Autocorrelation in Stata Calculator
Enter a time-ordered series to estimate autocorrelation, compare lags, review a quick Stata workflow, and visualize the pattern instantly. This tool is designed for analysts working with panel, economic, financial, health, and operational time series data.
- Lag 1 to Lag 12 support
- Pearson autocorrelation estimate
- Approximate significance check
- Chart.js visualization
Interactive Calculator
Tip: autocorrelation near +1 indicates persistence, near 0 indicates weak serial relationship, and near -1 suggests reversal across observations at the selected lag.
Autocorrelation Visualization
The chart displays the original series and the shifted lagged series used for the correlation estimate.
Expert Guide to Calculating Variable Autocorrelation in Stata
Calculating variable autocorrelation in Stata is one of the most important diagnostic steps in time series and panel data work. When a variable is autocorrelated, its current value is statistically related to its own past values. In practical terms, that means the observation you see today may depend partly on what happened yesterday, last quarter, or last year. This matters because many statistical procedures assume independent observations. If serial dependence is present and ignored, your standard errors can be biased, your forecast quality can be overstated, and your model interpretation can become misleading.
Autocorrelation appears constantly in real-world research. Monthly unemployment rates, inflation, industrial output, stock index returns, disease incidence counts, electricity demand, website traffic, and customer order volumes often display persistence across time. In Stata, analysts typically detect this dependence by visually inspecting the data, computing the autocorrelation function, checking correlograms, reviewing residual diagnostics after regression, and sometimes comparing the result with the Durbin-Watson statistic or Ljung-Box style tests. The calculator above provides an immediate hands-on estimate of autocorrelation at a chosen lag, but understanding the reasoning behind the number is what makes the result useful.
What autocorrelation means mathematically
For a time-ordered variable y, lag k autocorrelation measures the correlation between $y_t$ and $y_{t-k}$. If the lag 1 autocorrelation is 0.82, then observations that are one period apart move together strongly. If lag 1 autocorrelation is close to zero, then consecutive observations show little linear association. If it is negative, increases in one period tend to be followed by decreases in the next.
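The standard sample autocorrelation formula at lag k is:

$$r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$

Here n is the number of observations and $\bar{y}$ is the mean of the full series. Stata's correlogram commands are built on this same idea, although details differ by command and context. A Pearson correlation between the series and its lagged copy, as used by the calculator above, instead takes the means and variances from the paired observations only, so the two estimates can differ slightly, especially in short series.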
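The lag 1 calculation can be sketched by hand in Stata, assuming the usual estimator in which lagged cross-products of deviations from the full-sample mean are divided by the total sum of squared deviations. The series is simulated, and names such as r1_manual are illustrative only. If corrgram uses this same estimator by default, the two displayed numbers should agree closely.

```stata
* Simulated AR(1) series; variable and scalar names are illustrative
clear
set seed 12345
set obs 200
generate t = _n
generate double y = rnormal()
replace y = 0.8*y[_n-1] + rnormal() in 2/l
tsset t

* Lag 1 numerator and denominator, both centered on the full-sample mean
quietly summarize y
scalar ybar = r(mean)
generate double num = (y - ybar) * (L1.y - ybar)
generate double den = (y - ybar)^2
quietly summarize num
scalar s_num = r(sum)
quietly summarize den
scalar s_den = r(sum)
scalar r1_manual = s_num / s_den

* corrgram stores its lag 1 estimate in r(ac1)
quietly corrgram y, lags(1)
scalar r1_corrgram = r(ac1)
display "manual r1 = " r1_manual "   corrgram r1 = " r1_corrgram
```

Running correlate y L.y on the same data gives the Pearson version, which is usually close but not identical because it centers each column on its own mean.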
How to calculate autocorrelation in Stata
Stata provides several direct ways to study autocorrelation. The exact command depends on whether you want the autocorrelation of a raw variable, a model residual, or a panel series. A common workflow starts by declaring the time variable and then using a plotting or correlation command:
- Sort and declare the time structure with tsset.
- Inspect the series graphically with tsline.
- Generate correlograms with corrgram or review autocorrelation and partial autocorrelation functions.
- If you estimated a regression, examine residual autocorrelation instead of only the raw variable.
- Consider stationarity before interpreting high autocorrelation values, because trends can mechanically inflate them.
A simple Stata sequence often looks like this:
tsset time    // "time" is a placeholder for your own time variable
tsline y
corrgram y, lags(12)
ac y, lags(12)
pac y, lags(12)
If you are estimating a regression and want to diagnose serial correlation in the residuals, you might fit the model first and then evaluate the residual structure. For example, after an OLS model, you may compute residuals and inspect them. In time series regressions, it is usually more informative to examine whether the model has removed the temporal dependence rather than just whether the raw variable was persistent before modeling.
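One way this residual check might look is sketched here on simulated data. The variable names, coefficients, and the AR(1) error process are assumptions made for illustration, not part of any real dataset.

```stata
* Simulated regression with AR(1) errors; names and coefficients are illustrative
clear
set seed 2024
set obs 200
generate t = _n
tsset t
generate double x = rnormal()
generate double u = rnormal()
replace u = 0.6*u[_n-1] + rnormal() in 2/l
generate double y = 1 + 2*x + u

regress y x
estat dwatson                    // Durbin-Watson d statistic
scalar dw = r(dw)
estat bgodfrey, lags(1)          // Breusch-Godfrey LM test for serial correlation
predict double ehat, residuals   // residuals from the fitted model
corrgram ehat, lags(12)          // residual autocorrelations and Q statistics
```

Because the simulated errors are positively autocorrelated, the Durbin-Watson statistic should come out well below 2 and the residual correlogram should show a clear lag 1 spike.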
Why stationarity matters before you interpret autocorrelation
A very common mistake is to compute autocorrelation on a series with a strong deterministic trend or unit root and treat the result as pure serial dependence. Suppose GDP, population, or cumulative enrollment rises over time. Even if the period-to-period shocks are weakly related, the trend alone can produce a very high lag 1 autocorrelation. In that case, the number is real, but it may be telling you more about nonstationarity than about the dynamic dependence around a stable mean.
That is why Stata users frequently combine autocorrelation analysis with differencing, detrending, or formal unit root testing. If your variable has a unit root, first differences may be more appropriate for analysis. After differencing, the correlogram often changes dramatically. A series with lag 1 autocorrelation of 0.95 in levels may show only 0.18 in first differences. This does not mean the software was inconsistent. It means the underlying statistical object changed.
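The contrast between levels and first differences can be sketched with a simulated random walk. The data and scalar names are illustrative assumptions; with a real series, the unit root test options (lags, trend) would need more care than shown here.

```stata
* Simulated random walk; names are illustrative
clear
set seed 42
set obs 200
generate t = _n
tsset t
generate double y = rnormal()
replace y = y[_n-1] + rnormal() in 2/l   // unit-root process

dfuller y                                // Dickey-Fuller test in levels

quietly correlate y L.y                  // lag 1 correlation in levels
scalar rho_lev = r(rho)
quietly correlate D.y LD.y               // lag 1 correlation in first differences
scalar rho_dif = r(rho)
display "levels: " %5.2f rho_lev "   first differences: " %5.2f rho_dif
```

The level series should show a lag 1 correlation close to 1, while the differenced series should be near zero, since the differences of a random walk are just the independent shocks.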
Interpreting the output from the calculator and from Stata
The calculator above estimates sample autocorrelation at the lag you choose. It also reports an approximate significance threshold based on the familiar rule of thumb ±z / √n, where z is the standard normal critical value (1.96 for a 95% band) and n is the number of paired observations used in the lagged correlation. This approximation is useful for quick screening: if the autocorrelation is much larger in magnitude than the threshold, it is unlikely to reflect sampling noise alone. Stata correlograms communicate the same idea visually with confidence bands. The ac plot bases its bands on Bartlett's formula, so they can widen at higher lags rather than staying at a constant ±z / √n.
- Strong positive autocorrelation: values often exceed 0.5 and indicate persistence.
- Moderate autocorrelation: values around 0.2 to 0.5 suggest some serial structure worth modeling.
- Weak autocorrelation: values near zero imply limited linear dependence at that lag.
- Negative autocorrelation: values below zero imply reversal or oscillation across periods.
Interpretation always depends on context. In macroeconomic data, positive first-order autocorrelation is common. In quality-control or inventory systems, negative autocorrelation can appear if managers over-correct after each period. In environmental data, seasonality can cause strong spikes at lags 12, 24, or 52 depending on the observation frequency.
Real benchmark statistics analysts often compare against
The table below gives common reference values used in diagnostics and descriptive interpretation. These are not universal cutoffs, but they are widely used in applied analysis.
| Statistic | Typical benchmark | Practical meaning | Example interpretation |
|---|---|---|---|
| Lag 1 autocorrelation | 0.70 or higher | Strong persistence | Current period strongly resembles prior period, common in level series with inertia. |
| Lag 1 autocorrelation | 0.20 to 0.50 | Moderate dependence | Temporal structure exists, but shocks dissipate more quickly. |
| Approximate 95% significance band | ±1.96 / √n | Quick screening threshold | At n = 100, the band is about ±0.196. |
| Durbin-Watson | About 2.00 | Little first-order residual autocorrelation | A value near 2 suggests residuals are essentially uncorrelated at lag 1. |
| Durbin-Watson | Below 1.50 | Positive residual autocorrelation concern | Since d ≈ 2(1 − r), where r is the lag 1 residual correlation, d below 1.50 corresponds to r above roughly 0.25. |
For example, if you compute lag 1 autocorrelation using 100 effective observations, the 95% reference band is approximately ±0.196. An observed value of 0.62 is therefore far beyond the threshold and strongly suggests nonrandom time dependence. By contrast, a value of 0.08 would be small enough that many analysts would treat it as compatible with noise, at least at the first lag.
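The arithmetic in this example can be reproduced in Stata with a few display statements. The scalar names n_eff and band are illustrative.

```stata
* Approximate 95% band for 100 effective observations
scalar n_eff = 100
scalar band  = invnormal(0.975) / sqrt(n_eff)
display "band = +/- " %5.3f band                      // band = +/- 0.196
display "0.62 outside band: " (abs(0.62) > band)      // 1 (true)
display "0.08 outside band: " (abs(0.08) > band)      // 0 (false)
```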
Common Stata commands and when to use them
Stata offers multiple commands because autocorrelation can be studied from several angles:
- tsset: declares the time variable and is required for many time series operations.
- ac: plots the autocorrelation function for a variable.
- pac: plots the partial autocorrelation function, often useful for AR order identification.
- corrgram: produces a correlogram table and graph with autocorrelations, partial autocorrelations, and Ljung-Box style summaries.
- predict after regression: creates residuals for diagnostic analysis.
- estat dwatson after regress on tsset data: reports the Durbin-Watson statistic.
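A short sketch of how several of these commands might be combined on one series follows. The data are simulated, and the stored results used (r(ac1), r(q12), r(p)) are the ones I understand corrgram and wntestq to leave behind; check the help files for your Stata version.

```stata
* Simulated AR(1) series; names are illustrative
clear
set seed 7
set obs 200
generate t = _n
tsset t
generate double y = rnormal()
replace y = 0.5*y[_n-1] + rnormal() in 2/l

corrgram y, lags(12)             // table of AC, PAC, and Q statistics
display "lag 1 AC = " %5.3f r(ac1) "   Q(12) = " %7.2f r(q12)

wntestq y, lags(12)              // portmanteau (Q) test for white noise
display "p-value = " %6.4f r(p)
```

For a series this persistent, the white-noise test should reject comfortably; ac and pac can then be used to inspect the shape of the dependence graphically.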
Suppose you have annual sales data and want to see whether current sales depend on prior sales. A strong lag 1 autocorrelation might motivate an autoregressive model. Suppose instead you estimate sales on advertising, prices, and income. In that case, your key diagnostic question is whether the residuals are autocorrelated after controlling for those explanatory variables. That distinction is critical because a raw series can be highly autocorrelated while the model residuals are not.
Comparison table: levels vs differenced data
Analysts frequently compare the same variable before and after differencing. The statistics below are illustrative values of the kind often seen in economic and operational datasets.
| Series version | Observations | Lag 1 ACF | Lag 12 ACF | Interpretation |
|---|---|---|---|---|
| Monthly revenue in levels | 120 | 0.91 | 0.58 | Very persistent, likely trend and seasonality effects are present. |
| Monthly revenue first difference | 119 | 0.24 | -0.06 | Dependence is much smaller after removing trend. |
| Monthly revenue seasonally differenced | 108 | 0.18 | 0.11 | Most annual seasonality and persistence have been reduced. |
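A comparison like this could be produced along the following lines. The monthly series here is simulated with an assumed trend and seasonal pattern, so its autocorrelations will not match the table values; the point is the workflow and the observation counts.

```stata
* Simulated monthly series with trend and annual seasonality; all values illustrative
clear
set seed 99
set obs 120
generate m = tm(2015m1) + _n - 1
format m %tm
tsset m
generate double y = 100 + 0.8*_n + 10*sin(2*_pi*_n/12) + rnormal(0, 3)

quietly corrgram y, lags(12)
display "levels:          r1 = " %5.2f r(ac1) "   r12 = " %5.2f r(ac12)
quietly corrgram D.y, lags(12)
display "first diff:      r1 = " %5.2f r(ac1) "   r12 = " %5.2f r(ac12)
quietly corrgram S12.y, lags(12)
display "seasonal diff:   r1 = " %5.2f r(ac1) "   r12 = " %5.2f r(ac12)

* Differencing costs observations: 119 after D., 108 after S12.
count if !missing(D.y)
count if !missing(S12.y)
```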
How the calculator mirrors a basic Stata concept
The calculator computes the correlation between the observed series and its lagged counterpart. If you choose lag 1, observation 2 is paired with observation 1, observation 3 with observation 2, and so on. If you choose lag 4, then observation 5 is paired with observation 1, observation 6 with observation 2, and so forth. This is conceptually the same alignment that underlies the autocorrelation function in statistical software. In Stata, declaring the time structure with tsset ensures those lags are interpreted correctly and consistently.
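The pairing described here can be sketched in Stata with the lag operator and correlate. The series is simulated and the scalar names are illustrative. This is the Pearson version of the estimate, paired with the rule-of-thumb band, rather than a reproduction of the calculator's actual code.

```stata
* Simulated AR(1) series; names are illustrative
clear
set seed 321
set obs 200
generate t = _n
tsset t
generate double y = rnormal()
replace y = 0.7*y[_n-1] + rnormal() in 2/l

* Pearson correlation between y and its lag 4 copy
correlate y L4.y
scalar rho    = r(rho)
scalar npairs = r(N)                          // 200 - 4 = 196 usable pairs
scalar band   = invnormal(0.975) / sqrt(npairs)
display "lag 4 r = " %6.3f rho "   pairs = " npairs "   95% band = +/- " %5.3f band
```

Using L4.y rather than y[_n-4] means the pairing follows the declared time variable, not the row order.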
The resulting chart helps you see whether the shifted lagged series tracks the original closely. When the lines move together, autocorrelation is usually positive. When one line rises while the other tends to fall, the autocorrelation may be negative. Visual inspection should not replace numeric diagnostics, but it often helps detect structure, breaks, outliers, and seasonal patterns that a single statistic cannot fully summarize.
Best practices for reliable autocorrelation analysis
- Use data in true time order. Even one sorting mistake can invalidate the result.
- Check missing values. Irregular gaps can alter effective sample size and lag pairing.
- Review plots first. Trend, seasonality, and level shifts can dominate autocorrelation estimates.
- Distinguish raw-variable autocorrelation from residual autocorrelation. They answer different questions.
- Assess stationarity. Strong persistence in levels does not always imply a stable autoregressive process.
- Interpret multiple lags. A single lag may miss seasonal dependence that appears later.
- Use domain knowledge. Weekly, monthly, quarterly, and annual cycles often produce predictable lag patterns.
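The first two points can be illustrated with a small sketch. It uses a made-up ten-period series with one period deleted, and compares the time-aware lag operator with a plain row-based lag; the variable names are illustrative.

```stata
* Tiny example with a gap in the time index
clear
set obs 10
generate t = _n
generate y = t^2
drop if t == 5                 // create a gap in the time index
tsset t                        // Stata notes that the series has a gap
generate lag_ts  = L.y         // missing at t == 6, because t == 5 is absent
generate lag_row = y[_n-1]     // silently pairs t == 6 with t == 4
list t y lag_ts lag_row
```

The row-based lag produces a value at t == 6 that belongs to t == 4, which is exactly the kind of mispairing that distorts an autocorrelation estimate. Commands such as tsfill can make the gap explicit before analysis.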
Final takeaway
Calculating variable autocorrelation in Stata is not just a box-checking exercise. It is a foundational step in understanding dependence over time, selecting an appropriate model, validating residual behavior, and avoiding false confidence in regression results. If your autocorrelation is large, that may point to persistence, omitted dynamics, seasonality, or nonstationarity. If it is small after transformation or modeling, that often indicates your specification has absorbed most of the temporal dependence.
Use the calculator to get a quick estimate, then apply the same logic inside Stata with tsset, ac, pac, and corrgram. The combination of numeric estimates, confidence bands, and visual diagnostics will give you a much stronger basis for decision-making than any single number alone.