Cannot Calculate Sargan Test with Dropped Variables Calculator
Use this tool to check whether your IV or GMM model remains overidentified after collinear or omitted instruments are dropped. The calculator estimates the effective number of instruments, overidentifying restriction degrees of freedom, Sargan test availability, and optional p-value guidance from an observed J statistic.
Understanding the Error: Cannot Calculate Sargan Test with Dropped Variables
The message "cannot calculate sargan test with dropped variables" usually appears in instrumental variables estimation when the software finds that, after one or more variables were dropped during estimation, the model no longer has any overidentifying restrictions to test. In practice, the most common reasons are perfect multicollinearity, duplicate instruments, a lack of variation in one or more instruments, or automatic omission caused by missing data patterns. The issue matters because the Sargan test is not defined unless the model is overidentified. If your instrument count falls to the same number as your endogenous regressors, the model is exactly identified. If it falls below that number, the model is underidentified. In both cases, there is no valid Sargan overidentification test to compute.
At a technical level, the Sargan statistic tests whether the sample moment conditions implied by the instruments are jointly consistent with the maintained exogeneity assumptions. Under the null hypothesis that all instruments are valid, the test statistic is asymptotically chi-square distributed with degrees of freedom equal to:
degrees of freedom = effective number of excluded instruments - number of endogenous regressors
That formula is exactly why dropped variables matter. If you begin with six excluded instruments and two endogenous regressors, you have four overidentifying restrictions. But if four instruments are dropped because they are collinear with other regressors or have no usable variation, your effective instrument count falls to two. The model becomes exactly identified, the degrees of freedom become zero, and the Sargan test disappears. The error message is therefore not random. It is the software telling you that the test cannot be formed from the model you actually estimated.
Why the Sargan test requires overidentification
An overidentification test asks whether the extra instruments, beyond the minimum needed for identification, are mutually consistent with the structural error process. If you have only the minimum number of instruments, there are no extra moment conditions left over to test. This is why exactly identified models do not have a Sargan test. The estimator can still be computed, but the overidentification diagnostic cannot.
- Overidentified model: effective instruments > endogenous regressors. Sargan test can usually be computed.
- Exactly identified model: effective instruments = endogenous regressors. No overidentification test is available.
- Underidentified model: effective instruments < endogenous regressors. The equation is not identified, so the structural coefficients cannot be consistently estimated and no overidentification test exists.
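The three cases above reduce to simple arithmetic. The following is a minimal Python sketch of that check, not the calculator's actual code; the function name `identification_status` and its arguments are hypothetical and chosen only for illustration.

```python
def identification_status(total_instruments, dropped, endogenous):
    """Classify an IV model from its post-drop instrument count.

    All names are illustrative; the counts refer to excluded instruments.
    """
    if min(total_instruments, dropped, endogenous) < 0:
        raise ValueError("counts must be non-negative")
    if dropped > total_instruments:
        raise ValueError("cannot drop more instruments than were specified")

    effective = total_instruments - dropped
    df = effective - endogenous  # number of overidentifying restrictions

    if df > 0:
        status = "overidentified"
    elif df == 0:
        status = "exactly identified"
    else:
        status = "underidentified"

    return {
        "effective_instruments": effective,
        "df": df,
        "status": status,
        "sargan_available": df > 0,
    }


# Six instruments, four of them dropped, two endogenous regressors.
print(identification_status(6, 4, 2))
```

With six instruments, four dropped, and two endogenous regressors, the sketch reports an exactly identified model with zero degrees of freedom, so no Sargan test is available.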
What counts as a dropped variable?
Software packages may drop variables for several reasons, and not all of them are obvious from the command syntax alone. In an IV setting, a dropped variable may be:
- An instrument perfectly predicted by another instrument or by included regressors.
- A dummy variable made redundant by a full set of fixed effects.
- An interaction term that collapses because one component has no variation in the estimation sample.
- An instrument lost because observations with missing values are excluded, leaving no variation after listwise deletion.
- A variable omitted after transformations such as differencing, demeaning, or absorption of fixed effects.
For this reason, you should always inspect the estimation log rather than relying only on the original model specification. Your command may have requested eight instruments, but the estimated system may effectively use only five.
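One way to recount instruments by hand is to compare matrix ranks: the number of excluded instruments that survive equals the rank of the combined matrix of included exogenous regressors and instruments, minus the rank of the included regressors alone. The sketch below assumes this rank-difference rule and uses a small pure-Python Gaussian elimination. The names `matrix_rank` and `effective_instruments` and the toy data are hypothetical, and real packages apply their own tolerances and dropping rules, so treat the result as a cross-check on the estimation log rather than a replacement for it.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a list-of-rows matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def effective_instruments(included, excluded):
    """Count excluded instruments that are not linear combinations of the
    included regressors or of each other (rank-difference rule)."""
    combined = [x + z for x, z in zip(included, excluded)]
    return matrix_rank(combined) - matrix_rank(included)


# Toy data: an intercept is the only included regressor, and the third
# requested instrument is z1 + z2, so it is perfectly collinear.
z1 = [1, 2, 3, 4, 5, 6]
z2 = [2, 1, 4, 3, 6, 5]
included = [[1] for _ in z1]
excluded = [[a, b, a + b] for a, b in zip(z1, z2)]

requested = 3
kept = effective_instruments(included, excluded)
print(f"requested={requested}, effective={kept}, dropped={requested - kept}")
```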
How to interpret the calculator results
The calculator above reports four core quantities. First, it computes the effective instrument count by subtracting dropped variables from the total excluded instruments entered. Second, it computes the overidentification degrees of freedom. Third, it determines whether the Sargan test is available. Fourth, if you enter an observed test statistic, it approximates the p-value and compares your statistic with the appropriate chi-square critical value.
Suppose you have 7 excluded instruments, 3 endogenous regressors, and 2 dropped instruments. The effective count is 5, so the degrees of freedom are 5 minus 3, or 2, and the Sargan test is available. If your observed J statistic were 6.10, then at the 5% level you would compare it against the chi-square critical value for 2 degrees of freedom, which is 5.99. Because 6.10 is larger (a p-value of about 0.047), you would reject the null hypothesis that the instruments are jointly valid at the 5% level. That does not prove any one instrument is invalid, but it signals inconsistency between the overidentifying restrictions and the model assumptions.
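The comparison above can be reproduced without a statistics library, because the chi-square upper-tail probability has a closed form for integer degrees of freedom. The sketch below is one way to run that check, not the calculator's own code; `chi2_sf` and `sargan_decision` are hypothetical names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    # Start from the smallest case with a simple closed form ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # df = 1
    # ... then raise the shape by one at a time using
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def sargan_decision(j_stat, effective_instruments, endogenous, alpha=0.05):
    """Return (df, p_value, reject), or None when the test is unavailable."""
    df = effective_instruments - endogenous
    if df <= 0:
        return None  # exactly identified or underidentified
    p_value = chi2_sf(j_stat, df)
    return df, p_value, p_value < alpha


print(sargan_decision(6.10, effective_instruments=5, endogenous=3))
```

For the example in the text, this gives 2 degrees of freedom and a p-value of roughly 0.047, which agrees with the critical-value comparison.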
Critical chi-square values commonly used for Sargan tests
The Sargan test relies on the chi-square distribution. The table below provides the standard upper-tail critical values, rounded to two decimals, for selected degrees of freedom.
| Degrees of freedom | 10% critical value | 5% critical value | 1% critical value |
|---|---|---|---|
| 1 | 2.71 | 3.84 | 6.63 |
| 2 | 4.61 | 5.99 | 9.21 |
| 3 | 6.25 | 7.81 | 11.34 |
| 4 | 7.78 | 9.49 | 13.28 |
| 5 | 9.24 | 11.07 | 15.09 |
| 10 | 15.99 | 18.31 | 23.21 |
These values help explain why the degrees of freedom matter so much. A model with one overidentifying restriction faces a 5% cutoff of 3.84, while a model with ten restrictions faces 18.31. In other words, the same observed test statistic can lead to a very different conclusion depending on how many effective instruments survive the estimation process.
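To see this dependence concretely, the sketch below hard-codes the 5% column of the table and checks one hypothetical statistic, J = 6.10, against each row. The dictionary and function names are illustrative only.

```python
# 5% critical values copied from the table above (rounded to two decimals).
CRITICAL_5PCT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 10: 18.31}


def rejects_at_5pct(j_stat, df):
    """True when the J statistic exceeds the tabulated 5% cutoff for df."""
    if df not in CRITICAL_5PCT:
        raise KeyError(f"no tabulated critical value for df={df}")
    return j_stat > CRITICAL_5PCT[df]


j_stat = 6.10  # hypothetical observed statistic
for df in sorted(CRITICAL_5PCT):
    verdict = "reject" if rejects_at_5pct(j_stat, df) else "do not reject"
    print(f"df={df:>2}: cutoff={CRITICAL_5PCT[df]:>5.2f} -> {verdict}")
```

The same statistic leads to rejection with one or two degrees of freedom but not with three or more, which is why the post-drop instrument count has to be right before the statistic is interpreted.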
Typical scenarios that trigger the error
Many users encounter this problem after adding fixed effects, high-dimensional controls, or multiple lags as instruments. The command runs, coefficients are reported, but the overidentification test is absent or replaced with an error. Here are several common scenarios:
| Scenario | Total instruments | Dropped | Endogenous regressors | Effective instruments | Sargan available? |
|---|---|---|---|---|---|
| Simple overidentified IV | 5 | 0 | 2 | 5 | Yes, df = 3 |
| Collinearity after fixed effects | 5 | 3 | 2 | 2 | No, exactly identified |
| Weakly varying lag instruments | 6 | 4 | 3 | 2 | No, underidentified |
| Moderate instrument loss | 8 | 2 | 3 | 6 | Yes, df = 3 |
Sargan versus Hansen J
Although the error message often mentions the Sargan test specifically, many applied researchers also use the Hansen J test. The distinction matters. The classic Sargan test assumes homoskedasticity. The Hansen J test is the heteroskedasticity-robust version commonly reported after GMM estimation. However, both rely on overidentifying restrictions. If dropped variables eliminate those restrictions, neither test can be meaningfully computed. So while the robust version differs in assumptions, it does not solve the identification arithmetic.
How to fix the problem
If your software says it cannot calculate the Sargan test with dropped variables, the right response is to diagnose the source of the dropped instruments rather than forcing the test. Work through the following checklist:
- Read the estimation log carefully. Identify exactly which variables were dropped.
- Check for perfect collinearity. This is especially common with dummy sets, absorbed effects, and interaction terms.
- Verify variation in the estimation sample. An instrument may vary in the raw dataset but become constant after subsetting.
- Recount effective instruments. Use the post-drop count, not the original specification.
- Reduce unnecessary instrument proliferation. Too many mechanically generated instruments can create redundancy and poor finite-sample behavior.
- Reconsider the specification. You may need additional valid instruments or fewer endogenous variables.
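The variation check in the list above can be scripted before estimation. The following sketch assumes the data are held as a list of dictionaries with `None` marking missing values; `estimation_sample`, `constant_instruments`, and the toy variables are hypothetical.

```python
def estimation_sample(rows, variables):
    """Listwise deletion: keep rows where every model variable is observed."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def constant_instruments(rows, instruments):
    """Instruments that take a single value in the given sample."""
    return [z for z in instruments if len({r[z] for r in rows}) <= 1]


# Toy data: z2 varies in the raw data, but only in a row where y is missing.
raw = [
    {"y": 1.0, "x": 0.5, "z1": 1, "z2": 0},
    {"y": 2.0, "x": 0.7, "z1": 2, "z2": 0},
    {"y": None, "x": 0.9, "z1": 3, "z2": 1},
    {"y": 3.0, "x": 1.1, "z1": 4, "z2": 0},
]
instruments = ["z1", "z2"]

sample = estimation_sample(raw, ["y", "x"] + instruments)
print("rows kept:", len(sample))
print("no variation in sample:", constant_instruments(sample, instruments))
```

Here `z2` looks usable in the raw data but is constant once the incomplete row is removed, so it would be dropped and should not be counted as an effective instrument.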
It is also good practice to report in your paper or replication notes that certain instruments were dropped and that the model became exactly identified or underidentified. This is much clearer than saying only that the overidentification test was not reported.
Why sample size still matters even though it does not determine availability
Sample size does not change whether the Sargan test exists. That depends only on the relationship between effective instruments and endogenous regressors. But sample size affects reliability. With small samples, asymptotic chi-square approximations can be poor, and overidentification tests may be unstable. With very large samples, even small misspecifications may lead to rejection. So the existence of the test and the usefulness of the test are related but not identical concepts.
Bottom line
The phrase "cannot calculate sargan test with dropped variables" almost always means your final estimated model lacks enough surviving instruments to form overidentifying restrictions. The key diagnostic is simple: count the effective instruments that remain after estimation, subtract the number of endogenous regressors, and inspect the result. If the difference is zero or negative, the test is unavailable by construction. If the difference is positive, the test can usually be computed, and then the question becomes whether the observed statistic is large relative to the chi-square benchmark. The calculator on this page gives you a fast way to perform that check before you spend time debugging output that is behaving exactly as econometric theory predicts.