Cannot Calculate Sargan Test With Dropped Variables

Econometrics Diagnostic Calculator

Cannot Calculate Sargan Test with Dropped Variables Calculator

Use this tool to check whether your IV or GMM model remains overidentified after collinear or omitted instruments are dropped. The calculator estimates the effective number of instruments, overidentifying restriction degrees of freedom, Sargan test availability, and optional p-value guidance from an observed J statistic.

  • Total excluded instruments: Count the instruments intended to identify your endogenous regressors, excluding included exogenous regressors if your software reports excluded instruments separately.
  • Endogenous regressors: This is the number of variables treated as endogenous in the structural equation.
  • Dropped instruments: Enter how many relevant instruments or identifying variables were dropped by the estimation routine because of collinearity, missingness, or lack of variation.
  • Sample size: Sample size does not change the overidentification count directly, but it helps contextualize whether asymptotic tests are more stable.
  • Observed J statistic (optional): If you already have a test statistic, the calculator can estimate the p-value when the test is valid.
  • Significance level: Used to show the critical chi-square threshold for the resulting degrees of freedom.

Understanding the Error: Cannot Calculate Sargan Test with Dropped Variables

The message "cannot calculate sargan test with dropped variables" usually appears in instrumental variables estimation when your software detects that the model no longer has any overidentifying restrictions after one or more variables have been removed during estimation. In practice, the most common reasons are perfect multicollinearity, duplicate instruments, a lack of variation in one or more instruments, or automatic omission caused by missing data patterns. The issue matters because the Sargan test is not defined unless the model is overidentified. If your instrument count falls to the same number as your endogenous regressors, the model is exactly identified. If it falls below that number, the model is underidentified. In both cases, there is no valid Sargan overidentification test to compute.

At a technical level, the Sargan statistic tests whether the sample moments associated with the instruments are jointly consistent with the maintained exogeneity assumptions. The test statistic is asymptotically chi-square distributed with degrees of freedom equal to:

degrees of freedom = effective number of instruments – number of endogenous regressors

That formula is exactly why dropped variables matter. If you begin with six excluded instruments and two endogenous regressors, you have four overidentifying restrictions. But if four instruments are dropped because they are collinear with other regressors or have no usable variation, your effective instrument count falls to two. The model becomes exactly identified, the degrees of freedom become zero, and the Sargan test disappears. The error message is therefore not random. It is the software telling you that the test cannot be formed from the model you actually estimated.
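The arithmetic above can be written as a small helper. This is a minimal sketch in Python, not the code behind any particular package; the function name `overid_degrees_of_freedom` and its argument names are illustrative assumptions.

```python
def overid_degrees_of_freedom(total_instruments: int, dropped: int, endogenous: int) -> int:
    """Return (effective excluded instruments) minus (endogenous regressors)."""
    if min(total_instruments, dropped, endogenous) < 0:
        raise ValueError("counts must be non-negative")
    if dropped > total_instruments:
        raise ValueError("cannot drop more instruments than were specified")
    effective = total_instruments - dropped  # instruments that survive estimation
    return effective - endogenous


# Worked example from the text: six excluded instruments, two endogenous regressors.
print(overid_degrees_of_freedom(6, 0, 2))  # nothing dropped: 4 restrictions
print(overid_degrees_of_freedom(6, 4, 2))  # four dropped: 0, exactly identified
```

A result of zero or less means there are no overidentifying restrictions left to test.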

Why the Sargan test requires overidentification

An overidentification test asks whether the extra instruments, beyond the minimum needed for identification, are mutually consistent with the structural error process. If you have only the minimum number of instruments, there are no extra moment conditions left over to test. This is why exactly identified models do not have a Sargan test. The estimator can still be computed, but the overidentification diagnostic cannot.

  • Overidentified model: effective instruments > endogenous regressors. Sargan test can usually be computed.
  • Exactly identified model: effective instruments = endogenous regressors. No overidentification test is available.
  • Underidentified model: effective instruments < endogenous regressors. The equation is not identified, so no overidentification test exists.
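The three cases in this list map directly onto a comparison of two counts. The sketch below is one way to express that rule in Python; the function names are hypothetical, and the rule only checks the counting (order) condition, not instrument strength or validity.

```python
def identification_status(effective_instruments: int, endogenous: int) -> str:
    """Classify a model by comparing surviving instruments with endogenous regressors."""
    if effective_instruments > endogenous:
        return "overidentified"
    if effective_instruments == endogenous:
        return "exactly identified"
    return "underidentified"


def sargan_available(effective_instruments: int, endogenous: int) -> bool:
    """The overidentification test exists only in the overidentified case."""
    return identification_status(effective_instruments, endogenous) == "overidentified"


for effective, endogenous in [(5, 2), (2, 2), (2, 3)]:
    status = identification_status(effective, endogenous)
    print(effective, endogenous, status, sargan_available(effective, endogenous))
```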

What counts as a dropped variable?

Software packages may drop variables for several reasons, and not all of them are obvious from the command syntax alone. In an IV setting, a dropped variable may be:

  1. An instrument perfectly predicted by another instrument or by included regressors.
  2. A dummy variable made redundant by a full set of fixed effects.
  3. An interaction term that collapses because one component has no variation in the estimation sample.
  4. An instrument lost because observations with missing values are excluded, leaving no variation after listwise deletion.
  5. A variable omitted after transformations such as differencing, demeaning, or absorption of fixed effects.

For this reason, you should always inspect the estimation log rather than relying only on the original model specification. Your command may have requested eight instruments, but the estimated system may effectively use only five.

A practical rule: the relevant count for the Sargan test is not the number of instruments you intended to include, but the number that remain active after dropping, absorption, and sample restrictions.
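To illustrate how an instrument can vary in the raw data yet become constant in the estimation sample, here is a minimal sketch. It assumes the data are held as a list of dictionaries with `None` marking missing values; the variable names (`y`, `x`, `z1`, `z2`) and the helper `active_instruments` are made up for the example. It only detects instruments with no variation after listwise deletion, not collinearity.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def active_instruments(rows: List[Row], instruments: List[str], other_vars: List[str]) -> List[str]:
    """Return the instruments that still vary once incomplete rows are removed."""
    needed = list(other_vars) + list(instruments)
    # Listwise deletion: keep only rows where every model variable is observed.
    complete = [r for r in rows if all(r.get(v) is not None for v in needed)]
    active = []
    for name in instruments:
        distinct_values = {r[name] for r in complete}
        if len(distinct_values) > 1:  # a constant column carries no identifying variation
            active.append(name)
    return active


rows = [
    {"y": 1.0, "x": 0.5, "z1": 0.0, "z2": 1.0},
    {"y": 2.0, "x": 1.5, "z1": 1.0, "z2": 1.0},
    {"y": 3.0, "x": None, "z1": 2.0, "z2": 0.0},  # dropped: x is missing
    {"y": 4.0, "x": 2.5, "z1": 3.0, "z2": 1.0},
]

# z2 varies in the raw data, but is constant once the incomplete row is removed.
print(active_instruments(rows, ["z1", "z2"], ["y", "x"]))  # ['z1']
```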

How to interpret the calculator results

The calculator above reports four core quantities. First, it computes the effective instrument count by subtracting dropped variables from the total excluded instruments entered. Second, it computes the overidentification degrees of freedom. Third, it determines whether the Sargan test is available. Fourth, if you enter an observed test statistic, it approximates the p-value and compares your statistic with the appropriate chi-square critical value.

Suppose you have 7 instruments, 3 endogenous regressors, and 2 dropped instruments. The effective count is 5, so the degrees of freedom are 2. In that case the Sargan test is available because 5 minus 3 equals 2. If your observed J statistic were 6.10, then at the 5% level you would compare that against the chi-square critical value for 2 degrees of freedom, which is 5.99. Because 6.10 is larger, you would reject the null of instrument validity at the 5% level. That does not prove any one instrument is invalid, but it signals inconsistency between the overidentifying restrictions and the model assumptions.
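The p-value step in this example can be reproduced with the standard library alone. The sketch below is an assumption about how such a calculation might be done, not a description of the calculator's internals. It uses the closed-form upper-tail probability of the chi-square distribution for integer degrees of freedom; `chi2_sf` is a hypothetical name.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base cases: Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Example from the text: J = 6.10 with 2 degrees of freedom.
p_value = chi2_sf(6.10, 2)
print(f"p-value = {p_value:.4f}")
print("reject at 5%" if p_value < 0.05 else "do not reject at 5%")
```

With 2 degrees of freedom the p-value is simply exp(-J/2), which for J = 6.10 is a little under 0.05, consistent with the comparison against 5.99.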

Critical chi-square values commonly used for Sargan tests

The Sargan test relies on the chi-square distribution. The table below lists standard upper-tail critical values for selected degrees of freedom.

Degrees of freedom | 10% critical value | 5% critical value | 1% critical value
1 | 2.71 | 3.84 | 6.63
2 | 4.61 | 5.99 | 9.21
3 | 6.25 | 7.81 | 11.34
4 | 7.78 | 9.49 | 13.28
5 | 9.24 | 11.07 | 15.09
10 | 15.99 | 18.31 | 23.21

These values help explain why the degrees of freedom matter so much. A model with one overidentifying restriction faces a 5% cutoff of 3.84, while a model with ten restrictions faces 18.31. In other words, the same observed test statistic can lead to a very different conclusion depending on how many effective instruments survive the estimation process.
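To make the point concrete, the following sketch compares one statistic against the 5% cutoffs for several degrees of freedom. The cutoffs are the standard chi-square values listed in this section; the statistic 6.10 and the names `CRITICAL_5PCT` and `rejects_at_5pct` are chosen purely for illustration.

```python
# 5% chi-square critical values for selected degrees of freedom.
CRITICAL_5PCT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 10: 18.31}


def rejects_at_5pct(j_stat: float, df: int) -> bool:
    """True when the statistic exceeds the 5% cutoff for the given df."""
    return j_stat > CRITICAL_5PCT[df]


j_stat = 6.10
for df in sorted(CRITICAL_5PCT):
    decision = "reject" if rejects_at_5pct(j_stat, df) else "do not reject"
    print(f"df={df:>2}  critical={CRITICAL_5PCT[df]:>5.2f}  {decision}")
```

The same statistic leads to rejection with one or two surviving restrictions and to non-rejection with three or more.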

Typical scenarios that trigger the error

Many users encounter this problem after adding fixed effects, high-dimensional controls, or multiple lags as instruments. The command runs, coefficients are reported, but the overidentification test is absent or replaced with an error. Here are several common scenarios:

Scenario | Total instruments | Dropped | Endogenous regressors | Effective instruments | Sargan available?
Simple overidentified IV | 5 | 0 | 2 | 5 | Yes, df = 3
Collinearity after fixed effects | 5 | 3 | 2 | 2 | No, exactly identified
Weakly varying lag instruments | 6 | 4 | 3 | 2 | No, underidentified
Moderate instrument loss | 8 | 2 | 3 | 6 | Yes, df = 3

Sargan versus Hansen J

Although the error message often mentions the Sargan test specifically, many applied researchers also use the Hansen J test. The distinction matters. The classic Sargan test assumes homoskedasticity. The Hansen J test is the heteroskedasticity-robust version commonly reported after GMM estimation. However, both rely on overidentifying restrictions. If dropped variables eliminate those restrictions, neither test can be meaningfully computed. So while the robust version differs in assumptions, it does not solve the identification arithmetic.

How to fix the problem

If your software says it cannot calculate the Sargan test with dropped variables, the right response is to diagnose the source of the dropped instruments rather than forcing the test. Work through the following checklist:

  1. Read the estimation log carefully. Identify exactly which variables were dropped.
  2. Check for perfect collinearity. This is especially common with dummy sets, absorbed effects, and interaction terms.
  3. Verify variation in the estimation sample. An instrument may vary in the raw dataset but become constant after subsetting.
  4. Recount effective instruments. Use the post-drop count, not the original specification.
  5. Reduce unnecessary instrument proliferation. Too many mechanically generated instruments can create redundancy and weak finite-sample behavior.
  6. Reconsider the specification. You may need additional valid instruments or fewer endogenous variables.
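Steps 2 and 4 of the checklist can be approximated outside your estimation software by checking the rank of the instrument matrix. The sketch below is a pure-Python illustration under stated assumptions: variables are passed as lists of columns, the tolerance `tol` is an arbitrary absolute threshold that suits data of moderate scale, and the helper names are hypothetical. Your package's own collinearity rule may differ.

```python
from typing import List


def matrix_rank(columns: List[List[float]], tol: float = 1e-9) -> int:
    """Rank of the matrix with the given columns, via Gaussian elimination."""
    rows = [list(r) for r in zip(*columns)]  # row-major working copy
    n_cols = len(columns)
    rank = 0
    for col in range(n_cols):
        # Partial pivoting: largest remaining entry in this column.
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue  # column is a linear combination of earlier columns
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def effective_excluded_instruments(exog: List[List[float]], instruments: List[List[float]]) -> int:
    """Instruments that add rank beyond the included exogenous regressors."""
    return matrix_rank(exog + instruments) - matrix_rank(exog)


const = [1.0, 1.0, 1.0, 1.0]
z1 = [1.0, 2.0, 3.0, 4.0]
z2 = [2.0, 1.0, 4.0, 3.0]
z3 = [a + b for a, b in zip(z1, z2)]  # perfectly collinear with z1 and z2

# Three instruments were specified, but only two are linearly independent.
print(effective_excluded_instruments([const], [z1, z2, z3]))  # 2
```

The count returned here is the one to compare against the number of endogenous regressors.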

It is also good practice to report in your paper or replication notes that certain instruments were dropped and that the model became exactly identified or underidentified. This is much clearer than saying only that the overidentification test was not reported.

Why sample size still matters even though it does not determine availability

Sample size does not change whether the Sargan test exists. That depends only on the relationship between effective instruments and endogenous regressors. But sample size affects reliability. With small samples, asymptotic chi-square approximations can be poor, and overidentification tests may be unstable. With very large samples, even small misspecifications may lead to rejection. So the existence of the test and the usefulness of the test are related but not identical concepts.

Bottom line

The phrase "cannot calculate sargan test with dropped variables" almost always means your final estimated model lacks enough surviving instruments to form overidentifying restrictions. The key diagnostic is simple: count the effective instruments that remain after estimation, subtract the number of endogenous regressors, and inspect the result. If the difference is zero or negative, the test is unavailable by construction. If the difference is positive, the test can usually be computed, and then the question becomes whether the observed statistic is large relative to the chi-square benchmark. The calculator on this page gives you a fast way to perform that check before you spend time debugging output that is behaving exactly as econometric theory predicts.
