How to Calculate Variable Importance in Random Forest Python
Use this interactive calculator to estimate feature importance from either mean decrease in impurity (MDI) or permutation importance. Enter your feature values below, calculate normalized importance, and visualize the ranking instantly.
Random Forest Variable Importance Calculator
Choose your method, enter the relevant model outputs, and generate a ranked importance profile. For MDI, input the total impurity reduction attributed to each feature. For permutation importance, input your baseline score and the score after shuffling each feature.
How to calculate variable importance in random forest Python
Variable importance in a random forest tells you how much each feature contributes to a model’s predictive performance. In Python, the two most common approaches are mean decrease in impurity and permutation importance. Both methods are useful, but they answer slightly different questions. If you are building a model in scikit-learn and want a fast ranking of features used inside the trees, impurity-based importance is built in. If you want a more reliable measure of how much a feature affects prediction quality on validation data, permutation importance is usually the better choice.
At a practical level, a random forest works by creating many decision trees. Each tree is trained on a bootstrap sample of the data, and only a random subset of features is considered at each split. During training, each split reduces some impurity measure such as Gini impurity for classification or variance for regression. Features that repeatedly create large improvements across many trees receive higher importance scores. In Python, this information is exposed through the feature_importances_ attribute after fitting a RandomForestClassifier or RandomForestRegressor.
Two standard ways to measure importance
- MDI: Mean decrease in impurity. Fast, built into the fitted random forest, but biased toward continuous or high-cardinality variables.
- Permutation importance: Measures how much model performance drops when one feature is randomly shuffled. Slower, but often more faithful to real predictive value on holdout data.
If your goal is exploratory model interpretation, it is smart to compute both and compare them. If they agree on the top variables, your interpretation is usually more robust. If they disagree sharply, you may have multicollinearity, leakage, data imbalance, or a variable with many possible split points dominating the tree-building process.
Calculating built-in random forest importance in Python
The simplest workflow in Python uses scikit-learn. After training a random forest, you can read the importance vector directly:
- Split your data into training and testing sets.
- Fit a random forest model.
- Read model.feature_importances_.
- Map those values back to the original feature names.
- Sort descending to identify the strongest predictors.
Conceptually, the formula for impurity-based importance for a feature is the total weighted impurity reduction contributed by splits on that feature, summed over all trees, then normalized so all importances add to 1. In many tutorials, this is described as:
Importance(feature j) = sum of weighted impurity decreases from all splits using feature j / sum of decreases from all features
That means if a variable receives a score of 0.27, it accounts for 27% of the total impurity reduction tracked by the forest. The calculator above performs that same normalization when you choose the MDI method.
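The normalization described above can be sketched in a few lines of plain Python. The raw impurity decreases below are invented for illustration, and the helper name normalize_mdi is hypothetical rather than part of any library:

```python
# Hypothetical raw impurity decreases per feature, already summed over all trees.
raw_decrease = {"age": 54.0, "income": 27.0, "tenure": 10.0, "region": 6.0, "campaigns": 3.0}


def normalize_mdi(raw):
    """Divide each feature's total impurity decrease by the grand total."""
    total = sum(raw.values())
    if total == 0:
        # No split reduced impurity at all, so every importance is zero.
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


importance = normalize_mdi(raw_decrease)
for name, value in sorted(importance.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} {value:.2f}")
```

With these made-up inputs, income ends up with a score of 0.27, matching the 27% reading described above.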
Important: MDI is efficient and available immediately after training, but it can overstate the value of variables with more candidate split points. High-cardinality categorical features and continuous variables with many distinct values often appear more important than they truly are.
Example Python code for MDI
In practice, the workflow looks like this:
- Import pandas and the relevant scikit-learn model.
- Fit the random forest.
- Create a DataFrame from feature_names and feature_importances_.
- Sort descending and plot the top values.
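A minimal sketch of these steps follows. It assumes scikit-learn's bundled breast cancer dataset as a stand-in for your own data, and the hyperparameters (200 trees, a 25% test split, random_state=0) are illustrative choices rather than recommendations:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Example data only; substitute your own feature matrix and target.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the forest; the impurity-based importances are available right after fit().
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Map the importance vector back to feature names and sort descending.
mdi = (
    pd.DataFrame({"feature": X_train.columns, "importance": model.feature_importances_})
    .sort_values("importance", ascending=False)
    .reset_index(drop=True)
)
print(mdi.head(10))
```

If matplotlib is installed, mdi.head(10).plot.barh(x="feature", y="importance") is one way to plot the top values.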
Because the built-in metric is normalized, the scores in scikit-learn sum to 1, up to floating-point rounding. If you multiply by 100, you get a convenient percentage-based interpretation for a dashboard, report, or blog article.
How permutation importance is calculated
Permutation importance asks a more intuitive question: if you randomly destroy the relationship between one feature and the target, how much worse does the model perform? In Python, the process is:
- Train the random forest normally.
- Measure a baseline score on validation or test data.
- Shuffle one feature column while leaving all others unchanged.
- Predict again and recalculate the score.
- Compute the performance drop.
- Repeat several times and average the drop.
If you are using an accuracy-like metric where higher is better, the importance is:
Importance(feature j) = baseline score – shuffled score
If you are using an error metric where lower is better, such as RMSE, then the logic reverses:
Importance(feature j) = shuffled error – baseline error
The calculator above supports both metric directions. This matters because many people accidentally compute the sign backward when moving from classification metrics like AUC to regression metrics like MAE.
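To make the steps and the sign convention concrete, here is a hand-rolled sketch. The function manual_permutation_importance and its higher_is_better flag are hypothetical names introduced for this illustration, and scikit-learn's diabetes dataset stands in for real data:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def manual_permutation_importance(model, X, y, score_fn, higher_is_better=True,
                                  n_repeats=5, seed=0):
    """Average score deterioration when each column of the NumPy array X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffle one column only; all other features stay intact.
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            shuffled = score_fn(y, model.predict(X_shuffled))
            # Flip the subtraction for error metrics so that positive always means "worse".
            deltas.append(baseline - shuffled if higher_is_better else shuffled - baseline)
        importances[j] = np.mean(deltas)
    return importances


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

imp_r2 = manual_permutation_importance(model, X_val, y_val, r2_score, higher_is_better=True)
imp_rmse = manual_permutation_importance(model, X_val, y_val, rmse, higher_is_better=False)
print(np.round(imp_r2, 3))
print(np.round(imp_rmse, 3))
```

With either metric, a positive value means the model got worse when that feature was shuffled, which is the convention the two formulas above share.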
Why permutation importance is often preferred
- It evaluates importance on validation data, not just inside training splits.
- It aligns directly with predictive performance.
- It is model-agnostic and can be used beyond random forests.
- It can reveal when a feature adds little value despite being heavily used by the trees.
Its main downside is computational cost. If you have 200 features and repeat the shuffling 20 times per feature, that is 4,000 extra scoring passes over the evaluation data, which can become expensive for large datasets or slow models. Still, for most business and research datasets, the extra reliability is worth it.
MDI vs permutation importance comparison
| Criterion | MDI | Permutation Importance |
|---|---|---|
| Calculation source | Training-time split impurity reduction | Validation or test score change after shuffling |
| Runtime cost | Very low | Moderate to high |
| Bias toward high-cardinality features | Higher | Lower |
| Works for any model | No | Yes |
| Interpretation | Share of total impurity reduction | Expected drop in predictive performance |
A useful rule is this: use MDI for speed and an early sense of ranking, then confirm your conclusions with permutation importance on a true holdout set. This two-step workflow often gives the best balance between productivity and rigor.
Realistic interpretation example
Suppose you train a churn prediction model with five features: age, income, tenure, region, and campaign count. Your random forest returns these impurity importances: 0.36, 0.22, 0.18, 0.11, and 0.07. These sample values sum to 0.94, so renormalizing to percentages gives age 0.36 / 0.94 = 38.3% of the total measured importance. In other words, age is the strongest splitting variable in the trained forest.
Now imagine your holdout AUC is 0.910. When you permute each feature separately, the AUC becomes 0.842, 0.879, 0.891, 0.903, and 0.907. The drops are 0.068, 0.031, 0.019, 0.007, and 0.003. This ranking also places age first, but the result now reflects actual deterioration in model performance. In many applied settings, the permutation ranking is more trustworthy for communicating business impact or scientific importance.
| Feature | Sample MDI Score | Normalized MDI % | Baseline AUC | Shuffled AUC | Permutation Drop |
|---|---|---|---|---|---|
| age | 0.36 | 38.30% | 0.910 | 0.842 | 0.068 |
| income | 0.22 | 23.40% | 0.910 | 0.879 | 0.031 |
| tenure | 0.18 | 19.15% | 0.910 | 0.891 | 0.019 |
| region | 0.11 | 11.70% | 0.910 | 0.903 | 0.007 |
| campaigns | 0.07 | 7.45% | 0.910 | 0.907 | 0.003 |
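The arithmetic behind the table can be reproduced with a short script. The numbers are the sample values from this example, not output from a real model:

```python
features = ["age", "income", "tenure", "region", "campaigns"]
mdi_scores = [0.36, 0.22, 0.18, 0.11, 0.07]          # sample MDI scores from the table
baseline_auc = 0.910                                  # holdout AUC before shuffling
shuffled_auc = [0.842, 0.879, 0.891, 0.903, 0.907]    # AUC after shuffling each feature

# Renormalize the MDI scores so they sum to 100%.
total = sum(mdi_scores)
mdi_pct = [round(100 * score / total, 2) for score in mdi_scores]

# AUC is higher-is-better, so importance = baseline - shuffled.
drops = [round(baseline_auc - auc, 3) for auc in shuffled_auc]

for name, pct, drop in zip(features, mdi_pct, drops):
    print(f"{name:10s} {pct:6.2f}%  drop={drop:.3f}")
```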
Common mistakes when calculating feature importance
- Using training data for permutation importance: scores computed on examples the model has already seen can make overfit features look more important than they are on new data.
- Ignoring correlated features: if two variables carry the same signal, shuffling one may not reduce performance much because the other still stands in for it.
- Misreading normalized percentages as causal effects: importance shows contribution to predictions, not proof of causation.
- Overlooking negative permutation values: a slightly negative value can happen from noise and usually means the feature is weak or unstable.
- Forgetting metric direction: higher-is-better and lower-is-better metrics require different subtraction logic.
How to compute it in scikit-learn
In Python, random forest importance is most often computed with scikit-learn. For built-in importance, fit the model and use feature_importances_. For permutation importance, use sklearn.inspection.permutation_importance, which returns the mean importance and standard deviation across repeats. That standard deviation is very useful because it tells you whether a feature’s estimated importance is stable or noisy.
For example, a feature with mean importance 0.042 and standard deviation 0.003 is much more dependable than one with mean 0.011 and standard deviation 0.019. In a production workflow, you should not just rank by the mean. You should also inspect uncertainty, especially if the differences among lower-ranked variables are small.
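A short sketch of that check uses sklearn.inspection.permutation_importance. The dataset, the roc_auc scoring choice, and n_repeats=10 are assumptions made for illustration, and the "mean minus two standard deviations above zero" flag is a rule of thumb, not a formal test:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Example data only; substitute your own feature matrix and target.
data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0, stratify=data.target
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature n_repeats times on held-out data and record the AUC drop.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

summary = pd.DataFrame({
    "feature": X_test.columns,
    "mean": result.importances_mean,
    "std": result.importances_std,
}).sort_values("mean", ascending=False)

# Rule of thumb (an assumption): treat a feature as stable if mean - 2*std stays above zero.
summary["stable"] = (summary["mean"] - 2 * summary["std"]) > 0
print(summary.head(10))
```

The per-repeat values are also available in result.importances, which is useful for box plots of each feature's spread.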
Recommended workflow
- Clean missing values and encode categorical variables appropriately.
- Split data into train, validation, and test sets if possible.
- Train a tuned random forest with fixed random_state.
- Extract MDI from feature_importances_.
- Run permutation importance on validation or test data.
- Compare the rankings and inspect correlated features.
- Document the metric used and whether higher or lower values indicate better performance.
How many trees do you need?
There is no universal answer, but many practical random forest models use between 100 and 1000 trees. More trees generally stabilize feature importance estimates, although the gains diminish after a point. In benchmark-style examples, moving from 100 to 500 trees often reduces variability in rankings noticeably, while increasing from 500 to 2000 trees may produce smaller incremental benefits. If importance stability matters, rerun the model with different seeds and check whether your top features remain consistently near the top.
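One way to run that seed check is sketched below. It counts how often each feature lands in the top five by MDI across five seeds; the dataset, the seed list, the tree count, and the top-five cutoff are all arbitrary choices for illustration:

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Example data only; substitute your own feature matrix and target.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

top_k = 5
seeds = [0, 1, 2, 3, 4]
top_counts = Counter()

for seed in seeds:
    model = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    # Indices of the top_k features by impurity-based importance for this seed.
    top_idx = np.argsort(model.feature_importances_)[::-1][:top_k]
    top_counts.update(X.columns[top_idx])

# Features that appear in the top_k for every seed are the most stable.
for name, count in top_counts.most_common():
    print(f"{name:25s} in top {top_k} for {count}/{len(seeds)} seeds")
```

Features that appear for only one or two seeds are candidates for closer inspection before they are reported as important.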
Final takeaway
If you want to know how to calculate variable importance in random forest Python, the fastest answer is to fit the forest and inspect feature_importances_. If you want a stronger estimate of real predictive contribution, compute permutation importance on unseen data. Use MDI for speed, permutation for trust, and compare both when communicating feature impact to stakeholders. The calculator on this page helps you do the arithmetic behind each method, normalize the results, and visualize the ranking clearly.
Educational note: feature importance is a model interpretation aid, not a substitute for domain expertise, error analysis, bias auditing, or causal inference.