Machine learning explainability: Permutation importance

Som
Dec 31, 2022
Feature importance plot

Feature importance

Knowing which features matter most to a model helps us understand how it works. Permutation importance is one way to measure feature importance. It works by randomly shuffling the values of a single feature and then measuring how much the model's performance (for example, its accuracy) drops as a result. This tells us how much the model relies on each feature across the whole dataset, rather than for any single prediction.

Why use permutation importance rather than other approaches

  • Fast to calculate
  • Widely used and understood
  • Consistent with the properties we would want a feature importance measure to have

Permutation importance

It is computed after the model has been fitted, and it answers the following question: if we randomly shuffle a single column of the validation data, leaving the target and all other columns in place, how would that affect the accuracy of the predictions on the now-shuffled data?
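In symbols (the notation is ours, not the post's): let s be the model's score, such as accuracy, on the original validation data, and let s_{k,j} be its score after the k-th random shuffle of column j. The permutation importance of feature j is then the average drop in score over K shuffles:

```latex
I_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}
```

With K = 1 this is the single shuffle described here; eli5 and scikit-learn repeat the shuffle several times and average the results. When the metric is a loss, where lower is better, the sign flips, so the importance is the shuffled loss minus the original loss.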

Steps for calculating the permutation importance

  1. Get a trained model.
  2. Shuffle the values in a single column, and make predictions using the resulting dataset. Use these predictions and the true target values to calculate how much the loss function worsened because of the shuffling. That performance deterioration measures the importance of the variable you just shuffled.
  3. Return the data to its original order (undoing the shuffle from step 2). Repeat step 2 with the next column in the dataset, until you have calculated the importance of each column.
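The three steps above can be sketched from scratch in Python. This is a minimal sketch, not eli5's implementation: the helper name permutation_importance_manual, the synthetic data (the target depends strongly on column 0, weakly on column 1 and not at all on column 2), the LinearRegression model and mean absolute error as the loss are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def permutation_importance_manual(model, X_val, y_val, loss_fn, seed=0):
    """Hypothetical helper: increase in loss after shuffling each column once."""
    rng = np.random.default_rng(seed)
    baseline_loss = loss_fn(y_val, model.predict(X_val))
    importances = []
    for col in range(X_val.shape[1]):
        # Step 2: shuffle one column of a copy; the target and other columns stay in place
        X_shuffled = X_val.copy()
        X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
        shuffled_loss = loss_fn(y_val, model.predict(X_shuffled))
        # How much the loss got worse because of the shuffle
        importances.append(shuffled_loss - baseline_loss)
        # Step 3: X_val was never modified, so the next column starts from the original order
    return np.array(importances)


# Step 1: get a trained model (synthetic data, assumed for illustration)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = 5 * X[:, 0] + 1 * X[:, 1] + rng.normal(scale=0.1, size=1000)  # column 2 is unused
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

imp = permutation_importance_manual(model, X_val, y_val, mean_absolute_error)
for name, score in zip(["x0", "x1", "x2"], imp):
    print(f"{name}: {score:.3f}")
```

Column x0 should come out far ahead, x1 a distant second and x2 close to zero, matching how the synthetic target was built. Shuffling a copy instead of the original array is what makes step 3 automatic.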
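```python
import eli5
from eli5.sklearn import PermutationImportance

# my_model: an already-fitted scikit-learn estimator (assumed defined earlier)
# val_X, val_y: validation features (a pandas DataFrame) and target
perm = PermutationImportance(my_model, random_state=1).fit(val_X, val_y)
# Render the importances as an HTML table (displays in a Jupyter notebook)
eli5.show_weights(perm, feature_names=val_X.columns.tolist())
```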
show_weights by eli5
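If eli5 is not available in your environment, scikit-learn ships its own implementation, sklearn.inspection.permutation_importance. To keep the sketch self-contained, it swaps in scikit-learn's bundled diabetes dataset and a random forest as stand-ins for the post's my_model, val_X and val_y; those choices are assumptions for illustration only.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumed stand-ins for the post's my_model, val_X and val_y
X, y = load_diabetes(return_X_y=True, as_frame=True)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
my_model = RandomForestRegressor(n_estimators=100, random_state=1).fit(train_X, train_y)

# Shuffle each column 5 times; the score is the model's default metric (R^2 for regressors)
result = permutation_importance(my_model, val_X, val_y, n_repeats=5, random_state=1)

# Print features from most to least important, like the eli5 table
for i in result.importances_mean.argsort()[::-1]:
    print(f"{val_X.columns[i]:>4}  {result.importances_mean[i]:.4f} ± {result.importances_std[i]:.4f}")
```

Here importances_mean is the average drop in score over the repeated shuffles, and importances_std shows how much that drop varied from one shuffle to the next.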

The features at the top of the table are the most important, and those at the bottom are the least important. A negative importance means that the predictions on the shuffled data were more accurate than the predictions on the original data. This happens when the feature doesn't really matter (its true importance is close to zero), but random chance made the predictions on the shuffled data slightly more accurate.
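One way to see near-zero and negative values in practice is to add a column of pure random noise, which the target never uses, and check where it lands. This sketch assumes synthetic data, a LinearRegression model and scikit-learn's permutation_importance rather than eli5; the column names signal and noise are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "signal": rng.normal(size=1000),
    "noise": rng.normal(size=1000),  # the target never uses this column
})
y = 3 * X["signal"] + rng.normal(scale=0.5, size=1000)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(train_X, train_y)

# 10 shuffles per column; the score is R^2, the default for regressors
result = permutation_importance(model, val_X, val_y, n_repeats=10, random_state=0)
for name, mean, per_shuffle in zip(val_X.columns, result.importances_mean, result.importances):
    print(f"{name}: mean {mean:.4f}, min over shuffles {per_shuffle.min():.4f}")
```

The signal column should dominate, while the noise column's mean importance stays close to zero; individual shuffles of the noise column can dip below zero purely by chance.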

Thanks for reading my blog! 🤗 Follow me for more content, and don’t forget to say hi in the comments. It’s always encouraging to hear from readers. Have a great day! 🤗
