xgboost feature importance

December 1, 2018

This post covers extracting feature (variable) importance from an xgboost model and writing a function that creates a ggplot object for it. I draw on the simplicity of Chris Albon’s post; if you want to do the same steps in Python, I recommend his post.

Feature Importance by Chris Albon

If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the flashcard above, impurity describes how mixed the classes are within a node: a node that contains only one class has zero impurity. A feature’s impurity-based importance is the total decrease in impurity from the splits that use that feature. So here we are measuring how much a feature helped in classifying or predicting the outcome.
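To make impurity concrete, here is a small sketch that uses Gini impurity, one common impurity measure (entropy is another). The helpers `gini()` and `gini_decrease()` are hypothetical names written for this illustration. They compute the per-split quantity that impurity-based importance adds up for each feature. xgboost's own Gain measure is analogous, but it is the reduction in the model's gradient-based loss rather than in Gini impurity.

```r
# Gini impurity of a node: 1 - sum over classes of p_k^2,
# where p_k is the share of observations in class k
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Impurity decrease from splitting a node's labels `y` into two children.
# `goes_left` is a logical vector marking observations sent to the left child.
gini_decrease <- function(y, goes_left) {
  n <- length(y)
  left <- y[goes_left]
  right <- y[!goes_left]
  gini(y) -
    (length(left) / n) * gini(left) -
    (length(right) / n) * gini(right)
}

y <- c("M", "M", "M", "R", "R", "R")
gini(y)                                                      # 0.5
gini_decrease(y, c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))   # 0.5, a perfect split
gini_decrease(y, c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE))  # 0.25
```

A feature that keeps producing large decreases like the perfect split above accumulates a high importance score.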

This example draws on the built-in Sonar data set from the mlbench package.

Prepping the Environment

library(caret)
library(xgboost)
library(tidyverse)

Loading the data

data("Sonar", package = "mlbench")

Train the model

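# Fit an xgboost model through caret. With no trControl or tuneGrid given,
# caret uses 25 bootstrap resamples and its default grid for xgbLinear.
xgb_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "xgbLinear")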

xgb_fit

## eXtreme Gradient Boosting 
## 
## 208 samples
##  60 predictor
##   2 classes: 'M', 'R' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 208, 208, 208, 208, 208, 208, ... 
## Resampling results across tuning parameters:
## 
##   lambda  alpha  nrounds  Accuracy   Kappa    
##   0e+00   0e+00   50      0.7954411  0.5856736
##   0e+00   0e+00  100      0.7955067  0.5859968
##   0e+00   0e+00  150      0.7955067  0.5859968
##   0e+00   1e-04   50      0.7931161  0.5807902
##   0e+00   1e-04  100      0.7896880  0.5736899
##   0e+00   1e-04  150      0.7896880  0.5736899
##   0e+00   1e-01   50      0.7974045  0.5899654
##   0e+00   1e-01  100      0.8007978  0.5965433
##   0e+00   1e-01  150      0.8018652  0.5987027
##   1e-04   0e+00   50      0.7936100  0.5817500
##   1e-04   0e+00  100      0.7902008  0.5746993
##   1e-04   0e+00  150      0.7902008  0.5746993
##   1e-04   1e-04   50      0.7916874  0.5777943
##   1e-04   1e-04  100      0.7883283  0.5708511
##   1e-04   1e-04  150      0.7883283  0.5708511
##   1e-04   1e-01   50      0.7974045  0.5899654
##   1e-04   1e-01  100      0.8007978  0.5965433
##   1e-04   1e-01  150      0.8018652  0.5987027
##   1e-01   0e+00   50      0.7937810  0.5824365
##   1e-01   0e+00  100      0.7958099  0.5863334
##   1e-01   0e+00  150      0.7958099  0.5863334
##   1e-01   1e-04   50      0.7953707  0.5854209
##   1e-01   1e-04  100      0.7963228  0.5873658
##   1e-01   1e-04  150      0.7963228  0.5873658
##   1e-01   1e-01   50      0.7987849  0.5923712
##   1e-01   1e-01  100      0.8034709  0.6018293
##   1e-01   1e-01  150      0.8049729  0.6047501
## 
## Tuning parameter 'eta' was held constant at a value of 0.3
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, lambda =
##  0.1, alpha = 0.1 and eta = 0.3.

Extract feature importance

Because we trained the model with caret, we can extract feature importance either with caret’s built-in varImp() function or with xgb.importance() from the xgboost package. We will do both.

`caret` feature importance

caret_imp <- varImp(xgb_fit)
caret_imp

## xgbLinear variable importance
## 
##   only 20 most important variables shown (out of 60)
## 
##     Overall
## V11 100.000
## V45  26.941
## V16  24.354
## V21  21.665
## V51  18.798
## V4   18.140
## V48  14.366
## V9   12.607
## V31  12.489
## V27  12.347
## V15  11.269
## V34   9.125
## V37   8.805
## V20   8.792
## V52   8.114
## V28   7.162
## V32   5.897
## V55   4.584
## V17   4.490
## V49   4.129

`xgboost` feature importance

xgb_imp <- xgb.importance(feature_names = xgb_fit$finalModel$feature_names,
                          model = xgb_fit$finalModel)

head(xgb_imp)

##    Feature       Gain      Cover  Frequency
## 1:     V11 0.25619824 0.12851517 0.03283582
## 2:     V45 0.06902206 0.04458378 0.03582090
## 3:     V16 0.06239349 0.04163116 0.01492537
## 4:     V21 0.05550596 0.03471532 0.02686567
## 5:     V51 0.04816044 0.04527492 0.05373134
## 6:      V4 0.04647540 0.03484923 0.03880597
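In the xgb.importance() output, Gain is each feature's share of the total loss reduction from splits on that feature. Cover is roughly its share of the observations those splits affect, and Frequency is its share of all splits. The caret numbers above look like the Gain column rescaled so that the top feature scores 100. By default, caret's varImp() rescales importance as (x - min) / (max - min) * 100. The sketch below checks this with the six printed Gain values. It assumes that the smallest Gain across all 60 features is 0, which would be the case if some feature is never used in a split. Under that assumption the rescaling reduces to Gain / max * 100.

```r
# Gain values printed by head(xgb_imp) above
gain <- c(V11 = 0.25619824, V45 = 0.06902206, V16 = 0.06239349,
          V21 = 0.05550596, V51 = 0.04816044, V4  = 0.04647540)

# Rescale so the most important feature scores 100.
# Assumption: the minimum Gain over all features is 0, so
# (gain - min) / (max - min) * 100 becomes gain / max * 100.
scaled <- gain / max(gain) * 100
round(scaled, 3)
# V11 100.000, V45 26.941, V16 24.354, V21 21.665, V51 18.798, V4 18.140
```

These values match the Overall column from varImp(), so the two approaches rank the features identically on Gain. They differ only in scale.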

Plotting feature importance

`caret`

You have a few options for plotting feature importance. With caret, you can call plot() on the saved importance object, or call ggplot() on it to get a ggplot object that you can extend with layers such as a theme:

plot(caret_imp)

ggplot(caret_imp) +
  theme_minimal()

`xgboost`

You can use xgboost’s base-graphics plotting function:

xgb.plot.importance(xgb_imp)

Or use its ggplot2 version, which returns a ggplot object:

xgb.ggplot.importance(xgb_imp)
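The post set out to build a function that returns a ggplot object, so here is a minimal sketch of one. The names `top_importance()` and `plot_importance()`, and their `measure` and `top_n` arguments, are hypothetical choices for this example. They are not part of caret or xgboost. The input needs a `Feature` column and a numeric column named by `measure`, as `xgb_imp` has. The sketch also assumes ggplot2 3.3.0 or later, which draws horizontal bars when y is discrete and x is continuous.

```r
library(ggplot2)

# Hypothetical helper: keep the top_n rows by `measure` and order the
# Feature factor for plotting. `imp` needs a Feature column plus the
# numeric column named by `measure` (e.g. Gain, Cover, or Frequency).
top_importance <- function(imp, measure = "Gain", top_n = 20) {
  imp <- as.data.frame(imp)
  imp <- imp[order(imp[[measure]], decreasing = TRUE), , drop = FALSE]
  imp <- head(imp, top_n)
  # A discrete y axis draws its first level at the bottom, so reverse the
  # sorted order to put the most important feature at the top
  feats <- as.character(imp$Feature)
  imp$Feature <- factor(feats, levels = rev(feats))
  imp
}

# Hypothetical helper: return a ggplot object with horizontal bars that
# can be extended with further layers, themes, or labels
plot_importance <- function(imp, measure = "Gain", top_n = 20) {
  ggplot(top_importance(imp, measure, top_n),
         aes(x = .data[[measure]], y = Feature)) +
    geom_col() +
    labs(x = measure, y = NULL) +
    theme_minimal()
}

# Usage with the six rows printed by head(xgb_imp) above
imp_head <- data.frame(
  Feature   = c("V11", "V45", "V16", "V21", "V51", "V4"),
  Gain      = c(0.25619824, 0.06902206, 0.06239349,
                0.05550596, 0.04816044, 0.04647540),
  Frequency = c(0.03283582, 0.03582090, 0.01492537,
                0.02686567, 0.05373134, 0.03880597)
)
p <- plot_importance(imp_head, measure = "Frequency", top_n = 3)
# print(p) draws the chart
```

With the full model you would call `plot_importance(xgb_imp)`. To reuse the function for caret's output, build the table from `caret_imp$importance`, for example `plot_importance(data.frame(Feature = rownames(caret_imp$importance), Overall = caret_imp$importance$Overall), measure = "Overall")`.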
