machine learning - Random Forest: mismatch between %IncMSE and %NodePurity


I have performed a random forest analysis of 100,000 classification trees on a rather small dataset (i.e. 28 obs. of 11 variables).

I made a plot of the variable importance.

In the resulting plots there is a substantial mismatch between %IncMSE and IncNodePurity for at least one of the important variables. The variable in fact appears seventh in importance in the former (i.e. %IncMSE<0) but third in the latter.

Could anyone enlighten me on how I should interpret this mismatch?

The variable in question is correlated with one other variable, which appears consistently in second place in both graphs. Could this be a clue?

The first graph shows how much the MSE increases if a variable's values are replaced by a random permutation of them. The higher the value, the higher the variable importance.
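The permutation idea can be illustrated with a minimal Python sketch. It is a simplification and not the randomForest implementation: it uses a fixed, hypothetical `predict` function in place of a fitted forest, evaluates on the full data rather than on out-of-bag samples, and does not scale the result. All names here are illustrative assumptions.

```python
import random

def mse(predict, X, y):
    """Mean squared error of `predict` over rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, col, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling the values in column `col`."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the rows with only column `col` permuted
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Toy data (28 rows, as in the question): y depends only on column 0
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(28)]
y = [3.0 * row[0] for row in X]

# Hypothetical stand-in for a fitted model; it ignores column 1
predict = lambda row: 3.0 * row[0]

imp_used = permutation_importance(predict, X, y, col=0)    # positive
imp_unused = permutation_importance(predict, X, y, col=1)  # exactly zero here
```

In this toy setting the unused column gets an importance of exactly zero. With a real forest evaluated on out-of-bag data, an uninformative variable can come out slightly negative by chance, which is one way a value such as %IncMSE<0 can arise.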

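On the other hand, IncNodePurity is the total decrease in node impurity from splitting on the variable, averaged over all trees. For classification, node impurity is measured by the Gini index; for regression, it is measured by the residual sum of squares (RSS), i.e. the difference between the RSS before and after the split on that variable.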
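The RSS-based version of this measure can be sketched for a single split as follows. This is only an illustration under simplifying assumptions: one node, one threshold, and hypothetical helper names (`rss`, `rss_decrease`). A forest would accumulate such decreases over every split made on the variable and average them over all trees.

```python
def rss(values):
    """Residual sum of squares around the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def rss_decrease(x, y, threshold):
    """Drop in RSS when a node is split into x <= threshold and x > threshold."""
    left = [t for v, t in zip(x, y) if v <= threshold]
    right = [t for v, t in zip(x, y) if v > threshold]
    return rss(y) - (rss(left) + rss(right))

# Toy node: splitting at 2.5 separates the two response levels perfectly
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 5.0, 5.0]
gain = rss_decrease(x, y, threshold=2.5)
print(gain)  # 16.0
```

Because this quantity is computed on the data used to grow the trees, a variable can score well on it simply by being chosen for many splits, whereas the permutation measure asks whether predictions actually get worse without it.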

Since the concept behind the variable importance criterion is different in the two cases, you can get different rankings for different variables.

There is no fixed criterion for selecting the "best" measure of variable importance; it depends on the problem you have at hand.

