machine learning - Random Forest: mismatch between %IncMSE and %NodePurity

March 15, 2011

I have performed a random forest analysis with 100,000 classification trees on a rather small dataset (i.e. 28 observations of 11 variables).

I then made a plot of variable importance.

In the resulting plots there is a substantial mismatch between %IncMSE and IncNodePurity for at least one of the important variables. The variable in fact ranks seventh in importance in the former (i.e. %IncMSE < 0) but third in the latter.

Could anyone enlighten me on how I should interpret this mismatch?

The variable in question is correlated with one other variable, which appears consistently in second place in both graphs. Could this be a clue?

The first graph shows how much the MSE increases if a variable's values are randomly permuted. The higher the value, the higher the variable importance.
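To make the permutation idea concrete, here is a minimal sketch in plain Python. It is not the randomForest implementation: the toy data, the `predict` stand-in for a fitted model, and the helper names `mse` and `permutation_importance` are all assumptions made for illustration. It reports the raw increase in MSE, whereas the R package averages over trees on out-of-bag data and, by default, scales the result.

```python
import random

def mse(predict, X, y):
    """Mean squared error of `predict` over rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, col, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling the values in column `col`."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the rows with only column `col` replaced by shuffled values.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Hypothetical toy data: y depends on x0 only; x1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(28)]
y = [2.0 * row[0] for row in X]

# Stand-in for a fitted model (assumption): it uses x0 and ignores x1.
def predict(row):
    return 2.0 * row[0]

imp_x0 = permutation_importance(predict, X, y, col=0)
imp_x1 = permutation_importance(predict, X, y, col=1)
print("x0:", imp_x0, "x1:", imp_x1)
```

Shuffling x0 breaks its link to y, so the error goes up; shuffling x1 changes nothing because the model never looks at it. A value at or below zero, as in the question, means that shuffling the variable did not hurt the predictions, which suggests it adds little predictive value of its own.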

On the other hand, the increase in node purity is measured by the difference between the RSS before and after a split on that variable. (The Gini index plays the corresponding role in classification trees.)
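The RSS-based measure can be sketched the same way. The snippet below computes the RSS decrease for the best single split on one predictor. The helper names `rss` and `best_split_rss_decrease` and the toy data are assumptions for illustration; a real forest sums these decreases over every split on the variable and averages over trees.

```python
def rss(values):
    """Residual sum of squares of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_rss_decrease(x, y):
    """Largest RSS decrease over all thresholds on a single predictor x."""
    parent = rss(y)
    best = 0.0
    # Try every distinct value except the largest as a "<= threshold" split.
    for threshold in sorted(set(x))[:-1]:
        left = [t for v, t in zip(x, y) if v <= threshold]
        right = [t for v, t in zip(x, y) if v > threshold]
        best = max(best, parent - rss(left) - rss(right))
    return best

# Hypothetical toy data: y jumps from 1 to 5 when x_informative exceeds 3.
x_informative = [1, 2, 3, 4, 5, 6]
x_noise = [3, 6, 1, 5, 2, 4]  # the same values in an unrelated order
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

gain_informative = best_split_rss_decrease(x_informative, y)
gain_noise = best_split_rss_decrease(x_noise, y)
print(gain_informative)  # 24.0
print(gain_noise)        # smaller, but still positive
```

Note that even the unrelated predictor gets a positive decrease: this measure is computed on the training data and cannot go negative. That is one plausible way for a variable to rank well on node purity while its permutation importance sits at or below zero, especially with as few as 28 observations.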

Since the criteria behind the two measures of variable importance are different, you get different rankings for the variables.

There is no fixed criterion for selecting the "best" measure of variable importance; it depends on the problem you have at hand.
