Random Forest Classifiers
As mentioned in the last chapter, a single Decision Tree is prone to overfitting and sensitive to variations in the data. To avoid this, an ensemble of Decision Trees is often used, a so-called Random Forest: many different Decision Trees are grown, each trained on a different random sample of the whole dataset and considering a randomly chosen subset of predictors at each node. What is convenient about Random Forest Classifiers is that most common implementations perform these steps fully automatically; all the user usually needs to adjust is the number of trees in the ensemble and the number of variables in the random subset.
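To make these two tuning parameters concrete, here is a minimal sketch using Python's scikit-learn (an assumption of this example; the literature listed below works with R's randomForest package). The synthetic data and parameter values are purely illustrative: n_estimators sets the number of trees in the ensemble, and max_features sets the size of the random predictor subset tried at each node.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data standing in for a real predictor dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The two parameters the user usually adjusts:
#   n_estimators: number of trees in the ensemble
#   max_features: number of randomly chosen predictors considered per split
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)  # each tree is grown on a bootstrap sample of the training data

print("Test accuracy:", rf.score(X_test, y_test))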
Advantages of Random Forest Classifiers:
Improves the accuracy of single decision trees by reducing overfitting
More robust to missing values
Data does not need to be transformed or rescaled
Handles both categorical and numerical features
Disadvantages of Random Forest Classifiers:
Depending on the number and complexity of decision trees, computation can be demanding.
The influence of individual variables is obscured, which impedes interpretability (see the sketch after this list)
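To illustrate that last point: the forest does retain an aggregate measure of variable relevance, but only as a relative ranking rather than the explicit decision rules a single tree provides. A minimal sketch, assuming the fitted rf model from the example above:

# Impurity-based importances rank predictors across the whole ensemble,
# but they do not explain individual decisions the way a single tree's rules would.
for idx, importance in enumerate(rf.feature_importances_):
    print(f"feature {idx}: {importance:.3f}")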
Further literature for diving deeper into the theory:
https://blogs.fu-berlin.de/reseda/random-forest/
Liaw, A. & Wiener, M. (2002): Classification and Regression by randomForest. R News, 2(3), 18-22.
Pal, M. (2005): Random Forest Classifier for Remote Sensing Classification. International Journal of Remote Sensing, 26(1), 217-222.