Random Forest Classifiers
As mentioned in the last chapter, a single Decision Tree is prone to overfitting and sensitive to variations in the data. To avoid this, an ensemble of Decision Trees is often used, a so-called Random Forest: many different Decision Trees are grown, each trained on a different random sample of the whole dataset and considering a randomly chosen subset of predictors at each node. What is convenient about Random Forest Classifiers is that most common implementations perform these steps fully automatically; all the user usually needs to adjust is the number of trees in the ensemble and the number of variables in the random subset.
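To make these two tuning parameters concrete, here is a minimal sketch using Python's scikit-learn (an assumption of this example; the literature listed below works with R's randomForest package). The synthetic data and parameter values are purely illustrative: n_estimators sets the number of trees in the ensemble, and max_features sets the size of the random predictor subset tried at each node.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data standing in for a real predictor dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The two parameters the user usually adjusts:
#   n_estimators: number of trees in the ensemble
#   max_features: number of randomly chosen predictors considered per split
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)  # each tree is grown on a bootstrap sample of the training data

print("Test accuracy:", rf.score(X_test, y_test))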
Advantages of Random Forest Classifiers:
Improves the accuracy of single decision trees by reducing overfitting
More robust to missing values
Data does not need to be transformed or rescaled
Handles both categorical and numerical features
Disadvantages of Random Forest Classifiers:
Depending on the number and complexity of decision trees, computation can be demanding.
The influence of individual variables is obscured, which impedes interpretability (see the sketch after this list)
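To illustrate that last point: the forest does retain an aggregate measure of variable relevance, but only as a relative ranking rather than the explicit decision rules a single tree provides. A minimal sketch, assuming the fitted rf model from the example above:

# Impurity-based importances rank predictors across the whole ensemble,
# but they do not explain individual decisions the way a single tree's rules would.
for idx, importance in enumerate(rf.feature_importances_):
    print(f"feature {idx}: {importance:.3f}")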
Further literature for diving deeper into the theory:
https://blogs.fu-berlin.de/reseda/random-forest/
Liaw, A. & Wiener, M. (2002): Classification and Regression by randomForest. R News, 2(3), 18-22.
Pal, M. (2005): Random Forest Classifier for Remote Sensing Classification. International Journal of Remote Sensing, 26(1), 217-222.