In some complex problems in data science, we find that the performance of our machine learning algorithm is very poor even after spending some time understanding the problem and performing some feature engineering. At that point, we realize that combining several machine learning algorithms may come to the rescue!
Ensemble Learning is a technique that creates multiple models and then combines them to produce improved results. As a consequence, the ensemble often produces more accurate predictions than any single model.
The companion graph simplifies this concept a lot. Several models are being trained on the same dataset and then combined for, hopefully, better results.
The interesting part about ensemble learning methods is that they can be applied to both regression and classification. For regression problems, you can create multiple regressors that address the same problem, such as linear, polynomial, and multiple regression models, in an ensemble learning system. For classification problems, the same idea can be used to create an ensemble learning system whose models are diversified across logistic regression, decision trees, KNN, SVM, etc.
In general, there are two main steps in ensemble learning. The first is building multiple machine learning models, called “Base models”, from the same or different algorithms. Then, the final, overall prediction is made based on the base models’ outputs.
The question now is, What are the different approaches in applying ensemble learning?
Well, there are 4 main techniques for applying ensemble learning.
- Voting and Averaging
- Stacking
- Bootstrap Aggregation / Bagging
- Boosting
And in the next few slides, we will go through the idea of each type.
Speaking of Voting and Averaging, you may have guessed by now that voting is used for classification while averaging is suited to regression. We will reveal the mechanics of each in the next slides.
As this graph depicts, when handling a classification problem with the voting ensemble technique, we build an odd number of classifiers and take the class that has the majority of votes as the final prediction. Here, we have 3 cats versus 2 dogs, so the final prediction is Cat!
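The majority-vote step above can be sketched in a few lines of plain Python. The class labels here are hypothetical, matching the slide's cat-vs-dog example:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class label with the most votes among base classifiers."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Five base classifiers vote on one input, as in the slide's example.
votes = ["cat", "dog", "cat", "cat", "dog"]
print(majority_vote(votes))  # cat
```

Using an odd number of classifiers avoids ties in binary classification; with more classes, a tie-breaking rule would still be needed.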
Unlike voting, when using averaging we can use any number of regressors and then average their outputs to get the final prediction. In this graph, for example, the final prediction is 4.58.
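The averaging step is just the mean of the base regressors' outputs. The three outputs below are hypothetical values chosen so their mean matches the slide's 4.58:

```python
from statistics import mean

def average_predictions(predictions):
    """Combine regressor outputs by taking their mean."""
    return mean(predictions)

# Three hypothetical base-regressor outputs for the same input.
outputs = [4.90, 4.20, 4.64]
print(round(average_predictions(outputs), 2))  # 4.58
```

A weighted average is a common variant, giving more influence to base models that perform better on validation data.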
Stacking, also known as stacked generalization, is based on a simple idea: train machine learning algorithms on the training dataset, then generate a new dataset out of their predictions. This new dataset is used as input for a combiner, or final, machine learning algorithm.
As this graph shows, we take a training dataset and train multiple machine learning algorithms, denoted d1, d2, …, dL. We then collect all of their predictions into a new dataset, and train a new combiner machine learning algorithm on those predictions.
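The data flow above can be sketched in plain Python. The base models d1, d2, d3 below are hypothetical hand-written functions standing in for trained learners, and the combiner is a trivial equal-weight average; in practice, the combiner would itself be a trained model:

```python
# Hypothetical base models standing in for trained learners d1..dL.
def d1(x): return 2 * x
def d2(x): return x + 1
def d3(x): return x * x

base_models = [d1, d2, d3]

X = [1, 2, 3]  # hypothetical training inputs

# Step 1: each row of the new (meta) dataset is the base models'
# predictions for one input.
meta_features = [[m(x) for m in base_models] for x in X]

# Step 2: the combiner consumes meta-features instead of raw inputs.
# Here it is a placeholder equal-weight average; a real stacking system
# would train a model on these features against the true targets.
def combiner(features):
    return sum(features) / len(features)

final_predictions = [combiner(row) for row in meta_features]
```

To avoid leakage, real stacking implementations generate the meta-features with out-of-fold predictions rather than predictions on the same data the base models were fit on.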
At first, multiple machine learning models are generated. Each model is trained on a bootstrap sample: n observations drawn at random, with replacement, from the original dataset. Then we aggregate the results of these learners using voting or averaging, depending on the problem itself.
Random forest and extra trees are well-known examples of bagging techniques.
We can see from this graph that the second step when building a bagging ensemble learning system is pretty similar to the voting and averaging techniques illustrated earlier.
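The two bagging steps can be sketched as follows. The dataset and the slope-estimating base learner are hypothetical, kept deliberately simple so the bootstrap-then-aggregate structure stays visible:

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical regression data following y = 2x.
data = [(x, 2.0 * x) for x in range(1, 11)]

def bootstrap_sample(dataset):
    # Draw len(dataset) observations at random, WITH replacement.
    return random.choices(dataset, k=len(dataset))

def train_slope_model(sample):
    # A deliberately simple base learner: estimate the slope y/x
    # and predict with it.
    slope = mean(y / x for x, y in sample)
    return lambda x_new: slope * x_new

# Step 1: train each base model on its own bootstrap sample.
models = [train_slope_model(bootstrap_sample(data)) for _ in range(5)]

# Step 2: aggregate by averaging (regression); for classification,
# the aggregation would be majority voting instead.
def bagged_predict(x_new):
    return mean(m(x_new) for m in models)
```

Because every base model sees a different resampled view of the data, averaging their outputs reduces variance, which is the main benefit bagging offers.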
Boosting describes a family of algorithms that convert weak models into a strong one. The ensemble is built incrementally: each model is trained on the same training dataset, but the weights of the instances are adjusted according to the errors of the previous model's predictions.
For example, AdaBoost and XGBoost are widely used boosting algorithms.
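The reweighting loop can be sketched as a minimal AdaBoost-style example on a toy 1D dataset. The data, the threshold candidates, and the stump weak learner are all hypothetical simplifications; real AdaBoost searches thresholds over the data itself:

```python
import math

# Toy 1D dataset with labels in {-1, +1}.
X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [1, 1, 1, -1, -1, -1]

def stump(threshold):
    # Weak learner: predict +1 below the threshold, -1 above.
    return lambda x: 1 if x < threshold else -1

def weighted_error(h, weights):
    return sum(w for w, xi, yi in zip(weights, X, y) if h(xi) != yi)

def best_stump(weights):
    # Pick the candidate threshold with the lowest weighted error.
    candidates = [stump(t) for t in (0.15, 0.45, 0.8)]
    return min(candidates, key=lambda h: weighted_error(h, weights))

weights = [1 / len(X)] * len(X)   # start with uniform instance weights
ensemble = []                      # list of (alpha, weak_learner) pairs

for _ in range(3):
    h = best_stump(weights)
    err = max(weighted_error(h, weights), 1e-10)  # guard a perfect fit
    alpha = 0.5 * math.log((1 - err) / err)       # learner's say
    ensemble.append((alpha, h))
    # Reweight: misclassified instances get heavier, correct ones lighter.
    weights = [w * math.exp(-alpha * yi * h(xi))
               for w, xi, yi in zip(weights, X, y)]
    total = sum(weights)
    weights = [w / total for w in weights]

def predict(x):
    # Final prediction: sign of the alpha-weighted vote.
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1
```

Each round, the next weak learner is forced to focus on the instances the previous rounds got wrong, which is how the ensemble of weak stumps becomes a strong classifier.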