
Making AdaBoost Less Prone to Overfitting On Noisy Datasets

EasyChair Preprint 2742

8 pages. Date: February 21, 2020

Abstract

AdaBoost is perhaps one of the most well-known ensemble learning algorithms. In simple terms, the idea behind AdaBoost is to train a number of weak learners in an incremental fashion, where each new learner focuses more on the samples that were misclassified by the preceding classifiers. Consequently, in the presence of noisy data samples, the new learners will effectively memorize the noise, which in turn leads to an overfitted model. The main objective of this paper is to provide a generalized version of the AdaBoost algorithm that avoids overfitting and performs better when the data samples are corrupted with noise. To this end, we make use of another ensemble learning algorithm called ValidBoost [15], and introduce a mechanism to dynamically determine the thresholds for both the error rate of each classifier and the error rate in each iteration. These thresholds enable us to control the error rate of the algorithm. Experiments have been conducted on several benchmark datasets to evaluate the performance of the proposed algorithm.
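For context, the reweighting step described in the abstract is the standard (discrete) AdaBoost update. The following is a minimal sketch, in Python with scikit-learn decision stumps as weak learners, of how an error-rate cap could be inserted into that loop. It is an illustration only: boost_with_threshold and error_threshold are hypothetical names, and the paper's actual mechanism derives its thresholds dynamically via ValidBoost rather than fixing them in advance.

# Minimal sketch of AdaBoost-style reweighting with an error-rate cap.
# Assumptions: scikit-learn decision stumps as weak learners; labels in {-1, +1};
# error_threshold is a fixed stand-in for the paper's dynamically chosen thresholds.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_with_threshold(X, y, n_rounds=50, error_threshold=0.5):
    y = np.asarray(y)                            # labels expected in {-1, +1}
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])               # weighted error rate of this round
        if err == 0 or err >= error_threshold:   # reject rounds whose error is too high
            break
        alpha = 0.5 * np.log((1.0 - err) / err)  # weight of this weak learner
        w *= np.exp(-alpha * y * pred)           # increase weights of misclassified samples
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def boosted_predict(learners, alphas, X):
    votes = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(votes)

In this sketch the cap simply stops boosting once a round's weighted error reaches the threshold; the paper's contribution is to set such thresholds adaptively, per classifier and per iteration, instead of hard-coding them.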

Keyphrases: AdaBoost, boosting, overfitting, ensemble learning algorithms, noise, zero-one loss

BibTeX entry
@booklet{EasyChair:2742,
  author       = {Zainab Ghadiri Modarres and Mahmood Shabankhah and Ali Kamandi},
  title        = {Making AdaBoost Less Prone to Overfitting On Noisy Datasets},
  howpublished = {EasyChair Preprint 2742},
  year         = {2020}}