Weighted Bagging in Decision Trees: Data Mining

  • Yousef M. T. El Gimati, Statistics Department, Faculty of Science, University of Benghazi, Libya (LY)
Keywords: Resampling, bagging, bootstrapping, decision tree


Abstract

This paper focuses on the use of resampling techniques to construct predictive models from data, with the goal of identifying the model that produces the best predictions. Bagging, or bootstrap aggregating, is a general method for improving the performance of a given learning algorithm: the same base classifier is trained on multiple bootstrap resamples of the training set, and the resulting classifier outputs are combined by majority vote. A bootstrap sample is generated by sampling at random, with replacement, from the original training set. Inspired by the idea of bagging, we present an improved method for decision trees based on a distance function, called modified bagging (or weighted bagging) in this study. The experimental results show that modified bagging is superior to bagging with the usual majority vote. These results are confirmed on both real data sets and artificial data sets with random noise. The modified bagged classifier performs significantly better than usual bagging at various tree levels for all sample sizes. An interesting observation is that weighted bagging performs somewhat better than usual bagging with stumps.
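To make the baseline procedure concrete, the following is a minimal sketch of usual bagging with majority vote, using stumps (one-split trees) as the base classifier. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (`bootstrap_sample`, `fit_stump`, `fit_bagging`, `bagged_predict`), the toy data, and the choice of 25 resamples are all hypothetical. The distance-based weighting of the modified method is not specified in the abstract, so it is not shown here.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def majority_label(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(data):
    """Fit a stump on (x, label) pairs with a single numeric feature x.

    Each observed x is tried as a threshold; the split with the fewest
    training misclassifications is kept.
    """
    overall = majority_label([label for _, label in data])
    best = (len(data) + 1, None, overall, overall)
    for threshold in sorted({x for x, _ in data}):
        left = [label for x, label in data if x <= threshold]
        right = [label for x, label in data if x > threshold]
        if not right:  # a split must leave points on both sides
            continue
        left_label, right_label = majority_label(left), majority_label(right)
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    if threshold is None:  # no valid split: always predict the majority class
        return lambda x: overall
    return lambda x: left_label if x <= threshold else right_label


def fit_bagging(data, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_estimators)]


def bagged_predict(stumps, x):
    """Combine the stump outputs by unweighted majority vote."""
    return majority_label([stump(x) for stump in stumps])


# Hypothetical toy training set: class "a" at small x, class "b" at large x.
train = [(0.5, "a"), (1.0, "a"), (1.5, "a"), (2.0, "a"),
         (3.0, "b"), (3.5, "b"), (4.0, "b"), (4.5, "b")]
stumps = fit_bagging(train)
print(bagged_predict(stumps, 0.5), bagged_predict(stumps, 4.5))
```

A weighted variant would replace the unweighted count in `bagged_predict` with per-classifier weights; how those weights are derived from a distance function is described in the full paper, not in this abstract.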




Published
2020-10-01
Section
Articles
How to Cite
Elgimati, Y. (2020). Weighted Bagging in Decision Trees: Data Mining. JINAV: Journal of Information and Visualization, 1(1), 1-14. https://doi.org/10.35877/454RI.jinav149