"AdaBoost performance on new data (test dataset) MUCH worse than without AdaBoost"

miaque (Member, Posts: 4, Contributor I)
edited June 2019 in Help
Hello,
I have the following problem:
I am working on a dataset for modeling a digit recognition classification problem.

The database consists of 64 normal attributes plus one for the class. It contains nearly 5000 examples and is divided into a training set (30 digit writers) and a test set (14 other, new writers).

For my study project I am required to use the meta-learning operators. The problem I ran into is that without the AdaBoost operator, accuracy is approx. 85% on the training set (X-Validation) and approx. 80% on the test set (new data). When I add AdaBoost, the X-Validation results on the training set improve to approx. 90%, but the results on the new data become MUCH WORSE: only 20% accuracy!

Does anyone know what the issue could be here?

Thank you!

Answers

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,388; RM Data Scientist)
    Looks like you overtrain, right?
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
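
One way to check the overtraining hypothesis is to compare cross-validation accuracy on the training set with accuracy on held-out data, once for the base learner and once for the boosted one. The sketch below is only an illustration under several assumptions: it uses scikit-learn rather than RapidMiner operators, the small `load_digits` dataset (8x8 images, 64 attributes) as a stand-in for the dataset in the question, a depth-3 decision tree as the base learner, and a random split instead of a split by writer. The helper name `compare_cv_and_holdout` is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_cv_and_holdout(model, X_train, y_train, X_test, y_test, folds=5):
    """Return (mean cross-validation accuracy, held-out accuracy) for a model."""
    # Cross-validation uses only the training data, like X-Validation in the question.
    cv_acc = cross_val_score(model, X_train, y_train, cv=folds).mean()
    # Refit on the full training set, then score on data the model has never seen.
    holdout_acc = model.fit(X_train, y_train).score(X_test, y_test)
    return cv_acc, holdout_acc


# Stand-in data (assumption): NOT the dataset from the question.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    # Base learner on its own (assumed: shallow decision tree).
    "single tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    # The same kind of base learner wrapped in AdaBoost.
    "AdaBoost": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=3, random_state=0),
        n_estimators=50,
        random_state=0,
    ),
}

results = {
    name: compare_cv_and_holdout(model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
}

for name, (cv_acc, holdout_acc) in results.items():
    print(f"{name}: CV accuracy = {cv_acc:.3f}, held-out accuracy = {holdout_acc:.3f}")
```

If the gap between the two numbers is much larger for the boosted model than for the base learner, that would be consistent with overtraining. With data split by writer, as in the question, the held-out accuracy should be measured on the separate test writers rather than on a random split.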