What seems to be the problem in this case?

cjjc20001 Member Posts: 8 Contributor II
I am trying LightGBM with a dataset. It is giving the following error.




The sample data are attributes such as gender, degree concentration, etc. They are mostly ready-made options from a survey, where the participant just selects the most appropriate one.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,377 RM Data Scientist
    Hi,

    It looks like your text field has categories in the application data which weren't present in training.

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • cjjc20001 Member Posts: 8 Contributor II
    I think the problem is that some values occur only once in the data. During sampling, such a value may not be selected for the training data, so during validation it is marked as unrecognized. When I removed the split, it worked. However, I need to both train and test the model. I tried cross-validation, but it has the same problem. What is the solution for this?
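The mechanism described above (a category that occurs only once lands in the test partition and is unknown to the trained model) can be reproduced in a few lines. The following is only a minimal sketch in plain Python, not a RapidMiner process and not part of the LightGBM API. The column values, the helper names `unseen_categories` and `replace_unseen`, and the choice of falling back to the most frequent training category are all assumptions made for illustration.

```python
from collections import Counter


def unseen_categories(train_values, test_values):
    """Return the categories that occur in test_values but not in train_values."""
    return set(test_values) - set(train_values)


def replace_unseen(train_values, test_values):
    """Map every test category not seen in training to the most frequent
    training category (a crude fallback, chosen here only for illustration)."""
    known = set(train_values)
    fallback = Counter(train_values).most_common(1)[0][0]
    return [v if v in known else fallback for v in test_values]


# Hypothetical survey column: "Law" and "Music" each occur exactly once.
degree = ["CS", "Math", "CS", "Law", "Math", "CS",
          "Math", "Music", "CS", "Math", "CS", "Math"]

# Deterministic split for the demo: every 4th row is held out for testing.
test = [v for i, v in enumerate(degree) if i % 4 == 3]
train = [v for i, v in enumerate(degree) if i % 4 != 3]

# Both singletons ended up in the test partition, so the model never saw them.
print("unseen in test:", sorted(unseen_categories(train, test)))

# After the fallback mapping, every test category is known from training.
test_fixed = replace_unseen(train, test)
print("unseen after mapping:", sorted(unseen_categories(train, test_fixed)))
```

The important design point is that the set of known categories is derived from the training partition only and then applied to the test partition, so the same step would have to be repeated inside each cross-validation fold. Other fallbacks, such as treating unknown values as missing or merging rare options into a single group before modelling, may suit survey data better. Which of these a given LightGBM integration accepts is not something this sketch can confirm.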