Finding an incorrect grading pattern

marketa_vackova (Member, Contributor I)
edited November 2018 in Help

I was given a labelled data set and told that a few of the labels are wrongly assigned, i.e. some of the examples were graded incorrectly. I'm supposed to find which ones. Which tool in RapidMiner should I use?

I tried the Find Outliers (Density) operator, but I have a feeling it is not the one I'm looking for.

Thank you very much for your advice. Marketa

Answers

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)

    Here is an idea: train a model on the data set that generalizes well (no overfitting, no k-NN with only 1 neighbor, which would just memorize every label including the wrong ones, you get the idea...) and then apply this model to the training data set again. Whenever the prediction differs from the label, that example is a good candidate for being wrongly labeled.

    Just my 2c,

    Ingo

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Unicorn)

    Another potential approach would be to run a cluster analysis on each labeled class separately and then look for individual outliers within each class.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
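Ingo's suggestion (train a model that generalizes well, re-apply it to the training data, flag the disagreements) could be sketched roughly like this in plain Python rather than as a RapidMiner process. It assumes purely numeric attributes and uses k-NN with k=5 as the "generalizes well" model; the function names, k=5 and the toy data are hypothetical choices for illustration only. Because the model is applied to its own training data, each point counts among its own 5 neighbours, so its own (possibly wrong) label gets just 1 of the 5 votes.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    # Hypothetical helper: plain k-NN, k=5 is an arbitrary "not 1" choice.
    # Rank all training points by Euclidean distance to x. When x is itself a
    # training point it ranks first, as intended when re-applying to training data.
    ranked = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in ranked[:k])
    # Majority vote among the k nearest neighbours.
    return votes.most_common(1)[0][0]

def suspect_labels(X, y, k=5):
    """Return indices whose given label disagrees with the model's prediction."""
    return [i for i, (x, label) in enumerate(zip(X, y))
            if knn_predict(X, y, x, k) != label]

# Made-up toy data: two well-separated groups; index 4 is deliberately mislabelled.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

print(suspect_labels(X, y))  # → [4]
```

With k=1 the same code would flag essentially nothing, because every point's nearest neighbour is itself, which is exactly why 1-NN is ruled out above.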
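Brian's per-class idea can be sketched in the same spirit, again in plain Python and not as a RapidMiner process. This is a simplified stand-in for a real cluster analysis: it assumes each class forms a single cluster (the simplest possible clustering, one centroid per class) and flags points whose distance to their own class centroid exceeds a hypothetical `factor` times that class's median distance. The function name, the factor of 3 and the toy data are illustrative assumptions.

```python
import math
import statistics
from collections import defaultdict

def per_class_outliers(X, y, factor=3.0):
    """Flag points lying unusually far from the centroid of their own class.

    Hypothetical helper: each class is treated as ONE cluster; factor=3.0 is arbitrary.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    suspects = []
    for label, idx in by_class.items():
        dims = len(X[idx[0]])
        # Centroid computed from this class's points only.
        centroid = [sum(X[i][d] for i in idx) / len(idx) for d in range(dims)]
        dists = {i: math.dist(X[i], centroid) for i in idx}
        # Median distance is a fairly robust yardstick for the class's typical spread.
        typical = statistics.median(list(dists.values()))
        suspects.extend(i for i, d in dists.items() if d > factor * typical)
    return sorted(suspects)

# Same made-up toy data as before: index 4 sits among the "A" points but is labelled "B".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

print(per_class_outliers(X, y))  # → [4]
```

If a class naturally splits into several groups, a single centroid is too crude, and a proper clustering per class (e.g. k-means with more than one cluster) would take its place before looking for outliers.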