Filtering examples based on number of occurrences in attribute

Baski, Member, Posts: 1, Contributor I
edited November 2018 in Help

Hi,

For example, I have examples that contain information about visits. Every visit is assigned to a visitor_id. I want to filter the examples (rows) where the visitor_id occurs more than 5 times. So there will be no more than 4 rows for every visitor_id. I tried a filter, but that was not helpful.

Any idea how to do this in RapidMiner?
Thanks.

Answers

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor), Posts: 1,751, RM Founder

    Hi,

    While I am pretty sure that the answer to this question will involve the operators "Aggregate", "Pivot", and "Filter Examples", I am unfortunately not sure I fully understood the problem. Can you give us a small data sample (original data) and show what the desired output for this sample should look like?

    Thanks,

    Ingo

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator), Posts: 2,959, Community Manager

    Hi... no, the Filter Examples operator on its own is not going to help you here (as you saw). The way I see it, you first need to create an attribute that lists the number of occurrences, and then you can filter for n > 5 or whatever. Personally, I would use the Aggregate operator, where you group by visitor_id and aggregate on visitor_id (counting its occurrences). Then join this with your original data set on the visitor_id attribute.

    Scott
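
For readers who want to see the logic outside RapidMiner, here is a minimal plain-Python sketch of the aggregate, join, filter idea described above. It is not a RapidMiner process: the function names, the `visit_count` attribute, and the sample rows are all made up for illustration. Since the question leaves open whether the frequent visitors should be kept or dropped, the filter condition is passed in as a predicate.

```python
from collections import Counter

def add_occurrence_count(rows, key="visitor_id", count_name="visit_count"):
    """Aggregate + join: attach to each row the number of rows sharing its key."""
    # "Aggregate": group by key and count the rows in each group.
    counts = Counter(row[key] for row in rows)
    # "Join": copy each original row and add its group's count.
    return [{**row, count_name: counts[row[key]]} for row in rows]

def filter_by_count(rows, keep, count_name="visit_count"):
    """Filter: keep only the rows whose count satisfies the predicate `keep`."""
    return [row for row in rows if keep(row[count_name])]

# Hypothetical sample data: visitor "a" has 6 visits, visitor "b" has 2.
visits = [{"visitor_id": "a", "page": i} for i in range(6)]
visits += [{"visitor_id": "b", "page": i} for i in range(2)]

counted = add_occurrence_count(visits)
frequent = filter_by_count(counted, lambda n: n > 5)     # visitors seen more than 5 times
infrequent = filter_by_count(counted, lambda n: n <= 5)  # everyone else
```

Swapping the predicate is all it takes to switch between keeping and removing the frequent visitors; the counting step stays the same either way.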
