How can we check whether a word is present in the list or not?

Anusha (Member, Posts: 19, Maven)
Hi All!

I have a dataset with 3 columns. The first two columns each contain a comma-separated list of words, and the third column has a single word in each row. I need to check whether the word in the third column is present in the first column's list or in the second column's list.


Source Data:

list1 list2 ch
shape,size,type,endi toldis,umbr,oilv,poll type
shape,size,type,endi toldis,umbr,oilv,poll oilv
shape,size,type,endi toldis,umbr,oilv,poll umbr


Desired output:


list1 list2 ch flag_1(list1) flag_2(list2)
shape,size,type,endi toldis,umbr,oilv,poll type 1 0
shape,size,type,endi toldis,umbr,oilv,poll oilv 0 1
shape,size,type,endi toldis,umbr,oilv,poll umbr 0 1

Since "type" is present in list1, flag_1 should be "1" and flag_2 should be "0".
"oilv" and "umbr" are present in the list2 column, so flag_2 should be "1" for them.


I have tried array_contains, IN, NOT IN, and looping over the values, but I could not get the required result. Can anyone help me resolve this?


Thanks in Advance!

Best Answer

  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert; Posts: 953, Unicorn)
    Solution Accepted
    Hi,

    it is easy with Generate Attributes. I tried two different approaches:

    if(contains(list1, ch), 1, 0)

    if(matches(list2, "(^|.*,)" + ch + "($|,.*)"), 1, 0)

    The solution with "contains()" is simpler but not exactly foolproof: it could also match substrings.
    The regular expression in "matches()" checks for "either the start of the string or a text followed by a comma", the search string, and "either the end of the string or a text after a comma".
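To make the difference between the two approaches concrete, here is a minimal sketch in Python rather than RapidMiner's expression language. The function names flag_contains and flag_exact are hypothetical, and the sketch assumes that matches() has to match the whole string, which is why it uses re.fullmatch. It also escapes the search word with re.escape, a precaution the expressions above do not take, so that a word containing regex metacharacters is treated literally.

```python
import re

def flag_contains(list_str, word):
    # Plain substring test, analogous to contains(list1, ch).
    return 1 if word in list_str else 0

def flag_exact(list_str, word):
    # Same pattern as in the answer: start-or-comma, the word, end-or-comma.
    # re.escape is an added precaution for words with regex metacharacters.
    pattern = r"(^|.*,)" + re.escape(word) + r"($|,.*)"
    return 1 if re.fullmatch(pattern, list_str) else 0

# The three rows from the question: (list1, list2, ch)
rows = [
    ("shape,size,type,endi", "toldis,umbr,oilv,poll", "type"),
    ("shape,size,type,endi", "toldis,umbr,oilv,poll", "oilv"),
    ("shape,size,type,endi", "toldis,umbr,oilv,poll", "umbr"),
]

# (flag_1, flag_2) for each row
flags = [(flag_exact(l1, ch), flag_exact(l2, ch)) for l1, l2, ch in rows]

# Substring pitfall: "oil" is not a list entry, but it is part of "oilv".
loose = flag_contains("toldis,umbr,oilv,poll", "oil")
strict = flag_exact("toldis,umbr,oilv,poll", "oil")
```

With the sample data, flags reproduces the desired flag_1 and flag_2 columns, while loose is 1 and strict is 0 for the partial word "oil".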



    Regards,
    Balázs