Filter out rows with dictionary terms

tatianiia (Member, Posts: 11, Contributor I)
Edited June 2019 in Help
Hi! I have a problem that seemed quite simple at first glance, but I really don't know how to solve it.
  • I have a file with several text attributes (attr.1, attr.2).
  • I have a long list of terms in another file.
I need to keep only those rows of the first file whose attr.2 does not contain any of the terms from the second file.

So, if the first file contains:
attr.1 | attr.2
Sun    | Sun is shining
Rain   | Rain is falling

and the second contains:
attr.1
Sun
Moon

I want the output to be only the second row of the first file (Rain, "Rain is falling"), because "Sun is shining" contains the dictionary term "Sun".

I guess there must be some easy solution for that. Thanks!
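Outside RapidMiner, the filtering described above can be sketched in plain Python. This is a minimal sketch, not a RapidMiner process: it assumes both files are already loaded into memory, represents each row as a dict keyed by the column names from the example, and assumes a term matches only as a whole word, case-insensitively (so "Sun" matches "Sun is shining" but not "Sunday"). The helper names `tokenize` and `filter_rows` are hypothetical.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and split it into word tokens (\w+),
    # so matching is case-insensitive and whole-word only.
    return set(re.findall(r"\w+", text.lower()))

def filter_rows(rows, terms, column="attr.2"):
    """Return the rows whose `column` contains none of the dictionary terms."""
    dictionary = {term.lower() for term in terms}
    # A row is kept only if its token set shares nothing with the dictionary.
    return [row for row in rows if tokenize(row[column]).isdisjoint(dictionary)]

rows = [
    {"attr.1": "Sun", "attr.2": "Sun is shining"},
    {"attr.1": "Rain", "attr.2": "Rain is falling"},
]
terms = ["Sun", "Moon"]

print(filter_rows(rows, terms))
# [{'attr.1': 'Rain', 'attr.2': 'Rain is falling'}]
```

Because matching is by whole word, a multi-word dictionary entry such as "sun is" would never match. Substring matching (`term in text.lower()`) would handle that case, but it would also match "Sun" inside "Sunday".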

Answers

  • tatianiia (Member, Posts: 11, Contributor I)
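    My solution is:
    1. Apply the "Process Documents" operator with the dictionary terms as the word list to create a binary term-occurrence vector;
    2. Filter out every example that has a positive value in any of these attributes, keeping only the examples whose attributes are all 0.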

    Not really sure that this solution is the best one.
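The two steps of this answer can be mirrored in plain Python to check the filter logic: build one 0/1 attribute per dictionary term (analogous to a binary term-occurrence vector), then drop every example with at least one positive attribute. This only illustrates the logic and is not what RapidMiner does internally; the word tokenization, the lowercasing, and the helper names `binary_vectors` and `keep_unmatched` are assumptions.

```python
import re

def binary_vectors(texts, terms):
    """Build one 0/1 attribute per dictionary term for each text."""
    vocab = [term.lower() for term in terms]
    vectors = []
    for text in texts:
        # Assumption: lowercased whole-word tokens stand in for the
        # tokenization done by the text-processing operator.
        tokens = set(re.findall(r"\w+", text.lower()))
        vectors.append([1 if term in tokens else 0 for term in vocab])
    return vectors

def keep_unmatched(rows, terms, column="attr.2"):
    """Step 1: vectorize `column`; step 2: keep rows whose vector is all zeros."""
    vectors = binary_vectors([row[column] for row in rows], terms)
    return [row for row, vector in zip(rows, vectors) if not any(vector)]

rows = [
    {"attr.1": "Sun", "attr.2": "Sun is shining"},
    {"attr.1": "Rain", "attr.2": "Rain is falling"},
]

print(binary_vectors([row["attr.2"] for row in rows], ["Sun", "Moon"]))
# [[1, 0], [0, 0]]
print(keep_unmatched(rows, ["Sun", "Moon"]))
# [{'attr.1': 'Rain', 'attr.2': 'Rain is falling'}]
```

The key point is the direction of step 2: an example with any positive attribute contains at least one dictionary term, so it is the one to remove.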