[SOLVED] Filtering Duplicate Table Data

Mckenzie, Member, Posts: 2, Contributor I
edited September 2019 in Help
Hi all, I have a Similarity to Data module set up and I'm getting the following output columns:

FIRST_ID, SECOND_ID, SIMILARITY

Because of the way the pages are compared, each pair of IDs is displayed twice, once in each order. For example:

3, 2, 1.0
2, 3, 1.0

Both rows describe the same pair, just in a different order: 3,2 and 2,3.

I've been looking at the Remove Duplicates module under Filtering, but I can't find the right rule or expression to return each combination of first and second ID only once.

Many thanks,

Mckenzie

Answers

  • MartinLiebig, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,404, RM Data Scientist
    Hi,

    I do not have a one-operator solution for you, but the process below solves the problem. I do not know if there is an easier way to do it.



    Cheers,
    Martin






    <macros/>

    <connect from_op="Multiply (2)" from_port="output 1" to_op="Append (2)" to_port="example set 2"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Append (2)" to_port="example set 1"/>

    <connect from_op="Multiply" from_port="output 1" to_op="Rename" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Rename (3)" to_port="example set input"/>














    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Mckenzie, Member, Posts: 2, Contributor I
    Hi Martin,

    Thanks for the reply. In the end I created an aggregate attribute similar to yours: I compared the first and second ID, put them in a fixed order, and concatenated them (using RegEx) into a new unique ID, then removed duplicates on that ID.

    Many thanks for your help.

    Mckenzie
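
For readers outside RapidMiner, the idea in the last reply (put each pair of IDs in a fixed order, use that as a key, then drop repeated keys) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process either poster built: the function name `drop_mirrored_pairs` and the sample rows beyond `3, 2, 1.0` and `2, 3, 1.0` are made up for the example, and a tuple key stands in for the concatenated string ID.

```python
def drop_mirrored_pairs(rows):
    """Keep the first row seen for each unordered (FIRST_ID, SECOND_ID) pair.

    `rows` is an iterable of (first_id, second_id, similarity) tuples.
    Hypothetical helper for illustration; not part of RapidMiner.
    """
    seen = set()
    unique_rows = []
    for first_id, second_id, similarity in rows:
        # Order the two IDs so that (3, 2) and (2, 3) produce the same key.
        key = (min(first_id, second_id), max(first_id, second_id))
        if key not in seen:
            seen.add(key)
            unique_rows.append((first_id, second_id, similarity))
    return unique_rows


# Sample data: the first two rows come from the question, the rest are invented.
rows = [(3, 2, 1.0), (2, 3, 1.0), (1, 3, 0.4), (3, 1, 0.4)]
result = drop_mirrored_pairs(rows)
print(result)
```

The first occurrence of each pair is kept as it appeared, so the output preserves the original column order for the surviving rows. If the IDs are strings rather than numbers, `min` and `max` compare them alphabetically, which still gives a consistent key.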