[SOLVED] Filtering Duplicate Table Data

Mckenzie · March 2015

Hi all, I have a Similarity to Data module set up and I'm getting the following column outputs:

FIRST_ID, SECOND_ID, SIMILARITY

Because of the way the pages are compared, each pair of first and second IDs is displayed twice, for example:

3, 2, 1.0
2, 3, 1.0

Both rows are the same pair, just in a different order: 3,2 and 2,3.

I've been looking at the Remove Duplicates module under Filtering, but I can't seem to find the correct rule or expression to return each unique combination of first and second ID only once.

Many thanks,

Mckenzie

MartinLiebig · March 2015

Hi,

I do not have a one-operator solution for you, but the process below solves the problem. I do not know if there is an easier way to do it.

Cheers,
Martin

<macros/>

<connect from_op="Multiply (2)" from_port="output 1" to_op="Append (2)" to_port="example set 2"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Append (2)" to_port="example set 1"/>

<connect from_op="Multiply" from_port="output 1" to_op="Rename" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Rename (3)" to_port="example set input"/>

Mckenzie · March 2015

Hi Martin,

Thanks for the reply. In the end I created an aggregate attribute similar to what you did: I compared and ordered the first and second IDs, concatenated them (using RegEx) to give a new unique ID, and then removed duplicates.

Many thanks for your help.

Mckenzie
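
For anyone reading this outside RapidMiner, the idea described above (order the two IDs, build one key per pair, then drop repeated keys) can be sketched in plain Python. This is only an illustration, not the process from the thread: the function name `drop_mirrored_pairs` and the third sample row are assumptions, and a tuple key is used instead of a RegEx-built string.

```python
# Sketch: keep one row per unordered (FIRST_ID, SECOND_ID) pair.
# The function name and the sample rows are illustrative assumptions.

def drop_mirrored_pairs(rows):
    """Return rows with each unordered ID pair kept once (first occurrence wins)."""
    seen = set()
    unique_rows = []
    for first_id, second_id, similarity in rows:
        # Order the two IDs so (3, 2) and (2, 3) produce the same key.
        key = (min(first_id, second_id), max(first_id, second_id))
        if key not in seen:
            seen.add(key)
            unique_rows.append((first_id, second_id, similarity))
    return unique_rows


rows = [
    (3, 2, 1.0),
    (2, 3, 1.0),
    (1, 4, 0.5),  # made-up row for illustration
]
result = drop_mirrored_pairs(rows)
print(result)
```

Keeping the first occurrence only makes sense if the similarity is symmetric, as it is in the 3,2 / 2,3 example; if the two directions could differ, the rows would need to be aggregated rather than dropped.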


Altair RapidMiner Community
