Similarity between multiple tables
Hi,
I am currently working on my university thesis, which addresses an entity resolution problem. Today I tried to integrate two tables by measuring the similarity between them. If the similarity is above a threshold of 0.9, it is considered useful and is used in a second evaluation. In that second evaluation the variables are weighted; for example, a phone number is a better unique key than a first name. In the end, the customer representation needs to be evaluated as follows: (0.9*2)+(0.8*7) = .... If the result is above a threshold of 0.8 (for example), the rows are considered a match and are integrated. I tried to compute the similarity in RapidMiner with a couple of similarity measures, but I received extreme values (<0 or >1).
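To make the second evaluation concrete, here is a minimal Python sketch of the weighted scoring described above. The attribute names, the assignment of the weights 2 and 7 to them, and the division by the total weight are assumptions for illustration only; the normalization is added so that the score stays between 0 and 1 and can be compared against a threshold such as 0.8.

```python
def weighted_match_score(similarities, weights):
    """Combine per-attribute similarities into one weighted average.

    Both arguments are dicts keyed by attribute name (hypothetical names).
    Dividing by the total weight is an assumption; it keeps the score in
    [0, 1] as long as every similarity is in [0, 1].
    """
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        raise ValueError("total weight must be greater than zero")
    weighted_sum = sum(similarities[name] * weights[name] for name in similarities)
    return weighted_sum / total_weight


# Numbers taken from the example in the post: (0.9*2) + (0.8*7)
similarities = {"firstname": 0.9, "phone": 0.8}
weights = {"firstname": 2, "phone": 7}

score = weighted_match_score(similarities, weights)
is_match = score >= 0.8  # example threshold from the post
print(round(score, 3), is_match)
```

Without the division, the raw sum (0.9*2)+(0.8*7) is larger than 1, so a threshold of 0.8 only makes sense on a normalized score.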
(currently, I cannot post any screenshots, since I am new)
What am I doing wrong?
Cheers, Robin
Answers
Best Regards,
Edwin Yaqub
Scott
I have now built the following process in RapidMiner:
I use the same dataset twice: one copy with the correct data and the other with manipulated data (same columns). To start with the first cross distance test, I selected the "initials" attribute. Within the Cross Distances operator I selected "nominal measures" and "JaccardSimilarity". I received the following results:
Results:
I was expecting results such as 0.43, 0.33, etc.; see a real example below:
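Separately from the screenshots, a small Python sketch of a set-based Jaccard similarity may help as a reference point for the expected range. It is not RapidMiner's implementation; treating each "initials" value as a set of characters and the sample values themselves are assumptions for illustration. Defined this way, the result is always between 0 and 1.

```python
def jaccard_similarity(left, right):
    """Jaccard similarity of two strings, each treated as a set of characters.

    |A intersect B| / |A union B| is always in [0, 1].
    Two empty strings are treated as identical (an assumption).
    """
    set_left, set_right = set(left), set(right)
    union = set_left | set_right
    if not union:
        return 1.0
    return len(set_left & set_right) / len(union)


# Hypothetical "initials" values: correct data vs. manipulated data
pairs = [("J.R.", "J.R."), ("J.R.", "J.K."), ("J.R.", "M")]
scores = [jaccard_similarity(left, right) for left, right in pairs]
for (left, right), value in zip(pairs, scores):
    print(left, right, value)
```

If a tool reports values below 0 or above 1 for a measure like this, it is worth checking whether the output is really a similarity, or a distance or a differently scaled score.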