"Similarity between two documents using Cross Distances"
- I am using RapidMiner to compare the similarity between two text fields in two sheets of the same Excel file using Cross Distances. I want to compare one request with all references and return the cosine similarity value. The problem is that the distance is returned as a question mark '?', and I don't know why.
<parameter key="transform_to" value="lower case"/>
<parameter key="directory" value="/用户/电脑/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<parameter key="transform_to" value="lower case"/>
<parameter key="directory" value="/用户/电脑/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
Read Requirements Document Read Requirements Change Requests
Best Answers
- lionelderkrikor, Moderator, RapidMiner Certified Analyst
Hi @asafwat,
I think I found part of the answer (the calculated distances/similarities now have numerical values):
The documentation of the Cross Distances operator says:
"Please note that both input ExampleSets should have the same attributes and in the same order".
So you have to use a Superset operator (cf. the documentation of this operator) to feed the req and ref ports of the Cross Distances operator with two datasets that have strictly the same attributes.
Moreover, I made some modifications to your process:
- in the Process Documents from Data operators: vector creation -> Term Occurrences.
- in the Tokenize operator: mode -> non letters.
The process:
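To illustrate why the attribute sets must match, here is a minimal Python sketch of the idea outside RapidMiner. The function names (`superset`, `cosine_similarity`, `cross_similarities`) and the sample data are hypothetical, and filling absent terms with 0 is an assumption of this sketch, not a statement about how the Superset operator fills new attributes.

```python
import math


def superset(req_rows, ref_rows):
    """Give both sets the same attributes in the same order.

    Assumption of this sketch: a term absent from a row counts as 0.
    """
    attributes = sorted({name for row in req_rows + ref_rows for name in row})

    def to_vector(row):
        return [float(row.get(name, 0)) for name in attributes]

    return attributes, [to_vector(r) for r in req_rows], [to_vector(r) for r in ref_rows]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (NaN if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else float("nan")


def cross_similarities(req_rows, ref_rows):
    """Compare every request row with every reference row."""
    _, req_vectors, ref_vectors = superset(req_rows, ref_rows)
    return [[cosine_similarity(q, r) for r in ref_vectors] for q in req_vectors]


# Hypothetical term-occurrence rows: one request, two references.
requests = [{"login": 1, "page": 1}]
references = [{"page": 1, "login": 1}, {"error": 2}]
similarities = cross_similarities(requests, references)
print(similarities)
```

Without the alignment step, the two vectors would have different lengths or orders, and a position-by-position comparison would not be meaningful.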
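As a rough illustration of these two settings, the following Python sketch splits text on non-letter characters and counts raw term occurrences. It is an approximation under stated assumptions, not RapidMiner's implementation: the helper names and the sample sentence are hypothetical, and the lowercasing step stands in for the process's "lower case" case transformation.

```python
import re
from collections import Counter


def tokenize_non_letters(text):
    """Split text into tokens at every run of non-letter characters."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def term_occurrences(text):
    """Raw count of each token, i.e. an unnormalized term-occurrence vector."""
    return Counter(tokenize_non_letters(text))


# Hypothetical change-request text.
counts = term_occurrences("Reset the user-password; reset PASSWORD 2x!")
print(counts)
```

Raw counts keep every attribute numerical, which is what a cosine-based distance needs as input.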
<parameter key="directory" value="/用户/电脑/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<parameter key="directory" value="/用户/电脑/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
Read Requirements Document Read Requirements Change Requests
I hope it helps,
Regards,
Lionel
- lionelderkrikor, Moderator, RapidMiner Certified Analyst
Hi (one more time...) @asafwat,
Just one (last?) small piece of advice: you don't need to specify that an attribute is "regular" in the Set Role operator.
By default, RapidMiner automatically sets an attribute as "regular".
Regards,
Lionel
Answers
Hi @asafwat,
Are your attributes "numerical"?
Can you share your dataset(s) so that we can reproduce what you observe?
Regards,
Lionel
Sure, here it is. I have converted it to CSV in order to attach it.
Hi again @asafwat,
I am having difficulties with your CSV file. Can you send me your original Excel file by either:
- zipping it and attaching it to this post, or
- uploading it to Google Drive and sharing the link here in the forum?
Regards,
Lionel
Hello again (one more time) @asafwat,
Can you send me your WordNet dictionary too (by zipping it, for example)?
Regards,
Lionel
@lionelderkrikor Wow, it works! Great effort, you really made my day. Much appreciated.
Thanks a lot