"Similarity between two documents using Cross Distances"

asafwat | Member | Posts: 4 | Contributor I
edited June 2019 in Help
I am using RapidMiner to compare the similarity between two text fields in two sheets of the same Excel file using Cross Distances. I want to compare one request with all references and return the similarity value using cosine similarity. The problem is that the distance is returned as a question mark '?', and I don't know why.
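For reference, here is a minimal Python sketch of what "compare one request with all references using cosine similarity" computes. It is not RapidMiner's implementation, and the function names are hypothetical. It also shows one general way a '?' (missing value) can appear: cosine similarity is 0/0, and therefore undefined, when either vector contains only zeros, and comparing vectors whose attributes do not line up is meaningless.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors that share the same attributes in the same order."""
    if len(a) != len(b):
        raise ValueError("both vectors need the same attributes in the same order")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # 0 / 0 is undefined: return NaN, i.e. a missing value
        return float("nan")
    return dot / (norm_a * norm_b)


def cross_similarities(
    requests: List[List[float]], references: List[List[float]]
) -> List[List[float]]:
    """One row per request, one column per reference (hypothetical helper)."""
    return [[cosine_similarity(req, ref) for ref in references] for req in requests]


if __name__ == "__main__":
    requests = [[3.0, 4.0, 0.0]]
    references = [[3.0, 4.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
    print(cross_similarities(requests, references))  # [[1.0, 0.0, nan]]
```

The third reference is all zeros, so its similarity is NaN rather than a number.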

<parameter key="transform_to" value="lower case"/>



<parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>

<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>

<parameter key="transform_to" value="lower case"/>



<parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>

<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>

<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
Read Requirements Document
Read Requirements Change Requests



(Attached screenshots: Screen Shot 2018-07-29 at 12.45.20 PM.png, Screen Shot 2018-07-29 at 12.46.20 PM.png, Screen Shot 2018-07-29 at 12.46.37 PM.png)

Best Answers

  • lionelderkrikor | Moderator, RapidMiner Certified Analyst, Member | Posts: 1,195 | Unicorn
    Solution Accepted

    Hi @asafwat,

    I think I have found part of the answer (the calculated distances/similarities now have numerical values):

    The documentation of the Cross Distances operator says:

    "Please note that both input ExampleSets should have the same attributes and in the same order."

    So you have to use a Superset operator (see the documentation of this operator) to feed the req and ref ports of the Cross Distances operator with two datasets that have exactly the same attributes.
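    The idea behind the Superset step can be sketched in Python. This is a hypothetical helper, not RapidMiner's Superset operator: both sets end up with exactly the same attributes in the same order, so position i of a request vector and position i of a reference vector always refer to the same term. Filling a term that is absent from a document with 0.0 (zero occurrences) is an assumption of this sketch; check the Superset documentation for how RapidMiner fills the attributes it adds.

```python
from typing import Dict, List, Tuple

TermVector = Dict[str, float]


def align_to_common_attributes(
    requests: List[TermVector], references: List[TermVector]
) -> Tuple[List[str], List[List[float]], List[List[float]]]:
    """Give both sets the union of all attributes, in one shared sorted order.

    Assumption of this sketch: a term absent from a document is filled with 0.0.
    """
    attributes = sorted({term for vec in requests + references for term in vec})

    def to_rows(vectors: List[TermVector]) -> List[List[float]]:
        # Every row lists its values in the same attribute order
        return [[vec.get(term, 0.0) for term in attributes] for vec in vectors]

    return attributes, to_rows(requests), to_rows(references)


if __name__ == "__main__":
    req = [{"login": 1.0, "error": 2.0}]
    ref = [{"error": 1.0, "report": 3.0}]
    attrs, req_rows, ref_rows = align_to_common_attributes(req, ref)
    print(attrs)     # ['error', 'login', 'report']
    print(req_rows)  # [[2.0, 1.0, 0.0]]
    print(ref_rows)  # [[1.0, 0.0, 3.0]]
```

    Without this alignment, the request set would have the attributes (login, error) and the reference set (error, report), and a position-by-position comparison would pair unrelated terms.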

    I also made some changes to your process:

    - in the Process Documents from Data operators: vector creation -> Term Occurrences.

    - in the Tokenize operators: mode -> non letters.
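    Roughly, these two settings mean the following. This is a plain-Python approximation, not RapidMiner's Text Processing code, and the helper name is hypothetical; the lower-casing mirrors the transform_to = lower case step in the process. Tokenizing on non letters splits the text at every character that is not a letter, and Term Occurrences represents each document by the raw count of each token.

```python
import re
from collections import Counter
from typing import Dict


def term_occurrences(text: str) -> Dict[str, int]:
    """Lower-case the text, split it on every non-letter character and count each token."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore)
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return dict(Counter(tokens))


if __name__ == "__main__":
    print(term_occurrences("Login error: error-code 42 on login_page"))
    # {'login': 2, 'error': 2, 'code': 1, 'on': 1, 'page': 1}
```

    Note that digits, punctuation and underscores all act as separators, so "login_page" yields the two tokens "login" and "page".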

    The process:

    <parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>

    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>

    <parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>

    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>

    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    Read Requirements Document
    Read Requirements Change Requests



    I hope this helps,

    Regards,

    Lionel

  • lionelderkrikor | Moderator, RapidMiner Certified Analyst, Member | Posts: 1,195 | Unicorn
    Solution Accepted

    Hi (one more time...) @asafwat,

    Just one (last?) small piece of advice: you don't need to set an attribute's role to "regular" in the Set Role operator.

    By default, RapidMiner automatically sets an attribute's role to "regular".

    Regards,

    Lionel


Answers

  • lionelderkrikor | Moderator, RapidMiner Certified Analyst, Member | Posts: 1,195 | Unicorn

    Hi @asafwat,

    Are your attributes "numerical"?

    Can you share your dataset(s) so that we can reproduce what you are observing?

    Regards,

    Lionel

  • asafwat | Member | Posts: 4 | Contributor I

    Sure, here it is. I converted it to CSV in order to attach it.

  • lionelderkrikor | Moderator, RapidMiner Certified Analyst, Member | Posts: 1,195 | Unicorn

    Hi again @asafwat,

    I am having difficulties with your CSV file. Can you send me your original Excel file by either:

    - zipping it and attaching it to this post, or

    - uploading your Excel file to Google Drive and sharing the link here in the forum?

    Regards,

    Lionel

  • lionelderkrikor | Moderator, RapidMiner Certified Analyst, Member | Posts: 1,195 | Unicorn

    Hi again (one more time) @asafwat,

    Can you send me your WordNet dictionary too (by zipping it, for example)?

    Regards,

    Lionel

  • asafwat | Member | Posts: 4 | Contributor I

    @lionelderkrikor Wooow, it works! Great effort, you really made my day. Much appreciated.

    Thanks a lot
