One Problem

kinkounio Member Posts: 9 Contributor II
edited November 2018 in Help
I have one file with many rows of data, and I want to compare it against a second file that holds a single row. The result should be one row from the first file: the row that is closest to the row in the second file.

How can I do this?

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    This question has come up a few times over the last few days. Here are the answers:

    You have two options.

    1. Load both data sets and merge them. Calculate a similarity measure for the merged data set. Filter out the combinations that do not include your single example. Sort the rest, and use the one with the highest similarity. All the necessary operators are part of RapidMiner.

    2. If the amount of data is rather large, calculating the full similarity matrix is probably not feasible. In that case, iterate over the examples: for each one, use only the current example, calculate its similarity to your single example of interest, and store it via ProcessLog. Afterwards you can convert the process log back into a data set, sort it, etc.
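
As a rough illustration outside RapidMiner, the two options can be sketched in plain Python. Everything here is an assumption made for the example: the function names are hypothetical, the sample rows are invented, and the similarity is taken to be 1 / (1 + Euclidean distance), since the thread does not say which measure to use. It is a sketch of the idea, not the RapidMiner process itself.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Assumed measure: 1 / (1 + Euclidean distance); identical rows score 1.0."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def closest_via_full_matrix(rows, query):
    """Option 1: merge, score every pair, keep only pairs that contain the query."""
    merged = rows + [query]
    query_index = len(merged) - 1
    # Full pairwise similarity "matrix" over the merged data (i < j for every pair).
    pairs = [
        (i, j, similarity(merged[i], merged[j]))
        for i, j in combinations(range(len(merged)), 2)
    ]
    # Filter out combinations the query is not part of, then sort the rest.
    with_query = [(score, i) for i, j, score in pairs if j == query_index]
    with_query.sort(key=lambda pair: pair[0], reverse=True)
    return with_query[0][1]


def closest_via_iteration(rows, query):
    """Option 2: visit one example at a time and log its similarity to the query."""
    log = []
    for index, row in enumerate(rows):
        log.append({"index": index, "similarity": similarity(row, query)})
    # Treat the log as a small data set and sort it by similarity.
    log.sort(key=lambda entry: entry["similarity"], reverse=True)
    return log[0]["index"]


# Invented sample data, only to exercise both functions.
sample_rows = [[1, 2], [5, 5], [9, 9]]
sample_query = [5, 6]
index_option_1 = closest_via_full_matrix(sample_rows, sample_query)
index_option_2 = closest_via_iteration(sample_rows, sample_query)
```

Both functions return the index of the most similar row. The first scores every pair, so its cost grows quadratically with the number of rows; the second scores each row once against the query.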

    Cheers,
    Ingo
  • kinkounio Member Posts: 9 Contributor II
    Good morning.

    Where is the similar post?

    Thanks.
  • kinkounio Member Posts: 9 Contributor II
    Hi.

    I want to compare 2 files.

    historik.txt

    1 73 15 16 13 14 15
    2 123 25 26 23 24 25
    3 173 35 36 33 34 35
    4 224 45 46 43 44 46
    5 274 55 56 53 54 56

    dades.txt

    25 26 23 24 25

    The correct result would be the second row of the first file, with value 123.
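
To make the expected result concrete, here is a minimal Python sketch of the comparison on this data. It assumes that columns 3 to 7 of historik.txt are compared with the five values in dades.txt, that column 2 is the value to return, and that "closest" means smallest Euclidean distance; the file contents are inlined as strings and the function names are hypothetical.

```python
import math

# File contents copied from the post, inlined so the sketch is self-contained.
HISTORIK_TXT = """\
1 73 15 16 13 14 15
2 123 25 26 23 24 25
3 173 35 36 33 34 35
4 224 45 46 43 44 46
5 274 55 56 53 54 56
"""
DADES_TXT = "25 26 23 24 25\n"


def parse_rows(text):
    """Split whitespace-separated integer rows, skipping empty lines."""
    return [
        [int(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


def closest_value(historik, dades_row):
    """Compare columns 3-7 (indices 2..6) and return column 2 (index 1)."""
    best = min(historik, key=lambda row: math.dist(row[2:7], dades_row))
    return best[1]


historik = parse_rows(HISTORIK_TXT)
dades_row = parse_rows(DADES_TXT)[0]
result = closest_value(historik, dades_row)
print(result)  # prints 123
```

Row 2 matches dades.txt exactly (distance 0), which is why 123 is the expected answer.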

    With this code the result is not correct: it returns 73. What am I doing wrong?

    AML files:

    dades.aml


    name = "dades.txt (1)"
    sourcecol = "1"
    valuetype = "integer"/>

    name = "dades.txt (2)"
    sourcecol = "2"
    valuetype = "integer"/>

    name = "dades.txt (3)"
    sourcecol = "3"
    valuetype = "integer"/>

    name = "dades.txt (4)"
    sourcecol = "4"
    valuetype = "integer"/>

    name = "dades.txt (5)"
    sourcecol = "5"
    valuetype = "integer"/>



    historik.aml



    name = "historik.txt (1)"
    sourcecol = "1"
    valuetype = "integer"/>

    name = "historik.txt (2)"
    sourcecol = "2"
    valuetype = "integer"/>

    name = "historik.txt (3)"
    sourcecol = "3"
    valuetype = "integer"/>

    name = "historik.txt (4)"
    sourcecol = "4"
    valuetype = "integer"/>

    name = "historik.txt (5)"
    sourcecol = "5"
    valuetype = "integer"/>

    name = "historik.txt (6)"
    sourcecol = "6"
    valuetype = "integer"/>

    name = "historik.txt (7)"
    sourcecol = "7"
    valuetype = "integer"/>



    How can I do it?

    Thanks.
  • haddock Member Posts: 849 Maven
    Hi,

    The answer to your problem is that for some reason only known to yourself you call column three a cluster!

    <cluster
    name = "historik.txt (3)"
    sourcecol = "3"
    valuetype = "integer"/>

    I've laid out the data in one file like this...

    1 73 15 16 13 14 15
    2 123 25 26 23 24 25
    3 173 35 36 33 34 35
    4 224 45 46 43 44 46
    5 274 55 56 53 54 56
    6 ? 25 26 23 24 25
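
The process XML for this layout is not shown here, so the following is only a sketch of one way a single table with a missing value can be used: treat column 2 as a label, and fill the `?` from the nearest labelled row (a 1-nearest-neighbour lookup on columns 3 to 7). The choice of squared Euclidean distance and all function names are assumptions for illustration, not the operators used in the thread.

```python
# The combined table from the post, inlined so the sketch is self-contained.
TABLE_TXT = """\
1 73 15 16 13 14 15
2 123 25 26 23 24 25
3 173 35 36 33 34 35
4 224 45 46 43 44 46
5 274 55 56 53 54 56
6 ? 25 26 23 24 25
"""


def parse_table(text):
    """Return (row_id, label, features) tuples; a '?' label becomes None."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        row_id = int(tokens[0])
        label = None if tokens[1] == "?" else int(tokens[1])
        features = [int(token) for token in tokens[2:]]
        rows.append((row_id, label, features))
    return rows


def squared_distance(a, b):
    """Squared Euclidean distance; enough for ranking neighbours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fill_missing_labels(rows):
    """Copy the label of the nearest labelled row into each unlabelled row."""
    labelled = [row for row in rows if row[1] is not None]
    filled = []
    for row_id, label, features in rows:
        if label is None:
            nearest = min(labelled, key=lambda r: squared_distance(r[2], features))
            label = nearest[1]
        filled.append((row_id, label, features))
    return filled


filled_rows = fill_missing_labels(parse_table(TABLE_TXT))
print(filled_rows[-1][1])  # prints 123
```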


    and made the necessary code changes to this...

    and rather unsurprisingly the correct answer emerges.

    So the answer to
    How can I do it?
    is

    With more care!
  • kinkounio Member Posts: 9 Contributor II
    Hi, haddock.

    Your code is not the solution. I want to compare attributes 3-7 of file 1 with the attributes of file 2, and the result should be attribute 2 of file 1.

    The "cluster" column was a mistake on my part.

    I want to obtain one value from the second column of file 1: the value from the row of file 1 whose values are the same as those in file 2.

    In my example, comparing the 2 files should give the second column of the second row of file 1.
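
If "the same values" is meant literally (an exact match rather than the closest row), the lookup described above can be sketched in Python with a dictionary keyed on attributes 3-7. This is only an illustration under that assumption; the function names are hypothetical and the data is copied from the earlier post.

```python
# Rows of historik.txt as posted earlier in the thread.
HISTORIK_ROWS = [
    [1, 73, 15, 16, 13, 14, 15],
    [2, 123, 25, 26, 23, 24, 25],
    [3, 173, 35, 36, 33, 34, 35],
    [4, 224, 45, 46, 43, 44, 46],
    [5, 274, 55, 56, 53, 54, 56],
]


def build_lookup(historik_rows):
    """Map attributes 3-7 (indices 2..6) to attribute 2 (index 1)."""
    lookup = {}
    for row in historik_rows:
        key = tuple(row[2:7])
        # If two rows share the same attributes, keep the first one seen.
        lookup.setdefault(key, row[1])
    return lookup


def exact_match_value(historik_rows, dades_row):
    """Return attribute 2 of the matching row, or None when nothing matches."""
    return build_lookup(historik_rows).get(tuple(dades_row))


match = exact_match_value(HISTORIK_ROWS, [25, 26, 23, 24, 25])
print(match)  # prints 123
```

Unlike a nearest-row search, this returns `None` when no row of file 1 matches file 2 exactly.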

    Thanks.
  • haddock Member Posts: 849 Maven
    The correct result would be the second row of the first file, with value 123.
    To make it even easier for you to comprehend I've put the data into CSV form, then we don't need AML files at all. So here is the data...

    1, 73, 15, 16, 13, 14, 15
    2, 123, 25, 26, 23, 24, 25
    3, 173, 35, 36, 33, 34, 35
    4, 224, 45, 46, 43, 44, 46
    5, 274, 55, 56, 53, 54, 56
    6, , 25, 26, 23, 24, 25

    For the same reason I've taken out the second data read and replaced it with a datacopy, like this...

    If I run this I get "123" as the answer, just like before, so I'm puzzled as to what you mean by the following
    Your code is not the solution. I want to compare attributes 3-7 of file 1 with the attributes of file 2, and the result should be attribute 2 of file 1.
    Perhaps you could enlighten us?
  • kinkounio Member Posts: 9 Contributor II
    Hi,
    Thanks, haddock.

    I will try it.