Problem combining all ExampleSets from an IOObjectCollection

binsetyawan Member Posts: 46 Guru
edited November 2018 in Help

Hello everyone

I'm running a loop that creates one ExampleSet per iteration, so I end up with an IOObjectCollection at the output. My problem is joining all the example sets I get from looping over the attributes into a single example set. I've tried every join operator, but I'm stuck. I set the attribute "No" as the ID, and its values are the same in every example set. For example, my data look like this:

example set 1:

No att1
1
2

example set 2:

No att2
1
2

example set 3:

No att3
1
2

The result that I want looks like this:

example set:

No att1 att2 att3
1
2

I looked for a reference and found a similar post, but I'm still stuck. Here is the similar post: http://community.www.turtlecreekpls.com/t5/Original-Rapid-I-Forum/Combining-Example-Set-Attributes/m-p/12879

Best Answers

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    Solution Accepted

    Hi,

    I have attached an example process and the XML which should solve your problem.

    Some key takeaways:

    1. The solution uses the Join operator and Remember / Recall within a Loop Collection.
    2. Joining needs an ID attribute. Either create one or use an existing one, and be sure to choose the desired join type.
    3. IDs need to have the same value type (e.g. Numerical). The operators under Blending -> Attributes -> Types can help here.
    4. A Join always needs two ExampleSets, so I Remember the first one to get started.
    5. In each iteration of the loop, the remembered dataset is Recalled, Joined with the current ExampleSet, and Remembered again.
    6. In the end you get the final dataset, which can be Recalled outside the Loop Collection.
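The Remember / Recall / Join loop described in the takeaways above amounts to folding the collection into one table keyed on the ID. The following is a minimal Python sketch of that logic, not RapidMiner code. It assumes each ExampleSet can be modeled as a list of dicts, that the ID attribute is called `No` as in the question, and uses made-up cell values; `join_on_id` and `combine_collection` are hypothetical helper names.

```python
def join_on_id(left, right, id_name="No"):
    """Inner join of two row lists on id_name; IDs are cast to int so types match."""
    right_by_id = {int(row[id_name]): row for row in right}
    joined = []
    for row in left:
        key = int(row[id_name])
        if key in right_by_id:
            merged = {**row, **right_by_id[key]}
            merged[id_name] = key  # keep one ID value with a single type
            joined.append(merged)
    return joined


def combine_collection(collection, id_name="No"):
    """Fold a collection of example sets into one, mirroring Remember/Recall/Join."""
    remembered = None  # nothing has been remembered before the first iteration
    for example_set in collection:
        if remembered is None:
            # First iteration: there is nothing to join with, so just remember it.
            remembered = [{**row, id_name: int(row[id_name])} for row in example_set]
        else:
            # Later iterations: recall, join with the current set, remember again.
            remembered = join_on_id(remembered, example_set, id_name)
    return remembered if remembered is not None else []


# Hypothetical data: three example sets sharing the ID "No".
collection = [
    [{"No": 1, "att1": 0.1}, {"No": 2, "att1": 0.2}],
    [{"No": "1", "att2": 0.3}, {"No": "2", "att2": 0.4}],  # IDs stored as text
    [{"No": 1, "att3": 0.5}, {"No": 2, "att3": 0.6}],
]
result = combine_collection(collection)
```

The second set deliberately stores its IDs as text to show why the value types have to be aligned before joining: without the `int(...)` cast, `1` and `"1"` would not match.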

    Please keep in mind that Remember / Recall are great operators, but I do not recommend using them with huge datasets.

    Best,

    Edin

    Here is the XML:

    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <operator activated="true" class="multiply" compatibility="7.5.001" expanded="true" height="124" name="Multiply" width="90" x="313" y="34"/>
    <operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (4)" width="90" x="447" y="238">
    <operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (3)" width="90" x="447" y="136">
    <operator activated="true" class="loop_collection" compatibility="7.5.001" expanded="true" height="68" name="Loop Collection (2)" width="90" x="715" y="34">
    <operator activated="true" class="generate_id" compatibility="7.5.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="238"/>
    <operator activated="true" class="branch" compatibility="7.5.001" expanded="true" height="82" name="Branch (2)" width="90" x="514" y="238">
    <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (3)" width="90" x="45" y="34">
    <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (3)" width="90" x="45" y="34">
    <operator activated="true" class="join" compatibility="7.5.001" expanded="true" height="82" name="Join (2)" width="90" x="179" y="85">
    <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (4)" width="90" x="313" y="85">


    Either
    - Generate an ID
    - Set the Role for an attribute to ID

    Important is that the attribute names in the final ExampleSet must be unique.

    In addition, the value type (Numerical vs. Polynominal) of the ID attribute has to be the same for each ExampleSet.


    <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (2)" width="90" x="849" y="34">

















  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    Solution Accepted

    I included a Breakpoint in my solution right after the Collect operator. It is shown as a red square symbol.

    A Breakpoint pauses the Process and shows the intermediate result.

    You have three options:

    1. Before starting the Process:
      1. Remove the Breakpoint by clicking the operator it is assigned to and pressing the shortcut key F7
      2. Remove the Breakpoint by right-clicking the operator it is assigned to and unchecking "Breakpoint After"
    2. After starting the Process: resume the Process by clicking Run Process again (shortcut key F11)

    Best regards,

    Edin

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    Solution Accepted

    The Process itself is correct.

    The reason for your problem is that each role (as well as each attribute name) can occur only once in an ExampleSet. Therefore the prediction is always overwritten.

    Thus you need to change the role for each attribute. If all attributes have different names, you can use a solution similar to the one in the screenshot below.

    image.png
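To illustrate why the prediction gets overwritten, here is a small Python sketch. It is an analogy, not RapidMiner internals: a row is modeled as a dict, in which a name can occur only once. The attribute name `prediction`, the suffixed names, the values, and the helper `rename_attribute` are all assumptions made for the example.

```python
def rename_attribute(rows, old, new):
    """Return copies of the rows with attribute `old` renamed to `new`."""
    return [
        {(new if name == old else name): value for name, value in row.items()}
        for row in rows
    ]


# Hypothetical results of two loop iterations, both using the name "prediction".
set_a = [{"No": 1, "prediction": 0.7}]
set_b = [{"No": 1, "prediction": 0.2}]

# Same name on both sides: the later value replaces the earlier one.
overwritten = {**set_a[0], **set_b[0]}

# Unique names per iteration: both values survive the merge.
kept = {
    **rename_attribute(set_a, "prediction", "prediction_att1")[0],
    **rename_attribute(set_b, "prediction", "prediction_att2")[0],
}
```

Giving each iteration's result a distinct name (and, in RapidMiner, a distinct role) before joining is what keeps the earlier predictions from being lost.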

    Best regards,

    Edin


Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You can append these together, but first the attributes will need to be renamed so the datasets have the same structure (attribute names and data types). Try Rename by Generic Names followed by an Append, and you should get a resulting dataset that you can then transpose.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • binsetyawan Member Posts: 46 Guru

    I tried your recommendation, but an error appears that says "duplicate attribute name". I put Rename by Generic Names inside the Loop Attributes operator, and Append and Transpose outside the loop.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I wouldn't put the Rename by Generic Names operator inside the loop; I'd do it outside the loop.

  • binsetyawan Member Posts: 46 Guru

    That produces an error too. It says "your connection is producing wrong type data". Maybe that is because the output of the loop is an IOObjectCollection, while Rename by Generic Names expects a single example set.

  • binsetyawan Member Posts: 46 Guru

    Thank you for the operator reference, the tips, and the example. I'll try it with the model that I built.

    P.S.: When I run your example, the result is still an object collection containing several example sets.

    Regards,

    Bintang

  • binsetyawan Member Posts: 46 Guru

    I looked for another example and found a model similar to yours whose result is what I'm looking for. But when I tried it with my model, an error appears on the Recall operator inside the Branch operator. It says "no object with name X was found during retrieval from the object store", even though I adapted it to my model.

    Here is the XML code of the model that I adapted:

    <operator activated="true" class="process" compatibility="6.1.000-SNAPSHOT" expanded="true" name="Process">
    <operator activated="true" class="subprocess" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Subprocess" width="90" x="112" y="30">
    <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="30">
    <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="45" y="120">
    <operator activated="true" class="append" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Append" width="90" x="179" y="30"/>
    <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="45" y="210">
    <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="45" y="300">
    <operator activated="true" class="append" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Append (2)" width="90" x="179" y="210"/>
    <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (5)" width="90" x="45" y="390">
    <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (6)" width="90" x="45" y="480">
    <operator activated="true" class="append" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Append (3)" width="90" x="179" y="390"/>
    <operator activated="true" class="collect" compatibility="6.1.000-SNAPSHOT" expanded="true" height="112" name="Collect" width="90" x="380" y="210"/>
    <operator activated="true" class="multiply" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Multiply (2)" width="90" x="246" y="30"/>
    <operator activated="true" class="select" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Select (2)" width="90" x="447" y="30"/>
    <operator activated="true" class="remember" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Remember" width="90" x="581" y="30">
    <operator activated="true" class="loop_collection" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Loop Collection" width="90" x="447" y="165">
    <operator activated="true" class="branch" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Branch" width="90" x="112" y="120">
    <operator activated="true" class="recall" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Recall" width="90" x="112" y="75">
    <operator activated="true" class="join" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Join" width="90" x="246" y="30">
    <operator activated="true" class="remember" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Remember (2)" width="90" x="380" y="30">
    <portSpacing port="source_single" spacing="0"/>
    <operator activated="true" class="recall" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Recall (2)" width="90" x="581" y="165">












  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    I used the XML you posted in my RapidMiner (v 7.5.001) and it worked perfectly.

    Did I miss something?

    Best,

    Edin

  • binsetyawan Member Posts: 46 Guru

    When I run your XML code, the result is an IOObjectCollection with 3 example sets that are not yet joined into one example set. That is why I looked for another reference and found the other XML code (in my previous reply).

  • binsetyawan Member Posts: 46 Guru

    Ah, thank you so much, I didn't realize there was a breakpoint (I'm still new to RapidMiner). I'll try it with my model.

  • binsetyawan Member Posts: 46 Guru

    Only the first example set appears after the join. Here is my model, which I combined with your XML code. Is there a mistake in my configuration?

  • binsetyawan Member Posts: 46 Guru

    Ah, so that's the problem. Thank you for helping me, sir! Expect a question about another topic from me :smileyvery-happy:

  • binsetyawan Member Posts: 46 Guru

    One thing makes me curious: in the model I built, all example sets run the same neural network model. But every example set should have its own neural network model, right? Can I run a neural network with a different neuron size, training cycles, learning rate, and momentum for each example set? How would I do that?

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    You may have a look at the operator Optimize Parameters (Grid).

    Within the Operator Help there is a Tutorial process linked which should point you in the right direction.

    Best regards,

    Edin

  • binsetyawan Member Posts: 46 Guru

    Yes, I've tried it and found the best ANN model for each example set, but how do I apply it to each example set? If I enter the parameters in the neural network operator, they describe only one neural network model, which means this one model is applied to all example sets, right?

  • mskinner Member Posts: 10 Contributor I

    I tried the posted solution.

    I found that I prefer it with a Union operator instead of the Join. With the Join, it would either repeat the column header, modified by the source it came from, or keep only one instance of the attribute value.

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    @binsetyawan Since you are doing everything within Loop Attributes, each attribute has its own model. Does that answer your question?

    @mskinner I suppose that depends on your use case. Did you just replace the Join with the Union operator? Since Union simply appends your ExampleSets, the number of examples in the final ExampleSet can increase drastically.
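The difference between the two operators can be sketched in plain Python. This is an illustration of the general join/union idea on made-up data, not the RapidMiner implementation, and `union_rows` and `join_rows` are hypothetical helpers. A union stacks the examples and adds a column only when the name is new, so the row count grows. A join on the ID keeps one row per ID and adds columns.

```python
def union_rows(left, right):
    """Stack the rows of both sets; attributes a row lacks are filled with None."""
    names = []
    for row in left + right:
        for name in row:
            if name not in names:
                names.append(name)
    return [{name: row.get(name) for name in names} for row in left + right]


def join_rows(left, right, id_name="No"):
    """Inner join on id_name: one output row per ID present in both sets."""
    right_by_id = {row[id_name]: row for row in right}
    return [
        {**row, **right_by_id[row[id_name]]}
        for row in left
        if row[id_name] in right_by_id
    ]


# Hypothetical example sets that share the ID attribute "No".
set_1 = [{"No": 1, "att1": 10}, {"No": 2, "att1": 20}]
set_2 = [{"No": 1, "att2": 30}, {"No": 2, "att2": 40}]

stacked = union_rows(set_1, set_2)  # four rows with missing values
joined = join_rows(set_1, set_2)    # two rows, one per ID
```

Which result is wanted depends on the use case: the union suits sets that mostly share attributes, while the join suits sets that each contribute a new attribute for the same IDs.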

    Best,

    Edin

  • binsetyawan Member Posts: 46 Guru

    Yes, I've tried Optimize Parameters (Grid) and each example set got its own model, but how do I apply each model to its own example set when I use Loop Attributes?

    Is that possible with RapidMiner? @Edin_Klapic @Thomas_Ott

  • mskinner Member Posts: 10 Contributor I

    I observed the exact opposite behavior. With the Join, any attribute that already existed was renamed and added as a new attribute, so the file size was huge.

    When I used Union, it added the new examples under the appropriate attribute if it existed and only created a new attribute when it did not already exist in the set it was being joined with.

  • binsetyawan Member Posts: 46 Guru

    Yes, it depends on your case. In my case I need each example set to create a new attribute.
