Problem combining all ExampleSets from an IOObjectCollection


Hello everyone,
I'm running a loop that creates one ExampleSet per iteration, so I end up with an IOObjectCollection on the output. My problem is joining all the ExampleSets that I get from looping over attributes into a single ExampleSet. I've tried all the join operators, but I'm stuck. I set the attribute "No" as the ID, and its values are the same in every ExampleSet. For example, my data looks like this:
example set 1 :
No att1
1
2
example set 2 :
No att2
1
2
example set 3 :
No att3
1
2
The result that I want looks like this:
example set :
No att1 att2 att3
1
2
I've looked for a reference and found a similar post, but I'm still stuck. Here is the similar post: http://community.www.turtlecreekpls.com/t5/Original-Rapid-I-Forum/Combining-Example-Set-Attributes/m-p/12879
Best Answers
-
Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299
RM Data Scientist
Hi,
I have attached an example process and the XML which should solve your problem.
Some key takeaways:
- The solution uses the Join operator and Remember / Recall within a Loop Collection.
- Joining needs an ID attribute. Either create one or use an existing attribute that can serve as the ID, and be sure to select the desired join type.
- IDs need to have the same value type (e.g. Numerical) in every ExampleSet. The operators under Blending -> Attributes -> Types can help here.
- A Join always needs two ExampleSets, so I Remember the first one to get around that.
- In each iteration of the loop, the Remembered dataset is Recalled, Joined with the current ExampleSet, and Remembered again.
- In the end you receive the final dataset, which can be Recalled outside of the Loop Collection.
Please keep in mind that Remember / Recall are great operators, but I do not recommend using them for huge datasets.
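The Remember / Recall loop described in the takeaways amounts to folding a join over the collection. Here is a minimal Python sketch of that idea, not RapidMiner code. It assumes each ExampleSet can be modelled as a dict that maps the ID value to a row; the names `join_on_id` and `join_collection` and the attribute values are hypothetical.

```python
from functools import reduce


def join_on_id(left, right):
    """Inner join of two tables keyed by ID; each table maps id -> {attribute: value}."""
    joined = {}
    for key, left_row in left.items():
        if key in right:
            overlap = set(left_row) & set(right[key])
            if overlap:
                # Attribute names must be unique in the joined result.
                raise ValueError(f"duplicate attribute name(s): {sorted(overlap)}")
            joined[key] = {**left_row, **right[key]}
    return joined


def join_collection(collection):
    """Keep the first table, then join every following table onto the running result."""
    return reduce(join_on_id, collection)


# Illustrative values only; the original post does not show the attribute values.
collection = [
    {1: {"att1": "a1"}, 2: {"att1": "a2"}},
    {1: {"att2": "b1"}, 2: {"att2": "b2"}},
    {1: {"att3": "c1"}, 2: {"att3": "c2"}},
]
result = join_collection(collection)

# IDs of different types never match: the integer 1 is not the string "1".
mismatched = join_on_id({1: {"att1": "a1"}}, {"1": {"att2": "b1"}})
```

In this sketch `result` holds one row per ID with att1, att2 and att3 side by side, while `mismatched` is empty, which mirrors the remark that the ID must have the same value type in every ExampleSet.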
Best,
Edin
Here is the XML:
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
<operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<operator activated="true" class="multiply" compatibility="7.5.001" expanded="true" height="124" name="Multiply" width="90" x="313" y="34"/>
<operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (4)" width="90" x="447" y="238">
<operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (3)" width="90" x="447" y="136">
<operator activated="true" class="loop_collection" compatibility="7.5.001" expanded="true" height="68" name="Loop Collection (2)" width="90" x="715" y="34">
<operator activated="true" class="generate_id" compatibility="7.5.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="238"/>
<operator activated="true" class="branch" compatibility="7.5.001" expanded="true" height="82" name="Branch (2)" width="90" x="514" y="238">
<operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (3)" width="90" x="45" y="34">
<operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (3)" width="90" x="45" y="34">
<operator activated="true" class="join" compatibility="7.5.001" expanded="true" height="82" name="Join (2)" width="90" x="179" y="85">
<operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (4)" width="90" x="313" y="85">Either <br/>- Generate an ID<br/>- Set the Role for an attribute to ID<br/><br/>Important is that the attribute names in the final exampleset must be unique<br/><br/>In addition the value type (Numerical vs. Polynominal) of the ID attribute has to be the same for each ExampleSet
<operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (2)" width="90" x="849" y="34">
Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299
RM Data Scientist
I included a Breakpoint in my solution right after the Collect operator. It is depicted with a red square symbol.
A Breakpoint pauses the Process and shows the intermediate result.
You have three options:
- Before starting the Process:
  - Remove the Breakpoint by clicking on the operator where the Breakpoint is assigned and pressing the shortcut key F7
  - Remove the Breakpoint by right-clicking on the operator where the Breakpoint is assigned and unchecking "Breakpoint After"
- After starting the Process: Resume the Process by clicking Run Process again (shortcut key F11)
Best regards,
Edin
-
Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299
RM Data Scientist
The Process itself is correct.
The reason for your problem is that each role (as well as each attribute name) can occur only once in an ExampleSet. Therefore the prediction is always overwritten.
Thus you need to change the role for each attribute. If all attributes have different names, you can use a solution similar to the one depicted in the screenshot below.
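As a rough illustration of why a repeated role overwrites the earlier prediction, here is a minimal Python sketch. It is a simplified model and not how RapidMiner stores roles internally: the role table is assumed to be a dict with one attribute per role, and the role and attribute names are hypothetical.

```python
def assign_roles(assignments):
    """Model a role table as a dict: one attribute per role, later assignments win."""
    roles = {}
    for role, attribute in assignments:
        roles[role] = attribute
    return roles


# Reusing the same role in every iteration keeps only the last prediction.
same_role = assign_roles([
    ("prediction", "prediction(att1)"),
    ("prediction", "prediction(att2)"),
])

# Giving each prediction its own role keeps all of them.
distinct_roles = assign_roles([
    ("prediction_att1", "prediction(att1)"),
    ("prediction_att2", "prediction(att2)"),
])
```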
Best regards,
Edin
Answers
You can append these together, but first the attributes will need to be renamed so that the datasets have the same structure (attribute names and data types). Try Rename by Generic Names followed by an Append, and you should get a resulting dataset that you can then transpose.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I've tried your recommendation, but an error appears. It says "duplicate attribute name". I put Rename by Generic Names inside the Loop Attributes operator, and Append and Transpose outside the loop operator.
I wouldn't put the Rename by Generic Names into the loop; I'd do it outside of the loop.
That gives an error too. It says "your connection is producing wrong type data". Maybe that is because after the loop the data type is IOObjectCollection, and Rename by Generic Names only expects an ExampleSet.
Thank you for the operator reference, the tips and the example. I'll try it with the model that I built.
P.S.: When I run your example, the output is still an object collection with several example sets.
Regards,
Bintang
I've been looking for another example and found a model similar to yours whose result is what I'm looking for. But when I tried it with my model, an error appeared on the Recall operator inside the Branch operator. It said "no object with name X was found during retrieval from the object store", even though I adjusted it to match the model.
Here is the XML code from the model that I adjusted:
I used the XML you posted in my RapidMiner (v 7.5.001) and it worked perfectly.
Did I miss something?
Best,
Edin
When I run your XML code, the output is an IOObjectCollection with 3 example sets that are not yet joined into one example set. That is why I looked for another reference and found the other XML code (in my previous reply).
Ah, thank you so much. I didn't realize there was a Breakpoint (I'm still new to RapidMiner). I'll try it with my model.
Only the first example set appears after joining. Here is my model, which I combined with your XML code. Is there any mistake in my configuration?
Ah, so that's the problem. Thank you for helping me, sir! Expect a question about another topic from me.
One thing makes me curious: in the model that I built, all example sets run the same neural network model. But every example set should have its own neural network model, right? Can I run a neural network model with a different neuron size, training cycles, learning rate and momentum for each example set? How do I do that?
You may have a look at the operator Optimize Parameters (Grid).
Within the Operator Help there is a Tutorial process linked which should point you in the right direction.
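To show the general idea behind a grid optimization run separately for each example set, here is a minimal Python sketch. It is not the Optimize Parameters (Grid) operator itself: the function names, the parameter grid and the scoring functions are hypothetical stand-ins for training and validating a model.

```python
from itertools import product


def grid_search(score, grid):
    """Try every parameter combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


grid = {"training_cycles": [100, 200, 500], "learning_rate": [0.01, 0.1, 0.3]}


def make_score(target_cycles, target_rate):
    """Hypothetical stand-in for 'train a model and return its validation score'."""
    def score(params):
        return (-abs(params["training_cycles"] - target_cycles)
                - abs(params["learning_rate"] - target_rate))
    return score


# One scoring function per example set, so each set gets its own best parameters.
scorers = {"att1": make_score(200, 0.1), "att2": make_score(500, 0.01)}
best_per_set = {name: grid_search(score, grid)[0] for name, score in scorers.items()}
```

Because the search runs once per example set, `best_per_set` keeps a separate parameter combination for each one instead of a single shared setting.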
Best regards,
Edin
Yeah, I've tried it and found the best ANN model for each example set, but how do I apply it to each example set? If I put it on the Neural Network operator, there is only one neural network model, which means that this one model is applied to all example sets, right?
I tried the posted solution.
I found that I prefer it with a Union operator instead of the Join. With the Join it would either repeat the column header, modified by the source it came from, or only have one instance of the attribute value.
@binsetyawan Since you are doing everything within Loop Attributes, each attribute has its own model. Does that answer your question?
@mskinner I suppose that depends on your use case. Did you just replace the Join with the Union operator? Since Union simply appends your ExampleSets, the number of examples in the final ExampleSet can increase drastically.
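The difference in row counts can be sketched in a few lines of Python. This is a simplified model of the two operations, not the exact behaviour of the RapidMiner operators; the functions `union` and `join`, the list-of-dicts representation and the attribute values are assumptions made for illustration.

```python
def union(left, right):
    """Append the rows of both tables; attributes missing from a row become None."""
    attributes = []
    for row in left + right:
        for name in row:
            if name not in attributes:
                attributes.append(name)
    return [{name: row.get(name) for name in attributes} for row in left + right]


def join(left, right, key="No"):
    """Inner join on `key`: one output row per ID that occurs in both tables."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Illustrative values only.
set1 = [{"No": 1, "att1": "a1"}, {"No": 2, "att1": "a2"}]
set2 = [{"No": 1, "att2": "b1"}, {"No": 2, "att2": "b2"}]

unioned = union(set1, set2)  # 4 rows, with gaps where an attribute is missing
joined = join(set1, set2)    # 2 rows, att1 and att2 side by side
```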
Best,
Edin
Yeah, I've tried the grid optimization and each example set got its own model, but how do I apply each model to its own example set when I use Loop Attributes?
Is it possible with RapidMiner? @Edin_Klapic @Thomas_Ott
I observed the exact opposite behavior. With the Join, any attribute that was already there was renamed and added as a new attribute, so the file size was huge.
When I used Union, it added the new example under the appropriate attribute if it existed and only created a new attribute when it did not already exist in the set it was being joined with.
Yeah, it depends on your case. In my case I need each example set to create a new attribute.