"PIVOT operator creates erroneous output"
Hello, i was testing the results of pivot in Excel and Rapidminer and i found out that they are completely different, whith those of RM being wrong! Any ideas?
Here is the dataset i used in both cases:
Student | Item | Amount |
Id 1 | Item 1 | 1 |
Id 2 | Item 2 | 8 |
Id 3 | Item 1 | 1 |
Id 3 | Item 1 | 1 |
Id 2 | Item 2 | 1 |
Id 2 | Item 1 | 1 |
Id 3 | Item 1 | 1 |
Id 2 | Item 1 | 1 |
Id 2 | Item 2 | 5 |
Id 2 | Item 1 | 1 |
and here is the result from Pivot operation in Excel (the correct one)
Row Labels | Item 1 | Item 2 |
Id 1 | 1 | |
Id 2 | 3 | 14 |
Id 3 | 3 |
and here is the result from the Pivot operator in RM. i selected the student attr as ID and Itema s Index and SUM as weight aggregation.
Row No | Item1 | Item 2 |
1 | 1.0 | ? |
2 | 1.0 | 5.0 |
3 | 1.0 | ? |
here is the process XML
<运营商激活= " true " class = "process" compatibility="7.4.000" expanded="true" name="Process">
<运营商激活= " true " class = "generate_transaction_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Transaction Data" width="90" x="45" y="340">
<运营商激活= " true " class = "multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="246" y="289"/>
<运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot" width="90" x="246" y="85">
<运营商激活= " true " class = "rename_by_replacing" compatibility="7.4.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="447" y="34">
Best Answer
-
Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, MemberPosts:1,635Unicorn
@mmarag, and if you want to actually get the data structure similar to the Excel output, where the row is by id and there is a seprate column for each item sum, then you can add a pivot after the Aggregate operator that@Thomas_Ottshowed in his process. Here's an updated process that adds that piece:
<运营商激活= " true " class = "process" compatibility="7.4.000" expanded="true" name="Process">
<运营商激活= " true " class = "generate_transaction_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Transaction Data" width="90" x="45" y="34">
<运营商激活= " true " class = "multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="246" y="136"/>
<运营商激活= " true " class = "aggregate" compatibility="7.4.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="289">
<运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot (2)" width="90" x="648" y="289">
<运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">
<运营商激活= " true " class = "rename_by_replacing" compatibility="7.4.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="581" y="34">但请记住,对于某些类型的analysis, the unpivoted format (long and skinny) is actually more efficient. So the right structure depends what you are trying to accomplish.
2
Answers
What you want is the Aggregate operator, not Pivot. There's much confusion as to what Excel calls Pivot and what the rest of the world means by Pivot, but that's a long discussion over a few beers.
I added an Aggregate to your process, see attached.