"PIVOT operator creates erroneous output"

mmaragmmarag MemberPosts:35Guru
edited June 2019 inHelp

Hello, i was testing the results of pivot in Excel and Rapidminer and i found out that they are completely different, whith those of RM being wrong! Any ideas?

Here is the dataset i used in both cases:

Student Item Amount
Id 1 Item 1 1
Id 2 Item 2 8
Id 3 Item 1 1
Id 3 Item 1 1
Id 2 Item 2 1
Id 2 Item 1 1
Id 3 Item 1 1
Id 2 Item 1 1
Id 2 Item 2 5
Id 2 Item 1 1

and here is the result from Pivot operation in Excel (the correct one)

Row Labels Item 1 Item 2
Id 1 1
Id 2 3 14
Id 3 3

and here is the result from the Pivot operator in RM. i selected the student attr as ID and Itema s Index and SUM as weight aggregation.

Row No Item1 Item 2
1 1.0 ?
2 1.0 5.0
3 1.0 ?

here is the process XML







<运营商激活= " true " class = "process" compatibility="7.4.000" expanded="true" name="Process">

<运营商激活= " true " class = "generate_transaction_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Transaction Data" width="90" x="45" y="340">





<运营商激活= " true " class = "multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="246" y="289"/>
<运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot" width="90" x="246" y="85">



<运营商激活= " true " class = "rename_by_replacing" compatibility="7.4.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="447" y="34">













Tagged:

Best Answer

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, MemberPosts:1,635Unicorn
    Solution Accepted

    @mmarag, and if you want to actually get the data structure similar to the Excel output, where the row is by id and there is a seprate column for each item sum, then you can add a pivot after the Aggregate operator that@Thomas_Ottshowed in his process. Here's an updated process that adds that piece:







    <运营商激活= " true " class = "process" compatibility="7.4.000" expanded="true" name="Process">

    <运营商激活= " true " class = "generate_transaction_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Transaction Data" width="90" x="45" y="34">





    <运营商激活= " true " class = "multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="246" y="136"/>
    <运营商激活= " true " class = "aggregate" compatibility="7.4.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="289">





    <运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot (2)" width="90" x="648" y="289">



    <运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">



    <运营商激活= " true " class = "rename_by_replacing" compatibility="7.4.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="581" y="34">















    但请记住,对于某些类型的analysis, the unpivoted format (long and skinny) is actually more efficient. So the right structure depends what you are trying to accomplish.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
    mmarag Thomas_Ott

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, MemberPosts:1,761Unicorn

    What you want is the Aggregate operator, not Pivot. There's much confusion as to what Excel calls Pivot and what the rest of the world means by Pivot, but that's a long discussion over a few beers.

    I added an Aggregate to your process, see attached.







    <运营商激活= " true " class = "process" compatibility="7.4.000" expanded="true" name="Process">

    <运营商激活= " true " class = "generate_transaction_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Transaction Data" width="90" x="45" y="34">





    <运营商激活= " true " class = "multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply" width="90" x="246" y="136"/>
    <运营商激活= " true " class = "aggregate" compatibility="7.4.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="289">





    <运营商激活= " true " class = "pivot" compatibility="7.4.000" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">



    <运营商激活= " true " class = "rename_by_replacing" compatibility="7.4.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="581" y="34">

















    mmarag
Sign InorRegisterto comment.