count total occurrence and sort by date

tahsin · May 2021

I have a RapidMiner example set like this:

ID   Issue      Exp
100  9/8/2020   11/8/2020
100  8/5/2019   9/5/2019
101  6/3/2020   10/1/2020
102  8/15/2020  12/12/2020

I want to add a new column that counts the occurrences of each ID cumulatively, with the rows sorted by earliest date, so we know how many occurrences there had been as of each date.

The output should look like this:

ID   Issue      Exp         Count
100  8/5/2019   9/5/2019    1
100  9/8/2020   11/8/2020   2
101  6/3/2020   10/1/2020   1
102  8/15/2020  12/12/2020  1

But when I aggregate by ID and do a count, it just computes the total and shows that total on every row with the same ID. So for ID 100 it shows 2 both times, because it counts both rows each time.

For example, ID 100 had only one issue date in 2019, so the count there is 1; when ID 100 appears again in 2020, the count should be 2. The sort by date matters too, because it lets us number the occurrences of each ID in the correct order.
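To make the requested behaviour concrete outside of RapidMiner, here is a minimal Python sketch of a running count per ID in date order. It is only an illustration of the logic, not a RapidMiner process; the function name `running_count` and the tuple layout are made up for this example, and the dates are assumed to be in M/D/YYYY format.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the post: (ID, Issue, Exp); dates assumed to be M/D/YYYY.
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]


def parse(date_text):
    """Parse an M/D/YYYY string into a datetime so dates sort correctly."""
    return datetime.strptime(date_text, "%m/%d/%Y")


def running_count(rows):
    """Sort by ID and Issue date, then number each ID's rows 1, 2, 3, ..."""
    seen = defaultdict(int)  # how many rows of each ID we have passed so far
    result = []
    for id_, issue, exp in sorted(rows, key=lambda r: (r[0], parse(r[1]))):
        seen[id_] += 1
        result.append((id_, issue, exp, seen[id_]))
    return result


for row in running_count(rows):
    print(row)
```

Sorting by the parsed date rather than the raw string matters here, since "9/8/2020" would otherwise sort after "8/5/2019" only by accident of the first character.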

tahsin · May 2021

Yes. If you look at the ID column, the count for ID 100 is 2 both times. It should be 1 at the first issue date (2019) and 2 at the second issue date (2020). So we want to increment the number with each occurrence instead of showing the total of 2 on both rows.

Thanks.

MartinLiebig · May 2021

Hi,

have a look at the attached process, it should do the trick.

Best,

Martin

<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>

<operator activated="true" class="utility:create_exampleset" compatibility="9.9.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
<parameter key="generator_type" value="comma separated text"/>
<parameter key="number_of_examples" value="100"/>
<parameter key="use_stepsize" value="false"/>

<parameter key="add_id_attribute" value="false"/>

<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="input_csv_text" value="ID , Issue , Exp 100 , 9/8/2020 , 11/8/2020 100 , 8/5/2019 , 9/5/2019 101 , 6/3/2020 , 10/1/2020 102 , 8/15/2020 , 12/12/2020"/>
<parameter key="column_separator" value=","/>
<parameter key="parse_all_as_nominal" value="false"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="trim_attribute_names" value="true"/>

<parameter key="group_by_attribute" value="ID"/>
<parameter key="group_by_attribute (numerical)" value=""/>
<parameter key="sorting_order" value="none"/>

<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>

<list key="sort_by">
<parameter key="Issue" value="ascending"/>

<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>

<parameter key="id" value="count"/>

<parameter key="from_attribute" value=""/>
<parameter key="to_attribute" value=""/>
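Only fragments of the attached process are visible here, but the parameters that survive (grouping by `ID`, sorting by `Issue` ascending, an ID generator with `offset` 0, and an `id` to `count` mapping) suggest that it numbers the rows within each ID group. The following is a rough Python sketch of that idea under that assumption, not a translation of the actual process; `number_within_groups` and the tuple layout are hypothetical names chosen for illustration.

```python
from datetime import datetime
from itertools import groupby

# Same sample data as in the question: (ID, Issue, Exp), dates as M/D/YYYY.
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]


def number_within_groups(rows, offset=0):
    """Split rows by ID, sort each group by Issue date, number from 1 + offset."""
    by_id = sorted(rows, key=lambda r: r[0])  # groupby needs sorted input
    out = []
    for _, group in groupby(by_id, key=lambda r: r[0]):
        ordered = sorted(group, key=lambda r: datetime.strptime(r[1], "%m/%d/%Y"))
        for n, (id_, issue, exp) in enumerate(ordered, start=1 + offset):
            out.append((id_, issue, exp, n))
    return out


for row in number_within_groups(rows):
    print(row)
```

Because the numbering restarts for every group, the counter never leaks from one ID into the next, which is the property the question asks for.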

MartinLiebig · May 2021

Hi,

sounds like you want to aggregate and then join again? Attached is a process on your data.

Best,

Martin

<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>

<operator activated="true" class="utility:create_exampleset" compatibility="9.9.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
<parameter key="generator_type" value="comma separated text"/>
<parameter key="number_of_examples" value="100"/>
<parameter key="use_stepsize" value="false"/>

<parameter key="add_id_attribute" value="false"/>

<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="input_csv_text" value="ID , Issue , Exp 100 , 9/8/2020 , 11/8/2020 100 , 8/5/2019 , 9/5/2019 101 , 6/3/2020 , 10/1/2020 102 , 8/15/2020 , 12/12/2020"/>
<parameter key="column_separator" value=","/>
<parameter key="parse_all_as_nominal" value="false"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="trim_attribute_names" value="true"/>

<parameter key="use_default_aggregation" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default_aggregation_function" value="average"/>

<parameter key="ID" value="count"/>

<parameter key="group_by_attributes" value="ID"/>
<parameter key="count_all_combinations" value="false"/>
<parameter key="only_distinct" value="false"/>
<parameter key="ignore_missings" value="true"/>

<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="inner"/>
<parameter key="use_id_attribute_as_key" value="false"/>

<parameter key="ID" value="ID"/>

<parameter key="keep_both_join_attributes" value="false"/>

<list key="sort_by">
<parameter key="Exp" value="ascending"/>
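The visible parameters of this second process (a `count` aggregation grouped by `ID`, an inner join on `ID`, and a sort by `Exp` ascending) point to an aggregate-then-join pattern. As a hedged Python sketch of what that pattern computes, assuming that reading is right, the snippet below attaches the total per ID to every row; `aggregate_and_join` is a hypothetical name, and the code is an illustration rather than the process itself.

```python
from collections import Counter
from datetime import datetime

# Same sample data as in the question: (ID, Issue, Exp), dates as M/D/YYYY.
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]


def aggregate_and_join(rows):
    """Count rows per ID, join that total back onto each row, sort by Exp."""
    totals = Counter(id_ for id_, _, _ in rows)  # aggregate: one total per ID
    joined = [(id_, issue, exp, totals[id_]) for id_, issue, exp in rows]
    return sorted(joined, key=lambda r: datetime.strptime(r[2], "%m/%d/%Y"))


for row in aggregate_and_join(rows):
    print(row)
```

Every row of ID 100 receives the same total of 2 under this pattern, so it yields a total count per ID rather than a running count.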

tahsin · May 2021

Hi,

Thanks for your reply. I ran your process, and this is the result I get. It looks like the same result I got before: it shows a count of 2 both times for ID 100.

Image: https://us.v-cdn.net/6030995/uploads/editor/g9/khwgpse1cffl.png

MartinLiebig · May 2021

Hi,

so why would this be wrong? Do you want something more like a cumulative sum?

Best,

Martin

tahsin · May 2021

Awesome!! That is everything I expected. Thank you very much for your help.
