count total occurrence and sort by date

tahsin Member Posts: 20 Contributor II

I have a RapidMiner example set like this:

ID   Issue      Exp
100  9/8/2020   11/8/2020
100  8/5/2019   9/5/2019
101  6/3/2020   10/1/2020
102  8/15/2020  12/12/2020

I want to add a new column that counts the occurrences of each ID cumulatively, with the rows sorted by the earliest issue date, so we know how many occurrences an ID had as of each date.

The output should look like this:

ID   Issue      Exp         Count
100  8/5/2019   9/5/2019    1
100  9/8/2020   11/8/2020   2
101  6/3/2020   10/1/2020   1
102  8/15/2020  12/12/2020  1

But when I aggregate by ID and do a count, I only get the total per ID, repeated on every row of that ID. So ID 100 shows 2 both times, because the aggregation counts all of its rows at once.

For example, ID 100 had only one issue date in 2019, so its count there should be 1. When ID 100 appears again in 2020, the count should be 2. Sorting by date matters because it puts the occurrences of each ID in the correct order.
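To make the requested result concrete, here is a minimal Python sketch of a per-ID running count. It is only an illustration outside RapidMiner: the function name `running_count` and the tuple layout are assumptions, and the dates are assumed to be in M/D/YYYY form, as `8/15/2020` suggests.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the question: (ID, Issue, Exp), dates assumed to be M/D/YYYY.
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]


def parse(date_text):
    """Parse an M/D/YYYY string so dates sort chronologically, not as text."""
    return datetime.strptime(date_text, "%m/%d/%Y")


def running_count(rows):
    """Sort by (ID, Issue date) and append how often the ID has been seen so far."""
    seen = defaultdict(int)
    result = []
    for id_, issue, exp in sorted(rows, key=lambda r: (r[0], parse(r[1]))):
        seen[id_] += 1  # increment per occurrence instead of using the total
        result.append((id_, issue, exp, seen[id_]))
    return result


counted = running_count(rows)
for row in counted:
    print(*row)
```

Sorting on the parsed date rather than on the raw string is the important detail: as text, `9/8/2020` would sort after `8/5/2019` only by coincidence.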

Best Answers

  • tahsin Member Posts: 20 Contributor II
    Solution Accepted
    Yes. If you look at the ID column, the count for ID 100 is 2 both times. It should be 1 at the first issue date (2019) and 2 at the second issue date (2020). So the count should increment with each occurrence instead of showing 2 both times.

    Thanks.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    Solution Accepted
    Hi,

    have a look at the attached process, it should do the trick.

    Best,
    Martin

    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>

    <operator activated="true" class="utility:create_exampleset" compatibility="9.9.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
      <parameter key="generator_type" value="comma separated text"/>
      <parameter key="number_of_examples" value="100"/>
      <parameter key="use_stepsize" value="false"/>

      <parameter key="add_id_attribute" value="false"/>

      <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
      <parameter key="time_zone" value="SYSTEM"/>
      <parameter key="input_csv_text" value="ID , Issue , Exp&#10;100 , 9/8/2020 , 11/8/2020&#10;100 , 8/5/2019 , 9/5/2019&#10;101 , 6/3/2020 , 10/1/2020&#10;102 , 8/15/2020 , 12/12/2020"/>
      <parameter key="column_separator" value=","/>
      <parameter key="parse_all_as_nominal" value="false"/>
      <parameter key="decimal_point_character" value="."/>
      <parameter key="trim_attribute_names" value="true"/>
    </operator>

    <parameter key="group_by_attribute" value="ID"/>
    <parameter key="group_by_attribute (numerical)" value=""/>
    <parameter key="sorting_order" value="none"/>

    <parameter key="set_iteration_macro" value="false"/>
    <parameter key="macro_name" value="iteration"/>
    <parameter key="macro_start_value" value="1"/>
    <parameter key="unfold" value="false"/>

    <list key="sort_by">
      <parameter key="Issue" value="ascending"/>
    </list>

    <parameter key="create_nominal_ids" value="false"/>
    <parameter key="offset" value="0"/>

    <parameter key="id" value="count"/>

    <parameter key="from_attribute" value=""/>
    <parameter key="to_attribute" value=""/>

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
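Only part of the attached process survives in this thread. Reading the remaining parameters (`group_by_attribute` set to `ID`, `sort_by` on `Issue` ascending, an ID generator with `offset` 0, and an `id` attribute apparently renamed to `count`), it seems to split the data by ID, sort each group by issue date, and number the rows within each group. The Python below is a hedged sketch of that reading, not a translation of the RapidMiner process; the variable names are made up for illustration and the dates are assumed to be M/D/YYYY.

```python
from datetime import datetime
from itertools import groupby

# Rows from the question: (ID, Issue, Exp).
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]


def issue_date(row):
    """Parse the Issue column (assumed M/D/YYYY) for chronological sorting."""
    return datetime.strptime(row[1], "%m/%d/%Y")


def by_id(row):
    return row[0]


numbered = []
# groupby only merges adjacent rows, so sort by ID first.
for id_, group in groupby(sorted(rows, key=by_id), key=by_id):
    # Within one ID: order by issue date, then number the rows starting at 1.
    for count, (_, issue, exp) in enumerate(sorted(group, key=issue_date), start=1):
        numbered.append((id_, issue, exp, count))

for row in numbered:
    print(*row)
```

Numbering inside each group is what makes the count restart at 1 for every ID.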

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    Hi,
    sounds like you want to aggregate and then join again? Attached is a process on your data.

    Best,
    Martin

    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>

    <operator activated="true" class="utility:create_exampleset" compatibility="9.9.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
      <parameter key="generator_type" value="comma separated text"/>
      <parameter key="number_of_examples" value="100"/>
      <parameter key="use_stepsize" value="false"/>

      <parameter key="add_id_attribute" value="false"/>

      <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
      <parameter key="time_zone" value="SYSTEM"/>
      <parameter key="input_csv_text" value="ID , Issue , Exp&#10;100 , 9/8/2020 , 11/8/2020&#10;100 , 8/5/2019 , 9/5/2019&#10;101 , 6/3/2020 , 10/1/2020&#10;102 , 8/15/2020 , 12/12/2020"/>
      <parameter key="column_separator" value=","/>
      <parameter key="parse_all_as_nominal" value="false"/>
      <parameter key="decimal_point_character" value="."/>
      <parameter key="trim_attribute_names" value="true"/>
    </operator>

    <parameter key="use_default_aggregation" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="default_aggregation_function" value="average"/>

    <parameter key="ID" value="count"/>

    <parameter key="group_by_attributes" value="ID"/>
    <parameter key="count_all_combinations" value="false"/>
    <parameter key="only_distinct" value="false"/>
    <parameter key="ignore_missings" value="true"/>

    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>

    <parameter key="ID" value="ID"/>

    <parameter key="keep_both_join_attributes" value="false"/>

    <list key="sort_by">
      <parameter key="Exp" value="ascending"/>
    </list>

  • tahsin Member Posts: 20 Contributor II
    Hi,

    Thanks for your reply. I ran your process and this is the result I get. It looks like the same result I got before: the count shows 2 both times for ID 100.
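The result reported above is what an aggregate-then-join is expected to produce: the aggregation yields one total per ID, and the join copies that total onto every row of the ID. A minimal Python sketch of the effect follows; it is illustrative only, with assumed variable names, and it leaves out the final sort.

```python
from collections import Counter

# Rows from the question: (ID, Issue, Exp).
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]

# "Aggregate": one total count per ID.
totals = Counter(id_ for id_, _, _ in rows)

# "Join": attach the per-ID total to every original row with that ID.
joined = [(id_, issue, exp, totals[id_]) for id_, issue, exp in rows]

for row in joined:
    print(*row)
```

Both rows of ID 100 receive the same total because the aggregation has no notion of row order, so a running count needs a different approach.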


  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    Hi,
    so why would this be wrong? Do you want more of a cumulative sum?
    Best,
    Martin
  • tahsin Member Posts: 20 Contributor II
    Awesome!! Exactly what I expected. Thank you so much for your help.