How to copy rows (samples) based on the numerical value of a defined attribute?

CausalityvsCorr Member Posts: 17 Contributor II
edited July 2019 in Help

I have a dataset with a few thousand rows and tens of attributes, where one attribute contains integers between 1 and around 100. It can be treated as a sort of sampling weight. I need to copy each row as many times as the value in that attribute says (i.e., from 1 to around a hundred times) and create a new dataset from the result.

I cannot find any operator which is dedicated to this kind of task, but I am sure this can be done with RM. But how?
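To make the goal concrete, here is a minimal sketch of the desired transformation in plain Python rather than RapidMiner. The column names (`id`, `value`, `weight`) and the helper `replicate_rows` are made up for illustration and are not part of any RapidMiner API.

```python
from typing import Dict, List

Row = Dict[str, object]


def replicate_rows(rows: List[Row], weight_key: str) -> List[Row]:
    """Return a new list in which each row appears row[weight_key] times."""
    expanded: List[Row] = []
    for row in rows:
        count = int(row[weight_key])
        if count < 0:
            raise ValueError(f"negative weight in row: {row!r}")
        # Copy each replica so later edits to one copy do not affect the others.
        expanded.extend(dict(row) for _ in range(count))
    return expanded


# Hypothetical sample data: "weight" plays the role of the sampling weight.
rows = [
    {"id": 1, "value": 0.5, "weight": 3},
    {"id": 2, "value": 1.2, "weight": 1},
    {"id": 3, "value": 0.9, "weight": 2},
]

expanded = replicate_rows(rows, "weight")
print([r["id"] for r in expanded])  # [1, 1, 1, 2, 3, 3]
```

The new dataset has as many rows as the sum of the weight column (6 in this toy example).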

Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    @CausalityvsCorr This is pretty simple to do with loops and macros. Something like this?





    <macros/>

    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">

    <operator activated="true" class="generate_data" compatibility="8.1.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">

    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">

    <operator activated="true" class="extract_macro" compatibility="8.1.000" expanded="true" height="68" name="Extract Macro" width="90" x="447" y="34">
    <parameter key="macro" value="num"/>

    <operator activated="true" class="concurrency:loop" compatibility="8.1.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">

    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">

    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
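The operator parameters are not visible in the listing above, so the following is only an assumption about how a loop combined with a filter can produce the copies, written in plain Python rather than as a RapidMiner process. In pass `i`, only the rows whose weight is at least `i` are kept, and the results of all passes are appended. The helper name `replicate_by_passes` and the sample columns are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def replicate_by_passes(rows: List[Row], weight_key: str) -> List[Row]:
    """Loop from 1 to the largest weight; in pass i keep rows with weight >= i."""
    max_weight = max((int(r[weight_key]) for r in rows), default=0)
    result: List[Row] = []
    for iteration in range(1, max_weight + 1):
        # Filter step: a row with weight w survives exactly w passes.
        survivors = [dict(r) for r in rows if int(r[weight_key]) >= iteration]
        # Append step: collect the survivors of every pass.
        result.extend(survivors)
    return result


# Hypothetical sample data.
sample = [
    {"id": 1, "weight": 3},
    {"id": 2, "weight": 1},
    {"id": 3, "weight": 2},
]

copies = replicate_by_passes(sample, "weight")
print([r["id"] for r in copies])  # [1, 2, 3, 1, 3, 1]
```

Each row ends up in the result as many times as its weight, although the copies are grouped by pass rather than by row. The number of passes equals the largest weight (around 100 in the question), not the number of rows.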























Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,

    Depending on why you need the rows copied, you may also avoid copying the data by setting the numerical attribute as the weight. Many algorithms support weighted examples; see their operator capabilities.

    Greetings,

    Sebastian
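As a small illustration of why a weight can stand in for physical copies, the sketch below compares a weighted mean with the plain mean over replicated values. It is plain Python with made-up numbers and does not show how any particular RapidMiner operator handles weights.

```python
from math import isclose

# Hypothetical values and integer weights (copy counts).
values = [0.5, 1.2, 0.9]
weights = [3, 1, 2]

# Weighted mean: every value counts as often as its weight says.
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Same statistic after physically copying each value `weight` times.
replicated = [v for v, w in zip(values, weights) for _ in range(w)]
replicated_mean = sum(replicated) / len(replicated)

# Both routes agree up to floating-point rounding.
print(isclose(weighted_mean, replicated_mean))  # True
```

For statistics of this kind the two routes give the same answer, and the weighted route never materializes the extra rows.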

  • u1111082 Member Posts: 5 Learner I
    edited June 2020
    I'm also trying to copy a row with, say, a count attribute value of 10 into 10 identical rows, so that I can then run the FP-Growth operator. I'm not sure whether the solutions above will work in my situation. I'd appreciate any comments.
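For the frequent-itemset case, this is a rough sketch of what copying by a count attribute does to support counts. It uses plain Python with invented baskets. The helpers `expand` and `support_count` are hypothetical, and nothing here reflects how the FP-Growth operator itself works internally.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical transactions, each with a count saying how often it occurred.
baskets: List[Tuple[FrozenSet[str], int]] = [
    (frozenset({"bread", "milk"}), 10),
    (frozenset({"bread"}), 2),
    (frozenset({"milk", "eggs"}), 3),
]


def expand(counted: List[Tuple[FrozenSet[str], int]]) -> List[FrozenSet[str]]:
    """Turn (items, count) pairs into `count` identical transactions each."""
    return [items for items, count in counted for _ in range(count)]


def support_count(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


transactions = expand(baskets)
bread_hits = support_count(transactions, frozenset({"bread"}))
print(len(transactions), bread_hits)  # 15 12
```

After expansion, a basket with count 10 contributes 10 to the support count of every itemset it contains, which is the effect the copying is meant to achieve before mining.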