Generating a data set for testing

pettudor | Member | Posts: 2 | Contributor I
Edited December 2018 in Help

Hello,

Computer engineering student here, new to data science. What I want is fairly simple in concept, but I haven't found the right operators yet (or maybe I have and don't know how to use them), so here goes:

1. I have 22 attributes. The first 2 are just strings; I want the other 20 to be numbers that vary between 0.2 and 2.8, depending on the attribute.

2. Is there a way to generate values that depend on what was generated before? An example explains it better: say that in one example attribute 1 is generated as 1.4, which is 0.4 above that attribute's average. Then the next one, attribute 2, would be generated as 0.9: its own average of 0.5 plus the 0.4 difference from the previous attribute (0.5 + 0.4). That makes the generation pseudo-random.
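
The rule in point 2 can be written as a worked equation. Here x_k is the value generated for attribute k in one example and μ_k is that attribute's average; these symbols are introduced only for illustration, and the average of 1.0 for attribute 1 is inferred from "1.4 is 0.4 above average".

```latex
x_{k+1} = \mu_{k+1} + (x_k - \mu_k)

% worked example from the question: \mu_1 = 1.0, \mu_2 = 0.5, x_1 = 1.4
x_2 = 0.5 + (1.4 - 1.0) = 0.5 + 0.4 = 0.9
```

Applied repeatedly, every later attribute inherits the same deviation (0.4 here) from the first, randomly generated one.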

I am definitely doing something wrong :(

Best Answer

  • kypexin | Moderator, RapidMiner Certified Analyst, Member | Posts: 290 | Unicorn
    Solution Accepted

    Hi @pettudor,


    2.Is there a way to generate with dependency on what was generate before, need an example to explain better,

    lets say we have one example with attribute 1 that generated 1.4 that's, 0.4 above average for that specific attribute


    I am a bit confused by the description.

    The answer to the first part is yes: the 'Generate Attributes' operator lets you construct new attributes from existing ones, and that's pretty easy. You can even do some aggregations, so that new attributes are based not only on existing values but also on aggregated values such as the mean, median, sum, etc.
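
As a rough illustration of that idea in plain Python (standing in for RapidMiner, not the operator itself), a per-example expression can combine existing attribute values with a column aggregate computed beforehand. The attribute names and values below are made up.

```python
from statistics import mean, median

# Made-up example set: each dict is one example (row) with two numeric attributes.
examples = [
    {"att1": 1.4, "att2": 0.9},
    {"att1": 0.6, "att2": 0.1},
    {"att1": 1.0, "att2": 0.5},
]

# Aggregate over the whole column first ...
att1_mean = mean(row["att1"] for row in examples)
att1_median = median(row["att1"] for row in examples)

# ... then derive new attributes per example from existing values and the aggregates.
for row in examples:
    row["att1_dev"] = row["att1"] - att1_mean    # deviation from the column mean
    row["att_sum"] = row["att1"] + row["att2"]   # combination of existing attributes
    row["att1_above_median"] = row["att1"] > att1_median
```

The two-step order matters: the aggregate has to exist before any per-example expression can refer to it.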

    The second part, though, is confusing. You say the first attribute would have the value 1.4 for a certain example, but what exactly is this value based on? You need to either generate the first attribute pseudo-randomly or base its values on existing data.

    Could you please clarify?

Answers

  • sgenzer | Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator | Posts: 2,959 | Community Manager

    Hi @pettudor, welcome to the community. First I want to say CONGRATULATIONS: you're the first "newbie" I have seen in a long while who actually read the directions and posted their XML process with their first post. :) :) :)

    So, back to your question. I'm not sure whether you have 22 attributes from your own data set or want to create 22 attributes from random data. If it's the former, just use the "Add Data" wizard in the Repository panel and go through the steps:

    [Screenshot: Screen Shot 2018-04-04 at 8.50.26 AM.png]

    If you want to create random data, use the "Generate Data" operator rather than "Generate Data by User Specification":

    [Screenshot: Screen Shot 2018-04-04 at 8.52.14 AM.png]

    The default for this is to create six attributes: five "regular" attributes of real numbers, and one "label" attribute with real numbers:

    [Screenshots: Screen Shot 2018-04-04 at 8.54.50 AM.png, Screen Shot 2018-04-04 at 8.53.38 AM.png, Screen Shot 2018-04-04 at 8.53.45 AM.png]

    You can then modify these with other operators to make them strings, integers, etc.:

    [Screenshot: Screen Shot 2018-04-04 at 8.59.19 AM.png]
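
For readers outside RapidMiner, a rough plain-Python equivalent of "generate random data, then convert some of it" might look like this. The value range of -10 to 10, the attribute names att1..att5 and the label formula (a plain sum) are assumptions for illustration, not the Generate Data operator's documented defaults.

```python
import random

rng = random.Random(42)  # fixed seed so the "random" data is reproducible

# Assumed shape, mirroring the description above: five regular real-valued
# attributes plus one real-valued label, for 100 examples.
rows = []
for _ in range(100):
    atts = [rng.uniform(-10, 10) for _ in range(5)]
    row = {f"att{i + 1}": v for i, v in enumerate(atts)}
    row["label"] = sum(atts)  # illustrative label only
    rows.append(row)

# Post-processing, analogous to chaining conversion operators afterwards:
for row in rows:
    row["att1"] = round(row["att1"])        # real -> integer
    row["att2"] = f"{row['att2']:.1f}"      # real -> string (nominal)
```

The label is computed before the conversions, so it stays a real number.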

    Let me know if that makes sense.

    Scott

  • pettudor | Member | Posts: 2 | Contributor I

    So after generating one attribute with 100 random examples, I just used the Generate Attributes operator, gave it a dependency formula, and Bob's your uncle: I have what I want.
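
The two-step approach (one random attribute, then formula-derived attributes) can be sketched in plain Python. This is not the poster's actual process, which isn't shown; the averages beyond the first two, the spread of the random attribute and the clipping to 0.2 to 2.8 are assumptions, and the two string attributes from the question are left out.

```python
import random

# Hypothetical per-attribute averages. Only the first two (1.0 and 0.5) follow
# from the example in the question; extend the list to 20 entries as needed.
AVERAGES = [1.0, 0.5, 1.5, 2.0]
LOW, HIGH = 0.2, 2.8   # overall value range mentioned in the question
SPREAD = 0.5           # assumed: how far the random attribute may stray from its average

def clip(value):
    """Keep a derived value inside [LOW, HIGH]."""
    return max(LOW, min(HIGH, value))

def derive(prev_value, prev_avg, avg):
    """Dependency formula: own average plus the previous attribute's deviation."""
    return clip(avg + (prev_value - prev_avg))

def make_example(rng):
    # Step 1: the only truly random attribute, drawn around its average.
    values = [rng.uniform(AVERAGES[0] - SPREAD, AVERAGES[0] + SPREAD)]
    # Step 2: every further attribute is derived from the one before it.
    for k in range(1, len(AVERAGES)):
        values.append(derive(values[k - 1], AVERAGES[k - 1], AVERAGES[k]))
    return values

rng = random.Random(0)  # fixed seed for a reproducible data set
data = [make_example(rng) for _ in range(100)]
```

For the question's example, derive(1.4, 1.0, 0.5) gives 0.9 up to floating-point rounding. Without clipping every attribute would carry exactly the same deviation; the clip is a design choice that changes the deviation passed on once a bound is hit, so drop it if out-of-range values are acceptable.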

    Added the code. Such an easy task in retrospect. :/

    Many thanks to all of you for your patience in reading this mess of a post. Have a great day.
