I want to predict a value from other values

davidraul36 Member Posts: 6 Contributor I
edited November 2018 in Help

Hello, I'm very new to RapidMiner and to data science as well, so please bear with me.

I want to predict values from a completely different set of values; in other words, I'm trying to find a model for the relationship between them.

For example:

I have an Excel spreadsheet with columns (A, B, C, D, F).

I want to use (A, B, C, D) to build a model that predicts the values in (F), then apply it to test data.

Thanks in advance,

Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    @davidraul36 Here's what I would do: clean up the date and time attributes and use a different algorithm. This gets 74% trend accuracy, and you can most likely improve that with Optimize Parameters.

    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

    <parameter key="2" value="Open.true.real.attribute"/>

    <operator activated="true" class="nominal_to_date" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Date" width="90" x="447" y="34">

    <parameter key="2" value="Open.true.real.attribute"/>

    <list key="expert_parameters"/>

    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
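For readers unfamiliar with the "trend accuracy" figure quoted in this answer, here is a minimal Python sketch of one common way to define it: the share of steps where the predicted direction of change matches the actual direction. The definition is an assumption made for illustration (RapidMiner's Forecasting Performance operator may compute it differently), and the function name `trend_accuracy` and the sample numbers are hypothetical.

```python
def trend_accuracy(actual, predicted):
    """Share of steps where the predicted move from the previous actual
    value has the same direction as the actual move (assumed definition)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) < 2:
        raise ValueError("need at least two points to define a trend")
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - actual[i - 1]
        # Same sign (or both exactly zero) counts as a correctly predicted trend.
        if actual_move * predicted_move > 0 or actual_move == predicted_move == 0:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical values, not taken from the thread's dataset.
actual = [10.0, 11.0, 10.5, 12.0, 12.5]
predicted = [10.0, 10.8, 11.2, 11.0, 12.2]
score = trend_accuracy(actual, predicted)
```

With these made-up numbers, three of the four steps have the right direction, so `score` is 0.75.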

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @davidraul36,

    Can you share your dataset(s), please?

    Regards,

    Lionel

  • davidraul36 Member Posts: 6 Contributor I

    Here is the data I use.

    I want to find a model that predicts the values of the "Avg" column from all the other columns.

    Feed.zip 128.7K
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You should check out the "Getting Started" videos on the www.turtlecreekpls.com webpage; they are designed to help you get started with a basic predictive modeling project such as this one. You will need to define your "label" (the thing you are trying to predict) first.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
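
To make the idea of a "label" concrete outside of RapidMiner, here is a small Python sketch that separates the column to predict from the columns used as inputs. Only the column name "Avg" comes from this thread; the other column names, the sample values, and the helper `split_features_and_label` are hypothetical.

```python
LABEL = "Avg"  # the column to predict, as named in the thread


def split_features_and_label(rows, label=LABEL):
    """Return (features, labels); each feature dict omits the label column."""
    features, labels = [], []
    for row in rows:
        labels.append(row[label])
        features.append({k: v for k, v in row.items() if k != label})
    return features, labels


# Hypothetical rows; "A" and "B" stand in for the real input columns.
rows = [
    {"A": 1.0, "B": 2.0, "Avg": 1.5},
    {"A": 3.0, "B": 5.0, "Avg": 4.0},
]
features, labels = split_features_and_label(rows)
```

The model only ever sees `features` as input; `labels` is what it is trained to reproduce.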
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @davidraul36 I would do what @Telcontar120 suggests: review some videos and try out the tutorials that are built into Studio itself. Then build a process and, if you get stuck, post that XML to the community for help.

  • davidraul36 Member Posts: 6 Contributor I

    I already tried to build a model, but my model uses the previous values of "Avg" to predict the next one.

    I don't know how to design the process so that the "Avg" column is only predicted, without the model getting any information from it or its previous values.

    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

    <parameter key="2" value="Open.true.real.attribute"/>

    <parameter key="select_label_by_dimension" value="false"/>

    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="85">

    <parameter key="2" value="Open.true.real.attribute"/>

    <parameter key="select_label_by_dimension" value="false"/>

    Test.zip 164.3K
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @davidraul36 I see that you set this up as a time series problem. Was there a particular reason to separate the time and date columns?

  • davidraul36 Member Posts: 6 Contributor I

    Since it's a direct time series problem, I have tried the time series examples.

    I was trying to predict the moving average values instead of a common lag.

    I have also tried another model: selecting "Avg" as the label and all other columns as attributes, then training a learner such as a neural net or SVM, and applying the model to the test data.

    So is that OK?
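
The workflow described in this post (train on rows that carry the label, then apply the model to rows that do not) can be sketched in a few lines of Python. To stay dependency-free, the sketch swaps in a k-nearest-neighbours regressor for the neural net or SVM mentioned above; the data and the function name `knn_predict` are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=2):
    """Predict the label of `query` as the mean label of its k nearest
    training rows (Euclidean distance)."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / k


# Hypothetical training rows: attribute vectors and their "Avg" label.
train_X = [(1.0, 2.0), (2.0, 3.0), (8.0, 9.0), (9.0, 10.0)]
train_y = [1.5, 2.5, 8.5, 9.5]

# Test rows carry attributes only; the label is what the model supplies.
test_X = [(1.5, 2.5), (8.5, 9.5)]
predictions = [knn_predict(train_X, train_y, row) for row in test_X]
```

The point of the sketch is the separation of roles: "Avg" appears only in `train_y`, never among the inputs, so the model cannot peek at the value it is supposed to predict.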

  • davidraul36 Member Posts: 6 Contributor I

    Sorry for my newbie behaviour :)

    Here is the XML:

    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

    <parameter key="2" value="Open.true.real.attribute"/>

    <list key="expert_parameters"/>

    <parameter key="2" value="Open.true.real.attribute"/>

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    This should work, but your trend accuracy sucks now. What was screwing this up was how you transformed your AVG attribute into the label. I made some small modifications and dropped the AVG column from the test set (because that's what you want to test). If you want to compare the test set AVG with what's predicted, then set the AVG attribute to a 'dummy' role. See the next process below this one.

    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

    <parameter key="2" value="Open.true.real.attribute"/>

    <parameter key="2" value="Open.true.real.attribute"/>

    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">

    With Dummy Role

    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

    <parameter key="2" value="Open.true.real.attribute"/>

    <parameter key="2" value="Open.true.real.attribute"/>

    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
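
The 'dummy' role idea from this post (keep AVG in the test set for comparison, but never feed it to the model) can be illustrated with a short Python sketch. Everything here except the column name AVG is an assumption: the "High" and "Low" columns, the `toy_model` stand-in for a trained model, and the choice of mean absolute error as the comparison measure are all hypothetical.

```python
def evaluate_with_held_out_label(test_rows, predict, label="AVG"):
    """Predict from the non-label columns only, then compare with the
    label values that were set aside. Returns (pairs, mean absolute error)."""
    if not test_rows:
        raise ValueError("test_rows must not be empty")
    pairs = []
    for row in test_rows:
        inputs = {k: v for k, v in row.items() if k != label}
        pairs.append((row[label], predict(inputs)))
    mae = sum(abs(actual - guess) for actual, guess in pairs) / len(pairs)
    return pairs, mae


# Hypothetical stand-in for a trained model: the mean of two made-up columns.
def toy_model(inputs):
    return (inputs["High"] + inputs["Low"]) / 2


test_rows = [
    {"High": 12.0, "Low": 10.0, "AVG": 11.5},
    {"High": 14.0, "Low": 12.0, "AVG": 12.5},
]
pairs, mae = evaluate_with_held_out_label(test_rows, toy_model)
```

Each entry in `pairs` is (actual AVG, predicted AVG), which is the side-by-side comparison the dummy role is meant to enable.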
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The more I look at this, the more I think you need to use a Sort operator to feed in the time series correctly. I wouldn't split the Date and Time into two units; RapidMiner can easily understand date-time together.
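
As a rough illustration of the same idea outside RapidMiner, the Python sketch below merges separate date and time strings into a single timestamp and sorts the rows chronologically. The column names "Date" and "Time", the string formats, and the sample rows are assumptions; the actual file may use different names and formats.

```python
from datetime import datetime

# Assumed formats; adjust to match the actual file.
DATE_FORMAT = "%Y-%m-%d"
TIME_FORMAT = "%H:%M"


def sort_by_timestamp(rows):
    """Merge each row's Date and Time strings into one datetime, then
    return the rows in chronological order."""
    stamped = []
    for row in rows:
        stamp = datetime.strptime(
            f"{row['Date']} {row['Time']}", f"{DATE_FORMAT} {TIME_FORMAT}"
        )
        merged = {k: v for k, v in row.items() if k not in ("Date", "Time")}
        merged["Timestamp"] = stamp
        stamped.append(merged)
    return sorted(stamped, key=lambda r: r["Timestamp"])


# Hypothetical rows, deliberately out of order.
rows = [
    {"Date": "2018-01-02", "Time": "09:00", "Avg": 3.0},
    {"Date": "2018-01-01", "Time": "15:00", "Avg": 2.0},
    {"Date": "2018-01-01", "Time": "09:00", "Avg": 1.0},
]
ordered = sort_by_timestamp(rows)
```

Sorting on one combined timestamp avoids the ambiguity of sorting on date and time separately.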

  • davidraul36 Member Posts: 6 Contributor I

    Thank you so much for spending so much time helping me; I really appreciate that.

    Great Software and Great community!

    I'm just curious about why the chart doesn't plot smoothly.

    However,

    Thank you so much,

    Kindest regards,

    chart.png

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @davidraul36 That's probably because you have AVG values for each hour in your date-time. Rolled up to daily values, you'd get the standard daily moving average. I would use an Aggregate operator for that.
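
For illustration, the roll-up from hourly to daily values can be sketched in Python as a group-by-day average. This is only an analogy to what an Aggregate operator would do; the function name `daily_mean` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean(hourly):
    """Average (timestamp, value) pairs per calendar day, in date order."""
    buckets = defaultdict(list)
    for stamp, value in hourly:
        buckets[stamp.date()].append(value)
    return [(day, sum(vals) / len(vals)) for day, vals in sorted(buckets.items())]


# Hypothetical hourly AVG values.
hourly = [
    (datetime(2018, 1, 1, 9), 1.0),
    (datetime(2018, 1, 1, 10), 3.0),
    (datetime(2018, 1, 2, 9), 4.0),
]
daily = daily_mean(hourly)
```

With one value per day instead of one per hour, a plotted series has far fewer intraday jumps, which is the smoothing effect described above.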
