I want to predict a value from other values
davidraul36
Hello, I'm very new to RapidMiner and to data science as well, so please bear with me.
I want to predict values from a set of completely different values, that is, find a model for the relation between them.
For example:
I have an Excel spreadsheet with columns (A, B, C, D, F).
I want to use (A, B, C, D) to build a model that predicts the values in (F), then apply it to test data...
Thanks in advance,
Best Answer
Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert)
@davidraul36 Here's what I would do: clean up the date and time attributes and use a different algorithm. That gives 74% trend accuracy, and you can most likely improve on that with Optimize Parameters.
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<parameter key="2" value="Open.true.real.attribute"/>
<operator activated="true" class="nominal_to_date" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Date" width="90" x="447" y="34">
<parameter key="2" value="Open.true.real.attribute"/>
<list key="expert_parameters"/>
<operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
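For readers wondering what a "trend accuracy" figure measures, here is a rough sketch in Python (not RapidMiner). It assumes one common definition: the share of steps where the predicted move, measured from the previous actual value, points the same way as the actual move. The function name `trend_accuracy` and the sample numbers are made up for illustration, and RapidMiner's Performance operator may compute its figure differently.

```python
def trend_accuracy(actual, predicted):
    """Share of steps where the predicted direction matches the actual direction.

    Both moves are measured from the previous actual value (an assumed definition).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) < 2:
        raise ValueError("need at least two points to measure a trend")
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - actual[i - 1]
        # A hit means both moves share a sign, or both are exactly flat.
        if actual_move * predicted_move > 0 or actual_move == predicted_move == 0:
            hits += 1
    return hits / (len(actual) - 1)


# Illustrative numbers only, not taken from the thread's dataset.
actual = [10.0, 11.0, 10.5, 12.0, 12.5]
predicted = [10.0, 10.8, 11.2, 11.5, 12.2]
score = trend_accuracy(actual, predicted)
print(f"trend accuracy: {score:.0%}")
```

Here three of the four steps are called in the right direction, so the score is 75%.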
Answers
Hi @davidraul36,
Can you share your dataset(s), please?
Regards,
Lionel
Here is the data I use.
I want to find a model that predicts the values of column "Avg" from all the other columns.
You should check out the "Getting Started" videos on the www.turtlecreekpls.com webpage. They are designed to help you get started with a basic predictive modeling project such as this one. You will need to define your "label" (the thing you are trying to predict) first.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
@davidraul36 I would do what @Telcontar120 suggests: review some videos and try out the tutorials that are built into Studio itself. Then build a process and, if you get stuck, post that XML to the community for help.
I already tried to build a model, but my model uses the previous values of "Avg" to predict the next one.
I don't know what to change in the design so that column "Avg" is only predicted, without the model getting any information from it or its previous values.
@davidraul36 I see that you set this up as a time series problem. Was there a particular reason to separate the time and date columns?
Since it's a straightforward time series problem, I tried the time series examples.
I was trying to predict the moving average values instead of using a common lag.
I have also tried another model: I select "Avg" as the label and all other columns as attributes, train an operator such as Neural Net or SVM, then apply the model to the test data...
So is that OK?
Sorry for my newbie behaviour.
Here is the XML:
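The "label plus attributes" setup described above can be sketched outside RapidMiner as well. The snippet below is only an illustration in plain Python: it swaps in a tiny k-nearest-neighbours regressor instead of a Neural Net or SVM, uses the column names (A, B, C, D, F) from the first post, and the rows are invented. The function name `knn_predict` is hypothetical.

```python
import math


def knn_predict(train_rows, label, query, k=3):
    """Predict `label` for `query` as the mean label of the k nearest training rows."""
    # Every column except the label is treated as an attribute.
    features = [name for name in train_rows[0] if name != label]

    def distance(row):
        return math.sqrt(sum((row[f] - query[f]) ** 2 for f in features))

    nearest = sorted(train_rows, key=distance)[:k]
    return sum(row[label] for row in nearest) / len(nearest)


# Invented training data: F is the label, A-D are the attributes.
train = [
    {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "F": 10.0},
    {"A": 2.0, "B": 2.0, "C": 2.0, "D": 2.0, "F": 20.0},
    {"A": 3.0, "B": 3.0, "C": 3.0, "D": 3.0, "F": 30.0},
    {"A": 10.0, "B": 10.0, "C": 10.0, "D": 10.0, "F": 100.0},
]

# The test row carries no F value, so the model cannot peek at it.
query = {"A": 2.0, "B": 2.0, "C": 2.0, "D": 2.0}
prediction = knn_predict(train, "F", query, k=3)
print(prediction)
```

The point is that the model only ever sees A to D for the row it predicts. On real data the attributes would probably need normalizing first so that no single column dominates the distance.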
This should work but your trend accuracy sucks now. So what was screwing this up was how you transformed your AVG attribute into the label. I made some small modifications and dropped out the AVG column from the test set (cause that's what you want to test). If you want to compare the test set AVG with what's predicted, then set the AVG attribute as a 'dummy' role. See the next process below this one.
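The idea of keeping AVG out of the model's inputs while still holding on to it for comparison can be sketched like this in plain Python. This is not what the dummy role does internally, just the same idea by hand; the column names `Open` and `Close`, the sample rows, and the stand-in model are all assumptions for illustration.

```python
def split_label(rows, label):
    """Return (feature_rows, actual_values) with the label removed from each row."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    actuals = [row[label] for row in rows]
    return features, actuals


def mean_absolute_error(actuals, predictions):
    """Average absolute gap between held-out values and predictions."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


# Invented test rows; "Avg" is held aside and never shown to the model.
test_rows = [
    {"Open": 1.0, "Close": 2.0, "Avg": 1.5},
    {"Open": 2.0, "Close": 4.0, "Avg": 3.5},
]
feature_rows, actual_avg = split_label(test_rows, "Avg")


def stand_in_model(row):
    # Placeholder for a trained model: it only reads the attributes.
    return (row["Open"] + row["Close"]) / 2


predictions = [stand_in_model(row) for row in feature_rows]
error = mean_absolute_error(actual_avg, predictions)
print(error)
```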
With Dummy Role
The more I look at this, the more I think you need to use a Sort operator to feed in the time series correctly. I wouldn't split the Date and Time into two attributes; RapidMiner can easily understand date-time together.
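As a rough illustration of why a single date-time value plus a sort matters, here is a small Python sketch (again, not RapidMiner). The column names `Date`, `Time`, and `Timestamp`, the text formats, and the rows are assumptions; the real spreadsheet may use different formats.

```python
from datetime import datetime


def parse_timestamp(date_text, time_text):
    """Combine separate date and time strings into one datetime (assumed formats)."""
    return datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M")


# Invented rows, deliberately out of order.
rows = [
    {"Date": "2018-01-02", "Time": "09:00", "Avg": 1.12},
    {"Date": "2018-01-01", "Time": "10:00", "Avg": 1.10},
    {"Date": "2018-01-01", "Time": "09:00", "Avg": 1.08},
]

for row in rows:
    row["Timestamp"] = parse_timestamp(row["Date"], row["Time"])

# Sorting on the combined value puts the series in true chronological order.
rows.sort(key=lambda row: row["Timestamp"])
print([row["Avg"] for row in rows])
```

With one timestamp per row there is a single sort key, so windowing or lagging steps see the observations in the order they actually occurred.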
Thank you so much for spending so much time helping me, I really appreciate that.
Great Software and Great community!
I'm just curious about why the chart doesn't plot smoothly.
However,
Thank you so much,
Kindest regards,
@davidraul36 That's probably because you have AVG values for each hour in your date-time. Rolled up to daily values, you'd get the standard daily moving average. I would use an Aggregate operator for that.
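The roll-up from hourly to daily values can be sketched in plain Python as a group-by-day average. This only shows the idea behind the aggregation, not the Aggregate operator itself; the function name `daily_average` and the sample values are invented.

```python
from collections import defaultdict
from datetime import datetime


def daily_average(rows):
    """Average (timestamp, value) pairs per calendar day, in date order."""
    buckets = defaultdict(list)
    for timestamp, value in rows:
        buckets[timestamp.date()].append(value)
    return {day: sum(values) / len(values) for day, values in sorted(buckets.items())}


# Invented hourly values: two readings on the first day, one on the second.
hourly = [
    (datetime(2018, 1, 1, 9), 1.0),
    (datetime(2018, 1, 1, 10), 2.0),
    (datetime(2018, 1, 2, 9), 4.0),
]
daily = daily_average(hourly)
for day, value in daily.items():
    print(day, value)
```

Plotting one point per day instead of one per hour removes the intraday jumps, which is likely what makes the hourly chart look jagged.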