Doubt
Good morning.
Can someone help me with my question?
I have the following process to predict the value of feat 8. The data contains other features, and these features affect the value of feat 8.
Could someone tell me whether it is possible to add an operator to this process so that outliers are not included in the estimate?
Could you also tell me whether, in the process I built, all the features are important for predicting the value of feat 8, or whether the only important one is feat 9, the feature I created from the number of days the object has been installed?
Thanks
André
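A generic, tool-independent way to check whether all features matter or only feat 9 is permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below is plain Python, not a RapidMiner operator; the toy rows, the `toy_model` function, and the use of mean absolute error are hypothetical choices for illustration.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def permutation_importance(predict, rows, y, feature, n_repeats=5, seed=0):
    """Average increase in MAE when one feature's column is shuffled.

    predict: callable that takes one row (a dict of features) and returns a number.
    rows: list of dicts, one per example; y: list of true labels.
    """
    rng = random.Random(seed)
    baseline = mae(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Copy each row and overwrite only the shuffled feature.
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        increases.append(mae(y, [predict(r) for r in permuted]) - baseline)
    return mean(increases)


def toy_model(row):
    """Hypothetical fitted model that only looks at feat9."""
    return 0.5 * row["feat9"]


# Hypothetical data: feat1 is present but ignored by the model.
rows = [{"feat1": i % 3, "feat9": 10 * i} for i in range(1, 21)]
y = [0.5 * r["feat9"] for r in rows]

for f in ("feat1", "feat9"):
    print(f, permutation_importance(toy_model, rows, y, f))
```

A feature whose shuffling leaves the error unchanged (feat1 here) contributes nothing to that model, while a large increase (feat9) marks a feature the model relies on.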
Best Answers
andre5007
Hi @yyhuang,
I was thinking of another approach using only Excel. I can calculate feat 9, the number of days the device has been active, and then a feat 10 = feat 9 / feat 8, which would give me the number of interventions per day. Then I calculate the average of feat 10 for each model. In the Test csv I can then calculate feat 9, take the average feat 10 for each model from the other csv, and calculate feat 8 for the Test csv.
I tried to do the same in RapidMiner, but I don't think it is possible: RapidMiner tells me that feat 10 is missing for Retrieve test, and even if I add a 'Generate Attributes' operator, it cannot calculate feat 10 there because the test data has no feat 8.
Is there any way to take the feat 10 calculated in Retrieve train and pass it to Retrieve test, so that it becomes one more feature that helps estimate feat 8?
yyhuang
The Target Encoding operator is used to handle the nominal attributes (feature 1 and feature 4) inside the cross-validation. It is similar to a nominal-to-numerical conversion. I will look for a documentation link, if there is one, but you can always refer to the Help view for the operator documentation. You can delete the operator if you don't want to convert the nominal attributes. I applied the encoding to these nominal attributes because too many distinct values in a nominal attribute can make trees overfit.
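The idea of encoding inside the cross-validation can be illustrated with a generic out-of-fold mean target encoding: for each fold, the per-category label means are computed on the training folds only and applied to the held-out fold. This is a plain-Python sketch of the general technique, not a description of how RapidMiner's Target Encoding operator is implemented; the fold assignment by `i % k` and the fallback to the training-fold mean for unseen categories are arbitrary choices for illustration.

```python
from statistics import mean


def target_encode_cv(categories, y, k=3):
    """Out-of-fold mean target encoding.

    Each row is encoded with the mean label of its category, computed only on
    the other folds, so a row's own label never leaks into its encoding.
    Categories unseen in the training folds fall back to those folds' overall mean.
    """
    n = len(y)
    encoded = [None] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        sums, counts = {}, {}
        for i in train_idx:
            c = categories[i]
            sums[c] = sums.get(c, 0) + y[i]
            counts[c] = counts.get(c, 0) + 1
        fallback = mean(y[i] for i in train_idx)
        for i in test_idx:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if c in counts else fallback
    return encoded


# Hypothetical nominal attribute and label values.
cats = ["x", "x", "y", "y", "x", "y"]
labels = [1, 3, 10, 20, 5, 30]
print(target_encode_cv(cats, labels))  # → [4.0, 1.0, 20.0, 20.0, 1.0, 20.0]
```

Row 0 (category `x`, label 1) is encoded as 4.0, the mean of the other `x` labels in the training folds (3 and 5), which is the point of fitting the encoding inside the validation loop.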
I can run the process with the new feature 10 added, so I'm not sure what causes the error in your screenshot. However, I would not use feature 10 = feature 9 / feature 8 as a predictor in my model: this new feature is derived from the label (feature 8), so using it would be data leakage.
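A small illustration of why feature 10 leaks the label: because it is computed from feature 8, feature 8 can be recovered exactly from feature 9 and feature 10, while test rows (which have no feature 8) cannot compute it at all. The rows below are made-up values.

```python
# Hypothetical training rows with the label feat8.
train = [
    {"feat9": 120, "feat8": 4},
    {"feat9": 90, "feat8": 3},
    {"feat9": 365, "feat8": 5},
]

# feat10 is derived from the label feat8, as proposed in the thread.
for row in train:
    row["feat10"] = row["feat9"] / row["feat8"]

# Simply inverting the formula reproduces the label perfectly on training data...
recovered = [row["feat9"] / row["feat10"] for row in train]
print(recovered)  # → [4.0, 3.0, 5.0]

# ...but a test row has no feat8, so feat10 cannot be computed for it.
test_row = {"feat9": 200}
print("feat10" in test_row)  # → False
```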
For geolocation outliers, you can run a quick Tukey test before applying a filter that removes the outliers.
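Tukey's test flags values outside the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, where IQR = Q3 - Q1. A minimal plain-Python sketch for one numeric column follows; for geolocation data it would be applied to each coordinate column separately, which is an assumption about how the data is laid out. `statistics.quantiles` uses its 'exclusive' method by default, so the quartiles can differ slightly from those computed by other tools.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default method='exclusive'
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the Tukey fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if low <= v <= high]


values = [10, 12, 11, 13, 12, 11, 95]  # hypothetical measurements
print(tukey_fences(values))     # → (8.0, 16.0)
print(remove_outliers(values))  # → [10, 12, 11, 13, 12, 11]
```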
Answers
To understand the prediction, you can use
Ty
André
Ty @yyhuang
André