Data mining
Hi guys, I was working on a task but ran into a problem and don't know where to start. I'm really new to RapidMiner and would like to know if anyone could help me. I have to estimate Feature 8, which is the number of maintenance interventions the device has had. What can I do? Thanks, André
Best Answer
yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member), Posts: 362, RM Data Scientist
Answers
I worked on your training data a bit to build regression trees based on clean features. The predictive model performs pretty well under 10-fold cross-validation. The RMSE is as follows:
My process attached for your reference.
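For readers who cannot open the attached process, here is a rough sketch in plain Python of what "RMSE under 10-fold cross-validation" means. It is not the RapidMiner process itself: `kfold_rmse` and `fit_mean_baseline` are hypothetical helper names, the toy data is made up, and the mean baseline only stands in for a real learner such as a regression tree. This sketch also pools all held-out errors into one RMSE, which may differ slightly from a tool that averages the RMSE per fold.

```python
import math
import random

def kfold_rmse(X, y, fit, k=10, seed=0):
    """Return the RMSE computed over all held-out predictions of a k-fold split.

    `fit(X_train, y_train)` must return a function mapping one row to a prediction.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)           # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]      # k roughly equal-sized folds
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        # Score only rows the model has not seen during fitting.
        squared_errors += [(predict(X[i]) - y[i]) ** 2 for i in held_out]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def fit_mean_baseline(X_train, y_train):
    """Stand-in model (assumption): always predict the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda row: mean

# Made-up toy data: one feature, a small count-like target.
X = [[i] for i in range(20)]
y = [i % 4 for i in range(20)]
baseline_rmse = kfold_rmse(X, y, fit_mean_baseline, k=10)
```

A baseline like this is a useful reference point: a regression tree is only worth keeping if its cross-validated RMSE is clearly lower than the baseline's.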
Cheers,
YY
Thanks
André
I used the CSV files you posted in another thread. They are attached here as well.
Cheers,
YY
This way I can understand what?
PS. feat1 could potentially cause some data leakage if we apply target encoding to a categorical attribute with so many distinct values. I don't have the context here, but you can try dropping it by configuring "Target Encoding".
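To illustrate why target encoding a high-cardinality attribute can leak the label, here is a small plain-Python sketch. It is a generic illustration, not how RapidMiner's "Target Encoding" is implemented; the function names and the toy values are made up. The naive version replaces each category by the mean target of its rows, so a category that occurs once is encoded as its own label. The leave-one-out variant, one common mitigation among several, excludes the row's own target.

```python
from collections import defaultdict

def naive_target_encode(categories, targets):
    """Encode each category as the mean target over ALL rows with that category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return [sums[c] / counts[c] for c in categories]

def leave_one_out_encode(categories, targets):
    """Same idea, but each row's own target is excluded from its encoding."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, y in zip(categories, targets):
        others = counts[c] - 1
        # A category seen only once has no other rows: fall back to the global mean.
        encoded.append((sums[c] - y) / others if others > 0 else global_mean)
    return encoded

cats = ["a", "b", "c", "d", "d"]   # mostly unique values, like an ID column
ys = [3, 0, 5, 2, 4]
naive = naive_target_encode(cats, ys)    # unique categories reproduce their label
loo = leave_one_out_encode(cats, ys)     # unique categories get the global mean
```

With the naive encoding, the rows for "a", "b" and "c" carry their exact target value as a feature, which a tree can simply memorize.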
PPS. You can round the predictions after scoring if you prefer integers.
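A minimal sketch of that post-processing step in plain Python, assuming the scored predictions have been exported as a list of numbers (`to_counts` is a made-up helper name). It rounds halves upward instead of using Python's built-in `round`, which rounds halves to the nearest even integer, and it clips negatives to zero because an intervention count cannot be negative.

```python
import math

def to_counts(predictions):
    """Round real-valued predictions to non-negative integers (halves round up)."""
    return [max(0, math.floor(p + 0.5)) for p in predictions]

counts = to_counts([0.4, 1.5, 2.5, -0.3, 3.49])
# counts == [0, 2, 3, 0, 3]
```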
HTH!
André
I hope it makes sense
André
According to your definition, the model is predicting " Feat 8, which is the number of maintenance interventions."
I will stick to the regression models (KNN, regression tree, Random Forest, GLM, and GBT are good choices for regression) because you will predict a numerical target. If the target is categorical, say true/false or broken/normal, then go with classification.
Besides visualization for data exploration and outlier detection, you can also use some of the outlier detection models (e.g. Tukey test for exponential distribution...)
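As a rough illustration of rule-based outlier detection, here is a plain-Python sketch of Tukey's fences (the 1.5 × IQR rule). Reading the "Tukey test" mentioned above as Tukey's fences is an assumption, the values are made up, and this is not a RapidMiner operator. Note that for strongly skewed data, such as exponentially distributed values, the symmetric fences tend to flag many points in the long tail, so the multiplier `k` may need adjusting.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up intervention counts with one suspicious value.
interventions = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]
outliers = tukey_outliers(interventions)
# outliers == [30]
```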
I fully understand why you use a regression method and why classification is not the best fit, but I was at a loss as to why you don't use, for example, the associations & correlations methods. Is there a reason?