Question data

andre5007 Member Posts: 22 Contributor I
edited May 2021 in Help
I have two CSV files, each with several features. Feat1 is the model; Feat2 is a power measurement; Feat3 indicates whether the object has a certain item (1 = has it, 0 = does not); Feat4 is a feature whose meaning I don't know; Feat5 is the device installation date; Feat6 and Feat7 are the latitude and longitude; and Feat8 is the number of maintenance interventions. The training CSV has values for Feat8, but the test CSV does not. My goal is to estimate Feat8 for the test set. How can I do this? Thanks
Treino.csv 739.5K
Teste.csv 728.9K

Best Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist
    Solution Accepted
    Hi @andre5007, it looks like your prediction target is numerical (integers). Are you sure you want to build a decision tree or any other predictive model for classification rather than regression? I would parse the label as a number and try regression decision trees or GLM/GBT for regression.
  • andre5007 Member Posts: 22 Contributor I
    Solution Accepted
    Hi @yyhuang
    Why do you think regression decision trees or GLM/GBT for regression are better?
    Thanks
    André
  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist
    Solution Accepted
    Hi @andre5007, my point was that regression is a better fit than classification for your data, because the label is an integer. For the difference between regression and classification, see https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
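
One way to see the point about integer labels is to compare how a classification metric and a regression metric score the same predictions. The sketch below uses invented intervention counts (not values from Treino.csv) and two hypothetical prediction sets: accuracy cannot tell a near miss from a far miss, while mean absolute error can.

```python
# Illustrative values only: true intervention counts and two hypothetical
# sets of predictions that each get exactly one device wrong.
actual = [0, 1, 2, 5, 3]
near_miss = [0, 1, 2, 4, 3]  # off by one on the fourth device
far_miss = [0, 1, 2, 0, 3]   # off by five on the fourth device


def accuracy(y_true, y_pred):
    """Classification view: each prediction is either exactly right or wrong."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Regression view: the size of each error matters."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


acc_near = accuracy(actual, near_miss)
acc_far = accuracy(actual, far_miss)
mae_near = mean_absolute_error(actual, near_miss)
mae_far = mean_absolute_error(actual, far_miss)

print("accuracy:", acc_near, acc_far)  # identical for both prediction sets
print("MAE:     ", mae_near, mae_far)  # smaller for the near miss
```

Because the number of maintenance interventions is an ordered quantity, a model trained and evaluated as a regression can be rewarded for being close, which a classifier treating each count as an unrelated category cannot.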

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You should review the RapidMiner tutorials for Cross Validation and for Apply Model. Basically you are going to define Feat 8 as the label and build your model on that, and then you are going to save that model and apply it to the 2nd dataset.
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • andre5007 Member Posts: 22 Contributor I
    OK, I will try that. If I have any questions, can you help me?
  • andre5007 Member Posts: 22 Contributor I
    Can someone tell me whether I'm on the right track, please?


  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist
    Hi @andre5007,

    The workflow looks fine if you have your own test set. However, as Brian mentioned above, cross validation is always a smart option on your training set.

    https://academy.www.turtlecreekpls.com/learn/article/cross-validation
    https://academy.www.turtlecreekpls.com/learn/video/validating-a-model
    //www.turtlecreekpls.com/blog/validate-models-cross-validation/

    HTH!

    YY

  • andre5007 Member Posts: 22 Contributor I
    I have now noticed that the screenshot I sent was wrong, because it did not show the selection I intended.

    I put a filter at the beginning because one value was missing, and that caused an error.

    Then, in the Cross Validation, I put the Decision Tree inside the process on the training side, and Apply Model and Performance on the testing side.

    Then I connected the Cross Validation to another Apply Model, and I also fed that Apply Model the test data set, for which I have to predict Feat 8.

    Do you think I should change any of the operators' parameters? I only changed what was necessary to get the process to run.

    What do you think I can improve? Am I on the right path now?

    Thanks
    Best regards
    André


  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Looks like a good setup for basic model construction and validation with an additional out-of-sample validation.
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • andre5007 Member Posts: 22 Contributor I
    Can you explain how I can improve the value that I marked in red? Thanks

  • rugmanasokan Member Posts: 1 Newbie
    Regression is a better model for your data than classification, because the label is an integer. To understand the difference between regression and classification, see https://nimblebox.ai/blog/regression-machine-learning
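
The workflow discussed in this thread (filter out rows with missing values, cross-validate a model on the training set, then apply it to the unlabeled test set) can be sketched outside RapidMiner roughly as follows. This is only an illustration under stated assumptions: the rows are invented stand-ins for Treino.csv and Teste.csv, the column meanings are hypothetical, and a small k-nearest-neighbours regressor replaces the Decision Tree operator so that the example needs nothing beyond the Python standard library.

```python
from statistics import mean

# Invented stand-in for Treino.csv: (power, has_option, interventions).
# None marks a missing value, like the one that made the process fail.
train_raw = [(float(p), p % 2, p // 4) for p in range(40)]
train_raw.append((None, 1, 3))

# Invented stand-in for Teste.csv: same features, no label column.
test_rows = [(2.0, 0), (21.0, 1), (38.0, 0)]


def drop_missing(rows):
    """Keep only rows in which no value is missing (the filter step)."""
    return [row for row in rows if all(value is not None for value in row)]


def knn_predict(train, features, k=3):
    """Predict the label as the mean label of the k nearest training rows.

    This regressor is a stand-in for the Decision Tree used in the thread.
    """
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[:-1], features))

    nearest = sorted(train, key=squared_distance)[:k]
    return mean(row[-1] for row in nearest)


def cross_validate(rows, n_folds=5, k=3):
    """Return the mean absolute error averaged over n_folds held-out folds."""
    fold_errors = []
    for fold in range(n_folds):
        held_out = rows[fold::n_folds]
        training = [row for i, row in enumerate(rows) if i % n_folds != fold]
        errors = [abs(knn_predict(training, row[:-1], k) - row[-1])
                  for row in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


train = drop_missing(train_raw)           # filter at the beginning
cv_mae = cross_validate(train)            # estimate of out-of-sample error
predictions = [knn_predict(train, features) for features in test_rows]

print("cross-validated MAE:", cv_mae)
print("predicted interventions for the test rows:", predictions)
```

The cross-validated error is only an estimate of how the model might behave on new data; the final predictions come from a model fitted on all of the filtered training rows, which mirrors connecting the Cross Validation model output to a second Apply Model.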