How to build a prediction model by weight

zmk Member Posts: 4 Contributor I
edited December 2018 in Help

Hi there,

I am a RapidMiner beginner. I have the following problem:

I have a small dataset: only 11 rows, but 102 attributes. The label is binominal: 1 or 2.

The decision tree finds only one attribute that discriminates between 1 and 2 in the 11 rows with 100% accuracy, but that tree reaches an accuracy of only about 51% when tested on a second validation data set.

Using "Weight by correlation" and manual visual comparison of the graphs, I was able to find about 6 attributes that discriminate very well between 1 and 2.
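For readers outside RapidMiner, the ranking step described above can be sketched in plain Python. This is only an illustration of the idea (absolute Pearson correlation between each attribute and the label), not RapidMiner's implementation of the operator; the function names and the toy data are made up for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def top_k_by_correlation(rows, labels, k):
    """Return the indices of the k attributes with the largest |correlation| to the label."""
    weights = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        weights.append((abs(pearson(column, labels)), j))
    # Highest weight first; ties are broken by attribute index.
    weights.sort(key=lambda w: (-w[0], w[1]))
    return [j for _, j in weights[:k]]

# Toy data (hypothetical): 6 rows, 4 attributes, label 1 or 2.
# Attributes 0 and 2 track the label; attributes 1 and 3 do not.
rows = [
    [0.1, 5.0, 3.0, 9.0],
    [0.2, 4.0, 3.1, 1.0],
    [0.1, 6.0, 2.9, 5.0],
    [0.9, 5.0, 7.0, 2.0],
    [0.8, 4.0, 7.2, 8.0],
    [1.0, 6.0, 6.9, 5.0],
]
labels = [1, 1, 1, 2, 2, 2]

selected = top_k_by_correlation(rows, labels, k=2)
print(selected)
```

With 11 rows and 102 attributes, some attributes will rank highly by chance alone, so a ranking like this is a starting point rather than proof that the attributes generalize.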

Now I want to generate a model from the top 6 weighted attributes and test it on an unlabeled data set.

How do I do this?

Thanks,

ZMK

[Image: weight.jpg] Here is my process so far.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,407 RM Data Scientist

    Hi,

    have a look at the last 4 videos of our getting started: //www.turtlecreekpls.com/training/videos/

    that should explain it.

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • zmk Member Posts: 4 Contributor I

    Actually I did, and I constructed the training processes step by step (I really enjoyed the videos). Then I replaced the training data with my own data. Because the decision tree ends up with only one attribute to discriminate between my two label values, it performed really badly on the validation data set, which the algorithm had not seen before.

    So I used "Select by Weights" to visualize the data and realized that the decision tree took only the single attribute with the highest weight value, although the top six all look great.

    So now I want to build a model that is forced to use all six attributes and test it on my validation data set.

    Something like "3 out of six must be altered to predict label"
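One way to read this "k out of n" idea is as a simple voting rule. The sketch below is only an assumption about what such a model could look like, not a RapidMiner operator: each chosen attribute gets a threshold halfway between the two class means in the training data, and a row is labeled 2 when at least `min_votes` attributes fall on the class-2 side. The function names and toy data are hypothetical.

```python
def fit_vote_rules(rows, labels, attrs, positive=2):
    """For each chosen attribute, store a midpoint threshold and which side the positive class is on."""
    rules = []
    for j in attrs:
        pos = [r[j] for r, y in zip(rows, labels) if y == positive]
        neg = [r[j] for r, y in zip(rows, labels) if y != positive]
        pos_mean = sum(pos) / len(pos)
        neg_mean = sum(neg) / len(neg)
        threshold = (pos_mean + neg_mean) / 2
        rules.append((j, threshold, pos_mean > neg_mean))
    return rules

def predict_by_votes(row, rules, min_votes, positive=2, negative=1):
    """Predict the positive label if at least min_votes attributes vote for it."""
    votes = 0
    for j, threshold, higher_is_positive in rules:
        # An attribute votes "positive" when the value lies on the positive side of its threshold.
        if (row[j] > threshold) == higher_is_positive:
            votes += 1
    return positive if votes >= min_votes else negative

# Toy training data (hypothetical): 3 attributes, labels 1 or 2.
train_rows = [
    [1.0, 10.0, 0.2],
    [1.2, 9.0, 0.1],
    [3.0, 4.0, 0.9],
    [3.2, 5.0, 0.8],
]
train_labels = [1, 1, 2, 2]

rules = fit_vote_rules(train_rows, train_labels, attrs=[0, 1, 2])

# Unlabeled rows to score with a "2 out of 3" rule.
unlabeled = [[2.9, 8.0, 0.7], [1.0, 6.0, 0.3]]
predictions = [predict_by_votes(row, rules, min_votes=2) for row in unlabeled]
print(predictions)
```

For the case in this thread, `attrs` would hold the six selected attributes and `min_votes` would be 3. The thresholds are still fitted on very few rows, so the rule should be checked on the separate validation set before it is trusted.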

    P.S.

    I guess the tree model is too complex or tight if it uses just one value. This seems to result in overfitting. I am looking for a way to increase the generalization performance.

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    The fundamental problems are not enough examples and too many attributes. A DT is going to be susceptible to overfitting the training data in this circumstance. You would be better off doing a combination of dimensionality reduction and feature engineering to reduce the number of attributes, while simultaneously seeing if you can acquire more data (examples) for model building. Otherwise I think you are going to have to use a more judgmental model building strategy.
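The effect described above can be reproduced with a small simulation. The sketch below uses purely random numbers, not the poster's data: it builds 11 rows with 102 noise attributes, searches for the single attribute and threshold that best separate the training labels, and then scores that rule on fresh random rows. All names, the seed, and the holdout size are arbitrary choices made for the illustration.

```python
import random

def accuracy(rule, rows, labels):
    """Share of rows for which a one-attribute threshold rule predicts the right label."""
    j, threshold, above, below = rule
    hits = sum((above if r[j] > threshold else below) == y
               for r, y in zip(rows, labels))
    return hits / len(rows)

def best_stump(rows, labels):
    """Exhaustively pick the attribute, threshold and direction with the highest training accuracy."""
    best_acc, best_rule = 0.0, None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            for above, below in ((1, 2), (2, 1)):
                rule = (j, threshold, above, below)
                acc = accuracy(rule, rows, labels)
                if acc > best_acc:
                    best_acc, best_rule = acc, rule
    return best_acc, best_rule

rng = random.Random(0)  # fixed seed so the run is repeatable
n_attrs = 102

# Training set: 11 rows of pure noise, so no attribute is truly related to the label.
train_rows = [[rng.random() for _ in range(n_attrs)] for _ in range(11)]
train_labels = [1] * 5 + [2] * 6

# Holdout set: fresh noise with random labels.
holdout_rows = [[rng.random() for _ in range(n_attrs)] for _ in range(2000)]
holdout_labels = [rng.choice((1, 2)) for _ in range(2000)]

train_acc, rule = best_stump(train_rows, train_labels)
holdout_acc = accuracy(rule, holdout_rows, holdout_labels)
print(f"training accuracy: {train_acc:.2f}, holdout accuracy: {holdout_acc:.2f}")
```

Because the search gets 102 chances to find a lucky split on only 11 rows, the training accuracy comes out high even though the data carry no signal, while the holdout accuracy stays near chance. That is the same pattern as the 100% versus 51% result reported in the question.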

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts