"Correlated features and creating an iteration over models"

Legacy User
edited May 2019 in Help
Hello,

I have created a model in RapidMiner version 4.6.
The aim of this model is to model the chlorophyll dispersion.

Dependent variable: chlorophyll data (numerical)
Independent variables: several different variables (numerical)


The problem with all my data sets is that I have a lot of independent (numerical) variables, but these
variables are correlated with each other.

Therefore, I first integrated:
a) SVM-Weighting - weighting of all variables
b) AttributeWeightSelection - extraction of the important variables
c) RemoveCorrelatedFeature - removal of correlated features, but with Attribute-order: random!
The random attribute order is important because it gives me different sets of important variables.
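To illustrate why a random attribute order yields different variable sets, here is a minimal Python sketch of a greedy correlation filter. It is an assumption about how such a filter can behave, not RapidMiner's RemoveCorrelatedFeature implementation; the helpers `pearson` and `remove_correlated`, the toy columns `x1` to `x3` and the threshold of 0.9 are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def remove_correlated(columns, threshold, seed):
    """Visit attributes in a seeded random order (hypothetical stand-in for
    Attribute-order: random) and keep an attribute only if its absolute
    correlation with every attribute kept so far is <= threshold."""
    order = sorted(columns)
    random.Random(seed).shuffle(order)
    kept = []
    for name in order:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return tuple(sorted(kept))


# Hypothetical toy data: x1 and x2 are strongly correlated (r ~ 0.99),
# x3 is uncorrelated with x1 and only moderately correlated with x2.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 2, 3, 4, 5, 7],
    "x3": [1, -1, -1, -1, -1, 1],
}

for seed in range(5):
    print(seed, remove_correlated(columns, threshold=0.9, seed=seed))
```

Depending on whether `x1` or `x2` is visited first, the surviving set is `('x1', 'x3')` or `('x2', 'x3')`, which is the seed-dependent variation described above.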


My problem:

I would like to create an iteration over the model (because I get different sets of variables from the random
selection) and get the best model from this iteration as the result.
But I have no idea how to do this.

Below I send you the model. For testing, I integrated the ExampleSetGenerator. These are not the
original data, but they show what happens in the model.

Can anybody help me integrate an iteration part?
Many thanks for this.

Best regards,

Angela

<operator name="Modell anwenden und evaluieren" class="OperatorChain" expanded="no">

Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi Angela,
    RapidMiner provides an operator that will solve your problem. It's called the RandomOptimizer, and it applies to all situations where random effects have a heavy impact on performance. Simply put everything that must be executed repeatedly inside this meta operator.
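The idea behind such a meta operator can be sketched in plain Python: run the random inner process several times with different seeds and keep the run with the best performance. This illustrates the random-restart idea described above and is not RapidMiner's RandomOptimizer code; `inner_process`, `random_optimize` and the scores in `TOY_SCORES` are hypothetical.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical performance (e.g. a cross-validated score) of each attribute subset.
TOY_SCORES = {("x1", "x3"): 0.81, ("x2", "x3"): 0.74}

Result = Tuple[float, Tuple[str, ...]]


def inner_process(seed: int) -> Result:
    """Hypothetical stand-in for one run of the inner operators: a random
    attribute order decides whether x1 or x2 survives the correlation filter."""
    order = ["x1", "x2", "x3"]
    random.Random(seed).shuffle(order)
    first_of_pair = next(a for a in order if a in ("x1", "x2"))
    subset = tuple(sorted((first_of_pair, "x3")))
    return TOY_SCORES[subset], subset


def random_optimize(inner: Callable[[int], Result], iterations: int) -> Result:
    """Execute `inner` once per seed and return the best-performing run."""
    best_perf: float = float("-inf")
    best_result: Optional[Tuple[str, ...]] = None
    for seed in range(iterations):
        perf, result = inner(seed)
        if perf > best_perf:  # strict '>' keeps the first of equally good runs
            best_perf, best_result = perf, result
    return best_perf, best_result


print(random_optimize(inner_process, iterations=20))
```

Keeping only the best run means the outcome depends on the performance criterion; with real data, that criterion would be the validated performance of each run, not the training fit.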
    Before I post the example process, some remarks on your process:
    I don't think removing correlated features twice will make any difference, because you checked the "use_absolute_correlation" parameter. In the first round, this already removes each attribute with a correlation above 0.1 or below -0.1. You can set breakpoints after each operator to inspect the intermediate results.

    I have also removed the additional training of the linear regression model after the XValidation. If you want a complete model trained on all data, simply check the "create_complete_model" parameter of the XValidation. But applying this model to the training data won't give you reliable results, because the model may be overfitted to the training set.
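Why the training-set result is unreliable can be shown with a minimal Python sketch of a one-variable linear regression. For simplicity it uses leave-one-out cross-validation (k equal to the number of examples) instead of a k-fold XValidation; the data and the helpers `fit_line`, `mse` and `loo_cv_mse` are hypothetical. For least squares with an intercept, every left-out residual is at least as large as the corresponding training residual, so the error measured on the training data is always the more optimistic number.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given examples."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loo_cv_mse(xs, ys):
    """Leave-one-out CV: each example is predicted by a model that never saw it."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errors) / len(errors)


# Hypothetical noisy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 2.3, 2.8, 4.4, 4.9, 6.3]

complete = fit_line(xs, ys)  # analogous to a complete model trained on all data
print("error on training data: ", mse(complete, xs, ys))
print("leave-one-out CV error:", loo_cv_mse(xs, ys))
```

The cross-validated error is the figure to report; the complete model is the one to deploy, but its error on its own training data says little about new data.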

    And here's your slightly modified process:

    Greetings,
    Sebastian