Multiple Non-Linear Regression in RapidMiner
I am a newbie to RapidMiner. I am using RapidMiner as part of my data mining toolkit for my graduation thesis.
I have a number of independent variables and one numerical dependent variable. I have tried linear regression and polynomial regression. However, I would also like to try multiple non-linear regression on my data, to see whether it predicts more accurately than linear regression.
By multiple non-linear regression, I mean that some independent variables enter the model linearly and some non-linearly (e.g. logarithmic, exponential, or even polynomial), and the predicted value is the combination of all of those:
Y = a · C1 + b · e^C2 + c · log(C3) + ...
Here, a, b, c are coefficients and C1, C2, C3 are independent variables.
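A useful observation: if every coefficient only multiplies a fixed transformation of one input, the model is non-linear in the inputs but still linear in the coefficients, so it can be fitted with ordinary least squares once the transformed columns exist (in RapidMiner this would roughly mean building the columns with Generate Attributes before a linear regression). The sketch below assumes the reading Y = b0 + b1·x1 + b2·exp(x2) + b3·log(x3), where x1..x3 are input columns and b0..b3 are coefficients; the function name and the synthetic data are made up for illustration.

```python
import numpy as np


def fit_transformed_linear(x1, x2, x3, y):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*exp(x2) + b3*log(x3).

    Non-linear in the inputs but linear in the coefficients, so once the
    transformed columns exist this is ordinary least squares.
    Requires x3 > 0 for the logarithm.
    """
    X = np.column_stack([
        np.ones_like(x1),  # intercept b0
        x1,                # linear term
        np.exp(x2),        # exponential term
        np.log(x3),        # logarithmic term
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic data, made up for illustration; true coefficients are 2, 1.5, 0.8, -3.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 2, n)
x3 = rng.uniform(1, 50, n)
y = 2 + 1.5 * x1 + 0.8 * np.exp(x2) - 3 * np.log(x3) + rng.normal(0, 0.1, n)

coef = fit_transformed_linear(x1, x2, x3, y)
print(np.round(coef, 2))
```

The fitted coefficients should land close to the true values, and the resulting formula stays fully readable, which is the interpretability the question asks for.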
Could anybody explain how I can add such operators to achieve this?
Thanks in advance. I would be happy to clarify my problem if it is not clear.
Answers
~Martin
Dortmund, Germany
But a neural net is a black box to the user and also requires a large number of inputs. That is why I am considering regression, where the formula is visible and clear to the user, and the user can easily see how the dependent variable is a linear, exponential, logarithmic, or polynomial function of the independent variables.
So, are there any operators in RapidMiner that produce such regression formulas? Or is there any other way to deal with this kind of problem?
I might also use neural networks and other techniques to validate my predictions, though.
Thanks!
Each variable gets a formula that transforms its values so that its relationship with the label becomes linear.
Try one alongside the Create Formula operator.
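The operator suggested above is not named here, so the following is not its algorithm; it is a simplified stand-in to show the idea of "one linearizing formula per variable". For each column it tries a small, hypothetical set of candidate transformations and keeps the one whose output is most linearly correlated with the label. The candidate set, the function name and the data are all assumptions for illustration.

```python
import numpy as np

# A small, hypothetical set of candidate one-variable transformations.
CANDIDATES = {
    "x": lambda v: v,
    "log(x)": np.log,
    "exp(x)": np.exp,
    "x^2": np.square,
    "sqrt(x)": np.sqrt,
}


def best_transform(column, y):
    """Return (name, |r|) of the candidate whose output is most linearly correlated with y."""
    best_name, best_r = None, -1.0
    for name, f in CANDIDATES.items():
        with np.errstate(all="ignore"):
            t = f(column)
        # Skip transforms that are undefined or constant on this column (e.g. log of 0).
        if not np.all(np.isfinite(t)) or np.std(t) == 0:
            continue
        r = abs(np.corrcoef(t, y)[0, 1])
        if r > best_r:
            best_name, best_r = name, r
    return best_name, best_r


rng = np.random.default_rng(1)
a = rng.uniform(1, 100, 300)
b = rng.uniform(0, 3, 300)
y_a = 3 * np.log(a) + rng.normal(0, 0.05, 300)   # depends logarithmically on a
y_b = np.exp(b) + rng.normal(0, 0.05, 300)       # depends exponentially on b
print(best_transform(a, y_a))
print(best_transform(b, y_b))
```

Selecting transforms one variable at a time like this ignores interactions and can overfit when the candidate set grows, which is one plausible reason such automatically generated formulas get long.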
I implemented your approach and it did produce a very complex formula of about 20-30 terms for 5 independent variables.
Worse, its predictive performance on my data was not very promising.
I am developing a parametric cost model in which the cost depends on a number of independent variables, so the final formula would combine several Cost Estimating Relationship formulas to predict the cost. I know this is a multiple non-linear regression problem, but I do not know how to implement it, either in RapidMiner or with any other tool.
Any further help in this direction would be appreciated.
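When a parameter sits inside the non-linearity (for example an exponent, as in a power-law cost relationship), the model is no longer linear in its coefficients and needs a genuinely non-linear least-squares fit. The sketch below uses SciPy's `curve_fit` outside RapidMiner; the cost formula cost = a·x1^b + c·log(x2) + d, the driver names and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def cost_model(X, a, b, c, d):
    """Hypothetical cost model: cost = a * x1**b + c * log(x2) + d."""
    x1, x2 = X
    return a * np.power(x1, b) + c * np.log(x2) + d


# Synthetic data, made up for illustration (x1, x2 stand in for cost drivers).
rng = np.random.default_rng(2)
n = 150
x1 = rng.uniform(1, 20, n)
x2 = rng.uniform(1, 100, n)
true_params = (5.0, 0.7, 3.0, 10.0)
X = np.vstack([x1, x2])
y = cost_model(X, *true_params) + rng.normal(0, 0.5, n)

# Unlike the linear case, a non-linear fit needs starting values (p0).
params, cov = curve_fit(cost_model, X, y, p0=(1.0, 1.0, 1.0, 1.0))
stderr = np.sqrt(np.diag(cov))
for name, value, se in zip("abcd", params, stderr):
    print(f"{name} = {value:.3f} +/- {se:.3f}")
```

The design choice is to write the combined formula explicitly, so every fitted parameter keeps its meaning in the cost model; the price is that poor starting values can make the optimizer converge to a bad local fit.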
Anyway, all the regression operators, including linear (GLM), polynomial, etc., can be found by simply typing "regression" into the operator search window.
Is there a particular reason you want to use nonlinear regression models? What is your use case? Have you tried just using Auto Model first, as a quick test, to see what happens?
Scott
Thanks so much for the reply! I have 1 dependent variable (engagement rate) and 12 independent variables (color of the picture), all measured on a continuous scale. I first tried linear regression in SPSS, but it didn't really work because, judging from the graph, the relationship appears to be non-linear. That's why I am now trying non-linear regression.
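Before committing to a particular non-linear model, a quick check is whether adding non-linear terms actually improves out-of-sample error. The sketch below is a NumPy-only stand-in (not what SPSS or Auto Model do internally): it compares k-fold cross-validated RMSE of a plain linear fit against the same fit with a squared term added for each input. The helper names and the 12-column synthetic data are assumptions for illustration.

```python
import numpy as np


def kfold_rmse(X, y, k=5, seed=0):
    """Mean out-of-fold RMSE of an ordinary least-squares fit with an intercept."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef
        errors.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(errors))


def add_squares(X):
    """Append x_j**2 for every column, letting each input bend on its own."""
    return np.hstack([X, X ** 2])


rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(300, 12))   # 12 continuous inputs, synthetic
y = X[:, 0] + 2 * X[:, 1] ** 2 - 1.5 * X[:, 2] ** 2 + rng.normal(0, 0.1, 300)
rmse_linear = kfold_rmse(X, y)
rmse_squares = kfold_rmse(add_squares(X), y)
print(f"linear: {rmse_linear:.3f}  with squares: {rmse_squares:.3f}")
```

Judging on held-out folds rather than on the training fit matters here: extra terms always reduce training error, but only real non-linearity reduces out-of-fold error.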
I highly recommend following Scott's advice and submitting your data to Auto Model.
Moreover, Auto Model can perform feature selection (and optionally feature generation) automatically for you.
Your dataset must contain at least 100 rows.
Regards,
Lionel
I have just tried it out! The Generalized Linear Model appeared to perform best. However, is there any reason why no p-values etc. are shown?
To get the p-values, please uncheck the "use regularization" option in the GLM parameters and check "compute p-values". I also suggest checking the "remove collinear columns" option. This way you will get the p-values.
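For background: classical p-values come from t-tests on the coefficients of an unregularized least-squares fit, which is presumably why they are tied to switching regularization off (penalized estimates are deliberately biased, so the textbook formulas no longer apply). The sketch below computes them by hand for a Gaussian linear model; it is illustrative only, does not reproduce RapidMiner's GLM implementation, and the function name and data are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Unregularized least squares with an intercept, plus two-sided t-test p-values."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])      # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - (p + 1)                          # residual degrees of freedom
    sigma2 = resid @ resid / dof               # estimated noise variance
    # (A^T A)^-1 is singular or unstable when columns are collinear.
    cov = sigma2 * np.linalg.inv(A.T @ A)
    se = np.sqrt(np.diag(cov))
    t_stat = coef / se
    p_values = 2 * stats.t.sf(np.abs(t_stat), dof)
    return coef, p_values


rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 1.0, 100)   # only the first input matters
coef, p_values = ols_p_values(X, y)
print(np.round(coef, 3), np.round(p_values, 4))
```

The matrix inverse in the middle is also why collinear columns are a problem for p-values: if one column is a linear combination of others, the standard errors cannot be computed.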
Please let us know if you encounter any issues.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
I wanted to check "remove collinear columns" as you suggested, but I couldn't find that option. Where is it? Thank you very much in advance!
It looks like you didn't check "add intercept". Once you check "add intercept", the "remove collinear columns" option appears.
Let us know if you face any other issues.
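To illustrate what "collinear columns" means, and one common way collinearity is tied to the intercept: a full set of one-hot dummy columns always sums to the constant intercept column, so with an intercept one dummy is redundant. The sketch below greedily keeps a column only if it raises the rank of the columns kept so far. It is a hypothetical helper showing the concept, not how RapidMiner's GLM implements the option.

```python
import numpy as np


def drop_collinear(X, add_intercept=True):
    """Return indices of columns kept after greedily dropping linearly dependent ones.

    A column is kept only if it raises the rank of the columns kept so far
    (including the intercept column when add_intercept is True).
    """
    n = X.shape[0]
    kept_cols = [np.ones(n)] if add_intercept else []
    kept_idx = []
    for j in range(X.shape[1]):
        candidate = np.column_stack(kept_cols + [X[:, j]])
        if np.linalg.matrix_rank(candidate) == candidate.shape[1]:
            kept_cols.append(X[:, j])
            kept_idx.append(j)
    return kept_idx


# Synthetic example: one-hot dummies for a 3-level category, plus x and 2*x.
cats = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[cats]
x = np.arange(6, dtype=float)
X = np.column_stack([dummies, x, 2 * x])
print(drop_collinear(X, add_intercept=True))
print(drop_collinear(X, add_intercept=False))
```

With the intercept, the third dummy and 2*x are dropped; without it, only 2*x is dropped, because the three dummies are then linearly independent.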
Varun
In the end, I've decided to use the SVM results, but I am not sure exactly how to interpret those numbers. For example, some of the attribute weights are 0; does that mean those attributes do not contribute to my DV at all? There are also several other SVM outputs that I am not sure how to interpret. I couldn't find SVM in the Auto Model documentation on the RapidMiner website. It would be nice if you have some information regarding the SVM results generated by Auto Model!
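Some general background, assuming the SVM uses a linear kernel (with non-linear kernels, per-attribute weights do not describe the model this simply): the model's output is f(x) = w·x + b, so an attribute with weight 0 contributes nothing to that particular model's predictions. That does not prove the attribute is unrelated to the DV; for instance, if two attributes carry the same information, the model can put all the weight on one of them. The sketch below shows both points with hypothetical weights and data; it does not read Auto Model's actual output.

```python
import numpy as np


def linear_contributions(w, b, x):
    """Split a linear model's output f(x) = w . x + b into per-attribute terms w_j * x_j."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    terms = w * x
    return terms, float(terms.sum() + b)


# Hypothetical weights and data: attribute 2 is an exact copy of attribute 1.
rng = np.random.default_rng(5)
X = rng.normal(size=(5, 2))
X[:, 1] = X[:, 0]
b = 0.2
w_a = np.array([1.0, 0.0])   # all weight on attribute 1, zero on attribute 2
w_b = np.array([0.5, 0.5])   # weight split between the two copies

pred_a = X @ w_a + b
pred_b = X @ w_b + b
print(np.allclose(pred_a, pred_b))   # prints True: same predictions, different weights

terms, f = linear_contributions(w_a, b, X[0])
print(terms, f)   # the zero-weight attribute contributes 0 to this prediction
```

Comparing weight magnitudes across attributes is also only meaningful if the attributes are on comparable scales, for example after normalization.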