"Extract Coefficients from linear regression model"
Here is another problem I would like to share with the RapidMiner Community.
I have generated a Linear Regression model (imported from the stored .mod file).
prediction = a*x + b*y + c*z + ...
a, b, c ... coefficients of the Linear Regression model
x, y, z ... attribute values
I would like to multiply the coefficients from the wgt file by the corresponding attribute values to analyse which attributes contribute to the deviation of the prediction.
Desired: a*x, b*y, c*z
I hope I could explain the problem somehow.
Cheers,
Markus
Answers
I'm curious: How do you want to calculate the deviation for one single attribute?
Anyway, I think you will have to use the script operator to get access to the individual components of the formula. The linear regression model is a FormularProvider class, which should give you a good interface for retrieving the coefficients.
If you convert them to an AttributeWeight object, you can apply it to do the multiplication for the whole example set. This might help, depending on how you answer my question above.
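To make the multiplication itself concrete, here is a minimal sketch in plain Python, outside RapidMiner. The coefficient values, the intercept, and the example row are made up for illustration, and `contributions` is a hypothetical helper, not a RapidMiner function.

```python
def contributions(coefficients, example):
    """Return the per-attribute terms coefficient * value for one example."""
    return {name: coefficients[name] * example[name] for name in coefficients}

# Hypothetical coefficients (a, b, c) and intercept; in practice these would
# be read from the trained linear regression model.
coefficients = {"x": 2.0, "y": -0.5, "z": 0.25}
intercept = 1.0

# One hypothetical example (row) with its attribute values.
example = {"x": 3.0, "y": 4.0, "z": 8.0}

terms = contributions(coefficients, example)      # a*x, b*y, c*z
prediction = intercept + sum(terms.values())      # full model output

for name, value in terms.items():
    print(f"{name}: {value:+.2f}")
print(f"prediction: {prediction:.2f}")
```

The individual terms add up to the prediction (plus the intercept), so each term shows how much one attribute moved the prediction for that example.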
Greetings,
Sebastian
Maybe I am just going in circles without gaining any information (it would not be the first time...).
I was thinking of a way to visualize where deviations come from. So I sort my examples by (label - prediction) in decreasing order. That way I know which examples are predicted badly. But of course I don't know which attributes are responsible for that. So I want to see how much each attribute contributes to the final prediction.
And here the coefficients enter the game. It would make no sense to focus on attributes that have a very small coefficient, so it matters less if those attributes are wrong. But I want to get rid of the attributes that have a large coefficient yet show nearly no correlation.
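The workflow described above could be sketched roughly as follows in plain Python. All numbers are invented, and `predict` and `residual` are hypothetical helpers written for this illustration only.

```python
# Hypothetical model: two attributes plus an intercept.
coefficients = {"x": 2.0, "y": -0.5}
intercept = 1.0

# Hypothetical labelled examples.
examples = [
    {"x": 1.0, "y": 2.0, "label": 2.0},
    {"x": 3.0, "y": 0.0, "label": 10.0},
    {"x": 0.0, "y": 4.0, "label": -1.5},
]

def predict(row):
    """Linear model output for one example."""
    return intercept + sum(coefficients[a] * row[a] for a in coefficients)

def residual(row):
    """Deviation label - prediction for one example."""
    return row["label"] - predict(row)

# Sort by label - prediction in decreasing order, as described in the post.
ranked = sorted(examples, key=residual, reverse=True)

for row in ranked:
    terms = {a: coefficients[a] * row[a] for a in coefficients}
    print(f"residual={residual(row):+.2f}  contributions={terms}")
```

Sorting by the absolute value of the residual instead would put large under- and over-predictions next to each other, which may be closer to "predicted badly".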
Cheers,
Markus
Have you normalized your data? If you have not, a coefficient might be small even though the contribution is large, depending on the scale of the attribute.
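A small sketch of why the scale matters, in plain Python with invented data: the same attribute is expressed once in metres and once in millimetres. `fit_line` and `standardize` are hypothetical helpers (ordinary least squares for a single attribute and a z-score transformation), not RapidMiner operators.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for one attribute: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def standardize(xs):
    """Z-score transformation: zero mean, unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]

# The same hypothetical attribute in two units.
metres = [1.0, 2.0, 3.0, 4.0]
millimetres = [1000.0 * m for m in metres]
label = [3.0, 5.0, 7.0, 9.0]

slope_m, _ = fit_line(metres, label)            # coefficient per metre
slope_mm, _ = fit_line(millimetres, label)      # coefficient per millimetre

# After standardizing, the unit no longer matters.
slope_m_std, _ = fit_line(standardize(metres), label)
slope_mm_std, _ = fit_line(standardize(millimetres), label)

print(slope_m, slope_mm)
print(slope_m_std, slope_mm_std)
```

The raw coefficient shrinks by a factor of 1000 when the unit changes, while the product coefficient * value stays the same, so the size of a coefficient alone says little about importance unless the attributes share a common scale.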
Let me add that you are trying to model a shrinkage algorithm that is already included in the linear regression itself: there is a parameter that assigns a cost to high coefficients, so high coefficients on unimportant attributes are suppressed. Take a look at the parameter "ridge".
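To illustrate the shrinkage idea, here is a minimal textbook sketch for a single centred attribute in plain Python. It is not RapidMiner's implementation, and how the operator scales its "ridge" parameter internally is not assumed here; `ridge_slope` is a hypothetical helper and the data are invented.

```python
def ridge_slope(xs, ys, ridge):
    """Slope minimizing sum((y - my - w*(x - mx))^2) + ridge * w^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # The penalty term adds `ridge` to the denominator and pulls w toward 0.
    return sxy / (sxx + ridge)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# ridge = 0 reproduces ordinary least squares; larger values shrink the slope.
slopes = {ridge: ridge_slope(xs, ys, ridge) for ridge in (0.0, 5.0, 15.0)}

for ridge, slope in slopes.items():
    print(f"ridge={ridge:5.1f}  slope={slope:.3f}")
```

With several attributes the same penalty is applied to all coefficients at once, which is why it tends to hold down coefficients that the data support only weakly.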
Another way to look at this is attribute selection, which might come in handy in your case.
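One very simple form of attribute selection is a correlation filter: keep only attributes whose correlation with the label exceeds some threshold. The sketch below is plain Python with invented data; `pearson` is a hypothetical helper, the threshold of 0.1 is arbitrary, and this is only one of many possible selection schemes.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical attribute columns and label.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],     # strongly related to the label
    "z": [1.0, -1.0, -1.0, 1.0],   # unrelated to the label
}
label = [3.0, 5.0, 7.0, 9.0]

scores = {name: pearson(values, label) for name, values in columns.items()}

threshold = 0.1  # arbitrary cut-off chosen for this illustration
selected = [name for name, r in scores.items() if abs(r) >= threshold]

print(scores)
print("selected:", selected)
```

A filter like this only looks at each attribute on its own, so it can miss attributes that are useful only in combination with others.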
Greetings,
Sebastian