Polynomial regression with mixed terms
michaelhecht
I would appreciate it if the polynomial regression could also include mixed terms, i.e.
if there are attributes like X1, X2, X3 and a label Y and I specify a second-order polynomial,
I could get
Y = a * X1 + b * X2 + c * X3 + d * X1*X2 + e * X2*X3 + f * X1*X3 + g * X1^2 + h * X2^2 + i * X3^2
possibly with some optimization of the terms used, based on an implicit cross-validation,
since including all possible mixed terms could "explode" the size of the polynomial
if the number of attributes is large. Alternatively, one could specify the maximum number
of attributes per mixed term, e.g. 3, which means that at most 3 attributes may be combined in one term.
With higher-order polynomials one would get terms like: a * X1 * X4^3 * X5^2.
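To get a feeling for how quickly the number of terms grows, here is a minimal Python sketch (not RapidMiner code) that enumerates every term of total degree 1 up to the requested order. The function name `mixed_terms` and the `max_attrs` parameter are made up for illustration; `max_attrs` mimics the suggested cap on how many attributes may be combined in one term.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb


def mixed_terms(attributes, degree, max_attrs=None):
    """List all monomials of total degree 1..degree over the attributes.

    max_attrs (hypothetical parameter) caps the number of distinct
    attributes that may appear in a single term.
    """
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(attributes, d):
            powers = Counter(combo)  # attribute -> exponent
            if max_attrs is not None and len(powers) > max_attrs:
                continue
            terms.append(" * ".join(
                name if p == 1 else f"{name}^{p}"
                for name, p in powers.items()))
    return terms


terms = mixed_terms(["X1", "X2", "X3"], degree=2)
print(terms)       # the 9 terms of the second-order example above
print(len(terms))  # 9

# Without a cap there are C(n + d, d) - 1 terms for n attributes and order d.
many = mixed_terms([f"X{i}" for i in range(1, 21)], degree=3)
print(len(many), comb(23, 3) - 1)  # 1770 1770
```

So 3 attributes at order 2 give the 9 terms listed above, while 20 attributes at order 3 already give 1770, which is why some selection of terms seems necessary.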
Answers
As you said, generating all mixed terms would let the problem explode. So you would have to do some sort of feature selection internally, and validation is needed too, of course.
I personally believe that this is the wrong way, because you lose so much control. It's a simple approach, but you could do everything you want done internally just inside your own process. Use a feature generation that provides multiplications, perhaps a genetic algorithm, just as you like, together with a validation over a linear regression, and you have everything you want...
Greetings,
Sebastian
If yes, assistance is really welcome. I'm not familiar enough with RapidMiner to be able to do
this myself. The documentation also doesn't seem to be helpful. So if it is easy for you, or a challenge, I
would really be happy to get a solution or at least some hints. I'm sure that other users of this
forum think similarly.
Here's just a simple example of what I suggested. The genetic feature generation will produce products of the attributes, and the linear regression will learn a least-squares fitted model on all terms. Hence this is some sort of polynomial regression with mixed terms. The genetic selection algorithm will optimize which mixed terms are used, so that not all possible terms have to be built at once.
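For readers outside RapidMiner, the same idea can be sketched roughly in Python with NumPy. This is not the RapidMiner process and not the AGA operator: a greedy forward selection stands in for the genetic search, and a single holdout split stands in for the validation. All function names, the synthetic data, and the 5% improvement threshold are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

import numpy as np


def product_features(X, degree=2):
    """Build all attribute products of total degree 1..degree."""
    cols, names = [], []
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
            names.append("*".join(f"X{i + 1}" for i in idx))
    return np.column_stack(cols), names


def holdout_rmse(F, y, cols, train, test):
    """Least-squares fit (with intercept) on train rows, RMSE on test rows."""
    def design(rows):
        return np.column_stack([np.ones(len(rows)), F[rows][:, cols]])
    coef, *_ = np.linalg.lstsq(design(train), y[train], rcond=None)
    return float(np.sqrt(np.mean((design(test) @ coef - y[test]) ** 2)))


def forward_select(F, y, train, test, max_terms=5):
    """Greedy stand-in for the genetic search: add the best term each round."""
    chosen, best = [], np.inf
    while len(chosen) < max_terms:
        scores = {j: holdout_rmse(F, y, chosen + [j], train, test)
                  for j in range(F.shape[1]) if j not in chosen}
        j, score = min(scores.items(), key=lambda kv: kv[1])
        if score > 0.95 * best:  # assumed threshold: need a 5% improvement
            break
        chosen.append(j)
        best = score
    return chosen, best


# Synthetic data (assumption): Y depends on X1, X1*X2 and X3^2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (2.0 * X[:, 0] - 3.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2
     + rng.normal(scale=0.01, size=200))

F, names = product_features(X, degree=2)
train, test = np.arange(140), np.arange(140, 200)
chosen, rmse = forward_select(F, y, train, test)
print([names[j] for j in chosen], round(rmse, 4))
```

The point is only that terms are added while the validation error keeps improving, so the full set of products never has to go into the regression at once.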
Greetings,
Sebastian
I'm both impressed and confused. Is it really this simple (short)? Is there any detailed description in the RM
documentation of how the AGA operator works?
Is there a way to force certain attributes to be used? Specifically, I wanted to plot att1 against the predicted label,
but the AGA produced only mixed terms.
Nevertheless, thank you very much again.
I'm sure that RM is able to do much more than I can imagine!
So where is the RM book to teach me how to do it?
:-\
Unfortunately, there's no more detailed description than the operator info. Surely this implementation is based upon a scientific publication, but perhaps that would be too detailed?
If you only want to plot one variable against another, you could simply use the scatter plot. If you want to evaluate a special expression because of some background knowledge, you could use the AttributeConstruction operator. You can enter any expression there; in your case you would type "att1 * prediction(label)". Take a look at the operator info of the AttributeConstruction operator for an overview of the various functions available.
The RM book is planned, but unfortunately the community finds too many bugs, so nobody can work on it. To be serious: we are working on it, but it won't be finished in the near future.
Greetings,
Sebastian