"Logistic regression output"
Hi,
I am using RapidMiner for (among other things) logistic regression. I have also used SysStat, OpenStat, and some other tools. My problem is that I do not understand the RapidMiner output.
In all the other packages I tried, the logistic regression output consists of the values of b0, b1, ..., bn plus Z-values, p-values, and a host of other indicators and statistics. The ROC is helpful too. This helps me narrow down from about 30 predictors to the dozen or so that really matter.
In RapidMiner the only output I get is "bias" and what appear to be values for the predictive variables ("w"), but these differ from what the other packages generate. I simply do not understand what I am looking at. In addition, as there are no Z and p values, how do I decide which predictors are significant and which can be dropped?
I have the same problem with the LDA model.
Who can advise me?
Cheers!
Bert
Answers
Actually, I doubt that the statistics generated by these models are reliable for attribute selection at all. The values can of course be interpreted this way, but the interpretation is only correct if the assumptions underlying the model hold true, and I doubt that they do in most real-data scenarios.
Of course you can run further tests to get a feeling for whether the statistical assumptions are correct. If you need such in-depth statistical methods, you might turn to the R Extension of RapidMiner, which gives you access to the most widely used statistical software.
In general, the approach to attribute selection in RapidMiner is data driven rather than assumption based: you can use schemes such as Forward Selection or Backward Elimination that run a learning algorithm inside a cross-validation to estimate the performance on each candidate subset of attributes. This approach can also capture non-linear dependencies between the attributes if you choose a learner that is able to model them.
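To make the idea concrete, here is a minimal sketch of forward selection with a learner inside a cross-validation, written in Python with NumPy. It is not RapidMiner's implementation: the helper names (`fit_logistic`, `cv_accuracy`, `forward_selection`), the stopping threshold `min_gain`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Plain logistic regression fitted by Newton steps (tiny ridge for stability)."""
    X1 = np.column_stack([np.ones(len(X)), X])      # intercept column first
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        W = p * (1.0 - p)
        H = X1.T @ (X1 * W[:, None]) + ridge * np.eye(X1.shape[1])
        beta += np.linalg.solve(H, X1.T @ (y - p) - ridge * beta)
    return beta

def cv_accuracy(X, y, cols, k=5, seed=0):
    """Mean k-fold accuracy of the learner restricted to the columns in `cols`."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_logistic(X[np.ix_(train, cols)], y[train])
        X1 = np.column_stack([np.ones(len(test)), X[np.ix_(test, cols)]])
        pred = (X1 @ beta > 0).astype(float)         # predicted probability > 0.5
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))

def forward_selection(X, y, min_gain=0.005):
    """Greedily add the attribute that improves cross-validated accuracy most."""
    best = float(max(y.mean(), 1.0 - y.mean()))      # majority-class baseline
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        score, col = max((cv_accuracy(X, y, selected + [c]), c) for c in remaining)
        if score - best < min_gain:                  # no worthwhile improvement
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Synthetic example (assumed data): only attributes 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
logits = 2.0 * X[:, 0] - 2.0 * X[:, 3]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

selected, best = forward_selection(X, y)
print("selected attributes:", selected, "cv accuracy:", round(best, 3))
```

Because the subset is judged only by held-out performance, the same loop works with any learner; swapping `fit_logistic` for a non-linear model is what lets the selection pick up non-linear dependencies.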
The alternative is to use one of the numerous attribute weighting schemes built into RapidMiner. These, again, calculate the importance of an attribute based on assumptions that may or may not hold true.
PS:
Of course we would like to add all these statistical indicators to our models. If someone wants to volunteer and contribute some code, they will be welcomed with open arms.
Greetings,
Sebastian
I had already understood from other posts on the forum that RapidMiner has not yet reached the point where such statistics are included. As I am rather unsophisticated and not sufficiently computer savvy, the easiest solution for me is simply to use another software tool.
Cheers!
Bert
I was under the impression that logistic regression in RM is kernel based and as such is not the same as the "ordinary" logistic regression of SAS, SPSS etc., which is fit by maximum likelihood. If so, the interpretation of the coefficients and the deviance tests (significance of variables) from ordinary logistic regression do not carry over to the RM flavor.
You can certainly fit the LR you are used to in R, using the R interface and generalized linear models (the glm function).
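For anyone wondering what the "ordinary" maximum-likelihood fit actually computes, here is a small sketch in Python with NumPy (not R, and not RapidMiner's operator). It fits the coefficients by Newton-Raphson and derives Wald Z-values and two-sided p-values from the inverse information matrix. The function name `fit_logistic_ml` and the synthetic data are assumptions for illustration; in practice a statistics package would do this for you.

```python
import math
import numpy as np

def fit_logistic_ml(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns coefficients (b0 first), standard errors, Wald z and p-values.
    """
    X1 = np.column_stack([np.ones(len(X)), X])       # b0 is the intercept
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        W = p * (1.0 - p)
        info = X1.T @ (X1 * W[:, None])               # Fisher information
        step = np.linalg.solve(info, X1.T @ (y - p))  # Newton update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    W = p * (1.0 - p)
    cov = np.linalg.inv(X1.T @ (X1 * W[:, None]))
    se = np.sqrt(np.diag(cov))
    z = beta / se                                     # Wald statistic
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, se, z, pvals

# Synthetic example (assumed data): the second predictor has no real effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_beta = np.array([-0.5, 1.5, 0.0, -1.0])          # b0, b1, b2, b3
logits = true_beta[0] + X @ true_beta[1:]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

beta, se, z, pvals = fit_logistic_ml(X, y)
for name, b, s, zv, pv in zip(["b0", "b1", "b2", "b3"], beta, se, z, pvals):
    print(f"{name}: coef={b:.3f} se={s:.3f} z={zv:.2f} p={pv:.4f}")
```

These Z and p values are only meaningful for this maximum-likelihood model; they say nothing about the "bias" and "w" values of a differently fitted model.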