"Which Learning Algorithm to use for probability estimation?"
Ghostrider
I have several (around 30) attributes that I want to feed into a learning algorithm. The attributes are all numeric. The result I am after is the probability that a single event will or will not happen (I'm only trying to predict the probability of one event, not multiple events / classification). The probability of the event has a non-linear dependence on the attributes. What I mean by this is that sometimes a 70% chance of the event occurring follows from the conditions of several attributes taken as a whole, and sometimes a 70% chance can be inferred from the condition of one attribute in particular. The example space is huge, so a fast algorithm would be preferred. Can anyone recommend a learning algorithm to use? If it's not part of RM but has an open-source Java library, I'd still consider it.
EDIT/UPDATE: One example of what I am looking for is more commonly known as a probabilistic neural network (link: http://www.statsoft.com/textbook/neural-networks/). The disadvantage of such a network, however, is that the model stores the training data. Does anyone know of a learning algorithm that outputs a probability for each class (in my case, only one... maybe 3 eventually) and does not require storing all training examples?
Answers
You can use Naive Bayes if you want a straightforward probability calculation.
But I wonder why you have the constraint that the result must be the result of a probability calculation?
Greetings,
Sebastian
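For illustration only, here is a minimal sketch of a Gaussian Naive Bayes estimator in plain Python. It assumes each numeric attribute is roughly normally distributed within each class, which may not hold for the data in question. The function names (`fit_gaussian_nb`, `predict_proba`) and the toy data are hypothetical and do not come from RM or any library. The fitted model keeps only a prior, a mean and a variance per attribute and class, so no training examples are stored.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Return per-class (prior, means, variances); no training rows are kept."""
    model = {}
    n = len(rows)
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        d = len(members[0])
        means = [sum(r[j] for r in members) / len(members) for j in range(d)]
        # A small floor on the variance avoids division by zero
        # when an attribute is constant within a class.
        variances = [
            max(sum((r[j] - means[j]) ** 2 for r in members) / len(members), 1e-9)
            for j in range(d)
        ]
        model[c] = (len(members) / n, means, variances)
    return model

def predict_proba(model, x):
    """Posterior probability of each class for one example x."""
    log_scores = {}
    for c, (prior, means, variances) in model.items():
        s = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            # Log of the Gaussian density for attribute value xj.
            s += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_scores[c] = s
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}

# Hypothetical toy data: two numeric attributes, label 1 means the event happened.
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.7], [2.9, 0.4]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
probs = predict_proba(model, [3.1, 0.6])  # dict: class label -> probability
```

Training is a single pass over the data per class, which is why this approach tends to be fast on large example sets.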
I recommend Logistic Regression, since you only have numeric predictors and a binary response variable. It is indeed slower than Naive Bayes, but its output is generally a better approximation of the probability you seek to calculate. Naive Bayes probabilities are not that well calibrated and tend to clump in regions near 0 and 1.
Regarding general model quality (AUC etc.), logistic regression and Naive Bayes both perform well.
greetings,
steffen
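As a rough sketch of the logistic regression suggestion, the following plain-Python example fits a model by batch gradient descent on the log-loss. This is not how RM or any particular library implements it; the function names, learning rate, epoch count and toy data are all assumptions chosen for illustration. The fitted model consists only of one weight per attribute plus a bias, so it does not store training examples either.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    d = len(rows[0])
    n = len(rows)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the event occurs for example x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: one numeric attribute, deliberately not perfectly separable.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(rows, labels)
p_low = predict_proba(w, b, [0.0])
p_high = predict_proba(w, b, [5.0])
```

With around 30 attributes on different scales, standardising the inputs first would likely help gradient descent converge. Note that plain logistic regression is linear in the attributes on the log-odds scale, so strongly non-linear dependence would need derived features or a different model.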