"Prediction accuracy problem"

c1borg (Member, Posts: 17, Maven)
edited June 2019 in Help
Ok, firstly, hi to everyone; this is my first post. My problem is predicting stock movements. I started by creating a spreadsheet in Excel with daily closing prices for 3 stocks and a label column to learn against, which is OUT, LONG or SHORT.
I run the prediction and get 85% accuracy:
                 true out   true long   true short
    pred. out      1626        85           73
    pred. long       77       660           93
    pred. short      62        73          433
    class recall   92.12%     80.68%       72.29%
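As a sanity check, the reported 85% accuracy and the per-class recalls do follow from the confusion matrix above. A minimal sketch (counts taken as posted; rows are predictions, columns are true classes):

```python
import numpy as np

# Confusion matrix as posted: rows = predicted (out, long, short),
# columns = true (out, long, short).
cm = np.array([
    [1626,  85,  73],
    [  77, 660,  93],
    [  62,  73, 433],
])

# Overall accuracy: correct predictions (the diagonal) over all examples.
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: diagonal entry over its column total (all true members
# of that class).
recall = np.diag(cm) / cm.sum(axis=0)

print(f"accuracy: {accuracy:.2%}")              # ~85.45%
print("class recall:", [f"{r:.2%}" for r in recall])
```

So the validation numbers are internally consistent; the question is why they do not carry over to unseen data.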
I then ran the saved model on 1 year of test data not previously used for training, and the accuracy of true vs. predicted values is only 53%.
What have I done wrong?

I can post the XMLs and the Excel sheet if required, or answer in more detail if requested.

Answers

  • haddock (Member, Posts: 849, Maven)
    Welcome to the whacky world of RM!

    Without seeing the XML and data it is almost impossible to give a useful answer. That being said, my experience is that, when it comes to financial prediction, the more realistic the setup, the lower the accuracy, dammit >:( You can check out globestreetjournal.com to see what I mean.
  • c1borg (Member, Posts: 17, Maven)
    I was going to attach the files but can't work out how, so here are the 2 XMLs:

    If you can point me in the right direction so I can post attachments, I will do that.

  • haddock (Member, Posts: 849, Maven)
    Hi,

    I'll take a closer look tomorrow, but one point is clear: stratified sampling cannot be right for the validation, as you could end up training on the future if you think about it... try sliding window validation instead.

    Gotta rush, cheers.

  • c1borg (Member, Posts: 17, Maven)
    Ok, many thanks. If you need the data file let me know, but I might have to email it to you, as I can't find the attachment option.
  • haddock (Member, Posts: 849, Maven)
    G'Day c1borg!

    No need to send in the data. The core of your problem is that the results of validating your model are so different from the results you get when you apply it to unseen data. Applying the model is fine, so you need to concentrate on the validation end. Validation splits the data into training and test sets, making the model from the former and applying it to the latter. So the key notion is to make sure that this splitting is done sensibly.

    For your problem you need to be certain that the training is done on examples that occur before the examples to be tested. If you check out http://en.wikipedia.org/wiki/Stratified_sampling you will see that stratified sampling does not do this. However, sliding a window down your examples ensures that this cannot happen, so that would be a possibility.
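    To make the sliding-window idea concrete, here is a minimal Python sketch (not RapidMiner's own operator; the function name and parameters are illustrative). It assumes the rows are already sorted by date, and it guarantees that every training window ends strictly before the window it is tested on:

    ```python
    def sliding_window_splits(n_rows, train_size, test_size, step):
        """Yield (train_indices, test_indices) pairs that move forward in time.

        Unlike a stratified split, the test examples always come after the
        training examples, so the model never sees the future in training.
        """
        start = 0
        while start + train_size + test_size <= n_rows:
            train = list(range(start, start + train_size))
            test = list(range(start + train_size, start + train_size + test_size))
            yield train, test
            start += step

    # Example: 10 rows sorted by date; train on 4, test on the next 2, slide by 2.
    for train, test in sliding_window_splits(10, 4, 2, 2):
        print(train, "->", test)
    # [0, 1, 2, 3] -> [4, 5]
    # [2, 3, 4, 5] -> [6, 7]
    # [4, 5, 6, 7] -> [8, 9]
    ```

    A stratified split would instead shuffle examples from all dates into both sets, which is exactly how future prices leak into training and inflate the validation accuracy.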

    Happy mining, and good luck!
  • c1borg (Member, Posts: 17, Maven)
    Ok many thanks will take your advice.