"Prediction accuracy problem"
Ok firstly hi to everyone this is my first post. My problem is to predict stock movements. So firstly I created a spreadsheet in Excel with daily closing prices for 3 stocks and a prediction to learn against which is OUT, LONG & SHORT
I run the prediction and get 85% accuracy
true out true long true short
pred. out 1626 85 73
pred. long 77 660 93
pred. short 62 73 433
class recall 92.12% 80.68% 72.29%
I then run the saved model on 1 year of test data not previously used for the prediction and the result of true vs predicted value is only 53%
What have I done wrong?
I can post xml's and excel sheet if required or answer in more detail if requested
I run the prediction and get 85% accuracy
true out true long true short
pred. out 1626 85 73
pred. long 77 660 93
pred. short 62 73 433
class recall 92.12% 80.68% 72.29%
I then run the saved model on 1 year of test data not previously used for the prediction and the result of true vs predicted value is only 53%
What have I done wrong?
I can post xml's and excel sheet if required or answer in more detail if requested
Tagged:
0
Answers
Without seeing the XML and data it is almost impossible to give a useful answer; that being said my experience is that, when it comes to financial prediction, the more realistic the setup the lower the accuracy, dammit >:( You can check out globestreetjournal.com to see what I mean.
If you can point me in the right direction so I can post attachments I will do that.
I'll take a better look tomorrow, but one point is clear, namely that stratified sampling cannot be right for the validation, as you could end up training on the future if you think about it... try sliding window validation instead.
Gottarush, cheers.
No need to send in the data. The core of your problem is that the results of validating your model are so different from the results you get when you apply it to unseen data. Applying the model is fine, so you need to concentrate on the validation end. Validation splits the data into training and test sets, making the model from the former and applying it to the latter. So the key notion is to make sure that this splitting is done sensibly.
For your problem you need to be certain that the training is done on examples that occurbeforethe examples to be tested. If you check outhttp://en.wikipedia.org/wiki/Stratified_samplingyou will see that stratified sampling does not do this. However, sliding a window down your examples ensures that this cannot happen, so that would be a possibility.
Happy mining, and good luck!