Apply Model: Testing & Training Sets Differ

Hyram, Member, Posts: 39, Contributor II
Hi
I am using Sentiment 140 as my training and testing data, which is already split into two sets. I run training, cross-validation and testing separately: training and cross-validation on the training set, testing on the test set. The problem is that after text preprocessing the features of the test set don't align with those of the training set, so I can't apply the trained model. The end product of my text preprocessing is a matrix in which each text is an example and each feature is a term with its term frequency (TF), and the set of terms differs between the training and test sets.
Do I somehow merge both sets so that the features are aligned, with TF = 0 for terms that don't occur in a given set?
Thanks
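
To make the alignment problem concrete, here is a minimal sketch in plain Python (not RapidMiner). It assumes raw term counts as the TF values and whitespace tokenization; the helper names `build_word_list` and `vectorize` are hypothetical. The word list is built from the training texts only and then reused for the test texts, so both matrices have identical columns: terms missing from a test text get 0, and test terms unseen in training are dropped.

```python
from collections import Counter

def tokenize(text):
    # Simplistic tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_word_list(texts):
    # Sorted vocabulary built from the given texts only.
    return sorted({tok for t in texts for tok in tokenize(t)})

def vectorize(texts, word_list):
    # One row per text, one column per word in word_list.
    # Counter returns 0 for words that do not occur in the text.
    rows = []
    for t in texts:
        counts = Counter(tokenize(t))
        rows.append([counts[w] for w in word_list])
    return rows

train_texts = ["good movie", "bad movie", "good good plot"]
test_texts = ["good acting", "terrible plot"]

# Build the word list on the training set and reuse it for the test set.
word_list = build_word_list(train_texts)
train_matrix = vectorize(train_texts, word_list)
test_matrix = vectorize(test_texts, word_list)

print(word_list)
print(test_matrix)
```

With this approach the two sets never need to be merged; the test texts are simply described in terms of the training vocabulary.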

Answers

  • Hyram, Member, Posts: 39, Contributor II
    edited July 2020
    Apologies, I think this was solved by Marius and Ingo in 2012. I was wondering: if you connect the word list output of Process Documents on the training leg to the word list input of Process Documents on the test leg, does the output of Process Documents on the test leg use the same TF values or zeros? The values carried through are indeed zero.
    Using the word list output of the training leg works, but what if I process the data further after the Process Documents operator and reduce the features with a select by weight operator?
  • Hyram, Member, Posts: 39, Contributor II
    Thanks very much @Telcontar120 and @jacobcybulski!
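
On the follow-up question about reducing features after vectorization, one possible approach is sketched below in plain Python (not RapidMiner): compute the weights and the selection on the training matrix only, then project the test matrix onto the same selected columns. The weighting used here (total term count in the training set) is only a stand-in for whatever weighting scheme is actually used, and `select_by_weight` and `project` are hypothetical helpers.

```python
def select_by_weight(word_list, weights, top_k):
    # Keep the top_k words by weight; ties are broken alphabetically.
    ranked = sorted(word_list, key=lambda w: (-weights[w], w))
    return sorted(ranked[:top_k])

def project(matrix, word_list, selected):
    # Keep only the columns of the selected words, in the order of `selected`.
    idx = [word_list.index(w) for w in selected]
    return [[row[i] for i in idx] for row in matrix]

# Matrices already aligned to the training word list (illustrative values).
word_list = ["bad", "good", "movie", "plot"]
train_matrix = [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
test_matrix = [[0, 1, 0, 0], [0, 0, 0, 1]]

# Weights come from the training data only (assumed scheme: total count).
weights = {w: sum(row[i] for row in train_matrix)
           for i, w in enumerate(word_list)}

selected = select_by_weight(word_list, weights, top_k=2)

# Apply the same selection to both sets so the columns stay aligned.
train_reduced = project(train_matrix, word_list, selected)
test_reduced = project(test_matrix, word_list, selected)

print(selected)
print(test_reduced)
```

The point of the design is that the selection is decided once, on the training leg, and only replayed on the test leg; recomputing weights on the test set would again produce mismatched features.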