A problem with the output_word_list of TextInput
Hi,
I want to classify a set of texts into 5 categories. Here is the code that reads in the texts:
import java.util.ArrayList;
import java.util.List;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.OperatorChain;
import com.rapidminer.tools.OperatorService;

OperatorChain textInput;
IOContainer container;
// create the TextInput operator
textInput = (OperatorChain) OperatorService.createOperator("TextInput");
// each entry is {class label, directory holding that class's documents}
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
// file the word list is written to
textInput.setParameter("output_word_list", "d:/test/word.list");
// some preprocessing for text
container = textInput.apply(new IOContainer());
Why are there only two classes in "word.list"?
Here is a fragment of the output file "word.list":
@number_of_documents80
@number_of_classes2
bank,8,5,3
aim,3,3,0
ltd,11,7,4
... ...
BTW, what is the meaning of the last number, listed after each term in "word.list"?
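The thread never answers this. Judging only from the three lines in the fragment, the first number after each term equals the sum of the numbers after it (8 = 5 + 3, 3 = 3 + 0, 11 = 7 + 4), and there are as many trailing numbers as @number_of_classes (2 here). That hints, without confirmation, that the trailing numbers are per-class counts, so the last one would belong to the second class; whether they count documents or occurrences cannot be told from the fragment. The sketch below is a hypothetical checker for that hypothesis (the class name WordListLine is made up and is not part of RapidMiner):

```java
import java.util.Arrays;

/**
 * Hypothetical parser for one term line of "word.list", e.g. "bank,8,5,3".
 * ASSUMPTION (not confirmed in this thread): the first number is a total and
 * the remaining numbers are per-class counts, one per class.
 */
final class WordListLine {
    final String term;
    final int total;
    final int[] perClass;

    WordListLine(String term, int total, int[] perClass) {
        this.term = term;
        this.total = total;
        this.perClass = perClass;
    }

    /** Splits "term,n0,n1,...,nk" into the term, the first number and the rest. */
    static WordListLine parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a term line: " + line);
        }
        int total = Integer.parseInt(parts[1]);
        int[] rest = new int[parts.length - 2];
        for (int i = 2; i < parts.length; i++) {
            rest[i - 2] = Integer.parseInt(parts[i]);
        }
        return new WordListLine(parts[0], total, rest);
    }

    /** True if the first number equals the sum of the trailing numbers. */
    boolean totalMatchesSum() {
        return Arrays.stream(perClass).sum() == total;
    }

    public static void main(String[] args) {
        String[] sample = {"bank,8,5,3", "aim,3,3,0", "ltd,11,7,4"};
        for (String s : sample) {
            WordListLine w = parse(s);
            System.out.println(w.term + ": " + w.perClass.length
                    + " trailing numbers, sum matches first = " + w.totalMatchesSum());
        }
    }
}
```

Running it over the whole file (skipping the lines that start with "@") would show whether the pattern holds beyond these three terms.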
Sincerely yours,
gfyang
Answers
Looks to me like there are only two classes ("hardware" and "graphics", as a wild and crazy guess), but what do I know?
You are right. ;D This is my foolish mistake. Sorry.
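For completeness, a minimal sketch of the fix: give each directory its own label, so the "texts" list defines 5 classes instead of 2. Here the label is taken from the last path segment of each directory (e.g. "acq"), which is only an illustrative choice; the class TextsParameter and its methods are hypothetical helpers, not RapidMiner API. The resulting list would be passed to textInput.setListParameter("texts", para) exactly as in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helpers for building the {label, directory} list. */
final class TextsParameter {

    /** Builds {label, directory} pairs, using the last '/'-separated path segment as the label. */
    static List<String[]> labelByDirectoryName(String... dirs) {
        List<String[]> para = new ArrayList<String[]>();
        for (String dir : dirs) {
            String label = dir.substring(dir.lastIndexOf('/') + 1);
            para.add(new String[] {label, dir});
        }
        return para;
    }

    /** Counts distinct labels, i.e. the number of classes the list defines. */
    static int distinctLabels(List<String[]> para) {
        Set<String> labels = new LinkedHashSet<String>();
        for (String[] entry : para) {
            labels.add(entry[0]);
        }
        return labels.size();
    }

    public static void main(String[] args) {
        List<String[]> para = labelByDirectoryName(
                "c:/data/Reuters/acq", "c:/data/Reuters/corn", "c:/data/Reuters/crude",
                "c:/data/Reuters/earn", "c:/data/Reuters/grain");
        System.out.println(distinctLabels(para) + " classes"); // prints "5 classes"
    }
}
```

Checking distinctLabels before running the operator is a cheap way to catch the copy-paste slip from the question, where "hardware" was reused for four directories.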
As they say, the devil is always in the details >:(