"A problem about the output_word_list of TextInput"

gfyang
edited June 2019 in Help
Hi,

I want to classify a text collection with 5 categories. Here is the code I use to load the texts:

OperatorChain textInput;
IOContainer container;

textInput = (OperatorChain) OperatorService.createOperator("TextInput");
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
textInput.setParameter("output_word_list", "d:/test/word.list");

// some preprocessing for text

container = textInput.apply(new IOContainer());

Here is a fragment of the output file "word.list":

@number_of_documents80
@number_of_classes2
bank,8,5,3
aim,3,3,0
ltd,11,7,4
... ...
Why are there only two classes in "word.list" when I defined 5 categories?

BTW, what does the last number after each term in "word.list" mean?

Sincerely yours,
gfyang
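On the last-number question, the fragment itself gives a hint. In all three sample lines the first count equals the sum of the two after it (8 = 5 + 3, 3 = 3 + 0, 11 = 7 + 4), and there are exactly two trailing counts per line, matching `@number_of_classes2`. That suggests each line reads `term,total,countInClass1,countInClass2`, so the last number would be the term's count in the second class. This interpretation is an assumption inferred from the sample, not documented RapidMiner behavior, and it does not say whether the counts are occurrences or document frequencies. The plain-Java sketch below (no RapidMiner classes; `WordListCheck` and its methods are hypothetical names) parses the sample lines and checks that pattern:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: checks the word.list fragment quoted above.
// ASSUMPTION (inferred from the sample, not from RapidMiner docs):
// each term line is "term,total,c1,c2,...", with c1..cn the per-class counts.
final class WordListCheck {

    /** Returns all integer fields after the term, e.g. "bank,8,5,3" -> [8, 5, 3]. */
    static int[] counts(String line) {
        String[] fields = line.split(",");
        int[] result = new int[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            result[i - 1] = Integer.parseInt(fields[i].trim());
        }
        return result;
    }

    /** True if the first count equals the sum of the (assumed) per-class counts. */
    static boolean totalMatchesClassCounts(String line) {
        int[] c = counts(line);
        int sum = 0;
        for (int i = 1; i < c.length; i++) {
            sum += c[i];
        }
        return c[0] == sum;
    }

    public static void main(String[] args) {
        List<String> fragment = Arrays.asList("bank,8,5,3", "aim,3,3,0", "ltd,11,7,4");
        for (String line : fragment) {
            int[] c = counts(line);
            // e.g. "bank: total=8, per-class=[5, 3], consistent=true"
            System.out.println(line.split(",")[0] + ": total=" + c[0]
                    + ", per-class=" + Arrays.toString(Arrays.copyOfRange(c, 1, c.length))
                    + ", consistent=" + totalMatchesClassCounts(line));
        }
    }
}
```

If the assumption holds, a run with five distinct labels should produce five counts after the total on every line.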

Answers

  • haddock
    Hi,

    Looks to me like there are only two classes ("hardware" and "graphics", as a wild and crazy guess ;)), but what do I know?

  • gfyang
    Hi, haddock

    You are right. ;D That was a silly mistake on my part. Sorry.
  • haddock
    Hola gfyang,

    As they say, the devil is in the details >:(
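As the thread concludes, the number of classes follows the number of distinct labels in the `texts` list: para2 to para5 all use "hardware", so the five directories collapse into two classes. A minimal plain-Java sketch (no RapidMiner classes; `TextLabels`, `distinctLabels` and `labelByDirectoryName` are hypothetical names) that counts the distinct labels in the original list and, as one possible fix, labels each directory by its last path segment:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch in plain Java: one class per DISTINCT label in the {label, directory} list.
final class TextLabels {

    /** Returns the distinct labels (first element of each pair) in first-seen order. */
    static Set<String> distinctLabels(List<String[]> texts) {
        Set<String> labels = new LinkedHashSet<>();
        for (String[] pair : texts) {
            labels.add(pair[0]);
        }
        return labels;
    }

    /**
     * Hypothetical helper: builds {label, directory} pairs, using the last path
     * segment of each directory as its label so every directory gets its own class.
     */
    static List<String[]> labelByDirectoryName(String... dirs) {
        List<String[]> texts = new ArrayList<>();
        for (String dir : dirs) {
            String label = dir.substring(dir.lastIndexOf('/') + 1);
            texts.add(new String[] {label, dir});
        }
        return texts;
    }

    public static void main(String[] args) {
        // The list from the original post: four directories share "hardware".
        List<String[]> original = new ArrayList<>();
        original.add(new String[] {"graphics", "c:/data/Reuters/acq"});
        original.add(new String[] {"hardware", "c:/data/Reuters/corn"});
        original.add(new String[] {"hardware", "c:/data/Reuters/crude"});
        original.add(new String[] {"hardware", "c:/data/Reuters/earn"});
        original.add(new String[] {"hardware", "c:/data/Reuters/grain"});
        System.out.println(distinctLabels(original)); // [graphics, hardware]

        List<String[]> fixed = labelByDirectoryName(
                "c:/data/Reuters/acq", "c:/data/Reuters/corn", "c:/data/Reuters/crude",
                "c:/data/Reuters/earn", "c:/data/Reuters/grain");
        System.out.println(distinctLabels(fixed)); // [acq, corn, crude, earn, grain]
    }
}
```

The `fixed` list has the same `{label, directory}` shape as `para` in the original code, so it could be passed to `textInput.setListParameter("texts", ...)` in its place. Using the directory name as the label is just one convention; any five distinct strings would give five classes.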