Bayes Learner and Text Classification - Index Out of Bounds

B_B_ Member Posts: 70 Guru
edited September 2019 in Help
I am trying to classify text with W-Naive Bayes Multinomial, but I keep getting an Array Index Out of Bounds error (263).

I use a sample of 200 text records from a database as a training set, save the model, then read 2000 text records to classify. I know there are additional terms in the full set that are not in the test set. Is this causing the index out of bounds error?

What will correct this?

Thanks

B.

Training code

<operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Root">
Using a simple Naive Bayes classifier.

<operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">

<operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="165">

<operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role" width="90" x="45" y="300">

<operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role (2)" width="90" x="45" y="390">

<operator activated="true" class="nominal_to_text" compatibility="5.0.10" expanded="true" height="76" name="Nominal to Text" width="90" x="246" y="345">

<operator activated="true" class="text:process_document_from_data" compatibility="5.0.6" expanded="true" height="76" name="Process Documents from Data" width="90" x="380" y="255">

<operator activated="true" class="text:transform_cases" compatibility="5.0.6" expanded="true" height="60" name="Transform Cases" width="90" x="179" y="165"/>
<operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="313" y="120"/>

<operator activated="true" class="weka:W-NaiveBayesMultinomialUpdateable" compatibility="5.0.1" expanded="true" height="76" name="W-NaiveBayesMultinomialUpdateable" width="90" x="447" y="120">

<operator activated="true" class="write_model" compatibility="5.0.10" expanded="true" height="60" name="Write Model" width="90" x="455" y="30">

<portSpacing port="sink_result 2" spacing="0"/>

Model applier

<operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">

<operator activated="true" class="read_model" compatibility="5.0.10" expanded="true" height="60" name="Read Model" width="90" x="376" y="34">

<operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="Retrieve" width="90" x="44" y="22">

<operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">

<operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role" width="90" x="45" y="255">

<operator activated="true" class="nominal_to_text" compatibility="5.0.10" expanded="true" height="76" name="Nominal to Text" width="90" x="179" y="75">

<operator activated="true" class="text:process_document_from_data" compatibility="5.0.6" expanded="true" height="76" name="Process Documents from Data" width="90" x="313" y="165">

<operator activated="true" class="text:transform_cases" compatibility="5.0.6" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="51"/>
<operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="313" y="75"/>

<operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">

<portSpacing port="sink_result 2" spacing="0"/>

Answers

  • fischerfischer Member Posts: 439 Maven
    Hi,

    could you paste the stack trace from the log?

    Best,
    Simon
  • B_B_ Member Posts: 70 Guru
    Simon

    The W-Naive Bayes operator did not show a stack trace, only the error message in the log below.
    W-NaiveBayesMultinomialUpdateable: Exception occured while classifying example:263 [class java.lang.ArrayIndexOutOfBoundsException]



    This is from using the Naive Bayes process to create a model.

    Exception: java.lang.ArrayIndexOutOfBoundsException
    Message: 262
    Stack trace:

    com.rapidminer.operator.learner.bayes.SimpleDistributionModel.performPrediction(SimpleDistributionModel.java:384)
    com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
    com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
    com.rapidminer.operator.Operator.execute(Operator.java:771)
    com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
    com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
    com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
    com.rapidminer.operator.Operator.execute(Operator.java:771)
    com.rapidminer.Process.run(Process.java:899)
    com.rapidminer.Process.run(Process.java:795)
    com.rapidminer.Process.run(Process.java:790)
    com.rapidminer.Process.run(Process.java:780)
    com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
  • fischerfischer Member Posts: 439 Maven
    Hi,

    I opened a bug report. You can help by attaching a process and data to it.

    http://bugs.rapid-i.com/show_bug.cgi?id=403

    Best,
    Simon
  • B_B_ Member Posts: 70 Guru
    Is there a known-good Bayes operator in an earlier version of RM that I can use until the bug is fixed?
  • fischerfischer Member Posts: 439 Maven
    Hi,

    The bug report has been closed because the classification process was invalid. The word list was missing, so the example set was incompatible with the model.

    Best,
    Simon
  • B_B_ Member Posts: 70 Guru
    What do you mean by the word list was missing?

    Can you provide a simple example of how to set up Bayes text classification? There isn't one in the samples.

    Thanks

    B.
  • B_B_ Member Posts: 70 Guru
    Addendum: I searched the forum but didn't find any examples of how to set up Bayes correctly.
  • fischerfischer Member Posts: 439 Maven
    Hi,

    The Naive Bayes operator can be used like any other learning algorithm. Try it first on an example without text.

    The problem is not the Naive Bayes operator. The problem is that the classification process does not use the same word list as the training process, so the example sets have different attributes during training and application. As set up, the process does not make sense and won't give correct results for any learner. To fix the setup, you must store the word list generated during training and feed it back into the classification step. The document processing operator has an input for that. Consider the word list a part of the model.

    Best,
    Simon
  • B_B_ Member Posts: 70 Guru
    "the classification process does not have the same word list as the training process. For that reason, the example sets have different attributes during training and application"

    How to use the word list input wasn't clear; I assumed the word attributes were stored in the model. Now it's working.

    Thanks, Simon
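
For readers who hit the same error, here is a minimal sketch of the point Simon makes about the word list, written in plain Python rather than as a RapidMiner process. It is not RapidMiner's implementation: the function names (`tokenize`, `build_word_list`, `vectorize`) and the sample documents are made up for illustration, and the tokenizer only roughly imitates Transform Cases followed by Tokenize. It shows that rebuilding the word list from the new data produces vectors with different columns, while reusing the stored training word list keeps them aligned with what the model was trained on.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split on non-letters (illustrative tokenizer)."""
    return [token for token in re.split(r"[^a-z]+", text.lower()) if token]


def build_word_list(documents):
    """Collect the sorted vocabulary of a set of documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def vectorize(text, word_list):
    """Count term occurrences against a FIXED word list.

    Words that are not in the word list are ignored, so the vector always
    has exactly len(word_list) entries in the same order.
    """
    counts = Counter(tokenize(text))
    return [counts[word] for word in word_list]


# Hypothetical sample data.
train_docs = ["Cheap pills now", "Meeting at noon"]
new_docs = ["Cheap meeting pills tomorrow"]

# Training: the word list is created here and must be stored with the model.
word_list = build_word_list(train_docs)
train_vectors = [vectorize(doc, word_list) for doc in train_docs]

# Wrong: a fresh word list built from the new data has different columns.
wrong_word_list = build_word_list(new_docs)
wrong_vector = vectorize(new_docs[0], wrong_word_list)

# Right: reuse the training word list; the unseen word "tomorrow" is dropped.
right_vector = vectorize(new_docs[0], word_list)

print(len(word_list), len(wrong_vector), len(right_vector))
```

In RapidMiner terms, this corresponds to saving the word list output of the training-side Process Documents from Data operator and connecting it to the word list input of the same operator in the model applier process.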