"Error - attribute text was already present in the example set"
I've used this basic train/apply model successfully for other text classification jobs, but now it is producing an error. Removing stemming and token filtering by size in the apply section of Process Documents still produces an error. Version is 5.0.11 and text module is 5.0.7.
Any ideas about what to change? The text is coming from the same table/field. Training text is a subset of the full set of documents in the table.
Exception: com.rapidminer.operator.UserError
Message: The attribute text was already present in the example set.
Stack trace:
com.rapidminer.operator.text.io.AbstractDocumentInputOperator.createWordAttributes(AbstractDocumentInputOperator.java:336)
com.rapidminer.operator.text.io.AbstractDocumentInputOperator.doWork(AbstractDocumentInputOperator.java:243)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Nov 22, 2010 10:19:39 AM SEVERE: Process failed: The attribute text was already present in the example set.
Nov 22, 2010 10:19:39 AM SEVERE: Here: Root[1] (Process)
subprocess 'Main Process'
+- Read Database[1] (Read Database)
+- Set Role[0] (Set Role)
+- Set Role (2)[1] (Set Role)
+- Replace[1] (Replace)
+- Nominal to Text[1] (Nominal to Text)
+- Process Documents from Data[1] (Process Documents from Data)
subprocess 'Vector Creation'
| +- Transform Cases[19098] (Transform Cases)
| +- Tokenize[19098] (Tokenize)
| +- Filter Stopwords (2)[19098] (Filter Stopwords (English))
| +- Stem (2)[19098] (Stem (Porter))
| +- Filter Tokens (by Length)[19098] (Filter Tokens (by Length))
| +- Extract Token Number[19098] (Extract Token Number)
+- SVM[1] (Support Vector Machine (LibSVM))
+- Read Database (2)[1] (Read Database)
+- Replace (2)[1] (Replace)
+- Set Role (3)[1] (Set Role)
+- Set Role (4)[0] (Set Role)
+- Nominal to Text (2)[1] (Nominal to Text)
+- Process Documents from Data (2)[1] (Process Documents from Data)
subprocess 'Vector Creation'
| +- Transform Cases (2)[1] (Transform Cases)
==> | +- Tokenize (2)[1] (Tokenize)
| +- Filter Stopwords (English)[0] (Filter Stopwords (English))
| +- Filter Tokens (2)[0] (Filter Tokens (by Length))
| +- Stem (Porter)[0] (Stem (Porter))
| +- Extract Token Number (2)[0] (Extract Token Number)
+- Apply Model[0] (Apply Model)
Nov 22, 2010 10:19:39 AM SEVERE: The attribute text was already present in the example set.
Using a simple Naive Bayes classifier.
<parameter key="replace_what" value="http.*\s|#\w+\s|@\w*\s"/>
Answers
The problem is, as the error says, that there is already an attribute "text" present in your example set. During application, the Process Documents operator makes sure the bag of words is built with exactly the same attribute names as in training, because the model would otherwise use the wrong attributes for classification.
If the application data set already contains an attribute that wasn't in the training data set, and this attribute's name is part of the word list, this error occurs. The best thing you can do is prevent this by either including exactly the same attributes in training as in testing (switch on the "keep text" parameter in training, too) or removing the additionally created attribute by switching off the "keep text" parameter in application.
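To make the name clash concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation; the function `add_word_attributes`, the row dictionaries, and the sample word list are all made up for illustration. It only mimics the idea that one column per word in the training word list is added to each row, and that this fails when a column with the same name already exists.

```python
def add_word_attributes(example, word_list, tokens):
    """Add one count column per word in word_list to a row (a dict).

    Hypothetical stand-in for the word-vector step: it refuses to
    overwrite a column that already exists under the same name.
    """
    for word in word_list:
        if word in example:
            raise ValueError(
                f"The attribute {word} was already present in the example set."
            )
    counts = {word: tokens.count(word) for word in word_list}
    return {**example, **counts}


# Assumed word list from training; note that "text" is one of the tokens.
word_list = ["messag", "text", "spam"]
tokens = ["send", "text", "messag"]

# Application with "keep text" on: the row already carries a "text" column.
row_with_text = {"id": 1, "text": "send a text message"}
try:
    add_word_attributes(row_with_text, word_list, tokens)
    collided = False
except ValueError as err:
    collided = True
    print(err)

# Application with "keep text" off: no clash, the vector is built.
row_without_text = {"id": 1}
vector = add_word_attributes(row_without_text, word_list, tokens)
print(vector)
```

The second call succeeds only because the row has no column named like one of the words, which is the situation that switching off "keep text" produces.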
Greetings,
Sebastian
I'm surprised this error hasn't shown up before since I use this basic structure for several classification tasks. Turning off keep text solved the problem.
This bug is still present. As you note, when you select "keep_text = true", Process Documents adds a new field called text. If you have tokenized your data and "text" is a token, the process will break. There does not seem to be an elegant workaround as of RM 7.4.
Hi,
There is one. Just don't keep the text, but join the text onto the example set later on.
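The idea of joining the text back afterwards can be sketched in plain Python as follows. This is only an illustration of the join, not a RapidMiner process: it assumes every row has an `id` column to join on, and the function `join_text_by_id` and the column name `original_text` are hypothetical. Storing the text under a different name than any token is what avoids the clash.

```python
def join_text_by_id(vectors, originals, text_column="original_text"):
    """Attach the raw text to each word-vector row by matching on "id".

    The text is stored under text_column (a hypothetical name) so it
    cannot collide with a word attribute called "text".
    """
    text_by_id = {row["id"]: row["text"] for row in originals}
    joined = []
    for vec in vectors:
        row = dict(vec)  # copy so the input rows stay unchanged
        row[text_column] = text_by_id[vec["id"]]
        joined.append(row)
    return joined


# Word vectors built without keeping the text; "text" here is a token count.
vectors = [
    {"id": 1, "messag": 1, "text": 1},
    {"id": 2, "messag": 0, "text": 0},
]
# The original documents, read separately.
originals = [
    {"id": 1, "text": "send a text message"},
    {"id": 2, "text": "hello there"},
]

joined = join_text_by_id(vectors, originals)
print(joined[0])
```

After the join, each row holds both the token count `text` and the raw document under `original_text`.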
Best,
Martin
Dortmund, Germany
I have this problem too, but I don't understand how to solve it, because it occurs when I'm using Auto Model. Can anyone explain in more detail what I need to do, please?