"Classifying with SVM through the java API"

I am trying to do a simple classification by integrating RapidMiner into Java. This is approximately the same as a Process I have defined in the GUI, which works great. This is how I try to do it in code
(I call train() once and then classify() for each text).
The problem is that all the texts always get the same classification, as if no learning had occurred or just some default were taken. These are texts that I classify properly in the GUI (they belong to 5 different classes, a polynominal problem) and with different classifiers (LingPipe and a homebrewed one).

public void train(List documents) {
RapidMiner.init(false, false, false, true);

wvtoolOperator = (OperatorChain) OperatorService


List list = new ArrayList();
for (Text text : documents) {
String filename = ...
String classname = ...
list.add(new Object[] { filename, classname });
}

wvtoolOperator.setListParameter("texts", list);

IOContainer container = wvtoolOperator.apply(new IOContainer());
ExampleSet exampleSet = container.get(ExampleSet.class);
Learner learner = (Learner)OperatorService.createOperator(LibSVMLearner.class);
//Maybe set parameters here?
model = learner.learn(exampleSet);
// Create the model applier
modelApplier = OperatorService.createOperator("ModelApplier");

//Create a new SingleTextInput, for processing test Strings
wvtoolOperator = (OperatorChain) OperatorService

// Add additional processing steps.
// Note: setup must be the same as the one you used when creating the classification model
}

public String classify(String text) {
try {
// Set the text
wvtoolOperator.setParameter("text", text);

// Call the text input operator
IOContainer container = wvtoolOperator.apply(new IOContainer());

container = container.append(model);
// Call the model applier (the model was added already before calling the text input)
container = modelApplier.apply(container);

// Obtain the example set from the io container. It contains only a single example with our text in it.
ExampleSet eset = container.get(ExampleSet.class);
Example e = eset.iterator().next();

// This does the same thing as the two lines below:
//return e.getValueAsString(eset.getAttributes().getPredictedLabel());

int predLabelIndex = (int) e.getPredictedLabel();
return e.getAttributes().getPredictedLabel().getMapping().mapIndex(predLabelIndex);
} catch (Exception ex) {

The result is the same whether or not I set parameters at the "//Maybe set parameters here?" comment.
When I do set them, it is done this way:

((Operator)learner).setParameter(LibSVMLearner.PARAMETER_SVM_TYPE, String.valueOf(LibSVMLearner.SVM_TYPE_C_SVC));
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_KERNEL_TYPE, "0");//linear
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_EPSILON, "0.001");
//((Operator)learner).setParameter(LibSVMLearner.PARAMETER_C, "0.0");
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_P, "0.1");
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_CONFIDENCE_FOR_MULTICLASS, "true");

I am probably overlooking something simple, but I'm completely out of ideas; I have looked around a lot and tried many approaches.

Thanks a lot,


    Okay, I have understood that I have to save and load the wordlist via the parameters. However, I feel like there should be some kind of object I could pass around between the filters instead of having to write it to a file and load it. Is this supported?
    Also, does that mean it will be loaded every time I apply() the SingleTextInput()?
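
One way to picture why the wordlist has to be the same at training and classification time: the learner only sees numeric columns, and the wordlist decides which word lands in which column. The sketch below is plain Java, not RapidMiner or WVTool code. The class `WordlistSketch` and its methods `buildWordlist` and `vectorize` are hypothetical names invented for illustration, and the simple non-word-character tokenizer is an assumption. It only shows the idea of building the word-to-column mapping once and reusing that same in-memory object for every later text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not RapidMiner or WVTool API. */
final class WordlistSketch {

    /** Assigns each distinct token in the training texts a fixed column index. */
    static Map<String, Integer> buildWordlist(List<String> trainingTexts) {
        Map<String, Integer> wordlist = new LinkedHashMap<>();
        for (String text : trainingTexts) {
            for (String token : tokenize(text)) {
                // The next free column is the current size of the map.
                wordlist.putIfAbsent(token, wordlist.size());
            }
        }
        return wordlist;
    }

    /** Counts term occurrences using the training-time wordlist; unknown tokens are dropped. */
    static double[] vectorize(String text, Map<String, Integer> wordlist) {
        double[] vector = new double[wordlist.size()];
        for (String token : tokenize(text)) {
            Integer column = wordlist.get(token);
            if (column != null) {
                vector[column] += 1.0;
            }
        }
        return vector;
    }

    /** Assumed tokenizer: lower-case, split on runs of non-word characters. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Built once, at training time.
        Map<String, Integer> wordlist = buildWordlist(Arrays.asList("good movie", "bad movie"));
        // Reused unchanged for each text to classify.
        System.out.println(Arrays.toString(vectorize("bad bad film", wordlist)));
    }
}
```

Whether RapidMiner exposes an equivalent in-memory object is not something this sketch answers; it only shows the contract that a saved and reloaded wordlist has to satisfy.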


    And another question while I'm at it... Using the code shown above, it takes me about 1.5 seconds to classify each text (around 200 words) after learning a model from a few hundred documents. In the GUI it is closer to your published performance of 25 ms per post: it takes 66 seconds to cross-validate the same 350 or so documents in 10 folds (I end up classifying around 700+ documents, so it's actually even much faster). I'm running the example in the text plugin samples, 04_Learning/01_TextClassificationXVal.xml .
    The slow step is ModelApplier.apply()... What could it be? Is my Java development environment inherently over 1,000 times slower, or is something done differently in the GUI environment for that sample?
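
To narrow down which step is slow, it may help to time each call in isolation. For reference, 66 seconds over roughly 700 classifications comes to under 100 ms each (training included), against about 1,500 ms per text in the code above. The helper below is a minimal sketch in plain Java; `StepTimer` and `averageMillis` are hypothetical names, and the workload in `main` is a stand-in, not a RapidMiner call.

```java
/** Illustrative only: a small wall-clock timer for one step of a pipeline. */
final class StepTimer {

    /**
     * Runs the step `warmup` times untimed (to let the JIT settle),
     * then returns the mean wall-clock milliseconds over `runs` timed calls.
     */
    static double averageMillis(Runnable step, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            step.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            step.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the real program this would wrap a single step,
        // for example only the text input operator or only the model applier.
        Runnable standIn = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        System.out.printf("average: %.3f ms per call%n", averageMillis(standIn, 5, 20));
    }
}
```

Timing the text input step and the model application step separately would show whether the cost sits in vectorizing the text or in applying the model.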

    Thank you,
    Okay I see now this is because of pruning, which seriously affects the performance of the SVM Model.

    I hope this will be useful to someone for posterity.
    But please answer my last question if you have the time (in the previous thread).
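
The effect of pruning mentioned above can be pictured as shrinking the wordlist before the learner ever sees it: fewer terms means fewer attributes per example, which is presumably what makes learning and model application cheaper. The sketch below assumes that pruning means dropping terms by document frequency (too rare or too common). `PruningSketch`, `prune`, and the thresholds are hypothetical and not taken from the text plugin.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: document-frequency pruning of a vocabulary. */
final class PruningSketch {

    /** Returns the terms that occur in at least minDocs and at most maxDocs documents. */
    static Set<String> prune(List<String> documents, int minDocs, int maxDocs) {
        Map<String, Integer> documentFrequency = new TreeMap<>();
        for (String document : documents) {
            // Count each term once per document, however often it repeats.
            Set<String> distinct = new HashSet<>();
            for (String token : document.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    distinct.add(token);
                }
            }
            for (String token : distinct) {
                documentFrequency.merge(token, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : documentFrequency.entrySet()) {
            if (entry.getValue() >= minDocs && entry.getValue() <= maxDocs) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "the bird flew");
        System.out.println("unpruned: " + prune(docs, 1, docs.size()));
        System.out.println("pruned:   " + prune(docs, 2, 2));
    }
}
```

With the toy documents in `main`, the pruned vocabulary is much smaller than the unpruned one, which is the kind of reduction that would change how many attributes the SVM model has to handle.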

