"Text Classification with SVM"

Jepse · March 2011

Hi,

My goal is to improve sentiment prediction by using SVM instead of k-NN.

To do that, I had to add a substep that checks each sentence for subjectivity (with a two-class model: subjectivity, nonsub). Only sentences classified as subjective would be relevant for my SVM-based approach. I'm also using an SVM to predict the subjectivity. But when I apply the model to unclassified sentences, the result contains just one prediction class.

This is my process to create the model. The process to apply the model is further down. Am I missing something?
[tt]

<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">

<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve (3)" width="90" x="45" y="120">

<operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="246" y="30">

<operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (Snowball)" width="90" x="447" y="30">

<operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="255">

<operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">

<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.001" expanded="true" height="76" name="SVM (5)" width="90" x="514" y="345">

<operator activated="true" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store (3)" width="90" x="715" y="300">

[/tt]

This is the process to apply the model:
[tt]

<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">

<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="JPW_getTestData" width="90" x="45" y="120">

<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Model SVM#" width="90" x="45" y="255">

<operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="447" y="30">

<operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (2)" width="90" x="404" y="30">

<operator activated="true" class="informationExtraction:tree_svm" compatibility="1.0.000" expanded="true" height="94" name="TreeSVM" width="90" x="305" y="396"/>
<operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="313" y="165">

<operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="447" y="345"/>
<operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="165">

<operator activated="true" class="write_database" compatibility="5.1.001" expanded="true" height="60" name="Write Database" width="90" x="648" y="165">
@sentiment_analysis"/>;

<connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>

[/tt]

haddock · March 2011

Hi there Jepse,

In my own work I've found that SVMs are very sensitive to their parameters: kernel type, C, and epsilon. Things aren't made any easier by the fact that those parameters are rather tricky to define (http://www.svms.org/parameters/), so your best bet is to try the combinations and compare the performance of each.
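
To make "try the combinations" concrete, here is a minimal sketch in plain Python of an exhaustive search over kernel type, C, and epsilon. It is not RapidMiner code. The `score` callback is hypothetical: you would supply a function that trains and validates an SVM with the given settings and returns a number where higher is better (for example cross-validated accuracy). The value ranges in `GRID` are illustrative assumptions, not recommendations for your data.

```python
from itertools import product


def grid_search(score, grid):
    """Try every parameter combination and return (best_score, best_params).

    `score` is a caller-supplied (hypothetical) function that trains and
    validates a model with the given keyword parameters and returns a
    number where higher is better.
    """
    names = list(grid)
    best_score, best_params = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        # Keep the first combination that reaches the highest score.
        if best_score is None or current > best_score:
            best_score, best_params = current, params
    return best_score, best_params


# Illustrative search space (an assumption): two kernels, with C and
# epsilon spaced as powers of two so that several orders of magnitude
# are covered with few values.
GRID = {
    "kernel": ["linear", "rbf"],
    "C": [2.0 ** k for k in range(-5, 16, 4)],
    "epsilon": [2.0 ** k for k in range(-10, 0, 3)],
}
```

Calling `grid_search(my_score, GRID)` evaluates all 48 combinations once each. A coarse grid like this is usually followed by a finer one around the best region.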

I only mention this because I see that your model is built in a single pass with C set to zero; without checking against the data it is not possible to be definitive, but I'm not that surprised that you get just one class in the prediction column. There is a handy paper on this subject at http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf which will point you in the right direction.
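
If you want to check this against your own data, a small sketch like the following (plain Python, outside RapidMiner) can help. It assumes you have exported the training labels and the prediction column as lists of strings. The function names are made up for illustration, and the inputs are assumed to be non-empty.

```python
from collections import Counter


def class_shares(labels):
    """Return {class: fraction of examples} for a non-empty list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def collapsed_to_one_class(predictions):
    """True if the model predicted the same class for every example."""
    return len(set(predictions)) == 1


def majority_baseline(train_labels):
    """Accuracy obtained by always guessing the most frequent training class."""
    counts = Counter(train_labels)
    return counts.most_common(1)[0][1] / len(train_labels)
```

If `collapsed_to_one_class` returns `True` and the model's accuracy is no better than `majority_baseline`, the SVM has most likely learned nothing beyond the class frequencies, which points back at the parameter settings or at a heavily unbalanced training set.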

Hope that helps, have fun!
