"Text Classification with SVM"
Hi,
My goal is to improve sentiment prediction by using an SVM instead of k-NN.
To do that, I added a preliminary step that checks for subjectivity (a two-class model: subjectivity, nonsub). Every sentence classified as subjective is then relevant for my SVM-based approach. I am also using an SVM to predict the subjectivity. But when I apply the model to unclassified sentences, the result contains only one prediction class.
Below is my process to create the model. The process to apply the model follows further down. Am I missing something?
[tt]
<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve (3)" width="90" x="45" y="120">
<operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="246" y="30">
<operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (Snowball)" width="90" x="447" y="30">
<operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="255">
<operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.001" expanded="true" height="76" name="SVM (5)" width="90" x="514" y="345">
<operator activated="true" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store (3)" width="90" x="715" y="300">
[/tt]
This is the Process to apply the model:
[tt]
<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="JPW_getTestData" width="90" x="45" y="120">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Model SVM#" width="90" x="45" y="255">
<operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="447" y="30">
<operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (2)" width="90" x="404" y="30">
<operator activated="true" class="informationExtraction:tree_svm" compatibility="1.0.000" expanded="true" height="94" name="TreeSVM" width="90" x="305" y="396"/>
<operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="313" y="165">
<operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="447" y="345"/>
<operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="165">
<operator activated="true" class="write_database" compatibility="5.1.001" expanded="true" height="60" name="Write Database" width="90" x="648" y="165">
@sentiment_analysis"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
[/tt]
Answers
In my own work I've found that SVMs are very sensitive to their parameters: kernel type, C and epsilon. Things aren't made any easier by the fact that those parameters are rather tricky to define (http://www.svms.org/parameters/), so your best bet is to try the combinations and check the performance of each.
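As a rough illustration of "try the combinations", here is a minimal Python sketch of an exhaustive grid search. It is not RapidMiner code: `grid_search`, `GRID` and `toy_evaluate` are names I made up, the candidate values (including the exponentially spaced C values) are assumptions, and `toy_evaluate` is only a stand-in for whatever actually trains the SVM and returns a cross-validated performance.

```python
import math
from itertools import product


def grid_search(evaluate, grid):
    """Score every parameter combination and return the best one.

    evaluate: callable taking a dict of parameters and returning a score
              (higher is better), e.g. cross-validated accuracy.
    grid:     dict mapping a parameter name to its list of candidate values.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))
    # On ties, max() keeps the first combination that reached the top score.
    best_score, best_params = max(results, key=lambda item: item[0])
    return best_score, best_params, results


# Hypothetical candidate values; C is spaced exponentially (2^-5 ... 2^15).
GRID = {
    "kernel": ["linear", "rbf"],
    "C": [2.0 ** e for e in range(-5, 16, 2)],
    "epsilon": [0.001, 0.01],
}


def toy_evaluate(params):
    """Placeholder score that peaks at kernel='rbf', C=8.

    Replace this with real training plus cross-validation.
    """
    base = 0.9 if params["kernel"] == "rbf" else 0.8
    return base - 0.01 * abs(math.log2(params["C"]) - 3)


best_score, best_params, results = grid_search(toy_evaluate, GRID)
print(len(results), "combinations tried; best:", best_params)
```

The point of keeping `evaluate` as a plain callable is that the search loop stays the same no matter which tool does the training. Within RapidMiner the equivalent idea would be a parameter optimization operator wrapped around a cross-validation.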
I mention the parameter sensitivity because I see that your model is built in one pass with C set to zero. Without checking against the data it is not possible to be definitive, but I'm not that surprised that you get just one class in the prediction column. There is a handy paper on this subject at http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf which will point you in the right direction.
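Before tuning, it can help to confirm how lopsided things really are. The following is a small Python sketch, under the assumption that you can export the training labels and the predicted labels as plain lists. The helper names and the example counts are invented for illustration; only the class names `subjectivity` and `nonsub` come from your post.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of examples per class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_single_class(predictions):
    """True if the model predicted at most one distinct class."""
    return len(set(predictions)) <= 1


def majority_baseline(train_labels):
    """Most frequent training class and the accuracy of always predicting it."""
    label, n = Counter(train_labels).most_common(1)[0]
    return label, n / len(train_labels)


# Made-up example data: an imbalanced training set and a collapsed prediction.
train_labels = ["subjectivity"] * 30 + ["nonsub"] * 70
predictions = ["nonsub"] * 10

print("training shares:", class_shares(train_labels))
print("single-class prediction:", is_single_class(predictions))
print("majority baseline:", majority_baseline(train_labels))
```

If the predictions collapse onto the majority class of the training data, any parameter setting you try should at least beat the majority baseline before you trust it.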
Hope so, have fun!