"Text Classification with SVM"

Jepse, Member, Posts: 11, Contributor II
edited May 2019 in Help
Hi,

My goal is to improve sentiment prediction by using an SVM instead of k-NN.

Therefore I had to add a substep that checks for subjectivity (with a two-class model: subjectivity, nonsub). Only sentences classified as subjective are relevant for my SVM-based approach. Of course, I'm also using an SVM to predict subjectivity. But when I apply the model to unclassified sentences, I get just one prediction class as the result.
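The two-stage setup described above can be sketched as a simple cascade. This is a minimal Python illustration, not a RapidMiner process: `is_subjective` and `sentiment` are hypothetical callables standing in for the two trained models, and the keyword rules in the usage lines are toy placeholders. The `Counter` check makes the reported symptom visible: if the first model returns one class for every sentence, the filter either passes everything or nothing.

```python
from collections import Counter
from typing import Callable, List, Tuple


def cascade(sentences: List[str],
            is_subjective: Callable[[str], bool],
            sentiment: Callable[[str], str]) -> List[Tuple[str, str]]:
    # Stage 1: keep only sentences the (hypothetical) subjectivity model flags.
    subjective = [s for s in sentences if is_subjective(s)]
    # Stage 2: run the (hypothetical) sentiment model on the survivors only.
    return [(s, sentiment(s)) for s in subjective]


def stage_one_distribution(sentences: List[str],
                           is_subjective: Callable[[str], bool]) -> Counter:
    # Count how often each stage-one class is predicted; a single key
    # means the subjectivity model has collapsed to one class.
    return Counter("subjectivity" if is_subjective(s) else "nonsub"
                   for s in sentences)


if __name__ == "__main__":
    # Toy stand-in models, for illustration only.
    def toy_subjective(s: str) -> bool:
        return "love" in s or "hate" in s

    def toy_sentiment(s: str) -> str:
        return "pos" if "love" in s else "neg"

    docs = ["I love it", "It ships in May", "I hate the menu"]
    print(cascade(docs, toy_subjective, toy_sentiment))
    print(stage_one_distribution(docs, toy_subjective))
```

Keeping the two models as plain callables means either stage can be swapped (k-NN, SVM, anything else) without touching the cascade logic.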

This is my process to create the model; the process to apply the model is below. Am I missing something?
[tt]
<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve (3)" width="90" x="45" y="120">
<operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="246" y="30">
<operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (Snowball)" width="90" x="447" y="30">
<operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="255">
<operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.001" expanded="true" height="76" name="SVM (5)" width="90" x="514" y="345">
<operator activated="true" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store (3)" width="90" x="715" y="300">
[/tt]


This is the process to apply the model:
[tt]
<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="JPW_getTestData" width="90" x="45" y="120">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Model SVM#" width="90" x="45" y="255">
<operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="447" y="30">
<operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (2)" width="90" x="404" y="30">
<operator activated="true" class="informationExtraction:tree_svm" compatibility="1.0.000" expanded="true" height="94" name="TreeSVM" width="90" x="305" y="396"/>
<operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="313" y="165">
<operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="447" y="345"/>
<operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="165">
<operator activated="true" class="write_database" compatibility="5.1.001" expanded="true" height="60" name="Write Database" width="90" x="648" y="165">
@sentiment_analysis"/>;
<connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
[/tt]

Answers

  • haddock, Member, Posts: 849, Maven
    Hi there Jepse,

    In my own work I've found that SVMs are very sensitive to their parameters: kernel type, C and epsilon. Things aren't made any easier by the fact that those parameters are rather tricky to define (http://www.svms.org/parameters/), so your best bet is to try the combinations and compare their performance.

    I only mention this because I see that your model is built in a single pass with C set to zero. Without checking against the data it is not possible to be definitive, but I'm not that surprised that you get just one class in the prediction column. There is a handy paper on this subject at http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf which will point you in the right direction.

    Hope this helps, have fun!
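The advice to try the parameter combinations and compare performance can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not RapidMiner or LibSVM: it uses a linear kernel only and swaps in a tiny soft-margin linear SVM trained by subgradient descent on 0.5·||w||² + C·Σ max(0, 1 - y_i(w·x_i + b)). The function names, the toy data, `epochs=200` and `lr=0.01` are illustrative choices. The C grid follows the exponentially spaced values (2^-5, 2^-3, ..., 2^15) recommended in the LIBSVM guide linked above, plus C = 0 to show the degenerate case: with C = 0 the hinge term vanishes, the objective reduces to 0.5·||w||², whose minimiser is w = 0, so the decision value is the same constant b for every input and only one class can ever be predicted.

```python
import random

# Hypothetical sketch, NOT LibSVM: a soft-margin linear SVM (no kernel)
# trained by full-batch subgradient descent on
#   0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)),
# with labels y_i in {-1, +1}.


def train_linear_svm(X, y, C, epochs=200, lr=0.01):
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for epoch in range(1, epochs + 1):
        # The gradient of the regulariser 0.5 * ||w||^2 is w itself.
        grad_w = list(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        step = lr / epoch ** 0.5  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(model, X):
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]


def cross_val_accuracy(X, y, C, k=5, seed=0):
    # Shuffle the indices once, then deal them into k folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_linear_svm([X[i] for i in train], [y[i] for i in train], C)
        preds = predict(model, [X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def grid_search(X, y, C_grid, k=5):
    # Score every C by k-fold cross-validation and keep the best one.
    scores = {C: cross_val_accuracy(X, y, C, k) for C in C_grid}
    best_C = max(scores, key=scores.get)
    return best_C, scores


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters.
    X = [[2.0, 1.5], [1.5, 2.5], [2.5, 2.0], [3.0, 1.0], [1.0, 3.0],
         [-2.0, -1.5], [-1.5, -2.5], [-2.5, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
    y = [1] * 5 + [-1] * 5
    # C = 0 plus the exponentially spaced grid 2^-5, 2^-3, ..., 2^15.
    C_grid = [0.0] + [2.0 ** e for e in range(-5, 16, 2)]
    best_C, scores = grid_search(X, y, C_grid)
    for C, acc in scores.items():
        print(f"C={C:g}: CV accuracy {acc:.2f}")
    print("best C:", best_C)
    print("classes predicted with C=0:",
          set(predict(train_linear_svm(X, y, 0.0), X)))
```

In RapidMiner 5 the same idea should correspond to wrapping the SVM (LibSVM) operator in an X-Validation inside Optimize Parameters (Grid), which can also search gamma for an RBF kernel; treat that mapping as a pointer rather than a tested process.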