Which are the most important parameters to tune for k-NN, NB, RF, DL, SVM for text classification?

jochen_hartmann Member Posts: 5 Contributor I
edited July 2019 in Help

Dear community,

I would like to compare the performance of the following five algorithms on different text classification tasks*:

  1. k-Nearest Neighbors (k-NN)
  2. Naive Bayes (NB)
  3. Random Forest (RF)
  4. Deep Learning (DL)
  5. Support Vector Machines (SVM)

Question 1: Which parameters are the most important to optimize for each of methods 1-5?

Question 2: What ranges should I give those parameters in the parameter optimization operator in order to avoid "boiling the ocean"?

Thanks in advance!

* Each task has between 3 and 5 classes, and the text length varies between 3 and 70 words per document/example.

Best Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    Great question!

    1. For k-NN, I would optimize "k".
    2. For Naive Bayes, I usually don't optimize anything.
    3. For Random Forest, I would optimize the depth of the trees, the number of trees, and the confidence.
    4. For Deep Learning, I'm not sure, but I would try a few of the activation functions.
    5. For text, I would use a LinearSVM and optimize C.
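
As a rough illustration of how the suggestions above translate into a bounded grid search, here is a minimal Python sketch that enumerates one small search space per learner and counts the resulting combinations. Every value range below is an assumption chosen for illustration; the thread does not give ranges for these parameters. The dictionary keys are hypothetical labels, not verified RapidMiner parameter names. Naive Bayes is left out because the answer suggests not tuning it.

```python
from itertools import product

# Illustrative search spaces. All values are assumptions, not
# recommendations from the thread; keys are hypothetical labels.
search_spaces = {
    "k-NN": {"k": [1, 3, 5, 9, 15, 25]},
    "Random Forest": {
        "maximal_depth": [5, 10, 20],
        "number_of_trees": [50, 100, 200],
        "confidence": [0.1, 0.25, 0.5],
    },
    "Deep Learning": {"activation": ["Tanh", "Rectifier", "Maxout"]},
    "Linear SVM": {"C": [0.01, 0.1, 1.0, 10.0, 100.0]},
}


def grid(space):
    """Return every parameter combination of one search space as a list of dicts."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


# Number of models to train per learner (per validation fold).
grid_sizes = {name: len(grid(space)) for name, space in search_spaces.items()}
total_runs = sum(grid_sizes.values())

for name, size in grid_sizes.items():
    print(f"{name}: {size} combinations")
print(f"total model fits per validation fold: {total_runs}")
```

Counting combinations before running anything is a cheap way to check whether a grid is still tractable: each extra parameter multiplies the size of that learner's grid, so a few values per parameter is usually a sensible starting point.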
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Excellent suggestions from @Thomas_Ott, as usual. I would add a couple more:

    • There isn't actually anything to optimize with Naive Bayes. There is only one parameter (Laplace correction), and I would definitely leave it on.
    • For Random Forest, I would also optimize the growing criterion (information gain, gain ratio, Gini, accuracy).
    • For SVM, you might also try a polynomial kernel and optimize C as well as degree in the range of 1-4.
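
The polynomial-kernel suggestion above can be sketched as a two-parameter grid. The degree range 1-4 comes from the answer; the log-spaced values for C are an assumption (the thread gives no range for C), and the criterion strings are hypothetical labels for the four growing criteria named above.

```python
from itertools import product


def log_spaced(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


c_values = log_spaced(-2, 2)  # assumed range for C: 0.01 ... 100
degrees = [1, 2, 3, 4]        # degree range suggested in the answer

# Every (C, degree) pair for a polynomial-kernel SVM.
svm_grid = [{"C": c, "degree": d} for c, d in product(c_values, degrees)]

# The four Random Forest growing criteria mentioned in the answer
# (label strings are hypothetical).
criteria = ["information_gain", "gain_ratio", "gini_index", "accuracy"]

print(f"polynomial SVM: {len(svm_grid)} combinations")
print(f"adding the criterion multiplies a Random Forest grid by {len(criteria)}")
```

Searching C on a logarithmic scale keeps the grid small while still covering several orders of magnitude, which is one way to avoid "boiling the ocean".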
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts