Text Mining Use Cases and Capabilities with RapidMiner

yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 363; RM Data Scientist)
edited December 2018 in Knowledge Base

Attached is the slide deck that summarizes the major functions and techniques offered by the Text Processing, Web Mining, and Operator Toolbox extensions.

Comments

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,635; Unicorn)

    @yyhuang looks like some great new applications here. The presentation contains screenshots that appear to be of annotated processes. Are there any new templates or examples available for download that correspond to some of the newer capabilities described in the deck?

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,761; Unicorn)

    Thanks @yyhuang, this has inspired me to upgrade my Twitter Content models.

  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 363; RM Data Scientist)

    Thanks for your interest! Of course, I will share some template processes as supplemental files to the slides.

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,635; Unicorn)

    @yyhuang any ETA on those template processes? :-)

    Thanks!

    Brian T.
  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 363; RM Data Scientist)

    <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (WordNet)" to_port="document"/>

    LSA is quite simple: you just use SVD to perform dimensionality reduction on the tf-idf vectors. That's really all there is to it!



    LSA or LSI
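The note above (tf-idf vectors reduced with SVD) can be sketched outside RapidMiner in a few lines of Python. This is only an illustration, not the attached process: the helper names `tfidf_matrix` and `lsa`, the whitespace tokenizer, the smoothed idf formula, and the sample documents are all assumptions made for the example.

```python
import numpy as np

def tfidf_matrix(docs):
    """Build a (documents x terms) tf-idf matrix from whitespace-tokenized text."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(tokenized):
        for t in doc:
            tf[i, index[t]] += 1
    df = (tf > 0).sum(axis=0)                      # document frequency per term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1   # smoothed idf (an assumption)
    return tf * idf, vocab

def lsa(matrix, k):
    """Project each document onto the top-k singular directions (truncated SVD)."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Hypothetical sample documents, for illustration only.
docs = [
    "markets rally as bank shares rise",
    "bank shares fall as markets slide",
    "new film wins award at festival",
    "festival award goes to new film",
]
X, vocab = tfidf_matrix(docs)
Z = lsa(X, k=2)   # each row is a document in a 2-dimensional latent space
```

Keeping all singular values reproduces the original document similarities exactly; lowering `k` trades that fidelity for a smaller, denser representation.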

  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 363; RM Data Scientist)

    Generate n-grams with a maximum length of 5.
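As a rough illustration of what generating term n-grams up to a maximum length means, here is a small Python sketch. The function name `ngrams_up_to` is hypothetical, and joining tokens with an underscore is an assumption about the output format rather than something stated in the post.

```python
def ngrams_up_to(tokens, max_length=5, sep="_"):
    """Return every run of 1..max_length consecutive tokens, joined by sep."""
    out = []
    for n in range(1, max_length + 1):
        for i in range(len(tokens) - n + 1):
            out.append(sep.join(tokens[i:i + n]))
    return out

# Hypothetical token sequence, for illustration only.
tokens = ["text", "mining", "with", "rapidminer", "is", "fun"]
terms = ngrams_up_to(tokens, max_length=5)
```

Note that the number of terms grows quickly with the maximum length, so a later pruning step is usually worth considering.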

    LDA

    The token filter uses a custom stopword list such as:

    and
    the
    you
    for
    in
    on
    from
    of
    am
    is
    was
    are
    be
    i
    that
    with
    very
    really
    can
    has
    will
    this
    they
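The custom stopword list above can be applied to a token stream roughly as follows. This is a minimal sketch: the function name `filter_tokens` is hypothetical, and matching case-insensitively is an assumption.

```python
# Stopwords copied from the list in the post.
STOPWORDS = {
    "and", "the", "you", "for", "in", "on", "from", "of", "am", "is", "was",
    "are", "be", "i", "that", "with", "very", "really", "can", "has", "will",
    "this", "they",
}

def filter_tokens(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]

# Hypothetical sentence, for illustration only.
kept = filter_tokens("The service was really good and fast".split())
```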
  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 363; RM Data Scientist)

    <operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="82" name="Get News Feeds" width="90" x="45" y="187">


    http://feeds.bbci.co.uk/news/rss.xml
    http://feeds.bbci.co.uk/news/world/asia/rss.xml
    http://feeds.bbci.co.uk/news/business/rss.xml
    http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml
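For readers without the process file, the idea behind the "Get News Feeds" subprocess (turning RSS items into rows with a title and a link) might look like this in Python. The helper `parse_rss` and the sample feed text are assumptions for illustration; the sketch parses an inline string so it runs offline, and fetching the real feeds is left to the reader.

```python
import xml.etree.ElementTree as ET

# Feed URLs taken from the post; they are not fetched in this sketch.
FEEDS = [
    "http://feeds.bbci.co.uk/news/rss.xml",
    "http://feeds.bbci.co.uk/news/world/asia/rss.xml",
    "http://feeds.bbci.co.uk/news/business/rss.xml",
    "http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml",
]

# Hypothetical feed content, for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Summary one</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Return one dict per <item>, with empty strings for missing fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return rows

rows = parse_rss(SAMPLE_RSS)
```

To work with live data, the text returned by `urllib.request.urlopen(url).read()` for each entry in `FEEDS` could be passed to `parse_rss`.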

    Don't convert article link to document text.

    We use a common process for both the training and testing sets.
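One reason to share a single text-preparation process between training and testing data is that both sets then end up in the same feature space. The Python sketch below illustrates that idea only; the function names, the simple tokenizer, and the sample headlines are assumptions and do not reproduce the attached process.

```python
def preprocess(text):
    """Shared preparation step, applied identically to training and test text."""
    return [t for t in text.lower().split() if t.isalpha()]

def fit_vocabulary(train_docs):
    """Learn the term-to-column mapping from the training documents only."""
    vocab = sorted({t for d in train_docs for t in preprocess(d)})
    return {t: i for i, t in enumerate(vocab)}

def vectorize(doc, vocab):
    """Count terms using the training vocabulary; unseen terms are ignored."""
    vec = [0] * len(vocab)
    for t in preprocess(doc):
        if t in vocab:
            vec[vocab[t]] += 1
    return vec

# Hypothetical headlines, for illustration only.
train = ["Stocks rise today", "Film wins award"]
vocab = fit_vocabulary(train)
test_vector = vectorize("Stocks fall today", vocab)
```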



    <operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="103" name="post-processing" width="90" x="1184" y="85">

    <operator activated="true" class="generate_attributes" compatibility="8.1.003" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="45" y="34">

    <operator activated="true" class="generate_attributes" compatibility="8.1.003" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="246" y="34">

    <operator activated="true" class="generate_attributes" compatibility="8.1.003" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="514" y="34">

    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>

    <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Rename" to_port="example set input"/>

    Extract company names and find their co-occurrences.
    Find similar documents that mention the same entity name.

    Entity recognition for the company names mentioned in news titles, with a list of targets such as:

    19_entertainment
    20th_century_fox
    23andme
    27b/6
    37signals
    3com
    3m
    7-eleven
    a&m_records
    a&w_root_beer
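A dictionary-based version of this step, matching listed company names in titles and counting which companies appear together, might be sketched as follows. The function names, the small subset of targets, the underscore-to-space normalization, and the sample titles are all assumptions for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# A few entries from the target list in the post.
TARGETS = ["20th_century_fox", "3m", "7-eleven", "23andme", "3com"]

def find_companies(title, targets=TARGETS):
    """Return the targets mentioned in a title, matched as whole phrases."""
    text = title.lower()
    found = []
    for target in targets:
        phrase = target.replace("_", " ")  # assumed: underscores stand for spaces
        pattern = r"(?<![a-z0-9])" + re.escape(phrase) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.append(target)
    return found

def cooccurrence(titles):
    """Count how often each pair of companies is mentioned in the same title."""
    counts = Counter()
    for title in titles:
        for pair in combinations(sorted(find_companies(title)), 2):
            counts[pair] += 1
    return counts

# Hypothetical titles, for illustration only.
titles = [
    "3M and 7-Eleven sign deal",
    "7-Eleven partners with 3M",
    "23andMe raises funds",
]
pairs = cooccurrence(titles)
```

The lookaround guards keep short names such as `3m` from matching inside longer tokens, which matters for a dictionary with many short entries.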

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,635; Unicorn)

    Wonderful, thanks so much for those!!

    Brian T.