Extract drug names

t_klok (Member; Posts: 3; Contributor I)
edited December 2018 in Help

I am a medical doctor doing research.

I have an Excel sheet with free text that contains drug names.

I want to extract these drug names and count how many drugs are mentioned in each field (Excel cell).

Any suggestions?
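The counting step itself can be sketched in a few lines of Python, outside RapidMiner. This is a minimal sketch, not the process discussed in this thread. It assumes the drug names are already available as a plain list and that a "hit" is a case-insensitive whole-word match. `build_matcher` and `drugs_in_cell` are hypothetical helper names. Each distinct drug is counted once per cell, which is one possible reading of "how many drugs are mentioned".

```python
import re
from typing import Iterable, List


def build_matcher(drug_names: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive pattern that matches any listed drug name as a whole word."""
    # Longest names first, so "insulin glargine" is preferred over its prefix "insulin".
    names = sorted({n.strip().lower() for n in drug_names if n.strip()}, key=len, reverse=True)
    if not names:
        raise ValueError("the drug list is empty")
    alternation = "|".join(re.escape(n) for n in names)
    # (?<!\w) and (?!\w) act as word boundaries that also work for names ending in punctuation.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def drugs_in_cell(text: str, matcher: "re.Pattern[str]") -> List[str]:
    """Return the distinct drug names found in one free-text cell, in order of first mention."""
    found: List[str] = []
    for match in matcher.finditer(text or ""):
        name = match.group(0).lower()
        if name not in found:
            found.append(name)
    return found


# Illustrative data only.
reference = ["metformin", "insulin", "insulin glargine", "paracetamol"]
cells = [
    "Metformin 500 mg bid, paracetamol prn",
    "Started insulin glargine; metformin stopped. Metformin restarted later.",
    "No medication",
]
matcher = build_matcher(reference)
for cell in cells:
    found = drugs_in_cell(cell, matcher)
    print(len(found), found)
# 2 ['metformin', 'paracetamol']
# 2 ['insulin glargine', 'metformin']
# 0 []
```

Spelling variants, abbreviations and brand names are only found if they appear in the list.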


Answers

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,404; RM Data Scientist)

    Hi @t_klok,

    This is one of those problems where I started with "hey, that's easy" and it turned out to be a 15-operator process. Maybe there is another way to do this? @sgenzer might find one :). Anyway, my solution is attached.

    You might want to link up with @SvenVanPoucke. He is a physician and our medical expert in the community.

    Best,

    Martin

    <description align="center" color="transparent" colored="false" width="126">Dummy data for drug texts. You can replace this with Read Excel
    <description align="center" color="transparent" colored="false" width="126">Att needs to be text to work with Process Documents
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34"/>
    <parameter key="attribute" value="drugname"/>
    <description align="center" color="transparent" colored="false" width="126">Only use specified drug names
    <description align="center" color="transparent" colored="false" width="126">Generate bag of words
    <description align="center" color="transparent" colored="false" width="126">Dummy data for drug names. You can replace this with Read Excel
    <description align="center" color="transparent" colored="false" width="126">id will become header in transpose
    <description align="center" color="transparent" colored="false" width="126">Only let attributes through which were present in the lower exa

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • t_klok (Member; Posts: 3; Contributor I)

    Hi Martin,

    Rapid(Miner) answers!

    Thanks, I think I understand.

    But I would like to filter out the drug names using a list that contains them. I do not want to enter all the reference drug names by hand.

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,404; RM Data Scientist)

    Hi,

    Sure, you can just read in the Excel file instead of generating the names by hand. That was just to generate some dummy data.

    Best,

    Martin

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)

    Hi @t_klok, I'd want to see the data before really weighing in, but from what you describe I would use the Text Processing extension: Tokenize, and then Filter Tokens (Dictionary) with the drug names. It's very similar to what @mschmitz built with his XML.


    Scott

  • DocMusher (Member; Posts: 333; Unicorn)

    Hi, each country provides a list of official drug names. Additionally, SNOMED can help you find drug names in a text.

    [Screenshot attached: "Schermafbeelding 2018-03-08 om 17.01.01.png"]


  • DocMusher (Member; Posts: 333; Unicorn)

    Hi,

    Please take a look at the technology Microsoft is testing: https://www.youtube.com/watch?v=c6exHAzNwy4#action=share

    Cheers, Sven

  • t_klok (Member; Posts: 3; Contributor I)

    Thank you all.

    I have a (large) list of drug names, and I want to check whether the free-text fields in an Excel file contain any of these names.

    So I query an Excel file with free-text cells, and the reference is a file with all the drug names.

    I do not want to enter all these drug names one by one in RapidMiner.
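Reading the reference list from a file instead of typing it in, as asked above, might look like this in Python. This sketch assumes, hypothetically, that the Excel sheet has been saved as a CSV file with a header row, that the free text sits in one named column, and that the reference file holds one drug name per line in its first column. `load_drug_names`, `count_drugs` and every file and column name are illustrative only. The sketch writes a copy of the sheet with two added columns, drugs_found and drug_count.

```python
import csv
import re
from typing import List


def load_drug_names(path: str) -> List[str]:
    """Read the reference list: one drug name per row, taken from the first column."""
    # utf-8-sig also strips the byte-order mark Excel writes for "CSV UTF-8" files.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def count_drugs(sheet_csv: str, text_column: str, drug_names: List[str],
                out_csv: str, delimiter: str = ",") -> None:
    """Copy the exported sheet to out_csv with two extra columns: drugs_found and drug_count."""
    names = sorted({n.strip().lower() for n in drug_names if n.strip()}, key=len, reverse=True)
    if not names:
        raise ValueError("the reference list is empty")
    # Longest names first; (?<!\w)/(?!\w) keep "insulin" from matching inside "insulinoma".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(map(re.escape, names)) + r")(?!\w)", re.IGNORECASE
    )
    with open(sheet_csv, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src, delimiter=delimiter)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or []) + ["drugs_found", "drug_count"]
    for row in rows:
        hits = (m.group(0).lower() for m in pattern.finditer(row.get(text_column) or ""))
        found = list(dict.fromkeys(hits))  # distinct names, in order of first mention
        row["drugs_found"] = "; ".join(found)
        row["drug_count"] = len(found)
    with open(out_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
```

For example: count_drugs("notes.csv", "free_text", load_drug_names("drug_names.csv"), "notes_with_counts.csv"). Pass delimiter=";" if your Excel locale saves CSV files with semicolons. If pandas and openpyxl are installed, pandas.read_excel can read the .xlsx file directly instead of going through CSV.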
