Need help analyzing medical keywords in a column of free text. Excel does not work here.

arsalan_karim · Member · Posts: 14 · Contributor II
edited December 2018 in Help

Hi everyone,

I need some serious help. I have been working on an Excel file with just one column. It contains data coming from physicians' offices: a string of free text that the doctor writes down when examining a patient. This column holds the diagnosis information, and I need to create a model that gives this data some structure.

I am specifically trying to pick out all conditions related to migraine. In Microsoft Excel I am doing this with IF/IFERROR/SEARCH formulas that sniff out the keywords in each row. I need two kinds of keywords:

Includes: all keywords that can indicate a migraine (e.g. "Migraines").

Excludes: all keywords that, if present, mean the entry can never be a migraine.

Sometimes I have to combine includes and excludes to identify an actual migraine, for example:

Includes = Migraine

Excludes = family History of

In this case I am looking for a patient with migraine, not someone with a family history of migraine, so I need to exclude the text "family History of". In other words: "this string should include this keyword and exclude that keyword".
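The include/exclude rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a RapidMiner process: matching is a case-insensitive substring test (similar in spirit to Excel's SEARCH), and `matches_rule`, `RULES` and the sample notes are hypothetical. Keeping one rule entry per disease is one way to extend the same logic to other conditions.

```python
def matches_rule(text, includes, excludes):
    """Return True if the note contains at least one include keyword
    and none of the exclude keywords (case-insensitive substring test)."""
    lowered = text.lower()
    has_include = any(kw.lower() in lowered for kw in includes)
    has_exclude = any(kw.lower() in lowered for kw in excludes)
    return has_include and not has_exclude


# Hypothetical rule set: one entry per disease, so more can be added later.
RULES = {
    "migraine": {"includes": ["migraine"], "excludes": ["family history of"]},
}

# Illustrative notes, not taken from the attached file.
notes = [
    "Migraines since Grade 7.",
    "Family history of migraine, patient asymptomatic.",
    "severe headache 2010 - spinal tap negative",
]

rule = RULES["migraine"]
for note in notes:
    print(note, "->", matches_rule(note, rule["includes"], rule["excludes"]))
```

Note that an exclusion checked against the whole note also rejects text such as "migraine; family history of diabetes", so in practice exclusions may need to be tied more closely to the include keyword.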

I think this should be fairly simple in RapidMiner. In Excel it is taking me hours and hours of formulas and driving me crazy, since I have about half a million rows to analyse and far too many formulas. The objective is to create a model that I can scale up to other diseases as well.

Can anyone help?

I am attaching a zip file containing the Excel file with some sample data, as well as some examples of the includes and excludes I am using.

Thanks

Arsalan (MD)

Answers

  • FBT · Member · Posts: 106 · Unicorn

    You could try the "Replace" operator, which lets you replace values using regular-expression logic. Afterwards it's just a matter of filtering the examples. Be careful with typos, though: you may, for example, want to look for just "migr" instead of "Migraine" to reduce the chance of missing something.
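As an illustration of the regular-expression idea (plain Python, not the Replace operator itself), the sketch below rewrites every word that starts with "migr" to one canonical token and then keeps only rows containing that token. The pattern, the function name `normalize_migraine` and the sample notes are assumptions for this example.

```python
import re

# Assumption: any word starting with "migr" (any case) is a migraine mention,
# which also catches misspellings such as "Migranes" or abbreviations like "migr."
MIGR_PATTERN = re.compile(r"\bmigr\w*", re.IGNORECASE)


def normalize_migraine(text):
    """Replace every word beginning with 'migr' by the token 'Migraine'."""
    return MIGR_PATTERN.sub("Migraine", text)


notes = ["Migranes since Grade 7.", "c/o migr. headaches", "Rheumatic fever"]
cleaned = [normalize_migraine(n) for n in notes]

# After normalizing, filtering only needs to look for the one canonical token.
migraine_rows = [n for n in cleaned if "Migraine" in n]
print(migraine_rows)
```

A short prefix also matches unrelated words such as "migration", so the pattern may need tightening once the real data is inspected.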

  • arsalan_karim · Member · Posts: 14 · Contributor II

    Hi FBT

    Thanks for replying, but I can't figure out how to use the Replace operator.

    I tried it, but it doesn't seem to be changing anything in my original text.

  • FBT · Member · Posts: 106 · Unicorn

    OK, there is actually a simpler way. You can just use the "Filter Examples" operator, select "contains" and enter your keyword to filter for what you are looking for. Take a look at the process below (just copy and paste it into your XML tab and press the green checkmark at the top left).

    <parameter key="filters_entry_key" value="Name_Clean.contains.Hemmeroids"/>
    <parameter key="filters_entry_key" value="Name_Clean.contains.Migr"/>

    The "No Migraine" filter is just for illustration purposes; you would need to enter whatever your relevant keyword is. If you have several keywords, remember to select "match any" in the operator's filter panel.
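To show what "match any" changes when several "contains" conditions are set, here is a small Python sketch. It only imitates the idea; the function name `filter_examples`, the case-insensitive comparison and the sample rows are assumptions, and RapidMiner's own matching rules may differ.

```python
def filter_examples(rows, keywords, match_any=True):
    """Keep rows whose text contains the keywords.

    match_any=True keeps a row if ANY keyword occurs in it;
    match_any=False keeps it only if ALL keywords occur.
    The comparison is case-insensitive in this sketch.
    """
    combine = any if match_any else all
    return [
        row for row in rows
        if combine(kw.lower() in row.lower() for kw in keywords)
    ]


# Illustrative rows, partly based on the examples in this thread.
rows = [
    "Migraines since Grade 7.",
    "?episodic paroxysmal hemicrania",
    "severe headache 2010 - spinal tap negative",
    "Rheumatic fever as a child",
]

print(filter_examples(rows, ["Migr", "hemicrania"]))                   # match any
print(filter_examples(rows, ["Migr", "hemicrania"], match_any=False))  # match all
```

With "match any" the first two rows are kept; requiring all keywords keeps nothing, because no single note mentions both.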

  • arsalan_karim · Member · Posts: 14 · Contributor II

    Super, I have got through stage 1 with your help. Please check the attached file.

    Now for stage 2: I need to create a separate attribute for each of these keywords. I need a table like the one below.

    Name_Clean                                 | Type of keyword
    -------------------------------------------+----------------
    Migraines since Grade 7.                   | Migraine
    ?episodic paroxysmal hemicrania            | Hemicrania
    severe headache 2010 - spinal tap negative | headache

    That way I could easily generate aggregates and do my counts. I need help creating this table.

    Attachment: step1.png (163.6K)
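A minimal Python sketch of the requested table plus the counts, assuming each note is tagged with the label of the first keyword stem it contains, checked in the listed order. The stems ("migr", "hemicrania", "headac") and the helper `keyword_type` are illustrative, not a validated clinical vocabulary.

```python
from collections import Counter

# Hypothetical keyword stem -> label map, checked in this order (first hit wins).
KEYWORD_TYPES = [
    ("migr", "Migraine"),
    ("hemicrania", "Hemicrania"),
    ("headac", "headache"),
]


def keyword_type(text):
    """Label of the first stem found in the note, or None if none matches."""
    lowered = text.lower()
    for stem, label in KEYWORD_TYPES:
        if stem in lowered:
            return label
    return None


notes = [
    "Migraines since Grade 7.",
    "?episodic paroxysmal hemicrania",
    "severe headache 2010 - spinal tap negative",
]

# Two-column table: (Name_Clean, Type of keyword)
table = [(note, keyword_type(note)) for note in notes]
for name_clean, kw_type in table:
    print(f"{name_clean:<45} {kw_type}")

# Aggregate: how many notes fall under each keyword type.
counts = Counter(kw_type for _, kw_type in table)
print(counts)
```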
  • FBT · Member · Posts: 106 · Unicorn

    See if the process below does what you are looking for:

    <parameter key="filters_entry_key" value="Name_Clean.contains.Rheumatic"/>
    <parameter key="filters_entry_key" value="Name_Clean.contains.Migr"/>
    <parameter key="filters_entry_key" value="Name_Clean.contains.headac"/>

    It's based on your sample file again, which does not contain an example of Hemicrania, so I replaced it with Rheumatic Fever for demo purposes. A word of caution: migraine and headache show up simultaneously in one example. This may cause errors in your further analysis, so you may want to set up a more elaborate filter to make sure the right keywords are assigned to such examples.
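One way to expose this overlap is to collect every matching label rather than stopping at the first, and then pick a primary label by priority. A sketch under assumptions: the stems, their ordering (more specific conditions first) and the sample note are hypothetical.

```python
# Hypothetical keyword stems, ordered from most to least specific.
PRIORITY = [
    ("migr", "Migraine"),
    ("rheumatic", "Rheumatic Fever"),
    ("headac", "headache"),
]


def all_matches(text):
    """Every label whose stem occurs in the note, in priority order."""
    lowered = text.lower()
    return [label for stem, label in PRIORITY if stem in lowered]


def primary_label(text):
    """The highest-priority label, or None if nothing matches."""
    matches = all_matches(text)
    return matches[0] if matches else None


note = "Migraine with severe headache for 3 days"
print(all_matches(note))    # ['Migraine', 'headache']
print(primary_label(note))  # Migraine
```

Keeping the full list of matches lets notes with several conditions be counted or reviewed separately instead of being silently assigned to one label.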

  • arsalan_karim · Member · Posts: 14 · Contributor II

    Oh yeah, this is much better: we break the data into smaller filters, tag each one with its keyword, and join them back into one set at the end.
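The split, tag and join approach can also be written out in plain Python to see its side effect. This is an illustration of the idea, not the actual RapidMiner process; the stems and sample notes are assumptions.

```python
def split_tag_append(rows, keywords):
    """Filter once per keyword, tag the matches, then concatenate the subsets.

    A note that matches several keywords lands in several subsets, so it
    appears more than once in the combined result.
    """
    combined = []
    for stem, label in keywords:
        subset = [(row, label) for row in rows if stem in row.lower()]
        combined.extend(subset)
    return combined


# Hypothetical stems and sample notes for illustration.
keywords = [("migr", "Migraine"), ("rheumatic", "Rheumatic Fever"), ("headac", "headache")]
rows = [
    "Migraines since Grade 7.",
    "Migraine with severe headache",
    "Rheumatic fever as a child",
]

for row, label in split_tag_append(rows, keywords):
    print(label, "|", row)
```

Deduplicating on the original note, or applying a priority order before joining, avoids double-counting such rows.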

    Thank you so much for your help FBT.

    Really appreciate it...

    Arsalan

  • DocMusher · Member · Posts: 333 · Unicorn

    Hi,

    You could also take a look at MetaMap (https://metamap.nlm.nih.gov/); we have used it together with RM before.

    Cheers

    Sven
