De-identification of medical text

DocMusher, Member, Posts: 333, Unicorn
edited December 2018 in Help

Hi,

I am planning a scientific paper on the use of RapidMiner (RM) in the medical field, so I would like to integrate a recent GitHub project (Python) into an RM process: https://github.com/vmenger/deduce/blob/master/setup.py

The RM community member who manages to integrate the Python code into an RM process, where the data consist of a single column of text examples, will become a co-author of the paper.

Thanks

Sven

Best Answer

  • MartinLiebig, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,368, RM Data Scientist
    Solution Accepted

    Sven,

    attached is a process that uses the function to de-identify an attribute named text. Let me know if you need more :).

    Best,

    Martin







    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">


    [email protected], t: 06-12345678) is 64 jaar"/>


    [email protected]"/>





    <parameter key="script" value="import pandas&#10;from deduce import deduce&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def deduce_string(x):&#10;    annotated = deduce.annotate_text(x, patient_first_names=&quot;Jan&quot;, patient_surname=&quot;Jansen&quot;)&#10;    deidentified = deduce.deidentify_annotations(annotated)&#10;    return deidentified&#10;&#10;def rm_main(data):&#10;    attribute = &quot;text&quot;&#10;    data[&quot;deident&quot;] = data[&quot;text&quot;].apply(deduce_string)&#10;    # connect 2 output ports to see the results&#10;    return data"/>
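Unpacked from the XML, the script parameter above can be read as the following sketch. The two deduce calls and the patient name are copied from the post; newer deduce releases may expose a different interface, so treat the import and both calls as assumptions about the version in use. The helper names make_deidentifier and add_deident_column are hypothetical and were introduced only to keep the column logic separate from the library call.

```python
import pandas as pd


def make_deidentifier(first_names="Jan", surname="Jansen"):
    """Build a str -> str function around deduce (hypothetical helper).

    The import and both deduce calls are copied from the forum post and
    are assumed to match the installed deduce version.
    """
    # Imported lazily so the helper below can be tried without deduce installed.
    from deduce import deduce

    def deduce_string(text):
        annotated = deduce.annotate_text(
            text, patient_first_names=first_names, patient_surname=surname
        )
        return deduce.deidentify_annotations(annotated)

    return deduce_string


def add_deident_column(data, deidentify, source="text", target="deident"):
    """Return a copy of `data` with `target` = deidentify(`source`) per row.

    Hypothetical helper; `deidentify` is any function mapping str to str.
    """
    result = data.copy()
    result[target] = result[source].apply(deidentify)
    return result


def rm_main(data):
    # RapidMiner calls rm_main with one argument per connected input port.
    return add_deident_column(data, make_deidentifier())
```

Because the de-identification function is passed in, the column handling can be checked outside RapidMiner with a stand-in function before deduce is installed, for example add_deident_column(df, str.upper).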











    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • MartinLiebig, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,368, RM Data Scientist

    Sven,

    do I understand correctly that you just want to de-identify a single text attribute?

    ~Martin

  • DocMusher, Member, Posts: 333, Unicorn

    Martin,

    In fact, the code does more, and it is specific to the Dutch language. With the explosive growth of medical data, most of it free text (e.g. discharge notes), any preprocessing requires the removal of Protected Health Information.

    Thanks

    Sven

    Deduce: de-identification method for Dutch medical text

    This project contains the code for DEDUCE: de-identification method for Dutch medical text, as described in Menger et al. (2017). De-identification of medical text is needed before text data can be used for analysis, both to comply with legal requirements and to protect the privacy of patients. Our pattern-matching method removes Protected Health Information (PHI) in the following categories:

    1. Person names, including initials
    2. Geographical locations smaller than a country
    3. Names of institutions that are related to patient treatment
    4. Dates
    5. Ages
    6. Patient numbers
    7. Telephone numbers
    8. E-mail addresses and URLs
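To make the idea of pattern-based removal concrete, here is a minimal sketch covering three of the categories above (e-mail addresses, telephone numbers, ages). The regular expressions and the placeholder tags are assumptions written for this illustration; they are not the rules DEDUCE actually uses, which are described in the paper, and they would miss many real-world variants.

```python
import re

# Illustrative patterns only; these are NOT the rules DEDUCE ships with.
# The placeholder tags are hypothetical as well.
PATTERNS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Dutch-style number: leading 0, short prefix, optional separator, 6-8 digits.
    ("<TELEFOON>", re.compile(r"\b0\d{1,3}[- ]?\d{6,8}\b")),
    # A number directly followed by " jaar" (Dutch for "years").
    ("<LEEFTIJD>", re.compile(r"\b\d{1,3}(?= jaar\b)")),
]


def mask_phi(text):
    """Replace every match of each pattern with its placeholder tag."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(tag, text)
    return text


print(mask_phi("Jan (e: j.jansen@example.com, t: 06-12345678) is 64 jaar"))
# prints: Jan (e: <EMAIL>, t: <TELEFOON>) is <LEEFTIJD> jaar
```

The name "Jan" survives because this sketch has no name pattern; names, locations and institutions usually need lookup lists and context rules on top of plain regular expressions.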

    The details of the development and workings of the method, and its validation can be found in:

    Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text. Telematics and Informatics. ISSN 0736-5853.
