deleting data based on 2 conditions - not just filter

thesoletraveler Member Posts: 3 Contributor I
edited February 2020 in Help

Hi there

I have only just started using RapidMiner, so my question is quite basic; apologies.

I would like to reduce my data so that it includes only the rows that meet specific conditions. When I use the filter operators, they return the data that I want removed. My dilemma is this: I want to delete all rows that have a Yes, or a 1, in the first column and a value greater than 150 in the second column. The filter returns the 80 of the 750 entries that meet those conditions, but I want those rows deleted so that I keep the other 670 entries, not the 80. I hope I'm being clear.

Thank you. I have spent hours trying different operators and searching the community and YouTube.


Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    Hi again @thesoletraveler,

    If I understand correctly, you just have to apply the two filter conditions and check "invert filter":

    Does this process meet your needs?

    [RapidMiner process XML was attached here; its contents were not preserved.]



    I hope it helps,

    Regards,

    Lionel
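The logic behind "apply both conditions, then invert the filter" can be sketched outside RapidMiner. The following is a minimal Python illustration, not the attached process: the function `filter_examples`, the column names `Outcome` and `Insulin`, and the sample rows are all assumptions made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Condition = Callable[[Row], bool]


def filter_examples(rows: List[Row], conditions: List[Condition],
                    invert: bool = False) -> List[Row]:
    """Keep rows that satisfy ALL conditions; with invert=True keep the others.

    Hypothetical helper that mimics the idea of a filter with an
    'invert' switch. It is not RapidMiner's implementation.
    """
    kept = []
    for row in rows:
        matches = all(cond(row) for cond in conditions)
        if matches != invert:
            kept.append(row)
    return kept


# Made-up sample rows; column names are assumed, not taken from the thread's file.
rows = [
    {"Outcome": 1, "Insulin": 200},  # has diabetes, insulin > 150: to be removed
    {"Outcome": 1, "Insulin": 90},   # has diabetes, insulin <= 150: keep
    {"Outcome": 0, "Insulin": 180},  # no diabetes: keep
    {"Outcome": 0, "Insulin": 0},    # no diabetes: keep
]

conditions = [
    lambda r: r["Outcome"] == 1,
    lambda r: r["Insulin"] > 150,
]

matched = filter_examples(rows, conditions)            # rows the plain filter returns
kept = filter_examples(rows, conditions, invert=True)  # rows left after inverting

print(len(matched), len(kept))
```

Without the invert switch the filter returns the rows to be discarded; with it, the same two conditions yield the complementary set, so the two results always add up to the full data set.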


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @thesoletraveler,

    Can you share your dataset and explain, with an example, what you want to obtain, please?

    Regards,

    Lionel

  • thesoletraveler Member Posts: 3 Contributor I

    Hi

    In the outcome column of the raw data set you will see either 1 (true) or 0 (false), which I know how to switch. The insulin column contains a wide range of values. Based on the literature (and my directive), it is argued that anyone with insulin over 150 is on insulin therapy. So when building a model to predict diabetes, I want to eliminate anyone who is 1 or true (they have diabetes) and has an insulin reading above 150. They already have diabetes and are receiving therapy, so they should not be part of the data used to build my model.

    As mentioned previously, I want to delete these 70+ rows so that I am left with data from approx. 680 people (which needs additional work in itself) who may or may not have diabetes and who are not receiving insulin therapy.

    I believe it adds far more value to my model if insulin results are more relevant. Thanks in advance for your assistance!

    C

    diabetes.csv 23.3K
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist

    Hi @thesoletraveler,

    Another way to solve this is the expression option of Filter Examples, which is very flexible and powerful.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
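With the expression option mentioned above, the whole rule can be stated as one boolean test on each row. In RapidMiner this might look something like `!(Outcome == 1 && Insulin > 150)`, although the attribute names and exact syntax are assumptions and should be checked against the data set and the operator's documentation. The Python sketch below shows the same test and its equivalent form under De Morgan's law; the function names and sample values are invented for illustration.

```python
def keep_row(outcome: int, insulin: float) -> bool:
    """Keep a row unless it is BOTH positive (1) AND above 150."""
    return not (outcome == 1 and insulin > 150)


def keep_row_de_morgan(outcome: int, insulin: float) -> bool:
    """Same rule rewritten with De Morgan's law: NOT (A AND B) == (NOT A) OR (NOT B)."""
    return outcome != 1 or insulin <= 150


# Made-up (outcome, insulin) pairs, including the boundary value 150.
samples = [(1, 200.0), (1, 150.0), (1, 90.0), (0, 300.0), (0, 0.0)]

decisions = [keep_row(o, i) for o, i in samples]
same = all(keep_row(o, i) == keep_row_de_morgan(o, i) for o, i in samples)

print(decisions)  # [False, True, True, True, True]
print(same)       # True
```

Note that a reading of exactly 150 is kept, because the rule in the thread is "greater than 150", not "150 or more".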
  • thesoletraveler Member Posts: 3 Contributor I

    Thank you so much, Lionel. So much to learn in this program! I really appreciate this community and everyone's time, which has been helpful for my analytics study. All the best.

    C
