[SOLVED] Filter Attributes

Analyticaltim Member Posts: 15 Contributor II
edited June 2020 in Help
Dear Rapid community,

While this question is undeniably basic, I am at my wit's end as to how to solve it, so I turn to you. :o

I am working with a dataset of housing sales figures in NYC. One of my Attributes is called "NEIGHBORHOODS". I want to filter specific neighborhoods out of this larger dataset for exploration. Thus, I use the "Filter Examples" operator, select "attribute_value_filter", and use the string "NEIGHBORHOOD=FORT GREENE" (note that all the original data is in caps, hence the uppercase in my string). This string does not return the filtered data. Instead, in the Results window I get an ExampleSet with 0 examples, 0 special attributes, 3 regular attributes.

I have checked my spelling again and again, checked the data to make sure it is all there, and searched all over the internet to make sure my parameter string is correct. To no avail.

There is certainly something I am missing. Any help is much appreciated.

Yours,
Tim

Answers

  • venkatesh Member Posts: 15 Contributor I
    Is the attribute defined as nominal or text?
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,984 RM Engineering
    Hi,

    One of my Attributes is called "NEIGHBORHOODS"
    [...]
    and use the string: "NEIGHBORHOOD=FORT GREENE"
    You're missing an S there ;D

    On a more serious note, I just created an ExampleSet with such an attribute and tested your condition and it worked flawlessly for me (tried with attribute as nominal, polynominal and text). What version of RapidMiner are you using? Can you post your process setup here (go to the XML tab and just copy&paste)?

    Regards,
    Marco
  • Analyticaltim Member Posts: 15 Contributor II
    Dear Marco,

    You are quite right on the "S"! :o

    I currently have the "NEIGHBORHOOD" attribute set as polynominal. Could this be the problem?

    Below is the XML of my filter process.

    Thanks again for all your help. RapidMiner Rocks!

    Tim


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    [...]
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,984 RM Engineering
    Hi,

    No, it should not matter. You have included a whitespace at the end of your condition ("FORT GREENE ") though; please make sure that's not the error.

    Apart from that, I don't know. It works for me, so I'm afraid I cannot help you any further without the actual data. If you could provide a minimal sample to me (you can use the Filter Example Range and Select Attributes operators to reduce the data to the absolute minimum needed), I could have a look and check whether a bug is involved. If the data should not be publicly visible, you can contact me via PM.

    Regards,
    Marco
  • Analyticaltim Member Posts: 15 Contributor II
    Dear Marco,

    Below is some original data that I extracted with the "Filter Example Range" operator. Within this example range the problem persists for me as well. You are correct about the whitespace in the condition. I was trying that to see if it was my problem, and it accidentally got into the XML I sent you. Sorry. The truncated dataset is below. I get the same problem with any neighborhood in this sample, for example Bath Beach or Carroll Gardens.

    Thanks again for your help!
    Tim

    "NEIGHBORHOOD","SALE PRICE","SALE DATE"
    "CARROLL GARDENS ",907278.0,10/9/12 12:00 AM
    "CARROLL GARDENS ",1522283.0,8/22/12 12:00 AM
    "CARROLL GARDENS ",885000.0,8/22/12 12:00 AM
    "CARROLL GARDENS ",1508642.0,8/10/12 12:00 AM
    "CARROLL GARDENS ",830000.0,8/7/12 12:00 AM
    "CARROLL GARDENS ",1483413.0,8/30/12 12:00 AM
    "BEDFORD STUYVESANT ",712775.0,9/27/12 12:00 AM
    "BEDFORD STUYVESANT ",700000.0,10/24/12 12:00 AM
    "BEDFORD STUYVESANT ",700000.0,10/24/12 12:00 AM
    "BEDFORD STUYVESANT ",450000.0,11/14/12 12:00 AM
    "BATH BEACH ",0.0,11/19/12 12:00 AM
    "BATH BEACH ",0.0,11/12/12 12:00 AM
    "BATH BEACH ",0.0,11/13/12 12:00 AM
    "BATH BEACH ",0.0,11/13/12 12:00 AM
    "BATH BEACH ",0.0,12/7/12 12:00 AM
    "BATH BEACH ",0.0,11/7/12 12:00 AM
    "BATH BEACH ",610000.0,6/28/12 12:00 AM
    "BATH BEACH ",0.0,5/3/12 12:00 AM
    "BATH BEACH ",0.0,3/26/12 12:00 AM
    "BATH BEACH ",508000.0,8/24/12 12:00 AM
    "BATH BEACH ",690000.0,11/14/12 12:00 AM
    "BATH BEACH ",0.0,2/6/12 12:00 AM
    "BATH BEACH ",800000.0,2/6/12 12:00 AM
    "BATH BEACH ",420000.0,4/4/12 12:00 AM
    "BATH BEACH ",500000.0,7/19/12 12:00 AM
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,984 RM Engineering
    Hi,

    There we go. You're having trouble because of the whitespace at the end of each NEIGHBORHOOD name. Sadly, due to some restrictions, the parameter you enter gets trimmed (its leading and trailing whitespace is removed), so the condition value "FORT GREENE" never matches the stored value "FORT GREENE " and the filter returns nothing. What you can do is remove the whitespace from your NEIGHBORHOOD attribute, and you can do so via the Generate Attributes operator. Just add it after you retrieve your data and before the Filter Examples operator. Then add a key/value pair to the function descriptions parameter as follows:

    attribute name: NEIGHBORHOOD_NEW
    function expressions: trim(NEIGHBORHOOD)
    You can then filter on the NEIGHBORHOOD_NEW attribute and will finally get your desired results :)
    We plan to enhance the Filter Examples operator in the future, but until then I'm afraid the workaround is necessary in this case.

    Regards,
    Marco
  • Analyticaltim Member Posts: 15 Contributor II
    Marco!

    My man! It worked like a dream! Thank you very much. You are tops! ;D:D

    Thanks again for all your help.
    RapidMiner Rocks!

    Tim
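
For anyone who wants to see the effect outside RapidMiner, here is a minimal Python sketch of the diagnosis and the workaround discussed above. It is only an illustration, not RapidMiner's implementation: the helper names (load_rows, filter_equals, add_trimmed) are made up for this example, and the assumption that the condition value is trimmed while the data is not comes from Marco's description. The rows are copied from the sample Tim posted.

```python
import csv
import io

# A few rows copied from the sample in the thread. Note the trailing space
# inside each quoted NEIGHBORHOOD value.
SAMPLE = '''"NEIGHBORHOOD","SALE PRICE","SALE DATE"
"CARROLL GARDENS ",907278.0,10/9/12 12:00 AM
"CARROLL GARDENS ",1522283.0,8/22/12 12:00 AM
"BEDFORD STUYVESANT ",712775.0,9/27/12 12:00 AM
'''


def load_rows(text):
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_equals(rows, attribute, value):
    """Keep rows whose attribute equals value exactly (the data is not trimmed)."""
    return [row for row in rows if row[attribute] == value]


def add_trimmed(rows, source, target):
    """Hypothetical stand-in for trim(NEIGHBORHOOD) -> NEIGHBORHOOD_NEW."""
    return [{**row, target: row[source].strip()} for row in rows]


rows = load_rows(SAMPLE)

# Assumption from the thread: the condition value loses its trailing
# whitespace before the comparison, even if the user typed it.
condition_value = "CARROLL GARDENS ".strip()

# Filtering on the raw attribute finds nothing, because the stored values
# still end with a space.
before = filter_equals(rows, "NEIGHBORHOOD", condition_value)

# Filtering on the trimmed copy of the attribute finds the expected rows.
trimmed_rows = add_trimmed(rows, "NEIGHBORHOOD", "NEIGHBORHOOD_NEW")
after = filter_equals(trimmed_rows, "NEIGHBORHOOD_NEW", condition_value)

print(len(before), len(after))  # prints "0 2"
```

The trimmed values go into a new column instead of overwriting the original, mirroring the NEIGHBORHOOD_NEW suggestion, so the raw data stays available for comparison.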