"[SOLVED] How to filter documents by their length?"

mohammadreza Member Posts: 23 Contributor I
edited June 2019 in Help
Hi forum,

Does anybody know how to filter documents by their length rather than by their content? I need to remove short documents from my training examples, but the only operator I could find is "Filter Documents (by Content)", which I don't think can be used for this case.

Thanks all

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,984 RM Engineering
    Hi,

    See the answer below for a better way of doing it. However, it is also possible to do it the following way, which allows for more complex filtering if needed:

    <macros/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Data to Documents (2)" to_port="example set"/>

    Regards,
    Marco
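For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of filtering with an arbitrary condition. It is only an analogy, not the process above: it assumes each document is a plain string, and the names `filter_documents`, `long_enough` and `min_chars` are made up for this illustration.

```python
from typing import Callable, Iterable, List


def filter_documents(docs: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Return only the documents for which the predicate `keep` is true."""
    return [doc for doc in docs if keep(doc)]


def long_enough(min_chars: int) -> Callable[[str], bool]:
    """Build a predicate accepting documents with at least `min_chars` characters.

    Surrounding whitespace is ignored (an assumption made for this sketch).
    """
    def predicate(doc: str) -> bool:
        return len(doc.strip()) >= min_chars
    return predicate


# Hypothetical sample data, purely for illustration.
docs = ["ok", "a somewhat longer training document", "   ", "short text"]
kept = filter_documents(docs, long_enough(10))
```

Because the condition is passed in as a function, it can be swapped for something more complex (for example a word count or a combination of checks) without touching the filtering step.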
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    There is always "Filter Tokens (by Length)". It also works on whole documents, because a document that has not been tokenized is just a single token.

    Here's an example:

    <macros/>
    <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="result 1"/>

    regards

    Andrew
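The single-token idea can be illustrated with a rough Python analogy. This is not RapidMiner code: it models an untokenized document as a list holding one token (its full text), and it assumes that a document left without tokens counts as removed. The function name and the `min_chars`/`max_chars` values are chosen for this sketch only.

```python
from typing import List


def filter_tokens_by_length(tokens: List[str], min_chars: int, max_chars: int) -> List[str]:
    """Keep tokens whose character count lies in [min_chars, max_chars]."""
    return [t for t in tokens if min_chars <= len(t) <= max_chars]


# Each untokenized document is modelled as a list with exactly one token.
documents = [["too short"], ["this document is long enough to keep"]]

filtered = [
    filter_tokens_by_length(doc, min_chars=20, max_chars=10_000)
    for doc in documents
]

# Assumption for this sketch: a document with no remaining tokens is dropped.
surviving = [doc[0] for doc in filtered if doc]
```

Since the whole text is the only token, the length limits apply to the document as a whole rather than to individual words.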
  • mohammadreza Member Posts: 23 Contributor I
    Thanks a lot, Marco and Andrew. I found both solutions very helpful and have marked the question as SOLVED.