Performance Issues

agurruchaga, Member, Posts: 11, Contributor I
edited December 2018 in Help

Good morning,

I am having serious performance issues when it comes to handling large databases. I configured an Oracle connection, and when I try to filter a table (about 15 million rows) by a simple attribute, I get a "Retrieve [tablename]..." message and it just does nothing. When I access the table with a standard SQL editor, I get the results almost instantly. I don't know whether this is common when handling such large volumes of data or whether it should just work fine.

Thanks in advance.


Best Answers

  • Edin_Klapic, Moderator, Employee, RM Researcher, Member, Posts: 299, RM Data Scientist
    Solution Accepted

    Hi @agurruchaga,

    From your problem description, which states that the process outputs "Retrieve [tablename]", I assume you are using the Retrieve Operator and apply the Filter Examples Operator afterwards. The Retrieve Operator always fetches the full dataset, and the filters are applied afterwards.

    Thus, I suggest using the Read Database Operator and writing the SQL query with your desired filter in there. This way the filtering is executed directly in the database and only the relevant Examples are loaded into RapidMiner.
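    For illustration (the table name, attribute, and value here are hypothetical placeholders), a query along these lines would push the filter into Oracle so that only the matching rows are transferred:

    SELECT *
    FROM bigtable
    WHERE some_attribute = 'some_value'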

    Best,

    Edin

  • BalazsBarany, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Posts: 949, Unicorn
    Solution Accepted

    Hi!

    No need to create a temporary table; you can use SQL's IN clause.

    You'd aggregate the IDs from your Excel table using the Aggregate operator and the concatenation function. Then you would use Replace to change the separator to a comma.

    For example, you get a value like "13,25,45,133". You then use Extract Macro to convert this to a macro named "filter" that you can use in the SQL query.

    The SQL query would look like this:

    SELECT ...
    FROM bigtable
    WHERE key IN (%{filter})
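    With the example value from above ("13,25,45,133"), the macro would expand so that the query actually sent to the database reads:

    SELECT ...
    FROM bigtable
    WHERE key IN (13,25,45,133)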

    Regards,

    Balázs


Answers

  • Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn

    I am interested in any of the developers' comments on this issue. Ever since the "Stream Database" operator was deprecated, I have also had problems with large datasets. Are there any suggested remedies or plans to bring back "Stream Database"?

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • agurruchaga, Member, Posts: 11, Contributor I

    Hello Edin,

    This worked like a charm! Thank you!

    Greetings.

  • agurruchaga, Member, Posts: 11, Contributor I

    Hello again,

    I was thinking about a similar but not identical situation: imagine I import a 100-row Excel sheet into RapidMiner that I want to join with a table (15,000,000 rows) in my DB. If I want to perform a Join, the first step is to retrieve the database table, so I would have the same problem I stated before, but I can't use SQL in this case since the 100-row table is not part of the DB. Is there a similar operator that allows joining without having to retrieve the entire DB?

    Thanks!

  • SGolbert, RapidMiner Certified Analyst, Member, Posts: 344, Unicorn

    Hi,

    The solution to the second problem is actually very similar to the first one. You can use Read Excel and Write Database to import your Excel sheet into the DB. Then you can simply join with a SQL command.
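    For illustration, once the Excel data has been written to the database (the table name excel_import and the join column key are hypothetical), a single query can do the join on the database side:

    SELECT b.*
    FROM bigtable b
    JOIN excel_import e ON e.key = b.key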

    Best,

    Sebastian

  • agurruchaga, Member, Posts: 11, Contributor I

    Hi Sebastian,

    I thought about that solution, but the problem is that I do not have privileges to write to that database.

    I will have to think about another solution.

    Thanks!

  • SGolbert, RapidMiner Certified Analyst, Member, Posts: 344, Unicorn

    You might be able to create a temporary table.

    http://devzone.advantagedatabase.com/dz/webhelp/advantage9.0/adssql/using_temporary_tables_in_sql_statements.htm

    To be honest, I'm not sure how to implement this solution (maybe @BalazsBarany has a better idea). My guess is that you could convert the Excel table to a CREATE TABLE statement, then use the Execute SQL operator to create the temporary table, make the join and finally retrieve the data.
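    As a rough sketch of that idea (assuming the DB user is at least allowed to create global temporary tables; all names and types are hypothetical), the Execute SQL steps could run something like:

    CREATE GLOBAL TEMPORARY TABLE tmp_keys (id NUMBER) ON COMMIT PRESERVE ROWS
    -- one INSERT per row of the 100-row Excel sheet, e.g.:
    INSERT INTO tmp_keys (id) VALUES (13)
    -- then join and read the result back into RapidMiner:
    SELECT b.* FROM bigtable b JOIN tmp_keys t ON t.id = b.key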

    If you make it work, I'll be very interested in seeing the solution :D

  • agurruchaga, Member, Posts: 11, Contributor I

    Sorry for the late reply. I think this solution is super! Fast and easy to use, thank you very much, guys!

    Edit: It seems my Oracle DB has a limit of 1,000 elements in an IN list, so this might not be a solution in some cases.

    Greetings

  • BalazsBarany, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Posts: 949, Unicorn

    Hi,

    You can use Loop Batches or something similar in that case ;-) Then you would make as many queries as necessary and union or append the results.
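    For illustration, each Loop Batches iteration would fill the macro with at most 1,000 of the IDs (the batching itself happens on the RapidMiner side; the query stays the same), and the partial results are appended afterwards:

    SELECT ...
    FROM bigtable
    WHERE key IN (%{filter})
    -- %{filter} holds no more than 1,000 IDs per batch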

    Regards,

    Balázs
