"Timestamps from PostgreSQL: Different results in RapidMiner and RapidAnalytics"

LarsHLarsH MemberPosts:6Contributor II
edited June 2019 inHelp
Hi everyone,

I am facing a strange behavior of timestamps (or big numbers) in RapidMiner and RapidAnalytics when reading them from a PostgreSQL Database. I set up a very simple process, which only reads from the DB, sorts the resulting example set and stores it in the RA repository.

image











<参数键= "查询" value = "选择to_char (01.08.2012 01:01:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string, '01.08.2012 01:01:00+00'::timestamptz as zeit_ts, extract(epoch from timestamp with time zone '01.08.2012 01:01:00+00') as zeit_epoch union select to_char('01.08.2012 01:02:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string, '01.08.2012 01:02:00+00'::timestamptz as zeit_ts, extract(epoch from timestamp with time zone '01.08.2012 01:02:00+00') as zeit_epoch union select to_char('01.08.2012 01:03:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string, '01.08.2012 01:03:00+00'::timestamptz as zeit_ts, extract(epoch from timestamp with time zone '01.08.2012 01:03:00+00') as zeit_epoch;"/>

















When I execute
select
to_char('01.08.2012 01:01:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string,
'01.08.2012 01:01:00+00'::timestamptz as zeit_ts,
extract(epoch from timestamp with time zone '01.08.2012 01:01:00+00') as zeit_epoch

union

select to_char('01.08.2012 01:02:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string,
'01.08.2012 01:02:00+00'::timestamptz as zeit_ts,
extract(epoch from timestamp with time zone '01.08.2012 01:02:00+00') as zeit_epoch

union

select to_char('01.08.2012 01:03:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string,
“01.08.2012 01:03:00 + 00 ':timestamptz zeit_ts,
extract(epoch from timestamp with time zone '01.08.2012 01:03:00+00') as zeit_epoch;
directly from RapidMiner, everything seems fine and I get

image

When running the same process on the RA server, I get a different and obviously wrong result

image

I'm using Rapidminer 5.3.007, RapidAnalytics CE 1.3.007 and PostgreSQL 8.4.4. RapidMiner uses the DB connection I've defined in RapidAnalytics.

Does anyone have an idea of what is going on here?

Thanks for help,
LarsH
Tagged:

Answers

  • Nils_WoehlerNils_Woehler MemberPosts:463Maven
    Hi,

    do you have the same SQL driver for RapidMiner and RapidAnalytics?

    Best,
    Nils
  • LarsHLarsH MemberPosts:6Contributor II
    Hi!

    Yes, I think so. I didn't change the driver that comes with RM and RA (postgresql-9.1-901.jdbc4.jar).

    Over the weekend, I found out that the problem occurs on Linux and Windows. I've tested the following combinations:

    RM: Ubuntu 12.04 -> RA: Ubuntu 12.04
    RM: PC-BSD 9.1 -> RA: Ubuntu 10.04
    RM: Windows 7 -> RA: Windows 7

    All of them show the same behavior as described above.

    Best,
    Lars
  • LarsHLarsH MemberPosts:6Contributor II
    I just found out that my problem isn't only related to PostgreSQL.

    I created an example set called 'Test' via "Import CSV" with the following content (automatically identified as 'integer' by RM):
    1343782860
    Then I executed the following process


































    to get an example set that contains the number from 'Test' as date '01.08.2012 03:01:00 MESZ'. Doing this directly in Rapidminer, everything works well. But in Rapidanalytics, I get the same wrong result like in my PostgreSQL example.

    And I found out another interesting detail: When I declare my CSV number 1343782860to be 'numeric', Rapidminer imports it as 1343782900. Thats '01.08.2012 03:01:40 MESZ'! Perhaps that's the error that happens during the PostgreSQL import? But how can I fix it?
  • LarsHLarsH MemberPosts:6Contributor II
    LarsH wrote:

    And I found out another interesting detail: When I declare my CSV number 1343782860to be 'numeric', Rapidminer imports it as 1343782900. Thats '01.08.2012 03:01:40 MESZ'! Perhaps that's the error that happens during the PostgreSQL import? But how can I fix it?
    I've found another example for that problem. I created the following process

    image


























    and ran it with Rapidminer and Rapidanalytics. Lots of 16-digit random numbers.

    With Rapidminer, I get

    image

    but with Rapidanalytics, the last 7 digits are always zero?!

    image

    Of course, the numbers are "random", but is this behavior really correct?
  • Nils_WoehlerNils_Woehler MemberPosts:463Maven
    I think this has to do something with the way Postgres treats numbers. If you run it directly from RapidMiner, pure Java is used.
    No number is concatenated when storing the ExampleSet on disk. But when you run it in RapidAnalytics the results are stored in the Postgres SQL database you selected for RapidAnalytics.
    And there must be a weird problem. Currenlty I have no Postgres DB available, so I can't reproduce the error, but could you please have a look at the setup of your database tables?
    Is there something suspicious?

    Best,
    Nils
  • LarsHLarsH MemberPosts:6Contributor II
    Thanks, that's a very good hint I think. I've never used the local repository, even if I run my processes locally with RapidMiner, I store the results in the RapidAnalytics repository. I've found out that these example sets are corrupt as well when I open them in RapidMiner again. I've never seen that, because I thought the example set RapidMiner displays in the result perspective after finishing the local process, would have come from the RA repository...

    我可重复的随机数s' process and found out that RA stores the 16-digit numbers as 'real's, instead of 'double' or 'numeric'.
    rapidanalytics=> select rm_rownum, att_1, att_2 as label from es_75605_data_1 limit 1;
    rm_rownum | att_1 | label
    -----------+-------------+----------
    1 | 4.00508e+15 | 0.863356
    (1 Zeile)
    Even timestamps are stored as 'real'! That's why some digits (and seconds...) get lost!

    Best,
    Lars

  • Nils_WoehlerNils_Woehler MemberPosts:463Maven
    Just for clarification:
    If you run a process from the RapidMiner GUI the result you see in the end is produced on the machine you run RapidMiner on.
    It does not matter if the process is stored an RA or in a local Repository. Nevertheless the 'Store' operator will transfer the locally generated data to your RA instance and save it in the selected RA database if the process is stored on RapidAnalytics.

    Could you please have a look what the data types the data tables have? For me it looks like this on MySQL:

    select rm_rownum, att_1, att_2 as label from `rapidanalytics`.`es_305_data_1` limit 1;

    rm_rownum att_1 label
    1 4.005079079564649e15 0.8633563476940593

    SHOW FIELDS FROM `rapidanalytics`.`es_305_data_1`;

    Field Type Null Key Default Extra
    rm_rownum int(11) NO PRI NULL
    att_1 double YES NULL
    att_2 double YES NULL
    As you can see att_1 and the label values are stored as doubles.
    Thus retrieving the values yields exactly the same results as executing the process from RapidMiner locally does.

    Best,
    Nils
  • LarsHLarsH MemberPosts:6Contributor II
    For me it looks like this:
    SELECT a.attnum, a.attname, t.typname, a.attlen, a.atttypmod,
    a.attnotnull, a.atthasdef FROM pg_class c, pg_attribute a, pg_type t
    WHERE a.attnum > 0 and a.attrelid = c.oid and c.relname = 'es_75605_data_1'
    and a.atttypid = t.oid order by a.attnum;
    attnum | attname | typname | attlen | atttypmod | attnotnull | atthasdef
    --------+-----------+---------+--------+-----------+------------+-----------
    1 | rm_rownum | int4 | 4 | -1 | t | f
    2 | att_1 | float4 | 4 | -1 | f | f
    3 | att_2 | float4 | 4 | -1 | f | f
    (3 Zeilen)
  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified ExpertPosts:949Unicorn
    Hi,

    assuming that your original problem with the timestamps is still current:

    to_char('01.08.2012 01:01:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ')

    Don't do this, it is not portable. It depends on the PostgreSQL configuration setting "datestyle" and also on the settings in the client session. That again can be set by the client and maybe the JDBC connector or RapidAnalytics set it up differently.

    You can always check the value by putting "SHOW datestyle;" into a Read Database operator.

    Your example outputs "08.01.2012" instead of "01.08.2012" in my RapidMiner because my PostgreSQL installation has "iso, mdy" configured - the default configuration.

    See the "Date input" topic in the PostgreSQL documentation for safe syntax variants:
    http://www.postgresql.org/docs/9.1/static/datatype-datetime.html
Sign InorRegisterto comment.