Time series creation
Hello everyone,
I have this date-time attribute, from which I want to create the corresponding time series by counting the number of events per hour, so that I can plot the evolution of the time series and apply ARIMA to it later on.
ID Date-time
1 31/12/2017 22:55
2 31/12/2017 22:27
3 31/12/2017 22:17
4 31/12/2017 21:55
5 31/12/2017 20:48
6 31/12/2017 11:03
7 31/12/2017 09:58
8 31/12/2017 09:52
9 31/12/2017 09:16
10 31/12/2017 08:11
11 31/12/2017 07:55
12 31/12/2017 06:54
13 31/12/2017 04:10
14 30/12/2017 12:15
15 30/12/2017 11:20
16 30/12/2017 10:50
17 30/12/2017 05:40
18 30/12/2017 04:49
19 30/12/2017 04:30
20 30/12/2017 04:00
21 30/12/2017 02:34
22 29/12/2017 23:02
23 29/12/2017 18:38
24 29/12/2017 08:04
25 29/12/2017 01:30
26 28/12/2017 23:59
27 28/12/2017 16:22
28 28/12/2017 14:43
29 28/12/2017 14:39
30 28/12/2017 14:39
31 28/12/2017 14:02
32 28/12/2017 13:46
33 28/12/2017 13:28
34 28/12/2017 12:51
35 28/12/2017 12:45
36 28/12/2017 12:24
37 28/12/2017 12:06
etc.
I really need your help.
Thank you in advance.
Best regards
Best Answers
lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Unicorn)
Hi @data1maths,
I have no idea how to perform your task with RapidMiner's native operator(s).
So I propose a solution using a Python script:
To execute this process, you have to:
- Install Python on your computer
- Install the Python Scripting extension from the Marketplace
- Set the name of your datetime attribute, with quotes, in the parameters of the Set Macros operator:
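A Python script for this task could, for example, floor each timestamp to the start of its hour and count the events in each hour. The following is a minimal sketch of that idea in plain Python (standard library only), not Lionel's actual script. It assumes the day-first format shown in the question ("31/12/2017 22:55"), and the function name count_events_per_hour is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Day-first format as in the question, e.g. "31/12/2017 22:55" (assumed).
DATE_FORMAT = "%d/%m/%Y %H:%M"


def count_events_per_hour(timestamps):
    """Return (hour_start, count) pairs sorted by time.

    Each timestamp string is parsed and floored to the start of its hour,
    so 22:17, 22:27 and 22:55 all fall into the 22:00 bucket.
    Hours without any event do not appear in the result.
    """
    counts = Counter(
        datetime.strptime(ts, DATE_FORMAT).replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
    )
    return sorted(counts.items())


if __name__ == "__main__":
    sample = [
        "31/12/2017 22:55",
        "31/12/2017 22:27",
        "31/12/2017 22:17",
        "31/12/2017 21:55",
        "31/12/2017 20:48",
    ]
    for hour_start, n in count_events_per_hour(sample):
        print(hour_start.strftime("%d/%m/%Y %H:00"), n)
```

If you run something like this inside the Execute Python operator, the script's rm_main(data) function receives the example set as a pandas DataFrame, so a call such as count_events_per_hour(data["Date-time"]) could be the glue. The column name and this wiring are assumptions, and the call only works if the column arrives as text rather than as parsed dates.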
The process:
I hope it helps,
Regards,
Lionel
MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, RM Data Scientist)
tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, RM Research)
Hi @data1maths,
For creating timestamps for time series data, the 'Create ExampleSet' operator from the Operator Toolbox extension is the way to go, as @mschmitz pointed out.
If I understand you correctly, you already have timestamps and want to count the events per hour, right?
You can use Generate Attributes with date_get(timestamp, DATE_UNIT_HOUR) to extract the hour of your timestamp, and then the Aggregate operator with the new hour_attribute as the group-by attribute and count(id) to get the number of events in the corresponding hour. If you have events in every hour, you are already finished. If not, you should create hourly timestamps with Create ExampleSet, join them with your results, and then use Replace Missing Values with 0.
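The join-and-fill step described above can be sketched in plain Python to show the logic: generate one timestamp per hour between the first and last aggregated hour, look up each hour's count, and use 0 where there is none (the role of Replace Missing Values). This illustrates the idea only; it is not a set of RapidMiner operators, and the function name fill_missing_hours is hypothetical. The sketch keys on the full date plus hour: if the extracted hour is the hour of day (0 to 23), grouping by it alone would put 22:00 on 30/12 and 22:00 on 31/12 into the same group, so you would also group by the date.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def fill_missing_hours(hourly_counts):
    """Return (hour_start, count) for every hour from the first to the last
    hour in hourly_counts, using 0 where no events were recorded.

    hourly_counts maps an hour start (datetime with minute=0) to its count,
    i.e. the output of the aggregation step.
    """
    if not hourly_counts:
        return []
    current, end = min(hourly_counts), max(hourly_counts)
    result = []
    # Walk the complete hourly grid (like generated hourly timestamps)
    # and look each hour up in the counts (like a left join).
    while current <= end:
        # Hours missing from the counts get 0, like Replace Missing Values with 0.
        result.append((current, hourly_counts.get(current, 0)))
        current += ONE_HOUR
    return result


if __name__ == "__main__":
    counts = {
        datetime(2017, 12, 31, 11): 1,
        datetime(2017, 12, 31, 20): 1,
        datetime(2017, 12, 31, 21): 1,
        datetime(2017, 12, 31, 22): 3,
    }
    for hour_start, n in fill_missing_hours(counts):
        print(hour_start.strftime("%d/%m/%Y %H:00"), n)
```

Filling the gaps matters for the later modelling step, since ARIMA models generally expect equally spaced observations.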
Best regards,
Fabian
IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)
Hi,
And last but not least, you can also get to the desired result with Turbo Prep. In fact, there are multiple ways to achieve this. Here is one I like personally:
1. Import your data into the repository and open it in Turbo Prep. Please note that I kept the date column in my example below as a categorical (nominal) column. You may need to adapt this a bit if you already have a proper date column.
Load timestamp data into Turbo Prep
2. Copy the date column in Transform. I called the new column "Hour".
Copy date column
3. Change the type of the new Hour column to "date and time" using the "Change Type" function in the Transform group. For the date format you show above, the correct date format string would be "dd/MM/yyyy HH:mm". Don't forget to change the output type to "date and time" as well!
Change type to date
4. Now change the type of the new column again, but this time to "number". Extract the "hour" relative to the "epoch". You should now have three columns: ID, the date/time column (still categorical), and a numerical column with all the hours. These numbers start very high, but that should not bother you; if it does, you can change it later on. This column will be useful for plotting, though.
Extract hour as number
5. Now comes the cool trick and a bit of regular expression magic. Select your original date column (by the way, you can do all of this in the same transformation session) and use the "Replace" function. Activate regular expressions and replace "(.*):.*" with "$1:00". This replaces the minutes in this column with "00", which gives you a nice representation of each hour for the aggregation.
Use a regular expression to replace the minutes by "00"
6. The final result should look like the table below. Commit it.
Final result of transformations - ready for the aggregation
7. Now go into the "Pivot" group and use the date column (the one without the minutes) and the numerical hours column as "Group-By" columns. Use the ID column as the aggregate at the bottom, but change the function to "count". The result is shown below.
Create the Pivot table
8. You can now sort the data by the Hours column in the Transform group. Then you can plot it to your heart's content. Of course, you can also build a process from this, etc.
The final chart as bars
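The Turbo Prep steps above can be mirrored in plain Python to see what each transformation does. This is a sketch under assumptions, not what Turbo Prep runs internally: the format "dd/MM/yyyy HH:mm" is written as the Python strptime pattern "%d/%m/%Y %H:%M", times are treated as UTC when counting hours since the epoch (Turbo Prep may use another time zone, which would shift the numbers), and Java's "$1" in the replacement becomes "\1" in Python's re.sub. The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timezone

DATE_FORMAT = "%d/%m/%Y %H:%M"  # Python equivalent of "dd/MM/yyyy HH:mm"


def hours_since_epoch(text):
    """Steps 3-4: parse the string and count whole hours since 1970-01-01 (UTC assumed)."""
    dt = datetime.strptime(text, DATE_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) // 3600


def drop_minutes(text):
    """Step 5: replace "(.*):.*" by "$1:00"; the greedy (.*) keeps everything before the last colon."""
    return re.sub(r"(.*):.*", r"\1:00", text)


def pivot_counts(timestamps):
    """Steps 7-8: count events per (date without minutes, epoch hour), sorted by hour."""
    counts = Counter((drop_minutes(ts), hours_since_epoch(ts)) for ts in timestamps)
    return sorted(
        ((label, hour, n) for (label, hour), n in counts.items()),
        key=lambda row: row[1],
    )


if __name__ == "__main__":
    sample = ["31/12/2017 22:55", "31/12/2017 22:27", "31/12/2017 22:17", "31/12/2017 21:55"]
    for row in pivot_counts(sample):
        print(row)
```

Keeping both the readable label and the numeric hour in the group key mirrors step 7: the label is for display, while the number gives a sortable, evenly spaced axis for plotting.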
Hope this helps. I will post the created process below as well.
Best,
Ingo
Loading timestamps
Unify column types
<parameter key="attribute_name" value="Date-time"/>
Copy Date-time to Hour
Parse dates in Hour
Extract hour of epoch for Hour
Replace '(.*):.*' by '$1:00' in Date-time
Aggregate ID grouped by Date-time, Hour
Ensure order of columns matches the table view.
Answers
Hi @data1maths,
Have a look at "Create ExampleSet" in the Operator Toolbox. One of the examples shows a similar way of using it.
BR,
Martin
Dortmund, Germany
Hi @mschmitz,
Honestly, I couldn't find what I needed in the examples.
What I want is, for example, for the hour between 22:00 and 23:00 on 31/12/2017, to get 3 as the number of records in that hour.
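The expected count in this example can be checked with a minimal sketch: an hour bucket is a half-open interval, so an event at 22:00 counts toward the 22:00 to 23:00 hour, while an event at 23:00 would already belong to the next one. The helper name count_in_hour is hypothetical, and the sample uses the first rows of the data above.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d/%m/%Y %H:%M"  # day-first format from the question


def count_in_hour(timestamps, hour_start):
    """Count timestamps t with hour_start <= t < hour_start + 1 hour."""
    hour_end = hour_start + timedelta(hours=1)
    return sum(
        hour_start <= datetime.strptime(ts, DATE_FORMAT) < hour_end
        for ts in timestamps
    )


sample = [
    "31/12/2017 22:55",
    "31/12/2017 22:27",
    "31/12/2017 22:17",
    "31/12/2017 21:55",
    "31/12/2017 20:48",
]
print(count_in_hour(sample, datetime(2017, 12, 31, 22)))  # → 3
```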
Thank you for your help, sir.
Best regards
Thank you so much to all of you.
Best Regards