Classification and feature construction on Time series Data

surya_mpad Member Posts: 3 Contributor I
edited December 2018 in Help

Hello everyone,

As part of a case study, I've been working on a time series classification task. The goal is to classify time series data (each example in the dataset represents one time series) into 7 different classes. With a basic process (k-NN with Dynamic Time Warping) I got a classification accuracy of 98.93% and an RMSE of 0.011 +/- 0.103 (which is strange). Since I am new to time series classification, I built a simple process without any feature construction.
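For readers outside RapidMiner, the idea behind k-NN with Dynamic Time Warping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not RapidMiner's implementation, the function names `dtw_distance` and `predict_1nn` are hypothetical, the toy rows are made up, and the local cost (absolute difference, no warping window) is one common choice among several.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def predict_1nn(train_rows, train_labels, query):
    """Return the label of the training series closest to `query` under DTW."""
    best = min(range(len(train_rows)),
               key=lambda i: dtw_distance(train_rows[i], query))
    return train_labels[best]


# Made-up toy data: each row is one series, mirroring the one-series-per-example layout.
train_rows = [[0, 0, 5, 0, 0], [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
train_labels = [1, 2, 3]
print(predict_1nn(train_rows, train_labels, [0, 5, 0, 0, 0]))  # prints 1
```

The query has the same single spike as the first training row, only shifted by one period, so DTW aligns them at zero cost, whereas a plain Euclidean distance would penalise the shift.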

So I would like your comments on the process I have built, and on the feature engineering (preprocessing) techniques and RapidMiner operators that I can apply to time series data (each example represents a time series) for classification.

I have attached the sample data and the XML of the process. Please review the process and the data; it would be great if you could let me know the right way to handle this kind of data (one time series per example) for a classification task in RapidMiner.

About the dataset:

* Each example (each row) represents a time series and has 34 regular attributes (features), which represent the different periods of the time series.

* The label attribute Type has 7 different classes (1, 2, ..., 7). See the picture below.

Capture.PNG

Your comments are valuable,

Many thanks and best regards,

Surya








< parameter key="logverbosity" value="init"/>
< parameter key="random_seed" value="2001"/>
< parameter key="send_mail" value="never"/>
< parameter key="notification_email" value=""/>
< parameter key="process_duration_for_mail" value="30"/>
< parameter key="encoding" value="SYSTEM"/>
< process expanded="true">

< parameter key="repository_entry" value="../data/Classfication_timeseries_with classnames"/>



< parameter key="ratio" value="0.8"/>
< parameter key="ratio" value="0.2"/>

< parameter key="sampling_type" value="automatic"/>
< parameter key="use_local_random_seed" value="false"/>
< parameter key="local_random_seed" value="1992"/>


< parameter key="attribute_filter_type" value="single"/>
< parameter key="attribute" value="Type"/>
< parameter key="attributes" value=""/>
< parameter key="use_except_expression" value="false"/>
< parameter key="value_type" value="attribute_value"/>
< parameter key="use_value_type_exception" value="false"/>
< parameter key="except_value_type" value="time"/>
< parameter key="block_type" value="attribute_block"/>
< parameter key="use_block_type_exception" value="false"/>
< parameter key="except_block_type" value="value_matrix_row_start"/>
< parameter key="invert_selection" value="true"/>
< parameter key="include_special_attributes" value="true"/>


< parameter key="create_complete_model" value="false"/>
< parameter key="training_window_width" value="10"/>
< parameter key="training_window_step_size" value="-1"/>
< parameter key="test_window_width" value="10"/>
< parameter key="horizon" value="1"/>
< parameter key="cumulative_training" value="false"/>
< parameter key="average_performances_only" value="true"/>
< process expanded="true">

< parameter key="k" value="1"/>
< parameter key="weighted_vote" value="false"/>
< parameter key="measure_types" value="NumericalMeasures"/>
< parameter key="mixed_measure" value="MixedEuclideanDistance"/>
< parameter key="nominal_measure" value="NominalDistance"/>
< parameter key="numerical_measure" value="DynamicTimeWarpingDistance"/>
< parameter key="divergence" value="GeneralizedIDivergence"/>
< parameter key="kernel_type" value="radial"/>
< parameter key="kernel_gamma" value="1.0"/>
< parameter key="kernel_sigma1" value="1.0"/>
< parameter key="kernel_sigma2" value="0.0"/>
< parameter key="kernel_sigma3" value="2.0"/>
< parameter key="kernel_degree" value="3.0"/>
< parameter key="kernel_shift" value="1.0"/>
< parameter key="kernel_a" value="1.0"/>
< parameter key="kernel_b" value="0.0"/>



< portSpacing port="source_training" spacing="0"/>
< portSpacing port="sink_model" spacing="0"/>
< portSpacing port="sink_through 1" spacing="0"/>

< process expanded="true">


< parameter key="create_view" value="false"/>


< parameter key="main_criterion" value="first"/>
< parameter key="accuracy" value="true"/>
< parameter key="classification_error" value="false"/>
< parameter key="kappa" value="false"/>
< parameter key="weighted_mean_recall" value="false"/>
< parameter key="weighted_mean_precision" value="false"/>
< parameter key="spearman_rho" value="false"/>
< parameter key="kendall_tau" value="false"/>
< parameter key="absolute_error" value="false"/>
< parameter key="relative_error" value="false"/>
< parameter key="relative_error_lenient" value="false"/>
< parameter key="relative_error_strict" value="false"/>
< parameter key="normalized_absolute_error" value="false"/>
< parameter key="root_mean_squared_error" value="true"/>
< parameter key="root_relative_squared_error" value="false"/>
< parameter key="squared_error" value="false"/>
< parameter key="correlation" value="false"/>
< parameter key="squared_correlation" value="false"/>
< parameter key="cross-entropy" value="false"/>
< parameter key="margin" value="false"/>
< parameter key="soft_margin_loss" value="false"/>
< parameter key="logistic_loss" value="false"/>
< parameter key="skip_undefined_labels" value="true"/>
< parameter key="use_example_weights" value="true"/>






< portSpacing port="source_model" spacing="0"/>
< portSpacing port="source_test set" spacing="0"/>
< portSpacing port="source_through 1" spacing="0"/>
< portSpacing port="sink_averagable 1" spacing="0"/>
< portSpacing port="sink_averagable 2" spacing="0"/>




< parameter key="create_view" value="false"/>









< portSpacing port="source_input 1" spacing="0"/>
< portSpacing port="sink_result 1" spacing="0"/>
< portSpacing port="sink_result 2" spacing="0"/>
< portSpacing port="sink_result 3" spacing="0"/>
< portSpacing port="sink_result 4" spacing="0"/>




Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @surya_mpad,

    You are using a Sliding Window Validation operator, which is indeed used for time series problems.

    But a priori your problem is a pure classification problem: you want to predict the class of the attribute "Type" from the values of your attributes Period-i, right?

    So you should use a Cross Validation operator together with a Performance (Classification) operator.
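What cross-validation does for a plain classification problem can be sketched outside RapidMiner as follows. This is an illustrative sketch, not the Cross Validation operator itself: the helper names, the stratified fold assignment, the Euclidean 1-NN learner, the seed, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign each row index to a fold so that every class is spread across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cross_validated_accuracy(rows, labels, n_folds=5, distance=euclidean):
    """Mean 1-NN accuracy over the folds; every row is tested exactly once."""
    accuracies = []
    for test_idx in stratified_folds(labels, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        correct = 0
        for i in test_idx:
            nearest = min(train_idx, key=lambda j: distance(rows[j], rows[i]))
            correct += labels[nearest] == labels[i]
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Made-up toy data: two well-separated classes with ten rows each.
rows = [[i * 0.1, 0.0] for i in range(10)] + [[10 + i * 0.1, 10.0] for i in range(10)]
labels = [1] * 10 + [2] * 10
print(cross_validated_accuracy(rows, labels))
```

The point of the design is that each row is scored by a model that never saw it during training, and that every fold contains all classes, which matters when there are 7 classes of possibly unequal size.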

    I don't know how you obtained an accuracy of 98.93% (on the whole dataset? Have you set "Product Id" as "id" using Set Role?); such a high result is suspicious.

    To answer your question about feature selection: indeed, you have a lot of attributes. To reduce their number (without losing precision), and thus gain simplicity, you can use the Optimize Selection (Evolutionary) operator (documentation about this algorithm here).
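To make the idea of wrapper-style feature selection concrete, here is a small Python sketch. Note that it deliberately swaps in a simpler technique: greedy forward selection scored by leave-one-out 1-NN accuracy, not the evolutionary search that Optimize Selection (Evolutionary) performs. The function names and the toy data are hypothetical.

```python
def loo_accuracy(rows, labels, columns):
    """Leave-one-out 1-NN accuracy using only the given column indices."""
    def dist(a, b):
        return sum((a[c] - b[c]) ** 2 for c in columns)

    correct = 0
    for i in range(len(rows)):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: dist(rows[i], rows[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels):
    """Greedily add the column that most improves accuracy; stop when none does."""
    remaining = list(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        score, column = max((loo_accuracy(rows, labels, selected + [c]), c)
                            for c in remaining)
        if score <= best_score:
            break
        selected.append(column)
        remaining.remove(column)
        best_score = score
    return selected, best_score


# Made-up toy data: column 0 separates the classes, column 1 is noise.
rows = [[0.0, 3.0], [0.1, 9.0], [0.2, 1.0], [5.0, 8.0], [5.1, 2.0], [5.2, 4.0]]
labels = [1, 1, 1, 2, 2, 2]
print(forward_selection(rows, labels))  # prints ([0], 1.0)
```

Both approaches share the same principle: candidate attribute subsets are judged by the validated performance of the model trained on them, so the selection should itself sit inside a validation loop to avoid an optimistic estimate.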

    On my side, on your partial dataset, I obtain with the k-NN model:

              with Optimize Selection    without Optimize Selection
    k = 1     95%                        89%
    k = 2     88%                        89%
    k = 3     89%                        89%

    ...

    You can find my process here :








    < process expanded="true">

    < parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Classification_Period\sample data.csv"/>
    < parameter key="first_row_as_names" value="false"/>

    < parameter key="0" value="Name"/>

    < parameter key="encoding" value="windows-1252"/>

    < parameter key="0" value="Product_ID.true.real.attribute"/>
    < parameter key="1" value="Period_1.true.real.attribute"/>
    < parameter key="2" value="Period_2.true.real.attribute"/>
    < parameter key="3" value="Period_3.true.real.attribute"/>
    < parameter key="4" value="Period_4.true.real.attribute"/>
    < parameter key="5" value="Period_5.true.real.attribute"/>
    < parameter key="6" value="Period_6.true.real.attribute"/>
    < parameter key="7" value="Period_7.true.real.attribute"/>
    < parameter key="8" value="Period_8.true.real.attribute"/>
    < parameter key="9" value="Period_9.true.real.attribute"/>
    < parameter key="10" value="Period_10.true.real.attribute"/>
    < parameter key="11" value="Period_11.true.real.attribute"/>
    < parameter key="12" value="Period_12.true.real.attribute"/>
    < parameter key="13" value="Period_13.true.real.attribute"/>
    < parameter key="14" value="Period_14.true.real.attribute"/>
    < parameter key="15" value="Period_15.true.real.attribute"/>
    < parameter key="16" value="Period_16.true.real.attribute"/>
    < parameter key="17" value="Period_17.true.real.attribute"/>
    < parameter key="18" value="Period_18.true.real.attribute"/>
    < parameter key="19" value="Period_19.true.real.attribute"/>
    < parameter key="20" value="Period_20.true.real.attribute"/>
    < parameter key="21" value="Period_21.true.real.attribute"/>
    < parameter key="22" value="Period_22.true.real.attribute"/>
    < parameter key="23" value="Period_23.true.real.attribute"/>
    < parameter key="24" value="Period_24.true.real.attribute"/>
    < parameter key="25" value="Period_25.true.real.attribute"/>
    < parameter key="26" value="Period_26.true.real.attribute"/>
    < parameter key="27" value="Period_27.true.real.attribute"/>
    < parameter key="28" value="Period_28.true.real.attribute"/>
    < parameter key="29" value="Period_29.true.real.attribute"/>
    < parameter key="30" value="Period_30.true.real.attribute"/>
    < parameter key="31" value="Period_31.true.real.attribute"/>
    < parameter key="32" value="Period_32.true.real.attribute"/>
    < parameter key="33" value="Period_33.true.real.attribute"/>
    < parameter key="34" value="Period_34.true.real.attribute"/>
    < parameter key="35" value="Period_35.true.real.attribute"/>
    < parameter key="36" value="Period_36.true.real.attribute"/>
    < parameter key="37" value="Type.true.integer.attribute"/>



    < parameter key="attribute_name" value="Type"/>
    < parameter key="target_role" value="label"/>

    < parameter key="Product_ID" value="id"/>



    < parameter key="attribute_filter_type" value="single"/>
    < parameter key="attribute" value="Type"/>
    < parameter key="include_special_attributes" value="true"/>




    < parameter key="k-NN.k" value="[3;10;10;linear]"/>

    < process expanded="true">

    < process expanded="true">

    < process expanded="true">

    < parameter key="k" value="10"/>



    < portSpacing port="source_training set" spacing="0"/>
    < portSpacing port="sink_model" spacing="0"/>
    < portSpacing port="sink_through 1" spacing="0"/>

    < process expanded="true">











    < portSpacing port="source_model" spacing="0"/>
    < portSpacing port="source_test set" spacing="0"/>
    < portSpacing port="source_through 1" spacing="0"/>
    < portSpacing port="sink_test set results" spacing="0"/>
    < portSpacing port="sink_performance 1" spacing="0"/>
    < portSpacing port="sink_performance 2" spacing="0"/>


    < operator activated="true" class="remember" compatibility="8.2.000" expanded="true" height="68" name="Remember" width="90" x="514" y="34">
    < parameter key="name" value="Model"/>
    < parameter key="io_object" value="Model"/>




    < portSpacing port="source_example set" spacing="0"/>
    < portSpacing port="source_through 1" spacing="0"/>
    < portSpacing port="sink_performance" spacing="0"/>



    < parameter key="name" value="Model"/>
    < parameter key="io_object" value="Model"/>






    < portSpacing port="source_input 1" spacing="0"/>
    < portSpacing port="source_input 2" spacing="0"/>
    < portSpacing port="sink_performance" spacing="0"/>
    < portSpacing port="sink_model" spacing="0"/>
    < portSpacing port="sink_output 1" spacing="0"/>
    < portSpacing port="sink_output 2" spacing="0"/>
    < portSpacing port="sink_output 3" spacing="0"/>




    < parameter key="attribute_name" value="Weight"/>
    < parameter key="sorting_direction" value="decreasing"/>



    < parameter key="k-NN.k" value="[3;10;10;linear]"/>

    < process expanded="true">

    < process expanded="true">



    < portSpacing port="source_training set" spacing="0"/>
    < portSpacing port="sink_model" spacing="0"/>
    < portSpacing port="sink_through 1" spacing="0"/>

    < process expanded="true">







    < connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>



    < portSpacing port="source_model" spacing="0"/>
    < portSpacing port="source_test set" spacing="0"/>
    < portSpacing port="source_through 1" spacing="0"/>
    < portSpacing port="sink_test set results" spacing="0"/>
    < portSpacing port="sink_performance 1" spacing="0"/>
    < portSpacing port="sink_performance 2" spacing="0"/>





    < portSpacing port="source_input 1" spacing="0"/>
    < portSpacing port="source_input 2" spacing="0"/>
    < portSpacing port="sink_performance" spacing="0"/>
    < portSpacing port="sink_model" spacing="0"/>
    < portSpacing port="sink_output 1" spacing="0"/>







    < connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
    < connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 3"/>
    < connect from_op="Optimize Parameters (Grid)" from_port="output 1" to_op="Weights to Data" to_port="attribute weights"/>



    < portSpacing port="source_input 1" spacing="0"/>
    < portSpacing port="sink_result 1" spacing="0"/>
    < portSpacing port="sink_result 2" spacing="0"/>
    < portSpacing port="sink_result 3" spacing="0"/>
    < portSpacing port="sink_result 4" spacing="0"/>
    < portSpacing port="sink_result 5" spacing="0"/>



    I hope it helps,

    Regards,

    Lionel

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @surya_mpad I want to add that your time series data appears to have very low activity and then a sudden spike in volatility. Can you account for this?

    spikes.png

  • surya_mpad Member Posts: 3 Contributor I

    Hi Thomas,

    Thanks for the reply.

    I think the time series you have drawn comes from a single attribute (period_1.0).
    As I mentioned in my post, each example represents a time series, and the task is to classify the examples into categories (the attribute 'type' is the label). So I think we need to analyze the time series within each example (please correct me if I am wrong).
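Treating each example as its own series suggests one simple form of feature construction: summarise every row by a handful of scalar features and classify on those. The Python sketch below is only an illustration of that idea; the function name `series_features` and the particular features chosen are assumptions for the example, not something taken from the attached process.

```python
from statistics import mean, pstdev


def series_features(row):
    """Summarise one series (one row of period values) as a few scalar features."""
    n = len(row)  # assumes at least two periods
    avg = mean(row)
    # Least-squares slope of value against period index 0..n-1.
    t_avg = (n - 1) / 2
    slope = (sum((t - t_avg) * (v - avg) for t, v in enumerate(row))
             / sum((t - t_avg) ** 2 for t in range(n)))
    return {
        "mean": avg,
        "std": pstdev(row),
        "min": min(row),
        "max": max(row),
        "argmax": row.index(max(row)),  # period at which the peak occurs
        "slope": slope,
    }


# Made-up example row standing in for the period attributes of one example.
print(series_features([1.0, 2.0, 3.0, 4.0]))
```

Features such as the peak position or the overall trend describe the shape of a row independently of the other rows, which fits a dataset where the label belongs to the whole series rather than to a single period.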

    Also, please keep in mind that the data was generated with a script, so it might be irregular.

    Many Thanks
    Surya
