Polynomial regression gives wrong results (?)

phivu (Member), Posts: 34, Guru
edited September 2019 in Help

Hi RapidMiner,

I'm trying to use Polynomial Regression on a dataset generated from the function y = 2*x^2 + 3*x + 1, testing the model on the same dataset, but the predictions look like a straight line (see the attached picture). All parameters are left at their defaults (max iterations = 5000, replication factor = 1, max degree = 5). Could you show me how to get a result that captures the quadratic curve using Polynomial Regression? I have pasted the dataset below and included the process code at the end. Thank you very much!

x y
1 6
2 15
3 28
4 45
5 66
6 91
7 120
8 153
9 190
10 231
11 276
12 325
13 378
14 435
15 496
16 561
17 630
18 703
19 780
20 861
21 946
22 1035
23 1128
24 1225

Polynomial-regression-result.jpg








<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>


<parameter key="repository_entry" value="Polynomial"/>


<parameter key="attribute_name" value="y"/>
<parameter key="target_role" value="label"/>



<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="x"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="method" value="range transformation"/>
<parameter key="min" value="-1.0"/>
<parameter key="max" value="1.0"/>


<parameter key="max_iterations" value="5000"/>
<parameter key="replication_factor" value="1"/>
<parameter key="max_degree" value="5"/>
<parameter key="min_coefficient" value="-100.0"/>
<parameter key="max_coefficient" value="100.0"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>


<parameter key="repository_entry" value="Polynomial"/>


<parameter key="attribute_name" value="y"/>
<parameter key="target_role" value="label"/>



<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="x"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="method" value="range transformation"/>
<parameter key="min" value="-1.0"/>
<parameter key="max" value="1.0"/>


<parameter key="missing_attribute_handling" value="proceed on missing"/>



<parameter key="create_view" value="false"/>



<parameter key="create_view" value="false"/>










<connect from_op="Apply Model" from_port="labelled data" to_op="Apply Model (2)" to_port="unlabelled data"/>










Best Answers

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor), Posts: 1,751, RM Founder
    Solution Accepted

    Hi,

    You need to set the "replication factor" to 2; otherwise the attribute "x" will only be used a single time. I also recommend increasing "max iterations" for small data sets.

    Here is a process that generates the same data set and trains the model. For this process I also restricted the max degree to 2 and the coefficients to a range between 1 and 3, so it is pretty much forced to learn your function. In reality, you would most likely use a larger range for the coefficients and the degrees, of course...

    Hope this helps,

    Ingo
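
To illustrate why a replication factor of 1 cannot capture the curve, here is a minimal Python sketch. It assumes, based on the explanation above, that with replication factor 1 the model uses x only once, i.e. y = a*x^d + c, while replication factor 2 allows two terms, y = w1*x^d1 + w2*x^d2 + c. This is not RapidMiner's implementation: the operator searches for coefficients and degrees iteratively, whereas the sketch uses a closed-form least-squares fit over integer degrees 1 to 5, and the function names are made up for illustration.

```python
# Same data as in the question: x = 1..24, y = 2*x^2 + 3*x + 1
xs = list(range(1, 25))
ys = [2 * x**2 + 3 * x + 1 for x in xs]


def best_single_term(xs, ys, degree):
    """Least-squares a and c for y = a * x**degree + c; returns (a, c, sse)."""
    zs = [x**degree for x in xs]
    n = len(xs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    cov = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    var = sum((z - z_mean) ** 2 for z in zs)
    a = cov / var
    c = y_mean - a * z_mean
    sse = sum((a * z + c - y) ** 2 for z, y in zip(zs, ys))
    return a, c, sse


# Assumed model shape for replication factor 1: x appears in one term only.
single_term_sse = {d: best_single_term(xs, ys, d)[2] for d in range(1, 6)}


def two_term(x, w1=3.0, d1=1, w2=2.0, d2=2, c=1.0):
    """Assumed model shape for replication factor 2: x appears in two terms."""
    return w1 * x**d1 + w2 * x**d2 + c


two_term_sse = sum((two_term(x) - y) ** 2 for x, y in zip(xs, ys))

for d, sse in single_term_sse.items():
    print(f"single term, degree {d}: squared error = {sse:.1f}")
print(f"two terms (3*x + 2*x^2 + 1): squared error = {two_term_sse}")
```

Under these assumptions every single-term fit leaves a residual error, while the two-term form reproduces the data exactly. The two-term coefficients (3, 2) and the degrees (1, 2) also fall inside the restricted ranges used in the process below.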












    <parameter key="number_examples" value="24"/>
    <parameter key="number_of_attributes" value="1"/>



    <parameter key="attribute_name" value="id"/>



    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    <parameter key="include_special_attributes" value="true"/>


    <parameter key="old_name" value="id"/>
    <parameter key="new_name" value="x"/>




    <parameter key="y" value="2*x^2+3*x+1"/>



    <parameter key="attribute_name" value="y"/>
    <parameter key="target_role" value="label"/>

















    <parameter key="max_iterations" value="100000"/>
    <parameter key="replication_factor" value="2"/>
    <parameter key="max_degree" value="2"/>
    <parameter key="min_coefficient" value="1.0"/>
    <parameter key="max_coefficient" value="3.0"/>







    <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>







  • yyhuang (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member), Posts: 363, RM Data Scientist
    Solution Accepted

    It is interesting to test the polynomial regression model. This may also be a good chance to try RapidMiner's evolutionary optimization algorithm YAGGA (Yet Another Generating Genetic Algorithm). In short, YAGGA generates new attributes using combinations of math functions: +, -, *, /, power function, etc.

    The attached sample process shows that YAGGA kept the original attribute x and also generated a new attribute for x^2. You can apply linear regression to the attributes constructed by YAGGA.
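
The idea of generating attributes and then fitting a linear model can be sketched in plain Python. This is only a toy stand-in for YAGGA: instead of a genetic algorithm it tries every pair from a small, hand-picked pool of generated attributes (the pool and all function names are hypothetical) and keeps the pair whose linear regression has the lowest squared error.

```python
from itertools import combinations

xs = [float(x) for x in range(1, 25)]
ys = [2 * x**2 + 3 * x + 1 for x in xs]

# Hypothetical pool of attributes generated from x (YAGGA would evolve these).
candidates = {
    "x": lambda x: x,
    "x*x": lambda x: x * x,
    "x*x*x": lambda x: x * x * x,
    "1/x": lambda x: 1.0 / x,
}


def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def linear_regression(columns, ys):
    """Least squares with intercept via normal equations; returns (weights, sse)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    weights = solve(ata, aty)
    sse = sum(
        (sum(w * v for w, v in zip(weights, r)) - y) ** 2 for r, y in zip(rows, ys)
    )
    return weights, sse


results = {}
for names in combinations(candidates, 2):
    columns = [[candidates[name](x) for x in xs] for name in names]
    results[names] = linear_regression(columns, ys)

best = min(results, key=lambda names: results[names][1])
best_weights, best_sse = results[best]
print("best attribute pair:", best)
print("intercept and weights:", [round(w, 6) for w in best_weights])
```

On this data the selected pair is the original attribute x together with the generated attribute x*x, and the fitted intercept and weights come out close to 1, 3 and 2, matching the generating function up to floating-point error.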








    <parameter key="notification_email" value=""/>
    <parameter key="encoding" value="SYSTEM"/>




    <parameter key="number_examples" value="24"/>
    <parameter key="number_of_attributes" value="1"/>



    <parameter key="attribute_name" value="id"/>



    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    <parameter key="include_special_attributes" value="true"/>


    <parameter key="old_name" value="id"/>
    <parameter key="new_name" value="x"/>




    <parameter key="y" value="2*x^2 + 3*x + 1"/>



    <parameter key="attribute_name" value="y"/>
    <parameter key="target_role" value="label"/>

















    <parameter key="population_size" value="100"/>
    <parameter key="maximum_number_of_generations" value="10"/>
    <parameter key="use_plus" value="false"/>
    <parameter key="reciprocal_value" value="false"/>
    <parameter key="tournament_size" value="0.8"/>
    <parameter key="keep_best_individual" value="true"/>


    <parameter key="local_random_seed" value="10"/>

    <operator activated="true" class="linear_regression" compatibility="7.3.000" expanded="true" height="103" name="Linear Regression" width="90" x="109" y="30"/>











    <parameter key="root_mean_squared_error" value="false"/>
    <parameter key="root_relative_squared_error" value="true"/>



















    <operator activated="true" class="linear_regression" compatibility="7.3.000" expanded="true" height="103" name="Linear Regression (2)" width="90" x="447" y="34"/>
    <connect from_op="Prepare Data" from_port="1" to_op="YAGGA" to_port="example set in"/>













  • phivu (Member), Posts: 34, Guru
    Solution Accepted

    It works now, after I set the replication factor to 2. Thank you!
