[SOLVED] error imputing missing values using linear regression

cgkolar Member Posts: 29 Maven
edited June 2019 in Help
Hi. I assumed this would be a straightforward thing to do. I have a dataset with surprisingly few missing values, in just a few of the cases, and I want to impute them. There is an ID field in the data but no label. I set up the following process.

It appears to run, and in debug mode it shows me the regression results for each of the 26 variables, but when it gets to the end it throws this error:
Dec 6, 2011 6:05:32 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Dec 6, 2011 6:05:32 PM SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
+- Retrieve[1] (Retrieve)
+- Impute Missing Values[1] (Impute Missing Values)
subprocess 'Replacement Learning'
==> | +- Linear Regression[26] (Linear Regression)
+- Write Excel[0] (Write Excel)
Dec 6, 2011 6:05:32 PM FINER: Parameter 'send_mail' is not set. Using default ('never').
Dec 6, 2011 6:05:32 PM SEVERE: java.lang.NullPointerException
That's all I get in verbose mode. Any suggestions would be appreciated. This is my first time trying to impute missing values, so much of this is a learning exercise for me. Thanks, CK

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    in your posted XML code the last lines are missing. Can you please post your complete process setup?

    Kind regards,
    Marius
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    I replaced k-NN with a Linear Regression and I can't reproduce your errors. Since the Linear Regression can only handle real-valued or binominal labels, in the Labor example it only replaces the real-valued attributes.

    Are you using a current version of RapidMiner? If yes, the problem probably only occurs with your data, and a minimum set of data with which the error occurs would be helpful. Another helpful thing is the "Show Details" button in the error dialog you should get in debug mode. Please hit it and paste the stacktrace here.

    Cheers, Marius
  • cgkolar Member Posts: 29 Maven
    I have a small dataset, all real. What is strange is that when I look at the imputation operation and hover over the example set output of the Linear Regression operator, it shows no more missing values (they do show up on the example set input). It seems like it is somehow getting hung up trying to get out of the Impute Missing Values operator.

    Still not seeing an obvious mistake. Here is one moment of brokenness from the log window:
    Dec 7, 2011 11:35:36 AM FINE: Executing subprocess Impute Missing Values.Replacement Learning. Execution order is: [Linear Regression (Linear Regression)]
    Dec 7, 2011 11:35:36 AM FINE: Starting application 18 of operator Linear Regression
    Dec 7, 2011 11:35:36 AM FINER: Linear Regression called 18th time with input:
    training setConditionedExampleSet:
    217 examples,
    25 regular attributes,
    special attributes = {
    label = #17: i31 (real/single_value)
    }
    Dec 7, 2011 11:35:36 AM FINER: Parameter 'use_bias' is not set. Using default ('true').
    Dec 7, 2011 11:35:36 AM FINER: Parameter 'eliminate_colinear_features' is not set. Using default ('true').
    Dec 7, 2011 11:35:36 AM FINER: Parameter 'ridge' is not set. Using default ('1.0E-8').
    Dec 7, 2011 11:35:36 AM FINER: Parameter 'min_tolerance' is not set. Using default ('0.05').
    Dec 7, 2011 11:35:36 AM FINER: Parameter 'feature_selection' is not set. Using default ('M5 prime').
    Dec 7, 2011 11:35:37 AM FINE: Completed application 18 of operator Linear Regression
    Dec 7, 2011 11:35:37 AM FINER: Linear Regression returned with output:
    model 0.167 * i2
    - 0.064 * i4
    + 0.063 * i5
    + 0.120 * i14
    + 0.054 * i15
    + 0.040 * i16
    + 0.178 * i17
    - 0.040 * i20
    - 0.159 * i23
    - 0.129 * i24
    + 0.082 * i25
    - 0.037 * i28
    - 0.085 * i29
    - 0.107 * i33
    - 0.086 * i34
    - 0.031 * i37
    + 0.261 * i38
    - 0.129 * i39
    + 0.130 * i40
    + 0.078 * i41
    - 0.170 * i42
    + 2.964
    exampleSetConditionedExampleSet:
    217 examples,
    25 regular attributes,
    special attributes = {
    label = #17: i31 (real/single_value)
    }
    weights-/-
    Dec 7, 2011 11:35:37 AM FINEST: Linear Regression: execution time was 78 ms
    Dec 7, 2011 11:35:37 AM FINE: Impute Missing Values: Imputating missing values in attribute i31.
    Dec 7, 2011 11:35:37 AM WARNING: Impute Missing Values: Unable to impute 1 missing values in attribute i31.
    The missing XML ending is:

    C

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, I still can't reproduce the NullPointerException on my data. Do you get a message box which states that something went wrong? If so, there should be a button to submit a bug. Then please use the bug report wizard to report the bug to our bugtracker. There will be some information included automatically which will help us to track down the bug.

    Best regards,
    Marius

    EDIT: just saw your PM, trying with your data right now.
  • haddock Member Posts: 849 Maven
    Hi Folks,
    "The operator MissingValueImpution imputes missing values by learning models for each attribute (except the label) and applying those models to the data set."
    Models built by regression also need labels in the data they model, but....
    "There is an ID field in the data but no label."
    Just a thought

    PS MissingValueImpution should read MissingValueImputation
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi haddock,

    Generally you are right: of course a regression needs a label. The Impute Missing Values operator, however, iterates over the attributes with missing values. It temporarily defines the current attribute as the label, splits the dataset into examples with and without missing values, learns a model on the complete examples, and applies it to the examples with missing values.
    When all attributes with missing values have been treated, the original label (if present) is restored.

    Now the problem was indeed that cgkolar's dataset did not contain a label, which triggered a bug in Impute Missing Values. I just fixed that bug; the fix will be included in the next release. Until then, the process below can be used as a workaround.

    Cheers,
    Marius
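
The loop Marius describes above (treat each attribute with missing values as a temporary label, learn on the complete examples, apply the model to the incomplete ones) can be sketched outside RapidMiner. What follows is a minimal pure-Python sketch, not RapidMiner's implementation and not the workaround process mentioned in the post. The function names `fit_linear` and `impute` are hypothetical, plain least squares stands in for the Linear Regression operator (no M5 prime feature selection, no ridge), and rows whose predictors are themselves missing are simply left unfilled, which is an assumption of this sketch.

```python
import math


def fit_linear(rows, targets):
    """Least squares with intercept via the normal equations (X^T X) b = X^T y."""
    X = [list(r) + [1.0] for r in rows]  # append a constant column for the intercept
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too few rows")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def is_complete(values):
    """True if none of the values is NaN."""
    return not any(math.isnan(v) for v in values)


def impute(data):
    """Return a copy of data (list of numeric rows) with NaNs filled column by column."""
    result = [list(row) for row in data]
    n_cols = len(data[0])
    for target in range(n_cols):
        others = [c for c in range(n_cols) if c != target]
        # Training rows: temporary label known and all predictors known
        train = [r for r in data
                 if not math.isnan(r[target]) and is_complete([r[c] for c in others])]
        # Rows to fill: temporary label missing, predictors known
        todo = [i for i, r in enumerate(data)
                if math.isnan(r[target]) and is_complete([r[c] for c in others])]
        if not train or not todo:
            continue  # nothing to learn from, or nothing this model can fill
        coef = fit_linear([[r[c] for c in others] for r in train],
                          [r[target] for r in train])
        for i in todo:
            x = [data[i][c] for c in others] + [1.0]
            result[i][target] = sum(b * v for b, v in zip(coef, x))
    return result


nan = float("nan")
example = [[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, nan], [nan, 11.0]]
filled = impute(example)
print(filled[3][1], filled[4][0])
```

In the toy data the second column is exactly 2 times the first plus 1, so the two printed values come out at approximately 9.0 and 5.0. Note that no column ever needs to be a label in the input: the "label" role only exists inside the loop, which is why a dataset with an ID field and no label should be a valid input for this kind of imputation.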

  • cgkolar Member Posts: 29 Maven
    Thanks, Marius and haddock. To be honest, I am glad that it was a bug and not something going wrong in my head. I appreciate all of the attention. Problem solved. CK