[SOLVED] error imputing missing values using linear regression
Hi. I was assuming that this would be straightforward thing to do. I have a dataset with surprisingly few missing values in just a few of the cases, I want to compute the missing values. There is an ID field in the data but no label. I set up the following process.
It appears to run, and when I run in debug mode it shows me the regression results for each of the 26 variables, but it appears to get to the end and throws me this error:
That's all I get in verbose mode. Any suggestions would be appreciated, this is my first time trying to impute missing values so much of this is a learning exercise for me. Thanks, CK
Dec 6, 2011 6:05:32 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Dec 6, 2011 6:05:32 PM SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
+- Retrieve[1] (Retrieve)
+- Impute Missing Values[1] (Impute Missing Values)
subprocess 'Replacement Learning'
==> | +- Linear Regression[26] (Linear Regression)
+- Write Excel[0] (Write Excel)
Dec 6, 2011 6:05:32 PM FINER: Parameter 'send_mail' is not set. Using default ('never').
Dec 6, 2011 6:05:32 PM SEVERE: java.lang.NullPointerException
Tagged:
0
Answers
in your posted XML code the last lines are missing. Can you please post you complete process setup?
Kind regards,
Marius
Are you using a current version of RapidMiner? If yes, the problem probably only occurs with your data, and a minimum set of data with which the error occurs would be helpful. Another helpful thing is the "Show Details" button in the error dialog you should get in debug mode. Please hit it and paste the stacktrace here.
Cheers, Marius
Still not seeing an obvious mistake. Here is one moment of brokenness from the log window: The missingXML ending is:
C
Best regards,
Marius
EDIT: just saw your PN, trying with your data right now.
PS MissingValueImpution should read MissingValueImputation
generally you are right, of course a regression needs a label. The Impute Missing Values operator however iterates attributes with missing values. It temporarily defines the current attribute as label, splits the dataset in examples with and without missing values, learns a model on the complete examples and applies it on the examples with missing values.
When all attributes with missing values have been treated, the original label (if present) is restored.
Now the problem was indeed that the cgkolar's dataset did not contain a label, because there was a bug in Impute Missing Values. I just fixed that bug, the fix will be included in the next release. Until then, the process below can be used as a workaround.
Cheers,
Marius