Logistic Regression - Model Training Error (H2O) NullPointerException

rtbarber (Member, University Professor)
edited November 2018 in Help

I have a model with a binomial outcome variable. I built it using a decision tree, but would like to try a few other model types. Logistic Regression is the obvious next choice, but when I run it I get: "Model training error (H2O). Error while training the H2O model java.lang.NullPointerException. Please check your input data and the parameter setup."

The process is relatively simple: after selecting an even sample, I have an Optimize Selection (Evolutionary) node, within which is a Validation node, and within that is the Logistic Regression node. The input data is a mix of numeric (integer, real, numeric) and polynomial attributes with no missing values. There are a LOT of attributes (253) but only 1486 records; the Optimize Selection is intended to prune that down. (The decision tree used 4 of those variables.)
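With many polynomial attributes and relatively few records, one rough sanity check is how wide the model matrix becomes once nominal attributes are dummy-encoded. Logistic-regression p-values come from standard errors computed from the inverse of X^T W X, and that matrix cannot be inverted when the model matrix has more columns than rows. The sketch below is a hypothetical helper, not RapidMiner or H2O API: it assumes the data is a list of dicts, treats string values as nominal, and assumes drop-first dummy encoding plus an intercept (H2O's internal encoding may differ). Whether this is exactly what triggered the NullPointerException here is also an assumption.

```python
from typing import Dict, List, Union

Value = Union[str, int, float]


def design_matrix_width(rows: List[Dict[str, Value]]) -> int:
    """Count model-matrix columns: intercept + numeric attributes + dummies.

    Assumption: an attribute whose values are all strings is nominal and is
    dummy-encoded with one level dropped (k levels -> k - 1 columns).
    """
    width = 1  # intercept column
    if not rows:
        return width
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, str) for v in values):
            width += len(set(values)) - 1  # nominal: k - 1 dummy columns
        else:
            width += 1  # numeric: one column
    return width


def has_enough_rows_for_p_values(rows: List[Dict[str, Value]]) -> bool:
    """Necessary (not sufficient) condition for X^T W X to be invertible.

    More columns than rows guarantees a singular matrix; passing this check
    does not rule out singularity from collinear columns.
    """
    return design_matrix_width(rows) <= len(rows)


if __name__ == "__main__":
    tiny = [{"dept": "a", "age": 30}, {"dept": "b", "age": 40}, {"dept": "c", "age": 50}]
    print(design_matrix_width(tiny), has_enough_rows_for_p_values(tiny))
```

For the 253-attribute, 1486-record data described above, purely numeric attributes would stay well under the record count; the check only fails when nominal attributes contribute many levels, for example an ID-like attribute that takes a different value in nearly every record.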

Any insights as to what this error means or what is going wrong? Is there a log file somewhere I need to be looking at?


Answers

  • Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member)

    Can you post the XML of the process?

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)

    And, if possible and not confidential, maybe even a sample of the data causing this error? We can also sort this out via private message if you prefer. We changed the underlying behavior of the Logistic Regression model recently, and despite our tests there may always be some specifics of the data that we did not capture ourselves...

    Thanks,

    Ingo

  • rtbarber (Member, University Professor)

    Happy to help! I am running 7.2 on a Mac (just installed this morning). I've anonymized the data and can share it via PM. (I have no problem with your team having access to it, but it could be a problem if it got out into the wild.) In the meantime, I've attached a simplified XML process that reads the CSV of the data I will share (which is the post-ETL version). The error is unchanged, so it wasn't caused by any of that mess.

    Attachment: HRRet.xml (28.9 KB)
  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)

    Perfect, thanks for your help. We will of course keep the data confidential and will have a look into this. Our engineers will keep you posted on the progress...

    Thanks again,

    Ingo

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)

    Hi again,

    I got your process running if I de-select "compute p-values". So if you do not need them at this point this is at least a workaround until we publish a patch...

    Best,

    Ingo

  • rtbarber (Member, University Professor)

    Thanks, Ingo. That worked as an interim solution while I am developing the model. I look forward to the patch!

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)

    Hi,

    Just a quick update on this: we are working on it right now and will get back to you as soon as we know when the patch will be released.

    Best,

    Ingo

  • phellinger (Employee, Member, RM Engineering)

    Hi,

    Here is the update: the Studio 7.3 release fixes this error.

    For earlier versions, the workaround is to disable "compute p-values".

    The fix in the release actually does something similar: it disables the p-value computation when one or more regular attributes have too many distinct nominal values (on the order of a couple thousand). Having such regular attributes may indicate that some feature selection or engineering is required before training, even if the algorithm can deal with input like this.
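    To spot the kind of attribute described above before training, one can count distinct values per nominal attribute and flag the ones above a cutoff. This is a minimal Python sketch, assuming the data is a list of dicts and that string-valued attributes are nominal; the function name is hypothetical, and the default threshold of 2000 is only a stand-in for "a couple thousand", not Studio's actual internal cutoff.

```python
from typing import Dict, List, Union

Value = Union[str, int, float]


def high_cardinality_nominals(
    rows: List[Dict[str, Value]], threshold: int = 2000
) -> Dict[str, int]:
    """Map each nominal attribute with more than `threshold` distinct values
    to its distinct-value count.

    Assumptions: an attribute whose values are all strings is nominal, and the
    default threshold is a hypothetical stand-in, not RapidMiner's cutoff.
    """
    flagged: Dict[str, int] = {}
    if not rows:
        return flagged
    for name in rows[0]:
        values = [row[name] for row in rows]
        if not all(isinstance(v, str) for v in values):
            continue  # numeric attribute: not dummy-encoded, so skip it
        distinct = len(set(values))
        if distinct > threshold:
            flagged[name] = distinct
    return flagged


if __name__ == "__main__":
    rows = [{"id": f"r{i}", "dept": "a" if i % 2 else "b", "age": i} for i in range(10)]
    print(high_cardinality_nominals(rows, threshold=5))
```

    Attributes it flags are candidates for removal, grouping of rare values, or other feature engineering before the Logistic Regression step, in line with the suggestion above.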

    Best,

    Peter

  • Shaila_Segal (Member)
    I am using RapidMiner 9.6.000 and I am having the same issue, the Model training error (H2O), with my logistic regression function. I have tried unchecking "compute p-values" and it hasn't fixed it. Is there something else I can do in the newest version?
  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator)
    @Shaila_Segal this seems like a duplicate post?