question marks in linear regression output

AD2019 | Member, University Professor | Posts: 13 | University Professor
I ran a linear regression model with 18 independent variables and feature selection turned off. For some of the independent variables there were question marks for the standard error of the estimate, and therefore for the t-statistic and p-value for the coefficient. I ran the model again with feature selection turned on and got the same question marks. What do these question marks mean? They cannot have anything to do with missing values, as the regression would not have run to completion in that case. I am baffled about what these "?" symbols might mean. Help.....

Answers

  • Telcontar120 | Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 1,635 | Unicorn
    Can you post your process XML? Do you have the bias parameter or "exclude collinear features" checked in the LR operator? There are several options that can affect the output.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • AD2019 | Member, University Professor | Posts: 13 | University Professor
    Hi, I have attached my process .rmp file. 'Exclude collinear features' is unchecked, and you are correct about the bias setting: if 'use bias' is checked, I do not get question marks; if it is unchecked, I do. I did all this with 'feature selection' turned off. Something else is also strange. I then turned on feature selection and used T_Test as the selection method with alpha set to 0.05. I got a solution that included independent variables with p-values much higher than 0.05. I am confused about why these IVs were not trimmed from the output. Thanks in advance for your help.
  • AD2019 | Member, University Professor | Posts: 13 | University Professor
    By the way, regardless of the cause, I would like to know what the question mark in the regression output is trying to communicate to the user. Does it mean a computational underflow or overflow, a computational error, or something else?
  • sgenzer | Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator | Posts: 2,959 | Community Manager
    @AD2019 I'm picking up this thread here. I have your process (thank you) but not the data set, so I cannot run the process. Can you pls post it?
  • AD2019 | Member, University Professor | Posts: 13 | University Professor
    My apologies for the delay in posting the data file. Please see attached. When I run the regression without bias, I get question marks in the regression model. What does that mean? The process file was posted earlier (RM-houseprice-process.rmp).
  • sgenzer | Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator | Posts: 2,959 | Community Manager
    @AD2019 do you mean these ? marks?



    The simple answer is that ? marks are used in RapidMiner when values are missing. The better question is why they are missing... my educated guess here (pls correct me @varunm1 @mschmitz if my stats are wrong) is that there can be no std coefficient or tolerance for the intercept of a LinReg model, as it's a computed value. All of your actual data (the other attributes) have std coefficients, which makes sense. But my stats are a wee bit rusty, so I look to these other smart folks to correct me. ;)

    Scott

  • AD2019 | Member, University Professor | Posts: 13 | University Professor
    Hi Scott:
    If you run the process with bias turned off, you will get question marks for some of the independent variables as well, not just the intercept. Since there is a question mark on the standard error for these variables, the t-statistic and p-value also have question marks. So it is not just an issue of the intercept. The data set does not have missing values, so I could not figure out what the question marks were trying to say. The only thing I could think of was numerical overflow or underflow when calculating the standard error of the associated variable, but then I could not see how the coefficients would have been computed.
    Amit
  • sgenzer | Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator | Posts: 2,959 | Community Manager
    Hi Amit -

    Ah I understand. Good point. It's been a while since I've played with all of this (we normally use the GLM modeler instead of LinReg as it is far more versatile and robust). Let me investigate.

    Scott

  • AD2019 | Member, University Professor | Posts: 13 | University Professor
    Thanks, Scott. Let me play around with GLM and see if I can get rid of the ? marks.
  • AD2019 | Member, University Professor | Posts: 13 | University Professor
    Thank you, Varun.
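
Editor's sketch, for readers following the point made in the thread that a question mark on the standard error carries over to the t-statistic and p-value. This is not RapidMiner's code, and it does not establish why the standard errors were missing for this data set. It only assumes the usual textbook chain: the standard error is the square root of a diagonal entry of the estimated coefficient covariance matrix, t = coefficient / standard error, and the p-value is derived from t. The names `coefficient_row` and `show` are hypothetical, and the p-value uses a normal approximation rather than the t distribution. If the variance entry is unusable (for example negative or non-finite, which round-off can produce when a near-singular matrix is inverted), the standard error is NaN and so is everything computed from it, even though the coefficient itself is a perfectly ordinary number.

```python
import math


def coefficient_row(coef, variance):
    """Return (std_error, t_stat, p_value) for one coefficient.

    `variance` is the matching diagonal entry of the estimated coefficient
    covariance matrix. If it is not a finite positive number, the standard
    error is treated as undefined (NaN).
    """
    if math.isfinite(variance) and variance > 0:
        std_error = math.sqrt(variance)
    else:
        std_error = math.nan
    # NaN propagates through arithmetic: coef / NaN is NaN.
    t_stat = coef / std_error
    # ASSUMPTION: two-sided p-value from a normal approximation,
    # used here only to keep the sketch free of third-party libraries.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return std_error, t_stat, p_value


def show(value):
    """Render a number the way the thread describes: '?' for a missing value."""
    return "?" if math.isnan(value) else f"{value:.3f}"


if __name__ == "__main__":
    # One well-behaved coefficient and one with an unusable variance entry.
    for coef, variance in [(2.0, 0.25), (2.0, -1e-9)]:
        se, t, p = coefficient_row(coef, variance)
        print(show(coef), show(se), show(t), show(p))
```

In the second row the coefficient prints normally while the other three columns print as "?", which matches the pattern described in the thread: a coefficient can be reported even when its standard error is not.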