Solutions to Algoritmo Lab’s Data Science Challenge – November 2021 on Linear Regression
Q1. For a good model, R-squared takes a value close to 1.0
Q2. The n-1 dummy encoding needs to be performed on all non-numeric (categorical) predictor variables
Q3. If the p-value of the F-Statistic is less than the significance level,
Ans. Data provide evidence that the regression model fits the data well
Q4. When can R-Squared be negative?
Ans. When the regression model fits worse than a horizontal line at the mean of the target (a mean-only baseline model)
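R-squared compares the model's residual sum of squares against a mean-only baseline, so a fit worse than that baseline pushes it below zero. A minimal NumPy sketch (illustrative data, not from the challenge):

```python
import numpy as np

# R-squared = 1 - SS_res / SS_tot, where SS_tot is the error of a
# mean-only baseline; a fit worse than the baseline makes it negative.
def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])  # worse than predicting the mean
print(r_squared(y, bad_pred))  # -3.0
```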
Q5. Build a linear regression model (with intercept) using 7 numeric predictors and 1 numeric target variable. Build a second model, also with an intercept, after z-score standardizing the predictors.
Ans. Both multiple and adjusted R-squared will be the same for models 1 and 2
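Because z-scoring is an invertible affine transform, the column space of the design matrix (with intercept) is unchanged, so the fitted values and hence R-squared are identical. A NumPy sketch with simulated data (all names and values assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 7
X = rng.normal(size=(n, p)) * rng.uniform(1, 10, size=p)  # mixed scales
y = X @ rng.normal(size=p) + rng.normal(size=n)

def ols_r2(X, y):
    # Fit OLS with an intercept and return R-squared.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score standardization
r2_raw, r2_std = ols_r2(X, y), ols_r2(Z, y)
print(r2_raw, r2_std)  # identical up to floating-point error
```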
Q6. A regression model with an intercept is built with X as the predictor. Replace X with Z, where Z = 2021 - X, and build a second model with an intercept.
Ans. If the coefficient of X in model 1 is 121, the coefficient of Z in model 2 is -121
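The transform Z = 2021 - X reverses and shifts the axis, so the fitted slope flips sign while the fit itself is unchanged. A NumPy sketch (simulated data; the slope of 121 is chosen to match the question):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1950, 2020, size=100)
y = 121 * x + 5 + rng.normal(size=100)

def slope(x, y):
    # Slope coefficient from an OLS fit with an intercept.
    A = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

z = 2021 - x
b_x, b_z = slope(x, y), slope(z, y)
print(b_x, b_z)  # equal magnitudes, opposite signs
```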
Q7. Is it necessary to standardize variables before using Lasso and Ridge Regression?
Ans. Yes, the L1 and L2 penalties are scale-sensitive, so predictors should be standardized first
Q8. Errors from a linear regression model should be normally distributed with zero mean. If error terms are not normally distributed, it implies
Ans. Confidence intervals will be too wide or too narrow
Q9. The parameters of a linear regression model can be estimated using
Ans. Both least squares and MLE procedure
Q10. In linear regression, we can calculate the importance of variables by ranking predictors based on the
Ans. Descending order of absolute value of the standardized coefficient.
Q11. In the linear regression model, when an interaction is created from two variables that are not centered on 0,
Ans. Some amount of collinearity will be induced
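When X and W have nonzero means, the product X·W is strongly correlated with X and W themselves; centering first removes most of that correlation. A NumPy sketch (simulated uniform variables, values assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(10, 20, size=1000)  # not centered on 0
w = rng.uniform(10, 20, size=1000)

corr_raw = np.corrcoef(x, x * w)[0, 1]           # interaction vs. X
xc, wc = x - x.mean(), w - w.mean()              # center both variables
corr_centered = np.corrcoef(xc, xc * wc)[0, 1]
print(corr_raw, corr_centered)  # large vs. near zero
```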
Q12. In the linear regression model, is it helpful to standardize a variable when you include polynomial terms like X² or X³?
Ans. Yes, standardization helps remove collinearity
Q13. Which of the following enforces sparsity in models?
Ans. L1 Norm
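The sparsity of the L1 norm comes from its proximal operator, soft-thresholding, which maps coefficients below the penalty exactly to zero. A minimal sketch:

```python
import numpy as np

# Proximal operator of the L1 norm (soft-thresholding): coefficients
# with magnitude below the penalty are set exactly to zero.
def soft_threshold(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

b = np.array([3.0, 0.4, -2.0, -0.1, 1.0])
print(soft_threshold(b, 0.5))  # 2.5, 0.0, -1.5, -0.0, 0.5
```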
Q14. The more able a model is to ignore extreme values in the data, the more robust it is. Which of the following is correct?
Ans. L1-norm is more robust than L2-norm
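The same contrast shows up in location estimation: the L2 loss is minimized by the mean, the L1 loss by the median, and only the latter shrugs off an extreme value. A tiny sketch (made-up numbers):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one extreme value
print(np.mean(data))    # 22.0 -- the L2 minimizer, dragged by the outlier
print(np.median(data))  # 3.0  -- the L1 minimizer, unaffected
```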
Q15. A closed-form solution for a linear regression model is given by β = (XᵀX)⁻¹XᵀY. If perfect multicollinearity exists, computing (XᵀX)⁻¹ may lead to
Ans. Singular matrix error
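A NumPy sketch of the failure mode (illustrative data): duplicating a predictor column makes XᵀX rank-deficient, so inversion raises a singular-matrix error.

```python
import numpy as np

# Two identical predictor columns (perfect multicollinearity) make
# X'X rank-deficient, so it cannot be inverted.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2 < 3: rank-deficient

singular = False
try:
    np.linalg.inv(XtX)
except np.linalg.LinAlgError:
    singular = True
print(singular)  # True: inversion fails with a singular-matrix error
```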
Q16. One-hot encoding can lead to multicollinearity and should be avoided in linear regression analysis
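The collinearity is easy to see: with an intercept, the full set of one-hot dummies sums to the intercept column, so the design matrix loses rank; dropping one dummy (the n-1 encoding from Q2) restores full rank. A NumPy sketch (an assumed 3-level factor):

```python
import numpy as np

cats = np.array([0, 1, 2, 0, 1, 2])   # a 3-level categorical variable
onehot = np.eye(3)[cats]              # full one-hot encoding

# Intercept + all 3 dummies: the dummies sum to the intercept column.
design_full = np.column_stack([np.ones(len(cats)), onehot])
print(np.linalg.matrix_rank(design_full))  # 3 < 4 columns: rank-deficient

# Intercept + n-1 dummies: full column rank.
design_n1 = np.column_stack([np.ones(len(cats)), onehot[:, 1:]])
print(np.linalg.matrix_rank(design_n1))    # 3 = number of columns
```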
Q17. The Durbin Watson (DW) statistic is a test for autocorrelation in the residuals of a regression model. DW can take values between 0 and 4.
Ans. A value of 2 indicates there is zero autocorrelation
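The statistic is the ratio of squared successive residual differences to squared residuals, roughly DW ≈ 2(1 − r₁) where r₁ is the lag-1 autocorrelation, so uncorrelated residuals give a value near 2. A sketch on simulated white-noise residuals:

```python
import numpy as np

def durbin_watson(resid):
    # DW = sum of squared successive differences / sum of squared residuals.
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
e = rng.normal(size=5000)   # uncorrelated (white-noise) residuals
print(durbin_watson(e))     # close to 2
```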
Q18. Ridge regression can reduce the coefficients to zero values
Ans. False. Ridge shrinks coefficients toward zero but never exactly to zero; only Lasso (the L1 penalty) produces exact zeros
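Ridge's closed-form solution β = (XᵀX + λI)⁻¹XᵀY makes the contrast with Lasso concrete: even a heavy L2 penalty shrinks every coefficient without making any of them exactly zero. A NumPy sketch (simulated data; λ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
true_beta = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
y = X @ true_beta + rng.normal(size=100)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
lam = 100.0  # L2 penalty strength
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True: shrunk
print(np.all(beta_ridge != 0))                                # True: no exact zeros
```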