Topic 1. Heteroskedasticity and Its Consequences
Topic 2. Detecting and Correcting Heteroskedasticity
Topic 3. Multicollinearity and Its Consequences
Topic 4. Detecting and Correcting Multicollinearity
Q1. Effects of conditional heteroskedasticity include which of the following problems?
I. The coefficient estimates in the regression model are biased.
II. The standard errors are unreliable.
A. I only.
B. II only.
C. Both I and II.
D. Neither I nor II.
Explanation: B is correct.
Effects of conditional heteroskedasticity include the following:
(1) the standard errors are usually unreliable estimates, and
(2) the coefficient estimates are not affected.
Testing for conditional heteroskedasticity:
Step 1: Estimate the regression with OLS and compute the residuals.
Step 2: Regress the squared residuals on the independent variables.
Step 3: Compare the test statistic (n × R² from the regression in Step 2) to the chi-squared critical value; if it exceeds the critical value, reject the null hypothesis of no conditional heteroskedasticity.
If conditional heteroskedasticity is detected, we can conclude that the coefficients are unaffected but the standard errors are unreliable. In such a case, revised (White) standard errors should be used in hypothesis testing instead of the standard errors from the OLS estimation procedure.
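A minimal sketch of this test in Python, assuming statsmodels is available; the simulated data and variable names are illustrative only:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

# Illustrative data: the error variance grows with x1 (conditional heteroskedasticity).
rng = np.random.default_rng(42)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
y = 2.0 + 0.5 * x1 + 1.5 * x2 + rng.normal(scale=x1, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
ols = sm.OLS(y, X).fit()

# White's test regresses the squared residuals on the regressors (plus their
# squares and cross-products) and compares n*R^2 to a chi-squared critical value.
lm_stat, lm_pvalue, _, _ = het_white(ols.resid, X)
print(f"White test statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")

# If the null of no conditional heteroskedasticity is rejected, re-estimate with
# White (heteroskedasticity-robust) standard errors; the coefficients are unchanged.
robust = sm.OLS(y, X).fit(cov_type="HC0")
print(robust.bse)  # robust standard errors
```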
Q2. Der-See Hsu, researcher for Xiang Li Quant Systems, is using a multiple regression model to forecast currency values. Hsu determines that the chi-squared statistic, calculated using the R² of the regression with the squared residuals as the dependent variable, exceeds the chi-squared critical value. Which of the following is
the most appropriate conclusion for Hsu to reach?
A. Hsu should estimate the White standard errors for use in hypothesis testing.
B. OLS estimates and standard errors are consistent, unbiased, and reliable.
C. OLS coefficients are biased but standard errors are reliable.
D. A linear model is inappropriate to model the variation in the dependent variable.
Explanation: A is correct.
Hsu’s test results indicate that the null hypothesis of no conditional
heteroskedasticity should be rejected. In such a case, the OLS estimates of the standard errors would be unreliable, and Hsu should estimate White-corrected standard errors for use in hypothesis testing. The coefficient estimates would still be reliable (i.e., unbiased and consistent).
Definition: Multicollinearity occurs when two or more independent variables are highly correlated with each other.
Perfect Collinearity: One variable is an exact linear combination of the others (this violates an OLS assumption).
Consequences:
Coefficient estimates remain unbiased, but:
Standard errors increase.
t-values decrease, making significant variables appear insignificant.
Leads to Type II errors (failing to reject a false null).
Signs of Multicollinearity:
A significant F-statistic (and a high R²) even though the individual t-statistics are insignificant.
A high variance inflation factor (VIF > 10) for one or more of the independent variables.
Q3. Ben Strong recently joined Equity Partners as a junior analyst. Within a few weeks, Strong successfully modeled the price movement of a hot stock using a multiple regression model. Beth Sinclair, Strong's supervisor, is in charge of evaluating the results of Strong's model. What is the most appropriate conclusion for Sinclair based
on the variance inflation factor (VIF) for each of the explanatory variables included in Strong's model as shown here?
A. Variables X1 and X2 are highly correlated and should be combined into one variable.
B. Variable X3 should be dropped from the model.
C. Variable X2 should be dropped from the model.
D. Variables X1 and X2 are not statistically significant.
Explanation: C is correct.
VIF > 10 for independent variable X2 indicates that it is highly correlated with the other two independent variables in the model, a sign of multicollinearity. One approach to overcoming the problem of multicollinearity is to drop the highly correlated variable.
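A minimal sketch of a VIF check in Python, assuming statsmodels; the data and variable names are illustrative only:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: x3 is constructed to be highly correlated with x1 and x2.
rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.7 * x1 + 0.6 * x2 + rng.normal(scale=0.05, size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the
# remaining regressors; VIF > 10 is the usual multicollinearity flag.
for j, name in enumerate(["const", "x1", "x2", "x3"]):
    print(f"VIF({name}) = {variance_inflation_factor(X, j):.1f}")
```

Dropping (or combining) the high-VIF variable and re-estimating is one common remedy, as the explanation above notes.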
Q4. Which of the following statements regarding multicollinearity is least accurate?
A. Multicollinearity may be present in any regression model.
B. Multicollinearity is not a violation of a regression assumption.
C. Multicollinearity makes it difficult to determine the contribution of an individual explanatory variable to the explanation of the dependent variable.
D. If the t-statistics for the individual independent variables are insignificant, yet the F-statistic is significant, this indicates the presence of multicollinearity.
Explanation: A is correct.
Multicollinearity cannot be present in a simple regression (one with a single independent variable), so it may not appear in just any regression model. While perfect collinearity is a violation of a regression assumption, the presence of multicollinearity is not. Divergence between the t-tests and the F-test is one way to detect the presence of multicollinearity. Multicollinearity makes it difficult to precisely measure the contribution of an individual independent variable toward explaining the
variation in the dependent variable.
Topic 1. Model Specification
Topic 2. Omitted Variable Bias
Topic 3. Bias-Variance Tradeoff
Topic 4. Residual Plots
Topic 5. Identifying Outliers
Topic 6. The Best Linear Unbiased Estimator (BLUE)
Omitted variable bias arises when both of the following conditions are met:
the omitted variable is correlated with other independent variables in the model, and
the omitted variable is a determinant of the dependent variable.
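A minimal simulation of this bias in Python (numpy/statsmodels); the model, coefficients, and variable names are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm

# True model: y = 1 + 2*x1 + 3*x2 + e, where x2 satisfies both conditions above.
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)        # correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)   # a determinant of y

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
omitted = sm.OLS(y, sm.add_constant(x1)).fit()

print("coef on x1, full model: ", round(full.params[1], 2))     # ~2.0 (unbiased)
print("coef on x1, x2 omitted: ", round(omitted.params[1], 2))  # ~2 + 3*0.8 = 4.4 (biased)
```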
Q5. The omitted variable bias results from:
A. exclusion of uncorrelated independent variables.
B. inclusion of uncorrelated independent variables.
C. inclusion of correlated independent variables.
D. exclusion of correlated independent variables.
Explanation: D is correct.
Omitted variable bias results from excluding a relevant independent variable (i.e., a determinant of the dependent variable) that is correlated with the other independent variables in the model.
Q6. Which of the following statements about bias-variance tradeoff is most accurate?
A. Models with a large number of independent variables tend to have a high bias error.
B. High variance error results when the out-of-sample R² of a regression is high.
C. Models with fewer independent variables tend to have a high variance error.
D. General-to-specific model is one approach to resolve the bias-variance tradeoff.
Explanation: D is correct.
Larger, overfit models have a low bias error (high in-sample R²) but a high variance error (low out-of-sample R²). Smaller, parsimonious models have a higher bias error (lower in-sample R²) but a lower variance error. Two ways to resolve the bias-variance tradeoff are the general-to-specific model approach and m-fold cross-validation.
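A minimal sketch of m-fold cross-validation using scikit-learn (the library choice is an assumption; the source names no tool), with illustrative data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative data: y depends on only the first 2 of 10 candidate regressors.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 10))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Compare a parsimonious model against an overfit one by average out-of-sample
# R-squared across m = 5 folds; the extra noise variables add variance error.
m_folds = KFold(n_splits=5, shuffle=True, random_state=1)
for k in (2, 10):
    scores = cross_val_score(LinearRegression(), X[:, :k], y, cv=m_folds, scoring="r2")
    print(f"{k} regressors: mean out-of-sample R^2 = {scores.mean():.3f}")
```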
Q7. Evaluate the following statements:
I. A high value of Cook’s distance indicates the presence of an outlier.
II. Cook’s distance is inversely related to the squared residuals.
A. Both statements are correct.
B. Only Statement I is correct.
C. Only Statement II is correct.
D. Both statements are incorrect.
Explanation: A is correct.
Both statements are correct. A high value of Cook's distance for an observation (greater than 1) indicates that it is an outlier. The squared residuals (through the mean squared error, s²) appear in the denominator of the computation of Cook's distance and, hence, are inversely related to the measure.
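A minimal sketch of flagging outliers with Cook's distance in Python, assuming statsmodels; the data are illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data with one planted high-influence outlier.
rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 8.0, -10.0  # far from the other points and far off the line

results = sm.OLS(y, sm.add_constant(x)).fit()

# Cook's distance for each observation; D_i > 1 is the usual outlier flag.
cooks_d, _ = results.get_influence().cooks_distance
print("flagged outliers:", np.where(cooks_d > 1.0)[0])  # expect: [0]
```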