LOYOLA COLLEGE (AUTONOMOUS), CHENNAI – 600 034
M.Sc. DEGREE EXAMINATION – STATISTICS
FOURTH SEMESTER – APRIL 2006
ST 4954 – APPLIED REGRESSION ANALYSIS
Date & Time : 27-04-2006/9.00-12.00 Dept. No. Max. : 100 Marks
SECTION – A
Answer ALL the Questions (10 X 2 = 20 marks)
- State the statistic for testing the overall fit of a linear model with ‘k’ regressors.
- Define ‘Extra Sum of Squares’.
- Define ‘Studentized’ Residuals.
- What is a ‘Variance Stabilizing Transformation’?
- State the consequence of using OLS in a situation when GLS is required.
- Define ‘Variance Inflation Factor’.
- Give the form of the Ridge Estimate when a constant ‘λ’ is added to the diagonal elements of X’X.
- What is a hierarchical polynomial regression model?
- Mention the components of a ‘Generalized Linear Model’ (GLM).
- Define ‘Sensitivity’ of a Binary Logit Model.
SECTION – B
Answer any FIVE Questions (5 X 8 = 40 marks)
- The following table gives the data on four independent variables used to build a linear model with an intercept for a dependent variable.
X1 | 2 1 5 4 -2 3 2 -3 2 1 |
X2 | -1 4 2 3 -2 3 2 3 2 1 |
X3 | 3 2 -3 1 4 -1 2 5 -2 -3 |
X4 | 5 3 4 -1 2 1 4 1 3 -2 |
If one wishes to test the hypothesis H0: b1 = b3, b2 = 2b4, write down the reduced data matrix and the restricted model under H0. Briefly indicate the test procedure.
- Depict the different possibilities that occur when the residuals are plotted against the fitted values. How are they interpreted?
- Define ‘Standardized Regression Coefficient’ and discuss any one method of scaling the variables.
- Decide whether “Y = b0 + b1X” or “Y^(1/2) = b0 + b1X” is the more appropriate model for the following data:
X | 1 | 2 | 3 | 4 |
Y | 3.5 | 4.7 | 6.5 | 9.2 |
- Discuss the issue of ‘multicollinearity’ and its ill-effects.
- Fill up the missing entries in the following table and investigate the presence of collinearity in the data, indicating which variables are involved in collinear relationships, if any.
Eigenvalues of X’X | Singular Values of X | Condition Indices | Variance Decomposition Proportions: X1 X2 X3 X4 X5 X6 |
3.4784 | ? | ? | 0.0003 0.0005 0.0004 0.0004 ? 0.0350 |
2.1832 | ? | ? | ? 0.0031 0.0001 0.3001 0.0006 0.0018 |
1.4548 | ? | ? | 0.0004 ? 0.0005 0.0012 0.0032 0.2559 |
0.9404 | ? | ? | 0.0011 0.6937 0.5010 0.0002 0.7175 ? |
0.2204 | ? | ? | 0.0100 0.0000 ? 0.0003 0.0083 0.2845 |
0.0725 | ? | ? | 0.8853 0.3024 0.4964 ? 0.2172 0.0029 |
- Explain ‘Cubic Spline’ fitting.
- Describe the components of a GLM. Show how the log link arises naturally in modeling a Poisson (Count) response variable.
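As a pointer for the last question of this section, the usual argument writes the Poisson probability function in exponential-family form and reads off the canonical parameter. The block below is only a sketch of that standard derivation; the symbols \mu, \theta, b(\theta), \eta_i and \beta are notation assumed here and do not appear in the paper.

```latex
\begin{align*}
f(y;\mu) &= \frac{e^{-\mu}\mu^{y}}{y!}
          = \exp\{\, y\log\mu - \mu - \log y! \,\}, \qquad y = 0, 1, 2, \dots \\
\theta   &= \log\mu, \qquad b(\theta) = e^{\theta}, \qquad
            E(Y) = b'(\theta) = e^{\theta} = \mu \\
g(\mu)   &= \theta = \log\mu
            \quad\Longrightarrow\quad \log\mu_i = \eta_i = x_i'\beta
\end{align*}
```

Since the canonical parameter is log \mu, equating it to the linear predictor gives the log link, which also keeps every fitted mean positive.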
SECTION – C
Answer any TWO Questions (2 X 20 = 40 marks)
- The observed and predicted values of a response variable (based on a model using 25 data points) and the diagonal elements of the ‘Hat’ matrix are given below:
Yi | 16.68 11.50 12.03 14.88 13.75 18.11 8.00 17.83 79.24 21.50 40.33 21.0 13.5 |
Yi^ | 21.71 10.35 12.08 9.96 14.19 18.40 7.16 16.67 71.82 19.12 38.09 21.59 12.47 |
hii | 0.102 0.071 0.089 0.058 0.075 0.043 0.082 0.064 0.498 0.196 0.086 0.114 0.061 |
Yi | 19.75 24.00 29.00 15.35 19.00 9.50 35.10 17.90 52.32 18.75 19.83 10.75 |
Yi^ | 18.68 23.33 29.66 14.91 15.55 7.71 40.89 20.51 56.01 23.36 24.40 10.96 |
hii | 0.078 0.041 0.166 0.059 0.096 0.096 0.102 0.165 0.392 0.041 0.121 0.067 |
Compute the PRESS statistic and R²(prediction). Comment on the predictive power of the underlying model.
- (a) In a study on the mileage performance of cars, three brands of cars (A, B and C) and two types of fuel (OR and HG) were used. The speed of driving was also observed and the data are reported below:
Mileage (Y) | 14.5 12.6 13.7 15.8 16.4 13.9 14.6 16.7 11.8 15.3 16.8 17.0 15.0 16.5 |
Speed | 45 60 50 60 55 52 59 50 40 53 62 56 62 55 |
Car | A B C B A A C A B B C C A B |
Fuel | OR HG OR HG HG OR HG OR OR HG HG OR HG OR |
Create the data matrix so as to build a model with an intercept term and interaction terms between Fuel and Driving Speed and also between Car-type and Driving Speed.
(You need not build any model).
(b) Discuss GLS and obtain an expression for the GLS estimate. (14 + 6)
- Based on a sample of size 15, a linear model is to be built for a response variable Y with four regressors X1,…,X4. Carry out the Forward Selection Process to decide which of the regressors would finally be significant for Y, given the following information:
SST = 543.15, SSRes(X1) = 253.14, SSRes(X2) = 181.26, SSRes(X3) = 387.88, SSRes(X4) = 176.77, SSRes(X1,X2) = 11.58, SSRes(X1,X3) = 245.41, SSRes(X1,X4) = 14.95, SSRes(X2,X3) = 83.09, SSRes(X2,X4) = 173.77, SSRes(X3,X4) = 35.15, SSRes(X1,X2,X3) = 9.62, SSRes(X1,X2,X4) = 9.59, SSRes(X1,X3,X4) = 10.17, SSRes(X2,X3,X4) = 14.76, SSRes(X1,X2,X3,X4) = 9.57
- The laborers in a coal-mine were screened for symptoms of pneumoconiosis to study the effect of “number of years of work” (X) on the laborers’ health. The response variable ‘Y’ was defined as ‘1’ if symptoms were found and ‘0’ if not. The data on 20 employees are given below:
Y | 0 1 1 0 1 1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 |
X | 10 30 28 14 25 35 15 12 20 24 33 27 13 12 18 17 11 28 32 30 |
The logit model built for the purpose had the linear predictor (logit score) function -4.8 + 0.1X. Construct the Gains Table and compute the KS statistic. Comment on the discriminatory power of the model.
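The Gains Table and KS calculation asked for in the last question can be sketched in Python as below. This is one common convention, not necessarily the one intended by the examiner. It assumes that the 20 cases are ranked by predicted probability in descending order, split into 10 equal groups of 2, and that KS is the largest absolute gap between the cumulative proportions of events and non-events. The names `predicted_probability` and `gains_table` are hypothetical helpers introduced for illustration.

```python
import math

# Data from the question: Y = 1 if symptoms were found, X = years of work.
y = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1]
x = [10, 30, 28, 14, 25, 35, 15, 12, 20, 24, 33, 27, 13, 12, 18, 17, 11, 28, 32, 30]


def predicted_probability(years):
    """Inverse logit of the linear predictor -4.8 + 0.1 * X given in the question."""
    score = -4.8 + 0.1 * years
    return 1.0 / (1.0 + math.exp(-score))


def gains_table(responses, regressor, n_groups=10):
    """Build a gains table with equal-sized groups (assumes n is divisible by n_groups)."""
    scored = sorted(
        ((predicted_probability(v), label) for v, label in zip(regressor, responses)),
        key=lambda pair: pair[0],
        reverse=True,  # highest predicted probability first
    )
    total_events = sum(responses)
    total_non_events = len(responses) - total_events
    group_size = len(responses) // n_groups

    rows = []
    cum_events = 0
    cum_non_events = 0
    for g in range(n_groups):
        group = scored[g * group_size:(g + 1) * group_size]
        events = sum(label for _, label in group)
        non_events = len(group) - events
        cum_events += events
        cum_non_events += non_events
        cum_event_rate = cum_events / total_events
        cum_non_event_rate = cum_non_events / total_non_events
        rows.append({
            "group": g + 1,
            "events": events,
            "non_events": non_events,
            "cum_event_rate": cum_event_rate,
            "cum_non_event_rate": cum_non_event_rate,
            "ks": abs(cum_event_rate - cum_non_event_rate),
        })
    return rows


table = gains_table(y, x)
ks_row = max(table, key=lambda row: row["ks"])  # group where the gap is largest

for row in table:
    print(row["group"], row["events"], row["non_events"],
          round(row["cum_event_rate"], 4), round(row["cum_non_event_rate"], 4),
          round(row["ks"], 4))
print("KS statistic:", round(ks_row["ks"], 4), "at group", ks_row["group"])
```

Because the predicted probability increases with X, ranking by probability is the same as ranking by years of work. The tied X values in these data share the same Y, so the order within ties does not affect the table.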