LOYOLA COLLEGE (AUTONOMOUS), CHENNAI – 600 034
M.Sc. DEGREE EXAMINATION – STATISTICS
FIRST SEMESTER – APRIL 2007
ST 1811 – APPLIED REGRESSION ANALYSIS
Date & Time: 02/05/2007 / 1:00 – 4:00
Dept. No.:
Max.: 100 Marks
SECTION – A
Answer ALL the questions (10 x 2 = 20 marks)
- Explain the term ‘partial regression coefficients’.
- State an unbiased estimate of the error variance in a multiple linear regression model.
- Define ‘PRESS Residuals’.
- What is the variance stabilizing transformation used when σ² is proportional to E(Y)?
- Give the expression for the GLS estimator explaining the notations.
- Mention any two sources of multicollinearity.
- Define ‘Variance Inflation Factor’ of a regression coefficient.
- Define a ‘Hierarchical Polynomial Model’.
- What is ‘Link function’ in a GLM?
- Give the interpretation for a positive coefficient in a logit model.
SECTION – B
Answer any FIVE questions (5 x 8 = 40 marks)
- Briefly explain the limitations to be recognized and cautions that are needed in applying regression models in practice.
- A model (with an intercept) relating a response variable to four regressors is to be built based on the following sample of size 10:
Y | X1 | X2 | X3 | X4
23.5 | 2 | 12 | 38 | 7
15.7 | 3 | 22 | 33 | 18
22.8 | 7 | 18 | 27 | 9
18.9 | 1 | 14 | 29 | 14
17.3 | 5 | 20 | 34 | 11
28.4 | 8 | 25 | 40 | 16
16.6 | 3 | 24 | 32 | 10
23.1 | 7 | 17 | 37 | 8
20.0 | 3 | 13 | 28 | 13
19.8 | 4 | 24 | 30 | 15
Write down the full data matrix. Also, if we wish to test the linear hypothesis H0: β2 = 2β3, β1 = 0, write down the reduced model under H0 and the corresponding reduced data matrix.
- Explain the motivation and give the expressions for ‘studentized’ and ‘externally studentized’ residuals.
- The following residuals were obtained after a linear regression model was built: -0.15, 0.03, -0.06, 0.01, 0.23, -0.31, 0.19, 0.15, -0.08, -0.01
Construct the ‘normal probability plot’ on a graph sheet and draw appropriate conclusions.
- An investigator has the following data:
Y: 3.2 5.1 4.5 2.4
X: 5 9 6 4
Guide the investigator as to whether the model Y = β0 + β1X or Y^(1/2) = β0 + β1X is more appropriate.
- Discuss the need for ‘Generalized Least Squares’ pointing out the requirements for it. Briefly indicate the ANOVA for a model built using GLS.
- The following is part of the output obtained while investigating the presence of multicollinearity in the data used for building a linear model. Fill in the missing entries and identify which regressors, if any, are involved in collinear relationship(s).
Eigenvalue (of X’X) | Singular value (of X) | Condition index | Variance decomposition proportions: X1 | X2 | X3 | X4 | X5 | X6
2.525 | ? | ? | 0.0180 | 0.0355 | 0.0004 | 0.0005 | ? | 0.0350
1.783 | ? | ? | 0.0029 | 0.1590 | 0.0305 | 0.0987 | 0.0032 | ?
1.380 | ? | ? | 0.0168 | 0.0006 | ? | 0.0500 | 0.0006 | 0.0018
0.952 | ? | ? | 0.6830 | ? | 0.0001 | 0.0033 | 0.1004 | 0.4845
0.245 | ? | ? | ? | 0.1785 | 0.0025 | 0.0231 | 0.7175 | 0.4199
0.002 | ? | ? | 0.2040 | 0.2642 | 0.9664 | ? | 0.0172 | 0.0029
- Discuss ‘Spline’ fitting.
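Note on the multicollinearity question in this section: the missing singular values and condition indices follow directly from the printed eigenvalues, and each missing variance decomposition proportion follows from the fact that the proportions for one regressor sum to 1 over all eigenvalues. The Python sketch below shows one way to check such entries. It assumes the convention in which a condition index is the largest singular value divided by each singular value (some texts use the ratio of eigenvalues instead), and the function names are illustrative only.

```python
import math

# Eigenvalues of X'X, copied from the table in the question.
EIGENVALUES = [2.525, 1.783, 1.380, 0.952, 0.245, 0.002]


def singular_values(eigenvalues):
    """Singular values of X are the square roots of the eigenvalues of X'X."""
    return [math.sqrt(lam) for lam in eigenvalues]


def condition_indices(eigenvalues):
    """Assumed convention: largest singular value over each singular value."""
    mu = singular_values(eigenvalues)
    mu_max = max(mu)
    return [mu_max / m for m in mu]


def missing_proportion(known_proportions):
    """Proportions for one regressor sum to 1 over all eigenvalues,
    so a single missing entry is 1 minus the sum of the known ones."""
    return 1.0 - sum(known_proportions)


if __name__ == "__main__":
    rows = zip(EIGENVALUES,
               singular_values(EIGENVALUES),
               condition_indices(EIGENVALUES))
    for lam, mu, ci in rows:
        print(f"{lam:6.3f}  {mu:7.4f}  {ci:8.3f}")
    # Known entries of the X1 column; the entry for eigenvalue 0.245 is missing.
    x1_known = [0.0180, 0.0029, 0.0168, 0.6830, 0.2040]
    print(f"missing X1 proportion: {missing_proportion(x1_known):.4f}")
```

A large condition index together with high proportions for two or more regressors in the same row is the usual signal of a collinear relationship; the cut-off used to call an index "large" is a matter of convention and is not fixed by the question.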
SECTION – C
Answer any TWO questions (2 x 20 = 40 marks)
- (a) Obtain the decomposition of the total variation in the data under a multiple linear regression model. Hence, define SST, SSR and SSRes and indicate the ANOVA.
(b) Develop the Partial F-Test for the contribution of some ‘r’ of the ‘k’ regressors in a multiple regression model. (10 + 10)
- A model with an intercept is to be built with the monthly mobile phone bill amount of students (average over the past six months) as the DV (Y). The IDVs are: monthly income of parents (X1), age of the student (X2), number of telephone numbers saved in the mobile (X3), and dummy variables indicating gender (male / female), class (UG / PG / M.Phil.) and residence (day scholar / hostel inmate). The following data collected from 15 students are available:
Bill Amt. (in Rs.) | Income (in ‘000 Rs.) | Age | # of Saved numbers | Gender | Class | Residence
230 | 25 | 17 | 50 | F | UG | Day scholar
150 | 12 | 22 | 38 | F | PG | Hostel inmate
300 | 35 | 21 | 62 | M | PG | Day scholar
225 | 40 | 18 | 43 | M | UG | Hostel inmate
400 | 45 | 21 | 45 | M | UG | Hostel inmate
180 | 20 | 24 | 33 | F | M.Phil. | Hostel inmate
125 | 15 | 19 | 27 | F | UG | Day scholar
170 | 18 | 18 | 36 | M | UG | Day scholar
200 | 12 | 20 | 22 | F | PG | Hostel inmate
350 | 15 | 25 | 35 | M | M.Phil. | Day scholar
280 | 21 | 19 | 39 | F | UG | Hostel inmate
375 | 35 | 23 | 50 | F | PG | Day scholar
450 | 42 | 22 | 47 | M | PG | Hostel inmate
390 | 37 | 26 | 43 | M | M.Phil. | Hostel inmate
195 | 18 | 17 | 25 | M | UG | Day scholar
(a) Construct the data matrix for building the model.
(b) If the interaction effects of ‘Class’ with ‘parental income’ and of ‘Gender’ with ‘Residence type’ are also to be incorporated in the model, write down the appropriate data matrix. [You need not build the models.] (10 + 10)
- Build a linear model for a DV with a maximum of four regressors using the Forward Selection method based on a sample of size 25, given the following information:
SST = 2800, SSRes(X1) = 1500, SSRes(X2) = 1650, SSRes(X3) = 1800,
SSRes(X4) = 1200, SSRes(X1, X2) = 1150, SSRes(X1, X3) = 1380,
SSRes(X1, X4) = 1050, SSRes(X2, X3) = 1300, SSRes(X2, X4) = 1020,
SSRes(X3, X4) = 990, SSRes(X1, X2, X3) = 1000, SSRes(X1, X2, X4) = 900,
SSRes(X1, X3, X4) = 850, SSRes(X2, X3, X4) = 750, SSRes(X1, X2, X3, X4) = 720.
- (a) Discuss ‘sensitivity’, ‘specificity’ and ‘ROC’ of a logistic regression model and the objective behind these measures.
(b) The following data were used to build a logistic model and the estimates were β0 = 3.8, β1 = –5.2, β2 = 2.2
DV | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
X1 | -3 | 1 | 0 | 2 | -2 | 4 | 1 | -1 | 5 | 2 | 3 | -2 | 0 | -4 | 1 | 2 | -1 | -2 | -3 | 4 |
X2 | 0 | 2 | -3 | 2 | -4 | -1 | 0 | 3 | 2 | -3 | 4 | -5 | 1 | -1 | -4 | 3 | 4 | -3 | 1 | 1 |
Compute the logit score for each record. Construct the Gains Table and compute the KS statistic. (8 + 12)
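Note on the logistic regression question above: the logit score of a record is β0 + β1·X1 + β2·X2, and the KS statistic is the largest gap between the cumulative proportion of events (DV = 1) and of non-events (DV = 0) when records are ranked by score. The Python sketch below shows one way to carry out the computation. It assumes ten equal groups of two records ranked from the highest score down, which is a common but not the only way to lay out a Gains Table; the function names and the dictionary keys are illustrative only.

```python
# Estimates and data copied from the question.
B0, B1, B2 = 3.8, -5.2, 2.2
DV = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0]
X1 = [-3, 1, 0, 2, -2, 4, 1, -1, 5, 2, 3, -2, 0, -4, 1, 2, -1, -2, -3, 4]
X2 = [0, 2, -3, 2, -4, -1, 0, 3, 2, -3, 4, -5, 1, -1, -4, 3, 4, -3, 1, 1]


def logit_scores(x1, x2, b0=B0, b1=B1, b2=B2):
    """Linear predictor b0 + b1*x1 + b2*x2 for every record."""
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]


def gains_table(scores, dv, n_groups=10):
    """Rank records by descending score and accumulate events per group.

    Assumption: groups are of equal size, with any remainder placed
    in the last group.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = len(order) // n_groups
    total_events = sum(dv)
    total_non_events = len(dv) - total_events
    rows, cum_events, cum_non_events = [], 0, 0
    for g in range(n_groups):
        start = g * size
        end = len(order) if g == n_groups - 1 else start + size
        members = order[start:end]
        events = sum(dv[i] for i in members)
        cum_events += events
        cum_non_events += len(members) - events
        cum_event_pct = cum_events / total_events
        cum_non_event_pct = cum_non_events / total_non_events
        rows.append({
            "group": g + 1,
            "n": len(members),
            "events": events,
            "cum_event_pct": cum_event_pct,
            "cum_non_event_pct": cum_non_event_pct,
            "diff": cum_event_pct - cum_non_event_pct,
        })
    return rows


def ks_statistic(table):
    """KS is the maximum absolute gap between the two cumulative columns."""
    return max(abs(row["diff"]) for row in table)


if __name__ == "__main__":
    table = gains_table(logit_scores(X1, X2), DV)
    for row in table:
        print(row)
    print("KS =", round(ks_statistic(table), 4))
```

Ranking on the logit score gives the same order as ranking on the fitted probability, since the logistic function is increasing, so the scores can be used directly without converting them to probabilities.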