LOYOLA COLLEGE (AUTONOMOUS), CHENNAI – 600 034
M.Sc. DEGREE EXAMINATION – STATISTICS
FIRST SEMESTER – APRIL 2006
ST 1811 – APPLIED REGRESSION ANALYSIS
Date & Time : 27-04-2006/1.00-4.00 P.M. Dept. No. Max. : 100 Marks
SECTION – A
Answer ALL the Questions (10 × 2 = 20 marks)
- Define ‘Residuals’ of a linear model.
- What is a partial F-test?
- What are the two scaling techniques for computing standardized regression coefficients?
- Define ‘Externally Studentized Residuals’.
- State the variance-stabilizing transformation if V(Y) is proportional to [E(Y)]^3.
- What is F_OUT in the backward selection process?
- How is the multicollinearity trap avoided in regression models with dummy variables?
- State any one method of detecting multicollinearity.
- Give an example of a polynomial regression model.
- Give the motivation for Generalized Linear Models.
SECTION – B
Answer any FIVE Questions (5 × 8 = 40 marks)
- Fill up the missing entries in the following ANOVA for a regression model with 5 regressors and an intercept:
Source | d.f. | S.S. | Mean S.S. | F ratio |
Regression | ? | ? | 40.5 | 13.5 |
Residual | 14 | ? | ? | --- |
Total | ? | ? | --- | --- |
Also, test for the overall fit of the model.
- The following table gives the data matrix corresponding to a model
Y = b0 + b1X1 + b2X2 + b3X3. Suppose we wish to test H0: b2 = b3. Write down the restricted model under H0 and the reduced data matrix that is used to build the restricted model.
1 2 -3 4
1 -1 2 5
1 3 4 -3
1 -2 1 2
X = 1 4 5 -2
1 -3 4 3
1 2 3 1
1 1 2 5
1 4 -2 2
1 -3 4 2
- Explain how residual plots are used to check the assumption of normality of the errors in a linear model.
- Discuss ‘Generalized Least Squares’ and obtain the form of the GLS estimate.
- Explain the variance decomposition method of detecting multicollinearity and derive the expression for ‘Variance Inflation Factor’.
- Discuss ‘Ridge Regression’ and obtain the expression for the ridge estimate.
- Suggest some strategies to decide on the degree of a polynomial regression model.
- Describe Cubic-Spline fitting.
SECTION – C
Answer any TWO Questions (2 × 20 = 40 marks)
- Build a linear regression model with the following data and test for overall fit. Also, test for the individual significance of X1 and of X2.
Y: 12.8 13.9 15.2 18.3 14.5 12.4
X1: 2 3 5 5 4 1
X2: 4 2 5 1 2 3
- (a) Decide whether “Y = b0 + b1X” or “Y^2 = b0 + b1X” is the more appropriate model for the following data:
X: 1 2 3 4
Y: 1.2 1.8 2.3 2.5
(b) The starting salaries of PG students selected in campus interviews are given below,
along with the percentage of marks they scored in their PG and their academic
stream:
Salary (in ‘000 Rs) | Stream | Gender | % in PG |
12 | Arts | Male | 75 |
8 | Science | Male | 70 |
15 | Commerce | Female | 85 |
12.5 | Science | Male | 80 |
7.5 | Arts | Female | 75 |
6 | Commerce | Female | 60 |
10 | Science | Male | 70 |
18 | Science | Male | 87 |
14 | Commerce | Female | 82 |
It is believed that there could be a possible interaction between Stream and % in
PG and between Gender and % in PG. Incorporate this view and create the data
matrix. (You need not build the model). (10+10)
- Based on a sample of size 16, a model is to be built for a response variable with four regressors X1, …,X4. Carry out the Forward selection process to decide on the significant regressors, given the following information:
SST = 1810.509, SSRes(X1) = 843.88, SSRes(X2) = 604.224, SSRes(X3) = 1292.923, SSRes(X4) = 589.24, SSRes(X1, X2) = 38.603, SSRes(X1,X3) = 818.048, SSRes(X1,X4) = 49.84, SSRes(X2,X3) = 276.96, SSRes(X2,X4) = 579.23, SSRes(X3,X4) = 117.14, SSRes(X1,X2,X3) = 32.074, SSRes(X1,X2, X4) = 31.98, SSRes(X1,X3,X4) = 33.89, SSRes(X2,X3,X4) = 49.22, SSRes(X1,X2,X3,X4) = 31.91.
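The forward selection procedure on the figures above can be sketched in Python as follows. This is only an illustrative sketch, not a model answer: the entry threshold `F_IN = 4.0` is an assumption (the question does not state one, and a tabulated critical value may be intended instead), and the names `ss_res`, `forward_selection` and `F_IN` are hypothetical. At each step the candidate giving the smallest residual sum of squares is tried, and its partial F statistic is (drop in SSRes) / (new SSRes / residual d.f.), with residual d.f. = n - p - 1 for a model with p regressors and an intercept.

```python
# Residual sums of squares quoted in the question, keyed by regressor subset.
SST = 1810.509
N = 16          # sample size given in the question
F_IN = 4.0      # ASSUMED entry threshold; not specified in the question

ss_res = {
    frozenset(): SST,
    frozenset({"X1"}): 843.88,
    frozenset({"X2"}): 604.224,
    frozenset({"X3"}): 1292.923,
    frozenset({"X4"}): 589.24,
    frozenset({"X1", "X2"}): 38.603,
    frozenset({"X1", "X3"}): 818.048,
    frozenset({"X1", "X4"}): 49.84,
    frozenset({"X2", "X3"}): 276.96,
    frozenset({"X2", "X4"}): 579.23,
    frozenset({"X3", "X4"}): 117.14,
    frozenset({"X1", "X2", "X3"}): 32.074,
    frozenset({"X1", "X2", "X4"}): 31.98,
    frozenset({"X1", "X3", "X4"}): 33.89,
    frozenset({"X2", "X3", "X4"}): 49.22,
    frozenset({"X1", "X2", "X3", "X4"}): 31.91,
}


def forward_selection(ss_res, candidates, n, f_in):
    """Return the regressors entered (in order) and the partial F at each step."""
    selected = []
    steps = []
    while len(selected) < len(candidates):
        current = frozenset(selected)
        remaining = [x for x in candidates if x not in current]
        # Smallest SSRes after entry corresponds to the largest partial F.
        best = min(remaining, key=lambda x: ss_res[current | {x}])
        new_ss = ss_res[current | {best}]
        df_res = n - (len(selected) + 1) - 1  # n - p - 1 with an intercept
        f_stat = (ss_res[current] - new_ss) / (new_ss / df_res)
        steps.append((best, f_stat))
        if f_stat < f_in:
            break  # best candidate fails the entry test, so stop
        selected.append(best)
    return selected, steps


selected, steps = forward_selection(ss_res, ["X1", "X2", "X3", "X4"], N, F_IN)
for name, f_stat in steps:
    print(f"{name}: partial F = {f_stat:.3f}")
print("Entered:", selected)
```

Under the assumed threshold, the last candidate tried fails the entry test and the procedure stops; a different choice of `F_IN` could change where it stops.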
- (a) Obtain the likelihood equation for estimating the parameters of a logistic regression model.
(b) If the logit score (linear predictor) is given by –2.4 + 1.5 X1 + 2 X2, find the estimated P(Y = 1) for each of the following combinations of the independent variables (IDVs):
X1: 0 1.5 2 3 -2 -2.5
X2: 1 0 1.5 -1 2 2.5 (12+8)
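The computation in part (b) can be sketched in Python as follows, assuming the standard logistic link P(Y = 1) = 1 / (1 + exp(-eta)), where eta is the logit score given in the question. The function name `prob_y1` is hypothetical, and this is an illustrative sketch rather than a model answer.

```python
import math

# Regressor values listed in the question.
x1_values = [0, 1.5, 2, 3, -2, -2.5]
x2_values = [1, 0, 1.5, -1, 2, 2.5]


def prob_y1(x1, x2):
    """Estimated P(Y = 1) from the logit score -2.4 + 1.5*X1 + 2*X2."""
    eta = -2.4 + 1.5 * x1 + 2.0 * x2       # linear predictor (logit score)
    return 1.0 / (1.0 + math.exp(-eta))    # inverse of the logit link


probs = [prob_y1(a, b) for a, b in zip(x1_values, x2_values)]
for a, b, p in zip(x1_values, x2_values, probs):
    print(f"X1 = {a:>4}, X2 = {b:>4}: P(Y = 1) = {p:.4f}")
```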