Loyola College M.Sc. Statistics April 2006 Applied Regression Analysis Question Paper PDF Download

November 20, 2017 by 01 prakash

LOYOLA COLLEGE (AUTONOMOUS), CHENNAI – 600 034

M.Sc. DEGREE EXAMINATION – STATISTICS

AC 28

FIRST SEMESTER – APRIL 2006

ST 1811 – APPLIED REGRESSION ANALYSIS

Date & Time : 27-04-2006/1.00-4.00 P.M. Dept. No. Max. : 100 Marks

SECTION – A

Answer ALL the Questions (2´10 = 20 marks)

Define ‘Residuals’ of a linear model.
What is Partial F- test.
What are the two scaling techniques for computing standardized regression coefficients.
Define ‘Externally Studentized Residuals’.
Stae the variance stabilizing tramsformation if V(Y) is proportional to [E(Y)]³.
What is F_OUT in Backward selection process.
How is the multicollinearity trap avoided in regression models with dummy variables.
State any one method of detecting multicollinearit.
Give an example of a polynomial regression model.
Give the motivation for Generalized Linear Models.

SECTION – B

Answer any FIVE Questions (5´8 = 40 marks)

Fill up the missing entries in the following ANOVA for a regression model with 5 regressors and an intercept:

Source

d.f

S.S

Mean S.S.

F ratio

Regression

Residual

40.5

13.5

——-

Residual

——–

——-

Also, test for the overall fit of the model.

The following table gives the data matrix corresponding to a model
Y = b₀+b₁X₁+b₂X₂+b₃X₃. Suppose we wish to test H₀: b₂ = b₃. Write down the restrcited model under H₀ and the reduced data matrix that is used to build the restricted model.

1 2 -3 4

1 -1 2 5

1 3 4 -3

1 -2 1 2

X = 1 4 5 -2

1 -3 4 3

1 2 3 1

1 1 2 5

1 4 -2 2

1 -3 4 2

Explain how residual plot are used to check the assumption of normality of the errors in a linear model.

Discuss ‘Generalized Least Squares’ and obtain the form of the GLS estimate.

Explain the variance decomposition method of detecting multicollinearity and derive the expression for ‘Variance Inflation Factor’.
Discuss ‘Ridge Regression’ and obtain the expression for the redge estimate.

Suggest some strategies to decide on the degree of a polynomial regression model.

Describe Cubic-Spline fitting.

SECTION – C

Answer any TWO Questions (2 ´ 20 = 40 marks)

Build a linear regression model with the following data and test for overall fit . Also, test for the individual significance of X₁ and of X₂.

Y: 12.8 13.9 15.2 18.3 14.5 12.4

X₁: 2 3 5 5 4 1

X_2:4 2 5 1 2 3

(a)Decide whether “Y =b₀ + b₁X” or “Y² = b₀ + b₁X” is the more appropriate model for the following data:

X: 1 2 3 4

Y: 1.2 1.8 2.3 2.5

(b)The starting salary of PG students selected in campus interviews are given below

along with the percentage of marks they scored in their PG and their academic

stream:

Salary (in ‘000 Rs)

Stream

Gender

% in PG

12.5

7.5

Arts

Science

Commerce

Science

Arts

Commerce

Science

Commerce

Male

Female

Male

Female

Male

Female

It is believed that there could be a possible interaction between Stream and % in

PG and between Gender and % in PG. Incorporate this view and create the data

matrix. (You need not build the model). (10+10)

Based on a sample of size 16, a model is to be built for a response variable with four regressors X₁, …,X₄. Carry out the Forward selection process to decide on the significant regressors, given the following information:

SS_T = 1810.509, SS_Res(X₁) = 843.88, SS_Res(X₂) = 604.224, SS_Res(X₃) = 1292.923, SS_Res(X₄) = 589.24, SS_Res(X₁, X₂) = 38.603, SS_Res(X_1,X₃) = 818.048, SS_Res(X_1,X₄) = 49.84, SS_Res(X₂,X₃) = 276.96, SS_Res(X_2,X₄) = 579.23, SS_Res(X_3,X₄) = 117.14, SS_Res(X_1,X₂,X₃) = 32.074, SS_Res(X_1,X₂, X₄) = 31.98, SS_Res(X_1,X₃,X₄) = 33.89, SS_Res(X_2,X₃,X₄) = 49.22, SS_Res(X_1,X₂,X₃,X₄) = 31.91.

(a) Obtain the likelihood equation for estimating the parameters of a logistic regression model.

(b) If the logit score (linear predictor) is given by –2.4 + 1.5 X₁ + 2 X₂, find the estimated P(Y = 1) for each of the following combination of the IDVs:

X₁: 0 1.5 2 3 -2 -2.5

X₂: 1 0 1.5 -1 2 2.5 (12+8)

Go To Main page

Loyola College M.Sc. Statistics April 2006 Applied Regression Analysis Question Paper PDF Download

November 20, 2017 by 01 prakash

LOYOLA COLLEGE (AUTONOMOUS), CHENNAI – 600 034

M.Sc. DEGREE EXAMINATION – STATISTICS

AC 52

FOURTH SEMESTER – APRIL 2006

ST 4954 – APPLIED REGRESSION ANALYSIS

Date & Time : 27-04-2006/9.00-12.00 Dept. No. Max. : 100 Marks

SECTION –A

Answer ALL the Questions (10 X 2 = 20 marks)

State the statistic for testing the overall fit of a linear model with ‘k’ regressors.
Define ‘Extra Sum of Squares’.
Define ‘Studentized’ Residuals.
What is a ‘Variance Stabilizing Transformation’?
State the consequence of using OLS in a situation when GLS is required.
Define “Variance Inflation Factor’.
Give the form of the Ridge Estimate when a constant ‘l’ is added to the diagonal elements of X’X.
What is a hierarchical polynomial regression model?
Mention the components of a ‘Generalized Regression Model’ (GLM).
Define ‘Sensitivity’ of a Binary Logit Model.

SECTION – B

Answer any FIVE Questions (5 X 8 = 40 marks)

The following table gives the data on four independent variables used to build a linear model with an intercept for a dependent variable.

X₁

X₂

X₃

X₄

-2

-3

-1

-2

-3

-1

-2

-3

-1

-2

If one wishes to test the hypothesis H₀: b₁ = b₃, b₂ = 2b₄, write down the reduced

data matrix and the restricted model under H₀. Briefly indicate the test procedure.

Depict the different possibilities that occur when the residuals are plotted against the fitted values. How are they interpreted?

Define ‘Standardized Regression Coefficient’ and discuss any one method of scaling the variables.

Decide whether “Y= b₀ + b₁X” or “Y^1/2 = b₀ + b₁X” is the more appropriate model for the following data:

X	1	2	3	4
Y	3.5	4.7	6.5	9.2

Discuss the issue of ‘multicollinearity’ and its ill-effects.

Eigen Values

of X’X

Singular

Values of X

Condition

Indices

Variance decomposition Proportions

X₁ X₂ X₃ X₄ X₅ X₆

3.4784

2.1832

1.4548

0.9404

0.2204

0.0725

0.0003 0.0005 0.0004 0.0004 ? 0.0350

? 0.0031 0.0001 0.3001 0.0006 0.0018

0.0004 ? 0.0005 0.0012 0.0032 0.2559

0.0011 0.6937 0.5010 0.0002 0.7175 ?

0.0100 0.0000 ? 0.0003 0.0083 0.2845

0.8853 0.3024 0.4964 ? 0.2172 0.0029

Fill up the missing entries in the following table and investigate the presence of collinearity in the data, indicating which variables are involved in collinear relationships, if any.

Explain ‘Cubic Spline’ fitting.

Describe the components of a GLM. Show how the log link arises naturally in modeling a Poisson (Count) response variable.

SECTION – C

Answer any TWO Questions (2 X 20 = 40 marks)

The observed and predicted values of a response variable (based on a model using 25 data points) and the diagonal elements of the ‘Hat’ matrix are given below:

Y_i	16.68 11.50 12.03 14.88 13.75 18.11 8.00 17.83 79.24 21.50 40.33 21.0 13.5
Y_i^	21.71 10.35 12.08 9.96 14.19 18.40 7.16 16.67 71.82 19.12 38.09 21.59 12.47
h_ii	0.102 0.071 0.089 0.058 0.075 0.043 0.082 0.064 0.498 0.196 0.086 0.114 0.061

Y_i	19.75 24.00 29.00 15.35 19.00 9.50 35.10 17.90 52.32 18.75 19.83 10.75
Y_i^	18.68 23.33 29.66 14.91 15.55 7.71 40.89 20.51 56.01 23.36 24.40 10.96
h_ii	0.078 0.041 0.166 0.059 0.096 0.096 0.102 0.165 0.392 0.041 0.121 0.067

Compute PRESS statistic and R²_prediciton. Comment on the predictive power of the

underlying model.

(a) In a study on the mileage performance of cars, three brands of cars (A, B and C) and two types of fuel (OR and HG) were used. The speed of driving was also observed and the data are reported below:

Mileage(Y)

14.5 12.6 13.7 15.8 16.4 13.9 14.6 16.7 11.8 15.3 16.8 17.0 15.0 16.5

Speed

Car

Fuel

45 60 50 60 55 52 59 50 40 53 62 56 62 55

A B C B A A C A B B C C A B

OR HG OR HG HG OR HG OR OR HG HG OR HG OR

Create the data matrix so as to build a model with an intercept term and interaction terms between Fuel and Driving Speed and also between Car-type and Driving Speed.

(You need not build any model).

(b) Discuss GLS and obtain an expression for the GLS estimate. (14 + 6)

Based on a sample of size 15, a linear model is to be built for a response variable Y with four regressors X₁,…,X₄. Carry out the Forward Selection Process to decide which of the regressors would finally be significant for Y, given the following information:

SS_T = 543.15, SS_Res(X₁) = 253.14, SS_Res(X₂) = 181.26, SS_Res(X₃) = 387.88, SS_Res(X₄) = 176.77, SS_Res(X₁,X₂) = 11.58, SS_Res(X₁,X₃) = 245.41, SS_Res(X₁,X₄) = 14.95, SS_Res(X₂,X₃) = 83.09, SS_Res(X₂,X₄) = 173.77, SS_Res(X₃,X₄) = 35.15, S_Res(X₁,X₂,X₃) = 9.62, SS_Res(X₁,X₂,X₄) = 9.59, SS_Res(X₁,X₃,X₄) = 10.17, SS_Res(X₂,X₃,X₄) = 14.76, SS_Res(X₁,X₂,X₃,X₄) = 9.57

The laborers in a coal-mine were screened for symptoms of pneumoconiosis to study the effect of “number of years of work” (X) on the laborers’ health. The response variable ‘Y’ defined as ‘1’ if symptoms were found and ‘0’ if not. The data on 20 employees are given below:

Y	0 1 1 0 1 1 0 0 0 1 1 1 0 0 1 0 0 1 1 1
X	10 30 28 14 25 35 15 12 20 24 33 27 13 12 18 17 11 28 32 30

The logit model built for the purpose had the linear predictor (logit score) function as – 4.8 + 0.1 X. Construct the Gains Table and compute the KS statistic. Comment on the discriminatory power of the model.

Go To Main page

Loyola College M.Sc. Statistics Nov 2006 Applied Regression Analysis Question Paper PDF Download

November 20, 2017 by 01 prakash

LOYOLA COLLEGE (AUTONOMOUS), CHENNAI 600 034

M.Sc. Degree Examination – Statistics

I Semester – November 2006

ST 1811 – APPLIED REGRESSION ANALYSIS

02 / 11/ 2006 Time: 1.00. – 4.00 Max. Marks: 100

SECTION – A

Answer ALL the Questions (10 x 2 = 20 marks)

Define ‘residuals’ and ‘residual sum of squares’ in a linear model.
State the test for the overall fit of a linear regression model.
Define Adjusted R² of a linear model.
Give an example of a relationship that can be linearized.
What is the variance stabilizing transformation used when σ² is proportional to E(Y)[1 – E(Y)]?
State any one criterion for assessing and comparing performances of linear models.
State any one ill-effect of multicollinearity.
Illustrate with an example why both X and X² can be considered for inclusion as regressors in a model.
Define the logit link used for modeling a binary dependent variable.
Define any one measure of performance of a logistic model.

SECTION – B

Answer any FIVE Questions (5 x 8 = 40 marks)

Discuss “No-Intercept Model” and give an illustrative example where such a model is appropriate. State how you will favour such a model against a model with intercept. Indicate the ANOVA for such a model.

A model (with an intercept) relating a response variable to four regressors is to be built based on the following sample of size 10:

X₁

X₂

X₃

X₄

13.8

22.9

23.7

16.8

21.6

25.5

16.6

17.4

19.9

24.6

Write down the full data matrix. Also, if we wish to test the linear hypothesis H₀: β₄ = 2β₁ + β₂, write down the reduced model under the H₀ and also the reduced data matrix.

Give the motivation for standardized regression coefficients and explain anyone method for scaling the variables.

The following residuals were obtained after a linear regression model was built:

0.17, – 1.04, 1.24, 0.48, – 1.83, 1.57, 0.50, – 0.32, – 0.77

Plot the ‘normal probability plot’ on a graph sheet and draw appropriate conclusions.

Describe the Box-Cox method of analytical selection of transformation of the dependent variable.

Discuss the role of dummy variables in linear models, explaining clearly how they are used to indicate different intercepts and different slopes among categories of respondents /subjects. Illustrate with examples.

The following is part of the output obtained while investigating the presence of multicollinearity in the data used for building a linear model. Fill up the missing entries and point out which regressors are involved in collinear relationship(s), if

any:

Eigen Value (of X’X)	Singular value (of X)	Condition Indices	Variance Decomposition Proportions X₁ X₂ X₃ X₄ X₅ X₆
2.429	?	?	0.0003 0.0005 0.0004 0.0000 0.0531 ?
1.546	?	?	0.0004 0.0000 ? 0.0012 0.0032 0.0559
0.922	?	?	? 0.0033 0.9964 0.0001 0.0006 0.0018
0.794	?	?	0.0000 0.0000 0.0002 0.0003 ? 0.4845
0.308	?	?	0.0011 ? 0.0025 0.0000 0.7175 0.4199
0.001	?	?	0.9953 0.0024 0.0001 ? 0.0172 0.0029

Discuss ‘Spline’ fitting.

SECTION – C

Answer any TWO Questions (2 x 20 = 40 marks)

(a)Depict the different possibilities that can arise when residuals are plotted against the fitted (predicted) values and explain how they can be used for detecting model inadequacies.

(b) Explain ‘partial regression plots’ and state how they are useful in model building. (13 + 7)

The following data were used to regress Y on X₁, X₂, X₃ and X₄ with an intercept term and the coefficients were estimated to be β₀^{^} = 45.1225, β₁^{^} = 1.5894, β₂^{^} = 0.7525, β₃^{^} = 0.0629, β₄^{^} = 0.054. Carry out the ANOVA and test for the overall significance of the model. Also test the significance of the intercept and each of the individual slope coefficients.

Y(Heat in calories)	X1(Tricalium Aluminate)	X2(Tricalcium Silicate)	X3(Tetracalcium alumino ferrite)	X4(Dicalium silicate)
78.5	7	26	6	60
74.3	1	29	15	52
104.3	11	56	8	20
87.6	11	31	8	47
95.9	7	52	6	3
109.2	11	55	9	22
102.7	3	71	17	6
72.5	1	31	22	44
93.1	2	54	18	22
115.9	21	47	4	26

The following is also given for your benefit:

15.90911472	-0.068104115	-0.216989375	-0.042460127	-0.165914393
-0.068104115	0.008693142	-0.001317006	0.007363424	-0.000687829
-0.216989375	-0.001317006	0.003723258	-0.001844902	0.002629903
-0.042460127	0.007363424	-0.001844902	0.009317298	-0.001147731
-0.165914393	-0.000687829	0.002629903	-0.001147731	0.002157976

(X’X)^{– 1} =

Build a linear model for a DV with a maximum of four regressors using Stepwise Procedure, based on a sample of size 25, given the following information:

SS_T = 5600, SS_Res(X₁) = 3000, SS_Res(X₂) = 3300, SS_Res(X₃) = 3600,

SS_Res(X₄) = 2400, SS_Res(X₁,X₂) = 2300, SS_Res(X₁,X₃) = 2760,

SS_Res(X₁,X₄) = 2100, SS_Res(X₂,X₃) = 2600, SS_Res(X₂,X₄) = 2040,

SS_Res(X₃,X₄) = 1980, SS_Res(X₁, X₂, X₃) = 2000, SS_Res(X₁, X₂, X₄) = 1800,

SS_Res(X₁,X₃, X₄) = 1700, SS_Res(X₂,X₃,X₄) = 1500, SS_Res(X₁,X₂,X₃, X₄) = 1440.

(a) Briefly indicate the Wilk’s Likelihood Ratio Test and the Wald’s Test for testing the significance of a subset of the parameters in a Logistic Regression model.

(b) The following data were used to build a logistic model:

DV	1	1	0	1	0	0	1	0	1	1	1	0	0	1	0	1	1	0	0	0
X1	2	4	1	0	-1	3	5	-2	3	-2	3	0	-4	2	-3	1	-1	3	4	-2
X2	-2	-4	2	0	4	-2	1	3	-4	2	1	3	0	-2	-4	-3	1	-1	2	0

The estimates were found to be β₀ = 2.57, β₁ = 3.78, β₂= – 3.2. Construct the Gains Table and compute KS Statistic. (8+12)

Go To Main Page