STAT3600 Linear Statistical Analysis


DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE


Chapter 4 Multiple Linear Regression


4.1 Model Formulation and Assumptions

A sample of n observations is available, where the ith observation has the form (yi, xi1, ..., xip), i = 1, ..., n. Here Y is the response variable and X1, ..., Xp are the explanatory variables. A multiple linear regression (MLR) model assumes:

(A1) X1, ..., Xp are non-random variables or they are given,

(A2) yi = β0 + β1 xi1 + ... + βp xip + εi with εi ∼ N(0, σ²),

(A3) the responses y1, ..., yn are independent.

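Equivalently, in matrix form,

$$ Y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n), $$

where

$$
Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
$$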

β0, β1, ..., βp are the unknown regression coefficients, with βj measuring the effect of Xj on Y when the other explanatory variables are held fixed; and X is an n × (p + 1) design matrix with known entries. Here we assume the columns of X are linearly independent. The word 'linear' in a linear model refers to the property that E(Y) is linear in the regression coefficients β0, β1, ..., βp, but not necessarily in each explanatory variable.
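As an illustration of the design matrix described above, the following minimal sketch builds X from a small array of explanatory variables and checks that its columns are linearly independent. The data values and the helper name `design_matrix` are hypothetical and serve only as an example.

```python
import numpy as np

def design_matrix(x):
    """Prepend a column of ones to an n x p array of explanatory variables."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)          # treat a 1-D input as a single variable
    ones = np.ones((x.shape[0], 1))   # column that multiplies the intercept beta0
    return np.hstack([ones, x])

# Hypothetical data: n = 5 observations on p = 2 explanatory variables.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0],
              [5.0, 4.0]])
X = design_matrix(x)                  # shape n x (p + 1)

# The columns of X are linearly independent exactly when rank(X) = p + 1.
full_rank = np.linalg.matrix_rank(X) == X.shape[1]
print(X.shape, full_rank)
```

If `full_rank` were false, at least one column of X would be a linear combination of the others and the coefficients could not be determined uniquely.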

Examples:

Linear models (linear combination of parameters):

• y = β0 + β1 x + ε (SLR)

• y = β0 + β1 x + β2 x² + ε

(explanatory variables: x1 = x, x2 = x²)

• y = β0 + β1 sin 2πz1 − β2 log z2 + ε

(explanatory variables: x1 = sin 2πz1, x2 = −log z2)
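To see why a model that is quadratic in x still counts as linear, the sketch below treats x and x² as two separate columns of the design matrix and fits the coefficients by ordinary least squares. The coefficient values and the grid of x values are assumptions chosen for the example, and the response is generated without noise so that the fit recovers the coefficients up to floating-point error.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
beta_true = np.array([1.0, -0.5, 2.0])   # hypothetical beta0, beta1, beta2

# Explanatory variables x1 = x and x2 = x^2; the model stays linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2])
y = X @ beta_true                         # noise-free response, for illustration only

# Ordinary least squares on the transformed columns.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))
```

The same pattern applies to the sine and logarithm example: the transformation is applied to the data first, and the fit itself only ever sees a linear combination of columns.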

NOT Linear models:

• 

• 

4.2 Model Fitting by the Least Squares Method

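As in the SLR, the unknown parameter vector β can be estimated by minimizing the Error Sum of Squares (Sum of Squared Errors), given the observed data on the explanatory variables and the response variable:

$$ S(\beta) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip} \right)^2 = (Y - X\beta)^T (Y - X\beta). $$

Hence,

$$ \frac{\partial S(\beta)}{\partial \beta} = -2 X^T (Y - X\beta). $$

The LS-estimator ˆβ = [ˆβ0, ˆβ1, ..., ˆβp]ᵀ satisfies

$$ X^T X \hat{\beta} = X^T Y, \qquad \text{so that} \qquad \hat{\beta} = (X^T X)^{-1} X^T Y, $$

where the inverse exists because the columns of X are linearly independent. This is the result we obtained for the SLR in Chapter 3.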
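The least-squares estimate can be computed numerically by solving the linear system XᵀX b = XᵀY. The sketch below does this on simulated data; the sample size, coefficient values, noise level and random seed are all assumptions made for the example and are not taken from the course data.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the run is repeatable
n, p = 50, 3                              # hypothetical sample size and number of variables

# Design matrix: a column of ones followed by p simulated explanatory variables.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([2.0, 1.0, -1.0, 0.5])    # hypothetical true coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)

# Solve X^T X b = X^T y directly instead of forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# At the minimizer the residuals are orthogonal to every column of X.
residuals = y - X @ beta_hat
sse = float(residuals @ residuals)
print(np.round(beta_hat, 3), round(sse, 3))
```

Solving the system with `np.linalg.solve` is generally preferred to computing the inverse of XᵀX explicitly; `np.linalg.lstsq` is an alternative that also handles rank-deficient design matrices.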

4.2.1 Example (Environmental Data)

A data set is taken from an environmental study that measured four variables on each of 30 days:

ozone (Y) - surface ozone concentration in New York, in parts per million;

radiation (X1) - solar radiation, in langleys;

temperature (X2) - observed temperature, in degrees Fahrenheit;

wind (X3) - wind speed, in miles per hour.

Table 1: Environmental data


Figure 1: Scatterplot matrix for the environmental data

Multiple regression of Y on X1, X2 and X3:

yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + εi   or   Y = Xβ + ε (matrix form),

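where Y is the 30 × 1 vector of observed ozone concentrations and X is the 30 × 4 design matrix whose first column is a column of ones and whose remaining columns hold the observed radiation, temperature and wind values from Table 1.

As a result, the LS-estimate of β is ˆβ = (XᵀX)⁻¹XᵀY, with slope estimates ˆβ1 = 0.0013, ˆβ2 = 0.0456 and ˆβ3 = −0.0278.

Hence, the fitted model is given by:

ŷ = ˆβ0 + 0.0013 x1 + 0.0456 x2 − 0.0278 x3,

where ˆβ0 is the estimated intercept.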

Holding the other variables fixed, the fitted mean ozone concentration

• increases by 0.0013 parts per million when radiation increases by 1 langley.

• increases by 0.0456 parts per million when temperature increases by 1 degree Fahrenheit.

• decreases by 0.0278 parts per million when wind speed increases by 1 mile per hour.
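The interpretation above can be turned into a small calculation: because the fitted model is linear, the change in the fitted mean is the sum of each slope times the change in its variable, and the intercept cancels. The sketch below uses the three slope estimates quoted above; the function name `fitted_change` and the scenario (10 degrees warmer, wind 5 miles per hour stronger) are hypothetical.

```python
# Slope estimates quoted in the interpretation of the fitted model.
slopes = {"radiation": 0.0013, "temperature": 0.0456, "wind": -0.0278}

def fitted_change(delta):
    """Change in fitted mean ozone (ppm) for given changes in the explanatory variables.

    Variables not listed in `delta` are held fixed; the intercept cancels
    because only a difference of two fitted values is computed.
    """
    return sum(slopes[name] * change for name, change in delta.items())

# Hypothetical scenario: temperature rises by 10 degrees Fahrenheit and
# wind speed rises by 5 miles per hour, with radiation unchanged.
change = fitted_change({"temperature": 10.0, "wind": 5.0})
print(round(change, 4))
```

Here the temperature effect (10 × 0.0456) outweighs the wind effect (5 × −0.0278), so the fitted mean ozone concentration rises.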

4.2.2 Example (Philadelphia Birth Data)

These data come from a 5% sample of all births occurring in Philadelphia in 1990. The sample has 1115 observations (after deleting 32 cases with incomplete information) on five variables:

grams (Y) - birth weight in grams;

ethnic (X1) - mother is African American (1 = yes, 0 = no);

educ (X2) - mother's years of education (0-17 years);

smoke (X3) - whether the mother smoked during pregnancy (1 = yes, 0 = no);

gestate (X4) - gestational age in weeks.

Multiple regression of Y on X1, X2, X3 and X4:

yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi4 + εi   or   Y = Xβ + ε (matrix form),

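where Y is the 1115 × 1 vector of observed birth weights and X is the 1115 × 5 design matrix whose first column is a column of ones and whose remaining columns hold the observed values of ethnic, educ, smoke and gestate.

As a result, the LS-estimate of β is ˆβ = (XᵀX)⁻¹XᵀY, with slope estimates ˆβ1 = −168.9684, ˆβ2 = 9.5718, ˆβ3 = −174.8129 and ˆβ4 = 156.5116.

Hence, the fitted model is given by:

ŷ = ˆβ0 − 168.9684 x1 + 9.5718 x2 − 174.8129 x3 + 156.5116 x4,

where ˆβ0 is the estimated intercept.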

Holding the other variables fixed, the fitted mean birth weight

• is 168.9684 grams lower when the mother is African American.

• increases by 9.5718 grams when the mother's education increases by 1 year.

• is 174.8129 grams lower when the mother smoked during pregnancy.

• increases by 156.5116 grams when the gestational age increases by 1 week.
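The slope estimates above can also be combined to compare two covariate profiles. The sketch below computes the difference in fitted mean birth weight between two hypothetical mothers; both profiles are invented for the example, and the intercept is not needed because it cancels in the difference.

```python
import numpy as np

# Slope estimates quoted above, in the order: ethnic, educ, smoke, gestate.
slopes = np.array([-168.9684, 9.5718, -174.8129, 156.5116])

# Two hypothetical profiles with the same value of ethnic.
# Mother A: 16 years of education, non-smoker, 40 weeks of gestation.
# Mother B: 12 years of education, smoker, 38 weeks of gestation.
profile_a = np.array([0.0, 16.0, 0.0, 40.0])
profile_b = np.array([0.0, 12.0, 1.0, 38.0])

# Difference in fitted mean birth weight (grams), A minus B.
difference = float(slopes @ (profile_a - profile_b))
print(round(difference, 4))
```

The result is the sum of three contributions: four extra years of education, the absence of smoking, and two extra weeks of gestation.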
