STAT3600 Linear Statistical Analysis


DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE


Chapter 3 Simple Linear Regression

3 Simple Linear Regression

3.1 Motivation

In many situations, the value taken by one variable Y is influenced by or related to the value taken by some other variable X. In this chapter, we are mainly interested in determining what relationship exists between Y (the dependent variable, or response) and a single X (the independent variable, covariate, or explanatory variable). For example, $Y_i$ may be the monthly expenditure of the i-th person and $X_i$ the monthly income of the same person. We wish to describe the relationship between Y and X by means of a mathematical formula (or mathematical model) given by

$$Y = f(X).$$

However, even if the model is true, our data will in general not agree perfectly with it. In some physical sciences there may be exact functional relationships between variables, but in economics and most other fields the functional relationships are usually not exact. We therefore express our model as

$$Y = f(X) + \varepsilon,$$

where $\varepsilon$ is a random error term.

3.2 Model Formulation and Assumptions

When the data are of the form $(x_1, y_1), \ldots, (x_n, y_n)$ (with one dependent and one independent variable only), the simplest form of $f$ is the simple linear regression (SLR) model, given by

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where the $\beta$'s are the regression parameters and the $\varepsilon_i$'s are unobservable stochastic disturbances, or random errors. Moreover, we assume that the values $x_1, \ldots, x_n$ are non-random, i.e. fixed or regarded as given.

After the data have been collected, the first step should be to plot the data points on a scatter diagram. If the scatter diagram suggests a linear relationship between X and Y, or if a linear relationship is suggested by subject-matter knowledge or scientific evidence, we may proceed to fit a simple linear regression model of the form

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$

Assumptions

(A1)  $E(\varepsilon_i) = 0$, which implies $E(Y_i \mid x_i) = \beta_0 + \beta_1 x_i$ (primary assumption);

(A2)  $\mathrm{Var}(\varepsilon_i) = \sigma^2$ (constant variance);

(A3)  The $\varepsilon_i$'s are uncorrelated (uncorrelatedness).
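To make the assumptions concrete, the following Python sketch draws one sample from the model. It is only an illustration: the function name `simulate_slr`, the design points and the parameter values are hypothetical, and the errors are taken to be independent normal, which satisfies (A1) to (A3) but is stronger than what they require.

```python
import numpy as np

def simulate_slr(x, beta0, beta1, sigma, seed=0):
    """Draw one sample Y_1..Y_n from Y_i = beta0 + beta1 * x_i + eps_i.

    Assumption for illustration only: the errors are independent N(0, sigma^2).
    This satisfies (A1)-(A3) but is not required by them.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)  # fixed (non-random) design points
    # Errors with mean 0 and the same standard deviation sigma for every i.
    eps = rng.normal(loc=0.0, scale=sigma, size=x.shape)
    return beta0 + beta1 * x + eps

# Hypothetical design points and parameter values, not taken from the notes.
x = np.linspace(0.0, 10.0, 50)
y = simulate_slr(x, beta0=1.0, beta1=2.0, sigma=0.5)
```

Plotting `y` against `x` would give the kind of scatter diagram described above: points scattered around the line 1 + 2x with roughly constant vertical spread.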

3.3 Least Square Estimator

The unknown parameters are $\beta_0$, $\beta_1$ and $\sigma^2$. The parameters $\beta_0$ and $\beta_1$ are called the intercept and slope, respectively, and are termed the regression coefficients. Determining the best-fitting line, i.e. estimating $\beta_0$ and $\beta_1$ from the observed data, cannot be done without specifying a criterion. The most commonly used criterion is to minimize the Sum of Squared Errors (SSE), which is the sum of the squared vertical deviations of the observations from the fitted straight line. Mathematically, we have

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2,$$

where $y_i$ is the observed value of $Y_i$. Our objective is to choose $\beta_0$ and $\beta_1$ to minimize SSE, and this method is known as the method of least squares (LS).
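As a small illustration of the criterion, the sketch below evaluates the sum of squared vertical deviations for two candidate lines. The helper `sse` and the toy data are hypothetical and are not part of these notes.

```python
import numpy as np

def sse(beta0, beta1, x, y):
    """Sum of squared vertical deviations of (x_i, y_i) from the line beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (beta0 + beta1 * x)
    return float(np.sum(residuals ** 2))

# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x_toy = [0.0, 1.0, 2.0, 3.0]
y_toy = [1.0, 3.0, 5.0, 7.0]

sse_true_line = sse(1.0, 2.0, x_toy, y_toy)   # this line passes through every point
sse_other_line = sse(0.0, 2.0, x_toy, y_toy)  # this line sits 1 unit below every point
```

Here the first line gives an SSE of 0 and the second gives 4 (four deviations of size 1), so the least squares criterion prefers the first line.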

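On differentiating SSE with respect to $\beta_0$ and $\beta_1$ and setting the derivatives to zero, we have the normal equations for the minimizers $\hat\beta_0$ and $\hat\beta_1$:

$$\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0.$$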

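Solving for $\beta_0$ and $\beta_1$, we obtain the LS estimators

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$'s and $y_i$'s.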
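As a rough illustration, the closed-form solution (slope equal to the sum of cross-products of deviations divided by the sum of squared x-deviations, intercept equal to the mean of y minus the slope times the mean of x) can be computed directly. The Python sketch below uses a hypothetical helper `ls_fit` and hypothetical toy data, neither of which appears in these notes. It also checks numerically that both normal equations hold at the estimates and that the slope can be written as a linear combination of the observed responses.

```python
import numpy as np

def ls_fit(x, y):
    """Return (b0, b1): the least squares intercept and slope for y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    if s_xx == 0.0:
        raise ValueError("all x values are equal, so the slope cannot be estimated")
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return float(b0), float(b1)

# Hypothetical toy data, not from the notes.
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
b0, b1 = ls_fit(x_toy, y_toy)

# Both normal equations should hold (up to rounding) at the estimates.
resid = y_toy - (b0 + b1 * x_toy)
normal_eq_1 = resid.sum()
normal_eq_2 = (x_toy * resid).sum()

# The slope is linear in y: b1 = sum(k_i * y_i) with k_i = (x_i - x_bar) / S_xx.
k = (x_toy - x_toy.mean()) / np.sum((x_toy - x_toy.mean()) ** 2)
b1_linear = float(k @ y_toy)
```

For this toy data the means are 3 and 4, the sum of squared x-deviations is 10 and the sum of cross-products is 6, so the slope is 0.6 and the intercept is 2.2.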

Note that the estimators are linear in the $y_i$'s. The following figure illustrates the application of the least squares method to a simple linear regression:

3.3.1 Example (Westwood Company Data)


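Let man-hours be the dependent variable Y and lot size be the independent variable X. The simple linear regression model is given by

$$\text{man-hours} = \beta_0 + \beta_1 \times \text{lot size} + \varepsilon.$$

The least squares estimates of $\beta_0$ and $\beta_1$ are therefore $\hat\beta_0 = 10$ and $\hat\beta_1 = 2$.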

Hence, the estimated regression line is given by:

man-hours = 10 + 2 × lot size.
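The estimated line can be used to compute fitted man-hours for a given lot size. The short Python sketch below shows the arithmetic only. The function name `predict_man_hours` and the lot sizes are hypothetical and are not taken from the Westwood data.

```python
def predict_man_hours(lot_size):
    """Fitted man-hours from the estimated line: 10 + 2 * lot size."""
    return 10.0 + 2.0 * lot_size

# Hypothetical lot sizes, chosen only to illustrate the arithmetic.
fitted_at_30 = predict_man_hours(30)  # 10 + 2 * 30
fitted_at_50 = predict_man_hours(50)  # 10 + 2 * 50
```

With these inputs the fitted values are 70 and 110 man-hours respectively.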

3.3.2 Example (Cholesterol Data)

The data are put in pairs as follows

Let the plasma level of total cholesterol (in mg/ml) be the dependent variable Y and age be the independent variable X. The simple linear regression model is given by

$$\text{cholesterol level} = \beta_0 + \beta_1 \times \text{age} + \varepsilon.$$

The least squares estimates of $\beta_0$ and $\beta_1$ are therefore $\hat\beta_0 = 1.2799$ and $\hat\beta_1 = 0.0526$.

Hence, the estimated regression line is given by:

cholesterol levels = 1.2799 + 0.0526 × ages.
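One way to read the estimated slope is as the change in fitted cholesterol level per additional year of age. The sketch below, with hypothetical ages and a hypothetical helper `fitted_cholesterol`, computes the fitted difference between two people whose ages differ by ten years.

```python
B0_HAT = 1.2799  # estimated intercept quoted in the notes
B1_HAT = 0.0526  # estimated slope quoted in the notes

def fitted_cholesterol(age):
    """Fitted cholesterol level from the estimated line B0_HAT + B1_HAT * age."""
    return B0_HAT + B1_HAT * age

# Hypothetical ages, chosen only to illustrate the interpretation of the slope.
diff_10_years = fitted_cholesterol(50) - fitted_cholesterol(40)  # 10 * B1_HAT
```

The difference depends only on the slope and the age gap, so it is about 0.526 in the units of the response regardless of which two ages ten years apart are chosen.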
