STAT0023: Computing for Practical Statistics

Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due

STAT0023 23-24: Question 9

Introduction

Consider using data (x ,y ), … (x ,y ) to estimate the regression function f(•) in the regression model Y = f(x ) + ε (i=1, …, n), where the {ε } are independent N(0,σ ) random variables. No assumptions are made about the function f(•) except that it is differentiable. Local linear regression is a nonparametric approach for estimating f(•). For a specific value x , the local linear regression estimate of f(x ) is obtained as the fitted value from a weighted least-squares regression of the {y } upon the {x }: specifically, it is the value of β obtained by minimising the expression

where the {w } are weights (see below). In this case, it can be shown that the required estimate of f(x ) is

where  The weights {w } are chosen so that observations with values close to x have a high weight and observations further away have near-zero weight: this ensures that the estimate is primarily based on observations with x values in the neighbourhood of x , hence the term 'local' linear regression. To construct the weights it is common to use the probability density function (pdf) of a normal distribution with mean 0 and standard deviation h for some user-specified value of h, so that  where is the pdf of a standard normal distribution.

As described above, the local linear regression technique will provide an estimate of the single value f(x ). To obtain an estimate of the entire regression function f(•) over some range (x , x ), it is necessary to set up a grid of values of x covering this range and to estimate the value of f(•) for each of these values: if the grid is fine enough, the estimates can then be plotted against the x values to obtain a smooth curve. Note: when doing this, the weights {w } will be different for each grid node - as will the quantities A, B and C.

Your task

Write an R function called llr(), to carry out a local linear regression estimate of y upon x as described above. The arguments to your function should be: y, a vector containing the {y }; x, a vector containing the {x }; xlims, a vector of length 2 containing the values of x and x (see above); n.grid, the number of points required in the grid of values of x ; and h, the standard deviation of the normal distribution used to calculate the weights {w }. The default value of n.grid should be 100, and the default value of h should be (x - x ) / 4. Your function should return a list containing components x and y (vectors containing the original data values); x0.grid (a vector containing the grid of values of x in increasing order) and f.hat (a vector containing the corresponding vector of estimates ). In writing your function, you may not use any existing R routines for local regression.

发表评论

电子邮件地址不会被公开。 必填项已用*标注