CMSE 820 Homework Assignment 1
This assignment is due on Jan. 24th at 11:59 pm.
Question 1: Assume that $Y = X^T \beta + \epsilon$, where $X \in \mathbb{R}^p$ is not random and $\epsilon \sim N(0, 1)$. Given i.i.d. data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, we would like to estimate $\beta \in \mathbb{R}^p$ through the maximum likelihood framework. Write down the joint log-likelihood and compare it with the least-squares method.
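As a notational starting point only (a sketch of the setup, not the solution the question asks for): under the model each $y_i$ given $x_i$ is $N(x_i^T \beta, 1)$, and independence lets the joint likelihood factorize over the sample:

```latex
% Conditional density of one observation under y_i = x_i^T beta + eps_i, eps_i ~ N(0,1):
f(y_i \mid x_i; \beta)
  = \frac{1}{\sqrt{2\pi}} \exp\!\Bigl(-\tfrac{1}{2}\,(y_i - x_i^T \beta)^2\Bigr),
\qquad
\ell(\beta) = \sum_{i=1}^{n} \log f(y_i \mid x_i; \beta).
```

Expanding the sum $\ell(\beta)$ is the route to the comparison the question asks for.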
Question 2: Consider the usual linear regression setup, with response vector $y \in \mathbb{R}^n$ and predictor matrix $X \in \mathbb{R}^{p \times n}$. Let $x_1, \ldots, x_p$ be the rows of $X$. Suppose that $\hat{\beta} \in \mathbb{R}^p$ is a minimizer of the least-squares criterion,
$$\|y - X^T \beta\|_2^2.$$
a. Show that if $v \in \mathbb{R}^p$ is a vector such that $X^T v = 0$, then $\hat{\beta} + c \cdot v$ is also a minimizer of the least-squares criterion, for any $c \in \mathbb{R}$.
b. If $x_1, \ldots, x_p \in \mathbb{R}^n$ are linearly independent, then what vectors $v \in \mathbb{R}^p$ satisfy $X^T v = 0$? We assume $p \leq n$.
c. Suppose that $p > n$. Show that there exists a vector $v \neq 0$ such that $X^T v = 0$. Argue, based on part (a), that there are infinitely many linear regression estimates. Further argue that there is a variable $i \in \{1, \ldots, p\}$ such that the regression coefficient $\beta[i]$ can have different signs, depending on which estimate we choose. Comment on this.
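For intuition on part (c), here is a small numerical illustration in NumPy (an optional sketch, not part of the required argument); the dimensions and the constant $c = 3$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 8                       # p > n, so X^T (n x p) has a nontrivial null space
X = rng.standard_normal((p, n))   # predictors as rows, matching the setup above
y = rng.standard_normal(n)

# Minimum-norm least-squares solution via the pseudoinverse
beta_hat = np.linalg.pinv(X.T) @ y

# Any v with X^T v = 0 gives another minimizer beta_hat + c*v (part a)
_, _, Vt = np.linalg.svd(X.T)
v = Vt[-1]                        # a basis vector of the null space of X^T
assert np.allclose(X.T @ v, 0)

beta_alt = beta_hat + 3.0 * v     # a second, equally valid minimizer
print(np.linalg.norm(y - X.T @ beta_hat),
      np.linalg.norm(y - X.T @ beta_alt))      # identical residual norms
print(np.sign(beta_hat), np.sign(beta_alt))    # some coordinates may flip sign
```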
Question 3: Implement the following model (you can use any language)
$$Y = X^T \beta + \epsilon,$$
where $\epsilon \sim N(0, 1)$, $X \sim N(0, I_{p \times p})$, and $\beta \in \mathbb{R}^p$ with $\beta[1] = 1$, $\beta[2] = -2$, and the rest of the $\beta[j] = 0$. Based on this setting, start with $p = 5$, simulate $\{x_1, \ldots, x_{100}\}$, and store it. Then carry out the following experiments (a simulation sketch follows the list).
(1) Based on $\beta$ and $\{x_1, \ldots, x_{100}\}$, we first simulate the corresponding $Y$'s and calculate $\hat{\beta}_{\text{ols}}$.
(2) Using the same $\{x_1, \ldots, x_{100}\}$, we then simulate another set $\tilde{Y} = \{\tilde{y}_1, \ldots, \tilde{y}_{100}\}$ and calculate the in-sample prediction error ($\text{PE}_{\text{in}}$) using the $\hat{\beta}_{\text{ols}}$ calculated in (1). This is one realization of $\text{PE}_{\text{in}}$.
(3) Repeat (1)-(2) 5000 times and take the average of those 5000 calculated $\text{PE}_{\text{in}}$ values. This gives an approximation of $\text{PE}_{\text{in}}$.
(4) Repeat the same procedure for $p = 10, 40, 80$. What is the trend in $\text{PE}_{\text{in}}$? Comment on your findings.
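A minimal sketch of this experiment in Python/NumPy (any language is fine per the prompt). The exact formula for $\text{PE}_{\text{in}}$ used below, the mean squared error on the fresh responses, is an assumption, since the assignment does not spell it out:

```python
import numpy as np

rng = np.random.default_rng(820)

def pe_in_estimate(p, n=100, reps=5000):
    """Approximate the in-sample prediction error of OLS at dimension p."""
    beta = np.zeros(p)
    beta[0], beta[1] = 1.0, -2.0     # beta[1]=1, beta[2]=-2 in the 1-indexed notation above
    X = rng.standard_normal((n, p))  # fixed design: x_1,...,x_100 ~ N(0, I_p), stored once
    pe = 0.0
    for _ in range(reps):
        y = X @ beta + rng.standard_normal(n)      # step (1): simulate Y and fit OLS
        beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        y_new = X @ beta + rng.standard_normal(n)  # step (2): fresh responses at the same x's
        pe += np.mean((y_new - X @ beta_ols) ** 2) # assumed definition of PE_in
    return pe / reps                               # step (3): average over the repetitions

for p in [5, 10, 40, 80]:                          # step (4)
    print(p, pe_in_estimate(p))
```

The interpretation of the trend across $p$ is left as the exercise asks.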
Question 4: Implement the following model (you can use any language)
$$y_i = \beta^*[1]\, x_i[1] + \beta^*[2]\, x_i[2] + \epsilon_i,$$
where $E(\epsilon_i) = 0$, $\text{Var}(\epsilon_i) = 1$, $\text{Cov}(x_i, x_j) = 0$ for $i \neq j$, and $\beta^* = (-1, 2)^T$. We also assume $x_i \sim N(0, \Sigma_x)$ with
$$\Sigma_x = \text{Cov}(x_i) = \begin{pmatrix} 1 & 0.9999 \\ 0.9999 & 1 \end{pmatrix}.$$
We repeat the following 2000 times:
• Generate $y = (y_1, \ldots, y_{50})^T$ and $X = (x_1, \ldots, x_{50})$.
• Compute and record $\hat{\beta}_{\text{ols}}$ and $\hat{\beta}_{\text{ridge}}$ (for ridge regression, choose $\lambda = 0.005$).
Then report the following (a simulation sketch is given after part b):
a. The histograms of $\hat{\beta}_{\text{ols}}[1]$ and $\hat{\beta}_{\text{ridge}}[1]$. What conclusions can you draw from these histograms?
b. For each of the 2000 replicates, compare $|\beta^*[1] - \hat{\beta}_{\text{ols}}[1]|$ with $|\beta^*[1] - \hat{\beta}_{\text{ridge}}[1]|$. How many times does ridge regression return a better estimate of $\beta^*[1]$?
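A minimal NumPy sketch of this experiment. The ridge estimator below is the standard no-intercept form $(X^T X + \lambda I)^{-1} X^T y$, which is an assumption, since the assignment does not fix the exact parameterization of $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(820)

n, reps, lam = 50, 2000, 0.005
beta_star = np.array([-1.0, 2.0])
Sigma = np.array([[1.0, 0.9999],
                  [0.9999, 1.0]])          # strongly correlated predictors
L = np.linalg.cholesky(Sigma)

ols_1 = np.empty(reps)
ridge_1 = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, 2)) @ L.T  # rows x_i ~ N(0, Sigma)
    y = X @ beta_star + rng.standard_normal(n)
    ols = np.linalg.solve(X.T @ X, X.T @ y)
    ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    ols_1[r], ridge_1[r] = ols[0], ridge[0]

# part b: how often is ridge closer to beta*[1] than OLS?
ridge_wins = np.sum(np.abs(beta_star[0] - ridge_1) < np.abs(beta_star[0] - ols_1))
print("ridge closer to beta*[1] in", ridge_wins, "of", reps, "replicates")

# part a: histograms (matplotlib assumed available)
# import matplotlib.pyplot as plt
# plt.hist(ols_1, bins=50, alpha=0.5, label="OLS")
# plt.hist(ridge_1, bins=50, alpha=0.5, label="ridge")
# plt.legend(); plt.show()
```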