STAT 527 Non-parametric Statistics Homework 1




STAT/BIOST 527 Homework 1 

Problem 1 – Estimating h by cross-validation

For this problem, submit your code. In this problem you will compute and plot kernel density estimates of the densities f and g given below.

$$ f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (1) $$

$$ g(x) = \begin{cases} 4x, & 0 \le x \le 0.5 \\ 4(1 - x), & 0.5 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (2) $$
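A minimal sampling sketch in plain Python (standard library only). It relies on the fact that the average of two independent Uniform(0, 1) draws has exactly the triangular density g in (2). The helper names sample_f and sample_g and the seed 527 are illustrative choices, not part of the assignment; fixing the seed makes it easy to save and reproduce D.

```python
import random

def f(x):
    """Density (1): Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def g(x):
    """Density (2): triangle on [0, 1] with its peak of height 2 at x = 0.5."""
    if 0.0 <= x <= 0.5:
        return 4.0 * x
    if 0.5 <= x <= 1.0:
        return 4.0 * (1.0 - x)
    return 0.0

def sample_f(n, rng):
    """Draw n points from f (hypothetical helper name)."""
    return [rng.random() for _ in range(n)]

def sample_g(n, rng):
    """Draw n points from g: the mean of two Uniform(0, 1) draws has density g."""
    return [(rng.random() + rng.random()) / 2 for _ in range(n)]

rng = random.Random(527)  # fixed seed so the training sets can be reproduced
D_f, Dv_f = sample_f(1000, rng), sample_f(300, rng)  # training / validation from f
D_g, Dv_g = sample_g(1000, rng), sample_g(300, rng)  # training / validation from g
```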

a. Sample a training set D of n = 1000 points from f and a validation set D_v of m = 300 points. Use the Gaussian kernel and find the optimal kernel width h by cross-validation. To do this, construct p_h(x), the density estimated from D with kernel width h. Then compute the log-likelihood l_v(h) of the data in D_v under p_h. Also compute l(h), the log-likelihood of the training set D under p_h. Repeat this for several values of h and plot l_v(h) and l(h) as functions of h on the same graph (suggested values of h: 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5). Save the training set D.
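One way to compute l_v(h) and l(h) is sketched below, assuming the Gaussian KDE p_h(x) = (1/(n h)) Σ_i φ((x − x_i)/h) with φ the standard normal density. The log is taken with the log-sum-exp trick because for h = 0.001 most kernel terms underflow to 0 and a naive log could hit log(0). The seed and the uniform training data are assumptions for illustration; plotting (for example with a log-scale h axis) is left out.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def log_kde(x, data, h):
    """log p_h(x) for the Gaussian KDE built from data, via log-sum-exp."""
    exps = [-0.5 * ((x - xi) / h) ** 2 for xi in data]
    top = max(exps)  # factor out the largest term so the sum is >= 1
    lse = top + math.log(sum(math.exp(e - top) for e in exps))
    return lse - math.log(len(data) * h) - LOG_SQRT_2PI

def log_lik(points, data, h):
    """Sum over points of log p_h(x), with p_h estimated from data."""
    return sum(log_kde(x, data, h) for x in points)

H_GRID = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]

rng = random.Random(0)
D = [rng.random() for _ in range(1000)]   # training sample from f
Dv = [rng.random() for _ in range(300)]   # validation sample from f

lv = [log_lik(Dv, D, h) for h in H_GRID]  # validation log-likelihood l_v(h)
l = [log_lik(D, D, h) for h in H_GRID]    # training log-likelihood l(h)
h_star = H_GRID[max(range(len(H_GRID)), key=lambda j: lv[j])]
```

Note that l(h) keeps growing as h shrinks, because each training point sits under its own kernel bump; only l_v(h) gives a usable criterion for choosing h.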

b. Let h* be the value of h that maximizes l_v(h). Plot p_{h*}(x), for instance by computing p_{h*}(x) on the grid x = −0.5, −0.49, −0.48, ..., 1.49, 1.5. Plot the true density f(x) on the same graph. Make sure that the x axis extends to the left and right of the interval [0, 1] and contains the entire region where p_{h*} ≉ 0.
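A sketch of evaluating p_{h*} and f on the suggested grid. The value h_star = 0.02 is only a placeholder for the maximizer of l_v(h), and the training data are regenerated here so the block runs on its own. Rounding each grid point to two decimals avoids floating-point drift such as −0.49000000000000005.

```python
import math
import random

def kde(x, data, h):
    """Gaussian KDE p_h(x) = 1/(n h) * sum_i phi((x - x_i) / h)."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

grid = [round(-0.5 + 0.01 * k, 2) for k in range(201)]  # -0.5, -0.49, ..., 1.5

rng = random.Random(0)
D = [rng.random() for _ in range(1000)]  # training sample from f
h_star = 0.02  # placeholder: substitute the maximizer of l_v(h)

p_vals = [kde(x, D, h_star) for x in grid]  # estimated density on the grid
f_vals = [f(x) for x in grid]               # true density on the grid
# e.g. plt.plot(grid, p_vals, label="p_h*"); plt.plot(grid, f_vals, label="f")
```

A quick sanity check is that 0.01 * sum(p_vals) should be close to 1, since the grid covers essentially all of the estimate's mass.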

c., d. Repeat questions a and b for the distribution G with density g.

e., f. Repeat questions a and c on the same samples D from a and c, this time with 5-fold CV. Plot the new l_5cv(h) on the same graphs as in a and c, together with its standard deviation. Use either Rule 1 (the argmax of l_5cv(h)) or Rule 2 to obtain h*_5cv.

Make sure your graphs are clearly labeled and readable. Make separate graphs for f and g. The homework you hand in should contain the formula(s) you used for p_h, the formula(s) you used to compute l_v(h) and l(h), and the required graphs. It is OK to replace log-likelihoods with likelihoods in the plots and equations. Clarification: because there are two true distributions here, we denote them f and g instead of p_X. The estimators will both be denoted p instead of p̂_X, as in the notes.
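A minimal 5-fold CV sketch, assuming one particular convention: each fold's score is the average held-out log-likelihood per point, and l_5cv(h) is reported as the mean and standard deviation of the five fold scores. The course notes may instead define l_5cv(h) as a sum; the helper name cv_loglik is hypothetical. Only Rule 1 (argmax of the mean) is shown, since Rule 2 is defined in the notes and not reproduced here.

```python
import math
import random
import statistics

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def log_kde(x, data, h):
    """log of the Gaussian KDE at x, computed with log-sum-exp."""
    exps = [-0.5 * ((x - xi) / h) ** 2 for xi in data]
    top = max(exps)
    return (top + math.log(sum(math.exp(e - top) for e in exps))
            - math.log(len(data) * h) - LOG_SQRT_2PI)

def cv_loglik(D, h, K=5, seed=0):
    """K-fold CV: fit on K-1 folds, score the held-out fold.
    Returns (mean, stdev) of the per-point held-out log-likelihood over folds."""
    idx = list(range(len(D)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]  # disjoint folds covering all of D
    scores = []
    for k in range(K):
        held = set(folds[k])
        train = [D[i] for i in idx if i not in held]
        test = [D[i] for i in folds[k]]
        scores.append(sum(log_kde(x, train, h) for x in test) / len(test))
    return statistics.mean(scores), statistics.stdev(scores)

H_GRID = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
rng = random.Random(1)
D = [rng.random() for _ in range(1000)]  # stands in for the saved D from a

results = [cv_loglik(D, h) for h in H_GRID]  # (mean, stdev) per h
h_star_5cv = H_GRID[max(range(len(H_GRID)), key=lambda j: results[j][0])]  # Rule 1
```

The standard deviations in results can be drawn as error bars around the mean curve when plotting l_5cv(h).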

g. Compare the optimal values of h and the quality of the plots in b and d. Which of the densities looks easier to approximate? Which of the optimal kernel widths is larger, the one used for f or the one used for g? Can you suggest an explanation?

[f. – Extra credit] Observing the bias and variance. Implement a sampler from f. Repeat B = 10 times: (1) draw a sample D_b of size n = 100 from F; (2) use the h found in a to estimate f from D_b, and denote this particular estimate of f by p_h^b; (3) use the value h' = 2h to estimate f from D_b, and denote this particular estimate of f by p_{2h}^b. Plot p_h^{1:B} on one plot and p_{2h}^{1:B} on a separate plot.
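A sketch of the repeated-sampling experiment, under stated assumptions: h = 0.05 is a placeholder for the h found in a, the seed is arbitrary, and the spread of the B curves is summarized by their pointwise sample variance averaged over the interior x in [0.1, 0.9]. The same D_b is reused for both widths, so any difference comes from the width alone. The B curves in est_h and est_2h are what would be plotted.

```python
import math
import random
import statistics

def kde(x, data, h):
    """Gaussian KDE p_h(x) = 1/(n h) * sum_i phi((x - x_i) / h)."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

B, n = 10, 100
h = 0.05  # placeholder: use the h found in part a
grid = [round(-0.5 + 0.01 * k, 2) for k in range(201)]

rng = random.Random(2024)
est_h, est_2h = [], []
for b in range(B):
    Db = [rng.random() for _ in range(n)]              # sample of size n from F
    est_h.append([kde(x, Db, h) for x in grid])       # p_h^b on the grid
    est_2h.append([kde(x, Db, 2 * h) for x in grid])  # p_2h^b on the grid

def pointwise_var(curves):
    """Sample variance across the B curves at each grid point."""
    return [statistics.variance(col) for col in zip(*curves)]

var_h, var_2h = pointwise_var(est_h), pointwise_var(est_2h)
interior = [j for j, x in enumerate(grid) if 0.1 <= x <= 0.9]
avg_var_h = statistics.mean(var_h[j] for j in interior)
avg_var_2h = statistics.mean(var_2h[j] for j in interior)
```

Bias can be examined the same way by comparing the pointwise mean of the B curves with f, especially near the edges x = 0 and x = 1.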

Compare the two plots in terms of bias and variance. In which of the plots do you observe higher variance? In which of the plots do you observe higher bias? Explain your answer. To convince us that you understand these concepts, please use the terms correctly and precisely.

Problem 2 – variation of h with n 

Wenyu has a data set D_0 of size n_0 = 1,000,000,000 and wants to compute a kernel density estimator from these data. He decides to select h by cross-validation (CV). For clarity, in this problem we will always denote

$$ p_{h,D}(x) = \frac{1}{|D|\,h} \sum_{i \in D} k\!\left(\frac{x - x_i}{h}\right). $$

Wenyu will sample D_v ⊂ D_0 and use it as the validation set, while using D = D_0 \ D_v to construct p_{h,D}. After CV is finished, Wenyu will use the optimal h* obtained from CV to construct the final density estimator p_{h*,D_0}.

a. Recommend a value n_v for the size |D_v| of the validation set. Explain your choice.

b. Wenyu has followed your advice in a, and now has D_v of size n_v and D of size n, with n + n_v = n_0. He has chosen a range of h with m = 100 possible values. He would like to know how many times the kernel function k(·) must be evaluated to complete the entire CV procedure. For example, to obtain p_{h,D}(x), k(·) is evaluated n times, once for each term k((x − x_i)/h).
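One way to check a hand count is to wrap the kernel in a counter. The sketch below, with a hypothetical CountingKernel wrapper and a Gaussian k chosen only for illustration, confirms the example above on a toy D: one evaluation of p_{h,D}(x) costs |D| kernel calls.

```python
import math

class CountingKernel:
    """Wraps a kernel k(.) and counts how many times it is evaluated."""
    def __init__(self, k):
        self.k = k
        self.calls = 0

    def __call__(self, z):
        self.calls += 1
        return self.k(z)

def gaussian(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def p_hD(x, D, h, k):
    """p_{h,D}(x) = 1/(|D| h) * sum_{i in D} k((x - x_i) / h)."""
    return sum(k((x - xi) / h) for xi in D) / (len(D) * h)

k = CountingKernel(gaussian)
D = [0.1, 0.4, 0.35, 0.8, 0.9]  # toy data, illustrative only
value = p_hD(0.5, D, 0.2, k)
# k.calls is now len(D): one kernel evaluation per term of the sum
```

Running a toy-sized CV loop (a small D, D_v and h grid) with the same wrapper is an easy way to verify a general formula for the count before plugging in Wenyu's numbers.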

c. Denote by N the value computed in b. Wenyu discovers that on his computer the entire procedure would take too long and he would miss his homework deadline. Hence, he subsamples a data set D_small ⊂ D of size n_small = 10,000 and obtains h*_small by CV with D_small and D_v. But what he really needs is h*, the optimal kernel width for p_{h*,D_0}. Can he obtain h* by a simple calculation from h*_small and the other information available?

Problem 3 – k-NN and kernel regression on a toy data set

For this problem, feel free to make the graphs by hand or by computer, as you wish. I recommend, though, that you do them by hand. If you make them by computer, please submit the code. As always, you are required to implement all the functions yourself, and this toy problem is no exception. The data set is D = {(1, 1), (2, 2), (3.5, 1), (4, 0)}, n = 4. The task is to perform non-parametric regression on these data (by hand, preferably).

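a. Let b_1 be the square kernel with h = 1,

$$ b_1(z) = \begin{cases} 1, & \text{if } |z| \le 0.5 \\ 0, & \text{otherwise} \end{cases} \qquad (3) $$

and let b_{1/2} be the square kernel with h = 0.5. Complete the formula below (no proof required) in a way that ensures ∫_ℝ b_{1/2}(z) dz = 1:

$$ b_{1/2}(z) = \begin{cases} \,?, & \text{if } |z| \le \,? \\ 0, & \text{otherwise} \end{cases} \qquad (4) $$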
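A small numerical check, offered as a sketch: a midpoint-rule integral over [−2, 2] (the range and step count are arbitrary choices) confirms that b_1 from (3) integrates to 1. The helper riemann_integral is hypothetical and uses only the standard library.

```python
def b1(z):
    """Square kernel with h = 1, equation (3)."""
    return 1.0 if abs(z) <= 0.5 else 0.0

def riemann_integral(k, lo=-2.0, hi=2.0, steps=40000):
    """Midpoint-rule approximation of the integral of k over [lo, hi]."""
    dz = (hi - lo) / steps
    return sum(k(lo + (i + 0.5) * dz) for i in range(steps)) * dz

area = riemann_integral(b1)  # approximately 1
```

Passing a completed b_{1/2} from (4) to the same helper should also return approximately 1.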

b. Let ŷ_1(x) be the Nadaraya-Watson kernel regression estimate for D with b_1. Write the formula for ŷ_1(x = 4.1), once as a literal expression and once with all the numbers plugged in. Example: if D = {(−1, 2), (−1.1, −3), (−7, 2)} and x = −1, then

$$ \hat y_1(x = -1) = \frac{1 \times 2 + 1 \times (-3)}{1 + 1}. $$

Draw ŷ_1(x) for x ∈ [−1, 6].
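A minimal Nadaraya-Watson sketch, ŷ(x) = Σ_i b(x − x_i) y_i / Σ_i b(x − x_i), assuming the kernel is applied as b(x − x_i) with its width already built in, as in (3). It is checked on the worked example's data rather than on the assignment's D. Where no x_i receives positive weight the estimate is undefined; this sketch returns nan there, which is an assumption rather than a course convention.

```python
def b1(z):
    """Square kernel with h = 1, equation (3)."""
    return 1.0 if abs(z) <= 0.5 else 0.0

def nadaraya_watson(x, data, kernel):
    """Kernel-weighted average of the y_i; nan if all weights are zero."""
    weights = [kernel(x - xi) for xi, _ in data]
    total = sum(weights)
    if total == 0:
        return float("nan")
    return sum(w * yi for w, (_, yi) in zip(weights, data)) / total

example = [(-1, 2), (-1.1, -3), (-7, 2)]
y_at_minus_1 = nadaraya_watson(-1, example, b1)  # (1*2 + 1*(-3)) / (1 + 1) = -0.5
```

Evaluating nadaraya_watson on a fine grid over [−1, 6] and plotting the result is one way to check a hand-drawn ŷ_1; for ŷ_{1/2}, pass the completed b_{1/2} instead of b1.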

c. Let ŷ_{1/2}(x) be the Nadaraya-Watson kernel regression estimate for D with b_{1/2}. Write the formula for ŷ_{1/2}(x = 4.1), once as a literal expression and once with all the numbers plugged in. Draw ŷ_{1/2}(x) for x ∈ [−1, 6].

d. Let ŷ_1NN(x) be the 1-NN regression estimate for D. Write the formula for ŷ_1NN(x = 4.1) with all the numbers plugged in. Draw ŷ_1NN(x) for x ∈ [−1, 6].

e. Let ŷ_2NN(x) be the 2-NN regression estimate for D. Write the formula for ŷ_2NN(x = 4.1) with all the numbers plugged in. Draw ŷ_2NN(x) for x ∈ [−1, 6].
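A minimal k-NN regression sketch: ŷ_kNN(x) is the average of the y values of the k training points closest to x. Ties in distance are broken by the order of the data (Python's sort is stable), which is an assumption; the course may specify another rule. As above, it is checked on the worked example's data from part b, not on the assignment's D.

```python
def knn_regress(x, data, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(data, key=lambda p: abs(x - p[0]))[:k]
    return sum(y for _, y in nearest) / k

example = [(-1, 2), (-1.1, -3), (-7, 2)]
y1 = knn_regress(-1, example, 1)  # nearest x_i is -1, so 2.0
y2 = knn_regress(-1, example, 2)  # x_i = -1 and -1.1, so (2 + (-3)) / 2 = -0.5
```

Because the neighbor set only changes at midpoints between consecutive x_i, these estimates are piecewise constant, which is worth keeping in mind when drawing ŷ_1NN and ŷ_2NN.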

f. What is the set supp ŷ = {x ∈ ℝ : ŷ(x) ≠ 0} for each ŷ ∈ {ŷ_1, ŷ_{1/2}, ŷ_1NN, ŷ_2NN}? (No proofs required.)
