STAT 451 Final Exam
1. If a question is ambiguous, resolve the ambiguity in writing. We will consider grading accordingly, e.g.
● In #10, I think “average” refers to the population mean μ (not the sample mean X̄).
● In #13b, I think ...
Please answer this question with a period (.) if you have no other comment, so that Canvas will think you answered it and give you its 1 point. Do not write unnecessary comments.
2. Consider using k-means on the unsupervised 1D dataset {x} = {1, 3, 5, 10, 12} to create k = 2 clusters. Suppose the two initial randomly-chosen cluster centroids are c1 = 3 and c2 = 5.
(a) What are the centroids after the first iteration of k-means?
c1 = ____ and c2 = ____.
(b) What are the centroids after the second iteration?
c1 = ____ and c2 = ____.
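The two iterations in this question can be checked with a minimal sketch of k-means (Lloyd's algorithm) in one dimension. The function name `kmeans_1d` and the rule that an empty cluster keeps its old centroid are assumptions of this sketch, not part of the exam.

```python
def kmeans_1d(points, centroids, n_iter):
    """Run n_iter iterations of k-means (Lloyd's algorithm) on 1D data."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its cluster.
        # Assumption of this sketch: an empty cluster keeps its old centroid.
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids


data = [1, 3, 5, 10, 12]
after_one = kmeans_1d(data, [3, 5], n_iter=1)
after_two = kmeans_1d(data, [3, 5], n_iter=2)
print(after_one, after_two)
```

Running one iteration at a time makes it easy to see which points change cluster between the first and second update.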
3. For each situation, indicate which hyperparameter search strategy, G = grid search or R = random search, is more likely to be successful. Suppose computation time is limited.
(a) A model has two hyperparameters. The first takes one of two string values and the other takes one of three numeric values.
(b) A model has two hyperparameters. The first takes a floating-point number in the interval [0, 1] while the second takes an integer in the range [0, 100000].
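To make the difference in search-space size concrete, here is a small sketch that enumerates a full grid for situation (a) and draws a fixed budget of random candidates for situation (b). The concrete values (`"linear"`, `"rbf"`, `0.1`, `1.0`, `10.0`) and the budget of 20 are hypothetical and chosen only for illustration.

```python
import itertools
import random

# Hypothetical values for situation (a): two strings and three numbers.
kernels = ["linear", "rbf"]
penalties = [0.1, 1.0, 10.0]
grid = list(itertools.product(kernels, penalties))  # every combination

# Situation (b): the space is far too large to enumerate, so draw a
# fixed budget of random candidates instead (budget is an assumption).
rng = random.Random(0)
budget = 20
random_candidates = [(rng.uniform(0.0, 1.0), rng.randint(0, 100000))
                     for _ in range(budget)]

print(len(grid), len(random_candidates))
```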
4. Consider the use of bagging applied to classification decision trees of depth 1 (one decision node and two leaf nodes per tree). A training data set, on the left, consists of pairs {(x, y)}, where each example has only one feature, x. It is followed by B = 3 bootstrap resamples created by sampling with replacement from the training data.
Training data    Resample #1    Resample #2    Resample #3
   x   y            x   y          x   y          x   y
   1   0            1   0          1   0          1   0
   2   1            2   1          1   0          1   0
   3   0            4   1          3   0          2   1
   4   1            4   1          4   1          2   1
Consider making a prediction for x = 2.
(a) What prediction is made by the tree trained on Resample #1? ŷ = ____
(b) What prediction is made by the tree trained on Resample #2? ŷ = ____
(c) What prediction is made by the tree trained on Resample #3? ŷ = ____
(d) What prediction is made by this bagging classifier? ŷ = ____
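The bagging procedure in this question can be sketched without any library. The helper `fit_stump` is hypothetical: it picks the threshold that minimizes training misclassifications, which is an assumption of this sketch (a library tree would typically use Gini impurity, although on these resamples the best split separates the classes perfectly under either criterion).

```python
from collections import Counter


def fit_stump(sample):
    """Fit a depth-1 tree on 1D data by minimizing training misclassifications."""
    xs = sorted({x for x, _ in sample})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


resamples = [
    [(1, 0), (2, 1), (4, 1), (4, 1)],  # Resample #1
    [(1, 0), (1, 0), (3, 0), (4, 1)],  # Resample #2
    [(1, 0), (1, 0), (2, 1), (2, 1)],  # Resample #3
]

# Each tree votes on x = 2; the bagged prediction is the majority vote.
votes = [fit_stump(s)(2) for s in resamples]
bagged = Counter(votes).most_common(1)[0][0]
print(votes, bagged)
```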
5. Here is a graph of 1D data {x_i} = {1, 2, 4} and corresponding Gaussian curves {f_{μ=x_i, σ=b}(x)} made with bandwidth b = 0.25.
(a) Supposing the data were randomly sampled from some population, use kernel density estimation to estimate the population’s probability density f(x) at x = 1.
Based on the plot, the estimate is f̂_{b=0.25}(1) ≈ ____.
(b) Estimate the density at x = 1.5.
Based on the plot, the estimate is f̂_{b=0.25}(1.5) ≈ ____.
(c) On the figure above, draw the estimated density function over the interval [0, 6].
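The estimates read off the plot can be cross-checked numerically. This is a minimal sketch assuming the kernel density estimate is the average of the Gaussian curves centered at the data points, each with standard deviation equal to the bandwidth b; the function names are illustrative.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def kde(x, data, b):
    """Average of Gaussian curves centered at each data point, sigma = b."""
    return sum(gaussian_pdf(x, xi, b) for xi in data) / len(data)


data = [1, 2, 4]
density_at_1 = kde(1.0, data, b=0.25)
density_at_1_5 = kde(1.5, data, b=0.25)
print(round(density_at_1, 3), round(density_at_1_5, 3))
```

Evaluating `kde` on a fine grid over [0, 6] gives the curve asked for in part (c).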
6. Consider the following questions about model assessment.
(a) Consider a classifier trained on examples (x, y) in the first two columns of the table below that makes the predictions on training data in the third column.
   x        y   ŷ
(1, 4)      1   1
(3, −2)     1   1
(3, 0)      0   1
Complete the corresponding confusion matrix:
              predicted ŷ
              0      1
actual y  0   ____   ____
          1   ____   ____
(b) The classifier is evaluated on unseen test data yielding this confusion matrix:
              predicted ŷ
              0   1
actual y  0   2   3
          1   4   5
What is the precision on the test data?
(c) What is the recall on the test data?
(d) What is the accuracy on the test data?
(e) For a classifier that is randomly guessing with P(ŷ = 1) = 1/3, what is the AUC?
(f) For a classifier with TPR = 1 and FPR = 0, what is the AUC?
(g) For each situation, indicate whether P = precision or R = recall should be optimized:
i. A bank is doing fraud detection where a fraudulent transaction (“positive”) that is missed is expensive but a valid transaction labeled fraudulent is inexpensive.
ii. A doctor is screening patients for a disease in which an ill patient (“positive”) infects others and dies if the disease is not diagnosed.
iii. A marketing campaign invests considerable expense in a prospective customer when it classifies that customer as likely to make a purchase (“positive”).
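For parts (b) through (d), the three metrics follow directly from the four cells of a confusion matrix. The sketch below assumes the test counts 2, 3, 4, 5 are read row by row, with rows as actual classes and columns as predicted classes, so that TN = 2, FP = 3, FN = 4, TP = 5. That reading is an assumption; if the matrix is laid out differently, the arguments must be permuted accordingly.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted positives, fraction correct
    recall = tp / (tp + fn)      # of actual positives, fraction found
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, accuracy


# Assumed reading of the test matrix: rows = actual y, columns = predicted.
precision, recall, accuracy = classification_metrics(tn=2, fp=3, fn=4, tp=5)
print(precision, recall, accuracy)
```

Precision penalizes false positives and recall penalizes false negatives, which is the distinction part (g) asks about.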
7. Consider a one-vs.-rest SVM classifier trained on the following data depicted by circles, squares, and triangles:
(a) On the graph above, draw the three binary classifiers required by this method.
(b) How does this classifier classify the point indicated by “+”?
circle
square
triangle
(c) Which category is ranked second by this classifier’s decision method for the “+”?
circle
square
triangle
8. Here is a graph of the data set {(x_i, y_i)} = {(1, 3), (2, 2), (4, 4)} (here each x_i is one-dimensional) along with corresponding Gaussian curves {f_{μ=x_i, σ=b}(x)} made with bandwidth b = 0.25:
(a) Use kernel regression to estimate y = f(x) for x = 1.
Based on the plot, the estimate is ŷ ≈ ____.
(b) Estimate y = f(x) for x = 1.5.
Based on the plot, the estimate is ŷ ≈ ____.
(c) On the figure above, draw the estimated regression function over the interval [0, 6].
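A numerical cross-check of the plot-based estimates is sketched below. It assumes that "kernel regression" here means the Nadaraya-Watson estimator, a weighted average of the y_i with Gaussian weights centered at the x_i; the exam does not name the estimator, so treat this as one plausible reading.

```python
import math


def gaussian_weight(x, mu, sigma):
    """Unnormalized Gaussian weight; the normalizing constant cancels
    in the ratio below, so it is omitted."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z)


def kernel_regression(x, points, b):
    """Nadaraya-Watson estimate: weighted average of y_i around x."""
    weights = [gaussian_weight(x, xi, b) for xi, _ in points]
    weighted_sum = sum(w * yi for w, (_, yi) in zip(weights, points))
    return weighted_sum / sum(weights)


points = [(1, 3), (2, 2), (4, 4)]
estimate_at_1 = kernel_regression(1.0, points, b=0.25)
estimate_at_1_5 = kernel_regression(1.5, points, b=0.25)
print(round(estimate_at_1, 3), round(estimate_at_1_5, 3))
```

With such a small bandwidth the estimate near a data point is dominated by that point, and halfway between two points it is close to the average of their y values.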
9. The next two questions are about principal component analysis (PCA).
(a) Consider the following code and its output:
import numpy as np
from sklearn.decomposition import PCA

# Simulate a 10 x 4 matrix of standard normal values.
rng = np.random.default_rng(seed=0)
(n_rows, n_cols) = (10, 4)
X = rng.normal(loc=0, scale=1, size=n_rows*n_cols).reshape((n_rows, n_cols))
# Fit PCA keeping all four components.
pca = PCA(n_components=n_cols, random_state=0)
pca.fit(X=X)
with np.printoptions(precision=3):
    print(f'pca.components_=\n{pca.components_}')
    print(f'pca.explained_variance_={pca.explained_variance_}')
    print(f'pca.explained_variance_ratio_={pca.explained_variance_ratio_}')
    print(f'pca.noise_variance_={pca.noise_variance_}')
    print(f'pca.mean_={pca.mean_}')
    print(f'pca.singular_values_={pca.singular_values_}')
Output:
pca.components_=
[[-0.219 -0.091 -0.752 -0.615]
 [ 0.854  0.439 -0.085 -0.265]
 [-0.41   0.882 -0.138  0.184]
 [-0.232  0.142  0.639 -0.72 ]]
pca.explained_variance_=[1.237 0.733 0.388 0.109]
pca.explained_variance_ratio_=[0.501 0.297 0.157 0.044]
pca.noise_variance_=0.0
pca.mean_=[-0.448  0.052 -0.093  0.247]
pca.singular_values_=[3.336 2.569 1.869 0.988]
What is the minimum number of principal components we must retain to account for 90% of the variability in the data?
(b) Suppose PCA is run on the data in the plot. Draw arrows on the plot representing the first two principal components. (There is more than one correct answer.)
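For part (a), the number of components to retain follows from the cumulative sum of the explained variance ratios printed in the output. The sketch below copies those rounded ratios by hand; the helper name `components_needed` is hypothetical.

```python
from itertools import accumulate


def components_needed(ratios, target):
    """Smallest k whose first k ratios sum to at least target, else None."""
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= target:
            return k
    return None


# Rounded values copied from pca.explained_variance_ratio_ in the output.
ratios = [0.501, 0.297, 0.157, 0.044]
cumulative = list(accumulate(ratios))
k = components_needed(ratios, target=0.90)
print(cumulative, k)
```

With a fitted scikit-learn `PCA` object, the same cumulative sums could be obtained from `np.cumsum(pca.explained_variance_ratio_)`.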