Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
CS7545, Spring 2024: Machine Learning Theory - Homework #1
Homework Policy: Homeworks must be typed up in LaTeX and submitted as a pdf. Working in groups is fine, but you must write up your own solutions. Please write the members of your group on your solutions. There is no strict limit to the size of the group but we may find it a bit suspicious if there are more than 4 to a team.
1) Norm. Prove the following norm inequalities. Assume x ∈ R N .
(a) ∥x∥2 ≤ ∥x∥1 ≤ √ N∥x∥2
(b) ∥x∥∞ ≤ ∥x∥2 ≤ √ N∥x∥∞
(c) ∥x∥∞ ≤ ∥x∥1 ≤ N∥x∥∞
(d) ∥x∥p ≤ N1/p∥x∥∞ for p > 1
(e) limp→+∞ ∥x∥p = ∥x∥∞ for p > 1
2) H¨older. Let p ∈ ∆N with full support; that is, PN i=1 pi = 1 and pi > 0 for all i = 1, . . . , N.
(a) Prove using H¨older’s Inequality that PN i=1 1 pi q−1 ≥ Nq for any q > 1.
(b) Prove that PN i=1 pi + 1 pi 2 ≥ N3 + 2N + 1/N
3) Convexity.
(a) Show that the following function is convex on the domain p ∈ ∆N : f(p) = X N i=1 pi log pi .
(b) Show that if a differentiable function g is convex then it satisfies the gradient monotonicity property: (∇g(x) − ∇g(y))T (x − y) ≥ 0 for all x, y ∈ dom(g
4) Fenchel.
(a) Let f, g be convex functions such that f ∗ = g. Write the convex conjugate of fα(·) = αf(·) (where α ∈ R +) in terms of α and g.
(b) Let f(x) := √ 1 + x 2. What is its Fenchel conjugate, f ∗ (θ)?
5) Hoeffding. Often Hoeffding’s Inequality is stated in a different way. Make sure to use Hoeffding to prove this version.
Let X1, . . . , Xm be m independent random variables sampled from the same distribution D, where D has support on [−1, 1], and the mean of D is µ. Then for some α, β, γ > 0 we have the following statement: with probability at least 1 − δ 1 m Xm i=1 Xi − µ ≤ α s log(β/δ) γm .
When you solve this problem, make sure to get the best values of α, β, γ!
6) Bayes classifier.
In this problem we will study an example of the Bayes classifier in a binary classification setting where X = R. Let Y = 1 with probability 1/2 and Y = −1 otherwise. Then, we draw X as follows: if Y = 1 let X ∼ N (µ/2, σ2 ) and if Y = −1 let X ∼ N (−µ/2, σ2 ) where N (µ, σ2 ) is the Gaussian distribution with mean µ and variance σ 2 .
(a) Recall that η(x) = Pr[Y = 1|X = x]. Show that η(x) = 1 1 + exp( −xµ σ2 ) . **Hint.** Use Bayes’ rule.
(b) Compute an analytical expression for hBayes (*i.e.*, substitute η(x) and simplify the resulting expression).
(c) Compute the Bayes error L(hBayes). You can leave your answer in terms of the Gaussian cdf Φ. (d) On which point(s) x ∈ R is hBayes most likely to make a mistake? Why? 2