Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
Theoretical Foundations of Machine Learning - Homework #1
Homework Policy: Working in groups is fine. Please write the members of your group on your solutions. There is no strict limit to the size of the group but we may find it a bit suspicious if there are more than 4 to a team. Questions labelled with (Challenge) are not strictly required, but you’ll get some participation credit if you have something interesting to add, even if it’s only a partial answer.
1) Norm. Prove the following norm inequalities. Assume x ∈ R N .
(a) kxk2 ≤ kxk1 ≤ √ Nkxk2
(b) kxk∞ ≤ kxk2 ≤ √ Nkxk∞
(c) kxk∞ ≤ kxk1 ≤ Nkxk∞
(d) kxkp ≤ N1/pkxk∞ for p > 1
2) H¨older. Let p ∈ ∆N with full suport; that is, PN i=1 pi = 1 and pi > 0 for all i = 1, . . . , N.
(a) Prove using H¨older’s Inequality that PN i=1 1 pi q−1 ≥ Nq for any q > 1.
(b) Prove that PN i=1 pi + 1 pi 2 ≥ N3 + 2N + 1/N
3) Fenchel.
(a) Let f, g be convex functions such that f ∗ = g. Write the convex conjugate of fα(·) = αf(·) (where α ∈ R +) in terms of α and g.
(b) Let f(x) := √ 1 + x 2. What is its Fenchel conjugate, f ∗ √ (θ)? (EDIT: original version was f(x) := 1 − x 2, which is concave!)
4) Hoeffding.
(a) Let X be a random variable and define f(λ) := log E[exp(λX)]. Let f ∗ (θ) be the Fenchel conjugate of f. Show that P(X > t) ≤ exp(−f ∗ (t))
(b) Assume you have m coins. All the coins are unbiased (that is, they have an equal probability of heads or tails), EXCEPT the special coin which comes up heads with probabiliy 1 2 + ρ, for some 0 < ρ < 1 2 . You toss each of the coins n times exactly, and you count the number of heads you observe; say hi is the number of times coin i came up heads. Did the special coin have the most heads? Find a lower bound, in terms of n, m, and ρ, on the probability that the special coin had the largest value hi .
(c) I want to make sure I find the special coin, and I can only handle a small δ > 0 probability of error. Find a value of n, in terms of ρ, δ, and m, to gaurantee that the special coin has the most heads with probability at least 1 − δ.
5) Shannon. Let f : ∆n → R be the negative entropy function, defined as f(x) = Xn i=1 xi log xi . where 0 log(0) = 0.
(a) Let g : R n → R be a function defined as g(θ) = log (Pn i=1 exp(θi)). Show that f is the Fenchel conjugate of g.
(b) Prove that when n = 2, f is 1-strongly convex with respect to k · k1. Denote two arbitrary vectors in ∆2 as x = (p, 1 − p) and y = (q, 1 − q). Hint: Without loss of generality, assume p ≥ q.
(c) (Challenge) Prove that for any n ≥ 2, the function f is 1-strongly convex with respect to k · k1. Hint: Can you find a reduction to the n = 2 case? Here is one possible route: Let x = (x1, . . . , xn) and y = (y1, . . . , yn) be two vectors in ∆n. Let A = {i : xi ≥ yi} be the coordinates where x dominates y. Find new vectors xA, yA ∈ ∆2 such that kx−yk1 = kxA −yAk1 and operate on these.
6) Bregman. Let f be a differentiable convex function.
(a) Show that for all x, y, z ∈ dom(f), Df (x, y) + Df (y, z) − Df (x, z) = h∇f(z) − ∇f(y), x − yi.
(b) Show that if f is 1-strongly convex with respect to a norm k · k then h∇f(x) − ∇f(y), x − yi ≥ kx − yk 2 for all x, y ∈ dom(f), where k · k∗ is the dual norm to k · k. Hint: Consider Df (x, y) and Df (y, x). Note: The original version asked you to show that k∇f(x) − ∇f(y)k∗ ≥ kx − yk which, indeed, is also true. We will accept an answer to either question, but the Challenge question 6(c) refers to the present version.
(c) (Challenge) Show the converse of the above (problem 6b). Hint: Let h(α) = f(x + α(y − x)) and zα = x + α(y − x) . 2