
Theory Assignment 3

COMP 451 - Fundamentals of Machine Learning

Question 1 [13 points]

In class we introduced the Gaussian mixture model (GMM). In this question, we will consider a mixture of Bernoulli distributions. Here, our data points will be defined as m-dimensional vectors of binary values $x \in \{0, 1\}^m$.

First, we will introduce a single multivariate Bernoulli distribution, which is defined by a mean vector $\mu$:

$$P(x \mid \mu) = \prod_{j=0}^{m-1} \mu[j]^{x[j]} (1 - \mu[j])^{(1 - x[j])}. \tag{1}$$

Thus, we see that the individual binary dimensions are independent for a single multivariate Bernoulli.
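To make the product form concrete, here is a minimal sketch that evaluates the probability of a binary vector under a single multivariate Bernoulli. The function name `bernoulli_pmf` and the values in `mu` are illustrative assumptions, not part of the assignment.

```python
from itertools import product

def bernoulli_pmf(x, mu):
    """P(x | mu) for a binary vector x with independent Bernoulli dimensions."""
    p = 1.0
    for xj, mj in zip(x, mu):
        # Each dimension contributes mu[j] if x[j] == 1, else 1 - mu[j].
        p *= mj if xj == 1 else (1.0 - mj)
    return p

mu = [0.2, 0.7, 0.5]  # assumed example mean vector (m = 3)

# Summing over all 2^m binary vectors should give 1 for a valid distribution.
total = sum(bernoulli_pmf(x, mu) for x in product([0, 1], repeat=len(mu)))
print(total)
```

Because the probability factorizes over dimensions, the cost of one evaluation is linear in m even though the sample space has 2^m points.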

Now, we can define a mixture of K multivariate Bernoulli distributions as follows:

$$P(x \mid \{\mu_k, \pi_k\}) = \sum_{k=0}^{K-1} \pi_k P(x \mid \mu_k), \tag{2}$$

where $\{\mu_k, \pi_k,\ k = 0, \ldots, K-1\}$ are the parameters of the mixture and $P(x \mid \mu_k)$ is the probability assigned to the point by each individual component in the model.
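One way to build intuition for the mixture is to sample from it. The sketch below assumes the usual generative reading of a mixture model: first draw a component index k with probability π_k, then draw each binary dimension independently from that component's mean vector. The helper name `sample_mixture` and the parameter values are hypothetical.

```python
import random

def sample_mixture(pis, mus, rng):
    """Draw (k, x): a component index k ~ pis, then x ~ Bernoulli(mus[k])."""
    k = rng.choices(range(len(pis)), weights=pis)[0]
    # Given the component, each dimension is an independent coin flip.
    x = [1 if rng.random() < mj else 0 for mj in mus[k]]
    return k, x

pis = [0.3, 0.7]                 # assumed mixing weights (sum to 1)
mus = [[0.9, 0.1], [0.2, 0.6]]   # assumed mean vectors, one per component

rng = random.Random(0)           # seeded for repeatability
samples = [sample_mixture(pis, mus, rng) for _ in range(5)]
for k, x in samples:
    print(k, x)
```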

Note that the mean of each individual component distribution $P(x \mid \mu_k)$ is given by

$$\mathbb{E}_k[x] = \mu_k, \tag{5}$$

and the covariance matrix of each component is given by

$$\mathrm{Cov}_k[x] = \Sigma_k = \mathrm{diag}(\mu_k \odot (1 - \mu_k)), \tag{6}$$

where $\odot$ denotes elementwise multiplication. In other words, the covariance matrix $\Sigma_k$ for each component is a diagonal matrix with diagonal entries given by $\Sigma_k[j, j] = \mu_k[j](1 - \mu_k[j])$. It is diagonal because the dimensions are independent within a single component.
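The component moments can be checked by brute force for a small m. This sketch enumerates all binary vectors, accumulates the first and second moments of one component, and forms the covariance as the second moment minus the outer product of the mean. The names `component_moments` and `mu_k`, and the numeric values, are assumptions chosen for illustration.

```python
from itertools import product

def bernoulli_pmf(x, mu):
    """P(x | mu) for a binary vector x with independent Bernoulli dimensions."""
    p = 1.0
    for xj, mj in zip(x, mu):
        p *= mj if xj == 1 else (1.0 - mj)
    return p

def component_moments(mu):
    """Return (mean, cov) of one component by enumerating all 2^m vectors."""
    m = len(mu)
    mean = [0.0] * m
    second = [[0.0] * m for _ in range(m)]  # accumulates E[x x^T]
    for x in product([0, 1], repeat=m):
        p = bernoulli_pmf(x, mu)
        for a in range(m):
            mean[a] += p * x[a]
            for b in range(m):
                second[a][b] += p * x[a] * x[b]
    # Covariance = E[x x^T] - E[x] E[x]^T, entry by entry.
    cov = [[second[a][b] - mean[a] * mean[b] for b in range(m)] for a in range(m)]
    return mean, cov

mu_k = [0.2, 0.7, 0.5]  # assumed example mean vector
mean, cov = component_moments(mu_k)
print(mean)
print(cov)
```

Up to floating-point error, the mean should match `mu_k`, the diagonal should match `mu_k[j] * (1 - mu_k[j])`, and the off-diagonal entries should vanish.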

Part 1 [8 points]

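Derive expressions for the mean vector and the covariance matrix of the full mixture distribution defined in Equation 2. That is, give expressions for the following:

$$\mathbb{E}[x] = \;? \qquad \mathrm{Cov}[x] = \;? \tag{7}$$

Hint: use the fact that

$$\mathrm{Cov}[x] = \mathbb{E}\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^\top\right] = \mathbb{E}[x x^\top] - \mathbb{E}[x]\,\mathbb{E}[x]^\top.$$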

Part 2 [5 points]

Just as with a GMM, we can use the expectation maximization (EM) algorithm to learn the parameters of a Bernoulli mixture model. Here, we will provide you with the formula for the expectation step as well as the log-likelihood of the model. You must derive the formula for the maximization step.

Expectation step. In the expectation step of the Bernoulli mixture model, we compute scores $r(x, k)$, which tell us how likely it is that point $x$ belongs to component $k$. These scores are computed as follows:

$$r(x, k) = \frac{\pi_k P(x \mid \mu_k)}{\sum_{j=0}^{K-1} \pi_j P(x \mid \mu_j)}, \tag{8}$$

where $P(x \mid \mu_k)$ is defined as in Equation 2.
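The expectation step can be sketched in a few lines. The code below weights each component's probability of x by its mixing weight and normalizes over all K components, so the scores for one point sum to 1. The function names and parameter values are illustrative assumptions. A practical implementation would likely work in log space to avoid underflow for large m.

```python
def bernoulli_pmf(x, mu):
    """P(x | mu) for a binary vector x with independent Bernoulli dimensions."""
    p = 1.0
    for xj, mj in zip(x, mu):
        p *= mj if xj == 1 else (1.0 - mj)
    return p

def responsibilities(x, pis, mus):
    """Return [r(x, 0), ..., r(x, K-1)] for a single data point x."""
    weighted = [pi_k * bernoulli_pmf(x, mu_k) for pi_k, mu_k in zip(pis, mus)]
    total = sum(weighted)  # normalizer: sum over all components
    return [w / total for w in weighted]

pis = [0.3, 0.7]                 # assumed mixing weights
mus = [[0.9, 0.1], [0.2, 0.6]]   # assumed component mean vectors

r = responsibilities([1, 0], pis, mus)
print(r)
```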

Log-likelihood.

? (9)

Maximization step. You must find the formula for the $\mu_k$ parameters in the maximization step:

$$\mu_k = \;? \tag{10}$$

Question 2 [5 points]

Recall that the low dimensional codes in PCA are defined as

$$z_i = U^\top (x_i - \mu), \tag{11}$$

where $U$ is a matrix containing the top-$k$ eigenvectors of the covariance matrix and $\mu$ is the mean of the $n$ data points,

$$\mu = \frac{1}{n} \sum_{i} x_i. \tag{12}$$

Recall that the reconstruction of a point $x_i$ using its code $z_i$ is given by

$$\hat{x}_i = U z_i + \mu. \tag{13}$$

Show that

$$(\hat{x}_i - x_i)^\top (\hat{x}_i - \mu) = 0. \tag{14}$$
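A numerical sanity check is not a proof, but it can confirm the claim on a toy example before attempting the derivation. The sketch below assumes 2-D data and k = 1, so the top eigenvector of the 2x2 covariance matrix can be obtained in closed form from the principal-axis angle. The data points and all names are made up for illustration.

```python
import math

points = [(2.0, 1.0), (3.0, 4.0), (5.0, 4.5), (6.0, 7.0)]  # assumed toy data
n = len(points)

# Data mean and entries of the 2x2 covariance matrix [[a, b], [b, c]].
mu = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
a = sum((p[0] - mu[0]) ** 2 for p in points) / n
b = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points) / n
c = sum((p[1] - mu[1]) ** 2 for p in points) / n

# Principal-axis angle of a symmetric 2x2 matrix gives its top eigenvector.
theta = 0.5 * math.atan2(2.0 * b, a - c)
u = (math.cos(theta), math.sin(theta))  # U has a single unit-norm column

def reconstruct(x):
    """Encode x to a scalar code z, then decode back to 2-D."""
    z = u[0] * (x[0] - mu[0]) + u[1] * (x[1] - mu[1])
    return (u[0] * z + mu[0], u[1] * z + mu[1])

def residual_dot(x):
    """Inner product of (x_hat - x) with (x_hat - mu)."""
    xh = reconstruct(x)
    return (xh[0] - x[0]) * (xh[0] - mu[0]) + (xh[1] - x[1]) * (xh[1] - mu[1])

dots = [residual_dot(p) for p in points]
print(dots)
```

Each printed value should be zero up to floating-point error, reflecting that the reconstruction error is orthogonal to the subspace spanned by the columns of U.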

Question 3 [short answers; 2 points each]

Answer each question with 1-3 sentences for justification, potentially with equations/examples for support.

a) True or false: It is always possible to choose an initialization so that K-means converges in one iteration.

b) Suppose you are learning a decision tree for email spam classification. Your current sample of the training data has the following distribution of labels:

[43+, 30−], (15)

i.e., the training sample has 43 examples that are spam and 30 that are not spam. Now, you are choosing between two candidate tests.

Test 1 (T1) tests whether the number of words in the email is greater than 30 and would result in the following splits:

• num words > 30: [5+, 15−]

• num words ≤ 30: [38+, 15−]

Test 2 (T2) tests whether the email contains an external URL link and would result in the following splits:

• has link: [25+, 5−]

• has no link: [18+, 25−]

Which test should you use to split the data? I.e., which test provides a higher information gain?
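Information gain is the entropy of the parent node minus the size-weighted average entropy of the children. The following sketch computes it for the two candidate tests, using base-2 logarithms (an assumption; the assignment does not state the base, and the ranking of tests does not depend on it). The function names are illustrative.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a node with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:  # treat 0 * log(0) as 0
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    conditional = sum((sum(ch) / n) * entropy(*ch) for ch in children)
    return entropy(*parent) - conditional

parent = (43, 30)
gain_t1 = information_gain(parent, [(5, 15), (38, 15)])   # num words > 30 vs. <= 30
gain_t2 = information_gain(parent, [(25, 5), (18, 25)])   # has link vs. no link
print(gain_t1, gain_t2)
```

The test with the larger printed value is the one that provides the higher information gain.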

c) Which of the following statements is false?

1. If the covariance between two variables is zero, then their mutual information is also zero.

2. Adding more features is a useful strategy to combat underfitting.

3. Decision trees can learn non-linear decision boundaries.

4. The Gaussian mixture model contains more parameters than K-means.
