STAT3006 Statistical Learning

STAT3006/7305

Tutorial 1

1. Assume X = [X1 X2]^T is bivariate normal with mean μ and covariance matrix Σ.

Derive an expression for the marginal distribution of X1.

2. For the above problem, derive an expression for the conditional distribution of X1 from the bivariate normal, given X2 = 0.
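As a reference point for checking derivations in questions 1 and 2, the standard results for a bivariate normal can be written as below. The symbols μ1, μ2, σ11, σ12 and σ22 are notation assumed here for the components of the mean vector and covariance matrix; they are not taken from the sheet.

```latex
% Assumed notation: X = (X_1, X_2)^T \sim N_2(\mu, \Sigma), with
% \mu = (\mu_1, \mu_2)^T and
% \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}.

% Marginal distribution of X_1:
X_1 \sim N(\mu_1,\ \sigma_{11})

% Conditional distribution of X_1 given X_2 = x_2:
X_1 \mid X_2 = x_2 \ \sim\ N\!\left(\mu_1 + \frac{\sigma_{12}}{\sigma_{22}}\,(x_2 - \mu_2),\ \ \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}\right)

% Setting x_2 = 0:
X_1 \mid X_2 = 0 \ \sim\ N\!\left(\mu_1 - \frac{\sigma_{12}\,\mu_2}{\sigma_{22}},\ \ \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}\right)
```

The derivation itself (integrating out X2, or completing the square in the joint density) is what the questions ask for; the block only states the end result to compare against.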

3. If the variates X^T = [X1 X2 X3] and Y^T = [Y1 Y2 Y3] are independent and each trivariate normally distributed, with respective means and covariances:

(a) Determine the distribution of X − Y.

(b) Determine the correlation matrix for X.

(c) Use R (or Python) to determine the correlation matrix for Y.

(d) What is the distribution of Y1 and Y2 when Y3 is fixed to 4?

(e) What is the partial correlation of Y1 and Y2 when Y3 is fixed?
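The calculations in question 3 can be sketched in Python with NumPy as follows. The mean vectors and covariance matrices in the code are hypothetical placeholders chosen only so that the script runs; substitute the values given for X and Y. The helper names `cov_to_corr` and `condition_on_last` are also introduced here for illustration.

```python
import numpy as np

# Hypothetical placeholder parameters: NOT the values from the tutorial sheet.
mu_x = np.array([1.0, 2.0, 3.0])
sigma_x = np.array([[4.0, 1.0, 0.5],
                    [1.0, 9.0, 2.0],
                    [0.5, 2.0, 1.0]])
mu_y = np.array([0.0, 1.0, 2.0])
sigma_y = np.array([[2.0, 0.6, 0.4],
                    [0.6, 1.0, 0.3],
                    [0.4, 0.3, 1.0]])


def cov_to_corr(sigma):
    """Convert a covariance matrix to the corresponding correlation matrix."""
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)


def condition_on_last(mu, sigma, value):
    """Mean and covariance of the leading components given the last one."""
    mu_1, mu_2 = mu[:-1], mu[-1]
    s_11 = sigma[:-1, :-1]
    s_12 = sigma[:-1, -1]
    s_22 = sigma[-1, -1]
    cond_mean = mu_1 + s_12 * (value - mu_2) / s_22
    cond_cov = s_11 - np.outer(s_12, s_12) / s_22
    return cond_mean, cond_cov


# (a) For independent X and Y, X - Y is normal: means subtract, covariances add.
diff_mean = mu_x - mu_y
diff_cov = sigma_x + sigma_y

# (b), (c) Correlation matrices.
corr_x = cov_to_corr(sigma_x)
corr_y = cov_to_corr(sigma_y)

# (d) (Y1, Y2) given Y3 = 4 is bivariate normal with these parameters.
cond_mean, cond_cov = condition_on_last(mu_y, sigma_y, 4.0)

# (e) The partial correlation is the correlation in the conditional covariance;
# it does not depend on the value at which Y3 is fixed.
partial_corr = cov_to_corr(cond_cov)[0, 1]

print(corr_y)
print(cond_mean, cond_cov, partial_corr)
```

Note that the conditional covariance, and therefore the partial correlation, is the same whatever value Y3 is fixed to; only the conditional mean changes.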

4. In introductory statistics courses, you likely encountered rules for the mean and variance of linear combinations of univariate random variables. Specifically, given random variables X1 and X2 and constants a and b, consider the linear combination

Y = aX1 + bX2.

The mean of the linear combination is the linear combination of the means:

E(Y) = aE(X1) + bE(X2).

The variance of the linear combination is as follows:

Var(Y) = a^2 Var(X1) + b^2 Var(X2) + 2ab Cov(X1, X2).

Derive corresponding expressions for the mean and variance of linear combinations of arbitrary p-dimensional random variables: AX + BY. Do not assume that any of these variables are independent of the others.

Hints: let Z = (X^T Y^T)^T, consider a suitable matrix C and the properties of CZ. Assume E(X) = μX, E(Y) = μY, Cov(X, X) = ΣXX, Cov(Y, Y) = ΣYY, Cov(X, Y) = ΣXY.
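One way to sanity-check a derived expression is to compare it numerically with the stacked form suggested by the hint. The sketch below assumes C = [A B] (so that CZ = AX + BY) and uses randomly generated, purely illustrative values for the mean, covariance, A and B; the dimensions `p` and `q` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2  # p: dimension of X and of Y; q: number of rows of A and B

# Arbitrary valid joint mean and covariance for Z = (X^T, Y^T)^T.
m = rng.standard_normal((2 * p, 2 * p))
sigma_z = m @ m.T  # symmetric positive semi-definite by construction
mu_z = rng.standard_normal(2 * p)

# Blocks corresponding to X and Y.
mu_x, mu_y = mu_z[:p], mu_z[p:]
s_xx = sigma_z[:p, :p]
s_xy = sigma_z[:p, p:]
s_yy = sigma_z[p:, p:]

A = rng.standard_normal((q, p))
B = rng.standard_normal((q, p))
C = np.hstack([A, B])  # CZ = AX + BY

# Mean and covariance of CZ from the general linear-transformation rules.
mean_cz = C @ mu_z
cov_cz = C @ sigma_z @ C.T

# The same quantities expanded block by block.
mean_expanded = A @ mu_x + B @ mu_y
cov_expanded = (A @ s_xx @ A.T + A @ s_xy @ B.T
                + B @ s_xy.T @ A.T + B @ s_yy @ B.T)

print(np.allclose(mean_cz, mean_expanded), np.allclose(cov_cz, cov_expanded))
```

With p = 1 and scalar A = a, B = b, the expanded covariance reduces to the univariate variance rule quoted above.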

5. Simulation, estimation, accuracy and plotting

(a) Use R's mvtnorm package (or similar) to generate 100 (pseudo-)random vectors from the distribution given in the previous question.

(b) Calculate the A matrix (sums of squares and cross-products) and the maximum likelihood estimate of all the parameters of the corresponding normal model using the data produced above. Use R or Python.

(c) Calculate the Euclidean error for the estimated mean versus the true mean using R or Python.

(d) Calculate the Frobenius norm of the difference between the estimated correlation matrix and the true correlation matrix using R or Python.

(e) Try to plot some (e.g. 2 or 3) contours of the marginal distribution (first two dimensions) for the true and estimated distributions, in different colours. Add the data points.

(f) If you had produced e.g. 100,000 random vectors instead, think about how you might reasonably produce a plot similar to the one above. Implement this method, producing a plot. Does this method also work well for the original sample size of 100?
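A possible Python workflow for question 5 is sketched below, using NumPy for parts (a) to (d) and Matplotlib for parts (e) and (f). The true parameters `mu` and `sigma` are hypothetical placeholders, not the values from the sheet. The sketch also assumes that the A matrix is centred about the sample mean, and it uses hexagonal binning as one candidate answer to (f); the helper names `density_grid` and `plot_marginal` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(3006)

# Hypothetical placeholder parameters: NOT the values from the tutorial sheet.
mu = np.array([0.0, 1.0, 2.0])
sigma = np.array([[2.0, 0.6, 0.4],
                  [0.6, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])
n = 100


def cov_to_corr(s):
    """Convert a covariance matrix to the corresponding correlation matrix."""
    sd = np.sqrt(np.diag(s))
    return s / np.outer(sd, sd)


# (a) n pseudo-random vectors, one per row.
data = rng.multivariate_normal(mu, sigma, size=n)

# (b) Sample mean, A matrix (assumed centred about the sample mean) and the
# maximum likelihood estimate of the covariance (divide by n, not n - 1).
mu_hat = data.mean(axis=0)
centred = data - mu_hat
a_matrix = centred.T @ centred
sigma_hat = a_matrix / n

# (c) Euclidean error of the estimated mean.
mean_error = np.linalg.norm(mu_hat - mu)

# (d) Frobenius norm of the difference between correlation matrices.
corr_error = np.linalg.norm(cov_to_corr(sigma_hat) - cov_to_corr(sigma), ord="fro")


def density_grid(mu2, sigma2, xs, ys):
    """Bivariate normal density on the grid xs by ys, shape (len(ys), len(xs))."""
    gx, gy = np.meshgrid(xs, ys)
    dev = np.stack([gx - mu2[0], gy - mu2[1]], axis=-1)
    quad = np.einsum("...i,ij,...j->...", dev, np.linalg.inv(sigma2), dev)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(sigma2)))


def plot_marginal(data, mu, sigma, mu_hat, sigma_hat, path, binned=False):
    """(e)/(f) Contours of the true and fitted (X1, X2) marginals plus the data."""
    # Imported here so the numerical steps above do not require Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # write to file; no display needed
    import matplotlib.pyplot as plt

    xs = np.linspace(data[:, 0].min(), data[:, 0].max(), 200)
    ys = np.linspace(data[:, 1].min(), data[:, 1].max(), 200)
    fig, ax = plt.subplots()
    if binned:
        # For very large samples, show binned counts instead of single points.
        ax.hexbin(data[:, 0], data[:, 1], gridsize=50, cmap="Greys")
    else:
        ax.scatter(data[:, 0], data[:, 1], s=8, color="grey")
    ax.contour(xs, ys, density_grid(mu[:2], sigma[:2, :2], xs, ys),
               levels=3, colors="blue")  # true distribution
    ax.contour(xs, ys, density_grid(mu_hat[:2], sigma_hat[:2, :2], xs, ys),
               levels=3, colors="red")  # estimated distribution
    ax.set_xlabel("X1")
    ax.set_ylabel("X2")
    fig.savefig(path)
    plt.close(fig)


print(mean_error, corr_error)
```

Calling `plot_marginal(data, mu, sigma, mu_hat, sigma_hat, "contours.png")` would draw the scatter version for part (e). For part (f), regenerate the data with `size=100_000`, recompute the estimates, and pass `binned=True`. Trying `binned=True` on the original 100 points is a quick way to see how sparse the bins become at that sample size.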





