STAT7005 Multivariate Methods
3 Inference about Mean Structure
3.1 General Consideration of Inference
3.1.1 Duality of Hypothesis Testing and Confidence Region
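To illustrate the duality between hypothesis testing and confidence regions, consider testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$. Suppose a random sample $X_1, X_2, \ldots, X_n$ is drawn from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. Intuitively, if $|\bar{X} - \mu_0|$ is sufficiently large, say larger than $x_0$, the observed sample favors rejecting $H_0$. Picking $x_0 = z_{\alpha/2}\,\sigma/\sqrt{n}$ ensures that the probability of a Type I error is $\alpha$. Then $H_0$ cannot be rejected for $X = \{X_1, X_2, \ldots, X_n\}$ in the set
$$A = \left\{X : |\bar{X} - \mu_0| \le z_{\alpha/2}\,\sigma/\sqrt{n}\right\}.$$
Alternatively, this set can be rewritten as
$$A = \left\{X : \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu_0 \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right\}.$$
On the other hand, consider the $(1-\alpha)100\%$ confidence interval for $\mu$,
$$C = \left\{\mu : \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right\}.$$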
It can be checked that $X \in A$ if and only if $\mu_0 \in C$. In other words, the data $X$ fall within the non-rejection region $A$ under $H_0$ if and only if $\mu_0$ belongs to the confidence interval $C$ obtained from $X$. This relationship can be used to construct confidence intervals from hypothesis tests, and hypothesis tests from confidence intervals.
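The equivalence between the non-rejection region and the confidence interval can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the summary values ($\bar{X} = 10.3$, $\sigma = 2$, $n = 25$) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_not_rejected(xbar, mu0, sigma, n, alpha=0.05):
    """Return True when the sample falls in the non-rejection region A."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 normal quantile
    return abs(xbar - mu0) <= z * sigma / sqrt(n)


def confidence_interval(xbar, sigma, n, alpha=0.05):
    """Return the (1 - alpha)100% interval C for mu as (lower, upper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary values, for illustration only.
xbar, sigma, n = 10.3, 2.0, 25
lower, upper = confidence_interval(xbar, sigma, n)
for mu0 in (9.0, 9.8, 10.3, 11.0, 11.5):
    in_A = z_test_not_rejected(xbar, mu0, sigma, n)
    in_C = lower <= mu0 <= upper
    print(mu0, in_A, in_C)  # the two flags agree for every mu0
```

For each candidate $\mu_0$, the test decision and the interval membership coincide, which is the duality stated above.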
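As another example, consider testing $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$. It is reasonable to reject $H_0$ if $\bar{X} - \mu_0$ is larger than $x_0$. Following the previous argument, the critical value $x_0 = z_{\alpha}\,\sigma/\sqrt{n}$ gives a Type I error probability of $\alpha$. Then $H_0$ cannot be rejected for $X$ in the set
$$A = \left\{X : \bar{X} - \mu_0 \le z_{\alpha}\,\sigma/\sqrt{n}\right\} = \left\{X : \mu_0 \ge \bar{X} - z_{\alpha}\,\sigma/\sqrt{n}\right\}.$$
Then the $(1-\alpha)100\%$ one-sided confidence interval for $\mu$ can be constructed as
$$C = \left\{\mu : \mu \ge \bar{X} - z_{\alpha}\,\sigma/\sqrt{n}\right\}.$$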
This idea can be extended to multivariate settings, as shown in the following sections, and even to general hypothesis testing problems.
In the next two sections, two approaches to hypothesis testing are introduced that will be used throughout the course, namely the likelihood ratio test (LRT) and the union intersection test (UIT).
3.1.2 Likelihood Ratio Test (LRT)
In view of the above duality between tests and confidence regions, a general procedure for deriving a test of hypotheses is vital. Recall that the following procedure yields the likelihood ratio test (LRT). The target statistic is
$$\lambda(x_1, \ldots, x_n) = \frac{\max_{\theta \in \omega} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Omega} L(\theta; x_1, \ldots, x_n)},$$
which is the ratio of two maximized likelihoods, one under the restriction imposed by $H_0$ and the other under the restriction imposed by ($H_0$ or $H_1$), where $\omega$ and $\Omega$ are the parameter sets under the restrictions imposed by $H_0$ and $H_0 \cup H_1$ respectively. Since $\omega \subseteq \Omega$, this ratio takes a value in the interval $(0, 1]$. A moment's reflection on the meaning of the ratio suggests that large values favor $H_0$. Thus, we reject $H_0$ at significance level $\alpha$ if $\lambda(x_1, \ldots, x_n) < c_\alpha$, where $c_\alpha$ is determined so that
$$P(\lambda(x_1, \ldots, x_n) < c_\alpha \mid H_0) = \alpha.$$
Here $c_\alpha$ is known as the critical value. Equivalently, we reject $H_0$ if the p-value for an observed value $\lambda_0$ of the likelihood ratio is less than $\alpha$, i.e.
$$\text{p-value}(\lambda_0) = P(\lambda(x_1, \ldots, x_n) < \lambda_0 \mid H_0) < \alpha.$$
In practice, the sampling distribution of the likelihood ratio $\lambda$ is difficult to determine, so we usually express $\lambda$ in terms of an intermediate statistic that follows a well-known distribution or that makes the p-value (or critical value) easier to compute; the LRT is then carried out using that statistic. This procedure produces all the test statistics presented in this chapter.
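As an illustration of reducing $\lambda$ to a familiar statistic, consider again the normal sample with known variance from Section 3.1.1, testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$. The restricted maximum of the likelihood is attained at $\mu_0$ and the unrestricted maximum at $\bar{x}$. A sketch of the algebra, where $z$ is notation introduced here for the standardized mean:

```latex
\begin{align*}
\lambda(x_1,\ldots,x_n)
  &= \frac{L(\mu_0)}{L(\bar{x})}
   = \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (x_i-\mu_0)^2
     - \sum_{i=1}^n (x_i-\bar{x})^2\right]\right\} \\
  &= \exp\left\{-\frac{n(\bar{x}-\mu_0)^2}{2\sigma^2}\right\}
   = \exp\left(-\tfrac{1}{2} z^2\right),
  \qquad z = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{\sigma}.
\end{align*}
```

Since $\lambda$ is a decreasing function of $|z|$, rejecting for $\lambda < c_\alpha$ is the same as rejecting for $|z| > z_{\alpha/2}$. The LRT therefore reproduces the two-sided test of Section 3.1.1, with $z \sim N(0, 1)$ under $H_0$ serving as the intermediate statistic.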
As the sample size tends to infinity, the statistic $-2\log\lambda$ is asymptotically distributed as $\chi^2(k)$ under $H_0$, where the degrees of freedom $k$ equal the number of free parameters under $H_0 \cup H_1$ minus that under $H_0$, i.e. $k = \dim(\Omega) - \dim(\omega)$. In other words, $k$ is the difference between the number of free parameters under the unrestricted model and that under the restricted model. This asymptotic distribution will be used in some later chapters.
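For example, consider the following multiple linear regression model
$$Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon,$$
where $\varepsilon \sim N(0, \sigma^2)$. Suppose we test the hypothesis $H_0: \beta_2 = \beta_3 = \beta_4 = 0$ against $H_1$: at least one of them is not zero. The likelihood ratio is
$$\lambda = \frac{\max_{\beta_1, \beta_5, \ldots, \beta_p, \sigma^2} L(\beta_1, 0, 0, 0, \beta_5, \ldots, \beta_p, \sigma^2)}{\max_{\beta_1, \ldots, \beta_p, \sigma^2} L(\beta_1, \ldots, \beta_p, \sigma^2)}.$$
The test statistic $-2\log\lambda$ is asymptotically distributed as $\chi^2(k)$, where $k = \dim(\Omega) - \dim(\omega) = (p + 1) - (p + 1 - 3) = 3$.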
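For this normal linear model the two maximized likelihoods have closed forms, so the statistic can be written in terms of residual sums of squares. In the sketch below, $n$ (the number of observations) and $\mathrm{RSS}_\omega$, $\mathrm{RSS}_\Omega$ (the residual sums of squares of the least-squares fits under $H_0$ and under the full model) are notation introduced here for illustration:

```latex
\begin{align*}
\max L &= (2\pi\hat{\sigma}^2)^{-n/2}\, e^{-n/2},
  \qquad \hat{\sigma}^2 = \mathrm{RSS}/n, \\
\lambda &= \left(\frac{\mathrm{RSS}_\Omega}{\mathrm{RSS}_\omega}\right)^{n/2}, \\
-2\log\lambda &= n \log\frac{\mathrm{RSS}_\omega}{\mathrm{RSS}_\Omega}.
\end{align*}
```

Because $\mathrm{RSS}_\omega \ge \mathrm{RSS}_\Omega$, the statistic is non-negative, and for large $n$ it is compared with a $\chi^2(3)$ critical value.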
The "negative twice log likelihood ratio" was first called the "$G^2$-statistic" by Sir R. A. Fisher and was later renamed by McCullagh and Nelder, in their theory of generalized linear models, as the deviance between the two models under $H_0$ and $H_1$ respectively.
3.1.3 Union Intersection Test (UIT)
Another procedure, introduced by S. N. Roy specifically for the multivariate setting, generates what are called union intersection tests (UIT). This procedure gives the same tests as the LRT in this chapter, but may give different tests in other circumstances. It first expresses the multivariate null hypothesis as the intersection of a family of univariate null hypotheses, one for each possible linear combination of the $p$ variables. Correspondingly, the multivariate alternative hypothesis becomes the logical union of the univariate alternative hypotheses. If a test exists for each univariate null hypothesis against its associated alternative, the multivariate null hypothesis is not rejected if and only if none of the univariate null hypotheses is rejected.
Since, unlike the LRT, this procedure is being introduced for the first time, we demonstrate its operation for the special case of $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$, where $\bar{x}$ and $S$ are respectively the mean vector and covariance matrix of a random sample of $n$ observations from $N_p(\mu, \Sigma)$.
Recall that $\cap$ and $\cup$ denote the intersection and union of sets respectively.
Now, we observe that
$$H_0: \mu = \mu_0 \iff \bigcap_{a} \{H_0(a): a'\mu = a'\mu_0\}, \qquad H_1: \mu \neq \mu_0 \iff \bigcup_{a} \{H_1(a): a'\mu \neq a'\mu_0\}.$$
Under $H_0$, the difference between $a'\mu$ and $a'\mu_0$ is zero for all $a$. In other words, however we project $\mu = (\mu_1, \ldots, \mu_p)'$ onto a single direction, the result is the same as the corresponding projection of $\mu_0$. Consequently,
not reject $H_0$ $\iff$ not reject $H_0(a)$ for all $a$;
and reject $H_0$ $\iff$ reject $H_0(a)$ for at least one $a$.
We recall that, for any $a$, the univariate null hypothesis $H_0(a)$ is not rejected for small values of
$$t^2(a) = \frac{n(a'\bar{x} - a'\mu_0)^2}{a'Sa}.$$
N.B. $t^2(a)$ can be regarded as the squared $t$-statistic. Then, according to the above UIT argument, we do not reject $H_0$ when the following value is small:
$$T^2 = \max_{a} t^2(a) = n(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0),$$
where the maximum is obtained by using the technique in Appendix 1 of Chapter 1. This test statistic is called Hotelling's $T^2$, whose distribution is related to the $F$ distribution as described in the next section. Note that the UIT is also the LRT in this case.
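The maximization over $a$ can be checked numerically. The sketch below, restricted to $p = 2$ so that it needs only the Python standard library, compares the closed form $n(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0)$ with $t^2(a)$ at the direction $a = S^{-1}(\bar{x} - \mu_0)$ and at random directions. The data, the null value and all function names are hypothetical and serve only as an illustration.

```python
import random


def mean_and_cov(data):
    """Sample mean vector and unbiased covariance matrix for 2-D data."""
    n = len(data)
    m = [sum(row[j] for row in data) / n for j in range(2)]
    s = [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in data) / (n - 1)
          for j in range(2)] for i in range(2)]
    return m, s


def inverse_2x2(s):
    """Inverse of a non-singular 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def t2_of_a(a, xbar, mu0, s, n):
    """Squared t-statistic of the univariate test H0(a): a'mu = a'mu0."""
    num = sum(a[i] * (xbar[i] - mu0[i]) for i in range(2)) ** 2
    den = sum(a[i] * s[i][j] * a[j] for i in range(2) for j in range(2))
    return n * num / den


# Hypothetical bivariate sample and null value, for illustration only.
data = [(4.1, 2.0), (5.3, 2.9), (6.0, 3.1), (4.8, 2.2), (5.5, 3.4), (6.2, 2.8)]
mu0 = (5.0, 2.5)
n = len(data)
xbar, s = mean_and_cov(data)
s_inv = inverse_2x2(s)
diff = [xbar[i] - mu0[i] for i in range(2)]

# Closed form: T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0).
t2_closed = n * sum(diff[i] * s_inv[i][j] * diff[j]
                    for i in range(2) for j in range(2))

# The maximizing direction is proportional to S^{-1} (xbar - mu0).
a_star = [sum(s_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
t2_at_a_star = t2_of_a(a_star, xbar, mu0, s, n)

# Random directions should never exceed the closed-form maximum.
rng = random.Random(0)
t2_random = [t2_of_a([rng.gauss(0, 1), rng.gauss(0, 1)], xbar, mu0, s, n)
             for _ in range(1000)]

print(t2_closed, t2_at_a_star, max(t2_random))
```

The first two printed values agree up to rounding, and the third does not exceed them, consistent with $T^2 = \max_a t^2(a)$.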
3.1.4 Hotelling's $T^2$ Distribution
Suppose $y \sim N_p(0, \Sigma)$ and $V \sim W_p(k, \Sigma)$ are independent. Define
$$T^2 = k\,y'V^{-1}y.$$
Then $T^2$ is said to follow Hotelling's $T^2$ distribution with parameters $p$ and $k$, denoted $T^2(p, k)$.
Properties
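1. If $\bar{x}$ and $S$ are respectively the sample mean vector and sample covariance matrix of a random sample of size $n$ taken from $N_p(\mu, \Sigma)$, then
$$n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \sim T^2(p, n - 1).$$
Proof: Put $y = \sqrt{n}(\bar{x} - \mu) \sim N_p(0, \Sigma)$ and $V = (n - 1)S \sim W_p(n - 1, \Sigma)$, which are independent. With $k = n - 1$, $k\,y'V^{-1}y = n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu)$.
In the univariate case ($p = 1$), Hotelling's $T^2$ statistic reduces to the squared $t$-statistic.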
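2. $T^2(p, k) = \dfrac{kp}{k - p + 1}\,F(p, k - p + 1)$.
Proof: From Property (9) of Section 2.1, we have $y'\Sigma^{-1}y \sim \chi^2(p)$. Also, from Property (9) of Section 2.3, we have $y'\Sigma^{-1}y / y'V^{-1}y \sim \chi^2(k - p + 1)$, and this ratio is independent of $y$. Then,
$$T^2 = k\,y'V^{-1}y = k\,\frac{y'\Sigma^{-1}y}{y'\Sigma^{-1}y / y'V^{-1}y} = k\,\frac{\chi^2(p)}{\chi^2(k - p + 1)}.$$
Hence,
$$\frac{k - p + 1}{kp}\,T^2 = \frac{\chi^2(p)/p}{\chi^2(k - p + 1)/(k - p + 1)} \sim F(p, k - p + 1),$$
and the result follows.
A direct result of property (2), with $k = n - 1$, is $T^2(p, n - 1) = \dfrac{(n - 1)p}{n - p}\,F(p, n - p)$.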
3. $T^2(1, k) = t^2(k) = F(1, k)$.
Hint: Use property (2).
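Properties (2) and (3) can be illustrated with a small computation. The sketch below rescales a $T^2$ value from a sample of size $n$ to its $F(p, n - p)$ equivalent, then checks on a univariate sample that $T^2$, the squared $t$-statistic and the rescaled $F$ statistic coincide when $p = 1$. The helper name `t2_to_f` and the data are hypothetical; obtaining a p-value would additionally need an $F$ distribution function from a statistics library, which is not used here.

```python
from math import sqrt
from statistics import mean, stdev


def t2_to_f(t2, p, n):
    """Rescale T^2 ~ T^2(p, n - 1) to an F(p, n - p) statistic."""
    if n <= p:
        raise ValueError("need n > p for the F(p, n - p) distribution")
    f_stat = (n - p) / (p * (n - 1)) * t2
    return f_stat, (p, n - p)


# Hypothetical univariate sample (p = 1) and null value.
sample = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]
mu0 = 5.0
n = len(sample)

# Ordinary one-sample t-statistic.
t_stat = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Hotelling's T^2 with p = 1: n (xbar - mu0) S^{-1} (xbar - mu0).
t2 = n * (mean(sample) - mu0) ** 2 / stdev(sample) ** 2

f_stat, dof = t2_to_f(t2, p=1, n=n)
print(t_stat ** 2, t2, f_stat, dof)  # first three values agree up to rounding
```

For $p = 1$ the scaling factor $(n - p)/(p(n - 1))$ equals 1, so the $F$ statistic is $T^2$ itself, in line with property (3).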
Remarks
1. The statistic $T^2 = n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu)$ is invariant under the transformation
$$y = Cx + d \quad (C \text{ non-singular}).$$
That is, a non-singular linear transformation of the measurement $x$, with or without a shift, does not change the value of the $T^2$ statistic.
Proof: Let the sample mean and covariance matrix of $y$ be $\bar{y}$ and $S_y$ respectively. Clearly, $\bar{y} = C\bar{x} + d$ and $S_y = CSC'$. Then the $T^2$ statistic becomes
$$\begin{aligned}
T^2 &= n[C^{-1}(\bar{y} - d) - \mu]'S^{-1}[C^{-1}(\bar{y} - d) - \mu] \\
&= n(\bar{y} - d - C\mu)'(CSC')^{-1}(\bar{y} - d - C\mu) \\
&= n(\bar{y} - \mu_y)'S_y^{-1}(\bar{y} - \mu_y),
\end{aligned}$$
where $\mu_y = C\mu + d$, i.e. $\mu$ undergoes the same transformation as $x$.
2. In terms of the Mahalanobis distance between $x_1$ and $x_2$ with respect to $\Sigma$, $D_\Sigma(x_1, x_2) \equiv \sqrt{(x_1 - x_2)'\Sigma^{-1}(x_1 - x_2)}$, we have $T^2 = nD_S^2(\bar{x}, \mu)$.
3. Under property (1), the distribution of the quadratic form is reasonably robust to non-normality as long as the underlying multivariate distribution has pdf contours close to elliptical in shape, but $T^2$ is sensitive to departures from such elliptical symmetry.
4. Property (2) implies that critical values of Hotelling's $T^2$ distribution can be obtained from the $F$ distribution.
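The invariance in Remark 1 can also be verified numerically. The following sketch, again limited to $p = 2$ and the Python standard library, computes $T^2$ before and after an affine transformation $y = Cx + d$. The data, the matrix `C`, the shift `d` and the function names are hypothetical choices for illustration; any non-singular `C` should behave the same way.

```python
def hotelling_t2(data, mu):
    """T^2 = n (xbar - mu)' S^{-1} (xbar - mu) for 2-D observations."""
    n = len(data)
    m = [sum(row[j] for row in data) / n for j in range(2)]
    s = [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in data) / (n - 1)
          for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    s_inv = [[s[1][1] / det, -s[0][1] / det],
             [-s[1][0] / det, s[0][0] / det]]
    diff = [m[0] - mu[0], m[1] - mu[1]]
    return n * sum(diff[i] * s_inv[i][j] * diff[j]
                   for i in range(2) for j in range(2))


def affine(c, d, x):
    """Apply y = Cx + d to a single 2-D point."""
    return tuple(c[i][0] * x[0] + c[i][1] * x[1] + d[i] for i in range(2))


# Hypothetical sample, hypothesized mean and transformation.
data = [(4.1, 2.0), (5.3, 2.9), (6.0, 3.1), (4.8, 2.2), (5.5, 3.4), (6.2, 2.8)]
mu = (5.0, 2.5)
C = [[2.0, 1.0], [0.5, 3.0]]  # non-singular: determinant is 5.5
d = [10.0, -4.0]

t2_x = hotelling_t2(data, mu)
# Transform both the observations and the hypothesized mean.
t2_y = hotelling_t2([affine(C, d, x) for x in data], affine(C, d, mu))

print(t2_x, t2_y)  # equal up to floating-point rounding
```

Note that the hypothesized mean must be transformed together with the data, as in the proof, where $\mu_y = C\mu + d$.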