Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
MTHM017 Advanced Topics in Statistics
Ref/Def Assignment
The assignment has two main parts. Part A involves fitting a mixture model to assess reaction times in schizophrenic patients. Part B involves using different methods for classification of data into two groups.
A. Bayesian Inference [64 marks]
In Part A we will fit a finite mixture model using the rtimes dataset, which contains the reaction times of 17 people (11 non-schizophrenics and 6 schizophrenics) in a psychological experiment. Each person’s reaction time was measured 30 times.
1. [5 marks] Read in the data, then for each person produce a histogram of that given person’s reaction times. The range of the x axis should be the same on each histogram. Visually compare the reaction time distributions of schizophrenic and non-schizophrenic individuals. What differences/similarities can you observe?
It is suggested that schizophrenics suffer from attention deficit on some trials, as well as a general motor reflex retardation. Motor reflex retardation affects the response time of all trials, while attention deficit only affects some of the responses. To address this theory we will fit a model, where the response times of non-schizophrenics are described by a normal random-effects model, and the response times of schizophrenic individuals are modeled as a two-component mixture model.
To reflect the attention deficit, let yij denote the logarithm of the jth measured reaction time of person i. Then:
• For the responses of the ith non-schizophrenic person (i = 1, 2, . . . , 11) we have
yij ~ N (αi , σy(2)), i = 1, 2, . . . , 11, j = 1, 2, . . . , 30.
That is, the responses are normally distributed with person-specific mean αi and some common variance
σy(2) .
• For the responses of the ith schizophrenic individual (i = 12, 13, . . . , 17), with probability (1 − λ) there is no delay, and the response is normally distributed with mean αi and variance σy(2); and with probability
λ the response is delayed so that the observations have mean αi + τ and variance σy(2) . That is
yij ~ N (αi + τ zij , σy(2)),
zij ~ Bernoulli(λ), i = 12, 13, . . . , 17; j = 1, 2, . . . , 30.
Note that in the above model zij is an indicator function that takes the value 1 whenever the response is delayed, and the value 0 otherwise. Furthermore, τ is the amount of time by which the response is delayed. To ensure that the model is identifiable we will restrict τ to be positive.
The two cases (schizophrenic and non-schizophrenic) could be brought to the same form by adding indicator variables zij to the non-schizophrenic part of the model. However in the non-schizophrenic case these variables will always take the value 0!
The magnitude of the schizophrenics’ motor retardation is captured by the distribution of the αi parameters. In particular,
• For non-schizophrenic individuals we assume that αi follows a normal distribution with mean µ and variance σα(2), that is
αi ~ N(µ, σα(2)), i = 1, 2, . . . , 11.
• For the schizophrenics we assume that the mean of αi is µ + β, while the variance remains σα(2) . That is αi ~ N(µ + β, σα(2)), i = 12, 13, . . . , 17.
2. [5 marks] The above model uses the logarithm of measured reaction times. Explain why taking the logarithm might be necessary, then perform the transformation yourself. For each person compute the standard deviation of the log transformed reaction times of that individual.
3. [5 marks] List the parameters of the model and assign non-informative uniform prior distributions to each parameter, paying attention to the values these parameter are allowed take.
4. [16 marks] Code up the above model in JAGS using functions covered in the module. Fit the model with 10000 iterations, discarding the first 5000 as burn-in. Make sure you set the model up in a way that demonstrates your understanding of the Bayesian model fitting process.
Note that part of the JAGS model was written for you. To write your JAGS model edit the draft model definition below by adding i) the likelihood component that describes the reaction time of schizophrenics, and ii) the prior distributions of all the model parameters.
jags.model <- function(){
# reaction time of non-schizophrenics
for(i in 1:11){
for(j in 1:30){
y.ns[i,j] ~ dnorm(alpha.ns[i], p.y)
}
alpha.ns[i] ~ dnorm(mu,p.alpha)
}
# reaction time of schizophrenics
...
# priors
...
}
5. [7 marks] Investigate whether the chains for all the parameters have converged. Include any relevant evidence that supports your conclusions.
6. [10 marks] The primary interest to psychologists lies in the parameters β , λ and τ . Plot the posterior distributions of these three parameters, then produce numerical summaries of the distributions.
Remembering that the response time was modeled on the log scale (and therefore both τ and β are on the log scale), give the median and a 95% posterior interval for each of these parameters on their original scale. Based on these estimates what conclusions can you make about the reaction times of schizophrenics compared to non-schizophrenics?
7. [16 marks] Next we will use prediction to check the fit of the model. Follow the steps below to assess how well the model can explain the variability in the data.
(a) Edit yo˜ur previous model definition so that it predicts 30 additional (log) response time measure-ments y˜ij , j = 1, 2, . . . , 30, for each schizophrenic individual i = 12, 13, . . . , 17 in the study. Note that the prediction of y˜ij should use the posterior of αi .
(b) Then add futher nodes to your model to i) find the standard deviation of the 30 predicted measurements for each individual, and to ii) get the minimum and maximum values of these 6 standard deviation values.
That is, if for individual i the simulated response times are
y(˜)i = (y(˜)i1 , y(˜)i2 , . . . , y(˜)i30), i = 1, 2, . . . , 6,
then the model should first compute to standard deviations
sdi = sd(y(˜)i ), i = 1, 2, . . . , 6,
then find the smallest and largest of these six values,
Smin = min(sd1 , sd2, . . . , sd6 ),
Smax = max(sd1 , sd2, . . . , sd6 ).
(c) Fit your edited model with 6000 iterations, discarding 5000 as burnin.
(d) Extract the minimum and maximum standard deviation values from the fitted model, and produce a scatterplot of the (Smin , Smax ) pairs. (Note that each iteration of the model fit will produce a minimum-maximum pair). Find the minimum and maximum of the raw standard deviation estimates obtained in Question 2, and add an additional point to your scatterplot showing this raw minimum-maximum pair.
Based on the scatterplot, would you say that the model can accurately explain the variation in the within-person response time variance?
B. Classification [36 marks]
The following figure shows the information in the dataset Classification .csv - it shows two different groups, plotted against two explanatory variables. This is simulated data - the groupings are determined by a (known, but not to you!) function of X1 and X2 with added noise/random error. The aim is to find a suitable method for classifying the 1000 datapoints into the two groups from a selection of possible approaches.
1. [4 marks] Summarise the two groups in terms of the variables X1 and X2 . Describe your findings. Considering the plot showing the observations and the numerical summaries, which of the following classification methods do you think are suitable for classifying this data and why?
a. Linear discriminant analysis.
b. Quadratic discriminant analysis.
c. K-nearest neighbour regression.
d. Support vector machines.
e. Random forests.
2. [1 marks] Select 80% of the data to act as a training set, with the remaining 20% for testing/evaluation.
3. [26 marks] Choose four of the methods listed in Question 1 that might be suitable to classify the data. Perform classification using these methods. In each case, briefly describe how the classification method works, present the results of an evaluation of the method (highlighting different aspects of the model performance) and describe your findings. Where appropriate optimise the (hyper)parameters of the method.
4. [2 marks] Compare the results from your chosen four approaches and select what you think is the best method for classification in this case, explaining your reasoning.
5. [3 marks] The file ClassificationTrue .csv contains the true classifications, based on the function of X1 and X2 without the noise. Evaluate how your 4 chosen methods from Questions 3 compare to the truth (in each case use the previously selected optimal value of the parameters). Does your choice from Question 4 still perform best in this case?