MATH39512 Survival Analysis for Actuarial Science: example sheet 5
*=easy, **=intermediate, ***=difficult
* Exercise 5.1
Consider again Exercise 2.4.
(a) Use R to produce a plot of the Kaplan-Meier estimate of the data set in Exercise 2.4.
(b) Use R to compute the estimate in part (c) of Exercise 2.4 in an efficient way. (Hint: use the survfit() command to create suitable vectors and then use the diff() and sum() commands.)
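The hint in part (b) can be sketched as follows. The data below are hypothetical (not the Exercise 2.4 data), and since Exercise 2.4 is not reproduced on this sheet, the quantity computed (the area under the Kaplan-Meier curve up to the largest observed time) is only an assumed example of the kind of sum that diff() and sum() make easy.

```r
library(survival)

# Hypothetical data (NOT the Exercise 2.4 data): observed times and
# event indicators (1 = death observed, 0 = censored).
time   <- c(2, 3, 3, 5, 7, 8, 10)
status <- c(1, 1, 0, 1, 0, 1, 1)

fit <- survfit(Surv(time, status) ~ 1)

# (a) Step plot of the Kaplan-Meier estimate (with confidence limits).
plot(fit, xlab = "Time", ylab = "Estimated survival probability")

# (b) Vectors created by survfit(): the distinct observed times and the
# value of the K-M estimate at each of them.
t_j <- c(0, fit$time)   # prepend the time origin
s_j <- c(1, fit$surv)   # the estimate equals 1 at time 0

# Area under the K-M step function: each interval [t_j, t_{j+1})
# contributes (width of interval) * (height of the curve on it).
area <- sum(diff(t_j) * head(s_j, -1))
area
```

Replacing the summand changes the quantity being estimated; for instance, -diff(s_j) gives the size of the jump of the estimate at each observed time.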
* Exercise 5.2
Consider again Exercise 3.4.
(a) Use R to produce a plot of the Nelson-Aalen estimate of the data set in Exercise 3.4.
(b) Use R to produce a plot of the estimate of the variance of the N-A estimator as defined in Exercise 3.4. (Hint: use the last piece of code in Section A.1 for inspiration.)
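One possible way to obtain the Nelson-Aalen estimate from the output of survfit() is sketched below. The data are hypothetical (not the Exercise 3.4 data), and the variance formula used, the cumulative sum of d_j / n_j^2, is one common choice and therefore an assumption; the definition in Exercise 3.4 may differ.

```r
library(survival)

# Hypothetical data (NOT the Exercise 3.4 data).
time   <- c(1, 2, 2, 4, 5, 6, 9)
status <- c(1, 0, 1, 1, 0, 1, 1)

fit <- survfit(Surv(time, status) ~ 1)

# Keep only the times at which at least one death occurred.
ev  <- fit$n.event > 0
t_j <- fit$time[ev]
d_j <- fit$n.event[ev]   # number of deaths at t_j
n_j <- fit$n.risk[ev]    # number at risk just before t_j

# Nelson-Aalen estimate of the integrated hazard.
na_est <- cumsum(d_j / n_j)

# Assumed variance estimate (check against the definition in Exercise 3.4).
na_var <- cumsum(d_j / n_j^2)

# Both are step functions that start at 0 and jump at the death times.
par(mfrow = c(1, 2))
plot(c(0, t_j), c(0, na_est), type = "s",
     xlab = "Time", ylab = "Nelson-Aalen estimate")
plot(c(0, t_j), c(0, na_var), type = "s",
     xlab = "Time", ylab = "Estimated variance")
```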
*** Exercise 5.3
In R, install and load the KMsurv package containing datasets from the book of Klein and Moeschberger (1997), Survival Analysis: Techniques for Censored and Truncated Data, Springer. This can be done via the commands
    install.packages("KMsurv")
    library(KMsurv)
Load the data set channing by using the command data(channing). This produces a data frame called channing which consists of data on the death times of 462 elderly residents of Channing House, a retirement centre in California. Use help(channing) to get further information on this data set.
(a) Work with time since entry into the retirement centre as the time scale.
(i) Produce a suitable plot containing both the Kaplan-Meier estimate of the survival function of a male as well as of a female resident.
(ii) What is the estimated median number of months that a resident spends in the retirement centre?
(iii) Perform the log-rank test at the 5% significance level to test whether there is a difference in the survival time distribution between male and female residents and report your conclusions.
(iv) Perform the log-rank test at the 5% significance level to test whether there is a difference in the survival time distribution between a female resident with age at entry less than 75 years exact and a female resident with age at entry more than or equal to 75 years exact. Report your conclusions.
(b) Work with age as the time scale.
(i) Produce a suitable plot containing the Kaplan-Meier estimate of the survival function of a female resident.
(ii) Produce a suitable plot containing the Kaplan-Meier estimate of the survival function of a male resident. (Hint: there is a problem with the male data which prevents one from getting a sensible K-M estimate. Find out what the problem is and come up with an appropriate workaround.)
(iii) Why is it not a good idea to provide the plots in one graph as in part (a)(i)?
(iv) Estimate the probability that a male resident dies between the ages 80 exact and 85 exact.
(v) Estimate the conditional expectation of the age at death in years of a female resident given that she dies before age 90 exact.
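The two time scales in this exercise call for two different forms of Surv(). The sketch below illustrates them on a simulated data frame; toy and all of its column names (sex, entry, exit, event, dur) are hypothetical stand-ins, measured in years, and are not the columns of channing. Consult help(channing) for the actual variable names and units before adapting it.

```r
library(survival)

set.seed(1)
n <- 200

# Hypothetical stand-in for the real data; every name here is illustrative.
toy <- data.frame(
  sex   = rep(c("male", "female"), each = n / 2),
  entry = runif(n, 60, 80)                       # age at entry
)
life      <- toy$entry + rexp(n, rate = 1 / 10)  # age at death
leave     <- toy$entry + runif(n, 1, 15)         # age at end of follow-up
toy$exit  <- pmin(life, leave)                   # age at which observation stops
toy$event <- as.numeric(life <= leave)           # 1 = death observed
toy$dur   <- toy$exit - toy$entry                # time since entry

# Time since entry as the time scale: one K-M curve per sex.
fit_dur <- survfit(Surv(dur, event) ~ sex, data = toy)
plot(fit_dur, lty = 1:2, xlab = "Years since entry", ylab = "Survival")
legend("topright", legend = names(fit_dur$strata), lty = 1:2)

# Log-rank test; the p-value comes from a chi-squared distribution
# with (number of groups - 1) = 1 degree of freedom.
lr      <- survdiff(Surv(dur, event) ~ sex, data = toy)
p_value <- 1 - pchisq(lr$chisq, df = 1)

# Age as the time scale: a resident is only at risk from the entry age
# onwards, so the (start, stop, event) form of Surv() is used to account
# for the left truncation.
fit_age <- survfit(Surv(entry, exit, event) ~ 1, data = toy,
                   subset = sex == "female")
plot(fit_age, xlab = "Age (years)", ylab = "Survival")
```

Printing a survfit object such as fit_dur reports, among other things, the estimated median for each group.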
** Exercise 5.4
Consider the following survival data corresponding to a homogeneous group of 20 individuals where + denotes a censored value:
6.7 12.8 5.5+ 4.9+ 4.3+ 5.6 7.6 4.8 3.7+ 15.6+
3.3 7.4 6.0 2.6 11.0 4.2+ 6.9 15.9 8.4 7.5+
Check graphically if a Weibull distribution of the survival times corresponding to the data is appropriate. Do the same with an exponential distribution. Report your conclusions.
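A possible starting point for the graphical checks is sketched below, using the data above with status 0 for the values marked +. It assumes a Weibull survival function of the form exp(-(λt)^γ), which may be parametrised differently in the lecture notes; under that form, log(-log S(t)) is linear in log t, and in the exponential special case (γ = 1), -log S(t) is a straight line through the origin.

```r
library(survival)

# Data from the exercise; status = 0 marks a censored value (+).
time   <- c(6.7, 12.8, 5.5, 4.9, 4.3, 5.6, 7.6, 4.8, 3.7, 15.6,
            3.3, 7.4, 6.0, 2.6, 11.0, 4.2, 6.9, 15.9, 8.4, 7.5)
status <- c(1, 1, 0, 0, 0, 1, 1, 1, 0, 0,
            1, 1, 1, 1, 1, 0, 1, 1, 1, 0)

fit <- survfit(Surv(time, status) ~ 1)

# Use the K-M estimate at the death times only, and drop any point where
# the estimate is 0 because the logarithm is not defined there.
keep <- fit$n.event > 0 & fit$surv > 0
t_j  <- fit$time[keep]
s_j  <- fit$surv[keep]

par(mfrow = c(1, 2))

# Weibull check: points roughly on a straight line support the model.
plot(log(t_j), log(-log(s_j)),
     xlab = "log t", ylab = "log(-log S(t))", main = "Weibull check")

# Exponential check: points roughly on a straight line through the origin.
plot(t_j, -log(s_j), xlim = c(0, max(t_j)), ylim = c(0, max(-log(s_j))),
     xlab = "t", ylab = "-log S(t)", main = "Exponential check")
```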
*** Exercise 5.5
Let Ti be the survival time and Ci the censoring time of individual i = 1, ..., n. Assume the pairs (T1, C1), (T2, C2), ..., (Tn, Cn) are i.i.d. and the common joint survival function of (Ti, Ci) is given by
$$\mathbb{P}(T_i > t, C_i > y) = e^{-\mu t - \beta y - \theta t y}, \qquad t, y \ge 0,$$
where the parameters µ, β and θ satisfy µ > 0, β > 0 and θ ∈ [0, µβ]. (Note that in the literature, (Ti, Ci) is said to have a Gumbel's bivariate exponential distribution.) For this exercise you should make use of the R script simbivarexp.r which generates outcomes of (Ti, Ci). This script can be found on the course materials webpage.
(a) What is the hazard function of T1?
(b) Assume µ = 1, β = 1.5 and θ = 0. Simulate outcomes of (Ti, Ci) for two groups consisting of 1000 (independent) individuals each. Assume that for one group you observe the exact survival times Ti for each individual (i.e. there is no censoring) whereas for the other group you only observe the minimum of Ti and Ci for each individual (i.e. there is censoring). Plot the Kaplan-Meier estimates for both groups and perform the log-rank test by using R.
(c) Do the same as in part (b) but now assume θ = 1.5 instead of θ = 0.
(d) Report your conclusions regarding whether or not the censoring is independent for each of the two cases considered (i.e. the case in part (b) and the case in part (c)).
(e) Can you think of a reason why we need to simulate two independent groups of 1000 individuals in both parts (b) and (c)? Why can we not simply simulate one group and then compare, via the log-rank test, the case where there is no censoring with the case where there is censoring?
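The workflow for the case θ = 0 can be sketched without the course script. When θ = 0 the joint survival function factorises as e^{-µt} e^{-βy}, so Ti and Ci are independent exponential random variables with rates µ and β, and rexp() can stand in for simbivarexp.r. This substitution is an assumption made for illustration only: the interface of simbivarexp.r is not shown here, and for θ > 0 the script itself is needed. The data frame dat and its column names are hypothetical.

```r
library(survival)

set.seed(39512)
n    <- 1000
mu   <- 1
beta <- 1.5

# Group 1: survival times observed exactly (no censoring).
t1 <- rexp(n, rate = mu)

# Group 2: only min(T, C) and the indicator of T <= C are observed.
# Valid for theta = 0 only, where T and C are independent exponentials.
t2 <- rexp(n, rate = mu)
c2 <- rexp(n, rate = beta)

dat <- data.frame(
  time   = c(t1, pmin(t2, c2)),
  status = c(rep(1, n), as.numeric(t2 <= c2)),
  group  = rep(c("uncensored", "censored"), each = n)
)

# Kaplan-Meier estimates for both groups on one plot.
fit <- survfit(Surv(time, status) ~ group, data = dat)
plot(fit, lty = 1:2, xlab = "Time", ylab = "Estimated survival probability")
legend("topright", legend = names(fit$strata), lty = 1:2)

# Log-rank test of equal survival distributions in the two groups.
lr <- survdiff(Surv(time, status) ~ group, data = dat)
lr
```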