STAT 2011 Probability and Estimation Theory – Semester 1, 2024
Computer Practical Sheet Week 4
If you want the same (pseudo-)random numbers generated each time, place a command like
set.seed(36784)
in your first chunk so that the (pseudo-)random number generator always starts from the same “point” .
Consider taking random samples with replacement of size n from the population
S = {1, 2, 3, 4, 5, 6}.
Before you start the actual computer exercise it will help if you do some quick calculations, namely if X denotes a single random draw from S, determine
µX = E(X) and σX(2) = Var(X) = E(X2) − [E(X)]2 .
Furthermore note that the expected value µY = E(Y) and variance σY(2) = E(Y2) − [E(Y)]2 of the total Y = : Xi (as functions of µ, σ 2 and n) where X1, X2 ,..., Xn denote a random sample of size n taken from S with replacement are given by µY = nµX and σY(2) = nσX(2) .
We shall be looking at ways to evaluate and/or approximate probabilities of the form P(Y ≥ y) for various values of n and t, in particular
. P(Y ≥ 15) when n = 3 and
. P(Y ≥ 45) when n = 10.
We shall be making use of the rep() command in R, which repeats numbers in various ways. There are two main forms of usage:
. rep(vec,int), where vec is a vector and int is a positive integer, simply creates a new vector where vec is repeated int times, e.g.
> x = 1:6
> x
[1] 1 2 3 4 5 6
> rep(x,3)
[1] 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
. rep(vec1,vec2), where vec1 is an arbitrary vector and vec2 is a vector of non-negative integers the same length as vec1; this creates a new vector in which vec1[i] (the i-th element of vec1) is repeated vec2[i] times, e.g.
> x = 1:6
> x
[1] 1 2 3 4 5 6
> rep(x,c(1,2,3,1,2,3))
[1] 1 2 2 3 3 3 4 5 5 6 6 6
These two forms can be combined together in rather interesting ways:
> x = 1:6
> rep(rep(x,rep(2,6)),3)
[1] 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6
The above is the same as the following sequence of steps:
> a = rep(2,6)
> a
[1] 2 2 2 2 2 2
> b = rep(x,a)
> b
[1] 1 1 2 2 3 3 4 4 5 5 6 6
> rep(b,3)
[1] 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6 1 1 2 2 3 3 4 4 5 5 6 6
Computer Problems for Week 4
For this week’s lab report: Please submit your code, output, and any comment required, for Q7 and Q8 (a), (b), and (c) ONLY. An assignment item has been set up on Canvas; file upload format is limited to pdf and html.
1. Define x=1:6. By applying rep() to x, create a vector X3 which simply repeats x 36 times.
2. Now create another vector X1 of length 216 with 36 1’s, 36 2’s,...,36 6’s (hint: use a command of the form rep(...,rep(...))).
3. Finally create a vector X2 of length 216 which takes a vector consisting of 6 1’s, 6 2’s,...,6 6’s and then repeats it 6 times (hint: rep(rep(...,rep(...)),...)).
4. The matrix outcomes=cbind(X1,X2,X3) gives a representation of all outcomes in the sample space when sampling 3 times from S with replacement. Print the first 13 lines of it using outcomes[1:13,]. Obtain a vector sums of row sums of this matrix (see last week’s exercise if you forget how to do this).
5. Using the vector sums, determine (for the case n = 3)
(a) µY = E(Y),
(b) σY(2) = E(Y2) − [E(Y)]2 and
(c) P(Y ≥ 15)
(note sum(vec>=x) counts the number of elements of the vector vec that are greater than or equal to x; also mean(vec) is the same as sum(vec)/length(vec)).
6. A nice way to visualise the probability distribution of Y is to plot an ordinate diagram (as opposed to a histogram, since the random variable is discrete):
‘‘‘{r Q6}
propns=table(sums)/length(sums)
plot(propns,type="h")
‘‘‘
7. We now turn our attention to the case n = 10. In this case there are 610 > 60 million different possible outcomes, and so constructing an exhaustive matrix of outcomes is a daunting exercise. However we can use Monte Carlo methods: we can simulate a large number y1 ,..., yB of realisations of this random variable Y , and then for any function g( ·) we can approximate E[g(Y )] simply by determining the long-run average B/1 g(yi). Set n=10, B=10,000 and popn=1:6. Define a vector of n*B simulated random draws with replacement from popn using
draws=sample(size=n*B,x=popn,replace=TRUE)
and form the vector draws into a B-by-n matrix called samples (see previous computer labs if you forgot how to do this). Each row then represents a sample of size n with replacement from popn.
8. Obtain a vector sim.sums of (simulated) sample sums. Use this to compute Monte Carlo approximations to
(a) µY = E(Y),
(b) σY(2) = E(Y2) − [E(Y)]2 and
(c) P(Y ≥ 45)
Compare the approximations, using the R functions mean and var to calculate the mean (sample expectation) and sample variance, respectively, to the theoretical values computed earlier and comment. (You can state the theoretical value of the expectation without showing derivation.)
9. Produce an ordinate diagram of the relative frequencies/proportions of each value in sim.sums.