MTHM505 Data Science And Statistical Modelling In Space And Time



Assessment REF/DEF, 2024

This assessment consists of 3 questions:

1.  Spatial modelling

2. Time series modelling

3. A combination of both

The total number of marks available is 100, split 35/35/30 between the 3 questions. The marks indicated for individual parts suggest the relative amount of detail required in each answer.

The deadline for submission is 12 noon, 29th July.

You should submit a single pdf at the ELE submission point containing all your solutions. The data required in each question is provided.

Commented R code (and any output or plots) should be part of your answers. However, only include R output that is helpful for answering the questions. It should be clear from your answers which models you are fitting and why (i.e. don’t only include code/plots), and plots must be properly labelled and explained.

You are expected to work independently; strict disciplinary action will be taken for any plagiarism. Late submissions will be penalised according to the University’s late submission policy.

All questions can be answered using models seen in the lectures and practicals. You may use any programming language or R package, however be careful that the code you are using is actually fitting the model that you think it is and answering the question - e.g., if the question says fit a ‘Gaussian process with maximum likelihood’, you won’t get marks for fitting a different type of spatial model, or for using a different type of parameter estimation.

1. Precipitation modelling [35 marks]


a)  Plot and comment on the data. You might find it helpful to convert the data into a geodata object using as.geodata(). [3 marks]

b)  Select 4 points from the dataset at random, report the chosen stations (name, longitude, latitude, precipitation), and remove these from your training dataset for fitting models.

c)  Using the sample variogram, comment on whether you need to set a maximum distance, and explain whether there should be a nugget in your model. Given this, fit a spatial model using the variogram/kriging approach.  You may want to try different assumptions and see which one fits best. Clearly state what assumptions you are making about the trend/mean function, covariance function, and the nugget, and state all fitted model parameters.  Validate your model. [12 marks]
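The sample variogram referred to in part c) averages half the squared differences between pairs of observations, grouped by separation distance. The sketch below computes the classical (Matheron) estimator in plain Python to show where the maximum distance enters. The function name, bin width and toy data are illustrative assumptions, and Euclidean distance on the raw coordinates is assumed.

```python
import math

def sample_variogram(coords, values, bin_width, max_dist):
    """Classical (Matheron) estimator: for each distance bin, the mean of
    0.5 * (z_i - z_j)^2 over all pairs whose separation falls in that bin.
    Pairs separated by max_dist or more are ignored."""
    n_bins = math.ceil(max_dist / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue
            b = min(int(d / bin_width), n_bins - 1)
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    # Report (bin midpoint, semivariance, number of pairs) for non-empty bins.
    return [((b + 0.5) * bin_width, sums[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

# Toy data (placeholders, not from the assessment dataset).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
vario = sample_variogram(coords, values, bin_width=1.0, max_dist=2.5)
for mid, gamma, n_pairs in vario:
    print(f"distance ~{mid:.2f}: semivariance {gamma:.3f} from {n_pairs} pairs")
```

Bins at large distances contain few pairs and give noisy estimates, which is the usual motivation for a maximum distance. The value the curve approaches as distance tends to zero indicates whether a nugget is needed.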

d)  Repeat part c), but instead fit a Gaussian Process model using maximum likelihood to estimate the parameters. [9 marks]

e)  Now estimate the model parameters using a Bayesian approach with discrete priors.  You may use your answer from d) to help set prior ranges.  Include priors over the correlation length and nugget. Clearly state what modelling assumptions you are making, and compare the parameter estimates to those from part d). [8 marks]

f)  Using your models from c), d) and e), predict precipitation at the 4 locations you removed. Compare your predictions. [3 marks]

Hints:

• The Bayesian approach can become extremely slow if you have multiple discrete priors and a large number of bins for each - it may be worth starting with a coarse discrete prior that allows you to fit the model relatively quickly, and then add more bins later if you have time.
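The cost described in the hint arises because the likelihood is evaluated once for every combination of prior bins, so the number of evaluations is the product of the bin counts. Below is a minimal sketch in plain Python of a discrete-grid posterior over correlation length and nugget, assuming uniform prior weights. `toy_loglik` is a hypothetical stand-in for the Gaussian process log-likelihood, and the grid values are placeholders.

```python
import math
from itertools import product

def discrete_posterior(phi_grid, nugget_grid, loglik):
    """Posterior weights on a grid under uniform discrete priors."""
    grid = list(product(phi_grid, nugget_grid))   # one evaluation per pair
    logp = [loglik(phi, tau2) for phi, tau2 in grid]
    m = max(logp)                                 # subtract max for stability
    w = [math.exp(lp - m) for lp in logp]
    total = sum(w)
    return {pt: wi / total for pt, wi in zip(grid, w)}

def toy_loglik(phi, tau2):
    # Hypothetical stand-in for the real log-likelihood,
    # peaked at phi = 2.0 and tau2 = 0.5.
    return -((phi - 2.0) ** 2) - ((tau2 - 0.5) ** 2)

coarse = discrete_posterior([1.0, 2.0, 3.0], [0.0, 0.5, 1.0], toy_loglik)
best = max(coarse, key=coarse.get)
print(len(coarse), "likelihood evaluations; posterior mode at", best)
```

Refining each prior from 3 to 30 bins raises the count from 9 to 900 evaluations, which is why a coarse first pass is sensible.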


2. Global surface temperature [35 marks]

In this question, we are going to consider the global surface temperature anomaly (Rohde and Hausfather 2020, https://doi.org/10.5194/essd-12-3469-2020), a key measure of global warming. The file gst_anom.csv contains monthly observations for 1850-2023 of the global surface temperature anomaly (calculated relative to the mean for 1951-1980), in degrees. We are going to model the data and forecast ahead.

a)  Plot the data and comment on any patterns/trends observed. [3 marks]

For the remainder of the question, only use data from 1990 onwards.

b)  Fit appropriate ARMA and ARIMA models (both without seasonal components) to the anomaly dataset. You may want to fit multiple models and select the best, justifying clearly why your chosen models are appropriate. [12 marks]

c) Average the dataset to quarterly means instead of monthly means, and find the most suitable ARMA or ARIMA or SARIMA model for this quarterly dataset. [8 marks]
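The averaging step in part c) collapses each block of three consecutive months into one quarterly mean. Below is a sketch in plain Python, assuming the series starts in January and contains only complete quarters. The function name and the values are placeholders, not data from gst_anom.csv.

```python
from statistics import fmean

def quarterly_means(monthly):
    """Average consecutive blocks of three monthly values."""
    if len(monthly) % 3 != 0:
        raise ValueError("series must contain complete quarters")
    return [fmean(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]

# One year of placeholder monthly anomalies.
monthly = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
quarterly = quarterly_means(monthly)
print(quarterly)
```

The quarterly series has four observations per year, so any seasonal component has period 4 rather than 12.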

d)  Fit a Dynamic Linear Model with a linear trend and a seasonal component, to both the original monthly dataset and the quarterly dataset from part c). [8 marks]

e)  Using your best models from each of b), c) and d), forecast the values of the global surface temperature anomaly for 2024. Comment on your forecasts. [4 marks]

3. UK daily temperatures [30 marks]

This question considers modelling maximum daily temperature in the UK in 2022. You have 2 files:

• uk_loc.csv containing longitude, latitude, elevation and place name for 27 measurement sites in the UK.

• uk_temp.csv containing maximum daily temperatures in degrees Celsius for each site in 2022.

For each modelling question, you should expect to carry out all the usual stages, e.g. making clear which model you are fitting and to which data, which assumptions you are making, etc. You should also perform appropriate validation checks.

a.  Provide spatial and time series plots of the data, and comment on trends seen in maximum daily temperature in the UK in 2022. [2 marks]

b.  Fit a spatial Gaussian process model using maximum likelihood to predict the maximum temperature in Balmoral, Bedford and Reading on February 19th 2022. [12 marks]

c.  Using the model from part b), produce plots of the mean and variance over a 0.05 degree grid covering the input data. [2 marks]
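The prediction grid for part c) can be built from the bounding box of the site coordinates. Below is a sketch in plain Python, assuming longitude and latitude in decimal degrees. The coordinates are placeholders rather than values from uk_loc.csv, and the box is extended outward to whole multiples of the step so that every site is covered.

```python
import math

def axis(lo, hi, step=0.05):
    """Grid values from lo to hi, snapped outward to multiples of step."""
    start = math.floor(lo / step)
    stop = math.ceil(hi / step)
    # Integer counters avoid accumulating floating-point error.
    return [round(k * step, 10) for k in range(start, stop + 1)]

def make_grid(lons, lats, step=0.05):
    """All (longitude, latitude) pairs on the regular grid."""
    xs = axis(min(lons), max(lons), step)
    ys = axis(min(lats), max(lats), step)
    return [(x, y) for y in ys for x in xs]

# Placeholder site coordinates.
lons = [-5.32, -1.27, 0.11]
lats = [50.22, 53.78, 57.03]
grid = make_grid(lons, lats)
print(len(grid), "prediction locations")
```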

d. In this question, you are not allowed to use auto.arima or a similar automated selection procedure. Fit suitable time series models to produce forecasts of the maximum temperature of the following:

1.  Camborne on March 22nd-26th 2022

2.  Blackpool on September 26th-30th 2022 [14 marks]
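Without automated selection, candidate model orders are usually chosen by inspecting the sample autocorrelation (and partial autocorrelation) of the series, and the fitted models are then compared by hand. Below is a sketch of the sample ACF in plain Python. The function name and the input series are illustrative placeholders, not values from uk_temp.csv.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

# Placeholder daily maximum temperatures.
series = [12.0, 13.5, 13.0, 14.2, 15.1, 14.8, 16.0, 15.5, 17.1, 16.4]
acf = sample_acf(series, max_lag=3)

# Approximate 95% bounds for white noise: +/- 1.96 / sqrt(n).
bound = 1.96 / math.sqrt(len(series))
for lag, r in enumerate(acf):
    flag = "outside" if lag > 0 and abs(r) > bound else "within"
    print(f"lag {lag}: {r:+.3f} ({flag} white-noise bounds)")
```

A slowly decaying ACF suggests the series needs differencing before an ARMA model is fitted.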
