DATA 6530 STATISTICS AND FORECASTING
Please note:
a) This is an individual exam. You must work in RStudio and prepare all required files without consulting others.
b) Please submit one *.Rmd file, one *.html file, and one *.docx file in a single zip file via Blackboard.
c) You may refer to HW#2 and HW#9 as examples
1. Download the Walmart quarterly sales data for 1995–2015 (Walmart_Data.xlsx) and complete the following in RStudio (20 points)
a. Import the Excel file into RStudio as ‘walmart’
b. Inspect the file ‘walmart’ in terms of columns and rows
c. Convert ‘walmart’ into a time series ‘wal’: create a new column ‘Quarter’ using ‘mutate(Quarter = yearquarter(Date))’, set it as the index using ‘index = Quarter’, and inspect the time series ‘wal’
d. Filter ‘wal’ to the period 1995-2013 as ‘wal_train’ and to 2014-2015 as ‘wal_test’. Inspect ‘wal_train’ and ‘wal_test’
e. Plot the target variable ‘Sales’ in ‘wal’ and identify any seasonal pattern and any need for transformation
f. Comment on the plot from 1(e) with respect to seasonality, stationarity, transformation, and possible forecasting strategies
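Steps 1(a) to 1(e) can be sketched as follows. This is a minimal sketch, not a model answer. It assumes the fpp3 package family (suggested by the ‘yearquarter’ and ‘index = Quarter’ hints) and a workbook with columns named Date, Sales, and GDP. The synthetic ‘walmart’ table is only a stand-in so that the code runs without the file; it is not the Walmart data.

```r
library(fpp3)  # attaches tsibble, feasts, fable, dplyr, ggplot2, lubridate

# ASSUMPTION: synthetic stand-in for Walmart_Data.xlsx (not real data).
# In the exam, import instead with something like:
#   walmart <- readxl::read_excel("Walmart_Data.xlsx")
set.seed(1)
n <- 84  # 21 years x 4 quarters, 1995 Q1 to 2015 Q4
walmart <- tibble(
  Date  = seq(as.Date("1995-01-01"), by = "quarter", length.out = n),
  Sales = 20000 + 1200 * seq_len(n) + rep(c(-2000, 0, 500, 6000), n / 4) +
    rnorm(n, sd = 800),
  GDP   = 7500 + 120 * seq_len(n) + rnorm(n, sd = 50)
)

# 1(b) Inspect rows and columns
dim(walmart)
glimpse(walmart)

# 1(c) Add the quarterly index, then convert to a tsibble
wal <- walmart |>
  mutate(Quarter = yearquarter(Date)) |>
  as_tsibble(index = Quarter)
wal

# 1(d) Train/test split by calendar year
wal_train <- wal |> filter(year(Quarter) <= 2013)
wal_test  <- wal |> filter(year(Quarter) >= 2014)

# 1(e) Time plot, plus a Guerrero estimate of a Box-Cox lambda
autoplot(wal, Sales)
wal |> features(Sales, features = guerrero)
```

A Guerrero lambda near 1 suggests little need for transformation, while a value near 0 points toward a log transform; the plot should still be the main basis for the comments in 1(f).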
2. Use ‘Sales’ as the target variable for simple forecasting models using train data (1995-2013) (20 points)
a. Develop simple forecast models: drift, mean, naïve, and seasonal naïve
b. You may use ‘fc <- forecast(fit, x_test)’ to forecast, where x_test is the test data; this call applies to all forecasting models
c. Plot the results from 2(b)
d. Calculate the model accuracy both on train data and test data
e. Compare the results and comment
f. Determine the best model based on 2(a), 2(b), and 2(c)
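Item 2 could be approached along these lines. The sketch assumes the fable model functions from fpp3; the model names (drift, mean, naive, snaive) are labels chosen here for illustration, and the data are a synthetic stand-in rather than the Walmart file.

```r
library(fpp3)

# ASSUMPTION: synthetic stand-in for the Walmart data (not real data).
set.seed(1)
n <- 84
wal <- tibble(
  Date  = seq(as.Date("1995-01-01"), by = "quarter", length.out = n),
  Sales = 20000 + 1200 * seq_len(n) + rep(c(-2000, 0, 500, 6000), n / 4) +
    rnorm(n, sd = 800)
) |>
  mutate(Quarter = yearquarter(Date)) |>
  as_tsibble(index = Quarter)
wal_train <- wal |> filter(year(Quarter) <= 2013)
wal_test  <- wal |> filter(year(Quarter) >= 2014)

# 2(a) Four benchmark models fitted to the training data
fit <- wal_train |>
  model(
    drift  = RW(Sales ~ drift()),
    mean   = MEAN(Sales),
    naive  = NAIVE(Sales),
    snaive = SNAIVE(Sales)
  )

# 2(b) Forecast over the test period (second argument is new_data)
fc <- forecast(fit, wal_test)

# 2(c) Point forecasts against the full series
autoplot(fc, wal, level = NULL)

# 2(d) Accuracy on the training data and on the test data
acc_train <- accuracy(fit)
acc_test  <- accuracy(fc, wal)
acc_train
acc_test
```

Passing the full series ‘wal’ to accuracy() rather than ‘wal_test’ lets scaled measures such as MASE use the training period for scaling.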
3. Time Series Regression Models using the train data (1995-2013) (20 points)
a. Develop regression forecast models on Sales with ‘trend’, ‘season’, and ‘GDP’
b. Refine the model using the F-statistic, RMSE, t-tests, and R²
c. Forecast using the test data and plot the results
d. Calculate the model accuracy both on train data and test data.
e. Identify the best regression model by comparing the train and test results
f. Summarize the regression models along with appropriate plots
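A possible starting point for item 3 is sketched below, assuming TSLM() from fable and a GDP column in the data. The two candidate specifications and their names are illustrative assumptions, and the data are a synthetic stand-in.

```r
library(fpp3)

# ASSUMPTION: synthetic stand-in for the Walmart data (not real data).
set.seed(1)
n <- 84
wal <- tibble(
  Date  = seq(as.Date("1995-01-01"), by = "quarter", length.out = n),
  Sales = 20000 + 1200 * seq_len(n) + rep(c(-2000, 0, 500, 6000), n / 4) +
    rnorm(n, sd = 800),
  GDP   = 7500 + 120 * seq_len(n) + rnorm(n, sd = 50)
) |>
  mutate(Quarter = yearquarter(Date)) |>
  as_tsibble(index = Quarter)
wal_train <- wal |> filter(year(Quarter) <= 2013)
wal_test  <- wal |> filter(year(Quarter) >= 2014)

# 3(a) Candidate regressions (specifications are illustrative)
fit_lm <- wal_train |>
  model(
    trend_season     = TSLM(Sales ~ trend() + season()),
    trend_season_gdp = TSLM(Sales ~ trend() + season() + GDP)
  )

# 3(b) Model-level fit (R^2, F-statistic) and coefficient t-tests
lm_summary <- glance(fit_lm) |>
  select(.model, r_squared, adj_r_squared, statistic, p_value, AICc)
lm_summary
tidy(fit_lm)

# 3(c) Forecast with the test data, which supplies future GDP values
fc_lm <- forecast(fit_lm, wal_test)
autoplot(fc_lm, wal, level = NULL)

# 3(d) Accuracy on train and test data
accuracy(fit_lm)
accuracy(fc_lm, wal)
```

Because GDP is a regressor, the forecast needs its values over the test period; here they come from ‘wal_test’ itself.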
4. Use the quarterly ‘Sales’ train data 1995-2013 for ARIMA Models (20 points)
a. Using ‘Sales’, find an appropriate transformation and order of differencing for stationarity
b. For the resulting stationary time series, use ggtsdisplay to check the ACF and PACF
c. Develop ARIMA models with the hints from ACF and PACF for appropriate parameters
d. Determine the best ARIMA model by comparing accuracy (RMSE), AIC, and residual diagnostics
e. You may use auto ARIMA as a benchmark for comparison both for train and test data
f. Forecast using the test data, plot the results, and comment on the results
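The ARIMA workflow in item 4 might look like the sketch below. Several choices here are assumptions: the log transformation, the use of gg_tsdisplay() from feasts as the tsibble counterpart of ggtsdisplay, and the manual orders (0,1,1)(0,1,1), which are placeholders rather than orders read from the real ACF and PACF. The data are a synthetic stand-in.

```r
library(fpp3)

# ASSUMPTION: synthetic stand-in for the Walmart data (not real data).
set.seed(1)
n <- 84
wal <- tibble(
  Date  = seq(as.Date("1995-01-01"), by = "quarter", length.out = n),
  Sales = 20000 + 1200 * seq_len(n) + rep(c(-2000, 0, 500, 6000), n / 4) +
    rnorm(n, sd = 800)
) |>
  mutate(Quarter = yearquarter(Date)) |>
  as_tsibble(index = Quarter)
wal_train <- wal |> filter(year(Quarter) <= 2013)
wal_test  <- wal |> filter(year(Quarter) >= 2014)

# 4(a) Suggested seasonal and first differences on the log scale
wal_train |> features(log(Sales), unitroot_nsdiffs)
wal_train |> features(difference(log(Sales), 4), unitroot_ndiffs)

# 4(b) Time plot, ACF, and PACF of the seasonally differenced series
wal_train |>
  gg_tsdisplay(difference(log(Sales), 4), plot_type = "partial", lag_max = 16)

# 4(c), 4(e) A manually specified model (placeholder orders) and an
# automatically selected benchmark
fit_arima <- wal_train |>
  model(
    manual = ARIMA(log(Sales) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)),
    auto   = ARIMA(log(Sales))
  )

# 4(d) Information criteria and residual checks
glance(fit_arima) |> select(.model, AIC, AICc, BIC)
fit_arima |> select(auto) |> gg_tsresiduals()
augment(fit_arima) |> features(.innov, ljung_box, lag = 8)

# 4(f) Forecast with the test data, plot, and score
fc_arima <- forecast(fit_arima, wal_test)
autoplot(fc_arima, wal, level = NULL)
accuracy(fit_arima)
accuracy(fc_arima, wal)
```

Information criteria are only comparable between ARIMA models that share the same orders of differencing, so if the automatic model picks different differencing, the test-set RMSE is the safer basis for comparison.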
5. Final model comparison using the quarterly ‘Sales’ train data 1995-2013 (20 points)
a. Develop the best ETS model
b. Compare the best model from Items 2, 3, and 4, along with the new ETS model from 5(a)
c. Calculate the model accuracy (RMSE) and AIC, and check the residuals
d. Forecast using the test data and compare the model accuracy on train and test data
e. Summarize the results along with appropriate plots
f. Write the final report in Word, with descriptions, tables, charts, figures, and comments
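The comparison in item 5 can be laid out as one table of train and test accuracy. In this sketch the candidate set is an assumption: snaive, tslm, and arima merely stand in for whichever models turn out best in Items 2, 3, and 4, and ETS(Sales) lets fable select the ETS form automatically. The data are a synthetic stand-in.

```r
library(fpp3)

# ASSUMPTION: synthetic stand-in for the Walmart data (not real data).
set.seed(1)
n <- 84
wal <- tibble(
  Date  = seq(as.Date("1995-01-01"), by = "quarter", length.out = n),
  Sales = 20000 + 1200 * seq_len(n) + rep(c(-2000, 0, 500, 6000), n / 4) +
    rnorm(n, sd = 800),
  GDP   = 7500 + 120 * seq_len(n) + rnorm(n, sd = 50)
) |>
  mutate(Quarter = yearquarter(Date)) |>
  as_tsibble(index = Quarter)
wal_train <- wal |> filter(year(Quarter) <= 2013)
wal_test  <- wal |> filter(year(Quarter) >= 2014)

# 5(a), 5(b) ETS plus placeholder "best" models from Items 2 to 4
fit_all <- wal_train |>
  model(
    snaive = SNAIVE(Sales),
    tslm   = TSLM(Sales ~ trend() + season() + GDP),
    arima  = ARIMA(log(Sales)),
    ets    = ETS(Sales)
  )
fit_all |> select(ets) |> report()  # selected ETS form, AIC, AICc, BIC

# 5(d) Forecast with the test data and tabulate train vs. test accuracy
fc_all <- forecast(fit_all, wal_test)
acc_all <- bind_rows(accuracy(fit_all), accuracy(fc_all, wal)) |>
  select(.model, .type, RMSE, MAE, MAPE) |>
  arrange(.type, RMSE)
acc_all

# 5(e) Forecasts against recent history
autoplot(fc_all, wal |> filter(year(Quarter) >= 2010), level = NULL)
```

AIC values are not comparable across model classes such as ETS and ARIMA, so the cross-model ranking in this table rests on accuracy measures like RMSE.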
Please check the following list before you submit your Midterm Exam via Blackboard:
1. Please submit the ‘*.Rmd’ file, the ‘*.html’ file, and the ‘*.docx’ final report file in a single zip file
2. The final report in Word should be named ‘Exam-your-name.docx’. It should summarize and justify your final decision as to which forecasting model is the best, with supporting evidence and measures such as comparison tables and charts (some of the RStudio charts can be conveniently copied from your html file).
3. All three files should be zipped into a single file for Exam submission by the due date.
4. You are required to run getwd() and Sys.time() in RStudio, as always.