MA3616: Statistical Analysis for Big Data


Project-based Continuous Assessment

Project-based open-book assessment:

Your project should represent at least 20 hours of work towards implementing the machine learning methods taught in this module, including Logistic Regression, the Naïve Bayes Classifier, the KNN Classifier, Cross-Validation, etc. All analyses should be conducted using R.
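As a rough illustration of how two of the listed methods can be combined, the sketch below runs 5-fold cross-validation around a logistic regression in base R. The data are simulated, and the names `toy`, `area`, `quality`, `expensive`, and `cv_accuracy` are assumptions made up for this example; they are not part of the project data or a prescribed approach.

```r
set.seed(3616)  # fixed seed so the simulated data and fold split are reproducible

# Simulated stand-in data (hypothetical columns, not the project data set).
n <- 300
toy <- data.frame(area = rnorm(n), quality = rnorm(n))
toy$expensive <- rbinom(n, 1, plogis(1.5 * toy$area + 1.0 * toy$quality))

k <- 5
fold <- sample(rep(seq_len(k), length.out = n))  # random fold label for each row

# For each fold: fit on the other k - 1 folds, score on the held-out fold.
cv_accuracy <- sapply(seq_len(k), function(i) {
  train <- toy[fold != i, ]
  test  <- toy[fold == i, ]
  fit   <- glm(expensive ~ area + quality, data = train, family = binomial)
  prob  <- predict(fit, newdata = test, type = "response")  # predicted P(expensive = 1)
  mean((prob > 0.5) == (test$expensive == 1))               # held-out classification accuracy
})

mean(cv_accuracy)  # cross-validated accuracy estimate
```

The same loop structure should carry over to other models by swapping the `glm` and `predict` calls, and to other scores by changing the last line inside the function.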

Project Description:

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground project's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this project challenges you to predict the final price of each home.

Task:

It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. Submissions are evaluated on Root Mean Squared Error between the logarithm of the predicted value and the logarithm of the observed sales price.
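The evaluation metric described above can be sketched in R as follows. The helper name `rmse_log` and the toy price vectors are assumptions for illustration only; the natural logarithm is assumed, since the task statement does not name a base.

```r
# Root mean squared error between log(predicted) and log(observed).
# `rmse_log` is a hypothetical helper name, not something supplied with the project.
rmse_log <- function(predicted, observed) {
  stopifnot(length(predicted) == length(observed),
            all(predicted > 0), all(observed > 0))  # logs need positive prices
  sqrt(mean((log(predicted) - log(observed))^2))
}

# Toy values for illustration only (not taken from the Ames data).
observed  <- c(200000, 150000, 320000)
predicted <- c(210000, 140000, 300000)
score <- rmse_log(predicted, observed)
score
```

Because the error is computed on logarithms, it depends on the ratio of predicted to observed price, so a 10% miss on a cheap house counts about the same as a 10% miss on an expensive one.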

Data:
Available on Brightspace MA3616.

Submission Deadline

A final report on your project is due by Friday 1st November 2024.

This report should not exceed ten A4-sized pages, including all figures and tables as well as the implementation of methods and models in R code. Reports must be submitted electronically via Wiseflow.

Marking Criteria

1. Technical content and discussion (50%): clearly state your research question(s) of interest; properly state and use appropriate methods to analyse the chosen data set; and fully justify your analyses, comparisons, and conclusions.

2. Written presentation (30%): structure your presentation clearly and use correct notation consistently.

3. R code (20%): make your R code readable and reproducible.
