COM6012 2025 Assignment





Deadline: 13:00 Thursday 08 May 2025

Please read the assignment brief carefully before starting the assignment.

Release Status:

Q1 - 10 marks
Q2 - 9 marks
Q3 - 10 marks
Q4 - 10 marks
An FAQ (last updated 06.03.2025) will be updated with important clarifications or tips as questions are raised.

Assignment Brief

How and what to submit
A. Create a folder YOUR_USERNAME-COM6012 containing the following:

1) AS_report.pdf: A PDF report, placed at the root of the zipped folder (like readme.txt in the lab solutions), containing the answers to ALL questions, including all figures and tables. If an answer to a question is not found in this PDF file, you will lose the respective mark. The report should be concise. You may include appendices/references for additional information, but marking will focus on the main body of the report.

2) Code, script, and output files: All files used to generate the answers for individual questions in the report above, except the data, should be included. These files should be named properly, starting with the question number (separate files for each question): for example, your Python code as Q1_code.py and Q2_code.py, your HPC script as Q1_script.sh and Q2_script.sh, and your output files on HPC as Q1_output.txt and Q2_output.txt (and Q1_figC1.jpg, etc.). The results must be generated on the HPC, not on your local machine. Figures must be created by Python code. We will apply a penalty of 25% for each of these files that is missing. Double-check that these files are included by downloading the zipped file on another machine and opening it to verify.

B. When you have finished ALL the questions, zip your folder YOUR_USERNAME-COM6012 to include the above (one single report plus code, script, and output files for all questions, properly named) and upload this YOUR_USERNAME-COM6012.zip file to Blackboard before the deadline.

C. NO DATA UPLOAD: Please do not upload the data files used. Instead, use the relative file path in your code, assuming data files are downloaded (and unzipped if needed) under the folder ‘Data’, as in the lab.
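The submission layout described in A to C can be sanity-checked before uploading. The following is a minimal sketch in Python, not an official checker. It assumes the zip contains the YOUR_USERNAME-COM6012 folder at its top level, that each of Q1 to Q4 has exactly one code, script, and output file named as in the examples above, and that any data would sit under a Data folder inside the submission folder. The function names expected_files and check_submission are hypothetical.

```python
import zipfile

# Assumption: one code, script, and output file per question, named as in
# the examples in the brief. Extra files such as Q1_figC1.jpg are not checked.
QUESTIONS = ["Q1", "Q2", "Q3", "Q4"]
REQUIRED_SUFFIXES = ["_code.py", "_script.sh", "_output.txt"]


def expected_files(username):
    """Return the archive paths expected inside the zip (hypothetical helper)."""
    root = f"{username}-COM6012"
    names = [f"{root}/AS_report.pdf"]
    for question in QUESTIONS:
        for suffix in REQUIRED_SUFFIXES:
            names.append(f"{root}/{question}{suffix}")
    return names


def check_submission(zip_path, username):
    """Return (missing, data_files) for the zip at zip_path.

    missing    : expected paths that are not in the archive
    data_files : files found under <root>/Data/, which should not be uploaded
    """
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    root = f"{username}-COM6012"
    missing = [name for name in expected_files(username) if name not in present]
    data_files = sorted(
        name for name in present
        if name.startswith(f"{root}/Data/") and not name.endswith("/")
    )
    return missing, data_files
```

Calling check_submission("YOUR_USERNAME-COM6012.zip", "YOUR_USERNAME") returns two lists; both should be empty for a zip that follows the layout assumed here. This does not replace opening the downloaded zip on another machine as the brief asks.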

D. Code and output: 1) Use PySpark 3.5.4 and Python 3.12 as covered in the lecture and lab sessions to complete the tasks; 2) Submit your PySpark job to HPC with sbatch to obtain the output.

Assessment Criteria (Scope: Sessions 1 to 8; Total: 39 marks)

1. Being able to use PySpark to analyse big data to answer data analytic questions.
2. Being able to perform tasks covered in Sessions 1 to 8 on large-scale data.
3. Being able to make useful observations and explain obtained results clearly.

Late submissions: We follow the Department's guidelines about late submissions, i.e., “If you submit work to be marked after the deadline you will incur a deduction of 5% of the mark each working day the work is late after the deadline, up to a maximum of 5 working days” but NO late submission will be marked after the maximum of 5 working days because we will release a solution by then. Please see this link.

Use of unfair means: "Any form of unfair means is treated as a serious academic offence and action may be taken under the Discipline Regulations." (from the MSc Handbook). Please carefully read this link on what constitutes Unfair Means if you are not sure.

Note: This assignment is for internal students (COM6012). External students (COM6012s) will be assessed by exam only.
