CDS504 - Enabling Technologies & Infrastructures for Big Data
Semester 2, Academic Year 2023/2024
Assignment 1
Objective
Analyze income inequality or rich-poor disparity in a country using Hadoop MapReduce.
Tasks
i) Identify a problem/issue/aspect related to income inequality or rich-poor disparity which can be solved/processed using MapReduce.
ii) Find a suitable open/free dataset related to income inequality or rich-poor disparity for Malaysia (or your own country). (E.g. data.gov.my, dosm.gov.my, kaggle.com, data.world, data.gov, data.worldbank.org, etc.)
iii) Extract and reformat the data you’re interested in, if needed.
iv) Show how the various stages of MapReduce (i.e. splitter, mapper, shuffler/sorter, reducer) will process your data. The reducer should implement at least 2 different functions (e.g. total, average, median, maximum, minimum, standard deviation).
v) Show how new/interesting results (which originally did not exist in the dataset) could be obtained after processing by MapReduce.
You are not required/expected to install Hadoop MapReduce or write MapReduce program code for your solution. Just simulate (on paper, via diagrams & description) how MapReduce will process the dataset to produce interesting results.
Hint/Suggestion
Use the WordCount example as a basis, but develop a new/different solution for your objective, using the selected dataset.
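The stages named in task iv) can be made concrete with a small simulation. The sketch below is only an illustration in plain Python, not Hadoop code, and writing code is not required for the report. The records are invented, the "state,income" line layout is an assumption, and all function names (split_input, mapper, shuffle_and_sort, reducer, run_job) are hypothetical. It follows the WordCount pattern, but the reducer computes an average, a maximum, a minimum, and a derived gap per key instead of a count.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: "state,monthly household income". Invented for
# illustration only; these are not taken from any real dataset.
RECORDS = [
    "Johor,5200",
    "Kelantan,3100",
    "Johor,6800",
    "Selangor,9100",
    "Kelantan,2900",
    "Selangor,7300",
]


def split_input(lines, split_size):
    """Splitter: cut the input into consecutive chunks of split_size lines."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def mapper(line):
    """Mapper: turn one input line into a (key, value) pair."""
    state, income = line.split(",")
    return state.strip(), int(income)


def shuffle_and_sort(pairs):
    """Shuffler/sorter: group values by key, with keys and values sorted."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(grouped[key]) for key in sorted(grouped)}


def reducer(key, values):
    """Reducer: apply several aggregate functions to one key's values."""
    highest, lowest = max(values), min(values)
    return key, {
        "average": mean(values),
        "maximum": highest,
        "minimum": lowest,
        "gap": highest - lowest,  # derived value, not present in the input
    }


def run_job(lines, split_size=2):
    """Run all four stages in order and return the reduced results."""
    splits = split_input(lines, split_size)
    mapped = [mapper(line) for split in splits for line in split]
    grouped = shuffle_and_sort(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for state, stats in run_job(RECORDS).items():
        print(state, stats)
```

The output of each function corresponds to one intermediate result that a report could show as a diagram: the splits, the mapped pairs, the grouped and sorted lists, and the reduced values. The "gap" field is one example of a result that did not exist in the original data, as asked for in task v).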
Report
Prepare a written report which includes the following:
- Problem statement (i.e. what is the issue), motivation (e.g. why the issue is important, why MapReduce is suitable) and the goal of your work (e.g. hypothesis, expected end-result, how it will be useful).
- Description of the selected dataset.
- Data pre-processing and reformatting, if any.
- How the mapper and reducer process the data. Show the various stages, with their intermediate results.
- Description and discussion of the final results obtained.
- Reflection (e.g. lessons learned, experience developing a solution for MapReduce, challenges faced).
- References.
- Appendix: sample of the dataset used (original and extracted/reformatted).
Online Submission
Upload the report (in PDF format) to the eLearning portal. Only one submission per group is required.
Viva
No presentation required.
Important Note
Plagiarism is not tolerated. Direct "cut-and-paste" from the Internet or other sources is not acceptable. You must show that this is your own work (written in your own words).
All references used must be properly acknowledged. Use the standard format for listing references.
Report length: 20 pages (maximum).
Group: 1-4 persons in a group. Everyone in the group must contribute.
Contribution: 20% of coursework.
Due date: Sunday, 12th May 2024, 11.00pm.
Grading Rubric
| Criterion (weight) | Poor (0-3 marks) | Average (4-5 marks) | Good (6-7 marks) | Excellent (8-10 marks) |
|---|---|---|---|---|
| Problem statement (10%) | Poorly defined, with key aspects not highlighted. | Reasonably defined, with key aspects somewhat highlighted. | Well defined, with some key aspects highlighted. | Very well defined, with key aspects very well highlighted. |
| Dataset (10%) | Matches poorly with problem statement, and not well described. | Matches somewhat with problem statement, and reasonably described. | Matches quite well with problem statement, and well described. | Matches very well with problem statement, and very well described. |
| MapReduce (40%) | Only a few stages and intermediate results are presented, and not clear. | Only a few stages and intermediate results are presented, and quite clear. | Most of the stages and intermediate results are presented, and quite clear. | All the stages and intermediate results are presented, and very clear. |
| Results and discussion (20%) | Poor or very little, and not well presented. | Reasonable, and averagely presented. | Good, and well presented. | Very good, and very well presented. |
| Reflection (5%) | Poor, and not genuine. | Reasonable, but does not look genuine. | Good and genuine. | Very good and very genuine. |
| References (5%) | Insufficient, and not well formatted. | Sufficient, but not well formatted. | Sufficient, and well formatted. | Sufficient, and very well formatted. |
| Formatting and layout (5%) | Not nice and not neat. | Somewhat nice and neat. | Nice and neat. | Very nice and very neat. |
| Language (5%) | Full of language, grammar and spelling errors. | Contains numerous language, grammar and spelling errors. | Contains some language, grammar and spelling errors. | Very few or no language, grammar and spelling errors. |