CDS504 - Enabling Technologies & Infrastructures for Big Data

Semester 2, Academic Year 2023/2024

Assignment 1

Objective

Analyze income inequality or rich-poor disparity in a country using Hadoop MapReduce.

Tasks

i)    Identify a problem/issue/aspect related to income inequality or rich-poor disparity which can be solved/processed using MapReduce.

ii)   Find a suitable open/free dataset related to income inequality or rich-poor disparity for Malaysia (or your own country). (E.g. data.gov.my, dosm.gov.my, kaggle.com, data.world, data.gov, data.worldbank.org, etc.)

iii)  Extract and reformat the data you’re interested in, if needed.

iv)   Show how the various stages of MapReduce (i.e. splitter, mapper, shuffler/sorter, reducer) will process your data. The reducer should implement at least 2 different functions (e.g. total, average, median, maximum, minimum, standard deviation).

v)    Show how new/interesting results (which originally did not exist in the dataset) could be obtained after processing by MapReduce.

You are not required/expected to install Hadoop MapReduce or write MapReduce program code for your solution.  Just simulate (on paper, via diagrams & description) how MapReduce will process the dataset to produce interesting results.
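For task iii, the extraction and reformatting step can be very small. The sketch below is only an illustration: the column names (`state`, `monthly_income`, and so on) and every value in `RAW_CSV` are invented placeholders, not taken from any real dataset. It keeps the two fields a mapper would need and drops records with no income value.

```python
import csv
import io

# Hypothetical raw extract: column names and values are placeholders,
# not taken from any real dataset.
RAW_CSV = """state,year,household_id,monthly_income,notes
Alpha,2022,1,3200,ok
Alpha,2022,2,,missing income
Beta,2022,3,5100,ok
Beta,2022,4,7400,ok
"""

def reformat(raw_text):
    """Keep only the fields the mapper needs, as 'state<TAB>income' lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        income = row["monthly_income"].strip()
        if not income:
            # Drop records that have no income value.
            continue
        lines.append(f"{row['state']}\t{int(income)}")
    return lines

records = reformat(RAW_CSV)
for line in records:
    print(line)
```

Writing one key-value pair per line keeps the later splitter and mapper stages easy to draw, because each input line corresponds to exactly one record.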

Hint/Suggestion

Use the WordCount example as a basis, but develop a new/different solution for your objective using the selected dataset.
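As a reminder of the baseline, WordCount can be simulated in plain Python without Hadoop. This is a minimal sketch, not Hadoop code: the function names (`splitter`, `mapper`, `shuffle_sort`, `reducer`) are chosen here to mirror the stage names in task iv, and the input text is made up.

```python
from collections import defaultdict

def splitter(text):
    """Split the input into records (here, one line per record)."""
    return text.splitlines()

def mapper(line):
    """Emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_sort(pairs):
    """Group values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

text = "rich poor rich\npoor gap rich"
mapped = [pair for line in splitter(text) for pair in mapper(line)]
grouped = shuffle_sort(mapped)
counts = [reducer(key, values) for key, values in grouped]
print(counts)
```

Adapting this to the assignment mostly means changing what the mapper emits (for example, a region or income group as the key and an income figure as the value) and what the reducer computes.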

Report

Prepare a written report that includes the following:

•     Problem statement (i.e. what is the issue), motivation (i.e. why is the issue important, why MapReduce is suitable, etc.) and the goal of your work (i.e. hypothesis, expected end-result, how it will be useful, etc.).

•     Description of the selected dataset.

•     Data pre-processing and reformatting, if any.

•     How the mapper and reducer process the data.  Show the various stages, with their intermediate results.

•     Description and discussion on the final results obtained.

•     Reflection (i.e. lessons learned, experience developing solution for MapReduce, challenges faced, etc.).

•     References

•     Appendix – Sample of dataset used (original and extracted/reformatted).
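The assignment does not require any program, but a short simulation can help check hand-drawn stage diagrams and their intermediate results. The sketch below assumes input records in a `group<TAB>income` form; the group names and income values are invented placeholders, and the split size of 2 is arbitrary. The reducer applies two functions (mean and median) and derives one figure that is not in the input, the gap between them.

```python
from collections import defaultdict
from statistics import mean, median

# Placeholder records in 'group<TAB>income' form; all values are invented.
INPUT = ["Alpha\t3200", "Beta\t5100", "Alpha\t4800", "Beta\t7400", "Alpha\t10000"]

def split_input(records, size):
    """Splitter: cut the input into fixed-size chunks, one per mapper."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def mapper(record):
    """Mapper: turn one record into a (group, income) pair."""
    group, income = record.split("\t")
    return group, int(income)

def shuffle_sort(pairs):
    """Shuffler/sorter: collect values per key, with keys and values ordered."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(groups[key]) for key in sorted(groups)}

def reducer(group, incomes):
    """Reducer: two functions (mean, median) plus a derived gap."""
    avg = mean(incomes)
    med = median(incomes)
    return group, {"mean": avg, "median": med, "mean_minus_median": avg - med}

splits = split_input(INPUT, 2)
mapped = [[mapper(record) for record in chunk] for chunk in splits]
shuffled = shuffle_sort(pair for chunk in mapped for pair in chunk)
results = dict(reducer(group, incomes) for group, incomes in shuffled.items())

# Print every stage so the intermediate results can be copied into a diagram.
for stage, data in [("split", splits), ("map", mapped),
                    ("shuffle/sort", shuffled), ("reduce", results)]:
    print(stage, data)
```

A mean well above the median within a group suggests a few high incomes pulling the average up, which is one possible kind of "new" result for the discussion section.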

Online Submission

Upload the report (in PDF format) to the eLearning portal.  Only one submission per group is required.

Viva

No presentation required.

Important Note

Plagiarism is not tolerated.  Direct "cut-and-paste" from the Internet or other sources is not acceptable. You must show that this is your own work (written in your own words).

All references used must be properly acknowledged.  Use the standard format for listing references.

Report length: 20 pages (maximum).

Group: 1-4 persons in a group. Everyone in the group must contribute.

Contribution: 20% of coursework.

Due date: Sunday, 12th May 2024, 11.00pm.

Grading Rubric

| Criterion (weight) | Poor (0-3 marks) | Average (4-5 marks) | Good (6-7 marks) | Excellent (8-10 marks) |
|---|---|---|---|---|
| Problem statement (10%) | Poorly defined, with key aspects not highlighted. | Reasonably defined, with key aspects somewhat highlighted. | Well defined, with some key aspects highlighted. | Very well defined, with key aspects very well highlighted. |
| Dataset (10%) | Matches poorly with problem statement, and not well described. | Matches somewhat with problem statement, and reasonably described. | Matches quite well with problem statement, and well described. | Matches very well with problem statement, and very well described. |
| MapReduce (40%) | Only a few stages and intermediate results are presented, and not clear. | Only a few stages and intermediate results are presented, and quite clear. | Most of the stages and intermediate results are presented, and quite clear. | All the stages and intermediate results are presented and very clear. |
| Results and discussion (20%) | Poor or very little, and not well presented. | Reasonable, and averagely presented. | Good, and well presented. | Very good, and very well presented. |
| Reflection (5%) | Poor, and not genuine. | Reasonable, but does not look genuine. | Good and genuine. | Very good and very genuine. |
| References (5%) | Insufficient, and not well formatted. | Sufficient, but not well formatted. | Sufficient, and well formatted. | Sufficient, and very well formatted. |
| Formatting and layout (5%) | Not nice and not neat. | Somewhat nice and neat. | Nice and neat. | Very nice and very neat. |
| Language (5%) | Full of language, grammar and spelling errors. | Contains numerous language, grammar and spelling errors. | Contains some language, grammar and spelling errors. | Very few or no language, grammar and spelling errors. |
