COMP9313 Big Data Management
Course Details & Outcomes
Course Description
This course introduces the core concepts and technologies involved in managing Big Data. It first introduces the characteristics of big data and big data analysis. We then study Hadoop, an open-source big data management framework, focusing mainly on Hadoop MapReduce programming; YARN, HDFS, HBase, and Hive are introduced briefly. We also study Spark, an open-source, in-memory distributed computing framework. Another major focus of the course is algorithm design for large-scale data sets on top of big data management frameworks, in domains such as data stream mining, graph data processing, and finding similar items.
Course Aims
This course aims to introduce students to the concepts behind Big Data, the core technologies used in managing large-scale data sets, and a range of technologies for developing solutions to large-scale data analytics problems.
This course is intended for students who want to understand modern large-scale data analytics systems. It covers a wide range of topics and technologies, and prepares students to build such systems and to use them efficiently and effectively to address challenges in big data management.
Course Learning Outcomes
| Course Learning Outcomes |
|---|
| CLO1 : Describe the important characteristics of Big Data |
| CLO2 : Develop an appropriate storage structure for a Big Data repository |
| CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data |
| CLO4 : Use a high-level query language to manipulate Big Data |
| CLO5 : Develop efficient solutions for analytical problems involving Big Data |
| Course Learning Outcomes | Assessment Item |
|---|---|
| CLO1 : Describe the important characteristics of Big Data | Final Exam |
| CLO2 : Develop an appropriate storage structure for a Big Data repository | Coding Project 1, Coding Project 3, Final Exam |
| CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data | Coding Project 1, Coding Project 2, Coding Project 3, Final Exam |
| CLO4 : Use a high-level query language to manipulate Big Data | Coding Project 2 |
| CLO5 : Develop efficient solutions for analytical problems involving Big Data | Coding Project 2, Coding Project 3, Final Exam |
Learning and Teaching Technologies
Moodle - Learning Management System | Blackboard Collaborate
Assessments
Assessment Structure
| Assessment Item | Weight | Relevant Dates |
|---|---|---|
| Coding Project 1 (Assessment Format: Individual) | 12% | Start Date: Not Applicable; Due Date: Week 4: 17 June - 23 June |
| Coding Project 2 (Assessment Format: Individual) | 16% | Due Date: Week 7: 08 July - 14 July |
| Coding Project 3 (Assessment Format: Individual) | 22% | Due Date: Week 10: 29 July - 04 August |
| Final Exam (Assessment Format: Individual) | 50% | Due Date: TBA during Exam Week |
Assessment Details
- Coding Project 1
Assessment Overview
This coding project assesses the student's MapReduce programming skills. It will be assessed manually by course tutors according to a rubric. Feedback will be provided to students in Moodle as comments on their submissions.
Course Learning Outcomes
- CLO2 : Develop an appropriate storage structure for a Big Data repository
- CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
- Coding Project 2
Assessment Overview
This coding project assesses the student's Spark programming skills. It will be assessed manually by course tutors according to a rubric. Feedback will be provided to students in Moodle as comments on their submissions.
Course Learning Outcomes
- CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
- CLO4 : Use a high-level query language to manipulate Big Data
- CLO5 : Develop efficient solutions for analytical problems involving Big Data
- Coding Project 3
Assessment Overview
This coding project assesses the student's Spark programming skills, using a real cloud computing platform such as Google Dataproc. It will be assessed manually by course tutors according to a rubric. Feedback will be provided to students in Moodle as comments on their submissions.
Course Learning Outcomes
- CLO2 : Develop an appropriate storage structure for a Big Data repository
- CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
- CLO5 : Develop efficient solutions for analytical problems involving Big Data
- Final Exam
Assessment Overview
The final exam assesses the students' MapReduce and Spark programming skills, as well as algorithm design for big data analytics. The exam will be marked manually by course tutors according to a rubric. Feedback is provided at the student's request.
Course Learning Outcomes
- CLO1 : Describe the important characteristics of Big Data
- CLO2 : Develop an appropriate storage structure for a Big Data repository
- CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
- CLO5 : Develop efficient solutions for analytical problems involving Big Data
General Assessment Information
Late Submission Penalties:
- 5% reduction of your marks for up to 5 days
The final mark is calculated by:
- Final Mark = proj1 + proj2 + proj3 + FinalExam
- Double Pass: You also need to achieve at least 20 marks in the final exam to pass the course.
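The calculation above can be sketched in a few lines of Python. This is only an illustration, not an official marking tool: it assumes each component is already scaled to its weight (12, 16, 22, and 50 marks respectively), and the names `final_result`, `Result`, `MAX_MARKS`, and `EXAM_HURDLE` are hypothetical. Late penalties are not modelled.

```python
from dataclasses import dataclass

# Assumption: each item is marked out of its weight (12%, 16%, 22%, 50%).
MAX_MARKS = {"proj1": 12.0, "proj2": 16.0, "proj3": 22.0, "final_exam": 50.0}

# "Double Pass": minimum final-exam marks stated in the outline.
EXAM_HURDLE = 20.0


@dataclass
class Result:
    final_mark: float       # proj1 + proj2 + proj3 + FinalExam
    exam_hurdle_met: bool   # True if the final exam mark is at least 20


def final_result(proj1: float, proj2: float, proj3: float, final_exam: float) -> Result:
    """Sum the four components and check the final-exam hurdle."""
    marks = {"proj1": proj1, "proj2": proj2, "proj3": proj3, "final_exam": final_exam}
    for name, value in marks.items():
        # Reject marks outside the assumed range for each item.
        if not 0 <= value <= MAX_MARKS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_MARKS[name]}")
    total = sum(marks.values())
    return Result(final_mark=total, exam_hurdle_met=final_exam >= EXAM_HURDLE)


if __name__ == "__main__":
    print(final_result(10, 12, 18, 25))
    print(final_result(12, 16, 22, 19))
```

In the second call the projects earn full marks, but the exam mark of 19 falls below the 20-mark hurdle, so `exam_hurdle_met` is `False` regardless of the total.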
Grading Basis
Standard
Course Schedule
| Teaching Week/Module | Activity Type | Content |
|---|---|---|
| Week 1 : 27 May - 2 June | Lecture | Course information + introduction to big data |
| Week 2 : 3 June - 9 June | Lecture | Hadoop MapReduce 1 |
| Week 3 : 10 June - 16 June | Lecture | Hadoop MapReduce 2 |
| Week 4 : 17 June - 23 June | Lecture | Spark 1 |
| Week 5 : 24 June - 30 June | Lecture | Spark 2 |
| Week 6 : 1 July - 7 July | Lecture | Recess Week |
| Week 7 : 8 July - 14 July | Lecture | Finding Similar Items |
| Week 8 : 15 July - 21 July | Lecture | Mining Data Streams |
| Week 9 : 22 July - 28 July | Lecture | Graph Data Management |
| Week 10 : 29 July - 4 August | Lecture | NoSQL, HBase, and Hive/Revision and exam preparation |
Attendance Requirements
Students are strongly encouraged to attend all classes and review lecture recordings.
General Schedule Information
The table summarises the planned weekly activities for the course. These are tentative. Please refer to the relevant sections of the course homepage for the most up-to-date information about the weekly schedule throughout the course delivery period.
Course Resources
Recommended Resources
The textbooks include:
- Hadoop: The Definitive Guide. Tom White. 4th Edition, O'Reilly Media
- Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. University of Maryland, College Park
- Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. 2nd Edition, Cambridge University Press
- Learning Spark. 1st and 2nd Edition, O'Reilly Media
Course Evaluation and Development
Student feedback indicated a need for more examples. This term, we will revise the slides and lectures accordingly.