COMP9313 Big Data Management


Course Details & Outcomes

Course Description

This course introduces the core concepts and technologies involved in managing Big Data. It first introduces the characteristics of big data and big data analysis. We then study Hadoop, the open-source big data management framework, focusing mainly on Hadoop MapReduce programming; YARN, HDFS, HBase, and Hive are introduced briefly. We also study Spark, an open-source in-memory distributed computing framework. Another major focus of the course is algorithm design for large-scale data sets on top of big data management frameworks, in domains such as data stream mining, graph data processing, and finding similar items.

Course Aims

This course aims to introduce students to the concepts behind Big Data, the core technologies used in managing large-scale data sets, and a range of technologies for developing solutions to large-scale data analytics problems.

This course is intended for students who want to understand modern large-scale data analytics systems. It covers a wide range of topics and technologies, and prepares students to build such systems and to use them efficiently and effectively to address challenges in big data management.

Course Learning Outcomes

CLO1 : Describe the important characteristics of Big Data
CLO2 : Develop an appropriate storage structure for a Big Data repository
CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
CLO4 : Use a high-level query language to manipulate Big Data
CLO5 : Develop efficient solutions for analytical problems involving Big Data


Course Learning Outcomes | Assessment Item
CLO1 : Describe the important characteristics of Big Data
  • Final Exam
CLO2 : Develop an appropriate storage structure for a Big Data repository
  • Coding Project 1
  • Coding Project 3
  • Final Exam
CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
  • Coding Project 1
  • Coding Project 2
  • Coding Project 3
  • Final Exam
CLO4 : Use a high-level query language to manipulate Big Data
  • Coding Project 2
CLO5 : Develop efficient solutions for analytical problems involving Big Data
  • Coding Project 2
  • Coding Project 3
  • Final Exam

Learning and Teaching Technologies

Moodle - Learning Management System | Blackboard Collaborate

Assessments

Assessment Structure

Assessment Item | Weight | Relevant Dates
Coding Project 1 (Assessment Format: Individual) | 12% | Start Date: Not Applicable; Due Date: Week 4: 17 June - 23 June
Coding Project 2 (Assessment Format: Individual) | 16% | Due Date: Week 7: 8 July - 14 July
Coding Project 3 (Assessment Format: Individual) | 22% | Due Date: Week 10: 29 July - 4 August
Final Exam (Assessment Format: Individual) | 50% | Due Date: TBA during Exam Week

Assessment Details

  • Coding Project 1
    Assessment Overview
    This coding project assesses the student's MapReduce programming skills. It will be marked manually by course tutors according to a rubric. Feedback will be provided to students in Moodle as comments on their submissions.
    Course Learning Outcomes
    • CLO2 : Develop an appropriate storage structure for a Big Data repository
    • CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
  • Coding Project 2
    Assessment Overview
    This coding project assesses the student's Spark programming skills. It will be marked manually by course tutors according to a rubric. Feedback will be provided to students in Moodle as comments on their submissions.
    Course Learning Outcomes
    • CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
    • CLO4 : Use a high-level query language to manipulate Big Data
    • CLO5 : Develop efficient solutions for analytical problems involving Big Data
  • Coding Project 3
    Assessment Overview
    This coding project assesses the student's Spark programming skills on a real cloud computing platform such as Google Dataproc. It will be marked manually by course tutors according to a rubric. Feedback will be provided to students in Moodle as comments on their submissions.
    Course Learning Outcomes
    • CLO2 : Develop an appropriate storage structure for a Big Data repository
    • CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
    • CLO5 : Develop efficient solutions for analytical problems involving Big Data
  • Final Exam
    Assessment Overview

    The final exam assesses the students' MapReduce and Spark programming skills, as well as algorithm design for big data analytics. The exam will be marked manually by course tutors according to a rubric. Feedback is provided on request.

    Course Learning Outcomes
    • CLO1 : Describe the important characteristics of Big Data
    • CLO2 : Develop an appropriate storage structure for a Big Data repository
    • CLO3 : Utilise the map/reduce paradigm and the Spark platform to manipulate Big Data
    • CLO5 : Develop efficient solutions for analytical problems involving Big Data

General Assessment Information

Late Submission Penalties:

  • 5% reduction of your marks for up to 5 days

The final mark is calculated by:

  • Final Mark = proj1 + proj2 + proj3 + FinalExam
  • Double Pass: You also need to achieve at least 20 marks in the final exam to pass the course.
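
The calculation above can be sketched in a few lines of Python. This is only an illustration, not the official marking procedure. It assumes each component is already scaled to its weight (12, 16, 22, and 50 marks) and that the overall pass mark is 50, which this outline does not state. The names `Marks`, `final_mark`, and `passes` are hypothetical.

```python
from dataclasses import dataclass

EXAM_HURDLE = 20.0  # minimum final exam mark, from the Double Pass rule
PASS_MARK = 50.0    # ASSUMPTION: overall pass mark, not stated in this outline


@dataclass
class Marks:
    # ASSUMPTION: each mark is already scaled to the component's weight.
    proj1: float       # out of 12
    proj2: float       # out of 16
    proj3: float       # out of 22
    final_exam: float  # out of 50


def final_mark(m: Marks) -> float:
    """Final Mark = proj1 + proj2 + proj3 + FinalExam."""
    return m.proj1 + m.proj2 + m.proj3 + m.final_exam


def passes(m: Marks) -> bool:
    """Double pass: the total must reach PASS_MARK and the exam its hurdle."""
    return final_mark(m) >= PASS_MARK and m.final_exam >= EXAM_HURDLE


if __name__ == "__main__":
    student = Marks(proj1=10, proj2=14, proj3=20, final_exam=18)
    print(final_mark(student), passes(student))
```

In this example the student's total is 62, yet `passes` returns `False` because the exam mark of 18 is below the 20-mark hurdle.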

Grading Basis

Standard

Course Schedule

Teaching Week/Module | Activity Type | Content
Week 1 : 27 May - 2 June | Lecture | Course information + introduction to big data
Week 2 : 3 June - 9 June | Lecture | Hadoop MapReduce 1
Week 3 : 10 June - 16 June | Lecture | Hadoop MapReduce 2
Week 4 : 17 June - 23 June | Lecture | Spark 1
Week 5 : 24 June - 30 June | Lecture | Spark 2
Week 6 : 1 July - 7 July | Lecture | Recess Week
Week 7 : 8 July - 14 July | Lecture | Finding Similar Items
Week 8 : 15 July - 21 July | Lecture | Mining Data Streams
Week 9 : 22 July - 28 July | Lecture | Graph Data Management
Week 10 : 29 July - 4 August | Lecture | NoSQL, HBase, and Hive; revision and exam preparation

Attendance Requirements

Students are strongly encouraged to attend all classes and review lecture recordings.

General Schedule Information

The table summarises the planned weekly activities for the course. These are tentative. Please refer to the relevant sections of the course homepage for the most up-to-date information about the weekly schedule throughout the course delivery period.

Course Resources

Recommended Resources

The textbooks include:

Other references include:

Course Evaluation and Development

In previous feedback, students asked for more examples. This term, we will revise the slides and lectures accordingly.


