COMP9319 Web Data Compression and Search
Course Details & Outcomes
Course Description
As the amount of Web data increases, it is becoming vital not only to search and retrieve this information quickly, but also to store it compactly. This is especially important for mobile devices, which are becoming increasingly popular. Without loss of generality, this course assumes that Web data (excluding media content) is in XML or a similar format (e.g., HTML, JSON).
If time allows, we may cover optional topics such as: streaming algorithms, text analytics, Web data optimization for mobile devices. The lecture materials will be complemented by two programming assignments and numerous tutorial-type, written exercises.
Course Aims
This course aims to introduce the concepts, theories, and algorithmic issues important to Web data compression and search. The course will also introduce recent developments in Web data optimization, along with common practice and applications. The course is composed of the following parts:
- Adaptive coding, information theory
- Text compression (zip, gzip, bzip, etc)
- Burrows-Wheeler Transform and backward search
- XML compression
- Indexing
- Pattern matching and regular expression search
- Distributed querying
- Fast index construction
- Implementation
Course Learning Outcomes
Course Learning Outcomes |
---|
CLO1 : Apply the fundamentals of text compression |
CLO2 : Apply advanced data compression techniques such as those based on Burrows Wheeler Transform |
CLO3 : Write computer programs for Web data compression and search with optimization |
CLO4 : Use selected XML processing and optimization techniques |
CLO5 : Analyze the advantages and disadvantages of data compression for Web search |
CLO6 : Apply basic techniques from XML distributed query processing |
CLO7 : Discuss the past, present, and future of data compression and Web data optimization |
Learning and Teaching Technologies
Moodle - Learning Management System | Echo 360 | EdStem | Blackboard Collaborate
Assessments
Assessment Structure
Assessment Item | Weight | Relevant Dates |
---|---|---|
Assignment 1 (Assessment Format: Individual) | 15% | Due Date: Week 5: 24 June - 30 June |
Assignment 2 (Assessment Format: Individual) | 35% | Due Date: Week 9: 22 July - 28 July |
Final Examination (Assessment Format: Individual) | 50% | Start Date: Not Applicable; Due Date: During Exam Period |
Assessment Details
Assignment 1
Assessment Overview
This is a warm-up programming assignment for the course. Hence it will be relatively lightweight (students are expected to be able to finish the assignment in a few hours).
Assessment of assignments will be primarily based on how accurately they satisfy the requirements; this means that most of the marks will be based on automatic marking. However, if time allows, we may also manually examine submitted assignments to determine (a) whether they are written with good style and (b) how closely they satisfy the requirements.
Individual graded results with optional comments will be emailed to each student. Overall feedback will be discussed in the lectures, and students may talk to the tutors in consultation sessions for further feedback on their assessment.
Course Learning Outcomes
- CLO1 : Apply the fundamentals of text compression
- CLO2 : Apply advanced data compression techniques such as those based on Burrows Wheeler Transform
- CLO3 : Write computer programs for Web data compression and search with optimization
- CLO4 : Use selected XML processing and optimization techniques
- CLO5 : Analyze the advantages and disadvantages of data compression for Web search
Assignment 2
Assessment Overview
This is the second programming assignment for the course. It will be heavier than the first, since it involves more advanced techniques that students have learnt from the course (students are expected to be able to finish the assignment in a few days).
Assessment of assignments will be primarily based on how accurately they satisfy the requirements; this means that most of the marks will be based on automatic marking. However, if time allows, we may also manually examine submitted assignments to determine (a) whether they are written with good style and (b) how closely they satisfy the requirements.
Individual graded results with optional comments will be emailed to each student. Overall feedback will be discussed in the lectures, and students may talk to the tutors in consultation sessions for further feedback on their assessment.
Course Learning Outcomes
- CLO1 : Apply the fundamentals of text compression
- CLO2 : Apply advanced data compression techniques such as those based on Burrows Wheeler Transform
- CLO3 : Write computer programs for Web data compression and search with optimization
- CLO4 : Use selected XML processing and optimization techniques
- CLO5 : Analyze the advantages and disadvantages of data compression for Web search
Final Examination
Assessment Overview
The final exam will be a major assessment in this course and aims to test what students learned about data compression and search during the course of the semester. To pass this course, students are required to have satisfactory performance on the final exam even if they do very well on the assignments. In order to meet the hurdle requirement, students must score better than 40% on the final exam. Note that the hurdle will be enforced after any required scaling.
Course Learning Outcomes
- CLO1 : Apply the fundamentals of text compression
- CLO2 : Apply advanced data compression techniques such as those based on Burrows Wheeler Transform
- CLO4 : Use selected XML processing and optimization techniques
- CLO5 : Analyze the advantages and disadvantages of data compression for Web search
- CLO6 : Apply basic techniques from XML distributed query processing
- CLO7 : Discuss the past, present, and future of data compression and Web data optimization
Assignment submission Turnitin type
Not Applicable
Hurdle rules
To pass this course, students are required to have satisfactory performance on the final exam even if they do very well on the assignments. In order to meet the hurdle requirement, students must score better than 40% on the final exam. Note that the hurdle will be enforced after any required scaling.
General Assessment Information
Assignments will be completed individually; this means that you should do them yourself without assistance from others, except for asking advice from the Lecturer or Tutor. As noted above, assignments are the primary vehicle for learning the material in this course. If you don't do them, or simply copy and submit someone else's work, you have wasted a valuable learning opportunity.
Assignments are to be submitted via "give" before the specified time on the due date. Assessment of assignments will be primarily based on how accurately they satisfy the requirements; this means that most of the marks will be based on automatic marking. However, if time allows, we may also manually examine submitted assignments to determine (a) whether they are written with good style and (b) how closely they satisfy the requirements.
The penalty for late submission of assignments will be 5% (of the worth of the assignment) subtracted from the raw mark per day of being late. In other words, earned marks will be lost. For example, assume an assignment worth 20 marks is marked as 18, but was submitted two days late. The late penalty will be 2 marks (5% of 20 marks is 1 mark per day), resulting in a mark of 16 being awarded. No assignments will be accepted later than 5 days after the original deadline, unless an extension has been approved. For example, if UNSW grants you special consideration for a one-week extension, there will be no late penalty if the assignment is submitted within 7 days after the original deadline. However, no further late submissions will be accepted after these 7 days.
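The late-penalty arithmetic above can be sketched in a few lines of Python. This is only an illustrative sketch, not the official marking script: the function name `awarded_mark` and its parameters are hypothetical, lateness is assumed to be counted in whole days, the mark is assumed never to drop below zero, and an approved extension is assumed to shift both the penalty-free period and the final cutoff, as in the one-week example.

```python
from typing import Optional

PENALTY_PERCENT_PER_DAY = 5  # 5% of the assignment's worth per day late (from the policy)
CUTOFF_DAYS = 5              # nothing accepted more than 5 days late (from the policy)


def awarded_mark(raw_mark: float, worth: float, days_late: int,
                 extension_days: int = 0) -> Optional[float]:
    """Return the mark after the late penalty, or None if the submission is not accepted.

    Assumptions (not stated in the policy): days are whole days, the mark is
    floored at zero, and an extension longer than the cutoff replaces the cutoff.
    """
    if days_late < 0 or extension_days < 0:
        raise ValueError("days_late and extension_days must be non-negative")

    # Last acceptable day: 5 days normally, or the end of a longer extension.
    last_day = max(CUTOFF_DAYS, extension_days)
    if days_late > last_day:
        return None

    # Only days beyond any approved extension attract a penalty.
    penalised_days = max(0, days_late - extension_days)
    penalty = worth * PENALTY_PERCENT_PER_DAY * penalised_days / 100
    return max(0.0, raw_mark - penalty)


if __name__ == "__main__":
    # The worked example from the policy: worth 20, marked 18, two days late.
    print(awarded_mark(18, 20, 2))                     # 16.0
    # Six days late without an extension: not accepted.
    print(awarded_mark(18, 20, 6))                     # None
    # Seven days late with a one-week extension: no penalty.
    print(awarded_mark(18, 20, 7, extension_days=7))   # 18.0
```

How an extension shorter than 5 days interacts with the cutoff is not covered by the policy text, so the sketch's handling of that case is a guess; check with the Lecturer if it applies to you.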
Grading Basis
Standard
Course Schedule
Teaching Week/Module | Activity Type | Content |
---|---|---|
Week 1 : 27 May - 2 June | Lecture | Introduction, basic information theory, basic compression |
Week 2 : 3 June - 9 June | Lecture | More basic compression algorithms |
Week 3 : 10 June - 16 June | Lecture | Adaptive Huffman; Overview of BWT |
Week 4 : 17 June - 23 June | Lecture | Pattern matching and regular expression |
Week 5 : 24 June - 30 June | Lecture | FM index, backward search, compressed BWT |
Week 7 : 8 July - 14 July | Lecture | Suffix tree, suffix array, the linear time algorithm |
Week 8 : 15 July - 21 July | Lecture | XML overview; XML compression |
Week 9 : 22 July - 28 July | Lecture | Graph compression; Distributed Web query processing |
Week 10 : 29 July - 4 August | Lecture | Optional advanced topics; Course Revision |
Attendance Requirements
Students are strongly encouraged to attend all classes and review lecture recordings.
General Schedule Information
The course schedule is an approximate guide to the sequence of topics in this course. It is subject to change as the term progresses.
Course Resources
Recommended Resources
There will be no textbook used in this course. Lecture slides and supplementary readings will be provided and used.
You may find the readings below useful as reference materials:
- Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition. Ian H. Witten, Alistair Moffat, Timothy C. Bell. Morgan Kaufmann, 1999. (recommended reference, available at the university bookstore)
- Search Engines: Information Retrieval in Practice. W. Bruce Croft, Donald Metzler, and Trevor Strohman, Pearson Education, 2009.
- http://www.data-compression.info contains lots of valuable resources on data compression (especially links to readings and useful advice), despite the website's pink color!
- Data on the Web: from relations to semistructured data and XML. Serge Abiteboul, Peter Buneman, Dan Suciu. Morgan Kaufmann, 2000.
You will also find your previous textbooks on data structures and/or algorithms useful, in case you need to refer to the fundamentals of data structures and algorithms for text processing.
Course Evaluation and Development
This course is evaluated each session using MyExperience.
The MyExperience evaluation from the last time I taught this course showed that students were overall satisfied with all aspects of the course. We therefore maintain a similar style and structure for this term. Since this is the second time that we run this course after the pandemic (moving from fully online back to hybrid mode), we will go through the in-depth topics in the recorded lectures and discuss more examples and/or practical considerations in the live lectures (mixed online and in person). Please note that your feedback is important and will be considered to improve future offerings of this course (e.g., how much content can remain online).
Students are also encouraged to provide informal feedback during the term and let the lecturer know of any problems, as soon as they arise. Suggestions will be listened to very openly, positively, constructively, and thankfully, and every reasonable effort will be made to address them as soon as possible.