COMP90038 Algorithms and Complexity

COMP90038 Algorithms and Complexity
Assignment 2 (2024 Semester 1)
Due date: 09:00am May 20 (Monday) 2024

Marks. This assignment has in total 15 marks and is worth 15% of the total subject mark.

Objectives. To improve your understanding of the time complexity of algorithms and recurrence relations. To develop problem-solving and design skills. To improve written communication skills; in particular the ability to present algorithms clearly, precisely and unambiguously.

Difficulty. The (∗) mark means the problem would be a bit challenging.

About Analysis. When solving problems in this assignment, you can directly use any known theoretical results that are taught in class as facts with no need to prove them. For example, you can claim sorting n integers can be done in O(n · log n) worst-case time without proofs.

Problem 1. [8 Marks] Consider a set S of n integers; given a query integer q, the predecessor of q in S is defined as the largest integer in S that is no more than q. For example, consider

S = {1, 14, 7, 9, 30, 77, 15, 20, 60} ;

the predecessor of q = 7 in S is 7, and that of q = 28 in S is 20.

By symmetry, the successor of q in S is defined as the smallest integer in S that is no less than q. Continuing the example, the successor of q = 7 in S is still 7, and that of q = 28 in S is 30.

• (a) [3 Marks] Design a data structure which achieves the following:

– it can be constructed in O(n · log n) worst-case time;
– it consumes O(n) worst-case space;

– it can find the predecessor in S for every given query integer q in O(log n) worst-case time.

You will need to describe: (i) the data structure, (ii) the construction algorithm of the data structure, (iii) the predecessor query algorithm of the data structure; and you will also need to justify the construction time complexity, space consumption and query time complexity of your data structure with appropriate analysis. [1 mark for the data structure, 1 mark for the algorithm and 1 mark for the analysis]

• (b) [3 Marks] Suppose that you have a data structure that meets the requirements in Problem (a). Moreover, in addition to the predecessor queries, suppose also that your data structure can find the successor in S for every given query integer q in O(log n) worst-case time.

Describe an algorithm with the above data structure such that, given any range [a, b] with integers a < b, it can report all the integers x in S such that a ≤ x ≤ b, in O(k + log n) time in the worst case, where k is the number of integers in S that are in the range [a, b].

Note that the term k in the complexity is necessary because, in the worst case, e.g., for the query range [−∞, ∞], the algorithm needs to output all the n integers in S, which takes O(n) time. More specifically, it takes Ω(k) time to output k integers for the query result and hence, the O(k) term is inevitable in the query time complexity.

You will need to justify the time complexity of your algorithm with appropriate analysis. Continuing the previous example, given a range [4, 19], your algorithm should report {7, 9, 14, 15}

as they are the integers in S that are in the given range. [1.5 marks for the algorithm and 1.5 marks for the analysis]

• (c) (∗) [2 Marks] Further strengthen (if necessary yet still with O(n) worst-case space) the data structure in Problem (b), and describe an algorithm such that given any range [a, b] with integers a < b, it can return the number of integers in S that are in the range [a, b], in O(log n) time in the worst case.

Note that in this Problem (c), the answer to each query is just a number (i.e., the number of integers falling in the query range). And hence, unlike the case in Problem (b) which needs to pay O(k) cost for outputting the query answer, the output cost for the query answer here is just O(1).

In other words, simply applying the solution in Problem (b) and counting the number of integers in the query range will not meet the required query time bound, as it will take O(k + log n) time which can exceed O(log n) when k is large, e.g., k = n for the query range [−∞, ∞].

You will need to justify the time complexity and the correctness of your algorithm with appro priate analysis.

Continuing the example in Problem (b), your algorithm should return 4 for the range [4, 19] as there are 4 integers in S in this range. [1 mark for the algorithm, 0.5 mark for the correctness justification, and 0.5 mark for the running time analysis]

Problem 2. [4 Marks] Consider two English documents D1 and D2; denoted by S(D1) and S(D2) as the sets of English words in them, respectively, where each word has O(1) length, i.e., each word consists of only a constant number of characters.

The similarity between D1 and D2 is defined as the Jaccard Similarity between the sets of words in them, i.e., S(D1) and S(D2). Specifically, the similarity between D1 and D2 is computed as:

Sim(D1, D2) = |S(D1) ∩ S(D2)|/|S(D1) ∪ S(D2)|.

That is, the size of the set intersection S(D1) ∩ S(D2) divided by the size of the set union S(D1) ∪ S(D2). For simplicity, we assume that the sets of words in D1 and D2 have the same size, namely,

|S(D1)| = |S(D2)|. And we let n = |S(D1)| = |S(D2)|.

Design an algorithm to compute Sim(D1, D2) in O(n) expected time. You will need to justify the time complexity of your algorithm with appropriate analysis. [2 marks for the algorithm and 2 marks for the analysis]

Problem 3. (∗) [3 Marks] Given an (unsorted) sequence S of n integers from a constant-size universe {1, 2, . . . , 100} (i.e., the universe size is 100, a constant), find the longest strictly ascending sub-sequence (not necessarily contiguous) of integers from the original sequence S such that all these integers in this sub-sequence are in a strictly ascending order.

For example, consider the following sequence of integers:

S = {1, 20, 14, 7, 9, 30, 77, 15, 20, 60} ,
where s1 = {1, 20, 30, 77} and s2 = {1, 7, 9, 15, 20, 60} are two possible strictly ascending subsequences of S. While there exist more other strictly ascending sub-sequences of S, it can be verified that s2 is the longest strictly ascending sub-sequence of S.
You are asked to design an algorithm that can solve the above problem in O(n) worst-case time. You will need to justify the time complexity of your algorithm with appropriate analysis. [2 marks for the algorithm and 1 mark for the analysis]

Submissions

You should lodge your submission for Assignment 2 via the LMS (i.e., Canvas). You must identify yourself in your file. Poor-quality scans of solutions written or printed on paper will not be accepted. There are scanning facilities on campus, not to mention scanning apps for smartphones etc. Solutions generated directly on a computer are of course acceptable. Submit one file:

• A assignment2.pdf file, comprising your solutions to the questions in the assignment.

Administrative Issues

When is late? What do I do if I am late? The due date and time are printed on the front of this document. The late penalty for non-exam assessment is two marks per day (or part thereof) overdue. Requests for extensions or adjustment must follow the University policy (the Melbourne School of Engineering “owns” this subject), including the requirement for appropriate evidence.

Late submissions should also be lodged via the LMS, but, as a courtesy, please also email the subject coordinator, Junhao Gan, when you submit late. If you make both on-time and late submissions, please consult the subject coordinator as soon as possible to determine which submission will be assessed.

Individual work. You are reminded that your submission for this Assignment is to be your own individual work. Students are expected to be familiar with and to observe the University’s Academic Integrity policy http://academicintegrity.unimelb.edu.au/. For the purpose of ensuring academic integrity, every submission attempt by a student may be inspected, regardless of the number of attempts made.

Students who allow other students access to their work run the risk of also being penalized, even if they themselves are sole authors of the submission in question. By submitting your work electronically, you are declaring that this is your own work. Automated similarity checking software may be used to compare submissions.

You may re-use code and materials provided by the teaching staff, and you may refer to resources on the Web or in published or similar sources. Indeed, you are encouraged to read beyond the standard teaching materials. However, all such sources must be cited fully and, apart from code and materials provided by the teaching staff, you must not copy code or any materials.

Finally. Despite all these stern words, we are here to help! There is information about getting help in this subject on the LMS pages. Frequently asked questions about the Assignment will be answered in the LMS discussion group.

发表评论

电子邮件地址不会被公开。 必填项已用*标注