COMS30042 Advanced Algorithms

January/February 2021 Examination Period

FACULTY OF ENGINEERING

Third Year Examination for the Degree of

Bachelor and Master of Engineering and Bachelor of Science

COMS30042

Advanced Algorithms

Q1. (a) Consider an array of n integers whose values are in the range 0, . . . , u − 1. Give an algorithm that can sort the array in O(n log log u) time.     [3 marks]

(b) State the space usage of the FKS hashing scheme and state the time complexity of the construction.    [2 marks]

Q2. Answer the following questions about Bloom filters which we use to represent a set of elements. Bloom filters do not support deletions natively. However, your friend Martha suggests a way round this problem. Her idea is as follows. Create two Bloom filters, one called Badd and one callld Bdel. Whenever we want to add an element e to the set we set the relevant places in Badd to 1. Whenever we want to delete an element e we set the relevant places in Bdel to 1. In order to check if an element is in the set we have to look at both Badd and Bdel. First check Badd . If it returns false then e is not in the set. If it returns true then check Bdel. If this returns true then e is not in the set. If it returns false then e is in the set.

(a) Does Martha’s idea give us the desirable property that false positives may occur but false negatives are impossible. If you think yes then explain why and if no, give an example that breaks this property.    [5 marks]

(b) What is the time complexity of a lookup using Martha’s idea?    [5 marks]

(c) Another way to handle deletions is to store an m-bit number in the array positions. In practice it turns out that 4 bits normally suffices. Describe how insert, delete and lookup queries could be supported in such an extended Bloom filter.     [5 marks]

(d) Does your solution using m-bit numbers in the array positions have the desirable property that false positives may occur but false negatives are impossible? If you think yes then explain why and if no, give an example that breaks this property.   [5 marks]

(e) Imagine we want to use a Bloom filter of size 800 bits for a Web cache which stores up to 100 URLs picked from a universe of 1000000 URLs. How would you calculate the optimal number of hash functions to use to minimise the false positive probability and what false positive probability does it give? You can give the probability to one decimal place.    [5 marks]

Q3. A hash function family H = {h1, h2, . . . } with hi : U → T is pairwise independent iff for randomly and uniformly chosen h ∈ H, we have that for any distinct x, y ∈ U and pair of values u, v ∈ T, Pr(h(x) = u ∧ h(y ) = v ) = 1/|T|2.

Let x1, . . . , xn be different values chosen from U.

(a) Pick at random a1, . . . , an ∈ {0, 1} and define

ha1,...,an (x1, . . . , xn) = a1x1 + · · · + anxn mod 2

Prove that this family is pairwise independent or give a counter-example.    [5 marks]

(b) Pick at random a0, . . . , an ∈ {0, 1} and define

ha0,...,an (x1, . . . , xn) = a0 + a1x1 + · · · + anxn mod 2

Let u = u1, . . . , un and v = v1, . . . , vn. ui and vi must differ at at least one index i and we can ignore all other indices. We can regard a1, . . . , an as selecting a subset of values in the two inputs u and v . You can assume without proof that the probability of the size of the subset of mismatching indices being even or odd size is equal.

Prove that this hash function family is pairwise independent.     [5 marks]

Q4. This question is about devising a fast algorithm to count the number of distinct substrings of a string of length n, including the empty string. For example, the string abb has five distinct non-empty substrings which are (a, b, ab, bb, abb).

(a) i. Construct the atomic suffix tree of abb. The suffix tree you construct should have exactly three leaves as we do not need to add an edge from the root that points only to the sentinel.     [5 marks]

ii. What property of an atomic suffix tree gives you the number of distinct non-empty substrings for a string?    [5 marks]

iii. Give an algorithm that runs in O(n) time to count the number of distinct non-empty substrings of a string of length n. You should provide a short explanation in English of why your algorithm is correct and why its running time is O(n). You may assume anything given in the lectures as part of your solution. [Hint: The atomic tree has size O(n2).]     [5 marks]

(b) This part of the question is about the DC3 suffix array construction algorithm. In the DC3 algorithm indices B1 = {i mod 3 = 1} and B2 = {i mod 3 = 2} are defined. In the first stage of the algorithm it computes the suffix array for indices B1 ∪ B2 from a string of twice the length of the input. Show the suffix array that is obtained in this way by the DC3 algorithm in its first stage using the input string  bananas.    [5 marks]

Q5. Consider the following problem which we will call MaxIncome. We are given work values a1, . . . , an and a bound b and incomes c1, . . . , cn with bound k. Work ai gives income ci . Is there a way of choosing work values so that their sum is at most b and the corresponding total income is at least k?

(a) Show that MaxIncome is in NP.   [5 marks]

(b) Show that MaxIncome is NP-complete by reducing it from one of the NP-complete problems that has been covered in the lectures.   [5 marks]

Q6. Consider a 2D version of the RMQ problem. We are given a square n×n grid of integers. A query RMQ (y1, x1, y2, x2) returns an index of a smallest value in the rectangle with bottom left coordinate (y1, x1) and top right coordinate (y2, x2). We can assume that y1 < y2 and x1 < x2.

Consider a solution which precomputes RMQ (y1, x1, y1 + ly , x1 + lx ) for all y1, x1, ly ∈ {2 1 , 2 2 , 2 3 , . . .} and lx ∈ {2 1 , 2 2 , 2 3 , . . .} and store the results in a table.

(a) Explain a fast method for performing the precomputation and give its running time and gives its time complexity.     [5 marks]

(b) How can queries be performed in O(1) time once this precomputation has been finished?     [5 marks]

Q7. Construct the Cartesian tree for the input 7, 6, 1, 7, 5, 8, 3, 4.       [5 marks]


发表评论

电子邮件地址不会被公开。 必填项已用*标注