CS 294-165 Sketching Algorithms Problem Set 1


Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due


CS 294-165 Sketching Algorithms

Problem Set 1 

Problem 1: Incoherent matrices. A matrix Π ∈ R m×n is -incoherent if 

1. For each column Πi , kΠik2 = 1. 

2. For each i 6= j, |hΠi , Πj i| < . 

In the lecture notes, Theorem 3.1.4 implies that for any n > 1 and  ∈ (0, 1), there exists a code C with n codewords, alphabet size q = O(1/), block length ` = O( −1 log n), and relative distance 1−. This leads to an -incoherent matrix via the construction in Figure 1, where the columns of Π are in correspondence with codewords in C. 

Figure 1: Each codeword gives one column of the incoherent matrix. Here q = 4, t = 3 and the codeword is Ci = (1, 1, 3). The vector is m = qt dimensional with the coordinates broken up into t blocks each of size q. A 1 is placed in the jth position in the location specified by (Ci)j . The entire vector is normalized by 1/ √ ` to have unit norm. 

(a) (2 points) Consider the construction in Figure 1 to convert a code with n codewords, block length `, alphabet size q, and relative distance ρ into a matrix Π ∈ R m×n . For what  is it -incoherent? How many rows m does it have? Your answers should be in terms of the code parameters. 

It turns out one can achieve a better m than the code of Theorem 3.1.4 when  is sufficiently small. Suppose the alphabet size q is a prime power and consider the finite field Fq. Consider all polynomials p1, . . . , pN ∈ Fq[x] of degree at most d where N = q d+1 . Define the Reed-Solomon code C1, . . . , CN as follows: ` = q where the jth entry of Ci is the evaluation of pi on the jth element of Fq (so Ci is the evaluation table of pi). 

(b) (5 points) If we still want to have at least n codewords, we need N ≥ n. Show how to choose d, q so N ≥ n and the relative distance is 1 − , and show what this gives (in big-Oh notation) for the number m of rows of the incoherent matrix Π we obtain

(c) (3 points) How small does  need to be as a function of n for the codes from part (b) to give smaller m than the code from Theorem 3.1.4? If the turning point is T , you should provide an answer  0 T such that log(1/0 T ) = Θ(log(1/T )).

(d) (5 points) Suppose one has an -incoherent matrix Π. Show how `1-point queries to x being updated in the turnstile streaming model can be answered solely given y = Πx. (Note then that part (b) implies a deterministic `1-point query algorithm with low space in turnstile streams.) Hint: it may help to remember that Πx = P i xiΠi . 

OPEN PROBLEM: It is known that any -incoherent matrix with n columns must have m = Ω(min{n, −2 (log n)/ log(1/ε))} [1, Section 9]. Can the gap between upper and lower bounds be closed? It is conceivable a better upper bound could be achieved by discovering a better code construction. 

Problem 2: Counting distinct elements with deletions. In Lecture 2 we showed how to estimate the number of distinct elements in a stream in poly(ε −1 lg n) bits of space with 2/3 success probability, where all integers in the stream are in [n]. In Lecture 3, we gave a different algorithm based on geometric sampling. Recall in the turnstile model, the distinct elements problem asks us to estimate kxk0 := |{i : xi 6= 0}| when all updates have ∆ = +1. What if the updates in the stream are allowed to have ∆ ∈ {−1, 1} though? Show how to alter the geometric sampling algorithm from Section 2.2.3 of the lecture notes to also handle such negative updates. What is the space complexity of your solution? Any solution using space poly(ε −1 lg(nL)) for this modified problem, where L is the length of the stream, will receive full credit. 

Problem 3: Insertion-only `1 heavy hitters. In class we considered turnstile streams where a vector x ∈ R n receives updates of the form “add ∆ to xi” in a stream. As mentioned, insertion-only streams are a special case of turnstile streams where ∆ = 1 always (so we can just imagine the stream is a sequence i1i2 · · · iL of integers in [n]). Also, recall in the point query problem that after several updates we are asked a query on i for some i ∈ [n] and must output a value in [xi−(1/k)kxk1, xi+(1/k)kxk1]. Consider the algorithm CounterPointQuery below. 

Algorithm CounterPointQuery: 1. Initialize B counter/index pairs (i1, Ci), . . . ,(iB, CB) all to (0, 0) 2. update(i): if i = ij for some j ∈ [B], then increment Cj else if none of the ij = i but some Cj = 0, then set ij = i, Cj = 1 else decrement every Cj by 1 3. query(i): if i = ij for some j ∈ [B], then output Cj else output 0  

(a) (10 points) Give an upper bound on what B needs to be to ensure that query(i) always outputs a value in [xi − (1/k)kxk1, xi + (1/k)kxk1]. 

(b) (5 points) One can store the (ij , Cj ) pairs in an array so that they consume B space and updates take time O(B) (since finding whether some ij = i or decrementing all counters would take O(B) time). Devise a data structure taking O(B) space to store the (ij , Cj ) pairs so that the update(i) operation above can be implemented in O(1) time. Your data structure should probably use hashing, and the update time will be O(1) expected time. Assume that integers in the range 1, . . . , max{n, L} can be stored in one unit of space, and that a computer can perform basic arithmetic operations on integers of this size in constant time. 

Challenge problem (no credit): Suppose in part (a) you want to have error satisfying the tail guarantee, i.e. additive error ±(1/k)kxtail(k)k1 (see Remark 4.1.1 of the lecture notes). Then what does B need to be? 

Problem 4: (1 point) How much time did you spend on this problem set? If you can remember the breakdown, please report this per problem. (sum of time spent solving problem and typing up your solution) 

References 

[1] Noga Alon. Problems and results in extremal combinatorics–I Discrete Mathematics, 273(1-3): 31–53, 2003. 3

发表评论

电子邮件地址不会被公开。 必填项已用*标注