CS 330, Spring 2025, Homework 9

Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due

CS 330, Spring 2025, Homework 9

Due Thursday, April 24, 11:59 pm EST, via Gradescope

Homework Guidelines

Collaboration policy If you choose to collaborate on some problems, you are allowed to discuss each problem with at most 3 other students currently enrolled in the class.  Before working with others on a problem, you should think about it yourself for at least 45 minutes.

You must write up each problem solution by yourself without assistance, even if you collaborate with others to solve  the problem.  You must also identify your collaborators.  If you  did not work with anyone, you should write ”Collaborators:  none.” It is a violation of this policy to submit a problem solution that you cannot orally explain to an instructor or TA.

Typesetting Solutions should be typed and submitted as a PDF file on Gradescope.  You may use any program you like to type your solutions.  LATEX, or  “Latex”, is commonly used for technical writing  (overleaf .com is a free web-based platform for writing in Latex) since it handles math very well. Word, Google Docs, Markdown or other software are also fine.

Content of the Solution. When a problem asks you to design an algorithm, a complete (a.k.a. full credit) solution  always  consists of a description of the algorithm using pseudocode  (thus  not actual code) with well-chosen comments, preferable some short and high level description in English so that the reader has an idea of what the code does, a proof of the correctness of the algorithm and a running time analysis.

Filling in a Contingency Table

You work for a government agency  (such as the US Census Bureau) that conducted a survey to find out race and ethnicity data across different age groups. There are 128 possible ethnicities and 100 possible ages.  They constructed a 100 x 128 size table, where each entry represents the number of people in the survey of a particular age and particular ethnicity. They published the row and column sums for this table, given as R = [R1, R2, . . . R100] and C = [C1, C2, . . . C128] respectively.

Unfortunately,  the original data in the table was lost,  due to  a hard drive failure  and poor backup protocols. All that was salvaged was:

•  The row and column sums; and

• A list of table positions (that is, pairs (i, j)) that are known to be 0.  This list doesn't contain all the positions that are 0, but everything on the list is 0 for sure.

•  (*) (used for part 2  of the problem) A list of table positions  (x, y) that are known to have at least one person, thus the value assigned to  (x, y) is at least  1  .   (Again, this list need not contain all such positions.)

1.  (7 points) Write an algorithm to reconstruct a possible table (that is, a table with the right row and column sums, with zeros in the positions known to be 0).  In this part of the problem you don’t have to consider the lower bounds on table positions as described in (*).

There may be many possible solutions; your algorithm just needs to find one.

To analyze the running time, suppose that there are k rows and t columns, and p people who participated in the survey. Your algorithm should run in time O(pkt).

Table.jpg

Figure 1: An example of Census Data Before the Data Was Lost

2.  (3 points) Modify your previous algorithm to take care of the third requirement marked by (*). That is, there is a second list of positions given as input of the cells that are required to have value at least 1.  We are aiming for the same running time of O(pkt).  In the proof it’s enough to prove the correctness of the modifications.  In your algorithm you can call the graph creation process from part 1  and then apply the modifications to the graph and the algorithmic process as needed.   (You can give a complete  solution for part 2 only and receive  the full 10 points if this is  correct.  However,  it’s  easier to think about in two steps.  If you  want  to play it safe you may be better off  doing the  two parts separately.)

Notes and Extensions This is not part of the homework  assignment, but  interesting  to  think about!

1.  The premise for this problem is unrealistic, since the Census Bureau very rarely loses data. However, they really do face a computationally similar problem.  Because of privacy concerns, the Census sometimes computes row and column sums, adds some noise to those, and then publishes  a  table  that  is  consistent   (or  as  consistent  as  possible)  with  those  noisy  row  and columns sums, subject to known constraints like the fact that some cells are emptythere are no 10-year old grandmothers and other cells have to have a person (e.g.  nursing homes homes only  appear  in  the  data  if they  have  at  least  one  occupant).   The  real  Census  version  is actually much more complicated, because the data has dozens of attributes instead of just two (age and ethnicity). But the flavor is similar.

2. Do not hand in:  How  could you modify your algorithm to see if there is more than one possible table? How could you modify the algorithm to take upper ui;j  and lower bounds {i;j on the value of each cell?

发表评论

电子邮件地址不会被公开。 必填项已用*标注