Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
Assignment #3
CS 348 - Spring 2024 - Sections 002 & 003
Due : 11:59 p.m. July 19, 2024
Appeal Deadline: One week after return
(Total weight 10%)
Submission Instruction
This assignment will be submitted through Crowdmark. See Course Outline on Learn for more detailed instructions. In particular, do not forget to submit one file per question to make the livesof TAs easier. You don’t have to type the questions if you use Latex. Just specifying the question number is good enough. You don’t need to keep the same font style. Here is a draft/empty latex template you may download and use: Overleaf Template. Handwritten solutions scanned to pdf are also acceptable as long as they are readable.
Question 1.
[10 points in total] Consider a table residing fully on disk and occupying 1,000,000 blocks. There is no index, and 100 blocks are available for query processing in main memory. Suppose you have the following two options to improve query performance:
(1) Buy more memory to increase the number of available memory blocks to 1000.
(2) Buy a faster disk to increase the speed of I/O by 10%.
(a) (4 points) If the objective is to speed up the query “SELECT * FROM R WHERE R.A > val”, which option is more effective? Justify your answer in no more than four sentences.
(b) (6 points) If the objective is to speed up the query “SELECT * FROM R ORDER BY A”,which option is more effective? Justify your answer in no more than four sentences using the relevant complexity bounds discussed in the class.
Question 2.
[20 points] Consider the following schema for a portion of a simple corporate database (type information is not relevant to this question and is omitted):
Employee (id , name, addr, salary, age, year, deptid) Department (deptid , deptname, floor, budget)
Suppose you know that the following queries are the five most common queries in the workload for this corporation and that all five are roughly equivalent in frequency and importance:
1. List the average salary for employees of each age; that is, group by on age and list the age of each group and the corresponding average salary.
2. List all the attributes of employees who started working in a given range of years.
3. List all the department information, ordered by department budget.
4. List the id, name, and address of employees who work in the department with a user-specified depart- ment name.
5. List the overall average salary of all employees.
Given this information, decide which attributes will be indexed and whether each index will be a clustered index or an unclustered index. Assume that B+ tree indexes are the only index type supported by the DBMS and that both single- and multiple-attribute search keys are permitted. Provide at most five indexes (ideally covering all five queries above) and briefly explain your index choices.
Question 3.
[20 points in total] Consider the following schema: Employee(Eno , ename, title, salary)
Project(Pno , pname, budget, department) Works(Eno , Pno , duration)
FK: Eno references to Employee FK: Pno references to Project
Works relation stores tuples for each employee working in a project. duration is an integer attribute that stores how long an employee works for a particular project in months. department attribute stores the department in which a project is carried out. salary and budget are also integer attributes for Employee and Project table, respectively. Given the schema, answer the following parts (a,b, c, d)
(a) (6 points) For the following SQL query, provide 2 different, equivalent logical query plans in the form of relational algebra expression tree. Do not use cross-product as operator in either tree.
Select E .ename, P .pname
From Employee as E, Projects as P, Works as W
Where E .Eno = W .Eno and W .Pno=P .Pno and W .duration > 37
(b) (8 points) Transform the following un-optimized query into an optimized one by applying all the relevant transformation rules. Represent your result as a relational algebra expression tree.
πename (σWorks.duration=12∨Works.duration=24
(σEmployee.ename!=′ J.Doe′ ΛProject.department=′ Construction′
(σEmployee.Eno=Works.EnoΛWorks.Pno=Project.Pno (Employee × Works × Project))))
Suppose for the given schema above, we have the following statistics additionally (|| represents the cardinality of a table or a query):
• |Employee| = 10000
• |Project| = 1200, |πdepartment Project| = 15
• | Works | = 36000, |πdurationWorks | = 20, min(duration, Works) = 3, max(duration, Works) = 48
Also assume that, for each project there is at least one employee working for it (i.e, for each project there is a tuple in Works table) and every employee works for at least 1 project (i.e, for each employee there is a tuple in Works table). For each of the following relational algebra expressions (c, d), estimate the number of tuples it produces. You may make the same assumptions as in the lecture on query optimization. If you make different or additional assumptions, please state them explicitly.
(c) (3 points) σduration<=12(Works)
(d) (3 points) σdepartment=′ HR′ (Employee ▷◁ Works ▷◁ Project)
Question 4.
[20 points in total] Given the following histories:
H1 = {W1 (x), R1 (x), R2 (z), W2 (x), W2 (y), R3 (x), R3 (z), R3 (y)}
H2 = {R2 (z), W2 (x), W2 (y), R3 (x), W1 (x), R1 (x), R3 (z), R3 (y)}
H3 = {W1 (x), R1 (x), R2 (z), R3 (x), R3 (z), W2 (x), W2 (y), R3 (y)}
H4 = {W1 (x), R2 (z), R1 (x), W2 (x), R3 (x), R3 (z), W2 (y), R3 (y)}
a. (12 points) Which pairs of the above histories (6 pairs in total) are conflict equivalent and why? Show your answer using the table representation used in class.
b. (8 points) Which of the four histories above are serializable and why?
Question 5.
[20 points in total]
(a) (5 points) Consider the following transaction. Add lock and unlock instructions to it so that the transaction observes the two-phase locking protocol. Make sure to indicate the modes of the locks (shared or exclusive) and choose the most permissive mode that allows all operations that Ti will perform on X (shared locks are more permissive than exclusive locks).
T1:
read(A); read(B);
if A=0 then B:=B+1; read(C);
if C=0 then B:=B+1; write(B) .
(b) (5 points) Consider two transactions:
T1:
read(A); read(B);
if A=0 then B:=B+1; write(B) .
T2:
read(B); read(A);
if B=0 then A:=A+1; write(A) .
Can the execution of these transactions with the two-phase locking protocol result in a deadlock? Briefly justify your answer with an example.
(c) (10 points) Consider the following sequence of operations, where, e.g., c1 and a4 are commit and abort requests by transactions T1 and T4 , respectively. Assume the database system uses strict 2PL protocol and enforces the following lock acquisition and release rules:
• A transaction T attempts to acquire a lockon data item o, right before T is about to execute its first operation on o.
• A transaction T releases a lockon itemo as early as possible ensuring that the transaction abides by the strict 2PL protocol.
Given the above rules, consider a series of requests. (i) Indicate which, if any, of these requests will cause a transaction to be blocked? (ii) Indicate if a deadlock will occur?
w1 [a], r2 [a], r1 [b], w3 [b], c1 , r4 [b], w2 [b], c3 , c4 , a2
Question 6.
[10 points in total] The following is the sequence of log records of a recovery manager written by the transactions T1, T2, and T3:
T1 Start;
T1,A,12,11; T2 Start;
T2,B,20,21; T1,C,30,31; T2,D,40,41; T2 Commit; T1,E,50,51; T3 Start;
T3 D,41,4;
Describe the action of the recovery manager, including changes to both the disk and the log as follows:
1. list the undo/redo actions based on the log
2. list the final values of each data item in the disk