首页 » 数据科学与大数据技术 » INFO20003 Database Systems

INFO20003 Database Systems

2024-06-28 Admin 写评论

Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due

Exam: Database Systems (INFO20003_2021_SM2)

Section 1: Relational Database Modelling + Implementation (23 marks)

Question 1

Data Types

Select the most appropriate MySQL data types for the following pieces of information:

Number of books sold by a small bookstore in one calendar day

A person's place of birth

Netflix's profit for 2020 (including decimals)

An employees sign-in time

Was Tim at work at work on 07/06/21?

A person's phone number in Australia (10 digits)

Price of items at a local grocery store

An item’s description in the form of a short video

Is it a horror, comedy, or an action movie?

Auto-incremented primary key for table 'Student'

Question 2

ER Modelling A

Assuming that common sense applies, which one of the following ER model fragments is most believable?

Option A

Option B

Option C

Option D

Question 3

ER Modelling B

A group of students was asked to model a typical product tracking system for “Amazon” by drawing a Conceptual ER model in Chen’s notation.

A shipped item is identified by a unique item ID. The system also stores the weight of the package, its dimensions, destination, and estimated delivery date. These shipped items are received into the tracking system at a single fulfilment center. Each fulfilment center is identified by a unique ID. The system also stores type of the center and its address. Shipped items make their way to their destination via one or more standard transportation events (e.g., air flights, sea freight, truck deliveries). These transportation events are identified by a unique eventID and characterized by a type (e.g., flights, freight, truck), and a delivery route.

Below are the attempts by the students. Identify the ER model that addresses all the requirements of this case study.

Option A

Option B

Option C

Option D

Question 4

ER Modelling C

A group of students was asked to draw a Conceptual ER model in Chen’s notation for a typical flight database to store details about an airline’s fleet, flights, and seat bookings. Each flight is using a single airplane. There are several airplanes available for different flights and each of these airplanes has a unique registration number, model name and the capacity, i.e., number of passengers. Each flight is identified by a flight number, date and time of departure, date and time of arrival, departure city and destination city. Passengers can book a seat on any given flight. The database stores a passenger’s passport number, first name and last name. A passenger can have multiple bookings scheduled on different flights. A passenger can change the seat allocation of a booked flight multiple times throughout history, and the database needs to maintain the history of changes by storing the date and time of the update and seat number for each such change.

Below are the attempts by the students. Identify the ER model that addresses all the requirements of this case study.

Option A

Option B

Option C

Option D

Question 5

DDL

In the following conceptual model of a library consisting of books and multimedia artefacts, a member can borrow any given item multiple times and should return the item or renew the booking on or before the return date. Each item is assigned a type, but each type can be assigned to multiple items. If the item is a book, an ISBN number (consisting of 13 digits) is also stored in addition to title and genre. An item can be borrowed by more than one member at different times, and a member can borrow more than one item at any given time. Each member is given a numeric ranking ranging from 0-10 based on their timeliness in managing their returns and re-bookings.

Write SQL statements to create the tables for the data model shown below. Be sure to specify primary and foreign keys. You do not need to specify whether the fields are NULL/NOT NULL. Where the data type is not obvious, feel free to choose an appropriate data type.

Section 2: SQL & RA (24 marks)

SQL Intro

Please note that the following data model and the description is the same as Assignment 2.

The case study focuses on a small part of a contact centre operated by a large financial institution. The contact centre employs several staff members organised in a hierarchical team structure. The case study focuses on four teams which are named after Australian national parks and are organised as follows:

Each team has a team leader and a small number of agents (between 1 and 4). The team structure of contact centres is very dynamic in real life – and this case is no exception. Staff members may move from one team to another, may be scheduled to work for more than one team at the same time and may assume different roles within the same or different teams.

The profile table keeps track of staff allocations over time using a unique 4-digit identifier for each allocation of a member of staff to a team in a particular role (e.g. Team Leader or Agent). The valid_from and valid_until attributes track the duration of the allocation.

Throughout the day, Contact Centre agents take incoming calls from customers. For every call, the system generates an entry in the call_record table, linking it to the agent who (first) handled the call. A call may be transferred between agents or agents to team leaders if need be. In that case, the call comprises of more than one call legs. The provided call record data covers a period of three months from 1st April 2021 to 30th June 2021.

Important note: You may use the time of the survey (survey.response_time) when determining ‘which month was a call+survey response made in’, as opposed to the time the call was first started (call_record.call_time).

Contact Centre management requires that at the end of every call, the agent asks the caller to stay on the line and complete a post-call survey using their phone keypad. The survey responses are used to analyse the performance of contact centre individuals and teams, as well as the overall business. The survey consists of three questions, as follows:

# Question Metric Question Wording Answer

1 First Call Resolution Was your enquiry resolved today? 1-3 scale, where 1=yes, 2=no, 3=tooearly to tell

2 Agent Quality How happy were you with the person you spoke to? 0-10 scale, where 0=very unhappy and 10=extremely happy

3 Promoter Score How likely are you to recommend us to your friends and family? 0-10 scale, where 0=highly unlikely and 10=highly likely

A caller may opt out of taking a survey altogether. Some callers may agree to the survey but quickly hang up without answering any questions. Others may answer only some of the questions before hanging up. And some will give answers to all questions. Customers cannot ‘skip’ questions, i.e. they can’t have provided an answer to Q2 without having provided an answer to Q1. Customer responses are recorded in the survey_response table. Only the first leg will (potentially, if consented) be related to a survey-record (i.e. the surveys are about the first agent that picked up)

For reference, the data model is given below.

Figure 1: The ER Model for Contact Centre Database

Some definitions

“Participation” is a contact centre metric that measures what proportion (percentage) of inbound calls are followed up by an attempted survey. This number can go anywhere between 0% and 100%. Contact centre agents and management would like this figure to be as high as possible.

"Enhanced participation” is like participation, but considers not just offered surveys, but offered surveys where at least the first question was answered by the customer (i.e., in this case they didn’t “say yes to the survey, but then hang up before answering the first question”).

“Net Promoter Score” or NPS is a measure of customer loyalty and is calculated from the collected promoter scores. The way how NPS is calculated can be found here https://www.netpromoter.com/know/ (https://www.netpromoter.com/know/) . An online calculator is available on http://www.npscalculator.com/en# (http://www.npscalculator.com/en)

“Cherry picking” refers to the practice of agents selectively offering surveys only to callers who are likely to score the agent highly – with a 9 or 10 – on the Agent Quality question.

Question 6

SQL Q1

Display the first and last names of all the agents. Return as (firstname, lastname), with one row per agent.

Question 7

SQL Q2

List the top 3 agents who have attended the most calls (only include calls where the agent was the first one to pick up). Return as (staffId, numberOfcalls). Assume there are no ties.

Question 8

SQL Q3

Which staff worked as 'agents' (not team leaders) for more than one team on the 16th of May? Show their names and the number of teams they were working for. Return as (firstname, lastname, numberOfTeams).

Question 9

SQL Q4

List all of the agents who have a low (<10%) 'survey participation rate' (participation), but have only ever received 9/10 or 10/10 (or no response) for the 'agent_quality' survey response (i.e., they are likely to be cherry-picking). Return as (staffId).

Question 10

RA 1

For each of the relational algebra below, state whether they are correct or not for the following statement:

List the first name of all the staff who have ever worked for the team with id “1”.

Question 11

RA 2

For each of the relational algebra below, state whether they are correct or not for the following statement:

Retrieve all non-null survey response comments + the staff id for staff with a last name of “Victor”.

Section 3: Query Processing and Optimisation (28 marks)

Single Relation Plan Intro

Consider a relation called Sale that stores information about the sales of a company. Imagine the relation consists of 500,000 tuples and each data page can contain 100 tuples. Imagine the sale amount can take any value between 0 and 100,000 ([0,100,000]) and imagine that the SalesAgent can be one of the 10 different sales agents. Suppose that the following SQL query is executed frequently using the given relation:

SELECT *

FROM Sale

WHERE amount > 40,000 AND SalesAgent = 'Jim Harper';

Question 12

Single Relation Plan A

Compute the estimated result size for the query, and the reduction factor of each filter.

RF(amount)

RF(SalesAgent)

Result Size

Question 13

Single Relation Plan B

Compute the estimated cost of plan alternatives assuming that an unclustered hash index on (SalesAgent) is (the only index) available. Suppose there are 10 index pages.

Give the lowest estimated cost (in I/0s) after considering all access methods available.

Question 14

Single Relation Plan C

What would happen with the cost if our query changed and became:

SELECT *

FROM Sale

WHERE amount > 80,000 AND SalesAgent = 'Jim Harper';

Assuming that the unclustered index on SalesAgent from the previous question is the only index available, would the cost of the best plan change?

Yes, because the RF value will change.

No, because the RF value for SalesAgent stays the same

No, the result size will stay the same

No, the result size changes, but best plan cost is still the same

Multi-Relation Plan Intro

Consider two relations called Singer and Concert. The relation Singer has 200 pages, and the relation Concert has 20,000 pages. Consider the following SQL statement:

SELECT *

FROM Singer INNER JOIN Concert

ON Singer.ID = Concert.singer

WHERE Concert.city = ‘Amsterdam’;

There are 102 buffer pages available in memory. Both relations are stores as simple heap files. Neither relation has any indexes built on it.

Evaluate block-oriented Nested Loop Join, Sort Merge Join and Hash Join using the number of disk I/O's as the cost. Consider Singer as the outer relation in all alternatives. Assume that sorting can be performed in two passes for both relations. All selections are performed on-the-fly after the join.

发表评论

电子邮件地址不会被公开。必填项已用*标注

姓名 *

电子邮件 *

验证码 *