首页 » 计算机科学(Computing Science) » [MP6] Naive Database

[MP6] Naive Database

2025-06-05 Admin 写评论

Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due

[MP6] Naive Database
Launch Autograder
GitHub Classroom

Grading

The assignment's final due date is shown in the yellow box at the top of this page.
This assignment will be autograded. You have unlimited penalty-free submission attempts.
You must correctly pass ALL test cases to earn 100 pts.
If your code does not pass all of our test cases, these 100 pts will be prorated → 100 pts * (% what you earned / 100%).

Introduction

In this MP you are writing two classes to create a simple database (Database, and DbTable). The database will consist of tables. Each table consists of columns (describing the data) and rows (examples of the data). For example, the user may create a table that holds the name of an object, it’s cost, and the count. This would be stored in the table as three columns of the types (string, double, int). A row in the table could look like: “Pumpkin”, 4.50, 3.

For this MP first read the specifications then fill in the class member functions given in the starter code as specified.

Background

This MP continues our interaction with dynamic memory in classes.

While we haven't formally introduced databases, I think one's interactions with spreadsheets will help make this assignment more accessible. While a spreadsheet and database are very different, a spreadsheet will provide a tangible reference to help visualize some concepts. I will do my best to This assignment is due Thursday, May 1, 2025 at 23:59 CDT. See the syllabus for our late policy. present the problem carefully to ensure it is accessible to all, regardless of one's experience withspreadsheets or databases.

To begin, navigate your browser to docs.google.com/spreadsheets and create a new sheet. You will be confronted with a two-dimensional array of data fields. The columns are typically identified upfront with a header. Each subsequent row would then contain a data record spanning the columns. Each column's row stores data of a specific type. For instance, we might have the following:

In the diagram above, the column headers occupy the first row, and then there are two data records: one for Reginald and another for Ariana. Each record has three data fields: "NAME", "UIN", and "GPA"; notice how each field is derived from a column.

In the example above, each column specifies a type for its data: the student's "Name" is a string, "UIN" an int, and "GPA earned" a double. Accordingly, each record (i.e., row of data) is a heterogeneous data aggregate. This is shown below:

What's a database? A relational database (a common database form) is composed of multiple tables with typed columns and rows of records. A record in one table can be associated with a record in another table, forming relationships between tables. Relationships between tables are not always one-to-one; one-to-many and many-to-many relationships can also exist. You will not worry about relationships between tables. It's noteworthy that many relational databases use structured query language (SQL) commands to manage their tabular data.

In this assignment, we will define a database as a collection of tables; a table as a collection of records of aggregated heterogeneous data. You will implement a table structure that houses dynamically allocated rows of heterogeneous, dynamically allocated data.

This assignment will require you to employ data structures and a compound type that we haven't introduced formally in the lessons. Don't worry, though! We will provide a good introduction to those structures and types in this section and again during the discussion section.

void*

From our discussions in class, you've seen the "type" void is used as the return type of a function that does not return a value. In this assignment, you're introduced to void* (pointer to a void object). What does this mean?

You can think about void* as a special type of generic pointer, though, to understand why it's important to consider pointers in general. For our discussion, all pointers are the same size and can take on the same range of values. This makes sense since the address to an integer object should be no different than the address to a double object. What's being stored is an address, and where an integer can begin in memory should be no different from where a double might begin in memory.

Let's say you have an int* p_int = new int(7): p_int points to a dynamically allocated integer on the free store; p_int (if evaluated) stores the address to that integer. When we dereference p_int (*p_int), we go out to that address in memory and begin interpreting the sequence of bits there according to the pointer's base type, int.

Likewise, if we had double* p_double = new double(3.14): p_double points to a dynamically allocated double on the free store; p_double (if evaluated) stores the address to that double. When we dereference p_double, we go out to the address and begin interpreting the bit sequence as a double.

With basic type casting, we could go out to any address in our program's space and begin interpreting memory from that starting point according to any time. We've concluded that the addresses are the same size, and the base type is important in interpreting a resident living at an address. If we write void* p_void = static_cast<void*>(new int(7)), we store the address to our dynamic integer object in a pointer to void. There are some implications to this: (1) we do not have a base type, so we cannot dereference p_void, and (2), the pointer is not pointing to an empty ("void") object but to a real object. However, since void* is a homogeneous type, we could create an array of void*s whose elements point to very differently typed objects:

void** array = new void*[3];

array[0] = static_cast<void*>(new std::string("Michael"));

array[1] = static_cast<void*>(new int(7));

array[2] = static_cast<void*>(new double(4.0));

That's neat, though to access the dynamically allocated objects in the array, we have to know the underlying type of those objects since we cannot dereference a void pointer. If we had that information stored somewhere, we could write

*(static_cast<std::string*>(array[0])); // to access the object pointed to by array[0]

*(static_cast<int*>(array[1])); // to access the object pointed to by array[1]

*(static_cast<double*>(array[2])); // to access the object pointed to by array[2]

This is exactly how we are going to use void pointers in this assignment: to "encode" an array of heterogeneously typed data in a table's row (i.e., record), and we'll maintain the proper accounting to

"decode" a record's data fields (i.e., columns).

Activity: Void Pointers

This is due on Wednesday, May 7, 2025 at 23:59:00 CDT

This playground activity provides an opportunity to interact with void*s!

> Click Run to compile and run your code.

std::initializer_list<T>

The documentation for this type is quite good, so I will encourage you to review std::initializer_list from cppreference.com. In short, you can think about this as an std::vector of objects for some type T, albeit with limited functionality. In this assignment, we will use employ an std::initializer_list<std::string>.

Typically, when we write a braced list (i.e., {"Howdy", "World"}), a temporary object of the std::initializer_list type is created. Many objects have a constructor defined to have an std::initializer_list parameter to list-initialize an object.

This is not a graded activity.

#include <iostream>

#include <string>

int main() {

void** array = new void*[3];

array[0] = static_cast<void*>(new std::string("Michael"));

array[1] = static_cast<void*>(new int(7));

array[2] = static_cast<void*>(new double(4.0));

std::string& str = *(static_cast<std::string*>(

array[0])); // to access the object pointed to by array[0]

int& i = *(static_cast<int*>(

array[1])); // to access the object pointed to by array[1]

double& d = *(static_cast<double*>(

array[2])); // to access the object pointed to by array[2]

std::cout << str << '\t' << i << '\t' << d << std::endl;

delete static_cast<std::string*>(array[0]);

delete static_cast<int*>(array[1]);

delete static_cast<double*>(array[2]);

delete[] array;

}

In this assignment, we will use an std::initializer_list<std::string> to allow a variable number of "arguments" to be passed our AddRow function. To access the elements in an

std::initializer_list<std::string>, you will need to use a range-based for-statement to iterate through them one by one. See the example of this in the playground below.

Activity: Initializer lists

Ungraded

Playground

This is due on Wednesday, May 7, 2025 at 23:57:00 CDT

Acquaint yourself with std::initializer_list<std::string> through the C++ playground

that follows:

> Click Run to compile and run your code.

This is not a graded activity.

#include <initializer_list>

#include <iostream>

#include <string>

void InitializerListPractice(std::initializer_list<std::string> init_list);

int main() { InitializerListPractice({"Howdy", "World"}); }

void InitializerListPractice(std::initializer_list<std::string> init_list) {

if (init_list.size() == 0) {

std::cout << "init_list is empty" << std::endl;

}

for (const std::string& str : init_list) {

std::cout << str << std::endl;

}

User-defined types created in this assignment, high-level

behavior overview

Table description

Columns

You will store column descriptors and data types independently of the data records; however, understanding the parallel correspondence between the column vector and the row records will be imperative to interpret the data stored within a table's record.

Column descriptors and data types will be stored in the DbTable's std::vector<std::pair<std::string, DataType>> col_descs_ data member. Possible data types for the columns will be DataType::kString, DataType::kDouble, DataType::kInt. In your program, you will "decode" DataType::kString as an std::string, DataType::kDouble as a double, and DataType::kInt as an int.

While we have not detailed how you will store the row data (the next section), the underlying row data structure will be a free store array of dynamically allocated objects. The row arrays and col_descs_vector should be considered parallel arrays: each has the same size (though not necessarily capacity), and the elements are related. For example, The nth index in a row array will correspond to the nth row in col_descs_:

Given a row array, we can look up the name and type of data that each column (i.e., data field) stores.

Given the parallel nature of the row array and column's vector, col_descs_.at(0).first tells you that row[0] is the "NAME" column; col_descs_.at(0).second tells you that the type of data stored in that column is DataType::kString.

Rows

You will store the table rows in the private data member std::map<unsigned int, void**> rows_.

The key of rows_ is a unique identifier for the respective row. We will use the data member next_unique_id_ as the unique identifier when inserting a row to the table (i.e., to the map); after the insertion, you will increment next_unique_id_ by one.

The unique identifier maps to the base address of a free store array of pointers to dynamically allocated "void" objects. This array constitutes a row in our table. You will be responsible for managingthe memory associated with each row and the object to which it points.

A row is always allocated with a capacity of row_col_capacity_, though the number of data elements it contains is the number of columns in col_desc_ (i.e., col_descs_.size()). The idea here is similar to

a vector's size vs. its capacity. With "oversized" row arrays, we can add columns until

col_descs_.size() == row_col_capacity_, ensuring that we do not need to resize the row arrayswhenever we want to add a column. However, should col_descs_.size() become equal to row_col_capacity_, you will be responsible for resizing every row array in the table.

Let's talk about accessing a row within a table. To access the table row with id 0, you'd write rows_.at(0). This is simple, though accessing the data is non-trivial. Since C++ arrays must be homogenous in element type, we use void pointer to store addresses to the data contained within each row column. This allows us to store addresses for DataType::kString (i.e., std::string),

DataType::kDouble (i.e., double), DataType::kInt (i.e., int). Since we cannot dereference a void pointer, we must static_cast the void pointer to the underlying data type's pointer type. This is where the correspondence to col_descs_ becomes important.

Let's say we have Reginald's record by executing void** row = rows_.at(0). To read the record's

Name field, we need to look up the data type for column 0 in col_descs_. We would observe that

col_descs_.at(0).second returns DataType::kString. Therefore, we must interpret the data in the column (more correctly, the data pointed to) as an std::string. To read the data, we would

static_cast the void pointer in row[0] to an std::string pointer and dereference the result: *

(static_cast<std::string*>(row[0]). A similar process would commence for the record's data in the UIN and GPA columns. I recommend you use a switch statement (or an if-statement) to apply the proper cast. Consider the following diagram:

The written text highlighted in green looks up the value mapped to in rows_ by the record's unique identifier (i.e., key) 0. Since this is a pointer to an array of pointers to "void" objects, we store the result in an object typed void** named row. To access the data stored in the 0th column (i.e., the Name column), I initialize a void* typed object, row_col0, with row[0]. This is the text highlighted in red. If I were to inspect the value stored in row_col0, I would see the address to the std::string object storing "Ariana". Since I cannot dereference a void pointer, I consult col_descs_ to see what type of data is pointed to by the address in column 0. I find out that column 0 stores DataType::kString (i.e., std::string) data. Therefore, I need to cast the address in column 0 to an std::string* pointer and dereference the result of that expression. In the diagram above, the complete statement is highlighted in purple. How would you access the data in the second (UIN) or third (GPA) columns of this record?

Database description

The database type will have a single data member tables_ that will map a table name (std::string) to a

DbTable pointer (DbTable*), which stores the address of a dynamically allocated table object. All

interactions with the tables owned by the database will be through tables_.

User-defined types created in this assignment, low-level

implementation details

DbTable

Member functions

DbTable() = default;

Default constructor with default behavior: the data members will be initialized using in-class initialization. For the primitive types, that's the value specified for their declaration's initializer; for user-defined types, their default constructors will determine their object's starting state.

void AddColumn(const std::pair<std::string, DataType>& col_desc);

This function has numerous considerations that should be addressed across multiple helper functions.

Your implementation will:

If col_descs_.size() is equal to row_col_capacity_;

Resize every data row array mapped to by rows_ to have the capacity (row_col_capacity_ *

Update row_col_capacity_ to (row_col_capacity_ * 2)

Pushes back the col_desc std::pair to col_descs_, effectively increasing the number of table

columns by 1.

For every data row mapped by rows_

Set the row's "new column" to a new dynamically allocated object containing the zero value of

the column's type. For

DataType::kString, an empty std::string

DataType::kInt, an int initialized to 0

DataType::kDouble, a double initialized to 0.0

2025/4/28 17:43

CS 128 | [MP6] Naive Database

https://cs128.org/2025a/mp6-naive-database-1321

9/17Example:

2025/4/28 17:43

CS 128 | [MP6] Naive Database

https://cs128.org/2025a/mp6-naive-database-1321

10/17void DeleteColumnByIdx(unsigned int col_idx);

This function also has multiple concerns due to its parallel relationship with the mapped rows.

Checks whether col_idx is a valid index into col_descs_. Throws an std::out_of_range exception if

col_idx is outside the range of col_descs_.

If col_descs_.size() == 1 and rows_.size() > 0, thrown an std::runtime_error exception: any table with at least one row must have at least one column.

For every mapped row in rows_,

Deallocates the dynamically allocated object residing at row[col_idx]

Shifts all elements with indexes great than col_idx down one index position to effectively close the gap caused by the "deleted column" at row[col_idx] and to maintain the parallel

relationship with col_descs_

Erases the std::pair at col_idx from col_descs_, effectively decreasing the size of col_descs_

by 1. You can delete an element from col_descs_ by writing

col_descs_.erase(col_descs_.begin()+idx), where idx stores the index you'd like removed.

Example:

void AddRow(const std::initializer_list<std::string>& col_data);

This function's parameter is base typed std::initializer_list<std::string>. We describe what this data structure is and how you can use it in the prior text. It's important to note that col_data is a parallel "array" to col_descs_ and the array rows.

First, you need to ensure the number of items in col_data is equal to the number of columns in the table's rows. If this is not the case, throw an std::invalid_argument exception.

Second, you must create a new entry into rows_ with the next_unique_id_ mapped to a dynamically allocated array of row_col_capacity_ pointers to void objects. Don't forget to increment next_unique_id_ by 1 once you're done.

Finally, insert the dynamically allocated objects storing the col_data into the new row.

Notice that every item in the col_data list is an std::string. Therefore, if the data type for the column you're inserting is DataType::kInt or DataType::kDouble, you must convert that std::string to an int (use std::stoi) or double (use std::stod) when initializing the dynamically allocated int or

double object, respectively.

void DeleteRowById(unsigned int id);

This function will delete a row from the table. If a row with id does not exist, throw an exception.

Provided the row exists, there are three considerations:

Every row's column points to a dynamically allocated object. Therefore, you must release the memory allocated to those objects. This will require you to iterate through the row, static_cast each void* pointer, and invoke the delete operator on the address returned by that cast.

A row is dynamically allocated too, so after "deleting" its contents, you must invoke delete[] on

the address stored in the void** object.

Finally, remove the row from the rows_ map by invoking rows_.erase(id).

DbTable(const DbTable& rhs);

Copy constructor. Implements a deep copy policy for DbTable typed objects.

DbTable& operator=(const DbTable& table);

Copy assignment operator. Implements a deep copy policy for DbTable typed objects.

~DbTable();

Destructor. This function should free all dynamic memory associated with every row in the table.

Data members

Once a DBTable object is constructed, do not assign a new map object to rows_ or a new vector object to col_descs_. Instead, mutate these objects (and their stored contents) as necessary to update the

table's state.

Data Member (with initializer)

Description

unsigned int next_unique_id_ = 0;

Stores the next unique identifier value for a new row.

unsigned int row_col_capacity_ = 2;

Specifies the present allocation of the row arrays (i.e.,

its capacity).

std::map<unsigned int, void**>

rows_;

Maps a unique row identifier to the base address of a free store array of pointers to dynamically allocated "void" objects. You must use the map that is created during construction: we expect you to use the same map object throughout the DbTable’s lifetime. That is, you cannot do something like rows_ = new_row_map anywhere in your code.

std::vector<std::pair<std::string,

DataType>> col_descs_;

A vector of std::pair objects storing the column names paired with the data type of elements in that column. Has a parallel relationship to the row arrays in that the type of data stored in a row[0] can be looked up as col_desc_.at(0).second. Therefore, the size of

col_descs_ also communicates the number of columns

storing data in any row in the table (i.e., the row's size; is

different from a row's capacity).

Non-member helper functions

std::ostream& operator<<(std::ostream& os, const DbTable& table);

Inserts into os the columns names as single line of comma separated values with Column

Name(C++DataType). Each subsequent line is a table row inserted in ascending row id as a comma

separated sequence of the data field values. For example,

Name(std::string), Home Airport(std::string), MQMs(int), MQDs(double)

Detroit Flyer, DTW, 25000, 3050.24

Austin Flyer, AUS, 65000, 8050.24

Houston Flyer, IAH, 3000, 515.17

NYC Flyer, LGA, 105000, 12551.3

All you have to do is insert the object into the stream. You do not want to mess with the stream formatting through the facilities provided in <iomanip>. Repeat, do not. Moreover, you do not want to convert any integral or floating-point types to a string representation. Instead, if you've double d = 3.14, all you'd do is os << d. In my solution, I have a switch statement that looks at the data type, and static_casts to the proper pointer type and inserts the dereferenced value directly into the stream.

For example, this is taken directly from my solution: os << *(static_cast<double*>

(row_array[col_idx]));.

Database

Member functions

Database() = default;

void CreateTable(const std::string& table_name);

Creates a new database table. This is accomplished by creating a new entry in tables_ that maps the

table_name to a new dynamically allocated DbTable object (i.e., new DbTable).

void DropTable(const std::string& table_name);

Drops an existing table from the database. This is done by looking up the table_name in the tables_ map and then deallocating the mapped DbTable. Remember that the table key is the table name and the value is the address of the dynamically allocated DbTable object. If table_name does not exist in tables_, throw an exception.

DbTable& GetTable(const std::string& table_name);

Returns a reference to the DbTable object associated with table_name.

Database(const Database& rhs);

Copy constructor. Implements a deep copy policy for Database typed objects. This also results in every database row being deep copied into the object as well.

Database& operator=(const Database& rhs);

Copy assignment operator. Implements a deep copy policy for Database typed objects. This also results in every database row being deep copied into the object as well.

~Database();

Destructor. This function should free all dynamic memory associated with every table in tables_. This

includes all dynamic memory associated with every row in a table's rows_.

Data members

Once a Database object is constructed, do not assign a new map object to tables_. Instead, mutate

this objects (and its stored contents) as necessary to update the database's state.

Data Member (with initializer)

Description

std::map<std::string, DbTable*> tables_;

Maps a table name to the DbTable* to that table.

Important note on std::map's erase

When you invoke erase on an std::map's entry, any iterators in-use for that object will become invalidated, since the std::map's underlying structure will have changed. In this assignment, we have you implement Table::DeleteColumnByIdx(), which will invoke erase on a table's unique id. This is fine.

However, if we were to write the following code:

for (auto const& [row_id, row_array] : rows_) {

rows_.erase(row_id);

}

the mechanism governing the range-based for-statement will become invalidated after the first

iteration of the loop. Likewise, writing

for (auto const& [row_id, row_array] : rows_) {

DeleteRowById(row_id);

}

will also be problematic since DeleteRowById invokes erase, so the mechanism governing the range

based for-statement here, too will become invalidated after the first iteration of the loop.

Let's say you'd like to use DeleteRowById in your destructor. How could you go about this? You know

that each row has a unique id between 0 and next_unique_id_-1 inclusive. You could write a plain

vanilla for-statement that loops from 0 until next_unqiue_id_, checking whether rows_ contains the

id, and if so, invoking DeleteRowById with that id. For example,

for (unsigned int row_id = 0; row_id < next_unique_id_; ++row_id) {

if(rows_.contains(row_id))

DeleteRowById(row_id);

}

Friend Functions

You might notice that DbTable and Database have seemingly duplicate declarations of the insertion operator, with one in the class looking like:

friend std::ostream& operator<<(std::ostream& os, const DbTable& table);

nd one outside the class looking like:

std::ostream& operator<<(std::ostream& os, const DbTable& table);

These aren't duplicate declarations - the first is declaring the overloaded extraction operator as a friend of the DbTable class, while the second actually declares the method. What the friend declaration inside the class means is that while operator<< is not a member function, it can access private members of the class as if it was. This will make writing the output operator much easier, as you don't have to write a bunch of getters for all the variables you need - you can just access them directly. In summary, making a function a friend of a class allows it to access private members of the class as if it was a member function.

Friend functions in C++ are not inherently "bad," but they should be used with caution and an understanding of their implications. The key reason for this is that they can break encapsulation and hinder the object-oriented programming (OOP) principles of data hiding and abstraction. In some scenarios, friend functions can be useful, such as when implementing operator overloading. However, in modern C++, best practices encourage minimizing the use of friend functions. So, you should only use them when no better alternatives are available. We expect you to refrain from declaring your own friend functions in this course.

Constraints

You cannot create additional data members: we break encapsulation to set data members to specific values and test the behaviors you've implemented. We only update the data members specified in our prompt during that process. Furthermore, do not create new std::maps and assign to the private data members after the construction of an object. Instead should you require the container empty, clear the map you already have.
All functions of the public interface will be used in testing, so it is CRUCIAL that their signaturesand return types are not altered (and that you implement them).
Your program must compile without warnings/errors when compiled with: clang++ using the - std=c++20 and the following flags -Wall -Wextra -Werror -pedantic.
Only the following header files are allowed to be #included in your solution files: "db_table.hpp" "db.hpp" "iomanip" "iostream" "stdexcept" "map" "string" "initializer_

Submitting your work

You will submit the following header and source files to PrairieLearn for grading: db.hpp, db.cc, db_table.hpp, and db_table.cc.

发表评论

电子邮件地址不会被公开。必填项已用*标注

姓名 *

电子邮件 *

验证码 *