Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
Project 2: 2025WI_COMP_SCI_339-0_SEC1Introduction to Database Systems B+ Tree
Released: Mon, Feb 3, 2025Due: Sat, Mar 1, 2025 @11:59 PM CT
In this project, you will implement a B+ tree index, which is a balanced search tree where the internal pages direct the search and leaf pages contain the actual key-value pairs. Indexes provide fast data retrieval without needing to search every row in a table, enabling rapid point lookups and efficient scans of ordered tuples. Your implementation will support lookup, insertion, and deletion (including splitting and merging nodes). As part of the project, you will implement:
- B+ Tree Pages
- B+ Tree Operations
1. Project Specification
For each of the following components, stub classes are provided that contain the APl you should implement. Do not modify the signatures for the predefined functions in these classes. If you modify these signatures, it will break the autograder and you will not receive credit for the project. Similarly, if a class already contains data members, do not remove them. It is fine to add additional data members and helper functions as part of your solution.Unless specified otherwise, you may use any built-in C++17 containers. It is up to you to decide which ones (if any) you want to use. Be warned that these containers are generally not thread-safe, so you will need to use some type of concurrency control mechanism (e.g., mutexes) to synchronize concurrent threads. You may not use third-party libraries (e.g., Boost).
1.1. Task #1- B+ Tree Pages
You must implement the following three Page classes to store data in your B+ tree.Base Page
This is a base class that the Internal Page and Leaf Page inherit from. It contains only information that both child classes share. A B+ tree Page has the following fields:
Variable Name |
Size |
Description |
page_type_ |
4 |
Page type (invalid page / leaf page / internal page) |
size_ |
4 |
Number of key & value pairs in a page |
max_size_ |
4 |
Max number of key & value pairs in a page |
You must implement Page by modifying only its header file
(src/include/storage/page/b_plus_tree_page.h) and corresponding source file
(src/storage/page/b_plus_tree_page.cpp).
Note: Each B+ tree Page corresponds to the content (i.e., thedata_part) of a memory page fetched by the buffer pool. Every time you read from or write to a Page, you must first fetch it from the buffer pool (using its uniquepage_id), reinterpret cast it to either a leaf or an internal page, and unpin it when finished.
Internal Page
An Internal Page (i.e., inner node) stores m ordered keys and m + 1 child pointers to other B+ tree Pages. These keys and pointers are internally represented as an array of key/page_idpairs. As the number of child pointers is one more than the number of keys, the first key inkey_array_(see src/include/storage/page/b_plus_tree_internal_page.h) is set to be invalid, and lookups should always start from the second key.
At any time, each Internal Page should be at least half full. During deletion, two half-full pages can be merged, or keys and pointers can be redistributed to avoid merging. During insertion, one full page can be split into two, or keys and pointers can be redistributed to avoid splitting. These are examples of the many design choices that you will make while implementing your B+ tree.
You must implement the Internal Page by modifying only its header file
src/include/storage/page/b_plus_tree_internal_page.h) and corresponding source file
src/storage/page/b_plus_tree_internal_page.cpp).
Leaf Page
The Leaf Page stores m ordered keys and their m corresponding values. In your implementation, the value should always be the 64-bit record ID for each tuple (see theRIDclass in src/include/common/rid.h). Leaf Pages have the same restrictions on the number of key/value pairs as
Internal Pages, and they should follow the same operations for merging, splitting, and redistributing keys.
You must implement your Leaf Page by modifying only its header file
(src/include/storage/page/b_plus_tree_leaf_page.h ) and corresponding source file
src/storage/page/b_plus_tree_leaf_page.cpp).
Note: Even though Leaf Pages and Internal Pages contain the same key type, they may have
different value types. Thus,max_sizecan be different.
1.2. Task #2 - B+ Tree Operations
Your B+ tree must support point lookups (Getvalue( )), insertion (Insert()), and deletion(Delete()). You need only support unique keys; that is, if you try to reinsert an existing key into the B+ tree, insertion should not be performed and falsewill be returned.
B+ tree pages should be split (or keys should be redistributed) if an insertion would violate any invariants. If an insertion changes the page ID of the root, you must update theroot_page_id in the header page. You can do so by accessing theheader_page_id_page, which is given to you in the constructor. Then, via reinterpret cast, you can interpret this page as aBPlusTreeHeaderPage(from
src/include/storage/page/b_plus_tree_header_page.h ) and update the root page ID accordingly. You must also implementGetRootpageId .
Similarly, your B+ tree must support merging or redistributing keys between pages if necessary to maintain the invariants when deleting a key. As with insertions, you must correctly update the root page ID if the root changes.
We recommend that you use the page guard classes from Project #1 (https://canvas.northwestern.edu/courses/225669/pages/project-1-2)_to avoid synchronization problems.
You should useReadPage or writePageaccordingly.
You may optionally use theContextclass (defined in src/include/storage/index/b_plus_tree.h) to track the pages that you have read/written (via theread_set_and write_set_fields) or to store other metadata that you need to pass into other functions recursively.
If decide to use theContextclass, here are some tips:
- You might only need to use write_set_when inserting or deleting. It is possible that you will not useread_set_,depending on your implementation.
- You might want to store the root page ID in the context and acquire a write guard for the header page when modifying the B+ tree. along the access path.
- To find a parent of the current node, look at the back of write_set_. It should contain all nodes
- You may useBUSTUB_ASSERTt to help you find inconsistent data in your implementation. For example, if you want to split a node (except the root), you should ensure that there is still at least one node in the write set ). If vou need to split the root. vou should check if header page is std::nullopt .
- To unlock the header page, simply set header_page_ to std::nullopt. To unlock other pages, pop from the write_set_and drop.
The B+ tree is parameterized on arbitrary key, value, and key comparator types. We have defined a macro, INDEX_TEMPLATE_ARGUMENTS ,that generates the template parameter declaration for you:
template <typename KeyType,
typename ValueType,
typename KeyComparator>
The type parameters are:
- KeyType: The type of each key in the index. In practice this will be aGenerickey . The actual size of a Generickeyvaries, and is specified with its own template argument that depends on the type of the indexed attribute.
- ValueType : The type of each value in the index. In practice, this will be a 64-bitRID.
- KeyComparator : A class used to compare whether twoKeyTypeinstances are less than, greater than, or equal to each other. These will be included in thekeyTypeimplementation files.
1.3. Debugging -Tree Visualization
BusTub includes a built-in tool (tools/b_plus_tree_printer/b_plus_tree_printer.cpp) for generating a visual representation of your B+ tree. This tool will help you check your solution for correct behavior, particularly to ensure it is performing the splits and merges correctly.To generate a dotfile after constructing a tree:
$ #To build the tool
$ mkdir build
$ cd build
$ make b_plus_tree_printer -j
$./bin/b_plus_tree_printer
> ... USAGE ...
>>5 5 // set leaf node and internal node max size to be 5
>>f input.txt // Insert into the tree with some inserts
>> g my-tree.dot // output the tree to dot format
>> q // Quit the test (Or use another terminal)
You should now have amy-tree. dotfile in the same directory as your test binary, which you can then visualize with a command line visualizer or an online visualizer:
- Dump the content to http://dreampuf.github.io/GraphvizOnline/B (http://dreampuf.github.io/GraphvizOnline/)_; or
- Download a CLI tool for your platform, then create a PNG file of your tree using this command: dot -Tpng -0 my-tree.dot
Pink rectangles represent intera| nodes, while green rectangles represent leaf nodes.
- The first rowP=3tells you the page ID of the tree page.
- The second row prints out the properties.
- The third row prints out keys and draws pointers from parent nodes to children.
- Note that the first box of internal nodes are empty; this is not a bug.
2.Grading
Please submit all questions to Piazza (https://piazza.com/northwestern/winter2024/comp sci339crotty)_, not via email.2.1.Testing
You can test individual project components with the supplied test framework, which uses GTest (https://github.com/google/googletest)_for unit tests. You can disable tests by adding aDISABLED_ prefix to the test name. There are four separate files that contain tests for each component:
- test/storage/b_plus_tree_insert_test.cpp
- test/storage/b_plus_tree_delete_test.cpp
- test/storage/b_plus_tree_sequential_scale_test.cpp
- test/storage/b_plus_tree_concurrent_test.cpp
2.2. Memory Leaks
We will useLLVM Address Sanitizer(ASAN) and Leak Sanitizer (LSAN). (https://clang.llvm.org/docs/AddressSanitizer.html)_to check for memory errors. To enable ASAN and LSAN,configure CMake in debug mode and run tests as you normally would, and you should see a report of any detected memory errors. Note that macOS only supports address sanitizer without leak sanitizer.In some cases, address sanitizer might affect the usability of the debugger. In this case, you might need to disable all sanitizers by configuring the CMake project with:
$ cmake -DCMAKE_BUILD_TYPE-Debug-DBUSTUB_SANITIZER- ..
2.3. Development Hints
You can use BUSTUB_ASSERT for assertions in debug mode. Note that any statements within BUSTUB_ASSERT will not be executed in release mode. If you have something to assert in all cases, use BUSTUB_ENSUREinstead. For logging, you may use(printf , stream-based output (e.g.,std: :cout),or the logging utilities supplied in src/include/common/logger.h)If you are having compilation problems, running make clean process. Instead, you should delete your build directory and run cmake .. again before you rerun make.
2.4. Grading Rubric
Each submission will be graded based on the following criteria:- Does the submission successfully execute all of the test cases and produce the correct answer?
- Does the submission execute without any memory leaks?
WARNING: All of the code you submit must be your own original work. Please see the Academic Honesty pollicy in the syllabus (https://canvas.northwestern.edu/courses/225669/assignments/syllabus) for more information. Plagiarism will not be tolerated.