Assignment: Implement Regex Pattern Matching with Basic Meta Characters

[![Review Assignment Due Date](https://classroom.github.com/assets/deadline-readme-button-22041afd0340ce965d47ae6ef1cefeee28c7c493a6346c4f15d667ab976d596c.svg)](https://classroom.github.com/a/fg4YWue_)
# Assignment: Implement Regex Pattern Matching with Basic Meta Characters
## Objective
The objective of this coding assignment is to develop a foundational understanding of regex (regular expressions) by implementing a pattern matching utility that supports basic meta characters. By completing this assignment, students will:
- **Understand the Working of Regex:**
  - Gain insight into how regular expressions are parsed and evaluated to match patterns in a given input string.
- **Learn the Basic Syntax of Regex:**
  - Familiarize themselves with essential regex meta characters.
- **Apply Two Fundamental Rules of Regex:**
  - **First Match Wins**: Demonstrate the concept that regex engines match patterns from left to right and stop at the first successful match.
  - **Greedy Quantifiers**: Understand and implement the default greedy nature of quantifiers that attempt to match the longest possible substring before backtracking.
- **Reinforce Problem-Solving Skills:**
Translate the abstract rules of regex into concrete, efficient algorithms that mimic the behavior of regex engines.
By the end of this assignment, students will have a deeper appreciation of the logic and algorithms behind regex, which are widely used in programming for text processing, validation, and searching.
## Getting Started
### Prerequisites
Make sure your C++ development environment is set up according to the instructions on Canvas (C++ Dev Environment Setup). You’ll need the following tools:
- **Visual Studio Code**: Our recommended IDE, with these extensions installed:
  - **C/C++ Extension Pack**: Provides language support for C++.
  - **C++ TestMate**: Enhances Google Test integration and provides a convenient Test Explorer.
- **Development Tools**:
  - **C++ Compiler**: We’ll use `clang++` as the default compiler.
  - **CMake**: A build system generator for configuring and managing builds.
  - **Ninja**: A fast build system used in conjunction with CMake.
  - **Google Test**: The testing framework used for unit tests in this project.
### Repository Structure
TODO: update the structure as needed
```plaintext
.
└── .github/workflows/ci.yml       # github action to execute test runs when student create a PR to merge into main branch, you might want to adjust commands to run
└── .vscode/                       # vs code configuration files
|   ├── launch.json                #   configuration to launch debugger for main
|   ├── settings.json              #   vs code settings for c++ coding environment
|   ├── tasks.json                 #   tasks to run in vs code IDE: build, run, etc.
└── src/
|   └── regex/                     # Folder for coding assignment: we will build a library to link to unit test and main app
|   |   └── test/                  #   Unit test folder for the assignment
|   |   |   ├── CMakeLists.txt     #     CMake file for test directory
|   |   |   └── test.cpp           #     unit test code: one per class
|   |   ├── CMakeLists.txt         #   CMake file for project library
|   |   ├── myregex.h              # Header file that defines IRegex interface and a couple of structs for return value
|   |   ├── regex_impl.cpp         # Implementation file for your **RegexImpl** class.
|   |   └── regex_impl.h           # Header file for students to implement **RegexImpl** class.
|   ├── CMakeLists.txt             # CMake file for src directory
|   └── main.cpp                   # Main file to run external tests, DO NOT modify: this is to be used for running my test suites.
├── .clang-format                  # C++ coding style based on Google: DO NOT modify
├── .clang-tidy                    # .clang-tidy configurations: DO NOT modify
├── .gitignore                     # files/folders not to tracked by git
├── CMakeLists.txt                 # Root CMake file
├── CMakePresets.json              # Root CMake file
└── README.md                      # CMake preset configurations
```
## Assignment Instructions
### Part 1: Implement `RegexImpl` class
#### Implement **RegexImpl** class:
- Constructor will take a string `pattern` as regex pattern.
- public `Find` method:
  - takes a string `input` to match with the `pattern`
  - it will return the following based on success or failure of the match:
    - if a match is found,
      - return `RegexError::NONE` as `FindResult.error` and
      - `FindResult.match` with `start` and `end` as of the position in the `input` string with 0 index base. `end` will be the position right after the last character matched. see example below.
    - if no match is found, return `RegexError::NO_MATCH` as `FindResult.error`.
    - if `pattern` is invalid, return `RegexError::INVALID_REGEX` as `FindResult.error`.
#### Supported Meta Characters
The function must support the following regex metacharacters:
- Use a backslash (`\`) to escape metacharacters (`.`, `?`, `+`, `*`) and itself only.
  - Regex: `a\.b` matches the string `a.b`.
  - Regex: `a\*b` matches the string `a*b`.
  - Regex: `a\?b` matches the string `a?b`.
  - Regex: `a\+b` matches the string `a+b`.
  - Regex: `a\\b` matches the string `a\b`.
- Two metacharacters cannot be adjacent to each other in the regex pattern. For example:
  - Valid: `a*b`
  - Invalid: `a*?b`, `a+?b`: no lazy quantifier
- However, the following patterns are legal:
  - `.?`, `.+`, `.*`
- `.` (dot): Matches any single character.
- `?` (question mark): Matches zero or one occurrence of the preceding character.
- `+` (plus): Matches one or more occurrences of the preceding character.
- `*` (asterisk): Matches zero or more occurrences of the preceding character.
- `?`, `+`, `*` meta characters are **`greedy`**
####  Implementation Details
- Optimize for clarity and readability.
- Feel free to add private helper methods.
- Implement an NFA engine with backtracking.
- You may not use regex from existing library.
#### Examples:
- `pattern`: `a`, `input`: `abc`, `FindResult`: `{error: NONE, match: {start: 0, end: 1}}`
- `pattern`: `b`, `input`: `abc`, `FindResult`: `{error: NONE, match: {start: 1, end: 2}}`
- `pattern`: `c`, `input`: `abc`, `FindResult`: `{error: NONE, match: {start: 2, end: 3}}`
- `pattern`: `abc`, `input`: `abc`, `FindResult`: `{error: NONE, match: {start: 0, end: 3}}`
- `pattern`: `abcd`, `input`: `abc`, `FindResult`: `{error: NO_MATCH}`
- `pattern`: `a**b`, `input`: `abc`, `FindResult`: `{error: INVALID_REGEX}`
### Part 2: Write Unit Tests
1. **Create Tests**:
   - Write unit tests to verify each method of the interface.  
2. **Test Scenarios**:
   - Ensure your tests cover common cases, edge cases, and invalid inputs (e.g., accessing an index out of range).
## Building and Testing
### Building the Project
1. Update necessary **CMakeLists.txt** files to make the project compile.
2. **Generate and build** the project using CMake:
   ```build
   cmake --list-presets: to view a list of presets 
   cmake --preset <preset-name>: either debug or release
   cmake --build build/<preset-name>: either debug or release
   ```
3. **Building the Project via VS Code UI**
   Follow these steps to configure and build the project using Visual Studio Code:
   - **Open the Command Palette**:
     - **From the Menu**: Go to `View` > `Command Palette...`
     - **Using Keyboard Shortcuts**:
       - **macOS**: `Command + Shift + P`
       - **Windows/Linux**: `Control + Shift + P`
   - **Configure the Project**:
     1. In the Command Palette, select **"CMake: Select Config Preset"** and choose either **Debug** or **Release**.
     2. Next, select **"CMake: Configure"** to generate the necessary configuration files for CMake.
   - **Build the Project**:
     - In the Command Palette, select **"CMake: Build"** to compile the project and generate binaries.
### Running the Tests
1. Run the unit tests:
   - Use the **Test Explorer** in **VS Code** for a graphical interface, or
   - Execute the test binary directly from a terminal or command shell.
2. Ensure that **all unit tests pass** before submitting your work.
### Submission
1. Create and work locally in your own dev branch, typically named as `dev/<your-name>`.
2. Commit changes locally as often as you see fit. If you change your mind of implementation, you can go back to earlier versions.
3. Once you have done your testing with unit test, commit and push to remote server.
4. Create a PR (pull request) to merge into `main` branch to kick of the `GitHub` actions to build, execute your unit tests and my test suite.
5. Check out action results to see if your code works for all test cases.

发表评论

电子邮件地址不会被公开。 必填项已用*标注