CSci 4061: Introduction to Operating Systems, Spring 2024

Project #3: MultiThreaded Image Matching Server
Instructor: Jon Weissman
Intermediate submission due: 11:59pm (CDT), 4/4/2023
Final submission due: 11:59pm (CDT), 4/12/2023
1. Background
The purpose of this lab is to construct a multithreaded client and a multithreaded server using POSIX threads (pthreads) in the C language to learn about thread programming and synchronization methods. In this project, we will use multithreading to improve the performance of a server that is programmed to accept an image from the user, match it against a database of known images, and return the closest matching image. In this programming assignment we will be using the dispatcher-worker model of threads. There is both concurrency and parallelism at play (the latter if the server is running on a multicore system). Note: even if threads are dispatched to different cores, they still have direct access to all of the process memory.
The purpose of this programming assignment is to get you started with thread programming and synchronization. You need to be familiar with POSIX threads, mutex locks and condition variables.
2. Project Overview
Your project will be composed of two types of threads: dispatcher threads and worker threads. The purpose of the dispatcher threads is to repeatedly accept an incoming connection, read the client request from the connection, and place the request in a queue. We will assume that there will only be one request per incoming connection. The purpose of the worker threads is to monitor the request queue, retrieve requests (in the form of an input image), read the image into memory (i.e. get the bytes of the image), match the image against a database of images, and serve the best or closest matching image back to the user. The queue is a bounded buffer and will need to be properly synchronized. All client-server communication is implemented for you.
3. Server Overview
Your server should create a fixed pool of worker and dispatcher threads when the program starts.
The worker thread pool size should be num_worker (you can assume that the number of worker threads will be less than the number of requests) and the dispatcher thread pool size should be num_dispatcher. Your server should bring the database of images into memory when the server starts up.
3.1 Server Database:
● The database is a directory filled with images. These images are utilized for comparing input images received from clients, with the closest match subsequently returned to the respective client. It is imperative to load this database into memory upon the server's startup to ensure efficient access during the matching process.
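As a rough sketch of the loading step, the helper below reads one file into a heap buffer. The name load_file and its signature are hypothetical (they are not part of the provided code), and how the buffers are stored in the database list is left open. The server would call something like this once per file in the database directory at startup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at path into a heap buffer.
   On success returns the buffer and stores its length in *size; the caller
   must free() the buffer. Returns NULL on any failure. */
char *load_file(const char *path, size_t *size) {
    assert(path != NULL && size != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    /* Find the file length by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = malloc(len > 0 ? (size_t)len : 1);  /* avoid malloc(0) */
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size = got;
    return buf;
}
```

Because the database is read once and never modified afterwards, worker threads can read these buffers concurrently without a lock.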
3.2 Request Queue Structure:
● Request Queue Structure: Each request inside the queue will contain an image (i.e. a stream of bytes) sent from the client via an image file they specify, and the file descriptor to which the best matching image should be sent back. You may use a struct to hold this data before adding it to the queue.
The queue structure is up to you. You can implement it as a queue of structs or a linked list of structs, or any other data structure you find suitable.
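One possible layout is a circular buffer of structs. This is only a sketch: the names request_t, request_queue_t, queue_init, queue_push, and queue_pop are hypothetical, and MAX_QUEUE_LEN of 100 is an assumption taken from the limits in section 7. This version has no locking; it shows the data structure only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE_LEN 100  /* assumption: upper bound from section 7 */

/* Hypothetical request record: one image plus where to send the reply. */
typedef struct {
    int    fd;     /* client socket returned by accept_connection() */
    char  *image;  /* raw image bytes returned by get_request() */
    size_t size;   /* number of bytes in image */
} request_t;

/* Hypothetical fixed-capacity circular queue of requests. */
typedef struct {
    request_t items[MAX_QUEUE_LEN];
    int head;      /* index of the oldest request */
    int tail;      /* index where the next request is stored */
    int count;     /* number of requests currently queued */
    int capacity;  /* configured queue length, at most MAX_QUEUE_LEN */
} request_queue_t;

void queue_init(request_queue_t *q, int capacity) {
    assert(capacity > 0 && capacity <= MAX_QUEUE_LEN);
    q->head = q->tail = q->count = 0;
    q->capacity = capacity;
}

/* Returns 0 on success, -1 if the queue is full. */
int queue_push(request_queue_t *q, request_t req) {
    if (q->count == q->capacity) return -1;
    q->items[q->tail] = req;
    q->tail = (q->tail + 1) % q->capacity;
    q->count++;
    return 0;
}

/* Returns 0 on success, -1 if the queue is empty. */
int queue_pop(request_queue_t *q, request_t *out) {
    if (q->count == 0) return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    return 0;
}
```

A linked list would work equally well; the array form simply makes the "full" condition explicit, which is convenient for a bounded buffer.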
3.3 Dispatcher Thread
The purpose of the dispatcher threads is to repeatedly accept an incoming connection, read the client request from the connection (i.e. the image contents), and place the request in a queue. We will assume that there will only be one request per incoming connection. You will use locks and condition variables (discussed Thursday) to synchronize this queue (also known as a bounded buffer). The queue is of fixed size.
● Queue Management: The image's stream of bytes is added to the request queue along with the file descriptor to which the matching image should be sent back. This queue is shared with the workers.
● Signaling New Request: Once a request is added to the request queue, the dispatcher thread will signal to all of the worker threads that there is a request in the queue.
● Full Queue: Once the queue is full, the dispatcher thread will wait for a signal from any worker thread that there is a space in the queue.
● Network Functions the dispatcher will call:
○ int socketfd = accept_connection(): returns a file descriptor which should be stored in the queue
○ char *buffer = get_request(int socketfd, size_t *size): takes the file descriptor as the first argument and a size_t pointer as the second argument, which will be set by this function. Returns a char * with the raw image bytes.
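The full-queue and empty-queue waits described above can be sketched with one mutex and two condition variables. Everything named here (req_t, bounded_buffer_t, bb_init, bb_put, bb_get, QUEUE_CAP) is hypothetical, and the capacity is fixed at compile time only to keep the example short. The sketch wakes a single waiter with pthread_cond_signal(); pthread_cond_broadcast() would wake all of them instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 4  /* hypothetical capacity, kept small for illustration */

typedef struct {
    int    fd;     /* client socket */
    char  *image;  /* raw image bytes */
    size_t size;   /* length of image */
} req_t;

typedef struct {
    req_t slots[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_full;   /* dispatchers wait here when the queue is full */
    pthread_cond_t  not_empty;  /* workers wait here when the queue is empty */
} bounded_buffer_t;

void bb_init(bounded_buffer_t *b) {
    assert(b != NULL);
    b->head = b->tail = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_full, NULL);
    pthread_cond_init(&b->not_empty, NULL);
}

/* Dispatcher side: block while the queue is full, then enqueue. */
void bb_put(bounded_buffer_t *b, req_t r) {
    pthread_mutex_lock(&b->lock);
    while (b->count == QUEUE_CAP)              /* loop guards against spurious wakeups */
        pthread_cond_wait(&b->not_full, &b->lock);
    b->slots[b->tail] = r;
    b->tail = (b->tail + 1) % QUEUE_CAP;
    b->count++;
    pthread_cond_signal(&b->not_empty);        /* wake one waiting worker */
    pthread_mutex_unlock(&b->lock);
}

/* Worker side: block while the queue is empty, then dequeue. */
req_t bb_get(bounded_buffer_t *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)
        pthread_cond_wait(&b->not_empty, &b->lock);
    req_t r = b->slots[b->head];
    b->head = (b->head + 1) % QUEUE_CAP;
    b->count--;
    pthread_cond_signal(&b->not_full);         /* wake one waiting dispatcher */
    pthread_mutex_unlock(&b->lock);
    return r;
}
```

In a dispatcher loop, the values returned by accept_connection() and get_request() would be packed into a req_t and passed to bb_put(). The while loop around pthread_cond_wait() matters: a thread must recheck the condition after waking, since another thread may have taken the slot first.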
3.6 Worker Threads
The worker threads are responsible for monitoring the request queue, retrieving requests, comparing images from the database with the request image, and serving the best image back to the user.
Here's a breakdown of its functionality:
● Parameters: The worker thread will take a threadID as a parameter (0, 1, 2, …) which will later be used for logging. You can assign the threads an ID in the order the threads are created. Note that this thread ID is different from the pthread_id assigned to the thread by the pthread_create() function.
● Queue Monitoring: Worker threads continuously monitor the shared request queue. When a new request arrives from the dispatcher thread, one of the worker threads retrieves it for further processing.
● Request Handling: Once a request is obtained, the worker thread compares the request image against the in-memory copy of the database to find the best matching image.
● Response to request: After finding the image, the worker thread serves it back to the user by sending the image bytes. The client then writes the returned image into a file. For example, if the input file is foobar.png, the output file could be foobar_similar.png.
● Empty Queue: Once the queue is empty, the worker thread will wait for a signal from any dispatcher thread that there are now requests in the queue.
● Synchronization: Proper synchronization mechanisms such as mutex locks and condition variables are used to ensure that multiple worker threads can safely access and modify shared data structures (queues) and other global variables without race conditions or deadlocks.
● Network functions the worker will call:
○ database_entry_t image_match(char *input_image, int size): takes the request image bytes and their size, and returns the best matching database entry.
○ send_file_to_client(int socketFd, char *buffer, int size): takes the client file descriptor, the matching image memory block, and its size.
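The difference between the logical threadID and the pthread_t can be seen in a small pool-creation sketch. The names worker, run_worker_pool, handled, and NUM_WORKERS are hypothetical, and the pool size is hard-coded here although the real value comes from the command line. Each thread receives a pointer to its own slot in an ids array, so the value it reads cannot change underneath it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4  /* hypothetical; the real value is a server argument */

/* One counter per worker. Each thread touches only its own slot,
   so no lock is needed for this array. */
int handled[NUM_WORKERS];

void *worker(void *arg) {
    int thread_id = *(int *)arg;  /* logical ID 0..NUM_WORKERS-1, not the pthread_t */
    assert(thread_id >= 0 && thread_id < NUM_WORKERS);
    /* A real worker would loop here: dequeue a request, call image_match(),
       then send_file_to_client(). This sketch only records that it ran. */
    handled[thread_id]++;
    printf("worker %d started\n", thread_id);
    return NULL;
}

/* Creates the pool and waits for every thread that was started.
   Returns 0 on success, -1 if any create or join failed. */
int run_worker_pool(void) {
    pthread_t tids[NUM_WORKERS];
    int ids[NUM_WORKERS];  /* must outlive the threads, so join before returning */
    int created = 0, status = 0;

    for (; created < NUM_WORKERS; created++) {
        ids[created] = created;
        if (pthread_create(&tids[created], NULL, worker, &ids[created]) != 0) {
            status = -1;
            break;
        }
    }
    for (int i = 0; i < created; i++)
        if (pthread_join(tids[i], NULL) != 0) status = -1;
    return status;
}
```

Passing the address of the loop variable itself would be a bug, because every thread would then share one int that keeps changing.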
3.8 Request Logging
The worker threads must carefully log each request to a file called “server_log” and also to the terminal (stdout) in the format below. The log file should be created in the same directory where the final executable “server” exists. You must also protect the log file from race conditions. The format is: [threadId][reqNum][fd][Request string][bytes/error]
● threadId is an integer from 0 to num_workers - 1 indicating the thread index of the request-handling worker. (Note: this is not the pthread_t returned by pthread_create).
● reqNum is the total number of requests a specific worker thread has handled so far, including the current request (i.e. it is a way to tag each request uniquely).
● fd is the file descriptor given to you by accept_connection() for this request
● Request string is the filename of the database image sent back by the server
● bytes/error is either the number of bytes returned by a successful request or an error message.
The log (in the “server_log” file and in the terminal) should look something like the example below. We provide the code for this.
[8][1][5][/DB/30.jpg][17772]
[9][1][5][/DB/30.jpg][17772]
Make sure the server_log file is opened for writing with truncation to 0.
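For orientation only, here is a sketch of what mutex-protected logging in this format could look like. The provided logging code may differ; format_log_entry, log_request, and log_lock are hypothetical names, and only the success case (a byte count) is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* One lock shared by all workers protects both the file and stdout. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Formats one entry as [threadId][reqNum][fd][Request string][bytes].
   Returns the number of characters needed (excluding the terminator),
   or a negative value on error, as snprintf does. */
int format_log_entry(char *out, size_t cap, int thread_id, int req_num,
                     int fd, const char *request, int bytes) {
    return snprintf(out, cap, "[%d][%d][%d][%s][%d]",
                    thread_id, req_num, fd, request, bytes);
}

/* Writes one entry to the log file and to stdout while holding the lock. */
void log_request(FILE *logfile, int thread_id, int req_num,
                 int fd, const char *request, int bytes) {
    assert(logfile != NULL && request != NULL);

    /* 1024-character filename limit from section 7, plus brackets and numbers. */
    char line[1200];
    int n = format_log_entry(line, sizeof line, thread_id, req_num,
                             fd, request, bytes);
    if (n < 0 || (size_t)n >= sizeof line) return;  /* drop malformed or truncated entries */

    pthread_mutex_lock(&log_lock);
    fprintf(logfile, "%s\n", line);
    fflush(logfile);
    printf("%s\n", line);
    pthread_mutex_unlock(&log_lock);
}
```

Opening the file with fopen("server_log", "w") gives the truncation to 0 mentioned above. Formatting the line before taking the lock keeps the critical section short.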
3.9 Server termination
We will keep this very simple: ^C. If you wish, you can catch ^C and do some cleanup or print a goodbye message, but this is not needed. If the client is running, then this may hang the client or possibly make it crash. Do not worry about that.
4. Client Overview
Your client will take a directory name as command line argument and be tasked with traversing its contents. For each file encountered within the directory, the client will initiate a thread to request the server to process it. This thread will handle the transmission of the file to the server for processing.
Subsequently, the thread will remain active, awaiting the receipt of the corresponding matching image from the server, writing the contents to a file, and then terminating. The reason the client is multithreaded is to emulate multiple concurrent requests to the server.
4.1 Client Main Thread
● Directory Traversal: The main thread will traverse the directory contents. For each image encountered within the directory, it will spawn a thread to process it. This should give us some concurrency at the server.
4.2 Client Threads
● File Preparation: Once the thread starts, it will load the image into memory and send it to the server using the send_file_to_server() function.
● User Response: After the thread successfully sends an image to the server, it will remain active, awaiting the receipt of the corresponding matching image from the server.
● Matching image Handling: Once the matching image has been received by the client, the thread will save the image into a new file and log the request.
● Network Functions the client needs to call:
○ int socketFd = setup_connection(): returns a file descriptor for where to send data to the server
○ send_file_to_server(int socketFd, FILE *fd, size_t size): takes a server file descriptor, the open image file (FILE *), and the size of the image.
○ receive_file(int socketFd, char * path): server file descriptor and path to output the new image.
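The main-thread traversal and per-file threads can be sketched as follows. This is a rough outline under assumptions: collect_files, request_thread, process_directory, files_done, and MAX_FILES are hypothetical names, and the thread body only counts files where a real client thread would call setup_connection(), send_file_to_server(), and receive_file().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

#define MAX_FILES 100      /* hypothetical cap for this sketch */
#define MAX_PATH_LEN 1024  /* maximum filename length from section 7 */

pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
int files_done = 0;        /* shared counter, protected by done_lock */

/* Fills paths[] with the regular files found directly inside dir.
   Returns the count, or -1 if dir cannot be opened. */
int collect_files(const char *dir, char paths[][MAX_PATH_LEN], int max) {
    DIR *d = opendir(dir);
    if (d == NULL) return -1;
    int n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(d)) != NULL) {
        struct stat st;
        int len = snprintf(paths[n], MAX_PATH_LEN, "%s/%s", dir, ent->d_name);
        if (len < 0 || len >= MAX_PATH_LEN) continue;  /* path too long: skip */
        if (stat(paths[n], &st) == 0 && S_ISREG(st.st_mode))
            n++;  /* keep regular files only; "." and ".." are skipped */
    }
    closedir(d);
    return n;
}

/* Per-file thread body. A real client thread would send the file at
   path to the server and save the reply; this stub only counts it. */
void *request_thread(void *arg) {
    const char *path = arg;
    assert(path != NULL);
    pthread_mutex_lock(&done_lock);
    files_done++;
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Spawns one thread per file in dir and joins them all.
   Returns the number of threads run, or -1 if dir cannot be opened. */
int process_directory(const char *dir) {
    static char paths[MAX_FILES][MAX_PATH_LEN];  /* static: too large for the stack */
    pthread_t tids[MAX_FILES];
    int n = collect_files(dir, paths, MAX_FILES);
    if (n < 0) return -1;

    int created = 0;
    while (created < n &&
           pthread_create(&tids[created], NULL, request_thread, paths[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created;
}
```

Collecting the paths before starting any thread gives each thread a stable string to read. The static array keeps the sketch short but makes process_directory non-reentrant.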
5. Compilation Instructions
You can create all of the necessary executable files with
Command Line
$ make
Running the program with various directories can be accomplished with
Command Line
$ ./server
$ ./client
Example:
Command Line
$ ./server 8000 database 50 50 20
$ ./client img 8000 output/img
6. Project Folder Structure
Please strictly conform to the folder structure that is provided to you. Your conformance will be graded.
Project structure    Contents (initial/required contents [1])
include/             .h header files (server.h client.h utils.h)
lib/                 .o library files (utils.o)
src/                 .c source files (server.c client.c)
database/            Contains the server database
expected/            Expected output
Makefile             File containing build information and used for testing/compiling
README.md
This content is required at minimum, but adding additional content is OK as long as it doesn’t break the existing code.
7. Assumptions / Notes
1. The maximum number of dispatcher threads will be 100.
2. The maximum number of worker threads will be 100.
3. The maximum length of the request queue will be 100 requests.
4. The maximum length of a filename will be 1024.
5. The maximum number of database entries is 100.
8. Documentation
Within your code you should use one or two sentences to describe each function that you write.
You do not need to comment every line of your code. However, you might want to comment portions of your code to increase readability.
9. Submission Details
There will be two submission periods. The intermediate submission is due 1 week before the final submission deadline. The first submission is mainly intended to make sure you are on pace to finish the project on time. The final submission is due ~2 weeks after the project is released.
9.1 Intermediate Submission
For the intermediate submission, your task is to perform directory traversal on the server to form the in-memory version of the database - that is, add the directory content into the database list. You will also create N dispatcher and worker threads and join them in the main thread, with each thread printing its thread ID and exiting. The main client thread will make one request to the server. It should just accept the request, and print it out to stdout. There is no worker or queue needed for the intermediate. Note that you do not need to implement any synchronization for the intermediate submission.
One student from each group should upload a .zip file to Gradescope containing all of your project files.
We’ll be primarily focusing on *.c and your README, which should contain the following information:
● Project group number
● Group member names and x500s
● The name of the CSELabs computer that you tested your code on
○ e.g. csel-kh1250-01.cselabs.umn.edu
● Any changes you made to the Makefile or existing files that would affect grading
● Plan outlining individual contributions for each member of your group
● Plan on how you are going to construct the worker threads and how you will make use of mutex locks and condition variables.
The member of the group who uploads the .zip file to Gradescope should add the other members to their group after submitting. Only one member in a group should upload.
9.2 Final Submission
One student from each group should upload a .zip file to Gradescope containing all of the project files.
The README should include the following details:
● Project group number
● Group member names and x500s
● The name of the CSELabs computer that you tested your code on
○ e.g. csel-kh1250-01.cselabs.umn.edu
● Members’ individual contributions
● Any changes you made to the Makefile or existing files that would affect grading
● Any assumptions that you made that weren’t outlined in section 7
● How could you enable your program to make EACH individual request parallelized? (high-level pseudocode would be acceptable/preferred for this part)
The member of the group who uploads the .zip file to Gradescope should add the other members to their group after submitting. Only one member in a group should upload.
Your project folder should include all of the folders that were in the original template. You can add additional files to those folders and edit the Makefile, but make sure everything still works. Before submitting your final project, run “make clean” to remove any existing output/ data and manually remove any erroneous files.
10. Miscellaneous
1. We will provide an initial set of code, but you will be doing most of the coding.
2. Do not use the system call “system”.
3. Said before: KILL all of your stray processes during debugging as needed.
4. Any provided binaries are meant for the CSELAB Linux environment. No other binaries will be distributed.
5. ChatGPT or other significant “other” code reuse is prohibited. The purpose of this course is to learn by doing, and not meeting some deadline. If you are unsure about any located online code, contact us.
6. On the other hand, locating code snippets that show how system calls can be used is fine.
11. Rubric (tentative)
● [10%] README
● [15%] Intermediate submission
● [15%] Coding style: indentations, readability, comments where appropriate
● [20%] Test cases
● [30%] Correct use of pthread_create(), pthread_join(), pthread_mutex_t , pthread_cond_t, pthread_mutex_lock(), pthread_mutex_unlock(), pthread_cond_wait(), pthread_cond_signal()
● [10%] Error handling — should handle system call errors and terminate gracefully
Additional notes:
● We will use the GCC version installed on the CSELabs machines to compile your code. Make sure your code compiles and runs on CSELabs.
● A list of CSELabs machines can be found at https://cse.umn.edu/cseit/classrooms-labs
○ Try to stick with the Keller Hall computers since those are what we’ll use to test your code
● Helpful GDB manuals: GDB Tutorial (from Huang) and Quick Guide to gdb (from Kauffman)
