CS 252: Systems Programming
Fall 2024
Project 5: Web Server
Prof. Turkstra
Checkpoint: Monday, November 4 11:59 PM
Final: Monday, November 11 11:59 PM
1 Goals
In this project, you will implement an HTTP server which allows an HTTP client (such as Mozilla Firefox) to browse a website and receive different types of content. You will gain a high level understanding of Internet sockets, the HyperText Transfer Protocol (HTTP), SSL/TLS extensions to HTTP, and the Common Gateway Interface (CGI).
2 Deadlines
3 The Big Idea
You will implement relevant portions of the HTTP/1.1 specification (RFC 2616). Your server will not need to support any methods beyond GET, although there is extra credit available for supporting other methods (outlined in the extra credit section).
RFC 2616 is somewhat long at 176 pages, but it is well written and straightforward. We do not expect you to read the document in its entirety (though it certainly would not hurt to do so). You should, however, identify the relevant sections in the Table of Contents and carefully read them.
The sections below offer some level of detail with regard to implementation, but more details are in the RFC.
The provided code template provides a basic structure and foundation on which to build your server. You should carefully look through the provided skeleton code and ensure that you understand
4 Getting Started
Log in to a CS department machine (a lab machine or data.cs.purdue.edu), navigate to the directory containing your projects, and run
You can build your webserver by typing make, and start it by running ./myhttpd <portno> where portno is the port number you wish to bind to. You can see the various options that need to be implemented by running ./myhttpd -h.
4.1 Security Notice
Some CS machines are publicly accessible on the Internet. When developing a web server, it is wise to shield the server from any possible network traffic so that an attacker cannot take advantage of bugs in your code. You should modify your server to bind to the loopback interface for at least the first 3 tasks.
Upon doing this, if you are connected to data, your webserver will only accept connections coming from data itself. While all CS machines allow many users to be logged in simultaneously, it is a reasonably safe assumption that no malicious requests will be sent to your server. When you are confident that your server won’t accidentally leak secret files, you can accept all connections from any interface so that you can develop via SSH and use a web browser on your local system.
You can make your server bind to the loopback interface by modifying tcp.c to bind to INADDR_LOOPBACK instead of INADDR_ANY.
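The following standalone sketch shows roughly what that change amounts to in plain POSIX socket code. It is not the skeleton's tcp.c. The function name open_loopback_listener and the backlog of 16 are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a listening TCP socket bound to 127.0.0.1:port, or -1 on error.
 * Passing port 0 lets the kernel pick any free port. */
int open_loopback_listener(int port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts while old connections linger in TIME_WAIT. */
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* was INADDR_ANY */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Only the INADDR_LOOPBACK line differs from a server that accepts connections on every interface.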
4.2 Resource Limits
For this project, we set the limit for the amount of memory your program can use to 48 MiB, the maximum amount of CPU time to 5 minutes, and the maximum number of forked processes to 50. We don’t expect you to hit these limits, so if your server hits any of these limits, it’s probably because you need to change the way you’re doing something! Web servers like the one we are implementing should not be resource intensive.
5 The Assignment - Checkpoint
5.1 Background
HTTP, or Hypertext Transfer Protocol, is a protocol for communicating over the Internet that is used by numerous applications. HTTP specifies the format and meaning of messages that are used to communicate between client applications and server applications. It is a textual format, and requests and responses can be thought of as structured blobs of text that humans can read.
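For this project, HTTP follows a strict request-response model. The client opens a connection, sends a single request, and then receives a single response from the server. The server then closes the connection, and the transaction is complete. (HTTP/1.1 also allows persistent connections that carry several requests. However, every response from your server includes Connection: close, so one request per connection is all you need to handle.)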
HTTP connections are carried over TCP, a reliable transport protocol that retransmits lost data, detects corrupted segments, and delivers bytes in order. You will not be implementing TCP in any capacity in this project; it is merely the mechanism used to carry HTTP connections.
There are two types of HTTP messages: request messages and response messages. As you can imagine, a client sends a request message to an HTTP server to serve some request (e.g. to obtain the file /index.html or the picture /img/PurdueSeal.png). The request message contains details such as the version of HTTP being used, which resource is being requested, what you’d like to do with that resource, what sorts of responses the client accepts, and so on. The server is responsible for parsing the request message and creating a response message containing the results of the query.
More details on this can be found in the RFC.
5.2 Request/Response Format
- <sp> represents any whitespace character
- <crlf> represents a carriage return + line feed pair: \r\n (ASCII character 13 followed by ASCII character 10).
- <URL> is the name and location of the requested file. Note that the path is relative to the DocumentRoot mentioned below. The filename itself is optional. If it is not specified, a default of index.html should be used.
- (<headers> <crlf>)* contains additional information in the format <key>: <value>. Note that this part can be composed of several lines, each separated by a <crlf>.
- For this project, the only additional header that needs to be addressed is Authorization. This is discussed in section 5.4.
- Note that the HTTP client (i.e., web browser) may send additional headers. You can safely ignore the other headers as long as the HTTP request is syntactically correct.
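The request line can be pulled apart with plain string functions. This is a sketch under a few assumptions:

- buf already holds the bytes read from the socket as a NUL-terminated string.
- The struct and function names are hypothetical, not the ones in http_messages.c.

It returns the 400, 405, and 505 status codes required later in this section for malformed or unsupported requests. It also applies the index.html default described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

struct request_line {
    char url[1024];
    char version[9];   /* "HTTP/1.1" or "HTTP/1.0" plus NUL */
};

/* Parses the request line ("GET <URL> HTTP/1.1\r\n") at the start of buf.
 * Returns 0 on success, otherwise the status code to send back. */
int parse_request_line(const char *buf, struct request_line *req) {
    const char *eol = strstr(buf, "\r\n");
    char line[2048];
    if (eol == NULL || (size_t)(eol - buf) >= sizeof line)
        return 400;                          /* no CRLF, or an overly long line */
    memcpy(line, buf, (size_t)(eol - buf));
    line[eol - buf] = '\0';

    /* <sp> is any whitespace here, so split on spaces and tabs. */
    char *save = NULL;
    char *method = strtok_r(line, " \t", &save);
    char *url = strtok_r(NULL, " \t", &save);
    char *version = strtok_r(NULL, " \t", &save);
    if (method == NULL || url == NULL || version == NULL ||
        strtok_r(NULL, " \t", &save) != NULL || url[0] != '/')
        return 400;                          /* wrong number of fields or bad URL */
    if (strcmp(version, "HTTP/1.1") != 0 && strcmp(version, "HTTP/1.0") != 0)
        return 505;
    if (strcmp(method, "GET") != 0)
        return 405;

    /* "/" means the default document. */
    const char *path = strcmp(url, "/") == 0 ? "/index.html" : url;
    if (strlen(path) >= sizeof req->url)
        return 400;
    strcpy(req->url, path);
    strcpy(req->version, version);
    return 0;
}
```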
HTTP/1.1 <sp> 200 <sp> OK <crlf>
Connection: <sp> close <crlf>
Content-Type: <sp> <document_type> <crlf>
Content-Length: <sp> <size> <crlf>
(<additional headers> <crlf>)*
<crlf>
<data>
- Connection: close indicates that the connection will be closed upon completion of the response. It is required for every response.
- <document_type> indicates to the client the type of document being sent. Every response that sends data (see Section 5.5) should include this header. Below are the document types that need to be implemented for this project:
- “text/plain” for plain text
- “text/html” for HTML documents
- “text/css” for CSS documents
- “image/gif” for gif files
- “image/png” for png files
- “image/jpeg” for jpg/jpeg files
- “image/svg+xml” for svg files
- <size> is the number of bytes in the delivered content (<data>). It is required for every response that sends data.
- (<additional headers><crlf>)* contains, as before, additional useful information for the client to use. Specifically for our purposes, this includes the WWW-Authenticate header which should be transmitted any time a request lacks a proper Authorization header.
- <data> is the actual file or resource requested. Observe that this is separated from the response headers by two carriage return + line feed pairs.
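Choosing the Content-Type from the file extension and writing the 200 header can both be sketched with standard C string functions. The function names are hypothetical. Falling back to text/plain for unknown or missing extensions is an assumption, not a requirement stated above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Maps a file name's extension to one of the required Content-Type values. */
const char *content_type_for(const char *path) {
    const char *dot = strrchr(path, '.');
    const char *slash = strrchr(path, '/');
    if (dot == NULL || (slash != NULL && dot < slash))
        return "text/plain";             /* no extension: assume plain text */
    dot++;
    if (strcasecmp(dot, "html") == 0) return "text/html";
    if (strcasecmp(dot, "css") == 0) return "text/css";
    if (strcasecmp(dot, "gif") == 0) return "image/gif";
    if (strcasecmp(dot, "png") == 0) return "image/png";
    if (strcasecmp(dot, "jpg") == 0 || strcasecmp(dot, "jpeg") == 0) return "image/jpeg";
    if (strcasecmp(dot, "svg") == 0) return "image/svg+xml";
    return "text/plain";                 /* unknown extension: same assumption */
}

/* Writes the status line and required headers of a 200 response into out.
 * Returns the header length, or -1 if out is too small. */
int format_ok_header(char *out, size_t cap, const char *path, long long size) {
    int n = snprintf(out, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Connection: close\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %lld\r\n"
                     "\r\n",             /* empty line ends the headers */
                     content_type_for(path), size);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The <data> bytes are written immediately after this header. Their count must equal the Content-Length value, which for a file is normally its size as reported by stat() or fstat().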
HTTP/1.1 <sp> 404 File Not Found <crlf>
Connection: <sp> close <crlf>
Content-Type: <sp> <Document-Type> <crlf>
<crlf>
<Error Message>
- <Document-Type> indicates the type of document (i.e. error message in this case) being sent. Since you are going to send a plain text message, this should be set to text/plain.
- <Error Message> is a human-readable, plain-text description of the error (e.g., Could not find the specified URL. The server returned an error). See http_messages.c.
• RFC2616 Section 5 for information on how to parse request messages. The only method you are required to support is GET.
• RFC2616 Section 6 for information on how to form response messages.
At a minimum, your solution should implement the following response codes:
• 200: OK
• 400: Bad Request (incorrect request syntax)
• 401: Unauthorized (Basic HTTP Authentication)
• 403: Forbidden (access denied, e.g., the requested file does not permit read access)
• 404: Not Found (requested resource doesn’t exist)
• 405: Method Not Allowed (e.g., in response to something other than a GET request)
• 500: Internal Server Error (internal failure, e.g., missing EOF on file read, pipe failure, fork failure, exec failure, etc.)
• 505: HTTP Version Not Supported (anything other than HTTP/1.1 or HTTP/1.0)
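Every error response has the same shape, so one function can cover all of the codes above. The function names in this sketch are hypothetical. It uses the RFC 2616 reason phrases listed above.

It also adds a Content-Length header. The 404 template omits this header, but Content-Length makes the end of the message explicit. The realm is taken from the 401 example in Section 5.4.

```c
#include <stdio.h>
#include <string.h>

/* RFC 2616 reason phrases for the required status codes. */
const char *reason_phrase(int code) {
    switch (code) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 401: return "Unauthorized";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    case 405: return "Method Not Allowed";
    case 500: return "Internal Server Error";
    case 505: return "HTTP Version Not Supported";
    default:  return "Unknown";
    }
}

/* Builds a complete text/plain error response (headers and body) in out.
 * Returns its length, or -1 if out is too small. */
int format_error_response(char *out, size_t cap, int code, const char *message) {
    /* A 401 must tell the browser to prompt for credentials. */
    const char *extra = (code == 401)
        ? "WWW-Authenticate: Basic realm=\"myhttpd-cs252\"\r\n"
        : "";
    int n = snprintf(out, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Connection: close\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "%s"
                     "\r\n"
                     "%s",
                     code, reason_phrase(code), strlen(message), extra, message);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```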
5.3 Basic Server
• Open a passive socket
• Do forever:
  – Accept a new TCP connection
  – Read a request from the TCP connection and parse it
  – Choose the appropriate response header depending on whether the URL requested is found on the server or not
  – Write the response header to the TCP connection
  – Write the requested data (by default you should respond with the contents of index.html, located at http-root-dir/htdocs/index.html) to the TCP connection
  – Close the TCP connection
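As a rough illustration of the loop above, the sketch below implements one pass of it with raw POSIX calls rather than the skeleton's socket abstraction. The following are all hypothetical placeholders:

- handle_connection and serve_one
- the 4096-byte request buffer
- the fixed "hello" response

The parse-and-choose-a-response step is where your http_messages.c logic would go.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads until the blank line that ends the headers, EOF, or a full buffer.
 * Returns the number of bytes read (buf is NUL-terminated), or -1 on error. */
static ssize_t read_request(int fd, char *buf, size_t cap) {
    size_t used = 0;
    buf[0] = '\0';
    while (used + 1 < cap && strstr(buf, "\r\n\r\n") == NULL) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                       /* client closed before finishing */
        used += (size_t)n;
        buf[used] = '\0';
    }
    return (ssize_t)used;
}

/* Reads a request, writes a response, and closes the connection. */
void handle_connection(int client) {
    char request[4096];
    if (read_request(client, request, sizeof request) > 0) {
        /* Placeholder: a real server parses the request here and chooses the
         * status, headers, and file contents. This sketch always says hello. */
        const char *response =
            "HTTP/1.1 200 OK\r\n"
            "Connection: close\r\n"
            "Content-Type: text/plain\r\n"
            "Content-Length: 6\r\n"
            "\r\n"
            "hello\n";
        if (write(client, response, strlen(response)) < 0) {
            /* Client already gone; nothing left to do but close. */
        }
    }
    close(client);
}

/* One pass of the "do forever" loop on an already-listening socket. */
int serve_one(int listen_fd) {
    int client = accept(listen_fd, NULL, NULL);
    if (client < 0)
        return -1;
    handle_connection(client);
    return 0;
}
```

An iterative server is then just for (;;) serve_one(listen_fd);. A short response like this one is normally written by a single write() call. File bodies, however, should be sent with a loop that handles partial writes.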
The server that you will implement at this stage will not be concurrent, meaning that it will not serve more than one client or request at a time (it queues the remaining requests while processing each request). Use the aforementioned daytime server as a reference for programming with sockets.
Most of the logic for the server itself (e.g., concurrency modes, accepting client connections, responding to client, etc.) should be implemented in server.c. The http_messages.c file should contain code for dealing with HTTP requests and responses (e.g., parsing requests, forming responses, etc.).
5.4 Basic HTTP Authentication
In this part, you will add basic HTTP authentication to your server. Your HTTP server may have some bugs and may expose security problems. You don’t want to expose this to the open internet. One way to mitigate this security risk is to implement basic HTTP authentication. You will implement the authentication scheme in RFC7617, aptly called “Basic HTTP Authentication.”
In Basic HTTP Authentication, you will check for an Authorization header field in all HTTP requests. If the Authorization header field isn’t present (or doesn’t carry the correct credentials), you should respond with a status code of 401 Unauthorized with the following additional field:
WWW-Authenticate: Basic realm="myhttpd-cs252"
When a browser receives this response, it knows to prompt for a username and password. The provided credentials (in the form username:password) are then encoded in base64. The browser then sends the same request a second time, including an Authorization header of the form Authorization: Basic <base64-encoded credentials>.
$ cat auth.txt | base64
You can check this by running cat auth.txt and seeing where the prompt prints after the file.
-s -1 auth.txt (this will delete the last byte in the file).
To illustrate the whole process, consider the following message sequence. The first time a browser tries to connect, it will send a request like the following:
GET /index.html HTTP/1.1
HTTP/1.1 401 Unauthorized
Connection: close
WWW-Authenticate: Basic realm="myhttpd-cs252"
Note: you should create your own username and password and NOT use cs252 and password. You can set the username and password in the file auth.txt and use the function return_user_pwd_string (in server.c) to load them. When loaded, the string is also stored in the global variable g_user_pass. You shouldn’t use your Purdue career account credentials either.
Note that the client web browser will cache the authentication information after the server authenticates it the first time and will send it with every request, so your server should make sure the
For testing purposes, it is best to use a private window in your browser so when you quit out of the browser, it will clear the cached username and password.
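The check described in this section can be sketched in plain C as follows:

1. Encode the expected username:password in base64.
2. Look for an Authorization header whose value is Basic followed by exactly that string.

This is a sketch assuming the loaded credential string is the raw username:password text. The helper names are hypothetical. The encoder follows standard RFC 4648 base64 with = padding, which is what RFC 7617 uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <strings.h>

/* Standard base64 with '=' padding; out needs 4*((len+2)/3)+1 bytes. */
void base64_encode(const unsigned char *in, size_t len, char *out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        unsigned long chunk = (unsigned long)in[i] << 16;
        if (i + 1 < len) chunk |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) chunk |= in[i + 2];
        out[o++] = alphabet[(chunk >> 18) & 0x3F];
        out[o++] = alphabet[(chunk >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? alphabet[(chunk >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? alphabet[chunk & 0x3F] : '=';
    }
    out[o] = '\0';
}

/* Returns 1 if the request headers contain
 * "Authorization: Basic <base64 of user_pass>", otherwise 0. */
int is_authorized(const char *req, const char *user_pass) {
    char expected[256];
    size_t len = strlen(user_pass);
    if (4 * ((len + 2) / 3) + 1 > sizeof expected)
        return 0;
    base64_encode((const unsigned char *)user_pass, len, expected);

    /* Each header line starts right after a CRLF. */
    for (const char *line = strstr(req, "\r\n"); line != NULL; line = strstr(line, "\r\n")) {
        line += 2;
        if (strncasecmp(line, "Authorization:", 14) != 0)
            continue;
        const char *v = line + 14;
        while (*v == ' ' || *v == '\t')
            v++;
        if (strncasecmp(v, "Basic ", 6) != 0)
            return 0;                    /* some other auth scheme */
        v += 6;
        size_t elen = strlen(expected);
        return strncmp(v, expected, elen) == 0 && (v[elen] == '\r' || v[elen] == '\0');
    }
    return 0;                            /* no Authorization header: send 401 */
}
```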
5.5 Serving Static Files
Your webserver should serve http-root-dir/htdocs as its DocumentRoot. When your server receives a request, e.g., for /index.html, it should attempt to read and transmit the file http-root-dir/htdocs/index.html. If a request is made for a file that does not exist, the server should reply with an appropriate status code, likely 404 in this case. Requests for directories should attempt to serve the file named index.html in that directory (if that file is not present, respond with 404).
<body>
<h1>CS252 HTTP Server</h1>
<ul>
<li><a HREF="simple.html"> Simple Test</A>
<li><a HREF="complex.html"> Complex Test</A>
<li><a HREF="directory"> Browsing directory/</A>
<li><A HREF="cgi-bin/test-env"> cgi-bin: test-env</A>
<li><A HREF="cgi-bin/test-cgi"> cgi-bin: test-cgi</A>
<li><A HREF="cgi-bin/finger"> cgi-bin: finger</A>
<li><A HREF="cgi-bin/jj"> cgi-bin: jj</A>
</ul>
<hr>
</body>
Note there are 2 CRLFs after the last header (Content-Length), followed by the data for the document being requested.
- The default request URL will be /, which should be treated as /index.html.
- You will mostly be modifying server.c and the handle_htdocs() function in handlers.c for this part.
- If you receive a request for a document that is a directory without a trailing slash (e.g., /dir1, not /dir1/), then you should serve /dir1/index.html. That is, treat requests for /dir1 as requests for /dir1/index.html.
- If a request has a trailing slash, you will handle this in the browsable directories part.
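The URL-to-file mapping above, including the /dir1 rule, might look like this. The sketch makes these assumptions:

- url has already been extracted from the request line and starts with '/'.
- resolve_path and its return convention (1 for the trailing-slash case) are hypothetical.

The ".." check is deliberately crude: it also rejects harmless names like a..b.html.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a request URL onto a path under docroot (e.g. "http-root-dir/htdocs").
 * Returns 0 if out names a readable regular file, 1 if out is a directory
 * requested with a trailing slash, or an HTTP error code (403 or 404). */
int resolve_path(const char *docroot, const char *url, char *out, size_t cap) {
    if (url[0] != '/')
        return 404;                      /* callers are expected to pass "/..." */
    if (strstr(url, "..") != NULL)
        return 403;                      /* crude guard against escaping docroot */
    if (strcmp(url, "/") == 0)
        url = "/index.html";

    int n = snprintf(out, cap, "%s%s", docroot, url);
    if (n < 0 || (size_t)n >= cap)
        return 404;

    struct stat st;
    if (stat(out, &st) != 0)
        return 404;
    if (S_ISDIR(st.st_mode)) {
        if (url[strlen(url) - 1] == '/')
            return 1;                    /* left to the directory-browsing part */
        /* /dir1 is treated as /dir1/index.html */
        int m = snprintf(out + n, cap - (size_t)n, "/index.html");
        if (m < 0 || (size_t)m >= cap - (size_t)n || stat(out, &st) != 0)
            return 404;
    }
    if (!S_ISREG(st.st_mode))
        return 404;
    if (access(out, R_OK) != 0)
        return 403;                      /* exists but is not readable */
    return 0;
}
```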
5.6 Concurrency
• -f : Create a new process for each request
  Fork mode (run_forking_server()): You should handle each request in a new process. To do this, after you accept a connection off of your socket acceptor, you should call fork() and execute your normal request/response logic from within the child. Don’t forget about zombies!
• -t : Create a new thread for each request
  Thread-per-request mode (run_threaded_server()): You should handle each request in a new thread. To do this, after you accept a connection off of your socket acceptor, you should create a new thread and execute your normal request/response logic from within that thread. You should consider using pthreads for this part.
• -pNUM_THREADS : Pool of threads
  Pool of threads (run_thread_pool_server()): You should handle each request using a pool of n workers, where n is given by NUM_THREADS. Your program should be using n + 1 threads during execution: the thread that your program starts in should be used to create the n workers and then wait for them to finish. Of course, since each worker is running in an infinite loop, they will only finish on an error or when the program exits.
• -h : Print usage
  This flag will print out the usage of myhttpd and myhttpds.
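Two of these modes have a non-obvious core, sketched below with POSIX signals and pthreads:

- Fork mode must reap its children. Here this is done with a SIGCHLD handler that calls waitpid() with WNOHANG.
- Pool mode needs n workers that each loop on accept().

The names (install_sigchld_reaper, run_pool, conn_handler) are hypothetical. The handler is assumed to read one request and write its response without closing the socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* -f mode: reap every exited child so finished handlers do not stay zombies. */
static void reap_children(int sig) {
    (void)sig;
    int saved_errno = errno;             /* waitpid may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* keep going until no exited child is left */
    }
    errno = saved_errno;
}

int install_sigchld_reaper(void) {
    struct sigaction sa;
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART restarts an accept() interrupted by SIGCHLD. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical callback: serves one request; the caller closes the fd. */
typedef void (*conn_handler)(int client_fd);

struct pool_args {
    int listen_fd;
    conn_handler handle;
};

/* Linux allows concurrent accept() calls; this lock just serializes them. */
static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

/* -p mode: each worker loops forever, taking turns in accept(). */
static void *pool_worker(void *arg) {
    const struct pool_args *pa = arg;
    for (;;) {
        pthread_mutex_lock(&accept_lock);
        int client = accept(pa->listen_fd, NULL, NULL);
        pthread_mutex_unlock(&accept_lock);
        if (client < 0)
            continue;                    /* skip a failed accept and try again */
        pa->handle(client);
        close(client);
    }
    return NULL;
}

/* Starts n workers, then joins them, so n + 1 threads run in total. */
int run_pool(int listen_fd, int n, conn_handler handle) {
    static struct pool_args args;        /* static: outlives an early return */
    args.listen_fd = listen_fd;
    args.handle = handle;
    pthread_t *tids = calloc((size_t)n, sizeof *tids);
    if (tids == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, pool_worker, &args) != 0) {
            free(tids);
            return -1;                   /* caller should report and exit */
        }
    }
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);     /* workers never return normally */
    free(tids);
    return 0;
}
```

In fork mode, remember that the parent must close its copy of the accepted socket right after fork(). Otherwise the client never sees the connection close. Thread-per-request mode follows the same accept-then-dispatch shape. There, pthread_create() is called with a heap-allocated copy of the client fd, and the thread is detached so that no one has to join it.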
If no flags are passed, the server should act like an iterative server as created in the Basic Server section. If port is not passed, choose your own default port number. Make sure it is larger than 1024 and less than 65536.
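The flag handling above could be done with POSIX getopt(). This sketch makes the following assumptions:

- The mode names, struct options, and DEFAULT_PORT value of 8080 are hypothetical. Pick your own default in the allowed range.
- The port is the first non-flag argument.
- An explicitly given port is checked against the same range as the default.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

enum mode { MODE_ITERATIVE, MODE_FORK, MODE_THREAD, MODE_POOL };

struct options {
    enum mode mode;
    int num_threads;                     /* only meaningful for MODE_POOL */
    int port;
};

#define DEFAULT_PORT 8080                /* hypothetical; choose your own */

/* Returns 0 on success, or -1 if usage should be printed (-h or bad input). */
int parse_args(int argc, char **argv, struct options *opt) {
    opt->mode = MODE_ITERATIVE;          /* no flags: iterative server */
    opt->num_threads = 0;
    opt->port = DEFAULT_PORT;
    optind = 1;                          /* allow repeated calls, e.g. in tests */

    int c;
    while ((c = getopt(argc, argv, "ftp:h")) != -1) {
        switch (c) {
        case 'f': opt->mode = MODE_FORK; break;
        case 't': opt->mode = MODE_THREAD; break;
        case 'p':                        /* accepts both -p5 and -p 5 */
            opt->mode = MODE_POOL;
            opt->num_threads = atoi(optarg);
            if (opt->num_threads <= 0)
                return -1;
            break;
        default:                         /* -h or an unknown flag */
            return -1;
        }
    }
    if (optind < argc) {
        char *end;
        long port = strtol(argv[optind], &end, 10);
        if (*end != '\0' || port <= 1024 || port >= 65536)
            return -1;
        opt->port = (int)port;
    }
    return 0;
}
```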
5.7 Server Termination
5.8 Checkpoint Submission (45 points)
6 The Assignment - Final
6.1 Secure HTTP (HTTPS)
Fortunately, a more secure version of HTTP exists, called HTTPS. HTTPS uses Transport Layer Security (TLS) to communicate. TLS encrypts all traffic on top of the transport layer (TCP). This means that you, as an application programmer, only need to worry about adding high-level support for communicating via TLS. That is, the underlying socket has changed, but the HTTP protocol you’ve implemented is unconcerned about what the socket does when your server asks to read or write data.
The skeleton code provided includes an abstraction of the socket layer that supports regular and encrypted communication. The “regular” version was provided. You now need to implement the functions declared in tls.h:
• int close_tls_socket(tls_socket *socket);
• int tls_write(tls_socket *socket, char *buf, size_t buf_len);
• int tls_read(tls_socket *socket, char *buf, size_t buf_len);
• tls_acceptor *create_tls_acceptor(int port);
• tls_socket *accept_tls_connection(tls_acceptor *acceptor);
• int close_tls_acceptor(tls_acceptor *acceptor);
$ make myhttpsd
$ ./myhttpsd ...
When you navigate to your server with your internet browser, be sure to include the https:// prefix.
Note: After successfully porting the TLS “driver” to your web server, you will receive warning messages in your browser along the lines of “Connection is not Private.” This is because your SSL certificate is self-signed rather than signed by an accepted certificate authority, such as Comodo, Let’s Encrypt, GlobalSign, etc. You can manually install your certificate as a trusted certificate in your browser, or you may simply add an exception. We trust you can figure out how to do this depending on the browser being used.
If you haven’t added the certificate exception when you first connect to your server, some browsers (like Firefox) will immediately close the connection, often before your server can write a response back. When your server then writes to the closed connection, it receives SIGPIPE. The default action for SIGPIPE is to terminate the process, so your server will likely crash. You must safely handle this signal.
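One common way to handle this is to ignore SIGPIPE for the whole process. A write to a closed connection then fails with errno set to EPIPE instead of killing the server. The sketch below uses sigaction() and a write loop; ignore_sigpipe and write_all are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Call once at startup: writes to a closed peer then return EPIPE instead of
 * delivering a fatal SIGPIPE. */
int ignore_sigpipe(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Writes all of buf, retrying partial writes and EINTR.
 * Returns 0, or -1 on error (errno == EPIPE if the peer went away). */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Ignoring the signal also covers SSL_write(), since OpenSSL writes to the same underlying socket. On Linux, another option is send() with the MSG_NOSIGNAL flag, but that applies only to calls you make yourself.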
6.1.2 Example Code
We have provided a simple reference TLS server in the file examples/tls-server.c. Please look at this example, since it initializes and accesses a TLS server in the same way your server will. You can build it with make tls-server and play around with it to learn how to use the OpenSSL library. The simple TLS server example assumes two things:

- Your certificate and private key files are called cert.pem and key.pem, respectively.
- It can bind to port 4433. You will want to change this port number so that you can run the demo.

You can generate the certificate and key by running the following command and filling in the prompts:
$ openssl req -newkey rsa:4096 -nodes -sha512 -x509 -days 21 -out cert.pem -keyout key.pem
$ chmod 700 *.pem
6.1.3 Approach
• SSL_CTX_new(TLS_server_method()) will return a pointer to an SSL_CTX structure.
• An SSL context is essentially a collection of options and other configuration-related values that are used to create and maintain an SSL connection.
• See the create_context() function in the TLS example.
Be sure to handle any errors and read the man pages for the above functions for more information. Once again, be sure to look carefully at the example in examples/tls-server.c, since it clearly demonstrates how to do the above steps. A lot of the code can be copied and pasted into your server!
You will notice that this project uses conditional compilation to compile either a TLS or regular TCP server. In general, your server should use the functions in socket.c to open, close, read, or write sockets and accept connections, so implementing the functions in tls.c should be enough to create a working server since myhttpsd should invoke those functions in place of the regular functions.
6.2 Common Gateway Interface (CGI) Support
The general sequence of events for handling a cgi-bin request then is:
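One way to carry out the fork/exec part of a cgi-bin request is sketched below. The sketch assumes the script:

- is executable;
- reads its query from the QUERY_STRING and REQUEST_METHOD environment variables, following the CGI/1.1 convention in RFC 3875;
- writes its output to stdout, which is connected to the client.

The helper names are hypothetical. A real server also sets other CGI variables and decides which response lines it sends itself before the script's output.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts a CGI program with stdout redirected to out_fd (e.g. the client
 * socket). Returns the child's pid, or -1 if the query is too long or fork()
 * fails. */
pid_t spawn_cgi(const char *script_path, const char *query, int out_fd) {
    /* Build argv/envp before fork(): in a threaded server the child should
     * only call async-signal-safe functions such as dup2() and execve(). */
    static const char prefix[] = "QUERY_STRING=";
    char query_var[1024];
    if (query == NULL)
        query = "";
    if (strlen(query) + sizeof prefix > sizeof query_var)
        return -1;
    snprintf(query_var, sizeof query_var, "%s%s", prefix, query);
    char method_var[] = "REQUEST_METHOD=GET";
    char *envp[] = { method_var, query_var, NULL };
    char *argv[] = { (char *)script_path, NULL };

    pid_t pid = fork();
    if (pid == 0) {
        if (dup2(out_fd, STDOUT_FILENO) >= 0)
            execve(script_path, argv, envp);
        _exit(127);                      /* dup2 or execve failed */
    }
    return pid;
}

/* Iterative mode: wait for the script; returns its exit status or -1. */
int finish_cgi(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```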
6.3 Loadable Modules
6.4 Directory Browsing
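A directory requested with a trailing slash (the case Section 5.5 defers to this part) needs a generated page of links. Below is a minimal sketch using opendir()/readdir(), with a hypothetical function name and an assumed page layout. It does not sort entries, HTML-escape names, or mark subdirectories. Relative links such as href="file.html" resolve correctly because the request URL ends in a slash.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Writes an HTML list of the entries in dir_path into out.
 * Returns the length written, or -1 if the directory cannot be opened
 * or out is too small. */
int list_directory_html(const char *dir_path, char *out, size_t cap) {
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    size_t used = 0;
    int n = snprintf(out, cap, "<html><body><ul>\n");
    if (n < 0 || (size_t)n >= cap)
        goto full;
    used = (size_t)n;

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0)
            continue;                    /* "." would just link back here */
        n = snprintf(out + used, cap - used, "<li><a href=\"%s\">%s</a></li>\n",
                     ent->d_name, ent->d_name);
        if (n < 0 || (size_t)n >= cap - used)
            goto full;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "</ul></body></html>\n");
    if (n < 0 || (size_t)n >= cap - used)
        goto full;
    used += (size_t)n;
    closedir(dir);
    return (int)used;

full:
    closedir(dir);
    return -1;
}
```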
7 Extra Credit
7.1 Sorting
7.2 Statistics
The number of requests should be updated after the socket is closed.
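For the request count, a C11 atomic counter lets the -t and -p worker threads update the total without a mutex. This is a sketch with hypothetical names. In -f mode the child's copy of the counter disappears when the child exits, so the increment belongs in the parent, after it closes its copy of the client socket.

```c
#include <stdatomic.h>

/* Requests served by this process (hypothetical name). */
static atomic_ulong g_request_count = 0;

/* Call once per request, right after the client socket is closed. */
void count_request(void) {
    atomic_fetch_add(&g_request_count, 1);
}

unsigned long requests_served(void) {
    return atomic_load(&g_request_count);
}
```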
7.3 Logging
8 Testing
- Be sure to place your username and password in the auth file per above and use the -B option over the -A option. If you do not, others will be able to see your credentials when e.g. running ps.
- ab doesn’t work when using the hostname localhost, so use the IP address (127.0.0.1) instead when benchmarking locally.
- If you add the cs252 bin directory to your PATH, you can just run ab. Add export PATH=$PATH:~cs252/bin to your preferred shell rc file (.bashrc, .zshrc, etc.)
9 Final Submission (55 points)
$ make clean
$ make myhttpds
$ make submit_final