CS 252: Systems Programming


Fall 2024
Project 5: Web Server
Prof. Turkstra
Checkpoint: Monday, November 4 11:59 PM
Final: Monday, November 11 11:59 PM

1 Goals

In this project, you will implement an HTTP server which allows an HTTP client (such as Mozilla Firefox) to browse a website and receive different types of content. You will gain a high level understanding of Internet sockets, the HyperText Transfer Protocol (HTTP), SSL/TLS extensions to HTTP, and the Common Gateway Interface (CGI).

2 Deadlines

• The deadline for the checkpoint is Monday, November 4 11:59 PM. No late submissions for the checkpoint will be accepted.
• The deadline for the final submission is Monday, November 11 11:59 PM.

3 The Big Idea

You will implement relevant portions of the HTTP/1.1 specification (RFC 2616). Your server will not need to support any methods beyond GET, although there is extra credit available for supporting other methods (outlined in the extra credit section).

RFC 2616 is somewhat long at 176 pages, but it is well written and straightforward. We do not expect you to read the document in its entirety (though it certainly would not hurt to do so). You should, however, identify the relevant sections in the Table of Contents and carefully read them.

The sections below offer some level of detail with regard to implementation, but more details are in the RFC.

The code template provides a basic structure and foundation on which to build your server. You should carefully look through the provided skeleton code and ensure that you understand it.

4 Getting Started

Log in to a CS department machine (a lab machine or data.cs.purdue.edu), navigate to the directory containing your projects, and run

$ git clone ~cs252/repos/$USER/proj5
$ cd proj5
In addition to the template, you should see an examples subdirectory that contains a number of useful programs. examples/daytime-server.c has the sockets example discussed in class. tls-server.c has a sample server that uses TLS to encrypt communication. The HTTPS portion of this project requires a nearly identical implementation with some changes.

You can build your webserver by typing make, and start it by running ./myhttpd <portno> where portno is the port number you wish to bind to. You can see the various options that need to be implemented by running ./myhttpd -h.

4.1 Security Notice

Some CS machines are publicly accessible on the Internet. When developing a web server, it is wise to shield the server from any possible network traffic so that an attacker cannot take advantage of bugs in your code. You should modify your server to bind to the loopback interface for at least the first 3 tasks.

Upon doing this, if you are connected to data, your webserver will only accept connections coming from data itself. While all CS machines allow many users to be logged in simultaneously, it is a reasonably safe assumption that no malicious requests will be sent to your server. When you are confident that your server won’t accidentally leak secret files, you can accept all connections from any interface so that you can develop via SSH and use a web browser on your local system.

You can make your server bind to the loopback interface by modifying tcp.c to bind to INADDR_LOOPBACK instead of INADDR_ANY.
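
As a rough illustration of the change, the sketch below fills in a socket address that only accepts local connections. The helper name make_loopback_addr is hypothetical and is not part of the provided tcp.c; the only point is that INADDR_LOOPBACK takes the place of INADDR_ANY and must be converted with htonl().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build an IPv4 address that only accepts connections from the local
 * machine. make_loopback_addr is a hypothetical helper for illustration. */
struct sockaddr_in make_loopback_addr(unsigned short port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    /* INADDR_LOOPBACK is 127.0.0.1 in host byte order, so convert it. */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return addr;
}
```

The resulting structure would be passed to bind() exactly as the INADDR_ANY version is today.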

4.2 Resource Limits

For this project, we set the limit for the amount of memory your program can use to 48 MiB, the maximum amount of CPU time to 5 minutes, and the maximum number of forked processes to 50. We don’t expect you to hit these limits, so if your server hits any of these limits, it’s probably because you need to change the way you’re doing something! Web servers like the one we are implementing should not be resource intensive.

5 The Assignment - Checkpoint

5.1 Background

HTTP, or Hypertext Transfer Protocol, is a protocol for communicating over the Internet that is used by numerous applications. HTTP specifies the format and meaning of messages that are used to communicate between client applications and server applications. It is a textual format, and requests and responses can be thought of as structured blobs of text that humans can read.

As used in this project, HTTP/1.1 follows a strict request-response model: the client opens a connection, sends a single request, and receives a single response, after which the server closes the connection and the transaction is complete. (HTTP/1.1 also allows persistent connections, but every response in this project carries Connection: close.)

Connections for HTTP are made over TCP, a reliable transport protocol with built-in features for handling dropped connections and faulty data transmission. You will not be implementing TCP in any capacity in this project; it is merely the mechanism used to carry HTTP connections.

There are two types of HTTP messages: request messages and response messages. As you can imagine, a client sends a request message to an HTTP server to serve some request (e.g. to obtain the file /index.html or the picture /img/PurdueSeal.png). The request message contains details such as the version of HTTP being used, which resource is being requested, what you’d like to do with that resource, what sorts of responses the client accepts, and so on. The server is responsible for parsing the request message and creating a response message containing the results of the query.

More details on this can be found in the RFC.

5.2 Request/Response Format

An HTTP client issues a ‘GET’ request to a server in order to retrieve a file. The general syntax of such a request is given below:
GET <sp> <URL> <sp> HTTP/1.1 <crlf>
(<headers> <crlf>)*
<crlf>
where:
  • <sp> represents any whitespace character
  • <crlf> represents a carriage return + line feed pair: \r\n (ASCII character 13 followed by ASCII character 10).
  • <URL> is the name and location of the requested file. Note the path is relative to the DocumentRoot mentioned below. The filename itself is optional. If it is not specified, a default index.html should be added.
  • (<headers> <crlf>)* contains additional information of the format <key>: <value>. Note that this part can be composed of several lines, each separated by a <crlf>.
    • For this project, the only additional header that needs to be addressed is Authorization. This is discussed in section 5.4.
    • Note that the HTTP client (i.e., web browser) may send additional headers. You can safely ignore the other headers as long as the HTTP request is syntactically correct.
Finally, observe that the client ends the request with two carriage return + line feed character pairs:
<crlf><crlf>.
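
A minimal sketch of pulling the three tokens out of the request line is shown below. The function name parse_request_line and the buffer sizes are assumptions made for illustration, not part of the starter code, and the sketch does not validate headers or the terminating <crlf>.

```c
#include <stdio.h>
#include <string.h>

/* Split "GET /index.html HTTP/1.1\r\n" into method, URL, and version.
 * Returns 0 on success, -1 if the line is malformed (a 400 response).
 * parse_request_line is a hypothetical helper. */
int parse_request_line(const char *line, char method[16],
                       char url[1024], char version[16]) {
    /* Field widths leave room for the terminating NUL byte. */
    if (sscanf(line, "%15s %1023s %15s", method, url, version) != 3) {
        return -1;
    }
    /* The filename is optional: a bare "/" is treated as /index.html. */
    if (strcmp(url, "/") == 0) {
        strcpy(url, "/index.html");
    }
    return 0;
}
```

Because %s stops at any whitespace, the trailing \r\n is not copied into the version token.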
The function of an HTTP server is to parse requests of the above form from a client, identify the resource being requested, and send the resource to the client.
Before sending the actual document, the HTTP server must send a response header to the client.
The following shows a typical response from an HTTP server when the requested resource is found on the server:
HTTP/1.1 <sp> 200 <sp> OK <crlf>
Connection: <sp> close <crlf>
Content-Type: <sp> <document_type> <crlf>
Content-Length: <sp> <size> <crlf>
(<additional headers> <crlf>)*
<crlf>
<data>
where:
  • Connection: close indicates that the connection will be closed upon completion of the response. It is required for every response.
  • <document_type> indicates to the client the type of document being sent. Every response that sends data (see Section 5.5) should include this header. Below are the document types that need to be implemented for this project:
    • “text/plain” for plain text
    • “text/html” for HTML documents
    • “text/css” for CSS documents
    • “image/gif” for gif files
    • “image/png” for png files
    • “image/jpeg” for jpg/jpeg files
    • “image/svg+xml” for svg files
  • <size> is the number of bytes in the delivered content (<data>). It is required for every response that sends data.
  • (<additional headers><crlf>)* contains, as before, additional useful information for the client to use. Specifically for our purposes, this includes the WWW-Authenticate header which should be transmitted any time a request lacks a proper Authorization header.
  • <data> is the actual file or resource requested. Observe that this is separated from the response headers by two carriage return + line feed pairs.
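
The 200 response header described above could be assembled along the following lines. This is only a sketch: format_ok_header is a hypothetical name, and the provided template may build responses through its own structures instead.

```c
#include <stdio.h>

/* Write the status line and headers of a 200 response into buf.
 * Returns the header length, or -1 if buf is too small.
 * format_ok_header is a hypothetical helper. */
int format_ok_header(char *buf, size_t cap,
                     const char *content_type, size_t content_length) {
    int n = snprintf(buf, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Connection: close\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n", /* empty line separates headers from <data> */
                     content_type, content_length);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The <data> bytes are then written separately, since file contents may include NUL bytes that string functions would cut short.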
If the requested file cannot be found on the server, the server must send a response header indicating the error. The following shows a typical response:
HTTP/1.1 <sp> 404 File Not Found <crlf>
Connection: <sp> close <crlf>
Content-Type: <sp> <Document-Type> <crlf>
<crlf>
<Error Message>
where:
  • <Document-Type> indicates the type of document (i.e. error message in this case) being sent. Since you are going to send a plain text message, this should be set to text/plain.
  • <Error Message> is a human readable description of the error in plain text format indicating the error (e.g. Could not find the specified URL. The server returned an error). Look at http_messages.c.
To reiterate, while this will get you started, more detailed information about HTTP requests and responses can be found in RFC 2616:
• RFC2616 Section 5 for information on how to parse request messages. The only method you are required to support is GET.
• RFC2616 Section 6 for information on how to form response messages.
5.2.1 Supported HTTP responses

At a minimum, your solution should implement the following response codes:

• 200 - OK
• 400 - Bad Request (incorrect request syntax)
• 401 - Unauthorized (Basic HTTP Authentication)
• 403 - Forbidden (access denied, e.g., the requested file does not permit read access)
• 404 - Not Found (requested resource doesn’t exist)
• 405 - Method Not Allowed (e.g., in response to something other than a GET request)
• 500 - Internal Server Error (internal failure, e.g., missing EOF on file read, pipe failure, fork failure, exec failure, etc.)
• 505 - HTTP Version Not Supported (anything other than HTTP/1.1 or HTTP/1.0)

5.3 Basic Server

You will implement an iterative HTTP server that uses the following basic algorithm:
• Open a passive socket
• Do forever:
– Accept a new TCP connection
– Read a request from the TCP connection and parse it
– Choose the appropriate response header depending on whether the URL requested is found on the server or not
– Write the response header to the TCP connection
– Write the requested data (by default you should respond with the contents of index.html, located at http-root-dir/htdocs/index.html) to the TCP connection
– Close the TCP connection

The server that you will implement at this stage will not be concurrent, meaning that it will not serve more than one client or request at a time (it queues the remaining requests while processing each request). Use the aforementioned daytime server as a reference for programming with sockets.

Note again that, in HTTP, all newlines should be “\r\n” (CRLF), not just “\n” (LF)!
5.3.1 Code Organization

Most of the logic for the server itself (e.g., concurrency modes, accepting client connections, responding to client, etc.) should be implemented in server.c. The http_messages.c file should contain code for dealing with HTTP requests and responses (e.g., parsing requests, forming responses, etc.).

You should utilize request handler functions and other helper functions for modularity and readability of code. The starter code provides some empty handlers in handlers.c that you can implement to handle different types of requests, as well as empty functions to parse requests and create responses in http_messages.c.
You should implement your HTTP server in server.c, http_messages.c, and handlers.c. The handle() function is a good place to start.

5.4 Basic HTTP Authentication

In this part, you will add basic HTTP authentication to your server. Your HTTP server may have some bugs and may expose security problems. You don’t want to expose this to the open internet. One way to mitigate this security risk is to implement basic HTTP authentication. You will implement the authentication scheme in RFC7617, aptly called “Basic HTTP Authentication.”

In Basic HTTP Authentication, you will check for an Authorization header field in all HTTP requests. If the Authorization header field isn’t present, you should respond with a status code of 401 Unauthorized with the following additional field:

WWW-Authenticate: Basic realm= <something> (you should change <something> to a realm ID of your choosing, such as “The Great Realm of CS252”).

When a browser receives this response, it knows to prompt for a username and password. The provided credentials (in the form of username:password) are then encoded in base64 format. The browser then sends the same request a second time, including the Authorization header containing the base64 encoded credentials.

You should create your own username/password combination and store it in auth.txt. You can manually encode it in base64 on most Linux systems with the following command:
$ cat auth.txt | base64
Make sure that there is no newline at the end of your file.

You can check this by running cat auth.txt and seeing where the prompt prints after the file.

There is a good chance your text editor will add this newline at the end, even if it doesn’t look like there is a blank line at the end of the file. If this happens, you can remove it by running truncate -s -1 auth.txt (this will delete the last byte in the file).

To illustrate the whole process, consider the following message sequence. The first time a browser tries to connect, it will send a request like the following:

GET /index.html HTTP/1.1
The server will initially respond with:
HTTP/1.1 401 Unauthorized
Connection: close
WWW-Authenticate: Basic realm="myhttpd-cs252"
After seeing this response, the client browser will prompt the user for a username/password.
The user supplies cs252 as the username and password as the password. The browser encodes cs252:password in base64 as Y3MyNTI6cGFzc3dvcmQ= and sends the following request to the server:
GET /index.html HTTP/1.1
Authorization: Basic Y3MyNTI6cGFzc3dvcmQ=

Note: you should create your own username and password and NOT use cs252 and password. You can modify username and password in file auth.txt, and use function return_user_pwd_string (in server.c) to load it. When loaded, the string is stored in a global variable g_user_pass as well. You shouldn’t use your Purdue career account credentials either.

Your server should then check that the request includes the line Authorization: Basic <username:password in base64> and make sure the base64 encoded string matches what the server expects. If it does, the server can proceed and generate the usual response (described in section 5.5). Otherwise, the server should continue responding with status code 401 to tell the client it is not authenticated.
For runtime base64 encoding, we encourage you to find an implementation on the web and use it.
Be sure to credit the source in your comments. You should not fork and use the base64 command line program.
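
For reference, the sketch below shows roughly what a runtime encoder looks like. It was written for illustration using the standard base64 alphabet and is not taken from the starter code; the name base64_encode and its signature are assumptions. The caller must provide an output buffer of at least 4 * ((len + 2) / 3) + 1 bytes.

```c
#include <stddef.h>

/* Encode len bytes from src into out as NUL-terminated base64.
 * base64_encode is a hypothetical helper written for illustration. */
void base64_encode(const unsigned char *src, size_t len, char *out) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;
    while (i + 2 < len) {               /* full 3-byte groups */
        unsigned v = ((unsigned)src[i] << 16) |
                     ((unsigned)src[i + 1] << 8) | src[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
        i += 3;
    }
    if (len - i == 1) {                 /* one leftover byte: two '=' pads */
        unsigned v = (unsigned)src[i] << 16;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {          /* two leftover bytes: one '=' pad */
        unsigned v = ((unsigned)src[i] << 16) | ((unsigned)src[i + 1] << 8);
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

Encoding the 14 bytes of cs252:password this way yields Y3MyNTI6cGFzc3dvcmQ=, matching the example request above. The encoded credentials from auth.txt can then be compared with the token that follows "Basic " in the Authorization header.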
After you add the basic HTTP authentication, you may serve other documents besides index.html.

Note that the client web browser will cache the authentication information after the server authenticates it the first time and will send it with every request, so your server should make sure the Authorization header is present and check it on every request.

For testing purposes, it is best to use a private window in your browser so when you quit out of the browser, it will clear the cached username and password.

5.5 Serving Static Files

When serving static content, web servers essentially expose a directory hierarchy. The root of this hierarchy is often called the DocumentRoot. Any request URLs are considered to be relative to that root.

Your webserver should serve http-root-dir/htdocs as its DocumentRoot. When your server receives a request for, e.g., /index.html, it should attempt to read and transmit the file http-root-dir/htdocs/index.html. If a request is made for a file that does not exist, the server should reply with an appropriate status code (likely 404 in this case). Requests for directories should attempt to serve the file named index.html in that directory (if that file is not present, then respond with 404).

You do not need to respect the Accept field of the request message. We do expect that you give a valid Content-Type in your response. You can call get_content_type() (defined in misc.h) to get this information for you. Be careful though! This function does not validate the filename. It simply runs file -biE <filename> and gives you the output. This can be dangerous if the request contains a malicious path.
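
One way to guard against a malicious path before touching the filesystem is sketched below. The helper url_to_path is hypothetical; it only rejects ".." path components and is not a complete defense (resolving the path with realpath() and checking that it stays under the DocumentRoot is another option).

```c
#include <stdio.h>
#include <string.h>

/* Map a request URL onto the DocumentRoot. Returns 0 on success, -1 if
 * the URL is not absolute, contains a ".." component, or does not fit.
 * url_to_path is a hypothetical helper for illustration. */
int url_to_path(const char *url, char *out, size_t cap) {
    if (url[0] != '/') {
        return -1;
    }
    /* Walk the components between slashes and refuse any that is "..". */
    const char *p = url;
    while (*p != '\0') {
        while (*p == '/') {
            p++;
        }
        size_t n = strcspn(p, "/");
        if (n == 2 && p[0] == '.' && p[1] == '.') {
            return -1;
        }
        p += n;
    }
    int w = snprintf(out, cap, "http-root-dir/htdocs%s", url);
    return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}
```

A rejected URL would typically be answered with an error status rather than being passed on to get_content_type().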
For this part, your server should support the following status codes: 200, 404. When grading for 404, we will only check that the status code was set correctly. You can provide any content message you wish.
Here is an example of an interaction between a client browser and webserver:
Client:
GET / HTTP/1.1
Authorization: Basic Y3MyNTI6cGFzc3dvcmQ=
Server:
HTTP/1.1 200 OK
Connection: close
Content-Type: text/html
Content-Length: 630

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>CS 252 HTTP Server</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
<h1>CS252 HTTP Server</h1>
<ul>
<li><a HREF="simple.html"> Simple Test</A>
<li><a HREF="complex.html"> Complex Test</A>
<li><a HREF="directory"> Browsing directory/</A>
<li><A HREF="cgi-bin/test-env"> cgi-bin: test-env</A>
<li><A HREF="cgi-bin/test-cgi"> cgi-bin: test-cgi</A>
<li><A HREF="cgi-bin/finger"> cgi-bin: finger</A>
<li><A HREF="cgi-bin/jj"> cgi-bin: jj</A>
</ul>
<hr>
</body>
</html>

Note there are 2 CRLFs after the last header (Content-Length), followed by the data for the document being requested.

Some additional notes:
  • The default request URL will be /, which should be treated as /index.html.
  • You will mostly be modifying server.c and the handle_htdocs() function in handlers.c for this part.
  • If you receive a request for a document that is a directory without a trailing forward slash (e.g. /dir1, not /dir1/), then you should really serve /dir1/index.html. That is, pretend as though requests for /dir1 are really requests for /dir1/index.html.
  • If a request has a trailing frontslash, you will handle this in the browsable directories part.

5.6 Concurrency

In this part, you will add concurrency to the server. You will implement three concurrency modes, distinguished by an input argument passed to the server. The concurrency modes you will implement are the following:
• -f : Create a new process for each request
Fork mode (run_forking_server()): You should handle each request in a new process. To do this, after you accept a connection off of your socket acceptor, you should call fork() and
execute your normal request/response logic from within the child. Don’t forget about zombies!
• -t : Create a new thread for each request
Thread-per-request mode (run_threaded_server()): You should handle each request in a
new thread. To do this, after you accept a connection off of your socket acceptor, you should
create a new thread and execute your normal request/response logic from within that thread.
You should consider using pthreads for this part.
• -pNUM_THREADS : Pool of threads
Pool of threads (run_thread_pool_server()): You should handle each request using a pool of n workers. n is denoted by NUM_THREADS. Your program should be using n + 1 threads during execution - the thread that your program starts in should be used to create the n workers and then wait for them to finish. Of course, since each worker is running in an infinite loop, they will only finish on an error or when the program exits.
• -h : Print usage
This flag will print out the usage of myhttpd and myhttpds.
The format of the command to run your server should be:
./myhttpd [-f|-t|-pNUM_THREADS] [-h] PORT_NO

If no flags are passed, the server should act like an iterative server as created in the Basic Server section. If port is not passed, choose your own default port number. Make sure it is larger than 1024 and less than 65536.

5.7 Server Termination

Webservers do not typically terminate on their own. However, to make it simpler to assess file descriptor and memory leaks, your implementation should. The mechanism to cause termination is a signal, specifically SIGUSR1. You should add a signal handler for SIGUSR1 that causes your program to cleanly terminate. All threads should halt, all memory that was allocated should be free()’d, and all file descriptors except for the three standard streams should be closed. We will test your server for leaks after delivering SIGUSR1 to the process.

5.8 Checkpoint Submission (45 points)

The deadline for the checkpoint is Monday, November 4, 2024 at 11:59 PM. You must run the following commands in order for your submission to be accepted:
$ make clean
$ make myhttpd
$ make submit_checkpoint
Grading for the checkpoint consists of 40 points from test cases, 2.5 points for proper memory management, and 2.5 points for closing the appropriate file descriptors.

6 The Assignment - Final

6.1 Secure HTTP (HTTPS)

HTTP messages are sent across the Internet unencrypted, which means that anyone who can sniff traffic along a message’s route can see anything transmitted between the client and server (think about that base64 encoded password from earlier...). This is often undesirable.

Fortunately a more secure version of HTTP exists, called HTTPS. HTTPS uses Transport Layer Security (TLS) to communicate. TLS encrypts all traffic at the transport layer of the Internet model, which means that you, as an application programmer, only need to worry about adding high-level support for communicating via TLS. That is, the underlying socket has changed, but the HTTP protocol you’ve implemented is unconcerned about what the socket does when your server asks to read or write data.

6.1.1 Basics

The skeleton code provided includes an abstraction of the socket layer that supports regular and encrypted communication. The “regular” version was provided. You now need to implement the functions declared in tls.h:

• int close_tls_socket(tls_socket *socket);
• int tls_write(tls_socket *socket, char *buf, size_t buf_len);
• int tls_read(tls_socket *socket, char *buf, size_t buf_len);
• tls_acceptor *create_tls_acceptor(int port);
• tls_socket *accept_tls_connection(tls_acceptor *acceptor);
• int close_tls_acceptor(tls_acceptor *acceptor);
To run your server with the TLS Socket, you should:
$ make myhttpsd
$ ./myhttpsd ...

When you navigate to your server with your internet browser, be sure to include the https:// prefix.

Note: After successfully porting the TLS “driver” to your web server, you will receive warning messages in your browser along the lines of “Connection is not Private.” This is because your SSL certificate was not signed by an accepted certificate authority, such as Comodo, LetsEncrypt, GlobalSign, etc. You can manually install your certificate as a trusted certificate in your browser, or you may simply add an exception. We trust you can figure out how to do this depending on the browser being used.

If you haven’t added the certificate exception when you first connect to your server, some browsers (like Firefox) will immediately close the connection—often before your server can write a response back. Your code will likely crash because you receive a SIGPIPE. You must safely handle this signal.

6.1.2 Example Code

We have provided a simple reference TLS server in the file examples/tls-server.c. Please look at this example since it initializes and accesses a TLS server in the same way your server will. You can build it with make tls-server and play around with it to learn how to use the OpenSSL library. The simple TLS server example assumes that your certificate and private key files are called cert.pem and key.pem and that you can bind to port 4433 - you will want to change this port number so that you can run the demo. You can generate the certificate and key by running the following command and filling in the prompts:

$ openssl req -newkey rsa:4096 -nodes -sha512 -x509 -days 21 \
    -out cert.pem -keyout key.pem
$ chmod 700 *.pem

6.1.3 Approach

The process of creating a TLS server is as follows:
1. Initialize OpenSSL by calling SSL_load_error_strings() and OpenSSL_add_ssl_algorithms(). This should be done at the start of your program (i.e., near the beginning of main()).
2. Create the master socket just like you would for the unencrypted server, and configure it to listen.
3. Create the server SSL_CTX with SSL_CTX_new() using the TLS_server_method() as the SSL_METHOD parameter.
• SSL_CTX_new(TLS_server_method()) will return a pointer to an SSL_CTX structure.
• An SSL context is essentially a collection of options and other configuration-related values that are used to create and maintain an SSL connection.
• See the create_context() function in the TLS example.
4. Configure the context’s SSL ECDH parameters with SSL_CTX_set_ecdh_auto().
5. Set the certificate and private key locations with SSL_CTX_use_certificate_file() and SSL_CTX_use_PrivateKey_file(). See the configure_context() function in the TLS example.
6. Accept a connection using accept() as usual.
7. Create the associated SSL structure for this connection using SSL_new(). Pass the earlier created context in as the argument.
8. Associate the current connection’s file descriptor with the previously created SSL structure using SSL_set_fd.
9. Call SSL_accept to establish the SSL connection.
10. Use SSL_read and SSL_write to interact with the client, just like we previously used recv() and send().
• Instead of passing in the client socket fd, you should pass in the pointer to the SSL structure obtained from SSL_new().
11. When done, close() the connection and clean up the SSL state by calling SSL_free and SSL_CTX_free.

Be sure to handle any errors and read the man pages for the above functions for more information. Once again, be sure to look carefully at the example in examples/tls-server.c since it clearly demonstrates how to do the above steps. A lot of the code can be copy/pasted into your server!

6.1.4 Code Organization

You will notice that this project uses conditional compilation to compile either a TLS or regular TCP server. In general, your server should use the functions in socket.c to open, close, read, or write sockets and accept connections, so implementing the functions in tls.c should be enough to create a working server since myhttpsd should invoke those functions in place of the regular functions.

6.2 Common Gateway Interface (CGI) Support

For this part, you will implement CGI-like behavior on your server. The Common Gateway Interface allows a server to handle dynamic requests and forward them on to an arbitrary executable, typically inside of a specified directory like cgi-bin.
When a request like this arrives:
GET <sp> /cgi-bin/<script>?{<var>=<val>&}*{<var>=<val>}<sp> HTTP/1.1 <crlf>
{<headers> <crlf>}*
<crlf>
the process or thread handling the request should use fork and execv to execute the program in cgi-bin/<script>.
There are two ways the variable-value pairs in {<var>=<val>&}*{<var>=<val>} are passed to the CGI script: the GET method and the POST method. You will implement the GET method and for extra points you may implement the POST method.
For the GET method, the string of variables {<var>=<val>&}*{<var>=<val>} is passed to the <script> program as an environment variable QUERY_STRING. It is up to the <script> program to decode this string. Also, you should set the REQUEST_METHOD environment variable to "GET".
The output of <script> should be sent back to the client.

The general sequence of events for handling a cgi-bin request then is:

1. Fork a child process
2. Set the environment variable REQUEST_METHOD=GET
3. Set the environment variable QUERY_STRING=(arguments after ?)
4. Redirect output of child process to the slave socket.
5. Print the following header:
   HTTP/1.1 200 Document follows <crlf>
   Server: Server-Type <crlf>
6. Execute script
The script or CGI program is responsible for outputting the content type and generating the associated output or data.
For more information on how cgi-bin works see the Apache documentation.
Note: CGI scripts generally expect to be able to write directly to the socket stream. In the project template provided, we form a response structure and write that to the socket. CGI scripts also expect to be able to set some of the header fields. You have a few options on how to tackle this problem. The first solution could be to parse the output of the CGI script and detect any headers that were set in the output. Another solution could be to write a partial HTTP message over the socket (just HTTP/1.1 200 OK) and allow the CGI script to write to the socket without the HTTP response structure involved at all.
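
The sequence of events listed above could be sketched as follows, under the assumption that client_fd is a plain TCP socket the child can write to directly (the TLS build would need a different arrangement, such as a pipe read by the parent). The names split_query and spawn_cgi are hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Split "/cgi-bin/script?a=b" in place at the first '?'.
 * Returns the query string ("" when there is none). */
char *split_query(char *url) {
    char *q = strchr(url, '?');
    if (q == NULL) {
        return url + strlen(url);
    }
    *q = '\0';
    return q + 1;
}

/* Fork and run script_path with its output sent to client_fd.
 * Returns the child's pid to the parent, or -1 if fork() fails. */
pid_t spawn_cgi(int client_fd, const char *script_path, const char *query) {
    pid_t pid = fork();
    if (pid != 0) {
        return pid;                     /* parent, or fork failure (-1) */
    }
    setenv("REQUEST_METHOD", "GET", 1);
    setenv("QUERY_STRING", query, 1);
    dup2(client_fd, STDOUT_FILENO);     /* script output goes to the client */
    const char header[] = "HTTP/1.1 200 Document follows\r\n"
                          "Server: Server-Type\r\n";
    if (write(STDOUT_FILENO, header, sizeof(header) - 1) < 0) {
        _exit(1);
    }
    char *const argv[] = { (char *)script_path, NULL };
    execv(script_path, argv);
    _exit(1);                           /* only reached if execv() fails */
}
```

The parent would normally wait for the child (or rely on its SIGCHLD handling) and then close the connection. The header is written with write() rather than printf() so that nothing is left in a stdio buffer when execv() replaces the process image.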

6.3 Loadable Modules

For this part, you will implement loadable modules. Loadable modules permit the dynamic (runtime) addition of executable code to your server. When the name of a CGI script ends with .so, instead of calling exec(), your server should load that module into memory using dlopen() (if it has not been previously loaded). Your server should then transfer control to this module by first looking up the function extern httprun(int ssock, char *query_string) using dlsym() and then calling httprun(), passing the slave socket fd and the query string as parameters.
httprun() will write the response to the slave fd using the parameters in query_string.
For example, a request of the form:
http://localhost:8080/cgi-bin/hello.so?a=b should cause your server to load the module hello.so into memory and then call the function httprun() in this module with ssock and query_string as parameters. It is up to the module to write the response to ssock.
Your server needs to keep track of what modules have already been loaded to ensure that dlopen() is not invoked multiple times for the same module.
The hello.c module can be found in the examples directory. You should run make inside the examples directory to build it.
There is an example of how to use loadable modules, named use-dlopen.c in your project 5 examples directory as well.
For this part, you should also transform http-root-dir/cgi-src/jj.c into a loadable module and name it jj-mod.c. Hint: Use the call fdopen to be able to use buffered and formatted I/O calls such as fprintf() to write to the slave socket fd. For example, at the top of httprun() in jj-mod.c one could write:
FILE *fssock = fdopen(ssock, "r+");
and then output formatted data:
fprintf(fssock, "tomato, and mayo.<P>%c", LF);
Don’t forget to close fssock at the end of httprun().
fclose(fssock);

6.4 Directory Browsing

In this stage you will add to your server the capacity to browse directories. If the <Document Requested> in the request is a directory, your HTTP server should return an HTML document with hyperlinks to the contents of the directory. Also, you should be able to visit and browse subdirectories contained in this directory.
An example of what a directory should look like is available at http-root-dir. Mimic the appearance and functionality of the directory browser found at that link. You may wish to read the man pages for opendir() and readdir().
Some notes:
• The name of the current directory should be shown at the top of the page in the format “Index of <directory>”
• The name, last modified date, and file size should each be a separate column (you do not need to worry about description).
• A “Parent Directory” link should always appear at the top of the list of files. Clicking this will take you to the parent directory of the current location. Clicking this button while in the highest explorable directory area will return to index.html.
• Icons should be implemented for the various file types. All icons can be found in http-root-dir/icons.
– The “Parent Directory” link should use the “back.gif” icon.
– Directories should use “folder.gif”
– Images of .gif, .png, .jpg, .jpeg, and .svg should use “image.gif”
– All other files should use “text.gif”
Resources
opendir, closedir, and readdir
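The icon rules and the directory walk can be sketched as follows. This is a minimal illustration, not the required implementation: the function names icon_for and write_listing_rows, the row markup, and the date format are all assumptions, the extension match is case-sensitive, and file names are not HTML-escaped.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Pick the icon file name for an entry, following the rules listed above.
 * The function name is hypothetical. */
const char *icon_for(const char *name, int is_dir) {
  static const char *image_exts[] = {".gif", ".png", ".jpg", ".jpeg", ".svg"};
  assert(name != NULL);
  if (is_dir) return "folder.gif";
  const char *dot = strrchr(name, '.');
  if (dot != NULL) {
    for (size_t i = 0; i < sizeof(image_exts) / sizeof(image_exts[0]); i++) {
      if (strcmp(dot, image_exts[i]) == 0) return "image.gif";
    }
  }
  return "text.gif";
}

/* Write one HTML table row per entry of dir_path to out.
 * Returns the number of rows written, or -1 if the directory cannot be opened.
 * Links are relative, which assumes the directory URL ends with '/'. */
int write_listing_rows(const char *dir_path, FILE *out) {
  assert(dir_path != NULL && out != NULL);
  DIR *dir = opendir(dir_path);
  if (dir == NULL) return -1;

  int rows = 0;
  struct dirent *ent;
  while ((ent = readdir(dir)) != NULL) {
    const char *name = ent->d_name;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) continue;

    /* Build the full path so stat() can report size and modification time. */
    char full[4096];
    int n = snprintf(full, sizeof(full), "%s/%s", dir_path, name);
    struct stat st;
    if (n < 0 || (size_t)n >= sizeof(full) || stat(full, &st) != 0) continue;

    int is_dir = S_ISDIR(st.st_mode);
    char mtime[32] = "-";
    struct tm tm_buf;
    if (localtime_r(&st.st_mtime, &tm_buf) != NULL) {
      strftime(mtime, sizeof(mtime), "%Y-%m-%d %H:%M", &tm_buf);
    }

    fprintf(out, "<tr><td><img src=\"/icons/%s\" alt=\"\"></td>"
                 "<td><a href=\"%s%s\">%s</a></td><td>%s</td>",
            icon_for(name, is_dir), name, is_dir ? "/" : "", name, mtime);
    if (is_dir) {
      fprintf(out, "<td>-</td></tr>\n");
    } else {
      fprintf(out, "<td>%lld</td></tr>\n", (long long)st.st_size);
    }
    rows++;
  }
  closedir(dir);
  return rows;
}
```

The “Parent Directory” row is left out on purpose; writing it separately before these rows keeps it at the top of the list.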

7 Extra Credit

7.1 Sorting

Implement sorting by name, size, and modification time for the Directory Browsing section. The “Parent Directory” link should remain on top despite sorting.
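A possible approach is to collect the directory entries into an array and sort it with qsort(), emitting the “Parent Directory” row separately so it is never part of the sorted data. The sketch below assumes a hypothetical dir_entry struct and sort_key enum; how the sort key is chosen (for example from a query string) is not specified here and is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical record for one directory entry. */
typedef struct {
  const char *name;
  long long size;
  time_t mtime;
} dir_entry;

typedef enum { SORT_BY_NAME, SORT_BY_SIZE, SORT_BY_MTIME } sort_key;

static int cmp_name(const void *a, const void *b) {
  const dir_entry *x = a, *y = b;
  return strcmp(x->name, y->name);
}

/* Ties fall back to the name so the order is deterministic. */
static int cmp_size(const void *a, const void *b) {
  const dir_entry *x = a, *y = b;
  if (x->size != y->size) return (x->size > y->size) - (x->size < y->size);
  return cmp_name(a, b);
}

static int cmp_mtime(const void *a, const void *b) {
  const dir_entry *x = a, *y = b;
  if (x->mtime != y->mtime) return (x->mtime > y->mtime) - (x->mtime < y->mtime);
  return cmp_name(a, b);
}

/* Sort entries in ascending order by the chosen key.
 * The "Parent Directory" row is not stored in this array. */
void sort_entries(dir_entry *entries, size_t count, sort_key key) {
  if (entries == NULL || count < 2) return;
  int (*cmp)(const void *, const void *) = cmp_name;
  if (key == SORT_BY_SIZE) {
    cmp = cmp_size;
  } else if (key == SORT_BY_MTIME) {
    cmp = cmp_mtime;
  }
  qsort(entries, count, sizeof(entries[0]), cmp);
}
```

Descending order could be obtained by walking the sorted array backwards when writing the rows.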

7.2 Statistics

Implement a page http://localhost:<port>/stats with the following:
• The name of the student who wrote the project
• The time the server has been up (the amount of time myhttpd has been running)
• The number of requests since the server started
• The minimum service time and the URL request that took this time.
• The maximum service time and the URL request that took this time.
The service time is the time it takes to service a request, measured from when the request is accepted until the socket is closed. Use the function clock_gettime to measure the duration of each request, and link your program with -lrt.
The response should include at least a Content-Type header with the value text/plain. Time values should be displayed in the format <sec>:<nsec>, where <sec> is the number of seconds and <nsec> is the number of nanoseconds. For reference, see clock_gettime().

The number of requests should be updated after the socket is closed.

The statistics should be displayed in the following format:
Author: Richard
Running Time: 5:607325026
Number of Requests: 2
Max Time: 0:11383527
Max URL: /
Min Time: 0:3329152
Min URL: /style.css

7.3 Logging

Write and maintain a file named myhttpd.log in the directory containing your server executable that contains a list of all requests. Each line should include:
• The source host of the request
• The URL requested
• The response code
Each line should use the format [host] route (status). An invalid request URI should be logged as <undefined>.
Below is a sample log containing three entries:
[72.12.204.60] /cgi-bin/date.so (404)
[72.12.204.60] / (200)
[72.12.204.60] <undefined> (500)
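Producing a log line in this format can be sketched as follows. The function names and the fixed buffer size are assumptions; a NULL uri stands for an invalid request URI. If several threads or processes write the log, access to the file would also need to be serialized, which this sketch does not do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one entry as "[host] route (status)".
 * A NULL uri is written as <undefined>. Returns what snprintf() returns. */
int format_log_line(char *buf, size_t size, const char *host,
                    const char *uri, int status) {
  assert(buf != NULL && host != NULL);
  if (uri == NULL) {
    uri = "<undefined>";
  }
  return snprintf(buf, size, "[%s] %s (%d)", host, uri, status);
}

/* Append one entry to the log file at log_path (for example "myhttpd.log").
 * Returns 0 on success, -1 on failure. */
int append_log_line(const char *log_path, const char *host,
                    const char *uri, int status) {
  char line[2048];
  int n = format_log_line(line, sizeof(line), host, uri, status);
  if (n < 0 || (size_t)n >= sizeof(line)) return -1;

  FILE *log = fopen(log_path, "a");   /* "a" appends and creates if missing */
  if (log == NULL) return -1;
  int ok = fprintf(log, "%s\n", line) >= 0;
  if (fclose(log) != 0) ok = 0;
  return ok ? 0 : -1;
}
```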

8 Testing

The primary methods for testing your server should be web browsers like Firefox with the developer console. If your browser is not providing useful information, you can also use tools such as telnet, netcat, or curl.
$ telnet localhost 4747
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK
Connection: close
Content-Length: 12

Hello CS252!
Connection closed by foreign host.
$ printf "GET / HTTP/1.1\r\n\r\n" | netcat localhost 4747
HTTP/1.1 200 OK
Connection: close
Content-Length: 12

Hello CS252!
$ curl localhost:4747
Hello CS252!
ApacheBench (ab) is a useful tool for benchmarking and stress testing web servers. We will use it for robustness testing. You can read more about it on Wikipedia or at Apache.
Please note, ApacheBench is NOT a good way to test if your basic implementation is correct. For that you should be testing with one of the methods above (curl, netcat, telnet, firefox). ApacheBench is good for checking that your server works under load after you know it serves correctly. In particular it can be useful for testing concurrency.
To use ApacheBench, first create a file called auth.txt and place your username and password in it with the format
<username>:<password>
The following command requests index.html 1000 times with 25 concurrent requests from an HTTP server on localhost running on port 8080. You will need to adjust these parameters to fit your server:
~cs252/bin/ab -B "auth.txt" -n 1000 -c 25 -r "http://127.0.0.1:8080/index.html"
Here is a general pattern for executing ApacheBench:
~cs252/bin/ab -B "auth.txt" -n ${n_reqs} -c ${n_conc} -r "${proto}://${host}:${port}${path}"
Final notes:
  • Be sure to place your username and password in the auth file per above and use the -B option over the -A option. If you do not, others will be able to see your credentials when e.g. running ps.
  • ab doesn’t work when using the hostname localhost, so use the IP address (127.0.0.1) instead when benchmarking locally.
  • If you add the cs252 bin directory to your PATH, you can just run ab. Add export PATH=$PATH:~cs252/bin to your preferred shellrc (.bashrc, .zshrc, etc.).

9 Final Submission (55 points)

The deadline for your complete solution is Monday, November 11, 2024 at 11:59 PM.
$ make clean
$ make myhttpds
$ make submit_final
Grading for the final submission consists of 50 points from test cases, 2.5 points for proper memory management, and 2.5 points for closing the appropriate file descriptors.
