CSEE4119 F24 Computer Networks
Project 1: Video CDN
1. Overview
In this project, you will explore aspects of how streaming video works, as well as socket programming and HTTP. In particular, you will implement adaptive bitrate selection. The programming languages and packages are specified in the development environment section.
We will do this in multiple stages:
1. Preliminary stage: building a simple proxy
2. Intermediate stage: requesting and receiving video chunks
3. Final stage: implementing adaptive bitrate streaming
1.1 In the Real World
Figure 1 depicts (at a high level) what this system looks like in the real world. Clients trying to stream a video first issue a DNS query to resolve the service’s domain name to an IP address for one of the content servers operated by a content delivery network (CDN). The CDN’s authoritative DNS server selects the “best” content server for each particular client based on (1) the client’s IP address (from which it learns the client’s geographic or network location) and (2) current load on the content servers (which the servers periodically report to the DNS server).
Once the client has the IP address for one of the content servers, it begins requesting chunks of the video the user requested. The video is encoded at multiple bitrates. As the client player receives video data, it calculates the throughput of the transfer and monitors how much video it has buffered to play, and it requests the highest bitrate the connection can support without running out of video in the playback buffer.
1.2 Your System
Implementing an entire CDN is clearly a tall order, so let’s simplify things. First, your entire system will run on one host; we’re providing a network simulator netsim (described in Development Environment) that will allow you to run several processes with arbitrary IP addresses on one machine. Our simulator also allows you to assign arbitrary link characteristics (bandwidth and latency) to the path between each pair of “end hosts” (processes). For this project, you will do your development and testing using a virtual machine (VM) we provide.
Browser. You’ll use an off-the-shelf web browser (e.g. Chrome) to play videos served by your CDN (via your proxy).
Proxy. Rather than modify the video player itself, you will implement adaptive bitrate selection in an HTTP proxy. The player requests chunks with standard HTTP GET requests. Your proxy will intercept these and modify them to retrieve whichever bitrate your algorithm deems appropriate. To simulate multiple clients, you will launch multiple instances of your proxy.
Web Server. Video content will be served from an off-the-shelf web server (Apache). More detail is in the Development Environment section. As with the proxy, you can run multiple instances of Apache on different fake IP addresses to simulate a CDN with several content servers. However, in the assignment, rather than using DNS redirection like a CDN would, the proxy will contact a particular server via its IP address (without a DNS lookup). We also ask you to implement a DNS server that decides which server to direct the proxy to.
The project is broken up into three stages (plus the initial set up to get you ready for the stages):
● In the preliminary stage, you will implement a simple proxy that sequentially handles clients and passes messages back and forth between client and server without modifying the messages.
● In the intermediate stage, you will extend the proxy to forward HTTP requests between a web browser and web server.
● In the final stage, you will extend the proxy to implement the full functionality described above, with the proxy modifying HTTP requests to perform bitrate adaptation.
1.3 Groups and collaboration policy
This is an individual project, but you can discuss it at a conceptual level with other students or consult Internet material (excluding implementations of Python proxies), as long as the final code and configuration you submit is completely yours and as long as you do not share code or configuration. Before starting the project, be sure to read the collaboration policy at the end of this document.
2. Preliminary stage: Proxy Set-up
The preliminary stage was meant to help you understand socket programming and prepare you for building a proxy between web browsers (clients) and video servers. Moving into the intermediate stage, however, the requirements are different enough that you may want to use your preliminary stage code as a roadmap rather than using it as starter code. Either way, please read the preliminary stage description again, if you have trouble building a proxy.
3. Intermediate stage: Chunks!
The goal of the intermediate stage was to help familiarize you with the patterns of HTTP and how to build a proxy to forward requests and responses between clients and servers. Unlike with the preliminary stage, it is likely that you will be able to use your intermediate proxy as a foundation for the final stage. However, if you find yourself having issues with the fundamentals of designing an HTTP proxy, it may be worth revisiting the intermediate stage description.
4. Final stage: Video Bitrate Adaptation
Many video players (e.g. Netflix, Twitch, YouTube) monitor how quickly they receive data from the server and then use this throughput value to request higher or lower quality encodings of the video with the goal of streaming the highest quality encoding that the connection can handle. Rather than modifying an existing video client to perform bitrate adaptation, you will implement this functionality in an HTTP proxy through which your browser will direct requests.
After enhancing your proxy with video bitrate adaptation capabilities, you will then implement a basic DNS server that listens for queries for the domain “video.columbia.edu” and returns a web server IP address corresponding to the “best” web server. More details are in DNS Server Implementation.
4.1 Requirements
(1) Implement basic video adaptation functionality in your proxy. Your proxy should calculate the throughput it receives from the video server and select the best bitrate for the connection. See Proxy Implementation Details for details.
(2) Evaluate your basic video adaptation function. Running the dumbbell topology (via netsim) will create two servers (listening on the IP addresses in topo1.servers: 3.0.0.1:8080 and 4.0.0.1:8080); you should direct one proxy to EACH server. See Sample output for details. Now:
(a) (Place yourself in ~/csee_4119_abr_project/netsim) Start playing the video through each proxy. Run the topo1 events file and direct netsim.py to generate a log file: ./run_netsim.sh -l ../topos/topo1 run After 1 minute, stop video playback and kill the proxies.
(b) (Place yourself in ~/csee_4119_abr_project/plot) Gather the netsim log file and the log files from your proxy and use them to generate plots for link utilization, fairness, and smoothness. Use our grapher.py script to do this: ./grapher.py .
(c) Repeat this process for alpha = .1, .5, .9. Generate 9 plots (from (b)) and compile them into a single PDF. In the PDF, include a brief discussion of the tradeoffs and trends when varying alpha. We are looking for a paragraph or two, at most.
4.2 Proxy Implementation Details
You are implementing a simple HTTP proxy. It accepts connections from web browsers, modifies video chunk requests as described below, opens a connection with the web server’s IP address, and forwards the modified request to the server. Any data (the video chunks) returned by the server should be forwarded, unmodified, to the browser.
Your proxy should listen for connections from a browser on any IP address on the port specified as a command line argument (see below). Your proxy should accept multiple concurrent connections from web browsers (possibly by starting a new thread or process for each new request). When it connects to a server, it should first bind the socket to the fake IP address specified on the command line (note that this is somewhat atypical: you do not ordinarily bind() a client socket before connecting). Figure 4 depicts this.
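The two socket set-ups described above can be sketched as follows. This is a minimal illustration, not a required structure: the function names make_listener and connect_from are hypothetical, and a real proxy would add a per-connection thread and error handling around them.

```python
import socket


def make_listener(listen_port):
    """Return a TCP socket that accepts browser connections on any local IP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts of the proxy on the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", listen_port))  # "" means all local addresses
    sock.listen()
    return sock


def connect_from(fake_ip, server_ip, server_port):
    """Open a TCP connection to the server with fake_ip as the source address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS to pick any free source port on the fake IP.
    sock.bind((fake_ip, 0))
    sock.connect((server_ip, server_port))
    return sock
```

With netsim running, a call such as connect_from("1.0.0.1", "3.0.0.1", 8080) would use the client and server addresses that appear elsewhere in this document.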
4.2.1 Simple ABR Algorithm
The simple ABR algorithm looks at bandwidth available to a client, and requests bitrates that the connection can likely handle. Hence, there are two functions: throughput calculation and bitrate requests.
4.2.1.1 Throughput Calculation
Your proxy could estimate each stream’s throughput once per chunk as follows. Note the start time, ts, of each chunk request (i.e., import the “time” module and save a current timestamp using time.time() when your proxy receives a request from the player). Save another timestamp, tf, when you have finished receiving the chunk from the server. Now, given the size of the chunk, B, you can compute the throughput, T, your proxy saw for this chunk (to get the size of each chunk, parse the received data and find the “Content-Length” parameter):
T = B / (tf − ts)
To smooth your throughput estimate, your proxy should use an exponentially-weighted moving average (EWMA). Every time you make a new throughput measurement (Tnew), update your current throughput estimate as follows:
Tcurrent = α · Tnew + (1 − α) · Tcurrent
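As a rough sketch of the per-chunk measurement, the helpers below extract Content-Length from a response header and convert a chunk size and two timestamps into Kbps. The function names are hypothetical, and the sketch assumes 1 Kbps = 1000 bits per second.

```python
import re

# Header names are case-insensitive in HTTP, so match accordingly.
CONTENT_LENGTH_RE = re.compile(rb"content-length:\s*(\d+)", re.IGNORECASE)


def content_length(header_bytes):
    """Return the Content-Length value from raw header bytes, or None."""
    match = CONTENT_LENGTH_RE.search(header_bytes)
    return int(match.group(1)) if match else None


def throughput_kbps(size_bytes, t_start, t_finish):
    """Throughput in Kbps for size_bytes received between two timestamps."""
    duration = t_finish - t_start
    if duration <= 0:
        raise ValueError("t_finish must be later than t_start")
    # bytes -> bits -> kilobits, divided by elapsed seconds
    return size_bytes * 8 / 1000 / duration
```

In the proxy, t_start and t_finish would be the time.time() values saved when the request arrives and when the last byte of the chunk has been received.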
The constant 0 ≤ α ≤ 1 controls the tradeoff between a smooth throughput estimate (α closer to 0) and one that reacts quickly to changes (α closer to 1). You will control α via a command line argument.
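A minimal sketch of the EWMA update follows. Seeding the estimate with the first measurement is an assumption of this sketch; the handout does not say how the estimate should be initialized.

```python
def ewma_update(current, new, alpha):
    """Return the updated throughput estimate after a new measurement.

    current is the previous estimate (None before the first chunk),
    new is the latest per-chunk throughput, alpha is in [0, 1].
    """
    if current is None:
        # Assumption: the first measurement becomes the initial estimate.
        return new
    return alpha * new + (1 - alpha) * current
```

For example, with alpha = 0.5 an estimate of 1000 Kbps and a new measurement of 2000 Kbps give 1500 Kbps; with alpha = 0 the estimate never moves, and with alpha = 1 it always equals the latest measurement.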
You should maintain separate throughput estimates for each
4.2.1.2 Choosing a Bitrate
Once your proxy has calculated the connection’s current throughput, it should select the highest offered bitrate the connection can support. For this project, we say a connection can support a bitrate if the average throughput is at least 1.5 times the bitrate. For example, before your proxy should request chunks encoded at 1000 Kbps, its current throughput estimate should be at least 1.5 Mbps.
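The 1.5x rule can be sketched as below. The function name is hypothetical, and falling back to the lowest bitrate when none is supported is an assumption of this sketch rather than a stated requirement.

```python
def select_bitrate(avg_tput_kbps, bitrates_bps):
    """Pick the highest bitrate (in bps) that the throughput estimate supports.

    A bitrate is supported when avg_tput_kbps >= 1.5 * bitrate in Kbps.
    """
    ordered = sorted(bitrates_bps)
    chosen = ordered[0]  # assumption: use the lowest bitrate if none qualifies
    for bitrate in ordered:
        if avg_tput_kbps >= 1.5 * (bitrate / 1000):
            chosen = bitrate
    return chosen
```

With illustrative bitrates of 500000 and 1000000 bps, an estimate of 1500 Kbps selects 1000000, while 1499 Kbps selects 500000.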
Your proxy should learn which bitrates are available for a given video by parsing the manifest file (the “.mpd” initially requested at the beginning of the stream). The manifest is encoded in XML with each encoding of the video described by a element, whose bitrate attribute you should find (the manifest file uses the term ‘bandwidth’, but please treat it as bitrate).
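One way to extract the bitrates with the re package is sketched below. It matches every bandwidth="..." attribute in the manifest text, which assumes that only the video encodings carry that attribute; check this against the actual manifest before relying on it.

```python
import re

BANDWIDTH_RE = re.compile(r'bandwidth="(\d+)"')


def parse_bitrates(manifest_text):
    """Return the sorted, de-duplicated bitrates (bps) found in the manifest."""
    return sorted({int(value) for value in BANDWIDTH_RE.findall(manifest_text)})
```

The result can be passed directly to whatever function chooses a bitrate, so that no bitrate values need to be hardcoded.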
Your proxy replaces each chunk request with a request for the same chunk at the selected labeled bitrate by modifying the HTTP request’s Request-URI. Video chunk URIs are structured as follows:
/path/to/video/bunny_bps/BigBuckBunny_6s
For example, suppose the player requests chunk 2 of the video Big Buck Bunny at bitrate label 45514:
/path/to/video/bunny_45514bps/BigBuckBunny_6s2
To switch to a higher bitrate, e.g., bitrate label 1006743, the proxy should modify the URI like this:
/path/to/video/bunny_1006743bps/BigBuckBunny_6s2
IMPORTANT: The video player requests BigBuckBunny_6s_nolist.mpd, which is also stored on the servers. It does not list the available bitrates (actually, it only lists a single bitrate label, 1006743, therefore you should expect the browser to always send you queries with 1006743 as the desired bitrate label for any sequence). Your proxy should, however, fetch BigBuckBunny_6s.mpd for itself (i.e., don’t return it to the client) so you can parse the list of available encodings as described above. Don’t hardcode the bitrates!
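The two URI manipulations described in this section can be sketched with the re package. Both function names are hypothetical, and the sketch assumes the bitrate label always appears as "_<digits>bps/" in the path and that the player's manifest name ends in "_nolist.mpd", as in the examples above.

```python
import re

BITRATE_IN_URI_RE = re.compile(r"_(\d+)bps/")


def rewrite_chunk_uri(uri, bitrate_label):
    """Replace the bitrate label in a chunk URI with the selected one."""
    return BITRATE_IN_URI_RE.sub(f"_{bitrate_label}bps/", uri, count=1)


def full_manifest_uri(uri):
    """Map the player's *_nolist.mpd request to the full manifest URI."""
    return uri.replace("_nolist.mpd", ".mpd")
```

The proxy would fetch full_manifest_uri(...) for its own parsing while still forwarding the unmodified _nolist.mpd request on behalf of the player.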
4.2.2 Logging
We require that your proxy create a log of its activity in a very particular format for scoring and graphing. After each response for a requested chunk your proxy should append the following line to the log:
time: The current time in seconds since the epoch.
duration: A floating point number representing the number of seconds it took to download this chunk from the server to the proxy.
tput: The throughput you measured for the current link in Kbps.
avg-tput: Your current EWMA throughput estimate in Kbps.
chunk-bitrate: The actual bitrate your proxy requested for this chunk in Kbps (NOT LABELED). Example: for a chunk with bitrate 1006743 bps, the actual bitrate should be 1006743 / 1000 = 1007 Kbps.
server-ip: The IP address of the server to which the proxy forwarded this request.
chunkname: The name of the file your proxy requested from the server (that is, the modified file name in the modified HTTP GET message).
Please make sure to flush your log to disk after every line so that we can access the log while your proxy is still running. Failure to do so will likely result in no points on some tests.
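A sketch of writing and flushing one log entry follows. The exact line format is not reproduced in this document, so the space-separated layout in the field order listed above is an assumption; adjust it to match the required format. The function names are hypothetical.

```python
def format_log_line(now, duration, tput_kbps, avg_tput_kbps,
                    bitrate_bps, server_ip, chunkname):
    """Build one log line (assumed layout: space-separated fields)."""
    chunk_kbps = round(bitrate_bps / 1000)  # e.g. 1006743 bps -> 1007 Kbps
    fields = [now, duration, tput_kbps, avg_tput_kbps,
              chunk_kbps, server_ip, chunkname]
    return " ".join(str(field) for field in fields)


def append_log_line(log_file, line):
    """Write one line and flush so the log is readable while the proxy runs."""
    log_file.write(line + "\n")
    log_file.flush()
```

Here log_file would be a file object opened once in append mode at start-up and kept open for the lifetime of the proxy.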
4.2.3 Running the Proxy
You should create an executable Python script called proxy under the proxy directory, which should be invoked as follows:
cd ~/csee_4119_abr_project/proxy
./proxy
log: The file path to which you should log the messages described in Logging.
alpha: A float in the range [0, 1]. Use this as the coefficient in your EWMA throughput estimate.
listen-port: The TCP port your proxy should listen to for accepting connections from your browser.
fake-ip: Your proxy should bind to this IP address for outbound connections to the web servers. The fake-ip can only be one of the clients’ IP addresses under the network topology you specified (see Network Simulation). The main reason why we are doing this is because netsim emulates a network topology where it can manipulate throughput on links between end-hosts. If we bind the outbound socket to this fake-ip, then we can be sure that packets sent between the proxy and the server traverse ONLY the links set by netsim. (and here is why you want your packets to traverse netsim links only)
dns-server-port: The port on which you run your DNS server (see Running the DNS Server)
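Argument handling might look like the sketch below. The argument order is taken from the sample invocation in Sample output (./proxy ../topos/topo1 log1.txt 0.5 8000 1.0.0.1 53), where the first argument is a topology directory; the function name and the returned dictionary keys are hypothetical.

```python
def parse_proxy_args(argv):
    """Parse [script, topo-dir, log, alpha, listen-port, fake-ip, dns-server-port]."""
    if len(argv) != 7:
        raise ValueError("expected 6 arguments after the script name")
    topo_dir, log_path, alpha, listen_port, fake_ip, dns_port = argv[1:]
    alpha = float(alpha)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0, 1]")
    return {
        "topo_dir": topo_dir,
        "log": log_path,
        "alpha": alpha,
        "listen_port": int(listen_port),
        "fake_ip": fake_ip,
        "dns_server_port": int(dns_port),
    }
```

In the script itself this would be called as parse_proxy_args(sys.argv) after importing sys.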
4.2.4 Sample output
You can get a piece of output like the sample by the following steps:
1. Place yourself in ~/csee_4119_abr_project/netsim. Start netsim using topology1 by running:
./run_netsim.sh ../topos/topo1 start
2. Place yourself in ~/csee_4119_abr_project/proxy. Start your DNS server (see DNS Server Implementation for more information) by running:
./dns_server
3. In a separate window, place yourself in ~/csee_4119_abr_project/proxy. Start your proxy server by running:
./proxy ../topos/topo1 log1.txt 0.5 8000 1.0.0.1 53
4. Start to play the video through your proxy by pointing the web browser to:
5. Generate the network events by running:
./run_netsim.sh -l netsim_log.txt ../topos/topo1 run
You may run ./proxy ../topos/topo1 log2.txt 0.5 8001 2.0.0.1 53 along with step 3 and also point the web browser to http://localhost:8001 to start and use a second proxy.
Example output:
(sample log1.txt output with the DNS server directing your proxy to web server 4.0.0.1)
Note: Your log file will never be exactly the same. There are some fields that change between runs and some that should not. Keep this in mind when checking against this example.
4.3 DNS Server Implementation
Authoritative DNS servers, among other things, respond to queries for domains they manage with IP addresses at which clients can find those domains. Your job is to implement a simple DNS server that manages the domain “video.columbia.edu” and returns an IP address for the “best” web server at each point in time.
Your DNS server should listen on a UDP port for DNS queries, and respond to ones for A records whose question fields are “video.columbia.edu”. As you are not allowed to use external libraries, you must interpret raw DNS messages and construct DNS messages from scratch. You do not need to implement the whole DNS protocol, but you should be able to respond to queries for “video.columbia.edu” from both your proxy and command line tools like “dig”, and disregard queries for other domains.
4.3.1 Basic Functionality
You should listen on a UDP port and IP address specified by command line arguments for incoming DNS queries from any client, and respond to those queries if they ask about a domain you control.
DNS messages should be in the DNS protocol format, specified in RFC 1035. To make your life easier, here are some tips:
AA Set this to 0 in requests, 1 in responses.
RD Set this to 0 in all messages.
RA Set this to 0 in all messages.
Z Set this to 0 in all messages.
NSCOUNT Set this to 0 in all messages.
ARCOUNT Set this to 0 in all messages.
QTYPE Set this to 1 in all requests (asking for an A record).
QCLASS Set this to 1 in all requests (asking for an IP address).
TYPE Set this to 1 in all responses (returning an A record).
CLASS Set this to 1 in all responses (returning an IP address).
TTL Set this to 0 in all responses (no caching).
Your DNS server will operate over UDP, on an IP address specified in the topo{NUMBER}.dns file (given to you), and on a port specified as a command line argument. Your server need only respond to requests for “video.columbia.edu”. Any other requests should generate a response with RCODE 3.
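The header and record layout from RFC 1035, combined with the field values listed above, can be sketched as follows. This is an illustration under stated assumptions, not a complete server: it assumes a single, uncompressed question per query, treats any query that is not an A query for the domain as "other" (RCODE 3), and uses int.to_bytes rather than the struct module because struct is not on the list of allowed packages. The function names are hypothetical.

```python
import socket

DOMAIN = "video.columbia.edu"


def read_qname(message, offset=12):
    """Return (dotted name, offset just past the name) for the first question."""
    labels = []
    while True:
        length = message[offset]
        offset += 1
        if length == 0:
            break
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset


def build_response(query, server_ip):
    """Build the raw reply to a raw DNS query, answering with server_ip."""
    name, end = read_qname(query)
    question = query[12:end + 4]  # QNAME + QTYPE + QCLASS, echoed back
    qtype = int.from_bytes(query[end:end + 2], "big")
    known = name.lower() == DOMAIN and qtype == 1
    # QR=1 and AA=1 give 0x8400; RCODE 3 is added for anything else.
    flags = 0x8400 if known else 0x8403
    header = (query[:2]                                   # copy the query ID
              + flags.to_bytes(2, "big")
              + (1).to_bytes(2, "big")                    # QDCOUNT
              + (1 if known else 0).to_bytes(2, "big")    # ANCOUNT
              + bytes(4))                                 # NSCOUNT, ARCOUNT = 0
    if not known:
        return header + question
    answer = (b"\xc0\x0c"                  # pointer to the QNAME at offset 12
              + (1).to_bytes(2, "big")     # TYPE = A
              + (1).to_bytes(2, "big")     # CLASS = IN
              + bytes(4)                   # TTL = 0
              + (4).to_bytes(2, "big")     # RDLENGTH
              + socket.inet_aton(server_ip))
    return header + question + answer
```

A server loop would call recvfrom() on its UDP socket, pass the received bytes to build_response together with the chosen web server IP, and send the result back with sendto().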
Your DNS server should read the IP address it should listen on from the topo{NUMBER}.dns file in the topology directory you specify as a command line argument. Your DNS server should read the available web server IP addresses in the topo{NUMBER}.servers file in that same directory. See Running the DNS Server for more information about command line arguments.
Your DNS server should periodically measure latency to each of the available web servers, regardless of whether it uses this information to make decisions. To test whether you are correctly measuring latency, use artificial latencies generated by the network simulator. The -I option of the ping tool might come in handy.
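One possible way to take a latency measurement is to run ping through subprocess and parse its output, as sketched below. The function names are hypothetical, and the regular expression assumes the usual Linux ping output line containing "time=<value> ms"; verify this against the ping installed on the VM.

```python
import re
import subprocess

PING_TIME_RE = re.compile(r"time=([\d.]+) ms")


def parse_ping_ms(output):
    """Return the first round-trip time (ms) found in ping output, or None."""
    match = PING_TIME_RE.search(output)
    return float(match.group(1)) if match else None


def measure_latency_ms(source_ip, server_ip):
    """Send one ping from source_ip to server_ip; None if it fails."""
    try:
        output = subprocess.check_output(
            ["ping", "-c", "1", "-I", source_ip, server_ip],
            text=True, timeout=5)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    return parse_ping_ms(output)
```

Running measure_latency_ms for every address in the .servers file from a background thread would give the periodic measurements described above.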
4.3.2 Choosing the “Best” Web Server
Your DNS server should support two decision functions for returning valid records. The decision function is specified in string format via a command-line argument. It should be set upon server instantiation, and does not change over the lifetime of the program.
1. “round-robin” : the returned video server IP address should change with each query. Mimics a case where a DNS server load-balances requests across multiple servers.
2. “lowest-latency” : your DNS server should return the video server IP address to which it has the lowest latency. Mimics a case where the DNS server is geographically close to the clients it serves, and assumes a well-performing web server for itself will also perform well for its clients.
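The two decision functions can be sketched as a small class. The class and method names are hypothetical, and returning the first server when no latency has been measured yet is an assumption of this sketch.

```python
class ServerSelector:
    """Chooses a web server IP using a fixed decision method."""

    def __init__(self, servers, method):
        if method not in ("round-robin", "lowest-latency"):
            raise ValueError("unknown decision method: " + method)
        self.servers = list(servers)
        self.method = method
        self.latency_ms = {}  # most recent measurement per server IP
        self._next = 0

    def record_latency(self, server_ip, latency_ms):
        self.latency_ms[server_ip] = latency_ms

    def choose(self):
        if self.method == "round-robin":
            server_ip = self.servers[self._next % len(self.servers)]
            self._next += 1
            return server_ip
        measured = {ip: self.latency_ms[ip]
                    for ip in self.servers if ip in self.latency_ms}
        if not measured:
            return self.servers[0]  # assumption: no measurements yet
        return min(measured, key=measured.get)
```

If measurements are recorded from a separate thread, access to latency_ms should be protected with a lock from the threading module.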
4.3.3 Logging
We require that your DNS server create a log of its activity in a very particular format for scoring. After each response to a client, your DNS server should append the following line to the log:
“request-report”
time: The current time in seconds since the epoch.
“request-report”: Just the string “request-report” without quotes.
decision-method: The decision method your server used to compute the returned web server IP. One of “round-robin” or “lowest-latency”.
returned-web-server-ip: The IP address of the web server you answer the DNS query with.
Do not log requests for domains you do not control. After each measurement to a web server, your DNS server should append the following line to the log for EACH video server:
“measurement-report”
time: The current time in seconds since the epoch.
measurement-report: Just the string “measurement-report” without quotes.
video-server-ip: Video server to which the latency measurement corresponds.
latency: Measured latency, in milliseconds, to the video-server-ip.
Please make sure to flush your logs to disk after every line so that we can access the log while your DNS server is still running. Failure to do so will likely result in no points on some tests.
4.3.4 Running the DNS Server
You should create an executable Python script called dns_server under the proxy directory, which should be invoked as follows:
cd ~/csee_4119_abr_project/proxy
./dns_server
topo-dir: Directory corresponding to the topology you’re using in the experiment. Here your DNS server will find the IP address it should listen on, and the video server addresses.
log: The file path to which you should log the messages described in Logging.
listen-port: The UDP port your DNS server should listen on for accepting DNS requests.
decision-method: One of “round-robin” or “lowest-latency”. See Choosing the Best Web Server for more information.
4.4 What to Submit for the final stage
Your submission will contain one zip file called “{uni}.zip” containing the following files.
● writeup.pdf - This should contain the plots and analysis described in Requirements.
● proxy and dns_server - Please add ‘#!/usr/bin/env python3’ to the top of your Python files so that we can easily make them executable. (Please add exactly the same characters. We won’t be able to execute the files if there is an extra space or slash. Please also notice that the name is proxy, not proxy.py. All your code must be in these two files.)
4.5 Where to Submit
You will submit your code to Gradescope. You will be asked how many slip days you would like to use for the final stage. If you have any questions about it, please let us know ASAP. You are expected to perform tests apart from the ones we give to you.
4.6 Possible Plan of Attack
1. Familiarize yourself with the netsim and network topology. Make sure you can play the videos by pointing your web browser directly to the web server’s IP address.
2. Set up the connection between the web browser and the proxy. Make sure any request from your browser will be forwarded to the proxy.
3. Set up the connection between the proxy and the web server. Hardcode a valid IP address for a web server in your proxy for now. Make sure the proxy can get the video content from web servers and can parse the content for the information you need.
4. Set up the combined connections, throughput estimators, and ABR algorithms as described in Implementation Details.
5. Implement a simple DNS server that always returns the same record, and ensure that it responds correctly to queries. Add functionality to your proxy to query this DNS server for the web server’s IP address.
6. Implement performance-based decision functions in your DNS server as described in Choosing the Best Web Server, and configure your proxy to update which web server it communicates with over time.
4.7 Hints
1. You can open a new thread for each new connection for concurrent connections.
2. The proxy outbound socket has to wait before it’s ready to read. You may have to use select.select() to achieve that.
3. The package re can be helpful to parse message content.
4. The tools dig and ping may be useful. The Python method “subprocess.check_output” lets you run these commands from Python and capture their output.
5. Remember to clear your web browser cache after each test. Otherwise, the request won’t be forwarded to your proxy but responded to by the web browser cache directly. Even when using incognito mode, your browser will still build up a cache if you don’t close the browser.
6. Use the tool tcpdump to observe which packets are being sent over various interfaces.
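Hint 2 can be illustrated with a small helper that waits until a socket is readable before calling recv(). The function name and the default buffer size are hypothetical; this is a sketch of the select.select() pattern rather than a complete relay loop.

```python
import select
import socket


def recv_when_ready(sock: socket.socket, timeout: float, bufsize: int = 4096):
    """Return data from sock once it is readable, or None if timeout expires.

    An empty bytes object means the peer closed the connection.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(bufsize)
```

Passing several sockets in the first list to select.select() lets one thread wait on the browser side and the server side of a connection at the same time.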
5. Development Environment
For the project, we are providing a virtual machine (VM) pre-configured with the software you will need. We strongly recommend that you do all development and testing in this VM; your code must run correctly on this image as we will be using it for grading. For example, some students in previous years wrote their code in a Windows environment, which changed the line endings to CRLF (Unix uses LF, so our grader could not run their code). Please make sure your code uses LF. This section describes the VM and the starter code it contains.
5.1 Virtual Machine
We provide an image on GCP (Google Cloud Platform) and you can create a virtual machine (VM) instance based on it. Please follow the tutorial to set up your VM instance and do all your testing there.
In the event you need to move files between the VM and your computer (e.g., for submission), there are a few options.
1. (recommended) Create a (private) Github repository with the relevant project files. You can push to your repository from the VM and access the files from anywhere.
2. If you SSH into your VM instance using the GCP default option (the SSH button on the same row of the instance), on the top of your SSH window, there are two arrow buttons which upload and download files.
3. Launch your remote desktop. You can use sidebar options to upload and download files. Alternatively, send the files via the Internet (email, Google Drive). Chrome is installed on the VM.
5.2 Starter Files (for final stage)
You will find the following files in ~/csee_4119_abr_project/. We may update the files as we release bug fixes, which we will announce on Ed. Assuming these fixes are completely contained to this directory (as opposed to the image), we will release them via Git. Use the command ‘git pull origin master’ to fetch changes.
common Common code used by our network simulation and LSA generation scripts.
lsa (LSAs are not used in this version of the project, so you can ignore them.)
netsim
netsim/netsim.py This script controls the simulated network; see Network Simulation.
netsim/tc_setup.py This script adjusts link characteristics (BW and latency) in the simulated network. It is called by netsim.py; you do not need to interact with it directly.
netsim/apache_setup.py This file contains code used by netsim.py to start and stop Apache instances on the IP addresses in your .servers file; you do not need to interact with it directly.
plot
grapher.py A script to produce plots of link utilization, fairness, and smoothness from log files. (See Requirements.)
topos
topos/topo1
topos/topo1/topo1.clients A list of IP addresses, one per line, for the proxies. (Used by netsim.py to create a fake network interface for each proxy.)
topos/topo1/topo1.servers A list of IP addresses, one per line, for the video servers. (Used by netsim.py to create a fake interface for each server.)
topos/topo1/topo1.dns A single IP address for your DNS server. (Used by netsim.py to create a fake interface for the DNS server.) However, in this project you will ignore DNS and let your proxy connect to one of the video servers directly by IP address.
topos/topo1/topo1.links A list of links in the simulated network. (Used by genlsa.py.)
topos/topo1/topo1.bottlenecks A list of bottleneck links to be used in topo1.events. (See §4.3.) (not applicable for preliminary stage)
topos/topo1/topo1.events A list of changes in link characteristics (BW and latency) to “play.” See the comments in the file. (Used by netsim.py.) (not applicable for preliminary stage)
topos/topo1/topo1.lsa A list of LSAs heard by the DNS server in this topology. You can ignore it for this project.
topos/topo1/topo1.pdf A picture of the network.
topos/topo2 (same as topo1, just with different available bandwidth)
topos/topo3 Same file structure as topo1, but with two bottleneck links instead of one. This topology provides a more interesting scenario in which to test your DNS redirection.
5.3 Network Simulation (for final stage)
To test your system, you will run everything (proxies, servers, and DNS server) on a simulated network in the VM. You control the simulated network with the netsim.py script. We wrap netsim.py with a shell script, run_netsim.sh, which is the script you interact with. You need to provide the script with a directory containing a network topology, which consists of several files. We provide three sample topologies; feel free to create your own. See Starter Files for a description of each of the files comprising a topology. Note that netsim requires that each constituent file’s prefix match the name of the topology (e.g. in the topo1 directory, files are named topo1.clients, topo1.servers, etc.).
To start the network from the netsim directory:
./run_netsim.sh start
The argument before start is the path of the topology directory, e.g. ../topos/topo1 for topology 1.
Starting the network creates a fake network interface for each IP address in the .clients and .servers files; this allows your proxies and Apache instances to bind to these IP addresses.
To stop it once started (thereby removing the fake interfaces), run:
./run_netsim.sh stop
To facilitate testing your adaptive bitrate selection, the simulator can vary the bandwidth and latency of a link designated as a bottleneck in your topology’s .bottlenecks file. (Bottleneck links must be declared because our simulator limits you to adjusting the characteristics of only one link between any pair of endpoints. This also means that some topologies simply cannot be simulated by our simulator.) To vary network characteristics, add link changes to the .events file you pass to netsim.py. Events can run automatically according to timings specified in the file or they can wait to run until triggered by the user (see topos/topo1/topo1.events for an example). When your .events file is ready, tell netsim.py to run it:
./run_netsim.sh run
Note that you must start the network before running any events. You can issue the run commands as many times as you want without restarting the network. You may modify the .events file between runs, but you must not modify any other topology files, including the .bottlenecks file, without restarting the network. Also note that the links stay as the last event configured even when netsim.py finishes running (until you stop the network simulator).
The given topologies 1 and 2 are the same simple topology with one bottleneck, just with different network events. Topology 3 is a scenario with two bottlenecks, which may be useful for testing your DNS server selection mechanism.
5.4 Apache
You will use the Apache web server to serve the video files. Apache is controlled using the commands:
sudo service apache2 start|stop|restart
The server listens on port 80 and is configured to serve files from /var/www/html, where we have put sample video chunks for you. To view the video directly from the web server without your proxy in between, start the server and point your browser to http://localhost:80.
5.5 Programming Language and Packages
This project must be implemented in Python 3. Your VM instance comes with Python version 3.10.12, which is the version we will use to test your code. If this choice of language poses a significant problem for you (i.e., you have never used Python before), please contact the instructors.
To run python 3.10.12 on the VM instance, please use the following command:
python3
To install python packages for version 3.10.12 on the VM instance, please use:
python3 -m pip
For this project, you are allowed to use the following python packages: sys, socket, threading, select, time, re, numpy, subprocess
Using an unallowed package in your code may result in no credit being given. Other than the packages listed, you may only use the package if you ask on Ed Discussion and a TA or the professor explicitly responds to your request approving the use. We will maintain a pinned Ed post titled "Project 1 List of Approved (and Disallowed) Packages", so please check that post before posting your request. If you would like to use a package not mentioned here and are unsure if it would be acceptable, please add a new followup discussion under the above mentioned pinned post on Ed at least 3 days in advance of the project deadline.
In your followup discussion, you must mention the package name, the package version, and a link to the official repository of the package (e.g. http://pypi.python.org/pypi). TAs will examine your request and determine if the package is allowed and add it to the list of allowed/disallowed packages.