Distributed systems help programmers aggregate the resources of many networked computers to construct highly available and scalable services, such as petabyte storage, and massively parallel computation. This class teaches the abstractions, design, and implementation techniques that allow you to build fast, scalable, fault-tolerant distributed systems. Topics include multithreading, network programming, distributed file systems, replication, consistency models, fault tolerance, distributed transactions, agreement and commitment, consensus, and several case studies of distributed systems.
This course helps students to gain (1) an understanding of the principles and techniques behind the design of distributed systems and (2) practical experience designing, implementing, and debugging real distributed systems. The course consists of lectures, exams, and a series of programming labs.
Time and Location
• Class: TR, 2-3:20 PM, Javits 111
• Office Hours: Thursdays, 4-6 pm, NCS 139
Prerequisites
• Undergraduate Operating Systems or Computer Systems Organization
• System-level programming experience
Topics
• Introduction
■ Distributed computation, Communication model, Time and replication
• Consensus
■ Agreement, Raft protocol, Paxos protocol and its variants, PBFT and other Byzantine fault-tolerant (BFT) protocols
• Transactions
■ Consistency (linearizability, eventual, causal and fork-join consistency), Concurrency, Isolation, Sharding
• Industrial case studies
■ Google Spanner, Google Chubby and Google BigTable
• Blockchains
■ Permissioned blockchains, Permissionless blockchains, Scalability, PoW, PoS
Evaluation
• 4 Course Projects (no programming language restriction)
(1) Implementing a crash fault-tolerant protocol (a variant of Paxos)
(2) Implementing a Byzantine fault-tolerant protocol (a variant of PBFT)
(3) Implementing a distributed transaction processing system (on top of project 1)
(4) Implementing a scalable permissioned blockchain system (on top of project 2)
■ 60% (15%, 15%, 15%, 15%)
• Exam
■ 30%
• Quizzes (five Quizzes, each worth 2%)
■ 10%
Schedule
Topic
|
Date
|
Resources and Readings
|
Slides
|
Notes
|
08/27
|
No Class
|
|
|
|
08/29
|
No Class
|
|
|
|
09/03
|
Introduction
|
(1) Google Code University, Introduction to Distributed System Design
(2) What Good are Models and What Models are Good?
(3) The Eight Fallacies of Distributed Computing
(4) Your computer is already a distributed system. Why isn't your OS?
|
slides
|
|
09/05
|
Communication
|
Implementing Remote Procedure Calls
|
slides
|
|
09/10
|
Time
|
Time, clocks, and the ordering of events in a distributed system
|
slides
|
|
09/12
|
Replication
|
The Design of a Practical System for Fault-Tolerant Virtual Machines
|
slides
|
|
09/17
|
Agreement
|
(1) Impossibility of Distributed Consensus with One Faulty Process
(2) Synchrony, Asynchrony and Partial synchrony
|
slides
|
Quiz 1
|
09/19
|
Raft Protocol
|
In Search of an Understandable Consensus Algorithm
|
slides
|
|
09/24
|
Paxos
|
Paxos Made Simple
|
slides
|
|
09/26
|
Paxos Variants
|
Resource
|
slides
|
|
10/01
|
Byzantine Fault-Tolerant Protocols
|
Practical Byzantine Fault Tolerance
|
slides
|
|
10/03
|
More BFT protocols
|
The Bedrock of Byzantine Fault Tolerance: A Unified Platform for BFT Protocol Design and Implementation
|
slides
|
Quiz 2
|
10/08
|
Transactions
|
Resource
|
slides
|
Project 1
|
10/10
|
Linearizability
|
Resource
|
slides
|
|
10/15
|
No Class - Fall Break
|
|
|
|
10/17
|
Other forms of consistency
|
Resource
|
slides
|
|
10/22
|
Concurrency
|
Resource
|
slides
|
|
10/24
|
Isolation
|
Resource
|
slides
|
Quiz 3
|
10/29
|
Scalable Transaction Processing
|
Crash Recovery in a Distributed Data Storage System
|
slides
|
Project 2
|
10/31
|
Google Spanner
|
Spanner: Google’s Globally-Distributed Database
|
slides
|
|
11/05
|
Google Chubby and Google BigTable
|
Resource
|
slides
|
|
11/07
|
Permissioned Blockchains
|
(1) Tendermint: Byzantine Fault Tolerance in the Age of Blockchains, link to Repository
(2) Hyperledger Fabric: a distributed operating system for permissioned blockchains, Link to Repository
(3) Parblockchain: Leveraging transaction parallelism in permissioned blockchain systems
(4) Hyperledger Fabric variants: (a) Fast Fabric, (b) Fabric++, (c) Fabric#
|
slides
|
|
11/12
|
Scalable Permissioned Blockchains
|
(1) Towards scaling blockchain systems via sharding
(2) SharPer: Sharding Permissioned Blockchains Over Network Clusters
(3) ResilientDB: Global Scale Resilient Blockchain Fabric
|
slides
|
Quiz 4
|
11/14
|
Permissionless Blockchains
|
Database and Distributed Computing Foundations of Blockchains
|
slides
|
|
11/19
|
Bitcoin
|
Bitcoin: A Peer-to-Peer Electronic Cash System
|
slides
|
|
11/21
|
Scalable Permissionless Blockchain
|
(1) A Secure Sharding Protocol for Open Blockchains
(2) Omniledger: A secure, scale-out, decentralized ledger via sharding
(3) Rapidchain: Scaling blockchain via full sharding
|
slides
|
Project 3
|
11/26
|
Proof of Stake
|
(1) Ethereum: A Secure Decentralised Generalised Transaction Ledger
(2) Solana: A New Architecture for a High-Performance Blockchain
|
slides
|
|
11/28
|
No Class (Thanksgiving)
|
|
|
|
12/03
|
Distributed Computation
|
MapReduce: Simplified Data Processing on Large Clusters
|
slides
|
Quiz 5
|
12/05
|
Exam
|
|
|
Exam
|
12/12
|
|
|
|
Project 4
|