Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
Project 4: Sharded Key/Value Service with
Paxos Groups
Introduction
This general architecture (a configuration service and a set of replica groups) is patterned at a high level on a number of systems: Flat Datacenter Storage, BigTable, Spanner, FAWN, Apache HBase, Rosebud, and many others. These systems differ in many details from this project, though, and are also typically more sophisticated and capable. For example, this project lacks persistent storage for key/value pairs and for the Paxos log; it sends more messages than required per Paxos agreement; it cannot evolve the sets of peers in each Paxos group; its data and query models are very simple; and handoff of shards is slow and doesn't allow concurrent client access.
The main challenge in this project will be handling reconfiguration in the assignment of shards to replica groups. For example, if a Put arrives at about the same time as a reconfiguration, all involved servers must agree on whether the Put occurred before or after the reconfiguration. If before, the Put should take effect and the new owner of the shard will see its effect; if after, the Put won't take effect, the client is notified that the key is no longer served by that group, and the client will re-try at the new owner.
Reconfiguration also requires interaction among replica groups. For example, in a particular configuration, a group of servers G1 may be responsible for shard S1. In the next configuration, another group G2 may be responsible for shard S1. During the change from the first configuration to the second, servers in group G2 must fetch the contents of shard S1 (the key/value pairs) from servers in group G1.
Software
Part A: The Shard Master
Your implementation must support the RPC interface described in shardmaster/types.go, which consists of Join, Leave, Move, and Query RPCs.
You don't need to implement duplicate client request detection for RPCs to the shard master that might fail or repeat due to network issues. A real system would need to do so, but this project doesn't require it.
The Join RPC's arguments are a non-zero replica group identifier (GID) that is not currently present in the configuration, and an array of server ports. The shardmaster should react by creating a new configuration that includes the new replica group. The new configuration should divide the shards as evenly as possible among the groups, and should move as few shards as possible to achieve that goal.
The Leave RPC's arguments are the GID of a previously joined group. The shardmaster should create a new configuration that does not include the group, and that assigns the group's shards to the remaining groups. The new configuration should divide the shards as evenly as possible among the groups, and should move as few shards as possible to achieve that goal.
The Move RPC's arguments are a shard number and a GID. The shardmaster should create a new configuration in which the specified shard is assigned to the specified group. The main purpose of Move is to allow us to test your software, but in principle it might also be useful to fine-tune the overall workload if some shards are more popular than others or some replica groups are slower than others. The outcome of a Move operation need not be retained across Joins or Leaves. This is because if a Move is followed by a Join/Leave, that Join/Leave will likely undo the Move because Join and Leave re-balance.
The Query RPC's argument is a configuration number. The shardmaster replies with the configuration that has that number. If the number is -1 or bigger than the biggest known configuration number, the shardmaster should reply with the latest known configuration. The result of Query(-1) should reflect every Join, Leave, or Move that completed before the Query(-1) RPC was sent.
The very first configuration should be numbered zero. It should contain no groups, and all shards should be assigned to GID zero (an invalid GID). The next configuration (created in response to a Join RPC) should be numbered 1, etc. There will usually be significantly more shards than groups (i.e., each group will serve more than one shard), in order that load can be shifted at a fairly fine granularity.
Your shardmaster must be fault-tolerant, using your Paxos and PaxosRSM libraries from Project 3. You are free to modify your Paxos and PaxosRSM implementations from Project 3, and will need to do so at least to accommodate an application-supplied notion of log entry equality. Recall that this is useful during AddOp when encountering a decided log value that is the duplicate of the supplied log entry.
When your shardmaster determines that a shard must move, it informs the new server group that is in the new owner of that shard, using the ShardKV.AssignShard RPC. The server group, in turn, uses the ShardKV.PullShard RPC to obtain the shard from the owning server group. If a configuration change implies multiple shard movements, your shardmaster should move them strictly one at a time, and serialize the movements. Because both the shardmaster and shardkv groups need to know the Assign types, they are defined in common/types.go
- Start with a stripped-down copy of your KVPaxos server.
- One difference in your use of PaxosRSM, compared to project 3, is that PaxosRSM needs to be initialized not only with an applyOp method (which it invokes on decided values) but alsorequires an equals method (to compare any two decided values). This is because some of the elements you are likely to want to put in your log are not natively comparable--at least, not in the way you might expect.
- Go maps are references. If you assign one variable of type map to another, both variables refer to the same map. Thus if you want to create a new Config based on a previous one, you need to create a new map object (with make()) and copy the keys and values individually. If you want to compare two maps for equality, you have to do so element-by-element. The method maps.Equal is helpful here.
- While Part B will require your shardmaster to communicate changes in the configuration to shardkv servers, you should be able to pass all of the test cases for Part A without issuing any AssignShard RPCs. This allows you to test Part A independently.
- After you complete part B, to pass the test cases for part A locally, you will have to comment out RPCs to ShardKV servers within your shardmaster code. When the autograder runs the part A test cases, it will ensure that all RPCs to ShardKV servers return true without actually executing the RPC.
Part B: Sharded Key/Value Server
In this part, you will build shardkv, a sharded fault-tolerant key/value storage system. You'll modify server_impl.go in src/shardkv. You'll also modify your shardmaster implementation from part A so that, when serving Join, Leave, and Move RPCs, the shardmaster communicates the changes in shard assignments to the relevant replica groups via the AssignShard RPC.
Your storage system must provide single copy semantics to applications that use its client interface. That is, completed application calls to the Clerk.Get(), Clerk.Put(), and Clerk.Append() methods in client.go must appear to have affected all replicas in the same order, and provide linearizable consistency. A Clerk.Get() should see the value written by the most recent completed Put/Append to the same key. This will get tricky when Gets and Puts arrive at about the same time as the shard configuration changes. The recommended approach is to have each replica group use Paxos to log not just the sequence of Puts, Appends, and Gets but also the sequence of shard movements. When a replica group receives a request for a key that is in a shard not assigned to it, it returns an ErrWrongGroup error.
Your implementation must also ensure that, when the shardmaster responds to a Join, Leave, or Move RPC, the reconfiguration caused by the operation is already complete. In other words, once the shardmaster finishes serving any Join, Leave, or Move RPC, every replica group must be ready to serve Get, Put, and Append requests on shards that were assigned to that group in the new configuration.
You will need to ensure that at most one replica group is serving requests for each shard. Luckily it is reasonable to assume that each replica group is always available, because each group uses Paxos for replication and thus can tolerate some network and server failures. As a result, your design can rely on one group to actively respond to another group's request for a shard during reconfiguration. This is simpler than the situation in primary/backup replication (Project 2), where the old primary is often not reachable and may still think it is primary.
You are allowed to assume that a majority of shardkv servers in each replica group are alive and cantalk to each other, can talk to a majority of the shardmaster servers, and can talk to a majority of servers in other replica groups. Your implementation must operate (serve requests and be able to reconfigure as needed) if a minority of servers in some replica group(s) are dead, temporarily unavailable, or slow. It must also tolerate situations in which a server is temporarily in a minority partition due to transient network failures.
Unlike Part A, you do need to detect duplicate client RPCs to the shardkv service in Part B. Pleasemake sure that your scheme for duplicate detection frees server memory as it becomes possible to doso. Here are a few hints/tips to keep in mind:
- The data types that are needed by both shardmaster and shardkv are defined in common/types.go.
- Your key value servers should not call the shardmaster's Join() or Leave() handlers directly. The testing infrastructure will call Join() or Leave() when appropriate.
- Ensure that any log entry is self-sufficient. In other words, each replica in a group thatdiscovers a decided log entry should be able to apply that log entry to its local state without needing to issue another RPC. That should be true even if the replica that added that log entry needed to issue an RPC to populate the entry.
- To pass the concurrent test cases, you must design a correct protocol for handling concurrent operations in the presence of configuration changes. It is important to make sure that there is no point in time when two different groups both believe they can or should serve any given request.
- When a test fails, check for gob error (e.g. "rpc: writing response: gob: type not registered for interface ...") in your output. Go doesn't consider the error fatal, although it is fatal for the project.
- Be careful about implementing at-most-once semantics. When a server transfers shards to another, the server also needs to send the state that it maintains to detect duplicate requests. Think about how the receiver of the shards should update its own state. Is it ok for the receiver to replace its state for detecting duplicate requests with the received one?
- Think about how the shardkv server deals with ErrWrongGroup. When the client receives an ErrWrongGroup response to one of its requests and retries its request, the client retains the request identifier that was originally used. On the server-side, when returning ErrWrongGroup to a client's request, should the server add this request's identifier to the cache that the server maintains to detect duplicate requests?
- After a server has moved to a new configuration, it can leave the shards that it no longer ownsin the new configuration undeleted. This will simplify your implementation.
- Think about when it is ok for a server to transfer shards to another server during a configuration change. In particular, there can be no moment in time when both server groupsbelieve they can and should serve requests to that shard. That may require Paxos log entries at both the pulled-from group and the pulled-to group. Thinking about the order in which thesethree events happen is important.
Handin procedure
When you submit your project to the autograder, it will pull the following files from your repository:
- paxos/paxos_impl.go
- paxosrsm/server_impl.go
- shardmaster/server_impl.go
- shardkv/server_impl.go
So, please ensure that a) your repository has directories called paxos, paxosrsm, shardmaster, and shardkv containing these files, and b) all modifications that you make to the code handed out are restricted to only these files.
You may not add any RPC calls to either the shardmaster or shardkv servers, nor may you modify any of the argument or return value types.
Among the unit tests included in the handout, if you find that you pass some of them locally on your computer but not on the autograder
- Check that you have only modified the *impl.go files
- Check that you pass on CAEN all the tests that you pass locally
- Ensure that you use appropriate synchronization to protect any access to shared data from any goroutine; to check, run "go test" on CAEN with the "-race" flag
- Since some of the tests are non-deterministic, run them repeatedly and make sure you pass them every time
- If you still pass all runs of the test cases, post privately on Piazza