ds_sample_quiz

pdf

School

Columbia University *

*We aren’t endorsed by this school

Course

4113

Subject

Computer Science

Date

Jan 9, 2024

Type

pdf

Pages

10

Uploaded by ChiefGorilla3742

Report
Distributed Systems Sample Quiz (Time: 110 minutes) NAME: __________________________________________________ UNI: __________________________________________________ Grading notes: - All questions are graded on a scale of 0-2: - 0 = no answer, answer contains a major flaw, answer is beside the point, or answer lacks required explanation or example. - 1 = answer is not faulty but is beating around the bush or not quite nailing the point; - 2 = answer is correct, to the point, and reflects understanding of the concepts. - For questions that require an explanation or example will be counted as 0 (zero) if no explanation or example is provided, even if the answer is technically correct (e.g., answers such as “Yes,” “No,” “Correct,” “Incorrect,” in the questions below will get a zero). - Please keep your answers brief and to the point. For reference, you shouldn’t need more than the allotted space to fit your answer even if you use big font. CONSISTENCY 1. [2 points] Give one concrete example of an execution history for a distributed storage system that is causally consistent but not sequentially consistent. Do NOT use the same example as in class, invent your own. 1
TWO-PHASE COMMIT Ben Bitdiddle and Alyssa Hacker are reflecting on the properties of the two-phase commit protocol. For concreteness, they examine a scenario where client C initiates a single transaction involving two servers A and B. 2. [2 points]: Suppose the one way delay between any pair of nodes is r, how long does it take for client C to learn of the transaction’s result using the two-phase commit protocol? (Draw a message time diagram to illustrate the situation.) 3. [2 points]: Give an example scenario under which two-phase commit protocol fails to make progress (i.e. the transaction neither succeeds nor aborts). 2
PAXOS The Paxos protocol as presented in class is static: the set of nodes participating in the protocol are assumed not to change over time. Ben Bitdiddle needs a dynamic version of Paxos, which would allow him to add/remove nodes to address load changes, failures, etc. Ben settles on the following protocol for dynamic Paxos. When a node X wants to join, it will initiate Paxos by becoming a proposer itself and seeks a majority of ”accept” messages from the nodes in the current group plus itself. For example, if the current group is {A,B} and node X joins, the Paxos protocol initiated by X will succeed in deciding on the new group when any majority of nodes in {A,B,X} accepts the new group. 5. [2 points]: Is Ben’s new join protocol correct? If yes, give a proof sketch. If no, give a concrete counter-example where his protocol gives an incorrect result. 3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
6. [2 points]: Ben decides to simplify the Paxos protocol and speed up the decision process. He proposes to change Paxos by having a proposing node use its node identifier as the proposal number. As a result, the node with the biggest node identifier will always be able to win over other potential proposers. In contrast, in the original Paxos design, a proposer generates a proposal number by concatenating an (increasing) integer together its node identifier instead of just using the node identifier alone. Therefore, unlike Ben’s design, the original Paxos does not ensure that any one node will always win over other potential proposers. Is Ben’s change desirable? If yes, explain its advantages over the original Paxos. If no, give a concrete example to explain why the original Paxos is preferable. 4
7. [2 points]: Ben and Alyssa P. Hacker are discussing about the need for every Paxos node to log the highest proposal number that it has accepted (na) and the corresponding accepted value va on disk. Alyssa agrees with Ben that logging is essential for the correctness of the original Paxos design. However, she also thinks that it is not necessary to log na and va if a recovered node (after a previous crash) always joins the system with a new unique node identifier. Is Alyssa correct? If yes, give a proof sketch and discuss the pros and cons of Alyssa’s new design with the original Paxos that requires logging of na and va . If no, give a counter-example. 8. [2 points]: Paxos does not guarantee liveness. In other words, it is possible that Paxos never terminates under a sequence of events. Give a concrete example in which Paxos never reaches agreement on a new transaction after a couple of retries. 5
CONSISTENCY OF REAL DISTRIBUTED SYSTEMS 9. [2 points]: Amazon's business mostly relies on eventually-consistent storage systems. For example, users' shopping carts are stored in Dynamo, an eventually-consistent key-value store. Using Amazon's shopping cart as an example application, describe a scenario where the weak consistency model causes the application to produce an ”abnormal” or ”confusing” result to the end-users. 6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
10. [2 points]: An important design decision that real systems sometimes make for scalability is that replicas of the same piece of data do not necessarily have the same view of the current set of available (or live) replicas. For example, replica A might believe that replicas B, C, and itself form the current replica group, while replica D assumes that only A, B, itself, but not C, are forming the current replica group. This is the case in Dynamo, Amazon's storage system, which does not enforce that all nodes have consistent views over the replica group. Explain why this design decision might cause an otherwise sequentially consistent system to lose consistency guarantees. Give concrete examples. (Hint: because we haven't studied Dynamo, you could consider how inconsistent views of the replica group would mess Paxos up instead, for example.) 7
BIGTABLE DESIGN Ben has decided to create a new company based on a Twitter-like system (plus some really cool other stuff that that will allow him to differentiate from Twitter, but we don't need to care about that part for this exercise). His first step is to build a very limited version of Twitter, and he wants to use Hbase (Bigtable's open-source version) as a storage system to store all users' tweets. The limited version of Twitter shows a user to see the most recent 3 tweets from each user he is following, along with all of his own past tweets. Your assignment is to help Ben design his Bigtable schema for the table storing user tweets appropriately to support this limited version of Twitter. 11. [2 points] Identify two types of queries that will likely be very frequent given the specified limited version of Twitter. 8
12. [2 points] Specify the schema of a Bigtable table that stores user tweets in such a way so as to optimize these queries. (Hint: don't worry about de-normalizing the DB and replicating the data.) 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
13. [2 points] What are some consistency challenges that you envision with your optimized schema? Thank you for attending the DS-1 class! We hope what you learned will be of use to you! Best of luck in your careers! 10