Reading Summary 2019-04-22
· by Aldrin Montana
Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage Systems
Epidemic Algorithms for Replicated Database Maintenance
This paper seems to define "anti-entropy" as an approach for eventually propagating information amongst nodes.
Session Guarantees for Weakly Consistent Replicated Data
This paper defines and details "session guarantees" which can be summarized as safety properties that are achieved by requirements between pairs of read and write operations: Read after Read (RaR), Read after Write (RaW), Write after Write (WaW), Write after Read (WaR).
I think this is overall a very good paper. It builds on previous work–session guarantees, anti-entropy–in distinct, useful ways to build a system, Bayou. The design and work that went into Bayou seems large enough to warrant many papers, and so I like that this paper has a good scope within which everything is explained well and provides good insights into the design and implementation of an eventually consistent system.
This system does not explicitly support transactions, but seems to enable applications to define their own transaction-like context in a flexible manner that begs further insight into the time delays between dependency checks, conflict checks, and communicated updates.
The sessions guarantees provided by the system are at a similar level of abstraction as the mechanisms provided for applications to specify dependency checks and conflict-merge logic. Having consistency guarantees that consider all orderings of read and write pairs makes it easier to think of application operations in imperative or procedural semantics (“If I see this, then I should do X; otherwise, Y”). There is no impedance mismatch between the features of the system and the development model provided by the system.
The data store view model seems fairly simple yet powerful to me: the write log tracks stable writes with a tail of tentative writes, and there are 2 bits for each record that specifies wether it’s tentative or stable and stable write history can eventually be represented by a simple “omission” vector. I think this approach makes good use traditional database systems, write-ahead logging, and an undo log for providing an intuitive, robust relational persistence model.
While I like the presentation of dependency checks and merge procedures at a high level, they seem to make the system weaker in ways that can be particularly difficult to accommodate by application developers.
As a practitioner, it is not clear what semantics are provided for atomicity, and how to handle the nuances when a conflict occurs (as with the bank example and simply requiring that at least $100 be in the account). Should the system block until the writes are resolved via the write log before notifying clients of the operation status?
This approach certainly allows a lot of flexibility, but at the cost of complexity and extra burden on applications. This may certainly be a reasonable approach, but it can be incredibly difficult and inefficient to require that applications identify conflicts and write conflict-management code. The downside here being that because no mechanisms are provided by the system to assist application developers, it can lead to a system that is difficult to build on top of and brittle applications.
There seem to be some weaknesses in allowing arbitrary merge procedures and dependency checks, particularly in ensuring a responsive, helpful system. It can be difficult to ensure that write operations are performant, and the semantics for dependency checks and merge procedures don’t allow the system to make intelligent decisions to help applications or application developers.
If one server, s, applies an update w, which is communicated (via anti-entropy) to s’ (from which s receives w’): Is it possible for both s(w’) and s’(w) to fail their respective dependency checks such that the result of their conflict-merge operations are mutually inconsistent with each other? In my mind, it seems the type of scenario that could easily come up in a high traffic, real-time system that allows edits or deletes–such as facebook or twitter. I can’t think of an exact example, but perhaps 2 independent comments to a post that need to resolve with each other, and the owner of a post deleting the post? The point of my question being: Does the combination of session guarantees and the use of dependency checks, etc. for eventual conflict resolution necessarily have a cap on the type of applications it can support?
What happens if a dependency check or conflict merge takes a long time? Are other updates held up, or is an OCC-like approach taken where they’re allowed to run but are simply failed and the application needs to try again?
Would it be beneficial to explore analysis of merge procedures so that Bayou could determine ordering for some write operations? For instance, merge procedures that always produce a new record (instead of selecting a winner or merging in place) can always commute.