Reading Summary 2019.05.20
· by Aldrin Montana
A comprehensive study of Convergent and Commutative Replicated Data Types
This paper details shared data types, which receive and process updates at one replica and then update state at other replicas, using “simple” formal conditions that guarantee eventual consistency. The authors call these data types convergent replicated data types, or commutative replicated data types. The convergent and commutative terms refer to how updates of state converge or how update operations commute.
Interestingly, state replication is referred to as passive replication and entails the update of other replicas after an update is applied at a source replica. This requires that an update operation be applied in its entirety at the source replica so that the “content” or payload be propagated to other replicas to merge state. Operation replication is then referred to as active replication and is both: more constrained and capable of asynchronous updates. Although pre-conditions must be checked at the source replica, the actual update can be propagated concurrently with application at the source replica, and the propagation need only contain the operation and no state. The rest of the paper does quite well in discussing various principles and approaches for achieving both types of replication and does so using a few examples.
I like that the strength of the communication channel is mentioned as part of the capability of CvRDTs and CmRDTs, which is that state-based replication (CvRDTs) can be used with weaker channels which may deliver many times, compared to operation-based replication (CmRDTs) which require reliable delivery.
The detail of discussion for various examples is quite good. I appreciate the ramp up from G-counters to PN-counters and the discussion of why the state-based replication for these require vector clocks and the clear description of why an oversimplified state payload (just an integer for a counter) doesn’t capture enough state for correctness. This can easily be extended by the reader to see how more complex data types can end up requiring a lot of state.
“Both these constructs are CRDTs because they combine two CRDTs…” I do believe there is no discussion of whether or not composing CRDTs can yield a CRDT. The interesting thing here is that a CRDT that is composed of other CRDTs must have correct composition semantics, which I feel is referenced in passing but deserving of discussion.
I think that garbage collection is deserving of a larger discussion because it is mentioned as a reason why CvRDTs become inefficient over time. As discussed, it leaves the reader to derive garbage collection for any CRDT from the insight that for 2P-Sets there is a particular state which is trivial to garbage collect: when you have removed an element, and thus it can no longer be added or removed. I am okay with leaving details to other papers, but I think discussion of how GC requirements may change for CRDTs if composed or if CvRDTs are used instead of CmRDTs, etc., would be a great addition.
- Does reliable communication imply once-delivery as well? I had assumed it only meant that messages are eventually delivered, but in order for CmRDTs to converge, I get the impression that reliable communication is once-delivery.