This paper proposes an approach for handling data consistency for applications that synchronize data to cloud storage. The authors specifically design for mobile applications because network partitions can happen frequently.
The contributions of this paper, or at least the crux of their proposal, are around cloud data types which have optimized methods for handling the merging of revisions. A revision is a sequence of operations in between invocations of yield. By providing a yield() method, the authors are able to frame operations in between yields as an atomic, transaction-like sequence that makes it easier for developers to think in terms of revisions and when it is important for local state change and remote state change to be merged to maintain consistency.
The overall motivation is to allow developers to use eventual consistency and reason about consistency, without having to explicitly manage operations and state to satisfy the desired consistency properties.
The introduction of the paper makes it seem as if their data model is the mechanism on which their programming language level semantics are dependent, but I feel as if that’s a bit misleading. The introduction of yield and the relevant semantics are actually key to the system managing consistency in the described ways. Even when they describe how to provide stronger consistency, they introduce flush and a function that explicitly re-evaluates conditions on the remote machine, not just on the local machine. This tells me that while their data model (data types) is necessary for reducing the load of thinking about convergence and commuting operations, their main contribution is actually making the concept of “these operations are happening but are not yet communicated to the other machines” a first class citizen in their programming model.
I wonder how much this work assumes that there is a single replica with which to synchronize. The fork/join formalisms and intuitions make sense to me, but I wonder how they may need to be extended if this work were applied to peer-to-peer distributed systems rather than server-client distributed systems. This would be very useful for games or applications that allowed device-to-device connections, but then allowed that state to be sent to a central server as well. Unfortunately, I have not cracked the formalisms enough to know if this is already addressed.