This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.

CMPS 290S Reading Response: Replicated Data Types: Specification, Verification, Optimality


In this paper, Burckhardt, et al. formalize specification of replicated data types (RDT), and then provide frameworks and methods for analyzing implementations of the specified RDTs with respect to correctness (verification) and spatial compactness (optimality of memory usage). This paper seems to expand on Burckhardt’s previous work on formalizing RDTs and was published in the same year that his book, Principles of Eventual Consistency, was released. The paper first goes into detail of the semantics and notation for RDTs and distributed systems, then goes into how to prove correctness over these semantics using simulation-based proofs. The analysis assumes a declarative, abstract language and does not consider implementations, but does provide intuitions for how to understand the semantics in the context of a possible implementation. Finally (not including related work), the paper discusses optimality with respect to memory and space overhead.

Learning and Understanding

I have typically thought that part of the difficulty of observable systems is that the information needed to analyze provenance of events is a superset of the information needed to have a correct, distributed system. While this may be true for some levels of consistency, I think this paper makes me realize that in order to correctly implement CRDTs the information needed (and used for optimality analysis) should satisfy needs for observability as well. So, I would think that the analyis for optimality would be useful for designing a system which accounts for trade-offs in resources for observability of various CRDTs in addition to just their implementation.

I’m not entirely sure the intuitive motivation behind replica order compared to arbitration. If visibility and arbitration are meant to model the delay that can be observed when sending messages across a network, I wonder what replica order is meant to model. Perhaps replica order and arbitration serve similar roles, but for processing (server-side) and management (data-side), respectively? That is, maybe visibility and replica order are meant to model delay in sending messages across the network and when a server will process messages for a replica. Whereas, visibility and arbitration model the delay in when a data type (data structure/data management system) applies changes to stored state?

Research Question and What to Investigate

While reading the initial semantics, I was curious if these formalisms might be applicable as quantum computers further develop. There is a mention of multi-value registers, but I’m not sure if this is applicable for quantum computers, or more applicable to something like register windows, which the sparc processor architecture supports. This would be something that would be interesting in terms of understanding how to better support high-level development that applies these semantics described by Burckhardt, et al, or perhaps to better design mappings of the described RDTs to hardware primitives for better hardware utilization.