Reading Summary 2019-04-19
· by Aldrin Montana
Stronger Semantics for Low-Latency Geo-Replicated Storage
The paper overall seems good, but I do not believe it presents a significant step beyond the previous work, COPS (and COPS-GT). After reviewing the proposed contributions and the improvements, I think it is a borderline evaluation–I cannot decide whether to reject or accept. Specifically, I value that the change in a few perspectives led to a significant change in implementation which improves performance in a variety of ways, but I believe the core changes do not lead to more interesting research, although the improvements are certainly useful for the system and system users.
The proposed contributions are:
Support for read-only transactions that are minimally interrupted by concurrent writes.
Support for write-only transactions that allows useful work to still potentially serve reads. This is done by cohorts asking the coordinator of a transaction for information regarding committed transactions, and then trying to resolve whether a value being read will potentially be overwritten.
Evaluation showing that Eiger performance is comparable to Cassandra, despite providing stronger consistency guarantees.
The system model now supports column families
I believe that 1 and 2 are valuable, I suspect that the workload for proving 3 is very workload specific, and 4 seems valuable from a practical perspective, but perhaps not a research perspective.
I think the use of a random key as a coordinator key is very ingenious, and I would like to see the use of this in future work and perhaps a deeper dive on the trade-offs for how it can be selected and how it may relate to load balancing and other workload or system characteristics.
Performance metrics do look good. I think including numbers of some other system with linearizable or causal consistency would be useful just for reference. It’s nice to know that causal consistency can be achieved with minimal overhead, but I don’t have much of a reference for how much of an improvement the minimal overhead is compared to typical or naive systems.
The consistency model and evaluation seems to be the same as what was used in COPS, with the main difference being, “COPS places dependencies on values, while Eiger uses dependencies on operations.” The interesting thing of COPS was the idea that dependencies could be between values, rather than operations.
The use of logical time and EVTs seemed very similar, in practice, to something like vector clocks. I suppose the insight that storing the EVT is enough to determine consistency, but that also appears to not be a new idea, as this is the insight first described in the vector clock paper, except as a reasoning that the state of a process’s history can be summarized by the logical time of the latest event of that process.
I think the 4.4 section is very uninteresting. Each datacenter is assumed to be linearizable, so it seems trivial to say that the datacenter of origin for a read can be linearizable if all concurrent writes come from the same datacenter (are known to have a linearizable order). I may misunderstand the insight here, but I feel like it is better left out.
This may be an unnecessary ask, but I would have liked to see some performance evaluations with varying latencies between data centers and/or with some failures. It is not clear that any of these cases happen in the evaluations, and since each datacenter is linearizable and it is not clear how many operations span various
- How do servers know what values were returned in a first round? Or does “storing values newer than those returned in a first round” apply universally, as a fact of maintaining validity periods?