Reading Summary 2019-04-23

· by Aldrin Montana

Chord: A Scalable Peer-to-peer Internet Lookup Service for Internet Applications

Relevant Readings

Problem Statement

The authors are trying to provide a platform on which applications can reliably locate nodes in a distributed system for a given key, even in the presence of nodes dynamically joining and leaving the system. The authors want the mapping of keys to nodes to be stable, with minimal balancing or data shuffling as nodes join and leave.

Proposed Solution

The authors propose an approach that uses consistent hashing to stably map keys to nodes, which is proposed as a substrate for distributed applications. For performance, the authors propose finger tables to skip intermediate successor nodes that are known to not contain the desirable data.

The system performs well for locating nodes–log(n) where n is the number of nodes–and with the use of finger tables is relatively efficient in the amount of memory necessary to improve performance. However, there is a known issue that the stabilization algorithm is not always correct, because it is done in two atomic phases.

Contribution

This work is important because it brings in the use of consistent hashing to provide a platform that is incredibly useful for large distributed systems and is useful in various systems today (even if they don’t actually use chord).

Comments and Questions

With regard to chord’s stabilization algorithm not being correct, it has made me wonder if there is any way for chord to atomically track successors in a way to make it correct. The problem is that nodes can join in such a way that between updating the successor of a node and updating the successor of the successor, the nodes added may fail and then it’s possible to have disjoint rings. I suspect that this framing of the problem requires consensus which can be reliable most of the time especially today, but perhaps stabilization is just as hard as consensus?
I keep thinking that with Ivy it would be possible to partition data in a way that only certain clients handle certain data. If this is done, it would be interesting to use chord to maintain which client clusters handle which files or file blocks in a way that Ivy clients can join and leave the system dynamically, and chord can remap keys to new clients that manage those keys. This enables Ivy clients to read a few number of client logs for performance, but also enables Ivy in the use case it is best for–systems where clients shouldn’t be dependent on other clients.