Reading Summary 2019-05-28

· by Aldrin Montana

Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

Problem Statement

The authors are interested in a storage system that can better support cluster computing frameworks such as hadoop, spark, etc. Specifically, they want to improve compute performance using a distributed file system that can speed up writes across compute frameworks.

There are many distributed file systems, but Tachyon is the only one that uses operation-based replication instead of state-based replication for reliability. And in the case of BAD-FS, the main distinguishing features of Tachyon is the use of asynchronous checkpoints to bound recompuation of file state.

Ultimately, the inspiration for Tachyon appears to me to be RDD’s in spark. However, spark only has scope and information to track lineage in RDDs within a computation, whereas Tachyon extends lineage to files in the storage system.

Compared to other systems that do track lineage in some form, Nectar takes a more traditional file-based approach and must be used with DryadLINQ and do some code inspection to know if files can be re-used for various queries, whereas Tachyon needs to generalize across languages and frameworks.

Proposed Solution

The authors propose a pushdown of lineage (provenance) into the storage system in order to eliminate network- and disk-based bottlenecks associated with replication in distributed file systems. The authors’ implementation, Tachyon, provides roughly 100x speedup for write performance, and 2x speedup for read performance. For a “realistic workflow,” Tachyon has roughly 4x speedup in overall performance, even with failures (which take only minutes to recover from).

I would say that in comparison to memHDFS (used for the above evaluation), Tachyon certainly seems to perform well.

Contribution

This work is important in how it shows the value of operation-based replication for storage systems and for reuse across compute frameworks. Further, it shows how some of the difficult problems with operation-based replication can actually be addressed with the use of checkpoints and recomputing discarded state.

Comments and Questions

Although the paper says that lineage is used to avoid replication, what is really being enabled is the lazy or low-cost replication of data by recording operation history, aka lineage or provenance. I think the application is very cool, however, and I am curious to see what more recent work in this area looks like.
I am curious if the ability to cache data in spark has special integration with tachyon, as cached data in spark is essentially a checkpoint in Tachyon. Further, I wonder if Tachyon has any integration with ray, which is used for sharing and persisting objects outside of the address space of a running compute job (aka to store objects off-heap for spark which uses the JVM).