Paper Review: Bigtable: A Distributed Storage System for Structured Data

This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.

Paper Review: Bigtable: A Distributed Storage System for Structured Data

Review

What’s new

Summary

BigTable is a very flexible DBMS that is designed for scalability. Flexibility primarily comes from the use of both row-based and column-based storage, and scalability comes from grouping over rows and columns for improved locality, and the liberal use of SSTables (aka LSM-merge trees). BigTable had to be flexible to accommodate the vast difference in use cases between financial applications and web-based applications. A distributed lock service, Chubby, was also extensively used, which was also published in the same conference (OSDI ‘06), so there was clearly a lot of new designs coming out of Google at the time.

Thoughts/Comments

  1. The column families are awesome. I’m not sure if this is where the concept was introduced, but my understanding is that it was one of the keys to the design of dremel and is the basis of parquet.

  2. Tablets and tablet servers are equally as great for efficiency as column families, I would think. I am of the impression that in combination with locality groups, this makes lookups extremely efficient as the rows should be in the same server and SSTable. This may be a reach, but I wonder if tablets were an influence in the design of row-based tile groups as used by Joy Alruraj in his “Bridging the Archipelago…” paper.