This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.

The Zebra Striped Network File System

What is the problem the authors are trying to solve?

The authors are trying to improve throughput of the file system.

What other approaches or solutions existed at the time that this work was done? What was wrong with the other approaches or solutions?

RAID-II, DataMesh, and TickerTAIP were other solutions for high-performance file servers. RADD is a solution proposed by Stonebraker and Schloss that uses multiple disks, but distributed across many geolocations. While RADD does not improve throughput, it improves reliability and redundancy.

sfs, Bridge, and CFS stripe data across I/O devices (nodes in a parallel computer sounds like devices in a standard computer), and swift is the only other solution the authors mention that stripes data across nodes in a network.

File systems that try to maximize availability are Locus, Coda, Deceit, Ficus, and Harp.

Overall, Zebra appears to have been one of the few solutions in its space, providing improved throughput over LFS by striping data across servers within a network.

What is the authors’ approach or solution?

The authors’ main idea is to stripe across disks and store parity–similar to RAID 5–to improve performance and achieve reliability without full mirroring, while also using a log-structured approach in order to simplify the overall approach. More specifically, clients that communicate with Zebra maintain their own log of file writes, and so small writes are batched, while large writes are appropriately partitioned. Parity for writes are computed based on log records rather than the write or file itself. In this way the authors hope to combine the benefits of striping with the benefits of caching (or batching).

Why is it better than the other approaches or solutions?

There are specific problems with LFS and RAID that the authors are trying to address: LFS is designed for local storage devices, while RAID performs poorly for small writes.

How does it perform?

The authors claim that for large file read/write performance Zebra has up to 4-5 times more throughput compared to NFS or the Sprite file system. The graphs however don’t seem to convey that particularly well. Performance overall seems good, though. The performance trade-offs seem to be that Zebra is fast at writing small files, but unable to improve efficiency of small reads due to being unable to batch them. Large file writes are slightly slower than reads due to parity calculation, but large file throughput has a significant improvement over other approaches, while small file throughput is on par with other approaches. There are certainly beneficial features not measured, such as reliability, that Zebra provides over RAID and Sprite LFS.

Why is this work important?

This work is important as it appears to be successful, very early work in a network file system that uses concepts from RAID. One of the authors is the author of the Sprite LFS paper, and so I’m not surprised that LFS concepts are also incorporated.

Generally speaking, this paper also goes into great detail on how failure is handled of each component, how each component works together to achieve the system goals, and I thought the performance of each component was really great to include. As this was early work on striping files across a network of devices, this detail was likely very useful to the storage research community.

3+ comments/questions

  1. I got the impression that Zebra took a RAID 4 approach where parity was sent to a particular storage server, but I would think that RAID 5 would be better for spreading the load of parity calculation and throughput. Especially since storage server CPU utilization tended to be low compared to other measured resource utilizations.

  2. It was mentioned that small read throughput could not be improved by batching, and so was less than small write throughput. But, I wonder if these authors, or any others, considered the use of indexes for data being frequently read. If such indexes were only for hot data and kept relatively small, then the overhead would be low and there would be little loss compared to not having the indexes.