File System Design for an NFS File Server Appliance


What is the problem the authors are trying to solve?

To improve networked file system performance, Network Appliance (NetApp) decided to produce specialized hardware for serving an NFS-based file system. This paper addresses design concerns of a file system optimized for an NFS appliance.

What other approaches or solutions existed at the time that this work was done?

At the time of this paper (‘94), many general-purpose file systems existed, but none were designed for a specialized system such as an NFS appliance. For performance, the proposed approach was compared against an Auspex server and a few Sun SPARC servers.

What was wrong with the other approaches or solutions?

RAID requires a read-modify-write sequence that bottlenecks write performance, and other NFS implementations have high write latency because they must wait until a write reaches disk before acknowledging the client, which also lowers throughput. Additionally, no existing file system was designed specifically for the appliances NetApp was producing, whereas NetApp could co-design the appliance hardware with its own file system.
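To make the RAID bottleneck concrete, here is a minimal sketch (my own illustration, not code from the paper) of the small-write penalty in a RAID-4/5-style array: a single logical write costs two reads and two writes, because the old data and old parity must be read before the new parity can be computed.

```python
# Illustrative sketch (not from the paper) of the RAID-4/5 small-write penalty:
# one logical write turns into two reads and two writes, because parity must
# be recomputed from the old data and the old parity.

class Disk:
    """A toy in-memory disk holding fixed-size blocks."""
    def __init__(self, nblocks: int, block_size: int = 4096):
        self.blocks = [bytes(block_size) for _ in range(nblocks)]

    def read(self, i: int) -> bytes:
        return self.blocks[i]

    def write(self, i: int, data: bytes) -> None:
        self.blocks[i] = data


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def raid_small_write(data_disk: Disk, parity_disk: Disk, blkno: int, new_data: bytes) -> None:
    old_data = data_disk.read(blkno)        # read 1
    old_parity = parity_disk.read(blkno)    # read 2
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    data_disk.write(blkno, new_data)        # write 1
    parity_disk.write(blkno, new_parity)    # write 2
```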

What is the authors’ approach or solution?

The authors’ approach is a co-design of the system hardware and the file system to provide a high-performance NFS implementation. The key features of their file system, WAFL, are the use of NVRAM, which makes WAFL more responsive than other NFS implementations, and the use of shallow (copy-on-write) snapshots, which improve system recovery time and simplify the process of writing data.
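Below is a rough sketch of how I understand the snapshot mechanism from the paper: the block map carries one usage bit for the active file system and one per snapshot, a block is free only when no bit is set, so taking a snapshot just copies the active bits and new writes go to free blocks. The class and method names here are my own illustration, not WAFL's actual data structures.

```python
# Rough sketch (my own illustration) of the block-map idea behind WAFL snapshots:
# each block has a bitmask with bit 0 for the active file system and one bit per
# snapshot; a block is free only when no bit is set, so snapshot creation is
# cheap and writes never overwrite blocks a snapshot still references.

ACTIVE = 0  # bit position for the active file system

class BlockMap:
    def __init__(self, nblocks: int):
        # entries[i] is a bitmask: bit 0 = active FS, bits 1..31 = snapshots
        self.entries = [0] * nblocks

    def allocate(self) -> int:
        """Find a block referenced by neither the active FS nor any snapshot."""
        for i, bits in enumerate(self.entries):
            if bits == 0:
                self.entries[i] = 1 << ACTIVE
                return i
        raise RuntimeError("no free blocks")

    def take_snapshot(self, snap_id: int) -> None:
        """Snapshot creation: copy the active bit into the snapshot's bit."""
        for i, bits in enumerate(self.entries):
            if bits & (1 << ACTIVE):
                self.entries[i] = bits | (1 << snap_id)

    def delete_snapshot(self, snap_id: int) -> None:
        """Clear the snapshot's bit; blocks with no bits left become free."""
        for i in range(len(self.entries)):
            self.entries[i] &= ~(1 << snap_id)

    def overwrite(self, old_block: int) -> int:
        """Copy-on-write: the active FS releases its claim on the old block
        (a snapshot may still hold it) and the new data goes to a free block."""
        self.entries[old_block] &= ~(1 << ACTIVE)
        return self.allocate()
```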

Why is it better than the other approaches or solutions?

The use of NVRAM allows a low-latency approach to maintaining consistency, which makes WAFL more responsive than other NFS implementations. The use of “shallow snapshots” keeps snapshot creation low-latency and low-overhead compared to the Episode file system. Between the aforementioned techniques and good hardware in general, NetApp storage appliances perform much better than the alternatives wherever NFS is a viable option for network-based storage.
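The following is a minimal sketch (hypothetical names, not NetApp's implementation) of the NVRAM request-log idea as I read it: the server acknowledges an NFS write as soon as the request is recorded in NVRAM, flushes dirty data at consistency points, and after a crash replays the log on top of the last consistent on-disk image instead of running a file system check.

```python
# Minimal sketch (hypothetical names, not NetApp's implementation) of the
# NVRAM request-log idea: acknowledge an NFS write once it is recorded in
# NVRAM, flush dirty state at consistency points, and replay the log on top
# of the last consistent on-disk image after a crash.

class NVRAMLog:
    def __init__(self):
        self.entries = []          # in real hardware this survives a crash/reboot

    def append(self, op) -> None:
        self.entries.append(op)

    def clear(self) -> None:
        self.entries = []


class FileServer:
    def __init__(self, nvram: NVRAMLog):
        self.nvram = nvram
        self.in_memory_state = {}  # dirty data waiting for the next consistency point
        self.on_disk_state = {}    # last consistent image on disk

    def nfs_write(self, path: str, data: bytes) -> str:
        self.nvram.append(("write", path, data))  # record the request first
        self.in_memory_state[path] = data
        return "OK"                               # reply without waiting for disk

    def consistency_point(self) -> None:
        # Write all dirty data as one self-consistent image; the logged
        # requests are then no longer needed and the NVRAM log can be cleared.
        self.on_disk_state.update(self.in_memory_state)
        self.in_memory_state = {}
        self.nvram.clear()

    def recover(self) -> None:
        # After a crash: start from the last consistent image and replay the
        # requests still sitting in NVRAM; no file system check is required.
        self.in_memory_state = {}
        for op, path, data in self.nvram.entries:
            if op == "write":
                self.in_memory_state[path] = data
        self.consistency_point()
```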

How does it perform?

The authors are candid that performance is difficult to measure, and that the reported metrics overestimate some aspects and underestimate others. The results can be summarized as follows: the FAServer (the NetApp appliance) has a best response time roughly 3-4x better than the best response times of the other systems; its best throughput (in operations per second) is at least as good as one other system and about 1.5x better than all the rest; and at its best throughput it has a 2-3x better response time than the other systems. However, at best throughput the FAServer's response time is roughly 6x worse than its own best response time, whereas the other servers degrade only 3-4x relative to their best response times.

Why is this work important?

This work showcases an early example of the trend of co-designing hardware and software to address the shortcomings of general-purpose software.

3+ comments/questions

  1. Even if performance comparisons to other systems are difficult, they could have provided better performance measurements. I am particularly curious about how well data de-duplication works (if they had it implemented at this time), how often data from old snapshots are cleaned for re-use, and how long an appliance might typically last before cleaning old data becomes a necessity.

  2. I did not see any mention of Sprite LFS, but clearly the use of NVRAM for a log and then writing the contents of the log to the disk is inspired by LFS. I wonder why they did not compare WAFL's performance to LFS's, given this similarity. I also wonder how beneficial it would be to extend an LFS system with NVRAM in a similar way, so that the cache used for batching writes would be much faster. Would the performance be pretty much the same? Or does the overhead of managing inode maps and block maps make WAFL a little slower than simply writing the data sequentially as in LFS?