HP AutoRAID Hierarchical Storage System

This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.

HP AutoRAID Hierarchical Storage System

What is the problem the authors are trying to solve?

RAID configuration is a black art and difficult to adapt to a variety of workloads and changes to the usage over time.

What other approaches or solutions existed at the time that this work was done? What was wrong with the other approaches or solutions?

Other approaches did tiering between HDDs and tape, but not multiple tiers on only HDDs. Other solutions also have drawbacks in using variable-size records instead of fixed-size blocks, or didn’t accommodate the use of fixed-size blocks and multiple storage levels.

What is the authors’ approach or solution?

The authors’ approach is a two tiered storage system, where one tier is mirrored and the other tier is striping with parity. By migrating cold data to the mirrored tier, and leaving hot data in the striped tier, the proposed storage system is able to provide performance for data being frequently accessed, and reliability for data that is accessed infrequently.

Why is it better than the other approaches or solutions?

The main aspects that makes this work better than other approaches is the combination of targeting RAID arrays, targeting fast medium for both tiers (instead of archival mediums such as tape), and the tiering of both metadata (including parity) and data (cold and hot).

How does it perform?

It seems that IOs per second performed as expected, where RAID needs 4 IOs for RAID 5, AutoRAID needs 2 IOs for RAID 1 (mirrored), and JBOD needs a single write. However, for read performance AutoRAID does very well, even with JBOD. The authors describe this as a benefit of mirrored storage to compensate for controller overhead. RAID seems to do unexplainably poor, which the authors attribute to configuration black arts and the difficulty in configuring for a variety of workloads.

Why is this work important?

This work seems important enough, but perhaps isn’t revolutionary. It seems to have been a major step toward tiered storage within a single class (not considering archival storage as a tier), but perhaps it only does this in the context of RAID arrays. This is still a big deal for storage systems, though it is hard to imagine how important the RAID context was for this time.

3+ comments/questions

  1. I’m still not sure how mirrored storage improves performance and hides controller overhead. My best guess is that it has to do with being able to request data from one disk, and requesting different data from the other disk, and just that when one disk is being used, the other one is always available to service any request, rather than some requests being required to go to a specific disk.

  2. Without having done any searching myself, I wonder how long it was before a research group combined the work on tickerTAIP with AutoRAID. If AutoRAID is doing this tiering, it would be really cool to allow for distribution of the RAID controller, especially since it’s entirely in software, to see if scalability improves further.

  3. I wonder if this work is a precursor to tiering between SSDs and HDDs as well, or if groups that first did that perhaps did not know about this particular system.