Some ideas and arbitrarily-ish chosen decisions.
|19/04/12||Fri||Gather an example workload of HDF5 and implement HDF5 VOL Passthrough||Task|
|19/04/15||Mon||Gather additional bioinformatics use cases from Genomics Institute||Task|
|19/04/19||Fri||Have workloads working on HDF5 files on top of librados||Milestone|
|19/05/03||Fri||Have workloads working on HDF5 files on top of HDF5-RADOS interface||Milestone|
For Friday (05/03), I would like to have a naive/simple implementation of what Carlos had mentioned, where HDF5 VOL passes API calls to another HDF5 VOL server/endpoint which is closely integrated with RADOS (via objecter class/code) so that I can compare the overhead of using librados (just VOL call out to object store) and the overhead of the object store “incorporating” operations from an access library.
I think at this point, I would be able to do a majority of a project for Carlos’s class, which more explicitly mentions benchmarking the comparative performance of HDF5 over a local file system to HDF5 operations distributed to multiple, remote file systems:
Build a HDF5/VOL plugin that maps to HDF5. Measure performance of HDF5 over a regular file system vs HDF5/VOL to HDF5 over a regular file system. What is the overhead of this indirection? Then map HDF5/VOL via plugin to HDF5 on multiple servers. How many servers do you need to be faster than HDF5 over a regular file system?
Pending time I want to merge some ideas here with ideas for a microservice architecture for a bioinformatics application.