This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.
It also records some ideas and somewhat arbitrarily chosen decisions.
Date | Day | Description | Task/Milestone |
---|---|---|---|
19/04/12 | Fri | Gather an example workload of HDF5 and implement HDF5 VOL Passthrough | Task |
19/04/15 | Mon | Gather additional bioinformatics use cases from Genomics Institute | Task |
19/04/19 | Fri | Have workloads working on HDF5 files on top of librados | Milestone |
19/05/03 | Fri | Have workloads working on HDF5 files on top of HDF5-RADOS interface | Milestone |
For Friday (05/03), I would like to have a naive implementation of what Carlos mentioned: the HDF5 VOL passes API calls to another HDF5 VOL server/endpoint that is closely integrated with RADOS (via the objecter class/code). With that in place, I can compare two overheads: the overhead of using librados (the VOL simply calls out to the object store) and the overhead of the object store “incorporating” operations from an access library.
At that point, I think I would be able to do most of a project for Carlos’s class, which more explicitly calls for benchmarking HDF5 over a local file system against HDF5 operations distributed across multiple remote file systems:

> Build a HDF5/VOL plugin that maps to HDF5. Measure performance of HDF5 over a regular file system vs HDF5/VOL to HDF5 over a regular file system. What is the overhead of this indirection? Then map HDF5/VOL via plugin to HDF5 on multiple servers. How many servers do you need to be faster than HDF5 over a regular file system?

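
As a rough illustration of the "overhead of this indirection" measurement, here is a minimal timing sketch in Python. It does not use HDF5 or librados at all: `DirectBackend` and `PassthroughBackend` are hypothetical stand-ins (plain file writes and a forwarding wrapper) for native HDF5 and a VOL passthrough layer, and the workload size is an arbitrary assumption. The point is only the shape of the comparison: run the same workload through both paths and report the ratio.

```python
import os
import tempfile
import time


class DirectBackend:
    """Hypothetical stand-in for native HDF5 on a local file system."""

    def __init__(self, path):
        self.path = path

    def write(self, name, data):
        # One file per "dataset"; a real test would call the HDF5 API here.
        with open(os.path.join(self.path, name), "wb") as f:
            f.write(data)


class PassthroughBackend:
    """Hypothetical stand-in for a passthrough layer that forwards each call."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = 0

    def write(self, name, data):
        self.calls += 1  # bookkeeping an extra layer might do
        self.inner.write(name, data)


def time_workload(backend, n_writes, size):
    """Return elapsed seconds for n_writes writes of `size` bytes each."""
    payload = b"\0" * size
    start = time.perf_counter()
    for i in range(n_writes):
        backend.write(f"dset_{i}", payload)
    return time.perf_counter() - start


def measure_overhead(n_writes=200, size=4096):
    """Time the same workload directly and through the passthrough layer."""
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        direct = time_workload(DirectBackend(d1), n_writes, size)
        layered_backend = PassthroughBackend(DirectBackend(d2))
        layered = time_workload(layered_backend, n_writes, size)
    return {
        "direct_s": direct,
        "layered_s": layered,
        "ratio": layered / direct,  # > 1 means the indirection cost time
        "calls": layered_backend.calls,
    }


if __name__ == "__main__":
    print(measure_overhead())
```

A single run like this is noisy; repeating the measurement and comparing medians would give a more trustworthy ratio. The same harness shape should carry over to the multi-server question by swapping in a backend that distributes writes.
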
Time permitting, I want to merge some of the ideas here with ideas for a microservice architecture for a bioinformatics application.