This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.
Data management and processing is a vast field and/or topic. If we consider any particular end-to-end system, there can be many systems used to fill specialized roles; on the other hand, there may also be fewer, generalized systems to satisfy the same needs. Here, I want to elaborate on a particular ecosystem that I am trying to fit into and how I view our role or niche in relevant end-to-end systems.
For my purposes here, I think of an ecosystem as a particular set of interchangeable systems that can be integrated to satisfy a purpose. Any particular composition of such systems would be an end-to-end system. Each system therein could stand on it’s own, but if it is specialized to the point of depending on other systems, then it may also be considered a sub-system.
Originally, I was trying to determine a reasonable name for my project that was relevant to
SkyhookDM
. My work was initially imagined as something that would sit on top of–or live as an
extension of–SkyhookDM
. For this reason, I have been calling my project Skytether
(meant to
have the connotation of a particular part of a skyhook). For a first name, it
seemed reasonable.
Now, beyond names, most of the code that I have written is meant to eventually integrate with
SkyhookDM, providing the necessary functionality to be able to delegate some
processing to computational devices that a computational storage server may be using. Similarly to
how SkyhookDM
brings tabular data management to an existing storage system, Ceph
, I imagine
that Skytether
would bring the use of computational storage devices to an existing
computational storage system, SkyhookDM
.
As I was working on Skytether
, I began to feel like I should differentiate some of the work I was
doing as a domain-specific extension of SkyhookDM
. Thus, I initially named the repository
skytether-singlecell
. However, I recently gave a presentation at a CROSS to propose
my work for funding as an open source project. In preparing for this presentation, I realized that
my research really should have multiple parts: (1) Skytether
is an extension and modification of
SkyhookDM
to leverage computational storage devices, and (2) MSG Express
is a domain-specific
data management system that uses Skytether
to support data processing and storage of single-cell
gene expression data. Thus, MSG Express
will be an open source project that provides benefit to
the bioinformatics community (specifically use cases that use single-cell RNA sequencing),
meanwhile Skytether
will most likely be an umbrella project that consists of functionality pushed
into other, various open source projects–arrow and SkyhookDM. I
specifically imagine that MSG Express
will then provide integration between application-level
libraries, such as scanpy and anndata, and Skytether
(or SkyhookDM
).
Work in Progress.