Declarative Bioinformatics (2019-08-05) (Work in Progress)

This post attempts to describe and explore a “declarative programming interface” for bioinformatics. Its specific purpose is to determine the subset of relational algebra, relational calculus, or another query language that a storage system would need to provide in order to support various bioinformatics use cases. I will start with gene expression and single-cell RNA sequencing (scRNA-seq) use cases, and then explore how the interface generalizes to other bioinformatics use cases.
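To make the idea of a "necessary subset" concrete, here is a minimal sketch in Python of one gene expression question, "what is the mean expression of a gene per cell type?", written in terms of a few relational operators. Everything in it is an assumption made for illustration: the two toy relations (`expression`, `cells`), their column names, the counts, and the helper functions are hypothetical and do not describe any particular storage system. Note that the grouping and averaging step is an extension beyond classical relational algebra.

```python
from collections import defaultdict

# Hypothetical toy relations; cell IDs, counts, and cell-type labels are made up.
expression = [
    {"cell_id": "c1", "gene": "CD3E", "count": 12},
    {"cell_id": "c1", "gene": "MS4A1", "count": 0},
    {"cell_id": "c2", "gene": "CD3E", "count": 8},
    {"cell_id": "c3", "gene": "CD3E", "count": 0},
    {"cell_id": "c3", "gene": "MS4A1", "count": 20},
]
cells = [
    {"cell_id": "c1", "cell_type": "T cell"},
    {"cell_id": "c2", "cell_type": "T cell"},
    {"cell_id": "c3", "cell_type": "B cell"},
]


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the named columns of each row."""
    return [{col: row[col] for col in columns} for row in rows]


def join(left, right, key):
    """Equi-join of two relations on a shared key column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_mean(rows, by, value):
    """Grouped aggregation: mean of `value` for each distinct `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {k: sum(v) / len(v) for k, v in groups.items()}


def mean_expression_by_cell_type(gene):
    """Compose the operators above into a single query."""
    rows = select(expression, lambda r: r["gene"] == gene)
    rows = join(rows, cells, "cell_id")
    rows = project(rows, ["cell_type", "count"])
    return group_mean(rows, "cell_type", "count")


if __name__ == "__main__":
    print(mean_expression_by_cell_type("CD3E"))
```

Under these assumptions, this one query already needs selection, projection, an equi-join, and a grouped aggregate, which is the kind of inventory of required operators that the interface is meant to produce for each use case.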

Bioinformatics is a broad field that seeks to understand molecular and cellular biology through the analysis of biological data. Taking a top-down approach to designing the declarative programming interface, I start with some key biological entities and concepts from gene expression analysis and single-cell RNA sequencing.

List of Biological Entities and Concepts (Work in Progress)

Molecular and cellular biology terminology:

Sequencing terminology:

Analysis terminology:

Ontology terminology:

Resources