The advantages of packaging experiments so that others can easily re-execute them are clear: in personal projects, one can easily return to an experiment that was "abandoned" a few months earlier; in academic articles, one can publish the experiment associated with a publication so that others can re-run it; and in class projects, students can share their homework and final report with their professors as executable code. This idea of reproducible research has been on the rise in recent years in many areas of computational and data science, and with it the demand for tools designed specifically for creating experiments in this way.
Popper is a protocol for creating reproducible experiments. It is designed to leverage popular DevOps tools and techniques, such as Git, Docker, and continuous integration (CI), to produce experiments that can be re-executed in different environments with a single command.
In the context of Popper, we refer to experiments as pipelines. These pipelines consist of stages, where stages are (Bash) shell scripts that dictate the steps of an experiment (one script per stage). As part of our efforts, we provide a CLI tool to easily bootstrap and manage repositories of pipelines.
An interesting use case is automatically reproducing experiments on cloud services. To illustrate this, we've developed a pipeline that runs a simple experiment on ChameleonCloud. This pipeline can be used as a learning example and also as a starter template for creating reproducible experiments that run on Chameleon. To obtain this pipeline, one can use the Popper CLI tool:
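A minimal sketch of such a session is shown below. The repository name `mypaper` and the `<org>/<repo>/` prefix passed to `popper add` are placeholders, not values taken from the Popper documentation, and the `DRY_RUN` guard is an addition that lets the sketch be tried on a machine without Popper installed.

```shell
#!/usr/bin/env bash
# Sketch only: bootstrap a Popper repository and fetch the Chameleon pipeline.
set -eu

# Hypothetical placeholders; replace them with the values given in the Popper docs.
REPO_DIR="${REPO_DIR:-mypaper}"
PIPELINE_SOURCE="${PIPELINE_SOURCE:-<org>/<repo>/chameleon-benchmarking}"

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bootstrap() {
  run mkdir -p "$REPO_DIR"
  if [ "$DRY_RUN" = "1" ]; then echo "+ cd $REPO_DIR"; else cd "$REPO_DIR"; fi
  run git init                        # initialize a new Git repo
  run popper init                     # "Popperize" the repository
  run popper add "$PIPELINE_SOURCE"   # download the pipeline
}

bootstrap
```

Running the script with `DRY_RUN=0` executes the commands instead of printing them; this assumes that `git` and `popper` are on the `PATH` and that `PIPELINE_SOURCE` has been set to a real pipeline location.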
The above snippet initializes a new Git repo; "Popperizes" the repository (popper init); and downloads the Chameleon pipeline (popper add), placing it into the pipelines folder of our repository. To keep things simple, the pipeline has three stages and only requires Docker to be installed on the machine where it is being executed. The experiment has the following stages:
setup: creates a reservation and allocates the requested resources on Chameleon using enoslib (a blog post on Enos can be found here). The scripts/request.py file is where resources are specified. If you use this pipeline to bootstrap an experiment, this file needs to be modified in order to specify the actual number and types of machines you need. Once this stage is done, the information about each allocated node is written in the
run: this stage is the one that carries out the actual experiment. In our case we are running a couple of benchmarks using baseliner, and saving the results by retrieving output files into the results/ folder. Of course, this stage should be changed to fit your needs. For example, a tool for running a workflow, automating the configuration, or monitoring resources, can be used to orchestrate an experiment on this allocation.
teardown: deletes the lease used for the experiment.
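The three stages above can be chained by a small driver script. The following is a minimal sketch, not Popper's own implementation: the file names `setup.sh`, `run.sh` and `teardown.sh` and the `run_pipeline` function are assumptions made for illustration. It always attempts the teardown stage, so that a failed run does not leave a lease behind.

```shell
#!/usr/bin/env bash
# Sketch only: run the stages of a pipeline in order, one script per stage.
set -eu

# Run setup.sh and run.sh from the given pipeline folder, stopping at the
# first failure, then always attempt teardown.sh. Returns non-zero if any
# stage failed. The stage file names are assumptions.
run_pipeline() {
  local pipeline_dir="$1"
  local status=0
  local stage
  for stage in setup run; do
    echo "== stage: $stage"
    if ! bash "$pipeline_dir/$stage.sh"; then
      echo "stage '$stage' failed" >&2
      status=1
      break
    fi
  done
  echo "== stage: teardown"
  bash "$pipeline_dir/teardown.sh" || status=1
  return "$status"
}
```

For example, `run_pipeline pipelines/chameleon-benchmarking` would execute the scripts found in that folder, assuming they follow the naming used in this sketch.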
Once you've obtained the pipeline, you can issue popper run chameleon-benchmarking to start the experiment. But it does not have to stop there. Another feature of Popper is the ability to set up our experiments to run automatically on CI environments through the popper ci command. You can also add a validation stage to make sure that a pipeline produces the expected results. Using this, one can check the integrity of an experimentation pipeline whenever a new change is introduced (a commit from us or collaborators).
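As an illustration of what a validation stage might check, the sketch below fails when the results folder is missing, contains no files, or contains an empty file. The `validate_results` function and the specific checks are assumptions; a real validation stage would test whatever claims the experiment is meant to support.

```shell
#!/usr/bin/env bash
# Sketch only: a validation stage that fails if results are missing or empty.
set -eu

# Succeeds only if the given folder exists and holds at least one regular
# file, all of them non-empty. The checks are illustrative assumptions.
validate_results() {
  local results_dir="$1"
  local count=0
  local f
  if [ ! -d "$results_dir" ]; then
    echo "missing results folder: $results_dir" >&2
    return 1
  fi
  for f in "$results_dir"/*; do
    [ -f "$f" ] || continue
    if [ ! -s "$f" ]; then
      echo "empty result file: $f" >&2
      return 1
    fi
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no result files in $results_dir" >&2
    return 1
  fi
  echo "validated $count result file(s)"
}
```

A stage script could end with `validate_results results`, so that a CI run is marked as failed whenever a change stops the pipeline from producing output.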
Popper aims to be a domain-agnostic tool, so a pipeline can implement a scientific exploration from many fields, for example data science, genomics, or Linux kernel research. Since pipelines are made up of familiar shell scripts, it is often straightforward to "popperize" existing experiments. We also provide many examples and starter pipelines that you can take advantage of by using the popper add command. You can read more in the official documentation or take a self-paced hands-on tutorial. If you are interested in contributing or need assistance when using Popper, you can take a look at our issue tracker or reach out to us on Gitter.
Type of contribution: Software
Author(s): Rafael A. Castillo, Francisco E. Gonzalez, Ivo Jimenez
Link to code/repository: https://github.com/systemslab/popper
Link to documentation: http://popper.readthedocs.io
Dependencies: Docker, Popper