Packaging Experiments for Reproducibility

Did you ever find yourself needing to package an experiment for reproducibility? Perhaps you were preparing homework for students, or finishing a summer project and wanted your collaborators to be able to build on your work easily, or perhaps the conference you are submitting to requires it. Whatever the reason, packaging experiments so that they can be easily repeated, and potentially extended, is increasingly a fundamental precondition for sharing research.


How can a testbed like Chameleon help? First and foremost, it is a shareable instrument: all users have access to the same hardware. It is no longer the case that I can carry out an experiment on a GPU cluster in my basement while you cannot, because you have no access to such a cluster. Second, Chameleon is a cloud. Users access it by configuring images, orchestration templates, and other digital artifacts and deploying them on the testbed to create an experimental environment; others can then use the same artifacts to repeat the deployment and recreate that environment. Just by virtue of using Chameleon for your experiments, then, you are going a long way toward making your experiment repeatable.


What’s missing? We still need to implement the experiment in this environment; gather, analyze, and visualize data; and share it, preferably all within one well-integrated environment. Jupyter Notebook provides most of these additional capabilities (more on that here), and its integration with Chameleon makes repeatability and reproducibility easy. In this blog post we describe a process for packaging an experiment for repeatability.


Step 1: Create Your Experimental Environment

To begin any experiment on Chameleon, the first step is creating your environment. You could do this using any orchestration method, but it is particularly convenient to do it from a Jupyter Notebook, using the same container setup commands you’d use in the CLI. Since Jupyter is integrated with the testbed, your Chameleon credentials are implicit in the Jupyter cells, which makes it easy to build an experimental container: start a reservation, launch an instance, or SSH into one you’ve already launched from the GUI. Look at some of the examples below; you may be able to copy part of their packaging to start your own experiment!
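One way to make this step easier to repeat is to record the choices that define your environment as data stored in the notebook alongside the commands that act on it. The sketch below does not use Chameleon’s own tooling: `EnvironmentSpec` and `ssh_command` are hypothetical helpers, the site, node type, and image values are placeholders, and the default `cc` login user is an assumption you should check against the image you actually boot.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical record of the choices that define an experimental environment."""
    site: str          # testbed site to reserve on (placeholder value below)
    node_type: str     # hardware type requested in the reservation
    node_count: int    # how many nodes to reserve
    image: str         # disk image to boot on each node
    lease_hours: int   # length of the reservation

    def to_json(self) -> str:
        # sort_keys keeps the saved file stable, so diffs between versions stay readable
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "EnvironmentSpec":
        return cls(**json.loads(text))


def ssh_command(floating_ip: str, user: str = "cc") -> list[str]:
    # Assumption: the image's default login user is "cc"; pass a different user if not.
    return ["ssh", f"{user}@{floating_ip}"]


spec = EnvironmentSpec(
    site="CHI@UC",
    node_type="compute_skylake",
    node_count=1,
    image="CC-Ubuntu20.04",
    lease_hours=4,
)
saved = spec.to_json()
assert EnvironmentSpec.from_json(saved) == spec  # the record survives a save/load cycle
print(ssh_command("192.0.2.10"))  # 192.0.2.10 is a documentation-only address
```

Because the spec round-trips through JSON, a collaborator can load the saved file and see exactly what was requested, rather than reconstructing it from prose.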


Step 2: Run the Experiment 

With your experimental container ready, you can quickly code up what it takes to run your experiment (check out example #2 below!). Depending on the complexity of the experiment, you can put this code in a separate notebook and then reuse your experimental container for several different experiments. You can run, tweak, and rerun your experiment as you shape it. A big advantage of using Jupyter is that interleaving code with text lets you explain your experimental steps, justify choices, and discuss alternative approaches.
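When you run, tweak, and rerun an experiment, it helps to record every input next to every output. The following is a minimal sketch, assuming a hypothetical `run_trial` function that stands in for your real workload (here it only returns a synthetic number) and a parameter sweep over batch size and random seed; none of these names come from Chameleon or Jupyter.

```python
import random
from itertools import product


def run_trial(batch_size: int, seed: int) -> float:
    """Hypothetical stand-in for one experiment run; replace with your real workload.

    Returns a synthetic 'throughput' value so the sketch runs anywhere.
    """
    rng = random.Random(seed)  # a private RNG keeps each run independent of global state
    return batch_size * 10.0 + rng.gauss(0.0, 1.0)


def sweep(batch_sizes, seeds):
    """Run every (batch_size, seed) combination and record inputs next to outputs."""
    records = []
    for batch_size, seed in product(batch_sizes, seeds):
        records.append({
            "batch_size": batch_size,
            "seed": seed,
            "throughput": run_trial(batch_size, seed),
        })
    return records


first = sweep(batch_sizes=[16, 32], seeds=[0, 1, 2])
second = sweep(batch_sizes=[16, 32], seeds=[0, 1, 2])
assert first == second  # same parameters and seeds produce the same records
print(len(first))  # 6
```

Fixing the seed per run, rather than relying on global random state, is what makes the rerun identical here; a real workload on real hardware will still vary between runs, which is one more reason to keep the full records.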


Step 3: Analyze Your Data

Jupyter is also a convenient environment for data analysis: after the experiment finishes, or as you go, download the results locally to analyze and visualize them within Jupyter. Because Jupyter displays plots and images inline, you can include them in your notebook, so your container, experiment, and results are all in the same place.
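Once results are downloaded, a few lines of standard-library Python can turn raw samples into per-configuration summaries. This sketch assumes a hypothetical CSV with `config` and `latency_ms` columns; the numbers are illustrative, not measurements. In a notebook you might then pass the same summary to a plotting library such as matplotlib.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical results file, as it might look after downloading it from the node.
RESULTS_CSV = """config,latency_ms
baseline,12.0
baseline,14.0
baseline,13.0
tuned,9.0
tuned,11.0
tuned,10.0
"""


def summarize(csv_text: str) -> dict:
    """Group latency samples by configuration and compute simple summary statistics."""
    samples = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples[row["config"]].append(float(row["latency_ms"]))
    return {
        config: {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample standard deviation; needs n >= 2
        }
        for config, values in samples.items()
    }


summary = summarize(RESULTS_CSV)
for config, stats in sorted(summary.items()):
    print(f"{config}: mean={stats['mean']:.1f} ms, stdev={stats['stdev']:.1f} ms (n={stats['n']})")
```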


Step 4: Share Your Work

Ultimately, implementing your experiments within the Chameleon-Jupyter Notebook environment allows you to replicate your environment and experiment exactly, because your container-building commands and experiment code are saved in one place. Others, whether collaborators, reviewers, or other researchers interested in this area, can then easily repeat your experiment by launching your notebook. Chameleon’s integration with Zenodo, a digital publishing platform, allows you to publish the digital representation of your experiment and obtain a DOI, so you can reference your experiment from your paper!
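Before publishing a notebook and its data, you may want to bundle them with a checksum manifest so that readers can confirm that what they downloaded matches what you published. This is a hedged sketch using only the Python standard library; `package_experiment`, `verify`, and the file names are hypothetical, and the code does not call Zenodo’s API or reflect how Chameleon’s Zenodo integration packages artifacts.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path


def package_experiment(files: list[Path], archive: Path) -> dict[str, str]:
    """Bundle experiment files into a zip together with a SHA-256 manifest."""
    manifest = {}
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            data = path.read_bytes()
            manifest[path.name] = hashlib.sha256(data).hexdigest()
            zf.writestr(path.name, data)
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify(archive: Path) -> bool:
    """Recompute each file's hash inside the archive and compare it with the manifest."""
    with zipfile.ZipFile(archive) as zf:
        manifest = json.loads(zf.read("MANIFEST.json"))
        return all(
            hashlib.sha256(zf.read(name)).hexdigest() == digest
            for name, digest in manifest.items()
        )


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    notebook = tmp_dir / "experiment.ipynb"
    notebook.write_text('{"cells": []}')
    results = tmp_dir / "results.csv"
    results.write_text("config,latency_ms\nbaseline,12.0\n")
    archive = tmp_dir / "experiment.zip"
    manifest = package_experiment([notebook, results], archive)
    print(sorted(manifest))  # ['experiment.ipynb', 'results.csv']
    print(verify(archive))   # True
```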

Chameleon, as a shareable cloud testbed, is designed for experimental collaboration. The Jupyter Notebook integration allows you to mix explanatory text with actionable code, replay experiments, or break them down cell by cell. Whether the goal is to avoid future troubleshooting or to provide educational value, the Chameleon-Jupyter Notebook combination unifies cloud capabilities with code and text. The best part about running your Chameleon experiments with Jupyter Notebook is the convenience; the inherent experimental repeatability is an added bonus.


If you are interested in seeing how others followed these steps, here are some examples of experiments in different areas: 

  1. Application-based QoS Support with P4 and OpenFlow: a networking experiment showing an early use of Jupyter Notebook for packaging. While no published Jupyter notebook exists for this one, you can view it in this short screencast available on our YouTube channel.

  2. Image Classification with AlexNet on the Stanford Dogs Dataset: a machine learning experiment packaged in a Jupyter notebook, designed to be run with tools available within Chameleon and OpenStack. The packaged notebook is available on Zenodo, making it easy to reproduce in about an hour, and it is well suited for teaching machine learning or how to use the Chameleon testbed.

  3. Tiny-Tail Flash: Near Perfect Elimination of Garbage Collection Tail Latencies in NAND SSDs Reproduction: a Jupyter notebook, packaged and available from Zenodo, that reproduces the Dev Tools Release experiment from this paper.


If you would like to try out this packaging method, our Jupyter documentation is a good place to start. Feel free also to explore the examples above: some elements, like the creation of an experimental container or certain data analysis patterns, may be generic to many experiments and will help you get started faster. And if you do package your experiment using this method, please let us know; we’d love to profile it on our blog!


