A Scalable Cyberinfrastructure for Repeatable Ecological Research

This blog post discusses CIEF, a Cyber Infrastructure for Ecological Forecasting (Dietz & Matta, 2018), a new experiment deployed on Chameleon. CIEF supports data-driven research in ecological forecasting to understand our ecosystem and drive policy. Examples include predicting environmental changes, near- to medium-term corn production, and the types of disease-carrying mosquitoes, all based on data related to air, land, and water. The goal is to provide a platform where ecologists submit their forecasting model code, run these forecasts, update models whenever new data becomes available, and analyze and visualize results.

CIEF raises the level of abstraction by leveraging the “serverless computing” model (Castro et al., 2019), where users worry only about their application code and not about its deployment. This makes it much easier for domain scientists (users) to advance their research in a scalable, automated, secure, and reproducible way.
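
To make the serverless model concrete, here is a minimal sketch of what a forecasting function could look like from the user's side. It follows the convention of Apache OpenWhisk Python actions, where the entry point is a `main` function that takes a dictionary of parameters and returns a dictionary. The parameter names `observations` and `horizon` and the toy averaging model are assumptions made for illustration, not part of CIEF.

```python
# Hypothetical forecasting function in the style of an OpenWhisk Python action.
# The platform calls main(args) with a dict of parameters and expects a dict back.

def main(args):
    # "observations" and "horizon" are assumed parameter names for this sketch.
    observations = args.get("observations", [])
    horizon = int(args.get("horizon", 1))
    if not observations:
        return {"error": "no observations supplied"}

    # Toy model: repeat the mean of the last three observations
    # for every step of the forecast horizon.
    recent = observations[-3:]
    level = sum(recent) / len(recent)
    return {"forecast": [level] * horizon}
```

Note that the function contains no deployment logic: where it runs, in which container, and with which resources is left entirely to the platform.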

We developed the CIEF hybrid edge-cloud middleware architecture to automate the process of iterative eco-forecasting (Ecological Forecasting Initiative, 2020) for authorized researchers, standardize data formats and interfaces, develop front-ends that support decision making, and allow forecasting models to ‘compete’ and perhaps aggregate their predictions.
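
One way to picture models 'competing' and aggregating their predictions is sketched below: each model's past forecast is scored against observations, and new forecasts are combined with weights inversely proportional to those scores. The choice of RMSE as the score, the inverse-error weighting, and all function names are assumptions for illustration; the post does not say how CIEF compares or combines models.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error; assumes non-empty sequences of equal length."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(observed))

def rank_models(forecasts, observed):
    """Return (model name, RMSE) pairs sorted from best to worst."""
    scores = {name: rmse(pred, observed) for name, pred in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1])

def aggregate(forecasts, scores, eps=1e-9):
    """Combine forecasts with weights inversely proportional to past error."""
    weights = {name: 1.0 / (scores[name] + eps) for name in forecasts}
    total = sum(weights.values())
    steps = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][i] for name in forecasts) / total
        for i in range(steps)
    ]
```

Here `forecasts` maps a model name to its list of predicted values, and `scores` maps a model name to its earlier error, for example `dict(rank_models(past_forecasts, observed))`.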

To test our CIEF architecture, we used Chameleon as a back-end core cloud (datacenter) and GENI (Elliott, 2008) as a distributed system of edge servers. The system then runs the forecasting code submitted by the user near the data source, the data store, or the user. Figure 1 illustrates the workflow supported by CIEF:

  1. The user submits her function code, along with its dependencies and libraries.
  2. An ‘orchestrator’ analyzes the submitted code, installs its dependencies, and configures the container in which the code will run.
  3. The forecasting function runs on OpenWhisk (Apache OpenWhisk, 2020), which we use as the serverless platform.
  4. The orchestrator places the forecasting function strategically, for example close to where the source data is generated or where the result data is stored. This matters especially with huge amounts of data.
  5. The user interface allows the user to register and authenticate, start an experiment, access logs and results, visualize results, and compare or validate models.
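
The placement decision in step 4 can be sketched as a simple cost comparison: for each candidate site, estimate how long it takes to move the input data in and the results out, then pick the cheapest site. The `Site` fields, the bandwidth figures, and the cost model below are hypothetical and only illustrate the idea; the post does not describe the orchestrator's actual placement policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    name: str
    mbps_from_source: float  # assumed bandwidth from the data source to this site
    mbps_to_store: float     # assumed bandwidth from this site to the result store

def transfer_seconds(site, input_mb, output_mb):
    """Estimated time to pull the input data and push the results (MB * 8 = megabits)."""
    pull = input_mb * 8 / site.mbps_from_source
    push = output_mb * 8 / site.mbps_to_store
    return pull + push

def place(sites, input_mb, output_mb):
    """Pick the site with the lowest estimated data-transfer time."""
    return min(sites, key=lambda s: transfer_seconds(s, input_mb, output_mb))

sites = [
    Site("edge-near-source", mbps_from_source=1000, mbps_to_store=100),
    Site("core-cloud", mbps_from_source=100, mbps_to_store=1000),
]
# Large input, small output: the edge site near the data source wins.
print(place(sites, input_mb=5000, output_mb=10).name)
```

With the numbers reversed (small input, large output), the same function would choose the core cloud, which in this sketch sits closer to the result store.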

Figure 1: High-level view of the CIEF prototype

Figure 2 shows the user’s view of CIEF: the user submits her code (e.g., written in the R programming language), accesses logs, and views results (e.g., comparing different prediction models).

Figure 2: Screenshots showing the user interface of CIEF.

Chameleon has proven to be critical infrastructure on which CIEF has been tested and used by our ecology collaborators. Chameleon resources, however, need to be reserved in advance and are available only until the reservation expires. We are investigating ways to dynamically reserve and elastically manage Chameleon resources so that CIEF can operate more reliably.

The vision is for CIEF to also support other applications (with different demands on resources) and to broker services provided by any number of cloud providers, including commercial offerings, as shown in Figure 3.

Figure 3: CIEF's general architecture.

References:
Apache OpenWhisk. (2020). http://openwhisk.apache.org/
Dietz, M., & Matta, A. (2018). EFCI. A Scalable and Secure Cyberinfrastructure for the Repeatability of Ecological Research. https://www.bu.edu/hic/2018/02/01/a-scalable-and-secure-cyberinfrastructure-for-the-repeatability-of-ecological-research/
Ecological Forecasting Initiative. (2020). https://ecoforecast.org/
Elliott, C. (2008). GENI (Global Environment for Network Innovations). 33rd IEEE Conference on Local Computer Networks (LCN). https://doi.org/10.1109/lcn.2008.4664143
Castro, P., Ishakian, V., Muthusamy, V., & Slominski, A. (2019). The Rise of Serverless Computing. Communications of the ACM, 62(12), 44–54.

