Trick or Treat?

Posted by Kate Keahey on October 31, 2016

In the true Halloween spirit we declared war on – and vanquished (or at least did severe damage to) – several dangerous demons that we are all afflicted by on a daily basis:

The Ghost of Complexity Past. Many of our users want to deploy “virtual clusters”, such as OpenStack installations or MPI clusters. These are often hard to deploy because their configuration involves exchanging information that is typically available only at deployment time, such as hostnames or security keys. The last configuration steps are therefore often carried out manually, which increases complexity and makes deployments hard to reproduce. To solve this problem we have deployed an orchestrator for complex appliances that allows you to configure “virtual clusters” on bare metal hardware “with one click”. It takes an image (appliance) and a template – a document that defines how many instances of this appliance to deploy, what information to exchange, and what scripts to run on deployment – and then deploys a cluster of the specified size automatically, and in exactly the same way, time and time again. It also supports parameterizing deployments – for example, configuring the number of nodes in a cluster. Complex appliance support is currently available on CHI@UC and CHI@TACC. To get started, please check out our guide to working with complex appliances.
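To make the idea of a template concrete, here is a minimal Python sketch of what such a document captures: an image, a node count, and a script that needs information known only at deployment time. This is not the orchestrator's actual template format or API; the `Template` class, the `plan_deployment` function, and the hostname scheme are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical stand-in for a complex appliance template."""
    image: str        # appliance image that every node boots from
    node_count: int = 2   # deployment parameter: size of the cluster
    # Script run on each node at deployment; the placeholders are
    # filled in with values that exist only at deployment time.
    script: str = "echo {hostname} peers={peers}"


def plan_deployment(template, cluster_name):
    """Return one (hostname, command) pair per node.

    Hostnames are generated here, at planning time, and each node's
    script receives the hostnames of all the other nodes. This mimics
    the information exchange that would otherwise be done by hand.
    """
    hostnames = [f"{cluster_name}-{i}" for i in range(template.node_count)]
    plan = []
    for host in hostnames:
        peers = ",".join(h for h in hostnames if h != host)
        command = template.script.format(hostname=host, peers=peers)
        plan.append((host, command))
    return plan


if __name__ == "__main__":
    template = Template(image="example-appliance", node_count=3)
    for host, command in plan_deployment(template, "mpi"):
        print(host, "->", command)
```

Because the plan is computed entirely from the template and its parameters, running it twice with the same inputs yields the same result, which is the property that makes such deployments repeatable.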

The time-sucking Vampire of Having-to-Do-Everything-Yourself. While we are confident that many of you will want to configure your own complex appliances representing interesting frameworks you have developed, we thought we’d get you started with a few. They include several of the most popular distributed applications and frameworks:

  • OpenStack Mitaka: this complex appliance deploys OpenStack Mitaka with DevStack over one controller node and a configurable number of compute nodes

  • MPI bare-metal cluster: this complex appliance deploys a bare-metal MPI cluster using the MVAPICH2 implementation

  • NFS share: this complex appliance deploys an NFS server exporting a directory to a configurable number of NFS clients

  • MPI + SR-IOV KVM cluster: this complex appliance deploys an MPI cluster of KVM virtual machines using the MVAPICH2-Virt implementation, configured with SR-IOV for high-performance communication over InfiniBand

All of these complex appliances can be found in our Appliance Catalog; they are identified by a cluster icon in the top right corner. We hope that you will use our instructions for customizing or writing new complex appliances to configure appliances that represent the research you are working on, and that you will share them with other users.

The Nightmare of Knowing-about-Cool-New-Hardware-without-Being-Able-to-Use-It. Now you can lay your hands on non-volatile memory and FPGAs! We added two NVMe SSDs to each of our two storage hierarchy nodes. Each device is a 2.0 TB Intel SSD DC P3700, advertised with a sequential read throughput of 2,800 MB/s and a sequential write throughput of 2,000 MB/s. The storage hierarchy available on each of the storage nodes is now: 512 GiB of RAM, two NVMe SSDs of 2.0 TB each, four Intel S3610 SSDs of 1.6 TB each, and four 15K SAS HDDs of 600 GB each. We also made four FPGA nodes available. Each of these nodes is fitted with a Nallatech 385A board with an Altera Arria 10 1150 GX FPGA (up to 1.5 TFLOPS), 8 GiB of on-card DDR3 memory, and dual QSFP 10/40 GbE support. The nodes are configured to run OpenCL code, but they can be reconfigured (by a request to our help desk) to run compiled designs prepared with Altera Quartus. Due to export control limitations, access to the development toolchain requires verification of your user profile. We explain how to gain access to the development toolchain and execute code on the FPGA nodes in the FPGA guide.
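For a rough sense of scale, the short Python sketch below adds up the capacity of each storage tier listed above and estimates how long one full sequential pass over a single NVMe device would take at the advertised rates. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and treats the advertised throughput as sustained, which real workloads may not achieve; the function names are made up for this example.

```python
# Per-node storage tiers from the announcement: (name, device count, bytes per device).
TIERS = [
    ("NVMe SSD (Intel DC P3700)", 2, 2.0e12),
    ("Intel S3610 SSD", 4, 1.6e12),
    ("15K SAS HDD", 4, 600e9),
]


def tier_capacity_tb(count, size_bytes):
    """Total capacity of one tier in decimal terabytes."""
    return count * size_bytes / 1e12


def seconds_to_transfer(size_bytes, throughput_mb_s):
    """Time for one sequential pass, assuming the advertised rate is sustained."""
    return size_bytes / (throughput_mb_s * 1e6)


if __name__ == "__main__":
    for name, count, size_bytes in TIERS:
        print(f"{name}: {tier_capacity_tb(count, size_bytes):.1f} TB")
    # One 2.0 TB NVMe device at the advertised sequential rates.
    print(f"full read:  {seconds_to_transfer(2.0e12, 2800):.0f} s")
    print(f"full write: {seconds_to_transfer(2.0e12, 2000):.0f} s")
```

Under these assumptions each node offers 4.0 TB of NVMe, 6.4 TB of S3610 SSD, and 2.4 TB of HDD capacity, and reading one NVMe device end to end takes roughly 12 minutes.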

The Specter of Too-High-Entry-Barrier. We replaced the dusty old spellbook that served us all so well with a fresh bare metal guide that is much easier to follow for different groups of users. We have split the documentation into two parts, one focusing on using Chameleon through its web interface, and the other focusing on using Chameleon through the command line and APIs. We also created a Community Resources space on the Chameleon website (currently available off the Documentation tab) and added two new training videos demonstrating how to use Chameleon for experimental research, with a use case showing a performance evaluation of Spark – one for Linux and OS X users and one for Windows users. From now on, this will be the space where education and training materials using Chameleon will be released – we know that many of you are using Chameleon in classes and hope that these materials will be useful. As always, we are interested in your feedback, so please let us know what types of training materials you would like to see or if you have any you would like to contribute!

Happy Halloween!