Chameleon Changelog for May 2018

Great news in Chameleon-land!

I hope everybody is enjoying the nice weather and hopefully a more relaxed schedule for the summer! Read on to find out what we were up to last month.

 

Work on deployment challenges. Many of you noticed recently that launching a large number of instances sometimes resulted in errors; in addition, the system was occasionally slow or unresponsive, particularly on CHI@TACC, which handles a larger number of nodes. We applied some patches and reconfigured the system to handle the load better. As a result, the system is now stable again and failure rates have dropped. We are aware of some remaining issues and are working on tracking them down. Some issues will realistically persist, however. In particular, instances may still fail to launch due to unresponsive or broken hardware; this kind of issue is inherently linked to frequent bare-metal reconfiguration. For this reason, we still recommend that medium to large-scale experiments reserve slightly more nodes than needed, to account for a small percentage of failed launches.
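As a rough illustration of that recommendation, here is a minimal sketch of how you might size a reservation. The function name and the 5% failure rate in the usage note are assumptions made up for this example, not measured Chameleon figures; plug in whatever rate you observe in your own experiments.

```python
import math


def nodes_to_reserve(nodes_needed: int, failure_rate: float) -> int:
    """Return the smallest reservation size whose *expected* number of
    successful launches is at least nodes_needed.

    failure_rate is the assumed fraction of launches that fail
    (hypothetical input; 0.0 means no failures are expected).
    """
    if nodes_needed < 0:
        raise ValueError("nodes_needed must be non-negative")
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    # Expected successes = reserved * (1 - failure_rate); solve for reserved
    # and round up to a whole node.
    return math.ceil(nodes_needed / (1.0 - failure_rate))
```

For example, with an assumed 5% failure rate, nodes_to_reserve(40, 0.05) suggests reserving 43 nodes for a 40-node experiment. This covers the expected number of failures only; it is not a guarantee, so add more headroom if a failed launch would be costly.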

New appliances and appliance upgrades. We made quite a few new appliances available this month to get you on your way faster with complex experiments. In addition to providing new functionality and upgrades, all of our new appliances also have the latest versions of Chameleon tools installed, including cc-checks, cc-snapshot and Cloudfuse.

  • ExoGENI stitching appliance. This new appliance automates the creation of a private VLAN network that can be stitched to the slices on the ExoGENI testbed. Simply launch the appliance and create an ExoGENI slice that includes a Chameleon stitchport to extend your experiment across both testbeds -- detailed instructions can be found here.
  • Hadoop appliance. We created a Hadoop complex appliance that deploys a one-click Hadoop cluster, including the Hadoop Distributed File System. The appliance can be used for most Hadoop experiments and can be easily scaled to an arbitrary number of nodes. The appliance comes with a tutorial (see appliance documentation) that teaches distributed computing and networking.
  • CUDA appliance upgrades. We upgraded our CUDA appliance to support CUDA 9.1 for both CentOS 7 (CC-CentOS7-CUDA9) and Ubuntu 16.04 (CC-Ubuntu16.04-CUDA9); both can be launched at CHI@TACC. We will still provide full support for CC-Ubuntu16.04-CUDA8, but can provide only limited support for CC-CentOS7-CUDA8 because CentOS 7 does not support CUDA 8.

 

Changes in appliance charging. We have been seeing very high usage of our GPU, FPGA, and storage hierarchy nodes; GPU nodes in particular typically have to be reserved far in advance. Anticipating these usage dynamics, we originally intended to eventually charge more SUs for these nodes, though we never enforced it in practice. We would now like to start charging twice the regular SU rate to encourage experimenters to think twice about whether they really need those resources -- and, if not, to leave them available to others who do. The changes will come into effect on Monday.
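To see what the new rate means for an allocation, here is a small sketch of the arithmetic. It assumes a baseline of 1 SU per node per hour; that baseline, the function name, and the parameter names are assumptions for illustration, and only the 2x multiplier comes from the announcement above.

```python
def su_cost(node_count: int, hours: float, rate_multiplier: float = 1.0,
            base_su_per_node_hour: float = 1.0) -> float:
    """Estimate the SUs charged for a reservation.

    base_su_per_node_hour is an assumed baseline (1 SU per node-hour);
    rate_multiplier is 1.0 for regular nodes and 2.0 for nodes charged
    at twice the regular rate.
    """
    if node_count < 0 or hours < 0:
        raise ValueError("node_count and hours must be non-negative")
    return node_count * hours * base_su_per_node_hour * rate_multiplier
```

Under these assumptions, reserving 2 GPU nodes for 24 hours would cost su_cost(2, 24, rate_multiplier=2.0) = 96 SUs, compared with 48 SUs for the same reservation on regular nodes.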

 

 

We are looking forward to a fantastic June -- we will be back throughout the month with new feature announcements and will summarize it all again at the end of the month. And as always, please let us know what you think of these changes.

 
 
