We all know how wonderful it is to work with the diverse array of bare metal compute resources offered by Chameleon, but what about research that doesn’t require all of the advantages of bare metal? This is what the Chameleon KVM cloud at TACC is for. It is often overshadowed by our bare metal sites, but this month it is in the spotlight!
Why do we have a KVM cloud as a companion to our bare metal offering? Virtualized clouds are more cost-effective because users can deploy multiple VMs on one node instead of allocating a whole bare metal node. This matters now that hardware manufacturers are putting more cores than ever into one system: our new AMD nodes at TACC go up to 128 cores per node. At the same time, virtualized clouds do not provide the performance isolation that a bare metal cloud offers, which makes them unsuitable for many experiments. When they can be used, though, they are cheaper. Specifically, bare metal use is charged at one SU per node hour, while VMs do not use any of your SU allocation. VMs are instead governed by their own resource quota model.
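To make the charging difference concrete, here is a back-of-the-envelope calculation. It assumes only the rate stated above (one SU per node hour on bare metal); the function name and the example workload are hypothetical.

```python
def bare_metal_su_cost(nodes, hours, su_per_node_hour=1.0):
    """SUs charged for a bare metal lease at the stated per-node-hour rate."""
    return nodes * hours * su_per_node_hour


# Hypothetical workload: 4 bare metal nodes held for one week.
week_hours = 7 * 24
print(bare_metal_su_cost(4, week_hours))  # 672.0
```

The same week on VMs would draw nothing from the SU allocation, although it would still count against the KVM resource quota.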
In addition, users can make open-ended deployments on the KVM partition, while bare metal leases are limited to 7 days at most. This is why our KVM cloud is popular with educational projects, projects with long-running applications, as well as projects that explore scale. Plus, using KVM is easier than ever with the python-chi interface. We’ve also created an example Jupyter notebook that will help you to get started. Read on to see how using our KVM cloud is different from the bare metal offering, and how to use it to better support your experiments.
Why should I use KVM?
Similarity to bare metal nodes
If your experiment doesn’t require a specific type of hardware, precise hardware-level measurements, or power management, then KVM may be a good choice for you. KVM instances “feel” just like bare metal. They boot the same images, so you can run the same operating systems, install the same software, and run the same code on them. You can also, of course, connect to your KVM instance via SSH or the serial console in the web browser.
You may have heard that virtual machines are slow because of the overhead of emulating virtual hardware. While it is true that virtual machines lag behind bare metal in disk and network I/O, KVM is comparable to bare metal in CPU performance! For most applications, performance is similar to bare metal, and booting a virtual machine is significantly faster than booting a bare metal instance, a big plus if you are experimenting with scale or your experiment requires frequent reboots.
With KVM, we offer seven different configurations, or “flavors,” which subdivide a node’s resources among VMs in different ways and thus represent different “slices” of a node. This allows you to choose the type of virtual machine that best fits your experiment.
|Flavor|VCPUs|RAM|Disk|
|---|---|---|---|
|m1.tiny|1|512 MB|1 GB|
|m1.small|1|2 GB|20 GB|
|m1.medium|2|4 GB|40 GB|
|m1.large|4|8 GB|40 GB|
|m1.xlarge|8|16 GB|40 GB|
|m1.xxlarge|16|32 GB|40 GB|
|m1.xxxlarge|16|64 GB|40 GB|
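If you are unsure which flavor to pick, a small helper can choose the smallest one that fits. The sketch below copies the values from the flavor table above and assumes its columns are VCPUs, RAM, and disk, in that order. The `Flavor` type and `smallest_flavor` function are hypothetical names for illustration, not part of any Chameleon tool.

```python
from typing import NamedTuple, Optional


class Flavor(NamedTuple):
    name: str
    vcpus: int
    ram_gb: float
    disk_gb: int


# Values copied from the flavor table; ordered from smallest to largest.
FLAVORS = [
    Flavor("m1.tiny", 1, 0.5, 1),
    Flavor("m1.small", 1, 2, 20),
    Flavor("m1.medium", 2, 4, 40),
    Flavor("m1.large", 4, 8, 40),
    Flavor("m1.xlarge", 8, 16, 40),
    Flavor("m1.xxlarge", 16, 32, 40),
    Flavor("m1.xxxlarge", 16, 64, 40),
]


def smallest_flavor(vcpus: int, ram_gb: float, disk_gb: int = 0) -> Optional[Flavor]:
    """Return the first (smallest) flavor meeting every requirement, or None."""
    for flavor in FLAVORS:
        if (flavor.vcpus >= vcpus
                and flavor.ram_gb >= ram_gb
                and flavor.disk_gb >= disk_gb):
            return flavor
    return None


print(smallest_flavor(vcpus=2, ram_gb=6).name)  # m1.large
print(smallest_flavor(vcpus=32, ram_gb=8))      # None
```

Choosing the smallest flavor that fits leaves more of your quota free for additional VMs.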
KVM also has a storage volume feature, powered by OpenStack Cinder. This allows you to keep large collections of data in highly available, fault-tolerant remote block storage. You can then mount these volumes on different VMs, so all of your experiment data lives in one place and can be used from any VM. If you happen to need this remote storage capability on bare metal, then you may be interested in our new filesystem preview!
Another unique feature of KVM is Security Groups. You may have seen these on the bare metal sites before, but we disabled their behavior on those sites in favor of ufw. On KVM, these groups are a convenient way to set network ingress/egress rules for all of your KVM instances. You can configure them before they are provisioned, or after they’ve already booted up. One important quirk to understand with these is that KVM instances do not allow SSH by default! The default security group blocks all incoming connections. To allow SSH to your instance, you’ll need to add the “Allow SSH” security group. There are other security groups to allow other types of traffic, or you can create your own. However, in almost all cases, you will not need any groups besides “Allow SSH.” We recommend using an SSH tunnel for any web-facing services, and reviewing our security best practices blog before making any changes to your security groups.
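As an illustration of the SSH tunnel recommendation, the sketch below builds a standard OpenSSH local port forward (`ssh -N -L`) so that a service listening on the VM can be reached on your own machine without opening any extra security group rules. The helper name, the `cc` login user, and the example address and port are assumptions for illustration; substitute your instance’s floating IP and your own values.

```python
import shlex


def ssh_tunnel_command(floating_ip, remote_port, local_port=None, user="cc"):
    """Build an ssh command that forwards a local port to a port on the VM."""
    if local_port is None:
        local_port = remote_port
    return [
        "ssh",
        "-N",  # do not run a remote command; only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{floating_ip}",
    ]


# Hypothetical example: a notebook server on port 8888 of the VM.
cmd = ssh_tunnel_command("192.0.2.10", 8888)
print(shlex.join(cmd))  # ssh -N -L 8888:localhost:8888 cc@192.0.2.10
```

With the tunnel running, the service is reachable at `localhost:8888` on your machine, and only SSH needs to be allowed in the security groups.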
In the spirit of a mainstream on-demand cloud, you can spin up a VM and just leave it running forever! As long as your allocation hasn’t expired, your VM will remain running. This is extremely useful for experiments that take a long time to run, or even experiments that require lots of re-visiting over a long period of time. This is why Chameleon operators regularly use KVM instances for our testing environments: they are always available whenever we need to test new features.
One highly advantageous use of KVM is to run a component of your experiment that must remain highly available for a long period of time, such as an API server, and have it receive requests from high performance compute nodes running in a bare metal site. This way, you can offload the most demanding parts of your experiment to bare metal, and leave book-keeping and data storage to a lower-performance VM. The benefit here is twofold: you save SUs in your allocation, and leave more bare metal hardware available for your fellow researchers. Who doesn’t love that?
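A minimal sketch of this pattern, using only the Python standard library, might look like the following. A small bookkeeping server (the part that would live on the KVM instance) accepts JSON results over HTTP, and `post_result` is what a bare metal worker would call. Every name here is hypothetical, the results are kept in memory only, and the demo binds to localhost; how the workers reach the VM (security group rules or a tunnel) is left to your setup.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

RESULTS = []  # in-memory store; a real service would persist to a volume
RESULTS_LOCK = threading.Lock()


class ResultHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON record sent by a worker.
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = json.loads(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "body must be JSON")
            return
        with RESULTS_LOCK:
            RESULTS.append(record)
            count = len(RESULTS)
        self._reply(201, {"stored": count})

    def do_GET(self):
        # Return everything collected so far.
        with RESULTS_LOCK:
            snapshot = list(RESULTS)
        self._reply(200, snapshot)

    def _reply(self, status, payload):
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), ResultHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_result(host, port, record):
    """What a bare metal worker would call to report one result."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/", body=json.dumps(record),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print(post_result("127.0.0.1", port, {"node": "worker-1", "runtime_s": 12.5}))
    server.shutdown()
    server.server_close()
```

The VM only has to stay up and accept small requests, so a small flavor is usually enough, while the compute-heavy workers can come and go with their bare metal leases.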
KVM Experiment Pattern Notebook
And we saved the best for last… We recently updated our python-chi interface to be compatible with KVM, so orchestrating your experiments with Python code is now easier than ever. To make it even easier, we also developed a Jupyter notebook that represents an “experiment pattern” for a KVM experiment. First, it guides the user through resource discovery, resource provisioning and configuration, and the other experiment preparation stages, explaining in detail how to use the python-chi interface for KVM to implement them. Second, it implements a simple “demo” experiment that explores Linux cgroups. Each experiment consists of roughly two types of actions, experiment preparation and experiment content, so while your experiment will likely be doing something different, you may find that you can reuse the experiment preparation almost verbatim. In other words, feel free to copy and extend the notebook and use it to kickstart your own experiments. The experiment pattern notebook is available on Trovi to get you started! If you prefer a tutorial on using the KVM@TACC GUI, check out our documentation.
Share with us!
If you conduct some interesting research on KVM (or anywhere on Chameleon ;)) and would like to share with fellow researchers and the Chameleon team, consider uploading it to Trovi, our research sharing and reproducibility platform! For more information on how to share your research on Trovi, check out our May 2022 tips and tricks blog on the subject.