OSU-CentOS7-SRIOV-MVAPICH2-Virt: Documentation

Please refer to the bare metal user guide for documentation on how to reserve and provision resources using the CC-CentOS7-SRIOV-MVAPICH2-Virt appliance.

Set up a floating IP on your node and connect to it with SSH. If you are using bare metal nodes with InfiniBand, first set up the IP address of the IB interface (otherwise, skip these steps). To do this, edit the "/etc/sysconfig/network-scripts/ifcfg-ib0" file and check that the last two octets of the IPADDR field are the same as those of the Ethernet IP address (if the Ethernet address is 10.40.x.y, then the IPADDR field can have the value 172.16.x.y).
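
The address mapping described above can be sketched as a pair of small shell helpers. This is only an illustration: the function names ib_addr_from_eth and set_ifcfg_ipaddr are hypothetical, and the sketch assumes the 172.16 prefix from the example and an ifcfg file that already contains an IPADDR= line.

```shell
# Derive the ib0 address from the Ethernet address by keeping the last two
# octets and swapping the prefix (10.40.x.y -> 172.16.x.y, as in the text).
ib_addr_from_eth() {
    eth_addr=$1
    # Split the dotted quad on "." and print the new prefix plus octets 3 and 4.
    printf '%s\n' "$eth_addr" |
        awk -F. 'NF == 4 { printf "172.16.%s.%s\n", $3, $4 }'
}

# Rewrite the IPADDR line of an ifcfg-style file (hypothetical helper).
# A temporary file is used instead of in-place editing for portability.
set_ifcfg_ipaddr() {
    cfg_file=$1
    new_addr=$2
    sed "s/^IPADDR=.*/IPADDR=${new_addr}/" "$cfg_file" > "${cfg_file}.tmp" &&
        mv "${cfg_file}.tmp" "$cfg_file"
}
```

As root, a call such as set_ifcfg_ipaddr /etc/sysconfig/network-scripts/ifcfg-ib0 "$(ib_addr_from_eth 10.40.1.2)" would write 172.16.1.2 into the file; it is worth inspecting the file before restarting the interface.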

Next, run the following two commands as root:

[root@host]# ifdown ib0
[root@host]# ifup ib0

The instance should now be set up and ready to use.

Launching Virtual Machines on Bare-metal InfiniBand Nodes with SR-IOV and IVSHMEM on Chameleon

We provide a CentOS 7 VM image (chameleon-mvapich2-virt-appliance.qcow2) and a VM startup script (start-vm.sh) to help users launch VMs with SR-IOV and IVSHMEM. IVSHMEM is an efficient mechanism that enables shared-memory communication between co-located VMs. Before you can launch a VM, you have to create a network port. To do this, source your OpenStack credentials file (see how to download your credentials file) and run this command:

[user@host]$ neutron port-create sharednet1

Note the MAC address and IP address in the output of this command. Use the MAC address when launching the VM and the IP address to SSH to the VM. You also need the PCI device ID of the virtual function that you want to assign to the VM. This can be obtained by running "lspci | grep Mellanox" and looking for the device ID (in the format XX:XX.X) of one of the virtual functions, as shown below:

[cc@host]$ lspci | grep Mellanox
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
03:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
...
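
If you prefer to pick out the virtual function IDs with a filter rather than by eye, a minimal sketch could look like the following. The function name list_vf_ids is hypothetical, and the filter assumes that virtual functions are labeled "Virtual Function" in the lspci description, as in the listing above.

```shell
# Print the PCI device ID (first column) of every Mellanox virtual function
# found in lspci-style text read from standard input.
list_vf_ids() {
    awk '/Mellanox/ && /Virtual Function/ { print $1 }'
}
```

It would be used as "lspci | list_vf_ids", printing one device ID per line.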

In the lspci output above, the PCI device ID of the virtual function is 03:00.1. Now you can launch a VM on your instance with SR-IOV and IVSHMEM by running the provided VM startup script as root with the following arguments:

[root@host]# ./start-vm.sh <vm-mac> <vm-ifname> <virtual-function-device-id>

Please note that <vm-mac> and <virtual-function-device-id> are the values you get from the outputs of the above commands, and <vm-ifname> is the name of the VM virtual NIC interface. For example:

[root@host]# ./start-vm.sh fa:16:3e:47:48:00 tap0 03:00.1
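
Because a mistyped argument is easy to miss, it can help to sanity-check the three values before calling the script. The following is a sketch only: check_vm_args is a hypothetical helper, not part of start-vm.sh. It accepts either "." or ":" before the PCI function digit, since the exact form the script expects is an assumption here, and it limits the interface name to 15 characters, the Linux maximum.

```shell
# Return 0 when the three start-vm.sh arguments look well-formed, 1 otherwise.
check_vm_args() {
    mac=$1
    ifname=$2
    vf=$3
    hex='[0-9A-Fa-f][0-9A-Fa-f]'

    # MAC address: six colon-separated pairs of hex digits.
    printf '%s\n' "$mac" | grep -Eq "^(${hex}:){5}${hex}\$" ||
        { echo "bad MAC address: $mac" >&2; return 1; }

    # Interface name: 1 to 15 characters from a conservative character set.
    printf '%s\n' "$ifname" | grep -Eq '^[A-Za-z0-9_.-]{1,15}$' ||
        { echo "bad interface name: $ifname" >&2; return 1; }

    # PCI device ID: bus:device, then "." or ":" and a function digit 0-7.
    printf '%s\n' "$vf" | grep -Eq "^${hex}:${hex}[.:][0-7]\$" ||
        { echo "bad PCI device ID: $vf" >&2; return 1; }
}
```

A typical use would be to run check_vm_args with the same three arguments first and call start-vm.sh only if it succeeds.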

You can also edit the corresponding fields in the VM startup script to change the number of cores, the memory size, the size of the IVSHMEM region, and so on.

At this point, you should have a VM running on your bare metal instance. If you want to run more VMs on your instance, you will have to create more network ports. You will also have to give each VM a different virtual NIC interface name (such as tap1, tap2, etc.) and a different virtual function device ID.
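
When launching several VMs, the bookkeeping of interface names can be sketched as follows. The helper plan_vm_launches is hypothetical: it reads one "<vm-mac> <virtual-function-device-id>" pair per line and only prints the commands, so you can review them before running anything as root.

```shell
# Print one start-vm.sh command per "<vm-mac> <vf-id>" pair read from stdin,
# numbering the tap interfaces tap0, tap1, ... in input order.
plan_vm_launches() {
    awk 'NF == 2 { printf "./start-vm.sh %s tap%d %s\n", $1, n++, $2 }'
}
```

Each input line must come from its own network port and its own virtual function; lines that do not have exactly two fields are skipped.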

Extra Initialization when Launching Virtual Machines

To run the high-performance MPI library MVAPICH2-Virt across VMs with SR-IOV and IVSHMEM while keeping the VM image small, extra initialization is executed automatically when a VM is launched. It includes:

  • Detect the Mellanox SR-IOV drivers; download and install them if they are missing
  • Detect the ID of the IVSHMEM device and set appropriate permissions on it
  • Detect the MVAPICH2-Virt library; download and install the latest version in /opt/mvapich2-virt if it is missing
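
The "detect, then install if missing" pattern in the list above can be illustrated with a small sketch. This is not the appliance's actual initialization code, which is not shown here; ensure_present is a hypothetical helper, and the install command passed to it is a placeholder.

```shell
# Run the given install command only if the target path does not exist yet,
# so that repeated VM launches do not reinstall anything.
ensure_present() {
    target=$1
    shift
    if [ -e "$target" ]; then
        echo "found $target, skipping"
    else
        echo "missing $target, installing"
        "$@"
    fi
}
```

For instance, ensure_present /opt/mvapich2-virt followed by some install command would run that command on the first launch only.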

After the extra initialization procedure finishes, you should be able to run the MVAPICH2-Virt library with SR-IOV and IVSHMEM support across VMs. For more details on the MVAPICH2-Virt library, please refer to its user guide.

Important Note after Tearing Down Virtual Machines

Once you tear down the VMs, delete the network ports you created for them earlier using the following command:

[user@host]$ neutron port-delete PORT

Please note that it is important to delete unused ports after your experiments.
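
If you created several ports, the cleanup can be wrapped in a loop. The sketch below is an assumption-laden convenience, not part of the appliance: delete_ports is a hypothetical helper, and the NEUTRON variable exists only so the command can be swapped for echo in a dry run. Your OpenStack credentials file must be sourced first.

```shell
# Run "neutron port-delete" for every port given as an argument.
# Set NEUTRON=echo to print the commands instead of executing them.
delete_ports() {
    for port in "$@"; do
        ${NEUTRON:-neutron} port-delete "$port" ||
            echo "failed to delete $port" >&2
    done
}
```

Running it once with NEUTRON=echo shows exactly which ports would be removed before you delete them for real.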