Provisioning issues with some configurations at CHI@TACC

Resolved
Posted by Michael Sherman on July 05, 2022
Outage start: Tuesday, July 05, 2022, 8 a.m.
Expected end: Wednesday, July 06, 2022, 6 p.m.

UPDATE 07/26/22: compute_zen3 nodes are now working as expected.


UPDATE 07/06/22: We have worked around a software error on a new network switch, and sharedwan1 is now working as expected.


UPDATE 07/05/22: The Corsa switch configuration has been corrected, and compute_skylake nodes are once again available for use.


We've observed issues with some node types and networks at CHI@TACC: provisioning new instances either takes a long time or times out. We're working on fixes for the following:

  • The sharedwan1 network (Layer 2 connectivity is fine, but the network is encountering issues with DHCP and routing)
  • compute_skylake nodes (a configuration issue with the Corsa switch is causing problems during provisioning)
  • compute_zen3 nodes (a kernel update in the provisioning image is causing failures during provisioning)

We have confirmed that the compute_cascadelake and compute_cascadelake_r node types and the sharednet1 network are functioning normally. You can use them if you need bare-metal nodes while we fix the issues above.