Network outage at TACC January 10th-13th

Resolved
Posted by Jason Anderson on January 13, 2020
Outage start Friday, January 10, 2020 4 p.m.
Expected end Monday, January 13, 2020 9 a.m.

We experienced a networking outage across the CHI@TACC site starting Friday at approximately 4 p.m. The internal DHCP service at CHI@TACC stopped responding. As a result, when the DHCP leases held by nodes in running experiments expired, those nodes could not renew them and were effectively disconnected from the network. This made experimental nodes unreachable via SSH, although the Chameleon portal and user interfaces remained operational. CHI@UC was unaffected.

Connectivity was largely restored on Sunday at approximately 4 p.m. and was fully restored by 9 a.m. on Monday morning.