Reported Outages

Provisioning failures for CHI@TACC

Resolved Posted by Cody Hammock on June 29, 2022
Outage start Tuesday, June 28, 2022 12 p.m.
Expected end Wednesday, June 29, 2022 4:15 p.m.

We discovered an issue that prevented DHCP from working when provisioning nodes; it has now been resolved.
If you saw error messages like:

Exceeded maximum number of retries. Failed to provision instance <uuid>: Timeout reached while waiting for callback for node <uuid>

or if your instance at TACC simply failed to start after a long wait during this time, please try again, as this issue may have been the cause.

Out-of-band switch maintenance at UC

Resolved Posted by Michael Sherman on June 29, 2022
Outage start Thursday, June 30, 2022 10 a.m.
Expected end Thursday, June 30, 2022 10 a.m.

Between 10 AM and 11 AM Central Time, there will be brief interruptions of the out-of-band network for certain racks at UC.

Users may notice failures to power instances on/off, or to deploy new instances. Running instances are unaffected.

User Portal Maintenance

Resolved Posted by Adam Cooper on June 27, 2022
Outage start Tuesday, June 28, 2022 11 a.m.
Expected end Tuesday, June 28, 2022 11:30 a.m.

Scheduled TLS Certificate Maintenance

Storage system interruption at UC

Resolved Posted by Michael Sherman on June 13, 2022
Outage start Monday, June 13, 2022 3:08 p.m.
Expected end Wednesday, June 15, 2022 2:13 p.m.

Update 06/15/2022:

Provisioning of new instances at UC is now functional, with all the Chameleon-supported images available.

Unfortunately, we were not able to restore all of the images. We are still restoring some of them, and we will contact users whose images could not be restored to work through the available options with them. If you don’t find the image you are looking for, please reach out to the help desk.

CHI@TACC Upgrade June 21, 2022

Resolved Posted by Cody Hammock on June 09, 2022
Outage start Tuesday, June 21, 2022 8 a.m.
Expected end Wednesday, June 22, 2022 7:47 p.m.

RESOLVED: The upgrade is complete! All services are operating normally.

On June 21st, Chameleon services at TACC will be unavailable to permit an OpenStack version upgrade. During this time, the CHI@TACC dashboard and API will be unavailable. Network access to running instances will also be interrupted, but instances will continue to run and will become available again after the upgrade. KVM@TACC will not be impacted.

TACC Network Maintenance Sunday June 26th 2022

Resolved Posted by Cody Hammock on June 08, 2022
Outage start Sunday, June 26, 2022 10 a.m.
Expected end Sunday, June 26, 2022 4 p.m.

UPDATE: The Network Maintenance scheduled for Sunday, June 26th, 2022 has been suspended. 
 

Network maintenance will be carried out between 10:00 AM and 4:00 PM Central Time on Sunday, June 26th. All TACC systems will be inaccessible during this time, including CHI@TACC, KVM@TACC, and the Chameleon Portal. Instances will continue to run, but users will have no access to TACC services and systems until the maintenance is complete.

Please submit any questions you may have via the TACC User Portal.

CHI@NU Public network down

Resolved Posted by Michael Sherman on June 02, 2022
Outage start Wednesday, June 01, 2022 9 p.m.
Expected end Thursday, June 02, 2022 4:36 p.m.

Update: 4:36 PM. NU has restored network connectivity, and the site is back up.


The network providing access to CHI@NU has gone down, preventing access to the site. The web UI and all instances are inaccessible.

Site staff are investigating, and we'll update here with a timeline for resolution.

Baremetal Provisioning Outage for CHI@TACC

Resolved Posted by Cody Hammock on June 01, 2022
Outage start Wednesday, June 01, 2022 8 a.m.
Expected end Friday, June 03, 2022 2:22 p.m.

Resolved: The system is now operating normally. Thank you for your patience.

CHI@TACC is currently experiencing an outage in provisioning baremetal nodes. This does not affect currently running instances, but prevents the launch of new ones. The team is working to resolve the issue.

Upcoming Maintenance window at UC

Resolved Posted by Michael Sherman on May 25, 2022
Outage start Monday, June 06, 2022 8 a.m.
Expected end Tuesday, June 07, 2022 6 p.m.

Update 5:30 pm: Issues are resolved; all nodes are usable again.


Update 4 pm June 7th: Provisioning of baremetal nodes is restored. We're seeing failures to create leases for P2 nodes (types compute_skylake, gpu_rtx_6000), but reservation of P3 nodes is succeeding.