Reported Outages

CHI@UC outage

Resolved Posted by Michael Sherman on November 09, 2022
Outage start Wednesday, November 09, 2022 3:03 p.m.
Expected end Wednesday, November 09, 2022 10 p.m.

10pm CT: This is now resolved.


Due to a disk failure, we need to restart the management node running CHI@UC.

Scheduled maintenance on authentication systems

Resolved Posted by Adam Cooper on November 02, 2022
Outage start Monday, December 05, 2022 9 a.m.
Expected end Monday, December 05, 2022 10 a.m.

We will be bringing down our authentication system for about an hour at 9 AM CT on December 5th for scheduled maintenance. If there are any conflicts with your deadlines, please contact us immediately.

CHI@UC server error

Resolved Posted by Mark Powers on October 27, 2022
Outage start Thursday, October 27, 2022 8 a.m.
Expected end Thursday, October 27, 2022 9:35 a.m.

UPDATE: Services are now working as expected. Please contact the help desk if you encounter any issues.


Requests to CHI@UC are not working. Running experiments should be unaffected.  

CHI@TACC Certificate Issue

Resolved Posted by Cody Hammock on October 20, 2022
Outage start Thursday, October 20, 2022 10 a.m.
Expected end Thursday, October 20, 2022 10:50 a.m.

Due to an issue with SSL certificate generation, CHI@TACC was unavailable between 10:00am and 10:50am Central Time. Running instances were not affected.

New instances failing to provision at CHI@UC

Resolved Posted by Adam Cooper on October 12, 2022
Outage start Wednesday, October 12, 2022 3 p.m.
Expected end Thursday, October 13, 2022 5:32 p.m.

Nodes at UC are currently failing to provision due to an issue communicating with the switches’ control plane. Staff are investigating.

CHI@TACC Skylake nodes unavailable

Resolved Posted by Cody Hammock on September 29, 2022
Outage start Wednesday, September 28, 2022 1 p.m.
Expected end Monday, October 17, 2022 7:49 a.m.

Resolved: The failed switch has been replaced, and nodes are once again available.

Due to hardware issues with a switch, the Skylake compute nodes for CHI@TACC are currently unavailable. Staff are working to restore connectivity for these nodes.

TACC and Authentication Scheduled Maintenance

Resolved Posted by Adam Cooper on September 16, 2022
Outage start Monday, September 26, 2022 11 a.m.
Expected end Monday, September 26, 2022 12 p.m.

The TACC site and our authentication system will be down for scheduled maintenance for an hour on September 26th. 

Authentication issues affecting some users

Resolved Posted by Michael Sherman on September 12, 2022
Outage start Saturday, September 10, 2022 12 p.m.
Expected end Monday, September 12, 2022 6 p.m.

This has been resolved. Users who still see a "No active allocations" message in error can clear it by logging out and back in.

Chameleon Portal, CHI@TACC, KVM@TACC, and CHI@EDGE networking outage

Resolved Posted by Mark Powers on September 02, 2022
Outage start Friday, September 02, 2022 10:46 a.m.
Expected end Friday, September 02, 2022 3:04 p.m.

UPDATE: All services should now be working as expected.

---

Due to a networking issue at TACC, Chameleon’s portal, CHI@TACC, KVM@TACC, and CHI@EDGE are currently unavailable. This affects site access as well as already running resources. Site networking staff are investigating, but there is no ETA for resolution at this time.
 

Object Store at UC currently unavailable

Resolved Posted by Michael Sherman on August 24, 2022
Outage start Wednesday, August 24, 2022 5:22 p.m.
Expected end Thursday, August 25, 2022 10:52 a.m.

The outage is resolved, and the object store is available again.

A rebalancing operation in the underlying storage cluster caused higher-than-normal I/O and pushed some of the storage nodes into an unstable state. The node configuration has been tuned to prevent this instability from recurring.