Reported Outages

Unplanned Jupyter downtime May 13

Resolved Posted by Jason Anderson on May 13, 2022
Outage start Friday, May 13, 2022 12:06 p.m.
Expected end Friday, May 13, 2022 1:42 p.m.

We are experiencing an outage of the Jupyter environment and are working to restore service as soon as possible. Stay tuned, and apologies for the lack of notice.

Instance Provisioning failures at UC

Resolved Posted by Michael Sherman on May 11, 2022
Outage start Wednesday, May 11, 2022 8 a.m.
Expected end Wednesday, May 11, 2022 4:13 p.m.

Update 4 PM 05/11/2022: This issue is now resolved. Provisioning and connectivity should be restored for all UC nodes.


An issue is affecting the provisioning of new instances on P3 nodes at UC. Existing nodes are unaffected.

Network switch failure for P2 nodes at UC

Resolved Posted by Michael Sherman on May 04, 2022
Outage start Wednesday, May 04, 2022 10 a.m.
Expected end Tuesday, May 10, 2022 6:31 p.m.

Update 6 PM 05/10/22: The outage is now resolved. Both switches are functional, and P2 nodes nc01 through nc64 are back online. New instances have no issues, but existing instances may still have connectivity problems. If yours does, please try removing and re-attaching the network port to your instance.

Provisioning network failure at CHI@UC

Resolved Posted by Michael Sherman on May 02, 2022
Outage start Friday, April 29, 2022 11:20 a.m.
Expected end Monday, May 02, 2022 1:17 p.m.

Update 05/03/22: This issue is now resolved. It was caused by a combination of two factors: misconfiguration of the DHCP behavior for out-of-band interfaces, and a failure that caused an out-of-band switch to power off.

All affected nodes should be reservable again. If you have an instance that has become inaccessible, please get in touch with us via the helpdesk.

kvm@TACC Unavailable April 22, 2022

Resolved Posted by Cody Hammock on April 22, 2022
Outage start Thursday, April 21, 2022 8 p.m.
Expected end Friday, April 22, 2022 4:08 p.m.

KVM@TACC was unavailable starting on the evening of April 21, 2022. The issue has been resolved.

CHI@UC down

Resolved Posted by Michael Sherman on March 24, 2022
Outage start Thursday, March 24, 2022 10:25 a.m.
Expected end Thursday, March 24, 2022 12 p.m.

Update: This has been resolved as of 11:42 AM, and the site is back up. Running nodes should not have been affected, aside from the temporary loss of network connectivity.


CHI@UC is currently down due to a failure of the controller node's load-balancer. We will update here with more information.

Network Switch failure at UC

Resolved Posted by Michael Sherman on March 01, 2022
Outage start Tuesday, March 01, 2022 4:04 p.m.
Expected end Sunday, May 01, 2022 4:04 p.m.

Update: Connectivity has been restored. The root cause was a software bug that prevented the creation of a PVST instance on the switch because of the large number of configured VLANs. Using a single instance for all VLANs restored functionality.


The 1G switch serving out-of-band access for nodes in rack BG-41 has encountered a (so far) unrecoverable software error, preventing traffic to the out-of-band interface on nodes P3-CPU-020 to P3-CPU-038.

Networking outage at UC

Resolved Posted by Michael Sherman on February 22, 2022
Outage start Monday, February 21, 2022 3 p.m.
Expected end Tuesday, February 22, 2022 11:17 a.m.

Update 11:16 CST: This should now be resolved. A forwarding loop in the underlying network topology caused some ports to be shut down. Instance provisioning and floating IPs should now be working again. Please reach out if you're still seeing issues on the UC site.


We're currently observing networking issues at UC. New instances are failing to provision, and existing ones are unreachable. We're still investigating the root cause, but will update here when resolved. Other sites are unaffected.

TACC Network maintenance 6 March 2022

Resolved Posted by Cody Hammock on February 21, 2022
Outage start Sunday, March 06, 2022 10 a.m.
Expected end Sunday, March 06, 2022 4 p.m.

Update: The work completed at 12:00 PM (CST).

Network maintenance will be carried out at the TACC site between 10:00 AM and 4:00 PM (CST) on Sunday, March 6th. Access to all systems hosted at TACC will be unavailable during this time, including CHI@TACC, KVM@TACC, CHI@Edge, and the Chameleon Portal. Instances will continue to run, but users will have no access to TACC services and systems until the upgrade is complete.

 

Please submit any questions you may have via the Chameleon Helpdesk: https://chameleoncloud.org/user/help/