Reported Outages

CHI@TACC Network connectivity

Posted by Cody Hammock on September 29, 2025
Outage start Saturday, September 27, 2025 6 a.m.
Expected end Thursday, October 02, 2025 4 p.m.

Update 10/02/2025:

KVM@TACC and CHI@Edge are now back to normal.
CHI@TACC is back to normal, except for the following:

  • A subset of baremetal nodes that lost network configuration
  • Composable hardware (Liqid and GigaIO) cannot reach their attached GPUs, pending a restart of these nodes and their shared PCIe backplane.

Thank you for your patience as we restore these!
 

Chameleon Jupyterhub outage

Resolved Posted by Francois Halbach on September 27, 2025
Outage start Saturday, September 27, 2025 6:30 a.m.
Expected end Saturday, September 27, 2025 11:59 p.m.

Chameleon Jupyterhub is unavailable.

CHI@UC instance provisioning outage

Resolved Posted by Michael Sherman on September 15, 2025
Outage start Monday, September 15, 2025 12 p.m.
Expected end Tuesday, September 16, 2025 12:54 p.m.

As of the morning of September 16th, the switch has had another issue, and is no longer passing traffic. Staff are investigating, but we do not have an ETA yet.

CHI@Edge packet forwarding failure

Resolved Posted by Michael Sherman on August 01, 2025
Outage start Thursday, July 31, 2025 6 p.m.
Expected end Friday, August 01, 2025 9:21 p.m.

This is now resolved.

For those interested in the details, see https://github.com/projectcalico/calico/issues/9622


CHI@Edge containers are currently failing to send any network traffic. This is due to an unanticipated edge case during routine maintenance that is breaking updates to the Calico network database.

July 2025 Trovi maintenance

Resolved Posted by Mark Powers on July 21, 2025
Outage start Tuesday, July 22, 2025 9:30 a.m.
Expected end Tuesday, July 22, 2025 11 a.m.

On the morning of July 22, we'll be upgrading the infrastructure hosting Trovi. During this time, you will not be able to upload, edit, or launch Trovi artifacts. Running Jupyter servers will not be affected.

July 2025 Authentication Maintenance

Resolved Posted by Mark Powers on July 15, 2025
Outage start Monday, July 21, 2025 9 a.m.
Expected end Monday, July 21, 2025 9:05 a.m.

All functionality should be restored. Please contact the help desk if you encounter any issues.

---

On Monday, July 21, we will be upgrading our authentication server. As a result, login to all Chameleon web services will be down, including the portal, Jupyter, and Horizon. Network connections to running instances will not be affected. We expect the interruption to last 5 minutes.

CHI@Edge container launches timing out

Resolved Posted by Michael Sherman on June 09, 2025
Outage start Saturday, June 07, 2025 6 a.m.
Expected end Monday, June 09, 2025 6 p.m.

It appears that early on Saturday morning, container launches on CHI@Edge began timing out. We are currently investigating the cause and will post updates here.

CHI@NCAR Maintenance Window June 1 - June 13

Resolved Posted by Michael Sherman on May 28, 2025
Outage start Sunday, June 01, 2025 9 a.m.
Expected end Tuesday, June 10, 2025 4:28 p.m.

Update June 10th: The datacenter maintenance has concluded, and CHI@NCAR is back online.

Network Maintenance at ANL affecting Authentication, Trovi, and CHI@UC

Resolved Posted by Michael Sherman on May 28, 2025
Outage start Wednesday, May 28, 2025 6 p.m.
Expected end Wednesday, May 28, 2025 10 p.m.

Tonight, from 6 p.m. to 10 p.m. US Central time, network maintenance at ANL will cause rolling outages impacting access to CHI@UC, Trovi, and authentication services for all Chameleon sites.

We expect the interruptions to last from a few minutes up to 1 hour, sometime during this window.

Connections (e.g. over SSH) to instances that are already running at other sites will not be affected, but you may observe interruptions in the ability to launch new instances due to the authentication outage.

CHI@Edge: outage impacting all edge devices

Resolved Posted by Michael Sherman on May 12, 2025
Outage start Monday, May 12, 2025 4:30 p.m.
Expected end Monday, May 12, 2025 6:56 p.m.

Resolved: 7:00 pm Monday:

Reservations are working again, and tests to launch a container and access it via floating IP are succeeding, implying that the network underlay is also healthy once again.

Please let us know if you continue to observe issues.