Outage start | Tuesday, January 28, 2025 6 a.m. |
Expected end | Tuesday, January 28, 2025 6 p.m. |
Update Jan 28th, 8 p.m. CT:
The Jan 28th maintenance window has concluded, and KVM@TACC is back to normal.
During the work, we observed less than an hour of downtime, primarily affecting the dashboard. Authentication and networking were interrupted for ~10 minutes.
We were able to migrate the load balancer and database services to a redundant configuration, but had to roll back the migration of the other control-plane services due to unforeseen complications. We'll be investigating a path forward, as well as a subsequent (shorter) maintenance window to finish the migration for each service.
We're proposing a maintenance window for KVM@TACC for the full day on Jan 28th. We expect the actual downtime to be less than that, but want to reserve the time in case of unexpected issues.
During the downtime, you won't be able to launch new instances or access your existing ones on KVM@TACC. Afterwards, everything will work as usual.
Finally, this work will not change anything about how you interact with KVM@TACC; it resolves tech debt and lays the groundwork for the GPU and reservation features we announced previously.
We understand that several courses are using KVM@TACC in the coming weeks and months, and ask that you please let us know if this timing presents an issue.
This window will be used for two purposes:
1. Reconfiguration of control-plane services, in order to allow future service upgrades without downtime
2. OpenStack version upgrades: we are targeting Antelope/2023.1 (the same version as other sites). As we're upgrading from a much older release, and upgrading one "step" at a time, we will be able to stop "early" and resume the upgrade path at a future date.