Pre-upgrade health checks for ROSA HCP
Environment
- Red Hat OpenShift Service on AWS (ROSA HCP) 4
Issue
- Perform pre-upgrade health checks on a ROSA HCP Cluster
- How to check the health of a cluster prior to an upgrade
Resolution
- ROSA HCP clusters consist of a hosted control plane in a Red Hat‑managed AWS account and node pools in your AWS account. Upgrades are performed in two phases: the control plane must be upgraded first, followed by machine pools. Both phases include automated pre‑flight checks (PHC) that run immediately when an upgrade is scheduled.
- For how to upgrade ROSA clusters, please refer to Upgrading Red Hat OpenShift Service on AWS clusters.
- Please refer to our Control plane upgrade troubleshooting guide for ROSA clusters article for more information on ROSA HCP control plane upgrade troubleshooting.
- Please refer to our Node pools (machine pools) upgrade troubleshooting guide for ROSA HCP cluster article for more information on ROSA HCP machine pool upgrade troubleshooting.
- Additionally, you can use oc commands, such as oc adm upgrade, to check various cluster health and status details.
- Please review our Proactive OSD/ROSA article for steps and information on how to report a proactive cluster maintenance ticket to Red Hat Support, if deemed necessary.
Diagnostic Steps
- Download the latest OpenShift CLI (oc) and ROSA CLI (rosa) from mirror.openshift.com, if you have not already done so.
- View the cluster status.
- Run the command:
$ oc adm upgrade
- Note: This is a read-only check for any Degraded states of cluster operators and does not initiate any changes to your cluster.
- If the output includes Failing=True, please create a Support Case in the Red Hat Customer Portal.
- Confirm the available versions that the cluster can be upgraded to, and note the recommended version.
- For the control plane
$ rosa list upgrade --cluster=<cluster_name OR cluster_id>
- For a specific machine pool
$ rosa list upgrade --cluster=<cluster_name_or_id> --machinepool=<machinepool_name>
- ROSA HCP performs automated pre‑flight checks when a control plane or machine pool upgrade is scheduled. If any check fails, the upgrade is aborted and a service log is generated. You can monitor the results in the Cluster History tab of the Hybrid Cloud Console and in the service logs. If a check fails, resolve the underlying issue and reschedule the upgrade.
- Example Service Log for control plane:
Control Plane upgrade maintenance failed
Control plane upgrade failed: found 2 critical alerts
- Example Service Log for machine pool:
NodePool 'xxxxxx' upgrade maintenance failed
node pool upgrade failed due to error: found 1 critical alerts
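The Failing=True check described above can be scripted. The following is a minimal sketch, not an official Red Hat tool: the function name upgrade_status_ok is hypothetical, and it assumes only that the string Failing=True appears in the oc adm upgrade output when the condition is set, as stated in the steps above.

```shell
# Hypothetical helper: reads `oc adm upgrade` output on stdin.
# Returns 0 when no Failing=True condition is present, 1 otherwise.
upgrade_status_ok() {
  # Plain grep (no -q) so the whole input is consumed before returning.
  if grep 'Failing=True' >/dev/null; then
    echo "Failing=True reported: create a Support Case before upgrading" >&2
    return 1
  fi
  echo "No Failing=True condition reported"
}
```

Usage: oc adm upgrade | upgrade_status_ok. Because the helper only reads text from standard input, it does not change anything on the cluster.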
Scheduling and Canceling a Cluster Upgrade
- To schedule an upgrade, run the following command with the correct Date and Time for your maintenance window.
- Control Plane Upgrade
$ rosa upgrade cluster -c <cluster_name OR cluster_id> --version <version_number> --schedule-date 2024-05-18 --schedule-time 09:00
- Machine Pools Upgrade
$ rosa upgrade machinepool -c <cluster_name OR cluster_id> <machinepool_name> --version <version_number> --schedule-date 2024-05-18 --schedule-time 09:00
- To cancel a scheduled upgrade, first verify that the upgrade has not already started by running the following command. Please note that if the upgrade has already started, it CANNOT be stopped or canceled and must run to completion.
- Control Plane Upgrade
$ rosa list upgrades --cluster=<cluster_name OR cluster_id>
Example output:
VERSION NOTES
4.15.14 recommended - scheduled for 2024-06-02 15:00 UTC
4.15.13
- If the upgrade has not started, please run the following command to delete its schedule and cancel the upgrade.
$ rosa delete upgrade --cluster=<cluster_name OR cluster_id>
Confirm the deletion by entering 'Yes' when prompted
- Machine Pools Upgrade
$ rosa list upgrades --cluster=<cluster_name OR cluster_id> --machinepool=<machinepool_name>
Example output:
VERSION NOTES
4.15.14 recommended - scheduled for 2024-06-02 15:00 UTC
4.15.13
- If the upgrade has not started, please run the following command to delete its schedule and cancel the upgrade.
$ rosa delete upgrade --cluster=<cluster_name OR cluster_id> --machinepool=<machinepool_name>
Confirm the deletion by entering 'Yes' when prompted
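Before deleting a schedule, it can help to extract the scheduled version and time from the rosa list upgrades output. The sketch below is an illustration only: the function name scheduled_upgrade is hypothetical, and it assumes the NOTES column contains the text "scheduled for" exactly as in the example output above. It does not detect whether an upgrade has already started.

```shell
# Hypothetical helper: reads `rosa list upgrades` output on stdin.
# Prints "<version> at <date time zone>" for each row with a schedule note.
# Returns 1 when no row contains "scheduled for".
scheduled_upgrade() {
  awk '/scheduled for/ {
    version = $1                    # first column is the version
    sub(/^.*scheduled for /, "")    # keep only the schedule timestamp
    print version " at " $0
    found = 1
  }
  END { exit (found ? 0 : 1) }'
}
```

Usage: rosa list upgrades --cluster=<cluster_name OR cluster_id> | scheduled_upgrade. With the example output above, this prints 4.15.14 at 2024-06-02 15:00 UTC.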
Performing pre-upgrade checks manually
- Check all the cluster operators for any that may be in DEGRADED=True state.
$ oc get co
- Check the status of Machine Pools.
$ rosa list machinepools --cluster=<cluster_name>
- Check that there are no restrictive Pod Disruption Budgets (PDBs) defined in workload namespaces.
$ oc get pdb -A
- Check operator compatibility.
- For any additional operators installed from the OperatorHub, verify that the operator versions are compatible with the OpenShift target version by using the Operator checker, and consult the OpenShift Operator Life Cycles for more information.
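The operator and PDB checks above can be filtered with small helpers rather than read by eye. This is a rough sketch under stated assumptions: the function names degraded_operators and blocking_pdbs are hypothetical; the first assumes the default oc get co column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) with a populated VERSION column; the second assumes the default oc get pdb -A column order, treats a PDB with 0 allowed disruptions as restrictive, and assumes that namespaces prefixed with openshift- or kube- are platform namespaces rather than workload namespaces.

```shell
# Hypothetical helper: reads `oc get co` output on stdin and
# prints the name of every operator whose DEGRADED column is True.
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Hypothetical helper: reads `oc get pdb -A` output on stdin and
# prints namespace/name for PDBs that currently allow 0 disruptions.
# Data rows are: namespace, name, min available, max unavailable,
# allowed disruptions, age. Platform namespaces are skipped (assumption).
blocking_pdbs() {
  awk 'NR > 1 && $1 !~ /^(openshift|kube)-/ && $5 == 0 { print $1 "/" $2 }'
}
```

Usage: oc get co | degraded_operators and oc get pdb -A | blocking_pdbs. Empty output from both suggests that no operator is degraded and that no workload PDB is currently blocking node drains, but review the full command output if the column layout in your environment differs.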