Pre-upgrade health checks for ROSA Classic and OSD clusters
Environment
- Red Hat OpenShift Service on AWS classic architecture (ROSA Classic) 4
- Red Hat OpenShift Dedicated (OSD) 4
Issue
- Perform pre-upgrade health checks on a ROSA Classic or OSD cluster
- How to check the health of a cluster prior to an upgrade
Resolution
- Red Hat's Managed OpenShift OSD and ROSA Classic products now include a PreHealthCheck (PHC) feature through the managed-upgrade-operator (MUO) cluster operator. PHC allows users to identify potential problems prior to a version upgrade with OSD or ROSA Classic. The PHC always runs at least once during the upgrade phase, and it also runs in advance if the upgrade is scheduled more than 2 hours from the current time.
- Scheduling an upgrade, via the Hybrid Cloud Console or the rosa CLI, at least 2 hours before its maintenance window will ensure that the PHC also runs in advance and reports back any known issues that could delay or block the scheduled upgrade.
- PHC will send a service log notification regarding any issues discovered that could impact the upgrade.
- Please refer to our PHC Troubleshooting article for more information on the types of failures that can be reported and the steps toward a possible resolution.
- Additionally, you can use oc commands, such as oc adm upgrade, to check various cluster health and status details.
- Please review our Proactive OSD/ROSA article for steps and information on how to open a proactive cluster maintenance ticket with Red Hat Support, if deemed necessary.
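The 2-hour rule described above can be checked before an upgrade is scheduled. The following is a minimal sketch and not part of the rosa CLI: the function name has_phc_lead_time is hypothetical, it assumes GNU date (for the -d option), and it assumes the schedule date and time are expressed in UTC.

```shell
# Sketch only: checks whether a planned upgrade start time leaves at least
# 2 hours of lead time. Assumes GNU date and UTC times.

MIN_LEAD_SECONDS=$((2 * 60 * 60))   # the 2-hour threshold

# has_phc_lead_time <YYYY-MM-DD> <HH:MM> [now_epoch]
# Returns 0 if the start time is at least 2 hours after "now",
# 1 if it is sooner, and 2 if the date could not be parsed.
# now_epoch is optional and defaults to the current time.
has_phc_lead_time() {
  local sched_date="$1" sched_time="$2"
  local now="${3:-$(date -u +%s)}"
  local start
  start="$(date -u -d "${sched_date} ${sched_time}" +%s)" || return 2
  [ $((start - now)) -ge "$MIN_LEAD_SECONDS" ]
}

# Usage (hypothetical date and time):
#   if has_phc_lead_time 2024-05-18 09:00; then
#     echo "Enough lead time for an advance PHC run"
#   else
#     echo "Less than 2 hours of lead time"
#   fi
```

The optional third argument exists only so the check can be exercised against a fixed reference time.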
Diagnostic Steps
- Download the latest OpenShift CLI (oc) and ROSA CLI (rosa) from mirror.openshift.com, if you have not already done so.
- View the cluster status.
- Run the command:
$ oc adm upgrade
- Note: This is a read-only check for any Degraded states of cluster operators and does not initiate any changes to your cluster.
- If the output includes Failing=True, please create a Support Case in the Red Hat Customer Portal.
- Confirm the available versions that the cluster can be upgraded to, and note the recommended version.
$ rosa list upgrade --cluster=<cluster_name OR cluster_id>
- MUO will run the PHC feature in advance automatically as long as the cluster's maintenance window is scheduled at least 2 hours ahead.
- This is the minimum time frame that allows PHC to finish its health checks, lets any resulting service logs appear in the Hybrid Cloud Console's "Cluster History" tab, and minimizes their impact on the planned upgrade.
- Example Service Log:
NodePool 'xxxxxx' upgrade maintenance failed
node pool upgrade failed due to error: found 1 critical alerts
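The Failing=True check from the cluster status step can be scripted. The following is a minimal sketch: the function name check_upgrade_status and its messages are hypothetical, and it only looks for the literal string Failing=True in whatever text it is given, so it does not interpret any other part of the oc adm upgrade output.

```shell
# Sketch only: scans `oc adm upgrade` output (read from stdin) for the
# Failing=True condition. Function name and messages are hypothetical.
check_upgrade_status() {
  local status_text
  status_text="$(cat)"            # read the whole output before searching
  if printf '%s\n' "$status_text" | grep -q 'Failing=True'; then
    echo "Failing=True reported: create a Support Case before upgrading"
    return 1
  fi
  echo "No Failing=True condition reported"
}

# Usage:
#   oc adm upgrade | check_upgrade_status
```

Reading the full output first, rather than piping straight into grep -q, avoids closing the pipe while oc is still writing.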
Scheduling and Canceling a Cluster Upgrade
- To schedule an upgrade, run the following command with the correct date and time for your maintenance window.
$ rosa upgrade cluster -c <cluster_name OR cluster_id> --version <version_number> --schedule-date 2024-05-18 --schedule-time 09:00
- To cancel a scheduled upgrade, please verify that the cluster upgrade has not already started by running the following command. Please note that if the upgrade has already started, it CANNOT be stopped or canceled and must complete the process.
$ rosa list upgrades --cluster=<cluster_name OR cluster_id>
Example output:
VERSION NOTES
4.15.14 recommended - scheduled for 2024-06-02 15:00 UTC
4.15.13
- If the upgrade has not started, please run the following command to delete its schedule and cancel the upgrade.
$ rosa delete upgrade --cluster=<cluster_name OR cluster_id>
Confirm the deletion by entering 'Yes' when prompted.
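Before deleting a schedule, it can help to extract which version is scheduled and when. The following is a minimal sketch that assumes the rosa list upgrades output keeps the layout of the example output above, with the version in the first column and the phrase "scheduled for" in the notes. The function name scheduled_upgrades is hypothetical. It does not detect whether an upgrade has already started.

```shell
# Sketch only: reads `rosa list upgrades` output on stdin and prints
# "<version> <scheduled time>" for each row whose notes contain
# "scheduled for". Assumes the layout of the example output in this article.
scheduled_upgrades() {
  sed -n 's/^[[:space:]]*\([^[:space:]]\{1,\}\).*scheduled for \(.*\)$/\1 \2/p'
}

# Usage:
#   rosa list upgrades --cluster=<cluster_name OR cluster_id> | scheduled_upgrades
```

Given the example output shown above, this prints 4.15.14 2024-06-02 15:00 UTC. It prints nothing when no upgrade is scheduled.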
Performing pre-upgrade checks manually
- Check all cluster operators for any that are in a DEGRADED=True state.
$ oc get co
- Check the status of Machine Config Pools.
$ oc get mcp
- Check that there are no restrictive Pod Disruption Budgets (PDBs) defined in workload namespaces.
$ oc get pdb -A
- Check operator compatibility.
- For any additional operators installed from the OperatorHub, verify that the operator versions are compatible with the OpenShift target version by using the Operator checker, and consult the OpenShift Operator Life Cycles for more information.
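The cluster operator and PDB checks above can be filtered with a short script. The following is a minimal sketch under stated assumptions: both function names are hypothetical, the column positions assume the default table output of oc get co and oc get pdb -A, and "restrictive" is interpreted here as a PDB that currently allows 0 disruptions, which is one possible reading and not a definition from this article.

```shell
# Sketch only: filters default table output read from stdin.
# Column positions are assumptions based on the default oc output layout.

# degraded_operators: prints the name of each cluster operator whose
# DEGRADED column (assumed to be column 5 of `oc get co`) is True.
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# blocking_pdbs: prints namespace/name for each PDB whose allowed
# disruptions value (assumed to be field 5 of each data row of
# `oc get pdb -A`) is 0.
blocking_pdbs() {
  awk 'NR > 1 && $5 == "0" { print $1 "/" $2 }'
}

# Usage:
#   oc get co | degraded_operators
#   oc get pdb -A | blocking_pdbs
```

Empty output from either function means that check found nothing to review. Any PDB that is listed still needs to be evaluated against the workload it protects.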