ROSA HCP cluster machine pools not upgraded to expected version
Environment
- Red Hat OpenShift Service on AWS (ROSA) HCP
- 4
Issue
- The service log (SL) in the OCM console shows that the workers-1 machine pool has already been upgraded to 4.15.33:
NodePool 'workers-1' upgrade maintenance scheduled Cluster Updates Info xxxxxxxxx 4 Oct 2024, 11:00 UTC
Cluster is scheduled for NodePool upgrade maintenance to version '4.15.33' on 2024-10-04 at 11:06 UTC
NodePool 'workers-1' upgrade maintenance beginning Cluster Updates Info xxxxxxxxx 4 Oct 2024, 11:09 UTC
NodePool is currently being upgraded to version '4.15.33'
........
NodePool 'workers-1' upgrade maintenance completed Cluster Updates xxxxxxxxx 9 Oct 2024, 02:54 UTC
NodePool has been successfully upgraded to version '4.15.33'
- However, the machine pool version reported in OCM is still the older version, 4.15.14:
NAME       CLUSTER  DESIRED NODES  CURRENT NODES  AUTOSCALING  AUTOREPAIR  VERSION  UPDATINGVERSION  UPDATINGCONFIG  MESSAGE
workers-1  xxxxxx   2              2              False        True        4.15.14  False            False
Resolution
- Do not modify the machine pool while it is being upgraded. Check the service log (SL) entries below in the OCM console to see when the upgrade began and whether it has finished:
NodePool 'xxxxxx' upgrade maintenance beginning
NodePool is currently being upgraded to version 'xxxxx'
......
NodePool 'xxxxxx' upgrade maintenance completed
NodePool has been successfully upgraded to version 'xxxxx'
- If the machine pool settings were changed while it was upgrading, and the SL shows that the upgrade completed but the machine pool is still on the old version, open a support case with Red Hat Support.
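Before changing a machine pool, the SL check described above can be scripted. The following is a minimal sketch, not an official tool: it assumes the SL summaries have already been copied out of the OCM console as plain strings in chronological order, and the function name `upgrade_in_progress` is hypothetical.

```python
def upgrade_in_progress(summaries, pool):
    """Return True if the latest upgrade event for `pool` is a 'beginning' entry.

    `summaries` is a chronologically ordered list of SL summary strings
    (an assumption of this sketch; the data is pasted in by the user).
    """
    begin = f"NodePool '{pool}' upgrade maintenance beginning"
    done = f"NodePool '{pool}' upgrade maintenance completed"
    in_progress = False
    for summary in summaries:
        if begin in summary:
            in_progress = True   # an upgrade has started
        elif done in summary:
            in_progress = False  # the upgrade has been reported as completed
    return in_progress


if __name__ == "__main__":
    log = [
        "NodePool 'workers-1' upgrade maintenance scheduled",
        "NodePool 'workers-1' upgrade maintenance beginning",
    ]
    if upgrade_in_progress(log, "workers-1"):
        print("Upgrade still running: do not modify the machine pool yet")
```

If the function returns True, postpone any machine pool change until the "upgrade maintenance completed" entry appears.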
Root Cause
- There is a known internal bug: modifying an HCP machine pool while its upgrade is still in progress (for example, adding a machine to the machine pool) might leave the upgrade pending. A permanent fix is still in development.
Prevent NP version from being updated when NP upgrade is in progress
Diagnostic Steps
- Check the service log (SL) in the OCM console. In this case, after the machine pool upgrade started and before it finished, the cluster owner changed the machine pool manually, as the SL entry below shows:
Node pool 'workers-1' has been updated in cluster 'xxxxxx' Cluster Scaling Info xxxxxx 7 Oct 2024, 03:29 UTC
Node pool ID: 'workers-1', Replicas: '2', Instance Type: 'm7i.xlarge', EC2 Metadata Http Tokens: 'optional', 'no labels', 'no taints', Tags: 'red-hat-managed: true, red-hat-clustertype: rosa, api.openshift.com/id: xxxxxxxx, api.openshift.com/name: xxxxxxxx, api.openshift.com/environment: production, api.openshift.com/nodepool-ocm: workers-1, api.openshift.com/legal-entity-id: xxxxxx, api.openshift.com/nodepool-hypershift: xxxxxx-workers-1'
- Trying to upgrade the machine pool again with the same command reports that an upgrade is already in progress:
$ rosa upgrade machinepool -c xxxxxxx workers-1 --version 4.15.33
WARN: There is already a started upgrade to version 4.15.33 on 2024-10-04 11:06 UTC
INFO: An upgrade already exists for machine pool 'workers-1' in cluster 'xxxxxxx'
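The first diagnostic step, spotting a machine pool change inside the upgrade window, can be sketched in a few lines. This is an illustrative sketch only: it assumes the SL entries were copied by hand as (summary, timestamp) pairs in the timestamp format shown in this article, and the names `TS_FORMAT` and `modifications_during_upgrade` are hypothetical.

```python
from datetime import datetime

# Timestamp format as it appears in the SL entries quoted in this article.
TS_FORMAT = "%d %b %Y, %H:%M UTC"


def modifications_during_upgrade(entries, pool):
    """Return timestamps of machine pool updates that fall inside the upgrade window.

    `entries` is a list of (summary, timestamp) string pairs copied from the SL.
    If no 'completed' entry exists yet, the window is treated as still open.
    """
    begin = end = None
    changes = []
    for summary, ts in entries:
        when = datetime.strptime(ts, TS_FORMAT)
        if f"NodePool '{pool}' upgrade maintenance beginning" in summary:
            begin = when
        elif f"NodePool '{pool}' upgrade maintenance completed" in summary:
            end = when
        elif f"Node pool '{pool}' has been updated" in summary:
            changes.append(when)
    if begin is None:
        return []  # no upgrade start found, so nothing to compare against
    return [c for c in changes if c >= begin and (end is None or c <= end)]


if __name__ == "__main__":
    entries = [
        ("NodePool 'workers-1' upgrade maintenance beginning", "4 Oct 2024, 11:09 UTC"),
        ("Node pool 'workers-1' has been updated in cluster 'xxxxxx'", "7 Oct 2024, 03:29 UTC"),
        ("NodePool 'workers-1' upgrade maintenance completed", "9 Oct 2024, 02:54 UTC"),
    ]
    for when in modifications_during_upgrade(entries, "workers-1"):
        print("Machine pool modified during upgrade at", when)
```

A non-empty result combined with an unchanged machine pool version matches the situation described in this article.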