ROSA HCP cluster machine pool upgrade remains pending for a long time

Solution Verified - Updated

Environment

  • Red Hat OpenShift Service on AWS (ROSA) HCP
    • 4

Issue

Machine pool upgrades on a ROSA HCP cluster remain pending for a long time.

Resolution

Root Cause

One or more PodDisruptionBudgets (PDBs) allow 0 disruptions, so the pods they protect cannot be evicted. The node being replaced stays in the SchedulingDisabled state and never finishes draining, which leaves the machine pool upgrade pending.

Diagnostic Steps

  • Check the nodes and look for any in the SchedulingDisabled state:
$ oc get nodes
NAME                                         STATUS                     ROLES    AGE     VERSION
......
ip-xx-xxx-xx-xxx.ap-northeast-1.compute.internal Ready,SchedulingDisabled   worker   130d    v1.28.9+416ecaf
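
On a cluster with many nodes, the cordoned ones can be filtered out of the listing. The following is a minimal sketch, not part of the original solution: the function name `cordoned_nodes` is hypothetical, and it assumes the default `oc get nodes` column layout shown above, with the node name in column 1 and the status in column 2.

```shell
# Print the names of nodes whose STATUS column contains SchedulingDisabled.
# Reads `oc get nodes` output (header line included) on stdin.
# cordoned_nodes is a hypothetical helper name used for illustration.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}
```

Typical usage would be `oc get nodes | cordoned_nodes`.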
  • Check the machine API logs for pods that cannot be evicted:
E1007 10:31:31.477147       1 machine_controller.go:648] "error when evicting pods/\"logging-loki-index-gateway-0\" -n \"openshift-logging\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.\n" controller="machine" .........Node="ip-xx-xxx-xx-xxx.ap-northeast-1.compute.internal"
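
When the log contains many retries, it can help to reduce it to the distinct pods that are blocking the drain. The sketch below assumes the log lines have the same shape as the sample above and have already been saved to a file or are piped in. The function name `blocked_pods` is hypothetical, and the pattern assumes pod and namespace names use only lowercase letters, digits, `-`, and `.`.

```shell
# Extract "namespace/pod" for every eviction error found on stdin,
# printing each blocked pod only once.
# blocked_pods is a hypothetical helper name used for illustration.
# The optional backslash (\\?) handles the escaped quotes seen in the log.
blocked_pods() {
  sed -nE 's#.*error when evicting pods/\\?"([a-z0-9.-]+)\\?" -n \\?"([a-z0-9.-]+)\\?".*#\2/\1#p' | sort -u
}
```

For the sample line above, this would print `openshift-logging/logging-loki-index-gateway-0`, which identifies the namespace to inspect for PDBs.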
  • Check the PodDisruptionBudgets (PDBs) in the affected namespace. PDBs with 0 allowed disruptions prevent pods from being evicted from the node:
$ oc get pdb -n openshift-logging
NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
logging-loki-distributor      1               N/A               0                     19d
logging-loki-gateway          1               N/A               0                     19d
logging-loki-index-gateway    1               N/A               0                     19d
logging-loki-ingester         1               N/A               0                     19d
logging-loki-querier          1               N/A               0                     19d
logging-loki-query-frontend   1               N/A               0                     19d
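
To list only the PDBs that currently block eviction, the output can be filtered on the ALLOWED DISRUPTIONS column. This is a minimal sketch under stated assumptions: the function name `blocking_pdbs` is hypothetical, and it expects the single-namespace layout shown above (`oc get pdb -n <namespace>`), where each data row has five fields and the fourth is the allowed disruptions count. With `-A`, an extra NAMESPACE column shifts the fields, so the sketch would need adjusting.

```shell
# Print the names of PDBs whose ALLOWED DISRUPTIONS value is 0.
# Reads `oc get pdb -n <namespace>` output (header line included) on stdin.
# blocking_pdbs is a hypothetical helper name used for illustration.
# The string comparison ("0") avoids matching short or empty lines.
blocking_pdbs() {
  awk 'NR > 1 && NF >= 5 && $4 == "0" { print $1 }'
}
```

Typical usage would be `oc get pdb -n openshift-logging | blocking_pdbs`. In the output above, all six PDBs would be listed, because each one shows 0 allowed disruptions.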

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.