PodDisruptionBudget (PDB) could cause Machine Config Operator to be degraded in OpenShift 4
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4
- Machine Config Operator (MCO)
- PodDisruptionBudget (PDB)
Issue
- OpenShift 4 upgrade is failing due to machine-config-operator being degraded.
- The MCP is degraded with the following message:
  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node [node_name] is reporting: \"failed to drain node: [node_name] after 1 hour. Please see machine-config-controller logs for more information\""
- Log message errors in the machine-config-controller pod:
  error when evicting pods/"[pod_name]" -n "[namespace_name]" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
Resolution
All of the following procedures cause the offending pod(s) to be deleted, so first make sure the pods can safely be deleted at the time the procedure is executed.
IMPORTANT: if the PDB preventing the upgrade is from OCS, please refer to cannot evict rook-ceph-mon pod due to pod violating PodDisruptionBudget in OCS 4, as deleting the rook-ceph-mon pod when the OCS cluster is not healthy could cause data loss.
IMPORTANT: if the PDB preventing the upgrade is from OpenShift Virtualization (virt-launcher Pods), deleting the PDB will ungracefully kill the Virtual Machine, potentially causing data loss and unintended service disruption. So do NOT follow this KCS. The most common cause is a RAM dirty rate higher than the network bandwidth; please refer to OpenShift upgrade delayed due to Virtual Machines failing to drain from nodes for how to deal with this situation. Other issues can also prevent live migration.
Check for similar pods in the same node
$ oc get pods -n <pod_namespace> -o wide
Apply one of the following procedures
If the affected PDB is related to an OpenShift component, please troubleshoot the cause of the rest of the pods failing. As an example, a configuration issue with the console can cause the console PDB to not allow a console pod to be evicted.
1- Disable eviction for draining the node
Use the --disable-eviction option for manually draining the node as explained in drain with PodDisruptionBudget blocks in OpenShift 4.
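As a rough sketch of what such a manual drain might look like (the linked article remains the reference), the function below wraps oc adm drain with --disable-eviction, which deletes pods directly instead of going through the eviction API, so PDBs are not checked. The function name drain_without_eviction and the OC override variable are hypothetical helpers, and the extra flags (--ignore-daemonsets, --delete-emptydir-data) are assumptions about a typical drain; older oc clients use --delete-local-data instead of --delete-emptydir-data.

```shell
# Hypothetical helper: drain a node by deleting pods directly instead of evicting them.
# Set OC=echo to print the command instead of running it.
OC="${OC:-oc}"

drain_without_eviction() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: drain_without_eviction <node_name>" >&2
        return 1
    fi
    # --disable-eviction bypasses the eviction API, so PodDisruptionBudgets are not enforced.
    "$OC" adm drain "$node" \
        --ignore-daemonsets \
        --delete-emptydir-data \
        --disable-eviction
}

# Usage:
#   drain_without_eviction <node_name>
```

Because this bypasses every PDB on the node, not only the blocking one, the warnings above about OCS and OpenShift Virtualization apply here as well.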
2- Delete the pod(s) that cannot be evicted
- Manually delete the pod(s) that cannot be evicted, so that they are recreated on different nodes:
  $ oc delete pod <pod_name> -n <pod_namespace>
- Wait until the upgrade is finished, and the MCO is available.
  $ watch -n10 "oc get clusterversion; echo; oc get mcp; echo; oc get nodes -o wide; echo; oc get co"
3- If the pod cannot be manually removed
- Check whether the PodDisruptionBudget is configured with minAvailable: 1, as this affects the pod eviction process during an OCP 4 upgrade:
  $ oc get pdb <pdb_name> -n <pod_namespace>
  NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
  <pdb_name>   1               N/A               0                     18h
- If the pod has only 1 replica, patch the PodDisruptionBudget as follows:
  $ oc patch pdb <pdb_name> -n <pod_namespace> --type=merge -p '{"spec":{"minAvailable":0}}'
- Wait until the upgrade is finished, and the MCO is available.
  $ watch -n10 "oc get clusterversion; echo; oc get mcp; echo; oc get nodes -o wide; echo; oc get co"
- Restore the PodDisruptionBudget:
  $ oc patch pdb <pdb_name> -n <pod_namespace> --type=merge -p '{"spec":{"minAvailable":1}}'
4- If the PodDisruptionBudget can't be patched
If patching the PodDisruptionBudget fails with the error PodDisruptionBudget.policy "<pdb_name>" is invalid: spec: Forbidden: updates to poddisruptionbudget spec are forbidden, back up, delete and later recreate the PodDisruptionBudget:
- Back up the PodDisruptionBudget:
  $ oc get pdb <pdb_name> -n <pod_namespace> -o yaml > <pdb_name>.yaml
- Remove the PodDisruptionBudget which is configured with minAvailable: 1:
  $ oc delete pdb <pdb_name> -n <pod_namespace>
- Wait until the upgrade is finished, and the MCO is available.
  $ watch -n10 "oc get clusterversion; echo; oc get mcp; echo; oc get nodes -o wide; echo; oc get co"
- Edit the file and remove the unneeded metadata and the status (see PodDisruptionBudget example below).
- Create the PodDisruptionBudget again:
  $ oc create -f <pdb_name>.yaml -n <pod_namespace>
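Example of PodDisruptionBudget with minAvailable: 1
# policy/v1beta1 is no longer served on OpenShift 4.12 and later (Kubernetes 1.25+); use policy/v1 there.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: <pdb_name>
  namespace: <pod_namespace>
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: <my-app>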
Note: if the pod's replicas are set to 0, the rolling update is not affected even if ALLOWED DISRUPTIONS is 0 in the PDB, since there is no pod that needs to be evicted. So another workaround, other than modifying or deleting the PDB, is to scale the pod replicas to 0 before the upgrade and scale them back to the expected replicas after the upgrade is finished.
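A minimal sketch of this scale-down workaround, assuming the pods are managed by a Deployment (the workload kind is not stated here; for a StatefulSet or DeploymentConfig the resource type would differ). The function name scale_workload and the OC override variable are hypothetical.

```shell
# Hypothetical helper around "oc scale"; assumes the pods belong to a Deployment.
# Set OC=echo to print the command instead of running it.
OC="${OC:-oc}"

scale_workload() {
    name="$1"
    namespace="$2"
    replicas="$3"
    "$OC" scale "deployment/$name" --replicas="$replicas" -n "$namespace"
}

# Before the upgrade: record the current replica count, then scale to 0.
#   oc get deployment <deployment_name> -n <pod_namespace> -o jsonpath='{.spec.replicas}'
#   scale_workload <deployment_name> <pod_namespace> 0
# After the upgrade: scale back to the recorded count.
#   scale_workload <deployment_name> <pod_namespace> <original_replicas>
```

Scaling to 0 makes the application unavailable for the duration of the upgrade, so this only fits workloads that tolerate downtime.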
Root Cause
A PodDisruptionBudget that is not correctly configured can prevent a node from being drained, affecting the upgrade:
- minAvailable: 1 in a PodDisruptionBudget can block the eviction process while the OCP 4 upgrade proceeds.
- If several nodes are rebooted, all the pods could end up running on only one node, and the PodDisruptionBudget can prevent that node from being drained.
- The PodDisruptionBudget prevents the automatic eviction of pods, but it is still possible to manually delete pods that have a PodDisruptionBudget configured.
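To spot PDBs that currently allow no disruptions before they block a drain, the output of oc get pdb -A can be filtered on the ALLOWED DISRUPTIONS column. The following is a sketch only: the function name blocking_pdbs is hypothetical, and it assumes the default column layout (NAMESPACE, NAME, MIN AVAILABLE, MAX UNAVAILABLE, ALLOWED DISRUPTIONS, AGE) with one value per column.

```shell
# Hypothetical filter: read "oc get pdb -A" output on stdin and print
# <namespace>/<name> for every PDB whose ALLOWED DISRUPTIONS value is 0.
# Assumes the default columns:
#   NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
blocking_pdbs() {
    # Skip the header (NR > 1); the 5th value of each data row is ALLOWED DISRUPTIONS.
    awk 'NR > 1 && NF >= 6 && $5 == "0" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   oc get pdb -A | blocking_pdbs
```

A PDB listed here is not necessarily misconfigured; it only means that, at this moment, evicting any of its pods would be refused.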
Diagnostic Steps
Output when the upgrade is failing:
- The machine-config Cluster Operator is degraded, and shows a message similar to the following:
  $ oc get co machine-config
  NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
  machine-config   4.12.45   True        False         True       2h

  $ oc get co machine-config -o yaml
  [...]
    extension:
      master: all 3 nodes are at latest configuration rendered-master-b0201cffb3e33e8504ca4cd06644be41
      worker: 'pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node [node_name] is reporting: \"failed to drain node: [node_name] after 1 hour. Please see machine-config-controller logs for more information\""'
  [...]
- Some nodes will be SchedulingDisabled:
  $ oc get node
  NAME                         STATUS                     ROLES    AGE   VERSION
  ip-10-0-111-11.example.com   Ready,SchedulingDisabled   master   42h   v1.25.14+a52e8df
  ip-10-0-222-22.example.com   Ready                      worker   41h   v1.25.14+a52e8df
  ip-10-0-333-33.example.com   Ready                      master   42h   v1.25.14+a52e8df
  ip-10-0-444-44.example.com   Ready                      worker   41h   v1.25.14+a52e8df
  ip-10-0-555-55.example.com   Ready                      master   42h   v1.25.14+a52e8df
  ip-10-0-666-66.example.com   Ready,SchedulingDisabled   worker   41h   v1.25.14+a52e8df
- The machine-config-controller pod logs show the following messages:
  $ oc logs -n openshift-machine-config-operator machine-config-controller-xxxxx -c machine-config-controller
  ...
  I0220 04:14:18.029980   49566 update.go:89] error when evicting pod "test-1-xxxxx" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  I0220 04:14:23.055546   49566 update.go:89] error when evicting pod "test-1-xxxxx" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  I0220 04:14:28.073188   49566 update.go:89] error when evicting pod "test-1-xxxxx" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
- The pod shown in the logs above is still running and has not been evicted:
  $ oc get pod -o wide
  NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
  test-1-xxxxx   1/1     Running   0          47m   10.131.0.23   ip-10-0-111-11.example.com   <none>           <none>
- Several pods can be running on only one node:
  $ oc get pod -o wide
  NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
  test-1-xxxxx   1/1     Running   0          47m   10.131.0.23   ip-10-0-111-11.example.com   <none>           <none>
  test-1-yyyyy   1/1     Running   0          47m   10.131.0.24   ip-10-0-111-11.example.com   <none>           <none>
  test-1-zzzzz   1/1     Running   0          47m   10.131.0.25   ip-10-0-111-11.example.com   <none>           <none>
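Since the same eviction error repeats every few seconds, it can help to reduce the controller logs to the distinct pod names involved. The filter below is a sketch under the assumption that the log lines look like the ones quoted in this article (both the pod "<name>" and pods/"<name>" variants); the function name pods_blocked_by_pdb is hypothetical.

```shell
# Hypothetical filter: read machine-config-controller logs on stdin and print
# the unique names of pods whose eviction was refused because of a PDB.
pods_blocked_by_pdb() {
    # Keep only the PDB-violation lines, extract the first quoted name after
    # "error when evicting pod" (or "pods/"), then de-duplicate.
    grep "Cannot evict pod as it would violate the pod's disruption budget" |
        sed -n 's/.*error when evicting pod[^"]*"\([^"]*\)".*/\1/p' |
        sort -u
}

# Usage against a live cluster:
#   oc logs -n openshift-machine-config-operator machine-config-controller-xxxxx \
#       -c machine-config-controller | pods_blocked_by_pdb
```

The resulting names can then be matched against the oc get pod -o wide output above to see which node and namespace they belong to.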