Cannot evict rook-ceph-mon pod due to pod violating PodDisruptionBudget in OCS 4.x
Environment
- Red Hat OpenShift Container Storage (OCS) 4.x
- Red Hat Ceph Storage 4.x
Issue
- An OCP upgrade is stuck on an OCS node, and machine-config-daemon reports that it cannot evict the mon pod.
    # oc logs machine-config-daemon-12345 -f -c machine-config-daemon
    I0721 21:02:27.039558 1483239 update.go:92] error when evicting pod "rook-ceph-mon-a-aaaabbbb-cccc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
- While draining an OCS node, the oc adm drain command keeps waiting for the mon pod to be evicted.
    # oc adm drain rhocs02 --ignore-daemonsets=true --delete-local-data=true --force
    error when evicting pod "rook-ceph-mon-a-ccccdddd-aaaa" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
Resolution
- Wait at least 10 minutes to allow the pods to be evicted.
- If the pod is still not evicted, first resolve the underlying issues on the OCS nodes, which may be in an unhealthy state. The remaining MON pods may be unavailable, not running, or not in quorum.
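To see whether the remaining MON pods are actually running, their status can be counted from the pod listing. The following is a minimal sketch, not a supported tool: the helper name count_ready_mons is hypothetical, and the namespace openshift-storage and the label app=rook-ceph-mon are assumed defaults that should be verified on the cluster.

```shell
# Hypothetical helper: reads "oc get pods --no-headers" output on stdin and
# prints how many pods are Running with all of their containers ready.
count_ready_mons() {
  awk '{
    split($2, ready, "/")                      # READY column, e.g. "1/1"
    if ($3 == "Running" && ready[1] == ready[2]) n++
  } END { print n + 0 }'
}

# Assumed usage (namespace and label are assumptions, verify them first):
#   oc get pods -n openshift-storage -l app=rook-ceph-mon --no-headers | count_ready_mons
#   oc get pdb -n openshift-storage    # see the ALLOWED DISRUPTIONS column
```

If the count is lower than the number of MONs the cluster is expected to run, the missing MON pods need attention before the drain can proceed.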
Root Cause
- Evicting the mon pod would degrade the Ceph cluster further, so the PodDisruptionBudget blocks the eviction and machine-config-daemon on the OCS node fails to evict the healthy mon pod.
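The arithmetic behind the blocked eviction can be sketched as follows. Kubernetes allows a disruption only while the number of healthy pods exceeds what the budget requires, and Ceph MONs need a strict majority to hold quorum. The function names are hypothetical, and the minAvailable value of 2 in the usage comments is an assumed example, not a value taken from this article; check the real budget with oc get pdb.

```shell
# Hypothetical helper: allowed disruptions for a budget with minAvailable,
# computed as healthy pods minus required pods, never below zero.
disruptions_allowed() {
  healthy=$1
  min_available=$2
  allowed=$((healthy - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Hypothetical helper: smallest number of MONs that forms a strict majority.
mon_quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Assumed example with 3 MONs and minAvailable=2:
#   disruptions_allowed 3 2   # all MONs healthy: one eviction can proceed
#   disruptions_allowed 2 2   # one MON already down: eviction is refused
#   mon_quorum_size 3         # MONs needed for quorum
```

With one MON already unavailable, evicting another would leave fewer MONs than the majority, which is why the drain keeps retrying instead of proceeding.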
Diagnostic Steps
- Verify that the Ceph cluster is in HEALTH_OK state.
- Check MON quorum: not all MONs are in quorum.
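The checks above can be scripted roughly as shown here. This is a sketch under assumptions: the helper name ceph_health_ok and the variable TOOLS_POD are hypothetical, and it assumes a rook-ceph-tools (toolbox) pod is deployed in the openshift-storage namespace, which may not be the case on every cluster.

```shell
# Hypothetical helper: reads "ceph health" output on stdin and succeeds
# only when the first word of the first line is HEALTH_OK.
ceph_health_ok() {
  read -r first_word _rest
  [ "$first_word" = "HEALTH_OK" ]
}

# Assumed usage (toolbox pod, label, and namespace are assumptions):
#   TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
#   oc rsh -n openshift-storage "$TOOLS_POD" ceph health | ceph_health_ok \
#     && echo "cluster healthy" || echo "cluster not healthy"
#   oc rsh -n openshift-storage "$TOOLS_POD" ceph quorum_status --format json-pretty
```

The quorum_status output lists the MONs currently in quorum, which can be compared with the full set of MONs in the monitor map.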