Cannot evict rook-ceph-mon pod due to pod violating PodDisruptionBudget in OCS 4.x

Solution Verified - Updated

Environment

  • Red Hat OpenShift Storage (RHOCP)
    • 4.x
  • Red Hat Ceph Storage 4.x

Issue

  • An OCP upgrade is stuck on an OCS node, and machine-config-daemon reports that it cannot evict the mon pod.

    # oc logs machine-config-daemon-12345 -f -c machine-config-daemon
    I0721 21:02:27.039558 1483239 update.go:92] error when evicting pod "rook-ceph-mon-a-aaaabbbb-cccc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    
  • While draining an OCS node, the oc adm drain command keeps waiting for the mon pod to be evicted.

    # oc adm drain rhocs02 --ignore-daemonsets=true --delete-local-data=true --force
    error when evicting pod "rook-ceph-mon-a-ccccdddd-aaaa" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    

Resolution

  • Wait at least 10 minutes to allow the pods to be evicted.

  • If the pod is still not evicted, resolve the underlying problems on the OCS nodes first. The OCS nodes may be in an unhealthy state: the remaining MON pods may be unavailable, not running, or not in quorum.
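
  The check on the remaining MON pods can be sketched as a small shell helper. This is a minimal sketch, not a supported tool: the function name unhealthy_mons is hypothetical, and it assumes the standard oc get pods column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: print the name of every pod that is not Running
# with all of its containers ready.
# Reads `oc get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE ...).
unhealthy_mons() {
    awk '{
        split($2, ready, "/")    # READY column looks like "1/1"
        if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

  A possible invocation, assuming OCS is installed in the openshift-storage namespace and the MON pods carry the app=rook-ceph-mon label; adjust both if your cluster differs. The second command lists the PodDisruptionBudgets that are blocking the eviction.

    # oc get pods -n openshift-storage -l app=rook-ceph-mon --no-headers | unhealthy_mons
    # oc get pdb -n openshift-storage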

Root Cause

  • Evicting the mon pod would push the Ceph cluster further into a degraded state, so machine-config-daemon on the OCS node fails to evict the healthy mon pod.

Diagnostic Steps

  • Verify whether the Ceph cluster is in the HEALTH_OK state.
  • Check whether all MONs are in quorum. This issue occurs when not all MONs are in quorum.
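
  The health check above can be scripted with a small helper. This is a minimal sketch under stated assumptions: the function name ceph_health_ok is hypothetical, and it relies only on the first word of the ceph health output being the health status (for example HEALTH_OK or HEALTH_WARN).

```shell
# Hypothetical helper: succeed only when the first word of the
# `ceph health` output read from stdin is HEALTH_OK.
ceph_health_ok() {
    read -r status _rest    # first word is the status, the rest is detail
    [ "$status" = "HEALTH_OK" ]
}
```

  A possible invocation, assuming the Ceph toolbox pod is deployed in the openshift-storage namespace with the app=rook-ceph-tools label. The last command prints the quorum details, where the names under quorum_names can be compared with the MONs listed in the monmap.

    # TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
    # oc exec -n openshift-storage "$TOOLS_POD" -- ceph health | ceph_health_ok && echo "cluster is HEALTH_OK"
    # oc exec -n openshift-storage "$TOOLS_POD" -- ceph quorum_status --format json-pretty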

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.