Drain with PodDisruptionBudget blocks in OpenShift 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • PodDisruptionBudget (PDB)

Issue

  • If a PodDisruptionBudget is not correctly configured, a cluster administrator cannot drain the node for OS updates.
  • Some cluster providers have no control over the business pods. In production clusters, they therefore cannot automatically drain and upgrade a node until the team responsible for the pod takes action.
  • Combining a PodDisruptionBudget with a workload whose replicas is set to 1 (for example, with minAvailable set to 1) blocks the drain.

Resolution

Not all voluntary disruptions are constrained by PodDisruptionBudgets: deleting a pod directly, instead of evicting it, bypasses them. The workaround is therefore to use pod deletion (assuming one has no control over the business pods).

Starting with OpenShift 4.5, the oc adm drain subcommand has a dedicated option, "--disable-eviction", that can be used to ignore PodDisruptionBudgets after a normal drain has failed for some time:

$ oc adm drain --disable-eviction=true [node_name]

Replace [node_name] with the name of the node to drain. Other options may also be needed (for example --ignore-daemonsets, --delete-emptydir-data, --force, etc.).
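
The "normal drain first, then disable eviction" sequence described above can be sketched as a small shell function. This is only an illustration, not a supported procedure: the function name drain_with_fallback, the DRAIN_TIMEOUT variable and the 300s default are hypothetical, and the extra flags are taken from the examples listed in this article and may need adjusting for a given cluster.

```shell
# Sketch only: drain_with_fallback and DRAIN_TIMEOUT are hypothetical names.
# Usage: drain_with_fallback <node_name>
drain_with_fallback() {
  dwf_node="$1"
  dwf_timeout="${DRAIN_TIMEOUT:-300s}"   # assumed default, adjust as needed

  # Normal drain: pods are evicted, so PodDisruptionBudgets are honored.
  if oc adm drain "$dwf_node" --ignore-daemonsets --delete-emptydir-data \
       --timeout="$dwf_timeout"; then
    return 0
  fi

  echo "Normal drain of $dwf_node failed; retrying with --disable-eviction" >&2

  # Fallback: pods are deleted instead of evicted, which ignores
  # PodDisruptionBudgets. Use with caution.
  oc adm drain "$dwf_node" --ignore-daemonsets --delete-emptydir-data \
    --disable-eviction=true
}
```

The fallback is reached only when the first drain returns a non-zero exit code, so workloads with a permissive PodDisruptionBudget are still evicted gracefully.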

Root Cause

The situation where minAvailable >= replicas OR maxUnavailable = 0 is a useful and legitimate scenario. It can be used where the application owner is telling the cluster owner "do not migrate my pods" without consulting the application owner first (as referenced in the Kubernetes documentation).
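
The arithmetic behind this condition can be sketched as follows. This is a simplified model that assumes integer values only (percentages are not handled); the function names are hypothetical and exist only for illustration. An eviction is allowed only while the result is greater than 0.

```shell
# Simplified model of how many pods a PodDisruptionBudget lets a drain evict.
# Function names are hypothetical; only integer values are handled.

# allowed_disruptions_min_available <healthy_pods> <minAvailable>
allowed_disruptions_min_available() {
  adm_allowed=$(( $1 - $2 ))
  if [ "$adm_allowed" -lt 0 ]; then
    adm_allowed=0
  fi
  echo "$adm_allowed"
}

# allowed_disruptions_max_unavailable <expected_pods> <healthy_pods> <maxUnavailable>
allowed_disruptions_max_unavailable() {
  # Number of pods that must stay healthy.
  adu_desired=$(( $1 - $3 ))
  if [ "$adu_desired" -lt 0 ]; then
    adu_desired=0
  fi
  adu_allowed=$(( $2 - adu_desired ))
  if [ "$adu_allowed" -lt 0 ]; then
    adu_allowed=0
  fi
  echo "$adu_allowed"
}

allowed_disruptions_min_available 1 1       # replicas=1, minAvailable=1: prints 0
allowed_disruptions_max_unavailable 3 3 0   # maxUnavailable=0: prints 0
```

In both example calls the result is 0, so no pod covered by the budget can be evicted and the drain stays blocked.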

Diagnostic Steps

Refer to "PodDisruptionBudget (PDB) could cause Machine-Config-Operator (MCO) to be degraded during OCP4 upgrade" for related errors seen when a PDB does not allow nodes to be drained.
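
As a complementary check, the PodDisruptionBudgets that currently allow no disruptions can be listed from the status.disruptionsAllowed field. The sketch below is an assumption about how one might do this and is not part of the referenced solution; the function name list_blocking_pdbs is hypothetical.

```shell
# Sketch only: list_blocking_pdbs is a hypothetical helper name.
# Prints <namespace>/<name> for every PodDisruptionBudget whose
# status.disruptionsAllowed is currently 0.
list_blocking_pdbs() {
  oc get pdb --all-namespaces --no-headers \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
  | awk '$3 == "0" { print $1 "/" $2 }'
}
```

Running list_blocking_pdbs before a drain gives a list of budgets whose pods would block eviction if they are scheduled on the node being drained.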

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.