OpenShift upgrade delayed due to Virtual Machines failing to drain from nodes

Solution Verified - Updated 24 Feb 2026

Environment

Red Hat OpenShift Container Platform 4.x
OpenShift Virtualization 4.x

Issue

OpenShift cluster upgrades do not complete in a timely manner.
Virtual Machines do not migrate quickly enough for nodes to drain in time.
Virtual Machines migrate over a slow network, and a dedicated migration network cannot be added and configured at the moment.

Resolution

The proper resolution is a dedicated high-speed migration network, as recommended in the product documentation, with the migration settings tuned as described in the OpenShift Virtualization - Tuning & Scaling Guide. The steps below are only a workaround.
If no network changes are possible at this time, and a longer stun time (switchover time, during which the VM is paused so the remaining data can be transferred over the slow network) is acceptable, run the script below on the bastion node. It manually sets the maximum downtime of every running migration to 5s, so most migrations should complete even over slow networks:
```
$ while true; do oc get vmim -A -o json | jq -r '.items[] | select(.status.phase=="Running") | "\(.metadata.namespace) \(.status.migrationState.sourcePod) \(.metadata.name)"' | while read ns pod vm; do oc exec -n "$ns" "$pod" -- virsh migrate-setmaxdowntime 1 5000 && echo "Set downtime on $vm"; done; sleep 10; done
```
This may be insufficient for very large and busy VMs. In that case, the 5s (5000 ms) downtime above can be raised to 10s (10000 ms) or even higher.
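As a rough guide for choosing the value: the hypervisor can switch over once the memory still left to copy fits into the allowed downtime at the current transfer rate. The sketch below estimates that amount under a simplified model that ignores device state and bandwidth fluctuations. `max_remaining_mib` is a hypothetical helper name, and the 110 MiB/s figure is an illustrative value, not a measurement.

```shell
#!/usr/bin/env bash
# Rough estimate (simplified model, an assumption of this sketch): how much
# remaining guest memory can be copied during the final pause.
#   $1 = transfer bandwidth in MiB/s (e.g. "Memory bandwidth" from virsh domjobinfo)
#   $2 = maximum downtime in milliseconds (the value passed to migrate-setmaxdowntime)
max_remaining_mib() {
  awk -v bw="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", bw * ms / 1000 }'
}

# Illustrative values only.
echo "5000 ms:  $(max_remaining_mib 110 5000) MiB"
echo "10000 ms: $(max_remaining_mib 110 10000) MiB"
```

With these example numbers, a 5000 ms downtime covers about 550 MiB of remaining memory and 10000 ms about 1100 MiB. A VM whose remaining memory never drops below that amount will likely need a higher downtime value.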
Enabling PostCopy in HCO is also possible, but because it only takes effect once the migration timeout is reached, it does not make the node drain effective: the drain still takes a long time, as every migration must time out first. The script above takes effect shortly after a VM's migration starts (within 10s at most, the sleep interval of the loop). AutoConverge may help, but it may slow down the VM excessively and for a long time if the network speed is too low.
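For reference, a minimal sketch of how these two options could be toggled in the HyperConverged custom resource. The field names `allowAutoConverge` and `allowPostCopy` under `spec.liveMigrationConfig`, the resource name `kubevirt-hyperconverged`, and the namespace `openshift-cnv` are assumptions based on a default installation and should be checked against the installed version. `migration_patch` and the `APPLY` variable are hypothetical names used only in this example.

```shell
#!/usr/bin/env bash
# Print a JSON merge patch for the live migration settings of the HyperConverged CR.
#   $1 = allowAutoConverge (true|false)
#   $2 = allowPostCopy     (true|false)
migration_patch() {
  printf '{"spec":{"liveMigrationConfig":{"allowAutoConverge":%s,"allowPostCopy":%s}}}\n' "$1" "$2"
}

# Apply the patch only when explicitly requested (APPLY=yes), so that running
# this file without the variable changes nothing on the cluster.
# Resource name and namespace are assumed defaults.
if [ "${APPLY:-no}" = "yes" ]; then
  oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
    --type merge -p "$(migration_patch true false)"
fi
```

Printing the patch first with `migration_patch true false` allows it to be reviewed before anything is applied. The caveats above still hold: neither option shortens the node drain as directly as raising the maximum downtime.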
Leave the script above running until the upgrade is complete and all migrations are done.
An alternative workaround is to shut down the VMs until the upgrade completes, or to reduce their workloads.

Root Cause

Network speed is insufficient for VM live migrations.

Diagnostic Steps

Check the migration job statistics of all VMs that are currently migrating and look for a dirty rate (in 4k pages per second) that is higher than the transfer bandwidth:

```
$ oc get vmim -A -o json | jq -r '.items[] | select(.status.phase=="Running") | "\(.metadata.namespace) \(.status.migrationState.sourcePod) \(.metadata.name)"' | while read ns pod vm; do echo "--> VM: $vm"; oc exec -n "$ns" "$pod" -- virsh domjobinfo 1; echo ""; done
```
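Because the dirty rate is reported in pages per second and the bandwidth in bytes per second, the two values need to be converted to the same unit before they can be compared. The sketch below does this for one `virsh domjobinfo` output read from standard input. It assumes the output contains lines labelled `Memory bandwidth:`, `Dirty rate:` and `Page size:` with the units shown in the sample; `check_convergence` is a hypothetical helper name and the sample numbers are made up.

```shell
#!/usr/bin/env bash
# Reads `virsh domjobinfo` text on stdin and compares the dirty rate with the
# transfer bandwidth. Field labels and units are assumptions about the output format.
check_convergence() {
  awk '
    # Convert a bandwidth value to MiB/s based on its unit suffix.
    function to_mib(v, u) {
      if (u ~ /^GiB/) return v * 1024
      if (u ~ /^KiB/) return v / 1024
      if (u ~ /^B/)   return v / 1048576
      return v                             # default: already MiB/s
    }
    /^Memory bandwidth:/ { bw = to_mib($3, $4) }
    /^Dirty rate:/       { rate = $3 }     # pages per second
    /^Page size:/        { page = $3 }     # bytes
    END {
      if (page == "") page = 4096          # fall back to 4k pages
      dirty = rate * page / 1048576        # pages/s -> MiB/s
      printf "dirty=%.1f MiB/s bandwidth=%.1f MiB/s -> %s\n",
             dirty, bw, (dirty >= bw ? "NOT converging" : "converging")
    }'
}

# Demo with made-up sample values (not real measurements).
check_convergence <<'EOF'
Memory bandwidth: 110.000 MiB/s
Dirty rate:       51200        pages/s
Page size:        4096         bytes
EOF
```

To use it against a live migration, pipe the output of the `oc exec -n "$ns" "$pod" -- virsh domjobinfo 1` call from the command above into `check_convergence`. A migration reported as not converging is unlikely to finish without a higher maximum downtime, a faster network, or a less busy VM.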

On newer OpenShift Virtualization versions, the above data is available as a metric in the Web Console.

SBR

Virtualization

Product(s)

Red Hat OpenShift Container Platform

Components

Category

Troubleshoot

Tags

virtual_machine
