Recovering an OpenShift 4 IPI Cluster from Complete etcd Quorum Loss

Solution Verified - Updated 20 May 2026

Environment

Red Hat OpenShift Container Platform 4

Issue

An OpenShift IPI cluster becomes completely unreachable — the API server returns EOF or connection refused on port 6443. The cluster was either freshly installed and never fully bootstrapped, or experienced a failure that left etcd running on only one of three control plane nodes. The etcd operator cannot self-recover because it refuses to make changes when quorum is not fault-tolerant.

Resolution

Step 1: Identify the recovery node

The recovery node is the master that has:

An etcd-pod.yaml manifest in /etc/kubernetes/manifests/
Data in /var/lib/etcd/member/
An etcd container (even if it is crash-looping)

In the example, this is master-2 (10.0.1.12).

Step 2: Stop etcd and kube-apiserver on the recovery node

Move the static pod manifests out to stop the pods:

ssh core@10.0.1.12 "sudo mv /etc/kubernetes/manifests/etcd-pod.yaml /tmp/etcd-pod.yaml"
ssh core@10.0.1.12 "sudo mv /etc/kubernetes/manifests/kube-apiserver-pod.yaml /tmp/kube-apiserver-pod.yaml"

Wait for the pods to stop and ports to be released (15-30 seconds):

ssh core@10.0.1.12 "ss -tln | grep -E '2379|2380|6443'"

The command should print nothing, meaning none of the three ports has a listener.
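The wait can be scripted instead of timed by hand. The following is a minimal sketch, not part of the original procedure: ports_in_use and wait_ports_free are hypothetical helper names, and the 30 attempts at 2-second intervals are assumed values.

```shell
# Print every port from the argument list that has a LISTEN socket in
# `ss -tln` output read from stdin.
ports_in_use() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    $1 == "LISTEN" {
      # Field 4 is the local address, e.g. 0.0.0.0:2379, *:6443 or [::]:2380.
      m = split($4, a, ":")
      if (a[m] in want) print a[m]
    }
  ' | sort -un
}

# Poll a node until none of the given ports is listening.
# Hypothetical helper: 30 attempts, 2 seconds apart (assumed values).
wait_ports_free() {
  local host=$1 attempt=1 listing busy
  shift
  while [ "$attempt" -le 30 ]; do
    listing=$(ssh "core@${host}" "ss -tln") || return 2   # SSH itself failed
    busy=$(printf '%s\n' "$listing" | ports_in_use "$@")
    [ -z "$busy" ] && return 0
    attempt=$((attempt + 1))
    sleep 2
  done
  echo "still listening on: $(echo $busy)" >&2
  return 1
}
```

Example call: wait_ports_free 10.0.1.12 2379 2380 6443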

Step 3: Back up the etcd data directory

ssh core@10.0.1.12 "sudo cp -a /var/lib/etcd/member /var/lib/etcd/member.bak"

Step 4: Force etcd to a single-member cluster

Use the etcd container image (from the etcd-pod.yaml manifest) to run etcd with --force-new-cluster. This rewrites the etcd WAL to remove all other members and makes this node the sole member of the cluster.

Find the etcd image:

ssh core@10.0.1.12 "sudo grep -o 'quay.io/openshift-release-dev/[^\"]*' /tmp/etcd-pod.yaml | head -1"
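To avoid copying the image reference by hand, the lookup can be captured straight into a variable. This is a sketch under the assumption that the manifest was moved to /tmp in Step 2; etcd_image_from_manifest is a hypothetical helper name that reuses the grep pattern above.

```shell
# Extract the first release image reference from a static pod manifest
# read on stdin, using the same pattern as the grep above.
etcd_image_from_manifest() {
  grep -o 'quay.io/openshift-release-dev/[^"]*' | head -1
}

# Usage (run from the bastion; assumes the manifest is in /tmp on the node):
#   ETCD_IMAGE=$(ssh core@10.0.1.12 "sudo cat /tmp/etcd-pod.yaml" | etcd_image_from_manifest)
#   [ -n "$ETCD_IMAGE" ] || echo "no image found in manifest" >&2
```

Checking that the variable is non-empty before continuing prevents podman from being started with a blank image name.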

Important: Use --entrypoint etcd — the default container entrypoint includes etcd, so passing etcd as an argument would cause a `'etcd' is not a val

Run force-new-cluster:

ETCD_IMAGE="<image-from-above>"

ssh core@10.0.1.12 "sudo podman run -d --name etcd-force --rm \
  -v /var/lib/etcd:/var/lib/etcd:Z \
  --network=host --privileged \
  --entrypoint etcd ${ETCD_IMAGE} \
  --force-new-cluster \
  --data-dir /var/lib/etcd \
  --name <cluster-infra-id>-master-2"

Wait ~10 seconds, then verify etcd is running as a single member:

ssh core@10.0.1.12 "sudo podman run --rm --network=host \
  --entrypoint etcdctl ${ETCD_IMAGE} \
  --endpoints=http://127.0.0.1:2379 member list -w table"

Expected output — a single member with status started:

+------------------+---------+-----------------------+----------------------------+-----------------------+
|        ID        | STATUS  |         NAME          |       PEER ADDRS           |     CLIENT ADDRS      |
+------------------+---------+-----------------------+----------------------------+-----------------------+
| 131cfa19f51cfb79 | started | <infra-id>-master-2   | https://10.0.1.12:2380     | http://localhost:2379 |
+------------------+---------+-----------------------+----------------------------+-----------------------+
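The single-member check can also be made mechanical. The sketch below counts rows whose STATUS column is started in the table output; count_started_members is a hypothetical helper name, and it assumes STATUS is the second column, as in the table above.

```shell
# Count table rows whose STATUS column (second column) is "started".
# Reads `etcdctl member list -w table` output on stdin.
count_started_members() {
  awk -F'|' '$3 ~ /^[ \t]*started[ \t]*$/ { n++ } END { print n + 0 }'
}

# Usage: pipe the member list command from this step into the function.
# A result of 1 means etcd is running as a single member:
#   ssh core@10.0.1.12 "sudo podman run --rm --network=host \
#     --entrypoint etcdctl ${ETCD_IMAGE} \
#     --endpoints=http://127.0.0.1:2379 member list -w table" | count_started_members
```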

If the member list is empty or the command times out, the force-new-cluster failed. Restore the backup and retry:

ssh core@10.0.1.12 "sudo podman stop etcd-force 2>/dev/null; \
  sudo rm -rf /var/lib/etcd/member; \
  sudo cp -a /var/lib/etcd/member.bak /var/lib/etcd/member"

Step 5: Stop the temporary etcd and restore manifests

ssh core@10.0.1.12 "sudo podman stop etcd-force"

Wait for ports to be free, then restore the manifests. Restore etcd first, wait a few seconds for it to start, then restore the API server:

ssh core@10.0.1.12 "sudo mv /tmp/etcd-pod.yaml /etc/kubernetes/manifests/etcd-pod.yaml"
sleep 3
ssh core@10.0.1.12 "sudo mv /tmp/kube-apiserver-pod.yaml /etc/kubernetes/manifests/kube-apiserver-pod.yaml"
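The fixed sleep 3 between the two restores can be replaced with a poll that waits until the etcd container is actually running. This is a sketch only: wait_for_container is a hypothetical helper, the container name etcd is an assumption based on the crictl output used in this article, and the attempt count and delay are assumed defaults.

```shell
# Print the container ID and return 0 once a running container with the
# exact name is found on the node; return 1 after all attempts fail.
wait_for_container() {
  local host=$1 name=$2 attempts=${3:-20} delay=${4:-3} i=1 id
  while [ "$i" -le "$attempts" ]; do
    # `crictl ps --name` takes a regular expression; -q prints only IDs.
    id=$(ssh "core@${host}" "sudo crictl ps --name '^${name}\$' -q" 2>/dev/null)
    if [ -n "$id" ]; then
      echo "$id"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: restore etcd-pod.yaml, then run
#   wait_for_container 10.0.1.12 etcd
# and restore kube-apiserver-pod.yaml only after it succeeds.
```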

Step 6: Verify etcd elected itself leader

Check the etcd logs to confirm it successfully became leader:

ssh core@10.0.1.12 "sudo crictl ps | grep 'etcd '"
ssh core@10.0.1.12 "sudo crictl logs <new-etcd-container-id> 2>&1 | grep -E 'became leader|elected leader|ready to serve'"

Expected:

{"msg":"131cfa19f51cfb79 became leader at term 11"}
{"msg":"raft.node: 131cfa19f51cfb79 elected leader 131cfa19f51cfb79 at term 11"}
{"msg":"ready to serve client requests"}

Step 7: Wait for API server to come up

The kube-apiserver may take 30-90 seconds to start after etcd. It may fail once on the first attempt (a race condition: etcd is not yet ready when the apiserver starts); the kubelet restarts it automatically.

# Watch for port 6443 to start listening
ssh core@10.0.1.12 "ss -tln | grep 6443"

# Test API access from the bastion or local machine
export KUBECONFIG=/path/to/auth/kubeconfig
oc get nodes
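Because the API server can take a while and may fail once before it stays up, a retry loop is more convenient than rerunning oc get nodes by hand. A minimal sketch, assuming KUBECONFIG is already exported; wait_for_api is a hypothetical helper, and the defaults of 30 attempts with a 10-second delay are assumed values.

```shell
# Retry `oc get nodes` until it succeeds or the attempts run out.
# Arguments: [attempts] [delay-seconds]
wait_for_api() {
  local attempts=${1:-30} delay=${2:-10} i=1
  while [ "$i" -le "$attempts" ]; do
    if oc get nodes >/dev/null 2>&1; then
      echo "API reachable after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "API still unreachable after ${attempts} attempts" >&2
  return 1
}
```

If the function returns non-zero, continue with the log checks described next.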

If the API server keeps crashing, check its logs for the specific error. Common issues at this stage:

etcd not yet ready: Wait longer, kubelet will retry
Certificate issues: The force-new-cluster may have invalidated some certs — the cert-regeneration sidecar should handle this

Step 8: Enable unsafe single-member mode in the etcd operator

Once the API is reachable, the etcd operator will detect quorum=1 and refuse to make any changes (deploy etcd to other masters, generate certificates) because the quorum is not fault-tolerant. This check is a safeguard in normal operation, but creates a deadlock during recovery.

Override this safety check:

oc patch etcd/cluster --type=merge -p \
  '{"spec":{"unsupportedConfigOverrides":{"useUnsupportedUnsafeNonHANonProductionUnstableEtcd":true}}}'

Check the operator logs to confirm it is now reconciling:

oc logs -n openshift-etcd-operator deployment/etcd-operator --tail=20

The TargetConfigController and EtcdCertSignerController errors should stop, and the operator should begin deploying etcd pods to the other masters.

Step 9: Monitor etcd scaling to all masters

Watch the etcd operator deploy etcd pods to all three masters:

watch 'oc get pods -n openshift-etcd -l app=etcd -o wide; echo; oc get co etcd'

Wait until all three etcd pods show 4/4 Running. This typically takes 2-5 minutes.

You can also verify the static pod manifests are deployed to all masters:

for ip in 10.0.1.10 10.0.1.11 10.0.1.12; do
  echo "$ip: $(ssh -o StrictHostKeyChecking=no core@${ip} 'ls /etc/kubernetes/manifests/' 2>/dev/null | tr '\n' ' ')"
done
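The "all three pods Running" condition can be checked in a script rather than by watching the terminal. The sketch below counts pods whose READY column shows all containers ready; count_ready_pods is a hypothetical helper, and it assumes the default oc get pods column order (NAME, READY, STATUS).

```shell
# Count pods whose READY column is n/n and whose STATUS is Running.
# Reads `oc get pods --no-headers` output on stdin.
count_ready_pods() {
  awk '{
    split($2, r, "/")
    if (r[1] == r[2] && $3 == "Running") n++
  } END { print n + 0 }'
}

# Usage:
#   oc get pods -n openshift-etcd -l app=etcd --no-headers | count_ready_pods
# A result of 3 matches the condition described in this step.
```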

Step 10: Remove the unsafe override

Once all three etcd members are healthy and running, remove the override:

oc patch etcd/cluster --type=merge -p '{"spec":{"unsupportedConfigOverrides":null}}'

Important: Do not leave this override in place permanently. It disables quorum safety checks that protect against data loss.
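To confirm the override is really gone, the field can be read back after the patch. A small sketch; override_is_set is a hypothetical helper, and it relies on oc printing an empty string for a jsonpath field that is absent.

```shell
# Exit 0 while spec.unsupportedConfigOverrides still has a value,
# non-zero when it is empty or the query fails.
override_is_set() {
  local value
  value=$(oc get etcd/cluster -o jsonpath='{.spec.unsupportedConfigOverrides}') || return 2
  [ -n "$value" ]
}

# Usage:
#   if override_is_set; then echo "override still present" >&2; fi
```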

Step 11: Approve pending CSRs

After etcd recovery, kubelet client and serving certificates may have expired or been invalidated. The kubelets will request new certificates, but these requests need approval.

Check for and approve pending CSRs:

# Check for pending CSRs
oc get csr | grep Pending

# Approve all pending CSRs
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | \
  xargs -r oc adm certificate approve

Signs that CSR approval is needed:

Nodes show NotReady status
Kubelet logs show Unable to register node with API server: nodes is forbidden: User "system:anonymous" (the kubelet's certificate is invalid and it is falling back to anonymous authentication)
Kubelet logs show no serving certificate available for the kubelet

You may need to run the approval command multiple times — new CSRs appear as kubelets re-register and request both client and serving certificates.
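The repeated approval can be wrapped in a loop that stops once no new CSRs have appeared for two consecutive checks. This is a sketch, not a supported tool: approve_csrs_until_quiet is a hypothetical helper, the round count and delay are assumed defaults, and it approves every pending CSR without inspecting it, exactly as the command above does.

```shell
# Approve pending CSRs repeatedly until two consecutive checks find none.
# Arguments: [max-rounds] [delay-seconds]
approve_csrs_until_quiet() {
  local rounds=${1:-10} delay=${2:-15} quiet=0 pending name
  while [ "$rounds" -gt 0 ]; do
    # Same go-template as above: names of CSRs that have no status yet.
    pending=$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}') || return 1
    if [ -z "$pending" ]; then
      quiet=$((quiet + 1))
      [ "$quiet" -ge 2 ] && return 0
    else
      quiet=0
      for name in $pending; do
        oc adm certificate approve "$name"
      done
    fi
    rounds=$((rounds - 1))
    sleep "$delay"
  done
  echo "CSRs were still appearing when the rounds ran out" >&2
  return 1
}
```

Requiring two quiet checks in a row covers the gap between a kubelet receiving its client certificate and requesting its serving certificate.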

Step 12: Wait for full cluster recovery

Monitor all cluster operators:

watch 'oc get co | grep -vE "True.*False.*False"'

The remaining operators will progressively recover:

etcd — completes revision rollout across all masters (5-10 min)
kube-apiserver — deploys static pods to master-0 and master-1 (5-10 min)
kube-controller-manager — deploys static pods to missing masters (5 min)
kube-scheduler — deploys static pods to missing masters (5 min)
authentication, console, ingress, monitoring — restart after API stabilizes (2-5 min)
openshift-apiserver — rolls out updated pods (2-5 min)

Total recovery time after API comes back: 10-20 minutes.

A fully recovered cluster shows all operators as Available=True, Progressing=False, Degraded=False.
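The healthy condition can be tested column by column instead of with a regular expression over the whole line. A sketch, assuming the default oc get co column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) and that every operator already reports a version; unhealthy_operators is a hypothetical helper name.

```shell
# Print the name of every operator that is not Available=True,
# Progressing=False, Degraded=False.
# Reads `oc get co --no-headers` output on stdin.
# Assumption: the VERSION column is populated, so the three condition
# columns are fields 3, 4 and 5.
unhealthy_operators() {
  awk 'NF && !($3 == "True" && $4 == "False" && $5 == "False") { print $1 }'
}

# Usage:
#   oc get co --no-headers | unhealthy_operators
# Empty output means every operator is healthy.
```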

Root Cause

etcd requires a majority (2 out of 3) of members to form quorum and elect a leader. When only one member is running, it cannot elect itself leader, so all reads and writes fail. Without a working etcd, the kube-apiserver cannot start, and without the API server, the etcd-operator cannot deploy etcd to the other masters, creating a deadlock.

Common causes:

Installation/bootstrap did not complete on all masters (network issues, resource limits such as vCPU per-host caps, image pull failures)
Two masters lost their etcd data simultaneously (disk failure, accidental deletion)
Node failures during an etcd operator rollout
ESXi host resource limits (e.g., 512 vCPU cap) preventing VMs from powering on during cluster deployment

Diagnostic Steps

Since the API is down, all diagnosis must be done by SSH-ing directly to the RHCOS nodes as the core user.

Step 1: Check which static pod manifests exist on each master

A healthy master should have etcd-pod.yaml, kube-apiserver-pod.yaml, kube-controller-manager-pod.yaml, and kube-scheduler-pod.yaml:

for ip in 10.0.1.10 10.0.1.11 10.0.1.12; do
  echo "=== $ip ==="
  ssh -o StrictHostKeyChecking=no core@${ip} "ls /etc/kubernetes/manifests/"
done

Example output showing the problem (hostnames added to the headers for clarity). master-0 and master-1 are missing control plane manifests:

=== 10.0.1.10 (master-0) ===
coredns.yaml
haproxy.yaml
keepalived.yaml

=== 10.0.1.11 (master-1) ===
coredns.yaml
haproxy.yaml
keepalived.yaml
kube-controller-manager-pod.yaml
kube-scheduler-pod.yaml

=== 10.0.1.12 (master-2) ===
coredns.yaml
etcd-pod.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver-pod.yaml
kube-controller-manager-pod.yaml
kube-scheduler-pod.yaml
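Comparing each listing against the four expected manifests can be automated. The sketch below prints only what is missing; missing_manifests is a hypothetical helper name, and the expected file names are the four listed at the start of this step.

```shell
# Print each expected control plane manifest that is absent from an
# `ls` listing read on stdin.
missing_manifests() {
  local listing expected
  listing=$(cat)
  for expected in etcd-pod.yaml kube-apiserver-pod.yaml \
                  kube-controller-manager-pod.yaml kube-scheduler-pod.yaml; do
    printf '%s\n' "$listing" | grep -qxF "$expected" || echo "$expected"
  done
}

# Usage:
#   ssh core@10.0.1.11 "ls /etc/kubernetes/manifests/" | missing_manifests
```

For the master-1 listing in the example, this prints etcd-pod.yaml and kube-apiserver-pod.yaml.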

Step 2: Check container status on each master

Use crictl to identify which control plane components are running or crash-looping on each master:

for ip in 10.0.1.10 10.0.1.11 10.0.1.12; do
  echo "=== $ip ==="
  ssh -o StrictHostKeyChecking=no core@${ip} \
    "sudo crictl ps -a 2>/dev/null | grep -E 'kube-apiserver|etcd|kube-controller|kube-sched' | head -10"
done

Look for:

etcd container with high restart count or Exited status
kube-apiserver container repeatedly exiting
Missing containers on nodes that should have them

Step 3: Check etcd logs on the running member

On the master that has etcd running (master-2 in the example), get the etcd container ID and check its logs:

# Find the etcd container
ssh core@10.0.1.12 "sudo crictl ps -a 2>/dev/null | grep -i etcd"

# Get the last 30 lines of logs (replace <container-id> with actual ID)
ssh core@10.0.1.12 "sudo crictl logs --tail 30 <container-id> 2>&1"

Key indicators of quorum loss:

Repeated MsgPreVote requests to unreachable peers
dial tcp <peer-ip>:2380: connect: connection refused errors
The member never transitions from pre-candidate to leader
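When the log is long, it helps to reduce the connection errors to a list of unreachable peers. A sketch based only on the error text quoted above; refused_peers is a hypothetical helper name, and it matches IPv4 peer addresses only.

```shell
# List the unique peer IPs that refused connections on port 2380.
# Reads etcd container logs on stdin.
refused_peers() {
  grep -oE 'dial tcp [0-9.]+:2380: connect: connection refused' |
    awk '{ sub(/:2380:$/, "", $3); print $3 }' | sort -u
}

# Usage:
#   ssh core@10.0.1.12 "sudo crictl logs --tail 200 <container-id> 2>&1" | refused_peers
```

If the output lists both other masters, the running member has no reachable peer and cannot form quorum.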

Step 4: Verify etcd cannot respond to commands

Try running etcdctl member list inside the etcd container:

ssh core@10.0.1.12 "sudo crictl exec <etcd-container-id> etcdctl member list --write-out=table 2>&1"

If this times out with context deadline exceeded, etcd has no quorum and cannot serve any requests.

Step 5: Check kube-apiserver crash reason

On the master where kube-apiserver exists, check its last crash logs:

# Find the exited kube-apiserver container
ssh core@10.0.1.12 "sudo crictl ps -a 2>/dev/null | grep kube-apiserver | head -5"

# Get crash logs
ssh core@10.0.1.12 "sudo crictl logs --tail 20 <apiserver-container-id> 2>&1"

Common crash messages:

PostStartHook "start-service-ip-repair-controllers" failed — etcd unreachable, cannot verify service IP allocations
dial tcp <ip>:2379: connect: connection refused — cannot reach any etcd endpoint
context deadline exceeded — etcd connections timing out

Step 6: Verify etcd data directory exists on the recovery node

Before proceeding with recovery, confirm that the etcd data directory exists and has data:

ssh core@10.0.1.12 "sudo ls -la /var/lib/etcd/member/ && sudo du -sh /var/lib/etcd/member/"

You should see snap/ and wal/ subdirectories. If the directory is empty or missing, this node cannot be used for recovery — check the other masters.
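The same check can be run across all masters to find which of them could serve as the recovery node. A sketch only: recovery_candidates is a hypothetical helper, and it tests just for the presence of the snap/ and wal/ directories, so the du output above is still worth reviewing for the chosen node.

```shell
# Print each master whose etcd member directory contains both snap/ and wal/.
# Arguments: one or more node IPs.
recovery_candidates() {
  local ip
  for ip in "$@"; do
    if ssh "core@${ip}" "sudo test -d /var/lib/etcd/member/snap && sudo test -d /var/lib/etcd/member/wal"; then
      echo "$ip"
    fi
  done
}

# Usage:
#   recovery_candidates 10.0.1.10 10.0.1.11 10.0.1.12
```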

Verification

# All nodes Ready
oc get nodes

# All cluster operators healthy (no output = all healthy)
oc get co | grep -vE "True.*False.*False"

# etcd cluster has 3 healthy members
oc get pods -n openshift-etcd -l app=etcd -o wide

# Cluster version not reporting errors
oc get clusterversion

# No pending CSRs
oc get csr | grep Pending

Troubleshooting Recovery Failures

etcd operator still stuck after applying the override

Check if the etcd-all-certs secret exists in the openshift-etcd namespace:

oc get secret -n openshift-etcd etcd-all-certs

If missing, the EtcdCertSignerController needs to generate it. Check the operator logs:

oc logs -n openshift-etcd-operator deployment/etcd-operator --tail=50 2>&1 | grep -E "CertSigner|InstallerController"

The InstallerControllerDegraded: missing required resources: [secrets: etcd-all-certs] error should resolve after the cert signer runs successfully.

Nodes stuck in NotReady after CSR approval

If nodes remain NotReady after approving CSRs, check kubelet status on the affected node:

ssh core@10.0.1.10 "sudo systemctl status kubelet"
ssh core@10.0.1.10 "sudo journalctl -u kubelet --no-pager -n 20"

If kubelet is cycling with system:anonymous errors, it needs a fresh bootstrap token. Restart kubelet to force re-bootstrap:

ssh core@10.0.1.10 "sudo systemctl restart kubelet"

Then approve the new CSRs that appear.

kube-apiserver not deploying to other masters

If kube-apiserver remains degraded with Missing operand on node master-X, the kube-apiserver-operator needs the API to be stable before it can roll out. Check the pods in the openshift-kube-apiserver namespace:

oc get pods -n openshift-kube-apiserver -o wide | grep -v Completed

If installer pods are stuck, check their logs:

oc logs -n openshift-kube-apiserver <installer-pod-name>

SBR

Shift

Product(s)

Red Hat OpenShift Container Platform

Category

Troubleshoot
