Old configmap and secret revisions in "openshift-kube-*" namespaces are not deleted by the revision pruner


Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.15+

Issue

  • In the openshift-kube-* namespaces, secrets named serving-cert-* that contain expired certificates can still be found, even when the current revision is significantly higher:

      $ for i in $(oc get secrets -n openshift-kube-controller-manager | grep serving | awk '{print $1}'); do echo "certName: $i" && oc get secrets -n openshift-kube-controller-manager $i -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates -issuer -subject && echo ""; done
      certName: serving-cert
      notBefore=Feb 22 14:08:09 2024 GMT
      notAfter=Feb 21 14:08:10 2026 GMT
      issuer=CN = openshift-service-serving-signer@1674482820
      subject=CN = kube-controller-manager.openshift-kube-controller-manager.svc
      
      certName: serving-cert-2
      notBefore=Jan 23 14:07:17 2023 GMT
      notAfter=Jan 22 14:07:18 2025 GMT
      issuer=CN = openshift-service-serving-signer@1674482820
      subject=CN = kube-controller-manager.openshift-kube-controller-manager.svc
      [..]
    
      certName: serving-cert-33
      notBefore=Feb 22 14:08:09 2024 GMT
      notAfter=Feb 21 14:08:10 2026 GMT
      issuer=CN = openshift-service-serving-signer@1674482820
      subject=CN = kube-controller-manager.openshift-kube-controller-manager.svc
      [..]
    
  • Why are resources of old revisions not deleted?

  • Files for old revisions in the static pod manifests directory are not being cleaned up.
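
To get a quick overview of which revisions are still present, the numeric suffixes of the revisioned resource names can be listed. The following is a minimal sketch; the helper name list_revisions is hypothetical and not part of oc, and it assumes that revisioned resources are named <prefix>-<revision>, as in the serving-cert-* example above:

```shell
# Hypothetical helper: read resource names on stdin and print the numeric
# revision suffixes of those named <prefix>-<number>, in ascending order.
# Names without a numeric suffix (for example "serving-cert") are skipped.
list_revisions() {
  prefix="$1"
  sed -n "s/^${prefix}-\([0-9][0-9]*\)\$/\1/p" | sort -n
}

# Usage (requires cluster access):
#   oc get secrets -n openshift-kube-controller-manager -o name \
#     | sed 's|^secret/||' | list_revisions serving-cert
```

A long list that starts far below the current revision suggests that old revisions are not being pruned.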

Resolution

  1. Follow the Diagnostic Steps to identify any previously failed revision updates for the kubeapiservers, kubecontrollermanagers and/or kubeschedulers cluster resources. Confirm that any reported errors are no longer relevant and that all nodes are on the same revision, which must be more recent than the last failed revision.

  2. Using the TARGET and INDEX identified in the Diagnostic Steps, clear the relevant lastFailed* status fields to allow the corresponding resources to be removed by the revision pruner:

$ TARGET=<impacted resource>
$ INDEX=<node index>
$ oc patch ${TARGET} cluster --subresource=status --type json -p="[{\"op\": \"remove\", \"path\": \"/status/nodeStatuses/${INDEX}/lastFailedCount\"},{\"op\": \"remove\", \"path\": \"/status/nodeStatuses/${INDEX}/lastFailedReason\"},{\"op\": \"remove\", \"path\": \"/status/nodeStatuses/${INDEX}/lastFailedRevision\"},{\"op\": \"remove\", \"path\": \"/status/nodeStatuses/${INDEX}/lastFailedRevisionErrors\"},{\"op\": \"remove\", \"path\": \"/status/nodeStatuses/${INDEX}/lastFailedTime\"}]"
  3. Clearing the status leads to the old revisions being cleaned up during the next pruner run. To trigger the cleanup sooner, a new rollout of the affected kube-* component can be forced:
$ TARGET=<impacted resource>
$ oc patch ${TARGET} cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
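
The hand-escaped JSON patch in step 2 is easy to mistype. As an alternative, the same patch document can be generated by a small function. This is only a sketch; the function name build_clear_patch is hypothetical, and it produces the same five "remove" operations as the command above:

```shell
# Hypothetical helper: print a JSON Patch that removes the five lastFailed*
# fields of the node at the given index in .status.nodeStatuses.
build_clear_patch() {
  index="$1"
  sep=""
  printf '['
  for field in lastFailedCount lastFailedReason lastFailedRevision \
               lastFailedRevisionErrors lastFailedTime; do
    printf '%s{"op": "remove", "path": "/status/nodeStatuses/%s/%s"}' \
      "$sep" "$index" "$field"
    sep=","
  done
  printf ']\n'
}

# Usage (requires cluster access):
#   oc patch "${TARGET}" cluster --subresource=status --type json \
#     -p="$(build_clear_patch "${INDEX}")"
```

Note that a JSON Patch "remove" operation fails when its path does not exist, so the patch is rejected as a whole if one of the fields is absent for the selected node.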

Root Cause

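  • The revision-pruner pods work as expected: the pruner keeps lastFailedRevision - 5 revisions, so as long as a node still reports an old lastFailedRevision, the resources of the revisions that follow it are not removed.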
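
The effect can be illustrated with a small calculation. The sketch below rests on one reading of the statement above, namely that every revision from lastFailedRevision - 5 upward is kept; this reading is an assumption, and the function name is hypothetical:

```shell
# Hypothetical helper: lowest revision that stays protected from pruning,
# ASSUMING the pruner keeps everything from lastFailedRevision - 5 upward.
oldest_protected_revision() {
  last_failed="$1"
  echo $(( last_failed - 5 ))
}

oldest_protected_revision 8    # prints 3
oldest_protected_revision 23   # prints 18
```

Under that assumption, a node that still reports lastFailedRevision 8 keeps revisions 3 and higher in place, even when the current revision is 25.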

Diagnostic Steps

  1. Review the nodeStatuses field for the kubeapiservers, kubecontrollermanagers and kubeschedulers cluster resources to identify failed revisions. Take note of the INDEX of impacted nodes and confirm that all Nodes are on the same, higher currentRevision than any identified failed revisions:
$ oc get kubeapiservers cluster -o jsonpath='{.status.nodeStatuses}' | jq -r '"||CURRENT|LAST FAILED|LAST FAILED\nNODE|INDEX|REVISION|REVISION|DATE", (. | to_entries[] | "\(.value.nodeName)|\(.key)|\(.value.currentRevision)|\(.value.lastFailedRevision)|\(.value.lastFailedTime)")' | column -t -s'|'

$ oc get kubecontrollermanagers cluster -o jsonpath='{.status.nodeStatuses}' | jq -r '"||CURRENT|LAST FAILED|LAST FAILED\nNODE|INDEX|REVISION|REVISION|DATE", (. | to_entries[] | "\(.value.nodeName)|\(.key)|\(.value.currentRevision)|\(.value.lastFailedRevision)|\(.value.lastFailedTime)")' | column -t -s'|'

$ oc get kubeschedulers cluster -o jsonpath='{.status.nodeStatuses}' | jq -r '"||CURRENT|LAST FAILED|LAST FAILED\nNODE|INDEX|REVISION|REVISION|DATE", (. | to_entries[] | "\(.value.nodeName)|\(.key)|\(.value.currentRevision)|\(.value.lastFailedRevision)|\(.value.lastFailedTime)")' | column -t -s'|'

Example Output

$ oc get kubeapiservers cluster -o jsonpath='{.status.nodeStatuses}' | jq -r '"||CURRENT|LAST FAILED|LAST FAILED\nNODE|INDEX|REVISION|REVISION|DATE", (. | to_entries[] | "\(.value.nodeName)|\(.key)|\(.value.currentRevision)|\(.value.lastFailedRevision)|\(.value.lastFailedTime)")' | column -t -s'|'
                CURRENT   LAST FAILED  LAST FAILED
NODE     INDEX  REVISION  REVISION     DATE
master0  0      741       null         null
master1  1      741       null         null
master2  2      741       null         null

$ oc get kubecontrollermanagers cluster -o jsonpath='{.status.nodeStatuses}' | jq -r '"||CURRENT|LAST FAILED|LAST FAILED\nNODE|INDEX|REVISION|REVISION|DATE", (. | to_entries[] | "\(.value.nodeName)|\(.key)|\(.value.currentRevision)|\(.value.lastFailedRevision)|\(.value.lastFailedTime)")' | column -t -s'|'
                CURRENT   LAST FAILED  LAST FAILED    
NODE     INDEX  REVISION  REVISION     DATE
master0  0      39        35           2026-02-07T17:54:51Z
master1  1      39        null         null
master2  2      39        null         null

$ oc get kubeschedulers cluster -o jsonpath='{.status.nodeStatuses}' | jq -r '"||CURRENT|LAST FAILED|LAST FAILED\nNODE|INDEX|REVISION|REVISION|DATE", (. | to_entries[] | "\(.value.nodeName)|\(.key)|\(.value.currentRevision)|\(.value.lastFailedRevision)|\(.value.lastFailedTime)")' | column -t -s'|'
                CURRENT   LAST FAILED  LAST FAILED
NODE     INDEX  REVISION  REVISION     DATE
master0  0      25        8            2023-03-31T09:50:02Z
master1  1      25        23           2026-02-07T17:57:03Z
master2  2      25        null         null
  2. If the LAST FAILED REVISION column contains a revision less than 6, please review the following Solution: The revision-pruner pods are not created when the lastFailedRevision of any node is less than 6.

  3. If the LAST FAILED REVISION is 6 or higher, confirm that the lastFailedRevisionErrors for each impacted resource type (kubeapiservers, kubecontrollermanagers or kubeschedulers) are no longer relevant:

$ TARGET=<impacted resource>
$ oc get ${TARGET} cluster -o jsonpath='{.status.nodeStatuses[?(@.lastFailedCount)]}' | jq . | sed -e 's/\\n/\n/g'

Example Output

$ TARGET=kubeschedulers
$ oc get ${TARGET} cluster -o jsonpath='{.status.nodeStatuses[?(@.lastFailedCount)]}' | jq . | sed -e 's/\\n/\n/g'
{
  "currentRevision": 25,
  "lastFailedCount": 1,
  "lastFailedReason": "InstallerFailed",
  "lastFailedRevision": 8,
  "lastFailedRevisionErrors": [
    "installer: 40d20 0xc000841040 0xc00034af00 0xc0008410e0 0xc0008401e0 0xc000840140 0xc00034b680 0xc00034b540 0xc00034b180 0xc00034b900 0xc00034bcc0 0xc00034b7c0 0xc00034adc0 0xc00034b2c0 0xc000841180 0xc000841220 0xc0008412c0 0xc00034ba40 0xc000841360 0xc000841400] map[104:0xc000841720 118:0xc000841360] [] -1 0 0xc0005d3f80 true 0x1d45de0 []}
I0331 09:49:31.572806       1 cmd.go:93] (*installerpod.InstallOptions)(0xc0004ad520)({
 KubeConfig: (string) \"\",
 KubeClient: (kubernetes.Interface) <nil>,
 Revision: (string) (len=1) \"8\",
 NodeName: (string) \"\",
 Namespace: (string) (len=24) \"openshift-kube-scheduler\",
 PodConfigMapNamePrefix: (string) (len=18) \"kube-scheduler-pod\",
 SecretNamePrefixes: ([]string) (len=1 cap=1) {
  (string) (len=31) \"localhost-recovery-client-token\"
 },
 OptionalSecretNamePrefixes: ([]string) (len=1 cap=1) {
  (string) (len=12) \"serving-cert\"
 },
 ConfigMapNamePrefixes: ([]string) (len=5 cap=8) {
  (string) (len=18) \"kube-scheduler-pod\",
  (string) (len=6) \"config\",
  (string) (len=17) \"serviceaccount-ca\",
  (string) (len=20) \"scheduler-kubeconfig\",
  (string) (len=37) \"kube-scheduler-cert-syncer-kubeconfig\"
 },
 OptionalConfigMapNamePrefixes: ([]string) (len=1 cap=1) {
  (string) (len=16) \"policy-configmap\"
 },
 CertSecretNames: ([]string) (len=1 cap=1) {
  (string) (len=30) \"kube-scheduler-client-cert-key\"
 },
 OptionalCertSecretNamePrefixes: ([]string) <nil>,
 CertConfigMapNamePrefixes: ([]string) <nil>,
 OptionalCertConfigMapNamePrefixes: ([]string) <nil>,
 CertDir: (string) (len=57) \"/etc/kubernetes/static-pod-resources/kube-scheduler-certs\",
 ResourceDir: (string) (len=36) \"/etc/kubernetes/static-pod-resources\",
 PodManifestDir: (string) (len=25) \"/etc/kubernetes/manifests\",
 Timeout: (time.Duration) 2m0s,
 StaticPodManifestsLockFile: (string) \"\",
 PodMutationFns: ([]installerpod.PodMutationFunc) <nil>,
 KubeletVersion: (string) \"\"
})
I0331 09:49:34.918247       1 cmd.go:124] Received SIGTERM or SIGINT signal, shutting down the process.
F0331 09:50:01.696102       1 cmd.go:106] context canceled
"
  ],
  "lastFailedTime": "2023-03-31T09:50:02Z",
  "nodeName": "master0"
}
{
  "currentRevision": 25,
  "lastFailedCount": 1,
  "lastFailedReason": "InstallerFailed",
  "lastFailedRevision": 23,
  "lastFailedRevisionErrors": [
    "installer: The container could not be located when the pod was terminated"
  ],
  "lastFailedTime": "2026-02-07T17:57:03Z",
  "nodeName": "master1"
}
  4. If the current revision of all nodes is greater than any failed revision identified in step 1, and the error message identified in step 3 has been confirmed to no longer be an issue, proceed with the cleanup in the Resolution.
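
The check from step 1 can also be scripted. The following sketch reads the table printed by the jq command in step 1 and reports only the nodes that still record a failed revision. The function name report_failed is hypothetical, and the column order (NODE, INDEX, CURRENT REVISION, LAST FAILED REVISION, DATE) is assumed to match the output shown above:

```shell
# Hypothetical helper: read the step 1 table on stdin and print one line for
# every node whose LAST FAILED REVISION column holds a number. Header lines
# and rows with "null" are skipped because they fail the numeric match.
report_failed() {
  awk '$2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
    state = ($3 + 0 > $4 + 0) ? "current revision is newer" : "current revision is NOT newer"
    printf "%s index=%s current=%s lastFailed=%s (%s)\n", $1, $2, $3, $4, state
  }'
}

# Usage (requires cluster access): append "| report_failed" to any of the
# three commands in step 1, after the "column -t" stage.
```

The INDEX printed for each node is the value to use for INDEX in the Resolution.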
