OpenShift Container Platform Boot Image Updates

Updated 28 Feb 2024

Background

OpenShift Container Platform (OCP) ships its operating system, Red Hat Enterprise Linux CoreOS (RHCOS), in two formats:

- The (now OCI-formatted) operating system container image, and
- Per-platform cloud images, disks, etc. (collectively referred to as the bootimage)

The container image is what actually gets used during the OCP upgrade process, where the Machine Config Operator (MCO) calls rpm-ostree to stage an image-to-image update. This is the mechanism that binds the lifecycle of the Operating System to the lifecycle of the platform.

In contrast, the bootimage is pinned to a specific installer version and, currently, is not automatically updated during the lifetime of the cluster. These bootimages are generally not used unless node scaling happens. When a new node joins the cluster, it first boots the RHCOS bootimage from the originally installed version of OCP, and then immediately updates to the cluster's current version and configuration before the kubelet runs.

Let's illustrate with an example: I installed OCP version 4.8, upgraded to 4.12, and now want to add a new node to the cluster. In a MachineSet-backed cluster, I can scale up the replicas of a MachineSet. This will spin up the necessary cloud resources and boot with the image referenced in the MachineSet, which was originally set in 4.8. The 4.8 RHCOS bootimage will fetch Ignition from the Machine Config Server (MCS), write all the configuration during the initramfs, then use rpm-ostree to do an OS image update from 4.8 to 4.12 immediately and reboot. After the reboot, the kubelet comes up and requests to join the cluster, finishing the provisioning process.
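
The scaling step in this example can be sketched as a small shell helper. The function name scale_machineset is hypothetical; it only wraps the standard oc scale command and assumes the MachineSet lives in the openshift-machine-api namespace.

```shell
# Hypothetical helper (not part of oc): scale one MachineSet.
# $1 = MachineSet name, $2 = desired replica count.
scale_machineset() {
  oc scale machineset "$1" -n openshift-machine-api --replicas="$2"
}

# Usage, with placeholder values:
#   scale_machineset my-machineset 4
#   oc get machines -n openshift-machine-api
```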

This "immediate OS image update" works and is supported across all versions of OCP. That is, if you installed a 4.x cluster, that bootimage will continue to work after upgrading to any later 4.y version. It was previously unsupported to edit the bootimage in the MachineSet on IPI installations.

However, there are a few use cases for which you need to do this update, for example:

- You need to install nodes with new hardware, which is not compatible with very old versions of RHCOS
- You have lost access to the old bootimage and need to use any working version
- You are concerned that there exists a small timing window where an old system can be vulnerable to security attacks

Red Hat is looking to make this a managed workflow in the MCO. In the meantime, this document aims to highlight a working method to do this update by hand. Note that regardless of the final implementation, there are some environments where the MCO cannot directly access the install media, such as a Bare Metal deployment, for which you will have to set up the image again as you did during installation.

Prerequisites

Note: For most clusters installed after OCP 4.6, these steps shouldn't be necessary, but it is best to check regardless.

Note: These steps must all be completed before you scale up new nodes. If you have autoscaling on, you should turn it off until the bootimage update is completed.

Ignition version

Starting with OCP 4.6, the platform uses Ignition spec v3 by default, which is not backwards compatible with spec v2. If you installed the cluster with a version earlier than 4.6, the "stub Ignition secret", which is used to point the new node to the Machine Config Server and verify its contents, will still refer to a spec v2 config. New 4.6+ bootimages will not be able to parse and use this config.

If you are unsure which version your secrets are using, you can follow these steps:

Get the current list of secrets in the machine-api namespace:
```
$ oc get secrets -n openshift-machine-api |grep user-data
```
By default, you should see:
- worker-user-data
- worker-user-data-managed
- master-user-data
- master-user-data-managed
If you have other user-data secrets, they were most likely created manually at some point in the cluster's history. You can check which of these are referenced in the MachineSet objects on the cluster. For example:
```
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      labels:
        …
      name: my-machineset
      namespace: openshift-machine-api

    spec:
      …
          userDataSecret:
            name: my-user-data-secret
```
The userDataSecret field in each MachineSet names the secret that the MachineSet currently uses.
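
To see every MachineSet and the secret it references in one pass, a sketch like the following may help. The function name is hypothetical, and the jsonpath assumes the usual spec.template.spec.providerSpec.value.userDataSecret.name layout seen in MachineSet objects; verify it against your own objects.

```shell
# Hypothetical helper: print "<machineset name><TAB><userDataSecret name>"
# for every MachineSet in the openshift-machine-api namespace.
list_machineset_user_data_secrets() {
  oc get machinesets -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.providerSpec.value.userDataSecret.name}{"\n"}{end}'
}

# Usage:
#   list_machineset_user_data_secrets
```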

Inspect the worker-user-data contents:

```
$ oc -n openshift-machine-api get secret/worker-user-data --template='{{.data.userData | base64decode}}'
{"ignition":{"config":{"merge":[{"source":"https://xxx/config/worker","verification":{}}],"replace":{"verification":{}}},"proxy":{},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,xxx","verification":{}}]}},"timeouts":{},"version":"3.1.0"},"passwd":{},"storage":{},"systemd":{}}
```

Depending on the version your OCP cluster was originally installed with, the generated Ignition version can be one of the following: 2.2.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0

If the version is currently 3.x, you do not need to make any modifications to this spec. However, if it is 2.2.0, you will need to make the following changes:

- Switch ignition.version to 3.1.0
- Switch ignition.config.append to ignition.config.merge
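
The two changes above can be sketched as a text filter. This is a minimal sketch under stated assumptions: the function name is hypothetical, the input is the small stub JSON (not a full Ignition config), "append" appears only as the ignition.config key, and the version string is exactly 2.2.0. It performs only the two listed substitutions and does not validate the result or write it back into a secret.

```shell
# Hypothetical helper: read a spec v2 Ignition stub on stdin and print it with
# the two changes listed above applied (append -> merge, 2.2.0 -> 3.1.0).
ignition_stub_v2_to_v3() {
  sed -e 's/"append"[[:space:]]*:/"merge":/' \
      -e 's/"version"[[:space:]]*:[[:space:]]*"2\.2\.0"/"version":"3.1.0"/'
}

# Usage, writing the converted stub to a local file for review:
#   oc -n openshift-machine-api get secret/worker-user-data \
#     --template='{{.data.userData | base64decode}}' | ignition_stub_v2_to_v3 > stub-v3.ign
```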

You can make these changes in the existing secret or in a new secret. If you create a new secret, go to the MachineSet(s) you plan to scale with, and find a snippet that looks like:
```
      userDataSecret:
        name: worker-user-data
```
Update that to the name of your new secret.
Changes to worker-user-data, or to any other secret referenced by a MachineSet, take effect the next time you provision a machine from that MachineSet.

Note: in some environments, such as Bare Metal UPI, this Ignition stub lives outside the cluster and is directly given to new OS images. The same steps apply to update any Ignition stub or full Ignition.

Ignition TLS CA certificate

The stub secret shown in the previous section references a TLS CA certificate:

```
{"ignition":{"config":{"merge":[{"source":"https://xxx/config/worker","verification":{}}],"replace":{"verification":{}}},"proxy":{},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,xxx","verification":{}}]}},"timeouts":{},"version":"3.1.0"},"passwd":{},"storage":{},"systemd":{}}
```

This certificate is generated by the OCP installer at install time and is not currently managed. On old systems, the originally generated certificates do not contain a Subject Alternative Name (SAN):

Old:

CN: api-int.<base_domain>
no SAN

New:

CN: system:machine-config-server
SAN: api-int.<base_domain>

You can check this directly via the machine-config-server-tls secret in the cluster, e.g.:

```
$ oc -n openshift-machine-config-operator get secret machine-config-server-tls --template '{{index .data "tls.crt" | base64decode}}' | openssl x509 -text | grep -A1 "Subject Alternative Name"
```

If there are any matching lines, you can skip this step. If not, perform the following:

Regenerate the cluster cert and the stub secret. You can do so automatically via the instructions at https://access.redhat.com/articles/regenerating_cluster_certificates#regenerating-ca-certificates-for-the-machine-config-server-5

Afterwards, re-running the check above should show a new certificate with a SAN attached.

Updating your bootimage

Once the Ignition has been updated, and the certificate is correct, you are able to use any bootimage up to the current version of your OCP cluster. First, find a bootimage you would like to use.

You can do this either via:

- The OpenShift mirror for RHCOS images: https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/
- The openshift-install binary of the corresponding version, by running: openshift-install coreos print-stream-json

Then, if the cluster has MachineSet-backed scaling, you will need to update the bootimage reference in your MachineSet.

Using AWS as an example, your MachineSet should have a snippet like:

```
      providerSpec:
        value:
          ami:
            id: ami-xxxxxx
```

That AMI is for a specific region. In the print-stream-json output above, you can find a new AMI for that region and provide it to the cluster. As long as the AMI is accessible to your AWS account (which should be the case for all shipped RHCOS AMIs) it can and will be used the next time you scale a node.

You can update that MachineSet object or create a new MachineSet if you wish. You should make a copy of the old AMI ID just in case you need to roll back. Now you should be able to scale directly with an updated bootimage.

If you are using bare metal or other on-prem environments, you will need to set up the new bootimage the same way you set up the original bootimage during installation. Please follow the steps in the OpenShift documentation for your specific platform.

If you are using a Red Hat hosted cluster (e.g. ARO or ROSA), you should not modify this yourself. Please reach out to Red Hat if you have any concerns regarding your cluster.

Platform-specific example - vSphere:

vSphere boot images use a template VM created by uploading the OVA image to the vCenter. The template VM is then referenced by name in all machinesets during cluster installation. This is done automatically by the installer in an IPI installation, or manually by the user in a UPI installation. This section guides you through updating your boot image after cluster installation.

Check whether your cluster requires a boot image update

As mentioned above, the machinesets reference a VM template. The current boot image for a node is generally reported in the Machine Config Daemon (MCD) logs. The MCD is an MCO DaemonSet that runs on every node in the cluster. vSphere clusters typically have only one worker machineset, so it is advised to pick the most recently created worker node, since it was provisioned from the template the machineset references most recently. If you have created new machinesets, you will have to find the most recently created node attached to each new machineset for this step. This example uses the MCD that runs on the node ci-ln-3948f5b-c1627-z58jm-worker-0-lkn9x.

First, find out the name of the MCD pod running on the node of interest with the following command:

```
$ oc get pods -o wide -n openshift-machine-config-operator
NAME                                        READY   STATUS    RESTARTS      AGE   IP              NODE                                       NOMINATED NODE   READINESS GATES
machine-config-controller-d8cdd7555-hl945   2/2     Running   1 (44m ago)   54m   10.130.0.35     ci-ln-3948f5b-c1627-z58jm-master-2         <none>           <none>
machine-config-daemon-4j8fs                 3/3     Running   0             54m   10.38.221.218   ci-ln-3948f5b-c1627-z58jm-master-0         <none>           <none>
machine-config-daemon-4mfz2                 3/3     Running   0             54m   10.38.221.148   ci-ln-3948f5b-c1627-z58jm-master-2         <none>           <none>
machine-config-daemon-mqpq4                 3/3     Running   0             54m   10.38.221.149   ci-ln-3948f5b-c1627-z58jm-master-1         <none>           <none>
machine-config-daemon-qb2ld                 3/3     Running   0             47m   10.38.221.143   ci-ln-3948f5b-c1627-z58jm-worker-0-lkn9x   <none>           <none>
machine-config-daemon-r8gxw                 3/3     Running   0             47m   10.38.221.253   ci-ln-3948f5b-c1627-z58jm-worker-0-rrssv   <none>           <none>
machine-config-daemon-rl9lk                 3/3     Running   0             47m   10.38.221.225   ci-ln-3948f5b-c1627-z58jm-worker-0-sgv56   <none>           <none>
machine-config-operator-57df866f89-9j5tp    2/2     Running   1 (44m ago)   57m   10.129.0.17     ci-ln-3948f5b-c1627-z58jm-master-1         <none>           <none>
machine-config-server-7dmr6                 1/1     Running   0             54m   10.38.221.218   ci-ln-3948f5b-c1627-z58jm-master-0         <none>           <none>
machine-config-server-7xf4r                 1/1     Running   0             54m   10.38.221.148   ci-ln-3948f5b-c1627-z58jm-master-2         <none>           <none>
machine-config-server-86npv                 1/1     Running   0             54m   10.38.221.149   ci-ln-3948f5b-c1627-z58jm-master-1         <none>           <none>
```
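
On clusters with many nodes, scanning the full pod list by eye can be tedious. As a sketch, the lookup can be narrowed to a single node with a label selector and a field selector. The function name is hypothetical, and the k8s-app=machine-config-daemon label is an assumption about how the MCD pods are labeled; confirm it with oc get pods --show-labels.

```shell
# Hypothetical helper: print the MCD pod (as pod/<name>) running on one node.
# $1 = node name. The label selector is an assumption; verify it on your cluster.
mcd_pod_for_node() {
  oc get pods -n openshift-machine-config-operator \
    -l k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=$1" -o name
}

# Usage:
#   mcd_pod_for_node ci-ln-3948f5b-c1627-z58jm-worker-0-lkn9x
```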

Then, with the MCD name for your target node, run the following command:

```
$ oc logs machine-config-daemon-qb2ld -n openshift-machine-config-operator | grep -A6 'CoreOS aleph version'
Defaulted container "machine-config-daemon" out of: machine-config-daemon, kube-rbac-proxy, crio-kube-rbac-proxy
I0206 15:43:09.260899    3531 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
{
   "build": "415.92.202311241643-0",
   "imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
   "ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
   "ref": ""
}
```

Note the build field in the output above; this is the build ID of the boot image for this node (and, effectively, for its machineset).

Important: Please ensure that you are using an installer that matches your cluster's OCP version. The RHCOS boot images might not change with every release of OpenShift Container Platform.

Next, download the openshift-install binary and run the following command (jq required):

```
$ openshift-install coreos print-stream-json | jq '.architectures.x86_64.artifacts.vmware'
{
  "release": "415.92.202311241643-0",
  "formats": {
    "ova": {
      "disk": {
        "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202311241643-0/x86_64/rhcos-415.92.202311241643-0-vmware.x86_64.ova",
        "sha256": "38d16424a9b6170ae4efbb356a17471aecff717ed50eac764d9574ef817ec43e"
      }
    }
  }
}
```

This command prints the latest available boot image for this OCP release. If the release field matches the build ID you found in the previous step, then your machineset is already on the latest boot image for the cluster’s OCP version.

If it does not match, then there is a boot image update available for this machineset. You can proceed to download the image from the URL in the location field.
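
The comparison and download can be sketched with two small helpers. Both function names are hypothetical. The first assumes the aleph JSON appears in the log text in the format shown earlier; the second assumes curl and sha256sum are available and checks the download against the sha256 value printed next to the location field.

```shell
# Extract the first "build" value from MCD log text read on stdin.
aleph_build_from_logs() {
  sed -n 's/.*"build":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Download a boot image into the current directory and verify its checksum.
# $1 = URL from the location field, $2 = value of the sha256 field.
fetch_and_verify() {
  file=${1##*/}                     # file name = last path component of the URL
  curl -fL -o "$file" "$1" || return 1
  printf '%s  %s\n' "$2" "$file" | sha256sum -c -
}

# Usage:
#   oc logs machine-config-daemon-qb2ld -n openshift-machine-config-operator \
#     -c machine-config-daemon | aleph_build_from_logs
#   fetch_and_verify <location-url> <sha256>
```

A non-zero exit status from fetch_and_verify means the download failed or the checksum did not match, in which case the file should not be uploaded to vCenter.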

Create a new VM template in your vCenter

- Launch the vSphere client after logging into the vCenter for your cluster.
- From the Hosts and Clusters section, right-click your cluster name and select Deploy OVF Template.
- On the Select an OVF tab, specify the name of the RHCOS OVA file that you downloaded.
- On the Select a name and folder tab, set a Virtual machine name for your template, such as {$build-id}-{cluster-name}-rhcos. This will help you keep track of the boot image used to create this VM template in the future. Click the name of your vSphere cluster and select the folder.
- On the Select a compute resource tab, click the name of your vSphere cluster.
- On the Select storage tab, configure the storage options for your VM. Select Thin Provision or Thick Provision, based on your storage preferences. Select the datastore that you specified in your original installation. If you want to encrypt your virtual machines, select Encrypt this virtual machine. See “Requirements for encrypting virtual machines” in the official installation documentation for more information.
- On the Select network tab, specify the network that you configured for the cluster, if available.
- On the Customize template tab, leave the settings as is (unless you made changes during the installation).
- On the Ready to complete tab, verify your settings one last time and click Finish.
- The vSphere client will now begin to upload the boot image and create the OVF template. This can take a few minutes depending on network speeds. You can keep track of this in the task tab; look for a task under “Deploy OVF template”.
- Once the upload is complete, right-click the new virtual machine and click Template > Convert to template. Click “Yes” on the pop-up.

You now have a VM template based on the new boot image, which can be used to update the machineset objects.

Update the machineset

In every machineset whose boot image you’d like to update, edit the machineset object via the oc client. Replace the spec.template.spec.providerSpec.value.template field with the name of the new VM template you created in the previous step.
Here is a sample machineset snippet:

```
$ oc edit machineset/ci-ln-6vjqx8t-c1627-bwxkr-worker-0 -n openshift-machine-api
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  creationTimestamp: "2024-01-02T18:50:51Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: ci-ln-6vjqx8t-c1627-bwxkr
  name: ci-ln-6vjqx8t-c1627-bwxkr-worker-0
  namespace: openshift-machine-api
  resourceVersion: "29561"
  uid: acc3c454-023c-491b-825e-cbc48be70e71
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: ci-ln-6vjqx8t-c1627-bwxkr
      machine.openshift.io/cluster-api-machineset: ci-ln-6vjqx8t-c1627-bwxkr-worker-0
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: ci-ln-6vjqx8t-c1627-bwxkr
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: ci-ln-6vjqx8t-c1627-bwxkr-worker-0
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 120
          kind: VSphereMachineProviderSpec
          memoryMiB: 16384
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: ci-vlan-1274
          numCPUs: 4
          numCoresPerSocket: 4
          snapshot: ""
          template: ci-ln-6vjqx8t-c1627-bwxkr-rhcos-generated-region-generated-zone
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: IBMCloud
            datastore: /IBMCloud/datastore/vsanDatastore
            folder: /IBMCloud/vm/ci-ln-6vjqx8t-c1627-bwxkr
            resourcePool: /IBMCloud/host/vcs-ci-workload/Resources/ipi-ci-clusters
            server: vcs8e-vc.ocp2.dev.example.com
status:
  availableReplicas: 3
  fullyLabeledReplicas: 3
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
```

Any future nodes scaled from this machineset should boot with the new boot image you uploaded. You can test this by checking the aleph version of the node in the daemon logs (see the section titled “Check whether your cluster requires a boot image update” for more information). Repeat this section for any other machinesets that need updated boot images.
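
As a non-interactive alternative to oc edit, the template field can be set with a merge patch. This is a sketch: the function name is hypothetical, and the patch assumes the spec.template.spec.providerSpec.value.template path shown in the sample machineset above.

```shell
# Hypothetical helper: point a vSphere machineset at a new VM template.
# $1 = machineset name, $2 = name of the new VM template in vCenter.
set_machineset_vsphere_template() {
  oc patch machineset "$1" -n openshift-machine-api --type merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"providerSpec\":{\"value\":{\"template\":\"$2\"}}}}}}"
}

# Usage, with a placeholder template name:
#   set_machineset_vsphere_template ci-ln-6vjqx8t-c1627-bwxkr-worker-0 my-new-rhcos-template
```

Recording the previous template name before patching makes it possible to roll back if new nodes fail to provision.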
