Expanding the db-noobaa-db-pg-0 PVC - OpenShift Data Foundation (ODF) v4.19+

Solution Verified - Updated

Environment

Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (ODF) v4.19+
Red Hat Quay (RHQ) v3.x

Issue

The noobaa-db-pg-cluster-<1|2> PVC has become full, preventing the PostgreSQL server from starting.

Related Articles:
Expanding the db-noobaa-db-pg-0 PVC - OpenShift Data Foundation (ODF) v4.18 and Below
Change the Multi-Cloud Object Gateway Database's Collation Locale to C
How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC

Resolution

Before starting, confirm via ceph df that the ocs-storagecluster-cephblockpool has adequate free space, and confirm via ceph status that Ceph is reporting HEALTH_OK with all PGs active+clean.

NOTE: If Ceph is NOT reporting HEALTH_OK, or if any PGs are NOT reporting active+clean, please open a support case for further investigation.

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config

    health: HEALTH_OK            <---------------------- HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 41m)
    mgr: a(active, since 41m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 41m), 3 in (since 41m)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 92 objects, 138 MiB
    usage:   277 MiB used, 300 GiB / 300 GiB avail
    pgs:     97 active+clean      <---------------------- active+clean
 
  io:
    client:   1.2 KiB/s rd, 9.0 KiB/s wr, 2 op/s rd, 1 op/s wr


$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph df -c /var/lib/rook/openshift-storage/openshift-storage.config

--- RAW STORAGE ---
CLASS   SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    6 TiB  6.0 TiB  3.6 GiB   3.6 GiB       0.06
TOTAL  6 TiB  6.0 TiB  3.6 GiB   3.6 GiB       0.06
--- POOLS ---
POOL                                        ID  PGS   STORED  OBJECTS      USED  %USED  MAX AVAIL
ocs-storagecluster-cephblockpool             1   32  340 MiB      177  1021 MiB   0.02    1.7 TiB <----- cephblockpool
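
The pre-check above can be partly automated. The following is a minimal sketch, not part of the original procedure: the function name ceph_ready is hypothetical, and it assumes the ceph status text layout shown in the example output (a "health:" line, a "pools:" line ending in the total PG count, and an "N active+clean" entry). It succeeds only when health is HEALTH_OK and every PG is active+clean.

```shell
# Hypothetical helper (an assumption, not an ODF tool): reads `ceph status`
# text on stdin and exits 0 only if health is HEALTH_OK and the number of
# active+clean PGs equals the total PG count from the "pools:" line.
ceph_ready() {
  awk '
    $1 == "health:" { health = $2 }
    # "pools:   4 pools, 97 pgs" -> total = 97
    $1 == "pools:" { for (i = 2; i <= NF; i++) if ($i == "pgs") total = $(i-1) }
    # "pgs:     97 active+clean" -> clean = 97 (exact state match only)
    {
      for (i = 2; i <= NF; i++)
        if ($i == "active+clean" && $(i-1) ~ /^[0-9]+$/) clean = $(i-1)
    }
    END { exit !(health == "HEALTH_OK" && total != "" && total == clean) }
  '
}

# Usage sketch: pipe the output of the ceph status command shown above, e.g.
#   <ceph status command from above> | ceph_ready && echo "OK to proceed"
```

A non-zero exit status means the cluster does not meet the stated precondition; in that case the NOTE above applies.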

Procedures:

  1. If this is a standalone NooBaa deployment where the noobaa-db-pg-cluster-X PVC is NOT backed by the ocs-storagecluster-ceph-rbd storageclass, validate that the storageclass supports volume expansion (allowVolumeExpansion: true). For example:
$ oc get sc <storageclass-name> -o yaml | grep -i expansion
allowVolumeExpansion: true <-----
  1. Make note of the noobaa-db-pg-cluster-X and noobaa-db-pg-cluster-Y PVC capacities (example shows cluster-1 and cluster-2):
$ oc -n openshift-storage get clusters.postgresql.cnpg.noobaa.io noobaa-db-pg-cluster -o jsonpath='{.spec.storage.size}'
50Gi

$ oc get pvc  -n openshift-storage
NAME                     STATUS   VOLUME                                     CAPACITY
noobaa-db-pg-cluster-1   Bound    pvc-6191c66e-18fa-4792-8ccd-4838ded0fd03   50Gi
noobaa-db-pg-cluster-2   Bound    pvc-41af5072-163a-4dca-8827-ba574103209a   50Gi
  1. Scale down the NooBaa services:
$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
$ oc -n openshift-storage scale sts noobaa-core --replicas=0

Example: Expanding volume from 50Gi to 100Gi.

WARNING: ONLY VOLUME EXPANSION IS ALLOWED. REVERTING BACK TO A SMALLER VOLUME SIZE IS NOT SUPPORTED.

  1. After the NooBaa service pods have terminated (this takes time; do not force delete them), patch the StorageCluster resource with the command below, setting the storage size to the desired value. The example below uses 100Gi.
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"noobaa-db-vol":{"requests":{"storage":"100Gi"}}}}}'
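
Because only expansion is supported, it may help to confirm that the requested size is strictly larger than the current one before patching. The sketch below is illustrative only: to_gib and is_expansion are hypothetical helper names, and the parser assumes whole-number sizes with a Gi or Ti suffix (other Kubernetes quantity formats are rejected rather than guessed at).

```shell
# Hypothetical helper: convert a quantity such as "50Gi" or "1Ti" to whole GiB.
# Only whole numbers with a Gi or Ti suffix are accepted (an assumption made
# to keep the sketch small); anything else returns non-zero.
to_gib() {
  case "$1" in
    *Gi) n=${1%Gi}; mult=1 ;;
    *Ti) n=${1%Ti}; mult=1024 ;;
    *) return 1 ;;
  esac
  case "$n" in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo $(( n * mult ))
}

# Succeeds (exit 0) only when the new size is strictly larger than the
# current size. Returns 2 if either size cannot be parsed.
is_expansion() {
  cur=$(to_gib "$1") || return 2
  new=$(to_gib "$2") || return 2
  [ "$new" -gt "$cur" ]
}

# Usage sketch, with the current size taken from the earlier step:
#   is_expansion 50Gi 100Gi && echo "safe to patch"
```

With the values from this example, is_expansion 50Gi 100Gi succeeds, while is_expansion 100Gi 50Gi fails, which would indicate an unsupported shrink request.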

Validate:

  1. The resize will not take place until the NooBaa services are scaled back up. Validate the patch was successful:
$ oc get storagecluster -n openshift-storage -o yaml | grep -A2 noobaa-db-vol
      noobaa-db-vol:
        requests:
          storage: 100Gi
  1. Scale up the NooBaa services and allow ~5 minutes for the resize process:
$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1
$ oc -n openshift-storage scale sts noobaa-core --replicas=1
  1. Validate the PVC resize was successful, and the NooBaa CR now reflects the correct size:
$ oc get pvc -n openshift-storage
NAME                     STATUS   VOLUME                                     CAPACITY
noobaa-db-pg-cluster-1   Bound    pvc-6191c66e-18fa-4792-8ccd-4838ded0fd03   100Gi
noobaa-db-pg-cluster-2   Bound    pvc-41af5072-163a-4dca-8827-ba574103209a   100Gi

$ oc get noobaa -n openshift-storage -o yaml | grep dbMinVolumeSize
  dbMinVolumeSize: 100Gi
  1. Once all pods have been in a Running state for ~3 minutes, validate that NooBaa and its backingstores/namespacestores are in the Ready phase:
$ oc get noobaa -n openshift-storage
NAME     S3-ENDPOINTS           STS-ENDPOINTS             IMAGE                          PHASE   AGE
noobaa   ["https://<omitted>"]  ["https://<omitted>"]     registry.redhat.io/<omitted>   Ready   46h

$ oc get backingstore -A
NAME                             TYPE       PHASE             AGE
noobaa-default-backing-store     <omitted>  Ready <----       35h

$ oc get namespacestore -A
NAME                             TYPE       PHASE             AGE
custom-namespace-store           <omitted>  Ready <----       35h
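
The PVC capacity validation above can also be scripted. This is a minimal sketch under stated assumptions: pvcs_resized is a hypothetical function name, and it expects two-column "NAME CAPACITY" input, which can be produced with the --no-headers and -o custom-columns options of oc get.

```shell
# Hypothetical helper: reads "NAME CAPACITY" lines on stdin and exits 0 only
# if at least one noobaa-db-pg-cluster-* PVC is present and every such PVC
# reports exactly the expected capacity passed as the first argument.
pvcs_resized() {
  want=$1
  awk -v want="$want" '
    $1 ~ /^noobaa-db-pg-cluster-/ { seen++; if ($2 != want) bad++ }
    END { exit !(seen > 0 && bad == 0) }
  '
}

# Usage sketch (100Gi matches the example in this article):
#   oc get pvc -n openshift-storage --no-headers \
#     -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.storage \
#     | pvcs_resized 100Gi && echo "resize complete"
```

If the check fails shortly after scaling up, the resize may simply still be in progress; the procedure above allows roughly 5 minutes for it.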

Root Cause

When troubleshooting the noobaa-db resource, the noobaa-db-pg-cluster-X PVC may become full, preventing the PostgreSQL server from starting. Expanding the noobaa-db-pg-cluster-1 and noobaa-db-pg-cluster-2 PVCs allows the PostgreSQL server to start again so that troubleshooting can be completed.

Diagnostic Steps

Review the pod logs for noobaa-db-pg-cluster-1-<pod-name> and noobaa-db-pg-cluster-2-<pod-name>:

$ oc logs -n openshift-storage noobaa-db-pg-cluster-1-<pod-name>
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL:  could not write lock file "postmaster.pid": No space left on device
 stopped waiting
pg_ctl: could not start server
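
To check several pods quickly, the log review can be reduced to a search for the error shown above. This is a small sketch only; disk_full_in_log is a hypothetical function name, and it matches just the literal "No space left on device" message from the example log.

```shell
# Hypothetical helper: reads pod log text on stdin and exits 0 if the
# "No space left on device" error from the example above appears in it.
disk_full_in_log() {
  grep -q 'No space left on device'
}

# Usage sketch (pod name placeholder left as in the article):
#   oc logs -n openshift-storage noobaa-db-pg-cluster-1-<pod-name> \
#     | disk_full_in_log && echo "PVC is full"
```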
