Expanding the db-noobaa-db-pg-0 PVC - OpenShift Data Foundation (ODF) v4.19+
Environment
Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (ODF) v4.19+
Red Hat Quay (RHQ) v3.x
Issue
The noobaa-db-pg-cluster-<1|2> PVC has become full, preventing the PostgreSQL server from starting.
Related Articles:
Expanding the db-noobaa-db-pg-0 PVC - OpenShift Data Foundation (ODF) v4.18 and Below
Change the Multi-Cloud Object Gateway Database's Collation Locale to C
How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC
Resolution
Before starting, confirm via ceph df that the ocs-storagecluster-cephblockpool has adequate free space, and confirm via ceph status that Ceph is reporting HEALTH_OK with all PGs active+clean.
NOTE: If Ceph is NOT reporting HEALTH_OK, or if any PGs are not active+clean, please open a support case for further investigation.
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config
health: HEALTH_OK <---------------------- HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 41m)
mgr: a(active, since 41m)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 41m), 3 in (since 41m)
data:
volumes: 1/1 healthy
pools: 4 pools, 97 pgs
objects: 92 objects, 138 MiB
usage: 277 MiB used, 300 GiB / 300 GiB avail
pgs: 97 active+clean <---------------------- active+clean
io:
client: 1.2 KiB/s rd, 9.0 KiB/s wr, 2 op/s rd, 1 op/s wr
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph df -c /var/lib/rook/openshift-storage/openshift-storage.config
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 6 TiB 6.0 TiB 3.6 GiB 3.6 GiB 0.06
TOTAL 6 TiB 6.0 TiB 3.6 GiB 3.6 GiB 0.06
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
ocs-storagecluster-cephblockpool 1 32 340 MiB 177 1021 MiB 0.02 1.7 TiB <----- cephblockpool
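The two health conditions above can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of the documented procedure: `ceph_is_healthy` is a hypothetical helper name, and it assumes the `ceph status` layout shown in the sample output (a `health:` line, a `pools:` line ending in the total PG count, and a `pgs:` line). It fails conservatively whenever the `pgs:` line lists anything other than the full PG count as active+clean.

```shell
# ceph_is_healthy (hypothetical helper): reads `ceph status` text on stdin.
# Succeeds only if health is HEALTH_OK and the active+clean PG count
# equals the total PG count from the "pools:" line.
ceph_is_healthy() {
    awk '
        $1 == "health:" { health = $2 }
        $1 == "pools:"  { total = $4 }    # e.g. "pools: 4 pools, 97 pgs"
        $1 == "pgs:" && $3 == "active+clean" { clean = $2 }
        END { exit !(health == "HEALTH_OK" && total > 0 && clean == total) }
    '
}

# Usage (the -it flags are dropped because the output is piped):
# oc exec $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) \
#   -n openshift-storage -- ceph status \
#   -c /var/lib/rook/openshift-storage/openshift-storage.config \
#   | ceph_is_healthy && echo "Ceph is ready"
```

A non-zero exit status means at least one condition was not met, and the NOTE above applies.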
Procedures:
- In the event that this is a standalone NooBaa deployment where the db-noobaa-db-pg-cluster-X PVC is NOT backed by the storageclass ocs-storagecluster-ceph-rbd, validate that volumeExpansion is supported. For example:
$ oc get sc <storageclass-name> -o yaml | grep -i expansion
allowVolumeExpansion: true <-----
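Note that `grep -i expansion` also prints a line when the value is `false`, so the output still has to be read. A stricter check could look like the sketch below; `sc_allows_expansion` is a hypothetical helper name, and it assumes the StorageClass is printed as YAML with `allowVolumeExpansion` as a top-level field.

```shell
# sc_allows_expansion (hypothetical helper): reads StorageClass YAML on stdin.
# Succeeds only if the top-level field allowVolumeExpansion is exactly true.
sc_allows_expansion() {
    grep -Eq '^allowVolumeExpansion:[[:space:]]*true[[:space:]]*$'
}

# Usage:
# oc get sc <storageclass-name> -o yaml | sc_allows_expansion \
#   || echo "volume expansion is not enabled on this storage class"
```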
- Make note of the noobaa-db-pg-cluster-X and noobaa-db-pg-cluster-Y PVC capacities (example shows cluster-1 and cluster-2):
$ oc -n openshift-storage get clusters.postgresql.cnpg.noobaa.io noobaa-db-pg-cluster -o jsonpath='{.spec.storage.size}'
50Gi
$ oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY
noobaa-db-pg-cluster-1 Bound pvc-6191c66e-18fa-4792-8ccd-4838ded0fd03 50Gi
noobaa-db-pg-cluster-2 Bound pvc-41af5072-163a-4dca-8827-ba574103209a 50Gi
- Scale down the NooBaa services:
$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
$ oc -n openshift-storage scale sts noobaa-core --replicas=0
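Pod termination can take a while, so it may help to poll rather than re-run `oc get pods` by hand. The sketch below is one possible approach, not part of the documented procedure. Both function names are hypothetical, and the filter assumes the standard Kubernetes convention that pod names begin with the name of their owning Deployment or StatefulSet. The database pods (noobaa-db-pg-cluster-*) are deliberately not matched, since they are not scaled down.

```shell
# noobaa_service_pods (hypothetical helper): filters `oc get pods -o name`
# output down to the operator, endpoint, and core pods.
noobaa_service_pods() {
    grep -E '^pod/noobaa-(operator|endpoint|core)-' || true
}

# wait_for_noobaa_shutdown (hypothetical helper): blocks until none of the
# scaled-down pods remain. It only waits; it never force deletes anything.
wait_for_noobaa_shutdown() {
    while [ -n "$(oc -n openshift-storage get pods -o name | noobaa_service_pods)" ]; do
        sleep 10
    done
}
```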
Example: Expanding volume from 50Gi to 100Gi.
WARNING: ONLY VOLUME EXPANSION IS ALLOWED. REVERTING BACK TO A SMALLER VOLUME SIZE IS NOT SUPPORTED.
- After the NooBaa service pods have terminated (this takes time; do not force delete them), patch the StorageCluster resource using the command below, setting the storage size to the desired value. The example below uses 100Gi.
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"noobaa-db-vol":{"requests":{"storage":"100Gi"}}}}}'
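Because shrinking is not supported, a small guard can be run before the patch to compare the requested size against the current one. This is a minimal sketch under stated assumptions: `to_gib` and `is_expansion` are hypothetical helper names, and only whole-number quantities with a Gi or Ti suffix are handled.

```shell
# to_gib (hypothetical helper): converts a quantity such as 50Gi or 1Ti to
# whole GiB. Only integer values with a Gi or Ti suffix are supported.
to_gib() {
    case "$1" in
        *Ti) echo $(( ${1%Ti} * 1024 )) ;;
        *Gi) echo "${1%Gi}" ;;
        *)   return 1 ;;
    esac
}

# is_expansion CURRENT NEW (hypothetical helper): succeeds only if NEW is
# strictly larger than CURRENT; returns 2 for an unsupported unit.
is_expansion() (
    cur=$(to_gib "$1") || exit 2
    new=$(to_gib "$2") || exit 2
    [ "$new" -gt "$cur" ]
)

# Usage:
# is_expansion 50Gi 100Gi && echo "safe to patch"
```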
Validate:
- The resize will not take place until the NooBaa services are scaled back up. Validate the patch was successful:
$ oc get storagecluster -n openshift-storage -o yaml | grep -A2 noobaa-db-vol
noobaa-db-vol:
requests:
storage: 100Gi
- Scale up the NooBaa services and allow ~5 minutes for the resize process:
$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1
$ oc -n openshift-storage scale sts noobaa-core --replicas=1
- Validate the PVC resize was successful, and the NooBaa CR now reflects the correct size:
$ oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY
noobaa-db-pg-cluster-1 Bound pvc-6191c66e-18fa-4792-8ccd-4838ded0fd03 100Gi
noobaa-db-pg-cluster-2 Bound pvc-41af5072-163a-4dca-8827-ba574103209a 100Gi
$ oc get noobaa -n openshift-storage -o yaml | grep dbMinVolumeSize
dbMinVolumeSize: 100Gi
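The PVC check above can be reduced to a pass/fail result, which is convenient when polling during the resize window. The following is a sketch only: `pvcs_at_size` is a hypothetical helper name, and it expects one "name capacity" pair per line, which the jsonpath expression in the usage comment produces.

```shell
# pvcs_at_size TARGET (hypothetical helper): reads "name capacity" lines on
# stdin. Succeeds only if at least one noobaa-db-pg-cluster-* PVC is listed
# and every one of them reports exactly TARGET.
pvcs_at_size() {
    awk -v target="$1" '
        $1 ~ /^noobaa-db-pg-cluster-/ { seen++; if ($2 != target) bad++ }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Usage:
# oc get pvc -n openshift-storage \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.capacity.storage}{"\n"}{end}' \
#   | pvcs_at_size 100Gi && echo "resize complete"
```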
- Once all pods have been in a Running state for ~3 minutes, validate that NooBaa and the respective backingstores/namespacestores are in a Ready phase:
$ oc get noobaa -n openshift-storage
NAME S3-ENDPOINTS STS-ENDPOINTS IMAGE PHASE AGE
noobaa ["https://<omitted>"] ["https://<omitted>"] registry.redhat.io/<omitted> Ready 46h
$ oc get backingstore -A
NAME TYPE PHASE AGE
noobaa-default-backing-store <omitted> Ready 35h <----
$ oc get namespacestore -A
NAME TYPE PHASE AGE
custom-namespace-store <omitted> Ready 35h <----
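When there are many backingstores or namespacestores, the phase check can be scripted. The sketch below assumes "name phase" pairs on stdin, one per line, and `all_ready` is a hypothetical helper name. Empty input is treated as a failure so that a failed query is not mistaken for success; that condition would need relaxing for a resource type that legitimately has no instances.

```shell
# all_ready (hypothetical helper): reads "name phase" lines on stdin.
# Prints every entry whose last field is not Ready, and succeeds only if
# at least one entry was seen and all of them are Ready.
all_ready() {
    awk '
        NF == 0 { next }
        { seen++; if ($NF != "Ready") { print "not ready: " $0; bad++ } }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Usage:
# oc get backingstore -A \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
#   | all_ready
```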
Root Cause
While troubleshooting the noobaa-db resource, the noobaa-db-pg-cluster-X PVCs may become full, preventing the PostgreSQL server from starting. Expanding the noobaa-db-pg-cluster-1 and noobaa-db-pg-cluster-2 PVCs allows the PostgreSQL server to start again so that troubleshooting can be completed.
Diagnostic Steps
Review the pod logs for noobaa-db-pg-cluster-1-<pod-name> and noobaa-db-pg-cluster-2-<pod-name>:
$ oc logs -n openshift-storage noobaa-db-pg-cluster-1-<pod-name>
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL: could not write lock file "postmaster.pid": No space left on device
stopped waiting
pg_ctl: could not start server
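To confirm this symptom without reading the whole log, the output can be filtered for the error shown above. This is a minimal sketch; `pg_out_of_space` is a hypothetical helper name, and the match string is taken from the sample log line.

```shell
# pg_out_of_space (hypothetical helper): reads pod log text on stdin.
# Prints any FATAL lines that report a full device and succeeds if at
# least one such line was found.
pg_out_of_space() {
    grep -E 'FATAL:.*No space left on device'
}

# Usage:
# oc logs -n openshift-storage noobaa-db-pg-cluster-1-<pod-name> | pg_out_of_space
```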