Not able to drain a node when running LokiStack with size 1x.demo in RHOCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP)
  - 4
- Red Hat OpenShift Logging (RHOL)
  - 5
  - 6
- LokiStack
- Loki
Issue
- When running the LokiStack with size 1x.demo, a node where Loki pods are running cannot be drained. The drain fails with a pod's disruption budget error:

  E0208 09:40:06.497414 1 drain_controller.go:142] error when evicting pods/"logging-loki-distributor-84f4ccd8d5-5xz6m" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  E0208 09:40:07.709796 1 drain_controller.go:142] error when evicting pods/"logging-loki-index-gateway-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  E0208 09:40:09.291697 1 drain_controller.go:142] error when evicting pods/"logging-loki-ingester-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
Resolution
This issue was reported to Red Hat engineering. It was tracked in Bug LOG-4824 and closed as "Not a Bug".
The Loki size 1x.demo is intended only for labs and demos. Consider using a different Loki deployment size if the LokiStack needs to keep running for longer than expected in a lab or demo.
If the LokiStack must stay on 1x.demo for longer than expected and this could interfere with a node rollout, apply one of the workarounds detailed below.
Workaround
Review the NOTE in the Diagnostic Steps section
Workaround 1. Increase the number of replicas of each Loki component
This is the recommended option, but it requires more resources.
If the Loki ruler component is not in use, do not define it in the LokiStack Custom Resource (CR).
The Loki Compactor must run with only 1 replica; do not change it.
$ oc edit lokistack logging-loki -n openshift-logging
...
spec:
  template:
    distributor:
      replicas: 2
    gateway:
      replicas: 2
    indexGateway:
      replicas: 2
    ingester:
      replicas: 2
    querier:
      replicas: 2
    queryFrontend:
      replicas: 2
    ruler:          # if not using it, omit this and the next line
      replicas: 2
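The replica settings above can also be applied without an interactive editor. The following is a minimal sketch, not a confirmed procedure from this article: the helper `build_replica_patch` is hypothetical and only builds a JSON merge patch for the component names it is given, which could then be passed to `oc patch --type merge`. It assumes the LokiStack is named logging-loki in the openshift-logging namespace. The compactor is deliberately left out, and the ruler should be added only if it is in use.

```shell
# Hypothetical helper (not part of oc): print a JSON merge patch that sets
# the same replica count for every LokiStack component name passed in.
# Usage: build_replica_patch <replicas> <component>...
build_replica_patch() (
  replicas="$1"
  shift
  body=""
  for component in "$@"; do
    # Separate entries with a comma after the first one.
    if [ -n "$body" ]; then
      body="$body,"
    fi
    body="$body\"$component\":{\"replicas\":$replicas}"
  done
  printf '{"spec":{"template":{%s}}}\n' "$body"
)

# Example (not executed here; assumes the names used in this article):
# patch="$(build_replica_patch 2 distributor gateway indexGateway ingester querier queryFrontend)"
# oc patch lokistack logging-loki -n openshift-logging --type merge -p "$patch"
```

Printing the patch first makes it possible to review exactly what would change before sending it to the cluster.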
Verify that every Loki component has 2 pods, except the Loki Compactor, which should have only 1:
$ oc get pods -l app.kubernetes.io/name=lokistack -n openshift-logging
NAME                                           READY   STATUS    RESTARTS   AGE
logging-loki-compactor-0                       1/1     Running   0          9m43s
logging-loki-distributor-7bfd5998f7-bcx5j      1/1     Running   0          9m53s
logging-loki-distributor-7bfd5998f7-kwhcl      1/1     Running   0          10m
logging-loki-gateway-86759869bf-vpl72          2/2     Running   0          20h
logging-loki-gateway-86759869bf-xbh9d          2/2     Running   0          20h
logging-loki-index-gateway-0                   1/1     Running   0          8m46s
logging-loki-index-gateway-1                   1/1     Running   0          9m42s
logging-loki-ingester-0                        1/1     Running   0          5m36s
logging-loki-ingester-1                        1/1     Running   0          9m43s
logging-loki-querier-58dd589fbc-mm5ch          1/1     Running   0          10m
logging-loki-querier-58dd589fbc-vfvnf          1/1     Running   0          9m53s
logging-loki-query-frontend-66d74fbbfd-l8dg4   1/1     Running   0          10m
logging-loki-query-frontend-66d74fbbfd-svxk2   1/1     Running   0          9m53s
Verify that the PodDisruptionBudget shows ALLOWED DISRUPTIONS: 1 for all the Loki components:
$ oc get pdb -n openshift-logging
NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
logging-loki-distributor      1               N/A               1                     22d
logging-loki-gateway          1               N/A               1                     22d
logging-loki-index-gateway    1               N/A               1                     22d
logging-loki-ingester         1               N/A               1                     22d
logging-loki-querier          1               N/A               1                     22d
logging-loki-query-frontend   1               N/A               1                     22d
Now, the nodes can be drained without receiving the pod's disruption budget error.
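The PodDisruptionBudget check can be scripted so that any remaining blocker stands out. This is a minimal sketch, assuming the default column layout shown above, where ALLOWED DISRUPTIONS is the fourth whitespace-separated field once the header is suppressed with `--no-headers`. The helper name `pdbs_blocking_drain` is hypothetical.

```shell
# Hypothetical helper: read `oc get pdb --no-headers` output on stdin and
# print the name of every PodDisruptionBudget whose ALLOWED DISRUPTIONS
# value (fourth field in the default layout) is 0.
pdbs_blocking_drain() {
  # NF >= 4 skips blank or truncated lines, which would otherwise compare
  # equal to 0 because a missing field is treated as empty.
  awk 'NF >= 4 && $4 == 0 { print $1 }'
}

# Example (not executed here):
# oc get pdb -n openshift-logging --no-headers | pdbs_blocking_drain
```

An empty result suggests that no budget in the namespace would block an eviction at that moment.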
Workaround 2. Delete the Loki pods causing the pod disruption budget error
This option requires manual intervention every time a node needs to be drained. It can also lead to a long replay of the WAL, or even to the Loki Ingester not starting because the WAL is corrupted or takes a long time to replay. If that is the case, review the article "Loki ingesters 0/1 in RHOCP 4".
Cordon and drain the node. The Loki pods returning the pod disruption budget error will be visible in the output:
$ oc adm cordon <node>
$ oc adm drain <node> --ignore-daemonsets --delete-emptydir-data --force
...
error when evicting pods/"logging-loki-index-gateway-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
error when evicting pods/"logging-loki-ingester-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
From a different terminal connected to the same OpenShift Cluster, delete the Loki pods returning the pod disruption budget error when draining the node:
$ oc delete pod logging-loki-ingester-0 logging-loki-index-gateway-0 -n openshift-logging
Repeat the previous steps whenever an OpenShift node needs to be drained, deleting the Loki pods that return the pod disruption budget error.
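Because the drain keeps retrying and repeats the same error lines, it can help to derive the delete commands from the drain output instead of copying pod names by hand. The following is a minimal sketch, assuming the error lines have the format shown above. The helper `blocked_pod_delete_commands` is hypothetical, and it only prints the commands so they can be reviewed before anything is deleted.

```shell
# Hypothetical helper: read `oc adm drain` output on stdin and print one
# `oc delete pod` command per pod that failed eviction because of a
# disruption budget. Repeated retries for the same pod are collapsed.
blocked_pod_delete_commands() {
  sed -n 's/.*error when evicting pods\/"\([^"]*\)" -n "\([^"]*\)".*disruption budget.*/oc delete pod \1 -n \2/p' |
    sort -u
}

# Example (not executed here; drain.log is an assumed file holding the
# output captured from the drain command):
# blocked_pod_delete_commands < drain.log
```

Lines that do not match the eviction error format, such as the `...` placeholders, are ignored.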
Root Cause
When using the Loki size 1x.demo, the Loki components distributor, index-gateway, ingester, querier and query-frontend have 0 allowed disruptions in their PodDisruptionBudget, so their pods cannot be evicted during a node drain.
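The arithmetic behind the root cause can be sketched as follows. For a budget defined with a minimum number of available pods, the allowed disruptions are the healthy pods minus that minimum, never below zero. The sketch assumes that each affected component runs a single pod in 1x.demo, which is consistent with the MIN AVAILABLE of 1 and ALLOWED DISRUPTIONS of 0 shown in the Diagnostic Steps. The function name `allowed_disruptions` is hypothetical.

```shell
# Hypothetical helper: compute the allowed disruptions for a budget that
# requires a minimum number of available pods.
# Usage: allowed_disruptions <healthy_pods> <min_available>
allowed_disruptions() (
  healthy="$1"
  min_available="$2"
  allowed=$((healthy - min_available))
  # The value never drops below zero.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
)

# One pod with MIN AVAILABLE 1 (assumed 1x.demo layout): no eviction allowed.
echo "1 pod,  MIN AVAILABLE 1: $(allowed_disruptions 1 1)"
# Two pods with MIN AVAILABLE 1 (after Workaround 1): one eviction allowed.
echo "2 pods, MIN AVAILABLE 1: $(allowed_disruptions 2 1)"
```

This is why raising each component to 2 replicas changes ALLOWED DISRUPTIONS from 0 to 1.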
Diagnostic Steps
NOTE: This example assumes that the LokiStack was deployed in the openshift-logging namespace and that the Loki instance is named logging-loki, so the commands refer to that namespace and instance. If the LokiStack is installed elsewhere, replace -n openshift-logging with the namespace where it is installed and logging-loki with the name of the instance used.
- Verify that the size of the LokiStack is 1x.demo:

  $ oc get lokistack logging-loki -o jsonpath='{.spec.size}'
  1x.demo

- Verify that the ALLOWED DISRUPTIONS for the Loki components distributor, index-gateway, ingester, querier and query-frontend is 0:

  $ oc get pdb
  NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
  logging-loki-distributor      1               N/A               0                     22d
  logging-loki-gateway          1               N/A               1                     22d
  logging-loki-index-gateway    1               N/A               0                     22d
  logging-loki-ingester         1               N/A               0                     22d
  logging-loki-querier          1               N/A               0                     22d
  logging-loki-query-frontend   1               N/A               0                     22d

- Verify that when draining the node, some Loki pods cannot be evicted and the pod's disruption budget error is returned:

  $ oc adm drain <node> --ignore-daemonsets --delete-emptydir-data --force
  ...
  error when evicting pods/"logging-loki-index-gateway-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  error when evicting pods/"logging-loki-ingester-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
  ...
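When the LokiStack lives in a different namespace or has a different name, the substitution described in the NOTE can be sketched with two parameters. This is only an illustration: the helper `loki_diag_commands` is hypothetical, its defaults are the names assumed in this article, and it prints the commands for review rather than running them.

```shell
# Hypothetical helper: print the diagnostic commands for a given namespace
# and LokiStack name. Defaults match the names assumed in this article.
# Usage: loki_diag_commands [namespace] [lokistack-name]
loki_diag_commands() (
  namespace="${1:-openshift-logging}"
  name="${2:-logging-loki}"
  printf "oc get lokistack %s -n %s -o jsonpath='{.spec.size}'\n" "$name" "$namespace"
  printf 'oc get pdb -n %s\n' "$namespace"
)

# Example with assumed, non-default names:
# loki_diag_commands my-logging my-loki
```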
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.