Not able to drain a node when running LokiStack with size 1x.demo in RHOCP 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat OpenShift Logging (RHOL)
    • 5
    • 6
  • LokiStack
  • Loki

Issue

  • When running LokiStack with size 1x.demo, the node hosting the Loki pods cannot be drained, and the drain fails with a pod disruption budget error:

    E0208 09:40:06.497414       1 drain_controller.go:142] error when evicting pods/"logging-loki-distributor-84f4ccd8d5-5xz6m" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    E0208 09:40:07.709796       1 drain_controller.go:142] error when evicting pods/"logging-loki-index-gateway-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    E0208 09:40:09.291697       1 drain_controller.go:142] error when evicting pods/"logging-loki-ingester-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    

Resolution

This issue was reported to Red Hat engineering and tracked in LOG-4824, which was closed as "Not a Bug".

The Loki size 1x.demo is intended only for labs and demos. Consider a different Loki deployment size if the LokiStack needs to keep running for longer than a typical lab or demo.

If the LokiStack must stay on 1x.demo for longer than expected and this could interfere with a node rollout, apply one of the workarounds described below.

Workaround

Review the NOTE in the Diagnostic Steps section before applying either workaround.

Workaround 1. Increase the number of replicas of each Loki component

This is the recommended workaround, but it requires more resources.

If the Loki ruler component is not in use, do not define it in the LokiStack Custom Resource (CR).
The Loki compactor must run as a single replica; do not change it.

$ oc edit lokistack logging-loki -n openshift-logging
...
spec:
  template:
    distributor:
      replicas: 2
    gateway:
      replicas: 2
    indexGateway:
      replicas: 2
    ingester:
      replicas: 2
    querier:
      replicas: 2
    queryFrontend:
      replicas: 2
    ruler:      # if not using the ruler, omit this line and the next one
      replicas: 2
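
As a non-interactive alternative to oc edit, the same change can be sketched as a merge patch applied with oc patch. This is only a sketch: the function name scale_loki_components is hypothetical, the ruler is left out on the assumption that it is not in use, and the compactor is deliberately not touched.

```shell
# Hypothetical helper (not part of any Red Hat tooling):
#   scale_loki_components <lokistack-name> <namespace>
# Sets 2 replicas on the same components as the CR snippet above.
# The ruler is omitted (assumed unused); the compactor stays at 1 replica.
scale_loki_components() {
  name="$1"
  namespace="$2"
  patch='{"spec":{"template":{
    "distributor":   {"replicas": 2},
    "gateway":       {"replicas": 2},
    "indexGateway":  {"replicas": 2},
    "ingester":      {"replicas": 2},
    "querier":       {"replicas": 2},
    "queryFrontend": {"replicas": 2}
  }}}'
  # A merge patch is used because strategic merge patches are not
  # supported for custom resources.
  oc patch lokistack "$name" -n "$namespace" --type merge -p "$patch"
}
```

With the names used in this article, it would be called as scale_loki_components logging-loki openshift-logging.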

Verify that every Loki component has 2 pods, except the Loki compactor, which should have only 1:

$ oc get pods -l app.kubernetes.io/name=lokistack -n openshift-logging
NAME                                           READY   STATUS    RESTARTS   AGE
logging-loki-compactor-0                       1/1     Running   0          9m43s
logging-loki-distributor-7bfd5998f7-bcx5j      1/1     Running   0          9m53s
logging-loki-distributor-7bfd5998f7-kwhcl      1/1     Running   0          10m
logging-loki-gateway-86759869bf-vpl72          2/2     Running   0          20h
logging-loki-gateway-86759869bf-xbh9d          2/2     Running   0          20h
logging-loki-index-gateway-0                   1/1     Running   0          8m46s
logging-loki-index-gateway-1                   1/1     Running   0          9m42s
logging-loki-ingester-0                        1/1     Running   0          5m36s
logging-loki-ingester-1                        1/1     Running   0          9m43s
logging-loki-querier-58dd589fbc-mm5ch          1/1     Running   0          10m
logging-loki-querier-58dd589fbc-vfvnf          1/1     Running   0          9m53s
logging-loki-query-frontend-66d74fbbfd-l8dg4   1/1     Running   0          10m
logging-loki-query-frontend-66d74fbbfd-svxk2   1/1     Running   0          9m53s

Verify that the PodDisruptionBudget shows ALLOWED DISRUPTIONS: 1 for all the Loki components:

$ oc get pdb -n openshift-logging
NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
logging-loki-distributor      1               N/A               1                     22d
logging-loki-gateway          1               N/A               1                     22d
logging-loki-index-gateway    1               N/A               1                     22d
logging-loki-ingester         1               N/A               1                     22d
logging-loki-querier          1               N/A               1                     22d
logging-loki-query-frontend   1               N/A               1                     22d

The nodes can now be drained without hitting the pod disruption budget error.

Workaround 2. Delete the Loki pods that trigger the pod disruption budget error

This option requires manual intervention every time a node needs to be drained. It can also lead to a long replay of the WAL, or even to the Loki ingester not starting, either because the WAL was corrupted or because the replay takes a long time. If that happens, review the article "Loki ingesters 0/1 in RHOCP 4".

Cordon and drain the node. The drain output shows the Loki pods that return the pod disruption budget error:

$ oc adm cordon <node>
$ oc adm drain <node> --ignore-daemonsets --delete-emptydir-data --force
...
error when evicting pods/"logging-loki-index-gateway-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
error when evicting pods/"logging-loki-ingester-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

From a different terminal connected to the same OpenShift Cluster, delete the Loki pods returning the pod disruption budget error when draining the node:

$ oc delete pod logging-loki-ingester-0 logging-loki-index-gateway-0 -n openshift-logging

Repeat the previous steps whenever an OpenShift node is drained, deleting the Loki pods that return the pod disruption budget error.
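
To avoid copying pod names by hand, the blocked pods can be extracted from the drain output. The following is a minimal sketch that assumes the error lines have the format shown above; the function name pods_blocked_by_pdb and the file name drain.log are hypothetical.

```shell
# Hypothetical helper: read `oc adm drain` output on stdin and print one
# "<namespace> <pod>" line per pod whose eviction was blocked by a
# pod disruption budget. Repeated retries of the same pod are collapsed.
pods_blocked_by_pdb() {
  sed -n 's/.*error when evicting pods\/"\([^"]*\)" -n "\([^"]*\)".*disruption budget.*/\2 \1/p' |
    sort -u
}

# Possible usage (drain.log is an assumed file name):
#   terminal 1: oc adm drain <node> --ignore-daemonsets --delete-emptydir-data --force 2>&1 | tee drain.log
#   terminal 2: pods_blocked_by_pdb < drain.log |
#                 while read -r ns pod; do oc delete pod "$pod" -n "$ns"; done
```

The helper reads from a saved log rather than directly from the drain command because the drain keeps retrying, so its output does not end until the blocked pods are gone.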

Root Cause

When using the Loki size 1x.demo, the Loki components distributor, index-gateway, ingester, querier and query-frontend have no allowed disruptions (ALLOWED DISRUPTIONS is 0), so their pods cannot be evicted during a node drain.
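
The arithmetic behind this can be sketched as follows. For a PodDisruptionBudget that uses MIN AVAILABLE, Kubernetes allows as many disruptions as there are healthy pods above that minimum, and never fewer than zero. The function name allowed_disruptions is hypothetical, and the sketch assumes that each affected component runs a single pod under 1x.demo, which is consistent with the MIN AVAILABLE of 1 shown in the Diagnostic Steps.

```shell
# Hypothetical helper: allowed_disruptions <healthy-pods> <min-available>
# Prints how many pods may be evicted without violating the budget.
allowed_disruptions() {
  healthy="$1"
  min_available="$2"
  allowed=$((healthy - min_available))
  # The budget never reports a negative number of allowed disruptions.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 1 1   # one pod, MIN AVAILABLE 1: prints 0
allowed_disruptions 2 1   # two pods, MIN AVAILABLE 1: prints 1
```

This is why raising each component to 2 replicas, as in Workaround 1, changes ALLOWED DISRUPTIONS from 0 to 1.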

Diagnostic Steps

NOTE: The examples assume that the LokiStack was deployed in the openshift-logging namespace and that the Loki instance is named logging-loki. If the deployment differs, replace -n openshift-logging with the namespace where the LokiStack is installed, and logging-loki with the name of the instance in use.

  • Verify that the size of the LokiStack is 1x.demo.

    $ oc get lokistack logging-loki -n openshift-logging -o jsonpath='{.spec.size}'
    1x.demo
    
  • Verify that ALLOWED DISRUPTIONS is 0 for the Loki components distributor, index-gateway, ingester, querier and query-frontend:

    $ oc get pdb -n openshift-logging
    NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    logging-loki-distributor      1               N/A               0                     22d
    logging-loki-gateway          1               N/A               1                     22d
    logging-loki-index-gateway    1               N/A               0                     22d
    logging-loki-ingester         1               N/A               0                     22d
    logging-loki-querier          1               N/A               0                     22d
    logging-loki-query-frontend   1               N/A               0                     22d
    
  • Verify that, when draining the node, some Loki pods cannot be evicted and fail with the pod disruption budget error:

    $ oc adm drain <node> --ignore-daemonsets --delete-emptydir-data --force
    ...
    error when evicting pods/"logging-loki-index-gateway-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    error when evicting pods/"logging-loki-ingester-0" -n "openshift-logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    ...
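
The PodDisruptionBudget check in the steps above can also be done without reading the table by eye. The sketch below relies on the status.disruptionsAllowed field of the PodDisruptionBudget resource, which is the value shown in the ALLOWED DISRUPTIONS column; the function name zero_disruption_pdbs is hypothetical.

```shell
# Hypothetical helper: zero_disruption_pdbs <namespace>
# Prints the name of every PodDisruptionBudget in the namespace that
# currently allows no disruptions, that is, whose pods would block a drain.
zero_disruption_pdbs() {
  namespace="$1"
  oc get pdb -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.disruptionsAllowed}{"\n"}{end}' |
    awk -F '\t' '$2 == 0 { print $1 }'
}
```

Called as zero_disruption_pdbs openshift-logging, it prints nothing when no PodDisruptionBudget in that namespace is blocking evictions.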
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.