Loki ingesters 0/1 in RHOCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP)
  - 4
- Red Hat OpenShift Logging (RHOL)
  - 5
  - 6
- Red Hat Network Observability
- LokiStack
Issue
- A logging-loki-ingester or netobserv-loki-ingester pod is unable to initialize due to the presence of stale loki-ingester entries in the LokiStack hash ring:

      netobserv-loki-ingester-0 0/1 Running 0 1d
      logging-loki-ingester-0 0/1 Running 0 1d

- The loki-ingester logs show an InvalidBucketState error, a ring error, or both:

      level=error ts=2024-08-14T09:25:53.576834111Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: InvalidBucketState: The request is not valid with the current state of the bucket.\n\tstatus code: 409, request id: xxxx-xxxx-xxxx, host id: xxxx-xxxx-xxx, num_chunks: 1, labels: {kubernetes_host=\"example.ocp.com\", log_type=\"infrastructure\"}"
      msg="found an existing instance(s) with a problem in the ring, this instance cannot become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance 10.x.x.x:9095 past heartbeat timeout"

- Logs cannot be viewed in the web console log section, which shows "Too Many Unhealthy Instances In The Ring".

Resolution
1. For non-"production" sizes
Red Hat investigated this issue and fixed it in the following releases:
| RHOL Release | Bug | Fixed version | Errata |
|---|---|---|---|
| RHOL 5.9 | LOG-5614 | 5.9.3 | RHBA-2024:3736 |
| RHOL 5.8 | LOG-5615 | 5.8.8 | RHBA-2024:3738 |
| RHOL 5.7 | LOG-5616 | 5.7.15 | RHBA-2024:3739 |
| RHOL 5.6 | LOG-5617 | 5.6.20 | RHBA-2024:3740 |
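To compare an installed RHOL version against the fixed versions in the table, a small shell helper may be useful. This is a minimal sketch and not part of the original solution: the function names `version_ge`, `fixed_version_for`, and `has_fix` are hypothetical, it assumes `sort -V` (GNU coreutils) is available, and you supply the installed version yourself (for example, from the operator's ClusterServiceVersion listed by `oc get csv`).

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# fixed_version_for VERSION: prints the fixed z-stream for the minor release
# of VERSION, taken from the table above. Fails for releases not in the table.
fixed_version_for() {
  case "$1" in
    5.9.*) echo "5.9.3" ;;
    5.8.*) echo "5.8.8" ;;
    5.7.*) echo "5.7.15" ;;
    5.6.*) echo "5.6.20" ;;
    *) return 1 ;;
  esac
}

# has_fix VERSION: returns 0 when VERSION already contains the fix,
# 1 when an update is needed, and 2 when the release is not in the table.
has_fix() {
  fixed="$(fixed_version_for "$1")" || return 2
  version_ge "$1" "$fixed"
}

# Example with a hypothetical installed version:
#   has_fix 5.8.6 && echo "fix included" || echo "update needed or release not listed"
```

The comparison relies on version sort rather than plain string comparison so that, for instance, 5.7.9 is correctly treated as older than 5.7.15.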
If this issue still occurs in your environment after updating, open a support case in the Red Hat Customer Portal referring to this solution.
After upgrading, restart the Loki ingester pods if the Loki Operator did not restart them automatically:
$ oc delete pods -l app.kubernetes.io/component=ingester -n [namespace_name]
2. When the ingester is not able to write data to the storage
Review and fix the problem that prevents the data from being persisted. This is usually caused by an incorrect definition for accessing the backend storage, bad credentials, missing connectivity, or a backend storage that is not working as expected.
If the issue is:
- with the object storage, verify that it works by following the Red Hat Knowledge Base article "How to test S3 API Compatible buckets used by Lokistack in RHOCP 4"
- with the filesystem mounted in the local Loki pods, write a file and read it back
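The write-and-read check for a mounted filesystem could look like the following. This is a sketch under stated assumptions: `rw_check` is a hypothetical helper name, and the directory to test is whatever path the volume is mounted at in your ingester pod, which this article does not specify.

```shell
# rw_check DIR: write a small marker file into DIR, read it back, compare the
# content, and clean up. Prints "ok: DIR" on success, returns non-zero otherwise.
rw_check() {
  dir="$1"
  file="$dir/.rw-check.$$"
  token="rw-check-$(date +%s)"

  # Write step: fails if the directory is missing, read-only, or full.
  printf '%s\n' "$token" > "$file" || { echo "write failed: $dir" >&2; return 1; }

  # Read step: fails if the file cannot be read back.
  got="$(cat "$file")" || { echo "read failed: $file" >&2; rm -f "$file"; return 1; }
  rm -f "$file"

  [ "$got" = "$token" ] || { echo "content mismatch in $dir" >&2; return 1; }
  echo "ok: $dir"
}

# Example, run from a shell inside the pod (the mount path is a placeholder):
#   rw_check [mount_path]
```

A failure at the write step usually points to permissions, a read-only mount, or a full volume, while a content mismatch suggests a problem in the storage layer itself.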
Once the storage issue is fixed, restart the Loki ingester pods:
$ oc delete pods -l app.kubernetes.io/component=ingester -n [namespace_name]
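After either restart above, it can help to wait until the recreated ingester pods report ready before checking the logs again. The following is a minimal sketch, not an official procedure: `all_ready` and `restart_and_wait` are hypothetical helper names, and the retry count and the `POLL_SECONDS` interval are arbitrary defaults chosen for illustration.

```shell
# all_ready: reads `oc get pods --no-headers` output on stdin. Succeeds only
# when at least one pod is listed and every READY column has the form n/n.
all_ready() {
  awk '{ split($2, r, "/"); n++; if (r[1] != r[2]) bad++ }
       END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}

# restart_and_wait NAMESPACE [TRIES]: delete the ingester pods, then poll
# until all of them are ready or the number of tries is exhausted.
restart_and_wait() {
  ns="$1"
  tries="${2:-60}"
  selector="app.kubernetes.io/component=ingester"

  oc delete pods -l "$selector" -n "$ns" || return 1

  i=0
  while [ "$i" -lt "$tries" ]; do
    if oc get pods -l "$selector" -n "$ns" --no-headers 2>/dev/null | all_ready; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}

# Example:
#   restart_and_wait [namespace_name] && echo "ingesters ready"
```

If the function returns non-zero, the ingesters are still not ready and the underlying cause (storage or ring) likely persists.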
Root Cause
Red Hat has investigated this issue in LOG-4840 and detected two different causes:
- In most cases, the Loki ingester is not able to write data to the storage and remains Unhealthy. In that case, the storage problem must be reviewed and fixed so that the data can be persisted.
- In non-"production" sizes, a configuration issue was present: the replay_memory_ceiling value was 0.
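To see whether the second cause applies, the replay_memory_ceiling value can be extracted from the rendered Loki configuration. The sketch below only parses YAML text passed on standard input; `replay_ceiling` and `ceiling_is_zero` are hypothetical helper names, and the ConfigMap name and key shown in the comment are assumptions that may differ in your cluster.

```shell
# replay_ceiling: reads Loki configuration YAML on stdin and prints the first
# replay_memory_ceiling value found. Fails when the key is absent.
replay_ceiling() {
  awk '$1 == "replay_memory_ceiling:" { print $2; found = 1; exit }
       END { exit !found }'
}

# ceiling_is_zero: returns 0 when the value is 0 (the problematic setting),
# 1 when it has another value, and 2 when the key is not present.
ceiling_is_zero() {
  value="$(replay_ceiling)" || return 2
  [ "$value" = "0" ]
}

# Example (ConfigMap name and key are assumptions, adjust to your deployment):
#   oc get configmap [lokistack_name]-config -n [namespace_name] \
#     -o jsonpath='{.data.config\.yaml}' | replay_ceiling
```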
Diagnostic Steps
Note: Replace [namespace_name] with openshift-logging or netobserv as needed.
- Check the status of the loki-ingester pods:

      $ oc get pods -n [namespace_name] | grep ingester
      logging-loki-ingester-0 0/1 Running 0 1d

- Check the logs of the loki-ingester pod for the issue with the ring:

      $ oc logs [loki_ingester_pod_name] -n [namespace_name]
      [...]
      2023-01-01T00:00:00.000000000Z level=warn ts=2023-01-01T00:00:00.0000000002Z caller=lifecycler.go:241 msg="found an existing instance(s) with a problem in the ring, this instance cannot become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance 10.x.x.x:9095 past heartbeat timeout"
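To tell a storage problem apart from a ring-only problem, the two error signatures quoted in this article can be counted in the ingester log. This is a rough sketch: `classify_ingester_log` is a hypothetical helper name, and it matches only the two message fragments shown in this article, so other error messages are not counted.

```shell
# classify_ingester_log: reads ingester log lines on stdin and counts
#   - storage errors: lines containing "failed to flush chunks"
#   - ring errors:    lines containing "past heartbeat timeout"
classify_ingester_log() {
  awk '
    /failed to flush chunks/ { storage++ }
    /past heartbeat timeout/ { ring++ }
    END { printf "storage_errors=%d ring_errors=%d\n", storage, ring }
  '
}

# Example:
#   oc logs [loki_ingester_pod_name] -n [namespace_name] | classify_ingester_log
```

A non-zero storage count points to the storage resolution path, while ring errors with no storage errors suggest stale ring entries instead.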
Note: If the issue is not related to the storage, check the Red Hat Knowledge Base article "Loki Ingester 0/1 fails to Connect to Outdated IPs in Gossip Ring in RHOCP 4".