Loki Ingester 0/1 fails to Connect to Outdated IPs in Gossip Ring in RHOCP 4

Solution Verified - Updated 12 May 2026

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 4
Red Hat OpenShift Logging (RHOL)
- 5
- 6
LokiStack
Loki

Issue

The LokiStack operator fails to start the logging-loki-ingester pod because of a connection timeout to an outdated IP address in the gossip ring.
IP addresses in the gossip ring endpoint list that are no longer in use cause the issue.
The logs show an error connecting to the outdated IP address: msg="WriteTo failed" addr=<IP>:7946 err="dial tcp <IP>:7946: i/o timeout".
The IP addresses are not present in the pod network or in the list of addresses for the logging-loki-gossip-ring endpoint.
The Loki Ingester pod stays in 0/1 even though there is no issue with the Loki storage as described in the Red Hat Knowledge Article "Loki ingesters 0/1 in RHOCP 4".

The Loki Ingester pod logs the following error:

msg="found an existing instance(s) with a problem in the ring, this instance cannot become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance 10.x.x.x:9095 past heartbeat timeout"

Resolution

Red Hat investigated this issue in the following bug reports:

RHOL Release	Bug	Fixed version	Errata
6.3	LOG-6987	6.3.0	RHBA-2025:11336
6.2	LOG-6992	6.2.1	RHBA-2025:3908

If this issue still occurs in the environment after updating, open a support case in the Red Hat Customer Portal referring to this solution.

Workaround

Note: The name of the variable LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR can differ between environments because it has two parts:

Variable part: LokiStack CR name in upper case. In this example LOGGING_LOKI
Fixed part: _DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR
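
As a rough illustration of how the variable name is composed, the sketch below derives it from the LokiStack CR name. It assumes the usual Kubernetes convention for service variables (upper case, hyphens replaced by underscores). The function name distributor_addr_var is hypothetical and not part of any Red Hat tooling.

```shell
# Hypothetical helper: build the distributor address variable name
# from the LokiStack CR name (upper case, "-" replaced by "_").
distributor_addr_var() {
  prefix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/-/_/g')
  printf '%s_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR\n' "$prefix"
}

# Prints LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR
distributor_addr_var "logging-loki"
```

Inside the debug pod, running env | grep _DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR shows the actual variable name and its value.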

Set environment variables

$ cr="logging-loki"      ### LokiStack CR name, change if the name is different
$ ns="openshift-logging" ### change if Loki runs in a different namespace

Start a Loki Distributor debug pod with the UBI8 image

$ oc -n ${ns} debug --image=registry.redhat.io/ubi8:latest deployment/${cr}-distributor

Get the "Unhealthy" Loki Ingester members

sh-4.4$ curl -k --cert /var/run/tls/http/server/tls.crt --key /var/run/tls/http/server/tls.key https://${LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR}:3100/ring -H "Accept: application/json"
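
To pick the member IDs out of the ring response, a small filter such as the one below may help. This is a minimal sketch, assuming the JSON has the shape shown in the Diagnostic Steps, with each "id" field directly followed by its "state" field. It uses grep and sed so that it does not rely on jq, which may not be available in the debug image. The function name unhealthy_ring_members is hypothetical.

```shell
# Hypothetical helper: read the /ring JSON on stdin and print the ID of
# every member whose state is UNHEALTHY, one per line.
# Assumes each "id" field is directly followed by its "state" field.
unhealthy_ring_members() {
  grep -o '"id":"[^"]*","state":"UNHEALTHY"' |
    sed 's/^"id":"\([^"]*\)".*/\1/'
}
```

The output of the curl command for the /ring endpoint can be piped into unhealthy_ring_members to list the IDs that need to be forgotten.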

Forget the "Unhealthy" Loki Ingester members

  // If not using cluster-wide proxy
  sh-4.4$ curl -k --cert /var/run/tls/http/server/tls.crt --key /var/run/tls/http/server/tls.key   https://${LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR}:3100/ring -X POST --data-raw 'forget=<UNHEALTHY_POD_FROM_EARLIER_COMMAND>'

  // If using [cluster-wide proxy](https://docs.openshift.com/container-platform/4.16/networking/enable-cluster-wide-proxy.html)
  sh-4.4$ curl -k --cert /var/run/tls/http/server/tls.crt --key /var/run/tls/http/server/tls.key --noproxy "${LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR}" https://${LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR}:3100/ring -X POST --data-raw 'forget=<UNHEALTHY_POD_FROM_EARLIER_COMMAND>'
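
When several members are unhealthy, the forget request can be wrapped in a small function and called once per member. The following is a sketch under stated assumptions, not a supported tool: the function name forget_ring_member and the RING_CURL variable are hypothetical, and --noproxy is always passed on the assumption that it is harmless when no proxy is configured.

```shell
# Hypothetical helper: send the forget request for one ring member.
#   $1: member ID reported as unhealthy by the /ring endpoint
#   $2: distributor address (value of the *_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR variable)
# RING_CURL defaults to curl; RING_CURL=echo only prints the arguments (dry run).
forget_ring_member() {
  member_id="$1"
  addr="$2"
  ${RING_CURL:-curl} -k \
    --cert /var/run/tls/http/server/tls.crt \
    --key /var/run/tls/http/server/tls.key \
    --noproxy "$addr" \
    "https://${addr}:3100/ring" \
    -X POST --data-raw "forget=${member_id}"
}
```

For example, RING_CURL=echo forget_ring_member logging-loki-ingester-1 "${LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR}" prints the request that would be sent, which allows a review before running it for real.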

Restart the Loki Ingester pods

$ oc delete pods -l app.kubernetes.io/component=ingester -n ${ns}

Root Cause

Loki ingesters that entered the Unhealthy state because of networking issues remained in that state even after the network recovered.

Diagnostic Steps

Verify that the Loki Ingester pod is 0/1

$ oc get pods -n [namespace_name] | grep ingester
logging-loki-ingester-1                        0/1     Running   0          1d
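
On clusters with many pods, the READY column can be checked with a filter instead of by eye. This is a minimal sketch that assumes the default oc get pods column layout (NAME, READY, STATUS, ...). The function name not_ready_ingesters is hypothetical.

```shell
# Hypothetical helper: read `oc get pods` output on stdin and print the
# ingester pods whose READY column (for example 0/1) shows fewer ready
# containers than expected.
not_ready_ingesters() {
  awk '$1 ~ /ingester/ { split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}
```

Piping the output of oc get pods -n [namespace_name] into not_ready_ingesters lists only the ingester pods that are not fully ready.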

Check the logs of the Loki Ingester pod for the ring error

$ oc logs logging-loki-ingester-1 -n [namespace_name]
[...]
2023-01-01T00:00:00.000000000Z level=warn ts=2023-01-01T00:00:00.0000000002Z caller=lifecycler.go:241 msg="found an existing instance(s) with a problem in the ring, this instance cannot become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance 10.x.x.x:9095 past heartbeat timeout"
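
The instance address named in the error can be extracted from the logs with a short filter. This sketch assumes the message format shown above and is only meant as an illustration. The function name stale_ring_instances is hypothetical.

```shell
# Hypothetical helper: read ingester logs on stdin and print each distinct
# instance address reported as "past heartbeat timeout".
stale_ring_instances() {
  sed -n 's/.*err="instance \([^ ]*\) past heartbeat timeout".*/\1/p' | sort -u
}
```

Piping oc logs logging-loki-ingester-1 -n [namespace_name] into stale_ring_instances prints every address that blocks the ingester from becoming ready.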

Verify that the Loki Ingester is reported as "UNHEALTHY" in the ring. In the following example, logging-loki-ingester-1 is UNHEALTHY

   $ oc -n [namespace_name] debug --image=registry.redhat.io/ubi8:latest deployment/logging-loki-distributor

   sh-4.4$ curl -k --cert /var/run/tls/http/server/tls.crt --key /var/run/tls/http/server/tls.key  https://${LOGGING_LOKI_DISTRIBUTOR_HTTP_PORT_3100_TCP_ADDR}:3100/ring -H "Accept: application/json" 
   [...]
   {"shards":[{"id":"logging-loki-ingester-0","state":"ACTIVE",[...]},{"id":"logging-loki-ingester-1","state":"UNHEALTHY",[...]}

Verify that the Loki gossip ring contains IP addresses that no longer exist

$ oc describe endpoints logging-loki-gossip-ring -n [namespace_name]
[...]
Subsets:
  Addresses:          10.128.1.44,10.128.1.46,10.129.0.73,10.129.0.76,10.129.0.77,10.129.0.78,10.129.0.79,10.130.1.10,10.130.1.11,10.130.1.12,10.130.1.13
  NotReadyAddresses:  10.128.1.45,10.129.0.80
  Ports:
    Name         Port  Protocol
    ----         ----  --------
    gossip-ring  7946  TCP
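
To see which addresses from the "WriteTo failed" log messages are missing from the endpoint list, the two outputs can be compared. The following is a sketch under stated assumptions: it expects the log format quoted in the Issue section and the oc describe endpoints layout shown above, both saved to files beforehand. The function name stale_gossip_ips is hypothetical.

```shell
# Hypothetical helper: list gossip peers that the ingester keeps dialing
# but that are absent from the logging-loki-gossip-ring endpoints.
#   $1: file holding the `oc logs <ingester-pod>` output
#   $2: file holding the `oc describe endpoints logging-loki-gossip-ring` output
stale_gossip_ips() {
  known=$(mktemp)
  # Collect Addresses and NotReadyAddresses, one IP per line.
  sed -n 's/^ *[A-Za-z]*Addresses: *//p' "$2" | tr ',' '\n' > "$known"
  # Extract the peer IP from each "WriteTo failed" line, drop known IPs.
  sed -n 's/.*WriteTo failed" addr=\([0-9.]*\):7946.*/\1/p' "$1" |
    sort -u | grep -vxFf "$known"
  rm -f "$known"
}
```

Any address printed by stale_gossip_ips appears in the timeout errors but not in the endpoint list, which matches the symptom described in the Issue section.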

Verify that the environment is not hitting the issue described in the Red Hat Knowledge Base article "Loki ingesters 0/1 in RHOCP 4"

SBR

Shift Monitoring

Components

loki

Category

Troubleshoot
