RHEL6: Reading /proc/net/rpc/auth.unix.gid/channel causes NFS RPC time-outs

Solution Verified

Environment

  • Red Hat Enterprise Linux 6 (NFS server)
    • kernel-2.6.32-220.30.1.el6
    • nfs-utils-1.2.3-15.el6_2.1
    • sos-2.2-29.el6 or later, but earlier than sos-2.2-47.el6

Issue

  • Running sosreport on the NFS server causes the NFS server to temporarily stop responding to NFS clients.
  • The NFS server acknowledges NFS client requests at the TCP level, then silently drops them, never responding at the NFS layer.
  • NFS clients time out periodically, logging the following message:
     kernel: nfs: server nfsserver not responding, still trying
  • If /proc/net/rpc/auth.unix.gid/channel is read on the NFS server, RPC time-outs occur, for example:
     # df
     Filesystem           1K-blocks      Used Available Use% Mounted on
     /dev/vda5             17906468  10171684   6825168  60% /
     tmpfs                  1027344        88   1027256   1% /dev/shm
     /dev/vda1               198337     28339    159758  16% /boot
     /dev/vda2              2015824     35812   1877612   2% /work
     df: `/mnt/nfs': Input/output error
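To see how often the client has hit this symptom, the "not responding" message can be counted in the client's log. The following is a minimal sketch, not a Red Hat tool: the function name count_nfs_timeouts is hypothetical, and it assumes kernel messages are written to a syslog file such as /var/log/messages (the usual RHEL6 location).

```shell
#!/bin/sh
# Hypothetical helper: count NFS "not responding" events in a client log file.
# Assumption: kernel messages land in a syslog file; defaults to /var/log/messages.
count_nfs_timeouts() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if none);
    # errors such as an unreadable file are discarded.
    grep -c 'nfs: server .* not responding, still trying' "$log" 2>/dev/null
}
```

For example, `count_nfs_timeouts /var/log/messages` run on the client prints a single number; comparing the timestamps of the matching lines with the time sosreport ran on the server helps tie the two together.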

Resolution

  • Fixed in sos-2.2-47.el6, provided by RHBA-2013:1688-1.
  • Note: This bug should cause only a single, temporary pause in NFS traffic, lasting approximately 2 minutes after sosreport is run. After this time, RPC/NFS traffic should resume normally. If a repeated or persistent RPC/NFS issue is occurring, this issue alone is unlikely to be the full root cause.
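Whether a given sos build falls between the release that introduced the regression (sos-2.2-29.el6) and the fixed release (sos-2.2-47.el6) can be checked with a small script. This is a sketch under stated assumptions: the function names ver_lt and is_affected_sos are hypothetical, and GNU `sort -V` is used as an approximation of RPM's own version comparison, which it matches for simple el6 release strings like these but not in every case.

```shell
#!/bin/sh
# Hypothetical helpers; GNU "sort -V" approximates RPM version ordering.

ver_lt() {
    # True if $1 sorts strictly before $2 in version order.
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

is_affected_sos() {
    # Affected: not older than 2.2-29.el6, but older than 2.2-47.el6.
    ! ver_lt "$1" "2.2-29.el6" && ver_lt "$1" "2.2-47.el6"
}
```

On the NFS server, the installed version can be fed in with `is_affected_sos "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' sos)" && echo "update sos"`.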

Root Cause

  • Regression introduced in sos-2.2-29.el6, which includes the fix for Bugzilla #730641 - sosreport does not collect /proc/net details.
  • The /proc/net/rpc/*/channel files are special-purpose files used for communication between the kernel and NFS support applications such as rpc.mountd. When another process reads one of these files, the kernel may send that process a message that never receives a response. The kernel then times out the message, which the other process read but cannot answer, and fails the NFS request that depended on it. Specifically, /proc/net/rpc/auth.unix.gid/channel is used by the "--manage-gids" feature of rpc.mountd. When this feature is not used (it is off by default), rpc.mountd does not open this file; the kernel sees that the file is not open and NFS functions normally. However, having another process access the file makes the kernel believe the feature is in use, and when no response is received for the message sent to that process, the NFS server has to fail the NFS request. These files can be opened only by root, to keep normal users from interfering with NFS. As special, root-only files, they should be accessed only by applications designed to work with them.
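Because the gid channel is only expected to be open when rpc.mountd runs with --manage-gids (short form -g), one way to judge whether any reader of that file is legitimate is to inspect rpc.mountd's command line. A minimal sketch follows; the function name mountd_manages_gids is hypothetical, and it deliberately recognises only the separate tokens "--manage-gids" and "-g", not bundled short options such as "-Fg".

```shell
#!/bin/sh
# Hypothetical helper: succeed if an rpc.mountd command line enables --manage-gids.
# Simplification: bundled short options (e.g. "-Fg") are not recognised.
mountd_manages_gids() {
    # Unquoted $1 is word-split on purpose to walk the individual arguments.
    for arg in $1; do
        case "$arg" in
            --manage-gids|-g) return 0 ;;
        esac
    done
    return 1
}
```

On the NFS server, `mountd_manages_gids "$(ps -o args= -C rpc.mountd)" || echo "--manage-gids not in use"` reports the default case, in which no process other than rpc.mountd itself should be opening /proc/net/rpc/auth.unix.gid/channel.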

Diagnostic Steps

Diagnosis

Reproducer

1. Mount the share on the NFS client:

 # mount -t nfs -o soft,intr,timeo=30,retrans=1 nfsserver:/work /mnt/nfs

2. Run the following run.sh script on the NFS server:

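 #!/bin/sh
 # Reproducer only: repeatedly open and read the gid cache channel.
 # Must be run as root (the channel file is root-only); stop with Ctrl-C.
 while :
 do
    date; cat /proc/net/rpc/auth.unix.gid/channel; sleep 1
 done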

3. Run the df command on the NFS client. An input/output error should occur when df tries to retrieve file system statistics (statfs()) for the mounted NFS share:

 # df
 Filesystem           1K-blocks      Used Available Use% Mounted on
 /dev/vda5             17906468  10171684   6825168  60% /
 tmpfs                  1027344        88   1027256   1% /dev/shm
 /dev/vda1               198337     28339    159758  16% /boot
 /dev/vda2              2015824     35812   1877612   2% /work
 df: `/mnt/nfs': Input/output error
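To line up client failures with the server-side loop's `date` output, the client can poll the mount point with timestamps. The following is a sketch, not part of the original reproducer: the function name check_mount is hypothetical, and it uses GNU coreutils `stat -f`, which queries file system statistics much as df does.

```shell
#!/bin/sh
# Hypothetical helper: query file system statistics for a mount point
# (like df does) and print a timestamped OK/FAILED line.
check_mount() {
    if stat -f "$1" >/dev/null 2>&1; then
        echo "$(date '+%H:%M:%S') $1 OK"
    else
        echo "$(date '+%H:%M:%S') $1 FAILED"
        return 1
    fi
}
```

For example, `while :; do check_mount /mnt/nfs; sleep 1; done` on the client, with the soft mount from step 1, should print FAILED lines while run.sh is reading the channel on the server.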

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.