Gathering data and logs to troubleshoot NFS issues

Updated 29 Aug 2024

Are you encountering an issue with your NFS environment that requires troubleshooting? This article provides instructions and guidance for troubleshooting a variety of NFS-related issues, and describes what information will be helpful to Red Hat Support.

Basic information and collection

Provide the following information:
- What is the NFS server? Is it RHEL, a NetApp filer, an EMC NAS, or another product?
  
  The NFS server is the machine or host that is exporting the NFS shares.
- What is the NFS client? Is the client running on RHEL or another Linux platform, UNIX, or Microsoft Windows?
  
  The NFS client is a machine or host that is mounting the NFS exports.
For Red Hat Enterprise Linux (RHEL) or RHEL-based NFS servers or clients, capture a sosreport. Prefix the sosreport archive file name with nfsserver or nfsclient to help distinguish the server from the client. For example, nfsserver-sosreport-redhat.00123456-20131126165814-495f.tar.xz.
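The renaming step can be scripted. The following is a minimal Bash sketch, assuming the archive was written to /var/tmp (the usual sos default) and matches sosreport-*.tar.xz; prefix_sosreport is a hypothetical helper name, not part of sos.

```shell
#!/bin/bash
# Hypothetical helper: prefix the newest sosreport archive with a role label.
# Assumes archives live in /var/tmp unless another directory is given.
prefix_sosreport() {
    local role="$1" dir="${2:-/var/tmp}" newest target
    case "$role" in
        nfsserver|nfsclient) ;;
        *) echo "role must be nfsserver or nfsclient" >&2; return 1 ;;
    esac
    # Pick the most recently modified archive. Already-prefixed files
    # do not match the sosreport-* pattern, so they are skipped.
    newest=$(ls -t "$dir"/sosreport-*.tar.xz 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no sosreport-*.tar.xz archive found in $dir" >&2
        return 1
    fi
    target="$dir/$role-$(basename "$newest")"
    mv -- "$newest" "$target" || return 1
    echo "$target"
}
```

For example, after running sos report (sosreport on older releases) on the server, prefix_sosreport nfsserver would rename sosreport-redhat.00123456-20131126165814-495f.tar.xz to nfsserver-sosreport-redhat.00123456-20131126165814-495f.tar.xz.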

Data for specific symptoms

In some cases, multiple symptoms may be present. If so, use the suggested data capture methods for every symptom present in your environment.

The issue is reproducible by issuing a command

Refer to Protocol Issues in How to begin NFS Debugging.

The issue occurs intermittently and creates messages in system and/or application logs

In some cases the issue may not be consistently reproducible but an error message will appear in the system logs or application logs at the time the issue occurs.

If you know which error message is associated with the NFS issue, you can use the tcpdump-watch.sh script to capture network packets until just after that message is logged.

Download the attached tcpdump-watch.sh script and place it on the host where the log message is being captured.
Modify the SETUP section in tcpdump-watch.sh and set the necessary variables. At a minimum, verify that the variable match is set to the expected error message text.
Make sure the script is executable.
```
 # chmod +x tcpdump-watch.sh
```
Invoke the script, passing the case number or identifier as the first parameter and an NFS server or client IP address as the second. You can optionally specify additional IP addresses when multiple NFS servers are involved.

```
./tcpdump-watch.sh CASENUMBER nfs-server-ip1 [nfs-server-ip2 ... nfs-server-ipN]
```

The script will exit when the specified error message appears in the log and the nfs-data-CASENUMBER-NNN.tgz file will contain the relevant capture and logs.
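The attached script is the tool to use. To illustrate the technique it relies on (capture continuously, watch the log, stop shortly after the message appears), here is a simplified Bash sketch. It is not tcpdump-watch.sh: the function names, the 5-second grace period, and the single interface and host are assumptions. It polls the log once per second rather than following it, so it does not handle log rotation.

```shell
#!/bin/bash
# Simplified, hypothetical sketch; not the attached tcpdump-watch.sh.

# wait_for_message LOGFILE PATTERN [TIMEOUT_SECONDS]
# Returns 0 once a line containing PATTERN (a fixed string) is appended to
# LOGFILE after the call starts; returns 1 if TIMEOUT_SECONDS (0 = no limit)
# elapse first. Polls once per second; log rotation is not handled.
wait_for_message() {
    local logfile="$1" pattern="$2" timeout="${3:-0}" start elapsed=0
    # Only lines written after this point are considered.
    start=$(wc -l < "$logfile")
    while :; do
        if tail -n +"$((start + 1))" "$logfile" | grep -qF -- "$pattern"; then
            return 0
        fi
        if [ "$timeout" -gt 0 ] && [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# capture_until_message IFACE HOST LOGFILE PATTERN PCAPFILE
# Runs tcpdump against HOST until PATTERN is logged, then keeps capturing
# for a few more seconds so the packets right after the error are kept.
capture_until_message() {
    local iface="$1" host="$2" logfile="$3" pattern="$4" pcap="$5" pid
    tcpdump -i "$iface" -s 0 -w "$pcap" host "$host" &
    pid=$!
    wait_for_message "$logfile" "$pattern"
    sleep 5
    kill -INT "$pid"
    wait "$pid"
}
```

For example, run as root: capture_until_message any 192.0.2.10 /var/log/messages 'not responding' /tmp/nfs.pcap, where 192.0.2.10 is a placeholder address and 'not responding' matches the kernel's "nfs: server ... not responding" message.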

The NFS client or server experiences a hang

Configure kdump.
Configure hung task panic to crash the machine when a hang is detected.
Once a vmcore has been captured after a panic, it is recommended you disable hung_task_panic.

This is because any process that stays blocked for longer than the time specified in /proc/sys/kernel/hung_task_timeout_secs will trigger a panic. This can happen for a variety of expected reasons, most commonly when processes performing large write operations sit in the D (uninterruptible sleep) state waiting on storage.
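As a sketch of the hung task panic step, the kernel exposes the relevant settings as the sysctls kernel.hung_task_panic and kernel.hung_task_timeout_secs. The fragment below assumes a hypothetical drop-in file name and leaves the timeout at the system's current value. It also assumes kdump is already configured and tested; otherwise the panic reboots the machine without saving a vmcore.

```
# /etc/sysctl.d/99-nfs-hang-debug.conf (hypothetical file name)
# Panic, and so trigger kdump, when a task stays blocked for longer than
# kernel.hung_task_timeout_secs.
kernel.hung_task_panic = 1
```

Load it with sysctl --system. Because the file is re-applied at every boot, delete it once the vmcore has been captured and run sysctl -w kernel.hung_task_panic=0, since removing the file alone does not change the value in the running kernel. Alternatively, sysctl -w kernel.hung_task_panic=1 sets the value for the running kernel only, so it does not survive the reboot that follows the panic.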

If you will be providing the vmcore file to Red Hat Support, rename it to include the case number before you attach it to your case or upload it. See How can I provide large files to Red Hat Support if you need assistance.
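A minimal Bash sketch of the rename, assuming kdump's default target directory /var/crash with one subdirectory per crash containing a file named vmcore. The helper name rename_vmcore and the vmcore-CASENUMBER pattern are illustrative, not a required convention.

```shell
#!/bin/bash
# Hypothetical helper: rename the newest vmcore so its name includes the
# case number. Assumes the kdump layout CRASHDIR/<crash-dir>/vmcore.
rename_vmcore() {
    local casenum="$1" crashdir="${2:-/var/crash}" newest target
    if [ -z "$casenum" ]; then
        echo "usage: rename_vmcore CASENUMBER [CRASHDIR]" >&2
        return 1
    fi
    # Most recently modified vmcore across all crash subdirectories.
    newest=$(ls -t "$crashdir"/*/vmcore 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no vmcore found under $crashdir" >&2
        return 1
    fi
    target="$(dirname "$newest")/vmcore-$casenum"
    mv -- "$newest" "$target" || return 1
    echo "$target"
}
```

For example, rename_vmcore 00123456 renames the newest /var/crash/*/vmcore to vmcore-00123456 in the same directory and prints the new path.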

The NFS client is encountering performance problems with latency, throughput, etc.

Refer to Performance Issues (Reading or Writing data) in How to begin NFS Debugging.

The NFS share on the NFS server is suddenly losing data and an NFS client may be the culprit, or the NFS share is growing and you want to find the cause

In some cases the issue may be related to sudden or steady changes in the available space on the NFS share. You can use the tcpdump-fs.sh script to capture network packets until just after the disk usage of a specific mount crosses a specified percentage threshold.

Download the attached tcpdump-fs.sh script and place it on the host where the NFS share is located.
Modify the SETUP section in tcpdump-fs.sh and set the necessary variables. At a minimum, verify that the variable value is set to the expected usage percentage, that delim is set to ge (greater than or equal to) or le (less than or equal to), and that nfsmount is the full path to the NFS share or mount point being monitored.
Make sure the script is executable.
```
 # chmod +x tcpdump-fs.sh
```
Invoke the script passing it the NFS server/client IP address.

```
./tcpdump-fs.sh
```

The script will exit when the specified free or used space percentage is exceeded and inform you of the file/archive containing the captured packets and logs.
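The threshold check behind this procedure can be sketched in Bash with GNU df's --output=pcent option. This is a hypothetical illustration, not the attached tcpdump-fs.sh: the function names, the reuse of the ge/le convention from the SETUP variables, and the 5-second default polling interval are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch; not the attached tcpdump-fs.sh.

# used_pct PATH: print the used-space percentage (digits only) of the
# filesystem containing PATH. Requires GNU coreutils df.
used_pct() {
    df --output=pcent -- "$1" | tail -n 1 | tr -dc '0-9'
}

# threshold_reached PATH VALUE DELIM
# DELIM is ge (used% >= VALUE) or le (used% <= VALUE).
# Returns 0 if the condition holds, 1 if not, 2 on bad input or df failure.
threshold_reached() {
    local pct
    pct=$(used_pct "$1")
    [ -n "$pct" ] || return 2
    case "$3" in
        ge) [ "$pct" -ge "$2" ] ;;
        le) [ "$pct" -le "$2" ] ;;
        *) echo "DELIM must be ge or le" >&2; return 2 ;;
    esac
}

# wait_for_threshold PATH VALUE DELIM [INTERVAL_SECONDS]
# Polls until the condition holds (returns 0) or a hard error occurs (returns 2).
wait_for_threshold() {
    local path="$1" value="$2" delim="$3" interval="${4:-5}" rc
    while :; do
        rc=0
        threshold_reached "$path" "$value" "$delim" || rc=$?
        case "$rc" in
            0) return 0 ;;
            2) return 2 ;;
        esac
        sleep "$interval"
    done
}
```

For example, with a capture such as tcpdump -i any -s 0 -w /tmp/nfs.pcap host 192.0.2.10 running in the background, wait_for_threshold /mnt/nfs 90 ge returns once the filesystem at the hypothetical mount point /mnt/nfs is at least 90% used, after which the capture can be stopped with kill -INT.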
