Is there a way to monitor for 'hung' NFS mount points?
Environment
- Red Hat Enterprise Linux
- NFS client
Issue
- Is there a safe way to detect which hard-mounted NFS mounts have been hung for more than N seconds? For example, something that a polling check system (such as Condor's startd cron) could run and report on?
- Is it possible to create an NFS watchdog process to monitor for hung NFS mounts?
Resolution
- There's no easy way to know ahead of time whether an NFS mount point will hang. An NFS mount point is only detected as 'hung' when an NFS operation is attempted.
- To monitor for potentially 'hung' NFS mounts, you can use a 'watchdog' based approach, similar to what is used in many clustering systems. One process periodically issues an operation to the NFS share and then writes a timestamp to a local file, and a second process compares the timestamp written with the current time. For example:
- Process #1: Attempt an NFS operation and write a timestamp to a local file
# cat /tmp/nfs-watchdog-process1.sh
#!/bin/bash
#
# Process #1 of the NFS 'watchdog' to monitor a specific mount point
# - Every second, do a 'touch' on a file in the nfs filesystem
# - Write the current time (in seconds) to a local file based on the mount point name
#
# Local file in which the timestamps are written. Filename based on
# mount point. Something like this might work to generate a unique
# name: mount | grep "type nfs " | awk '{ print $1 }' | tr ':/' '--'
#
# To the extent possible under law, Red Hat, Inc. has dedicated all copyright to this
# software to the public domain worldwide, pursuant to the CC0 Public Domain Dedication.
# This software is distributed without any warranty. See <http://creativecommons.org/publicdomain/zero/1.0/>.
#
LOCALFILE=/tmp/nfs-watchdog-nfs-server--export1.txt
#
while true
do
    # The touch blocks while the mount is hung, so no new timestamp is written
    touch /mnt/nfsimport1/.nfs-watchdog
    date +%s >> "$LOCALFILE"
    sleep 1
done
- Process #2: Monitor the last timestamp written to the local file
# cat ./nfs-watchdog-process2.sh
#!/bin/bash
#
# Process #2 of the NFS 'watchdog' to monitor a specific mount point
# - compare the last time written with the current time
# - if the last timestamp is more than THRESHOLD seconds old, print a message
#
# Local file in which the timestamps are written. Filename based on
# mount point. Something like this might work to generate a unique
# name: mount | grep "type nfs " | awk '{ print $1 }' | tr ':/' '--'
#
# To the extent possible under law, Red Hat, Inc. has dedicated all copyright to this
# software to the public domain worldwide, pursuant to the CC0 Public Domain Dedication.
# This software is distributed without any warranty. See <http://creativecommons.org/publicdomain/zero/1.0/>.
#
LOCALFILE=/tmp/nfs-watchdog-nfs-server--export1.txt
#
# Maximum age of the last timestamp, in seconds, before a warning is printed
THRESHOLD=5
#
while true
do
    DOG=$(tail -n 1 "$LOCALFILE")
    NOW=$(date +%s)
    DIFF=$(( NOW - DOG ))
    if [ "$DIFF" -gt "$THRESHOLD" ]; then
        echo "WARNING: watchdog last written $DIFF seconds ago, greater than threshold $THRESHOLD"
    fi
    sleep 5
done
# ./nfs-watchdog-process2.sh
WARNING: watchdog last written 7 seconds ago, greater than threshold 5
WARNING: watchdog last written 12 seconds ago, greater than threshold 5
WARNING: watchdog last written 17 seconds ago, greater than threshold 5
WARNING: watchdog last written 23 seconds ago, greater than threshold 5
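For a polling check system such as the one mentioned in the Issue, the comparison done by Process #2 can also be run as a one-shot test instead of a loop. The following is a minimal sketch under that assumption. The function names watchdog_file and watchdog_check, the exit codes, and the optional NOW argument are illustrative and not part of the original scripts. Process #1 must still be running in the background to keep the timestamp file current.

```bash
#!/bin/bash
# One-shot check for a polling system. Function names and exit codes are
# illustrative assumptions, not part of the original watchdog scripts.

# Build the local timestamp file name from an NFS source such as
# "nfs-server:/export1" by replacing ':' and '/' with '-'.
watchdog_file() {
    printf '/tmp/nfs-watchdog-%s.txt\n' "$(printf '%s' "$1" | sed 's|[:/]|-|g')"
}

# watchdog_check FILE [THRESHOLD] [NOW]
# Compare the last timestamp in FILE with NOW (default: current time).
# Returns 0 if fresh, 1 if older than THRESHOLD seconds, 2 if the file
# has no usable timestamp.
watchdog_check() {
    local file threshold now last age
    file="$1"
    threshold="${2:-5}"
    now="${3:-$(date +%s)}"
    last=$(tail -n 1 -- "$file" 2>/dev/null)
    case "$last" in
        ''|*[!0-9]*)
            echo "UNKNOWN: no valid timestamp in $file"
            return 2 ;;
    esac
    age=$(( now - last ))
    if [ "$age" -gt "$threshold" ]; then
        echo "WARNING: watchdog last written $age seconds ago, greater than threshold $threshold"
        return 1
    fi
    echo "OK: watchdog last written $age seconds ago"
}

# Example call (commented out so that sourcing this file has no side effects):
# watchdog_check "$(watchdog_file nfs-server:/export1)" 5
```

A poller could run the example call and treat a non-zero exit status as a failed check. Because the check only reads a local file, it should not block even when the NFS mount itself is hung.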
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.