From: Stephan Koledin <skoledin@neolinear.com>
To: nfs@lists.sourceforge.net
Subject: NFS lockups with 2.4.18
Date: Wed, 24 Sep 2003 12:51:22 -0400
Message-ID: <3F71CB8A.3090208@neolinear.com>
Hello All-
We're running into some NFS lockups here and having a rough time
debugging/solving the issue. I'm hoping that someone on the list may be
able to provide some suggestions.
Background:
Unpredictably, but approximately every week or two, we run into a
problem where our file server stops responding to NFS requests. The
server is Debian 3.0 (woody), running the stock Debian 2.4.18-1-686
kernel. We have also seen the same problem with the Debian bf2.4 kernel.
The server has a single 2.6GHz P4 processor, more than 1GB of RAM, and
several large (>500GB) SCSI-160 arrays using ext3 filesystems with
quotas enabled.
The problem seems to be resolvable only with a reboot, and even then it
often recurs within a few minutes, requiring 1-5 further reboots before
the server finally settles down and stays stable for 1-14+ days.
We have not noticed anything strange/interesting in any of the logs or
on the console. All other services/processes on the machine continue to
operate perfectly, including disk operations. The machine simply stops
serving NFS.
We serve NFS to Linux, Solaris, and HP clients, several versions of each OS.
Some Relevant Data (from when NFS was not working properly):
$ rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100029    1   udp    845  keyserv
    100029    2   udp    845  keyserv
    100011    1   udp    852  rquotad
    100011    2   udp    852  rquotad
    100011    1   tcp    855  rquotad
    100011    2   tcp    855  rquotad
    100024    1   udp  32772  status
    100024    1   tcp  32768  status
    100001    1   udp  32773  rstatd
    100001    2   udp  32773  rstatd
    100001    3   udp  32773  rstatd
    100001    4   udp  32773  rstatd
    100001    5   udp  32773  rstatd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp  32774  nlockmgr
    100021    3   udp  32774  nlockmgr
    100021    4   udp  32774  nlockmgr
    100005    1   udp  32775  mountd
    100005    1   tcp  32769  mountd
    100005    2   udp  32775  mountd
    100005    2   tcp  32769  mountd
    100005    3   udp  32775  mountd
    100005    3   tcp  32769  mountd
$ rpcinfo [-u | -t] <host> <program>
portmapper udp
program 100000 version 2 ready and waiting
portmapper tcp
program 100000 version 2 ready and waiting
keyserv
program 100029 version 1 ready and waiting
program 100029 version 2 ready and waiting
rquotad udp
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting
rquotad tcp
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting
status udp
program 100024 version 1 ready and waiting
status tcp
program 100024 version 1 ready and waiting
rstatd udp
program 100001 version 1 ready and waiting
program 100001 version 2 ready and waiting
program 100001 version 3 ready and waiting
program 100001 version 4 is not available
program 100001 version 5 ready and waiting
nfs udp
program 100003 version 0 is not available
nlockmgr udp
program 100021 version 0 is not available
mountd udp
program 100005 version 0 is not available
mountd tcp
program 100005 version 0 is not available
(For comparison, the normal, expected output for nfs, nlockmgr, and
mountd is as follows.)
nfs udp
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
nlockmgr udp
program 100021 version 1 ready and waiting
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting
mountd udp
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
mountd tcp
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
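The difference between the failed and normal probe output above could be checked automatically. The following is only a minimal sketch, assuming the probe output has been captured as text; the function name unresponsive_programs and the FAILED sample (a subset of the lines above) are illustrative, not part of any existing tool. It flags a program only when none of its versions reports "ready and waiting", so the single unavailable rstatd and nlockmgr versions seen in normal operation are not treated as failures.

```python
import re

# Matches lines such as "program 100003 version 2 ready and waiting"
# and "program 100003 version 0 is not available".
LINE_RE = re.compile(
    r"program (\d+) version (\d+) (ready and waiting|is not available)"
)


def unresponsive_programs(probe_output):
    """Return the sorted RPC program numbers for which no version answered."""
    seen = set()
    ready = set()
    for match in LINE_RE.finditer(probe_output):
        program = int(match.group(1))
        seen.add(program)
        if match.group(3) == "ready and waiting":
            ready.add(program)
    return sorted(seen - ready)


# Sample taken from the failed-state output above (rstatd, nfs, nlockmgr, mountd).
FAILED = """\
program 100001 version 3 ready and waiting
program 100001 version 4 is not available
program 100001 version 5 ready and waiting
program 100003 version 0 is not available
program 100021 version 0 is not available
program 100005 version 0 is not available
"""

print(unresponsive_programs(FAILED))  # prints [100003, 100005, 100021]
```

Run against the failed-state sample, this reports nfs (100003), mountd (100005), and nlockmgr (100021), while rstatd (100001) passes because versions 3 and 5 still answer.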
All 32 nfsd processes are still in the process list, but the number of
swapped processes reported by vmstat jumps from 2 to 33 right around
the time the problem occurs. The rpc.mountd, rpc.rquotad, and rpc.statd
processes are also still in the process list along with the 32 nfsd
instances. We don't see anything unusual in any of the other system
stats: memory, CPU, and disk usage all remain steady. Memory
utilization graphs during the problem period appear flatter than is
typical, but the values change little from normal operation.
Does anyone have any ideas about the problem? I'll be sure to grab some
nfsstat and better ps output next time this happens, but are there any
other suggestions for better logging or other relevant data collection?
Have there been any fixes since 2.4.18 that address similar or related
problems?
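One possible way to capture the nfsstat and ps output mentioned above while the hang is in progress is sketched below. This is an assumption-laden sketch, not something from the setup described here: the function name snapshot, the COMMANDS list, and the 30-second timeout are all illustrative choices, and the commands should be adjusted to whatever is installed on the server. A command that is missing or hangs is recorded as a failure rather than stopping the run.

```python
import subprocess
import time

# Assumed command list (illustrative); adjust to the tools on the server.
COMMANDS = [
    ["rpcinfo", "-p"],                     # registered RPC services
    ["nfsstat", "-s"],                     # server-side NFS statistics
    ["vmstat", "1", "5"],                  # five one-second samples
    ["ps", "axo", "pid,stat,wchan,args"],  # process state and wait channel
]


def snapshot(commands=COMMANDS, timeout=30):
    """Run each command and return one timestamped text report."""
    parts = ["snapshot taken " + time.strftime("%Y-%m-%d %H:%M:%S")]
    for argv in commands:
        parts.append("$ " + " ".join(argv))
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            parts.append(result.stdout + result.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung command is noted and the run continues.
            parts.append("failed: %r" % (exc,))
    return "\n".join(parts)
```

Calling print(snapshot()) once during normal operation and once during a lockup would give two reports to compare side by side.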
Thanks for any help with this elusive problem.
-Stephan
--
Stephan B Koledin
Network Systems Developer
http://neolinear.com/