From: Stephan Koledin <skoledin@neolinear.com>
To: nfs@lists.sourceforge.net
Subject: Re: NFS lockups with 2.4.18
Date: Thu, 25 Sep 2003 15:54:39 -0400
Message-ID: <3F7347FF.2000403@neolinear.com>
In-Reply-To: <3F71CB8A.3090208@neolinear.com>

We ran into another NFS lockup this morning. It lasted approximately 
1.5 hours, then cleared up (after a number of consecutive reboots), 
and NFS worked perfectly again.

I managed to collect some more data, in the hope that someone might 
have suggestions.

A process list shows that lockd and all 32 nfsd processes are in the 
D state (uninterruptible sleep). lockd is stuck at a WCHAN of 'down'. 
Also, 31 of the nfsd processes have lost the 'super-user' process flag 
and are now marked 040 instead of the normal 140. The first nfsd and 
lockd are still 140.
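
For reference, the F values above can be decoded with a short sketch like 
the one below. It assumes the F column is a hexadecimal bitmask that 
follows the PROCESS FLAGS table from ps(1) man pages of that era; the 
table and the `decode_flags` helper are illustrative, not taken from 
this report.

```python
# Assumed flag table, as listed in older procps ps(1) man pages.
# The values are hexadecimal bit positions in the F column.
PROCESS_FLAGS = {
    0x001: "ALIGNWARN",   # print alignment warning msgs
    0x002: "STARTING",    # being created
    0x004: "EXITING",     # getting shut down
    0x010: "PTRACED",     # ptrace has been called
    0x020: "TRACESYS",    # tracing system calls
    0x040: "FORKNOEXEC",  # forked but didn't exec
    0x100: "SUPERPRIV",   # used super-user privileges
    0x200: "DUMPCORE",    # dumped core
    0x400: "SIGNALED",    # killed by a signal
}


def decode_flags(field):
    """Return the names of the flag bits set in a ps F column value."""
    value = int(field, 16)
    return [name for bit, name in sorted(PROCESS_FLAGS.items()) if value & bit]


print(decode_flags("140"))  # first nfsd and lockd
print(decode_flags("040"))  # the other 31 nfsd threads
```

Under that assumption, 140 is FORKNOEXEC plus SUPERPRIV and 040 is 
FORKNOEXEC alone, which matches the description of the 'super-user' 
flag being absent on the 31 nfsd processes.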

Interestingly, a process snapshot taken while NFS was still running, 
but shortly before it died, shows the nfsd processes in the normal 'S' 
state, yet all but the first have already lost their super-user process 
flag. lockd is already in the 'D' state. I'm not sure it means much, 
but it does seem strange.

nfsstat doesn't seem to show anything out of the ordinary, but maybe 
someone here will see something relevant.

Once again, thanks for any help you can provide.
-Stephan


A relevant excerpt from `ps -elfjHm` during the lockup:

   F S UID        PID  PPID WCHAN  STIME     TIME CMD
140 D root       337     1 ?      07:36 00:00:00   [nfsd]
140 D root       338     1 down   07:36 00:00:00   [lockd]
040 S root       339   338 ?      07:36 00:00:00     [rpciod]
040 D root       340     1 ?      07:36 00:00:00   [nfsd]
040 D root       341     1 ?      07:36 00:00:00   [nfsd]
040 D root       342     1 ?      07:36 00:00:00   [nfsd]
040 D root       343     1 ?      07:36 00:00:00   [nfsd]
040 D root       344     1 ?      07:36 00:00:00   [nfsd]
040 D root       345     1 ?      07:36 00:00:00   [nfsd]
040 D root       346     1 ?      07:36 00:00:00   [nfsd]
040 D root       347     1 ?      07:36 00:00:00   [nfsd]
040 D root       348     1 ?      07:36 00:00:00   [nfsd]
040 D root       349     1 ?      07:36 00:00:00   [nfsd]
040 D root       350     1 ?      07:36 00:00:00   [nfsd]
040 D root       351     1 ?      07:36 00:00:00   [nfsd]
040 D root       352     1 ?      07:36 00:00:00   [nfsd]
040 D root       353     1 ?      07:36 00:00:00   [nfsd]
040 D root       354     1 ?      07:36 00:00:00   [nfsd]
040 D root       355     1 ?      07:36 00:00:00   [nfsd]
040 D root       356     1 ?      07:36 00:00:00   [nfsd]
040 D root       357     1 ?      07:36 00:00:00   [nfsd]
040 D root       358     1 ?      07:36 00:00:00   [nfsd]
040 D root       359     1 ?      07:36 00:00:00   [nfsd]
040 D root       360     1 ?      07:36 00:00:00   [nfsd]
040 D root       361     1 ?      07:36 00:00:00   [nfsd]
040 D root       362     1 ?      07:36 00:00:00   [nfsd]
040 D root       363     1 ?      07:36 00:00:00   [nfsd]
040 D root       364     1 ?      07:36 00:00:00   [nfsd]
040 D root       365     1 ?      07:36 00:00:00   [nfsd]
040 D root       366     1 ?      07:36 00:00:00   [nfsd]
040 D root       367     1 ?      07:36 00:00:00   [nfsd]
040 D root       368     1 ?      07:36 00:00:00   [nfsd]
040 D root       369     1 ?      07:36 00:00:00   [nfsd]
040 D root       370     1 ?      07:36 00:00:00   [nfsd]
140 S root       373     1 ?      07:36 00:00:00   /usr/sbin/rpc.mountd


An excerpt from `ps -elfjHm` between lockups:

   F S UID        PID  PPID WCHAN  STIME     TIME CMD
140 S root       327     1 ?      07:57 00:00:00   [nfsd]
140 D root       328     1 down   07:57 00:00:00   [lockd]
040 S root       329   328 ?      07:57 00:00:00     [rpciod]
040 S root       330     1 ?      07:57 00:00:00   [nfsd]
040 S root       331     1 ?      07:57 00:00:00   [nfsd]
040 S root       332     1 ?      07:57 00:00:00   [nfsd]
...
040 S root       359     1 ?      07:57 00:00:00   [nfsd]
040 S root       360     1 ?      07:57 00:00:00   [nfsd]
140 S root       363     1 poll   07:57 00:00:00   /usr/sbin/rpc.mountd
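
To compare the two snapshots without reading every row, the excerpts can 
be tallied by command, flag value, and state. The sketch below is a 
minimal example under the assumption that each row has the nine columns 
shown in the headers above; `SAMPLE` holds a few rows copied from the 
lockup excerpt, and `summarize` is a hypothetical helper name.

```python
from collections import Counter

# A few rows copied from the lockup excerpt above (not the full listing).
SAMPLE = """\
   F S UID        PID  PPID WCHAN  STIME     TIME CMD
140 D root       337     1 ?      07:36 00:00:00   [nfsd]
140 D root       338     1 down   07:36 00:00:00   [lockd]
040 S root       339   338 ?      07:36 00:00:00     [rpciod]
040 D root       340     1 ?      07:36 00:00:00   [nfsd]
040 D root       341     1 ?      07:36 00:00:00   [nfsd]
140 S root       373     1 ?      07:36 00:00:00   /usr/sbin/rpc.mountd
"""


def summarize(ps_text):
    """Count rows per (command, F value, state) in a ps excerpt."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 8)  # F S UID PID PPID WCHAN STIME TIME CMD
        if len(fields) < 9 or fields[0] == "F":
            continue  # skip the header, blank lines, and elided "..." rows
        flags, state, cmd = fields[0], fields[1], fields[8].strip()
        counts[(cmd, flags, state)] += 1
    return counts


for key, n in sorted(summarize(SAMPLE).items()):
    print(n, *key)
```

Running the same function over both full excerpts gives a direct 
side-by-side view of which threads changed state or flags.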


nfsstat -s during lockup:

Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
67445      0          0          0          0
Server nfs v2:
null       getattr    setattr    root       lookup     readlink
8       0% 50557  77% 668     1% 0       0% 9828   15% 4       0%
read       wrcache    write      create     remove     rename
1762    2% 0       0% 1434    2% 184     0% 6       0% 168     0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 19      0% 0       0% 675     1% 9       0%

Server nfs v3:
null       getattr    setattr    lookup     access     readlink
69      3% 342    16% 0       0% 108     5% 1601   75% 0       0%
read       write      create     mkdir      symlink    mknod
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
0       0% 0       0% 0       0% 0       0% 3       0% 0       0%
fsstat     fsinfo     pathconf   commit
0       0% 0       0% 0       0% 0       0%


nfsstat -s during normal operation:

Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
8781107    4384       4384       0          0
Server nfs v2:
null       getattr    setattr    root       lookup     readlink
3845    0% 6360155 82% 30228   0% 0       0% 643234  8% 30581   0%
read       wrcache    write      create     remove     rename
298134  3% 0       0% 212600  2% 5476    0% 5240    0% 801     0%
link       symlink    mkdir      rmdir      readdir    fsstat
11      0% 12      0% 336     0% 349     0% 75643   0% 5488    0%

Server nfs v3:
null       getattr    setattr    lookup     access     readlink
6481    0% 942506 84% 0       0% 81868   7% 64802   5% 1119    0%
read       write      create     mkdir      symlink    mknod
9091    0% 0       0% 0       0% 0       0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
0       0% 0       0% 0       0% 0       0% 1673    0% 0       0%
fsstat     fsinfo     pathconf   commit
717     0% 717     0% 0       0% 0       0%
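
Because the two nfsstat samples cover very different call totals, the 
per-operation shares are easier to compare than the raw counts. The 
sketch below recomputes the v3 percentages from the non-zero counts 
listed above; the dictionary names and the `shares` helper are 
illustrative, and truncating to whole percent is an assumption that 
happens to reproduce the figures nfsstat printed here.

```python
# Non-zero NFS v3 counts copied from the two nfsstat -s outputs above.
V3_LOCKUP = {"null": 69, "getattr": 342, "lookup": 108,
             "access": 1601, "readdir": 3}
V3_NORMAL = {"null": 6481, "getattr": 942506, "lookup": 81868,
             "access": 64802, "readlink": 1119, "read": 9091,
             "readdir": 1673, "fsstat": 717, "fsinfo": 717}


def shares(counts):
    """Return each operation's share of the total, truncated to whole percent."""
    total = sum(counts.values())
    return {op: 100 * n // total for op, n in counts.items()}


lockup, normal = shares(V3_LOCKUP), shares(V3_NORMAL)
for op in sorted(set(lockup) | set(normal)):
    print(f"{op:10s} lockup {lockup.get(op, 0):3d}%  normal {normal.get(op, 0):3d}%")
```

This only restates the numbers already in the output; it does not by 
itself say whether the difference in mix is related to the lockup.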


-- 
Stephan B Koledin
Network Systems Developer
http://neolinear.com/




