From: Stephan Koledin <skoledin@neolinear.com>
To: nfs@lists.sourceforge.net
Subject: NFS lockups with 2.4.18
Date: Wed, 24 Sep 2003 12:51:22 -0400
Message-ID: <3F71CB8A.3090208@neolinear.com>

Hello All-

We're running into some NFS lockups here and having a rough time 
debugging/solving the issue. I'm hoping that someone on the list may be 
able to provide some suggestions.

Background:

Unpredictably, but approximately every week or two, we run into a
problem where our file server stops responding to NFS requests. The
server runs Debian 3.0 (woody) with the stock Debian 2.4.18-1-686
kernel. We have also seen the same problem with the Debian bf2.4 kernel.
The server has a single 2.6GHz P4 processor, >1GB RAM, and several large
(>500GB) SCSI-160 arrays using ext3 filesystems with quotas enabled.

The problem only seems to be resolvable with a reboot, and even then
it will often recur within a few minutes, requiring another reboot
(1-5 times) before the server finally settles down and stays stable
for 1-14+ days.

We have not noticed anything strange/interesting in any of the logs or 
on the console. All other services/processes on the machine continue to 
operate perfectly, including disk operations. The machine simply stops 
serving NFS.

We serve NFS to Linux, Solaris, and HP clients, several versions of each OS.

Some Relevant Data (from when NFS was not working properly):

$ rpcinfo -p

    program vers proto   port
     100000    2   tcp    111  portmapper
     100000    2   udp    111  portmapper
     100029    1   udp    845  keyserv
     100029    2   udp    845  keyserv
     100011    1   udp    852  rquotad
     100011    2   udp    852  rquotad
     100011    1   tcp    855  rquotad
     100011    2   tcp    855  rquotad
     100024    1   udp  32772  status
     100024    1   tcp  32768  status
     100001    1   udp  32773  rstatd
     100001    2   udp  32773  rstatd
     100001    3   udp  32773  rstatd
     100001    4   udp  32773  rstatd
     100001    5   udp  32773  rstatd
     100003    2   udp   2049  nfs
     100003    3   udp   2049  nfs
     100021    1   udp  32774  nlockmgr
     100021    3   udp  32774  nlockmgr
     100021    4   udp  32774  nlockmgr
     100005    1   udp  32775  mountd
     100005    1   tcp  32769  mountd
     100005    2   udp  32775  mountd
     100005    2   tcp  32769  mountd
     100005    3   udp  32775  mountd
     100005    3   tcp  32769  mountd

$ rpcinfo [-u | -t] <host> <program>

portmapper udp
program 100000 version 2 ready and waiting
portmapper tcp
program 100000 version 2 ready and waiting

keyserv
program 100029 version 1 ready and waiting
program 100029 version 2 ready and waiting

rquotad udp
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting
rquotad tcp
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting

status udp
program 100024 version 1 ready and waiting
status tcp
program 100024 version 1 ready and waiting

rstatd udp
program 100001 version 1 ready and waiting
program 100001 version 2 ready and waiting
program 100001 version 3 ready and waiting
program 100001 version 4 is not available
program 100001 version 5 ready and waiting

nfs udp
program 100003 version 0 is not available

nlockmgr udp
program 100021 version 0 is not available

mountd udp
program 100005 version 0 is not available
mountd tcp
program 100005 version 0 is not available


(For comparison, the normal output is as follows.)
nfs udp
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting

nlockmgr udp
program 100021 version 1 ready and waiting
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

mountd udp
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
mountd tcp
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
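
The difference between the two sets of probe output can be checked mechanically. The following is only a rough sketch, assuming the output of "rpcinfo -u" or "rpcinfo -t" has been captured as text; the function names are hypothetical and the sample strings are copied from the output above.

```python
import re

# Matches lines such as "program 100003 version 2 ready and waiting".
PROBE_LINE = re.compile(
    r"program (\d+) version (\d+) (ready and waiting|is not available)"
)


def ready_versions(probe_output):
    """Return the sorted versions that answered 'ready and waiting'."""
    versions = []
    for line in probe_output.splitlines():
        match = PROBE_LINE.search(line)
        if match and match.group(3) == "ready and waiting":
            versions.append(int(match.group(2)))
    return sorted(versions)


def is_responding(probe_output):
    """A service counts as responding if at least one version answered."""
    return bool(ready_versions(probe_output))


# Samples taken from the probe output quoted in this message.
hung_nfs = "program 100003 version 0 is not available\n"
healthy_nfs = (
    "program 100003 version 2 ready and waiting\n"
    "program 100003 version 3 ready and waiting\n"
)

print(is_responding(hung_nfs))     # False
print(is_responding(healthy_nfs))  # True
```

Run periodically against nfs, nlockmgr, and mountd, a check like this could timestamp the moment the server stops answering, which the logs currently do not show.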


All nfsd processes (32) are still in the process list, but the number
of swapped processes reported by vmstat jumps from 2 to 33 right around
the time the problem occurs. The rpc.mountd, rpc.rquotad, and rpc.statd
processes are also still in the process list along with the 32 nfsd
instances. We don't see anything unusual in any of the other system
stats: memory, CPU, and disk usage all remain steady. The memory
utilization graphs are flatter than is typical during the problem
period, but the values change little from normal operation.
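
Since the nfsd instances stay in the process list, their per-process state is probably the next thing worth recording. A minimal sketch, assuming two-column output such as "ps -eo stat,comm" produces (STAT, then COMMAND); the function name and the sample text are made up for illustration and are not real data from our server.

```python
from collections import Counter


def nfsd_state_counts(ps_output):
    """Count nfsd processes by the first letter of their STAT field."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Expect two columns: STAT and COMMAND. The header line and all
        # non-nfsd processes are skipped by the command-name check.
        if len(fields) >= 2 and fields[1] in ("nfsd", "[nfsd]"):
            counts[fields[0][0]] += 1
    return dict(counts)


# Hypothetical sample in the assumed two-column format.
sample = (
    "STAT COMMAND\n"
    "S    init\n"
    "D    nfsd\n"
    "D    nfsd\n"
    "S    nfsd\n"
    "S    rpc.mountd\n"
)

print(nfsd_state_counts(sample))  # {'D': 2, 'S': 1}
```

Comparing these counts before and during a lockup would show whether the nfsd instances all change state together when the vmstat figure jumps.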

Does anyone have any ideas about the problem? I'll be sure to grab some
nfsstat and better ps output next time this happens. Are there any other
suggestions for better logging or other relevant data collection?
Have there been any fixes since 2.4.18 that address similar or related
problems?
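
For the nfsstat and ps capture, something along these lines could be run by hand or from cron the next time the server stops responding. It is only a sketch under assumptions: the command list, the timeout, and the directory layout are illustrative choices and would need adjusting for the tools installed on the server.

```python
import subprocess
import time
from pathlib import Path

# Assumed set of commands to record; adjust to what the server has.
DEFAULT_COMMANDS = {
    "nfsstat": ["nfsstat", "-s"],
    "ps": ["ps", "axo", "pid,stat,wchan,comm"],
    "rpcinfo": ["rpcinfo", "-p"],
}


def capture(argv, timeout=10):
    """Run one command; return its output, or a note on why it failed."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"[{' '.join(argv)} failed: {exc}]\n"
    return done.stdout + done.stderr


def snapshot(outdir, commands=None):
    """Write each command's output to <outdir>/<timestamp>/<name>.txt."""
    commands = DEFAULT_COMMANDS if commands is None else commands
    target = Path(outdir) / time.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    for name, argv in commands.items():
        (target / f"{name}.txt").write_text(capture(argv))
    return target
```

Calling snapshot() with a directory on a local (non-NFS) filesystem keeps the capture itself from hanging on the failing service, and the timeout keeps one stuck command from blocking the rest.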

Thanks for any help with this elusive problem.

-Stephan

-- 
Stephan B Koledin
Network Systems Developer
http://neolinear.com/
