* NFS lockups with 2.4.18
@ 2003-09-24 16:51 Stephan Koledin
From: Stephan Koledin @ 2003-09-24 16:51 UTC (permalink / raw)
  To: nfs

Hello All-

We're running into some NFS lockups here and having a rough time 
debugging/solving the issue. I'm hoping that someone on the list may be 
able to provide some suggestions.

Background:

At unpredictable intervals, roughly every week or two, our file server
stops responding to NFS requests. The server runs Debian 3.0 (woody)
with the stock Debian 2.4.18-1-686 kernel; we have also seen the same
problem with the Debian bf2.4 kernel. It has a single 2.6GHz P4
processor, >1GB RAM, and several large (>500GB) SCSI-160 arrays using
ext3 filesystems with quotas enabled.

The problem only seems to be resolvable with a reboot, and even then it
often recurs within a few minutes, requiring repeated reboots (1-5
times) before the server finally settles down and stays stable for
1-14+ days.

We have not noticed anything strange/interesting in any of the logs or 
on the console. All other services/processes on the machine continue to 
operate perfectly, including disk operations. The machine simply stops 
serving NFS.

We serve NFS to Linux, Solaris, and HP clients, several versions of each OS.

Some Relevant Data (from when NFS was not working properly):

$ rpcinfo -p

    program vers proto   port
     100000    2   tcp    111  portmapper
     100000    2   udp    111  portmapper
     100029    1   udp    845  keyserv
     100029    2   udp    845  keyserv
     100011    1   udp    852  rquotad
     100011    2   udp    852  rquotad
     100011    1   tcp    855  rquotad
     100011    2   tcp    855  rquotad
     100024    1   udp  32772  status
     100024    1   tcp  32768  status
     100001    1   udp  32773  rstatd
     100001    2   udp  32773  rstatd
     100001    3   udp  32773  rstatd
     100001    4   udp  32773  rstatd
     100001    5   udp  32773  rstatd
     100003    2   udp   2049  nfs
     100003    3   udp   2049  nfs
     100021    1   udp  32774  nlockmgr
     100021    3   udp  32774  nlockmgr
     100021    4   udp  32774  nlockmgr
     100005    1   udp  32775  mountd
     100005    1   tcp  32769  mountd
     100005    2   udp  32775  mountd
     100005    2   tcp  32769  mountd
     100005    3   udp  32775  mountd
     100005    3   tcp  32769  mountd

$ rpcinfo [-u | -t] <host> <program>

portmapper udp
program 100000 version 2 ready and waiting
portmapper tcp
program 100000 version 2 ready and waiting

keyserv
program 100029 version 1 ready and waiting
program 100029 version 2 ready and waiting

rquotad udp
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting
rquotad tcp
program 100011 version 1 ready and waiting
program 100011 version 2 ready and waiting

status udp
program 100024 version 1 ready and waiting
status tcp
program 100024 version 1 ready and waiting

rstatd udp
program 100001 version 1 ready and waiting
program 100001 version 2 ready and waiting
program 100001 version 3 ready and waiting
program 100001 version 4 is not available
program 100001 version 5 ready and waiting

nfs udp
program 100003 version 0 is not available

nlockmgr udp
program 100021 version 0 is not available

mountd udp
program 100005 version 0 is not available
mountd tcp
program 100005 version 0 is not available


(For comparison, the expected output during normal operation is as follows.)
nfs udp
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting

nlockmgr udp
program 100021 version 1 ready and waiting
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

mountd udp
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
mountd tcp
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
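
The difference between the two outputs above could be checked automatically. The following is only a rough Python sketch, assuming the text printed by "rpcinfo -u" or "rpcinfo -t" has already been captured into a string; the function names are hypothetical. It treats a program as silent when none of its versions report "ready and waiting", which is what nfs, nlockmgr, and mountd look like during the failure.

```python
import re

# Matches lines such as "program 100003 version 2 ready and waiting"
# or "program 100003 version 0 is not available".
LINE_RE = re.compile(
    r"program (\d+) version (\d+) (ready and waiting|is not available)"
)


def responding_versions(rpcinfo_output):
    """Map each program number to the sorted list of versions that answered."""
    result = {}
    for match in LINE_RE.finditer(rpcinfo_output):
        program = int(match.group(1))
        version = int(match.group(2))
        # Record the program even if this particular version did not answer.
        versions = result.setdefault(program, [])
        if match.group(3) == "ready and waiting":
            versions.append(version)
    return {program: sorted(v) for program, v in result.items()}


def silent_programs(rpcinfo_output):
    """Return program numbers for which no version at all answered."""
    answered = responding_versions(rpcinfo_output)
    return sorted(program for program, v in answered.items() if not v)
```

A single unavailable version among working ones (such as nlockmgr version 2 in the normal output) is deliberately not flagged; only programs with no working version are reported.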


All 32 nfsd processes are still in the process list, but the number of
swapped processes reported by vmstat jumps from 2 to 33 right around
the time the problem occurs. The rpc.mountd, rpc.rquotad, and rpc.statd
processes are also still running. We don't see anything unusual in the
other system stats: memory, CPU, and disk usage all remain steady.
Memory utilization graphs during the problem period appear flatter than
is typical, but the values change little from normal operation.
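
One thing that may help narrow this down is tallying the nfsd threads by process state. A minimal sketch in Python follows, assuming the output of something like "ps axo stat,comm" has been saved as text; the function name and the two-column layout are assumptions, not output we have captured. A state starting with "D" means uninterruptible sleep.

```python
from collections import Counter


def nfsd_states(ps_output, name="nfsd"):
    """Count processes called `name` by the first letter of their STAT field.

    Expects two whitespace-separated columns per line: STAT, then COMMAND.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        # Skip blank lines, the header, and every other process.
        if len(fields) == 2 and fields[1].strip() == name:
            counts[fields[0][0]] += 1
    return counts
```

If most of the 32 threads turn out to be in state "D", that would suggest they are blocked inside the kernel rather than merely idle.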

Does anyone have any ideas about the problem? I'll be sure to grab some
nfsstat and better ps output next time this happens, but are there any
other suggestions for better logging or other relevant data collection?
Have there been any fixes since 2.4.18 that address similar or related
problems?
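
For the data collection mentioned above, something along these lines could be run as soon as NFS stops answering. This is only a sketch under assumptions: the command list is a hypothetical starting point (rpcinfo, nfsstat, ps, vmstat), and the helper names are made up for illustration.

```python
import subprocess
import time

# Hypothetical command set; adjust to whatever tools are installed on the server.
DEFAULT_COMMANDS = [
    ["rpcinfo", "-p"],
    ["nfsstat", "-s"],
    ["ps", "axo", "pid,stat,wchan,comm"],
    ["vmstat", "1", "5"],
]


def run_command(argv, timeout=30):
    """Run one command and return its combined output, or a failure note."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return done.stdout + done.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "failed: %s\n" % exc


def snapshot(commands=DEFAULT_COMMANDS, runner=run_command, now=time.time):
    """Return one timestamped text report covering every command."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now()))
    parts = []
    for argv in commands:
        header = "=== %s: %s ===\n" % (stamp, " ".join(argv))
        parts.append(header + runner(argv))
    return "\n".join(parts)
```

Appending the string returned by snapshot() to a log file during each incident would give comparable data from one lockup to the next. The runner argument exists so the report format can be exercised without touching the real system.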

Thanks for any help with this elusive problem.

-Stephan

-- 
Stephan B Koledin
Network Systems Developer
http://neolinear.com/


