* Spontaneous server reboot with 2.6.10 and nfsd
@ 2005-02-11 12:56 Kim Holviala
  2005-02-11 20:17 ` comsatcat
  0 siblings, 1 reply; 3+ messages in thread
From: Kim Holviala @ 2005-02-11 12:56 UTC
  To: nfs

I already posted this to LKML, but I don't think anyone was interested 
there... Here's the original posting:

===============
I hit an obscure bug last night when trying to copy files from an NFS
client to my NFS server. The server is a P3/800 with three IDE disks in
software RAID5 running vanilla 2.6.10 and Debian Sarge. The network is a
local 100 Mbit/s switched Ethernet. The server exports a 220 GB
partition which contains a lot of data.

Oh, kernel configs and stuff from the server can be found at:
http://www.holviala.com/~kimmy/crash/

Anyway, I mount the export on a Linux client (I've tried a few, with
different 2.6 kernels and distros) and then start copying files from the
client's CD-ROM to the server over NFS. After a few small files have been
copied, the first big one reboots the server. There are no log entries,
and the server has no local console, so I don't know what happens. This
is reproducible 100% of the time.

To narrow down the problem, I've tried the following:

- copied files from a different client running Gentoo: reboot
- exported a non-raided partition (hdc9) and tried that: reboot
- switched 2.6.10 to 2.6.11-rc3: reboot, but it took longer

I hope it's just something that I've done, but this server has been in
use for a long time now without any problems, and I haven't touched it
for a while.

So, if anyone knows what's wrong, or can suggest a way to debug this
further, I'd be grateful. The server is in a place where it's nearly
impossible to have a local console; I could probably use a serial
console if necessary for debugging.
===============

So, that was my original posting. Since then I've tried localhost 
mounts, TCP, UDP, different r/wsizes, etc. I can still reliably 
reboot the server remotely just by copying something to the NFS mount :-/.

Two of my tests worked better than the rest. First, I switched to 
async exports, mounted localhost:/export/tmp over UDP and copied files 
there. The copy hung (http://www.holviala.com/~kimmy/crash/nfsd.log), 
but the server didn't crash. Woo! I tried the same thing remotely and 
it rebooted the server once again...

Then I ran one test with tcp,rsize=1024,wsize=1024, again against 
localhost:/export/tmp, and that worked OK. I haven't had time to 
test that remotely yet.

So I can only assume that something goes wrong when the r/wsize is 
bigger than the MTU. However, I run a lot of traffic through that 
same network and never see any TCP retransmissions or other 
problems. Besides, I get the same reboot even with localhost NFS 
mounts.
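The r/wsize vs. MTU hypothesis can be made concrete with a little arithmetic. This is an illustrative sketch added here, not something from the thread: assuming NFS over UDP on IPv4, a 1500-byte Ethernet MTU, a 20-byte IP header (no options) and an 8-byte UDP header, and ignoring the RPC and NFS headers (so real counts can be slightly higher), it counts how many IP fragments a single read or write of a given size becomes. The function name `udp_fragments` is hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes; assumes no IP options
UDP_HEADER = 8    # bytes

def udp_fragments(payload_bytes: int, mtu: int = 1500) -> int:
    """Count the IPv4 fragments needed to carry one UDP datagram.

    Each fragment carries at most (mtu - IPV4_HEADER) bytes of the
    datagram, rounded down to a multiple of 8 because IPv4 fragment
    offsets are expressed in 8-byte units. RPC/NFS headers are ignored.
    """
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    datagram = payload_bytes + UDP_HEADER  # UDP header is part of what gets split
    return math.ceil(datagram / per_fragment)

if __name__ == "__main__":
    for size in (1024, 8192, 32768):
        print(f"r/wsize={size}: {udp_fragments(size)} fragment(s)")
```

Under these assumptions an r/wsize of 1024 fits in one frame, while 8192 bytes is split into six fragments that must all arrive before the datagram can be reassembled. The arithmetic does not cover TCP mounts, where TCP segments the stream itself rather than relying on IP fragmentation, nor localhost mounts, which never touch the Ethernet; that is consistent with the caveat above that the reboot also happens with localhost mounts.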

I have managed to capture some logs with nfsd logging on; they can be 
found at the link above.

I'd be grateful for any pointers, debugging flags, anything. I've 
crashed my server now maybe three dozen times trying to narrow the 
problem down....



Kim




_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs



Thread overview: 3+ messages
2005-02-11 12:56 Spontaneous server reboot with 2.6.10 and nfsd Kim Holviala
2005-02-11 20:17 ` comsatcat
2005-02-12  8:02   ` Kim Holviala
