From: Kim Holviala <kim@holviala.com>
To: comsatcat <comsatcat@earthlink.net>
Cc: nfs@lists.sourceforge.net
Subject: Re: Spontaneous server reboot with 2.6.10 and nfsd
Date: Sat, 12 Feb 2005 10:02:46 +0200
Message-ID: <420DB826.1080506@holviala.com>
In-Reply-To: <1108153050.9386.3.camel@solaris.skunkware.org>
comsatcat wrote:
> I'm not sure if this is related or not, but on a batch of 8 servers
> running 2.6.9 and 2.6.10 pushing 300-600mb/s we're seeing the same thing
> using 32k r/wsize w/ jumbo frames (MTU 9000).
I tried kernels from 2.6.8.1 -> 2.6.11-rc3 and they all did the same.
The 11-rc3 seemed to work a bit better - it lasted about 3 seconds longer
than the others before it rebooted itself.
> We push all ranges of
> files (few bytes -> 2+ gigs), so we haven't been able to link this to
> specific file sizes.
Yeah, I have gotten it to crash with small files too - it's just
that it seems to crash more easily with big ones.
> Do you have a kernel version that used to work for you that I can test
> on some of our boxes?
2.2.18?
:-)
Seriously, that was the last time NFS was really stable... And that was
with the user-space nfs server.
The problem is that I use NFS mostly for read-only things, so I hadn't
run into this particular problem before. But the other day I needed to
dump a CDROM to the server, and since I had a rw mount I decided to just
copy it there - and that's where the problems started. Reading stuff from
NFS seems to work just fine no matter what I do.
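For anyone who wants to exercise the write path without a CDROM at hand,
here is a minimal Python sketch that streams one big file onto a mount.
The function name, the 32k chunk size and the example path are made up
for illustration, and it is only a guess that plain sequential writes
are enough to trigger the reboot. Only try it against a box you can
afford to lose for a moment.

```python
import os


def write_test_file(path, total_bytes, chunk_bytes=32 * 1024):
    """Write total_bytes of zeros to path in fixed-size chunks.

    Returns the number of bytes written. The data is flushed and
    fsync()ed before returning so that it is actually pushed to the
    server instead of sitting in the client's page cache.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last chunk may be shorter than chunk_bytes.
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written
```

Something like write_test_file("/mnt/nfs/crashtest.bin", 700 * 1024 * 1024)
would roughly match one CD image; the mount point is hypothetical.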
> Note we are also running Gentoo 2004.3 on all 8 servers.
Oh, mine is Debian Sarge, the clients were both Debian and Gentoo. I
think I'll switch to *BSD or x86 Solaris on my NFS servers...
Kim
> On Fri, 2005-02-11 at 14:56 +0200, Kim Holviala wrote:
>
>>I already posted this to LKML, but I don't think anyone was interested
>>there... Here's the original posting:
>>
>>===============
>>I hit an obscure bug last night when trying to copy files from an nfs
>>client to my nfs server. The server is a P3/800 with three IDE disks in
>>software RAID5 running vanilla 2.6.10 and Debian Sarge. The network is
>>local 100Mbit/s switched ethernet. The server exports a 220 gig
>>partition which contains a lot of data.
>>
>>Oh, kernel configs and stuff from the server can be found from:
>>http://www.holviala.com/~kimmy/crash/
>>
>>Anyway, I mount the export on a Linux client (tried a few with
>>different 2.6 kernels and distros) and then start copying files from
>>the client's CDROM to the server through NFS. After copying a few small
>>files, the first big one reboots the server. There are no log entries,
>>and the server has no local console so I don't know what happens. This
>>is reproducible 100% of the time.
>>To narrow down the problem, I've tried the following:
>>
>>- copied files from a different client running Gentoo: reboot
>>- exported a non-raided partition (hdc9) and tried that: reboot
>>- switched 2.6.10 to 2.6.11-rc3: reboot, but it took longer
>>
>>I hope it's just something that I've done, but this server has been in
>>use for a long time now without any problems, and I haven't touched it
>>for a while.
>>
>>So, if anyone knows what's wrong, or can tell me a way to debug the
>>situation more I'd be grateful. The server is in a place where it's
>>nearly impossible to have a local console - I could probably use a
>>serial one if necessary for debugging.
>>===============
>>
>>So, that was my original posting. Since then I've tried localhost
>>mounts, tcp, udp, different r/wsizes etc etc. I can still reliably
>>reboot the server remotely just by copying something to the NFS mount :-/.
>>
>>Now, there are two things that I've tested that worked better than
>>others: First I switched to async exports, mounted localhost:/export/tmp
>>with udp and copied stuff there. The copy hung
>>(http://www.holviala.com/~kimmy/crash/nfsd.log) but the server didn't
>>crash. Woo! Tried that remotely and it once again rebooted the server...
>>
>>And then I made one test with tcp,rsize=1024,wsize=1024 again with
>>localhost:/export/tmp, and that worked ok. I haven't had the time to
>>test that remotely, yet.
>>
>>So, I can only assume that there's something wrong with using an r/wsize
>>that is bigger than the MTU. However, I run a lot of stuff through that
>>same network and I never see any TCP retransmissions or any other
>>problems. Besides, I'm getting the same reboot even with localhost NFS
>>mounts.
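To put rough numbers on the r/wsize vs MTU suspicion: over UDP each NFS
write goes out as a single datagram, so anything larger than the MTU is
split into IP fragments. A minimal Python sketch of that arithmetic,
assuming IPv4 with a 20-byte header and no options. The 128-byte
allowance for RPC/NFS headers is a guess for illustration, not a
measured value, and the function name is made up.

```python
import math


def udp_fragments(wsize, mtu=1500, rpc_overhead=128):
    """Estimate the IP fragments needed for one NFS-over-UDP write.

    rpc_overhead is an assumed allowance for the RPC and NFS headers
    that travel in front of the wsize bytes of file data.
    """
    ip_header = 20   # IPv4 header without options
    udp_header = 8
    # Fragment payloads (except the last) must be a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    datagram = udp_header + rpc_overhead + wsize
    return math.ceil(datagram / per_fragment)
```

Under those assumptions a 1500-byte MTU gives 1 fragment for wsize=1024,
6 for wsize=8192 and 23 for wsize=32768; with MTU 9000 a 32k write still
needs 4. None of this applies to tcp mounts, where TCP does its own
segmentation.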
>>
>>I have managed to capture some logs with nfsd logging on, those can be
>>found from the above link.
>>
>>I'd be grateful for any pointers, debugging flags, anything. I've
>>crashed my server now maybe three dozen times trying to narrow the
>>problem down....
>>
>>
>>Kim
>>
>>-------------------------------------------------------
>>SF email is sponsored by - The IT Product Guide
>>Read honest & candid reviews on hundreds of IT Products from real users.
>>Discover which products truly live up to the hype. Start reading now.
>>http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>>_______________________________________________
>>NFS maillist - NFS@lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/nfs