From: Kenneth Sumrall <ksumrall@pacbell.net>
To: nfs@lists.sourceforge.net
Subject: NFS server hang, looking for suggestions
Date: Wed, 20 Apr 2005 22:10:36 -0700
Message-ID: <426735CC.9020900@pacbell.net>
At work, we have a very large (5.6 TB) SCSI RAID unit, which is formatted
as a single XFS filesystem. It is connected to a SuperMicro 6012P-6 dual CPU
Pentium-4 server. The server is running SUSE 9.2, but we've upgraded
the kernel from the 2.6.8 that shipped with it to 2.6.11.7 from kernel.org.
The server exports the XFS filesystem using the kernel NFSD, over NFS version 3.
The machine has recently been hanging on a regular basis. We think it's
related to NFS as the hangs often occur during a time in our nightly builds
when a bunch of machines are all writing data to the server at the same time.
However, sometimes the hangs occur when the write load is not as heavy.
The things we've tried are:
* Swapped the server box with a spare, just to make sure it's not a hardware
  problem.
* Tried booting with "nosmp noapic" in case SMP was causing us problems.
* Updated to 2.6.11.7, because I read about a problem exporting XFS over NFS
  in 2.6.8. One thing I'm not clear on: could the 2.6.8 XFS-over-NFS bug
  cause XFS filesystem corruption? Should I run xfs_check on my XFS
  filesystem?
We recently re-cabled a bunch of the clients for this machine, and in the
process, removed a choke point where 13 of our clients were funnelled through
a 100 Mb/s ethernet switch. That could have caused major fragmentation issues,
which I've read are a bad thing. It's only been 1 day since we did that, so
there is no data yet on whether things are better.
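One way to get data on this, assuming the fragmentation in question is IP
fragmentation of large NFS-over-UDP requests (that is my guess, not something
we have verified), is to watch the kernel's reassembly failure counter. A rough
sketch that pulls ReasmFails out of /proc/net/snmp; the function name
ip_reasm_fails is made up for illustration:

```shell
# Print the IP "ReasmFails" counter: datagrams the kernel gave up on
# because not all fragments arrived. /proc/net/snmp has one "Ip:" line
# naming the columns, followed by one "Ip:" line holding the values.
# An alternate file can be passed as $1 (handy for trying it on a copy).
ip_reasm_fails() {
    awk '
        $1 == "Ip:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Ip:"          { print $(col["ReasmFails"]) }
    ' "${1:-/proc/net/snmp}"
}
```

Running ip_reasm_fails on the server before and after a nightly build, and
comparing the two numbers, should show whether fragments are being lost.
"netstat -s" reports the same counter in its Ip section.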
Other things to note. Because the RAID is so big, we are running XFS directly
on the raw disk device, not a partition. The partition format seems to have
problems with sizes over 2 terabytes. Of course, I had to turn on CONFIG_LBD
in order to access such a large block device.
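My understanding, which may be incomplete, is that both limits come from
32-bit sector counts: the MS-DOS partition table stores start and length as
32-bit sector numbers, and a 32-bit kernel without CONFIG_LBD uses a 32-bit
sector type counted in 512-byte units. The arithmetic, as a small sketch (the
function name is only for illustration):

```shell
# Largest size, in bytes, addressable with a 32-bit sector count
# for a given sector size in bytes.
max_bytes_32bit() {
    sector_size=$1
    echo $(( 4294967296 * sector_size ))   # 2^32 sectors
}

max_bytes_32bit 512     # prints 2199023255552 (2 TiB)
max_bytes_32bit 2048    # prints 8796093022208 (8 TiB)
```

The 2048-byte figure is only the arithmetic. I don't know whether the
partitioning tools or the kernel actually count partition table entries in
2048-byte sectors on this device.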
The ethernet interface is an e1000 gigabit interface. It plugs directly into
our main Foundry ethernet switch. The clients all have 100 Mbit interfaces, but
there's a bunch of them.
Also, the RAID uses a sector size of 2048 bytes, not the typical 512 bytes.
The SCSI controller in the server is an Adaptec Ultra160 chip, and we're using
the aic7xxx driver.
Does anyone have any suggestions on how to further diagnose our problem? I've
not used magic sysrq before, but I'm thinking that dumping the list of
current tasks and the registers might be useful to see if it hangs in the
same place every time. Or I could apply the KGDB patch, and try using that.
Does anyone have any other ideas on how to diagnose this? Any known problems
I'm not aware of? I'd really like to make this server rock solid.
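For reference, this is roughly what I think the sysrq approach would look
like, going by the kernel's sysrq documentation. It is untested here. The
helper names are made up, and the SYSRQ_TRIGGER override exists only so the
sketch can be dry-run against a scratch file:

```shell
# Where sysrq command letters are written. Overridable for dry runs.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Send one sysrq command letter to the kernel (needs root on a real system).
sysrq() {
    printf '%s\n' "$1" > "$SYSRQ_TRIGGER"
}

# 't' dumps the list of current tasks with their stack traces,
# 'p' dumps the current registers and flags.
# The output goes to the kernel log and console, not to stdout.
dump_hang_state() {
    sysrq t
    sysrq p
}
```

On the server this would need sysrq enabled first (echo 1 >
/proc/sys/kernel/sysrq), then dump_hang_state as root, then reading the result
with dmesg. If the hang is bad enough that no shell will run, the same dumps
can be requested from the console keyboard with Alt-SysRq-T and Alt-SysRq-P,
ideally with a serial console attached to capture the output.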
Thanks.
Ken Sumrall
ksumrall@pacbell.net