From: Erik Walthinsen <omega@pdxcolo.net>
To: nfs@lists.sourceforge.net
Subject: NAS server avalanche overload
Date: Wed, 03 Mar 2004 00:31:59 -0800
Message-ID: <1078302718.825.67.camel@localhost>

We have a NAS server currently running a basically stock 2.4.22 kernel,
on a single 2.4GHz Xeon HT with 512MB RAM and 4 SATA 200GB disks on a
3ware 8506-12 in RAID-5.

It serves two machines via switched gigabit (1500 MTU; the switch has no
jumbo-frame support) which each run large numbers of User-mode Linux processes.
Each kernel (there are ~60 total) has one or more open files at all
times, backing the normal ext3 filesystems for these virtual machines. 
They use copy-on-write such that the main distro files are on local
disks and only the differences against this are stored, per machine, on
the NAS.  /home filesystems and such are flat, sparse files.  No attempt
is made at this point to turn atime off (but I can patch the kernel to
change the default if necessary).
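
As a side note on atime: Linux filesystems accept a `noatime` mount option, so a kernel patch may not be needed to test its effect. The sketch below only checks whether a given mount point already carries that option by parsing text in the `/proc/mounts` format. The mount point `/export` is a hypothetical placeholder, not something taken from the setup described here.

```python
def mount_options(mounts_text, mountpoint):
    """Return the set of mount options for mountpoint, or None if absent.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # The last matching entry wins, since later mounts cover earlier ones.
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = set(fields[3].split(","))
    return found


if __name__ == "__main__":
    # "/export" is an assumed path; substitute the real exported filesystem.
    with open("/proc/mounts") as f:
        opts = mount_options(f.read(), "/export")
    if opts is None:
        print("mount point not found")
    else:
        print("noatime set" if "noatime" in opts else "atime updates enabled")
```

If the option is missing, remounting the server's local filesystem with `noatime` is one low-risk experiment, though whether atime updates contribute to the stalls here is untested.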

The problem we're having is that every once in a while the entire system
grinds to a screeching halt with the load average on the NAS box spiking
to 17-18 (with 16 nfsd processes, this means every last one is wedged),
which quickly causes the load on the two client machines to spike as
requests they're making get stuck.  This eventually clears up, but can
last anywhere from 15 seconds to 15+ minutes.  In the meantime, however,
any disk-based operation inside the virtual machines can take a minute
or more to complete.
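
The Linux load average counts processes in uninterruptible sleep (state `D`) as well as runnable ones, which is why 16 wedged nfsd threads push it to 16 or more. A minimal sketch for logging that during a spike follows: it reads `/proc/loadavg` and lists the commands currently in state `D`. It assumes a Python interpreter is available on the NAS box, and the function names are illustrative only.

```python
import glob


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_stat(line):
    """Return (comm, state) from a /proc/<pid>/stat line.

    The line looks like "pid (comm) state ...". comm may itself contain
    spaces or parentheses, so split on the last closing parenthesis.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return comm, state


def blocked_processes():
    """Return the command names of all processes currently in state D."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append(comm)
    return blocked


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("load:", parse_loadavg(f.read()))
    print("uninterruptible:", sorted(blocked_processes()))
```

Run once a second from a loop during a stall, this would show whether it is only the nfsd threads that are blocked or whether other processes (bdflush, kjournald, and so on) are stuck alongside them.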

I've been trying for a long time to track this down with no luck, so now
it's time to see if anyone here has any ideas.

First major datapoint: early in the debugging cycle a large-ish number
of RRD datasets were kept on the NAS box, being updated regularly in an
attempt to spot the culprit.  This instead made the problem
significantly more frequent.  Moving the archives to another machine and
off NFS entirely immediately trimmed an average of 100-200 I/Os per second
off the NAS box, and the problem eased greatly.

Second: the whole process can easily be replicated by running bonnie++
on any of the machines (the NAS, the client, or a virtual machine), and
it appears clearly related to the I/Os per second, but only in cases
where I/Os are not linear.  *Reading* a huge file either locally or
over NFS will cause a very mild form of the overload, but *writing* can
cause it almost instantaneously.

I've tried playing around with bdflush parameters, but without a
dramatically clearer mental picture of how that whole subsystem works, I
have no real chance of coming up with the best direction to move.  A
gradual search isn't really feasible because the spikes are
unpredictable, and artificially generated loads (writing huge files) are
*too* stressful to see any differences.
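
One possible middle ground between waiting for a natural spike and running bonnie++ is a load generator with an adjustable rate. The sketch below issues small writes at random offsets into a sparse scratch file at a fixed number of operations per second, so the rate can be raised step by step between runs. Everything in it (function name, parameters, file path) is hypothetical, and it makes no attempt to mimic the real UML workload.

```python
import os
import random
import time


def throttled_random_writes(path, file_size, block_size, ops_per_sec,
                            duration, seed=0):
    """Write block_size bytes at random block offsets, rate limited.

    Returns the number of writes issued in roughly `duration` seconds.
    """
    rng = random.Random(seed)
    blocks = file_size // block_size
    payload = b"\xa5" * block_size
    interval = 1.0 / ops_per_sec
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    done = 0
    try:
        os.ftruncate(fd, file_size)  # sparse file, like the /home images
        start = time.monotonic()
        next_due = start
        while time.monotonic() - start < duration:
            offset = rng.randrange(blocks) * block_size
            os.pwrite(fd, payload, offset)
            done += 1
            # Sleep until the next slot so the average rate stays fixed.
            next_due += interval
            delay = next_due - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    finally:
        os.close(fd)
    return done


if __name__ == "__main__":
    # Assumed values: 1 GB scratch file, 4 KB writes, 50 writes/s for 60 s.
    n = throttled_random_writes("scratch.img", 1 << 30, 4096, 50, 60)
    print("writes issued:", n)
```

Running it against a file on the NFS mount at 50, 100, 200 writes per second and so on, while changing one bdflush value at a time, might give the gradual search a repeatable baseline.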

I've graphed this thing utterly to death, and anyone interested in
checking it out can see tonight's fiasco at:

http://narsil.pdxcolo.net/graphs/?start=200403022200&duration=1hr

The aforementioned switch away from NAS-based RRD archives can be seen
quite easily at:

http://narsil.pdxcolo.net/graphs/?start=20040208&duration=1week

The graph pages are designed for a full 1600x1200 screen (mine), so it
may be hard to see everything clearly on smaller screens.  Try adding
&width=100&height=50 maybe.  The most relevant link is the NAS debug
page (nasdebug.php?...), which shows more information than the main
graphs page.

What I'd like to know is if anyone has any idea what's really going on
here, or suggestions as to what other data I might gather that would
help diagnose the problem.  Easy solutions (add RAM, tweak a sysctl,
etc.) would be *greatly* appreciated ;-)
-- 
- Omega
  aka Erik Walthinsen
  omega@pdxcolo.net



