* RE: NAS server avalanche overload
@ 2004-03-04  2:07 Lever, Charles
  0 siblings, 0 replies; 12+ messages in thread
From: Lever, Charles @ 2004-03-04  2:07 UTC (permalink / raw)
  To: Greg Banks, Erik Walthinsen; +Cc: nfs

> > > The noatime option has no effect over NFS.  Your [rw]sizes are
> > > really quite small, try 8K.  Also, try turning off noac.
> > The rwsizes were set based on a set of experiments with different sizes,
> > with 4k yielding by far the best bandwidth performance.  The limiting
> > factor there is that the gig switch we have atm doesn't handle jumbo
> > frames.
>
> That really shouldn't make a difference; both 4K and 8K IOs will end
> up being split over multiple ethernet frames.

it does make a difference if you are using UDP and your network
suffers from bursty congestion or buffer overruns.

erik, you should look for network problems so you can boost your
transfer size without suffering a loss in performance.
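
To illustrate the point about UDP: an NFS request larger than the MTU is
sent as one UDP datagram split into several IP fragments, and losing any
one fragment loses the whole datagram. The sketch below estimates the
fragment count and the resulting datagram loss rate. The 200-byte RPC/NFS
header overhead, the 1% frame loss rate, and the function names are
illustrative assumptions, not measurements from this network.

```python
import math

def ip_fragments(udp_payload, mtu=1500):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    ip_header = 20    # IPv4 header without options
    udp_header = 8
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((udp_payload + udp_header) / per_fragment)

def datagram_loss(frame_loss, fragments):
    """Chance that at least one fragment is lost (independent losses assumed)."""
    return 1.0 - (1.0 - frame_loss) ** fragments

RPC_OVERHEAD = 200   # assumed header bytes per request, illustrative only
FRAME_LOSS = 0.01    # assumed per-frame loss during a burst, illustrative only

for size in (4096, 8192):
    n = ip_fragments(size + RPC_OVERHEAD)
    print(size, "bytes:", n, "fragments, datagram loss",
          round(datagram_loss(FRAME_LOSS, n), 4))
```

Under these assumptions a 4K transfer needs 3 fragments and an 8K transfer
needs 6, so the larger size roughly doubles the chance that a whole
request has to be retransmitted.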


_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* NAS server avalanche overload
@ 2004-03-03  8:31 Erik Walthinsen
  2004-03-03 22:02 ` Erik Walthinsen
  0 siblings, 1 reply; 12+ messages in thread
From: Erik Walthinsen @ 2004-03-03  8:31 UTC (permalink / raw)
  To: nfs

We have a NAS server currently running a basically stock 2.4.22 kernel,
on a single 2.4GHz Xeon HT with 512MB RAM and 4 SATA 200GB disks on a
3ware 8506-12 in RAID-5.

It serves two machines via switched gigabit (1500 MTU, no jumbo-frame
support on the switch), each of which runs large numbers of User-mode
Linux processes.
Each kernel (there are ~60 total) has one or more open files at all
times, backing the normal ext3 filesystems for these virtual machines. 
They use copy-on-write such that the main distro files are on local
disks and only the differences against this are stored, per machine, on
the NAS.  /home filesystems and such are flat, sparse files.  No attempt
is made at this point to turn atime off (but I can patch the kernel to
change the default if necessary).

The problem we're having is that every once in a while the entire system
grinds to a screeching halt with the load average on the NAS box spiking
to 17-18 (with 16 nfsd processes, this means every last one is wedged),
which quickly causes the load on the two client machines to spike as
requests they're making get stuck.  This eventually clears up, but can
last anywhere from 15 seconds to 15+ minutes.  In the meantime, however,
any disk-based operation inside the virtual machines can take a minute
or more to complete.
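
One way to confirm that the load spike really is every nfsd thread blocked
on I/O: the Linux load average counts tasks in uninterruptible sleep
(state D) as well as runnable ones, so sampling /proc during a spike shows
it directly. This is a minimal sketch; the function names are
hypothetical, and it assumes the server threads show up in /proc with the
command name "nfsd".

```python
from pathlib import Path

def parse_loadavg(text):
    """Split the contents of /proc/loadavg into named fields."""
    one, five, fifteen, running_total, last_pid = text.split()
    running, total = running_total.split("/")
    return {"load1": float(one), "load5": float(five),
            "load15": float(fifteen),
            "running": int(running), "total": int(total)}

def parse_stat(line):
    """Return (command name, state) from one /proc/<pid>/stat line."""
    # The command name is in parentheses and may contain spaces,
    # so split on the last ')' rather than on whitespace.
    head, tail = line.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state

def count_blocked(name="nfsd", proc=Path("/proc")):
    """Count tasks called `name` that are in uninterruptible sleep (D)."""
    blocked = 0
    for stat in proc.glob("[0-9]*/stat"):
        try:
            comm, state = parse_stat(stat.read_text())
        except (OSError, IndexError, ValueError):
            continue  # task exited while we were scanning
        if comm == name and state == "D":
            blocked += 1
    return blocked

if __name__ == "__main__":
    load = parse_loadavg(Path("/proc/loadavg").read_text())
    print(load["load1"], "load,", count_blocked(), "nfsd threads in D state")
```

Logging this once a second alongside the existing graphs would show
whether the nfsd threads block all at once or one after another.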

I've been trying for a long time to track this down with no luck, so now
it's time to see if anyone here has any ideas.

First major datapoint: early in the debugging cycle a large-ish number
of RRD datasets were kept on the NAS box, being updated regularly in an
attempt to spot the culprit.  This instead made the problem
significantly more frequent.  Moving the archives to another machine and
off NFS entirely immediately trimmed an average of 100-200 I/Os per
second off the NAS box, and the problem eased greatly.

Second: the whole process can easily be replicated by running bonnie++
on any of the machines (the NAS, the client, or a virtual machine), and
it appears clearly related to the I/Os per second, but only in cases
where I/Os are not linear.  *Reading* a huge file either locally or
over NFS will cause a very mild form of the overload, but *writing* can
cause it almost instantaneously.

I've tried playing around with bdflush parameters, but without a
dramatically clearer mental picture of how that whole subsystem works, I
have no real chance of coming up with the best direction to move.  A
gradual search isn't really feasible because the spikes are
unpredictable, and artificially generated loads (writing huge files) are
*too* stressful to see any differences.
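
Since writing huge files is too stressful to compare settings, a
rate-limited load might make a gradual search feasible. The following is
a rough sketch of such a generator, not a replacement for bonnie++: it
issues small writes at scattered offsets at a bounded rate, loosely
imitating the RRD update pattern. The function name, the 4K block size,
and the idea that scattered small writes are the relevant pattern are all
assumptions.

```python
import os
import random
import time

def paced_random_writes(path, file_size, ops, ops_per_sec, block=4096, seed=0):
    """Issue `ops` block-sized writes at random aligned offsets,
    no faster than `ops_per_sec`. Returns the achieved ops per second.
    Note: the file at `path` is created or resized to `file_size`."""
    rng = random.Random(seed)
    payload = os.urandom(block)
    interval = 1.0 / ops_per_sec
    blocks = file_size // block
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)
        start = time.monotonic()
        for i in range(ops):
            # Wait for this op's scheduled time so the rate stays bounded.
            delay = start + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            os.pwrite(fd, payload, rng.randrange(blocks) * block)
        os.fsync(fd)  # make sure the writes actually reach the server/disk
        elapsed = max(time.monotonic() - start, 1e-9)
        return ops / elapsed
    finally:
        os.close(fd)
```

Running it against a scratch file on the NFS mount at, say, 50, 100, and
200 writes per second, and noting where the achieved rate falls behind the
target, would give a repeatable number to compare before and after each
bdflush change.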

I've graphed this thing utterly to death, and anyone interested in
checking it out can see tonight's fiasco at:

http://narsil.pdxcolo.net/graphs/?start=200403022200&duration=1hr

The aforementioned switch away from NAS-based RRD archives can be seen
quite easily at:

http://narsil.pdxcolo.net/graphs/?start=20040208&duration=1week

The graph pages are designed for a full 1600x1200 screen (mine), so it
may be hard to see everything clearly on smaller screens.  Try adding
&width=100&height=50 maybe.  The most relevant link is the NAS debug
page (nasdebug.php?...), which shows more information than the main
graphs page.

What I'd like to know is if anyone has any idea what's really going on
here, or suggestions as to what other data I might gather that would
help diagnose the problem.  Easy solutions (add RAM, tweak a sysctl,
etc.) would be *greatly* appreciated ;-)
-- 
- Omega
  aka Erik Walthinsen
  omega@pdxcolo.net





Thread overview: 12+ messages
     [not found] <482A3FA0050D21419C269D13989C61130435DCCB@lavender-fe.eng.netapp.com>
2004-03-03 22:34 ` NAS server avalanche overload Erik Walthinsen
2004-03-04  2:07 Lever, Charles
  -- strict thread matches above, loose matches on Subject: below --
2004-03-03  8:31 Erik Walthinsen
2004-03-03 22:02 ` Erik Walthinsen
2004-03-04  0:04   ` Greg Banks
2004-03-04  0:20     ` Erik Walthinsen
2004-03-04  1:40       ` Greg Banks
2004-03-04  2:17         ` Trond Myklebust
2004-03-04  4:39         ` Ian Kent
2004-03-04  5:31           ` Erik Walthinsen
2004-03-04  5:47             ` Greg Banks
2004-03-04 14:38             ` Ian Kent
