From: Erik Walthinsen <omega@pdxcolo.net>
To: nfs@lists.sourceforge.net
Subject: Re: NAS server avalanche overload
Date: Wed, 03 Mar 2004 14:02:23 -0800
Message-ID: <1078351343.821.28.camel@localhost>
In-Reply-To: <1078302718.825.67.camel@localhost>
Greg Banks said:
> Are you using the "async" export option on the server? It causes
> similar symptoms when used with large NFS writes. Use "sync".
Mount options as reported by /proc/mounts are:
rw,noatime,rsize=4096,wsize=4096,intr,soft,noac,tcp
I'm pretty sure the default here is async, as I had sync on there
earlier and it actually caused a noticeable drop in performance.
What I'm wondering is if the default bdflush settings are putting a hard
cap on how much data can be write-cached, forcing the system to block
writes too early. With 512MB of RAM, say half available as write-cache,
even at the rate of 5MB/sec, we should be able to run for almost a
minute with complete disk starvation before things start to wedge. And
since this doesn't look like complete starvation at all (graphs show
I/Os are completing the whole time), it should last even longer.
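
To make that back-of-envelope estimate concrete, here is a rough sketch in
Python. The function name, the drain-rate parameter, and the 2MB/sec drain
figure are hypothetical illustrations, not measurements from this server,
and it assumes the cache fills at a constant rate:

```python
import math

def seconds_until_cache_full(cache_mb, incoming_mb_per_s, drain_mb_per_s=0.0):
    """Seconds until a write cache of cache_mb fills at constant rates.

    drain_mb_per_s is the rate at which the disks still complete writes;
    0.0 models complete disk starvation.
    """
    net_fill_rate = incoming_mb_per_s - drain_mb_per_s
    if net_fill_rate <= 0:
        # The disks keep up with the writers, so the cache never fills.
        return math.inf
    return cache_mb / net_fill_rate

ram_mb = 512
cache_mb = ram_mb / 2  # "say half available as write-cache"

# Complete starvation at 5MB/sec incoming: a bit under a minute.
print(seconds_until_cache_full(cache_mb, 5.0))

# Hypothetical partial drain of 2MB/sec: the cache lasts longer.
print(seconds_until_cache_full(cache_mb, 5.0, 2.0))
```

If the machine wedges well before the first figure, that would suggest
something other than raw cache capacity is the limit.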
If anyone has any ideas on what to tweak in bdflush, I'd like to hear
them. It also seems that there *is* some pattern in the spikes: they
occur at 11:25pm and 12:00am every day, for at least the last 3 days.
Philippe Gramoulié said:
> Is there anything that prevent you from running a 2.4.25 kernel ?
It's a production machine with those 60+ virtual machines running on it,
so the only opportunity I have to change anything of this sort is during
our quarterly downtime, the next one being early April.
Williamson, Jay (John G) said:
> Hi. I have no experience with your particular setup but have had
> similar problems when our clients were running a pre-2.4.20 kernel and
> using UDP for the NFS mounts. If that fits your client setup then try
> either upgrading the kernel or switching to TCP.
We're using TCP, as it also had performance advantages in our early
tests.
David Dougall said:
> My experience is that ext3 is dreadfully slow and RAID5 is dreadfully
> slow. These 2 combined can cause significant problems. The
> suggestions that have come from the list before are to change to
> RAID10 and use another filesystem such as reiserfs or xfs. I saw
> significant speedup moving away from ext3.
The NAS itself is using reiserfs; only the virtual machines are using
ext3. The question there is what kind of read/write load differences
one might have between the two. Certainly there's a possibility that
the journaling writes have something to do with it, but I wouldn't think
they would cluster to the degree things seem to be.
RAID 1+0 is an option with the 8506-12, but the migration is extremely
painful. We have to acquire a whole new set of disks (would probably
get 6), construct the array, then copy half a TB of data across. Much
of the data is sparse files, so the process would take even longer. At
least a large chunk of it is non-production files (mirrors), so probably
1/2 to 2/3 can be done without downtime.
- Omega
aka Erik Walthinsen
omega@pdxcolo.net