From: Greg Banks <gnb@sgi.com>
To: Olaf Kirch <okir@suse.de>
Cc: nfs@lists.sourceforge.net
Subject: Re: nfsd write throughput
Date: Tue, 3 Aug 2004 17:55:06 +1000
Message-ID: <20040803075506.GL5581@sgi.com>
In-Reply-To: <20040803060213.GA21134@suse.de>

On Tue, Aug 03, 2004 at 08:02:13AM +0200, Olaf Kirch wrote:
> On Tue, Aug 03, 2004 at 12:10:18PM +1000, Greg Banks wrote:
> 
> > First, the way the v3 server is supposed to work is that normal page
> > cache pressure pushes pages from unstable writes to disk before the
> > COMMIT call arrives from the client.  The best way to achieve this
> > for a dedicated NFS server box is tuning the pdflush parameters
> > to be more aggressive about writing back dirty pages, e.g. bumping
> > down the following in /proc/sys/vm: dirty_background_ratio, dirty_ratio,
> > dirty_writeback_centisecs, and dirty_expire_centisecs.  I have to
> 
> Yes and no. Can we expect every user to fiddle with the pdflush
> tunables to get an NFS server that performs reasonably well?

No, of course not, hence the next idea.

> > I think another useful approach would be to writeback pages which
> > have been written by NFS unstable writes at a faster rate than pages
> > written by local applications, i.e. add a new /proc/sys/vm/ sysctl like
> > nfs_dirty_writeback_centisecs and a per-page flag.
> 
> That may be a useful solution, too. My patch basically does what
> fadvise(WONTNEED) does.

Sure, the key question is when and for how many pages.  You don't
really have enough information in nfsd_write() to tell that safely.
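
For readers unfamiliar with the call being alluded to: the closest
userspace interface is posix_fadvise() with POSIX_FADV_DONTNEED
(assuming that is what "WONTNEED" refers to).  The following is a
minimal userspace sketch, not code from the patch; the helper name
write_and_advise() is hypothetical.  Note that the caller has to pick
the range itself, which is exactly the "when and for how many pages"
question.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset, then hint to the
 * kernel that this byte range will not be needed again soon.
 * Returns 0 on success, -1 on a failed or short write, or the
 * positive error number reported by posix_fadvise().
 */
static int write_and_advise(int fd, const void *buf, size_t len, off_t offset)
{
    ssize_t n = pwrite(fd, buf, len, offset);

    if (n < 0 || (size_t)n != len)
        return -1;

    /* The advice is only a hint; the kernel is free to ignore it. */
    return posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);
}
```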

> > For example, imagine the disk backend is a hardware RAID5 with a
> > stripe size of 128K or greater and the client is doing streaming
> > 32K WRITE calls.  With your patch, every second WRITE call will now
> > try to write half a RAID stripe unit,
> 
> No, it doesn't. If you look at the if() expression, you'll see it
> writes every 64 client-size pages.

> > > +		if ((cnt & 1023) == 0
> > > +		 && ((offset / cnt) & 63) == 0

It writes every time the integer quotient `offset / cnt' is a multiple
of 64 (for sequential, aligned writes that means every 64th call) and
`cnt' is a multiple of 1024.  At this point `cnt' is the length of the
data received in the WRITE call, which has only a vague relationship
to the client page size.
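
To make the trigger condition concrete, here is a standalone sketch
that mirrors the two quoted lines of the if() expression.  It is not
the patch itself: the function names, the integer types, and the
zero-length guard are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the quoted test: true when cnt is a multiple of 1024 and
 * the integer quotient offset / cnt is a multiple of 64.
 */
static bool would_flush(uint64_t offset, uint32_t cnt)
{
    if (cnt == 0)
        return false;   /* guard added for this sketch only */
    return (cnt & 1023) == 0 && ((offset / cnt) & 63) == 0;
}

/*
 * Counts how many calls in a sequential stream of nwrites WRITEs,
 * each cnt bytes long and starting at offset 0, satisfy the test.
 */
static unsigned count_flushes(uint32_t cnt, unsigned nwrites)
{
    unsigned n = 0;

    for (unsigned i = 0; i < nwrites; i++)
        if (would_flush((uint64_t)i * cnt, cnt))
            n++;
    return n;
}
```

With 32K WRITEs, count_flushes(32768, 128) finds two matches (calls 0
and 64).  A WRITE whose length is not a multiple of 1024 never matches,
and because the division truncates, an offset slightly past a 64 * cnt
boundary still matches.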

> In the worst case that's every 64K,
> but for Linux clients that's every 256K which is a reasonable size
> for IDE DMA, as well as most RAID configurations

We have configurations with hardware RAID stripe sizes up to 4MB.
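
The arithmetic behind these figures can be sketched as follows.  This
assumes a sequential stream of equal-sized WRITEs whose length is a
multiple of 1024, so that the quoted condition matches once every 64
calls; the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes written between two matches of the condition: 64 * cnt. */
static uint64_t flush_interval_bytes(uint32_t cnt)
{
    return 64ull * cnt;
}

/* How many flush intervals fit into one RAID stripe of stripe_bytes. */
static uint64_t flushes_per_stripe(uint32_t cnt, uint64_t stripe_bytes)
{
    uint64_t interval = flush_interval_bytes(cnt);

    return interval ? stripe_bytes / interval : 0;
}
```

For cnt = 1024 the interval is 64K and for cnt = 4096 it is 256K,
matching the numbers quoted above.  Against a 4MB stripe,
flushes_per_stripe(4096, 4ull << 20) is 16, so each flush would cover
only a sixteenth of a stripe.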

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


