From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Neil Brown <neilb@suse.de>
Cc: nfs@lists.sourceforge.net,
	Norman Weathers <norman.r.weathers@conocophillips.com>
Subject: Re: Broken nfsd in recent kernels
Date: Tue, 13 Feb 2007 14:58:06 +1100
Message-ID: <45D1374E.1090800@yahoo.com.au>
In-Reply-To: <17873.13603.316692.955211@notabene.brown>

Neil Brown wrote:
> On Monday February 12, norman.r.weathers@conocophillips.com wrote:
> 
>>Hello,
>>
>>I have noticed, at least in our Fedora 6 test case, that recent kernels
>>(2.6.18 and 2.6.19) appear to have a "read hell" issue.  Has
>>anyone else seen this?
>>
>>For instance, using iozone, during a write case (32 KB blocks) to a Sun
>>x4100 running Fedora Core 6 and the Fedora Core kernels, I get decent
>>throughput.  But as soon as the test goes from write to rewrite, I see
>>a large amount of read activity (via iostat) on the NFS server.  It
>>looks like 4 KB reads.
> 
> 
> Yes.......
> 
> When the NFS server writes a large block (e.g. 32K) to a file, it has
> the data in a number of buffers as they came in off the network.  Due
> to the alignment of data in an NFS request, they almost certainly will
> not be page-aligned.
> 
> This 'iovec' is then written to the file.
> 
> Normally when writing to a file from user-space (normal write or
> writev system call), the pages holding the data to be written could be
> paged out, so they have to be brought into memory before the copy starts.
> 
> A change was made to generic_file_buffered_write (in mm/filemap.c)
> probably around 2.6.18 so that when writing from an iovec, each entry
> is sent to the file separately, because faulting in all the entries
> at once is a bit awkward.
> 
> So the net result is that when NFSd writes to a file, the filesystem
> sees a bunch of non-page-aligned writes rather than nicely aligned
> writes (even when the NFS request holds a nicely aligned write).  This
> causes it to pre-read all the pages.  Ugh.
> 
> Nick:  You have some pending patches in this area.  Might they
> address this problem?
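
The effect Neil describes can be pictured with a small user-space model.
This is only a sketch, not kernel code: the 4096-byte page size, the
segment lengths, and the function names are assumptions chosen for
illustration. It models the rewrite case, where the file's pages exist on
disk but are not yet cached, so a write that covers only part of a page
forces that page to be read first.

```python
PAGE_SIZE = 4096  # assumed page size, matching the 4 KB reads in the report


def split_into_segments(offset, seg_lengths):
    """Turn one logical write into back-to-back (offset, length) writes,
    one per iovec segment (hypothetical helper)."""
    writes = []
    for length in seg_lengths:
        writes.append((offset, length))
        offset += length
    return writes


def pages_needing_preread(writes, page_size=PAGE_SIZE):
    """Count pages that must be read before being written.

    A page is pre-read when a write covers only part of it and the page is
    not already up to date in the simulated cache.
    """
    cached = set()  # pages already up to date in the simulated page cache
    prereads = 0
    for offset, length in writes:
        if length <= 0:
            continue
        end = offset + length
        for page in range(offset // page_size, (end - 1) // page_size + 1):
            page_start = page * page_size
            fully_covered = offset <= page_start and end >= page_start + page_size
            if not fully_covered and page not in cached:
                prereads += 1
            cached.add(page)
    return prereads


if __name__ == "__main__":
    # Hypothetical segment sizes: a short first buffer, seven full
    # buffers, and a short tail, adding up to one aligned 32 KB write.
    segments = [3900] + [4096] * 7 + [196]
    print(pages_needing_preread([(0, 32768)]))                      # 0
    print(pages_needing_preread(split_into_segments(0, segments)))  # 8
```

With these assumed sizes, the single aligned write needs no reads, while
the same data sent one segment at a time touches every one of the eight
pages partially first.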

Hi Neil,

Yes, they do address the multiple-segment iovec problem, but it remains
to be seen when the patches will get in...

It is very awkward to fix the problem in the prepare_write/commit_write
path due to the nature of the API. Basically I'm reverting to performing
an extra data copy there, which reduces bandwidth quite a lot (although
it does reintroduce the multi-segment iovec copying, so it might be a
win in this case).
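
The trade-off of an extra copy can be illustrated in user space. The
sketch below is not the pending patch and says nothing about how
prepare_write/commit_write are actually changed; it only shows the general
idea of copying multiple segments into a staging buffer so that the data
reaches the file in page-sized pieces. The function name, the 4096-byte
page size, and the segment lengths are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size


def gather_into_pages(offset, segments, page_size=PAGE_SIZE):
    """Copy iovec-style segments into a staging buffer and return
    (file_offset, data) chunks that never cross a page boundary.

    The copy into `staging` is the extra data copy: it costs bandwidth,
    but segments that together fill a page are written as one full page.
    """
    chunks = []
    staging = bytearray()
    chunk_start = offset
    for seg in segments:
        view = memoryview(seg)
        while len(view):
            pos = chunk_start + len(staging)
            room = page_size - (pos % page_size)  # bytes left in this page
            take = min(room, len(view))
            staging += view[:take]                # the extra copy
            view = view[take:]
            if (chunk_start + len(staging)) % page_size == 0:
                chunks.append((chunk_start, bytes(staging)))
                chunk_start += len(staging)
                staging = bytearray()
    if staging:  # trailing partial page, if any
        chunks.append((chunk_start, bytes(staging)))
    return chunks


if __name__ == "__main__":
    payload = bytes(range(256)) * 128            # 32768 bytes
    lengths = [3900] + [4096] * 7 + [196]        # hypothetical segment sizes
    segs, pos = [], 0
    for n in lengths:
        segs.append(payload[pos:pos + n])
        pos += n
    for off, data in gather_into_pages(0, segs):
        print(off, len(data))
```

For an aligned 32 KB request split into unaligned segments, every chunk
comes out page-aligned and page-sized, which is why the copy might be a
win here despite its cost.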

Then I'm looking at introducing a new aops API that filesystems can
implement to solve the problem in a well-performing manner.

The problem is, this can't really happen until the important filesystems
implement the API.

It would be interesting to know whether Norman's test case actually is
using writev...
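
Tracing the benchmark's system calls would answer that directly. For a
local comparison, a multi-segment writev can also be issued by hand. The
following is a minimal sketch using Python's os.writev on a Unix system;
the helper name, file path, and segment lengths are assumptions, and it
exercises a local file rather than the nfsd write path.

```python
import os
import tempfile


def write_as_segments(path, payload, seg_lengths):
    """Write `payload` to `path` with a single writev() call, split into
    segments of the given lengths. Returns the number of bytes written
    (a short write is possible in principle and is not retried here)."""
    if sum(seg_lengths) != len(payload):
        raise ValueError("segment lengths must add up to the payload size")
    segments, pos = [], 0
    for length in seg_lengths:
        segments.append(payload[pos:pos + length])
        pos += length
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return os.writev(fd, segments)
    finally:
        os.close(fd)


if __name__ == "__main__":
    payload = bytes(range(256)) * 128        # one aligned 32 KB block
    lengths = [3900] + [4096] * 7 + [196]    # hypothetical unaligned split
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "writev-test")
        print(write_as_segments(target, payload, lengths))
```

Running this against an existing, uncached file while watching iostat is
one way to compare its read activity with that of a plain write() of the
same block.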

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

