From: Olaf Kirch <okir@suse.de>
To: nfs@lists.sourceforge.net
Subject: nfsd: rewrite performance problem
Date: Wed, 20 Jul 2005 15:06:24 +0200
Message-ID: <20050720130623.GA12537@suse.de>

Hi,

we've just been investigating a performance problem at a customer of ours.
On a machine with 4G of RAM, they were running N iozone threads on one 1G
file each. For 3 threads, where the working set fits into RAM, the rewrite
numbers are reasonable, but when using 4 threads, rewrite performance is
horrible.

vmstat shows that in this case, the number of block reads roughly equals
the number of block writes. They reproduced this with our sles9 kernel as
well as vanilla 2.6.12, with both reiserfs and ext3.

Chris Mason and I looked into this, and we believe we've nailed down
the problem, which is the use of writev in nfsd_write. nfsd receives
the WRITE request from the client, broken up into page-sized chunks.
The default implementation of writev will simply call file->f_op->write
for each of the fragments it's given, but the first fragment is
PAGE_SIZE minus the RPC header and write_args. So we end up writing
less than a full block, causing the block to be read from disk first.
None of the subsequent pages in the iovec are block aligned either, so
the same happens for these as well.
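
To make the arithmetic concrete, here is a small userspace sketch (not
kernel code) that walks a page-chunked WRITE payload the way a per-fragment
write() loop would and counts how many fragments fail to cover whole blocks.
The names count_partial_writes, frag_stats and SKETCH_PAGE_SIZE are made up
for this illustration, and the header length passed in is an assumed value,
not the real size of the RPC header plus write args.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the illustration; the real value is PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

struct frag_stats {
    size_t fragments;   /* number of separate write() calls */
    size_t partial;     /* calls that do not cover whole blocks */
};

/*
 * hdr_len:    bytes in the first page taken up by RPC header + write args
 *             (hypothetical value supplied by the caller)
 * data_len:   total WRITE payload in bytes
 * file_off:   file offset at which the WRITE starts
 * block_size: filesystem block size
 */
static struct frag_stats count_partial_writes(size_t hdr_len, size_t data_len,
                                              size_t file_off, size_t block_size)
{
    struct frag_stats st = { 0, 0 };
    size_t frag;

    assert(block_size > 0 && hdr_len < SKETCH_PAGE_SIZE);

    /* The first fragment is short: the header occupies part of its page. */
    frag = SKETCH_PAGE_SIZE - hdr_len;

    while (data_len > 0) {
        if (frag > data_len)
            frag = data_len;

        /* Whole blocks are covered only if start and length are aligned. */
        if (file_off % block_size != 0 || frag % block_size != 0)
            st.partial++;
        st.fragments++;

        file_off += frag;
        data_len -= frag;
        frag = SKETCH_PAGE_SIZE;    /* later fragments are full pages */
    }
    return st;
}
```

With an assumed 128-byte header, a 32768-byte WRITE at offset 0 and 4096-byte
blocks, this gives 9 fragments, all of them partial. With a header length of
0 it gives 8 fragments and no partial ones. Whether a partial write really
triggers a read still depends on the block being absent from the page cache,
which is the case once the working set no longer fits into RAM.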

Does that sound right?

In order to verify our theory, we've asked the customer to test with 8K
wsize and jumbograms enabled. I'll keep you posted.

Possible fixes that I can think of are to implement a generic_file_writev
that avoids calling write() for each chunk, but rather grabs all
the pages, updates each, and passes them to writepage. Another would be
to use a large linear buffer in svc_recvfrom rather than the iovec,
as the initial implementation used to do. Another (band-aid)
hack would be to check in nfsd_write whether we're re-writing a multiple
of PAGE_SIZE worth of data, and properly align the iovec in this case.
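
As a rough userspace analogue of two of these ideas (again not kernel code),
the sketch below shows the kind of check the band-aid would need, and how a
fragmented iovec can be gathered into one linear buffer so that a single
aligned write can be issued. The helper names write_is_page_aligned and
gather_iov are hypothetical and exist only for this illustration. The copy
is of course the price of the linear-buffer approach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* True if the write starts on a page boundary and spans whole pages. */
static bool write_is_page_aligned(unsigned long long offset, size_t count,
                                  size_t page_size)
{
    assert(page_size > 0);
    return count > 0 && offset % page_size == 0 && count % page_size == 0;
}

/*
 * Copy all fragments of an iovec into one freshly allocated linear buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static void *gather_iov(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    size_t total = 0, pos = 0;
    unsigned char *buf;
    int i;

    assert(iov != NULL || iovcnt == 0);

    for (i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    buf = malloc(total ? total : 1);    /* avoid a zero-sized allocation */
    if (buf == NULL)
        return NULL;

    for (i = 0; i < iovcnt; i++) {
        memcpy(buf + pos, iov[i].iov_base, iov[i].iov_len);
        pos += iov[i].iov_len;
    }

    *out_len = total;
    return buf;
}
```

A caller would test write_is_page_aligned(offset, count, page_size) first
and, if it holds, hand the gathered buffer to one write call at that offset
instead of issuing one write per fragment.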

Any other suggestions?

Cheers,
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


