From: Olaf Kirch <okir@suse.de>
To: nfs@lists.sourceforge.net
Subject: Re: nfsd: rewrite performance problem
Date: Wed, 27 Jul 2005 12:21:22 +0200 [thread overview]
Message-ID: <20050727102122.GE10892@suse.de> (raw)
In-Reply-To: <20050720130623.GA12537@suse.de>
[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]
On Wed, Jul 20, 2005 at 03:06:23PM +0200, Olaf Kirch wrote:
> we've just been investigating a performance problem at a customer of ours.
> On a machine of 4G RAM, they were running N iozone threads on one 1G file
> each. For 3 threads, where the working set fits into RAM, the rewrite
> numbers are reasonably, but when using 4 threads, rewrite performance is
> horrible.
[...]
> Chris Mason and I looked into this, and we believe we've nailed down
> the problem, which is the use of writev in nfsd_write. nfsd receives
> the WRITE request from the client, broken up into page sized chunks.
> The default implementation of writev will simply call file->op->write
> for each of the fragments it's given, but the first fragment is
> PAGE_SIZE minus the RPC header and write_args. So we end up writing
> less than a full block, causing the block to be read first. All
> subsequent pages in the iovec are non block aligned either, so the
> same happens for these as well.
The customer has confirmed that this is indeed what's causing the
problem. Here's a patch to make nfsd_write page align the payload when
it sees a rewrite. I'm currently testing this patch (and I will also
do some performance testing to see if we get a general performance if
I page align all writes).
Any comments?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: nfsd-rewrite-align --]
[-- Type: text/plain, Size: 2544 bytes --]
Subject: NFS: Fix rewrite performance
This patch fixes a performance issue with rewrite. Most of the time,
the iovecs passed to nfsd_vfs_write are unaligned. As the default writev
implementation will just call write() on each chunk in the iovec, this
will cause partial blocks to be dirtied, triggering a read-modify-write
cycle for each block.
The short-term fix is to make sure nfsd aligns the data properly.
The long term fix would be to make the VFS smarter about writev requests.
Signed-off-by: Olaf Kirch <okir@suse.de>
Index: linux-2.6.12.new/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.12.new.orig/fs/nfsd/vfs.c
+++ linux-2.6.12.new/fs/nfsd/vfs.c
@@ -874,6 +874,46 @@ out:
return err;
}
+/*
+ * Helper function to page-align the write payload.
+ */
+static inline int
+nfsd_page_align_payload(struct iovec *vec, int vlen)
+{
+ unsigned char *this_page, *prev_page;
+ int i, chunk0, chunk1;
+
+ /* The following checks are just paranoia */
+ if (vlen < 2)
+ return 0;
+
+ if (vec[0].iov_len + vec[vlen-1].iov_len != PAGE_CACHE_SIZE)
+ return 0;
+ for (i = 1; i < vlen - 1; ++i) {
+ if (vec[i].iov_len != PAGE_CACHE_SIZE)
+ return 0;
+ }
+
+ chunk0 = vec[0].iov_len;
+ chunk1 = PAGE_CACHE_SIZE - chunk0;
+
+ this_page = (unsigned char *) vec[vlen-1].iov_base;
+ for (i = vlen-1; i; --i) {
+ prev_page = (unsigned char *) vec[i-1].iov_base;
+
+ /* Push trailing partial page so it's
+ * aligned with the end of the page, then
+ * pull up the missing chunk from the previous
+ * page */
+ memmove(this_page + chunk0, this_page, chunk1);
+ memcpy(this_page, prev_page + chunk1, chunk0);
+ vec[i].iov_len = PAGE_CACHE_SIZE;
+ this_page = prev_page;
+ }
+
+ return 1;
+}
+
static inline int
nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
loff_t offset, struct kvec *vec, int vlen,
@@ -917,6 +957,17 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
if (stable && !EX_WGATHER(exp))
file->f_flags |= O_SYNC;
+ /* Hack: if we're rewriting the file, make sure
+ * we align the iovec properly to avoid costly
+ * read-modify-write operations on the block devices.
+ * This hack can go away once we have generic_file_writev.
+ */
+ if ((offset < inode->i_size)
+ && (cnt % PAGE_CACHE_SIZE) == 0
+ && vec->iov_len != PAGE_CACHE_SIZE
+ && nfsd_page_align_payload(vec, vlen))
+ vec++, vlen--;
+
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
prev parent reply other threads:[~2005-07-27 10:21 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-07-20 13:06 nfsd: rewrite performance problem Olaf Kirch
2005-07-27 10:21 ` Olaf Kirch [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050727102122.GE10892@suse.de \
--to=okir@suse.de \
--cc=nfs@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.