* Re: nfsd: rewrite performance problem
2005-07-20 13:06 nfsd: rewrite performance problem Olaf Kirch
@ 2005-07-27 10:21 ` Olaf Kirch
0 siblings, 0 replies; 2+ messages in thread
From: Olaf Kirch @ 2005-07-27 10:21 UTC (permalink / raw)
To: nfs
[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]
On Wed, Jul 20, 2005 at 03:06:23PM +0200, Olaf Kirch wrote:
> we've just been investigating a performance problem at a customer of ours.
> On a machine of 4G RAM, they were running N iozone threads on one 1G file
> each. For 3 threads, where the working set fits into RAM, the rewrite
> numbers are reasonably, but when using 4 threads, rewrite performance is
> horrible.
[...]
> Chris Mason and I looked into this, and we believe we've nailed down
> the problem, which is the use of writev in nfsd_write. nfsd receives
> the WRITE request from the client, broken up into page sized chunks.
> The default implementation of writev will simply call file->op->write
> for each of the fragments it's given, but the first fragment is
> PAGE_SIZE minus the RPC header and write_args. So we end up writing
> less than a full block, causing the block to be read first. All
> subsequent pages in the iovec are non block aligned either, so the
> same happens for these as well.
The customer has confirmed that this is indeed what's causing the
problem. Here's a patch to make nfsd_write page align the payload when
it sees a rewrite. I'm currently testing this patch (and I will also
do some performance testing to see if we get a general performance if
I page align all writes).
Any comments?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: nfsd-rewrite-align --]
[-- Type: text/plain, Size: 2544 bytes --]
Subject: NFS: Fix rewrite performance
This patch fixes a performance issue with rewrite. Most of the time,
the iovecs passed to nfsd_vfs_write are unaligned. As the default writev
implementation will just call write() on each chunk in the iovec, this
will cause partial blocks to be dirtied, triggering a read-modify-write
cycle for each block.
The short-term fix is to make sure nfsd aligns the data properly.
The long term fix would be to make the VFS smarter about writev requests.
Signed-off-by: Olaf Kirch <okir@suse.de>
Index: linux-2.6.12.new/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.12.new.orig/fs/nfsd/vfs.c
+++ linux-2.6.12.new/fs/nfsd/vfs.c
@@ -874,6 +874,46 @@ out:
return err;
}
+/*
+ * Helper function to page-align the write payload.
+ */
+static inline int
+nfsd_page_align_payload(struct iovec *vec, int vlen)
+{
+ unsigned char *this_page, *prev_page;
+ int i, chunk0, chunk1;
+
+ /* The following checks are just paranoia */
+ if (vlen < 2)
+ return 0;
+
+ if (vec[0].iov_len + vec[vlen-1].iov_len != PAGE_CACHE_SIZE)
+ return 0;
+ for (i = 1; i < vlen - 1; ++i) {
+ if (vec[i].iov_len != PAGE_CACHE_SIZE)
+ return 0;
+ }
+
+ chunk0 = vec[0].iov_len;
+ chunk1 = PAGE_CACHE_SIZE - chunk0;
+
+ this_page = (unsigned char *) vec[vlen-1].iov_base;
+ for (i = vlen-1; i; --i) {
+ prev_page = (unsigned char *) vec[i-1].iov_base;
+
+ /* Push trailing partial page so it's
+ * aligned with the end of the page, then
+ * pull up the missing chunk from the previous
+ * page */
+ memmove(this_page + chunk0, this_page, chunk1);
+ memcpy(this_page, prev_page + chunk1, chunk0);
+ vec[i].iov_len = PAGE_CACHE_SIZE;
+ this_page = prev_page;
+ }
+
+ return 1;
+}
+
static inline int
nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
loff_t offset, struct kvec *vec, int vlen,
@@ -917,6 +957,17 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
if (stable && !EX_WGATHER(exp))
file->f_flags |= O_SYNC;
+ /* Hack: if we're rewriting the file, make sure
+ * we align the iovec properly to avoid costly
+ * read-modify-write operations on the block devices.
+ * This hack can go away once we have generic_file_writev.
+ */
+ if ((offset < inode->i_size)
+ && (cnt % PAGE_CACHE_SIZE) == 0
+ && vec->iov_len != PAGE_CACHE_SIZE
+ && nfsd_page_align_payload(vec, vlen))
+ vec++, vlen--;
+
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
^ permalink raw reply [flat|nested] 2+ messages in thread