All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix nfsd rewrite performance
@ 2005-08-01 11:39 Olaf Kirch
  2005-08-01 11:53 ` J. Bruce Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Kirch @ 2005-08-01 11:39 UTC (permalink / raw)
  To: nfs

Since no-one commented on the previous versions of the patch, I'm
submitting it almost unchanged now. This patch is relative to 2.6.12-rc3,
and has undergone some iozone testing, which showed write and rewrite
performance coming out almost the same.

------------------------------------------------------------------
Subject: NFS: Fix rewrite performance

This patch fixes an nfsd performance issue with rewrite. Most of the time,
the iovecs passed to nfsd_vfs_write are unaligned. As the default writev
implementation will just call write() on each chunk in the iovec, this
will cause partial blocks to be dirtied, triggering a read-modify-write
cycle for each block.

The short-term fix is to make sure nfsd aligns the data properly.
The long term fix would be to make the VFS smarter about writev requests.

Signed-off-by: Olaf Kirch <okir@suse.de>

Index: linux-2.6.12.new/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.12.new.orig/fs/nfsd/vfs.c
+++ linux-2.6.12.new/fs/nfsd/vfs.c
@@ -874,6 +874,46 @@ out:
 	return err;
 }
 
+/*
+ * Helper function to page-align the write payload.
+ */
+static inline int
+nfsd_page_align_payload(struct kvec *vec, int vlen)
+{
+	unsigned char *this_page, *prev_page;
+	int i, chunk0, chunk1;
+
+	/* The following checks are just paranoia */
+	if (vlen < 2)
+		return 0;
+
+	if (vec[0].iov_len + vec[vlen-1].iov_len != PAGE_CACHE_SIZE)
+		return 0;
+	for (i = 1; i < vlen - 1; ++i) {
+		if (vec[i].iov_len != PAGE_CACHE_SIZE)
+			return 0;
+	}
+
+	chunk0 = vec[0].iov_len;
+	chunk1 = PAGE_CACHE_SIZE - chunk0;
+
+	this_page = (unsigned char *) vec[vlen-1].iov_base;
+	for (i = vlen-1; i; --i) {
+		prev_page = (unsigned char *) vec[i-1].iov_base;
+
+		/* Push trailing partial page so it's
+		 * aligned with the end of the page, then
+		 * pull up the missing chunk from the previous
+		 * page */
+		memmove(this_page + chunk0, this_page, chunk1);
+		memcpy(this_page, prev_page + chunk1, chunk0);
+		vec[i].iov_len = PAGE_CACHE_SIZE;
+		this_page = prev_page;
+	}
+
+	return 1;
+}
+
 static inline int
 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 				loff_t offset, struct kvec *vec, int vlen,
@@ -917,6 +957,17 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
 	if (stable && !EX_WGATHER(exp))
 		file->f_flags |= O_SYNC;
 
+	/* Hack: if we're rewriting the file, make sure
+	 * we align the iovec properly to avoid costly
+	 * read-modify-write operations on the block devices.
+	 * This hack can go away once we have generic_file_writev.
+	 */
+	if ((offset < inode->i_size)
+	 && (cnt % PAGE_CACHE_SIZE) == 0
+	 && vec->iov_len != PAGE_CACHE_SIZE
+	 && nfsd_page_align_payload(vec, vlen))
+		vec++, vlen--;
+
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
 	err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-08-02 12:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-01 11:39 [PATCH] Fix nfsd rewrite performance Olaf Kirch
2005-08-01 11:53 ` J. Bruce Fields
2005-08-01 11:59   ` Olaf Kirch
2005-08-01 12:10     ` J. Bruce Fields
2005-08-01 12:14       ` Olaf Kirch
2005-08-01 12:58         ` J. Bruce Fields
2005-08-02  9:49           ` Olaf Kirch
2005-08-02 10:02             ` J. Bruce Fields
2005-08-02 10:49               ` Olaf Kirch
2005-08-02 12:07                 ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.