public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs: improve metadata I/O merging in the elevator
@ 2009-11-12 19:09 Christoph Hellwig
  2009-11-16  3:50 ` Dave Chinner
  2009-11-24 18:03 ` [PATCH v2] " Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Christoph Hellwig @ 2009-11-12 19:09 UTC (permalink / raw)
  To: xfs

I had the patch below from Dave in my queue for a while, but previously
couldn't really reproduce his numbers.  After some discussions of the
bio types I've retested it again and can see constant improvements when
using cfq on my large array box with it (5-10% for the sequential create
workloads), but still nothing on deadline.  Given that people also want
it for better marking in blktrace it might be time to put it in.

Comments?

-- 

From: Dave Chinner <dgc@sgi.com>
Subject: xfs: improve metadata I/O merging in the elevator

Change all async metadata buffers to use [READ|WRITE]_META I/O types
so that the I/O doesn't get issued immediately. This allows merging
of adjacent metadata requests but still prioritises them over bulk
data. This shows a 10-15% improvement in sequential create speed of
small files.

Don't include the log buffers in this classification - leave them
as sync types so they are issued immediately.

Signed-off-by: Dave Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2009-11-12 17:10:19.852253847 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2009-11-12 17:13:55.334003777 +0100
@@ -1177,10 +1177,14 @@ _xfs_buf_ioapply(
 	if (bp->b_flags & XBF_ORDERED) {
 		ASSERT(!(bp->b_flags & XBF_READ));
 		rw = WRITE_BARRIER;
-	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
+	} else if (bp->b_flags & XBF_LOG_BUFFER) {
 		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
 		bp->b_flags &= ~_XBF_RUN_QUEUES;
 		rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
+	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
+		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
+		bp->b_flags &= ~_XBF_RUN_QUEUES;
+		rw = (bp->b_flags & XBF_WRITE) ? WRITE : READ_META;
 	} else {
 		rw = (bp->b_flags & XBF_WRITE) ? WRITE :
 		     (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2009-11-12 17:10:19.857278370 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2009-11-12 17:13:55.334003777 +0100
@@ -55,6 +55,7 @@ typedef enum {
 	XBF_FS_MANAGED = (1 << 8),  /* filesystem controls freeing memory  */
  	XBF_ORDERED = (1 << 11),    /* use ordered writes		   */
 	XBF_READ_AHEAD = (1 << 12), /* asynchronous read-ahead		   */
+	XBF_LOG_BUFFER = (1 << 13), /* this is a buffer used for the log   */
 
 	/* flags used only as arguments to access routines */
 	XBF_LOCK = (1 << 14),       /* lock requested			   */
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2009-11-12 17:10:20.267254560 +0100
+++ xfs/fs/xfs/xfs_log.c	2009-11-12 17:13:55.335004184 +0100
@@ -1524,6 +1524,7 @@ xlog_sync(xlog_t		*log,
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
 	XFS_BUF_ASYNC(bp);
+	bp->b_flags |= XBF_LOG_BUFFER;
 	/*
 	 * Do an ordered write for the log block.
 	 * Its unnecessary to flush the first split block in the log wrap case.
@@ -1561,6 +1562,7 @@ xlog_sync(xlog_t		*log,
 		XFS_BUF_ZEROFLAGS(bp);
 		XFS_BUF_BUSY(bp);
 		XFS_BUF_ASYNC(bp);
+		bp->b_flags |= XBF_LOG_BUFFER;
 		if (log->l_mp->m_flags & XFS_MOUNT_BARRIER)
 			XFS_BUF_ORDERED(bp);
 		dptr = XFS_BUF_PTR(bp);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH] xfs: improve metadata I/O merging in the elevator
  2009-11-12 19:09 [PATCH] xfs: improve metadata I/O merging in the elevator Christoph Hellwig
@ 2009-11-16  3:50 ` Dave Chinner
  2009-11-16 11:05   ` Christoph Hellwig
  2009-11-24 18:03 ` [PATCH v2] " Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2009-11-16  3:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Nov 12, 2009 at 02:09:31PM -0500, Christoph Hellwig wrote:
> I had the patch below from Dave in my queue for a while, but previously
> couldn't really reproduce his numbers.  After some discussions of the
> bio types I've retested it again and can see constant improvements when
> using cfq on my large array box with it (5-10% for the sequential create
> workloads), but still nothing on deadline.  Given that people also want
> it for better marking in blktrace it might be time to put it in.
> 
> Comments?

Definitely should be done, but....

It looks like the patch you posted isn't quite doing what was
intended - async write buffers are being classified as WRITE, not
WRITE_META. That means we get more write combining in the elevator
(performance increase) like with WRITE_META, but don't get the
faster dispatch (latency reduction) by using the META queue to keep
the metadata writeback separate from the bulk data writeback.
That may be why deadline is not showing any improvement...

FWIW, the original patch here:

http://oss.sgi.com/archives/xfs/2008-01/msg00630.html

uses WRITE_META, but it looks like you've taken bits of this
patch:

http://oss.sgi.com/archives/xfs/2008-01/msg00653.html

and added the log buffer marking to this patch and accidentally
dropped the WRITE_META marking. i.e. this:

> +	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
> +		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
> +		bp->b_flags &= ~_XBF_RUN_QUEUES;
> +		rw = (bp->b_flags & XBF_WRITE) ? WRITE : READ_META;

I think should be:

+		rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
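
To make the intended order of the checks concrete, here is a minimal
user-space sketch (not kernel code) of how the rw type would be picked
once the write side is fixed. The values for XBF_ORDERED, XBF_READ_AHEAD
and XBF_LOG_BUFFER are the ones from the patch; the values for XBF_READ,
XBF_WRITE and XBF_RUN_QUEUES (a stand-in for _XBF_RUN_QUEUES), the
io_type enum and the classify() helper are assumptions made up for
illustration. The sketch also leaves out the clearing of
_XBF_RUN_QUEUES that the real function does.

```c
#include <assert.h>

/*
 * Flag bits. ORDERED, READ_AHEAD and LOG_BUFFER use the values shown in
 * the patch; the other three are placeholder values for this sketch.
 */
#define XBF_READ        (1u << 0)   /* assumed value */
#define XBF_WRITE       (1u << 1)   /* assumed value */
#define XBF_ORDERED     (1u << 11)
#define XBF_READ_AHEAD  (1u << 12)
#define XBF_LOG_BUFFER  (1u << 13)
#define XBF_RUN_QUEUES  (1u << 20)  /* stand-in for _XBF_RUN_QUEUES, assumed value */

/* Symbolic stand-ins for the kernel's rw request types. */
enum io_type {
	IO_READ, IO_READA, IO_READ_SYNC, IO_READ_META,
	IO_WRITE, IO_WRITE_SYNC, IO_WRITE_META, IO_WRITE_BARRIER
};

/* Hypothetical helper mirroring the if/else chain in _xfs_buf_ioapply(). */
enum io_type classify(unsigned int flags)
{
	if (flags & XBF_ORDERED) {
		/* ordered buffers are always writes */
		assert(!(flags & XBF_READ));
		return IO_WRITE_BARRIER;
	}
	if (flags & XBF_LOG_BUFFER) {
		/* log buffers stay sync so they are issued immediately */
		assert(!(flags & XBF_READ_AHEAD));
		return (flags & XBF_WRITE) ? IO_WRITE_SYNC : IO_READ_SYNC;
	}
	if (flags & XBF_RUN_QUEUES) {
		/* other async metadata gets the META types in both directions */
		assert(!(flags & XBF_READ_AHEAD));
		return (flags & XBF_WRITE) ? IO_WRITE_META : IO_READ_META;
	}
	return (flags & XBF_WRITE) ? IO_WRITE :
	       (flags & XBF_READ_AHEAD) ? IO_READA : IO_READ;
}
```

With this ordering, classify(XBF_WRITE | XBF_RUN_QUEUES) gives
IO_WRITE_META, while a log buffer with the same bits plus
XBF_LOG_BUFFER still gives IO_WRITE_SYNC.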

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: improve metadata I/O merging in the elevator
  2009-11-16  3:50 ` Dave Chinner
@ 2009-11-16 11:05   ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2009-11-16 11:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Mon, Nov 16, 2009 at 02:50:19PM +1100, Dave Chinner wrote:
> Definitely should be done, but....
> 
> It looks like the patch you posted isn't quite doing what was
> intended - async write buffers are being classified as WRITE, not
> WRITE_META. That means we get more write combining in the elevator
> (performance increase) like with WRITE_META, but don't get the
> faster dispatch (latency reduction) by using the META queue to keep
> the metadata writeback separate from the bulk data writeback.
> That may be why deadline is not showing any improvement...
> 
> FWIW, the original patch here:
> 
> http://oss.sgi.com/archives/xfs/2008-01/msg00630.html
> 
> uses WRITE_META, but it looks like you've taken bits of this
> patch:

Indeed.  I'll re-add the write-side markings and will re-bench and
re-submit.


* Re: [PATCH v2] xfs: improve metadata I/O merging in the elevator
  2009-11-12 19:09 [PATCH] xfs: improve metadata I/O merging in the elevator Christoph Hellwig
  2009-11-16  3:50 ` Dave Chinner
@ 2009-11-24 18:03 ` Christoph Hellwig
  2009-12-15 20:12   ` Alex Elder
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2009-11-24 18:03 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dgc@sgi.com>

Change all async metadata buffers to use [READ|WRITE]_META I/O types
so that the I/O doesn't get issued immediately. This allows merging
of adjacent metadata requests but still prioritises them over bulk
data. This shows a 10-15% improvement in sequential create speed of
small files.

Don't include the log buffers in this classification - leave them
as sync types so they are issued immediately.

Signed-off-by: Dave Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2009-11-16 12:03:21.940253900 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2009-11-16 12:05:54.224256517 +0100
@@ -1177,10 +1177,14 @@ _xfs_buf_ioapply(
 	if (bp->b_flags & XBF_ORDERED) {
 		ASSERT(!(bp->b_flags & XBF_READ));
 		rw = WRITE_BARRIER;
-	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
+	} else if (bp->b_flags & XBF_LOG_BUFFER) {
 		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
 		bp->b_flags &= ~_XBF_RUN_QUEUES;
 		rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
+	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
+		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
+		bp->b_flags &= ~_XBF_RUN_QUEUES;
+		rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
 	} else {
 		rw = (bp->b_flags & XBF_WRITE) ? WRITE :
 		     (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2009-11-16 12:03:21.945261731 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2009-11-16 12:03:23.796003934 +0100
@@ -55,6 +55,7 @@ typedef enum {
 	XBF_FS_MANAGED = (1 << 8),  /* filesystem controls freeing memory  */
  	XBF_ORDERED = (1 << 11),    /* use ordered writes		   */
 	XBF_READ_AHEAD = (1 << 12), /* asynchronous read-ahead		   */
+	XBF_LOG_BUFFER = (1 << 13), /* this is a buffer used for the log   */
 
 	/* flags used only as arguments to access routines */
 	XBF_LOCK = (1 << 14),       /* lock requested			   */
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2009-11-16 12:03:21.952261854 +0100
+++ xfs/fs/xfs/xfs_log.c	2009-11-16 12:03:23.798003071 +0100
@@ -1524,6 +1524,7 @@ xlog_sync(xlog_t		*log,
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
 	XFS_BUF_ASYNC(bp);
+	bp->b_flags |= XBF_LOG_BUFFER;
 	/*
 	 * Do an ordered write for the log block.
 	 * Its unnecessary to flush the first split block in the log wrap case.
@@ -1561,6 +1562,7 @@ xlog_sync(xlog_t		*log,
 		XFS_BUF_ZEROFLAGS(bp);
 		XFS_BUF_BUSY(bp);
 		XFS_BUF_ASYNC(bp);
+		bp->b_flags |= XBF_LOG_BUFFER;
 		if (log->l_mp->m_flags & XFS_MOUNT_BARRIER)
 			XFS_BUF_ORDERED(bp);
 		dptr = XFS_BUF_PTR(bp);
Index: xfs/include/linux/fs.h
===================================================================
--- xfs.orig/include/linux/fs.h	2009-11-16 12:04:49.799002997 +0100
+++ xfs/include/linux/fs.h	2009-11-16 12:05:10.130255677 +0100
@@ -151,6 +151,7 @@ struct inodes_stat_t {
 #define READ_META	(READ | (1 << BIO_RW_META))
 #define WRITE_SYNC_PLUG	(WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
 #define WRITE_SYNC	(WRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
+#define WRITE_META	(WRITE | (1 << BIO_RW_META))
 #define WRITE_ODIRECT	(WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
 #define SWRITE_SYNC_PLUG	\
 			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
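
For readers unfamiliar with how these request types compose, the
following stand-alone sketch shows that WRITE_META carries only the
direction and the META bit, without the SYNCIO or UNPLUG bits that make
WRITE_SYNC dispatch immediately. READ and WRITE are the usual direction
values and the macro bodies follow the hunk above; the numeric BIO_RW_*
bit positions and the is_write(), is_meta() and is_sync() helpers are
assumptions for illustration and are not copied from bio.h.

```c
/* Direction values. */
#define READ   0
#define WRITE  1

/* Bit positions: placeholder values for this sketch only. */
#define BIO_RW_SYNCIO  4   /* assumed position */
#define BIO_RW_UNPLUG  5   /* assumed position */
#define BIO_RW_META    6   /* assumed position */
#define BIO_RW_NOIDLE  7   /* assumed position */

/* Composite types, written as in the fs.h hunk. */
#define READ_META        (READ  | (1 << BIO_RW_META))
#define WRITE_META       (WRITE | (1 << BIO_RW_META))
#define WRITE_SYNC_PLUG  (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
#define WRITE_SYNC       (WRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))

/* Hypothetical helpers that pick individual properties out of an rw value. */
int is_write(int rw) { return rw & WRITE; }
int is_meta(int rw)  { return (rw >> BIO_RW_META) & 1; }
int is_sync(int rw)  { return (rw >> BIO_RW_SYNCIO) & 1; }
```

Under these assumptions is_meta(WRITE_META) is true and
is_sync(WRITE_META) is false, while WRITE_SYNC is the reverse.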


* RE: [PATCH v2] xfs: improve metadata I/O merging in the elevator
  2009-11-24 18:03 ` [PATCH v2] " Christoph Hellwig
@ 2009-12-15 20:12   ` Alex Elder
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Elder @ 2009-12-15 20:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> From: Dave Chinner <dgc@sgi.com>
> 
> Change all async metadata buffers to use [READ|WRITE]_META I/O types
> so that the I/O doesn't get issued immediately. This allows merging
> of adjacent metadata requests but still prioritises them over bulk
> data. This shows a 10-15% improvement in sequential create speed of
> small files.
> 
> Don't include the log buffers in this classification - leave them
> as sync types so they are issued immediately.

Looks good.

> Signed-off-by: Dave Chinner <dgc@sgi.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Alex Elder <aelder@sgi.com>

> Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2009-11-16 12:03:21.940253900 +0100
> +++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2009-11-16 12:05:54.224256517 +0100
> @@ -1177,10 +1177,14 @@ _xfs_buf_ioapply(
>  	if (bp->b_flags & XBF_ORDERED) {
>  		ASSERT(!(bp->b_flags & XBF_READ));
>  		rw = WRITE_BARRIER;
> -	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
> +	} else if (bp->b_flags & XBF_LOG_BUFFER) {
>  		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
>  		bp->b_flags &= ~_XBF_RUN_QUEUES;
>  		rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
> +	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
> +		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
> +		bp->b_flags &= ~_XBF_RUN_QUEUES;
> +		rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
>  	} else {
>  		rw = (bp->b_flags & XBF_WRITE) ? WRITE :
>  		     (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
> Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2009-11-16 12:03:21.945261731 +0100
> +++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2009-11-16 12:03:23.796003934 +0100
> @@ -55,6 +55,7 @@ typedef enum {
>  	XBF_FS_MANAGED = (1 << 8),  /* filesystem controls freeing memory  */
>   	XBF_ORDERED = (1 << 11),    /* use ordered writes		   */
>  	XBF_READ_AHEAD = (1 << 12), /* asynchronous read-ahead		   */
> +	XBF_LOG_BUFFER = (1 << 13), /* this is a buffer used for the log   */
> 
>  	/* flags used only as arguments to access routines */
>  	XBF_LOCK = (1 << 14),       /* lock requested			   */
> Index: xfs/fs/xfs/xfs_log.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_log.c	2009-11-16 12:03:21.952261854 +0100
> +++ xfs/fs/xfs/xfs_log.c	2009-11-16 12:03:23.798003071 +0100
> @@ -1524,6 +1524,7 @@ xlog_sync(xlog_t		*log,
>  	XFS_BUF_ZEROFLAGS(bp);
>  	XFS_BUF_BUSY(bp);
>  	XFS_BUF_ASYNC(bp);
> +	bp->b_flags |= XBF_LOG_BUFFER;
>  	/*
>  	 * Do an ordered write for the log block.
>  	 * Its unnecessary to flush the first split block in the log wrap case.
> @@ -1561,6 +1562,7 @@ xlog_sync(xlog_t		*log,
>  		XFS_BUF_ZEROFLAGS(bp);
>  		XFS_BUF_BUSY(bp);
>  		XFS_BUF_ASYNC(bp);
> +		bp->b_flags |= XBF_LOG_BUFFER;
>  		if (log->l_mp->m_flags & XFS_MOUNT_BARRIER)
>  			XFS_BUF_ORDERED(bp);
>  		dptr = XFS_BUF_PTR(bp);
> Index: xfs/include/linux/fs.h
> ===================================================================
> --- xfs.orig/include/linux/fs.h	2009-11-16 12:04:49.799002997 +0100
> +++ xfs/include/linux/fs.h	2009-11-16 12:05:10.130255677 +0100
> @@ -151,6 +151,7 @@ struct inodes_stat_t {
>  #define READ_META	(READ | (1 << BIO_RW_META))
>  #define WRITE_SYNC_PLUG	(WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
>  #define WRITE_SYNC	(WRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
> +#define WRITE_META	(WRITE | (1 << BIO_RW_META))
>  #define WRITE_ODIRECT	(WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
>  #define SWRITE_SYNC_PLUG	\
>  			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
> 

