* [PATCH] avoid too many boundaries in DIO
From: Chris Mason @ 2006-11-10  1:48 UTC
  To: akpm, linux-kernel; +Cc: dgc

Dave Chinner found a 10% performance regression with ext3 when using DIO
to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
routine will send more than a page worth of blocks back to DIO via a
single buffer_head with a large b_size value.

The DIO code iterates through this massive block and tests for a
boundary buffer over and over again.  For every block size unit spanned
by the big map_bh, the boundary bit is tested and a bio may be forced
down to the block layer.
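
In outline, the inner loop looks like this (a simplified sketch of
do_direct_IO(), not the kernel code verbatim; blocks_left_in_page is
shorthand here, not a real variable):

	while (dio->blocks_available) {
		this_chunk_blocks = min(dio->blocks_available,
					blocks_left_in_page);
		this_chunk_bytes = this_chunk_blocks << blkbits;

		/* same bit, re-read once per chunk of one big map_bh */
		dio->boundary = buffer_boundary(map_bh);

		/* may force a bio down every time dio->boundary is set */
		ret = submit_page_section(dio, page, offset_in_page,
				this_chunk_bytes, dio->next_block_for_io);

		dio->next_block_for_io += this_chunk_blocks;
		dio->blocks_available -= this_chunk_blocks;
	}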

There are two potential fixes.  One is to ignore the boundary bit on
large regions returned by the FS: DIO can't tell which part of the big
region was a boundary, so it may not be a good idea to trust the hint.

This patch just clears the boundary bit after using it once.  It is 10%
faster for a streaming DIO write w/blocksize of 512k on my sata drive.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 38d08cbe880b fs/direct-io.c
--- a/fs/direct-io.c	Thu Nov 09 20:02:08 2006 -0500
+++ b/fs/direct-io.c	Thu Nov 09 20:31:12 2006 -0500
@@ -959,6 +959,17 @@ do_holes:
 			BUG_ON(this_chunk_bytes == 0);
 
 			dio->boundary = buffer_boundary(map_bh);
+
+			/*
+			 * get_block may return more than one page worth
+			 * of blocks.  Only make the first one a boundary.
+			 * This is still sub-optimal, it probably only
+			 * makes sense to play with boundaries when
+			 * get_block returns a single FS block sized
+			 * unit.
+			 */
+			clear_buffer_boundary(map_bh);
+
 			ret = submit_page_section(dio, page, offset_in_page,
 				this_chunk_bytes, dio->next_block_for_io);
 			if (ret) {


* Re: [PATCH] avoid too many boundaries in DIO
From: David Chinner @ 2006-11-10  6:35 UTC
  To: Chris Mason; +Cc: akpm, linux-kernel

On Thu, Nov 09, 2006 at 08:48:54PM -0500, Chris Mason wrote:
> Dave Chinner found a 10% performance regression with ext3 when using DIO
> to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
> routine will send more than a page worth of blocks back to DIO via a
> single buffer_head with a large b_size value.
> 
> The DIO code iterates through this massive block and tests for a
> boundary buffer over and over again.  For every block size unit spanned
> by the big map_bh, the boundary bit is tested and a bio may be forced
> down to the block layer.
> 
> There are two potential fixes.  One is to ignore the boundary bit on
> large regions returned by the FS: DIO can't tell which part of the big
> region was a boundary, so it may not be a good idea to trust the hint.
> 
> This patch just clears the boundary bit after using it once.  It is 10%
> faster for a streaming DIO write w/blocksize of 512k on my sata drive.

8p altix, 8GB RAM, 64 FC disks, >2.5GiB/s sustainable raw throughput.
dm stripe, outer 1GB of each disk for 64GB volume. Chunk size 128k.
Single thread Direct I/O, I/O size of 512MiB, sequential file extend.

# mkfs.ext3 -E stride=32 /dev/mapper/testvol
# mkfs.xfs -f -d sunit=256,swidth=16384 /dev/mapper/testvol

ext3 mounted data=ordered; data=writeback results are the same
for direct I/O.

2.6.19-rc3-pl is 2.6.19-rc3 + the direct I/O placeholder patches.
2.6.19-rc3-pl-bd is 2.6.19-rc3-pl plus the boundary patch.

Kernel            fs    throughput      I/Os/s          sys+intr
-----------       ----  ----------      -------         ------
2.6.19-rc3        ext3   660MiB/s       ~36,000         70+60
2.6.19-rc3-pl     ext3   600MiB/s       ~36,000         70+60
2.6.19-rc3-pl-bd  ext3   715MiB/s       ~16,000         45+35
2.6.19-rc3        xfs   2.28GiB/s       ~18,000         65+65
2.6.19-rc3-pl     xfs   2.24GiB/s       ~18,000         65+65
2.6.19-rc3-pl-bd  xfs   2.26GiB/s       ~18,000         65+65

Hole filling with direct I/O shows equivalent results.

The boundary patch doubles the average I/O size of ext3 and
substantially reduces CPU usage for direct I/O. Nice one, Chris.
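
(Back of the envelope from the table: 600MiB/s at ~36,000 I/Os/s is
roughly 17KiB per I/O; 715MiB/s at ~16,000 I/Os/s is roughly 46KiB
per I/O.)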

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: [PATCH] avoid too many boundaries in DIO
From: Andrew Morton @ 2006-11-10  6:56 UTC
  To: Chris Mason; +Cc: linux-kernel, dgc

On Thu, 9 Nov 2006 20:48:54 -0500
Chris Mason <chris.mason@oracle.com> wrote:

> Dave Chinner found a 10% performance regression with ext3 when using DIO
> to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
> routine will send more than a page worth of blocks back to DIO via a
> single buffer_head with a large b_size value.
> 
> The DIO code iterates through this massive block and tests for a
> boundary buffer over and over again.  For every block size unit spanned
> by the big map_bh, the boundary bit is tested and a bio may be forced
> down to the block layer.
> 
> There are two potential fixes.  One is to ignore the boundary bit on
> large regions returned by the FS: DIO can't tell which part of the big
> region was a boundary, so it may not be a good idea to trust the hint.
> 
> This patch just clears the boundary bit after using it once.  It is 10%
> faster for a streaming DIO write w/blocksize of 512k on my sata drive.
> 

Thanks.

But that's two large performance regressions (so far) from the multi-block
get_block() feature.  And that was allegedly a performance optimisation! 
Who's testing this stuff?

> 
> diff -r 38d08cbe880b fs/direct-io.c
> --- a/fs/direct-io.c	Thu Nov 09 20:02:08 2006 -0500
> +++ b/fs/direct-io.c	Thu Nov 09 20:31:12 2006 -0500
> @@ -959,6 +959,17 @@ do_holes:
>  			BUG_ON(this_chunk_bytes == 0);
>  
>  			dio->boundary = buffer_boundary(map_bh);
> +
> +			/*
> +			 * get_block may return more than one page worth
> +			 * of blocks.  Only make the first one a boundary.
> +			 * This is still sub-optimal, it probably only
> +			 * makes sense to play with boundaries when
> +			 * get_block returns a single FS block sized
> +			 * unit.
> +			 */
> +			clear_buffer_boundary(map_bh);
> +
>  			ret = submit_page_section(dio, page, offset_in_page,
>  				this_chunk_bytes, dio->next_block_for_io);
>  			if (ret) {


Is that actually correct?  If ->get_block() returned a buffer_boundary()
buffer then what we want to do is to push down all the thus-far-queued BIOs
once we've submitted _all_ of the BIOs represented by map_bh.  I think that
if we require more than one BIO to cover map_bh.b_size then we'll do the
submission after the first BIO has been sent instead of after the final one
has been sent?
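
To illustrate the concern (assuming, say, a map_bh whose b_size takes
two bios to cover):

	map_bh:  [ blk0 blk1 | blk2 blk3 ]    buffer_boundary() set
	bio 1:     blk0 blk1    <- boundary honoured here, queue pushed
	bio 2:     blk2 blk3    <- the real boundary sits after this one

The push should follow bio 2, not bio 1.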



* Re: [PATCH] avoid too many boundaries in DIO
From: Chris Mason @ 2006-11-10 21:49 UTC
  To: Andrew Morton; +Cc: linux-kernel, dgc

On Thu, Nov 09, 2006 at 10:56:18PM -0800, Andrew Morton wrote:
> > This patch just clears the boundary bit after using it once.  It is 10%
> > faster for a streaming DIO write w/blocksize of 512k on my sata drive.
> > 
> 
> Thanks.
> 
> But that's two large performance regressions (so far) from the multi-block
> get_block() feature.  And that was allegedly a performance optimisation! 
> Who's testing this stuff?

Well, I've been wearing the sata-drive-writeback-cache cape of shame
since Dave found the regression while testing my stuff, so I can't
really point fingers.  On some drives the regression didn't show up, but
it should be there on any beefy storage.

> Is that actually correct?  If ->get_block() returned a
> buffer_boundary() buffer then what we want to do is to push down all
> the thus-far-queued BIOs once we've submitted _all_ of the BIOs
> represented by map_bh.  I think that if we require more than one BIO
> to cover map_bh.b_size then we'll do the submission after the first
> BIO has been sent instead of after the final one has been sent?
> 
I realized the same thing this morning, but it took a while to figure
out why honoring the boundary on the last block was 5% slower than my
first patch.  It turns out that we consistently send down the boundary
bio too soon.

Testing of this has been very light, but I wanted to get it out for
review.  I'll test more over the weekend.

-chris

From: Chris Mason <chris.mason@oracle.com>
Subject: avoid too many boundaries in DIO

Dave Chinner found a 10% performance regression with ext3 when using DIO
to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
routine will send more than a page worth of blocks back to DIO via a
single buffer_head with a large b_size value.

The DIO code iterates through this massive block and tests for a
boundary buffer over and over again.  For every block size unit spanned
by the big map_bh, the boundary bit is tested and a bio may be forced
down to the block layer.

This patch changes things to only submit the boundary bio for the
last block in the big map_bh.

The DIO code had a number of places that would honor dio->boundary
too early, sending the bio down before actually adding the boundary
block to it.  Those are also fixed.
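
In outline (condensed from the diff below, not verbatim),
submit_page_section() now latches the flag locally so nothing can
flush before the boundary block is actually in the bio:

	int boundary = dio->boundary;

	/* don't let dio_send_cur_page() flush on the boundary too soon */
	dio->boundary = 0;

	/* ... grow the current page, or queue a new one ... */

	if (boundary) {
		/* the boundary block is in the bio now; safe to flush */
		dio->boundary = 1;
		ret = dio_send_cur_page(dio);
		page_cache_release(dio->cur_page);
		dio->cur_page = NULL;
	}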

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 18a9e9f5c707 fs/direct-io.c
--- a/fs/direct-io.c	Thu Oct 19 08:30:00 2006 +0700
+++ b/fs/direct-io.c	Fri Nov 10 16:33:04 2006 -0500
@@ -572,7 +571,6 @@ static int dio_new_bio(struct dio *dio, 
 	nr_pages = min(dio->pages_in_io, bio_get_nr_vecs(dio->map_bh.b_bdev));
 	BUG_ON(nr_pages <= 0);
 	ret = dio_bio_alloc(dio, dio->map_bh.b_bdev, sector, nr_pages);
-	dio->boundary = 0;
 out:
 	return ret;
 }
@@ -626,12 +624,6 @@ static int dio_send_cur_page(struct dio 
 		 */
 		if (dio->final_block_in_bio != dio->cur_page_block)
 			dio_bio_submit(dio);
-		/*
-		 * Submit now if the underlying fs is about to perform a
-		 * metadata read
-		 */
-		if (dio->boundary)
-			dio_bio_submit(dio);
 	}
 
 	if (dio->bio == NULL) {
@@ -648,6 +640,12 @@ static int dio_send_cur_page(struct dio 
 			BUG_ON(ret != 0);
 		}
 	}
+	/*
+	 * Submit now if the underlying fs is about to perform a
+	 * metadata read
+	 */
+	if (dio->boundary)
+		dio_bio_submit(dio);
 out:
 	return ret;
 }
@@ -674,6 +672,10 @@ submit_page_section(struct dio *dio, str
 		unsigned offset, unsigned len, sector_t blocknr)
 {
 	int ret = 0;
+	int boundary = dio->boundary;
+
+	/* don't let dio_send_cur_page do the boundary too soon */
+	dio->boundary = 0;
 
 	/*
 	 * Can we just grow the current page's presence in the dio?
@@ -683,17 +683,7 @@ submit_page_section(struct dio *dio, str
 		(dio->cur_page_block +
 			(dio->cur_page_len >> dio->blkbits) == blocknr)) {
 		dio->cur_page_len += len;
-
-		/*
-		 * If dio->boundary then we want to schedule the IO now to
-		 * avoid metadata seeks.
-		 */
-		if (dio->boundary) {
-			ret = dio_send_cur_page(dio);
-			page_cache_release(dio->cur_page);
-			dio->cur_page = NULL;
-		}
-		goto out;
+		goto out_send;
 	}
 
 	/*
@@ -712,6 +702,18 @@ submit_page_section(struct dio *dio, str
 	dio->cur_page_offset = offset;
 	dio->cur_page_len = len;
 	dio->cur_page_block = blocknr;
+
+out_send:
+	/*
+	 * If dio->boundary then we want to schedule the IO now to
+	 * avoid metadata seeks.
+	 */
+	if (boundary) {
+		dio->boundary = 1;
+		ret = dio_send_cur_page(dio);
+		page_cache_release(dio->cur_page);
+		dio->cur_page = NULL;
+	}
 out:
 	return ret;
 }
@@ -917,7 +919,16 @@ do_holes:
 			this_chunk_bytes = this_chunk_blocks << blkbits;
 			BUG_ON(this_chunk_bytes == 0);
 
-			dio->boundary = buffer_boundary(map_bh);
+			/*
+			 * get_block may return more than one page worth
+			 * of blocks.  Make sure only the last io we
+			 * send down for this region is a boundary
+			 */
+			if (dio->blocks_available == this_chunk_blocks)
+				dio->boundary = buffer_boundary(map_bh);
+			else
+				dio->boundary = 0;
+
 			ret = submit_page_section(dio, page, offset_in_page,
 				this_chunk_bytes, dio->next_block_for_io);
 			if (ret) {

