[PATCH 8 of 8] Avoid too many boundary buffers in DIO

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com
Subject: [PATCH 8 of 8] Avoid too many boundary buffers in DIO
Date: Thu, 21 Dec 2006 21:05:28 -0500	[thread overview]
Message-ID: <20061222020528.GT11354@think.oraclecorp.com> (raw)
In-Reply-To: <20061222014552.GA26388@think.oraclecorp.com>

Dave Chinner found a 10% performance regression with ext3 when using DIO
to fill holes instead of buffered IO.  On large IOs, the ext3 get_block
routine will send more than a page worth of blocks back to DIO via a
single buffer_head with a large b_size value.

The DIO code iterates through this massive block and tests for a
boundary buffer over and over again.  For every block size unit spanned
by the big map_bh, the boundary bit is tested and a bio may be forced
down to the block layer.

There are two potential fixes, one is to ignore the boundary bit on
large regions returned by the FS.  DIO can't tell which part of the big
region was a boundary, and so it may not be a good idea to trust the
hint.

This patch just clears the boundary bit after using it once.  It is 10%
faster for a streaming DIO write w/blocksize of 512k on my sata drive.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff -r 3bd838f3dc06 -r 9d3d4e0f01fe fs/direct-io.c
--- a/fs/direct-io.c	Thu Dec 21 15:31:31 2006 -0500
+++ b/fs/direct-io.c	Thu Dec 21 15:31:31 2006 -0500
@@ -610,7 +610,6 @@ static int dio_new_bio(struct dio *dio, 
 	nr_pages = min(dio->pages_in_io, bio_get_nr_vecs(dio->map_bh.b_bdev));
 	BUG_ON(nr_pages <= 0);
 	ret = dio_bio_alloc(dio, dio->map_bh.b_bdev, sector, nr_pages);
-	dio->boundary = 0;
 out:
 	return ret;
 }
@@ -664,12 +663,6 @@ static int dio_send_cur_page(struct dio 
 		 */
 		if (dio->final_block_in_bio != dio->cur_page_block)
 			dio_bio_submit(dio);
-		/*
-		 * Submit now if the underlying fs is about to perform a
-		 * metadata read
-		 */
-		if (dio->boundary)
-			dio_bio_submit(dio);
 	}
 
 	if (dio->bio == NULL) {
@@ -686,6 +679,12 @@ static int dio_send_cur_page(struct dio 
 			BUG_ON(ret != 0);
 		}
 	}
+	/*
+	 * Submit now if the underlying fs is about to perform a
+	 * metadata read
+	 */
+	if (dio->boundary)
+		dio_bio_submit(dio);
 out:
 	return ret;
 }
@@ -712,6 +711,10 @@ submit_page_section(struct dio *dio, str
 		unsigned offset, unsigned len, sector_t blocknr)
 {
 	int ret = 0;
+	int boundary = dio->boundary;
+
+	/* don't let dio_send_cur_page do the boundary too soon */
+	dio->boundary = 0;
 
 	if (dio->rw & WRITE) {
 		/*
@@ -728,17 +731,7 @@ submit_page_section(struct dio *dio, str
 		(dio->cur_page_block +
 			(dio->cur_page_len >> dio->blkbits) == blocknr)) {
 		dio->cur_page_len += len;
-
-		/*
-		 * If dio->boundary then we want to schedule the IO now to
-		 * avoid metadata seeks.
-		 */
-		if (dio->boundary) {
-			ret = dio_send_cur_page(dio);
-			page_cache_release(dio->cur_page);
-			dio->cur_page = NULL;
-		}
-		goto out;
+		goto out_send;
 	}
 
 	/*
@@ -757,6 +750,18 @@ submit_page_section(struct dio *dio, str
 	dio->cur_page_offset = offset;
 	dio->cur_page_len = len;
 	dio->cur_page_block = blocknr;
+
+out_send:
+	/*
+	 * If dio->boundary then we want to schedule the IO now to
+	 * avoid metadata seeks.
+	 */
+	if (boundary) {
+		dio->boundary = 1;
+		ret = dio_send_cur_page(dio);
+		page_cache_release(dio->cur_page);
+		dio->cur_page = NULL;
+	}
 out:
 	return ret;
 }
@@ -962,7 +967,16 @@ do_holes:
 			this_chunk_bytes = this_chunk_blocks << blkbits;
 			BUG_ON(this_chunk_bytes == 0);
 
-			dio->boundary = buffer_boundary(map_bh);
+			/*
+			 * get_block may return more than one page worth
+			 * of blocks.  Make sure only the last io we
+			 * send down for this region is a boundary
+			 */
+			if (dio->blocks_available == this_chunk_blocks)
+				dio->boundary = buffer_boundary(map_bh);
+			else
+				dio->boundary = 0;
+
 			ret = submit_page_section(dio, page, offset_in_page,
 				this_chunk_bytes, dio->next_block_for_io);
 			if (ret) {

next prev parent reply	other threads:[~2006-12-22  2:04 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-22  1:45 [PATCH 0 of 8] O_DIRECT locking rework v5 Chris Mason
2006-12-22  1:52 ` [PATCH 1 of 8] Introduce a place holder page for the pagecache Chris Mason
2006-12-22  1:55 ` [PATCH 2 of 8] Change O_DIRECT to use placeholders instead of i_mutex/i_alloc_sem locking Chris Mason
2006-12-22  1:57 ` [PATCH 3 of 8] DIO: don't fall back to buffered writes Chris Mason
2006-12-22  1:59 ` [PATCH 4 of 8] Add flags to control direct IO helpers Chris Mason
2006-12-22  2:00 ` [PATCH 5 of 8] Make ext3 safe for the new DIO locking rules Chris Mason
2006-12-22  2:02 ` [PATCH 6 of 8] Make reiserfs safe for " Chris Mason
2006-12-22  2:03 ` [PATCH 7 of 8] Adapt XFS to the new blockdev_direct_IO calls Chris Mason
2006-12-22  2:05 ` Chris Mason [this message]
  -- strict thread matches above, loose matches on Subject: below --
2007-02-07  0:32 [RFC PATCH 0 of 8] O_DIRECT locking rework Chris Mason
2007-02-07  0:32 ` [PATCH 8 of 8] Avoid too many boundary buffers in DIO Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061222020528.GT11354@think.oraclecorp.com \
    --to=chris.mason@oracle.com \
    --cc=akpm@osdl.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).