linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, Andi Kleen <ak@linux.intel.com>
Subject: [PATCH 11/11] DIO: optimize cache misses in the submission path v2
Date: Mon, 29 Aug 2011 16:23:22 -0700	[thread overview]
Message-ID: <1314660202-661-12-git-send-email-andi@firstfloor.org> (raw)
In-Reply-To: <1314660202-661-1-git-send-email-andi@firstfloor.org>

From: Andi Kleen <ak@linux.intel.com>

Some investigation of a transaction processing workload showed that
a major consumer of cycles in __blockdev_direct_IO is the cache miss
while accessing the block size. This is because it has to walk
the chain from block_dev to gendisk to queue.

The block size is needed early on to check alignment and sizes.
It's only done if the check for the inode block size fails.
But the costly block device state is unconditionally fetched.

- Reorganize the code to only fetch block dev state when actually
needed.

Then do a prefetch on the block dev early on in the direct IO
path. This is worth it, because there is substantial code run
before we actually touch the block dev now.

- I also added some unlikelies to make it clear the compiler
that block device fetch code is not normally executed.

This gave a small, but measurable improvement on a large database
benchmark (about 0.3%)

v2: Remove unlikely (Jeff Moyer)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/direct-io.c |   45 ++++++++++++++++++++++++++++++++++++---------
 1 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 03bcc6f..e149f5f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1086,8 +1086,8 @@ static inline int drop_refcount(struct dio *dio)
  * individual fields and will generate much worse code. 
  * This is important for the whole file.
  */
-ssize_t
-__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+static inline ssize_t
+do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
 	dio_submit_t submit_io,	int flags)
@@ -1096,7 +1096,6 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	size_t size;
 	unsigned long addr;
 	unsigned blkbits = inode->i_blkbits;
-	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
 	ssize_t retval = -EINVAL;
 	loff_t end = offset;
@@ -1109,12 +1108,14 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	if (rw & WRITE)
 		rw = WRITE_ODIRECT;
 
-	if (bdev)
-		bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev));
+	/* 
+	 * Avoid references to bdev if not absolutely needed to give
+	 * the early prefetch in the caller enough time.
+	 */
 
 	if (offset & blocksize_mask) {
 		if (bdev)
-			 blkbits = bdev_blkbits;
+			blkbits = blksize_bits(bdev_logical_block_size(bdev));
 		blocksize_mask = (1 << blkbits) - 1;
 		if (offset & blocksize_mask)
 			goto out;
@@ -1125,11 +1126,13 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 		addr = (unsigned long)iov[seg].iov_base;
 		size = iov[seg].iov_len;
 		end += size;
-		if ((addr & blocksize_mask) || (size & blocksize_mask))  {
+		if (unlikely((addr & blocksize_mask) || 
+			     (size & blocksize_mask)))  {
 			if (bdev)
-				 blkbits = bdev_blkbits;
+				 blkbits = blksize_bits(
+					 bdev_logical_block_size(bdev));
 			blocksize_mask = (1 << blkbits) - 1;
-			if ((addr & blocksize_mask) || (size & blocksize_mask))  
+			if ((addr & blocksize_mask) || (size & blocksize_mask))
 				goto out;
 		}
 	}
@@ -1312,6 +1315,30 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 out:
 	return retval;
 }
+
+ssize_t
+__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+	dio_submit_t submit_io,	int flags)
+{
+	/* 
+	 * The block device state is needed in the end to finally
+	 * submit everything.  Since it's likely to be cache cold
+	 * prefetch it here as first thing to hide some of the
+	 * latency.
+	 * 
+	 * Attempt to prefetch the pieces we likely need later.
+	 */
+	prefetch(&bdev->bd_disk->part_tbl);
+	prefetch(bdev->bd_queue);
+	prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES);
+
+	return do_blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
+				     nr_segs, get_block, end_io,
+				     submit_io, flags);
+}
+
 EXPORT_SYMBOL(__blockdev_direct_IO);
 
 static __init int dio_init(void)
-- 
1.7.4.4

      parent reply	other threads:[~2011-08-29 23:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-29 23:23 Updated direct IO optimization patchkit v3 Andi Kleen
2011-08-29 23:23 ` [PATCH 01/11] DIO: Separate fields only used in the submission path from struct dio Andi Kleen
2011-08-29 23:23 ` [PATCH 02/11] DIO: Fix a wrong comment Andi Kleen
2011-08-29 23:23 ` [PATCH 03/11] DIO: Rearrange fields in dio/dio_submit to avoid holes Andi Kleen
2011-08-29 23:23 ` [PATCH 04/11] DIO: Use a slab cache for struct dio Andi Kleen
2011-08-29 23:23 ` [PATCH 05/11] DIO: Separate map_bh from dio v2 Andi Kleen
2011-08-29 23:23 ` [PATCH 06/11] DIO: Inline the complete submission path v2 Andi Kleen
2011-08-29 23:23 ` [PATCH 07/11] DIO: Merge direct_io_walker into __blockdev_direct_IO Andi Kleen
2011-08-29 23:23 ` [PATCH 08/11] DIO: Remove unnecessary dio argument from dio_pages_present() Andi Kleen
2011-08-29 23:23 ` [PATCH 09/11] DIO: Remove unused dio parameter from dio_bio_add_page Andi Kleen
2011-08-29 23:23 ` [PATCH 10/11] VFS: Cache request_queue in struct block_device Andi Kleen
2011-08-29 23:23 ` Andi Kleen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1314660202-661-12-git-send-email-andi@firstfloor.org \
    --to=andi@firstfloor.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).