linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/3] implement readpages() for block device to optimize sequential read
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
	Jeff Moyer, linux-fsdevel

This patchset implements the readpages() operation for block devices
using mpage_readpages(), which can build multipage BIOs instead of one
BIO per page and thereby reduces system CPU time consumption.
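
Whether a ->readpages hook exists matters because of how readahead
dispatches: in kernels of this era, read_pages() in mm/readahead.c hands
the whole batch to ->readpages when the address space provides one and
otherwise calls ->readpage once per page.  The userspace sketch below
models only that dispatch; model_aops, model_read_pages() and the other
names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down model of struct address_space_operations:
 * only the two read hooks that readahead cares about. */
struct model_aops {
	int (*readpage)(unsigned long index);
	int (*readpages)(const unsigned long *indices, unsigned nr_pages);
};

/* Counters so a caller can see which hook was used. */
static unsigned readpage_calls, readpages_calls;

static int model_readpage(unsigned long index)
{
	(void)index;
	readpage_calls++;	/* one call, with its own bio(s), per page */
	return 0;
}

static int model_readpages(const unsigned long *indices, unsigned nr_pages)
{
	(void)indices;
	(void)nr_pages;
	readpages_calls++;	/* one call for the whole readahead batch */
	return 0;
}

/* Models the dispatch in read_pages(): prefer ->readpages when the
 * address space provides it, else fall back to ->readpage per page. */
static int model_read_pages(const struct model_aops *ops,
			    const unsigned long *indices, unsigned nr_pages)
{
	unsigned i;

	assert(ops->readpage != NULL);
	if (ops->readpages)
		return ops->readpages(indices, nr_pages);
	for (i = 0; i < nr_pages; i++)
		ops->readpage(indices[i]);
	return 0;
}

/* Before this series: only .readpage is set for block devices. */
static const struct model_aops blk_aops_before = {
	.readpage = model_readpage,
};

/* After this series: .readpages is wired up as well. */
static const struct model_aops blk_aops_after = {
	.readpage = model_readpage,
	.readpages = model_readpages,
};
```

With blk_aops_before a 32-page readahead batch costs 32 ->readpage calls;
with blk_aops_after it is a single ->readpages call, which is what gives
mpage_readpages() the chance to build larger BIOs.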

Akinobu Mita (3):
  vfs: make guard_bh_eod() more generic
  vfs: guard end of device for mpage interface
  block_dev: implement readpages() to optimize sequential read

 fs/block_dev.c |  7 +++++++
 fs/buffer.c    | 26 ++++++++++++--------------
 fs/internal.h  |  5 +++++
 fs/mpage.c     |  2 ++
 4 files changed, 26 insertions(+), 14 deletions(-)

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
-- 
1.9.1

* [PATCH 1/3] vfs: make guard_bh_eod() more generic
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
	Jeff Moyer, linux-fsdevel

guard_bh_eod() is used in submit_bh() to allow IO even on the odd last
sectors of a device, even if the block size is some multiple of the
physical sector size.  This patch makes guard_bh_eod() more generic and
renames it to guard_bio_eod() so that it can be used without a struct
buffer_head argument.  Instead of relying on the buffer_head, it now
trims the last bio_vec of the bio and, for reads, zeroes the truncated
tail with zero_user().
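
To make the arithmetic concrete, here is a userspace sketch of the size
check that guard_bio_eod() performs, written as a pure function.  The
function name and parameters are hypothetical.  The early return for a
bio that starts at or beyond the last sector is assumed from the lines
elided between the hunks below, in line with the function's comment that
truly out-of-range accesses turn into IO errors.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the size check in guard_bio_eod().  Given the
 * device size in bytes and a bio's start sector and byte length, return
 * how many bytes must be trimmed from the tail of the bio (0 if none).
 * Sectors are 512 bytes, hence the ">> 9" and "<< 9" shifts.
 */
static unsigned eod_truncated_bytes(uint64_t dev_bytes, uint64_t bi_sector,
				    unsigned bi_size)
{
	uint64_t maxsector = dev_bytes >> 9;
	unsigned trimmed;

	if (!maxsector)			/* zero size: nothing to check */
		return 0;
	if (bi_sector >= maxsector)	/* assumed: wholly past end, let it fail */
		return 0;

	maxsector -= bi_sector;		/* sectors left from the bio's start */
	if ((bi_size >> 9) <= maxsector)
		return 0;		/* bio fits within the device */

	/* The bio straddles the device end: trim everything past it. */
	trimmed = bi_size - (unsigned)(maxsector << 9);
	assert(trimmed < bi_size);	/* at least one sector remains */
	return trimmed;
}
```

For a device of 41984 bytes (82 sectors) and a 4096-byte read starting at
sector 80, only two sectors (1024 bytes) exist on the device, so 3072
bytes are trimmed from the end of the bio.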

This change is needed because using mpage_readpages() for block devices
requires adding this guard check to the mpage code.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/buffer.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111..f891c90 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2954,7 +2954,7 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
 
 /*
  * This allows us to do IO even on the odd last sectors
- * of a device, even if the bh block size is some multiple
+ * of a device, even if the block size is some multiple
  * of the physical sector size.
  *
  * We'll just truncate the bio to the size of the device,
@@ -2964,10 +2964,11 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
  * errors, this only handles the "we need to be able to
  * do IO at the final sector" case.
  */
-static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
+static void guard_bio_eod(int rw, struct bio *bio)
 {
 	sector_t maxsector;
-	unsigned bytes;
+	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
+	unsigned truncated_bytes;
 
 	maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
 	if (!maxsector)
@@ -2982,23 +2983,20 @@ static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
 		return;
 
 	maxsector -= bio->bi_iter.bi_sector;
-	bytes = bio->bi_iter.bi_size;
-	if (likely((bytes >> 9) <= maxsector))
+	if (likely((bio->bi_iter.bi_size >> 9) <= maxsector))
 		return;
 
-	/* Uhhuh. We've got a bh that straddles the device size! */
-	bytes = maxsector << 9;
+	/* Uhhuh. We've got a bio that straddles the device size! */
+	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
 
 	/* Truncate the bio.. */
-	bio->bi_iter.bi_size = bytes;
-	bio->bi_io_vec[0].bv_len = bytes;
+	bio->bi_iter.bi_size -= truncated_bytes;
+	bvec->bv_len -= truncated_bytes;
 
 	/* ..and clear the end of the buffer for reads */
 	if ((rw & RW_MASK) == READ) {
-		void *kaddr = kmap_atomic(bh->b_page);
-		memset(kaddr + bh_offset(bh) + bytes, 0, bh->b_size - bytes);
-		kunmap_atomic(kaddr);
-		flush_dcache_page(bh->b_page);
+		zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+				truncated_bytes);
 	}
 }
 
@@ -3039,7 +3037,7 @@ int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
 	bio->bi_flags |= bio_flags;
 
 	/* Take care of bh's that straddle the end of the device */
-	guard_bh_eod(rw, bio, bh);
+	guard_bio_eod(rw, bio);
 
 	if (buffer_meta(bh))
 		rw |= REQ_META;
-- 
1.9.1

* [PATCH 2/3] vfs: guard end of device for mpage interface
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
	Jeff Moyer, linux-fsdevel

Add a guard_bio_eod() check to the mpage code so that IO can be done
even on the odd last sectors of a device, even if the block size is
some multiple of the physical sector size.

Using mpage_readpages() for block devices requires this guard check.
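
Because mpage can put several pages into one bio, the part of a bio that
hangs past the device end is expected to sit in its final bio_vec, which
is the one the patched guard_bio_eod() shrinks.  The userspace sketch
below models only that trimming step.  model_bvec, model_bio and
model_trim_tail() are hypothetical stand-ins for struct bio_vec, struct
bio and the kernel code, and the check that the overhang fits within the
last vector is an assumption of the model, since the patch adjusts only
that vector.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_VECS 8

/* Hypothetical userspace stand-ins for struct bio_vec / struct bio. */
struct model_bvec {
	unsigned char *page;	/* backing buffer of MODEL_PAGE_SIZE bytes */
	unsigned offset;	/* like bv_offset */
	unsigned len;		/* like bv_len */
};

struct model_bio {
	struct model_bvec vec[MODEL_MAX_VECS];
	unsigned vcnt;		/* like bi_vcnt */
	unsigned size;		/* like bi_iter.bi_size */
};

/*
 * Trim the tail the way the patched guard_bio_eod() does: shrink the
 * total size and the last bvec, then, for reads, zero the bytes that
 * were cut off (the kernel uses zero_user() for this step).
 */
static void model_trim_tail(struct model_bio *bio, unsigned truncated_bytes,
			    int is_read)
{
	struct model_bvec *bvec;

	assert(bio->vcnt > 0);
	bvec = &bio->vec[bio->vcnt - 1];
	/* Model assumption: the overhang lies within the last bvec. */
	assert(truncated_bytes <= bvec->len);

	bio->size -= truncated_bytes;
	bvec->len -= truncated_bytes;
	if (is_read)
		memset(bvec->page + bvec->offset + bvec->len, 0,
		       truncated_bytes);
}
```

For a three-page read whose last 3072 bytes lie beyond the device, the
bio shrinks from 12288 to 9216 bytes, the third vector keeps 1024 bytes,
and the cut-off tail of that page is zero-filled so stale data is not
exposed to the reader.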

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/buffer.c   | 2 +-
 fs/internal.h | 5 +++++
 fs/mpage.c    | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index f891c90..0e4b01c 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2964,7 +2964,7 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
  * errors, this only handles the "we need to be able to
  * do IO at the final sector" case.
  */
-static void guard_bio_eod(int rw, struct bio *bio)
+void guard_bio_eod(int rw, struct bio *bio)
 {
 	sector_t maxsector;
 	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
diff --git a/fs/internal.h b/fs/internal.h
index 4657424..27d4ec5 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -144,3 +144,8 @@ extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
  * pipe.c
  */
 extern const struct file_operations pipefifo_fops;
+
+/*
+ * buffer.c
+ */
+extern void guard_bio_eod(int rw, struct bio *bio);
diff --git a/fs/mpage.c b/fs/mpage.c
index 5f9ed62..3e79220 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -28,6 +28,7 @@
 #include <linux/backing-dev.h>
 #include <linux/pagevec.h>
 #include <linux/cleancache.h>
+#include "internal.h"
 
 /*
  * I/O completion handler for multipage BIOs.
@@ -57,6 +58,7 @@ static void mpage_end_io(struct bio *bio, int err)
 static struct bio *mpage_bio_submit(int rw, struct bio *bio)
 {
 	bio->bi_end_io = mpage_end_io;
+	guard_bio_eod(rw, bio);
 	submit_bio(rw, bio);
 	return NULL;
 }
-- 
1.9.1

* [PATCH 3/3] block_dev: implement readpages() to optimize sequential read
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
	Jeff Moyer, linux-fsdevel

Sequential reads from a block device are expected to be as fast as, or
faster than, reads from a file on a filesystem.  In practice they are
slower, because the address space operations for block devices lack an
effective readpages().

This implements the readpages() operation for block devices using
mpage_readpages(), which can build multipage BIOs instead of one BIO per
page and thereby reduces system CPU time consumption.
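
The saving can be illustrated by counting bios.  The sketch below is a
simplified, hypothetical model in the spirit of do_mpage_readpage():
pages are appended to the current bio while their blocks are contiguous
on disk and the bio has room, and a new bio is started otherwise.  It
assumes one block per page and ignores holes, buffer-head-mapped pages
and queue limits; count_bios_merged() and max_pages_per_bio are made-up
names, and the cap is an assumed value rather than a kernel constant.
The ->readpage path, by comparison, submits at least one bio for every
page it reads from disk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: blocks[i] is the disk block backing page i (one
 * block per page).  Returns how many bios a merging reader submits.
 * A page joins the open bio only if its block directly follows the
 * previous page's block and the bio holds fewer than max_pages_per_bio.
 */
static unsigned count_bios_merged(const unsigned long *blocks,
				  size_t nr_pages, unsigned max_pages_per_bio)
{
	unsigned bios = 0;
	unsigned in_bio = 0;	/* pages in the currently open bio */
	size_t i;

	assert(max_pages_per_bio > 0);
	for (i = 0; i < nr_pages; i++) {
		int contiguous = i > 0 && blocks[i] == blocks[i - 1] + 1;

		if (in_bio == 0 || !contiguous ||
		    in_bio == max_pages_per_bio) {
			/* submit the open bio (if any) and start a new one */
			bios++;
			in_bio = 0;
		}
		in_bio++;
	}
	return bios;
}
```

For a block device, whose pages map to consecutive blocks, 64 pages with
a cap of 32 need 2 bios in this model instead of 64; any gap in the
block numbers forces an extra bio.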

Create a 1GB RAM-backed disk with scsi_debug:

	# modprobe scsi_debug dev_size_mb=1024 delay=0

Sequential read from a file on a filesystem:

	# mkfs.ext4 /dev/$DEV
	# mount /dev/$DEV /mnt
	# fio --name=t --size=512m --rw=read --filename=/mnt/file
	...
	  read : io=524288KB, bw=2133.4MB/s, iops=546133, runt=   240msec

Sequential read from a block device:

	# fio --name=t --size=512m --rw=read --filename=/dev/$DEV
	...
(Without this commit)
	  read : io=524288KB, bw=1700.2MB/s, iops=435455, runt=   301msec

(With this commit)
	  read : io=524288KB, bw=2160.4MB/s, iops=553046, runt=   237msec
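
The fio lines are internally consistent, which can be checked with a
little arithmetic.  The helpers below are hypothetical; they assume
fio's default 4 KiB block size (the commands above pass no --bs) and
that fio's "KB"/"MB" are 1024-based units, so 524288KB is 131072 reads
and iops is that count divided by the runtime.

```c
#include <assert.h>

/* Hypothetical helper: I/O operations per second from totals. */
static double fio_iops(double io_kib, double bs_kib, double runt_ms)
{
	assert(bs_kib > 0 && runt_ms > 0);
	return (io_kib / bs_kib) / (runt_ms / 1000.0);
}

/* Hypothetical helper: bandwidth in MiB/s from totals. */
static double fio_bw_mib_s(double io_kib, double runt_ms)
{
	assert(runt_ms > 0);
	return (io_kib / 1024.0) / (runt_ms / 1000.0);
}
```

fio_iops(524288, 4, 240) is about 546133, matching the filesystem run,
and the 301 ms and 237 ms runs give 435455 and 553046.  Because runt is
rounded to whole milliseconds, recomputed bandwidth only approximately
matches the printed bw.  Comparing the printed figures, block device
bandwidth rises from 1700.2 to 2160.4 MB/s, roughly 27%, slightly above
the 2133.4 MB/s measured through ext4.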

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/block_dev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6d72746..e2f3ad08 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -304,6 +304,12 @@ static int blkdev_readpage(struct file * file, struct page * page)
 	return block_read_full_page(page, blkdev_get_block);
 }
 
+static int blkdev_readpages(struct file *file, struct address_space *mapping,
+			struct list_head *pages, unsigned nr_pages)
+{
+	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
+}
+
 static int blkdev_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata)
@@ -1622,6 +1628,7 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
 
 static const struct address_space_operations def_blk_aops = {
 	.readpage	= blkdev_readpage,
+	.readpages	= blkdev_readpages,
 	.writepage	= blkdev_writepage,
 	.write_begin	= blkdev_write_begin,
 	.write_end	= blkdev_write_end,
-- 
1.9.1

* Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read
From: Andrew Morton @ 2014-08-14 22:04 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-kernel, Jens Axboe, Alexander Viro, Jeff Moyer,
	linux-fsdevel

On Tue,  5 Aug 2014 23:38:31 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:

> This patchset implements readpages() operation for block device by
> using mpage_readpages() which can create multipage BIOs instead of
> BIOs for each page and reduce system CPU time consumption.

Patchset is simple and straightforward enough.  But who the 
heck cares about the performance of buffered reads from /dev/XXX?

* Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read
From: Akinobu Mita @ 2014-08-15 17:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Jens Axboe, Alexander Viro, Jeff Moyer, linux-fsdevel

2014-08-15 7:04 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> On Tue,  5 Aug 2014 23:38:31 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
>
>> This patchset implements readpages() operation for block device by
>> using mpage_readpages() which can create multipage BIOs instead of
>> BIOs for each page and reduce system CPU time consumption.
>
> Patchset is simple and straightforward enough.  But who the
> heck cares about the performance of buffered reads from /dev/XXX?

I tend to treat the block device as the baseline when I measure the
performance of a storage device.  So I was a bit surprised to see that
buffered reads from a filesystem perform better than buffered reads
from the block device.  That is my motivation for this patch.

* Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read
From: Andrew Morton @ 2014-08-21 21:44 UTC (permalink / raw)
  To: Akinobu Mita; +Cc: LKML, Jens Axboe, Alexander Viro, Jeff Moyer, linux-fsdevel

On Sat, 16 Aug 2014 02:09:44 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:

> 2014-08-15 7:04 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> > On Tue,  5 Aug 2014 23:38:31 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
> >
> >> This patchset implements readpages() operation for block device by
> >> using mpage_readpages() which can create multipage BIOs instead of
> >> BIOs for each page and reduce system CPU time consumption.
> >
> > Patchset is simple and straightforward enough.  But who the
> > heck cares about the performance of buffered reads from /dev/XXX?
> 
> I tend to consider the block device as a baseline when I measure the
> performance of the storage device.  So I was a bit surprised when I saw
> the performance of buffered reads from filesystem is better than the one
> from block device.  That is the reason about this patch for me.

OK.  The lack of readpages for blockdevs has been an outstanding oddity
for a decade or longer - I think it's just that nobody was motivated to
do it because the workload isn't important.

But the implementation looks pretty simple so why not clean it up.

I grabbed the patches.
