* [PATCH 0/3] implement readpages() for block device to optimize sequential read
@ 2014-08-05 14:38 Akinobu Mita
2014-08-05 14:38 ` [PATCH 1/3] vfs: make guard_bh_eod() more generic Akinobu Mita
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
To: linux-kernel
Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
Jeff Moyer, linux-fsdevel
This patchset implements the readpages() operation for block devices
using mpage_readpages(), which can build multipage BIOs instead of one
BIO per page and so reduces system CPU time consumption.
Akinobu Mita (3):
vfs: make guard_bh_eod() more generic
vfs: guard end of device for mpage interface
block_dev: implement readpages() to optimize sequential read
fs/block_dev.c | 7 +++++++
fs/buffer.c | 26 ++++++++++++--------------
fs/internal.h | 5 +++++
fs/mpage.c | 2 ++
4 files changed, 26 insertions(+), 14 deletions(-)
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
--
1.9.1
* [PATCH 1/3] vfs: make guard_bh_eod() more generic
2014-08-05 14:38 [PATCH 0/3] implement readpages() for block device to optimize sequential read Akinobu Mita
@ 2014-08-05 14:38 ` Akinobu Mita
2014-08-05 14:38 ` [PATCH 2/3] vfs: guard end of device for mpage interface Akinobu Mita
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
To: linux-kernel
Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
Jeff Moyer, linux-fsdevel
guard_bh_eod() is used in submit_bh() to allow us to do IO even on the
odd last sectors of a device, even if the block size is some multiple
of the physical sector size. This patch makes guard_bh_eod() more
generic and renames it to guard_bio_eod() so that it can be used
without a struct buffer_head argument.
The reason for this change is that using mpage_readpages() for block
devices requires adding this guard check to the mpage code.
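The truncation arithmetic can be modelled in userspace. The following is
only a sketch, not the kernel code: struct model_bio and model_guard_eod()
are hypothetical names, the device size is passed in directly instead of
being read from the inode, and the early return for a request that starts
beyond the end of the device is an assumption about the lines elided
between the diff hunks.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, matching the ">> 9" in the patch */

/* Hypothetical stand-in for the bio fields that the guard touches. */
struct model_bio {
	uint64_t sector;	/* start sector (bi_iter.bi_sector) */
	uint32_t size;		/* total bytes (bi_iter.bi_size) */
	uint32_t last_len;	/* bytes in the last segment (bv_len) */
};

/*
 * Trim the request so that it ends at the end of the device.
 * Returns the number of bytes trimmed; for a read, the caller would
 * zero that many bytes right after the shortened last segment.
 */
static uint32_t model_guard_eod(struct model_bio *bio, uint64_t dev_bytes)
{
	uint64_t maxsector = dev_bytes >> SECTOR_SHIFT;
	uint32_t truncated;

	if (!maxsector)
		return 0;
	/* Assumed: a request entirely past the end is left untouched. */
	if (bio->sector >= maxsector)
		return 0;

	maxsector -= bio->sector;	/* sectors left from the start sector */
	if ((bio->size >> SECTOR_SHIFT) <= maxsector)
		return 0;		/* common case: request fits */

	truncated = bio->size - (uint32_t)(maxsector << SECTOR_SHIFT);
	/* The model only trims within the last segment. */
	assert(truncated <= bio->last_len);
	bio->size -= truncated;
	bio->last_len -= truncated;
	return truncated;
}
```

For example, a 4096-byte read starting at sector 6 of a 7-sector device
keeps 512 bytes and trims 3584. Because the trim is taken from the last
segment rather than from segment 0, the same arithmetic works for a
request made of several pages.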
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
---
fs/buffer.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111..f891c90 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2954,7 +2954,7 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
/*
* This allows us to do IO even on the odd last sectors
- * of a device, even if the bh block size is some multiple
+ * of a device, even if the block size is some multiple
* of the physical sector size.
*
* We'll just truncate the bio to the size of the device,
@@ -2964,10 +2964,11 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
* errors, this only handles the "we need to be able to
* do IO at the final sector" case.
*/
-static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
+static void guard_bio_eod(int rw, struct bio *bio)
{
sector_t maxsector;
- unsigned bytes;
+ struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
+ unsigned truncated_bytes;
maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
if (!maxsector)
@@ -2982,23 +2983,20 @@ static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
return;
maxsector -= bio->bi_iter.bi_sector;
- bytes = bio->bi_iter.bi_size;
- if (likely((bytes >> 9) <= maxsector))
+ if (likely((bio->bi_iter.bi_size >> 9) <= maxsector))
return;
- /* Uhhuh. We've got a bh that straddles the device size! */
- bytes = maxsector << 9;
+ /* Uhhuh. We've got a bio that straddles the device size! */
+ truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
/* Truncate the bio.. */
- bio->bi_iter.bi_size = bytes;
- bio->bi_io_vec[0].bv_len = bytes;
+ bio->bi_iter.bi_size -= truncated_bytes;
+ bvec->bv_len -= truncated_bytes;
/* ..and clear the end of the buffer for reads */
if ((rw & RW_MASK) == READ) {
- void *kaddr = kmap_atomic(bh->b_page);
- memset(kaddr + bh_offset(bh) + bytes, 0, bh->b_size - bytes);
- kunmap_atomic(kaddr);
- flush_dcache_page(bh->b_page);
+ zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+ truncated_bytes);
}
}
@@ -3039,7 +3037,7 @@ int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
bio->bi_flags |= bio_flags;
/* Take care of bh's that straddle the end of the device */
- guard_bh_eod(rw, bio, bh);
+ guard_bio_eod(rw, bio);
if (buffer_meta(bh))
rw |= REQ_META;
--
1.9.1
* [PATCH 2/3] vfs: guard end of device for mpage interface
2014-08-05 14:38 [PATCH 0/3] implement readpages() for block device to optimize sequential read Akinobu Mita
2014-08-05 14:38 ` [PATCH 1/3] vfs: make guard_bh_eod() more generic Akinobu Mita
@ 2014-08-05 14:38 ` Akinobu Mita
2014-08-05 14:38 ` [PATCH 3/3] block_dev: implement readpages() to optimize sequential read Akinobu Mita
2014-08-14 22:04 ` [PATCH 0/3] implement readpages() for block device " Andrew Morton
3 siblings, 0 replies; 7+ messages in thread
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
To: linux-kernel
Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
Jeff Moyer, linux-fsdevel
Add a guard_bio_eod() check to the mpage code so that we can do IO
even on the odd last sectors of a device, even if the block size is
some multiple of the physical sector size.
Using mpage_readpages() for block devices requires this guard check.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
---
fs/buffer.c | 2 +-
fs/internal.h | 5 +++++
fs/mpage.c | 2 ++
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index f891c90..0e4b01c 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2964,7 +2964,7 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
* errors, this only handles the "we need to be able to
* do IO at the final sector" case.
*/
-static void guard_bio_eod(int rw, struct bio *bio)
+void guard_bio_eod(int rw, struct bio *bio)
{
sector_t maxsector;
struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
diff --git a/fs/internal.h b/fs/internal.h
index 4657424..27d4ec5 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -144,3 +144,8 @@ extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
* pipe.c
*/
extern const struct file_operations pipefifo_fops;
+
+/*
+ * buffer.c
+ */
+extern void guard_bio_eod(int rw, struct bio *bio);
diff --git a/fs/mpage.c b/fs/mpage.c
index 5f9ed62..3e79220 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -28,6 +28,7 @@
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
#include <linux/cleancache.h>
+#include "internal.h"
/*
* I/O completion handler for multipage BIOs.
@@ -57,6 +58,7 @@ static void mpage_end_io(struct bio *bio, int err)
static struct bio *mpage_bio_submit(int rw, struct bio *bio)
{
bio->bi_end_io = mpage_end_io;
+ guard_bio_eod(rw, bio);
submit_bio(rw, bio);
return NULL;
}
--
1.9.1
* [PATCH 3/3] block_dev: implement readpages() to optimize sequential read
2014-08-05 14:38 [PATCH 0/3] implement readpages() for block device to optimize sequential read Akinobu Mita
2014-08-05 14:38 ` [PATCH 1/3] vfs: make guard_bh_eod() more generic Akinobu Mita
2014-08-05 14:38 ` [PATCH 2/3] vfs: guard end of device for mpage interface Akinobu Mita
@ 2014-08-05 14:38 ` Akinobu Mita
2014-08-14 22:04 ` [PATCH 0/3] implement readpages() for block device " Andrew Morton
3 siblings, 0 replies; 7+ messages in thread
From: Akinobu Mita @ 2014-08-05 14:38 UTC (permalink / raw)
To: linux-kernel
Cc: Akinobu Mita, Andrew Morton, Jens Axboe, Alexander Viro,
Jeff Moyer, linux-fsdevel
Sequential reads from a block device are expected to be as fast as or
faster than reads from a file on a filesystem. That is not currently
the case, because the address space operations for block devices lack
an effective readpages().
This implements the readpages() operation for block devices using
mpage_readpages(), which can build multipage BIOs instead of one BIO
per page and so reduces system CPU time consumption.
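As a rough illustration of why multipage BIOs save CPU time, the sketch
below counts how many BIOs a contiguous read needs. It is a simplified
model, not kernel code: bios_needed() and its pages_per_bio cap are
hypothetical, and the real limit depends on the kernel version and the
device queue.

```c
#include <assert.h>

/*
 * Number of BIOs needed to read nr_pages contiguous pages when a
 * single BIO may carry at most pages_per_bio pages. Passing 1 models
 * one BIO per page.
 */
static unsigned int bios_needed(unsigned int nr_pages,
				unsigned int pages_per_bio)
{
	assert(pages_per_bio > 0);
	/* Round up so a partial final batch still gets its own BIO. */
	return (nr_pages + pages_per_bio - 1) / pages_per_bio;
}
```

With a 32-page readahead window, bios_needed(32, 1) is 32 while
bios_needed(32, 32) is 1, so the per-BIO allocation, submission and
completion work is paid once instead of 32 times.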
Set up a 1GB RAM-backed disk with scsi_debug:
# modprobe scsi_debug dev_size_mb=1024 delay=0
Sequential read from file on a filesystem:
# mkfs.ext4 /dev/$DEV
# mount /dev/$DEV /mnt
# fio --name=t --size=512m --rw=read --filename=/mnt/file
...
read : io=524288KB, bw=2133.4MB/s, iops=546133, runt= 240msec
Sequential read from a block device:
# fio --name=t --size=512m --rw=read --filename=/dev/$DEV
...
(Without this commit)
read : io=524288KB, bw=1700.2MB/s, iops=435455, runt= 301msec
(With this commit)
read : io=524288KB, bw=2160.4MB/s, iops=553046, runt= 237msec
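The relative gain can be read off the fio runtimes above. A minimal
sketch of that arithmetic follows; speedup() is a hypothetical helper
for illustration, not part of fio or the kernel.

```c
#include <assert.h>

/*
 * Ratio of the baseline runtime to the patched runtime for the same
 * amount of IO. A value above 1.0 means the patched run is faster.
 */
static double speedup(unsigned int base_msec, unsigned int patched_msec)
{
	assert(patched_msec > 0);
	return (double)base_msec / (double)patched_msec;
}
```

speedup(301, 237) is roughly 1.27, so the block device read is about
27% faster with this commit. At 237 msec it is also on par with the
240 msec measured for the file on ext4.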
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
---
fs/block_dev.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6d72746..e2f3ad08 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -304,6 +304,12 @@ static int blkdev_readpage(struct file * file, struct page * page)
return block_read_full_page(page, blkdev_get_block);
}
+static int blkdev_readpages(struct file *file, struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages)
+{
+ return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
+}
+
static int blkdev_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
@@ -1622,6 +1628,7 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
static const struct address_space_operations def_blk_aops = {
.readpage = blkdev_readpage,
+ .readpages = blkdev_readpages,
.writepage = blkdev_writepage,
.write_begin = blkdev_write_begin,
.write_end = blkdev_write_end,
--
1.9.1
* Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read
2014-08-05 14:38 [PATCH 0/3] implement readpages() for block device to optimize sequential read Akinobu Mita
` (2 preceding siblings ...)
2014-08-05 14:38 ` [PATCH 3/3] block_dev: implement readpages() to optimize sequential read Akinobu Mita
@ 2014-08-14 22:04 ` Andrew Morton
2014-08-15 17:09 ` Akinobu Mita
3 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2014-08-14 22:04 UTC (permalink / raw)
To: Akinobu Mita
Cc: linux-kernel, Jens Axboe, Alexander Viro, Jeff Moyer,
linux-fsdevel
On Tue, 5 Aug 2014 23:38:31 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
> This patchset implements readpages() operation for block device by
> using mpage_readpages() which can create multipage BIOs instead of
> BIOs for each page and reduce system CPU time consumption.
Patchset is simple and straightforward enough. But who the
heck cares about the performance of buffered reads from /dev/XXX?
* Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read
2014-08-14 22:04 ` [PATCH 0/3] implement readpages() for block device " Andrew Morton
@ 2014-08-15 17:09 ` Akinobu Mita
2014-08-21 21:44 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Akinobu Mita @ 2014-08-15 17:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: LKML, Jens Axboe, Alexander Viro, Jeff Moyer, linux-fsdevel
2014-08-15 7:04 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> On Tue, 5 Aug 2014 23:38:31 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
>
>> This patchset implements readpages() operation for block device by
>> using mpage_readpages() which can create multipage BIOs instead of
>> BIOs for each page and reduce system CPU time consumption.
>
> Patchset is simple and straightforward enough. But who the
> heck cares about the performance of buffered reads from /dev/XXX?
I tend to treat the block device as a baseline when I measure the
performance of a storage device. So I was a bit surprised to see that
buffered reads from a filesystem perform better than buffered reads
from the block device. That is my motivation for this patchset.
* Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read
2014-08-15 17:09 ` Akinobu Mita
@ 2014-08-21 21:44 ` Andrew Morton
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2014-08-21 21:44 UTC (permalink / raw)
To: Akinobu Mita; +Cc: LKML, Jens Axboe, Alexander Viro, Jeff Moyer, linux-fsdevel
On Sat, 16 Aug 2014 02:09:44 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
> 2014-08-15 7:04 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> > On Tue, 5 Aug 2014 23:38:31 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
> >
> >> This patchset implements readpages() operation for block device by
> >> using mpage_readpages() which can create multipage BIOs instead of
> >> BIOs for each page and reduce system CPU time consumption.
> >
> > Patchset is simple and straightforward enough. But who the
> > heck cares about the performance of buffered reads from /dev/XXX?
>
> I tend to consider the block device as a baseline when I measure the
> performance of the storage device. So I was a bit surprised when I saw
> the performance of buffered reads from filesystem is better than the one
> from block device. That is the reason about this patch for me.
OK. The lack of readpages for blockdevs has been an outstanding oddity
for a decade or longer - I think it's just that nobody was motivated to
do it because the workload isn't important.
But the implementation looks pretty simple so why not clean it up.
I grabbed the patches.