* [Drbd-dev] [PATCH 00/60] block: support multipage bvec
@ 2016-10-29 8:07 Ming Lei
2016-10-29 8:08 ` [Drbd-dev] [PATCH 03/60] block: drbd: remove impossible failure handling Ming Lei
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Ming Lei @ 2016-10-29 8:07 UTC (permalink / raw)
To: Jens Axboe, linux-kernel
Cc: Michal Hocko, Mike Snitzer, Takashi Iwai, Ming Lei,
Rasmus Villemoes, open list:NVM EXPRESS TARGET DRIVER, Zheng Liu,
Keith Busch, open list:MEMORY MANAGEMENT,
open list:DEVICE-MAPPER LVM, open list:TARGET SUBSYSTEM,
Yijing Wang, open list:LogFS, open list:DRBD DRIVER,
open list:GFS2 FILE SYSTEM, Hannes Reinecke, Kirill A . Shutemov,
open list:TARGET SUBSYSTEM, Christoph Hellwig, Mike Christie,
Guoqing Jiang, Johannes Thumshirn, open list:EXT4 FILE SYSTEM,
Kent Overstreet, Petr Mladek, Johannes Berg,
open list:SUSPEND TO RAM, Toshi Kani, Coly Li, linux-block,
linux-fsdevel, Kent Overstreet, Hannes Reinecke, Dan Williams,
open list:BTRFS FILE SYSTEM,
open list:SOFTWARE RAID Multiple Disks SUPPORT, Jiri Kosina,
open list:BCACHE BLOCK LAYER CACHE, open list:F2FS FILE SYSTEM,
open list:XFS FILESYSTEM, Minchan Kim, Al Viro, Minfei Huang,
Zheng Liu, Joe Perches, Andrew Morton, Eric Wheeler,
open list:OSD LIBRARY and FILESYSTEM
Hi,
This patchset brings multipage bvec into the block layer. Basic
xfstests(-a auto) over virtio-blk/virtio-scsi have been run
and no regression was found, so it should be good enough
to show the approach now. Any comments are welcome!
1) what is multipage bvec?
Multipage bvec means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the
single page the Linux kernel has used for a long time.
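As a rough illustration only (a userspace sketch, not code from
this patchset): the model below assumes a 4096-byte page and uses
a page frame number in place of 'struct page *'. The names
model_bvec, model_fill_singlepage() and model_fill_multipage()
are hypothetical. It shows the same run of contiguous pages
described by one table entry per page versus a single entry.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Userspace stand-in for struct bio_vec; bv_pfn replaces bv_page. */
struct model_bvec {
	unsigned long bv_pfn;		/* frame number of the first page */
	unsigned int bv_len;		/* length in bytes */
	unsigned int bv_offset;		/* byte offset into the first page */
};

/* Number of pages covered by one bvec. */
static unsigned int model_bvec_nr_pages(const struct model_bvec *bv)
{
	assert(bv->bv_offset < MODEL_PAGE_SIZE);
	if (bv->bv_len == 0)
		return 0;
	return (bv->bv_offset + bv->bv_len + MODEL_PAGE_SIZE - 1) /
		MODEL_PAGE_SIZE;
}

/* Singlepage style: one table entry per page. Returns entries used,
 * or 0 if the table is too small. */
static size_t model_fill_singlepage(struct model_bvec *tbl, size_t cap,
				    unsigned long pfn, unsigned int nr)
{
	unsigned int i;

	if (nr > cap)
		return 0;
	for (i = 0; i < nr; i++) {
		tbl[i].bv_pfn = pfn + i;
		tbl[i].bv_len = MODEL_PAGE_SIZE;
		tbl[i].bv_offset = 0;
	}
	return nr;
}

/* Multipage style: the whole contiguous run fits in one entry. */
static size_t model_fill_multipage(struct model_bvec *tbl, size_t cap,
				   unsigned long pfn, unsigned int nr)
{
	if (cap == 0 || nr == 0)
		return 0;
	tbl[0].bv_pfn = pfn;
	tbl[0].bv_len = nr * MODEL_PAGE_SIZE;
	tbl[0].bv_offset = 0;
	return 1;
}
```

For four contiguous pages, the singlepage fill uses four entries
while the multipage fill uses one entry whose bv_len spans all
four pages.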
2) why is multipage bvec introduced?
Kent proposed the idea[1] first.
As system RAM becomes much bigger than before, and
huge pages, transparent huge pages and memory compaction
are widely used at the same time, it is now fairly easy
to see physically contiguous pages inside the fs/block stack.
On the other hand, from the block layer's view, it isn't
necessary to store intermediate pages in the bvec;
it is enough to just store the physically contiguous
'segment'.
Also, huge pages are being brought to filesystems[2], and
doing IO one huge page at a time[3] requires that one bio can
transfer at least one huge page at a time. It turns out that
simply changing BIO_MAX_PAGES isn't flexible[3]. Multipage bvec
fits this case very well.
With multipage bvec:
- bio size can be increased, which in theory should improve
some high-bandwidth IO cases[4].
- Inside the block layer, both bio splitting and sg mapping can
become more efficient than before by traversing only the
physically contiguous 'segments' instead of each page.
- in the future, it may be possible to reduce the memory
footprint of bvec usage.
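To make the traversal point concrete, here is a small userspace
sketch (not code from this patchset; model_count_segments() is
a hypothetical name). With singlepage bvecs, finding the
physically contiguous segments means visiting every page and
comparing it with its predecessor, as below. With multipage
bvecs each table entry is already one contiguous run, so the
same walk would visit one entry per run. Queue limits such as a
maximum segment size are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Count physically contiguous runs in a list of page frame numbers,
 * visiting every page once (the per-page walk described above). */
static size_t model_count_segments(const unsigned long *pfns,
				   size_t nr_pages)
{
	size_t i, segs = 0;

	assert(pfns != NULL || nr_pages == 0);
	for (i = 0; i < nr_pages; i++) {
		/* A new segment starts at the first page and wherever
		 * a page does not directly follow the previous one. */
		if (i == 0 || pfns[i] != pfns[i - 1] + 1)
			segs++;
	}
	return segs;
}
```

For the page frame numbers { 10, 11, 12, 40, 41, 99 } the walk
touches six pages to find three segments.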
3) how is multipage bvec implemented in this patchset?
The first 22 patches clean up direct access to the bvec table
and add comments on some special cases. With this approach,
most cases are found to be safe for multipage bvec;
only fs/buffer, pktcdvd, dm-io, MD and btrfs need to be dealt
with.
Since a little more work is involved in cleaning up pktcdvd,
MD and btrfs, this patchset introduces QUEUE_FLAG_NO_MP for
them, so these components can still see/use singlepage bvecs.
In the future, once the cleanup is done, the flag can be killed.
The 2nd part (23 ~ 60) implements multipage bvec in block:
- put all tricks into bvec/bio/rq iterators; as long as
drivers and fs use these standard iterators, they are happy
with multipage bvec
- bio_for_each_segment_all() changes
this helper passes a pointer to each bvec directly to the user,
so it has to be changed. Two new helpers (bio_for_each_segment_all_rd()
and bio_for_each_segment_all_wt()) are introduced.
- bio_clone() changes
By default, bio_clone() still clones the new bio in the multipage
bvec way. A single page version of bio_clone() is also introduced
for special cases where only single page bvecs may be used
for the newly cloned bio (bio bounce, ...)
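The iterator idea in the first item can be sketched in userspace
as follows. This is only a model under stated assumptions, not
the helpers added by this patchset: it assumes a 4096-byte page,
uses a page frame number in place of 'struct page *', and the
names model_bvec and model_next_sp_bvec() are hypothetical. It
shows how an iterator can hand out single-page views of one
multipage bvec, so that a caller written for singlepage bvecs
keeps working.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Userspace stand-in for struct bio_vec; bv_pfn replaces bv_page. */
struct model_bvec {
	unsigned long bv_pfn;		/* frame number of the first page */
	unsigned int bv_len;		/* length in bytes */
	unsigned int bv_offset;		/* byte offset into the first page */
};

/* Produce the next single-page chunk of 'mp', starting '*done' bytes
 * into it. Returns 1 and advances *done, or 0 once 'mp' is consumed. */
static int model_next_sp_bvec(const struct model_bvec *mp,
			      unsigned int *done, struct model_bvec *out)
{
	unsigned int pos, in_page, left;

	assert(*done <= mp->bv_len);
	if (*done == mp->bv_len)
		return 0;

	pos = mp->bv_offset + *done;	/* bytes from start of first page */
	in_page = pos % MODEL_PAGE_SIZE;
	left = mp->bv_len - *done;

	out->bv_pfn = mp->bv_pfn + pos / MODEL_PAGE_SIZE;
	out->bv_offset = in_page;
	out->bv_len = MODEL_PAGE_SIZE - in_page;	/* rest of this page */
	if (out->bv_len > left)
		out->bv_len = left;

	*done += out->bv_len;
	return 1;
}
```

A caller starts with done = 0 and loops while the function
returns 1. Each chunk it sees stays within one page, whatever
the length of the underlying multipage bvec.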
These patches can be found in the following git tree:
https://github.com/ming1/linux/tree/mp-bvec-0.3-v4.9
Thanks to Christoph for looking at the early version and providing
very good suggestions, such as introducing bio_init_with_vec_table(),
removing other unnecessary helpers for the cleanup, and so on.
TODO:
- cleanup direct access to bvec table for MD & btrfs
[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], http://lwn.net/Articles/700781/
[3], http://marc.info/?t=147735447100001&r=1&w=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2
Ming Lei (60):
block: bio: introduce bio_init_with_vec_table()
block drivers: convert to bio_init_with_vec_table()
block: drbd: remove impossible failure handling
block: floppy: use bio_add_page()
target: avoid to access .bi_vcnt directly
bcache: debug: avoid to access .bi_io_vec directly
dm: crypt: use bio_add_page()
dm: use bvec iterator helpers to implement .get_page and .next_page
dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
fs: logfs: convert to bio_add_page() in sync_request()
fs: logfs: use bio_add_page() in __bdev_writeseg()
fs: logfs: use bio_add_page() in do_erase()
fs: logfs: remove unnecesary check
block: drbd: comment on direct access bvec table
block: loop: comment on direct access to bvec table
block: pktcdvd: comment on direct access to bvec table
kernel/power/swap.c: comment on direct access to bvec table
mm: page_io.c: comment on direct access to bvec table
fs/buffer: comment on direct access to bvec table
f2fs: f2fs_read_end_io: comment on direct access to bvec table
bcache: comment on direct access to bvec table
block: comment on bio_alloc_pages()
block: introduce flag QUEUE_FLAG_NO_MP
md: set NO_MP for request queue of md
block: pktcdvd: set NO_MP for pktcdvd request queue
btrfs: set NO_MP for request queues behind BTRFS
block: introduce BIO_SP_MAX_SECTORS
block: introduce QUEUE_FLAG_SPLIT_MP
dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
bcache: set flag of QUEUE_FLAG_SPLIT_MP
block: introduce multipage/single page bvec helpers
block: implement sp version of bvec iterator helpers
block: introduce bio_for_each_segment_mp()
block: introduce bio_clone_sp()
bvec_iter: introduce BVEC_ITER_ALL_INIT
block: bounce: avoid direct access to bvec from bio->bi_io_vec
block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
block: bounce: convert multipage bvecs into singlepage
bcache: debug: switch to bio_clone_sp()
blk-merge: compute bio->bi_seg_front_size efficiently
block: blk-merge: try to make front segments in full size
block: use bio_for_each_segment_mp() to compute segments count
block: use bio_for_each_segment_mp() to map sg
block: introduce bvec_for_each_sp_bvec()
block: bio: introduce bio_for_each_segment_all_rd() and its write pair
block: deal with dirtying pages for multipage bvec
block: convert to bio_for_each_segment_all_rd()
fs/mpage: convert to bio_for_each_segment_all_rd()
fs/direct-io: convert to bio_for_each_segment_all_rd()
ext4: convert to bio_for_each_segment_all_rd()
xfs: convert to bio_for_each_segment_all_rd()
logfs: convert to bio_for_each_segment_all_rd()
gfs2: convert to bio_for_each_segment_all_rd()
f2fs: convert to bio_for_each_segment_all_rd()
exofs: convert to bio_for_each_segment_all_rd()
fs: crypto: convert to bio_for_each_segment_all_rd()
bcache: convert to bio_for_each_segment_all_rd()
dm-crypt: convert to bio_for_each_segment_all_rd()
fs/buffer.c: use bvec iterator to truncate the bio
block: enable multipage bvecs
block/bio.c | 104 ++++++++++++++----
block/blk-merge.c | 216 +++++++++++++++++++++++++++++--------
block/bounce.c | 80 ++++++++++----
drivers/block/drbd/drbd_bitmap.c | 1 +
drivers/block/drbd/drbd_receiver.c | 14 +--
drivers/block/floppy.c | 10 +-
drivers/block/loop.c | 5 +
drivers/block/pktcdvd.c | 8 ++
drivers/md/bcache/btree.c | 4 +-
drivers/md/bcache/debug.c | 19 +++-
drivers/md/bcache/io.c | 4 +-
drivers/md/bcache/journal.c | 4 +-
drivers/md/bcache/movinggc.c | 7 +-
drivers/md/bcache/super.c | 25 +++--
drivers/md/bcache/util.c | 7 ++
drivers/md/bcache/writeback.c | 6 +-
drivers/md/dm-bufio.c | 4 +-
drivers/md/dm-crypt.c | 11 +-
drivers/md/dm-io.c | 34 ++++--
drivers/md/dm-rq.c | 3 +-
drivers/md/dm.c | 11 +-
drivers/md/md.c | 12 +++
drivers/md/raid5.c | 9 +-
drivers/nvme/target/io-cmd.c | 4 +-
drivers/target/target_core_pscsi.c | 8 +-
fs/btrfs/volumes.c | 3 +
fs/buffer.c | 24 +++--
fs/crypto/crypto.c | 3 +-
fs/direct-io.c | 4 +-
fs/exofs/ore.c | 3 +-
fs/exofs/ore_raid.c | 3 +-
fs/ext4/page-io.c | 3 +-
fs/ext4/readpage.c | 3 +-
fs/f2fs/data.c | 13 ++-
fs/gfs2/lops.c | 3 +-
fs/gfs2/meta_io.c | 3 +-
fs/logfs/dev_bdev.c | 110 +++++++------------
fs/mpage.c | 3 +-
fs/xfs/xfs_aops.c | 3 +-
include/linux/bio.h | 108 +++++++++++++++++--
include/linux/blk_types.h | 6 ++
include/linux/blkdev.h | 4 +
include/linux/bvec.h | 123 +++++++++++++++++++--
kernel/power/swap.c | 2 +
mm/page_io.c | 1 +
45 files changed, 759 insertions(+), 276 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Drbd-dev] [PATCH 03/60] block: drbd: remove impossible failure handling
2016-10-29 8:07 [Drbd-dev] [PATCH 00/60] block: support multipage bvec Ming Lei
@ 2016-10-29 8:08 ` Ming Lei
2016-10-31 15:25 ` Christoph Hellwig
2016-10-29 8:08 ` [Drbd-dev] [PATCH 14/60] block: drbd: comment on direct access bvec table Ming Lei
2016-10-31 15:25 ` [Drbd-dev] [PATCH 00/60] block: support multipage bvec Christoph Hellwig
2 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2016-10-29 8:08 UTC (permalink / raw)
To: Jens Axboe, linux-kernel
Cc: Christoph Hellwig, Ming Lei, Philipp Reisner, linux-block,
linux-fsdevel, Lars Ellenberg, Kirill A . Shutemov,
open list:DRBD DRIVER
For a non-cloned bio, bio_add_page() only returns failure when
the io vec table is full, but in that case, bio->bi_vcnt can't
be zero at all.
So remove the impossible failure handling.
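The argument can be checked against a deliberately simplified
userspace model (not the kernel function; model_bio and
model_bio_add_page() are hypothetical names). The model assumes
a non-cloned bio whose table holds at least one entry, and it
leaves out details of the real bio_add_page() such as merging
into the previous bvec. Under those assumptions the only failure
path is "table full", and on that path bi_vcnt equals
bi_max_vecs, which is at least 1.

```c
#include <assert.h>

/* Simplified stand-in for a non-cloned bio; not the kernel struct. */
struct model_bio {
	unsigned int bi_vcnt;		/* bvec entries in use */
	unsigned int bi_max_vecs;	/* capacity of the table, >= 1 */
	unsigned int bi_size;		/* total bytes queued */
};

/* Returns len on success, 0 only when the bvec table is full. */
static unsigned int model_bio_add_page(struct model_bio *bio,
				       unsigned int len)
{
	assert(bio->bi_max_vecs >= 1);
	if (bio->bi_vcnt >= bio->bi_max_vecs)
		return 0;	/* full: bi_vcnt == bi_max_vecs >= 1 */
	bio->bi_vcnt++;
	bio->bi_size += len;
	return len;
}
```

So a caller that sees a return value of 0 can only be looking at
a bio that already holds at least one page, which is why the
bi_vcnt == 0 branch is unreachable.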
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/drbd/drbd_receiver.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 942384f34e22..c537e3bd09eb 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1648,20 +1648,8 @@ int drbd_submit_peer_request(struct drbd_device *device,
page_chain_for_each(page) {
unsigned len = min_t(unsigned, data_size, PAGE_SIZE);
- if (!bio_add_page(bio, page, len, 0)) {
- /* A single page must always be possible!
- * But in case it fails anyways,
- * we deal with it, and complain (below). */
- if (bio->bi_vcnt == 0) {
- drbd_err(device,
- "bio_add_page failed for len=%u, "
- "bi_vcnt=0 (bi_sector=%llu)\n",
- len, (uint64_t)bio->bi_iter.bi_sector);
- err = -ENOSPC;
- goto fail;
- }
+ if (!bio_add_page(bio, page, len, 0))
goto next_bio;
- }
data_size -= len;
sector += len >> 9;
--nr_pages;
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Drbd-dev] [PATCH 03/60] block: drbd: remove impossible failure handling
2016-10-29 8:08 ` [Drbd-dev] [PATCH 03/60] block: drbd: remove impossible failure handling Ming Lei
@ 2016-10-31 15:25 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:25 UTC (permalink / raw)
To: Ming Lei
Cc: linux-block, Christoph Hellwig, linux-kernel, Philipp Reisner,
Jens Axboe, linux-fsdevel, Lars Ellenberg, Kirill A . Shutemov,
open list:DRBD DRIVER
On Sat, Oct 29, 2016 at 04:08:02PM +0800, Ming Lei wrote:
> For a non-cloned bio, bio_add_page() only returns failure when
> the io vec table is full, but in that case, bio->bi_vcnt can't
> be zero at all.
>
> So remove the impossible failure handling.
>
> Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Drbd-dev] [PATCH 14/60] block: drbd: comment on direct access bvec table
2016-10-29 8:07 [Drbd-dev] [PATCH 00/60] block: support multipage bvec Ming Lei
2016-10-29 8:08 ` [Drbd-dev] [PATCH 03/60] block: drbd: remove impossible failure handling Ming Lei
@ 2016-10-29 8:08 ` Ming Lei
2016-10-31 15:25 ` [Drbd-dev] [PATCH 00/60] block: support multipage bvec Christoph Hellwig
2 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2016-10-29 8:08 UTC (permalink / raw)
To: Jens Axboe, linux-kernel
Cc: Christoph Hellwig, Ming Lei, Philipp Reisner, linux-block,
linux-fsdevel, Lars Ellenberg, Kirill A . Shutemov,
open list:DRBD DRIVER
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/drbd/drbd_bitmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index ab62b81c2ca7..ce9506da30ad 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -953,6 +953,7 @@ static void drbd_bm_endio(struct bio *bio)
struct drbd_bm_aio_ctx *ctx = bio->bi_private;
struct drbd_device *device = ctx->device;
struct drbd_bitmap *b = device->bitmap;
+ /* single page bio, safe for multipage bvec */
unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page);
if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 &&
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Drbd-dev] [PATCH 00/60] block: support multipage bvec
2016-10-29 8:07 [Drbd-dev] [PATCH 00/60] block: support multipage bvec Ming Lei
2016-10-29 8:08 ` [Drbd-dev] [PATCH 03/60] block: drbd: remove impossible failure handling Ming Lei
2016-10-29 8:08 ` [Drbd-dev] [PATCH 14/60] block: drbd: comment on direct access bvec table Ming Lei
@ 2016-10-31 15:25 ` Christoph Hellwig
2016-10-31 22:52 ` Ming Lei
2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:25 UTC (permalink / raw)
To: Ming Lei
Cc: Michal Hocko, Mike Snitzer, Takashi Iwai,
open list:BTRFS FILE SYSTEM, Rasmus Villemoes,
open list:NVM EXPRESS TARGET DRIVER, Zheng Liu, Keith Busch,
open list:MEMORY MANAGEMENT, open list:DEVICE-MAPPER (LVM),
open list:TARGET SUBSYSTEM, Yijing Wang, open list:LogFS,
open list:DRBD DRIVER, Christoph Hellwig, Hannes Reinecke,
Kirill A . Shutemov, open list:TARGET SUBSYSTEM,
open list:GFS2 FILE SYSTEM, Mike Christie, Guoqing Jiang,
Johannes Thumshirn, open list:EXT4 FILE SYSTEM, Kent Overstreet,
Petr Mladek, Johannes Berg, open list:SUSPEND TO RAM, Toshi Kani,
Coly Li, linux-block, linux-fsdevel, Kent Overstreet,
Hannes Reinecke, Dan Williams,
open list:BCACHE (BLOCK LAYER CACHE), Jens Axboe,
open list:SOFTWARE RAID (Multiple Disks) SUPPORT, Jiri Kosina,
linux-kernel, open list:F2FS FILE SYSTEM,
open list:XFS FILESYSTEM, Minchan Kim, Al Viro, Minfei Huang,
Zheng Liu, Joe Perches, Andrew Morton, Eric Wheeler,
open list:OSD LIBRARY and FILESYSTEM
Hi Ming,
can you send a first patch just doing the obvious cleanups like
converting to bio_add_page and replacing direct poking into the
bio with the proper accessors? That should help reducing the
actual series to a sane size, and it should also help to cut
down the Cc list.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Drbd-dev] [PATCH 00/60] block: support multipage bvec
2016-10-31 15:25 ` [Drbd-dev] [PATCH 00/60] block: support multipage bvec Christoph Hellwig
@ 2016-10-31 22:52 ` Ming Lei
0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2016-10-31 22:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Michal Hocko, Mike Snitzer, Takashi Iwai,
open list:BTRFS FILE SYSTEM, Rasmus Villemoes,
open list:NVM EXPRESS TARGET DRIVER, Zheng Liu, Keith Busch,
open list:MEMORY MANAGEMENT, open list:DEVICE-MAPPER (LVM),
open list:TARGET SUBSYSTEM, Yijing Wang, open list:LogFS,
open list:DRBD DRIVER, Hannes Reinecke, Toshi Kani,
open list:TARGET SUBSYSTEM, open list:GFS2 FILE SYSTEM,
Mike Christie, Guoqing Jiang, Johannes Thumshirn,
open list:EXT4 FILE SYSTEM, Kent Overstreet, Petr Mladek,
open list:OSD LIBRARY and FILESYSTEM, Johannes Berg,
open list:SUSPEND TO RAM, Coly Li, linux-block, Linux FS Devel,
Kent Overstreet, Hannes Reinecke, Dan Williams,
open list:BCACHE (BLOCK LAYER CACHE), Jens Axboe,
open list:SOFTWARE RAID (Multiple Disks) SUPPORT, Jiri Kosina,
Linux Kernel Mailing List, open list:F2FS FILE SYSTEM,
open list:XFS FILESYSTEM, Minchan Kim, Kirill A . Shutemov,
Minfei Huang, Zheng Liu, Joe Perches, Andrew Morton, Eric Wheeler,
Al Viro
On Mon, Oct 31, 2016 at 11:25 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Hi Ming,
>
> can you send a first patch just doing the obvious cleanups like
> converting to bio_add_page and replacing direct poking into the
> bio with the proper accessors? That should help reducing the
OK, that is just the 1st part of the patchset.
> actual series to a sane size, and it should also help to cut
> down the Cc list.
>
Thanks,
Ming Lei
^ permalink raw reply [flat|nested] 6+ messages in thread