* [PATCH for-4.2 1/3] block: remove management of bi_remaining when restoring original bi_end_io
@ 2015-05-22 13:14 Mike Snitzer
2015-05-22 13:14 ` [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones Mike Snitzer
2015-05-22 13:14 ` [PATCH for-4.2 3/3] block: remove export for blk_queue_bio Mike Snitzer
0 siblings, 2 replies; 14+ messages in thread
From: Mike Snitzer @ 2015-05-22 13:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, Mike Snitzer
Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
1) saving a bio's original bi_end_io
2) wiring up an intermediate bi_end_io
3) restoring the original bi_end_io from intermediate bi_end_io
4) calling bio_endio() to execute the restored original bi_end_io
The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called. For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0. When bio_inc_remaining() occurred during step
3 it brought it to a value of 2. When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).
Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining. For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.
Also, the bio_inc_remaining() interface has been moved local to bio.c.
Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/bio-integrity.c | 4 ++--
block/bio.c | 35 ++++++++++++++---------------------
drivers/md/bcache/io.c | 2 +-
drivers/md/dm-cache-target.c | 6 ------
drivers/md/dm-raid1.c | 2 --
drivers/md/dm-snap.c | 1 -
drivers/md/dm-thin.c | 9 +++------
drivers/md/dm-verity.c | 2 +-
fs/btrfs/disk-io.c | 2 +-
fs/btrfs/volumes.c | 16 +++++-----------
fs/btrfs/volumes.h | 2 --
include/linux/bio.h | 12 ------------
12 files changed, 27 insertions(+), 66 deletions(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 5cbd5d9..0436c21 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -361,7 +361,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
/* Restore original bio completion handler */
bio->bi_end_io = bip->bip_end_io;
- bio_endio_nodec(bio, error);
+ bio_endio(bio, error);
}
/**
@@ -388,7 +388,7 @@ void bio_integrity_endio(struct bio *bio, int error)
*/
if (error) {
bio->bi_end_io = bip->bip_end_io;
- bio_endio_nodec(bio, error);
+ bio_endio(bio, error);
return;
}
diff --git a/block/bio.c b/block/bio.c
index c2ff8a8..259197d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -303,6 +303,17 @@ static void bio_chain_endio(struct bio *bio, int error)
bio_put(bio);
}
+/*
+ * Increment chain count for the bio. Make sure the CHAIN flag update
+ * is visible before the raised count.
+ */
+static inline void bio_inc_remaining(struct bio *bio)
+{
+ bio->bi_flags |= (1 << BIO_CHAIN);
+ smp_mb__before_atomic();
+ atomic_inc(&bio->__bi_remaining);
+}
+
/**
* bio_chain - chain bio completions
* @bio: the target bio
@@ -1756,8 +1767,10 @@ static inline bool bio_remaining_done(struct bio *bio)
BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
- if (atomic_dec_and_test(&bio->__bi_remaining))
+ if (atomic_dec_and_test(&bio->__bi_remaining)) {
+ clear_bit(BIO_CHAIN, &bio->bi_flags);
return true;
+ }
return false;
}
@@ -1809,26 +1822,6 @@ void bio_endio(struct bio *bio, int error)
EXPORT_SYMBOL(bio_endio);
/**
- * bio_endio_nodec - end I/O on a bio, without decrementing bi_remaining
- * @bio: bio
- * @error: error, if any
- *
- * For code that has saved and restored bi_end_io; thing hard before using this
- * function, probably you should've cloned the entire bio.
- **/
-void bio_endio_nodec(struct bio *bio, int error)
-{
- /*
- * If it's not flagged as a chain, we are not going to dec the count
- */
- if (bio_flagged(bio, BIO_CHAIN))
- bio_inc_remaining(bio);
-
- bio_endio(bio, error);
-}
-EXPORT_SYMBOL(bio_endio_nodec);
-
-/**
* bio_split - split a bio
* @bio: bio to split
* @sectors: number of sectors to split from the front of @bio
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index fa028fa..cb64e64a 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -55,7 +55,7 @@ static void bch_bio_submit_split_done(struct closure *cl)
s->bio->bi_end_io = s->bi_end_io;
s->bio->bi_private = s->bi_private;
- bio_endio_nodec(s->bio, 0);
+ bio_endio(s->bio, 0);
closure_debug_destroy(&s->cl);
mempool_free(s, s->p->bio_split_hook);
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 705eb7b..41b2594 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -86,12 +86,6 @@ static void dm_unhook_bio(struct dm_hook_info *h, struct bio *bio)
{
bio->bi_end_io = h->bi_end_io;
bio->bi_private = h->bi_private;
-
- /*
- * Must bump bi_remaining to allow bio to complete with
- * restored bi_end_io.
- */
- bio_inc_remaining(bio);
}
/*----------------------------------------------------------------*/
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index d6a1c09..743fa9b 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1254,8 +1254,6 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio, int error)
dm_bio_restore(bd, bio);
bio_record->details.bi_bdev = NULL;
- bio_inc_remaining(bio);
-
queue_bio(ms, bio, rw);
return DM_ENDIO_INCOMPLETE;
}
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 8bfeae2..7c82d3c 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1478,7 +1478,6 @@ out:
if (full_bio) {
full_bio->bi_end_io = pe->full_bio_end_io;
full_bio->bi_private = pe->full_bio_private;
- bio_inc_remaining(full_bio);
}
increment_pending_exceptions_done_count();
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 342dbda..e852602c 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -793,10 +793,9 @@ static void inc_remap_and_issue_cell(struct thin_c *tc,
static void process_prepared_mapping_fail(struct dm_thin_new_mapping *m)
{
- if (m->bio) {
+ if (m->bio)
m->bio->bi_end_io = m->saved_bi_end_io;
- bio_inc_remaining(m->bio);
- }
+
cell_error(m->tc->pool, m->cell);
list_del(&m->list);
mempool_free(m, m->tc->pool->mapping_pool);
@@ -810,10 +809,8 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
int r;
bio = m->bio;
- if (bio) {
+ if (bio)
bio->bi_end_io = m->saved_bi_end_io;
- bio_inc_remaining(bio);
- }
if (m->err) {
cell_error(pool, m->cell);
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 66616db..bb9c6a0 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -459,7 +459,7 @@ static void verity_finish_io(struct dm_verity_io *io, int error)
bio->bi_end_io = io->orig_bi_end_io;
bio->bi_private = io->orig_bi_private;
- bio_endio_nodec(bio, error);
+ bio_endio(bio, error);
}
static void verity_work(struct work_struct *w)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e08a926..0bccf18 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1745,7 +1745,7 @@ static void end_workqueue_fn(struct btrfs_work *work)
bio->bi_private = end_io_wq->private;
bio->bi_end_io = end_io_wq->end_io;
kmem_cache_free(btrfs_end_io_wq_cache, end_io_wq);
- bio_endio_nodec(bio, error);
+ bio_endio(bio, error);
}
static int cleaner_kthread(void *arg)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8e8d1d1..dac77d4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5585,10 +5585,10 @@ int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree,
static inline void btrfs_end_bbio(struct btrfs_bio *bbio, struct bio *bio, int err)
{
- if (likely(bbio->flags & BTRFS_BIO_ORIG_BIO_SUBMITTED))
- bio_endio_nodec(bio, err);
- else
- bio_endio(bio, err);
+ bio->bi_private = bbio->private;
+ bio->bi_end_io = bbio->end_io;
+ bio_endio(bio, err);
+
btrfs_put_bbio(bbio);
}
@@ -5632,8 +5632,6 @@ static void btrfs_end_bio(struct bio *bio, int err)
bio = bbio->orig_bio;
}
- bio->bi_private = bbio->private;
- bio->bi_end_io = bbio->end_io;
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
/* only send an error to the higher layers if it is
* beyond the tolerance of the btrfs bio
@@ -5815,8 +5813,6 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
/* Shoud be the original bio. */
WARN_ON(bio != bbio->orig_bio);
- bio->bi_private = bbio->private;
- bio->bi_end_io = bbio->end_io;
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
bio->bi_iter.bi_sector = logical >> 9;
@@ -5897,10 +5893,8 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
if (dev_nr < total_devs - 1) {
bio = btrfs_bio_clone(first_bio, GFP_NOFS);
BUG_ON(!bio); /* -ENOMEM */
- } else {
+ } else
bio = first_bio;
- bbio->flags |= BTRFS_BIO_ORIG_BIO_SUBMITTED;
- }
submit_stripe_bio(root, bbio, bio,
bbio->stripes[dev_nr].physical, dev_nr, rw,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ebc3133..cedae03 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -292,8 +292,6 @@ struct btrfs_bio_stripe {
struct btrfs_bio;
typedef void (btrfs_bio_end_io_t) (struct btrfs_bio *bio, int err);
-#define BTRFS_BIO_ORIG_BIO_SUBMITTED (1 << 0)
-
struct btrfs_bio {
atomic_t refs;
atomic_t stripes_pending;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7486ea1..f0291cf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -427,7 +427,6 @@ static inline struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
}
extern void bio_endio(struct bio *, int);
-extern void bio_endio_nodec(struct bio *, int);
struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);
@@ -659,17 +658,6 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
}
/*
- * Increment chain count for the bio. Make sure the CHAIN flag update
- * is visible before the raised count.
- */
-static inline void bio_inc_remaining(struct bio *bio)
-{
- bio->bi_flags |= (1 << BIO_CHAIN);
- smp_mb__before_atomic();
- atomic_inc(&bio->__bi_remaining);
-}
-
-/*
* bio_set is used to allow other portions of the IO system to
* allocate their own private memory pools for bio and iovec structures.
* These memory pools in turn all allocate from the bio_slab
--
2.3.2 (Apple Git-55)
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-22 13:14 [PATCH for-4.2 1/3] block: remove management of bi_remaining when restoring original bi_end_io Mike Snitzer @ 2015-05-22 13:14 ` Mike Snitzer 2015-05-26 6:20 ` Junichi Nomura 2015-05-22 13:14 ` [PATCH for-4.2 3/3] block: remove export for blk_queue_bio Mike Snitzer 1 sibling, 1 reply; 14+ messages in thread From: Mike Snitzer @ 2015-05-22 13:14 UTC (permalink / raw) To: Jens Axboe; +Cc: dm-devel, Christoph Hellwig, Mike Snitzer From: Christoph Hellwig <hch@lst.de> Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes cpu cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similar to bios in a flush sequence. With this change I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 94 +++---------------------- drivers/md/dm-table.c | 25 ++++--- drivers/md/dm.c | 171 +++++++++++----------------------------------- drivers/md/dm.h | 5 +- include/linux/blk_types.h | 2 + include/linux/blkdev.h | 6 +- 6 files changed, 73 insertions(+), 230 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 7a71a1d..986b256 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -117,7 +117,7 @@ EXPORT_SYMBOL(blk_rq_init); static void req_bio_endio(struct request *rq, struct bio *bio, unsigned int nbytes, int error) { - if (error) + if (error && !(rq->cmd_flags & REQ_CLONE)) clear_bit(BIO_UPTODATE, &bio->bi_flags); else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) error = -EIO; @@ -128,7 +128,8 @@ static void req_bio_endio(struct request *rq, struct bio *bio, bio_advance(bio, nbytes); /* don't actually finish bio if it's part of flush sequence */ - if (bio->bi_iter.bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) + if (bio->bi_iter.bi_size == 0 && + !(rq->cmd_flags & (REQ_FLUSH_SEQ|REQ_CLONE))) bio_endio(bio, error); } @@ -2911,95 +2912,22 @@ int blk_lld_busy(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_lld_busy); -/** - * blk_rq_unprep_clone - Helper function to free all bios in a cloned request - * @rq: the clone request to be cleaned up - * - * Description: - * Free all bios in @rq for a cloned request. - */ -void blk_rq_unprep_clone(struct request *rq) -{ - struct bio *bio; - - while ((bio = rq->bio) != NULL) { - rq->bio = bio->bi_next; - - bio_put(bio); - } -} -EXPORT_SYMBOL_GPL(blk_rq_unprep_clone); - -/* - * Copy attributes of the original request to the clone request. - * The actual data parts (e.g. ->cmd, ->sense) are not copied. - */ -static void __blk_rq_prep_clone(struct request *dst, struct request *src) +void blk_rq_prep_clone(struct request *dst, struct request *src) { dst->cpu = src->cpu; - dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE; + dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK); + dst->cmd_flags |= REQ_NOMERGE | REQ_CLONE; dst->cmd_type = src->cmd_type; dst->__sector = blk_rq_pos(src); dst->__data_len = blk_rq_bytes(src); dst->nr_phys_segments = src->nr_phys_segments; dst->ioprio = src->ioprio; dst->extra_len = src->extra_len; -} - -/** - * blk_rq_prep_clone - Helper function to setup clone request - * @rq: the request to be setup - * @rq_src: original request to be cloned - * @bs: bio_set that bios for clone are allocated from - * @gfp_mask: memory allocation mask for bio - * @bio_ctr: setup function to be called for each clone bio. - * Returns %0 for success, non %0 for failure. - * @data: private data to be passed to @bio_ctr - * - * Description: - * Clones bios in @rq_src to @rq, and copies attributes of @rq_src to @rq. - * The actual data parts of @rq_src (e.g. ->cmd, ->sense) - * are not copied, and copying such parts is the caller's responsibility. - * Also, pages which the original bios are pointing to are not copied - * and the cloned bios just point same pages. - * So cloned bios must be completed before original bios, which means - * the caller must complete @rq before @rq_src. - */ -int blk_rq_prep_clone(struct request *rq, struct request *rq_src, - struct bio_set *bs, gfp_t gfp_mask, - int (*bio_ctr)(struct bio *, struct bio *, void *), - void *data) -{ - struct bio *bio, *bio_src; - - if (!bs) - bs = fs_bio_set; - - __rq_for_each_bio(bio_src, rq_src) { - bio = bio_clone_fast(bio_src, gfp_mask, bs); - if (!bio) - goto free_and_out; - - if (bio_ctr && bio_ctr(bio, bio_src, data)) - goto free_and_out; - - if (rq->bio) { - rq->biotail->bi_next = bio; - rq->biotail = bio; - } else - rq->bio = rq->biotail = bio; - } - - __blk_rq_prep_clone(rq, rq_src); - - return 0; - -free_and_out: - if (bio) - bio_put(bio); - blk_rq_unprep_clone(rq); - - return -ENOMEM; + dst->bio = src->bio; + dst->biotail = src->biotail; + dst->cmd = src->cmd; + dst->cmd_len = src->cmd_len; + dst->sense = src->sense; } EXPORT_SYMBOL_GPL(blk_rq_prep_clone); diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index d9b00b8..3662b2e 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -940,21 +940,28 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device * { unsigned type = dm_table_get_type(t); unsigned per_bio_data_size = 0; - struct dm_target *tgt; unsigned i; - if (unlikely(type == DM_TYPE_NONE)) { + switch (type) { + case DM_TYPE_BIO_BASED: + for (i = 0; i < t->num_targets; i++) { + struct dm_target *tgt = t->targets + i; + + per_bio_data_size = max(per_bio_data_size, + tgt->per_bio_data_size); + } + t->mempools = dm_alloc_bio_mempools(t->integrity_supported, + per_bio_data_size); + break; + case DM_TYPE_REQUEST_BASED: + case DM_TYPE_MQ_REQUEST_BASED: + t->mempools = dm_alloc_rq_mempools(md, type); + break; + default: DMWARN("no table type is set, can't allocate mempools"); return -EINVAL; } - if (type == DM_TYPE_BIO_BASED) - for (i = 0; i < t->num_targets; i++) { - tgt = t->targets + i; - per_bio_data_size = max(per_bio_data_size, tgt->per_bio_data_size); - } - - t->mempools = dm_alloc_md_mempools(md, type, t->integrity_supported, per_bio_data_size); if (!t->mempools) return -ENOMEM; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index a930b72..38837f8 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -990,57 +990,6 @@ static void clone_endio(struct bio *bio, int error) dec_pending(io, error); } -/* - * Partial completion handling for request-based dm - */ -static void end_clone_bio(struct bio *clone, int error) -{ - struct dm_rq_clone_bio_info *info = - container_of(clone, struct dm_rq_clone_bio_info, clone); - struct dm_rq_target_io *tio = info->tio; - struct bio *bio = info->orig; - unsigned int nr_bytes = info->orig->bi_iter.bi_size; - - bio_put(clone); - - if (tio->error) - /* - * An error has already been detected on the request. - * Once error occurred, just let clone->end_io() handle - * the remainder. - */ - return; - else if (error) { - /* - * Don't notice the error to the upper layer yet. - * The error handling decision is made by the target driver, - * when the request is completed. - */ - tio->error = error; - return; - } - - /* - * I/O for the bio successfully completed. - * Notice the data completion to the upper layer. - */ - - /* - * bios are processed from the head of the list. - * So the completing bio should always be rq->bio. - * If it's not, something wrong is happening. - */ - if (tio->orig->bio != bio) - DMERR("bio completion is going in the middle of the request"); - - /* - * Update the original request. - * Do not use blk_end_request() here, because it may complete - * the original request before the clone, and break the ordering. - */ - blk_update_request(tio->orig, 0, nr_bytes); -} - static struct dm_rq_target_io *tio_from_request(struct request *rq) { return (rq->q->mq_ops ? blk_mq_rq_to_pdu(rq) : rq->special); @@ -1089,8 +1038,6 @@ static void free_rq_clone(struct request *clone, bool must_be_mapped) WARN_ON_ONCE(must_be_mapped && !clone->q); - blk_rq_unprep_clone(clone); - if (md->type == DM_TYPE_MQ_REQUEST_BASED) /* stacked on blk-mq queue(s) */ tio->ti->type->release_clone_rq(clone); @@ -1821,39 +1768,13 @@ static void dm_dispatch_clone_request(struct request *clone, struct request *rq) dm_complete_request(rq, r); } -static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig, - void *data) +static void setup_clone(struct request *clone, struct request *rq, + struct dm_rq_target_io *tio) { - struct dm_rq_target_io *tio = data; - struct dm_rq_clone_bio_info *info = - container_of(bio, struct dm_rq_clone_bio_info, clone); - - info->orig = bio_orig; - info->tio = tio; - bio->bi_end_io = end_clone_bio; - - return 0; -} - -static int setup_clone(struct request *clone, struct request *rq, - struct dm_rq_target_io *tio, gfp_t gfp_mask) -{ - int r; - - r = blk_rq_prep_clone(clone, rq, tio->md->bs, gfp_mask, - dm_rq_bio_constructor, tio); - if (r) - return r; - - clone->cmd = rq->cmd; - clone->cmd_len = rq->cmd_len; - clone->sense = rq->sense; + blk_rq_prep_clone(clone, rq); clone->end_io = end_clone_request; clone->end_io_data = tio; - tio->clone = clone; - - return 0; } static struct request *clone_rq(struct request *rq, struct mapped_device *md, @@ -1874,12 +1795,7 @@ static struct request *clone_rq(struct request *rq, struct mapped_device *md, clone = tio->clone; blk_rq_init(NULL, clone); - if (setup_clone(clone, rq, tio, gfp_mask)) { - /* -ENOMEM */ - if (alloc_clone) - free_clone_request(md, clone); - return NULL; - } + setup_clone(clone, rq, tio); return clone; } @@ -1973,11 +1889,7 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq, } if (IS_ERR(clone)) return DM_MAPIO_REQUEUE; - if (setup_clone(clone, rq, tio, GFP_ATOMIC)) { - /* -ENOMEM */ - ti->type->release_clone_rq(clone); - return DM_MAPIO_REQUEUE; - } + setup_clone(clone, rq, tio); } switch (r) { @@ -2431,8 +2343,6 @@ static void __bind_mempools(struct mapped_device *md, struct dm_table *t) goto out; } - BUG_ON(!p || md->io_pool || md->rq_pool || md->bs); - md->io_pool = p->io_pool; p->io_pool = NULL; md->rq_pool = p->rq_pool; @@ -3536,48 +3446,23 @@ int dm_noflush_suspending(struct dm_target *ti) } EXPORT_SYMBOL_GPL(dm_noflush_suspending); -struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned type, - unsigned integrity, unsigned per_bio_data_size) +struct dm_md_mempools *dm_alloc_bio_mempools(unsigned integrity, + unsigned per_bio_data_size) { - struct dm_md_mempools *pools = kzalloc(sizeof(*pools), GFP_KERNEL); - struct kmem_cache *cachep = NULL; - unsigned int pool_size = 0; + struct dm_md_mempools *pools; + unsigned int pool_size = dm_get_reserved_bio_based_ios(); unsigned int front_pad; + pools = kzalloc(sizeof(*pools), GFP_KERNEL); if (!pools) return NULL; - type = filter_md_type(type, md); + front_pad = roundup(per_bio_data_size, __alignof__(struct dm_target_io)) + + offsetof(struct dm_target_io, clone); - switch (type) { - case DM_TYPE_BIO_BASED: - cachep = _io_cache; - pool_size = dm_get_reserved_bio_based_ios(); - front_pad = roundup(per_bio_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone); - break; - case DM_TYPE_REQUEST_BASED: - cachep = _rq_tio_cache; - pool_size = dm_get_reserved_rq_based_ios(); - pools->rq_pool = mempool_create_slab_pool(pool_size, _rq_cache); - if (!pools->rq_pool) - goto out; - /* fall through to setup remaining rq-based pools */ - case DM_TYPE_MQ_REQUEST_BASED: - if (!pool_size) - pool_size = dm_get_reserved_rq_based_ios(); - front_pad = offsetof(struct dm_rq_clone_bio_info, clone); - /* per_bio_data_size is not used. See __bind_mempools(). */ - WARN_ON(per_bio_data_size != 0); - break; - default: - BUG(); - } - - if (cachep) { - pools->io_pool = mempool_create_slab_pool(pool_size, cachep); - if (!pools->io_pool) - goto out; - } + pools->io_pool = mempool_create_slab_pool(pool_size, _io_cache); + if (!pools->io_pool) + goto out; pools->bs = bioset_create_nobvec(pool_size, front_pad); if (!pools->bs) @@ -3587,10 +3472,34 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned t goto out; return pools; - out: dm_free_md_mempools(pools); + return NULL; +} + +struct dm_md_mempools *dm_alloc_rq_mempools(struct mapped_device *md, + unsigned type) +{ + unsigned int pool_size = dm_get_reserved_rq_based_ios(); + struct dm_md_mempools *pools; + pools = kzalloc(sizeof(*pools), GFP_KERNEL); + if (!pools) + return NULL; + + if (filter_md_type(type, md) == DM_TYPE_REQUEST_BASED) { + pools->rq_pool = mempool_create_slab_pool(pool_size, _rq_cache); + if (!pools->rq_pool) + goto out; + } + + pools->io_pool = mempool_create_slab_pool(pool_size, _rq_tio_cache); + if (!pools->io_pool) + goto out; + + return pools; +out: + dm_free_md_mempools(pools); return NULL; } diff --git a/drivers/md/dm.h b/drivers/md/dm.h index 6123c2b..e6e66d0 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -222,8 +222,9 @@ void dm_kcopyd_exit(void); /* * Mempool operations */ -struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned type, - unsigned integrity, unsigned per_bio_data_size); +struct dm_md_mempools *dm_alloc_bio_mempools(unsigned integrity, + unsigned per_bio_data_size); +struct dm_md_mempools *dm_alloc_rq_mempools(struct mapped_device *md, unsigned type); void dm_free_md_mempools(struct dm_md_mempools *pools); /* diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 7303b34..6ab9d12 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -192,6 +192,7 @@ enum rq_flag_bits { __REQ_HASHED, /* on IO scheduler merge hash */ __REQ_MQ_INFLIGHT, /* track inflight for MQ */ __REQ_NO_TIMEOUT, /* requests may never expire */ + __REQ_CLONE, /* cloned bios */ __REQ_NR_BITS, /* stops here */ }; @@ -246,5 +247,6 @@ enum rq_flag_bits { #define REQ_HASHED (1ULL << __REQ_HASHED) #define REQ_MQ_INFLIGHT (1ULL << __REQ_MQ_INFLIGHT) #define REQ_NO_TIMEOUT (1ULL << __REQ_NO_TIMEOUT) +#define REQ_CLONE (1ULL << __REQ_CLONE) #endif /* __LINUX_BLK_TYPES_H */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index bc91795..9ded80d 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -775,11 +775,7 @@ extern void blk_add_request_payload(struct request *rq, struct page *page, unsigned int len); extern int blk_rq_check_limits(struct request_queue *q, struct request *rq); extern int blk_lld_busy(struct request_queue *q); -extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src, - struct bio_set *bs, gfp_t gfp_mask, - int (*bio_ctr)(struct bio *, struct bio *, void *), - void *data); -extern void blk_rq_unprep_clone(struct request *rq); +extern void blk_rq_prep_clone(struct request *rq, struct request *rq_src); extern int blk_insert_cloned_request(struct request_queue *q, struct request *rq); extern void blk_delay_queue(struct request_queue *, unsigned long); -- 2.3.2 (Apple Git-55) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-22 13:14 ` [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones Mike Snitzer @ 2015-05-26 6:20 ` Junichi Nomura 2015-05-27 8:21 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Junichi Nomura @ 2015-05-26 6:20 UTC (permalink / raw) To: device-mapper development, Jens Axboe, Christoph Hellwig, Mike Snitzer On 05/22/15 22:14, Mike Snitzer wrote: > From: Christoph Hellwig <hch@lst.de> > > Currently dm-multipath has to clone the bios for every request sent > to the lower devices, which wastes cpu cycles and ties down memory. > > This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio > to not complete bios attached to a request, which we set on clone > requests similar to bios in a flush sequence. With this change I/O > errors on a path failure only get propagated to dm-multipath, which > can then either resubmit the I/O or complete the bios on the original > request. .. > @@ -128,7 +128,8 @@ static void req_bio_endio(struct request *rq, struct bio *bio, > bio_advance(bio, nbytes); > > /* don't actually finish bio if it's part of flush sequence */ > - if (bio->bi_iter.bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) > + if (bio->bi_iter.bi_size == 0 && > + !(rq->cmd_flags & (REQ_FLUSH_SEQ|REQ_CLONE))) > bio_endio(bio, error); > } Not completing bios is not sufficient. If you advance the bi_iter to the end, you need to somehow rewind it or the re-submission will be incomplete, that would end up as a data corruption... -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-26 6:20 ` Junichi Nomura @ 2015-05-27 8:21 ` Christoph Hellwig 2015-05-27 9:50 ` Junichi Nomura 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2015-05-27 8:21 UTC (permalink / raw) To: Junichi Nomura Cc: Jens Axboe, device-mapper development, Christoph Hellwig, Mike Snitzer On Tue, May 26, 2015 at 06:20:43AM +0000, Junichi Nomura wrote: > Not completing bios is not sufficient. > If you advance the bi_iter to the end, you need to somehow rewind it > or the re-submission will be incomplete, that would end up as a data > corruption... Can you explain which particular case you're worried about? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-27 8:21 ` Christoph Hellwig @ 2015-05-27 9:50 ` Junichi Nomura 2015-05-28 6:38 ` Junichi Nomura 2015-05-29 16:55 ` Christoph Hellwig 0 siblings, 2 replies; 14+ messages in thread From: Junichi Nomura @ 2015-05-27 9:50 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, device-mapper development, Mike Snitzer On 05/27/15 17:21, Christoph Hellwig wrote: > On Tue, May 26, 2015 at 06:20:43AM +0000, Junichi Nomura wrote: >> Not completing bios is not sufficient. >> If you advance the bi_iter to the end, you need to somehow rewind it >> or the re-submission will be incomplete, that would end up as a data >> corruption... > > Can you explain which particular case you're worried about? General path failure case. On retrying, another clone is created but bios it points to are already advanced to the end with your patch. So they look like bios with no remaining segments. Lower driver may successfully completes such a resubmitted clone *without doing actual I/O*. Then written data will be lost / read data will be bogus. Can you test this scenario with your patch? 1. Set up a multipath device with fail-over mode 2. Write something to the multipath device. After the clone request is sent to the primary path and before the data goes to the disk, down the primary path (e.g. echo offline > /sys/block/sdXX/device/state) 3. (dm-mpath will retry from the secondary path and the write will eventually succeed) 4. Verify if the written data is really on the disk -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-27 9:50 ` Junichi Nomura @ 2015-05-28 6:38 ` Junichi Nomura 2015-05-29 16:54 ` Christoph Hellwig 2015-05-29 16:55 ` Christoph Hellwig 1 sibling, 1 reply; 14+ messages in thread From: Junichi Nomura @ 2015-05-28 6:38 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, device-mapper development, Mike Snitzer [-- Attachment #1: Type: text/plain, Size: 3053 bytes --] On 05/27/15 18:50, Junichi Nomura wrote: > On 05/27/15 17:21, Christoph Hellwig wrote: >> On Tue, May 26, 2015 at 06:20:43AM +0000, Junichi Nomura wrote: >>> Not completing bios is not sufficient. >>> If you advance the bi_iter to the end, you need to somehow rewind it >>> or the re-submission will be incomplete, that would end up as a data >>> corruption... Less critical than the data corruption issue, I'm also worried about partial completion case. For successful partial completion, current code completes bio before fully completing the request. Your patch changes bios not completed until the request is fully completed. Other related concern is partial failure. In the case of bad sector, for example, current code fails I/O for the particular sector but other sectors in the request succeeds. If you make the request completion as all-or-nothing model, that will be a degrade for such a case. I'm not very sure how much impact does the removal of partial completion have in the real world. If partial completion is so negligible, I think it should be handled in such a way all the cases, instead of special casing REQ_CLONE. >> Can you explain which particular case you're worried about? > > General path failure case. > > On retrying, another clone is created but bios it points to > are already advanced to the end with your patch. > So they look like bios with no remaining segments. > Lower driver may successfully completes such a resubmitted > clone *without doing actual I/O*. > Then written data will be lost / read data will be bogus. > > Can you test this scenario with your patch? > 1. Set up a multipath device with fail-over mode > 2. Write something to the multipath device. > After the clone request is sent to the primary path > and before the data goes to the disk, > down the primary path > (e.g. echo offline > /sys/block/sdXX/device/state) > 3. (dm-mpath will retry from the secondary path and > the write will eventually succeed) > 4. Verify if the written data is really on the disk I made a small script so that people can play with. The script sets up tcm_loop multipath device and fio verification test while repeating paths up and down quickly. When your patch is applied, fio reports verification failure within a minute like this: # ./stress-mp.sh .. test1: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=2 fio-2.2.8-16-g68d9 Starting 1 process meta: verify failed at file /dev/mapper/mp offset 477626368, length 524288 received data dumped as mp.477626368.received expected data dumped as mp.477626368.expected fio: pid=13560, err=84/file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character test1: (groupid=0, jobs=1): err=84 (file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=13560: Thu May 28 01:54:56 2015 -- Jun'ichi Nomura, NEC Corporation [-- Attachment #2: stress-mp.sh --] [-- Type: application/x-sh, Size: 1677 bytes --] [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-28 6:38 ` Junichi Nomura @ 2015-05-29 16:54 ` Christoph Hellwig 2015-06-01 1:14 ` Junichi Nomura 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2015-05-29 16:54 UTC (permalink / raw) To: Junichi Nomura Cc: Jens Axboe, device-mapper development, Christoph Hellwig, Mike Snitzer On Thu, May 28, 2015 at 06:38:29AM +0000, Junichi Nomura wrote: > I'm also worried about partial completion case. > For successful partial completion, current code completes > bio before fully completing the request. > Your patch changes bios not completed until the request is > fully completed. Yes. Why are you worried about this case? > I'm not very sure how much impact does the removal of partial > completion have in the real world. > If partial completion is so negligible, I think it should be > handled in such a way all the cases, instead of special casing > REQ_CLONE. It isn't negligible - under load it actually does matter that we handle partial completions as arrays decide to just complete a request partially, so if we don't handle them we might end up in a loop not making progress. But if we only do that once on path fail over it's not an actual issue. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-29 16:54 ` Christoph Hellwig @ 2015-06-01 1:14 ` Junichi Nomura 2015-06-03 7:37 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Junichi Nomura @ 2015-06-01 1:14 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, device-mapper development, Mike Snitzer On 05/30/15 01:54, Christoph Hellwig wrote: > On Thu, May 28, 2015 at 06:38:29AM +0000, Junichi Nomura wrote: >> I'm also worried about partial completion case. >> For successful partial completion, current code completes >> bio before fully completing the request. >> Your patch changes bios not completed until the request is >> fully completed. > > Yes. Why are you worried about this case? See below. >> I'm not very sure how much impact does the removal of partial >> completion have in the real world. >> If partial completion is so negligible, I think it should be >> handled in such a way all the cases, instead of special casing >> REQ_CLONE. > > It isn't negligible - under load it actually does matter that > we handle partial completions as arrays decide to just complete > a request partially, so if we don't handle them we might end > up in a loop not making progress. OK. > But if we only do that once on > path fail over it's not an actual issue. But with your patch, the fail over happens after full completion of the request. Is it ok? -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-06-01 1:14 ` Junichi Nomura @ 2015-06-03 7:37 ` Christoph Hellwig 2015-06-03 23:13 ` Junichi Nomura 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2015-06-03 7:37 UTC (permalink / raw) To: Junichi Nomura Cc: Jens Axboe, device-mapper development, Christoph Hellwig, Mike Snitzer On Mon, Jun 01, 2015 at 01:14:26AM +0000, Junichi Nomura wrote: > > But if we only do that once on > > path fail over it's not an actual issue. > > But with your patch, the fail over happens after full completion > of the request. Is it ok? At least for SCSI it is, although the reason why is very subtile: From scsi_io_completion: /* * If we finished all bytes in the request we are done now. */ if (!scsi_end_request(req, error, good_bytes, 0)) return; /* * Kill remainder if no retrys. */ if (error && scsi_noretry_cmd(cmd)) { if (scsi_end_request(req, error, blk_rq_bytes(req), 0)) BUG(); return; } So for a noretry command send from dm-mpath we will never leave a command that had an error completion around, even if it was a partial completion. Relying on the LLDDs to get this right seems rather dangerous, though, so it might make sense to lift the above sequence to core code after a careful audit of other drivers to see if and how they handle partial completions. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-06-03 7:37 ` Christoph Hellwig @ 2015-06-03 23:13 ` Junichi Nomura 0 siblings, 0 replies; 14+ messages in thread From: Junichi Nomura @ 2015-06-03 23:13 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, device-mapper development, Mike Snitzer On 06/03/15 16:37, Christoph Hellwig wrote: > From scsi_io_completion: > > /* > * If we finished all bytes in the request we are done now. > */ > if (!scsi_end_request(req, error, good_bytes, 0)) > return; > > /* > * Kill remainder if no retrys. > */ > if (error && scsi_noretry_cmd(cmd)) { > if (scsi_end_request(req, error, blk_rq_bytes(req), 0)) > BUG(); > return; > } > > So for a noretry command send from dm-mpath we will never leave a command > that had an error completion around, even if it was a partial completion. scsi_noretry_cmd() is not always true even for dm-mpath. Following code in the later part of the function tries to complete some bytes with error and requeue the remainder. However it depends on blk_rq_err_bytes() whether partial error completion actually happens... case ACTION_FAIL: ... if (!scsi_end_request(req, error, blk_rq_err_bytes(req), 0)) return; /*FALLTHRU*/ case ACTION_REPREP: requeue: /* Unprep the request and put it back at the head of the queue. * A new command will be prepared and issued. */ if (q->mq_ops) { cmd->request->cmd_flags &= ~REQ_DONTPREP; scsi_mq_uninit_cmd(cmd); scsi_mq_requeue_cmd(cmd); } else { scsi_release_buffers(cmd); scsi_requeue_command(q, cmd); } > Relying on the LLDDs to get this right seems rather dangerous, though, so > it might make sense to lift the above sequence to core code after a careful > audit of other drivers to see if and how they handle partial completions. Right, that is the point. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-27 9:50 ` Junichi Nomura 2015-05-28 6:38 ` Junichi Nomura @ 2015-05-29 16:55 ` Christoph Hellwig 2015-06-01 1:19 ` Junichi Nomura 1 sibling, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2015-05-29 16:55 UTC (permalink / raw) To: Junichi Nomura Cc: Jens Axboe, device-mapper development, Christoph Hellwig, Mike Snitzer On Wed, May 27, 2015 at 09:50:18AM +0000, Junichi Nomura wrote: > Can you test this scenario with your patch? > 1. Set up a multipath device with fail-over mode > 2. Write something to the multipath device. > After the clone request is sent to the primary path > and before the data goes to the disk, > down the primary path > (e.g. echo offline > /sys/block/sdXX/device/state) > 3. (dm-mpath will retry from the secondary path and > the write will eventually succeed) > 4. Verify if the written data is really on the disk Verified as not working correctly. The patch below fixes it, but it needs more testing and some comments: diff --git a/block/blk-core.c b/block/blk-core.c index aa819a5..54feaae 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -117,7 +117,13 @@ EXPORT_SYMBOL(blk_rq_init); static void req_bio_endio(struct request *rq, struct bio *bio, unsigned int nbytes, int error) { - if (error && !(rq->cmd_flags & REQ_CLONE)) + if (rq->cmd_flags & REQ_CLONE) { + if (!error && test_bit(BIO_UPTODATE, &bio->bi_flags)) + bio_advance(bio, nbytes); + return; + } + + if (error) clear_bit(BIO_UPTODATE, &bio->bi_flags); else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) error = -EIO; @@ -128,8 +134,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio, bio_advance(bio, nbytes); /* don't actually finish bio if it's part of flush sequence */ - if (bio->bi_iter.bi_size == 0 && - !(rq->cmd_flags & (REQ_FLUSH_SEQ|REQ_CLONE))) + if (bio->bi_iter.bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ)) bio_endio(bio, error); } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-05-29 16:55 ` Christoph Hellwig @ 2015-06-01 1:19 ` Junichi Nomura 2015-06-03 7:39 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Junichi Nomura @ 2015-06-01 1:19 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, device-mapper development, Mike Snitzer On 05/30/15 01:55, Christoph Hellwig wrote: > On Wed, May 27, 2015 at 09:50:18AM +0000, Junichi Nomura wrote: >> Can you test this scenario with your patch? >> 1. Set up a multipath device with fail-over mode >> 2. Write something to the multipath device. >> After the clone request is sent to the primary path >> and before the data goes to the disk, >> down the primary path >> (e.g. echo offline > /sys/block/sdXX/device/state) >> 3. (dm-mpath will retry from the secondary path and >> the write will eventually succeed) >> 4. Verify if the written data is really on the disk > > Verified as not working correctly. The patch below fixes it, > but it needs more testing and some comments: Thanks. But just skipping bio_advance() is not good. For example, blk_update_request() does this: > while (req->bio) { > struct bio *bio = req->bio; > unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes); > > if (bio_bytes == bio->bi_iter.bi_size) > req->bio = bio->bi_next; if bi_iter.bi_size is not correct, we possibly fail to move the req->bio pointer. So other test case could be: 1. Create a bio with 1MB size 2. Submit a request with the bio 3. From lower driver, firstly partial complete 512KB with error, secondly partial complete 512KB without error 4. The I/O should complete successfully. And yet another test: 1. Create 4 READ bios with each 4KB 2. Submit a request with the 4 bios 3. From lower driver, partial complete 8KB without error, partial complete 4KB with error (emulate medium error), partial complete 4KB without error 4. Verify the read data. Only the 3rd 4KB should be failed. Others should have correct contents. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones 2015-06-01 1:19 ` Junichi Nomura @ 2015-06-03 7:39 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2015-06-03 7:39 UTC (permalink / raw) To: Junichi Nomura Cc: Jens Axboe, device-mapper development, Christoph Hellwig, Mike Snitzer On Mon, Jun 01, 2015 at 01:19:13AM +0000, Junichi Nomura wrote: > For example, blk_update_request() does this: > > > while (req->bio) { > > struct bio *bio = req->bio; > > unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes); > > > > if (bio_bytes == bio->bi_iter.bi_size) > > req->bio = bio->bi_next; > > if bi_iter.bi_size is not correct, we possibly fail > to move the req->bio pointer. > > So other test case could be: > > 1. Create a bio with 1MB size > 2. Submit a request with the bio > 3. From lower driver, > firstly partial complete 512KB with error, > secondly partial complete 512KB without error > 4. The I/O should complete successfully. It will be returned to the upper layer after the first completion as mentioned in my previous mail. Same for the other test. Think of partial completion as a way for SCSI targets to chunk transfers, higher layers will only see full success/failure cases. For that reason we also removed the bytes completed argument to ->bi_end_io because the callers should not care about these details. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH for-4.2 3/3] block: remove export for blk_queue_bio 2015-05-22 13:14 [PATCH for-4.2 1/3] block: remove management of bi_remaining when restoring original bi_end_io Mike Snitzer 2015-05-22 13:14 ` [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones Mike Snitzer @ 2015-05-22 13:14 ` Mike Snitzer 1 sibling, 0 replies; 14+ messages in thread From: Mike Snitzer @ 2015-05-22 13:14 UTC (permalink / raw) To: Jens Axboe; +Cc: dm-devel, Mike Snitzer With commit ff36ab345 ("dm: remove request-based logic from make_request_fn wrapper") DM no longer calls blk_queue_bio() directly, so remove its export. Doing so required a forward declaration in blk-core.c. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 5 +++-- include/linux/blkdev.h | 2 -- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 986b256..f6ab750 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -736,6 +736,8 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) } EXPORT_SYMBOL(blk_init_queue_node); +static void blk_queue_bio(struct request_queue *q, struct bio *bio); + struct request_queue * blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn, spinlock_t *lock) @@ -1589,7 +1591,7 @@ void init_request_from_bio(struct request *req, struct bio *bio) blk_rq_bio_prep(req->q, req, bio); } -void blk_queue_bio(struct request_queue *q, struct bio *bio) +static void blk_queue_bio(struct request_queue *q, struct bio *bio) { const bool sync = !!(bio->bi_rw & REQ_SYNC); struct blk_plug *plug; @@ -1697,7 +1699,6 @@ out_unlock: spin_unlock_irq(q->queue_lock); } } -EXPORT_SYMBOL_GPL(blk_queue_bio); /* for device mapper only */ /* * If bio->bi_dev is a partition, remap the location diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9ded80d..776d2ee 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -788,8 +788,6 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t, extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t, struct scsi_ioctl_command __user *); -extern void blk_queue_bio(struct request_queue *q, struct bio *bio); - /* * A queue has just exitted congestion. Note this in the global counter of * congested queues, and wake up anyone who was waiting for requests to be -- 2.3.2 (Apple Git-55) ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-06-03 23:13 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-05-22 13:14 [PATCH for-4.2 1/3] block: remove management of bi_remaining when restoring original bi_end_io Mike Snitzer 2015-05-22 13:14 ` [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones Mike Snitzer 2015-05-26 6:20 ` Junichi Nomura 2015-05-27 8:21 ` Christoph Hellwig 2015-05-27 9:50 ` Junichi Nomura 2015-05-28 6:38 ` Junichi Nomura 2015-05-29 16:54 ` Christoph Hellwig 2015-06-01 1:14 ` Junichi Nomura 2015-06-03 7:37 ` Christoph Hellwig 2015-06-03 23:13 ` Junichi Nomura 2015-05-29 16:55 ` Christoph Hellwig 2015-06-01 1:19 ` Junichi Nomura 2015-06-03 7:39 ` Christoph Hellwig 2015-05-22 13:14 ` [PATCH for-4.2 3/3] block: remove export for blk_queue_bio Mike Snitzer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.