All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/4] blk-mq support for dm multipath
@ 2014-10-17 23:46 Keith Busch
  2014-10-17 23:46 ` [PATCHv2 1/4] dm: prep initialized requests Keith Busch
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Keith Busch @ 2014-10-17 23:46 UTC (permalink / raw)
  To: dm-devel
  Cc: Christoph Hellwig, Jun'ichi Nomura, Keith Busch, Mike Snitzer

This series makes device mapper's multipath work with both blk-mq and
the older request_queue based block devices.

As before, I tested with a slightly modified version of Matias' blk-mq
nvme driver with dual ported nvme controller. I'll post the nvme changes
to that list once the blk-mq mpath is good to go.

I've tried to split this into logically separate patches to prep and
support this request_queue type.

Patch [2/4] is a bit of a departure from what was done before, but I
hope something like this is alright.

v1 -> v2:

Micro-optimization setting the cloned rq->bio to null from dm-mpath
instead of blk_mq_rq_ctx_init.

There's no way around getting a request without calling
blk_get_request. Making this API safe in an interrupt disabled context is
troublesome, so submitting the dm request to the target in a different
thread so it is NOT inline with dm's request_fn. We can use GFP_KERNEL
now since the allocation occurs in this new context. I believe this is
okay in both request queue types since neither allocate from general
purpose memory.

In v1, the target would allocate the request and dm would release it. This
duality is somewhat changed: the target's map_rq is still responsible
to allocate since only it knows what request queue to allocate from,
and I've added a new target type function (unmap_rq) to handle releasing.

I fixed the error handling; it was completely untested before and quite
broken. To test, I synthesized 4 different errors: no requests available,
no memory available to the clone bio, invalid target, and completing
requests in low-level driver with an error. All worked as expected on
blk-mq devices.

Keith Busch (4):
  dm: prep initialized requests
  dm: Submit stacked requests in irq enabled context
  dm: Move request allocation to dm_target type
  block: blk-mq support for cloned requests

 block/blk-core.c              |    7 +-
 drivers/md/dm-mpath.c         |   22 ++++--
 drivers/md/dm-target.c        |    9 ++-
 drivers/md/dm.c               |  155 +++++++++++++++++++++--------------------
 include/linux/device-mapper.h |    7 +-
 5 files changed, 112 insertions(+), 88 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCHv2 1/4] dm: prep initialized requests
  2014-10-17 23:46 [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
@ 2014-10-17 23:46 ` Keith Busch
  2014-10-17 23:46 ` [PATCHv2 2/4] dm: Submit stacked requests in irq enabled context Keith Busch
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2014-10-17 23:46 UTC (permalink / raw)
  To: dm-devel
  Cc: Christoph Hellwig, Jun'ichi Nomura, Keith Busch, Mike Snitzer

Require blk_rq_prep_clone use requests that have already been
initialized. This patch is preping to allow this path to accept requests
allocated from blk-mq request queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-core.c |    2 --
 drivers/md/dm.c  |    1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bf930f4..3abbc04 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2928,8 +2928,6 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	if (!bs)
 		bs = fs_bio_set;
 
-	blk_rq_init(NULL, rq);
-
 	__rq_for_each_bio(bio_src, rq_src) {
 		bio = bio_clone_bioset(bio_src, gfp_mask, bs);
 		if (!bio)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 32b958d..809f83f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1605,6 +1605,7 @@ static int setup_clone(struct request *clone, struct request *rq,
 {
 	int r;
 
+	blk_rq_init(NULL, rq);
 	r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
 			      dm_rq_bio_constructor, tio);
 	if (r)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCHv2 2/4] dm: Submit stacked requests in irq enabled context
  2014-10-17 23:46 [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
  2014-10-17 23:46 ` [PATCHv2 1/4] dm: prep initialized requests Keith Busch
@ 2014-10-17 23:46 ` Keith Busch
  2014-10-17 23:46 ` [PATCHv2 3/4] dm: Move request allocation to dm_target type Keith Busch
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2014-10-17 23:46 UTC (permalink / raw)
  To: dm-devel
  Cc: Christoph Hellwig, Jun'ichi Nomura, Keith Busch, Mike Snitzer

This has dm enqueue all prep'ed requests into work processed by another
thread. This will allow dm to invoke block APIs that assume interrupt
enabled context. This patch is to prepare for adding blk-mq support.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/md/dm.c |   37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 809f83f..88a73be 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -19,6 +19,7 @@
 #include <linux/idr.h>
 #include <linux/hdreg.h>
 #include <linux/delay.h>
+#include <linux/kthread.h>
 
 #include <trace/events/block.h>
 
@@ -56,6 +57,8 @@ static DECLARE_WORK(deferred_remove_work, do_deferred_remove);
 
 static struct workqueue_struct *deferred_remove_workqueue;
 
+static void map_tio_request(struct kthread_work *work);
+
 /*
  * For bio-based dm.
  * One of these is allocated per bio.
@@ -78,6 +81,7 @@ struct dm_rq_target_io {
 	struct mapped_device *md;
 	struct dm_target *ti;
 	struct request *orig, clone;
+	struct kthread_work work;
 	int error;
 	union map_info info;
 };
@@ -202,6 +206,9 @@ struct mapped_device {
 	struct bio flush_bio;
 
 	struct dm_stats stats;
+
+	struct kthread_worker kworker;
+	struct task_struct *kworker_task;
 };
 
 /*
@@ -1635,6 +1642,7 @@ static struct request *clone_rq(struct request *rq, struct mapped_device *md,
 	tio->orig = rq;
 	tio->error = 0;
 	memset(&tio->info, 0, sizeof(tio->info));
+	init_kthread_work(&tio->work, map_tio_request);
 
 	clone = &tio->clone;
 	if (setup_clone(clone, rq, tio)) {
@@ -1731,6 +1739,13 @@ static struct request *dm_start_request(struct mapped_device *md, struct request
 	return clone;
 }
 
+static void map_tio_request(struct kthread_work *work)
+{
+	struct dm_rq_target_io *tio = container_of(work, struct dm_rq_target_io,
+									work);
+	map_request(tio->ti, &tio->clone, tio->md);
+}
+
 /*
  * q->request_fn for request-based dm.
  * Called with the queue lock held.
@@ -1742,6 +1757,7 @@ static void dm_request_fn(struct request_queue *q)
 	struct dm_table *map = dm_get_live_table(md, &srcu_idx);
 	struct dm_target *ti;
 	struct request *rq, *clone;
+	struct dm_rq_target_io *tio;
 	sector_t pos;
 
 	/*
@@ -1777,20 +1793,14 @@ static void dm_request_fn(struct request_queue *q)
 
 		clone = dm_start_request(md, rq);
 
-		spin_unlock(q->queue_lock);
-		if (map_request(ti, clone, md))
-			goto requeued;
-
+		tio = rq->special;
+		tio->ti = ti;
+		queue_kthread_work(&md->kworker, &tio->work);
 		BUG_ON(!irqs_disabled());
-		spin_lock(q->queue_lock);
 	}
 
 	goto out;
 
-requeued:
-	BUG_ON(!irqs_disabled());
-	spin_lock(q->queue_lock);
-
 delay_and_out:
 	blk_delay_queue(q, HZ / 10);
 out:
@@ -1981,6 +1991,11 @@ static struct mapped_device *alloc_dev(int minor)
 	md->disk->queue = md->queue;
 	md->disk->private_data = md;
 	sprintf(md->disk->disk_name, "dm-%d", minor);
+
+	init_kthread_worker(&md->kworker);
+	md->kworker_task = kthread_run(kthread_worker_fn, &md->kworker, "dm-%s",
+					dev_name(disk_to_dev(md->disk)));
+
 	add_disk(md->disk);
 	format_dev_t(md->name, MKDEV(_major, minor));
 
@@ -2034,6 +2049,10 @@ static void free_dev(struct mapped_device *md)
 	unlock_fs(md);
 	bdput(md->bdev);
 	destroy_workqueue(md->wq);
+
+	flush_kthread_worker(&md->kworker);
+	kthread_stop(md->kworker_task);
+
 	if (md->io_pool)
 		mempool_destroy(md->io_pool);
 	if (md->bs)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCHv2 3/4] dm: Move request allocation to dm_target type
  2014-10-17 23:46 [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
  2014-10-17 23:46 ` [PATCHv2 1/4] dm: prep initialized requests Keith Busch
  2014-10-17 23:46 ` [PATCHv2 2/4] dm: Submit stacked requests in irq enabled context Keith Busch
@ 2014-10-17 23:46 ` Keith Busch
  2014-10-17 23:46 ` [PATCHv2 4/4] block: blk-mq support for cloned requests Keith Busch
  2014-10-28 18:13 ` [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
  4 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2014-10-17 23:46 UTC (permalink / raw)
  To: dm-devel
  Cc: Christoph Hellwig, Jun'ichi Nomura, Keith Busch, Mike Snitzer

This moves the responsibility of allocating a cloned request from dm
to the target type so that the cloned request is allocated from the
appropriate request_queue's pool and initialized for the target block
device. The original request's 'special' now points to the dm_rq_target_io
because the clone is allocated later in the block layer rather than in
dm. All the back references to the original will then work much as before.

Since the previous patch move the mapping into an interrupt safe context,
any allocations can be safely done in GFP_KERNEL context.

Since the responsibility for allocating the request is moved to the
target's map_rq function, it has to take responisbility for releasing
the request when completed by dm, so this adds a new unmap_rq function
for target types to implement.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/md/dm-mpath.c         |   22 ++++++--
 drivers/md/dm-target.c        |    9 ++-
 drivers/md/dm.c               |  125 ++++++++++++++++++-----------------------
 include/linux/device-mapper.h |    7 ++-
 4 files changed, 82 insertions(+), 81 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 833d7e7..10fd340 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -11,6 +11,7 @@
 #include "dm-path-selector.h"
 #include "dm-uevent.h"
 
+#include <linux/blkdev.h>
 #include <linux/ctype.h>
 #include <linux/init.h>
 #include <linux/mempool.h>
@@ -376,12 +377,12 @@ static int __must_push_back(struct multipath *m)
 /*
  * Map cloned requests
  */
-static int multipath_map(struct dm_target *ti, struct request *clone,
-			 union map_info *map_context)
+static int multipath_map(struct dm_target *ti, struct request *rq,
+		struct request **clone, union map_info *map_context)
 {
 	struct multipath *m = (struct multipath *) ti->private;
 	int r = DM_MAPIO_REQUEUE;
-	size_t nr_bytes = blk_rq_bytes(clone);
+	size_t nr_bytes = blk_rq_bytes(rq);
 	unsigned long flags;
 	struct pgpath *pgpath;
 	struct block_device *bdev;
@@ -410,9 +411,12 @@ static int multipath_map(struct dm_target *ti, struct request *clone,
 		goto out_unlock;
 
 	bdev = pgpath->path.dev->bdev;
-	clone->q = bdev_get_queue(bdev);
-	clone->rq_disk = bdev->bd_disk;
-	clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
+	*clone = blk_get_request(bdev_get_queue(bdev), rq_data_dir(rq),
+							GFP_KERNEL);
+	if (!*clone)
+		goto out_unlock;
+	(*clone)->bio = NULL;
+	(*clone)->cmd_flags |= REQ_FAILFAST_TRANSPORT;
 	mpio = map_context->ptr;
 	mpio->pgpath = pgpath;
 	mpio->nr_bytes = nr_bytes;
@@ -428,6 +432,11 @@ out_unlock:
 	return r;
 }
 
+static void multipath_unmap(struct request *clone)
+{
+	blk_put_request(clone);
+}
+
 /*
  * If we run out of usable paths, should we queue I/O or error it?
  */
@@ -1669,6 +1678,7 @@ static struct target_type multipath_target = {
 	.ctr = multipath_ctr,
 	.dtr = multipath_dtr,
 	.map_rq = multipath_map,
+	.unmap_rq = multipath_unmap,
 	.rq_end_io = multipath_end_io,
 	.presuspend = multipath_presuspend,
 	.postsuspend = multipath_postsuspend,
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 242e3ce..32fa435 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -131,12 +131,16 @@ static int io_err_map(struct dm_target *tt, struct bio *bio)
 	return -EIO;
 }
 
-static int io_err_map_rq(struct dm_target *ti, struct request *clone,
-			 union map_info *map_context)
+static int io_err_map_rq(struct dm_target *ti, struct request *rq,
+			 struct request **clone, union map_info *map_context)
 {
 	return -EIO;
 }
 
+static void io_err_unmap_rq(struct request *clone)
+{
+}
+
 static struct target_type error_target = {
 	.name = "error",
 	.version = {1, 2, 0},
@@ -144,6 +148,7 @@ static struct target_type error_target = {
 	.dtr  = io_err_dtr,
 	.map  = io_err_map,
 	.map_rq = io_err_map_rq,
+	.unmap_rq = io_err_unmap_rq,
 };
 
 int __init dm_target_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 88a73be..944b133 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -80,7 +80,7 @@ struct dm_io {
 struct dm_rq_target_io {
 	struct mapped_device *md;
 	struct dm_target *ti;
-	struct request *orig, clone;
+	struct request *orig, *clone;
 	struct kthread_work work;
 	int error;
 	union map_info info;
@@ -901,6 +901,7 @@ static void free_rq_clone(struct request *clone)
 	struct dm_rq_target_io *tio = clone->end_io_data;
 
 	blk_rq_unprep_clone(clone);
+	tio->ti->type->unmap_rq(clone);
 	free_rq_tio(tio);
 }
 
@@ -935,7 +936,8 @@ static void dm_end_request(struct request *clone, int error)
 
 static void dm_unprep_request(struct request *rq)
 {
-	struct request *clone = rq->special;
+	struct dm_rq_target_io *tio = rq->special;
+	struct request *clone = tio->clone;
 
 	rq->special = NULL;
 	rq->cmd_flags &= ~REQ_DONTPREP;
@@ -1032,8 +1034,14 @@ static void dm_done(struct request *clone, int error, bool mapped)
 static void dm_softirq_done(struct request *rq)
 {
 	bool mapped = true;
-	struct request *clone = rq->completion_data;
-	struct dm_rq_target_io *tio = clone->end_io_data;
+	struct dm_rq_target_io *tio = rq->special;
+	struct request *clone = tio->clone;
+
+	if (!clone) {
+		blk_end_request_all(rq, tio->error);
+		free_rq_tio(tio);
+		return;
+	}
 
 	if (rq->cmd_flags & REQ_FAILED)
 		mapped = false;
@@ -1045,44 +1053,21 @@ static void dm_softirq_done(struct request *rq)
  * Complete the clone and the original request with the error status
  * through softirq context.
  */
-static void dm_complete_request(struct request *clone, int error)
+static void dm_complete_request(struct request *rq, int error)
 {
-	struct dm_rq_target_io *tio = clone->end_io_data;
-	struct request *rq = tio->orig;
+	struct dm_rq_target_io *tio = rq->special;
 
 	tio->error = error;
-	rq->completion_data = clone;
 	blk_complete_request(rq);
 }
 
 /*
- * Complete the not-mapped clone and the original request with the error status
- * through softirq context.
- * Target's rq_end_io() function isn't called.
- * This may be used when the target's map_rq() function fails.
- */
-void dm_kill_unmapped_request(struct request *clone, int error)
-{
-	struct dm_rq_target_io *tio = clone->end_io_data;
-	struct request *rq = tio->orig;
-
-	rq->cmd_flags |= REQ_FAILED;
-	dm_complete_request(clone, error);
-}
-EXPORT_SYMBOL_GPL(dm_kill_unmapped_request);
-
-/*
  * Called with the queue lock held
  */
 static void end_clone_request(struct request *clone, int error)
 {
-	/*
-	 * For just cleaning up the information of the queue in which
-	 * the clone was dispatched.
-	 * The clone is *NOT* freed actually here because it is alloced from
-	 * dm own mempool and REQ_ALLOCED isn't set in clone->cmd_flags.
-	 */
-	__blk_put_request(clone->q, clone);
+	struct dm_rq_target_io *tio = clone->end_io_data;
+	struct request *rq = tio->orig;
 
 	/*
 	 * Actual request completion is done in a softirq context which doesn't
@@ -1092,7 +1077,7 @@ static void end_clone_request(struct request *clone, int error)
 	 *     - the submission which requires queue lock may be done
 	 *       against this queue
 	 */
-	dm_complete_request(clone, error);
+	dm_complete_request(rq, error);
 }
 
 /*
@@ -1612,8 +1597,7 @@ static int setup_clone(struct request *clone, struct request *rq,
 {
 	int r;
 
-	blk_rq_init(NULL, rq);
-	r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+	r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_KERNEL,
 			      dm_rq_bio_constructor, tio);
 	if (r)
 		return r;
@@ -1627,10 +1611,9 @@ static int setup_clone(struct request *clone, struct request *rq,
 	return 0;
 }
 
-static struct request *clone_rq(struct request *rq, struct mapped_device *md,
-				gfp_t gfp_mask)
+static struct dm_rq_target_io *prep_tio(struct request *rq,
+			struct mapped_device *md, gfp_t gfp_mask)
 {
-	struct request *clone;
 	struct dm_rq_target_io *tio;
 
 	tio = alloc_rq_tio(md, gfp_mask);
@@ -1639,19 +1622,13 @@ static struct request *clone_rq(struct request *rq, struct mapped_device *md,
 
 	tio->md = md;
 	tio->ti = NULL;
+	tio->clone = NULL;
 	tio->orig = rq;
 	tio->error = 0;
 	memset(&tio->info, 0, sizeof(tio->info));
 	init_kthread_work(&tio->work, map_tio_request);
 
-	clone = &tio->clone;
-	if (setup_clone(clone, rq, tio)) {
-		/* -ENOMEM */
-		free_rq_tio(tio);
-		return NULL;
-	}
-
-	return clone;
+	return tio;
 }
 
 /*
@@ -1660,18 +1637,18 @@ static struct request *clone_rq(struct request *rq, struct mapped_device *md,
 static int dm_prep_fn(struct request_queue *q, struct request *rq)
 {
 	struct mapped_device *md = q->queuedata;
-	struct request *clone;
+	struct dm_rq_target_io *tio;
 
 	if (unlikely(rq->special)) {
 		DMWARN("Already has something in rq->special.");
 		return BLKPREP_KILL;
 	}
 
-	clone = clone_rq(rq, md, GFP_ATOMIC);
-	if (!clone)
+	tio = prep_tio(rq, md, GFP_ATOMIC);
+	if (!tio)
 		return BLKPREP_DEFER;
 
-	rq->special = clone;
+	rq->special = tio;
 	rq->cmd_flags |= REQ_DONTPREP;
 
 	return BLKPREP_OK;
@@ -1682,14 +1659,23 @@ static int dm_prep_fn(struct request_queue *q, struct request *rq)
  * 0  : the request has been processed (not requeued)
  * !0 : the request has been requeued
  */
-static int map_request(struct dm_target *ti, struct request *clone,
+static int map_request(struct dm_target *ti, struct request *rq,
 		       struct mapped_device *md)
 {
+	struct request *clone = NULL;
 	int r, requeued = 0;
-	struct dm_rq_target_io *tio = clone->end_io_data;
+	struct dm_rq_target_io *tio = rq->special;
+
+	r = ti->type->map_rq(ti, rq, &clone, &tio->info);
+	if (!clone)
+		return -ENOMEM;
+	if (setup_clone(clone, rq, tio)) {
+		tio->ti->type->unmap_rq(clone);
+		return -ENOMEM;
+	}
+	tio->clone = clone;
+	atomic_inc(&md->pending[rq_data_dir(clone)]);
 
-	tio->ti = ti;
-	r = ti->type->map_rq(ti, clone, &tio->info);
 	switch (r) {
 	case DM_MAPIO_SUBMITTED:
 		/* The target has taken the I/O to submit by itself later */
@@ -1712,20 +1698,17 @@ static int map_request(struct dm_target *ti, struct request *clone,
 		}
 
 		/* The target wants to complete the I/O */
-		dm_kill_unmapped_request(clone, r);
+		rq->cmd_flags |= REQ_FAILED;
+		dm_complete_request(rq, r);
 		break;
 	}
 
 	return requeued;
 }
 
-static struct request *dm_start_request(struct mapped_device *md, struct request *orig)
+static void dm_start_request(struct mapped_device *md, struct request *orig)
 {
-	struct request *clone;
-
 	blk_start_request(orig);
-	clone = orig->special;
-	atomic_inc(&md->pending[rq_data_dir(clone)]);
 
 	/*
 	 * Hold the md reference here for the in-flight I/O.
@@ -1735,15 +1718,20 @@ static struct request *dm_start_request(struct mapped_device *md, struct request
 	 * See the comment in rq_completed() too.
 	 */
 	dm_get(md);
-
-	return clone;
 }
 
 static void map_tio_request(struct kthread_work *work)
 {
 	struct dm_rq_target_io *tio = container_of(work, struct dm_rq_target_io,
 									work);
-	map_request(tio->ti, &tio->clone, tio->md);
+	struct request *rq = tio->orig;
+
+	if (map_request(tio->ti, rq, tio->md) < 0) {
+		/* requeue prepped request */
+		spin_lock_irq(rq->q->queue_lock);
+		blk_requeue_request(rq->q, rq);
+		spin_unlock_irq(rq->q->queue_lock);
+	}
 }
 
 /*
@@ -1756,7 +1744,7 @@ static void dm_request_fn(struct request_queue *q)
 	int srcu_idx;
 	struct dm_table *map = dm_get_live_table(md, &srcu_idx);
 	struct dm_target *ti;
-	struct request *rq, *clone;
+	struct request *rq;
 	struct dm_rq_target_io *tio;
 	sector_t pos;
 
@@ -1776,24 +1764,19 @@ static void dm_request_fn(struct request_queue *q)
 		if (!(rq->cmd_flags & REQ_FLUSH))
 			pos = blk_rq_pos(rq);
 
+		tio = rq->special;
 		ti = dm_table_find_target(map, pos);
 		if (!dm_target_is_valid(ti)) {
-			/*
-			 * Must perform setup, that dm_done() requires,
-			 * before calling dm_kill_unmapped_request
-			 */
 			DMERR_LIMIT("request attempted access beyond the end of device");
-			clone = dm_start_request(md, rq);
-			dm_kill_unmapped_request(clone, -EIO);
+			dm_complete_request(rq, -EIO);
 			continue;
 		}
 
 		if (ti->type->busy && ti->type->busy(ti))
 			goto delay_and_out;
 
-		clone = dm_start_request(md, rq);
+		dm_start_request(md, rq);
 
-		tio = rq->special;
 		tio->ti = ti;
 		queue_kthread_work(&md->kworker, &tio->work);
 		BUG_ON(!irqs_disabled());
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index e1707de..90a4774 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -46,8 +46,10 @@ typedef void (*dm_dtr_fn) (struct dm_target *ti);
  * = 2: The target wants to push back the io
  */
 typedef int (*dm_map_fn) (struct dm_target *ti, struct bio *bio);
-typedef int (*dm_map_request_fn) (struct dm_target *ti, struct request *clone,
-				  union map_info *map_context);
+typedef int (*dm_map_request_fn) (struct dm_target *ti, struct request *rq,
+			struct request **clone, union map_info *map_context);
+
+typedef void (*dm_unmap_request_fn) (struct request *clone);
 
 /*
  * Returns:
@@ -142,6 +144,7 @@ struct target_type {
 	dm_dtr_fn dtr;
 	dm_map_fn map;
 	dm_map_request_fn map_rq;
+	dm_unmap_request_fn unmap_rq;
 	dm_endio_fn end_io;
 	dm_request_endio_fn rq_end_io;
 	dm_presuspend_fn presuspend;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCHv2 4/4] block: blk-mq support for cloned requests
  2014-10-17 23:46 [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
                   ` (2 preceding siblings ...)
  2014-10-17 23:46 ` [PATCHv2 3/4] dm: Move request allocation to dm_target type Keith Busch
@ 2014-10-17 23:46 ` Keith Busch
  2014-10-28 18:13 ` [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
  4 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2014-10-17 23:46 UTC (permalink / raw)
  To: dm-devel
  Cc: Christoph Hellwig, Jun'ichi Nomura, Keith Busch, Mike Snitzer

Use blk-mq to insert cloned requests if it was allocated from a
multi-queue block driver.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-core.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3abbc04..4c5952b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2031,6 +2031,11 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 	    should_fail_request(&rq->rq_disk->part0, blk_rq_bytes(rq)))
 		return -EIO;
 
+	if (q->mq_ops) {
+		blk_mq_insert_request(rq, false, true, true);
+		return 0;
+	}
+
 	spin_lock_irqsave(q->queue_lock, flags);
 	if (unlikely(blk_queue_dying(q))) {
 		spin_unlock_irqrestore(q->queue_lock, flags);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-10-17 23:46 [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
                   ` (3 preceding siblings ...)
  2014-10-17 23:46 ` [PATCHv2 4/4] block: blk-mq support for cloned requests Keith Busch
@ 2014-10-28 18:13 ` Keith Busch
  2014-10-28 18:26   ` Mike Snitzer
  2014-10-28 18:48   ` Hannes Reinecke
  4 siblings, 2 replies; 14+ messages in thread
From: Keith Busch @ 2014-10-28 18:13 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel, Mike Snitzer

No comments this time. Did I totally botch this in the wrong direction,
or was the timing bad with the linux con/plumbers that week?

On Fri, 17 Oct 2014, Keith Busch wrote:
> This series makes device mapper's multipath work with both blk-mq and
> the older request_queue based block devices.
>
> As before, I tested with a slightly modified version of Matias' blk-mq
> nvme driver with dual ported nvme controller. I'll post the nvme changes
> to that list once the blk-mq mpath is good to go.
>
> I've tried to split this into logically separate patches to prep and
> support this request_queue type.
>
> Patch [2/4] is a bit of a departure from what was done before, but I
> hope something like this is alright.
>
> v1 -> v2:
>
> Micro-optimization setting the cloned rq->bio to null from dm-mpath
> instead of blk_mq_rq_ctx_init.
>
> There's no way around getting a request without calling
> blk_get_request. Making this API safe in an interrupt disabled context is
> troublesome, so submitting the dm request to the target in a different
> thread so it is NOT inline with dm's request_fn. We can use GFP_KERNEL
> now since the allocation occurs in this new context. I believe this is
> okay in both request queue types since neither allocate from general
> purpose memory.
>
> In v1, the target would allocate the request and dm would release it. This
> duality is somewhat changed: the target's map_rq is still responsible
> to allocate since only it knows what request queue to allocate from,
> and I've added a new target type function (unmap_rq) to handle releasing.
>
> I fixed the error handling; it was completely untested before and quite
> broken. To test, I synthesized 4 different errors: no requests available,
> no memory available to the clone bio, invalid target, and completing
> requests in low-level driver with an error. All worked as expected on
> blk-mq devices.
>
> Keith Busch (4):
>  dm: prep initialized requests
>  dm: Submit stacked requests in irq enabled context
>  dm: Move request allocation to dm_target type
>  block: blk-mq support for cloned requests
>
> block/blk-core.c              |    7 +-
> drivers/md/dm-mpath.c         |   22 ++++--
> drivers/md/dm-target.c        |    9 ++-
> drivers/md/dm.c               |  155 +++++++++++++++++++++--------------------
> include/linux/device-mapper.h |    7 +-
> 5 files changed, 112 insertions(+), 88 deletions(-)
>
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-10-28 18:13 ` [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
@ 2014-10-28 18:26   ` Mike Snitzer
  2014-12-05 23:25     ` Mike Snitzer
  2014-10-28 18:48   ` Hannes Reinecke
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2014-10-28 18:26 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Tue, Oct 28 2014 at  2:13pm -0400,
Keith Busch <keith.busch@intel.com> wrote:

> No comments this time. Did I totally botch this in the wrong direction,
> or was the timing bad with the linux con/plumbers that week?

I've just been busy with other things.  Reviewing this is on my
near-term TODO (for 3.19 merge consideration).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-10-28 18:13 ` [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
  2014-10-28 18:26   ` Mike Snitzer
@ 2014-10-28 18:48   ` Hannes Reinecke
  1 sibling, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2014-10-28 18:48 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel, Mike Snitzer

On 10/28/2014 07:13 PM, Keith Busch wrote:
> No comments this time. Did I totally botch this in the wrong direction,
> or was the timing bad with the linux con/plumbers that week?
> 
Basically the latter. We (ie hch and myself) discussed this briefly at
LPC, and thought it'd be the right step in the right direction.

'Real' multipath support w/ blk-mq would need some more thought, though.
What we've been discussing was to redo the entire I/O submission path:
a) pass down I/O _without_ cloning on the current path
b) clone the request _for retry_ only.
with b) covering retry on a different path and partial I/O submission
on the current path.
That would have the benefit doing away with the current request/bio
cloning. Plus we could even model things like 'preferred' path.

Now we only need someone to implement this ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-10-28 18:26   ` Mike Snitzer
@ 2014-12-05 23:25     ` Mike Snitzer
  2014-12-05 23:33       ` Keith Busch
  2014-12-06  5:53       ` Mike Snitzer
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Snitzer @ 2014-12-05 23:25 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Tue, Oct 28 2014 at  2:26pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Oct 28 2014 at  2:13pm -0400,
> Keith Busch <keith.busch@intel.com> wrote:
> 
> > No comments this time. Did I totally botch this in the wrong direction,
> > or was the timing bad with the linux con/plumbers that week?
> 
> I've just been busy with other things.  Reviewing this is on my
> near-term TODO (for 3.19 merge consideration).

Hi Keith,

I've found a few issues with your v2.  I've been working through the
code and fixing things as I see them.  Please see this branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19-blk-mq

My changes build on your v2 baseline.

How much testing have you done?  It could be that my changes screwed
something up but... I just tried simply creating a normal request-based
DM multipath device (using non blk-mq devices) and hit this crash:

[ 1179.088841] BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
[ 1179.097587] IP: [<ffffffff8142091a>] scsi_setup_cmnd+0xea/0x170
[ 1179.104202] PGD 0
[ 1179.106450] Oops: 0000 [#1] SMP
[ 1179.110062] Modules linked in: dm_service_time(O) scsi_debug dm_multipath(O) sg iTCO_wdt ixgbe dm_thin_pool igb dm_persistent_data mdio coretemp kvm_intel i7core_ed
ac dm_bio_prison kvm ptp crct10dif_pclmul pps_core edac_core dm_bufio crc32_pclmul dca crc32c_intel ipmi_si iTCO_vendor_support ses ghash_clmulni_intel ipmi_msghandler
 aesni_intel enclosure glue_helper lrw shpchp gf128mul ablk_helper lpc_ich mfd_core cryptd i2c_i801 pcspkr acpi_power_meter acpi_cpufreq xfs libcrc32c sr_mod sd_mod cd
rom mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic pata_acpi ttm ata_piix drm libata iomemory_vsl(POE) megaraid_sas i2ccore dm_mirro
r dm_region_hash dm_log dm_mod
[ 1179.178539] CPU: 5 PID: 4395 Comm: kdmwork-253:13 Tainted: P        W IOE  3.17.0-rc5.snitm+ #21
[ 1179.188338] Hardware name: FUJITSU                          PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.10.2619.N1           05/24/2011
[ 1179.202983] task: ffff8800b9bfa8b0 ti: ffff8802c7578000 task.ti: ffff8802c7578000
[ 1179.211329] RIP: 0010:[<ffffffff8142091a>]  [<ffffffff8142091a>] scsi_setup_cmnd+0xea/0x170
[ 1179.220653] RSP: 0018:ffff8802c757bcb0  EFLAGS: 00010046
[ 1179.226574] RAX: 0000000000000000 RBX: ffff88032b193140 RCX: 0000000000001000
[ 1179.234532] RDX: ffff8800afc23200 RSI: ffff88032b193140 RDI: ffff8800afc23200
[ 1179.242488] RBP: ffff8802c757bcc0 R08: ffff880329b3f840 R09: ffffffff81416a9b
[ 1179.250444] R10: ffffffff812cc9c3 R11: 0000000000000000 R12: ffff8800afc23200
[ 1179.258401] R13: ffff880329b3f800 R14: ffff880329b3f968 R15: ffff880328390000
[ 1179.266359] FS:  0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
[ 1179.275381] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1179.281787] CR2: 0000000000000378 CR3: 00000000b0d90000 CR4: 00000000000007e0
[ 1179.289746] Stack:
[ 1179.291984]  ffff88032b193140 ffff880328390000 ffff8802c757bcf8 ffffffff81420a7b
[ 1179.300266]  ffff8800b9bfa8b0 ffff880328390000 ffff880329addc00 0000000100000000
[ 1179.308546]  ffff88032b193140 ffff8802c757bd28 ffffffff812c8fa6 ffff880329b3f800
[ 1179.316827] Call Trace:
[ 1179.319553]  [<ffffffff81420a7b>] scsi_prep_fn+0xdb/0x180
[ 1179.325573]  [<ffffffff812c8fa6>] blk_peek_request+0x166/0x290
[ 1179.332077]  [<ffffffff81422540>] scsi_request_fn+0x50/0x7b0
[ 1179.338388]  [<ffffffff812c4863>] __blk_run_queue+0x33/0x40
[ 1179.344602]  [<ffffffff812c22fb>] __elv_add_request+0xdb/0x2e0
[ 1179.351105]  [<ffffffff812c8bd1>] blk_insert_cloned_request+0xa1/0x120
[ 1179.358390]  [<ffffffffa00007ab>] dm_dispatch_request+0x3b/0x60 [dm_mod]
[ 1179.365865]  [<ffffffffa0001154>] map_tio_request+0x174/0x250 [dm_mod]
[ 1179.373146]  [<ffffffff81091023>] kthread_worker_fn+0x83/0x1d0
[ 1179.379650]  [<ffffffff81090fa0>] ? kthread_stop+0xe0/0xe0
[ 1179.385767]  [<ffffffff81090c58>] kthread+0xd8/0xf0
[ 1179.391204]  [<ffffffff81090b80>] ? kthread_create_on_node+0x1a0/0x1a0
[ 1179.398486]  [<ffffffff8163abfc>] ret_from_fork+0x7c/0xb0
[ 1179.404505]  [<ffffffff81090b80>] ? kthread_create_on_node+0x1a0/0x1a0
[ 1179.411783] Code: 00 00 49 8b 84 24 d8 00 00 00 4c 89 e7 48 c7 00 00 00 00 00 48 c7 40 08 00 00 00 00 49 8b 84 24 08 01 00 00 48 8b 80 c0 00 00 00 <48> 8b 80 78 03
00 00 48 8b 00 ff 90 88 00 00 00 5b 41 5c 5d c3
[ 1179.433377] RIP  [<ffffffff8142091a>] scsi_setup_cmnd+0xea/0x170
[ 1179.440086]  RSP <ffff8802c757bcb0>
[ 1179.443974] CR2: 0000000000000378
[ 1199.444124] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10
[ 1200.503096] Shutting down cpus with NMI
[ 1200.507978] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1200.519326] drm_kms_helper: panic occurred, switching back to text console
[ 1208.015881] ---[ end Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-12-05 23:25     ` Mike Snitzer
@ 2014-12-05 23:33       ` Keith Busch
  2014-12-06  0:35         ` Mike Snitzer
  2014-12-06  5:53       ` Mike Snitzer
  1 sibling, 1 reply; 14+ messages in thread
From: Keith Busch @ 2014-12-05 23:33 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Keith Busch, Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Fri, 5 Dec 2014, Mike Snitzer wrote:
> Hi Keith,
>
> I've found a few issues with your v2.  I've been working through the
> code and fixing things as I see them.  Please see this branch:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19-blk-mq
>
> My changes build on your v2 baseline.
>
> How much testing have you done?  It could be that my changes screwed
> something up but... I just tried simply creating a normal request-based
> DM multipath device (using non blk-mq devices) and hit this crash:

Thanks, Mike. I'll take a look at your fixes next week.

I only tested on nvme blk-mq. I don't have any SCSI devices to try,
but I _really_ need to get one if I'm serious about making this work,
so I'll find something next week as well.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-12-05 23:33       ` Keith Busch
@ 2014-12-06  0:35         ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2014-12-06  0:35 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Fri, Dec 05 2014 at  6:33pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Fri, 5 Dec 2014, Mike Snitzer wrote:
> >Hi Keith,
> >
> >I've found a few issues with your v2.  I've been working through the
> >code and fixing things as I see them.  Please see this branch:
> >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19-blk-mq
> >
> >My changes build on your v2 baseline.
> >
> >How much testing have you done?  It could be that my changes screwed
> >something up but... I just tried simply creating a normal request-based
> >DM multipath device (using non blk-mq devices) and hit this crash:
> 
> Thanks, Mike. I'll take a look at your fixes next week.
> 
> I only tested on nvme blk-mq. I don't have any SCSI devices to try,
> but I _really_ need to get one if I'm serious about making this work,
> so I'll find something next week as well.

I'll continue review to see if I can sort out why we're getting this crash.

FYI, you don't _need_ real SCSI devices to test dm-multipath.  Easy
enough to just use scsi_debug so long as you have free memory:

 modprobe scsi_debug dev_size_mb=100 num_tgts=1 vpd_use_hostno=0 add_host=4

add_host makes multiple paths
vpd_use_hostno=0 makes sure that all paths will have the same wwid

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-12-05 23:25     ` Mike Snitzer
  2014-12-05 23:33       ` Keith Busch
@ 2014-12-06  5:53       ` Mike Snitzer
  2014-12-06  6:09         ` Mike Snitzer
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2014-12-06  5:53 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Fri, Dec 05 2014 at  6:25pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Oct 28 2014 at  2:26pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Tue, Oct 28 2014 at  2:13pm -0400,
> > Keith Busch <keith.busch@intel.com> wrote:
> > 
> > > No comments this time. Did I totally botch this in the wrong direction,
> > > or was the timing bad with the linux con/plumbers that week?
> > 
> > I've just been busy with other things.  Reviewing this is on my
> > near-term TODO (for 3.19 merge consideration).
> 
> Hi Keith,
> 
> I've found a few issues with your v2.  I've been working through the
> code and fixing things as I see them.  Please see this branch:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19-blk-mq
> 
> My changes build on your v2 baseline.
> 
> How much testing have you done?  It could be that my changes screwed
> something up but... I just tried simply creating a normal request-based
> DM multipath device (using non blk-mq devices) and hit this crash:
> 
> [ 1179.088841] BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
> [ 1179.097587] IP: [<ffffffff8142091a>] scsi_setup_cmnd+0xea/0x170

map_tio_request -> map_request -> dm_dispatch_request -> blk_insert_cloned_request ->
scsi_setup_cmnd -> scsi_setup_fs_cmnd -> scsi_cmd_to_driver

static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
{
        return *(struct scsi_driver **)cmd->request->rq_disk->private_data;
}

We're crashing because rq_disk is NULL.
This fixed the crash for me:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19-blk-mq&id=0373233d9a9ca72a90394ed1770b21a8b7431abc

But in general this code needs _a lot_ more testing/review.
Sadly I cannot stage it for 3.19 inclusion.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-12-06  5:53       ` Mike Snitzer
@ 2014-12-06  6:09         ` Mike Snitzer
  2014-12-13  2:08           ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2014-12-06  6:09 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Sat, Dec 06 2014 at 12:53am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> But in general this code needs _a lot_ more testing/review.
> Sadly I cannot stage it for 3.19 inclusion.

FYI, even though the mpath device is created without crashing it hangs
when IO is issued to it (mkfs.xfs /dev/mapper/mpathi):

 kernel: mkfs.xfs        D ffff88033fc34740     0  2792   2457 0x00000080
 kernel: ffff880319057cb8 0000000000000082 ffff880327e84c80 0000000000014740
 kernel: ffff880319057fd8 0000000000014740 ffff88032e644c80 ffff880327e84c80
 kernel: ffff88033fc35058 ffff88033ff940e8 ffff880319057d48 0000000000000082
 kernel: Call Trace:
 kernel: [<ffffffff81633a50>] ? bit_wait+0x50/0x50
 kernel: [<ffffffff8163325d>] io_schedule+0x9d/0x120
 kernel: [<ffffffff81633a7c>] bit_wait_io+0x2c/0x50
 kernel: [<ffffffff816337eb>] __wait_on_bit_lock+0x4b/0xb0
 kernel: [<ffffffff81166da9>] __lock_page_killable+0xb9/0xe0
 kernel: [<ffffffff810b1c20>] ? autoremove_wake_function+0x40/0x40
 kernel: [<ffffffff81168e48>] generic_file_read_iter+0x3e8/0x610
 kernel: [<ffffffff81213bf7>] blkdev_read_iter+0x37/0x40
 kernel: [<ffffffff811dba1b>] new_sync_read+0x8b/0xd0
 kernel: [<ffffffff811dc188>] vfs_read+0x98/0x170
 kernel: [<ffffffff811dce65>] SyS_read+0x55/0xd0
 kernel: [<ffffffff81637769>] system_call_fastpath+0x16/0x1b

 kernel: kdmwork-253:15  D ffff88033fc54740     0  2745      2 0x00000080
 kernel: ffff88031910bce8 0000000000000046 ffff88032cedb960 0000000000014740
 kernel: ffff88031910bfd8 0000000000014740 ffff88032e645610 ffff88032cedb960
 kernel: ffff88033fc55058 ffff880329565420 ffff880327c4f870 ffff880327c4f800
 kernel: Call Trace:
 kernel: [<ffffffff8163325d>] io_schedule+0x9d/0x120
 kernel: [<ffffffff812c47e8>] get_request+0x218/0x770
 kernel: [<ffffffff810b1be0>] ? prepare_to_wait_event+0xf0/0xf0
 kernel: [<ffffffff812c4dc7>] blk_get_request+0x87/0xe0
 kernel: [<ffffffffa0195bb1>] multipath_map+0x121/0x1b0 [dm_multipath]
 kernel: [<ffffffffa0001032>] map_tio_request+0x52/0x250 [dm_mod]
 kernel: [<ffffffff81090be2>] kthread_worker_fn+0x82/0x1d0
 kernel: [<ffffffff81090b60>] ? kthread_stop+0xe0/0xe0
 kernel: [<ffffffff81090818>] kthread+0xd8/0xf0
 kernel: [<ffffffff81090740>] ? kthread_create_on_node+0x1a0/0x1a0
 kernel: [<ffffffff816376bc>] ret_from_fork+0x7c/0xb0
 kernel: [<ffffffff81090740>] ? kthread_create_on_node+0x1a0/0x1a0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 0/4] blk-mq support for dm multipath
  2014-12-06  6:09         ` Mike Snitzer
@ 2014-12-13  2:08           ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2014-12-13  2:08 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Jun'ichi Nomura, dm-devel

On Sat, Dec 06 2014 at  1:09am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Sat, Dec 06 2014 at 12:53am -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
>  
> > But in general this code needs _a lot_ more testing/review.
> > Sadly I cannot stage it for 3.19 inclusion.
> 
> FYI, even though the mpath device is created without crashing it hangs
> when IO is issued to it (mkfs.xfs /dev/mapper/mpathi):
> 
>  kernel: mkfs.xfs        D ffff88033fc34740     0  2792   2457 0x00000080
>  kernel: ffff880319057cb8 0000000000000082 ffff880327e84c80 0000000000014740
>  kernel: ffff880319057fd8 0000000000014740 ffff88032e644c80 ffff880327e84c80
>  kernel: ffff88033fc35058 ffff88033ff940e8 ffff880319057d48 0000000000000082
>  kernel: Call Trace:
>  kernel: [<ffffffff81633a50>] ? bit_wait+0x50/0x50
>  kernel: [<ffffffff8163325d>] io_schedule+0x9d/0x120
>  kernel: [<ffffffff81633a7c>] bit_wait_io+0x2c/0x50
>  kernel: [<ffffffff816337eb>] __wait_on_bit_lock+0x4b/0xb0
>  kernel: [<ffffffff81166da9>] __lock_page_killable+0xb9/0xe0
>  kernel: [<ffffffff810b1c20>] ? autoremove_wake_function+0x40/0x40
>  kernel: [<ffffffff81168e48>] generic_file_read_iter+0x3e8/0x610
>  kernel: [<ffffffff81213bf7>] blkdev_read_iter+0x37/0x40
>  kernel: [<ffffffff811dba1b>] new_sync_read+0x8b/0xd0
>  kernel: [<ffffffff811dc188>] vfs_read+0x98/0x170
>  kernel: [<ffffffff811dce65>] SyS_read+0x55/0xd0
>  kernel: [<ffffffff81637769>] system_call_fastpath+0x16/0x1b
> 
>  kernel: kdmwork-253:15  D ffff88033fc54740     0  2745      2 0x00000080
>  kernel: ffff88031910bce8 0000000000000046 ffff88032cedb960 0000000000014740
>  kernel: ffff88031910bfd8 0000000000014740 ffff88032e645610 ffff88032cedb960
>  kernel: ffff88033fc55058 ffff880329565420 ffff880327c4f870 ffff880327c4f800
>  kernel: Call Trace:
>  kernel: [<ffffffff8163325d>] io_schedule+0x9d/0x120
>  kernel: [<ffffffff812c47e8>] get_request+0x218/0x770
>  kernel: [<ffffffff810b1be0>] ? prepare_to_wait_event+0xf0/0xf0
>  kernel: [<ffffffff812c4dc7>] blk_get_request+0x87/0xe0
>  kernel: [<ffffffffa0195bb1>] multipath_map+0x121/0x1b0 [dm_multipath]
>  kernel: [<ffffffffa0001032>] map_tio_request+0x52/0x250 [dm_mod]
>  kernel: [<ffffffff81090be2>] kthread_worker_fn+0x82/0x1d0
>  kernel: [<ffffffff81090b60>] ? kthread_stop+0xe0/0xe0
>  kernel: [<ffffffff81090818>] kthread+0xd8/0xf0
>  kernel: [<ffffffff81090740>] ? kthread_create_on_node+0x1a0/0x1a0
>  kernel: [<ffffffff816376bc>] ret_from_fork+0x7c/0xb0
>  kernel: [<ffffffff81090740>] ? kthread_create_on_node+0x1a0/0x1a0

Pretty sure requests weren't completing due to deadlock because with non
blk-mq use-case special care _must_ be taken not to acquire the clone
request's queue lock in the completion path.  See the comment in
drivers/md/dm.c:end_clone_request()

I couldn't come up with a way of salvaging having the old request-based
path use blk_get_request() in .map_mq and blk_put_request() in .unmap_rq

Instead I've gone the different direction of training DM core to
conditionally use either:
1) old-style request cloning in terms of a DM mempool
or
2) new-style blk_get_request+blk_put_request in dm-mpath target

2 is obviously used if the backing device is using blk-mq.

I've published a rebased patchset here (BUT I haven't tested the blk-mq
support yet.. I _think_ it should work but it could easily crash due to
some silly oversight.  I'll test using virtio-blk in a guest on Monday):

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19-blk-mq

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-12-13  2:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-17 23:46 [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
2014-10-17 23:46 ` [PATCHv2 1/4] dm: prep initialized requests Keith Busch
2014-10-17 23:46 ` [PATCHv2 2/4] dm: Submit stacked requests in irq enabled context Keith Busch
2014-10-17 23:46 ` [PATCHv2 3/4] dm: Move request allocation to dm_target type Keith Busch
2014-10-17 23:46 ` [PATCHv2 4/4] block: blk-mq support for cloned requests Keith Busch
2014-10-28 18:13 ` [PATCHv2 0/4] blk-mq support for dm multipath Keith Busch
2014-10-28 18:26   ` Mike Snitzer
2014-12-05 23:25     ` Mike Snitzer
2014-12-05 23:33       ` Keith Busch
2014-12-06  0:35         ` Mike Snitzer
2014-12-06  5:53       ` Mike Snitzer
2014-12-06  6:09         ` Mike Snitzer
2014-12-13  2:08           ` Mike Snitzer
2014-10-28 18:48   ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.