* [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
@ 2016-11-15 23:31 Bart Van Assche
  2016-11-15 23:32 ` [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed() Bart Van Assche
                   ` (8 more replies)
  0 siblings, 9 replies; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:31 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hello Mike,

The seven patches in this series are what I came up with while reviewing 
and testing the dm-mpath single queue and multiqueue code. It would be 
appreciated if these patches could be considered for inclusion in the 
upstream kernel.

Thanks,

Bart.


* [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed()
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
@ 2016-11-15 23:32 ` Bart Van Assche
  2016-11-16  0:46   ` Mike Snitzer
  2016-11-15 23:33 ` [PATCH 2/7] dm: Simplify dm_table_determine_type() Bart Van Assche
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:32 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

The queue lock must be held when calling blk_run_queue_async() to
avoid triggering a race between blk_run_queue_async() and
blk_cleanup_queue().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-rq.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index f9f37ad..7df7948 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -210,6 +210,9 @@ static void rq_end_stats(struct mapped_device *md, struct request *orig)
  */
 static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 {
+	struct request_queue *q = md->queue;
+	unsigned long flags;
+
 	atomic_dec(&md->pending[rw]);
 
 	/* nudge anyone waiting on suspend queue */
@@ -222,8 +225,11 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 	 * back into ->request_fn() could deadlock attempting to grab the
 	 * queue lock again.
 	 */
-	if (!md->queue->mq_ops && run_queue)
-		blk_run_queue_async(md->queue);
+	if (!q->mq_ops && run_queue) {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_run_queue_async(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
 
 	/*
 	 * dm_put() must be at the end of this function. See the comment above
-- 
2.10.1
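
For reference, the function involved here is quite small; a sketch of the
4.9-era blk_run_queue_async(), reconstructed from memory of
block/blk-core.c rather than quoted verbatim:

	void blk_run_queue_async(struct request_queue *q)
	{
		/* The dead flag is set by blk_cleanup_queue() while holding
		 * q->queue_lock, so testing it without that lock races with
		 * queue teardown -- the race this patch closes. */
		if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
			mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
	}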


* [PATCH 2/7] dm: Simplify dm_table_determine_type()
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
  2016-11-15 23:32 ` [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed() Bart Van Assche
@ 2016-11-15 23:33 ` Bart Van Assche
  2016-11-16 14:54   ` Mike Snitzer
  2016-11-15 23:33 ` [PATCH 3/7] dm-mpath: Document a locking assumption Bart Van Assche
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Use a single loop instead of two loops to determine whether or not
all_blk_mq has to be set.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-table.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 49893fdc..fff4979 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -871,7 +871,7 @@ static int dm_table_determine_type(struct dm_table *t)
 {
 	unsigned i;
 	unsigned bio_based = 0, request_based = 0, hybrid = 0;
-	bool verify_blk_mq = false;
+	unsigned sq_count = 0, mq_count = 0;
 	struct dm_target *tgt;
 	struct dm_dev_internal *dd;
 	struct list_head *devices = dm_table_get_devices(t);
@@ -959,20 +959,15 @@ static int dm_table_determine_type(struct dm_table *t)
 		}
 
 		if (q->mq_ops)
-			verify_blk_mq = true;
+			mq_count++;
+		else
+			sq_count++;
 	}
-
-	if (verify_blk_mq) {
-		/* verify _all_ devices in the table are blk-mq devices */
-		list_for_each_entry(dd, devices, list)
-			if (!bdev_get_queue(dd->dm_dev->bdev)->mq_ops) {
-				DMERR("table load rejected: not all devices"
-				      " are blk-mq request-stackable");
-				return -EINVAL;
-			}
-
-		t->all_blk_mq = true;
+	if (sq_count && mq_count) {
+		DMERR("table load rejected: not all devices are blk-mq request-stackable");
+		return -EINVAL;
 	}
+	t->all_blk_mq = mq_count > 0;
 
 	return 0;
 }
@@ -1021,6 +1016,10 @@ bool dm_table_request_based(struct dm_table *t)
 	return __table_type_request_based(dm_table_get_type(t));
 }
 
+/*
+ * Returns true if all paths are blk-mq devices. Returns false if all paths
+ * are single queue block devices or if there are no paths.
+ */
 bool dm_table_all_blk_mq_devices(struct dm_table *t)
 {
 	return t->all_blk_mq;
-- 
2.10.1
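
A minimal userspace model of the new single-pass check (illustrative only,
not kernel code) makes an edge case that comes up later in this thread
easy to see:

	#include <stdbool.h>
	#include <errno.h>

	/* Mirrors the logic above: a mixed table is an error; otherwise
	 * all_blk_mq is true iff at least one blk-mq device was seen. */
	static bool table_all_blk_mq(unsigned sq_count, unsigned mq_count,
				     int *err)
	{
		*err = (sq_count && mq_count) ? -EINVAL : 0;
		return *err == 0 && mq_count > 0;
	}

With sq_count == 0 and mq_count == 0 (a table with no block devices at
all) this returns false without an error, which is the behavior Bart asks
about further down in the thread.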


* [PATCH 3/7] dm-mpath: Document a locking assumption
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
  2016-11-15 23:32 ` [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed() Bart Van Assche
  2016-11-15 23:33 ` [PATCH 2/7] dm: Simplify dm_table_determine_type() Bart Van Assche
@ 2016-11-15 23:33 ` Bart Van Assche
  2016-11-18  0:07   ` Mike Snitzer
  2016-11-15 23:34 ` [PATCH 4/7] dm-mpath: Change return type of pg_init_all_paths() from int into void Bart Van Assche
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Document that __pg_init_all_paths() must be called with
multipath.lock held.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-mpath.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index e477af8..f1e226d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -348,6 +348,8 @@ static int __pg_init_all_paths(struct multipath *m)
 	struct pgpath *pgpath;
 	unsigned long pg_init_delay = 0;
 
+	lockdep_assert_held(&m->lock);
+
 	if (atomic_read(&m->pg_init_in_progress) || test_bit(MPATHF_PG_INIT_DISABLED, &m->flags))
 		return 0;
 
-- 
2.10.1
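
The assertion is satisfied by the existing locked wrapper, shown here
lightly annotated (the same function that patch 4/7 below changes to
return void):

	static void pg_init_all_paths(struct multipath *m)
	{
		unsigned long flags;

		spin_lock_irqsave(&m->lock, flags);
		__pg_init_all_paths(m);		/* assertion holds here */
		spin_unlock_irqrestore(&m->lock, flags);
	}

Unlike the leading-underscore naming convention alone, the
lockdep_assert_held() call makes the locking requirement verifiable at
runtime on lockdep-enabled kernels.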


* [PATCH 4/7] dm-mpath: Change return type of pg_init_all_paths() from int into void
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
                   ` (2 preceding siblings ...)
  2016-11-15 23:33 ` [PATCH 3/7] dm-mpath: Document a locking assumption Bart Van Assche
@ 2016-11-15 23:34 ` Bart Van Assche
  2016-11-15 23:34 ` [PATCH 5/7] dm-mpath: Do not touch *__clone if request allocation fails Bart Van Assche
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:34 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

None of the callers of pg_init_all_paths() check its return value.
Hence change the return type of pg_init_all_paths() from int into
void.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-mpath.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index f1e226d..1c97f0e 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -374,16 +374,13 @@ static int __pg_init_all_paths(struct multipath *m)
 	return atomic_read(&m->pg_init_in_progress);
 }
 
-static int pg_init_all_paths(struct multipath *m)
+static void pg_init_all_paths(struct multipath *m)
 {
-	int r;
 	unsigned long flags;
 
 	spin_lock_irqsave(&m->lock, flags);
-	r = __pg_init_all_paths(m);
+	__pg_init_all_paths(m);
 	spin_unlock_irqrestore(&m->lock, flags);
-
-	return r;
 }
 
 static void __switch_pg(struct multipath *m, struct priority_group *pg)
-- 
2.10.1


* [PATCH 5/7] dm-mpath: Do not touch *__clone if request allocation fails
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
                   ` (3 preceding siblings ...)
  2016-11-15 23:34 ` [PATCH 4/7] dm-mpath: Change return type of pg_init_all_paths() from int into void Bart Van Assche
@ 2016-11-15 23:34 ` Bart Van Assche
  2016-11-15 23:34 ` [PATCH 6/7] dm-mpath: Avoid code duplication in __multipath_map() Bart Van Assche
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:34 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Do not modify *__clone if blk_mq_alloc_request() fails. This makes
it easier to figure out what is going on if the caller accidentally
dereferences *__clone after blk_mq_alloc_request() has failed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-mpath.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 1c97f0e..5c73818 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -582,16 +582,17 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		 * .request_fn stacked on blk-mq path(s) and
 		 * blk-mq stacked on blk-mq path(s).
 		 */
-		*__clone = blk_mq_alloc_request(bdev_get_queue(bdev),
-						rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
-		if (IS_ERR(*__clone)) {
-			/* ENOMEM, requeue */
+		clone = blk_mq_alloc_request(bdev_get_queue(bdev),
+					rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
+		if (IS_ERR(clone)) {
+			/* EBUSY, ENODEV or EWOULDBLOCK; requeue */
 			clear_request_fn_mpio(m, map_context);
 			return r;
 		}
-		(*__clone)->bio = (*__clone)->biotail = NULL;
-		(*__clone)->rq_disk = bdev->bd_disk;
-		(*__clone)->cmd_flags |= REQ_FAILFAST_TRANSPORT;
+		*__clone = clone;
+		clone->bio = clone->biotail = NULL;
+		clone->rq_disk = bdev->bd_disk;
+		clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
 	}
 
 	if (pgpath->pg->ps.type->start_io)
-- 
2.10.1
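
The point of the change, condensed (a sketch using the same calls as the
diff above, where q stands for bdev_get_queue(bdev)):

	struct request *clone = blk_mq_alloc_request(q, rq_data_dir(rq),
						     BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(clone))
		return r;		/* *__clone left untouched */
	*__clone = clone;		/* stored only on success */

Previously *__clone held an ERR_PTR() value after a failed allocation, so
a caller that ignored the return code and dereferenced it would oops on a
small bogus address rather than on the more recognizable NULL pointer.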


* [PATCH 6/7] dm-mpath: Avoid code duplication in __multipath_map()
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
                   ` (4 preceding siblings ...)
  2016-11-15 23:34 ` [PATCH 5/7] dm-mpath: Do not touch *__clone if request allocation fails Bart Van Assche
@ 2016-11-15 23:34 ` Bart Van Assche
  2016-11-16  0:39   ` Mike Snitzer
  2016-11-15 23:35 ` [PATCH 7/7] dm-mpath: Fix a race condition " Bart Van Assche
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:34 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Move common code out of if (clone) { ... } else { ... }.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-mpath.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 5c73818..7559537 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -574,8 +574,6 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		 * Used by: .request_fn stacked on .request_fn path(s).
 		 */
 		clone->q = bdev_get_queue(bdev);
-		clone->rq_disk = bdev->bd_disk;
-		clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
 	} else {
 		/*
 		 * blk-mq request-based interface; used by both:
@@ -591,9 +589,9 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		}
 		*__clone = clone;
 		clone->bio = clone->biotail = NULL;
-		clone->rq_disk = bdev->bd_disk;
-		clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
 	}
+	clone->rq_disk = bdev->bd_disk;
+	clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
 
 	if (pgpath->pg->ps.type->start_io)
 		pgpath->pg->ps.type->start_io(&pgpath->pg->ps,
-- 
2.10.1


* [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
                   ` (5 preceding siblings ...)
  2016-11-15 23:34 ` [PATCH 6/7] dm-mpath: Avoid code duplication in __multipath_map() Bart Van Assche
@ 2016-11-15 23:35 ` Bart Van Assche
  2016-11-16  0:37   ` Mike Snitzer
  2016-11-16  0:47 ` [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Mike Snitzer
  2016-11-16  7:39 ` Hannes Reinecke
  8 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-15 23:35 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

If a single-queue dm device is stacked on top of multi-queue block
devices and map_tio_request() is called while there are no paths, then
the request will be prepared for a single-queue path. If a path is
added after a request was prepared and before __multipath_map() is
called, return DM_MAPIO_REQUEUE so that it gets unprepared and
reprepared as a blk-mq request.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-mpath.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7559537..6b20349 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -541,6 +541,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 	size_t nr_bytes = clone ? blk_rq_bytes(clone) : blk_rq_bytes(rq);
 	struct pgpath *pgpath;
 	struct block_device *bdev;
+	struct request_queue *q;
 	struct dm_mpath_io *mpio;
 
 	/* Do we need to select a new pgpath? */
@@ -558,6 +559,18 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		return r;
 	}
 
+	bdev = pgpath->path.dev->bdev;
+	q = bdev_get_queue(bdev);
+
+	/*
+	 * When a request is prepared if there are no paths it may happen that
+	 * the request was prepared for a single-queue path and that a
+	 * multiqueue path is added before __multipath_map() is called. If
+	 * that happens requeue to trigger unprepare and reprepare.
+	 */
+	if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
+		return r;
+
 	mpio = set_mpio(m, map_context);
 	if (!mpio)
 		/* ENOMEM, requeue */
@@ -566,22 +579,20 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 	mpio->pgpath = pgpath;
 	mpio->nr_bytes = nr_bytes;
 
-	bdev = pgpath->path.dev->bdev;
-
 	if (clone) {
 		/*
 		 * Old request-based interface: allocated clone is passed in.
 		 * Used by: .request_fn stacked on .request_fn path(s).
 		 */
-		clone->q = bdev_get_queue(bdev);
+		clone->q = q;
 	} else {
 		/*
 		 * blk-mq request-based interface; used by both:
 		 * .request_fn stacked on blk-mq path(s) and
 		 * blk-mq stacked on blk-mq path(s).
 		 */
-		clone = blk_mq_alloc_request(bdev_get_queue(bdev),
-					rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
+		clone = blk_mq_alloc_request(q, rq_data_dir(rq),
+					     BLK_MQ_REQ_NOWAIT);
 		if (IS_ERR(clone)) {
 			/* EBUSY, ENODEV or EWOULDBLOCK; requeue */
 			clear_request_fn_mpio(m, map_context);
-- 
2.10.1
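
The four combinations the new check distinguishes, spelled out (the if
statement is from the patch above; the comment is annotation, a sketch of
the reasoning rather than code added by the patch):

	/*
	 * clone != NULL, !q->mq_ops : prepared for .request_fn,
	 *                             path is .request_fn     -> proceed
	 * clone == NULL,  q->mq_ops : prepared for blk-mq,
	 *                             path is blk-mq          -> proceed
	 * clone != NULL,  q->mq_ops : prepared for .request_fn,
	 *                             path is now blk-mq      -> requeue
	 * clone == NULL, !q->mq_ops : prepared for blk-mq,
	 *                             path is now .request_fn -> requeue
	 */
	if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
		return r;	/* r is still DM_MAPIO_REQUEUE here */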


* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-15 23:35 ` [PATCH 7/7] dm-mpath: Fix a race condition " Bart Van Assche
@ 2016-11-16  0:37   ` Mike Snitzer
  2016-11-16  0:40     ` Bart Van Assche
  2016-11-21 21:44     ` Bart Van Assche
  0 siblings, 2 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  0:37 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  6:35pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> If a single-queue dm device is stacked on top of multi-queue block
> devices and map_tio_request() is called while there are no paths, then
> the request will be prepared for a single-queue path. If a path is
> added after a request was prepared and before __multipath_map() is
> called, return DM_MAPIO_REQUEUE so that it gets unprepared and
> reprepared as a blk-mq request.

This patch makes little sense to me.  There isn't a scenario that I'm
aware of that would allow the request_queue to transition between old
.request_fn and new blk-mq.

The dm-table code should prevent this.

Mike

> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 7559537..6b20349 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -541,6 +541,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  	size_t nr_bytes = clone ? blk_rq_bytes(clone) : blk_rq_bytes(rq);
>  	struct pgpath *pgpath;
>  	struct block_device *bdev;
> +	struct request_queue *q;
>  	struct dm_mpath_io *mpio;
>  
>  	/* Do we need to select a new pgpath? */
> @@ -558,6 +559,18 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  		return r;
>  	}
>  
> +	bdev = pgpath->path.dev->bdev;
> +	q = bdev_get_queue(bdev);
> +
> +	/*
> +	 * When a request is prepared if there are no paths it may happen that
> +	 * the request was prepared for a single-queue path and that a
> +	 * multiqueue path is added before __multipath_map() is called. If
> +	 * that happens requeue to trigger unprepare and reprepare.
> +	 */
> +	if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
> +		return r;
> +
>  	mpio = set_mpio(m, map_context);
>  	if (!mpio)
>  		/* ENOMEM, requeue */
> @@ -566,22 +579,20 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  	mpio->pgpath = pgpath;
>  	mpio->nr_bytes = nr_bytes;
>  
> -	bdev = pgpath->path.dev->bdev;
> -
>  	if (clone) {
>  		/*
>  		 * Old request-based interface: allocated clone is passed in.
>  		 * Used by: .request_fn stacked on .request_fn path(s).
>  		 */
> -		clone->q = bdev_get_queue(bdev);
> +		clone->q = q;
>  	} else {
>  		/*
>  		 * blk-mq request-based interface; used by both:
>  		 * .request_fn stacked on blk-mq path(s) and
>  		 * blk-mq stacked on blk-mq path(s).
>  		 */
> -		clone = blk_mq_alloc_request(bdev_get_queue(bdev),
> -					rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
> +		clone = blk_mq_alloc_request(q, rq_data_dir(rq),
> +					     BLK_MQ_REQ_NOWAIT);
>  		if (IS_ERR(clone)) {
>  			/* EBUSY, ENODEV or EWOULDBLOCK; requeue */
>  			clear_request_fn_mpio(m, map_context);
> -- 
> 2.10.1
> 


* Re: [PATCH 6/7] dm-mpath: Avoid code duplication in __multipath_map()
  2016-11-15 23:34 ` [PATCH 6/7] dm-mpath: Avoid code duplication in __multipath_map() Bart Van Assche
@ 2016-11-16  0:39   ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  0:39 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  6:34pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Move common code out of if (clone) { ... } else { ... }.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  drivers/md/dm-mpath.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 5c73818..7559537 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -574,8 +574,6 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  		 * Used by: .request_fn stacked on .request_fn path(s).
>  		 */
>  		clone->q = bdev_get_queue(bdev);
> -		clone->rq_disk = bdev->bd_disk;
> -		clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
>  	} else {
>  		/*
>  		 * blk-mq request-based interface; used by both:
> @@ -591,9 +589,9 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  		}
>  		*__clone = clone;
>  		clone->bio = clone->biotail = NULL;
> -		clone->rq_disk = bdev->bd_disk;
> -		clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
>  	}
> +	clone->rq_disk = bdev->bd_disk;
> +	clone->cmd_flags |= REQ_FAILFAST_TRANSPORT;
>  
>  	if (pgpath->pg->ps.type->start_io)
>  		pgpath->pg->ps.type->start_io(&pgpath->pg->ps,
> -- 
> 2.10.1

I consciously left this duplication to avoid sprawling setup of the clone
request.  There isn't a lot of duplication here...


* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-16  0:37   ` Mike Snitzer
@ 2016-11-16  0:40     ` Bart Van Assche
  2016-11-16  1:01       ` Mike Snitzer
  2016-11-21 21:44     ` Bart Van Assche
  1 sibling, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16  0:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  6:35pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> If a single-queue dm device is stacked on top of multi-queue block
>> devices and map_tio_request() is called while there are no paths, then
>> the request will be prepared for a single-queue path. If a path is
>> added after a request was prepared and before __multipath_map() is
>> called, return DM_MAPIO_REQUEUE so that it gets unprepared and
>> reprepared as a blk-mq request.
>
> This patch makes little sense to me.  There isn't a scenario that I'm
> aware of that would allow the request_queue to transition between old
> .request_fn and new blk-mq.
>
> The dm-table code should prevent this.

Hello Mike,

Are you aware that dm_table_determine_type() sets "all_blk_mq" to false 
if there are no paths, even if the dm device is in blk-mq mode?

Bart.


* Re: [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed()
  2016-11-15 23:32 ` [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed() Bart Van Assche
@ 2016-11-16  0:46   ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  0:46 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  6:32pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> The queue lock must be held when calling blk_run_queue_async() to
> avoid triggering a race between blk_run_queue_async() and
> blk_cleanup_queue().
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>

I picked this patch up earlier today, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.10&id=d15bb3a6467e102e60d954aadda5fb19ce6fd8ec

But given your "(theoretical?)", I'd really have expected you to have
realized an actual need for this change...

Mike

> ---
>  drivers/md/dm-rq.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index f9f37ad..7df7948 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -210,6 +210,9 @@ static void rq_end_stats(struct mapped_device *md, struct request *orig)
>   */
>  static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
>  {
> +	struct request_queue *q = md->queue;
> +	unsigned long flags;
> +
>  	atomic_dec(&md->pending[rw]);
>  
>  	/* nudge anyone waiting on suspend queue */
> @@ -222,8 +225,11 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
>  	 * back into ->request_fn() could deadlock attempting to grab the
>  	 * queue lock again.
>  	 */
> -	if (!md->queue->mq_ops && run_queue)
> -		blk_run_queue_async(md->queue);
> +	if (!q->mq_ops && run_queue) {
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		blk_run_queue_async(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
>  
>  	/*
>  	 * dm_put() must be at the end of this function. See the comment above
> -- 
> 2.10.1
> 


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
                   ` (6 preceding siblings ...)
  2016-11-15 23:35 ` [PATCH 7/7] dm-mpath: Fix a race condition " Bart Van Assche
@ 2016-11-16  0:47 ` Mike Snitzer
  2016-11-16  0:57   ` Bart Van Assche
  2016-11-16  7:39 ` Hannes Reinecke
  8 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  0:47 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  6:31pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Hello Mike,
> 
> The seven patches in this series are what I came up with while
> reviewing and testing the dm-mpath single queue and multiqueue code.
> It would be appreciated if these patches could be considered for
> inclusion in the upstream kernel.

This series seems like it is not a product of need, but rather of
changes that fell out from code review.

If not, what test was failing that now passes with this patchset?


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16  0:47 ` [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Mike Snitzer
@ 2016-11-16  0:57   ` Bart Van Assche
  2016-11-16  1:08     ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16  0:57 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/15/2016 04:47 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  6:31pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> The seven patches in this series are what I came up with while
>> reviewing and testing the dm-mpath single queue and multiqueue code.
>> It would be appreciated if these patches could be considered for
>> inclusion in the upstream kernel.
>
> This series seems like it is not a product of need, but rather of
> changes that fell out from code review.
>
> If not, what test was failing that now passes with this patchset?

Hello Mike,

Without this patch series I see sporadic I/O errors when running I/O on 
top of dm-mq-on-mq. With this patch series my dm-mq-on-mq tests pass. 
However, I still see sporadic I/O errors being reported when I run I/O 
on top of dm-sq-on-mq and very sporadic I/O errors with my dm-sq-on-sq 
tests. It is not yet clear to me what is causing these I/O errors but 
it's probably something in either the dm core or the dm-mpath driver. My 
test scripts are available at https://github.com/bvanassche/srp-test in 
case you would like to have a look.

Bart.


* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-16  0:40     ` Bart Van Assche
@ 2016-11-16  1:01       ` Mike Snitzer
  2016-11-16  1:08         ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  1:01 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  7:40pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> >On Tue, Nov 15 2016 at  6:35pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>If a single-queue dm device is stacked on top of multi-queue block
> >>devices and map_tio_request() is called while there are no paths, then
> >>the request will be prepared for a single-queue path. If a path is
> >>added after a request was prepared and before __multipath_map() is
> >>called, return DM_MAPIO_REQUEUE so that it gets unprepared and
> >>reprepared as a blk-mq request.
> >
> >This patch makes little sense to me.  There isn't a scenario that I'm
> >aware of that would allow the request_queue to transition between old
> >.request_fn and new blk-mq.
> >
> >The dm-table code should prevent this.
> 
> Hello Mike,
> 
> Are you aware that dm_table_determine_type() sets "all_blk_mq" to
> false if there are no paths, even if the dm device is in blk-mq
> mode?

That shouldn't matter.  Once the type is established, it is used to
initialize the DM device's request_queue; the type cannot change across
different table loads.


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16  0:57   ` Bart Van Assche
@ 2016-11-16  1:08     ` Mike Snitzer
  2016-11-16  1:10       ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  1:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  7:57pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 04:47 PM, Mike Snitzer wrote:
> >On Tue, Nov 15 2016 at  6:31pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>The seven patches in this series are what I came up with while
> >>reviewing and testing the dm-mpath single queue and multiqueue code.
> >>It would be appreciated if these patches could be considered for
> >>inclusion in the upstream kernel.
> >
> >This series seems like it is not a product of need, but rather of
> >changes that fell out from code review.
> >
> >If not, what test was failing that now passes with this patchset?
> 
> Hello Mike,
> 
> Without this patch series I see sporadic I/O errors when running I/O
> on top of dm-mq-on-mq. With this patch series my dm-mq-on-mq tests
> pass. However, I still see sporadic I/O errors being reported when I
> run I/O on top of dm-sq-on-mq and very sporadic I/O errors with my
> dm-sq-on-sq tests. It is not yet clear to me what is causing these
> I/O errors but it's probably something in either the dm core or the
> dm-mpath driver. My test scripts are available at
> https://github.com/bvanassche/srp-test in case you would like to
> have a look.

I'm getting very tired of this.  Last I knew, those tests passed.  Do you
keep changing the tests or something?

There is no change in this entire series that seems needed.  The exception
possibly being patch 1/7 -- given you put so much pressure on DM
device teardown vs concurrent IO.

Please drop all but patch 1/7 and see if your tests pass.


* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-16  1:01       ` Mike Snitzer
@ 2016-11-16  1:08         ` Bart Van Assche
  2016-11-16  1:50           ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16  1:08 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/15/2016 05:01 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  7:40pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> Are you aware that dm_table_determine_type() sets "all_blk_mq" to
>> false if there are no paths, even if the dm device is in blk-mq
>> mode?
>
> That shouldn't matter.  Once the type is established, it is used to
> >initialize the DM device's request_queue; the type cannot change across
> different table loads.

For a single queue dm device, what prevents a user from removing all 
single queue paths and adding one or more blk-mq paths? I think this 
will cause dm_table_determine_type() to change the table type.

Bart.


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16  1:08     ` Mike Snitzer
@ 2016-11-16  1:10       ` Bart Van Assche
  2016-11-16  1:53         ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16  1:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/15/2016 05:08 PM, Mike Snitzer wrote:
> I'm getting very tired of this.  Last I knew, those tests passed.  Do you
> keep changing the tests or something?

If the dm code worked reliably I wouldn't have to post any dm 
patches. The only factor that I changed is the number of test iterations 
I ran.

Bart.


* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-16  1:08         ` Bart Van Assche
@ 2016-11-16  1:50           ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  1:50 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  8:08pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 05:01 PM, Mike Snitzer wrote:
> >On Tue, Nov 15 2016 at  7:40pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>Are you aware that dm_table_determine_type() sets "all_blk_mq" to
> >>false if there are no paths, even if the dm device is in blk-mq
> >>mode?
> >
> >That shouldn't matter.  Once the type is established, it is used to
> >initialize the DM device's request_queue; the type cannot change across
> >different table loads.
> 
> For a single queue dm device, what prevents a user from removing all
> single queue paths and adding one or more blk-mq paths? I think this
> will cause dm_table_determine_type() to change the table type.

A new table is created for every table load (any time a multipath device
is loaded into the kernel).  The DM core disallows loading a table with a
different type; such a load will be rejected with an error.

See dm-ioctl.c:table_load()
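
The check being referred to looks roughly like this (reconstructed from
memory of the 4.9-era drivers/md/dm-ioctl.c:table_load(), so treat the
details as approximate):

	if (dm_get_md_type(md) == DM_TYPE_NONE) {
		/* initial table load: set the md's type and queue */
		r = dm_setup_md_queue(md, t);
		if (r)
			goto err_unlock_md_type;
	} else if (dm_get_md_type(md) != dm_table_get_type(t)) {
		DMWARN("can't change device type after initial table load.");
		r = -EINVAL;
		goto err_unlock_md_type;
	}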


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16  1:10       ` Bart Van Assche
@ 2016-11-16  1:53         ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16  1:53 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  8:10pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 05:08 PM, Mike Snitzer wrote:
> >I'm getting very tired of this.  Last I knew, those tests passed.  Do you
> >keep changing the tests or something?
> 
> If the dm code worked reliably I wouldn't have to post any dm
> patches. The only factor that I changed is the number of test
> iterations I ran.

Now you're just baiting me.  In general your DM patches are appreciated
but you also don't understand the code well enough to justify the
changes you're proposing.  Aside from patch 1/7 I think this series is a
misfire -- most of it is churn and unnecessary.


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
                   ` (7 preceding siblings ...)
  2016-11-16  0:47 ` [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Mike Snitzer
@ 2016-11-16  7:39 ` Hannes Reinecke
  2016-11-16 14:56   ` Mike Snitzer
  8 siblings, 1 reply; 39+ messages in thread
From: Hannes Reinecke @ 2016-11-16  7:39 UTC (permalink / raw)
  To: Bart Van Assche, Mike Snitzer; +Cc: device-mapper development

On 11/16/2016 12:31 AM, Bart Van Assche wrote:
> Hello Mike,
> 
> The seven patches in this series are what I came up with while reviewing
> and testing the dm-mpath single queue and multiqueue code. It would be
> appreciated if these patches could be considered for inclusion in the
> upstream kernel.
> 
For the whole series:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)



* Re: [PATCH 2/7] dm: Simplify dm_table_determine_type()
  2016-11-15 23:33 ` [PATCH 2/7] dm: Simplify dm_table_determine_type() Bart Van Assche
@ 2016-11-16 14:54   ` Mike Snitzer
  2016-11-16 20:14     ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16 14:54 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  6:33pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Use a single loop instead of two loops to determine whether or not
> all_blk_mq has to be set.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  drivers/md/dm-table.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 49893fdc..fff4979 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -871,7 +871,7 @@ static int dm_table_determine_type(struct dm_table *t)
>  {
>  	unsigned i;
>  	unsigned bio_based = 0, request_based = 0, hybrid = 0;
> -	bool verify_blk_mq = false;
> +	unsigned sq_count = 0, mq_count = 0;
>  	struct dm_target *tgt;
>  	struct dm_dev_internal *dd;
>  	struct list_head *devices = dm_table_get_devices(t);
> @@ -959,20 +959,15 @@ static int dm_table_determine_type(struct dm_table *t)
>  		}
>  
>  		if (q->mq_ops)
> -			verify_blk_mq = true;
> +			mq_count++;
> +		else
> +			sq_count++;
>  	}
> -
> -	if (verify_blk_mq) {
> -		/* verify _all_ devices in the table are blk-mq devices */
> -		list_for_each_entry(dd, devices, list)
> -			if (!bdev_get_queue(dd->dm_dev->bdev)->mq_ops) {
> -				DMERR("table load rejected: not all devices"
> -				      " are blk-mq request-stackable");
> -				return -EINVAL;
> -			}
> -
> -		t->all_blk_mq = true;
> +	if (sq_count && mq_count) {
> +		DMERR("table load rejected: not all devices are blk-mq request-stackable");
> +		return -EINVAL;
>  	}
> +	t->all_blk_mq = mq_count > 0;
>  
>  	return 0;
>  }
> @@ -1021,6 +1016,10 @@ bool dm_table_request_based(struct dm_table *t)
>  	return __table_type_request_based(dm_table_get_type(t));
>  }
>  
> +/*
> + * Returns true if all paths are blk-mq devices. Returns false if all paths
> + * are single queue block devices or if there are no paths.
> + */

This code isn't specific to multipath, so "paths" is misplaced;
"devices" is more appropriate.  To be honest, I'm not seeing the need for
the comment.  The function name is pretty descriptive.

>  bool dm_table_all_blk_mq_devices(struct dm_table *t)
>  {
>  	return t->all_blk_mq;
> -- 
> 2.10.1
> 


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16  7:39 ` Hannes Reinecke
@ 2016-11-16 14:56   ` Mike Snitzer
  2016-11-16 18:22     ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16 14:56 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Bart Van Assche, device-mapper development

On Wed, Nov 16 2016 at  2:39am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 11/16/2016 12:31 AM, Bart Van Assche wrote:
> > Hello Mike,
> > 
> > The seven patches in this series are what I came up with while reviewing
> > and testing the dm-mpath single queue and multiqueue code. It would be
> > appreciated if these patches could be considered for inclusion in the
> > upstream kernel.
> > 
> For the whole series:
> 
> Reviewed-by: Hannes Reinecke <hare@suse.com>

I already took patch 1.

2 - 6 are fluff.  I'll take them, but they don't fix anything.

7 is not acceptable.  It complicates the code for no reason (the
scenario that it is meant to address isn't possible).


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16 14:56   ` Mike Snitzer
@ 2016-11-16 18:22     ` Bart Van Assche
  2016-11-16 19:32       ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16 18:22 UTC (permalink / raw)
  To: Mike Snitzer, Hannes Reinecke; +Cc: device-mapper development

On 11/16/2016 06:56 AM, Mike Snitzer wrote:
> 7 is not acceptable.  It complicates the code for no reason (the
> scenario that it is meant to address isn't possible).

Hello Mike,

With patch (a) applied I see warning (b) appear a few seconds after I 
start test 02-sq from the srp-test suite. I think that shows that 
something like patch (7) is really needed. Please let me know if you 
need more information about my test setup.

Bart.


(a)

@@ -568,8 +568,10 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
          * multiqueue path is added before __multipath_map() is called. If
          * that happens requeue to trigger unprepare and reprepare.
          */
-       if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
+       if ((clone && q->mq_ops) || (!clone && !q->mq_ops)) {
+               WARN_ON_ONCE(true);
                 return r;
+       }

         mpio = set_mpio(m, map_context);
         if (!mpio)

(b)

------------[ cut here ]------------
WARNING: CPU: 9 PID: 542 at drivers/md/dm-mpath.c:584 
__multipath_map.isra.15+0x1e2/0x390 [dm_multipath]
Modules linked in: ib_srp scsi_transport_srp ib_srpt(O) scst_vdisk(O) 
scst(O) dlm libcrc32c brd dm_service_time netconsole xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter 
ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables 
af_packet ib_ipoib rdma_ucm ib_ucm ib_uverbs msr ib_umad rdma_cm 
configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core 
x86_pkg_temp_thermal coretemp kvm_intel kvm ipmi_ssif ipmi_devintf 
mlx4_core irqbypass hid_generic crct10dif_pclmul crc32_pclmul usbhid 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw tg3 gf128mul glue_helper 
iTCO_wdt ptp iTCO_vendor_support pps_core dcdbas ablk_helper pcspkr 
ipmi_si libphy cryptd mei_me ipmi_msghandler fjes tpm_tis mei 
tpm_tis_core tpm lpc_ich shpchp mfd_core wmi button mgag200 i2c_algo_bit 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg 
dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [last 
unloaded: brd]
CPU: 9 PID: 542 Comm: kworker/9:1H Tainted: G           O    4.9.0-rc5-dbg+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kblockd blk_mq_requeue_work
  ffffc9000251fb08 ffffffff81329555 0000000000000000 0000000000000000
  ffffc9000251fb48 ffffffff81064a56 0000024800000000 ffff8803f2a3ba78
  ffff8803f2a32758 0000000000000000 ffff8804519de528 0000000000000000
Call Trace:
  [<ffffffff81329555>] dump_stack+0x68/0x93
  [<ffffffff81064a56>] __warn+0xc6/0xe0
  [<ffffffff81064b28>] warn_slowpath_null+0x18/0x20
  [<ffffffffa0046372>] __multipath_map.isra.15+0x1e2/0x390 [dm_multipath]
  [<ffffffffa0046535>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
  [<ffffffffa002a3ed>] map_request+0x14d/0x3a0 [dm_mod]
  [<ffffffffa002a6e7>] dm_mq_queue_rq+0x77/0x110 [dm_mod]
  [<ffffffff8131083f>] blk_mq_process_rq_list+0x23f/0x340
  [<ffffffff81310a62>] __blk_mq_run_hw_queue+0x122/0x1c0
  [<ffffffff81310a1e>] ? __blk_mq_run_hw_queue+0xde/0x1c0
  [<ffffffff813105df>] blk_mq_run_hw_queue+0x9f/0xc0
  [<ffffffff81310bae>] blk_mq_run_hw_queues+0x6e/0x90
  [<ffffffff81312b37>] blk_mq_requeue_work+0xf7/0x110
  [<ffffffff81082ab5>] process_one_work+0x1f5/0x690
  [<ffffffff81082a3a>] ? process_one_work+0x17a/0x690
  [<ffffffff81082f99>] worker_thread+0x49/0x490
  [<ffffffff81082f50>] ? process_one_work+0x690/0x690
  [<ffffffff81082f50>] ? process_one_work+0x690/0x690
  [<ffffffff8108983b>] kthread+0xeb/0x110
  [<ffffffff81089750>] ? kthread_park+0x60/0x60
  [<ffffffff8163ef87>] ret_from_fork+0x27/0x40
---[ end trace 81cfd74742407be1 ]---

Bart.


* Re: [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path
  2016-11-16 18:22     ` Bart Van Assche
@ 2016-11-16 19:32       ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16 19:32 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, Nov 16 2016 at  1:22pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/16/2016 06:56 AM, Mike Snitzer wrote:
> >7 is not acceptable.  It complicates the code for no reason (the
> >scenario that it is meant to address isn't possible).
> 
> Hello Mike,
> 
> With patch (a) applied I see warning (b) appear a few seconds after
> I start test 02-sq from the srp-test suite. I think that shows that
> something like patch (7) is really needed. Please let me know if you
> need more information about my test setup.
> 
> Bart.
> 
> 
> (a)
> 
> @@ -568,8 +568,10 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>          * multiqueue path is added before __multipath_map() is called. If
>          * that happens requeue to trigger unprepare and reprepare.
>          */
> -       if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
> +       if ((clone && q->mq_ops) || (!clone && !q->mq_ops)) {
> +               WARN_ON_ONCE(true);
>                 return r;
> +       }
> 
>         mpio = set_mpio(m, map_context);
>         if (!mpio)
> 
> (b)
> 
> ------------[ cut here ]------------
> WARNING: CPU: 9 PID: 542 at drivers/md/dm-mpath.c:584
> __multipath_map.isra.15+0x1e2/0x390 [dm_multipath]
> Modules linked in: ib_srp scsi_transport_srp ib_srpt(O)
> scst_vdisk(O) scst(O) dlm libcrc32c brd dm_service_time netconsole
> xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conn track nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
> ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm ib_uverbs msr
> ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac
> edac_core x86_pkg_temp_thermal coretemp kvm_intel kvm ipmi_ssif
> ipmi_devintf mlx4_core irqbypass hid_generic crct10dif_pclmul
> crc32_pclmul usbhid ghash_clmulni_intel aesni_intel aes_x86_64 lrw
> tg3 gf128mul glue_helper iTCO_wdt ptp iTCO_vendor_support pps_core
> dcdbas ablk_helper pcspkr ipmi_si libphy cryptd mei_me
> ipmi_msghandler fjes tpm_tis mei tpm_tis_core tpm lpc_ich shpchp
> mfd_core wmi button mgag200 i2c_algo_bit drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel
> ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua [last unloaded: brd]
> CPU: 9 PID: 542 Comm: kworker/9:1H Tainted: G           O    4.9.0-rc5-dbg+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> Workqueue: kblockd blk_mq_requeue_work
>  ffffc9000251fb08 ffffffff81329555 0000000000000000 0000000000000000
>  ffffc9000251fb48 ffffffff81064a56 0000024800000000 ffff8803f2a3ba78
>  ffff8803f2a32758 0000000000000000 ffff8804519de528 0000000000000000
> Call Trace:
>  [<ffffffff81329555>] dump_stack+0x68/0x93
>  [<ffffffff81064a56>] __warn+0xc6/0xe0
>  [<ffffffff81064b28>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa0046372>] __multipath_map.isra.15+0x1e2/0x390 [dm_multipath]
>  [<ffffffffa0046535>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
>  [<ffffffffa002a3ed>] map_request+0x14d/0x3a0 [dm_mod]
>  [<ffffffffa002a6e7>] dm_mq_queue_rq+0x77/0x110 [dm_mod]
>  [<ffffffff8131083f>] blk_mq_process_rq_list+0x23f/0x340
>  [<ffffffff81310a62>] __blk_mq_run_hw_queue+0x122/0x1c0
>  [<ffffffff81310a1e>] ? __blk_mq_run_hw_queue+0xde/0x1c0
>  [<ffffffff813105df>] blk_mq_run_hw_queue+0x9f/0xc0
>  [<ffffffff81310bae>] blk_mq_run_hw_queues+0x6e/0x90
>  [<ffffffff81312b37>] blk_mq_requeue_work+0xf7/0x110
>  [<ffffffff81082ab5>] process_one_work+0x1f5/0x690
>  [<ffffffff81082a3a>] ? process_one_work+0x17a/0x690
>  [<ffffffff81082f99>] worker_thread+0x49/0x490
>  [<ffffffff81082f50>] ? process_one_work+0x690/0x690
>  [<ffffffff81082f50>] ? process_one_work+0x690/0x690
>  [<ffffffff8108983b>] kthread+0xeb/0x110
>  [<ffffffff81089750>] ? kthread_park+0x60/0x60
>  [<ffffffff8163ef87>] ret_from_fork+0x27/0x40
> ---[ end trace 81cfd74742407be1 ]---

Glad you tried this, I was going to ask you to.

It'd be nice to verify, but I assume it is this half of the conditional
that is triggering: (!clone && !q->mq_ops)

This speaks to a race with cleanup of the underlying path while the
top-level blk-mq request_queue is in ->queue_rq

If that is in fact the case then I'd imagine that the underlying path's
request_queue should be marked as dying?  Wouldn't it be better to check
for that, rather than looking for a side-effect of a request_queue being
torn down (namely ->mq_ops being NULL, though it isn't clear to me
what would be causing ->mq_ops to be NULL either)?  Anyway, my point is
we need to _know_ what is causing this to trigger.  What part of the
life-cycle is the underlying path's request_queue in?
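
A sketch of what checking the life-cycle state directly might look like
(assuming blk_queue_dying() is the helper meant here; whether it actually
fires in Bart's scenario is exactly the open question):

	/* Bounce the request back if the chosen path's queue is being
	 * torn down, instead of inferring teardown from ->mq_ops. */
	if (blk_queue_dying(q))
		return r;	/* DM_MAPIO_REQUEUE */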

BTW, Laurence is the one who has a testbed that can run your test suite.
I can coordinate with him if need be but I'd prefer it if you could dig
into this the last 5%.  Apologies for being prickly yesterday; I held
certain aspects of the IO stack to be infallible.  Reality is that code
evolves and bugs/regressions happen.  We just need to pin it down.

Thanks,
Mike


* Re: [PATCH 2/7] dm: Simplify dm_table_determine_type()
  2016-11-16 14:54   ` Mike Snitzer
@ 2016-11-16 20:14     ` Bart Van Assche
  2016-11-16 21:11       ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16 20:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/16/2016 06:54 AM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  6:33pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> +/*
>> + * Returns true if all paths are blk-mq devices. Returns false if all paths
>> + * are single queue block devices or if there are no paths.
>> + */
>
> This code isn't specific to multipath, so "paths" is misplaced;
> "devices" is more appropriate.  To be honest, I'm not seeing the need
> for the comment.  The function name is pretty descriptive.
>
>>  bool dm_table_all_blk_mq_devices(struct dm_table *t)
>>  {
>>  	return t->all_blk_mq;

Hello Mike,

After having added WARN_ON_ONCE(sq_count == 0 && mq_count == 0) close to 
the end of dm_table_determine_type() the following output appeared:

WARNING: CPU: 0 PID: 2458 at drivers/md/dm-table.c:966 
dm_table_complete+0x60e/0x6b0 [dm_mod]
[ ... ]
Call Trace:
  [<ffffffff81329b45>] dump_stack+0x68/0x93
  [<ffffffff810650e6>] __warn+0xc6/0xe0
  [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
  [<ffffffffa0021c8e>] dm_table_complete+0x60e/0x6b0 [dm_mod]
  [<ffffffffa00245cd>] table_load+0x13d/0x330 [dm_mod]
  [<ffffffffa0024490>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
  [<ffffffffa0025145>] ctl_ioctl+0x1f5/0x520 [dm_mod]
  [<ffffffffa002547e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
  [<ffffffff811f26cf>] do_vfs_ioctl+0x8f/0x6b0
  [<ffffffff811ff41c>] ? __fget+0x10c/0x200
  [<ffffffff811ff310>] ? expand_files+0x2a0/0x2a0
  [<ffffffff811f2d2c>] SyS_ioctl+0x3c/0x70
  [<ffffffff8163f9aa>] entry_SYSCALL_64_fastpath+0x18/0xad

Is it intentional that dm_table_determine_type() can get called for a dm 
device with an empty table?

Thanks,

Bart.


* Re: [PATCH 2/7] dm: Simplify dm_table_determine_type()
  2016-11-16 20:14     ` Bart Van Assche
@ 2016-11-16 21:11       ` Mike Snitzer
  2016-11-16 21:53         ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16 21:11 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, Nov 16 2016 at  3:14pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/16/2016 06:54 AM, Mike Snitzer wrote:
> >On Tue, Nov 15 2016 at  6:33pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>+/*
> >>+ * Returns true if all paths are blk-mq devices. Returns false if all paths
> >>+ * are single queue block devices or if there are no paths.
> >>+ */
> >
> >This code isn't specific to multipath.  So "paths" is misplaced.
> >"devices" is more appropriate.  Not seeing the need for the comment to
> >be honest.  The function name is pretty descriptive.
> >
> >> bool dm_table_all_blk_mq_devices(struct dm_table *t)
> >> {
> >> 	return t->all_blk_mq;
> 
> Hello Mike,
> 
> After having added WARN_ON_ONCE(sq_count == 0 && mq_count == 0)
> close to the end of dm_table_determine_type() the following output
> appeared:
> 
> WARNING: CPU: 0 PID: 2458 at drivers/md/dm-table.c:966
> dm_table_complete+0x60e/0x6b0 [dm_mod]
> [ ... ]
> Call Trace:
>  [<ffffffff81329b45>] dump_stack+0x68/0x93
>  [<ffffffff810650e6>] __warn+0xc6/0xe0
>  [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa0021c8e>] dm_table_complete+0x60e/0x6b0 [dm_mod]
>  [<ffffffffa00245cd>] table_load+0x13d/0x330 [dm_mod]
>  [<ffffffffa0024490>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
>  [<ffffffffa0025145>] ctl_ioctl+0x1f5/0x520 [dm_mod]
>  [<ffffffffa002547e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
>  [<ffffffff811f26cf>] do_vfs_ioctl+0x8f/0x6b0
>  [<ffffffff811ff41c>] ? __fget+0x10c/0x200
>  [<ffffffff811ff310>] ? expand_files+0x2a0/0x2a0
>  [<ffffffff811f2d2c>] SyS_ioctl+0x3c/0x70
>  [<ffffffff8163f9aa>] entry_SYSCALL_64_fastpath+0x18/0xad
> 
> Is it intentional that dm_table_determine_type() can get called for
> a dm device with an empty table?

What table were you trying to load when this occurred?

Targets like the 'error' target don't have any devices.


* Re: [PATCH 2/7] dm: Simplify dm_table_determine_type()
  2016-11-16 21:11       ` Mike Snitzer
@ 2016-11-16 21:53         ` Bart Van Assche
  2016-11-16 23:09           ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-16 21:53 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/16/2016 01:11 PM, Mike Snitzer wrote:
> On Wed, Nov 16 2016 at  3:14pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> After having added WARN_ON_ONCE(sq_count == 0 && mq_count == 0)
>> close to the end of dm_table_determine_type() the following output
>> appeared:
>>
>> WARNING: CPU: 0 PID: 2458 at drivers/md/dm-table.c:966
>> dm_table_complete+0x60e/0x6b0 [dm_mod]
>> [ ... ]
>> Call Trace:
>>  [<ffffffff81329b45>] dump_stack+0x68/0x93
>>  [<ffffffff810650e6>] __warn+0xc6/0xe0
>>  [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
>>  [<ffffffffa0021c8e>] dm_table_complete+0x60e/0x6b0 [dm_mod]
>>  [<ffffffffa00245cd>] table_load+0x13d/0x330 [dm_mod]
>>  [<ffffffffa0024490>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
>>  [<ffffffffa0025145>] ctl_ioctl+0x1f5/0x520 [dm_mod]
>>  [<ffffffffa002547e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
>>  [<ffffffff811f26cf>] do_vfs_ioctl+0x8f/0x6b0
>>  [<ffffffff811ff41c>] ? __fget+0x10c/0x200
>>  [<ffffffff811ff310>] ? expand_files+0x2a0/0x2a0
>>  [<ffffffff811f2d2c>] SyS_ioctl+0x3c/0x70
>>  [<ffffffff8163f9aa>] entry_SYSCALL_64_fastpath+0x18/0xad
>>
>> Is it intentional that dm_table_determine_type() can get called for
>> a dm device with an empty table?
>
> What table were you trying to load when this occurred?
>
> Targets like the 'error' target don't have any devices.

Hello Mike,

This was the result of a table load issued by multipathd. Do you want me 
to add more debug code such that the details of the table load request 
are logged?

Thanks,

Bart.


* Re: [PATCH 2/7] dm: Simplify dm_table_determine_type()
  2016-11-16 21:53         ` Bart Van Assche
@ 2016-11-16 23:09           ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-16 23:09 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, Nov 16 2016 at  4:53pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/16/2016 01:11 PM, Mike Snitzer wrote:
> >On Wed, Nov 16 2016 at  3:14pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>After having added WARN_ON_ONCE(sq_count == 0 && mq_count == 0)
> >>close to the end of dm_table_determine_type() the following output
> >>appeared:
> >>
> >>WARNING: CPU: 0 PID: 2458 at drivers/md/dm-table.c:966
> >>dm_table_complete+0x60e/0x6b0 [dm_mod]
> >>[ ... ]
> >>Call Trace:
> >> [<ffffffff81329b45>] dump_stack+0x68/0x93
> >> [<ffffffff810650e6>] __warn+0xc6/0xe0
> >> [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
> >> [<ffffffffa0021c8e>] dm_table_complete+0x60e/0x6b0 [dm_mod]
> >> [<ffffffffa00245cd>] table_load+0x13d/0x330 [dm_mod]
> >> [<ffffffffa0024490>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
> >> [<ffffffffa0025145>] ctl_ioctl+0x1f5/0x520 [dm_mod]
> >> [<ffffffffa002547e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
> >> [<ffffffff811f26cf>] do_vfs_ioctl+0x8f/0x6b0
> >> [<ffffffff811ff41c>] ? __fget+0x10c/0x200
> >> [<ffffffff811ff310>] ? expand_files+0x2a0/0x2a0
> >> [<ffffffff811f2d2c>] SyS_ioctl+0x3c/0x70
> >> [<ffffffff8163f9aa>] entry_SYSCALL_64_fastpath+0x18/0xad
> >>
> >>Is it intentional that dm_table_determine_type() can get called for
> >>a dm device with an empty table?
> >
> >What table were you trying to load when this occurred?
> >
> >Targets like the 'error' target don't have any devices.
> 
> Hello Mike,
> 
> This was the result of a table load issued by multipathd. Do you
> want me to add more debug code such that the details of the table
> load request are logged?

No, I think it's fine.  You likely have 'queue_if_no_path' configured
and multipathd is loading a 'multipath' table that doesn't have any
paths.
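
To make that concrete: a 'multipath' table loaded with zero priority
groups never calls dm_get_device(), so the table's device list stays
empty. A minimal sketch of the resulting condition (using the
dm_table_get_devices() helper; the wrapper name here is made up for
illustration):

	static bool table_has_no_devices(struct dm_table *t)
	{
		/* True for a zero-path 'multipath' (or 'error') table:
		 * no target ever called dm_get_device(). */
		return list_empty(dm_table_get_devices(t));
	}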

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/7] dm-mpath: Document a locking assumption
  2016-11-15 23:33 ` [PATCH 3/7] dm-mpath: Document a locking assumption Bart Van Assche
@ 2016-11-18  0:07   ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-18  0:07 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 15 2016 at  6:33pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Document that __pg_init_all_paths() must be called with
> multipath.lock held.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  drivers/md/dm-mpath.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index e477af8..f1e226d 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -348,6 +348,8 @@ static int __pg_init_all_paths(struct multipath *m)
>  	struct pgpath *pgpath;
>  	unsigned long pg_init_delay = 0;
>  
> +	lockdep_assert_held(&m->lock);
> +
>  	if (atomic_read(&m->pg_init_in_progress) || test_bit(MPATHF_PG_INIT_DISABLED, &m->flags))
>  		return 0;
>  
> -- 
> 2.10.1
> 

The leading underscores already document the need for locking
(__pg_init_all_paths vs pg_init_all_paths).
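
That convention pairs each double-underscore helper with a wrapper that
takes the lock; the shape of pg_init_all_paths(), paraphrased from
dm-mpath.c (a sketch, not a verbatim copy):

	static int pg_init_all_paths(struct multipath *m)
	{
		int r;
		unsigned long flags;

		spin_lock_irqsave(&m->lock, flags);
		r = __pg_init_all_paths(m);
		spin_unlock_irqrestore(&m->lock, flags);

		return r;
	}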

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-16  0:37   ` Mike Snitzer
  2016-11-16  0:40     ` Bart Van Assche
@ 2016-11-21 21:44     ` Bart Van Assche
  2016-11-21 23:43       ` Mike Snitzer
  1 sibling, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-21 21:44 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  6:35pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
>> If a single-queue dm device is stacked on top of multi-queue block
>> devices and map_tio_request() is called while there are no paths then
>> the request will be prepared for a single-queue path. If a path is
>> added after a request was prepared and before __multipath_map() is
>> called return DM_MAPIO_REQUEUE such that it gets unprepared and
>> reprepared as a blk-mq request.
> 
> This patch makes little sense to me.  There isn't a scenario that I'm
> aware of that would allow the request_queue to transition between old
> .request_fn and new blk-mq.
> 
> The dm-table code should prevent this.

Hello Mike,

After having added the following code in __multipath_map() just before
the set_mpio() call:

	bdev = pgpath->path.dev->bdev;
	q = bdev_get_queue(bdev);

	if (WARN_ON_ONCE(clone && q->mq_ops) ||
	    WARN_ON_ONCE(!clone && !q->mq_ops)) {
		pr_debug("q->queue_flags = %#lx\n", q->queue_flags);
		return r;
	}

I see the following warning appear (line 544 contains
WARN_ON_ONCE(clone && q->mq_ops)):

------------[ cut here ]------------
WARNING: CPU: 2 PID: 25384 at drivers/md/dm-mpath.c:544 __multipath_map.isra.17+0x325/0x360 [dm_multipath]
Modules linked in: ib_srp scsi_transport_srp ib_srpt(O) scst_vdisk(O) scst(O) dlm libcrc32c brd dm_service_time netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm msr ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel hid_generic ipmi_ssif usbhid ipmi_devintf kvm irqbypass mlx4_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel tg3 aes_x86_64 iTCO_wdt lrw gf128mul ptp dcdbas iTCO_vendor_support glue_helper pps_core ablk_helper libphy cryptd ipmi_si pcspkr mei_me fjes ipmi_msghandler mei shpchp tpm_tis tpm_tis_core lpc_ich tpm mfd_core wmi button mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [last unloaded: brd]
CPU: 2 PID: 25384 Comm: kdmwork-254:0 Tainted: G           O    4.9.0-rc6-dbg+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffffc90002cd7d00 ffffffff81329bb5 0000000000000000 0000000000000000
 ffffc90002cd7d40 ffffffff810650e6 0000022000001000 ffff8804433a0008
 ffff88039134fc28 ffff88037e804008 ffff88039bacce98 0000000000001000
Call Trace:
 [<ffffffff81329bb5>] dump_stack+0x68/0x93
 [<ffffffff810650e6>] __warn+0xc6/0xe0
 [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
 [<ffffffffa0046125>] __multipath_map.isra.17+0x325/0x360 [dm_multipath]
 [<ffffffffa0046192>] multipath_map+0x12/0x20 [dm_multipath]
 [<ffffffffa002a356>] map_request+0x46/0x300 [dm_mod]
 [<ffffffffa002a621>] map_tio_request+0x11/0x30 [dm_mod]
 [<ffffffff8108a065>] kthread_worker_fn+0x105/0x1e0
 [<ffffffff81089f60>] ? __kthread_init_worker+0x70/0x70
 [<ffffffff81089ecb>] kthread+0xeb/0x110
 [<ffffffff81089de0>] ? kthread_park+0x60/0x60
 [<ffffffff8163fcc7>] ret_from_fork+0x27/0x40
---[ end trace b181de88e3eff2a0 ]---
dm_multipath:__multipath_map: q->queue_flags = 0x1d06a00

As one can see neither QUEUE_FLAG_DYING nor QUEUE_FLAG_DEAD was set:

$ grep -E 'define QUEUE_FLAG[^[:blank:]]*[[:blank:]](9|11|13|14|20|22|23|24)[[:blank:]]' include/linux/blkdev.h 
#define QUEUE_FLAG_SAME_COMP    9       /* complete on same CPU-group */
#define QUEUE_FLAG_STACKABLE   11       /* supports request stacking */
#define QUEUE_FLAG_IO_STAT     13       /* do IO stats */
#define QUEUE_FLAG_DISCARD     14       /* supports DISCARD */
#define QUEUE_FLAG_INIT_DONE   20       /* queue is initialized */
#define QUEUE_FLAG_POLL        22       /* IO polling enabled if set */
#define QUEUE_FLAG_WC          23       /* Write back caching */
#define QUEUE_FLAG_FUA         24       /* device supports FUA writes */
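
The decoding above can be reproduced with a standalone userspace helper
(a sketch, not kernel code; the constant is the queue_flags value logged
above):

	#include <stdio.h>

	int main(void)
	{
		unsigned long queue_flags = 0x1d06a00UL;
		int bit;

		/* For 0x1d06a00 this prints bits 9, 11, 13, 14,
		 * 20, 22, 23 and 24, matching the grep above. */
		for (bit = 0; bit < (int)(8 * sizeof(queue_flags)); bit++)
			if (queue_flags & (1UL << bit))
				printf("bit %d set\n", bit);
		return 0;
	}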

Do you want to comment on this?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-21 21:44     ` Bart Van Assche
@ 2016-11-21 23:43       ` Mike Snitzer
  2016-11-21 23:57         ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-21 23:43 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Mon, Nov 21 2016 at  4:44pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> > On Tue, Nov 15 2016 at  6:35pm -0500,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> > 
> >> If a single-queue dm device is stacked on top of multi-queue block
> >> devices and map_tio_request() is called while there are no paths then
> >> the request will be prepared for a single-queue path. If a path is
> >> added after a request was prepared and before __multipath_map() is
> >> called return DM_MAPIO_REQUEUE such that it gets unprepared and
> >> reprepared as a blk-mq request.
> > 
> > This patch makes little sense to me.  There isn't a scenario that I'm
> > aware of that would allow the request_queue to transition between old
> > .request_fn and new blk-mq.
> > 
> > The dm-table code should prevent this.
> 
> Hello Mike,
> 
> After having added the following code in __multipath_map() just before
> the set_mpio() call:
> 
> 	bdev = pgpath->path.dev->bdev;
> 	q = bdev_get_queue(bdev);
> 
> 	if (WARN_ON_ONCE(clone && q->mq_ops) ||
> 	    WARN_ON_ONCE(!clone && !q->mq_ops)) {
> 		pr_debug("q->queue_flags = %#lx\n", q->queue_flags);
> 		return r;
> 	}
> 
> I see the following warning appear (line 544 contains
> WARN_ON_ONCE(clone && q->mq_ops)):
> 
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 25384 at drivers/md/dm-mpath.c:544 __multipath_map.isra.17+0x325/0x360 [dm_multipath]
> Modules linked in: ib_srp scsi_transport_srp ib_srpt(O) scst_vdisk(O) scst(O) dlm libcrc32c brd dm_service_time netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm msr ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel hid_generic ipmi_ssif usbhid ipmi_devintf kvm irqbypass mlx4_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel tg3 aes_x86_64 iTCO_wdt lrw gf128mul ptp dcdbas iTCO_vendor_support glue_helper pps_core ablk_helper libphy cryptd ipmi_si pcspkr mei_me fjes ipmi_msghandler mei shpchp tpm_tis tpm_tis_core lpc_ich tpm mfd_core wmi button mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [last unloaded: brd]
> CPU: 2 PID: 25384 Comm: kdmwork-254:0 Tainted: G           O    4.9.0-rc6-dbg+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
>  ffffc90002cd7d00 ffffffff81329bb5 0000000000000000 0000000000000000
>  ffffc90002cd7d40 ffffffff810650e6 0000022000001000 ffff8804433a0008
>  ffff88039134fc28 ffff88037e804008 ffff88039bacce98 0000000000001000
> Call Trace:
>  [<ffffffff81329bb5>] dump_stack+0x68/0x93
>  [<ffffffff810650e6>] __warn+0xc6/0xe0
>  [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa0046125>] __multipath_map.isra.17+0x325/0x360 [dm_multipath]
>  [<ffffffffa0046192>] multipath_map+0x12/0x20 [dm_multipath]
>  [<ffffffffa002a356>] map_request+0x46/0x300 [dm_mod]
>  [<ffffffffa002a621>] map_tio_request+0x11/0x30 [dm_mod]
>  [<ffffffff8108a065>] kthread_worker_fn+0x105/0x1e0
>  [<ffffffff81089f60>] ? __kthread_init_worker+0x70/0x70
>  [<ffffffff81089ecb>] kthread+0xeb/0x110
>  [<ffffffff81089de0>] ? kthread_park+0x60/0x60
>  [<ffffffff8163fcc7>] ret_from_fork+0x27/0x40
> ---[ end trace b181de88e3eff2a0 ]---
> dm_multipath:__multipath_map: q->queue_flags = 0x1d06a00
> 
> As one can see neither QUEUE_FLAG_DYING nor QUEUE_FLAG_DEAD was set:
> 
> $ grep -E 'define QUEUE_FLAG[^[:blank:]]*[[:blank:]](9|11|13|14|20|22|23|24)[[:blank:]]' include/linux/blkdev.h 
> #define QUEUE_FLAG_SAME_COMP    9       /* complete on same CPU-group */
> #define QUEUE_FLAG_STACKABLE   11       /* supports request stacking */
> #define QUEUE_FLAG_IO_STAT     13       /* do IO stats */
> #define QUEUE_FLAG_DISCARD     14       /* supports DISCARD */
> #define QUEUE_FLAG_INIT_DONE   20       /* queue is initialized */
> #define QUEUE_FLAG_POLL        22       /* IO polling enabled if set */
> #define QUEUE_FLAG_WC          23       /* Write back caching */
> #define QUEUE_FLAG_FUA         24       /* device supports FUA writes */
> 
> Do you want to comment on this?

Shouldn't be possible.  The previous stacktrace you shared clearly
showed that the DM mpath request_queue was using blk-mq (dm_mq_queue_rq
was in the stack).

Whereas the stacktrace above is clearly the old request_fn interface.

I'm unaware of how the existing code can allow this.  As I said in my
earlier mails on this: the request-queue shouldn't be able to change
from blk-mq back to .request_fn or vice-versa.

So if you think you're only testing blk-mq DM mpath on blk-mq paths,
then you need to determine how dm_old_init_request_queue() is getting
called to even set up .request_fn (dm_old_request_fn) to be used.

If the opposite is true (old request_fn DM mpath stack on blk-mq paths)
then determine how dm_mq_init_request_queue is getting called.

Basically dm_setup_md_queue() should only ever be called the first time
the "multipath" target is loaded.  If that isn't the case, then you've
exposed some seriously weird bug/regression.
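
One way to catch that directly would be instrumentation along these
lines (illustrative only: queue_setup_done is a hypothetical debug
field, not part of struct mapped_device):

	static void assert_queue_setup_once(struct mapped_device *md)
	{
		/* Warn if dm_setup_md_queue() is ever reached twice
		 * for the same mapped_device. */
		WARN_ONCE(md->queue_setup_done,
			  "dm_setup_md_queue() called twice for %s\n",
			  dm_device_name(md));
		md->queue_setup_done = true;
	}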

Mike

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-21 23:43       ` Mike Snitzer
@ 2016-11-21 23:57         ` Bart Van Assche
  2016-11-22  0:34           ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-21 23:57 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/21/2016 03:43 PM, Mike Snitzer wrote:
> Shouldn't be possible.  The previous stacktrace you shared clearly
> showed that the DM mpath request_queue was using blk-mq (dm_mq_queue_rq
> was in the stack).
>
> Whereas the stacktrace above is clearly the old request_fn interface.
>
> I'm unaware of how the existing code can allow this.  As I said in my
> earlier mails on this: the request-queue shouldn't be able to change
> from blk-mq back to .request_fn or vice-versa.
>
> So if you think you're only testing blk-mq DM mpath on blk-mq paths,
> then you need to determine how dm_old_init_request_queue() is getting
> called to even set up .request_fn (dm_old_request_fn) to be used.
>
> If the opposite is true (old request_fn DM mpath stack on blk-mq paths)
> then determine how dm_mq_init_request_queue is getting called.
>
> Basically dm_setup_md_queue() should only ever be called the first time
> the "multipath" target is loaded.  If that isn't the case, then you've
> exposed some seriously weird bug/regression.

Hello Mike,

Sorry that I had not yet mentioned this, but the test I'm running is as 
follows:

# while true; do for t in mq sq sq-on-mq; do echo ==== $t; \
    srp-test/run_tests -s -t 02-$t; done; done

In other words, I'm alternating mq-on-mq, sq-on-sq and sq-on-mq tests. 
The above command does not log the time at which each test started so 
I'm not sure whether it is possible to determine which test triggered 
the call stack I posted in my previous e-mail.

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-21 23:57         ` Bart Van Assche
@ 2016-11-22  0:34           ` Mike Snitzer
  2016-11-22 23:47             ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-22  0:34 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Mon, Nov 21 2016 at  6:57pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/21/2016 03:43 PM, Mike Snitzer wrote:
> >Shouldn't be possible.  The previous stacktrace you shared clearly
> >showed that the DM mpath request_queue was using blk-mq (dm_mq_queue_rq
> >was in the stack).
> >
> >Whereas the stacktrace above is clearly the old request_fn interface.
> >
> >I'm unaware of how the existing code can allow this.  As I said in my
> >earlier mails on this: the request-queue shouldn't be able to change
> >from blk-mq back to .request_fn or vice-versa.
> >
> >So if you think you're only testing blk-mq DM mpath on blk-mq paths,
> >then you need to determine how dm_old_init_request_queue() is getting
> >called to even set up .request_fn (dm_old_request_fn) to be used.
> >
> >If the opposite is true (old request_fn DM mpath stack on blk-mq paths)
> >then determine how dm_mq_init_request_queue is getting called.
> >
> >Basically dm_setup_md_queue() should only ever be called the first time
> >the "multipath" target is loaded.  If that isn't the case, then you've
> >exposed some seriously weird bug/regression.
> 
> Hello Mike,
> 
> Sorry that I had not yet mentioned this, but the test I'm running is
> as follows:
> 
> # while true; do for t in mq sq sq-on-mq; do echo ==== $t; \
>     srp-test/run_tests -s -t 02-$t; done; done

But your WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
So this would seem to be a false positive.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-22  0:34           ` Mike Snitzer
@ 2016-11-22 23:47             ` Bart Van Assche
  2016-11-23  0:48               ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-22 23:47 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> But your WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.

Hello Mike,

This behavior is indeed triggered by the sq-on-mq test. After having 
added the following code in __bind():

	if (old_map &&
	    dm_table_all_blk_mq_devices(old_map) !=
	    dm_table_all_blk_mq_devices(t))
		pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
			 dm_device_name(md),
			 dm_table_all_blk_mq_devices(old_map),
			 dm_table_all_blk_mq_devices(t));

I see the following output appear frequently in the kernel log:

dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0

Could these all_blk_mq state changes explain why WARN_ON_ONCE(clone && 
q->mq_ops) triggers in __multipath_map()? Does this mean that the 
comment in patch http://marc.info/?l=dm-devel&m=147925314306752 is correct?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-22 23:47             ` Bart Van Assche
@ 2016-11-23  0:48               ` Mike Snitzer
  2016-11-23  3:16                 ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-23  0:48 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 22 2016 at  6:47pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> >But your WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
> 
> Hello Mike,
> 
> This behavior is indeed triggered by the sq-on-mq test. After having
> added the following code in __bind():
> 
> 	if (old_map &&
> 	    dm_table_all_blk_mq_devices(old_map) !=
> 	    dm_table_all_blk_mq_devices(t))
> 		pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
> 			 dm_device_name(md),
> 			 dm_table_all_blk_mq_devices(old_map),
> 			 dm_table_all_blk_mq_devices(t));
> 
> I see the following output appear frequently in the kernel log:
> 
> dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0
> 
> Could these all_blk_mq state changes explain why WARN_ON_ONCE(clone
> && q->mq_ops) triggers in __multipath_map()? Does this mean that
> the comment in patch http://marc.info/?l=dm-devel&m=147925314306752
> is correct?

Yes, looks like you're on to something.  dm_old_prep_tio() has:

        /*
         * Must clone a request if this .request_fn DM device
         * is stacked on .request_fn device(s).
         */
        if (!dm_table_all_blk_mq_devices(table)) { ...

So if your table->all_blk_mq is false (likely because the multipath
table temporarily had no paths) a clone will be passed to
__multipath_map().  But what isn't clear is how __multipath_map() would
then go on to have any underlying paths available to issue IO to (given
the table would have been empty -- or so I would think).

Would be great if you could verify that the table is in fact empty.

It should be noted that dm_table_determine_type() has:

        if (list_empty(devices) && __table_type_request_based(live_md_type)) {
                /* inherit live MD type */
                t->type = live_md_type;
                return 0;
        }

So this explains how/why an empty table will inherit
DM_TYPE_MQ_REQUEST_BASED, and it also explains why the new (empty) table
would not have ->all_blk_mq set to true.  But again, no IO can be
issued when there are no underlying paths.

And looking closer, __multipath_map() _should_ return early with either
DM_MAPIO_DELAY_REQUEUE or DM_MAPIO_REQUEUE when no paths are available.

Not seeing how you could have this scenario where you prepared a clone
(as if the clone request were to be issued to a .request_fn, aka "sq",
device) and then by the time you get into __multipath_map() there is an
underlying path available with q->mq_ops.

But regardless, what certainly needs fixing is the inconsistency of
inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
(all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).

I'm now questioning why we even need the 'all_blk_mq' state within the
table.  I'll revisit the "why?" behind all that in my previous commits.
I'll likely get back to you on this point tomorrow, and will hopefully
have a fix for you.

FYI: given all this I do think something like your 7/7 patch (which you
referenced with the above URL) would be a possible added safety net to
guard against future inconsistencies/bugs/regressions.
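
For reference, the accessor everything above hinges on is just a read of
that flag (paraphrased from the 4.9-era dm-table.c, so treat this as a
sketch):

	bool dm_table_all_blk_mq_devices(struct dm_table *t)
	{
		/* Computed once at table-load time; says whether every
		 * underlying device uses blk-mq. */
		return t->all_blk_mq;
	}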

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-23  0:48               ` Mike Snitzer
@ 2016-11-23  3:16                 ` Mike Snitzer
  2016-11-23 18:28                   ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2016-11-23  3:16 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Nov 22 2016 at  7:48pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> But regardless, what certainly needs fixing is the inconsistency of
> inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
> (all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).
> 
> I'm now questioning why we even need the 'all_blk_mq' state within the
> table.  I'll revisit the "why?" on all that in my previous commits.
> I'll likely get back with you on this point tomorrow.  And will
> hopefully have a fix for you.

We need 'all_blk_mq' because DM_TYPE_* is only used to establish the
DM device's type of request_queue.  It doesn't say anything about the DM
device's underlying devices.

Anyway, this _untested_ patch should hopefully resolve the 'all_blk_mq'
inconsistency you saw:

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 8b013ea..8ce81d0 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -924,12 +924,6 @@ static int dm_table_determine_type(struct dm_table *t)
 
 	BUG_ON(!request_based); /* No targets in this table */
 
-	if (list_empty(devices) && __table_type_request_based(live_md_type)) {
-		/* inherit live MD type */
-		t->type = live_md_type;
-		return 0;
-	}
-
 	/*
 	 * The only way to establish DM_TYPE_MQ_REQUEST_BASED is by
 	 * having a compatible target use dm_table_set_type.
@@ -948,6 +942,19 @@ static int dm_table_determine_type(struct dm_table *t)
 		return -EINVAL;
 	}
 
+	if (list_empty(devices)) {
+		int srcu_idx;
+		struct dm_table *live_table = dm_get_live_table(t->md, &srcu_idx);
+
+		/* inherit live table's type and all_blk_mq */
+		if (live_table) {
+			t->type = live_table->type;
+			t->all_blk_mq = live_table->all_blk_mq;
+		}
+		dm_put_live_table(t->md, srcu_idx);
+		return 0;
+	}
+
 	/* Non-request-stackable devices can't be used for request-based dm */
 	list_for_each_entry(dd, devices, list) {
 		struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
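
The dm_get_live_table()/dm_put_live_table() pairing above follows the
usual SRCU pattern; a minimal sketch of that pattern (the helper name
here is made up for illustration, the dm accessors are the real ones):

	static unsigned live_table_type(struct mapped_device *md)
	{
		int srcu_idx;
		unsigned type = DM_TYPE_NONE;
		struct dm_table *live = dm_get_live_table(md, &srcu_idx);

		if (live)
			type = dm_table_get_type(live);
		/* The SRCU read lock must be dropped even when there
		 * is no live table. */
		dm_put_live_table(md, srcu_idx);

		return type;
	}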

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-23  3:16                 ` Mike Snitzer
@ 2016-11-23 18:28                   ` Bart Van Assche
  2016-11-23 18:50                     ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2016-11-23 18:28 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 11/22/2016 07:16 PM, Mike Snitzer wrote:
> Anyway, this _untested_ patch should hopefully resolve the 'all_blk_mq'
> inconsistency you saw:
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 8b013ea..8ce81d0 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -924,12 +924,6 @@ static int dm_table_determine_type(struct dm_table *t)
>
>  	BUG_ON(!request_based); /* No targets in this table */
>
> -	if (list_empty(devices) && __table_type_request_based(live_md_type)) {
> -		/* inherit live MD type */
> -		t->type = live_md_type;
> -		return 0;
> -	}
> -
>  	/*
>  	 * The only way to establish DM_TYPE_MQ_REQUEST_BASED is by
>  	 * having a compatible target use dm_table_set_type.
> @@ -948,6 +942,19 @@ static int dm_table_determine_type(struct dm_table *t)
>  		return -EINVAL;
>  	}
>
> +	if (list_empty(devices)) {
> +		int srcu_idx;
> +		struct dm_table *live_table = dm_get_live_table(t->md, &srcu_idx);
> +
> +		/* inherit live table's type and all_blk_mq */
> +		if (live_table) {
> +			t->type = live_table->type;
> +			t->all_blk_mq = live_table->all_blk_mq;
> +		}
> +		dm_put_live_table(t->md, srcu_idx);
> +		return 0;
> +	}
> +
>  	/* Non-request-stackable devices can't be used for request-based dm */
>  	list_for_each_entry(dd, devices, list) {
>  		struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
>

Hello Mike,

Thanks for the patch. This patch works fine on my test setup. This means 
that WARN_ON_ONCE(clone && q->mq_ops) is no longer triggered.

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()
  2016-11-23 18:28                   ` Bart Van Assche
@ 2016-11-23 18:50                     ` Mike Snitzer
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Snitzer @ 2016-11-23 18:50 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, Nov 23 2016 at  1:28pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/22/2016 07:16 PM, Mike Snitzer wrote:
> >Anyway, this _untested_ patch should hopefully resolve the 'all_blk_mq'
> >inconsistency you saw:
> >
> >diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> >index 8b013ea..8ce81d0 100644
> >--- a/drivers/md/dm-table.c
> >+++ b/drivers/md/dm-table.c
> >@@ -924,12 +924,6 @@ static int dm_table_determine_type(struct dm_table *t)
> >
> > 	BUG_ON(!request_based); /* No targets in this table */
> >
> >-	if (list_empty(devices) && __table_type_request_based(live_md_type)) {
> >-		/* inherit live MD type */
> >-		t->type = live_md_type;
> >-		return 0;
> >-	}
> >-
> > 	/*
> > 	 * The only way to establish DM_TYPE_MQ_REQUEST_BASED is by
> > 	 * having a compatible target use dm_table_set_type.
> >@@ -948,6 +942,19 @@ static int dm_table_determine_type(struct dm_table *t)
> > 		return -EINVAL;
> > 	}
> >
> >+	if (list_empty(devices)) {
> >+		int srcu_idx;
> >+		struct dm_table *live_table = dm_get_live_table(t->md, &srcu_idx);
> >+
> >+		/* inherit live table's type and all_blk_mq */
> >+		if (live_table) {
> >+			t->type = live_table->type;
> >+			t->all_blk_mq = live_table->all_blk_mq;
> >+		}
> >+		dm_put_live_table(t->md, srcu_idx);
> >+		return 0;
> >+	}
> >+
> > 	/* Non-request-stackable devices can't be used for request-based dm */
> > 	list_for_each_entry(dd, devices, list) {
> > 		struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
> >
> 
> Hello Mike,
> 
> Thanks for the patch. This patch works fine on my test setup. This
> means that WARN_ON_ONCE(clone && q->mq_ops) is no longer triggered.

Thanks for testing.  I'll get it staged for 4.10 inclusion and marked
for stable.

^ permalink raw reply	[flat|nested] 39+ messages in thread


Thread overview: 39+ messages
2016-11-15 23:31 [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Bart Van Assche
2016-11-15 23:32 ` [PATCH 1/7] dm: Fix a (theoretical?) race condition in rq_completed() Bart Van Assche
2016-11-16  0:46   ` Mike Snitzer
2016-11-15 23:33 ` [PATCH 2/7] dm: Simplify dm_table_determine_type() Bart Van Assche
2016-11-16 14:54   ` Mike Snitzer
2016-11-16 20:14     ` Bart Van Assche
2016-11-16 21:11       ` Mike Snitzer
2016-11-16 21:53         ` Bart Van Assche
2016-11-16 23:09           ` Mike Snitzer
2016-11-15 23:33 ` [PATCH 3/7] dm-mpath: Document a locking assumption Bart Van Assche
2016-11-18  0:07   ` Mike Snitzer
2016-11-15 23:34 ` [PATCH 4/7] dm-mpath: Change return type of pg_init_all_paths() from int into void Bart Van Assche
2016-11-15 23:34 ` [PATCH 5/7] dm-mpath: Do not touch *__clone if request allocation fails Bart Van Assche
2016-11-15 23:34 ` [PATCH 6/7] dm-mpath: Avoid code duplication in __multipath_map() Bart Van Assche
2016-11-16  0:39   ` Mike Snitzer
2016-11-15 23:35 ` [PATCH 7/7] dm-mpath: Fix a race condition " Bart Van Assche
2016-11-16  0:37   ` Mike Snitzer
2016-11-16  0:40     ` Bart Van Assche
2016-11-16  1:01       ` Mike Snitzer
2016-11-16  1:08         ` Bart Van Assche
2016-11-16  1:50           ` Mike Snitzer
2016-11-21 21:44     ` Bart Van Assche
2016-11-21 23:43       ` Mike Snitzer
2016-11-21 23:57         ` Bart Van Assche
2016-11-22  0:34           ` Mike Snitzer
2016-11-22 23:47             ` Bart Van Assche
2016-11-23  0:48               ` Mike Snitzer
2016-11-23  3:16                 ` Mike Snitzer
2016-11-23 18:28                   ` Bart Van Assche
2016-11-23 18:50                     ` Mike Snitzer
2016-11-16  0:47 ` [PATCH 0/7] dm-mpath: Fix a race condition in the blk-mq path Mike Snitzer
2016-11-16  0:57   ` Bart Van Assche
2016-11-16  1:08     ` Mike Snitzer
2016-11-16  1:10       ` Bart Van Assche
2016-11-16  1:53         ` Mike Snitzer
2016-11-16  7:39 ` Hannes Reinecke
2016-11-16 14:56   ` Mike Snitzer
2016-11-16 18:22     ` Bart Van Assche
2016-11-16 19:32       ` Mike Snitzer
