* [PATCH 0/9] dm patches for kernel v4.9
From: Bart Van Assche @ 2016-08-31 22:14 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Hello Mike,
This patch series fixes sporadic hangs that occur while removing a dm
device. I would appreciate it if you would consider these patches for
inclusion in kernel v4.9. The patches in this series are:
0001-blk-mq-Introduce-blk_mq_queue_stopped.patch
0002-dm-Rename-a-function-argument.patch
0003-dm-Introduce-signal_pending_state.patch
0004-dm-Convert-wait-loops.patch
0005-dm-Add-two-lockdep_assert_held-statements.patch
0006-dm-Simplify-dm_old_stop_queue.patch
0007-dm-Mark-block-layer-queue-dead-before-destroying-the.patch
0008-dm-Fix-two-race-conditions-related-to-stopping-and-s.patch
0009-dm-path-selector-Avoid-that-device-removal-triggers-.patch
Thanks,
Bart.
* [PATCH 1/9] blk-mq: Introduce blk_mq_queue_stopped()
From: Bart Van Assche @ 2016-08-31 22:15 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jens Axboe, device-mapper development, Christoph Hellwig
The function blk_queue_stopped() makes it possible to test whether or
not a non-mq queue has been stopped. Introduce a helper function that
allows block drivers to easily query whether or not one or more
hardware contexts of a blk-mq queue have been stopped.
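For illustration only (this caller is hypothetical and not part of this
patch), a block driver could use the new helper to avoid rerunning
queues that have already been stopped:

	/* Hypothetical caller: only kick the queue if none of its
	 * hardware contexts has been stopped.
	 */
	static void example_kick_queue(struct request_queue *q)
	{
		if (!blk_mq_queue_stopped(q))
			blk_mq_run_hw_queues(q, true);
	}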
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 17 +++++++++++++++++
include/linux/blk-mq.h | 1 +
2 files changed, 18 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dcafa6..6929a52 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -965,6 +965,23 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
}
EXPORT_SYMBOL(blk_mq_run_hw_queues);
+/*
+ * The caller is responsible for serializing this function against
+ * blk_mq_{start,stop}_hw_queue().
+ */
+bool blk_mq_queue_stopped(struct request_queue *q)
+{
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+
+ queue_for_each_hw_ctx(q, hctx, i)
+ if (test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL(blk_mq_queue_stopped);
+
void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
{
cancel_delayed_work(&hctx->run_work);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e43bbff..bd677bc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -235,6 +235,7 @@ void blk_mq_kick_requeue_list(struct request_queue *q);
void blk_mq_abort_requeue_list(struct request_queue *q);
void blk_mq_complete_request(struct request *rq, int error);
+bool blk_mq_queue_stopped(struct request_queue *q);
void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
void blk_mq_stop_hw_queues(struct request_queue *q);
--
2.9.3
* [PATCH 2/9] dm: Rename a function argument
From: Bart Van Assche @ 2016-08-31 22:16 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Rename 'interruptible' to 'sleep_state' to make it clear that this
argument is a task state instead of a boolean.
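For illustration, the two typical ways of calling the function then
read as follows (sketch):

	/* Allow a pending signal to interrupt the wait: */
	r = dm_wait_for_completion(md, TASK_INTERRUPTIBLE);

	/* Or wait unconditionally until all requests have finished: */
	r = dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);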
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fa9b1cb..1d3627c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1934,7 +1934,8 @@ void dm_put(struct mapped_device *md)
}
EXPORT_SYMBOL_GPL(dm_put);
-static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
+/* @sleep_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE */
+static int dm_wait_for_completion(struct mapped_device *md, int sleep_state)
{
int r = 0;
DECLARE_WAITQUEUE(wait, current);
@@ -1942,12 +1943,12 @@ static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
add_wait_queue(&md->wait, &wait);
while (1) {
- set_current_state(interruptible);
+ set_current_state(sleep_state);
if (!md_in_flight(md))
break;
- if (interruptible == TASK_INTERRUPTIBLE &&
+ if (sleep_state == TASK_INTERRUPTIBLE &&
signal_pending(current)) {
r = -EINTR;
break;
@@ -2075,6 +2076,10 @@ static void unlock_fs(struct mapped_device *md)
}
/*
+ * @suspend_flags: DM_SUSPEND_LOCKFS_FLAG and/or DM_SUSPEND_NOFLUSH_FLAG
+ * @sleep_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE
+ * @dmf_suspended_flag: DMF_SUSPENDED or DMF_SUSPENDED_INTERNALLY
+ *
* If __dm_suspend returns 0, the device is completely quiescent
* now. There is no request-processing activity. All new requests
* are being added to md->deferred list.
@@ -2082,7 +2087,7 @@ static void unlock_fs(struct mapped_device *md)
* Caller must hold md->suspend_lock
*/
static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
- unsigned suspend_flags, int interruptible,
+ unsigned suspend_flags, int sleep_state,
int dmf_suspended_flag)
{
bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
@@ -2149,7 +2154,7 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
* We call dm_wait_for_completion to wait for all existing requests
* to finish.
*/
- r = dm_wait_for_completion(md, interruptible);
+ r = dm_wait_for_completion(md, sleep_state);
if (!r)
set_bit(dmf_suspended_flag, &md->flags);
--
2.9.3
* [PATCH 3/9] dm: Introduce signal_pending_state()
From: Bart Van Assche @ 2016-08-31 22:16 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Use signal_pending_state() instead of open-coding it. This patch does
not change any functionality but makes it possible to pass TASK_KILLABLE
as the second argument of dm_wait_for_completion(). See also commit
16882c1e962b ("sched: fix TASK_WAKEKILL vs SIGKILL race").
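For reference, signal_pending_state() implements roughly the following
logic (sketch; see include/linux/sched.h for the authoritative
version):

	/* A pending signal only interrupts TASK_INTERRUPTIBLE sleeps;
	 * a TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) sleep
	 * is only interrupted by a fatal signal.
	 */
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!signal_pending(p))
		return 0;
	return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);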
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1d3627c..7cce09e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1948,8 +1948,7 @@ static int dm_wait_for_completion(struct mapped_device *md, int sleep_state)
if (!md_in_flight(md))
break;
- if (sleep_state == TASK_INTERRUPTIBLE &&
- signal_pending(current)) {
+ if (signal_pending_state(sleep_state, current)) {
r = -EINTR;
break;
}
--
2.9.3
* [PATCH 4/9] dm: Convert wait loops
From: Bart Van Assche @ 2016-08-31 22:16 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Use autoremove_wake_function() instead of default_wake_function()
to make the dm wait loops more similar to other wait loops in the
kernel. This patch does not change any functionality.
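The converted loops follow the standard pattern shown below, where
'wq' and 'condition' stand for the wait queue head and the wake-up
condition (sketch):

	DEFINE_WAIT(wait);

	while (1) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (condition)
			break;
		io_schedule();
	}
	finish_wait(&wq, &wait);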
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm-mpath.c | 10 +++-------
drivers/md/dm.c | 10 +++-------
2 files changed, 6 insertions(+), 14 deletions(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ac734e5..8df7b12 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1193,21 +1193,17 @@ static int multipath_ctr(struct dm_target *ti, unsigned argc, char **argv)
static void multipath_wait_for_pg_init_completion(struct multipath *m)
{
- DECLARE_WAITQUEUE(wait, current);
-
- add_wait_queue(&m->pg_init_wait, &wait);
+ DEFINE_WAIT(wait);
while (1) {
- set_current_state(TASK_UNINTERRUPTIBLE);
+ prepare_to_wait(&m->pg_init_wait, &wait, TASK_UNINTERRUPTIBLE);
if (!atomic_read(&m->pg_init_in_progress))
break;
io_schedule();
}
- set_current_state(TASK_RUNNING);
-
- remove_wait_queue(&m->pg_init_wait, &wait);
+ finish_wait(&m->pg_init_wait, &wait);
}
static void flush_multipath_work(struct multipath *m)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cce09e..090ef31 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1938,12 +1938,10 @@ EXPORT_SYMBOL_GPL(dm_put);
static int dm_wait_for_completion(struct mapped_device *md, int sleep_state)
{
int r = 0;
- DECLARE_WAITQUEUE(wait, current);
-
- add_wait_queue(&md->wait, &wait);
+ DEFINE_WAIT(wait);
while (1) {
- set_current_state(sleep_state);
+ prepare_to_wait(&md->wait, &wait, sleep_state);
if (!md_in_flight(md))
break;
@@ -1955,9 +1953,7 @@ static int dm_wait_for_completion(struct mapped_device *md, int sleep_state)
io_schedule();
}
- set_current_state(TASK_RUNNING);
-
- remove_wait_queue(&md->wait, &wait);
+ finish_wait(&md->wait, &wait);
return r;
}
--
2.9.3
* [PATCH 5/9] dm: Add two lockdep_assert_held() statements
From: Bart Van Assche @ 2016-08-31 22:17 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Document the locking assumptions for the __bind() and __dm_suspend()
functions.
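lockdep_assert_held() compiles away unless lockdep is enabled, in which
case it triggers a warning when the caller does not hold the lock. A
minimal sketch of the calling convention (hypothetical helper, for
illustration only):

	static void example_requires_suspend_lock(struct mapped_device *md)
	{
		/* With lockdep enabled, warn if the caller did not take
		 * md->suspend_lock before calling this function.
		 */
		lockdep_assert_held(&md->suspend_lock);

		/* ... code that relies on suspend_lock being held ... */
	}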
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 090ef31..0775411 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1648,6 +1648,8 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
struct request_queue *q = md->queue;
sector_t size;
+ lockdep_assert_held(&md->suspend_lock);
+
size = dm_table_get_size(t);
/*
@@ -2089,6 +2091,8 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG;
int r;
+ lockdep_assert_held(&md->suspend_lock);
+
/*
* DMF_NOFLUSH_SUSPENDING must be set before presuspend.
* This flag is cleared before dm_suspend returns.
--
2.9.3
* [PATCH 6/9] dm: Simplify dm_old_stop_queue()
From: Bart Van Assche @ 2016-08-31 22:17 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Let dm_old_stop_queue() test blk_queue_stopped() inside a single
critical section instead of unlocking and returning early. This patch
does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm-rq.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1ca7463..8dc8cfb 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -89,12 +89,8 @@ static void dm_old_stop_queue(struct request_queue *q)
unsigned long flags;
spin_lock_irqsave(q->queue_lock, flags);
- if (blk_queue_stopped(q)) {
- spin_unlock_irqrestore(q->queue_lock, flags);
- return;
- }
-
- blk_stop_queue(q);
+ if (!blk_queue_stopped(q))
+ blk_stop_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
}
--
2.9.3
* [PATCH 7/9] dm: Mark block layer queue dead before destroying the dm device
From: Bart Van Assche @ 2016-08-31 22:17 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
This prevents new requests from being queued while __dm_destroy() is in
progress.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 0775411..4aa7030 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1875,6 +1875,7 @@ EXPORT_SYMBOL_GPL(dm_device_name);
static void __dm_destroy(struct mapped_device *md, bool wait)
{
+ struct request_queue *q = dm_get_md_queue(md);
struct dm_table *map;
int srcu_idx;
@@ -1885,6 +1886,10 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
set_bit(DMF_FREEING, &md->flags);
spin_unlock(&_minor_lock);
+ spin_lock_irq(q->queue_lock);
+ queue_flag_set(QUEUE_FLAG_DYING, q);
+ spin_unlock_irq(q->queue_lock);
+
if (dm_request_based(md) && md->kworker_task)
flush_kthread_worker(&md->kworker);
--
2.9.3
* [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Bart Van Assche @ 2016-08-31 22:18 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
calls have finished before setting the "queue stopped" flag. This
makes it possible to remove the "queue stopped" test from
dm_mq_queue_rq() and dm_mq_requeue_request(). Use BLK_MQ_S_STOPPED
instead of QUEUE_FLAG_STOPPED.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm-rq.c | 29 +++++++----------------------
1 file changed, 7 insertions(+), 22 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 8dc8cfb..b5db523 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -78,7 +78,6 @@ void dm_start_queue(struct request_queue *q)
if (!q->mq_ops)
dm_old_start_queue(q);
else {
- queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, q);
blk_mq_start_stopped_hw_queues(q, true);
blk_mq_kick_requeue_list(q);
}
@@ -98,13 +97,13 @@ void dm_stop_queue(struct request_queue *q)
{
if (!q->mq_ops)
dm_old_stop_queue(q);
- else {
- spin_lock_irq(q->queue_lock);
- queue_flag_set(QUEUE_FLAG_STOPPED, q);
- spin_unlock_irq(q->queue_lock);
-
+ else if (!blk_mq_queue_stopped(q)) {
+ /* Wait until dm_mq_queue_rq() has finished. */
+ blk_mq_freeze_queue(q);
+ /* Avoid that requeuing could restart the queue. */
blk_mq_cancel_requeue_work(q);
blk_mq_stop_hw_queues(q);
+ blk_mq_unfreeze_queue(q);
}
}
@@ -318,13 +317,10 @@ static void dm_old_requeue_request(struct request *rq)
static void dm_mq_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;
- unsigned long flags;
blk_mq_requeue_request(rq);
- spin_lock_irqsave(q->queue_lock, flags);
- if (!blk_queue_stopped(q))
- blk_mq_kick_requeue_list(q);
- spin_unlock_irqrestore(q->queue_lock, flags);
+ WARN_ON_ONCE(blk_mq_queue_stopped(q));
+ blk_mq_kick_requeue_list(q);
}
static void dm_requeue_original_request(struct mapped_device *md,
@@ -867,17 +863,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
dm_put_live_table(md, srcu_idx);
}
- /*
- * On suspend dm_stop_queue() handles stopping the blk-mq
- * request_queue BUT: even though the hw_queues are marked
- * BLK_MQ_S_STOPPED at that point there is still a race that
- * is allowing block/blk-mq.c to call ->queue_rq against a
- * hctx that it really shouldn't. The following check guards
- * against this rarity (albeit _not_ race-free).
- */
- if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
- return BLK_MQ_RQ_QUEUE_BUSY;
-
if (ti->type->busy && ti->type->busy(ti))
return BLK_MQ_RQ_QUEUE_BUSY;
--
2.9.3
* [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop
From: Bart Van Assche @ 2016-08-31 22:18 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
If pg_init_retries is set and a request is queued against a
multipath device with all underlying block devices in the "dying"
state then an infinite loop is triggered because activate_path()
never succeeds and hence never calls pg_init_done(). Fix this by
making ql_select_path() skip dying paths.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm-queue-length.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm-queue-length.c b/drivers/md/dm-queue-length.c
index 23f1786..a283c66 100644
--- a/drivers/md/dm-queue-length.c
+++ b/drivers/md/dm-queue-length.c
@@ -199,11 +199,12 @@ static struct dm_path *ql_select_path(struct path_selector *ps, size_t nr_bytes)
list_move_tail(s->valid_paths.next, &s->valid_paths);
list_for_each_entry(pi, &s->valid_paths, list) {
- if (!best ||
- (atomic_read(&pi->qlen) < atomic_read(&best->qlen)))
+ if ((!best ||
+ atomic_read(&pi->qlen) < atomic_read(&best->qlen)) &&
+ !blk_queue_dying(pi->path->dev->bdev->bd_queue))
best = pi;
- if (!atomic_read(&best->qlen))
+ if (best && atomic_read(&best->qlen) == 0)
break;
}
--
2.9.3
* Re: [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop
From: Mike Snitzer @ 2016-09-01 2:49 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development
On Wed, Aug 31 2016 at 6:18pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> If pg_init_retries is set and a request is queued against a
> multipath device with all underlying block devices in the "dying"
> state then an infinite loop is triggered because activate_path()
> never succeeds and hence never calls pg_init_done(). Fix this by
> making ql_select_path() skip dying paths.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Assuming DM multipath needs to be sprinkling these dying queue checks so
deep (which I'm not yet sold on):
Same would be needed in service-time and round-robin right?
> ---
> drivers/md/dm-queue-length.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/dm-queue-length.c b/drivers/md/dm-queue-length.c
> index 23f1786..a283c66 100644
> --- a/drivers/md/dm-queue-length.c
> +++ b/drivers/md/dm-queue-length.c
> @@ -199,11 +199,12 @@ static struct dm_path *ql_select_path(struct path_selector *ps, size_t nr_bytes)
> list_move_tail(s->valid_paths.next, &s->valid_paths);
>
> list_for_each_entry(pi, &s->valid_paths, list) {
> - if (!best ||
> - (atomic_read(&pi->qlen) < atomic_read(&best->qlen)))
> + if ((!best ||
> + atomic_read(&pi->qlen) < atomic_read(&best->qlen)) &&
> + !blk_queue_dying(pi->path->dev->bdev->bd_queue))
> best = pi;
>
> - if (!atomic_read(&best->qlen))
> + if (best && atomic_read(&best->qlen) == 0)
> break;
> }
>
> --
> 2.9.3
>
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 3:13 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe, device-mapper development, hch
On Wed, Aug 31 2016 at 6:18pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
> calls have finished before setting the "queue stopped" flag. This
> makes it possible to remove the "queue stopped" test from
> dm_mq_queue_rq() and dm_mq_requeue_request(). Use BLK_MQ_S_STOPPED
> instead of QUEUE_FLAG_STOPPED.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
At first glance, at a minimum this patch needs a better header. It
seems you're doing 2 things:
1) using blk_mq_{freeze,unfreeze}_queue() actually makes dm_stop_queue()
work for blk-mq? Whereby fixing blk-mq race(s)?
2) switching away from QUEUE_FLAG_STOPPED to BLK_MQ_S_STOPPED (via
blk_mq_queue_stopped)
- not clear to me that dm-mq's use of QUEUE_FLAG_STOPPED wasn't fine;
NVMe also uses it for blk-mq
> ---
> drivers/md/dm-rq.c | 29 +++++++----------------------
> 1 file changed, 7 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 8dc8cfb..b5db523 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -78,7 +78,6 @@ void dm_start_queue(struct request_queue *q)
> if (!q->mq_ops)
> dm_old_start_queue(q);
> else {
> - queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, q);
> blk_mq_start_stopped_hw_queues(q, true);
> blk_mq_kick_requeue_list(q);
> }
> @@ -98,13 +97,13 @@ void dm_stop_queue(struct request_queue *q)
> {
> if (!q->mq_ops)
> dm_old_stop_queue(q);
> - else {
> - spin_lock_irq(q->queue_lock);
> - queue_flag_set(QUEUE_FLAG_STOPPED, q);
> - spin_unlock_irq(q->queue_lock);
> -
> + else if (!blk_mq_queue_stopped(q)) {
> + /* Wait until dm_mq_queue_rq() has finished. */
> + blk_mq_freeze_queue(q);
> + /* Avoid that requeuing could restart the queue. */
> blk_mq_cancel_requeue_work(q);
> blk_mq_stop_hw_queues(q);
> + blk_mq_unfreeze_queue(q);
> }
> }
>
> @@ -318,13 +317,10 @@ static void dm_old_requeue_request(struct request *rq)
> static void dm_mq_requeue_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
> - unsigned long flags;
>
> blk_mq_requeue_request(rq);
> - spin_lock_irqsave(q->queue_lock, flags);
> - if (!blk_queue_stopped(q))
> - blk_mq_kick_requeue_list(q);
> - spin_unlock_irqrestore(q->queue_lock, flags);
> + WARN_ON_ONCE(blk_mq_queue_stopped(q));
> + blk_mq_kick_requeue_list(q);
> }
>
> static void dm_requeue_original_request(struct mapped_device *md,
> @@ -867,17 +863,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> dm_put_live_table(md, srcu_idx);
> }
>
> - /*
> - * On suspend dm_stop_queue() handles stopping the blk-mq
> - * request_queue BUT: even though the hw_queues are marked
> - * BLK_MQ_S_STOPPED at that point there is still a race that
> - * is allowing block/blk-mq.c to call ->queue_rq against a
> - * hctx that it really shouldn't. The following check guards
> - * against this rarity (albeit _not_ race-free).
> - */
> - if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
> - return BLK_MQ_RQ_QUEUE_BUSY;
> -
> if (ti->type->busy && ti->type->busy(ti))
> return BLK_MQ_RQ_QUEUE_BUSY;
>
> --
> 2.9.3
>
* Re: [PATCH 2/9] dm: Rename a function argument
From: Mike Snitzer @ 2016-09-01 3:29 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development
On Wed, Aug 31 2016 at 6:16pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> Rename 'interruptible' to 'sleep_state' to make it clear that this
> argument is a task state instead of a boolean.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Shouldn't the type also be changed from int to long (to match 'state'
member from 'struct task_struct')?
> ---
> drivers/md/dm.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index fa9b1cb..1d3627c 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1934,7 +1934,8 @@ void dm_put(struct mapped_device *md)
> }
> EXPORT_SYMBOL_GPL(dm_put);
>
> -static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
> +/* @sleep_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE */
> +static int dm_wait_for_completion(struct mapped_device *md, int sleep_state)
> {
> int r = 0;
> DECLARE_WAITQUEUE(wait, current);
> @@ -1942,12 +1943,12 @@ static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
> add_wait_queue(&md->wait, &wait);
>
> while (1) {
> - set_current_state(interruptible);
> + set_current_state(sleep_state);
>
> if (!md_in_flight(md))
> break;
>
> - if (interruptible == TASK_INTERRUPTIBLE &&
> + if (sleep_state == TASK_INTERRUPTIBLE &&
> signal_pending(current)) {
> r = -EINTR;
> break;
> @@ -2075,6 +2076,10 @@ static void unlock_fs(struct mapped_device *md)
> }
>
> /*
> + * @suspend_flags: DM_SUSPEND_LOCKFS_FLAG and/or DM_SUSPEND_NOFLUSH_FLAG
> + * @sleep_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE
> + * @dmf_suspended_flag: DMF_SUSPENDED or DMF_SUSPENDED_INTERNALLY
> + *
> * If __dm_suspend returns 0, the device is completely quiescent
> * now. There is no request-processing activity. All new requests
> * are being added to md->deferred list.
> @@ -2082,7 +2087,7 @@ static void unlock_fs(struct mapped_device *md)
> * Caller must hold md->suspend_lock
> */
> static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
> - unsigned suspend_flags, int interruptible,
> + unsigned suspend_flags, int sleep_state,
> int dmf_suspended_flag)
> {
> bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
> @@ -2149,7 +2154,7 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
> * We call dm_wait_for_completion to wait for all existing requests
> * to finish.
> */
> - r = dm_wait_for_completion(md, interruptible);
> + r = dm_wait_for_completion(md, sleep_state);
> if (!r)
> set_bit(dmf_suspended_flag, &md->flags);
>
> --
> 2.9.3
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
* Re: [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop
From: Bart Van Assche @ 2016-09-01 14:14 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
On 08/31/16 20:29, Mike Snitzer wrote:
> On Wed, Aug 31 2016 at 6:18pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> If pg_init_retries is set and a request is queued against a
>> multipath device with all underlying block devices in the "dying"
>> state then an infinite loop is triggered because activate_path()
>> never succeeds and hence never calls pg_init_done(). Fix this by
>> making ql_select_path() skip dying paths.
>
> Assuming DM multipath needs to be sprinkling these dying queue checks so
> deep (which I'm not yet sold on):
>
> Same would be needed in service-time and round-robin right?
Hello Mike,
Before addressing service-time and round-robin path selectors I wanted
to make sure that we reach agreement about how to fix the queue length
path selector.
Do you have a proposal for an alternative approach to fix the infinite
loop that can be triggered during device removal?
Thanks,
Bart.
* Re: [PATCH 2/9] dm: Rename a function argument
From: Bart Van Assche @ 2016-09-01 14:17 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
On 08/31/16 20:29, Mike Snitzer wrote:
> On Wed, Aug 31 2016 at 6:16pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> Rename 'interruptible' to 'sleep_state' to make it clear that this
>> argument is a task state instead of a boolean.
>
> Shouldn't the type also be changed from int to long (to match 'state'
> member from 'struct task_struct')?
Hello Mike,
Today TASK_STATE_MAX equals 2048. So I think an int is big enough even
on 32-bit platforms. Please let me know if you really want me to change
the type from int to long.
Bart.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Bart Van Assche @ 2016-09-01 14:23 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 08/31/16 20:14, Mike Snitzer wrote:
> On Wed, Aug 31 2016 at 6:18pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
>> calls have finished before setting the "queue stopped" flag. This
>> makes it possible to remove the "queue stopped" test from
>> dm_mq_queue_rq() and dm_mq_requeue_request(). Use BLK_MQ_S_STOPPED
>> instead of QUEUE_FLAG_STOPPED.
>
> At first glance, at a minimum this patch needs a better header. It
> seems you're doing 2 things:
>
> 1) using blk_mq_{freeze,unfreeze}_queue() actually makes dm_stop_queue()
> work for blk-mq? Whereby fixing blk-mq race(s)?
>
> 2) switching away from QUEUE_FLAG_STOPPED to BLK_MQ_S_STOPPED (via
> blk_mq_queue_stopped)
> - not clear to me that dm-mq's use of QUEUE_FLAG_STOPPED wasn't fine;
> NVMe also uses it for blk-mq
Hello Mike,
Adding the blk_mq_{freeze,unfreeze}_queue() calls is indeed what fixes
the race conditions related to stopping dm queues and what makes
dm_stop_queue() work.
If other blk-mq users and developers agree that QUEUE_FLAG_STOPPED
should be set for stopped blk-mq queues then I think the code to set
that flag should be moved into the blk-mq core. However, setting that
flag for blk-mq queues seems redundant to me. Hence my proposal to
introduce blk_mq_queue_stopped() instead.
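To make that concrete, the difference is roughly the following (sketch,
not actual dm code):

	/* Derive "stopped" from the per-hctx BLK_MQ_S_STOPPED bits ... */
	if (blk_mq_queue_stopped(q))
		return;

	/* ... instead of mirroring that state in a queue-wide flag and
	 * testing it with blk_queue_stopped(q) / QUEUE_FLAG_STOPPED.
	 */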
Bart.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 15:05 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 10:23am -0400,
Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
> On 08/31/16 20:14, Mike Snitzer wrote:
> > On Wed, Aug 31 2016 at 6:18pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >
> >> Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
> >> calls have finished before setting the "queue stopped" flag. This
> >> makes it possible to remove the "queue stopped" test from
> >> dm_mq_queue_rq() and dm_mq_requeue_request(). Use BLK_MQ_S_STOPPED
> >> instead of QUEUE_FLAG_STOPPED.
> >
> > At first glance, at a minimum this patch needs a better header. It
> > seems you're doing 2 things:
> >
> > 1) using blk_mq_{freeze,unfreeze}_queue() actually makes dm_stop_queue()
> > work for blk-mq? Whereby fixing blk-mq race(s)?
> >
> > 2) switching away from QUEUE_FLAG_STOPPED to BLK_MQ_S_STOPPED (via
> > blk_mq_queue_stopped)
> > - not clear to me that dm-mq's use of QUEUE_FLAG_STOPPED wasn't fine;
> > NVMe also uses it for blk-mq
>
> Hello Mike,
>
> Adding the blk_mq_{freeze,unfreeze}_queue() calls is indeed what fixes
> the race conditions related to stopping dm queues and what makes
> dm_stop_queue() work.
OK, thanks for confirming.
> If other blk-mq users and developers agree that QUEUE_FLAG_STOPPED
> should be set for stopped blk-mq queues then I think the code to set
> that flag should be moved into the blk-mq core. However, setting that
> flag for blk-mq queues seems redundant to me. Hence my proposal to
> introduce blk_mq_queue_stopped() instead.
This is a secondary issue that can be dealt with independently of the
rest of your DM patchset.
I've staged most of your changes (with slight tweaks), see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
Only remaining issue is the queue dying race(s) in dm-multipath.
* Re: [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop
From: Mike Snitzer @ 2016-09-01 15:06 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development
On Thu, Sep 01 2016 at 10:14am -0400,
Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
> On 08/31/16 20:29, Mike Snitzer wrote:
> > On Wed, Aug 31 2016 at 6:18pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >
> >> If pg_init_retries is set and a request is queued against a
> >> multipath device with all underlying block devices in the "dying"
> >> state then an infinite loop is triggered because activate_path()
> >> never succeeds and hence never calls pg_init_done(). Fix this by
> >> making ql_select_path() skip dying paths.
> >
> > Assuming DM multipath needs to be sprinkling these dying queue checks so
> > deep (which I'm not yet sold on):
> >
> > Same would be needed in service-time and round-robin right?
>
> Hello Mike,
>
> Before addressing service-time and round-robin path selectors I wanted
> to make sure that we reach agreement about how to fix the queue length
> path selector.
>
> Do you have a proposal for an alternative approach to fix the infinite
> loop that can be triggered during device removal?
I'm going to look closer now. But I'd prefer to see the "dying" state
check(s) elevated to DM multipath. Really would rather the path
selectors not have to worry about this state.
* Re: [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop
From: Bart Van Assche @ 2016-09-01 15:22 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
On 09/01/2016 08:06 AM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at 10:14am -0400,
> Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
>> On 08/31/16 20:29, Mike Snitzer wrote:
>>> On Wed, Aug 31 2016 at 6:18pm -0400,
>>> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>>
>>>> If pg_init_retries is set and a request is queued against a
>>>> multipath device with all underlying block devices in the "dying"
>>>> state then an infinite loop is triggered because activate_path()
>>>> never succeeds and hence never calls pg_init_done(). Fix this by
>>>> making ql_select_path() skip dying paths.
>>>
>>> Assuming DM multipath needs to be sprinkling these dying queue checks so
>>> deep (which I'm not yet sold on):
>>>
>>> Same would be needed in service-time and round-robin right?
>>
>> Hello Mike,
>>
>> Before addressing service-time and round-robin path selectors I wanted
>> to make sure that we reach agreement about how to fix the queue length
>> path selector.
>>
>> Do you have a proposal for an alternative approach to fix the infinite
>> loop that can be triggered during device removal?
>
> I'm going to look closer now. But I'd prefer to see the "dying" state
> check(s) elevated to DM multipath. Really would rather the path
> selectors not have to worry about this state.
Hello Mike,
How about making blk_cleanup_queue() invoke a callback function in dm or
dm-mpath and using that callback function to keep track of the number
of paths that are not in the "dying" state? That would allow the dm or
dm-mpath driver to detect whether or not all paths are in the dying
state without having to modify every path selector. This is just an idea
- there might be better alternatives.
Bart.
* Re: [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop
From: Mike Snitzer @ 2016-09-01 15:26 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development
On Thu, Sep 01 2016 at 11:22P -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 08:06 AM, Mike Snitzer wrote:
> >On Thu, Sep 01 2016 at 10:14am -0400,
> >Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
> >
> >>On 08/31/16 20:29, Mike Snitzer wrote:
> >>>On Wed, Aug 31 2016 at 6:18pm -0400,
> >>>Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>>
> >>>>If pg_init_retries is set and a request is queued against a
> >>>>multipath device with all underlying block devices in the "dying"
> >>>>state then an infinite loop is triggered because activate_path()
> >>>>never succeeds and hence never calls pg_init_done(). Fix this by
> >>>>making ql_select_path() skip dying paths.
> >>>
> >>>Assuming DM multipath needs to be sprinkling these dying queue checks so
> >>>deep (which I'm not yet sold on):
> >>>
> >>>Same would be needed in service-time and round-robin right?
> >>
> >>Hello Mike,
> >>
> >>Before addressing service-time and round-robin path selectors I wanted
> >>to make sure that we reach agreement about how to fix the queue length
> >>path selector.
> >>
> >>Do you have a proposal for an alternative approach to fix the infinite
> >>loop that can be triggered during device removal?
> >
> >I'm going to look closer now. But I'd prefer to see the "dying" state
> >check(s) elevated to DM multipath. Really would rather the path
> >selectors not have to worry about this state.
>
> Hello Mike,
>
> How about making blk_cleanup_queue() invoke a callback function in dm or
> dm-mpath and using that callback function to keep track of the number of
> paths that are not in the "dying" state? That would allow the dm or
> dm-mpath driver to detect whether or not all paths are in the dying state
> without having to modify every path selector. This is just an idea - there
> might be better alternatives.
Even that seems like overkill. What about this? Any chance you could
try the linux-dm.git 'dm-4.9' branch with this patch ontop?
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ac734e5..15db5e9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1521,10 +1521,10 @@ static void activate_path(struct work_struct *work)
{
struct pgpath *pgpath =
container_of(work, struct pgpath, activate_path.work);
+ struct request_queue *q = bdev_get_queue(pgpath->path.dev->bdev);
- if (pgpath->is_active)
- scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
- pg_init_done, pgpath);
+ if (pgpath->is_active && !blk_queue_dying(q))
+ scsi_dh_activate(q, pg_init_done, pgpath);
else
pg_init_done(pgpath, SCSI_DH_DEV_OFFLINED);
}
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Bart Van Assche @ 2016-09-01 15:31 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 08:05 AM, Mike Snitzer wrote:
> I've staged most of your changes (with slight tweaks), see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
>
> Only remaining issue is the queue dying race(s) in dm-multipath.
Thanks Mike! Two minor comments though:
* In dm_start_queue(), I think that the queue_flag_clear_unlocked()
call should be converted into queue_flag_clear() and that it should
be protected by the block layer queue lock. Every call of
queue_flag_clear_unlocked() after block device initialization has
finished is wrong if blk_cleanup_queue() can be called concurrently.
* I think that adding blk_mq_{freeze,unfreeze}_queue() calls in
dm_stop_queue() makes it possible to remove the "queue stopped" test
not only from dm_mq_queue_rq() but also from dm_mq_requeue_request().
Bart.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 15:50 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 11:31am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 08:05 AM, Mike Snitzer wrote:
> >I've staged most of your changes (with slight tweaks), see:
> >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
> >
> >Only remaining issue is the queue dying race(s) in dm-multipath.
>
> Thanks Mike! Two minor comments though:
> * In dm_start_queue(), I think that the queue_flag_clear_unlocked()
> call should be converted into queue_flag_clear() and that it should
> be protected by the block layer queue lock. Every call of
> queue_flag_clear_unlocked() after block device initialization has
> finished is wrong if blk_cleanup_queue() can be called concurrently.
OK, I'll have a look.
> * I think that adding blk_mq_{freeze,unfreeze}_queue() calls in
> dm_stop_queue() makes it possible to remove the "queue stopped" test
> not only from dm_mq_queue_rq() but also from dm_mq_requeue_request().
I'm aware you think that, but I need to circle back to
dm_mq_requeue_request() vs blk_queue_stopped(). The code as-is isn't a
problem; it just might be an extra check that isn't needed any more.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 16:12 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 11:50am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Thu, Sep 01 2016 at 11:31am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> > On 09/01/2016 08:05 AM, Mike Snitzer wrote:
> > >I've staged most of your changes (with slight tweaks), see:
> > >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
> > >
> > >Only remaining issue is the queue dying race(s) in dm-multipath.
> >
> > Thanks Mike! Two minor comments though:
> > * In dm_start_queue(), I think that the queue_flag_clear_unlocked()
> > call should be converted into queue_flag_clear() and that it should
> > be protected by the block layer queue lock. Every call of
> > queue_flag_clear_unlocked() after block device initialization has
> > finished is wrong if blk_cleanup_queue() can be called concurrently.
>
> OK, I'll have a look.
Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
on top of dm-4.8):
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
Thanks.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Bart Van Assche @ 2016-09-01 17:59 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 09:12 AM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at 11:50am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> On Thu, Sep 01 2016 at 11:31am -0400,
>> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>
>>> On 09/01/2016 08:05 AM, Mike Snitzer wrote:
>>>> I've staged most of your changes (with slight tweaks), see:
>>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
>>>>
>>>> Only remaining issue is the queue dying race(s) in dm-multipath.
>>>
>>> Thanks Mike! Two minor comments though:
>>> * In dm_start_queue(), I think that the queue_flag_clear_unlocked()
>>> call should be converted into queue_flag_clear() and that it should
>>> be protected by the block layer queue lock. Every call of
>>> queue_flag_clear_unlocked() after block device initialization has
>>> finished is wrong if blk_cleanup_queue() can be called concurrently.
>>
>> OK, I'll have a look.
>
> Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
> on top of dm-4.8):
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
Hello Mike,
The result of my tests of the dm-4.9 branch is as follows:
* With patch "dm mpath: check if path's request_queue is dying in
activate_path()" I still see every now and then that CPU usage of one of
the kworker threads jumps to 100%.
* A "if (!blk_queue_stopped(q))" test needs to be added in
dm_stop_queue() to avoid the following hang (that test was present in my
version of the patch that adds the blk_mq_{freeze,unfreeze}_queue() calls):
sysrq: SysRq : Show Blocked State
task PC stack pid father
multipathd D ffff8803c8d37b80 0 3242 1 0x00000000
Call Trace:
[<ffffffff81627087>] schedule+0x37/0x90
[<ffffffff813097e1>] blk_mq_freeze_queue_wait+0x51/0xb0
[<ffffffff8130be05>] blk_mq_freeze_queue+0x15/0x20
[<ffffffffa034d882>] dm_stop_queue+0x62/0xc0 [dm_mod]
[<ffffffffa0342a1b>] dm_swap_table+0x2fb/0x370 [dm_mod]
[<ffffffffa0347875>] dev_suspend+0x95/0x220 [dm_mod]
[<ffffffffa03480fc>] ctl_ioctl+0x1fc/0x550 [dm_mod]
[<ffffffffa034845e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
[<ffffffff811ee27f>] do_vfs_ioctl+0x8f/0x690
[<ffffffff811ee8bc>] SyS_ioctl+0x3c/0x70
[<ffffffff8162d125>] entry_SYSCALL_64_fastpath+0x18/0xa8
Thanks,
Bart.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 19:05 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 1:59pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 09:12 AM, Mike Snitzer wrote:
> >On Thu, Sep 01 2016 at 11:50am -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >
> >>On Thu, Sep 01 2016 at 11:31am -0400,
> >>Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>
> >>>On 09/01/2016 08:05 AM, Mike Snitzer wrote:
> >>>>I've staged most of your changes (with slight tweaks), see:
> >>>>https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
> >>>>
> >>>>Only remaining issue is the queue dying race(s) in dm-multipath.
> >>>
> >>>Thanks Mike! Two minor comments though:
> >>>* In dm_start_queue(), I think that the queue_flag_clear_unlocked()
> >>> call should be converted into queue_flag_clear() and that it should
> >>> be protected by the block layer queue lock. Every call of
> >>> queue_flag_clear_unlocked() after block device initialization has
> >>> finished is wrong if blk_cleanup_queue() can be called concurrently.
> >>
> >>OK, I'll have a look.
> >
> >Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
> >on top of dm-4.8):
> >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
> >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
>
> Hello Mike,
>
> The result of my tests of the dm-4.9 branch is as follows:
> * With patch "dm mpath: check if path's request_queue is dying in
> activate_path()" I still see every now and then that CPU usage of
> one of the kworker threads jumps to 100%.
So you're saying that the dying queue check is still needed in the path
selector? Would be useful to know why the 100% is occurring. Can you
get a stack trace during this time?
> * A "if (!blk_queue_stopped(q))" test needs to be added in
> dm_stop_queue() to avoid the following hang (that test was present
> in my version of the patch that adds the
> blk_mq_{freeze,unfreeze}_queue() calls):
>
> sysrq: SysRq : Show Blocked State
> task PC stack pid father
> multipathd D ffff8803c8d37b80 0 3242 1 0x00000000
> Call Trace:
> [<ffffffff81627087>] schedule+0x37/0x90
> [<ffffffff813097e1>] blk_mq_freeze_queue_wait+0x51/0xb0
> [<ffffffff8130be05>] blk_mq_freeze_queue+0x15/0x20
> [<ffffffffa034d882>] dm_stop_queue+0x62/0xc0 [dm_mod]
> [<ffffffffa0342a1b>] dm_swap_table+0x2fb/0x370 [dm_mod]
> [<ffffffffa0347875>] dev_suspend+0x95/0x220 [dm_mod]
> [<ffffffffa03480fc>] ctl_ioctl+0x1fc/0x550 [dm_mod]
> [<ffffffffa034845e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
> [<ffffffff811ee27f>] do_vfs_ioctl+0x8f/0x690
> [<ffffffff811ee8bc>] SyS_ioctl+0x3c/0x70
> [<ffffffff8162d125>] entry_SYSCALL_64_fastpath+0x18/0xa8
OK, I've adjusted accordingly and pushed dm-4.8 and dm-4.9 again (with
force, sorry about that).
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 19:35 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 3:05pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Thu, Sep 01 2016 at 1:59pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> > * A "if (!blk_queue_stopped(q))" test needs to be added in
> > dm_stop_queue() to avoid the following hang (that test was present
> > in my version of the patch that adds the
> > blk_mq_{freeze,unfreeze}_queue() calls):
> >
> > sysrq: SysRq : Show Blocked State
> > task PC stack pid father
> > multipathd D ffff8803c8d37b80 0 3242 1 0x00000000
> > Call Trace:
> > [<ffffffff81627087>] schedule+0x37/0x90
> > [<ffffffff813097e1>] blk_mq_freeze_queue_wait+0x51/0xb0
> > [<ffffffff8130be05>] blk_mq_freeze_queue+0x15/0x20
> > [<ffffffffa034d882>] dm_stop_queue+0x62/0xc0 [dm_mod]
> > [<ffffffffa0342a1b>] dm_swap_table+0x2fb/0x370 [dm_mod]
> > [<ffffffffa0347875>] dev_suspend+0x95/0x220 [dm_mod]
> > [<ffffffffa03480fc>] ctl_ioctl+0x1fc/0x550 [dm_mod]
> > [<ffffffffa034845e>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
> > [<ffffffff811ee27f>] do_vfs_ioctl+0x8f/0x690
> > [<ffffffff811ee8bc>] SyS_ioctl+0x3c/0x70
> > [<ffffffff8162d125>] entry_SYSCALL_64_fastpath+0x18/0xa8
>
> OK, I've adjusted accordingly and pushed dm-4.8 and dm-4.9 again (with
> force, sorry about that).
I'm testing the result.. needs more time. I'll let you know once my
mptest-based testing passes. (I think I'm holding the q->lock too
long in dm_mq_stop_queue()).
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Bart Van Assche @ 2016-09-01 20:15 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 12:05 PM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at 1:59pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> On 09/01/2016 09:12 AM, Mike Snitzer wrote:
>>> Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
>>> on top of dm-4.8):
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
>>
>> Hello Mike,
>>
>> The result of my tests of the dm-4.9 branch is as follows:
>> * With patch "dm mpath: check if path's request_queue is dying in
>> activate_path()" I still see every now and then that CPU usage of
>> one of the kworker threads jumps to 100%.
>
> So you're saying that the dying queue check is still needed in the path
> selector? Would be useful to know why the 100% is occurring. Can you
> get a stack trace during this time?
Hello Mike,
A few days ago I had already tried to obtain a stack trace with perf but
the information reported by perf wasn't entirely accurate. What I know
about that 100% CPU usage is as follows:
* "dmsetup table" showed three SRP SCSI device nodes but these SRP SCSI
device nodes were not visible in /sys/block. This means that
scsi_remove_host() had already removed these from sysfs.
* hctx->run_work kept being requeued over and over again on the kernel
thread with name "kworker/3:1H". I assume this means that
blk_mq_run_hw_queue() was called with the second argument (async) set
to true. This probably means that the following dm-rq code was
triggered:
if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE) {
/* Undo dm_start_request() before requeuing */
rq_end_stats(md, rq);
rq_completed(md, rq_data_dir(rq), false);
return BLK_MQ_RQ_QUEUE_BUSY;
}
Bart.
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
From: Mike Snitzer @ 2016-09-01 20:33 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 4:15pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 12:05 PM, Mike Snitzer wrote:
> >On Thu, Sep 01 2016 at 1:59pm -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>On 09/01/2016 09:12 AM, Mike Snitzer wrote:
> >>>Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
> >>>on top of dm-4.8):
> >>>https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
> >>>https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
> >>
> >>Hello Mike,
> >>
> >>The result of my tests of the dm-4.9 branch is as follows:
> >>* With patch "dm mpath: check if path's request_queue is dying in
> >>activate_path()" I still see every now and then that CPU usage of
> >>one of the kworker threads jumps to 100%.
> >
> >So you're saying that the dying queue check is still needed in the path
> >selector? Would be useful to know why the 100% is occurring. Can you
> >get a stack trace during this time?
>
> Hello Mike,
>
> A few days ago I had already tried to obtain a stack trace with perf
> but the information reported by perf wasn't entirely accurate. What
> I know about that 100% CPU usage is as follows:
> * "dmsetup table" showed three SRP SCSI device nodes but these SRP SCSI
> device nodes were not visible in /sys/block. This means that
> scsi_remove_host() had already removed these from sysfs.
> * hctx->run_work kept being requeued over and over again on the kernel
> thread with name "kworker/3:1H". I assume this means that
> blk_mq_run_hw_queue() was called with the second argument (async) set
> to true. This probably means that the following dm-rq code was
> triggered:
>
> if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE) {
> /* Undo dm_start_request() before requeuing */
> rq_end_stats(md, rq);
> rq_completed(md, rq_data_dir(rq), false);
> return BLK_MQ_RQ_QUEUE_BUSY;
> }
I'm able to easily reproduce this 100% cpu usage using mptest's
test_02_sdev_delete.
'dmsetup suspend --nolockfs --noflush mp' hangs, seems rooted in your
use of blk_mq_freeze_queue():
[ 298.136930] dmsetup D ffff880142cb3b70 0 9478 9414 0x00000080
[ 298.144831] ffff880142cb3b70 ffff880142cb3b28 ffff880330d6cb00 ffff88032d0022f8
[ 298.153132] ffff880142cb4000 ffff88032d0022f8 ffff88032b161800 0000000000000001
[ 298.161438] 0000000000000001 ffff880142cb3b88 ffffffff816c06e5 ffff88032d001aa0
[ 298.169740] Call Trace:
[ 298.172473] [<ffffffff816c06e5>] schedule+0x35/0x80
[ 298.178019] [<ffffffff8131b937>] blk_mq_freeze_queue_wait+0x57/0xc0
[ 298.185116] [<ffffffff810c58c0>] ? prepare_to_wait_event+0xf0/0xf0
[ 298.192117] [<ffffffff8131d92a>] blk_mq_freeze_queue+0x1a/0x20
[ 298.198734] [<ffffffffa000e910>] dm_stop_queue+0x50/0xc0 [dm_mod]
[ 298.205644] [<ffffffffa0001824>] __dm_suspend+0x134/0x1f0 [dm_mod]
[ 298.212649] [<ffffffffa00035b8>] dm_suspend+0xb8/0xd0 [dm_mod]
[ 298.219270] [<ffffffffa000882e>] dev_suspend+0x18e/0x240 [dm_mod]
[ 298.226175] [<ffffffffa00086a0>] ? table_load+0x380/0x380 [dm_mod]
[ 298.233180] [<ffffffffa0009027>] ctl_ioctl+0x1e7/0x4d0 [dm_mod]
[ 298.239890] [<ffffffff81197f00>] ? lru_cache_add_active_or_unevictable+0x10/0xb0
[ 298.248253] [<ffffffffa0009323>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 298.255049] [<ffffffff81227937>] do_vfs_ioctl+0xa7/0x5d0
[ 298.261081] [<ffffffff8112787f>] ? __audit_syscall_entry+0xaf/0x100
[ 298.268178] [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[ 298.275179] [<ffffffff81227ed9>] SyS_ioctl+0x79/0x90
[ 298.280821] [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[ 298.286950] [<ffffffff816c4921>] entry_SYSCALL64_slow_path+0x25/0x25
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 20:33 ` Mike Snitzer
@ 2016-09-01 20:39 ` Bart Van Assche
2016-09-01 20:48 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2016-09-01 20:39 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 01:33 PM, Mike Snitzer wrote:
> I'm able to easily reproduce this 100% cpu usage using mptest's
> test_02_sdev_delete.
>
> 'dmsetup suspend --nolockfs --noflush mp' hangs, seems rooted in your
> use of blk_mq_freeze_queue():
>
> [ 298.136930] dmsetup D ffff880142cb3b70 0 9478 9414 0x00000080
> [ 298.144831] ffff880142cb3b70 ffff880142cb3b28 ffff880330d6cb00 ffff88032d0022f8
> [ 298.153132] ffff880142cb4000 ffff88032d0022f8 ffff88032b161800 0000000000000001
> [ 298.161438] 0000000000000001 ffff880142cb3b88 ffffffff816c06e5 ffff88032d001aa0
> [ 298.169740] Call Trace:
> [ 298.172473] [<ffffffff816c06e5>] schedule+0x35/0x80
> [ 298.178019] [<ffffffff8131b937>] blk_mq_freeze_queue_wait+0x57/0xc0
> [ 298.185116] [<ffffffff810c58c0>] ? prepare_to_wait_event+0xf0/0xf0
> [ 298.192117] [<ffffffff8131d92a>] blk_mq_freeze_queue+0x1a/0x20
> [ 298.198734] [<ffffffffa000e910>] dm_stop_queue+0x50/0xc0 [dm_mod]
> [ 298.205644] [<ffffffffa0001824>] __dm_suspend+0x134/0x1f0 [dm_mod]
> [ 298.212649] [<ffffffffa00035b8>] dm_suspend+0xb8/0xd0 [dm_mod]
> [ 298.219270] [<ffffffffa000882e>] dev_suspend+0x18e/0x240 [dm_mod]
> [ 298.226175] [<ffffffffa00086a0>] ? table_load+0x380/0x380 [dm_mod]
> [ 298.233180] [<ffffffffa0009027>] ctl_ioctl+0x1e7/0x4d0 [dm_mod]
> [ 298.239890] [<ffffffff81197f00>] ? lru_cache_add_active_or_unevictable+0x10/0xb0
> [ 298.248253] [<ffffffffa0009323>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
> [ 298.255049] [<ffffffff81227937>] do_vfs_ioctl+0xa7/0x5d0
> [ 298.261081] [<ffffffff8112787f>] ? __audit_syscall_entry+0xaf/0x100
> [ 298.268178] [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
> [ 298.275179] [<ffffffff81227ed9>] SyS_ioctl+0x79/0x90
> [ 298.280821] [<ffffffff81003a47>] do_syscall_64+0x67/0x160
> [ 298.286950] [<ffffffff816c4921>] entry_SYSCALL64_slow_path+0x25/0x25
>
Hello Mike,
In your dm-4.9 branch I see that you call blk_mq_freeze_queue() while
holding the block layer queue lock. Please don't do this.
blk_mq_freeze_queue() can sleep and as you know calling a sleeping
function while holding a spinlock is not allowed.
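In other words, the fix is to move the sleeping call outside the locked
region, roughly like this (a sketch with a made-up function name, not
the actual dm-4.9 code):

/*
 * Sketch only, hypothetical function name -- not the actual dm-4.9 code.
 * blk_mq_freeze_queue() may sleep in blk_mq_freeze_queue_wait(), so it
 * must not be called with q->queue_lock (or any other spinlock) held.
 */
static void dm_mq_stop_queue_sketch(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	/* only non-sleeping bookkeeping may happen under the lock */
	spin_unlock_irqrestore(q->queue_lock, flags);

	/* sleeping calls go outside the spinlock-protected region */
	blk_mq_freeze_queue(q);
	blk_mq_stop_hw_queues(q);
}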
Bart.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 20:39 ` Bart Van Assche
@ 2016-09-01 20:48 ` Mike Snitzer
2016-09-01 20:52 ` Bart Van Assche
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-01 20:48 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 4:39pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 01:33 PM, Mike Snitzer wrote:
> >I'm able to easily reproduce this 100% cpu usage using mptest's
> >test_02_sdev_delete.
> >
> >'dmsetup suspend --nolockfs --noflush mp' hangs, seems rooted in your
> >use of blk_mq_freeze_queue():
> >
> >[ 298.136930] dmsetup D ffff880142cb3b70 0 9478 9414 0x00000080
> >[ 298.144831] ffff880142cb3b70 ffff880142cb3b28 ffff880330d6cb00 ffff88032d0022f8
> >[ 298.153132] ffff880142cb4000 ffff88032d0022f8 ffff88032b161800 0000000000000001
> >[ 298.161438] 0000000000000001 ffff880142cb3b88 ffffffff816c06e5 ffff88032d001aa0
> >[ 298.169740] Call Trace:
> >[ 298.172473] [<ffffffff816c06e5>] schedule+0x35/0x80
> >[ 298.178019] [<ffffffff8131b937>] blk_mq_freeze_queue_wait+0x57/0xc0
> >[ 298.185116] [<ffffffff810c58c0>] ? prepare_to_wait_event+0xf0/0xf0
> >[ 298.192117] [<ffffffff8131d92a>] blk_mq_freeze_queue+0x1a/0x20
> >[ 298.198734] [<ffffffffa000e910>] dm_stop_queue+0x50/0xc0 [dm_mod]
> >[ 298.205644] [<ffffffffa0001824>] __dm_suspend+0x134/0x1f0 [dm_mod]
> >[ 298.212649] [<ffffffffa00035b8>] dm_suspend+0xb8/0xd0 [dm_mod]
> >[ 298.219270] [<ffffffffa000882e>] dev_suspend+0x18e/0x240 [dm_mod]
> >[ 298.226175] [<ffffffffa00086a0>] ? table_load+0x380/0x380 [dm_mod]
> >[ 298.233180] [<ffffffffa0009027>] ctl_ioctl+0x1e7/0x4d0 [dm_mod]
> >[ 298.239890] [<ffffffff81197f00>] ? lru_cache_add_active_or_unevictable+0x10/0xb0
> >[ 298.248253] [<ffffffffa0009323>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
> >[ 298.255049] [<ffffffff81227937>] do_vfs_ioctl+0xa7/0x5d0
> >[ 298.261081] [<ffffffff8112787f>] ? __audit_syscall_entry+0xaf/0x100
> >[ 298.268178] [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
> >[ 298.275179] [<ffffffff81227ed9>] SyS_ioctl+0x79/0x90
> >[ 298.280821] [<ffffffff81003a47>] do_syscall_64+0x67/0x160
> >[ 298.286950] [<ffffffff816c4921>] entry_SYSCALL64_slow_path+0x25/0x25
> >
>
> Hello Mike,
>
> In your dm-4.9 branch I see that you call blk_mq_freeze_queue()
> while holding the block layer queue lock. Please don't do this.
> blk_mq_freeze_queue() can sleep and as you know calling a sleeping
> function while holding a spinlock is not allowed.
Yeah, I since fixed that. Doesn't change the fact that your use of
blk_mq_freeze_queue() causes the 100% cpu usage.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 20:48 ` Mike Snitzer
@ 2016-09-01 20:52 ` Bart Van Assche
2016-09-01 21:17 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2016-09-01 20:52 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 01:48 PM, Mike Snitzer wrote:
> Yeah, I since fixed that. Doesn't change the fact that your use of
> blk_mq_freeze_queue() causes the 100% cpu usage.
Hello Mike,
Sorry but that doesn't make sense to me. blk_mq_freeze_queue() either
returns quickly or waits. It cannot cause 100% CPU usage. Something else
must be going on. If you can make the code available that you used in
your test I will have a look at it.
Bart.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 20:52 ` Bart Van Assche
@ 2016-09-01 21:17 ` Mike Snitzer
2016-09-01 22:18 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-01 21:17 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 4:52pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 01:48 PM, Mike Snitzer wrote:
> >Yeah, I since fixed that. Doesn't change the fact that your use of
> >blk_mq_freeze_queue() causes the 100% cpu usage.
>
> Hello Mike,
>
> Sorry but that doesn't make sense to me. blk_mq_freeze_queue()
> either returns quickly or waits. It cannot cause 100% CPU usage.
> Something else must be going on. If you can make the code available
> that you used in your test I will have a look at it.
The suspend is hanging due to the blk_mq_freeze_queue() -- and in turn the
resume (after reinstating paths) never happens. So no paths are
available. IO just keeps getting requeued, hence the 100% usage.
The root of the problem is that the suspend isn't completing though.
I've moved your work out to my private devel branch, see:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 21:17 ` Mike Snitzer
@ 2016-09-01 22:18 ` Mike Snitzer
2016-09-01 22:22 ` Bart Van Assche
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-01 22:18 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 5:17pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Thu, Sep 01 2016 at 4:52pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> > On 09/01/2016 01:48 PM, Mike Snitzer wrote:
> > >Yeah, I since fixed that. Doesn't change the fact that your use of
> > >blk_mq_freeze_queue() causes the 100% cpu usage.
> >
> > Hello Mike,
> >
> > Sorry but that doesn't make sense to me. blk_mq_freeze_queue()
> > either returns quickly or waits. It cannot cause 100% CPU usage.
> > Something else must be going on. If you can make the code available
> > that you used in your test I will have a look at it.
>
> The suspend is hanging due to the blk_mq_freeze_queue() -- and in turn the
> resume (after reinstating paths) never happens. So no paths are
> available. IO just keeps getting requeued, hence the 100% usage.
>
> The root of the problem is that the suspend isn't completing though.
>
> I've moved your work out to my private devel branch, see:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel
FYI I get the same 'dmsetup suspend --nolockfs --noflush mp' hang,
running mptest's test_02_sdev_delete, when I try your unmodified
patchset, see:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel.bart
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 22:18 ` Mike Snitzer
@ 2016-09-01 22:22 ` Bart Van Assche
2016-09-01 22:26 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2016-09-01 22:22 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 03:18 PM, Mike Snitzer wrote:
> FYI I get the same 'dmsetup suspend --nolockfs --noflush mp' hang,
> running mptest's test_02_sdev_delete, when I try your unmodified
> patchset, see:
>
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel.bart
Hello Mike,
Are you aware that the code on that branch is a *modified* version of my
patch series? The following patch is not present on that branch: "dm
path selector: Avoid that device removal triggers an infinite loop".
There are also other (smaller) differences.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 22:22 ` Bart Van Assche
@ 2016-09-01 22:26 ` Mike Snitzer
2016-09-01 23:17 ` Bart Van Assche
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-01 22:26 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 6:22pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 03:18 PM, Mike Snitzer wrote:
> >FYI I get the same 'dmsetup suspend --nolockfs --noflush mp' hang,
> >running mptest's test_02_sdev_delete, when I try your unmodified
> >patchset, see:
> >
> >http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel.bart
>
> Hello Mike,
>
> Are you aware that the code on that branch is a *modified* version
> of my patch series? The following patch is not present on that
> branch: "dm path selector: Avoid that device removal triggers an
> infinite loop". There are also other (smaller) differences.
No, you're obviously talking about the 'devel' branch and not the
'devel.bart' branch I pointed to. The 'devel.bart' branch is the
_exact_ patchset you sent. It has the same problem as the 'devel'
branch.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 22:26 ` Mike Snitzer
@ 2016-09-01 23:17 ` Bart Van Assche
2016-09-01 23:47 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2016-09-01 23:17 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
[-- Attachment #1: Type: text/plain, Size: 1663 bytes --]
On 09/01/2016 03:27 PM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at 6:22pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> On 09/01/2016 03:18 PM, Mike Snitzer wrote:
>>> FYI I get the same 'dmsetup suspend --nolockfs --noflush mp' hang,
>>> running mptest's test_02_sdev_delete, when I try your unmodified
>>> patchset, see:
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel.bart
>>
>> Hello Mike,
>>
>> Are you aware that the code on that branch is a *modified* version
>> of my patch series? The following patch is not present on that
>> branch: "dm path selector: Avoid that device removal triggers an
>> infinite loop". There are also other (smaller) differences.
>
> No, you're obviously talking about the 'devel' branch and not the
> 'devel.bart' branch I pointed to. The 'devel.bart' branch is the
> _exact_ patchset you sent. It has the same problem as the 'devel'
> branch.
Hello Mike,
Sorry that I misread your previous e-mail. After I received your latest
e-mail I rebased my tree on top of the devel.bart branch mentioned
above. My tests still pass. The only two patches in my tree that are
relevant and that are not in the devel.bart branch have been attached to
this e-mail. Did your test involve the sd driver? If so, do the attached
two patches help? If the sd driver was not involved, can you provide
more information about the hang you ran into? The output and log
messages generated by the following commands after the hang has been
reproduced would be very welcome:
* echo w > /proc/sysrq-trigger
* (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
Thanks,
Bart.
[-- Attachment #2: 0001-block-dm-mpath-Introduce-request-flag-REQ_FAIL_IF_NO.patch --]
[-- Type: text/x-patch, Size: 2857 bytes --]
From 872971ee76d299e9c002b57b5ae6f12a4bd4ef92 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Thu, 30 Jun 2016 11:53:38 +0200
Subject: [PATCH 1/2] block, dm-mpath: Introduce request flag
REQ_FAIL_IF_NO_PATH
Let dm-mpath fail requests if queue_if_no_path is enabled and
request flag REQ_FAIL_IF_NO_PATH has been set. This facility will
be used in the next patch to fix a deadlock between SCSI device
removal and sd event checking.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/md/dm-mpath.c | 6 ++++--
include/linux/blk_types.h | 4 ++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 8df7b12..58ce8bc 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -540,6 +540,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
struct multipath *m = ti->private;
int r = DM_MAPIO_REQUEUE;
size_t nr_bytes = clone ? blk_rq_bytes(clone) : blk_rq_bytes(rq);
+ bool fail_if_no_path = (clone ? : rq)->cmd_flags & REQ_FAIL_IF_NO_PATH;
struct pgpath *pgpath;
struct block_device *bdev;
struct dm_mpath_io *mpio;
@@ -550,7 +551,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
pgpath = choose_pgpath(m, nr_bytes);
if (!pgpath) {
- if (!must_push_back_rq(m))
+ if (fail_if_no_path || !must_push_back_rq(m))
r = -EIO; /* Failed */
return r;
} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
@@ -1558,6 +1559,7 @@ static int do_end_io(struct multipath *m, struct request *clone,
* clone bios for it and resubmit it later.
*/
int r = DM_ENDIO_REQUEUE;
+ bool fail_if_no_path = clone->cmd_flags & REQ_FAIL_IF_NO_PATH;
if (!error && !clone->errors)
return 0; /* I/O complete */
@@ -1570,7 +1572,7 @@ static int do_end_io(struct multipath *m, struct request *clone,
if (!atomic_read(&m->nr_valid_paths)) {
if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
- if (!must_push_back_rq(m))
+ if (fail_if_no_path || !must_push_back_rq(m))
r = -EIO;
} else {
if (error == -EBADE)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 436f43f..954e687 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -182,6 +182,9 @@ enum rq_flag_bits {
__REQ_PM, /* runtime pm request */
__REQ_HASHED, /* on IO scheduler merge hash */
__REQ_MQ_INFLIGHT, /* track inflight for MQ */
+
+ __REQ_FAIL_IF_NO_PATH, /* fail if queued on dm-mpath with no paths */
+
__REQ_NR_BITS, /* stops here */
};
@@ -228,6 +231,7 @@ enum rq_flag_bits {
#define REQ_PM (1ULL << __REQ_PM)
#define REQ_HASHED (1ULL << __REQ_HASHED)
#define REQ_MQ_INFLIGHT (1ULL << __REQ_MQ_INFLIGHT)
+#define REQ_FAIL_IF_NO_PATH (1ULL << __REQ_FAIL_IF_NO_PATH)
enum req_op {
REQ_OP_READ,
--
2.9.3
[-- Attachment #3: 0002-sd-Fix-a-deadlock.patch --]
[-- Type: text/x-patch, Size: 8343 bytes --]
From 60acb292b29fc7eb40298e8d6dcc409315154340 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Thu, 30 Jun 2016 11:54:06 +0200
Subject: [PATCH 2/2] sd: Fix a deadlock
If sd_check_events() is called for an sd device on top of a
multipath device without paths then the scsi_test_unit_ready()
call from that function will block until one or more paths
have been added to the multipath device. However, if
scsi_remove_host() is called before a path has been added then
a deadlock is triggered. Fix this deadlock by making the
scsi_test_unit_ready() call from sd_check_events() fail if
there are no paths. SysRq-w reported the following call stacks:
sysrq: SysRq : Show Blocked State
task PC stack pid father
kworker/6:1 D ffff88046346b958 0 95 2 0x00000000
Workqueue: events_freezable_power_ disk_events_workfn
Call Trace:
[<ffffffff815acf97>] schedule+0x37/0x90
[<ffffffff815b1487>] schedule_timeout+0x157/0x230
[<ffffffff815ac61f>] io_schedule_timeout+0x9f/0x110
[<ffffffff815adba9>] wait_for_completion_io_timeout+0xd9/0x110
[<ffffffff812aa9f8>] blk_execute_rq+0xa8/0x130
[<ffffffff81408411>] scsi_execute+0xf1/0x160
[<ffffffff8140a484>] scsi_execute_req_flags+0x84/0xf0
[<ffffffff8140aabd>] scsi_test_unit_ready+0x7d/0x130
[<ffffffff81416fe6>] sd_check_events+0x116/0x1a0
[<ffffffff812b477f>] disk_check_events+0x4f/0x130
[<ffffffff812b4877>] disk_events_workfn+0x17/0x20
[<ffffffff810737ed>] process_one_work+0x19d/0x490
[<ffffffff81073b29>] worker_thread+0x49/0x490
[<ffffffff8107a0ea>] kthread+0xea/0x100
[<ffffffff815b26ef>] ret_from_fork+0x1f/0x40
kworker/2:16 D ffff88006d9a3af0 0 2575 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
[<ffffffff815acf97>] schedule+0x37/0x90
[<ffffffff815ad300>] schedule_preempt_disabled+0x10/0x20
[<ffffffff815af0b5>] mutex_lock_nested+0x145/0x360
[<ffffffff812b61ce>] disk_block_events+0x2e/0x80
[<ffffffff812b62e5>] del_gendisk+0x35/0x1d0
[<ffffffff81416ae6>] sd_remove+0x56/0xc0
[<ffffffff813e0812>] __device_release_driver+0x82/0x130
[<ffffffff813e08e0>] device_release_driver+0x20/0x30
[<ffffffff813dff33>] bus_remove_device+0x113/0x190
[<ffffffff813dc5bc>] device_del+0x12c/0x230
[<ffffffff814113a8>] __scsi_remove_device+0xf8/0x130
[<ffffffff8140f5d4>] scsi_forget_host+0x64/0x70
[<ffffffff814033c5>] scsi_remove_host+0x75/0x120
[<ffffffffa05b5e5b>] srp_remove_work+0x8b/0x1f0 [ib_srp]
[<ffffffff810737ed>] process_one_work+0x19d/0x490
[<ffffffff81073b29>] worker_thread+0x49/0x490
[<ffffffff8107a0ea>] kthread+0xea/0x100
[<ffffffff815b26ef>] ret_from_fork+0x1f/0x40
multipathd D ffff88046d91fb40 0 2604 1 0x00000000
Call Trace:
[<ffffffff815acf97>] schedule+0x37/0x90
[<ffffffff815ad300>] schedule_preempt_disabled+0x10/0x20
[<ffffffff815af0b5>] mutex_lock_nested+0x145/0x360
[<ffffffff812b61ce>] disk_block_events+0x2e/0x80
[<ffffffff811d8953>] __blkdev_get+0x53/0x480
[<ffffffff811d8ebb>] blkdev_get+0x13b/0x3a0
[<ffffffff811d9256>] blkdev_open+0x56/0x70
[<ffffffff81199a46>] do_dentry_open.isra.17+0x146/0x2d0
[<ffffffff8119ad58>] vfs_open+0x48/0x70
[<ffffffff811ab213>] path_openat+0x163/0xa10
[<ffffffff811aca29>] do_filp_open+0x79/0xd0
[<ffffffff8119b0a6>] do_sys_open+0x116/0x1f0
[<ffffffff8119b199>] SyS_open+0x19/0x20
[<ffffffff815b24a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
systemd-udevd D ffff880419273968 0 10074 475 0x00000004
Call Trace:
[<ffffffff815acf97>] schedule+0x37/0x90
[<ffffffff815b14bb>] schedule_timeout+0x18b/0x230
[<ffffffff815ae261>] wait_for_completion+0xd1/0x110
[<ffffffff81073036>] flush_work+0x1d6/0x2a0
[<ffffffff8107339b>] __cancel_work_timer+0xfb/0x1b0
[<ffffffff8107346e>] cancel_delayed_work_sync+0xe/0x10
[<ffffffff812b621e>] disk_block_events+0x7e/0x80
[<ffffffff811d8953>] __blkdev_get+0x53/0x480
[<ffffffff811d8ebb>] blkdev_get+0x13b/0x3a0
[<ffffffff811d9256>] blkdev_open+0x56/0x70
[<ffffffff81199a46>] do_dentry_open.isra.17+0x146/0x2d0
[<ffffffff8119ad58>] vfs_open+0x48/0x70
[<ffffffff811ab213>] path_openat+0x163/0xa10
[<ffffffff811aca29>] do_filp_open+0x79/0xd0
[<ffffffff8119b0a6>] do_sys_open+0x116/0x1f0
[<ffffffff8119b199>] SyS_open+0x19/0x20
[<ffffffff815b24a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
mount D ffff8803b42b7be8 0 13567 13442 0x00000000
Call Trace:
[<ffffffff815acf97>] schedule+0x37/0x90
[<ffffffff815b14bb>] schedule_timeout+0x18b/0x230
[<ffffffff815ac61f>] io_schedule_timeout+0x9f/0x110
[<ffffffff815ad786>] bit_wait_io+0x16/0x60
[<ffffffff815ad408>] __wait_on_bit+0x58/0x90
[<ffffffff8111e30f>] wait_on_page_bit_killable+0xbf/0xd0
[<ffffffff8111e450>] generic_file_read_iter+0x130/0x710
[<ffffffff811d7970>] blkdev_read_iter+0x30/0x40
[<ffffffff8119c0e9>] __vfs_read+0xb9/0x120
[<ffffffff8119c4c0>] vfs_read+0x90/0x130
[<ffffffff8119d884>] SyS_read+0x44/0xa0
[<ffffffff815b24a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
drivers/scsi/scsi_lib.c | 17 +++++++++++++----
drivers/scsi/sd.c | 6 ++++--
include/scsi/scsi_device.h | 2 ++
3 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd4766a..6b172f9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2401,13 +2401,14 @@ EXPORT_SYMBOL(scsi_mode_sense);
* @sshdr_external: Optional pointer to struct scsi_sense_hdr for
* returning sense. Make sure that this is cleared before passing
* in.
+ * @flags: or-ed into cmd_flags.
*
* Returns zero if unsuccessful or an error if TUR failed. For
* removable media, UNIT_ATTENTION sets ->changed flag.
**/
int
-scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
- struct scsi_sense_hdr *sshdr_external)
+scsi_test_unit_ready_flags(struct scsi_device *sdev, int timeout, int retries,
+ struct scsi_sense_hdr *sshdr_external, u64 flags)
{
char cmd[] = {
TEST_UNIT_READY, 0, 0, 0, 0, 0,
@@ -2422,8 +2423,8 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
/* try to eat the UNIT_ATTENTION if there are enough retries */
do {
- result = scsi_execute_req(sdev, cmd, DMA_NONE, NULL, 0, sshdr,
- timeout, retries, NULL);
+ result = scsi_execute_req_flags(sdev, cmd, DMA_NONE, NULL, 0,
+ sshdr, timeout, retries, NULL, flags);
if (sdev->removable && scsi_sense_valid(sshdr) &&
sshdr->sense_key == UNIT_ATTENTION)
sdev->changed = 1;
@@ -2434,6 +2435,14 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
kfree(sshdr);
return result;
}
+EXPORT_SYMBOL(scsi_test_unit_ready_flags);
+
+int scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
+ struct scsi_sense_hdr *sshdr_external)
+{
+ return scsi_test_unit_ready_flags(sdev, timeout, retries,
+ sshdr_external, 0);
+}
EXPORT_SYMBOL(scsi_test_unit_ready);
/**
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..85d864b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1439,8 +1439,10 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
if (scsi_block_when_processing_errors(sdp)) {
sshdr = kzalloc(sizeof(*sshdr), GFP_KERNEL);
- retval = scsi_test_unit_ready(sdp, SD_TIMEOUT, SD_MAX_RETRIES,
- sshdr);
+ retval = scsi_test_unit_ready_flags(sdp, SD_TIMEOUT,
+ SD_MAX_RETRIES, sshdr, REQ_FAIL_IF_NO_PATH);
+ if (retval)
+ pr_info("%s: TUR %#x\n", __func__, retval);
}
/* failed to execute TUR, assume media not present */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 8a95631..840030a 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -379,6 +379,8 @@ extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
int timeout, int retries,
struct scsi_mode_data *data,
struct scsi_sense_hdr *);
+extern int scsi_test_unit_ready_flags(struct scsi_device *sdev, int timeout,
+ int retries, struct scsi_sense_hdr *sshdr, u64 flags);
extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
int retries, struct scsi_sense_hdr *sshdr);
extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf,
--
2.9.3
[-- Attachment #4: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 23:17 ` Bart Van Assche
@ 2016-09-01 23:47 ` Mike Snitzer
2016-09-02 0:03 ` Bart Van Assche
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-01 23:47 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 7:17pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 03:27 PM, Mike Snitzer wrote:
> >On Thu, Sep 01 2016 at 6:22pm -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >
> >>On 09/01/2016 03:18 PM, Mike Snitzer wrote:
> >>>FYI I get the same 'dmsetup suspend --nolockfs --noflush mp' hang,
> >>>running mptest's test_02_sdev_delete, when I try your unmodified
> >>>patchset, see:
> >>>
> >>>http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel.bart
> >>
> >>Hello Mike,
> >>
> >>Are you aware that the code on that branch is a *modified* version
> >>of my patch series? The following patch is not present on that
> >>branch: "dm path selector: Avoid that device removal triggers an
> >>infinite loop". There are also other (smaller) differences.
> >
> >No, you're obviously talking about the 'devel' branch and not the
> >'devel.bart' branch I pointed to. The 'devel.bart' branch is the
> >_exact_ patchset you sent. It has the same problem as the 'devel'
> >branch.
>
> Hello Mike,
>
> Sorry that I misread your previous e-mail. After I received your
> latest e-mail I rebased my tree on top of the devel.bart branch
> mentioned above. My tests still pass. The only two patches in my
> tree that are relevant and that are not in the devel.bart branch
> have been attached to this e-mail. Did your test involve the sd
> driver? If so, do the attached two patches help? If the sd driver
> was not involved, can you provide more information about the hang
> you ran into? The output and log messages generated by the following
> commands after the hang has been reproduced would be very welcome:
> * echo w > /proc/sysrq-trigger
> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
sd is used. I'll apply those patches and test, tomorrow, but I'm pretty
skeptical.
Haven't had any problems with these tests for quite a while. The tests
I'm running are just those in the mptest testsuite, see:
https://github.com/snitm/mptest
Running them should be as simple as you doing:
git clone git://github.com/snitm/mptest.git
cd mptest
./runtest
The default is to use dm-mq on scsi-mq on top of tcmloop.
multipath -ll shows:
mp () dm-4 LIO-ORG ,rd
size=1.0G features='4 queue_if_no_path retain_attached_hw_handler queue_mode mq' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=-1 status=active
| |- 7:0:1:0 sdj 8:144 active ready running
| `- 8:0:1:0 sdk 8:160 active ready running
`-+- policy='queue-length 0' prio=-1 status=enabled
|- 9:0:1:0 sdl 8:176 active ready running
`- 10:0:1:0 sdm 8:192 active ready running
[ 4839.452237] scsi host7: TCM_Loopback
[ 4839.472788] scsi host8: TCM_Loopback
[ 4839.492867] scsi host9: TCM_Loopback
[ 4839.512841] scsi host10: TCM_Loopback
[ 4839.549430] scsi 7:0:1:0: Direct-Access LIO-ORG rd 4.0 PQ: 0 ANSI: 5
[ 4839.570556] scsi 7:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.577562] scsi 7:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 1
[ 4839.587810] sd 7:0:1:0: [sdj] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.587830] sd 7:0:1:0: Attached scsi generic sg10 type 0
[ 4839.593569] sd 7:0:1:0: alua: transition timeout set to 60 seconds
[ 4839.593572] sd 7:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.608254] scsi 8:0:1:0: Direct-Access LIO-ORG rd 4.0 PQ: 0 ANSI: 5
[ 4839.626620] sd 7:0:1:0: [sdj] Write Protect is off
[ 4839.631974] sd 7:0:1:0: [sdj] Mode Sense: 43 00 00 08
[ 4839.631999] sd 7:0:1:0: [sdj] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.642209] loopback/naa.50014056fcae4fb4: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.652646] sd 7:0:1:0: [sdj] Attached SCSI disk
[ 4839.673568] scsi 8:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.680573] scsi 8:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 2
[ 4839.690814] sd 8:0:1:0: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.690888] sd 8:0:1:0: Attached scsi generic sg11 type 0
[ 4839.696543] sd 8:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.711419] scsi 9:0:1:0: Direct-Access LIO-ORG rd 4.0 PQ: 0 ANSI: 5
[ 4839.722730] sd 8:0:1:0: [sdk] Write Protect is off
[ 4839.728076] sd 8:0:1:0: [sdk] Mode Sense: 43 00 00 08
[ 4839.728094] sd 8:0:1:0: [sdk] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.738298] loopback/naa.500140553365fbe6: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.748700] sd 8:0:1:0: [sdk] Attached SCSI disk
[ 4839.771561] scsi 9:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.778567] scsi 9:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 3
[ 4839.788794] sd 9:0:1:0: [sdl] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.788823] sd 9:0:1:0: Attached scsi generic sg12 type 0
[ 4839.794546] sd 9:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.809308] scsi 10:0:1:0: Direct-Access LIO-ORG rd 4.0 PQ: 0 ANSI: 5
[ 4839.820806] sd 9:0:1:0: [sdl] Write Protect is off
[ 4839.826161] sd 9:0:1:0: [sdl] Mode Sense: 43 00 00 08
[ 4839.826181] sd 9:0:1:0: [sdl] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.836379] loopback/naa.5001405631dca816: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.846762] sd 9:0:1:0: [sdl] Attached SCSI disk
[ 4839.856572] scsi 10:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.863673] scsi 10:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 4
[ 4839.874002] sd 10:0:1:0: [sdm] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.874033] sd 10:0:1:0: Attached scsi generic sg13 type 0
[ 4839.879549] sd 10:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.897162] sd 10:0:1:0: [sdm] Write Protect is off
[ 4839.902613] sd 10:0:1:0: [sdm] Mode Sense: 43 00 00 08
[ 4839.902632] sd 10:0:1:0: [sdm] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.912935] loopback/naa.5001405afca06b48: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.923291] sd 10:0:1:0: [sdm] Attached SCSI disk
[ 4841.065972] device-mapper: multipath queue-length: version 0.2.0 loaded
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-01 23:47 ` Mike Snitzer
@ 2016-09-02 0:03 ` Bart Van Assche
2016-09-02 15:12 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2016-09-02 0:03 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On 09/01/2016 04:48 PM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at 7:17pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> Sorry that I misread your previous e-mail. After I received your
>> latest e-mail I rebased my tree on top of the devel.bart branch
>> mentioned above. My tests still pass. The only two patches in my
>> tree that are relevant and that are not in the devel.bart branch
>> have been attached to this e-mail. Did your test involve the sd
>> driver? If so, do the attached two patches help? If the sd driver
>> was not involved, can you provide more information about the hang
>> you ran into? The output and log messages generated by the following
>> commands after the hang has been reproduced would be very welcome:
>> * echo w > /proc/sysrq-trigger
>> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
>
> sd is used. I'll apply those patches and test, tomorrow, but I'm pretty
> skeptical.
>
> Haven't had any problems with these tests for quite a while. The tests
> I'm running are just those in the mptest testsuite, see:
> https://github.com/snitm/mptest
>
> Running them should be as simple as you doing:
>
> git clone git://github.com/snitm/mptest.git
> cd mptest
> ./runtest
>
> The default is to use dm-mq on scsi-mq on top of tcmloop.
>
> [ ... ]
Hello Mike,
If I run mptest on my setup I can reproduce the hang. But what I see is
that the service-time path selector is in use when the hang is triggered.
I will patch that path selector in the same way as I did with the
queue-length path selector and rerun the test.
# dmsetup table
1Linux_scsi_debug_2000: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 3 1 service-time 0 1 2 8:128 1 1 service-time 0 1 2 8:144 1 1 service-time 0 1 2 8:112 1 1
mp: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 2 1 queue-length 0 2 1 8:96 1 8:112 1 queue-length 0 2 1 8:128 1 8:144 1
# (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list}) | grep -v ':$'
dm-0/mq/0/pending: ffff880358610000
dm-1/mq/0/pending: ffff880358220200
dm-1/mq/0/pending: ffff880358220400
Bart.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
2016-09-02 0:03 ` Bart Van Assche
@ 2016-09-02 15:12 ` Mike Snitzer
2016-09-02 16:10 ` should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues] Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-02 15:12 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe@kernel.dk, device-mapper development, hch@lst.de
On Thu, Sep 01 2016 at 8:03pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 04:48 PM, Mike Snitzer wrote:
> > On Thu, Sep 01 2016 at 7:17pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> Sorry that I misread your previous e-mail. After I received your
> >> latest e-mail I rebased my tree on top of the devel.bart branch
> >> mentioned above. My tests still pass. The only two patches in my
> >> tree that are relevant and that are not in the devel.bart branch
> >> have been attached to this e-mail. Did your test involve the sd
> >> driver? If so, do the attached two patches help? If the sd driver
> >> was not involved, can you provide more information about the hang
> >> you ran into? The output and log messages generated by the following
> >> commands after the hang has been reproduced would be very welcome:
> >> * echo w > /proc/sysrq-trigger
> >> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
> >
> > sd is used. I'll apply those patches and test, tomorrow, but I'm pretty
> > skeptical.
> >
> > Haven't had any problems with these tests for quite a while. The tests
> > I'm running are just those in the mptest testsuite, see:
> > https://github.com/snitm/mptest
> >
> > Running them should be as simple as you doing:
> >
> > git clone git://github.com/snitm/mptest.git
> > cd mptest
> > ./runtest
> >
> > The default is to use dm-mq on scsi-mq on top of tcmloop.
> >
> > [ ... ]
>
> Hello Mike,
>
> If I run mptest on my setup I can reproduce the hang. But what I see is
> that the service-time path selector is in use when the hang is triggered.
> I will patch that path selector in the same way as I did with the
> queue-length path selector and rerun the test.
>
> # dmsetup table
> 1Linux_scsi_debug_2000: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 3 1 service-time 0 1 2 8:128 1 1 service-time 0 1 2 8:144 1 1 service-time 0 1 2 8:112 1 1
> mp: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 2 1 queue-length 0 2 1 8:96 1 8:112 1 queue-length 0 2 1 8:128 1 8:144 1
> # (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list}) | grep -v ':$'
> dm-0/mq/0/pending: ffff880358610000
> dm-1/mq/0/pending: ffff880358220200
> dm-1/mq/0/pending: ffff880358220400
In case I haven't been clear: calling blk_mq_freeze_queue() _after_
you've suspended the DM device will only trigger IO that will get
re-queued to the DM device's blk-mq queue. So you're creating a
livelock since blk_mq_freeze_queue_wait() will not return, see stack
from:
https://www.redhat.com/archives/dm-devel/2016-September/msg00017.html
FYI, the BLK_MQ_S_STOPPED check that you removed from dm_mq_queue_rq()
in this commit...
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=devel.bart&id=69eb3e60e099a6117fc754e70eedd504685326ad
...is effectively serving as this when the device is suspended:
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index b5db523..3bc16dc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -863,6 +863,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
dm_put_live_table(md, srcu_idx);
}
+ if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)))
+ return BLK_MQ_RQ_QUEUE_BUSY;
+
if (ti->type->busy && ti->type->busy(ti))
return BLK_MQ_RQ_QUEUE_BUSY;
Which is comparable to what the old .request_fn DM does (see:
dm_make_request).
Without that type of suspend check the code will go on to call
dm-mpath.c:multipath_clone_and_map() which will only result in
DM_MAPIO_REQUEUE (and dm_mq_queue_rq returning BLK_MQ_RQ_QUEUE_BUSY in
the case when multipath has no paths, via must_push_back_rq()).
Anyway, not what we want in general. The goal during DM device suspend
is to stop IO from being mapped. With request-based DM suspend we're
punting requests back to the block layer's request_queue.
So in the case of blk-mq request-based DM: we cannot expect
blk_mq_freeze_queue(), during suspend, to complete if requests are
getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.
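For reference, blk_mq_freeze_queue() boils down to roughly the following
(a simplified sketch, not the verbatim block layer code):

/*
 * Simplified sketch, not the verbatim block layer code: the freeze only
 * completes once q_usage_counter drops to zero, i.e. once every
 * outstanding request has completed.  Requests that keep bouncing back
 * via BLK_MQ_RQ_QUEUE_BUSY never complete, so the wait never ends.
 */
static void blk_mq_freeze_queue_sketch(struct request_queue *q)
{
	atomic_inc(&q->mq_freeze_depth);
	percpu_ref_kill(&q->q_usage_counter);	/* no new requests may enter */
	blk_mq_run_hw_queues(q, false);
	wait_event(q->mq_freeze_wq,
		   percpu_ref_is_zero(&q->q_usage_counter));
}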
Mike
^ permalink raw reply related [flat|nested] 45+ messages in thread
* should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues]
2016-09-02 15:12 ` Mike Snitzer
@ 2016-09-02 16:10 ` Mike Snitzer
2016-09-02 22:42 ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-02 16:10 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe, device-mapper development, hch, linux-block
On Fri, Sep 02 2016 at 11:12am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> So in the case of blk-mq request-based DM: we cannot expect
> blk_mq_freeze_queue(), during suspend, to complete if requests are
> getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.
Looking closer at blk-mq. Currently __blk_mq_run_hw_queue() will move
any requeued requests to the hctx->dispatch list and then perform an async
blk_mq_run_hw_queue().
To do what you hoped (have blk_mq_freeze_queue() discontinue all use of
blk-mq hw queues during DM suspend) I think we'd need blk-mq to:
1) avoid processing requeued IO if blk_mq_freeze_queue() was used to
freeze the queue. Meaning it'd have to hold requeued work longer
than it currently does.
2) Then once blk_mq_unfreeze_queue() is called it'd allow requeues to
proceed.
This would be catering to a very specific requirement of DM (given it
re-queues IO back to the request_queue during suspend).
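As a hypothetical sketch (not actual code), step 1) would amount to
teaching the requeue worker to hold requeued requests while the queue is
frozen:

/* Hypothetical sketch only -- not actual code. */
static void blk_mq_requeue_work_sketch(struct work_struct *work)
{
	struct request_queue *q =
		container_of(work, struct request_queue, requeue_work);

	if (atomic_read(&q->mq_freeze_depth))
		return;		/* hold requeues until the queue is unfrozen */

	/* ... otherwise splice q->requeue_list and dispatch as today ... */
}

and step 2) would then require blk_mq_unfreeze_queue() to kick the
requeue work again.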
BUT all said, relative to request-based DM multipath, what we have is
perfectly fine on a correctness level: the requests are re-queued
because the blk-mq DM device is suspended.
Unfortunately on an efficiency level DM suspend creates a lot of busy
looping in blk-mq, with 100% cpu usage in threads with names like
"kworker/3:1H"; ideally we'd avoid that!
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dm-devel] should blk-mq halt requeue processing while queue is frozen?
2016-09-02 16:10 ` should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues] Mike Snitzer
@ 2016-09-02 22:42 ` Bart Van Assche
2016-09-03 0:34 ` Mike Snitzer
2016-09-07 16:41 ` Mike Snitzer
0 siblings, 2 replies; 45+ messages in thread
From: Bart Van Assche @ 2016-09-02 22:42 UTC (permalink / raw)
To: Mike Snitzer; +Cc: axboe, linux-block, device-mapper development, hch
[-- Attachment #1: Type: text/plain, Size: 1852 bytes --]
On 09/02/2016 09:10 AM, Mike Snitzer wrote:
> On Fri, Sep 02 2016 at 11:12am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> So in the case of blk-mq request-based DM: we cannot expect
>> blk_mq_freeze_queue(), during suspend, to complete if requests are
>> getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.
>
> Looking closer at blk-mq. Currently __blk_mq_run_hw_queue() will move
> any requeued requests to the hctx->dispatch list and then perform an async
> blk_mq_run_hw_queue().
>
> To do what you hoped (have blk_mq_freeze_queue() discontinue all use of
> blk-mq hw queues during DM suspend) I think we'd need blk-mq to:
> 1) avoid processing requeued IO if blk_mq_freeze_queue() was used to
> freeze the queue. Meaning it'd have to hold requeued work longer
> than it currently does.
> 2) Then once blk_mq_unfreeze_queue() is called it'd allow requeues to
> proceed.
>
> This would be catering to a very specific requirement of DM (given it
> re-queues IO back to the request_queue during suspend).
>
> BUT all said, relative to request-based DM multipath, what we have is
> perfectly fine on a correctness level: the requests are re-queued
> because the blk-mq DM device is suspended.
>
> Unfortunately on an efficiency level DM suspend creates a lot of busy
> looping in blk-mq, with 100% cpu usage in threads with names like
> "kworker/3:1H"; ideally we'd avoid that!
Hello Mike,
What blk_mq_freeze_queue() does is to wait until queue_rq() has finished
*and* all pending requests have completed. However, I think in
dm_stop_queue() all we need is to wait until queue_rq() has finished.
How about adding new functions in the block layer core to realize this,
e.g. something like in the attached (untested) patch? Busy looping
should be avoided - see also the tests of the new "quiescing" flag.
Thanks,
Bart.
[-- Attachment #2: 0001-blk-mq-Introduce-blk_mq_quiesce_queue.patch --]
[-- Type: text/x-patch, Size: 4321 bytes --]
From e55a161ee4df7804767ed8faf9ddb698e8852b06 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Fri, 2 Sep 2016 09:32:17 -0700
Subject: [PATCH] blk-mq: Introduce blk_mq_quiesce_queue()
---
block/blk-mq.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/blk-mq.h | 2 ++
include/linux/blkdev.h | 3 +++
3 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 123d1ad..0320cd9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -135,6 +135,46 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+/**
+ * blk_mq_quiesce_queue - wait until all pending queue_rq calls have finished
+ *
+ * Prevent that new I/O requests are queued and wait until all pending
+ * queue_rq() calls have finished.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+ spin_lock_irq(q->queue_lock);
+ WARN_ON_ONCE(blk_queue_quiescing(q));
+ queue_flag_set(QUEUE_FLAG_QUIESCING, q);
+ spin_unlock_irq(q->queue_lock);
+
+ atomic_inc_return(&q->mq_freeze_depth);
+ blk_mq_run_hw_queues(q, false);
+ synchronize_rcu();
+
+ spin_lock_irq(q->queue_lock);
+ WARN_ON_ONCE(!blk_queue_quiescing(q));
+ queue_flag_clear(QUEUE_FLAG_QUIESCING, q);
+ spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
+/**
+ * blk_mq_resume_queue - resume request processing
+ */
+void blk_mq_resume_queue(struct request_queue *q)
+{
+ int freeze_depth;
+
+ freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
+ WARN_ON_ONCE(freeze_depth < 0);
+ if (freeze_depth == 0)
+ wake_up_all(&q->mq_freeze_wq);
+
+ blk_mq_run_hw_queues(q, false);
+}
+EXPORT_SYMBOL_GPL(blk_mq_resume_queue);
+
void blk_mq_wake_waiters(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
@@ -506,6 +546,9 @@ static void blk_mq_requeue_work(struct work_struct *work)
struct request *rq, *next;
unsigned long flags;
+ if (blk_queue_quiescing(q))
+ return;
+
spin_lock_irqsave(&q->requeue_lock, flags);
list_splice_init(&q->requeue_list, &rq_list);
spin_unlock_irqrestore(&q->requeue_lock, flags);
@@ -806,6 +849,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
*/
flush_busy_ctxs(hctx, &rq_list);
+ rcu_read_lock();
+
/*
* If we have previous entries on our dispatch list, grab them
* and stuff them at the front for more fair dispatch.
@@ -888,8 +933,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
*
* blk_mq_run_hw_queue() already checks the STOPPED bit
**/
- blk_mq_run_hw_queue(hctx, true);
+ if (!blk_queue_quiescing(q))
+ blk_mq_run_hw_queue(hctx, true);
}
+
+ rcu_read_unlock();
}
/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index bd677bc..8fc07bb 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -248,6 +248,8 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_unfreeze_queue(struct request_queue *q);
void blk_mq_freeze_queue_start(struct request_queue *q);
+void blk_mq_quiesce_queue(struct request_queue *q);
+void blk_mq_resume_queue(struct request_queue *q);
int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e79055c..9b360fc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
#define QUEUE_FLAG_FUA 24 /* device supports FUA writes */
#define QUEUE_FLAG_FLUSH_NQ 25 /* flush not queueuable */
#define QUEUE_FLAG_DAX 26 /* device supports DAX */
+#define QUEUE_FLAG_QUIESCING 27
#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -595,6 +596,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_secure_erase(q) \
(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_quiescing(q) test_bit(QUEUE_FLAG_QUIESCING, \
+ &(q)->queue_flags)
#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
--
2.9.3
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: should blk-mq halt requeue processing while queue is frozen?
2016-09-02 22:42 ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
@ 2016-09-03 0:34 ` Mike Snitzer
2016-09-07 16:41 ` Mike Snitzer
1 sibling, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2016-09-03 0:34 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe, linux-block, device-mapper development, hch
On Fri, Sep 02 2016 at 6:42pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/02/2016 09:10 AM, Mike Snitzer wrote:
> >On Fri, Sep 02 2016 at 11:12am -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >
> >>So in the case of blk-mq request-based DM: we cannot expect
> >>blk_mq_freeze_queue(), during suspend, to complete if requests are
> >>getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.
> >
> >Looking closer at blk-mq. Currently __blk_mq_run_hw_queue() will move
> >any requeued requests to the hctx->dispatch list and then perform an async
> >blk_mq_run_hw_queue().
> >
> >To do what you hoped (have blk_mq_freeze_queue() discontinue all use of
> >blk-mq hw queues during DM suspend) I think we'd need blk-mq to:
> >1) avoid processing requeued IO if blk_mq_freeze_queue() was used to
> > freeze the queue. Meaning it'd have to hold requeued work longer
> > than it currently does.
> >2) Then once blk_mq_unfreeze_queue() is called it'd allow requeues to
> > proceed.
> >
> >This would be catering to a very specific requirement of DM (given it
> >re-queues IO back to the request_queue during suspend).
> >
> >BUT all said, relative to request-based DM multipath, what we have is
> >perfectly fine on a correctness level: the requests are re-queued
> >because the blk-mq DM device is suspended.
> >
> >Unfortunately on an efficiency level DM suspend creates a lot of busy
> >looping in blk-mq, with 100% cpu usage in threads with names like
> >"kworker/3:1H"; ideally we'd avoid that!
>
> Hello Mike,
>
> What blk_mq_freeze_queue() does is to wait until queue_rq() has
> finished *and* all pending requests have completed.
Right, I had a look at blk-mq this afternoon and it is clear that the
current implementation of blk-mq's freeze (in terms of percpu
q->q_usage_counter dropping to zero) won't fly for the DM requeue
use case.
> However, I think
> in dm_stop_queue() all we need is to wait until queue_rq() has
> finished. How about adding new functions in the block layer core to
> realize this, e.g. something like in the attached (untested) patch?
> Busy looping should be avoided - see also the tests of the new
> "quiescing" flag.
I'll take a closer look at your patch next week.
The reuse of the mq_freeze_depth to achieve this quiesce/resume will
need closer review -- likely by Jens.
> void blk_mq_wake_waiters(struct request_queue *q)
> {
> struct blk_mq_hw_ctx *hctx;
> @@ -506,6 +546,9 @@ static void blk_mq_requeue_work(struct work_struct *work)
> struct request *rq, *next;
> unsigned long flags;
>
> + if (blk_queue_quiescing(q))
> + return;
> +
> spin_lock_irqsave(&q->requeue_lock, flags);
> list_splice_init(&q->requeue_list, &rq_list);
> spin_unlock_irqrestore(&q->requeue_lock, flags);
> @@ -806,6 +849,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> */
> flush_busy_ctxs(hctx, &rq_list);
>
> + rcu_read_lock();
> +
> /*
> * If we have previous entries on our dispatch list, grab them
> * and stuff them at the front for more fair dispatch.
> @@ -888,8 +933,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> *
> * blk_mq_run_hw_queue() already checks the STOPPED bit
> **/
> - blk_mq_run_hw_queue(hctx, true);
> + if (!blk_queue_quiescing(q))
> + blk_mq_run_hw_queue(hctx, true);
> }
> +
> + rcu_read_unlock();
> }
Yes, those are the correct hooks at which to place code that conditionally
short-circuits normal blk-mq operation.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: should blk-mq halt requeue processing while queue is frozen?
2016-09-02 22:42 ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
2016-09-03 0:34 ` Mike Snitzer
@ 2016-09-07 16:41 ` Mike Snitzer
2016-09-13 8:01 ` [dm-devel] " Bart Van Assche
1 sibling, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2016-09-07 16:41 UTC (permalink / raw)
To: Bart Van Assche; +Cc: axboe, linux-block, device-mapper development, hch
On Fri, Sep 02 2016 at 6:42pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> However, I think
> in dm_stop_queue() all we need is to wait until queue_rq() has
> finished. How about adding new functions in the block layer core to
> realize this, e.g. something like in the attached (untested) patch?
> Busy looping should be avoided - see also the tests of the new
> "quiescing" flag.
>
> Thanks,
>
> Bart.
Comments inlined below.
> From e55a161ee4df7804767ed8faf9ddb698e8852b06 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Fri, 2 Sep 2016 09:32:17 -0700
> Subject: [PATCH] blk-mq: Introduce blk_mq_quiesce_queue()
>
> ---
> block/blk-mq.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/blk-mq.h | 2 ++
> include/linux/blkdev.h | 3 +++
> 3 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 123d1ad..0320cd9 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -135,6 +135,46 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
> }
> EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>
> +/**
> + * blk_mq_quiesce_queue - wait until all pending queue_rq calls have finished
> + *
> + * Prevent that new I/O requests are queued and wait until all pending
> + * queue_rq() calls have finished.
> + */
> +void blk_mq_quiesce_queue(struct request_queue *q)
> +{
> + spin_lock_irq(q->queue_lock);
> + WARN_ON_ONCE(blk_queue_quiescing(q));
> + queue_flag_set(QUEUE_FLAG_QUIESCING, q);
> + spin_unlock_irq(q->queue_lock);
> +
> + atomic_inc_return(&q->mq_freeze_depth);
> + blk_mq_run_hw_queues(q, false);
> + synchronize_rcu();
Why the synchronize_rcu()?
Also, you're effectively open-coding blk_mq_freeze_queue_start() minus
the q->q_usage_counter mgmt. Why not add a flag to conditionally manage
q->q_usage_counter to blk_mq_freeze_queue_start()?
> + spin_lock_irq(q->queue_lock);
> + WARN_ON_ONCE(!blk_queue_quiescing(q));
> + queue_flag_clear(QUEUE_FLAG_QUIESCING, q);
> + spin_unlock_irq(q->queue_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> +
> +/**
> + * blk_mq_resume_queue - resume request processing
> + */
> +void blk_mq_resume_queue(struct request_queue *q)
> +{
> + int freeze_depth;
> +
> + freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
> + WARN_ON_ONCE(freeze_depth < 0);
> + if (freeze_depth == 0)
> + wake_up_all(&q->mq_freeze_wq);
Likewise, here you've open-coded blk_mq_unfreeze_queue(). Adding a flag
to conditionally reinit q->q_usage_counter would be better.
But I'm concerned about blk_mq_{quiesce,resume}_queue vs
blk_mq_{freeze,unfreeze}_queue -- e.g. if "freeze" is nested after
"quiesce" (but before "resume") it would still need the q->q_usage_counter
management. Your patch as-is would break the blk-mq freeze interface.
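To make that concern concrete, here is a purely hypothetical sequence
(the nested_freeze_example() wrapper does not exist anywhere, it is only
meant to show the interaction) that would hang with your patch applied:

/*
 * Hypothetical sequence only, to illustrate the concern; assumes the
 * draft blk_mq_quiesce_queue()/blk_mq_resume_queue() quoted above.
 */
static void nested_freeze_example(struct request_queue *q)
{
        /* bumps mq_freeze_depth to 1 but does not kill q->q_usage_counter */
        blk_mq_quiesce_queue(q);

        /*
         * blk_mq_freeze_queue_start() sees mq_freeze_depth == 2 and skips
         * percpu_ref_kill(&q->q_usage_counter), so blk_mq_freeze_queue_wait()
         * waits forever for a reference count that was never killed.
         */
        blk_mq_freeze_queue(q);

        /* never reached */
        blk_mq_unfreeze_queue(q);
        blk_mq_resume_queue(q);
}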
> + blk_mq_run_hw_queues(q, false);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_resume_queue);
> +
> void blk_mq_wake_waiters(struct request_queue *q)
> {
> struct blk_mq_hw_ctx *hctx;
> @@ -506,6 +546,9 @@ static void blk_mq_requeue_work(struct work_struct *work)
> struct request *rq, *next;
> unsigned long flags;
>
> + if (blk_queue_quiescing(q))
> + return;
> +
> spin_lock_irqsave(&q->requeue_lock, flags);
> list_splice_init(&q->requeue_list, &rq_list);
> spin_unlock_irqrestore(&q->requeue_lock, flags);
> @@ -806,6 +849,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> */
> flush_busy_ctxs(hctx, &rq_list);
>
> + rcu_read_lock();
> +
> /*
> * If we have previous entries on our dispatch list, grab them
> * and stuff them at the front for more fair dispatch.
> @@ -888,8 +933,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> *
> * blk_mq_run_hw_queue() already checks the STOPPED bit
> **/
> - blk_mq_run_hw_queue(hctx, true);
> + if (!blk_queue_quiescing(q))
> + blk_mq_run_hw_queue(hctx, true);
> }
> +
> + rcu_read_unlock();
> }
>
> /*
Please explain this extra rcu_read_{lock,unlock}().
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dm-devel] should blk-mq halt requeue processing while queue is frozen?
2016-09-07 16:41 ` Mike Snitzer
@ 2016-09-13 8:01 ` Bart Van Assche
2016-09-13 14:36 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2016-09-13 8:01 UTC (permalink / raw)
To: Mike Snitzer, Bart Van Assche
Cc: axboe, linux-block, device-mapper development, hch
On 09/07/2016 06:41 PM, Mike Snitzer wrote:
> On Fri, Sep 02 2016 at 6:42pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> +/**
>> + * blk_mq_quiesce_queue - wait until all pending queue_rq calls have finished
>> + *
>> + * Prevent new I/O requests from being queued and wait until all pending
>> + * queue_rq() calls have finished.
>> + */
>> +void blk_mq_quiesce_queue(struct request_queue *q)
>> +{
>> + spin_lock_irq(q->queue_lock);
>> + WARN_ON_ONCE(blk_queue_quiescing(q));
>> + queue_flag_set(QUEUE_FLAG_QUIESCING, q);
>> + spin_unlock_irq(q->queue_lock);
>> +
>> + atomic_inc_return(&q->mq_freeze_depth);
>> + blk_mq_run_hw_queues(q, false);
>> + synchronize_rcu();
>
> Why the synchronize_rcu()?
Hello Mike,
Adding rcu_read_lock() + rcu_read_unlock() in __blk_mq_run_hw_queue() and
synchronize_rcu() in blk_mq_quiesce_queue() is the lowest-overhead
mechanism I know of to make the latter function wait until the former
has finished.
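Schematically it looks like this -- stripped of all blk-mq details, so
take it as an illustration of the handshake rather than real code; the
dispatch_side()/quiesce_side() names are made up:

/* dispatch side, i.e. what __blk_mq_run_hw_queue() does with the patch */
static void dispatch_side(struct request_queue *q)
{
        rcu_read_lock();
        if (!blk_queue_quiescing(q)) {
                /* ... dispatch requests, i.e. invoke .queue_rq() ... */
        }
        rcu_read_unlock();
}

/* quiesce side, i.e. what blk_mq_quiesce_queue() does with the patch */
static void quiesce_side(struct request_queue *q)
{
        spin_lock_irq(q->queue_lock);
        queue_flag_set(QUEUE_FLAG_QUIESCING, q);
        spin_unlock_irq(q->queue_lock);

        /*
         * Any dispatch_side() call that has not yet seen QUIESCING is
         * still inside its RCU read-side critical section, so
         * synchronize_rcu() does not return before all such calls --
         * and hence all pending queue_rq() invocations -- have finished.
         */
        synchronize_rcu();
}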
> Also, you're effectively open-coding blk_mq_freeze_queue_start() minus
> the q->q_usage_counter mgmt. Why not add a flag to conditionally manage
> q->q_usage_counter to blk_mq_freeze_queue_start()?
I will consider this.
> But I'm concerned about blk_mq_{quiesce,resume}_queue vs
> blk_mq_{freeze,unfreeze}_queue -- e.g. if "freeze" is nested after
> "quiesce" (but before "resume") it would still need the q->q_usage_counter
> management. Your patch as-is would break the blk-mq freeze interface.
Agreed. blk_mq_{quiesce,resume}_queue() has to manipulate
q_usage_counter in the same way as blk_mq_{freeze,unfreeze}_queue().
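Roughly -- untested, and only meant to sketch that direction -- the rework
would replace the open-coded bookkeeping with the existing freeze helpers:

void blk_mq_quiesce_queue(struct request_queue *q)
{
        spin_lock_irq(q->queue_lock);
        WARN_ON_ONCE(blk_queue_quiescing(q));
        queue_flag_set(QUEUE_FLAG_QUIESCING, q);
        spin_unlock_irq(q->queue_lock);

        /* kills q->q_usage_counter and runs the hw queues */
        blk_mq_freeze_queue_start(q);
        /* wait for queue_rq() calls that started before QUIESCING was set */
        synchronize_rcu();

        spin_lock_irq(q->queue_lock);
        queue_flag_clear(QUEUE_FLAG_QUIESCING, q);
        spin_unlock_irq(q->queue_lock);
}

void blk_mq_resume_queue(struct request_queue *q)
{
        /* reinits q->q_usage_counter and wakes up anyone waiting on it */
        blk_mq_unfreeze_queue(q);
        blk_mq_run_hw_queues(q, false);
}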
Once I am back in the office I will rework this patch and send it to Jens.
Bart.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: should blk-mq halt requeue processing while queue is frozen?
2016-09-13 8:01 ` [dm-devel] " Bart Van Assche
@ 2016-09-13 14:36 ` Mike Snitzer
0 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2016-09-13 14:36 UTC (permalink / raw)
To: Bart Van Assche
Cc: Bart Van Assche, axboe, linux-block, device-mapper development,
hch
On Tue, Sep 13 2016 at 4:01am -0400,
Bart Van Assche <bart.vanassche@gmail.com> wrote:
> On 09/07/2016 06:41 PM, Mike Snitzer wrote:
> >On Fri, Sep 02 2016 at 6:42pm -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>+/**
> >>+ * blk_mq_quiesce_queue - wait until all pending queue_rq calls have finished
> >>+ *
> >>+ * Prevent new I/O requests from being queued and wait until all pending
> >>+ * queue_rq() calls have finished.
> >>+ */
> >>+void blk_mq_quiesce_queue(struct request_queue *q)
> >>+{
> >>+ spin_lock_irq(q->queue_lock);
> >>+ WARN_ON_ONCE(blk_queue_quiescing(q));
> >>+ queue_flag_set(QUEUE_FLAG_QUIESCING, q);
> >>+ spin_unlock_irq(q->queue_lock);
> >>+
> >>+ atomic_inc_return(&q->mq_freeze_depth);
> >>+ blk_mq_run_hw_queues(q, false);
> >>+ synchronize_rcu();
> >
> >Why the synchronize_rcu()?
>
> Hello Mike,
>
> Adding rcu_read_lock() + rcu_read_unlock() in __blk_mq_run_hw_queue() and
> synchronize_rcu() in blk_mq_quiesce_queue() is the lowest-overhead
> mechanism I know of to make the latter function wait until the
> former has finished.
OK.
> >Also, you're effectively open-coding blk_mq_freeze_queue_start() minus
> >the q->q_usage_counter mgmt. Why not add a flag to conditionally manage
> >q->q_usage_counter to blk_mq_freeze_queue_start()?
>
> I will consider this.
>
> >But I'm concerned about blk_mq_{quiesce,resume}_queue vs
> >blk_mq_{freeze,unfreeze}_queue -- e.g. if "freeze" is nested after
> >"quiesce" (but before "resume") it would still need the q->q_usage_counter
> >management. Your patch as-is would break the blk-mq freeze interface.
>
> Agreed. blk_mq_{quiesce,resume}_queue() has to manipulate
> q_usage_counter in the same way as blk_mq_{freeze,unfreeze}_queue().
> Once I am back in the office I will rework this patch and send it to
> Jens.
Please base any further work in this area on top of
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel
And please verify all the mptest tests pass with your changes in place.
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread
Thread overview: 45+ messages
2016-08-31 22:14 [PATCH 0/9] dm patches for kernel v4.9 Bart Van Assche
2016-08-31 22:15 ` [PATCH 1/9] blk-mq: Introduce blk_mq_queue_stopped() Bart Van Assche
2016-08-31 22:16 ` [PATCH 2/9] dm: Rename a function argument Bart Van Assche
2016-09-01 3:29 ` Mike Snitzer
2016-09-01 14:17 ` Bart Van Assche
2016-08-31 22:16 ` [PATCH 3/9] dm: Introduce signal_pending_state() Bart Van Assche
2016-08-31 22:16 ` [PATCH 4/9] dm: Convert wait loops Bart Van Assche
2016-08-31 22:17 ` [PATCH 5/9] dm: Add two lockdep_assert_held() statements Bart Van Assche
2016-08-31 22:17 ` [PATCH 6/9] dm: Simplify dm_old_stop_queue() Bart Van Assche
2016-08-31 22:17 ` [PATCH 7/9] dm: Mark block layer queue dead before destroying the dm device Bart Van Assche
2016-08-31 22:18 ` [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues Bart Van Assche
2016-09-01 3:13 ` Mike Snitzer
2016-09-01 14:23 ` Bart Van Assche
2016-09-01 15:05 ` Mike Snitzer
2016-09-01 15:31 ` Bart Van Assche
2016-09-01 15:50 ` Mike Snitzer
2016-09-01 16:12 ` Mike Snitzer
2016-09-01 17:59 ` Bart Van Assche
2016-09-01 19:05 ` Mike Snitzer
2016-09-01 19:35 ` Mike Snitzer
2016-09-01 20:15 ` Bart Van Assche
2016-09-01 20:33 ` Mike Snitzer
2016-09-01 20:39 ` Bart Van Assche
2016-09-01 20:48 ` Mike Snitzer
2016-09-01 20:52 ` Bart Van Assche
2016-09-01 21:17 ` Mike Snitzer
2016-09-01 22:18 ` Mike Snitzer
2016-09-01 22:22 ` Bart Van Assche
2016-09-01 22:26 ` Mike Snitzer
2016-09-01 23:17 ` Bart Van Assche
2016-09-01 23:47 ` Mike Snitzer
2016-09-02 0:03 ` Bart Van Assche
2016-09-02 15:12 ` Mike Snitzer
2016-09-02 16:10 ` should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues] Mike Snitzer
2016-09-02 22:42 ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
2016-09-03 0:34 ` Mike Snitzer
2016-09-07 16:41 ` Mike Snitzer
2016-09-13 8:01 ` [dm-devel] " Bart Van Assche
2016-09-13 14:36 ` Mike Snitzer
2016-08-31 22:18 ` [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop Bart Van Assche
2016-09-01 2:49 ` Mike Snitzer
2016-09-01 14:14 ` Bart Van Assche
2016-09-01 15:06 ` Mike Snitzer
2016-09-01 15:22 ` Bart Van Assche
2016-09-01 15:26 ` Mike Snitzer