* [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup
@ 2024-05-20 3:38 Yang Yang
2024-05-20 18:11 ` Bart Van Assche
0 siblings, 1 reply; 3+ messages in thread
From: Yang Yang @ 2024-05-20 3:38 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel; +Cc: Yang Yang
The depth is 62, and the wake_batch is 8. In the following situation,
the task would hang forever.
t1:                       t2:                       t3:
blk_mq_get_tag            .                         .
io_schedule               .                         .
                          elevator_switch           .
                          blk_mq_freeze_queue       .
                          blk_freeze_queue_start    .
                          blk_mq_freeze_queue_wait  .
                                                    blk_mq_submit_bio
                                                    __bio_queue_enter
Fix this issue by waking up all the waiters sleeping on tags after
freezing the queue.
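
The waiter side of this race is the tag wait loop in blk_mq_get_tag().
A simplified sketch of that loop (based on block/blk-mq-tag.c; not the
exact upstream code, details vary by kernel version):

        struct sbitmap_queue *bt = &data->hctx->tags->bitmap_tags;
        struct sbq_wait_state *ws = bt_wait_ptr(bt, data->hctx);
        DEFINE_SBQ_WAIT(wait);
        unsigned int tag;

        do {
                /* Join the sbitmap wait queue, then recheck for a tag. */
                sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);
                tag = __blk_mq_get_tag(data, bt);
                if (tag != BLK_MQ_NO_TAG)
                        break;

                /*
                 * t1 sleeps here. It is only woken once enough tags are
                 * freed to trigger a batch wakeup on this wait queue.
                 */
                io_schedule();

                sbitmap_finish_wait(bt, ws, &wait);
                ws = bt_wait_ptr(bt, data->hctx);
        } while (1);
        sbitmap_finish_wait(bt, ws, &wait);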
Signed-off-by: Yang Yang <yang.yang@vivo.com>
---
block/blk-core.c | 2 --
block/blk-mq.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e1eacfad6e5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
 	 * prevent I/O from crossing blk_queue_enter().
 	 */
 	blk_freeze_queue_start(q);
-	if (queue_is_mq(q))
-		blk_mq_wake_waiters(q);
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4ecb9db62337..9eb3139e713a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
 	if (++q->mq_freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
 		mutex_unlock(&q->mq_freeze_lock);
-		if (queue_is_mq(q))
+		if (queue_is_mq(q)) {
+			blk_mq_wake_waiters(q);
 			blk_mq_run_hw_queues(q, false);
+		}
 	} else {
 		mutex_unlock(&q->mq_freeze_lock);
 	}
--
2.34.1
* Re: [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup
  2024-05-20  3:38 [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup Yang Yang
@ 2024-05-20 18:11 ` Bart Van Assche
  2024-05-21 11:25   ` YangYang
  0 siblings, 1 reply; 3+ messages in thread
From: Bart Van Assche @ 2024-05-20 18:11 UTC (permalink / raw)
To: Yang Yang, Jens Axboe, linux-block, linux-kernel

On 5/19/24 20:38, Yang Yang wrote:
> The depth is 62, and the wake_batch is 8. In the following situation,
> the task would hang forever.
>
> t1:                       t2:                       t3:
> blk_mq_get_tag            .                         .
> io_schedule               .                         .
>                           elevator_switch           .
>                           blk_mq_freeze_queue       .
>                           blk_freeze_queue_start    .
>                           blk_mq_freeze_queue_wait  .
>                                                     blk_mq_submit_bio
>                                                     __bio_queue_enter
>
> Fix this issue by waking up all the waiters sleeping on tags after
> freezing the queue.

Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the
function that calls blk_queue_enter()?

> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..e1eacfad6e5b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>  	 * prevent I/O from crossing blk_queue_enter().
>  	 */
>  	blk_freeze_queue_start(q);
> -	if (queue_is_mq(q))
> -		blk_mq_wake_waiters(q);
>  	/* Make blk_queue_enter() reexamine the DYING flag. */
>  	wake_up_all(&q->mq_freeze_wq);
>  }

Why has blk_queue_start_drain() been modified? I don't see any reference
in the patch description to blk_queue_start_drain(). Am I perhaps missing
something?

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4ecb9db62337..9eb3139e713a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>  	if (++q->mq_freeze_depth == 1) {
>  		percpu_ref_kill(&q->q_usage_counter);
>  		mutex_unlock(&q->mq_freeze_lock);
> -		if (queue_is_mq(q))
> +		if (queue_is_mq(q)) {
> +			blk_mq_wake_waiters(q);
>  			blk_mq_run_hw_queues(q, false);
> +		}
>  	} else {
>  		mutex_unlock(&q->mq_freeze_lock);
>  	}

Why would the above change be necessary? If the blk_queue_enter() call
by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
io_schedule(), io_schedule() will be woken up indirectly by the
blk_mq_run_hw_queues() call because that call will free one of the tags
that the io_schedule() call is waiting for.
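
The indirect wake-up path I mean is roughly the following (simplified;
the exact call chain depends on the kernel version):

  blk_mq_run_hw_queues()
    ... a dispatched request completes ...
      blk_mq_free_request()
        blk_mq_put_tag()
          sbitmap_queue_clear()
            sbitmap_queue_wake_up()  /* can wake the io_schedule() sleeper */

A simplified sketch of blk_mq_put_tag() (based on block/blk-mq-tag.c,
not the exact upstream code):

        void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
                            unsigned int tag)
        {
                if (!blk_mq_tag_is_reserved(tags, tag)) {
                        const int real_tag = tag - tags->nr_reserved_tags;

                        /* Free the bit; this kicks the sbitmap wakeup logic. */
                        sbitmap_queue_clear(&tags->bitmap_tags, real_tag,
                                            ctx->cpu);
                } else {
                        sbitmap_queue_clear(&tags->breserved_tags, tag,
                                            ctx->cpu);
                }
        }

Thanks,

Bart.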
* Re: [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup
  2024-05-20 18:11 ` Bart Van Assche
@ 2024-05-21 11:25   ` YangYang
  0 siblings, 0 replies; 3+ messages in thread
From: YangYang @ 2024-05-21 11:25 UTC (permalink / raw)
To: Bart Van Assche, linux-block; +Cc: Jens Axboe, linux-kernel

On 2024/5/21 2:11, Bart Van Assche wrote:
> On 5/19/24 20:38, Yang Yang wrote:
>> The depth is 62, and the wake_batch is 8. In the following situation,
>> the task would hang forever.
>>
>> t1:                       t2:                       t3:
>> blk_mq_get_tag            .                         .
>> io_schedule               .                         .
>>                           elevator_switch           .
>>                           blk_mq_freeze_queue       .
>>                           blk_freeze_queue_start    .
>>                           blk_mq_freeze_queue_wait  .
>>                                                     blk_mq_submit_bio
>>                                                     __bio_queue_enter
>>
>> Fix this issue by waking up all the waiters sleeping on tags after
>> freezing the queue.
>
> Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the
> function that calls blk_queue_enter()?

t1:                       t2:                       t3:
blk_mq_submit_bio         .                         .
__blk_mq_alloc_requests   .                         .
blk_mq_get_tag            .                         .
io_schedule               .                         .
                          elevator_switch           .
                          blk_mq_freeze_queue       .
                          blk_freeze_queue_start    .
                          q->mq_freeze_depth=1      .
                          blk_mq_freeze_queue_wait  .
                                                    blk_mq_submit_bio
                                                    __bio_queue_enter
                                                    wait_event(!q->mq_freeze_depth)

>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index a16b5abdbbf5..e1eacfad6e5b 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>>  	 * prevent I/O from crossing blk_queue_enter().
>>  	 */
>>  	blk_freeze_queue_start(q);
>> -	if (queue_is_mq(q))
>> -		blk_mq_wake_waiters(q);
>>  	/* Make blk_queue_enter() reexamine the DYING flag. */
>>  	wake_up_all(&q->mq_freeze_wq);
>>  }
>
> Why has blk_queue_start_drain() been modified? I don't see any reference
> in the patch description to blk_queue_start_drain(). Am I perhaps missing
> something?

blk_mq_wake_waiters() has already been called in blk_freeze_queue_start(),
so I thought it could be removed from blk_queue_start_drain().

>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 4ecb9db62337..9eb3139e713a 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>>  	if (++q->mq_freeze_depth == 1) {
>>  		percpu_ref_kill(&q->q_usage_counter);
>>  		mutex_unlock(&q->mq_freeze_lock);
>> -		if (queue_is_mq(q))
>> +		if (queue_is_mq(q)) {
>> +			blk_mq_wake_waiters(q);
>>  			blk_mq_run_hw_queues(q, false);
>> +		}
>>  	} else {
>>  		mutex_unlock(&q->mq_freeze_lock);
>>  	}
>
> Why would the above change be necessary? If the blk_queue_enter() call
> by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
> io_schedule(), io_schedule() will be woken up indirectly by the
> blk_mq_run_hw_queues() call because that call will free one of the tags
> that the io_schedule() call is waiting for.

This patch is a workaround solution. I think the hang is caused by a lost
wakeup, so after blk_mq_run_hw_queues(), t1 is still waiting for the tag.

bt = 0xFFFFFF802F9C6790 -> (
    sb = (
        depth = 62,
        shift = 6,
        map_nr = 1,
        round_robin = FALSE,
        map = 0xFFFFFF803BF97000,
        alloc_hint = 0x00000049119B3F4C),
    wake_batch = 6,
    wake_index = (counter = 0),
    ws = 0xFFFFFF803BEBCA00,
    ws_active = (counter = 1),
    min_shallow_depth = 48,
    completion_cnt = (counter = 1),
    wakeup_cnt = (counter = 0))

Upon analyzing the coredump, I noticed that sbq->completion_cnt=1, and I
can't figure out why. blk_mq_put_tag->sbitmap_queue_clear->
sbitmap_queue_wake_up(sbq, 1) should have been called multiple times, and
considering that sbq->ws_active=1, sbq->completion_cnt should be greater
than 1.
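
My reading of the batch accounting is based on logic roughly like the
following (a simplified sketch of sbitmap_queue_wake_up() in
lib/sbitmap.c; not the exact upstream code, details vary by kernel
version):

        void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
        {
                unsigned int wake_batch;
                int wakeups;

                /* Completions are only counted while somebody is waiting. */
                if (!atomic_read(&sbq->ws_active))
                        return;

                atomic_add(nr, &sbq->completion_cnt);
                wakeups = atomic_read(&sbq->wakeup_cnt);
                wake_batch = READ_ONCE(sbq->wake_batch);

                /* Defer the wakeup until a full batch has completed. */
                do {
                        if (atomic_read(&sbq->completion_cnt) - wakeups <
                            wake_batch)
                                return;
                } while (!atomic_try_cmpxchg(&sbq->wakeup_cnt,
                                             &wakeups, wakeups + wake_batch));

                __sbitmap_queue_wake_up(sbq, wake_batch);
        }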
Looking forward to some advice from block layer experts. Thanks.

> Thanks,
>
> Bart.