Linux block layer
 help / color / mirror / Atom feed
From: YangYang <yang.yang@vivo.com>
To: Bart Van Assche <bvanassche@acm.org>, linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup
Date: Tue, 21 May 2024 19:25:29 +0800	[thread overview]
Message-ID: <de0c70fc-4965-42e3-b621-b5ffeb3bbc84@vivo.com> (raw)
In-Reply-To: <24d4d60f-05f3-472b-8dfc-4edcb5f7883c@acm.org>

On 2024/5/21 2:11, Bart Van Assche wrote:
> On 5/19/24 20:38, Yang Yang wrote:
>> The depth is 62, and the wake_batch is 8. In the following situation,
>> the task would hang forever.
>>
>>    t1:                 t2:                          t3:
>>    blk_mq_get_tag      .                            .
>>    io_schedule         .                            .
>>                        elevator_switch              .
>>                        blk_mq_freeze_queue          .
>>                        blk_freeze_queue_start       .
>>                        blk_mq_freeze_queue_wait     .
>>                                                     blk_mq_submit_bio
>>                                                     __bio_queue_enter
>>
>> Fix this issue by waking up all the waiters sleeping on tags after
>> freezing the queue.
> 
> Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the function
> that calls blk_queue_enter()?

  t1:                      t2:                          t3:
  blk_mq_submit_bio        .                            .
  __blk_mq_alloc_requests  .                            .
  blk_mq_get_tag           .                            .
  io_schedule              .                            .
                           elevator_switch              .
                           blk_mq_freeze_queue          .
                           blk_freeze_queue_start       .
                           q->mq_freeze_depth=1         .
                           blk_mq_freeze_queue_wait     .
                                                        blk_mq_submit_bio
                                                        __bio_queue_enter
                                                        wait_event(!q->mq_freeze_depth)

> 
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index a16b5abdbbf5..e1eacfad6e5b 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>>        * prevent I/O from crossing blk_queue_enter().
>>        */
>>       blk_freeze_queue_start(q);
>> -    if (queue_is_mq(q))
>> -        blk_mq_wake_waiters(q);
>>       /* Make blk_queue_enter() reexamine the DYING flag. */
>>       wake_up_all(&q->mq_freeze_wq);
>>   }
> 
> Why has blk_queue_start_drain() been modified? I don't see any reference
> in the patch description to blk_queue_start_drain(). Am I perhaps missing
> something?

blk_mq_wake_waiters() has already been called in blk_freeze_queue_start(),
so I thought it can be removed from blk_queue_start_drain().

> 
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 4ecb9db62337..9eb3139e713a 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>>       if (++q->mq_freeze_depth == 1) {
>>           percpu_ref_kill(&q->q_usage_counter);
>>           mutex_unlock(&q->mq_freeze_lock);
>> -        if (queue_is_mq(q))
>> +        if (queue_is_mq(q)) {
>> +            blk_mq_wake_waiters(q);
>>               blk_mq_run_hw_queues(q, false);
>> +        }
>>       } else {
>>           mutex_unlock(&q->mq_freeze_lock);
>>       }
> 
> Why would the above change be necessary? If the blk_queue_enter() call
> by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
> io_schedule(), io_schedule() will be woken up indirectly by the
> blk_mq_run_hw_queues() call because that call will free one of the tags
> that the io_schedule() call is waiting for.

This patch is a workaround solution. I think the hang is caused by
a lost wakeup, so after blk_mq_run_hw_queues(), t1 is still waiting
for the tag.

     bt = 0xFFFFFF802F9C6790 -> (
     sb = (
       depth = 62,
       shift = 6,
       map_nr = 1,
       round_robin = FALSE,
       map = 0xFFFFFF803BF97000,
       alloc_hint = 0x00000049119B3F4C),
     wake_batch = 6,
     wake_index = (counter = 0),
     ws = 0xFFFFFF803BEBCA00,
     ws_active = (counter = 1),
     min_shallow_depth = 48,
     completion_cnt = (counter = 1),
     wakeup_cnt = (counter = 0))

Upon analyzing the coredump, it was noticed that sbq->completion_cnt=1,
and I can't figure out why.
blk_mq_put_tag->sbitmap_queue_clear->sbitmap_queue_wake_up(sbq, 1) should
be called multiple times, considering that sbq->ws_active=1,
sbq->completion_cnt should be greater than 1.
Looking forward to some advice from block layer experts.

Thanks.

> 
> Thanks,
> 
> Bart.


      reply	other threads:[~2024-05-21 11:25 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-20  3:38 [RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup Yang Yang
2024-05-20 18:11 ` Bart Van Assche
2024-05-21 11:25   ` YangYang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de0c70fc-4965-42e3-b621-b5ffeb3bbc84@vivo.com \
    --to=yang.yang@vivo.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox