Re: [PATCH -next] blk-mq: fix boot time regression for scsi drives with multiple hctx

From: Ming Lei <ming.lei@redhat.com>
To: Yu Kuai <yukuai3@huawei.com>
Cc: axboe@kernel.dk, djeffery@redhat.com, bvanassche@acm.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	yi.zhang@huawei.com
Subject: Re: [PATCH -next] blk-mq: fix boot time regression for scsi drives with multiple hctx
Date: Tue, 14 Jun 2022 22:26:01 +0800
Message-ID: <YqiaebRquFDjwcPO@T590>
In-Reply-To: <01cb0e49-1154-33db-f572-3960c972fe08@huawei.com>

On Tue, Jun 14, 2022 at 09:15:36PM +0800, Yu Kuai wrote:
> On 2022/06/14 15:31, Ming Lei wrote:
> > On Tue, Jun 14, 2022 at 03:14:10PM +0800, Yu Kuai wrote:
> > > We found that boot time increased by about 8s after upgrading the kernel
> > > from v4.19 to v5.10 (megaraid-sas is used in the environment).
> > 
> > But 'blk-mq: clearing flush request reference in tags->rqs[]' was merged
> > in v5.14 :-)
> Hi,
> 
> Yes, but that patch was applied to 5.10 stable, so it is backported in our
> v5.10. Sorry that I didn't mention that.
> 
> > 
> > > 
> > > Following is where the extra time is spent:
> > > 
> > > 
> > >   __scsi_remove_device
> > >    blk_cleanup_queue
> > >     blk_mq_exit_queue
> > >      blk_mq_exit_hw_queues
> > >       blk_mq_exit_hctx
> > >        blk_mq_clear_flush_rq_mapping -> function latency is 0.1ms
> > 
> > So queue_depth looks pretty large, is it 4k?
> No, in the environment, it's just 32, and nr_hw_queues is 128, which
> means each blk_cleanup_queue() will cost about 10-20 ms.

So 32 cmpxchg operations take 100us, which is really crazy. What is the
arch and processor?

> 
> > 
> > But if it is 0.1ms, how can it cause an 8s delay? That would require 80K
> > hw queues to take that long, so I guess there must be other delay added
> > by the BLK_MQ_F_TAG_HCTX_SHARED feature.
> 
> Please see the details in reason 2): scsi scan will call
> __scsi_remove_device() many times (for each host, each channel, each
> target).

80K / 128 = 640 LUNs.

OK, that can be true.
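
As a rough sanity check of the figures quoted in this thread, the estimate can be written out as below. This is only a sketch: the helper names are hypothetical, and the inputs (8s total, 0.1ms per hctx exit, 128 hw queues) are the approximate numbers reported above, not new measurements.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many hctx exits are needed for the per-exit
 * latency to add up to the observed delay (both in microseconds).
 */
long hctx_exits_for_delay(long delay_us, long per_exit_us)
{
	assert(per_exit_us > 0);	/* guard against division by zero */
	return delay_us / per_exit_us;
}

/*
 * Hypothetical helper: cost of one blk_cleanup_queue() in microseconds,
 * assuming every hw queue pays the same per-exit latency.
 */
long cleanup_cost_us(long nr_hw_queues, long per_exit_us)
{
	return nr_hw_queues * per_exit_us;
}
```

With delay_us = 8000000 and per_exit_us = 100 the first helper gives 80000 exits; divided by 128 hw queues that is 625 device removals, the same order as the 640 estimated above. The second helper gives 12800us per cleanup for 128 hw queues, which matches the reported 10-20 ms.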

> > 
> > >         cmpxchg
> > > 
> > > There are three reasons:
> > > 1) megaraid-sas is using multiple hctxs in v5.10, thus blk_mq_exit_hctx()
> > > will be called many more times in v5.10 compared to v4.19.
> > > 2) scsi will scan for each target, thus __scsi_remove_device() will be
> > > called many times.
> > > 3) blk_mq_clear_flush_rq_mapping() is introduced after v4.19; it will
> > > call cmpxchg() for each request, and function latency is about 0.1ms.
> > > 
> > > Since blk_mq_clear_flush_rq_mapping() will only be called while the
> > > queue is already frozen, which means there are no inflight requests,
> > > it's safe to set 'tags->rqs[]' to NULL directly instead of using
> > > cmpxchg(). Tests show that with this change, function latency of
> > > blk_mq_clear_flush_rq_mapping() is about 1us, and boot time is not
> > > increased.
> > 
> > tags are shared among all LUNs attached to the host, so freezing a single
> > request queue here means nothing, and your patch doesn't work.
> 
> You're right, I forgot that tags can be shared.
> 
> > 
> > Please test the following patch, and see if it can improve boot delay for
> > your case.
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index e9bf950983c7..1463076a527c 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -3443,8 +3443,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
> >   	if (blk_mq_hw_queue_mapped(hctx))
> >   		blk_mq_tag_idle(hctx);
> > -	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
> > -			set->queue_depth, flush_rq);
> > +	if (blk_queue_init_done(q))
> > +		blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
> > +				set->queue_depth, flush_rq);
> >   	if (set->ops->exit_request)
> >   		set->ops->exit_request(set, flush_rq, hctx_idx);
> 
> Thanks for the patch, I tested it and the boot delay is fixed.

Thanks for the test, and I will send it out tomorrow.

Thanks,
Ming
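
For readers following the thread, here is a minimal userspace sketch of the two ideas discussed above: clearing stale flush request pointers slot by slot with compare-and-swap, and skipping that pass behind a guard in the way the proposed diff does. It is not the kernel code. `struct request` is a stand-in type, C11 atomics stand in for the kernel's cmpxchg(), and the `init_done` argument is a hypothetical substitute for the blk_queue_init_done(q) check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in type, not the kernel's struct request. */
struct request {
	int tag;
};

/*
 * Clear every slot that still points at flush_rq. Compare-and-swap is
 * used so that a slot which another context has already repointed to a
 * different request is left alone. Returns the number of slots cleared.
 */
size_t clear_flush_rq_mapping(_Atomic(struct request *) *rqs,
			      size_t depth, struct request *flush_rq)
{
	size_t cleared = 0;

	assert(rqs != NULL);
	for (size_t i = 0; i < depth; i++) {
		struct request *expected = flush_rq;

		if (atomic_compare_exchange_strong(&rqs[i], &expected,
						   (struct request *)NULL))
			cleared++;
	}
	return cleared;
}

/*
 * Guarded teardown in the shape of the proposed diff: the per-slot
 * pass runs only when init_done is non-zero (hypothetical stand-in
 * for blk_queue_init_done(q)).
 */
size_t exit_hctx(_Atomic(struct request *) *rqs, size_t depth,
		 struct request *flush_rq, int init_done)
{
	if (!init_done)
		return 0;
	return clear_flush_rq_mapping(rqs, depth, flush_rq);
}
```

The guard avoids paying for queue_depth compare-and-swap operations per hctx on queues that are torn down before init is done, which is the case the boot-time scan described above hits repeatedly.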

