All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Muchun Song <songmuchun@bytedance.com>,
	yukuai1@huaweicloud.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, muchun.song@linux.dev,
	stable@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH v2 2/3] block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding requests
Date: Wed, 11 Sep 2024 11:54:49 +0800	[thread overview]
Message-ID: <ZuEUiScRwuXgIrC0@fedora> (raw)
In-Reply-To: <91ce06c7-6965-4d1d-8ed4-d0a6f01acecf@kernel.dk>

On Tue, Sep 10, 2024 at 07:22:16AM -0600, Jens Axboe wrote:
> On 9/3/24 2:16 AM, Muchun Song wrote:
> > Supposing the following scenario.
> > 
> > CPU0                                        CPU1
> > 
> > blk_mq_insert_request()         1) store    blk_mq_unquiesce_queue()
> > blk_mq_run_hw_queue()                       blk_queue_flag_clear(QUEUE_FLAG_QUIESCED)       3) store
> >     if (blk_queue_quiesced())   2) load         blk_mq_run_hw_queues()
> >         return                                      blk_mq_run_hw_queue()
> >     blk_mq_sched_dispatch_requests()                    if (!blk_mq_hctx_has_pending())     4) load
> >                                                            return
> > 
> > The full memory barrier should be inserted between 1) and 2), as well as
> > between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is
> > cleared or CPU1 sees dispatch list or setting of bitmap of software queue.
> > Otherwise, either CPU will not re-run the hardware queue causing starvation.
> > 
> > So the first solution is to 1) add a pair of memory barrier to fix the
> > problem, another solution is to 2) use hctx->queue->queue_lock to synchronize
> > QUEUE_FLAG_QUIESCED. Here, we chose 2) to fix it since memory barrier is not
> > easy to be maintained.
> 
> Same comment here, 72-74 chars wide please.
> 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index b2d0f22de0c7f..ac39f2a346a52 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2202,6 +2202,24 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
> >  }
> >  EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
> >  
> > +static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
> > +{
> > +	bool need_run;
> > +
> > +	/*
> > +	 * When queue is quiesced, we may be switching io scheduler, or
> > +	 * updating nr_hw_queues, or other things, and we can't run queue
> > +	 * any more, even blk_mq_hctx_has_pending() can't be called safely.
> > +	 *
> > +	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
> > +	 * quiesced.
> > +	 */
> > +	__blk_mq_run_dispatch_ops(hctx->queue, false,
> > +				  need_run = !blk_queue_quiesced(hctx->queue) &&
> > +					      blk_mq_hctx_has_pending(hctx));
> > +	return need_run;
> > +}
> 
> This __blk_mq_run_dispatch_ops() is also way too wide, why didn't you
> just break it like where you copied it from?
> 
> > +
> >  /**
> >   * blk_mq_run_hw_queue - Start to run a hardware queue.
> >   * @hctx: Pointer to the hardware queue to run.
> > @@ -2222,20 +2240,23 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
> >  
> >  	might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
> >  
> > -	/*
> > -	 * When queue is quiesced, we may be switching io scheduler, or
> > -	 * updating nr_hw_queues, or other things, and we can't run queue
> > -	 * any more, even __blk_mq_hctx_has_pending() can't be called safely.
> > -	 *
> > -	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
> > -	 * quiesced.
> > -	 */
> > -	__blk_mq_run_dispatch_ops(hctx->queue, false,
> > -		need_run = !blk_queue_quiesced(hctx->queue) &&
> > -		blk_mq_hctx_has_pending(hctx));
> > +	need_run = blk_mq_hw_queue_need_run(hctx);
> > +	if (!need_run) {
> > +		unsigned long flags;
> >  
> > -	if (!need_run)
> > -		return;
> > +		/*
> > +		 * synchronize with blk_mq_unquiesce_queue(), becuase we check
> > +		 * if hw queue is quiesced locklessly above, we need the use
> > +		 * ->queue_lock to make sure we see the up-to-date status to
> > +		 * not miss rerunning the hw queue.
> > +		 */
> > +		spin_lock_irqsave(&hctx->queue->queue_lock, flags);
> > +		need_run = blk_mq_hw_queue_need_run(hctx);
> > +		spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
> > +
> > +		if (!need_run)
> > +			return;
> > +	}
> 
> Is this not solvable on the unquiesce side instead? It's rather a shame
> to add overhead to the fast path to avoid a race with something that's
> super unlikely, like quisce.

Yeah, it can be solved by adding synchronize_rcu()/srcu() in unquiesce
side, but SCSI may call it in non-sleepable context via scsi_internal_device_unblock_nowait().


Thanks,
Ming


  reply	other threads:[~2024-09-11  3:55 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-03  8:16 [PATCH v2 0/3] Fix some starvation problems in block layer Muchun Song
2024-09-03  8:16 ` [PATCH v2 1/3] block: fix missing dispatching request when queue is started or unquiesced Muchun Song
2024-09-10 13:17   ` Jens Axboe
2024-09-11  2:43     ` Muchun Song
2024-09-03  8:16 ` [PATCH v2 2/3] block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding requests Muchun Song
2024-09-04 12:56   ` Ming Lei
2024-09-10 13:22   ` Jens Axboe
2024-09-11  3:54     ` Ming Lei [this message]
2024-09-11  3:59       ` Muchun Song
2024-09-11  5:20         ` Muchun Song
2024-09-12  3:27       ` Muchun Song
2024-09-12  6:27         ` Muchun Song
2024-09-11  3:56     ` Muchun Song
2024-09-03  8:16 ` [PATCH v2 3/3] block: fix ordering between checking BLK_MQ_S_STOPPED " Muchun Song
2024-09-04 13:04   ` Ming Lei
2024-09-10 13:22   ` Jens Axboe
2024-09-11  2:44     ` Muchun Song
2024-09-10  2:49 ` [PATCH v2 0/3] Fix some starvation problems in block layer Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZuEUiScRwuXgIrC0@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=songmuchun@bytedance.com \
    --cc=stable@vger.kernel.org \
    --cc=yukuai1@huaweicloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.