Linux block layer
* [PATCHv3] blk-mq: pop cached request if it is usable
@ 2026-05-21 19:02 Keith Busch
  2026-05-21 19:05 ` Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Keith Busch @ 2026-05-21 19:02 UTC (permalink / raw)
  To: axboe, hch, linux-block; +Cc: tom.leiming, Keith Busch

From: Keith Busch <kbusch@kernel.org>

When submitting a bio to blk-mq, if the task should sleep after peeking
a cached request, but before it pops it, the plug flushes and calls
blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
use-after-free bug. Fix this by popping the cached request before any
possible blocking calls if it is suitable for use.

Popping this request first holds a queue reference, so we avoid any
serialization races with queue freezes and can safely proceed with
dispatching that request to the driver. This potentially increases a
timing window from when a driver wants to freeze its queue to when
requests stop being dispatched. That scenario is off the fast path
though, and drivers need to appropriately handle requests during a
freeze request anyway.

The downside is the popped element needs to be individually freed when
we perform a bio plug merge. The cached request would have had to be
freed later anyway, but this patch does it inline with building the plug
list instead of after flushing it.

Fixes: b0077e269f6c1 ("blk-mq: make sure active queue usage is held for bio_integrity_prep()")
Fixes: 7b4f36cd22a65 ("block: ensure we hold a queue reference when using queue limits")
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v2->v3:

  This pops the cached requests first. It's simpler this way and I don't
  see any strong reason against it.

 block/blk-mq.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d0c37daf568f2..28c2d931e75ea 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3077,7 +3077,7 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
 /*
  * Check if there is a suitable cached request and return it.
  */
-static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
+static struct request *blk_mq_get_cached_request(struct blk_plug *plug,
 		struct request_queue *q, blk_opf_t opf)
 {
 	enum hctx_type type = blk_mq_get_hctx_type(opf);
@@ -3093,27 +3093,10 @@ static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
 		return NULL;
 	if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
 		return NULL;
+	rq_list_pop(&plug->cached_rqs);
 	return rq;
 }
 
-static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
-		struct bio *bio)
-{
-	if (rq_list_pop(&plug->cached_rqs) != rq)
-		WARN_ON_ONCE(1);
-
-	/*
-	 * If any qos ->throttle() end up blocking, we will have flushed the
-	 * plug and hence killed the cached_rq list as well. Pop this entry
-	 * before we throttle.
-	 */
-	rq_qos_throttle(rq->q, bio);
-
-	blk_mq_rq_time_init(rq, blk_time_get_ns());
-	rq->cmd_flags = bio->bi_opf;
-	INIT_LIST_HEAD(&rq->queuelist);
-}
-
 static bool bio_unaligned(const struct bio *bio, struct request_queue *q)
 {
 	unsigned int bs_mask = queue_logical_block_size(q) - 1;
@@ -3152,7 +3135,7 @@ void blk_mq_submit_bio(struct bio *bio)
 	/*
 	 * If the plug has a cached request for this queue, try to use it.
 	 */
-	rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
+	rq = blk_mq_get_cached_request(plug, q, bio->bi_opf);
 
 	/*
 	 * A BIO that was released from a zone write plug has already been
@@ -3211,7 +3194,10 @@ void blk_mq_submit_bio(struct bio *bio)
 
 new_request:
 	if (rq) {
-		blk_mq_use_cached_rq(rq, plug, bio);
+		rq_qos_throttle(rq->q, bio);
+		blk_mq_rq_time_init(rq, blk_time_get_ns());
+		rq->cmd_flags = bio->bi_opf;
+		INIT_LIST_HEAD(&rq->queuelist);
 	} else {
 		rq = blk_mq_get_new_requests(q, plug, bio);
 		if (unlikely(!rq)) {
@@ -3257,12 +3243,10 @@ void blk_mq_submit_bio(struct bio *bio)
 	return;
 
 queue_exit:
-	/*
-	 * Don't drop the queue reference if we were trying to use a cached
-	 * request and thus didn't acquire one.
-	 */
 	if (!rq)
 		blk_queue_exit(q);
+	else
+		blk_mq_free_request(rq);
 }
 
 #ifdef CONFIG_BLK_MQ_STACKING
-- 
2.53.0-Meta



* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-21 19:02 [PATCHv3] blk-mq: pop cached request if it is usable Keith Busch
@ 2026-05-21 19:05 ` Jens Axboe
  2026-05-21 19:05 ` Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2026-05-21 19:05 UTC (permalink / raw)
  To: Keith Busch, hch, linux-block; +Cc: tom.leiming, Keith Busch

On 5/21/26 1:02 PM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> When submitting a bio to blk-mq, if the task should sleep after peeking
> a cached request, but before it pops it, the plug flushes and calls
> blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
> use-after-free bug. Fix this by popping the cached request before any
> possible blocking calls if it is suitable for use.
> 
> Popping this request first holds a queue reference, so we avoid any
> serialization races with queue freezes and can safely proceed with
> dispatching that request to the driver. This potentially increases a
> timing window from when a driver wants to freeze its queue to when
> requests stop being dispatched. That scenario is off the fast path
> though, and drivers need to appropriately handle requests during a
> freeze request anyway.
> 
> The downside is the popped element needs to be individually freed when
> we perform a bio plug merge. The cached request would have had to be
> freed later anyway, but this patch does it inline with building the plug
> list instead of after flushing it.

Keith and I had a sidebar about this earlier today, and yes this is so
much better imho than either v2 or Christoph's sledgehammer.

I've already tested this one too, and as expected, it's not regressing
performance. It's also saner in that there's no gap between peek and pop
anymore, so no risk of further blocking screwing with the list on
unplug.
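
The peek/pop gap can be illustrated with a small user-space toy model.
This is only a sketch: toy_rq, toy_plug and every toy_* function are
hypothetical stand-ins, not kernel APIs, and a "freed" flag replaces a
real free so the model never touches released memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_rq {
	struct toy_rq *next;
	bool freed;		/* set instead of actually freeing memory */
};

struct toy_plug {
	struct toy_rq *cached;	/* head of the cached request list */
};

/* Return the head of the cached list without unlinking it. */
struct toy_rq *toy_peek(struct toy_plug *plug)
{
	return plug->cached;
}

/* Unlink and return the head of the cached list. */
struct toy_rq *toy_pop(struct toy_plug *plug)
{
	struct toy_rq *rq = plug->cached;

	if (rq) {
		plug->cached = rq->next;
		rq->next = NULL;
	}
	return rq;
}

/* Models a plug flush triggered by sleeping: everything still listed is released. */
void toy_flush_on_sleep(struct toy_plug *plug)
{
	struct toy_rq *rq;

	while ((rq = toy_pop(plug)) != NULL)
		rq->freed = true;
}

/* Old ordering: peek, block, then use. Returns true if rq is still valid. */
bool toy_peek_then_block(struct toy_plug *plug)
{
	struct toy_rq *rq = toy_peek(plug);

	toy_flush_on_sleep(plug);	/* rq is still on the list here */
	return rq && !rq->freed;
}

/* New ordering: pop first, so a later flush cannot reach rq. */
bool toy_pop_then_block(struct toy_plug *plug)
{
	struct toy_rq *rq = toy_pop(plug);

	toy_flush_on_sleep(plug);	/* only the remaining entries are released */
	return rq && !rq->freed;
}
```

In this model the peeked request is released by the flush while the
caller still holds a pointer to it, whereas the popped request is
already off the list and survives.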

-- 
Jens Axboe


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-21 19:02 [PATCHv3] blk-mq: pop cached request if it is usable Keith Busch
  2026-05-21 19:05 ` Jens Axboe
@ 2026-05-21 19:05 ` Jens Axboe
  2026-05-21 23:33 ` Ming Lei
  2026-05-22  9:03 ` Christoph Hellwig
  3 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2026-05-21 19:05 UTC (permalink / raw)
  To: hch, linux-block, Keith Busch; +Cc: tom.leiming, Keith Busch


On Thu, 21 May 2026 12:02:53 -0700, Keith Busch wrote:
> When submitting a bio to blk-mq, if the task should sleep after peeking
> a cached request, but before it pops it, the plug flushes and calls
> blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
> use-after-free bug. Fix this by popping the cached request before any
> possible blocking calls if it is suitable for use.
> 
> Popping this request first holds a queue reference, so we avoid any
> serialization races with queue freezes and can safely proceed with
> dispatching that request to the driver. This potentially increases a
> timing window from when a driver wants to freeze its queue to when
> requests stop being dispatched. That scenario is off the fast path
> though, and drivers need to appropriately handle requests during a
> freeze request anyway.
> 
> [...]

Applied, thanks!

[1/1] blk-mq: pop cached request if it is usable
      commit: dc278e9bf2b9513a763353e6b9cc21e0f532954e

Best regards,
-- 
Jens Axboe





* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-21 19:02 [PATCHv3] blk-mq: pop cached request if it is usable Keith Busch
  2026-05-21 19:05 ` Jens Axboe
  2026-05-21 19:05 ` Jens Axboe
@ 2026-05-21 23:33 ` Ming Lei
  2026-05-22  1:44   ` Keith Busch
  2026-05-22  9:03 ` Christoph Hellwig
  3 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2026-05-21 23:33 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, hch, linux-block, Keith Busch

On Thu, May 21, 2026 at 12:02:53PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> When submitting a bio to blk-mq, if the task should sleep after peeking
> a cached request, but before it pops it, the plug flushes and calls
> blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
> use-after-free bug. Fix this by popping the cached request before any
> possible blocking calls if it is suitable for use.
> 
> Popping this request first holds a queue reference, so we avoid any
> serialization races with queue freezes and can safely proceed with
> dispatching that request to the driver. This potentially increases a
> timing window from when a driver wants to freeze its queue to when
> requests stop being dispatched. That scenario is off the fast path
> though, and drivers need to appropriately handle requests during a
> freeze request anyway.
> 
> The downside is the popped element needs to be individually freed when
> we perform a bio plug merge. The cached request would have had to be
> freed later anyway, but this patch does it inline with building the plug
> list instead of after flushing it.
> 
> Fixes: b0077e269f6c1 ("blk-mq: make sure active queue usage is held for bio_integrity_prep()")
> Fixes: 7b4f36cd22a65 ("block: ensure we hold a queue reference when using queue limits")
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v2->v3:
> 
>   This pops the cached requests first. It's simpler this way and I don't
>   see any strong reason against it.

Reviewed-by: Ming Lei <tom.leiming@gmail.com>

BTW, as mentioned in v2, the request may be added back in case of merge,
but seems not a big deal given blk_mq_free_plug_rqs() doesn't free requests
in batch.


Thanks,
Ming


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-21 23:33 ` Ming Lei
@ 2026-05-22  1:44   ` Keith Busch
  2026-05-22  4:12     ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2026-05-22  1:44 UTC (permalink / raw)
  To: Ming Lei; +Cc: Keith Busch, axboe, hch, linux-block

On Fri, May 22, 2026 at 07:33:39AM +0800, Ming Lei wrote:
> 
> BTW, as mentioned in v2, the request may be added back in case of merge,
> but seems not a big deal given blk_mq_free_plug_rqs() doesn't free requests
> in batch.

We could introduce a special goto label for the merge case to push it
back to the cached requests, but it's not clear that it's worth it. The
cached requests are preallocated to match what is about to be dispatched
(up to a limit), so if a merge happens, there may not be any
subsequent bio submission to use it, so it would just get freed as an
unused request later anyway. And if we block on the split bio allocation
such that it frees the cached rq list, I think we've already lost the
optimization battle, so no need to complicate things.


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-22  1:44   ` Keith Busch
@ 2026-05-22  4:12     ` Ming Lei
  2026-05-22 22:08       ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2026-05-22  4:12 UTC (permalink / raw)
  To: Keith Busch; +Cc: Keith Busch, axboe, hch, linux-block

On Thu, May 21, 2026 at 07:44:50PM -0600, Keith Busch wrote:
> On Fri, May 22, 2026 at 07:33:39AM +0800, Ming Lei wrote:
> > 
> > BTW, as mentioned in v2, the request may be added back in case of merge,
> > but seems not a big deal given blk_mq_free_plug_rqs() doesn't free requests
> > in batch.
> 
> We could introduce a special goto label for the merge case to push it

It can be done simply by replacing the added `blk_mq_free_request` with moving
it back to plug list.

> back to the cached requests, but it's not clear that it's worth it. The
> cached requests are preallocated to match what is about to be dispatched
> (up to a limit), so if a merge happens, there may not be any
> subsequent bio submission to use it, so it would just get freed as an
> unused request later anyway. And if we block on the split bio allocation
> such that it frees the cached rq list, I think we've already lost the
> optimization battle, so no need to complicate things.

I mean a plain sequential I/O workload, in which a batch of requests is
allocated but only the first one is used. All the others are freed one by
one, which in theory could be improved to a batch release too.
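
The trade-off between freeing the popped request and putting it back
can be sketched with a counting model. This is a hypothetical user-space
simulation, not kernel code: toy_policy and toy_count_fresh_allocs are
invented names, the cache is a bare counter, and the model assumes a
fresh allocation never refills the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_policy {
	TOY_FREE_ON_MERGE,	/* popped request is freed when the bio merges */
	TOY_PUSH_BACK_ON_MERGE,	/* popped request goes back to the cache */
};

/*
 * Submit nr_bios bios against a cache that starts with `cached` requests.
 * merges[i] says whether bio i ends up merged into an earlier request.
 * Returns how many requests had to be allocated fresh.
 */
unsigned int toy_count_fresh_allocs(enum toy_policy policy, unsigned int cached,
				    const bool *merges, size_t nr_bios)
{
	unsigned int fresh = 0;
	size_t i;

	for (i = 0; i < nr_bios; i++) {
		bool have_cached = cached > 0;

		if (have_cached)
			cached--;	/* pop before anything can block */

		if (merges[i]) {
			/* The bio merged, so the popped request goes unused. */
			if (have_cached && policy == TOY_PUSH_BACK_ON_MERGE)
				cached++;
			continue;
		}

		if (!have_cached)
			fresh++;	/* nothing cached: allocate a new one */
	}
	return fresh;
}
```

With four cached requests and a pattern of one new request, three
merges, then three more new requests, the free-on-merge policy in this
model drains the cache during the merges and needs three fresh
allocations, while the push-back policy needs none.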


Thanks,
Ming


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-21 19:02 [PATCHv3] blk-mq: pop cached request if it is usable Keith Busch
                   ` (2 preceding siblings ...)
  2026-05-21 23:33 ` Ming Lei
@ 2026-05-22  9:03 ` Christoph Hellwig
  2026-05-22 12:23   ` Keith Busch
  3 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2026-05-22  9:03 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, hch, linux-block, tom.leiming, Keith Busch

> +static struct request *blk_mq_get_cached_request(struct blk_plug *plug,
>  		struct request_queue *q, blk_opf_t opf)
>  {
>  	enum hctx_type type = blk_mq_get_hctx_type(opf);
> @@ -3093,27 +3093,10 @@ static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
>  		return NULL;
>  	if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
>  		return NULL;
> +	rq_list_pop(&plug->cached_rqs);
>  	return rq;
>  }

Please add a comment about not sleeping between the peek and pop as
in my earlier patch here.

> @@ -3257,12 +3243,10 @@ void blk_mq_submit_bio(struct bio *bio)
>  	return;
>  
>  queue_exit:
> -	/*
> -	 * Don't drop the queue reference if we were trying to use a cached
> -	 * request and thus didn't acquire one.
> -	 */
>  	if (!rq)
>  		blk_queue_exit(q);
> +	else
> +		blk_mq_free_request(rq);
>  }

I think keeping a comment here would be nice, as would avoiding
the inversion of the condition for a trivial if/else.

But on a higher level, I really think you should add it back to the
batch list here, otherwise we're doing lots of roundtrips through
blk_mq_free_request for trivially mergeable sequential I/O.



* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-22  9:03 ` Christoph Hellwig
@ 2026-05-22 12:23   ` Keith Busch
  0 siblings, 0 replies; 11+ messages in thread
From: Keith Busch @ 2026-05-22 12:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, axboe, linux-block, tom.leiming

On Fri, May 22, 2026 at 11:03:40AM +0200, Christoph Hellwig wrote:
> > @@ -3257,12 +3243,10 @@ void blk_mq_submit_bio(struct bio *bio)
> >  	return;
> >  
> >  queue_exit:
> > -	/*
> > -	 * Don't drop the queue reference if we were trying to use a cached
> > -	 * request and thus didn't acquire one.
> > -	 */
> >  	if (!rq)
> >  		blk_queue_exit(q);
> > +	else
> > +		blk_mq_free_request(rq);
> >  }
> 
> I think keeping a comment here would be nice, as would avoiding
> the inversion of the condition for a trivial if/else.
> 
> But on a higher level, I really think you should add it back to the
> batch list here, otherwise we're doing lots of roundtrips through
> blk_mq_free_request for trivially mergeable sequential I/O.

That gets complicated when we did sleep. This request may be the last
reference that's holding up a freeze. We really want to re-enter the
queue in that case because the driver is trying to change the queue
limits. Unless this is measurably harming performance, it's a corner
case that's just easier to not think about.


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-22  4:12     ` Ming Lei
@ 2026-05-22 22:08       ` Keith Busch
  2026-05-22 22:55         ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2026-05-22 22:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: Keith Busch, axboe, hch, linux-block

On Fri, May 22, 2026 at 12:12:18PM +0800, Ming Lei wrote:
> On Thu, May 21, 2026 at 07:44:50PM -0600, Keith Busch wrote:
> > On Fri, May 22, 2026 at 07:33:39AM +0800, Ming Lei wrote:
> > > 
> > > BTW, as mentioned in v2, the request may be added back in case of merge,
> > > but seems not a big deal given blk_mq_free_plug_rqs() doesn't free requests
> > > in batch.
> > 
> > We could introduce a special goto label for the merge case to push it
> 
> It can be done simply by replacing the added `blk_mq_free_request` with moving
> it back to plug list.

What I'm worried about is hitting a blocking allocation, then the
cached_rqs list is freed, leaving the current request from it the only
one still holding a queue reference. I think we ought to re-enter the
queue in that case.

I suggested a special goto for the successful merge because a plug merge
couldn't happen if a previous allocation did schedule since that flushes
the plug. I guess we'd have to distinguish a plug merge vs a sched
merge, though.


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-22 22:08       ` Keith Busch
@ 2026-05-22 22:55         ` Keith Busch
  2026-05-23 13:56           ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2026-05-22 22:55 UTC (permalink / raw)
  To: Ming Lei; +Cc: Keith Busch, axboe, hch, linux-block

On Fri, May 22, 2026 at 04:08:56PM -0600, Keith Busch wrote:
> On Fri, May 22, 2026 at 12:12:18PM +0800, Ming Lei wrote:
> > On Thu, May 21, 2026 at 07:44:50PM -0600, Keith Busch wrote:
> > > On Fri, May 22, 2026 at 07:33:39AM +0800, Ming Lei wrote:
> > > > 
> > > > BTW, as mentioned in v2, the request may be added back in case of merge,
> > > > but seems not a big deal given blk_mq_free_plug_rqs() doesn't free requests
> > > > in batch.
> > > 
> > > We could introduce a special goto label for the merge case to push it
> > 
> > It can be done simply by replacing the added `blk_mq_free_request` with moving
> > it back to plug list.
> 
> What I'm worried about is hitting a blocking allocation, then the
> cached_rqs list is freed, leaving the current request from it the only
> one still holding a queue reference. I think we ought to re-enter the
> queue in that case.

Hmm, I may be mistaken here. The blocking allocation doesn't call
blk_finish_plug(), so the current plug is left intact; the other
queue_exit goto's are either from non-blocking contexts or a successful
merge that holds queue references in other ways. I guess there is no
queue_exit goto where unconditionally pushing back might be a problem.
So yeah, sorry, maybe restoring it to the cached_rqs is a worthy
optimization to make.
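
The queue reference concern can be sketched as a toy accounting model.
Everything here is hypothetical and lives in user space: toy_refq,
toy_cache and the toy_* helpers are invented names, and the only
assumption carried over from the thread is that each cached request
holds one queue reference until it is freed.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_refq {
	int usage;	/* models the queue usage reference count */
};

struct toy_cache {
	int cached;	/* cached requests, each holding one queue reference */
};

/* Preallocate n requests; each one takes a queue reference. */
void toy_fill_cache(struct toy_refq *q, struct toy_cache *c, int n)
{
	q->usage += n;
	c->cached += n;
}

/* Pop one cached request; the caller now owns its queue reference. */
bool toy_pop_cached(struct toy_cache *c)
{
	if (c->cached == 0)
		return false;
	c->cached--;
	return true;
}

/* Exit path A: free the popped request, dropping its reference now. */
void toy_exit_free(struct toy_refq *q)
{
	q->usage--;
}

/* Exit path B: push the request back; the reference stays with the cache. */
void toy_exit_push_back(struct toy_cache *c)
{
	c->cached++;
}

/* Plug flush: every cached request is freed and its reference dropped. */
void toy_flush(struct toy_refq *q, struct toy_cache *c)
{
	q->usage -= c->cached;
	c->cached = 0;
}

/* A freeze can only complete once nobody holds a reference. */
bool toy_can_freeze(const struct toy_refq *q)
{
	return q->usage == 0;
}
```

In this model both exit paths end with a balanced count. The push-back
path only delays dropping one reference until the plug is flushed, which
is harmless as long as the plug is still intact at that point.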


* Re: [PATCHv3] blk-mq: pop cached request if it is usable
  2026-05-22 22:55         ` Keith Busch
@ 2026-05-23 13:56           ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2026-05-23 13:56 UTC (permalink / raw)
  To: Keith Busch, Ming Lei; +Cc: Keith Busch, hch, linux-block

On 5/22/26 4:55 PM, Keith Busch wrote:
> On Fri, May 22, 2026 at 04:08:56PM -0600, Keith Busch wrote:
>> On Fri, May 22, 2026 at 12:12:18PM +0800, Ming Lei wrote:
>>> On Thu, May 21, 2026 at 07:44:50PM -0600, Keith Busch wrote:
>>>> On Fri, May 22, 2026 at 07:33:39AM +0800, Ming Lei wrote:
>>>>>
>>>>> BTW, as mentioned in v2, the request may be added back in case of merge,
>>>>> but seems not a big deal given blk_mq_free_plug_rqs() doesn't free requests
>>>>> in batch.
>>>>
>>>> We could introduce a special goto label for the merge case to push it
>>>
>>> It can be done simply by replacing the added `blk_mq_free_request` with moving
>>> it back to plug list.
>>
>> What I'm worried about is hitting a blocking allocation, then the
>> cached_rqs list is freed, leaving the current request from it the only
>> one still holding a queue reference. I think we ought to re-enter the
>> queue in that case.
> 
> Hmm, I may be mistaken here. The blocking allocation doesn't call
> blk_finish_plug(), so the current plug is left intact; the other
> queue_exit goto's are either from non-blocking contexts or a
> successful merge that holds queue references in other ways. I guess
> there is no queue_exit goto where unconditionally pushing back might
> be a problem. So yeah, sorry, maybe restoring it to the cached_rqs is
> a worthy optimization to make.

Yeah should be fine, we also discussed this one the other day. Want to
send a patch for that? Just mark it as fixing the previous even if it
isn't a bug fix in the strictest sense of the word, but then we ensure
that if one is backported, the other one will be too.

-- 
Jens Axboe

