From: Christoph Hellwig <hch@lst.de>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@meta.com>,
	axboe@kernel.dk, hch@lst.de, linux-block@vger.kernel.org
Subject: Re: [PATCHv2] blk-mq: check for stale cached request in blk_mq_submit_bio
Date: Tue, 19 May 2026 08:47:55 +0200
Message-ID: <20260519064755.GA6682@lst.de>
In-Reply-To: <agslvk4DLD0OOVLR@kbusch-mbp>

On Mon, May 18, 2026 at 08:44:14AM -0600, Keith Busch wrote:
> On Fri, May 01, 2026 at 10:41:19AM -0700, Keith Busch wrote:
> > When submitting a bio to blk-mq, if the task should sleep after peeking
> > a cached request, but before it pops it, the plug flushes and calls
> > blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
> > use-after-free bug. Fix this by ensuring the cached_rqs still contains
> > our peeked request, and retry the bio submission without it if the
> > request had been freed.
> 
> Any thoughts on this one? I know it's an old bug, but I've only recently
> started seeing it happen for some reason.

I still think the only proper fix is to always hold a queue ref over
the checks.  Correctness first and then look into optimizing that
with hazard pointers or whatever.  Below is the patch I did during
LSF/MM, refreshed using a commit log mostly stolen from you:

---
From e1bff09d6b594f777624240cefd70861137c45b4 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 19 May 2026 06:49:12 +0200
Subject: blk-mq: always take a queue reference in blk_mq_submit_bio

When submitting a bio to blk-mq, if the task should sleep after peeking
a cached request, but before it pops it, the plug flushes and calls
blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
use-after-free bug.

Fix this by always grabbing a queue usage reference in blk_mq_submit_bio.
While a past commit claims that this is slow, being safe takes priority
over being fast.

The code had already warned of this possibility, and specifically popped
the request before other known blocking calls, but it didn't handle a
blocking GFP_NOIO alloc. Under memory pressure, allocating the split bio
or the integrity payload are two such cases that can block. The blk-mq
submit_bio function continues using the peeked request that was already
freed and re-initialized, so the driver receives that request with a
NULL'ed mq_hctx, and inevitably panics.

Large parts of the commit message are stolen from an earlier attempt
to fix this issue by Keith Busch <kbusch@kernel.org>.

Fixes: b0077e269f6c1 ("blk-mq: make sure active queue usage is held for bio_integrity_prep()")
Fixes: 7b4f36cd22a65 ("block: ensure we hold a queue reference when using queue limits")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 65 +++++++++++++++++++-------------------------------
 1 file changed, 24 insertions(+), 41 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d0c37daf568f..a9af1a9faedc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3075,43 +3075,42 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
 }
 
 /*
- * Check if there is a suitable cached request and return it.
+ * Check if there is a suitable cached request and use it if possible.
  */
-static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
-		struct request_queue *q, blk_opf_t opf)
+static struct request *blk_mq_pop_cached_request(struct blk_plug *plug,
+		struct request_queue *q, struct bio *bio)
 {
-	enum hctx_type type = blk_mq_get_hctx_type(opf);
+	enum hctx_type type = blk_mq_get_hctx_type(bio->bi_opf);
 	struct request *rq;
 
 	if (!plug)
 		return NULL;
+
+	/*
+	 * Note: we must not sleep between the peek of the cached_rqs list here
+	 * and the pop below.
+	 */
 	rq = rq_list_peek(&plug->cached_rqs);
 	if (!rq || rq->q != q)
 		return NULL;
 	if (type != rq->mq_hctx->type &&
 	    (type != HCTX_TYPE_READ || rq->mq_hctx->type != HCTX_TYPE_DEFAULT))
 		return NULL;
-	if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
+	if (op_is_flush(rq->cmd_flags) != op_is_flush(bio->bi_opf))
 		return NULL;
-	return rq;
-}
-
-static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
-		struct bio *bio)
-{
 	if (rq_list_pop(&plug->cached_rqs) != rq)
 		WARN_ON_ONCE(1);
 
-	/*
-	 * If any qos ->throttle() end up blocking, we will have flushed the
-	 * plug and hence killed the cached_rq list as well. Pop this entry
-	 * before we throttle.
-	 */
 	rq_qos_throttle(rq->q, bio);
-
 	blk_mq_rq_time_init(rq, blk_time_get_ns());
 	rq->cmd_flags = bio->bi_opf;
 	INIT_LIST_HEAD(&rq->queuelist);
+
+	/*
+	 * Drop the extra queue reference from the cached request.
+	 */
+	blk_queue_exit(q);
+	return rq;
 }
 
 static bool bio_unaligned(const struct bio *bio, struct request_queue *q)
@@ -3149,11 +3148,6 @@ void blk_mq_submit_bio(struct bio *bio)
 	struct request *rq;
 	blk_status_t ret;
 
-	/*
-	 * If the plug has a cached request for this queue, try to use it.
-	 */
-	rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
-
 	/*
 	 * A BIO that was released from a zone write plug has already been
 	 * through the preparation in this function, already holds a reference
@@ -3162,19 +3156,11 @@ void blk_mq_submit_bio(struct bio *bio)
 	 */
 	if (bio_zone_write_plugging(bio)) {
 		nr_segs = bio->__bi_nr_segments;
-		if (rq)
-			blk_queue_exit(q);
 		goto new_request;
 	}
 
-	/*
-	 * The cached request already holds a q_usage_counter reference and we
-	 * don't have to acquire a new one if we use it.
-	 */
-	if (!rq) {
-		if (unlikely(bio_queue_enter(bio)))
-			return;
-	}
+	if (unlikely(bio_queue_enter(bio)))
+		return;
 
 	/*
 	 * Device reconfiguration may change logical block size or reduce the
@@ -3210,9 +3196,11 @@ void blk_mq_submit_bio(struct bio *bio)
 	}
 
 new_request:
-	if (rq) {
-		blk_mq_use_cached_rq(rq, plug, bio);
-	} else {
+	/*
+	 * If the plug has a cached request for this queue, try to use it.
+	 */
+	rq = blk_mq_pop_cached_request(plug, q, bio);
+	if (!rq) {
 		rq = blk_mq_get_new_requests(q, plug, bio);
 		if (unlikely(!rq)) {
 			if (bio->bi_opf & REQ_NOWAIT)
@@ -3257,12 +3245,7 @@ void blk_mq_submit_bio(struct bio *bio)
 	return;
 
 queue_exit:
-	/*
-	 * Don't drop the queue reference if we were trying to use a cached
-	 * request and thus didn't acquire one.
-	 */
-	if (!rq)
-		blk_queue_exit(q);
+	blk_queue_exit(q);
 }
 
 #ifdef CONFIG_BLK_MQ_STACKING
-- 
2.53.0
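
For readers following the thread, the ordering problem described in the
commit message can be modelled in a few lines of userspace C. This is only
a sketch under stated assumptions, not kernel code: every model_* name is
hypothetical, the freed flag stands in for a request being freed and
re-initialized, and usage loosely models q_usage_counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_rq {
	struct model_rq *next;
	bool freed;		/* stands in for "freed and re-initialized" */
};

struct model_queue {
	int usage;		/* loosely models q_usage_counter */
};

struct model_plug {
	struct model_rq *cached;	/* loosely models plug->cached_rqs */
	struct model_queue *q;
};

/* Models the plug flush on sleep: every cached request is released. */
static void model_flush_plug(struct model_plug *plug)
{
	while (plug->cached) {
		struct model_rq *rq = plug->cached;

		plug->cached = rq->next;
		rq->freed = true;
		assert(plug->q->usage > 0);
		plug->q->usage--;	/* each cached request held one reference */
	}
}

/* Models an allocation that may block under memory pressure. */
static void model_blocking_alloc(struct model_plug *plug, bool sleeps)
{
	if (sleeps)
		model_flush_plug(plug);
}

/* Old ordering: peek, possibly sleep, then keep using the peeked pointer. */
static bool old_order_uses_stale_rq(struct model_plug *plug, bool sleeps)
{
	struct model_rq *rq = plug->cached;	/* peek */

	model_blocking_alloc(plug, sleeps);
	return rq && rq->freed;	/* true: a stale request would be used */
}

/* New ordering: take a queue reference, possibly sleep, then peek and pop. */
static struct model_rq *new_order_pop(struct model_plug *plug, bool sleeps)
{
	struct model_rq *rq;

	plug->q->usage++;		/* own reference, taken unconditionally */
	model_blocking_alloc(plug, sleeps);
	rq = plug->cached;		/* peek */
	if (!rq)
		return NULL;		/* caller would allocate a new request */
	plug->cached = rq->next;	/* pop, with no sleep since the peek */
	plug->q->usage--;		/* drop the extra reference from the cached rq */
	return rq;
}
```

With one cached request and a sleeping allocation, the old ordering ends
up holding a stale pointer. The new ordering instead finds the list empty,
returns NULL, and still holds the reference it took up front, so the
caller can fall back to allocating a new request.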

