From: Christoph Hellwig <hch@lst.de>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@meta.com>,
axboe@kernel.dk, hch@lst.de, linux-block@vger.kernel.org
Subject: Re: [PATCHv2] blk-mq: check for stale cached request in blk_mq_submit_bio
Date: Tue, 19 May 2026 08:47:55 +0200 [thread overview]
Message-ID: <20260519064755.GA6682@lst.de> (raw)
In-Reply-To: <agslvk4DLD0OOVLR@kbusch-mbp>
On Mon, May 18, 2026 at 08:44:14AM -0600, Keith Busch wrote:
> On Fri, May 01, 2026 at 10:41:19AM -0700, Keith Busch wrote:
> > When submitting a bio to blk-mq, if the task should sleep after peeking
> > a cached request, but before it pops it, the plug flushes and calls
> > blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
> > use-after-free bug. Fix this by ensuring the cached_rqs still contains
> > our peeked request, and retry the bio submission without it if the
> > request had been freed.
>
> Any thoughts on this one? I know it's an old bug, but I've only recently
> started seeing it happen for some reason.
I still think the only proper fix is to always hold a queue ref over
the checks. Correctness first and then look into optimizing that
with hazard pointers or whatever. Below is the patch I did during
LSF/MM, refeshed using a commit log mostly stolen from you:
---
From e1bff09d6b594f777624240cefd70861137c45b4 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 19 May 2026 06:49:12 +0200
Subject: blk-mq: always take a queue reference in blk_mq_submit_bio
When submitting a bio to blk-mq, if the task should sleep after peeking
a cached request, but before it pops it, the plug flushes and calls
blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
use-after-free bug.
Fix this by alway grabbing a queue usage reference in blk_mq_submit_bio.
While past commit claims that this is slow, we're better safe than
fast as a first priority.
The code had already warned of this possibility, and specifically popped
the request before other known blocking calls, but it didn't handle a
blocking GFP_NOIO alloc. Under memory pressure, allocating the split bio
or the integrity payload are two such cases that can block. The blk-mq
submit_bio function continues using the peeked request that was already
freed and re-initialized, so the driver receives that request with a
NULL'ed mq_hctx, and inevitably panics.
Large parts of the commit message are stolen from an an earlier attempt
to fix this issue by Keith Busch <kbusch@kernel.org>.
Fixes: b0077e269f6c1 ("blk-mq: make sure active queue usage is held for bio_integrity_prep()")
Fixes: 7b4f36cd22a65 ("block: ensure we hold a queue reference when using queue limits")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 65 +++++++++++++++++++-------------------------------
1 file changed, 24 insertions(+), 41 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d0c37daf568f..a9af1a9faedc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3075,43 +3075,42 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
}
/*
- * Check if there is a suitable cached request and return it.
+ * Check if there is a suitable cached request and use it if possible.
*/
-static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
- struct request_queue *q, blk_opf_t opf)
+static struct request *blk_mq_pop_cached_request(struct blk_plug *plug,
+ struct request_queue *q, struct bio *bio)
{
- enum hctx_type type = blk_mq_get_hctx_type(opf);
+ enum hctx_type type = blk_mq_get_hctx_type(bio->bi_opf);
struct request *rq;
if (!plug)
return NULL;
+
+ /*
+ * Note: we must not sleep between the peek of the cached_rqs list here
+ * and the pop below.
+ */
rq = rq_list_peek(&plug->cached_rqs);
if (!rq || rq->q != q)
return NULL;
if (type != rq->mq_hctx->type &&
(type != HCTX_TYPE_READ || rq->mq_hctx->type != HCTX_TYPE_DEFAULT))
return NULL;
- if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
+ if (op_is_flush(rq->cmd_flags) != op_is_flush(bio->bi_opf))
return NULL;
- return rq;
-}
-
-static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
- struct bio *bio)
-{
if (rq_list_pop(&plug->cached_rqs) != rq)
WARN_ON_ONCE(1);
- /*
- * If any qos ->throttle() end up blocking, we will have flushed the
- * plug and hence killed the cached_rq list as well. Pop this entry
- * before we throttle.
- */
rq_qos_throttle(rq->q, bio);
-
blk_mq_rq_time_init(rq, blk_time_get_ns());
rq->cmd_flags = bio->bi_opf;
INIT_LIST_HEAD(&rq->queuelist);
+
+ /*
+ * Drop the extra queue reference from the cached request.
+ */
+ blk_queue_exit(q);
+ return rq;
}
static bool bio_unaligned(const struct bio *bio, struct request_queue *q)
@@ -3149,11 +3148,6 @@ void blk_mq_submit_bio(struct bio *bio)
struct request *rq;
blk_status_t ret;
- /*
- * If the plug has a cached request for this queue, try to use it.
- */
- rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
-
/*
* A BIO that was released from a zone write plug has already been
* through the preparation in this function, already holds a reference
@@ -3162,19 +3156,11 @@ void blk_mq_submit_bio(struct bio *bio)
*/
if (bio_zone_write_plugging(bio)) {
nr_segs = bio->__bi_nr_segments;
- if (rq)
- blk_queue_exit(q);
goto new_request;
}
- /*
- * The cached request already holds a q_usage_counter reference and we
- * don't have to acquire a new one if we use it.
- */
- if (!rq) {
- if (unlikely(bio_queue_enter(bio)))
- return;
- }
+ if (unlikely(bio_queue_enter(bio)))
+ return;
/*
* Device reconfiguration may change logical block size or reduce the
@@ -3210,9 +3196,11 @@ void blk_mq_submit_bio(struct bio *bio)
}
new_request:
- if (rq) {
- blk_mq_use_cached_rq(rq, plug, bio);
- } else {
+ /*
+ * If the plug has a cached request for this queue, try to use it.
+ */
+ rq = blk_mq_pop_cached_request(plug, q, bio);
+ if (!rq) {
rq = blk_mq_get_new_requests(q, plug, bio);
if (unlikely(!rq)) {
if (bio->bi_opf & REQ_NOWAIT)
@@ -3257,12 +3245,7 @@ void blk_mq_submit_bio(struct bio *bio)
return;
queue_exit:
- /*
- * Don't drop the queue reference if we were trying to use a cached
- * request and thus didn't acquire one.
- */
- if (!rq)
- blk_queue_exit(q);
+ blk_queue_exit(q);
}
#ifdef CONFIG_BLK_MQ_STACKING
--
2.53.0
next prev parent reply other threads:[~2026-05-19 6:48 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 17:41 [PATCHv2] blk-mq: check for stale cached request in blk_mq_submit_bio Keith Busch
2026-05-18 14:44 ` Keith Busch
2026-05-19 6:47 ` Christoph Hellwig [this message]
2026-05-19 1:24 ` Ming Lei
2026-05-19 3:07 ` Keith Busch
2026-05-19 3:36 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260519064755.GA6682@lst.de \
--to=hch@lst.de \
--cc=axboe@kernel.dk \
--cc=kbusch@kernel.org \
--cc=kbusch@meta.com \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.