From: Christoph Hellwig <hch@lst.de>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@meta.com>,
axboe@kernel.dk, hch@lst.de, linux-block@vger.kernel.org
Subject: Re: [PATCHv2] blk-mq: check for stale cached request in blk_mq_submit_bio
Date: Tue, 19 May 2026 08:47:55 +0200 [thread overview]
Message-ID: <20260519064755.GA6682@lst.de> (raw)
In-Reply-To: <agslvk4DLD0OOVLR@kbusch-mbp>
On Mon, May 18, 2026 at 08:44:14AM -0600, Keith Busch wrote:
> On Fri, May 01, 2026 at 10:41:19AM -0700, Keith Busch wrote:
> > When submitting a bio to blk-mq, if the task should sleep after peeking
> > a cached request, but before it pops it, the plug flushes and calls
> > blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
> > use-after-free bug. Fix this by ensuring the cached_rqs still contains
> > our peeked request, and retry the bio submission without it if the
> > request had been freed.
>
> Any thoughts on this one? I know it's an old bug, but I've only recently
> started seeing it happen for some reason.
I still think the only proper fix is to always hold a queue ref over
the checks. Correctness first and then look into optimizing that
with hazard pointers or whatever. Below is the patch I did during
LSF/MM, refeshed using a commit log mostly stolen from you:
---
From e1bff09d6b594f777624240cefd70861137c45b4 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 19 May 2026 06:49:12 +0200
Subject: blk-mq: always take a queue reference in blk_mq_submit_bio
When submitting a bio to blk-mq, if the task should sleep after peeking
a cached request, but before it pops it, the plug flushes and calls
blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
use-after-free bug.
Fix this by alway grabbing a queue usage reference in blk_mq_submit_bio.
While past commit claims that this is slow, we're better safe than
fast as a first priority.
The code had already warned of this possibility, and specifically popped
the request before other known blocking calls, but it didn't handle a
blocking GFP_NOIO alloc. Under memory pressure, allocating the split bio
or the integrity payload are two such cases that can block. The blk-mq
submit_bio function continues using the peeked request that was already
freed and re-initialized, so the driver receives that request with a
NULL'ed mq_hctx, and inevitably panics.
Large parts of the commit message are stolen from an an earlier attempt
to fix this issue by Keith Busch <kbusch@kernel.org>.
Fixes: b0077e269f6c1 ("blk-mq: make sure active queue usage is held for bio_integrity_prep()")
Fixes: 7b4f36cd22a65 ("block: ensure we hold a queue reference when using queue limits")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 65 +++++++++++++++++++-------------------------------
1 file changed, 24 insertions(+), 41 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d0c37daf568f..a9af1a9faedc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3075,43 +3075,42 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
}
/*
- * Check if there is a suitable cached request and return it.
+ * Check if there is a suitable cached request and use it if possible.
*/
-static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
- struct request_queue *q, blk_opf_t opf)
+static struct request *blk_mq_pop_cached_request(struct blk_plug *plug,
+ struct request_queue *q, struct bio *bio)
{
- enum hctx_type type = blk_mq_get_hctx_type(opf);
+ enum hctx_type type = blk_mq_get_hctx_type(bio->bi_opf);
struct request *rq;
if (!plug)
return NULL;
+
+ /*
+ * Note: we must not sleep between the peek of the cached_rqs list here
+ * and the pop below.
+ */
rq = rq_list_peek(&plug->cached_rqs);
if (!rq || rq->q != q)
return NULL;
if (type != rq->mq_hctx->type &&
(type != HCTX_TYPE_READ || rq->mq_hctx->type != HCTX_TYPE_DEFAULT))
return NULL;
- if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
+ if (op_is_flush(rq->cmd_flags) != op_is_flush(bio->bi_opf))
return NULL;
- return rq;
-}
-
-static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
- struct bio *bio)
-{
if (rq_list_pop(&plug->cached_rqs) != rq)
WARN_ON_ONCE(1);
- /*
- * If any qos ->throttle() end up blocking, we will have flushed the
- * plug and hence killed the cached_rq list as well. Pop this entry
- * before we throttle.
- */
rq_qos_throttle(rq->q, bio);
-
blk_mq_rq_time_init(rq, blk_time_get_ns());
rq->cmd_flags = bio->bi_opf;
INIT_LIST_HEAD(&rq->queuelist);
+
+ /*
+ * Drop the extra queue reference from the cached request.
+ */
+ blk_queue_exit(q);
+ return rq;
}
static bool bio_unaligned(const struct bio *bio, struct request_queue *q)
@@ -3149,11 +3148,6 @@ void blk_mq_submit_bio(struct bio *bio)
struct request *rq;
blk_status_t ret;
- /*
- * If the plug has a cached request for this queue, try to use it.
- */
- rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
-
/*
* A BIO that was released from a zone write plug has already been
* through the preparation in this function, already holds a reference
@@ -3162,19 +3156,11 @@ void blk_mq_submit_bio(struct bio *bio)
*/
if (bio_zone_write_plugging(bio)) {
nr_segs = bio->__bi_nr_segments;
- if (rq)
- blk_queue_exit(q);
goto new_request;
}
- /*
- * The cached request already holds a q_usage_counter reference and we
- * don't have to acquire a new one if we use it.
- */
- if (!rq) {
- if (unlikely(bio_queue_enter(bio)))
- return;
- }
+ if (unlikely(bio_queue_enter(bio)))
+ return;
/*
* Device reconfiguration may change logical block size or reduce the
@@ -3210,9 +3196,11 @@ void blk_mq_submit_bio(struct bio *bio)
}
new_request:
- if (rq) {
- blk_mq_use_cached_rq(rq, plug, bio);
- } else {
+ /*
+ * If the plug has a cached request for this queue, try to use it.
+ */
+ rq = blk_mq_pop_cached_request(plug, q, bio);
+ if (!rq) {
rq = blk_mq_get_new_requests(q, plug, bio);
if (unlikely(!rq)) {
if (bio->bi_opf & REQ_NOWAIT)
@@ -3257,12 +3245,7 @@ void blk_mq_submit_bio(struct bio *bio)
return;
queue_exit:
- /*
- * Don't drop the queue reference if we were trying to use a cached
- * request and thus didn't acquire one.
- */
- if (!rq)
- blk_queue_exit(q);
+ blk_queue_exit(q);
}
#ifdef CONFIG_BLK_MQ_STACKING
--
2.53.0
next prev parent reply other threads:[~2026-05-19 6:48 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 17:41 [PATCHv2] blk-mq: check for stale cached request in blk_mq_submit_bio Keith Busch
2026-05-18 14:44 ` Keith Busch
2026-05-19 6:47 ` Christoph Hellwig [this message]
2026-05-19 1:24 ` Ming Lei
2026-05-19 3:07 ` Keith Busch
2026-05-19 3:36 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260519064755.GA6682@lst.de \
--to=hch@lst.de \
--cc=axboe@kernel.dk \
--cc=kbusch@kernel.org \
--cc=kbusch@meta.com \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox