From: Christoph Hellwig <hch@lst.de>
To: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Cc: kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
linux-nvme@lists.infradead.org
Subject: Re: [PATCH BUG FIX 2/2] nvme-multipath: clear BIO_QOS flags on requeue
Date: Mon, 24 Nov 2025 07:25:52 +0100
Message-ID: <20251124062552.GC16614@lst.de>
In-Reply-To: <20251123191858.69957-3-ckulkarnilinux@gmail.com>

On Sun, Nov 23, 2025 at 11:18:58AM -0800, Chaitanya Kulkarni wrote:
> When a bio goes through the rq_qos infrastructure on a path's request
> queue, it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set. These
> flags indicate that rq_qos_done_bio() should be called on completion
> to update rq_qos accounting.
>
> During path failover in nvme_failover_req(), the bio's bi_bdev is
> redirected from the failed path's disk to the multipath head's disk
> via bio_set_dev(). However, the BIO_QOS flags are not cleared.
>
> When the bio eventually completes (either successfully via a new path
> or with an error via bio_io_error()), rq_qos_done_bio() checks for
> these flags and calls __rq_qos_done_bio(q->rq_qos, bio) where q is
> obtained from the bio's current bi_bdev - which is now the multipath
> head's queue, not the original path's queue.
>
> The multipath head's queue does not have rq_qos enabled (q->rq_qos is
> NULL), but the code assumes that if BIO_QOS_* flags are set, q->rq_qos
> must be valid. This assumption is documented in block/blk-rq-qos.h:
>
> "If a bio has BIO_QOS_xxx set, it implicitly implies that
> q->rq_qos is present."
>
> This breaks when a bio is moved between queues during NVMe multipath
> failover, leading to a NULL pointer dereference.
>
> Execution Context timeline :-
>
> * =====> dd process context
>   [USER] dd process
>     [SYSCALL] write() - dd process context
>       submit_bio()
>         nvme_ns_head_submit_bio() - path selection
>           blk_mq_submit_bio()  #### QOS FLAGS SET HERE
>
>   [USER] dd waits or returns
>
>   ==== I/O in flight on NVMe hardware =====
>
>   ===== End of submission path ====
> ------------------------------------------------------
>
> * dd ====> Interrupt context;
>   [IRQ] NVMe completion interrupt
>     nvme_irq()
>       nvme_complete_rq()
>         nvme_failover_req()  ### BIO MOVED TO HEAD
>           spin_lock_irqsave (atomic section)
>           bio_set_dev() changes bi_bdev
>           ### BUG: QOS flags NOT cleared
>           kblockd_schedule_work()
>
> * Interrupt context =====> kblockd workqueue
>   [WQ] kblockd workqueue - kworker process
>     nvme_requeue_work()
>       submit_bio_noacct()
>         nvme_ns_head_submit_bio()
>           nvme_find_path() returns NULL
>           bio_io_error()
>             bio_endio()
>               rq_qos_done_bio()  ### CRASH ###
>
>   KERNEL PANIC / OOPS
>
> Crash from blktests nvme/058 (rapid namespace remapping):
>
> [ 1339.636033] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 1339.641025] nvme nvme4: rescanning namespaces.
> [ 1339.642064] #PF: supervisor read access in kernel mode
> [ 1339.642067] #PF: error_code(0x0000) - not-present page
> [ 1339.642070] PGD 0 P4D 0
> [ 1339.642073] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 1339.642078] CPU: 35 UID: 0 PID: 4579 Comm: kworker/35:2H
>   Tainted: G O N 6.17.0-rc3nvme+ #5 PREEMPT(voluntary)
> [ 1339.642084] Tainted: [O]=OOT_MODULE, [N]=TEST
> [ 1339.673446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>   BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 1339.682359] Workqueue: kblockd nvme_requeue_work [nvme_core]
> [ 1339.686613] RIP: 0010:__rq_qos_done_bio+0xd/0x40
> [ 1339.690161] Code: 75 dd 5b 5d 41 5c c3 cc cc cc cc 66 90 90 90 90 90 90 90
>   90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 f5
>   53 48 89 fb <48> 8b 03 48 8b 40 30 48 85 c0 74 0b 48 89 ee
>   48 89 df ff d0 0f 1f
> [ 1339.703691] RSP: 0018:ffffc900066f3c90 EFLAGS: 00010202
> [ 1339.706844] RAX: ffff888148b9ef00 RBX: 0000000000000000 RCX: 0000000000000000
> [ 1339.711136] RDX: 00000000000001c0 RSI: ffff8882aaab8a80 RDI: 0000000000000000
> [ 1339.715691] RBP: ffff8882aaab8a80 R08: 0000000000000000 R09: 0000000000000000
> [ 1339.720472] R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8882aa3b6010
> [ 1339.724650] R13: 0000000000000000 R14: ffff8882338bcef0 R15: ffff8882aa3b6020
> [ 1339.729029] FS: 0000000000000000(0000) GS:ffff88985c0cf000(0000) knlGS:0000000000000000
> [ 1339.734525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1339.738563] CR2: 0000000000000000 CR3: 0000000111045000 CR4: 0000000000350ef0
> [ 1339.742750] DR0: ffffffff845ccbec DR1: ffffffff845ccbed DR2: ffffffff845ccbee
> [ 1339.745630] DR3: ffffffff845ccbef DR6: 00000000ffff0ff0 DR7: 0000000000000600
> [ 1339.748488] Call Trace:
> [ 1339.749512] <TASK>
> [ 1339.750449] bio_endio+0x71/0x2e0
> [ 1339.751833] nvme_ns_head_submit_bio+0x290/0x320 [nvme_core]
> [ 1339.754073] __submit_bio+0x222/0x5e0
> [ 1339.755623] ? rcu_is_watching+0xd/0x40
> [ 1339.757201] ? submit_bio_noacct_nocheck+0x131/0x370
> [ 1339.759210] submit_bio_noacct_nocheck+0x131/0x370
> [ 1339.761189] ? submit_bio_noacct+0x20/0x620
> [ 1339.762849] nvme_requeue_work+0x4b/0x60 [nvme_core]
> [ 1339.764828] process_one_work+0x20e/0x630
> [ 1339.766528] worker_thread+0x184/0x330
> [ 1339.768129] ? __pfx_worker_thread+0x10/0x10
> [ 1339.769942] kthread+0x10a/0x250
> [ 1339.771263] ? __pfx_kthread+0x10/0x10
> [ 1339.772776] ? __pfx_kthread+0x10/0x10
> [ 1339.774381] ret_from_fork+0x273/0x2e0
> [ 1339.775948] ? __pfx_kthread+0x10/0x10
> [ 1339.777504] ret_from_fork_asm+0x1a/0x30
> [ 1339.779163] </TASK>
>
> Fix this by clearing both BIO_QOS_THROTTLED and BIO_QOS_MERGED flags
> when bios are redirected to the multipath head in nvme_failover_req().
> This is consistent with the existing code that clears REQ_POLLED and
> REQ_NOWAIT flags when the bio changes queues.
>
> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
> ---
> drivers/nvme/host/multipath.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 3da980dc60d9..2535dba8ce1e 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -168,6 +168,16 @@ void nvme_failover_req(struct request *req)
>  		 * the flag to avoid spurious EAGAIN I/O failures.
>  		 */
>  		bio->bi_opf &= ~REQ_NOWAIT;
> +		/*
> +		 * BIO_QOS_THROTTLED and BIO_QOS_MERGED were set when the bio
> +		 * went through the path's request queue rq_qos infrastructure.
> +		 * The bio is now being redirected to the multipath head's
> +		 * queue which may not have rq_qos enabled, so these flags are
> +		 * no longer valid and must be cleared to prevent
> +		 * rq_qos_done_bio() from dereferencing a NULL q->rq_qos.
> +		 */
> +		bio_clear_flag(bio, BIO_QOS_THROTTLED);
> +		bio_clear_flag(bio, BIO_QOS_MERGED);

This really should go into blk_steal_bios() instead, as should the
existing nowait/polled fixups.
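
To make the idea concrete, here is a rough user-space model of it. This
is not kernel code and not a proposed patch: every model_* name below is
a made-up stand-in, the flag values are arbitrary, and the real
blk_steal_bios() does not do any of this clearing today. The sketch only
shows the shape of the suggestion: the helper that takes bios off a
request also drops the per-queue state (QoS tracking flags, polled,
nowait), so callers such as nvme_failover_req() would not have to
remember each fixup themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits; values are arbitrary, not the kernel's. */
#define MODEL_BIO_QOS_THROTTLED (1u << 0)
#define MODEL_BIO_QOS_MERGED    (1u << 1)
#define MODEL_REQ_POLLED        (1u << 0)
#define MODEL_REQ_NOWAIT        (1u << 1)

struct model_rq_qos { int unused; };
struct model_queue { struct model_rq_qos *rq_qos; };

struct model_bio {
	struct model_queue *q;   /* queue the bio currently points at */
	unsigned int flags;      /* models the BIO_QOS_* tracking bits */
	unsigned int opf;        /* models REQ_POLLED / REQ_NOWAIT */
	struct model_bio *next;
};

struct model_bio_list { struct model_bio *head, *tail; };
struct model_request { struct model_bio *bio; };

/*
 * Hypothetical steal helper: move every bio from rq to list and clear
 * the state that was only meaningful on the queue the bios came from.
 */
static void model_steal_bios(struct model_bio_list *list,
			     struct model_request *rq)
{
	struct model_bio *bio, *last = NULL;

	if (!rq->bio)
		return;

	for (bio = rq->bio; bio; bio = bio->next) {
		bio->opf &= ~(MODEL_REQ_POLLED | MODEL_REQ_NOWAIT);
		bio->flags &= ~(MODEL_BIO_QOS_THROTTLED | MODEL_BIO_QOS_MERGED);
		last = bio;
	}

	if (list->tail)
		list->tail->next = rq->bio;
	else
		list->head = rq->bio;
	list->tail = last;
	rq->bio = NULL;
}

/* True if completion would pass a NULL rq_qos to the accounting hook. */
static bool model_done_would_crash(const struct model_bio *bio)
{
	bool tracked = (bio->flags &
			(MODEL_BIO_QOS_THROTTLED | MODEL_BIO_QOS_MERGED)) != 0;

	return tracked && bio->q->rq_qos == NULL;
}

/* Returns 0 if the model behaves as described, non-zero otherwise. */
static int model_demo(void)
{
	struct model_queue head_q = { .rq_qos = NULL };
	struct model_bio b2 = { &head_q, MODEL_BIO_QOS_MERGED,
				MODEL_REQ_NOWAIT, NULL };
	struct model_bio b1 = { &head_q, MODEL_BIO_QOS_THROTTLED,
				MODEL_REQ_POLLED, &b2 };
	struct model_request rq = { .bio = &b1 };
	struct model_bio_list requeue = { NULL, NULL };

	/* Redirected to the head queue with stale flags: would crash. */
	if (!model_done_would_crash(&b1))
		return 1;

	model_steal_bios(&requeue, &rq);

	if (model_done_would_crash(&b1) || model_done_would_crash(&b2))
		return 2;
	if (b1.opf || b2.opf)
		return 3;
	if (requeue.head != &b1 || requeue.tail != &b2 || rq.bio)
		return 4;
	return 0;
}
```

In the model, both bios already point at a queue without rq_qos, which
mirrors the state after bio_set_dev() in the quoted timeline; the steal
step is then the single place where the stale flags get dropped.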