public inbox for linux-block@vger.kernel.org
* [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover
@ 2026-02-26  3:12 Chaitanya Kulkarni
  2026-02-26  3:12 ` [PATCH V2 1/2] block: move bio queue-transition flag fixups into blk_steal_bios() Chaitanya Kulkarni
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2026-02-26  3:12 UTC (permalink / raw)
  To: kbusch, hch, sagi, wagi; +Cc: linux-block, linux-nvme, Chaitanya Kulkarni

Hi,

When a bio is processed on a path's request queue with rq_qos enabled,
it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set.  During NVMe
multipath failover, nvme_failover_req() redirects the bio's bi_bdev to
the multipath head's disk via bio_set_dev(), but the BIO_QOS flags are
left intact.

This series moves the bio queue-transition code into blk_steal_bios()
and adds a patch that clears the BIO_QOS_THROTTLED and BIO_QOS_MERGED
flags in blk_steal_bios().

-ck

v1->v2 https://lore.kernel.org/all/20251124070142.GA17632@lst.de/:

*  Add a new patch to move the bio flag fixup loop from nvme_failover_req()
   into blk_steal_bios() rather than adding it only in the NVMe multipath
   path. (Christoph)

Chaitanya Kulkarni (2):
  block: move bio queue-transition flag fixups into blk_steal_bios()
  block: clear BIO_QOS flags in blk_steal_bios()

 block/blk-mq.c                | 19 +++++++++++++++++++
 drivers/nvme/host/multipath.c | 15 +--------------
 2 files changed, 20 insertions(+), 14 deletions(-)

-- 
2.39.5



* [PATCH V2 1/2] block: move bio queue-transition flag fixups into blk_steal_bios()
  2026-02-26  3:12 [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover Chaitanya Kulkarni
@ 2026-02-26  3:12 ` Chaitanya Kulkarni
  2026-02-26 15:32   ` Christoph Hellwig
  2026-02-26  3:12 ` [PATCH V2 2/2] block: clear BIO_QOS flags in blk_steal_bios() Chaitanya Kulkarni
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2026-02-26  3:12 UTC (permalink / raw)
  To: kbusch, hch, sagi, wagi; +Cc: linux-block, linux-nvme, Chaitanya Kulkarni

blk_steal_bios() transfers bios from a request to a bio_list when the
request is requeued to a different queue. The NVMe multipath failover
path (nvme_failover_req) currently open-codes clearing of REQ_POLLED,
bi_cookie, and REQ_NOWAIT on each bio before calling blk_steal_bios().

Move these fixups into blk_steal_bios() itself so that any caller
automatically gets correct flag state when bios cross queue boundaries.
Simplify nvme_failover_req() accordingly.
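
As a rough illustration only (this is not the patch itself), the
following user-space C sketch models the behaviour described above:
bios are walked once to fix up their flags and are then spliced onto
the tail of the destination list. Every name prefixed with model_ or
MODEL_ is a hypothetical stand-in for the kernel types and flags, and
the flag values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for REQ_POLLED, REQ_NOWAIT and BLK_QC_T_NONE. */
#define MODEL_REQ_POLLED (1u << 0)
#define MODEL_REQ_NOWAIT (1u << 1)
#define MODEL_QC_T_NONE  (~0u)

struct model_bio {
	struct model_bio *next;
	unsigned int opf;
	unsigned int cookie;
};

struct model_bio_list {
	struct model_bio *head;
	struct model_bio *tail;
};

struct model_request {
	struct model_bio *bio;     /* first bio of the request */
	struct model_bio *biotail; /* last bio of the request */
};

/* Fix up per-queue state on each bio, then move the chain to the list. */
void model_steal_bios(struct model_bio_list *list, struct model_request *rq)
{
	struct model_bio *bio;

	for (bio = rq->bio; bio; bio = bio->next) {
		if (bio->opf & MODEL_REQ_POLLED) {
			bio->opf &= ~MODEL_REQ_POLLED;
			bio->cookie = MODEL_QC_T_NONE;
		}
		/* The resubmitting context may block, so drop NOWAIT. */
		bio->opf &= ~MODEL_REQ_NOWAIT;
	}

	if (rq->bio) {
		if (list->tail)
			list->tail->next = rq->bio;
		else
			list->head = rq->bio;
		list->tail = rq->biotail;
		rq->bio = NULL;
		rq->biotail = NULL;
	}
}

/* Small self-check: returns the number of bios found on the list. */
int model_steal_demo(void)
{
	struct model_bio b = { NULL, MODEL_REQ_NOWAIT, 0 };
	struct model_bio a = { &b, MODEL_REQ_POLLED | MODEL_REQ_NOWAIT, 7 };
	struct model_request rq = { &a, &b };
	struct model_bio_list list = { NULL, NULL };
	struct model_bio *bio;
	int moved = 0;

	model_steal_bios(&list, &rq);

	assert(rq.bio == NULL && rq.biotail == NULL);
	for (bio = list.head; bio; bio = bio->next) {
		assert(!(bio->opf & (MODEL_REQ_POLLED | MODEL_REQ_NOWAIT)));
		moved++;
	}
	assert(a.cookie == MODEL_QC_T_NONE);
	return moved;
}
```

The point of doing the fixups inside the helper is that a caller such
as the failover path only has to retarget the bios and call the helper;
it cannot forget a flag.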

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 block/blk-mq.c                | 17 +++++++++++++++++
 drivers/nvme/host/multipath.c | 15 +--------------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a29d8ac9d3e3..419b5c768af2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3412,6 +3412,23 @@ EXPORT_SYMBOL_GPL(blk_rq_prep_clone);
  */
 void blk_steal_bios(struct bio_list *list, struct request *rq)
 {
+	struct bio *bio;
+
+	for (bio = rq->bio; bio; bio = bio->bi_next) {
+		if (bio->bi_opf & REQ_POLLED) {
+			bio->bi_opf &= ~REQ_POLLED;
+			bio->bi_cookie = BLK_QC_T_NONE;
+		}
+		/*
+		 * The alternate request queue that we may end up submitting
+		 * the bio to may be frozen temporarily, in this case REQ_NOWAIT
+		 * will fail the I/O immediately with EAGAIN to the issuer.
+		 * We are not in the issuer context which cannot block. Clear
+		 * the flag to avoid spurious EAGAIN I/O failures.
+		 */
+		bio->bi_opf &= ~REQ_NOWAIT;
+	}
+
 	if (rq->bio) {
 		if (list->tail)
 			list->tail->bi_next = rq->bio;
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index bfcc5904e6a2..cda8a1e21f59 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -154,21 +154,8 @@ void nvme_failover_req(struct request *req)
 	}
 
 	spin_lock_irqsave(&ns->head->requeue_lock, flags);
-	for (bio = req->bio; bio; bio = bio->bi_next) {
+	for (bio = req->bio; bio; bio = bio->bi_next)
 		bio_set_dev(bio, ns->head->disk->part0);
-		if (bio->bi_opf & REQ_POLLED) {
-			bio->bi_opf &= ~REQ_POLLED;
-			bio->bi_cookie = BLK_QC_T_NONE;
-		}
-		/*
-		 * The alternate request queue that we may end up submitting
-		 * the bio to may be frozen temporarily, in this case REQ_NOWAIT
-		 * will fail the I/O immediately with EAGAIN to the issuer.
-		 * We are not in the issuer context which cannot block. Clear
-		 * the flag to avoid spurious EAGAIN I/O failures.
-		 */
-		bio->bi_opf &= ~REQ_NOWAIT;
-	}
 	blk_steal_bios(&ns->head->requeue_list, req);
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
 
-- 
2.39.5



* [PATCH V2 2/2] block: clear BIO_QOS flags in blk_steal_bios()
  2026-02-26  3:12 [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover Chaitanya Kulkarni
  2026-02-26  3:12 ` [PATCH V2 1/2] block: move bio queue-transition flag fixups into blk_steal_bios() Chaitanya Kulkarni
@ 2026-02-26  3:12 ` Chaitanya Kulkarni
  2026-02-26 15:33   ` Christoph Hellwig
  2026-03-10  5:43 ` [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover Chaitanya Kulkarni
  2026-03-10 13:11 ` Jens Axboe
  3 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2026-02-26  3:12 UTC (permalink / raw)
  To: kbusch, hch, sagi, wagi; +Cc: linux-block, linux-nvme, Chaitanya Kulkarni

When a bio goes through the rq_qos infrastructure on a path's request
queue, it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set. These
flags indicate that rq_qos_done_bio() should be called on completion
to update rq_qos accounting.

During path failover in nvme_failover_req(), the bio's bi_bdev is
redirected from the failed path's disk to the multipath head's disk
via bio_set_dev(). However, the BIO_QOS flags are not cleared.

When the bio eventually completes (either successfully via a new path
or with an error via bio_io_error()), rq_qos_done_bio() checks for
these flags and calls __rq_qos_done_bio(q->rq_qos, bio) where q is
obtained from the bio's current bi_bdev - which is now the multipath
head's queue, not the original path's queue.

The multipath head's queue does not have rq_qos enabled (q->rq_qos is
NULL), but the code assumes that if BIO_QOS_* flags are set, q->rq_qos
must be valid.

This breaks when a bio is moved between queues during NVMe multipath
failover, leading to a NULL pointer dereference.
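
A minimal user-space sketch of the failure condition, assuming the
completion path behaves as described above (a flag check followed by an
unconditional use of q->rq_qos). All qos_model_ and QOS_MODEL_ names
are hypothetical and only mirror the shape of the kernel structures;
nothing here is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BIO_QOS_THROTTLED and BIO_QOS_MERGED. */
#define QOS_MODEL_THROTTLED (1u << 0)
#define QOS_MODEL_MERGED    (1u << 1)
#define QOS_MODEL_MASK      (QOS_MODEL_THROTTLED | QOS_MODEL_MERGED)

struct qos_model_rq_qos {
	int unused;
};

struct qos_model_queue {
	struct qos_model_rq_qos *rq_qos; /* NULL when rq_qos is not enabled */
};

struct qos_model_bio {
	struct qos_model_queue *q; /* queue behind the bio's current device */
	unsigned int flags;
};

/* True when completion would use a NULL rq_qos pointer. */
bool qos_model_done_would_crash(const struct qos_model_bio *bio)
{
	bool tracked = (bio->flags & QOS_MODEL_MASK) != 0;

	return tracked && bio->q->rq_qos == NULL;
}

/* Retarget the bio to the head queue, optionally clearing the QoS flags. */
void qos_model_failover(struct qos_model_bio *bio,
			struct qos_model_queue *head, bool clear_qos_flags)
{
	bio->q = head;
	if (clear_qos_flags)
		bio->flags &= ~QOS_MODEL_MASK;
}

/*
 * Returns a bitmask: bit 0 is set if the bio would crash when the flags
 * are left intact, bit 1 if it would crash when they are cleared.
 */
int qos_model_failover_demo(void)
{
	struct qos_model_rq_qos qos = { 0 };
	struct qos_model_queue path = { &qos };
	struct qos_model_queue head = { NULL };
	struct qos_model_bio bio = { &path, QOS_MODEL_THROTTLED };
	int result = 0;

	/* On the original path queue the flag and rq_qos agree. */
	assert(!qos_model_done_would_crash(&bio));

	qos_model_failover(&bio, &head, false);
	if (qos_model_done_would_crash(&bio))
		result |= 1;

	bio.q = &path;
	bio.flags = QOS_MODEL_MERGED;
	qos_model_failover(&bio, &head, true);
	if (qos_model_done_would_crash(&bio))
		result |= 2;

	return result;
}
```

In this model only the failover that leaves the flags intact reaches
the NULL-pointer case, which matches the sequence traced below.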

Execution context timeline:

   * =====> dd process context
   [USER] dd process
     [SYSCALL] write() - dd process context
       submit_bio()
       nvme_ns_head_submit_bio() - path selection
       blk_mq_submit_bio()  #### QOS FLAGS SET HERE

        [USER] dd waits or returns

          ==== I/O in flight on NVMe hardware =====

   ===== End of submission path ====
   ------------------------------------------------------

   * dd process context =====> Interrupt context
   [IRQ] NVMe completion interrupt
       nvme_irq()
        nvme_complete_rq()
         nvme_failover_req() ### BIO MOVED TO HEAD
            spin_lock_irqsave (atomic section)
            bio_set_dev() changes bi_bdev
            ### BUG: QOS flags NOT cleared
            kblockd_schedule_work()

   * Interrupt context =====> kblockd workqueue
   [WQ] kblockd workqueue - kworker process
       nvme_requeue_work()
        submit_bio_noacct()
         nvme_ns_head_submit_bio()
          nvme_find_path() returns NULL
           bio_io_error()
            bio_endio()
             rq_qos_done_bio()  ### CRASH ###

   KERNEL PANIC / OOPS

Crash from blktests nvme/058 (rapid namespace remapping):

[ 1339.636033] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1339.641025] nvme nvme4: rescanning namespaces.
[ 1339.642064] #PF: supervisor read access in kernel mode
[ 1339.642067] #PF: error_code(0x0000) - not-present page
[ 1339.642070] PGD 0 P4D 0
[ 1339.642073] Oops: Oops: 0000 [#1] SMP NOPTI
[ 1339.642078] CPU: 35 UID: 0 PID: 4579 Comm: kworker/35:2H
               Tainted: G   O     N  6.17.0-rc3nvme+ #5 PREEMPT(voluntary)
[ 1339.642084] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 1339.673446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
           BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1339.682359] Workqueue: kblockd nvme_requeue_work [nvme_core]
[ 1339.686613] RIP: 0010:__rq_qos_done_bio+0xd/0x40
[ 1339.690161] Code: 75 dd 5b 5d 41 5c c3 cc cc cc cc 66 90 90 90 90 90 90 90
                     90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 f5
             53 48 89 fb <48> 8b 03 48 8b 40 30 48 85 c0 74 0b 48 89 ee
             48 89 df ff d0 0f 1f
[ 1339.703691] RSP: 0018:ffffc900066f3c90 EFLAGS: 00010202
[ 1339.706844] RAX: ffff888148b9ef00 RBX: 0000000000000000 RCX: 0000000000000000
[ 1339.711136] RDX: 00000000000001c0 RSI: ffff8882aaab8a80 RDI: 0000000000000000
[ 1339.715691] RBP: ffff8882aaab8a80 R08: 0000000000000000 R09: 0000000000000000
[ 1339.720472] R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8882aa3b6010
[ 1339.724650] R13: 0000000000000000 R14: ffff8882338bcef0 R15: ffff8882aa3b6020
[ 1339.729029] FS:  0000000000000000(0000) GS:ffff88985c0cf000(0000) knlGS:0000000000000000
[ 1339.734525] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1339.738563] CR2: 0000000000000000 CR3: 0000000111045000 CR4: 0000000000350ef0
[ 1339.742750] DR0: ffffffff845ccbec DR1: ffffffff845ccbed DR2: ffffffff845ccbee
[ 1339.745630] DR3: ffffffff845ccbef DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 1339.748488] Call Trace:
[ 1339.749512]  <TASK>
[ 1339.750449]  bio_endio+0x71/0x2e0
[ 1339.751833]  nvme_ns_head_submit_bio+0x290/0x320 [nvme_core]
[ 1339.754073]  __submit_bio+0x222/0x5e0
[ 1339.755623]  ? rcu_is_watching+0xd/0x40
[ 1339.757201]  ? submit_bio_noacct_nocheck+0x131/0x370
[ 1339.759210]  submit_bio_noacct_nocheck+0x131/0x370
[ 1339.761189]  ? submit_bio_noacct+0x20/0x620
[ 1339.762849]  nvme_requeue_work+0x4b/0x60 [nvme_core]
[ 1339.764828]  process_one_work+0x20e/0x630
[ 1339.766528]  worker_thread+0x184/0x330
[ 1339.768129]  ? __pfx_worker_thread+0x10/0x10
[ 1339.769942]  kthread+0x10a/0x250
[ 1339.771263]  ? __pfx_kthread+0x10/0x10
[ 1339.772776]  ? __pfx_kthread+0x10/0x10
[ 1339.774381]  ret_from_fork+0x273/0x2e0
[ 1339.775948]  ? __pfx_kthread+0x10/0x10
[ 1339.777504]  ret_from_fork_asm+0x1a/0x30
[ 1339.779163]  </TASK>

Fix this by clearing both the BIO_QOS_THROTTLED and BIO_QOS_MERGED flags
in blk_steal_bios(), which nvme_failover_req() calls when bios are
redirected to the multipath head. This is consistent with the existing
code that clears the REQ_POLLED and REQ_NOWAIT flags when a bio changes
queues.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 419b5c768af2..fea1d46829d6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3427,6 +3427,8 @@ void blk_steal_bios(struct bio_list *list, struct request *rq)
 		 * the flag to avoid spurious EAGAIN I/O failures.
 		 */
 		bio->bi_opf &= ~REQ_NOWAIT;
+		bio_clear_flag(bio, BIO_QOS_THROTTLED);
+		bio_clear_flag(bio, BIO_QOS_MERGED);
 	}
 
 	if (rq->bio) {
-- 
2.39.5



* Re: [PATCH V2 1/2] block: move bio queue-transition flag fixups into blk_steal_bios()
  2026-02-26  3:12 ` [PATCH V2 1/2] block: move bio queue-transition flag fixups into blk_steal_bios() Chaitanya Kulkarni
@ 2026-02-26 15:32   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-02-26 15:32 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, sagi, wagi, linux-block, linux-nvme

On Wed, Feb 25, 2026 at 07:12:42PM -0800, Chaitanya Kulkarni wrote:
> blk_steal_bios() transfers bios from a request to a bio_list when the
> request is requeued to a different queue. The NVMe multipath failover
> path (nvme_failover_req) currently open-codes clearing of REQ_POLLED,
> bi_cookie, and REQ_NOWAIT on each bio before calling blk_steal_bios().
> 
> Move these fixups into blk_steal_bios() itself so that any caller
> automatically gets correct flag state when bios cross queue boundaries.
> Simplify nvme_failover_req() accordingly.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH V2 2/2] block: clear BIO_QOS flags in blk_steal_bios()
  2026-02-26  3:12 ` [PATCH V2 2/2] block: clear BIO_QOS flags in blk_steal_bios() Chaitanya Kulkarni
@ 2026-02-26 15:33   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-02-26 15:33 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, sagi, wagi, linux-block, linux-nvme

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover
  2026-02-26  3:12 [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover Chaitanya Kulkarni
  2026-02-26  3:12 ` [PATCH V2 1/2] block: move bio queue-transition flag fixups into blk_steal_bios() Chaitanya Kulkarni
  2026-02-26  3:12 ` [PATCH V2 2/2] block: clear BIO_QOS flags in blk_steal_bios() Chaitanya Kulkarni
@ 2026-03-10  5:43 ` Chaitanya Kulkarni
  2026-03-10 13:11 ` Jens Axboe
  3 siblings, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2026-03-10  5:43 UTC (permalink / raw)
  To: Chaitanya Kulkarni, kbusch@kernel.org, hch@lst.de,
	sagi@grimberg.me, wagi@monom.org, Jens Axboe
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org

On 2/25/26 19:12, Chaitanya Kulkarni wrote:
> Hi,
>
> When a bio is processed on a path's request queue with rq_qos enabled,
> it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set.  During NVMe
> multipath failover, nvme_failover_req() redirects the bio's bi_bdev to
> the multipath head's disk via bio_set_dev(), but the BIO_QOS flags are
> left intact.
>
> This series moves the bio queue-transition code into blk_steal_bios()
> and adds a patch that clears the BIO_QOS_THROTTLED and BIO_QOS_MERGED
> flags in blk_steal_bios().
>
> -ck
>
> v1->v2 https://lore.kernel.org/all/20251124070142.GA17632@lst.de/:
>
> *  Add a new patch to move the bio flag fixup loop from nvme_failover_req()
>     into blk_steal_bios() rather than adding it only in the NVMe multipath
>     path. (Christoph)
>
> Chaitanya Kulkarni (2):
>    block: move bio queue-transition flag fixups into blk_steal_bios()
>    block: clear BIO_QOS flags in blk_steal_bios()
>
>   block/blk-mq.c                | 19 +++++++++++++++++++
>   drivers/nvme/host/multipath.c | 15 +--------------
>   2 files changed, 20 insertions(+), 14 deletions(-)
>
Gentle ping on this. How should we merge it: through the nvme tree or
the block tree?

-ck




* Re: [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover
  2026-02-26  3:12 [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover Chaitanya Kulkarni
                   ` (2 preceding siblings ...)
  2026-03-10  5:43 ` [PATCH v2 0/2] blk/nvme: fix NULL deref in rq_qos_done_bio() on multipath failover Chaitanya Kulkarni
@ 2026-03-10 13:11 ` Jens Axboe
  3 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2026-03-10 13:11 UTC (permalink / raw)
  To: kbusch, hch, sagi, wagi, Chaitanya Kulkarni; +Cc: linux-block, linux-nvme


On Wed, 25 Feb 2026 19:12:41 -0800, Chaitanya Kulkarni wrote:
> When a bio is processed on a path's request queue with rq_qos enabled,
> it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set.  During NVMe
> multipath failover, nvme_failover_req() redirects the bio's bi_bdev to
> the multipath head's disk via bio_set_dev(), but the BIO_QOS flags are
> left intact.
> 
> This series moves the bio queue-transition code into blk_steal_bios()
> and adds a patch that clears the BIO_QOS_THROTTLED and BIO_QOS_MERGED
> flags in blk_steal_bios().
> 
> [...]

Applied, thanks!

[1/2] block: move bio queue-transition flag fixups into blk_steal_bios()
      commit: b2c45ced591e6cf947560d2d290a51855926b774
[2/2] block: clear BIO_QOS flags in blk_steal_bios()
      commit: daa6c79858e9ca75c548452bf71db8a9e61bde42

Best regards,
-- 
Jens Axboe



