public inbox for linux-block@vger.kernel.org
From: Chao Shi <coshi036@gmail.com>
To: linux-nvme@lists.infradead.org
Cc: linux-block@vger.kernel.org, hch@lst.de, kbusch@kernel.org,
	sagi@grimberg.me, axboe@kernel.dk, Chao Shi <coshi036@gmail.com>,
	Sungwoo Kim <iam@sung-woo.kim>, Dave Tian <daveti@purdue.edu>,
	Weidong Zhu <weizhu@fiu.edu>
Subject: [PATCH RFC 2/2] nvme: set integrity metadata size for EXT_LBAS non-PI namespace
Date: Sun, 26 Apr 2026 20:34:57 -0400
Message-ID: <20260427003457.1264511-2-coshi036@gmail.com>
In-Reply-To: <20260427003457.1264511-1-coshi036@gmail.com>

This patch is an alternative to patch 1/2: instead of downgrading the
assertion in nvme_setup_rw(), it addresses the root cause at the
integrity-profile level so that the assertion is never reached.

For PCIe namespaces with extended LBAs (NVME_NS_EXT_LBAS set, flbas
bit 4) but without PI and without NVME_NS_METADATA_SUPPORTED, the early-
exit branch of nvme_init_integrity() at core.c:1834 returns false
without populating bi->metadata_size.  As a result blk_get_integrity()
returns NULL (it checks q->limits.integrity.metadata_size via
blk_integrity_queue_supports_integrity()), bio_integrity_action() returns
0, bio_integrity_prep() is never called, and REQ_INTEGRITY is never set
on bios dispatched to the namespace.  Any such bio that reaches
nvme_setup_rw() triggers WARN_ON_ONCE because head->ms != 0 but
blk_integrity_rq() returns false.

Populate bi->metadata_size = head->ms in the early-exit path for the
EXT_LBAS non-PI case.  This is sufficient to make blk_get_integrity()
return non-NULL, which causes bio_integrity_action() to return non-zero,
which causes bio_integrity_prep() to run and set REQ_INTEGRITY on any
bio submitted to the namespace.  Requests that reach nvme_setup_rw()
then satisfy blk_integrity_rq() and the assertion is not reached.
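
The chain described above can be sketched as a small user-space model.
This is an illustration only, not kernel code: every toy_* name is
hypothetical, and the model assumes (as stated above) that the integrity
profile is visible exactly when metadata_size is non-zero and that
REQ_INTEGRITY is set exactly when a profile is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_integrity { unsigned int metadata_size; };
struct toy_queue { struct toy_integrity integrity; };
struct toy_bio { bool req_integrity; };

/* Assumed behaviour: a profile is returned only if metadata_size != 0. */
static const struct toy_integrity *toy_get_integrity(const struct toy_queue *q)
{
	return q->integrity.metadata_size ? &q->integrity : NULL;
}

/* Assumed behaviour: the prep step sets the flag only when a profile exists. */
static void toy_submit(const struct toy_queue *q, struct toy_bio *bio)
{
	bio->req_integrity = toy_get_integrity(q) != NULL;
}

/* Models the assertion condition: metadata present but no integrity flag. */
static bool toy_would_warn(unsigned int ms, const struct toy_bio *bio)
{
	return ms != 0 && !bio->req_integrity;
}

/* Runs one bio through the model and reports whether the warning fires. */
static bool toy_scenario(unsigned int ms, unsigned int metadata_size)
{
	struct toy_queue q = { .integrity = { .metadata_size = metadata_size } };
	struct toy_bio bio = { .req_integrity = false };

	toy_submit(&q, &bio);
	return toy_would_warn(ms, &bio);
}
```

In this model toy_scenario(8, 0) corresponds to the current behaviour
(ms set, metadata_size left at 0) and toy_scenario(8, 8) to the behaviour
with this patch applied.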

blk_validate_integrity_limits() accepts this configuration: with
csum_type=BLK_INTEGRITY_CSUM_NONE, pi_tuple_size=0, and pi_offset=0,
all checks pass (pi_offset + pi_tuple_size <= metadata_size, pi_tuple_size
must be 0 for CSUM_NONE), and interval_exp is auto-filled to
ilog2(logical_block_size).  No generate/verify callbacks are configured,
so no actual integrity computation occurs; only the blk_integrity_rq()
predicate is satisfied.  Capacity is still forced to 0 by
set_capacity_and_notify(), so new bios are rejected by bio_check_eod()
before queue entry.
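
As a rough illustration of the two checks named above, the following
user-space sketch applies them to the configuration this patch produces.
It is not the body of blk_validate_integrity_limits(); the toy_* names
and the struct layout are hypothetical, and only the checks listed in
this message are modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_csum { TOY_CSUM_NONE, TOY_CSUM_OTHER };

/* Hypothetical subset of the integrity limits discussed above. */
struct toy_limits {
	enum toy_csum csum_type;
	unsigned int metadata_size;
	unsigned int pi_tuple_size;
	unsigned int pi_offset;
	unsigned int interval_exp;
	unsigned int logical_block_size;
};

/* Integer log2 for a non-zero value (floor). */
static unsigned int toy_ilog2(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Applies the two checks from the text, then auto-fills interval_exp. */
static bool toy_validate(struct toy_limits *l)
{
	/* The PI tuple must fit inside the metadata area. */
	if (l->pi_offset + l->pi_tuple_size > l->metadata_size)
		return false;
	/* Without a checksum type there must be no PI tuple. */
	if (l->csum_type == TOY_CSUM_NONE && l->pi_tuple_size != 0)
		return false;
	/* An unset interval defaults to the logical block size. */
	if (!l->interval_exp)
		l->interval_exp = toy_ilog2(l->logical_block_size);
	return true;
}
```

With metadata_size = 8, pi_tuple_size = 0, pi_offset = 0, and a 4096-byte
logical block size, the sketch accepts the limits and fills in an
interval_exp of 12.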

Tested: compiled on linux-kcov-debug (6.19.0+, KASAN/DEBUG_LIST) and
boot-tested under FEMU with NVME_SEMANTIC_DATA_MUTATOR=1.  Four
concurrent dd processes plus 500 rescan_controller cycles produced no
WARN, BUG, or Oops.  However, the EXT_LBAS + ms != 0 + !PI combination
was not triggered (FEMU's mutator varies flbas and lbaf[0].ms
independently; flbas=0x10 with lbaf_idx=0 was not produced in this run),
so the new bi->metadata_size assignment was not exercised.  Correctness
of blk_validate_integrity_limits() for this configuration was verified
by code inspection only.  Provided as RFC.

Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
 drivers/nvme/host/core.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4e20c8f08e4..76fb788024f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1836,8 +1836,29 @@ static bool nvme_init_integrity(struct nvme_ns_head *head,
 	 * insert/strip it, which is not possible for other kinds of metadata.
 	 */
 	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ||
-	    !(head->features & NVME_NS_METADATA_SUPPORTED))
-		return nvme_ns_has_pi(head);
+	    !(head->features & NVME_NS_METADATA_SUPPORTED)) {
+		bool has_pi = nvme_ns_has_pi(head);
+
+		/*
+		 * For PCIe EXT_LBAS non-PI namespaces the block layer sets
+		 * capacity to 0 (we return false) to prevent block I/O, but a
+		 * cached-rq bio may bypass bio_queue_enter freeze serialisation
+		 * and reach nvme_setup_rw() with head->ms != 0 and no
+		 * REQ_INTEGRITY set.  Populate bi->metadata_size so that
+		 * bio_integrity_action() returns non-zero and bio_integrity_prep()
+		 * sets REQ_INTEGRITY on any such bio, preventing the WARN_ON_ONCE
+		 * at nvme_setup_rw() (addressed by patch 1/2).
+		 *
+		 * NOTE: only metadata_size is populated; no csum or PI profile is
+		 * configured.  Actual data integrity for EXT_LBAS non-PI workloads
+		 * is untested; this patch is RFC for direction discussion.
+		 */
+		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
+		    (head->features & NVME_NS_EXT_LBAS) &&
+		    head->ms && !has_pi)
+			bi->metadata_size = head->ms;
+		return has_pi;
+	}
 
 	switch (head->pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
-- 
2.43.0


