From: Chao Shi <coshi036@gmail.com>
To: linux-nvme@lists.infradead.org
Cc: linux-block@vger.kernel.org, hch@lst.de, kbusch@kernel.org,
sagi@grimberg.me, axboe@kernel.dk, Chao Shi <coshi036@gmail.com>,
Sungwoo Kim <iam@sung-woo.kim>, Dave Tian <daveti@purdue.edu>,
Weidong Zhu <weizhu@fiu.edu>
Subject: [PATCH RFC 2/2] nvme: set integrity metadata size for EXT_LBAS non-PI namespace
Date: Sun, 26 Apr 2026 20:34:57 -0400
Message-ID: <20260427003457.1264511-2-coshi036@gmail.com>
In-Reply-To: <20260427003457.1264511-1-coshi036@gmail.com>

This patch is an alternative to patch 1/2: instead of downgrading the
assertion in nvme_setup_rw(), it addresses the root cause at the
integrity-profile level so that the assertion is never reached.

For PCIe namespaces with extended LBAs (NVME_NS_EXT_LBAS set, flbas
bit 4) but without PI and without NVME_NS_METADATA_SUPPORTED, the
early-exit branch of nvme_init_integrity() at core.c:1834 returns false
without populating bi->metadata_size. As a result, blk_get_integrity()
returns NULL (it checks q->limits.integrity.metadata_size via
blk_integrity_queue_supports_integrity()), bio_integrity_action() returns
0, bio_integrity_prep() is never called, and REQ_INTEGRITY is never set
on bios dispatched to the namespace. Any such bio that reaches
nvme_setup_rw() triggers WARN_ON_ONCE because head->ms != 0 but
blk_integrity_rq() returns false.

Set bi->metadata_size to head->ms in the early-exit path for the
EXT_LBAS non-PI case. That is sufficient to make blk_get_integrity()
return non-NULL, so bio_integrity_action() returns non-zero and
bio_integrity_prep() runs and sets REQ_INTEGRITY on any bio submitted
to the namespace. Requests that reach nvme_setup_rw() then satisfy
blk_integrity_rq() and the assertion is not reached.
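
The chain described above can be modelled in user space to see the
before/after behaviour. This is only a rough sketch under stated
assumptions: every chain_* name is hypothetical, the structs are
stand-ins rather than kernel types, and the real bio_integrity_action()
and bio_integrity_prep() paths have more conditions than this models.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct chain_integrity {
	unsigned int metadata_size;	/* models bi->metadata_size */
};

struct chain_ns_head {
	unsigned int ms;		/* models head->ms */
	bool ext_lbas;			/* models NVME_NS_EXT_LBAS */
	bool has_pi;			/* models nvme_ns_has_pi(head) */
};

/* Models the early-exit branch, with and without this patch applied. */
static bool chain_early_exit(const struct chain_ns_head *head,
			     struct chain_integrity *bi, bool patched)
{
	if (patched && head->ext_lbas && head->ms && !head->has_pi)
		bi->metadata_size = head->ms;
	return head->has_pi;
}

/*
 * Models the chain from the commit message: a non-zero metadata_size
 * yields an integrity profile, which leads to REQ_INTEGRITY being set.
 */
static bool chain_req_integrity(const struct chain_integrity *bi)
{
	return bi->metadata_size != 0;
}

/*
 * Models the nvme_setup_rw() check (assumption: it fires when ms != 0,
 * the request carries no integrity payload, and the namespace has no PI).
 */
static bool chain_would_warn(const struct chain_ns_head *head,
			     bool req_integrity)
{
	return head->ms != 0 && !req_integrity && !head->has_pi;
}

/* Walk one EXT_LBAS, non-PI namespace through both variants. */
void chain_example(void)
{
	struct chain_ns_head head = { .ms = 8, .ext_lbas = true, .has_pi = false };
	struct chain_integrity before = { 0 }, after = { 0 };

	chain_early_exit(&head, &before, false);
	chain_early_exit(&head, &after, true);

	assert(chain_would_warn(&head, chain_req_integrity(&before)));
	assert(!chain_would_warn(&head, chain_req_integrity(&after)));
}
```

In this model the return value of the early-exit branch is unchanged by
the patch; only the recorded metadata size differs.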

blk_validate_integrity_limits() accepts this configuration: with
csum_type=BLK_INTEGRITY_CSUM_NONE, pi_tuple_size=0, and pi_offset=0,
all checks pass (pi_offset + pi_tuple_size <= metadata_size, and
pi_tuple_size must be 0 for CSUM_NONE), and interval_exp is auto-filled
to ilog2(logical_block_size). No generate/verify callbacks are configured,
so no actual integrity computation occurs; only the blk_integrity_rq()
predicate is satisfied. Capacity is still forced to 0 by
set_capacity_and_notify(), so new bios are rejected by bio_check_eod()
before queue entry.
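
The checks named above can be restated as a small user-space model.
This is a sketch of only those checks, not a copy of
blk_validate_integrity_limits(); the limits_* names and the struct
layout are hypothetical, and the metadata and block sizes in the example
are arbitrary illustrative values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the checksum type; not the kernel enum. */
enum limits_csum { LIMITS_CSUM_NONE, LIMITS_CSUM_OTHER };

/* Hypothetical stand-in for the integrity-related queue limits. */
struct limits_model {
	enum limits_csum csum_type;
	unsigned int metadata_size;
	unsigned int pi_tuple_size;
	unsigned int pi_offset;
	unsigned int interval_exp;
	unsigned int logical_block_size;
};

/* Integer log2 for a non-zero value, rounding down. */
static unsigned int limits_ilog2(unsigned int v)
{
	unsigned int log = 0;

	while (v >>= 1)
		log++;
	return log;
}

/* Models only the checks listed in the commit message. */
static bool limits_validate(struct limits_model *lim)
{
	/* The PI tuple must fit inside the per-block metadata. */
	if (lim->pi_offset + lim->pi_tuple_size > lim->metadata_size)
		return false;
	/* Without a checksum there must be no PI tuple. */
	if (lim->csum_type == LIMITS_CSUM_NONE && lim->pi_tuple_size != 0)
		return false;
	/* Auto-fill the protection interval from the logical block size. */
	if (!lim->interval_exp)
		lim->interval_exp = limits_ilog2(lim->logical_block_size);
	return true;
}

/* The configuration this patch produces, with illustrative sizes. */
void limits_example(void)
{
	struct limits_model lim = {
		.csum_type = LIMITS_CSUM_NONE,
		.metadata_size = 8,
		.logical_block_size = 4096,
	};

	assert(limits_validate(&lim));
	assert(lim.interval_exp == 12);
}
```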

Tested: Compiled on linux-kcov-debug (6.19.0+, KASAN/DEBUG_LIST).
Boot-tested under FEMU with NVME_SEMANTIC_DATA_MUTATOR=1; ran 4
concurrent dd processes plus 500 rescan_controller cycles with no WARN,
BUG, or Oops. The EXT_LBAS + ms!=0 + !PI combination was not triggered
during testing (FEMU's mutator varies flbas and lbaf[0].ms independently;
flbas=0x10 with lbaf_idx=0 was not produced in this run), so the
bi->metadata_size assignment path was not exercised. That
blk_validate_integrity_limits() accepts this configuration was checked
by code inspection only. Provided as RFC.
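
For reference, the untested combination can be expressed as a predicate
over the Identify Namespace fields. This is a sketch only: the FLBAS_*
constants and flbas_* helpers are hypothetical names, the bit layout
beyond bit 4 (bits 3:0 and 6:5 forming the LBA format index) is my
reading of the field rather than something stated in this patch, and a
PCIe transport is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed FLBAS layout; names are hypothetical, not kernel macros. */
#define FLBAS_IDX_LOW_MASK	0x0f	/* bits 3:0: low bits of format index */
#define FLBAS_EXT_LBA		0x10	/* bit 4: metadata at end of each LBA */
#define FLBAS_IDX_HIGH_MASK	0x60	/* bits 6:5: high bits of format index */
#define FLBAS_IDX_HIGH_SHIFT	1

/* Extract the LBA format index from flbas. */
static unsigned int flbas_lbaf_index(uint8_t flbas)
{
	return (flbas & FLBAS_IDX_LOW_MASK) |
	       ((flbas & FLBAS_IDX_HIGH_MASK) >> FLBAS_IDX_HIGH_SHIFT);
}

/* True when the namespace uses extended LBAs. */
static bool flbas_is_ext_lba(uint8_t flbas)
{
	return (flbas & FLBAS_EXT_LBA) != 0;
}

/* The EXT_LBAS + ms != 0 + !PI combination the new assignment needs. */
static bool flbas_hits_new_path(uint8_t flbas, uint16_t ms, bool has_pi)
{
	return flbas_is_ext_lba(flbas) && ms != 0 && !has_pi;
}

/* flbas=0x10 selects format 0 with extended LBAs. */
void flbas_example(void)
{
	assert(flbas_lbaf_index(0x10) == 0);
	assert(flbas_hits_new_path(0x10, 8, false));
	assert(!flbas_hits_new_path(0x00, 8, false));
	assert(!flbas_hits_new_path(0x10, 0, false));
}
```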

Found by FuzzNvme (syzkaller with the FEMU fuzzing framework).
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
 drivers/nvme/host/core.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4e20c8f08e4..76fb788024f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1836,8 +1836,29 @@ static bool nvme_init_integrity(struct nvme_ns_head *head,
 	 * insert/strip it, which is not possible for other kinds of metadata.
 	 */
 	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ||
-	    !(head->features & NVME_NS_METADATA_SUPPORTED))
-		return nvme_ns_has_pi(head);
+	    !(head->features & NVME_NS_METADATA_SUPPORTED)) {
+		bool has_pi = nvme_ns_has_pi(head);
+
+		/*
+		 * For PCIe EXT_LBAS non-PI namespaces the block layer sets
+		 * capacity to 0 (we return false) to prevent block I/O, but a
+		 * cached-rq bio may bypass bio_queue_enter freeze serialisation
+		 * and reach nvme_setup_rw() with head->ms != 0 and no
+		 * REQ_INTEGRITY set. Populate bi->metadata_size so that
+		 * bio_integrity_action() returns non-zero and bio_integrity_prep()
+		 * sets REQ_INTEGRITY on any such bio, preventing the WARN_ON_ONCE
+		 * at nvme_setup_rw() (addressed by patch 1/2).
+		 *
+		 * NOTE: only metadata_size is populated; no csum or PI profile is
+		 * configured. Actual data integrity for EXT_LBAS non-PI workloads
+		 * is untested; this patch is RFC for direction discussion.
+		 */
+		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
+		    (head->features & NVME_NS_EXT_LBAS) &&
+		    head->ms && !has_pi)
+			bi->metadata_size = head->ms;
+		return has_pi;
+	}
 
 	switch (head->pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
--
2.43.0