Linux-NVME Archive on lore.kernel.org
* [PATCH v2] nvme: recompute multipath zoned limits from ready paths
@ 2026-05-22  7:58 Yao Sang
  2026-05-22  9:21 ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Yao Sang @ 2026-05-22  7:58 UTC
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, shinichiro.kawasaki, Yao Sang

This was found while debugging a zoned NVMe multipath setup, where the
namespace head reported 0/0 for max_open_zones and max_active_zones
while the live path still reported finite limits.

These zoned resource limits are namespace-wide, but the head can retain
stale values after the ready-path set changes. Since 0 means "no limit"
for both values, stale head state can leave the multipath namespace with
bogus 0/0 limits even when the current ready path still advertises
finite limits.

Recompute max_open_zones and max_active_zones from the current
NVME_NS_READY paths when refreshing the namespace head limits, and clear
them when the resulting queue is not zoned.

Signed-off-by: Yao Sang <sangyao@kylinos.cn>
---
Changes in v2:
- Address Shin'ichiro Kawasaki's feedback that the v1 generic
  block-layer approach ("block: stack zoned resource limits") could
  break the DM zone resource limit semantics behind blktests zbd/011.
- Narrow the fix to NVMe multipath, rename the patch accordingly, and
  move the CONFIG_NVME_MULTIPATH guard to the call site.
- Rewrite the changelog around the stale 0/0 head-limit symptom and
  refresh the testing summary with directly relevant passing coverage.

Testing:
- Build: CONFIG_NVME_MULTIPATH=n to cover the non-multipath call-site guard.
- blktests: nvme/005, nvme/057, nvme/058 for ready-path changes via reset,
  ANA failover/failback, and namespace remap.
- blktests: zbd/011, zbd/012, zbd/013, block/004
  to cover adjacent zoned/block-limit behavior.
- blktests on QEMU ZNS NVMe /dev/nvme0n1: zbd/001, zbd/002, zbd/003,
  zbd/004, zbd/005, zbd/006 to confirm the fix on the QEMU-backed
  zoned NVMe device used in this VM testbed.

Link: https://lore.kernel.org/r/20260520091237.392802-1-sangyao@kylinos.cn
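
For illustration only, not part of the patch: a minimal userspace sketch of
the recompute step described in the changelog. struct path_limits and
min_not_zero_uint() are hypothetical stand-ins for the per-path
queue_limits, the NVME_NS_READY bit, and the kernel's min_not_zero() macro;
SRCU locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-path state the patch reads. */
struct path_limits {
	bool ready;			/* models test_bit(NVME_NS_READY, ...) */
	unsigned int max_open_zones;	/* 0 means "no limit" */
	unsigned int max_active_zones;	/* 0 means "no limit" */
};

/* Userspace equivalent of min_not_zero() for unsigned int. */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Start from 0/0 ("no limit") and fold in only the ready paths, so a
 * value left over from an earlier path set cannot survive the refresh.
 */
static void recompute_zoned_limits(const struct path_limits *paths, size_t n,
				   unsigned int *max_open,
				   unsigned int *max_active)
{
	*max_open = 0;
	*max_active = 0;

	for (size_t i = 0; i < n; i++) {
		if (!paths[i].ready)
			continue;
		*max_open = min_not_zero_uint(*max_open,
					      paths[i].max_open_zones);
		*max_active = min_not_zero_uint(*max_active,
						paths[i].max_active_zones);
	}
}
```

With two ready paths that both report 16/16, the sketch yields 16/16. With
no ready path at all, it yields 0/0.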

 drivers/nvme/host/core.c | 42 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c3032d6ad6b1..41fbfbe5f970 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2483,6 +2483,44 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 	return ret;
 }

+#ifdef CONFIG_NVME_MULTIPATH
+static void nvme_update_ns_head_zoned_limits(struct nvme_ns_head *head,
+					     struct queue_limits *lim)
+{
+	int srcu_idx;
+	struct nvme_ns *path;
+
+	if (!(lim->features & BLK_FEAT_ZONED)) {
+		lim->max_open_zones = 0;
+		lim->max_active_zones = 0;
+		return;
+	}
+
+	/*
+	 * Zone resource limits are namespace-wide. Recompute them from all ready
+	 * namespace paths instead of incrementally stacking stale head values.
+	 */
+	lim->max_open_zones = 0;
+	lim->max_active_zones = 0;
+
+	srcu_idx = srcu_read_lock(&head->srcu);
+	list_for_each_entry_srcu(path, &head->list, siblings,
+				 srcu_read_lock_held(&head->srcu)) {
+		struct queue_limits *path_lim;
+
+		if (!path->disk || !test_bit(NVME_NS_READY, &path->flags))
+			continue;
+
+		path_lim = &path->disk->queue->limits;
+		lim->max_open_zones = min_not_zero(lim->max_open_zones,
+						   path_lim->max_open_zones);
+		lim->max_active_zones = min_not_zero(lim->max_active_zones,
+						     path_lim->max_active_zones);
+	}
+	srcu_read_unlock(&head->srcu, srcu_idx);
+}
+#endif
+
 static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	bool unsupported = false;
@@ -2549,6 +2587,10 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 		lim.io_opt = ns_lim->io_opt;
 		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
 					ns->head->disk->disk_name);
+
+#ifdef CONFIG_NVME_MULTIPATH
+		nvme_update_ns_head_zoned_limits(ns->head, &lim);
+#endif
 		if (unsupported)
 			ns->head->disk->flags |= GENHD_FL_HIDDEN;
 		else
--
2.25.1



* Re: [PATCH v2] nvme: recompute multipath zoned limits from ready paths
  2026-05-22  7:58 [PATCH v2] nvme: recompute multipath zoned limits from ready paths Yao Sang
@ 2026-05-22  9:21 ` Christoph Hellwig
  2026-05-25  6:40   ` Yao Sang
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2026-05-22  9:21 UTC
  To: Yao Sang; +Cc: linux-nvme, kbusch, axboe, hch, sagi, shinichiro.kawasaki

On Fri, May 22, 2026 at 03:58:06PM +0800, Yao Sang wrote:
> This was found while debugging a zoned NVMe multipath setup, where the
> namespace head reported 0/0 for max_open_zones and max_active_zones
> while the live path still reported finite limits.

How?  Having different paths report different limits is bogus.
I guess we just need to fix it and reject a device with mismatching
parameters.




* Re: [PATCH v2] nvme: recompute multipath zoned limits from ready paths
  2026-05-22  9:21 ` Christoph Hellwig
@ 2026-05-25  6:40   ` Yao Sang
  2026-05-25  7:30     ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Yao Sang @ 2026-05-25  6:40 UTC
  To: Christoph Hellwig; +Cc: linux-nvme, kbusch, axboe, sagi, shinichiro.kawasaki

On Fri, May 22, 2026 at 11:21:36AM +0200, Christoph Hellwig wrote:
> On Fri, May 22, 2026 at 03:58:06PM +0800, Yao Sang wrote:
> > This was found while debugging a zoned NVMe multipath setup, where the
> > namespace head reported 0/0 for max_open_zones and max_active_zones
> > while the live path still reported finite limits.
> 
> How?  Having different paths report different limits is bogus.
> I guess we just need to fix it and reject a device with mismatching
> parameters.

Hi Christoph,

Thanks for looking at this.

Sorry, the changelog was imprecise.

What I saw was not different live paths of the same namespace
reporting different zoned limits. On a QEMU VM with native NVMe
multipath enabled, two emulated NVMe controllers shared the same ZNS
namespace and advertised CMIC.MULTI_CTRL, so the kernel created a
multipath namespace head. In that setup, both controller paths
reported max_open_zones=16 and max_active_zones=16, while the
multipath namespace head for the same namespace reported 0/0.

I also checked Identify Namespace data through both controllers. In
both cases, nvme zns id-ns reported mor=15 and mar=15, so I did not
observe any path-to-path MOR/MAR mismatch. The mismatch was between
the namespace head and the controller paths.
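
For reference, a minimal sketch of why mor=15/mar=15 lines up with the
16/16 seen on the paths. It assumes the usual reading of the ZNS Identify
Namespace fields: MOR and MAR are 0's based 32-bit values, and FFFFFFFFh
means "no limit". zns_resources_to_limit() is a hypothetical helper name,
not a kernel function.

```c
#include <stdint.h>

/*
 * Convert a 0's based MOR/MAR value to a block-layer style limit.
 * Unsigned wrap-around maps 0xFFFFFFFF ("no limit" in the Identify
 * data) to 0, which is how the block layer expresses "no limit".
 */
static uint32_t zns_resources_to_limit(uint32_t zeroes_based)
{
	return zeroes_based + 1;
}
```

Under that assumption, mor=15 becomes max_open_zones=16 and mar=15 becomes
max_active_zones=16, matching what both controller paths reported.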

So the issue here is not inconsistent zoned limits across paths. It
is that the namespace head can still show bogus 0/0 even when
multiple live paths report the same 16/16 limits.

After the patch, I reran nvme/005, nvme/057, and nvme/058, and
rechecked the head and path limits after reset, ANA failover/failback,
and namespace remap. In all cases, head and path stayed at 16/16.

If you prefer, I can respin this with changelog text that states only
that observed head/path mismatch.

Thanks,
Yao



* Re: [PATCH v2] nvme: recompute multipath zoned limits from ready paths
  2026-05-25  6:40   ` Yao Sang
@ 2026-05-25  7:30     ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2026-05-25  7:30 UTC
  To: Yao Sang
  Cc: Christoph Hellwig, linux-nvme, kbusch, axboe, sagi,
	shinichiro.kawasaki

On Mon, May 25, 2026 at 02:40:26PM +0800, Yao Sang wrote:
> What I saw was not different live paths of the same namespace
> reporting different zoned limits. On a QEMU VM with native NVMe
> multipath enabled, two emulated NVMe controllers shared the same ZNS
> namespace and advertised CMIC.MULTI_CTRL, so the kernel created a
> multipath namespace head. In that setup, both controller paths
> reported max_open_zones=16 and max_active_zones=16, while the
> multipath namespace head for the same namespace reported 0/0.
> 
> I also checked Identify Namespace data through both controllers. In
> both cases, nvme zns id-ns reported mor=15 and mar=15, so I did not
> observe any path-to-path MOR/MAR mismatch. The mismatch was between
> the namespace head and the controller paths.

They have to.  The high-level approach in your earlier patch is
the correct one.  The problem is just that we use the block-level
stacking for totally different things (multipath and complicated
DM setups).  The long-term strategy is to unwind that, but for
now I think you should just open code what you did there in the
nvme-multipath driver.



