public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock
@ 2026-03-02 22:25 Keith Busch
  2026-03-02 22:25 ` [RFC-PATCH 2/2] nvme: use the namespace id for block device names Keith Busch
  2026-03-03  7:29 ` [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock Hannes Reinecke
  0 siblings, 2 replies; 8+ messages in thread
From: Keith Busch @ 2026-03-02 22:25 UTC (permalink / raw)
  To: linux-nvme, hch, mlombard; +Cc: jmeneghi, hare, Keith Busch

From: Keith Busch <kbusch@kernel.org>

To maintain symmetry with the disk add side, perform the disk deletion
under the nvme subsystem lock. This will ensure consistent ordering of
device naming when adding namespaces while concurrently deleting stale
references.

The deferred removal requires special attention to the nvme_subsystem:
the head may be holding the final reference in the deferred path, so
take a reference on it while removing the head.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c      |  3 +--
 drivers/nvme/host/multipath.c | 28 +++++++++++-----------------
 drivers/nvme/host/nvme.h      |  1 +
 3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a5c1af443420b..7a558d5103a21 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -147,7 +147,6 @@ static const struct class nvme_ns_chr_class = {
 	.name = "nvme-generic",
 };
 
-static void nvme_put_subsystem(struct nvme_subsystem *subsys);
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 					   unsigned nsid);
 static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
@@ -3119,7 +3118,7 @@ static void nvme_destroy_subsystem(struct kref *ref)
 	put_device(&subsys->dev);
 }
 
-static void nvme_put_subsystem(struct nvme_subsystem *subsys)
+void nvme_put_subsystem(struct nvme_subsystem *subsys)
 {
 	kref_put(&subsys->ref, nvme_destroy_subsystem);
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index fc6800a9f7f94..02a50181d1dd6 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -682,6 +682,8 @@ static void nvme_requeue_work(struct work_struct *work)
 
 static void nvme_remove_head(struct nvme_ns_head *head)
 {
+	list_del_init(&head->entry);
+
 	if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
 		/*
 		 * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
@@ -700,16 +702,14 @@ static void nvme_remove_head_work(struct work_struct *work)
 {
 	struct nvme_ns_head *head = container_of(to_delayed_work(work),
 			struct nvme_ns_head, remove_work);
-	bool remove = false;
+	struct nvme_subsystem *subsys = head->subsys;
 
-	mutex_lock(&head->subsys->lock);
-	if (list_empty(&head->list)) {
-		list_del_init(&head->entry);
-		remove = true;
-	}
-	mutex_unlock(&head->subsys->lock);
-	if (remove)
+	kref_get(&subsys->ref);
+	mutex_lock(&subsys->lock);
+	if (list_empty(&head->list))
 		nvme_remove_head(head);
+	mutex_unlock(&subsys->lock);
+	nvme_put_subsystem(subsys);
 
 	module_put(THIS_MODULE);
 }
@@ -1292,8 +1292,6 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
 
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
-	bool remove = false;
-
 	if (!head->disk)
 		return;
 
@@ -1314,17 +1312,13 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 	 * Ensure that no one could remove this module while the head
 	 * remove work is pending.
 	 */
-	if (head->delayed_removal_secs && try_module_get(THIS_MODULE)) {
+	if (head->delayed_removal_secs && try_module_get(THIS_MODULE))
 		mod_delayed_work(nvme_wq, &head->remove_work,
 				head->delayed_removal_secs * HZ);
-	} else {
-		list_del_init(&head->entry);
-		remove = true;
-	}
+	else
+		nvme_remove_head(head);
 out:
 	mutex_unlock(&head->subsys->lock);
-	if (remove)
-		nvme_remove_head(head);
 }
 
 void nvme_mpath_put_disk(struct nvme_ns_head *head)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 44801801fc289..ed8ce356c363e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1283,6 +1283,7 @@ struct nvme_ctrl *nvme_ctrl_from_file(struct file *file);
 struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid);
 bool nvme_get_ns(struct nvme_ns *ns);
 void nvme_put_ns(struct nvme_ns *ns);
+void nvme_put_subsystem(struct nvme_subsystem *subsys);
 
 static inline bool nvme_multi_css(struct nvme_ctrl *ctrl)
 {
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC-PATCH 2/2] nvme: use the namespace id for block device names
  2026-03-02 22:25 [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock Keith Busch
@ 2026-03-02 22:25 ` Keith Busch
  2026-03-03  7:34   ` Hannes Reinecke
  2026-03-03  7:29 ` [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock Hannes Reinecke
  1 sibling, 1 reply; 8+ messages in thread
From: Keith Busch @ 2026-03-02 22:25 UTC (permalink / raw)
  To: linux-nvme, hch, mlombard; +Cc: jmeneghi, hare, Keith Busch

From: Keith Busch <kbusch@kernel.org>

We can now assume the ns_id provides a unique number within a subsystem
when creating a namespace kobject. Use that rather than pulling an
available bit from an ida, which becomes unnecessary once we use the
controller's reported identifier.

The user observable change is that the suffix of the nvme namespace
device handle name will always be consistent regardless of which
controller is scanned first or what the namespace attachment state is
for a given controller being scanned. The prefix will still be dependent
on the order the controllers were connected/probed.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c      | 19 +++++--------------
 drivers/nvme/host/multipath.c |  4 ++--
 drivers/nvme/host/nvme.h      |  2 --
 3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7a558d5103a21..3f2f9b2be87c2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -667,7 +667,6 @@ static void nvme_free_ns_head(struct kref *ref)
 		container_of(ref, struct nvme_ns_head, ref);
 
 	nvme_mpath_put_disk(head);
-	ida_free(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
 	kfree(head->plids);
@@ -3113,7 +3112,6 @@ static void nvme_destroy_subsystem(struct kref *ref)
 	list_del(&subsys->entry);
 	mutex_unlock(&nvme_subsystems_lock);
 
-	ida_destroy(&subsys->ns_ida);
 	device_del(&subsys->dev);
 	put_device(&subsys->dev);
 }
@@ -3257,7 +3255,6 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 			put_device(&subsys->dev);
 			goto out_unlock;
 		}
-		ida_init(&subsys->ns_ida);
 		list_add_tail(&subsys->entry, &nvme_subsystems);
 	}
 
@@ -3868,7 +3865,7 @@ static int nvme_add_ns_cdev(struct nvme_ns *ns)
 
 	ns->cdev_device.parent = ns->ctrl->device;
 	ret = dev_set_name(&ns->cdev_device, "ng%dn%d",
-			   ns->ctrl->instance, ns->head->instance);
+			   ns->ctrl->instance, ns->head->ns_id);
 	if (ret)
 		return ret;
 
@@ -3890,14 +3887,10 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 	head = kzalloc(size, GFP_KERNEL);
 	if (!head)
 		goto out;
-	ret = ida_alloc_min(&ctrl->subsys->ns_ida, 1, GFP_KERNEL);
-	if (ret < 0)
-		goto out_free_head;
-	head->instance = ret;
 	INIT_LIST_HEAD(&head->list);
 	ret = init_srcu_struct(&head->srcu);
 	if (ret)
-		goto out_ida_remove;
+		goto out_free_head;
 	head->subsys = ctrl->subsys;
 	head->ns_id = info->nsid;
 	head->ids = info->ids;
@@ -3925,8 +3918,6 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 	return head;
 out_cleanup_srcu:
 	cleanup_srcu_struct(&head->srcu);
-out_ida_remove:
-	ida_free(&ctrl->subsys->ns_ida, head->instance);
 out_free_head:
 	kfree(head);
 out:
@@ -4162,14 +4153,14 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
 	 */
 	if (nvme_ns_head_multipath(ns->head)) {
 		sprintf(disk->disk_name, "nvme%dc%dn%d", ctrl->subsys->instance,
-			ctrl->instance, ns->head->instance);
+			ctrl->instance, ns->head->ns_id);
 		disk->flags |= GENHD_FL_HIDDEN;
 	} else if (multipath) {
 		sprintf(disk->disk_name, "nvme%dn%d", ctrl->subsys->instance,
-			ns->head->instance);
+			ns->head->ns_id);
 	} else {
 		sprintf(disk->disk_name, "nvme%dn%d", ctrl->instance,
-			ns->head->instance);
+			ns->head->ns_id);
 	}
 
 	if (nvme_update_ns_info(ns, info))
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 02a50181d1dd6..8715b04ff92a8 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -640,7 +640,7 @@ static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
 
 	head->cdev_device.parent = &head->subsys->dev;
 	ret = dev_set_name(&head->cdev_device, "ng%dn%d",
-			   head->subsys->instance, head->instance);
+			   head->subsys->instance, head->ns_id);
 	if (ret)
 		return ret;
 	ret = nvme_cdev_add(&head->cdev, &head->cdev_device,
@@ -767,7 +767,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	 */
 	set_bit(GD_SUPPRESS_PART_SCAN, &head->disk->state);
 	sprintf(head->disk->disk_name, "nvme%dn%d",
-			ctrl->subsys->instance, head->instance);
+			ctrl->subsys->instance, head->ns_id);
 	nvme_tryget_ns_head(head);
 	return 0;
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ed8ce356c363e..142a5684205a8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -499,7 +499,6 @@ struct nvme_subsystem {
 	u8			cmic;
 	enum nvme_subsys_type	subtype;
 	u16			vendor_id;
-	struct ida		ns_ida;
 #ifdef CONFIG_NVME_MULTIPATH
 	enum nvme_iopolicy	iopolicy;
 #endif
@@ -540,7 +539,6 @@ struct nvme_ns_head {
 	struct nvme_effects_log *effects;
 	u64			nuse;
 	unsigned		ns_id;
-	int			instance;
 #ifdef CONFIG_BLK_DEV_ZONED
 	u64			zsze;
 #endif
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock
  2026-03-02 22:25 [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock Keith Busch
  2026-03-02 22:25 ` [RFC-PATCH 2/2] nvme: use the namespace id for block device names Keith Busch
@ 2026-03-03  7:29 ` Hannes Reinecke
  1 sibling, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2026-03-03  7:29 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch, mlombard; +Cc: jmeneghi, Keith Busch

On 3/2/26 23:25, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> To maintain symmetry with the disk add side, perform the disk deletion
> under the nvme subsystem lock. This will ensure consistent ordering of
> device naming when adding namespaces while concurrently deleting stale
> references.
> 
> The deferred removal requires special attention to the nvme_subsystem:
> the head may be holding the final reference in the deferred path, so
> take a reference on it while removing the head.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c      |  3 +--
>   drivers/nvme/host/multipath.c | 28 +++++++++++-----------------
>   drivers/nvme/host/nvme.h      |  1 +
>   3 files changed, 13 insertions(+), 19 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC-PATCH 2/2] nvme: use the namespace id for block device names
  2026-03-02 22:25 ` [RFC-PATCH 2/2] nvme: use the namespace id for block device names Keith Busch
@ 2026-03-03  7:34   ` Hannes Reinecke
  2026-03-03 15:39     ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2026-03-03  7:34 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch, mlombard; +Cc: jmeneghi, Keith Busch

On 3/2/26 23:25, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> We can now assume the ns_id provides a unique number within a subsystem
> when creating a namespace kobject. Use that rather than pulling an
> available bit from an ida, which becomes unnecessary once we use the
> controller's reported identifier.
> 
> The user observable change is that the suffix of the nvme namespace
> device handle name will always be consistent regardless of which
> controller is scanned first or what the namespace attachment state is
> for a given controller being scanned. The prefix will still be dependent
> on the order the controllers were connected/probed.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c      | 19 +++++--------------
>   drivers/nvme/host/multipath.c |  4 ++--
>   drivers/nvme/host/nvme.h      |  2 --
>   3 files changed, 7 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7a558d5103a21..3f2f9b2be87c2 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -667,7 +667,6 @@ static void nvme_free_ns_head(struct kref *ref)
>   		container_of(ref, struct nvme_ns_head, ref);
>   
>   	nvme_mpath_put_disk(head);
> -	ida_free(&head->subsys->ns_ida, head->instance);
>   	cleanup_srcu_struct(&head->srcu);
>   	nvme_put_subsystem(head->subsys);
>   	kfree(head->plids);
> @@ -3113,7 +3112,6 @@ static void nvme_destroy_subsystem(struct kref *ref)
>   	list_del(&subsys->entry);
>   	mutex_unlock(&nvme_subsystems_lock);
>   
> -	ida_destroy(&subsys->ns_ida);
>   	device_del(&subsys->dev);
>   	put_device(&subsys->dev);
>   }
> @@ -3257,7 +3255,6 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>   			put_device(&subsys->dev);
>   			goto out_unlock;
>   		}
> -		ida_init(&subsys->ns_ida);
>   		list_add_tail(&subsys->entry, &nvme_subsystems);
>   	}
>   
> @@ -3868,7 +3865,7 @@ static int nvme_add_ns_cdev(struct nvme_ns *ns)
>   
>   	ns->cdev_device.parent = ns->ctrl->device;
>   	ret = dev_set_name(&ns->cdev_device, "ng%dn%d",
> -			   ns->ctrl->instance, ns->head->instance);
> +			   ns->ctrl->instance, ns->head->ns_id);
>   	if (ret)
>   		return ret;
>   
> @@ -3890,14 +3887,10 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>   	head = kzalloc(size, GFP_KERNEL);
>   	if (!head)
>   		goto out;
> -	ret = ida_alloc_min(&ctrl->subsys->ns_ida, 1, GFP_KERNEL);
> -	if (ret < 0)
> -		goto out_free_head;
> -	head->instance = ret;
>   	INIT_LIST_HEAD(&head->list);
>   	ret = init_srcu_struct(&head->srcu);
>   	if (ret)
> -		goto out_ida_remove;
> +		goto out_free_head;
>   	head->subsys = ctrl->subsys;
>   	head->ns_id = info->nsid;
>   	head->ids = info->ids;
> @@ -3925,8 +3918,6 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>   	return head;
>   out_cleanup_srcu:
>   	cleanup_srcu_struct(&head->srcu);
> -out_ida_remove:
> -	ida_free(&ctrl->subsys->ns_ida, head->instance);
>   out_free_head:
>   	kfree(head);
>   out:
> @@ -4162,14 +4153,14 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
>   	 */
>   	if (nvme_ns_head_multipath(ns->head)) {
>   		sprintf(disk->disk_name, "nvme%dc%dn%d", ctrl->subsys->instance,
> -			ctrl->instance, ns->head->instance);
> +			ctrl->instance, ns->head->ns_id);
>   		disk->flags |= GENHD_FL_HIDDEN;
>   	} else if (multipath) {
>   		sprintf(disk->disk_name, "nvme%dn%d", ctrl->subsys->instance,
> -			ns->head->instance);
> +			ns->head->ns_id);
>   	} else {
>   		sprintf(disk->disk_name, "nvme%dn%d", ctrl->instance,
> -			ns->head->instance);
> +			ns->head->ns_id);
>   	}
>   
>   	if (nvme_update_ns_info(ns, info))
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 02a50181d1dd6..8715b04ff92a8 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -640,7 +640,7 @@ static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
>   
>   	head->cdev_device.parent = &head->subsys->dev;
>   	ret = dev_set_name(&head->cdev_device, "ng%dn%d",
> -			   head->subsys->instance, head->instance);
> +			   head->subsys->instance, head->ns_id);
>   	if (ret)
>   		return ret;
>   	ret = nvme_cdev_add(&head->cdev, &head->cdev_device,
> @@ -767,7 +767,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>   	 */
>   	set_bit(GD_SUPPRESS_PART_SCAN, &head->disk->state);
>   	sprintf(head->disk->disk_name, "nvme%dn%d",
> -			ctrl->subsys->instance, head->instance);
> +			ctrl->subsys->instance, head->ns_id);
>   	nvme_tryget_ns_head(head);
>   	return 0;
>   }
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index ed8ce356c363e..142a5684205a8 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -499,7 +499,6 @@ struct nvme_subsystem {
>   	u8			cmic;
>   	enum nvme_subsys_type	subtype;
>   	u16			vendor_id;
> -	struct ida		ns_ida;
>   #ifdef CONFIG_NVME_MULTIPATH
>   	enum nvme_iopolicy	iopolicy;
>   #endif
> @@ -540,7 +539,6 @@ struct nvme_ns_head {
>   	struct nvme_effects_log *effects;
>   	u64			nuse;
>   	unsigned		ns_id;
> -	int			instance;
>   #ifdef CONFIG_BLK_DEV_ZONED
>   	u64			zsze;
>   #endif

The idea is nice, and I would love to go into that
direction.
But have you checked how this holds up under
rescan/remapping (eg things like blktest/nvme/058)?
Removal of the sysfs nodes might be delayed, and we cannot
create new entries with the same name until then.
So if that is taken care of, fine, but I don't see that in
the patch ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC-PATCH 2/2] nvme: use the namespace id for block device names
  2026-03-03  7:34   ` Hannes Reinecke
@ 2026-03-03 15:39     ` Keith Busch
  2026-03-03 16:53       ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2026-03-03 15:39 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, linux-nvme, hch, mlombard, jmeneghi

On Tue, Mar 03, 2026 at 08:34:38AM +0100, Hannes Reinecke wrote:
> The idea is nice, and I would love to go into that
> direction.
> But have you checked how this holds up under
> rescan/remapping (eg things like blktest/nvme/058)?
> Removal of the sysfs nodes might be delayed, and we cannot
> create new entries with the same name until then.
> So if that is taken care of, fine, but I don't see that in
> the patch ...

I see. I set out to ensure everything was ordered, but apparently I've
missed this case. Thanks for pointing out the test.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC-PATCH 2/2] nvme: use the namespace id for block device names
  2026-03-03 15:39     ` Keith Busch
@ 2026-03-03 16:53       ` Keith Busch
  2026-03-04 11:55         ` Nilay Shroff
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2026-03-03 16:53 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, linux-nvme, hch, mlombard, jmeneghi

On Tue, Mar 03, 2026 at 08:39:56AM -0700, Keith Busch wrote:
> On Tue, Mar 03, 2026 at 08:34:38AM +0100, Hannes Reinecke wrote:
> > The idea is nice, and I would love to go into that
> > direction.
> > But have you checked how this holds up under
> > rescan/remapping (eg things like blktest/nvme/058)?
> > Removal of the sysfs nodes might be delayed, and we cannot
> > create new entries with the same name until then.
> > So if that is taken care of, fine, but I don't see that in
> > the patch ...
> 
> I see. I set out to ensure everything was ordered, but apparently I've
> missed this case. Thanks for pointing out the test.

Okay, I think it's as simple as the head unlinking prior to actually
doing the del_gendisk creates a time when one scan work makes a new
namespace before the stale one is deleted in a different scan. This
should fix it:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3f2f9b2be87c2..e8dbb6cb85694 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4246,11 +4246,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)

 	mutex_lock(&ns->ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
-	if (list_empty(&ns->head->list)) {
-		if (!nvme_mpath_queue_if_no_path(ns->head))
-			list_del_init(&ns->head->entry);
+	if (list_empty(&ns->head->list))
 		last_path = true;
-	}
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	/* guarantee not available in head->list */
--

This opens a different race where Controller A is deleting the last
path, and Controller B is bringing up a new namespace that reused the
NSID. An earlier patch from me should fix that by reschudeling B's
scan_work once A detects it removed the last path. It should work, but
it feels a bit off to me, so I'll think about it a little more.


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC-PATCH 2/2] nvme: use the namespace id for block device names
  2026-03-03 16:53       ` Keith Busch
@ 2026-03-04 11:55         ` Nilay Shroff
  2026-03-04 14:53           ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Nilay Shroff @ 2026-03-04 11:55 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke
  Cc: Keith Busch, linux-nvme, hch, mlombard, jmeneghi

On 3/3/26 10:23 PM, Keith Busch wrote:
> On Tue, Mar 03, 2026 at 08:39:56AM -0700, Keith Busch wrote:
>> On Tue, Mar 03, 2026 at 08:34:38AM +0100, Hannes Reinecke wrote:
>>> The idea is nice, and I would love to go into that
>>> direction.
>>> But have you checked how this holds up under
>>> rescan/remapping (eg things like blktest/nvme/058)?
>>> Removal of the sysfs nodes might be delayed, and we cannot
>>> create new entries with the same name until then.
>>> So if that is taken care of, fine, but I don't see that in
>>> the patch ...
>>
>> I see. I set out to ensure everything was ordered, but apparently I've
>> missed this case. Thanks for pointing out the test.
> 
> Okay, I think it's as simple as the head unlinking prior to actually
> doing the del_gendisk creates a time when one scan work makes a new
> namespace before the stale one is deleted in a different scan. This
> should fix it:
> 
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3f2f9b2be87c2..e8dbb6cb85694 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4246,11 +4246,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> 
>   	mutex_lock(&ns->ctrl->subsys->lock);
>   	list_del_rcu(&ns->siblings);
> -	if (list_empty(&ns->head->list)) {
> -		if (!nvme_mpath_queue_if_no_path(ns->head))
> -			list_del_init(&ns->head->entry);
> +	if (list_empty(&ns->head->list))
>   		last_path = true;
> -	}
>   	mutex_unlock(&ns->ctrl->subsys->lock);
>   
>   	/* guarantee not available in head->list */
> --
> 
> This opens a different race where Controller A is deleting the last
> path, and Controller B is bringing up a new namespace that reused the
> NSID. An earlier patch from me should fix that by reschudeling B's
> scan_work once A detects it removed the last path. It should work, but
> it feels a bit off to me, so I'll think about it a little more.
> 

I think we need to ensure that the head disk is fully removed from the 
system before a new head is created that may reuse the same NSID; 
otherwise we may run into name collisions. Since removal of the sysfs 
entries associated with the disk can be delayed, a namespace scan may 
attempt to create a new head with the same name while the previous one 
is still being torn down.

To avoid this, it might be better to defer removing head->entry from the 
subsys->nsheads list until after the head node’s gendisk has been 
removed in nvme_remove_head().

 From your first patch in this series, I see that the removal of
head->entry from subsys->nsheads was moved to the beginning of 
nvme_remove_head(). IMO, in addition to the change you proposed above, 
if we also move the removal of head->entry until after del_gendisk() is 
called, that should ensure the old disk is fully removed before a new 
head with the same NSID can be created.

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC-PATCH 2/2] nvme: use the namespace id for block device names
  2026-03-04 11:55         ` Nilay Shroff
@ 2026-03-04 14:53           ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2026-03-04 14:53 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Hannes Reinecke, Keith Busch, linux-nvme, hch, mlombard, jmeneghi

On Wed, Mar 04, 2026 at 05:25:28PM +0530, Nilay Shroff wrote:
> I think we need to ensure that the head disk is fully removed from the
> system before a new head is created that may reuse the same NSID; otherwise
> we may run into name collisions. Since removal of the sysfs entries
> associated with the disk can be delayed, a namespace scan may attempt to
> create a new head with the same name while the previous one is still being
> torn down.
> 
> To avoid this, it might be better to defer removing head->entry from the
> subsys->nsheads list until after the head node´s gendisk has been removed in
> nvme_remove_head().
> 
> From your first patch in this series, I see that the removal of
> head->entry from subsys->nsheads was moved to the beginning of
> nvme_remove_head(). IMO, in addition to the change you proposed above, if we
> also move the removal of head->entry until after del_gendisk() is called,
> that should ensure the old disk is fully removed before a new head with the
> same NSID can be created.

Yeah, I'm really sure why it was done in two steps split across
different subsystem lock contexts.

I'll respin the series along with the last two patches from this one:

  https://lore.kernel.org/linux-nvme/20260226183216.2098584-1-kbusch@meta.com/


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-04 14:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02 22:25 [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock Keith Busch
2026-03-02 22:25 ` [RFC-PATCH 2/2] nvme: use the namespace id for block device names Keith Busch
2026-03-03  7:34   ` Hannes Reinecke
2026-03-03 15:39     ` Keith Busch
2026-03-03 16:53       ` Keith Busch
2026-03-04 11:55         ` Nilay Shroff
2026-03-04 14:53           ` Keith Busch
2026-03-03  7:29 ` [RFC-PATCH 1/2] nvme-multipath: delete gendisk under subsys lock Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox