Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/1] Ensure ordered namespace registration during async scan
@ 2025-08-04 11:43 Maurizio Lombardi
  2025-08-04 11:43 ` [PATCH RFC 1/1] nvme-core: register namespaces in order " Maurizio Lombardi
  0 siblings, 1 reply; 5+ messages in thread
From: Maurizio Lombardi @ 2025-08-04 11:43 UTC (permalink / raw)
  To: kbusch; +Cc: hch, sagi, hare, mlombard, linux-nvme

Hello,

The fully asynchronous namespace scanning, introduced in
commit 4e893ca81170 ("nvme-core: scan namespaces asynchronously"),
significantly improved discovery times.
However, it also introduced non-deterministic ordering for namespace registration.

While I am perfectly aware that kernel device names
like /dev/nvmeXnY are not guaranteed to be stable across reboots,
the unpredictable ordering has caused user confusion and
has been perceived as a regression. This has led to bug reports from customers,
and as a result, the asynchronous scanning feature was reverted in RHEL.

I was thinking about a solution to enforce sequential registration while
preserving the performance benefits of the asynchronous scan approach.

This one uses a dependency chain built from a linked list of
asynchronous tasks.
Each task waits for the completion of the previous one before it proceeds to
allocate its namespace, thus guaranteeing that namespaces
are registered sequentially, following the NSID list order, exactly as it
worked before commit 4e893ca81170.

This method is inspired by a very similar mechanism used in the SCSI
subsystem for asynchronous scanning (see drivers/scsi/scsi_scan.c).

Original code:

$ nvme list
Node                  Generic               Namespace
--------------------- --------------------- ----------
/dev/nvme0n1          /dev/ng0n1            0x2
/dev/nvme0n2          /dev/ng0n2            0x1
/dev/nvme0n3          /dev/ng0n3            0x5
/dev/nvme0n4          /dev/ng0n4            0x3
/dev/nvme0n5          /dev/ng0n5            0x4
[...]
/dev/nvme0n10         /dev/ng0n10           0xa
/dev/nvme0n11         /dev/ng0n11           0x8
/dev/nvme0n12         /dev/ng0n12           0x12
/dev/nvme0n13         /dev/ng0n13           0x17
/dev/nvme0n14         /dev/ng0n14           0xc
/dev/nvme0n15         /dev/ng0n15           0x11
/dev/nvme0n16         /dev/ng0n16           0x14
/dev/nvme0n17         /dev/ng0n17           0x13
/dev/nvme0n18         /dev/ng0n18           0xe
/dev/nvme0n19         /dev/ng0n19           0xf


With this patch:

$ nvme list
Node                  Generic               Namespace
--------------------- --------------------- ----------
/dev/nvme0n1          /dev/ng0n1            0x1
/dev/nvme0n2          /dev/ng0n2            0x2
/dev/nvme0n3          /dev/ng0n3            0x3
/dev/nvme0n4          /dev/ng0n4            0x4
/dev/nvme0n5          /dev/ng0n5            0x5
/dev/nvme0n6          /dev/ng0n6            0x6
[...]
/dev/nvme0n10         /dev/ng0n10           0xa
/dev/nvme0n11         /dev/ng0n11           0xb
/dev/nvme0n12         /dev/ng0n12           0xc
/dev/nvme0n13         /dev/ng0n13           0xd
/dev/nvme0n14         /dev/ng0n14           0xe
/dev/nvme0n15         /dev/ng0n15           0xf
/dev/nvme0n16         /dev/ng0n16           0x10
/dev/nvme0n17         /dev/ng0n17           0x11
/dev/nvme0n18         /dev/ng0n18           0x12
/dev/nvme0n19         /dev/ng0n19           0x13

Maurizio Lombardi (1):
  nvme-core: register namespaces in order during async scan

 drivers/nvme/host/core.c | 145 ++++++++++++++++++++++++++++-----------
 drivers/nvme/host/nvme.h |   2 +
 2 files changed, 108 insertions(+), 39 deletions(-)

-- 
2.47.3



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH RFC 1/1] nvme-core: register namespaces in order during async scan
  2025-08-04 11:43 [PATCH RFC 0/1] Ensure ordered namespace registration during async scan Maurizio Lombardi
@ 2025-08-04 11:43 ` Maurizio Lombardi
  2025-09-22 17:30   ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Maurizio Lombardi @ 2025-08-04 11:43 UTC (permalink / raw)
  To: kbusch; +Cc: hch, sagi, hare, mlombard, linux-nvme

The fully asynchronous namespace scanning, while fast,
can result in namespaces being allocated and registered
out of order. This leads to unpredictable device naming
which can be confusing for users.

Introduce a dependency chain to the asynchronous
scanning process to ensure namespaces are allocated sequentially.
It replaces the simple atomic counter with a linked list of
`async_scan_task` structures. Each task, representing a
namespace to be scanned, waits for the completion of the
previous task in the list before it proceeds to allocate its own namespace.

This approach preserves the performance benefits of asynchronous
identification while guaranteeing that the final device
registration occurs in the correct order.

Performance testing shows that this change has no noticeable impact
on scan times compared to the fully asynchronous method.

High latency NVMe/TCP, ~100ms ping, 100 namespaces

Synchronous namespace scan (RHEL-10.1): 31175ms
Fully async namespace scan (6.16-rc7):   2563ms
Async namespace scan with dependency chain (6.16-rc7): 2599ms

Low latency NVMe/TCP, ~0.2ms ping, 100 namespaces

Synchronous namespace scan (RHEL-10.1): 335ms
Fully async namespace scan (6.16-rc7):  156ms
Async namespace scan with dependency chain (6.16-rc7): 116ms

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c | 145 ++++++++++++++++++++++++++++-----------
 drivers/nvme/host/nvme.h |   2 +
 2 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9d988f4cb87a..2a58a2b3c173 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4080,17 +4080,83 @@ static void nvme_ns_add_to_ctrl_list(struct nvme_ns *ns)
 	list_add_rcu(&ns->list, &ns->ctrl->namespaces);
 }
 
-static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
+/**
+ * struct async_scan_task - keeps track of controller & NSID to scan
+ * @ctrl:		Controller on which namespaces are being scanned
+ * @nsid:		The NSID to scan
+ * @prev_finished:	Set when the previous namespace scan has been completed
+ * @head:		Linked list of the scan asynchronous tasks
+ */
+struct async_scan_task {
+	struct nvme_ctrl *ctrl;
+	u32 nsid;
+	struct completion prev_finished;
+	struct list_head head;
+};
+
+static struct async_scan_task *async_scan_task_init(struct nvme_ctrl *ctrl,
+							u32 nsid)
+{
+	struct async_scan_task *task = kzalloc(sizeof(*task), GFP_KERNEL);
+	if (!task)
+		return NULL;
+
+	task->ctrl = ctrl;
+	task->nsid = nsid;
+	init_completion(&task->prev_finished);
+	INIT_LIST_HEAD(&task->head);
+
+	spin_lock(&ctrl->scan_list_lock);
+	if (list_empty(&ctrl->scan_list))
+		complete_all(&task->prev_finished);
+	list_add_tail(&task->head, &ctrl->scan_list);
+	spin_unlock(&ctrl->scan_list_lock);
+
+	return task;
+}
+
+static void async_scan_prev_task_wait(struct async_scan_task *task)
+{
+	if (!task)
+		return;
+
+	wait_for_completion(&task->prev_finished);
+}
+
+static void async_scan_task_complete(struct async_scan_task *task)
+{
+	struct nvme_ctrl *ctrl;
+
+	if (!task)
+		return;
+
+	async_scan_prev_task_wait(task);
+
+	ctrl = task->ctrl;
+	spin_lock(&ctrl->scan_list_lock);
+	list_del(&task->head);
+	if (!list_empty(&ctrl->scan_list)) {
+		struct async_scan_task *next = list_entry(ctrl->scan_list.next,
+			struct async_scan_task, head);
+		complete_all(&next->prev_finished);
+	}
+	spin_unlock(&ctrl->scan_list_lock);
+	kfree(task);
+}
+
+static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info,
+				struct async_scan_task *task)
 {
 	struct queue_limits lim = { };
 	struct nvme_ns *ns;
 	struct gendisk *disk;
 	int node = ctrl->numa_node;
 	bool last_path = false;
+	int r;
 
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
-		return;
+		goto out_complete_async;
 
 	if (ctrl->opts && ctrl->opts->data_digest)
 		lim.features |= BLK_FEAT_STABLE_WRITES;
@@ -4109,7 +4175,16 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
 	ns->ctrl = ctrl;
 	kref_init(&ns->kref);
 
-	if (nvme_init_ns_head(ns, info))
+	/*
+	 * Wait for the previous async task to finish before
+	 * allocating the namespace.
+	 */
+	async_scan_prev_task_wait(task);
+	r = nvme_init_ns_head(ns, info);
+	async_scan_task_complete(task);
+	task = NULL;
+
+	if (r)
 		goto out_cleanup_disk;
 
 	/*
@@ -4200,6 +4275,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
 	put_disk(disk);
  out_free_ns:
 	kfree(ns);
+ out_complete_async:
+	async_scan_task_complete(task);
 }
 
 static void nvme_ns_remove(struct nvme_ns *ns)
@@ -4284,19 +4361,20 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 		nvme_ns_remove(ns);
 }
 
-static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid,
+				struct async_scan_task *task)
 {
 	struct nvme_ns_info info = { .nsid = nsid };
 	struct nvme_ns *ns;
 	int ret = 1;
 
 	if (nvme_identify_ns_descs(ctrl, &info))
-		return;
+		goto exit;
 
 	if (info.ids.csi != NVME_CSI_NVM && !nvme_multi_css(ctrl)) {
 		dev_warn(ctrl->device,
 			"command set not reported for nsid: %d\n", nsid);
-		return;
+		goto exit;
 	}
 
 	/*
@@ -4319,44 +4397,27 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	 * becomes ready and restart the scan.
 	 */
 	if (ret || !info.is_ready)
-		return;
+		goto exit;
 
 	ns = nvme_find_get_ns(ctrl, nsid);
 	if (ns) {
+		async_scan_task_complete(task);
 		nvme_validate_ns(ns, &info);
 		nvme_put_ns(ns);
 	} else {
-		nvme_alloc_ns(ctrl, &info);
+		nvme_alloc_ns(ctrl, &info, task);
 	}
-}
+	return;
 
-/**
- * struct async_scan_info - keeps track of controller & NSIDs to scan
- * @ctrl:	Controller on which namespaces are being scanned
- * @next_nsid:	Index of next NSID to scan in ns_list
- * @ns_list:	Pointer to list of NSIDs to scan
- *
- * Note: There is a single async_scan_info structure shared by all instances
- * of nvme_scan_ns_async() scanning a given controller, so the atomic
- * operations on next_nsid are critical to ensure each instance scans a unique
- * NSID.
- */
-struct async_scan_info {
-	struct nvme_ctrl *ctrl;
-	atomic_t next_nsid;
-	__le32 *ns_list;
-};
+exit:
+	async_scan_task_complete(task);
+}
 
 static void nvme_scan_ns_async(void *data, async_cookie_t cookie)
 {
-	struct async_scan_info *scan_info = data;
-	int idx;
-	u32 nsid;
+	struct async_scan_task *task = data;
 
-	idx = (u32)atomic_fetch_inc(&scan_info->next_nsid);
-	nsid = le32_to_cpu(scan_info->ns_list[idx]);
-
-	nvme_scan_ns(scan_info->ctrl, nsid);
+	nvme_scan_ns(task->ctrl, task->nsid, task);
 }
 
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
@@ -4386,14 +4447,12 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 	u32 prev = 0;
 	int ret = 0, i;
 	ASYNC_DOMAIN(domain);
-	struct async_scan_info scan_info;
+	struct async_scan_task *task;
 
 	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
 	if (!ns_list)
 		return -ENOMEM;
 
-	scan_info.ctrl = ctrl;
-	scan_info.ns_list = ns_list;
 	for (;;) {
 		struct nvme_command cmd = {
 			.identify.opcode	= nvme_admin_identify,
@@ -4409,20 +4468,26 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 			goto free;
 		}
 
-		atomic_set(&scan_info.next_nsid, 0);
 		for (i = 0; i < nr_entries; i++) {
 			u32 nsid = le32_to_cpu(ns_list[i]);
 
 			if (!nsid)	/* end of the list? */
 				goto out;
-			async_schedule_domain(nvme_scan_ns_async, &scan_info,
+
+			task = async_scan_task_init(ctrl, nsid);
+			if (!task) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			async_schedule_domain(nvme_scan_ns_async, task,
 						&domain);
 			while (++prev < nsid)
 				nvme_ns_remove_by_nsid(ctrl, prev);
 		}
-		async_synchronize_full_domain(&domain);
 	}
  out:
+	async_synchronize_full_domain(&domain);
 	nvme_remove_invalid_namespaces(ctrl, prev);
  free:
 	async_synchronize_full_domain(&domain);
@@ -4441,7 +4506,7 @@ static void nvme_scan_ns_sequential(struct nvme_ctrl *ctrl)
 	kfree(id);
 
 	for (i = 1; i <= nn; i++)
-		nvme_scan_ns(ctrl, i);
+		nvme_scan_ns(ctrl, i, NULL);
 
 	nvme_remove_invalid_namespaces(ctrl, nn);
 }
@@ -5062,6 +5127,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 
 	mutex_init(&ctrl->scan_lock);
 	INIT_LIST_HEAD(&ctrl->namespaces);
+	INIT_LIST_HEAD(&ctrl->scan_list);
+	spin_lock_init(&ctrl->scan_list_lock);
 	xa_init(&ctrl->cels);
 	ctrl->dev = dev;
 	ctrl->ops = ops;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index cfd2b5b90b91..841126dc526f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -294,6 +294,8 @@ struct nvme_ctrl {
 	struct blk_mq_tag_set *tagset;
 	struct blk_mq_tag_set *admin_tagset;
 	struct list_head namespaces;
+	struct list_head scan_list;
+	spinlock_t scan_list_lock;
 	struct mutex namespaces_lock;
 	struct srcu_struct srcu;
 	struct device ctrl_device;
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC 1/1] nvme-core: register namespaces in order during async scan
  2025-08-04 11:43 ` [PATCH RFC 1/1] nvme-core: register namespaces in order " Maurizio Lombardi
@ 2025-09-22 17:30   ` Christoph Hellwig
  2025-09-26  8:43     ` Maurizio Lombardi
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2025-09-22 17:30 UTC (permalink / raw)
  To: Maurizio Lombardi; +Cc: kbusch, hch, sagi, hare, mlombard, linux-nvme

This idea looks fine to me, but I hate duplicating the logic
from SCSI here.  Any chance you could try to factor the logic
into a common helper?


> +	struct async_scan_task *task = kzalloc(sizeof(*task), GFP_KERNEL);
> +	if (!task)

Always keep an empty line after function declarations.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC 1/1] nvme-core: register namespaces in order during async scan
  2025-09-22 17:30   ` Christoph Hellwig
@ 2025-09-26  8:43     ` Maurizio Lombardi
  2025-10-03 10:14       ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Maurizio Lombardi @ 2025-09-26  8:43 UTC (permalink / raw)
  To: Christoph Hellwig, Maurizio Lombardi; +Cc: kbusch, sagi, hare, linux-nvme

On Mon Sep 22, 2025 at 7:30 PM CEST, Christoph Hellwig wrote:
> This idea looks fine to me, but I hate duplicating the logic
> from SCSI here.  Any chance you could try to factor the logic
> into a common helper?

Ok, let's try. What do you think about something like this?
I am not good with function names so I am open to suggestions:



lib: Introduce completion chain helper

Introduce a new helper library, the completion chain, designed
to serialize asynchronous operations that must execute
in a strict First-In, First-Out (FIFO) order.

Certain workflows, particularly in block or
storage drivers, require asynchronous operations to complete in the
same sequence they were submitted.
This helper provides a generic mechanism to enforce this ordering.

struct compl_chain: The main structure representing the queue of operations.

struct compl_chain_entry: An entry embedded in a per-operation structure.

The typical usage pattern is:

    * An operation is enqueued by calling compl_chain_add().

    * The worker thread for the operation calls
      compl_chain_wait(), which blocks until the previously
      enqueued operation has finished.

    * After the work is done, the thread calls compl_chain_complete().
      This signals the next operation in the chain that it can now
      proceed and removes the current entry from the list.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 include/linux/compl_chain.h | 34 ++++++++++++++++++
 lib/Makefile                |  2 +-
 lib/compl_chain.c           | 72 +++++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/compl_chain.h
 create mode 100644 lib/compl_chain.c

diff --git a/include/linux/compl_chain.h b/include/linux/compl_chain.h
new file mode 100644
index 000000000000..663c105b6605
--- /dev/null
+++ b/include/linux/compl_chain.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_COMPLETION_CHAIN_H
+#define _LINUX_COMPLETION_CHAIN_H
+
+#include <linux/list.h>
+#include <linux/completion.h>
+#include <linux/spinlock.h>
+
+struct compl_chain {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+#define COMPL_CHAIN_INIT(name) \
+    { .lock = __SPIN_LOCK_UNLOCKED((name).lock), \
+      .list = LIST_HEAD_INIT((name).list) }
+
+#define DEFINE_COMPL_CHAIN(name) \
+    struct compl_chain name = COMPL_CHAIN_INIT(name)
+
+struct compl_chain_entry {
+	struct compl_chain *chain;
+	struct list_head list;
+	struct completion prev_finished;
+};
+
+void compl_chain_init(struct compl_chain *chain);
+void compl_chain_add(struct compl_chain *chain,
+			struct compl_chain_entry *entry);
+void compl_chain_wait(struct compl_chain_entry *entry);
+void compl_chain_complete(struct compl_chain_entry *entry);
+bool compl_chain_empty(struct compl_chain *chain);
+
+#endif /* _LINUX_COMPL_CHAIN_H */
diff --git a/lib/Makefile b/lib/Makefile
index 392ff808c9b9..8eea23c22b45 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -56,7 +56,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
 	 percpu-refcount.o rhashtable.o base64.o \
 	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
-	 generic-radix-tree.o bitmap-str.o
+	 generic-radix-tree.o bitmap-str.o compl_chain.o
 obj-y += string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
diff --git a/lib/compl_chain.c b/lib/compl_chain.c
new file mode 100644
index 000000000000..7f7063802286
--- /dev/null
+++ b/lib/compl_chain.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/compl_chain.h>
+
+void compl_chain_init(struct compl_chain *chain)
+{
+
+	spin_lock_init(&chain->lock);
+	INIT_LIST_HEAD(&chain->list);
+}
+EXPORT_SYMBOL_GPL(compl_chain_init);
+
+void compl_chain_add(struct compl_chain *chain,
+			struct compl_chain_entry *entry)
+{
+
+	init_completion(&entry->prev_finished);
+	INIT_LIST_HEAD(&entry->list);
+
+	entry->chain = chain;
+
+	spin_lock(&chain->lock);
+	if (list_empty(&chain->list))
+		complete_all(&entry->prev_finished);
+	list_add_tail(&entry->list, &chain->list);
+	spin_unlock(&chain->lock);
+}
+EXPORT_SYMBOL_GPL(compl_chain_add);
+
+void compl_chain_wait(struct compl_chain_entry *entry)
+{
+
+	WARN_ON(!entry->chain);
+
+	wait_for_completion(&entry->prev_finished);
+}
+EXPORT_SYMBOL_GPL(compl_chain_wait);
+
+void compl_chain_complete(struct compl_chain_entry *entry)
+{
+
+	struct compl_chain *chain = entry->chain;
+
+	WARN_ON(!chain);
+
+	wait_for_completion(&entry->prev_finished);
+
+	spin_lock(&chain->lock);
+	list_del(&entry->list);
+	if (!list_empty(&chain->list)) {
+		struct compl_chain_entry *next =
+			list_first_entry(&chain->list,
+					 struct compl_chain_entry, list);
+		complete_all(&next->prev_finished);
+	}
+	spin_unlock(&chain->lock);
+
+	entry->chain = NULL;
+}
+EXPORT_SYMBOL_GPL(compl_chain_complete);
+
+bool compl_chain_empty(struct compl_chain *chain)
+{
+
+	bool r;
+
+	spin_lock(&chain->lock);
+	r = list_empty(&chain->list);
+	spin_unlock(&chain->lock);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(compl_chain_empty);



And this is like scsi/scsi_scan.c would be changed:



diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3c6e089e80c3..0f833c5888b5 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -36,6 +36,7 @@
 #include <linux/async.h>
 #include <linux/slab.h>
 #include <linux/unaligned.h>
+#include <linux/compl_chain.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -112,14 +113,11 @@ MODULE_PARM_DESC(inq_timeout,
 		 "Timeout (in seconds) waiting for devices to answer INQUIRY."
 		 " Default is 20. Some devices may need more; most need less.");
 
-/* This lock protects only this list */
-static DEFINE_SPINLOCK(async_scan_lock);
-static LIST_HEAD(scanning_hosts);
+DEFINE_COMPL_CHAIN(scanning_hosts);
 
 struct async_scan_data {
-	struct list_head list;
+	struct compl_chain_entry entry;
 	struct Scsi_Host *shost;
-	struct completion prev_finished;
 };
 
 /*
@@ -151,9 +149,8 @@ int scsi_complete_async_scans(void)
 	struct async_scan_data *data;
 
 	do {
-		scoped_guard(spinlock, &async_scan_lock)
-			if (list_empty(&scanning_hosts))
-				return 0;
+		if (compl_chain_empty(&scanning_hosts))
+			return 0;
 		/* If we can't get memory immediately, that's OK.  Just
 		 * sleep a little.  Even if we never get memory, the async
 		 * scans will finish eventually.
@@ -164,27 +161,11 @@ int scsi_complete_async_scans(void)
 	} while (!data);
 
 	data->shost = NULL;
-	init_completion(&data->prev_finished);
 
-	spin_lock(&async_scan_lock);
-	/* Check that there's still somebody else on the list */
-	if (list_empty(&scanning_hosts))
-		goto done;
-	list_add_tail(&data->list, &scanning_hosts);
-	spin_unlock(&async_scan_lock);
+	compl_chain_add(&scanning_hosts, &data->entry);
 
 	printk(KERN_INFO "scsi: waiting for bus probes to complete ...\n");
-	wait_for_completion(&data->prev_finished);
-
-	spin_lock(&async_scan_lock);
-	list_del(&data->list);
-	if (!list_empty(&scanning_hosts)) {
-		struct async_scan_data *next = list_entry(scanning_hosts.next,
-				struct async_scan_data, list);
-		complete(&next->prev_finished);
-	}
- done:
-	spin_unlock(&async_scan_lock);
+	compl_chain_complete(&data->entry);
 
 	kfree(data);
 	return 0;
@@ -1947,18 +1928,13 @@ static struct async_scan_data *scsi_prep_async_scan(struct Scsi_Host *shost)
 	data->shost = scsi_host_get(shost);
 	if (!data->shost)
 		goto err;
-	init_completion(&data->prev_finished);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->async_scan = 1;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	mutex_unlock(&shost->scan_mutex);
 
-	spin_lock(&async_scan_lock);
-	if (list_empty(&scanning_hosts))
-		complete(&data->prev_finished);
-	list_add_tail(&data->list, &scanning_hosts);
-	spin_unlock(&async_scan_lock);
+	compl_chain_add(&scanning_hosts, &data->entry);
 
 	return data;
 
@@ -1995,7 +1971,7 @@ static void scsi_finish_async_scan(struct async_scan_data *data)
 		return;
 	}
 
-	wait_for_completion(&data->prev_finished);
+	compl_chain_wait(&data->entry);
 
 	scsi_sysfs_add_devices(shost);
 
@@ -2005,14 +1981,7 @@ static void scsi_finish_async_scan(struct async_scan_data *data)
 
 	mutex_unlock(&shost->scan_mutex);
 
-	spin_lock(&async_scan_lock);
-	list_del(&data->list);
-	if (!list_empty(&scanning_hosts)) {
-		struct async_scan_data *next = list_entry(scanning_hosts.next,
-				struct async_scan_data, list);
-		complete(&next->prev_finished);
-	}
-	spin_unlock(&async_scan_lock);
+	compl_chain_complete(&data->entry);
 
 	scsi_autopm_put_host(shost);
 	scsi_host_put(shost);



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC 1/1] nvme-core: register namespaces in order during async scan
  2025-09-26  8:43     ` Maurizio Lombardi
@ 2025-10-03 10:14       ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-10-03 10:14 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Christoph Hellwig, Maurizio Lombardi, kbusch, sagi, hare,
	linux-nvme

On Fri, Sep 26, 2025 at 10:43:16AM +0200, Maurizio Lombardi wrote:
> On Mon Sep 22, 2025 at 7:30 PM CEST, Christoph Hellwig wrote:
> > This idea looks fine to me, but I hate duplicating the logic
> > from SCSI here.  Any chance you could try to factor the logic
> > into a common helper?
> 
> Ok, let's try. What do you think about something like this?
> I am not good with function names so I am open to suggestions:

This looks reasonable to me.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-10-03 10:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-04 11:43 [PATCH RFC 0/1] Ensure ordered namespace registration during async scan Maurizio Lombardi
2025-08-04 11:43 ` [PATCH RFC 1/1] nvme-core: register namespaces in order " Maurizio Lombardi
2025-09-22 17:30   ` Christoph Hellwig
2025-09-26  8:43     ` Maurizio Lombardi
2025-10-03 10:14       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox