linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 0/5] shut down devices asynchronously
@ 2025-06-25 20:18 Stuart Hayes
  2025-06-25 20:18 ` [PATCH v10 1/5] kernel/async: streamline cookie synchronization Stuart Hayes
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Stuart Hayes @ 2025-06-25 20:18 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki
  Cc: Stuart Hayes

This adds the ability for the kernel to shutdown devices asynchronously.

Only devices with drivers that enable it are shut down asynchronously.

This can dramatically reduce system shutdown/reboot time on systems that
have multiple devices that take many seconds to shut down (like certain
NVMe drives). On one system tested, the shutdown time went from 11 minutes
without this patch to 55 seconds with the patch.

Changes from V9:

Address resource and timing issues when spawning a unique async thread
for every device during shutdown:
  * Make the asynchronous threads able to shut down multiple devices,
    instead of spawning a unique thread for every device.
  * Modify core kernel async code with a custom wake function so it
    doesn't wake up threads waiting to synchronize every time the cookie
    changes

Changes from V8:

Deal with shutdown hangs resulting when a parent/supplier device is
  later in the devices_kset list than its children/consumers:
  * Ignore sync_state_only devlinks for shutdown dependencies
  * Ignore shutdown_after for devices that don't want async shutdown
  * Add a sanity check to revert to sync shutdown for any device that
    would otherwise wait for a child/consumer shutdown that hasn't
    already been scheduled

Changes from V7:

Do not expose driver async_shutdown_enable in sysfs.
Wrapped a long line.
 
Changes from V6:

Removed a sysfs attribute that allowed the async device shutdown to be
"on" (with driver opt-out), "safe" (driver opt-in), or "off"... what was
previously "safe" is now the only behavior, so drivers now only need to
have the option to enable or disable async shutdown.

Changes from V5:

Separated into multiple patches to make review easier.
Reworked some code to make it more readable
Made devices wait for consumers to shut down, not just children
  (suggested by David Jeffery)

Changes from V4:

Change code to use cookies for synchronization rather than async domains
Allow async shutdown to be disabled via sysfs, and allow driver opt-in or
  opt-out of async shutdown (when not disabled), with ability to control
  driver opt-in/opt-out via sysfs

Changes from V3:

Bug fix (used "parent" not "dev->parent" in device_shutdown)

Changes from V2:

Removed recursive functions to schedule children to be shutdown before
  parents, since existing device_shutdown loop will already do this

Changes from V1:

Rewritten using kernel async code (suggested by Lukas Wunner)

David Jeffery (1):
  kernel/async: streamline cookie synchronization

Stuart Hayes (4):
  driver core: don't always lock parent in shutdown
  driver core: separate function to shutdown one device
  driver core: shut down devices asynchronously
  nvme-pci: Make driver prefer asynchronous shutdown

 drivers/base/base.h           |   8 ++
 drivers/base/core.c           | 210 +++++++++++++++++++++++++++++-----
 drivers/nvme/host/pci.c       |   1 +
 include/linux/device/driver.h |   2 +
 kernel/async.c                |  42 ++++++-
 5 files changed, 236 insertions(+), 27 deletions(-)

-- 
2.39.3



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v10 1/5] kernel/async: streamline cookie synchronization
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
@ 2025-06-25 20:18 ` Stuart Hayes
  2025-07-08 22:17   ` Sultan Alsawaf
  2025-06-25 20:18 ` [PATCH v10 2/5] driver core: don't always lock parent in shutdown Stuart Hayes
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Stuart Hayes @ 2025-06-25 20:18 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki
  Cc: Stuart Hayes

From: David Jeffery <djeffery@redhat.com>

To prevent a thundering herd effect, implement a custom wake function for
the async shubsystem which will only wake waiters which have all their
dependencies completed.

The async subsystem currently wakes all waiters on async_done when an async
task completes. When there are many tasks trying to synchronize on differnt
async values, this can create a thundering herd problem when an async task
wakes up all waiters, most of whom go back to waiting after causing
lock contention and wasting CPU.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
 kernel/async.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/kernel/async.c b/kernel/async.c
index 4c3e6a44595f..ae327f29bac9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -76,6 +76,12 @@ struct async_entry {
 	struct async_domain	*domain;
 };
 
+struct async_wait_entry {
+	wait_queue_entry_t wait;
+	async_cookie_t cookie;
+	struct async_domain *domain;
+};
+
 static DECLARE_WAIT_QUEUE_HEAD(async_done);
 
 static atomic_t entry_count;
@@ -298,6 +304,24 @@ void async_synchronize_full_domain(struct async_domain *domain)
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
 
+/**
+ * async_domain_wake_function - wait function for cooking synchronization
+ *
+ * Custom wait function for async_synchronize_cookie_domain to check cookie
+ * value.  This prevents waking up waiting threads unnecessarily.
+ */
+static int async_domain_wake_function(struct wait_queue_entry *wait,
+				      unsigned int mode, int sync, void *key)
+{
+	struct async_wait_entry *await =
+		container_of(wait, struct async_wait_entry, wait);
+
+	if (lowest_in_progress(await->domain) < await->cookie)
+		return 0;
+
+	return autoremove_wake_function(wait, mode, sync, key);
+}
+
 /**
  * async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
  * @cookie: async_cookie_t to use as checkpoint
@@ -310,11 +334,27 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
 void async_synchronize_cookie_domain(async_cookie_t cookie, struct async_domain *domain)
 {
 	ktime_t starttime;
+	struct async_wait_entry await = {
+		.cookie = cookie,
+		.domain = domain,
+		.wait = {
+			.func = async_domain_wake_function,
+			.private = current,
+			.flags = 0,
+			.entry = LIST_HEAD_INIT(await.wait.entry),
+		}};
 
 	pr_debug("async_waiting @ %i\n", task_pid_nr(current));
 	starttime = ktime_get();
 
-	wait_event(async_done, lowest_in_progress(domain) >= cookie);
+	for (;;) {
+		prepare_to_wait(&async_done, &await.wait, TASK_UNINTERRUPTIBLE);
+
+		if (lowest_in_progress(domain) >= cookie)
+			break;
+		schedule();
+	}
+	finish_wait(&async_done, &await.wait);
 
 	pr_debug("async_continuing @ %i after %lli usec\n", task_pid_nr(current),
 		 microseconds_since(starttime));
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 2/5] driver core: don't always lock parent in shutdown
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
  2025-06-25 20:18 ` [PATCH v10 1/5] kernel/async: streamline cookie synchronization Stuart Hayes
@ 2025-06-25 20:18 ` Stuart Hayes
  2025-07-01  8:50   ` Greg Kroah-Hartman
  2025-06-25 20:18 ` [PATCH v10 3/5] driver core: separate function to shutdown one device Stuart Hayes
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Stuart Hayes @ 2025-06-25 20:18 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki
  Cc: Stuart Hayes

Don't lock a parent device unless it is needed in device_shutdown. This
is in preparation for making device shutdown asynchronous, when it will
be needed to allow children of a common parent to shut down
simultaneously.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
 drivers/base/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index cbc0099d8ef2..58c772785606 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4823,7 +4823,7 @@ void device_shutdown(void)
 		spin_unlock(&devices_kset->list_lock);
 
 		/* hold lock to avoid race with probe/release */
-		if (parent)
+		if (parent && dev->bus && dev->bus->need_parent_lock)
 			device_lock(parent);
 		device_lock(dev);
 
@@ -4847,7 +4847,7 @@ void device_shutdown(void)
 		}
 
 		device_unlock(dev);
-		if (parent)
+		if (parent && dev->bus && dev->bus->need_parent_lock)
 			device_unlock(parent);
 
 		put_device(dev);
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 3/5] driver core: separate function to shutdown one device
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
  2025-06-25 20:18 ` [PATCH v10 1/5] kernel/async: streamline cookie synchronization Stuart Hayes
  2025-06-25 20:18 ` [PATCH v10 2/5] driver core: don't always lock parent in shutdown Stuart Hayes
@ 2025-06-25 20:18 ` Stuart Hayes
  2025-06-25 20:18 ` [PATCH v10 4/5] driver core: shut down devices asynchronously Stuart Hayes
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Stuart Hayes @ 2025-06-25 20:18 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki
  Cc: Stuart Hayes

Make a separate function for the part of device_shutdown() that does the
shutown for a single device. This is in preparation for making device
shutdown asynchronous.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
 drivers/base/core.c | 66 ++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 58c772785606..39502621e88e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4786,6 +4786,41 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
 }
 EXPORT_SYMBOL_GPL(device_change_owner);
 
+static void shutdown_one_device(struct device *dev)
+{
+	/* hold lock to avoid race with probe/release */
+	if (dev->parent && dev->bus && dev->bus->need_parent_lock)
+		device_lock(dev->parent);
+	device_lock(dev);
+
+	/* Don't allow any more runtime suspends */
+	pm_runtime_get_noresume(dev);
+	pm_runtime_barrier(dev);
+
+	if (dev->class && dev->class->shutdown_pre) {
+		if (initcall_debug)
+			dev_info(dev, "shutdown_pre\n");
+		dev->class->shutdown_pre(dev);
+	}
+	if (dev->bus && dev->bus->shutdown) {
+		if (initcall_debug)
+			dev_info(dev, "shutdown\n");
+		dev->bus->shutdown(dev);
+	} else if (dev->driver && dev->driver->shutdown) {
+		if (initcall_debug)
+			dev_info(dev, "shutdown\n");
+		dev->driver->shutdown(dev);
+	}
+
+	device_unlock(dev);
+	if (dev->parent && dev->bus && dev->bus->need_parent_lock)
+		device_unlock(dev->parent);
+
+	if (dev->parent)
+		put_device(dev->parent);
+	put_device(dev);
+}
+
 /**
  * device_shutdown - call ->shutdown() on each device to shutdown.
  */
@@ -4822,36 +4857,7 @@ void device_shutdown(void)
 		list_del_init(&dev->kobj.entry);
 		spin_unlock(&devices_kset->list_lock);
 
-		/* hold lock to avoid race with probe/release */
-		if (parent && dev->bus && dev->bus->need_parent_lock)
-			device_lock(parent);
-		device_lock(dev);
-
-		/* Don't allow any more runtime suspends */
-		pm_runtime_get_noresume(dev);
-		pm_runtime_barrier(dev);
-
-		if (dev->class && dev->class->shutdown_pre) {
-			if (initcall_debug)
-				dev_info(dev, "shutdown_pre\n");
-			dev->class->shutdown_pre(dev);
-		}
-		if (dev->bus && dev->bus->shutdown) {
-			if (initcall_debug)
-				dev_info(dev, "shutdown\n");
-			dev->bus->shutdown(dev);
-		} else if (dev->driver && dev->driver->shutdown) {
-			if (initcall_debug)
-				dev_info(dev, "shutdown\n");
-			dev->driver->shutdown(dev);
-		}
-
-		device_unlock(dev);
-		if (parent && dev->bus && dev->bus->need_parent_lock)
-			device_unlock(parent);
-
-		put_device(dev);
-		put_device(parent);
+		shutdown_one_device(dev);
 
 		spin_lock(&devices_kset->list_lock);
 	}
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 4/5] driver core: shut down devices asynchronously
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
                   ` (2 preceding siblings ...)
  2025-06-25 20:18 ` [PATCH v10 3/5] driver core: separate function to shutdown one device Stuart Hayes
@ 2025-06-25 20:18 ` Stuart Hayes
  2025-06-25 20:18 ` [PATCH v10 5/5] nvme-pci: Make driver prefer asynchronous shutdown Stuart Hayes
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Stuart Hayes @ 2025-06-25 20:18 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki
  Cc: Stuart Hayes

Add code to allow asynchronous shutdown of devices.

Devices allowed to do asynchronous shutdown can start to shut down in their
own thread as soon as all of the devices dependent on them have finished
shutting down. All other devices will wait until the previous device in
the devices_kset list have finished shutting down.

Only devices with drivers that have async_shutdown_enable set will be shut
down asynchronously.

Devices shutting down asynchronously will be shut down in their own thread,
but synchronous devices in the devices_kset list between asynch devices
will be put in a list, and that list will be shut down in a single thread.
This avoids burdening the system with hundreds of threads on shutdown.

This can dramatically reduce system shutdown/reboot time on systems that
have multiple devices that take many seconds to shut down (like certain
NVMe drives). On one system tested, the shutdown time went from 11
minutes without this patch to 55 seconds when this patch.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
 drivers/base/base.h           |   8 ++
 drivers/base/core.c           | 157 +++++++++++++++++++++++++++++++++-
 include/linux/device/driver.h |   2 +
 3 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 123031a757d9..214207ca5392 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -10,6 +10,7 @@
  * shared outside of the drivers/base/ directory.
  *
  */
+#include <linux/async.h>
 #include <linux/notifier.h>
 
 /**
@@ -85,6 +86,11 @@ struct driver_private {
 };
 #define to_driver(obj) container_of(obj, struct driver_private, kobj)
 
+union shutdown_private {
+	struct device *next;
+	async_cookie_t after;
+};
+
 /**
  * struct device_private - structure to hold the private to the driver core portions of the device structure.
  *
@@ -98,6 +104,7 @@ struct driver_private {
  *	the device; typically because it depends on another driver getting
  *	probed first.
  * @async_driver - pointer to device driver awaiting probe via async_probe
+ * @shutdown - used during device shutdown to ensure correct shutdown ordering.
  * @device - pointer back to the struct device that this structure is
  * associated with.
  * @dead - This device is currently either in the process of or has been
@@ -115,6 +122,7 @@ struct device_private {
 	struct list_head deferred_probe;
 	const struct device_driver *async_driver;
 	char *deferred_probe_reason;
+	union shutdown_private shutdown;
 	struct device *device;
 	u8 dead:1;
 };
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 39502621e88e..f0484ceefc52 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/acpi.h>
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/cleanup.h>
 #include <linux/cpufreq.h>
@@ -4786,6 +4787,8 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
 }
 EXPORT_SYMBOL_GPL(device_change_owner);
 
+static ASYNC_DOMAIN(sd_domain);
+
 static void shutdown_one_device(struct device *dev)
 {
 	/* hold lock to avoid race with probe/release */
@@ -4821,12 +4824,116 @@ static void shutdown_one_device(struct device *dev)
 	put_device(dev);
 }
 
+static bool device_wants_async_shutdown(struct device *dev)
+{
+	if (dev->driver && dev->driver->async_shutdown_enable)
+		return true;
+
+	return false;
+}
+
+/**
+ * set_wait_cookies
+ * @dev: device to find parents and suppliers for
+ * @cookie: shutdown cookie for dev
+ *
+ * Look for parent and suppliers of dev that want async shutdown, and
+ * set shutdown.after to cookie on those devices to ensure they
+ * don't shut down before dev.
+ *
+ * Passing a cookie of zero will return whether any such devices are found
+ * without setting shutdown.after.
+ *
+ * Return true if any async supplier/parent devices are found.
+ */
+static bool device_set_async_cookie(struct device *dev, async_cookie_t cookie)
+{
+	int idx;
+	struct device_link *link;
+	bool ret = false;
+	struct device *parent = dev->parent;
+
+	if (parent && device_wants_async_shutdown(parent)) {
+		ret = true;
+		if (cookie)
+			parent->p->shutdown.after = cookie;
+		else
+			goto done;
+	}
+
+	idx = device_links_read_lock();
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held()) {
+		if (!device_link_flag_is_sync_state_only(link->flags)
+		    && device_wants_async_shutdown(link->supplier)) {
+			ret = true;
+			if (cookie)
+				link->supplier->p->shutdown.after = cookie;
+			else
+				break;
+		}
+	}
+	device_links_read_unlock(idx);
+done:
+	return ret;
+}
+
+#define is_async_shutdown_dependency(dev) device_set_async_cookie(dev, 0)
+
+/**
+ * shutdown_devices_async
+ * @data: list of devices to be shutdown
+ * @cookie: not used
+ *
+ * Shuts down devices after waiting for previous devices to shut down (for
+ * synchronous shutdown) or waiting for device's last child or consumer to
+ * be shutdown (for async shutdown).
+ *
+ * shutdown.after is set to the shutdown cookie of the last child or consumer
+ * of this device (if any).
+ */
+static void shutdown_devices_async(void *data, async_cookie_t cookie)
+{
+	struct device *next, *dev = data;
+	async_cookie_t wait = cookie;
+	bool async = device_wants_async_shutdown(dev);
+
+	if (async) {
+		wait = dev->p->shutdown.after + 1;
+		/*
+		 * To prevent system hang, revert to sync shutdown in the event
+		 * that shutdown.after would make this shutdown wait for a
+		 * shutdown that hasn't been scheduled yet.
+		 *
+		 * This can happen if a parent or supplier is not ordered in the
+		 * devices_kset list before a child or consumer, which is not
+		 * expected.
+		 */
+		if (wait > cookie) {
+			wait = cookie;
+			dev_warn(dev, "Unsafe shutdown ordering, forcing sync order\n");
+		}
+	}
+
+	async_synchronize_cookie_domain(wait, &sd_domain);
+
+	/*
+	 * Shut down the async device or list of sync devices
+	 */
+	do {
+		next = dev->p->shutdown.next;
+		shutdown_one_device(dev);
+		dev = next;
+	} while (!async && dev);
+}
+
 /**
  * device_shutdown - call ->shutdown() on each device to shutdown.
  */
 void device_shutdown(void)
 {
-	struct device *dev, *parent;
+	struct device *dev, *parent, *synclist = NULL, *syncend = NULL;
+	async_cookie_t cookie = 0;
 
 	wait_for_device_probe();
 	device_block_probing();
@@ -4857,11 +4964,57 @@ void device_shutdown(void)
 		list_del_init(&dev->kobj.entry);
 		spin_unlock(&devices_kset->list_lock);
 
-		shutdown_one_device(dev);
+		get_device(dev);
+		get_device(parent);
+
+		if (device_wants_async_shutdown(dev)) {
+			/*
+			 * async devices run alone in their own async task,
+			 * push out any waiting sync devices to maintain
+			 * ordering.
+			 */
+			if (synclist) {
+				async_schedule_domain(shutdown_devices_async,
+						      synclist, &sd_domain);
+				synclist = syncend = NULL;
+			}
+
+			cookie = async_schedule_domain(shutdown_devices_async,
+					       dev, &sd_domain);
+			device_set_async_cookie(dev, cookie);
+		} else {
+			if (!synclist) {
+				synclist = syncend = dev;
+			} else {
+				syncend->p->shutdown.next = dev;
+				syncend = dev;
+			}
+			if (is_async_shutdown_dependency(dev)) {
+				/*
+				 * dev is a dependency for an async device,
+				 * kick off a new thread so it can complete
+				 * and allow the async device to run its
+				 * shutdown.
+				 */
+				cookie = async_schedule_domain(
+							shutdown_devices_async,
+							synclist, &sd_domain);
+				device_set_async_cookie(dev, cookie);
+				synclist = syncend = NULL;
+			}
+		}
+
+		put_device(parent);
+		put_device(dev);
 
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
+
+	if (synclist)
+		async_schedule_domain(shutdown_devices_async, synclist,
+				      &sd_domain);
+	async_synchronize_full_domain(&sd_domain);
 }
 
 /*
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index cd8e0f0a634b..c63bc0050c84 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -56,6 +56,7 @@ enum probe_type {
  * @mod_name:	Used for built-in modules.
  * @suppress_bind_attrs: Disables bind/unbind via sysfs.
  * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
+ * @async_shutdown_enable: Enables devices to be shutdown asynchronously.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
  * @probe:	Called to query the existence of a specific device,
@@ -102,6 +103,7 @@ struct device_driver {
 
 	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
 	enum probe_type probe_type;
+	bool async_shutdown_enable;
 
 	const struct of_device_id	*of_match_table;
 	const struct acpi_device_id	*acpi_match_table;
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 5/5] nvme-pci: Make driver prefer asynchronous shutdown
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
                   ` (3 preceding siblings ...)
  2025-06-25 20:18 ` [PATCH v10 4/5] driver core: shut down devices asynchronously Stuart Hayes
@ 2025-06-25 20:18 ` Stuart Hayes
  2025-06-30 20:33 ` [PATCH v10 0/5] shut down devices asynchronously Michael Kelley
  2025-07-03 11:46 ` Christoph Hellwig
  6 siblings, 0 replies; 25+ messages in thread
From: Stuart Hayes @ 2025-06-25 20:18 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki
  Cc: Stuart Hayes

Set the driver default to enable asynchronous shutdown.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8ff12e415cb5..727527ef3896 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3831,6 +3831,7 @@ static struct pci_driver nvme_driver = {
 	.shutdown	= nvme_shutdown,
 	.driver		= {
 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
+		.async_shutdown_enable	= true,
 #ifdef CONFIG_PM_SLEEP
 		.pm		= &nvme_dev_pm_ops,
 #endif
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: [PATCH v10 0/5] shut down devices asynchronously
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
                   ` (4 preceding siblings ...)
  2025-06-25 20:18 ` [PATCH v10 5/5] nvme-pci: Make driver prefer asynchronous shutdown Stuart Hayes
@ 2025-06-30 20:33 ` Michael Kelley
  2025-06-30 22:02   ` Laurence Oberman
  2025-07-03 11:46 ` Christoph Hellwig
  6 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2025-06-30 20:33 UTC (permalink / raw)
  To: Stuart Hayes, linux-kernel@vger.kernel.org, Greg Kroah-Hartman,
	Rafael J . Wysocki, Martin Belanger, Oliver O'Halloran,
	Daniel Wagner, Keith Busch, Lukas Wunner, David Jeffery,
	Jeremy Allison, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme@lists.infradead.org, Nathan Chancellor, Jan Kiszka,
	Bert Karwatzki

From: Stuart Hayes <stuart.w.hayes@gmail.com> Sent: Wednesday, June 25, 2025 1:19 PM
> 
> This adds the ability for the kernel to shutdown devices asynchronously.
> 
> Only devices with drivers that enable it are shut down asynchronously.
> 
> This can dramatically reduce system shutdown/reboot time on systems that
> have multiple devices that take many seconds to shut down (like certain
> NVMe drives). On one system tested, the shutdown time went from 11 minutes
> without this patch to 55 seconds with the patch.

I've tested this version and all looks good. I did the same tests that I did
with v9 [1], running in a VM in the Azure cloud. The 2 NVMe devices are
shutdown in parallel, gaining about 110 milliseconds, and there were no
slowdowns as seen in v9. The net gain was ~100 ms.

I also tested a local Hyper-V VM that does not have any NVMe devices.
The shutdown timings with and without this patch set are pretty much
the same, which was not the case with v9.

I did not repeat the more detailed debugging from v9 as reported
here [2], since there is no unexpected slowness with v10.

For the series,

Tested-by: Michael Kelley <mhklinux@outlook.com>

[1] https://lore.kernel.org/lkml/BN7PR02MB41480DE777B9C224F3C2DF43D4792@BN7PR02MB4148.namprd02.prod.outlook.com/
[2] https://lore.kernel.org/lkml/SN6PR02MB41571E2DD410D09CE7494B38D4402@SN6PR02MB4157.namprd02.prod.outlook.com/

> 
> Changes from V9:
> 
> Address resource and timing issues when spawning a unique async thread
> for every device during shutdown:
>   * Make the asynchronous threads able to shut down multiple devices,
>     instead of spawning a unique thread for every device.
>   * Modify core kernel async code with a custom wake function so it
>     doesn't wake up threads waiting to synchronize every time the cookie
>     changes
> 
> Changes from V8:
> 
> Deal with shutdown hangs resulting when a parent/supplier device is
>   later in the devices_kset list than its children/consumers:
>   * Ignore sync_state_only devlinks for shutdown dependencies
>   * Ignore shutdown_after for devices that don't want async shutdown
>   * Add a sanity check to revert to sync shutdown for any device that
>     would otherwise wait for a child/consumer shutdown that hasn't
>     already been scheduled
> 
> Changes from V7:
> 
> Do not expose driver async_shutdown_enable in sysfs.
> Wrapped a long line.
> 
> Changes from V6:
> 
> Removed a sysfs attribute that allowed the async device shutdown to be
> "on" (with driver opt-out), "safe" (driver opt-in), or "off"... what was
> previously "safe" is now the only behavior, so drivers now only need to
> have the option to enable or disable async shutdown.
> 
> Changes from V5:
> 
> Separated into multiple patches to make review easier.
> Reworked some code to make it more readable
> Made devices wait for consumers to shut down, not just children
>   (suggested by David Jeffery)
> 
> Changes from V4:
> 
> Change code to use cookies for synchronization rather than async domains
> Allow async shutdown to be disabled via sysfs, and allow driver opt-in or
>   opt-out of async shutdown (when not disabled), with ability to control
>   driver opt-in/opt-out via sysfs
> 
> Changes from V3:
> 
> Bug fix (used "parent" not "dev->parent" in device_shutdown)
> 
> Changes from V2:
> 
> Removed recursive functions to schedule children to be shutdown before
>   parents, since existing device_shutdown loop will already do this
> 
> Changes from V1:
> 
> Rewritten using kernel async code (suggested by Lukas Wunner)
> 
> David Jeffery (1):
>   kernel/async: streamline cookie synchronization
> 
> Stuart Hayes (4):
>   driver core: don't always lock parent in shutdown
>   driver core: separate function to shutdown one device
>   driver core: shut down devices asynchronously
>   nvme-pci: Make driver prefer asynchronous shutdown
> 
>  drivers/base/base.h           |   8 ++
>  drivers/base/core.c           | 210 +++++++++++++++++++++++++++++-----
>  drivers/nvme/host/pci.c       |   1 +
>  include/linux/device/driver.h |   2 +
>  kernel/async.c                |  42 ++++++-
>  5 files changed, 236 insertions(+), 27 deletions(-)
> 
> --
> 2.39.3
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-06-30 20:33 ` [PATCH v10 0/5] shut down devices asynchronously Michael Kelley
@ 2025-06-30 22:02   ` Laurence Oberman
  0 siblings, 0 replies; 25+ messages in thread
From: Laurence Oberman @ 2025-06-30 22:02 UTC (permalink / raw)
  To: Michael Kelley, Stuart Hayes, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme@lists.infradead.org, Nathan Chancellor,
	Jan Kiszka, Bert Karwatzki

On Mon, 2025-06-30 at 20:33 +0000, Michael Kelley wrote:
> From: Stuart Hayes <stuart.w.hayes@gmail.com> Sent: Wednesday, June
> 25, 2025 1:19 PM
> > 
> > This adds the ability for the kernel to shutdown devices
> > asynchronously.
> > 
> > Only devices with drivers that enable it are shut down
> > asynchronously.
> > 
> > This can dramatically reduce system shutdown/reboot time on systems
> > that
> > have multiple devices that take many seconds to shut down (like
> > certain
> > NVMe drives). On one system tested, the shutdown time went from 11
> > minutes
> > without this patch to 55 seconds with the patch.
> 
> I've tested this version and all looks good. I did the same tests
> that I did
> with v9 [1], running in a VM in the Azure cloud. The 2 NVMe devices
> are
> shutdown in parallel, gaining about 110 milliseconds, and there were
> no
> slowdowns as seen in v9. The net gain was ~100 ms.
> 
> I also tested a local Hyper-V VM that does not have any NVMe devices.
> The shutdown timings with and without this patch set are pretty much
> the same, which was not the case with v9.
> 
> I did not repeat the more detailed debugging from v9 as reported
> here [2], since there is no unexpected slowness with v10.
> 
> For the series,
> 
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> 
> [1]
> https://lore.kernel.org/lkml/BN7PR02MB41480DE777B9C224F3C2DF43D4792@BN7PR02MB4148.namprd02.prod.outlook.com/
> [2]
> https://lore.kernel.org/lkml/SN6PR02MB41571E2DD410D09CE7494B38D4402@SN6PR02MB4157.namprd02.prod.outlook.com/
> 
> > 
> > Changes from V9:
> > 
> > Address resource and timing issues when spawning a unique async
> > thread
> > for every device during shutdown:
> >   * Make the asynchronous threads able to shut down multiple
> > devices,
> >     instead of spawning a unique thread for every device.
> >   * Modify core kernel async code with a custom wake function so it
> >     doesn't wake up threads waiting to synchronize every time the
> > cookie
> >     changes
> > 
> > Changes from V8:
> > 
> > Deal with shutdown hangs resulting when a parent/supplier device is
> >   later in the devices_kset list than its children/consumers:
> >   * Ignore sync_state_only devlinks for shutdown dependencies
> >   * Ignore shutdown_after for devices that don't want async
> > shutdown
> >   * Add a sanity check to revert to sync shutdown for any device
> > that
> >     would otherwise wait for a child/consumer shutdown that hasn't
> >     already been scheduled
> > 
> > Changes from V7:
> > 
> > Do not expose driver async_shutdown_enable in sysfs.
> > Wrapped a long line.
> > 
> > Changes from V6:
> > 
> > Removed a sysfs attribute that allowed the async device shutdown to
> > be
> > "on" (with driver opt-out), "safe" (driver opt-in), or "off"...
> > what was
> > previously "safe" is now the only behavior, so drivers now only
> > need to
> > have the option to enable or disable async shutdown.
> > 
> > Changes from V5:
> > 
> > Separated into multiple patches to make review easier.
> > Reworked some code to make it more readable
> > Made devices wait for consumers to shut down, not just children
> >   (suggested by David Jeffery)
> > 
> > Changes from V4:
> > 
> > Change code to use cookies for synchronization rather than async
> > domains
> > Allow async shutdown to be disabled via sysfs, and allow driver
> > opt-in or
> >   opt-out of async shutdown (when not disabled), with ability to
> > control
> >   driver opt-in/opt-out via sysfs
> > 
> > Changes from V3:
> > 
> > Bug fix (used "parent" not "dev->parent" in device_shutdown)
> > 
> > Changes from V2:
> > 
> > Removed recursive functions to schedule children to be shutdown
> > before
> >   parents, since existing device_shutdown loop will already do this
> > 
> > Changes from V1:
> > 
> > Rewritten using kernel async code (suggested by Lukas Wunner)
> > 
> > David Jeffery (1):
> >   kernel/async: streamline cookie synchronization
> > 
> > Stuart Hayes (4):
> >   driver core: don't always lock parent in shutdown
> >   driver core: separate function to shutdown one device
> >   driver core: shut down devices asynchronously
> >   nvme-pci: Make driver prefer asynchronous shutdown
> > 
> >  drivers/base/base.h           |   8 ++
> >  drivers/base/core.c           | 210 +++++++++++++++++++++++++++++-
> > ----
> >  drivers/nvme/host/pci.c       |   1 +
> >  include/linux/device/driver.h |   2 +
> >  kernel/async.c                |  42 ++++++-
> >  5 files changed, 236 insertions(+), 27 deletions(-)
> > 
> > --
> > 2.39.3
> > 
> 
> 

For the series:

Against 
Kernel 6.16.0-rc4-dirty on an x86_64

Difference of about 15 seconds to shutdown compared to almost 60
Same set of test I always run and stable and repeatable

Looks good again, although V9 also looked good until Mike Kelley found
his issues.
 
Tested-by: Laurence Oberman <loberman@redhat.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 2/5] driver core: don't always lock parent in shutdown
  2025-06-25 20:18 ` [PATCH v10 2/5] driver core: don't always lock parent in shutdown Stuart Hayes
@ 2025-07-01  8:50   ` Greg Kroah-Hartman
  2025-07-02 14:38     ` David Jeffery
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-01  8:50 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka,
	Bert Karwatzki

On Wed, Jun 25, 2025 at 03:18:50PM -0500, Stuart Hayes wrote:
> Don't lock a parent device unless it is needed in device_shutdown. This
> is in preparation for making device shutdown asynchronous, when it will
> be needed to allow children of a common parent to shut down
> simultaneously.
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> ---
>  drivers/base/core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index cbc0099d8ef2..58c772785606 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4823,7 +4823,7 @@ void device_shutdown(void)
>  		spin_unlock(&devices_kset->list_lock);
>  
>  		/* hold lock to avoid race with probe/release */
> -		if (parent)
> +		if (parent && dev->bus && dev->bus->need_parent_lock)
>  			device_lock(parent);

What about parents for a device that is not on a bus?  Don't they need
to be properly locked?


>  		device_lock(dev);
>  
> @@ -4847,7 +4847,7 @@ void device_shutdown(void)
>  		}
>  
>  		device_unlock(dev);
> -		if (parent)
> +		if (parent && dev->bus && dev->bus->need_parent_lock)
>  			device_unlock(parent);

Same here.

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 2/5] driver core: don't always lock parent in shutdown
  2025-07-01  8:50   ` Greg Kroah-Hartman
@ 2025-07-02 14:38     ` David Jeffery
  0 siblings, 0 replies; 25+ messages in thread
From: David Jeffery @ 2025-07-02 14:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Stuart Hayes, linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Tue, Jul 1, 2025 at 5:03 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jun 25, 2025 at 03:18:50PM -0500, Stuart Hayes wrote:
> > Don't lock a parent device unless it is needed in device_shutdown. This
> > is in preparation for making device shutdown asynchronous, when it will
> > be needed to allow children of a common parent to shut down
> > simultaneously.
> >
> > Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> > Signed-off-by: David Jeffery <djeffery@redhat.com>
> > ---
> >  drivers/base/core.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index cbc0099d8ef2..58c772785606 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -4823,7 +4823,7 @@ void device_shutdown(void)
> >               spin_unlock(&devices_kset->list_lock);
> >
> >               /* hold lock to avoid race with probe/release */
> > -             if (parent)
> > +             if (parent && dev->bus && dev->bus->need_parent_lock)
> >                       device_lock(parent);
>
> What about parents for a device that is not on a bus?  Don't they need
> to be properly locked?

From my examination of the code and history, I do not believe so.
Locking the parent was added before need_parent_lock was added, and
when the other locations changed to depend on need_parent_lock to lock
both, device_shutdown was left always locking both.

It is simple enough to change the if checks to:

if (parent && (!dev->bus || dev->bus->need_parent_lock))

if you think my understanding is wrong and some bus-less devices have
come to depend on the behavior.

David Jeffery



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
                   ` (5 preceding siblings ...)
  2025-06-30 20:33 ` [PATCH v10 0/5] shut down devices asynchronously Michael Kelley
@ 2025-07-03 11:46 ` Christoph Hellwig
  2025-07-03 15:41   ` Jeremy Allison
                     ` (2 more replies)
  6 siblings, 3 replies; 25+ messages in thread
From: Christoph Hellwig @ 2025-07-03 11:46 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> Address resource and timing issues when spawning a unique async thread
> for every device during shutdown:
>   * Make the asynchronous threads able to shut down multiple devices,
>     instead of spawning a unique thread for every device.
>   * Modify core kernel async code with a custom wake function so it
>     doesn't wake up threads waiting to synchronize every time the cookie
>     changes

Given all these thread spawning issues, why can't we just go back
to the approach that kicks off shutdown asynchronously and then waits
for it without spawning all these threads?



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-03 11:46 ` Christoph Hellwig
@ 2025-07-03 15:41   ` Jeremy Allison
  2025-07-04 13:45     ` David Jeffery
  2025-07-03 15:59   ` stuart hayes
  2025-07-04 13:38   ` David Jeffery
  2 siblings, 1 reply; 25+ messages in thread
From: Jeremy Allison @ 2025-07-03 15:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Stuart Hayes, linux-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, Martin Belanger, Oliver O'Halloran,
	Daniel Wagner, Keith Busch, Lukas Wunner, David Jeffery,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki, sultan, jra

On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
>On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
>> Address resource and timing issues when spawning a unique async thread
>> for every device during shutdown:
>>   * Make the asynchronous threads able to shut down multiple devices,
>>     instead of spawning a unique thread for every device.
>>   * Modify core kernel async code with a custom wake function so it
>>     doesn't wake up threads waiting to synchronize every time the cookie
>>     changes
>
>Given all these thread spawning issues, why can't we just go back
>to the approach that kicks off shutdown asynchronously and then waits
>for it without spawning all these threads?

It isn't just an nvme issue. Red Hat found the same issue
with SCSI devices.

My colleague Sultan Alsawaf posted a simpler fix for the
earlier patch here:

https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html

Maybe this could be explored.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-03 11:46 ` Christoph Hellwig
  2025-07-03 15:41   ` Jeremy Allison
@ 2025-07-03 15:59   ` stuart hayes
  2025-07-04 13:38   ` David Jeffery
  2 siblings, 0 replies; 25+ messages in thread
From: stuart hayes @ 2025-07-03 15:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Sagi Grimberg, linux-nvme, Nathan Chancellor,
	Jan Kiszka, Bert Karwatzki

On 7/3/2025 6:46 AM, Christoph Hellwig wrote:
> On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
>> Address resource and timing issues when spawning a unique async thread
>> for every device during shutdown:
>>    * Make the asynchronous threads able to shut down multiple devices,
>>      instead of spawning a unique thread for every device.
>>    * Modify core kernel async code with a custom wake function so it
>>      doesn't wake up threads waiting to synchronize every time the cookie
>>      changes
> 
> Given all these thread spawning issues, why can't we just go back
> to the approach that kicks off shutdown asynchronously and then waits
> for it without spawning all these threads?
> 

If you mean for drivers to have an extra call like shutdown_start() 
that's called for all devices before waiting for any of them to finish, 
it seems like that would just push the work of spawning a shutdown 
thread onto the individual drivers (unless they just had a single 
command they issue to the device that takes all the time), and 
synchronization might be more of an issue since shutdown_start() could 
be called on a device before that device's children or dependents have 
finished shutting down.

With the async shutdown code from this patch set in place, any driver 
can be fully enabled to shut down devices asynchronously with no work 
other than setting a flag to allow it, and no device will start shutting 
down until its children and dependents have finished.

I think the main issue in the previous version of this patch was just 
that it was taking too long on smaller systems to spawn a thread for 
every single device shutdown, when most or all of those devices were 
shutting down synchronously anyway. This version of the patch solves 
that issue... it should only spawn on the order of roughly 2N threads on 
systems with N async shutdown devices, and it improves the kernel async 
code to make synchronizing to a cookie faster, too.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-03 11:46 ` Christoph Hellwig
  2025-07-03 15:41   ` Jeremy Allison
  2025-07-03 15:59   ` stuart hayes
@ 2025-07-04 13:38   ` David Jeffery
  2025-07-04 13:44     ` Greg Kroah-Hartman
  2 siblings, 1 reply; 25+ messages in thread
From: David Jeffery @ 2025-07-04 13:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Stuart Hayes, linux-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, Martin Belanger, Oliver O'Halloran,
	Daniel Wagner, Keith Busch, Lukas Wunner, Jeremy Allison,
	Jens Axboe, Sagi Grimberg, linux-nvme, Nathan Chancellor,
	Jan Kiszka, Bert Karwatzki

On Thu, Jul 3, 2025 at 7:47 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > Address resource and timing issues when spawning a unique async thread
> > for every device during shutdown:
> >   * Make the asynchronous threads able to shut down multiple devices,
> >     instead of spawning a unique thread for every device.
> >   * Modify core kernel async code with a custom wake function so it
> >     doesn't wake up threads waiting to synchronize every time the cookie
> >     changes
>
> Given all these thread spawning issues, why can't we just go back
> to the approach that kicks off shutdown asynchronously and then waits
> for it without spawning all these threads?
>

The async subsystem fix is something that should be fixed regardless
of async shutdown. Async shutdown's use just exposed its thundering
herd behavior which is easily fixed.

Reducing the threads is just good optimization. Doing every callback
in its own thread adds extra overhead which isn't needed to maintain
ordering and async shutdown gains, and combining sync devices into a
thread where reasonable didn't add that much complexity to the code.

The older non-thread approaches were unpopular with how they still
added plenty of complexity to the shutdown logic while pushing either
ugly splits or their own thread creation down into the individual
shutdown routines of drivers.

David Jeffery



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-04 13:38   ` David Jeffery
@ 2025-07-04 13:44     ` Greg Kroah-Hartman
  2025-07-04 14:09       ` David Jeffery
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-04 13:44 UTC (permalink / raw)
  To: David Jeffery
  Cc: Christoph Hellwig, Stuart Hayes, linux-kernel, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, Jeremy Allison, Jens Axboe,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka,
	Bert Karwatzki

On Fri, Jul 04, 2025 at 09:38:15AM -0400, David Jeffery wrote:
> On Thu, Jul 3, 2025 at 7:47 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > Address resource and timing issues when spawning a unique async thread
> > > for every device during shutdown:
> > >   * Make the asynchronous threads able to shut down multiple devices,
> > >     instead of spawning a unique thread for every device.
> > >   * Modify core kernel async code with a custom wake function so it
> > >     doesn't wake up threads waiting to synchronize every time the cookie
> > >     changes
> >
> > Given all these thread spawning issues, why can't we just go back
> > to the approach that kicks off shutdown asynchronously and then waits
> > for it without spawning all these threads?
> >
> 
> The async subsystem fix is something that should be fixed regardless
> of async shutdown. Async shutdown's use just exposed its thundering
> herd behavior which is easily fixed.

Great, then please submit that on its own and get the maintainer of that
subsystem to agree and accept it as I have no way to judge that code at
all.

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-03 15:41   ` Jeremy Allison
@ 2025-07-04 13:45     ` David Jeffery
  2025-07-04 16:26       ` Sultan Alsawaf
  0 siblings, 1 reply; 25+ messages in thread
From: David Jeffery @ 2025-07-04 13:45 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Christoph Hellwig, Stuart Hayes, linux-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, Martin Belanger, Oliver O'Halloran,
	Daniel Wagner, Keith Busch, Lukas Wunner, Jeremy Allison,
	Jens Axboe, Sagi Grimberg, linux-nvme, Nathan Chancellor,
	Jan Kiszka, Bert Karwatzki, sultan

On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
>
> On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> >On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> >> Address resource and timing issues when spawning a unique async thread
> >> for every device during shutdown:
> >>   * Make the asynchronous threads able to shut down multiple devices,
> >>     instead of spawning a unique thread for every device.
> >>   * Modify core kernel async code with a custom wake function so it
> >>     doesn't wake up threads waiting to synchronize every time the cookie
> >>     changes
> >
> >Given all these thread spawning issues, why can't we just go back
> >to the approach that kicks off shutdown asynchronously and then waits
> >for it without spawning all these threads?
>
> It isn't just an nvme issue. Red Hat found the same issue
> with SCSI devices.
>
> My colleague Sultan Alsawaf posted a simpler fix for the
> earlier patch here:
>
> https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
>
> Maybe this could be explored.
>

Unfortunately, this approach looks flawed. If I am reading it right,
it assumes async shutdown devices do not have dependencies on sync
shutdown devices. This is not a valid assumption and so violates
dependency ordering.

Maintaining all the dependencies is the core problem and source of the
complexity of the async shutdown patches.

David Jeffery



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-04 13:44     ` Greg Kroah-Hartman
@ 2025-07-04 14:09       ` David Jeffery
  2025-07-04 14:13         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 25+ messages in thread
From: David Jeffery @ 2025-07-04 14:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Christoph Hellwig, Stuart Hayes, linux-kernel, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, Jeremy Allison, Jens Axboe,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka,
	Bert Karwatzki

On Fri, Jul 4, 2025 at 9:45 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Jul 04, 2025 at 09:38:15AM -0400, David Jeffery wrote:
> > On Thu, Jul 3, 2025 at 7:47 AM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > > Address resource and timing issues when spawning a unique async thread
> > > > for every device during shutdown:
> > > >   * Make the asynchronous threads able to shut down multiple devices,
> > > >     instead of spawning a unique thread for every device.
> > > >   * Modify core kernel async code with a custom wake function so it
> > > >     doesn't wake up threads waiting to synchronize every time the cookie
> > > >     changes
> > >
> > > Given all these thread spawning issues, why can't we just go back
> > > to the approach that kicks off shutdown asynchronously and then waits
> > > for it without spawning all these threads?
> > >
> >
> > The async subsystem fix is something that should be fixed regardless
> > of async shutdown. Async shutdown's use just exposed its thundering
> > herd behavior which is easily fixed.
>
> Great, then please submit that on its own and get the maintainer of that
> subsystem to agree and accept it as I have no way to judge that code at
> all.

Unfortunately, it does not have a maintainer listed and sees limited
activity. Would CC-ing some of the recent developers of kernel/async.c
to ask them to review be recommended in this situation?

David Jeffery



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-04 14:09       ` David Jeffery
@ 2025-07-04 14:13         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 25+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-04 14:13 UTC (permalink / raw)
  To: David Jeffery
  Cc: Christoph Hellwig, Stuart Hayes, linux-kernel, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, Jeremy Allison, Jens Axboe,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka,
	Bert Karwatzki

On Fri, Jul 04, 2025 at 10:09:01AM -0400, David Jeffery wrote:
> On Fri, Jul 4, 2025 at 9:45 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Fri, Jul 04, 2025 at 09:38:15AM -0400, David Jeffery wrote:
> > > On Thu, Jul 3, 2025 at 7:47 AM Christoph Hellwig <hch@lst.de> wrote:
> > > >
> > > > On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > > > Address resource and timing issues when spawning a unique async thread
> > > > > for every device during shutdown:
> > > > >   * Make the asynchronous threads able to shut down multiple devices,
> > > > >     instead of spawning a unique thread for every device.
> > > > >   * Modify core kernel async code with a custom wake function so it
> > > > >     doesn't wake up threads waiting to synchronize every time the cookie
> > > > >     changes
> > > >
> > > > Given all these thread spawning issues, why can't we just go back
> > > > to the approach that kicks off shutdown asynchronously and then waits
> > > > for it without spawning all these threads?
> > > >
> > >
> > > The async subsystem fix is something that should be fixed regardless
> > > of async shutdown. Async shutdown's use just exposed its thundering
> > > herd behavior which is easily fixed.
> >
> > Great, then please submit that on its own and get the maintainer of that
> > subsystem to agree and accept it as I have no way to judge that code at
> > all.
> 
> Unfortunately, it does not have a maintainer listed and sees limited
> activity. Would CC-ing some of the recent developers of kernel/async.c
> to ask them to review be recommended in this situation?

Like any other piece of the kernel, yes :)



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-04 13:45     ` David Jeffery
@ 2025-07-04 16:26       ` Sultan Alsawaf
  2025-07-07 15:34         ` David Jeffery
  0 siblings, 1 reply; 25+ messages in thread
From: Sultan Alsawaf @ 2025-07-04 16:26 UTC (permalink / raw)
  To: David Jeffery
  Cc: Jeremy Allison, Christoph Hellwig, Stuart Hayes, linux-kernel,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
> On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
> >
> > On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> > >On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > >> Address resource and timing issues when spawning a unique async thread
> > >> for every device during shutdown:
> > >>   * Make the asynchronous threads able to shut down multiple devices,
> > >>     instead of spawning a unique thread for every device.
> > >>   * Modify core kernel async code with a custom wake function so it
> > >>     doesn't wake up threads waiting to synchronize every time the cookie
> > >>     changes
> > >
> > >Given all these thread spawning issues, why can't we just go back
> > >to the approach that kicks off shutdown asynchronously and then waits
> > >for it without spawning all these threads?
> >
> > It isn't just an nvme issue. Red Hat found the same issue
> > with SCSI devices.
> >
> > My colleague Sultan Alsawaf posted a simpler fix for the
> > earlier patch here:
> >
> > https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
> >
> > Maybe this could be explored.
> >
> 
> Unfortunately, this approach looks flawed. If I am reading it right,
> it assumes async shutdown devices do not have dependencies on sync
> shutdown devices.

It does not make any such assumption. Dependency on a sync device is handled
through a combination of queue_device_async_shutdown() setting an async device's
shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
for a sync device when it encounters a sync device that has a downstream async
dependency.

> Maintaining all the dependencies is the core problem and source of the
> complexity of the async shutdown patches.

I am acutely aware. Please take a closer look at my patch.

Sultan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-04 16:26       ` Sultan Alsawaf
@ 2025-07-07 15:34         ` David Jeffery
  2025-07-07 20:49           ` stuart hayes
  0 siblings, 1 reply; 25+ messages in thread
From: David Jeffery @ 2025-07-07 15:34 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Jeremy Allison, Christoph Hellwig, Stuart Hayes, linux-kernel,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Fri, Jul 4, 2025 at 12:26 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
> > On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
> > >
> > > On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> > > >On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > >> Address resource and timing issues when spawning a unique async thread
> > > >> for every device during shutdown:
> > > >>   * Make the asynchronous threads able to shut down multiple devices,
> > > >>     instead of spawning a unique thread for every device.
> > > >>   * Modify core kernel async code with a custom wake function so it
> > > >>     doesn't wake up threads waiting to synchronize every time the cookie
> > > >>     changes
> > > >
> > > >Given all these thread spawning issues, why can't we just go back
> > > >to the approach that kicks off shutdown asynchronously and then waits
> > > >for it without spawning all these threads?
> > >
> > > It isn't just an nvme issue. Red Hat found the same issue
> > > with SCSI devices.
> > >
> > > My colleague Sultan Alsawaf posted a simpler fix for the
> > > earlier patch here:
> > >
> > > https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
> > >
> > > Maybe this could be explored.
> > >
> >
> > Unfortunately, this approach looks flawed. If I am reading it right,
> > it assumes async shutdown devices do not have dependencies on sync
> > shutdown devices.
>
> It does not make any such assumption. Dependency on a sync device is handled
> through a combination of queue_device_async_shutdown() setting an async device's
> shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
> for a sync device when it encounters a sync device that has a downstream async
> dependency.
>

Yes, but not what I think fails. This handles a sync parent having an
async child. It does not handle the reverse, a sync child having an
async parent.

For example, take a system with 1 pci nvme device. The nvme device
which is flagged for async shutdown can have sync shutdown children as
well as a sync shutdown parent. The patch linked pulls the async
device off the shutdown list into a separate async list, then starts
this lone async device with queue_device_async_shutdown from being on
the async list. The device then is passed to the async subsystem
running shutdown_one_device_async where it will immediately do
shutdown due to a zero value shutdown_after. The patch sets
shutdown_after for its parent, but there is nothing connecting and
ordering the async device to its sync children which will be shutdown
later from the original device_shutdown task.

> > Maintaining all the dependencies is the core problem and source of the
> > complexity of the async shutdown patches.
>
> I am acutely aware. Please take a closer look at my patch.
>

I have, and it still looks incomplete to me.

David Jeffery



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-07 15:34         ` David Jeffery
@ 2025-07-07 20:49           ` stuart hayes
  2025-07-08  0:00             ` Sultan Alsawaf
  2025-07-08 21:31             ` Sultan Alsawaf
  0 siblings, 2 replies; 25+ messages in thread
From: stuart hayes @ 2025-07-07 20:49 UTC (permalink / raw)
  To: David Jeffery, Sultan Alsawaf
  Cc: Jeremy Allison, Christoph Hellwig, linux-kernel,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On 7/7/2025 10:34 AM, David Jeffery wrote:
> On Fri, Jul 4, 2025 at 12:26 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>>
>> On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
>>> On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
>>>>
>>>> On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
>>>>> On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
>>>>>> Address resource and timing issues when spawning a unique async thread
>>>>>> for every device during shutdown:
>>>>>>    * Make the asynchronous threads able to shut down multiple devices,
>>>>>>      instead of spawning a unique thread for every device.
>>>>>>    * Modify core kernel async code with a custom wake function so it
>>>>>>      doesn't wake up threads waiting to synchronize every time the cookie
>>>>>>      changes
>>>>>
>>>>> Given all these thread spawning issues, why can't we just go back
>>>>> to the approach that kicks off shutdown asynchronously and then waits
>>>>> for it without spawning all these threads?
>>>>
>>>> It isn't just an nvme issue. Red Hat found the same issue
>>>> with SCSI devices.
>>>>
>>>> My colleague Sultan Alsawaf posted a simpler fix for the
>>>> earlier patch here:
>>>>
>>>> https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
>>>>
>>>> Maybe this could be explored.
>>>>
>>>
>>> Unfortunately, this approach looks flawed. If I am reading it right,
>>> it assumes async shutdown devices do not have dependencies on sync
>>> shutdown devices.
>>
>> It does not make any such assumption. Dependency on a sync device is handled
>> through a combination of queue_device_async_shutdown() setting an async device's
>> shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
>> for a sync device when it encounters a sync device that has a downstream async
>> dependency.
>>
> 
> Yes, but not what I think fails. This handles a sync parent having an
> async child. It does not handle the reverse, a sync child having an
> async parent.
> 
> For example, take a system with 1 pci nvme device. The nvme device
> which is flagged for async shutdown can have sync shutdown children as
> well as a sync shutdown parent. The patch linked pulls the async
> device off the shutdown list into a separate async list, then starts
> this lone async device with queue_device_async_shutdown from being on
> the async list. The device then is passed to the async subsystem
> running shutdown_one_device_async where it will immediately do
> shutdown due to a zero value shutdown_after. The patch sets
> shutdown_after for its parent, but there is nothing connecting and
> ordering the async device to its sync children which will be shutdown
> later from the original device_shutdown task.
> 
>>> Maintaining all the dependencies is the core problem and source of the
>>> complexity of the async shutdown patches.
>>
>> I am acutely aware. Please take a closer look at my patch.
>>
> 
> I have, and it still looks incomplete to me.
> 
> David Jeffery
> 

Also, the way it is implemented, it could hang if there isn't enough 
memory to queue up all of the async device shutdowns before starting the 
synchronous shutdowns.

When you call async_schedule_domain() and there's not enough memory to 
allocate an async_entry, the async function will be run immediately. If 
that happens when queuing up the async shutdowns, it could easily hang 
if there if there are any dependencies requiring an async device 
shutdown to have to wait for a synchronous device to shutdown, since 
none of the synchronous shutdown devices have been scheduled yet.

Before your patch, all device shutdowns are scheduled in order such that 
if the call to async_schedule_domain() runs the function immediately, it 
will still be able to complete, it just won't get the benefit of 
multiple threads shutting down devices concurrently. The V10 patch 
maintains that--it just lets one thread shut down multiple synchronous 
devices instead of creating one thread for each synchronous device shutdown.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-07 20:49           ` stuart hayes
@ 2025-07-08  0:00             ` Sultan Alsawaf
  2025-07-08 21:47               ` Sultan Alsawaf
  2025-07-08 21:31             ` Sultan Alsawaf
  1 sibling, 1 reply; 25+ messages in thread
From: Sultan Alsawaf @ 2025-07-08  0:00 UTC (permalink / raw)
  To: stuart hayes, David Jeffery
  Cc: Jeremy Allison, Christoph Hellwig, linux-kernel,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Mon, Jul 07, 2025 at 03:49:44PM -0500, stuart hayes wrote:
> On 7/7/2025 10:34 AM, David Jeffery wrote:
> > On Fri, Jul 4, 2025 at 12:26 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > 
> > > On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
> > > > On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
> > > > > 
> > > > > On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> > > > > > On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > > > > > Address resource and timing issues when spawning a unique async thread
> > > > > > > for every device during shutdown:
> > > > > > >    * Make the asynchronous threads able to shut down multiple devices,
> > > > > > >      instead of spawning a unique thread for every device.
> > > > > > >    * Modify core kernel async code with a custom wake function so it
> > > > > > >      doesn't wake up threads waiting to synchronize every time the cookie
> > > > > > >      changes
> > > > > > 
> > > > > > Given all these thread spawning issues, why can't we just go back
> > > > > > to the approach that kicks off shutdown asynchronously and then waits
> > > > > > for it without spawning all these threads?
> > > > > 
> > > > > It isn't just an nvme issue. Red Hat found the same issue
> > > > > with SCSI devices.
> > > > > 
> > > > > My colleague Sultan Alsawaf posted a simpler fix for the
> > > > > earlier patch here:
> > > > > 
> > > > > https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
> > > > > 
> > > > > Maybe this could be explored.
> > > > > 
> > > > 
> > > > Unfortunately, this approach looks flawed. If I am reading it right,
> > > > it assumes async shutdown devices do not have dependencies on sync
> > > > shutdown devices.
> > > 
> > > It does not make any such assumption. Dependency on a sync device is handled
> > > through a combination of queue_device_async_shutdown() setting an async device's
> > > shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
> > > for a sync device when it encounters a sync device that has a downstream async
> > > dependency.
> > > 
> > 
> > Yes, but not what I think fails. This handles a sync parent having an
> > async child. It does not handle the reverse, a sync child having an
> > async parent.
> > 
> > For example, take a system with 1 pci nvme device. The nvme device
> > which is flagged for async shutdown can have sync shutdown children as
> > well as a sync shutdown parent. The patch linked pulls the async
> > device off the shutdown list into a separate async list, then starts
> > this lone async device with queue_device_async_shutdown from being on
> > the async list. The device then is passed to the async subsystem
> > running shutdown_one_device_async where it will immediately do
> > shutdown due to a zero value shutdown_after. The patch sets
> > shutdown_after for its parent, but there is nothing connecting and
> > ordering the async device to its sync children which will be shutdown
> > later from the original device_shutdown task.
> > 
> > > > Maintaining all the dependencies is the core problem and source of the
> > > > complexity of the async shutdown patches.
> > > 
> > > I am acutely aware. Please take a closer look at my patch.
> > > 
> > 
> > I have, and it still looks incomplete to me.
> > 
> > David Jeffery
> > 
> 
> Also, the way it is implemented, it could hang if there isn't enough memory
> to queue up all of the async device shutdowns before starting the
> synchronous shutdowns.
> 
> When you call async_schedule_domain() and there's not enough memory to
> allocate an async_entry, the async function will be run immediately. If that
> happens when queuing up the async shutdowns, it could easily hang if there
> if there are any dependencies requiring an async device shutdown to have to
> wait for a synchronous device to shutdown, since none of the synchronous
> shutdown devices have been scheduled yet.

Understood. Thank you both for the clarifications.

Regarding an async device with a sync child: this case strikes me as odd. What
exactly makes the child device require "synchronous" shutdown? Synchronous
relative to what, specifically?

This also makes me question what, exactly, the criteria are for determining that
a device is safe to shut down "async". I think that all children of an async
shutdown device should be enrolled into async shutdown to isolate the entire
async dependency chain. That way, the async shutdown devices don't need to wait
for synchronous shutdown of unrelated devices. I'm happy to do this in a v3.

Regarding async_schedule_domain() running synchronously when async_entry
allocation fails: this is a bothersome constraint of the async helpers. Although
there is a lot of overlap between the async helpers and the requirements for
async device shutdown, there are other annoying constraints that make the async
helpers unfit as-is for async device shutdown (like the arbitrary MAX_WORK
limit).

The async helpers also don't do exclusive wakes, which leads to the behavior you
observed in "[PATCH v10 1/5] kernel/async: streamline cookie synchronization".
We could use exclusive wakes by isolating async shutdown device dependency
chains and creating a different async_domain for each chain, which is faster
than calling TTWU on all async waiters and filtering out the wakeups ad hoc.

These points make me think that either the async helpers should be made far more
generic or the driver core code should have its own async infrastructure.

> Before your patch, all device shutdowns are scheduled in order such that if
> the call to async_schedule_domain() runs the function immediately, it will
> still be able to complete, it just won't get the benefit of multiple threads
> shutting down devices concurrently. The V10 patch maintains that--it just
> lets one thread shut down multiple synchronous devices instead of creating
> one thread for each synchronous device shutdown.

This still dedicates threads to shutting down synchronous devices that don't
share a dependency chain with an async shutdown device. This shouldn't be
necessary.

(PS: fixed CC of Jan Kiszka <jan.kiszka@siemens.com>, who appears to have been
CC'd with a typo'd email address for at least the entire v9 patchset and this
v10 patchset. The CC list had siemens.com misspelled as seimens.com)

Sultan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-07 20:49           ` stuart hayes
  2025-07-08  0:00             ` Sultan Alsawaf
@ 2025-07-08 21:31             ` Sultan Alsawaf
  1 sibling, 0 replies; 25+ messages in thread
From: Sultan Alsawaf @ 2025-07-08 21:31 UTC (permalink / raw)
  To: stuart hayes
  Cc: David Jeffery, Jeremy Allison, Christoph Hellwig, linux-kernel,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Mon, Jul 07, 2025 at 03:49:44PM -0500, stuart hayes wrote:
> On 7/7/2025 10:34 AM, David Jeffery wrote:
> > On Fri, Jul 4, 2025 at 12:26 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > 
> > > On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
> > > > On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
> > > > > 
> > > > > On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> > > > > > On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > > > > > Address resource and timing issues when spawning a unique async thread
> > > > > > > for every device during shutdown:
> > > > > > >    * Make the asynchronous threads able to shut down multiple devices,
> > > > > > >      instead of spawning a unique thread for every device.
> > > > > > >    * Modify core kernel async code with a custom wake function so it
> > > > > > >      doesn't wake up threads waiting to synchronize every time the cookie
> > > > > > >      changes
> > > > > > 
> > > > > > Given all these thread spawning issues, why can't we just go back
> > > > > > to the approach that kicks off shutdown asynchronously and then waits
> > > > > > for it without spawning all these threads?
> > > > > 
> > > > > It isn't just an nvme issue. Red Hat found the same issue
> > > > > with SCSI devices.
> > > > > 
> > > > > My colleague Sultan Alsawaf posted a simpler fix for the
> > > > > earlier patch here:
> > > > > 
> > > > > https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
> > > > > 
> > > > > Maybe this could be explored.
> > > > > 
> > > > 
> > > > Unfortunately, this approach looks flawed. If I am reading it right,
> > > > it assumes async shutdown devices do not have dependencies on sync
> > > > shutdown devices.
> > > 
> > > It does not make any such assumption. Dependency on a sync device is handled
> > > through a combination of queue_device_async_shutdown() setting an async device's
> > > shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
> > > for a sync device when it encounters a sync device that has a downstream async
> > > dependency.
> > > 
> > 
> > Yes, but not what I think fails. This handles a sync parent having an
> > async child. It does not handle the reverse, a sync child having an
> > async parent.
> > 
> > For example, take a system with 1 pci nvme device. The nvme device
> > which is flagged for async shutdown can have sync shutdown children as
> > well as a sync shutdown parent. The patch linked pulls the async
> > device off the shutdown list into a separate async list, then starts
> > this lone async device with queue_device_async_shutdown from being on
> > the async list. The device then is passed to the async subsystem
> > running shutdown_one_device_async where it will immediately do
> > shutdown due to a zero value shutdown_after. The patch sets
> > shutdown_after for its parent, but there is nothing connecting and
> > ordering the async device to its sync children which will be shutdown
> > later from the original device_shutdown task.
> > 
> > > > Maintaining all the dependencies is the core problem and source of the
> > > > complexity of the async shutdown patches.
> > > 
> > > I am acutely aware. Please take a closer look at my patch.
> > > 
> > 
> > I have, and it still looks incomplete to me.
> > 
> > David Jeffery
> > 
> 
> Also, the way it is implemented, it could hang if there isn't enough memory
> to queue up all of the async device shutdowns before starting the
> synchronous shutdowns.
> 
> When you call async_schedule_domain() and there's not enough memory to
> allocate an async_entry, the async function will be run immediately. If that
> happens when queuing up the async shutdowns, it could easily hang if there
> if there are any dependencies requiring an async device shutdown to have to
> wait for a synchronous device to shutdown, since none of the synchronous
> shutdown devices have been scheduled yet.

FWIW, I did consider this scenario when I wrote my patch and this hang cannot
occur with the way I implemented async shutdown. The reason is because my patch
assumes that async devices never need to wait on synchronous devices to shut
down, as David pointed out. My patch only assumes that synchronous devices need
to wait for async devices, so any allocation failures in async_schedule_domain()
won't cause the hang you described.

Sultan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 0/5] shut down devices asynchronously
  2025-07-08  0:00             ` Sultan Alsawaf
@ 2025-07-08 21:47               ` Sultan Alsawaf
  0 siblings, 0 replies; 25+ messages in thread
From: Sultan Alsawaf @ 2025-07-08 21:47 UTC (permalink / raw)
  To: stuart hayes, David Jeffery
  Cc: Jeremy Allison, Christoph Hellwig, linux-kernel,
	Greg Kroah-Hartman, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	Jeremy Allison, Jens Axboe, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Mon, Jul 07, 2025 at 05:00:34PM -0700, Sultan Alsawaf wrote:
> On Mon, Jul 07, 2025 at 03:49:44PM -0500, stuart hayes wrote:
> > On 7/7/2025 10:34 AM, David Jeffery wrote:
> > > On Fri, Jul 4, 2025 at 12:26 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > > 
> > > > On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
> > > > > On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@samba.org> wrote:
> > > > > > 
> > > > > > On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> > > > > > > On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > > > > > > Address resource and timing issues when spawning a unique async thread
> > > > > > > > for every device during shutdown:
> > > > > > > >    * Make the asynchronous threads able to shut down multiple devices,
> > > > > > > >      instead of spawning a unique thread for every device.
> > > > > > > >    * Modify core kernel async code with a custom wake function so it
> > > > > > > >      doesn't wake up threads waiting to synchronize every time the cookie
> > > > > > > >      changes
> > > > > > > 
> > > > > > > Given all these thread spawning issues, why can't we just go back
> > > > > > > to the approach that kicks off shutdown asynchronously and then waits
> > > > > > > for it without spawning all these threads?
> > > > > > 
> > > > > > It isn't just an nvme issue. Red Hat found the same issue
> > > > > > with SCSI devices.
> > > > > > 
> > > > > > My colleague Sultan Alsawaf posted a simpler fix for the
> > > > > > earlier patch here:
> > > > > > 
> > > > > > https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
> > > > > > 
> > > > > > Maybe this could be explored.
> > > > > > 
> > > > > 
> > > > > Unfortunately, this approach looks flawed. If I am reading it right,
> > > > > it assumes async shutdown devices do not have dependencies on sync
> > > > > shutdown devices.
> > > > 
> > > > It does not make any such assumption. Dependency on a sync device is handled
> > > > through a combination of queue_device_async_shutdown() setting an async device's
> > > > shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
> > > > for a sync device when it encounters a sync device that has a downstream async
> > > > dependency.
> > > > 
> > > 
> > > Yes, but not what I think fails. This handles a sync parent having an
> > > async child. It does not handle the reverse, a sync child having an
> > > async parent.
> > > 
> > > For example, take a system with 1 pci nvme device. The nvme device
> > > which is flagged for async shutdown can have sync shutdown children as
> > > well as a sync shutdown parent. The patch linked pulls the async
> > > device off the shutdown list into a separate async list, then starts
> > > this lone async device with queue_device_async_shutdown from being on
> > > the async list. The device then is passed to the async subsystem
> > > running shutdown_one_device_async where it will immediately do
> > > shutdown due to a zero value shutdown_after. The patch sets
> > > shutdown_after for its parent, but there is nothing connecting and
> > > ordering the async device to its sync children which will be shutdown
> > > later from the original device_shutdown task.
> > > 
> > > > > Maintaining all the dependencies is the core problem and source of the
> > > > > complexity of the async shutdown patches.
> > > > 
> > > > I am acutely aware. Please take a closer look at my patch.
> > > > 
> > > 
> > > I have, and it still looks incomplete to me.
> > > 
> > > David Jeffery
> > > 
> > 
> > Also, the way it is implemented, it could hang if there isn't enough memory
> > to queue up all of the async device shutdowns before starting the
> > synchronous shutdowns.
> > 
> > When you call async_schedule_domain() and there's not enough memory to
> > allocate an async_entry, the async function will be run immediately. If that
> > happens when queuing up the async shutdowns, it could easily hang if there
> > if there are any dependencies requiring an async device shutdown to have to
> > wait for a synchronous device to shutdown, since none of the synchronous
> > shutdown devices have been scheduled yet.
> 
> Understood. Thank you both for the clarifications.
> 
> Regarding an async device with a sync child: this case strikes me as odd. What
> exactly makes the child device require "synchronous" shutdown? Synchronous
> relative to what, specifically?
> 
> This also makes me question what, exactly, the criteria are for determining that
> a device is safe to shut down "async". I think that all children of an async
> shutdown device should be enrolled into async shutdown to isolate the entire
> async dependency chain. That way, the async shutdown devices don't need to wait
> for synchronous shutdown of unrelated devices. I'm happy to do this in a v3.
> 
> Regarding async_schedule_domain() running synchronously when async_entry
> allocation fails: this is a bothersome constraint of the async helpers. Although
> there is a lot of overlap between the async helpers and the requirements for
> async device shutdown, there are other annoying constraints that make the async
> helpers unfit as-is for async device shutdown (like the arbitrary MAX_WORK
> limit).
> 
> The async helpers also don't do exclusive wakes, which leads to the behavior you
> observed in "[PATCH v10 1/5] kernel/async: streamline cookie synchronization".
> We could use exclusive wakes by isolating async shutdown device dependency
> chains and creating a different async_domain for each chain, which is faster
> than calling TTWU on all async waiters and filtering out the wakeups ad hoc.

Correcting myself here: the custom wake function filters out wakeups _before_
TTWU, so what I said in this paragraph is incorrect and therefore exclusive
wakes aren't necessary or helpful. My apologies.

Sultan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 1/5] kernel/async: streamline cookie synchronization
  2025-06-25 20:18 ` [PATCH v10 1/5] kernel/async: streamline cookie synchronization Stuart Hayes
@ 2025-07-08 22:17   ` Sultan Alsawaf
  0 siblings, 0 replies; 25+ messages in thread
From: Sultan Alsawaf @ 2025-07-08 22:17 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka, Bert Karwatzki

On Wed, Jun 25, 2025 at 03:18:49PM -0500, Stuart Hayes wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> To prevent a thundering herd effect, implement a custom wake function for
> the async shubsystem which will only wake waiters which have all their
> dependencies completed.
> 
> The async subsystem currently wakes all waiters on async_done when an async
> task completes. When there are many tasks trying to synchronize on differnt
> async values, this can create a thundering herd problem when an async task
> wakes up all waiters, most of whom go back to waiting after causing
> lock contention and wasting CPU.
> 
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> ---
>  kernel/async.c | 42 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/async.c b/kernel/async.c
> index 4c3e6a44595f..ae327f29bac9 100644
> --- a/kernel/async.c
> +++ b/kernel/async.c
> @@ -76,6 +76,12 @@ struct async_entry {
>  	struct async_domain	*domain;
>  };
>  
> +struct async_wait_entry {
> +	wait_queue_entry_t wait;
> +	async_cookie_t cookie;
> +	struct async_domain *domain;
> +};
> +
>  static DECLARE_WAIT_QUEUE_HEAD(async_done);
>  
>  static atomic_t entry_count;
> @@ -298,6 +304,24 @@ void async_synchronize_full_domain(struct async_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
>  
> +/**
> + * async_domain_wake_function - wait function for cooking synchronization
> + *
> + * Custom wait function for async_synchronize_cookie_domain to check cookie
> + * value.  This prevents waking up waiting threads unnecessarily.
> + */
> +static int async_domain_wake_function(struct wait_queue_entry *wait,
> +				      unsigned int mode, int sync, void *key)
> +{
> +	struct async_wait_entry *await =
> +		container_of(wait, struct async_wait_entry, wait);
> +
> +	if (lowest_in_progress(await->domain) < await->cookie)
> +		return 0;
> +
> +	return autoremove_wake_function(wait, mode, sync, key);
> +}
> +
>  /**
>   * async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
>   * @cookie: async_cookie_t to use as checkpoint
> @@ -310,11 +334,27 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
>  void async_synchronize_cookie_domain(async_cookie_t cookie, struct async_domain *domain)
>  {
>  	ktime_t starttime;
> +	struct async_wait_entry await = {
> +		.cookie = cookie,
> +		.domain = domain,
> +		.wait = {
> +			.func = async_domain_wake_function,
> +			.private = current,
> +			.flags = 0,
> +			.entry = LIST_HEAD_INIT(await.wait.entry),
> +		}};
>  
>  	pr_debug("async_waiting @ %i\n", task_pid_nr(current));
>  	starttime = ktime_get();
>  
> -	wait_event(async_done, lowest_in_progress(domain) >= cookie);
> +	for (;;) {
> +		prepare_to_wait(&async_done, &await.wait, TASK_UNINTERRUPTIBLE);
> +
> +		if (lowest_in_progress(domain) >= cookie)

This line introduces a bug on PREEMPT_RT because lowest_in_progress() may sleep
on PREEMPT_RT. If it does sleep, it'll corrupt the current task's state by
setting it to TASK_RUNNING after the sleep is over. IOW, the current task's
state might be TASK_RUNNING after lowest_in_progress() returns.

lowest_in_progress() may sleep on PREEMPT_RT because it locks a non-raw spin
lock (async_lock).

> +			break;
> +		schedule();
> +	}
> +	finish_wait(&async_done, &await.wait);
>  
>  	pr_debug("async_continuing @ %i after %lli usec\n", task_pid_nr(current),
>  		 microseconds_since(starttime));
> -- 
> 2.39.3
> 

Sultan


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-07-08 22:17 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-25 20:18 [PATCH v10 0/5] shut down devices asynchronously Stuart Hayes
2025-06-25 20:18 ` [PATCH v10 1/5] kernel/async: streamline cookie synchronization Stuart Hayes
2025-07-08 22:17   ` Sultan Alsawaf
2025-06-25 20:18 ` [PATCH v10 2/5] driver core: don't always lock parent in shutdown Stuart Hayes
2025-07-01  8:50   ` Greg Kroah-Hartman
2025-07-02 14:38     ` David Jeffery
2025-06-25 20:18 ` [PATCH v10 3/5] driver core: separate function to shutdown one device Stuart Hayes
2025-06-25 20:18 ` [PATCH v10 4/5] driver core: shut down devices asynchronously Stuart Hayes
2025-06-25 20:18 ` [PATCH v10 5/5] nvme-pci: Make driver prefer asynchronous shutdown Stuart Hayes
2025-06-30 20:33 ` [PATCH v10 0/5] shut down devices asynchronously Michael Kelley
2025-06-30 22:02   ` Laurence Oberman
2025-07-03 11:46 ` Christoph Hellwig
2025-07-03 15:41   ` Jeremy Allison
2025-07-04 13:45     ` David Jeffery
2025-07-04 16:26       ` Sultan Alsawaf
2025-07-07 15:34         ` David Jeffery
2025-07-07 20:49           ` stuart hayes
2025-07-08  0:00             ` Sultan Alsawaf
2025-07-08 21:47               ` Sultan Alsawaf
2025-07-08 21:31             ` Sultan Alsawaf
2025-07-03 15:59   ` stuart hayes
2025-07-04 13:38   ` David Jeffery
2025-07-04 13:44     ` Greg Kroah-Hartman
2025-07-04 14:09       ` David Jeffery
2025-07-04 14:13         ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).