* [PATCH 1/3] driver core: Introduce helper function __device_attach_driver_scan()
2026-01-07 17:55 [PATCH 0/3] Add NUMA-node-aware synchronous probing to driver core Jinhui Guo
@ 2026-01-07 17:55 ` Jinhui Guo
2026-01-17 13:36 ` Danilo Krummrich
2026-01-07 17:55 ` [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path Jinhui Guo
2026-01-07 17:55 ` [PATCH 3/3] PCI: Clean up NUMA-node awareness in pci_bus_type probe Jinhui Guo
2 siblings, 1 reply; 9+ messages in thread
From: Jinhui Guo @ 2026-01-07 17:55 UTC (permalink / raw)
To: dakr, alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, helgaas, rafael, tj
Cc: guojinhui.liam, linux-kernel, linux-pci
Introduce a helper to eliminate duplication between
__device_attach() and __device_attach_async_helper();
a later patch will reuse it to add NUMA-node awareness
to the synchronous probe path in __device_attach().
No functional changes.
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
---
drivers/base/dd.c | 71 ++++++++++++++++++++++++++---------------------
1 file changed, 40 insertions(+), 31 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 349f31bedfa1..896f98add97d 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -962,6 +962,44 @@ static int __device_attach_driver(struct device_driver *drv, void *_data)
return ret == 0;
}
+static int __device_attach_driver_scan(struct device_attach_data *data,
+ bool *need_async)
+{
+ int ret = 0;
+ struct device *dev = data->dev;
+
+ if (dev->parent)
+ pm_runtime_get_sync(dev->parent);
+
+ ret = bus_for_each_drv(dev->bus, NULL, data,
+ __device_attach_driver);
+ /*
+ * Callers already running in an async worker pass a NULL
+ * need_async, since no further async probe needs to be scheduled.
+ */
+ if (need_async && !ret && data->check_async && data->have_async) {
+ /*
+ * If we could not find appropriate driver
+ * synchronously and we are allowed to do
+ * async probes and there are drivers that
+ * want to probe asynchronously, we'll
+ * try them.
+ */
+ dev_dbg(dev, "scheduling asynchronous probe\n");
+ get_device(dev);
+ *need_async = true;
+ } else {
+ if (!need_async)
+ dev_dbg(dev, "async probe completed\n");
+ pm_request_idle(dev);
+ }
+
+ if (dev->parent)
+ pm_runtime_put(dev->parent);
+
+ return ret;
+}
+
static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
{
struct device *dev = _dev;
@@ -982,16 +1020,8 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
if (dev->p->dead || dev->driver)
goto out_unlock;
- if (dev->parent)
- pm_runtime_get_sync(dev->parent);
+ __device_attach_driver_scan(&data, NULL);
- bus_for_each_drv(dev->bus, NULL, &data, __device_attach_driver);
- dev_dbg(dev, "async probe completed\n");
-
- pm_request_idle(dev);
-
- if (dev->parent)
- pm_runtime_put(dev->parent);
out_unlock:
device_unlock(dev);
@@ -1025,28 +1055,7 @@ static int __device_attach(struct device *dev, bool allow_async)
.want_async = false,
};
- if (dev->parent)
- pm_runtime_get_sync(dev->parent);
-
- ret = bus_for_each_drv(dev->bus, NULL, &data,
- __device_attach_driver);
- if (!ret && allow_async && data.have_async) {
- /*
- * If we could not find appropriate driver
- * synchronously and we are allowed to do
- * async probes and there are drivers that
- * want to probe asynchronously, we'll
- * try them.
- */
- dev_dbg(dev, "scheduling asynchronous probe\n");
- get_device(dev);
- async = true;
- } else {
- pm_request_idle(dev);
- }
-
- if (dev->parent)
- pm_runtime_put(dev->parent);
+ ret = __device_attach_driver_scan(&data, &async);
}
out_unlock:
device_unlock(dev);
--
2.20.1
^ permalink raw reply related [flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] driver core: Introduce helper function __device_attach_driver_scan()
2026-01-07 17:55 ` [PATCH 1/3] driver core: Introduce helper function __device_attach_driver_scan() Jinhui Guo
@ 2026-01-17 13:36 ` Danilo Krummrich
0 siblings, 0 replies; 9+ messages in thread
From: Danilo Krummrich @ 2026-01-17 13:36 UTC (permalink / raw)
To: Jinhui Guo
Cc: alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, helgaas, rafael, tj, linux-kernel,
linux-pci
On Wed Jan 7, 2026 at 6:55 PM CET, Jinhui Guo wrote:
> Introduce a helper to eliminate duplication between
> __device_attach() and __device_attach_async_helper();
> a later patch will reuse it to add NUMA-node awareness
> to the synchronous probe path in __device_attach().
>
> No functional changes.
>
> Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path
2026-01-07 17:55 [PATCH 0/3] Add NUMA-node-aware synchronous probing to driver core Jinhui Guo
2026-01-07 17:55 ` [PATCH 1/3] driver core: Introduce helper function __device_attach_driver_scan() Jinhui Guo
@ 2026-01-07 17:55 ` Jinhui Guo
2026-01-07 18:22 ` Danilo Krummrich
2026-01-17 14:03 ` Danilo Krummrich
2026-01-07 17:55 ` [PATCH 3/3] PCI: Clean up NUMA-node awareness in pci_bus_type probe Jinhui Guo
2 siblings, 2 replies; 9+ messages in thread
From: Jinhui Guo @ 2026-01-07 17:55 UTC (permalink / raw)
To: dakr, alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, helgaas, rafael, tj
Cc: guojinhui.liam, linux-kernel, linux-pci
Introduce NUMA-node-aware synchronous probing: drivers
can initialize and allocate memory on the device’s local
node without scattering kmalloc_node() calls throughout
the code.
NUMA-aware probing was first added to PCI drivers by
commit d42c69972b85 ("[PATCH] PCI: Run PCI driver
initialization on local node") in 2005 and has benefited
PCI drivers ever since.
The asynchronous probe path already supports NUMA-node-aware
probing via async_schedule_dev() in the driver core. Since
NUMA affinity is orthogonal to sync/async probing, this
patch adds NUMA-node-aware support to the synchronous
probe path.
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
---
drivers/base/dd.c | 104 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 101 insertions(+), 3 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 896f98add97d..e1fb10ae2cc0 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -381,6 +381,92 @@ static void __exit deferred_probe_exit(void)
}
__exitcall(deferred_probe_exit);
+/*
+ * NUMA-node-aware synchronous probing:
+ * drivers can initialize and allocate memory on the device’s local
+ * node without scattering kmalloc_node() calls throughout the code.
+ */
+
+/* Generic function pointer type */
+typedef int (*numa_func_t)(void *arg1, void *arg2);
+
+/* Context for NUMA execution */
+struct numa_work_ctx {
+ struct work_struct work;
+ numa_func_t func;
+ void *arg1;
+ void *arg2;
+ int result;
+};
+
+/* Worker function running on the target node */
+static void numa_work_func(struct work_struct *work)
+{
+ struct numa_work_ctx *ctx = container_of(work, struct numa_work_ctx, work);
+
+ ctx->result = ctx->func(ctx->arg1, ctx->arg2);
+}
+
+/*
+ * __exec_on_numa_node - Execute a function on a specific NUMA node synchronously
+ * @node: Target NUMA node ID
+ * @func: The wrapper function to execute
+ * @arg1: First argument (void *)
+ * @arg2: Second argument (void *)
+ *
+ * Returns the result of the function execution, or -ENODEV if initialization fails.
+ * If the node is invalid or offline, it falls back to local execution.
+ */
+static int __exec_on_numa_node(int node, numa_func_t func, void *arg1, void *arg2)
+{
+ struct numa_work_ctx ctx;
+
+ /* Fallback to local execution if the node is invalid or offline */
+ if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
+ return func(arg1, arg2);
+
+ ctx.func = func;
+ ctx.arg1 = arg1;
+ ctx.arg2 = arg2;
+ ctx.result = -ENODEV;
+ INIT_WORK_ONSTACK(&ctx.work, numa_work_func);
+
+ /* Use system_dfl_wq to allow execution on the specific node. */
+ queue_work_node(node, system_dfl_wq, &ctx.work);
+ flush_work(&ctx.work);
+ destroy_work_on_stack(&ctx.work);
+
+ return ctx.result;
+}
+
+/*
+ * DEFINE_NUMA_WRAPPER - Generate a type-safe wrapper for a function
+ * @func_name: The name of the target function
+ * @type1: The type of the first argument
+ * @type2: The type of the second argument
+ *
+ * This macro generates a static function named __wrapper_<func_name> that
+ * casts void pointers back to their original types and calls the target function.
+ */
+#define DEFINE_NUMA_WRAPPER(func_name, type1, type2) \
+ static int __wrapper_##func_name(void *arg1, void *arg2) \
+ { \
+ return func_name((type1)arg1, (type2)arg2); \
+ }
+
+/*
+ * EXEC_ON_NUMA_NODE - Execute a registered function on a NUMA node
+ * @node: Target NUMA node ID
+ * @func_name: The name of the target function (must be registered via DEFINE_NUMA_WRAPPER)
+ * @arg1: First argument
+ * @arg2: Second argument
+ *
+ * This macro invokes the internal execution helper using the generated wrapper.
+ */
+#define EXEC_ON_NUMA_NODE(node, func_name, arg1, arg2) \
+ __exec_on_numa_node(node, __wrapper_##func_name, \
+ (void *)(arg1), (void *)(arg2))
+
/**
* device_is_bound() - Check if device is bound to a driver
* @dev: device to check
@@ -808,6 +894,8 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
return ret;
}
+DEFINE_NUMA_WRAPPER(__driver_probe_device, const struct device_driver *, struct device *)
+
/**
* driver_probe_device - attempt to bind device & driver together
* @drv: driver to bind a device to
@@ -844,6 +932,8 @@ static int driver_probe_device(const struct device_driver *drv, struct device *d
return ret;
}
+DEFINE_NUMA_WRAPPER(driver_probe_device, const struct device_driver *, struct device *)
+
static inline bool cmdline_requested_async_probing(const char *drv_name)
{
bool async_drv;
@@ -1000,6 +1090,8 @@ static int __device_attach_driver_scan(struct device_attach_data *data,
return ret;
}
+DEFINE_NUMA_WRAPPER(__device_attach_driver_scan, struct device_attach_data *, bool *)
+
static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
{
struct device *dev = _dev;
@@ -1055,7 +1147,9 @@ static int __device_attach(struct device *dev, bool allow_async)
.want_async = false,
};
- ret = __device_attach_driver_scan(&data, &async);
+ ret = EXEC_ON_NUMA_NODE(dev_to_node(dev),
+ __device_attach_driver_scan,
+ &data, &async);
}
out_unlock:
device_unlock(dev);
@@ -1142,7 +1236,9 @@ int device_driver_attach(const struct device_driver *drv, struct device *dev)
int ret;
__device_driver_lock(dev, dev->parent);
- ret = __driver_probe_device(drv, dev);
+ ret = EXEC_ON_NUMA_NODE(dev_to_node(dev),
+ __driver_probe_device,
+ drv, dev);
__device_driver_unlock(dev, dev->parent);
/* also return probe errors as normal negative errnos */
@@ -1231,7 +1327,9 @@ static int __driver_attach(struct device *dev, void *data)
}
__device_driver_lock(dev, dev->parent);
- driver_probe_device(drv, dev);
+ EXEC_ON_NUMA_NODE(dev_to_node(dev),
+ driver_probe_device,
+ drv, dev);
__device_driver_unlock(dev, dev->parent);
return 0;
--
2.20.1
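The DEFINE_NUMA_WRAPPER() macro in this patch works by type erasure: every target function is bridged to a single generic `int (*)(void *, void *)` signature so one dispatcher can run any of them. A minimal userspace sketch of that technique (plain C, hypothetical function names, no kernel APIs) might look like:

```c
#include <assert.h>

/* Generic two-argument function pointer, as in the patch. */
typedef int (*generic_func_t)(void *arg1, void *arg2);

/* Hypothetical typed target function (illustration only). */
static int add_scaled(const int *a, int *b)
{
	return *a + 2 * *b;
}

/*
 * Type-erasing wrapper generator, mirroring DEFINE_NUMA_WRAPPER():
 * the generated __wrapper_<name>() casts the void pointers back to
 * the declared types before calling the target.
 */
#define DEFINE_WRAPPER(func_name, type1, type2)                  \
	static int __wrapper_##func_name(void *arg1, void *arg2) \
	{                                                        \
		return func_name((type1)arg1, (type2)arg2);      \
	}

DEFINE_WRAPPER(add_scaled, const int *, int *)

/* A dispatcher that only understands the erased signature. */
static int run_erased(generic_func_t func, void *arg1, void *arg2)
{
	return func(arg1, arg2);
}
```

In the patch, `__exec_on_numa_node()` plays the role of `run_erased()`, with the extra step of queuing the call to a worker on the target node.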
^ permalink raw reply related [flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path
2026-01-07 17:55 ` [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path Jinhui Guo
@ 2026-01-07 18:22 ` Danilo Krummrich
2026-01-08 8:28 ` Jinhui Guo
2026-01-17 14:03 ` Danilo Krummrich
1 sibling, 1 reply; 9+ messages in thread
From: Danilo Krummrich @ 2026-01-07 18:22 UTC (permalink / raw)
To: Jinhui Guo
Cc: alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, helgaas, rafael, tj, linux-kernel,
linux-pci
On Wed Jan 7, 2026 at 6:55 PM CET, Jinhui Guo wrote:
> + * __exec_on_numa_node - Execute a function on a specific NUMA node synchronously
> + * @node: Target NUMA node ID
> + * @func: The wrapper function to execute
> + * @arg1: First argument (void *)
> + * @arg2: Second argument (void *)
> + *
> + * Returns the result of the function execution, or -ENODEV if initialization fails.
> + * If the node is invalid or offline, it falls back to local execution.
> + */
> +static int __exec_on_numa_node(int node, numa_func_t func, void *arg1, void *arg2)
> +{
> + struct numa_work_ctx ctx;
> +
> + /* Fallback to local execution if the node is invalid or offline */
> + if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
> + return func(arg1, arg2);
Just a quick drive-by comment (I’ll go through it more thoroughly later).
What about the case where we are already on the requested node?
Also, we should probably set the corresponding CPU affinity for the time we are
executing func() to prevent migration.
> +
> + ctx.func = func;
> + ctx.arg1 = arg1;
> + ctx.arg2 = arg2;
> + ctx.result = -ENODEV;
> + INIT_WORK_ONSTACK(&ctx.work, numa_work_func);
> +
> + /* Use system_dfl_wq to allow execution on the specific node. */
> + queue_work_node(node, system_dfl_wq, &ctx.work);
> + flush_work(&ctx.work);
> + destroy_work_on_stack(&ctx.work);
> +
> + return ctx.result;
> +}
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path
2026-01-07 18:22 ` Danilo Krummrich
@ 2026-01-08 8:28 ` Jinhui Guo
0 siblings, 0 replies; 9+ messages in thread
From: Jinhui Guo @ 2026-01-08 8:28 UTC (permalink / raw)
To: dakr
Cc: alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, guojinhui.liam, helgaas, linux-kernel,
linux-pci, rafael, tj
On Wed Jan 07, 2026 at 19:22:15 +0100, Danilo Krummrich wrote:
> On Wed Jan 7, 2026 at 6:55 PM CET, Jinhui Guo wrote:
> > + * __exec_on_numa_node - Execute a function on a specific NUMA node synchronously
> > + * @node: Target NUMA node ID
> > + * @func: The wrapper function to execute
> > + * @arg1: First argument (void *)
> > + * @arg2: Second argument (void *)
> > + *
> > + * Returns the result of the function execution, or -ENODEV if initialization fails.
> > + * If the node is invalid or offline, it falls back to local execution.
> > + */
> > +static int __exec_on_numa_node(int node, numa_func_t func, void *arg1, void *arg2)
> > +{
> > + struct numa_work_ctx ctx;
> > +
> > + /* Fallback to local execution if the node is invalid or offline */
> > + if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
> > + return func(arg1, arg2);
>
> Just a quick drive-by comment (I’ll go through it more thoroughly later).
>
> What about the case where we are already on the requested node?
>
> Also, we should probably set the corresponding CPU affinity for the time we are
> executing func() to prevent migration.
Hi Danilo,
Thank you for your time and helpful comments.
Relying on queue_work_node() for node affinity is safer, even if the thread
is already on the target CPU.
Checking the current CPU and then setting affinity ourselves would require
handling CPU-hotplug and isolated CPUs—corner cases that become complex
quickly.
The PCI driver tried this years ago and ran into numerous problems; delegating
the decision to queue_work_node() avoids repeating that history.
- Commit d42c69972b85 ("[PATCH] PCI: Run PCI driver initialization on local node")
first added NUMA awareness with set_cpus_allowed_ptr().
- Commit 1ddd45f8d76f ("PCI: Use cpu_hotplug_disable() instead of get_online_cpus()")
handled CPU-hotplug.
- Commits 69a18b18699b ("PCI: Restrict probe functions to housekeeping CPUs") and
9d42ea0d6984 ("pci: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch") dealt
with isolated CPUs.
I considered setting CPU affinity, but the performance gain is minimal:
1. Driver probing happens mainly at boot, when load is light, so queuing a worker
incurs little delay.
2. With many devices they are usually spread across nodes, so workers are not
stalled long within any NUMA node.
3. Even after pinning, tasks can still be migrated by load balancing within the
NUMA node, so the reduction in context switches versus using queue_work_node()
alone is negligible.
Test data [1] shows that queue_work_node() has negligible impact on synchronous probe time.
[1] https://lore.kernel.org/all/20260107175548.1792-1-guojinhui.liam@bytedance.com/
If you have any other concerns, please let me know.
Best Regards,
Jinhui
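The dispatch pattern under discussion — hand a function to a worker, block until it finishes, read the result out of an on-stack context — is what `__exec_on_numa_node()` does with INIT_WORK_ONSTACK()/queue_work_node()/flush_work(). A userspace analogue using pthreads (not kernel workqueues, and with no node targeting — the names here are illustrative only) sketches the same shape:

```c
#include <assert.h>
#include <pthread.h>

/* On-stack context, mirroring struct numa_work_ctx in the patch. */
struct work_ctx {
	int (*func)(void *arg1, void *arg2);
	void *arg1;
	void *arg2;
	int result;
};

/* Worker body, playing the role of numa_work_func(). */
static void *worker(void *p)
{
	struct work_ctx *ctx = p;

	ctx->result = ctx->func(ctx->arg1, ctx->arg2);
	return NULL;
}

/*
 * Run func on a separate thread and wait synchronously;
 * pthread_join() stands in for flush_work().  On failure to
 * spawn a worker, fall back to local execution, as the patch
 * does for an invalid or offline node.
 */
static int exec_on_worker(int (*func)(void *, void *), void *a1, void *a2)
{
	struct work_ctx ctx = { .func = func, .arg1 = a1, .arg2 = a2,
				.result = -1 };
	pthread_t t;

	if (pthread_create(&t, NULL, worker, &ctx))
		return func(a1, a2);
	pthread_join(&t, NULL);
	return ctx.result;
}

/* Hypothetical payload for demonstration. */
static int sum(void *a, void *b)
{
	return *(int *)a + *(int *)b;
}
```

The kernel version gains node placement for free because queue_work_node() picks a worker pool on (or nearest to) the requested node.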
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path
2026-01-07 17:55 ` [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path Jinhui Guo
2026-01-07 18:22 ` Danilo Krummrich
@ 2026-01-17 14:03 ` Danilo Krummrich
2026-01-20 17:23 ` Jinhui Guo
1 sibling, 1 reply; 9+ messages in thread
From: Danilo Krummrich @ 2026-01-17 14:03 UTC (permalink / raw)
To: Jinhui Guo
Cc: alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, helgaas, rafael, tj, linux-kernel,
linux-pci
On Wed Jan 7, 2026 at 6:55 PM CET, Jinhui Guo wrote:
> @@ -808,6 +894,8 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
> return ret;
> }
>
> +DEFINE_NUMA_WRAPPER(__driver_probe_device, const struct device_driver *, struct device *)
> +
> /**
> * driver_probe_device - attempt to bind device & driver together
> * @drv: driver to bind a device to
> @@ -844,6 +932,8 @@ static int driver_probe_device(const struct device_driver *drv, struct device *d
> return ret;
> }
>
> +DEFINE_NUMA_WRAPPER(driver_probe_device, const struct device_driver *, struct device *)
> +
> static inline bool cmdline_requested_async_probing(const char *drv_name)
> {
> bool async_drv;
> @@ -1000,6 +1090,8 @@ static int __device_attach_driver_scan(struct device_attach_data *data,
> return ret;
> }
>
> +DEFINE_NUMA_WRAPPER(__device_attach_driver_scan, struct device_attach_data *, bool *)
Why define three different wrappers? To me it looks like we should easily get
away with a single wrapper for __driver_probe_device(), which could just be
__driver_probe_device_node().
__device_attach_driver_scan() already has this information (i.e. we can check if
need_async == NULL). Additionally, we can change the signature of
driver_probe_device() to
static int driver_probe_device(const struct device_driver *drv, struct device *dev, bool async)
This reduces complexity a lot, since it gets us rid of DEFINE_NUMA_WRAPPER() and
EXEC_ON_NUMA_NODE() macros.
> static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
> {
> struct device *dev = _dev;
> @@ -1055,7 +1147,9 @@ static int __device_attach(struct device *dev, bool allow_async)
> .want_async = false,
> };
>
> - ret = __device_attach_driver_scan(&data, &async);
> + ret = EXEC_ON_NUMA_NODE(dev_to_node(dev),
> + __device_attach_driver_scan,
> + &data, &async);
> }
> out_unlock:
> device_unlock(dev);
> @@ -1142,7 +1236,9 @@ int device_driver_attach(const struct device_driver *drv, struct device *dev)
> int ret;
>
> __device_driver_lock(dev, dev->parent);
> - ret = __driver_probe_device(drv, dev);
> + ret = EXEC_ON_NUMA_NODE(dev_to_node(dev),
> + __driver_probe_device,
> + drv, dev);
> __device_driver_unlock(dev, dev->parent);
>
> /* also return probe errors as normal negative errnos */
> @@ -1231,7 +1327,9 @@ static int __driver_attach(struct device *dev, void *data)
> }
>
> __device_driver_lock(dev, dev->parent);
> - driver_probe_device(drv, dev);
> + EXEC_ON_NUMA_NODE(dev_to_node(dev),
> + driver_probe_device,
> + drv, dev);
> __device_driver_unlock(dev, dev->parent);
>
> return 0;
> --
> 2.20.1
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path
2026-01-17 14:03 ` Danilo Krummrich
@ 2026-01-20 17:23 ` Jinhui Guo
0 siblings, 0 replies; 9+ messages in thread
From: Jinhui Guo @ 2026-01-20 17:23 UTC (permalink / raw)
To: dakr
Cc: alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, guojinhui.liam, helgaas, linux-kernel,
linux-pci, rafael, tj
On Sat Jan 17, 2026 15:03:08 +0100, Danilo Krummrich wrote:
> On Wed Jan 7, 2026 at 6:55 PM CET, Jinhui Guo wrote:
> > @@ -808,6 +894,8 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
> > return ret;
> > }
> >
> > +DEFINE_NUMA_WRAPPER(__driver_probe_device, const struct device_driver *, struct device *)
> > +
> > /**
> > * driver_probe_device - attempt to bind device & driver together
> > * @drv: driver to bind a device to
> > @@ -844,6 +932,8 @@ static int driver_probe_device(const struct device_driver *drv, struct device *d
> > return ret;
> > }
> >
> > +DEFINE_NUMA_WRAPPER(driver_probe_device, const struct device_driver *, struct device *)
> > +
> > static inline bool cmdline_requested_async_probing(const char *drv_name)
> > {
> > bool async_drv;
> > @@ -1000,6 +1090,8 @@ static int __device_attach_driver_scan(struct device_attach_data *data,
> > return ret;
> > }
> >
> > +DEFINE_NUMA_WRAPPER(__device_attach_driver_scan, struct device_attach_data *, bool *)
>
> Why define three different wrappers? To me it looks like we should easily get
> away with a single wrapper for __driver_probe_device(), which could just be
> __driver_probe_device_node().
>
>
> __device_attach_driver_scan() already has this information (i.e. we can check if
> need_async == NULL). Additionally, we can change the signature of
> driver_probe_device() to
>
> static int driver_probe_device(const struct device_driver *drv, struct device *dev, bool async)
>
> This reduces complexity a lot, since it gets us rid of DEFINE_NUMA_WRAPPER() and
> EXEC_ON_NUMA_NODE() macros.
Hi Danilo,
Thank you for your time and helpful comments.
Apologies for the delayed reply. I understand your concern: before sending this
patchset I prototyped a version that added __driver_probe_device_node() and
relied solely on current_is_async() to detect an async worker, without changing
driver_probe_device()’s signature. That proved fragile, so I abandoned it; your
suggestion is the more reliable path forward.
I’ve spent the last couple of days preparing a new patch and will send it out
after testing.
Best Regards,
Jinhui
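The refactor agreed on above — threading the sync/async distinction through an explicit parameter instead of generating per-function wrappers — can be sketched with stub types (these `struct device`/`struct device_driver` definitions are hypothetical stand-ins, not the kernel's, and the node-hop itself is elided):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for the kernel types (illustration only). */
struct device { int node; };
struct device_driver { int (*probe)(struct device *dev); };

static int __driver_probe_device(const struct device_driver *drv,
				 struct device *dev)
{
	return drv->probe(dev);
}

/*
 * Suggested signature: the caller states whether it is already an
 * async worker, so no DEFINE_NUMA_WRAPPER()/EXEC_ON_NUMA_NODE()
 * machinery is needed.
 */
static int driver_probe_device(const struct device_driver *drv,
			       struct device *dev, bool async)
{
	/*
	 * An async worker was already scheduled near the device's
	 * node via async_schedule_dev(); only the synchronous path
	 * would dispatch to dev->node (elided in this sketch).
	 */
	(void)async;
	return __driver_probe_device(drv, dev);
}

/* Hypothetical probe callback for demonstration. */
static int dummy_probe(struct device *dev)
{
	return dev->node;
}
```

With this shape, the single remaining node-aware entry point would be a `__driver_probe_device_node()`-style helper used only by the synchronous callers.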
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] PCI: Clean up NUMA-node awareness in pci_bus_type probe
2026-01-07 17:55 [PATCH 0/3] Add NUMA-node-aware synchronous probing to driver core Jinhui Guo
2026-01-07 17:55 ` [PATCH 1/3] driver core: Introduce helper function __device_attach_driver_scan() Jinhui Guo
2026-01-07 17:55 ` [PATCH 2/3] driver core: Add NUMA-node awareness to the synchronous probe path Jinhui Guo
@ 2026-01-07 17:55 ` Jinhui Guo
2 siblings, 0 replies; 9+ messages in thread
From: Jinhui Guo @ 2026-01-07 17:55 UTC (permalink / raw)
To: dakr, alexander.h.duyck, alexanderduyck, bhelgaas, bvanassche,
dan.j.williams, gregkh, helgaas, rafael, tj
Cc: guojinhui.liam, linux-kernel, linux-pci
With NUMA-node-aware probing now handled by the driver core,
the equivalent code in the PCI driver is redundant and can
be removed.
Dropping it speeds up asynchronous probe by 35%; the gain
comes from eliminating the work_on_cpu() call in pci_call_probe()
that previously pinned every worker to the same CPU, forcing
serial probe of devices on the same NUMA node.
Testing three NVMe devices on the same NUMA node of an AMD
EPYC 9A64 2.4 GHz processor shows a 35% probe-time improvement
with the patch:
Before (all on CPU 0):
nvme 0000:01:00.0: CPU: 0, COMM: kworker/0:1, cost: 52266334ns
nvme 0000:02:00.0: CPU: 0, COMM: kworker/0:0, cost: 50787194ns
nvme 0000:03:00.0: CPU: 0, COMM: kworker/0:2, cost: 50541584ns
After (spread across CPUs 1, 2, 4):
nvme 0000:01:00.0: CPU: 1, COMM: kworker/u1025:2, cost: 35399608ns
nvme 0000:02:00.0: CPU: 2, COMM: kworker/u1025:3, cost: 35156157ns
nvme 0000:03:00.0: CPU: 4, COMM: kworker/u1025:0, cost: 35322116ns
The improvement grows with more PCI devices because fewer probes
contend for the same CPU.
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
---
drivers/pci/pci-driver.c | 83 ++++------------------------------------
include/linux/pci.h | 1 -
2 files changed, 8 insertions(+), 76 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 7c2d9d596258..683bc682e750 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -296,18 +296,9 @@ static struct attribute *pci_drv_attrs[] = {
};
ATTRIBUTE_GROUPS(pci_drv);
-struct drv_dev_and_id {
- struct pci_driver *drv;
- struct pci_dev *dev;
- const struct pci_device_id *id;
-};
-
-static long local_pci_probe(void *_ddi)
+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
+ const struct pci_device_id *id)
{
- struct drv_dev_and_id *ddi = _ddi;
- struct pci_dev *pci_dev = ddi->dev;
- struct pci_driver *pci_drv = ddi->drv;
- struct device *dev = &pci_dev->dev;
int rc;
/*
@@ -319,83 +310,25 @@ static long local_pci_probe(void *_ddi)
* count, in its probe routine and pm_runtime_get_noresume() in
* its remove routine.
*/
- pm_runtime_get_sync(dev);
- pci_dev->driver = pci_drv;
- rc = pci_drv->probe(pci_dev, ddi->id);
+ pm_runtime_get_sync(&dev->dev);
+ dev->driver = drv;
+ rc = drv->probe(dev, id);
if (!rc)
return rc;
if (rc < 0) {
- pci_dev->driver = NULL;
- pm_runtime_put_sync(dev);
+ dev->driver = NULL;
+ pm_runtime_put_sync(&dev->dev);
return rc;
}
/*
* Probe function should return < 0 for failure, 0 for success
* Treat values > 0 as success, but warn.
*/
- pci_warn(pci_dev, "Driver probe function unexpectedly returned %d\n",
+ pci_warn(dev, "Driver probe function unexpectedly returned %d\n",
rc);
return 0;
}
-static bool pci_physfn_is_probed(struct pci_dev *dev)
-{
-#ifdef CONFIG_PCI_IOV
- return dev->is_virtfn && dev->physfn->is_probed;
-#else
- return false;
-#endif
-}
-
-static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
- const struct pci_device_id *id)
-{
- int error, node, cpu;
- struct drv_dev_and_id ddi = { drv, dev, id };
-
- /*
- * Execute driver initialization on node where the device is
- * attached. This way the driver likely allocates its local memory
- * on the right node.
- */
- node = dev_to_node(&dev->dev);
- dev->is_probed = 1;
-
- cpu_hotplug_disable();
-
- /*
- * Prevent nesting work_on_cpu() for the case where a Virtual Function
- * device is probed from work_on_cpu() of the Physical device.
- */
- if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
- pci_physfn_is_probed(dev)) {
- cpu = nr_cpu_ids;
- } else {
- cpumask_var_t wq_domain_mask;
-
- if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
- error = -ENOMEM;
- goto out;
- }
- cpumask_and(wq_domain_mask,
- housekeeping_cpumask(HK_TYPE_WQ),
- housekeeping_cpumask(HK_TYPE_DOMAIN));
-
- cpu = cpumask_any_and(cpumask_of_node(node),
- wq_domain_mask);
- free_cpumask_var(wq_domain_mask);
- }
-
- if (cpu < nr_cpu_ids)
- error = work_on_cpu(cpu, local_pci_probe, &ddi);
- else
- error = local_pci_probe(&ddi);
-out:
- dev->is_probed = 0;
- cpu_hotplug_enable();
- return error;
-}
-
/**
* __pci_device_probe - check if a driver wants to claim a specific PCI device
* @drv: driver to call to check if it wants the PCI device
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 864775651c6f..cbc0db2f2b84 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -481,7 +481,6 @@ struct pci_dev {
unsigned int io_window_1k:1; /* Intel bridge 1K I/O windows */
unsigned int irq_managed:1;
unsigned int non_compliant_bars:1; /* Broken BARs; ignore them */
- unsigned int is_probed:1; /* Device probing in progress */
unsigned int link_active_reporting:1;/* Device capable of reporting link active */
unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */
unsigned int no_command_memory:1; /* No PCI_COMMAND_MEMORY */
--
2.20.1
^ permalink raw reply related [flat|nested] 9+ messages in thread