* [PATCH 0/4] replace old wq(s), added WQ_PERCPU to alloc_workqueue
@ 2025-10-30 16:10 Marco Crivellari
2025-10-30 16:10 ` [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Marco Crivellari @ 2025-10-30 16:10 UTC (permalink / raw)
To: linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alex Deucher, Christian König, David Airlie, Simona Vetter
Hi,
=== Current situation: problems ===
Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, for !WQ_UNBOUND the local CPU is selected.
This leads to different scenarios if a work item is scheduled on an
isolated CPU, depending on whether the "delay" value is 0 or greater than 0:
schedule_delayed_work(, 0);
This will be handled by __queue_work() that will queue the work item on the
current local (isolated) CPU, while:
schedule_delayed_work(, 1);
will move the timer to a housekeeping CPU and schedule the work there.
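The two cases above can be sketched as a toy user-space model. This is not kernel code: `demo_placement` and `demo_schedule_delayed()` are hypothetical names invented for illustration, and the model only encodes the behavior described in this cover letter for a nohz_full system (delay 0 queues on the local CPU, a non-zero delay moves timer and work to a housekeeping CPU).

```c
#include <stdbool.h>

/* Toy model of the placement described above; not a kernel API. */
struct demo_placement {
	int timer_cpu;	/* CPU running the delay timer, -1 if no timer is armed */
	int work_cpu;	/* CPU on which the work item ends up queued */
};

/*
 * Hypothetical helper: models schedule_delayed_work() called from an
 * isolated CPU (local_cpu) while housekeeping_cpu is in wq_unbound_cpumask.
 */
static struct demo_placement demo_schedule_delayed(int local_cpu,
						   int housekeeping_cpu,
						   unsigned long delay)
{
	struct demo_placement p;

	if (delay == 0) {
		/* No timer: the item is queued directly on the local CPU. */
		p.timer_cpu = -1;
		p.work_cpu = local_cpu;
	} else {
		/* The timer is moved to a housekeeping CPU; the work follows. */
		p.timer_cpu = housekeeping_cpu;
		p.work_cpu = housekeeping_cpu;
	}
	return p;
}
```

With CPU 3 isolated and CPU 0 as housekeeping, a delay of 0 leaves the work on CPU 3, while a delay of 1 places it on CPU 0, which is the inconsistency the series is concerned with.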
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
=== Recent changes to the WQ API ===
The following commits introduced the recent changes to the workqueue API:
- commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
- commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The old workqueues will be removed in a future release cycle.
=== Changes introduced by this series ===
1) [P 1-2] Replace uses of system_wq and system_unbound_wq
system_wq is a per-CPU workqueue, but its name does not make that clear.
system_unbound_wq is to be used when locality is not required.
Because of that, system_wq has been replaced with system_percpu_wq, and
system_unbound_wq has been replaced with system_dfl_wq.
2) [P 3-4] WQ_PERCPU added to alloc_workqueue()
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
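A minimal user-space sketch of the flag semantics, under stated assumptions: the `DEMO_*` names and their bit values are made up for this example (the real definitions live in include/linux/workqueue.h), and the "exactly one of the two flags" check is an illustrative reading of the migration rule, not something the kernel is claimed to enforce in this form.

```c
#include <stdbool.h>

/* Illustrative flag bits for this sketch only; not the kernel's values. */
enum demo_wq_flags {
	DEMO_WQ_UNBOUND = 1u << 0,
	DEMO_WQ_HIGHPRI = 1u << 1,
	DEMO_WQ_PERCPU  = 1u << 2,
};

/* Historic rule: a workqueue is per-cpu unless WQ_UNBOUND is passed. */
static bool demo_is_percpu(unsigned int flags)
{
	return !(flags & DEMO_WQ_UNBOUND);
}

/*
 * Assumed migration rule: the caller states its choice explicitly,
 * i.e. exactly one of PERCPU and UNBOUND is set.
 */
static bool demo_affinity_is_explicit(unsigned int flags)
{
	bool percpu = flags & DEMO_WQ_PERCPU;
	bool unbound = flags & DEMO_WQ_UNBOUND;

	return percpu != unbound;
}
```

In this model, flags of 0 still yield a per-cpu queue but are not explicit, whereas `DEMO_WQ_HIGHPRI | DEMO_WQ_PERCPU` yields the same per-cpu behavior and states it explicitly, which mirrors what patches 3 and 4 do.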
Thanks!
Marco Crivellari (4):
drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq
drm/amdgpu: replace use of system_wq with system_percpu_wq
amd/amdkfd: WQ_PERCPU added to alloc_workqueue users
drm/radeon: WQ_PERCPU added to alloc_workqueue users
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 ++-
drivers/gpu/drm/radeon/radeon_display.c | 3 ++-
5 files changed, 9 insertions(+), 7 deletions(-)
--
2.51.0
* [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq
2025-10-30 16:10 [PATCH 0/4] replace old wq(s), added WQ_PERCPU to alloc_workqueue Marco Crivellari
@ 2025-10-30 16:10 ` Marco Crivellari
2025-10-30 17:14 ` Christian König
2025-10-30 16:10 ` [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq Marco Crivellari
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Marco Crivellari @ 2025-10-30 16:10 UTC (permalink / raw)
To: linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alex Deucher, Christian König, David Airlie, Simona Vetter
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Adding system_dfl_wq to encourage its use when unbound work should be used.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
index 9569dc16dd3d..7957e6c4c416 100644
--- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
@@ -175,7 +175,7 @@ aldebaran_mode2_perform_reset(struct amdgpu_reset_control *reset_ctl,
list_for_each_entry(tmp_adev, reset_device_list, reset_list) {
/* For XGMI run all resets in parallel to speed up the process */
if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
- if (!queue_work(system_unbound_wq,
+ if (!queue_work(system_dfl_wq,
&tmp_adev->reset_cntl->reset_work))
r = -EALREADY;
} else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7a899fb4de29..8c4d79f6c14f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6033,7 +6033,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
/* For XGMI run all resets in parallel to speed up the process */
if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
- if (!queue_work(system_unbound_wq,
+ if (!queue_work(system_dfl_wq,
&tmp_adev->xgmi_reset_work))
r = -EALREADY;
} else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 28c4ad62f50e..9c4631608526 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -116,7 +116,7 @@ static int amdgpu_reset_xgmi_reset_on_init_perform_reset(
/* Mode1 reset needs to be triggered on all devices together */
list_for_each_entry(tmp_adev, reset_device_list, reset_list) {
/* For XGMI run all resets in parallel to speed up the process */
- if (!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))
+ if (!queue_work(system_dfl_wq, &tmp_adev->xgmi_reset_work))
r = -EALREADY;
if (r) {
dev_err(tmp_adev->dev,
--
2.51.0
* [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq
2025-10-30 16:10 [PATCH 0/4] replace old wq(s), added WQ_PERCPU to alloc_workqueue Marco Crivellari
2025-10-30 16:10 ` [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
@ 2025-10-30 16:10 ` Marco Crivellari
2025-10-30 17:10 ` Christian König
2025-10-30 16:10 ` [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
2025-10-30 16:10 ` [PATCH 4/4] drm/radeon: " Marco Crivellari
3 siblings, 1 reply; 15+ messages in thread
From: Marco Crivellari @ 2025-10-30 16:10 UTC (permalink / raw)
To: linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alex Deucher, Christian König, David Airlie, Simona Vetter
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_wq should be the per-cpu workqueue, yet nothing in its name makes
that clear, so replace system_wq with system_percpu_wq.
The old wq (system_wq) will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8c4d79f6c14f..2f8160702f9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4798,7 +4798,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
}
/* must succeed. */
amdgpu_ras_resume(adev);
- queue_delayed_work(system_wq, &adev->delayed_init_work,
+ queue_delayed_work(system_percpu_wq, &adev->delayed_init_work,
msecs_to_jiffies(AMDGPU_RESUME_MS));
}
@@ -5328,7 +5328,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)
if (r)
goto exit;
- queue_delayed_work(system_wq, &adev->delayed_init_work,
+ queue_delayed_work(system_percpu_wq, &adev->delayed_init_work,
msecs_to_jiffies(AMDGPU_RESUME_MS));
exit:
if (amdgpu_sriov_vf(adev)) {
--
2.51.0
* [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users
2025-10-30 16:10 [PATCH 0/4] replace old wq(s), added WQ_PERCPU to alloc_workqueue Marco Crivellari
2025-10-30 16:10 ` [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
2025-10-30 16:10 ` [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq Marco Crivellari
@ 2025-10-30 16:10 ` Marco Crivellari
2025-10-30 17:15 ` Christian König
2025-10-30 16:10 ` [PATCH 4/4] drm/radeon: " Marco Crivellari
3 siblings, 1 reply; 15+ messages in thread
From: Marco Crivellari @ 2025-10-30 16:10 UTC (permalink / raw)
To: linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alex Deucher, Christian König, David Airlie, Simona Vetter
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index ddfe30c13e9d..ebc9925f4e66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -690,7 +690,8 @@ void kfd_procfs_del_queue(struct queue *q)
int kfd_process_create_wq(void)
{
if (!kfd_process_wq)
- kfd_process_wq = alloc_workqueue("kfd_process_wq", 0, 0);
+ kfd_process_wq = alloc_workqueue("kfd_process_wq", WQ_PERCPU,
+ 0);
if (!kfd_restore_wq)
kfd_restore_wq = alloc_ordered_workqueue("kfd_restore_wq",
WQ_FREEZABLE);
--
2.51.0
* [PATCH 4/4] drm/radeon: WQ_PERCPU added to alloc_workqueue users
2025-10-30 16:10 [PATCH 0/4] replace old wq(s), added WQ_PERCPU to alloc_workqueue Marco Crivellari
` (2 preceding siblings ...)
2025-10-30 16:10 ` [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
@ 2025-10-30 16:10 ` Marco Crivellari
3 siblings, 0 replies; 15+ messages in thread
From: Marco Crivellari @ 2025-10-30 16:10 UTC (permalink / raw)
To: linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alex Deucher, Christian König, David Airlie, Simona Vetter
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/gpu/drm/radeon/radeon_display.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 351b9dfcdad8..3c8aa5274c51 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -687,7 +687,8 @@ static void radeon_crtc_init(struct drm_device *dev, int index)
if (radeon_crtc == NULL)
return;
- radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc", WQ_HIGHPRI, 0);
+ radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc",
+ WQ_HIGHPRI | WQ_PERCPU, 0);
if (!radeon_crtc->flip_queue) {
kfree(radeon_crtc);
return;
--
2.51.0
* Re: [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq
2025-10-30 16:10 ` [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq Marco Crivellari
@ 2025-10-30 17:10 ` Christian König
2025-10-31 9:01 ` Marco Crivellari
0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2025-10-30 17:10 UTC (permalink / raw)
To: Marco Crivellari, linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Alex Deucher,
David Airlie, Simona Vetter
On 10/30/25 17:10, Marco Crivellari wrote:
> Currently if a user enqueue a work item using schedule_delayed_work() the
> used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
> schedule_work() that is using system_wq and queue_work(), that makes use
> again of WORK_CPU_UNBOUND.
>
> This lack of consistency cannot be addressed without refactoring the API.
>
> system_wq should be the per-cpu workqueue, yet in this name nothing makes
> that clear, so replace system_wq with system_percpu_wq.
>
> The old wq (system_wq) will be kept for a few release cycles.
Oh, good point!
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8c4d79f6c14f..2f8160702f9a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4798,7 +4798,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> }
> /* must succeed. */
> amdgpu_ras_resume(adev);
> - queue_delayed_work(system_wq, &adev->delayed_init_work,
> + queue_delayed_work(system_percpu_wq, &adev->delayed_init_work,
> msecs_to_jiffies(AMDGPU_RESUME_MS));
> }
>
> @@ -5328,7 +5328,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)
> if (r)
> goto exit;
>
> - queue_delayed_work(system_wq, &adev->delayed_init_work,
> + queue_delayed_work(system_percpu_wq, &adev->delayed_init_work,
> msecs_to_jiffies(AMDGPU_RESUME_MS));
In this particular use case we actually don't want the percpu wq.
This can execute on any CPU except for the current one.
Regards,
Christian.
> exit:
> if (amdgpu_sriov_vf(adev)) {
* Re: [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq
2025-10-30 16:10 ` [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
@ 2025-10-30 17:14 ` Christian König
2025-10-31 8:42 ` Marco Crivellari
0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2025-10-30 17:14 UTC (permalink / raw)
To: Marco Crivellari, linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Alex Deucher,
David Airlie, Simona Vetter
On 10/30/25 17:10, Marco Crivellari wrote:
> Currently if a user enqueue a work item using schedule_delayed_work() the
> used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
> schedule_work() that is using system_wq and queue_work(), that makes use
> again of WORK_CPU_UNBOUND.
>
> This lack of consistency cannot be addressed without refactoring the API.
>
> system_unbound_wq should be the default workqueue so as not to enforce
> locality constraints for random work whenever it's not required.
>
> Adding system_dfl_wq to encourage its use when unbound work should be used.
>
> The old system_unbound_wq will be kept for a few release cycles.
In all the cases below we actually want the work to run on a different CPU than the current one.
So using system_unbound_wq seems to be more appropriate.
Regards,
Christian.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> index 9569dc16dd3d..7957e6c4c416 100644
> --- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> +++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> @@ -175,7 +175,7 @@ aldebaran_mode2_perform_reset(struct amdgpu_reset_control *reset_ctl,
> list_for_each_entry(tmp_adev, reset_device_list, reset_list) {
> /* For XGMI run all resets in parallel to speed up the process */
> if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
> - if (!queue_work(system_unbound_wq,
> + if (!queue_work(system_dfl_wq,
> &tmp_adev->reset_cntl->reset_work))
> r = -EALREADY;
> } else
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7a899fb4de29..8c4d79f6c14f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -6033,7 +6033,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
> list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
> /* For XGMI run all resets in parallel to speed up the process */
> if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
> - if (!queue_work(system_unbound_wq,
> + if (!queue_work(system_dfl_wq,
> &tmp_adev->xgmi_reset_work))
> r = -EALREADY;
> } else
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> index 28c4ad62f50e..9c4631608526 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> @@ -116,7 +116,7 @@ static int amdgpu_reset_xgmi_reset_on_init_perform_reset(
> /* Mode1 reset needs to be triggered on all devices together */
> list_for_each_entry(tmp_adev, reset_device_list, reset_list) {
> /* For XGMI run all resets in parallel to speed up the process */
> - if (!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))
> + if (!queue_work(system_dfl_wq, &tmp_adev->xgmi_reset_work))
> r = -EALREADY;
> if (r) {
> dev_err(tmp_adev->dev,
* Re: [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users
2025-10-30 16:10 ` [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
@ 2025-10-30 17:15 ` Christian König
2025-10-31 8:48 ` Marco Crivellari
0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2025-10-30 17:15 UTC (permalink / raw)
To: Marco Crivellari, linux-kernel, amd-gfx, dri-devel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Alex Deucher,
David Airlie, Simona Vetter, Yang, Philip, Kuehling, Felix
On 10/30/25 17:10, Marco Crivellari wrote:
> Currently if a user enqueue a work item using schedule_delayed_work() the
> used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
> schedule_work() that is using system_wq and queue_work(), that makes use
> again of WORK_CPU_UNBOUND.
> This lack of consistency cannot be addressed without refactoring the API.
>
> alloc_workqueue() treats all queues as per-CPU by default, while unbound
> workqueues must opt-in via WQ_UNBOUND.
>
> This default is suboptimal: most workloads benefit from unbound queues,
> allowing the scheduler to place worker threads where they’re needed and
> reducing noise when CPUs are isolated.
>
> This change adds a new WQ_PERCPU flag to explicitly request
> alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
>
> With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
> any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
> must now use WQ_PERCPU.
>
> Once migration is complete, WQ_UNBOUND can be removed and unbound will
> become the implicit default.
Adding Philip and Felix to comment, but this should most likely also not execute on the same CPU as the one who scheduled the work.
Regards,
Christian.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index ddfe30c13e9d..ebc9925f4e66 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -690,7 +690,8 @@ void kfd_procfs_del_queue(struct queue *q)
> int kfd_process_create_wq(void)
> {
> if (!kfd_process_wq)
> - kfd_process_wq = alloc_workqueue("kfd_process_wq", 0, 0);
> + kfd_process_wq = alloc_workqueue("kfd_process_wq", WQ_PERCPU,
> + 0);
> if (!kfd_restore_wq)
> kfd_restore_wq = alloc_ordered_workqueue("kfd_restore_wq",
> WQ_FREEZABLE);
* Re: [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq
2025-10-30 17:14 ` Christian König
@ 2025-10-31 8:42 ` Marco Crivellari
2025-10-31 8:44 ` Christian König
0 siblings, 1 reply; 15+ messages in thread
From: Marco Crivellari @ 2025-10-31 8:42 UTC (permalink / raw)
To: Christian König
Cc: linux-kernel, amd-gfx, dri-devel, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Alex Deucher, David Airlie, Simona Vetter
On Thu, Oct 30, 2025 at 6:14 PM Christian König
<christian.koenig@amd.com> wrote:
>[...]
> In all the cases below we actually want the work to run on a different CPU than the current one.
>
> So using system_unbound_wq seems to be more appropriate.
Hello Christian,
system_dfl_wq is the new workqueue that will replace
system_unbound_wq, but the behavior is the same.
So, if you need system_unbound_wq, it means system_dfl_wq is fine here.
Thanks!
--
Marco Crivellari
L3 Support Engineer, Technology & Product
* Re: [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq
2025-10-31 8:42 ` Marco Crivellari
@ 2025-10-31 8:44 ` Christian König
0 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2025-10-31 8:44 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, amd-gfx, dri-devel, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Alex Deucher, David Airlie, Simona Vetter
On 10/31/25 09:42, Marco Crivellari wrote:
> On Thu, Oct 30, 2025 at 6:14 PM Christian König
> <christian.koenig@amd.com> wrote:
>> [...]
>> In all the cases below we actually want the work to run on a different CPU than the current one.
>>
>> So using system_unbound_wq seems to be more appropriate.
>
> Hello Christian,
>
> system_dfl_wq is the new workqueue that will replace
> system_unbound_wq, but the behavior is the same.
> So, if you need system_unbound_wq, it means system_dfl_wq is fine here.
Ah, ok thanks! In that case I'm fine with the change.
It sounded like system_dfl_wq is the new per CPU wq.
Regards,
Christian.
>
> Thanks!
* Re: [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users
2025-10-30 17:15 ` Christian König
@ 2025-10-31 8:48 ` Marco Crivellari
2025-10-31 13:12 ` Philip Yang
0 siblings, 1 reply; 15+ messages in thread
From: Marco Crivellari @ 2025-10-31 8:48 UTC (permalink / raw)
To: Christian König
Cc: linux-kernel, amd-gfx, dri-devel, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Alex Deucher, David Airlie, Simona Vetter, Yang, Philip,
Kuehling, Felix
On Thu, Oct 30, 2025 at 6:15 PM Christian König
<christian.koenig@amd.com> wrote:
>[...]
> Adding Philip and Felix to comment, but this should most likely also not execute on the same CPU as the one who scheduled the work.
Hi Christian,
The behavior without WQ_PERCPU is exactly the same: passing 0 already
means the workqueue is per-cpu. We just made that explicit by adding the
WQ_PERCPU flag.
So if you need this to be unbound, I can send the v2 with WQ_UNBOUND
instead of WQ_PERCPU.
Thanks!
--
Marco Crivellari
L3 Support Engineer, Technology & Product
* Re: [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq
2025-10-30 17:10 ` Christian König
@ 2025-10-31 9:01 ` Marco Crivellari
2025-11-03 15:07 ` Christian König
0 siblings, 1 reply; 15+ messages in thread
From: Marco Crivellari @ 2025-10-31 9:01 UTC (permalink / raw)
To: Christian König
Cc: linux-kernel, amd-gfx, dri-devel, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Alex Deucher, David Airlie, Simona Vetter
On Thu, Oct 30, 2025 at 6:10 PM Christian König
<christian.koenig@amd.com> wrote:
>[...]
> In this particular use case we actually don't want the percpu wq.
>
> This can execute on any CPU except for the current one.
>
> Regards,
> Christian.
>
> > exit:
> > if (amdgpu_sriov_vf(adev)) {
>
Hi Christian,
As with the unbound workqueue, system_percpu_wq is just a rename of
system_wq.
Technically, I changed the workqueue because two new wqs were added to the code:
- system_percpu_wq
- system_dfl_wq
You can see the commits mentioned in the cover letter, shared also below:
- commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
- commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
So basically the behavior is the same.
But if it would be beneficial to have an unbound wq, I can send the v2
with the change!
We did so also for other subsystems.
Thanks!
--
Marco Crivellari
L3 Support Engineer, Technology & Product
* Re: [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users
2025-10-31 8:48 ` Marco Crivellari
@ 2025-10-31 13:12 ` Philip Yang
2025-11-03 15:01 ` Marco Crivellari
0 siblings, 1 reply; 15+ messages in thread
From: Philip Yang @ 2025-10-31 13:12 UTC (permalink / raw)
To: Marco Crivellari, Christian König
Cc: linux-kernel, amd-gfx, dri-devel, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Alex Deucher, David Airlie, Simona Vetter, Yang, Philip,
Kuehling, Felix
On 2025-10-31 04:48, Marco Crivellari wrote:
> On Thu, Oct 30, 2025 at 6:15 PM Christian König
> <christian.koenig@amd.com> wrote:
>> [...]
>> Adding Philip and Felix to comment, but this should most likely also not execute on the same CPU as the one who scheduled the work.
> Hi Christian,
>
> The actual behavior without WQ_PERCPU is exactly the same: with 0 it
> means the workqueue is per-cpu. We just enforced that, adding the
> WQ_PERCPU flag, so that it is explicit.
>
> So if you need this to be unbound, I can send the v2 with WQ_UNBOUND
> instead of WQ_PERCPU.
Hi,
WQ_UNBOUND is more appropriate here: the KFD release work should execute immediately, as long as any CPU resource is available, rather than being tied to the CPU on which kfd_unref_process() drops the last process refcount.
Thanks,
Philip
> Thanks!
>
> --
>
> Marco Crivellari
>
> L3 Support Engineer, Technology & Product
* Re: [PATCH 3/4] amd/amdkfd: WQ_PERCPU added to alloc_workqueue users
2025-10-31 13:12 ` Philip Yang
@ 2025-11-03 15:01 ` Marco Crivellari
0 siblings, 0 replies; 15+ messages in thread
From: Marco Crivellari @ 2025-11-03 15:01 UTC (permalink / raw)
To: Philip Yang
Cc: Christian König, linux-kernel, amd-gfx, dri-devel, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Michal Hocko, Alex Deucher, David Airlie, Simona Vetter,
Yang, Philip, Kuehling, Felix
On Fri, Oct 31, 2025 at 2:12 PM Philip Yang <yangp@amd.com> wrote:
> Hi,
>
> WQ_UNBOUND is more appropriate here, to execute the KFD release work immediately as long as CPU resource is available, not specific to the CPU that kfd_unref_process the last process refcount.
Hi,
I will do what you suggest.
Thank you!
--
Marco Crivellari
L3 Support Engineer, Technology & Product
* Re: [PATCH 2/4] drm/amdgpu: replace use of system_wq with system_percpu_wq
2025-10-31 9:01 ` Marco Crivellari
@ 2025-11-03 15:07 ` Christian König
0 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2025-11-03 15:07 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, amd-gfx, dri-devel, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Alex Deucher, David Airlie, Simona Vetter
On 10/31/25 10:01, Marco Crivellari wrote:
> On Thu, Oct 30, 2025 at 6:10 PM Christian König
> <christian.koenig@amd.com> wrote:
>> [...]
>> In this particular use case we actually don't want the percpu wq.
>>
>> This can execute on any CPU except for the current one.
>>
>> Regards,
>> Christian.
>>
>>> exit:
>>> if (amdgpu_sriov_vf(adev)) {
>>
>
> Hi Christian,
>
> like for the unbound workqueue also the system_percpu_wq is just a
> rename for system_wq.
> Technically I changed the workqueue because we added in the code two wq:
> - system_percpu_wq
> - system_dfl_wq
>
> You can see the commits mentioned in the cover letter, shared also below:
>
> - commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
> - commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>
> So basically the behavior is the same.
>
> But if it would be beneficial to have an unbound wq, I can send the v2
> with the change!
Please do so. The purpose of offloading that into a work item is to execute it on a different CPU.
I wasn't aware that the system_wq was CPU bound at all.
Thanks for taking care of that,
Christian.
> We did so also for other subsystems.
>
> Thanks!
>
>
>
> --
>
> Marco Crivellari
>
> L3 Support Engineer, Technology & Product