* [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users
@ 2025-09-05 8:53 Marco Crivellari
2025-09-05 8:53 ` [PATCH 1/3] bpf: replace use of system_wq with system_percpu_wq Marco Crivellari
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Marco Crivellari @ 2025-09-05 8:53 UTC (permalink / raw)
To: linux-kernel, bpf
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Hi!
Below is a summary of a discussion about the Workqueue API and cpu isolation
considerations. Details and more information are available here:
"workqueue: Always use wq_select_unbound_cpu() for WORK_CPU_UNBOUND."
https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
=== Current situation: problems ===
Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, for !WQ_UNBOUND the local CPU is selected.
This leads to different behavior when a work item is scheduled from an isolated
CPU, depending on whether the "delay" value is 0 or greater than 0:
schedule_delayed_work(, 0);
This will be handled by __queue_work() that will queue the work item on the
current local (isolated) CPU, while:
schedule_delayed_work(, 1);
will move the timer to a housekeeping CPU and schedule the work there.
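The two cases can be sketched as a toy userspace model. This is not kernel code: `struct demo_system` and `demo_target_cpu()` are hypothetical names, and the branch for a non-isolated CPU with a non-zero delay is a simplifying assumption that the description above does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a kernel symbol. */
struct demo_system {
	int local_cpu;          /* CPU that calls schedule_delayed_work() */
	int housekeeping_cpu;   /* some CPU outside the isolated set */
	bool local_is_isolated; /* true when local_cpu is an isolated CPU */
};

/*
 * Return the CPU on which the work item ends up, following the two
 * cases described for a per-cpu workqueue on a nohz_full system.
 */
static int demo_target_cpu(const struct demo_system *sys, unsigned long delay)
{
	assert(sys != NULL);

	/* delay == 0: queued right away on the local CPU, isolated or not. */
	if (delay == 0)
		return sys->local_cpu;

	/* delay > 0 on an isolated CPU: the timer moves to a housekeeping CPU. */
	if (sys->local_is_isolated)
		return sys->housekeeping_cpu;

	/* Assumption of this sketch: otherwise the work stays local. */
	return sys->local_cpu;
}
```

With an isolated local CPU, the same call therefore lands on two different CPUs depending only on whether the delay is zero.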
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
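A userspace sketch of the wrapper chain may make the inconsistency easier to see. It assumes, as I understand include/linux/workqueue.h, that schedule_work() forwards to queue_work() with system_wq, and that queue_work() forwards to queue_work_on() with WORK_CPU_UNBOUND. All `demo_` names, the stub types, and the placeholder value of `DEMO_WORK_CPU_UNBOUND` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; fields are illustrative. */
struct workqueue_struct { const char *name; };
struct work_struct { int cpu; const struct workqueue_struct *wq; };

#define DEMO_WORK_CPU_UNBOUND (-1) /* placeholder value, an assumption */

static struct workqueue_struct demo_system_wq = { "system_wq" };

/* Stub: records the request instead of queueing anything. */
static bool demo_queue_work_on(int cpu, struct workqueue_struct *wq,
			       struct work_struct *work)
{
	assert(work != NULL);
	work->cpu = cpu;
	work->wq = wq;
	return true;
}

/* The caller names a workqueue; no CPU is specified. */
static bool demo_queue_work(struct workqueue_struct *wq,
			    struct work_struct *work)
{
	return demo_queue_work_on(DEMO_WORK_CPU_UNBOUND, wq, work);
}

/* The caller names nothing; the per-cpu system_wq is implied. */
static bool demo_schedule_work(struct work_struct *work)
{
	return demo_queue_work(&demo_system_wq, work);
}
```

In this model the "unbound" CPU hint and the per-cpu workqueue end up in the same request, which is the naming confusion the series sets out to remove.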
=== Plan and future plans ===
This patchset is the first step in a refactoring needed to address the
points above; in the long term it will also benefit cpu isolation by
moving away from per-cpu workqueues in favor of an unbound model.
These are the main steps:
1) API refactoring (which this patchset introduces)
- Make the system wq names clearer and more uniform, both per-cpu and
unbound, to avoid any confusion about which one should be used.
- Introduction of WQ_PERCPU: this flag is the complement of WQ_UNBOUND,
introduced in this patchset and used on all the callers that are not
currently using WQ_UNBOUND.
WQ_UNBOUND will be removed in a future release cycle.
Most users don't need to be per-cpu because they have no locality
requirements; for that reason, a future step will be to make "unbound"
the default behavior.
2) Check who really needs to be per-cpu
- Remove the WQ_PERCPU flag when it is not strictly required.
3) Add a new API (prefer local cpu)
- There are users that don't require local execution, as mentioned
above; despite that, local execution yields a performance gain.
This new API will prefer local execution without requiring it.
=== Changes introduced by this series ===
1) [P 1-2] Replace use of system_wq and system_unbound_wq
system_wq is a per-CPU workqueue, but its name does not make that clear.
system_unbound_wq is to be used when locality is not required.
Because of that, system_wq has been renamed to system_percpu_wq, and
system_unbound_wq has been renamed to system_dfl_wq.
2) [P 3] add WQ_PERCPU to remaining alloc_workqueue() users
Every alloc_workqueue() caller should use exactly one of WQ_PERCPU or
WQ_UNBOUND. This is enforced with a warning if both or neither of them
are present.
WQ_UNBOUND will be removed in a future release cycle.
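The "exactly one of the two flags" rule can be sketched in userspace C as follows. `demo_wq_flags_valid()` is a hypothetical helper, not the kernel's check, and the bit values are placeholders chosen for illustration; the real definitions live in include/linux/workqueue.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values (assumption); only their distinctness matters. */
enum {
	DEMO_WQ_UNBOUND = 1u << 1,
	DEMO_WQ_PERCPU  = 1u << 8,
};

static_assert((DEMO_WQ_UNBOUND & DEMO_WQ_PERCPU) == 0,
	      "the two flags must not share bits");

/*
 * Return true when exactly one of the two flags is set. The caller
 * would emit a warning when this returns false.
 */
static bool demo_wq_flags_valid(unsigned int flags)
{
	bool percpu  = (flags & DEMO_WQ_PERCPU) != 0;
	bool unbound = (flags & DEMO_WQ_UNBOUND) != 0;

	return percpu != unbound;
}
```

Comparing the two booleans for inequality is an exclusive-or, so both "neither flag" and "both flags" fail the check.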
=== For Maintainers ===
There are prerequisites for this series, already merged in the master branch.
The commits are:
128ea9f6ccfb6960293ae4212f4f97165e42222d ("workqueue: Add system_percpu_wq and
system_dfl_wq")
930c2ea566aff59e962c50b2421d5fcc3b98b8be ("workqueue: Add new WQ_PERCPU flag")
Thanks!
Marco Crivellari (3):
bpf: replace use of system_wq with system_percpu_wq
bpf: replace use of system_unbound_wq with system_dfl_wq
bpf: WQ_PERCPU added to alloc_workqueue users
kernel/bpf/cgroup.c | 5 +++--
kernel/bpf/cpumap.c | 2 +-
kernel/bpf/helpers.c | 4 ++--
kernel/bpf/memalloc.c | 2 +-
kernel/bpf/syscall.c | 2 +-
5 files changed, 8 insertions(+), 7 deletions(-)
--
2.51.0
* [PATCH 1/3] bpf: replace use of system_wq with system_percpu_wq
2025-09-05 8:53 [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
@ 2025-09-05 8:53 ` Marco Crivellari
2025-09-05 8:53 ` [PATCH 2/3] bpf: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Marco Crivellari @ 2025-09-05 8:53 UTC (permalink / raw)
To: linux-kernel, bpf
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_wq is a per-CPU workqueue, yet nothing in its name conveys that
CPU affinity constraint, which is very often not required by users. Make
it clear by adding a system_percpu_wq.
queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new per-cpu wq: if the user still sticks to the old name, a warning will
be printed along with a wq redirect to the new one.
This patch adds the new system_percpu_wq, except for the mm, fs and net
subsystems, which are handled in separate patches.
The old wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/cpumap.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 84f58f3d028a..b8699ec4d766 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -27,7 +27,7 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key);
/*
* cgroup bpf destruction makes heavy use of work items and there can be a lot
* of concurrent destructions. Use a separate workqueue so that cgroup bpf
- * destruction work items don't end up filling up max_active of system_wq
+ * destruction work items don't end up filling up max_active of system_percpu_wq
* which may lead to deadlock.
*/
static struct workqueue_struct *cgroup_bpf_destroy_wq;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 67e8a2fc1a99..1ab8e6876618 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -551,7 +551,7 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
old_rcpu = unrcu_pointer(xchg(&cmap->cpu_map[key_cpu], RCU_INITIALIZER(rcpu)));
if (old_rcpu) {
INIT_RCU_WORK(&old_rcpu->free_work, __cpu_map_entry_free);
- queue_rcu_work(system_wq, &old_rcpu->free_work);
+ queue_rcu_work(system_percpu_wq, &old_rcpu->free_work);
}
}
--
2.51.0
* [PATCH 2/3] bpf: replace use of system_unbound_wq with system_dfl_wq
2025-09-05 8:53 [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
2025-09-05 8:53 ` [PATCH 1/3] bpf: replace use of system_wq with system_percpu_wq Marco Crivellari
@ 2025-09-05 8:53 ` Marco Crivellari
2025-09-05 8:53 ` [PATCH 3/3] bpf: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Marco Crivellari @ 2025-09-05 8:53 UTC (permalink / raw)
To: linux-kernel, bpf
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Add system_dfl_wq to encourage its use when unbound work should be used.
queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new unbound wq: if the user still uses the old wq, a warning will be
printed along with a wq redirect to the new one.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
kernel/bpf/helpers.c | 4 ++--
kernel/bpf/memalloc.c | 2 +-
kernel/bpf/syscall.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e3a2662f4e33..b969ca4d7af0 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1593,7 +1593,7 @@ void bpf_timer_cancel_and_free(void *val)
* timer callback.
*/
if (this_cpu_read(hrtimer_running)) {
- queue_work(system_unbound_wq, &t->cb.delete_work);
+ queue_work(system_dfl_wq, &t->cb.delete_work);
return;
}
@@ -1606,7 +1606,7 @@ void bpf_timer_cancel_and_free(void *val)
if (hrtimer_try_to_cancel(&t->timer) >= 0)
kfree_rcu(t, cb.rcu);
else
- queue_work(system_unbound_wq, &t->cb.delete_work);
+ queue_work(system_dfl_wq, &t->cb.delete_work);
} else {
bpf_timer_delete_work(&t->cb.delete_work);
}
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 889374722d0a..bd45dda9dc35 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -736,7 +736,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
/* Defer barriers into worker to let the rest of map memory to be freed */
memset(ma, 0, sizeof(*ma));
INIT_WORK(&copy->work, free_mem_alloc_deferred);
- queue_work(system_unbound_wq, &copy->work);
+ queue_work(system_dfl_wq, &copy->work);
}
void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9794446bc8c6..bb6f85fda240 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -901,7 +901,7 @@ static void bpf_map_free_in_work(struct bpf_map *map)
/* Avoid spawning kworkers, since they all might contend
* for the same mutex like slab_mutex.
*/
- queue_work(system_unbound_wq, &map->work);
+ queue_work(system_dfl_wq, &map->work);
}
static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
--
2.51.0
* [PATCH 3/3] bpf: WQ_PERCPU added to alloc_workqueue users
2025-09-05 8:53 [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
2025-09-05 8:53 ` [PATCH 1/3] bpf: replace use of system_wq with system_percpu_wq Marco Crivellari
2025-09-05 8:53 ` [PATCH 2/3] bpf: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
@ 2025-09-05 8:53 ` Marco Crivellari
2025-09-07 18:30 ` [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Alexei Starovoitov
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Marco Crivellari @ 2025-09-05 8:53 UTC (permalink / raw)
To: linux-kernel, bpf
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This patch adds a new WQ_PERCPU flag to explicitly request the use of
the per-CPU behavior. Both flags coexist for one release cycle to allow
callers to transition their calls.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
All existing users have been updated accordingly.
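The conversion rule described above (a caller that does not pass WQ_UNBOUND must now pass WQ_PERCPU) can be sketched as a small userspace helper. `demo_migrate_flags()` is hypothetical and the bit values are placeholders for illustration; the patch itself simply edits each call site by hand.

```c
#include <assert.h>

/* Placeholder bit values (assumption); see include/linux/workqueue.h. */
enum {
	DEMO_WQ_UNBOUND = 1u << 1,
	DEMO_WQ_PERCPU  = 1u << 8,
};

static_assert((DEMO_WQ_UNBOUND & DEMO_WQ_PERCPU) == 0,
	      "the two flags must not share bits");

/*
 * Return the flags a call site should pass after the migration:
 * leave explicit choices alone, make the implicit per-cpu default
 * explicit.
 */
static unsigned int demo_migrate_flags(unsigned int flags)
{
	if (flags & (DEMO_WQ_UNBOUND | DEMO_WQ_PERCPU))
		return flags;

	return flags | DEMO_WQ_PERCPU;
}
```

Applied to the cgroup_bpf_destroy workqueue, which passed 0 as flags, the helper returns the per-cpu flag alone, matching the change in the diff below.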
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
kernel/bpf/cgroup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b8699ec4d766..f3da9400c178 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -34,7 +34,8 @@ static struct workqueue_struct *cgroup_bpf_destroy_wq;
static int __init cgroup_bpf_wq_init(void)
{
- cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1);
+ cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy",
+ WQ_PERCPU, 1);
if (!cgroup_bpf_destroy_wq)
panic("Failed to alloc workqueue for cgroup bpf destroy.\n");
return 0;
--
2.51.0
* Re: [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users
2025-09-05 8:53 [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
` (2 preceding siblings ...)
2025-09-05 8:53 ` [PATCH 3/3] bpf: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
@ 2025-09-07 18:30 ` Alexei Starovoitov
2025-09-08 15:43 ` Tejun Heo
2025-09-08 17:10 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2025-09-07 18:30 UTC (permalink / raw)
To: Marco Crivellari
Cc: LKML, bpf, Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
On Fri, Sep 5, 2025 at 1:53 AM Marco Crivellari
<marco.crivellari@suse.com> wrote:
>
> === Plan and future plans ===
>
> This patchset is the first step in a refactoring needed to address the
> points above; in the long term it will also benefit cpu isolation by
> moving away from per-cpu workqueues in favor of an unbound model.
>
> These are the main steps:
> 1) API refactoring (which this patchset introduces)
> - Make the system wq names clearer and more uniform, both per-cpu and
> unbound, to avoid any confusion about which one should be used.
>
> - Introduction of WQ_PERCPU: this flag is the complement of WQ_UNBOUND,
> introduced in this patchset and used on all the callers that are not
> currently using WQ_UNBOUND.
>
> WQ_UNBOUND will be removed in a future release cycle.
>
> Most users don't need to be per-cpu because they have no locality
> requirements; for that reason, a future step will be to make "unbound"
> the default behavior.
>
> 2) Check who really needs to be per-cpu
> - Remove the WQ_PERCPU flag when it is not strictly required.
>
> 3) Add a new API (prefer local cpu)
> - There are users that don't require local execution, as mentioned
> above; despite that, local execution yields a performance gain.
>
> This new API will prefer local execution without requiring it.
>
> === Changes introduced by this series ===
>
> 1) [P 1-2] Replace use of system_wq and system_unbound_wq
>
> system_wq is a per-CPU workqueue, but its name does not make that clear.
> system_unbound_wq is to be used when locality is not required.
>
> Because of that, system_wq has been renamed to system_percpu_wq, and
> system_unbound_wq has been renamed to system_dfl_wq.
>
> 2) [P 3] add WQ_PERCPU to remaining alloc_workqueue() users
>
> Every alloc_workqueue() caller should use exactly one of WQ_PERCPU or
> WQ_UNBOUND. This is enforced with a warning if both or neither of them
> are present.
>
> WQ_UNBOUND will be removed in a future release cycle.
>
> === For Maintainers ===
>
> There are prerequisites for this series, already merged in the master branch.
Everything makes sense.
Tejun,
please ack this set just to make sure it's all going as planned.
* Re: [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users
2025-09-05 8:53 [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
` (3 preceding siblings ...)
2025-09-07 18:30 ` [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Alexei Starovoitov
@ 2025-09-08 15:43 ` Tejun Heo
2025-09-08 17:10 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2025-09-08 15:43 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, bpf, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Michal Hocko, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
On Fri, Sep 05, 2025 at 10:53:06AM +0200, Marco Crivellari wrote:
...
> Marco Crivellari (3):
> bpf: replace use of system_wq with system_percpu_wq
> bpf: replace use of system_unbound_wq with system_dfl_wq
> bpf: WQ_PERCPU added to alloc_workqueue users
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
* Re: [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users
2025-09-05 8:53 [PATCH 0/3] bpf: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
` (4 preceding siblings ...)
2025-09-08 15:43 ` Tejun Heo
@ 2025-09-08 17:10 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-09-08 17:10 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, bpf, tj, jiangshanlai, frederic, bigeasy, mhocko,
ast, daniel, andrii
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Fri, 5 Sep 2025 10:53:06 +0200 you wrote:
> Hi!
>
> Below is a summary of a discussion about the Workqueue API and cpu isolation
> considerations. Details and more information are available here:
>
> "workqueue: Always use wq_select_unbound_cpu() for WORK_CPU_UNBOUND."
> https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
>
> [...]
Here is the summary with links:
- [1/3] bpf: replace use of system_wq with system_percpu_wq
https://git.kernel.org/bpf/bpf-next/c/34f86083a4e1
- [2/3] bpf: replace use of system_unbound_wq with system_dfl_wq
https://git.kernel.org/bpf/bpf-next/c/0409819a0021
- [3/3] bpf: WQ_PERCPU added to alloc_workqueue users
https://git.kernel.org/bpf/bpf-next/c/a857210b104f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html