Linux cgroups development
* [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users
@ 2025-09-05  8:54 Marco Crivellari
  2025-09-05  8:54 ` [PATCH 1/2] cgroup: replace use of system_wq with system_percpu_wq Marco Crivellari
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Marco Crivellari @ 2025-09-05  8:54 UTC (permalink / raw)
  To: linux-kernel, cgroups
  Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
	Johannes Weiner, Michal Koutny

Hi!

Below is a summary of a discussion about the Workqueue API and cpu isolation
considerations. Details and more information are available here:

        "workqueue: Always use wq_select_unbound_cpu() for WORK_CPU_UNBOUND."
        https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/

=== Current situation: problems ===

Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, while for !WQ_UNBOUND the local CPU is selected.

This leads to different outcomes when a work item is scheduled on an isolated
CPU, depending on whether the "delay" value is 0 or greater than 0:

        schedule_delayed_work(, 0);

This will be handled by __queue_work(), which will queue the work item on the
current local (isolated) CPU, while:

        schedule_delayed_work(, 1);

will move the timer to a housekeeping CPU and schedule the work there.
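
To make the two cases above concrete, here is a small userspace sketch. It
is not kernel code: struct wq_model, first_housekeeping_cpu() and
model_target_cpu() are hypothetical names used only for illustration, and
the rule that a housekeeping CPU keeps its own timer is an assumption of
this model rather than something stated above.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code: CPUs are bits in a mask. */
struct wq_model {
	unsigned long housekeeping_mask;	/* stands in for wq_unbound_cpumask */
};

/* Return the lowest-numbered housekeeping CPU, or -1 if the mask is empty. */
static int first_housekeeping_cpu(const struct wq_model *m)
{
	int nbits = (int)(8 * sizeof(m->housekeeping_mask));

	for (int cpu = 0; cpu < nbits; cpu++)
		if (m->housekeeping_mask & (1UL << cpu))
			return cpu;
	return -1;
}

/*
 * Model where a per-cpu delayed work item queued from local_cpu ends up:
 *   delay == 0: queued directly on the local CPU, even if it is isolated;
 *   delay  > 0: the timer moves to a housekeeping CPU and the work runs
 *               there (assumption: a housekeeping CPU keeps its own timer).
 */
static int model_target_cpu(const struct wq_model *m, int local_cpu,
			    unsigned long delay)
{
	bool local_is_housekeeping = m->housekeeping_mask & (1UL << local_cpu);

	if (delay == 0 || local_is_housekeeping)
		return local_cpu;
	return first_housekeeping_cpu(m);
}
```

With CPUs 0-1 as housekeeping and CPU 5 isolated, the model keeps a
zero-delay item on CPU 5 but sends an item with delay 1 to CPU 0, which is
the asymmetry described above.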

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

=== Plan and future plans ===

This patchset is the first step of the refactoring needed to address the
points mentioned above; in the long term it will also have a positive impact
on cpu isolation, by moving away from per-cpu workqueues in favor of an
unbound model.

These are the main steps:
1)  API refactoring (introduced by this patchset)
    -   Make the system wq names, both per-cpu and unbound, clearer and
        more uniform, to avoid any confusion about which one should be
        used.

    -   Introduction of WQ_PERCPU: this flag is the complement of WQ_UNBOUND.
        It is introduced in this patchset and used by all the callers that
        are not currently using WQ_UNBOUND.

        WQ_UNBOUND will be removed in a future release cycle.

        Most users don't need to be per-cpu because they have no locality
        requirements; for that reason, a future step will be to make
        "unbound" the default behavior.

2)  Check who really needs to be per-cpu
    -   Remove the WQ_PERCPU flag where it is not strictly required.

3)  Add a new API (prefer local cpu)
    -   There are users that don't require local execution, as mentioned
        above; despite that, local execution yields a performance gain.

        This new API will prefer local execution without requiring it.

=== Changes introduced by this series ===

1) [P 1] Replace use of system_wq

        system_wq is a per-CPU workqueue, but its name does not make that
        clear.

        Because of that, system_wq has been renamed to system_percpu_wq.

2) [P 2] add WQ_PERCPU to remaining alloc_workqueue() users

        Every alloc_workqueue() caller should use exactly one of WQ_PERCPU or
        WQ_UNBOUND. This is enforced by a warning if both or neither of them
        are present.

        WQ_UNBOUND will be removed in a future release cycle.
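
The "exactly one of the two flags" rule can be sketched in plain userspace
C as follows. This is only an illustration of the rule as stated here:
MODEL_WQ_UNBOUND, MODEL_WQ_PERCPU and model_wq_flags_ok() are hypothetical
names, the bit values are placeholders rather than the kernel's actual
definitions, and the real check in the workqueue code may be written
differently.

```c
#include <stdbool.h>

/*
 * Placeholder bit values for illustration only; the real flags are
 * defined in include/linux/workqueue.h and may use different bits.
 */
#define MODEL_WQ_UNBOUND	(1u << 0)
#define MODEL_WQ_PERCPU		(1u << 1)

/*
 * Return true when exactly one of the two flags is set, i.e. the caller
 * made an explicit choice. Both set, or neither set, is the case that
 * the cover letter says triggers a warning.
 */
static bool model_wq_flags_ok(unsigned int flags)
{
	bool unbound = flags & MODEL_WQ_UNBOUND;
	bool percpu = flags & MODEL_WQ_PERCPU;

	return unbound != percpu;
}
```

Under this model a caller that used to pass 0 as flags, as the cgroup
callers did before patch 2, would fail the check until it passes the
per-cpu flag explicitly.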

=== For Maintainers ===

There are prerequisites for this series, already merged in the master branch.
The commits are:

128ea9f6ccfb6960293ae4212f4f97165e42222d ("workqueue: Add system_percpu_wq and
system_dfl_wq")

930c2ea566aff59e962c50b2421d5fcc3b98b8be ("workqueue: Add new WQ_PERCPU flag")


Thanks!

Marco Crivellari (2):
  cgroup: replace use of system_wq with system_percpu_wq
  cgroup: WQ_PERCPU added to alloc_workqueue users

 kernel/cgroup/cgroup-v1.c | 2 +-
 kernel/cgroup/cgroup.c    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.51.0



* [PATCH 1/2] cgroup: replace use of system_wq with system_percpu_wq
  2025-09-05  8:54 [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
@ 2025-09-05  8:54 ` Marco Crivellari
  2025-09-05  8:54 ` [PATCH 2/2] cgroup: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
  2025-09-05 16:41 ` [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users Tejun Heo
  2 siblings, 0 replies; 5+ messages in thread
From: Marco Crivellari @ 2025-09-05  8:54 UTC (permalink / raw)
  To: linux-kernel, cgroups
  Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
	Johannes Weiner, Michal Koutny

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_wq is a per-CPU workqueue, yet nothing in its name conveys that
CPU affinity constraint, which is very often not required by users. Make
it clear by adding a system_percpu_wq.

queue_work(), queue_delayed_work() and mod_delayed_work() will now use the
new per-cpu wq: if a user still sticks to the old name, a warning will
be printed and the work will be redirected to the new wq.

This patch adds the new system_percpu_wq, except for the mm, fs and net
subsystems, which are handled in separate patches.

The old wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
 kernel/cgroup/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 3caf2cd86e65..1e39355194fd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -121,7 +121,7 @@ DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem);
 /*
  * cgroup destruction makes heavy use of work items and there can be a lot
  * of concurrent destructions.  Use a separate workqueue so that cgroup
- * destruction work items don't end up filling up max_active of system_wq
+ * destruction work items don't end up filling up max_active of system_percpu_wq
  * which may lead to deadlock.
  */
 static struct workqueue_struct *cgroup_destroy_wq;
-- 
2.51.0



* [PATCH 2/2] cgroup: WQ_PERCPU added to alloc_workqueue users
  2025-09-05  8:54 [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
  2025-09-05  8:54 ` [PATCH 1/2] cgroup: replace use of system_wq with system_percpu_wq Marco Crivellari
@ 2025-09-05  8:54 ` Marco Crivellari
  2025-09-05 16:41 ` [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users Tejun Heo
  2 siblings, 0 replies; 5+ messages in thread
From: Marco Crivellari @ 2025-09-05  8:54 UTC (permalink / raw)
  To: linux-kernel, cgroups
  Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
	Johannes Weiner, Michal Koutny

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This patch adds a new WQ_PERCPU flag to explicitly request the use of
the per-CPU behavior. Both flags coexist for one release cycle to allow
callers to transition their calls.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

All existing users have been updated accordingly.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
 kernel/cgroup/cgroup-v1.c | 2 +-
 kernel/cgroup/cgroup.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index fa24c032ed6f..779d586e191c 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1321,7 +1321,7 @@ static int __init cgroup1_wq_init(void)
 	 * Cap @max_active to 1 too.
 	 */
 	cgroup_pidlist_destroy_wq = alloc_workqueue("cgroup_pidlist_destroy",
-						    0, 1);
+						    WQ_PERCPU, 1);
 	BUG_ON(!cgroup_pidlist_destroy_wq);
 	return 0;
 }
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1e39355194fd..54a66cf0cef9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6281,7 +6281,7 @@ static int __init cgroup_wq_init(void)
 	 * We would prefer to do this in cgroup_init() above, but that
 	 * is called before init_workqueues(): so leave this until after.
 	 */
-	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
+	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", WQ_PERCPU, 1);
 	BUG_ON(!cgroup_destroy_wq);
 	return 0;
 }
-- 
2.51.0



* Re: [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users
  2025-09-05  8:54 [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
  2025-09-05  8:54 ` [PATCH 1/2] cgroup: replace use of system_wq with system_percpu_wq Marco Crivellari
  2025-09-05  8:54 ` [PATCH 2/2] cgroup: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
@ 2025-09-05 16:41 ` Tejun Heo
  2025-09-09 10:26   ` Marco Crivellari
  2 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2025-09-05 16:41 UTC (permalink / raw)
  To: Marco Crivellari
  Cc: linux-kernel, cgroups, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko, Johannes Weiner,
	Michal Koutny

On Fri, Sep 05, 2025 at 10:54:34AM +0200, Marco Crivellari wrote:
...
> Marco Crivellari (2):
>   cgroup: replace use of system_wq with system_percpu_wq
>   cgroup: WQ_PERCPU added to alloc_workqueue users

Applied to cgroup/for-6.18 with dup para removed from the description of the
second patch.

Thanks.

-- 
tejun


* Re: [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users
  2025-09-05 16:41 ` [PATCH 0/2] cgroup: replace wq users and add WQ_PERCPU to alloc_workqueue() users Tejun Heo
@ 2025-09-09 10:26   ` Marco Crivellari
  0 siblings, 0 replies; 5+ messages in thread
From: Marco Crivellari @ 2025-09-09 10:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, cgroups, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko, Johannes Weiner,
	Michal Koutny

On Fri, Sep 5, 2025 at 6:41 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, Sep 05, 2025 at 10:54:34AM +0200, Marco Crivellari wrote:
> ...
> > Marco Crivellari (2):
> >   cgroup: replace use of system_wq with system_percpu_wq
> >   cgroup: WQ_PERCPU added to alloc_workqueue users
>
> Applied to cgroup/for-6.18 with dup para removed from the description of the
> second patch.

Thank you!

-- 

Marco Crivellari

L3 Support Engineer, Technology & Product

marco.crivellari@suse.com
