* [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
@ 2026-05-05 16:16 Marco Crivellari
2026-05-05 16:16 ` [RFC PATCH 1/2] workqueue: Add queue_*() functions, future schedule_*() replacement Marco Crivellari
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Marco Crivellari @ 2026-05-05 16:16 UTC (permalink / raw)
To: linux-kernel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko
Hi,
The following is part of the Workqueue refactoring, and a first RFC about
the rename of the schedule_*() interfaces along with the introduction of
the "wq prefer per-cpu" workqueue and workqueue flag.
Any feedback is more than welcome!
~~~
More information about the reasons behind the workqueue refactoring can
be found at:
https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Currently, schedule_work() and schedule_work_on() enqueue work items on
system_percpu_wq. The function names don't suggest this and, on top of
that, only the per-cpu variants exist.
Because of that, the following changes are introduced:
- queue_{bound|unbound}_work() as future replacement of schedule_work()
- queue_bound_work_on() as future replacement of schedule_work_on()
- queue_bound_delayed_work() as future replacement of
schedule_delayed_work()
- queue_unbound_delayed_work() to offer the unbound version
- queue_bound_delayed_work_on() as future replacement of
schedule_delayed_work_on()
queue_unbound_delayed_work() is added because the "delayed" functions
make use of a timer, and the work will then be executed on the CPU where
the timer fired.
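For illustration, this is how a caller would pick between the proposed
helpers (a sketch against the interfaces introduced in this series;
my_work_fn and the work items are hypothetical names):

```c
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void my_work_fn(struct work_struct *work)
{
	/* ... deferred processing ... */
}

static DECLARE_WORK(my_work, my_work_fn);
static DECLARE_DELAYED_WORK(my_dwork, my_work_fn);

static void example_enqueue(void)
{
	/* locality required: execute on the queueing CPU */
	queue_bound_work(&my_work);

	/* no locality requirement: let an unbound worker run it */
	queue_unbound_work(&my_work);

	/* delayed + unbound: not tied to the CPU whose timer fires */
	queue_unbound_delayed_work(&my_dwork, msecs_to_jiffies(100));
}
```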
The workqueue API currently does not distinguish between use cases where
locality is important for correctness and those where it is important
only for efficiency. So introduce WQ_PREFER_PERCPU and
system_prefer_percpu_wq, so that work items which prefer to be per-cpu
but don't strictly require it can use that workqueue / workqueue flag.
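As a sketch of the flag's intended use (assuming the semantics proposed
in patch 2; the "stats" workqueue and function names are hypothetical):

```c
#include <linux/workqueue.h>

/*
 * Work here benefits from locality but stays correct on any CPU;
 * an isolation setup may divert it to housekeeping CPUs.
 */
static struct workqueue_struct *stats_wq;

static int __init stats_init(void)
{
	stats_wq = alloc_workqueue("stats", WQ_PREFER_PERCPU, 0);
	if (!stats_wq)
		return -ENOMEM;
	return 0;
}
```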
Thanks!
Marco Crivellari (2):
workqueue: Add queue_*() functions, future schedule_*() replacement
workqueue: Add WQ_PREFER_PERCPU and system_prefer_percpu_wq
include/linux/workqueue.h | 108 ++++++++++++++++++++++++++++++++++++++
kernel/workqueue.c | 6 ++-
2 files changed, 113 insertions(+), 1 deletion(-)
--
2.53.0
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/2] workqueue: Add queue_*() functions, future schedule_*() replacement
  2026-05-05 16:16 [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Marco Crivellari
@ 2026-05-05 16:16 ` Marco Crivellari
  2026-05-05 16:16 ` [RFC PATCH 2/2] workqueue: Add WQ_PREFER_PERCPU and system_prefer_percpu_wq Marco Crivellari
  2026-05-05 20:18 ` [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Tejun Heo
  2 siblings, 0 replies; 10+ messages in thread
From: Marco Crivellari @ 2026-05-05 16:16 UTC (permalink / raw)
To: linux-kernel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko

This is part of the workqueue refactoring. More details can be found
at the Link below.

The current schedule_*() interface used to schedule work items doesn't
distinguish between bound and unbound workqueues; only system_percpu_wq
is used. So introduce bound and unbound versions.

To better reflect what these functions do, rename them into a cleaner
and unified interface, dropping the "schedule_*()" prefix in favor of
"queue_*()".

This change introduces:

- queue_{bound|unbound}_work() with the bound version being the future
  replacement of schedule_work()
- queue_bound_work_on() as future replacement of schedule_work_on()
- queue_bound_delayed_work() as future replacement of
  schedule_delayed_work()
- queue_unbound_delayed_work() to offer the unbound version
- queue_bound_delayed_work_on() as future replacement of
  schedule_delayed_work_on()

A further step will be the conversion of all users to the newly
introduced interfaces and, where locality is not strictly required,
their migration to the unbound version. In a future release cycle, once
users are migrated, the schedule_*() interface will be removed.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
 include/linux/workqueue.h | 101 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index ab6cb70ca1a5..f46379d937c9 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -732,12 +732,26 @@ static inline bool mod_delayed_work(struct workqueue_struct *wq,
  * @work: job to be done
  *
  * This puts a job on a specific cpu
+ *
+ * Note: this function will be replaced by queue_bound_work_on()
  */
 static inline bool schedule_work_on(int cpu, struct work_struct *work)
 {
 	return queue_work_on(cpu, system_percpu_wq, work);
 }
 
+/**
+ * queue_bound_work_on - put work task on a specific cpu
+ * @cpu: cpu to put the work task on
+ * @work: job to be done
+ *
+ * This puts a job on a specific cpu
+ */
+static inline bool queue_bound_work_on(int cpu, struct work_struct *work)
+{
+	return queue_work_on(cpu, system_percpu_wq, work);
+}
+
 /**
  * schedule_work - put work task in per-CPU workqueue
  * @work: job to be done
@@ -751,12 +765,53 @@ static inline bool schedule_work_on(int cpu, struct work_struct *work)
  *
  * Shares the same memory-ordering properties of queue_work(), cf. the
  * DocBook header of queue_work().
+ *
+ * Note: this function will be removed in future, use queue_{bound|unbound}_work()
+ * instead.
  */
 static inline bool schedule_work(struct work_struct *work)
 {
 	return queue_work(system_percpu_wq, work);
 }
 
+/**
+ * queue_bound_work - put work task in per-CPU workqueue
+ * @work: job to be done
+ *
+ * Returns %false if @work was already on the system per-CPU workqueue and
+ * %true otherwise.
+ *
+ * This puts a job in the system per-CPU workqueue if it was not already
+ * queued and leaves it in the same position on the system per-CPU
+ * workqueue otherwise.
+ *
+ * Shares the same memory-ordering properties of queue_work(), cf. the
+ * DocBook header of queue_work().
+ */
+static inline bool queue_bound_work(struct work_struct *work)
+{
+	return queue_work(system_percpu_wq, work);
+}
+
+/**
+ * queue_unbound_work - put work task in unbound workqueue
+ * @work: job to be done
+ *
+ * Returns %false if @work was already on the system unbound workqueue and
+ * %true otherwise.
+ *
+ * This puts a job in the system unbound workqueue if it was not already
+ * queued and leaves it in the same position on the system unbound
+ * workqueue otherwise.
+ *
+ * Shares the same memory-ordering properties of queue_work(), cf. the
+ * DocBook header of queue_work().
+ */
+static inline bool queue_unbound_work(struct work_struct *work)
+{
+	return queue_work(system_dfl_wq, work);
+}
+
 /**
  * enable_and_queue_work - Enable and queue a work item on a specific workqueue
  * @wq: The target workqueue
@@ -832,6 +887,9 @@ extern void __warn_flushing_systemwide_wq(void)
  *
  * After waiting for a given time this puts a job in the system per-CPU
  * workqueue on the specified CPU.
+ *
+ * Note: this function will be removed. Please use queue_delayed_bound_work_on()
+ * instead
  */
 static inline bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,
 					    unsigned long delay)
@@ -839,6 +897,21 @@ static inline bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,
 	return queue_delayed_work_on(cpu, system_percpu_wq, dwork, delay);
 }
 
+/**
+ * queue_delayed_bound_work_on - queue work in per-CPU workqueue on CPU after delay
+ * @cpu: cpu to use
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait
+ *
+ * After waiting for a given time this puts a job in the system per-CPU
+ * workqueue on the specified CPU.
+ */
+static inline bool queue_delayed_bound_work_on(int cpu, struct delayed_work *dwork,
+					       unsigned long delay)
+{
+	return queue_delayed_work_on(cpu, system_percpu_wq, dwork, delay);
+}
+
 /**
  * schedule_delayed_work - put work task in per-CPU workqueue after delay
  * @dwork: job to be done
@@ -853,6 +926,34 @@ static inline bool schedule_delayed_work(struct delayed_work *dwork,
 	return queue_delayed_work(system_percpu_wq, dwork, delay);
 }
 
+/**
+ * queue_delayed_bound_work - put work task in per-CPU workqueue after delay
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait or 0 for immediate execution
+ *
+ * After waiting for a given time this puts a job in the system per-CPU
+ * workqueue.
+ */
+static inline bool queue_delayed_bound_work(struct delayed_work *dwork,
+					    unsigned long delay)
+{
+	return queue_delayed_work(system_percpu_wq, dwork, delay);
+}
+
+/**
+ * queue_delayed_unbound_work - put work task in unbound workqueue after delay
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait or 0 for immediate execution
+ *
+ * After waiting for a given time this puts a job in the system unbound
+ * workqueue.
+ */
+static inline bool queue_delayed_unbound_work(struct delayed_work *dwork,
+					      unsigned long delay)
+{
+	return queue_delayed_work(system_dfl_wq, dwork, delay);
+}
+
 #ifndef CONFIG_SMP
 static inline long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [RFC PATCH 2/2] workqueue: Add WQ_PREFER_PERCPU and system_prefer_percpu_wq
  2026-05-05 16:16 [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Marco Crivellari
  2026-05-05 16:16 ` [RFC PATCH 1/2] workqueue: Add queue_*() functions, future schedule_*() replacement Marco Crivellari
@ 2026-05-05 16:16 ` Marco Crivellari
  2026-05-05 20:18 ` [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Tejun Heo
  2 siblings, 0 replies; 10+ messages in thread
From: Marco Crivellari @ 2026-05-05 16:16 UTC (permalink / raw)
To: linux-kernel
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko

There may be situations where local execution is not strictly needed for
correctness, but it is preferred for performance gains. The workqueue
API currently does not distinguish between these two cases.

So add WQ_PREFER_PERCPU and system_prefer_percpu_wq, so that they can be
the first choice for workloads that don't strictly need local execution
for correctness but only prefer it for locality.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
 include/linux/workqueue.h | 7 +++++++
 kernel/workqueue.c        | 6 +++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index f46379d937c9..be65df3dea5b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -404,6 +404,7 @@ enum wq_flags {
	 */
	WQ_POWER_EFFICIENT	= 1 << 7,
	WQ_PERCPU		= 1 << 8,  /* bound to a specific cpu */
+	WQ_PREFER_PERCPU	= 1 << 9,  /* prefer local cpu, but it doesn't require it */
 
	__WQ_DESTROYING		= 1 << 15, /* internal: workqueue is destroying */
	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
@@ -460,6 +461,9 @@ enum wq_consts {
 *
 * system_bh[_highpri]_wq are convenience interface to softirq.
BH work items
 * are executed in the queueing CPU's BH context in the queueing order.
+ *
+ * system_prefer_percpu_wq is for work items which prefer to be local but
+ * don't require it
 */
extern struct workqueue_struct *system_wq; /* use system_percpu_wq, this will be removed */
extern struct workqueue_struct *system_percpu_wq;
@@ -473,6 +477,7 @@ extern struct workqueue_struct *system_freezable_power_efficient_wq;
 extern struct workqueue_struct *system_bh_wq;
 extern struct workqueue_struct *system_bh_highpri_wq;
 extern struct workqueue_struct *system_dfl_long_wq;
+extern struct workqueue_struct *system_prefer_percpu_wq;
 
 void workqueue_softirq_action(bool highpri);
 void workqueue_softirq_dead(unsigned int cpu);
@@ -873,6 +878,8 @@ extern void __warn_flushing_systemwide_wq(void)
	     _wq == system_freezable_wq) ||				\
	    (__builtin_constant_p(_wq == system_power_efficient_wq) &&	\
	     _wq == system_power_efficient_wq) ||			\
+	    (__builtin_constant_p(_wq == system_prefer_percpu_wq) &&	\
+	     _wq == system_prefer_percpu_wq) ||				\
	    (__builtin_constant_p(_wq == system_freezable_power_efficient_wq) && \
	     _wq == system_freezable_power_efficient_wq))		\
		__warn_flushing_systemwide_wq();			\
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5f747f241a5f..abba222a2d58 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -545,6 +545,8 @@ struct workqueue_struct *system_bh_highpri_wq;
 EXPORT_SYMBOL_GPL(system_bh_highpri_wq);
 struct workqueue_struct *system_dfl_long_wq __ro_after_init;
 EXPORT_SYMBOL_GPL(system_dfl_long_wq);
+struct workqueue_struct *system_prefer_percpu_wq __ro_after_init;
+EXPORT_SYMBOL_GPL(system_prefer_percpu_wq);
 
 static int worker_thread(void *__worker);
 static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
@@ -8029,11 +8031,13 @@ void __init workqueue_init_early(void)
	system_bh_highpri_wq = alloc_workqueue("events_bh_highpri",
					       WQ_BH | WQ_HIGHPRI | WQ_PERCPU, 0);
	system_dfl_long_wq = alloc_workqueue("events_dfl_long", WQ_UNBOUND, WQ_MAX_ACTIVE);
+
system_prefer_percpu_wq = alloc_workqueue("events_prefer_percpu", WQ_PREFER_PERCPU, 0);
 
	BUG_ON(!system_wq || !system_percpu_wq|| !system_highpri_wq || !system_long_wq ||
	       !system_unbound_wq || !system_freezable_wq || !system_dfl_wq ||
	       !system_power_efficient_wq ||
	       !system_freezable_power_efficient_wq ||
-	       !system_bh_wq || !system_bh_highpri_wq || !system_dfl_long_wq);
+	       !system_bh_wq || !system_bh_highpri_wq || !system_dfl_long_wq ||
+	       !system_prefer_percpu_wq);
 }
 
 static void __init wq_cpu_intensive_thresh_init(void)
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-05 16:16 [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Marco Crivellari
  2026-05-05 16:16 ` [RFC PATCH 1/2] workqueue: Add queue_*() functions, future schedule_*() replacement Marco Crivellari
  2026-05-05 16:16 ` [RFC PATCH 2/2] workqueue: Add WQ_PREFER_PERCPU and system_prefer_percpu_wq Marco Crivellari
@ 2026-05-05 20:18 ` Tejun Heo
  2026-05-06 13:40   ` Breno Leitao
  2 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2026-05-05 20:18 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko, Breno Leitao

(cc'ing Breno)

Hello,

On Tue, May 05, 2026 at 06:16:56PM +0200, Marco Crivellari wrote:
> Actually schedule_work() and schedule_work_on() enqueue works using
> system_percpu_wq. The function name doesn't suggest it, on top of that,
> only the per-cpu version is present.

I was hoping to just retire schedule_work[_on]() and let people use e.g.
system_percpu_wq directly. Is that too verbose for casual users?

> Because of that, the following changes are introduced:
>
> - queue_{bound|unbound}_work() as future replacement of schedule_work()

If we do this, I think "percpu" is a lot clearer than "bound". percpu <->
(nothing) combination would be nice eventually but maybe that's too
confusing now. Does percpu <-> unbound combination sound weird?

...

> The Workqueue API currently do not distinguish between use case where
> locality is important for correctness and where is important for
> efficiency. So introduce WQ_PREFER_PERCPU and wq_prefer_percpu_wq, so
> that works who need to be per-cpu but don't strictly require it, can
> use such workqueue / workqueue flag.

What's requested through WQ_PREFER_PERCPU is similar to what WQ_AFFN_CPU
does, so that might just work out.
The only problem is that WQ_AFFN_CPU will create nr_cpus workers to populate
the per-cpu pods on boot. Maybe that's not a problem if this gets used
widely.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-05 20:18 ` [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Tejun Heo
@ 2026-05-06 13:40   ` Breno Leitao
  2026-05-07 10:25     ` Marco Crivellari
  0 siblings, 1 reply; 10+ messages in thread
From: Breno Leitao @ 2026-05-06 13:40 UTC (permalink / raw)
To: Tejun Heo
Cc: Marco Crivellari, linux-kernel, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko

On Tue, May 05, 2026 at 10:18:49AM -1000, Tejun Heo wrote:
> (cc'ing Breno)

Thanks!

> On Tue, May 05, 2026 at 06:16:56PM +0200, Marco Crivellari wrote:
> > Actually schedule_work() and schedule_work_on() enqueue works using
> > system_percpu_wq. The function name doesn't suggest it, on top of that,
> > only the per-cpu version is present.
>
> I was hoping to just retire schedule_work[_on]() and let people use e.g.
> system_percpu_wq directly. Is that too verbose for casual users?

I think schedule_work() doesn't help much, and makes the system a bit harder to
understand. When I started reading this code, I would have preferred to see
queue_work(system_percpu_wq, work) instead of schedule_work(work).

In fact, I suspect this patchset exists partly because we have the
schedule_work() helper. Would this proposal exist if schedule_work() had
never been added?

> > Because of that, the following changes are introduced:
> >
> > - queue_{bound|unbound}_work() as future replacement of schedule_work()
>
> If we do this, I think "percpu" is a lot clearer than "bound". percpu <->
> (nothing) combination would be nice eventually but maybe that's too
> confusing now. Does percpu <-> unbound combination sound weird?

Would percpu <-> global sound less weird?

> > The Workqueue API currently do not distinguish between use case where
> > locality is important for correctness and where is important for
> > efficiency.
If you enqueue work to system_unbound_wq with the default affinitization, you
already get locality (WQ_AFFN_CACHE groups CPUs sharing the same LLC). This is
the way to say that locality is important for efficiency, and WQ_AFFN_CPU is
the way to specify that locality is important for correctness.

On top of that, WQ_AFFN_SYSTEM is a way to specify that locality is not
necessary at all.

Also, how does WQ_PREFER_PERCPU behave differently from WQ_AFFN_CPU?

Thanks for the RFC,
--breno

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-06 13:40   ` Breno Leitao
@ 2026-05-07 10:25     ` Marco Crivellari
  2026-05-07 21:27       ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Marco Crivellari @ 2026-05-07 10:25 UTC (permalink / raw)
To: Breno Leitao
Cc: Tejun Heo, linux-kernel, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko

On Wed, May 6, 2026 at 3:40 PM Breno Leitao <leitao@debian.org> wrote:
>
> On Tue, May 05, 2026 at 10:18:49AM -1000, Tejun Heo wrote:
> > (cc'ing Breno)

Thanks Tejun and Breno for your feedback.

> > On Tue, May 05, 2026 at 06:16:56PM +0200, Marco Crivellari wrote:
> > > Actually schedule_work() and schedule_work_on() enqueue works using
> > > system_percpu_wq. The function name doesn't suggest it, on top of that,
> > > only the per-cpu version is present.
> >
> > I was hoping to just retire schedule_work[_on]() and let people use e.g.
> > system_percpu_wq directly. Is that too verbose for casual users?
>
> I think schedule_work() doesn't help much, and makes the system a bit harder to
> understand. When I started reading this code, I would have preferred to see
> queue_work(system_percpu_wq, work) instead of schedule_work(work).
>
> In fact, I suspect this patchset exists partly because we have the
> schedule_work() helper.

Yes, correct. Perhaps retiring schedule_work[_on](), as Tejun suggested,
would be the easiest way indeed.

I proposed this in light of our next step (which I would say is the first):
ensuring that every schedule_work() really needs to use the per-cpu
workqueue and offloading work that can be unbound to the unbound workqueue.

So, either we're going to have an "unbound" version or we use queue_work()
directly; that sounds good to me. I guess retiring schedule_work[_on]()
in future would be cleaner: that way users must also specify the workqueue
they really need to use.
> > > Because of that, the following changes are introduced:
> > >
> > > - queue_{bound|unbound}_work() as future replacement of schedule_work()
> >
> > If we do this, I think "percpu" is a lot clearer than "bound". percpu <->
> > (nothing) combination would be nice eventually but maybe that's too
> > confusing now. Does percpu <-> unbound combination sound weird?
>
> Would percpu <-> global sound less weird?

Now that I read your inputs, if we rename, perhaps we can keep the current
abbreviation for unbound ("dfl") to avoid introducing something new.

What do you both think about:

- queue_percpu_work()
- queue_dfl_work()

?

They somehow reflect the newly introduced system_dfl_wq and system_percpu_wq.

> > > The Workqueue API currently do not distinguish between use case where
> > > locality is important for correctness and where is important for
> > > efficiency.
>
> If you enqueue work to system_unbound_wq with the default affinitization, you
> already get locality (WQ_AFFN_CACHE groups CPUs sharing the same LLC). This is
> the way to say that locality is important for efficiency, and WQ_AFFN_CPU is
> the way to specify that locality is important for correctness.
>
> On top of that, WQ_AFFN_SYSTEM is a way to specify that locality is not
> necessary at all.
>
> Also, how WQ_PREFER_PERCPU behaves differently from WQ_AFFN_CPU?

Let me share where this was discussed a year ago:

https://lore.kernel.org/all/Z79E_gbWm9j9bkfR@slm.duckdns.org/

Perhaps - likely - I haven't understood the WQ_PREFER_PERCPU proposal
here; I thought it was a workqueue flag, to be used like WQ_PERCPU or
WQ_UNBOUND. Reading Tejun's reply, it is also clearer now.

Anyhow, this idea is based on customer reports I've seen previously.
We noticed that with certain workloads, specific per-cpu work creates
noise on isolated CPUs. With a flag like that we can identify which
workqueues prefer to be per-cpu but *not* for correctness.
This allows using a boot parameter / sysctl, for example, to keep those
workqueues affined only to housekeeping CPUs.

Of course, if we can achieve the same with a system workqueue (like
system_prefer_percpu_wq), that would also be fine. I think it would be
way easier; it should be similar to what we're doing with
system_power_efficient_wq [1].

Tejun, Breno (and others), what do you think? Bad idea? :-)

Thanks!

- [1] https://elixir.bootlin.com/linux/v7.0.1/source/kernel/workqueue.c#L7907

-- 
Marco Crivellari
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-07 10:25     ` Marco Crivellari
@ 2026-05-07 21:27       ` Tejun Heo
  2026-05-08 12:09         ` Frederic Weisbecker
  2026-05-12  8:52         ` Marco Crivellari
  1 sibling, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2026-05-07 21:27 UTC (permalink / raw)
To: Marco Crivellari
Cc: Breno Leitao, linux-kernel, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko

Hello,

On Thu, May 07, 2026 at 12:25:30PM +0200, Marco Crivellari wrote:
> So, either we're going to have an "unbound" version or we use
> queue_work() directly that sounds good to me. I guess retire - in
> future - schedule_work[_on]() would be cleaner: so that users must
> also specify the workqueue they really need to use.

Yeah, retiring would be my preference if we need to update them anyway. I
don't think the thin wrappers add anything useful.

> What do you both think about:
>
> - queue_percpu_work()
> - queue_dfl_work()

But if we were to keep the wrappers, yeah, these are better names.

> Let me share where this was discussed a year ago:
>
> https://lore.kernel.org/all/Z79E_gbWm9j9bkfR@slm.duckdns.org/
>
> Perhaps - likely - I haven't understood the WQ_PREFER_PERCPU proposal
> here; I thought it was a workqueue flag, to be used like WQ_PERCPU or
> WQ_UNBOUND.
> Reading Tejun's reply is also clearer now.

Yeah, that was what was discussed then.

> Anyhow, this idea is based on customer reports I've seen previously.
> We noticed that with certain workloads, specific per-cpu work creates
> noise on isolated CPUs. With a flag like that we can identify which
> workqueues prefer to be per-cpu and *not* for correctness. This allows
> using a boot parameter / sysctl, for example, to keep those workqueues
> affined only to housekeeping CPUs.
>
> Of course, if we can achieve the same with a system workqueue (like
> system_prefer_percpu_wq), that would also be fine.
> I think it would be way easier, it should be similar to what we're
> doing with system_power_efficient_wq [1].

WQ_AFFN_CPU is more flexible as the tasks aren't pinned to the CPU but there
may be downsides:

- Concurrency management isn't available.

- Would create more kworkers.

Maybe the original plan can be adapted to:

- Add WQ_PREFER_PERCPU as discussed before.

- At boot time, allow selecting whether to back them with percpu wqs or
  WQ_AFFN_X unbound ones. Maybe we can even experiment with default to
  WQ_AFFN_CPU.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-07 21:27       ` Tejun Heo
@ 2026-05-08 12:09         ` Frederic Weisbecker
  2026-05-08 15:11           ` Tejun Heo
  2026-05-12  8:52         ` Marco Crivellari
  1 sibling, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2026-05-08 12:09 UTC (permalink / raw)
To: Tejun Heo
Cc: Marco Crivellari, Breno Leitao, linux-kernel, Lai Jiangshan,
	Sebastian Andrzej Siewior, Michal Hocko

Le Thu, May 07, 2026 at 11:27:52AM -1000, Tejun Heo a écrit :
> > Anyhow, this idea is based on customer reports I've seen previously.
> > We noticed that with certain workloads, specific per-cpu work creates
> > noise on isolated CPUs. With a flag like that we can identify which
> > workqueues prefer to be per-cpu and *not* for correctness. This allows
> > using a boot parameter / sysctl, for example, to keep those workqueues
> > affined only to housekeeping CPUs.
> >
> > Of course, if we can achieve the same with a system workqueue (like
> > system_prefer_percpu_wq), that would also be fine. I think it would be
> > way easier, it should be similar to what we're doing with
> > system_power_efficient_wq [1].
>
> WQ_AFFN_CPU is more flexible as the tasks aren't pinned to the CPU but there
> may be downsides:
>
> - Concurrency management isn't available.
>
> - Would create more kworkers.
>
> Maybe the original plan can be adapted to:
>
> - Add WQ_PERFER_PERCPU as discussed before.
>
> - At boot time, allow selecting whether to back them with percpu wqs or
>   WQ_AFFN_X unbound ones. Maybe we can even experiment with default to
>   WQ_AFFN_CPU.

Isn't WQ_POWER_EFFICIENT enough for what we want here? ie: it does a per-cpu
preference except when some config is enabled or isolation is on. It could be
renamed to WQ_PREFER_PERCPU to generalize its meaning for more than just power
purposes.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-08 12:09         ` Frederic Weisbecker
@ 2026-05-08 15:11           ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2026-05-08 15:11 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Marco Crivellari, Breno Leitao, linux-kernel, Lai Jiangshan,
	Sebastian Andrzej Siewior, Michal Hocko

Hello,

On Fri, May 08, 2026 at 02:09:20PM +0200, Frederic Weisbecker wrote:
> Isn't WQ_POWER_EFFICIENT enough for what we want here? ie: it does a per-cpu
> preference except when some config is enabled or isolation is on. It could be
> renamed to WQ_PREFER_PERCPU to generalize its meaning for more than just power
> purposes.

That may satisfy the minimum requirement but I think it'd be a shame if we
do all the work and still leave the semantics overloaded, which was the
initial problem to begin with. I really want the intent of each specific
selection expressed unambiguously.

Besides, even outside of isolation use cases, having relaxed affinity can be
useful as it gives the scheduler more leeway in placement decisions. e.g.
There are no real downsides to running such a work item on an SMT pair, or
maybe that CPU is particularly overloaded due to net irq and rx processing
and some work items are better off running on another CPU in the same LLC
and so on. If we can manage that without causing perf issues, I want the
default to be soft affinity, not a hard one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
  2026-05-07 21:27       ` Tejun Heo
  2026-05-08 12:09         ` Frederic Weisbecker
@ 2026-05-12  8:52         ` Marco Crivellari
  1 sibling, 0 replies; 10+ messages in thread
From: Marco Crivellari @ 2026-05-12 8:52 UTC (permalink / raw)
To: Tejun Heo
Cc: Breno Leitao, linux-kernel, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko

Hello,

On Thu, May 7, 2026 at 11:27 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, May 07, 2026 at 12:25:30PM +0200, Marco Crivellari wrote:
> > So, either we're going to have an "unbound" version or we use
> > queue_work() directly that sounds good to me. I guess retire - in
> > future - schedule_work[_on]() would be cleaner: so that users must
> > also specify the workqueue they really need to use.
>
> Yeah, retiring would be my preference if we need to update them anyway. I
> don't think the thin wrappers add anything useful.

Fine, this sounds good to me if we do it that way.

> WQ_AFFN_CPU is more flexible as the tasks aren't pinned to the CPU but there
> may be downsides:
>
> - Concurrency management isn't available.
>
> - Would create more kworkers.
>
> Maybe the original plan can be adapted to:
>
> - Add WQ_PREFER_PERCPU as discussed before.
>
> - At boot time, allow selecting whether to back them with percpu wqs or
>   WQ_AFFN_X unbound ones. Maybe we can even experiment with default to
>   WQ_AFFN_CPU.

Cool, maybe I will ask something else when the time comes!

Thanks!

-- 
Marco Crivellari
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-05-12  8:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-05 16:16 [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Marco Crivellari
2026-05-05 16:16 ` [RFC PATCH 1/2] workqueue: Add queue_*() functions, future schedule_*() replacement Marco Crivellari
2026-05-05 16:16 ` [RFC PATCH 2/2] workqueue: Add WQ_PREFER_PERCPU and system_prefer_percpu_wq Marco Crivellari
2026-05-05 20:18 ` [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag Tejun Heo
2026-05-06 13:40   ` Breno Leitao
2026-05-07 10:25     ` Marco Crivellari
2026-05-07 21:27       ` Tejun Heo
2026-05-08 12:09         ` Frederic Weisbecker
2026-05-08 15:11           ` Tejun Heo
2026-05-12  8:52         ` Marco Crivellari
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox