* [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
@ 2026-02-18 16:56 Marco Crivellari
From: Marco Crivellari @ 2026-02-18 16:56 UTC (permalink / raw)
To: linux-kernel, linux-xfs
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino
Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
alloc_workqueue() users that didn't specify WQ_UNBOUND.
The change has been introduced by:
69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
These specific workqueues don't use per-cpu data, so change the behavior
by removing WQ_PERCPU and adding WQ_UNBOUND. Even if these workqueues are
marked unbound, the workqueue subsystem maintains cache locality by
default via affinity scopes.
The changes from per-cpu to unbound will help to improve situations where
CPU isolation is used, because unbound work can be moved away from
isolated CPUs.
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
fs/xfs/xfs_log.c | 2 +-
fs/xfs/xfs_super.c | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index a26378ca247d..82f6b12efe22 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1441,7 +1441,7 @@ xlog_alloc_log(
log->l_iclog->ic_prev = prev_iclog; /* re-write 1st prev ptr */
log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND),
0, mp->m_super->s_id);
if (!log->l_ioend_workqueue)
goto out_free_iclog;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8586f044a14b..072381c6f137 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -592,19 +592,19 @@ xfs_init_mount_workqueues(
struct xfs_mount *mp)
{
mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
1, mp->m_super->s_id);
if (!mp->m_buf_workqueue)
goto out;
mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
0, mp->m_super->s_id);
if (!mp->m_unwritten_workqueue)
goto out_destroy_buf;
mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
0, mp->m_super->s_id);
if (!mp->m_reclaim_workqueue)
goto out_destroy_unwritten;
@@ -616,13 +616,13 @@ xfs_init_mount_workqueues(
goto out_destroy_reclaim;
mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
1, mp->m_super->s_id);
if (!mp->m_inodegc_wq)
goto out_destroy_blockgc;
mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_PERCPU), 0,
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_UNBOUND), 0,
mp->m_super->s_id);
if (!mp->m_sync_workqueue)
goto out_destroy_inodegc;
@@ -2564,7 +2564,7 @@ xfs_init_workqueues(void)
* AGs in all the filesystems mounted. Hence use the default large
* max_active value for this workqueue.
*/
- xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_PERCPU),
+ xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND),
0);
if (!xfs_alloc_wq)
return -ENOMEM;
--
2.52.0
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Dave Chinner @ 2026-02-19 1:24 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, linux-xfs, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> alloc_workqueue() users that didn't specify WQ_UNBOUND.
> The change has been introduced by:
>
> 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
>
> These specific workqueues don't use per-cpu data, so change the behavior
> by removing WQ_PERCPU and adding WQ_UNBOUND.

Your definition of "doesn't need per-cpu workqueues" is sadly
deficient.

> Even if these workqueues are
> marked unbound, the workqueue subsystem maintains cache locality by
> default via affinity scopes.
>
> The changes from per-cpu to unbound will help to improve situations where
> CPU isolation is used, because unbound work can be moved away from
> isolated CPUs.

If you are running operations through the XFS filesystem on isolated
CPUs, then you absolutely need some of these per-cpu workqueues
running on those isolated CPUs too.

Also, these workqueues are typically implemented this way to meet
performance targets, concurrency constraints or algorithm
requirements. Changes like this need a bunch of XFS metadata
scalability benchmarks on high end server systems under a variety of
conditions to at least show there aren't any obvious behavioural
or performance regressions that result from the change.

> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> fs/xfs/xfs_log.c | 2 +-
> fs/xfs/xfs_super.c | 12 ++++++------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index a26378ca247d..82f6b12efe22 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1441,7 +1441,7 @@ xlog_alloc_log(
> log->l_iclog->ic_prev = prev_iclog; /* re-write 1st prev ptr */
>
> log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND),
> 0, mp->m_super->s_id);

We want to process IO completions on the same CPU that the completion
was delivered to, for performance reasons. If you've configured storage
interrupts to be delivered to an isolated CPU, then you're doing CPU
isolation wrong.

> if (!log->l_ioend_workqueue)
> goto out_free_iclog;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 8586f044a14b..072381c6f137 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -592,19 +592,19 @@ xfs_init_mount_workqueues(
> struct xfs_mount *mp)
> {
> mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 1, mp->m_super->s_id);

Same here - these are IO completion processing workers. However, we
also want to limit IO completion processing work depth to a single
worker thread per CPU, because these completions rarely block and we
can have thousands of them delivered in very short periods of time.
Hence we do not want thundering storms of kworkers being spawned to
process a single IO completion each; it is far more efficient for a
single kworker to loop processing the incoming queue of IO completions
on a given CPU in a serial manner. i.e. we use the concurrency control
of per-cpu workqueues to provide us with concurrent completion
processing based on storage-defined completion delivery, but we also
constrain the concurrency of work processing to the most efficient
method possible.

> if (!mp->m_buf_workqueue)
> goto out;
>
> mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 0, mp->m_super->s_id);

Similar to above - this is data IO completion processing rather than
metadata. However, in this case individual unwritten extent conversion
works can block for long periods (e.g. might need to read metadata
from disk), so we allow lots of unwritten conversions to be run
concurrently per CPU so that when one conversion blocks another
kworker can start working. It is not uncommon to see thousands (even
tens of thousands) of unwritten extent kworkers on systems running
heavy IO workloads.

> if (!mp->m_unwritten_workqueue)
> goto out_destroy_buf;
>
> mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 0, mp->m_super->s_id);

This one might be able to be unbound, but the behaviour of this
workqueue has a significant impact on memory reclaim performance. i.e.
we use per-cpu work here because we want it to run immediately and
hold off other work on the CPU once it has been scheduled, because it
is directly responsible for freeing memory. And when we are under
heavy memory pressure, this is kinda important.

> if (!mp->m_reclaim_workqueue)
> goto out_destroy_unwritten;
> @@ -616,13 +616,13 @@ xfs_init_mount_workqueues(
> goto out_destroy_reclaim;
>
> mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 1, mp->m_super->s_id);

This is part of a per-cpu work deferring algorithm. It has hard
requirements on per-cpu work scheduling because it uses lockless
per-cpu queues. This inodegc stuff is required to run on isolated CPUs
if the tasks on isolated CPUs are accessing anything from an XFS
filesystem (even if it is just running binaries from XFS filesystems).
Go look at xfs_inodegc_queue()...

> mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_PERCPU), 0,
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_UNBOUND), 0,
> mp->m_super->s_id);

That can probably be unbound.

> if (!mp->m_sync_workqueue)
> goto out_destroy_inodegc;
> @@ -2564,7 +2564,7 @@ xfs_init_workqueues(void)
> * AGs in all the filesystems mounted. Hence use the default large
> * max_active value for this workqueue.
> */
> - xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_PERCPU),
> + xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND),
> 0);

Not sure about this one. It might be ok, but if it runs out of
concurrency then there is potential for memory reclaim/dirty page
cleaning stalls (and maybe deadlocks).

-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Sebastian Andrzej Siewior @ 2026-02-19 7:25 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On 2026-02-19 12:24:38 [+1100], Dave Chinner wrote:
> > The changes from per-cpu to unbound will help to improve situations where
> > CPU isolation is used, because unbound work can be moved away from
> > isolated CPUs.
>
> If you are running operations through the XFS filesystem on isolated
> CPUs, then you absolutely need some of these per-cpu workqueues
> running on those isolated CPUs too.
>
> Also, these workqueues are typically implemented this way to meet
> performance targets, concurrency constraints or algorithm
> requirements. Changes like this need a bunch of XFS metadata
> scalability benchmarks on high end server systems under a variety of
> conditions to at least show there aren't any obvious behavioural
> or performance regressions that result from the change.

So for all of those (below) where you say "performance critical": are
those work items only enqueued from an interrupt? Do they never
originate from a user task?

Sebastian
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Dave Chinner @ 2026-02-19 23:05 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On Thu, Feb 19, 2026 at 08:25:56AM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-19 12:24:38 [+1100], Dave Chinner wrote:
> > > The changes from per-cpu to unbound will help to improve situations where
> > > CPU isolation is used, because unbound work can be moved away from
> > > isolated CPUs.
> >
> > If you are running operations through the XFS filesystem on isolated
> > CPUs, then you absolutely need some of these per-cpu workqueues
> > running on those isolated CPUs too.
> >
> > Also, these workqueues are typically implemented this way to meet
> > performance targets, concurrency constraints or algorithm
> > requirements. Changes like this need a bunch of XFS metadata
> > scalability benchmarks on high end server systems under a variety of
> > conditions to at least show there aren't any obvious behavioural
> > or performance regressions that result from the change.
>
> So for all of those (below) where you say "performance critical": are
> those work items only enqueued from an interrupt?

No.

> Do they never originate from a user task?

Inode GC is most definitely driven from user tasks with unbound
concurrency (e.g. unlink(), close() and other syscalls that can drop
a file reference). It can also be driven by the kernel through direct
reclaim (again, from user task context with unbound concurrency), and
from pure kernel context via reclaim from kswapd (strictly bound
concurrency in this case).

The lockless per-cpu queuing and processing algorithm was added
because the inode eviction path from user context is performance
critical. The original version using unbound workqueues had major
performance regressions. There's discussion of the reasons for those
performance regressions, and numbers around those, in the original
discussions and prototypes:

https://lore.kernel.org/linux-xfs/20210802103559.GH2757197@dread.disaster.area/
https://lore.kernel.org/linux-xfs/20210804032030.GT3601443@magnolia/

-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Sebastian Andrzej Siewior @ 2026-02-20 7:19 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On 2026-02-20 10:05:09 [+1100], Dave Chinner wrote:
> > Do they never originate from a user task?
>
> Inode GC is most definitely driven from user tasks with unbound
…

Thank you for the details.

Sebastian
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Michal Hocko @ 2026-02-19 9:20 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Anthony Iliopoulos, Carlos Maiolino

On Thu 19-02-26 12:24:38, Dave Chinner wrote:
> On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> > Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> > alloc_workqueue() users that didn't specify WQ_UNBOUND.
> > The change has been introduced by:
> >
> > 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
> >
> > These specific workqueues don't use per-cpu data, so change the behavior
> > by removing WQ_PERCPU and adding WQ_UNBOUND.
>
> Your definition of "doesn't need per-cpu workqueues" is sadly
> deficient.

I believe Marco wanted to say that they do not require the strict
per-cpu guarantee of WQ_PERCPU for correctness, i.e. those workers do
not operate on per-cpu data.

> > Even if these workqueues are
> > marked unbound, the workqueue subsystem maintains cache locality by
> > default via affinity scopes.
> >
> > The changes from per-cpu to unbound will help to improve situations where
> > CPU isolation is used, because unbound work can be moved away from
> > isolated CPUs.
>
> If you are running operations through the XFS filesystem on isolated
> CPUs, then you absolutely need some of these per-cpu workqueues
> running on those isolated CPUs too.

The usecase is that an isolated workload needs to perform fs operations
at certain stages of the operation. Then it moves over to "do not
disturb" mode when it operates in userspace and shouldn't be disrupted
by the kernel. We do observe that those workers trigger at a later time
and disturb the workload when not appropriate.

> Also, these workqueues are typically implemented this way to meet
> performance targets, concurrency constraints or algorithm
> requirements. Changes like this need a bunch of XFS metadata
> scalability benchmarks on high end server systems under a variety of
> conditions to at least show there aren't any obvious behavioural
> or performance regressions that result from the change.

This is a fair ask. We do not want to regress non-isolated workloads by
any means, and if there is a risk of regression for those - and from
your more detailed explanation it seems so - then we might need to
search for a different approach. Would an opt-in work, i.e. tolerating
the performance loss of losing locality via a kernel cmdline option?

I am cutting your specific feedback on those WQs. Thanks for that! This
is very valuable feedback.

Thanks!
--
Michal Hocko
SUSE Labs
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Dave Chinner @ 2026-02-19 22:22 UTC (permalink / raw)
To: Michal Hocko
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Anthony Iliopoulos, Carlos Maiolino

On Thu, Feb 19, 2026 at 10:20:54AM +0100, Michal Hocko wrote:
> On Thu 19-02-26 12:24:38, Dave Chinner wrote:
> > On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> > > Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> > > alloc_workqueue() users that didn't specify WQ_UNBOUND.
> > > The change has been introduced by:
> > >
> > > 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
> > >
> > > These specific workqueues don't use per-cpu data, so change the behavior
> > > by removing WQ_PERCPU and adding WQ_UNBOUND.
> >
> > Your definition of "doesn't need per-cpu workqueues" is sadly
> > deficient.
>
> I believe Marco wanted to say that they do not require the strict
> per-cpu guarantee of WQ_PERCPU for correctness, i.e. those workers do
> not operate on per-cpu data.

Which I've already pointed out was an incorrect statement w.r.t. the
code being changed (e.g. inodegc uses lockless per-cpu queues and a
per-cpu worker to process those queues). I've also pointed out that it
is a fundamentally incorrect statement when considering bound
concurrency algorithms in general, too.

> > > Even if these workqueues are
> > > marked unbound, the workqueue subsystem maintains cache locality by
> > > default via affinity scopes.
> > >
> > > The changes from per-cpu to unbound will help to improve situations where
> > > CPU isolation is used, because unbound work can be moved away from
> > > isolated CPUs.
> >
> > If you are running operations through the XFS filesystem on isolated
> > CPUs, then you absolutely need some of these per-cpu workqueues
> > running on those isolated CPUs too.
>
> The usecase is that an isolated workload needs to perform fs operations
> at certain stages of the operation. Then it moves over to "do not
> disturb" mode when it operates in userspace and shouldn't be disrupted
> by the kernel. We do observe that those workers trigger at a later time
> and disturb the workload when not appropriate.

Define "later time".

Also, please explain how the XFS work gets queued to run on these
isolated CPUs. If there's nothing fs, storage or memory reclaim
related running on the isolated CPU, then none of the XFS workqueues
should ever trigger on those CPUs.

IOWs, if you are getting unexpected work triggers on isolated CPUs,
then you need to first explain how those unexpected triggers are
occurring. Once you can explain how the per-cpu workqueues are
responsible for the unexpected behaviour rather than just being the
visible symptom of something else going wrong (e.g. a scheduler or
workqueue bug), then we can discuss changing the XFS code....

> > Also, these workqueues are typically implemented this way to meet
> > performance targets, concurrency constraints or algorithm
> > requirements. Changes like this need a bunch of XFS metadata
> > scalability benchmarks on high end server systems under a variety of
> > conditions to at least show there aren't any obvious behavioural
> > or performance regressions that result from the change.
>
> This is a fair ask. We do not want to regress non-isolated workloads by
> any means, and if there is a risk of regression for those - and from
> your more detailed explanation it seems so - then we might need to
> search for a different approach. Would an opt-in work, i.e. tolerating
> the performance loss of losing locality via a kernel cmdline option?

No, this is still just hacking around an unexpected behaviour without
first understanding the cause of it. We can discuss how to fix
whatever the problem is once the root cause has been identified and
understood.

-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Michal Hocko @ 2026-02-20 9:29 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Anthony Iliopoulos, Carlos Maiolino

On Fri 20-02-26 09:22:19, Dave Chinner wrote:
> On Thu, Feb 19, 2026 at 10:20:54AM +0100, Michal Hocko wrote:
[...]
> > The usecase is that an isolated workload needs to perform fs operations
> > at certain stages of the operation. Then it moves over to "do not
> > disturb" mode when it operates in userspace and shouldn't be disrupted
> > by the kernel. We do observe that those workers trigger at a later time
> > and disturb the workload when not appropriate.
>
> Define "later time".

After the workload transitions to the "do not disturb" mode of
operation.

> Also, please explain how the XFS work gets queued to run on these
> isolated CPUs. If there's nothing fs, storage or memory reclaim
> related running on the isolated CPU, then none of the XFS workqueues
> should ever trigger on those CPUs.

I do not have full visibility into the workload (i.e. access to the
code), so we only rely on tracing data. We know that the workload
operates on the set of isolated cpus and each component is consuming a
dedicated CPU. We also do see (among others) XFS workers interfering.
I am not able to link exact syscalls to those work items, but we
presume those are the result of prior workload execution, as not much
else is running on those CPUs.

> IOWs, if you are getting unexpected work triggers on isolated CPUs,
> then you need to first explain how those unexpected triggers are
> occurring. Once you can explain how the per-cpu workqueues are
> responsible for the unexpected behaviour rather than just being the
> visible symptom of something else going wrong (e.g. a scheduler or
> workqueue bug), then we can discuss changing the XFS code....

Understood. Thanks!
--
Michal Hocko
SUSE Labs
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Marco Crivellari @ 2026-02-19 9:35 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-kernel, linux-xfs, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On Thu, Feb 19, 2026 at 2:24 AM Dave Chinner <dgc@kernel.org> wrote:
>
> On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> > Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> > alloc_workqueue() users that didn't specify WQ_UNBOUND.
> > The change has been introduced by:
> >
> > 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
> >
> > These specific workqueues don't use per-cpu data, so change the behavior
> > by removing WQ_PERCPU and adding WQ_UNBOUND.
>
> Your definition of "doesn't need per-cpu workqueues" is sadly
> deficient.

I should have added a "request for comments" tag to this patch, maybe.
Anyhow, I meant that they are not operating on per-cpu data, so
WQ_PERCPU wasn't "strictly" necessary because of that, sorry.

Thanks for all the information you shared. I will read it carefully.

--
Marco Crivellari
L3 Support Engineer