* [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
@ 2026-02-18 16:56 Marco Crivellari
From: Marco Crivellari @ 2026-02-18 16:56 UTC (permalink / raw)
To: linux-kernel, linux-xfs
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino
Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
alloc_workqueue() users that didn't specify WQ_UNBOUND.
The change has been introduced by:
69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
These specific workqueues don't use per-cpu data, so change the behavior
by removing WQ_PERCPU and adding WQ_UNBOUND. Even if these workqueues are
marked unbound, the workqueue subsystem maintains cache locality by
default via affinity scopes.
The changes from per-cpu to unbound will help to improve situations where
CPU isolation is used, because unbound work can be moved away from
isolated CPUs.
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
fs/xfs/xfs_log.c | 2 +-
fs/xfs/xfs_super.c | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index a26378ca247d..82f6b12efe22 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1441,7 +1441,7 @@ xlog_alloc_log(
log->l_iclog->ic_prev = prev_iclog; /* re-write 1st prev ptr */
log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND),
0, mp->m_super->s_id);
if (!log->l_ioend_workqueue)
goto out_free_iclog;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8586f044a14b..072381c6f137 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -592,19 +592,19 @@ xfs_init_mount_workqueues(
struct xfs_mount *mp)
{
mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
1, mp->m_super->s_id);
if (!mp->m_buf_workqueue)
goto out;
mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
0, mp->m_super->s_id);
if (!mp->m_unwritten_workqueue)
goto out_destroy_buf;
mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
0, mp->m_super->s_id);
if (!mp->m_reclaim_workqueue)
goto out_destroy_unwritten;
@@ -616,13 +616,13 @@ xfs_init_mount_workqueues(
goto out_destroy_reclaim;
mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
1, mp->m_super->s_id);
if (!mp->m_inodegc_wq)
goto out_destroy_blockgc;
mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_PERCPU), 0,
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_UNBOUND), 0,
mp->m_super->s_id);
if (!mp->m_sync_workqueue)
goto out_destroy_inodegc;
@@ -2564,7 +2564,7 @@ xfs_init_workqueues(void)
* AGs in all the filesystems mounted. Hence use the default large
* max_active value for this workqueue.
*/
- xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_PERCPU),
+ xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND),
0);
if (!xfs_alloc_wq)
return -ENOMEM;
--
2.52.0
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Dave Chinner @ 2026-02-19 1:24 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, linux-xfs, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> alloc_workqueue() users that didn't specify WQ_UNBOUND.
> The change has been introduced by:
>
> 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
>
> These specific workqueues don't use per-cpu data, so change the behavior
> by removing WQ_PERCPU and adding WQ_UNBOUND.

Your definition of "doesn't need per-cpu workqueues" is sadly
deficient.

> Even if these workqueues are
> marked unbound, the workqueue subsystem maintains cache locality by
> default via affinity scopes.
>
> The changes from per-cpu to unbound will help to improve situations where
> CPU isolation is used, because unbound work can be moved away from
> isolated CPUs.

If you are running operations through the XFS filesystem on isolated
CPUs, then you absolutely need some of these per-cpu workqueues
running on those isolated CPUs too.

Also, these workqueues are typically implemented this way to meet
performance targets, concurrency constraints or algorithm
requirements. Changes like this need a bunch of XFS metadata
scalability benchmarks on high end server systems under a variety of
conditions to at least show there aren't any obvious behavioural
or performance regressions that result from the change.

> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> fs/xfs/xfs_log.c | 2 +-
> fs/xfs/xfs_super.c | 12 ++++++------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index a26378ca247d..82f6b12efe22 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1441,7 +1441,7 @@ xlog_alloc_log(
> log->l_iclog->ic_prev = prev_iclog; /* re-write 1st prev ptr */
>
> log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND),
> 0, mp->m_super->s_id);

We want to process IO completions on the same CPU that the completion
was delivered to, for performance reasons. If you've configured storage
interrupts to be delivered to an isolated CPU, then you're doing CPU
isolation wrong.

> if (!log->l_ioend_workqueue)
> goto out_free_iclog;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 8586f044a14b..072381c6f137 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -592,19 +592,19 @@ xfs_init_mount_workqueues(
> struct xfs_mount *mp)
> {
> mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 1, mp->m_super->s_id);

Same here - these are IO completion processing workers. However, we
also want to limit IO completion processing work depth to a single
worker thread per CPU, because these completions rarely block and we
can have thousands of them delivered in very short periods of time.
Hence we do not want thundering storms of kworkers being spawned to
process a single IO completion each; it is far more efficient for a
single kworker to loop processing the incoming queue of IO completions
on a given CPU in a serial manner. i.e. we use the concurrency control
of per-cpu workqueues to provide us with concurrent completion
processing based on storage-defined completion delivery, but we also
constrain the concurrency of work processing to the most efficient
method possible.

> if (!mp->m_buf_workqueue)
> goto out;
>
> mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 0, mp->m_super->s_id);

Similar to above - this is data IO completion processing rather than
metadata. However, in this case individual unwritten extent conversion
works can block for long periods (e.g. might need to read metadata
from disk), so we allow lots of unwritten conversions to be run
concurrently per CPU so that when one conversion blocks another
kworker can start working. It is not uncommon to see thousands (even
tens of thousands) of unwritten extent kworkers on systems running
heavy IO workloads.

> if (!mp->m_unwritten_workqueue)
> goto out_destroy_buf;
>
> mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 0, mp->m_super->s_id);

This one might be able to be unbound, but the behaviour of this
workqueue has a significant impact on memory reclaim performance. i.e.
we use per-cpu work here because we want it to run immediately and
hold off other work on the CPU once it has been scheduled, because it
is directly responsible for freeing memory. And when we are under
heavy memory pressure, this is kinda important.

> if (!mp->m_reclaim_workqueue)
> goto out_destroy_unwritten;
> @@ -616,13 +616,13 @@ xfs_init_mount_workqueues(
> goto out_destroy_reclaim;
>
> mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_PERCPU),
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
> 1, mp->m_super->s_id);

This is part of a per-cpu work deferring algorithm. It has hard
requirements on per-cpu work scheduling because it uses lockless
per-cpu queues. This inodegc stuff is required to run on isolated CPUs
if the tasks on isolated CPUs are accessing anything from an XFS
filesystem (even if it is just running binaries from XFS filesystems).
Go look at xfs_inodegc_queue()...

> mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
> - XFS_WQFLAGS(WQ_FREEZABLE | WQ_PERCPU), 0,
> + XFS_WQFLAGS(WQ_FREEZABLE | WQ_UNBOUND), 0,
> mp->m_super->s_id);

That can probably be unbound.

> if (!mp->m_sync_workqueue)
> goto out_destroy_inodegc;
> @@ -2564,7 +2564,7 @@ xfs_init_workqueues(void)
> * AGs in all the filesystems mounted. Hence use the default large
> * max_active value for this workqueue.
> */
> - xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_PERCPU),
> + xfs_alloc_wq = alloc_workqueue("xfsalloc", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND),
> 0);

Not sure about this one. It might be ok, but if it runs out of
concurrency then there is potential for memory reclaim/dirty page
cleaning stalls (and maybe deadlocks).

-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Sebastian Andrzej Siewior @ 2026-02-19 7:25 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On 2026-02-19 12:24:38 [+1100], Dave Chinner wrote:
> > The changes from per-cpu to unbound will help to improve situations where
> > CPU isolation is used, because unbound work can be moved away from
> > isolated CPUs.
>
> If you are running operations through the XFS filesystem on isolated
> CPUs, then you absolutely need some of these per-cpu workqueues
> running on those isolated CPUs too.
>
> Also, these workqueues are typically implemented this way to meet
> performance targets, concurrency constraints or algorithm
> requirements. Changes like this need a bunch of XFS metadata
> scalability benchmarks on high end server systems under a variety of
> conditions to at least show there aren't any obvious behavioural
> or performance regressions that result from the change.

So for all of those (below) where you say "performance critical": are
those work items only enqueued from an interrupt? Do they never
originate from a user task?

Sebastian
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Dave Chinner @ 2026-02-19 23:05 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On Thu, Feb 19, 2026 at 08:25:56AM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-19 12:24:38 [+1100], Dave Chinner wrote:
> > > The changes from per-cpu to unbound will help to improve situations where
> > > CPU isolation is used, because unbound work can be moved away from
> > > isolated CPUs.
> >
> > If you are running operations through the XFS filesystem on isolated
> > CPUs, then you absolutely need some of these per-cpu workqueues
> > running on those isolated CPUs too.
> >
> > Also, these workqueues are typically implemented this way to meet
> > performance targets, concurrency constraints or algorithm
> > requirements. Changes like this need a bunch of XFS metadata
> > scalability benchmarks on high end server systems under a variety of
> > conditions to at least show there aren't any obvious behavioural
> > or performance regressions that result from the change.
>
> So for all of those (below) where you say "performance critical": are
> those work items only enqueued from an interrupt?

No.

> Do they never originate from a user task?

Inode GC is most definitely driven from user tasks with unbound
concurrency (e.g. unlink(), close() and other syscalls that can drop
a file reference). It can also be driven by the kernel through direct
reclaim (again, from user task context with unbound concurrency), and
from pure kernel context via reclaim from kswapd (strictly bound
concurrency in this case).

The lockless per-cpu queuing and processing algorithm was added
because the inode eviction path from user context is performance
critical. The original version using unbound workqueues had major
performance regressions. There's discussion of the reasons for those
performance regressions, and numbers around those, in the original
discussions and prototypes:

https://lore.kernel.org/linux-xfs/20210802103559.GH2757197@dread.disaster.area/
https://lore.kernel.org/linux-xfs/20210804032030.GT3601443@magnolia/

-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Sebastian Andrzej Siewior @ 2026-02-20 7:19 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On 2026-02-20 10:05:09 [+1100], Dave Chinner wrote:
> > Do they never originate from a user task?
>
> Inode GC is most definitely driven from user tasks with unbound
…

Thank you for the details.

Sebastian
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Michal Hocko @ 2026-02-19 9:20 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Anthony Iliopoulos, Carlos Maiolino

On Thu 19-02-26 12:24:38, Dave Chinner wrote:
> On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> > Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> > alloc_workqueue() users that didn't specify WQ_UNBOUND.
> > The change has been introduced by:
> >
> > 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
> >
> > These specific workqueues don't use per-cpu data, so change the behavior
> > by removing WQ_PERCPU and adding WQ_UNBOUND.
>
> Your definition of "doesn't need per-cpu workqueues" is sadly
> deficient.

I believe Marco wanted to say that they do not require the strict
per-cpu guarantee of WQ_PERCPU for correctness, i.e. those workers do
not operate on per-cpu data.

> > Even if these workqueues are
> > marked unbound, the workqueue subsystem maintains cache locality by
> > default via affinity scopes.
> >
> > The changes from per-cpu to unbound will help to improve situations where
> > CPU isolation is used, because unbound work can be moved away from
> > isolated CPUs.
>
> If you are running operations through the XFS filesystem on isolated
> CPUs, then you absolutely need some of these per-cpu workqueues
> running on those isolated CPUs too.

The usecase is that an isolated workload needs to perform fs operations
at certain stages of the operation. Then it moves over to "do not
disturb" mode when it operates in userspace and shouldn't be disrupted
by the kernel. We do observe that those workers trigger at a later time
and disturb the workload when not appropriate.

> Also, these workqueues are typically implemented this way to meet
> performance targets, concurrency constraints or algorithm
> requirements. Changes like this need a bunch of XFS metadata
> scalability benchmarks on high end server systems under a variety of
> conditions to at least show there aren't any obvious behavioural
> or performance regressions that result from the change.

This is a fair ask. We do not want to regress non-isolated workloads by
any means, and if there is a risk of regression for those - and from
your more detailed explanation it seems so - then we might need to
search for a different approach. Would an opt-in work, i.e. tolerating
the performance loss of losing locality via a kernel cmdline option?

I am cutting your specific feedback on those WQs. Thanks for that! This
is very valuable feedback.

Thanks!
--
Michal Hocko
SUSE Labs
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Dave Chinner @ 2026-02-19 22:22 UTC (permalink / raw)
To: Michal Hocko
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Anthony Iliopoulos, Carlos Maiolino

On Thu, Feb 19, 2026 at 10:20:54AM +0100, Michal Hocko wrote:
> On Thu 19-02-26 12:24:38, Dave Chinner wrote:
> > On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> > > Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> > > alloc_workqueue() users that didn't specify WQ_UNBOUND.
> > > The change has been introduced by:
> > >
> > > 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
> > >
> > > These specific workqueues don't use per-cpu data, so change the behavior
> > > by removing WQ_PERCPU and adding WQ_UNBOUND.
> >
> > Your definition of "doesn't need per-cpu workqueues" is sadly
> > deficient.
>
> I believe Marco wanted to say that they do not require the strict
> per-cpu guarantee of WQ_PERCPU for correctness, i.e. those workers do
> not operate on per-cpu data.

Which I've already pointed out was an incorrect statement w.r.t. the
code being changed (e.g. inodegc uses lockless per-cpu queues and a
per-cpu worker to process those queues). I've also pointed out that it
is a fundamentally incorrect statement when considering bound
concurrency algorithms in general, too.

> > > Even if these workqueues are
> > > marked unbound, the workqueue subsystem maintains cache locality by
> > > default via affinity scopes.
> > >
> > > The changes from per-cpu to unbound will help to improve situations where
> > > CPU isolation is used, because unbound work can be moved away from
> > > isolated CPUs.
> >
> > If you are running operations through the XFS filesystem on isolated
> > CPUs, then you absolutely need some of these per-cpu workqueues
> > running on those isolated CPUs too.
>
> The usecase is that an isolated workload needs to perform fs operations
> at certain stages of the operation. Then it moves over to "do not
> disturb" mode when it operates in userspace and shouldn't be disrupted
> by the kernel. We do observe that those workers trigger at a later time
> and disturb the workload when not appropriate.

Define "later time".

Also, please explain how the XFS work gets queued to run on these
isolated CPUs. If there's nothing fs, storage or memory reclaim
related running on the isolated CPU, then none of the XFS workqueues
should ever trigger on those CPUs.

IOWs, if you are getting unexpected work triggers on isolated CPUs,
then you need to first explain how those unexpected triggers are
occurring. Once you can explain how the per-cpu workqueues are
responsible for the unexpected behaviour rather than just being the
visible symptom of something else going wrong (e.g. a scheduler or
workqueue bug), then we can discuss changing the XFS code....

> > Also, these workqueues are typically implemented this way to meet
> > performance targets, concurrency constraints or algorithm
> > requirements. Changes like this need a bunch of XFS metadata
> > scalability benchmarks on high end server systems under a variety of
> > conditions to at least show there aren't any obvious behavioural
> > or performance regressions that result from the change.
>
> This is a fair ask. We do not want to regress non-isolated workloads by
> any means, and if there is a risk of regression for those - and from
> your more detailed explanation it seems so - then we might need to
> search for a different approach. Would an opt-in work, i.e. tolerating
> the performance loss of losing locality via a kernel cmdline option?

No, this is still just hacking around an unexpected behaviour without
first understanding the cause of it. We can discuss how to fix
whatever the problem is once the root cause has been identified and
understood.

-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Michal Hocko @ 2026-02-20 9:29 UTC (permalink / raw)
To: Dave Chinner
Cc: Marco Crivellari, linux-kernel, linux-xfs, Tejun Heo,
Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
Anthony Iliopoulos, Carlos Maiolino

On Fri 20-02-26 09:22:19, Dave Chinner wrote:
> On Thu, Feb 19, 2026 at 10:20:54AM +0100, Michal Hocko wrote:
[...]
> > The usecase is that an isolated workload needs to perform fs operations
> > at certain stages of the operation. Then it moves over to "do not
> > disturb" mode when it operates in userspace and shouldn't be disrupted
> > by the kernel. We do observe that those workers trigger at a later time
> > and disturb the workload when not appropriate.
>
> Define "later time".

After the workload transitions to the "do not disturb" mode of
operation.

> Also, please explain how the XFS work gets queued to run on these
> isolated CPUs. If there's nothing fs, storage or memory reclaim
> related running on the isolated CPU, then none of the XFS workqueues
> should ever trigger on those CPUs.

I do not have full visibility into the workload (i.e. access to the
code), so we only rely on tracing data. We know that the workload
operates on the set of isolated cpus and each component is consuming a
dedicated CPU. We also do see (among others) XFS workers interfering.
I am not able to link exact syscalls to those work items, but we
presume those are the result of prior workload execution, as not much
else is running on those CPUs.

> IOWs, if you are getting unexpected work triggers on isolated CPUs,
> then you need to first explain how those unexpected triggers are
> occurring. Once you can explain how the per-cpu workqueues are
> responsible for the unexpected behaviour rather than just being the
> visible symptom of something else going wrong (e.g. a scheduler or
> workqueue bug), then we can discuss changing the XFS code....

Understood. Thanks!
--
Michal Hocko
SUSE Labs
* Re: [PATCH] xfs: convert alloc_workqueue users to WQ_UNBOUND
From: Marco Crivellari @ 2026-02-19 9:35 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-kernel, linux-xfs, Tejun Heo, Lai Jiangshan,
Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
Anthony Iliopoulos, Carlos Maiolino

On Thu, Feb 19, 2026 at 2:24 AM Dave Chinner <dgc@kernel.org> wrote:
>
> On Wed, Feb 18, 2026 at 05:56:09PM +0100, Marco Crivellari wrote:
> > Recently, as part of a workqueue refactor, WQ_PERCPU has been added to
> > alloc_workqueue() users that didn't specify WQ_UNBOUND.
> > The change has been introduced by:
> >
> > 69635d7f4b344 ("fs: WQ_PERCPU added to alloc_workqueue users")
> >
> > These specific workqueues don't use per-cpu data, so change the behavior
> > by removing WQ_PERCPU and adding WQ_UNBOUND.
>
> Your definition of "doesn't need per-cpu workqueues" is sadly
> deficient.

I should have added a "request for comments" tag to this patch, maybe.
Anyhow, I meant that they are not operating on per-cpu data, so
WQ_PERCPU wasn't "strictly" necessary because of that, sorry.

Thanks for all the information you shared. I will read it carefully.

--
Marco Crivellari
L3 Support Engineer