public inbox for linux-fsdevel@vger.kernel.org
From: Chuck Lever <cel@kernel.org>
To: Breno Leitao <leitao@debian.org>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, puranjay@kernel.org,
	linux-crypto@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Michael van der Westhuizen <rmikey@meta.com>,
	kernel-team@meta.com, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v2 0/5] workqueue: Introduce a sharded cache affinity scope
Date: Mon, 23 Mar 2026 11:28:49 -0400	[thread overview]
Message-ID: <f2f7fff3-2f6a-4ebb-aa5e-33188be4dd9a@kernel.org> (raw)
In-Reply-To: <acFVEr7iVnU_70yh@gmail.com>

On 3/23/26 11:10 AM, Breno Leitao wrote:
> Hello Chuck,
> 
> On Mon, Mar 23, 2026 at 10:11:07AM -0400, Chuck Lever wrote:
>> On Fri, Mar 20, 2026, at 1:56 PM, Breno Leitao wrote:
>>> TL;DR: Some modern processors have many CPUs per LLC (L3 cache), and
>>> unbound workqueues using the default affinity (WQ_AFFN_CACHE) collapse
>>> to a single worker pool, causing heavy spinlock (pool->lock) contention.
>>> Create a new affinity (WQ_AFFN_CACHE_SHARD) that caps each pool at
>>> wq_cache_shard_size CPUs (default 8).
>>>
>>> Changes from RFC:
>>>
>>> * wq_cache_shard_size is in terms of cores (not vCPUs). So
>>>   wq_cache_shard_size=8 means the pool will cover 8 cores plus their
>>>   SMT siblings, i.e. 16 threads/CPUs when SMT is enabled
>>
>> My concern with the "cores per shard" approach is that it
>> improves the default situation for moderately-sized machines
>> little, if at all.
>>
>> A machine with one L3 and 10 cores will go from 1 UNBOUND
>> pool to only 2. And for virtual machines commonly deployed as
>> cloud instances, which are 2-, 4-, or 8-core systems (up to
>> 16 threads), there will still be significant contention for
>> UNBOUND workers.
> 
> Could you clarify your concern? Are you suggesting the default value of
> wq_cache_shard_size=8 is too high, or that the cores-per-shard approach
> fundamentally doesn't scale well for moderately-sized systems?
> 
> Any approach—whether sharding by cores or by LLC—ultimately relies on
> heuristics that may need tuning for specific workloads. The key difference
> is where we draw the line. The current default of 8 cores prevents the
> worst-case scenario: severe lock contention on large systems with 16+ CPUs
> all hammering a single unbound workqueue.

An 8-core machine with 16 threads can handle quite a bit of I/O, but
with the proposed scheme it will still have a single UNBOUND pool.
For NFS workloads I commonly benchmark, splitting the UNBOUND pool
on such systems is a very clear win.


> For smaller systems (2-4 CPUs), contention is usually negligible
> regardless of the approach. My perf lock contention measurements
> consistently show minimal contention in that range.
> 
>> IOW, if you want good scaling, human intervention (via a
>> boot command-line option) is still needed.
> 
> I am not convinced. The wq_cache_shard_size approach creates multiple
> pools on large systems while leaving small systems (<8 cores) unchanged.

This is exactly my concern. Smaller systems /do/ experience measurable
contention in this area. I don't object to your series at all; it's
clean and well-motivated. But the cores-per-shard approach doesn't scale
down to very commonly deployed machine sizes.

One might also argue that the NFS client and other subsystems that make
significant use of UNBOUND workqueues in their I/O paths would be well
advised to modify their approach. (net/sunrpc/sched.c, hint hint)


> This eliminates the pathological lock contention we're observing on
> high-core-count machines without impacting smaller deployments.
> 
> In contrast, splitting pools per LLC would force fragmentation even on
> systems that aren't experiencing contention, increasing the need for
> manual tuning across a wider range of configurations.

I claim that smaller deployments also need help. Further, I don't see
how UNBOUND pool fragmentation is a problem that needs to be addressed
on such systems (IMHO).


-- 
Chuck Lever


Thread overview: 15+ messages
2026-03-20 17:56 [PATCH v2 0/5] workqueue: Introduce a sharded cache affinity scope Breno Leitao
2026-03-20 17:56 ` [PATCH v2 1/5] workqueue: fix typo in WQ_AFFN_SMT comment Breno Leitao
2026-03-20 17:56 ` [PATCH v2 2/5] workqueue: add WQ_AFFN_CACHE_SHARD affinity scope Breno Leitao
2026-03-23 22:43   ` Tejun Heo
2026-03-26 16:20     ` Breno Leitao
2026-03-26 19:41       ` Tejun Heo
2026-03-20 17:56 ` [PATCH v2 3/5] workqueue: set WQ_AFFN_CACHE_SHARD as the default " Breno Leitao
2026-03-20 17:56 ` [PATCH v2 4/5] tools/workqueue: add CACHE_SHARD support to wq_dump.py Breno Leitao
2026-03-20 17:56 ` [PATCH v2 5/5] workqueue: add test_workqueue benchmark module Breno Leitao
2026-03-23 14:11 ` [PATCH v2 0/5] workqueue: Introduce a sharded cache affinity scope Chuck Lever
2026-03-23 15:10   ` Breno Leitao
2026-03-23 15:28     ` Chuck Lever [this message]
2026-03-23 16:26       ` Breno Leitao
2026-03-23 18:04         ` Chuck Lever
2026-03-23 18:19           ` Tejun Heo
