From: Tim Chen <tim.c.chen@linux.intel.com>
To: "Chen, Yu C" <yu.c.chen@intel.com>
Cc: Pan Deng <pan.deng@intel.com>,
mingo@kernel.org, linux-kernel@vger.kernel.org,
tianyou.li@intel.com, K Prateek Nayak <kprateek.nayak@amd.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention
Date: Wed, 08 Apr 2026 09:47:18 -0700
Message-ID: <19b2e22bd44d9f10a4960d5f1c4609e78fee73ba.camel@linux.intel.com>
In-Reply-To: <3a146435-7f5a-40e1-9e63-b9bb7494faf1@intel.com>
On Wed, 2026-04-08 at 17:25 +0800, Chen, Yu C wrote:
> On 4/8/2026 4:35 AM, Tim Chen wrote:
> > On Fri, 2026-04-03 at 13:46 +0800, Chen, Yu C wrote:
> > > On 4/2/2026 7:06 PM, K Prateek Nayak wrote:
> > > > Hello Peter,
> > > >
> > > > On 4/2/2026 4:25 PM, Peter Zijlstra wrote:
> > > > > On Thu, Apr 02, 2026 at 10:11:11AM +0530, K Prateek Nayak wrote:
> > > > >
> > > > > > It is still not super clear to me how the logic deals with more than
> > > > > > 128 CPUs in a DIE domain, because that will need more than a single
> > > > > > u64, yet sbm_find_next_bit() simply does:
> > > > > >
> > > > > > tmp = leaf->bitmap & mask; /* All are u64 */
> > > > > >
> > > > > > expecting just the u64 bitmap to represent all the CPUs in the leaf.
> > > > > >
> > > > > > If we have, say, 256 CPUs per DIE, we get shift(7) and arch_sbm_mask
> > > > > > as 7f (127), which allows a leaf to hold more than 64 CPUs, but we are
> > > > > > using the "u64 bitmap" directly and not:
> > > > > >
> > > > > > find_next_bit(bitmap, arch_sbm_mask)
> > > > > >
> > > > > > Am I missing something here?
> > > > >
> > > > > Nope. That logic just isn't there, that was left as an exercise to the
> > > > > reader :-)
> > > >
> > > > Ack! Let me go fiddle with that.
> > > >
> > >
> > > Nice catch. I hadn't noticed this since we have fewer than
> > > 64 CPUs per die. Please feel free to send patches to me when
> > > they're available.
> > >
> > > And regarding your other question about the calculation of arch_sbm_shift,
> > > I'm trying to understand why there is a subtraction of 1; should it be:
> > > - arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN] - 1;
> > > + arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN - 1];
> >
> > Perhaps something like
> >
> > arch_sbm_shift = min(sizeof(unsigned long),
> > topology_get_domain_shift(TOPO_TILE_DOMAIN));
> >
> > to take care of both AMD systems and the 64-bit leaf bitmask limit?
> >
>
> Yes, this should be doable (Prateek has mentioned using TOPO_TILE_DOMAIN).
> The only drawback I can think of is that if there are more than 64 CPUs
> within a die, CPUs in different dies (LLCs) could be indexed into the
> same leaf and access the same mask,
>
First, I think I should have used

arch_sbm_shift = min(BITS_PER_LONG,
                     topology_get_domain_shift(TOPO_TILE_DOMAIN));

I am assuming that we should choose TOPO_DIE_DOMAIN for Intel CPUs and
TOPO_TILE_DOMAIN for AMD CPUs, and that either choice spans exactly one
L3 (I think that's the case). Then leaf domains smaller than the domain
size will, by definition, also span only one L3. So for the 128-CPU
example you gave, the two leaves covering CPUs 0-63 and 64-127 will both
span the same LLC, and we should not see cache line bouncing.
Tim
> which would still lead to cache
> contention. Maybe we should allocate the leaf cpumask according to the
> actual size of a die?
>
> thanks,
> Chenyu
>
>
Thread overview: 41+ messages
2025-07-21 6:10 [PATCH v2 0/4] sched/rt: mitigate root_domain cache line contention Pan Deng
2025-07-21 6:10 ` [PATCH v2 1/4] sched/rt: Optimize cpupri_vec layout to mitigate " Pan Deng
2026-03-20 10:09 ` Peter Zijlstra
2026-03-24 9:36 ` Deng, Pan
2026-03-24 12:11 ` Peter Zijlstra
2026-03-27 10:17 ` Deng, Pan
2026-04-02 10:37 ` Deng, Pan
2026-04-02 10:43 ` Peter Zijlstra
2026-04-08 10:16 ` Chen, Yu C
2026-04-09 11:47 ` Deng, Pan
2025-07-21 6:10 ` [PATCH v2 2/4] sched/rt: Restructure root_domain to reduce cacheline contention Pan Deng
2026-03-20 10:18 ` Peter Zijlstra
2025-07-21 6:10 ` [PATCH v2 3/4] sched/rt: Split root_domain->rto_count to per-NUMA-node counters Pan Deng
2026-03-20 10:24 ` Peter Zijlstra
2026-03-23 18:09 ` Tim Chen
2026-03-24 12:16 ` Peter Zijlstra
2026-03-24 22:40 ` Tim Chen
2025-07-21 6:10 ` [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention Pan Deng
2026-03-20 12:40 ` Peter Zijlstra
2026-03-23 18:45 ` Tim Chen
2026-03-24 12:00 ` Peter Zijlstra
2026-03-31 5:37 ` Chen, Yu C
2026-03-31 10:19 ` K Prateek Nayak
2026-04-02 3:15 ` Chen, Yu C
2026-04-02 4:41 ` K Prateek Nayak
2026-04-02 10:55 ` Peter Zijlstra
2026-04-02 11:06 ` K Prateek Nayak
2026-04-03 5:46 ` Chen, Yu C
2026-04-03 8:13 ` K Prateek Nayak
2026-04-07 20:35 ` Tim Chen
2026-04-08 3:06 ` K Prateek Nayak
2026-04-08 11:35 ` Chen, Yu C
2026-04-08 15:52 ` K Prateek Nayak
2026-04-09 5:17 ` K Prateek Nayak
2026-04-09 23:09 ` Tim Chen
2026-04-10 5:51 ` Chen, Yu C
2026-04-10 6:02 ` K Prateek Nayak
2026-04-08 9:25 ` Chen, Yu C
2026-04-08 16:47 ` Tim Chen [this message]
2026-03-20 9:59 ` [PATCH v2 0/4] sched/rt: mitigate root_domain cache line contention Peter Zijlstra
2026-03-20 12:50 ` Peter Zijlstra