Message-ID: <19b2e22bd44d9f10a4960d5f1c4609e78fee73ba.camel@linux.intel.com>
Subject: Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention
From: Tim Chen
To: "Chen, Yu C"
Cc: Pan Deng, mingo@kernel.org, linux-kernel@vger.kernel.org, tianyou.li@intel.com, K Prateek Nayak, Peter Zijlstra
Date: Wed, 08 Apr 2026 09:47:18 -0700
In-Reply-To: <3a146435-7f5a-40e1-9e63-b9bb7494faf1@intel.com>
References: <20260320124003.GU3738786@noisy.programming.kicks-ass.net> <63a095f02428700a7ff2623b8ea81e524a406834.camel@linux.intel.com> <20260324120008.GB3738010@noisy.programming.kicks-ass.net> <138c3f9d-309f-41e6-aa72-a3f6bd713bf0@intel.com> <22072ef8-5aec-49ac-9cc4-8a80bec14261@amd.com> <64649c85-29ab-4f70-a0c4-3c83cbdae2fc@intel.com> <20260402105530.GA3738786@noisy.programming.kicks-ass.net> <93d7eb33-c3a5-4498-bc26-57806b73d9e0@amd.com> <3b66e8e8-07e0-4f3e-a3ba-d97133af5162@intel.com>
 <1c742a1d8ecd8e314d704d46a44e2b8893479e50.camel@linux.intel.com> <3a146435-7f5a-40e1-9e63-b9bb7494faf1@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2026-04-08 at 17:25 +0800, Chen, Yu C wrote:
> On 4/8/2026 4:35 AM, Tim Chen wrote:
> > On Fri, 2026-04-03 at 13:46 +0800, Chen, Yu C wrote:
> > > On 4/2/2026 7:06 PM, K Prateek Nayak wrote:
> > > > Hello Peter,
> > > >
> > > > On 4/2/2026 4:25 PM, Peter Zijlstra wrote:
> > > > > On Thu, Apr 02, 2026 at 10:11:11AM +0530, K Prateek Nayak wrote:
> > > > >
> > > > > > It is still not super clear to me how the logic deals with more than
> > > > > > 128 CPUs in a DIE domain, because that will need more than the u64, but
> > > > > > sbm_find_next_bit() simply does:
> > > > > >
> > > > > > 	tmp = leaf->bitmap & mask; /* All are u64 */
> > > > > >
> > > > > > expecting just the u64 bitmap to represent all the CPUs in the leaf.
> > > > > >
> > > > > > If we have, say, 256 CPUs per DIE, we get shift(7) and arch_sbm_mask
> > > > > > as 7f (127), which allows a leaf to cover more than 64 CPUs, but we are
> > > > > > using the "u64 bitmap" directly and not:
> > > > > >
> > > > > > 	find_next_bit(bitmap, arch_sbm_mask)
> > > > > >
> > > > > > Am I missing something here?
> > > > >
> > > > > Nope. That logic just isn't there; that was left as an exercise to the
> > > > > reader :-)
> > > >
> > > > Ack! Let me go fiddle with that.
> > > >
> > >
> > > Nice catch. I hadn't noticed this since we have fewer than
> > > 64 CPUs per die. Please feel free to send patches to me when
> > > they're available.
> > >
> > > And regarding your other question about the calculation of arch_sbm_shift,
> > > I'm trying to understand why there is a subtraction of 1. Should it be:
> > > -	arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN] - 1;
> > > +	arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN - 1];
> >
> > Perhaps something like
> >
> > 	arch_sbm_shift = min(sizeof(unsigned long),
> > 			     topology_get_domain_shift(TOPO_TILE_DOMAIN));
> >
> > to take care of both AMD systems and the 64-bit leaf bitmask limit?
> >
>
> Yes, this should be doable (Prateek has mentioned using TOPO_TILE_DOMAIN).
> The only drawback I can think of is that if there are more than 64 CPUs
> within a die, it is possible that CPUs in different dies (LLCs) would be
> indexed in the same leaf and access the same mask,

First, I think I should have used

	arch_sbm_shift = min(BITS_PER_LONG,
			     topology_get_domain_shift(TOPO_TILE_DOMAIN));

I am assuming that we should choose TOPO_DIE_DOMAIN for Intel CPUs and
TOPO_TILE_DOMAIN for AMD CPUs, and that such a domain choice will span
one L3 (I think that's the case). Then leaf domains smaller than the
domain size will also only span one L3, by definition.

So for the 128-CPU example you gave, both leaves, with CPUs 0-63 and
64-127, will span the same LLC and we should not have cache bounce.

Tim

> which would still lead to cache
> contention. Maybe we should allocate the leaf cpumask according to the
> actual size of a die?
>
> thanks,
> Chenyu
>
>