From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1684620e-7c03-435c-9596-8d12ddba83bd@intel.com>
Date: Fri, 10 Apr 2026 13:51:55 +0800
From: "Chen, Yu C"
To: Tim Chen, K Prateek Nayak, Peter Zijlstra
CC: Pan Deng
Subject: Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260320124003.GU3738786@noisy.programming.kicks-ass.net>
 <63a095f02428700a7ff2623b8ea81e524a406834.camel@linux.intel.com>
 <20260324120008.GB3738010@noisy.programming.kicks-ass.net>
 <138c3f9d-309f-41e6-aa72-a3f6bd713bf0@intel.com>
 <22072ef8-5aec-49ac-9cc4-8a80bec14261@amd.com>
 <64649c85-29ab-4f70-a0c4-3c83cbdae2fc@intel.com>
 <20260402105530.GA3738786@noisy.programming.kicks-ass.net>
 <93d7eb33-c3a5-4498-bc26-57806b73d9e0@amd.com>
 <3b66e8e8-07e0-4f3e-a3ba-d97133af5162@intel.com>
 <1c742a1d8ecd8e314d704d46a44e2b8893479e50.camel@linux.intel.com>
 <2881a07f-ff14-4faa-9da7-3fbe206a463d@amd.com>
 <14eda829-fc6b-407a-93a3-0794ab521177@intel.com>
 <2dcbf93d-030d-466c-8b1c-8387513e9eb9@amd.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed

Hi Prateek, Tim,

On 4/10/2026 7:09 AM, Tim Chen wrote:
> On Thu, 2026-04-09 at 10:47 +0530, K Prateek Nayak wrote:
>> Hello Chenyu, Tim,
>>
>> On 4/8/2026 9:22 PM, K Prateek Nayak wrote:
>>> Hello Chenyu,
>>>
>>> On 4/8/2026 5:05 PM, Chen, Yu C wrote:
>>>> We haven't tried breaking it down further. One possible approach
>>>> is to partition it at L2 scope, the benefit of which may depend on
>>>> the workload.
>>>
>>> I fear at that point we'll have too many cachelines and too much
>>> cache pollution when the CPU starts reading this at tick to schedule
>>> a newidle balance.
>>>
>>> A 128 core system would bring in 128 * 64B = 8kB worth of data to
>>> traverse the mask and at that point it becomes a trade off between
>>> how fast you want reads vs writes and does it even speed up writes
>>> after a certain point?
>>>
>>> Sorry I got distracted by some other stuff today but I'll share the
>>> results from my experiments tomorrow.
>>
>> Here is some data from an experiment I ran on a 3rd Generation EPYC
>> system (2 socket x 64C/128T (8LLCs per socket)):
>>
>> Experiment: Two threads pinned per-CPU on all CPUs yielding to each other
>> and are operating on some cpumask - one setting the current CPU on the
>> mask and other clearing the current CPU: Just an estimate of worst case
>> scenario if we have to do one modification per sched-switch.
>>
>> I'm measuring total cycles taken for cpumask operations with following
>> variants:
>>
>> %cycles vs global mask operation
>>
>> global mask                                 : 100.0000% (var: 3.28%)
>> per-NUMA mask                               :  32.9209% (var: 7.77%)
>> per-LLC mask                                :   1.2977% (var: 4.85%)
>> per-LLC mask (u8 operation; no LOCK prefix) :   0.4930% (var: 0.83%)
>>
>> per-NUMA split is 3X faster, per-LLC on this 16LLC machine is 77x faster
>> and since there is enough space in the cacheline we can use a u8 to set
>> and clear the CPU atomically without LOCK prefix and then do a >> 3 to
>> get the CPU index from set bit which is 202x faster.
>>
>> If we use the u8 operations, we can only read 8CPUs per 8-byte load on
>> 64-bit system but with per-LLC mask, we can scan all 16CPUs on the LLC
>> with one 8-byte read and per-NUMA one requires two 8-byte reads to
>> scan the 128CPUs per socket.
>>
>> I think per-LLC mask (or, as Tim suggested, 64CPUs per cacheline) is
>> a good tradeoff between the speedup vs amount of loads required to
>> piece together the full cpumask. Thoughts?

Yes, making it per LLC should work well enough (for balancing) to
achieve optimal benefit. Let me run some similar tests to yours, plus
hackbench/schbench, to see what the results are.

BTW, on AMD systems, does the TILE domain always match the CCX where L3
is shared? On Intel the DIE is not always mapped to a domain where L3 is
shared.

>
> I agree that per-LLC mask is a good compromise between minimizing loads
> and offer good speed ups.
> I think we should get the LLC APICID mask from 0x4 leaf (L1, L2, L3)
> instead of inferring from 0x1f leaf (Tile, Die ...etc) for Intel.
> And the cache leaf I think is 0x8000_001D leaf for AMD.
> Those are parsed in cacheinfo code and we can get it from there.
>

Yes, let me check how we can leverage the L3 ID for that.

thanks,
Chenyu
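
For readers following the thread, here is a minimal user-space sketch of the "u8 per CPU, no LOCK prefix" variant described in the quoted experiment. It is not code from the patch series or from the benchmark. The names (struct llc_mask, llc_mask_set(), llc_mask_clear(), llc_mask_first()) are hypothetical, and the sketch assumes 16 CPUs per LLC as on the quoted test machine, a little-endian CPU, and the GCC/Clang __builtin_ctzll() builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 16 CPUs share one LLC, as on the quoted EPYC system. */
#define CPUS_PER_LLC 16

/* Hypothetical layout: one cacheline per LLC, one byte per CPU. */
struct llc_mask {
	uint8_t cpu[CPUS_PER_LLC];
} __attribute__((aligned(64)));

/*
 * Each CPU only writes its own byte, so a plain byte store is enough
 * and no LOCK-prefixed read-modify-write is needed. Kernel code would
 * presumably wrap these in WRITE_ONCE(); that is left out here.
 */
static inline void llc_mask_set(struct llc_mask *m, unsigned int cpu)
{
	assert(cpu < CPUS_PER_LLC);
	m->cpu[cpu] = 1;
}

static inline void llc_mask_clear(struct llc_mask *m, unsigned int cpu)
{
	assert(cpu < CPUS_PER_LLC);
	m->cpu[cpu] = 0;
}

/*
 * Return the lowest set CPU index within the LLC, or -1 if none is set.
 * Each 8-byte load covers 8 CPUs. On little-endian, byte k of the load
 * occupies bits 8k..8k+7, so (bit index >> 3) recovers the CPU offset.
 */
static inline int llc_mask_first(const struct llc_mask *m)
{
	for (unsigned int base = 0; base < CPUS_PER_LLC; base += 8) {
		uint64_t word;

		memcpy(&word, &m->cpu[base], sizeof(word));
		if (word)
			return (int)(base + ((unsigned int)__builtin_ctzll(word) >> 3));
	}
	return -1;
}
```

With this layout, scanning one LLC takes two 8-byte loads from a single cacheline, and the 16 LLCs of the quoted machine need 16 * 64B = 1kB in total, compared with the 128 * 64B = 8kB estimated above for a finer-grained split.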