Message-ID: <7297e5e6-ae5a-42dc-8495-fddbb87ddf87@intel.com>
Date: Mon, 22 Dec 2025 10:21:24 +0800
Subject: Re: [PATCH] sched/fair: Avoid false sharing in nohz struct
To: Shrikanth Hegde
Cc: linux-kernel@vger.kernel.org, Benjamin Lei, Tim Chen, Tianyou Li,
 Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider
References: <20251211055612.4071266-1-wangyang.guo@intel.com>
From: "Guo, Wangyang"

On 12/21/2025 9:05 PM, Shrikanth Hegde wrote:
> Hi Wangyang,
>
> On 12/11/25 11:26 AM, Wangyang Guo wrote:
>> There are two potential false sharing issue in nohz
>> struct:
>> 1. idle_cpus_mask is a read-mostly field, but share the same cacheline
>>     with frequently updated nr_cpus.
>
> Updates to idle_cpus_mask is not same cacheline. it is updated alongside
> nr_cpus.
>
> with CPUMASK_OFFSTACK=y, idle_cpus_mask is a pointer to the actual mask.
> Updates to it happen in another cacheline.
>
> with CPUMASK_OFFSTACK=n, idle_cpus_mask is on the stack and its length
> depends on NR_CPUS. typical value being 512/2048/8192 it can span a few
> cachelines. So updates to it likely in different cacheline compared to
> nr_cpus.
>
> see https://lore.kernel.org/all/aS6bK4ad-wO2fsoo@gmail.com/
>

This patch mainly targets the case where idle_cpus_mask is a pointer
(CPUMASK_OFFSTACK=y), which is the default for many distro kernels.

>
> Likely in your case, nr_cpus updates are the costly ones.
> Try below and see if it helps to fix your issue too.
> https://lore.kernel.org/all/20251201183146.74443-1-sshegde@linux.ibm.com/
> I Should send out new version soon.
>
>> 2. Data followed by nohz still share the same cacheline and has
>>     potential false sharing issue.
>>
>
> How does your patch handle this?
> I don't see any other struct apart from nohz being changed.

The data that follows nohz is implicit and determined by the compiler.
For example, this is the layout from /proc/kallsyms on my machine:

ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime

What we can do is place the read-mostly `idle_cpus_mask` pointer in a
new cacheline, so that the data following nohz is not affected by
updates to nr_cpus.

>
>> This patch tries to resolve the above two problems by isolating the
>> frequently updated fields in a single cacheline.
>>
>> Reported-by: Benjamin Lei
>> Reviewed-by: Tim Chen
>> Reviewed-by: Tianyou Li
>> Signed-off-by: Wangyang Guo
>> ---
>>   kernel/sched/fair.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 5b752324270b..bcc2766b7986 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7193,13 +7193,14 @@ static DEFINE_PER_CPU(cpumask_var_t,
>> should_we_balance_tmpmask);
>>   #ifdef CONFIG_NO_HZ_COMMON
>>   static struct {
>> -    cpumask_var_t idle_cpus_mask;
>> -    atomic_t nr_cpus;
>> +    /* Isolate frequently updated fields in a cacheline to avoid
>> false sharing issue. */
>> +    atomic_t nr_cpus ____cacheline_aligned;
>>       int has_blocked;        /* Idle CPUS has blocked load */
>>       int needs_update;        /* Newly idle CPUs need their
>> next_balance collated */
>>       unsigned long next_balance;     /* in jiffy units */
>>       unsigned long next_blocked;    /* Next update of blocked load in
>> jiffies */
>> -} nohz ____cacheline_aligned;
>> +    cpumask_var_t idle_cpus_mask ____cacheline_aligned;
>> +} nohz;
>>
>
> This can cause a lot of space wastage.
> for exp: powerpc has 128 byte cacheline.
>

nohz is a global; only one instance exists. The size increase is
minimal, less than one cacheline.

BR
Wangyang