From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
	Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
	Len Brown, Tim Chen, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu,
	Josh Don, Gavin Guo, Qais Yousef, Libo Chen, Luo Gengkun,
	linux-kernel@vger.kernel.org
Subject: [Patch v4 13/16] sched/cache: Fix cache aware scheduling enabling for multi LLCs system
Date: Wed, 13 May 2026 13:39:24 -0700
Message-Id: <6328a8a7f40925cec2a712d81ee58128a4c4444a.1778703694.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Chen Yu

If there are
multiple LLCs in the system, cache aware scheduling should be
enabled. However, there is a corner case: on a system with a single
NUMA node and a single LLC per node, the current implementation still
turns cache aware scheduling on, because at the time of the check the
parent domain has not yet been degenerated, so the current domain can
have the same CPU span as its parent. There is no need to enable
cache aware scheduling in this scenario.

Fix it by iterating over the parent domains to find one that is a
strict superset of the current sd_llc, so that cache aware scheduling
only takes effect when, after the duplicated parent domains have been
degenerated, more than one LLC actually remains in the partition.

For example, the expected behavior is:

  2 sockets, 1 LLC per socket:  MC span=0-3, PKG span=0-7, has_multi_llcs=true
  1 socket, 2 LLCs per socket:  MC span=0-3, PKG span=0-7, has_multi_llcs=true
  2 sockets, 2 LLCs per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
  1 socket, 1 LLC per socket:   MC span=0-3, PKG span=0-3, has_multi_llcs=false

This bug was reported by sashiko.

Fixes: d59f4fd1d303 ("sched/cache: Enable cache aware scheduling for multi LLCs NUMA node")
Signed-off-by: Chen Yu
Co-developed-by: Tim Chen
Signed-off-by: Tim Chen
---
 kernel/sched/topology.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index cff5a0ecd64d..07f0a3d28253 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1007,6 +1007,37 @@ static bool alloc_sd_llc(const struct cpumask *cpu_map,
 }
 #endif
 
+/*
+ * Return true if @sd belongs to an LLC group whose enclosing
+ * partition spans more than one LLC. @sd must be the topmost
+ * SD_SHARE_LLC domain.
+ *
+ * Any duplicated parent domains with the same span as @sd are
+ * skipped: before cpu_attach_domain() degeneration these still
+ * exist, after degeneration the loop is a no-op. This makes the
+ * helper usable both during sched domain build and against an
+ * already-attached domain tree.
+ *
+ * Note: For systems with a single LLC per node, cache-aware
+ * scheduling is still enabled when multiple nodes exist.
+ * However, NUMA balancing decisions take precedence over
+ * cache-aware scheduling. Conversely, if there is only one
+ * LLC per partition, cache-aware scheduling should be disabled.
+ */
+static bool sd_in_multi_llcs(struct sched_domain *sd)
+{
+	struct sched_domain *sdp = sd->parent;
+
+	/* it does not make sense to aggregate to 1 CPU */
+	if (sd->span_weight == 1)
+		return false;
+
+	while (sdp && sdp->span_weight == sd->span_weight)
+		sdp = sdp->parent;
+
+	return !!sdp;
+}
+
 /*
  * Return the canonical balance CPU for this group, this is the first CPU
  * of this group that's also in the balance mask.
@@ -3016,9 +3047,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 		 * NUMA imbalance stats for the hierarchy.
 		 */
 		if (sd->parent) {
-			if (IS_ENABLED(CONFIG_NUMA))
-				adjust_numa_imbalance(sd);
-			has_multi_llcs = true;
+			if (IS_ENABLED(CONFIG_NUMA))
+				adjust_numa_imbalance(sd);
+
+			if (sd_in_multi_llcs(sd))
+				has_multi_llcs = true;
 		}
 	}
 }
-- 
2.32.0