From: Ricardo Neri
Subject: [PATCH v2 0/4] sched: Fix cluster scheduling in the presence of asymmetric capacity
Date: Wed, 29 Apr 2026 14:19:43 -0700
Message-Id: <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Tim C Chen, Chen Yu, Christian Loehle, Barry Song
Cc: "Rafael J.
 Wysocki", Len Brown, ricardo.neri@intel.com,
 linux-kernel@vger.kernel.org, Ricardo Neri

Cluster scheduling aims to maximize performance by spreading load across
clusters of CPUs that share mid-level resources [1]. It works well on
uniform systems, but it breaks down on topologies with big and small
cores arranged in clusters. As a result, it fails on several generations
of Intel processors already shipped and upcoming.

Consider the topology below of big (B) cores and clusters of small (s)
cores.

    ------   ------
    | B  |   | B  |   -----------------   -----------------
    |    |   |    |   | s | s | s | s |   | s | s | s | s |
    ------   ------   -----------------   -----------------
    | L2 |   | L2 |   |      L2       |   |      L2       |
    -------------------------------------------------------
    |                         L3                          |
    -------------------------------------------------------

On a partially busy system (one with idle CPUs; busy CPUs have one task
each), scheduling for asymmetric capacity ensures that misfit tasks land
on the big CPUs. The remaining tasks, misfit or not, run on the small
CPUs. When CONFIG_SCHED_CLUSTER is enabled, these remaining tasks are
supposed to be evenly spread among the small-CPU clusters.

Today, this does not happen. Several issues in the load balancer prevent
a small CPU in one cluster from pulling tasks from another:

a) update_sd_pick_busiest() may select a fully_busy group with higher
   per-CPU capacity as the busiest, preventing a subsequent fully_busy
   group of equal capacity from being correctly selected.
b) Misfit-load statistics are used to identify tasks that would benefit
   from migrating to bigger CPUs. Accounting misfit load is pointless if
   the destination CPU is equally small, and it also blocks balancing
   between clusters.
c) Due to b), groups that are truly has_spare or fully_busy get
   misclassified as misfit_task. update_sd_pick_busiest() then skips
   them, since a small destination CPU cannot help with misfit tasks.
d) Once a busiest group has been identified,
   sched_balance_find_src_rq() will refuse to migrate tasks to CPUs of
   equal capacity, even when doing so is precisely what is required to
   balance small-CPU clusters.
e) The SD_PREFER_SIBLING flag is missing from scheduling domains with
   asymmetric capacity, preventing the balancer from equalizing load
   across sibling small-core clusters.

Together, these issues prevent cluster-level balancing on systems with
asymmetric CPU capacity. This series addresses each problem and restores
the intended behavior. Details, rationale, and code changes are
explained in each patch.

I tested these patches on Alder Lake (with Hyper-Threading disabled),
Lunar Lake and Panther Lake. I also tested configurations with only one
CPU online per cluster to ensure that systems without cluster topology
continue to behave as expected.

Link: https://lore.kernel.org/r/20210924085104.44806-1-21cnbao@gmail.com/ [1]

Changes in v2:
- Patch 1: Rewrote patch description for clarity. Added a note
  clarifying that SD_ASYM_CPUCAPACITY and SMT are mutually exclusive.
  (Tim)
- Patch 2: Fixed a bug where the capacity check inadvertently broke the
  mutual exclusion of the sched_reduced_capacity() path. Keep marking
  the root domain as overloaded when misfit tasks are present to allow
  bigger CPUs to help via newly idle balance. (sashiko) Fixed the
  description to state that capacity_greater() looks for differences of
  ~5% or more, not 20%. (Christian)
- Patch 3: Use arch_scale_cpu_capacity() instead of capacity_of() to
  ignore runtime capacity variability.
  Inverted the capacity check. (Christian)
- Patch 4: Reworded the patch description for clarity.
- Link to v1: https://lore.kernel.org/r/20260330-rneri-fix-cas-clusters-v1-0-1e465b6fecb2@linux.intel.com/

---
Ricardo Neri (4):
      sched/fair: Check CPU capacity before comparing group types during load balance
      sched/fair: Skip misfit load accounting when the destination CPU cannot help
      sched/fair: Allow load balancing between CPUs of identical capacity
      sched/topology: Do not clear SD_PREFER_SIBLING in domains with clusters

 kernel/sched/fair.c     | 52 +++++++++++++++++++++++++++++++++++-------------
 kernel/sched/topology.c | 11 +++++++++--
 2 files changed, 46 insertions(+), 17 deletions(-)
---
base-commit: 8f1aacb683ef4a49b83dcc40bfce022aaa4aa597
change-id: 20250620-rneri-fix-cas-clusters-bb4287d1e152

Best regards,
-- 
Ricardo Neri