From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Vincent Guittot
Cc: Tim Chen, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton, Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao, Vern Hao, Len Brown, Aubrey Li, Zhao Liu, Chen Yu, Chen Yu, Adam Li, Aaron Lu, Tim Chen, Josh Don, Gavin Guo, Qais Yousef, Libo Chen, Luo Gengkun, linux-kernel@vger.kernel.org
Subject: [Patch v4 00/16] Cache aware scheduling enhancements
Date: Wed, 13 May 2026 13:39:11 -0700

This patch set contains cache-aware scheduling enhancements and bug fixes on top of Peter's sched/cache branch:
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/cache

Patches 1 to 6 resolve the over-aggregation issue, which is the remaining part of v4 that has not yet been merged into sched/cache. Patches 7 to 15 fix bugs reported by Sashiko (online and local).

Compared with cache-aware v4, the major change in the first part is storing the LLC effective size in the per-CPU bottom sched_domain. This allows checking whether a task's memory footprint exceeds the threshold by fetching the value directly from the corresponding sched_domain, instead of recalculating it every time. In addition, the NUMA balancing page-fault statistics are used instead of RSS to estimate the working set. However, if NUMA balancing is not enabled, this working set estimate is not available. Perhaps using RSS would be appropriate for such a scenario.

We also picked up Jianyong's optimization patch to reduce CPU scan overhead. Gengkun's CPU scan optimization is not included for now and will be revisited after further tuning.

Most patches in the second part address race conditions. Each patch fixes one independent issue to make review easier.

Test results show that the current version keeps the same performance as v4 for the workloads and platforms we tested.

Future plans are to introduce fine-grained control over which tasks use cache-aware scheduling, after the load-balance-based cache-aware scheduling is merged:

- Look into task tagging (e.g. with the schedqos framework or cgroups) for grouping non-process-based tasks to an LLC.
- Evaluate fast cache-aware aggregation in the wakeup path.

I will be on sabbatical from mid May to mid June. Chen Yu will still be following up on these patches.

Thanks.
Tim

Chen Yu (15):
  sched/cache: Disable cache aware scheduling for processes with high thread counts
  sched/cache: Skip cache-aware scheduling for single-threaded processes
  sched/cache: Calculate the LLC size and store it in sched_domain
  sched/cache: Avoid cache-aware scheduling for memory-heavy processes
  sched/cache: Add user control to adjust the aggressiveness of cache-aware scheduling
  sched/cache: Fix rcu warning when accessing sd_llc domain
  sched/cache: Fix potential NULL mm pointer access
  sched/cache: Annotate lockless accesses to mm->sc_stat.cpu
  sched/cache: Fix unpaired account_llc_enqueue/dequeue
  sched/cache: Fix checking active load balance by only considering the CFS task
  sched/cache: Fix race condition during sched domain rebuild
  sched/cache: Fix cache aware scheduling enabling for multi LLCs system
  sched/cache: Fix has_multi_llcs iff at least one partition has multiple LLCs
  sched/cache: Fix possible overflow when invalidating the preferred CPU
  sched/cache: Fix stale preferred_llc for a new task

Jianyong Wu (1):
  sched/cache: Allow only 1 thread of the process to calculate the LLC occupancy

 drivers/base/cacheinfo.c       |  23 +++
 include/linux/cacheinfo.h      |   1 +
 include/linux/sched.h          |   5 +
 include/linux/sched/topology.h |   7 +
 init/init_task.c               |   1 +
 kernel/exit.c                  |  29 ++++
 kernel/sched/debug.c           |  14 +-
 kernel/sched/fair.c            | 256 +++++++++++++++++++++++++++++----
 kernel/sched/sched.h           |   7 +-
 kernel/sched/topology.c        | 240 +++++++++++++++++++++++++------
 10 files changed, 509 insertions(+), 74 deletions(-)

-- 
2.32.0
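
The cover letter describes caching the effective LLC size in the per-CPU bottom sched_domain so that the footprint check reads a stored value instead of recomputing it. The idea can be illustrated with a minimal userspace sketch. Every name below (toy_sched_domain, toy_mm_stat, toy_build_domain, toy_exceeds_llc) is hypothetical and is not taken from the patches, and expressing the threshold as a percentage of the LLC size is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bottom-level sched_domain of one CPU. */
struct toy_sched_domain {
	unsigned long llc_bytes;	/* effective LLC size, stored once */
};

/* Hypothetical per-process statistics; the field name is illustrative. */
struct toy_mm_stat {
	unsigned long footprint_bytes;	/* working-set estimate */
};

/* Store the LLC size when the (toy) topology is built, not on every check. */
static void toy_build_domain(struct toy_sched_domain *sd,
			     unsigned long llc_bytes)
{
	assert(sd != NULL);
	sd->llc_bytes = llc_bytes;
}

/*
 * Return true when the estimated footprint exceeds pct percent of the
 * cached LLC size. The percentage knob is an assumption of this sketch.
 */
static bool toy_exceeds_llc(const struct toy_sched_domain *sd,
			    const struct toy_mm_stat *st, unsigned int pct)
{
	/* Divide before multiplying so large LLC sizes cannot overflow. */
	unsigned long limit = sd->llc_bytes / 100 * pct;

	return st->footprint_bytes > limit;
}
```

In this sketch, with a 32 MiB LLC and a 50 percent threshold, an 8 MiB footprint stays eligible for aggregation while a 24 MiB footprint is treated as too large. The check itself is a single field read plus a comparison, which is the property the cover letter attributes to storing the size in the sched_domain.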