Date: Fri, 10 Apr 2026 11:20:57 +0200
From: Peter Zijlstra
To: "Chen, Yu C"
Cc: Tim Chen, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy",
 Vincent Guittot, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
 Ben Segall, Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy,
 Hillf Danton, Shrikanth Hegde, Jianyong Wu, Yangyu Chen,
 Tingyin Duan, Vern Hao, Len Brown, Aubrey Li, Zhao Liu, Chen Yu,
 Adam Li, Aaron Lu, Josh Don, Gavin Guo, Qais Yousef, Libo Chen,
 linux-kernel@vger.kernel.org, "Luck, Tony", Reinette Chatre
Subject: Re: [Patch v4 17/22] sched/cache: Avoid cache-aware scheduling for memory-heavy processes
Message-ID: <20260410092057.GG3126523@noisy.programming.kicks-ass.net>
References: <339bb2636c7306e17540268a9295a8e673b92804.1775065312.git.tim.c.chen@linux.intel.com>
 <20260409124642.GC3126523@noisy.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Fri, Apr 10, 2026 at 04:59:19PM +0800, Chen, Yu C wrote:

> > This is pretty terrible. If you want LLC size, add it to the topology
> > information (and ideally integrate with RDT) and make proportional to
> > cpumask size, such that if someone cuts the domain in pieces, they get
> > proportional size etc.
>
> If I understand correctly, do you mean the following:
>
> 1. Introduce a generic arch_get_llc_size() as a wrapper
>    around the existing get_cpu_cacheinfo_level(), which
>    returns the llc_size. Both the scheduler and RDT can
>    use arch_get_llc_size().

The tie-in with RDT was more to affect the return of
arch_get_llc_size(). E.g. when RDT takes away some ways for specific
tasks, the total effective size gets reduced for generic use.

> 2. The sched domain stores llc_size in
>    sd->res_size = llc_size * sd_span / arch_llc_span,
>    and the cache_aware_scheduler uses sd->res_size for
>    the comparison.

Just so.

> We will adjust the code accordingly.

Thanks.
> > Also, if we have NUMA_BALANCING on, that can provide a much better
> > estimate for the actual size.
> >
> > Just using RSS seems like a very bad metric here.
>
> Got it. Currently we lack accurate memory footprint metrics in
> the kernel. If we support user-provided hints in the future, we
> can leverage RDT llc_occupancy metrics (is it legal to use RDT's
> metrics directly in the kernel? It would switch from an MSR read
> to an MMIO read, thus less overhead). For now, let me try to
> leverage NUMA fault-in stats. If NUMA balancing is off, I need
> to think more about how to avoid over-aggregation for
> memory-intensive workloads.

There are also things like this:

  https://lkml.kernel.org/r/20260323095104.238982-1-bharata@amd.com

But yeah, in an ideal world we could be looking at LLC cache hit/miss
information... streaming workloads would have a very low hit rate.

But yes, possible prctl() controls could help; create tools to disable
things per program etc.