Date: Fri, 10 Apr 2026 11:20:57 +0200
From: Peter Zijlstra
To: "Chen, Yu C"
Cc: Tim Chen, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy",
 Vincent Guittot, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
 Ben Segall, Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy,
 Hillf Danton, Shrikanth Hegde, Jianyong Wu, Yangyu Chen,
 Tingyin Duan, Vern Hao, Len Brown, Aubrey Li, Zhao Liu, Chen Yu,
 Adam Li, Aaron Lu, Josh Don, Gavin Guo, Qais Yousef, Libo Chen,
 linux-kernel@vger.kernel.org, "Luck, Tony", Reinette Chatre
Subject: Re: [Patch v4 17/22] sched/cache: Avoid cache-aware scheduling for memory-heavy processes
Message-ID: <20260410092057.GG3126523@noisy.programming.kicks-ass.net>
References: <339bb2636c7306e17540268a9295a8e673b92804.1775065312.git.tim.c.chen@linux.intel.com>
 <20260409124642.GC3126523@noisy.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Fri, Apr 10, 2026 at 04:59:19PM +0800, Chen, Yu C wrote:

> > This is pretty terrible. If you want LLC size, add it to the topology
> > information (and ideally integrate with RDT) and make proportional to
> > cpumask size, such that if someone cuts the domain in pieces, they get
> > proportional size etc.
>
> If I understand correctly, do you mean the following:
>
> 1. Introduce a generic arch_get_llc_size() as a wrapper
>    around the existing get_cpu_cacheinfo_level(), which
>    returns the llc_size. Both the scheduler and RDT can
>    use arch_get_llc_size().

The tie-in with RDT was more to affect the return of
arch_get_llc_size(). E.g. when RDT takes away some ways for specific
tasks, the total effective size gets reduced for generic use.

> 2. The sched domain stores llc_size in
>    sd->res_size = llc_size * sd_span / arch_llc_span,
>    and the cache_aware_scheduler uses sd->res_size for
>    the comparison.

Just so.

> We will adjust the code accordingly.

Thanks.
> > Also, if we have NUMA_BALANCING on, that can provide a much better
> > estimate for the actual size.
> >
> > Just using RSS seems like a very bad metric here.
>
> Got it. Currently we lack accurate memory footprint metrics in
> the kernel. If we support user-provided hints in the future, we
> can leverage RDT llc_occupancy metrics (is it legal to use RDT's
> metrics directly in the kernel? It would switch from an MSR read
> to an MMIO read, thus less overhead). For now, let me try to
> leverage NUMA fault-in stats. If NUMA balancing is off, I need
> to think more about how to avoid over-aggregation for
> memory-intensive workloads.

There are also things like this:

  https://lkml.kernel.org/r/20260323095104.238982-1-bharata@amd.com

But yeah, in an ideal world we could be looking at LLC cache hit/miss
information... streaming workloads would have a very low hit rate.

But yes, possible prctl() controls could help; create tools to disable
things per program etc.