From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 4 Feb 2026 13:40:20 +0000
From: Jonathan Cameron
To: Linus Walleij
CC: Yushan Wang, ..., SeongJae Park
Subject: Re: [PATCH 1/3] soc cache: L3 cache driver for HiSilicon SoC
Message-ID: <20260204134020.00002393@huawei.com>
References: <20260203161843.649417-1-wangyushan12@huawei.com>
 <20260203161843.649417-2-wangyushan12@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 4 Feb 2026 01:10:01 +0100
Linus Walleij wrote:

> Hi Yushan,
>
> thanks for your patch!
>
> On Tue, Feb 3, 2026 at 5:18 PM Yushan Wang wrote:
> >
> > The driver will create a file of `/dev/hisi_l3c` on init, mmap
> > operations to it will allocate a memory region that is guaranteed to be
> > placed in L3 cache.
> >
> > The driver also provides unmap() to deallocate the locked memory.
> >
> > The driver also provides an ioctl interface for user to get cache lock
> > information, such as lock restrictions and locked sizes.
> >
> > Signed-off-by: Yushan Wang
>
> The commit message does not say *why* you are doing this?
>
> > +config HISI_SOC_L3C
> > +	bool "HiSilicon L3 Cache device driver"
> > +	depends on ACPI
> > +	depends on ARM64 || COMPILE_TEST
> > +	help
> > +	  This driver provides the functions to lock L3 cache entries from
> > +	  being evicted for better performance.
>
> Here is the reason though.
>
> Things like this need to be CC'd to linux-mm@vger.kernel.org.
>
> I don't see why userspace would be so well informed as to make decisions
> about what should be locked in the L3 cache and not?
>
> I see the memory hierarchy as any other hardware: a resource that is
> allocated and arbitrated by the kernel.
>
> The MM subsystem knows which memory is most cache hot.
> Especially when you use DAMON DAMOS, which has the sole
> purpose of executing actions like that. Here is a good YouTube:
> https://www.youtube.com/watch?v=xKJO4kLTHOI

Hi Linus,

This typically isn't about cache hotness. If it were, the data would be
in the cache without this. It's about ensuring that something which
would otherwise be unlikely to be there is in the cache. Normally that's
a latency-critical region. In general the kernel has no chance of
figuring out what those are ahead of time; only userspace can know
(based on profiling etc.), and that is per workload. The first hit
matters in these use cases, and it's not something the prefetchers can
help with.

The only thing we could do if this was in kernel would be to have
userspace pass some hints and then let the kernel actually kick off the
process. That just boils down to using a different interface to do what
this driver is doing (and that's the conversation this series is trying
to get going).

It's a finite resource and you absolutely need userspace to be able to
tell if it got what it asked for or not. DAMON might be useful for that
preanalysis, but it can't do anything for the infrequent, extremely
latency-sensitive accesses.

Normally this is fleet-wide stuff based on intensive benchmarking of a
few nodes. Same sort of approach as the original warehouse-scale
computing paper on tuning zswap capacity across a fleet. It's an extreme
form of profile-guided optimization (and not currently automatic, I
think?). If we are putting code in this locked region, the program has
been carefully recompiled / linked to group the critical parts so that
we can use the minimum number of these locked regions.
Data is a little simpler. It's kind of similar to resctrl, but at a
sub-process granularity.

>
> Shouldn't the MM subsystem be in charge of determining, locking
> down and freeing up hot regions in L3 cache?
>
> This looks more like userspace is going to determine that but
> how exactly? By running DAMON? Then it's better to keep the
> whole mechanism in the kernel where it belongs and let the
> MM subsystem adapt locked L3 cache to the usage patterns.

I haven't yet come up with any plausible scheme by which the MM
subsystem could do this.

I think what we need here, Yushan, is more detail on end-to-end use
cases for this. Some examples etc. as clearer motivation.

Jonathan

>
> Yours,
> Linus Walleij
>