* [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
@ 2025-01-16 18:54 Radu Rendec
2025-01-16 18:54 ` [RFC PATCH 1/1] " Radu Rendec
2025-01-17 13:08 ` [RFC PATCH 0/1] " Rob Herring
0 siblings, 2 replies; 6+ messages in thread
From: Radu Rendec @ 2025-01-16 18:54 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Sudeep Holla, Catalin Marinas, Will Deacon, Borislav Petkov
Hi all,
I found an out-of-bounds write bug in the arm64 implementation of
populate_cache_leaves() and I'm trying to fix it. The reason why this
is an RFC is that I'm not entirely sure these changes are sufficient.
The problem is described in detail in the patch itself, so I won't
repeat it here. The gist of it is that on arm64 boards where the device
tree describes the cache structure, the number of cache leaves that
comes out of fetch_cache_info() and is used to size the cacheinfo array
can be smaller than the actual number of leaves. In that case,
populate_cache_leaves() writes past the cacheinfo array bounds.
The way I fixed it, the code doesn't change too much and doesn't look
ugly but it's still possible to write past the array bounds if the last
populated level is a separate data/instruction cache. But I'm not sure
if this is a real-world scenario, so this is one of the areas where I'm
looking for feedback. For example, the DT may define a single unified
level (so levels=1, leaves=1) but in fact L1 is a split D/I cache. Or
the DT defines multiple levels, the last one being a unified cache, but
in reality the last level is a split D/I cache. I believe the latter
scenario is very unlikely since typically only L1 is a split D/I cache.
The other thing that doesn't look right is that init_of_cache_level()
bumps the level at each iteration to whatever the node corresponding to
that level has in the `cache-level` property, but that way one or more
levels can be skipped without accounting any leaves for them. This is
exactly what happens in my case (the Renesas R-Car S4 board, described
in arch/arm64/boot/dts/renesas/r8a779f0.dtsi). In this case, even if
populate_cache_leaves() is fixed, the cache information in the kernel
will be incomplete. Shouldn't init_of_cache_level() return -EINVAL also
when a level is skipped altogether?
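To make the accounting concrete, here is a rough model of how I read
that logic for this DT (illustration only, not the actual kernel code;
the "2 leaves by default, 1 leaf for a cache-unified node" behaviour
is what I observe on my board):

#include <stdio.h>

/*
 * Rough model of the DT level/leaf accounting for r8a779f0, for
 * illustration only. The CPU nodes carry no cache properties, so they
 * are assumed to have a split L1 (2 leaves); the single
 * next-level-cache node is cache-unified with cache-level = <3>.
 */
int main(void)
{
	unsigned int levels = 1;	/* the CPU node itself is level 1 */
	unsigned int leaves = 2;	/* no cache props => assume split I/D */

	/* one iteration per node reachable via next-level-cache (one here) */
	unsigned int node_level = 3;	/* the node's cache-level property */

	leaves += 1;			/* cache-unified => one leaf */
	levels = node_level;		/* jumps from 1 to 3; level 2 is never accounted */

	printf("levels=%u leaves=%u\n", levels, leaves);	/* levels=3 leaves=3 */
	return 0;
}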
Last, and also related to the previous question, is a device tree
definition that skips a cache level correct? Or, in other words, should
the definition in r8a779f0.dtsi be fixed?
Thanks!
Radu Rendec (1):
arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
arch/arm64/kernel/cacheinfo.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH 1/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
2025-01-16 18:54 [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect Radu Rendec
@ 2025-01-16 18:54 ` Radu Rendec
2025-01-17 13:08 ` [RFC PATCH 0/1] " Rob Herring
1 sibling, 0 replies; 6+ messages in thread
From: Radu Rendec @ 2025-01-16 18:54 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Sudeep Holla, Catalin Marinas, Will Deacon, Borislav Petkov
The (percpu) cacheinfo array is allocated based on the number of cache
leaves determined by fetch_cache_info(). On arm64 systems, when the
device tree provides cache information, the number of cache
levels/leaves comes from the device tree, but the cacheinfo data
structures are then populated by analyzing the relevant CPU registers. If
the information in the device tree is incorrect and provides a smaller
number of leaves than the actual one, populate_cache_leaves() happily
writes past the allocated struct cacheinfo array.
An example is the Renesas R-Car S4 Spider board, where the cache
information is described in arch/arm64/boot/dts/renesas/r8a779f0.dtsi.
The CPU nodes contain no explicit info (other than `next-level-cache`),
so of_count_cache_leaves() defaults to 2. The next cache level has an
explicit `cache-level` property equal to 3 and `cache-unified`, so at
the first iteration in init_of_cache_level(), the level is bumped to 3
(from 1) and the leaves count is incremented to 3. So, it ends up with
a total of 3 levels and 3 leaves. However, in reality there are 4 leaves
because L2 does exist even if it's not described in the device tree. So,
the code in populate_cache_leaves() attempts to populate the info for
all 4 leaves and writes past the struct cacheinfo array.
This is easily observed on boot if KASAN is enabled:
8<----------------------------------------------------------------------
BUG: KASAN: slab-out-of-bounds in populate_cache_leaves+0x314/0x388
Write of size 4 at addr ffff0004402e16c4 by task swapper/0/1
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc6+ #14
Hardware name: Renesas Spider CPU and Breakout boards based on r8a779f0 (DT)
Call trace:
show_stack+0x34/0x98 (C)
dump_stack_lvl+0x7c/0xa0
print_address_description.constprop.0+0x90/0x320
print_report+0x108/0x1f0
kasan_report+0xb4/0x100
__asan_report_store4_noabort+0x20/0x30
populate_cache_leaves+0x314/0x388
detect_cache_attributes+0x2f0/0x440
update_siblings_masks+0x30/0x5b0
store_cpu_topology+0x154/0x1d8
smp_prepare_cpus+0x34/0x180
kernel_init_freeable+0x338/0x480
kernel_init+0x28/0x170
ret_from_fork+0x10/0x20
Allocated by task 1:
kasan_save_stack+0x2c/0x58
kasan_save_track+0x20/0x40
kasan_save_alloc_info+0x40/0x60
__kasan_kmalloc+0xd4/0xd8
__kmalloc_noprof+0x190/0x4b0
allocate_cache_info+0x90/0x188
fetch_cache_info+0x320/0x408
init_cpu_topology+0x318/0x398
smp_prepare_cpus+0x28/0x180
kernel_init_freeable+0x338/0x480
kernel_init+0x28/0x170
ret_from_fork+0x10/0x20
8<----------------------------------------------------------------------
The loop that detects/populates cache information already has a bounds
check on the array size but does not account for cache levels with
separate data/instruction caches. Fix this by incrementing the index
for each populated leaf (instead of each populated level).
Fixes: 5d425c186537 ("arm64: kernel: add support for cpu cache information")
Signed-off-by: Radu Rendec <rrendec@redhat.com>
---
arch/arm64/kernel/cacheinfo.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index d9c9218fa1fdd..c9a75b13972f7 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -101,16 +101,16 @@ int populate_cache_leaves(unsigned int cpu)
 	unsigned int level, idx;
 	enum cache_type type;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-	struct cacheinfo *this_leaf = this_cpu_ci->info_list;
+	struct cacheinfo *infos = this_cpu_ci->info_list;
 
 	for (idx = 0, level = 1; level <= this_cpu_ci->num_levels &&
-	     idx < this_cpu_ci->num_leaves; idx++, level++) {
+	     idx < this_cpu_ci->num_leaves; level++) {
 		type = get_cache_type(level);
 		if (type == CACHE_TYPE_SEPARATE) {
-			ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
-			ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
+			ci_leaf_init(&infos[idx++], CACHE_TYPE_DATA, level);
+			ci_leaf_init(&infos[idx++], CACHE_TYPE_INST, level);
 		} else {
-			ci_leaf_init(this_leaf++, type, level);
+			ci_leaf_init(&infos[idx++], type, level);
 		}
 	}
 	return 0;
--
2.47.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
2025-01-16 18:54 [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect Radu Rendec
2025-01-16 18:54 ` [RFC PATCH 1/1] " Radu Rendec
@ 2025-01-17 13:08 ` Rob Herring
2025-01-17 16:59 ` Radu Rendec
1 sibling, 1 reply; 6+ messages in thread
From: Rob Herring @ 2025-01-17 13:08 UTC (permalink / raw)
To: Radu Rendec
Cc: linux-arm-kernel, Sudeep Holla, Catalin Marinas, Will Deacon,
Borislav Petkov
On Thu, Jan 16, 2025 at 01:54:57PM -0500, Radu Rendec wrote:
> Hi all,
>
> I found an out-of-bounds write bug in the arm64 implementation of
> populate_cache_leaves() and I'm trying to fix it. The reason why this
> is an RFC is that I'm not entirely sure these changes are sufficient.
>
> The problem is described in detail in the patch itself, so I won't
> repeat it here. The gist of it is that on arm64 boards where the device
> tree describes the cache structure, the number of cache leaves that
> comes out of fetch_cache_info() and is used to size the cacheinfo array
> can be smaller than the actual number of leaves. In that case,
> populate_cache_leaves() writes past the cacheinfo array bounds.
>
> The way I fixed it, the code doesn't change too much and doesn't look
> ugly but it's still possible to write past the array bounds if the last
> populated level is a separate data/instruction cache. But I'm not sure
> if this is a real-world scenario, so this is one of the areas where I'm
> looking for feedback. For example, the DT may define a single unified
> level (so levels=1, leaves=1) but in fact L1 is a split D/I cache. Or
> the DT defines multiple levels, the last one being a unified cache, but
> in reality the last level is a split D/I cache. I believe the latter
> scenario is very unlikely since typically only L1 is a split D/I cache.
I think it is a safe assumption there is not a split cache below a
unified cache. The DT binding doesn't support that either as
'next-level-cache' is a single phandle (though I guess it could be
extended to allow multiple phandles).
> The other thing that doesn't look right is that init_of_cache_level()
> bumps the level at each iteration to whatever the node corresponding to
> that level has in the `cache-level` property, but that way one or more
> levels can be skipped without accounting any leaves for them. This is
> exactly what happens in my case (the Renesas R-Car S4 board, described
> in arch/arm64/boot/dts/renesas/r8a779f0.dtsi). In this case, even if
> populate_cache_leaves() is fixed, the cache information in the kernel
> will be incomplete. Shouldn't init_of_cache_level() return -EINVAL also
> when a level is skipped altogether?
>
> Last, and also related to the previous question, is a device tree
> definition that skips a cache level correct? Or, in other words, should
> the definition in r8a779f0.dtsi be fixed?
The answer depends on whether there are L2 caches in the cores
or not. They are optional in the A55. Calling the L3 level 2 when
there's no L2 would be confusing. So if there's not an L2, I think
the description is correct (though perhaps missing L1 cache sizes if
that's not otherwise discoverable). If there are L2 caches, then there
should be a cache node under each cpu node.
It could get even messier with heterogeneous systems where some cores
have an L2 and some don't.
So the cacheinfo code needs to deal with skipped levels.
Rob
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
2025-01-17 13:08 ` [RFC PATCH 0/1] " Rob Herring
@ 2025-01-17 16:59 ` Radu Rendec
2025-01-21 9:20 ` Alireza Sanaee
0 siblings, 1 reply; 6+ messages in thread
From: Radu Rendec @ 2025-01-17 16:59 UTC (permalink / raw)
To: Rob Herring
Cc: linux-arm-kernel, Sudeep Holla, Catalin Marinas, Will Deacon,
Borislav Petkov
On Fri, 2025-01-17 at 07:08 -0600, Rob Herring wrote:
> On Thu, Jan 16, 2025 at 01:54:57PM -0500, Radu Rendec wrote:
> > I found an out-of-bounds write bug in the arm64 implementation of
> > populate_cache_leaves() and I'm trying to fix it. The reason why this
> > is an RFC is that I'm not entirely sure these changes are sufficient.
> >
> > The problem is described in detail in the patch itself, so I won't
> > repeat it here. The gist of it is that on arm64 boards where the device
> > tree describes the cache structure, the number of cache leaves that
> > comes out of fetch_cache_info() and is used to size the cacheinfo array
> > can be smaller than the actual number of leaves. In that case,
> > populate_cache_leaves() writes past the cacheinfo array bounds.
> >
> > The way I fixed it, the code doesn't change too much and doesn't look
> > ugly but it's still possible to write past the array bounds if the last
> > populated level is a separate data/instruction cache. But I'm not sure
> > if this is a real-world scenario, so this is one of the areas where I'm
> > looking for feedback. For example, the DT may define a single unified
> > level (so levels=1, leaves=1) but in fact L1 is a split D/I cache. Or
> > the DT defines multiple levels, the last one being a unified cache, but
> > in reality the last level is a split D/I cache. I believe the latter
> > scenario is very unlikely since typically only L1 is a split D/I cache.
>
> I think it is a safe assumption there is not a split cache below a
> unified cache. The DT binding doesn't support that either as
> 'next-level-cache' is a single phandle (though I guess it could be
> extended to allow multiple phandles).
If I'm reading the code correctly, even though 'next-level-cache' is a
single phandle, the node it points to can still have 'i-cache-size' and
'd-cache-size' properties, which would make it a split cache - at least
as far as of_count_cache_leaves() is concerned.
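For reference, this is roughly how I read the per-node leaf counting
(paraphrased sketch of my understanding, not a verbatim copy of the
code in drivers/base/cacheinfo.c):

/*
 * Paraphrased sketch of the per-node leaf counting as I understand it:
 * each of cache-size, i-cache-size and d-cache-size counts as one
 * leaf; with none of them present, the node counts as 1 leaf if it
 * has cache-unified and as a split cache (2 leaves) otherwise.
 */
static unsigned int count_leaves(struct device_node *np)
{
	unsigned int leaves = 0;

	if (of_property_read_bool(np, "cache-size"))
		leaves++;
	if (of_property_read_bool(np, "i-cache-size"))
		leaves++;
	if (of_property_read_bool(np, "d-cache-size"))
		leaves++;
	if (leaves)
		return leaves;

	return of_property_read_bool(np, "cache-unified") ? 1 : 2;
}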
But I'm more worried about the scenario where the DT defines only L1
and defines it as a unified cache but CLIDR_EL1 reports it's a split
cache. In that case, populate_cache_leaves() would still write past the
array bounds, even with my proposed changes.
I'm starting to think it's much safer to just check the size correctly
instead of relying on assumptions about what cache layout is (un)likely.
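Something along these lines, for example (untested sketch on top of my
patch, just to illustrate the idea of a strict per-leaf check, not
what I actually posted):

int populate_cache_leaves(unsigned int cpu)
{
	unsigned int level, idx;
	enum cache_type type;
	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
	struct cacheinfo *infos = this_cpu_ci->info_list;

	for (idx = 0, level = 1; level <= this_cpu_ci->num_levels; level++) {
		type = get_cache_type(level);

		if (type == CACHE_TYPE_SEPARATE) {
			/* a split level needs two free slots */
			if (idx + 1 >= this_cpu_ci->num_leaves)
				break;
			ci_leaf_init(&infos[idx++], CACHE_TYPE_DATA, level);
			ci_leaf_init(&infos[idx++], CACHE_TYPE_INST, level);
		} else {
			if (idx >= this_cpu_ci->num_leaves)
				break;
			ci_leaf_init(&infos[idx++], type, level);
		}
	}
	return 0;
}

That way the last populated level can be a split D/I cache and we
still never write past num_leaves, regardless of what the DT said.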
> > The other thing that doesn't look right is that init_of_cache_level()
> > bumps the level at each iteration to whatever the node corresponding to
> > that level has in the `cache-level` property, but that way one or more
> > levels can be skipped without accounting any leaves for them. This is
> > exactly what happens in my case (the Renesas R-Car S4 board, described
> > in arch/arm64/boot/dts/renesas/r8a779f0.dtsi). In this case, even if
> > populate_cache_leaves() is fixed, the cache information in the kernel
> > will be incomplete. Shouldn't init_of_cache_level() return -EINVAL also
> > when a level is skipped altogether?
> >
> > Last, and also related to the previous question, is a device tree
> > definition that skips a cache level correct? Or, in other words, should
> > the definition in r8a779f0.dtsi be fixed?
>
> The answer depends on whether there are L2 caches in the cores
> or not. They are optional in the A55. Calling the L3 level 2 when
> there's no L2 would be confusing. So if there's not an L2, I think
> the description is correct (though perhaps missing L1 cache sizes if
> that's not otherwise discoverable). If there are L2 caches, then there
> should be a cache node under each cpu node.
>
> It could get even messier with heterogeneous systems where some cores
> have an L2 and some don't.
>
> So the cacheinfo code needs to deal with skipped levels.
I just checked the technical documentation (I should have done that in
the first place), and it's consistent with the DT. There is no L2 on
that system, and this reconfirms what you just said.
But this is where it gets interesting. CLIDR_EL1 is 0x82000023, so it
reports L1 and L2. That means the information populate_cache_leaves()
puts into the kernel data structures is not only inconsistent with the
DT but also incorrect.
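For reference, this is how I decoded that value (quick throwaway
program, for illustration only; the Ctype<n> fields are 3 bits per
level):

#include <stdio.h>

/*
 * Hand decoding of CLIDR_EL1 = 0x82000023. Ctype<n> occupies bits
 * [3n-1:3n-3]: 0 = no cache, 1 = I only, 2 = D only,
 * 3 = separate I and D, 4 = unified.
 */
int main(void)
{
	unsigned long clidr = 0x82000023;
	static const char *names[] = {
		"no cache", "I only", "D only", "split I/D", "unified"
	};

	for (int level = 1; level <= 7; level++) {
		unsigned int ctype = (clidr >> (3 * (level - 1))) & 0x7;

		if (ctype)
			printf("L%d: %s\n", level,
			       ctype <= 4 ? names[ctype] : "reserved");
	}
	/* prints: L1: split I/D, L2: unified */
	return 0;
}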
Is CLIDR_EL1 supposed to "skip" missing levels i.e. report L2 instead
of L3 if L2 doesn't exist?
Thanks,
Radu
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
2025-01-17 16:59 ` Radu Rendec
@ 2025-01-21 9:20 ` Alireza Sanaee
2025-01-22 15:37 ` Radu Rendec
0 siblings, 1 reply; 6+ messages in thread
From: Alireza Sanaee @ 2025-01-21 9:20 UTC (permalink / raw)
To: Radu Rendec, robh
Cc: Rob Herring, linux-arm-kernel, Sudeep Holla, Catalin Marinas,
Will Deacon, Borislav Petkov, jonathan.cameron
On Fri, 17 Jan 2025 11:59:36 -0500
Radu Rendec <rrendec@redhat.com> wrote:
> On Fri, 2025-01-17 at 07:08 -0600, Rob Herring wrote:
> > On Thu, Jan 16, 2025 at 01:54:57PM -0500, Radu Rendec wrote:
> > > I found an out-of-bounds write bug in the arm64 implementation of
> > > populate_cache_leaves() and I'm trying to fix it. The reason why
> > > this is an RFC is that I'm not entirely sure these changes are
> > > sufficient.
> > >
> > > The problem is described in detail in the patch itself, so I won't
> > > repeat it here. The gist of it is that on arm64 boards where the
> > > device tree describes the cache structure, the number of cache
> > > leaves that comes out of fetch_cache_info() and is used to size
> > > the cacheinfo array can be smaller than the actual number of
> > > leaves. In that case, populate_cache_leaves() writes past the
> > > cacheinfo array bounds.
> > >
> > > The way I fixed it, the code doesn't change too much and doesn't
> > > look ugly but it's still possible to write past the array bounds
> > > if the last populated level is a separate data/instruction cache.
> > > But I'm not sure if this is a real-world scenario, so this is one
> > > of the areas where I'm looking for feedback. For example, the DT
> > > may define a single unified level (so levels=1, leaves=1) but in
> > > fact L1 is a split D/I cache. Or the DT defines multiple levels,
> > > the last one being a unified cache, but in reality the last level
> > > is a split D/I cache. I believe the latter scenario is very
> > > unlikely since typically only L1 is a split D/I cache.
> >
> > I think it is a safe assumption there is not a split cache below a
> > unified cache. The DT binding doesn't support that either as
> > 'next-level-cache' is a single phandle (though I guess it could be
> > extended to allow multiple phandles).
>
> If I'm reading the code correctly, even though 'next-level-cache' is a
> single phandle, the node it points to can still have 'i-cache-size'
> and 'd-cache-size' properties, which would make it a split cache - at
> least as far as of_count_cache_leaves() is concerned.
>
> But I'm more worried about the scenario where the DT defines only L1
> and defines it as a unified cache but CLIDR_EL1 reports it's a split
> cache. In that case, populate_cache_leaves() would still write past
> the array bounds, even with my proposed changes.
>
> I'm starting to think it's much safer to just check the size correctly
> instead of relying on assumptions about what cache layout is
> (un)likely.
>
> > > The other thing that doesn't look right is that
> > > init_of_cache_level() bumps the level at each iteration to
> > > whatever the node corresponding to that level has in the
> > > `cache-level` property, but that way one or more levels can be
> > > skipped without accounting any leaves for them. This is exactly
> > > what happens in my case (the Renesas R-Car S4 board, described in
> > > arch/arm64/boot/dts/renesas/r8a779f0.dtsi). In this case, even if
> > > populate_cache_leaves() is fixed, the cache information in the
> > > kernel will be incomplete. Shouldn't init_of_cache_level() return
> > > -EINVAL also when a level is skipped altogether?
> > >
> > > Last, and also related to the previous question, is a device tree
> > > definition that skips a cache level correct? Or, in other words,
> > > should the definition in r8a779f0.dtsi be fixed?
> >
> > The answer depends on whether there are L2 caches in the cores
> > or not. They are optional in the A55. Calling the L3 level 2 when
> > there's no L2 would be confusing. So if there's not an L2, I think
> > the description is correct (though perhaps missing L1 cache sizes
> > if that's not otherwise discoverable). If there are L2 caches, then
> > there should be a cache node under each cpu node.
> >
> > It could get even messier with heterogeneous systems where some
> > cores have an L2 and some don't.
> >
> > So the cacheinfo code needs to deal with skipped levels.
>
> I just checked the technical documentation (I should have done that in
> the first place), and it's consistent with the DT. There is no L2 on
> that system, and this reconfirms what you just said.
>
> But this is where it gets interesting. CLIDR_EL1 is 0x82000023, so it
> reports L1 and L2. That means the information populate_cache_leaves()
> puts into the kernel data structures is not only inconsistent with the
> DT but also incorrect.
>
> Is CLIDR_EL1 supposed to "skip" missing levels i.e. report L2 instead
> of L3 if L2 doesn't exist?
>
> Thanks,
> Radu
>
>
Hi Radu,
I believe this discussion is heading in the right direction, especially
when examining the code. Currently, data is sometimes retrieved from
the device tree and other times from the system’s registers. While
there are cross-checks in place, it can still be confusing and
error-prone.
One scenario that comes to mind involves QEMU, where not all system
registers, particularly the cache-related ones such as CLIDR_EL1, are
implemented correctly in TCG. Additionally, at this point, QEMU does
not generate any cache descriptions in the device tree, which is
something I am actively working on [1]. That patch set will likely
require the kernel to handle various cases and address these
inconsistencies effectively.
I also found it challenging to understand how the number of leaves is
calculated and used in the code. I wasn't specifically looking for
bugs; my focus was on identifying the best approach to sharing caches
between SMT threads. There are some existing discussions and trials
already [2,3]. In the new series that I will share soon, I allow
adding another layer, an l1-cache node referenced by next-level-cache
at each CPU node. This should also reduce the complexity of counting
leaves, instead of just looking for different properties in the CPU
node, and might allow for easier testing and experimentation with more
complex scenarios. It would be good to hear your thoughts on this one
too.
1) https://mail.gnu.org/archive/html/qemu-arm/2025-01/msg00014.html
2) https://lore.kernel.org/linux-devicetree/CAL_JsqLGEvGBQ0W_B6+5cME1UEhuKXadBB-6=GoN1tmavw9K_w@mail.gmail.com/
3) https://lore.kernel.org/linux-arm-kernel/20250113115849.00006fee@huawei.com/T/#m1fc077e393ca2a785e67d818a67c20e1032ca31e
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/1] arm64: cacheinfo: Avoid out-of-bounds write when DT info is incorrect
2025-01-21 9:20 ` Alireza Sanaee
@ 2025-01-22 15:37 ` Radu Rendec
0 siblings, 0 replies; 6+ messages in thread
From: Radu Rendec @ 2025-01-22 15:37 UTC (permalink / raw)
To: Alireza Sanaee, robh
Cc: linux-arm-kernel, Sudeep Holla, Catalin Marinas, Will Deacon,
Borislav Petkov, jonathan.cameron
Hi Alireza,
On Tue, 2025-01-21 at 09:20 +0000, Alireza Sanaee wrote:
> On Fri, 17 Jan 2025 11:59:36 -0500
> Radu Rendec <rrendec@redhat.com> wrote:
>
> > On Fri, 2025-01-17 at 07:08 -0600, Rob Herring wrote:
> > > On Thu, Jan 16, 2025 at 01:54:57PM -0500, Radu Rendec wrote:
> > > > I found an out-of-bounds write bug in the arm64 implementation of
> > > > populate_cache_leaves() and I'm trying to fix it. The reason why
> > > > this is an RFC is that I'm not entirely sure these changes are
> > > > sufficient.
> > > >
> > > > The problem is described in detail in the patch itself, so I won't
> > > > repeat it here. The gist of it is that on arm64 boards where the
> > > > device tree describes the cache structure, the number of cache
> > > > leaves that comes out of fetch_cache_info() and is used to size
> > > > the cacheinfo array can be smaller than the actual number of
> > > > leaves. In that case, populate_cache_leaves() writes past the
> > > > cacheinfo array bounds.
> > > >
> > > > The way I fixed it, the code doesn't change too much and doesn't
> > > > look ugly but it's still possible to write past the array bounds
> > > > if the last populated level is a separate data/instruction cache.
> > > > But I'm not sure if this is a real-world scenario, so this is one
> > > > of the areas where I'm looking for feedback. For example, the DT
> > > > may define a single unified level (so levels=1, leaves=1) but in
> > > > fact L1 is a split D/I cache. Or the DT defines multiple levels,
> > > > the last one being a unified cache, but in reality the last level
> > > > is a split D/I cache. I believe the latter scenario is very
> > > > unlikely since typically only L1 is a split D/I cache.
> > >
> > > I think it is a safe assumption there is not a split cache below a
> > > unified cache. The DT binding doesn't support that either as
> > > 'next-level-cache' is a single phandle (though I guess it could be
> > > extended to allow multiple phandles).
> >
> > If I'm reading the code correctly, even though 'next-level-cache' is a
> > single phandle, the node it points to can still have 'i-cache-size'
> > and 'd-cache-size' properties, which would make it a split cache - at
> > least as far as of_count_cache_leaves() is concerned.
> >
> > But I'm more worried about the scenario where the DT defines only L1
> > and defines it as a unified cache but CLIDR_EL1 reports it's a split
> > cache. In that case, populate_cache_leaves() would still write past
> > the array bounds, even with my proposed changes.
> >
> > I'm starting to think it's much safer to just check the size correctly
> > instead of relying on assumptions about what cache layout is
> > (un)likely.
> >
> > > > The other thing that doesn't look right is that
> > > > init_of_cache_level() bumps the level at each iteration to
> > > > whatever the node corresponding to that level has in the
> > > > `cache-level` property, but that way one or more levels can be
> > > > skipped without accounting any leaves for them. This is exactly
> > > > what happens in my case (the Renesas R-Car S4 board, described in
> > > > arch/arm64/boot/dts/renesas/r8a779f0.dtsi). In this case, even if
> > > > populate_cache_leaves() is fixed, the cache information in the
> > > > kernel will be incomplete. Shouldn't init_of_cache_level() return
> > > > -EINVAL also when a level is skipped altogether?
> > > >
> > > > Last, and also related to the previous question, is a device tree
> > > > definition that skips a cache level correct? Or, in other words,
> > > > should the definition in r8a779f0.dtsi be fixed?
> > >
> > > The answer depends on whether there are L2 caches in the cores
> > > or not. They are optional in the A55. Calling the L3 level 2 when
> > > there's no L2 would be confusing. So if there's not an L2, I think
> > > the description is correct (though perhaps missing L1 cache sizes
> > > if that's not otherwise discoverable). If there are L2 caches, then
> > > there should be a cache node under each cpu node.
> > >
> > > It could get even messier with heterogeneous systems where some
> > > cores have an L2 and some don't.
> > >
> > > So the cacheinfo code needs to deal with skipped levels.
> >
> > I just checked the technical documentation (I should have done that in
> > the first place), and it's consistent with the DT. There is no L2 on
> > that system, and this reconfirms what you just said.
> >
> > But this is where it gets interesting. CLIDR_EL1 is 0x82000023, so it
> > reports L1 and L2. That means the information populate_cache_leaves()
> > puts into the kernel data structures is not only inconsistent with the
> > DT but also incorrect.
> >
> > Is CLIDR_EL1 supposed to "skip" missing levels i.e. report L2 instead
> > of L3 if L2 doesn't exist?
> >
> > Thanks,
> > Radu
> >
> >
>
> Hi Radu,
>
> I believe this discussion is heading in the right direction, especially
> when examining the code. Currently, data is sometimes retrieved from
> the device tree and other times from the system’s registers. While
> there are cross-checks in place, it can still be confusing and
> error-prone.
Agreed. And the more I look at it, the more I think that information
should not be pulled from different sources because they *can* be
inconsistent.
For arm64 specifically, the information that's extracted from CLIDR_EL1
does not seem to include more details than what the device tree can
provide. Essentially it's just the level and type of cache (instruction,
data, or unified). The same information can be extracted from the device
tree, and looking at init_of_cache_level(), it already is, but it's
used just to count the levels/leaves.
For that reason, I would argue that for arm64, if the device tree (or
ACPI) provides information about the cache, it should be used as a
single source of truth, and CLIDR_EL1 should not be used. Even if the
device tree (or ACPI) description is incomplete, we still can't extract
information about more levels/leaves from CLIDR_EL1 because the number
of leaves is pre-determined from DT/ACPI and populate_cache_leaves()
will stop at that number of leaves. Or at least it should (that's the
bug I'm trying to fix).
> One scenario that comes to mind involves QEMU, where not all system
> registers, particularly the cache-related ones such as CLIDR_EL1, are
> implemented correctly in TCG. Additionally, at this point, QEMU does
> not generate any cache descriptions in the device tree, which is
> something I am actively working on [1]. That patch set will likely
> require the kernel to handle various cases and address these
> inconsistencies effectively.
>
> I also found it challenging to understand how the number of leaves is
> calculated and used in the code. I wasn't specifically looking for
> bugs; my focus was on identifying the best approach to sharing caches
> between SMT threads. There are some existing discussions and trials
> already [2,3]. In the new series that I will share soon, I allow
> adding another layer, an l1-cache node referenced by next-level-cache
> at each CPU node. This should also reduce the complexity of counting
> leaves, instead of just looking for different properties in the CPU
> node, and might allow for easier testing and experimentation with more
> complex scenarios. It would be good to hear your thoughts on this one
> too.
I looked at the links you provided and I can understand what you're
trying to do there. I believe this is an orthogonal problem though
because struct cpu_cacheinfo is a per-cpu data structure.
In the discussions you had, it was pointed out that physical CPU
threads need to be modeled as distinct CPU nodes in the kernel, which
means they will also have distinct cpu_cacheinfo structures. Even if
the cache is physically shared, you will still have a separate struct
cpu_cacheinfo for each of them in the kernel, regardless of where the
information is extracted from.
> 1) https://mail.gnu.org/archive/html/qemu-arm/2025-01/msg00014.html
> 2) https://lore.kernel.org/linux-devicetree/CAL_JsqLGEvGBQ0W_B6+5cME1UEhuKXadBB-6=GoN1tmavw9K_w@mail.gmail.com/
> 3) https://lore.kernel.org/linux-arm-kernel/20250113115849.00006fee@huawei.com/T/#m1fc077e393ca2a785e67d818a67c20e1032ca31e
--
Best regards,
Radu
^ permalink raw reply [flat|nested] 6+ messages in thread