linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ARM: l2c: aurora: force write allocate for system cache
@ 2020-08-13  7:27 Sascha Hauer
  2020-08-20  7:23 ` Sascha Hauer
  0 siblings, 1 reply; 3+ messages in thread
From: Sascha Hauer @ 2020-08-13  7:27 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Lunn, Jason Cooper, Gregory Clement, Stach, Russell King,
	kernel, Lucas, Sascha Hauer, l.stach

It has been observed that some workloads (specifically iperf using
a 2.5Gb network interface) die with an imprecise external abort when
ECC is enabled in the L2 cache. It is assumed that the cache ECC
functionality isn't directly at fault here, but just exposes another
issue more openly by generating a slave error to what would otherwise
be silent data corruption.

The coherency protocol on Marvell Armada XP/375/38x requires that write
misses allocate in the cache, so all requests should have the write-
allocate attribute, which is done by setting the CPU pagetables to
write-back write-allocate.

Forcing all cacheable requests to allocate in the cache, by changing
the allocation policy from "requester attribute" to "force allocate
always", has been demonstrated to fix the issue in extensive testing.
As this setting only enforces the requirements of the coherency
protocol at the L2 cache level, instead of relying on requesters to
provide the correct attribute, it is expected to have no adverse side
effects and none have been found in the testing. Thus enabling this
attribute override whenever the aurora cache is configured to be a
coherent agent (no-outer) seem to be right thing to do.

As we have no direct channel into Marvell to get more detailed
information, as well as the Armada XP on which this behavior has been
observed being a pretty old part by now, we can only speculate about
the root cause. It may be a misguided CPU core optimization that tries
to write-around the cache on specific access patterns, or it might be a
bad interaction between the L2 cache and multiple concurrent masters.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Lucas Stach, <l.stach@pengutronix.de>
---
 arch/arm/include/asm/hardware/cache-aurora-l2.h | 12 ++++++++++++
 arch/arm/mm/cache-l2x0.c                        | 12 +++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/hardware/cache-aurora-l2.h b/arch/arm/include/asm/hardware/cache-aurora-l2.h
index 39769ffa00512..1a75b9b210a83 100644
--- a/arch/arm/include/asm/hardware/cache-aurora-l2.h
+++ b/arch/arm/include/asm/hardware/cache-aurora-l2.h
@@ -31,6 +31,18 @@
 #define AURORA_ACR_REPLACEMENT_TYPE_SEMIPLRU \
 	(3 << AURORA_ACR_REPLACEMENT_OFFSET)
 
+#define AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET        23
+#define AURORA_ACR_FORCE_WRITE_ALLOCATE_MASK \
+	(0x3 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
+#define AURORA_ACR_FORCE_WRITE_ALLOCATE_REQUESTER \
+	(0 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
+#define AURORA_ACR_FORCE_WRITE_ALLOCATE_NEVER \
+	(1 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
+#define AURORA_ACR_FORCE_WRITE_ALLOCATE_ALWAYS \
+	(2 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
+#define AURORA_ACR_FORCE_WRITE_ALLOCATE_DT_ONLY \
+	(3 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
+
 #define AURORA_ACR_PARITY_EN	(1 << 21)
 #define AURORA_ACR_ECC_EN	(1 << 20)
 
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 12c26eb88afbc..d0b03d77f3c5c 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1510,6 +1510,16 @@ static void __init aurora_of_parse(const struct device_node *np,
 	*aux_mask &= ~mask;
 }
 
+static void __init aurora_of_parse_no_outer(const struct device_node *np,
+				u32 *aux_val, u32 *aux_mask)
+{
+	aurora_of_parse(np, aux_val, aux_mask);
+
+	*aux_val &= ~AURORA_ACR_FORCE_WRITE_ALLOCATE_MASK;
+	*aux_val |= AURORA_ACR_FORCE_WRITE_ALLOCATE_ALWAYS;
+	*aux_mask &= ~AURORA_ACR_FORCE_WRITE_ALLOCATE_MASK;
+}
+
 static const struct l2c_init_data of_aurora_with_outer_data __initconst = {
 	.type = "Aurora",
 	.way_size_0 = SZ_4K,
@@ -1535,7 +1545,7 @@ static const struct l2c_init_data of_aurora_no_outer_data __initconst = {
 	.type = "Aurora",
 	.way_size_0 = SZ_4K,
 	.num_lock = 4,
-	.of_parse = aurora_of_parse,
+	.of_parse = aurora_of_parse_no_outer,
 	.enable = aurora_enable_no_outer,
 	.fixup = aurora_fixup,
 	.save  = aurora_save,
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] ARM: l2c: aurora: force write allocate for system cache
  2020-08-13  7:27 [PATCH] ARM: l2c: aurora: force write allocate for system cache Sascha Hauer
@ 2020-08-20  7:23 ` Sascha Hauer
  2020-08-22 16:21   ` Andrew Lunn
  0 siblings, 1 reply; 3+ messages in thread
From: Sascha Hauer @ 2020-08-20  7:23 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Lunn, Jason Cooper, Gregory Clement, Russell King, kernel,
	Lucas Stach

Any input on this one?
Are there any other Armada XP users out there who could tell if they
also need such a workaround or if the current kernel works fine for them?

Sascha

On Thu, Aug 13, 2020 at 09:27:56AM +0200, Sascha Hauer wrote:
> It has been observed that some workloads (specifically iperf using
> a 2.5Gb network interface) die with an imprecise external abort when
> ECC is enabled in the L2 cache. It is assumed that the cache ECC
> functionality isn't directly at fault here, but just exposes another
> issue more openly by generating a slave error to what would otherwise
> be silent data corruption.
> 
> The coherency protocol on Marvell Armada XP/375/38x requires that write
> misses allocate in the cache, so all requests should have the write-
> allocate attribute, which is done by setting the CPU pagetables to
> write-back write-allocate.
> 
> Forcing all cacheable requests to allocate in the cache, by changing
> the allocation policy from "requester attribute" to "force allocate
> always", has been demonstrated to fix the issue in extensive testing.
> As this setting only enforces the requirements of the coherency
> protocol at the L2 cache level, instead of relying on requesters to
> provide the correct attribute, it is expected to have no adverse side
> effects and none have been found in the testing. Thus enabling this
> attribute override whenever the aurora cache is configured to be a
> coherent agent (no-outer) seem to be right thing to do.
> 
> As we have no direct channel into Marvell to get more detailed
> information, as well as the Armada XP on which this behavior has been
> observed being a pretty old part by now, we can only speculate about
> the root cause. It may be a misguided CPU core optimization that tries
> to write-around the cache on specific access patterns, or it might be a
> bad interaction between the L2 cache and multiple concurrent masters.
> 
> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
> Signed-off-by: Lucas Stach, <l.stach@pengutronix.de>
> ---
>  arch/arm/include/asm/hardware/cache-aurora-l2.h | 12 ++++++++++++
>  arch/arm/mm/cache-l2x0.c                        | 12 +++++++++++-
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/hardware/cache-aurora-l2.h b/arch/arm/include/asm/hardware/cache-aurora-l2.h
> index 39769ffa00512..1a75b9b210a83 100644
> --- a/arch/arm/include/asm/hardware/cache-aurora-l2.h
> +++ b/arch/arm/include/asm/hardware/cache-aurora-l2.h
> @@ -31,6 +31,18 @@
>  #define AURORA_ACR_REPLACEMENT_TYPE_SEMIPLRU \
>  	(3 << AURORA_ACR_REPLACEMENT_OFFSET)
>  
> +#define AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET        23
> +#define AURORA_ACR_FORCE_WRITE_ALLOCATE_MASK \
> +	(0x3 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
> +#define AURORA_ACR_FORCE_WRITE_ALLOCATE_REQUESTER \
> +	(0 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
> +#define AURORA_ACR_FORCE_WRITE_ALLOCATE_NEVER \
> +	(1 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
> +#define AURORA_ACR_FORCE_WRITE_ALLOCATE_ALWAYS \
> +	(2 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
> +#define AURORA_ACR_FORCE_WRITE_ALLOCATE_DT_ONLY \
> +	(3 << AURORA_ACR_FORCE_WRITE_ALLOCATE_OFFSET)
> +
>  #define AURORA_ACR_PARITY_EN	(1 << 21)
>  #define AURORA_ACR_ECC_EN	(1 << 20)
>  
> diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
> index 12c26eb88afbc..d0b03d77f3c5c 100644
> --- a/arch/arm/mm/cache-l2x0.c
> +++ b/arch/arm/mm/cache-l2x0.c
> @@ -1510,6 +1510,16 @@ static void __init aurora_of_parse(const struct device_node *np,
>  	*aux_mask &= ~mask;
>  }
>  
> +static void __init aurora_of_parse_no_outer(const struct device_node *np,
> +				u32 *aux_val, u32 *aux_mask)
> +{
> +	aurora_of_parse(np, aux_val, aux_mask);
> +
> +	*aux_val &= ~AURORA_ACR_FORCE_WRITE_ALLOCATE_MASK;
> +	*aux_val |= AURORA_ACR_FORCE_WRITE_ALLOCATE_ALWAYS;
> +	*aux_mask &= ~AURORA_ACR_FORCE_WRITE_ALLOCATE_MASK;
> +}
> +
>  static const struct l2c_init_data of_aurora_with_outer_data __initconst = {
>  	.type = "Aurora",
>  	.way_size_0 = SZ_4K,
> @@ -1535,7 +1545,7 @@ static const struct l2c_init_data of_aurora_no_outer_data __initconst = {
>  	.type = "Aurora",
>  	.way_size_0 = SZ_4K,
>  	.num_lock = 4,
> -	.of_parse = aurora_of_parse,
> +	.of_parse = aurora_of_parse_no_outer,
>  	.enable = aurora_enable_no_outer,
>  	.fixup = aurora_fixup,
>  	.save  = aurora_save,
> -- 
> 2.28.0
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] ARM: l2c: aurora: force write allocate for system cache
  2020-08-20  7:23 ` Sascha Hauer
@ 2020-08-22 16:21   ` Andrew Lunn
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Lunn @ 2020-08-22 16:21 UTC (permalink / raw)
  To: Sascha Hauer
  Cc: Jason Cooper, Gregory Clement, Russell King, kernel,
	linux-arm-kernel, Lucas Stach

On Thu, Aug 20, 2020 at 09:23:55AM +0200, Sascha Hauer wrote:
> Any input on this one?
> Are there any other Armada XP users out there who could tell if they
> also need such a workaround or if the current kernel works fine for them?
> 
> Sascha

Hi Sascha

I've used an wrt1900ac for quite a while for kernel hacking on Armada
XP. I've not seen issues with it, but it is 1G Ethernet only. The
memory architecture is odd on the XP, it took bootlin quite a while to
get is working. So i would not be too surprised if there are still
issues.

	Andrew

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-22 16:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-13  7:27 [PATCH] ARM: l2c: aurora: force write allocate for system cache Sascha Hauer
2020-08-20  7:23 ` Sascha Hauer
2020-08-22 16:21   ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).