* [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue
@ 2025-12-29  3:36 Lucas Wei
  2026-01-01  8:27 ` Kuan-Wei Chiu
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Lucas Wei @ 2025-12-29  3:36 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Jonathan Corbet
  Cc: sjadavani, Lucas Wei, kernel test robot, stable, kernel-team,
	linux-arm-kernel, linux-doc, linux-kernel

When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.

This erratum affects any system topology in which the following
conditions apply:
 - The Point of Serialization (PoS) is located downstream of the
   interconnect.
 - A downstream observer accesses memory directly, bypassing the
   interconnect.

Conditions:
This erratum occurs only when all of the following conditions are met:
 1. Software executes a data cache maintenance operation, specifically,
    a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
    IVAC), that hits on unique dirty data in the CPU or DSU cache. This
    results in a combined CopyBack and CMO being issued to the
    interconnect.
 2. The interconnect splits the combined transaction into separate Write
    and CMO transactions and returns an early completion response to the
    CPU or DSU before the write has completed at the downstream memory
    or PoS.
 3. A downstream observer accesses the affected memory address after the
    early completion response is issued but before the actual memory
    write has completed. This allows the observer to read stale data
    that has not yet been updated at the PoS or downstream memory.
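
To make the failure window concrete, the fragment below is a rough sketch
(not part of this patch) of a typical streaming-DMA hand-off to a
non-coherent device; the engine, register layout and doorbell write are
hypothetical, and only dma_map_single()/dma_mapping_error()/writel() are
standard kernel API.

  #include <linux/dma-mapping.h>
  #include <linux/io.h>

  /*
   * Hypothetical driver fragment: hand a CPU-written buffer to a
   * non-coherent DMA engine. dma_map_single(..., DMA_TO_DEVICE) cleans
   * the buffer to the PoC; under this erratum the clean may be
   * acknowledged before the data has reached the PoS, so a device
   * kicked immediately afterwards can still read stale memory.
   */
  static void kick_engine(struct device *dev, void __iomem *regs,
                          void *buf, size_t len)
  {
          dma_addr_t dma;

          ((u8 *)buf)[0] = 0xA5;          /* CPU write: line becomes dirty */
          dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE); /* clean to PoC */
          if (dma_mapping_error(dev, dma))
                  return;
          writel((u32)dma, regs);         /* doorbell: device fetches buffer */
  }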

The workaround adds a second loop of CMOs over the same virtual address
range for operations that meet the erratum conditions, waiting until the
cache data has been cleaned to the PoC. This implementation mitigates the
performance penalty compared to purely duplicating the original CMO.
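
Conceptually, the patched by-VA loops behave like the C pseudocode below.
This is a simplified sketch of the assembly in this patch, not a literal
translation: dc_cvac(), dsb_sy() and workaround_4311569 are stand-ins for
the DC CVAC instruction, the DSB SY barrier and the alternative patched in
under ARM64_WORKAROUND_4311569. On affected systems the second pass is only
activated via the arm_si_l1_workaround_4311569 kernel command-line
parameter added below.

  /*
   * Sketch of the workaround: after the normal clean loop, walk the
   * same VA range a second time. Per the erratum description, the
   * repeated CMOs wait until the data written back by the first pass
   * has actually reached the PoC.
   */
  static void clean_range_to_poc(unsigned long start, unsigned long end,
                                 unsigned long linesz)
  {
          unsigned long base = start & ~(linesz - 1);
          unsigned long addr;
          int passes = workaround_4311569 ? 2 : 1;
          int pass;

          for (pass = 0; pass < passes; pass++)
                  for (addr = base; addr < end; addr += linesz)
                          dc_cvac(addr);          /* DC CVAC, <addr> */

          dsb_sy();                               /* one barrier after both passes */
  }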

Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Lucas Wei <lucaswei@google.com>
---

Changes in v2:

 1. Fixed warning from kernel test robot by changing
    arm_si_l1_workaround_4311569 to static 
    [Reported-by: kernel test robot <lkp@intel.com>]

---
 Documentation/arch/arm64/silicon-errata.rst |  3 ++
 arch/arm64/Kconfig                          | 19 +++++++++++++
 arch/arm64/include/asm/assembler.h          | 10 +++++++
 arch/arm64/kernel/cpu_errata.c              | 31 +++++++++++++++++++++
 arch/arm64/mm/cache.S                       | 13 ++++++++-
 arch/arm64/tools/cpucaps                    |  1 +
 6 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..98efdf528719 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
 | ARM            | GIC-700         | #2941627        | ARM64_ERRATUM_2941627       |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | SI L1           | #4311569        | ARM64_ERRATUM_4311569       |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
 | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_845719        |
 +----------------+-----------------+-----------------+-----------------------------+
 | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_843419        |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 65db12f66b8f..a834d30859cc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1153,6 +1153,25 @@ config ARM64_ERRATUM_3194386
 
 	  If unsure, say Y.
 
+config ARM64_ERRATUM_4311569
+	bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+	default y
+	help
+	  This option adds the workaround for ARM SI L1 erratum 4311569.
+
+	  The erratum of SI L1 can cause an early response to a combined write
+	  and cache maintenance operation (WR+CMO) before the operation is fully
+	  completed to the Point of Serialization (POS).
+	  This can result in a non-I/O coherent agent observing stale data,
+	  potentially leading to system instability or incorrect behavior.
+
+	  Enabling this option implements a software workaround that issues a
+	  second pass of cache maintenance operations (CMOs) at the end of the
+	  affected cache maintenance routines. This ensures that the data is
+	  correctly serialized before the buffer is handed off to a non-coherent agent.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index f0ca7196f6fa..d3d46e5f7188 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -381,6 +381,9 @@ alternative_endif
 	.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
 	sub	\tmp, \linesz, #1
 	bic	\start, \start, \tmp
+alternative_if ARM64_WORKAROUND_4311569
+	mov	\tmp, \start
+alternative_else_nop_endif
 .Ldcache_op\@:
 	.ifc	\op, cvau
 	__dcache_op_workaround_clean_cache \op, \start
@@ -402,6 +405,13 @@ alternative_endif
 	add	\start, \start, \linesz
 	cmp	\start, \end
 	b.lo	.Ldcache_op\@
+alternative_if ARM64_WORKAROUND_4311569
+	.ifnc	\op, cvau
+	mov	\start, \tmp
+	mov	\tmp, xzr
+	cbnz	\start, .Ldcache_op\@
+	.endif
+alternative_else_nop_endif
 	dsb	\domain
 
 	_cond_uaccess_extable .Ldcache_op\@, \fixup
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 8cb3b575a031..5c0ab6bfd44a 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
 	return (ctr_real != sys) && (ctr_raw != sys);
 }
 
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
+static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
+{
+	static_branch_enable(&arm_si_l1_workaround_4311569);
+	pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
+
+	return 0;
+}
+early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
+
+/*
+ * Some early callers use the cache maintenance functions, for example dcache_inval_poc()
+ * and dcache_clean_poc() in head.S, before the decision to turn on this workaround is made.
+ * Since the scope of this workaround is limited to non-coherent DMA agents, it's safe to
+ * have the workaround off by default.
+ */
+static bool
+need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	return static_branch_unlikely(&arm_si_l1_workaround_4311569);
+}
+#endif
+
 static void
 cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
 {
@@ -870,6 +894,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		ERRATA_MIDR_RANGE_LIST(erratum_spec_ssbs_list),
 	},
 #endif
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+	{
+		.capability = ARM64_WORKAROUND_4311569,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = need_arm_si_l1_workaround_4311569,
+	},
+#endif
 #ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
 	{
 		.desc = "ARM errata 2966298, 3117295",
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 503567c864fd..ddf0097624ed 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
  *	- end     - kernel end address of region
  */
 SYM_FUNC_START(__pi_dcache_inval_poc)
+alternative_if ARM64_WORKAROUND_4311569
+	mov	x4, x0
+	mov	x5, x1
+	mov	x6, #1
+alternative_else_nop_endif
 	dcache_line_size x2, x3
 	sub	x3, x2, #1
-	tst	x1, x3				// end cache line aligned?
+again:	tst	x1, x3				// end cache line aligned?
 	bic	x1, x1, x3
 	b.eq	1f
 	dc	civac, x1			// clean & invalidate D / U line
@@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
 3:	add	x0, x0, x2
 	cmp	x0, x1
 	b.lo	2b
+alternative_if ARM64_WORKAROUND_4311569
+	mov	x0, x4
+	mov	x1, x5
+	sub	x6, x6, #1
+	cbz	x6, again
+alternative_else_nop_endif
 	dsb	sy
 	ret
 SYM_FUNC_END(__pi_dcache_inval_poc)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1b32c1232d28..3b18734f9744 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -101,6 +101,7 @@ WORKAROUND_2077057
 WORKAROUND_2457168
 WORKAROUND_2645198
 WORKAROUND_2658417
+WORKAROUND_4311569
 WORKAROUND_AMPERE_AC03_CPU_38
 WORKAROUND_AMPERE_AC04_CPU_23
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE

base-commit: edde060637b92607f3522252c03d64ad06369933
-- 
2.52.0.358.g0dd7633a29-goog


* Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue
  2025-12-29  3:36 [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue Lucas Wei
@ 2026-01-01  8:27 ` Kuan-Wei Chiu
  2026-01-01 18:55 ` Marc Zyngier
  2026-01-07 16:22 ` Will Deacon
  2 siblings, 0 replies; 6+ messages in thread
From: Kuan-Wei Chiu @ 2026-01-01  8:27 UTC (permalink / raw)
  To: Lucas Wei
  Cc: Catalin Marinas, Will Deacon, Jonathan Corbet, sjadavani,
	kernel test robot, stable, kernel-team, linux-arm-kernel,
	linux-doc, linux-kernel, visitorckw, marscheng

Hi Lucas,

On Mon, Dec 29, 2025 at 03:36:19AM +0000, Lucas Wei wrote:
> When software issues a Cache Maintenance Operation (CMO) targeting a
> dirty cache line, the CPU and DSU cluster may optimize the operation by
> combining the CopyBack Write and CMO into a single combined CopyBack
> Write plus CMO transaction presented to the interconnect (MCN).
> For these combined transactions, the MCN splits the operation into two
> separate transactions, one Write and one CMO, and then propagates the
> write and optionally the CMO to the downstream memory system or external
> Point of Serialization (PoS).
> However, the MCN may return an early CompCMO response to the DSU cluster
> before the corresponding Write and CMO transactions have completed at
> the external PoS or downstream memory. As a result, stale data may be
> observed by external observers that are directly connected to the
> external PoS or downstream memory.
> 
> This erratum affects any system topology in which the following
> conditions apply:
>  - The Point of Serialization (PoS) is located downstream of the
>    interconnect.
>  - A downstream observer accesses memory directly, bypassing the
>    interconnect.
> 
> Conditions:
> This erratum occurs only when all of the following conditions are met:
>  1. Software executes a data cache maintenance operation, specifically,
>     a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
>     IVAC), that hits on unique dirty data in the CPU or DSU cache. This
>     results in a combined CopyBack and CMO being issued to the
>     interconnect.
>  2. The interconnect splits the combined transaction into separate Write
>     and CMO transactions and returns an early completion response to the
>     CPU or DSU before the write has completed at the downstream memory
>     or PoS.
>  3. A downstream observer accesses the affected memory address after the
>     early completion response is issued but before the actual memory
>     write has completed. This allows the observer to read stale data
>     that has not yet been updated at the PoS or downstream memory.
> 
> The implementation of workaround put a second loop of CMOs at the same
> virtual address whose operation meet erratum conditions to wait until
> cache data be cleaned to PoC.. This way of implementation mitigates
> performance panalty compared to purly duplicate orignial CMO.
> 
> Reported-by: kernel test robot <lkp@intel.com>

I assume the Reported-by tag was added due to the sparse warning in v1?
Since this patch fixes a hardware erratum rather than an issue reported
by the robot, I don't think we need this tag here.

Generally, we don't add Reported-by for fixing robot warnings across
patch versions.

Regards,
Kuan-Wei

* Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue
  2025-12-29  3:36 [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue Lucas Wei
  2026-01-01  8:27 ` Kuan-Wei Chiu
@ 2026-01-01 18:55 ` Marc Zyngier
  2026-01-07 16:33   ` Will Deacon
  2026-01-07 16:22 ` Will Deacon
  2 siblings, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2026-01-01 18:55 UTC (permalink / raw)
  To: Lucas Wei
  Cc: Catalin Marinas, Will Deacon, Jonathan Corbet, sjadavani,
	kernel test robot, stable, kernel-team, linux-arm-kernel,
	linux-doc, linux-kernel

On Mon, 29 Dec 2025 03:36:19 +0000,
Lucas Wei <lucaswei@google.com> wrote:
> 
> When software issues a Cache Maintenance Operation (CMO) targeting a
> dirty cache line, the CPU and DSU cluster may optimize the operation by
> combining the CopyBack Write and CMO into a single combined CopyBack
> Write plus CMO transaction presented to the interconnect (MCN).
> For these combined transactions, the MCN splits the operation into two
> separate transactions, one Write and one CMO, and then propagates the
> write and optionally the CMO to the downstream memory system or external
> Point of Serialization (PoS).
> However, the MCN may return an early CompCMO response to the DSU cluster
> before the corresponding Write and CMO transactions have completed at
> the external PoS or downstream memory. As a result, stale data may be
> observed by external observers that are directly connected to the
> external PoS or downstream memory.
> 
> This erratum affects any system topology in which the following
> conditions apply:
>  - The Point of Serialization (PoS) is located downstream of the
>    interconnect.
>  - A downstream observer accesses memory directly, bypassing the
>    interconnect.
> 
> Conditions:
> This erratum occurs only when all of the following conditions are met:
>  1. Software executes a data cache maintenance operation, specifically,
>     a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
>     IVAC), that hits on unique dirty data in the CPU or DSU cache. This
>     results in a combined CopyBack and CMO being issued to the
>     interconnect.
>  2. The interconnect splits the combined transaction into separate Write
>     and CMO transactions and returns an early completion response to the
>     CPU or DSU before the write has completed at the downstream memory
>     or PoS.
>  3. A downstream observer accesses the affected memory address after the
>     early completion response is issued but before the actual memory
>     write has completed. This allows the observer to read stale data
>     that has not yet been updated at the PoS or downstream memory.
> 
> The implementation of workaround put a second loop of CMOs at the same
> virtual address whose operation meet erratum conditions to wait until
> cache data be cleaned to PoC.. This way of implementation mitigates
> performance panalty compared to purly duplicate orignial CMO.

penalty, purely, original.

How does one identify the "erratum conditions"?

> 
> Reported-by: kernel test robot <lkp@intel.com>

Well, no.

> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Lucas Wei <lucaswei@google.com>
> ---
> 
> Changes in v2:
> 
>  1. Fixed warning from kernel test robot by changing
>     arm_si_l1_workaround_4311569 to static 
>     [Reported-by: kernel test robot <lkp@intel.com>]
> 
> ---
>  Documentation/arch/arm64/silicon-errata.rst |  3 ++
>  arch/arm64/Kconfig                          | 19 +++++++++++++
>  arch/arm64/include/asm/assembler.h          | 10 +++++++
>  arch/arm64/kernel/cpu_errata.c              | 31 +++++++++++++++++++++
>  arch/arm64/mm/cache.S                       | 13 ++++++++-
>  arch/arm64/tools/cpucaps                    |  1 +
>  6 files changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index a7ec57060f64..98efdf528719 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -213,6 +213,9 @@ stable kernels.
>  | ARM            | GIC-700         | #2941627        | ARM64_ERRATUM_2941627       |
>  +----------------+-----------------+-----------------+-----------------------------+
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | SI L1           | #4311569        | ARM64_ERRATUM_4311569       |
> ++----------------+-----------------+-----------------+-----------------------------+

Keep ARM within a single section (no double line -- there's already a
pointless extra one before 2941627).

> ++----------------+-----------------+-----------------+-----------------------------+
>  | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_845719        |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_843419        |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 65db12f66b8f..a834d30859cc 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1153,6 +1153,25 @@ config ARM64_ERRATUM_3194386
>  
>  	  If unsure, say Y.
>  
> +config ARM64_ERRATUM_4311569
> +	bool "SI L1: 4311569: workaround for premature CMO completion erratum"
> +	default y
> +	help
> +	  This option adds the workaround for ARM SI L1 erratum 4311569.
> +
> +	  The erratum of SI L1 can cause an early response to a combined write
> +	  and cache maintenance operation (WR+CMO) before the operation is fully
> +	  completed to the Point of Serialization (POS).
> +	  This can result in a non-I/O coherent agent observing stale data,
> +	  potentially leading to system instability or incorrect behavior.
> +
> +	  Enabling this option implements a software workaround by inserting a
> +	  second loop of Cache Maintenance Operation (CMO) immediately following the
> +	  end of function to do CMOs. This ensures that the data is correctly serialized
> +	  before the buffer is handed off to a non-coherent agent.
> +
> +	  If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>  	bool "Cavium erratum 22375, 24313"
>  	default y
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index f0ca7196f6fa..d3d46e5f7188 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -381,6 +381,9 @@ alternative_endif
>  	.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
>  	sub	\tmp, \linesz, #1
>  	bic	\start, \start, \tmp
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	\tmp, \start
> +alternative_else_nop_endif
>  .Ldcache_op\@:
>  	.ifc	\op, cvau
>  	__dcache_op_workaround_clean_cache \op, \start
> @@ -402,6 +405,13 @@ alternative_endif
>  	add	\start, \start, \linesz
>  	cmp	\start, \end
>  	b.lo	.Ldcache_op\@
> +alternative_if ARM64_WORKAROUND_4311569
> +	.ifnc	\op, cvau
> +	mov	\start, \tmp
> +	mov	\tmp, xzr
> +	cbnz	\start, .Ldcache_op\@
> +	.endif
> +alternative_else_nop_endif
>  	dsb	\domain
>  
>  	_cond_uaccess_extable .Ldcache_op\@, \fixup
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 8cb3b575a031..5c0ab6bfd44a 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
>  	return (ctr_real != sys) && (ctr_raw != sys);
>  }
>  
> +#ifdef CONFIG_ARM64_ERRATUM_4311569
> +static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
> +static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
> +{
> +	static_branch_enable(&arm_si_l1_workaround_4311569);
> +	pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
> +
> +	return 0;
> +}
> +early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
> +
> +/*
> + * We have some earlier use cases to call cache maintenance operation functions, for example,
> + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
> + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
> + * safe to have the workaround off by default.
> + */
> +static bool
> +need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
> +{
> +	return static_branch_unlikely(&arm_si_l1_workaround_4311569);
> +}
> +#endif

But this isn't a detection mechanism. That's relying on the user
knowing they are dealing with broken hardware. How do they find out?
You don't even call out what platform is actually affected...

The other elephant in the room is virtualisation: how does a guest
performing CMOs deal with this? How does it discover that the
host is broken? I also don't see any attempt to make KVM handle the
erratum on behalf of the guest...

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.

* Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue
  2025-12-29  3:36 [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue Lucas Wei
  2026-01-01  8:27 ` Kuan-Wei Chiu
  2026-01-01 18:55 ` Marc Zyngier
@ 2026-01-07 16:22 ` Will Deacon
  2 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2026-01-07 16:22 UTC (permalink / raw)
  To: Lucas Wei
  Cc: Catalin Marinas, Jonathan Corbet, sjadavani, kernel test robot,
	stable, kernel-team, linux-arm-kernel, linux-doc, linux-kernel,
	robin.murphy

[+Robin as he's been involved with this]

On Mon, Dec 29, 2025 at 03:36:19AM +0000, Lucas Wei wrote:
> When software issues a Cache Maintenance Operation (CMO) targeting a
> dirty cache line, the CPU and DSU cluster may optimize the operation by
> combining the CopyBack Write and CMO into a single combined CopyBack
> Write plus CMO transaction presented to the interconnect (MCN).
> For these combined transactions, the MCN splits the operation into two
> separate transactions, one Write and one CMO, and then propagates the
> write and optionally the CMO to the downstream memory system or external
> Point of Serialization (PoS).
> However, the MCN may return an early CompCMO response to the DSU cluster
> before the corresponding Write and CMO transactions have completed at
> the external PoS or downstream memory. As a result, stale data may be
> observed by external observers that are directly connected to the
> external PoS or downstream memory.
> 
> This erratum affects any system topology in which the following
> conditions apply:
>  - The Point of Serialization (PoS) is located downstream of the
>    interconnect.
>  - A downstream observer accesses memory directly, bypassing the
>    interconnect.
> 
> Conditions:
> This erratum occurs only when all of the following conditions are met:
>  1. Software executes a data cache maintenance operation, specifically,
>     a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
>     IVAC), that hits on unique dirty data in the CPU or DSU cache. This
>     results in a combined CopyBack and CMO being issued to the
>     interconnect.

Why do we need to worry about IVAC here? Even though that might be
upgraded to CIVAC and result in the erratum conditions, the DMA API
shouldn't use IVAC on dirty lines so I don't think we need to worry
about it.
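
For context, the arm64 streaming-DMA hooks only clean towards the device
and only invalidate when handing a buffer back to the CPU, roughly as in
the paraphrased sketch below (see arch/arm64/mm/dma-mapping.c for the
exact current code):

  /* Paraphrased from arch/arm64/mm/dma-mapping.c; details may differ. */
  void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                                enum dma_data_direction dir)
  {
          unsigned long start = (unsigned long)phys_to_virt(paddr);

          dcache_clean_poc(start, start + size);          /* DC CVAC loop */
  }

  void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
                             enum dma_data_direction dir)
  {
          unsigned long start = (unsigned long)phys_to_virt(paddr);

          if (dir == DMA_TO_DEVICE)
                  return;

          dcache_inval_poc(start, start + size);          /* IVAC/CIVAC loop */
  }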

> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index f0ca7196f6fa..d3d46e5f7188 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -381,6 +381,9 @@ alternative_endif
>  	.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
>  	sub	\tmp, \linesz, #1
>  	bic	\start, \start, \tmp
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	\tmp, \start
> +alternative_else_nop_endif
>  .Ldcache_op\@:
>  	.ifc	\op, cvau
>  	__dcache_op_workaround_clean_cache \op, \start
> @@ -402,6 +405,13 @@ alternative_endif
>  	add	\start, \start, \linesz
>  	cmp	\start, \end
>  	b.lo	.Ldcache_op\@
> +alternative_if ARM64_WORKAROUND_4311569
> +	.ifnc	\op, cvau
> +	mov	\start, \tmp
> +	mov	\tmp, xzr
> +	cbnz	\start, .Ldcache_op\@
> +	.endif
> +alternative_else_nop_endif

So you could also avoid this for ivac, although it looks like this is
only called for civac, cvau, cvac and cvap so perhaps not worth it.

> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 503567c864fd..ddf0097624ed 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
>   *	- end     - kernel end address of region
>   */
>  SYM_FUNC_START(__pi_dcache_inval_poc)
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	x4, x0
> +	mov	x5, x1
> +	mov	x6, #1
> +alternative_else_nop_endif
>  	dcache_line_size x2, x3
>  	sub	x3, x2, #1
> -	tst	x1, x3				// end cache line aligned?
> +again:	tst	x1, x3				// end cache line aligned?
>  	bic	x1, x1, x3
>  	b.eq	1f
>  	dc	civac, x1			// clean & invalidate D / U line
> @@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
>  3:	add	x0, x0, x2
>  	cmp	x0, x1
>  	b.lo	2b
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	x0, x4
> +	mov	x1, x5
> +	sub	x6, x6, #1
> +	cbz	x6, again
> +alternative_else_nop_endif
>  	dsb	sy
>  	ret
>  SYM_FUNC_END(__pi_dcache_inval_poc)

But this whole part could be dropped? The CIVACs are just for the
unaligned parts at the ends of the buffer and we shouldn't need to worry
about propagating them -- we just don't want to chuck them away with an
invalidation!

Will

* Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue
  2026-01-01 18:55 ` Marc Zyngier
@ 2026-01-07 16:33   ` Will Deacon
  2026-01-07 17:55     ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2026-01-07 16:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Lucas Wei, Catalin Marinas, Jonathan Corbet, sjadavani,
	kernel test robot, stable, kernel-team, linux-arm-kernel,
	linux-doc, linux-kernel, robin.murphy, smostafa

Hey Marc,

On Thu, Jan 01, 2026 at 06:55:05PM +0000, Marc Zyngier wrote:
> On Mon, 29 Dec 2025 03:36:19 +0000,
> Lucas Wei <lucaswei@google.com> wrote:
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index 8cb3b575a031..5c0ab6bfd44a 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> >  	return (ctr_real != sys) && (ctr_raw != sys);
> >  }
> >  
> > +#ifdef CONFIG_ARM64_ERRATUM_4311569
> > +static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
> > +static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
> > +{
> > +	static_branch_enable(&arm_si_l1_workaround_4311569);
> > +	pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
> > +
> > +	return 0;
> > +}
> > +early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
> > +
> > +/*
> > + * We have some earlier use cases to call cache maintenance operation functions, for example,
> > + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
> > + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
> > + * safe to have the workaround off by default.
> > + */
> > +static bool
> > +need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
> > +{
> > +	return static_branch_unlikely(&arm_si_l1_workaround_4311569);
> > +}
> > +#endif
> 
> But this isn't a detection mechanism. That's relying on the user
> knowing they are dealing with broken hardware. How do they find out?

Sadly, I'm not aware of a mechanism to detect this reliably at runtime
but adding Robin in case he knows of one. Linux generally doesn't need
to worry about the SLC, so we'd have to add something to DT to detect
it and even then I don't know whether it's something that is typically
exposed to non-secure...

We also need the workaround to be up early enough that drivers don't
run into issues, so that would probably involve invasive surgery in the
DT parsing code.

> You don't even call out what platform is actually affected...

Well, it's an Android phone :)

More generally, it's going to be anything with an Arm "SI L1" configured
to work with non-coherent DMA agents below it. Christ knows whose bright
idea it was to put "L1" in the name of the thing containing the system
cache.

> The other elephant in the room is virtualisation: how does a guest
> performing CMOs deals with this? How does it discover the that the
> host is broken? I also don't see any attempt to make KVM handle the
> erratum on behalf of the guest...

A guest shouldn't have to worry about the problem, as it only affects
clean to PoC for non-coherent DMA agents that reside downstream of the
SLC in the interconnect. Since VFIO doesn't permit assigning
non-coherent devices to a guest, guests shouldn't ever need to push
writes that far (and FWB would cause bigger problems if that was
something we wanted to support)

+Mostafa to keep me honest on the VFIO front.

Will

* Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream coherency issue
  2026-01-07 16:33   ` Will Deacon
@ 2026-01-07 17:55     ` Robin Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Murphy @ 2026-01-07 17:55 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier
  Cc: Lucas Wei, Catalin Marinas, Jonathan Corbet, sjadavani,
	kernel test robot, stable, kernel-team, linux-arm-kernel,
	linux-doc, linux-kernel, smostafa

On 2026-01-07 4:33 pm, Will Deacon wrote:
> Hey Marc,
> 
> On Thu, Jan 01, 2026 at 06:55:05PM +0000, Marc Zyngier wrote:
>> On Mon, 29 Dec 2025 03:36:19 +0000,
>> Lucas Wei <lucaswei@google.com> wrote:
>>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>>> index 8cb3b575a031..5c0ab6bfd44a 100644
>>> --- a/arch/arm64/kernel/cpu_errata.c
>>> +++ b/arch/arm64/kernel/cpu_errata.c
>>> @@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
>>>   	return (ctr_real != sys) && (ctr_raw != sys);
>>>   }
>>>   
>>> +#ifdef CONFIG_ARM64_ERRATUM_4311569
>>> +static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
>>> +static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
>>> +{
>>> +	static_branch_enable(&arm_si_l1_workaround_4311569);
>>> +	pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
>>> +
>>> +	return 0;
>>> +}
>>> +early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
>>> +
>>> +/*
>>> + * We have some earlier use cases to call cache maintenance operation functions, for example,
>>> + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
>>> + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
>>> + * safe to have the workaround off by default.
>>> + */
>>> +static bool
>>> +need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
>>> +{
>>> +	return static_branch_unlikely(&arm_si_l1_workaround_4311569);
>>> +}
>>> +#endif
>>
>> But this isn't a detection mechanism. That's relying on the user
>> knowing they are dealing with broken hardware. How do they find out?
> 
> Sadly, I'm not aware of a mechanism to detect this reliably at runtime
> but adding Robin in case he knows of one. Linux generally doesn't need
> to worry about the SLC, so we'd have to add something to DT to detect
> it and even then I don't know whether it's something that is typically
> exposed to non-secure...
> 
> We also need the workaround to be up early enough that drivers don't
> run into issues, so that would probably involve invasive surgery in the
> DT parsing code.

Indeed even if we did happen to know where the interconnect registers 
are, I'm not sure there's any ID bit for the relevant configuration 
option, plus that still wouldn't be accurate anyway - it's fine to have 
a downstream cache/PoS *without* any back-door observers, so the actual 
problematic condition we need to detect is outside the SI IP altogether. 
It's a matter of SoC-level integration, so AFAICS the realistic options 
are likely to be:

  - SMCCC SOC_ID (if available early enough)
  - Match a top-level SoC/platform compatible out of the flat DT
  - Just trust that affected platforms' bootloaders will know to add the 
command-line option :/
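
A minimal sketch of the second option, assuming a placeholder compatible
string and that the flat DT is still accessible at the point the capability
is evaluated, could look like:

  #include <linux/of_fdt.h>

  /* Placeholder machine list -- real affected platforms would go here. */
  static const char *const si_l1_4311569_machines[] __initconst = {
          "vendor,hypothetical-soc-with-si-l1",
          NULL
  };

  static bool __init soc_has_si_l1_erratum_4311569(void)
  {
          unsigned long root = of_get_flat_dt_root();
          int i;

          for (i = 0; si_l1_4311569_machines[i]; i++)
                  if (of_flat_dt_is_compatible(root, si_l1_4311569_machines[i]))
                          return true;

          return false;
  }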

>> You don't even call out what platform is actually affected...
> 
> Well, it's an Android phone :)
> 
> More generally, it's going to be anything with an Arm "SI L1" configured
> to work with non-coherent DMA agents below it. Christ knows whose bright
> idea it was to put "L1" in the name of the thing containing the system
> cache.

I'm still thankful the Neoverse product line skipped "MMU S1" and "MMU 
S2"...

>> The other elephant in the room is virtualisation: how does a guest
>> performing CMOs deals with this? How does it discover the that the
>> host is broken? I also don't see any attempt to make KVM handle the
>> erratum on behalf of the guest...
> 
> A guest shouldn't have to worry about the problem, as it only affects
> clean to PoC for non-coherent DMA agents that reside downstream of the
> SLC in the interconnect. Since VFIO doesn't permit assigning
> non-coherent devices to a guest, guests shouldn't ever need to push
> writes that far (and FWB would cause bigger problems if that was
> something we wanted to support)
> 
> +Mostafa to keep me honest on the VFIO front.

I don't think we actually prevent non-coherent devices being assigned, 
we just rely on the IOMMU supporting IOMMU_CAP_CACHE_COHERENCY. Thus if 
there's an I/O-coherent SMMU then it could end up being permitted, 
however I would hope that either the affected devices are not behind 
such an SMMU, or at least that if the SMMU imposes cacheable attributes 
then that prevents traffic from taking the back-door path to RAM.
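
For reference, that capability check boils down to roughly the following
(simplified; the real VFIO paths carry more context around it):

  #include <linux/iommu.h>

  /* Simplified: only treat the device as assignable if DMA from it is
   * coherent with the CPU caches. */
  static bool dma_is_coherent_for_assignment(struct device *dev)
  {
          return device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY);
  }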

Thanks,
Robin.
