Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
@ 2026-06-10 16:48 Shanker Donthineni
  2026-06-11 13:34 ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Shanker Donthineni @ 2026-06-10 16:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Vladimir Murzin
  Cc: Jason Gunthorpe, linux-arm-kernel, Mark Rutland, linux-kernel,
	linux-doc, Shanker Donthineni, Vikram Sethi, Jason Sequeira

On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
observed by a peripheral before an older, non-overlapping Device-nGnR*
store to the same peripheral. This breaks the program-order guarantee
that software expects for Device-nGnR* accesses and can leave a
peripheral in an incorrect state, as a load is observed before an
earlier store takes effect.

The erratum can occur only when all of the following apply:

  - A PE executes a Device-nGnR* store followed by a younger
    Device-nGnR* load.
  - The store is not a store-release.
  - The accesses target the same peripheral and do not overlap in bytes.
  - There is at most one intervening Device-nGnR* store in program
    order, and there are no intervening Device-nGnR* loads.
  - There is no DSB, and no DMB that orders loads, between the store and
    the load.
  - Specific micro-architectural and timing conditions occur.

Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
to stlr* (Store-Release), which removes the "store is not a
store-release" condition for every device write the kernel issues.
Because writel() and writel_relaxed() are both built on __raw_writel()
in asm-generic/io.h, patching the raw variants covers both the
non-relaxed and relaxed APIs without touching the higher layers. Note
that writel()'s own barrier sits before the store, so it does not order
the store against a subsequent readl(); the store-release promotion is
what provides that ordering.

Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
the plain str* sequence.

Note: stlr* only supports base-register addressing, so affected CPUs use
a base-register stlr* path. Unaffected CPUs keep the original
offset-addressed str* sequence introduced by commit d044d6ba6f02
("arm64: io: permit offset addressing").

The __const_memcpy_toio_aligned32() and __const_memcpy_toio_aligned64()
helpers are left unchanged. These helpers are intended for
write-combining mappings, which are Normal-NC on arm64. Replacing their
contiguous str* groups would defeat the write-combining behavior used to
improve store performance.

Co-developed-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
Changes since v2:
  - Reworked the raw MMIO write helpers so unaffected CPUs keep the
    existing offset-addressed STR sequence, while affected CPUs use the
    base-register STLR path.
  - Updated the commit message to match the code changes.
  - Rebased on top of the arm64 for-next/errata branch:
    https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/errata

Changes since v1:
  - Updated the commit message based on feedback from Vladimir Murzin.

 Documentation/arch/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                          | 23 ++++++++++++++++
 arch/arm64/include/asm/io.h                 | 30 +++++++++++++++++++++
 arch/arm64/kernel/cpu_errata.c              |  8 ++++++
 arch/arm64/tools/cpucaps                    |  1 +
 5 files changed, 64 insertions(+)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index ad09bbb10da80..fc45125dc2f80 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -298,6 +298,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
 +----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | Olympus core    | T410-OLY-1027   | NVIDIA_OLYMPUS_1027_ERRATUM |
++----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | Olympus core    | T410-OLY-1029   | ARM64_ERRATUM_4118414       |
 +----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c65cef81be86a..d633eb70de1ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
 
 	  If unsure, say Y.
 
+config NVIDIA_OLYMPUS_1027_ERRATUM
+	bool "NVIDIA Olympus: device store/load ordering erratum"
+	default y
+	help
+	  This option adds an alternative code sequence to work around an
+	  NVIDIA Olympus core erratum where a Device-nGnR* store can be
+	  observed by a peripheral after a younger Device-nGnR* load to the
+	  same peripheral. This breaks the program order that drivers rely
+	  on for MMIO and can leave a device in an incorrect state.
+
+	  The workaround promotes the raw MMIO store helpers
+	  (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
+	  required ordering. Because writel() and writel_relaxed() are built
+	  on __raw_writel(), both are covered without changes to the higher
+	  layers.
+
+	  The fix is applied through the alternatives framework, so enabling
+	  this option does not by itself activate the workaround: it is
+	  patched in only when an affected CPU is detected, and is a no-op on
+	  unaffected CPUs.
+
+	  If unsure, say Y.
+
 config ARM64_ERRATUM_834220
 	bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
 	depends on KVM
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8cbd1e96fd50b..801223e754c90 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -22,10 +22,22 @@
 /*
  * Generic IO read/write.  These perform native-endian accesses.
  */
+static __always_inline bool arm64_needs_device_store_release(void)
+{
+	return alternative_has_cap_unlikely(
+				ARM64_WORKAROUND_DEVICE_STORE_RELEASE);
+}
+
 #define __raw_writeb __raw_writeb
 static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
 	volatile u8 __iomem *ptr = addr;
+
+	if (arm64_needs_device_store_release()) {
+		asm volatile("stlrb %w0, [%1]" : : "rZ" (val), "r" (addr));
+		return;
+	}
+
 	asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
 }
 
@@ -33,6 +45,12 @@ static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
 	volatile u16 __iomem *ptr = addr;
+
+	if (arm64_needs_device_store_release()) {
+		asm volatile("stlrh %w0, [%1]" : : "rZ" (val), "r" (addr));
+		return;
+	}
+
 	asm volatile("strh %w0, %1" : : "rZ" (val), "Qo" (*ptr));
 }
 
@@ -40,6 +58,12 @@ static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
 	volatile u32 __iomem *ptr = addr;
+
+	if (arm64_needs_device_store_release()) {
+		asm volatile("stlr %w0, [%1]" : : "rZ" (val), "r" (addr));
+		return;
+	}
+
 	asm volatile("str %w0, %1" : : "rZ" (val), "Qo" (*ptr));
 }
 
@@ -47,6 +71,12 @@ static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
 static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
 	volatile u64 __iomem *ptr = addr;
+
+	if (arm64_needs_device_store_release()) {
+		asm volatile("stlr %x0, [%1]" : : "rZ" (val), "r" (addr));
+		return;
+	}
+
 	asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
 }
 
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index d597896b0f7f3..b096d9acca578 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -838,6 +838,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
 	},
 #endif
+#ifdef CONFIG_NVIDIA_OLYMPUS_1027_ERRATUM
+	{
+		/* NVIDIA Olympus core */
+		.desc = "NVIDIA Olympus device load/store ordering erratum",
+		.capability = ARM64_WORKAROUND_DEVICE_STORE_RELEASE,
+		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
+	},
+#endif
 #ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 	{
 		/*
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 811c2479e82d6..d367257bf7703 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -120,6 +120,7 @@ WORKAROUND_CAVIUM_TX2_219_PRFM
 WORKAROUND_CAVIUM_TX2_219_TVM
 WORKAROUND_CLEAN_CACHE
 WORKAROUND_DEVICE_LOAD_ACQUIRE
+WORKAROUND_DEVICE_STORE_RELEASE
 WORKAROUND_NVIDIA_CARMEL_CNP
 WORKAROUND_PMUV3_IMPDEF_TRAPS
 WORKAROUND_QCOM_FALKOR_E1003
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-11 17:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10 16:48 [PATCH v3] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum Shanker Donthineni
2026-06-11 13:34 ` Will Deacon
2026-06-11 14:08   ` Shanker Donthineni
2026-06-11 15:08   ` Vladimir Murzin
2026-06-11 16:00     ` Shanker Donthineni
2026-06-11 17:49   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox