From: Shanker Donthineni <sdonthineni@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
<linux-arm-kernel@lists.infradead.org>,
Mark Rutland <mark.rutland@arm.com>,
<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
Shanker Donthineni <sdonthineni@nvidia.com>,
Vikram Sethi <vsethi@nvidia.com>,
Jason Sequeira <jsequeira@nvidia.com>
Subject: [PATCH v4 1/2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering
Date: Thu, 25 Jun 2026 13:24:24 -0500
Message-ID: <20260625182425.3194066-2-sdonthineni@nvidia.com>
In-Reply-To: <20260625182425.3194066-1-sdonthineni@nvidia.com>

On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
observed by a peripheral before an older, non-overlapping Device-nGnR*
store to the same peripheral. This breaks the program-order guarantee
that the non-Reordering (nR) attribute gives software for accesses to
the same peripheral, and can leave the peripheral in an incorrect state
because the load takes effect before the earlier store does.

The erratum can occur only when all of the following apply:

- A PE executes a Device-nGnR* store followed by a younger
  Device-nGnR* load.
- The store is not a store-release.
- The accesses target the same peripheral and do not overlap in bytes.
- There is at most one intervening Device-nGnR* store in program
  order, and there are no intervening Device-nGnR* loads.
- There is no DSB, and no DMB that orders loads, between the store and
  the load.
- Specific micro-architectural and timing conditions occur.

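As a rough illustration, the architectural preconditions listed above can be written as a predicate over a short trace of accesses. This is a userspace model only: enum acc_kind, struct mmio_access and may_hit_erratum() are hypothetical names, a peripheral is modelled as an integer ID, intervening store-releases are assumed to count towards the "at most one store" limit, and the micro-architectural timing condition cannot be modelled, so a true result means only that the reordering is possible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace model; none of these names exist in the kernel. */
enum acc_kind {
	DEV_LOAD,          /* Device-nGnR* load */
	DEV_STORE,         /* plain Device-nGnR* store (str*) */
	DEV_STORE_RELEASE, /* Device-nGnR* store-release (stlr*) */
	BAR_DSB,           /* any DSB */
	BAR_DMB_FULL,      /* DMB ordering loads and stores, e.g. dmb osh */
	BAR_DMB_LD,        /* DMB ordering loads, e.g. dmb oshld */
	BAR_DMB_ST,        /* DMB ordering only stores, e.g. dmb oshst */
};

struct mmio_access {
	enum acc_kind kind;
	int periph;          /* peripheral ID (unused for barriers) */
	unsigned long addr;  /* first byte accessed */
	unsigned long size;  /* number of bytes accessed */
};

static bool overlaps(const struct mmio_access *a, const struct mmio_access *b)
{
	return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* Could the load at trace[l] be observed before the store at trace[s]? */
static bool may_hit_erratum(const struct mmio_access *trace, size_t s, size_t l)
{
	const struct mmio_access *st = &trace[s], *ld = &trace[l];
	int stores_between = 0;

	/* A plain (non-release) store followed by a younger load. */
	if (s >= l || st->kind != DEV_STORE || ld->kind != DEV_LOAD)
		return false;
	/* Same peripheral, non-overlapping bytes. */
	if (st->periph != ld->periph || overlaps(st, ld))
		return false;

	for (size_t i = s + 1; i < l; i++) {
		switch (trace[i].kind) {
		case DEV_LOAD:          /* no intervening Device loads */
		case BAR_DSB:           /* no DSB ... */
		case BAR_DMB_FULL:      /* ... and no DMB that orders loads */
		case BAR_DMB_LD:
			return false;
		case DEV_STORE:
		case DEV_STORE_RELEASE: /* at most one intervening store */
			if (++stores_between > 1)
				return false;
			break;
		case BAR_DMB_ST:        /* does not order loads: no effect */
			break;
		}
	}
	return true;
}
```

A typical driver pattern, a write to a control register followed by a read of a neighbouring status register with no barrier in between, satisfies every modelled condition; turning the store into a store-release removes one of them.
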
Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
to stlr* (Store-Release) on affected CPUs. This removes the "store is
not a store-release" condition for every device write the kernel issues.

Because writel() and writel_relaxed() are both built on __raw_writel()
in asm-generic/io.h, patching the raw variants covers both the
non-relaxed and relaxed APIs without touching the higher layers. Note
that writel()'s own barrier is issued before the store, so it does not
separate the store from a subsequent readl(). For that pair, software
relies solely on the Device-nGnR* program-order guarantee, and the
store-release promotion is what protects that guarantee from the
erratum.

Like ARM64_ERRATUM_832075, which promotes device loads to load-acquire
(ldar*) through ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE, the change is
gated on a new ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and is
only activated on parts that match MIDR_NVIDIA_OLYMPUS, so unaffected
CPUs continue to use plain str* instructions.

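The errata entry added below uses ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS), so every variant and revision of the part is treated as affected. A minimal userspace sketch of that comparison, assuming the MIDR_EL1 layout from the Arm Architecture Reference Manual (implementer [31:24], variant [23:20], architecture [19:16], part number [15:4], revision [3:0]): an "all versions" match compares only the implementer, architecture and part-number fields. make_midr() and matches_all_versions() are hypothetical helpers, not the kernel's cputype.h macros, and the part number used in the usage example is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field positions (Arm ARM). */
#define MIDR_REVISION_SHIFT     0   /* [3:0]   */
#define MIDR_PARTNUM_SHIFT      4   /* [15:4]  */
#define MIDR_ARCH_SHIFT         16  /* [19:16] */
#define MIDR_VARIANT_SHIFT      20  /* [23:20] */
#define MIDR_IMPLEMENTER_SHIFT  24  /* [31:24] */

#define MIDR_PARTNUM_MASK      (0xfffu << MIDR_PARTNUM_SHIFT)
#define MIDR_ARCH_MASK         (0xfu << MIDR_ARCH_SHIFT)
#define MIDR_IMPLEMENTER_MASK  (0xffu << MIDR_IMPLEMENTER_SHIFT)

/* Fields that identify a CPU model, ignoring variant and revision. */
#define MIDR_MODEL_FIELDS \
	(MIDR_IMPLEMENTER_MASK | MIDR_ARCH_MASK | MIDR_PARTNUM_MASK)

/* Hypothetical helper: pack the five MIDR_EL1 fields into one value. */
static uint32_t make_midr(uint32_t impl, uint32_t variant, uint32_t arch,
			  uint32_t part, uint32_t rev)
{
	return (impl << MIDR_IMPLEMENTER_SHIFT) |
	       (variant << MIDR_VARIANT_SHIFT) |
	       (arch << MIDR_ARCH_SHIFT) |
	       (part << MIDR_PARTNUM_SHIFT) |
	       (rev << MIDR_REVISION_SHIFT);
}

/* Hypothetical helper: "all versions" match on the model fields only. */
static bool matches_all_versions(uint32_t midr, uint32_t model)
{
	return (midr & MIDR_MODEL_FIELDS) == (model & MIDR_MODEL_FIELDS);
}
```

For example, with 0x4e (the NVIDIA implementer code) and an invented part number 0x123, a CPU reporting variant 1, revision 2 of that part matches, while a different part number or implementer does not.
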
Note: stlr* supports only base-register addressing, so the raw MMIO
write helpers now use a base-register str*/stlr* alternative. This
gives up the offset-addressed str* code generation introduced by commit
d044d6ba6f02 ("arm64: io: permit offset addressing"). A static-branch
implementation would add extra control flow and, in practice, would
still not preserve the offset-addressed code generation, so the direct
alternative is used instead.

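To make the code-generation trade-off concrete, here is a minimal sketch contrasting the two operand forms; both helper names are hypothetical. The "Qo" memory-operand form lets the compiler fold a constant offset into STR (for example "str w1, [x0, #16]"), whereas STLR accepts only a bare base register, so any offset has to be added into a register first. The inline assembly is built only on AArch64; elsewhere a plain store and a GCC/Clang __atomic release store stand in so the sketch compiles, and unlike the kernel helpers both asm statements carry a "memory" clobber to keep this userspace example well defined.

```c
#include <stdint.h>

/*
 * Old form: "Qo" lets the compiler pick "[Xn]" or "[Xn, #imm]",
 * e.g. "str w1, [x0, #16]" for a store to base + 16.
 */
static inline void store_offset_ok(volatile uint32_t *p, uint32_t v)
{
#if defined(__aarch64__)
	asm volatile("str %w0, %1" : : "rZ"(v), "Qo"(*p) : "memory");
#else
	*p = v;	/* portable stand-in so the sketch builds elsewhere */
#endif
}

/*
 * New form: STLR takes only "[Xn]", so the full address must already be
 * in a register, e.g. "add x2, x0, #16; stlr w1, [x2]".
 */
static inline void store_release_base_only(volatile uint32_t *p, uint32_t v)
{
#if defined(__aarch64__)
	asm volatile("stlr %w0, [%1]" : : "rZ"(v), "r"(p) : "memory");
#else
	__atomic_store_n(p, v, __ATOMIC_RELEASE);	/* portable stand-in */
#endif
}
```

Compiling both helpers for AArch64 and writing to an element at a non-zero offset is a quick way to see the extra address computation that the base-register form introduces.
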
For the write-combining copy helpers (__iowrite{32,64}_copy()), the
contiguous str* groups are kept, because replacing those stores would
defeat the write-combining behaviour they rely on for store
performance. One option would be to rely on the relaxed, no-ordering
contract of these helpers, but that would leave affected CPUs behaving
differently from every other arm64 system and exposed to any future
driver that depends on ordering across such copies. Instead, the DGH
hint emitted once after each copy is promoted to dmb osh on affected
CPUs. This orders the grouped stores against subsequent loads without
placing a barrier in the copy loop, while unaffected CPUs keep the
existing DGH hint. The single-element case of
__const_memcpy_toio_aligned{32,64}() likewise uses a plain str*
(instead of __raw_write*()) so that it shares the same str* group + DGH
path rather than taking a per-store store-release.

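The difference between the two ordering strategies can be sketched in userspace by counting ordering operations. This is an analogy under stated assumptions, not the kernel code: struct barrier_stats, copy_release_each() and copy_then_fence() are hypothetical, the copies target Normal memory with no write-combining buffer involved, and __atomic_thread_fence(__ATOMIC_SEQ_CST) typically compiles to dmb ish on AArch64 rather than the dmb osh used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters standing in for the cost of ordering operations. */
struct barrier_stats {
	size_t release_stores;	/* stores that each carry release semantics */
	size_t trailing_fences;	/* full fences issued after a whole block */
};

/* Per-store ordering: every element is written with a release store. */
static void copy_release_each(uint64_t *dst, const uint64_t *src, size_t n,
			      struct barrier_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		__atomic_store_n(&dst[i], src[i], __ATOMIC_RELEASE);
		st->release_stores++;
	}
}

/* Grouped plain stores, then a single fence once the block is written. */
static void copy_then_fence(uint64_t *dst, const uint64_t *src, size_t n,
			    struct barrier_stats *st)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];	/* may be merged, like a WC burst */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	st->trailing_fences++;
}
```

For an n-element block the first strategy pays for n ordered stores, while the second pays for exactly one fence regardless of n, which is the property the DGH to dmb osh promotion preserves for the copy helpers.
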
Co-developed-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/all/ajVZBJgKn-5sxHD6@willie-the-truck/
---
Documentation/arch/arm64/silicon-errata.rst | 2 ++
arch/arm64/Kconfig | 25 +++++++++++++++++
arch/arm64/include/asm/barrier.h | 4 ++-
arch/arm64/include/asm/io.h | 31 +++++++++++++--------
arch/arm64/kernel/cpu_errata.c | 8 ++++++
arch/arm64/tools/cpucaps | 1 +
6 files changed, 59 insertions(+), 12 deletions(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index ad04d1cdc0f0..c4137f89acef 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -298,6 +298,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | Olympus core    | T410-OLY-1027   | NVIDIA_OLYMPUS_1027_ERRATUM |
++----------------+-----------------+-----------------+-----------------------------+
| NVIDIA         | Olympus core    | T410-OLY-1029   | ARM64_ERRATUM_4118414       |
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 10c69474f276..da4e66b19209 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -564,6 +564,31 @@ config ARM64_ERRATUM_832075
If unsure, say Y.
+config NVIDIA_OLYMPUS_1027_ERRATUM
+ bool "NVIDIA Olympus: device store/load ordering erratum"
+ default y
+ help
+ This option adds an alternative code sequence to work around an
+ NVIDIA Olympus core erratum where a Device-nGnR* store can be
+ observed by a peripheral after a younger Device-nGnR* load to the
+ same peripheral. This breaks the program order that drivers rely
+ on for MMIO and can leave a device in an incorrect state.
+
+ The workaround promotes the raw MMIO store helpers
+ (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
+ required ordering. Because writel() and writel_relaxed() are built
+ on __raw_writel(), both are covered without changes to the higher
+ layers. It also promotes the DGH hint used after write-combining
+ memcpy-to-IO sequences to DMB OSH, so grouped stores are ordered
+ against subsequent reads without placing a barrier in the copy loop.
+
+ The fix is applied through the alternatives framework, so enabling
+ this option does not by itself activate the workaround: it is
+ patched in only when an affected CPU is detected, and is a no-op on
+ unaffected CPUs.
+
+ If unsure, say Y.
+
config ARM64_ERRATUM_834220
bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
depends on KVM
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 9495c4441a46..22792d1305aa 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -38,7 +38,9 @@
* Device-GRE attributes before the hint instruction with any memory accesses
* appearing after the hint instruction.
*/
-#define dgh() asm volatile("hint #6" : : : "memory")
+#define dgh() asm volatile(ALTERNATIVE("hint #6", "dmb osh", \
+ ARM64_WORKAROUND_DEVICE_STORE_RELEASE) \
+ : : : "memory")
#define spec_bar() asm volatile(ALTERNATIVE("dsb nsh\nisb\n", \
SB_BARRIER_INSN"nop\n", \
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8cbd1e96fd50..69e0fa004d31 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -16,7 +16,6 @@
#include <asm/memory.h>
#include <asm/early_ioremap.h>
#include <asm/alternative.h>
-#include <asm/cpufeature.h>
#include <asm/rsi.h>
/*
@@ -25,29 +24,37 @@
#define __raw_writeb __raw_writeb
static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
{
- volatile u8 __iomem *ptr = addr;
- asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
+ asm volatile(ALTERNATIVE("strb %w0, [%1]",
+ "stlrb %w0, [%1]",
+ ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+ : : "rZ" (val), "r" (addr));
}
#define __raw_writew __raw_writew
static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
{
- volatile u16 __iomem *ptr = addr;
- asm volatile("strh %w0, %1" : : "rZ" (val), "Qo" (*ptr));
+ asm volatile(ALTERNATIVE("strh %w0, [%1]",
+ "stlrh %w0, [%1]",
+ ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+ : : "rZ" (val), "r" (addr));
}
#define __raw_writel __raw_writel
static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
- volatile u32 __iomem *ptr = addr;
- asm volatile("str %w0, %1" : : "rZ" (val), "Qo" (*ptr));
+ asm volatile(ALTERNATIVE("str %w0, [%1]",
+ "stlr %w0, [%1]",
+ ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+ : : "rZ" (val), "r" (addr));
}
#define __raw_writeq __raw_writeq
static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
{
- volatile u64 __iomem *ptr = addr;
- asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
+ asm volatile(ALTERNATIVE("str %x0, [%1]",
+ "stlr %x0, [%1]",
+ ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+ : : "rZ" (val), "r" (addr));
}
#define __raw_readb __raw_readb
@@ -178,7 +185,8 @@ __const_memcpy_toio_aligned32(volatile u32 __iomem *to, const u32 *from,
: "rZ"(from[0]), "rZ"(from[1]), "r"(to));
break;
case 1:
- __raw_writel(*from, to);
+ asm volatile("str %w0, [%1]"
+ : : "rZ"(from[0]), "r"(to) : "memory");
break;
default:
BUILD_BUG();
@@ -235,7 +243,8 @@ __const_memcpy_toio_aligned64(volatile u64 __iomem *to, const u64 *from,
: "rZ"(from[0]), "rZ"(from[1]), "r"(to));
break;
case 1:
- __raw_writeq(*from, to);
+ asm volatile("str %x0, [%1]"
+ : : "rZ"(from[0]), "r"(to) : "memory");
break;
default:
BUILD_BUG();
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 4b0d5d932897..76c1f8cf1ee0 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -839,6 +839,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
},
#endif
+#ifdef CONFIG_NVIDIA_OLYMPUS_1027_ERRATUM
+ {
+ /* NVIDIA Olympus core */
+ .desc = "NVIDIA Olympus device store/load ordering erratum",
+ .capability = ARM64_WORKAROUND_DEVICE_STORE_RELEASE,
+ ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
+ },
+#endif
#ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
{
/*
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 811c2479e82d..d367257bf770 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -120,6 +120,7 @@ WORKAROUND_CAVIUM_TX2_219_PRFM
WORKAROUND_CAVIUM_TX2_219_TVM
WORKAROUND_CLEAN_CACHE
WORKAROUND_DEVICE_LOAD_ACQUIRE
+WORKAROUND_DEVICE_STORE_RELEASE
WORKAROUND_NVIDIA_CARMEL_CNP
WORKAROUND_PMUV3_IMPDEF_TRAPS
WORKAROUND_QCOM_FALKOR_E1003
--
2.54.0.windows.1
Thread overview: 3+ messages
2026-06-25 18:24 [PATCH v4 0/2] arm64: errata: NVIDIA Olympus device store/load ordering Shanker Donthineni
2026-06-25 18:24 ` Shanker Donthineni [this message]
2026-06-25 18:24 ` [PATCH v4 2/2] arm64: io: apply the device store-release workaround once per block write Shanker Donthineni