From: Will Deacon <will@kernel.org>
To: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
linux-arm-kernel@lists.infradead.org,
Vladimir Murzin <vladimir.murzin@arm.com>,
Mark Rutland <mark.rutland@arm.com>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
Vikram Sethi <vsethi@nvidia.com>,
Jason Sequeira <jsequeira@nvidia.com>,
jgg@nvidia.com
Subject: Re: [PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
Date: Wed, 10 Jun 2026 12:28:33 +0100 [thread overview]
Message-ID: <ailKYTOX23EMnJsK@willie-the-truck> (raw)
In-Reply-To: <20260605144551.2004391-1-sdonthineni@nvidia.com>
[+Jason G]
On Fri, Jun 05, 2026 at 09:45:51AM -0500, Shanker Donthineni wrote:
> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
> observed by a peripheral before an older, non-overlapping Device-nGnR*
> store to the same peripheral. This breaks the program-order guarantee
> that software expects for Device-nGnR* accesses and can leave a
> peripheral in an incorrect state, as a load is observed before an
> earlier store takes effect.
>
> The erratum can occur only when all of the following apply:
>
> - A PE executes a Device-nGnR* store followed by a younger
> Device-nGnR* load.
> - The store is not a store-release.
> - The accesses target the same peripheral and do not overlap in bytes.
> - There is at most one intervening Device-nGnR* store in program
> order, and there are no intervening Device-nGnR* loads.
> - There is no DSB, and no DMB that orders loads, between the store and
> the load.
> - Specific micro-architectural and timing conditions occur.
>
> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
> orders loads) between the store and the load, or make the store a
> store-release. A load-acquire on the load side would not help, because
> acquire semantics do not prevent a load from being observed ahead of an
> older store; only the store side (release or a barrier) closes the
> window.
I think you can drop the paragraph above. A store-release isn't enough
to order against a later load in the architecture either, so we're
clearly in micro-architecture territory and I don't think you need to
describe mechanisms that don't work here.
> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
> to stlr* (Store-Release), which removes the "store is not a
> store-release" condition for every device write the kernel issues.
> Because writel() and writel_relaxed() are both built on __raw_writel()
> in asm-generic/io.h, patching the raw variants covers both the
> non-relaxed and relaxed APIs without touching the higher layers. Note
> that writel()'s own barrier sits before the store, so it does not order
> the store against a subsequent readl(); the store-release promotion is
> what provides that ordering.
Sashiko points out that you're missing __const_memcpy_toio_aligned32().
> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
> the plain str* sequence.
>
> Note: stlr* only supports base-register addressing, so the raw accessors
> can no longer use the offset addressing introduced by commit d044d6ba6f02
> ("arm64: io: permit offset addressing"). The str* and stlr* alternates
> share a single inline-asm operand and the sequence is selected at boot,
> so the operand form is fixed at compile time; unaffected CPUs keep using
> str* but also revert to base-register addressing. This keeps the store
> side as simple as the existing load-side patching (load-acquire) and
> avoids adding complexity to the device write path; retaining offset
> addressing only for str* would otherwise require a runtime branch on
> every write.
I seem to remember Jason caring about that, possibly because some CPUs
are very picky about write-combining?
Will
next prev parent reply other threads:[~2026-06-10 11:28 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-05 14:45 [PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum Shanker Donthineni
2026-06-10 11:28 ` Will Deacon [this message]
2026-06-10 12:50 ` Jason Gunthorpe
2026-06-10 12:53 ` Shanker Donthineni
2026-06-10 13:20 ` Shanker Donthineni
2026-06-10 16:11 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ailKYTOX23EMnJsK@willie-the-truck \
--to=will@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=jgg@nvidia.com \
--cc=jsequeira@nvidia.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=sdonthineni@nvidia.com \
--cc=vladimir.murzin@arm.com \
--cc=vsethi@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.