Re: [PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will@kernel.org>
To: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Vikram Sethi <vsethi@nvidia.com>,
	Jason Sequeira <jsequeira@nvidia.com>,
	jgg@nvidia.com
Subject: Re: [PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
Date: Wed, 10 Jun 2026 12:28:33 +0100	[thread overview]
Message-ID: <ailKYTOX23EMnJsK@willie-the-truck> (raw)
In-Reply-To: <20260605144551.2004391-1-sdonthineni@nvidia.com>

[+Jason G]

On Fri, Jun 05, 2026 at 09:45:51AM -0500, Shanker Donthineni wrote:
> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
> observed by a peripheral before an older, non-overlapping Device-nGnR*
> store to the same peripheral. This breaks the program-order guarantee
> that software expects for Device-nGnR* accesses and can leave a
> peripheral in an incorrect state, as a load is observed before an
> earlier store takes effect.
> 
> The erratum can occur only when all of the following apply:
> 
>   - A PE executes a Device-nGnR* store followed by a younger
>     Device-nGnR* load.
>   - The store is not a store-release.
>   - The accesses target the same peripheral and do not overlap in bytes.
>   - There is at most one intervening Device-nGnR* store in program
>     order, and there are no intervening Device-nGnR* loads.
>   - There is no DSB, and no DMB that orders loads, between the store and
>     the load.
>   - Specific micro-architectural and timing conditions occur.
> 
> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
> orders loads) between the store and the load, or make the store a
> store-release. A load-acquire on the load side would not help, because
> acquire semantics do not prevent a load from being observed ahead of an
> older store; only the store side (release or a barrier) closes the
> window.

I think you can drop the paragraph above. A store-release isn't enough
to order against a later load in the architecture either, so we're
clearly in micro-architecture territory and I don't think you need to
describe mechanisms that don't work here.

> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
> to stlr* (Store-Release), which removes the "store is not a
> store-release" condition for every device write the kernel issues.
> Because writel() and writel_relaxed() are both built on __raw_writel()
> in asm-generic/io.h, patching the raw variants covers both the
> non-relaxed and relaxed APIs without touching the higher layers. Note
> that writel()'s own barrier sits before the store, so it does not order
> the store against a subsequent readl(); the store-release promotion is
> what provides that ordering.

Sashiko points out that you're missing __const_memcpy_toio_aligned32().

> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
> the plain str* sequence.
> 
> Note: stlr* only supports base-register addressing, so the raw accessors
> can no longer use the offset addressing introduced by commit d044d6ba6f02
> ("arm64: io: permit offset addressing"). The str* and stlr* alternates
> share a single inline-asm operand and the sequence is selected at boot,
> so the operand form is fixed at compile time; unaffected CPUs keep using
> str* but also revert to base-register addressing. This keeps the store
> side as simple as the existing load-side patching (load-acquire) and
> avoids adding complexity to the device write path; retaining offset
> addressing only for str* would otherwise require a runtime branch on
> every write.

I seem to remember Jason caring about that, possibly because some CPUs
are very picky about write-combining?

Will

next prev parent reply	other threads:[~2026-06-10 11:28 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-05 14:45 [PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum Shanker Donthineni
2026-06-10 11:28 ` Will Deacon [this message]
2026-06-10 12:50   ` Jason Gunthorpe
2026-06-10 12:53   ` Shanker Donthineni
2026-06-10 13:20   ` Shanker Donthineni
2026-06-10 16:11     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ailKYTOX23EMnJsK@willie-the-truck \
    --to=will@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=jgg@nvidia.com \
    --cc=jsequeira@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=sdonthineni@nvidia.com \
    --cc=vladimir.murzin@arm.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox