public inbox for nouveau@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH 0/3] rust: more memory barriers bindings
@ 2026-04-02 15:24 Gary Guo
  2026-04-02 15:24 ` [PATCH 1/3] rust: sync: add helpers for mb, dma_mb and friends Gary Guo
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Gary Guo @ 2026-04-02 15:24 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich
  Cc: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	rust-for-linux, nouveau, linux-kernel, linux-arch, lkmm

From: Gary Guo <gary@garyguo.net>

This expands the existing SMP barrier bindings to also cover mandatory
barriers and DMA barriers.

The API looks like:
`mb(Ordering)`/`smp_mb(Ordering)`/`dma_mb(Ordering)`, where `Ordering`
is one of `Full`, `Read`, `Write` and also `Acquire`, `Release`.

The `Acquire` and `Release` barriers are mapped to `Full` barriers for
now; they only serve to document what ordering is needed and bring no
codegen difference yet, but they could be refined later to produce better
codegen than full barriers. More on them in the commit message of patch 2.

A user of the introduced API is included in patch 3, which fixes a
concurrency bug that exists in the Nova code today due to missing barriers.

To: Miguel Ojeda <ojeda@kernel.org>
To: Boqun Feng <boqun@kernel.org>
To: Danilo Krummrich <dakr@kernel.org>
To: Alexandre Courbot <acourbot@nvidia.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Benno Lossin <lossin@kernel.org>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: rust-for-linux@vger.kernel.org
Cc: nouveau@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: lkmm@lists.linux.dev

Gary Guo (3):
  rust: sync: add helpers for mb, dma_mb and friends
  rust: sync: generic memory barriers
  gpu: nova-core: fix wrong use of barriers in GSP code

 drivers/gpu/nova-core/gsp/cmdq.rs   |  19 +++
 drivers/gpu/nova-core/gsp/fw.rs     |  12 --
 rust/helpers/barrier.c              |  30 +++++
 rust/kernel/sync/atomic/ordering.rs |   2 +-
 rust/kernel/sync/barrier.rs         | 194 ++++++++++++++++++++++++----
 5 files changed, 217 insertions(+), 40 deletions(-)


base-commit: 36ece9697e89016181e5ae87510e40fb31d86f2b
-- 
2.51.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] rust: sync: add helpers for mb, dma_mb and friends
  2026-04-02 15:24 [PATCH 0/3] rust: more memory barriers bindings Gary Guo
@ 2026-04-02 15:24 ` Gary Guo
  2026-04-02 15:24 ` [PATCH 2/3] rust: sync: generic memory barriers Gary Guo
  2026-04-02 15:24 ` [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code Gary Guo
  2 siblings, 0 replies; 7+ messages in thread
From: Gary Guo @ 2026-04-02 15:24 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich
  Cc: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	rust-for-linux, nouveau, linux-kernel, linux-arch, lkmm

From: Gary Guo <gary@garyguo.net>

Add helpers for the mandatory barriers (mb, rmb, wmb) and the DMA barriers
(dma_mb, dma_rmb, dma_wmb). They supplement the existing smp_mb, smp_rmb
and smp_wmb helpers.

Signed-off-by: Gary Guo <gary@garyguo.net>
---
 rust/helpers/barrier.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/rust/helpers/barrier.c b/rust/helpers/barrier.c
index fed8853745c8..dbc7a3017c78 100644
--- a/rust/helpers/barrier.c
+++ b/rust/helpers/barrier.c
@@ -2,6 +2,36 @@
 
 #include <asm/barrier.h>
 
+__rust_helper void rust_helper_mb(void)
+{
+	mb();
+}
+
+__rust_helper void rust_helper_rmb(void)
+{
+	rmb();
+}
+
+__rust_helper void rust_helper_wmb(void)
+{
+	wmb();
+}
+
+__rust_helper void rust_helper_dma_mb(void)
+{
+	dma_mb();
+}
+
+__rust_helper void rust_helper_dma_rmb(void)
+{
+	dma_rmb();
+}
+
+__rust_helper void rust_helper_dma_wmb(void)
+{
+	dma_wmb();
+}
+
 __rust_helper void rust_helper_smp_mb(void)
 {
 	smp_mb();
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] rust: sync: generic memory barriers
  2026-04-02 15:24 [PATCH 0/3] rust: more memory barriers bindings Gary Guo
  2026-04-02 15:24 ` [PATCH 1/3] rust: sync: add helpers for mb, dma_mb and friends Gary Guo
@ 2026-04-02 15:24 ` Gary Guo
  2026-04-02 15:24 ` [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code Gary Guo
  2 siblings, 0 replies; 7+ messages in thread
From: Gary Guo @ 2026-04-02 15:24 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Will Deacon, Peter Zijlstra, Mark Rutland
  Cc: Alan Stern, Andrea Parri, Nicholas Piggin, David Howells,
	Jade Alglave, Luc Maranget, Paul E. McKenney, Akira Yokosawa,
	Daniel Lustig, Joel Fernandes, rust-for-linux, nouveau,
	linux-kernel, linux-arch, lkmm

From: Gary Guo <gary@garyguo.net>

Implement a generic interface for memory barriers (full system/DMA/SMP).
The interface takes an ordering parameter to force users to specify their
intent with each barrier.

It provides the `Read`, `Write` and `Full` orderings, which map to the
existing `rmb()`, `wmb()` and `mb()`, but also `Acquire` and `Release`,
which are documented to provide `LOAD->{LOAD,STORE}` ordering and
`{LOAD,STORE}->STORE` ordering respectively, although for now they are
still mapped to a full `mb()`. In the future they could be mapped to more
efficient forms depending on the architecture. I included them because many
users do not need the `STORE->LOAD` ordering, and using `Acquire`/`Release`
makes their intent clearer as to which reordering is to be prevented.

Generics are used here instead of individual standalone functions to reduce
code duplication. For example, the `Acquire` -> `Full` upgrade is
implemented uniformly for all three barrier types, and the `CONFIG_SMP`
check in `smp_mb` is implemented uniformly for all SMP barriers. This could
extend to the `virt_mb` family if it is introduced in the future.
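As a standalone illustration of this deduplication (not the kernel code:
the tokens, the flavour marker and the `fence` stand-ins below are
assumptions of the sketch; the real impls call the C helpers), the
`Acquire` -> `Full` upgrade can be expressed as a single blanket impl over
the flavour parameter:

```rust
use std::sync::atomic::{fence, Ordering};

// Ordering tokens and a flavour marker, modeled on the patch's design.
struct Full;
struct Acquire;
struct Smp;

trait MemoryBarrier<Flavour = ()> {
    fn run() -> &'static str;
}

// `Acquire` upgrades to `Full` once, for every flavour that has a `Full`
// implementation; adding a new flavour needs no new `Acquire` impl.
impl<F> MemoryBarrier<F> for Acquire
where
    Full: MemoryBarrier<F>,
{
    fn run() -> &'static str {
        <Full as MemoryBarrier<F>>::run()
    }
}

impl MemoryBarrier for Full {
    fn run() -> &'static str {
        fence(Ordering::SeqCst); // stand-in for mb()
        "mb"
    }
}

impl MemoryBarrier<Smp> for Full {
    fn run() -> &'static str {
        fence(Ordering::SeqCst); // stand-in for smp_mb()
        "smp_mb"
    }
}

fn mb<T: MemoryBarrier>(_: T) -> &'static str {
    T::run()
}

fn smp_mb<T: MemoryBarrier<Smp>>(_: T) -> &'static str {
    T::run()
}

fn main() {
    // Both calls resolve through the single blanket upgrade impl.
    println!("{} {}", mb(Acquire), smp_mb(Acquire));
}
```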

Signed-off-by: Gary Guo <gary@garyguo.net>
---
 rust/kernel/sync/atomic/ordering.rs |   2 +-
 rust/kernel/sync/barrier.rs         | 194 ++++++++++++++++++++++++----
 2 files changed, 168 insertions(+), 28 deletions(-)

diff --git a/rust/kernel/sync/atomic/ordering.rs b/rust/kernel/sync/atomic/ordering.rs
index 3f103aa8db99..c4e732e7212f 100644
--- a/rust/kernel/sync/atomic/ordering.rs
+++ b/rust/kernel/sync/atomic/ordering.rs
@@ -15,7 +15,7 @@
 //!   - It provides ordering between the annotated operation and all the following memory accesses.
 //!   - It provides ordering between all the preceding memory accesses and all the following memory
 //!     accesses.
-//!   - All the orderings are the same strength as a full memory barrier (i.e. `smp_mb()`).
+//!   - All the orderings are the same strength as a full memory barrier (i.e. `smp_mb(Full)`).
 //! - [`Relaxed`] provides no ordering except the dependency orderings. Dependency orderings are
 //!   described in "DEPENDENCY RELATIONS" in [`LKMM`]'s [`explanation`].
 //!
diff --git a/rust/kernel/sync/barrier.rs b/rust/kernel/sync/barrier.rs
index 8f2d435fcd94..0331bb353a76 100644
--- a/rust/kernel/sync/barrier.rs
+++ b/rust/kernel/sync/barrier.rs
@@ -7,6 +7,23 @@
 //!
 //! [`LKMM`]: srctree/tools/memory-model/
 
+#![expect(private_bounds, reason = "sealed implementation")]
+
+pub use super::atomic::ordering::{
+    Acquire,
+    Full,
+    Release, //
+};
+
+/// The annotation type for read operations.
+pub struct Read;
+
+/// The annotation type for write operations.
+pub struct Write;
+
+struct Smp;
+struct Dma;
+
 /// A compiler barrier.
 ///
 /// A barrier that prevents compiler from reordering memory accesses across the barrier.
@@ -19,43 +36,166 @@ pub(crate) fn barrier() {
     unsafe { core::arch::asm!("") };
 }
 
-/// A full memory barrier.
-///
-/// A barrier that prevents compiler and CPU from reordering memory accesses across the barrier.
-#[inline(always)]
-pub fn smp_mb() {
-    if cfg!(CONFIG_SMP) {
-        // SAFETY: `smp_mb()` is safe to call.
-        unsafe { bindings::smp_mb() };
-    } else {
-        barrier();
+trait MemoryBarrier<Flavour = ()> {
+    fn run();
+}
+
+// Currently the kernel only supports `rmb`, `wmb` and full `mb`.
+// Upgrade `Acquire`/`Release` barriers to full barriers.
+
+impl<F> MemoryBarrier<F> for Acquire
+where
+    Full: MemoryBarrier<F>,
+{
+    #[inline]
+    fn run() {
+        Full::run();
     }
 }
 
-/// A write-write memory barrier.
-///
-/// A barrier that prevents compiler and CPU from reordering memory write accesses across the
-/// barrier.
-#[inline(always)]
-pub fn smp_wmb() {
-    if cfg!(CONFIG_SMP) {
+impl<F> MemoryBarrier<F> for Release
+where
+    Full: MemoryBarrier<F>,
+{
+    #[inline]
+    fn run() {
+        Full::run();
+    }
+}
+
+// Specific barrier implementations.
+
+impl MemoryBarrier for Read {
+    #[inline]
+    fn run() {
+        // SAFETY: `rmb()` is safe to call.
+        unsafe { bindings::rmb() };
+    }
+}
+
+impl MemoryBarrier for Write {
+    #[inline]
+    fn run() {
+        // SAFETY: `wmb()` is safe to call.
+        unsafe { bindings::wmb() };
+    }
+}
+
+impl MemoryBarrier for Full {
+    #[inline]
+    fn run() {
+        // SAFETY: `mb()` is safe to call.
+        unsafe { bindings::mb() };
+    }
+}
+
+impl MemoryBarrier<Dma> for Read {
+    #[inline]
+    fn run() {
+        // SAFETY: `dma_rmb()` is safe to call.
+        unsafe { bindings::dma_rmb() };
+    }
+}
+
+impl MemoryBarrier<Dma> for Write {
+    #[inline]
+    fn run() {
+        // SAFETY: `dma_wmb()` is safe to call.
+        unsafe { bindings::dma_wmb() };
+    }
+}
+
+impl MemoryBarrier<Dma> for Full {
+    #[inline]
+    fn run() {
+        // SAFETY: `dma_mb()` is safe to call.
+        unsafe { bindings::dma_mb() };
+    }
+}
+
+impl MemoryBarrier<Smp> for Read {
+    #[inline]
+    fn run() {
+        // SAFETY: `smp_rmb()` is safe to call.
+        unsafe { bindings::smp_rmb() };
+    }
+}
+
+impl MemoryBarrier<Smp> for Write {
+    #[inline]
+    fn run() {
         // SAFETY: `smp_wmb()` is safe to call.
         unsafe { bindings::smp_wmb() };
-    } else {
-        barrier();
     }
 }
 
-/// A read-read memory barrier.
+impl MemoryBarrier<Smp> for Full {
+    #[inline]
+    fn run() {
+        // SAFETY: `smp_mb()` is safe to call.
+        unsafe { bindings::smp_mb() };
+    }
+}
+
+/// Memory barrier.
 ///
-/// A barrier that prevents compiler and CPU from reordering memory read accesses across the
-/// barrier.
-#[inline(always)]
-pub fn smp_rmb() {
+/// A barrier that prevents compiler and CPU from reordering memory accesses across the barrier.
+///
+/// The specific forms of reordering can be specified using the parameter.
+/// - `mb(Read)` provides a read-read barrier.
+/// - `mb(Write)` provides a write-write barrier.
+/// - `mb(Full)` provides a full barrier.
+/// - `mb(Acquire)` prevents preceding reads from being reordered with succeeding memory
+///    operations.
+/// - `mb(Release)` prevents preceding memory operations from being reordered with succeeding
+///    writes.
+///
+/// # Examples
+///
+/// ```
+/// # use kernel::sync::barrier::*;
+/// mb(Read);
+/// mb(Write);
+/// mb(Acquire);
+/// mb(Release);
+/// mb(Full);
+/// ```
+#[inline]
+#[doc(alias = "rmb")]
+#[doc(alias = "wmb")]
+pub fn mb<T: MemoryBarrier>(_: T) {
+    T::run()
+}
+
+/// Memory barrier between CPUs.
+///
+/// A barrier that prevents compiler and CPU from reordering memory accesses across the barrier.
+/// Does not prevent re-ordering with respect to other bus-mastering devices.
+///
+/// Prefer using `Acquire` [loads](super::atomic::Atomic::load) to `Acquire` barriers, and `Release`
+/// [stores](super::atomic::Atomic::store) to `Release` barriers.
+///
+/// See [`mb`] for usage.
+#[inline]
+#[doc(alias = "smp_rmb")]
+#[doc(alias = "smp_wmb")]
+pub fn smp_mb<T: MemoryBarrier<Smp>>(_: T) {
     if cfg!(CONFIG_SMP) {
-        // SAFETY: `smp_rmb()` is safe to call.
-        unsafe { bindings::smp_rmb() };
+        T::run()
     } else {
-        barrier();
+        barrier()
     }
 }
+
+/// Memory barrier between local CPU and bus-mastering devices.
+///
+/// A barrier that prevents compiler and CPU from reordering memory accesses across the barrier.
+/// Does not prevent re-ordering with respect to other CPUs.
+///
+/// See [`mb`] for usage.
+#[inline]
+#[doc(alias = "dma_rmb")]
+#[doc(alias = "dma_wmb")]
+pub fn dma_mb<T: MemoryBarrier<Dma>>(_: T) {
+    T::run()
+}
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code
  2026-04-02 15:24 [PATCH 0/3] rust: more memory barriers bindings Gary Guo
  2026-04-02 15:24 ` [PATCH 1/3] rust: sync: add helpers for mb, dma_mb and friends Gary Guo
  2026-04-02 15:24 ` [PATCH 2/3] rust: sync: generic memory barriers Gary Guo
@ 2026-04-02 15:24 ` Gary Guo
  2026-04-02 21:56   ` Joel Fernandes
  2 siblings, 1 reply; 7+ messages in thread
From: Gary Guo @ 2026-04-02 15:24 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Alexandre Courbot, David Airlie, Simona Vetter
  Cc: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	rust-for-linux, nouveau, linux-kernel, linux-arch, lkmm,
	dri-devel

From: Gary Guo <gary@garyguo.net>

In the GSP->CPU messaging path, the current code misses a read barrier
before the data read. The barrier after the read is changed to a DMA
barrier (with release ordering desired) instead of the existing (Rust)
SeqCst SMP fence; the barrier is also moved to the beginning of the
function, because it is needed to synchronize between the data and the
ring-buffer pointer. The RMW operation does not internally need a barrier
(nor does it have to be atomic, as the CPU pointers are updated by the CPU
only).

In the CPU->GSP messaging path, the current code misses a write barrier
after the data write and before updating the CPU write pointer. A barrier
is not needed before the data write thanks to a control dependency; this
fact is now documented explicitly. The control dependency could be
replaced with an acquire barrier if needed.

Signed-off-by: Gary Guo <gary@garyguo.net>
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
 drivers/gpu/nova-core/gsp/fw.rs   | 12 ------------
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 2224896ccc89..7e4315b13984 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -19,6 +19,12 @@
     prelude::*,
     sync::{
         aref::ARef,
+        barrier::{
+            dma_mb,
+            Read,
+            Release,
+            Write, //
+        },
         Mutex, //
     },
     time::Delta,
@@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
         let tx = self.cpu_write_ptr() as usize;
         let rx = self.gsp_read_ptr() as usize;
 
+        // ORDERING: control dependency provides necessary LOAD->STORE ordering.
+        // `dma_mb(Acquire)` may be used here if we don't want to rely on control dependency.
+
         // SAFETY:
         // - We will only access the driver-owned part of the shared memory.
         // - Per the safety statement of the function, no concurrent access will be performed.
@@ -311,6 +320,9 @@ fn driver_write_area_size(&self) -> usize {
         let tx = self.gsp_write_ptr() as usize;
         let rx = self.cpu_read_ptr() as usize;
 
+        // ORDERING: Ensure data load is ordered after load of GSP write pointer.
+        dma_mb(Read);
+
         // SAFETY:
         // - We will only access the driver-owned part of the shared memory.
         // - Per the safety statement of the function, no concurrent access will be performed.
@@ -408,6 +420,10 @@ fn cpu_read_ptr(&self) -> u32 {
 
     // Informs the GSP that it can send `elem_count` new pages into the message queue.
     fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
+        // ORDERING: Ensure data reads are complete before the read-pointer update
+        // becomes visible to the GSP.
+        dma_mb(Release);
+
         super::fw::gsp_mem::advance_cpu_read_ptr(&self.0, elem_count)
     }
 
@@ -422,6 +438,9 @@ fn cpu_write_ptr(&self) -> u32 {
 
     // Informs the GSP that it can process `elem_count` new pages from the command queue.
     fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
+        // ORDERING: Ensure all command data is visible before updating the ring buffer pointer.
+        dma_mb(Write);
+
         super::fw::gsp_mem::advance_cpu_write_ptr(&self.0, elem_count)
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 0c8a74f0e8ac..62c2cf1b030c 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -42,11 +42,6 @@
 
 // TODO: Replace with `IoView` projections once available.
 pub(super) mod gsp_mem {
-    use core::sync::atomic::{
-        fence,
-        Ordering, //
-    };
-
     use kernel::{
         dma::Coherent,
         dma_read,
@@ -72,10 +67,6 @@ pub(in crate::gsp) fn cpu_read_ptr(qs: &Coherent<GspMem>) -> u32 {
 
     pub(in crate::gsp) fn advance_cpu_read_ptr(qs: &Coherent<GspMem>, count: u32) {
         let rptr = cpu_read_ptr(qs).wrapping_add(count) % MSGQ_NUM_PAGES;
-
-        // Ensure read pointer is properly ordered.
-        fence(Ordering::SeqCst);
-
         dma_write!(qs, .cpuq.rx.0.readPtr, rptr);
     }
 
@@ -87,9 +78,6 @@ pub(in crate::gsp) fn advance_cpu_write_ptr(qs: &Coherent<GspMem>, count: u32) {
         let wptr = cpu_write_ptr(qs).wrapping_add(count) % MSGQ_NUM_PAGES;
 
         dma_write!(qs, .cpuq.tx.0.writePtr, wptr);
-
-        // Ensure all command data is visible before triggering the GSP read.
-        fence(Ordering::SeqCst);
     }
 }
 
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code
  2026-04-02 15:24 ` [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code Gary Guo
@ 2026-04-02 21:56   ` Joel Fernandes
  2026-04-04 13:02     ` Gary Guo
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Fernandes @ 2026-04-02 21:56 UTC (permalink / raw)
  To: Gary Guo, Miguel Ojeda, Boqun Feng, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Alexandre Courbot, David Airlie, Simona Vetter
  Cc: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, rust-for-linux,
	nouveau, linux-kernel, linux-arch, lkmm, dri-devel

Hi Gary,

On 4/2/2026 11:24 AM, Gary Guo wrote:
> From: Gary Guo <gary@garyguo.net>
> 
> In the GSP->CPU messaging path, the current code misses a read barrier
> before the data read. The barrier after the read is changed to a DMA
> barrier (with release ordering desired) instead of the existing (Rust)
> SeqCst SMP fence; the barrier is also moved to the beginning of the
> function, because it is needed to synchronize between the data and the
> ring-buffer pointer. The RMW operation does not internally need a barrier
> (nor does it have to be atomic, as the CPU pointers are updated by the CPU
> only).
> 
> In the CPU->GSP messaging path, the current code misses a write barrier
> after the data write and before updating the CPU write pointer. A barrier
> is not needed before the data write thanks to a control dependency; this
> fact is now documented explicitly. The control dependency could be
> replaced with an acquire barrier if needed.
> 
> Signed-off-by: Gary Guo <gary@garyguo.net>
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
>  drivers/gpu/nova-core/gsp/fw.rs   | 12 ------------
>  2 files changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 2224896ccc89..7e4315b13984 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -19,6 +19,12 @@
>      prelude::*,
>      sync::{
>          aref::ARef,
> +        barrier::{
> +            dma_mb,
> +            Read,
> +            Release,
> +            Write, //
> +        },
>          Mutex, //
>      },
>      time::Delta,
> @@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>          let tx = self.cpu_write_ptr() as usize;
>          let rx = self.gsp_read_ptr() as usize;
>  
> +        // ORDERING: control dependency provides necessary LOAD->STORE ordering.
> +        // `dma_mb(Acquire)` may be used here if we don't want to rely on control dependency.

Just checking, does the control dependency on the CPU side really apply to
ordering for I/O (i.e. what the device perceives)? IOW, the loads and stores
might be ordered on the CPU side, but the device might see these operations
out of order. If that is the case, perhaps the control dependency comment is
misleading.


> +
>          // SAFETY:
>          // - We will only access the driver-owned part of the shared memory.
>          // - Per the safety statement of the function, no concurrent access will be performed.
> @@ -311,6 +320,9 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;
>  
> +        // ORDERING: Ensure data load is ordered after load of GSP write pointer.
> +        dma_mb(Read);
> +

I suggest taking it on a case-by-case basis and splitting the patch per
case, for easier review. There are many patterns AFAICS: load-store,
store-store, etc.

I do acknowledge the issue you find here though. thanks,

--
Joel Fernandes



>          // SAFETY:
>          // - We will only access the driver-owned part of the shared memory.
>          // - Per the safety statement of the function, no concurrent access will be performed.
> @@ -408,6 +420,10 @@ fn cpu_read_ptr(&self) -> u32 {
>  
>      // Informs the GSP that it can send `elem_count` new pages into the message queue.
>      fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
> +        // ORDERING: Ensure data reads are complete before the read-pointer update
> +        // becomes visible to the GSP.
> +        dma_mb(Release);
> +
>          super::fw::gsp_mem::advance_cpu_read_ptr(&self.0, elem_count)
>      }
>  
> @@ -422,6 +438,9 @@ fn cpu_write_ptr(&self) -> u32 {
>  
>      // Informs the GSP that it can process `elem_count` new pages from the command queue.
>      fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
> +        // ORDERING: Ensure all command data is visible before updating the ring buffer pointer.
> +        dma_mb(Write);
> +
>          super::fw::gsp_mem::advance_cpu_write_ptr(&self.0, elem_count)
>      }
>  }
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c8a74f0e8ac..62c2cf1b030c 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -42,11 +42,6 @@
>  
>  // TODO: Replace with `IoView` projections once available.
>  pub(super) mod gsp_mem {
> -    use core::sync::atomic::{
> -        fence,
> -        Ordering, //
> -    };
> -
>      use kernel::{
>          dma::Coherent,
>          dma_read,
> @@ -72,10 +67,6 @@ pub(in crate::gsp) fn cpu_read_ptr(qs: &Coherent<GspMem>) -> u32 {
>  
>      pub(in crate::gsp) fn advance_cpu_read_ptr(qs: &Coherent<GspMem>, count: u32) {
>          let rptr = cpu_read_ptr(qs).wrapping_add(count) % MSGQ_NUM_PAGES;
> -
> -        // Ensure read pointer is properly ordered.
> -        fence(Ordering::SeqCst);
> -
>          dma_write!(qs, .cpuq.rx.0.readPtr, rptr);
>      }
>  
> @@ -87,9 +78,6 @@ pub(in crate::gsp) fn advance_cpu_write_ptr(qs: &Coherent<GspMem>, count: u32) {
>          let wptr = cpu_write_ptr(qs).wrapping_add(count) % MSGQ_NUM_PAGES;
>  
>          dma_write!(qs, .cpuq.tx.0.writePtr, wptr);
> -
> -        // Ensure all command data is visible before triggering the GSP read.
> -        fence(Ordering::SeqCst);
>      }
>  }
>  

-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code
  2026-04-02 21:56   ` Joel Fernandes
@ 2026-04-04 13:02     ` Gary Guo
  2026-04-06 18:18       ` Joel Fernandes
  0 siblings, 1 reply; 7+ messages in thread
From: Gary Guo @ 2026-04-04 13:02 UTC (permalink / raw)
  To: Joel Fernandes, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, rust-for-linux,
	nouveau, linux-kernel, linux-arch, lkmm, dri-devel

On Thu Apr 2, 2026 at 10:56 PM BST, Joel Fernandes wrote:
> Hi Gary,
>
> On 4/2/2026 11:24 AM, Gary Guo wrote:
>> From: Gary Guo <gary@garyguo.net>
>> 
>> In the GSP->CPU messaging path, the current code misses a read barrier
>> before the data read. The barrier after the read is changed to a DMA
>> barrier (with release ordering desired) instead of the existing (Rust)
>> SeqCst SMP fence; the barrier is also moved to the beginning of the
>> function, because it is needed to synchronize between the data and the
>> ring-buffer pointer. The RMW operation does not internally need a barrier
>> (nor does it have to be atomic, as the CPU pointers are updated by the
>> CPU only).
>> 
>> In the CPU->GSP messaging path, the current code misses a write barrier
>> after the data write and before updating the CPU write pointer. A barrier
>> is not needed before the data write thanks to a control dependency; this
>> fact is now documented explicitly. The control dependency could be
>> replaced with an acquire barrier if needed.
>> 
>> Signed-off-by: Gary Guo <gary@garyguo.net>
>> ---
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
>>  drivers/gpu/nova-core/gsp/fw.rs   | 12 ------------
>>  2 files changed, 19 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index 2224896ccc89..7e4315b13984 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -19,6 +19,12 @@
>>      prelude::*,
>>      sync::{
>>          aref::ARef,
>> +        barrier::{
>> +            dma_mb,
>> +            Read,
>> +            Release,
>> +            Write, //
>> +        },
>>          Mutex, //
>>      },
>>      time::Delta,
>> @@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>          let tx = self.cpu_write_ptr() as usize;
>>          let rx = self.gsp_read_ptr() as usize;
>>  
>> +        // ORDERING: control dependency provides necessary LOAD->STORE ordering.
>> +        // `dma_mb(Acquire)` may be used here if we don't want to rely on control dependency.
>
> Just checking, does the control dependency on the CPU side really apply to
> ordering for I/O (i.e. what the device perceives)? IOW, the loads and
> stores might be ordered on the CPU side, but the device might see these
> operations out of order. If that is the case, perhaps the control
> dependency comment is misleading.

Given that the CPU cannot speculate stores, I don't see how the dependency
ordering can be broken even if the other side is a bus-mastering device.

For this to be broken, the device would have to see the dependency-ordered
STORE propagate to it before its own STORE propagates to the CPU to trigger
the CPU STORE in the first place. That would just break causality.

I'd say that the control dependency is sufficient here. I'm not worried that
the compiler will break this particular control dependency, given that
`gsp_read_ptr == cpu_write_ptr` implies command allocation failure and this
condition cannot possibly be optimized out by the compiler.

Best,
Gary

>
>
>> +
>>          // SAFETY:
>>          // - We will only access the driver-owned part of the shared memory.
>>          // - Per the safety statement of the function, no concurrent access will be performed.
>> @@ -311,6 +320,9 @@ fn driver_write_area_size(&self) -> usize {
>>          let tx = self.gsp_write_ptr() as usize;
>>          let rx = self.cpu_read_ptr() as usize;
>>  
>> +        // ORDERING: Ensure data load is ordered after load of GSP write pointer.
>> +        dma_mb(Read);
>> +
>
> I suggest taking it on a case-by-case basis and splitting the patch per
> case, for easier review. There are many patterns AFAICS: load-store,
> store-store, etc.
>
> I do acknowledge the issue you find here though. thanks,

This is pretty much just the common MP pattern; there's even an example in
Documentation/core-api/circular-buffers.rst (except that we're using DMA
barriers here, as we're talking to a device rather than another SMP CPU), so
I think there's no real need to break this into smaller parts.
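For reference, that MP pattern can be sketched with std atomics and a
thread standing in for the device (the names and the thread stand-in are
illustrative assumptions of the sketch; the real code pairs DMA barriers,
not std fences):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Message-passing sketch: `payload` plays a ring-buffer slot, `wptr` the
// write pointer that publishes it.
fn mp_demo() -> usize {
    let payload = Arc::new(AtomicUsize::new(0));
    let wptr = Arc::new(AtomicUsize::new(0));

    let (p, w) = (Arc::clone(&payload), Arc::clone(&wptr));
    let producer = thread::spawn(move || {
        p.store(42, Ordering::Relaxed); // write the data
        w.store(1, Ordering::Release);  // publish: cf. smp_store_release()
    });

    // Consume: cf. smp_load_acquire() on the write pointer, then read data.
    while wptr.load(Ordering::Acquire) == 0 {
        std::hint::spin_loop();
    }
    let v = payload.load(Ordering::Relaxed); // guaranteed to observe 42
    producer.join().unwrap();
    v
}

fn main() {
    println!("{}", mp_demo());
}
```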

(That documentation uses smp_load_acquire/smp_store_release, which is
another reason I want to at least document the barrier as having
ACQUIRE/RELEASE ordering.)

Perhaps I can move the removal of the redundant barrier after the
write-pointer update into a separate patch.

Best,
Gary

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code
  2026-04-04 13:02     ` Gary Guo
@ 2026-04-06 18:18       ` Joel Fernandes
  0 siblings, 0 replies; 7+ messages in thread
From: Joel Fernandes @ 2026-04-06 18:18 UTC (permalink / raw)
  To: Gary Guo, Miguel Ojeda, Boqun Feng, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Alexandre Courbot, David Airlie, Simona Vetter
  Cc: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, rust-for-linux,
	nouveau, linux-kernel, linux-arch, lkmm, dri-devel



On 4/4/2026 9:02 AM, Gary Guo wrote:
> On Thu Apr 2, 2026 at 10:56 PM BST, Joel Fernandes wrote:
>> Hi Gary,
>>
>> On 4/2/2026 11:24 AM, Gary Guo wrote:
>>> From: Gary Guo <gary@garyguo.net>
>>>
>>> In the GSP->CPU messaging path, the current code misses a read barrier
>>> before the data read. The barrier after the read is changed to a DMA
>>> barrier (with release ordering desired) instead of the existing (Rust)
>>> SeqCst SMP fence; the barrier is also moved to the beginning of the
>>> function, because it is needed to synchronize between the data and the
>>> ring-buffer pointer. The RMW operation does not internally need a
>>> barrier (nor does it have to be atomic, as the CPU pointers are updated
>>> by the CPU only).
>>>
>>> In the CPU->GSP messaging path, the current code misses a write barrier
>>> after the data write and before updating the CPU write pointer. A
>>> barrier is not needed before the data write thanks to a control
>>> dependency; this fact is now documented explicitly. The control
>>> dependency could be replaced with an acquire barrier if needed.
>>>
>>> Signed-off-by: Gary Guo <gary@garyguo.net>
>>> ---
>>>  drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
>>>  drivers/gpu/nova-core/gsp/fw.rs   | 12 ------------
>>>  2 files changed, 19 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> index 2224896ccc89..7e4315b13984 100644
>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> @@ -19,6 +19,12 @@
>>>      prelude::*,
>>>      sync::{
>>>          aref::ARef,
>>> +        barrier::{
>>> +            dma_mb,
>>> +            Read,
>>> +            Release,
>>> +            Write, //
>>> +        },
>>>          Mutex, //
>>>      },
>>>      time::Delta,
>>> @@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>>          let tx = self.cpu_write_ptr() as usize;
>>>          let rx = self.gsp_read_ptr() as usize;
>>>  
>>> +        // ORDERING: control dependency provides necessary LOAD->STORE ordering.
>>> +        // `dma_mb(Acquire)` may be used here if we don't want to rely on control dependency.
>>
>> Just checking, does the control dependency on the CPU side really apply to
>> ordering for I/O (i.e. what the device perceives)? IOW, the loads and
>> stores might be ordered on the CPU side, but the device might see these
>> operations out of order. If that is the case, perhaps the control
>> dependency comment is misleading.
> 
> Given that the CPU cannot speculate stores, I don't see how the dependency
> ordering can be broken even if the other side is a bus-mastering device.
> 
> For this to be broken, the device would have to see the dependency-ordered
> STORE propagate to it before its own STORE propagates to the CPU to
> trigger the CPU STORE in the first place. That would just break causality.
> 
> I'd say that the control dependency is sufficient here. I'm not worried
> that the compiler will break this particular control dependency, given
> that `gsp_read_ptr == cpu_write_ptr` implies command allocation failure
> and this condition cannot possibly be optimized out by the compiler.
Yep, ok, fair enough.

thanks,

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-04-06 18:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-02 15:24 [PATCH 0/3] rust: more memory barriers bindings Gary Guo
2026-04-02 15:24 ` [PATCH 1/3] rust: sync: add helpers for mb, dma_mb and friends Gary Guo
2026-04-02 15:24 ` [PATCH 2/3] rust: sync: generic memory barriers Gary Guo
2026-04-02 15:24 ` [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code Gary Guo
2026-04-02 21:56   ` Joel Fernandes
2026-04-04 13:02     ` Gary Guo
2026-04-06 18:18       ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox