Linux Documentation
 help / color / mirror / Atom feed
* [PATCH v11 03/20] gpu: nova-core: gsp: Expose total physical VRAM end from FB region info
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add `total_fb_end()` to `GspStaticConfigInfo` that computes the
exclusive end address of the highest valid FB region covering both
usable and GSP-reserved areas.

This allows callers to know the full physical VRAM extent, not just
the allocatable portion.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs    | 6 ++++++
 drivers/gpu/nova-core/gsp/fw/commands.rs | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index d18abd8b5f04..e42a865fd4ac 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -196,6 +196,9 @@ pub(crate) struct GetGspStaticInfoReply {
     /// Usable FB (VRAM) region for driver memory allocation.
     #[expect(dead_code)]
     pub(crate) usable_fb_region: Range<u64>,
+    /// End of VRAM.
+    #[expect(dead_code)]
+    pub(crate) total_fb_end: u64,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
@@ -207,9 +210,12 @@ fn read(
         msg: &Self::Message,
         _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
     ) -> Result<Self, Self::InitError> {
+        let total_fb_end = msg.total_fb_end().ok_or(ENODEV)?;
+
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
             usable_fb_region: msg.first_usable_fb_region().ok_or(ENODEV)?,
+            total_fb_end,
         })
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 3d5180e6b1e0..f2d59aa3131f 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -164,6 +164,13 @@ pub(crate) fn first_usable_fb_region(&self) -> Option<Range<u64>> {
             }
         })
     }
+
+    /// Compute the end of physical VRAM from all FB regions.
+    pub(crate) fn total_fb_end(&self) -> Option<u64> {
+        self.fb_regions()
+            .filter_map(|reg| reg.limit.checked_add(1))
+            .max()
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 04/20] gpu: nova-core: mm: Add support to use PRAMIN windows to write to VRAM
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

PRAMIN apertures are a crucial mechanism to direct read/write to VRAM.
Add support for the same.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm.rs        |   5 +
 drivers/gpu/nova-core/mm/pramin.rs | 280 +++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  10 ++
 4 files changed, 296 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs

diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
new file mode 100644
index 000000000000..7a5dd4220c67
--- /dev/null
+++ b/drivers/gpu/nova-core/mm.rs
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Memory management subsystems for nova-core.
+
+pub(crate) mod pramin;
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
new file mode 100644
index 000000000000..91a0957b2f92
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -0,0 +1,280 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Direct VRAM access through the PRAMIN aperture.
+//!
+//! PRAMIN provides a 1MB sliding window into VRAM through BAR0, allowing the CPU to access
+//! video memory directly. Access is managed through a two-level API:
+//!
+//! - [`Pramin`]: The parent object that owns the BAR0 reference and synchronization lock.
+//! - [`PraminWindow`]: A guard object that holds exclusive PRAMIN access for its lifetime.
+//!
+//! The PRAMIN aperture is a 1MB region at a fixed offset from BAR0. The window base is
+//! controlled by an architecture-specific register and is 64KB aligned.
+//!
+//! # Examples
+//!
+//! ## Basic read/write
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin;
+//! use kernel::devres::Devres;
+//! use kernel::prelude::*;
+//! use kernel::sync::Arc;
+//!
+//! fn example(devres_bar: Arc<Devres<Bar0>>, vram_region: core::ops::Range<u64>) -> Result<()> {
+//!     let pramin = Arc::pin_init(pramin::Pramin::new(devres_bar, vram_region)?, GFP_KERNEL)?;
+//!     let mut window = pramin.get_window()?;
+//!
+//!     // Write and read back.
+//!     window.try_write32(0x100, 0xDEADBEEF)?;
+//!     let val = window.try_read32(0x100)?;
+//!     assert_eq!(val, 0xDEADBEEF);
+//!
+//!     Ok(())
+//! }
+//! ```
+//!
+//! ## Auto-repositioning across VRAM regions
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin;
+//! use kernel::devres::Devres;
+//! use kernel::prelude::*;
+//! use kernel::sync::Arc;
+//!
+//! fn example(devres_bar: Arc<Devres<Bar0>>, vram_region: core::ops::Range<u64>) -> Result<()> {
+//!     let pramin = Arc::pin_init(pramin::Pramin::new(devres_bar, vram_region)?, GFP_KERNEL)?;
+//!     let mut window = pramin.get_window()?;
+//!
+//!     // Access first 1MB region.
+//!     window.try_write32(0x100, 0x11111111)?;
+//!
+//!     // Access at 2MB - window auto-repositions.
+//!     window.try_write32(0x200000, 0x22222222)?;
+//!
+//!     // Back to first region - window repositions again.
+//!     let val = window.try_read32(0x100)?;
+//!     assert_eq!(val, 0x11111111);
+//!
+//!     Ok(())
+//! }
+//! ```
+
+#![expect(unused)]
+
+use core::ops::Range;
+
+use crate::{
+    bounded_enum,
+    driver::Bar0,
+    num::IntoSafeCast,
+    regs, //
+};
+
+use kernel::{
+    devres::Devres,
+    io::Io,
+    new_mutex,
+    num::Bounded,
+    prelude::*,
+    revocable::RevocableGuard,
+    sizes::{
+        SZ_1M,
+        SZ_64K, //
+    },
+    sync::{
+        lock::mutex::MutexGuard,
+        Arc,
+        Mutex, //
+    },
+};
+
+bounded_enum! {
+    /// Target memory type for the BAR0 window register.
+    ///
+    /// Only VRAM is supported; Hopper+ GPUs do not support other targets.
+    #[derive(Debug)]
+    pub(crate) enum Bar0WindowTarget with TryFrom<Bounded<u32, 2>> {
+        /// Video RAM (GPU framebuffer memory).
+        Vram = 0,
+    }
+}
+
+/// PRAMIN aperture base offset in BAR0.
+const PRAMIN_BASE: usize = 0x700000;
+
+/// PRAMIN aperture size (1MB).
+const PRAMIN_SIZE: usize = SZ_1M;
+
+/// Generate a PRAMIN read accessor.
+macro_rules! define_pramin_read {
+    ($name:ident, $ty:ty) => {
+        #[doc = concat!("Read a `", stringify!($ty), "` from VRAM at the given offset.")]
+        pub(crate) fn $name(&mut self, vram_offset: usize) -> Result<$ty> {
+            let (bar_offset, new_base) =
+                self.compute_window(vram_offset, ::core::mem::size_of::<$ty>())?;
+
+            if let Some(base) = new_base {
+                Self::write_window_base(&self.bar, base)?;
+                *self.state = base;
+            }
+            self.bar.$name(bar_offset)
+        }
+    };
+}
+
+/// Generate a PRAMIN write accessor.
+macro_rules! define_pramin_write {
+    ($name:ident, $ty:ty) => {
+        #[doc = concat!("Write a `", stringify!($ty), "` to VRAM at the given offset.")]
+        pub(crate) fn $name(&mut self, vram_offset: usize, value: $ty) -> Result {
+            let (bar_offset, new_base) =
+                self.compute_window(vram_offset, ::core::mem::size_of::<$ty>())?;
+
+            if let Some(base) = new_base {
+                Self::write_window_base(&self.bar, base)?;
+                *self.state = base;
+            }
+            self.bar.$name(value, bar_offset)
+        }
+    };
+}
+
+/// PRAMIN aperture manager.
+///
+/// Call [`Pramin::get_window()`] to acquire exclusive PRAMIN access.
+#[pin_data]
+pub(crate) struct Pramin {
+    bar: Arc<Devres<Bar0>>,
+    /// Valid VRAM region. Accesses outside this range are rejected.
+    vram_region: Range<u64>,
+    /// PRAMIN aperture state, protected by a mutex.
+    ///
+    /// # Invariants
+    ///
+    /// This lock is acquired during the DMA fence signaling critical path.
+    /// It must NEVER be held across any reclaimable CPU memory / allocations
+    /// (`GFP_KERNEL`), because the memory reclaim path can call
+    /// `dma_fence_wait()`, which would deadlock with this lock held.
+    #[pin]
+    state: Mutex<u64>,
+}
+
+impl Pramin {
+    /// Create a pin-initializer for PRAMIN.
+    ///
+    /// `vram_region` specifies the valid VRAM address range.
+    pub(crate) fn new(
+        bar: Arc<Devres<Bar0>>,
+        vram_region: Range<u64>,
+    ) -> Result<impl PinInit<Self>> {
+        let bar_access = bar.try_access().ok_or(ENODEV)?;
+        let current_base = Self::read_window_base(&bar_access);
+
+        Ok(pin_init!(Self {
+            bar,
+            vram_region,
+            state <- new_mutex!(current_base, "pramin_state"),
+        }))
+    }
+
+    /// Acquire exclusive PRAMIN access.
+    ///
+    /// Returns a [`PraminWindow`] guard that provides VRAM read/write accessors.
+    /// The [`PraminWindow`] is exclusive and only one can exist at a time.
+    pub(crate) fn get_window(&self) -> Result<PraminWindow<'_>> {
+        let bar = self.bar.try_access().ok_or(ENODEV)?;
+        let state = self.state.lock();
+        Ok(PraminWindow {
+            bar,
+            vram_region: self.vram_region.clone(),
+            state,
+        })
+    }
+
+    /// Read the current window base from the BAR0_WINDOW register.
+    fn read_window_base(bar: &Bar0) -> u64 {
+        let reg = bar.read(regs::NV_PBUS_BAR0_WINDOW);
+
+        // TODO: Convert to Bounded<u64, 40> when available.
+        u64::from(reg.window_base()) << 16
+    }
+}
+
+/// PRAMIN window guard for direct VRAM access.
+///
+/// This guard holds exclusive access to the PRAMIN aperture. The window auto-repositions
+/// when accessing VRAM offsets outside the current 1MB range.
+///
+/// Only one [`PraminWindow`] can exist at a time per [`Pramin`] instance (enforced by the
+/// internal `MutexGuard`).
+pub(crate) struct PraminWindow<'a> {
+    bar: RevocableGuard<'a, Bar0>,
+    vram_region: Range<u64>,
+    state: MutexGuard<'a, u64>,
+}
+
+impl PraminWindow<'_> {
+    /// Write a new window base to the BAR0_WINDOW register.
+    fn write_window_base(bar: &Bar0, base: u64) -> Result {
+        // CAST: After >> 16, a VRAM address fits in u32.
+        let window_base = (base >> 16) as u32;
+        bar.write_reg(
+            regs::NV_PBUS_BAR0_WINDOW::zeroed()
+                .with_target(Bar0WindowTarget::Vram)
+                .try_with_window_base(window_base)?,
+        );
+        Ok(())
+    }
+
+    /// Compute window parameters for a VRAM access.
+    ///
+    /// Returns (`bar_offset`, `new_base`) where:
+    /// - `bar_offset`: The BAR0 offset to use for the access.
+    /// - `new_base`: `Some(base)` if window needs repositioning, `None` otherwise.
+    fn compute_window(
+        &self,
+        vram_offset: usize,
+        access_size: usize,
+    ) -> Result<(usize, Option<u64>)> {
+        // Validate VRAM offset is within the valid VRAM region.
+        let vram_addr = vram_offset as u64;
+        let end_addr = vram_addr.checked_add(access_size as u64).ok_or(EINVAL)?;
+        if vram_addr < self.vram_region.start || end_addr > self.vram_region.end {
+            return Err(EINVAL);
+        }
+
+        // Check if access fits within the current 1MB window.
+        let current_base = *self.state;
+        if vram_addr >= current_base {
+            let offset_in_window: usize = (vram_addr - current_base).into_safe_cast();
+            if offset_in_window + access_size <= PRAMIN_SIZE {
+                return Ok((PRAMIN_BASE + offset_in_window, None));
+            }
+        }
+
+        // Access doesn't fit in current window - reposition.
+        // Hardware requires 64KB alignment for the window base register.
+        let needed_base = vram_addr & !(SZ_64K as u64 - 1);
+        let offset_in_window: usize = (vram_addr - needed_base).into_safe_cast();
+
+        // Verify access fits in the 1MB window from the new base.
+        if offset_in_window + access_size > PRAMIN_SIZE {
+            return Err(EINVAL);
+        }
+
+        Ok((PRAMIN_BASE + offset_in_window, Some(needed_base)))
+    }
+
+    define_pramin_read!(try_read8, u8);
+    define_pramin_read!(try_read16, u16);
+    define_pramin_read!(try_read32, u32);
+    define_pramin_read!(try_read64, u64);
+
+    define_pramin_write!(try_write8, u8);
+    define_pramin_write!(try_write16, u16);
+    define_pramin_write!(try_write32, u32);
+    define_pramin_write!(try_write64, u64);
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 3a0c45481a92..d087354f03b9 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -17,6 +17,7 @@
 mod gfw;
 mod gpu;
 mod gsp;
+mod mm;
 #[macro_use]
 mod num;
 mod regs;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 2f171a4ff9ba..a3ca02345e20 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -30,6 +30,7 @@
         Architecture,
         Chipset, //
     },
+    mm::pramin::Bar0WindowTarget,
     num::FromSafeCast,
 };
 
@@ -115,6 +116,15 @@ fn fmt(&self, f: &mut kernel::fmt::Formatter<'_>) -> kernel::fmt::Result {
     }
 }
 
+register! {
+    /// BAR0 window control for PRAMIN access.
+    pub(crate) NV_PBUS_BAR0_WINDOW(u32) @ 0x00001700 {
+        25:24   target ?=> Bar0WindowTarget;
+        /// Window base address (bits 39:16 of FB addr).
+        23:0    window_base;
+    }
+}
+
 // PFB
 
 register! {
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 05/20] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add documentation for the PRAMIN aperture mechanism used by nova-core
for direct VRAM access.

Nova only uses TARGET=VRAM for VRAM access. The SYS_MEM target values
are documented for completeness but not used by the driver.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/nova/core/pramin.rst | 123 +++++++++++++++++++++++++
 Documentation/gpu/nova/index.rst       |   1 +
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/pramin.rst

diff --git a/Documentation/gpu/nova/core/pramin.rst b/Documentation/gpu/nova/core/pramin.rst
new file mode 100644
index 000000000000..3e8adbabeb74
--- /dev/null
+++ b/Documentation/gpu/nova/core/pramin.rst
@@ -0,0 +1,123 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+PRAMIN aperture mechanism
+=========================
+
+.. note::
+   The following description is approximate and current as of the Ampere family.
+   It may change for future generations and is intended to assist in understanding
+   the driver code.
+
+Introduction
+============
+
+PRAMIN is a hardware aperture mechanism that provides CPU access to GPU Video RAM (VRAM) before
+the GPU's Memory Management Unit (MMU) and page tables are initialized. This 1MB sliding window,
+located at a fixed offset within BAR0, is essential for setting up page tables and other critical
+GPU data structures without relying on the GPU's MMU.
+
+Architecture Overview
+=====================
+
+The PRAMIN aperture mechanism is logically implemented by the GPU's PBUS (PCIe Bus Controller Unit)
+and provides a CPU-accessible window into VRAM through the PCIe interface::
+
+    +-----------------+    PCIe     +------------------------------+
+    |      CPU        |<----------->|           GPU                |
+    +-----------------+             |                              |
+                                    |  +----------------------+    |
+                                    |  |       PBUS           |    |
+                                    |  |  (Bus Controller)    |    |
+                                    |  |                      |    |
+                                    |  |  +--------------+<------------ (window starts at
+                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
+                                    |  |  |   Window     |    |    |
+                                    |  |  |   (1MB)      |    |    |
+                                    |  |  +--------------+    |    |
+                                    |  |         |            |    |
+                                    |  +---------|------------+    |
+                                    |            |                 |
+                                    |            v                 |
+                                    |  +----------------------+<------------ (Program PRAMIN to any
+                                    |  |       VRAM           |    |    64KB-aligned VRAM boundary)
+                                    |  |    (Several GBs)     |    |
+                                    |  |                      |    |
+                                    |  |  FB[0x000000000000]  |    |
+                                    |  |          ...         |    |
+                                    |  |  FB[0x7FFFFFFFFFF]   |    |
+                                    |  +----------------------+    |
+                                    +------------------------------+
+
+PBUS (PCIe Bus Controller) is responsible for, among other things, handling MMIO
+accesses to the BAR registers.
+
+PRAMIN Window Operation
+=======================
+
+The PRAMIN window provides a 1MB sliding aperture that can be repositioned over
+the entire VRAM address space using the ``NV_PBUS_BAR0_WINDOW`` register.
+
+Window Control Mechanism
+-------------------------
+
+::
+
+    NV_PBUS_BAR0_WINDOW Register (0x1700):
+    +-------+--------+--------------------------------------+
+    | 31:26 | 25:24  |               23:0                   |
+    | RSVD  | TARGET |            BASE_ADDR                 |
+    |       |        |        (bits 39:16 of VRAM address)  |
+    +-------+--------+--------------------------------------+
+
+    The 24-bit BASE_ADDR field encodes bits [39:16] of the target VRAM address,
+    providing 40-bit (1TB) address space coverage with 64KB alignment.
+
+    TARGET field (bits 25:24):
+    - 0x0: VRAM (Video Memory)
+    - 0x1: SYS_MEM_COH (Coherent System Memory)
+    - 0x2: SYS_MEM_NONCOH (Non-coherent System Memory)
+    - 0x3: Reserved
+
+.. note::
+   Nova only uses TARGET=VRAM (0x0) for video memory access. The SYS_MEM
+   target values are documented here for hardware completeness but are
+   not used by the driver.
+
+64KB Alignment Requirement
+---------------------------
+
+The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
+by the ``BASE_ADDR`` field representing bits [39:16] of the target address::
+
+    VRAM Address Calculation:
+    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
+    Where:
+    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
+    - pramin_offset: 20-bit offset within the PRAMIN window [0x00000-0xFFFFF]
+
+    Example Window Positioning:
+    +---------------------------------------------------------+
+    |                    VRAM Space                           |
+    |                                                         |
+    |  0x000000000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x0000FFFFF  +-----------------+                       |
+    |                                                         |
+    |       |              ^                                  |
+    |       |              | Window can slide                 |
+    |       v              | to any 64KB-aligned boundary     |
+    |                                                         |
+    |  0x123400000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x1234FFFFF  +-----------------+                       |
+    |                                                         |
+    |                       ...                               |
+    |                                                         |
+    |  0x7FFFF0000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x7FFFFFFFF  +-----------------+                       |
+    +---------------------------------------------------------+
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index e39cb3163581..b8254b1ffe2a 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/devinit
    core/fwsec
    core/falcon
+   core/pramin
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 06/20] gpu: nova-core: mm: Add common memory management types
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add foundational types for GPU memory management. These types are used
throughout the nova memory management subsystem for page table
operations, address translation, and memory allocation.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm.rs | 196 ++++++++++++++++++++++++++++++++++++
 1 file changed, 196 insertions(+)

diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
index 7a5dd4220c67..fa29f525f282 100644
--- a/drivers/gpu/nova-core/mm.rs
+++ b/drivers/gpu/nova-core/mm.rs
@@ -2,4 +2,200 @@
 
 //! Memory management subsystems for nova-core.
 
+#![expect(dead_code)]
+
+/// Implements `From` conversions between [`Pfn`] and `Bounded<u64, N>` for bitfield interop.
+///
+/// Each MMU version module should invoke this for the specific bit widths used by that version's
+/// PTE/PDE bitfield definitions.
+macro_rules! impl_pfn_bounded {
+    ($bits:literal) => {
+        impl From<Bounded<u64, $bits>> for Pfn {
+            fn from(val: Bounded<u64, $bits>) -> Self {
+                Self::new(val.get())
+            }
+        }
+
+        impl From<Pfn> for Bounded<u64, $bits> {
+            fn from(pfn: Pfn) -> Self {
+                Bounded::from_expr(pfn.raw() & ::kernel::bits::genmask_u64(0..=($bits - 1)))
+            }
+        }
+    };
+}
+
 pub(crate) mod pramin;
+
+use kernel::{
+    bitfield,
+    num::Bounded,
+    prelude::*,
+    sizes::SZ_4K, //
+};
+
+use crate::num::u64_as_usize;
+
+/// Page size in bytes (4 KiB).
+pub(crate) const PAGE_SIZE: usize = SZ_4K;
+
+bitfield! {
+    /// Physical VRAM address in GPU video memory.
+    pub(crate) struct VramAddress(u64) {
+        /// Offset within 4KB page.
+        11:0    offset;
+        /// Physical frame number.
+        63:12   frame_number => Pfn;
+    }
+}
+
+impl VramAddress {
+    /// Create a new VRAM address from a raw value.
+    pub(crate) const fn new(addr: u64) -> Self {
+        Self::from_raw(addr)
+    }
+
+    /// Get the raw address value as `usize` (useful for MMIO offsets).
+    pub(crate) const fn raw(&self) -> usize {
+        u64_as_usize(self.into_raw())
+    }
+
+    /// Get the raw address value as `u64`.
+    pub(crate) const fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+impl PartialOrd for VramAddress {
+    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl Ord for VramAddress {
+    fn cmp(&self, other: &Self) -> core::cmp::Ordering {
+        self.into_raw().cmp(&other.into_raw())
+    }
+}
+
+impl From<Pfn> for VramAddress {
+    fn from(pfn: Pfn) -> Self {
+        Self::zeroed().with_frame_number(pfn)
+    }
+}
+
+bitfield! {
+    /// Virtual address in GPU address space.
+    pub(crate) struct VirtualAddress(u64) {
+        /// Offset within 4KB page.
+        11:0    offset;
+        /// Virtual frame number.
+        63:12   frame_number => Vfn;
+    }
+}
+
+impl VirtualAddress {
+    /// Create a new virtual address from a raw value.
+    #[expect(dead_code)]
+    pub(crate) const fn new(addr: u64) -> Self {
+        Self::from_raw(addr)
+    }
+
+    /// Get the raw address value as `u64`.
+    pub(crate) const fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+impl From<Vfn> for VirtualAddress {
+    fn from(vfn: Vfn) -> Self {
+        Self::zeroed().with_frame_number(vfn)
+    }
+}
+
+/// Physical Frame Number.
+///
+/// Represents a physical page in VRAM.
+#[repr(transparent)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) struct Pfn(u64);
+
+impl Pfn {
+    /// Create a new PFN from a frame number.
+    pub(crate) const fn new(frame_number: u64) -> Self {
+        Self(frame_number)
+    }
+
+    /// Get the raw frame number.
+    pub(crate) const fn raw(self) -> u64 {
+        self.0
+    }
+}
+
+impl From<VramAddress> for Pfn {
+    fn from(addr: VramAddress) -> Self {
+        addr.frame_number()
+    }
+}
+
+impl From<u64> for Pfn {
+    fn from(val: u64) -> Self {
+        Self(val)
+    }
+}
+
+impl From<Pfn> for u64 {
+    fn from(pfn: Pfn) -> Self {
+        pfn.0
+    }
+}
+
+impl_pfn_bounded!(52);
+
+/// Virtual Frame Number.
+///
+/// Represents a virtual page in GPU address space.
+#[repr(transparent)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) struct Vfn(u64);
+
+impl Vfn {
+    /// Create a new VFN from a frame number.
+    pub(crate) const fn new(frame_number: u64) -> Self {
+        Self(frame_number)
+    }
+
+    /// Get the raw frame number.
+    pub(crate) const fn raw(self) -> u64 {
+        self.0
+    }
+}
+
+impl From<VirtualAddress> for Vfn {
+    fn from(addr: VirtualAddress) -> Self {
+        addr.frame_number()
+    }
+}
+
+impl From<u64> for Vfn {
+    fn from(val: u64) -> Self {
+        Self(val)
+    }
+}
+
+impl From<Vfn> for u64 {
+    fn from(vfn: Vfn) -> Self {
+        vfn.0
+    }
+}
+
+impl From<Bounded<u64, 52>> for Vfn {
+    fn from(val: Bounded<u64, 52>) -> Self {
+        Self(val.get())
+    }
+}
+
+impl From<Vfn> for Bounded<u64, 52> {
+    fn from(vfn: Vfn) -> Self {
+        Bounded::from_expr(vfn.0 & ::kernel::bits::genmask_u64(0..=51))
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 07/20] gpu: nova-core: mm: Add TLB flush support
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add TLB (Translation Lookaside Buffer) flush support for GPU MMU.

After modifying page table entries, the GPU's TLB must be invalidated
to ensure the new mappings take effect. The Tlb struct provides flush
functionality through BAR0 registers.

The flush operation writes the page directory base address and triggers
an invalidation, polling for completion with a 2 second timeout matching
the Nouveau driver.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm.rs     |  1 +
 drivers/gpu/nova-core/mm/tlb.rs | 97 +++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs   | 44 +++++++++++++++
 3 files changed, 142 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/tlb.rs

diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
index fa29f525f282..314d660d898b 100644
--- a/drivers/gpu/nova-core/mm.rs
+++ b/drivers/gpu/nova-core/mm.rs
@@ -25,6 +25,7 @@ fn from(pfn: Pfn) -> Self {
 }
 
 pub(crate) mod pramin;
+pub(super) mod tlb;
 
 use kernel::{
     bitfield,
diff --git a/drivers/gpu/nova-core/mm/tlb.rs b/drivers/gpu/nova-core/mm/tlb.rs
new file mode 100644
index 000000000000..6d384f447635
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/tlb.rs
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! TLB (Translation Lookaside Buffer) flush support for GPU MMU.
+//!
+//! After modifying page table entries, the GPU's TLB must be flushed to
+//! ensure the new mappings take effect. This module provides TLB flush
+//! functionality for virtual memory managers.
+//!
+//! # Examples
+//!
+//! ```ignore
+//! use crate::mm::tlb::Tlb;
+//!
+//! fn page_table_update(tlb: &Tlb, pdb_addr: VramAddress) -> Result<()> {
+//!     // ... modify page tables ...
+//!
+//!     // Flush TLB to make changes visible (polls for completion).
+//!     tlb.flush(pdb_addr)?;
+//!
+//!     Ok(())
+//! }
+//! ```
+
+use kernel::{
+    devres::Devres,
+    io::poll::read_poll_timeout,
+    io::Io,
+    new_mutex,
+    prelude::*,
+    sync::{
+        Arc,
+        Mutex, //
+    },
+    time::Delta, //
+};
+
+use crate::{
+    driver::Bar0,
+    mm::VramAddress,
+    regs, //
+};
+
+/// TLB manager for GPU translation buffer operations.
+#[pin_data]
+pub(crate) struct Tlb {
+    bar: Arc<Devres<Bar0>>,
+    /// TLB flush serialization lock: This lock is designed to be acquired during
+    /// the DMA fence signalling critical path. It should NEVER be held across any
+    /// reclaimable CPU memory allocations because the memory reclaim path can
+    /// call `dma_fence_wait()` (when implemented), which would deadlock if lock held.
+    #[pin]
+    lock: Mutex<()>,
+}
+
+impl Tlb {
+    /// Create a new TLB manager.
+    pub(super) fn new(bar: Arc<Devres<Bar0>>) -> impl PinInit<Self> {
+        pin_init!(Self {
+            bar,
+            lock <- new_mutex!((), "tlb_flush"),
+        })
+    }
+
+    /// Flush the GPU TLB for a specific page directory base.
+    ///
+    /// This invalidates all TLB entries associated with the given PDB address.
+    /// Must be called after modifying page table entries to ensure the GPU sees
+    /// the updated mappings.
+    pub(super) fn flush(&self, pdb_addr: VramAddress) -> Result {
+        let _guard = self.lock.lock();
+
+        let bar = self.bar.try_access().ok_or(ENODEV)?;
+
+        // Write PDB address.
+        bar.write_reg(regs::NV_TLB_FLUSH_PDB_LO::from_pdb_addr(pdb_addr.raw_u64()));
+        bar.write_reg(regs::NV_TLB_FLUSH_PDB_HI::from_pdb_addr(pdb_addr.raw_u64()));
+
+        // Trigger flush: invalidate all pages, require global acknowledgment
+        // from all engines before completion.
+        bar.write_reg(
+            regs::NV_TLB_FLUSH_CTRL::zeroed()
+                .with_page_all(true)
+                .with_ack_globally(true)
+                .with_enable(true),
+        );
+
+        // Poll for completion - enable bit clears when flush is done.
+        read_poll_timeout(
+            || Ok(bar.read(regs::NV_TLB_FLUSH_CTRL)),
+            |ctrl: &regs::NV_TLB_FLUSH_CTRL| !ctrl.enable(),
+            Delta::ZERO,
+            Delta::from_secs(2),
+        )?;
+
+        Ok(())
+    }
+}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index a3ca02345e20..640025041618 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -548,3 +548,47 @@ pub(crate) mod ga100 {
         }
     }
 }
+
+// MMU TLB
+
+register! {
+    /// TLB flush register: PDB address bits [39:8].
+    pub(crate) NV_TLB_FLUSH_PDB_LO(u32) @ 0x00b830a0 {
+        /// PDB address bits [39:8].
+        31:0    pdb_lo => u32;
+    }
+
+    /// TLB flush register: PDB address bits [47:40].
+    pub(crate) NV_TLB_FLUSH_PDB_HI(u32) @ 0x00b830a4 {
+        /// PDB address bits [47:40].
+        7:0     pdb_hi => u8;
+    }
+
+    /// TLB flush control register.
+    pub(crate) NV_TLB_FLUSH_CTRL(u32) @ 0x00b830b0 {
+        /// Invalidate all pages.
+        0:0     page_all => bool;
+        /// Require global acknowledgment of the invalidation.
+        7:7     ack_globally => bool;
+        /// Enable/trigger flush (clears when flush completes).
+        31:31   enable => bool;
+    }
+}
+
+impl NV_TLB_FLUSH_PDB_LO {
+    /// Create a register value from a PDB address.
+    ///
+    /// Extracts bits [39:8] of the address and shifts it right by 8 bits.
+    pub(crate) fn from_pdb_addr(addr: u64) -> Self {
+        Self::zeroed().with_pdb_lo(((addr >> 8) & 0xFFFF_FFFF) as u32)
+    }
+}
+
+impl NV_TLB_FLUSH_PDB_HI {
+    /// Create a register value from a PDB address.
+    ///
+    /// Extracts bits [47:40] of the address and shifts it right by 40 bits.
+    pub(crate) fn from_pdb_addr(addr: u64) -> Self {
+        Self::zeroed().with_pdb_hi(((addr >> 40) & 0xFF) as u8)
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 08/20] gpu: nova-core: mm: Add GpuMm centralized memory manager
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Introduce GpuMm as the centralized GPU memory manager that owns:
- Buddy allocator for VRAM allocation.
- PRAMIN window for direct VRAM access.
- TLB manager for translation buffer operations.

This provides clean ownership model where GpuMm provides accessor
methods for its components that can be used for memory management
operations.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig         |  1 +
 drivers/gpu/nova-core/gpu.rs          | 34 ++++++++++++-
 drivers/gpu/nova-core/gsp/commands.rs |  2 -
 drivers/gpu/nova-core/mm.rs           | 69 ++++++++++++++++++++++++++-
 4 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index a4f2380654e2..6513007bf66f 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -4,6 +4,7 @@ config NOVA_CORE
 	depends on PCI
 	depends on RUST
 	select AUXILIARY_BUS
+	select GPU_BUDDY
 	select RUST_FW_LOADER_ABSTRACTIONS
 	default n
 	help
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index b4da4a1ae156..c49fa9c380b8 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -4,10 +4,13 @@
     device,
     devres::Devres,
     fmt,
+    gpu::buddy::GpuBuddyParams,
     io::Io,
     num::Bounded,
     pci,
     prelude::*,
+    ptr::Alignment,
+    sizes::SZ_4K,
     sync::Arc, //
 };
 
@@ -25,6 +28,7 @@
         commands::GetGspStaticInfoReply,
         Gsp, //
     },
+    mm::GpuMm,
     regs,
 };
 
@@ -238,6 +242,9 @@ pub(crate) struct Gpu {
     gsp_falcon: Falcon<GspFalcon>,
     /// SEC2 falcon instance, used for GSP boot up and cleanup.
     sec2_falcon: Falcon<Sec2Falcon>,
+    /// GPU memory manager owning memory management resources.
+    #[pin]
+    mm: GpuMm,
     /// GSP runtime data. Temporarily an empty placeholder.
     #[pin]
     gsp: Gsp,
@@ -274,7 +281,32 @@ pub(crate) fn new<'a>(
 
             gsp <- Gsp::new(pdev),
 
-            gsp_static_info: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+            gsp_static_info: {
+                let info = gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?;
+
+                dev_info!(
+                    pdev.as_ref(),
+                    "Using FB region: {:#x}..{:#x}\n",
+                    info.usable_fb_region.start,
+                    info.usable_fb_region.end
+                );
+
+                info
+            },
+
+            // Create GPU memory manager owning memory management resources.
+            mm <- {
+                let usable_vram = &gsp_static_info.usable_fb_region;
+
+                // PRAMIN covers all physical VRAM (including GSP-reserved areas
+                // above the usable region, e.g. the BAR1 page directory).
+                let pramin_vram_region = 0..gsp_static_info.total_fb_end;
+                GpuMm::new(devres_bar.clone(), GpuBuddyParams {
+                    base_offset: usable_vram.start,
+                    size: usable_vram.end - usable_vram.start,
+                    chunk_size: Alignment::new::<SZ_4K>(),
+                }, pramin_vram_region)?
+            },
 
             bar: devres_bar,
         })
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index e42a865fd4ac..eeecf81a0ffd 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -194,10 +194,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
     /// Usable FB (VRAM) region for driver memory allocation.
-    #[expect(dead_code)]
     pub(crate) usable_fb_region: Range<u64>,
     /// End of VRAM.
-    #[expect(dead_code)]
     pub(crate) total_fb_end: u64,
 }
 
diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
index 314d660d898b..fa92540bb006 100644
--- a/drivers/gpu/nova-core/mm.rs
+++ b/drivers/gpu/nova-core/mm.rs
@@ -29,12 +29,77 @@ fn from(pfn: Pfn) -> Self {
 
 use kernel::{
     bitfield,
+    devres::Devres,
+    gpu::buddy::{
+        GpuBuddy,
+        GpuBuddyParams, //
+    },
     num::Bounded,
     prelude::*,
-    sizes::SZ_4K, //
+    sizes::SZ_4K,
+    sync::Arc, //
 };
 
-use crate::num::u64_as_usize;
+use pin_init::Zeroable;
+
+use crate::{
+    driver::Bar0,
+    num::u64_as_usize, //
+};
+
+pub(crate) use tlb::Tlb;
+
+/// GPU Memory Manager - owns all core MM components.
+///
+/// Provides centralized ownership of memory management resources:
+/// - [`GpuBuddy`] allocator for VRAM page table allocation.
+/// - [`pramin::Pramin`] for direct VRAM access.
+/// - [`Tlb`] manager for translation buffer flush operations.
+#[pin_data]
+pub(crate) struct GpuMm {
+    buddy: GpuBuddy,
+    #[pin]
+    pramin: pramin::Pramin,
+    #[pin]
+    tlb: Tlb,
+}
+
+impl GpuMm {
+    /// Create a pin-initializer for `GpuMm`.
+    ///
+    /// `pramin_vram_region` is the full physical VRAM range (including GSP-reserved
+    /// areas). PRAMIN window accesses are validated against this range.
+    pub(crate) fn new(
+        bar: Arc<Devres<Bar0>>,
+        buddy_params: GpuBuddyParams,
+        pramin_vram_region: core::ops::Range<u64>,
+    ) -> Result<impl PinInit<Self>> {
+        let buddy = GpuBuddy::new(buddy_params)?;
+        let tlb_init = Tlb::new(bar.clone());
+        let pramin_init = pramin::Pramin::new(bar, pramin_vram_region)?;
+
+        Ok(pin_init!(Self {
+            buddy,
+            pramin <- pramin_init,
+            tlb <- tlb_init,
+        }))
+    }
+
+    /// Access the [`GpuBuddy`] allocator.
+    pub(crate) fn buddy(&self) -> &GpuBuddy {
+        &self.buddy
+    }
+
+    /// Access the [`pramin::Pramin`].
+    pub(crate) fn pramin(&self) -> &pramin::Pramin {
+        &self.pramin
+    }
+
+    /// Access the [`Tlb`] manager.
+    pub(crate) fn tlb(&self) -> &Tlb {
+        &self.tlb
+    }
+}
 
 /// Page size in bytes (4 KiB).
 pub(crate) const PAGE_SIZE: usize = SZ_4K;
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 09/20] gpu: nova-core: mm: Add common types for all page table formats
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add common page table types shared between MMU v2 and v3. These types
are hardware-agnostic and used by both MMU versions.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm.rs           |   1 +
 drivers/gpu/nova-core/mm/pagetable.rs | 157 ++++++++++++++++++++++++++
 2 files changed, 158 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable.rs

diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
index fa92540bb006..af398e94dd16 100644
--- a/drivers/gpu/nova-core/mm.rs
+++ b/drivers/gpu/nova-core/mm.rs
@@ -24,6 +24,7 @@ fn from(pfn: Pfn) -> Self {
     };
 }
 
+pub(super) mod pagetable;
 pub(crate) mod pramin;
 pub(super) mod tlb;
 
diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
new file mode 100644
index 000000000000..637ff43ea83a
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Common page table types shared between MMU v2 and v3.
+//!
+//! This module provides foundational types used by both MMU versions:
+//! - Page table level hierarchy
+//! - Memory aperture types for PDEs and PTEs
+
+#![expect(dead_code)]
+
+use kernel::num::Bounded;
+
+use crate::gpu::Architecture;
+
+/// Extracts the page table index at a given level from a virtual address.
+pub(super) trait VaLevelIndex {
+    /// Return the page table index at `level` for this virtual address.
+    fn level_index(&self, level: u64) -> u64;
+}
+
+/// MMU version enumeration.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum MmuVersion {
+    /// MMU v2 for Turing/Ampere/Ada.
+    V2,
+    /// MMU v3 for Hopper and later.
+    V3,
+}
+
+impl From<Architecture> for MmuVersion {
+    fn from(arch: Architecture) -> Self {
+        match arch {
+            Architecture::Turing | Architecture::Ampere | Architecture::Ada => Self::V2,
+            // In the future, uncomment the following to support V3.
+            // _ => Self::V3,
+        }
+    }
+}
+
+/// Page Table Level hierarchy for MMU v2/v3.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(super) enum PageTableLevel {
+    /// Level 0 - Page Directory Base (root).
+    Pdb,
+    /// Level 1 - Intermediate page directory.
+    L1,
+    /// Level 2 - Intermediate page directory.
+    L2,
+    /// Level 3 - Intermediate page directory or dual PDE (version-dependent).
+    L3,
+    /// Level 4 - PTE level for v2, intermediate page directory for v3.
+    L4,
+    /// Level 5 - PTE level used for MMU v3 only.
+    L5,
+}
+
+impl PageTableLevel {
+    /// Number of entries per page table (512 for 4KB pages).
+    pub(super) const ENTRIES_PER_TABLE: usize = 512;
+
+    /// Get the next level in the hierarchy.
+    pub(super) const fn next(&self) -> Option<PageTableLevel> {
+        match self {
+            Self::Pdb => Some(Self::L1),
+            Self::L1 => Some(Self::L2),
+            Self::L2 => Some(Self::L3),
+            Self::L3 => Some(Self::L4),
+            Self::L4 => Some(Self::L5),
+            Self::L5 => None,
+        }
+    }
+
+    /// Convert level to index.
+    pub(super) const fn as_index(&self) -> u64 {
+        match self {
+            Self::Pdb => 0,
+            Self::L1 => 1,
+            Self::L2 => 2,
+            Self::L3 => 3,
+            Self::L4 => 4,
+            Self::L5 => 5,
+        }
+    }
+}
+
+/// Memory aperture for Page Table Entries (`PTE`s).
+///
+/// Determines which memory region the `PTE` points to.
+#[repr(u8)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(super) enum AperturePte {
+    /// Local video memory (VRAM).
+    #[default]
+    VideoMemory = 0,
+    /// Peer GPU's video memory.
+    PeerMemory = 1,
+    /// System memory with cache coherence.
+    SystemCoherent = 2,
+    /// System memory without cache coherence.
+    SystemNonCoherent = 3,
+}
+
+// TODO[FPRI]: Replace with `#[derive(FromPrimitive)]` when available.
+impl From<Bounded<u64, 2>> for AperturePte {
+    fn from(val: Bounded<u64, 2>) -> Self {
+        match *val {
+            0 => Self::VideoMemory,
+            1 => Self::PeerMemory,
+            2 => Self::SystemCoherent,
+            3 => Self::SystemNonCoherent,
+            _ => Self::VideoMemory,
+        }
+    }
+}
+
+// TODO[FPRI]: Replace with `#[derive(ToPrimitive)]` when available.
+impl From<AperturePte> for Bounded<u64, 2> {
+    fn from(val: AperturePte) -> Self {
+        Bounded::from_expr(val as u64 & 0x3)
+    }
+}
+
+/// Memory aperture for Page Directory Entries (`PDE`s).
+///
+/// Note: For `PDE`s, `Invalid` (0) means the entry is not valid.
+#[repr(u8)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(super) enum AperturePde {
+    /// Invalid/unused entry.
+    #[default]
+    Invalid = 0,
+    /// Page table is in video memory.
+    VideoMemory = 1,
+    /// Page table is in system memory with coherence.
+    SystemCoherent = 2,
+    /// Page table is in system memory without coherence.
+    SystemNonCoherent = 3,
+}
+
+// TODO[FPRI]: Replace with `#[derive(FromPrimitive)]` when available.
+impl From<Bounded<u64, 2>> for AperturePde {
+    fn from(val: Bounded<u64, 2>) -> Self {
+        match *val {
+            1 => Self::VideoMemory,
+            2 => Self::SystemCoherent,
+            3 => Self::SystemNonCoherent,
+            _ => Self::Invalid,
+        }
+    }
+}
+
+// TODO[FPRI]: Replace with `#[derive(ToPrimitive)]` when available.
+impl From<AperturePde> for Bounded<u64, 2> {
+    fn from(val: AperturePde) -> Self {
+        Bounded::from_expr(val as u64 & 0x3)
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 10/20] gpu: nova-core: mm: Add MMU v2 page table types
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add page table entry and directory structures for MMU version 2
used by Turing/Ampere/Ada GPUs.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable.rs      |   2 +
 drivers/gpu/nova-core/mm/pagetable/ver2.rs | 271 +++++++++++++++++++++
 2 files changed, 273 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver2.rs

diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
index 637ff43ea83a..f6b184c9b8c8 100644
--- a/drivers/gpu/nova-core/mm/pagetable.rs
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -8,6 +8,8 @@
 
 #![expect(dead_code)]
 
+pub(super) mod ver2;
+
 use kernel::num::Bounded;
 
 use crate::gpu::Architecture;
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver2.rs b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
new file mode 100644
index 000000000000..8086f1e5abd8
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! MMU v2 page table types for Turing and Ampere GPUs.
+//!
+//! This module defines MMU version 2 specific types (Turing, Ampere and Ada GPUs).
+//!
+//! Bit field layouts derived from the NVIDIA OpenRM documentation:
+//! `open-gpu-kernel-modules/src/common/inc/swref/published/turing/tu102/dev_mmu.h`
+
+#![expect(dead_code)]
+
+use kernel::bitfield;
+use kernel::num::Bounded;
+use pin_init::Zeroable;
+
+use super::{
+    AperturePde,
+    AperturePte,
+    PageTableLevel,
+    VaLevelIndex, //
+};
+use crate::mm::{
+    Pfn,
+    VirtualAddress,
+    VramAddress, //
+};
+
+// Bounded to version 2 Pfn bitfield conversions:
+// 25 bits for video memory frame numbers (bits 32:8).
+impl_pfn_bounded!(25);
+// 46 bits for system memory frame numbers (bits 53:8).
+impl_pfn_bounded!(46);
+
+bitfield! {
+    /// MMU v2 49-bit virtual address layout.
+    pub(super) struct VirtualAddressV2(u64) {
+        /// Page offset [11:0].
+        11:0    offset;
+        /// PT index [20:12].
+        20:12   pt_idx;
+        /// PDE0 index [28:21].
+        28:21   pde0_idx;
+        /// PDE1 index [37:29].
+        37:29   pde1_idx;
+        /// PDE2 index [46:38].
+        46:38   pde2_idx;
+        /// PDE3 index [48:47].
+        48:47   pde3_idx;
+    }
+}
+
+impl VirtualAddressV2 {
+    /// Create a [`VirtualAddressV2`] from a [`VirtualAddress`].
+    pub(super) fn new(va: VirtualAddress) -> Self {
+        Self::from_raw(va.raw_u64())
+    }
+}
+
+impl VaLevelIndex for VirtualAddressV2 {
+    fn level_index(&self, level: u64) -> u64 {
+        match level {
+            0 => self.pde3_idx(),
+            1 => self.pde2_idx(),
+            2 => self.pde1_idx(),
+            3 => self.pde0_idx(),
+            4 => self.pt_idx(),
+            _ => 0,
+        }
+    }
+}
+
+/// `PDE` levels for MMU v2 (5-level hierarchy: `PDB` -> `L1` -> `L2` -> `L3` -> `L4`).
+pub(super) const PDE_LEVELS: &[PageTableLevel] = &[
+    PageTableLevel::Pdb,
+    PageTableLevel::L1,
+    PageTableLevel::L2,
+    PageTableLevel::L3,
+];
+
+/// `PTE` level for MMU v2.
+pub(super) const PTE_LEVEL: PageTableLevel = PageTableLevel::L4;
+
+/// Dual `PDE` level for MMU v2 (128-bit entries).
+pub(super) const DUAL_PDE_LEVEL: PageTableLevel = PageTableLevel::L3;
+
+// Page Table Entry (PTE) for MMU v2 - 64-bit entry at level 4.
+bitfield! {
+    /// Page Table Entry for MMU v2.
+    pub(in crate::mm) struct Pte(u64) {
+        /// Entry is valid.
+        0:0     valid;
+        /// Memory aperture type.
+        2:1     aperture => AperturePte;
+        /// Volatile (bypass L2 cache).
+        3:3     volatile;
+        /// Encryption enabled (Confidential Computing).
+        4:4     encrypted;
+        /// Privileged access only.
+        5:5     privilege;
+        /// Write protection.
+        6:6     read_only;
+        /// Atomic operations disabled.
+        7:7     atomic_disable;
+        /// Frame number for system memory.
+        53:8    frame_number_sys => Pfn;
+        /// Frame number for video memory.
+        32:8    frame_number_vid => Pfn;
+        /// Peer GPU ID for peer memory (0-7).
+        35:33   peer_id;
+        /// Compression tag line bits.
+        53:36   comptagline;
+        /// Surface kind/format.
+        63:56   kind;
+    }
+}
+
+impl Pte {
+    /// Create a `PTE` from a `u64` value.
+    pub(super) fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    /// Create a valid `PTE` for video memory.
+    pub(super) fn new_vram(pfn: Pfn, writable: bool) -> Self {
+        Self::zeroed()
+            .with_valid(true)
+            .with_aperture(AperturePte::VideoMemory)
+            .with_frame_number_vid(pfn)
+            .with_read_only(!writable)
+    }
+
+    /// Create an invalid `PTE`.
+    pub(super) fn invalid() -> Self {
+        Self::zeroed()
+    }
+
+    /// Get the frame number based on aperture type.
+    pub(super) fn frame_number(&self) -> Pfn {
+        match self.aperture() {
+            AperturePte::VideoMemory => self.frame_number_vid(),
+            _ => self.frame_number_sys(),
+        }
+    }
+
+    /// Get the raw `u64` value.
+    pub(super) fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+// Page Directory Entry (PDE) for MMU v2 - 64-bit entry at levels 0-2.
+bitfield! {
+    /// Page Directory Entry for MMU v2.
+    pub(in crate::mm) struct Pde(u64) {
+        /// Valid bit (inverted logic).
+        0:0     valid_inverted;
+        /// Memory aperture type.
+        2:1     aperture => AperturePde;
+        /// Volatile (bypass L2 cache).
+        3:3     volatile;
+        /// Disable Address Translation Services.
+        5:5     no_ats;
+        /// Table frame number for system memory.
+        53:8    table_frame_sys => Pfn;
+        /// Table frame number for video memory.
+        32:8    table_frame_vid => Pfn;
+        /// Peer GPU ID (0-7).
+        35:33   peer_id;
+    }
+}
+
+impl Pde {
+    /// Create a `PDE` from a `u64` value.
+    pub(super) fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    /// Create a valid `PDE` pointing to a page table in video memory.
+    pub(super) fn new_vram(table_pfn: Pfn) -> Self {
+        Self::zeroed()
+            .with_valid_inverted(false) // 0 = valid
+            .with_aperture(AperturePde::VideoMemory)
+            .with_table_frame_vid(table_pfn)
+    }
+
+    /// Create an invalid `PDE`.
+    pub(super) fn invalid() -> Self {
+        Self::zeroed()
+            .with_valid_inverted(true)
+            .with_aperture(AperturePde::Invalid)
+    }
+
+    /// Check if this `PDE` is valid.
+    pub(super) fn is_valid(&self) -> bool {
+        !self.valid_inverted().into_bool() && self.aperture() != AperturePde::Invalid
+    }
+
+    /// Get the table frame number based on aperture type.
+    fn table_frame(&self) -> Pfn {
+        match self.aperture() {
+            AperturePde::VideoMemory => self.table_frame_vid(),
+            _ => self.table_frame_sys(),
+        }
+    }
+
+    /// Get the `VRAM` address of the page table.
+    pub(super) fn table_vram_address(&self) -> VramAddress {
+        debug_assert!(
+            self.aperture() == AperturePde::VideoMemory,
+            "table_vram_address called on non-VRAM PDE (aperture: {:?})",
+            self.aperture()
+        );
+        VramAddress::from(self.table_frame_vid())
+    }
+
+    /// Get the raw `u64` value of the `PDE`.
+    pub(super) fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+/// Dual `PDE` at Level 3 - 128-bit entry of Large/Small Page Table pointers.
+///
+/// The dual `PDE` supports both large (64KB) and small (4KB) page tables.
+#[repr(C)]
+#[derive(Debug, Clone, Copy)]
+pub(in crate::mm) struct DualPde {
+    /// Large/Big Page Table pointer (lower 64 bits).
+    pub(super) big: Pde,
+    /// Small Page Table pointer (upper 64 bits).
+    pub(super) small: Pde,
+}
+
+
+impl DualPde {
+    /// Create a dual `PDE` from raw 128-bit value (two `u64`s).
+    pub(super) fn new(big: u64, small: u64) -> Self {
+        Self {
+            big: Pde::new(big),
+            small: Pde::new(small),
+        }
+    }
+
+    /// Create a dual `PDE` with only the small page table pointer set.
+    ///
+    /// Note: The big (LPT) portion is set to 0, not `Pde::invalid()`.
+    /// According to hardware documentation, clearing bit 0 of the 128-bit
+    /// entry makes the PDE behave as a "normal" PDE. Using `Pde::invalid()`
+    /// would set bit 0 (valid_inverted), which breaks page table walking.
+    pub(super) fn new_small(table_pfn: Pfn) -> Self {
+        Self {
+            big: Pde::new(0),
+            small: Pde::new_vram(table_pfn),
+        }
+    }
+
+    /// Check if the small page table pointer is valid.
+    pub(super) fn has_small(&self) -> bool {
+        self.small.is_valid()
+    }
+
+    /// Check if the big page table pointer is valid.
+    fn has_big(&self) -> bool {
+        self.big.is_valid()
+    }
+
+    /// Get the small page table `Pfn`.
+    fn small_pfn(&self) -> Pfn {
+        self.small.table_frame()
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 11/20] gpu: nova-core: mm: Add MMU v3 page table types
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add page table entry and directory structures for MMU version 3
used by Hopper and later GPUs.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable.rs      |   1 +
 drivers/gpu/nova-core/mm/pagetable/ver2.rs |  10 +-
 drivers/gpu/nova-core/mm/pagetable/ver3.rs | 391 +++++++++++++++++++++
 3 files changed, 397 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver3.rs

diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
index f6b184c9b8c8..9897818b3b07 100644
--- a/drivers/gpu/nova-core/mm/pagetable.rs
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -9,6 +9,7 @@
 #![expect(dead_code)]
 
 pub(super) mod ver2;
+pub(super) mod ver3;
 
 use kernel::num::Bounded;
 
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver2.rs b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
index 8086f1e5abd8..37066688b5f1 100644
--- a/drivers/gpu/nova-core/mm/pagetable/ver2.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
@@ -59,11 +59,11 @@ pub(super) fn new(va: VirtualAddress) -> Self {
 impl VaLevelIndex for VirtualAddressV2 {
     fn level_index(&self, level: u64) -> u64 {
         match level {
-            0 => self.pde3_idx(),
-            1 => self.pde2_idx(),
-            2 => self.pde1_idx(),
-            3 => self.pde0_idx(),
-            4 => self.pt_idx(),
+            0 => self.pde3_idx().get(),
+            1 => self.pde2_idx().get(),
+            2 => self.pde1_idx().get(),
+            3 => self.pde0_idx().get(),
+            4 => self.pt_idx().get(),
             _ => 0,
         }
     }
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver3.rs b/drivers/gpu/nova-core/mm/pagetable/ver3.rs
new file mode 100644
index 000000000000..2f9e762c4667
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/ver3.rs
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! MMU v3 page table types for Hopper and later GPUs.
+//!
+//! This module defines MMU version 3 specific types (Hopper and later GPUs).
+//!
+//! Key differences from MMU v2:
+//! - Unified 40-bit address field for all apertures (v2 had separate sys/vid fields).
+//! - PCF (Page Classification Field) replaces separate privilege/RO/atomic/cache bits.
+//! - KIND field is 4 bits (not 8).
+//! - IS_PTE bit in PDE to support large pages directly.
+//! - No COMPTAGLINE field (compression handled differently in v3).
+//! - No separate ENCRYPTED bit.
+//!
+//! Bit field layouts derived from the NVIDIA OpenRM documentation:
+//! `open-gpu-kernel-modules/src/common/inc/swref/published/hopper/gh100/dev_mmu.h`
+
+#![expect(dead_code)]
+
+use kernel::bitfield;
+use kernel::num::Bounded;
+use kernel::prelude::*;
+use pin_init::Zeroable;
+
+use super::{
+    AperturePde,
+    AperturePte,
+    PageTableLevel,
+    VaLevelIndex, //
+};
+use crate::mm::{
+    Pfn,
+    VirtualAddress,
+    VramAddress, //
+};
+
+// Bounded to version 3 Pfn conversion.
+impl_pfn_bounded!(40);
+
+bitfield! {
+    /// MMU v3 57-bit virtual address layout.
+    pub(super) struct VirtualAddressV3(u64) {
+        /// Page offset [11:0].
+        11:0    offset;
+        /// PT index [20:12].
+        20:12   pt_idx;
+        /// PDE0 index [28:21].
+        28:21   pde0_idx;
+        /// PDE1 index [37:29].
+        37:29   pde1_idx;
+        /// PDE2 index [46:38].
+        46:38   pde2_idx;
+        /// PDE3 index [55:47].
+        55:47   pde3_idx;
+        /// PDE4 index [56].
+        56:56   pde4_idx;
+    }
+}
+
+impl VirtualAddressV3 {
+    /// Create a [`VirtualAddressV3`] from a [`VirtualAddress`].
+    pub(super) fn new(va: VirtualAddress) -> Self {
+        Self::from_raw(va.raw_u64())
+    }
+}
+
+impl VaLevelIndex for VirtualAddressV3 {
+    fn level_index(&self, level: u64) -> u64 {
+        match level {
+            0 => self.pde4_idx().get(),
+            1 => self.pde3_idx().get(),
+            2 => self.pde2_idx().get(),
+            3 => self.pde1_idx().get(),
+            4 => self.pde0_idx().get(),
+            5 => self.pt_idx().get(),
+            _ => 0,
+        }
+    }
+}
+
+/// PDE levels for MMU v3 (6-level hierarchy).
+pub(super) const PDE_LEVELS: &[PageTableLevel] = &[
+    PageTableLevel::Pdb,
+    PageTableLevel::L1,
+    PageTableLevel::L2,
+    PageTableLevel::L3,
+    PageTableLevel::L4,
+];
+
+/// PTE level for MMU v3.
+pub(super) const PTE_LEVEL: PageTableLevel = PageTableLevel::L5;
+
+/// Dual PDE level for MMU v3 (128-bit entries).
+pub(super) const DUAL_PDE_LEVEL: PageTableLevel = PageTableLevel::L4;
+
+bitfield! {
+    /// Page Classification Field for PTEs (5 bits) in MMU v3.
+    pub(in crate::mm) struct PtePcf(u8) {
+        /// Bypass L2 cache (0=cached, 1=bypass).
+        0:0     uncached;
+        /// Access counting disabled (0=enabled, 1=disabled).
+        1:1     acd;
+        /// Read-only access (0=read-write, 1=read-only).
+        2:2     read_only;
+        /// Atomics disabled (0=enabled, 1=disabled).
+        3:3     no_atomic;
+        /// Privileged access only (0=regular, 1=privileged).
+        4:4     privileged;
+    }
+}
+
+impl PtePcf {
+    /// Create PCF for read-write mapping (cached, no atomics, regular mode).
+    fn rw() -> Self {
+        Self::zeroed().with_no_atomic(true)
+    }
+
+    /// Create PCF for read-only mapping (cached, no atomics, regular mode).
+    fn ro() -> Self {
+        Self::zeroed().with_read_only(true).with_no_atomic(true)
+    }
+
+    /// Get the raw `u8` value.
+    fn raw_u8(&self) -> u8 {
+        self.into_raw()
+    }
+}
+
+impl From<Bounded<u64, 5>> for PtePcf {
+    fn from(val: Bounded<u64, 5>) -> Self {
+        Self::from_raw(u8::from(val))
+    }
+}
+
+impl From<PtePcf> for Bounded<u64, 5> {
+    fn from(pcf: PtePcf) -> Self {
+        Bounded::from_expr(u64::from(pcf.into_raw()) & 0x1F)
+    }
+}
+
+bitfield! {
+    /// Page Classification Field for PDEs (3 bits) in MMU v3.
+    ///
+    /// Controls Address Translation Services (ATS) and caching.
+    pub(in crate::mm) struct PdePcf(u8) {
+        /// Bypass L2 cache (0=cached, 1=bypass).
+        0:0     uncached;
+        /// ATS disabled (0=enabled, 1=disabled).
+        1:1     no_ats;
+    }
+}
+
+impl PdePcf {
+    /// Create PCF for cached mapping with ATS enabled (default).
+    fn cached() -> Self {
+        Self::zeroed()
+    }
+
+    /// Get the raw `u8` value.
+    fn raw_u8(&self) -> u8 {
+        self.into_raw()
+    }
+}
+
+impl From<Bounded<u64, 3>> for PdePcf {
+    fn from(val: Bounded<u64, 3>) -> Self {
+        Self::from_raw(u8::from(val))
+    }
+}
+
+impl From<PdePcf> for Bounded<u64, 3> {
+    fn from(pcf: PdePcf) -> Self {
+        Bounded::from_expr(u64::from(pcf.into_raw()) & 0x7)
+    }
+}
+
+bitfield! {
+    /// Page Table Entry for MMU v3.
+    pub(in crate::mm) struct Pte(u64) {
+        /// Entry is valid.
+        0:0     valid;
+        /// Memory aperture type.
+        2:1     aperture => AperturePte;
+        /// Page Classification Field.
+        7:3     pcf => PtePcf;
+        /// Surface kind (4 bits, 0x0=pitch, 0xF=invalid).
+        11:8    kind;
+        /// Physical frame number (for all apertures).
+        51:12   frame_number => Pfn;
+        /// Peer GPU ID for peer memory (0-7).
+        63:61   peer_id;
+    }
+}
+
+impl Pte {
+    /// Create a PTE from a `u64` value.
+    pub(super) fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    /// Create a valid PTE for video memory.
+    pub(super) fn new_vram(frame: Pfn, writable: bool) -> Self {
+        let pcf = if writable { PtePcf::rw() } else { PtePcf::ro() };
+        Self::zeroed()
+            .with_valid(true)
+            .with_aperture(AperturePte::VideoMemory)
+            .with_pcf(pcf)
+            .with_frame_number(frame)
+    }
+
+    /// Create an invalid PTE.
+    pub(super) fn invalid() -> Self {
+        Self::zeroed()
+    }
+
+    /// Get the raw `u64` value.
+    pub(super) fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+bitfield! {
+    /// Page Directory Entry for MMU v3 (Hopper+).
+    ///
+    /// Note: v3 uses a unified 40-bit address field (v2 had separate sys/vid address fields).
+    pub(in crate::mm) struct Pde(u64) {
+        /// Entry is a PTE (0=PDE, 1=large page PTE).
+        0:0     is_pte;
+        /// Memory aperture type.
+        2:1     aperture => AperturePde;
+        /// Page Classification Field (3 bits for PDE).
+        5:3     pcf => PdePcf;
+        /// Table frame number (40-bit unified address).
+        51:12   table_frame => Pfn;
+    }
+}
+
+impl Pde {
+    /// Create a PDE from a `u64` value.
+    pub(super) fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    /// Create a valid PDE pointing to a page table in video memory.
+    pub(super) fn new_vram(table_pfn: Pfn) -> Self {
+        Self::zeroed()
+            .with_is_pte(false)
+            .with_aperture(AperturePde::VideoMemory)
+            .with_table_frame(table_pfn)
+    }
+
+    /// Create an invalid PDE.
+    pub(super) fn invalid() -> Self {
+        Self::zeroed().with_aperture(AperturePde::Invalid)
+    }
+
+    /// Check if this PDE is valid.
+    pub(super) fn is_valid(&self) -> bool {
+        self.aperture() != AperturePde::Invalid
+    }
+
+    /// Get the VRAM address of the page table.
+    pub(super) fn table_vram_address(&self) -> VramAddress {
+        debug_assert!(
+            self.aperture() == AperturePde::VideoMemory,
+            "table_vram_address called on non-VRAM PDE (aperture: {:?})",
+            self.aperture()
+        );
+        VramAddress::from(self.table_frame())
+    }
+
+    /// Get the raw `u64` value.
+    pub(super) fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+bitfield! {
+    /// Big Page Table pointer in Dual PDE (MMU v3).
+    ///
+    /// 64-bit lower word of the 128-bit Dual PDE.
+    pub(super) struct DualPdeBig(u64) {
+        /// Entry is a PTE (for large pages).
+        0:0     is_pte;
+        /// Memory aperture type.
+        2:1     aperture => AperturePde;
+        /// Page Classification Field.
+        5:3     pcf => PdePcf;
+        /// Table frame (table address 256-byte aligned).
+        51:8    table_frame;
+    }
+}
+
+impl DualPdeBig {
+    /// Create a big page table pointer from a `u64` value.
+    fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    /// Create an invalid big page table pointer.
+    fn invalid() -> Self {
+        Self::zeroed().with_aperture(AperturePde::Invalid)
+    }
+
+    /// Create a valid big PDE pointing to a page table in video memory.
+    fn new_vram(table_addr: VramAddress) -> Result<Self> {
+        // Big page table addresses must be 256-byte aligned (shift 8).
+        if table_addr.raw_u64() & 0xFF != 0 {
+            return Err(EINVAL);
+        }
+
+        let table_frame = Bounded::from_expr(table_addr.raw_u64() >> 8);
+        Ok(Self::zeroed()
+            .with_is_pte(false)
+            .with_aperture(AperturePde::VideoMemory)
+            .with_table_frame(table_frame))
+    }
+
+    /// Check if this big PDE is valid.
+    fn is_valid(&self) -> bool {
+        self.aperture() != AperturePde::Invalid
+    }
+
+    /// Get the VRAM address of the big page table.
+    fn table_vram_address(&self) -> VramAddress {
+        debug_assert!(
+            self.aperture() == AperturePde::VideoMemory,
+            "table_vram_address called on non-VRAM DualPdeBig (aperture: {:?})",
+            self.aperture()
+        );
+        VramAddress::new(self.table_frame().get() << 8)
+    }
+
+    /// Get the raw `u64` value.
+    pub(super) fn raw_u64(&self) -> u64 {
+        self.into_raw()
+    }
+}
+
+/// Dual PDE at Level 4 for MMU v3 - 128-bit entry.
+///
+/// Contains both big (64KB) and small (4KB) page table pointers:
+/// - Lower 64 bits: Big Page Table pointer.
+/// - Upper 64 bits: Small Page Table pointer.
+///
+/// ## Note
+///
+/// The big and small page table pointers have different address layouts:
+/// - Big address = field value << 8 (256-byte alignment).
+/// - Small address = field value << 12 (4KB alignment).
+///
+/// This is why `DualPdeBig` is a separate type from `Pde`.
+#[repr(C)]
+#[derive(Debug, Clone, Copy)]
+pub(in crate::mm) struct DualPde {
+    /// Big Page Table pointer.
+    pub(super) big: DualPdeBig,
+    /// Small Page Table pointer.
+    pub(super) small: Pde,
+}
+
+// SAFETY: Both `DualPdeBig` and `Pde` fields are `Zeroable` (bitfield types are Zeroable).
+unsafe impl Zeroable for DualPde {}
+
+impl DualPde {
+    /// Create a dual PDE from raw 128-bit value (two `u64`s).
+    pub(super) fn new(big: u64, small: u64) -> Self {
+        Self {
+            big: DualPdeBig::new(big),
+            small: Pde::new(small),
+        }
+    }
+
+    /// Create a dual PDE with only the small page table pointer set.
+    pub(super) fn new_small(table_pfn: Pfn) -> Self {
+        Self {
+            big: DualPdeBig::invalid(),
+            small: Pde::new_vram(table_pfn),
+        }
+    }
+
+    /// Check if the small page table pointer is valid.
+    pub(super) fn has_small(&self) -> bool {
+        self.small.is_valid()
+    }
+
+    /// Check if the big page table pointer is valid.
+    fn has_big(&self) -> bool {
+        self.big.is_valid()
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 12/20] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add unified Pte, Pde, and DualPde wrapper enums that abstract over
MMU v2 and v3 page table entry formats. These enums allow the page
table walker and VMM to work with both MMU versions.

Each unified type:
- Takes MmuVersion parameter in constructors
- Wraps both ver2 and ver3 variants
- Delegates method calls to the appropriate variant

This enables version-agnostic page table operations while keeping
version-specific implementation details encapsulated in the ver2
and ver3 modules.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable.rs      | 223 +++++++++++++++++++++
 drivers/gpu/nova-core/mm/pagetable/ver2.rs | 150 ++++++++------
 drivers/gpu/nova-core/mm/pagetable/ver3.rs | 120 +++++++----
 3 files changed, 396 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
index 9897818b3b07..764b9e71ae41 100644
--- a/drivers/gpu/nova-core/mm/pagetable.rs
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -14,6 +14,13 @@
 use kernel::num::Bounded;
 
 use crate::gpu::Architecture;
+use crate::mm::{
+    pramin,
+    Pfn,
+    VirtualAddress,
+    VramAddress, //
+};
+use kernel::prelude::*;
 
 /// Extracts the page table index at a given level from a virtual address.
 pub(super) trait VaLevelIndex {
@@ -86,6 +93,222 @@ pub(super) const fn as_index(&self) -> u64 {
     }
 }
 
+// Trait abstractions for page table operations.
+
+/// Operations on Page Table Entries (`PTE`s).
+pub(super) trait PteOps: Copy + core::fmt::Debug {
+    /// Create a `PTE` from a raw `u64` value.
+    fn new(val: u64) -> Self;
+
+    /// Create an invalid `PTE`.
+    fn invalid() -> Self;
+
+    /// Create a valid `PTE` for video memory.
+    fn new_vram(pfn: Pfn, writable: bool) -> Self;
+
+    /// Check if this `PTE` is valid.
+    fn is_valid(&self) -> bool;
+
+    /// Get the physical frame number.
+    fn frame_number(&self) -> Pfn;
+
+    /// Get the raw `u64` value.
+    fn raw_u64(&self) -> u64;
+
+    /// Read a `PTE` from VRAM.
+    fn read(window: &mut pramin::PraminWindow<'_>, addr: VramAddress) -> Result<Self> {
+        let val = window.try_read64(addr.raw())?;
+        Ok(Self::new(val))
+    }
+
+    /// Write this `PTE` to VRAM.
+    fn write(&self, window: &mut pramin::PraminWindow<'_>, addr: VramAddress) -> Result {
+        window.try_write64(addr.raw(), self.raw_u64())
+    }
+}
+
+/// Operations on Page Directory Entries (`PDE`s).
+pub(super) trait PdeOps: Copy + core::fmt::Debug {
+    /// Create a `PDE` from a raw `u64` value.
+    fn new(val: u64) -> Self;
+
+    /// Create a valid `PDE` pointing to a page table in video memory.
+    fn new_vram(table_pfn: Pfn) -> Self;
+
+    /// Create an invalid `PDE`.
+    fn invalid() -> Self;
+
+    /// Check if this `PDE` is valid.
+    fn is_valid(&self) -> bool;
+
+    /// Get the memory aperture of this `PDE`.
+    fn aperture(&self) -> AperturePde;
+
+    /// Get the VRAM address of the page table.
+    fn table_vram_address(&self) -> VramAddress;
+
+    /// Get the raw `u64` value.
+    fn raw_u64(&self) -> u64;
+
+    /// Read a `PDE` from VRAM.
+    fn read(window: &mut pramin::PraminWindow<'_>, addr: VramAddress) -> Result<Self> {
+        let val = window.try_read64(addr.raw())?;
+        Ok(Self::new(val))
+    }
+
+    /// Write this `PDE` to VRAM.
+    fn write(&self, window: &mut pramin::PraminWindow<'_>, addr: VramAddress) -> Result {
+        window.try_write64(addr.raw(), self.raw_u64())
+    }
+
+    /// Check if this `PDE` is valid and points to video memory.
+    fn is_valid_vram(&self) -> bool {
+        self.is_valid() && self.aperture() == AperturePde::VideoMemory
+    }
+}
+
+/// Operations on Dual Page Directory Entries (128-bit `DualPde`s).
+pub(super) trait DualPdeOps: Copy + core::fmt::Debug {
+    /// Create a `DualPde` from raw 128-bit value (two `u64`s).
+    fn new(big: u64, small: u64) -> Self;
+
+    /// Create a `DualPde` with only the small page table pointer set.
+    fn new_small(table_pfn: Pfn) -> Self;
+
+    /// Check if the small page table pointer is valid.
+    fn has_small(&self) -> bool;
+
+    /// Get the small page table VRAM address.
+    fn small_vram_address(&self) -> VramAddress;
+
+    /// Get the raw `u64` value of the big PDE.
+    fn big_raw_u64(&self) -> u64;
+
+    /// Get the raw `u64` value of the small PDE.
+    fn small_raw_u64(&self) -> u64;
+
+    /// Read a dual PDE (128-bit) from VRAM.
+    fn read(window: &mut pramin::PraminWindow<'_>, addr: VramAddress) -> Result<Self> {
+        let lo = window.try_read64(addr.raw())?;
+        let hi = window.try_read64(addr.raw() + 8)?;
+        Ok(Self::new(lo, hi))
+    }
+
+    /// Write this dual PDE (128-bit) to VRAM.
+    fn write(&self, window: &mut pramin::PraminWindow<'_>, addr: VramAddress) -> Result {
+        window.try_write64(addr.raw(), self.big_raw_u64())?;
+        window.try_write64(addr.raw() + 8, self.small_raw_u64())
+    }
+}
+
+/// MMU configuration trait -- encodes version-specific constants and types.
+pub(super) trait MmuConfig: 'static {
+    /// Page Table Entry type.
+    type Pte: PteOps;
+    /// Page Directory Entry type.
+    type Pde: PdeOps;
+    /// Dual Page Directory Entry type (128-bit).
+    type DualPde: DualPdeOps;
+
+    /// PDE levels (excluding PTE level) for page table walking.
+    const PDE_LEVELS: &'static [PageTableLevel];
+    /// PTE level for this MMU version.
+    const PTE_LEVEL: PageTableLevel;
+    /// Dual PDE level (128-bit entries) for this MMU version.
+    const DUAL_PDE_LEVEL: PageTableLevel;
+
+    /// Get the number of entries per page table page for a given level.
+    fn entries_per_page(level: PageTableLevel) -> usize;
+
+    /// Extract the page table index at `level` from `va`.
+    fn level_index(va: VirtualAddress, level: u64) -> u64;
+
+    /// Get the entry size in bytes for a given level.
+    fn entry_size(level: PageTableLevel) -> usize {
+        if level == Self::DUAL_PDE_LEVEL {
+            16 // 128-bit dual PDE
+        } else {
+            8 // 64-bit PDE/PTE
+        }
+    }
+
+    /// Compute upper bound on page table pages needed for `num_virt_pages`.
+    ///
+    /// Walks from PTE level up through PDE levels, accumulating the tree.
+    fn pt_pages_upper_bound(num_virt_pages: usize) -> usize {
+        let mut total = 0;
+
+        // PTE pages at the leaf level.
+        let pte_epp = Self::entries_per_page(Self::PTE_LEVEL);
+        let mut pages_at_level = num_virt_pages.div_ceil(pte_epp);
+        total += pages_at_level;
+
+        // Walk PDE levels bottom-up (reverse of PDE_LEVELS).
+        for &level in Self::PDE_LEVELS.iter().rev() {
+            let epp = Self::entries_per_page(level);
+
+            // How many pages at this level do we need to point to
+            // the previous pages_at_level?
+            pages_at_level = pages_at_level.div_ceil(epp);
+            total += pages_at_level;
+        }
+
+        total
+    }
+}
+
+/// Marker struct for MMU v2 (Turing/Ampere/Ada).
+pub(super) struct MmuV2;
+
+impl MmuConfig for MmuV2 {
+    type Pte = ver2::Pte;
+    type Pde = ver2::Pde;
+    type DualPde = ver2::DualPde;
+
+    const PDE_LEVELS: &'static [PageTableLevel] = ver2::PDE_LEVELS;
+    const PTE_LEVEL: PageTableLevel = ver2::PTE_LEVEL;
+    const DUAL_PDE_LEVEL: PageTableLevel = ver2::DUAL_PDE_LEVEL;
+
+    fn entries_per_page(level: PageTableLevel) -> usize {
+        // TODO: Calculate these values from the bitfield dynamically
+        // instead of hardcoding them.
+        match level {
+            PageTableLevel::Pdb => 4,  // PD3 root: bits [48:47] = 2 bits
+            PageTableLevel::L3 => 256, // PD0 dual: bits [28:21] = 8 bits
+            _ => 512,                  // PD2, PD1, PT: 9 bits each
+        }
+    }
+
+    fn level_index(va: VirtualAddress, level: u64) -> u64 {
+        ver2::VirtualAddressV2::new(va).level_index(level)
+    }
+}
+
+/// Marker struct for MMU v3 (Hopper and later).
+pub(super) struct MmuV3;
+
+impl MmuConfig for MmuV3 {
+    type Pte = ver3::Pte;
+    type Pde = ver3::Pde;
+    type DualPde = ver3::DualPde;
+
+    const PDE_LEVELS: &'static [PageTableLevel] = ver3::PDE_LEVELS;
+    const PTE_LEVEL: PageTableLevel = ver3::PTE_LEVEL;
+    const DUAL_PDE_LEVEL: PageTableLevel = ver3::DUAL_PDE_LEVEL;
+
+    fn entries_per_page(level: PageTableLevel) -> usize {
+        match level {
+            PageTableLevel::Pdb => 2,  // PDE4 root: bit [56] = 1 bit, 2 entries
+            PageTableLevel::L4 => 256, // PDE0 dual: bits [28:21] = 8 bits
+            _ => 512,                  // PDE3, PDE2, PDE1, PT: 9 bits each
+        }
+    }
+
+    fn level_index(va: VirtualAddress, level: u64) -> u64 {
+        ver3::VirtualAddressV3::new(va).level_index(level)
+    }
+}
+
 /// Memory aperture for Page Table Entries (`PTE`s).
 ///
 /// Determines which memory region the `PTE` points to.
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver2.rs b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
index 37066688b5f1..b4ee91766a4f 100644
--- a/drivers/gpu/nova-core/mm/pagetable/ver2.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
@@ -16,7 +16,10 @@
 use super::{
     AperturePde,
     AperturePte,
+    DualPdeOps,
     PageTableLevel,
+    PdeOps,
+    PteOps,
     VaLevelIndex, //
 };
 use crate::mm::{
@@ -116,12 +119,12 @@ pub(in crate::mm) struct Pte(u64) {
 
 impl Pte {
     /// Create a `PTE` from a `u64` value.
-    pub(super) fn new(val: u64) -> Self {
+    pub(super) fn new_raw(val: u64) -> Self {
         Self::from_raw(val)
     }
 
     /// Create a valid `PTE` for video memory.
-    pub(super) fn new_vram(pfn: Pfn, writable: bool) -> Self {
+    fn new_vram_inner(pfn: Pfn, writable: bool) -> Self {
         Self::zeroed()
             .with_valid(true)
             .with_aperture(AperturePte::VideoMemory)
@@ -129,21 +132,37 @@ pub(super) fn new_vram(pfn: Pfn, writable: bool) -> Self {
             .with_read_only(!writable)
     }
 
-    /// Create an invalid `PTE`.
-    pub(super) fn invalid() -> Self {
-        Self::zeroed()
-    }
-
     /// Get the frame number based on aperture type.
-    pub(super) fn frame_number(&self) -> Pfn {
+    fn frame_number_by_aperture(&self) -> Pfn {
         match self.aperture() {
             AperturePte::VideoMemory => self.frame_number_vid(),
             _ => self.frame_number_sys(),
         }
     }
+}
 
-    /// Get the raw `u64` value.
-    pub(super) fn raw_u64(&self) -> u64 {
+impl PteOps for Pte {
+    fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    fn invalid() -> Self {
+        Self::zeroed()
+    }
+
+    fn new_vram(pfn: Pfn, writable: bool) -> Self {
+        Self::new_vram_inner(pfn, writable)
+    }
+
+    fn is_valid(&self) -> bool {
+        self.valid().into_bool()
+    }
+
+    fn frame_number(&self) -> Pfn {
+        self.frame_number_by_aperture()
+    }
+
+    fn raw_u64(&self) -> u64 {
         self.into_raw()
     }
 }
@@ -171,30 +190,18 @@ pub(in crate::mm) struct Pde(u64) {
 
 impl Pde {
     /// Create a `PDE` from a `u64` value.
-    pub(super) fn new(val: u64) -> Self {
+    pub(super) fn new_raw(val: u64) -> Self {
         Self::from_raw(val)
     }
 
     /// Create a valid `PDE` pointing to a page table in video memory.
-    pub(super) fn new_vram(table_pfn: Pfn) -> Self {
+    fn new_vram_inner(table_pfn: Pfn) -> Self {
         Self::zeroed()
             .with_valid_inverted(false) // 0 = valid
             .with_aperture(AperturePde::VideoMemory)
             .with_table_frame_vid(table_pfn)
     }
 
-    /// Create an invalid `PDE`.
-    pub(super) fn invalid() -> Self {
-        Self::zeroed()
-            .with_valid_inverted(true)
-            .with_aperture(AperturePde::Invalid)
-    }
-
-    /// Check if this `PDE` is valid.
-    pub(super) fn is_valid(&self) -> bool {
-        !self.valid_inverted().into_bool() && self.aperture() != AperturePde::Invalid
-    }
-
     /// Get the table frame number based on aperture type.
     fn table_frame(&self) -> Pfn {
         match self.aperture() {
@@ -202,19 +209,42 @@ fn table_frame(&self) -> Pfn {
             _ => self.table_frame_sys(),
         }
     }
+}
 
-    /// Get the `VRAM` address of the page table.
-    pub(super) fn table_vram_address(&self) -> VramAddress {
+impl PdeOps for Pde {
+    fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    fn new_vram(table_pfn: Pfn) -> Self {
+        Self::new_vram_inner(table_pfn)
+    }
+
+    fn invalid() -> Self {
+        Self::zeroed()
+            .with_valid_inverted(true)
+            .with_aperture(AperturePde::Invalid)
+    }
+
+    fn is_valid(&self) -> bool {
+        !self.valid_inverted().into_bool() && self.aperture() != AperturePde::Invalid
+    }
+
+    fn aperture(&self) -> AperturePde {
+        // Delegate to bitfield getter (takes self by value, Copy).
+        Pde::aperture(*self)
+    }
+
+    fn table_vram_address(&self) -> VramAddress {
         debug_assert!(
-            self.aperture() == AperturePde::VideoMemory,
+            Pde::aperture(*self) == AperturePde::VideoMemory,
             "table_vram_address called on non-VRAM PDE (aperture: {:?})",
-            self.aperture()
+            Pde::aperture(*self)
         );
         VramAddress::from(self.table_frame_vid())
     }
 
-    /// Get the raw `u64` value of the `PDE`.
-    pub(super) fn raw_u64(&self) -> u64 {
+    fn raw_u64(&self) -> u64 {
         self.into_raw()
     }
 }
@@ -233,35 +263,9 @@ pub(in crate::mm) struct DualPde {
 
 
 impl DualPde {
-    /// Create a dual `PDE` from raw 128-bit value (two `u64`s).
-    pub(super) fn new(big: u64, small: u64) -> Self {
-        Self {
-            big: Pde::new(big),
-            small: Pde::new(small),
-        }
-    }
-
-    /// Create a dual `PDE` with only the small page table pointer set.
-    ///
-    /// Note: The big (LPT) portion is set to 0, not `Pde::invalid()`.
-    /// According to hardware documentation, clearing bit 0 of the 128-bit
-    /// entry makes the PDE behave as a "normal" PDE. Using `Pde::invalid()`
-    /// would set bit 0 (valid_inverted), which breaks page table walking.
-    pub(super) fn new_small(table_pfn: Pfn) -> Self {
-        Self {
-            big: Pde::new(0),
-            small: Pde::new_vram(table_pfn),
-        }
-    }
-
-    /// Check if the small page table pointer is valid.
-    pub(super) fn has_small(&self) -> bool {
-        self.small.is_valid()
-    }
-
     /// Check if the big page table pointer is valid.
     fn has_big(&self) -> bool {
-        self.big.is_valid()
+        PdeOps::is_valid(&self.big)
     }
 
     /// Get the small page table `Pfn`.
@@ -269,3 +273,35 @@ fn small_pfn(&self) -> Pfn {
         self.small.table_frame()
     }
 }
+
+impl DualPdeOps for DualPde {
+    fn new(big: u64, small: u64) -> Self {
+        Self {
+            big: PdeOps::new(big),
+            small: PdeOps::new(small),
+        }
+    }
+
+    fn new_small(table_pfn: Pfn) -> Self {
+        Self {
+            big: PdeOps::new(0),
+            small: PdeOps::new_vram(table_pfn),
+        }
+    }
+
+    fn has_small(&self) -> bool {
+        PdeOps::is_valid(&self.small)
+    }
+
+    fn small_vram_address(&self) -> VramAddress {
+        PdeOps::table_vram_address(&self.small)
+    }
+
+    fn big_raw_u64(&self) -> u64 {
+        PdeOps::raw_u64(&self.big)
+    }
+
+    fn small_raw_u64(&self) -> u64 {
+        PdeOps::raw_u64(&self.small)
+    }
+}
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver3.rs b/drivers/gpu/nova-core/mm/pagetable/ver3.rs
index 2f9e762c4667..1c52013e498d 100644
--- a/drivers/gpu/nova-core/mm/pagetable/ver3.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/ver3.rs
@@ -25,7 +25,10 @@
 use super::{
     AperturePde,
     AperturePte,
+    DualPdeOps,
     PageTableLevel,
+    PdeOps,
+    PteOps,
     VaLevelIndex, //
 };
 use crate::mm::{
@@ -194,12 +197,12 @@ pub(in crate::mm) struct Pte(u64) {
 
 impl Pte {
     /// Create a PTE from a `u64` value.
-    pub(super) fn new(val: u64) -> Self {
+    pub(super) fn new_raw(val: u64) -> Self {
         Self::from_raw(val)
     }
 
     /// Create a valid PTE for video memory.
-    pub(super) fn new_vram(frame: Pfn, writable: bool) -> Self {
+    fn new_vram_inner(frame: Pfn, writable: bool) -> Self {
         let pcf = if writable { PtePcf::rw() } else { PtePcf::ro() };
         Self::zeroed()
             .with_valid(true)
@@ -207,14 +210,30 @@ pub(super) fn new_vram(frame: Pfn, writable: bool) -> Self {
             .with_pcf(pcf)
             .with_frame_number(frame)
     }
+}
 
-    /// Create an invalid PTE.
-    pub(super) fn invalid() -> Self {
+impl PteOps for Pte {
+    fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    fn invalid() -> Self {
         Self::zeroed()
     }
 
-    /// Get the raw `u64` value.
-    pub(super) fn raw_u64(&self) -> u64 {
+    fn new_vram(pfn: Pfn, writable: bool) -> Self {
+        Self::new_vram_inner(pfn, writable)
+    }
+
+    fn is_valid(&self) -> bool {
+        self.valid().into_bool()
+    }
+
+    fn frame_number(&self) -> Pfn {
+        Pte::frame_number(*self)
+    }
+
+    fn raw_u64(&self) -> u64 {
         self.into_raw()
     }
 }
@@ -237,40 +256,50 @@ pub(in crate::mm) struct Pde(u64) {
 
 impl Pde {
     /// Create a PDE from a `u64` value.
-    pub(super) fn new(val: u64) -> Self {
+    pub(super) fn new_raw(val: u64) -> Self {
         Self::from_raw(val)
     }
 
     /// Create a valid PDE pointing to a page table in video memory.
-    pub(super) fn new_vram(table_pfn: Pfn) -> Self {
+    fn new_vram_inner(table_pfn: Pfn) -> Self {
         Self::zeroed()
             .with_is_pte(false)
             .with_aperture(AperturePde::VideoMemory)
             .with_table_frame(table_pfn)
     }
+}
 
-    /// Create an invalid PDE.
-    pub(super) fn invalid() -> Self {
+impl PdeOps for Pde {
+    fn new(val: u64) -> Self {
+        Self::from_raw(val)
+    }
+
+    fn new_vram(table_pfn: Pfn) -> Self {
+        Self::new_vram_inner(table_pfn)
+    }
+
+    fn invalid() -> Self {
         Self::zeroed().with_aperture(AperturePde::Invalid)
     }
 
-    /// Check if this PDE is valid.
-    pub(super) fn is_valid(&self) -> bool {
-        self.aperture() != AperturePde::Invalid
+    fn is_valid(&self) -> bool {
+        Pde::aperture(*self) != AperturePde::Invalid
     }
 
-    /// Get the VRAM address of the page table.
-    pub(super) fn table_vram_address(&self) -> VramAddress {
+    fn aperture(&self) -> AperturePde {
+        Pde::aperture(*self)
+    }
+
+    fn table_vram_address(&self) -> VramAddress {
         debug_assert!(
-            self.aperture() == AperturePde::VideoMemory,
+            Pde::aperture(*self) == AperturePde::VideoMemory,
             "table_vram_address called on non-VRAM PDE (aperture: {:?})",
-            self.aperture()
+            Pde::aperture(*self)
         );
         VramAddress::from(self.table_frame())
     }
 
-    /// Get the raw `u64` value.
-    pub(super) fn raw_u64(&self) -> u64 {
+    fn raw_u64(&self) -> u64 {
         self.into_raw()
     }
 }
@@ -363,29 +392,40 @@ pub(in crate::mm) struct DualPde {
 unsafe impl Zeroable for DualPde {}
 
 impl DualPde {
-    /// Create a dual PDE from raw 128-bit value (two `u64`s).
-    pub(super) fn new(big: u64, small: u64) -> Self {
-        Self {
-            big: DualPdeBig::new(big),
-            small: Pde::new(small),
-        }
-    }
-
-    /// Create a dual PDE with only the small page table pointer set.
-    pub(super) fn new_small(table_pfn: Pfn) -> Self {
-        Self {
-            big: DualPdeBig::invalid(),
-            small: Pde::new_vram(table_pfn),
-        }
-    }
-
-    /// Check if the small page table pointer is valid.
-    pub(super) fn has_small(&self) -> bool {
-        self.small.is_valid()
-    }
-
     /// Check if the big page table pointer is valid.
     fn has_big(&self) -> bool {
         self.big.is_valid()
     }
 }
+
+impl DualPdeOps for DualPde {
+    fn new(big: u64, small: u64) -> Self {
+        Self {
+            big: DualPdeBig::new(big),
+            small: PdeOps::new(small),
+        }
+    }
+
+    fn new_small(table_pfn: Pfn) -> Self {
+        Self {
+            big: DualPdeBig::invalid(),
+            small: PdeOps::new_vram(table_pfn),
+        }
+    }
+
+    fn has_small(&self) -> bool {
+        PdeOps::is_valid(&self.small)
+    }
+
+    fn small_vram_address(&self) -> VramAddress {
+        PdeOps::table_vram_address(&self.small)
+    }
+
+    fn big_raw_u64(&self) -> u64 {
+        self.big.raw_u64()
+    }
+
+    fn small_raw_u64(&self) -> u64 {
+        PdeOps::raw_u64(&self.small)
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 13/20] gpu: nova-core: mm: Add page table walker for MMU v2/v3
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add the page table walker implementation that traverses the page table
hierarchy for both MMU v2 (5-level) and MMU v3 (6-level) to resolve
virtual addresses to physical addresses or find PTE locations.

Currently only v2 has been tested (nova-core currently boots pre-hopper)
with some initial preparatory work done for v3.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable.rs      |   1 +
 drivers/gpu/nova-core/mm/pagetable/walk.rs | 242 +++++++++++++++++++++
 2 files changed, 243 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs

diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
index 764b9e71ae41..b7e0e8e02905 100644
--- a/drivers/gpu/nova-core/mm/pagetable.rs
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -10,6 +10,7 @@
 
 pub(super) mod ver2;
 pub(super) mod ver3;
+pub(super) mod walk;
 
 use kernel::num::Bounded;
 
diff --git a/drivers/gpu/nova-core/mm/pagetable/walk.rs b/drivers/gpu/nova-core/mm/pagetable/walk.rs
new file mode 100644
index 000000000000..89d4426bcf14
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/walk.rs
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Page table walker implementation for NVIDIA GPUs.
+//!
+//! This module provides page table walking functionality for MMU v2 and v3.
+//! The walker traverses the page table hierarchy to resolve virtual addresses
+//! to physical addresses or to find PTE locations.
+//!
+//! # Page Table Hierarchy
+//!
+//! ## MMU v2 (Turing/Ampere/Ada) - 5 levels
+//!
+//! ```text
+//!     +-------+     +-------+     +-------+     +---------+     +-------+
+//!     | PDB   |---->|  L1   |---->|  L2   |---->| L3 Dual |---->|  L4   |
+//!     | (L0)  |     |       |     |       |     | PDE     |     | (PTE) |
+//!     +-------+     +-------+     +-------+     +---------+     +-------+
+//!       64-bit        64-bit        64-bit        128-bit         64-bit
+//!        PDE           PDE           PDE        (big+small)        PTE
+//! ```
+//!
+//! ## MMU v3 (Hopper+) - 6 levels
+//!
+//! ```text
+//!     +-------+     +-------+     +-------+     +-------+     +---------+     +-------+
+//!     | PDB   |---->|  L1   |---->|  L2   |---->|  L3   |---->| L4 Dual |---->|  L5   |
+//!     | (L0)  |     |       |     |       |     |       |     | PDE     |     | (PTE) |
+//!     +-------+     +-------+     +-------+     +-------+     +---------+     +-------+
+//!       64-bit        64-bit        64-bit        64-bit        128-bit         64-bit
+//!        PDE           PDE           PDE           PDE        (big+small)        PTE
+//! ```
+//!
+//! # Result of a page table walk
+//!
+//! The walker returns a [`WalkResult`] indicating the outcome.
+
+use core::marker::PhantomData;
+
+use kernel::prelude::*;
+
+use super::{
+    DualPdeOps,
+    MmuConfig,
+    MmuV2,
+    MmuV3,
+    MmuVersion,
+    PageTableLevel,
+    PdeOps,
+    PteOps, //
+};
+use crate::{
+    mm::{
+        pramin,
+        GpuMm,
+        Pfn,
+        Vfn,
+        VirtualAddress,
+        VramAddress, //
+    },
+    num::{
+        IntoSafeCast, //
+    },
+};
+
+/// Result of walking to a PTE.
+#[derive(Debug, Clone, Copy)]
+pub(in crate::mm) enum WalkResult {
+    /// Intermediate page tables are missing (only returned in lookup mode).
+    PageTableMissing,
+    /// PTE exists but is invalid (page not mapped).
+    Unmapped { pte_addr: VramAddress },
+    /// PTE exists and is valid (page is mapped).
+    Mapped { pte_addr: VramAddress, pfn: Pfn },
+}
+
+/// Result of walking PDE levels only.
+///
+/// Returned by [`PtWalkInner::walk_pde_levels()`] to indicate whether all PDE
+/// levels resolved or a PDE is missing.
+#[derive(Debug, Clone, Copy)]
+pub(in crate::mm) enum WalkPdeResult {
+    /// All PDE levels resolved -- returns PTE page table address.
+    Complete {
+        /// VRAM address of the PTE-level page table.
+        pte_table: VramAddress,
+    },
+    /// A PDE is missing and no prepared page was provided by the closure.
+    Missing {
+        /// PDE slot address in the parent page table (where to install).
+        install_addr: VramAddress,
+        /// The page table level that is missing.
+        level: PageTableLevel,
+    },
+}
+
+/// Page table walker.
+pub(in crate::mm) struct PtWalkInner<M: MmuConfig> {
+    pdb_addr: VramAddress,
+    _phantom: PhantomData<M>,
+}
+
+impl<M: MmuConfig> PtWalkInner<M> {
+    /// Calculate the VRAM address of an entry within a page table.
+    fn entry_addr(table: VramAddress, level: PageTableLevel, index: u64) -> VramAddress {
+        let entry_size: u64 = M::entry_size(level).into_safe_cast();
+        VramAddress::new(table.raw_u64() + index * entry_size)
+    }
+
+    /// Create a new page table walker.
+    pub(super) fn new(pdb_addr: VramAddress) -> Self {
+        Self {
+            pdb_addr,
+            _phantom: PhantomData,
+        }
+    }
+
+    /// Walk PDE levels with closure-based resolution for missing PDEs.
+    ///
+    /// Traverses all PDE levels for the MMU version. At each level, reads the PDE.
+    /// If valid, extracts the child table address and continues. If missing, calls
+    /// `resolve_prepared(install_addr)` to resolve the missing PDE.
+    pub(super) fn walk_pde_levels(
+        &self,
+        window: &mut pramin::PraminWindow<'_>,
+        vfn: Vfn,
+        resolve_prepared: impl Fn(VramAddress) -> Option<VramAddress>,
+    ) -> Result<WalkPdeResult> {
+        let va = VirtualAddress::from(vfn);
+        let mut cur_table = self.pdb_addr;
+
+        for &level in M::PDE_LEVELS {
+            let idx = M::level_index(va, level.as_index());
+            let install_addr = Self::entry_addr(cur_table, level, idx);
+
+            if level == M::DUAL_PDE_LEVEL {
+                // 128-bit dual PDE with big+small page table pointers.
+                let dpde = M::DualPde::read(window, install_addr)?;
+                if dpde.has_small() {
+                    cur_table = dpde.small_vram_address();
+                    continue;
+                }
+            } else {
+                // Regular 64-bit PDE.
+                let pde = M::Pde::read(window, install_addr)?;
+                if pde.is_valid() {
+                    cur_table = pde.table_vram_address();
+                    continue;
+                }
+            }
+
+            // PDE missing in HW. Ask caller for resolution.
+            if let Some(prepared_addr) = resolve_prepared(install_addr) {
+                cur_table = prepared_addr;
+                continue;
+            }
+
+            return Ok(WalkPdeResult::Missing {
+                install_addr,
+                level,
+            });
+        }
+
+        Ok(WalkPdeResult::Complete {
+            pte_table: cur_table,
+        })
+    }
+
+    /// Walk to PTE for lookup only (no allocation).
+    ///
+    /// Returns [`WalkResult::PageTableMissing`] if intermediate tables don't exist.
+    pub(super) fn walk_to_pte_lookup(&self, mm: &GpuMm, vfn: Vfn) -> Result<WalkResult> {
+        let mut window = mm.pramin().get_window()?;
+        self.walk_to_pte_lookup_with_window(&mut window, vfn)
+    }
+
+    /// Walk to PTE using a caller-provided PRAMIN window (lookup only).
+    pub(super) fn walk_to_pte_lookup_with_window(
+        &self,
+        window: &mut pramin::PraminWindow<'_>,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        match self.walk_pde_levels(window, vfn, |_| None)? {
+            WalkPdeResult::Complete { pte_table } => {
+                Self::read_pte_at_level(window, vfn, pte_table)
+            }
+            WalkPdeResult::Missing { .. } => Ok(WalkResult::PageTableMissing),
+        }
+    }
+
+    /// Read the PTE at the PTE level given the PTE table address.
+    fn read_pte_at_level(
+        window: &mut pramin::PraminWindow<'_>,
+        vfn: Vfn,
+        pte_table: VramAddress,
+    ) -> Result<WalkResult> {
+        let va = VirtualAddress::from(vfn);
+        let pte_level = M::PTE_LEVEL;
+        let pte_idx = M::level_index(va, pte_level.as_index());
+        let pte_addr = Self::entry_addr(pte_table, pte_level, pte_idx);
+        let pte = M::Pte::read(window, pte_addr)?;
+
+        if pte.is_valid() {
+            return Ok(WalkResult::Mapped {
+                pte_addr,
+                pfn: pte.frame_number(),
+            });
+        }
+        Ok(WalkResult::Unmapped { pte_addr })
+    }
+}
+
+macro_rules! pt_walk_dispatch {
+    ($self:expr, $method:ident ( $($arg:expr),* $(,)? )) => {
+        match $self {
+            PtWalk::V2(inner) => inner.$method($($arg),*),
+            PtWalk::V3(inner) => inner.$method($($arg),*),
+        }
+    };
+}
+
+/// Page table walker dispatch.
+pub(in crate::mm) enum PtWalk {
+    /// MMU v2 (Turing/Ampere/Ada).
+    V2(PtWalkInner<MmuV2>),
+    /// MMU v3 (Hopper+).
+    V3(PtWalkInner<MmuV3>),
+}
+
+impl PtWalk {
+    /// Create a new page table walker for the given MMU version.
+    pub(in crate::mm) fn new(pdb_addr: VramAddress, version: MmuVersion) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(PtWalkInner::<MmuV2>::new(pdb_addr)),
+            MmuVersion::V3 => Self::V3(PtWalkInner::<MmuV3>::new(pdb_addr)),
+        }
+    }
+
+    /// Walk to PTE for lookup.
+    pub(in crate::mm) fn walk_to_pte(&self, mm: &GpuMm, vfn: Vfn) -> Result<WalkResult> {
+        pt_walk_dispatch!(self, walk_to_pte_lookup(mm, vfn))
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 17/20] gpu: nova-core: Add BAR1 aperture type and size constant
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add BAR1_SIZE constant and Bar1 type alias for the 256MB BAR1 aperture.
These are prerequisites for BAR1 memory access functionality.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs          | 2 ++
 drivers/gpu/nova-core/gsp/commands.rs    | 4 ++++
 drivers/gpu/nova-core/gsp/fw/commands.rs | 8 ++++++++
 3 files changed, 14 insertions(+)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 84b0e1703150..597343d5da54 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -47,6 +47,8 @@ pub(crate) struct NovaCore {
 const GPU_DMA_BITS: u32 = 47;
 
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
+#[expect(dead_code)]
+pub(crate) type Bar1 = pci::Bar;
 
 kernel::pci_device_table!(
     PCI_TABLE,
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index eeecf81a0ffd..9bf0d32c6a7f 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -193,6 +193,9 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
 /// The reply from the GSP to the [`GetGspStaticInfo`] command.
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
+    /// BAR1 Page Directory Entry base address.
+    #[expect(dead_code)]
+    pub(crate) bar1_pde_base: u64,
     /// Usable FB (VRAM) region for driver memory allocation.
     pub(crate) usable_fb_region: Range<u64>,
     /// End of VRAM.
@@ -212,6 +215,7 @@ fn read(
 
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
+            bar1_pde_base: msg.bar1_pde_base(),
             usable_fb_region: msg.first_usable_fb_region().ok_or(ENODEV)?,
             total_fb_end,
         })
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index f2d59aa3131f..ded6470df214 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -127,6 +127,14 @@ impl GspStaticConfigInfo {
         self.0.gpuNameString
     }
 
+    /// Returns the BAR1 Page Directory Entry base address.
+    ///
+    /// This is the root page table address for BAR1 virtual memory,
+    /// set up by GSP-RM firmware.
+    pub(crate) fn bar1_pde_base(&self) -> u64 {
+        self.0.bar1PdeBase
+    }
+
     /// Returns an iterator over valid FB regions from GSP firmware data.
     fn fb_regions(
         &self,
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 14/20] gpu: nova-core: mm: Add Virtual Memory Manager
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add the Virtual Memory Manager (VMM) infrastructure for GPU address
space management. Each Vmm instance manages a single address space
identified by its Page Directory Base (PDB) address, used for Channel,
BAR1 and BAR2 mappings.

Mapping APIs and virtual address range tracking are added in later
commits.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm.rs     |  1 +
 drivers/gpu/nova-core/mm/vmm.rs | 63 +++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/vmm.rs

diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
index af398e94dd16..87fd6f0b956e 100644
--- a/drivers/gpu/nova-core/mm.rs
+++ b/drivers/gpu/nova-core/mm.rs
@@ -27,6 +27,7 @@ fn from(pfn: Pfn) -> Self {
 pub(super) mod pagetable;
 pub(crate) mod pramin;
 pub(super) mod tlb;
+pub(super) mod vmm;
 
 use kernel::{
     bitfield,
diff --git a/drivers/gpu/nova-core/mm/vmm.rs b/drivers/gpu/nova-core/mm/vmm.rs
new file mode 100644
index 000000000000..d92495a4579d
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/vmm.rs
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Virtual Memory Manager for NVIDIA GPU page table management.
+//!
+//! The [`Vmm`] provides high-level page mapping and unmapping operations for GPU
+//! virtual address spaces (Channels, BAR1, BAR2). It wraps the page table walker
+//! and handles TLB flushing after modifications.
+
+use kernel::{
+    gpu::buddy::AllocatedBlocks,
+    prelude::*, //
+};
+
+use crate::mm::{
+    pagetable::{
+        walk::{PtWalk, WalkResult},
+        MmuVersion, //
+    },
+    GpuMm,
+    Pfn,
+    Vfn,
+    VramAddress, //
+};
+
+/// Virtual Memory Manager for a GPU address space.
+///
+/// Each [`Vmm`] instance manages a single address space identified by its Page
+/// Directory Base (`PDB`) address. The [`Vmm`] is used for Channel, BAR1 and
+/// BAR2 mappings.
+pub(crate) struct Vmm {
+    /// Page Directory Base address for this address space.
+    pdb_addr: VramAddress,
+    /// MMU version used for page table layout.
+    mmu_version: MmuVersion,
+    /// Page table allocations required for mappings.
+    page_table_allocs: KVec<Pin<KBox<AllocatedBlocks>>>,
+}
+
+impl Vmm {
+    /// Create a new [`Vmm`] for the given Page Directory Base address.
+    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Result<Self> {
+        // Only MMU v2 is supported for now.
+        if mmu_version != MmuVersion::V2 {
+            return Err(ENOTSUPP);
+        }
+
+        Ok(Self {
+            pdb_addr,
+            mmu_version,
+            page_table_allocs: KVec::new(),
+        })
+    }
+
+    /// Read the [`Pfn`] for a mapped [`Vfn`] if one is mapped.
+    pub(super) fn read_mapping(&self, mm: &GpuMm, vfn: Vfn) -> Result<Option<Pfn>> {
+        let walker = PtWalk::new(self.pdb_addr, self.mmu_version);
+
+        match walker.walk_to_pte_lookup(mm, vfn)? {
+            WalkResult::Mapped { pfn, .. } => Ok(Some(pfn)),
+            WalkResult::Unmapped { .. } | WalkResult::PageTableMissing => Ok(None),
+        }
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 15/20] gpu: nova-core: mm: Add virtual address range tracking to VMM
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add virtual address range tracking to the VMM using a buddy allocator.
This enables contiguous virtual address range allocation for mappings.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/vmm.rs | 97 +++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/nova-core/mm/vmm.rs b/drivers/gpu/nova-core/mm/vmm.rs
index d92495a4579d..0ff71119708d 100644
--- a/drivers/gpu/nova-core/mm/vmm.rs
+++ b/drivers/gpu/nova-core/mm/vmm.rs
@@ -7,19 +7,35 @@
 //! and handles TLB flushing after modifications.
 
 use kernel::{
-    gpu::buddy::AllocatedBlocks,
-    prelude::*, //
+    gpu::buddy::{
+        AllocatedBlocks,
+        GpuBuddy,
+        GpuBuddyAllocFlag,
+        GpuBuddyAllocMode,
+        GpuBuddyParams, //
+    },
+    prelude::*,
+    ptr::Alignment,
+    sizes::SZ_4K, //
 };
 
-use crate::mm::{
-    pagetable::{
-        walk::{PtWalk, WalkResult},
-        MmuVersion, //
+use core::ops::Range;
+
+use crate::{
+    mm::{
+        pagetable::{
+            walk::{PtWalk, WalkResult},
+            MmuVersion, //
+        },
+        GpuMm,
+        Pfn,
+        Vfn,
+        VramAddress,
+        PAGE_SIZE, //
+    },
+    num::{
+        IntoSafeCast, //
     },
-    GpuMm,
-    Pfn,
-    Vfn,
-    VramAddress, //
 };
 
 /// Virtual Memory Manager for a GPU address space.
@@ -34,23 +50,82 @@ pub(crate) struct Vmm {
     mmu_version: MmuVersion,
     /// Page table allocations required for mappings.
     page_table_allocs: KVec<Pin<KBox<AllocatedBlocks>>>,
+    /// Buddy allocator for virtual address range tracking.
+    virt_buddy: GpuBuddy,
 }
 
 impl Vmm {
     /// Create a new [`Vmm`] for the given Page Directory Base address.
-    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Result<Self> {
+    ///
+    /// The [`Vmm`] will manage a virtual address space of `va_size` bytes.
+    pub(crate) fn new(
+        pdb_addr: VramAddress,
+        mmu_version: MmuVersion,
+        va_size: u64,
+    ) -> Result<Self> {
         // Only MMU v2 is supported for now.
         if mmu_version != MmuVersion::V2 {
             return Err(ENOTSUPP);
         }
 
+        let virt_buddy = GpuBuddy::new(GpuBuddyParams {
+            base_offset: 0,
+            size: va_size,
+            chunk_size: Alignment::new::<SZ_4K>(),
+        })?;
+
         Ok(Self {
             pdb_addr,
             mmu_version,
             page_table_allocs: KVec::new(),
+            virt_buddy,
         })
     }
 
+    /// Allocate a contiguous virtual frame number range.
+    ///
+    /// # Arguments
+    ///
+    /// - `num_pages`: Number of pages to allocate.
+    /// - `va_range`: `None` = allocate anywhere, `Some(range)` = constrain allocation to the given
+    ///   range.
+    fn alloc_vfn_range(
+        &self,
+        num_pages: usize,
+        va_range: Option<Range<u64>>,
+    ) -> Result<(Vfn, Pin<KBox<AllocatedBlocks>>)> {
+        let num_pages: u64 = num_pages.into_safe_cast();
+        let page_size: u64 = PAGE_SIZE.into_safe_cast();
+        let size: u64 = num_pages.checked_mul(page_size).ok_or(EOVERFLOW)?;
+
+        let mode = match va_range {
+            Some(r) => {
+                let range_size = r.end.checked_sub(r.start).ok_or(EOVERFLOW)?;
+                if range_size != size {
+                    return Err(EINVAL);
+                }
+                GpuBuddyAllocMode::Range(r)
+            }
+            None => GpuBuddyAllocMode::Simple,
+        };
+
+        let alloc = KBox::pin_init(
+            self.virt_buddy.alloc_blocks(
+                mode,
+                size,
+                Alignment::new::<SZ_4K>(),
+                GpuBuddyAllocFlag::Contiguous,
+            ),
+            GFP_KERNEL,
+        )?;
+
+        // Get the starting offset of the first block (only block as range is contiguous).
+        let offset = alloc.iter().next().ok_or(ENOMEM)?.offset();
+        let vfn = Vfn::new(offset / page_size);
+
+        Ok((vfn, alloc))
+    }
+
     /// Read the [`Pfn`] for a mapped [`Vfn`] if one is mapped.
     pub(super) fn read_mapping(&self, mm: &GpuMm, vfn: Vfn) -> Result<Option<Pfn>> {
         let walker = PtWalk::new(self.pdb_addr, self.mmu_version);
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 16/20] gpu: nova-core: mm: Add multi-page mapping API to VMM
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add the page table mapping and unmapping API to the Virtual Memory
Manager, implementing a two-phase prepare/execute model suitable for
use both inside and outside the DMA fence signalling critical path.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable.rs     |   1 +
 drivers/gpu/nova-core/mm/pagetable/map.rs | 338 ++++++++++++++++++++++
 drivers/gpu/nova-core/mm/vmm.rs           | 217 ++++++++++++--
 3 files changed, 537 insertions(+), 19 deletions(-)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/map.rs

diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
index b7e0e8e02905..4070070922a4 100644
--- a/drivers/gpu/nova-core/mm/pagetable.rs
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -8,6 +8,7 @@
 
 #![expect(dead_code)]
 
+pub(super) mod map;
 pub(super) mod ver2;
 pub(super) mod ver3;
 pub(super) mod walk;
diff --git a/drivers/gpu/nova-core/mm/pagetable/map.rs b/drivers/gpu/nova-core/mm/pagetable/map.rs
new file mode 100644
index 000000000000..a9719580143e
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/map.rs
@@ -0,0 +1,338 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Page table mapping operations for NVIDIA GPUs.
+
+use core::marker::PhantomData;
+
+use kernel::{
+    gpu::buddy::{
+        AllocatedBlocks,
+        GpuBuddyAllocFlags,
+        GpuBuddyAllocMode, //
+    },
+    prelude::*,
+    ptr::Alignment,
+    rbtree::{RBTree, RBTreeNode},
+    sizes::SZ_4K, //
+};
+
+use super::{
+    walk::{
+        PtWalkInner,
+        WalkPdeResult,
+        WalkResult, //
+    },
+    DualPdeOps,
+    MmuConfig,
+    MmuV2,
+    MmuV3,
+    MmuVersion,
+    PageTableLevel,
+    PdeOps,
+    PteOps, //
+};
+use crate::{
+    mm::{
+        GpuMm,
+        Pfn,
+        Vfn,
+        VramAddress,
+        PAGE_SIZE, //
+    },
+    num::{
+        IntoSafeCast, //
+    },
+};
+
+/// A pre-allocated and zeroed page table page.
+///
+/// Created during the mapping prepare phase and consumed during the execute phase.
+/// Stored in an [`RBTree`] keyed by the PDE slot address (`install_addr`).
+pub(in crate::mm) struct PreparedPtPage {
+    /// The allocated and zeroed page table page.
+    pub(in crate::mm) alloc: Pin<KBox<AllocatedBlocks>>,
+    /// Page table level -- needed to determine if this PT page is for a dual PDE.
+    pub(in crate::mm) level: PageTableLevel,
+}
+
+/// Page table mapper.
+pub(in crate::mm) struct PtMapInner<M: MmuConfig> {
+    walker: PtWalkInner<M>,
+    pdb_addr: VramAddress,
+    _phantom: PhantomData<M>,
+}
+
+impl<M: MmuConfig> PtMapInner<M> {
+    /// Create a new [`PtMapInner`].
+    pub(super) fn new(pdb_addr: VramAddress) -> Self {
+        Self {
+            walker: PtWalkInner::<M>::new(pdb_addr),
+            pdb_addr,
+            _phantom: PhantomData,
+        }
+    }
+
+    /// Allocate and zero a physical page table page.
+    fn alloc_and_zero_page(mm: &GpuMm, level: PageTableLevel) -> Result<PreparedPtPage> {
+        let blocks = KBox::pin_init(
+            mm.buddy().alloc_blocks(
+                GpuBuddyAllocMode::Simple,
+                SZ_4K.into_safe_cast(),
+                Alignment::new::<SZ_4K>(),
+                GpuBuddyAllocFlags::default(),
+            ),
+            GFP_KERNEL,
+        )?;
+
+        let page_vram = VramAddress::new(blocks.iter().next().ok_or(ENOMEM)?.offset());
+
+        // Zero via PRAMIN.
+        let mut window = mm.pramin().get_window()?;
+        let base = page_vram.raw();
+        for off in (0..PAGE_SIZE).step_by(8) {
+            window.try_write64(base + off, 0)?;
+        }
+
+        Ok(PreparedPtPage {
+            alloc: blocks,
+            level,
+        })
+    }
+
+    /// Ensure all intermediate page table pages exist for a single VFN.
+    ///
+    /// PRAMIN is released before each allocation and re-acquired after. Memory
+    /// allocations are done outside of holding this lock to prevent deadlocks with
+    /// the fence signalling critical path.
+    fn ensure_single_pte_path(
+        &self,
+        mm: &GpuMm,
+        vfn: Vfn,
+        pt_pages: &mut RBTree<VramAddress, PreparedPtPage>,
+    ) -> Result {
+        let max_iter = 2 * M::PDE_LEVELS.len();
+
+        for _ in 0..max_iter {
+            let mut window = mm.pramin().get_window()?;
+
+            let result = self
+                .walker
+                .walk_pde_levels(&mut window, vfn, |install_addr| {
+                    pt_pages
+                        .get(&install_addr)
+                        .and_then(|p| p.alloc.iter().next().map(|b| VramAddress::new(b.offset())))
+                })?;
+
+            match result {
+                WalkPdeResult::Complete { .. } => {
+                    return Ok(());
+                }
+                WalkPdeResult::Missing {
+                    install_addr,
+                    level,
+                } => {
+                    // Drop PRAMIN before allocation.
+                    drop(window);
+                    let page = Self::alloc_and_zero_page(mm, level)?;
+                    let node = RBTreeNode::new(install_addr, page, GFP_KERNEL)?;
+                    let old = pt_pages.insert(node);
+                    if old.is_some() {
+                        kernel::pr_warn_once!(
+                            "VMM: duplicate install_addr in pt_pages (internal consistency error)\n"
+                        );
+                        return Err(EIO);
+                    }
+                }
+            }
+        }
+
+        kernel::pr_warn!(
+            "VMM: ensure_pte_path: loop exhausted after {} iters (VFN {:?})\n",
+            max_iter,
+            vfn
+        );
+        Err(EIO)
+    }
+
+    /// Prepare page table resources for mapping `num_pages` pages starting at `vfn_start`.
+    ///
+    /// Reserves capacity in `page_table_allocs`, then walks the hierarchy
+    /// per-VFN to prepare pages for all missing PDEs.
+    pub(super) fn prepare_map(
+        &self,
+        mm: &GpuMm,
+        vfn_start: Vfn,
+        num_pages: usize,
+        page_table_allocs: &mut KVec<Pin<KBox<AllocatedBlocks>>>,
+        pt_pages: &mut RBTree<VramAddress, PreparedPtPage>,
+    ) -> Result {
+        // Pre-reserve so install_mappings() can use push_within_capacity (no alloc
+        // in fence signalling critical path).
+        let pt_upper_bound = M::pt_pages_upper_bound(num_pages);
+        page_table_allocs.reserve(pt_upper_bound, GFP_KERNEL)?;
+
+        // Walk the hierarchy per-VFN to prepare pages for all missing PDEs.
+        for i in 0..num_pages {
+            let i_u64: u64 = i.into_safe_cast();
+            let vfn = Vfn::new(vfn_start.raw() + i_u64);
+            self.ensure_single_pte_path(mm, vfn, pt_pages)?;
+        }
+        Ok(())
+    }
+
+    /// Install prepared PDEs and write PTEs, then flush TLB.
+    ///
+    /// Drains `pt_pages` and moves allocations into `page_table_allocs`.
+    pub(super) fn install_mappings(
+        &self,
+        mm: &GpuMm,
+        pt_pages: &mut RBTree<VramAddress, PreparedPtPage>,
+        page_table_allocs: &mut KVec<Pin<KBox<AllocatedBlocks>>>,
+        vfn_start: Vfn,
+        pfns: &[Pfn],
+        writable: bool,
+    ) -> Result {
+        let mut window = mm.pramin().get_window()?;
+
+        // Drain prepared PT pages, install all pending PDEs.
+        let mut cursor = pt_pages.cursor_front_mut();
+        while let Some(c) = cursor {
+            let (next, node) = c.remove_current();
+            let (install_addr, page) = node.to_key_value();
+            let page_vram = VramAddress::new(page.alloc.iter().next().ok_or(ENOMEM)?.offset());
+
+            if page.level == M::DUAL_PDE_LEVEL {
+                let new_dpde = M::DualPde::new_small(Pfn::from(page_vram));
+                new_dpde.write(&mut window, install_addr)?;
+            } else {
+                let new_pde = M::Pde::new_vram(Pfn::from(page_vram));
+                new_pde.write(&mut window, install_addr)?;
+            }
+
+            page_table_allocs
+                .push_within_capacity(page.alloc)
+                .map_err(|_| ENOMEM)?;
+
+            cursor = next;
+        }
+
+        // Write PTEs (all PDEs now installed in HW).
+        for (i, &pfn) in pfns.iter().enumerate() {
+            let i_u64: u64 = i.into_safe_cast();
+            let vfn = Vfn::new(vfn_start.raw() + i_u64);
+            let result = self
+                .walker
+                .walk_to_pte_lookup_with_window(&mut window, vfn)?;
+
+            match result {
+                WalkResult::Unmapped { pte_addr } | WalkResult::Mapped { pte_addr, .. } => {
+                    let pte = M::Pte::new_vram(pfn, writable);
+                    pte.write(&mut window, pte_addr)?;
+                }
+                WalkResult::PageTableMissing => {
+                    kernel::pr_warn_once!("VMM: page table missing for VFN {vfn:?}\n");
+                    return Err(EIO);
+                }
+            }
+        }
+
+        drop(window);
+
+        // Flush TLB.
+        mm.tlb().flush(self.pdb_addr)
+    }
+
+    /// Invalidate PTEs for a range and flush TLB.
+    pub(super) fn invalidate_ptes(&self, mm: &GpuMm, vfn_start: Vfn, num_pages: usize) -> Result {
+        let invalid_pte = M::Pte::invalid();
+
+        let mut window = mm.pramin().get_window()?;
+        for i in 0..num_pages {
+            let i_u64: u64 = i.into_safe_cast();
+            let vfn = Vfn::new(vfn_start.raw() + i_u64);
+            let result = self
+                .walker
+                .walk_to_pte_lookup_with_window(&mut window, vfn)?;
+
+            match result {
+                WalkResult::Mapped { pte_addr, .. } | WalkResult::Unmapped { pte_addr } => {
+                    invalid_pte.write(&mut window, pte_addr)?;
+                }
+                WalkResult::PageTableMissing => {
+                    continue;
+                }
+            }
+        }
+        drop(window);
+
+        mm.tlb().flush(self.pdb_addr)
+    }
+}
+
+macro_rules! pt_map_dispatch {
+    ($self:expr, $method:ident ( $($arg:expr),* $(,)? )) => {
+        match $self {
+            PtMap::V2(inner) => inner.$method($($arg),*),
+            PtMap::V3(inner) => inner.$method($($arg),*),
+        }
+    };
+}
+
+/// Page table mapper dispatch.
+pub(in crate::mm) enum PtMap {
+    /// MMU v2 (Turing/Ampere/Ada).
+    V2(PtMapInner<MmuV2>),
+    /// MMU v3 (Hopper+).
+    V3(PtMapInner<MmuV3>),
+}
+
+impl PtMap {
+    /// Create a new page table mapper for the given MMU version.
+    pub(in crate::mm) fn new(pdb_addr: VramAddress, version: MmuVersion) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(PtMapInner::<MmuV2>::new(pdb_addr)),
+            MmuVersion::V3 => Self::V3(PtMapInner::<MmuV3>::new(pdb_addr)),
+        }
+    }
+
+    /// Prepare page table resources for a mapping.
+    pub(in crate::mm) fn prepare_map(
+        &self,
+        mm: &GpuMm,
+        vfn_start: Vfn,
+        num_pages: usize,
+        page_table_allocs: &mut KVec<Pin<KBox<AllocatedBlocks>>>,
+        pt_pages: &mut RBTree<VramAddress, PreparedPtPage>,
+    ) -> Result {
+        pt_map_dispatch!(
+            self,
+            prepare_map(mm, vfn_start, num_pages, page_table_allocs, pt_pages)
+        )
+    }
+
+    /// Install prepared PDEs and write PTEs, then flush TLB.
+    pub(in crate::mm) fn install_mappings(
+        &self,
+        mm: &GpuMm,
+        pt_pages: &mut RBTree<VramAddress, PreparedPtPage>,
+        page_table_allocs: &mut KVec<Pin<KBox<AllocatedBlocks>>>,
+        vfn_start: Vfn,
+        pfns: &[Pfn],
+        writable: bool,
+    ) -> Result {
+        pt_map_dispatch!(
+            self,
+            install_mappings(mm, pt_pages, page_table_allocs, vfn_start, pfns, writable)
+        )
+    }
+
+    /// Invalidate PTEs for a range and flush TLB.
+    pub(in crate::mm) fn invalidate_ptes(
+        &self,
+        mm: &GpuMm,
+        vfn_start: Vfn,
+        num_pages: usize,
+    ) -> Result {
+        pt_map_dispatch!(self, invalidate_ptes(mm, vfn_start, num_pages))
+    }
+}
diff --git a/drivers/gpu/nova-core/mm/vmm.rs b/drivers/gpu/nova-core/mm/vmm.rs
index 0ff71119708d..4109d413e1b7 100644
--- a/drivers/gpu/nova-core/mm/vmm.rs
+++ b/drivers/gpu/nova-core/mm/vmm.rs
@@ -3,8 +3,7 @@
 //! Virtual Memory Manager for NVIDIA GPU page table management.
 //!
 //! The [`Vmm`] provides high-level page mapping and unmapping operations for GPU
-//! virtual address spaces (Channels, BAR1, BAR2). It wraps the page table walker
-//! and handles TLB flushing after modifications.
+//! virtual address spaces (Channels, BAR1, BAR2).
 
 use kernel::{
     gpu::buddy::{
@@ -16,15 +15,25 @@
     },
     prelude::*,
     ptr::Alignment,
+    rbtree::RBTree,
     sizes::SZ_4K, //
 };
 
-use core::ops::Range;
+use core::{
+    cell::Cell,
+    ops::Range, //
+};
 
 use crate::{
     mm::{
         pagetable::{
-            walk::{PtWalk, WalkResult},
+            map::{
+                PtMap, //
+            },
+            walk::{
+                PtWalk,
+                WalkResult, //
+            },
             MmuVersion, //
         },
         GpuMm,
@@ -38,20 +47,77 @@
     },
 };
 
+/// Multi-page prepared mapping -- VA range allocated, ready for execute.
+///
+/// Produced by [`Vmm::prepare_map()`], consumed by [`Vmm::execute_map()`].
+/// The struct owns the VA space allocation between prepare and execute phases.
+pub(crate) struct PreparedMapping {
+    vfn_start: Vfn,
+    num_pages: usize,
+    vfn_alloc: Pin<KBox<AllocatedBlocks>>,
+}
+
+/// Result of a mapping operation -- tracks the active mapped range.
+///
+/// Returned by [`Vmm::execute_map()`] and [`Vmm::map_pages()`].
+/// Owns the VA allocation; the VA range is freed when this is dropped.
+/// Callers must call [`Vmm::unmap_pages()`] before dropping to invalidate
+/// PTEs (dropping only frees the VA range, not the PTE entries).
+pub(crate) struct MappedRange {
+    pub(super) vfn_start: Vfn,
+    pub(super) num_pages: usize,
+    /// VA allocation -- freed when [`MappedRange`] is dropped.
+    _vfn_alloc: Pin<KBox<AllocatedBlocks>>,
+    /// Logs a warning if dropped without unmapping.
+    _drop_guard: MustUnmapGuard,
+}
+
+/// Guard that logs a warning once if a [`MappedRange`] is dropped without
+/// calling [`Vmm::unmap_pages()`].
+struct MustUnmapGuard {
+    armed: Cell<bool>,
+}
+
+impl MustUnmapGuard {
+    const fn new() -> Self {
+        Self {
+            armed: Cell::new(true),
+        }
+    }
+
+    fn disarm(&self) {
+        self.armed.set(false);
+    }
+}
+
+impl Drop for MustUnmapGuard {
+    fn drop(&mut self) {
+        if self.armed.get() {
+            kernel::pr_warn!("MappedRange dropped without calling unmap_pages()\n");
+        }
+    }
+}
+
 /// Virtual Memory Manager for a GPU address space.
 ///
 /// Each [`Vmm`] instance manages a single address space identified by its Page
-/// Directory Base (`PDB`) address. The [`Vmm`] is used for Channel, BAR1 and
-/// BAR2 mappings.
+/// Directory Base (`PDB`) address. Used for Channel, BAR1 and BAR2 mappings.
 pub(crate) struct Vmm {
     /// Page Directory Base address for this address space.
     pdb_addr: VramAddress,
-    /// MMU version used for page table layout.
-    mmu_version: MmuVersion,
+    /// Page table walker for reading existing mappings.
+    pt_walk: PtWalk,
+    /// Page table mapper for prepare/execute operations.
+    pt_map: PtMap,
     /// Page table allocations required for mappings.
     page_table_allocs: KVec<Pin<KBox<AllocatedBlocks>>>,
     /// Buddy allocator for virtual address range tracking.
     virt_buddy: GpuBuddy,
+    /// Prepared PT pages pending PDE installation, keyed by `install_addr`.
+    ///
+    /// Populated during prepare phase and drained in execute phase. Shared by all
+    /// pending maps, preventing races on the same PDE slot.
+    pt_pages: RBTree<VramAddress, super::pagetable::map::PreparedPtPage>,
 }
 
 impl Vmm {
@@ -76,19 +142,15 @@ pub(crate) fn new(
 
         Ok(Self {
             pdb_addr,
-            mmu_version,
+            pt_walk: PtWalk::new(pdb_addr, mmu_version),
+            pt_map: PtMap::new(pdb_addr, mmu_version),
             page_table_allocs: KVec::new(),
             virt_buddy,
+            pt_pages: RBTree::new(),
         })
     }
 
     /// Allocate a contiguous virtual frame number range.
-    ///
-    /// # Arguments
-    ///
-    /// - `num_pages`: Number of pages to allocate.
-    /// - `va_range`: `None` = allocate anywhere, `Some(range)` = constrain allocation to the given
-    ///   range.
     fn alloc_vfn_range(
         &self,
         num_pages: usize,
@@ -119,7 +181,6 @@ fn alloc_vfn_range(
             GFP_KERNEL,
         )?;
 
-        // Get the starting offset of the first block (only block as range is contiguous).
         let offset = alloc.iter().next().ok_or(ENOMEM)?.offset();
         let vfn = Vfn::new(offset / page_size);
 
@@ -128,11 +189,129 @@ fn alloc_vfn_range(
 
     /// Read the [`Pfn`] for a mapped [`Vfn`] if one is mapped.
     pub(super) fn read_mapping(&self, mm: &GpuMm, vfn: Vfn) -> Result<Option<Pfn>> {
-        let walker = PtWalk::new(self.pdb_addr, self.mmu_version);
-
-        match walker.walk_to_pte_lookup(mm, vfn)? {
+        match self.pt_walk.walk_to_pte(mm, vfn)? {
             WalkResult::Mapped { pfn, .. } => Ok(Some(pfn)),
             WalkResult::Unmapped { .. } | WalkResult::PageTableMissing => Ok(None),
         }
     }
+
+    /// Prepare resources for mapping `num_pages` pages.
+    ///
+    /// Allocates a contiguous VA range, then walks the hierarchy per-VFN to prepare pages
+    /// for all missing PDEs. Returns a [`PreparedMapping`] with the VA allocation.
+    ///
+    /// If `va_range` is not `None`, the VA range is constrained to the given range. Safe
+    /// to call outside the fence signalling critical path.
+    pub(crate) fn prepare_map(
+        &mut self,
+        mm: &GpuMm,
+        num_pages: usize,
+        va_range: Option<Range<u64>>,
+    ) -> Result<PreparedMapping> {
+        if num_pages == 0 {
+            return Err(EINVAL);
+        }
+
+        // Allocate contiguous VA range.
+        let (vfn_start, vfn_alloc) = self.alloc_vfn_range(num_pages, va_range)?;
+
+        self.pt_map.prepare_map(
+            mm,
+            vfn_start,
+            num_pages,
+            &mut self.page_table_allocs,
+            &mut self.pt_pages,
+        )?;
+
+        Ok(PreparedMapping {
+            vfn_start,
+            num_pages,
+            vfn_alloc,
+        })
+    }
+
+    /// Execute a prepared multi-page mapping.
+    ///
+    /// Installs all prepared PDEs and writes PTEs into the page table, then flushes TLB.
+    pub(crate) fn execute_map(
+        &mut self,
+        mm: &GpuMm,
+        prepared: PreparedMapping,
+        pfns: &[Pfn],
+        writable: bool,
+    ) -> Result<MappedRange> {
+        if pfns.len() != prepared.num_pages {
+            return Err(EINVAL);
+        }
+
+        let PreparedMapping {
+            vfn_start,
+            num_pages,
+            vfn_alloc,
+        } = prepared;
+
+        self.pt_map.install_mappings(
+            mm,
+            &mut self.pt_pages,
+            &mut self.page_table_allocs,
+            vfn_start,
+            pfns,
+            writable,
+        )?;
+
+        Ok(MappedRange {
+            vfn_start,
+            num_pages,
+            _vfn_alloc: vfn_alloc,
+            _drop_guard: MustUnmapGuard::new(),
+        })
+    }
+
+    /// Map pages doing prepare and execute in the same call.
+    ///
+    /// This is a convenience wrapper for callers outside the fence signalling critical
+    /// path (e.g., BAR mappings). For DRM usecases, [`Vmm::prepare_map()`] and
+    /// [`Vmm::execute_map()`] will be called separately.
+    pub(crate) fn map_pages(
+        &mut self,
+        mm: &GpuMm,
+        pfns: &[Pfn],
+        va_range: Option<Range<u64>>,
+        writable: bool,
+    ) -> Result<MappedRange> {
+        if pfns.is_empty() {
+            return Err(EINVAL);
+        }
+
+        // Check if provided VA range is sufficient (if provided).
+        if let Some(ref range) = va_range {
+            let required: u64 = pfns
+                .len()
+                .checked_mul(PAGE_SIZE)
+                .ok_or(EOVERFLOW)?
+                .into_safe_cast();
+            let available = range.end.checked_sub(range.start).ok_or(EINVAL)?;
+            if available < required {
+                return Err(EINVAL);
+            }
+        }
+
+        let prepared = self.prepare_map(mm, pfns.len(), va_range)?;
+        self.execute_map(mm, prepared, pfns, writable)
+    }
+
+    /// Unmap all pages in a [`MappedRange`] with a single TLB flush.
+    pub(crate) fn unmap_pages(&mut self, mm: &GpuMm, range: MappedRange) -> Result {
+        self.pt_map
+            .invalidate_ptes(mm, range.vfn_start, range.num_pages)?;
+
+        // TODO: Internal page table pages (PDE, PTE pages) are still kept around.
+        // This is by design as repeated maps/unmaps will be fast. As a future TODO,
+        // we can add a reclaimer here to reclaim if VRAM is short. For now, the PT
+        // pages are dropped once the `Vmm` is dropped.
+
+        // Unmap complete, safe to drop `MappedRange`.
+        range._drop_guard.disarm();
+        Ok(())
+    }
 }
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 18/20] gpu: nova-core: mm: Add BAR1 user interface
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add the BAR1 user interface for CPU access to GPU virtual memory through
the BAR1 aperture.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs       |   1 -
 drivers/gpu/nova-core/gpu.rs          |  21 +++-
 drivers/gpu/nova-core/gsp/commands.rs |   1 -
 drivers/gpu/nova-core/mm.rs           |   1 +
 drivers/gpu/nova-core/mm/bar_user.rs  | 152 ++++++++++++++++++++++++++
 5 files changed, 173 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/nova-core/mm/bar_user.rs

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 597343d5da54..e78a682a7f2a 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -47,7 +47,6 @@ pub(crate) struct NovaCore {
 const GPU_DMA_BITS: u32 = 47;
 
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
-#[expect(dead_code)]
 pub(crate) type Bar1 = pci::Bar;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index c49fa9c380b8..1cd0f147994b 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -28,7 +28,12 @@
         commands::GetGspStaticInfoReply,
         Gsp, //
     },
-    mm::GpuMm,
+    mm::{
+        bar_user::BarUser,
+        pagetable::MmuVersion,
+        GpuMm,
+        VramAddress, //
+    },
     regs,
 };
 
@@ -122,6 +127,11 @@ pub(crate) const fn arch(self) -> Architecture {
     pub(crate) const fn needs_fwsec_bootloader(self) -> bool {
         matches!(self.arch(), Architecture::Turing) || matches!(self, Self::GA100)
     }
+
+    /// Returns the MMU version for this chipset.
+    pub(crate) fn mmu_version(self) -> MmuVersion {
+        MmuVersion::from(self.arch())
+    }
 }
 
 // TODO
@@ -250,6 +260,8 @@ pub(crate) struct Gpu {
     gsp: Gsp,
     /// Static GPU information from GSP.
     gsp_static_info: GetGspStaticInfoReply,
+    /// BAR1 user interface for CPU access to GPU virtual memory.
+    bar_user: BarUser,
 }
 
 impl Gpu {
@@ -308,6 +320,13 @@ pub(crate) fn new<'a>(
                 }, pramin_vram_region)?
             },
 
+            // Create BAR1 user interface for CPU access to GPU virtual memory.
+            bar_user: {
+                let pdb_addr = VramAddress::new(gsp_static_info.bar1_pde_base);
+                let bar1_size = pdev.resource_len(1)?;
+                BarUser::new(pdb_addr, spec.chipset, bar1_size)?
+            },
+
             bar: devres_bar,
         })
     }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 9bf0d32c6a7f..32df0fe4b9c2 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -194,7 +194,6 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
     /// BAR1 Page Directory Entry base address.
-    #[expect(dead_code)]
     pub(crate) bar1_pde_base: u64,
     /// Usable FB (VRAM) region for driver memory allocation.
     pub(crate) usable_fb_region: Range<u64>,
diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
index 87fd6f0b956e..033e365aa4e1 100644
--- a/drivers/gpu/nova-core/mm.rs
+++ b/drivers/gpu/nova-core/mm.rs
@@ -24,6 +24,7 @@ fn from(pfn: Pfn) -> Self {
     };
 }
 
+pub(crate) mod bar_user;
 pub(super) mod pagetable;
 pub(crate) mod pramin;
 pub(super) mod tlb;
diff --git a/drivers/gpu/nova-core/mm/bar_user.rs b/drivers/gpu/nova-core/mm/bar_user.rs
new file mode 100644
index 000000000000..5f7c0e9e51f9
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/bar_user.rs
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! BAR1 user interface for CPU access to GPU virtual memory. Used for USERD
+//! for GPU work submission, and applications to access GPU buffers via mmap().
+
+use kernel::{
+    io::Io,
+    prelude::*, //
+};
+
+use crate::{
+    driver::Bar1,
+    gpu::Chipset,
+    mm::{
+        vmm::{
+            MappedRange,
+            Vmm, //
+        },
+        GpuMm,
+        Pfn,
+        Vfn,
+        VirtualAddress,
+        VramAddress,
+        PAGE_SIZE, //
+    },
+    num::IntoSafeCast,
+};
+
+/// BAR1 user interface for virtual memory mappings.
+///
+/// Owns a [`Vmm`] instance with virtual address tracking and provides
+/// BAR1-specific mapping and cleanup operations.
+pub(crate) struct BarUser {
+    vmm: Vmm,
+}
+
+impl BarUser {
+    /// Create a new [`BarUser`] with virtual address tracking.
+    pub(crate) fn new(pdb_addr: VramAddress, chipset: Chipset, va_size: u64) -> Result<Self> {
+        Ok(Self {
+            vmm: Vmm::new(pdb_addr, chipset.mmu_version(), va_size)?,
+        })
+    }
+
+    /// Map physical pages to a contiguous BAR1 virtual range.
+    pub(crate) fn map<'a>(
+        &'a mut self,
+        mm: &'a GpuMm,
+        bar: &'a Bar1,
+        pfns: &[Pfn],
+        writable: bool,
+    ) -> Result<BarAccess<'a>> {
+        if pfns.is_empty() {
+            return Err(EINVAL);
+        }
+
+        let mapped = self.vmm.map_pages(mm, pfns, None, writable)?;
+
+        Ok(BarAccess {
+            vmm: &mut self.vmm,
+            mm,
+            bar,
+            mapped: Some(mapped),
+        })
+    }
+}
+
+/// Access object for a mapped BAR1 region.
+///
+/// Wraps a [`MappedRange`] and provides BAR1 access. When dropped,
+/// unmaps pages and releases the VA range (by passing the range to
+/// [`Vmm::unmap_pages()`], which consumes it).
+pub(crate) struct BarAccess<'a> {
+    vmm: &'a mut Vmm,
+    mm: &'a GpuMm,
+    bar: &'a Bar1,
+    /// Needs to be an `Option` so that we can `take()` it and call `Drop`
+    /// on it in [`Vmm::unmap_pages()`].
+    mapped: Option<MappedRange>,
+}
+
+impl<'a> BarAccess<'a> {
+    /// Returns the active mapping.
+    fn mapped(&self) -> &MappedRange {
+        // `mapped` is only `None` after `take()` in `Drop`; accessors are
+        // never called from within `Drop`, so `unwrap()` never panics.
+        self.mapped.as_ref().unwrap()
+    }
+
+    /// Get the base virtual address of this mapping.
+    pub(crate) fn base(&self) -> VirtualAddress {
+        VirtualAddress::from(self.mapped().vfn_start)
+    }
+
+    /// Get the total size of the mapped region in bytes.
+    pub(crate) fn size(&self) -> usize {
+        self.mapped().num_pages * PAGE_SIZE
+    }
+
+    /// Get the starting virtual frame number.
+    pub(crate) fn vfn_start(&self) -> Vfn {
+        self.mapped().vfn_start
+    }
+
+    /// Get the number of pages in this mapping.
+    pub(crate) fn num_pages(&self) -> usize {
+        self.mapped().num_pages
+    }
+
+    /// Translate an offset within this mapping to a BAR1 aperture offset.
+    fn bar_offset(&self, offset: usize) -> Result<usize> {
+        if offset >= self.size() {
+            return Err(EINVAL);
+        }
+
+        let base_vfn: usize = self.mapped().vfn_start.raw().into_safe_cast();
+        let base = base_vfn.checked_mul(PAGE_SIZE).ok_or(EOVERFLOW)?;
+        base.checked_add(offset).ok_or(EOVERFLOW)
+    }
+
+    // Fallible accessors with runtime bounds checking.
+
+    /// Read a 32-bit value at the given offset.
+    pub(crate) fn try_read32(&self, offset: usize) -> Result<u32> {
+        self.bar.try_read32(self.bar_offset(offset)?)
+    }
+
+    /// Write a 32-bit value at the given offset.
+    pub(crate) fn try_write32(&self, value: u32, offset: usize) -> Result {
+        self.bar.try_write32(value, self.bar_offset(offset)?)
+    }
+
+    /// Read a 64-bit value at the given offset.
+    pub(crate) fn try_read64(&self, offset: usize) -> Result<u64> {
+        self.bar.try_read64(self.bar_offset(offset)?)
+    }
+
+    /// Write a 64-bit value at the given offset.
+    pub(crate) fn try_write64(&self, value: u64, offset: usize) -> Result {
+        self.bar.try_write64(value, self.bar_offset(offset)?)
+    }
+}
+
+impl Drop for BarAccess<'_> {
+    fn drop(&mut self) {
+        if let Some(mapped) = self.mapped.take() {
+            if self.vmm.unmap_pages(self.mm, mapped).is_err() {
+                kernel::pr_warn_once!("BarAccess: unmap_pages failed.\n");
+            }
+        }
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 19/20] gpu: nova-core: mm: Add BAR1 memory management self-tests
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add self-tests for BAR1 access during driver probe when
CONFIG_NOVA_MM_SELFTESTS is enabled (default disabled). This results in
testing the Vmm, GPU buddy allocator and BAR1 region all of which should
function correctly for the tests to pass.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig         |  10 ++
 drivers/gpu/nova-core/driver.rs       |   2 +
 drivers/gpu/nova-core/gpu.rs          |  31 ++++
 drivers/gpu/nova-core/mm/bar_user.rs  | 214 ++++++++++++++++++++++++++
 drivers/gpu/nova-core/mm/pagetable.rs |  28 ++++
 5 files changed, 285 insertions(+)

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index 6513007bf66f..35de55aabcfc 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -15,3 +15,13 @@ config NOVA_CORE
 	  This driver is work in progress and may not be functional.
 
 	  If M is selected, the module will be called nova_core.
+
+config NOVA_MM_SELFTESTS
+	bool "Memory management self-tests"
+	depends on NOVA_CORE
+	help
+	  Enable self-tests for the memory management subsystem. When enabled,
+	  tests are run during GPU probe to verify PRAMIN aperture access,
+	  page table walking, and BAR1 virtual memory mapping functionality.
+
+	  This is a testing option and is default-disabled.
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index e78a682a7f2a..6f95f8672158 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -97,6 +97,8 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
 
             Ok(try_pin_init!(Self {
                 gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
+                // Run optional GPU selftests.
+                _: { gpu.run_selftests(pdev)? },
                 _reg <- auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 1cd0f147994b..8f236615cc13 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -341,4 +341,35 @@ pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
             .inspect(|bar| self.sysmem_flush.unregister(bar))
             .is_err());
     }
+
+    /// Run selftests on the constructed [`Gpu`].
+    pub(crate) fn run_selftests(
+        mut self: Pin<&mut Self>,
+        pdev: &pci::Device<device::Bound>,
+    ) -> Result {
+        self.as_mut().run_mm_selftests(pdev)?;
+        Ok(())
+    }
+
+    #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+    fn run_mm_selftests(self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
+        // BAR1 self-tests.
+        let bar1 = Arc::pin_init(pdev.iomap_region(1, c"nova-core/bar1"), GFP_KERNEL)?;
+        let bar1_access = bar1.access(pdev.as_ref())?;
+
+        crate::mm::bar_user::run_self_test(
+            pdev.as_ref(),
+            &self.mm,
+            bar1_access,
+            self.gsp_static_info.bar1_pde_base,
+            self.spec.chipset,
+        )?;
+
+        Ok(())
+    }
+
+    #[cfg(not(CONFIG_NOVA_MM_SELFTESTS))]
+    fn run_mm_selftests(self: Pin<&mut Self>, _pdev: &pci::Device<device::Bound>) -> Result {
+        Ok(())
+    }
 }
diff --git a/drivers/gpu/nova-core/mm/bar_user.rs b/drivers/gpu/nova-core/mm/bar_user.rs
index 5f7c0e9e51f9..8bccd8a8376b 100644
--- a/drivers/gpu/nova-core/mm/bar_user.rs
+++ b/drivers/gpu/nova-core/mm/bar_user.rs
@@ -150,3 +150,217 @@ fn drop(&mut self) {
         }
     }
 }
+
+/// Run MM subsystem self-tests during probe.
+///
+/// Tests page table infrastructure and `BAR1` MMIO access using the `BAR1`
+/// address space. Uses the `GpuMm`'s buddy allocator to allocate page tables
+/// and test pages as needed.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+pub(crate) fn run_self_test(
+    dev: &kernel::device::Device,
+    mm: &GpuMm,
+    bar1: &Bar1,
+    bar1_pdb: u64,
+    chipset: Chipset,
+) -> Result {
+    use kernel::gpu::buddy::{
+        GpuBuddyAllocFlags,
+        GpuBuddyAllocMode, //
+    };
+    use kernel::ptr::Alignment;
+    use kernel::sizes::{
+        SZ_16K,
+        SZ_32K,
+        SZ_4K,
+        SZ_64K, //
+    };
+
+    // Test patterns.
+    const PATTERN_PRAMIN: u32 = 0xDEAD_BEEF;
+    const PATTERN_BAR1: u32 = 0xCAFE_BABE;
+
+    dev_info!(dev, "MM: Starting self-test...\n");
+
+    let pdb_addr = VramAddress::new(bar1_pdb);
+
+    // Check if initial page tables are in VRAM.
+    if crate::mm::pagetable::check_pdb_valid(mm.pramin(), pdb_addr, chipset).is_err() {
+        dev_info!(dev, "MM: Self-test SKIPPED - no valid VRAM page tables\n");
+        return Ok(());
+    }
+
+    // Set up a test page from the buddy allocator.
+    let test_page_blocks = KBox::pin_init(
+        mm.buddy().alloc_blocks(
+            GpuBuddyAllocMode::Simple,
+            SZ_4K.into_safe_cast(),
+            Alignment::new::<SZ_4K>(),
+            GpuBuddyAllocFlags::default(),
+        ),
+        GFP_KERNEL,
+    )?;
+    let test_vram_offset = test_page_blocks.iter().next().ok_or(ENOMEM)?.offset();
+    let test_vram = VramAddress::new(test_vram_offset);
+    let test_pfn = Pfn::from(test_vram);
+
+    // Create a VMM of size 64K to track virtual memory mappings.
+    let mut vmm = Vmm::new(pdb_addr, chipset.mmu_version(), SZ_64K.into_safe_cast())?;
+
+    // Create a test mapping.
+    let mapped = vmm.map_pages(mm, &[test_pfn], None, true)?;
+    let test_vfn = mapped.vfn_start;
+
+    // Pre-compute test addresses for the PRAMIN to BAR1 read test.
+    let vfn_offset: usize = test_vfn.raw().into_safe_cast();
+    let bar1_base_offset = vfn_offset.checked_mul(PAGE_SIZE).ok_or(EOVERFLOW)?;
+    let bar1_read_offset: usize = bar1_base_offset + 0x100;
+    let vram_read_addr: usize = test_vram.raw() + 0x100;
+
+    // Test 1: Write via PRAMIN, read via BAR1.
+    {
+        let mut window = mm.pramin().get_window()?;
+        window.try_write32(vram_read_addr, PATTERN_PRAMIN)?;
+    }
+
+    // Read back via BAR1 aperture.
+    let bar1_value = bar1.try_read32(bar1_read_offset)?;
+
+    let test1_passed = if bar1_value == PATTERN_PRAMIN {
+        true
+    } else {
+        dev_err!(
+            dev,
+            "MM: Test 1 FAILED - Expected {:#010x}, got {:#010x}\n",
+            PATTERN_PRAMIN,
+            bar1_value
+        );
+        false
+    };
+
+    // Cleanup - invalidate PTE.
+    vmm.unmap_pages(mm, mapped)?;
+
+    // Test 2: Two-phase prepare/execute API.
+    let prepared = vmm.prepare_map(mm, 1, None)?;
+    let mapped2 = vmm.execute_map(mm, prepared, &[test_pfn], true)?;
+    let readback = vmm.read_mapping(mm, mapped2.vfn_start)?;
+    let test2_passed = if readback == Some(test_pfn) {
+        true
+    } else {
+        dev_err!(dev, "MM: Test 2 FAILED - Two-phase map readback mismatch\n");
+        false
+    };
+    vmm.unmap_pages(mm, mapped2)?;
+
+    // Test 3: Range-constrained allocation with a hole — exercises block.size()-driven
+    // BAR1 mapping. A 4K hole is punched at base+16K, then a single 32K allocation
+    // is requested within [base, base+36K). The buddy allocator must split around the
+    // hole, returning multiple blocks (expected: {16K, 4K, 8K, 4K} = 32K total).
+    // Each block is mapped into BAR1 and verified via PRAMIN read-back.
+    //
+    // Address layout (base = 0x10000):
+    //   [    16K    ] [HOLE 4K] [4K] [ 8K ] [4K]
+    //   0x10000       0x14000  0x15000 0x16000 0x18000 0x19000
+    let range_base: u64 = SZ_64K.into_safe_cast();
+    let sz_4k: u64 = SZ_4K.into_safe_cast();
+    let sz_16k: u64 = SZ_16K.into_safe_cast();
+    let sz_32k_4k: u64 = (SZ_32K + SZ_4K).into_safe_cast();
+
+    // Punch a 4K hole at base+16K so the subsequent 32K allocation must split.
+    let _hole = KBox::pin_init(
+        mm.buddy().alloc_blocks(
+            GpuBuddyAllocMode::Range(range_base + sz_16k..range_base + sz_16k + sz_4k),
+            SZ_4K.into_safe_cast(),
+            Alignment::new::<SZ_4K>(),
+            GpuBuddyAllocFlags::default(),
+        ),
+        GFP_KERNEL,
+    )?;
+
+    // Allocate 32K within [base, base+36K). The hole forces the allocator to return
+    // split blocks whose sizes are determined by buddy alignment.
+    let blocks = KBox::pin_init(
+        mm.buddy().alloc_blocks(
+            GpuBuddyAllocMode::Range(range_base..range_base + sz_32k_4k),
+            SZ_32K.into_safe_cast(),
+            Alignment::new::<SZ_4K>(),
+            GpuBuddyAllocFlags::default(),
+        ),
+        GFP_KERNEL,
+    )?;
+
+    let mut test3_passed = true;
+    let mut total_size = 0usize;
+
+    for block in blocks.iter() {
+        total_size += IntoSafeCast::<usize>::into_safe_cast(block.size());
+
+        // Map all pages of this block.
+        let page_size: u64 = PAGE_SIZE.into_safe_cast();
+        let num_pages: usize = (block.size() / page_size).into_safe_cast();
+
+        let mut pfns = KVec::new();
+        for j in 0..num_pages {
+            let j_u64: u64 = j.into_safe_cast();
+            pfns.push(
+                Pfn::from(VramAddress::new(
+                    block.offset() + j_u64.checked_mul(page_size).ok_or(EOVERFLOW)?,
+                )),
+                GFP_KERNEL,
+            )?;
+        }
+
+        let mapped = vmm.map_pages(mm, &pfns, None, true)?;
+        let bar1_base_vfn: usize = mapped.vfn_start.raw().into_safe_cast();
+        let bar1_base = bar1_base_vfn.checked_mul(PAGE_SIZE).ok_or(EOVERFLOW)?;
+
+        for j in 0..num_pages {
+            let page_bar1_off = bar1_base + j * PAGE_SIZE;
+            let j_u64: u64 = j.into_safe_cast();
+            let page_phys = block.offset()
+                + j_u64
+                    .checked_mul(PAGE_SIZE.into_safe_cast())
+                    .ok_or(EOVERFLOW)?;
+
+            bar1.try_write32(PATTERN_BAR1, page_bar1_off)?;
+
+            let pramin_val = {
+                let mut window = mm.pramin().get_window()?;
+                window.try_read32(page_phys.into_safe_cast())?
+            };
+
+            if pramin_val != PATTERN_BAR1 {
+                dev_err!(
+                    dev,
+                    "MM: Test 3 FAILED block offset {:#x} page {} (val={:#x})\n",
+                    block.offset(),
+                    j,
+                    pramin_val
+                );
+                test3_passed = false;
+            }
+        }
+
+        vmm.unmap_pages(mm, mapped)?;
+    }
+
+    // Verify aggregate: all returned block sizes must sum to allocation size.
+    if total_size != SZ_32K {
+        dev_err!(
+            dev,
+            "MM: Test 3 FAILED - total size {} != expected {}\n",
+            total_size,
+            SZ_32K
+        );
+        test3_passed = false;
+    }
+
+    if test1_passed && test2_passed && test3_passed {
+        dev_info!(dev, "MM: All self-tests PASSED\n");
+        Ok(())
+    } else {
+        dev_err!(dev, "MM: Self-tests FAILED\n");
+        Err(EIO)
+    }
+}
diff --git a/drivers/gpu/nova-core/mm/pagetable.rs b/drivers/gpu/nova-core/mm/pagetable.rs
index 4070070922a4..4db4478564c2 100644
--- a/drivers/gpu/nova-core/mm/pagetable.rs
+++ b/drivers/gpu/nova-core/mm/pagetable.rs
@@ -383,3 +383,31 @@ fn from(val: AperturePde) -> Self {
         Bounded::from_expr(val as u64 & 0x3)
     }
 }
+
+/// Check if the PDB has valid, VRAM-backed page tables.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn check_pdb_inner<M: MmuConfig>(
+    pramin: &pramin::Pramin,
+    pdb_addr: VramAddress,
+) -> Result {
+    let mut window = pramin.get_window()?;
+    let raw = window.try_read64(pdb_addr.raw())?;
+
+    if !M::Pde::new(raw).is_valid_vram() {
+        return Err(ENOENT);
+    }
+    Ok(())
+}
+
+/// Check if the PDB has valid, VRAM-backed page tables, dispatching by MMU version.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+pub(super) fn check_pdb_valid(
+    pramin: &pramin::Pramin,
+    pdb_addr: VramAddress,
+    chipset: crate::gpu::Chipset,
+) -> Result {
+    match MmuVersion::from(chipset.arch()) {
+        MmuVersion::V2 => check_pdb_inner::<MmuV2>(pramin, pdb_addr),
+        MmuVersion::V3 => check_pdb_inner::<MmuV3>(pramin, pdb_addr),
+    }
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 20/20] gpu: nova-core: mm: Add PRAMIN aperture self-tests
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add self-tests for the PRAMIN aperture mechanism to verify correct
operation during GPU probe. The tests validate various alignment
requirements and corner cases.

The tests are default disabled and behind CONFIG_NOVA_MM_SELFTESTS.
When enabled, tests run after GSP boot during probe.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs       |   3 +
 drivers/gpu/nova-core/mm/pramin.rs | 209 +++++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 8f236615cc13..ba6f1f6f0485 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -353,6 +353,9 @@ pub(crate) fn run_selftests(
 
     #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
     fn run_mm_selftests(self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
+        // PRAMIN aperture self-tests.
+        crate::mm::pramin::run_self_test(pdev.as_ref(), self.mm.pramin(), self.spec.chipset)?;
+
         // BAR1 self-tests.
         let bar1 = Arc::pin_init(pdev.iomap_region(1, c"nova-core/bar1"), GFP_KERNEL)?;
         let bar1_access = bar1.access(pdev.as_ref())?;
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
index 91a0957b2f92..eccbaa67b39a 100644
--- a/drivers/gpu/nova-core/mm/pramin.rs
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -180,6 +180,11 @@ pub(crate) fn new(
         }))
     }
 
+    /// Returns the valid VRAM region for this PRAMIN instance.
+    fn vram_region(&self) -> &Range<u64> {
+        &self.vram_region
+    }
+
     /// Acquire exclusive PRAMIN access.
     ///
     /// Returns a [`PraminWindow`] guard that provides VRAM read/write accessors.
@@ -278,3 +283,207 @@ fn compute_window(
     define_pramin_write!(try_write32, u32);
     define_pramin_write!(try_write64, u64);
 }
+
+/// Offset within the VRAM region to use as the self-test area.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+const SELFTEST_REGION_OFFSET: usize = 0x1000;
+
+/// Test read/write at byte-aligned locations.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn test_byte_readwrite(
+    dev: &kernel::device::Device,
+    win: &mut PraminWindow<'_>,
+    base: usize,
+) -> Result {
+    for i in 0u8..4 {
+        let offset = base + 1 + usize::from(i);
+        let val = 0xA0 + i;
+        win.try_write8(offset, val)?;
+        let read_val = win.try_read8(offset)?;
+        if read_val != val {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - offset {:#x}: wrote {:#x}, read {:#x}\n",
+                offset,
+                val,
+                read_val
+            );
+            return Err(EIO);
+        }
+    }
+    Ok(())
+}
+
+/// Test writing a `u32` and reading back as individual `u8`s.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn test_u32_as_bytes(
+    dev: &kernel::device::Device,
+    win: &mut PraminWindow<'_>,
+    base: usize,
+) -> Result {
+    let offset = base + 0x10;
+    let val: u32 = 0xDEADBEEF;
+    win.try_write32(offset, val)?;
+
+    // Read back as individual bytes (little-endian: EF BE AD DE).
+    let expected_bytes: [u8; 4] = [0xEF, 0xBE, 0xAD, 0xDE];
+    for (i, &expected) in expected_bytes.iter().enumerate() {
+        let read_val = win.try_read8(offset + i)?;
+        if read_val != expected {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+                offset + i,
+                expected,
+                read_val
+            );
+            return Err(EIO);
+        }
+    }
+    Ok(())
+}
+
+/// Test window repositioning across 1MB boundaries.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn test_window_reposition(
+    dev: &kernel::device::Device,
+    win: &mut PraminWindow<'_>,
+    base: usize,
+) -> Result {
+    let offset_a: usize = base;
+    let offset_b: usize = base + 0x200000; // base + 2MB (different 1MB region).
+    let val_a: u32 = 0x11111111;
+    let val_b: u32 = 0x22222222;
+
+    win.try_write32(offset_a, val_a)?;
+    win.try_write32(offset_b, val_b)?;
+
+    let read_b = win.try_read32(offset_b)?;
+    if read_b != val_b {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+            offset_b,
+            val_b,
+            read_b
+        );
+        return Err(EIO);
+    }
+
+    let read_a = win.try_read32(offset_a)?;
+    if read_a != val_a {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+            offset_a,
+            val_a,
+            read_a
+        );
+        return Err(EIO);
+    }
+    Ok(())
+}
+
+/// Test that offsets outside the VRAM region are rejected.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn test_invalid_offset(
+    dev: &kernel::device::Device,
+    win: &mut PraminWindow<'_>,
+    vram_end: u64,
+) -> Result {
+    let invalid_offset: usize = vram_end.into_safe_cast();
+    let result = win.try_read32(invalid_offset);
+    if result.is_ok() {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - read at invalid offset {:#x} should have failed\n",
+            invalid_offset
+        );
+        return Err(EIO);
+    }
+    Ok(())
+}
+
+/// Test that misaligned multi-byte accesses are rejected.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+fn test_misaligned_access(
+    dev: &kernel::device::Device,
+    win: &mut PraminWindow<'_>,
+    base: usize,
+) -> Result {
+    // `u16` at odd offset (not 2-byte aligned).
+    let offset_u16 = base + 0x21;
+    if win.try_write16(offset_u16, 0xABCD).is_ok() {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - misaligned u16 write at {:#x} should have failed\n",
+            offset_u16
+        );
+        return Err(EIO);
+    }
+
+    // `u32` at 2-byte-aligned (not 4-byte-aligned) offset.
+    let offset_u32 = base + 0x32;
+    if win.try_write32(offset_u32, 0x12345678).is_ok() {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - misaligned u32 write at {:#x} should have failed\n",
+            offset_u32
+        );
+        return Err(EIO);
+    }
+
+    // `u64` read at 4-byte-aligned (not 8-byte-aligned) offset.
+    let offset_u64 = base + 0x44;
+    if win.try_read64(offset_u64).is_ok() {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - misaligned u64 read at {:#x} should have failed\n",
+            offset_u64
+        );
+        return Err(EIO);
+    }
+    Ok(())
+}
+
+/// Run PRAMIN self-tests during boot if self-tests are enabled.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+pub(crate) fn run_self_test(
+    dev: &kernel::device::Device,
+    pramin: &Pramin,
+    chipset: crate::gpu::Chipset,
+) -> Result {
+    use crate::gpu::Architecture;
+
+    // PRAMIN uses NV_PBUS_BAR0_WINDOW which is only available on pre-Hopper GPUs.
+    // Hopper+ uses NV_XAL_EP_BAR0_WINDOW instead, requiring a separate HAL that
+    // has not been implemented yet.
+    if !matches!(
+        chipset.arch(),
+        Architecture::Turing | Architecture::Ampere | Architecture::Ada
+    ) {
+        dev_info!(
+            dev,
+            "PRAMIN: Skipping self-tests for {:?} (only pre-Hopper supported)\n",
+            chipset
+        );
+        return Ok(());
+    }
+
+    dev_info!(dev, "PRAMIN: Starting self-test...\n");
+
+    let vram_region = pramin.vram_region();
+    let base: usize = vram_region.start.into_safe_cast();
+    let base = base + SELFTEST_REGION_OFFSET;
+    let vram_end = vram_region.end;
+    let mut win = pramin.get_window()?;
+
+    test_byte_readwrite(dev, &mut win, base)?;
+    test_u32_as_bytes(dev, &mut win, base)?;
+    test_window_reposition(dev, &mut win, base)?;
+    test_invalid_offset(dev, &mut win, vram_end)?;
+    test_misaligned_access(dev, &mut win, base)?;
+
+    dev_info!(dev, "PRAMIN: All self-tests PASSED\n");
+    Ok(())
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 00/21] gpu: nova-core: Add memory management support
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

gpu: nova-core: Add memory management support

The patches are based on drm-rust-next and work on Ampere, and should "just
work on Blackwell" once John's Blackwell patches are merged, however it does
not depend on those patches and can independently go in.

This series depends on Alex Courbot's bitfield series:
https://lore.kernel.org/all/20260409-bitfield-v2-0-23ac400071cb@nvidia.com/

The git tree with all patches can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (tag: nova-mm-v11-20260415)

Change log:

Changes from v10 to v11:
- Rebased on new bitfield! macro infrastructure.
- Squashed Zhi Wang's "Use runtime BAR1 size" fix with Co-develop tag (Eliot)
- Refactored page table walker to generics (PtWalkInner<M: MmuConfig>)  (Eliot, Alex)
- Changed first_usable_fb_region() to return Range<u64> with checked_add (Eliot)
- Tightened visibility from pub(crate) to pub(super) across mm submodules (Eliot)
- TLB flush: added ack_globally bit for global engine acknowledgment.

Changes from v9 to v10:
- Rebased and dropped patches already merged in to drm-rust-next.
- GPU_BUDDY select folded into GpuMm patch.
- updated code with new register macro API.
- Refactored fb_regions() to use iterator (Alex Courbot).
- Renamed Pramin::window() to get_window() to make it more clear it is
  'acquiring a resource'.
- Converted Bar0WindowTarget to bounded_enum! macro, replacing TryFrom.
  Allows to use `with_*` instead of `try_with_*`.

Changes from v8 to v9:
- Added fixes from Zhi Wang for bitfield position changes in virtual address
  and larger BAR1 size on some platforms. Tested and working for vGPU usecase!
- Refactored gsp: boot() to return only GspStaticInfo, removing FbLayout (Alex).
- bar1_pde_base and bar2_pde_base are now accessed via GspStaticInfo directly (Alex).
- Added new patch "gsp: Expose total physical VRAM end from FB region info"
  introducing total_fb_end() to expose VRAM extent (Alex).
- Consolidated usable VRAM and BarUser setup; removed dedicated
  "fb: Add usable_vram field to FbLayout", "mm: Use usable VRAM region for
  buddy allocator", and "mm: Add BarUser to struct Gpu and create at boot".

Changes from v7 to v8:
- Incorporated "Select GPU_BUDDY for VRAM allocation" patch from the
  dependency series (Alex).
- Significant patch reordering for better logical flow (GSP/FB patches
  moved earlier, page table patches, Vmm, Bar1, tests) (Alex).
- Replaced several 'as' usages with into_safe_cast() (Danilo, Alex).
- Updated BAR 1 test cases to include exercising the block size API
  (Eliot, Danilo).

Changes from v6 to v7:
- Addressed DMA fence signalling usecase per Danilo's feedback.

Pre v6:
- Simplified PRAMIN code (John Hubbard, Alex Courbot).
- Handled different MMU versions: ver2 versus ver3 (John Hubbard).
- Added BAR1 usecase so we have user of DRM Buddy / VMM (John Hubbard).
- Iterating over clist/buddy bindings.

Link to v10: https://lore.kernel.org/all/20260331212048.2229260-1-joelagnelf@nvidia.com/
Link to v9: https://lore.kernel.org/all/20260311004008.2208806-1-joelagnelf@nvidia.com/
Link to v8: https://lore.kernel.org/all/20260224225323.3312204-1-joelagnelf@nvidia.com/
Link to v7: https://lore.kernel.org/all/20260218212020.800836-1-joelagnelf@nvidia.com/

Alexandre Courbot (1):
  gpu: nova-core: switch to kernel bitfield macro

Joel Fernandes (20):
  gpu: nova-core: gsp: Return GspStaticInfo from boot()
  gpu: nova-core: gsp: Extract usable FB region from GSP
  gpu: nova-core: gsp: Expose total physical VRAM end from FB region
    info
  gpu: nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  gpu: nova-core: mm: Add common memory management types
  gpu: nova-core: mm: Add TLB flush support
  gpu: nova-core: mm: Add GpuMm centralized memory manager
  gpu: nova-core: mm: Add common types for all page table formats
  gpu: nova-core: mm: Add MMU v2 page table types
  gpu: nova-core: mm: Add MMU v3 page table types
  gpu: nova-core: mm: Add unified page table entry wrapper enums
  gpu: nova-core: mm: Add page table walker for MMU v2/v3
  gpu: nova-core: mm: Add Virtual Memory Manager
  gpu: nova-core: mm: Add virtual address range tracking to VMM
  gpu: nova-core: mm: Add multi-page mapping API to VMM
  gpu: nova-core: Add BAR1 aperture type and size constant
  gpu: nova-core: mm: Add BAR1 user interface
  gpu: nova-core: mm: Add BAR1 memory management self-tests
  gpu: nova-core: mm: Add PRAMIN aperture self-tests

 Documentation/gpu/nova/core/pramin.rst     | 123 ++++++
 Documentation/gpu/nova/index.rst           |   1 +
 drivers/gpu/nova-core/Kconfig              |  11 +
 drivers/gpu/nova-core/bitfield.rs          | 330 --------------
 drivers/gpu/nova-core/driver.rs            |   3 +
 drivers/gpu/nova-core/gpu.rs               |  94 +++-
 drivers/gpu/nova-core/gsp/boot.rs          |   9 +-
 drivers/gpu/nova-core/gsp/commands.rs      |  18 +-
 drivers/gpu/nova-core/gsp/fw.rs            |  15 +-
 drivers/gpu/nova-core/gsp/fw/commands.rs   |  60 ++-
 drivers/gpu/nova-core/mm.rs                | 270 ++++++++++++
 drivers/gpu/nova-core/mm/bar_user.rs       | 366 +++++++++++++++
 drivers/gpu/nova-core/mm/pagetable.rs      | 413 +++++++++++++++++
 drivers/gpu/nova-core/mm/pagetable/map.rs  | 338 ++++++++++++++
 drivers/gpu/nova-core/mm/pagetable/ver2.rs | 307 +++++++++++++
 drivers/gpu/nova-core/mm/pagetable/ver3.rs | 431 ++++++++++++++++++
 drivers/gpu/nova-core/mm/pagetable/walk.rs | 242 ++++++++++
 drivers/gpu/nova-core/mm/pramin.rs         | 489 +++++++++++++++++++++
 drivers/gpu/nova-core/mm/tlb.rs            |  97 ++++
 drivers/gpu/nova-core/mm/vmm.rs            | 317 +++++++++++++
 drivers/gpu/nova-core/nova_core.rs         |   4 +-
 drivers/gpu/nova-core/regs.rs              |  54 +++
 22 files changed, 3643 insertions(+), 349 deletions(-)
 create mode 100644 Documentation/gpu/nova/core/pramin.rst
 delete mode 100644 drivers/gpu/nova-core/bitfield.rs
 create mode 100644 drivers/gpu/nova-core/mm.rs
 create mode 100644 drivers/gpu/nova-core/mm/bar_user.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/map.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver2.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver3.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
 create mode 100644 drivers/gpu/nova-core/mm/tlb.rs
 create mode 100644 drivers/gpu/nova-core/mm/vmm.rs


base-commit: 74a720e00dfbb3ab92934660b4692b90331623ac
-- 
2.34.1


^ permalink raw reply

* [RFC PATCH v1.2 0/2] mm/damon/stat: add kdamond_pid parameter
From: SeongJae Park @ 2026-04-16  0:21 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm

DAMON_STAT doesn't provide the pid of its kdamond, unlike DAMON_RECLAIM
and DAMON_LRU_SORT.  This makes user-space management of DAMON_STAT
unnecessarily complicated.  Provide the information via a new parameter,
namely kdamond_pid, and document it.

Changes from RFC v1.1
- rfc v1.1: https://lore.kernel.org/20260414235912.98174-1-sj@kernel.org
- Close the parentheses of error handling block.
Changes from RFC
- rfc: https://lore.kernel.org/20260414053742.90296-1-sj@kernel.org
- Fix damon_kdamond_pid() failure handling.

SeongJae Park (2):
  mm/damon/stat: add a parameter for reading kdamond pid
  Docs/admin-guide/mm/damon/stat: document kdamond_pid parameter

 Documentation/admin-guide/mm/damon/stat.rst |  7 +++++++
 mm/damon/stat.c                             | 18 ++++++++++++++++++
 2 files changed, 25 insertions(+)


base-commit: bf44f59d29186d80db01e4124a8ab23b3b235b32
-- 
2.47.3

^ permalink raw reply

* [RFC PATCH v1.2 2/2] Docs/admin-guide/mm/damon/stat: document kdamond_pid parameter
From: SeongJae Park @ 2026-04-16  0:21 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm
In-Reply-To: <20260416002149.87090-1-sj@kernel.org>

Update DAMON_STAT usage document for newly added kdamond_pid parameter.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/admin-guide/mm/damon/stat.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/admin-guide/mm/damon/stat.rst b/Documentation/admin-guide/mm/damon/stat.rst
index c4b14daeb2dd6..46c5dd96aa2ed 100644
--- a/Documentation/admin-guide/mm/damon/stat.rst
+++ b/Documentation/admin-guide/mm/damon/stat.rst
@@ -89,3 +89,10 @@ percentiles of the idle time values via this read-only parameter.  Reading the
 parameter returns 101 idle time values in milliseconds, separated by comma.
 Each value represents 0-th, 1st, 2nd, 3rd, ..., 99th and 100th percentile idle
 times.
+
+kdamond_pid
+-----------
+
+PID of the DAMON thread.
+
+If DAMON_STAT is enabled, this becomes the PID of the worker thread.  Else, -1.
-- 
2.47.3

^ permalink raw reply related

* Re: [PATCH] Documentation/gpu: resolve kerneldoc duplicate declaration warning
From: Sanjay Chitroda @ 2026-04-16  2:17 UTC (permalink / raw)
  To: airlied, simona, maarten.lankhorst, mripard, tzimmermann, corbet,
	skhan
  Cc: dri-devel, linux-doc, linux-kernel
In-Reply-To: <20260318143612.2533863-1-sanjayembedded@gmail.com>



On 18 March 2026 8:06:12 pm IST, Sanjay Chitroda <sanjayembeddedse@gmail.com> wrote:
>From: Sanjay Chitroda <sanjayembeddedse@gmail.com>
>
>kernel-doc build with `make htmldocs` reports following WARNINGS.
>
>  Documentation/gpu/drm-kms:360: ../drivers/gpu/drm/drm_fourcc.c:397:
>  WARNING: Duplicate C declaration, also defined at gpu/drm-kms:35.
>  Declaration is '.. c:function::
>  const struct drm_format_info * drm_format_info (u32 format)'.
>
>  Documentation/gpu/drm-kms:491: ../drivers/gpu/drm/drm_modeset_lock.c:377:
>  WARNING: Duplicate C declaration, also defined at gpu/drm-kms:48.
>  Declaration is '.. c:function:: int drm_modeset_lock
>  (struct drm_modeset_lock *lock, struct drm_modeset_acquire_ctx *ctx)'.
>
>  Documentation/gpu/drm-uapi:607: ../drivers/gpu/drm/drm_ioctl.c:923:
>  WARNING: Duplicate C declaration, also defined at gpu/drm-uapi:69.
>  Declaration is '.. c:function::
>  bool drm_ioctl_flags (unsigned int nr, unsigned int *flags)'.
>
>Add :no-identifiers: to prevent duplicate identifier generation and
>keep the function documented at its implementation site.
>
>No functional change.
>
>Link: https://lore.kernel.org/oe-kbuild-all/202512302319.1PGGt3CN-lkp@intel.com/
>Signed-off-by: Sanjay Chitroda <sanjayembeddedse@gmail.com>
>---
> Documentation/gpu/drm-kms.rst  | 2 ++
> Documentation/gpu/drm-uapi.rst | 3 ++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
>index 2292e65f044c..0f0f90b3186d 100644
>--- a/Documentation/gpu/drm-kms.rst
>+++ b/Documentation/gpu/drm-kms.rst
>@@ -356,6 +356,7 @@ Format Functions Reference
> 
> .. kernel-doc:: include/drm/drm_fourcc.h
>    :internal:
>+   :no-identifiers: drm_format_info
> 
> .. kernel-doc:: drivers/gpu/drm/drm_fourcc.c
>    :export:
>@@ -487,6 +488,7 @@ KMS Locking
> 
> .. kernel-doc:: include/drm/drm_modeset_lock.h
>    :internal:
>+   :no-identifiers: drm_modeset_lock
> 
> .. kernel-doc:: drivers/gpu/drm/drm_modeset_lock.c
>    :export:
>diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
>index d98428a592f1..73b49e0dcda8 100644
>--- a/Documentation/gpu/drm-uapi.rst
>+++ b/Documentation/gpu/drm-uapi.rst
>@@ -603,6 +603,7 @@ DRM specific patterns. Note that ENOTTY has the slightly unintuitive meaning of
> 
> .. kernel-doc:: include/drm/drm_ioctl.h
>    :internal:
>+   :no-identifiers: drm_ioctl_flags
> 
> .. kernel-doc:: drivers/gpu/drm/drm_ioctl.c
>    :export:
>@@ -761,4 +762,4 @@ Stable uAPI events
> From ``drivers/gpu/drm/scheduler/gpu_scheduler_trace.h``
> 
> .. kernel-doc::  drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
>-   :doc: uAPI trace events
>\ No newline at end of file
>+   :doc: uAPI trace events

Hi all,

Requesting your input on the change.
Is there anything that I missed or required correction ?



^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-16  3:05 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <3blpenhpvysb2ig7efegedx4v3flppl5ftnz6vhpqlatfk3ycn@vmmhs7mvjieg>

On Wed, Apr 15, 2026 at 01:20:41PM -0500, Michael Roth wrote:
> On Tue, Apr 14, 2026 at 06:37:00PM -0500, Michael Roth wrote:
> > On Wed, Apr 01, 2026 at 03:38:12PM -0700, Ackerley Tng wrote:
> > > Michael Roth <michael.roth@amd.com> writes:
> > > 
> > > >
> > > > [...snip...]
> > > >
> > > >>  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > > >>  {
> > > >> @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > >>  		return -EINVAL;
> > > >>  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> > > >>  		return -EINVAL;
> > > >> +	if (attrs->error_offset)
> > > >> +		return -EINVAL;
> > > >>  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
> > > >>  		if (attrs->reserved[i])
> > > >>  			return -EINVAL;
> > > >> @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > > >>  		return 1;
> > > >>  	case KVM_CAP_GUEST_MEMFD_FLAGS:
> > > >>  		return kvm_gmem_get_supported_flags(kvm);
> > > >> +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> > > >> +		if (vm_memory_attributes)
> > > >> +			return 0;
> > > >> +
> > > >> +		return kvm_supported_mem_attributes(kvm);
> > > >
> > > > Based on the discussion from the PUCK call this morning,
> > > 
> > > Thanks for copying the discussion here, I'll start attending PUCK to
> > > catch those discussions too :)
> > > 
> > > > it sounds like it
> > > > would be a good idea to limit kvm_supported_mem_attributes() to only
> > > > reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
> > > > implementation has all the necessary enablement to support in-place
> > > > conversion via guest_memfd. In the case of SNP, there is a
> > > > documentation/parameter check in snp_launch_update() that needs to be
> > > > relaxed in order for userspace to be able to pass in a NULL 'src'
> > > > parameter (since, for in-place conversion, it would be initialized in place
> > > > as shared memory prior to the call, since by the time kvm_gmem_poulate()
> > > > it will have been set to private and therefore cannot be faulted in via
> > > > GUP (and if it could, we'd be unecessarily copying the src back on top
> > > > of itself since src/dst are the same).
> > > 
> > > Could this be a separate thing? If I'm understanding you correctly, it's
> > > not strictly a requirement for snp_launch_update() to first support a
> > > NULL 'src' parameter before this series lands.
> > 
> > I think we are already sync'd up on this during PUCK, but for the benefit
> > of others: Sean pointed out that if we don't then we'll need to add yet
> > another capability so userspace can determine when it can actually do
> > in-place conversion for SNP.
> 
> (in-place conversion for SNP during pre-launch/populate phase, I meant)
> 
> > 
> > Right now, this series effectively advertises in place conversion at the
> > point where KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reports
> > 'KVM_MEMORY_ATTRIBUTE_PRIVATE', so I slightly reworked the series to
> > include the snp_launch_update() change prior to that point in time in
> > the series. Thanks to prereqs and changes/requirements you've already
> > pulled in, it's just one additional patch now:
> > 
> >  KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE 
> > 
> > I also did some minor updates (prefixed with a "[squash]" tag) to advertise
> > the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by
> 
> Though I'm not sure how we deal with it if SNP/TDX at some point become
> capable of using the PRESERVED flag *after* populate... but maybe that's
> too unlikely to worry about? If we wanted to address it though, we could
> have both PRESERVED and PRESERVED_BEFORE_LAUNCH so they can be
> enumerated separately from the start.
> 
> > userspace for SNP/TDX in the kvm_gmem_populate() path as agreed upon
> > during PUCK.
> > 
> > The branch is here, with the patches moved to where I think they
> > should remain (or be squashed in for the [squash] ones):
> > 
> >   https://github.com/AMDESE/linux/commits/guest_memfd-inplace-conversion-v4-snp2/

Argh! Sorry (again), I just now realized there was a typo here, this is the
tree I'd intended to push (and what was described in the original email):

  https://github.com/AMDESE/linux/commits/guest_memfd-inplace-conversion-v4-snp3/

-Mike

> > 
> > I've also updated the QEMU patches to use the agreed-upon API flow and
> > pushed them here:
> > 
> >   https://github.com/AMDESE/qemu/commits/snp-inplace-for-v4-wip2/
> > 
> > To start an SNP guest with in-place conversion:
> > 
> >   qemu-system-x86 \
> >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
> >   -object sev-snp-guest,id=sev0,...,convert-in-place=true \
> >   -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false
> 
> Sorry, that should've been:
> 
>   -object memory-backend-guest-memfd,id=ram1,size=16G,share=true,reserve=false
> 
> > 
> > To start an normal non-CoCo guest backed by guest_memfd with shared memory:
> > 
> >   qemu-system-x86 \
> >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
> >   -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false
> 
> and:
> 
>   -object memory-backend-guest-memfd,id=ram1,size=16G,share=true,reserve=false
> 
> (and both require kvm.vm_memory_attributes=0)
> 
> -Mike
> 
> > 
> > Thanks,
> > 
> > Mike

^ permalink raw reply

* Re: [PATCH v4] KEYS: trusted: Debugging as a feature
From: Nayna Jain @ 2026-04-16  3:16 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-integrity
  Cc: Srish Srinivasan, Jonathan Corbet, Shuah Khan, James Bottomley,
	Mimi Zohar, David Howells, Paul Moore, James Morris,
	Serge E. Hallyn, Ahmad Fatoum, Pengutronix Kernel Team,
	Andrew Morton, Borislav Petkov (AMD), Randy Dunlap, Dave Hansen,
	Pawan Gupta, Feng Tang, Dapeng Mi, Kees Cook, Marco Elver,
	Li RongQing, Paul E. McKenney, Thomas Gleixner, Bjorn Helgaas,
	linux-doc, linux-kernel, keyrings, linux-security-module
In-Reply-To: <20260415111202.63888-1-jarkko@kernel.org>


On 4/15/26 7:12 AM, Jarkko Sakkinen wrote:
> TPM_DEBUG, and other similar flags, are a non-standard way to specify a
> feature in Linux kernel. Introduce CONFIG_TRUSTED_KEYS_DEBUG for trusted
> keys, and use it to replace these ad-hoc feature flags.
>
> Given that trusted keys debug dumps can contain sensitive data, harden the
> feature as follows:
>
> 1. In the Kconfig description postulate that pr_debug() statements must be
>     used.
> 2. Use pr_debug() statements in TPM 1.x driver to print the protocol dump.
> 3. Require trusted.debug=1 on the kernel command line (default: 0) to
>     activate dumps at runtime, even when CONFIG_TRUSTED_KEYS_DEBUG=y.
>
> Traces, when actually needed, can be easily enabled by providing
> trusted.dyndbg='+p' and trusted.debug=1 in the kernel command-line.

Thanks for adding the documentation.

Reviewed-by: Nayna Jain <nayna@linux.ibm.com>

>
> Reported-by: Nayna Jain <nayna@linux.ibm.com>
> Closes: https://lore.kernel.org/all/7f8b8478-5cd8-4d97-bfd0-341fd5cf10f9@linux.ibm.com/
> Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
> Tested-by: Srish Srinivasan <ssrish@linux.ibm.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> v4:
> - Added kernel parameter documentation.t
> - Added tags from Srishand and Nayna.
> - Sanity check round. This version will be applied unless there is
>    something specific to address.
> v3:
> - Add kernel-command line option for enabling the traces.
> - Add safety information to the Kconfig entry.
> v2:
> - Implement for all trusted keys backends.
> - Add HAVE_TRUSTED_KEYS_DEBUG as it is a good practice despite full
>    coverage.
> ---
>   .../admin-guide/kernel-parameters.txt         | 16 +++++++
>   include/keys/trusted-type.h                   | 21 +++++----
>   security/keys/trusted-keys/Kconfig            | 23 ++++++++++
>   security/keys/trusted-keys/trusted_caam.c     |  7 ++-
>   security/keys/trusted-keys/trusted_core.c     |  6 +++
>   security/keys/trusted-keys/trusted_tpm1.c     | 44 +++++++++++--------
>   6 files changed, 87 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f2ce1f4975c1..f1515668c8ab 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -7917,6 +7917,22 @@ Kernel parameters
>   			first trust source as a backend which is initialized
>   			successfully during iteration.
>   
> +	trusted.debug=	[KEYS]
> +			Format: <bool>
> +			Enable trusted keys debug traces at runtime when
> +			CONFIG_TRUSTED_KEYS_DEBUG=y.
> +
> +			To make the traces visible after enabling the option,
> +			use trusted.dyndbg='+p' as needed. By convention,
> +			the subsystem uses pr_debug() for these traces.
> +
> +			SAFETY: The traces can leak sensitive data, so be
> +			cautious before enabling this. They remain inactive
> +			unless this parameter is set this option to  a true
> +			value.
> +
> +			Default: false
> +
>   	trusted.rng=	[KEYS]
>   			Format: <string>
>   			The RNG used to generate key material for trusted keys.
> diff --git a/include/keys/trusted-type.h b/include/keys/trusted-type.h
> index 03527162613f..9f9940482da4 100644
> --- a/include/keys/trusted-type.h
> +++ b/include/keys/trusted-type.h
> @@ -83,18 +83,21 @@ struct trusted_key_source {
>   
>   extern struct key_type key_type_trusted;
>   
> -#define TRUSTED_DEBUG 0
> +#ifdef CONFIG_TRUSTED_KEYS_DEBUG
> +extern bool trusted_debug;
>   
> -#if TRUSTED_DEBUG
>   static inline void dump_payload(struct trusted_key_payload *p)
>   {
> -	pr_info("key_len %d\n", p->key_len);
> -	print_hex_dump(KERN_INFO, "key ", DUMP_PREFIX_NONE,
> -		       16, 1, p->key, p->key_len, 0);
> -	pr_info("bloblen %d\n", p->blob_len);
> -	print_hex_dump(KERN_INFO, "blob ", DUMP_PREFIX_NONE,
> -		       16, 1, p->blob, p->blob_len, 0);
> -	pr_info("migratable %d\n", p->migratable);
> +	if (!trusted_debug)
> +		return;
> +
> +	pr_debug("key_len %d\n", p->key_len);
> +	print_hex_dump_debug("key ", DUMP_PREFIX_NONE,
> +			     16, 1, p->key, p->key_len, 0);
> +	pr_debug("bloblen %d\n", p->blob_len);
> +	print_hex_dump_debug("blob ", DUMP_PREFIX_NONE,
> +			     16, 1, p->blob, p->blob_len, 0);
> +	pr_debug("migratable %d\n", p->migratable);
>   }
>   #else
>   static inline void dump_payload(struct trusted_key_payload *p)
> diff --git a/security/keys/trusted-keys/Kconfig b/security/keys/trusted-keys/Kconfig
> index 9e00482d886a..e5a4a53aeab2 100644
> --- a/security/keys/trusted-keys/Kconfig
> +++ b/security/keys/trusted-keys/Kconfig
> @@ -1,10 +1,29 @@
>   config HAVE_TRUSTED_KEYS
>   	bool
>   
> +config HAVE_TRUSTED_KEYS_DEBUG
> +	bool
> +
> +config TRUSTED_KEYS_DEBUG
> +	bool "Debug trusted keys"
> +	depends on HAVE_TRUSTED_KEYS_DEBUG
> +	default n
> +	help
> +	  Trusted key backends and core code that support debug traces can
> +	  opt-in that feature here. Traces must only use debug level output, as
> +	  sensitive data may pass by. In the kernel-command line traces can be
> +	  enabled via trusted.dyndbg='+p'.
> +
> +	  SAFETY: Debug dumps are inactive at runtime until trusted.debug is set
> +	  to a true value on the kernel command-line. Use at your utmost
> +	  consideration when enabling this feature on a production build. The
> +	  general advice is not to do this.
> +
>   config TRUSTED_KEYS_TPM
>   	bool "TPM-based trusted keys"
>   	depends on TCG_TPM >= TRUSTED_KEYS
>   	default y
> +	select HAVE_TRUSTED_KEYS_DEBUG
>   	select CRYPTO_HASH_INFO
>   	select CRYPTO_LIB_SHA1
>   	select CRYPTO_LIB_UTILS
> @@ -23,6 +42,7 @@ config TRUSTED_KEYS_TEE
>   	bool "TEE-based trusted keys"
>   	depends on TEE >= TRUSTED_KEYS
>   	default y
> +	select HAVE_TRUSTED_KEYS_DEBUG
>   	select HAVE_TRUSTED_KEYS
>   	help
>   	  Enable use of the Trusted Execution Environment (TEE) as trusted
> @@ -33,6 +53,7 @@ config TRUSTED_KEYS_CAAM
>   	depends on CRYPTO_DEV_FSL_CAAM_JR >= TRUSTED_KEYS
>   	select CRYPTO_DEV_FSL_CAAM_BLOB_GEN
>   	default y
> +	select HAVE_TRUSTED_KEYS_DEBUG
>   	select HAVE_TRUSTED_KEYS
>   	help
>   	  Enable use of NXP's Cryptographic Accelerator and Assurance Module
> @@ -42,6 +63,7 @@ config TRUSTED_KEYS_DCP
>   	bool "DCP-based trusted keys"
>   	depends on CRYPTO_DEV_MXS_DCP >= TRUSTED_KEYS
>   	default y
> +	select HAVE_TRUSTED_KEYS_DEBUG
>   	select HAVE_TRUSTED_KEYS
>   	help
>   	  Enable use of NXP's DCP (Data Co-Processor) as trusted key backend.
> @@ -50,6 +72,7 @@ config TRUSTED_KEYS_PKWM
>   	bool "PKWM-based trusted keys"
>   	depends on PSERIES_PLPKS >= TRUSTED_KEYS
>   	default y
> +	select HAVE_TRUSTED_KEYS_DEBUG
>   	select HAVE_TRUSTED_KEYS
>   	help
>   	  Enable use of IBM PowerVM Key Wrapping Module (PKWM) as a trusted key backend.
> diff --git a/security/keys/trusted-keys/trusted_caam.c b/security/keys/trusted-keys/trusted_caam.c
> index 601943ce0d60..6a33dbf2a7f5 100644
> --- a/security/keys/trusted-keys/trusted_caam.c
> +++ b/security/keys/trusted-keys/trusted_caam.c
> @@ -28,10 +28,13 @@ static const match_table_t key_tokens = {
>   	{opt_err, NULL}
>   };
>   
> -#ifdef CAAM_DEBUG
> +#ifdef CONFIG_TRUSTED_KEYS_DEBUG
>   static inline void dump_options(const struct caam_pkey_info *pkey_info)
>   {
> -	pr_info("key encryption algo %d\n", pkey_info->key_enc_algo);
> +	if (!trusted_debug)
> +		return;
> +
> +	pr_debug("key encryption algo %d\n", pkey_info->key_enc_algo);
>   }
>   #else
>   static inline void dump_options(const struct caam_pkey_info *pkey_info)
> diff --git a/security/keys/trusted-keys/trusted_core.c b/security/keys/trusted-keys/trusted_core.c
> index 0b142d941cd2..6aed17bee09d 100644
> --- a/security/keys/trusted-keys/trusted_core.c
> +++ b/security/keys/trusted-keys/trusted_core.c
> @@ -31,6 +31,12 @@ static char *trusted_rng = "default";
>   module_param_named(rng, trusted_rng, charp, 0);
>   MODULE_PARM_DESC(rng, "Select trusted key RNG");
>   
> +#ifdef CONFIG_TRUSTED_KEYS_DEBUG
> +bool trusted_debug;
> +module_param_named(debug, trusted_debug, bool, 0);
> +MODULE_PARM_DESC(debug, "Enable trusted keys debug traces (default: 0)");
> +#endif
> +
>   static char *trusted_key_source;
>   module_param_named(source, trusted_key_source, charp, 0);
>   MODULE_PARM_DESC(source, "Select trusted keys source (tpm, tee, caam, dcp or pkwm)");
> diff --git a/security/keys/trusted-keys/trusted_tpm1.c b/security/keys/trusted-keys/trusted_tpm1.c
> index 6ea728f1eae6..13513819991e 100644
> --- a/security/keys/trusted-keys/trusted_tpm1.c
> +++ b/security/keys/trusted-keys/trusted_tpm1.c
> @@ -46,38 +46,44 @@ enum {
>   	SRK_keytype = 4
>   };
>   
> -#define TPM_DEBUG 0
> -
> -#if TPM_DEBUG
> +#ifdef CONFIG_TRUSTED_KEYS_DEBUG
>   static inline void dump_options(struct trusted_key_options *o)
>   {
> -	pr_info("sealing key type %d\n", o->keytype);
> -	pr_info("sealing key handle %0X\n", o->keyhandle);
> -	pr_info("pcrlock %d\n", o->pcrlock);
> -	pr_info("pcrinfo %d\n", o->pcrinfo_len);
> -	print_hex_dump(KERN_INFO, "pcrinfo ", DUMP_PREFIX_NONE,
> -		       16, 1, o->pcrinfo, o->pcrinfo_len, 0);
> +	if (!trusted_debug)
> +		return;
> +
> +	pr_debug("sealing key type %d\n", o->keytype);
> +	pr_debug("sealing key handle %0X\n", o->keyhandle);
> +	pr_debug("pcrlock %d\n", o->pcrlock);
> +	pr_debug("pcrinfo %d\n", o->pcrinfo_len);
> +	print_hex_dump_debug("pcrinfo ", DUMP_PREFIX_NONE,
> +			     16, 1, o->pcrinfo, o->pcrinfo_len, 0);
>   }
>   
>   static inline void dump_sess(struct osapsess *s)
>   {
> -	print_hex_dump(KERN_INFO, "trusted-key: handle ", DUMP_PREFIX_NONE,
> -		       16, 1, &s->handle, 4, 0);
> -	pr_info("secret:\n");
> -	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_NONE,
> -		       16, 1, &s->secret, SHA1_DIGEST_SIZE, 0);
> -	pr_info("trusted-key: enonce:\n");
> -	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_NONE,
> -		       16, 1, &s->enonce, SHA1_DIGEST_SIZE, 0);
> +	if (!trusted_debug)
> +		return;
> +
> +	print_hex_dump_debug("trusted-key: handle ", DUMP_PREFIX_NONE,
> +			     16, 1, &s->handle, 4, 0);
> +	pr_debug("secret:\n");
> +	print_hex_dump_debug("", DUMP_PREFIX_NONE,
> +			     16, 1, &s->secret, SHA1_DIGEST_SIZE, 0);
> +	pr_debug("trusted-key: enonce:\n");
> +	print_hex_dump_debug("", DUMP_PREFIX_NONE,
> +			     16, 1, &s->enonce, SHA1_DIGEST_SIZE, 0);
>   }
>   
>   static inline void dump_tpm_buf(unsigned char *buf)
>   {
>   	int len;
>   
> -	pr_info("\ntpm buffer\n");
> +	if (!trusted_debug)
> +		return;
> +	pr_debug("\ntpm buffer\n");
>   	len = LOAD32(buf, TPM_SIZE_OFFSET);
> -	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_NONE, 16, 1, buf, len, 0);
> +	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, buf, len, 0);
>   }
>   #else
>   static inline void dump_options(struct trusted_key_options *o)

^ permalink raw reply

* Re: [PATCH mm-unstable v15 05/13] mm/khugepaged: generalize collapse_huge_page for mTHP collapse
From: Nico Pache @ 2026-04-16  4:14 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <9f0b8790-eace-4caa-a0c0-45f66285887f@lucifer.local>

On Tue, Mar 17, 2026 at 10:52 AM Lorenzo Stoakes (Oracle)
<ljs@kernel.org> wrote:
>
> On Wed, Feb 25, 2026 at 08:24:27PM -0700, Nico Pache wrote:
> > Pass an order and offset to collapse_huge_page to support collapsing anon
> > memory to arbitrary orders within a PMD. order indicates what mTHP size we
> > are attempting to collapse to, and offset indicates were in the PMD to
> > start the collapse attempt.
> >
> > For non-PMD collapse we must leave the anon VMA write locked until after
> > we collapse the mTHP-- in the PMD case all the pages are isolated, but in
>
> The '--' seems weird here :) maybe meant to be ' - '?

It's called an em-dash, and I've been utilizing them for ages. Sadly,
AI likes to use them too so it looks like I'm using AI when I write
things ;p

>
> > the mTHP case this is not true, and we must keep the lock to prevent
> > changes to the VMA from occurring.
>
> You mean changes to the page tables right? rmap won't alter VMA parameters
> without a VMA lock. Better to be specific.

yes, I will update, thanks!

>
> >
> > Also convert these BUG_ON's to WARN_ON_ONCE's as these conditions, while
> > unexpected, should not bring down the system.
> >
> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  mm/khugepaged.c | 102 +++++++++++++++++++++++++++++-------------------
> >  1 file changed, 62 insertions(+), 40 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 99f78f0e44c6..fb3ba8fe5a6c 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1150,44 +1150,53 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
> >       return SCAN_SUCCEED;
> >  }
> >
> > -static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long address,
> > -             int referenced, int unmapped, struct collapse_control *cc)
> > +static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long start_addr,
> > +             int referenced, int unmapped, struct collapse_control *cc,
> > +             bool *mmap_locked, unsigned int order)
>
> This is getting horrible, could we maybe look at passing through a helper
> struct or something?

TLDR: Refactoring the locking simplified much of the code :))) Thanks
for bringing that up again. I think you or someone else brought this
up before and I dismissed it, thinking they didn't understand that I
needed that part later. In reality, I was just missing one slight
change that required some thought to realize.

Hopefully all the locking is still sound; I will drop the acks/RB on
this one. Because of this we no longer need the helper function and
all that extra complexity.

>
> >  {
> >       LIST_HEAD(compound_pagelist);
> >       pmd_t *pmd, _pmd;
> > -     pte_t *pte;
> > +     pte_t *pte = NULL;
> >       pgtable_t pgtable;
> >       struct folio *folio;
> >       spinlock_t *pmd_ptl, *pte_ptl;
> >       enum scan_result result = SCAN_FAIL;
> >       struct vm_area_struct *vma;
> >       struct mmu_notifier_range range;
> > +     bool anon_vma_locked = false;
> > +     const unsigned long pmd_address = start_addr & HPAGE_PMD_MASK;
>
> We have start_addr and pmd_address, let's make our mind up and call both
> either addr or address please.

ok

>
> >
> > -     VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> > +     VM_WARN_ON_ONCE(pmd_address & ~HPAGE_PMD_MASK);
>
> You just masked this with HPAGE_PMD_MASK then check & ~HPAGE_PMD_MASK? :)
>
> Can we just drop it? :)

im cool with that.

>
> >
> >       /*
> >        * Before allocating the hugepage, release the mmap_lock read lock.
> >        * The allocation can take potentially a long time if it involves
> >        * sync compaction, and we do not need to hold the mmap_lock during
> >        * that. We will recheck the vma after taking it again in write mode.
> > +      * If collapsing mTHPs we may have already released the read_lock.
> >        */
> > -     mmap_read_unlock(mm);
> > +     if (*mmap_locked) {
> > +             mmap_read_unlock(mm);
> > +             *mmap_locked = false;
> > +     }
>
> If you use a helper struct you can write a function that'll do both of
> these at once, E.g.:
>
> static void scan_mmap_unlock(struct scan_state *scan)
> {
>         if (!scan->mmap_locked)
>                 return;
>
>         mmap_read_unlock(scan->mm);
>         scan->mmap_locked = false;
> }
>
>         ...
>
>         scan_mmap_unlock(scan_state);
>
> >
> > -     result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
> > +     result = alloc_charge_folio(&folio, mm, cc, order);
> >       if (result != SCAN_SUCCEED)
> >               goto out_nolock;
> >
> >       mmap_read_lock(mm);
> > -     result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
> > -                                      HPAGE_PMD_ORDER);
> > +     *mmap_locked = true;
> > +     result = hugepage_vma_revalidate(mm, pmd_address, true, &vma, cc, order);
>
> Be nice to add a /*expect_anon=*/true, here so we can read what parameter
> that is at a glance.

ack!

>
> >       if (result != SCAN_SUCCEED) {
> >               mmap_read_unlock(mm);
> > +             *mmap_locked = false;
> >               goto out_nolock;
> >       }
> >
> > -     result = find_pmd_or_thp_or_none(mm, address, &pmd);
> > +     result = find_pmd_or_thp_or_none(mm, pmd_address, &pmd);
> >       if (result != SCAN_SUCCEED) {
> >               mmap_read_unlock(mm);
> > +             *mmap_locked = false;
> >               goto out_nolock;
> >       }
> >
> > @@ -1197,13 +1206,16 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >                * released when it fails. So we jump out_nolock directly in
> >                * that case.  Continuing to collapse causes inconsistency.
> >                */
> > -             result = __collapse_huge_page_swapin(mm, vma, address, pmd,
> > -                                                  referenced, HPAGE_PMD_ORDER);
> > -             if (result != SCAN_SUCCEED)
> > +             result = __collapse_huge_page_swapin(mm, vma, start_addr, pmd,
> > +                                                  referenced, order);
> > +             if (result != SCAN_SUCCEED) {
> > +                     *mmap_locked = false;
> >                       goto out_nolock;
> > +             }
> >       }
> >
> >       mmap_read_unlock(mm);
> > +     *mmap_locked = false;
> >       /*
> >        * Prevent all access to pagetables with the exception of
> >        * gup_fast later handled by the ptep_clear_flush and the VM
> > @@ -1213,20 +1225,20 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >        * mmap_lock.
> >        */
> >       mmap_write_lock(mm);
>
> Hmm you take an mmap... write lock here then don/t set *mmap_locked =
> true... It's inconsistent and bug prone.

yay we no longer need the gross lock tracking :)

>
> I'm also seriously not a fan of switching between mmap read and write lock
> here but keeping an *mmap_locked parameter here which is begging for a bug.
>
> In general though, you seem to always make sure in the (fairly hideous
> honestly) error goto labels to have the mmap lock dropped, so what is the
> point in keeping the *mmap_locked parameter updated throughou this anyway?

Cleaned up the locking and its all much better now

>
> Are we ever exiting with it set? If not why not drop the parameter/helper
> struct field and just have the caller understand that it's dropped on exit
> (and document that).

This...

>
> Since you're just dropping the lock on entry, why not have the caller do
> that and document that you have to enter unlocked anyway?


+ moving one piece of code up into the parent (the part I was missing
conceptually) solved all this. Thanks!

>
>
> > -     result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
> > -                                      HPAGE_PMD_ORDER);
> > +     result = hugepage_vma_revalidate(mm, pmd_address, true, &vma, cc, order);
> >       if (result != SCAN_SUCCEED)
> >               goto out_up_write;
> >       /* check if the pmd is still valid */
> >       vma_start_write(vma);
> > -     result = check_pmd_still_valid(mm, address, pmd);
> > +     result = check_pmd_still_valid(mm, pmd_address, pmd);
> >       if (result != SCAN_SUCCEED)
> >               goto out_up_write;
> >
> >       anon_vma_lock_write(vma->anon_vma);
> > +     anon_vma_locked = true;
>
> Again with a helper struct you can abstract this and avoid more noise.
>
> E.g. scan_anon_vma_lock_write(scan);
>
> >
> > -     mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
> > -                             address + HPAGE_PMD_SIZE);
> > +     mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, start_addr,
> > +                             start_addr + (PAGE_SIZE << order));
>
> I hate this open-coded 'start_addr + (PAGE_SIZE << order)' construct.
>
> If you use a helper struct (theme here :) you could have a macro that
> generates it set an end param to this.

Ill probably just do a variable with map_size or something. I dont
think we need a helper for this.

>
>
> >       mmu_notifier_invalidate_range_start(&range);
> >
> >       pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> > @@ -1238,24 +1250,21 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >        * Parallel GUP-fast is fine since GUP-fast will back off when
> >        * it detects PMD is changed.
> >        */
> > -     _pmd = pmdp_collapse_flush(vma, address, pmd);
> > +     _pmd = pmdp_collapse_flush(vma, pmd_address, pmd);
> >       spin_unlock(pmd_ptl);
> >       mmu_notifier_invalidate_range_end(&range);
> >       tlb_remove_table_sync_one();
> >
> > -     pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
> > +     pte = pte_offset_map_lock(mm, &_pmd, start_addr, &pte_ptl);
> >       if (pte) {
> > -             result = __collapse_huge_page_isolate(vma, address, pte, cc,
> > -                                                   HPAGE_PMD_ORDER,
> > -                                                   &compound_pagelist);
> > +             result = __collapse_huge_page_isolate(vma, start_addr, pte, cc,
> > +                                                   order, &compound_pagelist);
>
> Will this work correctly with the non-PMD aligned start_addr?

Yes we generalize all the other functions in the previous patch if
that is what you are asking.

>
> >               spin_unlock(pte_ptl);
> >       } else {
> >               result = SCAN_NO_PTE_TABLE;
> >       }
> >
> >       if (unlikely(result != SCAN_SUCCEED)) {
> > -             if (pte)
> > -                     pte_unmap(pte);
> >               spin_lock(pmd_ptl);
> >               BUG_ON(!pmd_none(*pmd));
>
> Can we downgrade to WARN_ON_ONCE() as we pass by any BUG_ON()'s please?
> Since we're churning here anyway it's worth doing :)

ack.

>
> >               /*
> > @@ -1265,21 +1274,21 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >                */
> >               pmd_populate(mm, pmd, pmd_pgtable(_pmd));
> >               spin_unlock(pmd_ptl);
> > -             anon_vma_unlock_write(vma->anon_vma);
> >               goto out_up_write;
> >       }
> >
> >       /*
> > -      * All pages are isolated and locked so anon_vma rmap
> > -      * can't run anymore.
> > +      * For PMD collapse all pages are isolated and locked so anon_vma
> > +      * rmap can't run anymore. For mTHP collapse we must hold the lock
>
> This is really unclear. What does 'can't run anymore' mean? Why must we
> hold the lock for mTHP?

In the PMD case we have isolated all the pages in the PMD, so no
changes can occur, and we don't need to hold the lock. in the mTHP
case, the PMD is only partially isolated, so if we drop the lock,
changes can occur to the rest of the PMD. This was based on a bug
found by Hugh https://lore.kernel.org/lkml/7a81339c-f9e5-a718-fa7f-6e3fb134dca5@google.com/

>
> I realise the previous comment was equally as unclear but let's make this
> make sense please :)

Ack ill make it more clear.

>
> >        */
> > -     anon_vma_unlock_write(vma->anon_vma);
> > +     if (is_pmd_order(order)) {
> > +             anon_vma_unlock_write(vma->anon_vma);
> > +             anon_vma_locked = false;
> > +     }
> >
> >       result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
> > -                                        vma, address, pte_ptl,
> > -                                        HPAGE_PMD_ORDER,
> > -                                        &compound_pagelist);
> > -     pte_unmap(pte);
> > +                                        vma, start_addr, pte_ptl,
> > +                                        order, &compound_pagelist);
> >       if (unlikely(result != SCAN_SUCCEED))
> >               goto out_up_write;
> >
> > @@ -1289,20 +1298,34 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >        * write.
> >        */
> >       __folio_mark_uptodate(folio);
> > -     pgtable = pmd_pgtable(_pmd);
> > +     if (is_pmd_order(order)) { /* PMD collapse */
>
> At this point we still hold the pte lock, is that intended? Are we sure
> there won't be any issues leaving it held during the operations that now
> happen before you release it?

I will verify before posting, but nothing has shown up in all my
testing (not that doesn't mean it's okay).

>
> > +             pgtable = pmd_pgtable(_pmd);
> >
> > -     spin_lock(pmd_ptl);
> > -     BUG_ON(!pmd_none(*pmd));
> > -     pgtable_trans_huge_deposit(mm, pmd, pgtable);
> > -     map_anon_folio_pmd_nopf(folio, pmd, vma, address);
> > +             spin_lock(pmd_ptl);
> > +             WARN_ON_ONCE(!pmd_none(*pmd));
> > +             pgtable_trans_huge_deposit(mm, pmd, pgtable);
> > +             map_anon_folio_pmd_nopf(folio, pmd, vma, pmd_address);
>
> If we're PMD order start_addr == pmd_address right?

Correct. If you're asking why we don't uniformly use `start_addr`
across the board, it's because using the PMD variable seemed clearer
for PMD-related functions. Let me know which you prefer.

>
> > +     } else { /* mTHP collapse */
> > +             spin_lock(pmd_ptl);
> > +             WARN_ON_ONCE(!pmd_none(*pmd));
>
> You duplicate both of these lines in both branches, pull them out?

Ill give that a shot.

>
> > +             map_anon_folio_pte_nopf(folio, pte, vma, start_addr, /*uffd_wp=*/ false);
> > +             smp_wmb(); /* make PTEs visible before PMD. See pmd_install() */
>
> It'd be much nicer to call pmd_install() :)

I don't think we can do that easily.

>
> Or maybe even to separate out the unlocked bit from pmd_install(), put that
> in e.g. __pmd_install(), then use that after lock acquired?

Can we please save all this for later? It's rather trivial; and last
time I made a cosmetic change I broke something that i had spent over
a year testing and verifying.

>
> > +             pmd_populate(mm, pmd, pmd_pgtable(_pmd));
> > +     }
> >       spin_unlock(pmd_ptl);
> >
> >       folio = NULL;
>
> Not your code but... why? I guess to avoid the folio_put() below but
> gross. Anyway this function needs refactoring, can be a follow up.

ack

>
> >
> >       result = SCAN_SUCCEED;
> >  out_up_write:
> > +     if (anon_vma_locked)
> > +             anon_vma_unlock_write(vma->anon_vma);
> > +     if (pte)
> > +             pte_unmap(pte);
>
> Again can be helped with helper struct :)
>
> >       mmap_write_unlock(mm);
> > +     *mmap_locked = false;
>
> And this... I also hate the break from if (*mmap_locked) ... etc.
>
> >  out_nolock:
> > +     WARN_ON_ONCE(*mmap_locked);
>
> Should be a VM_WARN_ON_ONCE() if we keep it.

ack to the above. I will try cleaning up the locking.

>
> >       if (folio)
> >               folio_put(folio);
> >       trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
> > @@ -1483,9 +1506,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> >       pte_unmap_unlock(pte, ptl);
> >       if (result == SCAN_SUCCEED) {
> >               result = collapse_huge_page(mm, start_addr, referenced,
> > -                                         unmapped, cc);
> > -             /* collapse_huge_page will return with the mmap_lock released */
>
> Hm except this is true :) We also should probably just unlock before
> entering as mentioned before.

Ack will keep that in mind as part of above

>
> > -             *mmap_locked = false;
> > +                                         unmapped, cc, mmap_locked,
> > +                                         HPAGE_PMD_ORDER);
> >       }
> >  out:
> >       trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
> > --
> > 2.53.0
> >
>
> Cheers, Lorenzo

Thank you for the review :)

Cheers,
-- Nico

>


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox