Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM

public inbox for linux-kernel@vger.kernel.org

From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "Joel Fernandes" <joelagnelf@nvidia.com>,
	<linux-kernel@vger.kernel.org>, <rust-for-linux@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>, <dakr@kernel.org>,
	<acourbot@nvidia.com>
Cc: "Alistair Popple" <apopple@nvidia.com>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>, <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Timur Tabi" <ttabi@nvidia.com>, <joel@joelfernandes.org>,
	"Elle Rhumsaa" <elle@weathered-steel.dev>,
	"Daniel Almeida" <daniel.almeida@collabora.com>,
	<nouveau@lists.freedesktop.org>
Subject: Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Date: Wed, 22 Oct 2025 19:41:13 +0900
Message-ID: <DDOSD746PCSR.CNAYZSTFR9XR@nvidia.com>
In-Reply-To: <20251020185539.49986-7-joelagnelf@nvidia.com>

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> Required for writing page tables directly to VRAM physical memory,
> before page tables and MMU are setup.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>  4 files changed, 273 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> new file mode 100644
> index 000000000000..54c7cd9416a9
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) mod pramin;
> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
> new file mode 100644
> index 000000000000..4f4e1b8c0b9b
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pramin.rs
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct VRAM access through PRAMIN window before page tables are set up.
> +//! PRAMIN can also write to system memory, however for simplicty we only

s/simplicty/simplicity

> +//! support VRAM access.
> +//!
> +//! # Examples
> +//!
> +//! ## Writing u32 data to VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
> +//!     pramin.write::<u32>(0x10000, &data)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +//!
> +//! ## Reading bytes from VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Read a u8 from VRAM starting at offset 0x20000
> +//!     pramin.read::<u8>(0x20000, buffer)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![expect(dead_code)]
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +use core::mem;
> +use kernel::prelude::*;
> +
> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
> +/// the VRAM directly. These addresses are consistent across all GPUs.
> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000

This definition looks like it could be an array of registers. That way
we could use its `BASE` associated constant and keep the hardware
offsets in the `regs` module.

Even if we don't use the array of registers for convenience, it is good
to have it defined in `regs` for consistency.

> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position

You can use `kernel::sizes::SZ_1M` here.
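
To make these two points more concrete, here is a rough sketch of what the `regs` side could look like. It is plain Rust so that it builds outside the kernel: `SZ_1M` is mirrored locally rather than imported from `kernel::sizes`, and `NV_PRAMIN`, `STRIDE`, `SIZE` and `offset()` are hypothetical names standing in for whatever a `register!` array definition would actually generate.

```rust
/// Hypothetical stand-in for the `regs` module.
mod regs {
    /// Mirrors `kernel::sizes::SZ_1M` so the sketch builds outside the kernel.
    pub(crate) const SZ_1M: usize = 1 << 20;

    /// The PRAMIN aperture seen as an array of 32-bit registers in BAR0.
    /// Hypothetical: in the driver this would come from `register!`.
    #[allow(non_camel_case_types)]
    pub(crate) struct NV_PRAMIN;

    impl NV_PRAMIN {
        /// BAR0 offset of the first element (the `BASE` constant mentioned above).
        pub(crate) const BASE: usize = 0x0070_0000;
        /// Size in bytes of one element.
        pub(crate) const STRIDE: usize = 4;
        /// Number of elements covered by the 1 MiB aperture.
        pub(crate) const SIZE: usize = SZ_1M / Self::STRIDE;

        /// BAR0 offset of element `idx`, or `None` if it lies outside the aperture.
        pub(crate) fn offset(idx: usize) -> Option<usize> {
            if idx < Self::SIZE {
                Some(Self::BASE + idx * Self::STRIDE)
            } else {
                None
            }
        }
    }
}
```

With something like this in place, `Pramin` would derive its BAR0 offsets from `regs::NV_PRAMIN::BASE` instead of a private constant.
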

> +
> +/// Trait for types that can be read/written through PRAMIN.
> +pub(crate) trait PraminNum: Copy + Default + Sized {
> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
> +
> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
> +
> +    fn size_bytes() -> usize {
> +        mem::size_of::<Self>()
> +    }
> +
> +    fn alignment() -> usize {
> +        Self::size_bytes()
> +    }
> +}

Since this trait requires `Sized`, you can use `size_of` and `align_of`
directly, making the `size_bytes` and `alignment` methods redundant.
Only `read_from_bar` and `write_to_bar` should remain.
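
For reference, a rough sketch of the slimmed-down trait, written as plain Rust so it can run outside the kernel. `MockBar` is a hypothetical little-endian byte array standing in for `Bar0`, `Option` replaces the kernel `Result` for the same reason, and `is_aligned` is a hypothetical helper. The accessors stay; only the size and alignment helpers go, with callers using `size_of` and `align_of` directly. For the unsigned integer types implemented here, `align_of::<T>()` equals `size_of::<T>()` on x86_64 and arm64, so the alignment check behaves as before on those targets.

```rust
use std::cell::RefCell;
use std::convert::TryInto;
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `Bar0`: a little-endian byte array.
struct MockBar(RefCell<Vec<u8>>);

/// Types that can be read/written through PRAMIN. Size and alignment are no
/// longer trait methods; callers use `size_of` and `align_of` instead.
trait PraminNum: Copy + Default + Sized {
    fn read_from_bar(bar: &MockBar, offset: usize) -> Option<Self>;
    fn write_to_bar(self, bar: &MockBar, offset: usize) -> Option<()>;
}

macro_rules! impl_pramin_num {
    ($($t:ty),*) => {$(
        impl PraminNum for $t {
            fn read_from_bar(bar: &MockBar, offset: usize) -> Option<Self> {
                let mem = bar.0.borrow();
                // Out-of-range accesses yield `None` instead of panicking.
                let bytes = mem.get(offset..offset + size_of::<Self>())?;
                Some(<$t>::from_le_bytes(bytes.try_into().ok()?))
            }

            fn write_to_bar(self, bar: &MockBar, offset: usize) -> Option<()> {
                let mut mem = bar.0.borrow_mut();
                let dst = mem.get_mut(offset..offset + size_of::<Self>())?;
                dst.copy_from_slice(&self.to_le_bytes());
                Some(())
            }
        }
    )*};
}

impl_pramin_num!(u8, u16, u32, u64);

/// Replacement for the `T::alignment()` check, using `align_of` directly.
fn is_aligned<T: PraminNum>(fb_offset: usize) -> bool {
    fb_offset % align_of::<T>() == 0
}
```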

I also wonder whether we could get rid of this trait entirely by
leveraging `FromBytes` and `AsBytes`. Since the size of the type is
known, we could have read/write methods in `Pramin` that transfer its
content using `Io` accessors of decreasing size (first 64-bit, then
32-bit, etc.) until all the data is written.
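
A minimal sketch of how the access widths could be chosen for an arbitrary byte range, such as the slice returned by `AsBytes::as_bytes()`. `plan_accesses` is a hypothetical helper: it picks the widest width (8, 4, 2 or 1 bytes) that fits in the remaining length and is naturally aligned at the current offset. When the start offset is 8-byte aligned this yields exactly the 64-bit, then 32-bit, then 16-bit, then 8-bit sequence described above; for other starts it first issues narrower accesses until it reaches a suitably aligned offset.

```rust
/// Hypothetical helper: splits the byte range `[offset, offset + len)` into
/// `(offset, width)` accesses, each using the widest width (8, 4, 2 or 1 bytes)
/// that fits in what remains and is naturally aligned at the current offset.
fn plan_accesses(mut offset: usize, mut len: usize) -> Vec<(usize, usize)> {
    let mut plan = Vec::new();
    while len > 0 {
        let width = [8usize, 4, 2, 1]
            .iter()
            .copied()
            .find(|&w| w <= len && offset % w == 0)
            .expect("a 1-byte access always fits");
        plan.push((offset, width));
        offset += width;
        len -= width;
    }
    plan
}
```

Each `(offset, width)` pair would then become one `try_write64`, `try_write32`, `try_write16` or `try_write8` call (after translating the offset through the PRAMIN window), fed from the matching bytes of the source slice.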

> +
> +/// Macro to implement PraminNum trait for unsigned integer types.
> +macro_rules! impl_pramin_unsigned_num {
> +    ($bits:literal) => {
> +        ::kernel::macros::paste! {
> +            impl PraminNum for [<u $bits>] {
> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
> +                    bar.[<try_read $bits>](offset)
> +                }
> +
> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
> +                    bar.[<try_write $bits>](self, offset)
> +                }
> +            }
> +        }
> +    };
> +}
> +
> +impl_pramin_unsigned_num!(8);
> +impl_pramin_unsigned_num!(16);
> +impl_pramin_unsigned_num!(32);
> +impl_pramin_unsigned_num!(64);
> +
> +/// Direct VRAM access through PRAMIN window before page tables are set up.
> +pub(crate) struct PraminVram<'a> {

Let's use the shorter name `Pramin`. The limitation to VRAM is a
reasonable one (the CPU can access its own system memory directly), so
there is no need to encode it in the name.

> +    bar: &'a Bar0,
> +    saved_window_addr: usize,
> +}
> +
> +impl<'a> PraminVram<'a> {
> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
> +    /// the state is restored when the accessor is dropped.
> +    ///
> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
> +        let saved_window_addr = Self::get_window_addr(bar);
> +        Self {
> +            bar,
> +            saved_window_addr,
> +        }
> +    }
> +
> +    /// Set BAR0 window to point to specific FB region.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
> +    ///                 Must be 64KB aligned (lower 16 bits zero).

Let's follow the Rust doc-comment guidelines for documenting the arguments.

> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
> +        if fb_offset & 0xFFFF != 0 {
> +            return Err(EINVAL);
> +        }

Since this method is private and called from controlled contexts for
which `fb_offset` should always be valid, we can request callers to
give us a "window index" (e.g. the `window_base` of the
`NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.
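
Roughly what I have in mind, as a plain-Rust sketch with a mocked register. `MockBar0`, `WINDOW_BASE_MASK` and `split_fb_offset()` are hypothetical; the mask corresponds to the 23:0 `window_base` field defined in the patch. The doc comments describe the argument in prose rather than in an aligned `# Arguments` list, and the read-back check from the patch is left out for brevity.

```rust
use std::cell::Cell;

/// Hypothetical mock of BAR0 that only models `NV_PBUS_BAR0_WINDOW`.
struct MockBar0 {
    bar0_window: Cell<u32>,
}

/// Bits 23:0 of `NV_PBUS_BAR0_WINDOW`, i.e. the `window_base` field.
const WINDOW_BASE_MASK: u32 = 0x00ff_ffff;

struct Pramin<'a> {
    bar: &'a MockBar0,
}

impl<'a> Pramin<'a> {
    /// Points the PRAMIN aperture at 64 KiB VRAM block number `window_base`.
    ///
    /// Every `window_base` value designates a 64 KiB-aligned VRAM address
    /// (`window_base << 16`), so there is nothing to validate. As in the patch,
    /// the register is built from a default value, so `target` ends up as 0
    /// (VID_MEM).
    fn set_window_base(&self, window_base: u32) {
        self.bar.bar0_window.set(window_base & WINDOW_BASE_MASK);
    }

    /// Returns the 64 KiB VRAM block the aperture currently points to.
    fn window_base(&self) -> u32 {
        self.bar.bar0_window.get() & WINDOW_BASE_MASK
    }
}

/// Hypothetical helper: splits a VRAM byte address into the window index to
/// program and the byte offset from the start of that 64 KiB-aligned window.
fn split_fb_offset(fb_offset: usize) -> (u32, usize) {
    ((fb_offset >> 16) as u32 & WINDOW_BASE_MASK, fb_offset & 0xffff)
}
```

The conversion between byte addresses and window indices then lives in one place on the caller side, and `NV_PBUS_BAR0_WINDOW` needs no extra methods.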

> +
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
> +        window_reg.write(self.bar);
> +
> +        // Read back to ensure it took effect
> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
> +        if readback.window_base() != window_reg.window_base() {
> +            return Err(EIO);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Get current BAR0 window offset.
> +    ///
> +    /// # Returns
> +    ///
> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
> +    /// This offset is always 64KB aligned.
> +    fn get_window_addr(bar: &Bar0) -> usize {
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
> +        window_reg.get_window_addr()
> +    }
> +
> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `num_items` - Number of items of type `T` to process.
> +    /// * `operation` - Closure called for each item to perform the actual read/write.
> +    ///                 Takes two parameters:
> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access

The formatting of the arguments is odd here as well.

> +    ///
> +    /// The function automatically handles PRAMIN window repositioning when accessing
> +    /// data that spans multiple 1MB windows.

Conversely, this large method is under-documented. A description of
what `operation` is supposed to do would be helpful.
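
A possible shape for that documentation, shown on a plain-Rust model of the loop so it can run outside the kernel. The model makes a few assumptions that differ from the patch: the item size is passed explicitly instead of coming from `T`, moving the window is delegated to a `set_window` closure (which in the driver would program `NV_PBUS_BAR0_WINDOW`), the current window is tracked locally rather than read back from the register, and the alignment check is left to the caller.

```rust
/// BAR0 offset of the PRAMIN aperture and its size (1 MiB), as in the patch.
const PRAMIN_BASE: usize = 0x0070_0000;
const PRAMIN_SIZE: usize = 0x0010_0000;
/// The window base register counts in 64 KiB units.
const WINDOW_SHIFT: usize = 16;

/// Visits `num_items` consecutive items of `item_size` bytes (1, 2, 4 or 8),
/// starting at VRAM byte address `fb_offset`, through the PRAMIN aperture.
///
/// Whenever the next item lies in a different 64 KiB block than the current
/// window, `set_window` is called with the new window index (VRAM address >> 16).
/// Then `operation(idx, bar0_offset)` is called once per item, where `idx` is
/// the item's index in the caller's slice (`0..num_items`) and `bar0_offset` is
/// where that item is visible in BAR0. `operation` performs the actual access,
/// i.e. reads into or writes from the caller's slice at `idx`; the first error
/// returned by either closure aborts the walk.
fn access_vram<E, S, F>(
    fb_offset: usize,
    item_size: usize,
    num_items: usize,
    mut set_window: S,
    mut operation: F,
) -> Result<(), E>
where
    S: FnMut(usize) -> Result<(), E>,
    F: FnMut(usize, usize) -> Result<(), E>,
{
    let mut addr = fb_offset;
    let mut idx = 0;
    let mut current_window = None;

    while idx < num_items {
        let window = addr >> WINDOW_SHIFT;
        let window_offset = addr - (window << WINDOW_SHIFT);
        if current_window != Some(window) {
            set_window(window)?;
            current_window = Some(window);
        }

        // Items reachable before the end of the 1 MiB aperture.
        let count = ((PRAMIN_SIZE - window_offset) / item_size).min(num_items - idx);
        for i in 0..count {
            operation(idx + i, PRAMIN_BASE + window_offset + i * item_size)?;
        }
        idx += count;
        addr += count * item_size;
    }
    Ok(())
}
```

In the driver, `write()` would pass an `operation` that writes `data[idx]` to `bar0_offset`, and `read()` one that stores the value read from `bar0_offset` into `data[idx]`.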

> +    fn access_vram<T: PraminNum, F>(
> +        &self,
> +        fb_offset: usize,
> +        num_items: usize,
> +        mut operation: F,
> +    ) -> Result
> +    where
> +        F: FnMut(usize, usize) -> Result,
> +    {
> +        // FB offset must be aligned to the size of T
> +        if fb_offset & (T::alignment() - 1) != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut offset_bytes = fb_offset;
> +        let mut remaining_items = num_items;
> +        let mut data_index = 0;
> +
> +        while remaining_items > 0 {
> +            // Align the window to 64KB boundary
> +            let target_window = offset_bytes & !0xFFFF;
> +            let window_offset = offset_bytes - target_window;
> +
> +            // Set window if needed
> +            if target_window != Self::get_window_addr(self.bar) {
> +                self.set_window_addr(target_window)?;
> +            }
> +
> +            // Calculate how many items we can access from this window position
> +            // We can access up to 1MB total, minus the offset within the window
> +            let remaining_in_window = PRAMIN_SIZE - window_offset;
> +            let max_items_in_window = remaining_in_window / T::size_bytes();
> +            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
> +
> +            // Process data through PRAMIN
> +            for i in 0..items_to_write {
> +                // Calculate the byte offset in the PRAMIN window to write to.
> +                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
> +                operation(data_index + i, pramin_offset_bytes)?;
> +            }
> +
> +            // Move to next chunk.
> +            data_index += items_to_write;
> +            offset_bytes += items_to_write * T::size_bytes();
> +            remaining_items -= items_to_write;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Generic write for data to VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
> +    ///            starting at `fb_offset`.
> +    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx].write_to_bar(self.bar, pramin_offset)
> +        })
> +    }
> +
> +    /// Generic read data from VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Mutable slice that will be filled with data read from VRAM.
> +    ///            The number of items read equals `data.len()`.
> +    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
> +            Ok(())
> +        })
> +    }
> +}
> +
> +impl<'a> Drop for PraminVram<'a> {
> +    fn drop(&mut self) {
> +        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 112277c7921e..6bd9fc3372d6 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -13,6 +13,7 @@
>  mod gfw;
>  mod gpu;
>  mod gsp;
> +mod mm;
>  mod regs;
>  mod util;
>  mod vbios;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index a3836a01996b..ba09da7e1541 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -12,6 +12,7 @@
>      FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
>  };
>  use crate::gpu::{Architecture, Chipset};
> +use kernel::bits::genmask_u32;
>  use kernel::prelude::*;
>  
>  // PMC
> @@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      }
>  }
>  
> -// PBUS
> +// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
> +// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
>  
>  register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
>  
> @@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      31:16   frts_err_code as u16;
>  });
>  
> +// BAR0 window control register to configure the BAR0 window for PRAMIN access
> +// (direct physical VRAM access).
> +register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
> +    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
> +    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
> +});
> +
> +impl NV_PBUS_BAR0_WINDOW {
> +    /// Returns the 64-bit aligned VRAM address of the window.
> +    pub(crate) fn get_window_addr(self) -> usize {
> +        (self.window_base() as usize) << 16
> +    }
> +
> +    /// Sets the window address from a framebuffer offset.
> +    /// The fb_offset must be 64KB aligned (lower bits discared).
> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
> +        // Calculate window base (bits 39:16 of FB address)
> +        // The total FB address is 40 bits, mask anything above. Since we are
> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
> +        let mask = genmask_u32(0..=23) as usize;
> +        let window_base = ((fb_offset >> 16) & mask) as u32;
> +        self.set_window_base(window_base)
> +    }
> +}

If you work directly with `window_base` as suggested above, this impl
block can be dropped altogether.
