rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core
@ 2025-10-20 18:55 Joel Fernandes
  2025-10-20 18:55 ` [PATCH 1/7] docs: rust: Fix a few grammatical errors Joel Fernandes
                   ` (8 more replies)
  0 siblings, 9 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

These patches have some prerequistes needed for nova-core as support is added
for memory management and interrupt handling. I rebased them on drm-rust-next
and would like them to be considered for the next merge window. I also included
a simple rust documentation patch fixing some issues I noticed while reading it :).

The series adds support for the PRAMIN aperture mechanism, which is a
prerequisite for virtual memory as it is required to boot strap virtual memory
(we cannot write to VRAM using virtual memory because we need to write page
tables to VRAM in the first place).

I also add page table related structures (mm/types.rs) using the bitfield
macro, which will be used for page table walking, memory mapping, etc. This is
currently unused code, because without physical memory allocation (using the
buddy allocator), we cannot use this code as page table pages need to be
allocated in the first place. However, I have included several examples in the
file about how these structures will be used. I have also simplified the code
keeping future additions to it for later.

For interrupts, I only have added additional support for GSP's message queue
interrupt. I am working on adding support to the interrupt controller module
(VFN) which is the next thing for me to post after this series. I have it
prototyped and working, however I am currently making several changes to it
related to virtual functions. For now in this series, I just want to get the
GSP-specific patch out of the way, hence I am including it here.

I also have added a patch for bitfield macro which constructs a bitfield struct
given its storage type. This is used in a later GSP interrupt patch in the
series to read from one register and write to another.

Joel Fernandes (7):
  docs: rust: Fix a few grammatical errors
  gpu: nova-core: Add support to convert bitfield to underlying type
  docs: gpu: nova-core: Document GSP RPC message queue architecture
  docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  gpu: nova-core: Add support for managing GSP falcon interrupts
  nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  nova-core: mm: Add data structures for page table management

 Documentation/gpu/nova/core/msgq.rst     | 159 +++++++++
 Documentation/gpu/nova/core/pramin.rst   | 113 +++++++
 Documentation/gpu/nova/index.rst         |   2 +
 Documentation/rust/coding-guidelines.rst |   4 +-
 drivers/gpu/nova-core/bitfield.rs        |   7 +
 drivers/gpu/nova-core/falcon/gsp.rs      |  26 +-
 drivers/gpu/nova-core/gpu.rs             |   2 +-
 drivers/gpu/nova-core/mm/mod.rs          |   4 +
 drivers/gpu/nova-core/mm/pramin.rs       | 241 ++++++++++++++
 drivers/gpu/nova-core/mm/types.rs        | 405 +++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs       |   1 +
 drivers/gpu/nova-core/regs.rs            |  39 ++-
 12 files changed, 996 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/gpu/nova/core/msgq.rst
 create mode 100644 Documentation/gpu/nova/core/pramin.rst
 create mode 100644 drivers/gpu/nova-core/mm/mod.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
 create mode 100644 drivers/gpu/nova-core/mm/types.rs

-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/7] docs: rust: Fix a few grammatical errors
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-20 21:21   ` John Hubbard
  2025-10-20 21:33   ` Miguel Ojeda
  2025-10-20 18:55 ` [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type Joel Fernandes
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

Fix two grammatical errors in the Rust coding guidelines document.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/rust/coding-guidelines.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/rust/coding-guidelines.rst b/Documentation/rust/coding-guidelines.rst
index 6ff9e754755d..d556f0db042b 100644
--- a/Documentation/rust/coding-guidelines.rst
+++ b/Documentation/rust/coding-guidelines.rst
@@ -97,7 +97,7 @@ should still be used. For instance:
 	// TODO: ...
 	fn f() {}
 
-One special kind of comments are the ``// SAFETY:`` comments. These must appear
+One special kind of comment is the ``// SAFETY:`` comment. These must appear
 before every ``unsafe`` block, and they explain why the code inside the block is
 correct/sound, i.e. why it cannot trigger undefined behavior in any case, e.g.:
 
@@ -166,7 +166,7 @@ in the kernel:
 - While not shown here, if a function may panic, the conditions under which
   that happens must be described under a ``# Panics`` section.
 
-  Please note that panicking should be very rare and used only with a good
+  Please note that panicking should be very rare and used only for a good
   reason. In almost all cases, a fallible approach should be used, typically
   returning a ``Result``.
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
  2025-10-20 18:55 ` [PATCH 1/7] docs: rust: Fix a few grammatical errors Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-20 21:25   ` John Hubbard
  2025-10-22  6:25   ` Alexandre Courbot
  2025-10-20 18:55 ` [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture Joel Fernandes
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

To support the usecase where we read a register and write to another
with identical bit layout, add support to convert bitfield to underlying type.

Another way to do this, is to read individual fields, on the caller
side, and write to the destination fields, but that is both cumbersome
and error-prone as new bits added in hardware may be missed.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/bitfield.rs | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/nova-core/bitfield.rs b/drivers/gpu/nova-core/bitfield.rs
index 0994505393dd..2266abc3f7ab 100644
--- a/drivers/gpu/nova-core/bitfield.rs
+++ b/drivers/gpu/nova-core/bitfield.rs
@@ -72,6 +72,7 @@
 /// - Field setters: `set_mode()`, `set_state()`, etc. (supports chaining with builder pattern).
 ///   Note that the compiler will error out if the size of the setter's arg exceeds the
 ///   struct's storage size.
+/// - Conversion from the underlying storage type (e.g., `From<u32>`).
 /// - Debug and Default implementations.
 ///
 /// Note: Field accessors and setters inherit the same visibility as the struct itself.
@@ -117,6 +118,12 @@ fn from(val: $name) -> $storage {
             }
         }
 
+        impl ::core::convert::From<$storage> for $name {
+            fn from(val: $storage) -> $name {
+                $name(val)
+            }
+        }
+
         bitfield!(@fields_dispatcher $vis $name $storage { $($fields)* });
     };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
  2025-10-20 18:55 ` [PATCH 1/7] docs: rust: Fix a few grammatical errors Joel Fernandes
  2025-10-20 18:55 ` [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-20 21:49   ` John Hubbard
                     ` (2 more replies)
  2025-10-20 18:55 ` [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

Document the GSP RPC message queue architecture in detail.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/nova/core/msgq.rst | 159 +++++++++++++++++++++++++++
 Documentation/gpu/nova/index.rst     |   1 +
 2 files changed, 160 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/msgq.rst

diff --git a/Documentation/gpu/nova/core/msgq.rst b/Documentation/gpu/nova/core/msgq.rst
new file mode 100644
index 000000000000..84e25be69cd6
--- /dev/null
+++ b/Documentation/gpu/nova/core/msgq.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Nova GPU RPC Message Passing Architecture
+=========================================
+
+.. note::
+   The following description is approximate and current as of the Ampere family.
+   It may change for future generations and is intended to assist in understanding
+   the driver code.
+
+Overview
+========
+
+The Nova GPU driver communicates with the GSP (GPU System Processor) firmware
+using an RPC (Remote Procedure Call) mechanism built on top of circular message
+queues in shared memory. This document describes the structure of RPC messages
+and the mechanics of the message passing system.
+
+Message Queue Architecture
+==========================
+
+The communication between CPU and GSP uses two unidirectional circular queues:
+
+1. **CPU Queue (cpuq)**: CPU writes, GSP reads
+2. **GSP Queue (gspq)**: GSP writes, CPU reads
+
+The advantage of this approach is no synchronization is required to access the
+queues, if one entity wants to communicate with the other (CPU or GSP), they
+simply write into their own queue.
+
+Memory Layout
+-------------
+
+The shared memory region (GspMem) where the queues reside has the following
+layout::
+
+    +------------------------+ GspMem DMA Handle (base address)
+    |    PTE Array (4KB)     |  <- Self-mapping page table
+    | PTE[0] = base + 0x0000 |     Points to this page
+    | PTE[1] = base + 0x1000 |     Points to CPU queue Header page
+    | PTE[2] = base + 0x2000 |     Points to first page of CPU queue data
+    | ...                    |     ...
+    | ...                    |     ...
+    +------------------------+ base + 0x1000
+    |    CPU Queue Header    |  MsgqTxHeader + MsgqRxHeader
+    |    - TX Header (32B)   |
+    |    - RX Header (4B)    | (1 page)
+    |    - Padding           |
+    +------------------------+ base + 0x2000
+    |    CPU Queue Data      | (63 pages)
+    |    (63 x 4KB pages)    |  Circular buffer for messages
+    | ...                    |     ...
+    +------------------------+ base + 0x41000
+    |    GSP Queue Header    |  MsgqTxHeader + MsgqRxHeader
+    |    - TX Header (32B)   |
+    |    - RX Header (4B)    | (1 page)
+    |    - Padding           |
+    +------------------------+ base + 0x42000
+    |    GSP Queue Data      | (63 pages)
+    |    (63 x 4KB pages)    |  Circular buffer for messages
+    | ...                    |     ...
+    +------------------------+ base + 0x81000
+
+
+Message Passing Mechanics
+-------------------------
+The split read/write pointer design allows bidirectional communication between the
+CPU and GSP without synchronization (if it were a shared queue), for example, the
+following diagram illustrates pointer updates, when CPU sends message to GSP::
+
+    +--------------------------------------------------------------------------+
+    |                     DMA coherent Shared Memory (GspMem)                  |
+    +--------------------------------------------------------------------------+
+    |                          (CPU sending message to GSP)                    |
+    |  +-------------------+                      +-------------------+        |
+    |  |   GSP Queue       |                      |   CPU Queue       |        |
+    |  |                   |                      |                   |        |
+    |  | +-------------+   |                      | +-------------+   |        |
+    |  | |  TX Header  |   |                      | |  TX Header  |   |        |
+    |  | | write_ptr   |   |                      | | write_ptr   |---+----,   |
+    |  | |             |   |                      | |             |   |    |   |
+    |  | +-------------+   |                      | +-------------+   |    |   |
+    |  |                   |                      |                   |    |   |
+    |  | +-------------+   |                      | +-------------+   |    |   |
+    |  | |  RX Header  |   |                      | |  RX Header  |   |    |   |
+    |  | |  read_ptr ------+-------,              | |  read_ptr   |   |    |   |
+    |  | |             |   |       |              | |             |   |    |   |
+    |  | +-------------+   |       |              | +-------------+   |    |   |
+    |  |                   |       |              |                   |    |   |
+    |  | +-------------+   |       |              | +-------------+   |    |   |
+    |  | |   Page 0    |   |       |              | |   Page 0    |   |    |   |
+    |  | +-------------+   |       |              | +-------------+   |    |   |
+    |  | |   Page 1    |   |       `--------------> |   Page 1    |   |    |   |
+    |  | +-------------+   |                      | +-------------+   |    |   |
+    |  | |   Page 2    |   |                      | |   Page 2    |<--+----'   |
+    |  | +-------------+   |                      | +-------------+   |        |
+    |  | |     ...     |   |                      | |     ...     |   |        |
+    |  | +-------------+   |                      | +-------------+   |        |
+    |  | |   Page 62   |   |                      | |   Page 62   |   |        |
+    |  | +-------------+   |                      | +-------------+   |        |
+    |  |   (63 pages)      |                      |   (63 pages)      |        |
+    |  +-------------------+                      +-------------------+        |
+    |                                                                          |
+    +--------------------------------------------------------------------------+
+
+When the CPU sends a message to the GSP, it writes the message to its own
+queue (CPU queue) and updates the write pointer in its queue's TX header. The GSP
+then reads the read pointer in its own queue's RX header and knows that there are
+pending messages from the CPU because its RX header's read pointer is behind the
+CPU's TX header's write pointer. After reading the message, the GSP updates its RX
+header's read pointer to catch up. The same happens in reverse.
+
+Page-based message passing
+--------------------------
+The message queue is page-based, which means that the message is stored in a
+page-aligned buffer. The page size is 4KB. Each message starts at the beginning of
+a page. If the message is shorter than a page, the remaining space in the page is
+wasted. The next message starts at the beginning of the next page no matter how
+small the previous message was.
+
+Note that messages larger than a page will span multiple pages. This means that
+it is possible that the first part of the message lands on the last page, and the
+second part of the message lands on the first page, thus requiring out-of-order
+memory access. The SBuffer data structure in Nova tackles this use case.
+
+RPC Message Structure:
+======================
+
+An RPC message is also called a "Message Element". The entire message has
+multiple headers. There is a "message element" header which handles message
+queue specific details and integrity, followed by a "RPC" header which handles
+the RPC protocol details::
+
+    +----------------------------------+
+    |        GspMsgHeader (64B)        | (aka, Message Element Header)
+    +----------------------------------+
+    | auth_tag_buffer[16]              | --+
+    | aad_buffer[16]                   |   |
+    | checksum        (u32)            |   +-- Security & Integrity
+    | sequence        (u32)            |   |
+    | elem_count      (u32)            |   |
+    | pad             (u32)            | --+
+    +----------------------------------+
+    |        GspRpcHeader (32B)        |
+    +----------------------------------+
+    | header_version  (0x03000000)     | --+
+    | signature       (0x43505256)     |   |
+    | length          (u32)            |   +-- RPC Protocol
+    | function        (u32)            |   |
+    | rpc_result      (u32)            |   |
+    | rpc_result_private (u32)         |   |
+    | sequence        (u32)            |   |
+    | cpu_rm_gfid     (u32)            | --+
+    +----------------------------------+
+    |                                  |
+    |        Payload (Variable)        | --- Function-specific data
+    |                                  |
+    +----------------------------------+
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index e39cb3163581..46302daace34 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/devinit
    core/fwsec
    core/falcon
+   core/msgq
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
                   ` (2 preceding siblings ...)
  2025-10-20 18:55 ` [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-20 19:36   ` John Hubbard
                     ` (2 more replies)
  2025-10-20 18:55 ` [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts Joel Fernandes
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

While not terribly complicated, a little bit of documentation will help
augment the code for this very important mechanism.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/nova/core/pramin.rst | 113 +++++++++++++++++++++++++
 Documentation/gpu/nova/index.rst       |   1 +
 2 files changed, 114 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/pramin.rst

diff --git a/Documentation/gpu/nova/core/pramin.rst b/Documentation/gpu/nova/core/pramin.rst
new file mode 100644
index 000000000000..19615e504db9
--- /dev/null
+++ b/Documentation/gpu/nova/core/pramin.rst
@@ -0,0 +1,113 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+PRAMIN aperture mechanism
+=========================
+
+.. note::
+   The following description is approximate and current as of the Ampere family.
+   It may change for future generations and is intended to assist in understanding
+   the driver code.
+
+Introduction
+============
+
+PRAMIN is a hardware aperture mechanism that provides CPU access to GPU Video RAM (VRAM) before
+the GPU's Memory Management Unit (MMU) and page tables are initialized. This 1MB sliding window,
+located at a fixed offset within BAR0, is essential for setting up page tables and other critical
+GPU data structures without relying on the GPU's MMU.
+
+Architecture Overview
+=====================
+
+Logically, the PRAMIN aperture mechanism is implemented by the GPU's PBUS (PCIe Bus Controller Unit)
+and provides a CPU-accessible window into VRAM through the PCIe interface::
+
+    +-----------------+    PCIe     +------------------------------+
+    |      CPU        |<----------->|           GPU                |
+    +-----------------+             |                              |
+                                    |  +----------------------+    |
+                                    |  |       PBUS           |    |
+                                    |  |  (Bus Controller)    |    |
+                                    |  |                      |    |
+                                    |  |  +--------------.<------------ (window always starts at
+                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
+                                    |  |  |   Window     |    |    |
+                                    |  |  |   (1MB)      |    |    |
+                                    |  |  +--------------+    |    |
+                                    |  |         |            |    |
+                                    |  +---------|------------+    |
+                                    |            |                 |
+                                    |            v                 |
+                                    |  .----------------------.<------------ (Program PRAMIN to any
+                                    |  |       VRAM           |    |    64KB VRAM physical boundary)
+                                    |  |    (Several GBs)     |    |
+                                    |  |                      |    |
+                                    |  |  FB[0x000000000000]  |    |
+                                    |  |          ...         |    |
+                                    |  |  FB[0x7FFFFFFFFFF]   |    |
+                                    |  +----------------------+    |
+                                    +------------------------------+
+
+PBUS (PCIe Bus Controller) among other things is responsible in the GPU for handling MMIO
+accesses to the BAR registers.
+
+PRAMIN Window Operation
+=======================
+
+The PRAMIN window provides a 1MB sliding aperture that can be repositioned over
+the entire VRAM address space using the NV_PBUS_BAR0_WINDOW register.
+
+Window Control Mechanism
+-------------------------
+
+The window position is controlled via the PBUS BAR0_WINDOW register::
+
+    NV_PBUS_BAR0_WINDOW Register
+    +-----+-----+--------------------------------------+
+    |31-26|25-24|           23-0                       |
+    |     |TARG |         BASE_ADDR                    |
+    |     | ET  |        (bits 39:16 of VRAM address)  |
+    +-----+-----+--------------------------------------+
+
+    TARGET field values:
+    - 0x0: VID_MEM (Video Memory / VRAM)
+    - 0x1: SYS_MEM_COHERENT (Coherent system memory)
+    - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)
+
+64KB Alignment Requirement
+---------------------------
+
+The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
+by the BASE_ADDR field representing bits [39:16] of the target address::
+
+    VRAM Address Calculation:
+    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
+    Where:
+    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
+    - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]
+    Example Window Positioning:
+    +---------------------------------------------------------+
+    |                    VRAM Space                           |
+    |                                                         |
+    |  0x000000000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x0000FFFFF  +-----------------+                       |
+    |                                                         |
+    |       |              ^                                  |
+    |       |              | Window can slide                 |
+    |       v              | to any 64KB boundary             |
+    |                                                         |
+    |  0x123400000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x1234FFFFF  +-----------------+                       |
+    |                                                         |
+    |                       ...                               |
+    |                                                         |
+    |  0x7FFFF0000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x7FFFFFFFF  +-----------------+                       |
+    +---------------------------------------------------------+
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index 46302daace34..e77d3ee336a4 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -33,3 +33,4 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/fwsec
    core/falcon
    core/msgq
+   core/pramin
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
                   ` (3 preceding siblings ...)
  2025-10-20 18:55 ` [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-20 22:35   ` John Hubbard
  2025-10-22  6:47   ` Alexandre Courbot
  2025-10-20 18:55 ` [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

Add support for managing GSP falcon interrupts. These are required for
GSP message queue interrupt handling.

Also rename clear_swgen0_intr() to enable_msq_interrupt() for
readability.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/falcon/gsp.rs | 26 +++++++++++++++++++++++---
 drivers/gpu/nova-core/gpu.rs        |  2 +-
 drivers/gpu/nova-core/regs.rs       | 10 ++++++++++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
index f17599cb49fa..6da63823996b 100644
--- a/drivers/gpu/nova-core/falcon/gsp.rs
+++ b/drivers/gpu/nova-core/falcon/gsp.rs
@@ -22,11 +22,31 @@ impl FalconEngine for Gsp {
 }
 
 impl Falcon<Gsp> {
-    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
-    /// allow GSP to signal CPU for processing new messages in message queue.
-    pub(crate) fn clear_swgen0_intr(&self, bar: &Bar0) {
+    /// Enable the GSP Falcon message queue interrupt (SWGEN0 interrupt).
+    #[expect(dead_code)]
+    pub(crate) fn enable_msgq_interrupt(&self, bar: &Bar0) {
+        regs::NV_PFALCON_FALCON_IRQMASK::alter(bar, &Gsp::ID, |r| r.set_swgen0(true));
+    }
+
+    /// Check if the message queue interrupt is pending.
+    #[expect(dead_code)]
+    pub(crate) fn has_msgq_interrupt(&self, bar: &Bar0) -> bool {
+        regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID).swgen0()
+    }
+
+    /// Clears the message queue interrupt to allow GSP to signal CPU
+    /// for processing new messages.
+    pub(crate) fn clear_msgq_interrupt(&self, bar: &Bar0) {
         regs::NV_PFALCON_FALCON_IRQSCLR::default()
             .set_swgen0(true)
             .write(bar, &Gsp::ID);
     }
+
+    /// Acknowledge all pending GSP interrupts.
+    #[expect(dead_code)]
+    pub(crate) fn ack_all_interrupts(&self, bar: &Bar0) {
+        // Read status and write the raw value to IRQSCLR to clear all pending interrupts.
+        let status = regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID);
+        regs::NV_PFALCON_FALCON_IRQSCLR::from(u32::from(status)).write(bar, &Gsp::ID);
+    }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index af20e2daea24..fb120cf7b15d 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -216,7 +216,7 @@ pub(crate) fn new<'a>(
                 bar,
                 spec.chipset > Chipset::GA100,
             )
-            .inspect(|falcon| falcon.clear_swgen0_intr(bar))?,
+            .inspect(|falcon| falcon.clear_msgq_interrupt(bar))?,
 
             sec2_falcon: Falcon::new(pdev.as_ref(), spec.chipset, bar, true)?,
 
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 206dab2e1335..a3836a01996b 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -198,6 +198,16 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
 
 // PFALCON
 
+register!(NV_PFALCON_FALCON_IRQMASK @ PFalconBase[0x00000014] {
+    4:4     halt as bool;
+    6:6     swgen0 as bool;
+});
+
+register!(NV_PFALCON_FALCON_IRQSTAT @ PFalconBase[0x00000008] {
+    4:4     halt as bool;
+    6:6     swgen0 as bool;
+});
+
 register!(NV_PFALCON_FALCON_IRQSCLR @ PFalconBase[0x00000004] {
     4:4     halt as bool;
     6:6     swgen0 as bool;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
                   ` (4 preceding siblings ...)
  2025-10-20 18:55 ` [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-22  2:18   ` John Hubbard
  2025-10-22 10:41   ` Alexandre Courbot
  2025-10-20 18:55 ` [PATCH 7/7] nova-core: mm: Add data structures for page table management Joel Fernandes
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

Required for writing page tables directly to VRAM physical memory,
before page tables and MMU are setup.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs    |   3 +
 drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  29 +++-
 4 files changed, 273 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/nova-core/mm/mod.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
new file mode 100644
index 000000000000..54c7cd9416a9
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+
+pub(crate) mod pramin;
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
new file mode 100644
index 000000000000..4f4e1b8c0b9b
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Direct VRAM access through PRAMIN window before page tables are set up.
+//! PRAMIN can also write to system memory, however for simplicty we only
+//! support VRAM access.
+//!
+//! # Examples
+//!
+//! ## Writing u32 data to VRAM
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin::PraminVram;
+//!
+//! fn write_data_to_vram(bar: &Bar0) -> Result {
+//!     let pramin = PraminVram::new(bar);
+//!     // Write 4 32-bit words to VRAM at offset 0x10000
+//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
+//!     pramin.write::<u32>(0x10000, &data)?;
+//!     Ok(())
+//! }
+//! ```
+//!
+//! ## Reading bytes from VRAM
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin::PraminVram;
+//!
+//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
+//!     let pramin = PraminVram::new(bar);
+//!     // Read a u8 from VRAM starting at offset 0x20000
+//!     pramin.read::<u8>(0x20000, buffer)?;
+//!     Ok(())
+//! }
+//! ```
+
+#![expect(dead_code)]
+
+use crate::driver::Bar0;
+use crate::regs;
+use core::mem;
+use kernel::prelude::*;
+
+/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
+/// the VRAM directly. These addresses are consistent across all GPUs.
+const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
+const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
+
+/// Trait for types that can be read/written through PRAMIN.
+pub(crate) trait PraminNum: Copy + Default + Sized {
+    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
+
+    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
+
+    fn size_bytes() -> usize {
+        mem::size_of::<Self>()
+    }
+
+    fn alignment() -> usize {
+        Self::size_bytes()
+    }
+}
+
+/// Macro to implement PraminNum trait for unsigned integer types.
+macro_rules! impl_pramin_unsigned_num {
+    ($bits:literal) => {
+        ::kernel::macros::paste! {
+            impl PraminNum for [<u $bits>] {
+                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
+                    bar.[<try_read $bits>](offset)
+                }
+
+                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
+                    bar.[<try_write $bits>](self, offset)
+                }
+            }
+        }
+    };
+}
+
+impl_pramin_unsigned_num!(8);
+impl_pramin_unsigned_num!(16);
+impl_pramin_unsigned_num!(32);
+impl_pramin_unsigned_num!(64);
+
+/// Direct VRAM access through PRAMIN window before page tables are set up.
+pub(crate) struct PraminVram<'a> {
+    bar: &'a Bar0,
+    saved_window_addr: usize,
+}
+
+impl<'a> PraminVram<'a> {
+    /// Create a new PRAMIN VRAM accessor, saving current window state,
+    /// the state is restored when the accessor is dropped.
+    ///
+    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
+    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
+    pub(crate) fn new(bar: &'a Bar0) -> Self {
+        let saved_window_addr = Self::get_window_addr(bar);
+        Self {
+            bar,
+            saved_window_addr,
+        }
+    }
+
+    /// Set BAR0 window to point to specific FB region.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
+    ///                 Must be 64KB aligned (lower 16 bits zero).
+    fn set_window_addr(&self, fb_offset: usize) -> Result {
+        // FB offset must be 64KB aligned (hardware requirement for window_base field)
+        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
+        if fb_offset & 0xFFFF != 0 {
+            return Err(EINVAL);
+        }
+
+        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
+        window_reg.write(self.bar);
+
+        // Read back to ensure it took effect
+        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
+        if readback.window_base() != window_reg.window_base() {
+            return Err(EIO);
+        }
+
+        Ok(())
+    }
+
+    /// Get current BAR0 window offset.
+    ///
+    /// # Returns
+    ///
+    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
+    /// This offset is always 64KB aligned.
+    fn get_window_addr(bar: &Bar0) -> usize {
+        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
+        window_reg.get_window_addr()
+    }
+
+    /// Common logic for accessing VRAM data through PRAMIN with windowing.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
+    ///                 Must be aligned to `T::alignment()`.
+    /// * `num_items` - Number of items of type `T` to process.
+    /// * `operation` - Closure called for each item to perform the actual read/write.
+    ///                 Takes two parameters:
+    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
+    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
+    ///
+    /// The function automatically handles PRAMIN window repositioning when accessing
+    /// data that spans multiple 1MB windows.
+    fn access_vram<T: PraminNum, F>(
+        &self,
+        fb_offset: usize,
+        num_items: usize,
+        mut operation: F,
+    ) -> Result
+    where
+        F: FnMut(usize, usize) -> Result,
+    {
+        // FB offset must be aligned to the size of T
+        if fb_offset & (T::alignment() - 1) != 0 {
+            return Err(EINVAL);
+        }
+
+        let mut offset_bytes = fb_offset;
+        let mut remaining_items = num_items;
+        let mut data_index = 0;
+
+        while remaining_items > 0 {
+            // Align the window to 64KB boundary
+            let target_window = offset_bytes & !0xFFFF;
+            let window_offset = offset_bytes - target_window;
+
+            // Set window if needed
+            if target_window != Self::get_window_addr(self.bar) {
+                self.set_window_addr(target_window)?;
+            }
+
+            // Calculate how many items we can access from this window position
+            // We can access up to 1MB total, minus the offset within the window
+            let remaining_in_window = PRAMIN_SIZE - window_offset;
+            let max_items_in_window = remaining_in_window / T::size_bytes();
+            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
+
+            // Process data through PRAMIN
+            for i in 0..items_to_write {
+                // Calculate the byte offset in the PRAMIN window to write to.
+                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
+                operation(data_index + i, pramin_offset_bytes)?;
+            }
+
+            // Move to next chunk.
+            data_index += items_to_write;
+            offset_bytes += items_to_write * T::size_bytes();
+            remaining_items -= items_to_write;
+        }
+
+        Ok(())
+    }
+
+    /// Generic write for data to VRAM through PRAMIN.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
+    ///                 Must be aligned to `T::alignment()`.
+    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
+    ///            starting at `fb_offset`.
+    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
+        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
+            data[data_idx].write_to_bar(self.bar, pramin_offset)
+        })
+    }
+
+    /// Generic read data from VRAM through PRAMIN.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
+    ///                 Must be aligned to `T::alignment()`.
+    /// * `data` - Mutable slice that will be filled with data read from VRAM.
+    ///            The number of items read equals `data.len()`.
+    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
+        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
+            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
+            Ok(())
+        })
+    }
+}
+
+impl<'a> Drop for PraminVram<'a> {
+    fn drop(&mut self) {
+        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 112277c7921e..6bd9fc3372d6 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -13,6 +13,7 @@
 mod gfw;
 mod gpu;
 mod gsp;
+mod mm;
 mod regs;
 mod util;
 mod vbios;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index a3836a01996b..ba09da7e1541 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -12,6 +12,7 @@
     FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
 };
 use crate::gpu::{Architecture, Chipset};
+use kernel::bits::genmask_u32;
 use kernel::prelude::*;
 
 // PMC
@@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
     }
 }
 
-// PBUS
+// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
+// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
 
 register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
 
@@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
     31:16   frts_err_code as u16;
 });
 
+// BAR0 window control register to configure the BAR0 window for PRAMIN access
+// (direct physical VRAM access).
+register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
+    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
+    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
+});
+
+impl NV_PBUS_BAR0_WINDOW {
+    /// Returns the 64-bit aligned VRAM address of the window.
+    pub(crate) fn get_window_addr(self) -> usize {
+        (self.window_base() as usize) << 16
+    }
+
+    /// Sets the window address from a framebuffer offset.
+    /// The fb_offset must be 64KB aligned (lower bits discared).
+    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
+        // Calculate window base (bits 39:16 of FB address)
+        // The total FB address is 40 bits, mask anything above. Since we are
+        // right shifting the offset by 16 bits, the mask is only 24 bits.
+        let mask = genmask_u32(0..=23) as usize;
+        let window_base = ((fb_offset >> 16) & mask) as u32;
+        self.set_window_base(window_base)
+    }
+}
+
 // PFB
 
 // The following two registers together hold the physical system memory address that is used by the
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
                   ` (5 preceding siblings ...)
  2025-10-20 18:55 ` [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
@ 2025-10-20 18:55 ` Joel Fernandes
  2025-10-20 20:59   ` John Hubbard
                     ` (2 more replies)
  2025-10-20 21:20 ` [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core John Hubbard
  2025-10-22  6:57 ` Alexandre Courbot
  8 siblings, 3 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 18:55 UTC (permalink / raw)
  To: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Joel Fernandes,
	Timur Tabi, joel, Elle Rhumsaa, Daniel Almeida, nouveau

Add data structures and helpers for page table management. Uses
bitfield for cleanly representing and accessing the bitfields in the
structures.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs   |   1 +
 drivers/gpu/nova-core/mm/types.rs | 405 ++++++++++++++++++++++++++++++
 2 files changed, 406 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/types.rs

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index 54c7cd9416a9..f4985780a8a1 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -1,3 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
 
 pub(crate) mod pramin;
+pub(crate) mod types;
diff --git a/drivers/gpu/nova-core/mm/types.rs b/drivers/gpu/nova-core/mm/types.rs
new file mode 100644
index 000000000000..0a2dec6b9145
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/types.rs
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Page table data management for NVIDIA GPUs.
+//!
+//! This module provides data structures for GPU page table management, including
+//! address types, page table entries (PTEs), page directory entries (PDEs), and
+//! the page table hierarchy levels.
+//!
+//! # Examples
+//!
+//! ## Creating and writing a PDE
+//!
+//! ```no_run
+//! let new_pde = Pde::default()
+//!     .set_valid(true)
+//!     .set_aperture(AperturePde::VideoMemory)
+//!     .set_table_frame_number(new_table.frame_number());
+//! // Call a function to write PDE to VRAM address
+//! write_pde(pde_addr, new_pde)?;
+//! ```
+//!
+//! ## Given a PTE, Get or allocate a PFN (page frame number).
+//!
+//! ```no_run
+//! fn get_frame_number(pte_addr: VramAddress) -> Result<Pfn> {
+//!     // Call a function to read 64-bit PTE value from VRAM address
+//!     let pte = Pte(read_u64_from_vram(pte_addr)?);
+//!     if pte.valid() {
+//!         // Return physical frame number from existing mapping
+//!         Ok(Pfn::new(pte.frame_number()))
+//!     } else {
+//!         // Create new PTE mapping
+//!         // Call a function to allocate a physical page, returning a Pfn
+//!         let phys_pfn = allocate_page()?;
+//!         let new_pte = Pte::default()
+//!             .set_valid(true)
+//!             .set_frame_number(phys_pfn.raw())
+//!             .set_aperture(AperturePte::VideoMemory)
+//!             .set_privilege(false)   // User-accessible
+//!             .set_read_only(false);  // Writable
+//!
+//!         // Call a function to write 64-bit PTE value to VRAM address
+//!         write_u64_to_vram(pte_addr, new_pte.raw())?;
+//!         Ok(phys_pfn)
+//!     }
+//! }
+//! ```
+
+#![expect(dead_code)]
+
+/// Memory size constants
+pub(crate) const KB: usize = 1024;
+pub(crate) const MB: usize = KB * 1024;
+
+/// Page size: 4 KiB
+pub(crate) const PAGE_SIZE: usize = 4 * KB;
+
+/// Page Table Level hierarchy
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum PageTableLevel {
+    Pdb, // Level 0 - Page Directory Base
+    L1,  // Level 1
+    L2,  // Level 2
+    L3,  // Level 3 - Dual PDE (128-bit entries)
+    L4,  // Level 4 - PTEs
+}
+
+impl PageTableLevel {
+    /// Get the entry size for this level.
+    pub(crate) fn entry_size(&self) -> usize {
+        match self {
+            Self::L3 => 16, // 128-bit dual PDE
+            _ => 8,         // 64-bit PDE/PTE
+        }
+    }
+
+    /// PDE levels constant array for iteration.
+    const PDE_LEVELS: [PageTableLevel; 4] = [
+        PageTableLevel::Pdb,
+        PageTableLevel::L1,
+        PageTableLevel::L2,
+        PageTableLevel::L3,
+    ];
+
+    /// Get iterator over PDE levels.
+    pub(crate) fn pde_levels() -> impl Iterator<Item = PageTableLevel> {
+        Self::PDE_LEVELS.into_iter()
+    }
+}
+
+/// Memory aperture for Page Directory Entries (PDEs)
+#[repr(u8)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) enum AperturePde {
+    #[default]
+    Invalid = 0,
+    VideoMemory = 1,
+    SystemCoherent = 2,
+    SystemNonCoherent = 3,
+}
+
+impl From<u8> for AperturePde {
+    fn from(val: u8) -> Self {
+        match val {
+            1 => Self::VideoMemory,
+            2 => Self::SystemCoherent,
+            3 => Self::SystemNonCoherent,
+            _ => Self::Invalid,
+        }
+    }
+}
+
+impl From<AperturePde> for u8 {
+    fn from(val: AperturePde) -> Self {
+        val as u8
+    }
+}
+
+/// Memory aperture for Page Table Entries (PTEs)
+#[repr(u8)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) enum AperturePte {
+    #[default]
+    VideoMemory = 0,
+    PeerVideoMemory = 1,
+    SystemCoherent = 2,
+    SystemNonCoherent = 3,
+}
+
+impl From<u8> for AperturePte {
+    fn from(val: u8) -> Self {
+        match val {
+            0 => Self::VideoMemory,
+            1 => Self::PeerVideoMemory,
+            2 => Self::SystemCoherent,
+            3 => Self::SystemNonCoherent,
+            _ => Self::VideoMemory,
+        }
+    }
+}
+
+impl From<AperturePte> for u8 {
+    fn from(val: AperturePte) -> Self {
+        val as u8
+    }
+}
+
+/// Common trait for address types
+pub(crate) trait Address {
+    /// Get raw u64 value.
+    fn raw(&self) -> u64;
+
+    /// Convert an Address to a frame number.
+    fn frame_number(&self) -> u64 {
+        self.raw() >> 12
+    }
+
+    /// Get the frame offset within an Address.
+    fn frame_offset(&self) -> u16 {
+        (self.raw() & 0xFFF) as u16
+    }
+}
+
+bitfield! {
+    pub(crate) struct VramAddress(u64), "Physical VRAM address representation." {
+        11:0    offset          as u16;    // Offset within 4KB page
+        63:12   frame_number    as u64;    // Frame number
+    }
+}
+
+impl Address for VramAddress {
+    fn raw(&self) -> u64 {
+        self.0
+    }
+}
+
+impl From<Pfn> for VramAddress {
+    fn from(pfn: Pfn) -> VramAddress {
+        VramAddress::default().set_frame_number(pfn.raw())
+    }
+}
+
+bitfield! {
+    pub(crate) struct VirtualAddress(u64), "Virtual address representation for GPU." {
+        11:0    offset      as u16;    // Offset within 4KB page
+        20:12   l4_index    as u16;    // Level 4 index (PTE)
+        29:21   l3_index    as u16;    // Level 3 index
+        38:30   l2_index    as u16;    // Level 2 index
+        47:39   l1_index    as u16;    // Level 1 index
+        56:48   l0_index    as u16;    // Level 0 index (PDB)
+
+        63:12   frame_number as u64;   // Frame number (combination of levels).
+    }
+}
+
+impl VirtualAddress {
+    /// Get index for a specific page table level.
+    ///
+    /// # Example
+    ///
+    /// ```no_run
+    /// let va = VirtualAddress::default();
+    /// let pte_idx = va.level_index(PageTableLevel::L4);
+    /// ```
+    pub(crate) fn level_index(&self, level: PageTableLevel) -> u16 {
+        match level {
+            PageTableLevel::Pdb => self.l0_index(),
+            PageTableLevel::L1 => self.l1_index(),
+            PageTableLevel::L2 => self.l2_index(),
+            PageTableLevel::L3 => self.l3_index(),
+            PageTableLevel::L4 => self.l4_index(),
+        }
+    }
+}
+
+impl Address for VirtualAddress {
+    fn raw(&self) -> u64 {
+        self.0
+    }
+}
+
+impl From<Vfn> for VirtualAddress {
+    fn from(vfn: Vfn) -> VirtualAddress {
+        VirtualAddress::default().set_frame_number(vfn.raw())
+    }
+}
+
+bitfield! {
+    pub(crate) struct Pte(u64), "Page Table Entry (PTE) to map virtual pages to physical frames." {
+        0:0     valid           as bool;    // (1 = valid for PTEs)
+        1:1     privilege       as bool;    // P - Privileged/kernel-only access
+        2:2     read_only       as bool;    // RO - Write protection
+        3:3     atomic_disable  as bool;    // AD - Disable atomic ops
+        4:4     encrypted       as bool;    // E - Encryption enabled
+        39:8    frame_number    as u64;     // PA[39:8] - Physical frame number (32 bits)
+        41:40   aperture        as u8 => AperturePte;   // Memory aperture type.
+        42:42   volatile        as bool;    // VOL - Volatile flag
+        50:43   kind            as u8;      // K[7:0] - Compression/tiling kind
+        63:51   comptag_line    as u16;     // CTL[12:0] - Compression tag line
+    }
+}
+
+impl Pte {
+    /// Set the physical address mapped by this PTE.
+    pub(crate) fn set_address(&mut self, addr: VramAddress) {
+        self.set_frame_number(addr.frame_number());
+    }
+
+    /// Get the physical address mapped by this PTE.
+    pub(crate) fn address(&self) -> VramAddress {
+        VramAddress::default().set_frame_number(self.frame_number())
+    }
+
+    /// Get raw u64 value.
+    pub(crate) fn raw(&self) -> u64 {
+        self.0
+    }
+}
+
+bitfield! {
+    pub(crate) struct Pde(u64), "Page Directory Entry (PDE) pointing to next-level page tables." {
+        0:0     valid_inverted       as bool;    // V - Valid bit (0=valid for PDEs)
+        2:1     aperture             as u8 => AperturePde;      // Memory aperture type
+        3:3     volatile             as bool;    // VOL - Volatile flag
+        39:8    table_frame_number   as u64;     // PA[39:8] - Table frame number (32 bits)
+    }
+}
+
+impl Pde {
+    /// Check if PDE is valid.
+    pub(crate) fn is_valid(&self) -> bool {
+        !self.valid_inverted() && self.aperture() != AperturePde::Invalid
+    }
+
+    /// The valid bit is inverted so add an accessor to flip it.
+    pub(crate) fn set_valid(&self, value: bool) -> Pde {
+        self.set_valid_inverted(!value)
+    }
+
+    /// Set the physical table address mapped by this PDE.
+    pub(crate) fn set_table_address(&mut self, addr: VramAddress) {
+        self.set_table_frame_number(addr.frame_number());
+    }
+
+    /// Get the physical table address mapped by this PDE.
+    pub(crate) fn table_address(&self) -> VramAddress {
+        VramAddress::default().set_frame_number(self.table_frame_number())
+    }
+
+    /// Get raw u64 value.
+    pub(crate) fn raw(&self) -> u64 {
+        self.0
+    }
+}
+
+/// Dual PDE at Level 3 - 128-bit entry containing both LPT and SPT pointers.
+/// Lower 64 bits = big/large page, upper 64 bits = small page.
+///
+/// # Example
+///
+/// ## Set the SPT (small page table) address in a Dual PDE
+///
+/// ```no_run
+/// // Call a function to read dual PDE from VRAM address
+/// let mut dual_pde: DualPde = read_dual_pde(dpde_addr)?;
+/// // Call a function to allocate a page table and return its VRAM address
+/// let spt_addr = allocate_page_table()?;
+/// dual_pde.set_spt(Pfn::from(spt_addr), AperturePde::VideoMemory);
+/// // Call a function to write dual PDE to VRAM address
+/// write_dual_pde(dpde_addr, dual_pde)?;
+/// ```
+#[repr(C)]
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct DualPde {
+    pub lpt: Pde, // Large/Big Page Table pointer (2MB pages) - bits 63:0 (lower)
+    pub spt: Pde, // Small Page Table pointer (4KB pages) - bits 127:64 (upper)
+}
+
+impl DualPde {
+    /// Create a new empty dual PDE.
+    pub(crate) fn new() -> Self {
+        Self {
+            spt: Pde::default(),
+            lpt: Pde::default(),
+        }
+    }
+
+    /// Set the Small Page Table address with aperture.
+    pub(crate) fn set_small_pt_address(&mut self, addr: VramAddress, aperture: AperturePde) {
+        self.spt = Pde::default()
+            .set_valid(true)
+            .set_table_frame_number(addr.frame_number())
+            .set_aperture(aperture);
+    }
+
+    /// Set the Large Page Table address with aperture.
+    pub(crate) fn set_large_pt_address(&mut self, addr: VramAddress, aperture: AperturePde) {
+        self.lpt = Pde::default()
+            .set_valid(true)
+            .set_table_frame_number(addr.frame_number())
+            .set_aperture(aperture);
+    }
+
+    /// Check if has valid Small Page Table.
+    pub(crate) fn has_small_pt_address(&self) -> bool {
+        self.spt.is_valid()
+    }
+
+    /// Check if has valid Large Page Table.
+    pub(crate) fn has_large_pt_address(&self) -> bool {
+        self.lpt.is_valid()
+    }
+
+    /// Set SPT (Small Page Table) using Pfn.
+    pub(crate) fn set_spt(&mut self, pfn: Pfn, aperture: AperturePde) {
+        self.spt = Pde::default()
+            .set_valid(true)
+            .set_aperture(aperture)
+            .set_table_frame_number(pfn.raw());
+    }
+}
+
+/// Virtual Frame Number - virtual address divided by 4KB.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) struct Vfn(u64);
+
+impl Vfn {
+    /// Create a new VFN from a frame number.
+    pub(crate) const fn new(frame_number: u64) -> Self {
+        Self(frame_number)
+    }
+
+    /// Get raw frame number.
+    pub(crate) const fn raw(&self) -> u64 {
+        self.0
+    }
+}
+
+impl From<VirtualAddress> for Vfn {
+    fn from(vaddr: VirtualAddress) -> Self {
+        Self(vaddr.frame_number())
+    }
+}
+
+/// Physical Frame Number - physical address divided by 4KB.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) struct Pfn(u64);
+
+impl Pfn {
+    /// Create a new PFN from a frame number.
+    pub(crate) const fn new(frame_number: u64) -> Self {
+        Self(frame_number)
+    }
+
+    /// Get raw frame number.
+    pub(crate) const fn raw(&self) -> u64 {
+        self.0
+    }
+}
+
+impl From<VramAddress> for Pfn {
+    fn from(addr: VramAddress) -> Self {
+        Self(addr.frame_number())
+    }
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 18:55 ` [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
@ 2025-10-20 19:36   ` John Hubbard
  2025-10-20 19:48     ` Joel Fernandes
  2025-10-20 22:08   ` John Hubbard
  2025-10-22  2:09   ` Bagas Sanjaya
  2 siblings, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-10-20 19:36 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
...> +Logically, the PRAMIN aperture mechanism is implemented by the GPU's PBUS (PCIe Bus Controller Unit)
> +and provides a CPU-accessible window into VRAM through the PCIe interface::
> +
> +    +-----------------+    PCIe     +------------------------------+
> +    |      CPU        |<----------->|           GPU                |
> +    +-----------------+             |                              |
> +                                    |  +----------------------+    |
> +                                    |  |       PBUS           |    |
> +                                    |  |  (Bus Controller)    |    |
> +                                    |  |                      |    |
> +                                    |  |  +--------------.<------------ (window always starts at
> +                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)

Quick question: does "window always starts at" actually mean "windows
is always initialized to" ? Or something else?


thanks,
-- 
John Hubbard

> +                                    |  |  |   Window     |    |    |
> +                                    |  |  |   (1MB)      |    |    |
> +                                    |  |  +--------------+    |    |
> +                                    |  |         |            |    |
> +                                    |  +---------|------------+    |
> +                                    |            |                 |
> +                                    |            v                 |
> +                                    |  .----------------------.<------------ (Program PRAMIN to any
> +                                    |  |       VRAM           |    |    64KB VRAM physical boundary)
> +                                    |  |    (Several GBs)     |    |
> +                                    |  |                      |    |
> +                                    |  |  FB[0x000000000000]  |    |
> +                                    |  |          ...         |    |
> +                                    |  |  FB[0x7FFFFFFFFFF]   |    |
> +                                    |  +----------------------+    |
> +                                    +------------------------------+
> +
> +PBUS (PCIe Bus Controller) among other things is responsible in the GPU for handling MMIO
> +accesses to the BAR registers.
> +
> +PRAMIN Window Operation
> +=======================
> +
> +The PRAMIN window provides a 1MB sliding aperture that can be repositioned over
> +the entire VRAM address space using the NV_PBUS_BAR0_WINDOW register.
> +
> +Window Control Mechanism
> +-------------------------
> +
> +The window position is controlled via the PBUS BAR0_WINDOW register::
> +
> +    NV_PBUS_BAR0_WINDOW Register
> +    +-----+-----+--------------------------------------+
> +    |31-26|25-24|           23-0                       |
> +    |     |TARG |         BASE_ADDR                    |
> +    |     | ET  |        (bits 39:16 of VRAM address)  |
> +    +-----+-----+--------------------------------------+
> +
> +    TARGET field values:
> +    - 0x0: VID_MEM (Video Memory / VRAM)
> +    - 0x1: SYS_MEM_COHERENT (Coherent system memory)
> +    - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)
> +
> +64KB Alignment Requirement
> +---------------------------
> +
> +The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
> +by the BASE_ADDR field representing bits [39:16] of the target address::
> +
> +    VRAM Address Calculation:
> +    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
> +    Where:
> +    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
> +    - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]
> +    Example Window Positioning:
> +    +---------------------------------------------------------+
> +    |                    VRAM Space                           |
> +    |                                                         |
> +    |  0x000000000  +-----------------+ <-- 64KB aligned      |
> +    |               | PRAMIN Window   |                       |
> +    |               |    (1MB)        |                       |
> +    |  0x0000FFFFF  +-----------------+                       |
> +    |                                                         |
> +    |       |              ^                                  |
> +    |       |              | Window can slide                 |
> +    |       v              | to any 64KB boundary             |
> +    |                                                         |
> +    |  0x123400000  +-----------------+ <-- 64KB aligned      |
> +    |               | PRAMIN Window   |                       |
> +    |               |    (1MB)        |                       |
> +    |  0x1234FFFFF  +-----------------+                       |
> +    |                                                         |
> +    |                       ...                               |
> +    |                                                         |
> +    |  0x7FFFF0000  +-----------------+ <-- 64KB aligned      |
> +    |               | PRAMIN Window   |                       |
> +    |               |    (1MB)        |                       |
> +    |  0x7FFFFFFFF  +-----------------+                       |
> +    +---------------------------------------------------------+
> diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
> index 46302daace34..e77d3ee336a4 100644
> --- a/Documentation/gpu/nova/index.rst
> +++ b/Documentation/gpu/nova/index.rst
> @@ -33,3 +33,4 @@ vGPU manager VFIO driver and the nova-drm driver.
>     core/fwsec
>     core/falcon
>     core/msgq
> +   core/pramin



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 19:36   ` John Hubbard
@ 2025-10-20 19:48     ` Joel Fernandes
  2025-10-20 20:42       ` John Hubbard
  0 siblings, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 19:48 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, dakr@kernel.org,
	Alexandre Courbot, Alistair Popple, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, bjorn3_gh@protonmail.com, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Timur Tabi, joel@joelfernandes.org,
	Elle Rhumsaa, Daniel Almeida, nouveau@lists.freedesktop.org



> On Oct 20, 2025, at 3:36 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 10/20/25 11:55 AM, Joel Fernandes wrote:
> ...> +Logically, the PRAMIN aperture mechanism is implemented by the GPU's PBUS (PCIe Bus Controller Unit)
>> +and provides a CPU-accessible window into VRAM through the PCIe interface::
>> +
>> +    +-----------------+    PCIe     +------------------------------+
>> +    |      CPU        |<----------->|           GPU                |
>> +    +-----------------+             |                              |
>> +                                    |  +----------------------+    |
>> +                                    |  |       PBUS           |    |
>> +                                    |  |  (Bus Controller)    |    |
>> +                                    |  |                      |    |
>> +                                    |  |  +--------------.<------------ (window always starts at
>> +                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
> 
> Quick question: does "window always starts at" actually mean "windows
> is always initialized to" ? Or something else?

It means start of the 1MB window (offset 0) is accessed from the BAR at that location.

Thanks.



> 
> 
> thanks,
> --
> John Hubbard
> 
>> +                                    |  |  |   Window     |    |    |
>> +                                    |  |  |   (1MB)      |    |    |
>> +                                    |  |  +--------------+    |    |
>> +                                    |  |         |            |    |
>> +                                    |  +---------|------------+    |
>> +                                    |            |                 |
>> +                                    |            v                 |
>> +                                    |  .----------------------.<------------ (Program PRAMIN to any
>> +                                    |  |       VRAM           |    |    64KB VRAM physical boundary)
>> +                                    |  |    (Several GBs)     |    |
>> +                                    |  |                      |    |
>> +                                    |  |  FB[0x000000000000]  |    |
>> +                                    |  |          ...         |    |
>> +                                    |  |  FB[0x7FFFFFFFFFF]   |    |
>> +                                    |  +----------------------+    |
>> +                                    +------------------------------+
>> +
>> +PBUS (PCIe Bus Controller) among other things is responsible in the GPU for handling MMIO
>> +accesses to the BAR registers.
>> +
>> +PRAMIN Window Operation
>> +=======================
>> +
>> +The PRAMIN window provides a 1MB sliding aperture that can be repositioned over
>> +the entire VRAM address space using the NV_PBUS_BAR0_WINDOW register.
>> +
>> +Window Control Mechanism
>> +-------------------------
>> +
>> +The window position is controlled via the PBUS BAR0_WINDOW register::
>> +
>> +    NV_PBUS_BAR0_WINDOW Register
>> +    +-----+-----+--------------------------------------+
>> +    |31-26|25-24|           23-0                       |
>> +    |     |TARG |         BASE_ADDR                    |
>> +    |     | ET  |        (bits 39:16 of VRAM address)  |
>> +    +-----+-----+--------------------------------------+
>> +
>> +    TARGET field values:
>> +    - 0x0: VID_MEM (Video Memory / VRAM)
>> +    - 0x1: SYS_MEM_COHERENT (Coherent system memory)
>> +    - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)
>> +
>> +64KB Alignment Requirement
>> +---------------------------
>> +
>> +The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
>> +by the BASE_ADDR field representing bits [39:16] of the target address::
>> +
>> +    VRAM Address Calculation:
>> +    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
>> +    Where:
>> +    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
>> +    - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]
>> +    Example Window Positioning:
>> +    +---------------------------------------------------------+
>> +    |                    VRAM Space                           |
>> +    |                                                         |
>> +    |  0x000000000  +-----------------+ <-- 64KB aligned      |
>> +    |               | PRAMIN Window   |                       |
>> +    |               |    (1MB)        |                       |
>> +    |  0x0000FFFFF  +-----------------+                       |
>> +    |                                                         |
>> +    |       |              ^                                  |
>> +    |       |              | Window can slide                 |
>> +    |       v              | to any 64KB boundary             |
>> +    |                                                         |
>> +    |  0x123400000  +-----------------+ <-- 64KB aligned      |
>> +    |               | PRAMIN Window   |                       |
>> +    |               |    (1MB)        |                       |
>> +    |  0x1234FFFFF  +-----------------+                       |
>> +    |                                                         |
>> +    |                       ...                               |
>> +    |                                                         |
>> +    |  0x7FFFF0000  +-----------------+ <-- 64KB aligned      |
>> +    |               | PRAMIN Window   |                       |
>> +    |               |    (1MB)        |                       |
>> +    |  0x7FFFFFFFF  +-----------------+                       |
>> +    +---------------------------------------------------------+
>> diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
>> index 46302daace34..e77d3ee336a4 100644
>> --- a/Documentation/gpu/nova/index.rst
>> +++ b/Documentation/gpu/nova/index.rst
>> @@ -33,3 +33,4 @@ vGPU manager VFIO driver and the nova-drm driver.
>>    core/fwsec
>>    core/falcon
>>    core/msgq
>> +   core/pramin
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 19:48     ` Joel Fernandes
@ 2025-10-20 20:42       ` John Hubbard
  2025-10-20 20:45         ` Joel Fernandes
  0 siblings, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-10-20 20:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, dakr@kernel.org,
	Alexandre Courbot, Alistair Popple, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, bjorn3_gh@protonmail.com, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Timur Tabi, joel@joelfernandes.org,
	Elle Rhumsaa, Daniel Almeida, nouveau@lists.freedesktop.org

On 10/20/25 12:48 PM, Joel Fernandes wrote:
> 
> 
>> On Oct 20, 2025, at 3:36 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 10/20/25 11:55 AM, Joel Fernandes wrote:
>> ...> +Logically, the PRAMIN aperture mechanism is implemented by the GPU's PBUS (PCIe Bus Controller Unit)
>>> +and provides a CPU-accessible window into VRAM through the PCIe interface::
>>> +
>>> +    +-----------------+    PCIe     +------------------------------+
>>> +    |      CPU        |<----------->|           GPU                |
>>> +    +-----------------+             |                              |
>>> +                                    |  +----------------------+    |
>>> +                                    |  |       PBUS           |    |
>>> +                                    |  |  (Bus Controller)    |    |
>>> +                                    |  |                      |    |
>>> +                                    |  |  +--------------.<------------ (window always starts at
>>> +                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
>>
>> Quick question: does "window always starts at" actually mean "windows
>> is always initialized to" ? Or something else?
> 
> It means start of the 1MB window (offset 0) is accessed from the BAR at that location.
> 
OK, yes. May I suggest this slightly tweaked wording:

   (window into VRAM starts at BAR0 + 0x700000)

? 

This avoids "always" (because HW may change someday), and also
gives a subtly stronger hint about how this is laid out.



thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 20:42       ` John Hubbard
@ 2025-10-20 20:45         ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 20:45 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, dakr@kernel.org,
	Alexandre Courbot, Alistair Popple, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, bjorn3_gh@protonmail.com, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Timur Tabi, joel@joelfernandes.org,
	Elle Rhumsaa, Daniel Almeida, nouveau@lists.freedesktop.org



On 10/20/2025 4:42 PM, John Hubbard wrote:
> On 10/20/25 12:48 PM, Joel Fernandes wrote:
>>
>>
>>> On Oct 20, 2025, at 3:36 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>>
>>> On 10/20/25 11:55 AM, Joel Fernandes wrote:
>>> ...> +Logically, the PRAMIN aperture mechanism is implemented by the GPU's PBUS (PCIe Bus Controller Unit)
>>>> +and provides a CPU-accessible window into VRAM through the PCIe interface::
>>>> +
>>>> +    +-----------------+    PCIe     +------------------------------+
>>>> +    |      CPU        |<----------->|           GPU                |
>>>> +    +-----------------+             |                              |
>>>> +                                    |  +----------------------+    |
>>>> +                                    |  |       PBUS           |    |
>>>> +                                    |  |  (Bus Controller)    |    |
>>>> +                                    |  |                      |    |
>>>> +                                    |  |  +--------------.<------------ (window always starts at
>>>> +                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
>>>
>>> Quick question: does "window always starts at" actually mean "windows
>>> is always initialized to" ? Or something else?
>>
>> It means start of the 1MB window (offset 0) is accessed from the BAR at that location.
>>
> OK, yes. May I suggest this slightly tweaked wording:
> 
>    (window into VRAM starts at BAR0 + 0x700000)
> 
> ? 
> 
> This avoids "always" (because HW may change someday), and also
> gives a subtly stronger hint about how this is laid out.
> 

Sure, I will change to this wording for next posting. Thanks,

 - Joel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 18:55 ` [PATCH 7/7] nova-core: mm: Add data structures for page table management Joel Fernandes
@ 2025-10-20 20:59   ` John Hubbard
  2025-10-21 18:26     ` Joel Fernandes
  2025-10-20 21:30   ` Miguel Ojeda
  2025-10-22 11:21   ` Alexandre Courbot
  2 siblings, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-10-20 20:59 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Add data structures and helpers for page table management. Uses
> bitfield for cleanly representing and accessing the bitfields in the
> structures.
> 
...
> +bitfield! {
> +    pub(crate) struct Pte(u64), "Page Table Entry (PTE) to map virtual pages to physical frames." {
> +        0:0     valid           as bool;    // (1 = valid for PTEs)
> +        1:1     privilege       as bool;    // P - Privileged/kernel-only access
> +        2:2     read_only       as bool;    // RO - Write protection
> +        3:3     atomic_disable  as bool;    // AD - Disable atomic ops
> +        4:4     encrypted       as bool;    // E - Encryption enabled
> +        39:8    frame_number    as u64;     // PA[39:8] - Physical frame number (32 bits)
> +        41:40   aperture        as u8 => AperturePte;   // Memory aperture type.
> +        42:42   volatile        as bool;    // VOL - Volatile flag
> +        50:43   kind            as u8;      // K[7:0] - Compression/tiling kind
> +        63:51   comptag_line    as u16;     // CTL[12:0] - Compression tag line
> +    }
> +}

Hi Joel,

What GPUs is this good for? I ask because I seem to recall that
the format has changed over the years, on a per-GPU-architecture
basis...


thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
                   ` (6 preceding siblings ...)
  2025-10-20 18:55 ` [PATCH 7/7] nova-core: mm: Add data structures for page table management Joel Fernandes
@ 2025-10-20 21:20 ` John Hubbard
  2025-10-21 18:29   ` Joel Fernandes
  2025-10-22  6:57 ` Alexandre Courbot
  8 siblings, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-10-20 21:20 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> These patches have some prerequistes needed for nova-core as support is added
> for memory management and interrupt handling. I rebased them on drm-rust-next
> and would like them to be considered for the next merge window. I also included
> a simple rust documentation patch fixing some issues I noticed while reading it :).

Just to be clear, it appears to me that one must first apply
"[PATCH v7 0/4] bitfield initial refactor within nova-core" [1],
in order to apply this series, yes?


[1] https://lore.kernel.org/20251016150204.1189641-1-joelagnelf@nvidia.com


thanks,
-- 
John Hubbard

> 
> The series adds support for the PRAMIN aperture mechanism, which is a
> prerequisite for virtual memory as it is required to boot strap virtual memory
> (we cannot write to VRAM using virtual memory because we need to write page
> tables to VRAM in the first place).
> 
> I also add page table related structures (mm/types.rs) using the bitfield
> macro, which will be used for page table walking, memory mapping, etc. This is
> currently unused code, because without physical memory allocation (using the
> buddy allocator), we cannot use this code as page table pages need to be
> allocated in the first place. However, I have included several examples in the
> file about how these structures will be used. I have also simplified the code
> keeping future additions to it for later.
> 
> For interrupts, I only have added additional support for GSP's message queue
> interrupt. I am working on adding support to the interrupt controller module
> (VFN) which is the next thing for me to post after this series. I have it
> prototyped and working, however I am currently making several changes to it
> related to virtual functions. For now in this series, I just want to get the
> GSP-specific patch out of the way, hence I am including it here.
> 
> I also have added a patch for bitfield macro which constructs a bitfield struct
> given its storage type. This is used in a later GSP interrupt patch in the
> series to read from one register and write to another.
> 
> Joel Fernandes (7):
>   docs: rust: Fix a few grammatical errors
>   gpu: nova-core: Add support to convert bitfield to underlying type
>   docs: gpu: nova-core: Document GSP RPC message queue architecture
>   docs: gpu: nova-core: Document the PRAMIN aperture mechanism
>   gpu: nova-core: Add support for managing GSP falcon interrupts
>   nova-core: mm: Add support to use PRAMIN windows to write to VRAM
>   nova-core: mm: Add data structures for page table management
> 
>  Documentation/gpu/nova/core/msgq.rst     | 159 +++++++++
>  Documentation/gpu/nova/core/pramin.rst   | 113 +++++++
>  Documentation/gpu/nova/index.rst         |   2 +
>  Documentation/rust/coding-guidelines.rst |   4 +-
>  drivers/gpu/nova-core/bitfield.rs        |   7 +
>  drivers/gpu/nova-core/falcon/gsp.rs      |  26 +-
>  drivers/gpu/nova-core/gpu.rs             |   2 +-
>  drivers/gpu/nova-core/mm/mod.rs          |   4 +
>  drivers/gpu/nova-core/mm/pramin.rs       | 241 ++++++++++++++
>  drivers/gpu/nova-core/mm/types.rs        | 405 +++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs       |   1 +
>  drivers/gpu/nova-core/regs.rs            |  39 ++-
>  12 files changed, 996 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/gpu/nova/core/msgq.rst
>  create mode 100644 Documentation/gpu/nova/core/pramin.rst
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>  create mode 100644 drivers/gpu/nova-core/mm/types.rs
> 



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/7] docs: rust: Fix a few grammatical errors
  2025-10-20 18:55 ` [PATCH 1/7] docs: rust: Fix a few grammatical errors Joel Fernandes
@ 2025-10-20 21:21   ` John Hubbard
  2025-10-20 21:33   ` Miguel Ojeda
  1 sibling, 0 replies; 55+ messages in thread
From: John Hubbard @ 2025-10-20 21:21 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Fix two grammatical errors in the Rust coding guidelines document.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  Documentation/rust/coding-guidelines.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard


> diff --git a/Documentation/rust/coding-guidelines.rst b/Documentation/rust/coding-guidelines.rst
> index 6ff9e754755d..d556f0db042b 100644
> --- a/Documentation/rust/coding-guidelines.rst
> +++ b/Documentation/rust/coding-guidelines.rst
> @@ -97,7 +97,7 @@ should still be used. For instance:
>  	// TODO: ...
>  	fn f() {}
>  
> -One special kind of comments are the ``// SAFETY:`` comments. These must appear
> +One special kind of comment is the ``// SAFETY:`` comment. These must appear
>  before every ``unsafe`` block, and they explain why the code inside the block is
>  correct/sound, i.e. why it cannot trigger undefined behavior in any case, e.g.:
>  
> @@ -166,7 +166,7 @@ in the kernel:
>  - While not shown here, if a function may panic, the conditions under which
>    that happens must be described under a ``# Panics`` section.
>  
> -  Please note that panicking should be very rare and used only with a good
> +  Please note that panicking should be very rare and used only for a good
>    reason. In almost all cases, a fallible approach should be used, typically
>    returning a ``Result``.
>  



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type
  2025-10-20 18:55 ` [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type Joel Fernandes
@ 2025-10-20 21:25   ` John Hubbard
  2025-10-22  6:25   ` Alexandre Courbot
  1 sibling, 0 replies; 55+ messages in thread
From: John Hubbard @ 2025-10-20 21:25 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> To support the usecase where we read a register and write to another
> with identical bit layout, add support to convert bitfield to underlying type.
> 
> Another way to do this, is to read individual fields, on the caller
> side, and write to the destination fields, but that is both cumbersome
> and error-prone as new bits added in hardware may be missed.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/bitfield.rs | 7 +++++++
>  1 file changed, 7 insertions(+)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard

> 
> diff --git a/drivers/gpu/nova-core/bitfield.rs b/drivers/gpu/nova-core/bitfield.rs
> index 0994505393dd..2266abc3f7ab 100644
> --- a/drivers/gpu/nova-core/bitfield.rs
> +++ b/drivers/gpu/nova-core/bitfield.rs
> @@ -72,6 +72,7 @@
>  /// - Field setters: `set_mode()`, `set_state()`, etc. (supports chaining with builder pattern).
>  ///   Note that the compiler will error out if the size of the setter's arg exceeds the
>  ///   struct's storage size.
> +/// - Conversion from the underlying storage type (e.g., `From<u32>`).
>  /// - Debug and Default implementations.
>  ///
>  /// Note: Field accessors and setters inherit the same visibility as the struct itself.
> @@ -117,6 +118,12 @@ fn from(val: $name) -> $storage {
>              }
>          }
>  
> +        impl ::core::convert::From<$storage> for $name {
> +            fn from(val: $storage) -> $name {
> +                $name(val)
> +            }
> +        }
> +
>          bitfield!(@fields_dispatcher $vis $name $storage { $($fields)* });
>      };
>  


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 18:55 ` [PATCH 7/7] nova-core: mm: Add data structures for page table management Joel Fernandes
  2025-10-20 20:59   ` John Hubbard
@ 2025-10-20 21:30   ` Miguel Ojeda
  2025-11-03 19:21     ` Joel Fernandes
  2025-11-03 19:29     ` John Hubbard
  2025-10-22 11:21   ` Alexandre Courbot
  2 siblings, 2 replies; 55+ messages in thread
From: Miguel Ojeda @ 2025-10-20 21:30 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

Hi Joel,

A few nits below (I do sometimes this kind of docs review to try to
keep a consistent style across all Rust code).

On Mon, Oct 20, 2025 at 8:56 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
> +//!     .set_table_frame_number(new_table.frame_number());
> +//! // Call a function to write PDE to VRAM address

Newline between these. Period ad the end.

> +//! ## Given a PTE, Get or allocate a PFN (page frame number).

In headers, no period at the end. Also, is "Get" intended to be capitalized?

> +//!     // Call a function to read 64-bit PTE value from VRAM address

Period at the end too (more of these elsewhere).

> +//!     if pte.valid() {
> +//!         // Return physical frame number from existing mapping
> +//!         Ok(Pfn::new(pte.frame_number()))

Early returns where possible, like in C, i.e. to avoid indentation on
big `else` branches.

> +/// Memory size constants
> +pub(crate) const KB: usize = 1024;
> +pub(crate) const MB: usize = KB * 1024;

The docs will only apply to the first item, so this probably was meant
to be a `//` instead of a `///`.

Or you could use a module to contain these (and then possibly `use`
them outside), and then you can have docs in the module itself, but
that is heavier.

> +/// Page size: 4 KiB
> +pub(crate) const PAGE_SIZE: usize = 4 * KB;

`rustdoc` would eventually render the value and the non-evaluated
expression, and in the source code it already says `4 * KB`, so I
think repeating the value isn't needed, unless you mean to show it is
really a multiple of 2.

> +pub(crate) enum PageTableLevel {
> +    Pdb, // Level 0 - Page Directory Base
> +    L1,  // Level 1
> +    L2,  // Level 2
> +    L3,  // Level 3 - Dual PDE (128-bit entries)
> +    L4,  // Level 4 - PTEs

In this case, I think you meant the other way around, i.e. actual
docs: `///` instead of `//`.

(Also, unless there is a particular reason (e.g. it is a big table),
please generally put comments on top of things, not at the side, which
matches closer to what is needed for docs.)

> +    /// Convert an Address to a frame number.

These should eventually be intra-doc links, but at least please use
for the moment backticks when referring to Rust items like types etc.

> +    /// # Example

We always use the plural for these section headers, even if there is a
single item (e.g. single example).

> +    /// ```no_run
> +    /// let va = VirtualAddress::default();
> +    /// let pte_idx = va.level_index(PageTableLevel::L4);
> +    /// ```

This will need some `use` lines -- not needed now, but just as a
reminder that these will get actually built eventually.

> +    /// Get raw u64 value.

Intra-doc link or at least backticks.

> +    /// The valid bit is inverted so add an accessor to flip it.
> +    pub(crate) fn set_valid(&self, value: bool) -> Pde {

This docs string sounds like a commit message.

> +/// Dual PDE at Level 3 - 128-bit entry containing both LPT and SPT pointers.
> +/// Lower 64 bits = big/large page, upper 64 bits = small page.

It sounds like a few of these details should be on its own paragraph
to avoid having them in the short description (title).

> +/// // Call a function to read dual PDE from VRAM address
> +/// let mut dual_pde: DualPde = read_dual_pde(dpde_addr)?;
> +/// // Call a function to allocate a page table and return its VRAM address
> +/// let spt_addr = allocate_page_table()?;
> +/// dual_pde.set_spt(Pfn::from(spt_addr), AperturePde::VideoMemory);
> +/// // Call a function to write dual PDE to VRAM address
> +/// write_dual_pde(dpde_addr, dual_pde)?;

Newlines before the comments, i.e. between "conceptual blocks".

> +    pub lpt: Pde, // Large/Big Page Table pointer (2MB pages) - bits 63:0 (lower)
> +    pub spt: Pde, // Small Page Table pointer (4KB pages) - bits 127:64 (upper)

Docs instead of comments.

> +    /// Check if has valid Small Page Table.

Missing word?

Thanks!

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/7] docs: rust: Fix a few grammatical errors
  2025-10-20 18:55 ` [PATCH 1/7] docs: rust: Fix a few grammatical errors Joel Fernandes
  2025-10-20 21:21   ` John Hubbard
@ 2025-10-20 21:33   ` Miguel Ojeda
  2025-10-20 23:23     ` Joel Fernandes
  1 sibling, 1 reply; 55+ messages in thread
From: Miguel Ojeda @ 2025-10-20 21:33 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Mon, Oct 20, 2025 at 8:55 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
> Fix two grammatical errors in the Rust coding guidelines document.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

In general, please submit independent patches like this one
independently, e.g. this should go through the Rust tree.

Thanks!

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture
  2025-10-20 18:55 ` [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture Joel Fernandes
@ 2025-10-20 21:49   ` John Hubbard
  2025-10-22  1:43   ` Bagas Sanjaya
  2025-10-22 11:16   ` Alexandre Courbot
  2 siblings, 0 replies; 55+ messages in thread
From: John Hubbard @ 2025-10-20 21:49 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Document the GSP RPC message queue architecture in detail.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---

Hi Joel,


>  Documentation/gpu/nova/core/msgq.rst | 159 +++++++++++++++++++++++++++


Can we please change the file name to approximately something like
message_queue.rst? I'll buy you a few extra characters. :)


>  Documentation/gpu/nova/index.rst     |   1 +
>  2 files changed, 160 insertions(+)
>  create mode 100644 Documentation/gpu/nova/core/msgq.rst
> 
> diff --git a/Documentation/gpu/nova/core/msgq.rst b/Documentation/gpu/nova/core/msgq.rst
> new file mode 100644
> index 000000000000..84e25be69cd6
> --- /dev/null
> +++ b/Documentation/gpu/nova/core/msgq.rst
> @@ -0,0 +1,159 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================================
> +Nova GPU RPC Message Passing Architecture
> +=========================================
> +
> +.. note::
> +   The following description is approximate and current as of the Ampere family.
> +   It may change for future generations and is intended to assist in understanding
> +   the driver code.
> +
> +Overview
> +========
> +
> +The Nova GPU driver communicates with the GSP (GPU System Processor) firmware
> +using an RPC (Remote Procedure Call) mechanism built on top of circular message
> +queues in shared memory. This document describes the structure of RPC messages
> +and the mechanics of the message passing system.
> +
> +Message Queue Architecture
> +==========================
> +
> +The communication between CPU and GSP uses two unidirectional circular queues:
> +
> +1. **CPU Queue (cpuq)**: CPU writes, GSP reads
> +2. **GSP Queue (gspq)**: GSP writes, CPU reads
> +
> +The advantage of this approach is no synchronization is required to access the
> +queues, if one entity wants to communicate with the other (CPU or GSP), they
> +simply write into their own queue.

How about this:

The advantage of this approach is that no synchronization is required to access the
queues. If one entity wants to communicate with the other (CPU or GSP), they
simply write into their own queue.


> +
> +Memory Layout
> +-------------
> +
> +The shared memory region (GspMem) where the queues reside has the following
> +layout::
> +
> +    +------------------------+ GspMem DMA Handle (base address)
> +    |    PTE Array (4KB)     |  <- Self-mapping page table
> +    | PTE[0] = base + 0x0000 |     Points to this page
> +    | PTE[1] = base + 0x1000 |     Points to CPU queue Header page

s/Header/header/

> +    | PTE[2] = base + 0x2000 |     Points to first page of CPU queue data
> +    | ...                    |     ...
> +    | ...                    |     ...
> +    +------------------------+ base + 0x1000
> +    |    CPU Queue Header    |  MsgqTxHeader + MsgqRxHeader
> +    |    - TX Header (32B)   |
> +    |    - RX Header (4B)    | (1 page)
> +    |    - Padding           |
> +    +------------------------+ base + 0x2000
> +    |    CPU Queue Data      | (63 pages)
> +    |    (63 x 4KB pages)    |  Circular buffer for messages
> +    | ...                    |     ...
> +    +------------------------+ base + 0x41000
> +    |    GSP Queue Header    |  MsgqTxHeader + MsgqRxHeader
> +    |    - TX Header (32B)   |
> +    |    - RX Header (4B)    | (1 page)
> +    |    - Padding           |
> +    +------------------------+ base + 0x42000
> +    |    GSP Queue Data      | (63 pages)
> +    |    (63 x 4KB pages)    |  Circular buffer for messages
> +    | ...                    |     ...
> +    +------------------------+ base + 0x81000
> +
> +
> +Message Passing Mechanics
> +-------------------------
> +The split read/write pointer design allows bidirectional communication between the
> +CPU and GSP without synchronization (if it were a shared queue), for example, the
> +following diagram illustrates pointer updates, when CPU sends message to GSP::
> +
> +    +--------------------------------------------------------------------------+
> +    |                     DMA coherent Shared Memory (GspMem)                  |

I think it would help to do this:

s/DMA coherent/DMA-coherent/

> +    +--------------------------------------------------------------------------+
> +    |                          (CPU sending message to GSP)                    |
> +    |  +-------------------+                      +-------------------+        |
> +    |  |   GSP Queue       |                      |   CPU Queue       |        |
> +    |  |                   |                      |                   |        |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  | |  TX Header  |   |                      | |  TX Header  |   |        |
> +    |  | | write_ptr   |   |                      | | write_ptr   |---+----,   |
> +    |  | |             |   |                      | |             |   |    |   |
> +    |  | +-------------+   |                      | +-------------+   |    |   |
> +    |  |                   |                      |                   |    |   |
> +    |  | +-------------+   |                      | +-------------+   |    |   |
> +    |  | |  RX Header  |   |                      | |  RX Header  |   |    |   |
> +    |  | |  read_ptr ------+-------,              | |  read_ptr   |   |    |   |
> +    |  | |             |   |       |              | |             |   |    |   |
> +    |  | +-------------+   |       |              | +-------------+   |    |   |
> +    |  |                   |       |              |                   |    |   |
> +    |  | +-------------+   |       |              | +-------------+   |    |   |
> +    |  | |   Page 0    |   |       |              | |   Page 0    |   |    |   |
> +    |  | +-------------+   |       |              | +-------------+   |    |   |
> +    |  | |   Page 1    |   |       `--------------> |   Page 1    |   |    |   |
> +    |  | +-------------+   |                      | +-------------+   |    |   |
> +    |  | |   Page 2    |   |                      | |   Page 2    |<--+----'   |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  | |     ...     |   |                      | |     ...     |   |        |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  | |   Page 62   |   |                      | |   Page 62   |   |        |
> +    |  | +-------------+   |                      | +-------------+   |        |
> +    |  |   (63 pages)      |                      |   (63 pages)      |        |
> +    |  +-------------------+                      +-------------------+        |
> +    |                                                                          |
> +    +--------------------------------------------------------------------------+
> +
> +When the CPU sends a message to the GSP, it writes the message to its own
> +queue (CPU queue) and updates the write pointer in its queue's TX header. The GSP
> +then reads the read pointer in its own queue's RX header and knows that there are
> +pending messages from the CPU because its RX header's read pointer is behind the
> +CPU's TX header's write pointer. After reading the message, the GSP updates its RX
> +header's read pointer to catch up. The same happens in reverse.

What do you think of this alternative wording:

When the CPU sends a message to the GSP, it writes the message to its own queue
(CPU queue) and updates the write pointer in its queue's TX header. The GSP
checks for pending messages by reading its RX header's read pointer and
comparing it to the CPU's TX header's write pointer. If the GSP's read pointer
lags behind, messages are waiting. After processing each message, the GSP
advances its read pointer to acknowledge receipt. 

For GSP-to-CPU communication, the roles reverse: the GSP writes to its queue and
updates its TX write pointer, while the CPU monitors its RX read pointer and
advances it after consuming messages.


> +
> +Page-based message passing
> +--------------------------
> +The message queue is page-based, which means that the message is stored in a
> +page-aligned buffer. The page size is 4KB. Each message starts at the beginning of
> +a page. If the message is shorter than a page, the remaining space in the page is
> +wasted. The next message starts at the beginning of the next page no matter how
> +small the previous message was.
> +

Error Handling: The document doesn't mention:

a) What happens when queues are full
b) How message corruption is detected and handled
c) Recovery mechanisms for communication failures

Performance Considerations: It would be helpful to add:
a) Why 63 pages were chosen for each queue
b) Typical message sizes and throughput expectations

> +Note that messages larger than a page will span multiple pages. This means that
> +it is possible that the first part of the message lands on the last page, and the
> +second part of the message lands on the first page, thus requiring out-of-order
> +memory access. The SBuffer data structure in Nova tackles this use case.

I don't think SBuffer has landed in the kernel, nor in the pre-requisite bitfield
patchset, yet, right? We could replace that last sentence with something like
"TODO: show how the upcoming SBuffer data structure helps with this use case".


> +
> +RPC Message Structure:

Let's remove the trailing colon.

> +======================
> +
> +An RPC message is also called a "Message Element". The entire message has
> +multiple headers. There is a "message element" header which handles message
> +queue specific details and integrity, followed by a "RPC" header which handles

s/a "RPC"/an "RPC"/

> +the RPC protocol details::
> +
> +    +----------------------------------+
> +    |        GspMsgHeader (64B)        | (aka, Message Element Header)
> +    +----------------------------------+
> +    | auth_tag_buffer[16]              | --+
> +    | aad_buffer[16]                   |   |
> +    | checksum        (u32)            |   +-- Security & Integrity

Can we say anything useful here about:

a) What authentication mechanism is used
b) How message integrity is verified
c) Whether encryption is employed

?

> +    | sequence        (u32)            |   |
> +    | elem_count      (u32)            |   |
> +    | pad             (u32)            | --+
> +    +----------------------------------+
> +    |        GspRpcHeader (32B)        |
> +    +----------------------------------+
> +    | header_version  (0x03000000)     | --+
> +    | signature       (0x43505256)     |   |
> +    | length          (u32)            |   +-- RPC Protocol
> +    | function        (u32)            |   |
> +    | rpc_result      (u32)            |   |
> +    | rpc_result_private (u32)         |   |
> +    | sequence        (u32)            |   |
> +    | cpu_rm_gfid     (u32)            | --+

This shows field values but doesn't explain:

a) What "signature (0x43505256)" represents (appears to be "CPRV" in ASCII)
b) The purpose of cpu_rm_gfid field
c) Valid ranges for the function field

> +    +----------------------------------+
> +    |                                  |
> +    |        Payload (Variable)        | --- Function-specific data
> +    |                                  |
> +    +----------------------------------+
> diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
> index e39cb3163581..46302daace34 100644
> --- a/Documentation/gpu/nova/index.rst
> +++ b/Documentation/gpu/nova/index.rst
> @@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
>     core/devinit
>     core/fwsec
>     core/falcon
> +   core/msgq

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 18:55 ` [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
  2025-10-20 19:36   ` John Hubbard
@ 2025-10-20 22:08   ` John Hubbard
  2025-10-22  2:09   ` Bagas Sanjaya
  2 siblings, 0 replies; 55+ messages in thread
From: John Hubbard @ 2025-10-20 22:08 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> While not terribly complicated, a little bit of documentation will help
> augment the code for this very important mechanism.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  Documentation/gpu/nova/core/pramin.rst | 113 +++++++++++++++++++++++++
>  Documentation/gpu/nova/index.rst       |   1 +
>  2 files changed, 114 insertions(+)
>  create mode 100644 Documentation/gpu/nova/core/pramin.rst

Hi Joel,

Here's a more thorough review.

> 
> diff --git a/Documentation/gpu/nova/core/pramin.rst b/Documentation/gpu/nova/core/pramin.rst
> new file mode 100644
> index 000000000000..19615e504db9
> --- /dev/null
> +++ b/Documentation/gpu/nova/core/pramin.rst
> @@ -0,0 +1,113 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================
> +PRAMIN aperture mechanism
> +=========================
> +
> +.. note::
> +   The following description is approximate and current as of the Ampere family.
> +   It may change for future generations and is intended to assist in understanding
> +   the driver code.
> +
> +Introduction
> +============
> +
> +PRAMIN is a hardware aperture mechanism that provides CPU access to GPU Video RAM (VRAM) before
> +the GPU's Memory Management Unit (MMU) and page tables are initialized. This 1MB sliding window,
> +located at a fixed offset within BAR0, is essential for setting up page tables and other critical
> +GPU data structures without relying on the GPU's MMU.
> +
> +Architecture Overview
> +=====================
> +
> +Logically, the PRAMIN aperture mechanism is implemented by the GPU's PBUS (PCIe Bus Controller Unit)

How about:

The PRAMIN aperture mechanism is logically implemented by the GPU's PBUS (PCIe Bus Controller Unit)


> +and provides a CPU-accessible window into VRAM through the PCIe interface::
> +
> +    +-----------------+    PCIe     +------------------------------+
> +    |      CPU        |<----------->|           GPU                |
> +    +-----------------+             |                              |
> +                                    |  +----------------------+    |
> +                                    |  |       PBUS           |    |
> +                                    |  |  (Bus Controller)    |    |
> +                                    |  |                      |    |
> +                                    |  |  +--------------.<------------ (window always starts at
> +                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
> +                                    |  |  |   Window     |    |    |
> +                                    |  |  |   (1MB)      |    |    |
> +                                    |  |  +--------------+    |    |
> +                                    |  |         |            |    |
> +                                    |  +---------|------------+    |
> +                                    |            |                 |
> +                                    |            v                 |
> +                                    |  .----------------------.<------------ (Program PRAMIN to any
> +                                    |  |       VRAM           |    |    64KB VRAM physical boundary)
> +                                    |  |    (Several GBs)     |    |
> +                                    |  |                      |    |
> +                                    |  |  FB[0x000000000000]  |    |
> +                                    |  |          ...         |    |
> +                                    |  |  FB[0x7FFFFFFFFFF]   |    |
> +                                    |  +----------------------+    |
> +                                    +------------------------------+
> +
> +PBUS (PCIe Bus Controller) among other things is responsible in the GPU for handling MMIO

How about:

PBUS (PCIe Bus Controller) is responsible for, among other things, handling MMIO

> +accesses to the BAR registers.
> +
> +PRAMIN Window Operation
> +=======================
> +
> +The PRAMIN window provides a 1MB sliding aperture that can be repositioned over
> +the entire VRAM address space using the NV_PBUS_BAR0_WINDOW register.
> +
> +Window Control Mechanism
> +-------------------------
> +
> +The window position is controlled via the PBUS BAR0_WINDOW register::
> +
> +    NV_PBUS_BAR0_WINDOW Register
> +    +-----+-----+--------------------------------------+
> +    |31-26|25-24|           23-0                       |

Should we use ":" notation here?

31:26 | 25:24 | 23:0

> +    |     |TARG |         BASE_ADDR                    |
> +    |     | ET  |        (bits 39:16 of VRAM address)  |
> +    +-----+-----+--------------------------------------+
> +
> +    TARGET field values:
> +    - 0x0: VID_MEM (Video Memory / VRAM)
> +    - 0x1: SYS_MEM_COHERENT (Coherent system memory)
> +    - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)

These SYS_MEM_* items imply to the reader that PRAMIN can target
sysmem. However, pramin.rs in this series hard-codes this field
to VID_MEM:

let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);

It's not clear why that is done (I seem to recall a potential deadlock
in older chips, in similar situations, but that's probably stale info).

If, however, there is some limitation that prevents using SYS_MEM*, then
we should document it, probably here.

> +
> +64KB Alignment Requirement
> +---------------------------
> +
> +The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
> +by the BASE_ADDR field representing bits [39:16] of the target address::
> +
> +    VRAM Address Calculation:
> +    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
> +    Where:
> +    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
> +    - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]

How about:

      - pramin_offset: 20-bit offset within the PRAMIN window [0x00000-0xFFFFF]

> +    Example Window Positioning:
> +    +---------------------------------------------------------+
> +    |                    VRAM Space                           |
> +    |                                                         |
> +    |  0x000000000  +-----------------+ <-- 64KB aligned      |
> +    |               | PRAMIN Window   |                       |
> +    |               |    (1MB)        |                       |
> +    |  0x0000FFFFF  +-----------------+                       |
> +    |                                                         |
> +    |       |              ^                                  |
> +    |       |              | Window can slide                 |
> +    |       v              | to any 64KB boundary             |

Maybe:

    s/any 64KB boundary/any 64KB-aligned boundary/

> +    |                                                         |
> +    |  0x123400000  +-----------------+ <-- 64KB aligned      |
> +    |               | PRAMIN Window   |                       |
> +    |               |    (1MB)        |                       |
> +    |  0x1234FFFFF  +-----------------+                       |
> +    |                                                         |
> +    |                       ...                               |
> +    |                                                         |
> +    |  0x7FFFF0000  +-----------------+ <-- 64KB aligned      |
> +    |               | PRAMIN Window   |                       |
> +    |               |    (1MB)        |                       |
> +    |  0x7FFFFFFFF  +-----------------+                       |
> +    +---------------------------------------------------------+
> diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
> index 46302daace34..e77d3ee336a4 100644
> --- a/Documentation/gpu/nova/index.rst
> +++ b/Documentation/gpu/nova/index.rst
> @@ -33,3 +33,4 @@ vGPU manager VFIO driver and the nova-drm driver.
>     core/fwsec
>     core/falcon
>     core/msgq
> +   core/pramin

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-20 18:55 ` [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts Joel Fernandes
@ 2025-10-20 22:35   ` John Hubbard
  2025-10-21 18:42     ` Joel Fernandes
  2025-10-22  6:48     ` Alexandre Courbot
  2025-10-22  6:47   ` Alexandre Courbot
  1 sibling, 2 replies; 55+ messages in thread
From: John Hubbard @ 2025-10-20 22:35 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Add support for managing GSP falcon interrupts. These are required for
> GSP message queue interrupt handling.
> 
> Also rename clear_swgen0_intr() to enable_msq_interrupt() for
> readability.

Hi Joel,

I have a few comments below, including one that doesn't apply to you,
but to Alex Courbot.

Also, other than some trivia below, I can't find any problems with this
patch, other than possibly the above commit message wording, so
regardless of what we do with the .alter() method, please feel free to
add:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/gsp.rs | 26 +++++++++++++++++++++++---
>  drivers/gpu/nova-core/gpu.rs        |  2 +-
>  drivers/gpu/nova-core/regs.rs       | 10 ++++++++++
>  3 files changed, 34 insertions(+), 4 deletions(-)



> 
> diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
> index f17599cb49fa..6da63823996b 100644
> --- a/drivers/gpu/nova-core/falcon/gsp.rs
> +++ b/drivers/gpu/nova-core/falcon/gsp.rs
> @@ -22,11 +22,31 @@ impl FalconEngine for Gsp {
>  }
>  
>  impl Falcon<Gsp> {
> -    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
> -    /// allow GSP to signal CPU for processing new messages in message queue.
> -    pub(crate) fn clear_swgen0_intr(&self, bar: &Bar0) {
> +    /// Enable the GSP Falcon message queue interrupt (SWGEN0 interrupt).
> +    #[expect(dead_code)]
> +    pub(crate) fn enable_msgq_interrupt(&self, bar: &Bar0) {
> +        regs::NV_PFALCON_FALCON_IRQMASK::alter(bar, &Gsp::ID, |r| r.set_swgen0(true));
> +    }

Alex, this ".alter" method is misnamed, IMHO. Because for registers,
The One True Way (or so I claim, haha) is to have the following methods:

    .read
    .modify, also known as RMW (read-modify-write)
    .write

"alter" never shows up in this naming scheme. I'm going to claim that
this is a bit jarring for old hardware/kernel programmers.

But it's not too late: these are only used in a very few places, and entirely
within nova-core, too.

Can I *please* send a patch to rename "alter" to "modify", perhaps?


> +
> +    /// Check if the message queue interrupt is pending.
> +    #[expect(dead_code)]
> +    pub(crate) fn has_msgq_interrupt(&self, bar: &Bar0) -> bool {
> +        regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID).swgen0()
> +    }

Joel:

I am guessing that there is never a situation in which we would *disable*
these interrupts, right? Just thought I'd ask.

> +
> +    /// Clears the message queue interrupt to allow GSP to signal CPU
> +    /// for processing new messages.
> +    pub(crate) fn clear_msgq_interrupt(&self, bar: &Bar0) {
>          regs::NV_PFALCON_FALCON_IRQSCLR::default()
>              .set_swgen0(true)
>              .write(bar, &Gsp::ID);
>      }
> +
> +    /// Acknowledge all pending GSP interrupts.
> +    #[expect(dead_code)]
> +    pub(crate) fn ack_all_interrupts(&self, bar: &Bar0) {
> +        // Read status and write the raw value to IRQSCLR to clear all pending interrupts.
> +        let status = regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID);
> +        regs::NV_PFALCON_FALCON_IRQSCLR::from(u32::from(status)).write(bar, &Gsp::ID);
> +    }
>  }
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index af20e2daea24..fb120cf7b15d 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -216,7 +216,7 @@ pub(crate) fn new<'a>(
>                  bar,
>                  spec.chipset > Chipset::GA100,
>              )
> -            .inspect(|falcon| falcon.clear_swgen0_intr(bar))?,
> +            .inspect(|falcon| falcon.clear_msgq_interrupt(bar))?,
>  
>              sec2_falcon: Falcon::new(pdev.as_ref(), spec.chipset, bar, true)?,
>  
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 206dab2e1335..a3836a01996b 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -198,6 +198,16 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
>  
>  // PFALCON
>  
> +register!(NV_PFALCON_FALCON_IRQMASK @ PFalconBase[0x00000014] {
> +    4:4     halt as bool;
> +    6:6     swgen0 as bool;
> +});
> +
> +register!(NV_PFALCON_FALCON_IRQSTAT @ PFalconBase[0x00000008] {
> +    4:4     halt as bool;
> +    6:6     swgen0 as bool;
> +});
> +
>  register!(NV_PFALCON_FALCON_IRQSCLR @ PFalconBase[0x00000004] {
>      4:4     halt as bool;
>      6:6     swgen0 as bool;

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/7] docs: rust: Fix a few grammatical errors
  2025-10-20 21:33   ` Miguel Ojeda
@ 2025-10-20 23:23     ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-20 23:23 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau



On 10/20/2025 5:33 PM, Miguel Ojeda wrote:
> On Mon, Oct 20, 2025 at 8:55 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>> Fix two grammatical errors in the Rust coding guidelines document.
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> 
> In general, please submit independent patches like this one
> independently, e.g. this should go through the Rust tree.

Ack, I will do that. Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 20:59   ` John Hubbard
@ 2025-10-21 18:26     ` Joel Fernandes
  2025-10-21 20:30       ` John Hubbard
  0 siblings, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2025-10-21 18:26 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

Hi John,

On 10/20/2025 4:59 PM, John Hubbard wrote:
> On 10/20/25 11:55 AM, Joel Fernandes wrote:
>> Add data structures and helpers for page table management. Uses
>> bitfield for cleanly representing and accessing the bitfields in the
>> structures.
>>
> ...
>> +bitfield! {
>> +    pub(crate) struct Pte(u64), "Page Table Entry (PTE) to map virtual pages to physical frames." {
>> +        0:0     valid           as bool;    // (1 = valid for PTEs)
>> +        1:1     privilege       as bool;    // P - Privileged/kernel-only access
>> +        2:2     read_only       as bool;    // RO - Write protection
>> +        3:3     atomic_disable  as bool;    // AD - Disable atomic ops
>> +        4:4     encrypted       as bool;    // E - Encryption enabled
>> +        39:8    frame_number    as u64;     // PA[39:8] - Physical frame number (32 bits)
>> +        41:40   aperture        as u8 => AperturePte;   // Memory aperture type.
>> +        42:42   volatile        as bool;    // VOL - Volatile flag
>> +        50:43   kind            as u8;      // K[7:0] - Compression/tiling kind
>> +        63:51   comptag_line    as u16;     // CTL[12:0] - Compression tag line
>> +    }
>> +}
> 
> Hi Joel,
> 
> What GPUs is this good for? I ask because I seem to recall that
> the format has changed over the years, on a per-GPU-architecture
> basis...

Yes, there's different format versions.

This patch supports all Turing and later GPUs because all GPUs including Hopper+
are backward compatible with version 1. However they wont be able to use the
version 2 and 3 features with only this patch.

I kind of intentionally did this for a first cut. But yes, I could split it into
versions. The 3 MMU structures (PTE, PDE and Dual PDE) are versioned. Version 2
is Turing and later. Hopper+ is when Version 3 got introduced and it is also
backward compatible with Version 2.

We could eventually support versions 2 and 3 (instead of just version 1 as I am
doing), however my working MMU translation prototype is based on version 1 (I
did not have to configure anything in the MMU to switch versions, this was default).

There are a couple of options:

1. For starters, support only version 1. Drawback is, when/if we want to use
version 2 and 3 features, it may require some rework.

2. Have the following hierarchy:
mm/types.rs - all common structures (stuff that is generic like Pfn).
mm/types/ver1.rs - Version 1 specific types.
mm/types/ver2.rs - Version 2 specific types.
mm/types/ver3.rs - Version 3 specific types.
The advantage of this is it keeps the structs namespaced. So it'd be
nova_core::mm:types::ver2::Pte or nova_core::mm:types::ver3::Pte. And the
nova-core MMU code can pick the correct version.

3. Keep the single file types.rs and just suffix the structs with version
numbers. This is attractive because there are only 3 types that have version
flavors (pte, pde and dual pde). So instead of Pte, we would have PteVersion1,
PteVersion2 etc, and a helper abstraction can pick the correct struct.

4. Any of the options 1-3, but dropping version 1 since Turing+ supports version
2 and later. I do have to figure out how to configure the MMU to use a specific
version (which is reasonable).

5. Your options here.

Btw, I used Nouveau as a reference as well, so likely Nouveau doesn't support
version 2 and 3 features. Not that that matters (we should support newer
features in nova-core), but just thought I'd mention it.

Other thoughts?

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core
  2025-10-20 21:20 ` [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core John Hubbard
@ 2025-10-21 18:29   ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-21 18:29 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau



On 10/20/2025 5:20 PM, John Hubbard wrote:
> On 10/20/25 11:55 AM, Joel Fernandes wrote:
>> These patches have some prerequistes needed for nova-core as support is added
>> for memory management and interrupt handling. I rebased them on drm-rust-next
>> and would like them to be considered for the next merge window. I also included
>> a simple rust documentation patch fixing some issues I noticed while reading it :).
> 
> Just to be clear, it appears to me that one must first apply
> "[PATCH v7 0/4] bitfield initial refactor within nova-core" [1],
> in order to apply this series, yes?
> 
> 
> [1] https://lore.kernel.org/20251016150204.1189641-1-joelagnelf@nvidia.com
> 

Yes that is correct. These patches are now in drm-rust-next and I clarified this
yesterday with Alex as well. Sorry I should have mentioned it in the cover letter.

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-20 22:35   ` John Hubbard
@ 2025-10-21 18:42     ` Joel Fernandes
  2025-10-22  6:48     ` Alexandre Courbot
  1 sibling, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-21 18:42 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

Hi John,

On 10/20/2025 6:35 PM, John Hubbard wrote:
[...]
> Joel:
> 
> I am guessing that there is never a situation in which we would *disable*
> these interrupts, right? Just thought I'd ask.

Thanks for double checking. At the moment, we don't have usecase to disable
swgen0. I checked my prototype code and Nouveau as well and we never have to do
this.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-21 18:26     ` Joel Fernandes
@ 2025-10-21 20:30       ` John Hubbard
  2025-10-21 21:58         ` Joel Fernandes
  0 siblings, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-10-21 20:30 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/21/25 11:26 AM, Joel Fernandes wrote:
> On 10/20/2025 4:59 PM, John Hubbard wrote:
>> On 10/20/25 11:55 AM, Joel Fernandes wrote:
>> ...
> Yes, there's different format versions.
> 
> This patch supports all Turing and later GPUs because all GPUs including Hopper+
> are backward compatible with version 1. However they wont be able to use the
> version 2 and 3 features with only this patch.
> 
> I kind of intentionally did this for a first cut. But yes, I could split it into
> versions. The 3 MMU structures (PTE, PDE and Dual PDE) are versioned. Version 2
> is Turing and later. Hopper+ is when Version 3 got introduced and it is also

Ah, then we shouldn't even do version 1. We should take full advantage of
the fact that Nova's earliest GPU is Turing.

> backward compatible with Version 2.
> 
> We could eventually support versions 2 and 3 (instead of just version 1 as I am
> doing), however my working MMU translation prototype is based on version 1 (I
> did not have to configure anything in the MMU to switch versions, this was default).
> 
> There are a couple of options:
> 
> 1. For starters, support only version 1. Drawback is, when/if we want to use
> version 2 and 3 features, it may require some rework.
> 
> 2. Have the following hierarchy:
> mm/types.rs - all common structures (stuff that is generic like Pfn).
> mm/types/ver1.rs - Version 1 specific types.
> mm/types/ver2.rs - Version 2 specific types.
> mm/types/ver3.rs - Version 3 specific types.

Maybe a file/directory structure that more directly indicates page table
formats. "mm/types" could be quite a few things.

> The advantage of this is it keeps the structs namespaced. So it'd be
> nova_core::mm:types::ver2::Pte or nova_core::mm:types::ver3::Pte. And the
> nova-core MMU code can pick the correct version.
> 
> 3. Keep the single file types.rs and just suffix the structs with version
> numbers. This is attractive because there are only 3 types that have version
> flavors (pte, pde and dual pde). So instead of Pte, we would have PteVersion1,
> PteVersion2 etc, and a helper abstraction can pick the correct struct.
> 
> 4. Any of the options 1-3, but dropping version 1 since Turing+ supports version
> 2 and later. I do have to figure out how to configure the MMU to use a specific

Right, I see you already noticed that we can start with Turing. Good.

> version (which is reasonable).
> 
> 5. Your options here.
> 
> Btw, I used Nouveau as a reference as well, so likely Nouveau doesn't support
> version 2 and 3 features. Not that that matters (we should support newer
> features in nova-core), but just thought I'd mention it.
> 
> Other thoughts?

Two things:

1) Danilo is working on writing down locking requirements for Nova page
tables, based on earlier experience with Nouveau page table locking
problems, which were apparently very serious.

2) Maybe it would be good to start with versions 2 and 3, so that we
can see how to do >1 version?

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-21 20:30       ` John Hubbard
@ 2025-10-21 21:58         ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-21 21:58 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, dakr@kernel.org,
	Alexandre Courbot, Alistair Popple, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, bjorn3_gh@protonmail.com, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Timur Tabi, joel@joelfernandes.org,
	Elle Rhumsaa, Daniel Almeida, nouveau@lists.freedesktop.org


> On Oct 21, 2025, at 4:30 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 10/21/25 11:26 AM, Joel Fernandes wrote:
>>> On 10/20/2025 4:59 PM, John Hubbard wrote:
>>>> On 10/20/25 11:55 AM, Joel Fernandes wrote:
>>> ...
>> Yes, there's different format versions.
>> 
>> This patch supports all Turing and later GPUs because all GPUs including Hopper+
>> are backward compatible with version 1. However they wont be able to use the
>> version 2 and 3 features with only this patch.
>> 
>> I kind of intentionally did this for a first cut. But yes, I could split it into
>> versions. The 3 MMU structures (PTE, PDE and Dual PDE) are versioned. Version 2
>> is Turing and later. Hopper+ is when Version 3 got introduced and it is also
> 
> Ah, then we shouldn't even do version 1. We should take full advantage of
> the fact that Nova's earliest GPU is Turing.

Makes sense, though one advantage of just defining version 1 too is it may
be a good reference or a fallback should we ever need it (Especially since it
took me 2 months to be get my nova-core MMU code working and it may be
nice to have a fallback. Let me think about it, I believe it will be just few more
lines of code, so should not be that bad.)

> 
>> backward compatible with Version 2.
>> 
>> We could eventually support versions 2 and 3 (instead of just version 1 as I am
>> doing), however my working MMU translation prototype is based on version 1 (I
>> did not have to configure anything in the MMU to switch versions, this was default).
>> 
>> There are a couple of options:
>> 
>> 1. For starters, support only version 1. Drawback is, when/if we want to use
>> version 2 and 3 features, it may require some rework.
>> 
>> 2. Have the following hierarchy:
>> mm/types.rs - all common structures (stuff that is generic like Pfn).
>> mm/types/ver1.rs - Version 1 specific types.
>> mm/types/ver2.rs - Version 2 specific types.
>> mm/types/ver3.rs - Version 3 specific types.
> 
> Maybe a file/directory structure that more directly indicates page table
> formats. "mm/types" could be quite a few things.
> 

Agreed, though mm/types also contains other non-pagetable structures.
I could prefix ver1/2/3 with page_tables_, so it would be like:
mm/types/page_tables_ver{1,2,3}.rs. That is much more clear.

> 
>> version (which is reasonable).
>> 
>> 5. Your options here.
>> 
>> Btw, I used Nouveau as a reference as well, so likely Nouveau doesn't support
>> version 2 and 3 features. Not that that matters (we should support newer
>> features in nova-core), but just thought I'd mention it.
>> 
>> Other thoughts?
> 
> Two things:
> 
> 1) Danilo is working on writing down locking requirements for Nova page
> tables, based on earlier experience with Nouveau page table locking
> problems, which were apparently very serious.

Sure, though I believe it should not block these patches because we are
not yet designing higher level users. These patches just expose the structures
and things like locking should be built at a higher level. I believe Danilo
is referring to page table look ups and updates, that will a different module
outside of types.

> 2) Maybe it would be good to start with versions 2 and 3, so that we
> can see how to do >1 version?

Yes I will start with version 2, 3. If version 1 is simple, I may include that
in too for the benefit of my working prototype if that is Ok :) Worst case,
we have to just delete an extra file should be not need it :)

thanks,

 - Joel

> 
> thanks,
> --
> John Hubbard
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture
  2025-10-20 18:55 ` [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture Joel Fernandes
  2025-10-20 21:49   ` John Hubbard
@ 2025-10-22  1:43   ` Bagas Sanjaya
  2025-10-22 11:16   ` Alexandre Courbot
  2 siblings, 0 replies; 55+ messages in thread
From: Bagas Sanjaya @ 2025-10-22  1:43 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

[-- Attachment #1: Type: text/plain, Size: 448 bytes --]

On Mon, Oct 20, 2025 at 02:55:35PM -0400, Joel Fernandes wrote:
> +The split read/write pointer design allows bidirectional communication between the
> +CPU and GSP without synchronization (if it were a shared queue), for example, the
                                        as if it is
> +following diagram illustrates pointer updates, when CPU sends message to GSP::
 
Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2025-10-20 18:55 ` [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
  2025-10-20 19:36   ` John Hubbard
  2025-10-20 22:08   ` John Hubbard
@ 2025-10-22  2:09   ` Bagas Sanjaya
  2 siblings, 0 replies; 55+ messages in thread
From: Bagas Sanjaya @ 2025-10-22  2:09 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

[-- Attachment #1: Type: text/plain, Size: 3283 bytes --]

On Mon, Oct 20, 2025 at 02:55:36PM -0400, Joel Fernandes wrote:
> +The window position is controlled via the PBUS BAR0_WINDOW register::
> +
> +    NV_PBUS_BAR0_WINDOW Register
> +    +-----+-----+--------------------------------------+
> +    |31-26|25-24|           23-0                       |
> +    |     |TARG |         BASE_ADDR                    |
> +    |     | ET  |        (bits 39:16 of VRAM address)  |
> +    +-----+-----+--------------------------------------+

Shouldn't the TARGET field cell above be fitted (extended)?

> +
> +    TARGET field values:
> +    - 0x0: VID_MEM (Video Memory / VRAM)
> +    - 0x1: SYS_MEM_COHERENT (Coherent system memory)
> +    - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)
> +
> +64KB Alignment Requirement
> +---------------------------
> +
> +The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
> +by the BASE_ADDR field representing bits [39:16] of the target address::
> +
> +    VRAM Address Calculation:
> +    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
> +    Where:
> +    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
> +    - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]
> +    Example Window Positioning:

Move the supplementary explanation out of respective literal code blocks,
like::

---- >8 ----
diff --git a/Documentation/gpu/nova/core/pramin.rst b/Documentation/gpu/nova/core/pramin.rst
index 19615e504db9d5..47abe539e32a81 100644
--- a/Documentation/gpu/nova/core/pramin.rst
+++ b/Documentation/gpu/nova/core/pramin.rst
@@ -70,23 +70,28 @@ The window position is controlled via the PBUS BAR0_WINDOW register::
     |     | ET  |        (bits 39:16 of VRAM address)  |
     +-----+-----+--------------------------------------+
 
-    TARGET field values:
-    - 0x0: VID_MEM (Video Memory / VRAM)
-    - 0x1: SYS_MEM_COHERENT (Coherent system memory)
-    - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)
+where TARGET field values are:
+
+  - 0x0: VID_MEM (Video Memory / VRAM)
+  - 0x1: SYS_MEM_COHERENT (Coherent system memory)
+  - 0x2: SYS_MEM_NONCOHERENT (Non-coherent system memory)
 
 64KB Alignment Requirement
 ---------------------------
 
 The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
-by the BASE_ADDR field representing bits [39:16] of the target address::
+by the BASE_ADDR field representing bits [39:16] of the target address. The
+VRAM address calculation is determined by::
 
-    VRAM Address Calculation:
     actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
-    Where:
-    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
-    - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]
-    Example Window Positioning:
+
+where:
+
+  - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
+  - pramin_offset: 20-bit offset within PRAMIN window [0x00000-0xFFFFF]
+
+Example window positioning::
+
     +---------------------------------------------------------+
     |                    VRAM Space                           |
     |                                                         |

Thanks. 

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-20 18:55 ` [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
@ 2025-10-22  2:18   ` John Hubbard
  2025-10-22 17:48     ` Joel Fernandes
  2025-10-22 10:41   ` Alexandre Courbot
  1 sibling, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-10-22  2:18 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Required for writing page tables directly to VRAM physical memory,
> before page tables and MMU are setup.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>  4 files changed, 273 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs

Hi Joel,

Lots of feedback here, I'm familiar with this part of the RM and GPU,
and have studied your patch here for a while.

First of all, I'd like very much for this patch to be at the beginning
of a new patchset, that includes calling code that actually uses this
pramin functionality. That's a good practice in general, which I want
to sort of reestablish here in nova, but also in this particular case
it is especially helpful in avoiding a lot of code that we don't need,
and won't need.

OK here we go.

> 
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> new file mode 100644
> index 000000000000..54c7cd9416a9
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) mod pramin;
> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
> new file mode 100644
> index 000000000000..4f4e1b8c0b9b
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pramin.rs
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct VRAM access through PRAMIN window before page tables are set up.

Let's omit "before page tables are set up", because that's an assumption
about the caller. In fact, this can be used at any time. (As an aside,
various diag tools use this HW feature to peek/poke at vidmem, as you
might expect.)

> +//! PRAMIN can also write to system memory, however for simplicty we only
> +//! support VRAM access.

Actually, I'd like to say something a bit different:

For Hopper/Blackwell and later GPUs, PRAMIN can only access VRAM.
For earlier GPUs, sysmem was technically supported, but in practice
never used in production drivers.

This is why Nova only provides VRAM access: Nova supports only a few
of those older GPU generations (Turing, Ampere, and Ada), and there is
no use case even for those few GPUs, for attempting sysmem access
via PRAMIN.


> +//!
> +//! # Examples
> +//!
> +//! ## Writing u32 data to VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
> +//!     pramin.write::<u32>(0x10000, &data)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +//!
> +//! ## Reading bytes from VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Read a u8 from VRAM starting at offset 0x20000
> +//!     pramin.read::<u8>(0x20000, buffer)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![expect(dead_code)]
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +use core::mem;
> +use kernel::prelude::*;
> +
> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
> +/// the VRAM directly. These addresses are consistent across all GPUs.
> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
> +
> +/// Trait for types that can be read/written through PRAMIN.
> +pub(crate) trait PraminNum: Copy + Default + Sized {
> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
> +
> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
> +
> +    fn size_bytes() -> usize {
> +        mem::size_of::<Self>()
> +    }
> +
> +    fn alignment() -> usize {
> +        Self::size_bytes()
> +    }
> +}
> +
> +/// Macro to implement PraminNum trait for unsigned integer types.
> +macro_rules! impl_pramin_unsigned_num {
> +    ($bits:literal) => {
> +        ::kernel::macros::paste! {
> +            impl PraminNum for [<u $bits>] {
> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
> +                    bar.[<try_read $bits>](offset)
> +                }
> +
> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
> +                    bar.[<try_write $bits>](self, offset)
> +                }
> +            }
> +        }
> +    };
> +}
> +
> +impl_pramin_unsigned_num!(8);
> +impl_pramin_unsigned_num!(16);
> +impl_pramin_unsigned_num!(32);
> +impl_pramin_unsigned_num!(64);
> +
> +/// Direct VRAM access through PRAMIN window before page tables are set up.
> +pub(crate) struct PraminVram<'a> {
> +    bar: &'a Bar0,
> +    saved_window_addr: usize,
> +}
> +
> +impl<'a> PraminVram<'a> {
> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
> +    /// the state is restored when the accessor is dropped.
> +    ///
> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
> +        let saved_window_addr = Self::get_window_addr(bar);
> +        Self {
> +            bar,
> +            saved_window_addr,
> +        }
> +    }
> +
> +    /// Set BAR0 window to point to specific FB region.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
> +    ///                 Must be 64KB aligned (lower 16 bits zero).
> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
> +        if fb_offset & 0xFFFF != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
> +        window_reg.write(self.bar);
> +
> +        // Read back to ensure it took effect
> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
> +        if readback.window_base() != window_reg.window_base() {
> +            return Err(EIO);
> +        }

Let's not "read it back to ensure it took effect". This is not required
by the hardware.

> +
> +        Ok(())
> +    }
> +
> +    /// Get current BAR0 window offset.
> +    ///
> +    /// # Returns
> +    ///
> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
> +    /// This offset is always 64KB aligned.
> +    fn get_window_addr(bar: &Bar0) -> usize {
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
> +        window_reg.get_window_addr()
> +    }
> +
> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `num_items` - Number of items of type `T` to process.
> +    /// * `operation` - Closure called for each item to perform the actual read/write.
> +    ///                 Takes two parameters:
> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
> +    ///
> +    /// The function automatically handles PRAMIN window repositioning when accessing
> +    /// data that spans multiple 1MB windows.
> +    fn access_vram<T: PraminNum, F>(
> +        &self,
> +        fb_offset: usize,
> +        num_items: usize,
> +        mut operation: F,
> +    ) -> Result

This is far too much functionality, and the code can be made much smaller
and simpler, and still get what we need. Open RM only supplies small accessors
(8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
necessary.

We should do likewise, and avoid this.

Then we can just create things such as write_u32() or write<u32>(), etc.

And do we even *need* read?? I'm not sure we do.

This is hopefully showing the value of including the calling code, as
a follow-on patch in the series.

Hope this helps!

thanks,
-- 
John Hubbard


> +    where
> +        F: FnMut(usize, usize) -> Result,
> +    {
> +        // FB offset must be aligned to the size of T
> +        if fb_offset & (T::alignment() - 1) != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut offset_bytes = fb_offset;
> +        let mut remaining_items = num_items;
> +        let mut data_index = 0;
> +
> +        while remaining_items > 0 {
> +            // Align the window to 64KB boundary
> +            let target_window = offset_bytes & !0xFFFF;
> +            let window_offset = offset_bytes - target_window;
> +
> +            // Set window if needed
> +            if target_window != Self::get_window_addr(self.bar) {
> +                self.set_window_addr(target_window)?;
> +            }
> +
> +            // Calculate how many items we can access from this window position
> +            // We can access up to 1MB total, minus the offset within the window
> +            let remaining_in_window = PRAMIN_SIZE - window_offset;
> +            let max_items_in_window = remaining_in_window / T::size_bytes();
> +            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
> +
> +            // Process data through PRAMIN
> +            for i in 0..items_to_write {
> +                // Calculate the byte offset in the PRAMIN window to write to.
> +                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
> +                operation(data_index + i, pramin_offset_bytes)?;
> +            }
> +
> +            // Move to next chunk.
> +            data_index += items_to_write;
> +            offset_bytes += items_to_write * T::size_bytes();
> +            remaining_items -= items_to_write;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Generic write for data to VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
> +    ///            starting at `fb_offset`.
> +    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx].write_to_bar(self.bar, pramin_offset)
> +        })
> +    }
> +
> +    /// Generic read data from VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Mutable slice that will be filled with data read from VRAM.
> +    ///            The number of items read equals `data.len()`.
> +    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
> +            Ok(())
> +        })
> +    }
> +}
> +
> +impl<'a> Drop for PraminVram<'a> {
> +    fn drop(&mut self) {
> +        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 112277c7921e..6bd9fc3372d6 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -13,6 +13,7 @@
>  mod gfw;
>  mod gpu;
>  mod gsp;
> +mod mm;
>  mod regs;
>  mod util;
>  mod vbios;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index a3836a01996b..ba09da7e1541 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -12,6 +12,7 @@
>      FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
>  };
>  use crate::gpu::{Architecture, Chipset};
> +use kernel::bits::genmask_u32;
>  use kernel::prelude::*;
>  
>  // PMC
> @@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      }
>  }
>  
> -// PBUS
> +// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
> +// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
>  
>  register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
>  
> @@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      31:16   frts_err_code as u16;
>  });
>  
> +// BAR0 window control register to configure the BAR0 window for PRAMIN access
> +// (direct physical VRAM access).
> +register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
> +    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
> +    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
> +});
> +
> +impl NV_PBUS_BAR0_WINDOW {
> +    /// Returns the 64-bit aligned VRAM address of the window.
> +    pub(crate) fn get_window_addr(self) -> usize {
> +        (self.window_base() as usize) << 16
> +    }
> +
> +    /// Sets the window address from a framebuffer offset.
> +    /// The fb_offset must be 64KB aligned (lower bits discared).
> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
> +        // Calculate window base (bits 39:16 of FB address)
> +        // The total FB address is 40 bits, mask anything above. Since we are
> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
> +        let mask = genmask_u32(0..=23) as usize;
> +        let window_base = ((fb_offset >> 16) & mask) as u32;
> +        self.set_window_base(window_base)
> +    }
> +}
> +
>  // PFB
>  
>  // The following two registers together hold the physical system memory address that is used by the



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type
  2025-10-20 18:55 ` [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type Joel Fernandes
  2025-10-20 21:25   ` John Hubbard
@ 2025-10-22  6:25   ` Alexandre Courbot
  2025-10-22 17:51     ` Joel Fernandes
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22  6:25 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> To support the usecase where we read a register and write to another
> with identical bit layout, add support to convert bitfield to underlying type.
>
> Another way to do this, is to read individual fields, on the caller
> side, and write to the destination fields, but that is both cumbersome
> and error-prone as new bits added in hardware may be missed.

Shouldn't the title be "Add support to convert bitfield FROM underlying
type"? Since this adds a `From<storage> for Bitfield`.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-20 18:55 ` [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts Joel Fernandes
  2025-10-20 22:35   ` John Hubbard
@ 2025-10-22  6:47   ` Alexandre Courbot
  2025-10-22 21:05     ` Joel Fernandes
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22  6:47 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> Add support for managing GSP falcon interrupts. These are required for
> GSP message queue interrupt handling.
>
> Also rename clear_swgen0_intr() to enable_msq_interrupt() for
> readability.

Let's make this "also" its own patch as it is a different thing.

>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/falcon/gsp.rs | 26 +++++++++++++++++++++++---
>  drivers/gpu/nova-core/gpu.rs        |  2 +-
>  drivers/gpu/nova-core/regs.rs       | 10 ++++++++++
>  3 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
> index f17599cb49fa..6da63823996b 100644
> --- a/drivers/gpu/nova-core/falcon/gsp.rs
> +++ b/drivers/gpu/nova-core/falcon/gsp.rs
> @@ -22,11 +22,31 @@ impl FalconEngine for Gsp {
>  }
>  
>  impl Falcon<Gsp> {
> -    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
> -    /// allow GSP to signal CPU for processing new messages in message queue.
> -    pub(crate) fn clear_swgen0_intr(&self, bar: &Bar0) {
> +    /// Enable the GSP Falcon message queue interrupt (SWGEN0 interrupt).
> +    #[expect(dead_code)]
> +    pub(crate) fn enable_msgq_interrupt(&self, bar: &Bar0) {
> +        regs::NV_PFALCON_FALCON_IRQMASK::alter(bar, &Gsp::ID, |r| r.set_swgen0(true));
> +    }
> +
> +    /// Check if the message queue interrupt is pending.
> +    #[expect(dead_code)]
> +    pub(crate) fn has_msgq_interrupt(&self, bar: &Bar0) -> bool {
> +        regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID).swgen0()
> +    }
> +
> +    /// Clears the message queue interrupt to allow GSP to signal CPU
> +    /// for processing new messages.
> +    pub(crate) fn clear_msgq_interrupt(&self, bar: &Bar0) {
>          regs::NV_PFALCON_FALCON_IRQSCLR::default()
>              .set_swgen0(true)
>              .write(bar, &Gsp::ID);
>      }
> +
> +    /// Acknowledge all pending GSP interrupts.
> +    #[expect(dead_code)]
> +    pub(crate) fn ack_all_interrupts(&self, bar: &Bar0) {
> +        // Read status and write the raw value to IRQSCLR to clear all pending interrupts.
> +        let status = regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID);
> +        regs::NV_PFALCON_FALCON_IRQSCLR::from(u32::from(status)).write(bar, &Gsp::ID);
> +    }

I think this can be a bit more generic than that: all interrupts for all
falcons are handled the same way, so we shouldn't need to write
`enable`, `clear`, and other methods for each.

So the first step should probably be a generic `impl<E> Falcon<E>` block
that provides base methods for specialized blocks to reuse. It could
work with just the index of the bit corresponding to the interrupt to
enable/clear, but if we want to involve the type system we could also
define a `FalconInterrupt` trait with an associated type for the engine
on which this interrupt is valid, and its bit index as an associated
const.

But I suspect that the set of interrupts is going to be pretty standard,
so maybe we can use the standard nomenclature for the generic methods
(i.e. SWGEN0), and have dedicated methods for specialized units where
relevant.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-20 22:35   ` John Hubbard
  2025-10-21 18:42     ` Joel Fernandes
@ 2025-10-22  6:48     ` Alexandre Courbot
  2025-10-22 21:09       ` Joel Fernandes
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22  6:48 UTC (permalink / raw)
  To: John Hubbard, Joel Fernandes, linux-kernel, rust-for-linux,
	dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 7:35 AM JST, John Hubbard wrote:
> Alex, this ".alter" method is misnamed, IMHO. Because for registers,
> The One True Way (or so I claim, haha) is to have the following methods:
>
>     .read
>     .modify, also known as RMW (read-modify-write)
>     .write
>
> "alter" never shows up in this naming scheme. I'm going to claim that
> this is a bit jarring for old hardware/kernel programmers.
>
> But it's not too late: these are only used in a very few places, and entirely
> within nova-core, too.
>
> Can I *please* send a patch to rename "alter" to "modify", perhaps?

Oh yes, although I was just thinking that this should be renamed to
`update` for consistency with regmap.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core
  2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
                   ` (7 preceding siblings ...)
  2025-10-20 21:20 ` [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core John Hubbard
@ 2025-10-22  6:57 ` Alexandre Courbot
  2025-10-22 21:30   ` Joel Fernandes
  8 siblings, 1 reply; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22  6:57 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> These patches have some prerequistes needed for nova-core as support is added
> for memory management and interrupt handling. I rebased them on drm-rust-next
> and would like them to be considered for the next merge window. I also included
> a simple rust documentation patch fixing some issues I noticed while reading it :).
>
> The series adds support for the PRAMIN aperture mechanism, which is a
> prerequisite for virtual memory as it is required to boot strap virtual memory
> (we cannot write to VRAM using virtual memory because we need to write page
> tables to VRAM in the first place).
>
> I also add page table related structures (mm/types.rs) using the bitfield
> macro, which will be used for page table walking, memory mapping, etc. This is
> currently unused code, because without physical memory allocation (using the
> buddy allocator), we cannot use this code as page table pages need to be
> allocated in the first place. However, I have included several examples in the
> file about how these structures will be used. I have also simplified the code
> keeping future additions to it for later.
>
> For interrupts, I only have added additional support for GSP's message queue
> interrupt. I am working on adding support to the interrupt controller module
> (VFN) which is the next thing for me to post after this series. I have it
> prototyped and working, however I am currently making several changes to it
> related to virtual functions. For now in this series, I just want to get the
> GSP-specific patch out of the way, hence I am including it here.
>
> I also have added a patch for bitfield macro which constructs a bitfield struct
> given its storage type. This is used in a later GSP interrupt patch in the
> series to read from one register and write to another.

So IIUC, this series contains the following:

1. Add PRAMIN support,
2. Add Page mapping support, albeit this is unexercised until we have a
   user (e.g. buddy allocator),
3. Add Falcon interrupt support,
4. Add missing bitfield functionality, albeit not used yet,
5. Various documentation patches.

This is a bit confusing, as there is close to no logical relationship or
dependency between these patches. I see potential for several different
submissions here:

- The core documentation fix, as Miguel pointed out, since it should be
  merged into the rust tree and not nova-core.
- The bitfield patch is a useful addition and should be sent separately.
- PRAMIN/Page mapping should come with code that exercices them, so
  think they belong as the first patches of a series that ends with
  basic memory allocation capabilities. But feel free to send a RFC if
  you want early feedback.
- The falcon interrupts patch does not seem to be used by the last two
  patches? I guess it belongs to the series that will add support for
  the interrupt controller.
- Other documentation patches belong to the series that introduces the
  feature they document.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-20 18:55 ` [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
  2025-10-22  2:18   ` John Hubbard
@ 2025-10-22 10:41   ` Alexandre Courbot
  2025-10-22 22:04     ` Joel Fernandes
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22 10:41 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> Required for writing page tables directly to VRAM physical memory,
> before page tables and MMU are setup.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>  4 files changed, 273 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> new file mode 100644
> index 000000000000..54c7cd9416a9
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) mod pramin;
> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
> new file mode 100644
> index 000000000000..4f4e1b8c0b9b
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pramin.rs
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct VRAM access through PRAMIN window before page tables are set up.
> +//! PRAMIN can also write to system memory, however for simplicty we only

s/simplicty/simplicity

> +//! support VRAM access.
> +//!
> +//! # Examples
> +//!
> +//! ## Writing u32 data to VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
> +//!     pramin.write::<u32>(0x10000, &data)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +//!
> +//! ## Reading bytes from VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Read a u8 from VRAM starting at offset 0x20000
> +//!     pramin.read::<u8>(0x20000, buffer)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![expect(dead_code)]
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +use core::mem;
> +use kernel::prelude::*;
> +
> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
> +/// the VRAM directly. These addresses are consistent across all GPUs.
> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000

This definition looks like it could be an array of registers - that way
we could use its `BASE` associated constant and keep the hardware
offsets into the `regs` module.

Even if we don't use the array of registers for convenience, it is good
to have it defined in `regs` for consistency.

> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position

You can use `kernel::sizes::SZ_1M` here.

> +
> +/// Trait for types that can be read/written through PRAMIN.
> +pub(crate) trait PraminNum: Copy + Default + Sized {
> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
> +
> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
> +
> +    fn size_bytes() -> usize {
> +        mem::size_of::<Self>()
> +    }
> +
> +    fn alignment() -> usize {
> +        Self::size_bytes()
> +    }
> +}

Since this trait requires `Sized`, you can use `size_of` and `align_of`
directly, making the `size_bytes` and `alignment` methods redundant.
Only `write_to_bar` should remain.

I also wonder whether we couldn't get rid of this trait entirely by
leveragin `FromBytes` and `AsBytes`. Since the size of the type is
known, we could have read/write methods in Pramin that write its content
by using Io accessors of decreasing size (first 64-bit, then 32, etc)
until all the data is written.

> +
> +/// Macro to implement PraminNum trait for unsigned integer types.
> +macro_rules! impl_pramin_unsigned_num {
> +    ($bits:literal) => {
> +        ::kernel::macros::paste! {
> +            impl PraminNum for [<u $bits>] {
> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
> +                    bar.[<try_read $bits>](offset)
> +                }
> +
> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
> +                    bar.[<try_write $bits>](self, offset)
> +                }
> +            }
> +        }
> +    };
> +}
> +
> +impl_pramin_unsigned_num!(8);
> +impl_pramin_unsigned_num!(16);
> +impl_pramin_unsigned_num!(32);
> +impl_pramin_unsigned_num!(64);
> +
> +/// Direct VRAM access through PRAMIN window before page tables are set up.
> +pub(crate) struct PraminVram<'a> {

Let's use the shorter name `Pramin` - the limitation to VRAM is a
reasonable one (since the CPU can access its own system memory), it is
not necessary to encode it into the name.

> +    bar: &'a Bar0,
> +    saved_window_addr: usize,
> +}
> +
> +impl<'a> PraminVram<'a> {
> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
> +    /// the state is restored when the accessor is dropped.
> +    ///
> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
> +        let saved_window_addr = Self::get_window_addr(bar);
> +        Self {
> +            bar,
> +            saved_window_addr,
> +        }
> +    }
> +
> +    /// Set BAR0 window to point to specific FB region.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
> +    ///                 Must be 64KB aligned (lower 16 bits zero).

Let's follow the rust doccomment guidelines for the arguments.

> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
> +        if fb_offset & 0xFFFF != 0 {
> +            return Err(EINVAL);
> +        }

Since this method is private and called from controlled contexts for
which `fb_offset` should always be valid, we can request callers to
give us a "window index" (e.g. the `window_base` of the
`NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.

> +
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
> +        window_reg.write(self.bar);
> +
> +        // Read back to ensure it took effect
> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
> +        if readback.window_base() != window_reg.window_base() {
> +            return Err(EIO);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Get current BAR0 window offset.
> +    ///
> +    /// # Returns
> +    ///
> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
> +    /// This offset is always 64KB aligned.
> +    fn get_window_addr(bar: &Bar0) -> usize {
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
> +        window_reg.get_window_addr()
> +    }
> +
> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `num_items` - Number of items of type `T` to process.
> +    /// * `operation` - Closure called for each item to perform the actual read/write.
> +    ///                 Takes two parameters:
> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access

Formatting of arguments is strange here as well.

> +    ///
> +    /// The function automatically handles PRAMIN window repositioning when accessing
> +    /// data that spans multiple 1MB windows.

Inversely, this large method is under-documented. Understanding what
`operation` is supposed to do would be helpful.

> +    fn access_vram<T: PraminNum, F>(
> +        &self,
> +        fb_offset: usize,
> +        num_items: usize,
> +        mut operation: F,
> +    ) -> Result
> +    where
> +        F: FnMut(usize, usize) -> Result,
> +    {
> +        // FB offset must be aligned to the size of T
> +        if fb_offset & (T::alignment() - 1) != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut offset_bytes = fb_offset;
> +        let mut remaining_items = num_items;
> +        let mut data_index = 0;
> +
> +        while remaining_items > 0 {
> +            // Align the window to 64KB boundary
> +            let target_window = offset_bytes & !0xFFFF;
> +            let window_offset = offset_bytes - target_window;
> +
> +            // Set window if needed
> +            if target_window != Self::get_window_addr(self.bar) {
> +                self.set_window_addr(target_window)?;
> +            }
> +
> +            // Calculate how many items we can access from this window position
> +            // We can access up to 1MB total, minus the offset within the window
> +            let remaining_in_window = PRAMIN_SIZE - window_offset;
> +            let max_items_in_window = remaining_in_window / T::size_bytes();
> +            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
> +
> +            // Process data through PRAMIN
> +            for i in 0..items_to_write {
> +                // Calculate the byte offset in the PRAMIN window to write to.
> +                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
> +                operation(data_index + i, pramin_offset_bytes)?;
> +            }
> +
> +            // Move to next chunk.
> +            data_index += items_to_write;
> +            offset_bytes += items_to_write * T::size_bytes();
> +            remaining_items -= items_to_write;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Generic write for data to VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
> +    ///            starting at `fb_offset`.
> +    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx].write_to_bar(self.bar, pramin_offset)
> +        })
> +    }
> +
> +    /// Generic read data from VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Mutable slice that will be filled with data read from VRAM.
> +    ///            The number of items read equals `data.len()`.
> +    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
> +            Ok(())
> +        })
> +    }
> +}
> +
> +impl<'a> Drop for PraminVram<'a> {
> +    fn drop(&mut self) {
> +        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 112277c7921e..6bd9fc3372d6 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -13,6 +13,7 @@
>  mod gfw;
>  mod gpu;
>  mod gsp;
> +mod mm;
>  mod regs;
>  mod util;
>  mod vbios;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index a3836a01996b..ba09da7e1541 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -12,6 +12,7 @@
>      FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
>  };
>  use crate::gpu::{Architecture, Chipset};
> +use kernel::bits::genmask_u32;
>  use kernel::prelude::*;
>  
>  // PMC
> @@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      }
>  }
>  
> -// PBUS
> +// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
> +// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
>  
>  register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
>  
> @@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      31:16   frts_err_code as u16;
>  });
>  
> +// BAR0 window control register to configure the BAR0 window for PRAMIN access
> +// (direct physical VRAM access).
> +register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
> +    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
> +    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
> +});
> +
> +impl NV_PBUS_BAR0_WINDOW {
> +    /// Returns the 64-bit aligned VRAM address of the window.
> +    pub(crate) fn get_window_addr(self) -> usize {
> +        (self.window_base() as usize) << 16
> +    }
> +
> +    /// Sets the window address from a framebuffer offset.
> +    /// The fb_offset must be 64KB aligned (lower bits discared).
> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
> +        // Calculate window base (bits 39:16 of FB address)
> +        // The total FB address is 40 bits, mask anything above. Since we are
> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
> +        let mask = genmask_u32(0..=23) as usize;
> +        let window_base = ((fb_offset >> 16) & mask) as u32;
> +        self.set_window_base(window_base)
> +    }
> +}

If you work directly with `window_base` as suggested above, this impl
block can be dropped altogether.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture
  2025-10-20 18:55 ` [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture Joel Fernandes
  2025-10-20 21:49   ` John Hubbard
  2025-10-22  1:43   ` Bagas Sanjaya
@ 2025-10-22 11:16   ` Alexandre Courbot
  2 siblings, 0 replies; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22 11:16 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
<snip>
> +When the CPU sends a message to the GSP, it writes the message to its own
> +queue (CPU queue) and updates the write pointer in its queue's TX header. The GSP
> +then reads the read pointer in its own queue's RX header and knows that there are
> +pending messages from the CPU because its RX header's read pointer is behind the
> +CPU's TX header's write pointer. After reading the message, the GSP updates its RX
> +header's read pointer to catch up. The same happens in reverse.

"The same happens in reverse" doesn't seem to be the full sentence (I
guess you want to add "when the GSP sends a message to the CPU").

> +
> +Page-based message passing
> +--------------------------
> +The message queue is page-based, which means that the message is stored in a

"means that each message""

> +page-aligned buffer. The page size is 4KB. Each message starts at the beginning of
> +a page.

This last sentence is a repetition of the first one of this paragraph.

> If the message is shorter than a page, the remaining space in the page is
> +wasted.

I'd say "unused" instead of "wasted", since this is by design.

>
> The next message starts at the beginning of the next page no matter how
> +small the previous message was.
> +
> +Note that messages larger than a page will span multiple pages. This means that
> +it is possible that the first part of the message lands on the last page, and the
> +second part of the message lands on the first page, thus requiring out-of-order
> +memory access. The SBuffer data structure in Nova tackles this use case.

nit: enclose type names in ` (i.e. `SBuffer`).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 18:55 ` [PATCH 7/7] nova-core: mm: Add data structures for page table management Joel Fernandes
  2025-10-20 20:59   ` John Hubbard
  2025-10-20 21:30   ` Miguel Ojeda
@ 2025-10-22 11:21   ` Alexandre Courbot
  2025-10-22 19:13     ` Joel Fernandes
  2 siblings, 1 reply; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-22 11:21 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
<snip>
> +#![expect(dead_code)]
> +
> +/// Memory size constants
> +pub(crate) const KB: usize = 1024;
> +pub(crate) const MB: usize = KB * 1024;

You can use `kernel::types::SZ_1K` and `SZ_1M` instead.

> +
> +/// Page size: 4 KiB
> +pub(crate) const PAGE_SIZE: usize = 4 * KB;

SZ_4K exists as well. :)


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-22  2:18   ` John Hubbard
@ 2025-10-22 17:48     ` Joel Fernandes
  2025-10-22 20:43       ` Joel Fernandes
  2025-10-24 11:31       ` Alexandre Courbot
  0 siblings, 2 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 17:48 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

Hi John,

On 10/21/2025 10:18 PM, John Hubbard wrote:
[...]
> First of all, I'd like very much for this patch to be at the beginning
> of a new patchset, that includes calling code that actually uses this
> pramin functionality.

There's no code in this patchset that uses pramin functionality though, so not
sure what you mean? This is a prerequisite patch for mm as mentioned in the
cover letter.

>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>> new file mode 100644
>> index 000000000000..54c7cd9416a9
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>> @@ -0,0 +1,3 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +pub(crate) mod pramin;
>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>> new file mode 100644
>> index 000000000000..4f4e1b8c0b9b
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>> @@ -0,0 +1,241 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
> 
> Let's omit "before page tables are set up", because that's an assumption
> about the caller. In fact, this can be used at any time. (As an aside,
> various diag tools use this HW feature to peek/poke at vidmem, as you
> might expect.)

Ok.

> 
>> +//! PRAMIN can also write to system memory, however for simplicty we only
>> +//! support VRAM access.
> 
> Actually, I'd like to say something a bit different:
> 
> For Hopper/Blackwell and later GPUs, PRAMIN can only access VRAM.
> For earlier GPUs, sysmem was technically supported, but in practice
> never used in production drivers.

Ah, will clarify.

> 
> This is why Nova only provides VRAM access: Nova supports only a few
> of those older GPU generations (Turing, Ampere, and Ada), and there is
> no use case even for those few GPUs, for attempting sysmem access
> via PRAMIN.

Makes sense. Plus we can also access sysmem from BAR1 if needed.

>> +//!
>> +//! # Examples
>> +//!
>> +//! ## Writing u32 data to VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>> +//!     pramin.write::<u32>(0x10000, &data)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +//!
>> +//! ## Reading bytes from VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +
>> +#![expect(dead_code)]
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +use core::mem;
>> +use kernel::prelude::*;
>> +
>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
>> +
>> +/// Trait for types that can be read/written through PRAMIN.
>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>> +
>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>> +
>> +    fn size_bytes() -> usize {
>> +        mem::size_of::<Self>()
>> +    }
>> +
>> +    fn alignment() -> usize {
>> +        Self::size_bytes()
>> +    }
>> +}
>> +
>> +/// Macro to implement PraminNum trait for unsigned integer types.
>> +macro_rules! impl_pramin_unsigned_num {
>> +    ($bits:literal) => {
>> +        ::kernel::macros::paste! {
>> +            impl PraminNum for [<u $bits>] {
>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>> +                    bar.[<try_read $bits>](offset)
>> +                }
>> +
>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>> +                    bar.[<try_write $bits>](self, offset)
>> +                }
>> +            }
>> +        }
>> +    };
>> +}
>> +
>> +impl_pramin_unsigned_num!(8);
>> +impl_pramin_unsigned_num!(16);
>> +impl_pramin_unsigned_num!(32);
>> +impl_pramin_unsigned_num!(64);
>> +
>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>> +pub(crate) struct PraminVram<'a> {
>> +    bar: &'a Bar0,
>> +    saved_window_addr: usize,
>> +}
>> +
>> +impl<'a> PraminVram<'a> {
>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>> +    /// the state is restored when the accessor is dropped.
>> +    ///
>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>> +        let saved_window_addr = Self::get_window_addr(bar);
>> +        Self {
>> +            bar,
>> +            saved_window_addr,
>> +        }
>> +    }
>> +
>> +    /// Set BAR0 window to point to specific FB region.
>> +    ///
>> +    /// # Arguments
>> +    ///
>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>> +        if fb_offset & 0xFFFF != 0 {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
>> +        window_reg.write(self.bar);
>> +
>> +        // Read back to ensure it took effect
>> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
>> +        if readback.window_base() != window_reg.window_base() {
>> +            return Err(EIO);
>> +        }
> 
> Let's not "read it back to ensure it took effect". This is not required
> by the hardware.

Ok, it was more for my ascertaining that the write to effect, but you're right
its not necessary so I'll remove it.

>> +        Ok(())
>> +    }
>> +
>> +    /// Get current BAR0 window offset.
>> +    ///
>> +    /// # Returns
>> +    ///
>> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
>> +    /// This offset is always 64KB aligned.
>> +    fn get_window_addr(bar: &Bar0) -> usize {
>> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
>> +        window_reg.get_window_addr()
>> +    }
>> +
>> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
>> +    ///
>> +    /// # Arguments
>> +    ///
>> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
>> +    ///                 Must be aligned to `T::alignment()`.
>> +    /// * `num_items` - Number of items of type `T` to process.
>> +    /// * `operation` - Closure called for each item to perform the actual read/write.
>> +    ///                 Takes two parameters:
>> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
>> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
>> +    ///
>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>> +    /// data that spans multiple 1MB windows.
>> +    fn access_vram<T: PraminNum, F>(
>> +        &self,
>> +        fb_offset: usize,
>> +        num_items: usize,
>> +        mut operation: F,
>> +    ) -> Result
> 
> This is far too much functionality, and the code can be made much smaller
> and simpler.
> and still get what we need. Open RM only supplies small accessors
> (8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
> necessary.

The code uses a sliding window approach to reposition the moving window,
abstracting away the details of the moving window from the caller. That
simplifies the callers a lot as they don't need to "loop" and know when to move
the window when they hit limits. They can also write to greater than 1MB. The
bulk of the logic is in this function and the surrounding code is mostly
wrappers, which part is complicated or that you did not understand?

Just to note also, the PRAMIN moving window functionality in this patch allows
us to not need BAR2 to access VRAM for instance memory. That is a code
simplification then as we do not need code for BAR2 (the tradeoff being slightly
slower instance memory access). I confirmed with the team that this is also an
option. Abstracting the sliding window functionality becomes important then, so
I'd not vote for removing this functionality for that reason. And if we ever use
BAR2, having it is still useful because it allows us to have a fallback too for
comparison/reference.

> 
> We should do likewise, and avoid this.
> 
> Then we can just create things such as write_u32() or write<u32>(), etc.
> 
> And do we even *need* read?? I'm not sure we do.

We do need reads as we walk through page tables structures. Note that the page
tables are partially allocated by the GSP.

> 
> This is hopefully showing the value of including the calling code, as
> a follow-on patch in the series.

Unfortunately, there are too many dependencies as I mentioned in the cover
letter, so I would like to get functionality merged in stages. That's the
best way to make good progress IMO for nova-core. Of course, we have to careful
about design etc and I kept it as simple as possible out of that intention. My
pramin patch was written 3-4 months ago now, so I'd like to not keep it too
sitting comfortably in my tree. :). And this patch is critical for mm.

That said, I'll go look at the usecases and APIs again and see if I can simplify
anything more.

thanks,

 - Joel







^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type
  2025-10-22  6:25   ` Alexandre Courbot
@ 2025-10-22 17:51     ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 17:51 UTC (permalink / raw)
  To: Alexandre Courbot, linux-kernel, rust-for-linux, dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau



On 10/22/2025 2:25 AM, Alexandre Courbot wrote:
> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>> To support the usecase where we read a register and write to another
>> with identical bit layout, add support to convert bitfield to underlying type.
>>
>> Another way to do this, is to read individual fields, on the caller
>> side, and write to the destination fields, but that is both cumbersome
>> and error-prone as new bits added in hardware may be missed.
> 
> Shouldn't the title be "Add support to convert bitfield FROM underlying
> type"? Since this adds a `From<storage> for Bitfield`.
> 

Good catch, I will fix the title for next posting.

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-22 11:21   ` Alexandre Courbot
@ 2025-10-22 19:13     ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 19:13 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	dri-devel@lists.freedesktop.org, dakr@kernel.org, Alistair Popple,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh@protonmail.com, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, John Hubbard,
	Timur Tabi, joel@joelfernandes.org, Elle Rhumsaa, Daniel Almeida,
	nouveau@lists.freedesktop.org, Alexandre Courbot



> On Oct 22, 2025, at 7:22 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> 
> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> <snip>
>> +#![expect(dead_code)]
>> +
>> +/// Memory size constants
>> +pub(crate) const KB: usize = 1024;
>> +pub(crate) const MB: usize = KB * 1024;
> 
> You can use `kernel::types::SZ_1K` and `SZ_1M` instead.
> 
>> +
>> +/// Page size: 4 KiB
>> +pub(crate) const PAGE_SIZE: usize = 4 * KB;
> 
> SZ_4K exists as well. :)

Thanks a lot, will do.

- Joel


> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-22 17:48     ` Joel Fernandes
@ 2025-10-22 20:43       ` Joel Fernandes
  2025-10-24 11:31       ` Alexandre Courbot
  1 sibling, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 20:43 UTC (permalink / raw)
  To: John Hubbard, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

Hi,

On 10/22/2025 1:48 PM, Joel Fernandes wrote:
>>> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
>>> +    ///                 Must be aligned to `T::alignment()`.
>>> +    /// * `num_items` - Number of items of type `T` to process.
>>> +    /// * `operation` - Closure called for each item to perform the actual read/write.
>>> +    ///                 Takes two parameters:
>>> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
>>> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>>> +    fn access_vram<T: PraminNum, F>(
>>> +        &self,
>>> +        fb_offset: usize,
>>> +        num_items: usize,
>>> +        mut operation: F,
>>> +    ) -> Result
>> This is far too much functionality, and the code can be made much smaller
>> and simpler.
>> and still get what we need. Open RM only supplies small accessors
>> (8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
>> necessary.
>> The code uses a sliding window approach to reposition the moving window,
> abstracting away the details of the moving window from the caller.
For completeness: I chatted with John and wanted to update the thread. The main
"benefit" from this extra complexity is if we write > 1MB at a time (automatic
sliding of window). However, in all frankness I *currently* don't have such a
use case - all writes are within 1MB. Just to be clear though I do feel we will
need this though if we use PRAMIN as the primary way to access "instance memory"
instead of BAR2 since instance memory is much larger than the window.

So for next version of this patch, I will keep it simple (KISS principal). We
can add in the abstracted moving-window for >1MB writes when we need it as this
patch on the list and is not going anywhere. Also I think since we're exposing
the details of the window size to the caller, we could come up with compile-time
checks to make sure they're within bounds, so I'll proceed accordingly.

Thanks!

 - Joel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-22  6:47   ` Alexandre Courbot
@ 2025-10-22 21:05     ` Joel Fernandes
  0 siblings, 0 replies; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 21:05 UTC (permalink / raw)
  To: Alexandre Courbot, linux-kernel, rust-for-linux, dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau



On 10/22/2025 2:47 AM, Alexandre Courbot wrote:
> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>> Add support for managing GSP falcon interrupts. These are required for
>> GSP message queue interrupt handling.
>>
>> Also rename clear_swgen0_intr() to enable_msq_interrupt() for
>> readability.
> 
> Let's make this "also" its own patch as it is a different thing.

Sure, will do. To clarify, my intention was to make this a "prerequisite" patch
series that's why it has all the goodies in a single series. However, it is
probably less confusing to have them independently sent as you pointed out.

>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/falcon/gsp.rs | 26 +++++++++++++++++++++++---
>>  drivers/gpu/nova-core/gpu.rs        |  2 +-
>>  drivers/gpu/nova-core/regs.rs       | 10 ++++++++++
>>  3 files changed, 34 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/falcon/gsp.rs b/drivers/gpu/nova-core/falcon/gsp.rs
>> index f17599cb49fa..6da63823996b 100644
>> --- a/drivers/gpu/nova-core/falcon/gsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/gsp.rs
>> @@ -22,11 +22,31 @@ impl FalconEngine for Gsp {
>>  }
>>  
>>  impl Falcon<Gsp> {
>> -    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
>> -    /// allow GSP to signal CPU for processing new messages in message queue.
>> -    pub(crate) fn clear_swgen0_intr(&self, bar: &Bar0) {
>> +    /// Enable the GSP Falcon message queue interrupt (SWGEN0 interrupt).
>> +    #[expect(dead_code)]
>> +    pub(crate) fn enable_msgq_interrupt(&self, bar: &Bar0) {
>> +        regs::NV_PFALCON_FALCON_IRQMASK::alter(bar, &Gsp::ID, |r| r.set_swgen0(true));
>> +    }
>> +
>> +    /// Check if the message queue interrupt is pending.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn has_msgq_interrupt(&self, bar: &Bar0) -> bool {
>> +        regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID).swgen0()
>> +    }
>> +
>> +    /// Clears the message queue interrupt to allow GSP to signal CPU
>> +    /// for processing new messages.
>> +    pub(crate) fn clear_msgq_interrupt(&self, bar: &Bar0) {
>>          regs::NV_PFALCON_FALCON_IRQSCLR::default()
>>              .set_swgen0(true)
>>              .write(bar, &Gsp::ID);
>>      }
>> +
>> +    /// Acknowledge all pending GSP interrupts.
>> +    #[expect(dead_code)]
>> +    pub(crate) fn ack_all_interrupts(&self, bar: &Bar0) {
>> +        // Read status and write the raw value to IRQSCLR to clear all pending interrupts.
>> +        let status = regs::NV_PFALCON_FALCON_IRQSTAT::read(bar, &Gsp::ID);
>> +        regs::NV_PFALCON_FALCON_IRQSCLR::from(u32::from(status)).write(bar, &Gsp::ID);
>> +    }
> 
> I think this can be a bit more generic than that: all interrupts for all
> falcons are handled the same way, so we shouldn't need to write
> `enable`, `clear`, and other methods for each.
> 
> So the first step should probably be a generic `impl<E> Falcon<E>` block
> that provides base methods for specialized blocks to reuse. It could
> work with just the index of the bit corresponding to the interrupt to
> enable/clear, but if we want to involve the type system we could also
> define a `FalconInterrupt` trait with an associated type for the engine
> on which this interrupt is valid, and its bit index as an associated
> const.
> 
> But I suspect that the set of interrupts is going to be pretty standard,
> so maybe we can use the standard nomenclature for the generic methods
> (i.e. SWGEN0), and have dedicated methods for specialized units where
> relevant.

Good point. I'll start with the `impl<E> Falcon<E>` block as it is simple.
I also suspect as you do that the layout should be standard across falcons.
Thanks for the suggestion and reviews.
thanks,

 - Joel



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-22  6:48     ` Alexandre Courbot
@ 2025-10-22 21:09       ` Joel Fernandes
  2025-10-22 23:16         ` John Hubbard
  0 siblings, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 21:09 UTC (permalink / raw)
  To: Alexandre Courbot, John Hubbard, linux-kernel, rust-for-linux,
	dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau



On 10/22/2025 2:48 AM, Alexandre Courbot wrote:
> On Tue Oct 21, 2025 at 7:35 AM JST, John Hubbard wrote:
>> Alex, this ".alter" method is misnamed, IMHO. Because for registers,
>> The One True Way (or so I claim, haha) is to have the following methods:
>>
>>     .read
>>     .modify, also known as RMW (read-modify-write)
>>     .write
>>
>> "alter" never shows up in this naming scheme. I'm going to claim that
>> this is a bit jarring for old hardware/kernel programmers.
>>
>> But it's not too late: these are only used in a very few places, and entirely
>> within nova-core, too.
>>
>> Can I *please* send a patch to rename "alter" to "modify", perhaps?
> 
> Oh yes, although I was just thinking that this should be renamed to
> `update` for consistency with regmap.
> 

Either update or modify would be Ok with me. Update does make it sound more like
a total write though for some reason. Perhaps update_fields ?

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core
  2025-10-22  6:57 ` Alexandre Courbot
@ 2025-10-22 21:30   ` Joel Fernandes
  2025-10-24 11:51     ` Alexandre Courbot
  0 siblings, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 21:30 UTC (permalink / raw)
  To: Alexandre Courbot, linux-kernel, rust-for-linux, dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

Hi Alex,

On 10/22/2025 2:57 AM, Alexandre Courbot wrote:
> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>> These patches have some prerequistes needed for nova-core as support is added
>> for memory management and interrupt handling. I rebased them on drm-rust-next
>> and would like them to be considered for the next merge window. I also included
>> a simple rust documentation patch fixing some issues I noticed while reading it :).
>>
>> The series adds support for the PRAMIN aperture mechanism, which is a
>> prerequisite for virtual memory as it is required to boot strap virtual memory
>> (we cannot write to VRAM using virtual memory because we need to write page
>> tables to VRAM in the first place).
>>
>> I also add page table related structures (mm/types.rs) using the bitfield
>> macro, which will be used for page table walking, memory mapping, etc. This is
>> currently unused code, because without physical memory allocation (using the
>> buddy allocator), we cannot use this code as page table pages need to be
>> allocated in the first place. However, I have included several examples in the
>> file about how these structures will be used. I have also simplified the code
>> keeping future additions to it for later.
>>
>> For interrupts, I only have added additional support for GSP's message queue
>> interrupt. I am working on adding support to the interrupt controller module
>> (VFN) which is the next thing for me to post after this series. I have it
>> prototyped and working, however I am currently making several changes to it
>> related to virtual functions. For now in this series, I just want to get the
>> GSP-specific patch out of the way, hence I am including it here.
>>
>> I also have added a patch for bitfield macro which constructs a bitfield struct
>> given its storage type. This is used in a later GSP interrupt patch in the
>> series to read from one register and write to another.
> 
> So IIUC, this series contains the following:
> 
> 1. Add PRAMIN support,
> 2. Add Page mapping support, albeit this is unexercised until we have a
>    user (e.g. buddy allocator),
> 3. Add Falcon interrupt support,
> 4. Add missing bitfield functionality, albeit not used yet,
> 5. Various documentation patches.
> 
> This is a bit confusing, as there is close to no logical relationship or
> dependency between these patches. I see potential for several different
> submissions here:
> 
> - The core documentation fix, as Miguel pointed out, since it should be
>   merged into the rust tree and not nova-core.
> - The bitfield patch is a useful addition and should be sent separately.
> - PRAMIN/Page mapping should come with code that exercices them, so
>   think they belong as the first patches of a series that ends with
>   basic memory allocation capabilities. But feel free to send a RFC if
>   you want early feedback.

Ah, so if we go with that - the mm patches here (Pramin and page structures
types) will then have to wait till we have DRM buddy bindings or a DRM buddy
written in rust (the latter of which I already have). That would be unfortunate
because it will take more time up in the future to rebase. But RFC is a good
idea too for just getting an advance review. I think we need to carefully choose
RFC versus merging small (even if unused) patches (more below).

> - The falcon interrupts patch does not seem to be used by the last two
>   patches? I guess it belongs to the series that will add support for
>   the interrupt controller.
No, it is independent. Yes this leads up to the interrupt handing feature, but
just to emphasize a bit, the motivation was to "get small patches" in so that we
don't need to do obvious things later (example, the VFN interrupt module is much
more complex than this GSP patch yet both are needed for interrupt handling, so
the GSP patch is a good candidate IMO for upstreaming in the next merge window).
Having small patches merged reduces future burden on both reviewers and the
developers. This is also not something new, for instance we don't have any users
of the PCI MSI IRQ allocation bindings in rust/, yet we merged those. I think
that is reasonable. RFC should be used too when it makes sense, but I think we
should also look into merging things in chunks to avoid future review/rebase
burden. There isn't one rule that fits all is my point, right? I mean just look
at the attempted bitfield move too, Nova is the only user yet we will move it
out. But one may ask why move it out until there are other users? It has to be
on a case-by-case basis..

> - Other documentation patches belong to the series that introduces the
>   feature they document.
Agreed with the splitting suggestions, I will split this series moving forward.
But I may still push for merging small patches where it makes sense, if that's
Ok with everyone.

(I do feel the PRAMIN and GSP patch both do fall under the 'small patch' category)

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-22 10:41   ` Alexandre Courbot
@ 2025-10-22 22:04     ` Joel Fernandes
  2025-10-24 11:39       ` Alexandre Courbot
  0 siblings, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2025-10-22 22:04 UTC (permalink / raw)
  To: Alexandre Courbot, linux-kernel, rust-for-linux, dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

Hi Alex,

On 10/22/2025 6:41 AM, Alexandre Courbot wrote:
> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>> Required for writing page tables directly to VRAM physical memory,
>> before page tables and MMU are setup.
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/nova_core.rs |   1 +
>>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>>  4 files changed, 273 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>>
>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>> new file mode 100644
>> index 000000000000..54c7cd9416a9
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>> @@ -0,0 +1,3 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +pub(crate) mod pramin;
>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>> new file mode 100644
>> index 000000000000..4f4e1b8c0b9b
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>> @@ -0,0 +1,241 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
>> +//! PRAMIN can also write to system memory, however for simplicty we only
> 
> s/simplicty/simplicity

Ok.

>> +//! support VRAM access.
>> +//!
>> +//! # Examples
>> +//!
>> +//! ## Writing u32 data to VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>> +//!     pramin.write::<u32>(0x10000, &data)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +//!
>> +//! ## Reading bytes from VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +
>> +#![expect(dead_code)]
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +use core::mem;
>> +use kernel::prelude::*;
>> +
>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
> 
> This definition looks like it could be an array of registers - that way
> we could use its `BASE` associated constant and keep the hardware
> offsets into the `regs` module.
> 
> Even if we don't use the array of registers for convenience, it is good
> to have it defined in `regs` for consistency.

Ok, I wanted to do that, but I thought since these are registers, it is weird to
move it there.

Also we need byte-level access, register macro is u32. I don't think we should
overload regs.rs just to store magic numbers, these are not registers right? We
have PRAM window configuration registers but that's different.

> 
>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
> 
> You can use `kernel::sizes::SZ_1M` here.

Sure, will do.

>> +
>> +/// Trait for types that can be read/written through PRAMIN.
>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>> +
>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>> +
>> +    fn size_bytes() -> usize {
>> +        mem::size_of::<Self>()
>> +    }
>> +
>> +    fn alignment() -> usize {
>> +        Self::size_bytes()
>> +    }
>> +}
> 
> Since this trait requires `Sized`, you can use `size_of` and `align_of`
> directly, making the `size_bytes` and `alignment` methods redundant.
> Only `write_to_bar` should remain.

Sure, slightly poorer caller-side readability though but its fine with me, I'll
do that.

> I also wonder whether we couldn't get rid of this trait entirely by
> leveragin `FromBytes` and `AsBytes`. Since the size of the type is
> known, we could have read/write methods in Pramin that write its content
> by using Io accessors of decreasing size (first 64-bit, then 32, etc)
> until all the data is written.

Ah great idea, I like this. Though per the other discussion with John on keeping
it simple (not doing bulk I/O operations), maybe we wouldn't need a trait at
all. Let me see.

> 
>> +
>> +/// Macro to implement PraminNum trait for unsigned integer types.
>> +macro_rules! impl_pramin_unsigned_num {
>> +    ($bits:literal) => {
>> +        ::kernel::macros::paste! {
>> +            impl PraminNum for [<u $bits>] {
>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>> +                    bar.[<try_read $bits>](offset)
>> +                }
>> +
>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>> +                    bar.[<try_write $bits>](self, offset)
>> +                }
>> +            }
>> +        }
>> +    };
>> +}
>> +
>> +impl_pramin_unsigned_num!(8);
>> +impl_pramin_unsigned_num!(16);
>> +impl_pramin_unsigned_num!(32);
>> +impl_pramin_unsigned_num!(64);
>> +
>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>> +pub(crate) struct PraminVram<'a> {
> 
> Let's use the shorter name `Pramin` - the limitation to VRAM is a
> reasonable one (since the CPU can access its own system memory), it is
> not necessary to encode it into the name.
Sure, sounds good.

> 
>> +    bar: &'a Bar0,
>> +    saved_window_addr: usize,
>> +}
>> +
>> +impl<'a> PraminVram<'a> {
>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>> +    /// the state is restored when the accessor is dropped.
>> +    ///
>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>> +        let saved_window_addr = Self::get_window_addr(bar);
>> +        Self {
>> +            bar,
>> +            saved_window_addr,
>> +        }
>> +    }
>> +
>> +    /// Set BAR0 window to point to specific FB region.
>> +    ///
>> +    /// # Arguments
>> +    ///
>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
> 
> Let's follow the rust doccomment guidelines for the arguments.

Ok, Sure.
> 
>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>> +        if fb_offset & 0xFFFF != 0 {
>> +            return Err(EINVAL);
>> +        }
> 
> Since this method is private and called from controlled contexts for
> which `fb_offset` should always be valid, we can request callers to
> give us a "window index" (e.g. the `window_base` of the
> `NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
> will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.
> 

The tradeoff being it may complicated callers of the function that deal purely
with addresses instead of window indices.

>> +    ///
>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>> +    /// data that spans multiple 1MB windows.
> 
> Inversely, this large method is under-documented. Understanding what
> `operation` is supposed to do would be helpful.

I will skip these comments for now as we discussed dropping complexity in other
thread, but thanks for the review on this. This function should be likely
dropped in the next iteration.

>> +
>> +    /// Sets the window address from a framebuffer offset.
>> +    /// The fb_offset must be 64KB aligned (lower bits discared).
>> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
>> +        // Calculate window base (bits 39:16 of FB address)
>> +        // The total FB address is 40 bits, mask anything above. Since we are
>> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
>> +        let mask = genmask_u32(0..=23) as usize;
>> +        let window_base = ((fb_offset >> 16) & mask) as u32;
>> +        self.set_window_base(window_base)
>> +    }
>> +}
> 
> If you work directly with `window_base` as suggested above, this impl
> block can be dropped altogether.
But it will complicate callers. That's the tradeoff. I prefer to keep caller
side simple and abstract away complexity. But to your point, this is an internal
API so I can probably code it both ways and see what it looks like.

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts
  2025-10-22 21:09       ` Joel Fernandes
@ 2025-10-22 23:16         ` John Hubbard
  0 siblings, 0 replies; 55+ messages in thread
From: John Hubbard @ 2025-10-22 23:16 UTC (permalink / raw)
  To: Joel Fernandes, Alexandre Courbot, linux-kernel, rust-for-linux,
	dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/22/25 2:09 PM, Joel Fernandes wrote:
> 
> 
> On 10/22/2025 2:48 AM, Alexandre Courbot wrote:
>> On Tue Oct 21, 2025 at 7:35 AM JST, John Hubbard wrote:
>>> Alex, this ".alter" method is misnamed, IMHO. Because for registers,
>>> The One True Way (or so I claim, haha) is to have the following methods:
>>>
>>>      .read
>>>      .modify, also known as RMW (read-modify-write)
>>>      .write
>>>
>>> "alter" never shows up in this naming scheme. I'm going to claim that
>>> this is a bit jarring for old hardware/kernel programmers.
>>>
>>> But it's not too late: these are only used in a very few places, and entirely
>>> within nova-core, too.
>>>
>>> Can I *please* send a patch to rename "alter" to "modify", perhaps?
>>
>> Oh yes, although I was just thinking that this should be renamed to
>> `update` for consistency with regmap.
>>
> 
> Either update or modify would be Ok with me. Update does make it sound more like

"update" works for me, too. Maybe we have a winner. Quick, let's
do it before a 4th engineer shows up! haha

I can send a patch to rename it.

> a total write though for some reason. Perhaps update_fields ?
> 
thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-22 17:48     ` Joel Fernandes
  2025-10-22 20:43       ` Joel Fernandes
@ 2025-10-24 11:31       ` Alexandre Courbot
  1 sibling, 0 replies; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-24 11:31 UTC (permalink / raw)
  To: Joel Fernandes, John Hubbard, linux-kernel, rust-for-linux,
	dri-devel, dakr, acourbot
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On Thu Oct 23, 2025 at 2:48 AM JST, Joel Fernandes wrote:
<snip>
>>> +        Ok(())
>>> +    }
>>> +
>>> +    /// Get current BAR0 window offset.
>>> +    ///
>>> +    /// # Returns
>>> +    ///
>>> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
>>> +    /// This offset is always 64KB aligned.
>>> +    fn get_window_addr(bar: &Bar0) -> usize {
>>> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
>>> +        window_reg.get_window_addr()
>>> +    }
>>> +
>>> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
>>> +    ///
>>> +    /// # Arguments
>>> +    ///
>>> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
>>> +    ///                 Must be aligned to `T::alignment()`.
>>> +    /// * `num_items` - Number of items of type `T` to process.
>>> +    /// * `operation` - Closure called for each item to perform the actual read/write.
>>> +    ///                 Takes two parameters:
>>> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
>>> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>>> +    fn access_vram<T: PraminNum, F>(
>>> +        &self,
>>> +        fb_offset: usize,
>>> +        num_items: usize,
>>> +        mut operation: F,
>>> +    ) -> Result
>> 
>> This is far too much functionality, and the code can be made much smaller
>> and simpler.
>> and still get what we need. Open RM only supplies small accessors
>> (8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
>> necessary.
>
> The code uses a sliding window approach to reposition the moving window,
> abstracting away the details of the moving window from the caller. That
> simplifies the callers a lot as they don't need to "loop" and know when to move
> the window when they hit limits. They can also write to greater than 1MB. The
> bulk of the logic is in this function and the surrounding code is mostly
> wrappers, which part is complicated or that you did not understand?
>
> Just to note also, the PRAMIN moving window functionality in this patch allows
> us to not need BAR2 to access VRAM for instance memory. That is a code
> simplification then as we do not need code for BAR2 (the tradeoff being slightly
> slower instance memory access). I confirmed with the team that this is also an
> option. Abstracting the sliding window functionality becomes important then, so
> I'd not vote for removing this functionality for that reason. And if we ever use
> BAR2, having it is still useful because it allows us to have a fallback too for
> comparison/reference.

Whether we want a sliding window mechanism or not, I think it is
valuable to expose the PRAMIN functionality the way the hardware
supports it (i.e. set base address and work with a fixed slice), and
then build QoL features like the sliding window on top of it, through
e.g. another type that wraps the basic PRAMIN one.

This would make the code easier to read, allow more flexibility for
users (although in the case of PRAMIN we might not really need it), and
matches what Rust does for e.g. `BufReader`, which consumes a basic reader
and provide buffering for it.

>
>> 
>> We should do likewise, and avoid this.
>> 
>> Then we can just create things such as write_u32() or write<u32>(), etc.
>> 
>> And do we even *need* read?? I'm not sure we do.
>
> We do need reads as we walk through page tables structures. Note that the page
> tables are partially allocated by the GSP.
>
>> 
>> This is hopefully showing the value of including the calling code, as
>> a follow-on patch in the series.
>
> Unfortunately, there are too many dependencies as I mentioned in the cover
> letter, so I would like to get functionality merged in stages. That's the
> best way to make good progress IMO for nova-core. Of course, we have to careful
> about design etc and I kept it as simple as possible out of that intention. My
> pramin patch was written 3-4 months ago now, so I'd like to not keep it too
> sitting comfortably in my tree. :). And this patch is critical for mm.

Although we have neglected it lately, we could use our
`nova-core-unstable` staging branch for that - IIRC the goal was also to
keep track of pending patches and make sure they don't bitrot until they
can be sent.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2025-10-22 22:04     ` Joel Fernandes
@ 2025-10-24 11:39       ` Alexandre Courbot
  0 siblings, 0 replies; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-24 11:39 UTC (permalink / raw)
  To: Joel Fernandes, Alexandre Courbot, linux-kernel, rust-for-linux,
	dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Thu Oct 23, 2025 at 7:04 AM JST, Joel Fernandes wrote:
> Hi Alex,
>
> On 10/22/2025 6:41 AM, Alexandre Courbot wrote:
>> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>>> Required for writing page tables directly to VRAM physical memory,
>>> before page tables and MMU are setup.
>>>
>>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>>> ---
>>>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>>>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>>>  drivers/gpu/nova-core/nova_core.rs |   1 +
>>>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>>>  4 files changed, 273 insertions(+), 1 deletion(-)
>>>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>>>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>>>
>>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>>> new file mode 100644
>>> index 000000000000..54c7cd9416a9
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>>> @@ -0,0 +1,3 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +pub(crate) mod pramin;
>>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>>> new file mode 100644
>>> index 000000000000..4f4e1b8c0b9b
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>>> @@ -0,0 +1,241 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
>>> +//! PRAMIN can also write to system memory, however for simplicty we only
>> 
>> s/simplicty/simplicity
>
> Ok.
>
>>> +//! support VRAM access.
>>> +//!
>>> +//! # Examples
>>> +//!
>>> +//! ## Writing u32 data to VRAM
>>> +//!
>>> +//! ```no_run
>>> +//! use crate::driver::Bar0;
>>> +//! use crate::mm::pramin::PraminVram;
>>> +//!
>>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>>> +//!     let pramin = PraminVram::new(bar);
>>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>>> +//!     pramin.write::<u32>(0x10000, &data)?;
>>> +//!     Ok(())
>>> +//! }
>>> +//! ```
>>> +//!
>>> +//! ## Reading bytes from VRAM
>>> +//!
>>> +//! ```no_run
>>> +//! use crate::driver::Bar0;
>>> +//! use crate::mm::pramin::PraminVram;
>>> +//!
>>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>>> +//!     let pramin = PraminVram::new(bar);
>>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>>> +//!     Ok(())
>>> +//! }
>>> +//! ```
>>> +
>>> +#![expect(dead_code)]
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::regs;
>>> +use core::mem;
>>> +use kernel::prelude::*;
>>> +
>>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
>> 
>> This definition looks like it could be an array of registers - that way
>> we could use its `BASE` associated constant and keep the hardware
>> offsets into the `regs` module.
>> 
>> Even if we don't use the array of registers for convenience, it is good
>> to have it defined in `regs` for consistency.
>
> Ok, I wanted to do that, but I thought since these are registers, it is weird to
> move it there.

It's just that it looks like to keep all the layout in the same place.

>
> Also we need byte-level access, register macro is u32. I don't think we should
> overload regs.rs just to store magic numbers, these are not registers right? We
> have PRAM window configuration registers but that's different.

The register macro just gained support for `u8` thanks to your work, I
thought that would have been a good use-case. :)

Also we don't need to use the register accessors for this - the idea was
just to have the definition in `regs.rs`, and use the constant
containing its address instead of the regular register accessors if they
are not fit for this.

I don't feel too strongly about this though.

>
>> 
>>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
>> 
>> You can use `kernel::sizes::SZ_1M` here.
>
> Sure, will do.
>
>>> +
>>> +/// Trait for types that can be read/written through PRAMIN.
>>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>>> +
>>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>>> +
>>> +    fn size_bytes() -> usize {
>>> +        mem::size_of::<Self>()
>>> +    }
>>> +
>>> +    fn alignment() -> usize {
>>> +        Self::size_bytes()
>>> +    }
>>> +}
>> 
>> Since this trait requires `Sized`, you can use `size_of` and `align_of`
>> directly, making the `size_bytes` and `alignment` methods redundant.
>> Only `write_to_bar` should remain.
>
> Sure, slightly poorer caller-side readability though but its fine with me, I'll
> do that.
>
>> I also wonder whether we couldn't get rid of this trait entirely by
>> leveragin `FromBytes` and `AsBytes`. Since the size of the type is
>> known, we could have read/write methods in Pramin that write its content
>> by using Io accessors of decreasing size (first 64-bit, then 32, etc)
>> until all the data is written.
>
> Ah great idea, I like this. Though per the other discussion with John on keeping
> it simple (not doing bulk I/O operations), maybe we wouldn't need a trait at
> all. Let me see.

Even better. :)

>
>> 
>>> +
>>> +/// Macro to implement PraminNum trait for unsigned integer types.
>>> +macro_rules! impl_pramin_unsigned_num {
>>> +    ($bits:literal) => {
>>> +        ::kernel::macros::paste! {
>>> +            impl PraminNum for [<u $bits>] {
>>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>>> +                    bar.[<try_read $bits>](offset)
>>> +                }
>>> +
>>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>>> +                    bar.[<try_write $bits>](self, offset)
>>> +                }
>>> +            }
>>> +        }
>>> +    };
>>> +}
>>> +
>>> +impl_pramin_unsigned_num!(8);
>>> +impl_pramin_unsigned_num!(16);
>>> +impl_pramin_unsigned_num!(32);
>>> +impl_pramin_unsigned_num!(64);
>>> +
>>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>>> +pub(crate) struct PraminVram<'a> {
>> 
>> Let's use the shorter name `Pramin` - the limitation to VRAM is a
>> reasonable one (since the CPU can access its own system memory), it is
>> not necessary to encode it into the name.
> Sure, sounds good.
>
>> 
>>> +    bar: &'a Bar0,
>>> +    saved_window_addr: usize,
>>> +}
>>> +
>>> +impl<'a> PraminVram<'a> {
>>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>>> +    /// the state is restored when the accessor is dropped.
>>> +    ///
>>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>>> +        let saved_window_addr = Self::get_window_addr(bar);
>>> +        Self {
>>> +            bar,
>>> +            saved_window_addr,
>>> +        }
>>> +    }
>>> +
>>> +    /// Set BAR0 window to point to specific FB region.
>>> +    ///
>>> +    /// # Arguments
>>> +    ///
>>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
>> 
>> Let's follow the rust doccomment guidelines for the arguments.
>
> Ok, Sure.
>> 
>>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>>> +        if fb_offset & 0xFFFF != 0 {
>>> +            return Err(EINVAL);
>>> +        }
>> 
>> Since this method is private and called from controlled contexts for
>> which `fb_offset` should always be valid, we can request callers to
>> give us a "window index" (e.g. the `window_base` of the
>> `NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
>> will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.
>> 
>
> The tradeoff being it may complicated callers of the function that deal purely
> with addresses instead of window indices.

Would it though? IIUC the two callers of this method are aligning the
address already, so the internal check is superfluous. And this would
also simplify `NV_PBUS_BAR0_WINDOW`.

>
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>> 
>> Inversely, this large method is under-documented. Understanding what
>> `operation` is supposed to do would be helpful.
>
> I will skip these comments for now as we discussed dropping complexity in other
> thread, but thanks for the review on this. This function should be likely
> dropped in the next iteration.
>
>>> +
>>> +    /// Sets the window address from a framebuffer offset.
>>> +    /// The fb_offset must be 64KB aligned (lower bits discared).
>>> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
>>> +        // Calculate window base (bits 39:16 of FB address)
>>> +        // The total FB address is 40 bits, mask anything above. Since we are
>>> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
>>> +        let mask = genmask_u32(0..=23) as usize;
>>> +        let window_base = ((fb_offset >> 16) & mask) as u32;
>>> +        self.set_window_base(window_base)
>>> +    }
>>> +}
>> 
>> If you work directly with `window_base` as suggested above, this impl
>> block can be dropped altogether.
> But it will complicate callers. That's the tradeoff. I prefer to keep caller
> side simple and abstract away complexity. But to your point, this is an internal
> API so I can probably code it both ways and see what it looks like.

I am not convinced that the callers would require extra complexity. If
it turns out they do, let's weight the cost/benefit of both approaches.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core
  2025-10-22 21:30   ` Joel Fernandes
@ 2025-10-24 11:51     ` Alexandre Courbot
  0 siblings, 0 replies; 55+ messages in thread
From: Alexandre Courbot @ 2025-10-24 11:51 UTC (permalink / raw)
  To: Joel Fernandes, Alexandre Courbot, linux-kernel, rust-for-linux,
	dri-devel, dakr
  Cc: Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Thu Oct 23, 2025 at 6:30 AM JST, Joel Fernandes wrote:
>> - The falcon interrupts patch does not seem to be used by the last two
>>   patches? I guess it belongs to the series that will add support for
>>   the interrupt controller.
> No, it is independent. Yes this leads up to the interrupt handing feature, but
> just to emphasize a bit, the motivation was to "get small patches" in so that we
> don't need to do obvious things later (example, the VFN interrupt module is much
> more complex than this GSP patch yet both are needed for interrupt handling, so
> the GSP patch is a good candidate IMO for upstreaming in the next merge window).
> Having small patches merged reduces future burden on both reviewers and the
> developers. This is also not something new, for instance we don't have any users
> of the PCI MSI IRQ allocation bindings in rust/, yet we merged those. I think
> that is reasonable. RFC should be used too when it makes sense, but I think we
> should also look into merging things in chunks to avoid future review/rebase
> burden. There isn't one rule that fits all is my point, right? I mean just look
> at the attempted bitfield move too, Nova is the only user yet we will move it
> out. But one may ask why move it out until there are other users? It has to be
> on a case-by-case basis..

We do have another user for bitfield/register and that's Tyr - the move
is to allow them to use these macros.

I am also more comfortable merging code when I understand how it is
called and used in practice. It doesn't necessarily need to be fully
complete, but something at least in RFC status demonstrating a real use
of the API helps.

Once a core patch in RFC status is reviewed and agreed on, it can be
added (with all the Reviewed-by tags) to the series containing its user
code, even if the user code comes later. It delays the merging of the
core code a bit, but since it has no user it would be dead merged code
anyway, and when you look at the whole picture it really comes down to
the same - there is no delay to when the machinery starts moving to
produce something useful.

Exceptions can be discussed if e.g. there is a big risk that a
refactoring will wreck everything, but this doesn't appear to be a
factor here.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 21:30   ` Miguel Ojeda
@ 2025-11-03 19:21     ` Joel Fernandes
  2025-11-04 17:54       ` Miguel Ojeda
  2025-11-03 19:29     ` John Hubbard
  1 sibling, 1 reply; 55+ messages in thread
From: Joel Fernandes @ 2025-11-03 19:21 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

Hi Miguel,

On 10/20/2025 5:30 PM, Miguel Ojeda wrote:
> Hi Joel,
> 
> A few nits below (I do sometimes this kind of docs review to try to
> keep a consistent style across all Rust code).

Thanks a lot for these, I studied all of the suggestions and agree with them.
May I also suggest to add some of these suggestions to the kernel rust coding
guidelines document, that way others new to sending rust kernel patches don't
miss it (example not adding a period at the end of a markdown doc header.). But
I will make it a point to check all my patches to make sure it confirms to the
suggestions.

Also a lot of your suggestions are related to how it looks it rustdoc, so I will
try to build rustdoc and see what it looks like as well, to get an idea of when
things in my patches could be improved.

thanks,

 - Joel


> 
> On Mon, Oct 20, 2025 at 8:56 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>> +//!     .set_table_frame_number(new_table.frame_number());
>> +//! // Call a function to write PDE to VRAM address
> 
> Newline between these. Period ad the end.
> 
>> +//! ## Given a PTE, Get or allocate a PFN (page frame number).
> 
> In headers, no period at the end. Also, is "Get" intended to be capitalized?
> 
>> +//!     // Call a function to read 64-bit PTE value from VRAM address
> 
> Period at the end too (more of these elsewhere).
> 
>> +//!     if pte.valid() {
>> +//!         // Return physical frame number from existing mapping
>> +//!         Ok(Pfn::new(pte.frame_number()))
> 
> Early returns where possible, like in C, i.e. to avoid indentation on
> big `else` branches.
> 
>> +/// Memory size constants
>> +pub(crate) const KB: usize = 1024;
>> +pub(crate) const MB: usize = KB * 1024;
> 
> The docs will only apply to the first item, so this probably was meant
> to be a `//` instead of a `///`.
> 
> Or you could use a module to contain these (and then possibly `use`
> them outside), and then you can have docs in the module itself, but
> that is heavier.
> 
>> +/// Page size: 4 KiB
>> +pub(crate) const PAGE_SIZE: usize = 4 * KB;
> 
> `rustdoc` would eventually render the value and the non-evaluated
> expression, and in the source code it already says `4 * KB`, so I
> think repeating the value isn't needed, unless you mean to show it is
> really a multiple of 2.
> 
>> +pub(crate) enum PageTableLevel {
>> +    Pdb, // Level 0 - Page Directory Base
>> +    L1,  // Level 1
>> +    L2,  // Level 2
>> +    L3,  // Level 3 - Dual PDE (128-bit entries)
>> +    L4,  // Level 4 - PTEs
> 
> In this case, I think you meant the other way around, i.e. actual
> docs: `///` instead of `//`.
> 
> (Also, unless there is a particular reason (e.g. it is a big table),
> please generally put comments on top of things, not at the side, which
> matches closer to what is needed for docs.)
> 
>> +    /// Convert an Address to a frame number.
> 
> These should eventually be intra-doc links, but at least please use
> for the moment backticks when referring to Rust items like types etc.
> 
>> +    /// # Example
> 
> We always use the plural for these section headers, even if there is a
> single item (e.g. single example).
> 
>> +    /// ```no_run
>> +    /// let va = VirtualAddress::default();
>> +    /// let pte_idx = va.level_index(PageTableLevel::L4);
>> +    /// ```
> 
> This will need some `use` lines -- not needed now, but just as a
> reminder that these will get actually built eventually.
> 
>> +    /// Get raw u64 value.
> 
> Intra-doc link or at least backticks.
> 
>> +    /// The valid bit is inverted so add an accessor to flip it.
>> +    pub(crate) fn set_valid(&self, value: bool) -> Pde {
> 
> This docs string sounds like a commit message.
> 
>> +/// Dual PDE at Level 3 - 128-bit entry containing both LPT and SPT pointers.
>> +/// Lower 64 bits = big/large page, upper 64 bits = small page.
> 
> It sounds like a few of these details should be on its own paragraph
> to avoid having them in the short description (title).
> 
>> +/// // Call a function to read dual PDE from VRAM address
>> +/// let mut dual_pde: DualPde = read_dual_pde(dpde_addr)?;
>> +/// // Call a function to allocate a page table and return its VRAM address
>> +/// let spt_addr = allocate_page_table()?;
>> +/// dual_pde.set_spt(Pfn::from(spt_addr), AperturePde::VideoMemory);
>> +/// // Call a function to write dual PDE to VRAM address
>> +/// write_dual_pde(dpde_addr, dual_pde)?;
> 
> Newlines before the comments, i.e. between "conceptual blocks".
> 
>> +    pub lpt: Pde, // Large/Big Page Table pointer (2MB pages) - bits 63:0 (lower)
>> +    pub spt: Pde, // Small Page Table pointer (4KB pages) - bits 127:64 (upper)
> 
> Docs instead of comments.
> 
>> +    /// Check if has valid Small Page Table.
> 
> Missing word?
> 
> Thanks!
> 
> Cheers,
> Miguel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-10-20 21:30   ` Miguel Ojeda
  2025-11-03 19:21     ` Joel Fernandes
@ 2025-11-03 19:29     ` John Hubbard
  2025-11-04 17:56       ` Miguel Ojeda
  1 sibling, 1 reply; 55+ messages in thread
From: John Hubbard @ 2025-11-03 19:29 UTC (permalink / raw)
  To: Miguel Ojeda, Joel Fernandes
  Cc: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 10/20/25 2:30 PM, Miguel Ojeda wrote:
> Hi Joel,
> 
> A few nits below (I do sometimes this kind of docs review to try to
> keep a consistent style across all Rust code).
> 
> On Mon, Oct 20, 2025 at 8:56 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>> +//!     .set_table_frame_number(new_table.frame_number());
>> +//! // Call a function to write PDE to VRAM address
> 
> Newline between these. Period ad the end.
> 

Hi Miguel,

As Joel also was hinting at, is there any easy way to get this sort
of thing automatically checked? This is what scripts/checkpatch.pl
helps us out with on the C side, and maybe it is also the right
tool for Rust...?

It's sad to have a patchset pass both checkpatch.pl, and CLIPPY=1 
builds, and yet still be full of typography and convention 
violations.


thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-11-03 19:21     ` Joel Fernandes
@ 2025-11-04 17:54       ` Miguel Ojeda
  2025-11-04 18:18         ` Danilo Krummrich
  0 siblings, 1 reply; 55+ messages in thread
From: Miguel Ojeda @ 2025-11-04 17:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rust-for-linux, dri-devel, dakr, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Mon, Nov 3, 2025 at 8:21 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
> Thanks a lot for these, I studied all of the suggestions and agree with them.
> May I also suggest to add some of these suggestions to the kernel rust coding
> guidelines document, that way others new to sending rust kernel patches don't
> miss it (example not adding a period at the end of a markdown doc header.). But

You're welcome!

I don't think everyone reads the documentation, and one issue is that
the longer it is, the less people may read it. For instance, the note
about using "Examples" as the section name is already explicitly there
and other bits can be inferred from the examples' style.

Now, in 2025, thanks to AI, you actually have a point, in the sense
that I assume people may be able to point a patch to an AI to ask it
to apply the guidelines from such a document.

So a good way forward may be best to have a list of "short
rules/examples" in a separate section or document, where I can easily
add entries with a simple example without too much explanation. Yeah,
I think I will do that.

> Also a lot of your suggestions are related to how it looks it rustdoc, so I will
> try to build rustdoc and see what it looks like as well, to get an idea of when
> things in my patches could be improved.

Definitely, please do!

We all should be doing it, especially so when the changes aren't
trivial (e.g. adding an entire new feature/API).

I have it in the "Subsystem Profile" document from `MAINTAINERS`:

    https://rust-for-linux.com/contributing#submit-checklist-addendum

    "When submitting changes to Rust code documentation, please render
them using the `rustdoc` target and ensure the result looks as
expected."

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-11-03 19:29     ` John Hubbard
@ 2025-11-04 17:56       ` Miguel Ojeda
  2025-11-05  2:25         ` John Hubbard
  0 siblings, 1 reply; 55+ messages in thread
From: Miguel Ojeda @ 2025-11-04 17:56 UTC (permalink / raw)
  To: John Hubbard
  Cc: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot, Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On Mon, Nov 3, 2025 at 8:29 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> As Joel also was hinting at, is there any easy way to get this sort
> of thing automatically checked? This is what scripts/checkpatch.pl
> helps us out with on the C side, and maybe it is also the right
> tool for Rust...?

We have a few patches for that script (including for at least one of
the things above), but lately I have been thinking we may want to have
a different script or tools, ideally written in Rust, to encourage
contributions and reviews and tests and so on.

Moreover, for some of the cases above it is better to put it into
other tooling like `rustdoc`, Clippy, `rustfmt` or even klint,
depending on what it is -- over time I have opened quite a few
suggestions and some were implemented and work great, see e.g.

    https://github.com/Rust-for-Linux/linux/issues/349

If someone wants to help with some of that, of course, please ping me!

I also had a bot I wrote back then when we used GitHub, with quite a
few checks (especially around development process for newcomers to the
kernel, e.g. using the right SoB and tags etc.) which people seemed to
appreciate (to the point of someone mentioning it in a talk... :).

A long time ago I asked about making the bot send messages to the
mailing list when we migrated, but it wasn't OK'd back then. I can try
again, or perhaps it would make sense to make it send messages in
private.

Finally, nowadays, I imagine an LLM could do a reasonable job for some
of these as well, if there is available AI time somewhere (please see
my reply to Joel on that too).

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-11-04 17:54       ` Miguel Ojeda
@ 2025-11-04 18:18         ` Danilo Krummrich
  0 siblings, 0 replies; 55+ messages in thread
From: Danilo Krummrich @ 2025-11-04 18:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Miguel Ojeda, linux-kernel, rust-for-linux, dri-devel, acourbot,
	Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, John Hubbard, Timur Tabi, joel,
	Elle Rhumsaa, Daniel Almeida, nouveau

On Tue Nov 4, 2025 at 6:54 PM CET, Miguel Ojeda wrote:
> On Mon, Nov 3, 2025 at 8:21 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>> Also a lot of your suggestions are related to how it looks it rustdoc, so I will
>> try to build rustdoc and see what it looks like as well, to get an idea of when
>> things in my patches could be improved.

For the drm-rust tree we also have a summary [1] of things committers are
expected to check (including a link to the generic kernel and Rust checklist).

> Definitely, please do!

@Joel: Just be aware that for leaf crates no documentation is built yet, only
the kernel crate is built.

[1] https://drm.pages.freedesktop.org/maintainer-tools/committer/committer-drm-rust.html#submit-checklist

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] nova-core: mm: Add data structures for page table management
  2025-11-04 17:56       ` Miguel Ojeda
@ 2025-11-05  2:25         ` John Hubbard
  0 siblings, 0 replies; 55+ messages in thread
From: John Hubbard @ 2025-11-05  2:25 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Joel Fernandes, linux-kernel, rust-for-linux, dri-devel, dakr,
	acourbot, Alistair Popple, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, bjorn3_gh, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Timur Tabi, joel, Elle Rhumsaa,
	Daniel Almeida, nouveau

On 11/4/25 9:56 AM, Miguel Ojeda wrote:
> On Mon, Nov 3, 2025 at 8:29 PM John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> As Joel also was hinting at, is there any easy way to get this sort
>> of thing automatically checked? This is what scripts/checkpatch.pl
>> helps us out with on the C side, and maybe it is also the right
>> tool for Rust...?
> 
> We have a few patches for that script (including for at least one of
> the things above), but lately I have been thinking we may want to have
> a different script or tools, ideally written in Rust, to encourage
> contributions and reviews and tests and so on.
> 
> Moreover, for some of the cases above it is better to put it into
> other tooling like `rustdoc`, Clippy, `rustfmt` or even klint,

rustfmt sounds like a nice user experience: fixing things up
upon file save then becomes possible.

> depending on what it is -- over time I have opened quite a few
> suggestions and some were implemented and work great, see e.g.
> 
>     https://github.com/Rust-for-Linux/linux/issues/349
> 
> If someone wants to help with some of that, of course, please ping me!
> 
> I also had a bot I wrote back then when we used GitHub, with quite a
> few checks (especially around development process for newcomers to the
> kernel, e.g. using the right SoB and tags etc.) which people seemed to
> appreciate (to the point of someone mentioning it in a talk... :).
> 
> A long time ago I asked about making the bot send messages to the
> mailing list when we migrated, but it wasn't OK'd back then. I can try

I'm grateful for that. I think tooling provides a much happier
work environment: you can run the tools locally (and we can put
than in a submitters-checklist.rst), as opposed to getting an
email after posting.

> again, or perhaps it would make sense to make it send messages in
> private.
> 
> Finally, nowadays, I imagine an LLM could do a reasonable job for some
> of these as well, if there is available AI time somewhere (please see
> my reply to Joel on that too).

Very true. I saw that. Yes, once we know what the AI should be
reading for instructions, could help spot issues.


thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2025-11-05  2:26 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-20 18:55 [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core Joel Fernandes
2025-10-20 18:55 ` [PATCH 1/7] docs: rust: Fix a few grammatical errors Joel Fernandes
2025-10-20 21:21   ` John Hubbard
2025-10-20 21:33   ` Miguel Ojeda
2025-10-20 23:23     ` Joel Fernandes
2025-10-20 18:55 ` [PATCH 2/7] gpu: nova-core: Add support to convert bitfield to underlying type Joel Fernandes
2025-10-20 21:25   ` John Hubbard
2025-10-22  6:25   ` Alexandre Courbot
2025-10-22 17:51     ` Joel Fernandes
2025-10-20 18:55 ` [PATCH 3/7] docs: gpu: nova-core: Document GSP RPC message queue architecture Joel Fernandes
2025-10-20 21:49   ` John Hubbard
2025-10-22  1:43   ` Bagas Sanjaya
2025-10-22 11:16   ` Alexandre Courbot
2025-10-20 18:55 ` [PATCH 4/7] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
2025-10-20 19:36   ` John Hubbard
2025-10-20 19:48     ` Joel Fernandes
2025-10-20 20:42       ` John Hubbard
2025-10-20 20:45         ` Joel Fernandes
2025-10-20 22:08   ` John Hubbard
2025-10-22  2:09   ` Bagas Sanjaya
2025-10-20 18:55 ` [PATCH 5/7] gpu: nova-core: Add support for managing GSP falcon interrupts Joel Fernandes
2025-10-20 22:35   ` John Hubbard
2025-10-21 18:42     ` Joel Fernandes
2025-10-22  6:48     ` Alexandre Courbot
2025-10-22 21:09       ` Joel Fernandes
2025-10-22 23:16         ` John Hubbard
2025-10-22  6:47   ` Alexandre Courbot
2025-10-22 21:05     ` Joel Fernandes
2025-10-20 18:55 ` [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
2025-10-22  2:18   ` John Hubbard
2025-10-22 17:48     ` Joel Fernandes
2025-10-22 20:43       ` Joel Fernandes
2025-10-24 11:31       ` Alexandre Courbot
2025-10-22 10:41   ` Alexandre Courbot
2025-10-22 22:04     ` Joel Fernandes
2025-10-24 11:39       ` Alexandre Courbot
2025-10-20 18:55 ` [PATCH 7/7] nova-core: mm: Add data structures for page table management Joel Fernandes
2025-10-20 20:59   ` John Hubbard
2025-10-21 18:26     ` Joel Fernandes
2025-10-21 20:30       ` John Hubbard
2025-10-21 21:58         ` Joel Fernandes
2025-10-20 21:30   ` Miguel Ojeda
2025-11-03 19:21     ` Joel Fernandes
2025-11-04 17:54       ` Miguel Ojeda
2025-11-04 18:18         ` Danilo Krummrich
2025-11-03 19:29     ` John Hubbard
2025-11-04 17:56       ` Miguel Ojeda
2025-11-05  2:25         ` John Hubbard
2025-10-22 11:21   ` Alexandre Courbot
2025-10-22 19:13     ` Joel Fernandes
2025-10-20 21:20 ` [PATCH 0/7] Pre-requisite patches for mm and irq in nova-core John Hubbard
2025-10-21 18:29   ` Joel Fernandes
2025-10-22  6:57 ` Alexandre Courbot
2025-10-22 21:30   ` Joel Fernandes
2025-10-24 11:51     ` Alexandre Courbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).