* [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test
@ 2026-05-01 20:58 Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout() Joel Fernandes
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

This series adds interrupt controller support to nova-core, and validates the
full interrupt path with a CPU-doorbell self-test. It is based on today's
drm-rust-next tree.

The GPU interrupt controller block (INTR_CTRL)
----------------------------------------------
INTR_CTRL is the GPU's two-level (top/leaf) interrupt controller. It
multiplexes all GSP and engine interrupts onto a single PCI MSI line. We need
it to receive asynchronous messages from GSP. GSP signals the host with a SWGEN0
interrupt (routed via INTR_CTRL) to deliver async RPC messages such as error
reports (XIDs) and other events. INTR_CTRL also routes notifications from GPU
engines, such as work-completion events and MMU faults, to the PCIe MSI.
Detailed documentation of the architecture, with diagrams, is provided in a
separate patch (patch 7/7).
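
As a rough illustration of the two-level structure, here is a standalone Rust
sketch that models the tree purely in software. It is not driver code:
`IntrTreeModel` and its methods are hypothetical names, and the rule that a TOP
bit covers two adjacent leaves and only reflects enabled vectors is taken from
the register comments in patch 3/7.

```rust
/// Number of leaf registers modelled (the Hopper/Blackwell maximum in this series).
const NUM_LEAVES: usize = 16;

/// Software-only model of a two-level (top/leaf) interrupt tree.
/// Hypothetical type for illustration; it does not touch hardware.
pub struct IntrTreeModel {
    pending: [u32; NUM_LEAVES],
    enabled: [u32; NUM_LEAVES],
}

impl IntrTreeModel {
    pub fn new() -> Self {
        Self {
            pending: [0; NUM_LEAVES],
            enabled: [0; NUM_LEAVES],
        }
    }

    /// Unmask one bit (0..=31) of one leaf.
    pub fn enable(&mut self, leaf: usize, bit: u32) {
        self.enabled[leaf] |= 1u32 << bit;
    }

    /// Mark one bit (0..=31) of one leaf as pending, as an interrupt source would.
    pub fn raise(&mut self, leaf: usize, bit: u32) {
        self.pending[leaf] |= 1u32 << bit;
    }

    /// Acknowledge pending bits in a leaf (models write-1-to-clear).
    pub fn ack(&mut self, leaf: usize, mask: u32) {
        self.pending[leaf] &= !mask;
    }

    /// Compute the top-level bitmap: bit N is set when leaf 2N or 2N+1 has
    /// a vector that is both pending and enabled.
    pub fn top(&self) -> u32 {
        let mut top = 0u32;
        for subtree in 0..NUM_LEAVES / 2 {
            let lo = self.pending[2 * subtree] & self.enabled[2 * subtree];
            let hi = self.pending[2 * subtree + 1] & self.enabled[2 * subtree + 1];
            if (lo | hi) != 0 {
                top |= 1u32 << subtree;
            }
        }
        top
    }
}
```

In this model, raising bit 1 of leaf 4 only shows up in `top()` (as bit 2) once
that bit has been enabled, and it disappears again after `ack(4, 1 << 1)`.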

What this series proves
-----------------------
The CPU doorbell self-test (patch 6/7) exercises the full interrupt path
end-to-end. It uses the LEAF_TRIGGER hardware register in INTR_CTRL to trigger
an interrupt, then waits for the IRQ handler to fire. The path under test is:

  LEAF_TRIGGER write
       v
  INTR_CTRL TOP/LEAF goes pending
       v
  GPU fires PCI MSI write
       v
  Host VFIO -> guest IOMMU/IRQ
       v
  Guest Linux IRQ -> nova-core handler
       v
  Handler

The full stack has been end-to-end tested on Ampere GA102 with GPU passthrough.
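
The shape of the test (trigger, wait with a timeout, check that the handler
ran) can be sketched in userspace Rust. This is only an analogy under stated
assumptions: a thread stands in for the IRQ handler, channels stand in for the
MSI delivery path, and `run_simulated_selftest` and its `deliver` flag are
hypothetical names that do not exist in the series.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Simulate a doorbell self-test.
///
/// Returns `Some(irq_count)` if the simulated handler ran before `timeout`,
/// or `None` if nothing arrived. `deliver == false` models a broken path.
pub fn run_simulated_selftest(deliver: bool, timeout: Duration) -> Option<u32> {
    let irq_count = Arc::new(AtomicU32::new(0));
    let (doorbell_tx, doorbell_rx) = mpsc::channel::<u32>();
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let handler_count = Arc::clone(&irq_count);
    let handler = thread::spawn(move || {
        // "Handler": runs once for every vector that is delivered.
        for _vector in doorbell_rx {
            handler_count.fetch_add(1, Ordering::Relaxed);
            let _ = done_tx.send(());
        }
    });

    if deliver {
        // "Trigger": ring the doorbell with the vector number used by the test.
        doorbell_tx.send(129).unwrap();
    }

    // Bounded wait, so a broken path fails the test instead of hanging.
    let fired = done_rx.recv_timeout(timeout).is_ok();

    // Closing the doorbell channel lets the handler thread exit.
    drop(doorbell_tx);
    handler.join().unwrap();

    fired.then(|| irq_count.load(Ordering::Relaxed))
}
```

The bounded wait is the important design point: when the delivery path is
broken, the caller gets a failure result after the timeout rather than blocking
forever.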

What comes next
---------------
This is the first stage. Once this lands, the next step is to read (via RPC)
the interrupt vector table that GSP programs at boot. That table tells us which
INTR_CTRL leaf and bit each engine is wired to, so the driver can route
incoming interrupts to per-engine handlers. Future patches will also add GSP
interrupt support and per-engine ISR dispatch loops.

The git tree with all patches can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (tag: nova-intr-ctrl-v1-20260501b)

Joel Fernandes (7):
  rust: sync: completion: add wait_for_completion_timeout()
  gpu: nova-core: allocate PCI MSI vector during probe
  gpu: nova-core: add interrupt controller register definitions
  gpu: nova-core: add Architecture::is_pre_hopper() helper
  gpu: nova-core: add INTR_CTRL interrupt controller API
  gpu: nova-core: add CPU doorbell IRQ self-test
  gpu: nova-core: document INTR_CTRL interrupt tree

 Documentation/gpu/nova/core/intr-ctrl.rst  | 305 +++++++++++++++++++++
 Documentation/gpu/nova/index.rst           |   1 +
 drivers/gpu/nova-core/Kconfig              |  13 +
 drivers/gpu/nova-core/gpu.rs               |  22 ++
 drivers/gpu/nova-core/irq.rs               |  29 ++
 drivers/gpu/nova-core/irq/doorbell_test.rs | 203 ++++++++++++++
 drivers/gpu/nova-core/irq/intr_ctrl.rs     | 281 +++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs         |   2 +
 drivers/gpu/nova-core/regs.rs              |  13 +
 rust/kernel/sync/completion.rs             |  18 +-
 10 files changed, 886 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/gpu/nova/core/intr-ctrl.rst
 create mode 100644 drivers/gpu/nova-core/irq.rs
 create mode 100644 drivers/gpu/nova-core/irq/doorbell_test.rs
 create mode 100644 drivers/gpu/nova-core/irq/intr_ctrl.rs


base-commit: 610e892bdb57043c7769982c2bff0260b6007b75
-- 
2.34.1



* [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout()
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  2026-05-05 12:17   ` Miguel Ojeda
  2026-05-01 20:58 ` [PATCH v1 2/7] gpu: nova-core: allocate PCI MSI vector during probe Joel Fernandes
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Add a timeout variant of wait_for_completion() that wraps the C
function wait_for_completion_timeout(). Returns true if the task
completed before the timeout, else false.

The timeout is specified in jiffies. This is needed by drivers
that perform interrupt self-tests during probe, where an indefinite
wait would hang the system if the interrupt path is broken.
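
For readers unfamiliar with the semantics, here is a userspace analogy in
standalone Rust. It is a sketch, not the kernel implementation: `UserCompletion`
is a hypothetical type built on `std::sync::Condvar`, and it takes a `Duration`
where the kernel method takes jiffies.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Userspace stand-in for a completion (hypothetical, for illustration only).
pub struct UserCompletion {
    done: Mutex<bool>,
    cond: Condvar,
}

impl UserCompletion {
    pub fn new() -> Self {
        Self {
            done: Mutex::new(false),
            cond: Condvar::new(),
        }
    }

    /// Mark the task as completed and wake every waiter.
    pub fn complete_all(&self) {
        *self.done.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Wait until completed or until `timeout` elapses.
    ///
    /// Returns `true` if the completion was signalled in time, `false` on
    /// timeout, mirroring the boolean contract described above.
    pub fn wait_timeout(&self, timeout: Duration) -> bool {
        let guard = self.done.lock().unwrap();
        // Keep waiting while the flag is still false; spurious wakeups are
        // handled by `wait_timeout_while` re-checking the condition.
        let (guard, _result) = self
            .cond
            .wait_timeout_while(guard, timeout, |done| !*done)
            .unwrap();
        *guard
    }
}
```

A probe-time self-test would call the timed wait and treat `false` as a test
failure instead of blocking indefinitely.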

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 rust/kernel/sync/completion.rs | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/sync/completion.rs b/rust/kernel/sync/completion.rs
index c50012a940a3..a2f46862365e 100644
--- a/rust/kernel/sync/completion.rs
+++ b/rust/kernel/sync/completion.rs
@@ -6,7 +6,12 @@
 //!
 //! C header: [`include/linux/completion.h`](srctree/include/linux/completion.h)
 
-use crate::{bindings, prelude::*, types::Opaque};
+use crate::{
+    bindings,
+    prelude::*,
+    time::Jiffies,
+    types::Opaque, //
+};
 
 /// Synchronization primitive to signal when a certain task has been completed.
 ///
@@ -109,4 +114,15 @@ pub fn wait_for_completion(&self) {
         // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
         unsafe { bindings::wait_for_completion(self.as_raw()) };
     }
+
+    /// Wait for completion of a task with a timeout.
+    ///
+    /// This method waits for the completion of a task; it is not interruptible but has a timeout.
+    /// Returns `true` if the task completed before the timeout, `false` if the timeout elapsed.
+    ///
+    /// The timeout is specified in jiffies. See also [`Completion::complete_all`].
+    pub fn wait_for_completion_timeout(&self, timeout: Jiffies) -> bool {
+        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
+        unsafe { bindings::wait_for_completion_timeout(self.as_raw(), timeout) != 0 }
+    }
 }

base-commit: 610e892bdb57043c7769982c2bff0260b6007b75
-- 
2.34.1



* [PATCH v1 2/7] gpu: nova-core: allocate PCI MSI vector during probe
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout() Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 3/7] gpu: nova-core: add interrupt controller register definitions Joel Fernandes
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Allocate a single PCI MSI interrupt vector in the probe path.

Try MSI/MSI-X first. If that fails (possible in broken VFIO setups),
fall back to INTx with a dev_warn so the issue is visible in dmesg.
The allocation is devres-managed and automatically freed on unbind.
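
The fallback logic can be sketched independently of the kernel API. In the
standalone Rust below, `IrqKind`, `alloc_with_fallback`, and the `try_alloc`
closure are hypothetical stand-ins for the PCI allocator; only the ordering
(MSI/MSI-X first, then INTx with a warning) comes from this patch.

```rust
/// Interrupt delivery mechanisms, named after the PCI terms in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IrqKind {
    Msi,
    MsiX,
    Intx,
}

/// Try MSI/MSI-X first and fall back to INTx if that fails.
///
/// `try_alloc` is a hypothetical stand-in for the real allocator: it receives
/// the permitted kinds and returns a vector number or an error string.
/// On success the result is `(vector, fell_back)`.
pub fn alloc_with_fallback<F>(mut try_alloc: F) -> Result<(u32, bool), String>
where
    F: FnMut(&[IrqKind]) -> Result<u32, String>,
{
    match try_alloc(&[IrqKind::Msi, IrqKind::MsiX]) {
        Ok(vector) => Ok((vector, false)),
        Err(err) => {
            // Make the degraded mode visible, as the driver does with dev_warn.
            eprintln!("MSI not available ({}), falling back to INTx", err);
            try_alloc(&[IrqKind::Intx]).map(|vector| (vector, true))
        }
    }
}
```

If the INTx attempt also fails, its error is propagated to the caller, which
matches the `?` on the second allocation in the patch.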

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs       |  7 +++++++
 drivers/gpu/nova-core/irq.rs       | 25 +++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |  1 +
 3 files changed, 33 insertions(+)
 create mode 100644 drivers/gpu/nova-core/irq.rs

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 659f6a24ee13..3ac9cb106bfd 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -11,6 +11,8 @@
     sync::Arc, //
 };
 
+use crate::irq;
+
 use crate::{
     bounded_enum,
     driver::Bar0,
@@ -293,6 +295,11 @@ pub(crate) fn new<'a>(
 
             _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
 
+            // Allocate a PCI interrupt vector.
+            _: {
+                let _irq_vector = irq::alloc_vector(pdev)?;
+            },
+
             bar: devres_bar,
         })
     }
diff --git a/drivers/gpu/nova-core/irq.rs b/drivers/gpu/nova-core/irq.rs
new file mode 100644
index 000000000000..3a2a40519f11
--- /dev/null
+++ b/drivers/gpu/nova-core/irq.rs
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::{
+    device::Bound,
+    pci::{
+        self,
+        IrqType,
+        IrqTypes, //
+    },
+    prelude::*,
+};
+
+pub(crate) fn alloc_vector(pdev: &pci::Device<Bound>) -> Result<pci::IrqVector<'_>> {
+    let msi_types = IrqTypes::default().with(IrqType::Msi).with(IrqType::MsiX);
+
+    let irq_vectors = match pdev.alloc_irq_vectors(1, 1, msi_types) {
+        Ok(vecs) => vecs,
+        Err(_) => {
+            dev_warn!(pdev.as_ref(), "MSI not available, falling back to INTx\n");
+            pdev.alloc_irq_vectors(1, 1, IrqTypes::default().with(IrqType::Intx))?
+        }
+    };
+
+    Ok(*irq_vectors.start())
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 3a609f6937e4..837aa2d36a0e 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -19,6 +19,7 @@
 mod firmware;
 mod gpu;
 mod gsp;
+mod irq;
 #[macro_use]
 mod num;
 mod regs;
-- 
2.34.1



* [PATCH v1 3/7] gpu: nova-core: add interrupt controller register definitions
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout() Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 2/7] gpu: nova-core: allocate PCI MSI vector during probe Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 4/7] gpu: nova-core: add Architecture::is_pre_hopper() helper Joel Fernandes
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Define the interrupt controller register layout in regs.rs.

The runtime leaf count is architecture-dependent. Leaf arrays have 16
entries, covering the Hopper/Blackwell maximum. Pre-Hopper chipsets
(Turing/Ampere/Ada) only use indices 0-7.
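
To make the indexing concrete, the standalone Rust sketch below maps a vector
number to its leaf, bit, and subtree. `VectorLocation` and `locate` are
hypothetical helpers, not part of the patch. The 32-vectors-per-leaf and
two-leaves-per-subtree layout follows the register comments in this patch, and
vector 129 is the CPU doorbell vector used by patch 6/7.

```rust
/// Each leaf register holds one bit per vector.
const VECTORS_PER_LEAF: u32 = 32;
/// Each top-level bit covers two adjacent leaves.
const LEAVES_PER_SUBTREE: u32 = 2;

/// Where a vector lives in the interrupt tree (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct VectorLocation {
    pub leaf: u32,
    pub bit: u32,
    pub subtree: u32,
}

/// Locate `vector`, or return `None` if it falls outside the leaves that the
/// architecture actually uses (8 pre-Hopper, 16 on Hopper and later).
pub fn locate(vector: u32, active_leaves: u32) -> Option<VectorLocation> {
    let leaf = vector / VECTORS_PER_LEAF;
    if leaf >= active_leaves {
        return None;
    }
    Some(VectorLocation {
        leaf,
        bit: vector % VECTORS_PER_LEAF,
        subtree: leaf / LEAVES_PER_SUBTREE,
    })
}
```

With 8 active leaves, `locate(129, 8)` yields leaf 4, bit 1, subtree 2, while a
vector such as 300 (leaf 9) is rejected until 16 leaves are available.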

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/regs.rs | 47 +++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 6faeed73901d..51dff318acf1 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -284,6 +284,53 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
     }
 }
 
+// INTR_CTRL block.
+// Definition of the PF interrupt tree of the GPU interrupt controller. The PF
+// can also view VF interrupt trees, but that is not supported right now.
+
+register! {
+    /// Per-leaf pending interrupt bitmap. Bit N is set when vector
+    /// `(leaf * 32) + N` is pending. Reading returns the current pending bitmap;
+    /// writing acknowledges set bits (write-1-to-clear). 16 leaves cover the
+    /// Hopper/Blackwell+ maximum of 512 vectors; earlier architectures
+    /// (Turing/Ampere/Ada) only use indices 0..=7.
+    pub(crate) NV_VF_INTR_LEAF(u32)[16] @ 0x00b81000 {}
+
+    /// Per-leaf interrupt enable set ("allow"). Writing a 1 to bit N enables
+    /// vector `(leaf * 32) + N` (write-1-to-set; writing 0 has no effect).
+    /// Used to unmask interrupts from a specific source.
+    pub(crate) NV_VF_INTR_LEAF_EN_SET(u32)[16] @ 0x00b81200 {}
+
+    /// Per-leaf interrupt enable clear ("block"). Writing a 1 to bit N disables
+    /// vector `(leaf * 32) + N` (write-1-to-clear; writing 0 has no effect).
+    /// Used to mask interrupts from a specific source.
+    pub(crate) NV_VF_INTR_LEAF_EN_CLEAR(u32)[16] @ 0x00b81400 {}
+
+    /// Top-level pending bitmap. Bit N is set if any enabled vector in subtree
+    /// N is pending. Each subtree covers two consecutive leaves
+    /// (subtree N = leaves 2N and 2N+1). The top bit clears automatically once
+    /// every pending vector in the subtree has been acknowledged via the
+    /// corresponding LEAF register.
+    pub(crate) NV_VF_INTR_TOP(u32) @ 0x00b81600 {}
+
+    /// Top-level enable set ("rearm"). Writing a 1 to bit N enables MSI
+    /// delivery for subtree N (write-1-to-set). The ISR writes the active
+    /// subtree mask here after servicing all pending leaves to resume MSI
+    /// generation.
+    pub(crate) NV_VF_INTR_TOP_EN_SET(u32) @ 0x00b81608 {}
+
+    /// Top-level enable clear ("unarm"). Writing a 1 to bit N disables MSI
+    /// delivery for subtree N (write-1-to-clear). The ISR writes the active
+    /// subtree mask here on entry to mask further MSI writes while servicing
+    /// the pending leaves.
+    pub(crate) NV_VF_INTR_TOP_EN_CLEAR(u32) @ 0x00b81610 {}
+
+    /// Synthetic interrupt trigger. Writing a vector number sets that vector's
+    /// LEAF bit as if the corresponding hardware source had asserted, allowing
+    /// software to inject interrupts. Used by the CPU doorbell self-test.
+    pub(crate) NV_VF_INTR_LEAF_TRIGGER(u32) @ 0x00b81640 {}
+}
+
 // PFALCON
 
 register! {
-- 
2.34.1



* [PATCH v1 4/7] gpu: nova-core: add Architecture::is_pre_hopper() helper
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
                   ` (2 preceding siblings ...)
  2026-05-01 20:58 ` [PATCH v1 3/7] gpu: nova-core: add interrupt controller register definitions Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 5/7] gpu: nova-core: add INTR_CTRL interrupt controller API Joel Fernandes
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Add a helper method to the Architecture enum that returns true for Turing,
Ampere, and Ada -- the GPU generations that predate Hopper.

The interrupt controller uses this to determine the number of active
interrupt leaves at construction time.
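
As a sketch of how the helper feeds into the controller setup, the standalone
Rust below derives the leaf count and the top-level subtree mask from the
architecture. `Arch`, `active_leaves`, and `subtree_mask` are hypothetical
stand-ins; the 8/16 leaf counts and the resulting 0x0f/0xff masks match the
values used in patch 5/7.

```rust
/// Stand-in for the driver's architecture enum (hypothetical, illustration only).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Arch {
    Turing,
    Ampere,
    Ada,
    Hopper,
    Blackwell,
}

impl Arch {
    /// Returns `true` for architectures that predate Hopper.
    pub fn is_pre_hopper(self) -> bool {
        matches!(self, Arch::Turing | Arch::Ampere | Arch::Ada)
    }

    /// Number of interrupt leaves in use: 8 before Hopper, 16 from Hopper on.
    pub fn active_leaves(self) -> u32 {
        if self.is_pre_hopper() {
            8
        } else {
            16
        }
    }

    /// One bit per subtree, where each subtree covers two leaves.
    pub fn subtree_mask(self) -> u32 {
        (1u32 << (self.active_leaves() / 2)) - 1
    }
}
```

Deriving the mask from the leaf count keeps the two values consistent; the
patch series instead hardcodes the two masks, which is equally valid for two
cases.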

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 3ac9cb106bfd..3b45bce6738b 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -163,6 +163,13 @@ pub(crate) enum Architecture with TryFrom<Bounded<u32, 6>> {
     }
 }
 
+impl Architecture {
+    /// Returns `true` for GPU architectures that predate Hopper.
+    pub(crate) fn is_pre_hopper(self) -> bool {
+        matches!(self, Self::Turing | Self::Ampere | Self::Ada)
+    }
+}
+
 #[derive(Clone, Copy)]
 pub(crate) struct Revision {
     major: Bounded<u8, 4>,
-- 
2.34.1



* [PATCH v1 5/7] gpu: nova-core: add INTR_CTRL interrupt controller API
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
                   ` (3 preceding siblings ...)
  2026-05-01 20:58 ` [PATCH v1 4/7] gpu: nova-core: add Architecture::is_pre_hopper() helper Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 6/7] gpu: nova-core: add CPU doorbell IRQ self-test Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 7/7] gpu: nova-core: document INTR_CTRL interrupt tree Joel Fernandes
  6 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Create the irq/ module with a type-state INTR_CTRL interrupt
controller API. The IntrCtrl struct provides factory methods for Top
and Leaf objects that use a sealed State trait with Idle, Unarmed, and
Pending types to enforce correct usage at compile time.

The type-state pattern ensures ack() is only callable after
read_pending() has cached the hardware state, preventing mismatched
masks at compile time.
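
For reviewers new to the pattern, here is a stripped-down, standalone Rust
sketch of the same idea. It is an illustration under assumptions, not the
driver code: `FakeRegs` is a hypothetical in-memory stand-in for the MMIO
registers, and the sealed trait is omitted for brevity.

```rust
use std::cell::Cell;

/// In-memory stand-in for the TOP and TOP_EN registers (hypothetical).
pub struct FakeRegs {
    top_en: Cell<u32>,
    top: Cell<u32>,
}

impl FakeRegs {
    pub fn new(top_en: u32, top: u32) -> Self {
        Self {
            top_en: Cell::new(top_en),
            top: Cell::new(top),
        }
    }

    pub fn top_en(&self) -> u32 {
        self.top_en.get()
    }
}

/// State markers; only `Pending` carries data.
pub struct Idle;
pub struct Unarmed;
pub struct Pending {
    mask: u32,
}

/// Handle whose available methods depend on the state parameter `S`.
pub struct Top<S> {
    subtree_mask: u32,
    state: S,
}

impl Top<Idle> {
    pub fn new(subtree_mask: u32) -> Self {
        Top { subtree_mask, state: Idle }
    }

    /// Mask delivery; consumes the idle handle.
    pub fn unarm(self, regs: &FakeRegs) -> Top<Unarmed> {
        regs.top_en.set(regs.top_en.get() & !self.subtree_mask);
        Top { subtree_mask: self.subtree_mask, state: Unarmed }
    }
}

impl Top<Unarmed> {
    /// Snapshot the pending bitmap; only reachable after `unarm()`.
    pub fn read_pending(self, regs: &FakeRegs) -> Top<Pending> {
        let mask = regs.top.get();
        Top { subtree_mask: self.subtree_mask, state: Pending { mask } }
    }
}

impl Top<Pending> {
    pub fn mask(&self) -> u32 {
        self.state.mask
    }

    /// Re-enable delivery; consuming `self` retires the snapshot.
    pub fn rearm(self, regs: &FakeRegs) {
        regs.top_en.set(regs.top_en.get() | self.subtree_mask);
    }
}
```

Because each transition consumes `self`, calling `rearm()` on a handle that has
not gone through `unarm()` and `read_pending()` does not compile, and a stale
snapshot cannot be reused after rearming.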

The later CPU doorbell self-test will make use of it.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/irq.rs           |   2 +
 drivers/gpu/nova-core/irq/intr_ctrl.rs | 281 +++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs     |   1 +
 3 files changed, 284 insertions(+)
 create mode 100644 drivers/gpu/nova-core/irq/intr_ctrl.rs

diff --git a/drivers/gpu/nova-core/irq.rs b/drivers/gpu/nova-core/irq.rs
index 3a2a40519f11..01ae638bf494 100644
--- a/drivers/gpu/nova-core/irq.rs
+++ b/drivers/gpu/nova-core/irq.rs
@@ -10,6 +10,8 @@
     prelude::*,
 };
 
+mod intr_ctrl;
+
 pub(crate) fn alloc_vector(pdev: &pci::Device<Bound>) -> Result<pci::IrqVector<'_>> {
     let msi_types = IrqTypes::default().with(IrqType::Msi).with(IrqType::MsiX);
 
diff --git a/drivers/gpu/nova-core/irq/intr_ctrl.rs b/drivers/gpu/nova-core/irq/intr_ctrl.rs
new file mode 100644
index 000000000000..dde77cc1f42f
--- /dev/null
+++ b/drivers/gpu/nova-core/irq/intr_ctrl.rs
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! GPU interrupt controller support (INTR_CTRL).
+//!
+//! Each PCIe function (PF and each VF, also known as a GFID) has its own
+//! interrupt tree. In this module, we only interact with the PF tree.
+//! The VF interacts with its own tree (which appears as a PF tree to it).
+//!
+//! See `Documentation/gpu/nova/core/intr-ctrl.rst` for detailed documentation
+//! of the INTR_CTRL architecture.
+
+use kernel::{
+    io::{
+        register::Array,
+        Io, //
+    },
+    num::Bounded,
+};
+
+use crate::{driver::Bar0, gpu::Chipset, regs};
+
+/// Type alias for a leaf interrupt index, bounded to valid values 0-15.
+pub(super) type LeafIndex = Bounded<usize, 4>;
+
+// Type-state for `Top` and `Leaf`.
+//
+// `Top` follows Idle -> Unarmed -> Pending -> consumed (rearmed).
+// `Leaf` follows Idle -> Pending.
+//
+// This catches issues at compile time where we perform an operation
+// on an object in the wrong state (example, rearming `Top` without reading
+// pending bits first).
+/// Sealed trait representing the interrupt controller state.
+pub(super) trait State: private::Sealed {}
+
+/// Idle state: TOP_EN may or may not be armed; no snapshot held.
+pub(super) struct Idle;
+impl State for Idle {}
+
+/// Unarmed state: TOP_EN was just cleared by this Top handle, snapshot not yet read.
+pub(super) struct Unarmed;
+impl State for Unarmed {}
+
+/// Pending state: interrupt mask has been read from hardware.
+pub(super) struct Pending {
+    mask: u32,
+}
+impl State for Pending {}
+
+mod private {
+    pub(in crate::irq) trait Sealed {}
+    impl Sealed for super::Idle {}
+    impl Sealed for super::Unarmed {}
+    impl Sealed for super::Pending {}
+}
+
+/// Interrupt controller for a single PCIe function's interrupt tree.
+#[derive(Clone)]
+pub(super) struct IntrCtrl {
+    subtree_mask: u8,
+}
+
+impl IntrCtrl {
+    /// Create an `IntrCtrl` configured for the given chipset's interrupt tree width.
+    pub(super) fn new(chipset: Chipset) -> Self {
+        // Each TOP bit covers 2 leaves; subtree_mask has one bit per subtree.
+        //   Pre-Hopper:  8 leaves / 2 = 4 subtrees -> 0x0f (bits [3:0])
+        //   Hopper+:    16 leaves / 2 = 8 subtrees -> 0xff (bits [7:0])
+        Self {
+            subtree_mask: if chipset.arch().is_pre_hopper() {
+                0xf
+            } else {
+                0xff
+            },
+        }
+    }
+
+    /// Return a [`Top`] handle in the [`Idle`] state for this controller.
+    pub(super) fn top(&self) -> Top<Idle> {
+        Top {
+            subtree_mask: self.subtree_mask,
+            state: Idle,
+        }
+    }
+
+    /// Return a [`Leaf`] handle in the [`Idle`] state for the given leaf index.
+    pub(super) fn leaf(&self, index: LeafIndex) -> Leaf<Idle> {
+        Leaf::from_index(index)
+    }
+
+    /// Trigger a CPU doorbell interrupt for the given MSI vector number.
+    pub(super) fn trigger(&self, bar: &Bar0, vector: u32) {
+        bar.write(regs::NV_VF_INTR_LEAF_TRIGGER, vector.into());
+    }
+
+    /// Drain any pending interrupts on this controller.
+    ///
+    /// Walks all enabled subtrees, reads each leaf's pending mask, and acks
+    /// any pending bits. Useful for clearing stale interrupt state, e.g.,
+    /// state leftover when GSP booted.
+    pub(super) fn drain(&self, bar: &Bar0) {
+        let top = self.top().unarm(bar).read_pending(bar);
+
+        for subtree in top.iter_subtrees() {
+            for leaf in subtree.iter_pending_leaves(self, bar) {
+                leaf.ack(bar);
+            }
+        }
+
+        top.rearm(bar);
+    }
+}
+
+/// Top-level interrupt controller view.
+pub(super) struct Top<S: State = Idle> {
+    subtree_mask: u8,
+    state: S,
+}
+
+impl Top<Idle> {
+    /// Arm the controller (write TOP_EN_SET). Use for one-shot initial
+    /// setup before any interrupts are expected. The ISR's normal
+    /// re-arm path goes through `unarm()` -> `read_pending()` ->
+    /// `Top<Pending>::rearm()` instead.
+    pub(super) fn arm(self, bar: &Bar0) {
+        bar.write(
+            regs::NV_VF_INTR_TOP_EN_SET,
+            u32::from(self.subtree_mask).into(),
+        );
+    }
+
+    /// Unarm the controller (write TOP_EN_CLEAR). MSI is edge-triggered,
+    /// so this stops the GPU from firing redundant MSI writes over PCIe
+    /// while the host drains the tree. Consumes self and transitions to
+    /// `Top<Unarmed>`, which can then `read_pending()`.
+    pub(super) fn unarm(self, bar: &Bar0) -> Top<Unarmed> {
+        bar.write(
+            regs::NV_VF_INTR_TOP_EN_CLEAR,
+            u32::from(self.subtree_mask).into(),
+        );
+        Top {
+            subtree_mask: self.subtree_mask,
+            state: Unarmed,
+        }
+    }
+}
+
+impl Top<Unarmed> {
+    /// Read the TOP register's pending bitmask. Consumes self and
+    /// returns a `Top<Pending>` carrying the snapshot.
+    pub(super) fn read_pending(self, bar: &Bar0) -> Top<Pending> {
+        let mask = bar.read(regs::NV_VF_INTR_TOP).into_raw();
+        Top {
+            subtree_mask: self.subtree_mask,
+            state: Pending { mask },
+        }
+    }
+}
+
+/// One subtree in the INTR_TOP pending mask (covers two adjacent leaf indices).
+#[derive(Clone, Copy)]
+pub(super) struct Subtree {
+    index: usize,
+}
+
+impl Subtree {
+    /// Yields the two [`Leaf`] slots covered by this subtree's TOP bit.
+    fn iter_leaves<'a>(
+        self,
+        ctrl: &'a IntrCtrl,
+    ) -> impl Iterator<Item = Leaf<Idle>> + 'a {
+        // Each subtree covers two adjacent leaf indices for all architectures.
+        (0..2usize).filter_map(move |offset| {
+            // self.index is 0-31 and offset is 0-1, so idx is at most 63.
+            let idx = self.index * 2 + offset;
+            LeafIndex::try_new(idx).map(|idx| ctrl.leaf(idx))
+        })
+    }
+
+    /// Like [`Self::iter_leaves`], but keeps only leaves with a non-zero pending mask.
+    pub(super) fn iter_pending_leaves<'a>(
+        self,
+        ctrl: &'a IntrCtrl,
+        bar: &'a Bar0,
+    ) -> impl Iterator<Item = Leaf<Pending>> + 'a {
+        self.iter_leaves(ctrl).filter_map(move |idle| {
+            let pending = idle.read_pending(bar);
+            (pending.mask() != 0).then_some(pending)
+        })
+    }
+}
+
+impl Top<Pending> {
+    /// Return the raw TOP pending bitmask snapshot.
+    pub(super) fn mask(&self) -> u32 {
+        self.state.mask
+    }
+
+    /// Iterate over all subtrees with a pending TOP bit set in the snapshot.
+    pub(super) fn iter_subtrees(&self) -> impl Iterator<Item = Subtree> + '_ {
+        (0..32usize)
+            .filter(move |&bit| self.state.mask & (1u32 << bit) != 0)
+            .map(|index| Subtree { index })
+    }
+
+    /// Re-arm the controller (write TOP_EN_SET). Consumes self so the
+    /// pending snapshot cannot be consulted or re-iterated afterwards.
+    pub(super) fn rearm(self, bar: &Bar0) {
+        bar.write(
+            regs::NV_VF_INTR_TOP_EN_SET,
+            u32::from(self.subtree_mask).into(),
+        );
+    }
+}
+
+/// Leaf interrupt controller view for one interrupt leaf.
+pub(super) struct Leaf<S: State = Idle> {
+    index: LeafIndex,
+    state: S,
+}
+
+impl<Left: State, Right: State> PartialEq<Leaf<Right>> for Leaf<Left> {
+    fn eq(&self, other: &Leaf<Right>) -> bool {
+        self.index == other.index
+    }
+}
+
+impl<S: State> Eq for Leaf<S> {}
+
+// All `try_at().unwrap()` calls below are safe: `LeafIndex` is `Bounded<usize, 4>`,
+// guaranteeing values 0-15, and all INTR_CTRL leaf register arrays have 16 elements.
+impl Leaf<Idle> {
+    /// Construct a [`Leaf`] handle for the given leaf index.
+    pub(super) fn from_index(index: LeafIndex) -> Self {
+        Leaf { index, state: Idle }
+    }
+
+    /// Enable the bits in `mask` in this leaf's EN_SET register.
+    pub(super) fn allow(&self, bar: &Bar0, mask: u32) {
+        bar.write(
+            regs::NV_VF_INTR_LEAF_EN_SET::try_at(self.index.get()).unwrap(),
+            mask.into(),
+        );
+    }
+
+    /// Disable the bits in `mask` in this leaf's EN_CLEAR register.
+    pub(super) fn block(&self, bar: &Bar0, mask: u32) {
+        bar.write(
+            regs::NV_VF_INTR_LEAF_EN_CLEAR::try_at(self.index.get()).unwrap(),
+            mask.into(),
+        );
+    }
+
+    /// Read this leaf's pending interrupt mask and transition to [`Pending`].
+    pub(super) fn read_pending(self, bar: &Bar0) -> Leaf<Pending> {
+        let mask = bar
+            .read(regs::NV_VF_INTR_LEAF::try_at(self.index.get()).unwrap())
+            .into_raw();
+        Leaf {
+            index: self.index,
+            state: Pending { mask },
+        }
+    }
+}
+
+impl Leaf<Pending> {
+    /// Return the raw pending interrupt bitmask read from hardware.
+    pub(super) fn mask(&self) -> u32 {
+        self.state.mask
+    }
+
+    /// Acknowledge all pending bits by writing the mask back to the leaf register.
+    pub(super) fn ack(&self, bar: &Bar0) {
+        if self.state.mask != 0 {
+            bar.write(
+                regs::NV_VF_INTR_LEAF::try_at(self.index.get()).unwrap(),
+                self.state.mask.into(),
+            );
+        }
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 837aa2d36a0e..6d0e4b2f53c7 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -19,6 +19,7 @@
 mod firmware;
 mod gpu;
 mod gsp;
+#[expect(dead_code)]
 mod irq;
 #[macro_use]
 mod num;
-- 
2.34.1



* [PATCH v1 6/7] gpu: nova-core: add CPU doorbell IRQ self-test
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
                   ` (4 preceding siblings ...)
  2026-05-01 20:58 ` [PATCH v1 5/7] gpu: nova-core: add INTR_CTRL interrupt controller API Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  2026-05-01 20:58 ` [PATCH v1 7/7] gpu: nova-core: document INTR_CTRL interrupt tree Joel Fernandes
  6 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Add a CPU doorbell interrupt self-test that runs during probe, after GSP
boot. The test validates the full MSI interrupt path from GPU through
PCIe to the CPU interrupt handler.

Tested with qemu + GPU passthrough on GA102, with dmesg as follows:
  NovaCore 0000:00:06.0: CPU doorbell self-test: PASS (irq_count=1)

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig              |  13 ++
 drivers/gpu/nova-core/gpu.rs               |   8 +
 drivers/gpu/nova-core/irq.rs               |   2 +
 drivers/gpu/nova-core/irq/doorbell_test.rs | 203 +++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs         |   2 +-
 5 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/nova-core/irq/doorbell_test.rs

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index d8456f8eaa05..e2c8a090c7ff 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -15,3 +15,16 @@ config NOVA_CORE
 	  This driver is work in progress and may not be functional.
 
 	  If M is selected, the module will be called nova_core.
+
+config NOVA_CORE_IRQ_SELFTEST
+	bool "Nova IRQ self-test during probe"
+	depends on NOVA_CORE
+	help
+	  Enable the CPU doorbell IRQ self-test that runs during nova-core
+	  probe. The test triggers vector 129 (CPU doorbell) and verifies
+	  the interrupt is received through the INTR_CTRL interrupt tree.
+
+	  This validates the full MSI interrupt path from GPU through PCIe
+	  to the CPU interrupt handler.
+
+	  If unsure, say N.
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 3b45bce6738b..f6e02007ef8f 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -305,6 +305,14 @@ pub(crate) fn new<'a>(
             // Allocate a PCI interrupt vector.
             _: {
                 let _irq_vector = irq::alloc_vector(pdev)?;
+
+                #[cfg(CONFIG_NOVA_CORE_IRQ_SELFTEST)]
+                irq::doorbell_test::run_selftest(
+                    pdev,
+                    &devres_bar,
+                    spec.chipset,
+                    _irq_vector,
+                )?;
             },
 
             bar: devres_bar,
diff --git a/drivers/gpu/nova-core/irq.rs b/drivers/gpu/nova-core/irq.rs
index 01ae638bf494..f4ed4593e795 100644
--- a/drivers/gpu/nova-core/irq.rs
+++ b/drivers/gpu/nova-core/irq.rs
@@ -10,6 +10,8 @@
     prelude::*,
 };
 
+#[cfg(CONFIG_NOVA_CORE_IRQ_SELFTEST)]
+pub(crate) mod doorbell_test;
 mod intr_ctrl;
 
 pub(crate) fn alloc_vector(pdev: &pci::Device<Bound>) -> Result<pci::IrqVector<'_>> {
diff --git a/drivers/gpu/nova-core/irq/doorbell_test.rs b/drivers/gpu/nova-core/irq/doorbell_test.rs
new file mode 100644
index 000000000000..fb4e039ac032
--- /dev/null
+++ b/drivers/gpu/nova-core/irq/doorbell_test.rs
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::{
+    device::{Bound, Device},
+    devres::Devres,
+    irq, pci,
+    prelude::*,
+    sync::{
+        atomic::{
+            Atomic,
+            Relaxed, //
+        },
+        Arc, Completion,
+    },
+    time,
+};
+
+use super::intr_ctrl::{
+    IntrCtrl,
+    Leaf,
+    LeafIndex, //
+};
+use crate::{
+    driver::Bar0,
+    gpu::Chipset, //
+};
+
+// The following are constant across all architectures.
+
+/// CPU doorbell vector.
+const DOORBELL_VECTOR: u32 = 129;
+
+/// Leaf index for the doorbell vector: 129 / 32 = 4.
+const DOORBELL_LEAF: usize = 4;
+
+/// Bit within the leaf: 129 % 32 = 1.
+const DOORBELL_BIT: u32 = 1 << 1;
+
+/// IRQ handler for the CPU doorbell self-test.
+///
+/// Performs a minimal interrupt-tree drain cycle:
+/// unarm -> read TOP -> iterate leaves -> ack -> rearm.
+/// Signals completion and increments the interrupt counter on each handled interrupt.
+/// Records the leaf index and pending mask observed by the handler for verification.
+#[pin_data]
+struct DoorbellTestHandler {
+    bar: Arc<Devres<Bar0>>,
+    intr_ctrl: IntrCtrl,
+    #[pin]
+    completion: Completion,
+    /// Used to confirm the number of interrupts handled.
+    irq_count: Atomic<u32>,
+    /// Used to confirm the mask observed on the doorbell leaf (leaf 4).
+    doorbell_leaf_mask: Atomic<u32>,
+}
+
+impl irq::Handler for DoorbellTestHandler {
+    fn handle(&self, dev: &Device<Bound>) -> irq::IrqReturn {
+        let Ok(bar) = self.bar.access(dev) else {
+            return irq::IrqReturn::None;
+        };
+
+        let top = self.intr_ctrl.top().unarm(bar).read_pending(bar);
+
+        if top.mask() == 0 {
+            top.rearm(bar);
+            return irq::IrqReturn::None;
+        }
+
+        // Record the doorbell leaf mask for later verification.
+        let doorbell_leaf = Leaf::from_index(LeafIndex::new::<DOORBELL_LEAF>());
+
+        for subtree in top.iter_subtrees() {
+            for leaf in subtree.iter_pending_leaves(&self.intr_ctrl, bar) {
+                if leaf == doorbell_leaf {
+                    self.doorbell_leaf_mask.store(leaf.mask(), Relaxed);
+                }
+                leaf.ack(bar);
+            }
+        }
+
+        top.rearm(bar);
+
+        // Increment the interrupt counter and signal the completion.
+        self.irq_count.fetch_add(1, Relaxed);
+        self.completion.complete_all();
+
+        irq::IrqReturn::Handled
+    }
+}
+
+/// Run the CPU doorbell IRQ self-test.
+///
+/// Registers an IRQ handler, triggers the CPU doorbell vector, and verifies the
+/// interrupt is received through the interrupt tree. This validates the full MSI path:
+/// GPU -> PCIe -> CPU -> handler.
+pub(crate) fn run_selftest(
+    pdev: &pci::Device<Bound>,
+    bar_devres: &Arc<Devres<Bar0>>,
+    chipset: Chipset,
+    irq_vector: pci::IrqVector<'_>,
+) -> Result {
+    let bar = bar_devres.access(pdev.as_ref())?;
+    let intr_ctrl = IntrCtrl::new(chipset);
+
+    // Clear stale pending bits before enabling the doorbell.
+    intr_ctrl.drain(bar);
+
+    let handler_init = try_pin_init!(DoorbellTestHandler {
+        bar: bar_devres.clone(),
+        intr_ctrl,
+        completion <- Completion::new(),
+        irq_count: Atomic::new(0),
+        doorbell_leaf_mask: Atomic::new(0),
+    }? Error);
+
+    let reg = Arc::pin_init(
+        pdev.request_irq(
+            irq_vector,
+            irq::Flags::TRIGGER_NONE,
+            c"nova-core",
+            handler_init,
+        ),
+        GFP_KERNEL,
+    )?;
+
+    let handler = reg.handler();
+
+    // Allow doorbell leaf.
+    let doorbell_leaf_idx = LeafIndex::new::<DOORBELL_LEAF>();
+    handler
+        .intr_ctrl
+        .leaf(doorbell_leaf_idx)
+        .allow(bar, DOORBELL_BIT);
+
+    // The doorbell bit must be clear before triggering, otherwise the test
+    // cannot prove that the IRQ came from the trigger below.
+    let pre_mask = handler
+        .intr_ctrl
+        .leaf(doorbell_leaf_idx)
+        .read_pending(bar)
+        .mask();
+    if pre_mask & DOORBELL_BIT != 0 {
+        handler
+            .intr_ctrl
+            .leaf(doorbell_leaf_idx)
+            .block(bar, DOORBELL_BIT);
+        let _ = handler.intr_ctrl.top().unarm(bar);
+        dev_warn!(
+            pdev.as_ref(),
+            "CPU doorbell self-test: FAIL (doorbell bit already pending, leaf[{}] mask={:#x})\n",
+            DOORBELL_LEAF,
+            pre_mask,
+        );
+        return Err(EIO);
+    }
+
+    // Arm the INTR_CTRL top level to enable MSI generation.
+    handler.intr_ctrl.top().arm(bar);
+
+    // Trigger the CPU doorbell interrupt.
+    handler.intr_ctrl.trigger(bar, DOORBELL_VECTOR);
+
+    // Wait up to 1 second for the interrupt handler to fire.
+    let completed = handler
+        .completion
+        .wait_for_completion_timeout(time::msecs_to_jiffies(1000));
+
+    let count = handler.irq_count.load(Relaxed);
+    let leaf_mask = handler.doorbell_leaf_mask.load(Relaxed);
+
+    // Block the doorbell leaf after the test.
+    handler
+        .intr_ctrl
+        .leaf(doorbell_leaf_idx)
+        .block(bar, DOORBELL_BIT);
+    let _ = handler.intr_ctrl.top().unarm(bar);
+
+    // Verify that the doorbell IRQ fired.
+    let doorbell_bit_seen = leaf_mask & DOORBELL_BIT != 0;
+    let pass = completed && count == 1 && doorbell_bit_seen;
+
+    if pass {
+        dev_info!(
+            pdev.as_ref(),
+            "CPU doorbell self-test: PASS (irq_count={}, leaf[{}] mask={:#x})\n",
+            count,
+            DOORBELL_LEAF,
+            leaf_mask,
+        );
+    } else {
+        dev_warn!(
+            pdev.as_ref(),
+            "CPU doorbell self-test: FAIL (completed={}, irq_count={}, leaf[{}] mask={:#x})\n",
+            completed,
+            count,
+            DOORBELL_LEAF,
+            leaf_mask,
+        );
+    }
+
+    Ok(())
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 6d0e4b2f53c7..5fce7068db03 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -19,7 +19,7 @@
 mod firmware;
 mod gpu;
 mod gsp;
-#[expect(dead_code)]
+#[cfg_attr(not(CONFIG_NOVA_CORE_IRQ_SELFTEST), expect(dead_code))]
 mod irq;
 #[macro_use]
 mod num;
-- 
2.34.1
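
For readers checking the DOORBELL_* constants in doorbell_test.rs, the
vector arithmetic can be sketched as a standalone userspace Rust program.
This is only an illustration under the assumptions stated in the patch
(32 vectors per leaf, two leaves per subtree). The function names
leaf_index, bit_mask, subtree_index and subtree_mask are hypothetical and
are not part of nova-core's IntrCtrl API.

```rust
// Standalone userspace sketch. All names here are hypothetical helpers for
// illustration; none of them exist in nova-core.

/// Number of vectors covered by one 32-bit LEAF register.
const VECTORS_PER_LEAF: u32 = 32;
/// Each TOP bit (subtree) covers two adjacent leaves.
const LEAVES_PER_SUBTREE: u32 = 2;

/// Index of the LEAF register that holds vector `v`.
fn leaf_index(v: u32) -> u32 {
    v / VECTORS_PER_LEAF
}

/// Single-bit mask of vector `v` within its leaf.
fn bit_mask(v: u32) -> u32 {
    1 << (v % VECTORS_PER_LEAF)
}

/// TOP bit (subtree) that summarizes the leaf holding vector `v`.
fn subtree_index(v: u32) -> u32 {
    leaf_index(v) / LEAVES_PER_SUBTREE
}

/// TOP mask covering every subtree for a given number of active leaves.
/// Only meaningful for the leaf counts discussed in the series (8 or 16).
fn subtree_mask(active_leaves: u32) -> u32 {
    (1u32 << (active_leaves / LEAVES_PER_SUBTREE)) - 1
}

fn main() {
    // Vector 129 is the CPU doorbell; vector 200 is the routing example
    // used in the documentation patch.
    for v in [129u32, 200] {
        println!(
            "vector {}: LEAF[{}] mask {:#x}, subtree {}",
            v,
            leaf_index(v),
            bit_mask(v),
            subtree_index(v)
        );
    }

    // The doorbell constants in the patch: leaf 4, bit 1 (mask 1 << 1).
    assert_eq!(leaf_index(129), 4);
    assert_eq!(bit_mask(129), 1 << 1);

    // 8 active leaves give 4 subtrees, 16 active leaves give 8 subtrees.
    assert_eq!(subtree_mask(8), 0x0f);
    assert_eq!(subtree_mask(16), 0xff);
}
```

Deriving the leaf and bit from the vector number, rather than keeping
three independent constants, is one way to keep them from drifting apart.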



* [PATCH v1 7/7] gpu: nova-core: document INTR_CTRL interrupt tree
  2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
                   ` (5 preceding siblings ...)
  2026-05-01 20:58 ` [PATCH v1 6/7] gpu: nova-core: add CPU doorbell IRQ self-test Joel Fernandes
@ 2026-05-01 20:58 ` Joel Fernandes
  6 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Danilo Krummrich, Alexandre Courbot, John Hubbard, Alice Ryhl,
	David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc, Joel Fernandes

Add documentation describing the interrupt controller architecture for
modern NVIDIA GPUs.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/nova/core/intr-ctrl.rst | 305 ++++++++++++++++++++++
 Documentation/gpu/nova/index.rst          |   1 +
 2 files changed, 306 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/intr-ctrl.rst

diff --git a/Documentation/gpu/nova/core/intr-ctrl.rst b/Documentation/gpu/nova/core/intr-ctrl.rst
new file mode 100644
index 000000000000..10091c258f9c
--- /dev/null
+++ b/Documentation/gpu/nova/core/intr-ctrl.rst
@@ -0,0 +1,305 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================================
+INTR_CTRL: The GPU's Interrupt Controller
+==============================================
+
+This document describes the interrupt controller which sits between
+the GPU's internal engines (including GSP) and the host's MSI delivery path.
+It is the first hardware block the host driver consults whenever an interrupt is
+delivered, and it is responsible for telling software which engine interrupted.
+It is also known as the "INTR_CTRL" block. The architecture evolved mainly
+to support virtualization (multiple views of the interrupt tree for PFs
+and VFs).
+
+Per-function trees
+==================
+
+Each PCIe function has its own private interrupt tree:
+
+* The Physical Function (PF) sees a tree at a fixed BAR0 offset.
+* Each Virtual Function (VF) sees its own tree at the same BAR0 offset
+  *within its own BAR0 view*, and it cannot observe the PF's tree.
+* The GSP firmware also has its own logical tree in INTR_CTRL used for
+  receiving interrupts to GSP from the engines, but we don't need to
+  bother with those in nova-core (that's GSP's business).
+
+.. note::
+  The PF can also see the VF's tree at an aliased offset in BAR0, which
+  is useful if the guest driver needs host help in configuring interrupts,
+  but we currently do not use that in nova-core.
+
+Two-level interrupt tree
+========================
+
+INTR_CTRL multiplexes up to 256 internal interrupt vectors onto the single
+MSI line allocated to the PCIe function via a two-level tree of MMIO
+registers, where each TOP bit covers exactly two adjacent leaves (known
+as a "subtree"). As an example, on GA102, the tree looks like this::
+
+    TOP register (32-bit)
+     |   bit N == 1 => subtree N has at least one pending leaf
+     |
+     +-- bit 0 --> LEAF[0] (32-bit)  vectors    0..  31  (nonstall base)
+     |             LEAF[1] (32-bit)  vectors   32..  63
+     |
+     +-- bit 1 --> LEAF[2] (32-bit)  vectors   64..  95
+     |             LEAF[3] (32-bit)  vectors   96.. 127
+     |
+     +-- bit 2 --> LEAF[4] (32-bit)  vectors  128.. 159  (CPU doorbell @ 129)
+     |             LEAF[5] (32-bit)  vectors  160.. 191
+     |
+     +-- bit 3 --> LEAF[6] (32-bit)  vectors  192.. 223  (engine stall base,
+     |             LEAF[7] (32-bit)  vectors  224.. 255   GSP stall vector)
+     |
+     +-- bits 4..7 (Hopper+ only) --> LEAF[8..15]
+
+
+The second level (LEAF registers) is where individual engines deposit
+their interrupt events. The first level (TOP register) is a summary: bit
+``N`` of TOP is set if and only if at least one bit is set in the two
+leaves owned by subtree ``N``. Software can therefore start from TOP,
+identify which subtrees have work, and then descend into just those
+leaves; it never needs to read all 16 leaf registers blindly.
+
+The advantage of this architecture is that it allows the host to mask
+entire subtrees of interrupts at once, rather than having to mask each
+leaf individually. The same applies when determining which interrupt
+source fired: the host can walk the tree without going through all 16
+leaves.
+
+Each TOP bit, called a **subtree**, is wired in hardware to exactly two
+adjacent leaves (leaves ``2*N`` and ``2*N + 1``), so nova-core derives
+``num_subtrees = num_leaves / 2`` rather than tracking both numbers
+independently.
+
+End-to-end engine interrupt routing to MSI
+===============================================
+
+Engine interrupt routing is controlled by the engine's INTR_CTRL(i)
+register, which GSP writes once at boot to select the tree and leaf bit
+to activate in INTR_CTRL. This model assists virtualization, because GSP
+can route an engine to the tree/leaf corresponding to the VF. GSP then
+provides the information to the host via the INTR_GET_KERNEL_TABLE RPC so
+that the host knows which leaf bits correspond to an engine's interrupt.
+
+It roughly looks like the following::
+
+              +--------------- Engine (CE, GR, NVDEC, ...) ---------------+
+              |                                                           |
+              |   internal work completes                                 |
+              |          |                                                |
+              |          v                                                |
+              |   +-----------------------------------------+             |
+              |   | INTR_CTRL(i): programmable register     |             |
+              |   | (written once by GSP-RM at boot,        |             |
+              |   |  one such reg per engine)               |             |
+              |   |                                         |             |
+              |   |   VECTOR  = 200   (-> which leaf bit)   |             |
+              |   |   GFID    = 0     (-> which function's  |             |
+              |   |                       tree: 0=PF, N=VF) |             |
+              |   |   CPU     = 1     (-> copy to CPU tree?)|             |
+              |   |   GSP     = 0     (-> copy to GSP tree?)|             |
+              |   +--------------------+--------------------+             |
+              |                        |                                  |
+              |     engine builds      |                                  |
+              |     interrupt ctrl     |                                  |
+              |     command message    |                                  |
+              |  (all2ctrl_intr_cmd)   |                                  |
+              +------------------------|----------------------------------+
+                                       |
+                                       v
+                   +-----------------------------------------+
+                   | Central INTR_CTRL block                 |
+                   |                                         |
+                   | reads message; for the tree picked      |
+                   | by GFID, sets:                          |
+                   |   LEAF[ 200 / 32 ]  = LEAF[6]           |
+                   |   bit  ( 200 % 32 ) = bit 8             |
+                   | TOP subtree 3 = pending                 |
+                   +--------------------+--------------------+
+                                        |
+                                        v
+                                MSI to host (PF)
+
+Vector encoding
+---------------
+
+A vector number ``v`` (0..255) maps to a unique ``(leaf, bit)`` pair::
+
+    leaf_index = v / 32
+    bit_in_leaf = v % 32
+
+For example, vector 129 (the CPU doorbell self-test vector we use in
+the INTR_CTRL self-test, see below) lives in ``LEAF[4]`` at bit 1,
+which is reachable through subtree 2 in the TOP register.
+
+Architecture differences
+------------------------
+
+The number of *active* leaves depends on the GPU architecture:
+
+==================  =================  ==========  ================
+Architecture        Active leaves      Subtrees    ``subtree_mask``
+==================  =================  ==========  ================
+Turing / Ampere     8                  4           ``0x0f``
+Ada Lovelace        8                  4           ``0x0f``
+Hopper / Blackwell  16                 8           ``0xff``
+==================  =================  ==========  ================
+
+Pre-Hopper chipsets only have leaves 0-7 wired up; the upper half of the
+TOP register is unused and reads back as zero. Hopper widened the tree to
+16 leaves to support more engines and more virtual functions.
+
+Stall vs nonstall vector ranges
+===============================
+
+A common point of confusion: **stall and nonstall are NOT separate
+interrupt trees**. They are two different *vector ranges* within the same
+INTR_CTRL tree, and the source engine picks which range its interrupt
+lands in.
+
+* **Nonstall** vectors live in the low leaves (``LEAF[0..1]``, vectors
+  0..63). The engine fires the interrupt and continues immediately,
+  whether or not the host has acknowledged it. Used for "fire and
+  forget" notifications (examples: vblank, semaphore wakeups, performance
+  counter overflow).
+
+* **Stall** vectors live in the high leaves.
+  On Turing and Ampere:
+  ``LEAF[6..7]`` (vectors 192..255, subtree 3).
+  On Hopper:
+  ``LEAF[6..11]`` (subtrees 3..5).
+  The engine *blocks* (stalls) until the host writes a W1C (Write 1 to Clear)
+  ack to the leaf bit. Example: MMU fault.
+
+ISR operation flow
+==================
+
+When an MSI fires, the ISR walks the tree in a fixed sequence::
+
+    1. UNARM    write subtree_mask -> TOP_EN_CLEAR  (stop MSI delivery)
+    2. READ     pending = TOP                       (which subtrees fired?)
+    3. ACK      for each pending leaf:
+                   mask = LEAF[i]                   (read pending vectors)
+                   LEAF[i] = mask                   (W1C the latches)
+                   dispatch handlers for set bits
+    4. REARM    write subtree_mask -> TOP_EN_SET    (resume MSI delivery)
+
+A few important properties:
+
+* **All pending leaf bits must be acked**, even bits that nova-core does
+  not currently dispatch. Leaving a bit set keeps its subtree pending in
+  TOP, which means the next REARM immediately fires another MSI - an
+  interrupt storm. The handler therefore acks the full leaf mask, not
+  just the bits it recognizes.
+
+* **REARM happens only after every pending leaf has been acked.**
+  Otherwise a still-set leaf bit would re-fire MSI on the next REARM
+  even though the ISR is mid-processing.
+
+Edge-trigger and rearm semantics
+================================
+
+Each LEAF bit is a sticky latch with edge-triggered SET behaviour:
+
+* The latch SETS on the rising edge of the source signal (an engine
+  message arriving on the interrupt control command interface, or a falcon
+  output wire transitioning low->high).
+* The latch CLEARS only when the host writes a 1 to that bit (W1C).
+* A still-asserted source does **not** re-set the latch. There is no
+  way to make a level-asserted signal "re-fire" except to drop and
+  re-raise it.
+
+There are two distinct rescue mechanisms, at two different layers,
+for two different problems. They are easy to confuse, so first some
+vocabulary as the rescues are entirely about how these pieces of
+hardware are wired together:
+
+* ``LEAF[i]``: each bit is a *sticky latch*: bit ``b`` SETs on the
+  rising edge of an ``all2ctrl_intr_cmd`` message and CLEARs only
+  when the host writes 1 to that bit (W1C ack).
+
+* ``TOP[N]``: bit ``N`` of the read-only ``TOP`` register. Purely
+  combinational: it reads 1 if and only if at least one bit is
+  latched in either of the two leaves owned by subtree ``N``,
+  i.e. ``LEAF[2N]`` or ``LEAF[2N+1]``. Software cannot write ``TOP``;
+  the hardware tracks the leaves automatically.
+
+* ``TOP_EN[N]``: a single host-controlled "armed?" bit per subtree,
+  internal to INTR_CTRL. The host *sets* it by writing 1 to
+  ``TOP_EN_SET`` and *clears* it by writing 1 to ``TOP_EN_CLEAR``.
+  Reading either register returns the current ``TOP_EN`` bitmask.
+  ``TOP_EN`` is not a latch; it just remembers what the host last set
+  or cleared.
+
+* The **MSI-edge AND-gate**: one per subtree, internal to ``INTR_CTRL``.
+  It ANDs ``TOP[N]`` with ``TOP_EN[N]`` and drives the output
+  through an edge detector. An MSI for subtree ``N`` is delivered
+  on every *rising edge* of this AND output; level changes that
+  drop the output to 0 (for any reason) deliver no MSI.
+
+::
+
+       LEAF[2N], LEAF[2N+1]         (sticky latches; W1C to clear)
+              |
+              v
+       (OR of all 64 latched bits in subtree N)
+              |
+              v
+        TOP[N] ----+
+                   |
+                   AND ---(rising edge detector)---> MSI for subtree N
+                   |
+       TOP_EN[N] --+
+              ^
+              |   host pokes:
+              |     write 1 to TOP_EN_SET[N]   -> TOP_EN[N] becomes 1
+              |     write 1 to TOP_EN_CLEAR[N] -> TOP_EN[N] becomes 0
+
+With that in hand:
+
+1. **REARM** (writing the subtree mask to ``TOP_EN_SET``) rescues a
+   timing race: between the ISR's last leaf ack and the moment
+   ``TOP_EN`` is brought back high, *new* engine events can arrive
+   and latch fresh leaf bits that the ISR never saw, because its W1C
+   only clears the bits it snapshotted. ``TOP[N]`` is then already 1
+   when REARM raises ``TOP_EN[N]``, so the AND output rises and a
+   new MSI is delivered for the late events.
+
+2. **INTR_RETRIGGER** (a per-engine register, not part of INTR_CTRL)
+   rescues a still-asserted level source *inside an engine*. Most
+   engines drive their internal "interrupt pending" signal as a
+   level and convert it to an ``all2ctrl_intr_cmd`` message via an
+   edge converter that fires only on the rising edge of that level.
+   So one rising edge of the engine's level produces one message,
+   which sets one leaf bit. After the host's W1C clears that leaf
+   bit, a level that has stayed high produces no new edge, so the
+   engine's edge converter never sends another message to INTR_CTRL,
+   the leaf stays clear, and ``TOP[N]`` is 0. REARM's AND-gate trick
+   is useless here. Writing 1 to the engine's
+   ``INTR_RETRIGGER`` register drops the engine's level for one
+   clock cycle; the level then returns to 1 (the engine still has
+   work pending in its source register), the edge converter sees a
+   fresh 0->1 transition, sends a new message, the leaf re-latches,
+   ``TOP[N]`` goes back to 1, and an MSI follows on REARM (or
+   immediately, if ``TOP_EN[N]`` was already 1). ``INTR_RETRIGGER``
+   bridges the asymmetry between level-asserted internal engine
+   logic and edge-driven ``INTR_CTRL`` leaf messages.
+
+CPU doorbell self-test
+======================
+
+INTR_CTRL exposes a software-trigger register, ``NV_VF_INTR_LEAF_TRIGGER``.
+Writing a vector number ``v`` to this register synthesizes a hardware
+interrupt event on vector ``v``: the matching leaf bit latches, TOP
+updates, and (assuming the subtree is armed and the leaf vector is
+enabled) an MSI is delivered to the host.
+
+nova-core uses vector 129 (``LEAF[4]`` bit 1) as a self-test "doorbell":
+during early initialization, the driver registers a temporary ISR for vector
+129, writes 129 to ``LEAF_TRIGGER``, and verifies that its ISR fires.
+This validates the entire INTR_CTRL -> MSI -> ISR path *without* needing
+the GSP firmware to be running, which makes it useful for debugging early
+PCI / MSI issues, VFIO passthrough setups, and testing when GSP is not yet
+available.
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index e39cb3163581..1ea111988e35 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/devinit
    core/fwsec
    core/falcon
+   core/intr-ctrl
-- 
2.34.1
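
The ISR flow and the REARM race described in intr-ctrl.rst can be tried
out with a small userspace model. This is a sketch under the assumptions
stated in the document (sticky W1C leaf latches, a combinational TOP, a
host-controlled TOP_EN, and one MSI per rising edge of TOP & TOP_EN). The
IntrCtrlModel type, its methods and the isr() function are hypothetical
names for illustration; they are not the nova-core IntrCtrl API and touch
no real MMIO.

```rust
// Userspace model of the two-level tree on a pre-Hopper part (8 leaves).
// Every name here is hypothetical; this is not driver code.

struct IntrCtrlModel {
    leaves: [u32; 8], // sticky latches, cleared only by ack (W1C)
    top_en: u32,      // host-controlled "armed" bit per subtree
    and_out: u32,     // previous TOP & TOP_EN value, for edge detection
    msi_count: u32,   // number of MSIs the model has delivered
}

impl IntrCtrlModel {
    fn new() -> Self {
        Self { leaves: [0; 8], top_en: 0, and_out: 0, msi_count: 0 }
    }

    /// TOP is combinational: bit N is set iff LEAF[2N] or LEAF[2N+1] is non-zero.
    fn top(&self) -> u32 {
        let mut top = 0;
        for (i, leaf) in self.leaves.iter().enumerate() {
            if *leaf != 0 {
                top |= 1 << (i / 2);
            }
        }
        top
    }

    /// Re-evaluate the AND gate and count one MSI per rising edge.
    fn update(&mut self) {
        let out = self.top() & self.top_en;
        self.msi_count += (out & !self.and_out).count_ones();
        self.and_out = out;
    }

    /// An interrupt event on vector `v` latches the matching leaf bit.
    fn trigger(&mut self, v: u32) {
        self.leaves[(v / 32) as usize] |= 1 << (v % 32);
        self.update();
    }

    fn unarm(&mut self, mask: u32) {
        self.top_en &= !mask;
        self.update();
    }

    fn rearm(&mut self, mask: u32) {
        self.top_en |= mask;
        self.update();
    }

    /// W1C ack: clear exactly the bits given in `mask`.
    fn ack(&mut self, leaf: usize, mask: u32) {
        self.leaves[leaf] &= !mask;
        self.update();
    }
}

/// UNARM -> READ TOP -> ACK every pending leaf -> REARM.
/// Returns how many leaf bits were acked.
fn isr(ctrl: &mut IntrCtrlModel, subtree_mask: u32) -> u32 {
    ctrl.unarm(subtree_mask);
    let top = ctrl.top();
    let mut acked = 0;
    for leaf in 0..ctrl.leaves.len() {
        if top & (1 << (leaf / 2)) == 0 {
            continue; // subtree not pending, skip its leaves
        }
        let mask = ctrl.leaves[leaf];
        if mask != 0 {
            ctrl.ack(leaf, mask); // ack the full leaf mask
            acked += mask.count_ones();
        }
    }
    ctrl.rearm(subtree_mask);
    acked
}

fn main() {
    let mut ctrl = IntrCtrlModel::new();
    ctrl.rearm(0x0f);

    // Doorbell: one event, one MSI, one bit acked, no second MSI on rearm.
    ctrl.trigger(129);
    assert_eq!(ctrl.msi_count, 1);
    assert_eq!(isr(&mut ctrl, 0x0f), 1);
    assert_eq!(ctrl.msi_count, 1);

    // Race: an event latches while unarmed; REARM produces the edge.
    ctrl.unarm(0x0f);
    ctrl.trigger(200);
    assert_eq!(ctrl.msi_count, 1);
    ctrl.rearm(0x0f);
    assert_eq!(ctrl.msi_count, 2);
    println!("MSIs delivered by the model: {}", ctrl.msi_count);
}
```

The model counts rising edges per subtree, which is a simplification of
a single MSI line, but it is enough to show why a leaf bit that latches
while TOP_EN is low is not lost.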



* Re: [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout()
  2026-05-01 20:58 ` [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout() Joel Fernandes
@ 2026-05-05 12:17   ` Miguel Ojeda
  2026-05-05 20:19     ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Miguel Ojeda @ 2026-05-05 12:17 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Danilo Krummrich, Alexandre Courbot, John Hubbard,
	Alice Ryhl, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc

On Fri, May 1, 2026 at 10:58 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
> +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.

This is fine since it follows the other ones in the file, but we
should say why this is the case (in another series, possibly a good
first issue), rather than just asserting it.

e.g. a type invariant?

Cheers,
Miguel


* Re: [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout()
  2026-05-05 12:17   ` Miguel Ojeda
@ 2026-05-05 20:19     ` Joel Fernandes
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Fernandes @ 2026-05-05 20:19 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: linux-kernel, Danilo Krummrich, Alexandre Courbot, John Hubbard,
	Alice Ryhl, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Jonathan Corbet, Shuah Khan, nova-gpu, dri-devel,
	rust-for-linux, linux-doc



On 5/5/2026 8:17 AM, Miguel Ojeda wrote:
> On Fri, May 1, 2026 at 10:58 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>> +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> 
> This is fine since it follows the other ones in the file, but we
> should say why this is the case (in another series, possibly a good
> first issue), rather than just asserting it.
> 
> e.g. a type invariant?
> 

Sure, I will make this change as a separate patch. Probably if it is one extra
patch, I'll just add it to the series.

thanks,

--
Joel Fernandes


