From: Joel Fernandes <joelagnelf@nvidia.com>
To: linux-kernel@vger.kernel.org
Cc: "Danilo Krummrich" <dakr@kernel.org>,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Boqun Feng" <boqun@kernel.org>, "Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	nova-gpu@lists.linux.dev, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, linux-doc@vger.kernel.org,
	"Joel Fernandes" <joelagnelf@nvidia.com>
Subject: [PATCH v1 7/7] gpu: nova-core: document INTR_CTRL interrupt tree
Date: Fri,  1 May 2026 16:58:25 -0400
Message-ID: <20260501205825.73614-8-joelagnelf@nvidia.com>
In-Reply-To: <20260501205825.73614-1-joelagnelf@nvidia.com>

Add documentation describing the interrupt controller architecture for
modern NVIDIA GPUs.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/nova/core/intr-ctrl.rst | 305 ++++++++++++++++++++++
 Documentation/gpu/nova/index.rst          |   1 +
 2 files changed, 306 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/intr-ctrl.rst

diff --git a/Documentation/gpu/nova/core/intr-ctrl.rst b/Documentation/gpu/nova/core/intr-ctrl.rst
new file mode 100644
index 000000000000..10091c258f9c
--- /dev/null
+++ b/Documentation/gpu/nova/core/intr-ctrl.rst
@@ -0,0 +1,305 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================================
+INTR_CTRL: The GPU's Interrupt Controller
+==============================================
+
+This document describes the interrupt controller that sits between
+the GPU's internal engines (including GSP) and the host's MSI delivery path.
+It is the first hardware block the host driver consults whenever an interrupt is
+delivered, and it is responsible for telling software which engine interrupted.
+It is also known as the "INTR_CTRL" block. The interrupt controller
+architecture has evolved mainly to support virtualization (multiple views
+of the interrupt tree for PFs and VFs).
+
+Per-function trees
+==================
+
+Each PCIe function has its own private interrupt tree:
+
+* The Physical Function (PF) sees a tree at a fixed BAR0 offset.
+* Each Virtual Function (VF) sees its own tree at the same BAR0 offset
+  *within its own BAR0 view*, and it cannot observe the PF's tree.
+* The GSP firmware also has its own logical tree in INTR_CTRL used for
+  receiving interrupts to GSP from the engines, but we don't need to
+  bother with those in nova-core (that's GSP's business).
+
+.. note::
+  The PF can also see the VF's tree at an aliased offset in BAR0, which
+  is useful if the guest driver needs host help in configuring interrupts,
+  but we currently do not use that in nova-core.
+
+Two-level interrupt tree
+========================
+
+INTR_CTRL multiplexes its internal interrupt vectors (32 per leaf, so 256
+with the 8 leaves of GA102) onto the single MSI line allocated to the
+PCIe function via a two-level tree of MMIO registers, where each TOP bit
+covers exactly two adjacent leaves (known as a "subtree"). As an example,
+on GA102, the tree looks like this::
+
+    TOP register (32-bit)
+     |   bit N == 1 => subtree N has at least one pending leaf
+     |
+     +-- bit 0 --> LEAF[0] (32-bit)  vectors    0..  31  (nonstall base)
+     |             LEAF[1] (32-bit)  vectors   32..  63
+     |
+     +-- bit 1 --> LEAF[2] (32-bit)  vectors   64..  95
+     |             LEAF[3] (32-bit)  vectors   96.. 127
+     |
+     +-- bit 2 --> LEAF[4] (32-bit)  vectors  128.. 159  (CPU doorbell @ 129)
+     |             LEAF[5] (32-bit)  vectors  160.. 191
+     |
+     +-- bit 3 --> LEAF[6] (32-bit)  vectors  192.. 223  (engine stall base,
+     |             LEAF[7] (32-bit)  vectors  224.. 255   GSP stall vector)
+     |
+     +-- bits 4..7 (Hopper+ only) --> LEAF[8..15]
+
+
+The second level (LEAF registers) is where individual engines deposit
+their interrupt events. The first level (TOP register) is a summary: bit
+``N`` of TOP is set if and only if at least one bit is set in the two
+leaves owned by subtree ``N``. Software can therefore start from TOP,
+identify which subtrees have work, and then descend into just those leaves;
+it never needs to read every leaf register blindly.
+
+The advantage of this architecture is that it allows the host to mask
+entire subtrees of interrupts at once, rather than having to mask each
+leaf individually. Similarly, to determine which interrupt source
+fired, the host can walk the tree without reading every one of the
+leaves.
+
+Each TOP bit, called a **subtree**, is wired in hardware to exactly two
+adjacent leaves (leaves ``2*N`` and ``2*N + 1``), so nova-core derives
+``num_subtrees = num_leaves / 2`` rather than tracking both numbers
+independently.
+
+End-to-end engine interrupt routing to MSI
+===============================================
+
+The engine interrupt routing is done by the engine's INTR_CTRL(i)
+register. This register is written once by GSP at boot and decides
+which tree/leaf to activate in the INTR_CTRL. This model assists in
+virtualization, as it is possible for the GSP to route engines to the
+correct tree/leaf corresponding to the VF. GSP then provides the
+information to the host via the INTR_GET_KERNEL_TABLE RPC so that
+the host knows which leaf bits correspond to an engine's interrupt.
+
+It roughly looks like the following::
+
+              +--------------- Engine (CE, GR, NVDEC, ...) ---------------+
+              |                                                           |
+              |   internal work completes                                 |
+              |          |                                                |
+              |          v                                                |
+              |   +-----------------------------------------+             |
+              |   | INTR_CTRL(i): programmable register     |             |
+              |   | (written once by GSP-RM at boot,        |             |
+              |   |  one such reg per engine)               |             |
+              |   |                                         |             |
+              |   |   VECTOR  = 200   (-> which leaf bit)   |             |
+              |   |   GFID    = 0     (-> which function's  |             |
+              |   |                       tree: 0=PF, N=VF) |             |
+              |   |   CPU     = 1     (-> copy to CPU tree?)|             |
+              |   |   GSP     = 0     (-> copy to GSP tree?)|             |
+              |   +--------------------+--------------------+             |
+              |                        |                                  |
+              |     engine builds      |                                  |
+              |     interrupt ctrl     |                                  |
+              |     command message    |                                  |
+              |  (all2ctrl_intr_cmd)   |                                  |
+              +------------------------|----------------------------------+
+                                       |
+                                       v
+                   +-----------------------------------------+
+                   | Central INTR_CTRL block                 |
+                   |                                         |
+                   | reads message; for the tree picked      |
+                   | by GFID, sets:                          |
+                   |   LEAF[ 200 / 32 ]  = LEAF[6]           |
+                   |   bit  ( 200 % 32 ) = bit 8             |
+                   | TOP subtree 3 = pending                 |
+                   +--------------------+--------------------+
+                                        |
+                                        v
+                                MSI to host (PF)
+
+Vector encoding
+---------------
+
+A vector number ``v`` (0..255 with 8 leaves) maps to a unique ``(leaf, bit)`` pair::
+
+    leaf_index = v / 32
+    bit_in_leaf = v % 32
+
+For example, vector 129 (the CPU doorbell vector used by the INTR_CTRL
+self-test described below) lives in ``LEAF[4]`` at bit 1,
+which is reachable through subtree 2 in the TOP register.
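+
+As a minimal sketch of the arithmetic above (the ``decode_vector`` helper
+and the ``VectorPos`` type are hypothetical names used only for
+illustration, not the nova-core API), the mapping could be written as:
+
+.. code-block:: rust
+
+    /// Each LEAF register is 32 bits wide.
+    const BITS_PER_LEAF: u32 = 32;
+    /// Each TOP bit (subtree) owns two adjacent leaves.
+    const LEAVES_PER_SUBTREE: u32 = 2;
+
+    /// Position of a vector in the tree (hypothetical type).
+    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+    pub struct VectorPos {
+        pub leaf: u32,
+        pub bit: u32,
+        pub subtree: u32,
+    }
+
+    /// Splits a vector number into its leaf index, bit and subtree.
+    pub fn decode_vector(v: u32) -> VectorPos {
+        let leaf = v / BITS_PER_LEAF;
+        VectorPos {
+            leaf,
+            bit: v % BITS_PER_LEAF,
+            subtree: leaf / LEAVES_PER_SUBTREE,
+        }
+    }
+
+With this sketch, ``decode_vector(129)`` yields leaf 4, bit 1 and
+subtree 2, matching the doorbell example.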
+
+Architecture differences
+------------------------
+
+The number of *active* leaves depends on the GPU architecture:
+
+==================  =================  ==========  ================
+Architecture        Active leaves      Subtrees    ``subtree_mask``
+==================  =================  ==========  ================
+Turing / Ampere     8                  4           ``0x0f``
+Ada Lovelace        8                  4           ``0x0f``
+Hopper / Blackwell  16                 8           ``0xff``
+==================  =================  ==========  ================
+
+Pre-Hopper chipsets only have leaves 0-7 wired up; TOP bits 4..7 are
+unused and read back as zero. Hopper widened the tree to
+16 leaves to support more engines and more virtual functions.
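+
+The ``subtree_mask`` column follows directly from the leaf count. A small
+sketch, assuming hypothetical helper names (``num_subtrees`` and
+``subtree_mask`` are illustrative and not taken from the driver):
+
+.. code-block:: rust
+
+    /// Each subtree owns two leaves, so the subtree count is half the leaf count.
+    pub fn num_subtrees(num_leaves: u32) -> u32 {
+        num_leaves / 2
+    }
+
+    /// Builds a mask with one bit set per active subtree.
+    pub fn subtree_mask(num_leaves: u32) -> u32 {
+        let n = num_subtrees(num_leaves);
+        if n >= 32 {
+            // TOP is a 32-bit register; saturate rather than overflow the shift.
+            u32::MAX
+        } else {
+            (1u32 << n) - 1
+        }
+    }
+
+For 8 leaves this gives ``0x0f`` and for 16 leaves ``0xff``, as in the
+table.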
+
+Stall vs nonstall vector ranges
+===============================
+
+A common point of confusion: **stall and nonstall are NOT separate
+interrupt trees**. They are two different *vector ranges* within the same
+INTR_CTRL tree, and the source engine picks which range its interrupt
+lands in.
+
+* **Nonstall** vectors live in the low leaves (``LEAF[0..1]``, vectors
+  0..63). The engine fires the interrupt and continues immediately,
+  whether or not the host has acknowledged it. Used for "fire and
+  forget" notifications (examples: vblank, semaphore wakeups, performance
+  counter overflow).
+
+* **Stall** vectors live in the high leaves.
+  On Turing and Ampere:
+  ``LEAF[6..7]`` (vectors 192..255, subtree 3).
+  On Hopper:
+  ``LEAF[6..11]`` (subtrees 3..5).
+  The engine *blocks* (stalls) until the host writes a W1C (Write 1 to Clear)
+  ack to the leaf bit. Example: MMU fault.
+
+ISR operation flow
+==================
+
+When an MSI fires, the ISR walks the tree in a fixed sequence::
+
+    1. UNARM    write subtree_mask -> TOP_EN_CLEAR  (stop MSI delivery)
+    2. READ     pending = TOP                       (which subtrees fired?)
+    3. ACK      for each pending leaf:
+                   mask = LEAF[i]                   (read pending vectors)
+                   LEAF[i] = mask                   (W1C the latches)
+                   dispatch handlers for set bits
+    4. REARM    write subtree_mask -> TOP_EN_SET    (resume MSI delivery)
+
+A few important properties:
+
+* **All pending leaf bits must be acked**, even bits that nova-core does
+  not currently dispatch. Leaving a bit set keeps its subtree pending in
+  TOP, which means the next REARM immediately fires another MSI - an
+  interrupt storm. The handler therefore acks the full leaf mask, not
+  just the bits it recognizes.
+
+* **REARM happens only after every pending leaf has been acked.**
+  Otherwise a still-set leaf bit would re-fire MSI on the next REARM
+  even though the ISR is mid-processing.
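+
+The four steps can be illustrated with a purely in-memory model. This is
+a sketch under stated assumptions, not the driver's implementation:
+``IntrTree``, ``latch`` and ``handle_irq`` are hypothetical names, the
+registers are plain fields rather than MMIO, and "dispatch" is reduced
+to collecting the vector numbers that were pending:
+
+.. code-block:: rust
+
+    /// Software model of one function's interrupt tree (no real MMIO).
+    pub struct IntrTree {
+        pub leaves: Vec<u32>,
+        pub top_en: u32,
+    }
+
+    impl IntrTree {
+        pub fn new(num_leaves: usize) -> Self {
+            Self { leaves: vec![0; num_leaves], top_en: 0 }
+        }
+
+        /// Models an engine event latching the leaf bit for vector `v`.
+        pub fn latch(&mut self, v: u32) {
+            self.leaves[(v / 32) as usize] |= 1u32 << (v % 32);
+        }
+
+        /// TOP is a summary: bit N is set if LEAF[2N] or LEAF[2N+1] is non-zero.
+        pub fn top(&self) -> u32 {
+            let mut top = 0u32;
+            for (i, leaf) in self.leaves.iter().enumerate() {
+                if *leaf != 0 {
+                    top |= 1u32 << (i / 2);
+                }
+            }
+            top
+        }
+    }
+
+    /// Runs UNARM, READ, ACK and REARM; returns the vectors that were pending.
+    pub fn handle_irq(tree: &mut IntrTree, subtree_mask: u32) -> Vec<u32> {
+        let mut fired = Vec::new();
+        tree.top_en &= !subtree_mask; // 1. UNARM (models TOP_EN_CLEAR)
+        let pending = tree.top() & subtree_mask; // 2. READ
+        for i in 0..tree.leaves.len() {
+            // 3. ACK: only descend into leaves of pending subtrees.
+            if (pending & (1u32 << (i / 2))) == 0 {
+                continue;
+            }
+            let mask = tree.leaves[i];
+            tree.leaves[i] &= !mask; // W1C of the full leaf mask
+            for bit in 0..32u32 {
+                if (mask & (1u32 << bit)) != 0 {
+                    fired.push((i as u32) * 32 + bit);
+                }
+            }
+        }
+        tree.top_en |= subtree_mask; // 4. REARM (models TOP_EN_SET)
+        fired
+    }
+
+Note that the ack clears the whole snapshot ``mask``, including bits
+for which no handler exists, which is the first property listed above.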
+
+Edge-trigger and rearm semantics
+================================
+
+Each LEAF bit is a sticky latch with edge-triggered SET behaviour:
+
+* The latch SETS on the rising edge of the source signal (an engine
+  message arriving on the interrupt control command interface, or a falcon
+  output wire transitioning low->high).
+* The latch CLEARS only when the host writes a 1 to that bit (W1C).
+* A still-asserted source does **not** re-set the latch. There is no
+  way to make a level-asserted signal "re-fire" except to drop and
+  re-raise it.
+
+There are two distinct rescue mechanisms, at two different layers,
+for two different problems. They are easy to confuse, and both depend
+entirely on how the following pieces of hardware are wired together,
+so first some vocabulary:
+
+* ``LEAF[i]``: each bit is a *sticky latch*: bit ``b`` SETs on the
+  rising edge of an ``all2ctrl_intr_cmd`` message and CLEARs only
+  when the host writes 1 to that bit (W1C ack).
+
+* ``TOP[N]``: bit ``N`` of the read-only ``TOP`` register. Purely
+  combinational: it reads 1 if and only if at least one bit is
+  latched in either of the two leaves owned by subtree ``N``,
+  i.e. ``LEAF[2N]`` or ``LEAF[2N+1]``. Software cannot write ``TOP``;
+  the hardware tracks the leaves automatically.
+
+* ``TOP_EN[N]``: a single host-controlled "armed?" bit per subtree,
+  internal to INTR_CTRL. The host *sets* it by writing 1 to
+  ``TOP_EN_SET`` and *clears* it by writing 1 to ``TOP_EN_CLEAR``.
+  Reading either register returns the current ``TOP_EN`` bitmask.
+  ``TOP_EN`` is not a latch; it just remembers what the host last set
+  or cleared.
+
+* The **MSI-edge AND-gate**: one per subtree, internal to ``INTR_CTRL``.
+  It ANDs ``TOP[N]`` with ``TOP_EN[N]`` and drives the output
+  through an edge detector. An MSI for subtree ``N`` is delivered
+  on every *rising edge* of this AND output; level changes that
+  drop the output to 0 (for any reason) deliver no MSI.
+
+::
+
+       LEAF[2N], LEAF[2N+1]         (sticky latches; W1C to clear)
+              |
+              v
+       (OR of all 64 latched bits in subtree N)
+              |
+              v
+        TOP[N] ----+
+                   |
+                   AND ---(rising edge detector)---> MSI for subtree N
+                   |
+       TOP_EN[N] --+
+              ^
+              |   host pokes:
+              |     write 1 to TOP_EN_SET[N]   -> TOP_EN[N] becomes 1
+              |     write 1 to TOP_EN_CLEAR[N] -> TOP_EN[N] becomes 0
+
+With that in hand:
+
+1. **REARM** (writing the subtree mask to ``TOP_EN_SET``) rescues a
+   timing race: between the ISR's last leaf ack and the moment
+   ``TOP_EN`` is brought back high, *new* engine events can arrive
+   and latch fresh leaf bits. The ISR drains everything visible at
+   the time of its W1C, but the W1C only clears the bits the ISR
+   snapshotted; anything the engine fires afterwards sets new bits
+   in ``LEAF[i]`` that the ISR never saw. Those bits hold ``TOP[N]``
+   at 1, so when REARM raises ``TOP_EN[N]`` the AND output rises
+   and a new MSI is delivered for them.
+
+2. **INTR_RETRIGGER** (a per-engine register, not part of INTR_CTRL)
+   rescues a still-asserted level source *inside an engine*. Most
+   engines drive their internal "interrupt pending" signal as a
+   level and convert it to an ``all2ctrl_intr_cmd`` message via an
+   edge converter that fires only on the rising edge of that level.
+   So one rising edge of the engine's level produces one message,
+   which sets one leaf bit. After the host's W1C clears that leaf
+   bit, a level that has stayed high produces no new edge, so the
+   engine's edge converter never sends another message to INTR_CTRL,
+   the leaf stays clear, and ``TOP[N]`` is 0. REARM's AND-gate trick
+   is useless here. Writing 1 to the engine's
+   ``INTR_RETRIGGER`` register drops the engine's level for one
+   clock cycle; the level then returns to 1 (the engine still has
+   work pending in its source register), the edge converter sees a
+   fresh 0->1 transition, sends a new message, the leaf re-latches,
+   ``TOP[N]`` goes back to 1, and an MSI follows on REARM (or
+   immediately, if ``TOP_EN[N]`` was already 1). ``INTR_RETRIGGER``
+   bridges the asymmetry between level-asserted internal engine
+   logic and edge-driven ``INTR_CTRL`` leaf messages.
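+
+The difference between the two cases can be seen in a small model of the
+per-subtree AND-gate and edge detector described above. This is only a
+sketch: ``SubtreeGate`` and its methods are hypothetical names, and the
+model tracks a single subtree with booleans instead of real registers:
+
+.. code-block:: rust
+
+    /// Model of one subtree's MSI gate: TOP[N] AND TOP_EN[N] into an edge detector.
+    #[derive(Default)]
+    pub struct SubtreeGate {
+        top: bool,      // at least one leaf bit latched in this subtree
+        top_en: bool,   // subtree armed by the host
+        prev_out: bool, // previous AND output seen by the edge detector
+    }
+
+    impl SubtreeGate {
+        /// Re-evaluates the gate; returns true only on a rising edge (MSI sent).
+        fn eval(&mut self) -> bool {
+            let out = self.top && self.top_en;
+            let msi = out && !self.prev_out;
+            self.prev_out = out;
+            msi
+        }
+
+        /// Models the leaves of this subtree becoming pending or fully acked.
+        pub fn set_top(&mut self, pending: bool) -> bool {
+            self.top = pending;
+            self.eval()
+        }
+
+        /// Models a write to TOP_EN_SET (true) or TOP_EN_CLEAR (false).
+        pub fn set_top_en(&mut self, armed: bool) -> bool {
+            self.top_en = armed;
+            self.eval()
+        }
+    }
+
+In this model, a leaf bit that latches while the subtree is unarmed
+produces an MSI as soon as ``set_top_en(true)`` is called, which is the
+REARM case. If nothing re-latches after the ack, ``set_top_en(true)``
+produces no MSI, which is the situation ``INTR_RETRIGGER`` addresses.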
+
+CPU doorbell self-test
+======================
+
+INTR_CTRL exposes a software-trigger register, ``NV_VF_INTR_LEAF_TRIGGER``.
+Writing a vector number ``v`` to this register synthesizes a hardware
+interrupt event on vector ``v``: the matching leaf bit latches, TOP
+updates, and (assuming the subtree is armed and the leaf vector is
+enabled) an MSI is delivered to the host.
+
+nova-core uses vector 129 (``LEAF[4]`` bit 1) as a self-test "doorbell":
+during early initialization, the driver registers a temporary ISR for vector
+129, writes 129 to ``LEAF_TRIGGER``, and verifies that its ISR fires.
+This validates the entire MSI -> INTR_CTRL -> ISR path *without* needing
+the GSP firmware to be running, which makes it useful for debugging early
+PCI / MSI issues, VFIO passthrough setups, and testing when GSP is not yet
+available.
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index e39cb3163581..1ea111988e35 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/devinit
    core/fwsec
    core/falcon
+   core/intr-ctrl
-- 
2.34.1


Thread overview: 10+ messages
2026-05-01 20:58 [PATCH v1 0/7] gpu: nova-core: add INTR_CTRL interrupt controller and CPU doorbell self-test Joel Fernandes
2026-05-01 20:58 ` [PATCH v1 1/7] rust: sync: completion: add wait_for_completion_timeout() Joel Fernandes
2026-05-05 12:17   ` Miguel Ojeda
2026-05-05 20:19     ` Joel Fernandes
2026-05-01 20:58 ` [PATCH v1 2/7] gpu: nova-core: allocate PCI MSI vector during probe Joel Fernandes
2026-05-01 20:58 ` [PATCH v1 3/7] gpu: nova-core: add interrupt controller register definitions Joel Fernandes
2026-05-01 20:58 ` [PATCH v1 4/7] gpu: nova-core: add Architecture::is_pre_hopper() helper Joel Fernandes
2026-05-01 20:58 ` [PATCH v1 5/7] gpu: nova-core: add INTR_CTRL interrupt controller API Joel Fernandes
2026-05-01 20:58 ` [PATCH v1 6/7] gpu: nova-core: add CPU doorbell IRQ self-test Joel Fernandes
2026-05-01 20:58 ` Joel Fernandes [this message]