From: Alexandre Courbot <acourbot@nvidia.com>
To: Danilo Krummrich <dakr@kernel.org>,
Alice Ryhl <aliceryhl@google.com>,
David Airlie <airlied@gmail.com>,
Simona Vetter <simona@ffwll.ch>
Cc: John Hubbard <jhubbard@nvidia.com>,
Alistair Popple <apopple@nvidia.com>,
Timur Tabi <ttabi@nvidia.com>,
Eliot Courtney <ecourtney@nvidia.com>,
nova-gpu@lists.linux.dev, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
Alexandre Courbot <acourbot@nvidia.com>,
Gary Guo <gary@garyguo.net>
Subject: [PATCH v5 0/7] gpu: nova-core: run unload sequence upon unbinding
Date: Fri, 15 May 2026 15:12:26 +0900
Message-ID: <20260515-nova-unload-v5-0-c4d6250ad160@nvidia.com>

Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. Hence the current requirement to reset
  the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
  which the kernel might recycle.

On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.

This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is basic and
will be added alongside the Blackwell series if this can be merged
first.
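
As a rough illustration of the sequence this series introduces, the
standalone Rust sketch below models unbinding as three steps: notify the
GSP with `UNLOADING_GUEST_DRIVER`, run Booter Unloader, then run FWSEC-SB,
using firmware captured at probe time. It is not the nova-core code:
`UnloadBundle`, `Step`, `Gpu`, `MockGpu` and `unbind` are hypothetical
names, the relative order of the last two steps is only assumed from the
patch title, and continuing after a failed step is likewise an assumption.

```rust
/// Hypothetical container for the firmware needed at unload time. It is
/// filled at probe time, since filesystem access cannot be guaranteed later.
struct UnloadBundle {
    booter_unloader: Vec<u8>,
    fwsec_sb: Vec<u8>,
}

/// The steps of the assumed unload sequence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    NotifyGsp,
    BooterUnloader,
    FwsecSb,
}

/// Hypothetical hardware abstraction; the real driver talks to the GPU.
trait Gpu {
    fn notify_gsp_unloading(&mut self) -> Result<(), String>;
    fn run_booter_unloader(&mut self, fw: &[u8]) -> Result<(), String>;
    fn run_fwsec_sb(&mut self, fw: &[u8]) -> Result<(), String>;
}

/// Runs every step even if an earlier one fails (assumption: unbind itself
/// cannot fail, so we try to leave the GPU as clean as possible) and
/// returns the list of steps that reported an error.
fn unbind(gpu: &mut impl Gpu, bundle: &UnloadBundle) -> Vec<Step> {
    let mut failed = Vec::new();

    if let Err(e) = gpu.notify_gsp_unloading() {
        eprintln!("GSP notification failed: {e}");
        failed.push(Step::NotifyGsp);
    }
    if let Err(e) = gpu.run_booter_unloader(&bundle.booter_unloader) {
        eprintln!("Booter Unloader failed: {e}");
        failed.push(Step::BooterUnloader);
    }
    if let Err(e) = gpu.run_fwsec_sb(&bundle.fwsec_sb) {
        eprintln!("FWSEC-SB failed: {e}");
        failed.push(Step::FwsecSb);
    }

    failed
}

/// Mock GPU that records the steps it was asked to run and can be told to
/// fail on one of them.
struct MockGpu {
    log: Vec<Step>,
    fail_on: Option<Step>,
}

impl MockGpu {
    fn record(&mut self, step: Step) -> Result<(), String> {
        self.log.push(step);
        if self.fail_on == Some(step) {
            Err(format!("{step:?} reported an error"))
        } else {
            Ok(())
        }
    }
}

impl Gpu for MockGpu {
    fn notify_gsp_unloading(&mut self) -> Result<(), String> {
        self.record(Step::NotifyGsp)
    }
    fn run_booter_unloader(&mut self, _fw: &[u8]) -> Result<(), String> {
        self.record(Step::BooterUnloader)
    }
    fn run_fwsec_sb(&mut self, _fw: &[u8]) -> Result<(), String> {
        self.record(Step::FwsecSb)
    }
}

fn main() {
    // Placeholder firmware bytes standing in for the images saved at probe.
    let bundle = UnloadBundle {
        booter_unloader: vec![0u8; 4],
        fwsec_sb: vec![0u8; 4],
    };
    let mut gpu = MockGpu { log: Vec::new(), fail_on: Some(Step::BooterUnloader) };

    let failed = unbind(&mut gpu, &bundle);
    println!("steps run: {:?}, steps failed: {:?}", gpu.log, failed);
}
```

The point of the sketch is only the shape of the flow: everything needed
for teardown is gathered while probing, and teardown keeps going so that a
single failing step does not leave the remaining cleanup undone.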

This revision rebases on top of the Device HRT series [1] and addresses
the minor feedback received on v4. A branch with the series and its
required dependencies is available at [2].

[1] https://lore.kernel.org/20260506215113.851360-1-dakr@kernel.org
[2] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in unbind()".
- Link to v4: https://patch.msgid.link/20260427-nova-unload-v4-0-e145ccddae66@nvidia.com

Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
  distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
  Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
  in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
  into an unload bundle, as we cannot guarantee filesystem access at
  unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
  Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
  that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
  shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
  value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe54963af8b@nvidia.com

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com

---
Alexandre Courbot (6):
      gpu: nova-core: remove unneeded get_gsp_info proxy function
      gpu: nova-core: do not import firmware commands into GSP command module
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding

John Hubbard (1):
      gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()

 drivers/gpu/nova-core/driver.rs                   |   4 +
 drivers/gpu/nova-core/firmware/booter.rs          |  31 +-
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |   7 +
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 252 +++++-----------
 drivers/gpu/nova-core/gsp/commands.rs             |  71 +++--
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  44 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  92 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  52 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 351 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 14 files changed, 736 insertions(+), 193 deletions(-)
---
base-commit: 84d984f9fe9363f4700e20f7c95b2da67fb2fe63
change-id: 20251216-nova-unload-4029b3b76950
Best regards,
--
Alexandre Courbot <acourbot@nvidia.com>