public inbox for rust-for-linux@vger.kernel.org
* [PATCH v4 0/8] gpu: nova-core: run unload sequence upon unbinding
@ 2026-04-27  6:56 Alexandre Courbot
From: Alexandre Courbot @ 2026-04-27  6:56 UTC
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Joel Fernandes, Timur Tabi,
	Eliot Courtney, nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. Hence the current requirement to reset
  the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
  which the kernel might recycle.

On top of that, a bug in the Blackwell VBIOS sometimes leaves the GPU
unusable after a PCI reset, requiring a reboot. So relying on a PCI
reset to unload and reload Nova is not practical here.

This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is trivial
and will be added alongside the Blackwell series [1] if this can be
merged first.

Besides addressing the feedback received on v3, this revision is a
considerable rework. The `warn_on_err` convenience macro needs more
work, and it only brought a small improvement anyway, so it has been
dropped.

Since this series touches the GSP boot code, which the Blackwell series
[1] will also change further, I have decided to proactively split that
boot code into a HAL. This avoids having the two boot methods in the
same Rust module, distinguished only by conditionals; it is also useful
for this series, as it lets us abstract the unload bundle.
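A minimal userspace Rust sketch of such a HAL, assuming one trait
implemented once per boot method and chosen once from the chipset
family, so that the shared boot code calls through the trait instead of
testing the chipset. All names (`Arch`, `GspHal`, `Tu102Hal`,
`Gh100Hal`, `gsp_hal`) are hypothetical and do not come from the
patches, the method bodies are stubs, and routing Ampere through the
TU102/SEC2 path is an assumption made for the example.

```rust
/// Hypothetical GPU architecture families, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Arch {
    Turing,
    Ampere,
    Hopper,
}

/// Hypothetical HAL: the boot steps that differ between boot methods.
pub trait GspHal {
    /// Name of the boot path, e.g. for log messages.
    fn boot_method(&self) -> &'static str;
    /// Run the chipset-specific part of the GSP boot (stubbed here).
    fn boot(&self) -> Result<(), String>;
}

/// SEC2/Booter-based boot path (the existing TU102 sequence).
pub struct Tu102Hal;

/// FSP-based boot path (Hopper and later).
pub struct Gh100Hal;

impl GspHal for Tu102Hal {
    fn boot_method(&self) -> &'static str {
        "SEC2/Booter"
    }

    fn boot(&self) -> Result<(), String> {
        // A real implementation would load and run Booter through SEC2 here.
        Ok(())
    }
}

impl GspHal for Gh100Hal {
    fn boot_method(&self) -> &'static str {
        "FSP"
    }

    fn boot(&self) -> Result<(), String> {
        // Placeholder: the FSP path is outside the scope of this sketch.
        Err("FSP boot not implemented in this sketch".to_string())
    }
}

/// Pick the HAL once; the shared boot code then calls through the trait
/// instead of testing the chipset with conditionals.
pub fn gsp_hal(arch: Arch) -> &'static dyn GspHal {
    match arch {
        // Assumption for this sketch: Ampere shares the Turing SEC2/Booter path.
        Arch::Turing | Arch::Ampere => &Tu102Hal,
        Arch::Hopper => &Gh100Hal,
    }
}
```

Returning `&'static dyn GspHal` keeps the chipset test in a single
function; whether the real HAL uses trait objects, generics or something
else is not something this sketch tries to reflect.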

Indeed, as Sachiko correctly pointed out, we cannot guarantee filesystem
access when the driver detaches. So this revision now prepares all the
firmware needed for unloading at load time and stores it in an unload
bundle, which is executed when the driver unbinds.
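The prepare-at-probe, run-at-unbind split can be sketched in userspace
Rust as follows. `UnloadBundle`, `FirmwareImage` and the firmware names
`booter_unload` and `fwsec_sb` are hypothetical placeholders, the `load`
closure stands in for whatever firmware-loading API the driver actually
uses, and running Booter Unloader before FWSEC-SB is assumed from the
title of patch 8 rather than taken from the patch itself.

```rust
/// A firmware image already read into memory (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FirmwareImage {
    pub name: &'static str,
    pub data: Vec<u8>,
}

/// Everything the unbind path needs, gathered while the filesystem is
/// still reachable (hypothetical type).
#[derive(Debug)]
pub struct UnloadBundle {
    booter_unloader: FirmwareImage,
    fwsec_sb: FirmwareImage,
}

impl UnloadBundle {
    /// Probe time: fetch every image through `load`, failing the probe if
    /// any of them is missing.
    pub fn prepare<L>(mut load: L) -> Result<Self, String>
    where
        L: FnMut(&'static str) -> Option<Vec<u8>>,
    {
        let mut fetch = |name: &'static str| {
            load(name)
                .map(|data| FirmwareImage { name, data })
                .ok_or_else(|| format!("missing firmware: {name}"))
        };
        Ok(Self {
            booter_unloader: fetch("booter_unload")?,
            fwsec_sb: fetch("fwsec_sb")?,
        })
    }

    /// Unbind time: hand the in-memory images to `exec` in order, without
    /// touching the filesystem.
    pub fn run<E>(&self, mut exec: E) -> Result<(), String>
    where
        E: FnMut(&FirmwareImage) -> Result<(), String>,
    {
        exec(&self.booter_unloader)?;
        exec(&self.fwsec_sb)
    }
}
```

Because `prepare` fails if any image is missing, the error surfaces at
probe time, while the filesystem is reachable, rather than halfway
through unbind.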

The GSP HAL work might look a bit premature, but the Blackwell series
will make full use of it, and doing it now puts everything in the right
place from the start instead of moving things around later.

This series is based on `drm-rust-next` with the GA100 support series
[2] and the first patches of the Blackwell series [1] applied. A branch
with the series and its required dependencies is available at [3].

[1] https://lore.kernel.org/all/20260411024953.473149-1-jhubbard@nvidia.com/
[2] https://lore.kernel.org/all/20260417191359.1307434-1-ttabi@nvidia.com/
[3] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
  distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
  Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
  in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
  into an unload bundle, as we cannot guarantee filesystem access at
  unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
  Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com
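
The Booter Unloader mailbox handling noted above (the `0xff` sentinel in
`mbox1`, and, per the v3 notes, a check of `mbox0` on return) might look
roughly like the sketch below. The `Mailboxes` struct and the
`run_booter_unloader` helper are hypothetical, writing the sentinel into
`mbox0` as well is an assumption, and so is treating `0` in `mbox0` as
success; the patch may check a different value.

```rust
/// Sentinel written into the mailboxes before starting Booter Unloader
/// (the value comes from the changelog above).
pub const MBOX_SENTINEL: u32 = 0xff;

/// Hypothetical stand-in for the two falcon mailbox registers.
#[derive(Debug, Default)]
pub struct Mailboxes {
    pub mbox0: u32,
    pub mbox1: u32,
}

/// Hypothetical helper: primes both mailboxes with the sentinel, runs the
/// firmware through `run`, then inspects `mbox0`.
pub fn run_booter_unloader<F>(mbox: &mut Mailboxes, run: F) -> Result<(), String>
where
    F: FnOnce(&mut Mailboxes),
{
    mbox.mbox0 = MBOX_SENTINEL;
    mbox.mbox1 = MBOX_SENTINEL;

    run(&mut *mbox);

    match mbox.mbox0 {
        // Assumed success code for this sketch.
        0 => Ok(()),
        // Sentinel still in place: the firmware never reported anything.
        MBOX_SENTINEL => Err("Booter Unloader did not update mbox0".to_string()),
        v => Err(format!("Booter Unloader failed: mbox0 = {v:#x}")),
    }
}
```

Keeping a distinct sentinel lets the log tell "the firmware never ran"
apart from "the firmware ran and reported an error".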

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
  that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
  shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
  value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe54963af8b@nvidia.com
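
Two of the v3 changes are easy to show in isolation: encoding the
shutdown power level as an enum rather than a `bool`, and testing a
single status bit rather than comparing the whole register value. In
this sketch, `PowerLevel`, its variants, the `LEVEL_*` values and
`HALTED_BIT` are all hypothetical placeholders; the real raw values
would come from the generated bindings, and the real bit position is not
given here.

```rust
/// Placeholder raw values, not the real binding constants.
pub const LEVEL_UNLOAD: u32 = 0;
pub const LEVEL_SUSPEND: u32 = 1;

/// Power level requested when shutting down the GSP. Unlike a `bool`,
/// each call site names the level it asks for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PowerLevel {
    /// The driver is being unbound.
    Unload,
    /// The GPU is entering a power-management transition.
    Suspend,
}

impl PowerLevel {
    /// Raw value to place in the shutdown command.
    pub fn raw(self) -> u32 {
        match self {
            PowerLevel::Unload => LEVEL_UNLOAD,
            PowerLevel::Suspend => LEVEL_SUSPEND,
        }
    }
}

/// Placeholder position of a "halted" flag in a status register.
pub const HALTED_BIT: u32 = 1 << 4;

/// Tests only the halted bit; comparing the whole register against a
/// value would give the wrong answer as soon as an unrelated bit is set.
pub fn gsp_halted(status: u32) -> bool {
    status & HALTED_BIT != 0
}
```

With a hypothetical `shutdown(level: PowerLevel)`, a call such as
`shutdown(PowerLevel::Unload)` documents itself, whereas
`shutdown(false)` sends the reader looking for what `false` means.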

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com

---
Alexandre Courbot (7):
      gpu: nova-core: remove unneeded get_gsp_info proxy function
      gpu: nova-core: do not import firmware commands into GSP command module
      gpu: nova-core: split BAR acquisition in unbind()
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding

John Hubbard (1):
      gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()

 drivers/gpu/nova-core/firmware/booter.rs          |  31 +-
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |  21 +-
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 252 +++++-----------
 drivers/gpu/nova-core/gsp/commands.rs             |  71 +++--
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  44 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  92 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  52 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 351 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 13 files changed, 741 insertions(+), 198 deletions(-)
---
base-commit: 1affa624811b93a2df098c1d141a33c7c134d43f
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
--  
Alexandre Courbot <acourbot@nvidia.com>

