public inbox for rust-for-linux@vger.kernel.org
* [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support
@ 2026-04-24 23:38 Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 01/20] drm/tyr: remove unused device from platform data Deborah Brouwer
                   ` (20 more replies)
  0 siblings, 21 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:38 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

This series adds firmware loading and MCU boot support to the Tyr DRM
driver. It includes:
 - A parser for the Mali CSF firmware binary format
 - A kernel-managed BO type (KernelBo) for internal driver allocations
 - GPU virtual memory (VM) integration using drm_gpuvm
 - An MMU module and a generic slot manager
 - Shmem-backed GEM support for Tyr
 - Loading firmware, VM activation, and MCU boot at probe()
 - Initialization of Command Stream Frontend (CSF) firmware interfaces

Dependencies:
 - [PATCH v12 0/5] Rust bindings for gem shmem
    https://lore.kernel.org/rust-for-linux/20260421235346.672794-1-lyude@redhat.com

 - [PATCH v6 0/5] Rust GPUVM immediate mode
    https://lore.kernel.org/rust-for-linux/20260409-gpuvm-rust-v6-0-b16e6ada7261@google.com/

 - [PATCH v6 0/5] Introduce DeviceContext
    https://lore.kernel.org/rust-for-linux/20260320233645.950190-1-lyude@redhat.com/

 - [PATCH v5 0/6] drm/tyr: Use register! macro
    https://lore.kernel.org/rust-for-linux/20260409-b4-tyr-use-register-macro-v5-v5-0-8abfff8a0204@collabora.com/

Other Prerequisites:
 This series also depends on additional prerequisite fixes not included in
 this posting. The full stack (base + prerequisites + this series) is
 available here:
  https://gitlab.freedesktop.org/dbrouwer/linux/-/tree/dbrouwer/fw-boot
    
  Development history / discussion:
   https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/56
    

---
Changes in v4:
 New commits:
  - drm/tyr: program CSF global interface
  - rust: time: add arch_timer_get_rate wrapper
  - drm/tyr: add CSF firmware interface support
  - drm/tyr: validate presence of CSF shared section
  - drm/tyr: wait for global interface readiness
  - drm/tyr: add Job IRQ handling
  - drm/tyr: add Wait type for GPU events

 The existing commits from v3 remain unchanged.
 - Link to v3: https://lore.kernel.org/r/20260413-b4-fw-boot-v3-v3-0-b422f3c03885@collabora.com
    
Changes in v3:
 New commits:
  - drm/tyr: remove unused device from platform data
  - drm/tyr: use shmem GEM object type in TyrDrmDriver
    
 drm/tyr: select required dependencies in Kconfig
  - Rename commit since the dependencies are not limited to DRM.
  - Select new RUST_DRM_GEM_SHMEM_HELPER instead of DRM_GEM_SHMEM_HELPER.
    
 drm/tyr: set DMA mask using GPU physical address
  - Use register macro to read pa_bits instead of separate helper function.
    
 drm/tyr: add MMU module
  - Switch MMU code to typed register APIs (TRANSCFG, MEMATTR, STATUS, LOCKADDR, etc.).
  - Use MmuCommand enum for MMU commands instead of raw constants.
  - Minor cleanups and renaming (MAX_AS, AS_PRESENT handling).
    
 drm/tyr: add GPU virtual memory module
  - Extract VA/PA bits via typed MMU_FEATURES register.
  - Update the VM code to match the new GPUVM v6 and shmem GEM v10 APIs.
    
 drm/tyr: add a kernel buffer object
  - Reject zero-sized KernelBo allocations up front.
    
 drm/tyr: add firmware loading and MCU boot support
  - Use typed GPU control registers.
  - Pass iomem by Arc into Firmware::new() since we store it eventually.
    
  - Link to v2: https://lore.kernel.org/rust-for-linux/20260302232500.244489-1-deborah.brouwer@collabora.com/
    
Changes in v2:
 - The whole series is rebased on drm-rust-next including v7.0-rc1.
 - Each patch has its own changelog.
    
 - Link to v1: https://lore.kernel.org/rust-for-linux/20260212013713.304343-1-deborah.brouwer@collabora.com/

Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>

---
Alvin Sun (1):
      drm/tyr: use shmem GEM object type in TyrDrmDriver

Beata Michalska (1):
      drm/tyr: set DMA mask using GPU physical address

Boris Brezillon (5):
      drm/tyr: select required dependencies in Kconfig
      drm/tyr: rename TyrObject to BoData
      drm/tyr: Add generic slot manager
      drm/tyr: add MMU module
      drm/tyr: add GPU virtual memory module

Daniel Almeida (1):
      drm/tyr: add parser for firmware binary

Deborah Brouwer (12):
      drm/tyr: remove unused device from platform data
      drm/tyr: move clock cleanup into Clocks Drop impl
      drm/tyr: add shmem backing for GEM objects
      drm/tyr: add a kernel buffer object
      drm/tyr: add firmware loading and MCU boot support
      drm/tyr: add Wait type for GPU events
      drm/tyr: add Job IRQ handling
      drm/tyr: wait for global interface readiness
      drm/tyr: validate presence of CSF shared section
      drm/tyr: add CSF firmware interface support
      rust: time: add arch_timer_get_rate wrapper
      drm/tyr: program CSF global interface

 drivers/gpu/drm/tyr/Kconfig              |   15 +-
 drivers/gpu/drm/tyr/driver.rs            |  153 +-
 drivers/gpu/drm/tyr/fw.rs                |  352 +++++
 drivers/gpu/drm/tyr/fw/interfaces.rs     | 2243 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/fw/irq.rs            |  120 ++
 drivers/gpu/drm/tyr/fw/parser.rs         |  532 +++++++
 drivers/gpu/drm/tyr/gem.rs               |  164 ++-
 drivers/gpu/drm/tyr/mmu.rs               |  127 ++
 drivers/gpu/drm/tyr/mmu/address_space.rs |  571 ++++++++
 drivers/gpu/drm/tyr/regs.rs              |  110 ++
 drivers/gpu/drm/tyr/slot.rs              |  436 ++++++
 drivers/gpu/drm/tyr/tyr.rs               |    5 +
 drivers/gpu/drm/tyr/vm.rs                |  805 +++++++++++
 drivers/gpu/drm/tyr/wait.rs              |  125 ++
 rust/kernel/time.rs                      |   29 +
 15 files changed, 5752 insertions(+), 35 deletions(-)
---
base-commit: 52c3c3b7eb11d596526e523bf57f8b3cbdcb24d8
change-id: 20260424-b4-fw-boot-v4-4b5f09bf13e8

Best regards,
-- 
Deborah Brouwer <deborah.brouwer@collabora.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v4 01/20] drm/tyr: remove unused device from platform data
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
@ 2026-04-24 23:38 ` Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 02/20] drm/tyr: select required dependencies in Kconfig Deborah Brouwer
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:38 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

TyrPlatformDriverData stores an ARef to the DRM device to keep it alive,
but after switching to Registration::new_foreign_owned(), the registration
owns the device, so the extra reference is no longer needed.

Remove the device field and return an empty TyrPlatformDriverData instead.

Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 0a1cfbad1e6c..7a27207e52f7 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -52,9 +52,7 @@
 pub(crate) type TyrDrmDevice<Ctx = drm::Registered> = drm::Device<TyrDrmDriver, Ctx>;
 
 #[pin_data(PinnedDrop)]
-pub(crate) struct TyrPlatformDriverData {
-    _device: ARef<TyrDrmDevice>,
-}
+pub(crate) struct TyrPlatformDriverData;
 
 #[pin_data(PinnedDrop)]
 pub(crate) struct TyrDrmDeviceData {
@@ -157,15 +155,12 @@ fn probe(
                 gpu_info,
         });
 
-        let ddev = Registration::new_foreign_owned(uninit_ddev, pdev.as_ref(), data, 0)?;
-        let driver = TyrPlatformDriverData {
-            _device: ddev.into(),
-        };
+        Registration::new_foreign_owned(uninit_ddev, pdev.as_ref(), data, 0)?;
 
         // We need this to be dev_info!() because dev_dbg!() does not work at
         // all in Rust for now, and we need to see whether probe succeeded.
         dev_info!(pdev, "Tyr initialized correctly.\n");
-        Ok(driver)
+        Ok(TyrPlatformDriverData)
     }
 }
 

-- 
2.53.0



* [PATCH v4 02/20] drm/tyr: select required dependencies in Kconfig
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 01/20] drm/tyr: remove unused device from platform data Deborah Brouwer
@ 2026-04-24 23:38 ` Deborah Brouwer
  2026-04-27  7:23   ` Boris Brezillon
  2026-04-24 23:38 ` [PATCH v4 03/20] drm/tyr: move clock cleanup into Clocks Drop impl Deborah Brouwer
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:38 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Boris Brezillon <boris.brezillon@collabora.com>

Tyr depends on DRM_GPUVM, RUST_DRM_GEM_SHMEM_HELPER, MMU, IOMMU_SUPPORT,
and IOMMU_IO_PGTABLE_LPAE. Select or depend on these symbols in Kconfig so
the required infrastructure is enabled when Tyr is built.

Introduce DRM_TYR_STATIC_DEPS to keep the built-in DRM dependencies
selected even when Tyr is built as a module.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/Kconfig | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tyr/Kconfig b/drivers/gpu/drm/tyr/Kconfig
index e933e6478027..443ce988b570 100644
--- a/drivers/gpu/drm/tyr/Kconfig
+++ b/drivers/gpu/drm/tyr/Kconfig
@@ -1,5 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0 or MIT
 
+config DRM_TYR_STATIC_DEPS
+	bool
+	select DRM_GPUVM
+	help
+	  Ensure required DRM infrastructure is built-in when enabling Tyr
+	  even if Tyr is =m
+
 config DRM_TYR
 	tristate "Tyr (Rust DRM support for ARM Mali CSF-based GPUs)"
 	depends on DRM=y
@@ -7,6 +14,11 @@ config DRM_TYR
 	depends on ARM || ARM64 || COMPILE_TEST
 	depends on !GENERIC_ATOMIC64  # for IOMMU_IO_PGTABLE_LPAE
 	depends on COMMON_CLK
+	depends on MMU
+	select DRM_TYR_STATIC_DEPS
+	select IOMMU_IO_PGTABLE_LPAE
+	select RUST_DRM_GEM_SHMEM_HELPER
+	depends on IOMMU_SUPPORT
 	default n
 	help
 	  Rust DRM driver for ARM Mali CSF-based GPUs.
@@ -16,5 +28,5 @@ config DRM_TYR
 	  Note that the Mali-G68 and Mali-G78, while Valhall architecture, will
 	  be supported with the panfrost driver as they are not CSF GPUs.
 
-	  if M is selected, the module will be called tyr. This driver is work
+	  If M is selected, the module will be called tyr. This driver is work
 	  in progress and may not be functional.

-- 
2.53.0



* [PATCH v4 03/20] drm/tyr: move clock cleanup into Clocks Drop impl
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 01/20] drm/tyr: remove unused device from platform data Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 02/20] drm/tyr: select required dependencies in Kconfig Deborah Brouwer
@ 2026-04-24 23:38 ` Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 04/20] drm/tyr: rename TyrObject to BoData Deborah Brouwer
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:38 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Currently Tyr disables its clocks from TyrDrmDeviceData::drop(), which
causes them to be shut down before any other fields in TyrDrmDeviceData
are dropped. This prevents us from using the clocks when dropping the
other fields in TyrDrmDeviceData.

In order to better control when the clocks are dropped, move this cleanup
logic into a Drop implementation on the Clocks struct itself.

Since it serves no further purpose, remove the PinnedDrop implementation
for TyrDrmDeviceData.

Also, while here, remove the #[pin_data] annotation from both the Clocks
and Regulators structs, since neither of them needs this macro to create
structurally pinned fields.
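
The drop-ordering point above can be shown in plain Rust (a minimal
userspace sketch with illustrative names, not the kernel types): struct
fields are dropped in declaration order, so cleanup living in a field's own
Drop impl runs at a predictable point relative to sibling fields, rather
than before all of them as the outer PinnedDrop did.

```rust
use std::cell::RefCell;

// Each Drop impl records when it ran, so the order is observable.
struct Clocks<'a>(&'a RefCell<Vec<&'static str>>);
impl Drop for Clocks<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("clocks off");
    }
}

struct Mmu<'a>(&'a RefCell<Vec<&'static str>>);
impl Drop for Mmu<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("mmu torn down");
    }
}

// Fields are dropped in declaration order: `mmu` before `clks`, so the
// MMU teardown can still rely on enabled clocks.
struct DeviceData<'a> {
    mmu: Mmu<'a>,
    clks: Clocks<'a>,
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _d = DeviceData {
            mmu: Mmu(&log),
            clks: Clocks(&log),
        };
    } // `_d` dropped here: mmu first, then clks.
    log.into_inner()
}

fn main() {
    assert_eq!(drop_order(), ["mmu torn down", "clocks off"]);
}
```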

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 7a27207e52f7..3057f3af10e3 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -54,7 +54,7 @@
 #[pin_data(PinnedDrop)]
 pub(crate) struct TyrPlatformDriverData;
 
-#[pin_data(PinnedDrop)]
+#[pin_data]
 pub(crate) struct TyrDrmDeviceData {
     pub(crate) pdev: ARef<platform::Device>,
 
@@ -169,17 +169,6 @@ impl PinnedDrop for TyrPlatformDriverData {
     fn drop(self: Pin<&mut Self>) {}
 }
 
-#[pinned_drop]
-impl PinnedDrop for TyrDrmDeviceData {
-    fn drop(self: Pin<&mut Self>) {
-        // TODO: the type-state pattern for Clks will fix this.
-        let clks = self.clks.lock();
-        clks.core.disable_unprepare();
-        clks.stacks.disable_unprepare();
-        clks.coregroup.disable_unprepare();
-    }
-}
-
 // We need to retain the name "panthor" to achieve drop-in compatibility with
 // the C driver in the userspace stack.
 const INFO: drm::DriverInfo = drm::DriverInfo {
@@ -203,14 +192,20 @@ impl drm::Driver for TyrDrmDriver {
     }
 }
 
-#[pin_data]
 struct Clocks {
     core: Clk,
     stacks: OptionalClk,
     coregroup: OptionalClk,
 }
 
-#[pin_data]
+impl Drop for Clocks {
+    fn drop(&mut self) {
+        self.core.disable_unprepare();
+        self.stacks.disable_unprepare();
+        self.coregroup.disable_unprepare();
+    }
+}
+
 struct Regulators {
     _mali: Regulator<regulator::Enabled>,
     _sram: Regulator<regulator::Enabled>,

-- 
2.53.0



* [PATCH v4 04/20] drm/tyr: rename TyrObject to BoData
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (2 preceding siblings ...)
  2026-04-24 23:38 ` [PATCH v4 03/20] drm/tyr: move clock cleanup into Clocks Drop impl Deborah Brouwer
@ 2026-04-24 23:38 ` Deborah Brouwer
  2026-04-24 23:38 ` [PATCH v4 05/20] drm/tyr: use shmem GEM object type in TyrDrmDriver Deborah Brouwer
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:38 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Boris Brezillon <boris.brezillon@collabora.com>

Currently the GEM inner driver data object is called `TyrObject` which
is a fairly generic name. To make the code easier to understand,
rename `TyrObject` to `BoData` so that the name better reflects its
role.

No functional change is intended.

Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs | 4 ++--
 drivers/gpu/drm/tyr/gem.rs    | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 3057f3af10e3..f2e7157221f1 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -38,7 +38,7 @@
 
 use crate::{
     file::TyrDrmFileData,
-    gem::TyrObject,
+    gem::BoData,
     gpu,
     gpu::GpuInfo,
     regs::gpu_control::*, //
@@ -183,7 +183,7 @@ fn drop(self: Pin<&mut Self>) {}
 impl drm::Driver for TyrDrmDriver {
     type Data = TyrDrmDeviceData;
     type File = TyrDrmFileData;
-    type Object<R: drm::DeviceContext> = drm::gem::Object<TyrObject, R>;
+    type Object<R: drm::DeviceContext> = drm::gem::Object<BoData, R>;
 
     const INFO: drm::DriverInfo = INFO;
 
diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
index fa8d663fb523..11951b507b18 100644
--- a/drivers/gpu/drm/tyr/gem.rs
+++ b/drivers/gpu/drm/tyr/gem.rs
@@ -15,9 +15,9 @@
 
 /// GEM Object inner driver data
 #[pin_data]
-pub(crate) struct TyrObject {}
+pub(crate) struct BoData {}
 
-impl gem::DriverObject for TyrObject {
+impl gem::DriverObject for BoData {
     type Driver = TyrDrmDriver;
     type Args = ();
 
@@ -26,6 +26,6 @@ fn new<Ctx: DeviceContext>(
         _size: usize,
         _args: Self::Args,
     ) -> impl PinInit<Self, Error> {
-        try_pin_init!(TyrObject {})
+        try_pin_init!(BoData {})
     }
 }

-- 
2.53.0



* [PATCH v4 05/20] drm/tyr: use shmem GEM object type in TyrDrmDriver
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (3 preceding siblings ...)
  2026-04-24 23:38 ` [PATCH v4 04/20] drm/tyr: rename TyrObject to BoData Deborah Brouwer
@ 2026-04-24 23:38 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 06/20] drm/tyr: set DMA mask using GPU physical address Deborah Brouwer
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:38 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Alvin Sun <alvin.sun@linux.dev>

Tyr buffer objects are shmem-backed, so the driver should use
drm::gem::shmem::Object<BoData> as its GEM object type instead of the base
drm::gem::Object<BoData, R> type.

Switching to the shmem GEM object type matches how Tyr allocates and
manages its buffer objects, and uses the shmem-specific GEM abstraction
provided by the DRM Rust bindings.

Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index f2e7157221f1..26a20e0ab91c 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -183,7 +183,7 @@ fn drop(self: Pin<&mut Self>) {}
 impl drm::Driver for TyrDrmDriver {
     type Data = TyrDrmDeviceData;
     type File = TyrDrmFileData;
-    type Object<R: drm::DeviceContext> = drm::gem::Object<BoData, R>;
+    type Object<R: drm::DeviceContext> = drm::gem::shmem::Object<BoData>;
 
     const INFO: drm::DriverInfo = INFO;
 

-- 
2.53.0



* [PATCH v4 06/20] drm/tyr: set DMA mask using GPU physical address
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (4 preceding siblings ...)
  2026-04-24 23:38 ` [PATCH v4 05/20] drm/tyr: use shmem GEM object type in TyrDrmDriver Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 07/20] drm/tyr: add shmem backing for GEM objects Deborah Brouwer
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Beata Michalska <beata.michalska@arm.com>

Configure the device DMA mask during probe using the GPU's physical
address capability reported in GpuInfo. This ensures DMA allocations
use an appropriate address mask.
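
As a rough sketch of what such a mask looks like (plain Rust, not the
kernel's DmaMask type; the function name is illustrative), a device that
can drive N physical address bits gets a mask with the low N bits set:

```rust
// Derive a DMA address mask from the number of physical address bits the
// device reports. Out-of-range bit counts are rejected, mirroring the
// fallible construction in the patch (`DmaMask::try_new(pa_bits)?`).
fn dma_mask(pa_bits: u32) -> Option<u64> {
    match pa_bits {
        0 => None,                           // a zero-bit mask is meaningless
        1..=63 => Some((1u64 << pa_bits) - 1),
        64 => Some(u64::MAX),
        _ => None,                           // cannot exceed the 64-bit bus width
    }
}

fn main() {
    // A GPU reporting 40 physical address bits can address 1 TiB.
    assert_eq!(dma_mask(40), Some((1u64 << 40) - 1));
    assert_eq!(dma_mask(64), Some(u64::MAX));
    assert_eq!(dma_mask(0), None);
}
```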

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 26a20e0ab91c..1f09f59e271a 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -11,6 +11,10 @@
         Device, //
     },
     devres::Devres,
+    dma::{
+        Device as DmaDevice,
+        DmaMask, //
+    },
     drm,
     drm::{
         driver::Registration,
@@ -138,6 +142,14 @@ fn probe(
         let gpu_info = GpuInfo::new(pdev.as_ref(), &iomem)?;
         gpu_info.log(pdev.as_ref());
 
+        let pa_bits = MMU_FEATURES::from_raw(gpu_info.mmu_features)
+            .pa_bits()
+            .get();
+        // SAFETY: No concurrent DMA allocations or mappings can be made because
+        // the device is still being probed and therefore isn't being used by
+        // other threads of execution.
+        unsafe { pdev.dma_set_mask_and_coherent(DmaMask::try_new(pa_bits)?)? };
+
         let uninit_ddev = UnregisteredDevice::<TyrDrmDriver>::new(pdev.as_ref())?;
         let platform: ARef<platform::Device> = pdev.into();
 

-- 
2.53.0



* [PATCH v4 07/20] drm/tyr: add shmem backing for GEM objects
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (5 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 06/20] drm/tyr: set DMA mask using GPU physical address Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 08/20] drm/tyr: Add generic slot manager Deborah Brouwer
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Add support for GEM buffer objects backed by shared memory.

This introduces the BoCreateArgs structure for passing creation parameters
including flags, and adds a flags field to BoData.
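
The args-type pattern used here can be sketched in plain Rust (an
illustrative trait standing in for gem::DriverObject, not the kernel one):
constructor parameters travel through an associated type, so each driver
defines its own creation arguments without changing the trait.

```rust
// Simplified stand-in for the gem::DriverObject trait: `Args` lets the
// implementor choose what data its constructor receives.
trait DriverObject: Sized {
    type Args;
    fn new(size: usize, args: Self::Args) -> Self;
}

/// Creation parameters for a buffer object, as in the patch.
struct BoCreateArgs {
    flags: u32,
}

/// Inner driver data for a GEM object.
struct BoData {
    flags: u32,
}

impl DriverObject for BoData {
    type Args = BoCreateArgs;
    fn new(_size: usize, args: BoCreateArgs) -> Self {
        BoData { flags: args.flags }
    }
}

fn main() {
    let bo = BoData::new(4096, BoCreateArgs { flags: 0b01 });
    assert_eq!(bo.flags, 0b01);
}
```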

Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/gem.rs | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
index 11951b507b18..c6d4d6f9bae3 100644
--- a/drivers/gpu/drm/tyr/gem.rs
+++ b/drivers/gpu/drm/tyr/gem.rs
@@ -1,4 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 or MIT
+//! GEM buffer object management for the Tyr driver.
+//!
+//! This module provides buffer object (BO) management functionality using
+//! DRM's GEM subsystem with shmem backing.
 
 use kernel::{
     drm::{
@@ -13,19 +17,27 @@
     TyrDrmDriver, //
 };
 
-/// GEM Object inner driver data
+/// Tyr's DriverObject type for GEM objects.
 #[pin_data]
-pub(crate) struct BoData {}
+pub(crate) struct BoData {
+    flags: u32,
+}
+
+/// Provides a way to pass arguments when creating BoData
+/// as required by the gem::DriverObject trait.
+pub(crate) struct BoCreateArgs {
+    flags: u32,
+}
 
 impl gem::DriverObject for BoData {
     type Driver = TyrDrmDriver;
-    type Args = ();
+    type Args = BoCreateArgs;
 
     fn new<Ctx: DeviceContext>(
         _dev: &TyrDrmDevice<Ctx>,
         _size: usize,
-        _args: Self::Args,
+        args: BoCreateArgs,
     ) -> impl PinInit<Self, Error> {
-        try_pin_init!(BoData {})
+        try_pin_init!(Self { flags: args.flags })
     }
 }

-- 
2.53.0



* [PATCH v4 08/20] drm/tyr: Add generic slot manager
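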
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (6 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 07/20] drm/tyr: add shmem backing for GEM objects Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 09/20] drm/tyr: add MMU module Deborah Brouwer
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Boris Brezillon <boris.brezillon@collabora.com>

Introduce a generic slot manager to dynamically allocate limited hardware
slots to software "seats". It can be used for both address space (AS) and
command stream group (CSG) slots.

The slot manager initially assigns seats to its free slots. It then
continues to reuse the same slot for a seat, as long as another seat
did not start to use the slot in the interim.

When contention arises because all of the slots are allocated, the slot
manager will lazily evict and reuse slots that have become idle (if any).

The seat state is protected using the LockedBy pattern with the same lock
that guards the SlotManager. This ensures the seat state stays consistent
across slot operations.

Hardware-specific behaviour can be customized using the slot manager's
`SlotOperations` trait.
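
The seqno scheme described above can be sketched in a few lines of plain
Rust (illustrative types, not the actual tyr ones): a seat's slot binding
is valid only while the slot's seqno still matches the seat's, so eviction
is detected lazily without back-pointers from slots to seats.

```rust
// A hardware slot stamped with the seqno of its current binding.
struct SlotInfo {
    seqno: u64,
}

// A seat remembering which slot it was bound to, and when.
struct SeatInfo {
    slot: usize,
    seqno: u64,
}

struct SlotManager {
    slots: Vec<SlotInfo>,
    next_seqno: u64,
}

impl SlotManager {
    fn new(n: usize) -> Self {
        Self {
            // seqno 0 is never handed out, so stale seats never match.
            slots: (0..n).map(|_| SlotInfo { seqno: 0 }).collect(),
            next_seqno: 1,
        }
    }

    // Bind a seat to a slot, stamping both with a fresh seqno.
    fn bind(&mut self, slot: usize) -> SeatInfo {
        let seqno = self.next_seqno;
        self.next_seqno += 1;
        self.slots[slot].seqno = seqno;
        SeatInfo { slot, seqno }
    }

    // A seat is still resident iff its slot was not re-bound since.
    fn is_resident(&self, seat: &SeatInfo) -> bool {
        self.slots[seat.slot].seqno == seat.seqno
    }
}

fn main() {
    let mut mgr = SlotManager::new(2);
    let a = mgr.bind(0);
    assert!(mgr.is_resident(&a));
    // Another seat takes slot 0: seat `a` is implicitly evicted.
    let b = mgr.bind(0);
    assert!(!mgr.is_resident(&a));
    assert!(mgr.is_resident(&b));
}
```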

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/slot.rs | 437 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/tyr.rs  |   1 +
 2 files changed, 438 insertions(+)

diff --git a/drivers/gpu/drm/tyr/slot.rs b/drivers/gpu/drm/tyr/slot.rs
new file mode 100644
index 000000000000..debba75f6204
--- /dev/null
+++ b/drivers/gpu/drm/tyr/slot.rs
@@ -0,0 +1,437 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Slot management abstraction for limited hardware resources.
+//!
+//! This module provides a generic [`SlotManager`] that assigns limited hardware
+//! slots to logical "seats". A seat represents an entity (such as a virtual memory
+//! (VM) address space) that needs access to a hardware slot.
+//!
+//! The [`SlotManager`] tracks slot allocation using sequence numbers (seqno) to detect
+//! when a seat's binding has been invalidated. When a seat requests activation,
+//! the manager will either reuse the seat's existing slot (if still valid),
+//! allocate a free slot (if any are available), or evict the oldest idle slot.
+//!
+//! Hardware-specific behavior is customized by implementing the [`SlotOperations`]
+//! trait, which allows callbacks when slots are activated or evicted.
+//!
+//! This is currently used for managing address space slots in the GPU, and it will
+//! also be used to manage Command Stream Group (CSG) interface slots in the future.
+//!
+//! [SlotOperations]: crate::slot::SlotOperations
+//! [SlotManager]: crate::slot::SlotManager
+#![allow(dead_code)]
+
+use core::{
+    mem::take,
+    ops::{
+        Deref,
+        DerefMut, //
+    }, //
+};
+
+use kernel::{
+    prelude::*,
+    sync::LockedBy, //
+};
+
+/// Seat information.
+///
+/// This can't be accessed directly by the element embedding a `Seat`,
+/// but is used by the generic slot manager logic to control residency
+/// of a certain object on a hardware slot.
+pub(crate) struct SeatInfo {
+    /// Slot used by this seat.
+    ///
+    /// This index is only valid if the slot pointed to by this index
+    /// has its `SlotInfo::seqno` match `SeatInfo::seqno`. Otherwise,
+    /// it means the object has been evicted from the hardware slot,
+    /// and a new slot needs to be acquired to make this object
+    /// resident again.
+    slot: u8,
+
+    /// Sequence number encoding the last time this seat was active.
+    /// We also use it to check if a slot is still bound to a seat.
+    seqno: u64,
+}
+
+/// Seat state.
+///
+/// This is meant to be embedded in the object that wants to acquire
+/// hardware slots. It also starts in the `Seat::NoSeat` state, and
+/// the slot manager will change the object value when an active/evict
+/// request is issued.
+#[derive(Default)]
+pub(crate) enum Seat {
+    #[expect(clippy::enum_variant_names)]
+    /// Resource is not resident.
+    ///
+    /// All objects start with a seat in the `Seat::NoSeat` state. The seat
+    /// returns to that state if the user requests eviction. It can also end
+    /// up in that state the next time an operation is done on a `Seat::Idle`
+    /// seat and the slot manager finds out this object has been evicted from
+    /// the slot.
+    #[default]
+    NoSeat,
+
+    /// Resource is actively used and resident.
+    ///
+    /// When a seat is in the `Seat::Active` state, it can't be evicted, and the
+    /// slot pointed to by `SeatInfo::slot` is guaranteed to be reserved
+    /// for this object as long as the seat stays active.
+    Active(SeatInfo),
+
+    /// Resource is idle and might or might not be resident.
+    ///
+    /// When a seat is in the `Seat::Idle` state, we can't know for sure if the
+    /// object is resident or evicted until the next request we issue
+    /// to the slot manager. This tells the slot manager it can
+    /// reclaim the underlying slot if needed.
+    /// In order for the hardware to use this object again, the seat
+    /// needs to be returned to the `Seat::Active` state with a
+    /// `SlotManager::activate()` call.
+    Idle(SeatInfo),
+}
+
+impl Seat {
+    /// Get the slot index this seat is pointing to.
+    ///
+    /// If the seat is not `Seat::Active`, we can't trust the
+    /// `SeatInfo`. In that case `None` is returned; otherwise
+    /// `Some(SeatInfo::slot)` is returned.
+    pub(super) fn slot(&self) -> Option<u8> {
+        match self {
+            Self::Active(info) => Some(info.slot),
+            _ => None,
+        }
+    }
+}
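The seqno validity check described above can be sketched in plain Rust. This is an illustrative user-space simplification (the names mirror the patch, but the real driver also synchronizes access through `LockedBy` and the slot manager lock):

```rust
// User-space sketch of the seqno residency check: a seat's slot index
// is only trusted while its seqno matches the seqno stored in the slot
// it points to.
#[derive(Clone, Copy)]
struct SeatInfo {
    slot: u8,
    seqno: u64,
}

#[derive(Clone, Copy)]
struct SlotInfo {
    seqno: u64,
}

/// Returns the slot index if the seat's binding is still valid.
fn resident_slot(seat: &SeatInfo, slots: &[SlotInfo]) -> Option<usize> {
    let idx = seat.slot as usize;
    // A mismatched seqno means the slot was re-assigned (and the object
    // evicted) since this seat last held it.
    (slots.get(idx)?.seqno == seat.seqno).then_some(idx)
}

fn main() {
    let slots = [SlotInfo { seqno: 7 }, SlotInfo { seqno: 3 }];
    assert_eq!(resident_slot(&SeatInfo { slot: 0, seqno: 7 }, &slots), Some(0));
    // Slot 1 was re-used at seqno 3, so a seat stamped at seqno 2 is stale.
    assert_eq!(resident_slot(&SeatInfo { slot: 1, seqno: 2 }, &slots), None);
    println!("ok");
}
```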
+
+/// Trait describing the slot-related operations.
+pub(crate) trait SlotOperations {
+    /// Implementation-specific data associated with each slot.
+    type SlotData;
+
+    /// Called when a slot is being activated for a seat.
+    ///
+    /// This callback allows hardware-specific actions to be performed when a slot
+    /// becomes active, such as updating hardware registers or invalidating caches.
+    fn activate(&mut self, _slot_idx: usize, _slot_data: &Self::SlotData) -> Result {
+        Ok(())
+    }
+
+    /// Called when a slot is being evicted and freed.
+    ///
+    /// This callback allows hardware-specific cleanup when a slot is being
+    /// completely freed, either explicitly or when an idle slot is being
+    /// reused for a different seat. Any hardware state should be invalidated.
+    fn evict(&mut self, _slot_idx: usize, _slot_data: &Self::SlotData) -> Result {
+        Ok(())
+    }
+}
+
+/// Data attached to a slot.
+///
+/// Contains data and the sequence number used to check
+/// whether a seat's binding to this slot is still valid.
+struct SlotInfo<T> {
+    /// Type specific data attached to a slot
+    slot_data: T,
+
+    /// Sequence number from when this slot was last activated
+    seqno: u64,
+}
+
+/// Slot state.
+///
+/// Tracks whether a hardware slot is free, actively in use, or idle and available
+/// for eviction.
+#[derive(Default)]
+enum Slot<T> {
+    /// Slot is free.
+    ///
+    /// All slots start in the `Slot::Free` state when the slot manager is created.
+    #[default]
+    Free,
+
+    /// Slot is active.
+    ///
+    /// When in the `Slot::Active` state, the slot is guaranteed to stay active
+    /// for as long as the resource bound to it has its seat in the
+    /// `Seat::Active` state. No new resource can be bound to it.
+    Active(SlotInfo<T>),
+
+    /// Slot is idle.
+    ///
+    /// Happens when the underlying resource has been flagged
+    /// `Seat::Idle`. When in the `Slot::Idle` state, the slot manager is allowed
+    /// to evict the resource and re-assign the slot to someone else.
+    /// This process involves updating the `SlotInfo::seqno` which
+    /// will be checked against the `SeatInfo::seqno` in case the idle
+    /// resource wants to become active again.
+    Idle(SlotInfo<T>),
+}
+
+/// Generic slot manager object.
+///
+/// It abstracts away all the churn around activeness/idleness tracking
+/// and lets the implementer of the [`SlotOperations`] trait focus on
+/// how to make a resource active or evict it.
+///
+/// This structure must be protected by a lock.
+/// Seats that want to use this manager must be wrapped with
+/// `LockedBy<Seat, SlotManager<T, MAX_SLOTS>>` to ensure they are protected by the same lock.
+/// All operations on seats and slots are synchronized through this shared lock.
+pub(crate) struct SlotManager<T: SlotOperations, const MAX_SLOTS: usize> {
+    /// Manager specific data
+    manager: T,
+
+    /// Number of slots actually available
+    slot_count: usize,
+
+    /// Slots
+    slots: [Slot<T::SlotData>; MAX_SLOTS],
+
+    /// Sequence number incremented each time a Seat is successfully activated
+    use_seqno: u64,
+}
+
+/// A `Seat` protected by the same lock that is used to wrap the `SlotManager`.
+type LockedSeat<T, const MAX_SLOTS: usize> = LockedBy<Seat, SlotManager<T, MAX_SLOTS>>;
+
+impl<T: SlotOperations, const MAX_SLOTS: usize> SlotManager<T, MAX_SLOTS> {
+    /// Creates a new slot manager.
+    ///
+    /// Returns [`EINVAL`] if the slot count is zero or exceeds the maximum number of slots.
+    pub(crate) fn new(manager: T, slot_count: usize) -> Result<Self> {
+        if slot_count == 0 {
+            return Err(EINVAL);
+        }
+        if slot_count > MAX_SLOTS {
+            return Err(EINVAL);
+        }
+        Ok(Self {
+            manager,
+            slot_count,
+            slots: [const { Slot::Free }; MAX_SLOTS],
+            use_seqno: 1,
+        })
+    }
+
+    /// Records a slot as active for the given seat.
+    ///
+    /// Updates both the seat state and the slot state to reflect the active binding,
+    /// using the current sequence number. Increments the sequence number for the next
+    /// activation.
+    fn record_active_slot(
+        &mut self,
+        slot_idx: usize,
+        locked_seat: &LockedSeat<T, MAX_SLOTS>,
+        slot_data: T::SlotData,
+    ) -> Result {
+        let cur_seqno = self.use_seqno;
+
+        *locked_seat.access_mut(self) = Seat::Active(SeatInfo {
+            slot: slot_idx as u8,
+            seqno: cur_seqno,
+        });
+
+        self.slots[slot_idx] = Slot::Active(SlotInfo {
+            slot_data,
+            seqno: cur_seqno,
+        });
+
+        self.use_seqno += 1;
+        Ok(())
+    }
+
+    /// Activates a slot for the given seat.
+    ///
+    /// Calls the activation callback and then records the slot as active.
+    fn activate_slot(
+        &mut self,
+        slot_idx: usize,
+        locked_seat: &LockedSeat<T, MAX_SLOTS>,
+        slot_data: T::SlotData,
+    ) -> Result {
+        self.manager.activate(slot_idx, &slot_data)?;
+        self.record_active_slot(slot_idx, locked_seat, slot_data)
+    }
+
+    /// Allocates a slot for the given seat.
+    ///
+    /// Searches for a free slot first. If none are available, finds the oldest idle
+    /// slot (by sequence number) and evicts it. Returns [`EBUSY`] if all slots are
+    /// active and none can be evicted.
+    fn allocate_slot(
+        &mut self,
+        locked_seat: &LockedSeat<T, MAX_SLOTS>,
+        slot_data: T::SlotData,
+    ) -> Result {
+        let slots = &self.slots[..self.slot_count];
+
+        let mut idle_slot_idx = None;
+        let mut idle_slot_seqno: u64 = 0;
+
+        for (slot_idx, slot) in slots.iter().enumerate() {
+            match slot {
+                Slot::Free => {
+                    return self.activate_slot(slot_idx, locked_seat, slot_data);
+                }
+                Slot::Idle(slot_info) => {
+                    if idle_slot_idx.is_none() || slot_info.seqno < idle_slot_seqno {
+                        idle_slot_idx = Some(slot_idx);
+                        idle_slot_seqno = slot_info.seqno;
+                    }
+                }
+                Slot::Active(_) => (),
+            }
+        }
+
+        match idle_slot_idx {
+            Some(slot_idx) => {
+                // Lazily evict idle slot just before it is reused
+                if let Slot::Idle(slot_info) = &self.slots[slot_idx] {
+                    self.manager.evict(slot_idx, &slot_info.slot_data)?;
+                }
+                self.activate_slot(slot_idx, locked_seat, slot_data)
+            }
+            None => {
+                pr_err!(
+                    "Slot allocation failed: all {} slots in use\n",
+                    self.slot_count
+                );
+                Err(EBUSY)
+            }
+        }
+    }
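The allocation policy above (prefer a free slot, otherwise lazily evict the idle slot with the oldest seqno, otherwise fail busy) can be sketched in isolation. This is a simplified standalone model, not the driver code:

```rust
// Sketch of allocate_slot's search order: a Free slot wins immediately,
// otherwise the Idle slot with the lowest (oldest) seqno is chosen for
// eviction, and all-Active means no slot can be reclaimed.
#[derive(Clone, Copy)]
enum Slot {
    Free,
    Active(u64),
    Idle(u64),
}

fn pick_slot(slots: &[Slot]) -> Option<usize> {
    let mut oldest: Option<(usize, u64)> = None;
    for (i, s) in slots.iter().enumerate() {
        match *s {
            Slot::Free => return Some(i),
            // Track the idle slot with the smallest seqno seen so far.
            Slot::Idle(seq) if oldest.map_or(true, |(_, o)| seq < o) => {
                oldest = Some((i, seq));
            }
            _ => (),
        }
    }
    oldest.map(|(i, _)| i)
}

fn main() {
    assert_eq!(pick_slot(&[Slot::Active(1), Slot::Free]), Some(1));
    // Idle(3) is older than Idle(5), so slot 2 is reclaimed first.
    assert_eq!(pick_slot(&[Slot::Idle(5), Slot::Active(2), Slot::Idle(3)]), Some(2));
    assert_eq!(pick_slot(&[Slot::Active(1)]), None);
    println!("ok");
}
```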
+
+    /// Transitions a slot from active to idle state.
+    ///
+    /// Updates both the slot and seat to idle state, making the slot eligible for
+    /// eviction if needed by another seat.
+    fn idle_slot(&mut self, slot_idx: usize, locked_seat: &LockedSeat<T, MAX_SLOTS>) -> Result {
+        let slot = take(&mut self.slots[slot_idx]);
+
+        if let Slot::Active(slot_info) = slot {
+            self.slots[slot_idx] = Slot::Idle(SlotInfo {
+                slot_data: slot_info.slot_data,
+                seqno: slot_info.seqno,
+            })
+        };
+
+        *locked_seat.access_mut(self) = match locked_seat.access(self) {
+            Seat::Active(seat_info) | Seat::Idle(seat_info) => Seat::Idle(SeatInfo {
+                slot: seat_info.slot,
+                seqno: seat_info.seqno,
+            }),
+            Seat::NoSeat => Seat::NoSeat,
+        };
+        Ok(())
+    }
+
+    /// Evicts a seat from its slot and marks the slot as free.
+    ///
+    /// Calls the eviction callback then frees the slot and resets the seat to `NoSeat`.
+    fn evict_slot(&mut self, slot_idx: usize, locked_seat: &LockedSeat<T, MAX_SLOTS>) -> Result {
+        match &self.slots[slot_idx] {
+            Slot::Active(slot_info) | Slot::Idle(slot_info) => {
+                self.manager.evict(slot_idx, &slot_info.slot_data)?;
+                take(&mut self.slots[slot_idx]);
+            }
+            _ => (),
+        }
+
+        *locked_seat.access_mut(self) = Seat::NoSeat;
+        Ok(())
+    }
+
+    /// Checks and updates the seat state based on the slot it points to.
+    ///
+    /// Validates that the seat's sequence number matches the slot's sequence number.
+    /// If they don't match, the seat has been evicted and is reset to `NoSeat`.
+    fn check_seat(&mut self, locked_seat: &LockedSeat<T, MAX_SLOTS>) {
+        let (slot_idx, seqno, is_active) = match locked_seat.access(self) {
+            Seat::Active(info) => (info.slot as usize, info.seqno, true),
+            Seat::Idle(info) => (info.slot as usize, info.seqno, false),
+            _ => return,
+        };
+
+        let valid = if is_active {
+            !kernel::warn_on!(!matches!(&self.slots[slot_idx], Slot::Active(s) if s.seqno == seqno))
+        } else {
+            matches!(&self.slots[slot_idx], Slot::Idle(s) if s.seqno == seqno)
+        };
+
+        if !valid {
+            *locked_seat.access_mut(self) = Seat::NoSeat;
+        }
+    }
+
+    /// Make a resource active on any available/reclaimable slot.
+    ///
+    /// Returns [`EBUSY`] if all slots are in use and none can be reclaimed
+    /// or the reclaim failed. May also return errors from the callbacks.
+    pub(crate) fn activate(
+        &mut self,
+        locked_seat: &LockedSeat<T, MAX_SLOTS>,
+        slot_data: T::SlotData,
+    ) -> Result {
+        self.check_seat(locked_seat);
+        match locked_seat.access(self) {
+            Seat::Active(seat_info) | Seat::Idle(seat_info) => {
+                // With lazy eviction, if seqno matches, the hardware state is still
+                // valid for both Active and Idle slots, so just update our records
+                self.record_active_slot(seat_info.slot as usize, locked_seat, slot_data)
+            }
+            _ => self.allocate_slot(locked_seat, slot_data),
+        }
+    }
+
+    /// Flag a resource idle.
+    ///
+    /// The slot manager can decide to reclaim the slot this resource
+    /// was bound to at any point after this function returns.
+    // The idle() method will be used when we start adding support for user VMs.
+    #[expect(dead_code)]
+    pub(crate) fn idle(&mut self, locked_seat: &LockedSeat<T, MAX_SLOTS>) -> Result {
+        self.check_seat(locked_seat);
+        if let Seat::Active(seat_info) = locked_seat.access(self) {
+            self.idle_slot(seat_info.slot as usize, locked_seat)?;
+        }
+        Ok(())
+    }
+
+    /// Evict a resource from its slot, and make this slot free again
+    /// for other users.
+    ///
+    /// May return errors from the eviction callback.
+    pub(crate) fn evict(&mut self, locked_seat: &LockedSeat<T, MAX_SLOTS>) -> Result {
+        self.check_seat(locked_seat);
+
+        match locked_seat.access(self) {
+            Seat::Active(seat_info) | Seat::Idle(seat_info) => {
+                let slot_idx = seat_info.slot as usize;
+
+                self.evict_slot(slot_idx, locked_seat)?;
+            }
+            _ => (),
+        }
+
+        Ok(())
+    }
+}
+
+impl<T: SlotOperations, const MAX_SLOTS: usize> Deref for SlotManager<T, MAX_SLOTS> {
+    type Target = T;
+
+    fn deref(&self) -> &Self::Target {
+        &self.manager
+    }
+}
+
+impl<T: SlotOperations, const MAX_SLOTS: usize> DerefMut for SlotManager<T, MAX_SLOTS> {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        &mut self.manager
+    }
+}
diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
index 9432ddd6b5b8..20b38120e20e 100644
--- a/drivers/gpu/drm/tyr/tyr.rs
+++ b/drivers/gpu/drm/tyr/tyr.rs
@@ -12,6 +12,7 @@
 mod gem;
 mod gpu;
 mod regs;
+mod slot;
 
 kernel::module_platform_driver! {
     type: TyrPlatformDriverData,

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 09/20] drm/tyr: add MMU module
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (7 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 08/20] drm/tyr: Add generic slot manager Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 10/20] drm/tyr: add GPU virtual memory module Deborah Brouwer
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Boris Brezillon <boris.brezillon@collabora.com>

Add a Memory Management Unit (MMU) driver for Tyr. The MMU wraps a
SlotManager for allocating hardware address space slots. The underlying
AddressSpaceManager performs MMU operations including enabling/disabling
address spaces, flushing page tables, and locking regions for page table
updates.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs            |   3 +
 drivers/gpu/drm/tyr/mmu.rs               | 128 +++++++
 drivers/gpu/drm/tyr/mmu/address_space.rs | 571 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/regs.rs              | 110 ++++++
 drivers/gpu/drm/tyr/tyr.rs               |   1 +
 5 files changed, 813 insertions(+)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 1f09f59e271a..495021a8657d 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -45,6 +45,7 @@
     gem::BoData,
     gpu,
     gpu::GpuInfo,
+    mmu::Mmu,
     regs::gpu_control::*, //
 };
 
@@ -153,6 +154,8 @@ fn probe(
         let uninit_ddev = UnregisteredDevice::<TyrDrmDriver>::new(pdev.as_ref())?;
         let platform: ARef<platform::Device> = pdev.into();
 
+        let _mmu = Mmu::new(pdev, iomem.as_arc_borrow(), &gpu_info)?;
+
         let data = try_pin_init!(TyrDrmDeviceData {
                 pdev: platform.clone(),
                 clks <- new_mutex!(Clocks {
diff --git a/drivers/gpu/drm/tyr/mmu.rs b/drivers/gpu/drm/tyr/mmu.rs
new file mode 100644
index 000000000000..09df98ffc9e3
--- /dev/null
+++ b/drivers/gpu/drm/tyr/mmu.rs
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Memory Management Unit (MMU) driver for the Tyr GPU.
+//!
+//! This module manages GPU address spaces and virtual memory (VM) operations through
+//! hardware MMU slots. It provides functionality for flushing page tables and
+//! managing VM updates for active address spaces.
+//!
+//! The MMU coordinates with the [`AddressSpaceManager`] to handle hardware
+//! address space allocation and page table operations, using [`SlotManager`]
+//! to track which address spaces are currently active in hardware slots.
+//!
+//! [`AddressSpaceManager`]: address_space::AddressSpaceManager
+//! [`SlotManager`]: crate::slot::SlotManager
+#![allow(dead_code)]
+
+use core::ops::Range;
+
+use kernel::{
+    devres::Devres,
+    new_mutex,
+    platform,
+    prelude::*,
+    sync::{
+        Arc,
+        ArcBorrow,
+        Mutex, //
+    }, //
+};
+
+use crate::{
+    driver::IoMem,
+    gpu::GpuInfo,
+    mmu::address_space::{
+        AddressSpaceManager,
+        VmAsData, //
+    },
+    regs::{
+        gpu_control::AS_PRESENT,
+        MAX_AS, //
+    },
+    slot::SlotManager, //
+};
+
+pub(crate) mod address_space;
+
+pub(crate) type AsSlotManager = SlotManager<AddressSpaceManager, MAX_AS>;
+
+/// MMU component of the GPU.
+///
+/// This is used to bind VM objects to an AS (Address Space) slot
+/// and make the VM active on the GPU.
+///
+/// All operations acquire an internal lock, allowing concurrent access from multiple
+/// threads. Methods may block if another thread holds the lock.
+#[pin_data]
+pub(crate) struct Mmu {
+    /// Manages the allocation of hardware MMU slots to GPU address spaces.
+    ///
+    /// Tracks which address spaces are currently active in hardware slots and
+    /// coordinates address space operations like flushing and VM updates.
+    ///
+    /// This mutex also protects individual [`Seat`]s that are wrapped with
+    /// `LockedBy<Seat, SlotManager<...>>` to share the same lock protection.
+    ///
+    /// [`Seat`]: crate::slot::Seat
+    #[pin]
+    pub(crate) as_manager: Mutex<AsSlotManager>,
+}
+
+impl Mmu {
+    /// Create an MMU component for this device.
+    pub(crate) fn new(
+        pdev: &platform::Device,
+        iomem: ArcBorrow<'_, Devres<IoMem>>,
+        gpu_info: &GpuInfo,
+    ) -> Result<Arc<Mmu>> {
+        let present = AS_PRESENT::from_raw(gpu_info.as_present).present().get();
+        let slot_count = present.count_ones().try_into()?;
+
+        let as_manager = AddressSpaceManager::new(pdev, iomem, present)?;
+        let mmu_init = try_pin_init!(Self {
+            as_manager <- new_mutex!(SlotManager::new(as_manager, slot_count)?),
+        });
+        Arc::pin_init(mmu_init, GFP_KERNEL)
+    }
+
+    /// Make a VM active.
+    ///
+    /// This implies assigning the VM to an AS slot through the slot manager.
+    pub(crate) fn activate_vm(&self, vm: ArcBorrow<'_, VmAsData>) -> Result {
+        self.as_manager.lock().activate_vm(vm)
+    }
+
+    /// Make the VM inactive.
+    ///
+    /// Evicts the VM from its AS slot through the slot manager.
+    pub(crate) fn deactivate_vm(&self, vm: &VmAsData) -> Result {
+        self.as_manager.lock().deactivate_vm(vm)
+    }
+
+    /// Flush caches after a VM update.
+    ///
+    /// If the VM is no longer resident, this is a NOP; otherwise the
+    /// AS manager flushes the GPU and MMU Translation Lookaside Buffer (TLB) caches.
+    pub(crate) fn flush_vm(&self, vm: &VmAsData) -> Result {
+        self.as_manager.lock().flush_vm(vm)
+    }
+
+    /// Flags the start of a VM update.
+    ///
+    /// If the VM is resident, any GPU access on the memory range being
+    /// updated will be blocked until `Mmu::end_vm_update()` is called.
+    /// This guarantees the atomicity of a VM update.
+    /// If the VM is not resident, this is a NOP.
+    pub(crate) fn start_vm_update(&self, vm: &VmAsData, region: &Range<u64>) -> Result {
+        self.as_manager.lock().start_vm_update(vm, region)
+    }
+
+    /// Flags the end of a VM update.
+    ///
+    /// If the VM is resident, this will let GPU accesses on the updated
+    /// range go through, in case any of them were blocked.
+    /// If the VM is not resident, this is a NOP.
+    pub(crate) fn end_vm_update(&self, vm: &VmAsData) -> Result {
+        self.as_manager.lock().end_vm_update(vm)
+    }
+}
diff --git a/drivers/gpu/drm/tyr/mmu/address_space.rs b/drivers/gpu/drm/tyr/mmu/address_space.rs
new file mode 100644
index 000000000000..cc2841bab21c
--- /dev/null
+++ b/drivers/gpu/drm/tyr/mmu/address_space.rs
@@ -0,0 +1,571 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! GPU address space management and hardware operations.
+//!
+//! This module manages GPU hardware address spaces (AS), including configuration,
+//! command submission, and page table update regions. It handles the hardware
+//! interaction for MMU operations through MMIO register access.
+//!
+//! The [`AddressSpaceManager`] implements [`SlotOperations`] to integrate with
+//! the slot management system, enabling and configuring address spaces in the
+//! hardware slots as needed.
+//!
+//! [`SlotOperations`]: crate::slot::SlotOperations
+
+use core::ops::Range;
+
+use kernel::{
+    device::{
+        Bound,
+        Device, //
+    },
+    devres::Devres,
+    error::Result,
+    io::{
+        poll,
+        register::Array,
+        Io, //
+    },
+    iommu::pgtable::{
+        Config,
+        IoPageTable,
+        ARM64LPAES1, //
+    },
+    platform,
+    prelude::*,
+    sizes::{
+        SZ_2M,
+        SZ_4K, //
+    },
+    sync::{
+        aref::ARef,
+        Arc,
+        ArcBorrow,
+        LockedBy, //
+    },
+    time::Delta, //
+};
+
+use crate::{
+    driver::IoMem,
+    mmu::{
+        AsSlotManager,
+        Mmu, //
+    },
+    regs::{
+        mmu_control::mmu_as_control,
+        mmu_control::mmu_as_control::*,
+        MAX_AS, //
+    },
+    slot::{
+        Seat,
+        SlotOperations, //
+    }, //
+};
+
+/// Hardware address space configuration registers.
+///
+/// Contains register values for configuring a GPU MMU address space.
+#[derive(Clone, Copy)]
+struct AddressSpaceConfig {
+    /// Translation configuration.
+    ///
+    /// Controls address translation mode, address range restrictions, translation table
+    /// walk attributes, and access permission settings for this address space.
+    transcfg: u64,
+
+    /// Translation table base address.
+    ///
+    /// The address of the top level of a translation table structure.
+    transtab: u64,
+
+    /// Memory attributes.
+    ///
+    /// Defines memory attribute indirection entries that control cacheability
+    /// and other memory access properties for the address space.
+    memattr: u64,
+}
+
+/// Virtual memory (VM) address space data for GPU MMU operations.
+///
+/// Contains all resources and information needed by the [`AddressSpaceManager`]
+/// to activate a VM in a hardware address space slot.
+///
+/// On activation, an [`Arc`]<[`VmAsData`]> is passed and stored in
+/// the slot to make sure the page table and the underlying resources
+/// (pages) used by the AS slot won't go away while the MMU points to
+/// them.
+///
+/// The `as_seat` field uses [`LockedBy`] to ensure safe concurrent access to
+/// the slot assignment state, protected by the [`AsSlotManager`] lock.
+#[pin_data]
+pub(crate) struct VmAsData {
+    /// Tracks this VM's binding to a hardware address space slot.
+    as_seat: LockedBy<Seat, AsSlotManager>,
+
+    /// Virtual address bits for this address space.
+    va_bits: u8,
+
+    /// Page table.
+    ///
+    /// Managed by devres to ensure proper cleanup. The page table maps
+    /// GPU virtual addresses to physical addresses for this VM.
+    #[pin]
+    pub(crate) page_table: Devres<IoPageTable<ARM64LPAES1>>,
+}
+
+impl VmAsData {
+    /// Creates a new VM address space data structure.
+    ///
+    /// Initializes the page table for the address space.
+    pub(crate) fn new<'a>(
+        mmu: &'a Mmu,
+        pdev: &'a platform::Device,
+        va_bits: u32,
+        pa_bits: u32,
+    ) -> impl pin_init::PinInit<VmAsData, Error> + 'a {
+        // SAFETY: pdev is a bound device.
+        let dev = unsafe { pdev.as_ref().as_bound() };
+
+        let pt_config = Config {
+            quirks: 0,
+            pgsize_bitmap: SZ_4K | SZ_2M,
+            ias: va_bits,
+            oas: pa_bits,
+            coherent_walk: false,
+        };
+
+        let page_table_init = IoPageTable::new(dev, pt_config);
+
+        try_pin_init!(Self {
+            as_seat: LockedBy::new(&mmu.as_manager, Seat::NoSeat),
+            va_bits: va_bits as u8,
+            page_table <- page_table_init,
+        }? Error)
+    }
+
+    /// Computes the hardware configuration for this address space.
+    ///
+    /// The caller must ensure that the address space is evicted and cleaned up
+    /// before the `VmAsData` is dropped.
+    fn as_config(&self, dev: &Device<Bound>) -> Result<AddressSpaceConfig> {
+        let pt = self.page_table.access(dev)?;
+
+        // The hardware computes the valid input address range as:
+        //   INA_BITS_VALID = min(HW_INA_BITS, 55 - INA_BITS)
+        // To configure our desired va_bits, we solve for INA_BITS:
+        //   INA_BITS = 55 - va_bits
+        // This assumes HW_INA_BITS (hardware capability) >= va_bits.
+        let ina_bits_field_value = 55 - self.va_bits;
+        let ina_bits = match ina_bits_field_value {
+            7 => mmu_as_control::InaBits::Bits48,
+            8 => mmu_as_control::InaBits::Bits47,
+            9 => mmu_as_control::InaBits::Bits46,
+            10 => mmu_as_control::InaBits::Bits45,
+            11 => mmu_as_control::InaBits::Bits44,
+            12 => mmu_as_control::InaBits::Bits43,
+            13 => mmu_as_control::InaBits::Bits42,
+            14 => mmu_as_control::InaBits::Bits41,
+            15 => mmu_as_control::InaBits::Bits40,
+            16 => mmu_as_control::InaBits::Bits39,
+            17 => mmu_as_control::InaBits::Bits38,
+            18 => mmu_as_control::InaBits::Bits37,
+            19 => mmu_as_control::InaBits::Bits36,
+            20 => mmu_as_control::InaBits::Bits35,
+            21 => mmu_as_control::InaBits::Bits34,
+            22 => mmu_as_control::InaBits::Bits33,
+            23 => mmu_as_control::InaBits::Bits32,
+            24 => mmu_as_control::InaBits::Bits31,
+            25 => mmu_as_control::InaBits::Bits30,
+            26 => mmu_as_control::InaBits::Bits29,
+            27 => mmu_as_control::InaBits::Bits28,
+            28 => mmu_as_control::InaBits::Bits27,
+            29 => mmu_as_control::InaBits::Bits26,
+            30 => mmu_as_control::InaBits::Bits25,
+            _ => return Err(EINVAL),
+        };
+
+        let transcfg = mmu_as_control::TRANSCFG::zeroed()
+            .with_ptw_memattr(mmu_as_control::PtwMemattr::WriteBack)
+            .with_r_allocate(true)
+            .with_mode(mmu_as_control::AddressSpaceMode::Aarch64_4K)
+            .with_ina_bits(ina_bits)
+            .into_raw();
+
+        Ok(AddressSpaceConfig {
+            transcfg,
+            // SAFETY: Caller ensures proper cleanup.
+            transtab: unsafe { pt.ttbr() },
+            memattr: MEMATTR::from_mair(pt.mair()).into_raw(),
+        })
+    }
+}
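The `INA_BITS = 55 - va_bits` arithmetic above, together with the range the match arms accept (field values 7 through 30, i.e. 48 down to 25 VA bits), can be checked with a small standalone sketch. The function name is illustrative, not part of the patch:

```rust
// The TRANSCFG INA_BITS field encodes the VA width as (55 - va_bits),
// so a 48-bit VA space is programmed as field value 7. Only field
// values 7..=30 are accepted, matching the driver's match arms.
fn ina_bits_field(va_bits: u8) -> Option<u8> {
    let v = 55u8.checked_sub(va_bits)?;
    (7..=30).contains(&v).then_some(v)
}

fn main() {
    assert_eq!(ina_bits_field(48), Some(7)); // InaBits::Bits48
    assert_eq!(ina_bits_field(40), Some(15)); // InaBits::Bits40
    assert_eq!(ina_bits_field(55), None); // wider than hardware allows
    println!("ok");
}
```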
+
+/// Manages GPU hardware address spaces via MMIO register operations.
+///
+/// Coordinates all hardware-level address space operations including enabling,
+/// disabling, flushing, and updating address spaces. Implements [`SlotOperations`]
+/// to integrate with the generic slot management system.
+///
+/// [`SlotOperations`]: crate::slot::SlotOperations
+pub(crate) struct AddressSpaceManager {
+    /// Platform device reference for DMA and device operations.
+    pdev: ARef<platform::Device>,
+
+    /// Memory-mapped I/O region for GPU register access.
+    iomem: Arc<Devres<IoMem>>,
+
+    /// Bitmask of available address space slots from GPU_AS_PRESENT register.
+    as_present: u32,
+}
+
+impl SlotOperations for AddressSpaceManager {
+    /// VM address space data stored in each hardware slot.
+    type SlotData = Arc<VmAsData>;
+
+    /// Activates an address space in a hardware slot.
+    fn activate(&mut self, slot_idx: usize, slot_data: &Self::SlotData) -> Result {
+        let as_config = slot_data.as_config(self.dev())?;
+        self.as_enable(slot_idx, &as_config)
+    }
+
+    /// Evicts an address space from a hardware slot.
+    fn evict(&mut self, slot_idx: usize, _slot_data: &Self::SlotData) -> Result {
+        if self.iomem.try_access().is_some() {
+            self.as_flush(slot_idx)?;
+            self.as_disable(slot_idx)?;
+        }
+        Ok(())
+    }
+}
+
+impl AddressSpaceManager {
+    /// Creates a new address space manager.
+    ///
+    /// Initializes the manager with references to the platform device and
+    /// I/O memory region, along with the bitmask of available AS slots.
+    pub(super) fn new(
+        pdev: &platform::Device,
+        iomem: ArcBorrow<'_, Devres<IoMem>>,
+        as_present: u32,
+    ) -> Result<AddressSpaceManager> {
+        Ok(Self {
+            pdev: pdev.into(),
+            iomem: iomem.into(),
+            as_present,
+        })
+    }
+
+    /// Returns a reference to the bound device.
+    fn dev(&self) -> &Device<Bound> {
+        // SAFETY: pdev is a bound device.
+        unsafe { self.pdev.as_ref().as_bound() }
+    }
+
+    /// Validates that an AS slot number is within range and present in hardware.
+    ///
+    /// Checks that the slot index is less than [`MAX_AS`] and that
+    /// the corresponding bit is set in the `as_present` mask read from the GPU.
+    ///
+    /// Returns [`EINVAL`] if the slot is out of range or not present in hardware.
+    fn validate_as_slot(&self, as_nr: usize) -> Result {
+        if as_nr >= MAX_AS {
+            pr_err!("AS slot {} out of valid range (max {})\n", as_nr, MAX_AS);
+            return Err(EINVAL);
+        }
+
+        if (self.as_present & (1 << as_nr)) == 0 {
+            pr_err!(
+                "AS slot {} not present in hardware (AS_PRESENT={:#x})\n",
+                as_nr,
+                self.as_present
+            );
+            return Err(EINVAL);
+        }
+
+        Ok(())
+    }
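The `validate_as_slot()` bitmask test above reduces to a range check plus a bit test against the `AS_PRESENT` mask. A minimal sketch (the `MAX_AS` value of 32 here is an assumption for illustration; the real constant lives in `regs.rs`):

```rust
// Slot n is usable only if n is in range and bit n is set in the
// AS_PRESENT mask read from the GPU.
const MAX_AS: usize = 32; // assumed value for this sketch

fn slot_present(as_present: u32, as_nr: usize) -> bool {
    // The range check also guards the shift against overflow.
    as_nr < MAX_AS && (as_present & (1 << as_nr)) != 0
}

fn main() {
    let present = 0b1011; // slots 0, 1 and 3 exist
    assert!(slot_present(present, 0));
    assert!(!slot_present(present, 2));
    assert!(slot_present(present, 3));
    assert!(!slot_present(present, 32)); // out of range
    println!("ok");
}
```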
+
+    /// Waits for an AS slot to become ready (not active).
+    ///
+    /// Returns an error if polling times out after 10ms or if register access fails.
+    fn as_wait_ready(&self, as_nr: usize) -> Result {
+        let dev = self.dev();
+        let io = self.iomem.access(dev)?;
+        let op = || {
+            let status_reg = STATUS::try_at(as_nr).ok_or(EINVAL)?;
+            Ok(io.read(status_reg))
+        };
+        let cond = |status: &STATUS| -> bool { !status.active_ext() };
+        poll::read_poll_timeout(op, cond, Delta::from_millis(0), Delta::from_millis(10))?;
+
+        Ok(())
+    }
+
+    /// Sends a command to an AS slot.
+    ///
+    /// Returns an error if waiting for ready times out or if register write fails.
+    fn as_send_cmd(&mut self, as_nr: usize, cmd: MmuCommand) -> Result {
+        self.as_wait_ready(as_nr)?;
+        let dev = self.dev();
+        let io = self.iomem.access(dev)?;
+        let command_reg = COMMAND::try_at(as_nr).ok_or(EINVAL)?;
+        io.write(command_reg, COMMAND::zeroed().with_command(cmd));
+        Ok(())
+    }
+
+    /// Sends a command to an AS slot and waits for completion.
+    ///
+    /// Returns an error if sending the command fails or if waiting for completion times out.
+    fn as_send_cmd_and_wait(&mut self, as_nr: usize, cmd: MmuCommand) -> Result {
+        self.as_send_cmd(as_nr, cmd)?;
+        self.as_wait_ready(as_nr)?;
+        Ok(())
+    }
+
+    /// Enables an AS slot with the provided configuration.
+    ///
+    /// Returns an error if the slot is invalid or if register writes/commands fail.
+    fn as_enable(&mut self, as_nr: usize, as_config: &AddressSpaceConfig) -> Result {
+        self.validate_as_slot(as_nr)?;
+
+        let dev = self.dev();
+        let io = self.iomem.access(dev)?;
+
+        let transtab = as_config.transtab;
+        io.write(
+            TRANSTAB_LO::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSTAB_LO::from_raw(transtab as u32),
+        );
+        io.write(
+            TRANSTAB_HI::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSTAB_HI::from_raw((transtab >> 32) as u32),
+        );
+
+        let transcfg = as_config.transcfg;
+        io.write(
+            TRANSCFG_LO::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSCFG_LO::from_raw(transcfg as u32),
+        );
+        io.write(
+            TRANSCFG_HI::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSCFG_HI::from_raw((transcfg >> 32) as u32),
+        );
+
+        let memattr = as_config.memattr;
+        io.write(
+            MEMATTR_LO::try_at(as_nr).ok_or(EINVAL)?,
+            MEMATTR_LO::from_raw(memattr as u32),
+        );
+        io.write(
+            MEMATTR_HI::try_at(as_nr).ok_or(EINVAL)?,
+            MEMATTR_HI::from_raw((memattr >> 32) as u32),
+        );
+
+        self.as_send_cmd_and_wait(as_nr, MmuCommand::Update)?;
+
+        Ok(())
+    }
+
+    /// Disables an AS slot and clears its configuration.
+    ///
+    /// Returns an error if the slot is invalid or if register writes/commands fail.
+    fn as_disable(&mut self, as_nr: usize) -> Result {
+        self.validate_as_slot(as_nr)?;
+
+        // Flush AS before disabling
+        self.as_send_cmd_and_wait(as_nr, MmuCommand::FlushMem)?;
+
+        let dev = self.dev();
+        let io = self.iomem.access(dev)?;
+
+        io.write(
+            TRANSTAB_LO::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSTAB_LO::from_raw(0),
+        );
+        io.write(
+            TRANSTAB_HI::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSTAB_HI::from_raw(0),
+        );
+
+        io.write(
+            MEMATTR_LO::try_at(as_nr).ok_or(EINVAL)?,
+            MEMATTR_LO::from_raw(0),
+        );
+        io.write(
+            MEMATTR_HI::try_at(as_nr).ok_or(EINVAL)?,
+            MEMATTR_HI::from_raw(0),
+        );
+
+        let transcfg = TRANSCFG::zeroed()
+            .with_mode(AddressSpaceMode::Unmapped)
+            .into_raw();
+
+        io.write(
+            TRANSCFG_LO::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSCFG_LO::from_raw(transcfg as u32),
+        );
+        io.write(
+            TRANSCFG_HI::try_at(as_nr).ok_or(EINVAL)?,
+            TRANSCFG_HI::from_raw((transcfg >> 32) as u32),
+        );
+
+        self.as_send_cmd_and_wait(as_nr, MmuCommand::Update)?;
+
+        Ok(())
+    }
+
+    /// Locks a region of the translation tables for an atomic update.
+    ///
+    /// Programs the MMU LOCKADDR register for the given address space and issues
+    /// the lock command. The hardware rounds the requested range up to a
+    /// power-of-two region aligned to its size.
+    ///
+    /// Returns an error if the slot is invalid or if register writes/commands fail.
+    fn as_start_update(&mut self, as_nr: usize, region: &Range<u64>) -> Result {
+        self.validate_as_slot(as_nr)?;
+
+        // The lock operates on full 64-byte cache lines of translation table entries.
+        // Each translation table entry (TTE) is 8 bytes, so a cache line holds 8 TTEs.
+        // Since each TTE maps one page, the minimum locked region size is 8 pages.
+        //
+        // With 4KiB pages (Aarch64_4K mode), the minimum locked region is 32KiB.
+        let lock_region_min_size: u64 = 32 * 1024;
+
+        // Count the number of trailing zero bits (zeros at the right/least-significant
+        // end of the binary representation). For a power-of-two value, this equals the
+        // base-2 exponent (e.g., 32 KiB = 2^15 → 15).
+        let lock_region_min_size_log2 = lock_region_min_size.trailing_zeros() as u8;
+
+        // XOR the first and last addresses to identify which bits differ between them.
+        // The highest set bit in the result determines the exponent of the smallest
+        // power-of-two region that can contain both addresses.
+        //
+        // Example:
+        //   addr_xor = 0x1000 ^ 0x2FFF = 0x3FFF
+        //   highest set bit in 0x3FFF is bit 13
+        //   minimum region size = 2^(13 + 1) = 16 KiB
+        let addr_xor = region.start ^ (region.end - 1);
+        let region_size_log2 = 64 - addr_xor.leading_zeros() as u8;
+
+        let lock_region_log2 = core::cmp::max(region_size_log2, lock_region_min_size_log2);
+
+        // Align the LOCKADDR base address down to the lock region size (1 << lock_region_log2).
+        //
+        // The MMU ignores the low lock_region_log2 bits of LOCKADDR base, so ensure
+        // they are cleared in software to avoid ambiguity.
+        let lockaddr_base = region.start & !((1u64 << lock_region_log2) - 1);
+
+        // The LOCKADDR size field encodes the lock region size as log2(size) - 1,
+        // per the hardware definition. For example, a 32 KiB region is encoded as 14
+        // because log2(32 KiB) = 15.
+        let lockaddr_size = lock_region_log2 - 1;
+
+        let dev = self.dev();
+        let io = self.iomem.access(dev)?;
+
+        let lockaddr_val = LOCKADDR::zeroed()
+            .try_with_size(lockaddr_size)?
+            .try_with_base(lockaddr_base)?
+            .into_raw();
+
+        io.write(
+            LOCKADDR_LO::try_at(as_nr).ok_or(EINVAL)?,
+            LOCKADDR_LO::from_raw(lockaddr_val as u32),
+        );
+        io.write(
+            LOCKADDR_HI::try_at(as_nr).ok_or(EINVAL)?,
+            LOCKADDR_HI::from_raw((lockaddr_val >> 32) as u32),
+        );
+
+        self.as_send_cmd(as_nr, MmuCommand::Lock)
+    }
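The lock-region arithmetic above can be sanity-checked in isolation. The following is a standalone userspace sketch of the same computation; the function and constant names are illustrative only and are not part of the driver, which programs the result into the LOCKADDR register instead of returning it:

```rust
// Standalone sketch of the lock-region computation (illustration only).
// Returns the aligned base address and the log2(size) - 1 size encoding.
fn lock_region(start: u64, end: u64) -> (u64, u8) {
    // Minimum lock region: 8 TTEs per 64-byte cache line * 4 KiB pages = 32 KiB.
    const LOCK_REGION_MIN_SIZE: u64 = 32 * 1024;
    let min_log2 = LOCK_REGION_MIN_SIZE.trailing_zeros() as u8; // 15

    // XOR of the first and last addresses gives the differing bits; the
    // highest set bit determines the smallest power-of-two region that
    // contains both.
    let addr_xor = start ^ (end - 1);
    let region_log2 = (64 - addr_xor.leading_zeros()) as u8;

    let lock_log2 = region_log2.max(min_log2);

    // Base aligned down to the region size; size encoded as log2(size) - 1.
    let base = start & !((1u64 << lock_log2) - 1);
    (base, lock_log2 - 1)
}

fn main() {
    // 0x1000..0x3000: addr_xor = 0x1000 ^ 0x2FFF = 0x3FFF -> a 16 KiB region,
    // which is below the 32 KiB minimum, so the lock region is 32 KiB.
    let (base, size_enc) = lock_region(0x1000, 0x3000);
    assert_eq!(base, 0); // 0x1000 aligned down to 32 KiB
    assert_eq!(size_enc, 14); // log2(32 KiB) - 1
}
```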
+
+    /// Completes an atomic translation table update.
+    ///
+    /// Returns an error if the slot is invalid or if the flush command fails.
+    fn as_end_update(&mut self, as_nr: usize) -> Result {
+        self.validate_as_slot(as_nr)?;
+        self.as_send_cmd_and_wait(as_nr, MmuCommand::FlushPt)?;
+        Ok(())
+    }
+
+    /// Flushes the translation table cache for an AS slot.
+    ///
+    /// Returns an error if the slot is invalid or if the flush command fails.
+    fn as_flush(&mut self, as_nr: usize) -> Result {
+        self.validate_as_slot(as_nr)?;
+        self.as_send_cmd(as_nr, MmuCommand::FlushPt)
+    }
+}
+
+impl AsSlotManager {
+    /// Locks a region for translation table updates if the VM has an active slot.
+    ///
+    /// If the VM is currently assigned to a hardware slot, locks the specified
+    /// memory region to make translation table updates atomic. GPU accesses to the
+    /// region will be blocked until [`Self::end_vm_update`] is called.
+    ///
+    /// If the VM is not resident in a hardware slot, this is a no-op.
+    pub(super) fn start_vm_update(&mut self, vm: &VmAsData, region: &Range<u64>) -> Result {
+        let seat = vm.as_seat.access(self);
+        match seat.slot() {
+            Some(slot) => {
+                let as_nr = slot as usize;
+                self.as_start_update(as_nr, region)
+            }
+            _ => Ok(()),
+        }
+    }
+
+    /// Completes translation table updates and unlocks the region.
+    ///
+    /// If the VM is currently assigned to a hardware slot, flushes the translation
+    /// table cache and unlocks the region that was locked by [`Self::start_vm_update`],
+    /// allowing GPU accesses to proceed with the updated translation tables.
+    ///
+    /// If the VM is not resident in a hardware slot, this is a no-op.
+    pub(super) fn end_vm_update(&mut self, vm: &VmAsData) -> Result {
+        let seat = vm.as_seat.access(self);
+        match seat.slot() {
+            Some(slot) => {
+                let as_nr = slot as usize;
+                self.as_end_update(as_nr)
+            }
+            _ => Ok(()),
+        }
+    }
+
+    /// Flushes translation table cache if the VM has an active slot.
+    ///
+    /// If the VM is currently assigned to a hardware slot, invalidates cached
+    /// translation table entries to ensure subsequent GPU accesses use updated translations.
+    ///
+    /// If the VM is not resident in a hardware slot, this is a no-op.
+    pub(super) fn flush_vm(&mut self, vm: &VmAsData) -> Result {
+        let seat = vm.as_seat.access(self);
+        match seat.slot() {
+            Some(slot) => {
+                let as_nr = slot as usize;
+                self.as_flush(as_nr)
+            }
+            _ => Ok(()),
+        }
+    }
+
+    /// Activates a VM by assigning it to a hardware slot.
+    ///
+    /// Allocates a hardware address space slot for the VM and configures
+    /// it with the VM's translation table and memory attributes.
+    pub(super) fn activate_vm(&mut self, vm: ArcBorrow<'_, VmAsData>) -> Result {
+        self.activate(&vm.as_seat, vm.into())
+    }
+
+    /// Deactivates a VM by evicting it from its hardware slot.
+    ///
+    /// Flushes any pending operations and clears the hardware slot's
+    /// configuration, freeing the slot for use by other VMs.
+    pub(super) fn deactivate_vm(&mut self, vm: &VmAsData) -> Result {
+        self.evict(&vm.as_seat)
+    }
+}
diff --git a/drivers/gpu/drm/tyr/regs.rs b/drivers/gpu/drm/tyr/regs.rs
index 9963294b8625..dafca19e3532 100644
--- a/drivers/gpu/drm/tyr/regs.rs
+++ b/drivers/gpu/drm/tyr/regs.rs
@@ -45,6 +45,8 @@ pub(crate) fn read_u64_no_tearing(lo_read: impl Fn() -> u32, hi_read: impl Fn()
     }
 }
 
+pub(crate) use mmu_control::mmu_as_control::MAX_AS;
+
 /// These registers correspond to the GPU_CONTROL register page.
 /// They are involved in GPU configuration and control.
 pub(crate) mod gpu_control {
@@ -974,6 +976,8 @@ pub(crate) mod mmu_as_control {
             register, //
         };
 
+        use pin_init::Zeroable;
+
         /// Maximum number of hardware address space slots.
         /// The actual number of slots available is usually lower.
         pub(crate) const MAX_AS: usize = 16;
@@ -1167,7 +1171,113 @@ fn from(val: MMU_MEMATTR_STAGE1) -> Self {
             pub(crate) MEMATTR_HI(u32)[MAX_AS, stride = STRIDE] @ 0x240c {
                 31:0 value;
             }
+        }
+
+        impl MEMATTR {
+            /// ARM MAIR Write-Allocate bit (bit 0 of inner/outer cache policy nibble).
+            ///
+            /// In the ARM Architecture Reference Manual, the MAIR encoding for Normal memory
+            /// uses the format `0bxxRW` where:
+            /// - `W` (bit 0) = Write-Allocate policy
+            /// - `R` (bit 1) = Read-Allocate policy
+            ///
+            /// For example, `0b1111` = Write-Back with both Read and Write allocation.
+            const ARM_MAIR_WRITE_ALLOCATE: u8 = 0x1;
+            /// ARM MAIR Read-Allocate bit (bit 1 of inner/outer cache policy nibble).
+            const ARM_MAIR_READ_ALLOCATE: u8 = 0x2;
+            /// ARM MAIR Write-back bit (bit 2 of inner/outer cache policy nibble).
+            const ARM_MAIR_WRITE_BACK: u8 = 0x4;
+            /// Mask for the inner cache policy nibble in MAIR attribute bytes.
+            const ARM_MAIR_INNER_MASK: u8 = 0x0f;
+
+            fn encode_attribute(
+                alloc_w: bool,
+                alloc_r: bool,
+                alloc_sel: AllocPolicySelect,
+                coherency: Coherency,
+                memory_type: MemoryType,
+            ) -> MMU_MEMATTR_STAGE1 {
+                MMU_MEMATTR_STAGE1::zeroed()
+                    .with_alloc_w(alloc_w)
+                    .with_alloc_r(alloc_r)
+                    .with_alloc_sel(alloc_sel)
+                    .with_coherency(coherency)
+                    .with_memory_type(memory_type)
+            }
+
+            fn with_encoded_attribute(self, index: usize, attr: MMU_MEMATTR_STAGE1) -> Self {
+                debug_assert!(index < 8);
+
+                let shift = index * 8;
+                let mask = !(0xffu64 << shift);
+                let raw = (self.into_raw() & mask) | ((u64::from(attr.into_raw())) << shift);
+
+                Self::from_raw(raw)
+            }
+
+            /// Check if a MAIR attribute byte represents device memory.
+            ///
+            /// Device memory (memory-mapped I/O, registers) cannot be cached and must
+            /// be mapped as GPU `NonCacheable`.
+            fn is_device_memory(mair_attr: u8) -> bool {
+                // In AArch64 MAIR, device memory has bits [1:0] of outer nibble = 0.
+                let outer = mair_attr >> 4;
+                (outer & 0x3) == 0
+            }
 
+            /// Check if normal memory is fully write-back cacheable.
+            ///
+            /// ARM MAIR has two cache policy levels (outer [7:4] and inner [3:0]).
+            /// For memory to be truly write-back, BOTH levels must have the write-back bit set.
+            /// If only one level is write-back, treat it as non-cacheable for GPU purposes.
+            fn is_writeback_cacheable(mair_attr: u8) -> bool {
+                let outer = mair_attr >> 4;
+                let inner = mair_attr & Self::ARM_MAIR_INNER_MASK;
+
+                (outer & Self::ARM_MAIR_WRITE_BACK) != 0 && (inner & Self::ARM_MAIR_WRITE_BACK) != 0
+            }
+
+            // TODO: Add a `coherent` parameter like panthor's mair_to_memattr().
+            // For now, assume a non-coherent system and always encode write-back
+            // memory with MidgardInnerDomain coherency.
+            fn attribute_from_mair(mair_attr: u8) -> MMU_MEMATTR_STAGE1 {
+                // Device memory or non-writeback normal memory
+                if Self::is_device_memory(mair_attr) || !Self::is_writeback_cacheable(mair_attr) {
+                    return Self::encode_attribute(
+                        false,
+                        false,
+                        AllocPolicySelect::Alloc,
+                        Coherency::MidgardInnerDomain,
+                        MemoryType::NonCacheable,
+                    );
+                }
+
+                // Write-back cacheable normal memory
+                let inner: u8 = mair_attr & Self::ARM_MAIR_INNER_MASK;
+                Self::encode_attribute(
+                    (inner & Self::ARM_MAIR_WRITE_ALLOCATE) != 0,
+                    (inner & Self::ARM_MAIR_READ_ALLOCATE) != 0,
+                    AllocPolicySelect::Alloc,
+                    Coherency::MidgardInnerDomain,
+                    MemoryType::WriteBack,
+                )
+            }
+
+            /// Convert an AArch64 MAIR value into the GPU MEMATTR register encoding.
+            ///
+            /// MAIR bytes map to GPU attributes as follows:
+            /// - device/write-through/non-cacheable → GPU `NonCacheable`
+            /// - write-back → GPU `WriteBack` (preserving inner allocation hints)
+            pub(crate) fn from_mair(mair: u64) -> Self {
+                mair.to_le_bytes()
+                    .into_iter()
+                    .enumerate()
+                    .fold(Self::zeroed(), |acc, (i, attr)| {
+                        acc.with_encoded_attribute(i, Self::attribute_from_mair(attr))
+                    })
+            }
+        }
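The per-byte classification performed by `is_device_memory`, `is_writeback_cacheable` and `attribute_from_mair` can be illustrated outside the register machinery. This is a hedged userspace sketch; the enum and function names are invented for the example, and the driver folds these decisions into MEMATTR register fields rather than an enum:

```rust
// Userspace sketch of the MAIR-byte classification logic (illustration only).
#[derive(Debug, PartialEq)]
enum GpuMemType {
    NonCacheable,
    WriteBack { alloc_r: bool, alloc_w: bool },
}

fn classify_mair_byte(attr: u8) -> GpuMemType {
    let outer = attr >> 4;
    let inner = attr & 0x0f;

    // Device memory: bits [1:0] of the outer nibble are zero.
    let is_device = (outer & 0x3) == 0;
    // Write-back requires the write-back bit (0x4) in BOTH nibbles.
    let is_wb = (outer & 0x4) != 0 && (inner & 0x4) != 0;

    if is_device || !is_wb {
        GpuMemType::NonCacheable
    } else {
        GpuMemType::WriteBack {
            alloc_r: (inner & 0x2) != 0, // Read-Allocate bit
            alloc_w: (inner & 0x1) != 0, // Write-Allocate bit
        }
    }
}

fn main() {
    // 0xff: Normal memory, write-back, read/write-allocate at both levels.
    assert_eq!(
        classify_mair_byte(0xff),
        GpuMemType::WriteBack { alloc_r: true, alloc_w: true }
    );
    // 0x04: outer nibble 0 -> device memory -> non-cacheable for the GPU.
    assert_eq!(classify_mair_byte(0x04), GpuMemType::NonCacheable);
}
```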
+
+        register! {
             /// Lock region address for each address space.
             pub(crate) LOCKADDR(u64)[MAX_AS, stride = STRIDE] @ 0x2410 {
                 /// Lock region size.
diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
index 20b38120e20e..9f9f31ea02e3 100644
--- a/drivers/gpu/drm/tyr/tyr.rs
+++ b/drivers/gpu/drm/tyr/tyr.rs
@@ -11,6 +11,7 @@
 mod file;
 mod gem;
 mod gpu;
+mod mmu;
 mod regs;
 mod slot;
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 10/20] drm/tyr: add GPU virtual memory module
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (8 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 09/20] drm/tyr: add MMU module Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 11/20] drm/tyr: add a kernel buffer object Deborah Brouwer
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Boris Brezillon <boris.brezillon@collabora.com>

Add GPU virtual address space management using the DRM GPUVM framework.
Each virtual memory (VM) space is backed by ARM64 LPAE Stage 1 page tables
and can be mapped into hardware address space (AS) slots for GPU execution.

The implementation provides memory isolation and virtual address
allocation. VMs support mapping GEM buffer objects with configurable
protection flags (readonly, noexec, uncached) and handle both 4KB and 2MB
page sizes. A new_dummy_object() helper is provided to create a dummy GEM
object for use as a GPUVM root.

The vm module integrates with the MMU for address space activation and
provides map/unmap/remap operations with page table synchronization.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/gem.rs |  22 +-
 drivers/gpu/drm/tyr/tyr.rs |   1 +
 drivers/gpu/drm/tyr/vm.rs  | 806 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 828 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
index c6d4d6f9bae3..111acf33993f 100644
--- a/drivers/gpu/drm/tyr/gem.rs
+++ b/drivers/gpu/drm/tyr/gem.rs
@@ -7,9 +7,11 @@
 use kernel::{
     drm::{
         gem,
+        gem::shmem,
         DeviceContext, //
     },
-    prelude::*, //
+    prelude::*,
+    sync::aref::ARef, //
 };
 
 use crate::driver::{
@@ -41,3 +43,21 @@ fn new<Ctx: DeviceContext>(
         try_pin_init!(Self { flags: args.flags })
     }
 }
+
+/// Type alias for Tyr GEM buffer objects.
+pub(crate) type Bo = gem::shmem::Object<BoData>;
+
+/// Creates a dummy GEM object to serve as the root of a GPUVM.
+pub(crate) fn new_dummy_object<Ctx: DeviceContext>(ddev: &TyrDrmDevice<Ctx>) -> Result<ARef<Bo>> {
+    let bo = gem::shmem::Object::<BoData>::new(
+        ddev,
+        4096,
+        shmem::ObjectConfig {
+            map_wc: true,
+            parent_resv_obj: None,
+        },
+        BoCreateArgs { flags: 0 },
+    )?;
+
+    Ok(bo)
+}
diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
index 9f9f31ea02e3..b3244670dd79 100644
--- a/drivers/gpu/drm/tyr/tyr.rs
+++ b/drivers/gpu/drm/tyr/tyr.rs
@@ -14,6 +14,7 @@
 mod mmu;
 mod regs;
 mod slot;
+mod vm;
 
 kernel::module_platform_driver! {
     type: TyrPlatformDriverData,
diff --git a/drivers/gpu/drm/tyr/vm.rs b/drivers/gpu/drm/tyr/vm.rs
new file mode 100644
index 000000000000..c19300d76194
--- /dev/null
+++ b/drivers/gpu/drm/tyr/vm.rs
@@ -0,0 +1,806 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! GPU virtual memory management using the DRM GPUVM framework.
+//!
+//! This module manages GPU virtual address spaces, providing memory isolation and
+//! the illusion of owning the entire virtual address (VA) range, similar to CPU virtual memory.
+//! Each virtual memory (VM) area is backed by ARM64 LPAE Stage 1 page tables and can be
+//! mapped into hardware address space (AS) slots for GPU execution.
+#![allow(dead_code)]
+
+use core::ops::Range;
+
+use kernel::{
+    c_str,
+    device::{
+        Bound,
+        Device, //
+    },
+    drm::{
+        gpuvm::{
+            DriverGpuVm,
+            GpuVaAlloc,
+            GpuVm,
+            GpuVmBo,
+            OpMap,
+            OpMapRequest,
+            OpMapped,
+            OpRemap,
+            OpRemapped,
+            OpUnmap,
+            OpUnmapped,
+            UniqueRefGpuVm, //
+        },
+        DeviceContext, //
+    },
+    impl_flags,
+    iommu::pgtable::{
+        prot,
+        IoPageTable,
+        ARM64LPAES1, //
+    },
+    new_mutex,
+    platform,
+    prelude::*,
+    sizes::{
+        SZ_1G,
+        SZ_2M,
+        SZ_4K, //
+    },
+    sync::{
+        aref::ARef,
+        Arc,
+        ArcBorrow,
+        Mutex, //
+    },
+    uapi, //
+};
+
+use crate::{
+    driver::{
+        TyrDrmDevice,
+        TyrDrmDriver, //
+    },
+    gem,
+    gem::Bo,
+    gpu::GpuInfo,
+    mmu::{
+        address_space::VmAsData,
+        Mmu, //
+    },
+    regs::gpu_control::MMU_FEATURES,
+};
+
+impl_flags!(
+    /// Flags controlling virtual memory mapping behavior.
+    ///
+    /// These flags control access permissions and caching behavior for GPU virtual
+    /// memory mappings.
+    #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
+    pub(crate) struct VmMapFlags(u32);
+
+    /// Individual flags that can be combined in [`VmMapFlags`].
+    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+    pub(crate) enum VmFlag {
+        /// Map as read-only.
+        Readonly = uapi::drm_panthor_vm_bind_op_flags_DRM_PANTHOR_VM_BIND_OP_MAP_READONLY as u32,
+        /// Map as non-executable.
+        Noexec = uapi::drm_panthor_vm_bind_op_flags_DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC as u32,
+        /// Map as uncached.
+        Uncached = uapi::drm_panthor_vm_bind_op_flags_DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED as u32,
+    }
+);
+
+impl VmMapFlags {
+    /// Convert the flags to `pgtable::prot`.
+    fn to_prot(self) -> u32 {
+        let mut prot = 0;
+
+        if self.contains(VmFlag::Readonly) {
+            prot |= prot::READ;
+        } else {
+            prot |= prot::READ | prot::WRITE;
+        }
+
+        if self.contains(VmFlag::Noexec) {
+            prot |= prot::NOEXEC;
+        }
+
+        if !self.contains(VmFlag::Uncached) {
+            prot |= prot::CACHE;
+        }
+
+        prot
+    }
+}
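The flag-to-protection mapping above can be exercised standalone. This sketch uses placeholder bit values for both the map flags and the protection bits; the kernel's actual `prot::*` and panthor uAPI constants may differ:

```rust
// Illustrative placeholders; the real values come from
// kernel::iommu::pgtable::prot and the panthor uAPI and may differ.
const READ: u32 = 1 << 0;
const WRITE: u32 = 1 << 1;
const NOEXEC: u32 = 1 << 2;
const CACHE: u32 = 1 << 3;

const MAP_READONLY: u32 = 1 << 0;
const MAP_NOEXEC: u32 = 1 << 1;
const MAP_UNCACHED: u32 = 1 << 2;

fn to_prot(flags: u32) -> u32 {
    let mut prot = 0;
    // Read is always granted; write only when the mapping is not read-only.
    prot |= if flags & MAP_READONLY != 0 { READ } else { READ | WRITE };
    if flags & MAP_NOEXEC != 0 {
        prot |= NOEXEC;
    }
    // Cacheable unless explicitly uncached.
    if flags & MAP_UNCACHED == 0 {
        prot |= CACHE;
    }
    prot
}

fn main() {
    // Default (no flags): read/write, executable, cached.
    assert_eq!(to_prot(0), READ | WRITE | CACHE);
    // Read-only + noexec + uncached: read permission and the noexec bit only.
    assert_eq!(to_prot(MAP_READONLY | MAP_NOEXEC | MAP_UNCACHED), READ | NOEXEC);
}
```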
+
+impl core::fmt::Display for VmMapFlags {
+    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+        let mut first = true;
+
+        if self.contains(VmFlag::Readonly) {
+            write!(f, "READONLY")?;
+            first = false;
+        }
+        if self.contains(VmFlag::Noexec) {
+            if !first {
+                write!(f, " | ")?;
+            }
+            write!(f, "NOEXEC")?;
+            first = false;
+        }
+
+        if self.contains(VmFlag::Uncached) {
+            if !first {
+                write!(f, " | ")?;
+            }
+            write!(f, "UNCACHED")?;
+        }
+
+        Ok(())
+    }
+}
+
+impl TryFrom<u32> for VmMapFlags {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        let valid = (uapi::drm_panthor_vm_bind_op_flags_DRM_PANTHOR_VM_BIND_OP_MAP_READONLY
+            | uapi::drm_panthor_vm_bind_op_flags_DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC
+            | uapi::drm_panthor_vm_bind_op_flags_DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED)
+            as u32;
+
+        if value & !valid != 0 {
+            pr_err!("Invalid VM map flags: {:#x}\n", value);
+            return Err(EINVAL);
+        }
+        Ok(Self(value))
+    }
+}
+
+/// Arguments for a virtual memory map operation.
+struct VmMapArgs {
+    /// Access permissions and caching behavior for the mapping.
+    flags: VmMapFlags,
+    /// GEM buffer object registered with the GPUVM framework.
+    vm_bo: ARef<GpuVmBo<GpuVmData>>,
+    /// Offset in bytes from the start of the buffer object.
+    bo_offset: u64,
+}
+
+/// Type of virtual memory operation.
+enum VmOpType {
+    /// Map a GEM buffer object into the virtual address space.
+    Map(VmMapArgs),
+    /// Unmap a region from the virtual address space.
+    Unmap,
+}
+
+/// Preallocated resources needed to execute a VM operation.
+///
+/// VM operations may require allocating new GPUVA objects to track mappings.
+/// To avoid allocation failures during the operation, preallocate the
+/// maximum number of GPUVAs that might be needed.
+struct VmOpResources {
+    /// Preallocated GPUVA objects for remap operations.
+    ///
+    /// Partial unmap requests or map requests overlapping existing mappings
+    /// will trigger a remap call, which needs to register up to three VA
+    /// objects (one for the new mapping, and two for the previous and next
+    /// mappings).
+    preallocated_gpuvas: [Option<GpuVaAlloc<GpuVmData>>; 3],
+}
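The "up to three VA objects" bound on remaps follows from how an overlapping request splits an existing mapping: a "prev" remainder, the new mapping itself, and a "next" remainder. This is a hedged standalone sketch of that splitting, not driver code (GPUVM performs the equivalent bookkeeping internally):

```rust
// Illustration of why a remap needs at most three VA objects (sketch only).
// Each mapping is a half-open (start, end) address range.
fn remap_pieces(existing: (u64, u64), req: (u64, u64)) -> Vec<(u64, u64)> {
    let mut pieces = Vec::new();
    if req.0 > existing.0 {
        pieces.push((existing.0, req.0)); // "prev" remainder of the old mapping
    }
    pieces.push(req); // the new mapping
    if req.1 < existing.1 {
        pieces.push((req.1, existing.1)); // "next" remainder of the old mapping
    }
    pieces
}

fn main() {
    // Mapping into the middle of an existing mapping needs all three pieces.
    assert_eq!(
        remap_pieces((0x0, 0x10000), (0x4000, 0x8000)),
        vec![(0x0, 0x4000), (0x4000, 0x8000), (0x8000, 0x10000)]
    );
}
```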
+
+/// Request to execute a virtual memory operation.
+struct VmOpRequest {
+    /// Request type.
+    op_type: VmOpType,
+
+    /// Region of the virtual address space covered by this request.
+    region: Range<u64>,
+}
+
+/// Arguments for a page table map operation.
+struct PtMapArgs {
+    /// Memory protection flags describing allowed accesses for this mapping.
+    ///
+    /// This is directly derived from [`VmMapFlags`] via [`VmMapFlags::to_prot`].
+    prot: u32,
+}
+
+/// Type of page table operation.
+enum PtOpType {
+    /// Map pages into the page table.
+    Map(PtMapArgs),
+    /// Unmap pages from the page table.
+    Unmap,
+}
+
+/// Context for updating the GPU page table.
+///
+/// This context is created when beginning a page table update operation and
+/// automatically flushes changes when dropped. It ensures that the
+/// Memory Management Unit (MMU) state is properly managed and Translation
+/// Lookaside Buffer (TLB) entries are flushed.
+pub(crate) struct PtUpdateContext<'ctx> {
+    /// Device used for DMA-mapping GEM shmem SG tables.
+    dev: &'ctx Device<Bound>,
+
+    /// Page table.
+    pt: &'ctx IoPageTable<ARM64LPAES1>,
+
+    /// MMU manager.
+    mmu: &'ctx Mmu,
+
+    /// Reference to the address space data to pass to the MMU functions.
+    as_data: &'ctx VmAsData,
+
+    /// Region of the virtual address space covered by this request.
+    region: Range<u64>,
+
+    /// Operation type.
+    op_type: PtOpType,
+
+    /// Preallocated resources that can be used when executing the request.
+    resources: &'ctx mut VmOpResources,
+}
+
+impl<'ctx> PtUpdateContext<'ctx> {
+    /// Creates a new page table update context.
+    ///
+    /// This prepares the MMU for a page table update.
+    /// The context will automatically flush the TLB and
+    /// complete the update when dropped.
+    fn new(
+        dev: &'ctx Device<Bound>,
+        pt: &'ctx IoPageTable<ARM64LPAES1>,
+        mmu: &'ctx Mmu,
+        as_data: &'ctx VmAsData,
+        region: Range<u64>,
+        op_type: PtOpType,
+        resources: &'ctx mut VmOpResources,
+    ) -> Result<PtUpdateContext<'ctx>> {
+        mmu.start_vm_update(as_data, &region)?;
+
+        Ok(Self {
+            dev,
+            pt,
+            mmu,
+            as_data,
+            region,
+            op_type,
+            resources,
+        })
+    }
+
+    /// Takes one of the preallocated GPUVAs.
+    ///
+    /// It is a logic error to call this more than three times for a given
+    /// [`PtUpdateContext`].
+    fn preallocated_gpuva(&mut self) -> Result<GpuVaAlloc<GpuVmData>> {
+        self.resources
+            .preallocated_gpuvas
+            .iter_mut()
+            .find_map(|f| f.take())
+            .ok_or(EINVAL)
+    }
+}
+
+impl Drop for PtUpdateContext<'_> {
+    fn drop(&mut self) {
+        if let Err(e) = self.mmu.end_vm_update(self.as_data) {
+            pr_err!("Failed to end VM update: {:?}\n", e);
+        }
+
+        if let Err(e) = self.mmu.flush_vm(self.as_data) {
+            pr_err!("Failed to flush VM: {:?}\n", e);
+        }
+    }
+}
+
+/// Driver implementation for the GPUVM framework.
+///
+/// Implements [`DriverGpuVm`] to provide VM operation callbacks (map, unmap, remap)
+/// and associated types for buffer objects, virtual addresses, and contexts.
+pub(crate) struct GpuVmData {}
+
+/// GPU virtual address space.
+///
+/// Each VM can be mapped into a hardware address space slot.
+#[pin_data]
+pub(crate) struct Vm {
+    /// Data referenced by an AS when the VM is active
+    as_data: Arc<VmAsData>,
+    /// MMU manager.
+    mmu: Arc<Mmu>,
+    /// Platform device reference (needed to access the page table via devres).
+    pdev: ARef<platform::Device>,
+    /// DRM GPUVM core for managing virtual address space.
+    #[pin]
+    gpuvm_unique: Mutex<UniqueRefGpuVm<GpuVmData>>,
+    /// Non-core part of the GPUVM. Can be used for operations that do not
+    /// modify the internal mapping tree, such as [`GpuVm::obtain`].
+    gpuvm: ARef<GpuVm<GpuVmData>>,
+    /// VA range for this VM.
+    va_range: Range<u64>,
+}
+
+impl Vm {
+    /// Creates a new GPU virtual address space.
+    ///
+    /// The VM is initialized with a page table configured according to the GPU's
+    /// address translation capabilities and registered with the GPUVM framework.
+    pub(crate) fn new<Ctx: DeviceContext>(
+        pdev: &platform::Device,
+        ddev: &TyrDrmDevice<Ctx>,
+        mmu: ArcBorrow<'_, Mmu>,
+        gpu_info: &GpuInfo,
+    ) -> Result<Arc<Vm>> {
+        let mmu_features = MMU_FEATURES::from_raw(gpu_info.mmu_features);
+        let va_bits = mmu_features.va_bits().get();
+        let pa_bits = mmu_features.pa_bits().get();
+
+        let range = 0..(1u64 << va_bits);
+        let reserve_range = 0..0u64;
+
+        // dummy_obj is used to initialize the GPUVM tree.
+        let dummy_obj = gem::new_dummy_object(ddev).inspect_err(|e| {
+            pr_err!("Failed to create dummy GEM object: {:?}\n", e);
+        })?;
+
+        let gpuvm_unique = kernel::drm::gpuvm::GpuVm::new::<Error, _>(
+            c_str!("Tyr::GpuVm"),
+            ddev,
+            &*dummy_obj,
+            range.clone(),
+            reserve_range,
+            GpuVmData {},
+        )
+        .inspect_err(|e| {
+            pr_err!("Failed to create GpuVm: {:?}\n", e);
+        })?;
+        let gpuvm = ARef::from(&*gpuvm_unique);
+
+        let as_data = Arc::pin_init(VmAsData::new(&mmu, pdev, va_bits, pa_bits), GFP_KERNEL)?;
+
+        let vm = Arc::pin_init(
+            pin_init!(Self{
+                as_data,
+                pdev: pdev.into(),
+                mmu: mmu.into(),
+                gpuvm,
+                gpuvm_unique <- new_mutex!(gpuvm_unique),
+                va_range: range,
+            }),
+            GFP_KERNEL,
+        )?;
+
+        Ok(vm)
+    }
+
+    /// Activate the VM in a hardware address space slot.
+    pub(crate) fn activate(&self) -> Result {
+        self.mmu
+            .activate_vm(self.as_data.as_arc_borrow())
+            .inspect_err(|e| {
+                pr_err!("Failed to activate VM: {:?}\n", e);
+            })
+    }
+
+    /// Deactivate the VM by evicting it from its address space slot.
+    fn deactivate(&self) -> Result {
+        self.mmu.deactivate_vm(&self.as_data).inspect_err(|e| {
+            pr_err!("Failed to deactivate VM: {:?}\n", e);
+        })
+    }
+
+    /// Kills the VM by deactivating it and unmapping all regions.
+    pub(crate) fn kill(&self) {
+        // TODO: Turn the VM into a state where it can't be used.
+        let _ = self.deactivate().inspect_err(|e| {
+            pr_err!("Failed to deactivate VM: {:?}\n", e);
+        });
+        let _ = self
+            .unmap_range(self.va_range.start, self.va_range.end - self.va_range.start)
+            .inspect_err(|e| {
+                pr_err!("Failed to unmap range during kill: {:?}\n", e);
+            });
+    }
+
+    /// Executes a virtual memory operation.
+    ///
+    /// This handles both map and unmap operations by coordinating between the
+    /// GPUVM framework and the hardware page table.
+    fn exec_op(
+        &self,
+        gpuvm_unique: &mut UniqueRefGpuVm<GpuVmData>,
+        req: VmOpRequest,
+        resources: &mut VmOpResources,
+    ) -> Result {
+        // SAFETY: pdev is a bound device.
+        let dev = unsafe { self.pdev.as_ref().as_bound() };
+
+        let pt = self.as_data.page_table.access(dev).inspect_err(|e| {
+            pr_err!("Failed to access page table while mapping pages: {:?}\n", e);
+        })?;
+
+        match req.op_type {
+            VmOpType::Map(args) => {
+                let mut pt_upd = PtUpdateContext::new(
+                    dev,
+                    pt,
+                    &self.mmu,
+                    &self.as_data,
+                    req.region,
+                    PtOpType::Map(PtMapArgs {
+                        prot: args.flags.to_prot(),
+                    }),
+                    resources,
+                )?;
+
+                gpuvm_unique.sm_map(OpMapRequest {
+                    addr: pt_upd.region.start,
+                    range: pt_upd.region.end - pt_upd.region.start,
+                    gem_offset: args.bo_offset,
+                    vm_bo: &args.vm_bo,
+                    context: &mut pt_upd,
+                })
+                // `PtUpdateContext` drops here, flushing the page table.
+            }
+            VmOpType::Unmap => {
+                let mut pt_upd = PtUpdateContext::new(
+                    dev,
+                    pt,
+                    &self.mmu,
+                    &self.as_data,
+                    req.region,
+                    PtOpType::Unmap,
+                    resources,
+                )?;
+
+                gpuvm_unique.sm_unmap(
+                    pt_upd.region.start,
+                    pt_upd.region.end - pt_upd.region.start,
+                    &mut pt_upd,
+                )
+                // `PtUpdateContext` drops here, flushing the page table.
+            }
+        }
+    }
+
+    /// Maps a GEM buffer object range into the VM at the specified virtual address.
+    ///
+    /// This creates a mapping from GPU virtual address `va` to the physical pages
+    /// backing the GEM object, starting at `bo_offset` bytes into the object and
+    /// spanning `size` bytes. The mapping respects the access permissions and
+    /// caching behavior specified in `flags`.
+    pub(crate) fn map_bo_range(
+        &self,
+        bo: &Bo,
+        bo_offset: u64,
+        size: u64,
+        va: u64,
+        flags: VmMapFlags,
+    ) -> Result {
+        let req = VmOpRequest {
+            op_type: VmOpType::Map(VmMapArgs {
+                vm_bo: self.gpuvm.obtain(bo, ())?,
+                flags,
+                bo_offset,
+            }),
+            region: va..(va + size),
+        };
+        let mut resources = VmOpResources {
+            preallocated_gpuvas: [
+                Some(GpuVaAlloc::<GpuVmData>::new(GFP_KERNEL)?),
+                Some(GpuVaAlloc::<GpuVmData>::new(GFP_KERNEL)?),
+                Some(GpuVaAlloc::<GpuVmData>::new(GFP_KERNEL)?),
+            ],
+        };
+        let mut gpuvm_unique = self.gpuvm_unique.lock();
+
+        self.exec_op(gpuvm_unique.as_mut().get_mut(), req, &mut resources)?;
+
+        // We flush the deferred-cleanup list now. Things will be different in
+        // the asynchronous VM_BIND path, where we want the cleanup to
+        // happen outside the DMA signalling path.
+        self.gpuvm.deferred_cleanup();
+        Ok(())
+    }
+
+    /// Unmaps a virtual address range from the VM.
+    ///
+    /// This removes any existing mappings in the specified range, freeing the
+    /// virtual address space for reuse.
+    pub(crate) fn unmap_range(&self, va: u64, size: u64) -> Result {
+        let req = VmOpRequest {
+            op_type: VmOpType::Unmap,
+            region: va..(va + size),
+        };
+        let mut resources = VmOpResources {
+            preallocated_gpuvas: [
+                Some(GpuVaAlloc::<GpuVmData>::new(GFP_KERNEL)?),
+                Some(GpuVaAlloc::<GpuVmData>::new(GFP_KERNEL)?),
+                None,
+            ],
+        };
+        let mut gpuvm_unique = self.gpuvm_unique.lock();
+
+        self.exec_op(gpuvm_unique.as_mut().get_mut(), req, &mut resources)?;
+
+        // We flush the deferred-cleanup list now. Things will be different in
+        // the asynchronous VM_BIND path, where we want the cleanup to
+        // happen outside the DMA signalling path.
+        self.gpuvm.deferred_cleanup();
+        Ok(())
+    }
+}
+
+impl DriverGpuVm for GpuVmData {
+    type Driver = TyrDrmDriver;
+    type Object = Bo;
+    type VmBoData = ();
+    type VaData = ();
+    type SmContext<'ctx> = PtUpdateContext<'ctx>;
+
+    /// Indicates that a new mapping should be created.
+    fn sm_step_map<'op>(
+        &mut self,
+        op: OpMap<'op, Self>,
+        context: &mut Self::SmContext<'_>,
+    ) -> Result<OpMapped<'op, Self>, Error> {
+        let start_iova = op.addr();
+        let mut iova = start_iova;
+        let mut bytes_left_to_map = op.length();
+        let mut gem_offset = op.gem_offset();
+        let sgt = op.obj().sg_table(context.dev).inspect_err(|e| {
+            pr_err!("Failed to get sg_table: {:?}\n", e);
+        })?;
+        let prot = match &context.op_type {
+            PtOpType::Map(args) => args.prot,
+            _ => {
+                return Err(EINVAL);
+            }
+        };
+
+        for sgt_entry in sgt.iter() {
+            let mut paddr = sgt_entry.dma_address();
+            let mut sgt_entry_length: u64 = sgt_entry.dma_len();
+
+            if bytes_left_to_map == 0 {
+                break;
+            }
+
+            if gem_offset > 0 {
+                // Skip up to `gem_offset` bytes of this entry; if the offset
+                // exceeds the entry length, the whole entry is skipped.
+                let skip = sgt_entry_length.min(gem_offset);
+                paddr += skip;
+                sgt_entry_length -= skip;
+                gem_offset -= skip;
+            }
+
+            if sgt_entry_length == 0 {
+                continue;
+            }
+
+            if gem_offset != 0 {
+                pr_err!("Invalid gem_offset {} in page table mapping.\n", gem_offset);
+                return Err(EINVAL);
+            }
+            let len = sgt_entry_length.min(bytes_left_to_map);
+
+            let segment_mapped = match pt_map(context.pt, iova, paddr, len, prot) {
+                Ok(segment_mapped) => segment_mapped,
+                Err(e) => {
+                    // Clean up any successful mappings from previous SGT entries.
+                    let total_mapped = iova - start_iova;
+                    if total_mapped > 0 {
+                        pt_unmap(context.pt, start_iova..(start_iova + total_mapped)).ok();
+                    }
+                    return Err(e);
+                }
+            };
+
+            // Since there could be a partial mapping, only advance by the actual amount mapped
+            bytes_left_to_map -= segment_mapped;
+            iova += segment_mapped;
+        }
+
+        let gpuva = context.preallocated_gpuva()?;
+        let op = op.insert(gpuva, pin_init::init_zeroed());
+
+        Ok(op)
+    }
+
+    /// Indicates that an existing mapping should be removed.
+    fn sm_step_unmap<'op>(
+        &mut self,
+        op: OpUnmap<'op, Self>,
+        context: &mut Self::SmContext<'_>,
+    ) -> Result<OpUnmapped<'op, Self>, Error> {
+        let start_iova = op.va().addr();
+        let length = op.va().length();
+
+        let region = start_iova..(start_iova + length);
+        pt_unmap(context.pt, region.clone()).inspect_err(|e| {
+            pr_err!(
+                "Failed to unmap region {:#x}..{:#x}: {:?}\n",
+                region.start,
+                region.end,
+                e
+            );
+        })?;
+
+        let (op_unmapped, _va_removed) = op.remove();
+
+        Ok(op_unmapped)
+    }
+
+    /// Indicates that an existing mapping should be split up.
+    fn sm_step_remap<'op>(
+        &mut self,
+        op: OpRemap<'op, Self>,
+        context: &mut Self::SmContext<'_>,
+    ) -> Result<OpRemapped<'op, Self>, Error> {
+        let unmap_start = if let Some(prev) = op.prev() {
+            prev.addr() + prev.length()
+        } else {
+            op.va_to_unmap().addr()
+        };
+
+        let unmap_end = if let Some(next) = op.next() {
+            next.addr()
+        } else {
+            op.va_to_unmap().addr() + op.va_to_unmap().length()
+        };
+
+        let unmap_length = unmap_end - unmap_start;
+
+        if unmap_length > 0 {
+            let region = unmap_start..(unmap_start + unmap_length);
+            pt_unmap(context.pt, region.clone()).inspect_err(|e| {
+                pr_err!(
+                    "Failed to unmap remap region {:#x}..{:#x}: {:?}\n",
+                    region.start,
+                    region.end,
+                    e
+                );
+            })?;
+        }
+
+        let prev_va = context.preallocated_gpuva()?;
+        let next_va = context.preallocated_gpuva()?;
+
+        let (op_remapped, _remap_ret) = op.remap(
+            [prev_va, next_va],
+            pin_init::init_zeroed(),
+            pin_init::init_zeroed(),
+        );
+
+        Ok(op_remapped)
+    }
+}
+
+/// Selects the largest supported block size (currently 4KB or 2MB) that can
+/// be used for a mapping at the given address and size, respecting alignment
+/// constraints.
+///
+/// We can map multiple pages at once, but we can't exceed the size of the
+/// table entry itself. So, if mapping 4KB pages, figure out how many pages
+/// can be mapped before we hit the 2MB boundary; if mapping 2MB pages,
+/// figure out how many pages can be mapped before hitting the 1GB boundary.
+///
+/// Returns the page size (4KB or 2MB) and the number of pages that can be
+/// mapped at that size.
+fn get_pgsize(addr: u64, size: u64) -> (u64, u64) {
+    // Get the distance to the next boundary of 2MB block
+    let blk_offset_2m = addr.wrapping_neg() % (SZ_2M as u64);
+
+    // Use 4K blocks if the address is not 2MB aligned, or we have less than 2MB to map
+    if blk_offset_2m != 0 || size < SZ_2M as u64 {
+        let pgcount = if blk_offset_2m == 0 {
+            size / SZ_4K as u64
+        } else {
+            blk_offset_2m.min(size) / SZ_4K as u64
+        };
+        return (SZ_4K as u64, pgcount);
+    }
+
+    let blk_offset_1g = addr.wrapping_neg() % (SZ_1G as u64);
+    let blk_offset = if blk_offset_1g == 0 {
+        SZ_1G as u64
+    } else {
+        blk_offset_1g
+    };
+    let pgcount = blk_offset.min(size) / SZ_2M as u64;
+
+    (SZ_2M as u64, pgcount)
+}
+
+/// Maps a physical address range into the page table at the specified virtual address.
+///
+/// This function maps `len` bytes of physical memory starting at `paddr` to the
+/// virtual address `iova`, using the protection flags specified in `prot`. It
+/// automatically selects optimal page sizes to minimize page table overhead.
+///
+/// If the mapping fails partway through, all successfully mapped pages are
+/// unmapped before returning an error.
+///
+/// Returns the number of bytes successfully mapped.
+fn pt_map(
+    pt: &IoPageTable<ARM64LPAES1>,
+    iova: u64,
+    paddr: u64,
+    len: u64,
+    prot: u32,
+) -> Result<u64> {
+    let mut segment_mapped = 0u64;
+    while segment_mapped < len {
+        let remaining = len - segment_mapped;
+        let curr_iova = iova + segment_mapped;
+        let curr_paddr = paddr + segment_mapped;
+
+        let (pgsize, pgcount) = get_pgsize(curr_iova | curr_paddr, remaining);
+
+        // SAFETY: Exclusive access to the page table is ensured because
+        // the pt reference comes from PtUpdateContext, which is created
+        // during a VM update operation, ensuring the driver does not concurrently
+        // modify the page table.
+        let (mapped, result) = unsafe {
+            pt.map_pages(
+                curr_iova as usize,
+                (curr_paddr as usize).try_into().unwrap(),
+                pgsize as usize,
+                pgcount as usize,
+                prot,
+                GFP_KERNEL,
+            )
+        };
+
+        if let Err(e) = result {
+            pr_err!("pt.map_pages failed at iova {:#x}: {:?}\n", curr_iova, e);
+            if segment_mapped > 0 {
+                pt_unmap(pt, iova..(iova + segment_mapped)).ok();
+            }
+            return Err(e);
+        }
+
+        if mapped == 0 {
+            pr_err!("Failed to map any pages at iova {:#x}\n", curr_iova);
+            if segment_mapped > 0 {
+                pt_unmap(pt, iova..(iova + segment_mapped)).ok();
+            }
+            return Err(ENOMEM);
+        }
+
+        segment_mapped += mapped as u64;
+    }
+
+    Ok(segment_mapped)
+}
+
+/// Unmaps a virtual address range from the page table.
+///
+/// This function removes all page table entries in the specified range,
+/// automatically handling different page sizes that may be present.
+fn pt_unmap(pt: &IoPageTable<ARM64LPAES1>, range: Range<u64>) -> Result {
+    let mut iova = range.start;
+    let mut bytes_left_to_unmap = range.end - range.start;
+
+    while bytes_left_to_unmap > 0 {
+        let (pgsize, pgcount) = get_pgsize(iova, bytes_left_to_unmap);
+
+        // SAFETY: Exclusive access to the page table is ensured because
+        // the pt reference comes from PtUpdateContext, which is created
+        // during a VM update operation, ensuring the driver does not
+        // concurrently modify the page table.
+        let unmapped = unsafe { pt.unmap_pages(iova as usize, pgsize as usize, pgcount as usize) };
+
+        if unmapped == 0 {
+            pr_err!("Failed to unmap any bytes at iova {:#x}\n", iova);
+            return Err(EINVAL);
+        }
+
+        bytes_left_to_unmap -= unmapped as u64;
+        iova += unmapped as u64;
+    }
+
+    Ok(())
+}
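
As a sanity check outside the kernel, the block-size selection above can be
exercised with plain integers. This is a sketch only: the SZ_* constants are
written out with their usual values, and the function body mirrors the patch.

```rust
// Standalone mirror of get_pgsize() for experimentation in plain Rust.
const SZ_4K: u64 = 4 << 10;
const SZ_2M: u64 = 2 << 20;
const SZ_1G: u64 = 1 << 30;

/// Returns (page size, number of pages mappable at that size).
fn get_pgsize(addr: u64, size: u64) -> (u64, u64) {
    // Distance to the next 2MB boundary.
    let blk_offset_2m = addr.wrapping_neg() % SZ_2M;

    // Use 4K blocks if the address is not 2MB aligned, or less than 2MB remains.
    if blk_offset_2m != 0 || size < SZ_2M {
        let pgcount = if blk_offset_2m == 0 {
            size / SZ_4K
        } else {
            blk_offset_2m.min(size) / SZ_4K
        };
        return (SZ_4K, pgcount);
    }

    // Otherwise use 2MB blocks up to the next 1GB boundary.
    let blk_offset_1g = addr.wrapping_neg() % SZ_1G;
    let blk_offset = if blk_offset_1g == 0 { SZ_1G } else { blk_offset_1g };
    (SZ_2M, blk_offset.min(size) / SZ_2M)
}

fn main() {
    // 2MB-aligned address with 8MB to map: four 2MB blocks.
    assert_eq!(get_pgsize(0x20_0000, 4 * SZ_2M), (SZ_2M, 4));
    // 4K-aligned (not 2MB-aligned): 4K pages up to the next 2MB boundary.
    assert_eq!(get_pgsize(0x1000, SZ_2M), (SZ_4K, 511));
    // 2MB-aligned but only 1MB to map: 256 4K pages.
    assert_eq!(get_pgsize(0, SZ_2M / 2), (SZ_4K, 256));
}
```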

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 11/20] drm/tyr: add a kernel buffer object
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (9 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 10/20] drm/tyr: add GPU virtual memory module Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 12/20] drm/tyr: add parser for firmware binary Deborah Brouwer
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Introduce a buffer object type (KernelBo) for internal driver allocations
that are managed by the kernel rather than userspace.

KernelBo wraps a GEM shmem object and automatically handles GPU virtual
address space mapping during creation and unmapping on drop. This provides
a safe and convenient way for the driver to both allocate and clean up
internal buffers for kernel-managed resources.
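
For illustration, the map-on-create / unmap-on-drop contract can be sketched
in plain Rust against a mock VM. All names below are hypothetical stand-ins
for this pattern, not the driver's actual API:

```rust
use std::cell::RefCell;
use std::ops::Range;
use std::rc::Rc;

// Mock VM that records mapped ranges; stands in for the driver's Vm.
#[derive(Default)]
struct MockVm {
    mapped: RefCell<Vec<Range<u64>>>,
}

impl MockVm {
    fn map_range(&self, va: u64, size: u64) {
        self.mapped.borrow_mut().push(va..va + size);
    }
    fn unmap_range(&self, va: u64, size: u64) {
        self.mapped.borrow_mut().retain(|r| *r != (va..va + size));
    }
}

// Kernel-owned BO: maps on creation, unmaps automatically when dropped.
struct KernelBo {
    vm: Rc<MockVm>,
    va_range: Range<u64>,
}

impl KernelBo {
    fn new(vm: Rc<MockVm>, va: u64, size: u64) -> Self {
        vm.map_range(va, size);
        Self { vm, va_range: va..va + size }
    }
}

impl Drop for KernelBo {
    fn drop(&mut self) {
        let (va, end) = (self.va_range.start, self.va_range.end);
        self.vm.unmap_range(va, end - va);
    }
}

fn main() {
    let vm = Rc::new(MockVm::default());
    {
        let _bo = KernelBo::new(vm.clone(), 0x1000, 0x2000);
        assert_eq!(vm.mapped.borrow().len(), 1);
    } // _bo dropped here: the mapping is removed.
    assert!(vm.mapped.borrow().is_empty());
}
```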

Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/gem.rs | 124 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 117 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
index 111acf33993f..d032a8ae543f 100644
--- a/drivers/gpu/drm/tyr/gem.rs
+++ b/drivers/gpu/drm/tyr/gem.rs
@@ -4,6 +4,8 @@
 //! This module provides buffer object (BO) management functionality using
 //! DRM's GEM subsystem with shmem backing.
 
+use core::ops::Range;
+
 use kernel::{
     drm::{
         gem,
@@ -11,23 +13,41 @@
         DeviceContext, //
     },
     prelude::*,
-    sync::aref::ARef, //
+    sync::{
+        aref::ARef,
+        Arc,
+        ArcBorrow, //
+    },
 };
 
-use crate::driver::{
-    TyrDrmDevice,
-    TyrDrmDriver, //
+use crate::{
+    driver::{
+        TyrDrmDevice,
+        TyrDrmDriver, //
+    },
+    vm::{
+        Vm,
+        VmMapFlags, //
+    },
 };
 
-/// Tyr's DriverObject type for GEM objects.
+/// Driver-specific data for Tyr GEM buffer objects.
+///
+/// This structure contains Tyr-specific metadata associated with each GEM object.
+/// It implements [`gem::DriverObject`] to provide driver-specific behavior for
+/// buffer object creation and management.
 #[pin_data]
 pub(crate) struct BoData {
+    /// Buffer object creation flags (currently unused).
     flags: u32,
 }
 
-/// Provides a way to pass arguments when creating BoData
-/// as required by the gem::DriverObject trait.
+/// Arguments for creating a [`BoData`] instance.
+///
+/// This structure is used to pass creation parameters when instantiating
+/// a new buffer object, as required by the [`gem::DriverObject`] trait.
 pub(crate) struct BoCreateArgs {
+    /// Buffer object creation flags (currently unused).
     flags: u32,
 }
 
@@ -35,6 +55,12 @@ impl gem::DriverObject for BoData {
     type Driver = TyrDrmDriver;
     type Args = BoCreateArgs;
 
+    /// Constructs a new [`BoData`] instance for a GEM object.
+    ///
+    /// This function is called by the GEM subsystem when creating a new buffer
+    /// object. It initializes the driver-specific data with the provided flags.
+    /// The device and size parameters are currently unused but required by the
+    /// [`gem::DriverObject`] trait.
     fn new<Ctx: DeviceContext>(
         _dev: &TyrDrmDevice<Ctx>,
         _size: usize,
@@ -61,3 +87,87 @@ pub(crate) fn new_dummy_object<Ctx: DeviceContext>(ddev: &TyrDrmDevice<Ctx>) ->
 
     Ok(bo)
 }
+
+/// VA allocation strategy for kernel buffer objects.
+///
+/// Specifies how the GPU virtual address should be determined when creating
+/// a [`KernelBo`]. An automatic VA allocation strategy will be added in the future.
+pub(crate) enum KernelBoVaAlloc {
+    /// Explicit VA address specified by the caller.
+    #[expect(dead_code)]
+    Explicit(u64),
+}
+
+/// A kernel-owned buffer object with automatic GPU virtual address mapping.
+///
+/// This structure represents a buffer object that is created and managed entirely
+/// by the kernel driver, as opposed to userspace-created GEM objects. It combines
+/// a GEM object with automatic GPU virtual address (VA) space mapping and cleanup.
+///
+/// When dropped, the buffer is automatically unmapped from the GPU VA space.
+pub(crate) struct KernelBo {
+    /// The underlying GEM buffer object.
+    #[expect(dead_code)]
+    pub(crate) bo: ARef<Bo>,
+    /// The GPU VM this buffer is mapped into.
+    vm: Arc<Vm>,
+    /// The GPU VA range occupied by this buffer.
+    va_range: Range<u64>,
+}
+
+impl KernelBo {
+    /// Creates a new kernel-owned buffer object and maps it into GPU VA space.
+    ///
+    /// This function allocates a new shmem-backed GEM object and immediately maps
+    /// it into the specified GPU virtual memory space. The mapping is automatically
+    /// cleaned up when the [`KernelBo`] is dropped.
+    #[expect(dead_code)]
+    pub(crate) fn new<Ctx: DeviceContext>(
+        ddev: &TyrDrmDevice<Ctx>,
+        vm: ArcBorrow<'_, Vm>,
+        size: u64,
+        va_alloc: KernelBoVaAlloc,
+        flags: VmMapFlags,
+    ) -> Result<Self> {
+        if size == 0 {
+            pr_err!("Cannot create KernelBo with size 0\n");
+            return Err(EINVAL);
+        }
+
+        let KernelBoVaAlloc::Explicit(va) = va_alloc;
+
+        let bo = gem::shmem::Object::<BoData>::new(
+            ddev,
+            size as usize,
+            shmem::ObjectConfig {
+                map_wc: true,
+                parent_resv_obj: None,
+            },
+            BoCreateArgs { flags: 0 },
+        )?;
+
+        vm.map_bo_range(&bo, 0, size, va, flags)?;
+
+        Ok(KernelBo {
+            bo,
+            vm: vm.into(),
+            va_range: va..(va + size),
+        })
+    }
+}
+
+impl Drop for KernelBo {
+    fn drop(&mut self) {
+        let va = self.va_range.start;
+        let size = self.va_range.end - self.va_range.start;
+
+        if let Err(e) = self.vm.unmap_range(va, size) {
+            pr_err!(
+                "Failed to unmap KernelBo range {:#x}..{:#x}: {:?}\n",
+                self.va_range.start,
+                self.va_range.end,
+                e
+            );
+        }
+    }
+}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 12/20] drm/tyr: add parser for firmware binary
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (10 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 11/20] drm/tyr: add a kernel buffer object Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-27  8:09   ` Onur Özkan
  2026-04-24 23:39 ` [PATCH v4 13/20] drm/tyr: add firmware loading and MCU boot support Deborah Brouwer
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

From: Daniel Almeida <daniel.almeida@collabora.com>

Add a parser for the Mali CSF GPU firmware binary format. The firmware
consists of a header followed by entries describing how to load firmware
sections into the MCU's memory.

The parser extracts section metadata including virtual address ranges,
data byte offsets within the binary, and section flags controlling
permissions and cache modes. It validates the basic firmware structure
and alignment and ignores protected-mode sections for now.
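
The parsing described above boils down to bounds-checked, little-endian reads
over a byte slice; the following plain-Rust sketch (simplified error type, not
the kernel implementation) shows the cursor mechanic:

```rust
// Minimal stand-in for the parser's Cursor: sequential little-endian
// reads over the firmware bytes, with bounds checking.
struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Reads `n` bytes at the current position and advances the cursor.
    fn read(&mut self, n: usize) -> Result<&'a [u8], ()> {
        let end = self.pos.checked_add(n).ok_or(())?;
        if end > self.data.len() {
            return Err(()); // out-of-bounds read: truncated firmware
        }
        let bytes = &self.data[self.pos..end];
        self.pos = end;
        Ok(bytes)
    }

    /// Reads a little-endian `u32` and advances the cursor.
    fn read_u32(&mut self) -> Result<u32, ()> {
        Ok(u32::from_le_bytes(self.read(4)?.try_into().map_err(|_| ())?))
    }
}

fn main() {
    // 8 bytes: two little-endian u32 values.
    let mut c = Cursor::new(&[0x34, 0x12, 0x00, 0x00, 0x01, 0x00, 0x00, 0xC0]);
    assert_eq!(c.read_u32(), Ok(0x1234));
    assert_eq!(c.read_u32(), Ok(0xC000_0001));
    assert!(c.read_u32().is_err()); // past the end of the data
}
```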

Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Co-developed-by: Beata Michalska <beata.michalska@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/fw/parser.rs | 519 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 519 insertions(+)

diff --git a/drivers/gpu/drm/tyr/fw/parser.rs b/drivers/gpu/drm/tyr/fw/parser.rs
new file mode 100644
index 000000000000..638707430701
--- /dev/null
+++ b/drivers/gpu/drm/tyr/fw/parser.rs
@@ -0,0 +1,519 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Firmware binary parser for Mali CSF (Command Stream Frontend) GPU.
+//!
+//! This module implements a parser for the Mali GPU firmware binary format. The firmware
+//! file contains a header followed by a sequence of entries, each describing how to load
+//! firmware sections into the MCU (Microcontroller Unit) memory. The parser extracts section metadata including:
+//! - Virtual address ranges where sections should be mapped
+//! - Data ranges (byte offsets) within the firmware binary
+//! - Section flags (permissions, cache modes)
+
+use core::{
+    mem::size_of,
+    ops::Range, //
+};
+
+use kernel::{
+    bits::bit_u32,
+    prelude::*,
+    str::CString, //
+};
+
+use crate::{
+    fw::{
+        SectionFlag,
+        SectionFlags,
+        CSF_MCU_SHARED_REGION_START, //
+    },
+    vm::{
+        VmFlag,
+        VmMapFlags, //
+    }, //
+};
+
+/// A parsed firmware section ready for loading into MCU memory.
+///
+/// Represents a single firmware section extracted from the firmware binary, containing
+/// all information needed to map the section's data into the MCU's virtual address space.
+pub(super) struct ParsedSection {
+    /// Byte offset range within the firmware binary where this section's data resides.
+    pub(super) data_range: Range<u32>,
+    /// MCU virtual address range where this section should be mapped.
+    pub(super) va: Range<u32>,
+    /// Memory protection and caching flags for the mapping.
+    pub(super) vm_map_flags: VmMapFlags,
+}
+
+/// A bare-bones `std::io::Cursor<&[u8]>` clone to keep track of the current position in the firmware binary.
+///
+/// Provides methods to sequentially read primitive types and byte arrays from the firmware
+/// binary while maintaining the current read position.
+struct Cursor<'a> {
+    data: &'a [u8],
+    pos: usize,
+}
+
+impl<'a> Cursor<'a> {
+    fn new(data: &'a [u8]) -> Self {
+        Self { data, pos: 0 }
+    }
+
+    fn len(&self) -> usize {
+        self.data.len()
+    }
+
+    fn pos(&self) -> usize {
+        self.pos
+    }
+
+    /// Returns a view into the cursor's data.
+    ///
+    /// This creates a new cursor over the given range, leaving the current
+    /// cursor unchanged.
+    fn view(&self, range: Range<usize>) -> Result<Cursor<'_>> {
+        if range.start < self.pos || range.end > self.data.len() {
+            pr_err!(
+                "Invalid cursor range {:?} for data of length {}",
+                range,
+                self.data.len()
+            );
+
+            Err(EINVAL)
+        } else {
+            Ok(Self {
+                data: &self.data[range],
+                pos: 0,
+            })
+        }
+    }
+
+    /// Reads a slice of bytes from the current position and advances the cursor.
+    ///
+    /// Returns an error if the read would exceed the data bounds.
+    fn read(&mut self, nbytes: usize) -> Result<&[u8]> {
+        let start = self.pos;
+        let end = start.checked_add(nbytes).ok_or(EINVAL)?;
+
+        if end > self.data.len() {
+            pr_err!(
+                "Invalid firmware file: read of size {} at position {} is out of bounds\n",
+                nbytes,
+                start,
+            );
+            return Err(EINVAL);
+        }
+
+        self.pos += nbytes;
+        Ok(&self.data[start..end])
+    }
+
+    /// Reads a `u8` from the current position and advances the cursor.
+    fn read_u8(&mut self) -> Result<u8> {
+        let bytes = self.read(size_of::<u8>())?;
+        Ok(bytes[0])
+    }
+
+    /// Reads a little-endian `u16` from the current position and advances the cursor.
+    fn read_u16(&mut self) -> Result<u16> {
+        let bytes = self.read(size_of::<u16>())?;
+        Ok(u16::from_le_bytes(bytes.try_into().unwrap()))
+    }
+
+    /// Reads a little-endian `u32` from the current position and advances the cursor.
+    fn read_u32(&mut self) -> Result<u32> {
+        let bytes = self.read(size_of::<u32>())?;
+        Ok(u32::from_le_bytes(bytes.try_into().unwrap()))
+    }
+
+    /// Advances the cursor position by the specified number of bytes.
+    ///
+    /// Returns an error if the advance would exceed the data bounds.
+    fn advance(&mut self, nbytes: usize) -> Result {
+        let end = self.pos.checked_add(nbytes).ok_or(EINVAL)?;
+        if end > self.data.len() {
+            pr_err!(
+                "Invalid firmware file: advance of size {} at position {} is out of bounds\n",
+                nbytes,
+                self.pos,
+            );
+            return Err(EINVAL);
+        }
+        self.pos = end;
+        Ok(())
+    }
+}
+
+/// Parser for Mali CSF GPU firmware binaries.
+///
+/// Parses the firmware binary format, extracting section metadata including virtual
+/// address ranges, data offsets, and memory protection flags needed to load firmware
+/// into the MCU's memory.
+pub(super) struct FwParser<'a> {
+    cursor: Cursor<'a>,
+}
+
+impl<'a> FwParser<'a> {
+    /// Creates a new firmware parser for the given firmware binary data.
+    pub(super) fn new(data: &'a [u8]) -> Self {
+        Self {
+            cursor: Cursor::new(data),
+        }
+    }
+
+    /// Parses the firmware binary and returns a collection of parsed sections.
+    ///
+    /// This method validates the firmware header and iterates through all entries
+    /// in the binary, extracting section information needed for loading.
+    pub(super) fn parse(&mut self) -> Result<KVec<ParsedSection>> {
+        let fw_header = self.parse_fw_header()?;
+
+        let mut parsed_sections = KVec::new();
+        while (self.cursor.pos() as u32) < fw_header.size {
+            let entry_section = self.parse_entry()?;
+
+            if let Some(inner) = entry_section.inner {
+                parsed_sections.push(inner, GFP_KERNEL)?;
+            }
+        }
+
+        Ok(parsed_sections)
+    }
+
+    fn parse_fw_header(&mut self) -> Result<FirmwareHeader> {
+        let fw_header: FirmwareHeader = match FirmwareHeader::new(&mut self.cursor) {
+            Ok(fw_header) => fw_header,
+            Err(e) => {
+                pr_err!("Invalid firmware file: {}\n", e.to_errno());
+                return Err(e);
+            }
+        };
+
+        if fw_header.size > self.cursor.len() as u32 {
+            pr_err!("Firmware image is truncated\n");
+            return Err(EINVAL);
+        }
+        Ok(fw_header)
+    }
+
+    fn parse_entry(&mut self) -> Result<EntrySection> {
+        let entry_section = EntrySection {
+            entry_hdr: EntryHeader(self.cursor.read_u32()?),
+            inner: None,
+        };
+
+        if self.cursor.pos() % size_of::<u32>() != 0
+            || entry_section.entry_hdr.size() as usize % size_of::<u32>() != 0
+        {
+            pr_err!(
+                "Firmware entry isn't 32 bit aligned, offset={:#x} size={:#x}\n",
+                self.cursor.pos() - size_of::<u32>(),
+                entry_section.entry_hdr.size()
+            );
+            return Err(EINVAL);
+        }
+
+        let section_hdr_size = entry_section.entry_hdr.size() as usize - size_of::<EntryHeader>();
+
+        let entry_section = {
+            let mut entry_cursor = self
+                .cursor
+                .view(self.cursor.pos()..self.cursor.pos() + section_hdr_size)?;
+
+            match entry_section.entry_hdr.entry_type() {
+                Ok(EntryType::Iface) => Ok(EntrySection {
+                    entry_hdr: entry_section.entry_hdr,
+                    inner: Self::parse_section_entry(&mut entry_cursor)?,
+                }),
+                Ok(
+                    EntryType::Config
+                    | EntryType::FutfTest
+                    | EntryType::TraceBuffer
+                    | EntryType::TimelineMetadata
+                    | EntryType::BuildInfoMetadata,
+                ) => Ok(entry_section),
+
+                // Unknown (or unhandled) entry types are only fatal when the
+                // entry is not marked optional.
+                entry_type => {
+                    if !entry_section.entry_hdr.optional() {
+                        pr_err!(
+                            "Failed to handle firmware entry type: {}\n",
+                            entry_type
+                                .map_or(entry_section.entry_hdr.entry_type_raw(), |e| e as u8)
+                        );
+                        Err(EINVAL)
+                    } else {
+                        Ok(entry_section)
+                    }
+                }
+            }
+        };
+
+        if entry_section.is_ok() {
+            self.cursor.advance(section_hdr_size)?;
+        }
+
+        entry_section
+    }
+
+    fn parse_section_entry(entry_cursor: &mut Cursor<'_>) -> Result<Option<ParsedSection>> {
+        let section_hdr: SectionHeader = SectionHeader::new(entry_cursor)?;
+
+        if section_hdr.data.end < section_hdr.data.start {
+            pr_err!(
+                "Firmware corrupted, data.end < data.start (0x{:x} < 0x{:x})\n",
+                section_hdr.data.end,
+                section_hdr.data.start
+            );
+            return Err(EINVAL);
+        }
+
+        if section_hdr.va.end < section_hdr.va.start {
+            pr_err!(
+                "Firmware corrupted, section_hdr.va.end < section_hdr.va.start (0x{:x} < 0x{:x})\n",
+                section_hdr.va.end,
+                section_hdr.va.start
+            );
+            return Err(EINVAL);
+        }
+
+        if section_hdr.section_flags.contains(SectionFlag::Prot) {
+            pr_info!("Firmware protected mode entry not supported, ignoring");
+            return Ok(None);
+        }
+
+        if section_hdr.va.start == CSF_MCU_SHARED_REGION_START
+            && !section_hdr.section_flags.contains(SectionFlag::Shared)
+        {
+            pr_err!(
+                "Interface at 0x{:x} must be shared",
+                CSF_MCU_SHARED_REGION_START
+            );
+            return Err(EINVAL);
+        }
+
+        let name_len = entry_cursor.len() - entry_cursor.pos();
+        let name_bytes = entry_cursor.read(name_len)?;
+
+        let mut name = KVec::with_capacity(name_bytes.len() + 1, GFP_KERNEL)?;
+        name.extend_from_slice(name_bytes, GFP_KERNEL)?;
+        name.push(0, GFP_KERNEL)?;
+
+        let _name = CStr::from_bytes_with_nul(&name)
+            .ok()
+            .and_then(|name| CString::try_from(name).ok());
+
+        let cache_mode = section_hdr.section_flags.cache_mode();
+        let mut vm_map_flags = VmMapFlags::empty();
+
+        if !section_hdr.section_flags.contains(SectionFlag::Write) {
+            vm_map_flags |= VmFlag::Readonly;
+        }
+        if !section_hdr.section_flags.contains(SectionFlag::Exec) {
+            vm_map_flags |= VmFlag::Noexec;
+        }
+        if cache_mode != SectionFlag::CacheModeCached.into() {
+            vm_map_flags |= VmFlag::Uncached;
+        }
+
+        Ok(Some(ParsedSection {
+            data_range: section_hdr.data.clone(),
+            va: section_hdr.va,
+            vm_map_flags,
+        }))
+    }
+}
+
+/// Firmware binary header containing version and size information.
+///
+/// The header is located at the beginning of the firmware binary and contains
+/// a magic value for validation, version information, and the total size of
+/// all structured headers that follow.
+#[expect(dead_code)]
+struct FirmwareHeader {
+    /// Magic value to check binary validity.
+    magic: u32,
+
+    /// Minor firmware version.
+    minor: u8,
+
+    /// Major firmware version.
+    major: u8,
+
+    /// Padding. Must be set to zero.
+    _padding1: u16,
+
+    /// Firmware version hash.
+    version_hash: u32,
+
+    /// Padding. Must be set to zero.
+    _padding2: u32,
+
+    /// Total size of all the structured data headers at beginning of firmware binary.
+    size: u32,
+}
+
+impl FirmwareHeader {
+    const FW_BINARY_MAGIC: u32 = 0xc3f13a6e;
+    const FW_BINARY_MAJOR_MAX: u8 = 0;
+
+    /// Reads and validates a firmware header from the cursor.
+    ///
+    /// Verifies the magic value, version compatibility, and padding fields.
+    fn new(cursor: &mut Cursor<'_>) -> Result<Self> {
+        let magic = cursor.read_u32()?;
+        if magic != Self::FW_BINARY_MAGIC {
+            pr_err!("Invalid firmware magic");
+            return Err(EINVAL);
+        }
+
+        let minor = cursor.read_u8()?;
+        let major = cursor.read_u8()?;
+
+        if major > Self::FW_BINARY_MAJOR_MAX {
+            pr_err!(
+                "Unsupported firmware binary header version {}.{} (expected {}.x)\n",
+                major,
+                minor,
+                Self::FW_BINARY_MAJOR_MAX
+            );
+            return Err(EINVAL);
+        }
+
+        let padding1 = cursor.read_u16()?;
+        let version_hash = cursor.read_u32()?;
+        let padding2 = cursor.read_u32()?;
+        let size = cursor.read_u32()?;
+
+        if padding1 != 0 || padding2 != 0 {
+            pr_err!("Invalid firmware file: header padding is not zero");
+            return Err(EINVAL);
+        }
+
+        let fw_header = Self {
+            magic,
+            minor,
+            major,
+            _padding1: padding1,
+            version_hash,
+            _padding2: padding2,
+            size,
+        };
+
+        Ok(fw_header)
+    }
+}
+
+/// Firmware section header for loading binary sections into MCU memory.
+#[derive(Debug)]
+struct SectionHeader {
+    section_flags: SectionFlags,
+    /// MCU virtual range to map this binary section to.
+    va: Range<u32>,
+    /// References the data in the FW binary.
+    data: Range<u32>,
+}
+
+impl SectionHeader {
+    /// Reads and validates a section header from the cursor.
+    ///
+    /// Parses section flags, virtual address range, and data range from the firmware binary.
+    fn new(cursor: &mut Cursor<'_>) -> Result<Self> {
+        let section_flags = cursor.read_u32()?;
+        let section_flags = SectionFlags::try_from(section_flags)?;
+
+        let va_start = cursor.read_u32()?;
+        let va_end = cursor.read_u32()?;
+
+        let va = va_start..va_end;
+
+        if va.is_empty() {
+            pr_err!(
+                "Invalid firmware file: empty VA range at pos {}\n",
+                cursor.pos(),
+            );
+            return Err(EINVAL);
+        }
+
+        let data_start = cursor.read_u32()?;
+        let data_end = cursor.read_u32()?;
+        let data = data_start..data_end;
+
+        Ok(Self {
+            section_flags,
+            va,
+            data,
+        })
+    }
+}
+
+/// A firmware entry containing a header and optional parsed section data.
+///
+/// Represents a single entry in the firmware binary, which may contain loadable
+/// section data or metadata that doesn't require loading.
+struct EntrySection {
+    entry_hdr: EntryHeader,
+    inner: Option<ParsedSection>,
+}
+
+/// Header for a firmware entry, packed into a single u32.
+///
+/// The entry header encodes the entry type, size, and optional flag in a
+/// 32-bit value with the following layout:
+/// - Bits 0-7: Entry type
+/// - Bits 8-15: Size in bytes
+/// - Bit 31: Optional flag
+struct EntryHeader(u32);
+
+impl EntryHeader {
+    fn entry_type_raw(&self) -> u8 {
+        (self.0 & 0xff) as u8
+    }
+
+    fn entry_type(&self) -> Result<EntryType> {
+        let v = self.entry_type_raw();
+        EntryType::try_from(v)
+    }
+
+    fn optional(&self) -> bool {
+        self.0 & bit_u32(31) != 0
+    }
+
+    fn size(&self) -> u32 {
+        (self.0 >> 8) & 0xff
+    }
+}
+
+#[derive(Clone, Copy, Debug)]
+#[repr(u8)]
+enum EntryType {
+    /// Host <-> FW interface.
+    Iface = 0,
+    /// FW config.
+    Config = 1,
+    /// Unit tests.
+    FutfTest = 2,
+    /// Trace buffer interface.
+    TraceBuffer = 3,
+    /// Timeline metadata interface.
+    TimelineMetadata = 4,
+    /// Metadata about how the FW binary was built.
+    BuildInfoMetadata = 6,
+}
+
+impl TryFrom<u8> for EntryType {
+    type Error = Error;
+
+    fn try_from(value: u8) -> Result<Self, Self::Error> {
+        match value {
+            0 => Ok(EntryType::Iface),
+            1 => Ok(EntryType::Config),
+            2 => Ok(EntryType::FutfTest),
+            3 => Ok(EntryType::TraceBuffer),
+            4 => Ok(EntryType::TimelineMetadata),
+            6 => Ok(EntryType::BuildInfoMetadata),
+            _ => Err(EINVAL),
+        }
+    }
+}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 13/20] drm/tyr: add firmware loading and MCU boot support
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (11 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 12/20] drm/tyr: add parser for firmware binary Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 14/20] drm/tyr: add Wait type for GPU events Deborah Brouwer
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Add firmware loading and management for the Mali CSF GPU. This introduces
the fw module that loads the Mali GPU firmware binary, parses it into
sections, and maps those sections into the MCU VM at the required
virtual addresses.

On probe, the firmware is loaded, its sections are mapped and populated,
the MCU VM is activated, and the MCU is booted.
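
For reference, the flag translation the parser performs when deciding how to
map a section (readable/writable/executable bits and cache mode into VM map
flags) can be sketched in plain userspace Rust. The `SEC_*` constants mirror
the in-kernel `SectionFlag` encoding; the `VM_*` names are illustrative
stand-ins, not the driver's actual identifiers:

```rust
// Firmware section flags as encoded in the binary (mirrors SectionFlag).
const SEC_WRITE: u32 = 1 << 1;
const SEC_EXEC: u32 = 1 << 2;
const SEC_CACHE_MODE_CACHED: u32 = 1 << 3;
const SEC_CACHE_MODE_MASK: u32 = 0b11 << 3; // bits 3..=4

// Illustrative VM mapping flags (stand-ins for the driver's VmFlag values).
const VM_READONLY: u32 = 1 << 0;
const VM_NOEXEC: u32 = 1 << 1;
const VM_UNCACHED: u32 = 1 << 2;

/// Derive VM mapping flags from a section's flags, as the parser does:
/// no Write bit => Readonly, no Exec bit => Noexec, and any cache mode
/// other than "cached" => Uncached.
fn vm_map_flags(section_flags: u32) -> u32 {
    let mut out = 0;
    if section_flags & SEC_WRITE == 0 {
        out |= VM_READONLY;
    }
    if section_flags & SEC_EXEC == 0 {
        out |= VM_NOEXEC;
    }
    if section_flags & SEC_CACHE_MODE_MASK != SEC_CACHE_MODE_CACHED {
        out |= VM_UNCACHED;
    }
    out
}

fn main() {
    // A read-only, executable, cached section (typical for MCU code).
    let flags = (1 << 0) | SEC_EXEC | SEC_CACHE_MODE_CACHED;
    assert_eq!(vm_map_flags(flags), VM_READONLY);
}
```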

Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/Kconfig   |   1 +
 drivers/gpu/drm/tyr/driver.rs |  16 ++-
 drivers/gpu/drm/tyr/fw.rs     | 272 ++++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/gem.rs    |   3 -
 drivers/gpu/drm/tyr/mmu.rs    |   1 -
 drivers/gpu/drm/tyr/slot.rs   |   1 -
 drivers/gpu/drm/tyr/tyr.rs    |   1 +
 drivers/gpu/drm/tyr/vm.rs     |   1 -
 8 files changed, 289 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tyr/Kconfig b/drivers/gpu/drm/tyr/Kconfig
index 443ce988b570..729643f4db49 100644
--- a/drivers/gpu/drm/tyr/Kconfig
+++ b/drivers/gpu/drm/tyr/Kconfig
@@ -18,6 +18,7 @@ config DRM_TYR
 	select DRM_TYR_STATIC_DEPS
 	select IOMMU_IO_PGTABLE_LPAE
 	select RUST_DRM_GEM_SHMEM_HELPER
+	select RUST_FW_LOADER_ABSTRACTIONS
 	depends on IOMMU_SUPPORT
 	default n
 	help
diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 495021a8657d..246bc3cb8580 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -42,6 +42,7 @@
 
 use crate::{
     file::TyrDrmFileData,
+    fw::Firmware,
     gem::BoData,
     gpu,
     gpu::GpuInfo,
@@ -63,6 +64,8 @@
 pub(crate) struct TyrDrmDeviceData {
     pub(crate) pdev: ARef<platform::Device>,
 
+    pub(crate) fw: Arc<Firmware>,
+
     #[pin]
     clks: Mutex<Clocks>,
 
@@ -154,10 +157,21 @@ fn probe(
         let uninit_ddev = UnregisteredDevice::<TyrDrmDriver>::new(pdev.as_ref())?;
         let platform: ARef<platform::Device> = pdev.into();
 
-        let _mmu = Mmu::new(pdev, iomem.as_arc_borrow(), &gpu_info)?;
+        let mmu = Mmu::new(pdev, iomem.as_arc_borrow(), &gpu_info)?;
+
+        let firmware = Firmware::new(
+            pdev,
+            iomem.clone(),
+            &uninit_ddev,
+            mmu.as_arc_borrow(),
+            &gpu_info,
+        )?;
+
+        firmware.boot()?;
 
         let data = try_pin_init!(TyrDrmDeviceData {
                 pdev: platform.clone(),
+                fw: firmware,
                 clks <- new_mutex!(Clocks {
                     core: core_clk,
                     stacks: stacks_clk,
diff --git a/drivers/gpu/drm/tyr/fw.rs b/drivers/gpu/drm/tyr/fw.rs
new file mode 100644
index 000000000000..cb2546350f0a
--- /dev/null
+++ b/drivers/gpu/drm/tyr/fw.rs
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Firmware loading and management for Mali CSF GPU.
+//!
+//! This module handles loading the Mali GPU firmware binary, parsing it into sections,
+//! and mapping those sections into the MCU's virtual address space. Each firmware section
+//! has specific properties (read/write/execute permissions, cache modes) and must be loaded
+//! at specific virtual addresses expected by the MCU.
+//!
+//! See [`Firmware`] for the main firmware management interface and [`Section`] for
+//! individual firmware sections.
+//!
+//! [`Firmware`]: crate::fw::Firmware
+//! [`Section`]: crate::fw::Section
+
+use kernel::{
+    bits::genmask_u32,
+    devres::Devres,
+    drm::{
+        gem::BaseObject,
+        Uninit, //
+    },
+    impl_flags,
+    io::{
+        poll,
+        Io, //
+    },
+    platform,
+    prelude::*,
+    str::CString,
+    sync::{
+        Arc,
+        ArcBorrow, //
+    },
+    time,
+    types::ARef, //
+};
+
+use crate::{
+    driver::{
+        IoMem,
+        TyrDrmDevice, //
+    },
+    fw::parser::{
+        FwParser,
+        ParsedSection, //
+    },
+    gem,
+    gem::{
+        KernelBo,
+        KernelBoVaAlloc, //
+    },
+    gpu::GpuInfo,
+    mmu::Mmu,
+    regs::gpu_control::{
+        McuControlMode,
+        McuStatus,
+        GPU_ID,
+        MCU_CONTROL,
+        MCU_STATUS, //
+    },
+    vm::Vm, //
+};
+
+mod parser;
+
+impl_flags!(
+    #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
+    pub(super) struct SectionFlags(u32);
+
+    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+    pub(super) enum SectionFlag {
+        Read = 1 << 0,
+        Write = 1 << 1,
+        Exec = 1 << 2,
+        CacheModeNone = 0 << 3,
+        CacheModeCached = 1 << 3,
+        CacheModeUncachedCoherent = 2 << 3,
+        CacheModeCachedCoherent = 3 << 3,
+        Prot = 1 << 5,
+        Shared = 1 << 30,
+        Zero = 1 << 31,
+    }
+);
+
+pub(super) const CACHE_MODE_MASK: SectionFlags = SectionFlags(genmask_u32(3..=4));
+
+pub(super) const CSF_MCU_SHARED_REGION_START: u32 = 0x04000000;
+
+impl SectionFlags {
+    fn cache_mode(&self) -> SectionFlags {
+        *self & CACHE_MODE_MASK
+    }
+}
+
+impl TryFrom<u32> for SectionFlags {
+    type Error = Error;
+
+    fn try_from(value: u32) -> Result<Self, Self::Error> {
+        let valid_flags = SectionFlags::from(SectionFlag::Read)
+            | SectionFlags::from(SectionFlag::Write)
+            | SectionFlags::from(SectionFlag::Exec)
+            | CACHE_MODE_MASK
+            | SectionFlags::from(SectionFlag::Prot)
+            | SectionFlags::from(SectionFlag::Shared)
+            | SectionFlags::from(SectionFlag::Zero);
+
+        if value & valid_flags.0 != value {
+            Err(EINVAL)
+        } else {
+            Ok(Self(value))
+        }
+    }
+}
+
+/// A parsed section of the firmware binary.
+struct Section {
+    // Raw firmware section data for reset purposes
+    #[expect(dead_code)]
+    data: KVec<u8>,
+
+    // Keep the BO backing this firmware section so that both the
+    // GPU mapping and CPU mapping remain valid until the Section is dropped.
+    #[expect(dead_code)]
+    mem: gem::KernelBo,
+}
+
+/// Loaded firmware with sections mapped into MCU VM.
+pub(crate) struct Firmware {
+    /// Platform device reference (needed to access the MCU JOB_IRQ registers).
+    pdev: ARef<platform::Device>,
+
+    /// Iomem needed to access registers.
+    iomem: Arc<Devres<IoMem>>,
+
+    /// MCU VM.
+    vm: Arc<Vm>,
+
+    /// List of firmware sections.
+    #[expect(dead_code)]
+    sections: KVec<Section>,
+}
+
+impl Drop for Firmware {
+    fn drop(&mut self) {
+        // AS slots retain a VM ref, so we must kill the circular reference manually.
+        self.vm.kill();
+    }
+}
+
+impl Firmware {
+    fn init_section_mem(mem: &mut KernelBo, data: &KVec<u8>) -> Result {
+        if data.is_empty() {
+            return Ok(());
+        }
+
+        let vmap = mem.bo.vmap::<0>()?;
+        let size = mem.bo.size();
+
+        if data.len() > size {
+            pr_err!("fw section {} bigger than BO {}\n", data.len(), size);
+            return Err(EINVAL);
+        }
+
+        for (i, &byte) in data.iter().enumerate() {
+            vmap.try_write8(byte, i)?;
+        }
+
+        Ok(())
+    }
+
+    fn request(
+        ddev: &TyrDrmDevice<Uninit>,
+        gpu_info: &GpuInfo,
+    ) -> Result<kernel::firmware::Firmware> {
+        let gpu_id = GPU_ID::from_raw(gpu_info.gpu_id);
+
+        let path = CString::try_from_fmt(fmt!(
+            "arm/mali/arch{}.{}/mali_csffw.bin",
+            gpu_id.arch_major().get(),
+            gpu_id.arch_minor().get()
+        ))?;
+
+        kernel::firmware::Firmware::request(&path, ddev.as_ref())
+    }
+
+    fn load(
+        ddev: &TyrDrmDevice<Uninit>,
+        gpu_info: &GpuInfo,
+    ) -> Result<(kernel::firmware::Firmware, KVec<ParsedSection>)> {
+        let fw = Self::request(ddev, gpu_info)?;
+        let mut parser = FwParser::new(fw.data());
+
+        let parsed_sections = parser.parse()?;
+
+        Ok((fw, parsed_sections))
+    }
+
+    /// Load firmware and map sections into MCU VM.
+    pub(crate) fn new(
+        pdev: &platform::Device,
+        iomem: Arc<Devres<IoMem>>,
+        ddev: &TyrDrmDevice<Uninit>,
+        mmu: ArcBorrow<'_, Mmu>,
+        gpu_info: &GpuInfo,
+    ) -> Result<Arc<Firmware>> {
+        let vm = Vm::new(pdev, ddev, mmu, gpu_info)?;
+
+        let (fw, parsed_sections) = Self::load(ddev, gpu_info)?;
+
+        vm.activate()?;
+
+        let mut sections = KVec::new();
+        for parsed in parsed_sections {
+            let size = (parsed.va.end - parsed.va.start) as usize;
+            let va = u64::from(parsed.va.start);
+
+            let mut mem = KernelBo::new(
+                ddev,
+                vm.as_arc_borrow(),
+                size.try_into().unwrap(),
+                KernelBoVaAlloc::Explicit(va),
+                parsed.vm_map_flags,
+            )?;
+
+            let section_start = parsed.data_range.start as usize;
+            let section_end = parsed.data_range.end as usize;
+            let mut data = KVec::new();
+
+            // Ensure that the firmware slice is not out of bounds.
+            let fw_data = fw.data();
+            let bytes = fw_data.get(section_start..section_end).ok_or(EINVAL)?;
+            data.extend_from_slice(bytes, GFP_KERNEL)?;
+
+            Self::init_section_mem(&mut mem, &data)?;
+
+            sections.push(Section { data, mem }, GFP_KERNEL)?;
+        }
+
+        let firmware = Arc::new(
+            Firmware {
+                pdev: pdev.into(),
+                iomem,
+                vm,
+                sections,
+            },
+            GFP_KERNEL,
+        )?;
+
+        Ok(firmware)
+    }
+
+    pub(crate) fn boot(&self) -> Result {
+        // SAFETY: Boot is currently only called in the probe path, so we're sure we have a bound
+        // device.
+        let dev = unsafe { self.pdev.as_ref().as_bound() };
+        let io = self.iomem.access(dev)?;
+        io.write_reg(MCU_CONTROL::zeroed().with_req(McuControlMode::Auto));
+
+        if let Err(e) = poll::read_poll_timeout(
+            || Ok(io.read(MCU_STATUS)),
+            |status| status.value() == McuStatus::Enabled,
+            time::Delta::from_millis(1),
+            time::Delta::from_millis(100),
+        ) {
+            let status = io.read(MCU_STATUS);
+            pr_err!("MCU failed to boot, status: {:?}", status.value());
+            return Err(e);
+        }
+        Ok(())
+    }
+}
diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
index d032a8ae543f..4ec373e0bcfa 100644
--- a/drivers/gpu/drm/tyr/gem.rs
+++ b/drivers/gpu/drm/tyr/gem.rs
@@ -94,7 +94,6 @@ pub(crate) fn new_dummy_object<Ctx: DeviceContext>(ddev: &TyrDrmDevice<Ctx>) ->
 /// a [`KernelBo`]. An automatic VA allocation strategy will be added in the future.
 pub(crate) enum KernelBoVaAlloc {
     /// Explicit VA address specified by the caller.
-    #[expect(dead_code)]
     Explicit(u64),
 }
 
@@ -107,7 +106,6 @@ pub(crate) enum KernelBoVaAlloc {
 /// When dropped, the buffer is automatically unmapped from the GPU VA space.
 pub(crate) struct KernelBo {
     /// The underlying GEM buffer object.
-    #[expect(dead_code)]
     pub(crate) bo: ARef<Bo>,
     /// The GPU VM this buffer is mapped into.
     vm: Arc<Vm>,
@@ -121,7 +119,6 @@ impl KernelBo {
     /// This function allocates a new shmem-backed GEM object and immediately maps
     /// it into the specified GPU virtual memory space. The mapping is automatically
     /// cleaned up when the [`KernelBo`] is dropped.
-    #[expect(dead_code)]
     pub(crate) fn new<Ctx: DeviceContext>(
         ddev: &TyrDrmDevice<Ctx>,
         vm: ArcBorrow<'_, Vm>,
diff --git a/drivers/gpu/drm/tyr/mmu.rs b/drivers/gpu/drm/tyr/mmu.rs
index 09df98ffc9e3..935e2102ab30 100644
--- a/drivers/gpu/drm/tyr/mmu.rs
+++ b/drivers/gpu/drm/tyr/mmu.rs
@@ -12,7 +12,6 @@
 //!
 //! [`AddressSpaceManager`]: address_space::AddressSpaceManager
 //! [`SlotManager`]: crate::slot::SlotManager
-#![allow(dead_code)]
 
 use core::ops::Range;
 
diff --git a/drivers/gpu/drm/tyr/slot.rs b/drivers/gpu/drm/tyr/slot.rs
index debba75f6204..53abb9eeb970 100644
--- a/drivers/gpu/drm/tyr/slot.rs
+++ b/drivers/gpu/drm/tyr/slot.rs
@@ -20,7 +20,6 @@
 //!
 //! [SlotOperations]: crate::slot::SlotOperations
 //! [SlotManager]: crate::slot::SlotManager
-#![allow(dead_code)]
 
 use core::{
     mem::take,
diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
index b3244670dd79..18b0668bb217 100644
--- a/drivers/gpu/drm/tyr/tyr.rs
+++ b/drivers/gpu/drm/tyr/tyr.rs
@@ -9,6 +9,7 @@
 
 mod driver;
 mod file;
+mod fw;
 mod gem;
 mod gpu;
 mod mmu;
diff --git a/drivers/gpu/drm/tyr/vm.rs b/drivers/gpu/drm/tyr/vm.rs
index c19300d76194..1ef7e40ccdb5 100644
--- a/drivers/gpu/drm/tyr/vm.rs
+++ b/drivers/gpu/drm/tyr/vm.rs
@@ -6,7 +6,6 @@
 //! the illusion of owning the entire virtual address (VA) range, similar to CPU virtual memory.
 //! Each virtual memory (VM) area is backed by ARM64 LPAE Stage 1 page tables and can be
 //! mapped into hardware address space (AS) slots for GPU execution.
-#![allow(dead_code)]
 
 use core::ops::Range;
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 14/20] drm/tyr: add Wait type for GPU events
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (12 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 13/20] drm/tyr: add firmware loading and MCU boot support Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 15/20] drm/tyr: add Job IRQ handling Deborah Brouwer
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Add a Wait convenience type wrapping a CondVar and Mutex for sleeping
until a condition is met or a timeout expires.

The helper centralizes a common wait pattern: check the completion
predicate before sleeping, wait interruptibly with a timeout, retry on
spurious or unrelated wakeups, and perform a final predicate check before
returning ETIMEDOUT.

This will be used for CSF firmware responses and other GPU-driven events.

Also add a new_wait! macro so each Wait instance gets a call-site-specific
lockdep class key for its internal mutex.
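
The loop structure is the same one a userspace `Condvar` requires. A minimal
standalone sketch of the pattern, with a plain boolean predicate standing in
for the GPU event (the names here are illustrative, not the driver's):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::{Duration, Instant};

struct Wait {
    lock: Mutex<bool>, // predicate: has the event fired?
    cond: Condvar,
}

impl Wait {
    fn wait_timeout(&self, timeout: Duration) -> Result<(), &'static str> {
        let deadline = Instant::now() + timeout;
        let mut fired = self.lock.lock().unwrap();
        loop {
            // Check before sleeping so a wakeup that raced with us isn't lost.
            if *fired {
                return Ok(());
            }
            let remaining = deadline.saturating_duration_since(Instant::now());
            let (guard, res) = self.cond.wait_timeout(fired, remaining).unwrap();
            fired = guard;
            if res.timed_out() {
                // One final predicate check before giving up.
                return if *fired { Ok(()) } else { Err("timed out") };
            }
            // Otherwise: woken (possibly spuriously); loop and re-check.
        }
    }

    fn notify(&self) {
        // Take the lock so the flag update and wakeup are serialized
        // against a waiter checking the predicate.
        *self.lock.lock().unwrap() = true;
        self.cond.notify_all();
    }
}

fn main() {
    let w = Arc::new(Wait { lock: Mutex::new(false), cond: Condvar::new() });
    let w2 = Arc::clone(&w);
    std::thread::spawn(move || w2.notify());
    assert!(w.wait_timeout(Duration::from_secs(5)).is_ok());
}
```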

Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Co-developed-by: Beata Michalska <beata.michalska@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/tyr.rs  |   1 +
 drivers/gpu/drm/tyr/wait.rs | 126 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
index 18b0668bb217..fd83d4b40978 100644
--- a/drivers/gpu/drm/tyr/tyr.rs
+++ b/drivers/gpu/drm/tyr/tyr.rs
@@ -16,6 +16,7 @@
 mod regs;
 mod slot;
 mod vm;
+mod wait;
 
 kernel::module_platform_driver! {
     type: TyrPlatformDriverData,
diff --git a/drivers/gpu/drm/tyr/wait.rs b/drivers/gpu/drm/tyr/wait.rs
new file mode 100644
index 000000000000..2a4d691c443c
--- /dev/null
+++ b/drivers/gpu/drm/tyr/wait.rs
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Code to wait on GPU events.
+#![allow(dead_code)]
+
+use kernel::{
+    new_condvar,
+    prelude::*,
+    sync::{
+        lock::{
+            mutex::MutexBackend,
+            Lock, //
+        },
+        Arc,
+        CondVar,
+        CondVarTimeoutResult,
+        Mutex, //
+    },
+    time::msecs_to_jiffies, //
+};
+
+/// Creates a new [`Wait`] instance with a call-site-specific lockdep class key.
+///
+/// Prefer this macro over calling [`Wait::new_with_lock`] directly: giving each
+/// call site its own lockdep class key prevents false-positive lockdep warnings
+/// when different [`Wait`] instances have different locking behaviour.
+#[macro_export]
+macro_rules! new_wait {
+    () => {{
+        let lock = new_mutex!(());
+        $crate::wait::Wait::new_with_lock(lock)
+    }};
+}
+
+/// A convenience type to wait for GPU events.
+///
+/// Wraps a [`CondVar`] and [`Mutex`] pair. The mutex synchronizes predicate checks
+/// with wait/wake operations; the condvar provides the sleep/wake mechanism.
+#[pin_data]
+pub(crate) struct Wait {
+    /// The actual wait/signal mechanism.
+    #[pin]
+    cond: CondVar,
+    /// Synchronizes waiters with notifications.
+    #[pin]
+    lock: Mutex<()>,
+}
+
+impl Wait {
+    /// Creates a new [`Wait`] with a caller-supplied lock instance.
+    ///
+    /// Use [`new_wait!`] instead of calling this directly; the macro ensures a
+    /// per-call-site lockdep class key is registered.
+    pub(crate) fn new_with_lock(lock: impl PinInit<Lock<(), MutexBackend>>) -> Result<Arc<Self>> {
+        Arc::pin_init(
+            pin_init!(Self {
+                cond <- new_condvar!(),
+                lock <- lock,
+            }),
+            GFP_KERNEL,
+        )
+    }
+
+    /// Waits until a GPU event condition is met or the timeout elapses.
+    ///
+    /// Calls `on_woken` before sleeping and after each wakeup. If `on_woken`
+    /// returns [`WaitResult::Retry`], the wait continues; [`WaitResult::Done`]
+    /// returns success.
+    ///
+    /// `on_woken` is called while the internal wait lock is held, so it must be
+    /// cheap and must not call back into code that can notify this wait object.
+    ///
+    /// Returns [`ETIMEDOUT`] if the deadline is reached without the condition
+    /// becoming true, or [`ERESTARTSYS`] if interrupted by a signal.
+    pub(crate) fn wait_interruptible_timeout<F>(&self, timeout_ms: u32, mut on_woken: F) -> Result
+    where
+        F: FnMut() -> Result<WaitResult>,
+    {
+        let mut guard = self.lock.lock();
+        let mut remaining_time = msecs_to_jiffies(timeout_ms);
+
+        loop {
+            // Check the condition before sleeping to avoid missing a wakeup
+            // that arrived between the caller's last check and acquiring the
+            // lock here.
+            if let WaitResult::Done = on_woken()? {
+                return Ok(());
+            }
+
+            match self
+                .cond
+                .wait_interruptible_timeout(&mut guard, remaining_time)
+            {
+                CondVarTimeoutResult::Woken { jiffies } => match on_woken()? {
+                    WaitResult::Done => return Ok(()),
+                    WaitResult::Retry => remaining_time = jiffies,
+                },
+                CondVarTimeoutResult::Timeout => {
+                    // One final check before giving up.
+                    if let WaitResult::Done = on_woken()? {
+                        return Ok(());
+                    }
+                    return Err(ETIMEDOUT);
+                }
+                CondVarTimeoutResult::Signal { .. } => return Err(ERESTARTSYS),
+            }
+        }
+    }
+
+    /// Wakes all waiters.
+    ///
+    /// Takes the internal lock so notifications are serialized against waiters
+    /// checking the condition and entering the sleep state.
+    pub(crate) fn notify_all(&self) {
+        let _guard = self.lock.lock();
+        self.cond.notify_all();
+    }
+}
+
+/// The result of a wait operation.
+pub(crate) enum WaitResult {
+    /// The condition was met.
+    Done,
+    /// The wakeup was spurious or for an unrelated event; retry.
+    Retry,
+}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 15/20] drm/tyr: add Job IRQ handling
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (13 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 14/20] drm/tyr: add Wait type for GPU events Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 16/20] drm/tyr: wait for global interface readiness Deborah Brouwer
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Add a threaded IRQ wrapper for Tyr interrupt sources and use it to handle
the firmware Job IRQ.

The Job IRQ reports requests from the CSF firmware, including global
interface requests and CSG attention bits. Add a Job IRQ handler that
masks the interrupt in the primary IRQ handler, processes pending raw
status in the threaded handler, clears handled bits, and reenables the
mask before returning.
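
The threaded portion of that sequence can be sketched abstractly in userspace
Rust against a simulated register pair (the field and function names here are
placeholders, not the Mali register map):

```rust
// Simulated IRQ registers, for illustration only.
struct Regs {
    raw_status: u32, // latched event bits
    mask: u32,       // which bits may raise the interrupt line
}

/// Threaded-handler body: drain all pending events, clearing only the
/// bits that were handled, then restore the interrupt mask.
fn handle_threaded(regs: &mut Regs, irq_mask: u32, mut handle: impl FnMut(u32)) -> bool {
    let mut handled = false;
    loop {
        let pending = regs.raw_status & irq_mask;
        if pending == 0 {
            break;
        }
        handle(pending);
        regs.raw_status &= !pending; // clear only the bits we handled
        handled = true;
    }
    regs.mask = irq_mask; // re-enable the interrupt source
    handled
}

fn main() {
    // The primary handler has masked the source (mask == 0) and woken us.
    let mut regs = Regs { raw_status: 0b101, mask: 0 };
    let mut seen = 0;
    assert!(handle_threaded(&mut regs, 0b111, |bits| seen |= bits));
    assert_eq!(seen, 0b101);
    assert_eq!(regs.mask, 0b111); // mask restored on the way out
}
```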

Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs |  70 ++++++++++++++++++++++++
 drivers/gpu/drm/tyr/fw.rs     |   3 ++
 drivers/gpu/drm/tyr/fw/irq.rs | 121 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 194 insertions(+)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 246bc3cb8580..da007aded92d 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 or MIT
 
+use core::marker::PhantomPinned;
+
 use kernel::{
     clk::{
         Clk,
@@ -25,6 +27,13 @@
         poll,
         Io, //
     },
+    irq::{
+        Flags,
+        IrqReturn,
+        ThreadedHandler,
+        ThreadedIrqReturn,
+        ThreadedRegistration, //
+    },
     new_mutex,
     of,
     platform,
@@ -239,3 +248,64 @@ struct Regulators {
     _mali: Regulator<regulator::Enabled>,
     _sram: Regulator<regulator::Enabled>,
 }
+
+pub(crate) trait TyrIrqTrait: Sync {
+    fn read_status(&self, dev: &Device<Bound>) -> u32;
+    fn clear_mask(&self, dev: &Device<Bound>);
+    fn reenable_mask(&self, dev: &Device<Bound>);
+    fn read_raw_status(&self, dev: &Device<Bound>) -> u32;
+    fn clear_status(&self, dev: &Device<Bound>, status: u32);
+    fn mask(&self) -> u32;
+    fn handle(&self, status: u32);
+}
+
+#[pin_data]
+pub(crate) struct TyrIrq<T: TyrIrqTrait> {
+    irq: T,
+    #[pin]
+    _pin: PhantomPinned,
+}
+
+impl<T: TyrIrqTrait + 'static> TyrIrq<T> {
+    pub(crate) fn request<'a>(
+        pdev: &'a platform::Device<Bound>,
+        name: &'static CStr,
+        irq: T,
+    ) -> Result<impl PinInit<ThreadedRegistration<Self>, Error> + 'a> {
+        let handler = try_pin_init!(Self {
+            irq,
+            _pin: PhantomPinned,
+        });
+
+        Ok(pdev.request_threaded_irq_by_name(Flags::SHARED, name, name, handler))
+    }
+}
+
+impl<T: TyrIrqTrait> ThreadedHandler for TyrIrq<T> {
+    fn handle(&self, dev: &Device<Bound>) -> ThreadedIrqReturn {
+        let masked_status = self.irq.read_status(dev);
+
+        if masked_status == 0 {
+            return ThreadedIrqReturn::None;
+        }
+        self.irq.clear_mask(dev);
+        ThreadedIrqReturn::WakeThread
+    }
+
+    fn handle_threaded(&self, dev: &Device<Bound>) -> IrqReturn {
+        let mut ret = IrqReturn::None;
+
+        loop {
+            let raw_status = self.irq.read_raw_status(dev) & self.irq.mask();
+            if raw_status == 0 {
+                break;
+            }
+            self.irq.handle(raw_status);
+            self.irq.clear_status(dev, raw_status);
+            ret = IrqReturn::Handled;
+        }
+
+        self.irq.reenable_mask(dev);
+        ret
+    }
+}
diff --git a/drivers/gpu/drm/tyr/fw.rs b/drivers/gpu/drm/tyr/fw.rs
index cb2546350f0a..b5ccacf891a3 100644
--- a/drivers/gpu/drm/tyr/fw.rs
+++ b/drivers/gpu/drm/tyr/fw.rs
@@ -62,8 +62,11 @@
     vm::Vm, //
 };
 
+pub(crate) mod irq;
 mod parser;
 
+const MAX_CSG: u32 = 16;
+
 impl_flags!(
     #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
     pub(super) struct SectionFlags(u32);
diff --git a/drivers/gpu/drm/tyr/fw/irq.rs b/drivers/gpu/drm/tyr/fw/irq.rs
new file mode 100644
index 000000000000..0eff5a14f69e
--- /dev/null
+++ b/drivers/gpu/drm/tyr/fw/irq.rs
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! IRQ handling for the Job IRQ.
+//!
+//! The Job IRQ signals events from the MCU, including global interface acknowledgements.
+#![allow(dead_code)]
+
+use core::sync::atomic::{
+    AtomicBool,
+    Ordering, //
+};
+
+use kernel::{
+    c_str,
+    device::{
+        Bound,
+        Device, //
+    },
+    devres::Devres,
+    io::Io,
+    irq::ThreadedRegistration,
+    platform,
+    prelude::*,
+    sync::Arc, //
+};
+
+use crate::{
+    driver::{
+        IoMem,
+        TyrIrq,
+        TyrIrqTrait, //
+    },
+    regs::job_control::{
+        JOB_IRQ_CLEAR,
+        JOB_IRQ_MASK,
+        JOB_IRQ_RAWSTAT,
+        JOB_IRQ_STATUS, //
+    },
+    wait::Wait, //
+};
+
+const CSG_IRQ_MASK: u32 = (1u32 << super::MAX_CSG) - 1;
+
+pub(crate) struct JobIrq {
+    iomem: Arc<Devres<IoMem>>,
+    fw_ready: Arc<AtomicBool>,
+    ready_wait: Arc<Wait>,
+}
+
+pub(crate) fn job_irq_init<'a>(
+    pdev: &'a platform::Device<Bound>,
+    iomem: Arc<Devres<IoMem>>,
+    fw_ready: Arc<AtomicBool>,
+    ready_wait: Arc<Wait>,
+) -> Result<impl PinInit<ThreadedRegistration<TyrIrq<JobIrq>>, Error> + 'a> {
+    let io = iomem.access(pdev.as_ref())?;
+    io.write_reg(
+        JOB_IRQ_MASK::zeroed()
+            .with_const_csg::<CSG_IRQ_MASK>()
+            .with_glb(true),
+    );
+    let job_irq = JobIrq {
+        iomem: iomem.clone(),
+        fw_ready,
+        ready_wait,
+    };
+
+    TyrIrq::request(pdev, c_str!("job"), job_irq)
+}
+
+impl TyrIrqTrait for JobIrq {
+    fn read_status(&self, dev: &Device<Bound>) -> u32 {
+        match self.iomem.access(dev) {
+            Ok(io) => io.read(JOB_IRQ_STATUS).into_raw(),
+            Err(_) => 0,
+        }
+    }
+
+    fn clear_mask(&self, dev: &Device<Bound>) {
+        if let Ok(io) = self.iomem.access(dev) {
+            io.write_reg(JOB_IRQ_MASK::zeroed());
+        }
+    }
+
+    fn reenable_mask(&self, dev: &Device<Bound>) {
+        if let Ok(io) = self.iomem.access(dev) {
+            io.write_reg(
+                JOB_IRQ_MASK::zeroed()
+                    .with_const_csg::<CSG_IRQ_MASK>()
+                    .with_glb(true),
+            );
+        }
+    }
+
+    fn read_raw_status(&self, dev: &Device<Bound>) -> u32 {
+        match self.iomem.access(dev) {
+            Ok(io) => io.read(JOB_IRQ_RAWSTAT).into_raw(),
+            Err(_) => 0,
+        }
+    }
+
+    fn clear_status(&self, dev: &Device<Bound>, status: u32) {
+        if let Ok(io) = self.iomem.access(dev) {
+            io.write_reg(JOB_IRQ_CLEAR::from_raw(status));
+        }
+    }
+
+    fn mask(&self) -> u32 {
+        JOB_IRQ_MASK::zeroed()
+            .with_const_csg::<CSG_IRQ_MASK>()
+            .with_glb(true)
+            .into_raw()
+    }
+
+    fn handle(&self, status: u32) {
+        if JOB_IRQ_RAWSTAT::from_raw(status).glb() {
+            self.fw_ready.store(true, Ordering::Release);
+            self.ready_wait.notify_all();
+        }
+    }
+}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 16/20] drm/tyr: wait for global interface readiness
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (14 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 15/20] drm/tyr: add Job IRQ handling Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 17/20] drm/tyr: validate presence of CSF shared section Deborah Brouwer
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Add a wait helper for global interface readiness using the Job IRQ. The
IRQ handler latches readiness in fw_ready and wakes waiters. After
booting the firmware, probe waits until the firmware reports that the
global interface is ready to accept requests.

Register the Job IRQ before booting the firmware so that the initial GLB
event is not missed.
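[Editor's note: the latch-and-wake pattern above can be sketched with std primitives standing in for the kernel's wait helper; `ReadyWait` and its method names are ours, not the driver's.]

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::time::{Duration, Instant};

struct ReadyWait {
    ready: AtomicBool, // latched by the "IRQ" side
    lock: Mutex<()>,
    cv: Condvar,
}

impl ReadyWait {
    // IRQ side: latch readiness first, then wake every waiter.
    fn notify_ready(&self) {
        self.ready.store(true, Ordering::Release);
        let _g = self.lock.lock().unwrap();
        self.cv.notify_all();
    }

    // Probe side: wait until the latch is set or the timeout expires.
    // Because the latch is checked before sleeping, an event that fired
    // before the wait started is not lost.
    fn wait_ready(&self, timeout: Duration) -> bool {
        let mut g = self.lock.lock().unwrap();
        let deadline = Instant::now() + timeout;
        while !self.ready.load(Ordering::Acquire) {
            let now = Instant::now();
            if now >= deadline {
                return false; // timed out
            }
            let (ng, _res) = self.cv.wait_timeout(g, deadline - now).unwrap();
            g = ng;
        }
        true
    }
}

fn main() {
    let w = Arc::new(ReadyWait {
        ready: AtomicBool::new(false),
        lock: Mutex::new(()),
        cv: Condvar::new(),
    });
    let w2 = Arc::clone(&w);
    std::thread::spawn(move || w2.notify_ready());
    assert!(w.wait_ready(Duration::from_secs(1)));
}
```

Latching into an atomic before notifying is what makes registering the IRQ before boot safe: even if the GLB event arrives before `wait_ready()` runs, the waiter sees the latch and returns immediately.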

Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs | 18 +++++++++++++++++-
 drivers/gpu/drm/tyr/fw.rs     | 35 +++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/tyr/fw/irq.rs |  1 -
 drivers/gpu/drm/tyr/wait.rs   |  1 -
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index da007aded92d..3225385cd511 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -12,6 +12,7 @@
         Core,
         Device, //
     },
+    devres,
     devres::Devres,
     dma::{
         Device as DmaDevice,
@@ -51,7 +52,10 @@
 
 use crate::{
     file::TyrDrmFileData,
-    fw::Firmware,
+    fw::{
+        irq::job_irq_init,
+        Firmware, //
+    },
     gem::BoData,
     gpu,
     gpu::GpuInfo,
@@ -176,8 +180,20 @@ fn probe(
             &gpu_info,
         )?;
 
+        let job_irq = job_irq_init(
+            pdev,
+            iomem.clone(),
+            firmware.fw_ready.clone(),
+            firmware.ready_wait.clone(),
+        )?;
+        devres::register(pdev.as_ref(), job_irq, GFP_KERNEL)?;
+
         firmware.boot()?;
 
+        firmware
+            .wait_ready(1000)
+            .inspect_err(|_| pr_err!("Timed out waiting for firmware to be ready.\n"))?;
+
         let data = try_pin_init!(TyrDrmDeviceData {
                 pdev: platform.clone(),
                 fw: firmware,
diff --git a/drivers/gpu/drm/tyr/fw.rs b/drivers/gpu/drm/tyr/fw.rs
index b5ccacf891a3..14815cdafac8 100644
--- a/drivers/gpu/drm/tyr/fw.rs
+++ b/drivers/gpu/drm/tyr/fw.rs
@@ -13,6 +13,11 @@
 //! [`Firmware`]: crate::fw::Firmware
 //! [`Section`]: crate::fw::Section
 
+use core::sync::atomic::{
+    AtomicBool,
+    Ordering, //
+};
+
 use kernel::{
     bits::genmask_u32,
     devres::Devres,
@@ -25,6 +30,7 @@
         poll,
         Io, //
     },
+    new_mutex,
     platform,
     prelude::*,
     str::CString,
@@ -52,6 +58,7 @@
     },
     gpu::GpuInfo,
     mmu::Mmu,
+    new_wait,
     regs::gpu_control::{
         McuControlMode,
         McuStatus,
@@ -59,13 +66,18 @@
         MCU_CONTROL,
         MCU_STATUS, //
     },
-    vm::Vm, //
+    vm::Vm,
+    wait::{
+        Wait,
+        WaitResult, //
+    }, //
 };
 
 pub(crate) mod irq;
 mod parser;
 
-const MAX_CSG: u32 = 16;
+/// Maximum number of CSG interfaces supported by hardware.
+const MAX_CSG: usize = 16;
 
 impl_flags!(
     #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
@@ -142,6 +154,12 @@ pub(crate) struct Firmware {
     /// List of firmware sections.
     #[expect(dead_code)]
     sections: KVec<Section>,
+
+    /// A condvar representing a wait on a firmware event.
+    pub(crate) ready_wait: Arc<Wait>,
+
+    /// Latched to `true` by the IRQ handler when the firmware signals readiness via the GLB bit.
+    pub(crate) fw_ready: Arc<AtomicBool>,
 }
 
 impl Drop for Firmware {
@@ -246,6 +264,8 @@ pub(crate) fn new(
                 iomem,
                 vm,
                 sections,
+                ready_wait: new_wait!()?,
+                fw_ready: Arc::new(AtomicBool::new(false), GFP_KERNEL)?,
             },
             GFP_KERNEL,
         )?;
@@ -272,4 +292,15 @@ pub(crate) fn boot(&self) -> Result {
         }
         Ok(())
     }
+
+    /// Waits until the firmware signals readiness via the GLB IRQ bit.
+    pub(crate) fn wait_ready(&self, timeout_ms: u32) -> Result {
+        self.ready_wait.wait_interruptible_timeout(timeout_ms, || {
+            if self.fw_ready.load(Ordering::Acquire) {
+                Ok(WaitResult::Done)
+            } else {
+                Ok(WaitResult::Retry)
+            }
+        })
+    }
 }
diff --git a/drivers/gpu/drm/tyr/fw/irq.rs b/drivers/gpu/drm/tyr/fw/irq.rs
index 0eff5a14f69e..0f371000679c 100644
--- a/drivers/gpu/drm/tyr/fw/irq.rs
+++ b/drivers/gpu/drm/tyr/fw/irq.rs
@@ -3,7 +3,6 @@
 //! IRQ handling for the Job IRQ.
 //!
 //! The Job IRQ signals events from the MCU, including global interface acknowledgements.
-#![allow(dead_code)]
 
 use core::sync::atomic::{
     AtomicBool,
diff --git a/drivers/gpu/drm/tyr/wait.rs b/drivers/gpu/drm/tyr/wait.rs
index 2a4d691c443c..1db4c1827fd7 100644
--- a/drivers/gpu/drm/tyr/wait.rs
+++ b/drivers/gpu/drm/tyr/wait.rs
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 or MIT
 
 //! Code to wait on GPU events.
-#![allow(dead_code)]
 
 use kernel::{
     new_condvar,

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 17/20] drm/tyr: validate presence of CSF shared section
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (15 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 16/20] drm/tyr: wait for global interface readiness Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-24 23:39 ` [PATCH v4 18/20] drm/tyr: add CSF firmware interface support Deborah Brouwer
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

The firmware binary must have a shared section for communicating with
the MCU. Check for this section after parsing and fail with -EINVAL if it
is missing.
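[Editor's note: a toy version of the check described above, with simplified stand-in types; the base-address constant matches the driver, everything else is illustrative.]

```rust
/// MCU virtual address where the CSF shared memory region starts
/// (same value as the driver's CSF_MCU_SHARED_REGION_START).
const CSF_MCU_SHARED_REGION_START: u64 = 0x0400_0000;

/// Simplified stand-in for a parsed firmware section.
struct ParsedSection {
    va_start: u64,
}

/// After parsing, require a section whose VA range starts at the
/// shared-region base; the driver returns -EINVAL where we return Err.
fn validate(sections: &[ParsedSection]) -> Result<(), &'static str> {
    if sections
        .iter()
        .any(|s| s.va_start == CSF_MCU_SHARED_REGION_START)
    {
        Ok(())
    } else {
        Err("no shared section found in firmware")
    }
}

fn main() {
    let ok = [ParsedSection { va_start: CSF_MCU_SHARED_REGION_START }];
    assert!(validate(&ok).is_ok());
}
```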

Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/fw/parser.rs | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/tyr/fw/parser.rs b/drivers/gpu/drm/tyr/fw/parser.rs
index 638707430701..0be3b740ea65 100644
--- a/drivers/gpu/drm/tyr/fw/parser.rs
+++ b/drivers/gpu/drm/tyr/fw/parser.rs
@@ -175,6 +175,19 @@ pub(super) fn parse(&mut self) -> Result<KVec<ParsedSection>> {
             }
         }
 
+        // Validate that the firmware contains the required shared memory section.
+        let has_shared_section = parsed_sections
+            .iter()
+            .any(|section| section.va.start == super::CSF_MCU_SHARED_REGION_START);
+
+        if !has_shared_section {
+            pr_err!(
+                "No shared section found at 0x{:08x} in firmware\n",
+                super::CSF_MCU_SHARED_REGION_START
+            );
+            return Err(EINVAL);
+        }
+
         Ok(parsed_sections)
     }
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 18/20] drm/tyr: add CSF firmware interface support
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (16 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 17/20] drm/tyr: validate presence of CSF shared section Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-27  9:08   ` Onur Özkan
  2026-04-24 23:39 ` [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper Deborah Brouwer
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Add initial support for the Command Stream Frontend (CSF) firmware
interfaces, enabling communication between the driver and the MCU through
shared memory.

Implement the global (GLB), command stream group (CSG), and command stream
(CS) interfaces. These provide access to the firmware control, input, and
output blocks and allow discovery of the available CSGs and CSs at
runtime.

Store the global interface in the firmware state and initialize it after
firmware boot during probe.

Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs        |    2 +-
 drivers/gpu/drm/tyr/fw.rs            |   62 +-
 drivers/gpu/drm/tyr/fw/interfaces.rs | 2005 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/gem.rs           |    5 +
 4 files changed, 2061 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 3225385cd511..20ae114a4180 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -189,10 +189,10 @@ fn probe(
         devres::register(pdev.as_ref(), job_irq, GFP_KERNEL)?;
 
         firmware.boot()?;
-
         firmware
             .wait_ready(1000)
             .inspect_err(|_| pr_err!("Timed out waiting for firmware to be ready.\n"))?;
+        firmware.enable_global_interface()?;
 
         let data = try_pin_init!(TyrDrmDeviceData {
                 pdev: platform.clone(),
diff --git a/drivers/gpu/drm/tyr/fw.rs b/drivers/gpu/drm/tyr/fw.rs
index 14815cdafac8..598e399a58ae 100644
--- a/drivers/gpu/drm/tyr/fw.rs
+++ b/drivers/gpu/drm/tyr/fw.rs
@@ -36,7 +36,8 @@
     str::CString,
     sync::{
         Arc,
-        ArcBorrow, //
+        ArcBorrow,
+        Mutex, //
     },
     time,
     types::ARef, //
@@ -47,9 +48,12 @@
         IoMem,
         TyrDrmDevice, //
     },
-    fw::parser::{
-        FwParser,
-        ParsedSection, //
+    fw::{
+        interfaces::GlobalInterface,
+        parser::{
+            FwParser,
+            ParsedSection, //
+        },
     },
     gem,
     gem::{
@@ -73,12 +77,16 @@
     }, //
 };
 
+mod interfaces;
 pub(crate) mod irq;
 mod parser;
 
 /// Maximum number of CSG interfaces supported by hardware.
 const MAX_CSG: usize = 16;
 
+/// Maximum number of CS interfaces supported by hardware.
+const MAX_CS: usize = 16;
+
 impl_flags!(
     #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
     pub(super) struct SectionFlags(u32);
@@ -100,6 +108,11 @@ pub(super) enum SectionFlag {
 
 pub(super) const CACHE_MODE_MASK: SectionFlags = SectionFlags(genmask_u32(3..=4));
 
+/// MCU virtual address where the CSF shared memory region starts.
+///
+/// This region contains the firmware interface structures for communication between
+/// the CPU driver and MCU firmware, including the GLB_CONTROL_BLOCK at this base address.
+/// The firmware binary contains a section marked to be loaded at this address.
 pub(super) const CSF_MCU_SHARED_REGION_START: u32 = 0x04000000;
 
 impl SectionFlags {
@@ -129,18 +142,18 @@ fn try_from(value: u32) -> Result<Self, Self::Error> {
 }
 
 /// A parsed section of the firmware binary.
-struct Section {
+pub(crate) struct Section {
     // Raw firmware section data for reset purposes
     #[expect(dead_code)]
     data: KVec<u8>,
 
     // Keep the BO backing this firmware section so that both the
     // GPU mapping and CPU mapping remain valid until the Section is dropped.
-    #[expect(dead_code)]
     mem: gem::KernelBo,
 }
 
 /// Loaded firmware with sections mapped into MCU VM.
+#[pin_data(PinnedDrop)]
 pub(crate) struct Firmware {
     /// Platform device reference (needed to access the MCU JOB_IRQ registers).
     pdev: ARef<platform::Device>,
@@ -152,7 +165,6 @@ pub(crate) struct Firmware {
     vm: Arc<Vm>,
 
     /// List of firmware sections.
-    #[expect(dead_code)]
     sections: KVec<Section>,
 
     /// A condvar representing a wait on a firmware event.
@@ -160,10 +172,15 @@ pub(crate) struct Firmware {
 
     /// Latched to `true` by the IRQ handler when the firmware signals readiness via the GLB bit.
     pub(crate) fw_ready: Arc<AtomicBool>,
+
+    /// The global FW interface.
+    #[pin]
+    global_iface: Mutex<GlobalInterface>,
 }
 
-impl Drop for Firmware {
-    fn drop(&mut self) {
+#[pinned_drop]
+impl PinnedDrop for Firmware {
+    fn drop(self: Pin<&mut Self>) {
         // AS slots retain a VM ref, we need to kill the circular ref manually.
         self.vm.kill();
     }
@@ -258,21 +275,36 @@ pub(crate) fn new(
             sections.push(Section { data, mem }, GFP_KERNEL)?;
         }
 
-        let firmware = Arc::new(
-            Firmware {
+        let firmware = Arc::pin_init(
+            try_pin_init!(Firmware {
                 pdev: pdev.into(),
                 iomem,
                 vm,
                 sections,
                 ready_wait: new_wait!()?,
                 fw_ready: Arc::new(AtomicBool::new(false), GFP_KERNEL)?,
-            },
+                global_iface <- new_mutex!(GlobalInterface::new()?),
+            }),
             GFP_KERNEL,
         )?;
 
         Ok(firmware)
     }
 
+    /// Get the shared memory section containing firmware interface structures.
+    pub(crate) fn shared_section(&self) -> Result<&Section> {
+        self.sections
+            .iter()
+            .find(|section| section.mem.va_range().start == u64::from(CSF_MCU_SHARED_REGION_START))
+            .ok_or_else(|| {
+                pr_err!(
+                    "CSF shared section not found at 0x{:08x}\n",
+                    CSF_MCU_SHARED_REGION_START
+                );
+                EINVAL
+            })
+    }
+
     pub(crate) fn boot(&self) -> Result {
         // SAFETY: Boot is currently only called in the probe path, so we're sure we have a bound
         // device.
@@ -303,4 +335,10 @@ pub(crate) fn wait_ready(&self, timeout_ms: u32) -> Result {
             }
         })
     }
+
+    /// Enable the global interface.
+    pub(crate) fn enable_global_interface(&self) -> Result {
+        let shared_section = self.shared_section()?;
+        self.global_iface.lock().enable(shared_section)
+    }
 }
diff --git a/drivers/gpu/drm/tyr/fw/interfaces.rs b/drivers/gpu/drm/tyr/fw/interfaces.rs
new file mode 100644
index 000000000000..07cdb1c76a3f
--- /dev/null
+++ b/drivers/gpu/drm/tyr/fw/interfaces.rs
@@ -0,0 +1,2005 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Code to control the global interface of the CSF firmware.
+//!
+//! For abbreviation definitions (CEU, CS, CSF, CSG, CSHW, GLB, JASID, MCU, MMU), see the top-level
+//! module documentation in [`crate::regs`].
+//!
+//! # Interface Overview
+//!
+//! Tyr interacts with the CSF firmware running on the MCU through shared memory
+//! interfaces. The CSF manages job submission via a hierarchy of:
+//! - **GLB**: Global interface - controls operations common to all CSs
+//! - **CSG**: Command Stream Groups - groups of related command streams
+//! - **CS**: Command Streams - individual sequences of GPU commands
+//!
+//! ```
+//! ┌──────────────────────────────────────────┐
+//! │ GPU                                      │
+//! │ ┌─────┐ ┌──────────────────────────────┐ │
+//! │ │ MMU │ │  CSF                         │ │
+//! │ └─────┘ │ ┌────────────┐ ┌─────┐       │ │
+//! │         │ │ CSHW (CEU) │ │ MCU │       │ │
+//! │         │ └────────────┘ └─────┘       │ │
+//! └─────────┼──────────────────────────────┼─┘
+//!           │ ┌──────────────────────────┐ │
+//!           │ │ Shared Memory            │ │
+//!           │ │ ┌────────┐ ┌────┐ ┌────┐ │ │
+//!           │ │ │  CSG0  │ │GLB │ │ FW │ │ │
+//!           │ │ │ ┌────┐ │ └────┘ └────┘ │ │
+//!           │ │ │ │CS0 │ │               │ │
+//!           │ │ │ └────┘ │               │ │
+//!           │ │ └────────┘               │ │
+//!           │ └──────────────────────────┘ │
+//!           └──────────────┬───────────────┘
+//!                          │
+//!                      ┌───┴───┐
+//!                      │  Tyr  │
+//!                      └───────┘
+//! ```
+//!
+
+use crate::fw::Section;
+use iface::FwInterface;
+use kernel::{
+    io::Io,
+    prelude::*, //
+};
+
+/// Offset from GLB_CONTROL_BLOCK start to the first GROUP_CONTROL block.
+const CSG_GROUP_CONTROL_OFFSET: usize = 0x1000;
+
+/// Offset from GROUP_CONTROL_BLOCK start to the first STREAM_CONTROL block.
+const CS_CONTROL_OFFSET: usize = 0x40;
+
+/// Generic firmware interface infrastructure.
+///
+/// Provides a bounded VMap-backed IO wrapper for accessing CSF shared memory regions.
+mod iface {
+    use core::{
+        mem::size_of,
+        ops::Range,
+        ptr::{
+            read_volatile,
+            write_volatile, //
+        }, //
+    };
+
+    use kernel::{
+        drm::gem::shmem::VMapOwned,
+        io::{
+            Io,
+            IoCapable,
+            IoKnownSize, //
+        },
+        prelude::*, //
+    };
+
+    use crate::gem::BoData;
+
+    /// Firmware interface wrapper for accessing CSF shared memory regions.
+    ///
+    /// Provides bounds-checked access to firmware interface blocks mapped into
+    /// driver memory via a VMap.
+    pub(super) struct FwInterface<const FW_IFACE_SIZE: usize> {
+        /// Virtual mapping of the shared memory buffer.
+        vmap: VMapOwned<BoData>,
+        /// Offset within the shared memory buffer where this interface starts.
+        offset: usize,
+    }
+
+    impl<const FW_IFACE_SIZE: usize> FwInterface<FW_IFACE_SIZE> {
+        /// Creates a new firmware interface wrapper at the specified MCU virtual address.
+        ///
+        /// Validates that the whole interface block is within the section's address range.
+        pub(super) fn new(
+            vmap: &VMapOwned<BoData>,
+            va_range: &Range<u64>,
+            shared_iface_addr: u64,
+        ) -> Result<FwInterface<FW_IFACE_SIZE>> {
+            let shared_mem_start = va_range.start;
+            let shared_mem_end = va_range.end;
+
+            let iface_end = shared_iface_addr
+                .checked_add(FW_IFACE_SIZE as u64)
+                .ok_or(EINVAL)?;
+
+            if shared_iface_addr < shared_mem_start || iface_end > shared_mem_end {
+                pr_err!(
+                    "FwInterface::new: interface [0x{:x}..0x{:x}) out of bounds [0x{:x}..0x{:x})\n",
+                    shared_iface_addr,
+                    iface_end,
+                    shared_mem_start,
+                    shared_mem_end
+                );
+                return Err(EINVAL);
+            }
+
+            let offset = (shared_iface_addr - shared_mem_start) as usize;
+            Ok(FwInterface {
+                vmap: vmap.clone(),
+                offset,
+            })
+        }
+    }
+
+    impl<const FW_IFACE_SIZE: usize> Io for FwInterface<FW_IFACE_SIZE> {
+        #[inline]
+        fn addr(&self) -> usize {
+            self.vmap.addr() + self.offset
+        }
+
+        #[inline]
+        fn maxsize(&self) -> usize {
+            FW_IFACE_SIZE
+        }
+    }
+
+    impl<const FW_IFACE_SIZE: usize> IoKnownSize for FwInterface<FW_IFACE_SIZE> {
+        const MIN_SIZE: usize = FW_IFACE_SIZE;
+    }
+
+    impl<T, const FW_IFACE_SIZE: usize> IoCapable<T> for FwInterface<FW_IFACE_SIZE> {
+        unsafe fn io_read(&self, addr: usize) -> T {
+            let base = self.addr();
+            let size = size_of::<T>();
+
+            if addr < base || addr.saturating_add(size) > base + FW_IFACE_SIZE {
+                pr_err!(
+                    "io_read: address 0x{:x} out of bounds [0x{:x}..0x{:x})\n",
+                    addr,
+                    base,
+                    base + FW_IFACE_SIZE
+                );
+                panic!("io_read: address 0x{:x} out of bounds", addr);
+            }
+
+            let ptr = addr as *const T;
+
+            // SAFETY: ptr is within bounds (checked above) and valid for the VMap lifetime.
+            unsafe { read_volatile(ptr) }
+        }
+
+        unsafe fn io_write(&self, value: T, addr: usize) {
+            let base = self.addr();
+            let size = size_of::<T>();
+
+            if addr < base || addr.saturating_add(size) > base + FW_IFACE_SIZE {
+                pr_err!(
+                    "io_write: address 0x{:x} out of bounds [0x{:x}..0x{:x})\n",
+                    addr,
+                    base,
+                    base + FW_IFACE_SIZE
+                );
+                panic!("io_write: address 0x{:x} out of bounds", addr);
+            }
+
+            let ptr = addr as *mut T;
+
+            // SAFETY: ptr is within bounds (checked above) and valid for the VMap lifetime.
+            unsafe { write_volatile(ptr, value) };
+        }
+    }
+}
+
+/// GLB (Global) interface definitions.
+///
+/// This module contains the register definitions and types for the global CSF interface,
+/// including control, input, and output blocks.
+mod glb {
+    use core::convert::TryFrom;
+
+    use kernel::{
+        error::{
+            code::EINVAL,
+            Error, //
+        },
+        num::Bounded, //
+    };
+
+    /// Size of the GLB_CONTROL_BLOCK base registers (not including GROUP_CONTROL blocks).
+    ///
+    /// This covers only the GLB_CONTROL base registers: 0x00-0x1C.
+    /// GROUP_CONTROL (CSG) blocks are accessed separately via runtime calculations.
+    pub(super) const GLB_CONTROL_BLOCK_SIZE: usize = 0x20;
+
+    /// Size of the GLB_INPUT_BLOCK register block excluding reserved space at the end.
+    pub(super) const GLB_INPUT_BLOCK_SIZE: usize = 0x84;
+
+    /// Size of the GLB_OUTPUT_BLOCK register block excluding reserved space at the end.
+    pub(super) const GLB_OUTPUT_BLOCK_SIZE: usize = 0x1C;
+
+    /// Timestamp source selection for timers.
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum TimestampSource {
+        /// The system timestamp is used.
+        /// This is the value exposed in the TIMESTAMP register
+        /// ([`TIMESTAMP_LO`](crate::regs::gpu_control::TIMESTAMP_LO) and
+        /// [`TIMESTAMP_HI`](crate::regs::gpu_control::TIMESTAMP_HI)).
+        SystemTimestamp = 0,
+        /// The GPU cycle counter is used.
+        /// This is the value exposed in the CYCLE_COUNT register
+        /// ([`CYCLE_COUNT_LO`](crate::regs::gpu_control::CYCLE_COUNT_LO) and
+        /// [`CYCLE_COUNT_HI`](crate::regs::gpu_control::CYCLE_COUNT_HI)).
+        GpuCounter = 1,
+    }
+
+    impl From<Bounded<u32, 1>> for TimestampSource {
+        fn from(val: Bounded<u32, 1>) -> Self {
+            match val.get() {
+                0 => TimestampSource::SystemTimestamp,
+                1 => TimestampSource::GpuCounter,
+                _ => unreachable!(),
+            }
+        }
+    }
+
+    impl From<TimestampSource> for Bounded<u32, 1> {
+        fn from(src: TimestampSource) -> Self {
+            Bounded::try_new(src as u32).unwrap()
+        }
+    }
+
+    /// Global halt status values.
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u32)]
+    pub(super) enum HaltStatus {
+        /// No problem reported.
+        Ok = 0x00000000,
+        /// A fatal error has occurred, but the cause could not be determined.
+        Panic = 0x0000004E,
+        /// A watchdog timer has expired.
+        Wd = 0x0000004F,
+    }
+
+    impl TryFrom<Bounded<u32, 32>> for HaltStatus {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 32>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0x00000000 => Ok(HaltStatus::Ok),
+                0x0000004E => Ok(HaltStatus::Panic),
+                0x0000004F => Ok(HaltStatus::Wd),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<HaltStatus> for Bounded<u32, 32> {
+        fn from(status: HaltStatus) -> Self {
+            Bounded::try_new(status as u32).unwrap()
+        }
+    }
+
+    /// GLB_CONTROL_BLOCK - Global interface control and capabilities.
+    ///
+    /// These macros represent virtualized registers for the global interface control block.
+    /// They allow Tyr to query global CSF interface capabilities and to
+    /// retrieve the MCU's virtual addresses for the global input/output blocks.
+    pub(super) mod control {
+        use kernel::register;
+
+        register! {
+            /// Global interface version.
+            pub GLB_VERSION(u32) @ 0x00 {
+                /// Patch number.
+                15:0 patch;
+                /// Minor version number.
+                23:16 minor;
+                /// Major version number.
+                31:24 major;
+            }
+
+            /// Capabilities of the global CSF interface.
+            pub GLB_FEATURES(u32) @ 0x04 {
+                // Suspend compute jobs supported.
+                /// Suspend compute jobs supported.
+                /// Suspend fragment jobs supported.
+                1:1 fragment_suspend => bool;
+                /// Suspend tiler jobs supported.
+                2:2 tiler_suspend => bool;
+                /// Support for multiple PROGRESS_WAIT.
+                3:3 progress_multi_wait => bool;
+            }
+
+            /// MCU virtual address of the global input block.
+            pub GLB_INPUT_VA(u32) @ 0x08 {
+                31:0 value;
+            }
+
+            /// MCU virtual address of the global output block.
+            pub GLB_OUTPUT_VA(u32) @ 0x0C {
+                31:0 value;
+            }
+
+            /// This register contains the count of CSG interfaces supported.
+            pub GLB_GROUP_NUM(u32) @ 0x10 {
+                4:0 value;
+            }
+
+            /// Stride, in bytes, between each CSG interface capabilities structure.
+            pub GLB_GROUP_STRIDE(u32) @ 0x14 {
+                31:0 value;
+            }
+
+            /// Size, in bytes, of the GPU performance counters.
+            pub GLB_PRFCNT_SIZE(u32) @ 0x18 {
+                /// Size of GPU hardware performance counter data.
+                15:0 hardware_size;
+                /// Size of GPU firmware performance counter data.
+                31:16 firmware_size;
+            }
+
+            /// Features of instrumentation buffer used by the TRACE_POINT instruction.
+            pub GLB_INSTR_FEATURES(u32) @ 0x1C {
+                /// How often the buffer offset is updated.
+                3:0 offset_update_rate;
+                /// Maximum size of each stored event.
+                7:4 event_size_max;
+            }
+        }
+    }
+
+    /// GLB_INPUT_BLOCK - Global register interface, input area.
+    ///
+    /// These macros represent virtualized registers for the global input block.
+    /// Only Tyr updates these registers; CSF has read-only access.
+    pub(super) mod input {
+        use super::TimestampSource;
+        use kernel::register;
+
+        register! {
+            /// Global request register.
+            ///
+            /// Tyr makes requests to the CSF by changing the value of bits in
+            /// this register.
+            pub GLB_REQ(u32) @ 0x00 {
+                /// Halt the MCU.
+                0:0 halt => bool;
+                /// Update the progress timer timeout.
+                1:1 cfg_progress_timer => bool;
+                /// Update the shader core allocation mask.
+                2:2 cfg_alloc_en => bool;
+                /// Update the shader core power down timeout.
+                3:3 cfg_pwroff_timer => bool;
+                /// Switch the GPU into protected mode.
+                4:4 protm_enter => bool;
+                /// Control performance counters.
+                5:5 prfcnt_enable => bool;
+                /// Sample performance counters.
+                6:6 prfcnt_sample => bool;
+                /// Enable cycle counter and timestamp.
+                7:7 counter_enable => bool;
+                /// Check if firmware is alive.
+                8:8 ping => bool;
+                /// Update firmware configuration settings.
+                9:9 firmware_config_update => bool;
+                /// Enable idle state reporting.
+                10:10 idle_enable => bool;
+                /// Inactive compute iterator event.
+                20:20 inactive_compute => bool;
+                /// Inactive fragment iterator event.
+                21:21 inactive_fragment => bool;
+                /// Inactive tiler iterator event.
+                22:22 inactive_tiler => bool;
+                /// GPU exit protected mode event.
+                23:23 protm_exit => bool;
+                /// Performance counter buffer hit 50% threshold.
+                24:24 prfcnt_threshold => bool;
+                /// Performance counter buffer overflow.
+                25:25 prfcnt_overflow => bool;
+                /// Idle state reached.
+                26:26 idle_event => bool;
+            }
+
+            /// Global acknowledge IRQ mask.
+            ///
+            /// Tyr uses this bit mask to indicate which CSF acknowledgements
+            /// it wishes to be notified about. The bit layout matches the
+            /// request register, which in turn matches the CSF's ack register
+            /// in the output block.
+            pub GLB_ACK_IRQ_MASK(u32) @ 0x04 {
+                /// Halt the MCU.
+                0:0 halt => bool;
+                /// Update the progress timer timeout.
+                1:1 cfg_progress_timer => bool;
+                /// Update the shader core allocation mask.
+                2:2 cfg_alloc_en => bool;
+                /// Update the shader core power down timeout.
+                3:3 cfg_pwroff_timer => bool;
+                /// Switch the GPU into protected mode.
+                4:4 protm_enter => bool;
+                /// Control performance counters.
+                5:5 prfcnt_enable => bool;
+                /// Sample performance counters.
+                6:6 prfcnt_sample => bool;
+                /// Enable cycle counter and timestamp.
+                7:7 counter_enable => bool;
+                /// Check if firmware is alive.
+                8:8 ping => bool;
+                /// Update firmware configuration.
+                9:9 firmware_config_update => bool;
+                /// Enable idle state reporting.
+                10:10 idle_enable => bool;
+                /// Inactive compute iterator event.
+                20:20 inactive_compute => bool;
+                /// Inactive fragment iterator event.
+                21:21 inactive_fragment => bool;
+                /// Inactive tiler iterator event.
+                22:22 inactive_tiler => bool;
+                /// GPU exit protected mode event.
+                23:23 protm_exit => bool;
+                /// Performance counter buffer threshold reached.
+                24:24 prfcnt_threshold => bool;
+                /// Performance counter buffer overflow.
+                25:25 prfcnt_overflow => bool;
+                /// Idle state reached.
+                26:26 idle_event => bool;
+            }
+
+            /// Global doorbell request.
+            ///
+            /// Each bit in this register is a request flag for the doorbell to
+            /// the corresponding CSG.
+            pub GLB_DB_REQ(u32) @ 0x08 {
+                31:0 mask;
+            }
+
+            /// Global progress timeout.
+            ///
+            /// Tyr uses this register to configure the maximum time limit without
+            /// forward progress before an interrupt or event is generated.
+            /// Timeout is given in clock cycles; a value of 0 disables the timeout.
+            pub GLB_PROGRESS_TIMER(u32) @ 0x10 {
+                31:0 timeout;
+            }
+
+            /// Global shader core power down timer.
+            ///
+            /// Configures the timeout for automatic shader core and tiler power domain
+            /// powerdown. A nonzero value enables the timeout; 0 disables it.
+            pub GLB_PWROFF_TIMER(u32) @ 0x14 {
+                30:0 timeout;
+                31:31 timer_source => TimestampSource;
+            }
+
+            /// Global shader core allocation enable mask.
+            ///
+            /// Each bit in this register controls which shader cores are
+            /// available for endpoint allocation.
+            pub GLB_ALLOC_EN(u64) @ 0x18 {
+                63:0 mask;
+            }
+
+            /// Configure COHERENCY_ENABLE register value to use in protected
+            /// mode execution.
+            pub GLB_PROTM_COHERENCY(u32) @ 0x20 {
+                31:0 value;
+            }
+
+            /// Performance counter address space.
+            pub GLB_PRFCNT_JASID(u32) @ 0x24 {
+                3:0 jasid;
+            }
+
+            /// Performance counter buffer address.
+            pub GLB_PRFCNT_BASE(u64) @ 0x28 {
+                63:0 pointer;
+            }
+
+            /// Performance counter buffer extract index.
+            pub GLB_PRFCNT_EXTRACT(u32) @ 0x30 {
+                31:0 index;
+            }
+
+            /// Performance counter configuration.
+            pub GLB_PRFCNT_CONFIG(u32) @ 0x40 {
+                7:0 size;
+                9:8 set_select;
+            }
+
+            /// CSG performance counting enable.
+            pub GLB_PRFCNT_CSG_SELECT(u32) @ 0x44 {
+                31:0 enable;
+            }
+
+            /// Performance counter enable for firmware.
+            pub GLB_PRFCNT_FW_EN(u32) @ 0x48 {
+                /// Enable flags for groups of 4 counters.
+                31:0 enable;
+            }
+
+            /// Performance counter enable for CSG.
+            pub GLB_PRFCNT_CSG_EN(u32) @ 0x4C {
+                /// Enable flags for groups of 4 counters.
+                31:0 enable;
+            }
+
+            /// Performance counter enable for CSF.
+            pub GLB_PRFCNT_CSF_EN(u32) @ 0x50 {
+                /// Enable flags for groups of 4 counters.
+                31:0 enable;
+            }
+
+            /// Performance counter enable for shader cores.
+            pub GLB_PRFCNT_SHADER_EN(u32) @ 0x54 {
+                /// Enable flags for groups of 4 counters.
+                31:0 enable;
+            }
+
+            /// Performance counter enable for tiler.
+            pub GLB_PRFCNT_TILER_EN(u32) @ 0x58 {
+                /// Enable flags for groups of 4 counters.
+                31:0 enable;
+            }
+
+            /// Performance counter enable for MMU/L2 cache.
+            pub GLB_PRFCNT_MMU_L2_EN(u32) @ 0x5C {
+                /// Enable flags for groups of 4 counters.
+                31:0 enable;
+            }
+
+            /// Global idle event timer.
+            ///
+            /// Configures the timeout for reporting that the GPU has become idle.
+            /// If the value is 0, then idleness is reported immediately.
+            pub GLB_IDLE_TIMER(u32) @ 0x80 {
+                30:0 timeout;
+                31:31 timer_source => TimestampSource;
+            }
+        }
+    }
+
+    /// GLB_OUTPUT_BLOCK - Global register interface, output area.
+    ///
+    /// These macros represent virtualized registers for the global output block.
+    /// Only the CSF updates registers in this area; Tyr has read-only access.
+    pub(super) mod output {
+        use super::HaltStatus;
+        use kernel::register;
+
+        register! {
+            /// Global acknowledge register.
+            ///
+            /// The CSF acknowledges requests from Tyr by changing the value of
+            /// bits in this register.
+            pub GLB_ACK(u32) @ 0x00 {
+                /// Update the progress timer timeout.
+                1:1 cfg_progress_timer => bool;
+                /// Update the shader core allocation mask.
+                2:2 cfg_alloc_en => bool;
+                /// Update the shader core power down timeout.
+                3:3 cfg_pwroff_timer => bool;
+                /// Switch the GPU into protected mode.
+                4:4 protm_enter => bool;
+                /// Control performance counters.
+                5:5 prfcnt_enable => bool;
+                /// Sample performance counters.
+                6:6 prfcnt_sample => bool;
+                /// Enable cycle counter and timestamp.
+                7:7 counter_enable => bool;
+                /// Check if firmware is alive.
+                8:8 ping => bool;
+                /// Update firmware configuration settings.
+                9:9 firmware_config_update => bool;
+                /// Enable idle state reporting.
+                10:10 idle_enable => bool;
+                /// Inactive compute iterator event.
+                20:20 inactive_compute => bool;
+                /// Inactive fragment iterator event.
+                21:21 inactive_fragment => bool;
+                /// Inactive tiler iterator event.
+                22:22 inactive_tiler => bool;
+                /// The GPU has exited protected mode.
+                23:23 protm_exit => bool;
+                /// Performance counter buffer hit 50% threshold.
+                24:24 prfcnt_threshold => bool;
+                /// Performance counter buffer overflow.
+                25:25 prfcnt_overflow => bool;
+                /// Idle state reached.
+                26:26 idle_event => bool;
+            }
+
+            /// Global doorbell acknowledge.
+            ///
+            /// Each bit in this register is an acknowledgement flag from the
+            /// doorbell to the corresponding CSG.
+            pub GLB_DB_ACK(u32) @ 0x08 {
+                31:0 mask;
+            }
+
+            /// Global halt status.
+            ///
+            /// If the MCU has entered the HALT state due to a serious error, then the
+            /// firmware can write a value to this field to supply more information about
+            /// the source of the error.
+            pub GLB_HALT_STATUS(u32) @ 0x10 {
+                31:0 value ?=> HaltStatus;
+            }
+
+            /// Performance counter status.
+            ///
+            /// This register contains information about the last performance-counter
+            /// sample operation.
+            pub GLB_PRFCNT_STATUS(u32) @ 0x14 {
+                /// Performance counter operation failed.
+                0:0 failed => bool;
+                /// Performance counter operation affected by POWER_ON.
+                1:1 power_on_transition => bool;
+                /// Performance counter operation affected by POWER_OFF.
+                2:2 power_off_transition => bool;
+                /// Performance counter operation affected by protected mode.
+                3:3 protected_session => bool;
+            }
+
+            /// Performance counter buffer insert index.
+            pub GLB_PRFCNT_INSERT(u32) @ 0x18 {
+                31:0 index;
+            }
+        }
+    }
+}
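[Editorial aside, not part of the patch] GLB_REQ and GLB_ACK form a toggle handshake: Tyr flips a bit in GLB_REQ, and the request stays pending until the CSF makes the same bit in GLB_ACK match. A standalone model of that rule, using plain `u32` values and a hypothetical `pending` helper rather than the patch's register types:

```rust
// Bit position of the ping flag (GLB_REQ.ping / GLB_ACK.ping).
const PING: u32 = 1 << 8;

// A request bit is outstanding wherever REQ and ACK disagree.
fn pending(req: u32, ack: u32) -> u32 {
    req ^ ack
}

fn main() {
    let mut req = 0u32;
    let mut ack = 0u32;

    // Tyr raises a ping by toggling the bit relative to GLB_ACK.
    req ^= PING;
    assert_eq!(pending(req, ack), PING);

    // The CSF acknowledges by copying the bit into GLB_ACK.
    ack = (ack & !PING) | (req & PING);
    assert_eq!(pending(req, ack), 0);
}
```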
+
+/// CSG (Command Stream Group) interface definitions for GROUP_CONTROL_BLOCK.
+///
+/// This module contains the register definitions and types for CSG interfaces,
+/// including control, input, and output blocks.
+mod csg {
+    use core::convert::TryFrom;
+
+    use kernel::{
+        error::{
+            code::EINVAL,
+            Error, //
+        },
+        num::Bounded, //
+    };
+
+    /// Size of a single CSG control block header (GROUP_FEATURES through GROUP_STREAM_STRIDE).
+    ///
+    /// This covers the per-CSG control registers at offsets 0x00-0x18.
+    /// STREAM_CONTROL (CS) blocks are accessed separately via runtime calculations.
+    pub(super) const CSG_CONTROL_BLOCK_SIZE: usize = 0x1C;
+
+    /// Size of the CSG_INPUT_BLOCK register block (up to and including CSG_CONFIG at 0x50 + 4 bytes).
+    pub(super) const CSG_INPUT_BLOCK_SIZE: usize = 0x54;
+
+    /// Size of the CSG_OUTPUT_BLOCK register block (up to and including CSG_RESOURCE_DEP at 0x1C + 4 bytes).
+    pub(super) const CSG_OUTPUT_BLOCK_SIZE: usize = 0x20;
+
+    /// CSG execution state (csg_execution_state_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsgExecutionState {
+        /// Terminate execution without saving any state.
+        Terminate = 0,
+        /// Start execution of the command stream group without restoring any state.
+        Start = 1,
+        /// Suspend the command stream. The state of the command stream is saved in the suspend
+        /// buffer, and then the status update registers are updated.
+        Suspend = 2,
+        /// Restore command stream group state from the suspend buffer and continue execution of
+        /// the command stream group.
+        Resume = 3,
+    }
+
+    impl TryFrom<Bounded<u32, 3>> for CsgExecutionState {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 3>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0 => Ok(CsgExecutionState::Terminate),
+                1 => Ok(CsgExecutionState::Start),
+                2 => Ok(CsgExecutionState::Suspend),
+                3 => Ok(CsgExecutionState::Resume),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsgExecutionState> for Bounded<u32, 3> {
+        fn from(state: CsgExecutionState) -> Self {
+            Bounded::try_new(state as u32).unwrap()
+        }
+    }
+
+    /// CSG state interrupt mask (csf_state_irq_mask_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsgStateIrqMask {
+        /// Host interrupt disabled.
+        Disabled = 0,
+        /// Host interrupt enabled.
+        /// This interrupt mask enables interrupts for all 3 bits of the STATUS field,
+        /// and therefore triggers on any value change.
+        Enabled = 7,
+    }
+
+    impl TryFrom<Bounded<u32, 3>> for CsgStateIrqMask {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 3>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0 => Ok(CsgStateIrqMask::Disabled),
+                7 => Ok(CsgStateIrqMask::Enabled),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsgStateIrqMask> for Bounded<u32, 3> {
+        fn from(mask: CsgStateIrqMask) -> Self {
+            Bounded::try_new(mask as u32).unwrap()
+        }
+    }
+
+    /// GROUP_CONTROL_BLOCK - CSG interface control and capabilities.
+    ///
+    /// This defines the register layout for a single CSG interface control block.
+    /// Each CSG's control block is accessed by calculating its runtime offset.
+    pub(super) mod control {
+        use kernel::register;
+
+        register! {
+            /// CSG interface features.
+            ///
+            /// This register contains information about the capabilities of the CSG.
+            pub GROUP_FEATURES(u32) @ 0x00 {
+                /// Suspend buffer type.
+                ///
+                /// Suspend data can be interchanged between two CSGs with the same suspend type.
+                /// Suspend type values have no specific meaning and are otherwise opaque to Tyr.
+                7:0 suspend_type;
+                /// Detailed resource tracking supported. Default is 0 (false).
+                8:8 detailed_tracking => bool;
+            }
+
+            /// MCU virtual address of CSG_INPUT_BLOCK.
+            pub GROUP_INPUT_VA(u32) @ 0x04 {
+                31:0 value;
+            }
+
+            /// MCU virtual address of CSG_OUTPUT_BLOCK.
+            pub GROUP_OUTPUT_VA(u32) @ 0x08 {
+                31:0 value;
+            }
+
+            /// Size, in bytes, required to write suspend data for a CSG buffer in unprotected mode.
+            pub GROUP_SUSPEND_SIZE(u32) @ 0x0C {
+                31:0 value;
+            }
+
+            /// Size, in bytes, required to write suspend data for a CSG buffer in protected mode.
+            pub GROUP_PROTM_SUSPEND_SIZE(u32) @ 0x10 {
+                31:0 value;
+            }
+
+            /// Number of CS interfaces supported by this CSG.
+            pub GROUP_STREAM_NUM(u32) @ 0x14 {
+                5:0 value;
+            }
+
+            /// Stride, in bytes, between CS interface capabilities structures.
+            pub GROUP_STREAM_STRIDE(u32) @ 0x18 {
+                31:0 value;
+            }
+        }
+    }
+
+    /// CSG_INPUT_BLOCK - CSG control, input area.
+    ///
+    /// Only Tyr updates registers in this area. This area is used for control
+    /// of a particular CSG.
+    pub(super) mod input {
+        use super::{
+            CsgExecutionState,
+            CsgStateIrqMask, //
+        };
+        use kernel::register;
+
+        register! {
+            /// CSG request.
+            ///
+            /// Controls various features of the CSG through
+            /// request/acknowledge communication with CSG_ACK.
+            pub CSG_REQ(u32) @ 0x00 {
+                /// Request change of Execution state.
+                2:0 state ?=> CsgExecutionState;
+                /// Request endpoint configuration update.
+                4:4 ep_cfg => bool;
+                /// Request status update.
+                5:5 status_update => bool;
+                /// Notification of sync status change.
+                28:28 sync_update => bool;
+                /// Notification of idle status.
+                29:29 idle => bool;
+                /// Notification of forward progress timeout.
+                31:31 progress_timer_event => bool;
+            }
+
+            /// CSG acknowledge IRQ mask.
+            ///
+            /// Controls which flags in CSG_ACK trigger a host IRQ when updated.
+            pub CSG_ACK_IRQ_MASK(u32) @ 0x04 {
+                /// Execution state change event.
+                2:0 state ?=> CsgStateIrqMask;
+                /// Endpoint configuration complete event.
+                4:4 ep_cfg => bool;
+                /// Status update event.
+                5:5 status_update => bool;
+                /// Sync status change event.
+                28:28 sync_update => bool;
+                /// Idle event.
+                29:29 idle => bool;
+                /// Progress timer event.
+                31:31 progress_timer_event => bool;
+            }
+
+            /// CS doorbell request.
+            ///
+            /// Each bit is a request flag for the doorbell to the corresponding CS
+            /// within this CSG. Checked when the global DOORBELL register is written.
+            pub CSG_DB_REQ(u32) @ 0x08 {
+                31:0 mask;
+            }
+
+            /// CS IRQ acknowledge.
+            ///
+            /// Each bit is an acknowledge flag for the IRQ to the corresponding
+            /// CS within the CSG.
+            pub CSG_IRQ_ACK(u32) @ 0x0C {
+                31:0 mask;
+            }
+
+            /// Allowed compute endpoints.
+            pub CSG_ALLOW_COMPUTE(u64) @ 0x20 {
+                63:0 mask;
+            }
+
+            /// Allowed fragment endpoints.
+            pub CSG_ALLOW_FRAGMENT(u64) @ 0x28 {
+                63:0 mask;
+            }
+
+            /// Allowed other endpoints.
+            pub CSG_ALLOW_OTHER(u32) @ 0x30 {
+                31:0 mask;
+            }
+
+            /// Endpoint allocation request.
+            ///
+            /// Configures the allowed requests for each type of endpoint for this CSG.
+            pub CSG_EP_REQ(u32) @ 0x34 {
+                /// Maximum number of endpoints which can run compute jobs.
+                7:0 compute_ep;
+                /// Maximum number of endpoints which can run fragment jobs.
+                15:8 fragment_ep;
+                /// Maximum number of endpoints which can run tiler jobs.
+                19:16 tiler_ep;
+                /// Endpoint exclusively runs compute jobs.
+                20:20 exclusive_compute => bool;
+                /// Endpoint exclusively runs fragment jobs.
+                21:21 exclusive_fragment => bool;
+                /// Priority of the CSG with respect to other CSGs (higher value = higher priority).
+                31:28 priority;
+            }
+
+            /// Normal mode suspend buffer address.
+            pub CSG_SUSPEND_BUF(u64) @ 0x40 {
+                63:0 pointer;
+            }
+
+            /// Protected mode suspend buffer address.
+            pub CSG_PROTM_SUSPEND_BUF(u64) @ 0x48 {
+                63:0 pointer;
+            }
+
+            /// CSG configuration options.
+            pub CSG_CONFIG(u32) @ 0x50 {
+                3:0 jasid;
+                8:8 l2c_allocate_ring => bool;
+                16:16 l2c_allocate_other => bool;
+            }
+        }
+    }
+
+    /// CSG_OUTPUT_BLOCK - CSG control, output area.
+    ///
+    /// Only the CSF updates the registers in this area. This area is used for control
+    /// of a particular CSG.
+    ///
+    /// Instances of this virtual register page are referenced by the
+    /// GROUP_CONTROL_BLOCK.GROUP_OUTPUT_VA register.
+    pub(super) mod output {
+        use super::CsgExecutionState;
+        use kernel::register;
+
+        register! {
+            /// CSG acknowledge flags.
+            ///
+            /// Interacts with CSG_REQ to control various features of the CSG
+            /// through request/acknowledge communication.
+            pub CSG_ACK(u32) @ 0x00 {
+                /// Current Execution state.
+                2:0 state ?=> CsgExecutionState;
+                /// Completion of endpoint configuration.
+                4:4 ep_cfg => bool;
+                /// Completion of status update.
+                5:5 status_update => bool;
+                /// Notification of sync status change.
+                28:28 sync_update => bool;
+                /// Notification of idle status.
+                29:29 idle => bool;
+                /// Notification of forward progress timeout.
+                31:31 progress_timer_event => bool;
+            }
+
+            /// CS kernel doorbell acknowledge flags.
+            ///
+            /// Each bit is an acknowledge flag for the doorbell to the corresponding
+            /// CS within this CSG. The doorbell for CSn is active when
+            /// bit n in CSG_DB_REQ and CSG_DB_ACK differ.
+            pub CSG_DB_ACK(u32) @ 0x08 {
+                31:0 mask;
+            }
+
+            /// CS IRQ request flags.
+            pub CSG_IRQ_REQ(u32) @ 0x0C {
+                31:0 mask;
+            }
+
+            /// Endpoint allocation status register.
+            ///
+            /// Provides information on the number of endpoints currently allocated
+            /// to this CSG.
+            pub CSG_STATUS_EP_CURRENT(u32) @ 0x10 {
+                /// Number of compute endpoints.
+                7:0 compute_ep;
+                /// Number of fragment endpoints.
+                15:8 fragment_ep;
+                /// Number of tiler endpoints.
+                19:16 tiler_ep;
+            }
+
+            /// Endpoint request status register.
+            ///
+            /// Provides information on the number of endpoints currently requested
+            /// by this CSG.
+            pub CSG_STATUS_EP_REQ(u32) @ 0x14 {
+                /// Number of compute endpoints.
+                7:0 compute_ep;
+                /// Number of fragment endpoints.
+                15:8 fragment_ep;
+                /// Number of tiler endpoints.
+                19:16 tiler_ep;
+                /// Endpoint exclusively runs compute jobs.
+                20:20 exclusive_compute => bool;
+                /// Endpoint exclusively runs fragment jobs.
+                21:21 exclusive_fragment => bool;
+            }
+
+            /// Overall state status register.
+            pub CSG_STATUS_STATE(u32) @ 0x18 {
+                0:0 idle => bool;
+            }
+
+            /// Current resource dependencies.
+            pub CSG_RESOURCE_DEP(u32) @ 0x1C {
+                /// Stream using no resources.
+                0:0 none => bool;
+                /// Stream using only compute resources.
+                1:1 using_compute => bool;
+                /// Stream using only fragment resources.
+                2:2 using_fragment => bool;
+                /// Stream using compute and fragment resources.
+                3:3 using_compute_fragment => bool;
+                /// Stream using only tiler resources.
+                4:4 using_tiler => bool;
+                /// Stream using compute and tiler resources.
+                5:5 using_compute_tiler => bool;
+                /// Stream using fragment and tiler resources.
+                6:6 using_fragment_tiler => bool;
+                /// Stream using compute, fragment and tiler resources.
+                7:7 using_compute_fragment_tiler => bool;
+                /// Compute resource available.
+                16:16 avail_compute => bool;
+                /// Fragment resource available.
+                17:17 avail_fragment => bool;
+                /// Tiler resource available.
+                18:18 avail_tiler => bool;
+                /// Active compute resource request.
+                20:20 active_compute => bool;
+                /// Active fragment resource request.
+                21:21 active_fragment => bool;
+                /// Active tiler resource request.
+                22:22 active_tiler => bool;
+            }
+        }
+    }
+}
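[Editorial aside, not part of the patch] The CSG_DB_ACK description above states the doorbell rule directly: the doorbell for CSn is active while bit n of CSG_DB_REQ and CSG_DB_ACK differ, so the active set is the XOR of the two registers. A standalone sketch with a hypothetical `active_doorbells` helper:

```rust
// The active doorbell set is the bitwise difference between the host's
// request flags and the firmware's acknowledge flags.
fn active_doorbells(db_req: u32, db_ack: u32) -> u32 {
    db_req ^ db_ack
}

fn main() {
    // Ring doorbells for CS0 and CS3 by flipping their request bits.
    let db_ack = 0b0000u32;
    let db_req = db_ack ^ 0b1001;
    assert_eq!(active_doorbells(db_req, db_ack), 0b1001);

    // The firmware acknowledges CS0; only CS3 stays active.
    let db_ack = db_ack ^ 0b0001;
    assert_eq!(active_doorbells(db_req, db_ack), 0b1000);
}
```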
+
+/// CS interface definitions for STREAM_CONTROL_BLOCK.
+///
+/// This module contains the register definitions and types for CS interfaces,
+/// including control, input, and output blocks.
+mod cs {
+    use core::convert::TryFrom;
+
+    use kernel::{
+        error::{
+            code::EINVAL,
+            Error, //
+        },
+        num::Bounded, //
+    };
+
+    /// Size of a single CS control block header.
+    ///
+    /// This covers the per-CS control registers at offsets 0x00-0x08.
+    /// CS blocks are accessed separately via runtime calculations.
+    pub(super) const CS_CONTROL_BLOCK_SIZE: usize = 0xC;
+
+    /// Size of the CS_KERNEL_INPUT_BLOCK register block.
+    pub(super) const CS_KERNEL_INPUT_BLOCK_SIZE: usize = 0x58;
+
+    /// Size of the CS_KERNEL_OUTPUT_BLOCK register block.
+    pub(super) const CS_KERNEL_OUTPUT_BLOCK_SIZE: usize = 0xD8;
+
+    /// CS execution state (cs_state_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsState {
+        /// Stop the command stream.
+        /// The execution of command stream instructions stops and any job active from the
+        /// command stream runs to completion (unless terminated at the CSG level) before
+        /// the STOP request completes.
+        Stop = 0,
+        /// Initialize the command stream and start execution.
+        Start = 1,
+    }
+
+    impl TryFrom<Bounded<u32, 3>> for CsState {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 3>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0 => Ok(CsState::Stop),
+                1 => Ok(CsState::Start),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsState> for Bounded<u32, 3> {
+        fn from(state: CsState) -> Self {
+            Bounded::try_new(state as u32).unwrap()
+        }
+    }
+
+    /// CS state interrupt mask (csf_state_irq_mask_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsStateIrqMask {
+        /// Host interrupt disabled.
+        Disabled = 0,
+        /// Host interrupt enabled.
+        /// This interrupt mask enables interrupts for all 3 bits of the STATUS field,
+        /// and therefore triggers on any value change.
+        Enabled = 7,
+    }
+
+    impl TryFrom<Bounded<u32, 3>> for CsStateIrqMask {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 3>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0 => Ok(CsStateIrqMask::Disabled),
+                7 => Ok(CsStateIrqMask::Enabled),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsStateIrqMask> for Bounded<u32, 3> {
+        fn from(mask: CsStateIrqMask) -> Self {
+            Bounded::try_new(mask as u32).unwrap()
+        }
+    }
+
+    /// CS scoreboard wait source (cs_sb_wait_source_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsSbWaitSource {
+        /// Not waiting for scoreboards.
+        None = 0x0,
+        /// WAIT instruction.
+        /// The SB_MASK field shows which scoreboard entries the WAIT instruction is waiting for.
+        Wait = 0x8,
+    }
+
+    impl TryFrom<Bounded<u32, 4>> for CsSbWaitSource {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 4>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0x0 => Ok(CsSbWaitSource::None),
+                0x8 => Ok(CsSbWaitSource::Wait),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsSbWaitSource> for Bounded<u32, 4> {
+        fn from(source: CsSbWaitSource) -> Self {
+            Bounded::try_new(source as u32).unwrap()
+        }
+    }
+
+    /// CS wait condition (csf_wait_condition_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsWaitCondition {
+        /// Sync Object <= Comparison Register.
+        Le = 0,
+        /// Sync Object > Comparison Register.
+        Gt = 1,
+    }
+
+    impl TryFrom<Bounded<u32, 4>> for CsWaitCondition {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 4>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0 => Ok(CsWaitCondition::Le),
+                1 => Ok(CsWaitCondition::Gt),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsWaitCondition> for Bounded<u32, 4> {
+        fn from(condition: CsWaitCondition) -> Self {
+            Bounded::try_new(condition as u32).unwrap()
+        }
+    }
+
+    /// CS blocked reason (cs_blocked_reason_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsBlockedReason {
+        /// The command stream is not blocked.
+        Unblocked = 0,
+        /// Blocked on scoreboards in some way.
+        /// See CS_STATUS_WAIT for further information.
+        SbWait = 1,
+        /// Blocked on PROGRESS_WAIT instruction.
+        ProgressWait = 2,
+        /// Blocked on a SYNC_WAIT32 or SYNC_WAIT64 instruction.
+        /// See CS_STATUS_WAIT, CS_STATUS_WAIT_SYNC_POINTER and CS_STATUS_WAIT_SYNC_VALUE for
+        /// more information.
+        SyncWait = 3,
+        /// Blocked awaiting storage for a deferred instruction.
+        Deferred = 4,
+        /// Blocked awaiting resource allocation.
+        /// See CS_STATUS_REQ_RESOURCE for more information.
+        Resource = 5,
+        /// Blocked awaiting completion of a synchronous FLUSH_CACHE2 instruction.
+        Flush = 6,
+    }
+
+    impl TryFrom<Bounded<u32, 4>> for CsBlockedReason {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 4>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0 => Ok(CsBlockedReason::Unblocked),
+                1 => Ok(CsBlockedReason::SbWait),
+                2 => Ok(CsBlockedReason::ProgressWait),
+                3 => Ok(CsBlockedReason::SyncWait),
+                4 => Ok(CsBlockedReason::Deferred),
+                5 => Ok(CsBlockedReason::Resource),
+                6 => Ok(CsBlockedReason::Flush),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsBlockedReason> for Bounded<u32, 4> {
+        fn from(reason: CsBlockedReason) -> Self {
+            Bounded::try_new(reason as u32).unwrap()
+        }
+    }
+
+    /// CS_FAULT exception type (restricted subset of exception_type_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsFaultExceptionType {
+        /// No error.
+        Ok = 0x00,
+        /// Shader program executed a KABOOM instruction.
+        Kaboom = 0x05,
+        /// Iterator terminated.
+        CsResourceTerminated = 0x0F,
+        /// Command stream bus error.
+        CsBusFault = 0x48,
+        /// A fault has been inherited.
+        CsInheritFault = 0x4B,
+        /// Shader invalid Program Counter.
+        InstrInvalidPc = 0x50,
+        /// Shader invalid instruction.
+        InstrInvalidEnc = 0x51,
+        /// Shader barrier failure.
+        InstrBarrierFault = 0x55,
+        /// Invalid descriptor.
+        DataInvalidFault = 0x58,
+        /// Tile out of bounds.
+        TileRangeFault = 0x59,
+        /// Address out of bounds.
+        AddrRangeFault = 0x5A,
+        /// No detailed error information available.
+        ImpreciseFault = 0x5B,
+        /// Resource eviction timed out (firmware error).
+        ResourceEvictionTimeout = 0x69,
+    }
+
+    impl TryFrom<Bounded<u32, 8>> for CsFaultExceptionType {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 8>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0x00 => Ok(CsFaultExceptionType::Ok),
+                0x05 => Ok(CsFaultExceptionType::Kaboom),
+                0x0F => Ok(CsFaultExceptionType::CsResourceTerminated),
+                0x48 => Ok(CsFaultExceptionType::CsBusFault),
+                0x4B => Ok(CsFaultExceptionType::CsInheritFault),
+                0x50 => Ok(CsFaultExceptionType::InstrInvalidPc),
+                0x51 => Ok(CsFaultExceptionType::InstrInvalidEnc),
+                0x55 => Ok(CsFaultExceptionType::InstrBarrierFault),
+                0x58 => Ok(CsFaultExceptionType::DataInvalidFault),
+                0x59 => Ok(CsFaultExceptionType::TileRangeFault),
+                0x5A => Ok(CsFaultExceptionType::AddrRangeFault),
+                0x5B => Ok(CsFaultExceptionType::ImpreciseFault),
+                0x69 => Ok(CsFaultExceptionType::ResourceEvictionTimeout),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsFaultExceptionType> for Bounded<u32, 8> {
+        fn from(exc_type: CsFaultExceptionType) -> Self {
+            Bounded::try_new(exc_type as u32).unwrap()
+        }
+    }
+
+    /// CS_FATAL exception type (restricted subset of exception_type_t in spec).
+    #[derive(Copy, Clone, Debug, PartialEq)]
+    #[repr(u8)]
+    pub(super) enum CsFatalExceptionType {
+        /// No error.
+        Ok = 0x00,
+        /// Command stream config invalid.
+        CsConfigFault = 0x40,
+        /// No endpoints available.
+        CsEndpointFault = 0x44,
+        /// Command stream bus error.
+        CsBusFault = 0x48,
+        /// Command stream invalid instruction.
+        CsInvalidInstruction = 0x49,
+        /// Command stream call stack overflow.
+        CsCallStackOverflow = 0x4A,
+        /// Firmware error.
+        FirmwareInternalError = 0x68,
+    }
+
+    impl TryFrom<Bounded<u32, 8>> for CsFatalExceptionType {
+        type Error = Error;
+
+        fn try_from(val: Bounded<u32, 8>) -> Result<Self, Self::Error> {
+            match val.get() {
+                0x00 => Ok(CsFatalExceptionType::Ok),
+                0x40 => Ok(CsFatalExceptionType::CsConfigFault),
+                0x44 => Ok(CsFatalExceptionType::CsEndpointFault),
+                0x48 => Ok(CsFatalExceptionType::CsBusFault),
+                0x49 => Ok(CsFatalExceptionType::CsInvalidInstruction),
+                0x4A => Ok(CsFatalExceptionType::CsCallStackOverflow),
+                0x68 => Ok(CsFatalExceptionType::FirmwareInternalError),
+                _ => Err(EINVAL),
+            }
+        }
+    }
+
+    impl From<CsFatalExceptionType> for Bounded<u32, 8> {
+        fn from(exc_type: CsFatalExceptionType) -> Self {
+            Bounded::try_new(exc_type as u32).unwrap()
+        }
+    }
+
+    /// STREAM_CONTROL_BLOCK - CS interface control and capabilities.
+    ///
+    /// This defines the register layout for a single CS interface control block.
+    /// Each CS's control block is accessed by calculating its runtime offset.
+    pub(super) mod control {
+        use kernel::register;
+        register! {
+            /// CS features.
+            pub STREAM_FEATURES(u32) @ 0x00 {
+                /// Number of work registers.
+                7:0 work_registers;
+                /// Number of scoreboards.
+                15:8 scoreboards;
+                /// Compute jobs are supported.
+                16:16 compute => bool;
+                /// Fragment jobs are supported.
+                17:17 fragment => bool;
+                /// Tiler jobs are supported.
+                18:18 tiler => bool;
+            }
+
+            /// MCU virtual address of CS_KERNEL_INPUT_BLOCK.
+            pub STREAM_INPUT_VA(u32) @ 0x04 {
+                31:0 value;
+            }
+
+            /// MCU virtual address of CS_KERNEL_OUTPUT_BLOCK.
+            pub STREAM_OUTPUT_VA(u32) @ 0x08 {
+                31:0 value;
+            }
+        }
+    }
+
+    /// CS_KERNEL_INPUT_BLOCK.
+    pub(super) mod input {
+        use super::{
+            CsState,
+            CsStateIrqMask, //
+        };
+        use kernel::register;
+
+        // Command stream control, kernel input area.
+        register! {
+            /// Command stream request flags.
+            pub CS_REQ(u32) @ 0x00 {
+                /// Requested command stream state.
+                2:0 state ?=> CsState;
+                /// Enable extract events.
+                4:4 extract_event => bool;
+                /// Enable idle events for sync/wait.
+                8:8 idle_sync_wait => bool;
+                /// Enable idle events for protected mode pending.
+                9:9 idle_protm_pend => bool;
+                /// Enable idle events for empty ring buffer.
+                10:10 idle_empty => bool;
+                /// Enable idle events for resource requests.
+                11:11 idle_resource_req => bool;
+                /// Clear tiler-out-of-memory notification.
+                26:26 tiler_oom => bool;
+                /// Clear protected mode pending notification.
+                27:27 protm_pend => bool;
+                /// Clear fatal error notification.
+                30:30 fatal => bool;
+                /// Clear fault notification.
+                31:31 fault => bool;
+            }
+
+            /// Command stream configuration.
+            pub CS_CONFIG(u32) @ 0x04 {
+                3:0 priority;
+                15:8 user_doorbell;
+            }
+
+            /// Command stream interrupt mask.
+            pub CS_ACK_IRQ_MASK(u32) @ 0x0C {
+                /// CS state change event.
+                2:0 state ?=> CsStateIrqMask;
+                /// Extract event.
+                4:4 extract_event => bool;
+                /// Tiler out of memory.
+                26:26 tiler_oom => bool;
+                /// Protected mode pending.
+                27:27 protm_pend => bool;
+                /// Non-recoverable error.
+                30:30 fatal => bool;
+                /// Recoverable error.
+                31:31 fault => bool;
+            }
+
+            /// Base pointer for the ring buffer.
+            pub CS_BASE(u64) @ 0x10 {
+                63:0 pointer;
+            }
+
+            /// Size of the ring buffer.
+            pub CS_SIZE(u32) @ 0x18 {
+                31:0 size;
+            }
+
+            /// Pointer to start of heap chunk list.
+            pub CS_TILER_HEAP_START(u64) @ 0x20 {
+                63:0 pointer;
+            }
+
+            /// Pointer to end of heap chunk list.
+            pub CS_TILER_HEAP_END(u64) @ 0x28 {
+                63:0 pointer;
+            }
+
+            /// CS user mode input page address.
+            pub CS_USER_INPUT(u64) @ 0x30 {
+                63:0 pointer;
+            }
+
+            /// CS user mode output page address.
+            pub CS_USER_OUTPUT(u64) @ 0x38 {
+                63:0 pointer;
+            }
+
+            /// Instrumentation buffer configuration.
+            pub CS_INSTR_CONFIG(u32) @ 0x40 {
+                3:0 jasid;
+                7:4 event_size;
+                23:16 event_state;
+            }
+
+            /// Instrumentation buffer size.
+            pub CS_INSTR_BUFFER_SIZE(u32) @ 0x44 {
+                31:0 size;
+            }
+
+            /// Instrumentation buffer base pointer.
+            pub CS_INSTR_BUFFER_BASE(u64) @ 0x48 {
+                63:0 pointer;
+            }
+
+            /// Instrumentation buffer pointer to insert offset.
+            pub CS_INSTR_BUFFER_OFFSET_POINTER(u64) @ 0x50 {
+                63:0 pointer;
+            }
+        }
+    }
+
+    /// CS_KERNEL_OUTPUT_BLOCK.
+    pub(super) mod output {
+        use super::{
+            CsBlockedReason,
+            CsFatalExceptionType,
+            CsFaultExceptionType,
+            CsSbWaitSource,
+            CsState,
+            CsWaitCondition, //
+        };
+        use kernel::register;
+
+        // Command stream control, kernel output area.
+        register! {
+            /// Command stream acknowledge flags.
+            pub CS_ACK(u32) @ 0x00 {
+                /// Current command stream state.
+                2:0 state ?=> CsState;
+                /// Extract event notification.
+                4:4 extract_event => bool;
+                /// Tiler out of memory notification.
+                26:26 tiler_oom => bool;
+                /// Stalled waiting for protected mode.
+                27:27 protm_pend => bool;
+                /// Unrecoverable error notification.
+                30:30 fatal => bool;
+                /// Recoverable error notification.
+                31:31 fault => bool;
+            }
+
+            /// Program pointer current value.
+            pub CS_STATUS_CMD_PTR(u64) @ 0x40 {
+                /// Program Counter current value.
+                63:0 pointer;
+            }
+
+            /// Wait condition status register.
+            pub CS_STATUS_WAIT(u32) @ 0x48 {
+                /// Waiting for scoreboard entry.
+                15:0 sb_mask;
+                /// Source of scoreboard wait status, if any.
+                19:16 sb_source ?=> CsSbWaitSource;
+                /// SYNC_WAIT condition.
+                27:24 sync_wait_condition ?=> CsWaitCondition;
+                /// Waiting for PROGRESS_WAIT instruction.
+                28:28 progress_wait => bool;
+                /// Waiting for protected execution.
+                29:29 protm_pend => bool;
+                /// Size of sync object waited for.
+                30:30 sync_wait_size => bool;
+                /// Waiting for SYNC_WAIT instruction.
+                31:31 sync_wait => bool;
+            }
+
+            /// Indicates the resources requested by the command stream.
+            pub CS_STATUS_REQ_RESOURCE(u32) @ 0x4C {
+                /// Compute resources requested.
+                0:0 compute_requested => bool;
+                /// Fragment resources requested.
+                1:1 fragment_requested => bool;
+                /// Tiler resources requested.
+                2:2 tiler_requested => bool;
+                /// IDVS resources requested.
+                3:3 idvs_requested => bool;
+                /// Compute resources granted.
+                16:16 compute_granted => bool;
+                /// Fragment resources granted.
+                17:17 fragment_granted => bool;
+                /// Tiler resources granted.
+                18:18 tiler_granted => bool;
+                /// IDVS resources granted.
+                19:19 idvs_granted => bool;
+            }
+
+            /// Sync object pointer.
+            pub CS_STATUS_WAIT_SYNC_POINTER(u64) @ 0x50 {
+                /// Sync object address.
+                63:0 pointer;
+            }
+
+            /// Sync object test value, low half.
+            pub CS_STATUS_WAIT_SYNC_VALUE(u32) @ 0x58 {
+                /// Sync object test value.
+                31:0 value;
+            }
+
+            /// Scoreboard status.
+            pub CS_STATUS_SCOREBOARDS(u32) @ 0x5C {
+                /// Which scoreboard entries are non-zero.
+                15:0 nonzero;
+            }
+
+            /// Blocked reason.
+            pub CS_STATUS_BLOCKED_REASON(u32) @ 0x60 {
+                3:0 reason ?=> CsBlockedReason;
+            }
+
+            /// Sync object test value, high half.
+            pub CS_STATUS_WAIT_SYNC_VALUE_HI(u32) @ 0x64 {
+                /// Sync object test value.
+                31:0 value;
+            }
+
+            /// Recoverable fault information.
+            pub CS_FAULT(u32) @ 0x80 {
+                /// Exception type.
+                7:0 exception_type ?=> CsFaultExceptionType;
+                /// Exception specific data.
+                31:8 exception_data;
+            }
+
+            /// Unrecoverable fault information.
+            pub CS_FATAL(u32) @ 0x84 {
+                /// Exception type.
+                7:0 exception_type ?=> CsFatalExceptionType;
+                /// Exception specific data.
+                31:8 exception_data;
+            }
+
+            /// Additional information about a recoverable fault.
+            pub CS_FAULT_INFO(u64) @ 0x88 {
+                /// Exception specific data.
+                63:0 exception_data;
+            }
+
+            /// Additional information about a non-recoverable fault.
+            pub CS_FATAL_INFO(u64) @ 0x90 {
+                /// Exception specific data.
+                63:0 exception_data;
+            }
+
+            /// Number of vertex/tiling operations started.
+            pub CS_HEAP_VT_START(u32) @ 0xC0 {
+                31:0 value;
+            }
+
+            /// Number of vertex/tiling operations completed.
+            pub CS_HEAP_VT_END(u32) @ 0xC4 {
+                31:0 value;
+            }
+
+            /// Number of fragment operations completed.
+            pub CS_HEAP_FRAG_END(u32) @ 0xCC {
+                31:0 value;
+            }
+
+            /// Heap context address.
+            pub CS_HEAP_ADDRESS(u64) @ 0xD0 {
+                63:0 pointer;
+            }
+        }
+    }
+}
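[Editor's note: the `hi:lo field` declarations in the `register!` blocks above all reduce to shift-and-mask extraction at runtime. A minimal sketch of that semantics — the `field` helper is illustrative, not the macro's real expansion:]

```rust
/// Extract bits hi..=lo of a 32-bit register value (hi >= lo assumed).
fn field(reg: u32, hi: u32, lo: u32) -> u32 {
    let width = hi - lo + 1;
    let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
    (reg >> lo) & mask
}

fn main() {
    // A CS_ACK-like value with state = 0b011 (bits 2:0) and fault set (bit 31).
    let cs_ack: u32 = (1 << 31) | 0b011;
    assert_eq!(field(cs_ack, 2, 0), 0b011); // state field
    assert_eq!(field(cs_ack, 31, 31), 1); // fault bit
    assert_eq!(field(cs_ack, 4, 4), 0); // extract_event clear
    println!("ok");
}
```

The `=> bool` and `?=> Enum` annotations then layer a conversion on top of the raw extracted bits: `bool` for single-bit fields, and the fallible `TryFrom` conversions defined earlier in this file for multi-bit enum fields.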
+
+use cs::*;
+use csg::*;
+use glb::{
+    control::*,
+    *, //
+};
+
+/// State of the global interface.
+enum GlobalInterfaceState {
+    /// Interface is not yet initialized.
+    Disabled,
+    /// Interface is initialized and operational.
+    Enabled(EnabledGlobalInterface),
+}
+
+/// When enabled, the Global Interface has control,
+/// input, and output system memory interfaces, as well as
+/// the discovered CSG interfaces.
+#[expect(dead_code)]
+struct EnabledGlobalInterface {
+    /// Control block interface - provides version, features, and CSG discovery.
+    glb_control: FwInterface<GLB_CONTROL_BLOCK_SIZE>,
+    /// Input block interface - driver writes requests here.
+    glb_input: FwInterface<GLB_INPUT_BLOCK_SIZE>,
+    /// Output block interface - firmware writes acknowledgements here.
+    glb_output: FwInterface<GLB_OUTPUT_BLOCK_SIZE>,
+    /// Runtime stride between CSG control blocks (read from GLB_GROUP_STRIDE).
+    csg_stride: usize,
+    /// Number of CSG interfaces reported by hardware.
+    csg_num: usize,
+    /// Discovered CSG interfaces.
+    csg: KVec<CsgInterface>,
+}
+
+/// Global CSF Interface
+///
+/// The CSF controls operations that are common to all CSs.
+pub(super) struct GlobalInterface {
+    /// Current interface state (Disabled or Enabled).
+    state: GlobalInterfaceState,
+}
+
+impl GlobalInterface {
+    /// Creates a new CSF global interface, initially disabled.
+    pub(super) fn new() -> Result<Self> {
+        Ok(Self {
+            state: GlobalInterfaceState::Disabled,
+        })
+    }
+
+    /// Enables the global interface and discovers the CSG interfaces.
+    ///
+    /// This reads the firmware's control block to set up the global input/output
+    /// interfaces and discovers the available CSG interfaces.
+    pub(crate) fn enable(&mut self, shared_section: &Section) -> Result {
+        let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
+        let va_range = shared_section.mem.va_range();
+
+        let glb_control =
+            FwInterface::<GLB_CONTROL_BLOCK_SIZE>::new(&vmap, &va_range, va_range.start)?;
+
+        let version = glb_control.read(GLB_VERSION);
+        if version.major().get() == 0 {
+            pr_err!("CSF interface version is 0. Firmware may have failed to boot.\n");
+            return Err(EINVAL);
+        }
+        pr_info!(
+            "CSF interface version: {}.{}.{}\n",
+            version.major().get(),
+            version.minor().get(),
+            version.patch().get()
+        );
+
+        let input_va = glb_control.read(GLB_INPUT_VA);
+        let glb_input = FwInterface::<GLB_INPUT_BLOCK_SIZE>::new(
+            &vmap,
+            &va_range,
+            input_va.value().get().into(),
+        )?;
+
+        let output_va = glb_control.read(GLB_OUTPUT_VA);
+        let glb_output = FwInterface::<GLB_OUTPUT_BLOCK_SIZE>::new(
+            &vmap,
+            &va_range,
+            output_va.value().get().into(),
+        )?;
+
+        // Read how many CSG interfaces exist.
+        let csg_num = glb_control.read(GLB_GROUP_NUM).value().get();
+
+        // Read the stride between CSG control blocks.
+        let csg_stride = glb_control.read(GLB_GROUP_STRIDE).value().get() as usize;
+
+        if csg_stride < CSG_CONTROL_BLOCK_SIZE {
+            pr_err!(
+                "CSG stride {} is smaller than control block size {}\n",
+                csg_stride,
+                CSG_CONTROL_BLOCK_SIZE
+            );
+            return Err(EINVAL);
+        }
+
+        // Validate that the hardware doesn't report more CSGs than we support.
+        if csg_num as usize > super::MAX_CSG {
+            pr_err!(
+                "Too many CSGs: hardware reports {}, max supported {}\n",
+                csg_num,
+                super::MAX_CSG
+            );
+            return Err(EINVAL);
+        }
+
+        let enabled = EnabledGlobalInterface {
+            glb_control,
+            glb_input,
+            glb_output,
+            csg_stride,
+            csg_num: csg_num as usize,
+            csg: KVec::with_capacity(csg_num as usize, GFP_KERNEL)?,
+        };
+
+        self.state = GlobalInterfaceState::Enabled(enabled);
+        self.init_csg(shared_section)?;
+        Ok(())
+    }
+
+    /// Initialize CSG interfaces.
+    ///
+    /// This uses the previously read CSG count to create and enable each CSG interface.
+    fn init_csg(&mut self, shared_section: &Section) -> Result {
+        let enabled = match &mut self.state {
+            GlobalInterfaceState::Enabled(e) => e,
+            GlobalInterfaceState::Disabled => return Err(EINVAL),
+        };
+
+        for csg_idx in 0..enabled.csg_num {
+            // Create and enable the CSG interface.
+            let mut csg = CsgInterface::new(csg_idx)?;
+            csg.enable(shared_section, csg_idx, enabled.csg_stride)?;
+
+            enabled.csg.push(csg, GFP_KERNEL)?;
+        }
+
+        Ok(())
+    }
+}
+
+/// State of a CSG interface.
+enum CsgInterfaceState {
+    /// Interface is not yet initialized.
+    Disabled,
+    /// Interface is initialized and operational.
+    Enabled(EnabledCsgInterface),
+}
+
+/// When enabled, a CSG Interface has control, input, and output system memory interfaces.
+struct EnabledCsgInterface {
+    /// Control block interface - provides CSG capabilities and configuration.
+    #[expect(dead_code)]
+    csg_control: FwInterface<CSG_CONTROL_BLOCK_SIZE>,
+    /// Input block interface - driver writes CSG requests here.
+    #[expect(dead_code)]
+    csg_input: FwInterface<CSG_INPUT_BLOCK_SIZE>,
+    /// Output block interface - firmware writes CSG acknowledgements here.
+    #[expect(dead_code)]
+    csg_output: FwInterface<CSG_OUTPUT_BLOCK_SIZE>,
+    /// Runtime stride between CS control blocks (read from GROUP_STREAM_STRIDE).
+    cs_stride: usize,
+    /// Number of CS interfaces reported by hardware for this CSG.
+    cs_num: usize,
+    /// Discovered CS interfaces.
+    cs: KVec<CsInterface>,
+}
+
+/// Command Stream Group Interface
+///
+/// The CSG interface controls operations for a specific CSG.
+pub(crate) struct CsgInterface {
+    /// Current interface state (Disabled or Enabled).
+    state: CsgInterfaceState,
+    /// CSG identifier/index number.
+    #[expect(dead_code)]
+    csg_idx: usize,
+}
+
+impl CsgInterface {
+    /// Creates a new disabled CSG interface.
+    pub(super) fn new(csg_idx: usize) -> Result<Self> {
+        Ok(Self {
+            state: CsgInterfaceState::Disabled,
+            csg_idx,
+        })
+    }
+
+    /// Enables the CSG interface.
+    ///
+    /// This calculates the runtime offset of this CSG's control block and creates
+    /// a bounded interface to access it. It then reads the input/output interface
+    /// addresses from the CSG control block.
+    fn enable(&mut self, shared_section: &Section, csg_idx: usize, csg_stride: usize) -> Result {
+        use csg::control::{
+            GROUP_INPUT_VA,
+            GROUP_OUTPUT_VA,
+            GROUP_STREAM_NUM,
+            GROUP_STREAM_STRIDE, //
+        };
+        use kernel::io::Io;
+
+        let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
+        let va_range = shared_section.mem.va_range();
+
+        // Calculate the runtime offset for this CSG's control block.
+        // The CSG control blocks start at CSG_GROUP_CONTROL_OFFSET from the GLB control block,
+        // with each CSG spaced by csg_stride bytes.
+        let csg_control_offset = CSG_GROUP_CONTROL_OFFSET + csg_idx * csg_stride;
+
+        // The CSG control block's MCU virtual address is relative to the shared section start.
+        let csg_control_va = va_range.start + csg_control_offset as u64;
+
+        // Create a bounded interface for this CSG's control block at the calculated address.
+        let csg_control =
+            FwInterface::<CSG_CONTROL_BLOCK_SIZE>::new(&vmap, &va_range, csg_control_va)?;
+
+        // Read the input and output VAs from the CSG control block.
+        let input_va = csg_control.read(GROUP_INPUT_VA).value().get();
+        let csg_input =
+            FwInterface::<CSG_INPUT_BLOCK_SIZE>::new(&vmap, &va_range, input_va.into())?;
+
+        let output_va = csg_control.read(GROUP_OUTPUT_VA).value().get();
+        let csg_output =
+            FwInterface::<CSG_OUTPUT_BLOCK_SIZE>::new(&vmap, &va_range, output_va.into())?;
+
+        // Read the runtime stride between CS control blocks.
+        let cs_stride = csg_control.read(GROUP_STREAM_STRIDE).value().get() as usize;
+
+        if cs_stride < CS_CONTROL_BLOCK_SIZE {
+            pr_err!(
+                "CS stride {} is smaller than control block size {}\n",
+                cs_stride,
+                CS_CONTROL_BLOCK_SIZE
+            );
+            return Err(EINVAL);
+        }
+
+        // Read how many CS interfaces exist for this CSG.
+        let cs_num = csg_control.read(GROUP_STREAM_NUM).value().get();
+
+        // Validate that the hardware doesn't report more CS than we support.
+        if cs_num as usize > super::MAX_CS {
+            pr_err!(
+                "Too many CS: hardware reports {}, max supported {}\n",
+                cs_num,
+                super::MAX_CS
+            );
+            return Err(EINVAL);
+        }
+
+        let enabled = EnabledCsgInterface {
+            csg_control,
+            csg_input,
+            csg_output,
+            cs_stride,
+            cs_num: cs_num as usize,
+            cs: KVec::with_capacity(cs_num as usize, GFP_KERNEL)?,
+        };
+
+        self.state = CsgInterfaceState::Enabled(enabled);
+        self.init_cs(shared_section, csg_control_offset)?;
+        Ok(())
+    }
+
+    /// Initialize and discover CS interfaces.
+    ///
+    /// This uses the previously read CS count to create and enable each CS interface.
+    fn init_cs(&mut self, shared_section: &Section, csg_control_offset: usize) -> Result {
+        let enabled = match &mut self.state {
+            CsgInterfaceState::Enabled(e) => e,
+            CsgInterfaceState::Disabled => return Err(EINVAL),
+        };
+
+        for cs_idx in 0..enabled.cs_num {
+            // Create and enable the CS interface.
+            let mut cs = CsInterface::new(cs_idx)?;
+            cs.enable(
+                shared_section,
+                csg_control_offset,
+                cs_idx,
+                enabled.cs_stride,
+            )?;
+
+            enabled.cs.push(cs, GFP_KERNEL)?;
+        }
+
+        Ok(())
+    }
+}
+
+/// State of a CS interface.
+enum CsInterfaceState {
+    /// Interface is not yet initialized.
+    Disabled,
+    /// Interface is initialized and operational.
+    #[expect(dead_code)]
+    Enabled(EnabledCsInterface),
+}
+
+/// When enabled, a CS Interface has control, input, and output system memory interfaces.
+struct EnabledCsInterface {
+    /// Control block interface - provides CS capabilities and configuration.
+    #[expect(dead_code)]
+    cs_control: FwInterface<CS_CONTROL_BLOCK_SIZE>,
+    /// Input block interface - driver writes CS requests here.
+    #[expect(dead_code)]
+    cs_input: FwInterface<CS_KERNEL_INPUT_BLOCK_SIZE>,
+    /// Output block interface - firmware writes CS acknowledgements here.
+    #[expect(dead_code)]
+    cs_output: FwInterface<CS_KERNEL_OUTPUT_BLOCK_SIZE>,
+}
+
+/// Command Stream Interface
+///
+/// The CS interface controls operations for a specific CS.
+pub(crate) struct CsInterface {
+    /// Current interface state (Disabled or Enabled).
+    state: CsInterfaceState,
+    /// CS identifier/index number.
+    #[expect(dead_code)]
+    cs_idx: usize,
+}
+
+impl CsInterface {
+    /// Creates a new disabled CS interface.
+    pub(super) fn new(cs_idx: usize) -> Result<Self> {
+        Ok(Self {
+            state: CsInterfaceState::Disabled,
+            cs_idx,
+        })
+    }
+
+    /// Enables the CS interface.
+    ///
+    /// This calculates the runtime offset of this CS's control block and creates
+    /// a bounded interface to access it. It then reads the input/output interface
+    /// addresses from the CS control block.
+    fn enable(
+        &mut self,
+        shared_section: &Section,
+        csg_control_offset: usize,
+        cs_idx: usize,
+        cs_stride: usize,
+    ) -> Result {
+        use cs::control::{
+            STREAM_INPUT_VA,
+            STREAM_OUTPUT_VA, //
+        };
+        use kernel::io::Io;
+
+        let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
+        let va_range = shared_section.mem.va_range();
+
+        // Calculate the runtime offset for this CS's control block.
+        let cs_control_offset = CS_CONTROL_OFFSET + cs_idx * cs_stride;
+
+        // The CS control block's MCU virtual address is the shared section start plus
+        // the CSG and CS control block offsets.
+        let cs_control_va = va_range.start + csg_control_offset as u64 + cs_control_offset as u64;
+
+        // Create a bounded interface for this CS's control block at the calculated address.
+        let cs_control =
+            FwInterface::<CS_CONTROL_BLOCK_SIZE>::new(&vmap, &va_range, cs_control_va)?;
+
+        // Read the input and output VAs from the CS control block.
+        let input_va = cs_control.read(STREAM_INPUT_VA).value().get();
+        let cs_input =
+            FwInterface::<CS_KERNEL_INPUT_BLOCK_SIZE>::new(&vmap, &va_range, input_va.into())?;
+
+        let output_va = cs_control.read(STREAM_OUTPUT_VA).value().get();
+        let cs_output =
+            FwInterface::<CS_KERNEL_OUTPUT_BLOCK_SIZE>::new(&vmap, &va_range, output_va.into())?;
+
+        let enabled = EnabledCsInterface {
+            cs_control,
+            cs_input,
+            cs_output,
+        };
+
+        self.state = CsInterfaceState::Enabled(enabled);
+
+        Ok(())
+    }
+}
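[Editor's note: the nested control-block address arithmetic in `CsgInterface::enable()` and `CsInterface::enable()` above can be isolated into two pure functions. The offset and stride constants below are illustrative placeholders, not the real hardware values:]

```rust
// Hypothetical values standing in for the driver's real constants.
const CSG_GROUP_CONTROL_OFFSET: usize = 0x1000;
const CS_CONTROL_OFFSET: usize = 0x40;

/// MCU VA of CSG `csg_idx`'s control block: CSG blocks start at
/// CSG_GROUP_CONTROL_OFFSET past the GLB control block, spaced csg_stride apart.
fn csg_control_va(shared_start: u64, csg_idx: usize, csg_stride: usize) -> u64 {
    shared_start + (CSG_GROUP_CONTROL_OFFSET + csg_idx * csg_stride) as u64
}

/// MCU VA of CS `cs_idx`'s control block: CS blocks start at CS_CONTROL_OFFSET
/// past their parent CSG control block, spaced cs_stride apart.
fn cs_control_va(
    shared_start: u64,
    csg_idx: usize,
    csg_stride: usize,
    cs_idx: usize,
    cs_stride: usize,
) -> u64 {
    csg_control_va(shared_start, csg_idx, csg_stride)
        + (CS_CONTROL_OFFSET + cs_idx * cs_stride) as u64
}

fn main() {
    let base = 0x0400_0000u64;
    // CSG 0 sits exactly at the group-control offset.
    assert_eq!(csg_control_va(base, 0, 0x100), base + 0x1000);
    // CSG 2 is two strides further in.
    assert_eq!(csg_control_va(base, 2, 0x100), base + 0x1200);
    // CS 3 of CSG 2: parent CSG VA + CS offset + three CS strides.
    assert_eq!(cs_control_va(base, 2, 0x100, 3, 0x80), base + 0x1200 + 0x40 + 0x180);
    println!("ok");
}
```

This is also why both `enable()` paths reject strides smaller than the corresponding control block size: a short stride would make adjacent control blocks overlap.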
diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
index 4ec373e0bcfa..606d446aafd9 100644
--- a/drivers/gpu/drm/tyr/gem.rs
+++ b/drivers/gpu/drm/tyr/gem.rs
@@ -151,6 +151,11 @@ pub(crate) fn new<Ctx: DeviceContext>(
             va_range: va..(va + size),
         })
     }
+
+    /// Returns the GPU virtual address range occupied by this buffer.
+    pub(crate) fn va_range(&self) -> Range<u64> {
+        self.va_range.clone()
+    }
 }
 
 impl Drop for KernelBo {

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (17 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 18/20] drm/tyr: add CSF firmware interface support Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-27  7:42   ` Andreas Hindborg
                     ` (2 more replies)
  2026-04-24 23:39 ` [PATCH v4 20/20] drm/tyr: program CSF global interface Deborah Brouwer
  2026-04-27  8:07 ` [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Boris Brezillon
  20 siblings, 3 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Provide a safe Rust wrapper for arch_timer_get_rate().

The underlying C helper returns 0 when the ARM architectural timer
is not available or not yet initialized. Map this to Option<u32> to
make the absence of a valid rate explicit to Rust callers.

This allows Rust drivers to query the system timer frequency and
select appropriate time sources when programming hardware timeouts.
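
The zero-means-absent mapping can be modeled in isolation; this is a
standalone sketch of the return-value convention, not the kernel binding
itself:

```rust
// Standalone model of the wrapper's return-value mapping: the C helper
// reports "no timer" as 0, which the Rust side surfaces as None.
fn rate_to_option(raw: u32) -> Option<u32> {
    if raw == 0 {
        None
    } else {
        Some(raw)
    }
}

fn main() {
    assert_eq!(rate_to_option(0), None);
    assert_eq!(rate_to_option(24_000_000), Some(24_000_000));
}
```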

Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 rust/kernel/time.rs | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/rust/kernel/time.rs b/rust/kernel/time.rs
index 6ea98dfcd027..03ce96450fc8 100644
--- a/rust/kernel/time.rs
+++ b/rust/kernel/time.rs
@@ -359,6 +359,35 @@ fn div(self, rhs: Self) -> Self::Output {
     }
 }
 
+/// Returns the ARM architecture timer frequency in Hz, if available.
+///
+/// This function queries the system-wide ARM architecture timer frequency.
+/// The architecture timer provides a consistent time source across all CPU cores.
+///
+/// Returns `None` if:
+/// - The ARM architecture timer is not available (`CONFIG_ARM_ARCH_TIMER` not enabled)
+/// - The timer rate is zero (not initialized)
+///
+/// # Examples
+///
+/// ```
+/// use kernel::time::arch_timer_get_rate;
+///
+/// if let Some(rate) = arch_timer_get_rate() {
+///     // Use `rate`.
+/// }
+/// ```
+pub fn arch_timer_get_rate() -> Option<u32> {
+    // SAFETY: The C API is available in all configs; when CONFIG_ARM_ARCH_TIMER
+    // is disabled, the header provides a stub returning 0.
+    let rate = unsafe { bindings::arch_timer_get_rate() };
+    if rate == 0 {
+        None
+    } else {
+        Some(rate)
+    }
+}
+
 impl Delta {
     /// A span of time equal to zero.
     pub const ZERO: Self = Self { nanos: 0 };

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 20/20] drm/tyr: program CSF global interface
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (18 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper Deborah Brouwer
@ 2026-04-24 23:39 ` Deborah Brouwer
  2026-04-27  8:07 ` [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Boris Brezillon
  20 siblings, 0 replies; 29+ messages in thread
From: Deborah Brouwer @ 2026-04-24 23:39 UTC (permalink / raw)
  To: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

Initialize the CSF global (GLB) interface after firmware boot.

Program the GLB input block with initial configuration:
- enable allocation across all present shader cores
- set power-off, progress, and idle timers

Then update GLB_REQ to enable persistent features and trigger
configuration updates, and ring the global doorbell to notify the MCU.

After ringing the doorbell, wait for the firmware to acknowledge the
configuration requests before proceeding.
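
The toggle-style request bits follow a simple rule: a request is pending
while a masked REQ bit differs from the corresponding ACK bit, and the
firmware acknowledges by copying the bit into ACK. A standalone sketch of
that bit manipulation (a model of the protocol, not the driver code):

```rust
// Model of the GLB_REQ/GLB_ACK toggle handshake: to raise a request,
// flip the masked bits relative to ACK while preserving all other bits.
fn toggle_requests(req: u32, ack: u32, mask: u32) -> u32 {
    let toggled = (ack ^ mask) & mask; // masked bits now differ from ACK
    let preserved = req & !mask;       // unmasked bits left untouched
    toggled | preserved
}

fn main() {
    // Raising a request makes REQ differ from ACK on the masked bits.
    let new_req = toggle_requests(0b0000, 0b0000, 0b0101);
    assert_eq!(new_req, 0b0101);
    assert_ne!(new_req & 0b0101, 0b0000 & 0b0101); // pending

    // Once the firmware copies the bits into ACK, the request is complete.
    assert_eq!(new_req & 0b0101, 0b0101 & 0b0101);
}
```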

Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
---
 drivers/gpu/drm/tyr/driver.rs        |   2 +-
 drivers/gpu/drm/tyr/fw.rs            |  12 +-
 drivers/gpu/drm/tyr/fw/interfaces.rs | 246 ++++++++++++++++++++++++++++++++++-
 3 files changed, 253 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 20ae114a4180..67a5289dd346 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -192,7 +192,7 @@ fn probe(
         firmware
             .wait_ready(1000)
             .inspect_err(|_| pr_err!("Timed out waiting for firmware to be ready.\n"))?;
-        firmware.enable_global_interface()?;
+        firmware.enable_global_interface(&gpu_info, &core_clk)?;
 
         let data = try_pin_init!(TyrDrmDeviceData {
                 pdev: platform.clone(),
diff --git a/drivers/gpu/drm/tyr/fw.rs b/drivers/gpu/drm/tyr/fw.rs
index 598e399a58ae..5fe6f47c5d2e 100644
--- a/drivers/gpu/drm/tyr/fw.rs
+++ b/drivers/gpu/drm/tyr/fw.rs
@@ -20,6 +20,7 @@
 
 use kernel::{
     bits::genmask_u32,
+    clk::Clk,
     devres::Devres,
     drm::{
         gem::BaseObject,
@@ -337,8 +338,15 @@ pub(crate) fn wait_ready(&self, timeout_ms: u32) -> Result {
     }
 
     /// Enable the global interface.
-    pub(crate) fn enable_global_interface(&self) -> Result {
+    pub(crate) fn enable_global_interface(&self, gpu_info: &GpuInfo, core_clk: &Clk) -> Result {
         let shared_section = self.shared_section()?;
-        self.global_iface.lock().enable(shared_section)
+        self.global_iface.lock().enable(
+            &self.pdev,
+            &self.iomem,
+            shared_section,
+            gpu_info,
+            core_clk,
+            &self.ready_wait,
+        )
     }
 }
diff --git a/drivers/gpu/drm/tyr/fw/interfaces.rs b/drivers/gpu/drm/tyr/fw/interfaces.rs
index 07cdb1c76a3f..efea0785b3bd 100644
--- a/drivers/gpu/drm/tyr/fw/interfaces.rs
+++ b/drivers/gpu/drm/tyr/fw/interfaces.rs
@@ -39,11 +39,29 @@
 //! ```
 //!
 
-use crate::fw::Section;
+use crate::{
+    driver::IoMem,
+    fw::Section,
+    gpu::GpuInfo,
+    regs::doorbell_block::DOORBELL,
+    wait::{
+        Wait,
+        WaitResult, //
+    }, //
+};
 use iface::FwInterface;
 use kernel::{
-    io::Io,
-    prelude::*, //
+    bindings::SZ_1K,
+    clk::Clk,
+    devres::Devres,
+    io::{
+        register::Array,
+        Io, //
+    },
+    num::Bounded,
+    platform,
+    prelude::*,
+    time::arch_timer_get_rate, //
 };
 
 /// Offset from GLB_CONTROL_BLOCK start to the first GROUP_CONTROL block.
@@ -1616,9 +1634,94 @@ pub(super) mod output {
 use csg::*;
 use glb::{
     control::*,
+    input::*,
+    output::GLB_ACK,
     *, //
 };
 
+/// Converts a timeout in microseconds to a timeout field value and timer source.
+///
+/// The firmware supports two timer sources:
+/// - System timestamp (arch timer): preferred when available, so the timeout
+///   tracks real elapsed time independently of GPU clock rate.
+/// - GPU cycle counter: fallback when the system timestamp is unavailable.
+///
+/// Returns the encoded timeout value and the selected timer source.
+fn conv_timeout(core_clk: &Clk, timeout_us: u32) -> Result<(u32, TimestampSource)> {
+    // The max timeout is determined by the 31 bit size of the timeout field.
+    let max_timeout = (1u32 << 31) - 1;
+    let core_rate = core_clk.rate().as_hz() as u64;
+
+    let (timer_rate, timer_source) = match arch_timer_get_rate() {
+        Some(rate) => (u64::from(rate), TimestampSource::SystemTimestamp),
+        _ if core_rate != 0 => (core_rate, TimestampSource::GpuCounter),
+        _ => return Err(EINVAL),
+    };
+
+    let timeout_in_cycles = u64::from(timeout_us) * timer_rate;
+
+    // The hardware stores the represented timeout value with a shr(10) to save space.
+    let timeout_shift = u64::from(SZ_1K);
+    let us_per_second = 1_000_000u64;
+
+    let timeout_val = timeout_in_cycles.div_ceil(us_per_second * timeout_shift);
+    let timeout_val = timeout_val.min(u64::from(max_timeout)) as u32;
+
+    Ok((timeout_val, timer_source))
+}
+
+/// Request/acknowledge communication between Tyr and CSF.
+struct GlobalInterfaceRequests<'a> {
+    /// Global input block where driver writes requests.
+    input: &'a FwInterface<GLB_INPUT_BLOCK_SIZE>,
+    /// Global output block where firmware writes acknowledgements.
+    output: &'a FwInterface<GLB_OUTPUT_BLOCK_SIZE>,
+}
+
+impl<'a> GlobalInterfaceRequests<'a> {
+    fn new(
+        input: &'a FwInterface<GLB_INPUT_BLOCK_SIZE>,
+        output: &'a FwInterface<GLB_OUTPUT_BLOCK_SIZE>,
+    ) -> Self {
+        Self { input, output }
+    }
+
+    /// Waits for the firmware to acknowledge the given request bits.
+    ///
+    /// The ack condition is `(GLB_ACK & mask) == (GLB_REQ & mask)`.
+    fn wait_acks(&self, reqs_mask: GLB_REQ, event_wait: &Wait, timeout_ms: u32) -> Result {
+        let mask = reqs_mask.into_raw();
+
+        event_wait.wait_interruptible_timeout(timeout_ms, || {
+            let req = self.input.read(GLB_REQ).into_raw() & mask;
+            let ack = self.output.read(GLB_ACK).into_raw() & mask;
+            if req == ack {
+                Ok(WaitResult::Done)
+            } else {
+                Ok(WaitResult::Retry)
+            }
+        })
+    }
+
+    /// Makes toggle-style requests, where flipping the bit relative to the
+    /// corresponding ACK bit is the request; the bit value has no meaning in itself.
+    fn toggle_requests(&self, reqs_mask: GLB_REQ) -> Result {
+        let reqs_mask_val = reqs_mask.into_raw();
+
+        let cur_ack_val = self.output.read(GLB_ACK).into_raw();
+
+        // Calculate which bits to toggle based on ACK state
+        let toggled_bits = (cur_ack_val ^ reqs_mask_val) & reqs_mask_val;
+
+        let cur_req_val = self.input.read(GLB_REQ).into_raw();
+        let preserved_bits = cur_req_val & !reqs_mask_val;
+        let new_val = toggled_bits | preserved_bits;
+
+        self.input.write(GLB_REQ, GLB_REQ::from_raw(new_val));
+        Ok(())
+    }
+}
+
 /// State of the global interface.
 enum GlobalInterfaceState {
     /// Interface is not yet initialized.
@@ -1667,7 +1770,15 @@ pub(super) fn new() -> Result<Self> {
     /// This reads the firmware's control block to set up the global input/output
     /// interfaces; it configures timers and shader core allocation; and it discovers
     /// available CSG interfaces.
-    pub(crate) fn enable(&mut self, shared_section: &Section) -> Result {
+    pub(crate) fn enable(
+        &mut self,
+        pdev: &platform::Device,
+        iomem: &Devres<IoMem>,
+        shared_section: &Section,
+        gpu_info: &GpuInfo,
+        core_clk: &Clk,
+        event_wait: &Wait,
+    ) -> Result {
         let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
         let va_range = shared_section.mem.va_range();
 
@@ -1700,6 +1811,24 @@ pub(crate) fn enable(&mut self, shared_section: &Section) -> Result {
             output_va.value().get().into(),
         )?;
 
+        Self::configure_glb_input(&glb_input, gpu_info, core_clk)?;
+        let ack_mask = Self::configure_glb_requests(&glb_input, &glb_output)?;
+
+        // Ring the global doorbell to notify the MCU.
+        // SAFETY: Called during probe after the device has been successfully bound,
+        // so it is valid to access it as a bound device.
+        let dev = unsafe { pdev.as_ref().as_bound() };
+        let io = iomem.access(dev)?;
+        io.write(Array::at(0), DOORBELL::zeroed().with_ring(true));
+
+        // Wait for the firmware to acknowledge the initial global configuration.
+        let request_field = GlobalInterfaceRequests::new(&glb_input, &glb_output);
+
+        if let Err(e) = request_field.wait_acks(ack_mask, event_wait, 1000) {
+            pr_err!("CSF firmware failed to ACK initial GLB config\n");
+            return Err(e);
+        }
+
         // Read how many CSG interfaces exist.
         let csg_num = glb_control.read(GLB_GROUP_NUM).value().get();
 
@@ -1739,6 +1868,115 @@ pub(crate) fn enable(&mut self, shared_section: &Section) -> Result {
         Ok(())
     }
 
+    /// Programs GLB input-block configuration registers.
+    ///
+    /// Writes shader core allocation and timer values. These settings are applied
+    /// by firmware only after the corresponding GLB_REQ bits are updated.
+    fn configure_glb_input(
+        glb_input: &FwInterface<GLB_INPUT_BLOCK_SIZE>,
+        gpu_info: &GpuInfo,
+        core_clk: &Clk,
+    ) -> Result {
+        // Make all present shader cores available for endpoint allocation.
+        glb_input.write(
+            GLB_ALLOC_EN,
+            GLB_ALLOC_EN::zeroed().with_mask(gpu_info.shader_present),
+        );
+
+        // Configure power-down delay for shader and tiler domains.
+        // The firmware powers down a domain after it has been idle for this duration,
+        // and cancels the timeout if work arrives before expiry.
+
+        // Power-down delay after idle, in microseconds.
+        const PWROFF_HYSTERESIS_US: u32 = 10_000;
+        let (pwroff_timeout, pwroff_source) = conv_timeout(core_clk, PWROFF_HYSTERESIS_US)?;
+        let pwroff_timeout = Bounded::<u32, 31>::try_new(pwroff_timeout).ok_or(EINVAL)?;
+        glb_input.write(
+            GLB_PWROFF_TIMER,
+            GLB_PWROFF_TIMER::zeroed()
+                .with_timeout(pwroff_timeout)
+                .with_timer_source(pwroff_source),
+        );
+
+        // Configure forward progress timeout.
+        //
+        // Keep this aligned with panthor, which programs a fixed GPU-cycle timeout.
+        // The real-time duration therefore varies with the GPU clock rate (e.g. ~5.24 s
+        // at 500 MHz, longer at lower frequencies).
+        //
+        // The hardware stores the timeout in units of 1024 cycles, so encode the raw
+        // cycle count by shifting right by 10.
+        const PROGRESS_TIMEOUT_CYCLES: u32 = 5 * 500 * 1024 * 1024;
+        const PROGRESS_TIMEOUT_SCALE_SHIFT: u32 = 10;
+        let progress_timeout = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
+        glb_input.write(
+            GLB_PROGRESS_TIMER,
+            GLB_PROGRESS_TIMER::zeroed().with_timeout(progress_timeout),
+        );
+
+        // Configure the delay before reporting the GPU as idle.
+        const IDLE_HYSTERESIS_US: u32 = 800;
+        let (idle_timeout, idle_source) = conv_timeout(core_clk, IDLE_HYSTERESIS_US)?;
+        let idle_timeout = Bounded::<u32, 31>::try_new(idle_timeout).ok_or(EINVAL)?;
+        glb_input.write(
+            GLB_IDLE_TIMER,
+            GLB_IDLE_TIMER::zeroed()
+                .with_timeout(idle_timeout)
+                .with_timer_source(idle_source),
+        );
+
+        Ok(())
+    }
+
+    /// Programs GLB_REQ and ACK IRQ mask after GLB input registers are configured.
+    ///
+    /// This sets desired persistent states, toggles configuration-update requests,
+    /// and returns the GLB_REQ bits that must be acknowledged by firmware.
+    fn configure_glb_requests(
+        glb_input: &FwInterface<GLB_INPUT_BLOCK_SIZE>,
+        glb_output: &FwInterface<GLB_OUTPUT_BLOCK_SIZE>,
+    ) -> Result<GLB_REQ> {
+        // Firmware updates GLB_ACK (output block) in response to GLB_REQ.
+        // GLB_ACK_IRQ_MASK selects which of these updates trigger a host interrupt.
+        glb_input.write(
+            GLB_ACK_IRQ_MASK,
+            GLB_ACK_IRQ_MASK::zeroed()
+                .with_cfg_progress_timer(true)
+                .with_cfg_alloc_en(true)
+                .with_cfg_pwroff_timer(true)
+                .with_idle_enable(true)
+                .with_idle_event(true)
+                .with_counter_enable(true),
+        );
+
+        // Requests whose value represents the desired persistent state.
+        let cur_req = glb_input.read(GLB_REQ);
+        glb_input.write(
+            GLB_REQ,
+            cur_req.with_idle_enable(true).with_counter_enable(true),
+        );
+
+        let request_field = GlobalInterfaceRequests::new(glb_input, glb_output);
+
+        // Fields that require toggle semantics.
+        let toggle_mask = GLB_REQ::zeroed()
+            .with_cfg_progress_timer(true)
+            .with_cfg_alloc_en(true)
+            .with_cfg_pwroff_timer(true);
+
+        request_field.toggle_requests(toggle_mask)?;
+
+        // All fields we want to wait for completion on (REQ == ACK).
+        let ack_mask = GLB_REQ::zeroed()
+            .with_cfg_progress_timer(true)
+            .with_cfg_alloc_en(true)
+            .with_cfg_pwroff_timer(true)
+            .with_idle_enable(true)
+            .with_counter_enable(true);
+
+        Ok(ack_mask)
+    }
+
     /// Initialize CSG interfaces.
     ///
     /// This uses the previously read CSG count to create and enable each CSG interface.

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 02/20] drm/tyr: select required dependencies in Kconfig
  2026-04-24 23:38 ` [PATCH v4 02/20] drm/tyr: select required dependencies in Kconfig Deborah Brouwer
@ 2026-04-27  7:23   ` Boris Brezillon
  0 siblings, 0 replies; 29+ messages in thread
From: Boris Brezillon @ 2026-04-27  7:23 UTC (permalink / raw)
  To: Deborah Brouwer
  Cc: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, beata.michalska, lyude, acourbot,
	work, alvin.sun

On Fri, 24 Apr 2026 16:38:56 -0700
Deborah Brouwer <deborah.brouwer@collabora.com> wrote:

> From: Boris Brezillon <boris.brezillon@collabora.com>
> 
> Tyr depends on DRM_GPUVM, RUST_DRM_GEM_SHMEM_HELPER, MMU, IOMMU_SUPPORT,
> and IOMMU_IO_PGTABLE_LPAE. Select or depend on these symbols in Kconfig so
> the required infrastructure is enabled when Tyr is built.
> 
> Introduce DRM_TYR_STATIC_DEPS to keep the built-in DRM dependencies
> selected even when Tyr is built as a module.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> ---
>  drivers/gpu/drm/tyr/Kconfig | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/tyr/Kconfig b/drivers/gpu/drm/tyr/Kconfig
> index e933e6478027..443ce988b570 100644
> --- a/drivers/gpu/drm/tyr/Kconfig
> +++ b/drivers/gpu/drm/tyr/Kconfig
> @@ -1,5 +1,12 @@
>  # SPDX-License-Identifier: GPL-2.0 or MIT
>  
> +config DRM_TYR_STATIC_DEPS
> +	bool
> +	select DRM_GPUVM

IIRC, Danilo said we should have some boolean RUST_DRM_GPUVM option
selecting DRM_GPUVM for us, just like RUST_DRM_GEM_SHMEM_HELPER does.
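
Something along these lines, perhaps (hypothetical symbol name, mirroring
the RUST_DRM_GEM_SHMEM_HELPER pattern):

```kconfig
config RUST_DRM_GPUVM
	bool
	select DRM_GPUVM
```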

> +	help
> +	  Ensure required DRM infrastructure is built-in when enabling Tyr
> +	  even if Tyr is =m
> +
>  config DRM_TYR
>  	tristate "Tyr (Rust DRM support for ARM Mali CSF-based GPUs)"
>  	depends on DRM=y
> @@ -7,6 +14,11 @@ config DRM_TYR
>  	depends on ARM || ARM64 || COMPILE_TEST
>  	depends on !GENERIC_ATOMIC64  # for IOMMU_IO_PGTABLE_LPAE
>  	depends on COMMON_CLK
> +	depends on MMU
> +	select DRM_TYR_STATIC_DEPS
> +	select IOMMU_IO_PGTABLE_LPAE
> +	select RUST_DRM_GEM_SHMEM_HELPER
> +	depends on IOMMU_SUPPORT
>  	default n
>  	help
>  	  Rust DRM driver for ARM Mali CSF-based GPUs.
> @@ -16,5 +28,5 @@ config DRM_TYR
>  	  Note that the Mali-G68 and Mali-G78, while Valhall architecture, will
>  	  be supported with the panfrost driver as they are not CSF GPUs.
>  
> -	  if M is selected, the module will be called tyr. This driver is work
> +	  If M is selected, the module will be called tyr. This driver is work
>  	  in progress and may not be functional.
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper
  2026-04-24 23:39 ` [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper Deborah Brouwer
@ 2026-04-27  7:42   ` Andreas Hindborg
  2026-04-27  7:53   ` Alice Ryhl
  2026-04-27  8:59   ` Onur Özkan
  2 siblings, 0 replies; 29+ messages in thread
From: Andreas Hindborg @ 2026-04-27  7:42 UTC (permalink / raw)
  To: Deborah Brouwer, Daniel Almeida, Alice Ryhl, Danilo Krummrich,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda,
	Boqun Feng, Björn Roy Baron, Trevor Gross, FUJITA Tomonori,
	Frederic Weisbecker, Thomas Gleixner, Anna-Maria Behnsen,
	John Stultz, Stephen Boyd
  Cc: dri-devel, linux-kernel, rust-for-linux, boris.brezillon,
	beata.michalska, lyude, acourbot, work, alvin.sun,
	Deborah Brouwer

"Deborah Brouwer" <deborah.brouwer@collabora.com> writes:

> Provide a safe Rust wrapper for arch_timer_get_rate().
>
> The underlying C helper returns 0 when the ARM architectural timer
> is not available or not yet initialized. Map this to Option<u32> to
> make the absence of a valid rate explicit to Rust callers.
>
> This allows Rust drivers to query the system timer frequency and
> select appropriate time sources when programming hardware timeouts.
>
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>

Looks good to me, but it is weird that this is defined for non-arm targets.

Even if this is available for all architectures in C code, would it make
sense to gate it on the target being Arm in Rust?


Best regards,
Andreas Hindborg


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper
  2026-04-24 23:39 ` [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper Deborah Brouwer
  2026-04-27  7:42   ` Andreas Hindborg
@ 2026-04-27  7:53   ` Alice Ryhl
  2026-04-27  8:59   ` Onur Özkan
  2 siblings, 0 replies; 29+ messages in thread
From: Alice Ryhl @ 2026-04-27  7:53 UTC (permalink / raw)
  To: Deborah Brouwer
  Cc: Daniel Almeida, Danilo Krummrich, David Airlie, Simona Vetter,
	Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, boris.brezillon, beata.michalska,
	lyude, acourbot, work, alvin.sun

On Sat, Apr 25, 2026 at 1:39 AM Deborah Brouwer
<deborah.brouwer@collabora.com> wrote:
>
> Provide a safe Rust wrapper for arch_timer_get_rate().
>
> The underlying C helper returns 0 when the ARM architectural timer
> is not available or not yet initialized. Map this to Option<u32> to
> make the absence of a valid rate explicit to Rust callers.
>
> This allows Rust drivers to query the system timer frequency and
> select appropriate time sources when programming hardware timeouts.
>
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> ---
>  rust/kernel/time.rs | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
>
> diff --git a/rust/kernel/time.rs b/rust/kernel/time.rs
> index 6ea98dfcd027..03ce96450fc8 100644
> --- a/rust/kernel/time.rs
> +++ b/rust/kernel/time.rs
> @@ -359,6 +359,35 @@ fn div(self, rhs: Self) -> Self::Output {
>      }
>  }
>
> +/// Returns the ARM architecture timer frequency in Hz, if available.
> +///
> +/// This function queries the system-wide ARM architecture timer frequency.
> +/// The architecture timer provides a consistent time source across all CPU cores.
> +///
> +/// Returns `None` if:
> +/// - The ARM architecture timer is not available (`CONFIG_ARM_ARCH_TIMER` not enabled)
> +/// - The timer rate is zero (not initialized)
> +///
> +/// # Examples
> +///
> +/// ```
> +/// use kernel::time::arch_timer_get_rate;
> +///
> +/// if let Some(rate) = arch_timer_get_rate() {
> +///     // Use `rate`.
> +/// }
> +/// ```
> +pub fn arch_timer_get_rate() -> Option<u32> {
> +    // SAFETY: The C API is available in all configs; when CONFIG_ARM_ARCH_TIMER
> +    // is disabled, the header provides a stub returning 0.

The stub is inline, so to call it you must define a helper for this
function. This code will not compile if CONFIG_ARM_ARCH_TIMER is not
set.
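
I.e. something like the usual non-inline shim in rust/helpers/ (a sketch;
exact file and header placement may differ):

```c
/* Non-inline wrapper so bindgen/Rust can link against the static inline
 * C function (or its CONFIG_ARM_ARCH_TIMER=n stub). */
#include <clocksource/arm_arch_timer.h>

u32 rust_helper_arch_timer_get_rate(void)
{
	return arch_timer_get_rate();
}
```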

Alice

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support
  2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
                   ` (19 preceding siblings ...)
  2026-04-24 23:39 ` [PATCH v4 20/20] drm/tyr: program CSF global interface Deborah Brouwer
@ 2026-04-27  8:07 ` Boris Brezillon
  20 siblings, 0 replies; 29+ messages in thread
From: Boris Brezillon @ 2026-04-27  8:07 UTC (permalink / raw)
  To: Deborah Brouwer
  Cc: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, beata.michalska, lyude, acourbot,
	work, alvin.sun

On Fri, 24 Apr 2026 16:38:54 -0700
Deborah Brouwer <deborah.brouwer@collabora.com> wrote:

> This series adds firmware loading and MCU boot support to the Tyr DRM
> driver. It includes:
>  - A parser for the Mali CSF firmware binary format
>  - A kernel-managed BO type (KernelBo) for internal driver allocations
>  - GPU virtual memory (VM) integration using drm_gpuvm
>  - An MMU module and a generic slot manager
>  - Shmem-backed GEM support for Tyr
>  - Loading firmware, VM activation, and MCU boot at probe()
>  - Initialization of Command Stream Frontend (CSF) firmware interfaces
> 
> Dependencies:
>  - [PATCH v12 0/5] Rust bindings for gem shmem
>     https://lore.kernel.org/rust-for-linux/20260421235346.672794-1-lyude@redhat.com
> 
>  - [PATCH v6 0/5] Rust GPUVM immediate mode
>     https://lore.kernel.org/rust-for-linux/20260409-gpuvm-rust-v6-0-b16e6ada7261@google.com/
> 
>  - [PATCH v6 0/5] Introduce DeviceContext
>     https://lore.kernel.org/rust-for-linux/20260320233645.950190-1-lyude@redhat.com/
> 
>  - [PATCH v5 0/6] drm/tyr: Use register! macro
>     https://lore.kernel.org/rust-for-linux/20260409-b4-tyr-use-register-macro-v5-v5-0-8abfff8a0204@collabora.com/
> 
> Other Prerequisites:
>  This series also depends on additional prerequisite fixes not included in
>  this posting. The full stack (base + prerequisites + this series) is
>  available here:
>   https://gitlab.freedesktop.org/dbrouwer/linux/-/tree/dbrouwer/fw-boot
>     
>   Development history / discussion:
>    https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/56
>     
> 
> ---
> Changes in v4:
>  New commits:
>   - drm/tyr: program CSF global interface
>   - rust: time: add arch_timer_get_rate wrapper
>   - drm/tyr: add CSF firmware interface support
>   - drm/tyr: validate presence of CSF shared section
>   - drm/tyr: wait for global interface readiness
>   - drm/tyr: add Job IRQ handling
>   - drm/tyr: add Wait type for GPU events
> 
>  The existing commits from v3 remain unchanged.
>  - Link to v3: https://lore.kernel.org/r/20260413-b4-fw-boot-v3-v3-0-b422f3c03885@collabora.com
>     
> Changes in v3:
>  New commits:
>   - drm/tyr: remove unused device from platform data
>   - drm/tyr: use shmem GEM object type in TyrDrmDriver
>     
>  drm/tyr: select required dependencies in Kconfig
>   - Rename commit since the dependencies are not limited to DRM.
>   - Select new RUST_DRM_GEM_SHMEM_HELPER instead of DRM_GEM_SHMEM_HELPER.
>     
>  drm/tyr: set DMA mask using GPU physical address
>   - Use register macro to read pa_bits instead of separate helper function.
>     
>  drm/tyr: add MMU module
>   - Switch MMU code to typed register APIs (TRANSCFG, MEMATTR, STATUS, LOCKADDR, etc.).
>   - Use MmuCommand enum for MMU commands instead of raw constants.
>   - Minor cleanups and renaming (MAX_AS, AS_PRESENT handling).
>     
>  drm/tyr: add GPU virtual memory module
>   - Extract VA/PA bits via typed MMU_FEATURES register.
>   - Update the VM code to match the new GPUVM v6 and shmem GEM v10 APIs.
>     
>  drm/tyr: add a kernel buffer object
>   - Reject zero-sized KernelBo allocations up front.
>     
>  drm/tyr: add firmware loading and MCU boot support
>   - Use typed GPU control registers.
>   - Pass iomem by Arc into Firmware::new() since we store it eventually.
>     
>   - Link to v2: https://lore.kernel.org/rust-for-linux/20260302232500.244489-1-deborah.brouwer@collabora.com/
>     
> Changes in v2:
>  - The whole series is rebased on drm-rust-next including v7.0-rc1.
>  - Each patch has its own changelog.
>     
>  Link to v1: https://lore.kernel.org/rust-for-linux/20260212013713.304343-1-deborah.brouwer@collabora.com/
> 
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> 
> ---
> Alvin Sun (1):
>       drm/tyr: use shmem GEM object type in TyrDrmDriver
> 
> Beata Michalska (1):
>       drm/tyr: set DMA mask using GPU physical address
> 
> Boris Brezillon (5):
>       drm/tyr: select required dependencies in Kconfig
>       drm/tyr: rename TyrObject to BoData
>       drm/tyr: Add generic slot manager
>       drm/tyr: add MMU module
>       drm/tyr: add GPU virtual memory module
> 
> Daniel Almeida (1):
>       drm/tyr: add parser for firmware binary
> 
> Deborah Brouwer (12):
>       drm/tyr: remove unused device from platform data
>       drm/tyr: move clock cleanup into Clocks Drop impl
>       drm/tyr: add shmem backing for GEM objects
>       drm/tyr: add a kernel buffer object
>       drm/tyr: add firmware loading and MCU boot support
>       drm/tyr: add Wait type for GPU events
>       drm/tyr: add Job IRQ handling
>       drm/tyr: wait for global interface readiness
>       drm/tyr: validate presence of CSF shared section
>       drm/tyr: add CSF firmware interface support
>       rust: time: add arch_timer_get_rate wrapper
>       drm/tyr: program CSF global interface

This series starts to be quite big, and it seems new features have been
added to v4 (interactions with the FW). I'd recommend that we extract
the uncontroversial bits (I'd say patch 1-2, 4-7) or have them applied
to drm-rust-next right away. I know it's tempting to add features
between revisions, but the more you do that the longer it will take to
get the foundation merged.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/20] drm/tyr: add parser for firmware binary
  2026-04-24 23:39 ` [PATCH v4 12/20] drm/tyr: add parser for firmware binary Deborah Brouwer
@ 2026-04-27  8:09   ` Onur Özkan
  2026-04-27  8:20     ` Boris Brezillon
  0 siblings, 1 reply; 29+ messages in thread
From: Onur Özkan @ 2026-04-27  8:09 UTC (permalink / raw)
  To: Deborah Brouwer
  Cc: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, boris.brezillon, beata.michalska,
	lyude, acourbot, alvin.sun

On Fri, 24 Apr 2026 16:39:06 -0700
Deborah Brouwer <deborah.brouwer@collabora.com> wrote:

> From: Daniel Almeida <daniel.almeida@collabora.com>
> 
> Add a parser for the Mali CSF GPU firmware binary format. The firmware
> consists of a header followed by entries describing how to load firmware
> sections into the MCU's memory.
> 
> The parser extracts section metadata including virtual address ranges,
> data byte offsets within the binary, and section flags controlling
> permissions and cache modes. It validates the basic firmware structure
> and alignment and ignores protected-mode sections for now.
> 
> Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
> Co-developed-by: Beata Michalska <beata.michalska@arm.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Co-developed-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> ---
>  drivers/gpu/drm/tyr/fw/parser.rs | 519 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 519 insertions(+)
> 
> diff --git a/drivers/gpu/drm/tyr/fw/parser.rs b/drivers/gpu/drm/tyr/fw/parser.rs
> new file mode 100644
> index 000000000000..638707430701
> --- /dev/null
> +++ b/drivers/gpu/drm/tyr/fw/parser.rs
> @@ -0,0 +1,519 @@
> +// SPDX-License-Identifier: GPL-2.0 or MIT
> +
> +//! Firmware binary parser for Mali CSF (Command Stream Frontend) GPU.
> +//!
> +//! This module implements a parser for the Mali GPU firmware binary format. The firmware
> +//! file contains a header followed by a sequence of entries, each describing how to load
> +//! firmware sections into the MCU (Microcontroller Unit) memory. The parser extracts section metadata including:
> +//! - Virtual address ranges where sections should be mapped
> +//! - Data ranges (byte offsets) within the firmware binary
> +//! - Section flags (permissions, cache modes)
> +
> +use core::{
> +    mem::size_of,
> +    ops::Range, //
> +};
> +
> +use kernel::{
> +    bits::bit_u32,
> +    prelude::*,
> +    str::CString, //
> +};
> +
> +use crate::{
> +    fw::{
> +        SectionFlag,
> +        SectionFlags,
> +        CSF_MCU_SHARED_REGION_START, //
> +    },
> +    vm::{
> +        VmFlag,
> +        VmMapFlags, //
> +    }, //
> +};
> +
> +/// A parsed firmware section ready for loading into MCU memory.
> +///
> +/// Represents a single firmware section extracted from the firmware binary, containing
> +/// all information needed to map the section's data into the MCU's virtual address space.
> +pub(super) struct ParsedSection {
> +    /// Byte offset range within the firmware binary where this section's data resides.
> +    pub(super) data_range: Range<u32>,
> +    /// MCU virtual address range where this section should be mapped.
> +    pub(super) va: Range<u32>,
> +    /// Memory protection and caching flags for the mapping.
> +    pub(super) vm_map_flags: VmMapFlags,
> +}
> +
> +/// A bare-bones `std::io::Cursor<[u8]>` clone to keep track of the current position in the firmware binary.
> +///
> +/// Provides methods to sequentially read primitive types and byte arrays from the firmware
> +/// binary while maintaining the current read position.
> +struct Cursor<'a> {
> +    data: &'a [u8],
> +    pos: usize,
> +}
> +
> +impl<'a> Cursor<'a> {
> +    fn new(data: &'a [u8]) -> Self {
> +        Self { data, pos: 0 }
> +    }
> +
> +    fn len(&self) -> usize {
> +        self.data.len()
> +    }
> +
> +    fn pos(&self) -> usize {
> +        self.pos
> +    }
> +
> +    /// Returns a view into the cursor's data.
> +    ///
> +    /// This spawns a new cursor, leaving the current cursor unchanged.
> +    fn view(&self, range: Range<usize>) -> Result<Cursor<'_>> {
> +        if range.start < self.pos || range.end > self.data.len() {
> +            pr_err!(
> +                "Invalid cursor range {:?} for data of length {}",
> +                range,
> +                self.data.len()
> +            );
> +
> +            Err(EINVAL)
> +        } else {
> +            Ok(Self {
> +                data: &self.data[range],
> +                pos: 0,
> +            })
> +        }
> +    }
> +
> +    /// Reads a slice of bytes from the current position and advances the cursor.
> +    ///
> +    /// Returns an error if the read would exceed the data bounds.
> +    fn read(&mut self, nbytes: usize) -> Result<&[u8]> {
> +        let start = self.pos;
> +        let end = start + nbytes;
> +
> +        if end > self.data.len() {
> +            pr_err!(
> +                "Invalid firmware file: read of size {} at position {} is out of bounds",
> +                nbytes,
> +                start,
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        self.pos += nbytes;
> +        Ok(&self.data[start..end])
> +    }
> +
> +    /// Reads a `u8` from the current position and advances the cursor.
> +    fn read_u8(&mut self) -> Result<u8> {
> +        let bytes = self.read(size_of::<u8>())?;
> +        Ok(bytes[0])
> +    }
> +
> +    /// Reads a little-endian `u16` from the current position and advances the cursor.
> +    fn read_u16(&mut self) -> Result<u16> {
> +        let bytes = self.read(size_of::<u16>())?;
> +        Ok(u16::from_le_bytes(bytes.try_into().unwrap()))
> +    }
> +
> +    /// Reads a little-endian `u32` from the current position and advances the cursor.
> +    fn read_u32(&mut self) -> Result<u32> {
> +        let bytes = self.read(size_of::<u32>())?;
> +        Ok(u32::from_le_bytes(bytes.try_into().unwrap()))
> +    }
> +
> +    /// Advances the cursor position by the specified number of bytes.
> +    ///
> +    /// Returns an error if the advance would exceed the data bounds.
> +    fn advance(&mut self, nbytes: usize) -> Result {
> +        if self.pos + nbytes > self.data.len() {
> +            pr_err!(
> +                "Invalid firmware file: advance of size {} at position {} is out of bounds",
> +                nbytes,
> +                self.pos,
> +            );
> +            return Err(EINVAL);
> +        }
> +        self.pos += nbytes;
> +        Ok(())
> +    }
> +}
> +
> +/// Parser for Mali CSF GPU firmware binaries.
> +///
> +/// Parses the firmware binary format, extracting section metadata including virtual
> +/// address ranges, data offsets, and memory protection flags needed to load firmware
> +/// into the MCU's memory.
> +pub(super) struct FwParser<'a> {
> +    cursor: Cursor<'a>,
> +}
> +
> +impl<'a> FwParser<'a> {
> +    /// Creates a new firmware parser for the given firmware binary data.
> +    pub(super) fn new(data: &'a [u8]) -> Self {
> +        Self {
> +            cursor: Cursor::new(data),
> +        }
> +    }
> +
> +    /// Parses the firmware binary and returns a collection of parsed sections.
> +    ///
> +    /// This method validates the firmware header and iterates through all entries
> +    /// in the binary, extracting section information needed for loading.
> +    pub(super) fn parse(&mut self) -> Result<KVec<ParsedSection>> {
> +        let fw_header = self.parse_fw_header()?;
> +
> +        let mut parsed_sections = KVec::new();
> +        while (self.cursor.pos() as u32) < fw_header.size {
> +            let entry_section = self.parse_entry()?;
> +
> +            if let Some(inner) = entry_section.inner {
> +                parsed_sections.push(inner, GFP_KERNEL)?;
> +            }
> +        }
> +
> +        Ok(parsed_sections)
> +    }
> +
> +    fn parse_fw_header(&mut self) -> Result<FirmwareHeader> {
> +        let fw_header: FirmwareHeader = match FirmwareHeader::new(&mut self.cursor) {
> +            Ok(fw_header) => fw_header,
> +            Err(e) => {
> +                pr_err!("Invalid firmware file: {}", e.to_errno());
> +                return Err(e);
> +            }
> +        };
> +
> +        if fw_header.size > self.cursor.len() as u32 {
> +            pr_err!("Firmware image is truncated");
> +            return Err(EINVAL);
> +        }
> +        Ok(fw_header)
> +    }
> +
> +    fn parse_entry(&mut self) -> Result<EntrySection> {
> +        let entry_section = EntrySection {
> +            entry_hdr: EntryHeader(self.cursor.read_u32()?),
> +            inner: None,
> +        };
> +
> +        if self.cursor.pos() % size_of::<u32>() != 0
> +            || entry_section.entry_hdr.size() as usize % size_of::<u32>() != 0
> +        {
> +            pr_err!(
> +                "Firmware entry isn't 32 bit aligned, offset={:#x} size={:#x}\n",
> +                self.cursor.pos() - size_of::<u32>(),
> +                entry_section.entry_hdr.size()
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let section_hdr_size = entry_section.entry_hdr.size() as usize - size_of::<EntryHeader>();
> +
> +        let entry_section = {
> +            let mut entry_cursor = self
> +                .cursor
> +                .view(self.cursor.pos()..self.cursor.pos() + section_hdr_size)?;
> +
> +            match entry_section.entry_hdr.entry_type() {
> +                Ok(EntryType::Iface) => Ok(EntrySection {
> +                    entry_hdr: entry_section.entry_hdr,
> +                    inner: Self::parse_section_entry(&mut entry_cursor)?,
> +                }),
> +                Ok(
> +                    EntryType::Config
> +                    | EntryType::FutfTest
> +                    | EntryType::TraceBuffer
> +                    | EntryType::TimelineMetadata
> +                    | EntryType::BuildInfoMetadata,
> +                ) => Ok(entry_section),
> +
> +                entry_type => {
> +                    if !entry_section.entry_hdr.optional() {
> +                        pr_err!(
> +                            "Failed to handle firmware entry type: {}\n",
> +                            entry_type
> +                                .map_or(entry_section.entry_hdr.entry_type_raw(), |e| e as u8)
> +                        );
> +                        Err(EINVAL)
> +                    } else {
> +                        Ok(entry_section)
> +                    }
> +                }
> +            }
> +        };
> +
> +        if entry_section.is_ok() {
> +            self.cursor.advance(section_hdr_size)?;
> +        }
> +
> +        entry_section
> +    }
> +
> +    fn parse_section_entry(entry_cursor: &mut Cursor<'_>) -> Result<Option<ParsedSection>> {
> +        let section_hdr: SectionHeader = SectionHeader::new(entry_cursor)?;
> +
> +        if section_hdr.data.end < section_hdr.data.start {
> +            pr_err!(
> +                "Firmware corrupted, data.end < data.start (0x{:x} < 0x{:x})\n",
> +                section_hdr.data.end,
> +                section_hdr.data.start
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        if section_hdr.va.end < section_hdr.va.start {
> +            pr_err!(
> +                "Firmware corrupted, section_hdr.va.end < section_hdr.va.start (0x{:x} < 0x{:x})\n",
> +                section_hdr.va.end,
> +                section_hdr.va.start
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        if section_hdr.section_flags.contains(SectionFlag::Prot) {
> +            pr_info!("Firmware protected mode entry not supported, ignoring");
> +            return Ok(None);
> +        }
> +
> +        if section_hdr.va.start == CSF_MCU_SHARED_REGION_START
> +            && !section_hdr.section_flags.contains(SectionFlag::Shared)
> +        {
> +            pr_err!(
> +                "Interface at 0x{:x} must be shared",
> +                CSF_MCU_SHARED_REGION_START
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let name_len = entry_cursor.len() - entry_cursor.pos();
> +        let name_bytes = entry_cursor.read(name_len)?;
> +
> +        let mut name = KVec::with_capacity(name_bytes.len() + 1, GFP_KERNEL)?;
> +        name.extend_from_slice(name_bytes, GFP_KERNEL)?;
> +        name.push(0, GFP_KERNEL)?;
> +
> +        let _name = CStr::from_bytes_with_nul(&name)
> +            .ok()
> +            .and_then(|name| CString::try_from(name).ok());
> +
> +        let cache_mode = section_hdr.section_flags.cache_mode();
> +        let mut vm_map_flags = VmMapFlags::empty();
> +
> +        if !section_hdr.section_flags.contains(SectionFlag::Write) {
> +            vm_map_flags |= VmFlag::Readonly;
> +        }
> +        if !section_hdr.section_flags.contains(SectionFlag::Exec) {
> +            vm_map_flags |= VmFlag::Noexec;
> +        }
> +        if cache_mode != SectionFlag::CacheModeCached.into() {
> +            vm_map_flags |= VmFlag::Uncached;
> +        }
> +
> +        Ok(Some(ParsedSection {
> +            data_range: section_hdr.data.clone(),
> +            va: section_hdr.va,
> +            vm_map_flags,
> +        }))
> +    }
> +}
> +
> +/// Firmware binary header containing version and size information.
> +///
> +/// The header is located at the beginning of the firmware binary and contains
> +/// a magic value for validation, version information, and the total size of
> +/// all structured headers that follow.
> +#[expect(dead_code)]
> +struct FirmwareHeader {
> +    /// Magic value to check binary validity.
> +    magic: u32,
> +
> +    /// Minor firmware version.
> +    minor: u8,
> +
> +    /// Major firmware version.
> +    major: u8,
> +
> +    /// Padding. Must be set to zero.
> +    _padding1: u16,
> +
> +    /// Firmware version hash.
> +    version_hash: u32,
> +
> +    /// Padding. Must be set to zero.
> +    _padding2: u32,
> +
> +    /// Total size of all the structured data headers at beginning of firmware binary.
> +    size: u32,
> +}
> +
> +impl FirmwareHeader {
> +    const FW_BINARY_MAGIC: u32 = 0xc3f13a6e;
> +    const FW_BINARY_MAJOR_MAX: u8 = 0;
> +
> +    /// Reads and validates a firmware header from the cursor.
> +    ///
> +    /// Verifies the magic value, version compatibility, and padding fields.
> +    fn new(cursor: &mut Cursor<'_>) -> Result<Self> {
> +        let magic = cursor.read_u32()?;
> +        if magic != Self::FW_BINARY_MAGIC {
> +            pr_err!("Invalid firmware magic");
> +            return Err(EINVAL);
> +        }
> +
> +        let minor = cursor.read_u8()?;
> +        let major = cursor.read_u8()?;
> +
> +        if major > Self::FW_BINARY_MAJOR_MAX {
> +            pr_err!(
> +                "Unsupported firmware binary header version {}.{} (expected {}.x)\n",
> +                major,
> +                minor,
> +                Self::FW_BINARY_MAJOR_MAX
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let padding1 = cursor.read_u16()?;
> +        let version_hash = cursor.read_u32()?;
> +        let padding2 = cursor.read_u32()?;
> +        let size = cursor.read_u32()?;
> +
> +        if padding1 != 0 || padding2 != 0 {
> +            pr_err!("Invalid firmware file: header padding is not zero");
> +            return Err(EINVAL);
> +        }
> +
> +        let fw_header = Self {
> +            magic,
> +            minor,
> +            major,
> +            _padding1: padding1,
> +            version_hash,
> +            _padding2: padding2,

nit: I would write it like this:

	_padding1: 0,
	version_hash,
	_padding2: 0,

to be more explicit.

> +            size,
> +        };
> +
> +        Ok(fw_header)
> +    }
> +}
> +
> +/// Firmware section header for loading binary sections into MCU memory.
> +#[derive(Debug)]
> +struct SectionHeader {
> +    section_flags: SectionFlags,
> +    /// MCU virtual range to map this binary section to.
> +    va: Range<u32>,
> +    /// References the data in the FW binary.
> +    data: Range<u32>,
> +}
> +
> +impl SectionHeader {
> +    /// Reads and validates a section header from the cursor.
> +    ///
> +    /// Parses section flags, virtual address range, and data range from the firmware binary.
> +    fn new(cursor: &mut Cursor<'_>) -> Result<Self> {
> +        let section_flags = cursor.read_u32()?;
> +        let section_flags = SectionFlags::try_from(section_flags)?;
> +
> +        let va_start = cursor.read_u32()?;
> +        let va_end = cursor.read_u32()?;
> +
> +        let va = va_start..va_end;
> +
> +        if va.is_empty() {
> +            pr_err!(
> +                "Invalid firmware file: empty VA range at pos {}\n",
> +                cursor.pos(),
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let data_start = cursor.read_u32()?;
> +        let data_end = cursor.read_u32()?;
> +        let data = data_start..data_end;
> +
> +        Ok(Self {
> +            section_flags,
> +            va,
> +            data,
> +        })
> +    }
> +}
> +
> +/// A firmware entry containing a header and optional parsed section data.
> +///
> +/// Represents a single entry in the firmware binary, which may contain loadable
> +/// section data or metadata that doesn't require loading.
> +struct EntrySection {
> +    entry_hdr: EntryHeader,
> +    inner: Option<ParsedSection>,
> +}
> +
> +/// Header for a firmware entry, packed into a single u32.
> +///
> +/// The entry header encodes the entry type, size, and optional flag in a
> +/// 32-bit value with the following layout:
> +/// - Bits 0-7: Entry type
> +/// - Bits 8-15: Size in bytes
> +/// - Bit 31: Optional flag
> +struct EntryHeader(u32);
> +
> +impl EntryHeader {
> +    fn entry_type_raw(&self) -> u8 {
> +        (self.0 & 0xff) as u8
> +    }
> +
> +    fn entry_type(&self) -> Result<EntryType> {
> +        let v = self.entry_type_raw();
> +        EntryType::try_from(v)
> +    }
> +
> +    fn optional(&self) -> bool {
> +        self.0 & bit_u32(31) != 0
> +    }
> +
> +    fn size(&self) -> u32 {
> +        self.0 >> 8 & 0xff
> +    }
> +}
> +
> +#[derive(Clone, Copy, Debug)]
> +#[repr(u8)]
> +enum EntryType {
> +    /// Host <-> FW interface.
> +    Iface = 0,
> +    /// FW config.
> +    Config = 1,
> +    /// Unit tests.
> +    FutfTest = 2,
> +    /// Trace buffer interface.
> +    TraceBuffer = 3,
> +    /// Timeline metadata interface.
> +    TimelineMetadata = 4,
> +    /// Metadata about how the FW binary was built.
> +    BuildInfoMetadata = 6,
> +}
> +
> +impl TryFrom<u8> for EntryType {
> +    type Error = Error;
> +
> +    fn try_from(value: u8) -> Result<Self, Self::Error> {
> +        match value {
> +            0 => Ok(EntryType::Iface),
> +            1 => Ok(EntryType::Config),
> +            2 => Ok(EntryType::FutfTest),
> +            3 => Ok(EntryType::TraceBuffer),
> +            4 => Ok(EntryType::TimelineMetadata),
> +            6 => Ok(EntryType::BuildInfoMetadata),
> +            _ => Err(EINVAL),
> +        }
> +    }
> +}
> 
> -- 
> 2.53.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/20] drm/tyr: add parser for firmware binary
  2026-04-27  8:09   ` Onur Özkan
@ 2026-04-27  8:20     ` Boris Brezillon
  0 siblings, 0 replies; 29+ messages in thread
From: Boris Brezillon @ 2026-04-27  8:20 UTC (permalink / raw)
  To: Onur Özkan
  Cc: Deborah Brouwer, Daniel Almeida, Alice Ryhl, Danilo Krummrich,
	David Airlie, Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda,
	Boqun Feng, Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, beata.michalska, lyude, acourbot,
	alvin.sun

On Mon, 27 Apr 2026 11:09:25 +0300
Onur Özkan <work@onurozkan.dev> wrote:

> > +        let fw_header = Self {
> > +            magic,
> > +            minor,
> > +            major,
> > +            _padding1: padding1,
> > +            version_hash,
> > +            _padding2: padding2,  
> 
> nit: I would write it like this:
> 
> 	_padding1: 0,
> 	version_hash,
> 	_padding2: 0,
> 
> to be more explicit.

OOC, why do we need these padding fields? It looks like we're not doing
any raw copy/compare of any sort (fw_header is built using values
read through the cursor). If those are not used/needed, I'd recommend
dropping them.
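To illustrate the suggestion, here is a hedged, self-contained sketch (not the driver code; `Cursor` below is a minimal stand-in for the patch's cursor type) of a header struct that validates the reserved padding during parsing but does not store it:

```rust
/// Minimal stand-in for the driver's `Cursor` (illustration only).
struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn read(&mut self, n: usize) -> Option<&[u8]> {
        let end = self.pos.checked_add(n)?;
        if end > self.data.len() {
            return None;
        }
        let s = &self.data[self.pos..end];
        self.pos = end;
        Some(s)
    }
    fn read_u8(&mut self) -> Option<u8> {
        Some(self.read(1)?[0])
    }
    fn read_u16(&mut self) -> Option<u16> {
        Some(u16::from_le_bytes(self.read(2)?.try_into().ok()?))
    }
    fn read_u32(&mut self) -> Option<u32> {
        Some(u32::from_le_bytes(self.read(4)?.try_into().ok()?))
    }
}

/// Header without the padding fields: padding is checked, then discarded.
#[derive(Debug, PartialEq)]
struct FirmwareHeader {
    magic: u32,
    minor: u8,
    major: u8,
    version_hash: u32,
    size: u32,
}

fn parse_header(cursor: &mut Cursor<'_>) -> Option<FirmwareHeader> {
    let magic = cursor.read_u32()?;
    let minor = cursor.read_u8()?;
    let major = cursor.read_u8()?;
    // Reserved fields must still be zero, but nothing downstream
    // ever reads them back, so they need not live in the struct.
    if cursor.read_u16()? != 0 {
        return None;
    }
    let version_hash = cursor.read_u32()?;
    if cursor.read_u32()? != 0 {
        return None;
    }
    let size = cursor.read_u32()?;
    Some(FirmwareHeader { magic, minor, major, version_hash, size })
}
```

This keeps the on-disk layout checks while sidestepping the `_padding1: 0` vs `_padding1: padding1` question entirely.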

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper
  2026-04-24 23:39 ` [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper Deborah Brouwer
  2026-04-27  7:42   ` Andreas Hindborg
  2026-04-27  7:53   ` Alice Ryhl
@ 2026-04-27  8:59   ` Onur Özkan
  2 siblings, 0 replies; 29+ messages in thread
From: Onur Özkan @ 2026-04-27  8:59 UTC (permalink / raw)
  To: Deborah Brouwer
  Cc: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, boris.brezillon, beata.michalska,
	lyude, acourbot, alvin.sun

On Fri, 24 Apr 2026 16:39:13 -0700
Deborah Brouwer <deborah.brouwer@collabora.com> wrote:

> Provide a safe Rust wrapper for arch_timer_get_rate().
> 
> The underlying C helper returns 0 when the ARM architectural timer
> is not available or not yet initialized. Map this to Option<u32> to
> make the absence of a valid rate explicit to Rust callers.
> 
> This allows Rust drivers to query the system timer frequency and
> select appropriate time sources when programming hardware timeouts.
> 
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> ---
>  rust/kernel/time.rs | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/rust/kernel/time.rs b/rust/kernel/time.rs
> index 6ea98dfcd027..03ce96450fc8 100644
> --- a/rust/kernel/time.rs
> +++ b/rust/kernel/time.rs
> @@ -359,6 +359,35 @@ fn div(self, rhs: Self) -> Self::Output {
>      }
>  }
>  
> +/// Returns the ARM architecture timer frequency in Hz, if available.
> +///
> +/// This function queries the system-wide ARM architecture timer frequency.
> +/// The architecture timer provides a consistent time source across all CPU cores.
> +///
> +/// Returns `None` if:
> +/// - The ARM architecture timer is not available (`CONFIG_ARM_ARCH_TIMER` not enabled)
> +/// - The timer rate is zero (not initialized)

Can we return distinct errors for these cases and return NonZero<u32> when the
rate is valid?
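For the `NonZero` half of the suggestion, a minimal sketch of the shape being proposed (the raw value is taken as a parameter here so the mapping is testable standalone; the assumption, per the patch, is that the C helper returns 0 when the timer is unavailable or uninitialized — distinguishing those two cases would need extra information from the C side, since both produce 0):

```rust
use std::num::NonZeroU32;

/// Map the raw C return value to `Option<NonZeroU32>`, so "the rate
/// is valid" becomes part of the type and callers cannot accidentally
/// divide by a zero rate.
fn rate_from_raw(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}
```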

> +///
> +/// # Examples
> +///
> +/// ```
> +/// use kernel::time::arch_timer_get_rate;
> +///
> +/// if let Some(rate) = arch_timer_get_rate() {
> +///     // Use `rate`.
> +/// }
> +/// ```
> +pub fn arch_timer_get_rate() -> Option<u32> {
> +    // SAFETY: The C API is available in all configs; when CONFIG_ARM_ARCH_TIMER
> +    // is disabled, the header provides a stub returning 0.
> +    let rate = unsafe { bindings::arch_timer_get_rate() };
> +    if rate == 0 {
> +        None
> +    } else {
> +        Some(rate)
> +    }
> +}
> +
>  impl Delta {
>      /// A span of time equal to zero.
>      pub const ZERO: Self = Self { nanos: 0 };
> 
> -- 
> 2.53.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 18/20] drm/tyr: add CSF firmware interface support
  2026-04-24 23:39 ` [PATCH v4 18/20] drm/tyr: add CSF firmware interface support Deborah Brouwer
@ 2026-04-27  9:08   ` Onur Özkan
  0 siblings, 0 replies; 29+ messages in thread
From: Onur Özkan @ 2026-04-27  9:08 UTC (permalink / raw)
  To: Deborah Brouwer
  Cc: Daniel Almeida, Alice Ryhl, Danilo Krummrich, David Airlie,
	Simona Vetter, Benno Lossin, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Andreas Hindborg, Trevor Gross,
	FUJITA Tomonori, Frederic Weisbecker, Thomas Gleixner,
	Anna-Maria Behnsen, John Stultz, Stephen Boyd, dri-devel,
	linux-kernel, rust-for-linux, boris.brezillon, beata.michalska,
	lyude, acourbot, alvin.sun

On Fri, 24 Apr 2026 16:39:12 -0700
Deborah Brouwer <deborah.brouwer@collabora.com> wrote:

> Add initial support for the Command Stream Frontend (CSF) firmware
> interfaces, enabling communication between the driver and the MCU through
> shared memory.
> 
> Implement the global (GLB), command stream group (CSG), and command stream
> (CS) interfaces. These provide access to the firmware control, input, and
> output blocks and allow discovery of the available CSGs and CSs at
> runtime.
> 
> Store the global interface in the firmware state and initialize it after
> firmware boot during probe.
> 
> Co-developed-by: Daniel Almeida <daniel.almeida@collabora.com>
> Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
> Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> ---
>  drivers/gpu/drm/tyr/driver.rs        |    2 +-
>  drivers/gpu/drm/tyr/fw.rs            |   62 +-
>  drivers/gpu/drm/tyr/fw/interfaces.rs | 2005 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/tyr/gem.rs           |    5 +
>  4 files changed, 2061 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
> index 3225385cd511..20ae114a4180 100644
> --- a/drivers/gpu/drm/tyr/driver.rs
> +++ b/drivers/gpu/drm/tyr/driver.rs
> @@ -189,10 +189,10 @@ fn probe(
>          devres::register(pdev.as_ref(), job_irq, GFP_KERNEL)?;
>  
>          firmware.boot()?;
> -
>          firmware
>              .wait_ready(1000)
>              .inspect_err(|_| pr_err!("Timed out waiting for firmware to be ready.\n"))?;
> +        firmware.enable_global_interface()?;
>  
>          let data = try_pin_init!(TyrDrmDeviceData {
>                  pdev: platform.clone(),
> diff --git a/drivers/gpu/drm/tyr/fw.rs b/drivers/gpu/drm/tyr/fw.rs
> index 14815cdafac8..598e399a58ae 100644
> --- a/drivers/gpu/drm/tyr/fw.rs
> +++ b/drivers/gpu/drm/tyr/fw.rs
> @@ -36,7 +36,8 @@
>      str::CString,
>      sync::{
>          Arc,
> -        ArcBorrow, //
> +        ArcBorrow,
> +        Mutex, //
>      },
>      time,
>      types::ARef, //
> @@ -47,9 +48,12 @@
>          IoMem,
>          TyrDrmDevice, //
>      },
> -    fw::parser::{
> -        FwParser,
> -        ParsedSection, //
> +    fw::{
> +        interfaces::GlobalInterface,
> +        parser::{
> +            FwParser,
> +            ParsedSection, //
> +        },
>      },
>      gem,
>      gem::{
> @@ -73,12 +77,16 @@
>      }, //
>  };
>  
> +mod interfaces;
>  pub(crate) mod irq;
>  mod parser;
>  
>  /// Maximum number of CSG interfaces supported by hardware.
>  const MAX_CSG: usize = 16;
>  
> +/// Maximum number of CS interfaces supported by hardware.
> +const MAX_CS: usize = 16;
> +
>  impl_flags!(
>      #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
>      pub(super) struct SectionFlags(u32);
> @@ -100,6 +108,11 @@ pub(super) enum SectionFlag {
>  
>  pub(super) const CACHE_MODE_MASK: SectionFlags = SectionFlags(genmask_u32(3..=4));
>  
> +/// MCU virtual address where the CSF shared memory region starts.
> +///
> +/// This region contains the firmware interface structures for communication between
> +/// the CPU driver and MCU firmware, including the GLB_CONTROL_BLOCK at this base address.
> +/// The firmware binary contains a section marked to be loaded at this address.
>  pub(super) const CSF_MCU_SHARED_REGION_START: u32 = 0x04000000;
>  
>  impl SectionFlags {
> @@ -129,18 +142,18 @@ fn try_from(value: u32) -> Result<Self, Self::Error> {
>  }
>  
>  /// A parsed section of the firmware binary.
> -struct Section {
> +pub(crate) struct Section {
>      // Raw firmware section data for reset purposes
>      #[expect(dead_code)]
>      data: KVec<u8>,
>  
>      // Keep the BO backing this firmware section so that both the
>      // GPU mapping and CPU mapping remain valid until the Section is dropped.
> -    #[expect(dead_code)]
>      mem: gem::KernelBo,
>  }
>  
>  /// Loaded firmware with sections mapped into MCU VM.
> +#[pin_data(PinnedDrop)]
>  pub(crate) struct Firmware {
>      /// Platform device reference (needed to access the MCU JOB_IRQ registers).
>      pdev: ARef<platform::Device>,
> @@ -152,7 +165,6 @@ pub(crate) struct Firmware {
>      vm: Arc<Vm>,
>  
>      /// List of firmware sections.
> -    #[expect(dead_code)]
>      sections: KVec<Section>,
>  
>      /// A condvar representing a wait on a firmware event.
> @@ -160,10 +172,15 @@ pub(crate) struct Firmware {
>  
>      /// Latched to `true` by the IRQ handler when the firmware signals readiness via the GLB bit.
>      pub(crate) fw_ready: Arc<AtomicBool>,
> +
> +    /// The global FW interface.
> +    #[pin]
> +    global_iface: Mutex<GlobalInterface>,
>  }
>  
> -impl Drop for Firmware {
> -    fn drop(&mut self) {
> +#[pinned_drop]
> +impl PinnedDrop for Firmware {
> +    fn drop(self: Pin<&mut Self>) {
>          // AS slots retain a VM ref, we need to kill the circular ref manually.
>          self.vm.kill();
>      }
> @@ -258,21 +275,36 @@ pub(crate) fn new(
>              sections.push(Section { data, mem }, GFP_KERNEL)?;
>          }
>  
> -        let firmware = Arc::new(
> -            Firmware {
> +        let firmware = Arc::pin_init(
> +            try_pin_init!(Firmware {
>                  pdev: pdev.into(),
>                  iomem,
>                  vm,
>                  sections,
>                  ready_wait: new_wait!()?,
>                  fw_ready: Arc::new(AtomicBool::new(false), GFP_KERNEL)?,
> -            },
> +                global_iface <- new_mutex!(GlobalInterface::new()?),
> +            }),
>              GFP_KERNEL,
>          )?;
>  
>          Ok(firmware)
>      }
>  
> +    /// Get the shared memory section containing firmware interface structures.
> +    pub(crate) fn shared_section(&self) -> Result<&Section> {
> +        self.sections
> +            .iter()
> +            .find(|section| section.mem.va_range().start == u64::from(CSF_MCU_SHARED_REGION_START))
> +            .ok_or_else(|| {
> +                pr_err!(
> +                    "CSF shared section not found at 0x{:08x}\n",
> +                    CSF_MCU_SHARED_REGION_START
> +                );
> +                EINVAL
> +            })
> +    }
> +
>      pub(crate) fn boot(&self) -> Result {
>          // SAFETY: Boot is currently only called in the probe path, so we're sure we have a bound
>          // device.
> @@ -303,4 +335,10 @@ pub(crate) fn wait_ready(&self, timeout_ms: u32) -> Result {
>              }
>          })
>      }
> +
> +    /// Enable the global interface.
> +    pub(crate) fn enable_global_interface(&self) -> Result {
> +        let shared_section = self.shared_section()?;
> +        self.global_iface.lock().enable(shared_section)
> +    }
>  }
> diff --git a/drivers/gpu/drm/tyr/fw/interfaces.rs b/drivers/gpu/drm/tyr/fw/interfaces.rs
> new file mode 100644
> index 000000000000..07cdb1c76a3f
> --- /dev/null
> +++ b/drivers/gpu/drm/tyr/fw/interfaces.rs
> @@ -0,0 +1,2005 @@
> +// SPDX-License-Identifier: GPL-2.0 or MIT
> +
> +//! Code to control the global interface of the CSF firmware.
> +//!
> +//! For abbreviation definitions (CEU, CS, CSF, CSG, CSHW, GLB, JASID, MCU, MMU), see the top-level
> +//! module documentation in [`crate::regs`].
> +//!
> +//! # Interface Overview
> +//!
> +//! Tyr interacts with the CSF firmware running on the MCU through shared memory
> +//! interfaces. The CSF manages job submission via a hierarchy of:
> +//! - **GLB**: Global interface - controls operations common to all CSs
> +//! - **CSG**: Command Stream Groups - groups of related command streams
> +//! - **CS**: Command Streams - individual sequences of GPU commands
> +//!
> +//! ```
> +//! ┌──────────────────────────────────────────┐
> +//! │ GPU                                      │
> +//! │ ┌─────┐ ┌──────────────────────────────┐ │
> +//! │ │ MMU │ │  CSF                         │ │
> +//! │ └─────┘ │ ┌────────────┐ ┌─────┐       │ │
> +//! │         │ │ CSHW (CEU) │ │ MCU │       │ │
> +//! │         │ └────────────┘ └─────┘       │ │
> +//! └─────────┼──────────────────────────────┼─┘
> +//!           │ ┌──────────────────────────┐ │
> +//!           │ │ Shared Memory            │ │
> +//!           │ │ ┌────────┐ ┌────┐ ┌────┐ │ │
> +//!           │ │ │  CSG0  │ │GLB │ │ FW │ │ │
> +//!           │ │ │ ┌────┐ │ └────┘ └────┘ │ │
> +//!           │ │ │ │CS0 │ │               │ │
> +//!           │ │ │ └────┘ │               │ │
> +//!           │ │ └────────┘               │ │
> +//!           │ └──────────────────────────┘ │
> +//!           └──────────────┬───────────────┘
> +//!                          │
> +//!                      ┌───┴───┐
> +//!                      │  Tyr  │
> +//!                      └───────┘
> +//! ```
> +//!
> +
> +use crate::fw::Section;
> +use iface::FwInterface;
> +use kernel::{
> +    io::Io,
> +    prelude::*, //
> +};
> +
> +            }

[...]

> +        }
> +    }
> +}
> +
> +use cs::*;
> +use csg::*;
> +use glb::{
> +    control::*,
> +    *, //
> +};

Any reason for having these imports in the middle of the module?
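
For comparison, a compilable toy of the conventional layout: submodules
first, then every `use` grouped in one block at the top of the enclosing
scope rather than scattered between item definitions. The module and
constant names below are made up for illustration only.

```rust
mod fw {
    pub mod cs {
        pub const CS_CONTROL_BLOCK_SIZE: usize = 256;
    }
    pub mod csg {
        pub const CSG_CONTROL_BLOCK_SIZE: usize = 256;
    }

    // All imports in one place, even though the submodules live above:
    // `use` placement is purely stylistic in Rust, so nothing forces the
    // imports to appear after the definitions.
    use self::cs::CS_CONTROL_BLOCK_SIZE;
    use self::csg::CSG_CONTROL_BLOCK_SIZE;

    pub fn block_sizes() -> (usize, usize) {
        (CS_CONTROL_BLOCK_SIZE, CSG_CONTROL_BLOCK_SIZE)
    }
}

fn main() {
    assert_eq!(fw::block_sizes(), (256, 256));
}
```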

> +
> +/// State of the global interface.
> +enum GlobalInterfaceState {
> +    /// Interface is not yet initialized.
> +    Disabled,
> +    /// Interface is initialized and operational.
> +    Enabled(EnabledGlobalInterface),
> +}
> +
> +/// When enabled, the Global Interface has control,
> +/// input, and output system memory interfaces, as well as
> +/// the discovered CSG interfaces.
> +#[expect(dead_code)]
> +struct EnabledGlobalInterface {
> +    /// Control block interface - provides version, features, and CSG discovery.
> +    glb_control: FwInterface<GLB_CONTROL_BLOCK_SIZE>,
> +    /// Input block interface - driver writes requests here.
> +    glb_input: FwInterface<GLB_INPUT_BLOCK_SIZE>,
> +    /// Output block interface - firmware writes acknowledgements here.
> +    glb_output: FwInterface<GLB_OUTPUT_BLOCK_SIZE>,
> +    /// Runtime stride between CSG control blocks (read from GLB_GROUP_STRIDE).
> +    csg_stride: usize,
> +    /// Number of CSG interfaces reported by hardware.
> +    csg_num: usize,
> +    /// Discovered CSG interfaces.
> +    csg: KVec<CsgInterface>,
> +}
> +
> +/// Global CSF Interface
> +///
> +/// The global interface controls operations that are common to all CSs.
> +pub(super) struct GlobalInterface {
> +    /// Current interface state (Disabled or Enabled).
> +    state: GlobalInterfaceState,
> +}
> +
> +impl GlobalInterface {
> +    /// Creates a new CSF global interface, initially disabled.
> +    pub(super) fn new() -> Result<Self> {
> +        Ok(Self {
> +            state: GlobalInterfaceState::Disabled,
> +        })
> +    }
> +
> +    /// Enables the global interface and discovers the CSG interfaces.
> +    ///
> +    /// This reads the firmware's control block to set up the global input/output
> +    /// interfaces; it configures timers and shader core allocation; and it discovers
> +    /// available CSG interfaces.
> +    pub(crate) fn enable(&mut self, shared_section: &Section) -> Result {
> +        let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
> +        let va_range = shared_section.mem.va_range();
> +
> +        let glb_control =
> +            FwInterface::<GLB_CONTROL_BLOCK_SIZE>::new(&vmap, &va_range, va_range.start)?;
> +
> +        let version = glb_control.read(GLB_VERSION);
> +        if version.major().get() == 0 {
> +            pr_err!("CSF interface version is 0. Firmware may have failed to boot.\n");
> +            return Err(EINVAL);
> +        }
> +        pr_info!(
> +            "CSF interface version: {}.{}.{}\n",
> +            version.major().get(),
> +            version.minor().get(),
> +            version.patch().get()
> +        );
> +
> +        let input_va = glb_control.read(GLB_INPUT_VA);
> +        let glb_input = FwInterface::<GLB_INPUT_BLOCK_SIZE>::new(
> +            &vmap,
> +            &va_range,
> +            input_va.value().get().into(),
> +        )?;
> +
> +        let output_va = glb_control.read(GLB_OUTPUT_VA);
> +        let glb_output = FwInterface::<GLB_OUTPUT_BLOCK_SIZE>::new(
> +            &vmap,
> +            &va_range,
> +            output_va.value().get().into(),
> +        )?;
> +
> +        // Read how many CSG interfaces exist.
> +        let csg_num = glb_control.read(GLB_GROUP_NUM).value().get();
> +
> +        // Read the stride between CSG control blocks.
> +        let csg_stride = glb_control.read(GLB_GROUP_STRIDE).value().get() as usize;
> +
> +        if csg_stride < CSG_CONTROL_BLOCK_SIZE {
> +            pr_err!(
> +                "CSG stride {} is smaller than control block size {}\n",
> +                csg_stride,
> +                CSG_CONTROL_BLOCK_SIZE
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        // Validate the CSG number reported.
> +        if csg_num as usize > super::MAX_CSG {
> +            pr_err!(
> +                "Too many CSGs: hardware reports {}, max supported {}\n",
> +                csg_num,
> +                super::MAX_CSG
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let enabled = EnabledGlobalInterface {
> +            glb_control,
> +            glb_input,
> +            glb_output,
> +            csg_stride,
> +            csg_num: csg_num as usize,
> +            csg: KVec::with_capacity(csg_num as usize, GFP_KERNEL)?,
> +        };
> +
> +        self.state = GlobalInterfaceState::Enabled(enabled);
> +        self.init_csg(shared_section)?;
> +        Ok(())
> +    }
> +
> +    /// Initialize CSG interfaces.
> +    ///
> +    /// This uses the previously read CSG count to create and enable each CSG interface.
> +    fn init_csg(&mut self, shared_section: &Section) -> Result {
> +        let enabled = match &mut self.state {
> +            GlobalInterfaceState::Enabled(e) => e,
> +            GlobalInterfaceState::Disabled => return Err(EINVAL),
> +        };
> +
> +        for csg_idx in 0..enabled.csg_num {
> +            // Create and enable the CSG interface.
> +            let mut csg = CsgInterface::new(csg_idx)?;
> +            csg.enable(shared_section, csg_idx, enabled.csg_stride)?;
> +
> +            enabled.csg.push(csg, GFP_KERNEL)?;
> +        }
> +
> +        Ok(())
> +    }
> +}
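
The Disabled/Enabled split used here (and again for the CSG/CS interfaces
further down) boils down to the following pattern; this is a self-contained
sketch with made-up types, not the driver's actual code. Resources that
only exist after enable() live inside the Enabled variant, so nothing can
touch them while the interface is disabled.

```rust
struct Enabled {
    csg_num: usize, // number of discovered sub-interfaces
}

enum State {
    Disabled,
    Enabled(Enabled),
}

struct Iface {
    state: State,
}

impl Iface {
    fn new() -> Self {
        Self { state: State::Disabled }
    }

    /// Mirrors GlobalInterface::enable(): build the payload, flip the
    /// state, then run the follow-up step that requires Enabled.
    fn enable(&mut self, csg_num: usize) -> Result<(), &'static str> {
        self.state = State::Enabled(Enabled { csg_num });
        self.init_csg()
    }

    /// Mirrors init_csg(): bails out if enable() has not run yet.
    fn init_csg(&mut self) -> Result<(), &'static str> {
        match &mut self.state {
            State::Enabled(e) => {
                let _ = e.csg_num; // would iterate 0..csg_num here
                Ok(())
            }
            State::Disabled => Err("interface not enabled"),
        }
    }
}

fn main() {
    let mut iface = Iface::new();
    assert!(iface.init_csg().is_err()); // disabled: must refuse
    assert!(iface.enable(4).is_ok());
    assert!(matches!(iface.state, State::Enabled(_)));
}
```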
> +
> +/// State of a CSG interface.
> +enum CsgInterfaceState {
> +    /// Interface is not yet initialized.
> +    Disabled,
> +    /// Interface is initialized and operational.
> +    Enabled(EnabledCsgInterface),
> +}
> +
> +/// When enabled, a CSG Interface has control, input, and output system memory interfaces.
> +struct EnabledCsgInterface {
> +    /// Control block interface - provides CSG capabilities and configuration.
> +    #[expect(dead_code)]
> +    csg_control: FwInterface<CSG_CONTROL_BLOCK_SIZE>,
> +    /// Input block interface - driver writes CSG requests here.
> +    #[expect(dead_code)]
> +    csg_input: FwInterface<CSG_INPUT_BLOCK_SIZE>,
> +    /// Output block interface - firmware writes CSG acknowledgements here.
> +    #[expect(dead_code)]
> +    csg_output: FwInterface<CSG_OUTPUT_BLOCK_SIZE>,
> +    /// Runtime stride between CS control blocks (read from GROUP_STREAM_STRIDE).
> +    cs_stride: usize,
> +    /// Number of CS interfaces reported by hardware for this CSG.
> +    cs_num: usize,
> +    /// Discovered CS interfaces.
> +    cs: KVec<CsInterface>,
> +}
> +
> +/// Command Stream Group Interface
> +///
> +/// The CSG interface controls operations for a specific CSG.
> +pub(crate) struct CsgInterface {
> +    /// Current interface state (Disabled or Enabled).
> +    state: CsgInterfaceState,
> +    /// CSG identifier/index number.
> +    #[expect(dead_code)]
> +    csg_idx: usize,
> +}
> +
> +impl CsgInterface {
> +    /// Creates a new disabled CSG interface.
> +    pub(super) fn new(csg_idx: usize) -> Result<Self> {
> +        Ok(Self {
> +            state: CsgInterfaceState::Disabled,
> +            csg_idx,
> +        })
> +    }
> +
> +    /// Enables the CSG interface.
> +    ///
> +    /// This calculates the runtime offset of this CSG's control block and creates
> +    /// a bounded interface to access it. It then reads the input/output interface
> +    /// addresses from the CSG control block.
> +    fn enable(&mut self, shared_section: &Section, csg_idx: usize, csg_stride: usize) -> Result {
> +        use csg::control::{
> +            GROUP_INPUT_VA,
> +            GROUP_OUTPUT_VA,
> +            GROUP_STREAM_NUM,
> +            GROUP_STREAM_STRIDE, //
> +        };
> +        use kernel::io::Io;

Why are these imported inside the function? This is usually done when the
function is gated behind a feature flag to avoid clippy warnings when features
are disabled, but that doesn't seem to be the case here.
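
For reference, a minimal standalone illustration of the one case where a
function/item-local import is idiomatic: the import is only needed by a
cfg-gated item, so hoisting it to module scope would leave it unused when
the flag is off. The `feature = "extra"` flag is made up for this sketch.

```rust
#[cfg(feature = "extra")]
mod extra {
    // Only meaningful when "extra" is enabled; at module scope this
    // `use` would be dead weight in the default build.
    use std::collections::BTreeMap;

    pub fn count() -> usize {
        BTreeMap::from([(1, "a")]).len()
    }
}

// With the flag off, the gated module (and its import) vanish entirely.
#[cfg(not(feature = "extra"))]
fn count() -> usize {
    0
}

fn main() {
    assert_eq!(count(), 0);
}
```

That situation doesn't apply to the hunk above, which supports the point
that these imports could live at the top of the module.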

> +
> +        let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
> +        let va_range = shared_section.mem.va_range();
> +
> +        // Calculate the runtime offset for this CSG's control block.
> +        // The CSG control blocks start at CSG_GROUP_CONTROL_OFFSET from the GLB control block,
> +        // with each CSG spaced by csg_stride bytes.
> +        let csg_control_offset = CSG_GROUP_CONTROL_OFFSET + csg_idx * csg_stride;
> +
> +        // The CSG control block's MCU virtual address is relative to the shared section start.
> +        let csg_control_va = va_range.start + csg_control_offset as u64;
> +
> +        // Create a bounded interface for this CSG's control block at the calculated address.
> +        let csg_control =
> +            FwInterface::<CSG_CONTROL_BLOCK_SIZE>::new(&vmap, &va_range, csg_control_va)?;
> +
> +        // Read the input and output VAs from the CSG control block.
> +        let input_va = csg_control.read(GROUP_INPUT_VA).value().get();
> +        let csg_input =
> +            FwInterface::<CSG_INPUT_BLOCK_SIZE>::new(&vmap, &va_range, input_va.into())?;
> +
> +        let output_va = csg_control.read(GROUP_OUTPUT_VA).value().get();
> +        let csg_output =
> +            FwInterface::<CSG_OUTPUT_BLOCK_SIZE>::new(&vmap, &va_range, output_va.into())?;
> +
> +        // Read the runtime stride between CS control blocks.
> +        let cs_stride = csg_control.read(GROUP_STREAM_STRIDE).value().get() as usize;
> +
> +        if cs_stride < CS_CONTROL_BLOCK_SIZE {
> +            pr_err!(
> +                "CS stride {} is smaller than control block size {}\n",
> +                cs_stride,
> +                CS_CONTROL_BLOCK_SIZE
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        // Read how many CS interfaces exist for this CSG.
> +        let cs_num = csg_control.read(GROUP_STREAM_NUM).value().get();
> +
> +        // Validate that the hardware doesn't report more CS than we support.
> +        if cs_num as usize > super::MAX_CS {
> +            pr_err!(
> +                "Too many CS: hardware reports {}, max supported {}\n",
> +                cs_num,
> +                super::MAX_CS
> +            );
> +            return Err(EINVAL);
> +        }
> +
> +        let enabled = EnabledCsgInterface {
> +            csg_control,
> +            csg_input,
> +            csg_output,
> +            cs_stride,
> +            cs_num: cs_num as usize,
> +            cs: KVec::with_capacity(cs_num as usize, GFP_KERNEL)?,
> +        };
> +
> +        self.state = CsgInterfaceState::Enabled(enabled);
> +        self.init_cs(shared_section, csg_control_offset)?;
> +        Ok(())
> +    }
> +
> +    /// Initialize and discover CS interfaces.
> +    ///
> +    /// This uses the previously read CS count to create and enable each CS interface.
> +    fn init_cs(&mut self, shared_section: &Section, csg_control_offset: usize) -> Result {
> +        let enabled = match &mut self.state {
> +            CsgInterfaceState::Enabled(e) => e,
> +            CsgInterfaceState::Disabled => return Err(EINVAL),
> +        };
> +
> +        for cs_idx in 0..enabled.cs_num {
> +            // Create and enable the CS interface.
> +            let mut cs = CsInterface::new(cs_idx)?;
> +            cs.enable(
> +                shared_section,
> +                csg_control_offset,
> +                cs_idx,
> +                enabled.cs_stride,
> +            )?;
> +
> +            enabled.cs.push(cs, GFP_KERNEL)?;
> +        }
> +
> +        Ok(())
> +    }
> +}
> +
> +/// State of a CS interface.
> +enum CsInterfaceState {
> +    /// Interface is not yet initialized.
> +    Disabled,
> +    /// Interface is initialized and operational.
> +    #[expect(dead_code)]
> +    Enabled(EnabledCsInterface),
> +}
> +
> +/// When enabled, a CS Interface has control, input, and output system memory interfaces.
> +struct EnabledCsInterface {
> +    /// Control block interface - provides CS capabilities and configuration.
> +    #[expect(dead_code)]
> +    cs_control: FwInterface<CS_CONTROL_BLOCK_SIZE>,
> +    /// Input block interface - driver writes CS requests here.
> +    #[expect(dead_code)]
> +    cs_input: FwInterface<CS_KERNEL_INPUT_BLOCK_SIZE>,
> +    /// Output block interface - firmware writes CS acknowledgements here.
> +    #[expect(dead_code)]
> +    cs_output: FwInterface<CS_KERNEL_OUTPUT_BLOCK_SIZE>,
> +}
> +
> +/// Command Stream Interface
> +///
> +/// The CS interface controls operations for a specific CS.
> +pub(crate) struct CsInterface {
> +    /// Current interface state (Disabled or Enabled).
> +    state: CsInterfaceState,
> +    /// CS identifier/index number.
> +    #[expect(dead_code)]
> +    cs_idx: usize,
> +}
> +
> +impl CsInterface {
> +    /// Creates a new disabled CS interface.
> +    pub(super) fn new(cs_idx: usize) -> Result<Self> {
> +        Ok(Self {
> +            state: CsInterfaceState::Disabled,
> +            cs_idx,
> +        })
> +    }
> +
> +    /// Enables the CS interface.
> +    ///
> +    /// This calculates the runtime offset of this CS's control block and creates
> +    /// a bounded interface to access it. It then reads the input/output interface
> +    /// addresses from the CS control block.
> +    fn enable(
> +        &mut self,
> +        shared_section: &Section,
> +        csg_control_offset: usize,
> +        cs_idx: usize,
> +        cs_stride: usize,
> +    ) -> Result {
> +        use cs::control::{
> +            STREAM_INPUT_VA,
> +            STREAM_OUTPUT_VA, //
> +        };
> +        use kernel::io::Io;
> +
> +        let vmap = shared_section.mem.bo.owned_vmap::<0>()?;
> +        let va_range = shared_section.mem.va_range();
> +
> +        // Calculate the runtime offset for this CS's control block.
> +        let cs_control_offset = CS_CONTROL_OFFSET + cs_idx * cs_stride;
> +
> +        // The CS control block's MCU virtual address is relative to the shared section start.
> +        let cs_control_va = va_range.start + csg_control_offset as u64 + cs_control_offset as u64;
> +
> +        // Create a bounded interface for this CS's control block at the calculated address.
> +        let cs_control =
> +            FwInterface::<CS_CONTROL_BLOCK_SIZE>::new(&vmap, &va_range, cs_control_va)?;
> +
> +        // Read the input and output VAs from the CS control block.
> +        let input_va = cs_control.read(STREAM_INPUT_VA).value().get();
> +        let cs_input =
> +            FwInterface::<CS_KERNEL_INPUT_BLOCK_SIZE>::new(&vmap, &va_range, input_va.into())?;
> +
> +        let output_va = cs_control.read(STREAM_OUTPUT_VA).value().get();
> +        let cs_output =
> +            FwInterface::<CS_KERNEL_OUTPUT_BLOCK_SIZE>::new(&vmap, &va_range, output_va.into())?;
> +
> +        let enabled = EnabledCsInterface {
> +            cs_control,
> +            cs_input,
> +            cs_output,
> +        };
> +
> +        self.state = CsInterfaceState::Enabled(enabled);
> +
> +        Ok(())
> +    }
> +}
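
As a sanity check on the address arithmetic in CsgInterface::enable() and
CsInterface::enable() above, here is a self-contained sketch. The offset
constants below are made-up values chosen for the example; the real
CSG_GROUP_CONTROL_OFFSET and CS_CONTROL_OFFSET come from the layout
definitions in the driver.

```rust
const CSG_GROUP_CONTROL_OFFSET: usize = 0x1000; // hypothetical value
const CS_CONTROL_OFFSET: usize = 0x100; // hypothetical value

/// CSG control blocks start at a fixed offset from the GLB control
/// block, spaced csg_stride bytes apart.
fn csg_control_offset(csg_idx: usize, csg_stride: usize) -> usize {
    CSG_GROUP_CONTROL_OFFSET + csg_idx * csg_stride
}

/// A CS control block lives inside its parent CSG's region, again at a
/// fixed base offset plus a per-stream stride, all relative to the
/// shared section's start address.
fn cs_control_va(
    va_start: u64,
    csg_idx: usize,
    csg_stride: usize,
    cs_idx: usize,
    cs_stride: usize,
) -> u64 {
    let csg_off = csg_control_offset(csg_idx, csg_stride);
    let cs_off = CS_CONTROL_OFFSET + cs_idx * cs_stride;
    va_start + csg_off as u64 + cs_off as u64
}

fn main() {
    // CSG 2 with a 0x800 stride: 0x1000 + 2 * 0x800 = 0x2000.
    assert_eq!(csg_control_offset(2, 0x800), 0x2000);
    // CS 3 in that CSG, 0x200 stride, shared section at 0x40000:
    // 0x40000 + 0x2000 + 0x100 + 3 * 0x200 = 0x42700.
    assert_eq!(cs_control_va(0x40000, 2, 0x800, 3, 0x200), 0x42700);
}
```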
> diff --git a/drivers/gpu/drm/tyr/gem.rs b/drivers/gpu/drm/tyr/gem.rs
> index 4ec373e0bcfa..606d446aafd9 100644
> --- a/drivers/gpu/drm/tyr/gem.rs
> +++ b/drivers/gpu/drm/tyr/gem.rs
> @@ -151,6 +151,11 @@ pub(crate) fn new<Ctx: DeviceContext>(
>              va_range: va..(va + size),
>          })
>      }
> +
> +    /// Returns the GPU virtual address range occupied by this buffer.
> +    pub(crate) fn va_range(&self) -> Range<u64> {
> +        self.va_range.clone()
> +    }
>  }
>  
>  impl Drop for KernelBo {
> 
> -- 
> 2.53.0
> 


end of thread, newest message: 2026-04-27  9:08 UTC

Thread overview: 29+ messages
2026-04-24 23:38 [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Deborah Brouwer
2026-04-24 23:38 ` [PATCH v4 01/20] drm/tyr: remove unused device from platform data Deborah Brouwer
2026-04-24 23:38 ` [PATCH v4 02/20] drm/tyr: select required dependencies in Kconfig Deborah Brouwer
2026-04-27  7:23   ` Boris Brezillon
2026-04-24 23:38 ` [PATCH v4 03/20] drm/tyr: move clock cleanup into Clocks Drop impl Deborah Brouwer
2026-04-24 23:38 ` [PATCH v4 04/20] drm/tyr: rename TyrObject to BoData Deborah Brouwer
2026-04-24 23:38 ` [PATCH v4 05/20] drm/tyr: use shmem GEM object type in TyrDrmDriver Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 06/20] drm/tyr: set DMA mask using GPU physical address Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 07/20] drm/tyr: add shmem backing for GEM objects Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 08/20] drm/tyr: Add generic slot manager Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 09/20] drm/tyr: add MMU module Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 10/20] drm/tyr: add GPU virtual memory module Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 11/20] drm/tyr: add a kernel buffer object Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 12/20] drm/tyr: add parser for firmware binary Deborah Brouwer
2026-04-27  8:09   ` Onur Özkan
2026-04-27  8:20     ` Boris Brezillon
2026-04-24 23:39 ` [PATCH v4 13/20] drm/tyr: add firmware loading and MCU boot support Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 14/20] drm/tyr: add Wait type for GPU events Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 15/20] drm/tyr: add Job IRQ handling Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 16/20] drm/tyr: wait for global interface readiness Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 17/20] drm/tyr: validate presence of CSF shared section Deborah Brouwer
2026-04-24 23:39 ` [PATCH v4 18/20] drm/tyr: add CSF firmware interface support Deborah Brouwer
2026-04-27  9:08   ` Onur Özkan
2026-04-24 23:39 ` [PATCH v4 19/20] rust: time: add arch_timer_get_rate wrapper Deborah Brouwer
2026-04-27  7:42   ` Andreas Hindborg
2026-04-27  7:53   ` Alice Ryhl
2026-04-27  8:59   ` Onur Özkan
2026-04-24 23:39 ` [PATCH v4 20/20] drm/tyr: program CSF global interface Deborah Brouwer
2026-04-27  8:07 ` [PATCH v4 00/20] drm/tyr: firmware loading and MCU boot support Boris Brezillon
