public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6)
@ 2026-01-20 20:42 Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists Joel Fernandes
                   ` (26 more replies)
  0 siblings, 27 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

This series is rebased on drm-rust-kernel/drm-rust-next and provides memory
management infrastructure for the nova-core GPU driver. It combines several
previous series and lays the foundation for nova GPU memory management,
including page tables, virtual memory management, and BAR mapping, all of
which are critical nova-core features.

The series includes:
- A Rust module (`clist`) for interfacing with C circular linked lists,
  required for iterating over buddy allocator blocks.
- Movement of the DRM buddy allocator up to the drivers/gpu/ level, renamed to GPU buddy.
- Rust bindings for the GPU buddy allocator.
- PRAMIN aperture support for direct VRAM access.
- Page table types for MMU v2 and v3 formats.
- Virtual Memory Manager (VMM) for GPU virtual address space management.
- BAR1 user interface for mapping and accessing GPU memory via virtual memory.
- Selftests for PRAMIN and BAR1 user interface (disabled by default).

Changes from v5 to v6:
- Rebased on drm-rust-kernel/drm-rust-next
- Added page table types and page table walker infrastructure
- Added Virtual Memory Manager (VMM)
- Added BAR1 user interface
- Added TLB flush support
- Added GpuMm memory manager
- Extended to 26 patches from 6 (full mm infrastructure now included)

The git tree with all patches can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (tag: nova-mm-v6-20260120)

Link to v5: https://lore.kernel.org/all/20251219203805.1246586-1-joelagnelf@nvidia.com/

Previous series that are combined:
- v4 (clist + buddy): https://lore.kernel.org/all/20251204215129.2357292-1-joelagnelf@nvidia.com/
- v3 (clist only): https://lore.kernel.org/all/20251129213056.4021375-1-joelagnelf@nvidia.com/
- v2 (clist only): https://lore.kernel.org/all/20251111171315.2196103-4-joelagnelf@nvidia.com/
- clist RFC (original with buddy): https://lore.kernel.org/all/20251030190613.1224287-1-joelagnelf@nvidia.com/
- DRM buddy move: https://lore.kernel.org/all/20251124234432.1988476-1-joelagnelf@nvidia.com/
- PRAMIN series: https://lore.kernel.org/all/20251020185539.49986-1-joelagnelf@nvidia.com/

Joel Fernandes (26):
  rust: clist: Add support to interface with C linked lists
  gpu: Move DRM buddy allocator one level up
  rust: gpu: Add GPU buddy allocator bindings
  nova-core: mm: Select GPU_BUDDY for VRAM allocation
  nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  nova-core: Add BAR1 aperture type and size constant
  nova-core: gsp: Add BAR1 PDE base accessors
  nova-core: mm: Add common memory management types
  nova-core: mm: Add common types for all page table formats
  nova-core: mm: Add MMU v2 page table types
  nova-core: mm: Add MMU v3 page table types
  nova-core: mm: Add unified page table entry wrapper enums
  nova-core: mm: Add TLB flush support
  nova-core: mm: Add GpuMm centralized memory manager
  nova-core: mm: Add page table walker for MMU v2
  nova-core: mm: Add Virtual Memory Manager
  nova-core: mm: Add virtual address range tracking to VMM
  nova-core: mm: Add BAR1 user interface
  nova-core: gsp: Return GspStaticInfo and FbLayout from boot()
  nova-core: mm: Add memory management self-tests
  nova-core: mm: Add PRAMIN aperture self-tests
  nova-core: gsp: Extract usable FB region from GSP
  nova-core: fb: Add usable_vram field to FbLayout
  nova-core: mm: Use usable VRAM region for buddy allocator
  nova-core: mm: Add BarUser to struct Gpu and create at boot

 Documentation/gpu/drm-mm.rst                  |   10 +-
 Documentation/gpu/nova/core/pramin.rst        |  125 ++
 Documentation/gpu/nova/index.rst              |    1 +
 MAINTAINERS                                   |    7 +
 drivers/gpu/Kconfig                           |   13 +
 drivers/gpu/Makefile                          |    2 +
 drivers/gpu/buddy.c                           | 1310 +++++++++++++++++
 drivers/gpu/drm/Kconfig                       |    1 +
 drivers/gpu/drm/Kconfig.debug                 |    4 +-
 drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       |    2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h    |   12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   80 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |   20 +-
 drivers/gpu/drm/drm_buddy.c                   | 1284 +---------------
 drivers/gpu/drm/i915/Kconfig                  |    1 +
 drivers/gpu/drm/i915/i915_scatterlist.c       |   10 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   55 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |    6 +-
 .../drm/i915/selftests/intel_memory_region.c  |   20 +-
 drivers/gpu/drm/tests/Makefile                |    1 -
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |    5 +-
 drivers/gpu/drm/ttm/tests/ttm_mock_manager.c  |   18 +-
 drivers/gpu/drm/ttm/tests/ttm_mock_manager.h  |    4 +-
 drivers/gpu/drm/xe/Kconfig                    |    1 +
 drivers/gpu/drm/xe/xe_res_cursor.h            |   34 +-
 drivers/gpu/drm/xe/xe_svm.c                   |   12 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |   73 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h    |    4 +-
 drivers/gpu/nova-core/Kconfig                 |   22 +
 drivers/gpu/nova-core/driver.rs               |    9 +-
 drivers/gpu/nova-core/fb.rs                   |   23 +-
 drivers/gpu/nova-core/gpu.rs                  |  140 +-
 drivers/gpu/nova-core/gsp/boot.rs             |   22 +-
 drivers/gpu/nova-core/gsp/commands.rs         |   18 +-
 drivers/gpu/nova-core/gsp/fw/commands.rs      |   38 +
 drivers/gpu/nova-core/mm/bar_user.rs          |  336 +++++
 drivers/gpu/nova-core/mm/mod.rs               |  209 +++
 drivers/gpu/nova-core/mm/pagetable/mod.rs     |  377 +++++
 drivers/gpu/nova-core/mm/pagetable/ver2.rs    |  184 +++
 drivers/gpu/nova-core/mm/pagetable/ver3.rs    |  286 ++++
 drivers/gpu/nova-core/mm/pagetable/walk.rs    |  285 ++++
 drivers/gpu/nova-core/mm/pramin.rs            |  404 +++++
 drivers/gpu/nova-core/mm/tlb.rs               |   79 +
 drivers/gpu/nova-core/mm/vmm.rs               |  247 ++++
 drivers/gpu/nova-core/nova_core.rs            |    1 +
 drivers/gpu/nova-core/regs.rs                 |   38 +
 drivers/gpu/tests/Makefile                    |    3 +
 .../gpu_buddy_test.c}                         |  390 ++---
 drivers/gpu/tests/gpu_random.c                |   48 +
 drivers/gpu/tests/gpu_random.h                |   28 +
 drivers/video/Kconfig                         |    2 +
 include/drm/drm_buddy.h                       |  163 +-
 include/linux/gpu_buddy.h                     |  177 +++
 rust/bindings/bindings_helper.h               |   11 +
 rust/helpers/gpu.c                            |   23 +
 rust/helpers/helpers.c                        |    2 +
 rust/helpers/list.c                           |   12 +
 rust/kernel/clist.rs                          |  357 +++++
 rust/kernel/gpu/buddy.rs                      |  538 +++++++
 rust/kernel/gpu/mod.rs                        |    5 +
 rust/kernel/lib.rs                            |    3 +
 62 files changed, 5788 insertions(+), 1808 deletions(-)
 create mode 100644 Documentation/gpu/nova/core/pramin.rst
 create mode 100644 drivers/gpu/Kconfig
 create mode 100644 drivers/gpu/buddy.c
 create mode 100644 drivers/gpu/nova-core/mm/bar_user.rs
 create mode 100644 drivers/gpu/nova-core/mm/mod.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/mod.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver2.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver3.rs
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
 create mode 100644 drivers/gpu/nova-core/mm/tlb.rs
 create mode 100644 drivers/gpu/nova-core/mm/vmm.rs
 create mode 100644 drivers/gpu/tests/Makefile
 rename drivers/gpu/{drm/tests/drm_buddy_test.c => tests/gpu_buddy_test.c} (68%)
 create mode 100644 drivers/gpu/tests/gpu_random.c
 create mode 100644 drivers/gpu/tests/gpu_random.h
 create mode 100644 include/linux/gpu_buddy.h
 create mode 100644 rust/helpers/gpu.c
 create mode 100644 rust/helpers/list.c
 create mode 100644 rust/kernel/clist.rs
 create mode 100644 rust/kernel/gpu/buddy.rs
 create mode 100644 rust/kernel/gpu/mod.rs


base-commit: 6ea52b6d8f33ae627f4dcf43b12b6e713a8b9331
-- 
2.34.1



* [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 23:48   ` Gary Guo
  2026-01-21  7:27   ` Zhi Wang
  2026-01-20 20:42 ` [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up Joel Fernandes
                   ` (25 subsequent siblings)
  26 siblings, 2 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add a new module `clist` for working with C's circular doubly linked
lists. It provides low-level iteration over list nodes.

Typed iteration over actual items is provided via the `clist_create`
macro, which assists in creating the `CList` type.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 MAINTAINERS            |   7 +
 rust/helpers/helpers.c |   1 +
 rust/helpers/list.c    |  12 ++
 rust/kernel/clist.rs   | 357 +++++++++++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs     |   1 +
 5 files changed, 378 insertions(+)
 create mode 100644 rust/helpers/list.c
 create mode 100644 rust/kernel/clist.rs

diff --git a/MAINTAINERS b/MAINTAINERS
index 0d044a58cbfe..b76988c38045 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22936,6 +22936,13 @@ F:	rust/kernel/init.rs
 F:	rust/pin-init/
 K:	\bpin-init\b|pin_init\b|PinInit
 
+RUST TO C LIST INTERFACES
+M:	Joel Fernandes <joelagnelf@nvidia.com>
+M:	Alexandre Courbot <acourbot@nvidia.com>
+L:	rust-for-linux@vger.kernel.org
+S:	Maintained
+F:	rust/kernel/clist.rs
+
 RXRPC SOCKETS (AF_RXRPC)
 M:	David Howells <dhowells@redhat.com>
 M:	Marc Dionne <marc.dionne@auristor.com>
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 79c72762ad9c..634fa2386bbb 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -32,6 +32,7 @@
 #include "io.c"
 #include "jump_label.c"
 #include "kunit.c"
+#include "list.c"
 #include "maple_tree.c"
 #include "mm.c"
 #include "mutex.c"
diff --git a/rust/helpers/list.c b/rust/helpers/list.c
new file mode 100644
index 000000000000..6044979c7a2e
--- /dev/null
+++ b/rust/helpers/list.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Helpers for the C circular doubly linked list implementation.
+ */
+
+#include <linux/list.h>
+
+void rust_helper_list_add_tail(struct list_head *new, struct list_head *head)
+{
+	list_add_tail(new, head);
+}
diff --git a/rust/kernel/clist.rs b/rust/kernel/clist.rs
new file mode 100644
index 000000000000..91754ae721b9
--- /dev/null
+++ b/rust/kernel/clist.rs
@@ -0,0 +1,357 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! A C circular doubly linked intrusive list interface for Rust code.
+//!
+//! # Examples
+//!
+//! ```
+//! use kernel::{
+//!     bindings,
+//!     clist::init_list_head,
+//!     clist_create,
+//!     types::Opaque, //
+//! };
+//! # // Create test list with values (0, 10, 20) - normally done by C code but it is
+//! # // emulated here for doctests using the C bindings.
+//! # use core::mem::MaybeUninit;
+//! #
+//! # /// C struct with embedded `list_head` (typically will be allocated by C code).
+//! # #[repr(C)]
+//! # pub(crate) struct SampleItemC {
+//! #     pub value: i32,
+//! #     pub link: bindings::list_head,
+//! # }
+//! #
+//! # let mut head = MaybeUninit::<bindings::list_head>::uninit();
+//! #
+//! # let head = head.as_mut_ptr();
+//! # // SAFETY: head and all the items are test objects allocated in this scope.
+//! # unsafe { init_list_head(head) };
+//! #
+//! # let mut items = [
+//! #     MaybeUninit::<SampleItemC>::uninit(),
+//! #     MaybeUninit::<SampleItemC>::uninit(),
+//! #     MaybeUninit::<SampleItemC>::uninit(),
+//! # ];
+//! #
+//! # for (i, item) in items.iter_mut().enumerate() {
+//! #     let ptr = item.as_mut_ptr();
+//! #     // SAFETY: pointers are to allocated test objects with a list_head field.
+//! #     unsafe {
+//! #         (*ptr).value = i as i32 * 10;
+//! #         // addr_of_mut!() computes address of link directly as link is uninitialized.
+//! #         init_list_head(core::ptr::addr_of_mut!((*ptr).link));
+//! #         bindings::list_add_tail(&mut (*ptr).link, head);
+//! #     }
+//! # }
+//!
+//! // Rust wrapper for the C struct.
+//! // The list item struct in this example is defined in C code as:
+//! //   struct SampleItemC {
+//! //       int value;
+//! //       struct list_head link;
+//! //   };
+//! //
+//! #[repr(transparent)]
+//! pub(crate) struct Item(Opaque<SampleItemC>);
+//!
+//! impl Item {
+//!     pub(crate) fn value(&self) -> i32 {
+//!         // SAFETY: [`Item`] has same layout as [`SampleItemC`].
+//!         unsafe { (*self.0.get()).value }
+//!     }
+//! }
+//!
+//! // Create typed [`CList`] from sentinel head.
+//! // SAFETY: head is valid, items are [`SampleItemC`] with embedded `link` field.
+//! let list = unsafe { clist_create!(head, Item, SampleItemC, link) };
+//!
+//! // Iterate directly over typed items.
+//! let mut found_0 = false;
+//! let mut found_10 = false;
+//! let mut found_20 = false;
+//!
+//! for item in list.iter() {
+//!     let val = item.value();
+//!     if val == 0 { found_0 = true; }
+//!     if val == 10 { found_10 = true; }
+//!     if val == 20 { found_20 = true; }
+//! }
+//!
+//! assert!(found_0 && found_10 && found_20);
+//! ```
+
+use core::{
+    iter::FusedIterator,
+    marker::PhantomData, //
+};
+
+use crate::{
+    bindings,
+    types::Opaque, //
+};
+
+use pin_init::PinInit;
+
+/// Initialize a `list_head` object to point to itself.
+///
+/// # Safety
+///
+/// `list` must be a valid pointer to a `list_head` object.
+#[inline]
+pub unsafe fn init_list_head(list: *mut bindings::list_head) {
+    // SAFETY: Caller guarantees `list` is a valid pointer to a `list_head`.
+    unsafe {
+        (*list).next = list;
+        (*list).prev = list;
+    }
+}
+
+/// Wraps a `list_head` object for use in intrusive linked lists.
+///
+/// # Invariants
+///
+/// - [`CListHead`] represents an allocated and valid `list_head` structure.
+/// - Once a [`CListHead`] is created in Rust, it will not be modified by non-Rust code.
+/// - All `list_head` for individual items are not modified for the lifetime of [`CListHead`].
+#[repr(transparent)]
+pub struct CListHead(Opaque<bindings::list_head>);
+
+impl CListHead {
+    /// Create a `&CListHead` reference from a raw `list_head` pointer.
+    ///
+    /// # Safety
+    ///
+    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure.
+    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
+    #[inline]
+    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a Self {
+        // SAFETY:
+        // - [`CListHead`] has same layout as `list_head`.
+        // - `ptr` is valid and unmodified for 'a.
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Get the raw `list_head` pointer.
+    #[inline]
+    pub fn as_raw(&self) -> *mut bindings::list_head {
+        self.0.get()
+    }
+
+    /// Get the next [`CListHead`] in the list.
+    #[inline]
+    pub fn next(&self) -> &Self {
+        let raw = self.as_raw();
+        // SAFETY:
+        // - `self.as_raw()` is valid per type invariants.
+        // - The `next` pointer is guaranteed to be non-NULL.
+        unsafe { Self::from_raw((*raw).next) }
+    }
+
+    /// Get the previous [`CListHead`] in the list.
+    #[inline]
+    pub fn prev(&self) -> &Self {
+        let raw = self.as_raw();
+        // SAFETY:
+        // - self.as_raw() is valid per type invariants.
+        // - The `prev` pointer is guaranteed to be non-NULL.
+        unsafe { Self::from_raw((*raw).prev) }
+    }
+
+    /// Check if this node is linked in a list (not isolated).
+    #[inline]
+    pub fn is_linked(&self) -> bool {
+        let raw = self.as_raw();
+        // SAFETY: self.as_raw() is valid per type invariants.
+        unsafe { (*raw).next != raw && (*raw).prev != raw }
+    }
+
+    /// Fallible pin-initializer that initializes and then calls user closure.
+    ///
+    /// Initializes the list head first, then passes `&CListHead` to the closure.
+    /// This hides the raw FFI pointer from the user.
+    pub fn try_init<E>(
+        init_func: impl FnOnce(&CListHead) -> Result<(), E>,
+    ) -> impl PinInit<Self, E> {
+        // SAFETY: init_list_head initializes the list_head to point to itself.
+        // After initialization, we create a reference to pass to the closure.
+        unsafe {
+            pin_init::pin_init_from_closure(move |slot: *mut Self| {
+                init_list_head(slot.cast());
+                // SAFETY: slot is now initialized, safe to create reference.
+                init_func(&*slot)
+            })
+        }
+    }
+}
+
+// SAFETY: [`CListHead`] can be sent to any thread.
+unsafe impl Send for CListHead {}
+
+// SAFETY: [`CListHead`] can be shared among threads as it is not modified
+// by non-Rust code per type invariants.
+unsafe impl Sync for CListHead {}
+
+impl PartialEq for CListHead {
+    fn eq(&self, other: &Self) -> bool {
+        self.as_raw() == other.as_raw()
+    }
+}
+
+impl Eq for CListHead {}
+
+/// Low-level iterator over `list_head` nodes.
+///
+/// An iterator used to iterate over a C intrusive linked list (`list_head`). Caller has to
+/// perform conversion of returned [`CListHead`] to an item (using `container_of` macro or similar).
+///
+/// # Invariants
+///
+/// [`CListHeadIter`] is iterating over an allocated, initialized and valid list.
+struct CListHeadIter<'a> {
+    current_head: &'a CListHead,
+    list_head: &'a CListHead,
+}
+
+impl<'a> Iterator for CListHeadIter<'a> {
+    type Item = &'a CListHead;
+
+    #[inline]
+    fn next(&mut self) -> Option<Self::Item> {
+        // Advance to next node.
+        let next = self.current_head.next();
+
+        // Check if we've circled back to the sentinel head.
+        if next == self.list_head {
+            None
+        } else {
+            self.current_head = next;
+            Some(self.current_head)
+        }
+    }
+}
+
+impl<'a> FusedIterator for CListHeadIter<'a> {}
+
+/// A typed C linked list with a sentinel head.
+///
+/// A sentinel head represents the entire linked list and can be used for
+/// iteration over items of type `T`; it is not associated with a specific item.
+///
+/// The const generic `OFFSET` specifies the byte offset of the `list_head` field within
+/// the struct that `T` wraps.
+///
+/// # Invariants
+///
+/// - `head` is an allocated and valid C `list_head` structure that is the list's sentinel.
+/// - `OFFSET` is the byte offset of the `list_head` field within the struct that `T` wraps.
+/// - All the list's `list_head` nodes are allocated and have valid next/prev pointers.
+/// - The underlying `list_head` (and entire list) is not modified for the lifetime `'a`.
+pub struct CList<'a, T, const OFFSET: usize> {
+    head: &'a CListHead,
+    _phantom: PhantomData<&'a T>,
+}
+
+impl<'a, T, const OFFSET: usize> CList<'a, T, OFFSET> {
+    /// Create a typed [`CList`] from a raw sentinel `list_head` pointer.
+    ///
+    /// # Safety
+    ///
+    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure
+    ///   representing a list sentinel.
+    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
+    /// - The list must contain items where the `list_head` field is at byte offset `OFFSET`.
+    /// - `T` must be `#[repr(transparent)]` over the C struct.
+    #[inline]
+    pub unsafe fn from_raw(ptr: *mut bindings::list_head) -> Self {
+        Self {
+            // SAFETY: Caller guarantees `ptr` is a valid, sentinel `list_head` object.
+            head: unsafe { CListHead::from_raw(ptr) },
+            _phantom: PhantomData,
+        }
+    }
+
+    /// Get the raw sentinel `list_head` pointer.
+    #[inline]
+    pub fn as_raw(&self) -> *mut bindings::list_head {
+        self.head.as_raw()
+    }
+
+    /// Check if the list is empty.
+    #[inline]
+    pub fn is_empty(&self) -> bool {
+        let raw = self.as_raw();
+        // SAFETY: self.as_raw() is valid per type invariants.
+        unsafe { (*raw).next == raw }
+    }
+
+    /// Create an iterator over typed items.
+    #[inline]
+    pub fn iter(&self) -> CListIter<'a, T, OFFSET> {
+        CListIter {
+            head_iter: CListHeadIter {
+                current_head: self.head,
+                list_head: self.head,
+            },
+            _phantom: PhantomData,
+        }
+    }
+}
+
+/// High-level iterator over typed list items.
+pub struct CListIter<'a, T, const OFFSET: usize> {
+    head_iter: CListHeadIter<'a>,
+    _phantom: PhantomData<&'a T>,
+}
+
+impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
+    type Item = &'a T;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        let head = self.head_iter.next()?;
+
+        // Convert the node pointer back to the containing item.
+        // SAFETY: `OFFSET` is the byte offset of the embedded `list_head` within
+        // the struct that `T` wraps, so the subtraction yields a valid `T`.
+        Some(unsafe { &*head.as_raw().byte_sub(OFFSET).cast::<T>() })
+    }
+}
+
+impl<'a, T, const OFFSET: usize> FusedIterator for CListIter<'a, T, OFFSET> {}
+
+/// Create a C doubly-circular linked list interface [`CList`] from a raw `list_head` pointer.
+///
+/// This macro creates a [`CList<T, OFFSET>`] that can iterate over items of type `$rust_type`
+/// linked via the `$field` field in the underlying C struct `$c_type`.
+///
+/// # Arguments
+///
+/// - `$head`: Raw pointer to the sentinel `list_head` object (`*mut bindings::list_head`).
+/// - `$rust_type`: Each item's rust wrapper type.
+/// - `$c_type`: Each item's C struct type that contains the embedded `list_head`.
+/// - `$field`: The name of the `list_head` field within the C struct.
+///
+/// # Safety
+///
+/// The caller must ensure:
+/// - `$head` is a valid, initialized sentinel `list_head` pointing to a list that remains
+///   unmodified for the lifetime of the rust [`CList`].
+/// - The list contains items of type `$c_type` linked via an embedded `$field`.
+/// - `$rust_type` is `#[repr(transparent)]` over `$c_type` or has compatible layout.
+/// - The macro is called from an unsafe block.
+///
+/// # Examples
+///
+/// Refer to the examples in the [`crate::clist`] module documentation.
+#[macro_export]
+macro_rules! clist_create {
+    ($head:expr, $rust_type:ty, $c_type:ty, $($field:tt).+) => {{
+        // Compile-time check that field path is a list_head.
+        let _: fn(*const $c_type) -> *const $crate::bindings::list_head =
+            |p| ::core::ptr::addr_of!((*p).$($field).+);
+
+        // Calculate offset and create `CList`.
+        const OFFSET: usize = ::core::mem::offset_of!($c_type, $($field).+);
+        $crate::clist::CList::<$rust_type, OFFSET>::from_raw($head)
+    }};
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index f812cf120042..cd7e6a1055b0 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -75,6 +75,7 @@
 pub mod bug;
 #[doc(hidden)]
 pub mod build_assert;
+pub mod clist;
 pub mod clk;
 #[cfg(CONFIG_CONFIGFS_FS)]
 pub mod configfs;
-- 
2.34.1



* [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-02-05 20:55   ` Dave Airlie
  2026-01-20 20:42 ` [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings Joel Fernandes
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Move the DRM buddy allocator one level up so that it can be used by GPU
drivers (for example, nova-core) that have use cases other than DRM (such
as VFIO vGPU support). Modify the API, structures and Kconfigs to use
"gpu_buddy" terminology, and adapt the drivers and tests to the new API.

The commit cannot be split without breaking bisectability; however, no
functional change is intended. Verified by running KUnit tests and by
build-testing various configurations.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/drm-mm.rst                  |   10 +-
 drivers/gpu/Kconfig                           |   13 +
 drivers/gpu/Makefile                          |    2 +
 drivers/gpu/buddy.c                           | 1310 +++++++++++++++++
 drivers/gpu/drm/Kconfig                       |    1 +
 drivers/gpu/drm/Kconfig.debug                 |    4 +-
 drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       |    2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h    |   12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   80 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |   20 +-
 drivers/gpu/drm/drm_buddy.c                   | 1284 +---------------
 drivers/gpu/drm/i915/Kconfig                  |    1 +
 drivers/gpu/drm/i915/i915_scatterlist.c       |   10 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   55 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |    6 +-
 .../drm/i915/selftests/intel_memory_region.c  |   20 +-
 drivers/gpu/drm/tests/Makefile                |    1 -
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |    5 +-
 drivers/gpu/drm/ttm/tests/ttm_mock_manager.c  |   18 +-
 drivers/gpu/drm/ttm/tests/ttm_mock_manager.h  |    4 +-
 drivers/gpu/drm/xe/Kconfig                    |    1 +
 drivers/gpu/drm/xe/xe_res_cursor.h            |   34 +-
 drivers/gpu/drm/xe/xe_svm.c                   |   12 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |   73 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h    |    4 +-
 drivers/gpu/tests/Makefile                    |    3 +
 .../gpu_buddy_test.c}                         |  390 ++---
 drivers/gpu/tests/gpu_random.c                |   48 +
 drivers/gpu/tests/gpu_random.h                |   28 +
 drivers/video/Kconfig                         |    2 +
 include/drm/drm_buddy.h                       |  163 +-
 include/linux/gpu_buddy.h                     |  177 +++
 33 files changed, 1995 insertions(+), 1799 deletions(-)
 create mode 100644 drivers/gpu/Kconfig
 create mode 100644 drivers/gpu/buddy.c
 create mode 100644 drivers/gpu/tests/Makefile
 rename drivers/gpu/{drm/tests/drm_buddy_test.c => tests/gpu_buddy_test.c} (68%)
 create mode 100644 drivers/gpu/tests/gpu_random.c
 create mode 100644 drivers/gpu/tests/gpu_random.h
 create mode 100644 include/linux/gpu_buddy.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index d55751cad67c..8e0d31230b29 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -509,8 +509,14 @@ DRM GPUVM Function References
 DRM Buddy Allocator
 ===================
 
-DRM Buddy Function References
------------------------------
+Buddy Allocator Function References (GPU buddy)
+-----------------------------------------------
+
+.. kernel-doc:: drivers/gpu/buddy.c
+   :export:
+
+DRM Buddy Specific Logging Function References
+----------------------------------------------
 
 .. kernel-doc:: drivers/gpu/drm/drm_buddy.c
    :export:
diff --git a/drivers/gpu/Kconfig b/drivers/gpu/Kconfig
new file mode 100644
index 000000000000..22dd29cd50b5
--- /dev/null
+++ b/drivers/gpu/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config GPU_BUDDY
+	bool
+	help
+	  A page based buddy allocator for GPU memory.
+
+config GPU_BUDDY_KUNIT_TEST
+	tristate "KUnit tests for GPU buddy allocator" if !KUNIT_ALL_TESTS
+	depends on GPU_BUDDY && KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  KUnit tests for the GPU buddy allocator.
diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index 36a54d456630..5063caccabdf 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -6,3 +6,5 @@ obj-y			+= host1x/ drm/ vga/
 obj-$(CONFIG_IMX_IPUV3_CORE)	+= ipu-v3/
 obj-$(CONFIG_TRACE_GPU_MEM)		+= trace/
 obj-$(CONFIG_NOVA_CORE)		+= nova-core/
+obj-$(CONFIG_GPU_BUDDY)		+= buddy.o
+obj-y				+= tests/
diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
new file mode 100644
index 000000000000..1347c0436617
--- /dev/null
+++ b/drivers/gpu/buddy.c
@@ -0,0 +1,1310 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include <kunit/test-bug.h>
+
+#include <linux/export.h>
+#include <linux/gpu_buddy.h>
+#include <linux/kmemleak.h>
+#include <linux/module.h>
+#include <linux/sizes.h>
+
+static struct kmem_cache *slab_blocks;
+
+static struct gpu_buddy_block *gpu_block_alloc(struct gpu_buddy *mm,
+					       struct gpu_buddy_block *parent,
+					       unsigned int order,
+					       u64 offset)
+{
+	struct gpu_buddy_block *block;
+
+	BUG_ON(order > GPU_BUDDY_MAX_ORDER);
+
+	block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL);
+	if (!block)
+		return NULL;
+
+	block->header = offset;
+	block->header |= order;
+	block->parent = parent;
+
+	RB_CLEAR_NODE(&block->rb);
+
+	BUG_ON(block->header & GPU_BUDDY_HEADER_UNUSED);
+	return block;
+}
+
+static void gpu_block_free(struct gpu_buddy *mm,
+			   struct gpu_buddy_block *block)
+{
+	kmem_cache_free(slab_blocks, block);
+}
+
+static enum gpu_buddy_free_tree
+get_block_tree(struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_is_clear(block) ?
+	       GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
+}
+
+static struct gpu_buddy_block *
+rbtree_get_free_block(const struct rb_node *node)
+{
+	return node ? rb_entry(node, struct gpu_buddy_block, rb) : NULL;
+}
+
+static struct gpu_buddy_block *
+rbtree_last_free_block(struct rb_root *root)
+{
+	return rbtree_get_free_block(rb_last(root));
+}
+
+static bool rbtree_is_empty(struct rb_root *root)
+{
+	return RB_EMPTY_ROOT(root);
+}
+
+static bool gpu_buddy_block_offset_less(const struct gpu_buddy_block *block,
+					const struct gpu_buddy_block *node)
+{
+	return gpu_buddy_block_offset(block) < gpu_buddy_block_offset(node);
+}
+
+static bool rbtree_block_offset_less(struct rb_node *block,
+				     const struct rb_node *node)
+{
+	return gpu_buddy_block_offset_less(rbtree_get_free_block(block),
+					   rbtree_get_free_block(node));
+}
+
+static void rbtree_insert(struct gpu_buddy *mm,
+			  struct gpu_buddy_block *block,
+			  enum gpu_buddy_free_tree tree)
+{
+	rb_add(&block->rb,
+	       &mm->free_trees[tree][gpu_buddy_block_order(block)],
+	       rbtree_block_offset_less);
+}
+
+static void rbtree_remove(struct gpu_buddy *mm,
+			  struct gpu_buddy_block *block)
+{
+	unsigned int order = gpu_buddy_block_order(block);
+	enum gpu_buddy_free_tree tree;
+	struct rb_root *root;
+
+	tree = get_block_tree(block);
+	root = &mm->free_trees[tree][order];
+
+	rb_erase(&block->rb, root);
+	RB_CLEAR_NODE(&block->rb);
+}
+
+static void clear_reset(struct gpu_buddy_block *block)
+{
+	block->header &= ~GPU_BUDDY_HEADER_CLEAR;
+}
+
+static void mark_cleared(struct gpu_buddy_block *block)
+{
+	block->header |= GPU_BUDDY_HEADER_CLEAR;
+}
+
+static void mark_allocated(struct gpu_buddy *mm,
+			   struct gpu_buddy_block *block)
+{
+	block->header &= ~GPU_BUDDY_HEADER_STATE;
+	block->header |= GPU_BUDDY_ALLOCATED;
+
+	rbtree_remove(mm, block);
+}
+
+static void mark_free(struct gpu_buddy *mm,
+		      struct gpu_buddy_block *block)
+{
+	enum gpu_buddy_free_tree tree;
+
+	block->header &= ~GPU_BUDDY_HEADER_STATE;
+	block->header |= GPU_BUDDY_FREE;
+
+	tree = get_block_tree(block);
+	rbtree_insert(mm, block, tree);
+}
+
+static void mark_split(struct gpu_buddy *mm,
+		       struct gpu_buddy_block *block)
+{
+	block->header &= ~GPU_BUDDY_HEADER_STATE;
+	block->header |= GPU_BUDDY_SPLIT;
+
+	rbtree_remove(mm, block);
+}
+
+static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
+{
+	return s1 <= e2 && e1 >= s2;
+}
+
+static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2)
+{
+	return s1 <= s2 && e1 >= e2;
+}
+
+static struct gpu_buddy_block *
+__get_buddy(struct gpu_buddy_block *block)
+{
+	struct gpu_buddy_block *parent;
+
+	parent = block->parent;
+	if (!parent)
+		return NULL;
+
+	if (parent->left == block)
+		return parent->right;
+
+	return parent->left;
+}
+
+static unsigned int __gpu_buddy_free(struct gpu_buddy *mm,
+				     struct gpu_buddy_block *block,
+				     bool force_merge)
+{
+	struct gpu_buddy_block *parent;
+	unsigned int order;
+
+	while ((parent = block->parent)) {
+		struct gpu_buddy_block *buddy;
+
+		buddy = __get_buddy(block);
+
+		if (!gpu_buddy_block_is_free(buddy))
+			break;
+
+		if (!force_merge) {
+			/*
+			 * Check the clear state of the block and its buddy and
+			 * exit the loop if their states differ.
+			 */
+			if (gpu_buddy_block_is_clear(block) !=
+			    gpu_buddy_block_is_clear(buddy))
+				break;
+
+			if (gpu_buddy_block_is_clear(block))
+				mark_cleared(parent);
+		}
+
+		rbtree_remove(mm, buddy);
+		if (force_merge && gpu_buddy_block_is_clear(buddy))
+			mm->clear_avail -= gpu_buddy_block_size(mm, buddy);
+
+		gpu_block_free(mm, block);
+		gpu_block_free(mm, buddy);
+
+		block = parent;
+	}
+
+	order = gpu_buddy_block_order(block);
+	mark_free(mm, block);
+
+	return order;
+}
+
+static int __force_merge(struct gpu_buddy *mm,
+			 u64 start,
+			 u64 end,
+			 unsigned int min_order)
+{
+	unsigned int tree, order;
+	int i;
+
+	if (!min_order)
+		return -ENOMEM;
+
+	if (min_order > mm->max_order)
+		return -EINVAL;
+
+	for_each_free_tree(tree) {
+		for (i = min_order - 1; i >= 0; i--) {
+			struct rb_node *iter = rb_last(&mm->free_trees[tree][i]);
+
+			while (iter) {
+				struct gpu_buddy_block *block, *buddy;
+				u64 block_start, block_end;
+
+				block = rbtree_get_free_block(iter);
+				iter = rb_prev(iter);
+
+				if (!block || !block->parent)
+					continue;
+
+				block_start = gpu_buddy_block_offset(block);
+				block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
+
+				if (!contains(start, end, block_start, block_end))
+					continue;
+
+				buddy = __get_buddy(block);
+				if (!gpu_buddy_block_is_free(buddy))
+					continue;
+
+				WARN_ON(gpu_buddy_block_is_clear(block) ==
+					gpu_buddy_block_is_clear(buddy));
+
+				/*
+				 * Advance to the next node when the current node is the buddy,
+				 * as freeing the block will also remove its buddy from the tree.
+				 */
+				if (iter == &buddy->rb)
+					iter = rb_prev(iter);
+
+				rbtree_remove(mm, block);
+				if (gpu_buddy_block_is_clear(block))
+					mm->clear_avail -= gpu_buddy_block_size(mm, block);
+
+				order = __gpu_buddy_free(mm, block, true);
+				if (order >= min_order)
+					return 0;
+			}
+		}
+	}
+
+	return -ENOMEM;
+}
+
+/**
+ * gpu_buddy_init - init memory manager
+ *
+ * @mm: GPU buddy manager to initialize
+ * @size: size in bytes to manage
+ * @chunk_size: minimum page size in bytes for our allocations
+ *
+ * Initializes the memory manager and its resources.
+ *
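+ * A minimal setup/teardown sketch (the 1 GiB size and 4 KiB chunk size
+ * below are hypothetical, for illustration only):
+ *
+ * .. code-block:: c
+ *
+ *	struct gpu_buddy mm;
+ *	int err;
+ *
+ *	err = gpu_buddy_init(&mm, SZ_1G, SZ_4K);
+ *	if (err)
+ *		return err;
+ *	... allocate and free blocks ...
+ *	gpu_buddy_fini(&mm);
+ *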
+ * Returns:
+ * 0 on success, error code on failure.
+ */
+int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
+{
+	unsigned int i, j, root_count = 0;
+	u64 offset = 0;
+
+	if (size < chunk_size)
+		return -EINVAL;
+
+	if (chunk_size < SZ_4K)
+		return -EINVAL;
+
+	if (!is_power_of_2(chunk_size))
+		return -EINVAL;
+
+	size = round_down(size, chunk_size);
+
+	mm->size = size;
+	mm->avail = size;
+	mm->clear_avail = 0;
+	mm->chunk_size = chunk_size;
+	mm->max_order = ilog2(size) - ilog2(chunk_size);
+
+	BUG_ON(mm->max_order > GPU_BUDDY_MAX_ORDER);
+
+	mm->free_trees = kmalloc_array(GPU_BUDDY_MAX_FREE_TREES,
+				       sizeof(*mm->free_trees),
+				       GFP_KERNEL);
+	if (!mm->free_trees)
+		return -ENOMEM;
+
+	for_each_free_tree(i) {
+		mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
+						  sizeof(struct rb_root),
+						  GFP_KERNEL);
+		if (!mm->free_trees[i])
+			goto out_free_tree;
+
+		for (j = 0; j <= mm->max_order; ++j)
+			mm->free_trees[i][j] = RB_ROOT;
+	}
+
+	mm->n_roots = hweight64(size);
+
+	mm->roots = kmalloc_array(mm->n_roots,
+				  sizeof(struct gpu_buddy_block *),
+				  GFP_KERNEL);
+	if (!mm->roots)
+		goto out_free_tree;
+
+	/*
+	 * Split into power-of-two blocks, in case we are given a size that is
+	 * not itself a power-of-two.
+	 */
+	do {
+		struct gpu_buddy_block *root;
+		unsigned int order;
+		u64 root_size;
+
+		order = ilog2(size) - ilog2(chunk_size);
+		root_size = chunk_size << order;
+
+		root = gpu_block_alloc(mm, NULL, order, offset);
+		if (!root)
+			goto out_free_roots;
+
+		mark_free(mm, root);
+
+		BUG_ON(root_count > mm->max_order);
+		BUG_ON(gpu_buddy_block_size(mm, root) < chunk_size);
+
+		mm->roots[root_count] = root;
+
+		offset += root_size;
+		size -= root_size;
+		root_count++;
+	} while (size);
+
+	return 0;
+
+out_free_roots:
+	while (root_count--)
+		gpu_block_free(mm, mm->roots[root_count]);
+	kfree(mm->roots);
+out_free_tree:
+	while (i--)
+		kfree(mm->free_trees[i]);
+	kfree(mm->free_trees);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(gpu_buddy_init);
+
+/**
+ * gpu_buddy_fini - tear down the memory manager
+ *
+ * @mm: GPU buddy manager to free
+ *
+ * Clean up memory manager resources and the free trees.
+ */
+void gpu_buddy_fini(struct gpu_buddy *mm)
+{
+	u64 root_size, size, start;
+	unsigned int order;
+	int i;
+
+	size = mm->size;
+
+	for (i = 0; i < mm->n_roots; ++i) {
+		order = ilog2(size) - ilog2(mm->chunk_size);
+		start = gpu_buddy_block_offset(mm->roots[i]);
+		__force_merge(mm, start, start + size, order);
+
+		if (WARN_ON(!gpu_buddy_block_is_free(mm->roots[i])))
+			kunit_fail_current_test("buddy_fini() root");
+
+		gpu_block_free(mm, mm->roots[i]);
+
+		root_size = mm->chunk_size << order;
+		size -= root_size;
+	}
+
+	WARN_ON(mm->avail != mm->size);
+
+	for_each_free_tree(i)
+		kfree(mm->free_trees[i]);
+	kfree(mm->free_trees);
+	kfree(mm->roots);
+}
+EXPORT_SYMBOL(gpu_buddy_fini);
+
+static int split_block(struct gpu_buddy *mm,
+		       struct gpu_buddy_block *block)
+{
+	unsigned int block_order = gpu_buddy_block_order(block) - 1;
+	u64 offset = gpu_buddy_block_offset(block);
+
+	BUG_ON(!gpu_buddy_block_is_free(block));
+	BUG_ON(!gpu_buddy_block_order(block));
+
+	block->left = gpu_block_alloc(mm, block, block_order, offset);
+	if (!block->left)
+		return -ENOMEM;
+
+	block->right = gpu_block_alloc(mm, block, block_order,
+				       offset + (mm->chunk_size << block_order));
+	if (!block->right) {
+		gpu_block_free(mm, block->left);
+		return -ENOMEM;
+	}
+
+	mark_split(mm, block);
+
+	if (gpu_buddy_block_is_clear(block)) {
+		mark_cleared(block->left);
+		mark_cleared(block->right);
+		clear_reset(block);
+	}
+
+	mark_free(mm, block->left);
+	mark_free(mm, block->right);
+
+	return 0;
+}
+
+/**
+ * gpu_get_buddy - get buddy address
+ *
+ * @block: GPU buddy block
+ *
+ * Returns the corresponding buddy block for @block, or NULL
+ * if this is a root block and can't be merged further.
+ * The caller must hold a lock that protects against concurrent
+ * allocate and free operations.
+ */
+struct gpu_buddy_block *
+gpu_get_buddy(struct gpu_buddy_block *block)
+{
+	return __get_buddy(block);
+}
+EXPORT_SYMBOL(gpu_get_buddy);
+
+/**
+ * gpu_buddy_reset_clear - reset blocks clear state
+ *
+ * @mm: GPU buddy manager
+ * @is_clear: blocks clear state
+ *
+ * Reset the clear state based on @is_clear value for each block
+ * in the freetree.
+ */
+void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear)
+{
+	enum gpu_buddy_free_tree src_tree, dst_tree;
+	u64 root_size, size, start;
+	unsigned int order;
+	int i;
+
+	size = mm->size;
+	for (i = 0; i < mm->n_roots; ++i) {
+		order = ilog2(size) - ilog2(mm->chunk_size);
+		start = gpu_buddy_block_offset(mm->roots[i]);
+		__force_merge(mm, start, start + size, order);
+
+		root_size = mm->chunk_size << order;
+		size -= root_size;
+	}
+
+	src_tree = is_clear ? GPU_BUDDY_DIRTY_TREE : GPU_BUDDY_CLEAR_TREE;
+	dst_tree = is_clear ? GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
+
+	for (i = 0; i <= mm->max_order; ++i) {
+		struct rb_root *root = &mm->free_trees[src_tree][i];
+		struct gpu_buddy_block *block, *tmp;
+
+		rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
+			rbtree_remove(mm, block);
+			if (is_clear) {
+				mark_cleared(block);
+				mm->clear_avail += gpu_buddy_block_size(mm, block);
+			} else {
+				clear_reset(block);
+				mm->clear_avail -= gpu_buddy_block_size(mm, block);
+			}
+
+			rbtree_insert(mm, block, dst_tree);
+		}
+	}
+}
+EXPORT_SYMBOL(gpu_buddy_reset_clear);
+
+/**
+ * gpu_buddy_free_block - free a block
+ *
+ * @mm: GPU buddy manager
+ * @block: block to be freed
+ */
+void gpu_buddy_free_block(struct gpu_buddy *mm,
+			  struct gpu_buddy_block *block)
+{
+	BUG_ON(!gpu_buddy_block_is_allocated(block));
+	mm->avail += gpu_buddy_block_size(mm, block);
+	if (gpu_buddy_block_is_clear(block))
+		mm->clear_avail += gpu_buddy_block_size(mm, block);
+
+	__gpu_buddy_free(mm, block, false);
+}
+EXPORT_SYMBOL(gpu_buddy_free_block);
+
+static void __gpu_buddy_free_list(struct gpu_buddy *mm,
+				  struct list_head *objects,
+				  bool mark_clear,
+				  bool mark_dirty)
+{
+	struct gpu_buddy_block *block, *on;
+
+	WARN_ON(mark_dirty && mark_clear);
+
+	list_for_each_entry_safe(block, on, objects, link) {
+		if (mark_clear)
+			mark_cleared(block);
+		else if (mark_dirty)
+			clear_reset(block);
+		gpu_buddy_free_block(mm, block);
+		cond_resched();
+	}
+	INIT_LIST_HEAD(objects);
+}
+
+static void gpu_buddy_free_list_internal(struct gpu_buddy *mm,
+					 struct list_head *objects)
+{
+	/*
+	 * Don't touch the clear/dirty bit, since allocation is still internal
+	 * at this point. For example, we might have just failed part of the
+	 * allocation.
+	 */
+	__gpu_buddy_free_list(mm, objects, false, false);
+}
+
+/**
+ * gpu_buddy_free_list - free blocks
+ *
+ * @mm: GPU buddy manager
+ * @objects: input list head to free blocks
+ * @flags: optional flags like GPU_BUDDY_CLEARED
+ */
+void gpu_buddy_free_list(struct gpu_buddy *mm,
+			 struct list_head *objects,
+			 unsigned int flags)
+{
+	bool mark_clear = flags & GPU_BUDDY_CLEARED;
+
+	__gpu_buddy_free_list(mm, objects, mark_clear, !mark_clear);
+}
+EXPORT_SYMBOL(gpu_buddy_free_list);
+
+static bool block_incompatible(struct gpu_buddy_block *block, unsigned int flags)
+{
+	bool needs_clear = flags & GPU_BUDDY_CLEAR_ALLOCATION;
+
+	return needs_clear != gpu_buddy_block_is_clear(block);
+}
+
+static struct gpu_buddy_block *
+__alloc_range_bias(struct gpu_buddy *mm,
+		   u64 start, u64 end,
+		   unsigned int order,
+		   unsigned long flags,
+		   bool fallback)
+{
+	u64 req_size = mm->chunk_size << order;
+	struct gpu_buddy_block *block;
+	struct gpu_buddy_block *buddy;
+	LIST_HEAD(dfs);
+	int err;
+	int i;
+
+	end = end - 1;
+
+	for (i = 0; i < mm->n_roots; ++i)
+		list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+	do {
+		u64 block_start;
+		u64 block_end;
+
+		block = list_first_entry_or_null(&dfs,
+						 struct gpu_buddy_block,
+						 tmp_link);
+		if (!block)
+			break;
+
+		list_del(&block->tmp_link);
+
+		if (gpu_buddy_block_order(block) < order)
+			continue;
+
+		block_start = gpu_buddy_block_offset(block);
+		block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
+
+		if (!overlaps(start, end, block_start, block_end))
+			continue;
+
+		if (gpu_buddy_block_is_allocated(block))
+			continue;
+
+		if (block_start < start || block_end > end) {
+			u64 adjusted_start = max(block_start, start);
+			u64 adjusted_end = min(block_end, end);
+
+			if (round_down(adjusted_end + 1, req_size) <=
+			    round_up(adjusted_start, req_size))
+				continue;
+		}
+
+		if (!fallback && block_incompatible(block, flags))
+			continue;
+
+		if (contains(start, end, block_start, block_end) &&
+		    order == gpu_buddy_block_order(block)) {
+			/*
+			 * Find the free block within the range.
+			 */
+			if (gpu_buddy_block_is_free(block))
+				return block;
+
+			continue;
+		}
+
+		if (!gpu_buddy_block_is_split(block)) {
+			err = split_block(mm, block);
+			if (unlikely(err))
+				goto err_undo;
+		}
+
+		list_add(&block->right->tmp_link, &dfs);
+		list_add(&block->left->tmp_link, &dfs);
+	} while (1);
+
+	return ERR_PTR(-ENOSPC);
+
+err_undo:
+	/*
+	 * We really don't want to leave around a bunch of split blocks, since
+	 * bigger is better, so make sure we merge everything back before we
+	 * free the allocated blocks.
+	 */
+	buddy = __get_buddy(block);
+	if (buddy &&
+	    (gpu_buddy_block_is_free(block) &&
+	     gpu_buddy_block_is_free(buddy)))
+		__gpu_buddy_free(mm, block, false);
+	return ERR_PTR(err);
+}
+
+static struct gpu_buddy_block *
+__gpu_buddy_alloc_range_bias(struct gpu_buddy *mm,
+			     u64 start, u64 end,
+			     unsigned int order,
+			     unsigned long flags)
+{
+	struct gpu_buddy_block *block;
+	bool fallback = false;
+
+	block = __alloc_range_bias(mm, start, end, order,
+				   flags, fallback);
+	if (IS_ERR(block))
+		return __alloc_range_bias(mm, start, end, order,
+					  flags, !fallback);
+
+	return block;
+}
+
+static struct gpu_buddy_block *
+get_maxblock(struct gpu_buddy *mm,
+	     unsigned int order,
+	     enum gpu_buddy_free_tree tree)
+{
+	struct gpu_buddy_block *max_block = NULL, *block = NULL;
+	struct rb_root *root;
+	unsigned int i;
+
+	for (i = order; i <= mm->max_order; ++i) {
+		root = &mm->free_trees[tree][i];
+		block = rbtree_last_free_block(root);
+		if (!block)
+			continue;
+
+		if (!max_block) {
+			max_block = block;
+			continue;
+		}
+
+		if (gpu_buddy_block_offset(block) >
+		    gpu_buddy_block_offset(max_block)) {
+			max_block = block;
+		}
+	}
+
+	return max_block;
+}
+
+static struct gpu_buddy_block *
+alloc_from_freetree(struct gpu_buddy *mm,
+		    unsigned int order,
+		    unsigned long flags)
+{
+	struct gpu_buddy_block *block = NULL;
+	struct rb_root *root;
+	enum gpu_buddy_free_tree tree;
+	unsigned int tmp;
+	int err;
+
+	tree = (flags & GPU_BUDDY_CLEAR_ALLOCATION) ?
+		GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
+
+	if (flags & GPU_BUDDY_TOPDOWN_ALLOCATION) {
+		block = get_maxblock(mm, order, tree);
+		if (block)
+			/* Store the obtained block order */
+			tmp = gpu_buddy_block_order(block);
+	} else {
+		for (tmp = order; tmp <= mm->max_order; ++tmp) {
+			/* Get RB tree root for this order and tree */
+			root = &mm->free_trees[tree][tmp];
+			block = rbtree_last_free_block(root);
+			if (block)
+				break;
+		}
+	}
+
+	if (!block) {
+		/* Try allocating from the other tree */
+		tree = (tree == GPU_BUDDY_CLEAR_TREE) ?
+			GPU_BUDDY_DIRTY_TREE : GPU_BUDDY_CLEAR_TREE;
+
+		for (tmp = order; tmp <= mm->max_order; ++tmp) {
+			root = &mm->free_trees[tree][tmp];
+			block = rbtree_last_free_block(root);
+			if (block)
+				break;
+		}
+
+		if (!block)
+			return ERR_PTR(-ENOSPC);
+	}
+
+	BUG_ON(!gpu_buddy_block_is_free(block));
+
+	while (tmp != order) {
+		err = split_block(mm, block);
+		if (unlikely(err))
+			goto err_undo;
+
+		block = block->right;
+		tmp--;
+	}
+	return block;
+
+err_undo:
+	if (tmp != order)
+		__gpu_buddy_free(mm, block, false);
+	return ERR_PTR(err);
+}
+
+static int __alloc_range(struct gpu_buddy *mm,
+			 struct list_head *dfs,
+			 u64 start, u64 size,
+			 struct list_head *blocks,
+			 u64 *total_allocated_on_err)
+{
+	struct gpu_buddy_block *block;
+	struct gpu_buddy_block *buddy;
+	u64 total_allocated = 0;
+	LIST_HEAD(allocated);
+	u64 end;
+	int err;
+
+	end = start + size - 1;
+
+	do {
+		u64 block_start;
+		u64 block_end;
+
+		block = list_first_entry_or_null(dfs,
+						 struct gpu_buddy_block,
+						 tmp_link);
+		if (!block)
+			break;
+
+		list_del(&block->tmp_link);
+
+		block_start = gpu_buddy_block_offset(block);
+		block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
+
+		if (!overlaps(start, end, block_start, block_end))
+			continue;
+
+		if (gpu_buddy_block_is_allocated(block)) {
+			err = -ENOSPC;
+			goto err_free;
+		}
+
+		if (contains(start, end, block_start, block_end)) {
+			if (gpu_buddy_block_is_free(block)) {
+				mark_allocated(mm, block);
+				total_allocated += gpu_buddy_block_size(mm, block);
+				mm->avail -= gpu_buddy_block_size(mm, block);
+				if (gpu_buddy_block_is_clear(block))
+					mm->clear_avail -= gpu_buddy_block_size(mm, block);
+				list_add_tail(&block->link, &allocated);
+				continue;
+			} else if (!mm->clear_avail) {
+				err = -ENOSPC;
+				goto err_free;
+			}
+		}
+
+		if (!gpu_buddy_block_is_split(block)) {
+			err = split_block(mm, block);
+			if (unlikely(err))
+				goto err_undo;
+		}
+
+		list_add(&block->right->tmp_link, dfs);
+		list_add(&block->left->tmp_link, dfs);
+	} while (1);
+
+	if (total_allocated < size) {
+		err = -ENOSPC;
+		goto err_free;
+	}
+
+	list_splice_tail(&allocated, blocks);
+
+	return 0;
+
+err_undo:
+	/*
+	 * We really don't want to leave around a bunch of split blocks, since
+	 * bigger is better, so make sure we merge everything back before we
+	 * free the allocated blocks.
+	 */
+	buddy = __get_buddy(block);
+	if (buddy &&
+	    (gpu_buddy_block_is_free(block) &&
+	     gpu_buddy_block_is_free(buddy)))
+		__gpu_buddy_free(mm, block, false);
+
+err_free:
+	if (err == -ENOSPC && total_allocated_on_err) {
+		list_splice_tail(&allocated, blocks);
+		*total_allocated_on_err = total_allocated;
+	} else {
+		gpu_buddy_free_list_internal(mm, &allocated);
+	}
+
+	return err;
+}
+
+static int __gpu_buddy_alloc_range(struct gpu_buddy *mm,
+				   u64 start,
+				   u64 size,
+				   u64 *total_allocated_on_err,
+				   struct list_head *blocks)
+{
+	LIST_HEAD(dfs);
+	int i;
+
+	for (i = 0; i < mm->n_roots; ++i)
+		list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+	return __alloc_range(mm, &dfs, start, size,
+			     blocks, total_allocated_on_err);
+}
+
+static int __alloc_contig_try_harder(struct gpu_buddy *mm,
+				     u64 size,
+				     u64 min_block_size,
+				     struct list_head *blocks)
+{
+	u64 rhs_offset, lhs_offset, lhs_size, filled;
+	struct gpu_buddy_block *block;
+	unsigned int tree, order;
+	LIST_HEAD(blocks_lhs);
+	unsigned long pages;
+	u64 modify_size;
+	int err;
+
+	modify_size = rounddown_pow_of_two(size);
+	pages = modify_size >> ilog2(mm->chunk_size);
+	order = fls(pages) - 1;
+	if (order == 0)
+		return -ENOSPC;
+
+	for_each_free_tree(tree) {
+		struct rb_root *root;
+		struct rb_node *iter;
+
+		root = &mm->free_trees[tree][order];
+		if (rbtree_is_empty(root))
+			continue;
+
+		iter = rb_last(root);
+		while (iter) {
+			block = rbtree_get_free_block(iter);
+
+			/* Allocate blocks traversing RHS */
+			rhs_offset = gpu_buddy_block_offset(block);
+			err = __gpu_buddy_alloc_range(mm, rhs_offset, size,
+						      &filled, blocks);
+			if (!err || err != -ENOSPC)
+				return err;
+
+			lhs_size = max((size - filled), min_block_size);
+			if (!IS_ALIGNED(lhs_size, min_block_size))
+				lhs_size = round_up(lhs_size, min_block_size);
+
+			/* Allocate blocks traversing LHS */
+			lhs_offset = gpu_buddy_block_offset(block) - lhs_size;
+			err = __gpu_buddy_alloc_range(mm, lhs_offset, lhs_size,
+						      NULL, &blocks_lhs);
+			if (!err) {
+				list_splice(&blocks_lhs, blocks);
+				return 0;
+			} else if (err != -ENOSPC) {
+				gpu_buddy_free_list_internal(mm, blocks);
+				return err;
+			}
+			/* Free blocks for the next iteration */
+			gpu_buddy_free_list_internal(mm, blocks);
+
+			iter = rb_prev(iter);
+		}
+	}
+
+	return -ENOSPC;
+}
+
+/**
+ * gpu_buddy_block_trim - free unused pages
+ *
+ * @mm: GPU buddy manager
+ * @start: start address to begin the trimming.
+ * @new_size: original size requested
+ * @blocks: Input and output list of allocated blocks.
+ * MUST contain a single block as input to be trimmed.
+ * On success it will contain the newly allocated blocks
+ * making up the @new_size. Blocks always appear in
+ * ascending order.
+ *
+ * For contiguous allocations, we round up the size to the nearest
+ * power-of-two value; drivers consume the *actual* size, so any
+ * remaining portion is unused and can optionally be freed with this function.
+ *
+ * Returns:
+ * 0 on success, error code on failure.
+ */
+int gpu_buddy_block_trim(struct gpu_buddy *mm,
+			 u64 *start,
+			 u64 new_size,
+			 struct list_head *blocks)
+{
+	struct gpu_buddy_block *parent;
+	struct gpu_buddy_block *block;
+	u64 block_start, block_end;
+	LIST_HEAD(dfs);
+	u64 new_start;
+	int err;
+
+	if (!list_is_singular(blocks))
+		return -EINVAL;
+
+	block = list_first_entry(blocks,
+				 struct gpu_buddy_block,
+				 link);
+
+	block_start = gpu_buddy_block_offset(block);
+	block_end = block_start + gpu_buddy_block_size(mm, block);
+
+	if (WARN_ON(!gpu_buddy_block_is_allocated(block)))
+		return -EINVAL;
+
+	if (new_size > gpu_buddy_block_size(mm, block))
+		return -EINVAL;
+
+	if (!new_size || !IS_ALIGNED(new_size, mm->chunk_size))
+		return -EINVAL;
+
+	if (new_size == gpu_buddy_block_size(mm, block))
+		return 0;
+
+	new_start = block_start;
+	if (start) {
+		new_start = *start;
+
+		if (new_start < block_start)
+			return -EINVAL;
+
+		if (!IS_ALIGNED(new_start, mm->chunk_size))
+			return -EINVAL;
+
+		if (range_overflows(new_start, new_size, block_end))
+			return -EINVAL;
+	}
+
+	list_del(&block->link);
+	mark_free(mm, block);
+	mm->avail += gpu_buddy_block_size(mm, block);
+	if (gpu_buddy_block_is_clear(block))
+		mm->clear_avail += gpu_buddy_block_size(mm, block);
+
+	/* Prevent recursively freeing this node */
+	parent = block->parent;
+	block->parent = NULL;
+
+	list_add(&block->tmp_link, &dfs);
+	err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL);
+	if (err) {
+		mark_allocated(mm, block);
+		mm->avail -= gpu_buddy_block_size(mm, block);
+		if (gpu_buddy_block_is_clear(block))
+			mm->clear_avail -= gpu_buddy_block_size(mm, block);
+		list_add(&block->link, blocks);
+	}
+
+	block->parent = parent;
+	return err;
+}
+EXPORT_SYMBOL(gpu_buddy_block_trim);
+
+static struct gpu_buddy_block *
+__gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
+			 u64 start, u64 end,
+			 unsigned int order,
+			 unsigned long flags)
+{
+	if (flags & GPU_BUDDY_RANGE_ALLOCATION)
+		/* Allocate traversing within the range */
+		return __gpu_buddy_alloc_range_bias(mm, start, end,
+						    order, flags);
+	else
+		/* Allocate from freetree */
+		return alloc_from_freetree(mm, order, flags);
+}
+
+/**
+ * gpu_buddy_alloc_blocks - allocate power-of-two blocks
+ *
+ * @mm: GPU buddy manager to allocate from
+ * @start: start of the allowed range for this block
+ * @end: end of the allowed range for this block
+ * @size: size of the allocation in bytes
+ * @min_block_size: alignment of the allocation
+ * @blocks: output list head to add allocated blocks
+ * @flags: GPU_BUDDY_*_ALLOCATION flags
+ *
+ * alloc_range_bias() is called when range limitations are enforced;
+ * it traverses the tree and returns the desired block.
+ *
+ * alloc_from_freetree() is called when *no* range restrictions
+ * are enforced; it picks a block from the freetree.
+ *
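+ * A minimal usage sketch (sizes are hypothetical and the caller is
+ * assumed to provide its own locking):
+ *
+ * .. code-block:: c
+ *
+ *	LIST_HEAD(blocks);
+ *	int err;
+ *
+ *	err = gpu_buddy_alloc_blocks(mm, 0, mm->size, SZ_64K, SZ_4K,
+ *				     &blocks, 0);
+ *	if (err)
+ *		return err;
+ *	... use the blocks ...
+ *	gpu_buddy_free_list(mm, &blocks, 0);
+ *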
+ * Returns:
+ * 0 on success, error code on failure.
+ */
+int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
+			   u64 start, u64 end, u64 size,
+			   u64 min_block_size,
+			   struct list_head *blocks,
+			   unsigned long flags)
+{
+	struct gpu_buddy_block *block = NULL;
+	u64 original_size, original_min_size;
+	unsigned int min_order, order;
+	LIST_HEAD(allocated);
+	unsigned long pages;
+	int err;
+
+	if (size < mm->chunk_size)
+		return -EINVAL;
+
+	if (min_block_size < mm->chunk_size)
+		return -EINVAL;
+
+	if (!is_power_of_2(min_block_size))
+		return -EINVAL;
+
+	if (!IS_ALIGNED(start | end | size, mm->chunk_size))
+		return -EINVAL;
+
+	if (end > mm->size)
+		return -EINVAL;
+
+	if (range_overflows(start, size, mm->size))
+		return -EINVAL;
+
+	/* Actual range allocation */
+	if (start + size == end) {
+		if (!IS_ALIGNED(start | end, min_block_size))
+			return -EINVAL;
+
+		return __gpu_buddy_alloc_range(mm, start, size, NULL, blocks);
+	}
+
+	original_size = size;
+	original_min_size = min_block_size;
+
+	/* Roundup the size to power of 2 */
+	if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION) {
+		size = roundup_pow_of_two(size);
+		min_block_size = size;
+	/* Align size value to min_block_size */
+	} else if (!IS_ALIGNED(size, min_block_size)) {
+		size = round_up(size, min_block_size);
+	}
+
+	pages = size >> ilog2(mm->chunk_size);
+	order = fls(pages) - 1;
+	min_order = ilog2(min_block_size) - ilog2(mm->chunk_size);
+
+	do {
+		order = min(order, (unsigned int)fls(pages) - 1);
+		BUG_ON(order > mm->max_order);
+		BUG_ON(order < min_order);
+
+		do {
+			block = __gpu_buddy_alloc_blocks(mm, start,
+							 end,
+							 order,
+							 flags);
+			if (!IS_ERR(block))
+				break;
+
+			if (order-- == min_order) {
+				/* Try allocation through force merge method */
+				if (mm->clear_avail &&
+				    !__force_merge(mm, start, end, min_order)) {
+					block = __gpu_buddy_alloc_blocks(mm, start,
+									 end,
+									 min_order,
+									 flags);
+					if (!IS_ERR(block)) {
+						order = min_order;
+						break;
+					}
+				}
+
+				/*
+				 * Try contiguous block allocation through
+				 * try harder method.
+				 */
+				if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION &&
+				    !(flags & GPU_BUDDY_RANGE_ALLOCATION))
+					return __alloc_contig_try_harder(mm,
+									 original_size,
+									 original_min_size,
+									 blocks);
+				err = -ENOSPC;
+				goto err_free;
+			}
+		} while (1);
+
+		mark_allocated(mm, block);
+		mm->avail -= gpu_buddy_block_size(mm, block);
+		if (gpu_buddy_block_is_clear(block))
+			mm->clear_avail -= gpu_buddy_block_size(mm, block);
+		kmemleak_update_trace(block);
+		list_add_tail(&block->link, &allocated);
+
+		pages -= BIT(order);
+
+		if (!pages)
+			break;
+	} while (1);
+
+	/* Trim the allocated block to the required size */
+	if (!(flags & GPU_BUDDY_TRIM_DISABLE) &&
+	    original_size != size) {
+		struct list_head *trim_list;
+		LIST_HEAD(temp);
+		u64 trim_size;
+
+		trim_list = &allocated;
+		trim_size = original_size;
+
+		if (!list_is_singular(&allocated)) {
+			block = list_last_entry(&allocated, typeof(*block), link);
+			list_move(&block->link, &temp);
+			trim_list = &temp;
+			trim_size = gpu_buddy_block_size(mm, block) -
+				(size - original_size);
+		}
+
+		gpu_buddy_block_trim(mm,
+				     NULL,
+				     trim_size,
+				     trim_list);
+
+		if (!list_empty(&temp))
+			list_splice_tail(trim_list, &allocated);
+	}
+
+	list_splice_tail(&allocated, blocks);
+	return 0;
+
+err_free:
+	gpu_buddy_free_list_internal(mm, &allocated);
+	return err;
+}
+EXPORT_SYMBOL(gpu_buddy_alloc_blocks);
+
+/**
+ * gpu_buddy_block_print - print block information
+ *
+ * @mm: GPU buddy manager
+ * @block: GPU buddy block
+ */
+void gpu_buddy_block_print(struct gpu_buddy *mm,
+			   struct gpu_buddy_block *block)
+{
+	u64 start = gpu_buddy_block_offset(block);
+	u64 size = gpu_buddy_block_size(mm, block);
+
+	pr_info("%#018llx-%#018llx: %llu\n", start, start + size, size);
+}
+EXPORT_SYMBOL(gpu_buddy_block_print);
+
+/**
+ * gpu_buddy_print - print allocator state
+ *
+ * @mm: GPU buddy manager
+ */
+void gpu_buddy_print(struct gpu_buddy *mm)
+{
+	int order;
+
+	pr_info("chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
+		mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
+
+	for (order = mm->max_order; order >= 0; order--) {
+		struct gpu_buddy_block *block, *tmp;
+		struct rb_root *root;
+		u64 count = 0, free;
+		unsigned int tree;
+
+		for_each_free_tree(tree) {
+			root = &mm->free_trees[tree][order];
+
+			rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
+				BUG_ON(!gpu_buddy_block_is_free(block));
+				count++;
+			}
+		}
+
+		free = count * (mm->chunk_size << order);
+		if (free < SZ_1M)
+			pr_info("order-%2d free: %8llu KiB, blocks: %llu\n",
+				order, free >> 10, count);
+		else
+			pr_info("order-%2d free: %8llu MiB, blocks: %llu\n",
+				order, free >> 20, count);
+	}
+}
+EXPORT_SYMBOL(gpu_buddy_print);
+
+static void gpu_buddy_module_exit(void)
+{
+	kmem_cache_destroy(slab_blocks);
+}
+
+static int __init gpu_buddy_module_init(void)
+{
+	slab_blocks = KMEM_CACHE(gpu_buddy_block, 0);
+	if (!slab_blocks)
+		return -ENOMEM;
+
+	return 0;
+}
+
+module_init(gpu_buddy_module_init);
+module_exit(gpu_buddy_module_exit);
+
+MODULE_DESCRIPTION("GPU Buddy Allocator");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 7e6bc0b3a589..0475defb37f0 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -220,6 +220,7 @@ config DRM_GPUSVM
 config DRM_BUDDY
 	tristate
 	depends on DRM
+	select GPU_BUDDY
 	help
 	  A page based buddy allocator
 
diff --git a/drivers/gpu/drm/Kconfig.debug b/drivers/gpu/drm/Kconfig.debug
index 05dc43c0b8c5..1f4c408c7920 100644
--- a/drivers/gpu/drm/Kconfig.debug
+++ b/drivers/gpu/drm/Kconfig.debug
@@ -71,6 +71,7 @@ config DRM_KUNIT_TEST
 	select DRM_KUNIT_TEST_HELPERS
 	select DRM_LIB_RANDOM
 	select DRM_SYSFB_HELPER
+	select GPU_BUDDY
 	select PRIME_NUMBERS
 	default KUNIT_ALL_TESTS
 	help
@@ -88,10 +89,11 @@ config DRM_TTM_KUNIT_TEST
 	tristate "KUnit tests for TTM" if !KUNIT_ALL_TESTS
 	default n
 	depends on DRM && KUNIT && MMU && (UML || COMPILE_TEST)
-	select DRM_TTM
 	select DRM_BUDDY
+	select DRM_TTM
 	select DRM_EXPORT_FOR_TESTS if m
 	select DRM_KUNIT_TEST_HELPERS
+	select GPU_BUDDY
 	default KUNIT_ALL_TESTS
 	help
 	  Enables unit tests for TTM, a GPU memory manager subsystem used
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 7f515be5185d..bb131543e1d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -23,6 +23,7 @@ config DRM_AMDGPU
 	select CRC16
 	select BACKLIGHT_CLASS_DEVICE
 	select INTERVAL_TREE
+	select GPU_BUDDY
 	select DRM_BUDDY
 	select DRM_SUBALLOC_HELPER
 	select DRM_EXEC
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 2a6cf7963dde..e0bd8a68877f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -5654,7 +5654,7 @@ int amdgpu_ras_add_critical_region(struct amdgpu_device *adev,
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
 	struct amdgpu_vram_mgr_resource *vres;
 	struct ras_critical_region *region;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	int ret = 0;
 
 	if (!bo || !bo->tbo.resource)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index be2e56ce1355..8908d9e08a30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -55,7 +55,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
 				    uint64_t start, uint64_t size,
 				    struct amdgpu_res_cursor *cur)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	struct list_head *head, *next;
 	struct drm_mm_node *node;
 
@@ -71,7 +71,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
 		head = &to_amdgpu_vram_mgr_resource(res)->blocks;
 
 		block = list_first_entry_or_null(head,
-						 struct drm_buddy_block,
+						 struct gpu_buddy_block,
 						 link);
 		if (!block)
 			goto fallback;
@@ -81,7 +81,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
 
 			next = block->link.next;
 			if (next != head)
-				block = list_entry(next, struct drm_buddy_block, link);
+				block = list_entry(next, struct gpu_buddy_block, link);
 		}
 
 		cur->start = amdgpu_vram_mgr_block_start(block) + start;
@@ -125,7 +125,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
  */
 static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	struct drm_mm_node *node;
 	struct list_head *next;
 
@@ -146,7 +146,7 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
 		block = cur->node;
 
 		next = block->link.next;
-		block = list_entry(next, struct drm_buddy_block, link);
+		block = list_entry(next, struct gpu_buddy_block, link);
 
 		cur->node = block;
 		cur->start = amdgpu_vram_mgr_block_start(block);
@@ -175,7 +175,7 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
  */
 static inline bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 
 	switch (cur->mem_type) {
 	case TTM_PL_VRAM:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 9d934c07fa6b..6c06a9c9b13f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -23,6 +23,8 @@
  */
 
 #include <linux/dma-mapping.h>
+
+#include <drm/drm_buddy.h>
 #include <drm/ttm/ttm_range_manager.h>
 #include <drm/drm_drv.h>
 
@@ -52,15 +54,15 @@ to_amdgpu_device(struct amdgpu_vram_mgr *mgr)
 	return container_of(mgr, struct amdgpu_device, mman.vram_mgr);
 }
 
-static inline struct drm_buddy_block *
+static inline struct gpu_buddy_block *
 amdgpu_vram_mgr_first_block(struct list_head *list)
 {
-	return list_first_entry_or_null(list, struct drm_buddy_block, link);
+	return list_first_entry_or_null(list, struct gpu_buddy_block, link);
 }
 
 static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	u64 start, size;
 
 	block = amdgpu_vram_mgr_first_block(head);
@@ -71,7 +73,7 @@ static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
 		start = amdgpu_vram_mgr_block_start(block);
 		size = amdgpu_vram_mgr_block_size(block);
 
-		block = list_entry(block->link.next, struct drm_buddy_block, link);
+		block = list_entry(block->link.next, struct gpu_buddy_block, link);
 		if (start + size != amdgpu_vram_mgr_block_start(block))
 			return false;
 	}
@@ -81,7 +83,7 @@ static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
 
 static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	u64 size = 0;
 
 	list_for_each_entry(block, head, link)
@@ -254,7 +256,7 @@ const struct attribute_group amdgpu_vram_mgr_attr_group = {
  * Calculate how many bytes of the DRM BUDDY block are inside visible VRAM
  */
 static u64 amdgpu_vram_mgr_vis_size(struct amdgpu_device *adev,
-				    struct drm_buddy_block *block)
+				    struct gpu_buddy_block *block)
 {
 	u64 start = amdgpu_vram_mgr_block_start(block);
 	u64 end = start + amdgpu_vram_mgr_block_size(block);
@@ -279,7 +281,7 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 	struct ttm_resource *res = bo->tbo.resource;
 	struct amdgpu_vram_mgr_resource *vres = to_amdgpu_vram_mgr_resource(res);
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	u64 usage = 0;
 
 	if (amdgpu_gmc_vram_full_visible(&adev->gmc))
@@ -299,15 +301,15 @@ static void amdgpu_vram_mgr_do_reserve(struct ttm_resource_manager *man)
 {
 	struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
 	struct amdgpu_device *adev = to_amdgpu_device(mgr);
-	struct drm_buddy *mm = &mgr->mm;
+	struct gpu_buddy *mm = &mgr->mm;
 	struct amdgpu_vram_reservation *rsv, *temp;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	uint64_t vis_usage;
 
 	list_for_each_entry_safe(rsv, temp, &mgr->reservations_pending, blocks) {
-		if (drm_buddy_alloc_blocks(mm, rsv->start, rsv->start + rsv->size,
+		if (gpu_buddy_alloc_blocks(mm, rsv->start, rsv->start + rsv->size,
 					   rsv->size, mm->chunk_size, &rsv->allocated,
-					   DRM_BUDDY_RANGE_ALLOCATION))
+					   GPU_BUDDY_RANGE_ALLOCATION))
 			continue;
 
 		block = amdgpu_vram_mgr_first_block(&rsv->allocated);
@@ -403,7 +405,7 @@ int amdgpu_vram_mgr_query_address_block_info(struct amdgpu_vram_mgr *mgr,
 			uint64_t address, struct amdgpu_vram_block_info *info)
 {
 	struct amdgpu_vram_mgr_resource *vres;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	u64 start, size;
 	int ret = -ENOENT;
 
@@ -450,8 +452,8 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	struct amdgpu_vram_mgr_resource *vres;
 	u64 size, remaining_size, lpfn, fpfn;
 	unsigned int adjust_dcc_size = 0;
-	struct drm_buddy *mm = &mgr->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
 	unsigned long pages_per_block;
 	int r;
 
@@ -493,17 +495,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	INIT_LIST_HEAD(&vres->blocks);
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN)
-		vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+		vres->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
 
 	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
-		vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
+		vres->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION;
 
 	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED)
-		vres->flags |= DRM_BUDDY_CLEAR_ALLOCATION;
+		vres->flags |= GPU_BUDDY_CLEAR_ALLOCATION;
 
 	if (fpfn || lpfn != mgr->mm.size)
 		/* Allocate blocks in desired range */
-		vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
+		vres->flags |= GPU_BUDDY_RANGE_ALLOCATION;
 
 	if (bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC &&
 	    adev->gmc.gmc_funcs->get_dcc_alignment)
@@ -516,7 +518,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 		dcc_size = roundup_pow_of_two(vres->base.size + adjust_dcc_size);
 		remaining_size = (u64)dcc_size;
 
-		vres->flags |= DRM_BUDDY_TRIM_DISABLE;
+		vres->flags |= GPU_BUDDY_TRIM_DISABLE;
 	}
 
 	mutex_lock(&mgr->lock);
@@ -536,7 +538,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 
 		BUG_ON(min_block_size < mm->chunk_size);
 
-		r = drm_buddy_alloc_blocks(mm, fpfn,
+		r = gpu_buddy_alloc_blocks(mm, fpfn,
 					   lpfn,
 					   size,
 					   min_block_size,
@@ -545,7 +547,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 
 		if (unlikely(r == -ENOSPC) && pages_per_block == ~0ul &&
 		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS)) {
-			vres->flags &= ~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
+			vres->flags &= ~GPU_BUDDY_CONTIGUOUS_ALLOCATION;
 			pages_per_block = max_t(u32, 2UL << (20UL - PAGE_SHIFT),
 						tbo->page_alignment);
 
@@ -566,7 +568,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	list_add_tail(&vres->vres_node, &mgr->allocated_vres_list);
 
 	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && adjust_dcc_size) {
-		struct drm_buddy_block *dcc_block;
+		struct gpu_buddy_block *dcc_block;
 		unsigned long dcc_start;
 		u64 trim_start;
 
@@ -576,7 +578,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 			roundup((unsigned long)amdgpu_vram_mgr_block_start(dcc_block),
 				adjust_dcc_size);
 		trim_start = (u64)dcc_start;
-		drm_buddy_block_trim(mm, &trim_start,
+		gpu_buddy_block_trim(mm, &trim_start,
 				     (u64)vres->base.size,
 				     &vres->blocks);
 	}
@@ -614,7 +616,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	return 0;
 
 error_free_blocks:
-	drm_buddy_free_list(mm, &vres->blocks, 0);
+	gpu_buddy_free_list(mm, &vres->blocks, 0);
 	mutex_unlock(&mgr->lock);
 error_fini:
 	ttm_resource_fini(man, &vres->base);
@@ -637,8 +639,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man,
 	struct amdgpu_vram_mgr_resource *vres = to_amdgpu_vram_mgr_resource(res);
 	struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
 	struct amdgpu_device *adev = to_amdgpu_device(mgr);
-	struct drm_buddy *mm = &mgr->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
 	uint64_t vis_usage = 0;
 
 	mutex_lock(&mgr->lock);
@@ -649,7 +651,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man,
 	list_for_each_entry(block, &vres->blocks, link)
 		vis_usage += amdgpu_vram_mgr_vis_size(adev, block);
 
-	drm_buddy_free_list(mm, &vres->blocks, vres->flags);
+	gpu_buddy_free_list(mm, &vres->blocks, vres->flags);
 	amdgpu_vram_mgr_do_reserve(man);
 	mutex_unlock(&mgr->lock);
 
@@ -688,7 +690,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 	if (!*sgt)
 		return -ENOMEM;
 
-	/* Determine the number of DRM_BUDDY blocks to export */
+	/* Determine the number of GPU_BUDDY blocks to export */
 	amdgpu_res_first(res, offset, length, &cursor);
 	while (cursor.remaining) {
 		num_entries++;
@@ -704,10 +706,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 		sg->length = 0;
 
 	/*
-	 * Walk down DRM_BUDDY blocks to populate scatterlist nodes
-	 * @note: Use iterator api to get first the DRM_BUDDY block
+	 * Walk down GPU_BUDDY blocks to populate scatterlist nodes
+	 * @note: Use the iterator API to first get the GPU_BUDDY block
 	 * and the number of bytes from it. Access the following
-	 * DRM_BUDDY block(s) if more buffer needs to exported
+	 * GPU_BUDDY block(s) if more buffer needs to be exported
 	 */
 	amdgpu_res_first(res, offset, length, &cursor);
 	for_each_sgtable_sg((*sgt), sg, i) {
@@ -792,10 +794,10 @@ uint64_t amdgpu_vram_mgr_vis_usage(struct amdgpu_vram_mgr *mgr)
 void amdgpu_vram_mgr_clear_reset_blocks(struct amdgpu_device *adev)
 {
 	struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
-	struct drm_buddy *mm = &mgr->mm;
+	struct gpu_buddy *mm = &mgr->mm;
 
 	mutex_lock(&mgr->lock);
-	drm_buddy_reset_clear(mm, false);
+	gpu_buddy_reset_clear(mm, false);
 	mutex_unlock(&mgr->lock);
 }
 
@@ -815,7 +817,7 @@ static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,
 				       size_t size)
 {
 	struct amdgpu_vram_mgr_resource *mgr = to_amdgpu_vram_mgr_resource(res);
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 
 	/* Check each drm buddy block individually */
 	list_for_each_entry(block, &mgr->blocks, link) {
@@ -848,7 +850,7 @@ static bool amdgpu_vram_mgr_compatible(struct ttm_resource_manager *man,
 				       size_t size)
 {
 	struct amdgpu_vram_mgr_resource *mgr = to_amdgpu_vram_mgr_resource(res);
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 
 	/* Check each drm buddy block individually */
 	list_for_each_entry(block, &mgr->blocks, link) {
@@ -877,7 +879,7 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man,
 				  struct drm_printer *printer)
 {
 	struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
-	struct drm_buddy *mm = &mgr->mm;
+	struct gpu_buddy *mm = &mgr->mm;
 	struct amdgpu_vram_reservation *rsv;
 
 	drm_printf(printer, "  vis usage:%llu\n",
@@ -930,7 +932,7 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	mgr->default_page_size = PAGE_SIZE;
 
 	man->func = &amdgpu_vram_mgr_func;
-	err = drm_buddy_init(&mgr->mm, man->size, PAGE_SIZE);
+	err = gpu_buddy_init(&mgr->mm, man->size, PAGE_SIZE);
 	if (err)
 		return err;
 
@@ -965,11 +967,11 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
 		kfree(rsv);
 
 	list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) {
-		drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0);
+		gpu_buddy_free_list(&mgr->mm, &rsv->allocated, 0);
 		kfree(rsv);
 	}
 	if (!adev->gmc.is_app_apu)
-		drm_buddy_fini(&mgr->mm);
+		gpu_buddy_fini(&mgr->mm);
 	mutex_unlock(&mgr->lock);
 
 	ttm_resource_manager_cleanup(man);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
index 5f5fd9a911c2..429a21a2e9b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
@@ -24,11 +24,11 @@
 #ifndef __AMDGPU_VRAM_MGR_H__
 #define __AMDGPU_VRAM_MGR_H__
 
-#include <drm/drm_buddy.h>
+#include <linux/gpu_buddy.h>
 
 struct amdgpu_vram_mgr {
 	struct ttm_resource_manager manager;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	/* protects access to buffer objects */
 	struct mutex lock;
 	struct list_head reservations_pending;
@@ -57,19 +57,19 @@ struct amdgpu_vram_mgr_resource {
 	struct amdgpu_vres_task task;
 };
 
-static inline u64 amdgpu_vram_mgr_block_start(struct drm_buddy_block *block)
+static inline u64 amdgpu_vram_mgr_block_start(struct gpu_buddy_block *block)
 {
-	return drm_buddy_block_offset(block);
+	return gpu_buddy_block_offset(block);
 }
 
-static inline u64 amdgpu_vram_mgr_block_size(struct drm_buddy_block *block)
+static inline u64 amdgpu_vram_mgr_block_size(struct gpu_buddy_block *block)
 {
-	return (u64)PAGE_SIZE << drm_buddy_block_order(block);
+	return (u64)PAGE_SIZE << gpu_buddy_block_order(block);
 }
 
-static inline bool amdgpu_vram_mgr_is_cleared(struct drm_buddy_block *block)
+static inline bool amdgpu_vram_mgr_is_cleared(struct gpu_buddy_block *block)
 {
-	return drm_buddy_block_is_clear(block);
+	return gpu_buddy_block_is_clear(block);
 }
 
 static inline struct amdgpu_vram_mgr_resource *
@@ -82,8 +82,8 @@ static inline void amdgpu_vram_mgr_set_cleared(struct ttm_resource *res)
 {
 	struct amdgpu_vram_mgr_resource *ares = to_amdgpu_vram_mgr_resource(res);
 
-	WARN_ON(ares->flags & DRM_BUDDY_CLEARED);
-	ares->flags |= DRM_BUDDY_CLEARED;
+	WARN_ON(ares->flags & GPU_BUDDY_CLEARED);
+	ares->flags |= GPU_BUDDY_CLEARED;
 }
 
 int amdgpu_vram_mgr_query_address_block_info(struct amdgpu_vram_mgr *mgr,
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 2f279b46bd2c..188b36054e59 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -3,1262 +3,25 @@
  * Copyright © 2021 Intel Corporation
  */
 
-#include <kunit/test-bug.h>
-
 #include <linux/export.h>
-#include <linux/kmemleak.h>
 #include <linux/module.h>
 #include <linux/sizes.h>
 
 #include <drm/drm_buddy.h>
 #include <drm/drm_print.h>
 
-enum drm_buddy_free_tree {
-	DRM_BUDDY_CLEAR_TREE = 0,
-	DRM_BUDDY_DIRTY_TREE,
-	DRM_BUDDY_MAX_FREE_TREES,
-};
-
-static struct kmem_cache *slab_blocks;
-
-#define for_each_free_tree(tree) \
-	for ((tree) = 0; (tree) < DRM_BUDDY_MAX_FREE_TREES; (tree)++)
-
-static struct drm_buddy_block *drm_block_alloc(struct drm_buddy *mm,
-					       struct drm_buddy_block *parent,
-					       unsigned int order,
-					       u64 offset)
-{
-	struct drm_buddy_block *block;
-
-	BUG_ON(order > DRM_BUDDY_MAX_ORDER);
-
-	block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL);
-	if (!block)
-		return NULL;
-
-	block->header = offset;
-	block->header |= order;
-	block->parent = parent;
-
-	RB_CLEAR_NODE(&block->rb);
-
-	BUG_ON(block->header & DRM_BUDDY_HEADER_UNUSED);
-	return block;
-}
-
-static void drm_block_free(struct drm_buddy *mm,
-			   struct drm_buddy_block *block)
-{
-	kmem_cache_free(slab_blocks, block);
-}
-
-static enum drm_buddy_free_tree
-get_block_tree(struct drm_buddy_block *block)
-{
-	return drm_buddy_block_is_clear(block) ?
-	       DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
-}
-
-static struct drm_buddy_block *
-rbtree_get_free_block(const struct rb_node *node)
-{
-	return node ? rb_entry(node, struct drm_buddy_block, rb) : NULL;
-}
-
-static struct drm_buddy_block *
-rbtree_last_free_block(struct rb_root *root)
-{
-	return rbtree_get_free_block(rb_last(root));
-}
-
-static bool rbtree_is_empty(struct rb_root *root)
-{
-	return RB_EMPTY_ROOT(root);
-}
-
-static bool drm_buddy_block_offset_less(const struct drm_buddy_block *block,
-					const struct drm_buddy_block *node)
-{
-	return drm_buddy_block_offset(block) < drm_buddy_block_offset(node);
-}
-
-static bool rbtree_block_offset_less(struct rb_node *block,
-				     const struct rb_node *node)
-{
-	return drm_buddy_block_offset_less(rbtree_get_free_block(block),
-					   rbtree_get_free_block(node));
-}
-
-static void rbtree_insert(struct drm_buddy *mm,
-			  struct drm_buddy_block *block,
-			  enum drm_buddy_free_tree tree)
-{
-	rb_add(&block->rb,
-	       &mm->free_trees[tree][drm_buddy_block_order(block)],
-	       rbtree_block_offset_less);
-}
-
-static void rbtree_remove(struct drm_buddy *mm,
-			  struct drm_buddy_block *block)
-{
-	unsigned int order = drm_buddy_block_order(block);
-	enum drm_buddy_free_tree tree;
-	struct rb_root *root;
-
-	tree = get_block_tree(block);
-	root = &mm->free_trees[tree][order];
-
-	rb_erase(&block->rb, root);
-	RB_CLEAR_NODE(&block->rb);
-}
-
-static void clear_reset(struct drm_buddy_block *block)
-{
-	block->header &= ~DRM_BUDDY_HEADER_CLEAR;
-}
-
-static void mark_cleared(struct drm_buddy_block *block)
-{
-	block->header |= DRM_BUDDY_HEADER_CLEAR;
-}
-
-static void mark_allocated(struct drm_buddy *mm,
-			   struct drm_buddy_block *block)
-{
-	block->header &= ~DRM_BUDDY_HEADER_STATE;
-	block->header |= DRM_BUDDY_ALLOCATED;
-
-	rbtree_remove(mm, block);
-}
-
-static void mark_free(struct drm_buddy *mm,
-		      struct drm_buddy_block *block)
-{
-	enum drm_buddy_free_tree tree;
-
-	block->header &= ~DRM_BUDDY_HEADER_STATE;
-	block->header |= DRM_BUDDY_FREE;
-
-	tree = get_block_tree(block);
-	rbtree_insert(mm, block, tree);
-}
-
-static void mark_split(struct drm_buddy *mm,
-		       struct drm_buddy_block *block)
-{
-	block->header &= ~DRM_BUDDY_HEADER_STATE;
-	block->header |= DRM_BUDDY_SPLIT;
-
-	rbtree_remove(mm, block);
-}
-
-static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
-{
-	return s1 <= e2 && e1 >= s2;
-}
-
-static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2)
-{
-	return s1 <= s2 && e1 >= e2;
-}
-
-static struct drm_buddy_block *
-__get_buddy(struct drm_buddy_block *block)
-{
-	struct drm_buddy_block *parent;
-
-	parent = block->parent;
-	if (!parent)
-		return NULL;
-
-	if (parent->left == block)
-		return parent->right;
-
-	return parent->left;
-}
-
-static unsigned int __drm_buddy_free(struct drm_buddy *mm,
-				     struct drm_buddy_block *block,
-				     bool force_merge)
-{
-	struct drm_buddy_block *parent;
-	unsigned int order;
-
-	while ((parent = block->parent)) {
-		struct drm_buddy_block *buddy;
-
-		buddy = __get_buddy(block);
-
-		if (!drm_buddy_block_is_free(buddy))
-			break;
-
-		if (!force_merge) {
-			/*
-			 * Check the block and its buddy clear state and exit
-			 * the loop if they both have the dissimilar state.
-			 */
-			if (drm_buddy_block_is_clear(block) !=
-			    drm_buddy_block_is_clear(buddy))
-				break;
-
-			if (drm_buddy_block_is_clear(block))
-				mark_cleared(parent);
-		}
-
-		rbtree_remove(mm, buddy);
-		if (force_merge && drm_buddy_block_is_clear(buddy))
-			mm->clear_avail -= drm_buddy_block_size(mm, buddy);
-
-		drm_block_free(mm, block);
-		drm_block_free(mm, buddy);
-
-		block = parent;
-	}
-
-	order = drm_buddy_block_order(block);
-	mark_free(mm, block);
-
-	return order;
-}
-
-static int __force_merge(struct drm_buddy *mm,
-			 u64 start,
-			 u64 end,
-			 unsigned int min_order)
-{
-	unsigned int tree, order;
-	int i;
-
-	if (!min_order)
-		return -ENOMEM;
-
-	if (min_order > mm->max_order)
-		return -EINVAL;
-
-	for_each_free_tree(tree) {
-		for (i = min_order - 1; i >= 0; i--) {
-			struct rb_node *iter = rb_last(&mm->free_trees[tree][i]);
-
-			while (iter) {
-				struct drm_buddy_block *block, *buddy;
-				u64 block_start, block_end;
-
-				block = rbtree_get_free_block(iter);
-				iter = rb_prev(iter);
-
-				if (!block || !block->parent)
-					continue;
-
-				block_start = drm_buddy_block_offset(block);
-				block_end = block_start + drm_buddy_block_size(mm, block) - 1;
-
-				if (!contains(start, end, block_start, block_end))
-					continue;
-
-				buddy = __get_buddy(block);
-				if (!drm_buddy_block_is_free(buddy))
-					continue;
-
-				WARN_ON(drm_buddy_block_is_clear(block) ==
-					drm_buddy_block_is_clear(buddy));
-
-				/*
-				 * Advance to the next node when the current node is the buddy,
-				 * as freeing the block will also remove its buddy from the tree.
-				 */
-				if (iter == &buddy->rb)
-					iter = rb_prev(iter);
-
-				rbtree_remove(mm, block);
-				if (drm_buddy_block_is_clear(block))
-					mm->clear_avail -= drm_buddy_block_size(mm, block);
-
-				order = __drm_buddy_free(mm, block, true);
-				if (order >= min_order)
-					return 0;
-			}
-		}
-	}
-
-	return -ENOMEM;
-}
-
-/**
- * drm_buddy_init - init memory manager
- *
- * @mm: DRM buddy manager to initialize
- * @size: size in bytes to manage
- * @chunk_size: minimum page size in bytes for our allocations
- *
- * Initializes the memory manager and its resources.
- *
- * Returns:
- * 0 on success, error code on failure.
- */
-int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size)
-{
-	unsigned int i, j, root_count = 0;
-	u64 offset = 0;
-
-	if (size < chunk_size)
-		return -EINVAL;
-
-	if (chunk_size < SZ_4K)
-		return -EINVAL;
-
-	if (!is_power_of_2(chunk_size))
-		return -EINVAL;
-
-	size = round_down(size, chunk_size);
-
-	mm->size = size;
-	mm->avail = size;
-	mm->clear_avail = 0;
-	mm->chunk_size = chunk_size;
-	mm->max_order = ilog2(size) - ilog2(chunk_size);
-
-	BUG_ON(mm->max_order > DRM_BUDDY_MAX_ORDER);
-
-	mm->free_trees = kmalloc_array(DRM_BUDDY_MAX_FREE_TREES,
-				       sizeof(*mm->free_trees),
-				       GFP_KERNEL);
-	if (!mm->free_trees)
-		return -ENOMEM;
-
-	for_each_free_tree(i) {
-		mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
-						  sizeof(struct rb_root),
-						  GFP_KERNEL);
-		if (!mm->free_trees[i])
-			goto out_free_tree;
-
-		for (j = 0; j <= mm->max_order; ++j)
-			mm->free_trees[i][j] = RB_ROOT;
-	}
-
-	mm->n_roots = hweight64(size);
-
-	mm->roots = kmalloc_array(mm->n_roots,
-				  sizeof(struct drm_buddy_block *),
-				  GFP_KERNEL);
-	if (!mm->roots)
-		goto out_free_tree;
-
-	/*
-	 * Split into power-of-two blocks, in case we are given a size that is
-	 * not itself a power-of-two.
-	 */
-	do {
-		struct drm_buddy_block *root;
-		unsigned int order;
-		u64 root_size;
-
-		order = ilog2(size) - ilog2(chunk_size);
-		root_size = chunk_size << order;
-
-		root = drm_block_alloc(mm, NULL, order, offset);
-		if (!root)
-			goto out_free_roots;
-
-		mark_free(mm, root);
-
-		BUG_ON(root_count > mm->max_order);
-		BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
-
-		mm->roots[root_count] = root;
-
-		offset += root_size;
-		size -= root_size;
-		root_count++;
-	} while (size);
-
-	return 0;
-
-out_free_roots:
-	while (root_count--)
-		drm_block_free(mm, mm->roots[root_count]);
-	kfree(mm->roots);
-out_free_tree:
-	while (i--)
-		kfree(mm->free_trees[i]);
-	kfree(mm->free_trees);
-	return -ENOMEM;
-}
-EXPORT_SYMBOL(drm_buddy_init);
-
-/**
- * drm_buddy_fini - tear down the memory manager
- *
- * @mm: DRM buddy manager to free
- *
- * Cleanup memory manager resources and the freetree
- */
-void drm_buddy_fini(struct drm_buddy *mm)
-{
-	u64 root_size, size, start;
-	unsigned int order;
-	int i;
-
-	size = mm->size;
-
-	for (i = 0; i < mm->n_roots; ++i) {
-		order = ilog2(size) - ilog2(mm->chunk_size);
-		start = drm_buddy_block_offset(mm->roots[i]);
-		__force_merge(mm, start, start + size, order);
-
-		if (WARN_ON(!drm_buddy_block_is_free(mm->roots[i])))
-			kunit_fail_current_test("buddy_fini() root");
-
-		drm_block_free(mm, mm->roots[i]);
-
-		root_size = mm->chunk_size << order;
-		size -= root_size;
-	}
-
-	WARN_ON(mm->avail != mm->size);
-
-	for_each_free_tree(i)
-		kfree(mm->free_trees[i]);
-	kfree(mm->roots);
-}
-EXPORT_SYMBOL(drm_buddy_fini);
-
-static int split_block(struct drm_buddy *mm,
-		       struct drm_buddy_block *block)
-{
-	unsigned int block_order = drm_buddy_block_order(block) - 1;
-	u64 offset = drm_buddy_block_offset(block);
-
-	BUG_ON(!drm_buddy_block_is_free(block));
-	BUG_ON(!drm_buddy_block_order(block));
-
-	block->left = drm_block_alloc(mm, block, block_order, offset);
-	if (!block->left)
-		return -ENOMEM;
-
-	block->right = drm_block_alloc(mm, block, block_order,
-				       offset + (mm->chunk_size << block_order));
-	if (!block->right) {
-		drm_block_free(mm, block->left);
-		return -ENOMEM;
-	}
-
-	mark_split(mm, block);
-
-	if (drm_buddy_block_is_clear(block)) {
-		mark_cleared(block->left);
-		mark_cleared(block->right);
-		clear_reset(block);
-	}
-
-	mark_free(mm, block->left);
-	mark_free(mm, block->right);
-
-	return 0;
-}
-
-/**
- * drm_get_buddy - get buddy address
- *
- * @block: DRM buddy block
- *
- * Returns the corresponding buddy block for @block, or NULL
- * if this is a root block and can't be merged further.
- * Requires some kind of locking to protect against
- * any concurrent allocate and free operations.
- */
-struct drm_buddy_block *
-drm_get_buddy(struct drm_buddy_block *block)
-{
-	return __get_buddy(block);
-}
-EXPORT_SYMBOL(drm_get_buddy);
-
-/**
- * drm_buddy_reset_clear - reset blocks clear state
- *
- * @mm: DRM buddy manager
- * @is_clear: blocks clear state
- *
- * Reset the clear state based on @is_clear value for each block
- * in the freetree.
- */
-void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear)
-{
-	enum drm_buddy_free_tree src_tree, dst_tree;
-	u64 root_size, size, start;
-	unsigned int order;
-	int i;
-
-	size = mm->size;
-	for (i = 0; i < mm->n_roots; ++i) {
-		order = ilog2(size) - ilog2(mm->chunk_size);
-		start = drm_buddy_block_offset(mm->roots[i]);
-		__force_merge(mm, start, start + size, order);
-
-		root_size = mm->chunk_size << order;
-		size -= root_size;
-	}
-
-	src_tree = is_clear ? DRM_BUDDY_DIRTY_TREE : DRM_BUDDY_CLEAR_TREE;
-	dst_tree = is_clear ? DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
-
-	for (i = 0; i <= mm->max_order; ++i) {
-		struct rb_root *root = &mm->free_trees[src_tree][i];
-		struct drm_buddy_block *block, *tmp;
-
-		rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
-			rbtree_remove(mm, block);
-			if (is_clear) {
-				mark_cleared(block);
-				mm->clear_avail += drm_buddy_block_size(mm, block);
-			} else {
-				clear_reset(block);
-				mm->clear_avail -= drm_buddy_block_size(mm, block);
-			}
-
-			rbtree_insert(mm, block, dst_tree);
-		}
-	}
-}
-EXPORT_SYMBOL(drm_buddy_reset_clear);
-
-/**
- * drm_buddy_free_block - free a block
- *
- * @mm: DRM buddy manager
- * @block: block to be freed
- */
-void drm_buddy_free_block(struct drm_buddy *mm,
-			  struct drm_buddy_block *block)
-{
-	BUG_ON(!drm_buddy_block_is_allocated(block));
-	mm->avail += drm_buddy_block_size(mm, block);
-	if (drm_buddy_block_is_clear(block))
-		mm->clear_avail += drm_buddy_block_size(mm, block);
-
-	__drm_buddy_free(mm, block, false);
-}
-EXPORT_SYMBOL(drm_buddy_free_block);
-
-static void __drm_buddy_free_list(struct drm_buddy *mm,
-				  struct list_head *objects,
-				  bool mark_clear,
-				  bool mark_dirty)
-{
-	struct drm_buddy_block *block, *on;
-
-	WARN_ON(mark_dirty && mark_clear);
-
-	list_for_each_entry_safe(block, on, objects, link) {
-		if (mark_clear)
-			mark_cleared(block);
-		else if (mark_dirty)
-			clear_reset(block);
-		drm_buddy_free_block(mm, block);
-		cond_resched();
-	}
-	INIT_LIST_HEAD(objects);
-}
-
-static void drm_buddy_free_list_internal(struct drm_buddy *mm,
-					 struct list_head *objects)
-{
-	/*
-	 * Don't touch the clear/dirty bit, since allocation is still internal
-	 * at this point. For example we might have just failed part of the
-	 * allocation.
-	 */
-	__drm_buddy_free_list(mm, objects, false, false);
-}
-
-/**
- * drm_buddy_free_list - free blocks
- *
- * @mm: DRM buddy manager
- * @objects: input list head to free blocks
- * @flags: optional flags like DRM_BUDDY_CLEARED
- */
-void drm_buddy_free_list(struct drm_buddy *mm,
-			 struct list_head *objects,
-			 unsigned int flags)
-{
-	bool mark_clear = flags & DRM_BUDDY_CLEARED;
-
-	__drm_buddy_free_list(mm, objects, mark_clear, !mark_clear);
-}
-EXPORT_SYMBOL(drm_buddy_free_list);
-
-static bool block_incompatible(struct drm_buddy_block *block, unsigned int flags)
-{
-	bool needs_clear = flags & DRM_BUDDY_CLEAR_ALLOCATION;
-
-	return needs_clear != drm_buddy_block_is_clear(block);
-}
-
-static struct drm_buddy_block *
-__alloc_range_bias(struct drm_buddy *mm,
-		   u64 start, u64 end,
-		   unsigned int order,
-		   unsigned long flags,
-		   bool fallback)
-{
-	u64 req_size = mm->chunk_size << order;
-	struct drm_buddy_block *block;
-	struct drm_buddy_block *buddy;
-	LIST_HEAD(dfs);
-	int err;
-	int i;
-
-	end = end - 1;
-
-	for (i = 0; i < mm->n_roots; ++i)
-		list_add_tail(&mm->roots[i]->tmp_link, &dfs);
-
-	do {
-		u64 block_start;
-		u64 block_end;
-
-		block = list_first_entry_or_null(&dfs,
-						 struct drm_buddy_block,
-						 tmp_link);
-		if (!block)
-			break;
-
-		list_del(&block->tmp_link);
-
-		if (drm_buddy_block_order(block) < order)
-			continue;
-
-		block_start = drm_buddy_block_offset(block);
-		block_end = block_start + drm_buddy_block_size(mm, block) - 1;
-
-		if (!overlaps(start, end, block_start, block_end))
-			continue;
-
-		if (drm_buddy_block_is_allocated(block))
-			continue;
-
-		if (block_start < start || block_end > end) {
-			u64 adjusted_start = max(block_start, start);
-			u64 adjusted_end = min(block_end, end);
-
-			if (round_down(adjusted_end + 1, req_size) <=
-			    round_up(adjusted_start, req_size))
-				continue;
-		}
-
-		if (!fallback && block_incompatible(block, flags))
-			continue;
-
-		if (contains(start, end, block_start, block_end) &&
-		    order == drm_buddy_block_order(block)) {
-			/*
-			 * Find the free block within the range.
-			 */
-			if (drm_buddy_block_is_free(block))
-				return block;
-
-			continue;
-		}
-
-		if (!drm_buddy_block_is_split(block)) {
-			err = split_block(mm, block);
-			if (unlikely(err))
-				goto err_undo;
-		}
-
-		list_add(&block->right->tmp_link, &dfs);
-		list_add(&block->left->tmp_link, &dfs);
-	} while (1);
-
-	return ERR_PTR(-ENOSPC);
-
-err_undo:
-	/*
-	 * We really don't want to leave around a bunch of split blocks, since
-	 * bigger is better, so make sure we merge everything back before we
-	 * free the allocated blocks.
-	 */
-	buddy = __get_buddy(block);
-	if (buddy &&
-	    (drm_buddy_block_is_free(block) &&
-	     drm_buddy_block_is_free(buddy)))
-		__drm_buddy_free(mm, block, false);
-	return ERR_PTR(err);
-}
-
-static struct drm_buddy_block *
-__drm_buddy_alloc_range_bias(struct drm_buddy *mm,
-			     u64 start, u64 end,
-			     unsigned int order,
-			     unsigned long flags)
-{
-	struct drm_buddy_block *block;
-	bool fallback = false;
-
-	block = __alloc_range_bias(mm, start, end, order,
-				   flags, fallback);
-	if (IS_ERR(block))
-		return __alloc_range_bias(mm, start, end, order,
-					  flags, !fallback);
-
-	return block;
-}
-
-static struct drm_buddy_block *
-get_maxblock(struct drm_buddy *mm,
-	     unsigned int order,
-	     enum drm_buddy_free_tree tree)
-{
-	struct drm_buddy_block *max_block = NULL, *block = NULL;
-	struct rb_root *root;
-	unsigned int i;
-
-	for (i = order; i <= mm->max_order; ++i) {
-		root = &mm->free_trees[tree][i];
-		block = rbtree_last_free_block(root);
-		if (!block)
-			continue;
-
-		if (!max_block) {
-			max_block = block;
-			continue;
-		}
-
-		if (drm_buddy_block_offset(block) >
-		    drm_buddy_block_offset(max_block)) {
-			max_block = block;
-		}
-	}
-
-	return max_block;
-}
-
-static struct drm_buddy_block *
-alloc_from_freetree(struct drm_buddy *mm,
-		    unsigned int order,
-		    unsigned long flags)
-{
-	struct drm_buddy_block *block = NULL;
-	struct rb_root *root;
-	enum drm_buddy_free_tree tree;
-	unsigned int tmp;
-	int err;
-
-	tree = (flags & DRM_BUDDY_CLEAR_ALLOCATION) ?
-		DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
-
-	if (flags & DRM_BUDDY_TOPDOWN_ALLOCATION) {
-		block = get_maxblock(mm, order, tree);
-		if (block)
-			/* Store the obtained block order */
-			tmp = drm_buddy_block_order(block);
-	} else {
-		for (tmp = order; tmp <= mm->max_order; ++tmp) {
-			/* Get RB tree root for this order and tree */
-			root = &mm->free_trees[tree][tmp];
-			block = rbtree_last_free_block(root);
-			if (block)
-				break;
-		}
-	}
-
-	if (!block) {
-		/* Try allocating from the other tree */
-		tree = (tree == DRM_BUDDY_CLEAR_TREE) ?
-			DRM_BUDDY_DIRTY_TREE : DRM_BUDDY_CLEAR_TREE;
-
-		for (tmp = order; tmp <= mm->max_order; ++tmp) {
-			root = &mm->free_trees[tree][tmp];
-			block = rbtree_last_free_block(root);
-			if (block)
-				break;
-		}
-
-		if (!block)
-			return ERR_PTR(-ENOSPC);
-	}
-
-	BUG_ON(!drm_buddy_block_is_free(block));
-
-	while (tmp != order) {
-		err = split_block(mm, block);
-		if (unlikely(err))
-			goto err_undo;
-
-		block = block->right;
-		tmp--;
-	}
-	return block;
-
-err_undo:
-	if (tmp != order)
-		__drm_buddy_free(mm, block, false);
-	return ERR_PTR(err);
-}
-
-static int __alloc_range(struct drm_buddy *mm,
-			 struct list_head *dfs,
-			 u64 start, u64 size,
-			 struct list_head *blocks,
-			 u64 *total_allocated_on_err)
-{
-	struct drm_buddy_block *block;
-	struct drm_buddy_block *buddy;
-	u64 total_allocated = 0;
-	LIST_HEAD(allocated);
-	u64 end;
-	int err;
-
-	end = start + size - 1;
-
-	do {
-		u64 block_start;
-		u64 block_end;
-
-		block = list_first_entry_or_null(dfs,
-						 struct drm_buddy_block,
-						 tmp_link);
-		if (!block)
-			break;
-
-		list_del(&block->tmp_link);
-
-		block_start = drm_buddy_block_offset(block);
-		block_end = block_start + drm_buddy_block_size(mm, block) - 1;
-
-		if (!overlaps(start, end, block_start, block_end))
-			continue;
-
-		if (drm_buddy_block_is_allocated(block)) {
-			err = -ENOSPC;
-			goto err_free;
-		}
-
-		if (contains(start, end, block_start, block_end)) {
-			if (drm_buddy_block_is_free(block)) {
-				mark_allocated(mm, block);
-				total_allocated += drm_buddy_block_size(mm, block);
-				mm->avail -= drm_buddy_block_size(mm, block);
-				if (drm_buddy_block_is_clear(block))
-					mm->clear_avail -= drm_buddy_block_size(mm, block);
-				list_add_tail(&block->link, &allocated);
-				continue;
-			} else if (!mm->clear_avail) {
-				err = -ENOSPC;
-				goto err_free;
-			}
-		}
-
-		if (!drm_buddy_block_is_split(block)) {
-			err = split_block(mm, block);
-			if (unlikely(err))
-				goto err_undo;
-		}
-
-		list_add(&block->right->tmp_link, dfs);
-		list_add(&block->left->tmp_link, dfs);
-	} while (1);
-
-	if (total_allocated < size) {
-		err = -ENOSPC;
-		goto err_free;
-	}
-
-	list_splice_tail(&allocated, blocks);
-
-	return 0;
-
-err_undo:
-	/*
-	 * We really don't want to leave around a bunch of split blocks, since
-	 * bigger is better, so make sure we merge everything back before we
-	 * free the allocated blocks.
-	 */
-	buddy = __get_buddy(block);
-	if (buddy &&
-	    (drm_buddy_block_is_free(block) &&
-	     drm_buddy_block_is_free(buddy)))
-		__drm_buddy_free(mm, block, false);
-
-err_free:
-	if (err == -ENOSPC && total_allocated_on_err) {
-		list_splice_tail(&allocated, blocks);
-		*total_allocated_on_err = total_allocated;
-	} else {
-		drm_buddy_free_list_internal(mm, &allocated);
-	}
-
-	return err;
-}
-
-static int __drm_buddy_alloc_range(struct drm_buddy *mm,
-				   u64 start,
-				   u64 size,
-				   u64 *total_allocated_on_err,
-				   struct list_head *blocks)
-{
-	LIST_HEAD(dfs);
-	int i;
-
-	for (i = 0; i < mm->n_roots; ++i)
-		list_add_tail(&mm->roots[i]->tmp_link, &dfs);
-
-	return __alloc_range(mm, &dfs, start, size,
-			     blocks, total_allocated_on_err);
-}
-
-static int __alloc_contig_try_harder(struct drm_buddy *mm,
-				     u64 size,
-				     u64 min_block_size,
-				     struct list_head *blocks)
-{
-	u64 rhs_offset, lhs_offset, lhs_size, filled;
-	struct drm_buddy_block *block;
-	unsigned int tree, order;
-	LIST_HEAD(blocks_lhs);
-	unsigned long pages;
-	u64 modify_size;
-	int err;
-
-	modify_size = rounddown_pow_of_two(size);
-	pages = modify_size >> ilog2(mm->chunk_size);
-	order = fls(pages) - 1;
-	if (order == 0)
-		return -ENOSPC;
-
-	for_each_free_tree(tree) {
-		struct rb_root *root;
-		struct rb_node *iter;
-
-		root = &mm->free_trees[tree][order];
-		if (rbtree_is_empty(root))
-			continue;
-
-		iter = rb_last(root);
-		while (iter) {
-			block = rbtree_get_free_block(iter);
-
-			/* Allocate blocks traversing RHS */
-			rhs_offset = drm_buddy_block_offset(block);
-			err =  __drm_buddy_alloc_range(mm, rhs_offset, size,
-						       &filled, blocks);
-			if (!err || err != -ENOSPC)
-				return err;
-
-			lhs_size = max((size - filled), min_block_size);
-			if (!IS_ALIGNED(lhs_size, min_block_size))
-				lhs_size = round_up(lhs_size, min_block_size);
-
-			/* Allocate blocks traversing LHS */
-			lhs_offset = drm_buddy_block_offset(block) - lhs_size;
-			err =  __drm_buddy_alloc_range(mm, lhs_offset, lhs_size,
-						       NULL, &blocks_lhs);
-			if (!err) {
-				list_splice(&blocks_lhs, blocks);
-				return 0;
-			} else if (err != -ENOSPC) {
-				drm_buddy_free_list_internal(mm, blocks);
-				return err;
-			}
-			/* Free blocks for the next iteration */
-			drm_buddy_free_list_internal(mm, blocks);
-
-			iter = rb_prev(iter);
-		}
-	}
-
-	return -ENOSPC;
-}
-
-/**
- * drm_buddy_block_trim - free unused pages
- *
- * @mm: DRM buddy manager
- * @start: start address to begin the trimming.
- * @new_size: original size requested
- * @blocks: Input and output list of allocated blocks.
- * MUST contain single block as input to be trimmed.
- * On success will contain the newly allocated blocks
- * making up the @new_size. Blocks always appear in
- * ascending order
- *
- * For contiguous allocation, we round up the size to the nearest
- * power of two value, drivers consume *actual* size, so remaining
- * portions are unused and can be optionally freed with this function
- *
- * Returns:
- * 0 on success, error code on failure.
- */
-int drm_buddy_block_trim(struct drm_buddy *mm,
-			 u64 *start,
-			 u64 new_size,
-			 struct list_head *blocks)
-{
-	struct drm_buddy_block *parent;
-	struct drm_buddy_block *block;
-	u64 block_start, block_end;
-	LIST_HEAD(dfs);
-	u64 new_start;
-	int err;
-
-	if (!list_is_singular(blocks))
-		return -EINVAL;
-
-	block = list_first_entry(blocks,
-				 struct drm_buddy_block,
-				 link);
-
-	block_start = drm_buddy_block_offset(block);
-	block_end = block_start + drm_buddy_block_size(mm, block);
-
-	if (WARN_ON(!drm_buddy_block_is_allocated(block)))
-		return -EINVAL;
-
-	if (new_size > drm_buddy_block_size(mm, block))
-		return -EINVAL;
-
-	if (!new_size || !IS_ALIGNED(new_size, mm->chunk_size))
-		return -EINVAL;
-
-	if (new_size == drm_buddy_block_size(mm, block))
-		return 0;
-
-	new_start = block_start;
-	if (start) {
-		new_start = *start;
-
-		if (new_start < block_start)
-			return -EINVAL;
-
-		if (!IS_ALIGNED(new_start, mm->chunk_size))
-			return -EINVAL;
-
-		if (range_overflows(new_start, new_size, block_end))
-			return -EINVAL;
-	}
-
-	list_del(&block->link);
-	mark_free(mm, block);
-	mm->avail += drm_buddy_block_size(mm, block);
-	if (drm_buddy_block_is_clear(block))
-		mm->clear_avail += drm_buddy_block_size(mm, block);
-
-	/* Prevent recursively freeing this node */
-	parent = block->parent;
-	block->parent = NULL;
-
-	list_add(&block->tmp_link, &dfs);
-	err =  __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL);
-	if (err) {
-		mark_allocated(mm, block);
-		mm->avail -= drm_buddy_block_size(mm, block);
-		if (drm_buddy_block_is_clear(block))
-			mm->clear_avail -= drm_buddy_block_size(mm, block);
-		list_add(&block->link, blocks);
-	}
-
-	block->parent = parent;
-	return err;
-}
-EXPORT_SYMBOL(drm_buddy_block_trim);
-
-static struct drm_buddy_block *
-__drm_buddy_alloc_blocks(struct drm_buddy *mm,
-			 u64 start, u64 end,
-			 unsigned int order,
-			 unsigned long flags)
-{
-	if (flags & DRM_BUDDY_RANGE_ALLOCATION)
-		/* Allocate traversing within the range */
-		return  __drm_buddy_alloc_range_bias(mm, start, end,
-						     order, flags);
-	else
-		/* Allocate from freetree */
-		return alloc_from_freetree(mm, order, flags);
-}
-
-/**
- * drm_buddy_alloc_blocks - allocate power-of-two blocks
- *
- * @mm: DRM buddy manager to allocate from
- * @start: start of the allowed range for this block
- * @end: end of the allowed range for this block
- * @size: size of the allocation in bytes
- * @min_block_size: alignment of the allocation
- * @blocks: output list head to add allocated blocks
- * @flags: DRM_BUDDY_*_ALLOCATION flags
- *
- * alloc_range_bias() called on range limitations, which traverses
- * the tree and returns the desired block.
- *
- * alloc_from_freetree() called when *no* range restrictions
- * are enforced, which picks the block from the freetree.
- *
- * Returns:
- * 0 on success, error code on failure.
- */
-int drm_buddy_alloc_blocks(struct drm_buddy *mm,
-			   u64 start, u64 end, u64 size,
-			   u64 min_block_size,
-			   struct list_head *blocks,
-			   unsigned long flags)
-{
-	struct drm_buddy_block *block = NULL;
-	u64 original_size, original_min_size;
-	unsigned int min_order, order;
-	LIST_HEAD(allocated);
-	unsigned long pages;
-	int err;
-
-	if (size < mm->chunk_size)
-		return -EINVAL;
-
-	if (min_block_size < mm->chunk_size)
-		return -EINVAL;
-
-	if (!is_power_of_2(min_block_size))
-		return -EINVAL;
-
-	if (!IS_ALIGNED(start | end | size, mm->chunk_size))
-		return -EINVAL;
-
-	if (end > mm->size)
-		return -EINVAL;
-
-	if (range_overflows(start, size, mm->size))
-		return -EINVAL;
-
-	/* Actual range allocation */
-	if (start + size == end) {
-		if (!IS_ALIGNED(start | end, min_block_size))
-			return -EINVAL;
-
-		return __drm_buddy_alloc_range(mm, start, size, NULL, blocks);
-	}
-
-	original_size = size;
-	original_min_size = min_block_size;
-
-	/* Roundup the size to power of 2 */
-	if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION) {
-		size = roundup_pow_of_two(size);
-		min_block_size = size;
-	/* Align size value to min_block_size */
-	} else if (!IS_ALIGNED(size, min_block_size)) {
-		size = round_up(size, min_block_size);
-	}
-
-	pages = size >> ilog2(mm->chunk_size);
-	order = fls(pages) - 1;
-	min_order = ilog2(min_block_size) - ilog2(mm->chunk_size);
-
-	do {
-		order = min(order, (unsigned int)fls(pages) - 1);
-		BUG_ON(order > mm->max_order);
-		BUG_ON(order < min_order);
-
-		do {
-			block = __drm_buddy_alloc_blocks(mm, start,
-							 end,
-							 order,
-							 flags);
-			if (!IS_ERR(block))
-				break;
-
-			if (order-- == min_order) {
-				/* Try allocation through force merge method */
-				if (mm->clear_avail &&
-				    !__force_merge(mm, start, end, min_order)) {
-					block = __drm_buddy_alloc_blocks(mm, start,
-									 end,
-									 min_order,
-									 flags);
-					if (!IS_ERR(block)) {
-						order = min_order;
-						break;
-					}
-				}
-
-				/*
-				 * Try contiguous block allocation through
-				 * try harder method.
-				 */
-				if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION &&
-				    !(flags & DRM_BUDDY_RANGE_ALLOCATION))
-					return __alloc_contig_try_harder(mm,
-									 original_size,
-									 original_min_size,
-									 blocks);
-				err = -ENOSPC;
-				goto err_free;
-			}
-		} while (1);
-
-		mark_allocated(mm, block);
-		mm->avail -= drm_buddy_block_size(mm, block);
-		if (drm_buddy_block_is_clear(block))
-			mm->clear_avail -= drm_buddy_block_size(mm, block);
-		kmemleak_update_trace(block);
-		list_add_tail(&block->link, &allocated);
-
-		pages -= BIT(order);
-
-		if (!pages)
-			break;
-	} while (1);
-
-	/* Trim the allocated block to the required size */
-	if (!(flags & DRM_BUDDY_TRIM_DISABLE) &&
-	    original_size != size) {
-		struct list_head *trim_list;
-		LIST_HEAD(temp);
-		u64 trim_size;
-
-		trim_list = &allocated;
-		trim_size = original_size;
-
-		if (!list_is_singular(&allocated)) {
-			block = list_last_entry(&allocated, typeof(*block), link);
-			list_move(&block->link, &temp);
-			trim_list = &temp;
-			trim_size = drm_buddy_block_size(mm, block) -
-				(size - original_size);
-		}
-
-		drm_buddy_block_trim(mm,
-				     NULL,
-				     trim_size,
-				     trim_list);
-
-		if (!list_empty(&temp))
-			list_splice_tail(trim_list, &allocated);
-	}
-
-	list_splice_tail(&allocated, blocks);
-	return 0;
-
-err_free:
-	drm_buddy_free_list_internal(mm, &allocated);
-	return err;
-}
-EXPORT_SYMBOL(drm_buddy_alloc_blocks);
-
 /**
  * drm_buddy_block_print - print block information
  *
- * @mm: DRM buddy manager
- * @block: DRM buddy block
+ * @mm: GPU buddy manager
+ * @block: GPU buddy block
  * @p: DRM printer to use
  */
-void drm_buddy_block_print(struct drm_buddy *mm,
-			   struct drm_buddy_block *block,
+void drm_buddy_block_print(struct gpu_buddy *mm, struct gpu_buddy_block *block,
 			   struct drm_printer *p)
 {
-	u64 start = drm_buddy_block_offset(block);
-	u64 size = drm_buddy_block_size(mm, block);
+	u64 start = gpu_buddy_block_offset(block);
+	u64 size = gpu_buddy_block_size(mm, block);
 
 	drm_printf(p, "%#018llx-%#018llx: %llu\n", start, start + size, size);
 }
@@ -1267,18 +30,21 @@ EXPORT_SYMBOL(drm_buddy_block_print);
 /**
  * drm_buddy_print - print allocator state
  *
- * @mm: DRM buddy manager
+ * @mm: GPU buddy manager
  * @p: DRM printer to use
  */
-void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
+void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p)
 {
 	int order;
 
-	drm_printf(p, "chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
-		   mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
+	drm_printf(
+		p,
+		"chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
+		mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20,
+		mm->clear_avail >> 20);
 
 	for (order = mm->max_order; order >= 0; order--) {
-		struct drm_buddy_block *block, *tmp;
+		struct gpu_buddy_block *block, *tmp;
 		struct rb_root *root;
 		u64 count = 0, free;
 		unsigned int tree;
@@ -1286,8 +52,9 @@ void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
 		for_each_free_tree(tree) {
 			root = &mm->free_trees[tree][order];
 
-			rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
-				BUG_ON(!drm_buddy_block_is_free(block));
+			rbtree_postorder_for_each_entry_safe(block, tmp, root,
+							     rb) {
+				BUG_ON(!gpu_buddy_block_is_free(block));
 				count++;
 			}
 		}
@@ -1305,22 +72,5 @@ void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
 }
 EXPORT_SYMBOL(drm_buddy_print);
 
-static void drm_buddy_module_exit(void)
-{
-	kmem_cache_destroy(slab_blocks);
-}
-
-static int __init drm_buddy_module_init(void)
-{
-	slab_blocks = KMEM_CACHE(drm_buddy_block, 0);
-	if (!slab_blocks)
-		return -ENOMEM;
-
-	return 0;
-}
-
-module_init(drm_buddy_module_init);
-module_exit(drm_buddy_module_exit);
-
-MODULE_DESCRIPTION("DRM Buddy Allocator");
+MODULE_DESCRIPTION("DRM-specific GPU Buddy Allocator Print Helpers");
 MODULE_LICENSE("Dual MIT/GPL");
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 5e939004b646..859aeca87c19 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -38,6 +38,7 @@ config DRM_I915
 	select CEC_CORE if CEC_NOTIFIER
 	select VMAP_PFN
 	select DRM_TTM
+	select GPU_BUDDY
 	select DRM_BUDDY
 	select AUXILIARY_BUS
 	help
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c
index 4d830740946d..6a34dae13769 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -7,7 +7,7 @@
 #include "i915_scatterlist.h"
 #include "i915_ttm_buddy_manager.h"
 
-#include <drm/drm_buddy.h>
+#include <linux/gpu_buddy.h>
 #include <drm/drm_mm.h>
 
 #include <linux/slab.h>
@@ -167,9 +167,9 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 	const u64 size = res->size;
 	const u32 max_segment = round_down(UINT_MAX, page_alignment);
-	struct drm_buddy *mm = bman_res->mm;
+	struct gpu_buddy *mm = bman_res->mm;
 	struct list_head *blocks = &bman_res->blocks;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	struct i915_refct_sgt *rsgt;
 	struct scatterlist *sg;
 	struct sg_table *st;
@@ -202,8 +202,8 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
 	list_for_each_entry(block, blocks, link) {
 		u64 block_size, offset;
 
-		block_size = min_t(u64, size, drm_buddy_block_size(mm, block));
-		offset = drm_buddy_block_offset(block);
+		block_size = min_t(u64, size, gpu_buddy_block_size(mm, block));
+		offset = gpu_buddy_block_offset(block);
 
 		while (block_size) {
 			u64 len;
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
index d5c6e6605086..f43d7f2771ad 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/slab.h>
+#include <linux/gpu_buddy.h>
 
 #include <drm/drm_buddy.h>
 #include <drm/drm_print.h>
@@ -16,7 +17,7 @@
 
 struct i915_ttm_buddy_manager {
 	struct ttm_resource_manager manager;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	struct list_head reserved;
 	struct mutex lock;
 	unsigned long visible_size;
@@ -38,7 +39,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 {
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
 	struct i915_ttm_buddy_resource *bman_res;
-	struct drm_buddy *mm = &bman->mm;
+	struct gpu_buddy *mm = &bman->mm;
 	unsigned long n_pages, lpfn;
 	u64 min_page_size;
 	u64 size;
@@ -57,13 +58,13 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 	bman_res->mm = mm;
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN)
-		bman_res->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+		bman_res->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
 
 	if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
-		bman_res->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
+		bman_res->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION;
 
 	if (place->fpfn || lpfn != man->size)
-		bman_res->flags |= DRM_BUDDY_RANGE_ALLOCATION;
+		bman_res->flags |= GPU_BUDDY_RANGE_ALLOCATION;
 
 	GEM_BUG_ON(!bman_res->base.size);
 	size = bman_res->base.size;
@@ -89,7 +90,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 		goto err_free_res;
 	}
 
-	err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
+	err = gpu_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
 				     (u64)lpfn << PAGE_SHIFT,
 				     (u64)n_pages << PAGE_SHIFT,
 				     min_page_size,
@@ -101,15 +102,15 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 	if (lpfn <= bman->visible_size) {
 		bman_res->used_visible_size = PFN_UP(bman_res->base.size);
 	} else {
-		struct drm_buddy_block *block;
+		struct gpu_buddy_block *block;
 
 		list_for_each_entry(block, &bman_res->blocks, link) {
 			unsigned long start =
-				drm_buddy_block_offset(block) >> PAGE_SHIFT;
+				gpu_buddy_block_offset(block) >> PAGE_SHIFT;
 
 			if (start < bman->visible_size) {
 				unsigned long end = start +
-					(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+					(gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
 
 				bman_res->used_visible_size +=
 					min(end, bman->visible_size) - start;
@@ -126,7 +127,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
 	return 0;
 
 err_free_blocks:
-	drm_buddy_free_list(mm, &bman_res->blocks, 0);
+	gpu_buddy_free_list(mm, &bman_res->blocks, 0);
 	mutex_unlock(&bman->lock);
 err_free_res:
 	ttm_resource_fini(man, &bman_res->base);
@@ -141,7 +142,7 @@ static void i915_ttm_buddy_man_free(struct ttm_resource_manager *man,
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
 
 	mutex_lock(&bman->lock);
-	drm_buddy_free_list(&bman->mm, &bman_res->blocks, 0);
+	gpu_buddy_free_list(&bman->mm, &bman_res->blocks, 0);
 	bman->visible_avail += bman_res->used_visible_size;
 	mutex_unlock(&bman->lock);
 
@@ -156,8 +157,8 @@ static bool i915_ttm_buddy_man_intersects(struct ttm_resource_manager *man,
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
-	struct drm_buddy *mm = &bman->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = &bman->mm;
+	struct gpu_buddy_block *block;
 
 	if (!place->fpfn && !place->lpfn)
 		return true;
@@ -176,9 +177,9 @@ static bool i915_ttm_buddy_man_intersects(struct ttm_resource_manager *man,
 	/* Check each drm buddy block individually */
 	list_for_each_entry(block, &bman_res->blocks, link) {
 		unsigned long fpfn =
-			drm_buddy_block_offset(block) >> PAGE_SHIFT;
+			gpu_buddy_block_offset(block) >> PAGE_SHIFT;
 		unsigned long lpfn = fpfn +
-			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+			(gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
 
 		if (place->fpfn < lpfn && place->lpfn > fpfn)
 			return true;
@@ -194,8 +195,8 @@ static bool i915_ttm_buddy_man_compatible(struct ttm_resource_manager *man,
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
-	struct drm_buddy *mm = &bman->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = &bman->mm;
+	struct gpu_buddy_block *block;
 
 	if (!place->fpfn && !place->lpfn)
 		return true;
@@ -209,9 +210,9 @@ static bool i915_ttm_buddy_man_compatible(struct ttm_resource_manager *man,
 	/* Check each drm buddy block individually */
 	list_for_each_entry(block, &bman_res->blocks, link) {
 		unsigned long fpfn =
-			drm_buddy_block_offset(block) >> PAGE_SHIFT;
+			gpu_buddy_block_offset(block) >> PAGE_SHIFT;
 		unsigned long lpfn = fpfn +
-			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+			(gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
 
 		if (fpfn < place->fpfn || lpfn > place->lpfn)
 			return false;
@@ -224,7 +225,7 @@ static void i915_ttm_buddy_man_debug(struct ttm_resource_manager *man,
 				     struct drm_printer *printer)
 {
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 
 	mutex_lock(&bman->lock);
 	drm_printf(printer, "default_page_size: %lluKiB\n",
@@ -293,7 +294,7 @@ int i915_ttm_buddy_man_init(struct ttm_device *bdev,
 	if (!bman)
 		return -ENOMEM;
 
-	err = drm_buddy_init(&bman->mm, size, chunk_size);
+	err = gpu_buddy_init(&bman->mm, size, chunk_size);
 	if (err)
 		goto err_free_bman;
 
@@ -333,7 +334,7 @@ int i915_ttm_buddy_man_fini(struct ttm_device *bdev, unsigned int type)
 {
 	struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
-	struct drm_buddy *mm = &bman->mm;
+	struct gpu_buddy *mm = &bman->mm;
 	int ret;
 
 	ttm_resource_manager_set_used(man, false);
@@ -345,8 +346,8 @@ int i915_ttm_buddy_man_fini(struct ttm_device *bdev, unsigned int type)
 	ttm_set_driver_manager(bdev, type, NULL);
 
 	mutex_lock(&bman->lock);
-	drm_buddy_free_list(mm, &bman->reserved, 0);
-	drm_buddy_fini(mm);
+	gpu_buddy_free_list(mm, &bman->reserved, 0);
+	gpu_buddy_fini(mm);
 	bman->visible_avail += bman->visible_reserved;
 	WARN_ON_ONCE(bman->visible_avail != bman->visible_size);
 	mutex_unlock(&bman->lock);
@@ -371,15 +372,15 @@ int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man,
 			       u64 start, u64 size)
 {
 	struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
-	struct drm_buddy *mm = &bman->mm;
+	struct gpu_buddy *mm = &bman->mm;
 	unsigned long fpfn = start >> PAGE_SHIFT;
 	unsigned long flags = 0;
 	int ret;
 
-	flags |= DRM_BUDDY_RANGE_ALLOCATION;
+	flags |= GPU_BUDDY_RANGE_ALLOCATION;
 
 	mutex_lock(&bman->lock);
-	ret = drm_buddy_alloc_blocks(mm, start,
+	ret = gpu_buddy_alloc_blocks(mm, start,
 				     start + size,
 				     size, mm->chunk_size,
 				     &bman->reserved,
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
index d64620712830..4a92dcf09766 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
@@ -13,14 +13,14 @@
 
 struct ttm_device;
 struct ttm_resource_manager;
-struct drm_buddy;
+struct gpu_buddy;
 
 /**
  * struct i915_ttm_buddy_resource
  *
  * @base: struct ttm_resource base class we extend
  * @blocks: the list of struct i915_buddy_block for this resource/allocation
- * @flags: DRM_BUDDY_*_ALLOCATION flags
+ * @flags: GPU_BUDDY_*_ALLOCATION flags
  * @used_visible_size: How much of this resource, if any, uses the CPU visible
  * portion, in pages.
  * @mm: the struct i915_buddy_mm for this resource
@@ -33,7 +33,7 @@ struct i915_ttm_buddy_resource {
 	struct list_head blocks;
 	unsigned long flags;
 	unsigned long used_visible_size;
-	struct drm_buddy *mm;
+	struct gpu_buddy *mm;
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 7b856b5090f9..8307390943a2 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -6,7 +6,7 @@
 #include <linux/prime_numbers.h>
 #include <linux/sort.h>
 
-#include <drm/drm_buddy.h>
+#include <linux/gpu_buddy.h>
 
 #include "../i915_selftest.h"
 
@@ -371,7 +371,7 @@ static int igt_mock_splintered_region(void *arg)
 	struct drm_i915_private *i915 = mem->i915;
 	struct i915_ttm_buddy_resource *res;
 	struct drm_i915_gem_object *obj;
-	struct drm_buddy *mm;
+	struct gpu_buddy *mm;
 	unsigned int expected_order;
 	LIST_HEAD(objects);
 	u64 size;
@@ -447,8 +447,8 @@ static int igt_mock_max_segment(void *arg)
 	struct drm_i915_private *i915 = mem->i915;
 	struct i915_ttm_buddy_resource *res;
 	struct drm_i915_gem_object *obj;
-	struct drm_buddy_block *block;
-	struct drm_buddy *mm;
+	struct gpu_buddy_block *block;
+	struct gpu_buddy *mm;
 	struct list_head *blocks;
 	struct scatterlist *sg;
 	I915_RND_STATE(prng);
@@ -487,8 +487,8 @@ static int igt_mock_max_segment(void *arg)
 	mm = res->mm;
 	size = 0;
 	list_for_each_entry(block, blocks, link) {
-		if (drm_buddy_block_size(mm, block) > size)
-			size = drm_buddy_block_size(mm, block);
+		if (gpu_buddy_block_size(mm, block) > size)
+			size = gpu_buddy_block_size(mm, block);
 	}
 	if (size < max_segment) {
 		pr_err("%s: Failed to create a huge contiguous block [> %u], largest block %lld\n",
@@ -527,14 +527,14 @@ static u64 igt_object_mappable_total(struct drm_i915_gem_object *obj)
 	struct intel_memory_region *mr = obj->mm.region;
 	struct i915_ttm_buddy_resource *bman_res =
 		to_ttm_buddy_resource(obj->mm.res);
-	struct drm_buddy *mm = bman_res->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = bman_res->mm;
+	struct gpu_buddy_block *block;
 	u64 total;
 
 	total = 0;
 	list_for_each_entry(block, &bman_res->blocks, link) {
-		u64 start = drm_buddy_block_offset(block);
-		u64 end = start + drm_buddy_block_size(mm, block);
+		u64 start = gpu_buddy_block_offset(block);
+		u64 end = start + gpu_buddy_block_size(mm, block);
 
 		if (start < resource_size(&mr->io))
 			total += min_t(u64, end, resource_size(&mr->io)) - start;
diff --git a/drivers/gpu/drm/tests/Makefile b/drivers/gpu/drm/tests/Makefile
index 87d5d5f9332a..d2e2e3d8349a 100644
--- a/drivers/gpu/drm/tests/Makefile
+++ b/drivers/gpu/drm/tests/Makefile
@@ -7,7 +7,6 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
 	drm_atomic_test.o \
 	drm_atomic_state_test.o \
 	drm_bridge_test.o \
-	drm_buddy_test.o \
 	drm_cmdline_parser_test.o \
 	drm_connector_test.o \
 	drm_damage_helper_test.o \
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 2eda87882e65..ffa12473077c 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -3,6 +3,7 @@
  * Copyright © 2023 Intel Corporation
  */
 #include <linux/delay.h>
+#include <linux/gpu_buddy.h>
 #include <linux/kthread.h>
 
 #include <drm/ttm/ttm_resource.h>
@@ -251,7 +252,7 @@ static void ttm_bo_validate_basic(struct kunit *test)
 				   NULL, &dummy_ttm_bo_destroy);
 	KUNIT_EXPECT_EQ(test, err, 0);
 
-	snd_place = ttm_place_kunit_init(test, snd_mem, DRM_BUDDY_TOPDOWN_ALLOCATION);
+	snd_place = ttm_place_kunit_init(test, snd_mem, GPU_BUDDY_TOPDOWN_ALLOCATION);
 	snd_placement = ttm_placement_kunit_init(test, snd_place, 1);
 
 	err = ttm_bo_validate(bo, snd_placement, &ctx_val);
@@ -263,7 +264,7 @@ static void ttm_bo_validate_basic(struct kunit *test)
 	KUNIT_EXPECT_TRUE(test, ttm_tt_is_populated(bo->ttm));
 	KUNIT_EXPECT_EQ(test, bo->resource->mem_type, snd_mem);
 	KUNIT_EXPECT_EQ(test, bo->resource->placement,
-			DRM_BUDDY_TOPDOWN_ALLOCATION);
+			GPU_BUDDY_TOPDOWN_ALLOCATION);
 
 	ttm_bo_fini(bo);
 	ttm_mock_manager_fini(priv->ttm_dev, snd_mem);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c
index dd395229e388..294d56d9067e 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c
@@ -31,7 +31,7 @@ static int ttm_mock_manager_alloc(struct ttm_resource_manager *man,
 {
 	struct ttm_mock_manager *manager = to_mock_mgr(man);
 	struct ttm_mock_resource *mock_res;
-	struct drm_buddy *mm = &manager->mm;
+	struct gpu_buddy *mm = &manager->mm;
 	u64 lpfn, fpfn, alloc_size;
 	int err;
 
@@ -47,14 +47,14 @@ static int ttm_mock_manager_alloc(struct ttm_resource_manager *man,
 	INIT_LIST_HEAD(&mock_res->blocks);
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN)
-		mock_res->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+		mock_res->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
 
 	if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
-		mock_res->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
+		mock_res->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION;
 
 	alloc_size = (uint64_t)mock_res->base.size;
 	mutex_lock(&manager->lock);
-	err = drm_buddy_alloc_blocks(mm, fpfn, lpfn, alloc_size,
+	err = gpu_buddy_alloc_blocks(mm, fpfn, lpfn, alloc_size,
 				     manager->default_page_size,
 				     &mock_res->blocks,
 				     mock_res->flags);
@@ -67,7 +67,7 @@ static int ttm_mock_manager_alloc(struct ttm_resource_manager *man,
 	return 0;
 
 error_free_blocks:
-	drm_buddy_free_list(mm, &mock_res->blocks, 0);
+	gpu_buddy_free_list(mm, &mock_res->blocks, 0);
 	ttm_resource_fini(man, &mock_res->base);
 	mutex_unlock(&manager->lock);
 
@@ -79,10 +79,10 @@ static void ttm_mock_manager_free(struct ttm_resource_manager *man,
 {
 	struct ttm_mock_manager *manager = to_mock_mgr(man);
 	struct ttm_mock_resource *mock_res = to_mock_mgr_resource(res);
-	struct drm_buddy *mm = &manager->mm;
+	struct gpu_buddy *mm = &manager->mm;
 
 	mutex_lock(&manager->lock);
-	drm_buddy_free_list(mm, &mock_res->blocks, 0);
+	gpu_buddy_free_list(mm, &mock_res->blocks, 0);
 	mutex_unlock(&manager->lock);
 
 	ttm_resource_fini(man, res);
@@ -106,7 +106,7 @@ int ttm_mock_manager_init(struct ttm_device *bdev, u32 mem_type, u32 size)
 
 	mutex_init(&manager->lock);
 
-	err = drm_buddy_init(&manager->mm, size, PAGE_SIZE);
+	err = gpu_buddy_init(&manager->mm, size, PAGE_SIZE);
 
 	if (err) {
 		kfree(manager);
@@ -142,7 +142,7 @@ void ttm_mock_manager_fini(struct ttm_device *bdev, u32 mem_type)
 	ttm_resource_manager_set_used(man, false);
 
 	mutex_lock(&mock_man->lock);
-	drm_buddy_fini(&mock_man->mm);
+	gpu_buddy_fini(&mock_man->mm);
 	mutex_unlock(&mock_man->lock);
 
 	ttm_set_driver_manager(bdev, mem_type, NULL);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h
index e4c95f86a467..08710756fd8e 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h
+++ b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h
@@ -5,11 +5,11 @@
 #ifndef TTM_MOCK_MANAGER_H
 #define TTM_MOCK_MANAGER_H
 
-#include <drm/drm_buddy.h>
+#include <linux/gpu_buddy.h>
 
 struct ttm_mock_manager {
 	struct ttm_resource_manager man;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	u64 default_page_size;
 	/* protects allocations of mock buffer objects */
 	struct mutex lock;
diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 4b288eb3f5b0..982ef754742e 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -11,6 +11,7 @@ config DRM_XE
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
 	select TMPFS
+	select GPU_BUDDY
 	select DRM_BUDDY
 	select DRM_CLIENT_SELECTION
 	select DRM_KMS_HELPER
diff --git a/drivers/gpu/drm/xe/xe_res_cursor.h b/drivers/gpu/drm/xe/xe_res_cursor.h
index 4e00008b7081..5f4ab08c0686 100644
--- a/drivers/gpu/drm/xe/xe_res_cursor.h
+++ b/drivers/gpu/drm/xe/xe_res_cursor.h
@@ -58,7 +58,7 @@ struct xe_res_cursor {
 	/** @dma_addr: Current element in a struct drm_pagemap_addr array */
 	const struct drm_pagemap_addr *dma_addr;
 	/** @mm: Buddy allocator for VRAM cursor */
-	struct drm_buddy *mm;
+	struct gpu_buddy *mm;
 	/**
 	 * @dma_start: DMA start address for the current segment.
 	 * This may be different to @dma_addr.addr since elements in
@@ -69,7 +69,7 @@ struct xe_res_cursor {
 	u64 dma_seg_size;
 };
 
-static struct drm_buddy *xe_res_get_buddy(struct ttm_resource *res)
+static struct gpu_buddy *xe_res_get_buddy(struct ttm_resource *res)
 {
 	struct ttm_resource_manager *mgr;
 
@@ -104,30 +104,30 @@ static inline void xe_res_first(struct ttm_resource *res,
 	case XE_PL_STOLEN:
 	case XE_PL_VRAM0:
 	case XE_PL_VRAM1: {
-		struct drm_buddy_block *block;
+		struct gpu_buddy_block *block;
 		struct list_head *head, *next;
-		struct drm_buddy *mm = xe_res_get_buddy(res);
+		struct gpu_buddy *mm = xe_res_get_buddy(res);
 
 		head = &to_xe_ttm_vram_mgr_resource(res)->blocks;
 
 		block = list_first_entry_or_null(head,
-						 struct drm_buddy_block,
+						 struct gpu_buddy_block,
 						 link);
 		if (!block)
 			goto fallback;
 
-		while (start >= drm_buddy_block_size(mm, block)) {
-			start -= drm_buddy_block_size(mm, block);
+		while (start >= gpu_buddy_block_size(mm, block)) {
+			start -= gpu_buddy_block_size(mm, block);
 
 			next = block->link.next;
 			if (next != head)
-				block = list_entry(next, struct drm_buddy_block,
+				block = list_entry(next, struct gpu_buddy_block,
 						   link);
 		}
 
 		cur->mm = mm;
-		cur->start = drm_buddy_block_offset(block) + start;
-		cur->size = min(drm_buddy_block_size(mm, block) - start,
+		cur->start = gpu_buddy_block_offset(block) + start;
+		cur->size = min(gpu_buddy_block_size(mm, block) - start,
 				size);
 		cur->remaining = size;
 		cur->node = block;
@@ -259,7 +259,7 @@ static inline void xe_res_first_dma(const struct drm_pagemap_addr *dma_addr,
  */
 static inline void xe_res_next(struct xe_res_cursor *cur, u64 size)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	struct list_head *next;
 	u64 start;
 
@@ -295,18 +295,18 @@ static inline void xe_res_next(struct xe_res_cursor *cur, u64 size)
 		block = cur->node;
 
 		next = block->link.next;
-		block = list_entry(next, struct drm_buddy_block, link);
+		block = list_entry(next, struct gpu_buddy_block, link);
 
 
-		while (start >= drm_buddy_block_size(cur->mm, block)) {
-			start -= drm_buddy_block_size(cur->mm, block);
+		while (start >= gpu_buddy_block_size(cur->mm, block)) {
+			start -= gpu_buddy_block_size(cur->mm, block);
 
 			next = block->link.next;
-			block = list_entry(next, struct drm_buddy_block, link);
+			block = list_entry(next, struct gpu_buddy_block, link);
 		}
 
-		cur->start = drm_buddy_block_offset(block) + start;
-		cur->size = min(drm_buddy_block_size(cur->mm, block) - start,
+		cur->start = gpu_buddy_block_offset(block) + start;
+		cur->size = min(gpu_buddy_block_size(cur->mm, block) - start,
 				cur->remaining);
 		cur->node = block;
 		break;
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index f97e0af6a9b0..2b7e266f9bdd 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -688,7 +688,7 @@ static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
 	return PHYS_PFN(offset + vr->hpa_base);
 }
 
-static struct drm_buddy *vram_to_buddy(struct xe_vram_region *vram)
+static struct gpu_buddy *vram_to_buddy(struct xe_vram_region *vram)
 {
 	return &vram->ttm.mm;
 }
@@ -699,16 +699,16 @@ static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocati
 	struct xe_bo *bo = to_xe_bo(devmem_allocation);
 	struct ttm_resource *res = bo->ttm.resource;
 	struct list_head *blocks = &to_xe_ttm_vram_mgr_resource(res)->blocks;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	int j = 0;
 
 	list_for_each_entry(block, blocks, link) {
 		struct xe_vram_region *vr = block->private;
-		struct drm_buddy *buddy = vram_to_buddy(vr);
-		u64 block_pfn = block_offset_to_pfn(vr, drm_buddy_block_offset(block));
+		struct gpu_buddy *buddy = vram_to_buddy(vr);
+		u64 block_pfn = block_offset_to_pfn(vr, gpu_buddy_block_offset(block));
 		int i;
 
-		for (i = 0; i < drm_buddy_block_size(buddy, block) >> PAGE_SHIFT; ++i)
+		for (i = 0; i < gpu_buddy_block_size(buddy, block) >> PAGE_SHIFT; ++i)
 			pfn[j++] = block_pfn + i;
 	}
 
@@ -876,7 +876,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 	struct dma_fence *pre_migrate_fence = NULL;
 	struct xe_device *xe = vr->xe;
 	struct device *dev = xe->drm.dev;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	struct xe_validation_ctx vctx;
 	struct list_head *blocks;
 	struct drm_exec exec;
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 9f70802fce92..8192957261e8 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -4,8 +4,9 @@
  * Copyright (C) 2021-2022 Red Hat
  */
 
-#include <drm/drm_managed.h>
+#include <drm/drm_buddy.h>
 #include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
 
 #include <drm/ttm/ttm_placement.h>
 #include <drm/ttm/ttm_range_manager.h>
@@ -17,16 +18,16 @@
 #include "xe_ttm_vram_mgr.h"
 #include "xe_vram_types.h"
 
-static inline struct drm_buddy_block *
+static inline struct gpu_buddy_block *
 xe_ttm_vram_mgr_first_block(struct list_head *list)
 {
-	return list_first_entry_or_null(list, struct drm_buddy_block, link);
+	return list_first_entry_or_null(list, struct gpu_buddy_block, link);
 }
 
-static inline bool xe_is_vram_mgr_blocks_contiguous(struct drm_buddy *mm,
+static inline bool xe_is_vram_mgr_blocks_contiguous(struct gpu_buddy *mm,
 						    struct list_head *head)
 {
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	u64 start, size;
 
 	block = xe_ttm_vram_mgr_first_block(head);
@@ -34,12 +35,12 @@ static inline bool xe_is_vram_mgr_blocks_contiguous(struct drm_buddy *mm,
 		return false;
 
 	while (head != block->link.next) {
-		start = drm_buddy_block_offset(block);
-		size = drm_buddy_block_size(mm, block);
+		start = gpu_buddy_block_offset(block);
+		size = gpu_buddy_block_size(mm, block);
 
-		block = list_entry(block->link.next, struct drm_buddy_block,
+		block = list_entry(block->link.next, struct gpu_buddy_block,
 				   link);
-		if (start + size != drm_buddy_block_offset(block))
+		if (start + size != gpu_buddy_block_offset(block))
 			return false;
 	}
 
@@ -53,7 +54,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 {
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct xe_ttm_vram_mgr_resource *vres;
-	struct drm_buddy *mm = &mgr->mm;
+	struct gpu_buddy *mm = &mgr->mm;
 	u64 size, min_page_size;
 	unsigned long lpfn;
 	int err;
@@ -80,10 +81,10 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	INIT_LIST_HEAD(&vres->blocks);
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN)
-		vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+		vres->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
 
 	if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
-		vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
+		vres->flags |= GPU_BUDDY_RANGE_ALLOCATION;
 
 	if (WARN_ON(!vres->base.size)) {
 		err = -EINVAL;
@@ -119,27 +120,27 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 		lpfn = max_t(unsigned long, place->fpfn + (size >> PAGE_SHIFT), lpfn);
 	}
 
-	err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
+	err = gpu_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
 				     (u64)lpfn << PAGE_SHIFT, size,
 				     min_page_size, &vres->blocks, vres->flags);
 	if (err)
 		goto error_unlock;
 
 	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
-		if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks))
+		if (!gpu_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks))
 			size = vres->base.size;
 	}
 
 	if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
 		vres->used_visible_size = size;
 	} else {
-		struct drm_buddy_block *block;
+		struct gpu_buddy_block *block;
 
 		list_for_each_entry(block, &vres->blocks, link) {
-			u64 start = drm_buddy_block_offset(block);
+			u64 start = gpu_buddy_block_offset(block);
 
 			if (start < mgr->visible_size) {
-				u64 end = start + drm_buddy_block_size(mm, block);
+				u64 end = start + gpu_buddy_block_size(mm, block);
 
 				vres->used_visible_size +=
 					min(end, mgr->visible_size) - start;
@@ -159,11 +160,11 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	 * the object.
 	 */
 	if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) {
-		struct drm_buddy_block *block = list_first_entry(&vres->blocks,
+		struct gpu_buddy_block *block = list_first_entry(&vres->blocks,
 								 typeof(*block),
 								 link);
 
-		vres->base.start = drm_buddy_block_offset(block) >> PAGE_SHIFT;
+		vres->base.start = gpu_buddy_block_offset(block) >> PAGE_SHIFT;
 	} else {
 		vres->base.start = XE_BO_INVALID_OFFSET;
 	}
@@ -185,10 +186,10 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
 	struct xe_ttm_vram_mgr_resource *vres =
 		to_xe_ttm_vram_mgr_resource(res);
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
-	struct drm_buddy *mm = &mgr->mm;
+	struct gpu_buddy *mm = &mgr->mm;
 
 	mutex_lock(&mgr->lock);
-	drm_buddy_free_list(mm, &vres->blocks, 0);
+	gpu_buddy_free_list(mm, &vres->blocks, 0);
 	mgr->visible_avail += vres->used_visible_size;
 	mutex_unlock(&mgr->lock);
 
@@ -201,7 +202,7 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
 				  struct drm_printer *printer)
 {
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
-	struct drm_buddy *mm = &mgr->mm;
+	struct gpu_buddy *mm = &mgr->mm;
 
 	mutex_lock(&mgr->lock);
 	drm_printf(printer, "default_page_size: %lluKiB\n",
@@ -224,8 +225,8 @@ static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct xe_ttm_vram_mgr_resource *vres =
 		to_xe_ttm_vram_mgr_resource(res);
-	struct drm_buddy *mm = &mgr->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
 
 	if (!place->fpfn && !place->lpfn)
 		return true;
@@ -235,9 +236,9 @@ static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
 
 	list_for_each_entry(block, &vres->blocks, link) {
 		unsigned long fpfn =
-			drm_buddy_block_offset(block) >> PAGE_SHIFT;
+			gpu_buddy_block_offset(block) >> PAGE_SHIFT;
 		unsigned long lpfn = fpfn +
-			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+			(gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
 
 		if (place->fpfn < lpfn && place->lpfn > fpfn)
 			return true;
@@ -254,8 +255,8 @@ static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct xe_ttm_vram_mgr_resource *vres =
 		to_xe_ttm_vram_mgr_resource(res);
-	struct drm_buddy *mm = &mgr->mm;
-	struct drm_buddy_block *block;
+	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
 
 	if (!place->fpfn && !place->lpfn)
 		return true;
@@ -265,9 +266,9 @@ static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
 
 	list_for_each_entry(block, &vres->blocks, link) {
 		unsigned long fpfn =
-			drm_buddy_block_offset(block) >> PAGE_SHIFT;
+			gpu_buddy_block_offset(block) >> PAGE_SHIFT;
 		unsigned long lpfn = fpfn +
-			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+			(gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
 
 		if (fpfn < place->fpfn || lpfn > place->lpfn)
 			return false;
@@ -297,7 +298,7 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
 
 	WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
 
-	drm_buddy_fini(&mgr->mm);
+	gpu_buddy_fini(&mgr->mm);
 
 	ttm_resource_manager_cleanup(&mgr->manager);
 
@@ -328,7 +329,7 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	mgr->visible_avail = io_size;
 
 	ttm_resource_manager_init(man, &xe->ttm, size);
-	err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
+	err = gpu_buddy_init(&mgr->mm, man->size, default_page_size);
 	if (err)
 		return err;
 
@@ -376,7 +377,7 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
 	if (!*sgt)
 		return -ENOMEM;
 
-	/* Determine the number of DRM_BUDDY blocks to export */
+	/* Determine the number of GPU_BUDDY blocks to export */
 	xe_res_first(res, offset, length, &cursor);
 	while (cursor.remaining) {
 		num_entries++;
@@ -393,10 +394,10 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
 		sg->length = 0;
 
 	/*
-	 * Walk down DRM_BUDDY blocks to populate scatterlist nodes
-	 * @note: Use iterator api to get first the DRM_BUDDY block
+	 * Walk down GPU_BUDDY blocks to populate scatterlist nodes
+	 * @note: Use the iterator API to first get the GPU_BUDDY block
 	 * and the number of bytes from it. Access the following
-	 * DRM_BUDDY block(s) if more buffer needs to exported
+	 * GPU_BUDDY block(s) if more of the buffer needs to be exported
 	 */
 	xe_res_first(res, offset, length, &cursor);
 	for_each_sgtable_sg((*sgt), sg, i) {
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index a71e14818ec2..9106da056b49 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -6,7 +6,7 @@
 #ifndef _XE_TTM_VRAM_MGR_TYPES_H_
 #define _XE_TTM_VRAM_MGR_TYPES_H_
 
-#include <drm/drm_buddy.h>
+#include <linux/gpu_buddy.h>
 #include <drm/ttm/ttm_device.h>
 
 /**
@@ -18,7 +18,7 @@ struct xe_ttm_vram_mgr {
 	/** @manager: Base TTM resource manager */
 	struct ttm_resource_manager manager;
-	/** @mm: DRM buddy allocator which manages the VRAM */
-	struct drm_buddy mm;
+	/** @mm: GPU buddy allocator which manages the VRAM */
+	struct gpu_buddy mm;
-	/** @visible_size: Proped size of the CPU visible portion */
+	/** @visible_size: Probed size of the CPU visible portion */
 	u64 visible_size;
 	/** @visible_avail: CPU visible portion still unallocated */
diff --git a/drivers/gpu/tests/Makefile b/drivers/gpu/tests/Makefile
new file mode 100644
index 000000000000..31a5ff44cb4e
--- /dev/null
+++ b/drivers/gpu/tests/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_GPU_BUDDY_KUNIT_TEST) += gpu_buddy_test.o gpu_random.o
diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/tests/gpu_buddy_test.c
similarity index 68%
rename from drivers/gpu/drm/tests/drm_buddy_test.c
rename to drivers/gpu/tests/gpu_buddy_test.c
index 5f40b5343bd8..dcd4741a905d 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/tests/gpu_buddy_test.c
@@ -10,9 +10,9 @@
 #include <linux/sched/signal.h>
 #include <linux/sizes.h>
 
-#include <drm/drm_buddy.h>
+#include <linux/gpu_buddy.h>
 
-#include "../lib/drm_random.h"
+#include "gpu_random.h"
 
 static unsigned int random_seed;
 
@@ -21,9 +21,9 @@ static inline u64 get_size(int order, u64 chunk_size)
 	return (1 << order) * chunk_size;
 }
 
-static void drm_test_buddy_fragmentation_performance(struct kunit *test)
+static void gpu_test_buddy_fragmentation_performance(struct kunit *test)
 {
-	struct drm_buddy_block *block, *tmp;
+	struct gpu_buddy_block *block, *tmp;
 	int num_blocks, i, ret, count = 0;
 	LIST_HEAD(allocated_blocks);
 	unsigned long elapsed_ms;
@@ -32,7 +32,7 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
 	LIST_HEAD(clear_list);
 	LIST_HEAD(dirty_list);
 	LIST_HEAD(free_list);
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	u64 mm_size = SZ_4G;
 	ktime_t start, end;
 
@@ -47,7 +47,7 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
 	 * quickly the allocator can satisfy larger, aligned requests from a pool of
 	 * highly fragmented space.
 	 */
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
 			       "buddy_init failed\n");
 
 	num_blocks = mm_size / SZ_64K;
@@ -55,7 +55,7 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
 	start = ktime_get();
 	/* Allocate with maximum fragmentation - 8K blocks with 64K alignment */
 	for (i = 0; i < num_blocks; i++)
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
 								    &allocated_blocks, 0),
 					"buddy_alloc hit an error size=%u\n", SZ_8K);
 
@@ -68,21 +68,21 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
 	}
 
 	/* Free with different flags to ensure no coalescing */
-	drm_buddy_free_list(&mm, &clear_list, DRM_BUDDY_CLEARED);
-	drm_buddy_free_list(&mm, &dirty_list, 0);
+	gpu_buddy_free_list(&mm, &clear_list, GPU_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &dirty_list, 0);
 
 	for (i = 0; i < num_blocks; i++)
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_64K, SZ_64K,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size, SZ_64K, SZ_64K,
 								    &test_blocks, 0),
 					"buddy_alloc hit an error size=%u\n", SZ_64K);
-	drm_buddy_free_list(&mm, &test_blocks, 0);
+	gpu_buddy_free_list(&mm, &test_blocks, 0);
 
 	end = ktime_get();
 	elapsed_ms = ktime_to_ms(ktime_sub(end, start));
 
 	kunit_info(test, "Fragmented allocation took %lu ms\n", elapsed_ms);
 
-	drm_buddy_fini(&mm);
+	gpu_buddy_fini(&mm);
 
 	/*
 	 * Reverse free order under fragmentation
@@ -96,13 +96,13 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
 	 * deallocation occurs in the opposite order of allocation, exposing the
 	 * cost difference between a linear freelist scan and an ordered tree lookup.
 	 */
-	ret = drm_buddy_init(&mm, mm_size, SZ_4K);
+	ret = gpu_buddy_init(&mm, mm_size, SZ_4K);
 	KUNIT_ASSERT_EQ(test, ret, 0);
 
 	start = ktime_get();
 	/* Allocate maximum fragmentation */
 	for (i = 0; i < num_blocks; i++)
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
 								    &allocated_blocks, 0),
 					"buddy_alloc hit an error size=%u\n", SZ_8K);
 
@@ -111,28 +111,28 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
 			list_move_tail(&block->link, &free_list);
 		count++;
 	}
-	drm_buddy_free_list(&mm, &free_list, DRM_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &free_list, GPU_BUDDY_CLEARED);
 
 	list_for_each_entry_safe_reverse(block, tmp, &allocated_blocks, link)
 		list_move(&block->link, &reverse_list);
-	drm_buddy_free_list(&mm, &reverse_list, DRM_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &reverse_list, GPU_BUDDY_CLEARED);
 
 	end = ktime_get();
 	elapsed_ms = ktime_to_ms(ktime_sub(end, start));
 
 	kunit_info(test, "Reverse-ordered free took %lu ms\n", elapsed_ms);
 
-	drm_buddy_fini(&mm);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_range_bias(struct kunit *test)
+static void gpu_test_buddy_alloc_range_bias(struct kunit *test)
 {
 	u32 mm_size, size, ps, bias_size, bias_start, bias_end, bias_rem;
-	DRM_RND_STATE(prng, random_seed);
+	GPU_RND_STATE(prng, random_seed);
 	unsigned int i, count, *order;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	unsigned long flags;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	LIST_HEAD(allocated);
 
 	bias_size = SZ_1M;
@@ -142,11 +142,11 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 
 	kunit_info(test, "mm_size=%u, ps=%u\n", mm_size, ps);
 
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, ps),
 			       "buddy_init failed\n");
 
 	count = mm_size / bias_size;
-	order = drm_random_order(count, &prng);
+	order = gpu_random_order(count, &prng);
 	KUNIT_EXPECT_TRUE(test, order);
 
 	/*
@@ -166,79 +166,79 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 
 		/* internal round_up too big */
 		KUNIT_ASSERT_TRUE_MSG(test,
-				      drm_buddy_alloc_blocks(&mm, bias_start,
+				      gpu_buddy_alloc_blocks(&mm, bias_start,
 							     bias_end, bias_size + ps, bias_size,
 							     &allocated,
-							     DRM_BUDDY_RANGE_ALLOCATION),
+							     GPU_BUDDY_RANGE_ALLOCATION),
 				      "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
 				      bias_start, bias_end, bias_size, bias_size);
 
 		/* size too big */
 		KUNIT_ASSERT_TRUE_MSG(test,
-				      drm_buddy_alloc_blocks(&mm, bias_start,
+				      gpu_buddy_alloc_blocks(&mm, bias_start,
 							     bias_end, bias_size + ps, ps,
 							     &allocated,
-							     DRM_BUDDY_RANGE_ALLOCATION),
+							     GPU_BUDDY_RANGE_ALLOCATION),
 				      "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
 				      bias_start, bias_end, bias_size + ps, ps);
 
 		/* bias range too small for size */
 		KUNIT_ASSERT_TRUE_MSG(test,
-				      drm_buddy_alloc_blocks(&mm, bias_start + ps,
+				      gpu_buddy_alloc_blocks(&mm, bias_start + ps,
 							     bias_end, bias_size, ps,
 							     &allocated,
-							     DRM_BUDDY_RANGE_ALLOCATION),
+							     GPU_BUDDY_RANGE_ALLOCATION),
 				      "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
 				      bias_start + ps, bias_end, bias_size, ps);
 
 		/* bias misaligned */
 		KUNIT_ASSERT_TRUE_MSG(test,
-				      drm_buddy_alloc_blocks(&mm, bias_start + ps,
+				      gpu_buddy_alloc_blocks(&mm, bias_start + ps,
 							     bias_end - ps,
 							     bias_size >> 1, bias_size >> 1,
 							     &allocated,
-							     DRM_BUDDY_RANGE_ALLOCATION),
+							     GPU_BUDDY_RANGE_ALLOCATION),
 				      "buddy_alloc h didn't fail with bias(%x-%x), size=%u, ps=%u\n",
 				      bias_start + ps, bias_end - ps, bias_size >> 1, bias_size >> 1);
 
 		/* single big page */
 		KUNIT_ASSERT_FALSE_MSG(test,
-				       drm_buddy_alloc_blocks(&mm, bias_start,
+				       gpu_buddy_alloc_blocks(&mm, bias_start,
 							      bias_end, bias_size, bias_size,
 							      &tmp,
-							      DRM_BUDDY_RANGE_ALLOCATION),
+							      GPU_BUDDY_RANGE_ALLOCATION),
 				       "buddy_alloc i failed with bias(%x-%x), size=%u, ps=%u\n",
 				       bias_start, bias_end, bias_size, bias_size);
-		drm_buddy_free_list(&mm, &tmp, 0);
+		gpu_buddy_free_list(&mm, &tmp, 0);
 
 		/* single page with internal round_up */
 		KUNIT_ASSERT_FALSE_MSG(test,
-				       drm_buddy_alloc_blocks(&mm, bias_start,
+				       gpu_buddy_alloc_blocks(&mm, bias_start,
 							      bias_end, ps, bias_size,
 							      &tmp,
-							      DRM_BUDDY_RANGE_ALLOCATION),
+							      GPU_BUDDY_RANGE_ALLOCATION),
 				       "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
 				       bias_start, bias_end, ps, bias_size);
-		drm_buddy_free_list(&mm, &tmp, 0);
+		gpu_buddy_free_list(&mm, &tmp, 0);
 
 		/* random size within */
 		size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
 		if (size)
 			KUNIT_ASSERT_FALSE_MSG(test,
-					       drm_buddy_alloc_blocks(&mm, bias_start,
+					       gpu_buddy_alloc_blocks(&mm, bias_start,
 								      bias_end, size, ps,
 								      &tmp,
-								      DRM_BUDDY_RANGE_ALLOCATION),
+								      GPU_BUDDY_RANGE_ALLOCATION),
 					       "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
 					       bias_start, bias_end, size, ps);
 
 		bias_rem -= size;
 		/* too big for current avail */
 		KUNIT_ASSERT_TRUE_MSG(test,
-				      drm_buddy_alloc_blocks(&mm, bias_start,
+				      gpu_buddy_alloc_blocks(&mm, bias_start,
 							     bias_end, bias_rem + ps, ps,
 							     &allocated,
-							     DRM_BUDDY_RANGE_ALLOCATION),
+							     GPU_BUDDY_RANGE_ALLOCATION),
 				      "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
 				      bias_start, bias_end, bias_rem + ps, ps);
 
@@ -248,10 +248,10 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 			size = max(size, ps);
 
 			KUNIT_ASSERT_FALSE_MSG(test,
-					       drm_buddy_alloc_blocks(&mm, bias_start,
+					       gpu_buddy_alloc_blocks(&mm, bias_start,
 								      bias_end, size, ps,
 								      &allocated,
-								      DRM_BUDDY_RANGE_ALLOCATION),
+								      GPU_BUDDY_RANGE_ALLOCATION),
 					       "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
 					       bias_start, bias_end, size, ps);
 			/*
@@ -259,15 +259,15 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 			 * unallocated, and ideally not always on the bias
 			 * boundaries.
 			 */
-			drm_buddy_free_list(&mm, &tmp, 0);
+			gpu_buddy_free_list(&mm, &tmp, 0);
 		} else {
 			list_splice_tail(&tmp, &allocated);
 		}
 	}
 
 	kfree(order);
-	drm_buddy_free_list(&mm, &allocated, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &allocated, 0);
+	gpu_buddy_fini(&mm);
 
 	/*
 	 * Something more free-form. Idea is to pick a random starting bias
@@ -278,7 +278,7 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 	 * allocated nodes in the middle of the address space.
 	 */
 
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, ps),
 			       "buddy_init failed\n");
 
 	bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps);
@@ -290,10 +290,10 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 		u32 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
 
 		KUNIT_ASSERT_FALSE_MSG(test,
-				       drm_buddy_alloc_blocks(&mm, bias_start,
+				       gpu_buddy_alloc_blocks(&mm, bias_start,
 							      bias_end, size, ps,
 							      &allocated,
-							      DRM_BUDDY_RANGE_ALLOCATION),
+							      GPU_BUDDY_RANGE_ALLOCATION),
 				       "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
 				       bias_start, bias_end, size, ps);
 		bias_rem -= size;
@@ -319,24 +319,24 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 	KUNIT_ASSERT_EQ(test, bias_start, 0);
 	KUNIT_ASSERT_EQ(test, bias_end, mm_size);
 	KUNIT_ASSERT_TRUE_MSG(test,
-			      drm_buddy_alloc_blocks(&mm, bias_start, bias_end,
+			      gpu_buddy_alloc_blocks(&mm, bias_start, bias_end,
 						     ps, ps,
 						     &allocated,
-						     DRM_BUDDY_RANGE_ALLOCATION),
+						     GPU_BUDDY_RANGE_ALLOCATION),
 			      "buddy_alloc passed with bias(%x-%x), size=%u\n",
 			      bias_start, bias_end, ps);
 
-	drm_buddy_free_list(&mm, &allocated, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &allocated, 0);
+	gpu_buddy_fini(&mm);
 
 	/*
-	 * Allocate cleared blocks in the bias range when the DRM buddy's clear avail is
+	 * Allocate cleared blocks in the bias range when the GPU buddy's clear avail is
 	 * zero. This will validate the bias range allocation in scenarios like system boot
 	 * when no cleared blocks are available and exercise the fallback path too. The resulting
 	 * blocks should always be dirty.
 	 */
 
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, ps),
 			       "buddy_init failed\n");
 
 	bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps);
@@ -344,11 +344,11 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 	bias_end = max(bias_end, bias_start + ps);
 	bias_rem = bias_end - bias_start;
 
-	flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION;
+	flags = GPU_BUDDY_CLEAR_ALLOCATION | GPU_BUDDY_RANGE_ALLOCATION;
 	size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
 
 	KUNIT_ASSERT_FALSE_MSG(test,
-			       drm_buddy_alloc_blocks(&mm, bias_start,
+			       gpu_buddy_alloc_blocks(&mm, bias_start,
 						      bias_end, size, ps,
 						      &allocated,
 						      flags),
@@ -356,27 +356,27 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
 			       bias_start, bias_end, size, ps);
 
 	list_for_each_entry(block, &allocated, link)
-		KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
+		KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), false);
 
-	drm_buddy_free_list(&mm, &allocated, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &allocated, 0);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_clear(struct kunit *test)
+static void gpu_test_buddy_alloc_clear(struct kunit *test)
 {
 	unsigned long n_pages, total, i = 0;
 	const unsigned long ps = SZ_4K;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	const int max_order = 12;
 	LIST_HEAD(allocated);
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	unsigned int order;
 	u32 mm_size, size;
 	LIST_HEAD(dirty);
 	LIST_HEAD(clean);
 
 	mm_size = SZ_4K << max_order;
-	KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+	KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
 
 	KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
 
@@ -389,11 +389,11 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
 	 * is indeed all dirty pages and vice versa. Free it all again,
 	 * keeping the dirty/clear status.
 	 */
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							    5 * ps, ps, &allocated,
-							    DRM_BUDDY_TOPDOWN_ALLOCATION),
+							    GPU_BUDDY_TOPDOWN_ALLOCATION),
 				"buddy_alloc hit an error size=%lu\n", 5 * ps);
-	drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
 
 	n_pages = 10;
 	do {
@@ -406,37 +406,37 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
 			flags = 0;
 		} else {
 			list = &clean;
-			flags = DRM_BUDDY_CLEAR_ALLOCATION;
+			flags = GPU_BUDDY_CLEAR_ALLOCATION;
 		}
 
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 								    ps, ps, list,
 								    flags),
 					"buddy_alloc hit an error size=%lu\n", ps);
 	} while (++i < n_pages);
 
 	list_for_each_entry(block, &clean, link)
-		KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), true);
+		KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), true);
 
 	list_for_each_entry(block, &dirty, link)
-		KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
+		KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), false);
 
-	drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &clean, GPU_BUDDY_CLEARED);
 
 	/*
 	 * Trying to go over the clear limit for some allocation.
 	 * The allocation should never fail with reasonable page-size.
 	 */
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							    10 * ps, ps, &clean,
-							    DRM_BUDDY_CLEAR_ALLOCATION),
+							    GPU_BUDDY_CLEAR_ALLOCATION),
 				"buddy_alloc hit an error size=%lu\n", 10 * ps);
 
-	drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED);
-	drm_buddy_free_list(&mm, &dirty, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &clean, GPU_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &dirty, 0);
+	gpu_buddy_fini(&mm);
 
-	KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+	KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
 
 	/*
 	 * Create a new mm. Intentionally fragment the address space by creating
@@ -458,34 +458,34 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
 		else
 			list = &clean;
 
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 								    ps, ps, list, 0),
 					"buddy_alloc hit an error size=%lu\n", ps);
 	} while (++i < n_pages);
 
-	drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED);
-	drm_buddy_free_list(&mm, &dirty, 0);
+	gpu_buddy_free_list(&mm, &clean, GPU_BUDDY_CLEARED);
+	gpu_buddy_free_list(&mm, &dirty, 0);
 
 	order = 1;
 	do {
 		size = SZ_4K << order;
 
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 								    size, size, &allocated,
-								    DRM_BUDDY_CLEAR_ALLOCATION),
+								    GPU_BUDDY_CLEAR_ALLOCATION),
 					"buddy_alloc hit an error size=%u\n", size);
 		total = 0;
 		list_for_each_entry(block, &allocated, link) {
 			if (size != mm_size)
-				KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
-			total += drm_buddy_block_size(&mm, block);
+				KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), false);
+			total += gpu_buddy_block_size(&mm, block);
 		}
 		KUNIT_EXPECT_EQ(test, total, size);
 
-		drm_buddy_free_list(&mm, &allocated, 0);
+		gpu_buddy_free_list(&mm, &allocated, 0);
 	} while (++order <= max_order);
 
-	drm_buddy_fini(&mm);
+	gpu_buddy_fini(&mm);
 
 	/*
 	 * Create a new mm with a non power-of-two size. Allocate a random size from each
@@ -494,44 +494,44 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
 	 */
 	mm_size = (SZ_4K << max_order) + (SZ_4K << (max_order - 2));
 
-	KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+	KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
 	KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
 							    4 * ps, ps, &allocated,
-							    DRM_BUDDY_RANGE_ALLOCATION),
+							    GPU_BUDDY_RANGE_ALLOCATION),
 				"buddy_alloc hit an error size=%lu\n", 4 * ps);
-	drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
+	gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
 							    2 * ps, ps, &allocated,
-							    DRM_BUDDY_CLEAR_ALLOCATION),
+							    GPU_BUDDY_CLEAR_ALLOCATION),
 				"buddy_alloc hit an error size=%lu\n", 2 * ps);
-	drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, SZ_4K << max_order, mm_size,
+	gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, SZ_4K << max_order, mm_size,
 							    ps, ps, &allocated,
-							    DRM_BUDDY_RANGE_ALLOCATION),
+							    GPU_BUDDY_RANGE_ALLOCATION),
 				"buddy_alloc hit an error size=%lu\n", ps);
-	drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+static void gpu_test_buddy_alloc_contiguous(struct kunit *test)
 {
 	const unsigned long ps = SZ_4K, mm_size = 16 * 3 * SZ_4K;
 	unsigned long i, n_pages, total;
-	struct drm_buddy_block *block;
-	struct drm_buddy mm;
+	struct gpu_buddy_block *block;
+	struct gpu_buddy mm;
 	LIST_HEAD(left);
 	LIST_HEAD(middle);
 	LIST_HEAD(right);
 	LIST_HEAD(allocated);
 
-	KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+	KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
 
 	/*
 	 * Idea is to fragment the address space by alternating block
 	 * allocations between three different lists; one for left, middle and
 	 * right. We can then free a list to simulate fragmentation. In
-	 * particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
+	 * particular we want to exercise the GPU_BUDDY_CONTIGUOUS_ALLOCATION,
 	 * including the try_harder path.
 	 */
 
@@ -548,66 +548,66 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test)
 		else
 			list = &right;
 		KUNIT_ASSERT_FALSE_MSG(test,
-				       drm_buddy_alloc_blocks(&mm, 0, mm_size,
+				       gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							      ps, ps, list, 0),
 				       "buddy_alloc hit an error size=%lu\n",
 				       ps);
 	} while (++i < n_pages);
 
-	KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							   3 * ps, ps, &allocated,
-							   DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+							   GPU_BUDDY_CONTIGUOUS_ALLOCATION),
 			       "buddy_alloc didn't error size=%lu\n", 3 * ps);
 
-	drm_buddy_free_list(&mm, &middle, 0);
-	KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	gpu_buddy_free_list(&mm, &middle, 0);
+	KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							   3 * ps, ps, &allocated,
-							   DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+							   GPU_BUDDY_CONTIGUOUS_ALLOCATION),
 			       "buddy_alloc didn't error size=%lu\n", 3 * ps);
-	KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							   2 * ps, ps, &allocated,
-							   DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+							   GPU_BUDDY_CONTIGUOUS_ALLOCATION),
 			       "buddy_alloc didn't error size=%lu\n", 2 * ps);
 
-	drm_buddy_free_list(&mm, &right, 0);
-	KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	gpu_buddy_free_list(&mm, &right, 0);
+	KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							   3 * ps, ps, &allocated,
-							   DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+							   GPU_BUDDY_CONTIGUOUS_ALLOCATION),
 			       "buddy_alloc didn't error size=%lu\n", 3 * ps);
 	/*
 	 * At this point we should have enough contiguous space for 2 blocks,
 	 * however they are never buddies (since we freed middle and right) so
 	 * will require the try_harder logic to find them.
 	 */
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							    2 * ps, ps, &allocated,
-							    DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+							    GPU_BUDDY_CONTIGUOUS_ALLOCATION),
 			       "buddy_alloc hit an error size=%lu\n", 2 * ps);
 
-	drm_buddy_free_list(&mm, &left, 0);
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+	gpu_buddy_free_list(&mm, &left, 0);
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
 							    3 * ps, ps, &allocated,
-							    DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+							    GPU_BUDDY_CONTIGUOUS_ALLOCATION),
 			       "buddy_alloc hit an error size=%lu\n", 3 * ps);
 
 	total = 0;
 	list_for_each_entry(block, &allocated, link)
-		total += drm_buddy_block_size(&mm, block);
+		total += gpu_buddy_block_size(&mm, block);
 
 	KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
 
-	drm_buddy_free_list(&mm, &allocated, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &allocated, 0);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_pathological(struct kunit *test)
+static void gpu_test_buddy_alloc_pathological(struct kunit *test)
 {
 	u64 mm_size, size, start = 0;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	const int max_order = 3;
 	unsigned long flags = 0;
 	int order, top;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	LIST_HEAD(blocks);
 	LIST_HEAD(holes);
 	LIST_HEAD(tmp);
@@ -620,7 +620,7 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
 	 */
 
 	mm_size = SZ_4K << max_order;
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
 			       "buddy_init failed\n");
 
 	KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
@@ -630,18 +630,18 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
 		block = list_first_entry_or_null(&blocks, typeof(*block), link);
 		if (block) {
 			list_del(&block->link);
-			drm_buddy_free_block(&mm, block);
+			gpu_buddy_free_block(&mm, block);
 		}
 
 		for (order = top; order--;) {
 			size = get_size(order, mm.chunk_size);
-			KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start,
+			KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start,
 									    mm_size, size, size,
 										&tmp, flags),
 					"buddy_alloc hit -ENOMEM with order=%d, top=%d\n",
 					order, top);
 
-			block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+			block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 			KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 			list_move_tail(&block->link, &blocks);
@@ -649,45 +649,45 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
 
 		/* There should be one final page for this sub-allocation */
 		size = get_size(0, mm.chunk_size);
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								    size, size, &tmp, flags),
 							   "buddy_alloc hit -ENOMEM for hole\n");
 
-		block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+		block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 		KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 		list_move_tail(&block->link, &holes);
 
 		size = get_size(top, mm.chunk_size);
-		KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								   size, size, &tmp, flags),
 							  "buddy_alloc unexpectedly succeeded at top-order %d/%d, it should be full!",
 							  top, max_order);
 	}
 
-	drm_buddy_free_list(&mm, &holes, 0);
+	gpu_buddy_free_list(&mm, &holes, 0);
 
 	/* Nothing larger than blocks of chunk_size now available */
 	for (order = 1; order <= max_order; order++) {
 		size = get_size(order, mm.chunk_size);
-		KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								   size, size, &tmp, flags),
 							  "buddy_alloc unexpectedly succeeded at order %d, it should be full!",
 							  order);
 	}
 
 	list_splice_tail(&holes, &blocks);
-	drm_buddy_free_list(&mm, &blocks, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &blocks, 0);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
+static void gpu_test_buddy_alloc_pessimistic(struct kunit *test)
 {
 	u64 mm_size, size, start = 0;
-	struct drm_buddy_block *block, *bn;
+	struct gpu_buddy_block *block, *bn;
 	const unsigned int max_order = 16;
 	unsigned long flags = 0;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	unsigned int order;
 	LIST_HEAD(blocks);
 	LIST_HEAD(tmp);
@@ -699,19 +699,19 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
 	 */
 
 	mm_size = SZ_4K << max_order;
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
 			       "buddy_init failed\n");
 
 	KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
 
 	for (order = 0; order < max_order; order++) {
 		size = get_size(order, mm.chunk_size);
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								    size, size, &tmp, flags),
 							   "buddy_alloc hit -ENOMEM with order=%d\n",
 							   order);
 
-		block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+		block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 		KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 		list_move_tail(&block->link, &blocks);
@@ -719,11 +719,11 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
 
 	/* And now the last remaining block available */
 	size = get_size(0, mm.chunk_size);
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 							    size, size, &tmp, flags),
 						   "buddy_alloc hit -ENOMEM on final alloc\n");
 
-	block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+	block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 	KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 	list_move_tail(&block->link, &blocks);
@@ -731,58 +731,58 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
 	/* Should be completely full! */
 	for (order = max_order; order--;) {
 		size = get_size(order, mm.chunk_size);
-		KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								   size, size, &tmp, flags),
 							  "buddy_alloc unexpectedly succeeded, it should be full!");
 	}
 
 	block = list_last_entry(&blocks, typeof(*block), link);
 	list_del(&block->link);
-	drm_buddy_free_block(&mm, block);
+	gpu_buddy_free_block(&mm, block);
 
 	/* As we free in increasing size, we make available larger blocks */
 	order = 1;
 	list_for_each_entry_safe(block, bn, &blocks, link) {
 		list_del(&block->link);
-		drm_buddy_free_block(&mm, block);
+		gpu_buddy_free_block(&mm, block);
 
 		size = get_size(order, mm.chunk_size);
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								    size, size, &tmp, flags),
 							   "buddy_alloc hit -ENOMEM with order=%d\n",
 							   order);
 
-		block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+		block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 		KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 		list_del(&block->link);
-		drm_buddy_free_block(&mm, block);
+		gpu_buddy_free_block(&mm, block);
 		order++;
 	}
 
 	/* To confirm, now the whole mm should be available */
 	size = get_size(max_order, mm.chunk_size);
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 							    size, size, &tmp, flags),
 						   "buddy_alloc (realloc) hit -ENOMEM with order=%d\n",
 						   max_order);
 
-	block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+	block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 	KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 	list_del(&block->link);
-	drm_buddy_free_block(&mm, block);
-	drm_buddy_free_list(&mm, &blocks, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_block(&mm, block);
+	gpu_buddy_free_list(&mm, &blocks, 0);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_optimistic(struct kunit *test)
+static void gpu_test_buddy_alloc_optimistic(struct kunit *test)
 {
 	u64 mm_size, size, start = 0;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	unsigned long flags = 0;
 	const int max_order = 16;
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 	LIST_HEAD(blocks);
 	LIST_HEAD(tmp);
 	int order;
@@ -794,19 +794,19 @@ static void drm_test_buddy_alloc_optimistic(struct kunit *test)
 
 	mm_size = SZ_4K * ((1 << (max_order + 1)) - 1);
 
-	KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
+	KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
 			       "buddy_init failed\n");
 
 	KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
 
 	for (order = 0; order <= max_order; order++) {
 		size = get_size(order, mm.chunk_size);
-		KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+		KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 								    size, size, &tmp, flags),
 							   "buddy_alloc hit -ENOMEM with order=%d\n",
 							   order);
 
-		block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
+		block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
 		KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
 
 		list_move_tail(&block->link, &blocks);
@@ -814,80 +814,80 @@ static void drm_test_buddy_alloc_optimistic(struct kunit *test)
 
 	/* Should be completely full! */
 	size = get_size(0, mm.chunk_size);
-	KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
+	KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
 							   size, size, &tmp, flags),
 						  "buddy_alloc unexpectedly succeeded, it should be full!");
 
-	drm_buddy_free_list(&mm, &blocks, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &blocks, 0);
+	gpu_buddy_fini(&mm);
 }
 
-static void drm_test_buddy_alloc_limit(struct kunit *test)
+static void gpu_test_buddy_alloc_limit(struct kunit *test)
 {
 	u64 size = U64_MAX, start = 0;
-	struct drm_buddy_block *block;
+	struct gpu_buddy_block *block;
 	unsigned long flags = 0;
 	LIST_HEAD(allocated);
-	struct drm_buddy mm;
+	struct gpu_buddy mm;
 
-	KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, size, SZ_4K));
+	KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, size, SZ_4K));
 
-	KUNIT_EXPECT_EQ_MSG(test, mm.max_order, DRM_BUDDY_MAX_ORDER,
+	KUNIT_EXPECT_EQ_MSG(test, mm.max_order, GPU_BUDDY_MAX_ORDER,
 			    "mm.max_order(%d) != %d\n", mm.max_order,
-						DRM_BUDDY_MAX_ORDER);
+						GPU_BUDDY_MAX_ORDER);
 
 	size = mm.chunk_size << mm.max_order;
-	KUNIT_EXPECT_FALSE(test, drm_buddy_alloc_blocks(&mm, start, size, size,
+	KUNIT_EXPECT_FALSE(test, gpu_buddy_alloc_blocks(&mm, start, size, size,
 							mm.chunk_size, &allocated, flags));
 
-	block = list_first_entry_or_null(&allocated, struct drm_buddy_block, link);
+	block = list_first_entry_or_null(&allocated, struct gpu_buddy_block, link);
 	KUNIT_EXPECT_TRUE(test, block);
 
-	KUNIT_EXPECT_EQ_MSG(test, drm_buddy_block_order(block), mm.max_order,
+	KUNIT_EXPECT_EQ_MSG(test, gpu_buddy_block_order(block), mm.max_order,
 			    "block order(%d) != %d\n",
-						drm_buddy_block_order(block), mm.max_order);
+						gpu_buddy_block_order(block), mm.max_order);
 
-	KUNIT_EXPECT_EQ_MSG(test, drm_buddy_block_size(&mm, block),
+	KUNIT_EXPECT_EQ_MSG(test, gpu_buddy_block_size(&mm, block),
 			    BIT_ULL(mm.max_order) * mm.chunk_size,
 						"block size(%llu) != %llu\n",
-						drm_buddy_block_size(&mm, block),
+						gpu_buddy_block_size(&mm, block),
 						BIT_ULL(mm.max_order) * mm.chunk_size);
 
-	drm_buddy_free_list(&mm, &allocated, 0);
-	drm_buddy_fini(&mm);
+	gpu_buddy_free_list(&mm, &allocated, 0);
+	gpu_buddy_fini(&mm);
 }
 
-static int drm_buddy_suite_init(struct kunit_suite *suite)
+static int gpu_buddy_suite_init(struct kunit_suite *suite)
 {
 	while (!random_seed)
 		random_seed = get_random_u32();
 
-	kunit_info(suite, "Testing DRM buddy manager, with random_seed=0x%x\n",
+	kunit_info(suite, "Testing GPU buddy manager, with random_seed=0x%x\n",
 		   random_seed);
 
 	return 0;
 }
 
-static struct kunit_case drm_buddy_tests[] = {
-	KUNIT_CASE(drm_test_buddy_alloc_limit),
-	KUNIT_CASE(drm_test_buddy_alloc_optimistic),
-	KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
-	KUNIT_CASE(drm_test_buddy_alloc_pathological),
-	KUNIT_CASE(drm_test_buddy_alloc_contiguous),
-	KUNIT_CASE(drm_test_buddy_alloc_clear),
-	KUNIT_CASE(drm_test_buddy_alloc_range_bias),
-	KUNIT_CASE(drm_test_buddy_fragmentation_performance),
+static struct kunit_case gpu_buddy_tests[] = {
+	KUNIT_CASE(gpu_test_buddy_alloc_limit),
+	KUNIT_CASE(gpu_test_buddy_alloc_optimistic),
+	KUNIT_CASE(gpu_test_buddy_alloc_pessimistic),
+	KUNIT_CASE(gpu_test_buddy_alloc_pathological),
+	KUNIT_CASE(gpu_test_buddy_alloc_contiguous),
+	KUNIT_CASE(gpu_test_buddy_alloc_clear),
+	KUNIT_CASE(gpu_test_buddy_alloc_range_bias),
+	KUNIT_CASE(gpu_test_buddy_fragmentation_performance),
 	{}
 };
 
-static struct kunit_suite drm_buddy_test_suite = {
-	.name = "drm_buddy",
-	.suite_init = drm_buddy_suite_init,
-	.test_cases = drm_buddy_tests,
+static struct kunit_suite gpu_buddy_test_suite = {
+	.name = "gpu_buddy",
+	.suite_init = gpu_buddy_suite_init,
+	.test_cases = gpu_buddy_tests,
 };
 
-kunit_test_suite(drm_buddy_test_suite);
+kunit_test_suite(gpu_buddy_test_suite);
 
 MODULE_AUTHOR("Intel Corporation");
-MODULE_DESCRIPTION("Kunit test for drm_buddy functions");
+MODULE_DESCRIPTION("KUnit test for gpu_buddy functions");
 MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/tests/gpu_random.c b/drivers/gpu/tests/gpu_random.c
new file mode 100644
index 000000000000..54f1f6a3a6c1
--- /dev/null
+++ b/drivers/gpu/tests/gpu_random.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bitops.h>
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "gpu_random.h"
+
+u32 gpu_prandom_u32_max_state(u32 ep_ro, struct rnd_state *state)
+{
+	return upper_32_bits((u64)prandom_u32_state(state) * ep_ro);
+}
+EXPORT_SYMBOL(gpu_prandom_u32_max_state);
+
+void gpu_random_reorder(unsigned int *order, unsigned int count,
+			struct rnd_state *state)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < count; ++i) {
+		BUILD_BUG_ON(sizeof(unsigned int) > sizeof(u32));
+		j = gpu_prandom_u32_max_state(count, state);
+		swap(order[i], order[j]);
+	}
+}
+EXPORT_SYMBOL(gpu_random_reorder);
+
+unsigned int *gpu_random_order(unsigned int count, struct rnd_state *state)
+{
+	unsigned int *order, i;
+
+	order = kmalloc_array(count, sizeof(*order), GFP_KERNEL);
+	if (!order)
+		return order;
+
+	for (i = 0; i < count; i++)
+		order[i] = i;
+
+	gpu_random_reorder(order, count, state);
+	return order;
+}
+EXPORT_SYMBOL(gpu_random_order);
+
+MODULE_DESCRIPTION("GPU Randomization Utilities");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/tests/gpu_random.h b/drivers/gpu/tests/gpu_random.h
new file mode 100644
index 000000000000..b68cf3448264
--- /dev/null
+++ b/drivers/gpu/tests/gpu_random.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __GPU_RANDOM_H__
+#define __GPU_RANDOM_H__
+
+/* This is a temporary home for a couple of utility functions that should
+ * be moved to lib/ at the earliest convenience.
+ */
+
+#include <linux/prandom.h>
+
+#define GPU_RND_STATE_INITIALIZER(seed__) ({				\
+	struct rnd_state state__;					\
+	prandom_seed_state(&state__, (seed__));				\
+	state__;							\
+})
+
+#define GPU_RND_STATE(name__, seed__) \
+	struct rnd_state name__ = GPU_RND_STATE_INITIALIZER(seed__)
+
+unsigned int *gpu_random_order(unsigned int count,
+			       struct rnd_state *state);
+void gpu_random_reorder(unsigned int *order,
+			unsigned int count,
+			struct rnd_state *state);
+u32 gpu_prandom_u32_max_state(u32 ep_ro,
+			      struct rnd_state *state);
+
+#endif /* !__GPU_RANDOM_H__ */
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index d51777df12d1..6ae1383b0e2e 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -37,6 +37,8 @@ source "drivers/char/agp/Kconfig"
 
 source "drivers/gpu/vga/Kconfig"
 
+source "drivers/gpu/Kconfig"
+
 source "drivers/gpu/host1x/Kconfig"
 source "drivers/gpu/ipu-v3/Kconfig"
 source "drivers/gpu/nova-core/Kconfig"
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
index b909fa8f810a..3054369bebff 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -6,166 +6,13 @@
 #ifndef __DRM_BUDDY_H__
 #define __DRM_BUDDY_H__
 
-#include <linux/bitops.h>
-#include <linux/list.h>
-#include <linux/slab.h>
-#include <linux/sched.h>
-#include <linux/rbtree.h>
+#include <linux/gpu_buddy.h>
 
 struct drm_printer;
 
-#define DRM_BUDDY_RANGE_ALLOCATION		BIT(0)
-#define DRM_BUDDY_TOPDOWN_ALLOCATION		BIT(1)
-#define DRM_BUDDY_CONTIGUOUS_ALLOCATION		BIT(2)
-#define DRM_BUDDY_CLEAR_ALLOCATION		BIT(3)
-#define DRM_BUDDY_CLEARED			BIT(4)
-#define DRM_BUDDY_TRIM_DISABLE			BIT(5)
-
-struct drm_buddy_block {
-#define DRM_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12)
-#define DRM_BUDDY_HEADER_STATE  GENMASK_ULL(11, 10)
-#define   DRM_BUDDY_ALLOCATED	   (1 << 10)
-#define   DRM_BUDDY_FREE	   (2 << 10)
-#define   DRM_BUDDY_SPLIT	   (3 << 10)
-#define DRM_BUDDY_HEADER_CLEAR  GENMASK_ULL(9, 9)
-/* Free to be used, if needed in the future */
-#define DRM_BUDDY_HEADER_UNUSED GENMASK_ULL(8, 6)
-#define DRM_BUDDY_HEADER_ORDER  GENMASK_ULL(5, 0)
-	u64 header;
-
-	struct drm_buddy_block *left;
-	struct drm_buddy_block *right;
-	struct drm_buddy_block *parent;
-
-	void *private; /* owned by creator */
-
-	/*
-	 * While the block is allocated by the user through drm_buddy_alloc*,
-	 * the user has ownership of the link, for example to maintain within
-	 * a list, if so desired. As soon as the block is freed with
-	 * drm_buddy_free* ownership is given back to the mm.
-	 */
-	union {
-		struct rb_node rb;
-		struct list_head link;
-	};
-
-	struct list_head tmp_link;
-};
-
-/* Order-zero must be at least SZ_4K */
-#define DRM_BUDDY_MAX_ORDER (63 - 12)
-
-/*
- * Binary Buddy System.
- *
- * Locking should be handled by the user, a simple mutex around
- * drm_buddy_alloc* and drm_buddy_free* should suffice.
- */
-struct drm_buddy {
-	/* Maintain a free list for each order. */
-	struct rb_root **free_trees;
-
-	/*
-	 * Maintain explicit binary tree(s) to track the allocation of the
-	 * address space. This gives us a simple way of finding a buddy block
-	 * and performing the potentially recursive merge step when freeing a
-	 * block.  Nodes are either allocated or free, in which case they will
-	 * also exist on the respective free list.
-	 */
-	struct drm_buddy_block **roots;
-
-	/*
-	 * Anything from here is public, and remains static for the lifetime of
-	 * the mm. Everything above is considered do-not-touch.
-	 */
-	unsigned int n_roots;
-	unsigned int max_order;
-
-	/* Must be at least SZ_4K */
-	u64 chunk_size;
-	u64 size;
-	u64 avail;
-	u64 clear_avail;
-};
-
-static inline u64
-drm_buddy_block_offset(const struct drm_buddy_block *block)
-{
-	return block->header & DRM_BUDDY_HEADER_OFFSET;
-}
-
-static inline unsigned int
-drm_buddy_block_order(struct drm_buddy_block *block)
-{
-	return block->header & DRM_BUDDY_HEADER_ORDER;
-}
-
-static inline unsigned int
-drm_buddy_block_state(struct drm_buddy_block *block)
-{
-	return block->header & DRM_BUDDY_HEADER_STATE;
-}
-
-static inline bool
-drm_buddy_block_is_allocated(struct drm_buddy_block *block)
-{
-	return drm_buddy_block_state(block) == DRM_BUDDY_ALLOCATED;
-}
-
-static inline bool
-drm_buddy_block_is_clear(struct drm_buddy_block *block)
-{
-	return block->header & DRM_BUDDY_HEADER_CLEAR;
-}
-
-static inline bool
-drm_buddy_block_is_free(struct drm_buddy_block *block)
-{
-	return drm_buddy_block_state(block) == DRM_BUDDY_FREE;
-}
-
-static inline bool
-drm_buddy_block_is_split(struct drm_buddy_block *block)
-{
-	return drm_buddy_block_state(block) == DRM_BUDDY_SPLIT;
-}
-
-static inline u64
-drm_buddy_block_size(struct drm_buddy *mm,
-		     struct drm_buddy_block *block)
-{
-	return mm->chunk_size << drm_buddy_block_order(block);
-}
-
-int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size);
-
-void drm_buddy_fini(struct drm_buddy *mm);
-
-struct drm_buddy_block *
-drm_get_buddy(struct drm_buddy_block *block);
-
-int drm_buddy_alloc_blocks(struct drm_buddy *mm,
-			   u64 start, u64 end, u64 size,
-			   u64 min_page_size,
-			   struct list_head *blocks,
-			   unsigned long flags);
-
-int drm_buddy_block_trim(struct drm_buddy *mm,
-			 u64 *start,
-			 u64 new_size,
-			 struct list_head *blocks);
-
-void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear);
-
-void drm_buddy_free_block(struct drm_buddy *mm, struct drm_buddy_block *block);
-
-void drm_buddy_free_list(struct drm_buddy *mm,
-			 struct list_head *objects,
-			 unsigned int flags);
-
-void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p);
-void drm_buddy_block_print(struct drm_buddy *mm,
-			   struct drm_buddy_block *block,
+/* DRM-specific GPU Buddy Allocator print helpers */
+void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p);
+void drm_buddy_block_print(struct gpu_buddy *mm,
+			   struct gpu_buddy_block *block,
 			   struct drm_printer *p);
 #endif
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
new file mode 100644
index 000000000000..3e4bd11ccb71
--- /dev/null
+++ b/include/linux/gpu_buddy.h
@@ -0,0 +1,177 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#ifndef __GPU_BUDDY_H__
+#define __GPU_BUDDY_H__
+
+#include <linux/bitops.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+
+#define GPU_BUDDY_RANGE_ALLOCATION		BIT(0)
+#define GPU_BUDDY_TOPDOWN_ALLOCATION		BIT(1)
+#define GPU_BUDDY_CONTIGUOUS_ALLOCATION		BIT(2)
+#define GPU_BUDDY_CLEAR_ALLOCATION		BIT(3)
+#define GPU_BUDDY_CLEARED			BIT(4)
+#define GPU_BUDDY_TRIM_DISABLE			BIT(5)
+
+enum gpu_buddy_free_tree {
+	GPU_BUDDY_CLEAR_TREE = 0,
+	GPU_BUDDY_DIRTY_TREE,
+	GPU_BUDDY_MAX_FREE_TREES,
+};
+
+#define for_each_free_tree(tree) \
+	for ((tree) = 0; (tree) < GPU_BUDDY_MAX_FREE_TREES; (tree)++)
+
+struct gpu_buddy_block {
+#define GPU_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12)
+#define GPU_BUDDY_HEADER_STATE  GENMASK_ULL(11, 10)
+#define   GPU_BUDDY_ALLOCATED	   (1 << 10)
+#define   GPU_BUDDY_FREE	   (2 << 10)
+#define   GPU_BUDDY_SPLIT	   (3 << 10)
+#define GPU_BUDDY_HEADER_CLEAR  GENMASK_ULL(9, 9)
+/* Free to be used, if needed in the future */
+#define GPU_BUDDY_HEADER_UNUSED GENMASK_ULL(8, 6)
+#define GPU_BUDDY_HEADER_ORDER  GENMASK_ULL(5, 0)
+	u64 header;
+
+	struct gpu_buddy_block *left;
+	struct gpu_buddy_block *right;
+	struct gpu_buddy_block *parent;
+
+	void *private; /* owned by creator */
+
+	/*
+	 * While the block is allocated by the user through gpu_buddy_alloc*,
+	 * the user has ownership of the link, for example to maintain within
+	 * a list, if so desired. As soon as the block is freed with
+	 * gpu_buddy_free*, ownership is given back to the mm.
+	 */
+	union {
+		struct rb_node rb;
+		struct list_head link;
+	};
+
+	struct list_head tmp_link;
+};
+
+/* Order-zero must be at least SZ_4K */
+#define GPU_BUDDY_MAX_ORDER (63 - 12)
+
+/*
+ * Binary Buddy System.
+ *
+ * Locking should be handled by the user; a simple mutex around
+ * gpu_buddy_alloc* and gpu_buddy_free* should suffice.
+ */
+struct gpu_buddy {
+	/* Maintain free trees (clear and dirty) for each order. */
+	struct rb_root **free_trees;
+
+	/*
+	 * Maintain explicit binary tree(s) to track the allocation of the
+	 * address space. This gives us a simple way of finding a buddy block
+	 * and performing the potentially recursive merge step when freeing a
+	 * block.  Nodes are either allocated or free, in which case they will
+	 * also exist on the respective free tree.
+	 */
+	struct gpu_buddy_block **roots;
+
+	/*
+	 * Anything from here is public, and remains static for the lifetime of
+	 * the mm. Everything above is considered do-not-touch.
+	 */
+	unsigned int n_roots;
+	unsigned int max_order;
+
+	/* Must be at least SZ_4K */
+	u64 chunk_size;
+	u64 size;
+	u64 avail;
+	u64 clear_avail;
+};
+
+static inline u64
+gpu_buddy_block_offset(const struct gpu_buddy_block *block)
+{
+	return block->header & GPU_BUDDY_HEADER_OFFSET;
+}
+
+static inline unsigned int
+gpu_buddy_block_order(struct gpu_buddy_block *block)
+{
+	return block->header & GPU_BUDDY_HEADER_ORDER;
+}
+
+static inline unsigned int
+gpu_buddy_block_state(struct gpu_buddy_block *block)
+{
+	return block->header & GPU_BUDDY_HEADER_STATE;
+}
+
+static inline bool
+gpu_buddy_block_is_allocated(struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_state(block) == GPU_BUDDY_ALLOCATED;
+}
+
+static inline bool
+gpu_buddy_block_is_clear(struct gpu_buddy_block *block)
+{
+	return block->header & GPU_BUDDY_HEADER_CLEAR;
+}
+
+static inline bool
+gpu_buddy_block_is_free(struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_state(block) == GPU_BUDDY_FREE;
+}
+
+static inline bool
+gpu_buddy_block_is_split(struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_state(block) == GPU_BUDDY_SPLIT;
+}
+
+static inline u64
+gpu_buddy_block_size(struct gpu_buddy *mm,
+		     struct gpu_buddy_block *block)
+{
+	return mm->chunk_size << gpu_buddy_block_order(block);
+}
+
+int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size);
+
+void gpu_buddy_fini(struct gpu_buddy *mm);
+
+struct gpu_buddy_block *
+gpu_get_buddy(struct gpu_buddy_block *block);
+
+int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
+			   u64 start, u64 end, u64 size,
+			   u64 min_page_size,
+			   struct list_head *blocks,
+			   unsigned long flags);
+
+int gpu_buddy_block_trim(struct gpu_buddy *mm,
+			 u64 *start,
+			 u64 new_size,
+			 struct list_head *blocks);
+
+void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear);
+
+void gpu_buddy_free_block(struct gpu_buddy *mm, struct gpu_buddy_block *block);
+
+void gpu_buddy_free_list(struct gpu_buddy *mm,
+			 struct list_head *objects,
+			 unsigned int flags);
+
+void gpu_buddy_print(struct gpu_buddy *mm);
+void gpu_buddy_block_print(struct gpu_buddy *mm,
+			   struct gpu_buddy_block *block);
+#endif
-- 
2.34.1



* [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-02-04  3:55   ` Dave Airlie
  2026-01-20 20:42 ` [PATCH RFC v6 04/26] nova-core: mm: Select GPU_BUDDY for VRAM allocation Joel Fernandes
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

Add safe Rust abstractions over the Linux kernel's GPU buddy
allocator. The allocator implements a binary buddy system suited to
GPU physical memory management; nova-core will use it to allocate
VRAM.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 rust/bindings/bindings_helper.h |  11 +
 rust/helpers/gpu.c              |  23 ++
 rust/helpers/helpers.c          |   1 +
 rust/kernel/gpu/buddy.rs        | 538 ++++++++++++++++++++++++++++++++
 rust/kernel/gpu/mod.rs          |   5 +
 rust/kernel/lib.rs              |   2 +
 6 files changed, 580 insertions(+)
 create mode 100644 rust/helpers/gpu.c
 create mode 100644 rust/kernel/gpu/buddy.rs
 create mode 100644 rust/kernel/gpu/mod.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index a067038b4b42..940b854a1f93 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -29,6 +29,7 @@
 #include <linux/hrtimer_types.h>
 
 #include <linux/acpi.h>
+#include <linux/gpu_buddy.h>
 #include <drm/drm_device.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_file.h>
@@ -144,6 +145,16 @@ const vm_flags_t RUST_CONST_HELPER_VM_MIXEDMAP = VM_MIXEDMAP;
 const vm_flags_t RUST_CONST_HELPER_VM_HUGEPAGE = VM_HUGEPAGE;
 const vm_flags_t RUST_CONST_HELPER_VM_NOHUGEPAGE = VM_NOHUGEPAGE;
 
+#if IS_ENABLED(CONFIG_GPU_BUDDY)
+const unsigned long RUST_CONST_HELPER_GPU_BUDDY_RANGE_ALLOCATION = GPU_BUDDY_RANGE_ALLOCATION;
+const unsigned long RUST_CONST_HELPER_GPU_BUDDY_TOPDOWN_ALLOCATION = GPU_BUDDY_TOPDOWN_ALLOCATION;
+const unsigned long RUST_CONST_HELPER_GPU_BUDDY_CONTIGUOUS_ALLOCATION =
+								GPU_BUDDY_CONTIGUOUS_ALLOCATION;
+const unsigned long RUST_CONST_HELPER_GPU_BUDDY_CLEAR_ALLOCATION = GPU_BUDDY_CLEAR_ALLOCATION;
+const unsigned long RUST_CONST_HELPER_GPU_BUDDY_CLEARED = GPU_BUDDY_CLEARED;
+const unsigned long RUST_CONST_HELPER_GPU_BUDDY_TRIM_DISABLE = GPU_BUDDY_TRIM_DISABLE;
+#endif
+
 #if IS_ENABLED(CONFIG_ANDROID_BINDER_IPC_RUST)
 #include "../../drivers/android/binder/rust_binder.h"
 #include "../../drivers/android/binder/rust_binder_events.h"
diff --git a/rust/helpers/gpu.c b/rust/helpers/gpu.c
new file mode 100644
index 000000000000..38b1a4e6bef8
--- /dev/null
+++ b/rust/helpers/gpu.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/gpu_buddy.h>
+
+#ifdef CONFIG_GPU_BUDDY
+
+__rust_helper u64 rust_helper_gpu_buddy_block_offset(const struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_offset(block);
+}
+
+__rust_helper unsigned int rust_helper_gpu_buddy_block_order(struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_order(block);
+}
+
+__rust_helper u64 rust_helper_gpu_buddy_block_size(struct gpu_buddy *mm,
+						   struct gpu_buddy_block *block)
+{
+	return gpu_buddy_block_size(mm, block);
+}
+
+#endif /* CONFIG_GPU_BUDDY */
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 634fa2386bbb..6db7c4c25afa 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -29,6 +29,7 @@
 #include "err.c"
 #include "irq.c"
 #include "fs.c"
+#include "gpu.c"
 #include "io.c"
 #include "jump_label.c"
 #include "kunit.c"
diff --git a/rust/kernel/gpu/buddy.rs b/rust/kernel/gpu/buddy.rs
new file mode 100644
index 000000000000..7fb8e505ff9f
--- /dev/null
+++ b/rust/kernel/gpu/buddy.rs
@@ -0,0 +1,538 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! GPU buddy allocator bindings.
+//!
+//! C header: [`include/linux/gpu_buddy.h`](srctree/include/linux/gpu_buddy.h)
+//!
+//! This module provides Rust abstractions over the Linux kernel's GPU buddy
+//! allocator, which implements a binary buddy memory allocator.
+//!
+//! The buddy allocator manages a contiguous address space and allocates blocks
+//! in power-of-two sizes, useful for GPU physical memory management.
+//!
+//! # Examples
+//!
+//! ```
+//! use kernel::{
+//!     gpu::buddy::{BuddyFlags, GpuBuddy, GpuBuddyAllocParams, GpuBuddyParams},
+//!     prelude::*,
+//!     sizes::*, //
+//! };
+//!
+//! // Create a 1GB buddy allocator with 4KB minimum chunk size.
+//! let mut buddy = GpuBuddy::new(GpuBuddyParams {
+//!     base_offset_bytes: 0,
+//!     physical_memory_size_bytes: SZ_1G as u64,
+//!     chunk_size_bytes: SZ_4K as u64,
+//! })?;
+//!
+//! // Verify initial state.
+//! assert_eq!(buddy.size(), SZ_1G as u64);
+//! assert_eq!(buddy.chunk_size(), SZ_4K as u64);
+//! let initial_free = buddy.free_memory_bytes();
+//!
+//! // Base allocation params - reused across tests with field overrides.
+//! let params = GpuBuddyAllocParams {
+//!     start_range_address: 0,
+//!     end_range_address: 0,   // Entire range.
+//!     size_bytes: SZ_16M as u64,
+//!     min_block_size_bytes: SZ_16M as u64,
+//!     buddy_flags: BuddyFlags::try_new(BuddyFlags::RANGE_ALLOCATION)?,
+//! };
+//!
+//! // Test top-down allocation (allocates from highest addresses).
+//! let topdown = buddy.alloc_blocks(GpuBuddyAllocParams {
+//!     buddy_flags: BuddyFlags::try_new(BuddyFlags::TOPDOWN_ALLOCATION)?,
+//!     ..params
+//! })?;
+//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_16M as u64);
+//!
+//! for block in topdown.iter() {
+//!     assert_eq!(block.offset(), (SZ_1G - SZ_16M) as u64);
+//!     assert_eq!(block.order(), 12); // 2^12 pages
+//!     assert_eq!(block.size(), SZ_16M as u64);
+//! }
+//! drop(topdown);
+//! assert_eq!(buddy.free_memory_bytes(), initial_free);
+//!
+//! // Allocate 16MB - should result in a single 16MB block at offset 0.
+//! let allocated = buddy.alloc_blocks(params)?;
+//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_16M as u64);
+//!
+//! for block in allocated.iter() {
+//!     assert_eq!(block.offset(), 0);
+//!     assert_eq!(block.order(), 12); // 2^12 pages
+//!     assert_eq!(block.size(), SZ_16M as u64);
+//! }
+//! drop(allocated);
+//! assert_eq!(buddy.free_memory_bytes(), initial_free);
+//!
+//! // Test non-contiguous allocation with fragmented memory.
+//! // Create fragmentation by allocating 4MB blocks at [0,4M) and [8M,12M).
+//! let params_4m = GpuBuddyAllocParams {
+//!     end_range_address: SZ_4M as u64,
+//!     size_bytes: SZ_4M as u64,
+//!     min_block_size_bytes: SZ_4M as u64,
+//!     ..params
+//! };
+//! let frag1 = buddy.alloc_blocks(params_4m)?;
+//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_4M as u64);
+//!
+//! let frag2 = buddy.alloc_blocks(GpuBuddyAllocParams {
+//!     start_range_address: SZ_8M as u64,
+//!     end_range_address: (SZ_8M + SZ_4M) as u64,
+//!     ..params_4m
+//! })?;
+//! assert_eq!(buddy.free_memory_bytes(), initial_free - SZ_8M as u64);
+//!
+//! // Allocate 8MB without CONTIGUOUS - should return 2 blocks from the holes.
+//! let fragmented = buddy.alloc_blocks(GpuBuddyAllocParams {
+//!     end_range_address: SZ_16M as u64,
+//!     size_bytes: SZ_8M as u64,
+//!     min_block_size_bytes: SZ_4M as u64,
+//!     ..params
+//! })?;
+//! assert_eq!(buddy.free_memory_bytes(), initial_free - (SZ_16M) as u64);
+//!
+//! let (mut count, mut total) = (0u32, 0u64);
+//! for block in fragmented.iter() {
+//!     // The 8MB allocation should return 2 blocks, each 4MB.
+//!     assert_eq!(block.size(), SZ_4M as u64);
+//!     total += block.size();
+//!     count += 1;
+//! }
+//! assert_eq!(total, SZ_8M as u64);
+//! assert_eq!(count, 2);
+//! drop(fragmented);
+//! drop(frag2);
+//! drop(frag1);
+//! assert_eq!(buddy.free_memory_bytes(), initial_free);
+//!
+//! // Test CONTIGUOUS failure when only fragmented space available.
+//! // Create a small buddy allocator with only 16MB of memory.
+//! let mut small = GpuBuddy::new(GpuBuddyParams {
+//!     base_offset_bytes: 0,
+//!     physical_memory_size_bytes: SZ_16M as u64,
+//!     chunk_size_bytes: SZ_4K as u64,
+//! })?;
+//!
+//! // Allocate 4MB blocks at [0,4M) and [8M,12M) to create fragmented memory.
+//! let hole1 = small.alloc_blocks(params_4m)?;
+//! let hole2 = small.alloc_blocks(GpuBuddyAllocParams {
+//!     start_range_address: SZ_8M as u64,
+//!     end_range_address: (SZ_8M + SZ_4M) as u64,
+//!     ..params_4m
+//! })?;
+//!
+//! // 8MB contiguous should fail - only two non-contiguous 4MB holes exist.
+//! let result = small.alloc_blocks(GpuBuddyAllocParams {
+//!     size_bytes: SZ_8M as u64,
+//!     min_block_size_bytes: SZ_4M as u64,
+//!     buddy_flags: BuddyFlags::try_new(BuddyFlags::CONTIGUOUS_ALLOCATION)?,
+//!     ..params
+//! });
+//! assert!(result.is_err());
+//! drop(hole2);
+//! drop(hole1);
+//!
+//! # Ok::<(), Error>(())
+//! ```
+
+use crate::{
+    bindings,
+    clist::CListHead,
+    clist_create,
+    error::to_result,
+    new_mutex,
+    prelude::*,
+    sync::{
+        lock::mutex::MutexGuard,
+        Arc,
+        Mutex, //
+    },
+    types::Opaque,
+};
+
+/// Flags for GPU buddy allocator operations.
+///
+/// These flags control the allocation behavior of the buddy allocator.
+#[derive(Clone, Copy, Default, PartialEq, Eq)]
+pub struct BuddyFlags(usize);
+
+impl BuddyFlags {
+    /// Range-based allocation from start to end addresses.
+    pub const RANGE_ALLOCATION: usize = bindings::GPU_BUDDY_RANGE_ALLOCATION;
+
+    /// Allocate from top of address space downward.
+    pub const TOPDOWN_ALLOCATION: usize = bindings::GPU_BUDDY_TOPDOWN_ALLOCATION;
+
+    /// Allocate physically contiguous blocks.
+    pub const CONTIGUOUS_ALLOCATION: usize = bindings::GPU_BUDDY_CONTIGUOUS_ALLOCATION;
+
+    /// Request allocation from cleared (zeroed) memory. The zeroing is not
+    /// done by the allocator, but by the caller before freeing old blocks.
+    pub const CLEAR_ALLOCATION: usize = bindings::GPU_BUDDY_CLEAR_ALLOCATION;
+
+    /// Disable trimming of partially used blocks.
+    pub const TRIM_DISABLE: usize = bindings::GPU_BUDDY_TRIM_DISABLE;
+
+    /// Mark blocks as cleared (zeroed) when freeing. When set during free,
+    /// indicates that the caller has already zeroed the memory.
+    pub const CLEARED: usize = bindings::GPU_BUDDY_CLEARED;
+
+    /// Create [`BuddyFlags`] from a raw value with validation.
+    ///
+    /// Use the `|` operator to combine flags, if needed, before calling this method.
+    pub fn try_new(flags: usize) -> Result<Self> {
+        // Flags must not exceed u32::MAX to satisfy the GPU buddy allocator C API.
+        if flags > u32::MAX as usize {
+            return Err(EINVAL);
+        }
+
+        // `TOPDOWN_ALLOCATION` only works without `RANGE_ALLOCATION`. When both are
+        // set, `TOPDOWN_ALLOCATION` is silently ignored by the allocator. Reject this.
+        if (flags & Self::RANGE_ALLOCATION) != 0 && (flags & Self::TOPDOWN_ALLOCATION) != 0 {
+            return Err(EINVAL);
+        }
+
+        Ok(Self(flags))
+    }
+
+    /// Get raw value of the flags.
+    pub(crate) fn as_raw(self) -> usize {
+        self.0
+    }
+}
+
+/// Parameters for creating a GPU buddy allocator.
+#[derive(Clone, Copy)]
+pub struct GpuBuddyParams {
+    /// Base offset in bytes where the managed memory region starts.
+    /// Allocations will be offset by this value.
+    pub base_offset_bytes: u64,
+    /// Total physical memory size managed by the allocator in bytes.
+    pub physical_memory_size_bytes: u64,
+    /// Minimum allocation unit / chunk size in bytes, must be >= 4KB.
+    pub chunk_size_bytes: u64,
+}
+
+/// Parameters for allocating blocks from a GPU buddy allocator.
+#[derive(Clone, Copy)]
+pub struct GpuBuddyAllocParams {
+    /// Start of allocation range in bytes. Use 0 for beginning.
+    pub start_range_address: u64,
+    /// End of allocation range in bytes. Use 0 for entire range.
+    pub end_range_address: u64,
+    /// Total size to allocate in bytes.
+    pub size_bytes: u64,
+    /// Minimum block size for fragmented allocations in bytes.
+    pub min_block_size_bytes: u64,
+    /// Buddy allocator behavior flags.
+    pub buddy_flags: BuddyFlags,
+}
+
+/// Inner structure holding the actual buddy allocator.
+///
+/// # Synchronization
+///
+/// The C `gpu_buddy` API requires synchronization (see `include/linux/gpu_buddy.h`).
+/// The internal [`GpuBuddyGuard`] ensures that the lock is held for all
+/// allocator and free operations, preventing races between concurrent allocations
+/// and the freeing that occurs when [`AllocatedBlocks`] is dropped.
+///
+/// # Invariants
+///
+/// The inner [`Opaque`] contains a valid, initialized buddy allocator.
+#[pin_data(PinnedDrop)]
+struct GpuBuddyInner {
+    #[pin]
+    inner: Opaque<bindings::gpu_buddy>,
+    #[pin]
+    lock: Mutex<()>,
+    /// Base offset for all allocations (does not change after init).
+    base_offset: u64,
+    /// Cached chunk size (does not change after init).
+    chunk_size: u64,
+    /// Cached total size (does not change after init).
+    size: u64,
+}
+
+impl GpuBuddyInner {
+    /// Create a pin-initializer for the buddy allocator.
+    fn new(params: &GpuBuddyParams) -> impl PinInit<Self, Error> {
+        let base_offset = params.base_offset_bytes;
+        let size = params.physical_memory_size_bytes;
+        let chunk_size = params.chunk_size_bytes;
+
+        try_pin_init!(Self {
+            inner <- Opaque::try_ffi_init(|ptr| {
+                // SAFETY: ptr points to valid uninitialized memory from the pin-init
+                // infrastructure. gpu_buddy_init will initialize the structure.
+                to_result(unsafe { bindings::gpu_buddy_init(ptr, size, chunk_size) })
+            }),
+            lock <- new_mutex!(()),
+            base_offset: base_offset,
+            chunk_size: chunk_size,
+            size: size,
+        })
+    }
+
+    /// Lock the mutex and return a guard for accessing the allocator.
+    fn lock(&self) -> GpuBuddyGuard<'_> {
+        GpuBuddyGuard {
+            inner: self,
+            _guard: self.lock.lock(),
+        }
+    }
+}
+
+#[pinned_drop]
+impl PinnedDrop for GpuBuddyInner {
+    fn drop(self: Pin<&mut Self>) {
+        let guard = self.lock();
+
+        // SAFETY: guard provides exclusive access to the allocator.
+        unsafe {
+            bindings::gpu_buddy_fini(guard.as_raw());
+        }
+    }
+}
+
+// SAFETY: [`GpuBuddyInner`] can be sent between threads.
+unsafe impl Send for GpuBuddyInner {}
+
+// SAFETY: [`GpuBuddyInner`] is `Sync` because the internal [`GpuBuddyGuard`]
+// serializes all access to the C allocator, preventing data races.
+unsafe impl Sync for GpuBuddyInner {}
+
+/// Guard that proves the lock is held, enabling access to the allocator.
+///
+/// # Invariants
+///
+/// The inner `_guard` holds the lock for the duration of this guard's lifetime.
+pub(crate) struct GpuBuddyGuard<'a> {
+    inner: &'a GpuBuddyInner,
+    _guard: MutexGuard<'a, ()>,
+}
+
+impl GpuBuddyGuard<'_> {
+    /// Get a raw pointer to the underlying C `gpu_buddy` structure.
+    fn as_raw(&self) -> *mut bindings::gpu_buddy {
+        self.inner.inner.get()
+    }
+}
+
+/// GPU buddy allocator instance.
+///
+/// This structure wraps the C `gpu_buddy` allocator using reference counting.
+/// The allocator is automatically cleaned up when all references are dropped.
+///
+/// # Invariants
+///
+/// The inner [`Arc`] points to a valid, initialized GPU buddy allocator.
+pub struct GpuBuddy(Arc<GpuBuddyInner>);
+
+impl GpuBuddy {
+    /// Create a new buddy allocator.
+    ///
+    /// Creates a buddy allocator that manages a contiguous address space of the given
+    /// size, with the specified minimum allocation unit (chunk_size must be at least 4KB).
+    pub fn new(params: GpuBuddyParams) -> Result<Self> {
+        Ok(Self(Arc::pin_init(
+            GpuBuddyInner::new(&params),
+            GFP_KERNEL,
+        )?))
+    }
+
+    /// Get the base offset for allocations.
+    pub fn base_offset(&self) -> u64 {
+        self.0.base_offset
+    }
+
+    /// Get the chunk size (minimum allocation unit).
+    pub fn chunk_size(&self) -> u64 {
+        self.0.chunk_size
+    }
+
+    /// Get the total managed size.
+    pub fn size(&self) -> u64 {
+        self.0.size
+    }
+
+    /// Get the available (free) memory in bytes.
+    pub fn free_memory_bytes(&self) -> u64 {
+        let guard = self.0.lock();
+        // SAFETY: guard provides exclusive access to the allocator.
+        unsafe { (*guard.as_raw()).avail }
+    }
+
+    /// Allocate blocks from the buddy allocator.
+    ///
+    /// Returns an [`Arc<AllocatedBlocks>`] structure that owns the allocated blocks
+    /// and automatically frees them when all references are dropped.
+    ///
+    /// Takes `&self` instead of `&mut self` because the internal [`Mutex`] provides
+    /// synchronization - no external `&mut` exclusivity needed.
+    pub fn alloc_blocks(&self, params: GpuBuddyAllocParams) -> Result<Arc<AllocatedBlocks>> {
+        let buddy_arc = Arc::clone(&self.0);
+
+        // Create pin-initializer that initializes list and allocates blocks.
+        let init = try_pin_init!(AllocatedBlocks {
+            list <- CListHead::try_init(|list| {
+                // Lock while allocating to serialize with concurrent frees.
+                let guard = buddy_arc.lock();
+
+                // SAFETY: guard provides exclusive access, list is initialized.
+                to_result(unsafe {
+                    bindings::gpu_buddy_alloc_blocks(
+                        guard.as_raw(),
+                        params.start_range_address,
+                        params.end_range_address,
+                        params.size_bytes,
+                        params.min_block_size_bytes,
+                        list.as_raw(),
+                        params.buddy_flags.as_raw(),
+                    )
+                })
+            }),
+            buddy: Arc::clone(&buddy_arc),
+            flags: params.buddy_flags,
+        });
+
+        Arc::pin_init(init, GFP_KERNEL)
+    }
+}
+
+/// Allocated blocks from the buddy allocator with automatic cleanup.
+///
+/// This structure owns a list of allocated blocks and ensures they are
+/// automatically freed when dropped. Use `iter()` to iterate over all
+/// allocated [`Block`] structures.
+///
+/// # Invariants
+///
+/// - `list` is an initialized, valid list head containing allocated blocks.
+/// - `buddy` references a valid [`GpuBuddyInner`].
+#[pin_data(PinnedDrop)]
+pub struct AllocatedBlocks {
+    #[pin]
+    list: CListHead,
+    buddy: Arc<GpuBuddyInner>,
+    flags: BuddyFlags,
+}
+
+impl AllocatedBlocks {
+    /// Check if the block list is empty.
+    pub fn is_empty(&self) -> bool {
+        // An empty list head points to itself.
+        !self.list.is_linked()
+    }
+
+    /// Iterate over allocated blocks.
+    ///
+    /// Returns an iterator yielding [`AllocatedBlock`] references. The blocks
+    /// are only valid for the duration of the borrow of `self`.
+    pub fn iter(&self) -> impl Iterator<Item = AllocatedBlock<'_>> + '_ {
+        // SAFETY: list contains gpu_buddy_block items linked via __bindgen_anon_1.link.
+        let clist = unsafe {
+            clist_create!(
+                self.list.as_raw(),
+                Block,
+                bindings::gpu_buddy_block,
+                __bindgen_anon_1.link
+            )
+        };
+
+        clist
+            .iter()
+            .map(|block| AllocatedBlock { block, alloc: self })
+    }
+}
+
+#[pinned_drop]
+impl PinnedDrop for AllocatedBlocks {
+    fn drop(self: Pin<&mut Self>) {
+        let guard = self.buddy.lock();
+
+        // SAFETY:
+        // - list is valid per the type's invariants.
+        // - guard provides exclusive access to the allocator.
+        // CAST: BuddyFlags were validated to fit in u32 at construction.
+        unsafe {
+            bindings::gpu_buddy_free_list(
+                guard.as_raw(),
+                self.list.as_raw(),
+                self.flags.as_raw() as u32,
+            );
+        }
+    }
+}
+
+/// A GPU buddy block.
+///
+/// Transparent wrapper over C `gpu_buddy_block` structure. This type is returned
+/// as references from [`CListIter`] during iteration over [`AllocatedBlocks`].
+///
+/// # Invariants
+///
+/// The inner [`Opaque`] contains a valid, allocated `gpu_buddy_block`.
+#[repr(transparent)]
+pub struct Block(Opaque<bindings::gpu_buddy_block>);
+
+impl Block {
+    /// Get a raw pointer to the underlying C block.
+    fn as_raw(&self) -> *mut bindings::gpu_buddy_block {
+        self.0.get()
+    }
+
+    /// Get the block's offset in the address space.
+    pub(crate) fn offset(&self) -> u64 {
+        // SAFETY: self.as_raw() is valid per the type's invariants.
+        unsafe { bindings::gpu_buddy_block_offset(self.as_raw()) }
+    }
+
+    /// Get the block order.
+    pub(crate) fn order(&self) -> u32 {
+        // SAFETY: self.as_raw() is valid per the type's invariants.
+        unsafe { bindings::gpu_buddy_block_order(self.as_raw()) }
+    }
+}
+
+// SAFETY: `Block` is a transparent wrapper over `gpu_buddy_block` which is not
+// modified after allocation. It can be safely sent between threads.
+unsafe impl Send for Block {}
+
+// SAFETY: `Block` is a transparent wrapper over `gpu_buddy_block` which is not
+// modified after allocation. It can be safely shared among threads.
+unsafe impl Sync for Block {}
+
+/// An allocated block with access to the allocation list.
+///
+/// # Invariants
+///
+/// - `block` is a valid reference to an allocated [`Block`].
+/// - `alloc` is a valid reference to the [`AllocatedBlocks`] that owns this block.
+pub struct AllocatedBlock<'a> {
+    block: &'a Block,
+    alloc: &'a AllocatedBlocks,
+}
+
+impl AllocatedBlock<'_> {
+    /// Get the block's offset in the address space.
+    ///
+    /// Returns the absolute offset including the allocator's base offset.
+    /// This is the actual address to use for accessing the allocated memory.
+    pub fn offset(&self) -> u64 {
+        self.alloc.buddy.base_offset + self.block.offset()
+    }
+
+    /// Get the block order (size = chunk_size << order).
+    pub fn order(&self) -> u32 {
+        self.block.order()
+    }
+
+    /// Get the block's size in bytes.
+    pub fn size(&self) -> u64 {
+        self.alloc.buddy.chunk_size << self.block.order()
+    }
+}
diff --git a/rust/kernel/gpu/mod.rs b/rust/kernel/gpu/mod.rs
new file mode 100644
index 000000000000..8f25e6367edc
--- /dev/null
+++ b/rust/kernel/gpu/mod.rs
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! GPU subsystem abstractions.
+
+pub mod buddy;
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index cd7e6a1055b0..d754d777f8ff 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -98,6 +98,8 @@
 pub mod firmware;
 pub mod fmt;
 pub mod fs;
+#[cfg(CONFIG_GPU_BUDDY)]
+pub mod gpu;
 #[cfg(CONFIG_I2C = "y")]
 pub mod i2c;
 pub mod id_pool;
-- 
2.34.1



* [PATCH RFC v6 04/26] nova-core: mm: Select GPU_BUDDY for VRAM allocation
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (2 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

Select the GPU_BUDDY allocator config option, which provides the buddy
allocator bindings needed for VRAM page allocation in nova-core's memory
management subsystem.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index 527920f9c4d3..809485167aff 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -5,6 +5,7 @@ config NOVA_CORE
 	depends on RUST
 	select RUST_FW_LOADER_ABSTRACTIONS
 	select AUXILIARY_BUS
+	select GPU_BUDDY
 	default n
 	help
 	  Choose this if you want to build the Nova Core driver for Nvidia
-- 
2.34.1



* [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (3 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 04/26] nova-core: mm: Select GPU_BUDDY for VRAM allocation Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-21  8:07   ` Zhi Wang
  2026-01-20 20:42 ` [PATCH RFC v6 06/26] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

The PRAMIN aperture is a mechanism for direct CPU reads and writes to
VRAM through a sliding window in BAR0. Add support for using it.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs    |   5 +
 drivers/gpu/nova-core/mm/pramin.rs | 244 +++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |   5 +
 4 files changed, 255 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/mod.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
new file mode 100644
index 000000000000..7a5dd4220c67
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Memory management subsystems for nova-core.
+
+pub(crate) mod pramin;
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
new file mode 100644
index 000000000000..6a7ea2dc7d77
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Direct VRAM access through the PRAMIN aperture.
+//!
+//! PRAMIN provides a 1MB sliding window into VRAM through BAR0, allowing the CPU to access
+//! video memory directly. The [`Window`] type automatically repositions the window when
+//! accessing different VRAM regions and restores the original position on drop. This allows
+//! the same window to be reused for multiple accesses across different VRAM regions.
+//!
+//! The PRAMIN aperture is a 1MB region at BAR0 + 0x700000 for all GPUs. The window base is
+//! controlled by the `NV_PBUS_BAR0_WINDOW` register and must be 64KB aligned.
+//!
+//! # Examples
+//!
+//! ## Basic read/write
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin;
+//! use kernel::devres::Devres;
+//! use kernel::sync::Arc;
+//!
+//! fn example(devres_bar: Arc<Devres<Bar0>>) -> Result<()> {
+//!     let mut pram_win = pramin::Window::new(devres_bar)?;
+//!
+//!     // Write and read back.
+//!     pram_win.try_write32(0x100, 0xDEADBEEF)?;
+//!     let val = pram_win.try_read32(0x100)?;
+//!     assert_eq!(val, 0xDEADBEEF);
+//!
+//!     Ok(())
+//!     // Original window position restored on drop.
+//! }
+//! ```
+//!
+//! ## Auto-repositioning across VRAM regions
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin;
+//! use kernel::devres::Devres;
+//! use kernel::sync::Arc;
+//!
+//! fn example(devres_bar: Arc<Devres<Bar0>>) -> Result<()> {
+//!     let mut pram_win = pramin::Window::new(devres_bar)?;
+//!
+//!     // Access first 1MB region.
+//!     pram_win.try_write32(0x100, 0x11111111)?;
+//!
+//!     // Access at 2MB - window auto-repositions.
+//!     pram_win.try_write32(0x200000, 0x22222222)?;
+//!
+//!     // Back to first region - window repositions again.
+//!     let val = pram_win.try_read32(0x100)?;
+//!     assert_eq!(val, 0x11111111);
+//!
+//!     Ok(())
+//! }
+//! ```
+
+#![allow(unused)]
+
+use crate::{
+    driver::Bar0,
+    regs, //
+};
+
+use kernel::bits::genmask_u64;
+use kernel::devres::Devres;
+use kernel::prelude::*;
+use kernel::ptr::{
+    Alignable,
+    Alignment, //
+};
+use kernel::sizes::{
+    SZ_1M,
+    SZ_64K, //
+};
+use kernel::sync::Arc;
+
+/// PRAMIN aperture base offset in BAR0.
+const PRAMIN_BASE: usize = 0x700000;
+
+/// PRAMIN aperture size (1MB).
+const PRAMIN_SIZE: usize = SZ_1M;
+
+/// 64KB alignment for window base.
+const WINDOW_ALIGN: Alignment = Alignment::new::<SZ_64K>();
+
+/// Maximum addressable VRAM offset (40-bit address space).
+///
+/// The `NV_PBUS_BAR0_WINDOW` register has a 24-bit `window_base` field (bits 23:0) that stores
+/// bits [39:16] of the target VRAM address. This limits the addressable space to 2^40 bytes.
+///
+/// CAST: On 64-bit systems, this fits in usize.
+const MAX_VRAM_OFFSET: usize = genmask_u64(0..=39) as usize;
+
+/// Generate a PRAMIN read accessor.
+macro_rules! define_pramin_read {
+    ($name:ident, $ty:ty) => {
+        #[doc = concat!("Read a `", stringify!($ty), "` from VRAM at the given offset.")]
+        pub(crate) fn $name(&mut self, vram_offset: usize) -> Result<$ty> {
+            // Compute window parameters without bar reference.
+            let (bar_offset, new_base) =
+                self.compute_window(vram_offset, ::core::mem::size_of::<$ty>())?;
+
+            // Update window base if needed and perform read.
+            let bar = self.bar.try_access().ok_or(ENODEV)?;
+            if let Some(base) = new_base {
+                Self::write_window_base(&bar, base);
+                self.current_base = base;
+            }
+            bar.$name(bar_offset)
+        }
+    };
+}
+
+/// Generate a PRAMIN write accessor.
+macro_rules! define_pramin_write {
+    ($name:ident, $ty:ty) => {
+        #[doc = concat!("Write a `", stringify!($ty), "` to VRAM at the given offset.")]
+        pub(crate) fn $name(&mut self, vram_offset: usize, value: $ty) -> Result {
+            // Compute window parameters without bar reference.
+            let (bar_offset, new_base) =
+                self.compute_window(vram_offset, ::core::mem::size_of::<$ty>())?;
+
+            // Update window base if needed and perform write.
+            let bar = self.bar.try_access().ok_or(ENODEV)?;
+            if let Some(base) = new_base {
+                Self::write_window_base(&bar, base);
+                self.current_base = base;
+            }
+            bar.$name(value, bar_offset)
+        }
+    };
+}
+
+/// PRAMIN window for direct VRAM access.
+///
+/// The window auto-repositions when accessing VRAM offsets outside the current 1MB range.
+/// Original window position is saved on creation and restored on drop.
+pub(crate) struct Window {
+    bar: Arc<Devres<Bar0>>,
+    saved_base: usize,
+    current_base: usize,
+}
+
+impl Window {
+    /// Create a new PRAMIN window accessor.
+    ///
+    /// Saves the current window position for restoration on drop.
+    pub(crate) fn new(bar: Arc<Devres<Bar0>>) -> Result<Self> {
+        let bar_access = bar.try_access().ok_or(ENODEV)?;
+        let saved_base = Self::try_read_window_base(&bar_access)?;
+
+        Ok(Self {
+            bar,
+            saved_base,
+            current_base: saved_base,
+        })
+    }
+
+    /// Read the current window base from the BAR0_WINDOW register.
+    fn try_read_window_base(bar: &Bar0) -> Result<usize> {
+        let reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
+        let base = u64::from(reg.window_base());
+        let shifted = base.checked_shl(16).ok_or(EOVERFLOW)?;
+        shifted.try_into().map_err(|_| EOVERFLOW)
+    }
+
+    /// Write a new window base to the BAR0_WINDOW register.
+    fn write_window_base(bar: &Bar0, base: usize) {
+        // CAST:
+        // - We have guaranteed that the base is within the addressable range (40-bits).
+        // - After >> 16, a 40-bit aligned base becomes 24 bits, which fits in u32.
+        regs::NV_PBUS_BAR0_WINDOW::default()
+            .set_window_base((base >> 16) as u32)
+            .write(bar);
+    }
+
+    /// Compute window parameters for a VRAM access.
+    ///
+    /// Returns (bar_offset, new_base) where:
+    /// - bar_offset: The BAR0 offset to use for the access
+    /// - new_base: Some(base) if window needs repositioning, None otherwise
+    fn compute_window(
+        &self,
+        vram_offset: usize,
+        access_size: usize,
+    ) -> Result<(usize, Option<usize>)> {
+        // Validate VRAM offset is within addressable range (40-bit address space).
+        let end_offset = vram_offset.checked_add(access_size).ok_or(EINVAL)?;
+        if end_offset > MAX_VRAM_OFFSET + 1 {
+            return Err(EINVAL);
+        }
+
+        // Calculate which 64KB-aligned base we need.
+        let needed_base = vram_offset.align_down(WINDOW_ALIGN);
+
+        // Calculate offset within the window.
+        let offset_in_window = vram_offset - needed_base;
+
+        // Check if access fits in 1MB window from this base.
+        if offset_in_window + access_size > PRAMIN_SIZE {
+            return Err(EINVAL);
+        }
+
+        // Return bar offset and whether window needs repositioning.
+        let new_base = if self.current_base != needed_base {
+            Some(needed_base)
+        } else {
+            None
+        };
+
+        Ok((PRAMIN_BASE + offset_in_window, new_base))
+    }
+
+    define_pramin_read!(try_read8, u8);
+    define_pramin_read!(try_read16, u16);
+    define_pramin_read!(try_read32, u32);
+    define_pramin_read!(try_read64, u64);
+
+    define_pramin_write!(try_write8, u8);
+    define_pramin_write!(try_write16, u16);
+    define_pramin_write!(try_write32, u32);
+    define_pramin_write!(try_write64, u64);
+}
+
+impl Drop for Window {
+    fn drop(&mut self) {
+        // Restore the original window base if it changed.
+        if self.current_base != self.saved_base {
+            if let Some(bar) = self.bar.try_access() {
+                Self::write_window_base(&bar, self.saved_base);
+            }
+        }
+    }
+}
+
+// SAFETY: `Window` requires `&mut self` for all accessors.
+unsafe impl Send for Window {}
+
+// SAFETY: `Window` requires `&mut self` for all accessors.
+unsafe impl Sync for Window {}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index c1121e7c64c5..3de00db3279e 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -13,6 +13,7 @@
 mod gfw;
 mod gpu;
 mod gsp;
+mod mm;
 mod num;
 mod regs;
 mod sbuffer;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 82cc6c0790e5..c8b8fbdcf608 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -96,6 +96,11 @@ fn fmt(&self, f: &mut kernel::fmt::Formatter<'_>) -> kernel::fmt::Result {
     31:16   frts_err_code as u16;
 });
 
+register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control for PRAMIN access" {
+    25:24   target as u8, "Target memory (0=VRAM, 1=SYS_MEM_COH, 2=SYS_MEM_NONCOH)";
+    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
+});
+
 // PFB
 
 // The following two registers together hold the physical system memory address that is used by the
-- 
2.34.1



* [PATCH RFC v6 06/26] docs: gpu: nova-core: Document the PRAMIN aperture mechanism
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (4 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 07/26] nova-core: Add BAR1 aperture type and size constant Joel Fernandes
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

Add documentation for the PRAMIN aperture mechanism used by nova-core
for direct VRAM access.

Nova only uses TARGET=VRAM for VRAM access. The SYS_MEM target values
are documented for completeness but are not used by the driver.
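
The register-to-address mapping the new document describes can be
sanity-checked with a small standalone sketch (`vram_addr` is a hypothetical
helper for illustration, not driver code):

```rust
/// Reconstruct a VRAM address from the NV_PBUS_BAR0_WINDOW BASE_ADDR field
/// (which holds bits 39:16 of the target) plus a 20-bit offset within the
/// 1 MiB PRAMIN aperture.
fn vram_addr(base_addr_field: u64, pramin_offset: u64) -> u64 {
    assert!(base_addr_field < 1 << 24); // 24-bit register field
    assert!(pramin_offset < 1 << 20); // offsets span the 1 MiB window
    (base_addr_field << 16) + pramin_offset
}

fn main() {
    // BASE_ADDR = 0x12340 positions the window at VRAM 0x123400000,
    // covering up to 0x1234FFFFF (matching the diagram in the document).
    assert_eq!(vram_addr(0x12340, 0x00000), 0x1_2340_0000);
    assert_eq!(vram_addr(0x12340, 0xFFFFF), 0x1_234F_FFFF);
    println!("window covers 0x123400000..=0x1234FFFFF");
}
```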

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 Documentation/gpu/nova/core/pramin.rst | 125 +++++++++++++++++++++++++
 Documentation/gpu/nova/index.rst       |   1 +
 2 files changed, 126 insertions(+)
 create mode 100644 Documentation/gpu/nova/core/pramin.rst

diff --git a/Documentation/gpu/nova/core/pramin.rst b/Documentation/gpu/nova/core/pramin.rst
new file mode 100644
index 000000000000..55ec9d920629
--- /dev/null
+++ b/Documentation/gpu/nova/core/pramin.rst
@@ -0,0 +1,125 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+PRAMIN aperture mechanism
+=========================
+
+.. note::
+   The following description is approximate and current as of the Ampere family.
+   It may change for future generations and is intended to assist in understanding
+   the driver code.
+
+Introduction
+============
+
+PRAMIN is a hardware aperture mechanism that provides CPU access to GPU Video RAM (VRAM) before
+the GPU's Memory Management Unit (MMU) and page tables are initialized. This 1MB sliding window,
+located at a fixed offset within BAR0, is essential for setting up page tables and other critical
+GPU data structures without relying on the GPU's MMU.
+
+Architecture Overview
+=====================
+
+The PRAMIN aperture mechanism is logically implemented by the GPU's PBUS (PCIe Bus Controller Unit)
+and provides a CPU-accessible window into VRAM through the PCIe interface::
+
+    +-----------------+    PCIe     +------------------------------+
+    |      CPU        |<----------->|           GPU                |
+    +-----------------+             |                              |
+                                    |  +----------------------+    |
+                                    |  |       PBUS           |    |
+                                    |  |  (Bus Controller)    |    |
+                                    |  |                      |    |
+                                    |  |  +--------------+<------------ (window starts at
+                                    |  |  |   PRAMIN     |    |    |     BAR0 + 0x700000)
+                                    |  |  |   Window     |    |    |
+                                    |  |  |   (1MB)      |    |    |
+                                    |  |  +--------------+    |    |
+                                    |  |         |            |    |
+                                    |  +---------|------------+    |
+                                    |            |                 |
+                                    |            v                 |
+                                    |  +----------------------+<------------ (Program PRAMIN to any
+                                    |  |       VRAM           |    |    64KB-aligned VRAM boundary)
+                                    |  |    (Several GBs)     |    |
+                                    |  |                      |    |
+                                    |  |  FB[0x000000000000]  |    |
+                                    |  |          ...         |    |
+                                    |  |  FB[0x7FFFFFFFFFF]   |    |
+                                    |  +----------------------+    |
+                                    +------------------------------+
+
+PBUS (PCIe Bus Controller) is responsible for, among other things, handling MMIO
+accesses to the BAR registers.
+
+PRAMIN Window Operation
+=======================
+
+The PRAMIN window provides a 1MB sliding aperture that can be repositioned over
+the entire VRAM address space using the ``NV_PBUS_BAR0_WINDOW`` register.
+
+Window Control Mechanism
+-------------------------
+
+The window position is controlled via the PBUS ``BAR0_WINDOW`` register::
+
+    NV_PBUS_BAR0_WINDOW Register (0x1700):
+    +-------+--------+--------------------------------------+
+    | 31:26 | 25:24  |               23:0                   |
+    | RSVD  | TARGET |            BASE_ADDR                 |
+    |       |        |        (bits 39:16 of VRAM address)  |
+    +-------+--------+--------------------------------------+
+
+    BASE_ADDR field (bits 23:0):
+    - Contains bits [39:16] of the target VRAM address
+    - Provides 40-bit (1TB) address space coverage
+    - Must be programmed with 64KB-aligned addresses
+
+    TARGET field (bits 25:24):
+    - 0x0: VRAM (Video Memory)
+    - 0x1: SYS_MEM_COH (Coherent System Memory)
+    - 0x2: SYS_MEM_NONCOH (Non-coherent System Memory)
+    - 0x3: Reserved
+
+.. note::
+   Nova only uses TARGET=VRAM (0x0) for video memory access. The SYS_MEM
+   target values are documented here for hardware completeness but are
+   not used by the driver.
+
+64KB Alignment Requirement
+---------------------------
+
+The PRAMIN window must be aligned to 64KB boundaries in VRAM. This is enforced
+by the ``BASE_ADDR`` field representing bits [39:16] of the target address::
+
+    VRAM Address Calculation:
+    actual_vram_addr = (BASE_ADDR << 16) + pramin_offset
+    Where:
+    - BASE_ADDR: 24-bit value from NV_PBUS_BAR0_WINDOW[23:0]
+    - pramin_offset: 20-bit offset within the PRAMIN window [0x00000-0xFFFFF]
+
+    Example Window Positioning:
+    +---------------------------------------------------------+
+    |                    VRAM Space                           |
+    |                                                         |
+    |  0x000000000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x0000FFFFF  +-----------------+                       |
+    |                                                         |
+    |       |              ^                                  |
+    |       |              | Window can slide                 |
+    |       v              | to any 64KB-aligned boundary     |
+    |                                                         |
+    |  0x123400000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x1234FFFFF  +-----------------+                       |
+    |                                                         |
+    |                       ...                               |
+    |                                                         |
+    |  0x7FFFF0000  +-----------------+ <-- 64KB aligned      |
+    |               | PRAMIN Window   |                       |
+    |               |    (1MB)        |                       |
+    |  0x7FFFFFFFF  +-----------------+                       |
+    +---------------------------------------------------------+
diff --git a/Documentation/gpu/nova/index.rst b/Documentation/gpu/nova/index.rst
index e39cb3163581..b8254b1ffe2a 100644
--- a/Documentation/gpu/nova/index.rst
+++ b/Documentation/gpu/nova/index.rst
@@ -32,3 +32,4 @@ vGPU manager VFIO driver and the nova-drm driver.
    core/devinit
    core/fwsec
    core/falcon
+   core/pramin
-- 
2.34.1



* [PATCH RFC v6 07/26] nova-core: Add BAR1 aperture type and size constant
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (5 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 06/26] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 08/26] nova-core: gsp: Add BAR1 PDE base accessors Joel Fernandes
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

Add BAR1_SIZE constant and Bar1 type alias for the 256MB BAR1 aperture.
These are prerequisites for BAR1 memory access functionality.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 5a4cc047bcfc..d8b2e967ba4c 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -13,7 +13,10 @@
         Vendor, //
     },
     prelude::*,
-    sizes::SZ_16M,
+    sizes::{
+        SZ_16M,
+        SZ_256M, //
+    },
     sync::Arc, //
 };
 
@@ -28,6 +31,7 @@ pub(crate) struct NovaCore {
 }
 
 const BAR0_SIZE: usize = SZ_16M;
+pub(crate) const BAR1_SIZE: usize = SZ_256M;
 
 // For now we only support Ampere which can use up to 47-bit DMA addresses.
 //
@@ -38,6 +42,7 @@ pub(crate) struct NovaCore {
 const GPU_DMA_BITS: u32 = 47;
 
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
+pub(crate) type Bar1 = pci::Bar<BAR1_SIZE>;
 
 kernel::pci_device_table!(
     PCI_TABLE,
-- 
2.34.1



* [PATCH RFC v6 08/26] nova-core: gsp: Add BAR1 PDE base accessors
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (6 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 07/26] nova-core: Add BAR1 aperture type and size constant Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 09/26] nova-core: mm: Add common memory management types Joel Fernandes
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

Add accessor methods to GspStaticConfigInfo for retrieving the BAR1 Page
Directory Entry base addresses from GSP-RM firmware.

These addresses point to the root page tables for the BAR1 virtual memory
space. The page tables are set up by GSP-RM during GPU initialization.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs          | 1 +
 drivers/gpu/nova-core/gsp/commands.rs    | 8 ++++++++
 drivers/gpu/nova-core/gsp/fw/commands.rs | 8 ++++++++
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index d8b2e967ba4c..f30ffa45cf13 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -42,6 +42,7 @@ pub(crate) struct NovaCore {
 const GPU_DMA_BITS: u32 = 47;
 
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
+#[expect(dead_code)]
 pub(crate) type Bar1 = pci::Bar<BAR1_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index c8430a076269..7b5025cba106 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -189,6 +189,7 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
 /// The reply from the GSP to the [`GetGspInfo`] command.
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
+    bar1_pde_base: u64,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
@@ -202,6 +203,7 @@ fn read(
     ) -> Result<Self, Self::InitError> {
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
+            bar1_pde_base: msg.bar1_pde_base(),
         })
     }
 }
@@ -228,6 +230,12 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
             .to_str()
             .map_err(GpuNameError::InvalidUtf8)
     }
+
+    /// Returns the BAR1 Page Directory Entry base address.
+    #[expect(dead_code)]
+    pub(crate) fn bar1_pde_base(&self) -> u64 {
+        self.bar1_pde_base
+    }
 }
 
 /// Send the [`GetGspInfo`] command and awaits for its reply.
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 21be44199693..f069f4092911 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -114,6 +114,14 @@ impl GspStaticConfigInfo {
     pub(crate) fn gpu_name_str(&self) -> [u8; 64] {
         self.0.gpuNameString
     }
+
+    /// Returns the BAR1 Page Directory Entry base address.
+    ///
+    /// This is the root page table address for BAR1 virtual memory,
+    /// set up by GSP-RM firmware.
+    pub(crate) fn bar1_pde_base(&self) -> u64 {
+        self.0.bar1PdeBase
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.
-- 
2.34.1



* [PATCH RFC v6 09/26] nova-core: mm: Add common memory management types
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (7 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 08/26] nova-core: gsp: Add BAR1 PDE base accessors Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 10/26] nova-core: mm: Add common types for all page table formats Joel Fernandes
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel

Add foundational types for GPU memory management. These types are used
throughout the nova memory management subsystem for page table
operations, address translation, and memory allocation.
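As a rough illustration of the 63:12 / 11:0 split that the `VramAddress`
bitfield in this patch encodes, here is a standalone sketch (`split` is a
hypothetical helper, not the patch's API):

```rust
const PAGE_SHIFT: u32 = 12; // 4 KiB pages
const PAGE_MASK: u64 = (1 << PAGE_SHIFT) - 1;

/// Split an address into a frame number (bits 63:12) and an in-page
/// offset (bits 11:0), mirroring the bitfield layout the patch defines.
fn split(addr: u64) -> (u64, u64) {
    (addr >> PAGE_SHIFT, addr & PAGE_MASK)
}

fn main() {
    let (pfn, off) = split(0x0012_3456);
    assert_eq!(pfn, 0x123);
    assert_eq!(off, 0x456);
    // Recombining the frame number and offset restores the address.
    assert_eq!((pfn << PAGE_SHIFT) | off, 0x0012_3456);
    println!("pfn={pfn:#x} offset={off:#x}");
}
```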

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs | 147 ++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index 7a5dd4220c67..b57016d453ce 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -2,4 +2,151 @@
 
 //! Memory management subsystems for nova-core.
 
+#![expect(dead_code)]
+
 pub(crate) mod pramin;
+
+use kernel::sizes::SZ_4K;
+
+/// Page size in bytes (4 KiB).
+pub(crate) const PAGE_SIZE: usize = SZ_4K;
+
+bitfield! {
+    pub(crate) struct VramAddress(u64), "Physical VRAM address in GPU video memory" {
+        11:0    offset          as u64, "Offset within 4KB page";
+        63:12   frame_number    as u64 => Pfn, "Physical frame number";
+    }
+}
+
+impl VramAddress {
+    /// Create a new VRAM address from a raw value.
+    pub(crate) const fn new(addr: u64) -> Self {
+        Self(addr)
+    }
+
+    /// Get the raw address value as `usize` (useful for MMIO offsets).
+    pub(crate) const fn raw(&self) -> usize {
+        self.0 as usize
+    }
+
+    /// Get the raw address value as `u64`.
+    pub(crate) const fn raw_u64(&self) -> u64 {
+        self.0
+    }
+}
+
+impl From<Pfn> for VramAddress {
+    fn from(pfn: Pfn) -> Self {
+        Self::default().set_frame_number(pfn)
+    }
+}
+
+bitfield! {
+    pub(crate) struct VirtualAddress(u64), "Virtual address in GPU address space" {
+        11:0    offset          as u64, "Offset within 4KB page";
+        20:12   l4_index        as u64, "Level 4 index (PTE)";
+        29:21   l3_index        as u64, "Level 3 index (Dual PDE)";
+        38:30   l2_index        as u64, "Level 2 index";
+        47:39   l1_index        as u64, "Level 1 index";
+        56:48   l0_index        as u64, "Level 0 index (PDB)";
+        63:12   frame_number    as u64 => Vfn, "Virtual frame number";
+    }
+}
+
+impl VirtualAddress {
+    /// Create a new virtual address from a raw value.
+    #[expect(dead_code)]
+    pub(crate) const fn new(addr: u64) -> Self {
+        Self(addr)
+    }
+
+    /// Get the page table index for a given level.
+    pub(crate) fn level_index(&self, level: u64) -> u64 {
+        match level {
+            0 => self.l0_index(),
+            1 => self.l1_index(),
+            2 => self.l2_index(),
+            3 => self.l3_index(),
+            4 => self.l4_index(),
+            _ => 0,
+        }
+    }
+}
+
+impl From<Vfn> for VirtualAddress {
+    fn from(vfn: Vfn) -> Self {
+        Self::default().set_frame_number(vfn)
+    }
+}
+
+/// Physical Frame Number.
+///
+/// Represents a physical page in VRAM.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) struct Pfn(u64);
+
+impl Pfn {
+    /// Create a new PFN from a frame number.
+    pub(crate) const fn new(frame_number: u64) -> Self {
+        Self(frame_number)
+    }
+
+    /// Get the raw frame number.
+    pub(crate) const fn raw(self) -> u64 {
+        self.0
+    }
+}
+
+impl From<VramAddress> for Pfn {
+    fn from(addr: VramAddress) -> Self {
+        addr.frame_number()
+    }
+}
+
+impl From<u64> for Pfn {
+    fn from(val: u64) -> Self {
+        Self(val)
+    }
+}
+
+impl From<Pfn> for u64 {
+    fn from(pfn: Pfn) -> Self {
+        pfn.0
+    }
+}
+
+/// Virtual Frame Number.
+///
+/// Represents a virtual page in GPU address space.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) struct Vfn(u64);
+
+impl Vfn {
+    /// Create a new VFN from a frame number.
+    pub(crate) const fn new(frame_number: u64) -> Self {
+        Self(frame_number)
+    }
+
+    /// Get the raw frame number.
+    pub(crate) const fn raw(self) -> u64 {
+        self.0
+    }
+}
+
+impl From<VirtualAddress> for Vfn {
+    fn from(addr: VirtualAddress) -> Self {
+        addr.frame_number()
+    }
+}
+
+impl From<u64> for Vfn {
+    fn from(val: u64) -> Self {
+        Self(val)
+    }
+}
+
+impl From<Vfn> for u64 {
+    fn from(vfn: Vfn) -> Self {
+        vfn.0
+    }
+}
-- 
2.34.1



* [PATCH RFC v6 10/26] nova-core: mm: Add common types for all page table formats
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (8 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 09/26] nova-core: mm: Add common memory management types Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 11/26] nova-core: mm: Add MMU v2 page table types Joel Fernandes
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add common page table types shared between MMU v2 and v3. These types
are hardware-agnostic and form the foundation for the version-specific
formats added in the following patches.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs           |   1 +
 drivers/gpu/nova-core/mm/pagetable/mod.rs | 168 ++++++++++++++++++++++
 2 files changed, 169 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/mod.rs
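[Editorial note: the level-walking behavior of `PageTableLevel` below can be
sketched standalone. This is a simplified mirror of the patch's enum for
illustration only, not the kernel code — the real type lives in
`mm/pagetable/mod.rs` and uses `pub(crate)` visibility.]

```rust
// Simplified mirror of PageTableLevel from this patch, illustrating a
// full walk from the Page Directory Base (Pdb) down to the PTE level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageTableLevel {
    Pdb,
    L1,
    L2,
    L3,
    L4,
}

impl PageTableLevel {
    /// Next level in the hierarchy, or None at the PTE level.
    const fn next(self) -> Option<PageTableLevel> {
        match self {
            Self::Pdb => Some(Self::L1),
            Self::L1 => Some(Self::L2),
            Self::L2 => Some(Self::L3),
            Self::L3 => Some(Self::L4),
            Self::L4 => None,
        }
    }

    /// Entry size in bytes: only L3 holds 128-bit dual PDEs.
    const fn entry_size(self) -> usize {
        match self {
            Self::L3 => 16,
            _ => 8,
        }
    }
}

fn main() {
    // A complete walk visits 5 levels: Pdb, L1, L2, L3, L4.
    let mut level = PageTableLevel::Pdb;
    let mut depth = 1;
    while let Some(next) = level.next() {
        level = next;
        depth += 1;
    }
    assert_eq!(depth, 5);
    assert_eq!(PageTableLevel::L3.entry_size(), 16);
    assert_eq!(PageTableLevel::Pdb.entry_size(), 8);
    println!("walk ok: {} levels", depth);
}
```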

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index b57016d453ce..6015fc8753bc 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -4,6 +4,7 @@
 
 #![expect(dead_code)]
 
+pub(crate) mod pagetable;
 pub(crate) mod pramin;
 
 use kernel::sizes::SZ_4K;
diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
new file mode 100644
index 000000000000..bb3a37cc6ca0
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Common page table types shared between MMU v2 and v3.
+//!
+//! This module provides foundational types used by both MMU versions:
+//! - Page table level hierarchy
+//! - Memory aperture types for PDEs and PTEs
+
+#![expect(dead_code)]
+
+use crate::gpu::Architecture;
+
+/// MMU version enumeration.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum MmuVersion {
+    /// MMU v2 for Turing/Ampere/Ada.
+    V2,
+    /// MMU v3 for Hopper and later.
+    V3,
+}
+
+impl From<Architecture> for MmuVersion {
+    fn from(arch: Architecture) -> Self {
+        match arch {
+            Architecture::Turing | Architecture::Ampere | Architecture::Ada => Self::V2,
+            // Hopper and later architectures will map to MMU v3:
+            // _ => Self::V3,
+        }
+    }
+}
+
+/// Page Table Level hierarchy for MMU v2/v3.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum PageTableLevel {
+    /// Level 0 - Page Directory Base (root).
+    Pdb,
+    /// Level 1 - Intermediate page directory.
+    L1,
+    /// Level 2 - Intermediate page directory.
+    L2,
+    /// Level 3 - Dual PDE level (Big and Small Page Tables, 128-bit entries).
+    L3,
+    /// Level 4 - Page Table Entries, pointing directly to physical pages.
+    L4,
+}
+
+impl PageTableLevel {
+    /// Get the entry size in bytes for this level.
+    pub(crate) const fn entry_size(&self) -> usize {
+        match self {
+            Self::L3 => 16, // 128-bit dual PDE
+            _ => 8,         // 64-bit PDE/PTE
+        }
+    }
+
+    /// Number of entries per page table (512 for 4KB pages).
+    pub(crate) const ENTRIES_PER_TABLE: usize = 512;
+
+    /// Get the next level in the hierarchy.
+    pub(crate) const fn next(&self) -> Option<PageTableLevel> {
+        match self {
+            Self::Pdb => Some(Self::L1),
+            Self::L1 => Some(Self::L2),
+            Self::L2 => Some(Self::L3),
+            Self::L3 => Some(Self::L4),
+            Self::L4 => None,
+        }
+    }
+
+    /// Check if this is the PTE level.
+    pub(crate) const fn is_pte_level(&self) -> bool {
+        matches!(self, Self::L4)
+    }
+
+    /// Check if this level uses dual PDE (128-bit entries).
+    pub(crate) const fn is_dual_pde_level(&self) -> bool {
+        matches!(self, Self::L3)
+    }
+
+    /// Get all PDE levels (excluding PTE level) for walking.
+    pub(crate) const fn pde_levels() -> [PageTableLevel; 4] {
+        [Self::Pdb, Self::L1, Self::L2, Self::L3]
+    }
+
+    /// Get the level as a numeric index (0-4).
+    pub(crate) const fn as_index(&self) -> u64 {
+        match self {
+            Self::Pdb => 0,
+            Self::L1 => 1,
+            Self::L2 => 2,
+            Self::L3 => 3,
+            Self::L4 => 4,
+        }
+    }
+}
+
+/// Memory aperture for Page Table Entries (`PTE`s).
+///
+/// Determines which memory region the `PTE` points to.
+#[repr(u8)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) enum AperturePte {
+    /// Local video memory (VRAM).
+    #[default]
+    VideoMemory = 0,
+    /// Peer GPU's video memory.
+    PeerMemory = 1,
+    /// System memory with cache coherence.
+    SystemCoherent = 2,
+    /// System memory without cache coherence.
+    SystemNonCoherent = 3,
+}
+
+// TODO[FPRI]: Replace with `#[derive(FromPrimitive)]` when available.
+impl From<u8> for AperturePte {
+    fn from(val: u8) -> Self {
+        match val {
+            0 => Self::VideoMemory,
+            1 => Self::PeerMemory,
+            2 => Self::SystemCoherent,
+            3 => Self::SystemNonCoherent,
+            _ => Self::VideoMemory,
+        }
+    }
+}
+
+// TODO[FPRI]: Replace with `#[derive(ToPrimitive)]` when available.
+impl From<AperturePte> for u8 {
+    fn from(val: AperturePte) -> Self {
+        val as u8
+    }
+}
+
+/// Memory aperture for Page Directory Entries (`PDE`s).
+///
+/// Note: For `PDE`s, `Invalid` (0) means the entry is not valid.
+#[repr(u8)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+pub(crate) enum AperturePde {
+    /// Invalid/unused entry.
+    #[default]
+    Invalid = 0,
+    /// Page table is in video memory.
+    VideoMemory = 1,
+    /// Page table is in system memory with coherence.
+    SystemCoherent = 2,
+    /// Page table is in system memory without coherence.
+    SystemNonCoherent = 3,
+}
+
+// TODO[FPRI]: Replace with `#[derive(FromPrimitive)]` when available.
+impl From<u8> for AperturePde {
+    fn from(val: u8) -> Self {
+        match val {
+            1 => Self::VideoMemory,
+            2 => Self::SystemCoherent,
+            3 => Self::SystemNonCoherent,
+            _ => Self::Invalid,
+        }
+    }
+}
+
+// TODO[FPRI]: Replace with `#[derive(ToPrimitive)]` when available.
+impl From<AperturePde> for u8 {
+    fn from(val: AperturePde) -> Self {
+        val as u8
+    }
+}
-- 
2.34.1



* [PATCH RFC v6 11/26] nova-core: mm: Add MMU v2 page table types
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (9 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 10/26] nova-core: mm: Add common types for all page table formats Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 12/26] nova-core: mm: Add MMU v3 " Joel Fernandes
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add page table entry and directory structures for MMU version 2
used by Turing/Ampere/Ada GPUs.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable/mod.rs  |   1 +
 drivers/gpu/nova-core/mm/pagetable/ver2.rs | 184 +++++++++++++++++++++
 2 files changed, 185 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver2.rs
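[Editorial note: the v2 PTE layout added below can be illustrated with plain
shifts and masks standing in for the kernel's `bitfield!` macro. The function
name `pte_v2_vram` is hypothetical; field positions follow the patch: valid
0:0, aperture 2:1, read_only 6:6, frame_number_vid 32:8.]

```rust
// Illustrative only: packs the same MMU v2 PTE fields that the patch's
// Pte::new_vram() sets via the bitfield! macro.
fn pte_v2_vram(pfn: u64, writable: bool) -> u64 {
    const APERTURE_VIDEO: u64 = 0; // AperturePte::VideoMemory
    let mut pte = 0u64;
    pte |= 1; // bit 0: valid
    pte |= APERTURE_VIDEO << 1; // bits 2:1: aperture
    if !writable {
        pte |= 1 << 6; // bit 6: read_only
    }
    // Bits 32:8: frame number for video memory (25 bits).
    pte |= (pfn & ((1 << 25) - 1)) << 8;
    pte
}

fn main() {
    let pte = pte_v2_vram(0x1234, false);
    assert_eq!(pte & 1, 1); // valid
    assert_eq!((pte >> 1) & 0x3, 0); // video memory aperture
    assert_eq!((pte >> 6) & 1, 1); // read-only
    assert_eq!((pte >> 8) & 0x1FF_FFFF, 0x1234); // frame number round-trips
    println!("pte = {:#x}", pte);
}
```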

diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
index bb3a37cc6ca0..787755e89a5b 100644
--- a/drivers/gpu/nova-core/mm/pagetable/mod.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -7,6 +7,7 @@
 //! - Memory aperture types for PDEs and PTEs
 
 #![expect(dead_code)]
+pub(crate) mod ver2;
 
 use crate::gpu::Architecture;
 
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver2.rs b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
new file mode 100644
index 000000000000..d50c3e56d38e
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/ver2.rs
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! MMU v2 page table types for Turing, Ampere and Ada GPUs.
+//!
+//! This module defines MMU version 2 specific types (Turing, Ampere and Ada GPUs).
+//!
+//! Bit field layouts derived from the NVIDIA OpenRM headers:
+//! `open-gpu-kernel-modules/src/common/inc/swref/published/turing/tu102/dev_mmu.h`
+
+#![expect(dead_code)]
+
+use super::{
+    AperturePde,
+    AperturePte, //
+};
+use crate::mm::{
+    Pfn,
+    VramAddress, //
+};
+
+// Page Table Entry (PTE) for MMU v2 - 64-bit entry at level 4.
+bitfield! {
+    pub(crate) struct Pte(u64), "Page Table Entry for MMU v2" {
+        0:0     valid               as bool, "Entry is valid";
+        2:1     aperture            as u8 => AperturePte, "Memory aperture type";
+        3:3     volatile            as bool, "Volatile (bypass L2 cache)";
+        4:4     encrypted           as bool, "Encryption enabled (Confidential Computing)";
+        5:5     privilege           as bool, "Privileged access only";
+        6:6     read_only           as bool, "Write protection";
+        7:7     atomic_disable      as bool, "Atomic operations disabled";
+        53:8    frame_number_sys    as u64 => Pfn, "Frame number for system memory";
+        32:8    frame_number_vid    as u64 => Pfn, "Frame number for video memory";
+        35:33   peer_id             as u8, "Peer GPU ID for peer memory (0-7)";
+        53:36   comptagline         as u32, "Compression tag line bits";
+        63:56   kind                as u8, "Surface kind/format";
+    }
+}
+
+impl Pte {
+    /// Create a PTE from a `u64` value.
+    pub(crate) fn new(val: u64) -> Self {
+        Self(val)
+    }
+
+    /// Create a valid PTE for video memory.
+    pub(crate) fn new_vram(pfn: Pfn, writable: bool) -> Self {
+        Self::default()
+            .set_valid(true)
+            .set_aperture(AperturePte::VideoMemory)
+            .set_frame_number_vid(pfn)
+            .set_read_only(!writable)
+    }
+
+    /// Create an invalid PTE.
+    pub(crate) fn invalid() -> Self {
+        Self::default()
+    }
+
+    /// Get the frame number based on aperture type.
+    pub(crate) fn frame_number(&self) -> Pfn {
+        match self.aperture() {
+            AperturePte::VideoMemory => self.frame_number_vid(),
+            _ => self.frame_number_sys(),
+        }
+    }
+
+    /// Get the raw `u64` value.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        self.0
+    }
+}
+
+// Page Directory Entry (PDE) for MMU v2 - 64-bit entry at levels 0-2.
+bitfield! {
+    pub(crate) struct Pde(u64), "Page Directory Entry for MMU v2" {
+        0:0     valid_inverted      as bool, "Valid bit (inverted logic)";
+        2:1     aperture            as u8 => AperturePde, "Memory aperture type";
+        3:3     volatile            as bool, "Volatile (bypass L2 cache)";
+        5:5     no_ats              as bool, "Disable Address Translation Services";
+        53:8    table_frame_sys     as u64 => Pfn, "Table frame number for system memory";
+        32:8    table_frame_vid     as u64 => Pfn, "Table frame number for video memory";
+        35:33   peer_id             as u8, "Peer GPU ID (0-7)";
+    }
+}
+
+impl Pde {
+    /// Create a PDE from a `u64` value.
+    pub(crate) fn new(val: u64) -> Self {
+        Self(val)
+    }
+
+    /// Create a valid PDE pointing to a page table in video memory.
+    pub(crate) fn new_vram(table_pfn: Pfn) -> Self {
+        Self::default()
+            .set_valid_inverted(false) // 0 = valid
+            .set_aperture(AperturePde::VideoMemory)
+            .set_table_frame_vid(table_pfn)
+    }
+
+    /// Create an invalid PDE.
+    pub(crate) fn invalid() -> Self {
+        Self::default()
+            .set_valid_inverted(true)
+            .set_aperture(AperturePde::Invalid)
+    }
+
+    /// Check if this PDE is valid.
+    pub(crate) fn is_valid(&self) -> bool {
+        !self.valid_inverted() && self.aperture() != AperturePde::Invalid
+    }
+
+    /// Get the table frame number based on aperture type.
+    pub(crate) fn table_frame(&self) -> Pfn {
+        match self.aperture() {
+            AperturePde::VideoMemory => self.table_frame_vid(),
+            _ => self.table_frame_sys(),
+        }
+    }
+
+    /// Get the VRAM address of the page table.
+    pub(crate) fn table_vram_address(&self) -> VramAddress {
+        debug_assert!(
+            self.aperture() == AperturePde::VideoMemory,
+            "table_vram_address called on non-VRAM PDE (aperture: {:?})",
+            self.aperture()
+        );
+        VramAddress::from(self.table_frame_vid())
+    }
+
+    /// Get the raw `u64` value of the PDE.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        self.0
+    }
+}
+
+/// Dual PDE at Level 3 - 128-bit entry holding Big and Small Page Table pointers.
+///
+/// The dual PDE supports both large (64KB) and small (4KB) page tables.
+#[repr(C)]
+#[derive(Debug, Clone, Copy, Default)]
+pub(crate) struct DualPde {
+    /// Big (Large) Page Table pointer (lower 64 bits).
+    pub big: Pde,
+    /// Small Page Table pointer (upper 64 bits).
+    pub small: Pde,
+}
+
+impl DualPde {
+    /// Create a dual PDE from raw 128-bit value (two `u64`s).
+    pub(crate) fn new(big: u64, small: u64) -> Self {
+        Self {
+            big: Pde::new(big),
+            small: Pde::new(small),
+        }
+    }
+
+    /// Create a dual PDE with only the small page table pointer set.
+    ///
+    /// Note: The big (LPT) portion is set to 0, not `Pde::invalid()`.
+    /// According to hardware documentation, clearing bit 0 of the 128-bit
+    /// entry makes the PDE behave as a "normal" PDE. Using `Pde::invalid()`
+    /// would set bit 0 (valid_inverted), which breaks page table walking.
+    pub(crate) fn new_small(table_pfn: Pfn) -> Self {
+        Self {
+            big: Pde::new(0),
+            small: Pde::new_vram(table_pfn),
+        }
+    }
+
+    /// Check if the small page table pointer is valid.
+    pub(crate) fn has_small(&self) -> bool {
+        self.small.is_valid()
+    }
+
+    /// Check if the big page table pointer is valid.
+    pub(crate) fn has_big(&self) -> bool {
+        self.big.is_valid()
+    }
+
+    /// Get the small page table PFN.
+    pub(crate) fn small_pfn(&self) -> Pfn {
+        self.small.table_frame()
+    }
+}
-- 
2.34.1



* [PATCH RFC v6 12/26] nova-core: mm: Add MMU v3 page table types
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (10 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 11/26] nova-core: mm: Add MMU v2 page table types Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums Joel Fernandes
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add page table entry and directory structures for MMU version 3
used by Hopper and later GPUs.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable/mod.rs  |   1 +
 drivers/gpu/nova-core/mm/pagetable/ver3.rs | 286 +++++++++++++++++++++
 2 files changed, 287 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/ver3.rs
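[Editorial note: the dual-PDE address encoding described in this patch — the
big-page-table pointer stores the table address shifted by 8 (256-byte
aligned), while the small pointer stores a PFN (address shifted by 12, 4 KiB
aligned) — can be sketched standalone. The helper names below are
hypothetical; the alignment check mirrors the `EINVAL` path in
`DualPdeBig::new_vram()`.]

```rust
// Illustrative sketch of the two address layouts in the MMU v3 dual PDE.
fn big_table_frame(addr: u64) -> Result<u64, &'static str> {
    // Big page table addresses must be 256-byte aligned (shift 8).
    if addr & 0xFF != 0 {
        return Err("big page table address must be 256-byte aligned");
    }
    Ok(addr >> 8)
}

fn small_table_frame(addr: u64) -> Result<u64, &'static str> {
    // Small page table addresses are full 4 KiB frames (shift 12).
    if addr & 0xFFF != 0 {
        return Err("small page table address must be 4 KiB aligned");
    }
    Ok(addr >> 12)
}

fn main() {
    let addr = 0x20_0000u64; // 2 MiB, satisfies both alignments
    assert_eq!(big_table_frame(addr).unwrap(), addr >> 8);
    assert_eq!(small_table_frame(addr).unwrap(), addr >> 12);
    // Misaligned addresses are rejected, mirroring EINVAL in the patch.
    assert!(big_table_frame(0x101).is_err());
    println!("dual-PDE encodings ok");
}
```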

diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
index 787755e89a5b..3b1324add844 100644
--- a/drivers/gpu/nova-core/mm/pagetable/mod.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -8,6 +8,7 @@
 
 #![expect(dead_code)]
 pub(crate) mod ver2;
+pub(crate) mod ver3;
 
 use crate::gpu::Architecture;
 
diff --git a/drivers/gpu/nova-core/mm/pagetable/ver3.rs b/drivers/gpu/nova-core/mm/pagetable/ver3.rs
new file mode 100644
index 000000000000..6a5618fbb63d
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/ver3.rs
@@ -0,0 +1,286 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! MMU v3 page table types for Hopper and later GPUs.
+//!
+//! This module defines MMU version 3 specific types (Hopper and later GPUs).
+//!
+//! Key differences from MMU v2:
+//! - Unified 40-bit address field for all apertures (v2 had separate sys/vid fields).
+//! - PCF (Page Classification Field) replaces separate privilege/RO/atomic/cache bits.
+//! - KIND field is 4 bits (not 8).
+//! - IS_PTE bit in PDE to support large pages directly.
+//! - No COMPTAGLINE field (compression handled differently in v3).
+//! - No separate ENCRYPTED bit.
+//!
+//! Bit field layouts derived from the NVIDIA OpenRM headers:
+//! `open-gpu-kernel-modules/src/common/inc/swref/published/hopper/gh100/dev_mmu.h`
+
+#![expect(dead_code)]
+
+use super::{
+    AperturePde,
+    AperturePte, //
+};
+use crate::mm::{
+    Pfn,
+    VramAddress, //
+};
+use kernel::prelude::*;
+
+// Page Classification Field (PCF) - 5 bits for PTEs in MMU v3.
+bitfield! {
+    pub(crate) struct PtePcf(u8), "Page Classification Field for PTEs" {
+        0:0     uncached    as bool, "Bypass L2 cache (0=cached, 1=bypass)";
+        1:1     acd         as bool, "Access counting disabled (0=enabled, 1=disabled)";
+        2:2     read_only   as bool, "Read-only access (0=read-write, 1=read-only)";
+        3:3     no_atomic   as bool, "Atomics disabled (0=enabled, 1=disabled)";
+        4:4     privileged  as bool, "Privileged access only (0=regular, 1=privileged)";
+    }
+}
+
+impl PtePcf {
+    /// Create PCF for read-write mapping (cached, no atomics, regular mode).
+    pub(crate) fn rw() -> Self {
+        Self::default().set_no_atomic(true)
+    }
+
+    /// Create PCF for read-only mapping (cached, no atomics, regular mode).
+    pub(crate) fn ro() -> Self {
+        Self::default().set_read_only(true).set_no_atomic(true)
+    }
+
+    /// Get the raw `u8` value.
+    pub(crate) fn raw_u8(&self) -> u8 {
+        self.0
+    }
+}
+
+impl From<u8> for PtePcf {
+    fn from(val: u8) -> Self {
+        Self(val)
+    }
+}
+
+// Page Classification Field (PCF) - 3 bits for PDEs in MMU v3.
+// Controls Address Translation Services (ATS) and caching.
+bitfield! {
+    pub(crate) struct PdePcf(u8), "Page Classification Field for PDEs" {
+        0:0     uncached    as bool, "Bypass L2 cache (0=cached, 1=bypass)";
+        1:1     no_ats      as bool, "Address Translation Services disabled (0=enabled, 1=disabled)";
+    }
+}
+
+impl PdePcf {
+    /// Create PCF for cached mapping with ATS enabled (default).
+    pub(crate) fn cached() -> Self {
+        Self::default()
+    }
+
+    /// Get the raw `u8` value.
+    pub(crate) fn raw_u8(&self) -> u8 {
+        self.0
+    }
+}
+
+impl From<u8> for PdePcf {
+    fn from(val: u8) -> Self {
+        Self(val)
+    }
+}
+
+// Page Table Entry (PTE) for MMU v3.
+bitfield! {
+    pub(crate) struct Pte(u64), "Page Table Entry for MMU v3" {
+        0:0     valid           as bool, "Entry is valid";
+        2:1     aperture        as u8 => AperturePte, "Memory aperture type";
+        7:3     pcf             as u8 => PtePcf, "Page Classification Field";
+        11:8    kind            as u8, "Surface kind (4 bits, 0x0=pitch, 0xF=invalid)";
+        51:12   frame_number    as u64 => Pfn, "Physical frame number (for all apertures)";
+        63:61   peer_id         as u8, "Peer GPU ID for peer memory (0-7)";
+    }
+}
+
+impl Pte {
+    /// Create a PTE from a `u64` value.
+    pub(crate) fn new(val: u64) -> Self {
+        Self(val)
+    }
+
+    /// Create a valid PTE for video memory.
+    pub(crate) fn new_vram(frame: Pfn, writable: bool) -> Self {
+        let pcf = if writable { PtePcf::rw() } else { PtePcf::ro() };
+        Self::default()
+            .set_valid(true)
+            .set_aperture(AperturePte::VideoMemory)
+            .set_pcf(pcf)
+            .set_frame_number(frame)
+    }
+
+    /// Create an invalid PTE.
+    pub(crate) fn invalid() -> Self {
+        Self::default()
+    }
+
+    /// Get the raw `u64` value.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        self.0
+    }
+}
+
+// Page Directory Entry (PDE) for MMU v3.
+//
+// Note: v3 uses a unified 40-bit address field (v2 had separate sys/vid address fields).
+bitfield! {
+    pub(crate) struct Pde(u64), "Page Directory Entry for MMU v3 (Hopper+)" {
+        0:0     is_pte      as bool, "Entry is a PTE (0=PDE, 1=large page PTE)";
+        2:1     aperture    as u8 => AperturePde, "Memory aperture (0=invalid, 1=vidmem, 2=coherent, 3=non-coherent)";
+        5:3     pcf         as u8 => PdePcf, "Page Classification Field (3 bits for PDE)";
+        51:12   table_frame as u64 => Pfn, "Table frame number (40-bit unified address)";
+    }
+}
+
+impl Pde {
+    /// Create a PDE from a `u64` value.
+    pub(crate) fn new(val: u64) -> Self {
+        Self(val)
+    }
+
+    /// Create a valid PDE pointing to a page table in video memory.
+    pub(crate) fn new_vram(table_pfn: Pfn) -> Self {
+        Self::default()
+            .set_is_pte(false)
+            .set_aperture(AperturePde::VideoMemory)
+            .set_table_frame(table_pfn)
+    }
+
+    /// Create an invalid PDE.
+    pub(crate) fn invalid() -> Self {
+        Self::default().set_aperture(AperturePde::Invalid)
+    }
+
+    /// Check if this PDE is valid.
+    pub(crate) fn is_valid(&self) -> bool {
+        self.aperture() != AperturePde::Invalid
+    }
+
+    /// Get the VRAM address of the page table.
+    pub(crate) fn table_vram_address(&self) -> VramAddress {
+        debug_assert!(
+            self.aperture() == AperturePde::VideoMemory,
+            "table_vram_address called on non-VRAM PDE (aperture: {:?})",
+            self.aperture()
+        );
+        VramAddress::from(self.table_frame())
+    }
+
+    /// Get the raw `u64` value.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        self.0
+    }
+}
+
+// Big Page Table pointer for Dual PDE - 64-bit lower word of the 128-bit Dual PDE.
+bitfield! {
+    pub(crate) struct DualPdeBig(u64), "Big Page Table pointer in Dual PDE (MMU v3)" {
+        0:0     is_pte      as bool, "Entry is a PTE (for large pages)";
+        2:1     aperture    as u8 => AperturePde, "Memory aperture type";
+        5:3     pcf         as u8 => PdePcf, "Page Classification Field";
+        51:8    table_frame as u64, "Table frame (table address 256-byte aligned)";
+    }
+}
+
+impl DualPdeBig {
+    /// Create a big page table pointer from a `u64` value.
+    pub(crate) fn new(val: u64) -> Self {
+        Self(val)
+    }
+
+    /// Create an invalid big page table pointer.
+    pub(crate) fn invalid() -> Self {
+        Self::default().set_aperture(AperturePde::Invalid)
+    }
+
+    /// Create a valid big PDE pointing to a page table in video memory.
+    pub(crate) fn new_vram(table_addr: VramAddress) -> Result<Self> {
+        // Big page table addresses must be 256-byte aligned (shift 8).
+        if table_addr.raw_u64() & 0xFF != 0 {
+            return Err(EINVAL);
+        }
+
+        let table_frame = table_addr.raw_u64() >> 8;
+        Ok(Self::default()
+            .set_is_pte(false)
+            .set_aperture(AperturePde::VideoMemory)
+            .set_table_frame(table_frame))
+    }
+
+    /// Check if this big PDE is valid.
+    pub(crate) fn is_valid(&self) -> bool {
+        self.aperture() != AperturePde::Invalid
+    }
+
+    /// Get the VRAM address of the big page table.
+    pub(crate) fn table_vram_address(&self) -> VramAddress {
+        debug_assert!(
+            self.aperture() == AperturePde::VideoMemory,
+            "table_vram_address called on non-VRAM DualPdeBig (aperture: {:?})",
+            self.aperture()
+        );
+        VramAddress::new(self.table_frame() << 8)
+    }
+
+    /// Get the raw `u64` value.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        self.0
+    }
+}
+
+/// Dual PDE at Level 3 for MMU v3 - 128-bit entry.
+///
+/// Contains both big (64KB) and small (4KB) page table pointers:
+/// - Lower 64 bits: Big Page Table pointer.
+/// - Upper 64 bits: Small Page Table pointer.
+///
+/// ## Note
+///
+/// The big and small page table pointers have different address layouts:
+/// - Big address = field value << 8 (256-byte alignment).
+/// - Small address = field value << 12 (4KB alignment).
+///
+/// This is why `DualPdeBig` is a separate type from `Pde`.
+#[repr(C)]
+#[derive(Debug, Clone, Copy, Default)]
+pub(crate) struct DualPde {
+    /// Big Page Table pointer.
+    pub big: DualPdeBig,
+    /// Small Page Table pointer.
+    pub small: Pde,
+}
+
+impl DualPde {
+    /// Create a dual PDE from raw 128-bit value (two `u64`s).
+    pub(crate) fn new(big: u64, small: u64) -> Self {
+        Self {
+            big: DualPdeBig::new(big),
+            small: Pde::new(small),
+        }
+    }
+
+    /// Create a dual PDE with only the small page table pointer set.
+    pub(crate) fn new_small(table_pfn: Pfn) -> Self {
+        Self {
+            big: DualPdeBig::invalid(),
+            small: Pde::new_vram(table_pfn),
+        }
+    }
+
+    /// Check if the small page table pointer is valid.
+    pub(crate) fn has_small(&self) -> bool {
+        self.small.is_valid()
+    }
+
+    /// Check if the big page table pointer is valid.
+    pub(crate) fn has_big(&self) -> bool {
+        self.big.is_valid()
+    }
+}
-- 
2.34.1



* [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (11 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 12/26] nova-core: mm: Add MMU v3 " Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-21  9:54   ` Zhi Wang
  2026-01-20 20:42 ` [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support Joel Fernandes
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add unified Pte, Pde, and DualPde wrapper enums that abstract over
MMU v2 and v3 page table entry formats. These enums allow the page
table walker and VMM to work with both MMU versions.

Each unified type:
- Takes an MmuVersion parameter in its constructors
- Wraps both the ver2 and ver3 variants
- Delegates method calls to the appropriate variant

This enables version-agnostic page table operations while keeping
version-specific implementation details encapsulated in the ver2
and ver3 modules.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable/mod.rs | 194 ++++++++++++++++++++++
 1 file changed, 194 insertions(+)
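[Editorial note: the enum-dispatch pattern this patch applies can be sketched
standalone. The types below are simplified stand-ins (validity reduced to bit
0), not the kernel code; the point is that callers construct entries through
the wrapper and never match on the MMU version themselves.]

```rust
// Simplified stand-ins for the version-specific PTE types.
#[derive(Clone, Copy)]
struct PteV2(u64);
#[derive(Clone, Copy)]
struct PteV3(u64);

impl PteV2 {
    fn valid(self) -> bool {
        self.0 & 1 != 0
    }
}
impl PteV3 {
    fn valid(self) -> bool {
        self.0 & 1 != 0
    }
}

#[derive(Clone, Copy)]
enum MmuVersion {
    V2,
    V3,
}

// The unified wrapper: one variant per MMU version.
#[derive(Clone, Copy)]
enum Pte {
    V2(PteV2),
    V3(PteV3),
}

impl Pte {
    // Constructors take the MMU version and pick the variant...
    fn new(version: MmuVersion, val: u64) -> Self {
        match version {
            MmuVersion::V2 => Self::V2(PteV2(val)),
            MmuVersion::V3 => Self::V3(PteV3(val)),
        }
    }

    // ...and accessors delegate to whichever variant is held.
    fn is_valid(self) -> bool {
        match self {
            Self::V2(p) => p.valid(),
            Self::V3(p) => p.valid(),
        }
    }
}

fn main() {
    assert!(Pte::new(MmuVersion::V2, 1).is_valid());
    assert!(!Pte::new(MmuVersion::V3, 0).is_valid());
    println!("version-agnostic dispatch ok");
}
```

An enum wrapper (rather than a trait object) keeps the entries `Copy` and
avoids any allocation, which matters for page table code.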

diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
index 3b1324add844..72bc7cda8df6 100644
--- a/drivers/gpu/nova-core/mm/pagetable/mod.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -10,6 +10,10 @@
 pub(crate) mod ver2;
 pub(crate) mod ver3;
 
+use super::{
+    Pfn,
+    VramAddress, //
+};
 use crate::gpu::Architecture;
 
 /// MMU version enumeration.
@@ -168,3 +172,193 @@ fn from(val: AperturePde) -> Self {
         val as u8
     }
 }
+
+/// Unified Page Table Entry wrapper for both MMU v2 and v3 `PTE`
+/// types, allowing the walker to work with either format.
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum Pte {
+    /// MMU v2 `PTE` (Turing/Ampere/Ada).
+    V2(ver2::Pte),
+    /// MMU v3 `PTE` (Hopper+).
+    V3(ver3::Pte),
+}
+
+impl Pte {
+    /// Create a `PTE` from a raw `u64` value for the given MMU version.
+    pub(crate) fn new(version: MmuVersion, val: u64) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::Pte::new(val)),
+            MmuVersion::V3 => Self::V3(ver3::Pte::new(val)),
+        }
+    }
+
+    /// Create an invalid `PTE` for the given MMU version.
+    pub(crate) fn invalid(version: MmuVersion) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::Pte::invalid()),
+            MmuVersion::V3 => Self::V3(ver3::Pte::invalid()),
+        }
+    }
+
+    /// Create a valid `PTE` for video memory.
+    pub(crate) fn new_vram(version: MmuVersion, pfn: Pfn, writable: bool) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::Pte::new_vram(pfn, writable)),
+            MmuVersion::V3 => Self::V3(ver3::Pte::new_vram(pfn, writable)),
+        }
+    }
+
+    /// Check if this `PTE` is valid.
+    pub(crate) fn is_valid(&self) -> bool {
+        match self {
+            Self::V2(p) => p.valid(),
+            Self::V3(p) => p.valid(),
+        }
+    }
+
+    /// Get the physical frame number.
+    pub(crate) fn frame_number(&self) -> Pfn {
+        match self {
+            Self::V2(p) => p.frame_number(),
+            Self::V3(p) => p.frame_number(),
+        }
+    }
+
+    /// Get the raw `u64` value.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        match self {
+            Self::V2(p) => p.raw_u64(),
+            Self::V3(p) => p.raw_u64(),
+        }
+    }
+}
+
+impl Default for Pte {
+    fn default() -> Self {
+        Self::V2(ver2::Pte::default())
+    }
+}
+
+/// Unified Page Directory Entry wrapper for both MMU v2 and v3 `PDE`.
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum Pde {
+    /// MMU v2 `PDE` (Turing/Ampere/Ada).
+    V2(ver2::Pde),
+    /// MMU v3 `PDE` (Hopper+).
+    V3(ver3::Pde),
+}
+
+impl Pde {
+    /// Create a `PDE` from a raw `u64` value for the given MMU version.
+    pub(crate) fn new(version: MmuVersion, val: u64) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::Pde::new(val)),
+            MmuVersion::V3 => Self::V3(ver3::Pde::new(val)),
+        }
+    }
+
+    /// Create a valid `PDE` pointing to a page table in video memory.
+    pub(crate) fn new_vram(version: MmuVersion, table_pfn: Pfn) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::Pde::new_vram(table_pfn)),
+            MmuVersion::V3 => Self::V3(ver3::Pde::new_vram(table_pfn)),
+        }
+    }
+
+    /// Create an invalid `PDE` for the given MMU version.
+    pub(crate) fn invalid(version: MmuVersion) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::Pde::invalid()),
+            MmuVersion::V3 => Self::V3(ver3::Pde::invalid()),
+        }
+    }
+
+    /// Check if this `PDE` is valid.
+    pub(crate) fn is_valid(&self) -> bool {
+        match self {
+            Self::V2(p) => p.is_valid(),
+            Self::V3(p) => p.is_valid(),
+        }
+    }
+
+    /// Get the VRAM address of the page table.
+    pub(crate) fn table_vram_address(&self) -> VramAddress {
+        match self {
+            Self::V2(p) => p.table_vram_address(),
+            Self::V3(p) => p.table_vram_address(),
+        }
+    }
+
+    /// Get the raw `u64` value.
+    pub(crate) fn raw_u64(&self) -> u64 {
+        match self {
+            Self::V2(p) => p.raw_u64(),
+            Self::V3(p) => p.raw_u64(),
+        }
+    }
+}
+
+impl Default for Pde {
+    fn default() -> Self {
+        Self::V2(ver2::Pde::default())
+    }
+}
+
+/// Unified Dual Page Directory Entry wrapper for both MMU v2 and v3 [`DualPde`].
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum DualPde {
+    /// MMU v2 [`DualPde`] (Turing/Ampere/Ada).
+    V2(ver2::DualPde),
+    /// MMU v3 [`DualPde`] (Hopper+).
+    V3(ver3::DualPde),
+}
+
+impl DualPde {
+    /// Create a [`DualPde`] from a raw 128-bit value (two `u64`s) for the given MMU version.
+    pub(crate) fn new(version: MmuVersion, big: u64, small: u64) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::DualPde::new(big, small)),
+            MmuVersion::V3 => Self::V3(ver3::DualPde::new(big, small)),
+        }
+    }
+
+    /// Create a [`DualPde`] with only the small page table pointer set.
+    pub(crate) fn new_small(version: MmuVersion, table_pfn: Pfn) -> Self {
+        match version {
+            MmuVersion::V2 => Self::V2(ver2::DualPde::new_small(table_pfn)),
+            MmuVersion::V3 => Self::V3(ver3::DualPde::new_small(table_pfn)),
+        }
+    }
+
+    /// Check if the small page table pointer is valid.
+    pub(crate) fn has_small(&self) -> bool {
+        match self {
+            Self::V2(d) => d.has_small(),
+            Self::V3(d) => d.has_small(),
+        }
+    }
+
+    /// Get the small page table VRAM address.
+    pub(crate) fn small_vram_address(&self) -> VramAddress {
+        match self {
+            Self::V2(d) => d.small.table_vram_address(),
+            Self::V3(d) => d.small.table_vram_address(),
+        }
+    }
+
+    /// Get the raw `u64` value of the big PDE.
+    pub(crate) fn big_raw_u64(&self) -> u64 {
+        match self {
+            Self::V2(d) => d.big.raw_u64(),
+            Self::V3(d) => d.big.raw_u64(),
+        }
+    }
+
+    /// Get the raw `u64` value of the small PDE.
+    pub(crate) fn small_raw_u64(&self) -> u64 {
+        match self {
+            Self::V2(d) => d.small.raw_u64(),
+            Self::V3(d) => d.small.raw_u64(),
+        }
+    }
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (12 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-21  9:59   ` Zhi Wang
  2026-01-20 20:42 ` [PATCH RFC v6 15/26] nova-core: mm: Add GpuMm centralized memory manager Joel Fernandes
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add TLB (Translation Lookaside Buffer) flush support for GPU MMU.

After modifying page table entries, the GPU's TLB must be invalidated
to ensure the new mappings take effect. The Tlb struct provides flush
functionality through BAR0 registers.

The flush operation writes the page directory base address and triggers
an invalidation, then polls for completion with a 2-second timeout,
matching the Nouveau driver.
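
The way the PDB address is split across the two flush registers can be
sketched with plain bit arithmetic. This is a standalone illustration:
the bit layout ([39:8] in the low register, [47:40] in the high one)
comes from the register definitions in this patch, while the function
name and the example address are purely hypothetical.

```rust
/// Split a page-directory-base address into the two TLB flush
/// register values, mirroring the [39:8] / [47:40] layout of
/// NV_TLB_FLUSH_PDB_LO / NV_TLB_FLUSH_PDB_HI.
fn pdb_register_values(addr: u64) -> (u32, u8) {
    let lo = ((addr >> 8) & 0xFFFF_FFFF) as u32; // address bits [39:8]
    let hi = ((addr >> 40) & 0xFF) as u8;        // address bits [47:40]
    (lo, hi)
}

fn main() {
    // An illustrative 48-bit PDB address with bits in both fields.
    let addr: u64 = 0xA112_3456_7800;
    let (lo, hi) = pdb_register_values(addr);
    assert_eq!(lo, 0x1234_5678);
    assert_eq!(hi, 0xA1);
    println!("pdb_lo = {lo:#x}, pdb_hi = {hi:#x}");
}
```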

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs |  1 +
 drivers/gpu/nova-core/mm/tlb.rs | 79 +++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs   | 33 ++++++++++++++
 3 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/tlb.rs

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index 6015fc8753bc..39635f2d0156 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -6,6 +6,7 @@
 
 pub(crate) mod pagetable;
 pub(crate) mod pramin;
+pub(crate) mod tlb;
 
 use kernel::sizes::SZ_4K;
 
diff --git a/drivers/gpu/nova-core/mm/tlb.rs b/drivers/gpu/nova-core/mm/tlb.rs
new file mode 100644
index 000000000000..8b2ee620da18
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/tlb.rs
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! TLB (Translation Lookaside Buffer) flush support for GPU MMU.
+//!
+//! After modifying page table entries, the GPU's TLB must be flushed to
+//! ensure the new mappings take effect. This module provides TLB flush
+//! functionality for virtual memory managers.
+//!
+//! # Example
+//!
+//! ```ignore
+//! use crate::mm::tlb::Tlb;
+//!
+//! fn page_table_update(tlb: &Tlb, pdb_addr: VramAddress) -> Result<()> {
+//!     // ... modify page tables ...
+//!
+//!     // Flush TLB to make changes visible (polls for completion).
+//!     tlb.flush(pdb_addr)?;
+//!
+//!     Ok(())
+//! }
+//! ```
+
+#![allow(dead_code)]
+
+use kernel::{
+    devres::Devres,
+    io::poll::read_poll_timeout,
+    prelude::*,
+    sync::Arc,
+    time::Delta, //
+};
+
+use crate::{
+    driver::Bar0,
+    mm::VramAddress,
+    regs, //
+};
+
+/// TLB manager for GPU translation buffer operations.
+pub(crate) struct Tlb {
+    bar: Arc<Devres<Bar0>>,
+}
+
+impl Tlb {
+    /// Create a new TLB manager.
+    pub(super) fn new(bar: Arc<Devres<Bar0>>) -> Self {
+        Self { bar }
+    }
+
+    /// Flush the GPU TLB for a specific page directory base.
+    ///
+    /// This invalidates all TLB entries associated with the given PDB address.
+    /// Must be called after modifying page table entries to ensure the GPU sees
+    /// the updated mappings.
+    pub(crate) fn flush(&self, pdb_addr: VramAddress) -> Result {
+        let bar = self.bar.try_access().ok_or(ENODEV)?;
+
+        // Write PDB address.
+        regs::NV_TLB_FLUSH_PDB_LO::from_pdb_addr(pdb_addr.raw_u64()).write(&*bar);
+        regs::NV_TLB_FLUSH_PDB_HI::from_pdb_addr(pdb_addr.raw_u64()).write(&*bar);
+
+        // Trigger flush: invalidate all pages and enable.
+        regs::NV_TLB_FLUSH_CTRL::default()
+            .set_page_all(true)
+            .set_enable(true)
+            .write(&*bar);
+
+        // Poll for completion - enable bit clears when flush is done.
+        read_poll_timeout(
+            || Ok(regs::NV_TLB_FLUSH_CTRL::read(&*bar)),
+            |ctrl| !ctrl.enable(),
+            Delta::ZERO,
+            Delta::from_secs(2),
+        )?;
+
+        Ok(())
+    }
+}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index c8b8fbdcf608..e722ef837e11 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -414,3 +414,36 @@ pub(crate) mod ga100 {
         0:0     display_disabled as bool;
     });
 }
+
+// MMU TLB
+
+register!(NV_TLB_FLUSH_PDB_LO @ 0x00b830a0, "TLB flush register: PDB address bits [39:8]" {
+    31:0    pdb_lo as u32, "PDB address bits [39:8]";
+});
+
+impl NV_TLB_FLUSH_PDB_LO {
+    /// Create a register value from a PDB address.
+    ///
+    /// Shifts the address right by 8 bits to extract address bits [39:8].
+    pub(crate) fn from_pdb_addr(addr: u64) -> Self {
+        Self::default().set_pdb_lo(((addr >> 8) & 0xFFFF_FFFF) as u32)
+    }
+}
+
+register!(NV_TLB_FLUSH_PDB_HI @ 0x00b830a4, "TLB flush register: PDB address bits [47:40]" {
+    7:0     pdb_hi as u8, "PDB address bits [47:40]";
+});
+
+impl NV_TLB_FLUSH_PDB_HI {
+    /// Create a register value from a PDB address.
+    ///
+    /// Shifts the address right by 40 bits to extract address bits [47:40].
+    pub(crate) fn from_pdb_addr(addr: u64) -> Self {
+        Self::default().set_pdb_hi(((addr >> 40) & 0xFF) as u8)
+    }
+}
+
+register!(NV_TLB_FLUSH_CTRL @ 0x00b830b0, "TLB flush control register" {
+    0:0     page_all as bool, "Invalidate all pages";
+    31:31   enable as bool, "Enable/trigger flush (clears when flush completes)";
+});
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v6 15/26] nova-core: mm: Add GpuMm centralized memory manager
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (13 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 16/26] nova-core: mm: Add page table walker for MMU v2 Joel Fernandes
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Introduce GpuMm as the centralized GPU memory manager that owns:
- Buddy allocator for VRAM allocation.
- PRAMIN window for direct VRAM access.
- TLB manager for translation buffer operations.

This provides a clean ownership model: GpuMm owns these components and
exposes accessor methods through which memory management operations are
performed.
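
The ownership model can be illustrated with a minimal standalone
sketch. The component and manager names below are simplified stand-ins,
not the driver's actual types; the point is that the owner hands out
the borrow flavor each component needs (shared for the allocator,
exclusive for the window, which mutates internal state on access).

```rust
struct Allocator { free_bytes: u64 }
struct Window { base: u64 }

/// Central owner of the components, exposing accessor methods.
struct Mm {
    alloc: Allocator,
    window: Window,
}

impl Mm {
    fn new(total: u64) -> Self {
        Self {
            alloc: Allocator { free_bytes: total },
            window: Window { base: 0 },
        }
    }
    /// Shared access: allocation queries are read-mostly here.
    fn alloc(&self) -> &Allocator { &self.alloc }
    /// Exclusive access: repositioning the window mutates it.
    fn window(&mut self) -> &mut Window { &mut self.window }
}

fn main() {
    let mut mm = Mm::new(1 << 20);
    mm.window().base = 0x1000;
    assert_eq!(mm.alloc().free_bytes, 1 << 20);
    assert_eq!(mm.window().base, 0x1000);
    println!("window base = {:#x}", mm.window().base);
}
```

The accessor split mirrors why `GpuMm::pramin()` takes `&mut self`
while `buddy()` and `tlb()` take `&self` in the patch below.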

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs    | 14 +++++++++
 drivers/gpu/nova-core/mm/mod.rs | 55 ++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 9b042ef1a308..572e6d4502bc 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -4,8 +4,10 @@
     device,
     devres::Devres,
     fmt,
+    gpu::buddy::GpuBuddyParams,
     pci,
     prelude::*,
+    sizes::{SZ_1M, SZ_4K},
     sync::Arc, //
 };
 
@@ -19,6 +21,7 @@
     fb::SysmemFlush,
     gfw,
     gsp::Gsp,
+    mm::GpuMm,
     regs,
 };
 
@@ -249,6 +252,8 @@ pub(crate) struct Gpu {
     gsp_falcon: Falcon<GspFalcon>,
     /// SEC2 falcon instance, used for GSP boot up and cleanup.
     sec2_falcon: Falcon<Sec2Falcon>,
+    /// GPU memory manager owning memory management resources.
+    mm: GpuMm,
     /// GSP runtime data. Temporarily an empty placeholder.
     #[pin]
     gsp: Gsp,
@@ -281,6 +286,15 @@ pub(crate) fn new<'a>(
 
             sec2_falcon: Falcon::new(pdev.as_ref(), spec.chipset)?,
 
+            // Create GPU memory manager owning memory management resources.
+            // This will be initialized with the usable VRAM region from GSP in a later
+            // patch. For now, we use a placeholder of 1MB.
+            mm: GpuMm::new(devres_bar.clone(), GpuBuddyParams {
+                base_offset_bytes: 0,
+                physical_memory_size_bytes: SZ_1M as u64,
+                chunk_size_bytes: SZ_4K as u64,
+            })?,
+
             gsp <- Gsp::new(pdev),
 
             _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index 39635f2d0156..56c72bf51431 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -8,7 +8,60 @@
 pub(crate) mod pramin;
 pub(crate) mod tlb;
 
-use kernel::sizes::SZ_4K;
+use kernel::{
+    devres::Devres,
+    gpu::buddy::{
+        GpuBuddy,
+        GpuBuddyParams, //
+    },
+    prelude::*,
+    sizes::SZ_4K,
+    sync::Arc, //
+};
+
+use crate::driver::Bar0;
+
+pub(crate) use tlb::Tlb;
+
+/// GPU Memory Manager - owns all core MM components.
+///
+/// Provides centralized ownership of memory management resources:
+/// - [`GpuBuddy`] allocator for VRAM page table allocation.
+/// - [`pramin::Window`] for direct VRAM access.
+/// - [`Tlb`] manager for translation buffer flush operations.
+///
+/// No pinning is required; all fields manage their own pinning internally.
+pub(crate) struct GpuMm {
+    buddy: GpuBuddy,
+    pramin: pramin::Window,
+    tlb: Tlb,
+}
+
+impl GpuMm {
+    /// Create a new `GpuMm` object.
+    pub(crate) fn new(bar: Arc<Devres<Bar0>>, buddy_params: GpuBuddyParams) -> Result<Self> {
+        Ok(Self {
+            buddy: GpuBuddy::new(buddy_params)?,
+            pramin: pramin::Window::new(bar.clone())?,
+            tlb: Tlb::new(bar),
+        })
+    }
+
+    /// Access the [`GpuBuddy`] allocator.
+    pub(crate) fn buddy(&self) -> &GpuBuddy {
+        &self.buddy
+    }
+
+    /// Access the [`pramin::Window`].
+    pub(crate) fn pramin(&mut self) -> &mut pramin::Window {
+        &mut self.pramin
+    }
+
+    /// Access the [`Tlb`] manager.
+    pub(crate) fn tlb(&self) -> &Tlb {
+        &self.tlb
+    }
+}
 
 /// Page size in bytes (4 KiB).
 pub(crate) const PAGE_SIZE: usize = SZ_4K;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v6 16/26] nova-core: mm: Add page table walker for MMU v2
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (14 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 15/26] nova-core: mm: Add GpuMm centralized memory manager Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 17/26] nova-core: mm: Add Virtual Memory Manager Joel Fernandes
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add the page table walker implementation that traverses the 5-level
page table hierarchy (PDB -> L1 -> L2 -> L3 -> L4) to resolve virtual
addresses to physical addresses or find PTE locations.

The walker provides:
- walk_to_pte_lookup(): Walk existing page tables (no allocation).
- Helper functions for reading and writing PDEs and PTEs via PRAMIN.

It uses the GpuMm API for centralized access to the PRAMIN window.
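
The per-level indexing a walk like this relies on can be sketched in
isolation. Note the assumption: the index widths below (2/9/9/8/9 bits
from PDB down to the PTE level, with a 12-bit page offset) follow the
usual 49-bit VA layout of the v2-era MMU; the driver's own
VirtualAddress::level_index() is authoritative, and this sketch is only
illustrative.

```rust
/// Per-level index widths, from PDB (level 0) down to the PTE level
/// (level 4), for an assumed 5-level layout with 4 KiB pages.
const LEVEL_BITS: [u32; 5] = [2, 9, 9, 8, 9];

/// Extract the page-table index used at each level from a 49-bit
/// virtual address; bits [11:0] are the in-page offset.
fn level_indices(va: u64) -> [u64; 5] {
    let mut shift: u32 = 12 + LEVEL_BITS.iter().sum::<u32>(); // 49
    let mut idx = [0u64; 5];
    for (i, &bits) in LEVEL_BITS.iter().enumerate() {
        shift -= bits;
        idx[i] = (va >> shift) & ((1u64 << bits) - 1);
    }
    idx
}

fn main() {
    // Virtual frame 1 (va = 0x1000) indexes only the PTE level.
    assert_eq!(level_indices(0x1000), [0, 0, 0, 0, 1]);
    // Setting bit 47 selects entry 1 of the PDB.
    assert_eq!(level_indices(1u64 << 47), [1, 0, 0, 0, 0]);
    println!("{:?}", level_indices(0x1000));
}
```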

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/pagetable/mod.rs  |  13 +
 drivers/gpu/nova-core/mm/pagetable/walk.rs | 285 +++++++++++++++++++++
 2 files changed, 298 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/pagetable/walk.rs

diff --git a/drivers/gpu/nova-core/mm/pagetable/mod.rs b/drivers/gpu/nova-core/mm/pagetable/mod.rs
index 72bc7cda8df6..4c77d4953fbd 100644
--- a/drivers/gpu/nova-core/mm/pagetable/mod.rs
+++ b/drivers/gpu/nova-core/mm/pagetable/mod.rs
@@ -9,12 +9,25 @@
 #![expect(dead_code)]
 pub(crate) mod ver2;
 pub(crate) mod ver3;
+pub(crate) mod walk;
 
 use super::{
+    GpuMm,
     Pfn,
     VramAddress, //
 };
 use crate::gpu::Architecture;
+use kernel::prelude::*;
+
+/// Trait for allocating page tables during page table walks.
+///
+/// Implementors must allocate a zeroed 4KB page table in VRAM and
+/// ensure the allocation persists for the lifetime of the address
+/// space and the lifetime of the implementor.
+pub(crate) trait PageTableAllocator {
+    /// Allocate a zeroed page table and return its VRAM address.
+    fn alloc_page_table(&mut self, mm: &mut GpuMm) -> Result<VramAddress>;
+}
 
 /// MMU version enumeration.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
diff --git a/drivers/gpu/nova-core/mm/pagetable/walk.rs b/drivers/gpu/nova-core/mm/pagetable/walk.rs
new file mode 100644
index 000000000000..7a2660a30d80
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pagetable/walk.rs
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Page table walker implementation for NVIDIA GPUs.
+//!
+//! This module provides page table walking functionality for MMU v2 (Turing/Ampere/Ada).
+//! The walker traverses the 5-level page table hierarchy (PDB -> L1 -> L2 -> L3 -> L4)
+//! to resolve virtual addresses to physical addresses or to find PTE locations.
+//!
+//! # Page Table Hierarchy
+//!
+//! ```text
+//!     +-------+     +-------+     +-------+     +---------+     +-------+
+//!     | PDB   |---->|  L1   |---->|  L2   |---->| L3 Dual |---->|  L4   |
+//!     | (L0)  |     |       |     |       |     | PDE     |     | (PTE) |
+//!     +-------+     +-------+     +-------+     +---------+     +-------+
+//!       64-bit        64-bit        64-bit        128-bit         64-bit
+//!        PDE           PDE           PDE        (big+small)        PTE
+//! ```
+//!
+//! # Result of a page table walk
+//!
+//! The walker returns a [`WalkResult`] indicating the outcome:
+//! - [`WalkResult::PageTableMissing`]: Intermediate page tables don't exist (lookup mode).
+//! - [`WalkResult::Unmapped`]: PTE exists but is invalid (page not mapped).
+//! - [`WalkResult::Mapped`]: PTE exists and is valid (page is mapped).
+//!
+//! # Example
+//!
+//! ```ignore
+//! use crate::mm::pagetable::walk::{PtWalk, WalkResult};
+//! use crate::mm::GpuMm;
+//!
+//! fn walk_example(mm: &mut GpuMm, pdb_addr: VramAddress) -> Result<()> {
+//!     // Create a page table walker.
+//!     let walker = PtWalk::new(pdb_addr, MmuVersion::V2);
+//!
+//!     // Walk to a PTE (lookup mode).
+//!     match walker.walk_to_pte_lookup(mm, Vfn::new(0x1000))? {
+//!         WalkResult::Mapped { pte_addr, pfn } => {
+//!             // Page is mapped to the physical frame number.
+//!         }
+//!         WalkResult::Unmapped { pte_addr } => {
+//!             // PTE exists but the page is not mapped.
+//!         }
+//!         WalkResult::PageTableMissing => {
+//!             // Intermediate page tables are missing.
+//!         }
+//!     }
+//!
+//!     Ok(())
+//! }
+//! ```
+
+#![allow(dead_code)]
+
+use kernel::prelude::*;
+
+use super::{
+    DualPde,
+    MmuVersion,
+    PageTableAllocator,
+    PageTableLevel,
+    Pde,
+    Pte, //
+};
+use crate::mm::{
+    pramin,
+    GpuMm,
+    Pfn,
+    Vfn,
+    VirtualAddress,
+    VramAddress, //
+};
+
+/// Dummy allocator for lookup-only walks.
+enum NoAlloc {}
+
+impl PageTableAllocator for NoAlloc {
+    fn alloc_page_table(&mut self, _mm: &mut GpuMm) -> Result<VramAddress> {
+        unreachable!()
+    }
+}
+
+/// Result of walking to a PTE.
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum WalkResult {
+    /// Intermediate page tables are missing (only returned in lookup mode).
+    PageTableMissing,
+    /// PTE exists but is invalid (page not mapped).
+    Unmapped { pte_addr: VramAddress },
+    /// PTE exists and is valid (page is mapped).
+    Mapped { pte_addr: VramAddress, pfn: Pfn },
+}
+
+/// Page table walker for NVIDIA GPUs.
+///
+/// Walks the 5-level page table hierarchy to find PTE locations or resolve
+/// virtual addresses.
+pub(crate) struct PtWalk {
+    pdb_addr: VramAddress,
+    mmu_version: MmuVersion,
+}
+
+impl PtWalk {
+    /// Create a new page table walker.
+    ///
+    /// Copies `pdb_addr` and `mmu_version` from VMM configuration.
+    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Self {
+        Self {
+            pdb_addr,
+            mmu_version,
+        }
+    }
+
+    /// Get the MMU version this walker is configured for.
+    pub(crate) fn mmu_version(&self) -> MmuVersion {
+        self.mmu_version
+    }
+
+    /// Get the Page Directory Base address.
+    pub(crate) fn pdb_addr(&self) -> VramAddress {
+        self.pdb_addr
+    }
+
+    /// Walk to PTE for lookup only (no allocation).
+    ///
+    /// Returns `PageTableMissing` if intermediate tables don't exist.
+    pub(crate) fn walk_to_pte_lookup(&self, mm: &mut GpuMm, vfn: Vfn) -> Result<WalkResult> {
+        self.walk_to_pte_inner::<NoAlloc>(mm, None, vfn)
+    }
+
+    /// Walk to PTE with allocation of missing tables.
+    ///
+    /// Uses `PageTableAllocator::alloc_page_table()` when tables are missing.
+    pub(crate) fn walk_to_pte_allocate<A: PageTableAllocator>(
+        &self,
+        mm: &mut GpuMm,
+        allocator: &mut A,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        self.walk_to_pte_inner(mm, Some(allocator), vfn)
+    }
+
+    /// Internal walk implementation.
+    ///
+    /// If `allocator` is `Some`, allocates missing page tables. Otherwise returns
+    /// `PageTableMissing` when intermediate tables don't exist.
+    fn walk_to_pte_inner<A: PageTableAllocator>(
+        &self,
+        mm: &mut GpuMm,
+        mut allocator: Option<&mut A>,
+        vfn: Vfn,
+    ) -> Result<WalkResult> {
+        let va = VirtualAddress::from(vfn);
+        let mut cur_table = self.pdb_addr;
+
+        // Walk through PDE levels (PDB -> L1 -> L2 -> L3).
+        for level in PageTableLevel::pde_levels() {
+            let idx = va.level_index(level.as_index());
+
+            if level.is_dual_pde_level() {
+                // L3: 128-bit dual PDE. This is the final PDE level before PTEs and uses
+                // a special "dual" format that can point to both a Small Page Table (SPT)
+                // for 4KB pages and a Large Page Table (LPT) for 64KB pages, or encode a
+                // 2MB huge page directly via IS_PTE bit.
+                let dpde_addr = entry_addr(cur_table, level, idx);
+                let dual_pde = read_dual_pde(mm.pramin(), dpde_addr, self.mmu_version)?;
+
+                // Check if SPT (Small Page Table) pointer is present. We use the "small"
+                // path for 4KB pages (only page size currently supported). If missing and
+                // allocator is available, create a new page table; otherwise return
+                // `PageTableMissing` for lookup-only walks.
+                if !dual_pde.has_small() {
+                    if let Some(ref mut a) = allocator {
+                        let new_table = a.alloc_page_table(mm)?;
+                        let new_dual_pde =
+                            DualPde::new_small(self.mmu_version, Pfn::from(new_table));
+                        write_dual_pde(mm.pramin(), dpde_addr, &new_dual_pde)?;
+                        cur_table = new_table;
+                    } else {
+                        return Ok(WalkResult::PageTableMissing);
+                    }
+                } else {
+                    cur_table = dual_pde.small_vram_address();
+                }
+            } else {
+                // Regular 64-bit PDE (levels PDB, L1, L2). Each entry points to the next
+                // level page table.
+                let pde_addr = entry_addr(cur_table, level, idx);
+                let pde = read_pde(mm.pramin(), pde_addr, self.mmu_version)?;
+
+                // Allocate new page table if PDE is invalid and allocator provided,
+                // otherwise return PageTableMissing for lookup-only walks.
+                if !pde.is_valid() {
+                    if let Some(ref mut a) = allocator {
+                        let new_table = a.alloc_page_table(mm)?;
+                        let new_pde = Pde::new_vram(self.mmu_version, Pfn::from(new_table));
+                        write_pde(mm.pramin(), pde_addr, new_pde)?;
+                        cur_table = new_table;
+                    } else {
+                        return Ok(WalkResult::PageTableMissing);
+                    }
+                } else {
+                    cur_table = pde.table_vram_address();
+                }
+            }
+        }
+
+        // Now at L4 (PTE level).
+        let pte_idx = va.level_index(PageTableLevel::L4.as_index());
+        let pte_addr = entry_addr(cur_table, PageTableLevel::L4, pte_idx);
+
+        // Read PTE to check if mapped.
+        let pte = read_pte(mm.pramin(), pte_addr, self.mmu_version)?;
+        if pte.is_valid() {
+            Ok(WalkResult::Mapped {
+                pte_addr,
+                pfn: pte.frame_number(),
+            })
+        } else {
+            Ok(WalkResult::Unmapped { pte_addr })
+        }
+    }
+}
+
+// ====================================
+// Helper functions for accessing VRAM
+// ====================================
+
+/// Calculate the address of an entry within a page table.
+fn entry_addr(table: VramAddress, level: PageTableLevel, index: u64) -> VramAddress {
+    let entry_size = level.entry_size() as u64;
+    VramAddress::new(table.raw() as u64 + index * entry_size)
+}
+
+/// Read a PDE from VRAM.
+pub(crate) fn read_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<Pde> {
+    let val = pramin.try_read64(addr.raw())?;
+    Ok(Pde::new(mmu_version, val))
+}
+
+/// Write a PDE to VRAM.
+pub(crate) fn write_pde(pramin: &mut pramin::Window, addr: VramAddress, pde: Pde) -> Result {
+    pramin.try_write64(addr.raw(), pde.raw_u64())
+}
+
+/// Read a dual PDE (128-bit) from VRAM.
+pub(crate) fn read_dual_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<DualPde> {
+    let lo = pramin.try_read64(addr.raw())?;
+    let hi = pramin.try_read64(addr.raw() + 8)?;
+    Ok(DualPde::new(mmu_version, lo, hi))
+}
+
+/// Write a dual PDE (128-bit) to VRAM.
+pub(crate) fn write_dual_pde(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    dual_pde: &DualPde,
+) -> Result {
+    pramin.try_write64(addr.raw(), dual_pde.big_raw_u64())?;
+    pramin.try_write64(addr.raw() + 8, dual_pde.small_raw_u64())
+}
+
+/// Read a PTE from VRAM.
+pub(crate) fn read_pte(
+    pramin: &mut pramin::Window,
+    addr: VramAddress,
+    mmu_version: MmuVersion,
+) -> Result<Pte> {
+    let val = pramin.try_read64(addr.raw())?;
+    Ok(Pte::new(mmu_version, val))
+}
+
+/// Write a PTE to VRAM.
+pub(crate) fn write_pte(pramin: &mut pramin::Window, addr: VramAddress, pte: Pte) -> Result {
+    pramin.try_write64(addr.raw(), pte.raw_u64())
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v6 17/26] nova-core: mm: Add Virtual Memory Manager
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (15 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 16/26] nova-core: mm: Add page table walker for MMU v2 Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 18/26] nova-core: mm: Add virtual address range tracking to VMM Joel Fernandes
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add the Virtual Memory Manager (VMM) for GPU address space management.
The VMM provides high-level page mapping, unmapping, and lookup
operations for BAR1 address spaces, along with the page table
allocations backing them.

It uses GpuMm for access to the buddy allocator, PRAMIN, and TLB, and
extends the page table walker with walk_to_pte_allocate() for on-demand
page table creation.
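
On-demand creation hinges on an optional allocator: the lookup-only
path can use an uninhabited type as the allocator type parameter, so no
dummy allocator value ever exists at runtime (this is what the walker's
NoAlloc enum does). A minimal standalone sketch of that pattern, with
purely illustrative names:

```rust
trait TableAlloc {
    fn alloc_table(&mut self) -> u64;
}

/// Uninhabited type: it can never be constructed, so a lookup-only
/// walk is statically guaranteed never to allocate.
enum NoAlloc {}

impl TableAlloc for NoAlloc {
    fn alloc_table(&mut self) -> u64 {
        // `*self` has no possible value, so the match is empty.
        match *self {}
    }
}

/// One "level" of a walk: use the existing table, allocate a missing
/// one if an allocator is provided, or report it as missing.
fn next_table<A: TableAlloc>(existing: Option<u64>, alloc: Option<&mut A>) -> Option<u64> {
    match (existing, alloc) {
        (Some(t), _) => Some(t),
        (None, Some(a)) => Some(a.alloc_table()),
        (None, None) => None,
    }
}

/// Trivial bump allocator standing in for the buddy-backed one.
struct Bump(u64);
impl TableAlloc for Bump {
    fn alloc_table(&mut self) -> u64 {
        self.0 += 0x1000;
        self.0
    }
}

fn main() {
    // Lookup-only: NoAlloc as the type parameter, no allocator value.
    assert_eq!(next_table::<NoAlloc>(None, None), None);
    assert_eq!(next_table::<NoAlloc>(Some(0x4000), None), Some(0x4000));

    // Allocating walk: a missing table gets created.
    let mut bump = Bump(0);
    assert_eq!(next_table(None, Some(&mut bump)), Some(0x1000));
    println!("ok");
}
```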

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs |   1 +
 drivers/gpu/nova-core/mm/vmm.rs | 204 ++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+)
 create mode 100644 drivers/gpu/nova-core/mm/vmm.rs

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index 56c72bf51431..53d726eb7296 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -7,6 +7,7 @@
 pub(crate) mod pagetable;
 pub(crate) mod pramin;
 pub(crate) mod tlb;
+pub(crate) mod vmm;
 
 use kernel::{
     devres::Devres,
diff --git a/drivers/gpu/nova-core/mm/vmm.rs b/drivers/gpu/nova-core/mm/vmm.rs
new file mode 100644
index 000000000000..a5b4af9053a0
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/vmm.rs
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Virtual Memory Manager for NVIDIA GPU page table management.
+//!
+//! The [`Vmm`] provides high-level page mapping and unmapping operations for GPU
+//! virtual address spaces (Channels, BAR1, BAR2). It wraps the page table walker
+//! and handles TLB flushing after modifications.
+//!
+//! # Example
+//!
+//! ```ignore
+//! use crate::mm::vmm::Vmm;
+//! use crate::mm::{GpuMm, Pfn, Vfn, VramAddress};
+//! use crate::mm::pagetable::MmuVersion;
+//! use kernel::prelude::*;
+//!
+//! fn map_example(mm: &mut GpuMm, pdb_addr: VramAddress) -> Result<()> {
+//!     let mut vmm = Vmm::new(pdb_addr, MmuVersion::V2)?;
+//!
+//!     // Map virtual frame 0x100 to physical frame 0x200.
+//!     let vfn = Vfn::new(0x100);
+//!     let pfn = Pfn::new(0x200);
+//!     vmm.map_page(mm, vfn, pfn, true /* writable */)?;
+//!
+//!     Ok(())
+//! }
+//! ```
+
+#![allow(dead_code)]
+
+use kernel::{
+    gpu::buddy::{
+        AllocatedBlocks,
+        BuddyFlags,
+        GpuBuddyAllocParams, //
+    },
+    prelude::*,
+    sizes::SZ_4K,
+    sync::Arc, //
+};
+
+use crate::mm::{
+    pagetable::{
+        walk::{
+            write_pte,
+            PtWalk,
+            WalkResult, //
+        },
+        MmuVersion,
+        PageTableAllocator,
+        Pte, //
+    },
+    GpuMm,
+    Pfn,
+    Vfn,
+    VramAddress,
+    PAGE_SIZE, //
+};
+
+/// Virtual Memory Manager for a GPU address space.
+///
+/// Each [`Vmm`] instance manages a single address space identified by its Page
+/// Directory Base (`PDB`) address. The [`Vmm`] is used for BAR1 and BAR2 mappings.
+///
+/// The [`Vmm`] tracks all page table allocations made during mapping operations
+/// to ensure they remain valid for the lifetime of the address space.
+pub(crate) struct Vmm {
+    pdb_addr: VramAddress,
+    mmu_version: MmuVersion,
+    /// Page table allocations that must persist for the lifetime of mappings.
+    page_table_allocs: KVec<Arc<AllocatedBlocks>>,
+}
+
+impl Vmm {
+    /// Create a new [`Vmm`] for the given Page Directory Base address.
+    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Result<Self> {
+        // Only MMU v2 is supported for now.
+        if mmu_version != MmuVersion::V2 {
+            return Err(ENOTSUPP);
+        }
+
+        Ok(Self {
+            pdb_addr,
+            mmu_version,
+            page_table_allocs: KVec::new(),
+        })
+    }
+
+    /// Get the Page Directory Base address.
+    pub(crate) fn pdb_addr(&self) -> VramAddress {
+        self.pdb_addr
+    }
+
+    /// Get the MMU version.
+    pub(crate) fn mmu_version(&self) -> MmuVersion {
+        self.mmu_version
+    }
+
+    /// Allocate a new page table, zero it, and track the allocation.
+    ///
+    /// This method ensures page table allocations persist for the lifetime of
+    /// the [`Vmm`].
+    pub(crate) fn alloc_page_table(&mut self, mm: &mut GpuMm) -> Result<VramAddress> {
+        let params = GpuBuddyAllocParams {
+            start_range_address: 0,
+            end_range_address: 0,
+            size_bytes: SZ_4K as u64,
+            min_block_size_bytes: SZ_4K as u64,
+            buddy_flags: BuddyFlags::try_new(0)?,
+        };
+
+        // Use buddy first, then pramin (sequential to avoid overlapping borrows).
+        let blocks = mm.buddy().alloc_blocks(params)?;
+        let offset = blocks.iter().next().ok_or(ENOMEM)?.offset();
+        let addr = VramAddress::new(offset);
+
+        // Zero the page table using pramin.
+        let base = addr.raw();
+        for offset in (0..PAGE_SIZE).step_by(8) {
+            mm.pramin().try_write64(base + offset, 0)?;
+        }
+
+        // Track the page table allocation.
+        self.page_table_allocs.push(blocks, GFP_KERNEL)?;
+
+        Ok(addr)
+    }
+
+    /// Map a 4KB page with on-demand page table allocation.
+    ///
+    /// Walks the page table hierarchy and allocates any missing intermediate
+    /// tables using the buddy allocator from [`GpuMm`].
+    pub(crate) fn map_page(
+        &mut self,
+        mm: &mut GpuMm,
+        vfn: Vfn,
+        pfn: Pfn,
+        writable: bool,
+    ) -> Result {
+        // Create page table walker.
+        let walker = PtWalk::new(self.pdb_addr, self.mmu_version);
+
+        // Walk to PTE address, allocating tables as needed.
+        let pte_addr = match walker.walk_to_pte_allocate(mm, self, vfn)? {
+            WalkResult::Unmapped { pte_addr } | WalkResult::Mapped { pte_addr, .. } => pte_addr,
+            WalkResult::PageTableMissing => {
+                // Should not happen with allocate mode.
+                return Err(EINVAL);
+            }
+        };
+
+        // Create and write PTE.
+        let pte = Pte::new_vram(self.mmu_version, pfn, writable);
+        write_pte(mm.pramin(), pte_addr, pte)?;
+
+        // Flush the TLB.
+        mm.tlb().flush(self.pdb_addr)?;
+
+        Ok(())
+    }
+
+    /// Unmap a 4KB page.
+    ///
+    /// Invalidates the [`Pte`] at the given virtual frame number. Does nothing if
+    /// the page is not currently mapped.
+    pub(crate) fn unmap_page(&self, mm: &mut GpuMm, vfn: Vfn) -> Result {
+        // Create page table walker.
+        let walker = PtWalk::new(self.pdb_addr, self.mmu_version);
+
+        // Walk to PTE address.
+        let pte_addr = match walker.walk_to_pte_lookup(mm, vfn)? {
+            WalkResult::Unmapped { pte_addr } | WalkResult::Mapped { pte_addr, .. } => pte_addr,
+            WalkResult::PageTableMissing => return Ok(()), // Nothing to unmap.
+        };
+
+        // Invalidate PTE.
+        let invalid_pte = Pte::invalid(self.mmu_version);
+        write_pte(mm.pramin(), pte_addr, invalid_pte)?;
+
+        // Flush the TLB.
+        mm.tlb().flush(self.pdb_addr)?;
+
+        Ok(())
+    }
+
+    /// Read the [`Pfn`] for a mapped virtual frame number.
+    ///
+    /// Returns `Some(pfn)` if the [`Vfn`] is mapped, `None` otherwise.
+    pub(crate) fn read_mapping(&self, mm: &mut GpuMm, vfn: Vfn) -> Result<Option<Pfn>> {
+        // Create page table walker.
+        let walker = PtWalk::new(self.pdb_addr, self.mmu_version);
+
+        match walker.walk_to_pte_lookup(mm, vfn)? {
+            WalkResult::Mapped { pfn, .. } => Ok(Some(pfn)),
+            WalkResult::Unmapped { .. } | WalkResult::PageTableMissing => Ok(None),
+        }
+    }
+}
+
+impl PageTableAllocator for Vmm {
+    fn alloc_page_table(&mut self, mm: &mut GpuMm) -> Result<VramAddress> {
+        Vmm::alloc_page_table(self, mm)
+    }
+}
-- 
2.34.1



* [PATCH RFC v6 18/26] nova-core: mm: Add virtual address range tracking to VMM
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (16 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 17/26] nova-core: mm: Add Virtual Memory Manager Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 19/26] nova-core: mm: Add BAR1 user interface Joel Fernandes
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Extend the Virtual Memory Manager with virtual address range tracking
using a buddy allocator. This enables BarUser to allocate contiguous
virtual ranges for BAR1 mappings.
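
The sizing a buddy allocator applies to a contiguous range request can
be sketched as follows. The 4 KiB chunk size matches the patch, but
order_for() and vfn_from_offset() are invented helpers for this sketch,
not part of the GPU buddy API:

```rust
// Sketch of the sizing behind alloc_vfn_range(): a request is rounded
// up to a power-of-two block of chunks, and the returned byte offset
// converts to a starting virtual frame number.
const PAGE_SIZE: u64 = 4096;

fn order_for(num_pages: u64) -> u32 {
    // Smallest order such that a 2^order-page block covers the request.
    num_pages.next_power_of_two().trailing_zeros()
}

fn vfn_from_offset(offset_bytes: u64) -> u64 {
    // Mirrors `Vfn::new(offset / PAGE_SIZE as u64)` in the patch.
    offset_bytes / PAGE_SIZE
}

fn main() {
    assert_eq!(order_for(1), 0);  // one page fits an order-0 block
    assert_eq!(order_for(3), 2);  // 3 pages round up to a 4-page block
    assert_eq!(order_for(16), 4);
    // A contiguous allocation starting at byte offset 0x8000 is VFN 8.
    assert_eq!(vfn_from_offset(0x8000), 8);
    println!("ok");
}
```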

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/vmm.rs | 49 +++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/nova-core/mm/vmm.rs b/drivers/gpu/nova-core/mm/vmm.rs
index a5b4af9053a0..0ab80b84e55a 100644
--- a/drivers/gpu/nova-core/mm/vmm.rs
+++ b/drivers/gpu/nova-core/mm/vmm.rs
@@ -32,7 +32,9 @@
     gpu::buddy::{
         AllocatedBlocks,
         BuddyFlags,
-        GpuBuddyAllocParams, //
+        GpuBuddy,
+        GpuBuddyAllocParams,
+        GpuBuddyParams, //
     },
     prelude::*,
     sizes::SZ_4K,
@@ -60,29 +62,48 @@
 /// Virtual Memory Manager for a GPU address space.
 ///
 /// Each [`Vmm`] instance manages a single address space identified by its Page
-/// Directory Base (`PDB`) address. The [`Vmm`] is used for BAR1 and BAR2 mappings.
+/// Directory Base (`PDB`) address. The [`Vmm`] is used for Channel, BAR1 and BAR2 mappings.
 ///
 /// The [`Vmm`] tracks all page table allocations made during mapping operations
 /// to ensure they remain valid for the lifetime of the address space.
+///
+/// It tracks virtual address allocations via a buddy allocator.
 pub(crate) struct Vmm {
     pdb_addr: VramAddress,
     mmu_version: MmuVersion,
     /// Page table allocations that must persist for the lifetime of mappings.
     page_table_allocs: KVec<Arc<AllocatedBlocks>>,
+    /// Buddy allocator for virtual address range tracking.
+    virt_buddy: GpuBuddy,
 }
 
 impl Vmm {
     /// Create a new [`Vmm`] for the given Page Directory Base address.
-    pub(crate) fn new(pdb_addr: VramAddress, mmu_version: MmuVersion) -> Result<Self> {
+    ///
+    /// The [`Vmm`] will manage a virtual address space of `va_size` bytes using
+    /// a buddy allocator. This enables [`Vmm::alloc_vfn_range()`] for allocating
+    /// contiguous virtual ranges.
+    pub(crate) fn new(
+        pdb_addr: VramAddress,
+        mmu_version: MmuVersion,
+        va_size: u64,
+    ) -> Result<Self> {
         // Only MMU v2 is supported for now.
         if mmu_version != MmuVersion::V2 {
             return Err(ENOTSUPP);
         }
 
+        let virt_buddy = GpuBuddy::new(GpuBuddyParams {
+            base_offset_bytes: 0,
+            physical_memory_size_bytes: va_size,
+            chunk_size_bytes: SZ_4K as u64,
+        })?;
+
         Ok(Self {
             pdb_addr,
             mmu_version,
             page_table_allocs: KVec::new(),
+            virt_buddy,
         })
     }
 
@@ -96,6 +117,28 @@ pub(crate) fn mmu_version(&self) -> MmuVersion {
         self.mmu_version
     }
 
+    /// Allocate a contiguous virtual frame number range.
+    ///
+    /// Returns an [`Arc<AllocatedBlocks>`] representing the allocated range.
+    /// The allocation is automatically freed when the [`Arc`] is dropped.
+    pub(crate) fn alloc_vfn_range(&self, num_pages: usize) -> Result<(Vfn, Arc<AllocatedBlocks>)> {
+        let params = GpuBuddyAllocParams {
+            start_range_address: 0,
+            end_range_address: 0,
+            size_bytes: num_pages.checked_mul(PAGE_SIZE).ok_or(EOVERFLOW)? as u64,
+            min_block_size_bytes: SZ_4K as u64,
+            buddy_flags: BuddyFlags::try_new(BuddyFlags::CONTIGUOUS_ALLOCATION)?,
+        };
+
+        let alloc = self.virt_buddy.alloc_blocks(params)?;
+
+        // Get the starting offset from the first (and only, due to CONTIGUOUS) block.
+        let offset = alloc.iter().next().ok_or(ENOMEM)?.offset();
+        let vfn = Vfn::new(offset / PAGE_SIZE as u64);
+
+        Ok((vfn, alloc))
+    }
+
     /// Allocate a new page table, zero it, and track the allocation.
     ///
     /// This method ensures page table allocations persist for the lifetime of
-- 
2.34.1



* [PATCH RFC v6 19/26] nova-core: mm: Add BAR1 user interface
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (17 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 18/26] nova-core: mm: Add virtual address range tracking to VMM Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 20/26] nova-core: gsp: Return GspStaticInfo and FbLayout from boot() Joel Fernandes
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add the BAR1 user interface for CPU access to GPU video memory through
the BAR1 aperture.
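
The offset translation such an interface performs can be sketched in
plain Rust. Names mirror the patch's BarAccess::bar_offset(); errors
are simplified to Option instead of the kernel's Result:

```rust
// Sketch of BarAccess::bar_offset(): an offset within the mapped
// region translates to a BAR1 aperture offset starting at
// vfn_start * PAGE_SIZE, after a bounds check against the mapping size.
const PAGE_SIZE: usize = 4096;

fn bar_offset(vfn_start: u64, num_pages: usize, offset: usize) -> Option<usize> {
    if offset >= num_pages * PAGE_SIZE {
        return None; // outside the mapped range, EINVAL in the driver
    }
    Some(vfn_start as usize * PAGE_SIZE + offset)
}

fn main() {
    // A 2-page mapping starting at VFN 8 covers aperture 0x8000..0xA000.
    assert_eq!(bar_offset(8, 2, 0), Some(0x8000));
    assert_eq!(bar_offset(8, 2, 0x1000), Some(0x9000));
    assert_eq!(bar_offset(8, 2, 0x2000), None);
    println!("ok");
}
```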

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs      |   1 -
 drivers/gpu/nova-core/mm/bar_user.rs | 195 +++++++++++++++++++++++++++
 drivers/gpu/nova-core/mm/mod.rs      |   1 +
 3 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/nova-core/mm/bar_user.rs

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index f30ffa45cf13..d8b2e967ba4c 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -42,7 +42,6 @@ pub(crate) struct NovaCore {
 const GPU_DMA_BITS: u32 = 47;
 
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
-#[expect(dead_code)]
 pub(crate) type Bar1 = pci::Bar<BAR1_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/mm/bar_user.rs b/drivers/gpu/nova-core/mm/bar_user.rs
new file mode 100644
index 000000000000..288dec0ae920
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/bar_user.rs
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! BAR1 user interface for CPU access to GPU virtual memory.
+//!
+//! BAR1 provides a PCIe aperture for CPU access to GPU video memory through
+//! the GPU's MMU. The [`BarUser`] struct owns a VMM and provides BAR1-specific
+//! mapping operations with automatic cleanup.
+//!
+//! [`BarUser::map()`] returns a [`BarAccess`] object that provides read/write
+//! accessors to the mapped region. When [`BarAccess`] is dropped, the pages
+//! are automatically unmapped and the virtual range is freed.
+//!
+//! Some uses of BAR1 are:
+//! - USERD writes: CPU submits work by writing GP_PUT to userspace doorbell.
+//! - User-space mmap: Applications access GPU buffers via mmap().
+//!
+//! # Example
+//!
+//! ```ignore
+//! use crate::mm::bar_user::BarUser;
+//!
+//! fn setup_bar1(mm: &mut GpuMm, bar1: &Bar1, pdb_addr: VramAddress) -> Result<()> {
+//!     let mut bar_user = BarUser::new(pdb_addr, MmuVersion::V2, 0x1000_0000)?;
+//!
+//!     // Map discontiguous physical pages to contiguous virtual range.
+//!     let pfns = [Pfn::new(0x100), Pfn::new(0x500), Pfn::new(0x200)];
+//!     let access = bar_user.map(mm, bar1, &pfns, true)?;
+//!
+//!     // Access the mapped region (offset is within the mapped range).
+//!     access.try_write32(0xDEAD_BEEF, 0x0)?;  // Page 0, offset 0
+//!     access.try_write32(0xCAFE_BABE, 0x1000)?;  // Page 1, offset 0
+//!
+//!     let val = access.try_read32(0x0)?;
+//!     assert_eq!(val, 0xDEAD_BEEF);
+//!
+//!     // Pages unmapped when `access` is dropped.
+//!     Ok(())
+//! }
+//! ```
+
+use kernel::{
+    gpu::buddy::AllocatedBlocks,
+    prelude::*,
+    sync::Arc, //
+};
+
+use crate::{
+    driver::Bar1,
+    mm::{
+        pagetable::MmuVersion,
+        vmm::Vmm,
+        GpuMm,
+        Pfn,
+        Vfn,
+        VirtualAddress,
+        VramAddress,
+        PAGE_SIZE, //
+    },
+};
+
+/// BAR1 user interface for virtual memory mappings.
+///
+/// Owns a VMM instance with virtual address tracking and provides
+/// BAR1-specific mapping and cleanup operations.
+pub(crate) struct BarUser {
+    vmm: Vmm,
+}
+
+impl BarUser {
+    /// Create a new [`BarUser`] with virtual address tracking.
+    pub(crate) fn new(
+        pdb_addr: VramAddress,
+        mmu_version: MmuVersion,
+        va_size: u64,
+    ) -> Result<Self> {
+        Ok(Self {
+            vmm: Vmm::new(pdb_addr, mmu_version, va_size)?,
+        })
+    }
+
+    /// Map a list of physical frame numbers to a contiguous virtual range.
+    ///
+    /// Allocates a contiguous virtual range from the VMM's virtual address range
+    /// allocator, maps each PFN to consecutive VFNs, and returns a [`BarAccess`] object
+    /// for accessing the mapped region.
+    ///
+    /// The mappings are automatically unmapped and the virtual range is freed
+    /// when the returned [`BarAccess`] is dropped.
+    pub(crate) fn map<'a>(
+        &'a mut self,
+        mm: &'a mut GpuMm,
+        bar: &'a Bar1,
+        pfns: &[Pfn],
+        writable: bool,
+    ) -> Result<BarAccess<'a>> {
+        let num_pages = pfns.len();
+        if num_pages == 0 {
+            return Err(EINVAL);
+        }
+
+        // Allocate contiguous virtual range.
+        let (vfn_start, vfn_alloc) = self.vmm.alloc_vfn_range(num_pages)?;
+
+        // Map each PFN to its corresponding VFN.
+        for (i, &pfn) in pfns.iter().enumerate() {
+            let vfn = Vfn::new(vfn_start.raw() + i as u64);
+            self.vmm.map_page(mm, vfn, pfn, writable)?;
+        }
+
+        Ok(BarAccess {
+            vmm: &mut self.vmm,
+            mm,
+            bar,
+            vfn_start,
+            num_pages,
+            _vfn_alloc: vfn_alloc,
+        })
+    }
+}
+
+/// Access object for a mapped BAR1 region.
+///
+/// Provides read/write accessors to the mapped region. When dropped, automatically
+/// unmaps all pages and frees the virtual range.
+pub(crate) struct BarAccess<'a> {
+    vmm: &'a mut Vmm,
+    mm: &'a mut GpuMm,
+    bar: &'a Bar1,
+    vfn_start: Vfn,
+    num_pages: usize,
+    /// Holds the virtual range allocation; freed when [`BarAccess`] is dropped.
+    _vfn_alloc: Arc<AllocatedBlocks>,
+}
+
+impl<'a> BarAccess<'a> {
+    /// Get the base virtual address of this mapping.
+    pub(crate) fn base(&self) -> VirtualAddress {
+        VirtualAddress::from(self.vfn_start)
+    }
+
+    /// Get the total size of the mapped region in bytes.
+    pub(crate) fn size(&self) -> usize {
+        self.num_pages * PAGE_SIZE
+    }
+
+    /// Get the starting virtual frame number.
+    pub(crate) fn vfn_start(&self) -> Vfn {
+        self.vfn_start
+    }
+
+    /// Get the number of pages in this mapping.
+    pub(crate) fn num_pages(&self) -> usize {
+        self.num_pages
+    }
+
+    /// Translate an offset within this mapping to a BAR1 aperture offset.
+    fn bar_offset(&self, offset: usize) -> Result<usize> {
+        if offset >= self.size() {
+            return Err(EINVAL);
+        }
+        Ok(self.vfn_start.raw() as usize * PAGE_SIZE + offset)
+    }
+
+    // Fallible accessors with runtime bounds checking.
+
+    /// Read a 32-bit value at the given offset.
+    pub(crate) fn try_read32(&self, offset: usize) -> Result<u32> {
+        self.bar.try_read32(self.bar_offset(offset)?)
+    }
+
+    /// Write a 32-bit value at the given offset.
+    pub(crate) fn try_write32(&self, value: u32, offset: usize) -> Result {
+        self.bar.try_write32(value, self.bar_offset(offset)?)
+    }
+
+    /// Read a 64-bit value at the given offset.
+    pub(crate) fn try_read64(&self, offset: usize) -> Result<u64> {
+        self.bar.try_read64(self.bar_offset(offset)?)
+    }
+
+    /// Write a 64-bit value at the given offset.
+    pub(crate) fn try_write64(&self, value: u64, offset: usize) -> Result {
+        self.bar.try_write64(value, self.bar_offset(offset)?)
+    }
+}
+
+impl Drop for BarAccess<'_> {
+    fn drop(&mut self) {
+        // Unmap all pages in this access range.
+        for i in 0..self.num_pages {
+            let vfn = Vfn::new(self.vfn_start.raw() + i as u64);
+            let _ = self.vmm.unmap_page(self.mm, vfn);
+        }
+    }
+}
diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
index 53d726eb7296..449c2dea3e07 100644
--- a/drivers/gpu/nova-core/mm/mod.rs
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -4,6 +4,7 @@
 
 #![expect(dead_code)]
 
+pub(crate) mod bar_user;
 pub(crate) mod pagetable;
 pub(crate) mod pramin;
 pub(crate) mod tlb;
-- 
2.34.1



* [PATCH RFC v6 20/26] nova-core: gsp: Return GspStaticInfo and FbLayout from boot()
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (18 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 19/26] nova-core: mm: Add BAR1 user interface Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 21/26] nova-core: mm: Add memory management self-tests Joel Fernandes
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Refactor the GSP boot function to return the GspStaticInfo and FbLayout.

This gives memory management initialization access to:
- bar1_pde_base: the BAR1 page directory base.
- bar2_pde_base: the BAR2 page directory base.
- the usable memory regions in vidmem.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs      |  9 +++++++--
 drivers/gpu/nova-core/gsp/boot.rs | 15 ++++++++++++---
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 572e6d4502bc..91ec7f7910e9 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -20,7 +20,10 @@
     },
     fb::SysmemFlush,
     gfw,
-    gsp::Gsp,
+    gsp::{
+        commands::GetGspStaticInfoReply,
+        Gsp, //
+    },
     mm::GpuMm,
     regs,
 };
@@ -257,6 +260,8 @@ pub(crate) struct Gpu {
     /// GSP runtime data. Temporarily an empty placeholder.
     #[pin]
     gsp: Gsp,
+    /// Static GPU information from GSP.
+    gsp_static_info: GetGspStaticInfoReply,
 }
 
 impl Gpu {
@@ -297,7 +302,7 @@ pub(crate) fn new<'a>(
 
             gsp <- Gsp::new(pdev),
 
-            _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+            gsp_static_info: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?.0 },
 
             bar: devres_bar,
         })
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 581b412554dc..75f949bc4864 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -32,7 +32,10 @@
     },
     gpu::Chipset,
     gsp::{
-        commands,
+        commands::{
+            self,
+            GetGspStaticInfoReply, //
+        },
         sequencer::{
             GspSequencer,
             GspSequencerParams, //
@@ -127,6 +130,12 @@ fn run_fwsec_frts(
     /// structures that the GSP will use at runtime.
     ///
     /// Upon return, the GSP is up and running, and its runtime object given as return value.
+    ///
+    /// Returns a tuple containing:
+    /// - [`GetGspStaticInfoReply`]: Static GPU information from GSP, including the BAR1 page
+    ///   directory base address needed for memory management.
+    /// - [`FbLayout`]: Frame buffer layout computed during boot, containing memory regions
+    ///   required for [`GpuMm`] initialization.
     pub(crate) fn boot(
         mut self: Pin<&mut Self>,
         pdev: &pci::Device<device::Bound>,
@@ -134,7 +143,7 @@ pub(crate) fn boot(
         chipset: Chipset,
         gsp_falcon: &Falcon<Gsp>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<(GetGspStaticInfoReply, FbLayout)> {
         let dev = pdev.as_ref();
 
         let bios = Vbios::new(dev, bar)?;
@@ -243,6 +252,6 @@ pub(crate) fn boot(
             Err(e) => dev_warn!(pdev.as_ref(), "GPU name unavailable: {:?}\n", e),
         }
 
-        Ok(())
+        Ok((info, fb_layout))
     }
 }
-- 
2.34.1



* [PATCH RFC v6 21/26] nova-core: mm: Add memory management self-tests
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (19 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 20/26] nova-core: gsp: Return GspStaticInfo and FbLayout from boot() Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:42 ` [PATCH RFC v6 22/26] nova-core: mm: Add PRAMIN aperture self-tests Joel Fernandes
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add comprehensive self-tests for the MM subsystem that run during driver
probe when CONFIG_NOVA_MM_SELFTESTS is enabled (default disabled). The
tests exercise the Vmm, the buddy allocator, BAR1, and PRAMIN, all of
which must function correctly for the tests to pass.
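
What the self-tests verify end to end can be sketched in host Rust:
writes through a contiguous virtual range must land in the possibly
discontiguous physical pages behind it. The HashMap standing in for the
page tables and the Vec<u32> standing in for VRAM are assumptions of
this sketch; the driver goes through BAR1 MMIO instead:

```rust
// Sketch of the self-test's core property: a write at a virtual frame
// must reach the physical page that frame was mapped to.
use std::collections::HashMap;

const WORDS_PER_PAGE: usize = 1024; // 4 KiB / 4-byte words

fn write32(vram: &mut [u32], map: &HashMap<u64, u64>, vfn: u64, word: usize, val: u32) {
    let pfn = map[&vfn]; // page-table lookup, VFN -> PFN
    vram[pfn as usize * WORDS_PER_PAGE + word] = val;
}

fn main() {
    let mut vram = vec![0u32; 8 * WORDS_PER_PAGE];
    // VFNs 0..3 map to discontiguous PFNs, as in the BarUser example.
    let map = HashMap::from([(0u64, 5u64), (1, 2), (2, 7)]);
    write32(&mut vram, &map, 0, 0, 0xDEAD_BEEF);
    write32(&mut vram, &map, 1, 0, 0xCAFE_BABE);
    // The data landed in the right physical pages.
    assert_eq!(vram[5 * WORDS_PER_PAGE], 0xDEAD_BEEF);
    assert_eq!(vram[2 * WORDS_PER_PAGE], 0xCAFE_BABE);
    println!("mm self-test sketch ok");
}
```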

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig         |  10 ++
 drivers/gpu/nova-core/driver.rs       |   2 +
 drivers/gpu/nova-core/gpu.rs          |  43 ++++++++
 drivers/gpu/nova-core/gsp/commands.rs |   1 -
 drivers/gpu/nova-core/mm/bar_user.rs  | 141 ++++++++++++++++++++++++++
 5 files changed, 196 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index 809485167aff..257bca5aa0ef 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -15,3 +15,13 @@ config NOVA_CORE
 	  This driver is work in progress and may not be functional.
 
 	  If M is selected, the module will be called nova_core.
+
+config NOVA_MM_SELFTESTS
+	bool "Memory management self-tests"
+	depends on NOVA_CORE
+	help
+	  Enable self-tests for the memory management subsystem. When enabled,
+	  tests are run during GPU probe to verify page table walking and
+	  BAR1 virtual memory mapping functionality.
+
	  This is a testing option. If unsure, say N.
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index d8b2e967ba4c..7d0d09939835 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -92,6 +92,8 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
 
             Ok(try_pin_init!(Self {
                 gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
+                // Run optional GPU selftests.
+                _: { gpu.run_selftests(pdev)? },
                 _reg <- auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 91ec7f7910e9..938828508f2c 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -318,4 +318,47 @@ pub(crate) fn unbind(&self, dev: &device::Device<device::Core>) {
             .inspect(|bar| self.sysmem_flush.unregister(bar))
             .is_err());
     }
+
+    /// Run selftests on the constructed [`Gpu`].
+    pub(crate) fn run_selftests(
+        mut self: Pin<&mut Self>,
+        pdev: &pci::Device<device::Bound>,
+    ) -> Result {
+        self.as_mut().run_mm_selftest(pdev)?;
+        Ok(())
+    }
+
+    fn run_mm_selftest(mut self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
+        #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+        {
+            use crate::driver::BAR1_SIZE;
+            use crate::mm::pagetable::MmuVersion;
+            use kernel::c_str;
+
+            let bar1 = Arc::pin_init(
+                pdev.iomap_region_sized::<BAR1_SIZE>(1, c_str!("nova-core/bar1")),
+                GFP_KERNEL,
+            )?;
+            let bar1_access = bar1.access(pdev.as_ref())?;
+
+            // Use projection to access non-pinned fields.
+            let proj = self.as_mut().project();
+            let bar1_pde_base = proj.gsp_static_info.bar1_pde_base();
+            let mm = proj.mm;
+            let mmu_version = MmuVersion::from(proj.spec.chipset.arch());
+
+            crate::mm::bar_user::run_self_test(
+                pdev.as_ref(),
+                mm,
+                bar1_access,
+                bar1_pde_base,
+                mmu_version,
+            )?;
+        }
+
+        // Suppress unused warnings when selftests disabled.
+        let _ = &mut self;
+        let _ = pdev;
+        Ok(())
+    }
 }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 7b5025cba106..311f65f8367b 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -232,7 +232,6 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
     }
 
     /// Returns the BAR1 Page Directory Entry base address.
-    #[expect(dead_code)]
     pub(crate) fn bar1_pde_base(&self) -> u64 {
         self.bar1_pde_base
     }
diff --git a/drivers/gpu/nova-core/mm/bar_user.rs b/drivers/gpu/nova-core/mm/bar_user.rs
index 288dec0ae920..e19906d5bcc6 100644
--- a/drivers/gpu/nova-core/mm/bar_user.rs
+++ b/drivers/gpu/nova-core/mm/bar_user.rs
@@ -193,3 +193,144 @@ fn drop(&mut self) {
         }
     }
 }
+
+/// Run MM subsystem self-tests during probe.
+///
+/// Tests page table infrastructure and BAR1 MMIO access using the BAR1
+/// address space initialized by GSP-RM. Uses the GpuMm's buddy allocator
+/// to allocate page tables and test pages as needed.
+#[cfg(CONFIG_NOVA_MM_SELFTESTS)]
+pub(crate) fn run_self_test(
+    dev: &kernel::device::Device,
+    mm: &mut GpuMm,
+    bar1: &crate::driver::Bar1,
+    bar1_pdb: u64,
+    mmu_version: MmuVersion,
+) -> Result {
+    use crate::mm::vmm::Vmm;
+    use crate::mm::PAGE_SIZE;
+    use kernel::gpu::buddy::BuddyFlags;
+    use kernel::gpu::buddy::GpuBuddyAllocParams;
+    use kernel::sizes::{
+        SZ_4K,
+        SZ_64K, //
+    };
+
+    // Self-tests only support MMU v2 (Turing/Ampere/Ada).
+    if mmu_version != MmuVersion::V2 {
+        dev_info!(
+            dev,
+            "MM: Skipping self-tests for MMU {:?} (only V2 supported)\n",
+            mmu_version
+        );
+        return Ok(());
+    }
+
+    // Test patterns - distinct values to detect stale reads.
+    const PATTERN_PRAMIN: u32 = 0xDEAD_BEEF;
+    const PATTERN_BAR1: u32 = 0xCAFE_BABE;
+
+    dev_info!(dev, "MM: Starting self-test...\n");
+
+    let pdb_addr = VramAddress::new(bar1_pdb);
+
+    // Phase 1: Check if page tables are in VRAM (accessible via PRAMIN).
+    {
+        use crate::mm::pagetable::ver2::Pde;
+        use crate::mm::pagetable::AperturePde;
+
+        // Read PDB[0] to check the aperture of the first L1 pointer.
+        let pdb_entry_raw = mm.pramin().try_read64(pdb_addr.raw())?;
+        let pdb_entry = Pde::new(pdb_entry_raw);
+
+        if !pdb_entry.is_valid() {
+            dev_info!(dev, "MM: Self-test SKIPPED - no valid page tables\n");
+            return Ok(());
+        }
+
+        if pdb_entry.aperture() != AperturePde::VideoMemory {
+            dev_info!(dev, "MM: Self-test SKIPPED - requires VRAM-based page tables\n");
+            return Ok(());
+        }
+    }
+
+    // Phase 2: Allocate a test page from the buddy allocator.
+    let alloc_params = GpuBuddyAllocParams {
+        start_range_address: 0,
+        end_range_address: 0,
+        size_bytes: SZ_4K as u64,
+        min_block_size_bytes: SZ_4K as u64,
+        buddy_flags: BuddyFlags::try_new(0)?,
+    };
+
+    let test_page_blocks = mm.buddy().alloc_blocks(alloc_params)?;
+    let test_vram_offset = test_page_blocks.iter().next().ok_or(ENOMEM)?.offset();
+    let test_vram = VramAddress::new(test_vram_offset);
+    let test_pfn = Pfn::from(test_vram);
+
+    // Use VFN 8 (offset 0x8000) for the test mapping.
+    // This is within the BAR1 aperture and will trigger page table allocation.
+    let test_vfn = Vfn::new(8u64);
+
+    // Create a VMM of size 64K to track virtual memory mappings.
+    let mut vmm = Vmm::new(pdb_addr, MmuVersion::V2, SZ_64K as u64)?;
+
+    // Phase 3+4: Create mapping using `GpuMm` and `Vmm`.
+    vmm.map_page(mm, test_vfn, test_pfn, true)?;
+
+    // Phase 5: Test the mapping.
+    // Pre-compute test addresses for each access path.
+    // Use distinct offsets within the page for read (0x100) and write (0x200) tests.
+    let bar1_base_offset = test_vfn.raw() as usize * PAGE_SIZE;
+    let bar1_read_offset: usize = bar1_base_offset + 0x100;
+    let bar1_write_offset: usize = bar1_base_offset + 0x200;
+    let vram_read_addr: usize = test_vram.raw() + 0x100;
+    let vram_write_addr: usize = test_vram.raw() + 0x200;
+
+    // Test 1: Write via PRAMIN, read via BAR1.
+    mm.pramin().try_write32(vram_read_addr, PATTERN_PRAMIN)?;
+
+    // Read back via BAR1 aperture.
+    let bar1_value = bar1.try_read32(bar1_read_offset)?;
+
+    let test1_passed = if bar1_value == PATTERN_PRAMIN {
+        true
+    } else {
+        dev_err!(
+            dev,
+            "MM: Test 1 FAILED - Expected {:#010x}, got {:#010x}\n",
+            PATTERN_PRAMIN,
+            bar1_value
+        );
+        false
+    };
+
+    // Test 2: Write via BAR1, read via PRAMIN.
+    bar1.try_write32(PATTERN_BAR1, bar1_write_offset)?;
+
+    // Read back via PRAMIN.
+    let pramin_value = mm.pramin().try_read32(vram_write_addr)?;
+
+    let test2_passed = if pramin_value == PATTERN_BAR1 {
+        true
+    } else {
+        dev_err!(
+            dev,
+            "MM: Test 2 FAILED - Expected {:#010x}, got {:#010x}\n",
+            PATTERN_BAR1,
+            pramin_value
+        );
+        false
+    };
+
+    // Phase 6: Cleanup - invalidate PTE.
+    vmm.unmap_page(mm, test_vfn)?;
+
+    if test1_passed && test2_passed {
+        dev_info!(dev, "MM: All self-tests PASSED\n");
+        Ok(())
+    } else {
+        dev_err!(dev, "MM: Self-tests FAILED\n");
+        Err(EIO)
+    }
+}
-- 
2.34.1


* [PATCH RFC v6 22/26] nova-core: mm: Add PRAMIN aperture self-tests
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (20 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 21/26] nova-core: mm: Add memory management self-tests Joel Fernandes
@ 2026-01-20 20:42 ` Joel Fernandes
  2026-01-20 20:43 ` [PATCH RFC v6 23/26] nova-core: gsp: Extract usable FB region from GSP Joel Fernandes
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add self-tests for the PRAMIN aperture mechanism to verify correct
operation during GPU probe. The tests validate various alignment
requirements and corner cases.

The tests are disabled by default and gated behind
CONFIG_NOVA_PRAMIN_SELFTESTS. When enabled, they run during probe after
GSP boot.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/Kconfig      |  11 ++
 drivers/gpu/nova-core/gpu.rs       |  14 +++
 drivers/gpu/nova-core/mm/pramin.rs | 160 +++++++++++++++++++++++++++++
 3 files changed, 185 insertions(+)

diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
index 257bca5aa0ef..cbdbc1fb02b2 100644
--- a/drivers/gpu/nova-core/Kconfig
+++ b/drivers/gpu/nova-core/Kconfig
@@ -25,3 +25,14 @@ config NOVA_MM_SELFTESTS
 	  BAR1 virtual memory mapping functionality.
 
 	  This is a testing option and is default-disabled.
+
+config NOVA_PRAMIN_SELFTESTS
+	bool "PRAMIN self-tests"
+	depends on NOVA_CORE
+	default n
+	help
+	  Enable self-tests for the PRAMIN aperture mechanism. When enabled,
+	  basic tests are run during GPU probe after GSP boot to
+	  verify PRAMIN functionality.
+
+	  This is a testing option and is default-disabled.
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 938828508f2c..a1bcf6679e2a 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -324,10 +324,24 @@ pub(crate) fn run_selftests(
         mut self: Pin<&mut Self>,
         pdev: &pci::Device<device::Bound>,
     ) -> Result {
+        self.as_mut().run_pramin_selftest(pdev)?;
         self.as_mut().run_mm_selftest(pdev)?;
         Ok(())
     }
 
+    fn run_pramin_selftest(self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
+        #[cfg(CONFIG_NOVA_PRAMIN_SELFTESTS)]
+        {
+            use crate::mm::pagetable::MmuVersion;
+
+            let mmu_version = MmuVersion::from(self.spec.chipset.arch());
+            crate::mm::pramin::run_self_test(pdev.as_ref(), self.bar.clone(), mmu_version)?;
+        }
+
+        let _ = pdev; // Suppress unused warning when selftests disabled.
+        Ok(())
+    }
+
     fn run_mm_selftest(mut self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
         #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
         {
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
index 6a7ea2dc7d77..06384fb24841 100644
--- a/drivers/gpu/nova-core/mm/pramin.rs
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -242,3 +242,163 @@ unsafe impl Send for Window {}
 
 // SAFETY: `Window` requires `&mut self` for all accessors.
 unsafe impl Sync for Window {}
+
+/// Run PRAMIN self-tests during boot if self-tests are enabled.
+#[cfg(CONFIG_NOVA_PRAMIN_SELFTESTS)]
+pub(crate) fn run_self_test(
+    dev: &kernel::device::Device,
+    bar: Arc<Devres<Bar0>>,
+    mmu_version: super::pagetable::MmuVersion,
+) -> Result {
+    use super::pagetable::MmuVersion;
+
+    // PRAMIN support is only for MMU v2 for now (Turing/Ampere/Ada).
+    if mmu_version != MmuVersion::V2 {
+        dev_info!(
+            dev,
+            "PRAMIN: Skipping self-tests for MMU {:?} (only V2 supported)\n",
+            mmu_version
+        );
+        return Ok(());
+    }
+
+    dev_info!(dev, "PRAMIN: Starting self-test...\n");
+
+    let mut win = Window::new(bar)?;
+
+    // Use offset 0x1000 as test area.
+    let base: usize = 0x1000;
+
+    // Test 1: Read/write at byte-aligned locations.
+    for i in 0u8..4 {
+        let offset = base + 1 + usize::from(i); // Offsets 0x1001, 0x1002, 0x1003, 0x1004
+        let val = 0xA0 + i;
+        win.try_write8(offset, val)?;
+        let read_val = win.try_read8(offset)?;
+        if read_val != val {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - offset {:#x}: wrote {:#x}, read {:#x}\n",
+                offset,
+                val,
+                read_val
+            );
+            return Err(EIO);
+        }
+    }
+
+    // Test 2: Write `u32` and read back as `u8`s.
+    let test2_offset = base + 0x10;
+    let test2_val: u32 = 0xDEADBEEF;
+    win.try_write32(test2_offset, test2_val)?;
+
+    // Read back as individual bytes (little-endian: EF BE AD DE).
+    let expected_bytes: [u8; 4] = [0xEF, 0xBE, 0xAD, 0xDE];
+    for (i, &expected) in expected_bytes.iter().enumerate() {
+        let read_val = win.try_read8(test2_offset + i)?;
+        if read_val != expected {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+                test2_offset + i,
+                expected,
+                read_val
+            );
+            return Err(EIO);
+        }
+    }
+
+    // Test 3: Window repositioning across 1MB boundaries.
+    // Write to offset > 1MB to trigger window slide, then verify.
+    let test3_offset_a: usize = base; // First 1MB region.
+    let test3_offset_b: usize = 0x200000 + base; // 2MB + base (different 1MB region).
+    let val_a: u32 = 0x11111111;
+    let val_b: u32 = 0x22222222;
+
+    // Write to first region.
+    win.try_write32(test3_offset_a, val_a)?;
+
+    // Write to second region (triggers window reposition).
+    win.try_write32(test3_offset_b, val_b)?;
+
+    // Read back from second region.
+    let read_b = win.try_read32(test3_offset_b)?;
+    if read_b != val_b {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+            test3_offset_b,
+            val_b,
+            read_b
+        );
+        return Err(EIO);
+    }
+
+    // Read back from first region (triggers window reposition again).
+    let read_a = win.try_read32(test3_offset_a)?;
+    if read_a != val_a {
+        dev_err!(
+            dev,
+            "PRAMIN: FAIL - offset {:#x}: expected {:#x}, read {:#x}\n",
+            test3_offset_a,
+            val_a,
+            read_a
+        );
+        return Err(EIO);
+    }
+
+    // Test 4: Invalid offset rejection (beyond 40-bit address space).
+    {
+        // 40-bit address space limit check.
+        let invalid_offset: usize = MAX_VRAM_OFFSET + 1;
+        let result = win.try_read32(invalid_offset);
+        if result.is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - read at invalid offset {:#x} should have failed\n",
+                invalid_offset
+            );
+            return Err(EIO);
+        }
+    }
+
+    // Test 5: Misaligned multi-byte access rejection.
+    // Verify that misaligned `u16`/`u32`/`u64` accesses are properly rejected.
+    {
+        // `u16` at odd offset (not 2-byte aligned).
+        let offset_u16 = base + 0x21;
+        if win.try_write16(offset_u16, 0xABCD).is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - misaligned u16 write at {:#x} should have failed\n",
+                offset_u16
+            );
+            return Err(EIO);
+        }
+
+        // `u32` at 2-byte-aligned (not 4-byte-aligned) offset.
+        let offset_u32 = base + 0x32;
+        if win.try_write32(offset_u32, 0x12345678).is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - misaligned u32 write at {:#x} should have failed\n",
+                offset_u32
+            );
+            return Err(EIO);
+        }
+
+        // `u64` read at 4-byte-aligned (not 8-byte-aligned) offset.
+        let offset_u64 = base + 0x44;
+        if win.try_read64(offset_u64).is_ok() {
+            dev_err!(
+                dev,
+                "PRAMIN: FAIL - misaligned u64 read at {:#x} should have failed\n",
+                offset_u64
+            );
+            return Err(EIO);
+        }
+    }
+
+    dev_info!(dev, "PRAMIN: All self-tests PASSED\n");
+    Ok(())
+}
-- 
2.34.1


* [PATCH RFC v6 23/26] nova-core: gsp: Extract usable FB region from GSP
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (21 preceding siblings ...)
  2026-01-20 20:42 ` [PATCH RFC v6 22/26] nova-core: mm: Add PRAMIN aperture self-tests Joel Fernandes
@ 2026-01-20 20:43 ` Joel Fernandes
  2026-01-20 20:43 ` [PATCH RFC v6 24/26] nova-core: fb: Add usable_vram field to FbLayout Joel Fernandes
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:43 UTC (permalink / raw)
  To: linux-kernel

Add first_usable_fb_region() to GspStaticConfigInfo to extract the first
usable FB region from GSP's fbRegionInfoParams. Usable regions are those
that are not reserved or protected, and that support compression and ISO.

The extracted region is stored in GetGspStaticInfoReply and exposed via
the usable_fb_region() API for use by the memory subsystem.
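
The filtering can be mirrored in a standalone sketch. Field names below
are simplified plain-Rust stand-ins for the GSP firmware struct fields
(e.g. `protected` for `bProtected`); only the selection logic matches the
patch.

```rust
// Simplified mirror of the first_usable_fb_region() selection logic.
struct FbRegion {
    base: u64,
    limit: u64, // inclusive upper bound, as reported by GSP
    reserved: u8,
    protected: u8,
    support_compressed: u8,
    support_iso: u8,
}

/// Return the first usable region as a `(base, size)` tuple.
fn first_usable(regions: &[FbRegion]) -> Option<(u64, u64)> {
    regions.iter().find_map(|r| {
        // Skip malformed regions where limit < base, then apply the
        // not-reserved / not-protected / compression / ISO filter.
        if r.limit >= r.base
            && r.reserved == 0
            && r.protected == 0
            && r.support_compressed != 0
            && r.support_iso != 0
        {
            // `limit` is inclusive, so the size adds one.
            Some((r.base, r.limit - r.base + 1))
        } else {
            None
        }
    })
}

fn main() {
    let regions = [
        // Reserved region: skipped even though it supports compression/ISO.
        FbRegion { base: 0, limit: 0xFFFFF, reserved: 1, protected: 0,
                   support_compressed: 1, support_iso: 1 },
        // First usable region: returned as (base, size).
        FbRegion { base: 0x100000, limit: 0x5FFFFF, reserved: 0, protected: 0,
                   support_compressed: 1, support_iso: 1 },
    ];
    assert_eq!(first_usable(&regions), Some((0x100000, 0x500000)));
}
```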

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs    | 13 +++++++++-
 drivers/gpu/nova-core/gsp/fw/commands.rs | 30 ++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 311f65f8367b..d619cf294b9c 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -186,10 +186,13 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
     }
 }
 
-/// The reply from the GSP to the [`GetGspInfo`] command.
+/// The reply from the GSP to the [`GetGspStaticInfo`] command.
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
     bar1_pde_base: u64,
+    /// First usable FB region (base, size) for memory allocation.
+    #[expect(dead_code)]
+    usable_fb_region: Option<(u64, u64)>,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
@@ -204,6 +207,7 @@ fn read(
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
             bar1_pde_base: msg.bar1_pde_base(),
+            usable_fb_region: msg.first_usable_fb_region(),
         })
     }
 }
@@ -235,6 +239,13 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
     pub(crate) fn bar1_pde_base(&self) -> u64 {
         self.bar1_pde_base
     }
+
+    /// Returns the usable FB region (base, size) for driver allocation which is
+    /// already retrieved from the GSP.
+    #[expect(dead_code)]
+    pub(crate) fn usable_fb_region(&self) -> Option<(u64, u64)> {
+        self.usable_fb_region
+    }
 }
 
 /// Send the [`GetGspInfo`] command and awaits for its reply.
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index f069f4092911..cc1cf4bd52ea 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -122,6 +122,36 @@ impl GspStaticConfigInfo {
     pub(crate) fn bar1_pde_base(&self) -> u64 {
         self.0.bar1PdeBase
     }
+
+    /// Extract the first usable FB region from GSP firmware data.
+    ///
+    /// Returns the first region suitable for driver memory allocation as a `(base, size)` tuple.
+    /// Usable regions are those that:
+    /// - Are not reserved for firmware internal use.
+    /// - Are not protected (hardware-enforced access restrictions).
+    /// - Support compression (can use GPU memory compression for bandwidth).
+    /// - Support ISO (isochronous memory for display requiring guaranteed bandwidth).
+    pub(crate) fn first_usable_fb_region(&self) -> Option<(u64, u64)> {
+        let fb_info = &self.0.fbRegionInfoParams;
+        for i in 0..fb_info.numFBRegions as usize {
+            if let Some(reg) = fb_info.fbRegion.get(i) {
+                // Skip malformed regions where limit < base.
+                if reg.limit < reg.base {
+                    continue;
+                }
+                // Filter: not reserved, not protected, supports compression and ISO.
+                if reg.reserved == 0
+                    && reg.bProtected == 0
+                    && reg.supportCompressed != 0
+                    && reg.supportISO != 0
+                {
+                    let size = reg.limit - reg.base + 1;
+                    return Some((reg.base, size));
+                }
+            }
+        }
+        None
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.
-- 
2.34.1


* [PATCH RFC v6 24/26] nova-core: fb: Add usable_vram field to FbLayout
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (22 preceding siblings ...)
  2026-01-20 20:43 ` [PATCH RFC v6 23/26] nova-core: gsp: Extract usable FB region from GSP Joel Fernandes
@ 2026-01-20 20:43 ` Joel Fernandes
  2026-01-20 20:43 ` [PATCH RFC v6 25/26] nova-core: mm: Use usable VRAM region for buddy allocator Joel Fernandes
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:43 UTC (permalink / raw)
  To: linux-kernel

Add usable_vram field to FbLayout to store the usable VRAM region for
driver allocations. This is populated after GSP boot with the region
extracted from GSP's fbRegionInfoParams.

FbLayout is now a two-phase structure:
1. new() computes firmware layout from hardware
2. set_usable_vram() populates usable region from GSP

The new usable_vram field represents the actual usable VRAM region
(~23.7GB on a 24GB GA102 Ampere GPU).
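
The two-phase shape can be sketched in standalone Rust (hypothetical
`Layout` type standing in for FbLayout; only the `Option<Range<u64>>`
field and the saturating-add bound match the patch):

```rust
use core::ops::Range;

// Minimal two-phase layout sketch: phase 1 leaves the usable region
// unknown, phase 2 fills it in from the GSP response.
struct Layout {
    usable_vram: Option<Range<u64>>,
}

impl Layout {
    fn new() -> Self {
        // Phase 1: firmware layout computed, usable region not yet known.
        Layout { usable_vram: None }
    }

    // Phase 2: populate the usable region from the GSP-reported (base, size).
    fn set_usable_vram(&mut self, base: u64, size: u64) {
        // saturating_add guards against a firmware-provided size that
        // would overflow past the end of the address space.
        self.usable_vram = Some(base..base.saturating_add(size));
    }
}

fn main() {
    let mut layout = Layout::new();
    assert!(layout.usable_vram.is_none());

    layout.set_usable_vram(0x100000, 0x500000);
    assert_eq!(layout.usable_vram, Some(0x100000..0x600000));

    // An overflowing size saturates instead of wrapping.
    layout.set_usable_vram(u64::MAX - 1, 16);
    assert_eq!(layout.usable_vram, Some(u64::MAX - 1..u64::MAX));
}
```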

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/fb.rs | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/fb.rs b/drivers/gpu/nova-core/fb.rs
index c62abcaed547..779447952b19 100644
--- a/drivers/gpu/nova-core/fb.rs
+++ b/drivers/gpu/nova-core/fb.rs
@@ -97,6 +97,10 @@ pub(crate) fn unregister(&self, bar: &Bar0) {
 /// Layout of the GPU framebuffer memory.
 ///
 /// Contains ranges of GPU memory reserved for a given purpose during the GSP boot process.
+///
+/// This structure is populated in 2 steps:
+/// 1. [`FbLayout::new()`] computes firmware layout from hardware.
+/// 2. [`FbLayout::set_usable_vram()`] populates usable region from GSP response.
 #[derive(Debug)]
 pub(crate) struct FbLayout {
     /// Range of the framebuffer. Starts at `0`.
@@ -111,10 +115,14 @@ pub(crate) struct FbLayout {
     pub(crate) elf: Range<u64>,
     /// WPR2 heap.
     pub(crate) wpr2_heap: Range<u64>,
-    /// WPR2 region range, starting with an instance of `GspFwWprMeta`.
+    /// WPR2 region range, starting with an instance of [`GspFwWprMeta`].
     pub(crate) wpr2: Range<u64>,
+    /// Non-WPR heap carved before WPR2, used by GSP firmware.
     pub(crate) heap: Range<u64>,
     pub(crate) vf_partition_count: u8,
+    /// Usable VRAM region for driver allocations (from GSP `fbRegionInfoParams`).
+    /// Initially [`None`], populated after GSP boot with usable region info.
+    pub(crate) usable_vram: Option<Range<u64>>,
 }
 
 impl FbLayout {
@@ -212,6 +220,19 @@ pub(crate) fn new(chipset: Chipset, bar: &Bar0, gsp_fw: &GspFirmware) -> Result<
             wpr2,
             heap,
             vf_partition_count: 0,
+            usable_vram: None,
         })
     }
+
+    /// Set the usable VRAM region from GSP response.
+    ///
+    /// Called after GSP boot with the first usable region extracted from
+    /// GSP's `fbRegionInfoParams`. Usable regions are those that:
+    /// - Are not reserved for firmware internal use.
+    /// - Are not protected (hardware-enforced access restrictions).
+    /// - Support compression (can use GPU memory compression for bandwidth).
+    /// - Support ISO (isochronous memory for display requiring guaranteed bandwidth).
+    pub(crate) fn set_usable_vram(&mut self, base: u64, size: u64) {
+        self.usable_vram = Some(base..base.saturating_add(size));
+    }
 }
-- 
2.34.1


* [PATCH RFC v6 25/26] nova-core: mm: Use usable VRAM region for buddy allocator
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (23 preceding siblings ...)
  2026-01-20 20:43 ` [PATCH RFC v6 24/26] nova-core: fb: Add usable_vram field to FbLayout Joel Fernandes
@ 2026-01-20 20:43 ` Joel Fernandes
  2026-01-20 20:43 ` [PATCH RFC v6 26/26] nova-core: mm: Add BarUser to struct Gpu and create at boot Joel Fernandes
  2026-01-28 11:37 ` [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Danilo Krummrich
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:43 UTC (permalink / raw)
  To: linux-kernel

Initialize the buddy allocator with the actual usable VRAM region
reported by GSP instead of the earlier 1MB placeholder. On my GA102
Ampere GPU with 24GB of video memory, this makes ~23.7GB available for
driver allocations.
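
The `Cell`-based handoff between pin-init fields can be shown in a small
standalone sketch. The struct and numbers below are illustrative (the
region size approximates ~23.7 GiB); only the pattern of stashing the
region during one field's initializer and reading it in a later one
matches the patch.

```rust
use core::cell::Cell;
use core::ops::Range;

// Sketch of the BootParams handoff: pin-init evaluates struct fields in
// order, so a Cell on the stack can carry values computed while
// initializing `gsp_static_info` over to the later `mm` initializer.
#[derive(Clone, Copy, Default)]
struct BootParams {
    usable_vram_start: u64,
    usable_vram_size: u64,
}

fn main() {
    let boot_params: Cell<BootParams> = Cell::new(BootParams::default());

    // "GSP boot" step: extract the usable region and stash it.
    let usable_vram: Range<u64> = 0x100000..0x5ECC00000; // ~23.7 GiB, illustrative
    boot_params.set(BootParams {
        usable_vram_start: usable_vram.start,
        usable_vram_size: usable_vram.end - usable_vram.start,
    });

    // "mm" step: a later field initializer reads the stashed parameters.
    let params = boot_params.get();
    assert_eq!(params.usable_vram_start, 0x100000);
    assert_eq!(params.usable_vram_size, 0x5ECC00000 - 0x100000);
}
```

`Cell` fits here because `BootParams` is small and `Copy`, and the
handoff happens within one expression on one thread, so no locking is
needed.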

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs          | 62 ++++++++++++++++++++++-----
 drivers/gpu/nova-core/gsp/boot.rs     |  7 ++-
 drivers/gpu/nova-core/gsp/commands.rs |  2 -
 3 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index a1bcf6679e2a..dd05ad23f763 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::cell::Cell;
+
 use kernel::{
     device,
     devres::Devres,
@@ -7,7 +9,7 @@
     gpu::buddy::GpuBuddyParams,
     pci,
     prelude::*,
-    sizes::{SZ_1M, SZ_4K},
+    sizes::SZ_4K,
     sync::Arc, //
 };
 
@@ -28,6 +30,13 @@
     regs,
 };
 
+/// Parameters extracted from GSP boot for initializing memory subsystems.
+#[derive(Clone, Copy)]
+struct BootParams {
+    usable_vram_start: u64,
+    usable_vram_size: u64,
+}
+
 macro_rules! define_chipset {
     ({ $($variant:ident = $value:expr),* $(,)* }) =>
     {
@@ -270,6 +279,13 @@ pub(crate) fn new<'a>(
         devres_bar: Arc<Devres<Bar0>>,
         bar: &'a Bar0,
     ) -> impl PinInit<Self, Error> + 'a {
+        // Cell to share boot parameters between GSP boot and subsequent initializations.
+        // Contains the usable VRAM region extracted from FbLayout after GSP boot.
+        let boot_params: Cell<BootParams> = Cell::new(BootParams {
+            usable_vram_start: 0,
+            usable_vram_size: 0,
+        });
+
         try_pin_init!(Self {
             spec: Spec::new(pdev.as_ref(), bar).inspect(|spec| {
                 dev_info!(pdev.as_ref(),"NVIDIA ({})\n", spec);
@@ -291,18 +307,42 @@ pub(crate) fn new<'a>(
 
             sec2_falcon: Falcon::new(pdev.as_ref(), spec.chipset)?,
 
-            // Create GPU memory manager owning memory management resources.
-            // This will be initialized with the usable VRAM region from GSP in a later
-            // patch. For now, we use a placeholder of 1MB.
-            mm: GpuMm::new(devres_bar.clone(), GpuBuddyParams {
-                base_offset_bytes: 0,
-                physical_memory_size_bytes: SZ_1M as u64,
-                chunk_size_bytes: SZ_4K as u64,
-            })?,
-
             gsp <- Gsp::new(pdev),
 
-            gsp_static_info: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?.0 },
+            // Boot GSP and extract usable VRAM region for buddy allocator.
+            gsp_static_info: {
+                let (info, fb_layout) = gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?;
+
+                let usable_vram = fb_layout.usable_vram.as_ref().ok_or_else(|| {
+                    dev_err!(pdev.as_ref(), "No usable FB regions found from GSP\n");
+                    ENODEV
+                })?;
+
+                dev_info!(
+                    pdev.as_ref(),
+                    "Using FB region: {:#x}..{:#x}\n",
+                    usable_vram.start,
+                    usable_vram.end
+                );
+
+                boot_params.set(BootParams {
+                    usable_vram_start: usable_vram.start,
+                    usable_vram_size: usable_vram.end - usable_vram.start,
+                });
+
+                info
+            },
+
+            // Create GPU memory manager owning memory management resources.
+            // Uses the usable VRAM region from GSP for buddy allocator.
+            mm: {
+                let params = boot_params.get();
+                GpuMm::new(devres_bar.clone(), GpuBuddyParams {
+                    base_offset_bytes: params.usable_vram_start,
+                    physical_memory_size_bytes: params.usable_vram_size,
+                    chunk_size_bytes: SZ_4K as u64,
+                })?
+            },
 
             bar: devres_bar,
         })
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 75f949bc4864..a034e2e80a4b 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -150,7 +150,7 @@ pub(crate) fn boot(
 
         let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
 
-        let fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
+        let mut fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
         dev_dbg!(dev, "{:#x?}\n", fb_layout);
 
         Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;
@@ -252,6 +252,11 @@ pub(crate) fn boot(
             Err(e) => dev_warn!(pdev.as_ref(), "GPU name unavailable: {:?}\n", e),
         }
 
+        // Populate usable VRAM from GSP response.
+        if let Some((base, size)) = info.usable_fb_region() {
+            fb_layout.set_usable_vram(base, size);
+        }
+
         Ok((info, fb_layout))
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index d619cf294b9c..4a7eda512789 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -191,7 +191,6 @@ pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
     bar1_pde_base: u64,
     /// First usable FB region (base, size) for memory allocation.
-    #[expect(dead_code)]
     usable_fb_region: Option<(u64, u64)>,
 }
 
@@ -242,7 +241,6 @@ pub(crate) fn bar1_pde_base(&self) -> u64 {
 
     /// Returns the usable FB region (base, size) for driver allocation which is
     /// already retrieved from the GSP.
-    #[expect(dead_code)]
     pub(crate) fn usable_fb_region(&self) -> Option<(u64, u64)> {
         self.usable_fb_region
     }
-- 
2.34.1


* [PATCH RFC v6 26/26] nova-core: mm: Add BarUser to struct Gpu and create at boot
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (24 preceding siblings ...)
  2026-01-20 20:43 ` [PATCH RFC v6 25/26] nova-core: mm: Use usable VRAM region for buddy allocator Joel Fernandes
@ 2026-01-20 20:43 ` Joel Fernandes
  2026-01-28 11:37 ` [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Danilo Krummrich
  26 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-20 20:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Joel Fernandes

Add a BarUser field to struct Gpu and eagerly create it during GPU
initialization. The BarUser provides the BAR1 user interface for CPU
access to GPU virtual memory through the GPU's MMU.

The BarUser is initialized using the BAR1 PDE base address from GSP
static info, the MMU version derived from the chipset, and the BAR1 size
obtained from the PCI device.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index dd05ad23f763..15d8d42ecfa8 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -26,7 +26,12 @@
         commands::GetGspStaticInfoReply,
         Gsp, //
     },
-    mm::GpuMm,
+    mm::{
+        bar_user::BarUser,
+        pagetable::MmuVersion,
+        GpuMm,
+        VramAddress, //
+    },
     regs,
 };
 
@@ -35,6 +40,7 @@
 struct BootParams {
     usable_vram_start: u64,
     usable_vram_size: u64,
+    bar1_pde_base: u64,
 }
 
 macro_rules! define_chipset {
@@ -271,6 +277,8 @@ pub(crate) struct Gpu {
     gsp: Gsp,
     /// Static GPU information from GSP.
     gsp_static_info: GetGspStaticInfoReply,
+    /// BAR1 user interface for CPU access to GPU virtual memory.
+    bar_user: BarUser,
 }
 
 impl Gpu {
@@ -284,6 +292,7 @@ pub(crate) fn new<'a>(
         let boot_params: Cell<BootParams> = Cell::new(BootParams {
             usable_vram_start: 0,
             usable_vram_size: 0,
+            bar1_pde_base: 0,
         });
 
         try_pin_init!(Self {
@@ -328,6 +337,7 @@ pub(crate) fn new<'a>(
                 boot_params.set(BootParams {
                     usable_vram_start: usable_vram.start,
                     usable_vram_size: usable_vram.end - usable_vram.start,
+                    bar1_pde_base: info.bar1_pde_base(),
                 });
 
                 info
@@ -344,6 +354,16 @@ pub(crate) fn new<'a>(
                 })?
             },
 
+            // Create BAR1 user interface for CPU access to GPU virtual memory.
+            // Uses the BAR1 PDE base from GSP and full BAR1 size for VA space.
+            bar_user: {
+                let params = boot_params.get();
+                let pdb_addr = VramAddress::new(params.bar1_pde_base);
+                let mmu_version = MmuVersion::from(spec.chipset.arch());
+                let bar1_size = pdev.resource_len(1)?;
+                BarUser::new(pdb_addr, mmu_version, bar1_size)?
+            },
+
             bar: devres_bar,
         })
     }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-20 20:42 ` [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists Joel Fernandes
@ 2026-01-20 23:48   ` Gary Guo
  2026-01-21 19:50     ` Joel Fernandes
  2026-01-21  7:27   ` Zhi Wang
  1 sibling, 1 reply; 71+ messages in thread
From: Gary Guo @ 2026-01-20 23:48 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Tue Jan 20, 2026 at 8:42 PM GMT, Joel Fernandes wrote:
> Add a new module `clist` for working with C's doubly circular linked
> lists. Provide low-level iteration over list nodes.
>
> Typed iteration over actual items is provided with a `clist_create`
> macro to assist in creation of the `Clist` type.

This should read "CList".

---

I was quite dubious about the patch just from the title (everybody knows how
easy a linked list is in Rust), but it turns out it is not as concerning as I
expected, mostly due to the read-only nature of this particular implementation
(many of the safety comments would be much harder to justify if, say, it were
mutable). That said, there is still a lot of feedback below.

I think something like this is okay in the short term. However, there is
growing interest in improving our Rust list API, so it would be ideal if
eventually the Rust list could handle FFI lists, too.

>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  MAINTAINERS            |   7 +
>  rust/helpers/helpers.c |   1 +
>  rust/helpers/list.c    |  12 ++
>  rust/kernel/clist.rs   | 357 +++++++++++++++++++++++++++++++++++++++++
>  rust/kernel/lib.rs     |   1 +
>  5 files changed, 378 insertions(+)
>  create mode 100644 rust/helpers/list.c
>  create mode 100644 rust/kernel/clist.rs
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0d044a58cbfe..b76988c38045 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22936,6 +22936,13 @@ F:	rust/kernel/init.rs
>  F:	rust/pin-init/
>  K:	\bpin-init\b|pin_init\b|PinInit
>  
> +RUST TO C LIST INTERFACES
> +M:	Joel Fernandes <joelagnelf@nvidia.com>
> +M:	Alexandre Courbot <acourbot@nvidia.com>
> +L:	rust-for-linux@vger.kernel.org
> +S:	Maintained
> +F:	rust/kernel/clist.rs
> +
>  RXRPC SOCKETS (AF_RXRPC)
>  M:	David Howells <dhowells@redhat.com>
>  M:	Marc Dionne <marc.dionne@auristor.com>
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 79c72762ad9c..634fa2386bbb 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -32,6 +32,7 @@
>  #include "io.c"
>  #include "jump_label.c"
>  #include "kunit.c"
> +#include "list.c"
>  #include "maple_tree.c"
>  #include "mm.c"
>  #include "mutex.c"
> diff --git a/rust/helpers/list.c b/rust/helpers/list.c
> new file mode 100644
> index 000000000000..6044979c7a2e
> --- /dev/null
> +++ b/rust/helpers/list.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Helpers for C Circular doubly linked list implementation.
> + */
> +
> +#include <linux/list.h>
> +
> +void rust_helper_list_add_tail(struct list_head *new, struct list_head *head)
> +{
> +	list_add_tail(new, head);
> +}
> diff --git a/rust/kernel/clist.rs b/rust/kernel/clist.rs
> new file mode 100644
> index 000000000000..91754ae721b9
> --- /dev/null
> +++ b/rust/kernel/clist.rs
> @@ -0,0 +1,357 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! A C doubly circular intrusive linked list interface for rust code.
> +//!
> +//! # Examples
> +//!
> +//! ```
> +//! use kernel::{
> +//!     bindings,
> +//!     clist::init_list_head,
> +//!     clist_create,
> +//!     types::Opaque, //
> +//! };
> +//! # // Create test list with values (0, 10, 20) - normally done by C code but it is
> +//! # // emulated here for doctests using the C bindings.
> +//! # use core::mem::MaybeUninit;
> +//! #
> +//! # /// C struct with embedded `list_head` (typically will be allocated by C code).
> +//! # #[repr(C)]
> +//! # pub(crate) struct SampleItemC {
> +//! #     pub value: i32,
> +//! #     pub link: bindings::list_head,
> +//! # }
> +//! #
> +//! # let mut head = MaybeUninit::<bindings::list_head>::uninit();
> +//! #
> +//! # let head = head.as_mut_ptr();
> +//! # // SAFETY: head and all the items are test objects allocated in this scope.
> +//! # unsafe { init_list_head(head) };
> +//! #
> +//! # let mut items = [
> +//! #     MaybeUninit::<SampleItemC>::uninit(),
> +//! #     MaybeUninit::<SampleItemC>::uninit(),
> +//! #     MaybeUninit::<SampleItemC>::uninit(),
> +//! # ];
> +//! #
> +//! # for (i, item) in items.iter_mut().enumerate() {
> +//! #     let ptr = item.as_mut_ptr();
> +//! #     // SAFETY: pointers are to allocated test objects with a list_head field.
> +//! #     unsafe {
> +//! #         (*ptr).value = i as i32 * 10;
> +//! #         // addr_of_mut!() computes address of link directly as link is uninitialized.
> +//! #         init_list_head(core::ptr::addr_of_mut!((*ptr).link));
> +//! #         bindings::list_add_tail(&mut (*ptr).link, head);
> +//! #     }
> +//! # }
> +//!
> +//! // Rust wrapper for the C struct.
> +//! // The list item struct in this example is defined in C code as:
> +//! //   struct SampleItemC {
> +//! //       int value;
> +//! //       struct list_head link;
> +//! //   };
> +//! //
> +//! #[repr(transparent)]
> +//! pub(crate) struct Item(Opaque<SampleItemC>);
> +//!
> +//! impl Item {
> +//!     pub(crate) fn value(&self) -> i32 {
> +//!         // SAFETY: [`Item`] has same layout as [`SampleItemC`].
> +//!         unsafe { (*self.0.get()).value }
> +//!     }
> +//! }
> +//!
> +//! // Create typed [`CList`] from sentinel head.
> +//! // SAFETY: head is valid, items are [`SampleItemC`] with embedded `link` field.
> +//! let list = unsafe { clist_create!(head, Item, SampleItemC, link) };
> +//!
> +//! // Iterate directly over typed items.
> +//! let mut found_0 = false;
> +//! let mut found_10 = false;
> +//! let mut found_20 = false;
> +//!
> +//! for item in list.iter() {
> +//!     let val = item.value();
> +//!     if val == 0 { found_0 = true; }
> +//!     if val == 10 { found_10 = true; }
> +//!     if val == 20 { found_20 = true; }
> +//! }
> +//!
> +//! assert!(found_0 && found_10 && found_20);
> +//! ```
> +
> +use core::{
> +    iter::FusedIterator,
> +    marker::PhantomData, //
> +};
> +
> +use crate::{
> +    bindings,
> +    types::Opaque, //
> +};
> +
> +use pin_init::PinInit;
> +
> +/// Initialize a `list_head` object to point to itself.
> +///
> +/// # Safety
> +///
> +/// `list` must be a valid pointer to a `list_head` object.
> +#[inline]
> +pub unsafe fn init_list_head(list: *mut bindings::list_head) {
> +    // SAFETY: Caller guarantees `list` is a valid pointer to a `list_head`.
> +    unsafe {
> +        (*list).next = list;
> +        (*list).prev = list;

This needs to be an atomic write or it'll depart from the C implementation.

> +    }
> +}

I don't think we want to publicly expose this! I've not found a user in the
subsequent patches, either.

Alice suggested moving this to bindings in v3, which I think is a good idea.
Also, even though it's against Rust naming convention, for bindings we should
use the exact same name as C (so INIT_LIST_HEAD).

> +
> +/// Wraps a `list_head` object for use in intrusive linked lists.
> +///
> +/// # Invariants
> +///
> +/// - [`CListHead`] represents an allocated and valid `list_head` structure.
> +/// - Once a [`CListHead`] is created in Rust, it will not be modified by non-Rust code.
> +/// - All `list_head` for individual items are not modified for the lifetime of [`CListHead`].
> +#[repr(transparent)]
> +pub struct CListHead(Opaque<bindings::list_head>);
> +
> +impl CListHead {
> +    /// Create a `&CListHead` reference from a raw `list_head` pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure.
> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a Self {
> +        // SAFETY:
> +        // - [`CListHead`] has same layout as `list_head`.
> +        // - `ptr` is valid and unmodified for 'a.
> +        unsafe { &*ptr.cast() }
> +    }
> +
> +    /// Get the raw `list_head` pointer.
> +    #[inline]
> +    pub fn as_raw(&self) -> *mut bindings::list_head {
> +        self.0.get()
> +    }
> +
> +    /// Get the next [`CListHead`] in the list.
> +    #[inline]
> +    pub fn next(&self) -> &Self {
> +        let raw = self.as_raw();
> +        // SAFETY:
> +        // - `self.as_raw()` is valid per type invariants.
> +        // - The `next` pointer is guaranteed to be non-NULL.
> +        unsafe { Self::from_raw((*raw).next) }
> +    }
> +
> +    /// Get the previous [`CListHead`] in the list.
> +    #[inline]
> +    pub fn prev(&self) -> &Self {
> +        let raw = self.as_raw();
> +        // SAFETY:
> +        // - self.as_raw() is valid per type invariants.
> +        // - The `prev` pointer is guaranteed to be non-NULL.
> +        unsafe { Self::from_raw((*raw).prev) }
> +    }
> +
> +    /// Check if this node is linked in a list (not isolated).
> +    #[inline]
> +    pub fn is_linked(&self) -> bool {
> +        let raw = self.as_raw();
> +        // SAFETY: self.as_raw() is valid per type invariants.
> +        unsafe { (*raw).next != raw && (*raw).prev != raw }

Why is this checking both prev and next? `list_empty` is just
`READ_ONCE(head->next) == head`.

> +    }
> +
> +    /// Fallible pin-initializer that initializes and then calls user closure.
> +    ///
> +    /// Initializes the list head first, then passes `&CListHead` to the closure.
> +    /// This hides the raw FFI pointer from the user.
> +    pub fn try_init<E>(
> +        init_func: impl FnOnce(&CListHead) -> Result<(), E>,
> +    ) -> impl PinInit<Self, E> {
> +        // SAFETY: init_list_head initializes the list_head to point to itself.
> +        // After initialization, we create a reference to pass to the closure.
> +        unsafe {
> +            pin_init::pin_init_from_closure(move |slot: *mut Self| {
> +                init_list_head(slot.cast());
> +                // SAFETY: slot is now initialized, safe to create reference.
> +                init_func(&*slot)

Why is this callback necessary? The user can just create the list head and
then reference it later. I don't see what this specifically gains over just
doing

    fn new() -> impl PinInit<Self>;

and having the user side do

    list <- CListHead::new(),
    _: {
        do_whatever(&list)
    }


> +            })
> +        }
> +    }
> +}
> +
> +// SAFETY: [`CListHead`] can be sent to any thread.
> +unsafe impl Send for CListHead {}
> +
> +// SAFETY: [`CListHead`] can be shared among threads as it is not modified
> +// by non-Rust code per type invariants.
> +unsafe impl Sync for CListHead {}
> +
> +impl PartialEq for CListHead {
> +    fn eq(&self, other: &Self) -> bool {
> +        self.as_raw() == other.as_raw()

Or just `core::ptr::eq(self, other)`

> +    }
> +}
> +
> +impl Eq for CListHead {}
> +
> +/// Low-level iterator over `list_head` nodes.
> +///
> +/// An iterator used to iterate over a C intrusive linked list (`list_head`). Caller has to
> +/// perform conversion of returned [`CListHead`] to an item (using `container_of` macro or similar).
> +///
> +/// # Invariants
> +///
> +/// [`CListHeadIter`] is iterating over an allocated, initialized and valid list.
> +struct CListHeadIter<'a> {
> +    current_head: &'a CListHead,
> +    list_head: &'a CListHead,
> +}
> +
> +impl<'a> Iterator for CListHeadIter<'a> {
> +    type Item = &'a CListHead;
> +
> +    #[inline]
> +    fn next(&mut self) -> Option<Self::Item> {
> +        // Advance to next node.
> +        let next = self.current_head.next();
> +
> +        // Check if we've circled back to the sentinel head.
> +        if next == self.list_head {
> +            None
> +        } else {
> +            self.current_head = next;
> +            Some(self.current_head)
> +        }

I think this could match the C iterator behaviour. When the iterator is created,
a `next` is done first, and then subsequently you only need to check if
`current_head` is `list_head`.

This is slightly better because the condition check does not need to dereference
a pointer.

> +    }
> +}
> +
> +impl<'a> FusedIterator for CListHeadIter<'a> {}
> +
> +/// A typed C linked list with a sentinel head.
> +///
> +/// A sentinel head represents the entire linked list and can be used for
> +/// iteration over items of type `T`, it is not associated with a specific item.
> +///
> +/// The const generic `OFFSET` specifies the byte offset of the `list_head` field within
> +/// the struct that `T` wraps.
> +///
> +/// # Invariants
> +///
> +/// - `head` is an allocated and valid C `list_head` structure that is the list's sentinel.
> +/// - `OFFSET` is the byte offset of the `list_head` field within the struct that `T` wraps.
> +/// - All the list's `list_head` nodes are allocated and have valid next/prev pointers.
> +/// - The underlying `list_head` (and entire list) is not modified for the lifetime `'a`.
> +pub struct CList<'a, T, const OFFSET: usize> {
> +    head: &'a CListHead,
> +    _phantom: PhantomData<&'a T>,
> +}

Is there a reason that this is not

    #[repr(transparent)]
    struct CList(CListHead)

? We typically want to avoid putting a reference inside the struct if it can be
on the outside. This allows `&self` to be a single level of reference, not two.

It also means that you can just write `&CList<_>` in many cases instead of
`CList<'_, T>` (plus you get all the benefits of a reference).
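A minimal sketch of that pattern (all names hypothetical), showing why the
lifetime can ride on the reference instead of being stored in the struct:

```rust
#[repr(C)]
struct RawHead {
    next: usize,
    prev: usize,
}

// With #[repr(transparent)], Wrapper has exactly RawHead's layout, so a
// raw pointer to one can be reinterpreted as a reference to the other and
// the borrow's lifetime lives on `&Wrapper`, not inside the struct.
#[repr(transparent)]
struct Wrapper(RawHead);

impl Wrapper {
    /// # Safety
    /// `ptr` must point to a valid `RawHead` that outlives `'a`.
    unsafe fn from_raw<'a>(ptr: *mut RawHead) -> &'a Wrapper {
        // SAFETY: guaranteed by the caller; layouts match per repr(transparent).
        unsafe { &*ptr.cast() }
    }

    fn next(&self) -> usize {
        self.0.next
    }
}

fn main() {
    let mut raw = RawHead { next: 7, prev: 9 };
    // SAFETY: `raw` is a valid local that outlives `w`.
    let w = unsafe { Wrapper::from_raw(&mut raw) };
    println!("{}", w.next()); // prints 7
}
```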

> +
> +impl<'a, T, const OFFSET: usize> CList<'a, T, OFFSET> {
> +    /// Create a typed [`CList`] from a raw sentinel `list_head` pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure
> +    ///   representing a list sentinel.
> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
> +    /// - The list must contain items where the `list_head` field is at byte offset `OFFSET`.
> +    /// - `T` must be `#[repr(transparent)]` over the C struct.
> +    #[inline]
> +    pub unsafe fn from_raw(ptr: *mut bindings::list_head) -> Self {
> +        Self {
> +            // SAFETY: Caller guarantees `ptr` is a valid, sentinel `list_head` object.
> +            head: unsafe { CListHead::from_raw(ptr) },
> +            _phantom: PhantomData,
> +        }
> +    }
> +
> +    /// Get the raw sentinel `list_head` pointer.
> +    #[inline]
> +    pub fn as_raw(&self) -> *mut bindings::list_head {
> +        self.head.as_raw()
> +    }
> +
> +    /// Check if the list is empty.
> +    #[inline]
> +    pub fn is_empty(&self) -> bool {
> +        let raw = self.as_raw();
> +        // SAFETY: self.as_raw() is valid per type invariants.
> +        unsafe { (*raw).next == raw }

`self.head.is_linked()`?

> +    }
> +
> +    /// Create an iterator over typed items.
> +    #[inline]
> +    pub fn iter(&self) -> CListIter<'a, T, OFFSET> {
> +        CListIter {
> +            head_iter: CListHeadIter {
> +                current_head: self.head,
> +                list_head: self.head,
> +            },
> +            _phantom: PhantomData,
> +        }
> +    }
> +}
> +
> +/// High-level iterator over typed list items.
> +pub struct CListIter<'a, T, const OFFSET: usize> {
> +    head_iter: CListHeadIter<'a>,
> +    _phantom: PhantomData<&'a T>,
> +}
> +
> +impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
> +    type Item = &'a T;
> +
> +    fn next(&mut self) -> Option<Self::Item> {
> +        let head = self.head_iter.next()?;
> +
> +        // Convert to item using OFFSET.
> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated using offset_of!)
> +        // is valid per invariants.
> +        Some(unsafe { &*head.as_raw().byte_sub(OFFSET).cast::<T>() })
> +    }
> +}
> +
> +impl<'a, T, const OFFSET: usize> FusedIterator for CListIter<'a, T, OFFSET> {}
> +
> +/// Create a C doubly-circular linked list interface [`CList`] from a raw `list_head` pointer.
> +///
> +/// This macro creates a [`CList<T, OFFSET>`] that can iterate over items of type `$rust_type`
> +/// linked via the `$field` field in the underlying C struct `$c_type`.
> +///
> +/// # Arguments
> +///
> +/// - `$head`: Raw pointer to the sentinel `list_head` object (`*mut bindings::list_head`).
> +/// - `$rust_type`: Each item's rust wrapper type.
> +/// - `$c_type`: Each item's C struct type that contains the embedded `list_head`.
> +/// - `$field`: The name of the `list_head` field within the C struct.
> +///
> +/// # Safety
> +///
> +/// The caller must ensure:
> +/// - `$head` is a valid, initialized sentinel `list_head` pointing to a list that remains
> +///   unmodified for the lifetime of the rust [`CList`].
> +/// - The list contains items of type `$c_type` linked via an embedded `$field`.
> +/// - `$rust_type` is `#[repr(transparent)]` over `$c_type` or has compatible layout.
> +/// - The macro is called from an unsafe block.

This is not a safety requirement; probably lift it out and just say "This is an
unsafe macro."

> +///
> +/// # Examples
> +///
> +/// Refer to the examples in the [`crate::clist`] module documentation.
> +#[macro_export]
> +macro_rules! clist_create {
> +    ($head:expr, $rust_type:ty, $c_type:ty, $($field:tt).+) => {{
> +        // Compile-time check that field path is a list_head.
> +        let _: fn(*const $c_type) -> *const $crate::bindings::list_head =
> +            |p| ::core::ptr::addr_of!((*p).$($field).+);

`&raw const` is preferred now.
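For reference, the two forms yield the same pointer; `&raw const` (stabilized
in Rust 1.82, if I recall) is first-class syntax for what `addr_of!` did as a
macro. A small sketch with a hypothetical struct:

```rust
#[repr(C)]
struct Item {
    value: i32,
    link: i64,
}

fn main() {
    let it = Item { value: 1, link: 2 };
    // Both create a raw pointer to the field without materializing an
    // intermediate `&` reference (which matters for uninitialized fields).
    let a: *const i64 = core::ptr::addr_of!(it.link);
    let b: *const i64 = &raw const it.link;
    println!("{}", a == b); // prints true
}
```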

> +
> +        // Calculate offset and create `CList`.
> +        const OFFSET: usize = ::core::mem::offset_of!($c_type, $($field).+);
> +        $crate::clist::CList::<$rust_type, OFFSET>::from_raw($head)
> +    }};
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index f812cf120042..cd7e6a1055b0 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -75,6 +75,7 @@
>  pub mod bug;
>  #[doc(hidden)]
>  pub mod build_assert;
> +pub mod clist;

Can we keep this pub(crate)?

Best,
Gary

>  pub mod clk;
>  #[cfg(CONFIG_CONFIGFS_FS)]
>  pub mod configfs;


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-20 20:42 ` [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists Joel Fernandes
  2026-01-20 23:48   ` Gary Guo
@ 2026-01-21  7:27   ` Zhi Wang
  2026-01-21 18:12     ` Joel Fernandes
  1 sibling, 1 reply; 71+ messages in thread
From: Zhi Wang @ 2026-01-21  7:27 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On Tue, 20 Jan 2026 15:42:38 -0500
Joel Fernandes <joelagnelf@nvidia.com> wrote:

> Add a new module `clist` for working with C's doubly circular linked
> lists. Provide low-level iteration over list nodes.
> 
> Typed iteration over actual items is provided with a `clist_create`
> macro to assist in creation of the `Clist` type.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---

snip

> +/// Initialize a `list_head` object to point to itself.
> +///
> +/// # Safety
> +///
> +/// `list` must be a valid pointer to a `list_head` object.
> +#[inline]
> +pub unsafe fn init_list_head(list: *mut bindings::list_head) {
> +    // SAFETY: Caller guarantees `list` is a valid pointer to a
> `list_head`.
> +    unsafe {
> +        (*list).next = list;
> +        (*list).prev = list;
> +    }
> +}
> +

Might it be better to have a C helper, since INIT_LIST_HEAD() uses WRITE_ONCE()
for memory ordering? This one does not seem equivalent to it.

Z.

> +/// Wraps a `list_head` object for use in intrusive linked lists.
> +///
> +/// # Invariants
> +///
> +/// - [`CListHead`] represents an allocated and valid `list_head`
> structure. +/// - Once a [`CListHead`] is created in Rust, it will not
> be modified by non-Rust code. +/// - All `list_head` for individual
> items are not modified for the lifetime of [`CListHead`].
> +#[repr(transparent)] +pub struct CListHead(Opaque<bindings::list_head>);
> +
> +impl CListHead {
> +    /// Create a `&CListHead` reference from a raw `list_head` pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `ptr` must be a valid pointer to an allocated and initialized
> `list_head` structure.
> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a
> Self {
> +        // SAFETY:
> +        // - [`CListHead`] has same layout as `list_head`.
> +        // - `ptr` is valid and unmodified for 'a.
> +        unsafe { &*ptr.cast() }
> +    }
> +
> +    /// Get the raw `list_head` pointer.
> +    #[inline]
> +    pub fn as_raw(&self) -> *mut bindings::list_head {
> +        self.0.get()
> +    }
> +
> +    /// Get the next [`CListHead`] in the list.
> +    #[inline]
> +    pub fn next(&self) -> &Self {
> +        let raw = self.as_raw();
> +        // SAFETY:
> +        // - `self.as_raw()` is valid per type invariants.
> +        // - The `next` pointer is guaranteed to be non-NULL.
> +        unsafe { Self::from_raw((*raw).next) }
> +    }
> +
> +    /// Get the previous [`CListHead`] in the list.
> +    #[inline]
> +    pub fn prev(&self) -> &Self {
> +        let raw = self.as_raw();
> +        // SAFETY:
> +        // - self.as_raw() is valid per type invariants.
> +        // - The `prev` pointer is guaranteed to be non-NULL.
> +        unsafe { Self::from_raw((*raw).prev) }
> +    }
> +
> +    /// Check if this node is linked in a list (not isolated).
> +    #[inline]
> +    pub fn is_linked(&self) -> bool {
> +        let raw = self.as_raw();
> +        // SAFETY: self.as_raw() is valid per type invariants.
> +        unsafe { (*raw).next != raw && (*raw).prev != raw }
> +    }
> +
> +    /// Fallible pin-initializer that initializes and then calls user
> closure.
> +    ///
> +    /// Initializes the list head first, then passes `&CListHead` to
> the closure.
> +    /// This hides the raw FFI pointer from the user.
> +    pub fn try_init<E>(
> +        init_func: impl FnOnce(&CListHead) -> Result<(), E>,
> +    ) -> impl PinInit<Self, E> {
> +        // SAFETY: init_list_head initializes the list_head to point to
> itself.
> +        // After initialization, we create a reference to pass to the
> closure.
> +        unsafe {
> +            pin_init::pin_init_from_closure(move |slot: *mut Self| {
> +                init_list_head(slot.cast());
> +                // SAFETY: slot is now initialized, safe to create
> reference.
> +                init_func(&*slot)
> +            })
> +        }
> +    }
> +}
> +
> +// SAFETY: [`CListHead`] can be sent to any thread.
> +unsafe impl Send for CListHead {}
> +
> +// SAFETY: [`CListHead`] can be shared among threads as it is not
> modified +// by non-Rust code per type invariants.
> +unsafe impl Sync for CListHead {}
> +
> +impl PartialEq for CListHead {
> +    fn eq(&self, other: &Self) -> bool {
> +        self.as_raw() == other.as_raw()
> +    }
> +}
> +
> +impl Eq for CListHead {}
> +
> +/// Low-level iterator over `list_head` nodes.
> +///
> +/// An iterator used to iterate over a C intrusive linked list
> (`list_head`). Caller has to +/// perform conversion of returned
> [`CListHead`] to an item (using `container_of` macro or similar). +///
> +/// # Invariants
> +///
> +/// [`CListHeadIter`] is iterating over an allocated, initialized and
> valid list. +struct CListHeadIter<'a> {
> +    current_head: &'a CListHead,
> +    list_head: &'a CListHead,
> +}
> +
> +impl<'a> Iterator for CListHeadIter<'a> {
> +    type Item = &'a CListHead;
> +
> +    #[inline]
> +    fn next(&mut self) -> Option<Self::Item> {
> +        // Advance to next node.
> +        let next = self.current_head.next();
> +
> +        // Check if we've circled back to the sentinel head.
> +        if next == self.list_head {
> +            None
> +        } else {
> +            self.current_head = next;
> +            Some(self.current_head)
> +        }
> +    }
> +}
> +
> +impl<'a> FusedIterator for CListHeadIter<'a> {}
> +
> +/// A typed C linked list with a sentinel head.
> +///
> +/// A sentinel head represents the entire linked list and can be used
> for +/// iteration over items of type `T`, it is not associated with a
> specific item. +///
> +/// The const generic `OFFSET` specifies the byte offset of the
> `list_head` field within +/// the struct that `T` wraps.
> +///
> +/// # Invariants
> +///
> +/// - `head` is an allocated and valid C `list_head` structure that is
> the list's sentinel. +/// - `OFFSET` is the byte offset of the
> `list_head` field within the struct that `T` wraps. +/// - All the
> list's `list_head` nodes are allocated and have valid next/prev
> pointers. +/// - The underlying `list_head` (and entire list) is not
> modified for the lifetime `'a`. +pub struct CList<'a, T, const OFFSET:
> usize> {
> +    head: &'a CListHead,
> +    _phantom: PhantomData<&'a T>,
> +}
> +
> +impl<'a, T, const OFFSET: usize> CList<'a, T, OFFSET> {
> +    /// Create a typed [`CList`] from a raw sentinel `list_head`
> pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `ptr` must be a valid pointer to an allocated and initialized
> `list_head` structure
> +    ///   representing a list sentinel.
> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
> +    /// - The list must contain items where the `list_head` field is at
> byte offset `OFFSET`.
> +    /// - `T` must be `#[repr(transparent)]` over the C struct.
> +    #[inline]
> +    pub unsafe fn from_raw(ptr: *mut bindings::list_head) -> Self {
> +        Self {
> +            // SAFETY: Caller guarantees `ptr` is a valid, sentinel
> `list_head` object.
> +            head: unsafe { CListHead::from_raw(ptr) },
> +            _phantom: PhantomData,
> +        }
> +    }
> +
> +    /// Get the raw sentinel `list_head` pointer.
> +    #[inline]
> +    pub fn as_raw(&self) -> *mut bindings::list_head {
> +        self.head.as_raw()
> +    }
> +
> +    /// Check if the list is empty.
> +    #[inline]
> +    pub fn is_empty(&self) -> bool {
> +        let raw = self.as_raw();
> +        // SAFETY: self.as_raw() is valid per type invariants.
> +        unsafe { (*raw).next == raw }
> +    }
> +
> +    /// Create an iterator over typed items.
> +    #[inline]
> +    pub fn iter(&self) -> CListIter<'a, T, OFFSET> {
> +        CListIter {
> +            head_iter: CListHeadIter {
> +                current_head: self.head,
> +                list_head: self.head,
> +            },
> +            _phantom: PhantomData,
> +        }
> +    }
> +}
> +
> +/// High-level iterator over typed list items.
> +pub struct CListIter<'a, T, const OFFSET: usize> {
> +    head_iter: CListHeadIter<'a>,
> +    _phantom: PhantomData<&'a T>,
> +}
> +
> +impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
> +    type Item = &'a T;
> +
> +    fn next(&mut self) -> Option<Self::Item> {
> +        let head = self.head_iter.next()?;
> +
> +        // Convert to item using OFFSET.
> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated
> using offset_of!)
> +        // is valid per invariants.
> +        Some(unsafe { &*head.as_raw().byte_sub(OFFSET).cast::<T>() })
> +    }
> +}
> +
> +impl<'a, T, const OFFSET: usize> FusedIterator for CListIter<'a, T, OFFSET> {}
> +
> +/// Create a C doubly-circular linked list interface [`CList`] from a raw `list_head` pointer.
> +///
> +/// This macro creates a [`CList<T, OFFSET>`] that can iterate over items of type `$rust_type`
> +/// linked via the `$field` field in the underlying C struct `$c_type`.
> +///
> +/// # Arguments
> +///
> +/// - `$head`: Raw pointer to the sentinel `list_head` object (`*mut bindings::list_head`).
> +/// - `$rust_type`: Each item's Rust wrapper type.
> +/// - `$c_type`: Each item's C struct type that contains the embedded `list_head`.
> +/// - `$field`: The name of the `list_head` field within the C struct.
> +///
> +/// # Safety
> +///
> +/// The caller must ensure:
> +/// - `$head` is a valid, initialized sentinel `list_head` pointing to a list that remains
> +///   unmodified for the lifetime of the Rust [`CList`].
> +/// - The list contains items of type `$c_type` linked via an embedded `$field`.
> +/// - `$rust_type` is `#[repr(transparent)]` over `$c_type` or has a compatible layout.
> +/// - The macro is called from an unsafe block.
> +///
> +/// # Examples
> +///
> +/// Refer to the examples in the [`crate::clist`] module documentation.
> +#[macro_export]
> +macro_rules! clist_create {
> +    ($head:expr, $rust_type:ty, $c_type:ty, $($field:tt).+) => {{
> +        // Compile-time check that field path is a list_head.
> +        let _: fn(*const $c_type) -> *const $crate::bindings::list_head =
> +            |p| ::core::ptr::addr_of!((*p).$($field).+);
> +
> +        // Calculate offset and create `CList`.
> +        const OFFSET: usize = ::core::mem::offset_of!($c_type, $($field).+);
> +        $crate::clist::CList::<$rust_type, OFFSET>::from_raw($head)
> +    }};
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index f812cf120042..cd7e6a1055b0 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -75,6 +75,7 @@
>  pub mod bug;
>  #[doc(hidden)]
>  pub mod build_assert;
> +pub mod clist;
>  pub mod clk;
>  #[cfg(CONFIG_CONFIGFS_FS)]
>  pub mod configfs;


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-20 20:42 ` [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
@ 2026-01-21  8:07   ` Zhi Wang
  2026-01-21 17:52     ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: Zhi Wang @ 2026-01-21  8:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On Tue, 20 Jan 2026 15:42:42 -0500
Joel Fernandes <joelagnelf@nvidia.com> wrote:

> PRAMIN apertures are a crucial mechanism for direct reads/writes to VRAM.
> Add support for them.
> 

I went through the code; this seems not designed for multiple users. As
this is used for writing PTEs for page tables, can you shed some light
on the plan for how we should handle the concurrency of writing multiple
page table PTEs, e.g. when two GPU memory mappings in two different GPU
page tables are proceeding concurrently? This could happen when people
create vGPUs concurrently.

Z.

> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs    |   5 +
>  drivers/gpu/nova-core/mm/pramin.rs | 244 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |   5 +
>  4 files changed, 255 insertions(+)
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
> 
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> new file mode 100644
> index 000000000000..7a5dd4220c67
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -0,0 +1,5 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Memory management subsystems for nova-core.
> +
> +pub(crate) mod pramin;
> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
> new file mode 100644
> index 000000000000..6a7ea2dc7d77
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pramin.rs
> @@ -0,0 +1,244 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct VRAM access through the PRAMIN aperture.
> +//!
> +//! PRAMIN provides a 1MB sliding window into VRAM through BAR0, allowing the CPU to access
> +//! video memory directly. The [`Window`] type automatically repositions the window when
> +//! accessing different VRAM regions and restores the original position on drop. This allows
> +//! the same window to be reused for multiple accesses.
> +//!
> +//! The PRAMIN aperture is a 1MB region at BAR0 + 0x700000 for all GPUs. The window base is
> +//! controlled by the `NV_PBUS_BAR0_WINDOW` register and must be 64KB aligned.
> +//!
> +//! # Examples
> +//!
> +//! ## Basic read/write
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin;
> +//! use kernel::devres::Devres;
> +//! use kernel::sync::Arc;
> +//!
> +//! fn example(devres_bar: Arc<Devres<Bar0>>) -> Result<()> {
> +//!     let mut pram_win = pramin::Window::new(devres_bar)?;
> +//!
> +//!     // Write and read back.
> +//!     pram_win.try_write32(0x100, 0xDEADBEEF)?;
> +//!     let val = pram_win.try_read32(0x100)?;
> +//!     assert_eq!(val, 0xDEADBEEF);
> +//!
> +//!     Ok(())
> +//!     // Original window position restored on drop.
> +//! }
> +//! ```
> +//!
> +//! ## Auto-repositioning across VRAM regions
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin;
> +//! use kernel::devres::Devres;
> +//! use kernel::sync::Arc;
> +//!
> +//! fn example(devres_bar: Arc<Devres<Bar0>>) -> Result<()> {
> +//!     let mut pram_win = pramin::Window::new(devres_bar)?;
> +//!
> +//!     // Access first 1MB region.
> +//!     pram_win.try_write32(0x100, 0x11111111)?;
> +//!
> +//!     // Access at 2MB - window auto-repositions.
> +//!     pram_win.try_write32(0x200000, 0x22222222)?;
> +//!
> +//!     // Back to first region - window repositions again.
> +//!     let val = pram_win.try_read32(0x100)?;
> +//!     assert_eq!(val, 0x11111111);
> +//!
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![allow(unused)]
> +
> +use crate::{
> +    driver::Bar0,
> +    regs, //
> +};
> +
> +use kernel::bits::genmask_u64;
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +use kernel::ptr::{
> +    Alignable,
> +    Alignment, //
> +};
> +use kernel::sizes::{
> +    SZ_1M,
> +    SZ_64K, //
> +};
> +use kernel::sync::Arc;
> +
> +/// PRAMIN aperture base offset in BAR0.
> +const PRAMIN_BASE: usize = 0x700000;
> +
> +/// PRAMIN aperture size (1MB).
> +const PRAMIN_SIZE: usize = SZ_1M;
> +
> +/// 64KB alignment for window base.
> +const WINDOW_ALIGN: Alignment = Alignment::new::<SZ_64K>();
> +
> +/// Maximum addressable VRAM offset (40-bit address space).
> +///
> +/// The `NV_PBUS_BAR0_WINDOW` register has a 24-bit `window_base` field (bits 23:0) that
> +/// stores bits [39:16] of the target VRAM address. This limits the addressable space to
> +/// 2^40 bytes.
> +///
> +/// CAST: On 64-bit systems, this fits in usize.
> +const MAX_VRAM_OFFSET: usize = genmask_u64(0..=39) as usize;
> +
> +/// Generate a PRAMIN read accessor.
> +macro_rules! define_pramin_read {
> +    ($name:ident, $ty:ty) => {
> +        #[doc = concat!("Read a `", stringify!($ty), "` from VRAM at
> the given offset.")]
> +        pub(crate) fn $name(&mut self, vram_offset: usize) ->
> Result<$ty> {
> +            // Compute window parameters without bar reference.
> +            let (bar_offset, new_base) =
> +                self.compute_window(vram_offset, ::core::mem::size_of::<$ty>())?;
> +
> +            // Update window base if needed and perform read.
> +            let bar = self.bar.try_access().ok_or(ENODEV)?;
> +            if let Some(base) = new_base {
> +                Self::write_window_base(&bar, base);
> +                self.current_base = base;
> +            }
> +            bar.$name(bar_offset)
> +        }
> +    };
> +}
> +
> +/// Generate a PRAMIN write accessor.
> +macro_rules! define_pramin_write {
> +    ($name:ident, $ty:ty) => {
> +        #[doc = concat!("Write a `", stringify!($ty), "` to VRAM at the
> given offset.")]
> +        pub(crate) fn $name(&mut self, vram_offset: usize, value: $ty)
> -> Result {
> +            // Compute window parameters without bar reference.
> +            let (bar_offset, new_base) =
> +                self.compute_window(vram_offset, ::core::mem::size_of::<$ty>())?;
> +
> +            // Update window base if needed and perform write.
> +            let bar = self.bar.try_access().ok_or(ENODEV)?;
> +            if let Some(base) = new_base {
> +                Self::write_window_base(&bar, base);
> +                self.current_base = base;
> +            }
> +            bar.$name(value, bar_offset)
> +        }
> +    };
> +}
> +
> +/// PRAMIN window for direct VRAM access.
> +///
> +/// The window auto-repositions when accessing VRAM offsets outside the current 1MB range.
> +/// The original window position is saved on creation and restored on drop.
> +pub(crate) struct Window {
> +    bar: Arc<Devres<Bar0>>,
> +    saved_base: usize,
> +    current_base: usize,
> +}
> +
> +impl Window {
> +    /// Create a new PRAMIN window accessor.
> +    ///
> +    /// Saves the current window position for restoration on drop.
> +    pub(crate) fn new(bar: Arc<Devres<Bar0>>) -> Result<Self> {
> +        let bar_access = bar.try_access().ok_or(ENODEV)?;
> +        let saved_base = Self::try_read_window_base(&bar_access)?;
> +
> +        Ok(Self {
> +            bar,
> +            saved_base,
> +            current_base: saved_base,
> +        })
> +    }
> +
> +    /// Read the current window base from the BAR0_WINDOW register.
> +    fn try_read_window_base(bar: &Bar0) -> Result<usize> {
> +        let reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
> +        let base = u64::from(reg.window_base());
> +        let shifted = base.checked_shl(16).ok_or(EOVERFLOW)?;
> +        shifted.try_into().map_err(|_| EOVERFLOW)
> +    }
> +
> +    /// Write a new window base to the BAR0_WINDOW register.
> +    fn write_window_base(bar: &Bar0, base: usize) {
> +        // CAST:
> +        // - We have guaranteed that the base is within the addressable
> range (40-bits).
> +        // - After >> 16, a 40-bit aligned base becomes 24 bits, which
> fits in u32.
> +        regs::NV_PBUS_BAR0_WINDOW::default()
> +            .set_window_base((base >> 16) as u32)
> +            .write(bar);
> +    }
> +
> +    /// Compute window parameters for a VRAM access.
> +    ///
> +    /// Returns (bar_offset, new_base) where:
> +    /// - bar_offset: The BAR0 offset to use for the access
> +    /// - new_base: Some(base) if window needs repositioning, None otherwise
> +    fn compute_window(
> +        &self,
> +        vram_offset: usize,
> +        access_size: usize,
> +    ) -> Result<(usize, Option<usize>)> {
> +        // Validate VRAM offset is within addressable range (40-bit address space).
> +        let end_offset = vram_offset.checked_add(access_size).ok_or(EINVAL)?;
> +        if end_offset > MAX_VRAM_OFFSET + 1 {
> +            return Err(EINVAL);
> +        }
> +
> +        // Calculate which 64KB-aligned base we need.
> +        let needed_base = vram_offset.align_down(WINDOW_ALIGN);
> +
> +        // Calculate offset within the window.
> +        let offset_in_window = vram_offset - needed_base;
> +
> +        // Check if access fits in 1MB window from this base.
> +        if offset_in_window + access_size > PRAMIN_SIZE {
> +            return Err(EINVAL);
> +        }
> +
> +        // Return bar offset and whether window needs repositioning.
> +        let new_base = if self.current_base != needed_base {
> +            Some(needed_base)
> +        } else {
> +            None
> +        };
> +
> +        Ok((PRAMIN_BASE + offset_in_window, new_base))
> +    }
> +
> +    define_pramin_read!(try_read8, u8);
> +    define_pramin_read!(try_read16, u16);
> +    define_pramin_read!(try_read32, u32);
> +    define_pramin_read!(try_read64, u64);
> +
> +    define_pramin_write!(try_write8, u8);
> +    define_pramin_write!(try_write16, u16);
> +    define_pramin_write!(try_write32, u32);
> +    define_pramin_write!(try_write64, u64);
> +}
> +
> +impl Drop for Window {
> +    fn drop(&mut self) {
> +        // Restore the original window base if it changed.
> +        if self.current_base != self.saved_base {
> +            if let Some(bar) = self.bar.try_access() {
> +                Self::write_window_base(&bar, self.saved_base);
> +            }
> +        }
> +    }
> +}
> +
> +// SAFETY: `Window` requires `&mut self` for all accessors.
> +unsafe impl Send for Window {}
> +
> +// SAFETY: `Window` requires `&mut self` for all accessors.
> +unsafe impl Sync for Window {}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index c1121e7c64c5..3de00db3279e 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -13,6 +13,7 @@
>  mod gfw;
>  mod gpu;
>  mod gsp;
> +mod mm;
>  mod num;
>  mod regs;
>  mod sbuffer;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 82cc6c0790e5..c8b8fbdcf608 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -96,6 +96,11 @@ fn fmt(&self, f: &mut kernel::fmt::Formatter<'_>) -> kernel::fmt::Result {
>      31:16   frts_err_code as u16;
>  });
>  
> +register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control for PRAMIN access" {
> +    25:24   target as u8, "Target memory (0=VRAM, 1=SYS_MEM_COH, 2=SYS_MEM_NONCOH)";
> +    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
> +});
> +
>  // PFB
>  
>  // The following two registers together hold the physical system memory
> address that is used by the


^ permalink raw reply	[flat|nested] 71+ messages in thread
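
The 64KB alignment arithmetic that `compute_window()` performs in the patch above can be sketched standalone. The constants mirror the patch; the simplified function below is an illustration, not the driver code (error handling is reduced to a string):

```rust
// Standalone sketch of the PRAMIN window arithmetic.
const PRAMIN_BASE: usize = 0x70_0000; // aperture offset in BAR0
const PRAMIN_SIZE: usize = 1 << 20;   // 1MB window
const WINDOW_ALIGN: usize = 1 << 16;  // window base must be 64KB aligned

/// Returns (bar_offset, new_base): the BAR0 offset to use for the access,
/// and the window base to program if it differs from `current_base`.
fn compute_window(
    current_base: usize,
    vram_offset: usize,
    access_size: usize,
) -> Result<(usize, Option<usize>), &'static str> {
    // Round the target down to the nearest 64KB boundary.
    let needed_base = vram_offset & !(WINDOW_ALIGN - 1);
    let offset_in_window = vram_offset - needed_base;

    // The access must fit inside the 1MB aperture from that base.
    if offset_in_window + access_size > PRAMIN_SIZE {
        return Err("access does not fit in window");
    }

    // Reposition only when the required base differs from the current one.
    let new_base = (current_base != needed_base).then_some(needed_base);
    Ok((PRAMIN_BASE + offset_in_window, new_base))
}

fn main() {
    // An access at 2MB + 0x100 repositions the window to 2MB.
    let (off, base) = compute_window(0, 0x20_0100, 4).unwrap();
    assert_eq!(off, PRAMIN_BASE + 0x100);
    assert_eq!(base, Some(0x20_0000));

    // A second access in the same 64KB-aligned region reuses the window.
    let (off2, base2) = compute_window(0x20_0000, 0x20_0104, 4).unwrap();
    assert_eq!(off2, PRAMIN_BASE + 0x104);
    assert_eq!(base2, None);
}
```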

* Re: [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums
  2026-01-20 20:42 ` [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums Joel Fernandes
@ 2026-01-21  9:54   ` Zhi Wang
  2026-01-21 18:35     ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: Zhi Wang @ 2026-01-21  9:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian König, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellström, Helge Deller, Danilo Krummrich, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Alexandre Courbot, Andrea Righi,
	Alexey Ivanov, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	joel, nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx,
	intel-gfx, intel-xe, linux-fbdev

On Tue, 20 Jan 2026 15:42:50 -0500
Joel Fernandes <joelagnelf@nvidia.com> wrote:
> Add unified Pte, Pde, and DualPde wrapper enums that abstract over
> MMU v2 and v3 page table entry formats. These enums allow the page
> table walker and VMM to work with both MMU versions.
> 

snip

> +impl DualPde {
> +    /// Create a [`DualPde`] from a raw 128-bit value (two `u64`s) for the given MMU version.
> +    pub(crate) fn new(version: MmuVersion, big: u64, small: u64) -> Self {
> +        match version {
> +            MmuVersion::V2 => Self::V2(ver2::DualPde::new(big, small)),
> +            MmuVersion::V3 => Self::V3(ver3::DualPde::new(big, small)),
> +        }
> +    }
> +
> +    /// Create a [`DualPde`] with only the small page table pointer set.
> +    pub(crate) fn new_small(version: MmuVersion, table_pfn: Pfn) -> Self {
> +        match version {
> +            MmuVersion::V2 => Self::V2(ver2::DualPde::new_small(table_pfn)),
> +            MmuVersion::V3 => Self::V3(ver3::DualPde::new_small(table_pfn)),
> +        }
> +    }
> +
> +    /// Check if the small page table pointer is valid.
> +    pub(crate) fn has_small(&self) -> bool {
> +        match self {
> +            Self::V2(d) => d.has_small(),
> +            Self::V3(d) => d.has_small(),
> +        }
> +    }
> +

Should we also have a has_big here?

Z.

> +    /// Get the small page table VRAM address.
> +    pub(crate) fn small_vram_address(&self) -> VramAddress {
> +        match self {
> +            Self::V2(d) => d.small.table_vram_address(),
> +            Self::V3(d) => d.small.table_vram_address(),
> +        }
> +    }
> +
> +    /// Get the raw `u64` value of the big PDE.
> +    pub(crate) fn big_raw_u64(&self) -> u64 {
> +        match self {
> +            Self::V2(d) => d.big.raw_u64(),
> +            Self::V3(d) => d.big.raw_u64(),
> +        }
> +    }
> +
> +    /// Get the raw `u64` value of the small PDE.
> +    pub(crate) fn small_raw_u64(&self) -> u64 {
> +        match self {
> +            Self::V2(d) => d.small.raw_u64(),
> +            Self::V3(d) => d.small.raw_u64(),
> +        }
> +    }
> +}


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support
  2026-01-20 20:42 ` [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support Joel Fernandes
@ 2026-01-21  9:59   ` Zhi Wang
  2026-01-21 18:45     ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: Zhi Wang @ 2026-01-21  9:59 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian König, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellström, Helge Deller, Danilo Krummrich, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Alexandre Courbot, Andrea Righi,
	Alexey Ivanov, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	joel, nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx,
	intel-gfx, intel-xe, linux-fbdev

On Tue, 20 Jan 2026 15:42:51 -0500
Joel Fernandes <joelagnelf@nvidia.com> wrote:

> Add TLB (Translation Lookaside Buffer) flush support for GPU MMU.
> 

The same concern as in PATCH 5; I guess we need to think about concurrency for
the TLB flush.

> After modifying page table entries, the GPU's TLB must be invalidated
> to ensure the new mappings take effect. The Tlb struct provides flush
> functionality through BAR0 registers.
> 
> The flush operation writes the page directory base address and triggers
> an invalidation, polling for completion with a 2 second timeout matching
> the Nouveau driver.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs |  1 +
>  drivers/gpu/nova-core/mm/tlb.rs | 79 +++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/regs.rs   | 33 ++++++++++++++
>  3 files changed, 113 insertions(+)
>  create mode 100644 drivers/gpu/nova-core/mm/tlb.rs
> 
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> index 6015fc8753bc..39635f2d0156 100644
> --- a/drivers/gpu/nova-core/mm/mod.rs
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -6,6 +6,7 @@
>  
>  pub(crate) mod pagetable;
>  pub(crate) mod pramin;
> +pub(crate) mod tlb;
>  
>  use kernel::sizes::SZ_4K;
>  
> diff --git a/drivers/gpu/nova-core/mm/tlb.rs b/drivers/gpu/nova-core/mm/tlb.rs
> new file mode 100644
> index 000000000000..8b2ee620da18
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/tlb.rs
> @@ -0,0 +1,79 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! TLB (Translation Lookaside Buffer) flush support for GPU MMU.
> +//!
> +//! After modifying page table entries, the GPU's TLB must be flushed to
> +//! ensure the new mappings take effect. This module provides TLB flush
> +//! functionality for virtual memory managers.
> +//!
> +//! # Example
> +//!
> +//! ```ignore
> +//! use crate::mm::tlb::Tlb;
> +//!
> +//! fn page_table_update(tlb: &Tlb, pdb_addr: VramAddress) -> Result<()> {
> +//!     // ... modify page tables ...
> +//!
> +//!     // Flush TLB to make changes visible (polls for completion).
> +//!     tlb.flush(pdb_addr)?;
> +//!
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![allow(dead_code)]
> +
> +use kernel::{
> +    devres::Devres,
> +    io::poll::read_poll_timeout,
> +    prelude::*,
> +    sync::Arc,
> +    time::Delta, //
> +};
> +
> +use crate::{
> +    driver::Bar0,
> +    mm::VramAddress,
> +    regs, //
> +};
> +
> +/// TLB manager for GPU translation buffer operations.
> +pub(crate) struct Tlb {
> +    bar: Arc<Devres<Bar0>>,
> +}
> +
> +impl Tlb {
> +    /// Create a new TLB manager.
> +    pub(super) fn new(bar: Arc<Devres<Bar0>>) -> Self {
> +        Self { bar }
> +    }
> +
> +    /// Flush the GPU TLB for a specific page directory base.
> +    ///
> +    /// This invalidates all TLB entries associated with the given PDB address.
> +    /// Must be called after modifying page table entries to ensure the GPU sees
> +    /// the updated mappings.
> +    pub(crate) fn flush(&self, pdb_addr: VramAddress) -> Result {
> +        let bar = self.bar.try_access().ok_or(ENODEV)?;
> +
> +        // Write PDB address.
> +        regs::NV_TLB_FLUSH_PDB_LO::from_pdb_addr(pdb_addr.raw_u64()).write(&*bar);
> +        regs::NV_TLB_FLUSH_PDB_HI::from_pdb_addr(pdb_addr.raw_u64()).write(&*bar);
> +
> +        // Trigger flush: invalidate all pages and enable.
> +        regs::NV_TLB_FLUSH_CTRL::default()
> +            .set_page_all(true)
> +            .set_enable(true)
> +            .write(&*bar);
> +
> +        // Poll for completion - enable bit clears when flush is done.
> +        read_poll_timeout(
> +            || Ok(regs::NV_TLB_FLUSH_CTRL::read(&*bar)),
> +            |ctrl| !ctrl.enable(),
> +            Delta::ZERO,
> +            Delta::from_secs(2),
> +        )?;
> +
> +        Ok(())
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index c8b8fbdcf608..e722ef837e11 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -414,3 +414,36 @@ pub(crate) mod ga100 {
>          0:0     display_disabled as bool;
>      });
>  }
> +
> +// MMU TLB
> +
> +register!(NV_TLB_FLUSH_PDB_LO @ 0x00b830a0, "TLB flush register: PDB address bits [39:8]" {
> +    31:0    pdb_lo as u32, "PDB address bits [39:8]";
> +});
> +
> +impl NV_TLB_FLUSH_PDB_LO {
> +    /// Create a register value from a PDB address.
> +    ///
> +    /// Extracts bits [39:8] of the address and shifts it right by 8 bits.
> +    pub(crate) fn from_pdb_addr(addr: u64) -> Self {
> +        Self::default().set_pdb_lo(((addr >> 8) & 0xFFFF_FFFF) as u32)
> +    }
> +}
> +
> +register!(NV_TLB_FLUSH_PDB_HI @ 0x00b830a4, "TLB flush register: PDB address bits [47:40]" {
> +    7:0     pdb_hi as u8, "PDB address bits [47:40]";
> +});
> +
> +impl NV_TLB_FLUSH_PDB_HI {
> +    /// Create a register value from a PDB address.
> +    ///
> +    /// Extracts bits [47:40] of the address and shifts it right by 40 bits.
> +    pub(crate) fn from_pdb_addr(addr: u64) -> Self {
> +        Self::default().set_pdb_hi(((addr >> 40) & 0xFF) as u8)
> +    }
> +}
> +
> +register!(NV_TLB_FLUSH_CTRL @ 0x00b830b0, "TLB flush control register" {
> +    0:0     page_all as bool, "Invalidate all pages";
> +    31:31   enable as bool, "Enable/trigger flush (clears when flush completes)";
> +});


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-21  8:07   ` Zhi Wang
@ 2026-01-21 17:52     ` Joel Fernandes
  2026-01-22 23:16       ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 17:52 UTC (permalink / raw)
  To: Zhi Wang
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

Hello, Zhi,

On 1/21/2026 3:07 AM, Zhi Wang wrote:
> On Tue, 20 Jan 2026 15:42:42 -0500
> Joel Fernandes <joelagnelf@nvidia.com> wrote:
> 
>> PRAMIN apertures are a crucial mechanism for direct reads/writes to VRAM.
>> Add support for them.
>>
> 
> I went through the code; this seems not designed for multiple users. As
> this is used for writing PTEs for page tables, can you shed some light
> on the plan for how we should handle the concurrency of writing multiple
> page table PTEs, e.g. when two GPU memory mappings in two different GPU
> page tables are proceeding concurrently? This could happen when people
> create vGPUs concurrently.
Good question. Currently, BarUser::map() requires a mutable reference to both
the BarUser and the GpuMm.

    pub(crate) fn map<'a>(
        &'a mut self,
        mm: &'a mut GpuMm,

GpuMm is owned by the struct Gpu, so from a Rust standpoint this is already
handled, since it is not possible to concurrently manipulate the page table
hierarchy (page directories and the last-level page tables).

But yes, we have to look into concurrency once we have channels and users other
than BAR, where multiple users of the same address space are doing
mapping/unmapping.

I think we can incrementally build on this series to add support for that; it is
not something this series directly addresses, since I have spent the majority of
my time over the last several months making translation *work*, which is itself
no easy task. This series is preliminary, based on that work, and aims to make
BAR1 work. For instance, I kept PRAMIN simple based on feedback that we don't
want to over-complicate things without fully understanding all the requirements.
There are also additional requirements for the locking design that have
implications for DMA fencing, for instance.

Anyway, thinking out loud: for handling concurrency at the page table entry
level (if we ever need it), we could use per-PT spinlocks similar to the Linux
kernel. But let's plan how to do this properly, based on actual requirements.
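
As a rough illustration of that per-PT locking idea (a sketch only: std::sync::Mutex stands in for a kernel spinlock, and none of these types are the driver's):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical sketch: one lock per page table, so two address spaces
// (e.g. two vGPUs) can update their PTEs concurrently while writes to a
// single table stay serialized.
struct PageTable {
    entries: Mutex<Vec<u64>>, // per-PT lock guards the PTE array
}

impl PageTable {
    fn new(len: usize) -> Self {
        Self { entries: Mutex::new(vec![0; len]) }
    }

    fn write_pte(&self, idx: usize, pte: u64) {
        // Lock only this table; other tables remain free to proceed.
        self.entries.lock().unwrap()[idx] = pte;
    }
}

fn main() {
    let tables: Vec<Arc<PageTable>> =
        (0..2).map(|_| Arc::new(PageTable::new(512))).collect();

    // Two "vGPU" mappings proceeding concurrently, each on its own table.
    let handles: Vec<_> = tables
        .iter()
        .cloned()
        .enumerate()
        .map(|(i, pt)| {
            thread::spawn(move || {
                for idx in 0..512 {
                    pt.write_pte(idx, (i as u64) << 32 | idx as u64);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    assert_eq!(tables[0].entries.lock().unwrap()[7], 7);
    assert_eq!(tables[1].entries.lock().unwrap()[7], (1u64 << 32) | 7);
}
```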

-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-21  7:27   ` Zhi Wang
@ 2026-01-21 18:12     ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 18:12 UTC (permalink / raw)
  To: Zhi Wang, Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel, rust-for-linux,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev, Boqun Feng,
	Paul E. McKenney



On 1/21/2026 2:27 AM, Zhi Wang wrote:
> 
>> +/// Initialize a `list_head` object to point to itself.
>> +///
>> +/// # Safety
>> +///
>> +/// `list` must be a valid pointer to a `list_head` object.
>> +#[inline]
>> +pub unsafe fn init_list_head(list: *mut bindings::list_head) {
>> +    // SAFETY: Caller guarantees `list` is a valid pointer to a
>> `list_head`.
>> +    unsafe {
>> +        (*list).next = list;
>> +        (*list).prev = list;
>> +    }
>> +}
>> +
>
> Might it be better to have a C helper, since INIT_LIST_HEAD() uses WRITE_ONCE()
> for memory ordering? This one does not seem equivalent to it.

WRITE_ONCE() is not really about CPU memory ordering though, it is about
compiler optimizations. On the C side, I think it is needed in the case of
list_for_each_entry_rcu(), to avoid invented stores or store fusing, but here
we are not doing RCU-based iteration.

Anyway, if we want to future-proof this, I am OK with adding the helper back
(which I actually had initially, but feedback from a past review was to just
inline it into Rust).

But I am not sure if we have this issue with the Rust compiler, like we do for
C. Rust does not allow raw pointers to be concurrently read/written using plain
accesses, so this should already be protected by the borrow checker and the
compiler itself, right?
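
For illustration, here is a standalone sketch of the two variants (with a stand-in `ListHead` type, not the kernel binding): `ptr::write_volatile` is the closest Rust analogue to WRITE_ONCE(), in that the compiler is not allowed to elide or fuse volatile stores, nor invent extra ones.

```rust
use std::ptr;

// Minimal stand-in for C's `struct list_head`, just for this sketch.
#[repr(C)]
struct ListHead {
    next: *mut ListHead,
    prev: *mut ListHead,
}

/// Plain-store initialization, as in the patch.
unsafe fn init_list_head(list: *mut ListHead) {
    unsafe {
        (*list).next = list;
        (*list).prev = list;
    }
}

/// If a WRITE_ONCE()-like guarantee were ever needed on the Rust side,
/// volatile stores are the closest analogue. Whether that is needed at
/// all is the open question in the thread above.
unsafe fn init_list_head_once(list: *mut ListHead) {
    unsafe {
        ptr::write_volatile(ptr::addr_of_mut!((*list).next), list);
        ptr::write_volatile(ptr::addr_of_mut!((*list).prev), list);
    }
}

fn main() {
    let mut head = ListHead { next: ptr::null_mut(), prev: ptr::null_mut() };
    let p: *mut ListHead = &mut head;
    unsafe {
        init_list_head(p);
        assert_eq!((*p).next, p);
        assert_eq!((*p).prev, p);
        init_list_head_once(p);
        assert_eq!((*p).next, p);
    }
}
```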

Adding some interested folks as well to CC for the topic of _ONCE, +Boqun +Paul.

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums
  2026-01-21  9:54   ` Zhi Wang
@ 2026-01-21 18:35     ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 18:35 UTC (permalink / raw)
  To: Zhi Wang
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian König, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellström, Helge Deller, Danilo Krummrich, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Alexandre Courbot, Andrea Righi,
	Alexey Ivanov, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	joel, nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx,
	intel-gfx, intel-xe, linux-fbdev



On 1/21/2026 4:54 AM, Zhi Wang wrote:
> On Tue, 20 Jan 2026 15:42:50 -0500
> Joel Fernandes <joelagnelf@nvidia.com> wrote:
>> Add unified Pte, Pde, and DualPde wrapper enums that abstract over
>> MMU v2 and v3 page table entry formats. These enums allow the page
>> table walker and VMM to work with both MMU versions.
>>
> 
> snip
> 
>> +impl DualPde {
>> +    /// Create a [`DualPde`] from raw 128-bit value (two `u64`s) for the given MMU version.
>> +    pub(crate) fn new(version: MmuVersion, big: u64, small: u64) -> Self {
>> +        match version {
>> +            MmuVersion::V2 => Self::V2(ver2::DualPde::new(big, small)),
>> +            MmuVersion::V3 => Self::V3(ver3::DualPde::new(big, small)),
>> +        }
>> +    }
>> +
>> +    /// Create a [`DualPde`] with only the small page table pointer set.
>> +    pub(crate) fn new_small(version: MmuVersion, table_pfn: Pfn) -> Self {
>> +        match version {
>> +            MmuVersion::V2 => Self::V2(ver2::DualPde::new_small(table_pfn)),
>> +            MmuVersion::V3 => Self::V3(ver3::DualPde::new_small(table_pfn)),
>> +        }
>> +    }
>> +
>> +    /// Check if the small page table pointer is valid.
>> +    pub(crate) fn has_small(&self) -> bool {
>> +        match self {
>> +            Self::V2(d) => d.has_small(),
>> +            Self::V3(d) => d.has_small(),
>> +        }
>> +    }
>> +
> 
> Should we also have a has_big here?
Good catch, I will add that in, thanks.
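
A `has_big` mirroring `has_small` would presumably look like this (sketched against trimmed-down stand-ins for the real `ver2`/`ver3` types; the low "valid" bit check is hypothetical, not the hardware format):

```rust
// Trimmed-down stand-ins for the ver2/ver3 DualPde types, just to show
// the shape of the missing accessor; fields and the valid bit are
// illustrative only.
struct DualPdeV2 { big: u64, small: u64 }
struct DualPdeV3 { big: u64, small: u64 }

impl DualPdeV2 {
    fn has_big(&self) -> bool { self.big & 1 != 0 } // hypothetical valid bit
}
impl DualPdeV3 {
    fn has_big(&self) -> bool { self.big & 1 != 0 }
}

enum DualPde {
    V2(DualPdeV2),
    V3(DualPdeV3),
}

impl DualPde {
    /// Check if the big page table pointer is valid, mirroring `has_small`.
    fn has_big(&self) -> bool {
        match self {
            Self::V2(d) => d.has_big(),
            Self::V3(d) => d.has_big(),
        }
    }
}

fn main() {
    let pde = DualPde::V2(DualPdeV2 { big: 0x1001, small: 0 });
    assert!(pde.has_big());
    let empty = DualPde::V3(DualPdeV3 { big: 0, small: 0 });
    assert!(!empty.has_big());
}
```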

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support
  2026-01-21  9:59   ` Zhi Wang
@ 2026-01-21 18:45     ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 18:45 UTC (permalink / raw)
  To: Zhi Wang
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian König, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellström, Helge Deller, Danilo Krummrich, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Alexandre Courbot, Andrea Righi,
	Alexey Ivanov, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	joel, nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx,
	intel-gfx, intel-xe, linux-fbdev

Hello, Zhi,

On 1/21/2026 4:59 AM, Zhi Wang wrote:
> On Tue, 20 Jan 2026 15:42:51 -0500
> Joel Fernandes <joelagnelf@nvidia.com> wrote:
> 
>> Add TLB (Translation Lookaside Buffer) flush support for GPU MMU.
>>
> The same concern as in PATCH 5; I guess we need to think about concurrency
> for TLB flush.



Will change:
    pub(crate) fn flush(&self, pdb_addr: VramAddress)

to:
   pub(crate) fn flush(&mut self, pdb_addr: VramAddress)


and also change in mm/mod.rs:
    pub(crate) fn tlb(&self) -> &Tlb {
to:
    pub(crate) fn tlb(&mut self) -> &mut Tlb {

Since TLB operations modify registers, that does make sense to me.

For the buddy allocator, however, I am locking internally so I left it as is:
    /// Access the [`GpuBuddy`] allocator.
    pub(crate) fn buddy(&self) -> &GpuBuddy {
        &self.buddy
    }
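The ownership split being settled here can be sketched with std types standing in for the kernel ones (`Mm`, the field names, and the counters are all invented for illustration): the register-touching `Tlb` is handed out as `&mut`, while `GpuBuddy` serializes internally and stays behind `&self`.

```rust
use std::sync::Mutex;

// Invented stand-in: Tlb mutates hardware state, so it is handed out as &mut.
struct Tlb {
    flushes: u64,
}

impl Tlb {
    fn flush(&mut self) {
        self.flushes += 1; // stands in for the register writes
    }
}

// Invented stand-in: GpuBuddy locks internally, so a shared &GpuBuddy suffices.
struct GpuBuddy {
    free_blocks: Mutex<u64>,
}

impl GpuBuddy {
    fn alloc_block(&self) -> bool {
        let mut free = self.free_blocks.lock().unwrap();
        if *free > 0 {
            *free -= 1;
            true
        } else {
            false
        }
    }
}

struct Mm {
    tlb: Tlb,
    buddy: GpuBuddy,
}

impl Mm {
    // Exclusive access required: TLB operations modify registers.
    fn tlb(&mut self) -> &mut Tlb {
        &mut self.tlb
    }

    // Shared access is fine: the buddy allocator locks internally.
    fn buddy(&self) -> &GpuBuddy {
        &self.buddy
    }
}
```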

-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-20 23:48   ` Gary Guo
@ 2026-01-21 19:50     ` Joel Fernandes
  2026-01-21 20:36       ` Gary Guo
  0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 19:50 UTC (permalink / raw)
  To: Gary Guo, linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Zhi Wang, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

Hello, Gary,

On 1/20/2026 6:48 PM, Gary Guo wrote:
> On Tue Jan 20, 2026 at 8:42 PM GMT, Joel Fernandes wrote:
>> Add a new module `clist` for working with C's doubly circular linked
>> lists. Provide low-level iteration over list nodes.
>>
>> Typed iteration over actual items is provided with a `clist_create`
>> macro to assist in creation of the `Clist` type.
> 
> This should read "CList".

Sure, will fix.

> 
> I was quite dubious about the patch just from the title (everybody knows how
> easy a linked list is in Rust), but it turns out it is not as concerning as I
> expected, mostly due to the read-only nature of the particular implementation
> (a lot of the safety comments would be much more difficult to justify, say, if
> it's mutable). That said, still a lot of feedback below.

Sure, the reason for requiring this is interfacing with lists coming from C
code. I can see a future where we may want it to be mutable too (for example,
Rust code adding elements to an existing list), at which point the
invariants/safety reasoning may change.

> I think something like this is okay in the short term. However, there's a
> growing interest in getting our Rust list API improved, so it would be ideal
> if eventually the Rust list could handle FFI lists, too.

Yeah, we looked into that; if you see the old threads, the conclusion was that
it is not a good fit for the existing Rust list abstractions. TL;DR: it does
not fit into their ownership/borrowing model.

[...]
>> +
>> +/// Initialize a `list_head` object to point to itself.
>> +///
>> +/// # Safety
>> +///
>> +/// `list` must be a valid pointer to a `list_head` object.
>> +#[inline]
>> +pub unsafe fn init_list_head(list: *mut bindings::list_head) {
>> +    // SAFETY: Caller guarantees `list` is a valid pointer to a `list_head`.
>> +    unsafe {
>> +        (*list).next = list;
>> +        (*list).prev = list;
> 
> This needs to be an atomic write or it'll depart from the C implementation.

I am curious what you mean by atomic write, can you define it? Does the Rust
compiler do load/store fusing, invented stores, etc., like C compilers do?
Sorry, I am only familiar with these concepts in C. Could you provide an
example of a race condition in Rust that can happen?

Also I did this addition based on feedback from past review:
https://lore.kernel.org/all/DEI89VUEYXAJ.1IQQPC3QRLITP@nvidia.com/

There were some concerns around pointless function-call overhead, given that
the Rust implementation is already quite intertwined with the internals of the
C linked list implementation. I do agree with that point of view too.

Also see my other reply to Zhi on this helper topic; let's discuss there too,
if that's OK.

>> +    }
>> +}
> 
> I don't think we want to publicly expose this! I've not found a user in the
> subsequent patch, too.

There are 2 users:

    pub fn try_init<E>(

and the self-tests:

//! # let head = head.as_mut_ptr();
//! # // SAFETY: head and all the items are test objects allocated in [..]
//! # unsafe { init_list_head(head) };
//! #

> 
>> +
>> +/// Wraps a `list_head` object for use in intrusive linked lists.
>> +///
>> +/// # Invariants
>> +///
>> +/// - [`CListHead`] represents an allocated and valid `list_head` structure.
>> +/// - Once a [`CListHead`] is created in Rust, it will not be modified by non-Rust code.
>> +/// - All `list_head` for individual items are not modified for the lifetime of [`CListHead`].
>> +#[repr(transparent)]
>> +pub struct CListHead(Opaque<bindings::list_head>);
>> +
>> +impl CListHead {
>> +    /// Create a `&CListHead` reference from a raw `list_head` pointer.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure.
>> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
>> +    #[inline]
>> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a Self {
>> +        // SAFETY:
>> +        // - [`CListHead`] has same layout as `list_head`.
>> +        // - `ptr` is valid and unmodified for 'a.
>> +        unsafe { &*ptr.cast() }
>> +    }
>> +
>> +    /// Get the raw `list_head` pointer.
>> +    #[inline]
>> +    pub fn as_raw(&self) -> *mut bindings::list_head {
>> +        self.0.get()
>> +    }
>> +
>> +    /// Get the next [`CListHead`] in the list.
>> +    #[inline]
>> +    pub fn next(&self) -> &Self {
>> +        let raw = self.as_raw();
>> +        // SAFETY:
>> +        // - `self.as_raw()` is valid per type invariants.
>> +        // - The `next` pointer is guaranteed to be non-NULL.
>> +        unsafe { Self::from_raw((*raw).next) }
>> +    }
>> +
>> +    /// Get the previous [`CListHead`] in the list.
>> +    #[inline]
>> +    pub fn prev(&self) -> &Self {
>> +        let raw = self.as_raw();
>> +        // SAFETY:
>> +        // - self.as_raw() is valid per type invariants.
>> +        // - The `prev` pointer is guaranteed to be non-NULL.
>> +        unsafe { Self::from_raw((*raw).prev) }
>> +    }
>> +
>> +    /// Check if this node is linked in a list (not isolated).
>> +    #[inline]
>> +    pub fn is_linked(&self) -> bool {
>> +        let raw = self.as_raw();
>> +        // SAFETY: self.as_raw() is valid per type invariants.
>> +        unsafe { (*raw).next != raw && (*raw).prev != raw }
> 
> Why is this checking both prev and next? `list_empty` is just
> `READ_ONCE(head->next) == head`.

Sure, I can optimize to just check ->next, that makes sense. Will do.

> 
>> +    }
>> +
>> +    /// Fallible pin-initializer that initializes and then calls user closure.
>> +    ///
>> +    /// Initializes the list head first, then passes `&CListHead` to the closure.
>> +    /// This hides the raw FFI pointer from the user.
>> +    pub fn try_init<E>(
>> +        init_func: impl FnOnce(&CListHead) -> Result<(), E>,
>> +    ) -> impl PinInit<Self, E> {
>> +        // SAFETY: init_list_head initializes the list_head to point to itself.
>> +        // After initialization, we create a reference to pass to the closure.
>> +        unsafe {
>> +            pin_init::pin_init_from_closure(move |slot: *mut Self| {
>> +                init_list_head(slot.cast());
>> +                // SAFETY: slot is now initialized, safe to create reference.
>> +                init_func(&*slot)
> 
> Why is this callback necessary? The user can just create the list head and
> then reference it later? I don't see what this specifically gains over just
> doing
> 
>     fn new() -> impl PinInit<Self>;
> 
> and have user-side
> 
>     list <- CListHead::new(),
>     _: {
>         do_want_ever(&list)
>     }

The list initialization can fail, see the GPU buddy patch:

        // Create pin-initializer that initializes list and allocates blocks.
        let init = try_pin_init!(AllocatedBlocks {
            list <- CListHead::try_init(|list| {
                // Lock while allocating to serialize with concurrent frees.
                let guard = buddy_arc.lock();

                // SAFETY: guard provides exclusive access, list is initialized.
                to_result(unsafe {
                    bindings::gpu_buddy_alloc_blocks(
                        guard.as_raw(),
                        params.start_range_address,
                        params.end_range_address,
                        params.size_bytes,
                        params.min_block_size_bytes,
                        list.as_raw(),
                        params.buddy_flags.as_raw(),
                    )
                })
            }),
            buddy: Arc::clone(&buddy_arc),
            flags: params.buddy_flags,
        });

> 
> 
>> +            })
>> +        }
>> +    }
>> +}
>> +
>> +// SAFETY: [`CListHead`] can be sent to any thread.
>> +unsafe impl Send for CListHead {}
>> +
>> +// SAFETY: [`CListHead`] can be shared among threads as it is not modified
>> +// by non-Rust code per type invariants.
>> +unsafe impl Sync for CListHead {}
>> +
>> +impl PartialEq for CListHead {
>> +    fn eq(&self, other: &Self) -> bool {
>> +        self.as_raw() == other.as_raw()
> 
> Or just `core::ptr::eq(self, other)`

Sure, will fix.

> 
>> +    }
>> +}
>> +
>> +impl Eq for CListHead {}
>> +
>> +/// Low-level iterator over `list_head` nodes.
>> +///
>> +/// An iterator used to iterate over a C intrusive linked list (`list_head`). Caller has to
>> +/// perform conversion of returned [`CListHead`] to an item (using `container_of` macro or similar).
>> +///
>> +/// # Invariants
>> +///
>> +/// [`CListHeadIter`] is iterating over an allocated, initialized and valid list.
>> +struct CListHeadIter<'a> {
>> +    current_head: &'a CListHead,
>> +    list_head: &'a CListHead,
>> +}
>> +
>> +impl<'a> Iterator for CListHeadIter<'a> {
>> +    type Item = &'a CListHead;
>> +
>> +    #[inline]
>> +    fn next(&mut self) -> Option<Self::Item> {
>> +        // Advance to next node.
>> +        let next = self.current_head.next();
>> +
>> +        // Check if we've circled back to the sentinel head.
>> +        if next == self.list_head {
>> +            None
>> +        } else {
>> +            self.current_head = next;
>> +            Some(self.current_head)
>> +        }
> 
> I think this could match the C iterator behaviour. When the iterator is created,
> a `next` is done first, and then subsequently you only need to check if
> `current_head` is `list_head`.
> 
> This is slightly better because the condition check does not need to dereference
> a pointer.

Sure, I can change it to that.
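The suggested shape can be sketched in plain Rust (an illustrative `Node` type, not the kernel's `list_head`): do one `next` at iterator creation, then only compare `current` against the sentinel, so the termination check dereferences nothing.

```rust
use std::marker::PhantomData;

// Minimal stand-in for a C circular list node (not the kernel's list_head).
struct Node {
    next: *const Node,
}

// Iterator following the suggestion: advance once at creation, then yield
// until we come back around to the sentinel head.
struct Iter<'a> {
    current: *const Node,
    head: *const Node,
    _marker: PhantomData<&'a Node>,
}

impl<'a> Iter<'a> {
    // Sketch-level requirement: `head` must be the sentinel of a valid
    // circular list whose nodes outlive `'a`.
    fn new(head: &'a Node) -> Self {
        Iter {
            current: head.next,
            head: head as *const Node,
            _marker: PhantomData,
        }
    }
}

impl<'a> Iterator for Iter<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<&'a Node> {
        if self.current == self.head {
            // Circled back to the sentinel: terminate without dereferencing.
            return None;
        }
        // SAFETY (sketch): nodes form a valid cycle and outlive the iterator.
        let item = unsafe { &*self.current };
        self.current = item.next;
        Some(item)
    }
}
```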
>> +impl<'a> FusedIterator for CListHeadIter<'a> {}
>> +
>> +/// A typed C linked list with a sentinel head.
>> +///
>> +/// A sentinel head represents the entire linked list and can be used for
>> +/// iteration over items of type `T`, it is not associated with a specific item.
>> +///
>> +/// The const generic `OFFSET` specifies the byte offset of the `list_head` field within
>> +/// the struct that `T` wraps.
>> +///
>> +/// # Invariants
>> +///
>> +/// - `head` is an allocated and valid C `list_head` structure that is the list's sentinel.
>> +/// - `OFFSET` is the byte offset of the `list_head` field within the struct that `T` wraps.
>> +/// - All the list's `list_head` nodes are allocated and have valid next/prev pointers.
>> +/// - The underlying `list_head` (and entire list) is not modified for the lifetime `'a`.
>> +pub struct CList<'a, T, const OFFSET: usize> {
>> +    head: &'a CListHead,
>> +    _phantom: PhantomData<&'a T>,
>> +}
> 
> Is there a reason that this is not
> 
>     #[repr(transparent)]
>     struct CList(CListHead)
> 
> ? We typically want to avoid putting reference inside the struct if it can be on
> the outside. This allows `&self` to be a single level of reference, not too.
> 
> It also means that you can just write `&CList<_>` in many cases, and doesn't need
> `CList<'_, T>` (plus all the benefits of a reference).

Sure! Will change to this. I am guessing you mean the following, but please
let me know if you meant something else:

  pub struct CList<T, const OFFSET: usize>(
      CListHead,
      PhantomData<T>,
  );

I don't see any issues with my code using that, at the moment. Will let you know
how it goes.
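The newtype shape being agreed on here can be sketched as follows (illustrative `ListHead` stand-in, not the kernel bindings): holding the data inline rather than a `&'a CListHead` field means callers write `&CList<T, OFFSET>` with a single level of reference.

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;

// Illustrative stand-in for bindings::list_head.
#[repr(C)]
struct ListHead {
    next: *mut ListHead,
    prev: *mut ListHead,
}

#[repr(transparent)]
struct CListHead(UnsafeCell<ListHead>);

// Data held inline instead of a `&'a CListHead` field: callers can now write
// `&CList<T, OFFSET>` and get ordinary borrow semantics for free.
#[repr(transparent)]
struct CList<T, const OFFSET: usize>(CListHead, PhantomData<T>);

impl<T, const OFFSET: usize> CList<T, OFFSET> {
    /// # Safety
    ///
    /// `ptr` must point to a valid sentinel `list_head` that stays valid and
    /// unmodified for `'a`.
    unsafe fn from_raw<'a>(ptr: *mut ListHead) -> &'a Self {
        // repr(transparent) guarantees CList has the layout of ListHead.
        unsafe { &*ptr.cast() }
    }

    fn as_raw(&self) -> *mut ListHead {
        self.0 .0.get()
    }
}
```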
>> +impl<'a, T, const OFFSET: usize> CList<'a, T, OFFSET> {
>> +    /// Create a typed [`CList`] from a raw sentinel `list_head` pointer.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure
>> +    ///   representing a list sentinel.
>> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
>> +    /// - The list must contain items where the `list_head` field is at byte offset `OFFSET`.
>> +    /// - `T` must be `#[repr(transparent)]` over the C struct.
>> +    #[inline]
>> +    pub unsafe fn from_raw(ptr: *mut bindings::list_head) -> Self {
>> +        Self {
>> +            // SAFETY: Caller guarantees `ptr` is a valid, sentinel `list_head` object.
>> +            head: unsafe { CListHead::from_raw(ptr) },
>> +            _phantom: PhantomData,
>> +        }
>> +    }
>> +
>> +    /// Get the raw sentinel `list_head` pointer.
>> +    #[inline]
>> +    pub fn as_raw(&self) -> *mut bindings::list_head {
>> +        self.head.as_raw()
>> +    }
>> +
>> +    /// Check if the list is empty.
>> +    #[inline]
>> +    pub fn is_empty(&self) -> bool {
>> +        let raw = self.as_raw();
>> +        // SAFETY: self.as_raw() is valid per type invariants.
>> +        unsafe { (*raw).next == raw }
> 
> `self.head.is_linked()`?

I'd considered `is_linked()` to be something that makes sense to call only on
`CListHead` objects that belong to a particular "item" node, not a sentinel
node, so that was deliberate.

Though, I am OK with doing it the way you are suggesting too
(`self.head.is_linked()`), since it is functionally equivalent.

>> +    }
>> +
>> +    /// Create an iterator over typed items.
>> +    #[inline]
>> +    pub fn iter(&self) -> CListIter<'a, T, OFFSET> {
>> +        CListIter {
>> +            head_iter: CListHeadIter {
>> +                current_head: self.head,
>> +                list_head: self.head,
>> +            },
>> +            _phantom: PhantomData,
>> +        }
>> +    }
>> +}
>> +
>> +/// High-level iterator over typed list items.
>> +pub struct CListIter<'a, T, const OFFSET: usize> {
>> +    head_iter: CListHeadIter<'a>,
>> +    _phantom: PhantomData<&'a T>,
>> +}
>> +
>> +impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
>> +    type Item = &'a T;
>> +
>> +    fn next(&mut self) -> Option<Self::Item> {
>> +        let head = self.head_iter.next()?;
>> +
>> +        // Convert to item using OFFSET.
>> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated using offset_of!)
>> +        // is valid per invariants.
>> +        Some(unsafe { &*head.as_raw().byte_sub(OFFSET).cast::<T>() })
>> +    }
>> +}
>> +
>> +impl<'a, T, const OFFSET: usize> FusedIterator for CListIter<'a, T, OFFSET> {}
>> +
>> +/// Create a C doubly-circular linked list interface [`CList`] from a raw `list_head` pointer.
>> +///
>> +/// This macro creates a [`CList<T, OFFSET>`] that can iterate over items of type `$rust_type`
>> +/// linked via the `$field` field in the underlying C struct `$c_type`.
>> +///
>> +/// # Arguments
>> +///
>> +/// - `$head`: Raw pointer to the sentinel `list_head` object (`*mut bindings::list_head`).
>> +/// - `$rust_type`: Each item's rust wrapper type.
>> +/// - `$c_type`: Each item's C struct type that contains the embedded `list_head`.
>> +/// - `$field`: The name of the `list_head` field within the C struct.
>> +///
>> +/// # Safety
>> +///
>> +/// The caller must ensure:
>> +/// - `$head` is a valid, initialized sentinel `list_head` pointing to a list that remains
>> +///   unmodified for the lifetime of the rust [`CList`].
>> +/// - The list contains items of type `$c_type` linked via an embedded `$field`.
>> +/// - `$rust_type` is `#[repr(transparent)]` over `$c_type` or has compatible layout.
>> +/// - The macro is called from an unsafe block.
> 
> This is not a safe requirement, probably lift it up and say "This is an unsafe
> macro.".

Sure, so like this then:
  /// This is an unsafe macro. The caller must ensure:
  /// - `$head` is a valid, initialized sentinel `list_head`...
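For context, the OFFSET-based conversion that `clist_create!` and `CListIter` rely on can be sketched in plain Rust (illustrative types only; `Block` and `item_from_head` are invented for this sketch, and the real code uses `byte_sub`):

```rust
use std::mem::offset_of;

// Illustrative stand-in for bindings::list_head.
#[repr(C)]
struct ListHead {
    next: *mut ListHead,
    prev: *mut ListHead,
}

// Illustrative item type: a list node embedded at a non-zero offset.
#[repr(C)]
struct Block {
    size: u64,
    link: ListHead,
}

// The conversion CListIter::next() performs: step back from the embedded
// `list_head` to the containing item (C's container_of), with the offset
// computed at compile time via offset_of!.
unsafe fn item_from_head<T>(head: *const ListHead, offset: usize) -> *const T {
    unsafe { head.cast::<u8>().sub(offset).cast::<T>() }
}
```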

>> +///
>> +/// # Examples
>> +///
>> +/// Refer to the examples in the [`crate::clist`] module documentation.
>> +#[macro_export]
>> +macro_rules! clist_create {
>> +    ($head:expr, $rust_type:ty, $c_type:ty, $($field:tt).+) => {{
>> +        // Compile-time check that field path is a list_head.
>> +        let _: fn(*const $c_type) -> *const $crate::bindings::list_head =
>> +            |p| ::core::ptr::addr_of!((*p).$($field).+);
> 
> `&raw const` is preferred now.

Sure, will fix.

> 
>> +
>> +        // Calculate offset and create `CList`.
>> +        const OFFSET: usize = ::core::mem::offset_of!($c_type, $($field).+);
>> +        $crate::clist::CList::<$rust_type, OFFSET>::from_raw($head)
>> +    }};
>> +}
>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>> index f812cf120042..cd7e6a1055b0 100644
>> --- a/rust/kernel/lib.rs
>> +++ b/rust/kernel/lib.rs
>> @@ -75,6 +75,7 @@
>>  pub mod bug;
>>  #[doc(hidden)]
>>  pub mod build_assert;
>> +pub mod clist;
> 
> Can we keep this pub(crate)?

Yes, will do.

-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-21 19:50     ` Joel Fernandes
@ 2026-01-21 20:36       ` Gary Guo
  2026-01-21 20:41         ` Joel Fernandes
                           ` (2 more replies)
  0 siblings, 3 replies; 71+ messages in thread
From: Gary Guo @ 2026-01-21 20:36 UTC (permalink / raw)
  To: Joel Fernandes, Gary Guo, linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Zhi Wang, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On Wed Jan 21, 2026 at 7:50 PM GMT, Joel Fernandes wrote:
> Hello, Gary,
>
> On 1/20/2026 6:48 PM, Gary Guo wrote:
>> On Tue Jan 20, 2026 at 8:42 PM GMT, Joel Fernandes wrote:
>>> Add a new module `clist` for working with C's doubly circular linked
>>> lists. Provide low-level iteration over list nodes.
>>>
>>> Typed iteration over actual items is provided with a `clist_create`
>>> macro to assist in creation of the `Clist` type.
>> 
>> This should read "CList".
>
> Sure, will fix.
>
>> 
>> I was quite dubious about the patch just from the title (everybody knows how
>> easy a linked list is in Rust), but it turns out it is not as concerning as I
>> expected, mostly due to the read-only nature of the particular implementation
>> (a lot of the safety comments would be much more difficult to justify, say, if
>> it's mutable). That said, still a lot of feedback below.
>
> Sure, the reason for requiring this is interfacing with lists coming from C
> code. I can see a future where we may want it to be mutable too (for example,
> Rust code adding elements to an existing list), at which point the
> invariants/safety reasoning may change.
>
>> I think something like this is okay in the short term. However, there's a
>> growing interest in getting our Rust list API improved, so it would be ideal
>> if eventually the Rust list could handle FFI lists, too.
>
> Yeah, we looked into that; if you see the old threads, the conclusion was that
> it is not a good fit for the existing Rust list abstractions. TL;DR: it does
> not fit into their ownership/borrowing model.

Definitely not with the existing one that we have, as it handles only `Arc`.
But the existing abstraction is also not good enough if you want to insert
`Box`...

>
> [...]
>>> +
>>> +/// Initialize a `list_head` object to point to itself.
>>> +///
>>> +/// # Safety
>>> +///
>>> +/// `list` must be a valid pointer to a `list_head` object.
>>> +#[inline]
>>> +pub unsafe fn init_list_head(list: *mut bindings::list_head) {
>>> +    // SAFETY: Caller guarantees `list` is a valid pointer to a `list_head`.
>>> +    unsafe {
>>> +        (*list).next = list;
>>> +        (*list).prev = list;
>> 
>> This needs to be an atomic write or it'll depart from the C implementation.
>
> I am curious what you mean by atomic write, can you define it? Does the Rust
> compiler do load/store fusing, invented stores, etc., like C compilers do?
> Sorry, I am only familiar with these concepts in C. Could you provide an
> example of a race condition in Rust that can happen?

Oh yes, this would definitely happen. It's all down to LLVM to compile it
anyway. If you create a reference, the compiler has even more freedom to do
these transformations.

>
> Also I did this addition based on feedback from past review:
> https://lore.kernel.org/all/DEI89VUEYXAJ.1IQQPC3QRLITP@nvidia.com/
>
> There were some concerns around pointless function-call overhead, given that
> the Rust implementation is already quite intertwined with the internals of the
> C linked list implementation. I do agree with that point of view too.

Overall our practice is to not duplicate code. Even `ERR_PTR` is calling into
helpers.

For performance, it's a valid concern. However, Alice and I have series out
there that enable inlining the helpers. I'd say unless there's an absolute
need, we should use the helpers, especially with caveats like WRITE_ONCE in
this case.
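The WRITE_ONCE caveat can be made concrete with a sketch (std types, not the kernel bindings; a plain `(*list).next = list` in Rust carries no tearing/fusing guarantee, and a relaxed atomic store is the closest analogue, which is what the C `INIT_LIST_HEAD` helper effectively provides for the `next` store):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Sketch of the concern: C's INIT_LIST_HEAD does
//     WRITE_ONCE(list->next, list); list->prev = list;
// i.e. the `next` store must not be torn, fused, or invented by the compiler.
// A plain Rust field assignment makes no such promise; a relaxed atomic store
// is the closest analogue.
struct ListHead {
    next: AtomicPtr<ListHead>,
    prev: AtomicPtr<ListHead>,
}

fn init_list_head(list: &ListHead) {
    let ptr = list as *const ListHead as *mut ListHead;
    list.next.store(ptr, Ordering::Relaxed); // analogue of WRITE_ONCE()
    list.prev.store(ptr, Ordering::Relaxed); // C uses a plain store here
}
```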

>
> Also see my other reply to Zhi on this helper topic, lets discuss there too, if
> that's Ok.
>
>>> +    }
>>> +}
>> 
>> I don't think we want to publicly expose this! I've not found a user in the
>> subsequent patch, too.
>
> There are 2 users:
>
>     pub fn try_init<E>(
>
> and the self-tests:

This is not really a public user. It's hidden in the doc test, and you could
initialize there using try_init too.

>
> //! # let head = head.as_mut_ptr();
> //! # // SAFETY: head and all the items are test objects allocated in [..]
> //! # unsafe { init_list_head(head) };
> //! #
>
>> 
>>> +
>>> +/// Wraps a `list_head` object for use in intrusive linked lists.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// - [`CListHead`] represents an allocated and valid `list_head` structure.
>>> +/// - Once a [`CListHead`] is created in Rust, it will not be modified by non-Rust code.
>>> +/// - All `list_head` for individual items are not modified for the lifetime of [`CListHead`].
>>> +#[repr(transparent)]
>>> +pub struct CListHead(Opaque<bindings::list_head>);
>>> +
>>> +impl CListHead {
>>> +    /// Create a `&CListHead` reference from a raw `list_head` pointer.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure.
>>> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
>>> +    #[inline]
>>> +    pub unsafe fn from_raw<'a>(ptr: *mut bindings::list_head) -> &'a Self {
>>> +        // SAFETY:
>>> +        // - [`CListHead`] has same layout as `list_head`.
>>> +        // - `ptr` is valid and unmodified for 'a.
>>> +        unsafe { &*ptr.cast() }
>>> +    }
>>> +
>>> +    /// Get the raw `list_head` pointer.
>>> +    #[inline]
>>> +    pub fn as_raw(&self) -> *mut bindings::list_head {
>>> +        self.0.get()
>>> +    }
>>> +
>>> +    /// Get the next [`CListHead`] in the list.
>>> +    #[inline]
>>> +    pub fn next(&self) -> &Self {
>>> +        let raw = self.as_raw();
>>> +        // SAFETY:
>>> +        // - `self.as_raw()` is valid per type invariants.
>>> +        // - The `next` pointer is guaranteed to be non-NULL.
>>> +        unsafe { Self::from_raw((*raw).next) }
>>> +    }
>>> +
>>> +    /// Get the previous [`CListHead`] in the list.
>>> +    #[inline]
>>> +    pub fn prev(&self) -> &Self {
>>> +        let raw = self.as_raw();
>>> +        // SAFETY:
>>> +        // - self.as_raw() is valid per type invariants.
>>> +        // - The `prev` pointer is guaranteed to be non-NULL.
>>> +        unsafe { Self::from_raw((*raw).prev) }
>>> +    }
>>> +
>>> +    /// Check if this node is linked in a list (not isolated).
>>> +    #[inline]
>>> +    pub fn is_linked(&self) -> bool {
>>> +        let raw = self.as_raw();
>>> +        // SAFETY: self.as_raw() is valid per type invariants.
>>> +        unsafe { (*raw).next != raw && (*raw).prev != raw }
>> 
>> Why is this checking both prev and next? `list_empty` is just
>> `READ_ONCE(head->next) == head`.
>
> Sure, I can optimize to just check ->next, that makes sense. Will do.
>

The important part is to make sure we don't deviate from the C implementation.
A copy is already not good, and a difference is worse.

>> 
>>> +    }
>>> +
>>> +    /// Fallible pin-initializer that initializes and then calls user closure.
>>> +    ///
>>> +    /// Initializes the list head first, then passes `&CListHead` to the closure.
>>> +    /// This hides the raw FFI pointer from the user.
>>> +    pub fn try_init<E>(
>>> +        init_func: impl FnOnce(&CListHead) -> Result<(), E>,
>>> +    ) -> impl PinInit<Self, E> {
>>> +        // SAFETY: init_list_head initializes the list_head to point to itself.
>>> +        // After initialization, we create a reference to pass to the closure.
>>> +        unsafe {
>>> +            pin_init::pin_init_from_closure(move |slot: *mut Self| {
>>> +                init_list_head(slot.cast());
>>> +                // SAFETY: slot is now initialized, safe to create reference.
>>> +                init_func(&*slot)
>> 
>> Why is this callback necessary? The user can just create the list head and
>> then reference it later? I don't see what this specifically gains over just
>> doing
>> 
>>     fn new() -> impl PinInit<Self>;
>> 
>> and have user-side
>> 
>>     list <- CListHead::new(),
>>     _: {
>>         do_want_ever(&list)
>>     }
>
> The list initialization can fail, see the GPU buddy patch:
>
>         // Create pin-initializer that initializes list and allocates blocks.
>         let init = try_pin_init!(AllocatedBlocks {
>             list <- CListHead::try_init(|list| {
>                 // Lock while allocating to serialize with concurrent frees.
>                 let guard = buddy_arc.lock();
>
>                 // SAFETY: guard provides exclusive access, list is initialized.
>                 to_result(unsafe {
>                     bindings::gpu_buddy_alloc_blocks(
>                         guard.as_raw(),
>                         params.start_range_address,
>                         params.end_range_address,
>                         params.size_bytes,
>                         params.min_block_size_bytes,
>                         list.as_raw(),
>                         params.buddy_flags.as_raw(),
>                     )
>                 })
>             }),
>             buddy: Arc::clone(&buddy_arc),
>             flags: params.buddy_flags,
>         });

The list initialization doesn't fail? It's the subsequent action you did that
failed.

You can put failing things in the `_: { ... }` block too.

>
>> 
>> 
>>> +            })
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +// SAFETY: [`CListHead`] can be sent to any thread.
>>> +unsafe impl Send for CListHead {}
>>> +
>>> +// SAFETY: [`CListHead`] can be shared among threads as it is not modified
>>> +// by non-Rust code per type invariants.
>>> +unsafe impl Sync for CListHead {}
>>> +
>>> +impl PartialEq for CListHead {
>>> +    fn eq(&self, other: &Self) -> bool {
>>> +        self.as_raw() == other.as_raw()
>> 
>> Or just `core::ptr::eq(self, other)`
>
> Sure, will fix.
>
>> 
>>> +    }
>>> +}
>>> +
>>> +impl Eq for CListHead {}
>>> +
>>> +/// Low-level iterator over `list_head` nodes.
>>> +///
>>> +/// An iterator used to iterate over a C intrusive linked list (`list_head`). Caller has to
>>> +/// perform conversion of returned [`CListHead`] to an item (using `container_of` macro or similar).
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// [`CListHeadIter`] is iterating over an allocated, initialized and valid list.
>>> +struct CListHeadIter<'a> {
>>> +    current_head: &'a CListHead,
>>> +    list_head: &'a CListHead,
>>> +}
>>> +
>>> +impl<'a> Iterator for CListHeadIter<'a> {
>>> +    type Item = &'a CListHead;
>>> +
>>> +    #[inline]
>>> +    fn next(&mut self) -> Option<Self::Item> {
>>> +        // Advance to next node.
>>> +        let next = self.current_head.next();
>>> +
>>> +        // Check if we've circled back to the sentinel head.
>>> +        if next == self.list_head {
>>> +            None
>>> +        } else {
>>> +            self.current_head = next;
>>> +            Some(self.current_head)
>>> +        }
>> 
>> I think this could match the C iterator behaviour. When the iterator is created,
>> a `next` is done first, and then subsequently you only need to check if
>> `current_head` is `list_head`.
>> 
>> This is slightly better because the condition check does not need to dereference
>> a pointer.
>
> Sure, I can change it to that.
>>> +impl<'a> FusedIterator for CListHeadIter<'a> {}
>>> +
>>> +/// A typed C linked list with a sentinel head.
>>> +///
>>> +/// A sentinel head represents the entire linked list and can be used for
>>> +/// iteration over items of type `T`, it is not associated with a specific item.
>>> +///
>>> +/// The const generic `OFFSET` specifies the byte offset of the `list_head` field within
>>> +/// the struct that `T` wraps.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// - `head` is an allocated and valid C `list_head` structure that is the list's sentinel.
>>> +/// - `OFFSET` is the byte offset of the `list_head` field within the struct that `T` wraps.
>>> +/// - All the list's `list_head` nodes are allocated and have valid next/prev pointers.
>>> +/// - The underlying `list_head` (and entire list) is not modified for the lifetime `'a`.
>>> +pub struct CList<'a, T, const OFFSET: usize> {
>>> +    head: &'a CListHead,
>>> +    _phantom: PhantomData<&'a T>,
>>> +}
>> 
>> Is there a reason that this is not
>> 
>>     #[repr(transparent)]
>>     struct CList(CListHead)
>> 
>> ? We typically want to avoid putting reference inside the struct if it can be on
>> the outside. This allows `&self` to be a single level of reference, not too.
>> 
>> It also means that you can just write `&CList<_>` in many cases, and doesn't need
>> `CList<'_, T>` (plus all the benefits of a reference).
>
> Sure! Will change to this. I am guessing you mean the following, but please let
> me know if you meant something else:
>
>   pub struct CList<T, const OFFSET: usize>(
>       CListHead,
>       PhantomData<T>,
>   );
>
> I don't see any issues with my code using that, at the moment. Will let you know
> how it goes.

Yes, with `#[repr(transparent)]`.

>>> +impl<'a, T, const OFFSET: usize> CList<'a, T, OFFSET> {
>>> +    /// Create a typed [`CList`] from a raw sentinel `list_head` pointer.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - `ptr` must be a valid pointer to an allocated and initialized `list_head` structure
>>> +    ///   representing a list sentinel.
>>> +    /// - `ptr` must remain valid and unmodified for the lifetime `'a`.
>>> +    /// - The list must contain items where the `list_head` field is at byte offset `OFFSET`.
>>> +    /// - `T` must be `#[repr(transparent)]` over the C struct.
>>> +    #[inline]
>>> +    pub unsafe fn from_raw(ptr: *mut bindings::list_head) -> Self {
>>> +        Self {
>>> +            // SAFETY: Caller guarantees `ptr` is a valid, sentinel `list_head` object.
>>> +            head: unsafe { CListHead::from_raw(ptr) },
>>> +            _phantom: PhantomData,
>>> +        }
>>> +    }
>>> +
>>> +    /// Get the raw sentinel `list_head` pointer.
>>> +    #[inline]
>>> +    pub fn as_raw(&self) -> *mut bindings::list_head {
>>> +        self.head.as_raw()
>>> +    }
>>> +
>>> +    /// Check if the list is empty.
>>> +    #[inline]
>>> +    pub fn is_empty(&self) -> bool {
>>> +        let raw = self.as_raw();
>>> +        // SAFETY: self.as_raw() is valid per type invariants.
>>> +        unsafe { (*raw).next == raw }
>> 
>> `self.head.is_linked()`?
>
> I'd considered `is_linked()` to be something that makes sense to call only on
> `CListHead` objects that belong to a particular "item" node, not a sentinel
> node, so that was deliberate.
>
> Though, I am Ok with doing it the way you are suggesting too
> (`self.head.is_linked()`), since it is functionally equivalent.
>
>>> +    }
>>> +
>>> +    /// Create an iterator over typed items.
>>> +    #[inline]
>>> +    pub fn iter(&self) -> CListIter<'a, T, OFFSET> {
>>> +        CListIter {
>>> +            head_iter: CListHeadIter {
>>> +                current_head: self.head,
>>> +                list_head: self.head,
>>> +            },
>>> +            _phantom: PhantomData,
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/// High-level iterator over typed list items.
>>> +pub struct CListIter<'a, T, const OFFSET: usize> {
>>> +    head_iter: CListHeadIter<'a>,
>>> +    _phantom: PhantomData<&'a T>,
>>> +}
>>> +
>>> +impl<'a, T, const OFFSET: usize> Iterator for CListIter<'a, T, OFFSET> {
>>> +    type Item = &'a T;
>>> +
>>> +    fn next(&mut self) -> Option<Self::Item> {
>>> +        let head = self.head_iter.next()?;
>>> +
>>> +        // Convert to item using OFFSET.
>>> +        // SAFETY: `item_ptr` calculation from `OFFSET` (calculated using offset_of!)
>>> +        // is valid per invariants.
>>> +        Some(unsafe { &*head.as_raw().byte_sub(OFFSET).cast::<T>() })
>>> +    }
>>> +}
>>> +
>>> +impl<'a, T, const OFFSET: usize> FusedIterator for CListIter<'a, T, OFFSET> {}
>>> +
>>> +/// Create a C doubly-circular linked list interface [`CList`] from a raw `list_head` pointer.
>>> +///
>>> +/// This macro creates a [`CList<T, OFFSET>`] that can iterate over items of type `$rust_type`
>>> +/// linked via the `$field` field in the underlying C struct `$c_type`.
>>> +///
>>> +/// # Arguments
>>> +///
>>> +/// - `$head`: Raw pointer to the sentinel `list_head` object (`*mut bindings::list_head`).
>>> +/// - `$rust_type`: Each item's rust wrapper type.
>>> +/// - `$c_type`: Each item's C struct type that contains the embedded `list_head`.
>>> +/// - `$field`: The name of the `list_head` field within the C struct.
>>> +///
>>> +/// # Safety
>>> +///
>>> +/// The caller must ensure:
>>> +/// - `$head` is a valid, initialized sentinel `list_head` pointing to a list that remains
>>> +///   unmodified for the lifetime of the rust [`CList`].
>>> +/// - The list contains items of type `$c_type` linked via an embedded `$field`.
>>> +/// - `$rust_type` is `#[repr(transparent)]` over `$c_type` or has compatible layout.
>>> +/// - The macro is called from an unsafe block.
>> 
>> This is not a safety requirement; probably lift it up and say "This is an unsafe
>> macro.".
>
> Sure, so like this then:
>   /// This is an unsafe macro. The caller must ensure:
>   /// - `$head` is a valid, initialized sentinel `list_head`...

Yes.

Best,
Gary

>
>>> +///
>>> +/// # Examples
>>> +///
>>> +/// Refer to the examples in the [`crate::clist`] module documentation.
>>> +#[macro_export]
>>> +macro_rules! clist_create {
>>> +    ($head:expr, $rust_type:ty, $c_type:ty, $($field:tt).+) => {{
>>> +        // Compile-time check that field path is a list_head.
>>> +        let _: fn(*const $c_type) -> *const $crate::bindings::list_head =
>>> +            |p| ::core::ptr::addr_of!((*p).$($field).+);
>> 
>> `&raw const` is preferred now.
>
> Sure, will fix.
>
>> 
>>> +
>>> +        // Calculate offset and create `CList`.
>>> +        const OFFSET: usize = ::core::mem::offset_of!($c_type, $($field).+);
>>> +        $crate::clist::CList::<$rust_type, OFFSET>::from_raw($head)
>>> +    }};
>>> +}
>>> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
>>> index f812cf120042..cd7e6a1055b0 100644
>>> --- a/rust/kernel/lib.rs
>>> +++ b/rust/kernel/lib.rs
>>> @@ -75,6 +75,7 @@
>>>  pub mod bug;
>>>  #[doc(hidden)]
>>>  pub mod build_assert;
>>> +pub mod clist;
>> 
>> Can we keep this pub(crate)?
>
> Yes, will do.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-21 20:36       ` Gary Guo
@ 2026-01-21 20:41         ` Joel Fernandes
  2026-01-21 20:46         ` Joel Fernandes
  2026-01-25  1:51         ` Joel Fernandes
  2 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 20:41 UTC (permalink / raw)
  To: Gary Guo, linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Zhi Wang, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev



On 1/21/2026 3:36 PM, Gary Guo wrote:
>> There are 2 users:
>>
>>     pub fn try_init<E>(
>>
>> and the self-tests:
> This is not really a public user. It's hidden in the doc test too, you could
> initialize using try_init too.
> 
>> //! # let head = head.as_mut_ptr();
>> //! # // SAFETY: head and all the items are test objects allocated in [..]
>> //! # unsafe { init_list_head(head) };
>> //! #

True, but if we initialize purely within try_init() without using a helper, does
that not defeat the argument of adding a separate INIT_LIST_HEAD helper such
that we don't deviate from the C side?

Regarding your other comment about the try_init block itself, I will take a look
at your suggestion and see if I can simplify.

-- 
Joel Fernandes



* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-21 20:36       ` Gary Guo
  2026-01-21 20:41         ` Joel Fernandes
@ 2026-01-21 20:46         ` Joel Fernandes
  2026-01-25  1:51         ` Joel Fernandes
  2 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-21 20:46 UTC (permalink / raw)
  To: Gary Guo, linux-kernel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Zhi Wang, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev



On 1/21/2026 3:36 PM, Gary Guo wrote:
>> [...]
>>>> +
>>>> +/// Initialize a `list_head` object to point to itself.
>>>> +///
>>>> +/// # Safety
>>>> +///
>>>> +/// `list` must be a valid pointer to a `list_head` object.
>>>> +#[inline]
>>>> +pub unsafe fn init_list_head(list: *mut bindings::list_head) {
>>>> +    // SAFETY: Caller guarantees `list` is a valid pointer to a `list_head`.
>>>> +    unsafe {
>>>> +        (*list).next = list;
>>>> +        (*list).prev = list;
>>>
>>> This needs to be an atomic write or it'll depart from the C implementation.
>> I am curious what you mean by atomic write, can you define it? Does the Rust
>> compiler have load/store fusing, invented stores, etc., like C does? Sorry, I am
>> only familiar with these concepts in C. Could you provide an example of a race
>> condition in Rust that can happen?
>
> Oh yes, this would definitely happen. It's down to LLVM to compile anyway. If
> you create a reference, there'll be even more freedom to do these.
>

Ok.

>> Also I did this addition based on feedback from past review:
>> https://lore.kernel.org/all/DEI89VUEYXAJ.1IQQPC3QRLITP@nvidia.com/
>>
>> There was some concerns around pointless function call overhead when the rust
>> implementation is already quite intertwined with internals of the C linked list
>> implementation. I do agree with that point of view too.
>
> Overall our practice is to not duplicate code. Even `ERR_PTR` is calling into
> helpers.
> 
> For performance, it's a valid concern. However Alice and I have series out there
> that enable you to inline the helpers. I'd say unless there's an absolute need,
> we should do the helpers. Especially with caveats like WRITE_ONCE in this case.

Sounds good, so I will then go back to adding a INIT_LIST_HEAD C helper for the
next spin. I agree with the suggestion and now that we are inlining helpers,
there seems little point in adding a separate rust function to do the same.

-- 
Joel Fernandes



* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-21 17:52     ` Joel Fernandes
@ 2026-01-22 23:16       ` Joel Fernandes
  2026-01-23 10:13         ` Zhi Wang
  2026-01-28 12:04         ` Danilo Krummrich
  0 siblings, 2 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-22 23:16 UTC (permalink / raw)
  To: Zhi Wang
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian Koenig, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Danilo Krummrich,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Wed, 21 Jan 2026 12:52:10 -0500, Joel Fernandes wrote:
> I think we can incrementally build on this series to add support for the same;
> it is not something this series directly addresses, since I have spent the
> majority of my time over the last several months making translation *work*,
> which is itself no easy task. This series is just preliminary, based on work
> from the last several months, to make BAR1 work. For instance, I kept PRAMIN
> simple based on feedback that we don't want to over-complicate without fully
> understanding all the requirements. There are also additional requirements for
> the locking design that have implications for DMA fencing etc., for instance.
>
> Anyway thinking out loud, I am thinking for handling concurrency at the page
> table entry level (if we ever need it), we could use per-PT spinlocks similar to
> the Linux kernel. But lets plan on how to do this properly and based on actual
> requirements.

Thanks for the discussion on concurrency, Zhi.

My plan is to make TLB and PRAMIN use immutable references in their function
calls and then implement internal locking. I've already done this for the GPU
buddy functions, so it should be doable, and we'll keep it consistent. As a
result, we will have finer-grained locking on the memory management objects
instead of having to globally lock a common GpuMm object. I'll plan on
doing this for v7.

Also, the PTE allocation race you mentioned is already handled by PRAMIN
serialization. Since threads must hold the PRAMIN lock to write page table
entries, concurrent writers are not possible:

  Thread A: acquire PRAMIN lock
  Thread A: read PDE (via PRAMIN) -> NULL
  Thread A: alloc PT page, write PDE
  Thread A: release PRAMIN lock

  Thread B: acquire PRAMIN lock
  Thread B: read PDE (via PRAMIN) -> sees A's pointer
  Thread B: uses existing PT page, no allocation needed

No atomic compare-and-swap on VRAM is needed because the PRAMIN lock serializes
access. Please let me know if you had a different scenario in mind, but I think
this covers it.

Zhi, feel free to use v6 though for any testing you are doing while I
rework the locking.

-- 
Joel Fernandes


* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-22 23:16       ` Joel Fernandes
@ 2026-01-23 10:13         ` Zhi Wang
  2026-01-23 12:59           ` Joel Fernandes
  2026-01-28 12:04         ` Danilo Krummrich
  1 sibling, 1 reply; 71+ messages in thread
From: Zhi Wang @ 2026-01-23 10:13 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Danilo Krummrich, Alice Ryhl, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Alistair Popple,
	Alexandre Courbot, Andrea Righi, Alexey Ivanov, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel, rust-for-linux,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev

On Thu, 22 Jan 2026 18:16:00 -0500
Joel Fernandes <joelagnelf@nvidia.com> wrote:

> On Wed, 21 Jan 2026 12:52:10 -0500, Joel Fernandes wrote:
> > I think we can incrementally build on this series to add support for
> > the same, it is not something this series directly addresses since I
> > have spent the majority of my time over the last several months making
> > translation *work*, which is itself no easy task. This series is just preliminary
> > based on work from last several months and to make BAR1 work. For
> > instance, I kept PRAMIN simple based on feedback that we don't want to
> > over complicate without fully understanding all the requirements.
> > There is also additional requirements for locking design that have
> > implications with DMA fencing etc, for instance.
> >
> > Anyway thinking out loud, I am thinking for handling concurrency at
> > the page table entry level (if we ever need it), we could use per-PT
> > spinlocks similar to the Linux kernel. But lets plan on how to do this
> > properly and based on actual requirements.
> 
> Thanks for the discussion on concurrency, Zhi.
> 
> My plan is to make TLB and PRAMIN use immutable references in their
> function calls and then implement internal locking. I've already done
> this for the GPU buddy functions, so it should be doable, and we'll keep
> it consistent. As a result, we will have finer-grain locking on the
> memory management objects instead of requiring to globally lock a common
> GpuMm object. I'll plan on doing this for v7.
> 
> Also, the PTE allocation race you mentioned is already handled by PRAMIN
> serialization. Since threads must hold the PRAMIN lock to write page
> table entries, concurrent writers are not possible:
> 
>   Thread A: acquire PRAMIN lock
>   Thread A: read PDE (via PRAMIN) -> NULL
>   Thread A: alloc PT page, write PDE
>   Thread A: release PRAMIN lock
> 
>   Thread B: acquire PRAMIN lock
>   Thread B: read PDE (via PRAMIN) -> sees A's pointer
>   Thread B: uses existing PT page, no allocation needed
> 
> No atomic compare-and-swap on VRAM is needed because the PRAMIN lock
> serializes access. Please let me know if you had a different scenario in
> mind, but I think this covers it.
> 
> Zhi, feel free to use v6 though for any testing you are doing while I
> rework the locking.
> 

Hi Joel:

Thanks so much for the work and the discussion. It is a super important
effort for me in moving the vGPU work forward. :)

As we discussed, the concurrency matters most when booting multiple vGPUs.
At that time, the concurrency happens at:

1) Allocating GPU memory chunks
2) Reserving GPU channels
3) Mapping GPU memory to BAR1 page table

We basically need some kind of protection there, e.g. Guard/Access on immutable
references, backed by a mutex. I believe there shouldn't be a
non-sleepable path reaching those. This should be fine.

I can see you are thinking of a fine-granularity locking scheme, which I
think is the right direction to go. I agree with the above two locks.

For 1), I recall that you mentioned there is some lock protection
already there.

For 2), We can think of it when reaching there.

However, for 3), we need to have one there as well, besides the above two
locks. Do you already have one in the GPU VA allocator?

If yes, the above two locks should be good enough for now, IMO.

Z.


* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-23 10:13         ` Zhi Wang
@ 2026-01-23 12:59           ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-23 12:59 UTC (permalink / raw)
  To: Zhi Wang
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Danilo Krummrich, Alice Ryhl, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Alistair Popple,
	Alexandre Courbot, Andrea Righi, Alexey Ivanov, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel, rust-for-linux,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev

On Fri, 23 Jan 2026 12:13:43 +0200, Zhi Wang wrote:
> Thanks so much for the work and the discussion. It is super important
> efforts for me to move on for the vGPU work. :)

Great!

> As we discussed, the concurrency matters most when booting multiple vGPUs.
> At that time, the concurrency happens at:
>
> 1) Allocating GPU memory chunks
> 2) Reserving GPU channels
> 3) Mapping GPU memory to BAR1 page table

Yes, all of these are already covered from a concurrency PoV in v6.

> I can see you are thinking of fine-granularity locking scheme, which I
> think is the right direction to go. I agreed with the above two locks.

Cool!

> However for 3), We need to have one there as well beside the above two
> locks. Have you already had one in the GPU VA allocator?

Currently, for mapping BAR pages, you need a mutable reference to BarUser.
In the future, when we have multiple channels sharing the same VA space,
it will still be protected because the VA space allocator (virt_buddy)
already has internal locking. And each map_page is protected as
well. So I believe that should also be covered. Thanks for checking.

> If yes, the above two locks should be good enough so far. IMO.

Ok, thanks for checking.

--
Joel Fernandes


* Re: [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists
  2026-01-21 20:36       ` Gary Guo
  2026-01-21 20:41         ` Joel Fernandes
  2026-01-21 20:46         ` Joel Fernandes
@ 2026-01-25  1:51         ` Joel Fernandes
  2 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-25  1:51 UTC (permalink / raw)
  To: Gary Guo
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Zhi Wang, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel, rust-for-linux,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev

On Wed, Jan 21, 2026 at 08:36:05PM +0000, Gary Guo wrote:
>>> Why is this callback necessary? The user can just create the list head and
>>> then reference it later? I don't see what this specifically gains over just
>>> doing
>>> 
>>>     fn new() -> impl PinInit<Self>;
>>> 
>>> and have user-side
>>> 
>>>     list <- CListHead::new(),
>>>     _: {
>>>         do_want_ever(&list)
>>>     }
>>
>> The list initialization can fail, see the GPU buddy patch:
>>
>>         // Create pin-initializer that initializes list and allocates blocks.
>>         let init = try_pin_init!(AllocatedBlocks {
>>             list <- CListHead::try_init(|list| {
>>                 // Lock while allocating to serialize with concurrent frees.
>>                 let guard = buddy_arc.lock();
>>
>>                 // SAFETY: guard provides exclusive access, list is initialized.
>>                 to_result(unsafe {
>>                     bindings::gpu_buddy_alloc_blocks(
>>                         guard.as_raw(),
>>                         params.start_range_address,
>>                         params.end_range_address,
>>                         params.size_bytes,
>>                         params.min_block_size_bytes,
>>                         list.as_raw(),
>>                         params.buddy_flags.as_raw(),
>>                     )
>>                 })
>>             }),
>>             buddy: Arc::clone(&buddy_arc),
>>             flags: params.buddy_flags,
>>         });
> 
> The list initialization doesn't fail? It's the subsequent action you did that
> failed.
> 
> You can put failing things in the `_: { ... }` block too.

This worked out well, thanks for the suggestion! I've updated the code
to use `CListHead::new()` with the failable allocation in a `_: { ... }` block:

        let init = try_pin_init!(AllocatedBlocks {
            buddy: Arc::clone(&buddy_arc),
            list <- CListHead::new(),
            flags: params.buddy_flags,
            _: {
                // Lock while allocating to serialize with concurrent frees.
                let guard = buddy.lock();

                // SAFETY: `guard` provides exclusive access to the buddy allocator.
                to_result(unsafe {
                    bindings::gpu_buddy_alloc_blocks(
                        guard.as_raw(),
                        params.start_range_address,
                        params.end_range_address,
                        params.size_bytes,
                        params.min_block_size_bytes,
                        list.as_raw(),
                        params.buddy_flags.as_raw(),
                    )
                })?
            }
        });

I'll remove the try_init() method from CListHead since new() is sufficient.

-- 
Joel Fernandes



* Re: [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6)
  2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
                   ` (25 preceding siblings ...)
  2026-01-20 20:43 ` [PATCH RFC v6 26/26] nova-core: mm: Add BarUser to struct Gpu and create at boot Joel Fernandes
@ 2026-01-28 11:37 ` Danilo Krummrich
  2026-01-28 12:44   ` Joel Fernandes
  26 siblings, 1 reply; 71+ messages in thread
From: Danilo Krummrich @ 2026-01-28 11:37 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, John Hubbard, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Zhi Wang, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, joel, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On Tue Jan 20, 2026 at 9:42 PM CET, Joel Fernandes wrote:
> This series is rebased on drm-rust-kernel/drm-rust-next and provides memory
> management infrastructure for the nova-core GPU driver. It combines several
> previous series and provides a foundation for nova GPU memory management
> including page tables, virtual memory management, and BAR mapping. All these
> are critical nova-core features.

Thanks for this work, I will go through the series soon. (Although it would also
be nice to have what I mention below addressed first.)

> The series includes:
> - A Rust module (CList) to interface with C circular linked lists, required
>   for iterating over buddy allocator blocks.
> - Movement of the DRM buddy allocator up to drivers/gpu/ level, renamed to GPU buddy.
> - Rust bindings for the GPU buddy allocator.
> - PRAMIN aperture support for direct VRAM access.
> - Page table types for MMU v2 and v3 formats.
> - Virtual Memory Manager (VMM) for GPU virtual address space management.
> - BAR1 user interface for mapping and accessing the GPU via virtual memory.
> - Selftests for PRAMIN and BAR1 user interface (disabled by default).
>
> Changes from v5 to v6:
> - Rebased on drm-rust-kernel/drm-rust-next
> - Added page table types and page table walker infrastructure
> - Added Virtual Memory Manager (VMM)
> - Added BAR1 user interface
> - Added TLB flush support
> - Added GpuMm memory manager
> - Extended to 26 patches from 6 (full mm infrastructure now included)
>
> The git tree with all patches can be found at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (tag: nova-mm-v6-20260120)
>
> Link to v5: https://lore.kernel.org/all/20251219203805.1246586-1-joelagnelf@nvidia.com/
>
> Previous series that are combined:
> - v4 (clist + buddy): https://lore.kernel.org/all/20251204215129.2357292-1-joelagnelf@nvidia.com/
> - v3 (clist only): https://lore.kernel.org/all/20251129213056.4021375-1-joelagnelf@nvidia.com/
> - v2 (clist only): https://lore.kernel.org/all/20251111171315.2196103-4-joelagnelf@nvidia.com/
> - clist RFC (original with buddy): https://lore.kernel.org/all/20251030190613.1224287-1-joelagnelf@nvidia.com/
> - DRM buddy move: https://lore.kernel.org/all/20251124234432.1988476-1-joelagnelf@nvidia.com/
> - PRAMIN series: https://lore.kernel.org/all/20251020185539.49986-1-joelagnelf@nvidia.com/

I'm not overly happy with this version history. I understand that you are
building things on top of each other, but going back and forth with adding and
removing features from a series is confusing and makes it hard to keep track of
things.

(In the worst case it may even result in reviewers skipping over it leaving you
with no progress eventually.)

I.e. you started with a CList and DRM buddy RFC, then DRM buddy disappeared for a
few versions and came back eventually. Then, in the next version, the PRAMIN
stuff came back in, which also had a predecessor series already and now you
added lots of MM stuff on top of it.

The whole version history is about what features and patches were added and
removed to/from the series, rather than about what actually changed design wise
and code wise between the iterations (which is the important part for reviewers
and maintainers).

I also think it is confusing that a lot of the patches in this series have never
been posted before, yet they are labeled as v6 of this RFC.

Hence, please separate the features from each other in separate patch series,
with their own proper version history and changelog. In order to account for the
dependencies, you can just mention them in the cover letter and add a link to
the other related patch series, which should be sufficient for people interested
in the full picture.

I think the most clean approach would probably be a split with CList, DRM buddy
and Nova MM stuff.

And just to clarify, in the end I do not care too much about whether it's all in
a single series or split up, but going back and forth with combining things that
once have been separate and have a separate history doesn't work out well.


* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-22 23:16       ` Joel Fernandes
  2026-01-23 10:13         ` Zhi Wang
@ 2026-01-28 12:04         ` Danilo Krummrich
  2026-01-28 15:27           ` Joel Fernandes
  2026-01-30  0:26           ` Joel Fernandes
  1 sibling, 2 replies; 71+ messages in thread
From: Danilo Krummrich @ 2026-01-28 12:04 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Fri Jan 23, 2026 at 12:16 AM CET, Joel Fernandes wrote:
> My plan is to make TLB and PRAMIN use immutable references in their function
> calls and then implement internal locking. I've already done this for the GPU
> buddy functions, so it should be doable, and we'll keep it consistent. As a
> result, we will have finer-grain locking on the memory management objects
> instead of requiring to globally lock a common GpuMm object. I'll plan on
> doing this for v7.
>
> Also, the PTE allocation race you mentioned is already handled by PRAMIN
> serialization. Since threads must hold the PRAMIN lock to write page table
> entries, concurrent writers are not possible:
>
>   Thread A: acquire PRAMIN lock
>   Thread A: read PDE (via PRAMIN) -> NULL
>   Thread A: alloc PT page, write PDE
>   Thread A: release PRAMIN lock
>
>   Thread B: acquire PRAMIN lock
>   Thread B: read PDE (via PRAMIN) -> sees A's pointer
>   Thread B: uses existing PT page, no allocation needed

This won't work unfortunately.

We have to separate allocations and modifications of the page table. Or in other
words, we must not allocate new PDEs or PTEs while holding the lock protecting
the page table from modifications.

Once we have VM_BIND in nova-drm, we will have the situation that userspace
passes jobs to modify the GPU's virtual address space and hence the page tables.

Such a job has mainly three stages.

  (1) The submit stage.

      This is where the job is initialized, dependencies are set up and the
      driver has to pre-allocate all kinds of structures that are required
      throughout the subsequent stages of the job.

  (2) The run stage.

      This is the stage where the job is staged for execution and its DMA fence
      has been made public (i.e. it is accessible by userspace).

      This is the stage where we are in the DMA fence signalling critical
      section, hence we can't do any non-atomic allocations, since otherwise we
      could deadlock in MMU notifier callbacks for instance.

      This is the stage where the page table is actually modified. Hence, we
      can't acquire any locks that might be held elsewhere while doing
      non-atomic allocations. Also note that this is transitive, e.g. if you
      take lock A and somewhere else a lock B is taken while A is already held
      and we do non-atomic allocations while holding B, then A can't be held in
      the DMA fence signalling critical path either.

      It is also worth noting that this is the stage where we know the exact
      operations we have to execute based on the VM_BIND request from userspace.

      For instance, in the submit stage we may only know that userspace wants
      us to map a BO with a certain offset in the GPU's virtual address space
      at [0x0, 0x1000000]. What we don't know is what exact operations this
      requires, i.e. "What do we have to unmap first?", "Are there any
      overlapping mappings that we have to truncate?", etc.

      So, we have to consider this when we pre-allocate in the submit stage.

  (3) The cleanup stage.

      This is where the job has been signaled and hence left the DMA fence
      signalling critical section.

      In this stage the job is cleaned up, which includes freeing data that is
      not required anymore, such as PTEs and PDEs.
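
For illustration, the three stages above can be sketched in plain userspace
Rust. All names here (BindJob, PtPage, etc.) are hypothetical and not
nova-core API; the point is only that allocation happens exclusively in the
submit stage, and the run stage merely consumes the pre-allocated pool:

```rust
use std::collections::VecDeque;

// Hypothetical page-table page; stands in for a PDE/PTE backing allocation.
struct PtPage {
    _backing: Vec<u8>,
}

// A VM_BIND-style job that pre-allocates in the submit stage and only
// consumes those allocations in the run stage.
struct BindJob {
    prealloc: VecDeque<PtPage>,
}

impl BindJob {
    // Stage (1), submit: allocating is allowed here, so pre-allocate for the
    // worst case, since the exact operations are not known yet.
    fn submit(worst_case_pages: usize) -> Self {
        let prealloc = (0..worst_case_pages)
            .map(|_| PtPage { _backing: vec![0u8; 4096] })
            .collect();
        BindJob { prealloc }
    }

    // Stage (2), run: inside the DMA fence signalling critical section.
    // No allocation happens here; pages are only popped from the pool.
    fn run(&mut self, pages_needed: usize) -> Result<Vec<PtPage>, &'static str> {
        if self.prealloc.len() < pages_needed {
            return Err("pre-allocation underestimated the worst case");
        }
        Ok((0..pages_needed)
            .map(|_| self.prealloc.pop_front().unwrap())
            .collect())
    }

    // Stage (3), cleanup: after the fence has signalled, leftover
    // pre-allocated pages are freed (dropped); returns how many were unused.
    fn cleanup(self) -> usize {
        self.prealloc.len()
    }
}

fn main() {
    let mut job = BindJob::submit(8); // submit stage: worst-case pre-allocation
    let used = job.run(3).unwrap();   // run stage: no allocation at all
    println!("used {} pages, {} left over", used.len(), job.cleanup());
}
```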

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6)
  2026-01-28 11:37 ` [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Danilo Krummrich
@ 2026-01-28 12:44   ` Joel Fernandes
  2026-01-29  0:01     ` Danilo Krummrich
  0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-28 12:44 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Joel Fernandes, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Vivi Rodrigo, Tvrtko Ursulin, Rui Huang, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Alexey Ivanov, Balbir Singh, Philipp Stanner, Elle Rhumsaa,
	Daniel Almeida, nouveau, dri-devel, rust-for-linux, linux-doc,
	amd-gfx, intel-gfx, intel-xe, linux-fbdev

On Jan 28, 2026, at 6:38 AM, Danilo Krummrich <dakr@kernel.org> wrote:
> On Tue Jan 20, 2026 at 9:42 PM CET, Joel Fernandes wrote:
>> This series is rebased on drm-rust-kernel/drm-rust-next and provides memory
>> management infrastructure for the nova-core GPU driver. It combines several
>> previous series and provides a foundation for nova GPU memory management
>> including page tables, virtual memory management, and BAR mapping. All these
>> are critical nova-core features.
>
> Thanks for this work, I will go through the series soon. (Although it would also
> be nice to have what I mention below addressed first.)

Thanks, I appreciate that.

> I'm not overly happy with this version history. I understand that you are
> building things on top of each other, but going back and forth with adding and
> removing features from a series is confusing and makes it hard to keep track of
> things.
>
> (In the worst case it may even result in reviewers skipping over it leaving you
> with no progress eventually.)
>
> [...]
>
> Hence, please separate the features from each other in separate patch series,
> with their own proper version history and changelog. In order to account for the
> dependencies, you can just mention them in the cover letter and add a link to
> the other related patch series, which should be sufficient for people interested
> in the full picture.
>
> I think the most clean approach would probably be a split with CList, DRM buddy
> and Nova MM stuff.
>
> And just to clarify, in the end I do not care too much about whether it's all in
> a single series or split up, but going back and forth with combining things that
> once have been separate and have a separate history doesn't work out well.

I understand the concern, and I appreciate you taking the time to explain. Let
me provide some context on how we ended up here, as it may help clarify the
situation.

1. This is a multi-month undertaking with many interdependencies. It is
   difficult to predict in advance which patches will be needed, the optimal
   ordering, how to split the work, which series should go first, or what
   pieces are missing. This is similar to the evolution of nova itself -
   complex interdependencies make it hard to predict what will be needed.
   Rather than waiting months for a perfect plan before posting anything, I
   chose to iterate publicly.

2. The decision to move GPU buddy out of DRM came later in the process [1].
   This significantly changed the scope, requiring a much larger patch to
   handle the buddy infrastructure that everything else depends on.

3. The decision to separate buddy from the CList series came from wanting to
   make progress on CList independently [2]. That effort alone took almost a
   month with several rewrites based on feedback from others.

4. There was some back and forth on whether to post code with users or code
   that could potentially be used. This influenced the decision to combine
   things into the same series to demonstrate working functionality.

5. The memory management code only became functional around v3. Page table
   walking turned out to be tricky, and I did not have a proper user at that
   time. Eventually I realized BAR1 is a strong use case for page table
   translation, so I added support for that.

Regarding splitting the series: that makes sense, I will split into CList, GPU
buddy, and Nova MM as you suggest. You make a fair point about the versioning
too - labeling new patches (even though most are old) as v6 is confusing. One
question: what version numbers should each split series use? CList was at v3
before being combined, and it is a similar story for GPU buddy and Nova MM.
Should I continue from the last version number each was posted with, or
continue from v6?

[1] https://lore.kernel.org/all/20251124234432.1988476-1-joelagnelf@nvidia.com/
[2] https://lore.kernel.org/all/20251129213056.4021375-1-joelagnelf@nvidia.com/

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-28 12:04         ` Danilo Krummrich
@ 2026-01-28 15:27           ` Joel Fernandes
  2026-01-29  0:09             ` Danilo Krummrich
  2026-01-30  0:26           ` Joel Fernandes
  1 sibling, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-28 15:27 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev



On 1/28/2026 7:04 AM, Danilo Krummrich wrote:
> On Fri Jan 23, 2026 at 12:16 AM CET, Joel Fernandes wrote:
>> My plan is to make TLB and PRAMIN use immutable references in their function
>> calls and then implement internal locking. I've already done this for the GPU
>> buddy functions, so it should be doable, and we'll keep it consistent. As a
>> result, we will have finer-grain locking on the memory management objects
>> instead of requiring to globally lock a common GpuMm object. I'll plan on
>> doing this for v7.
>>
>> Also, the PTE allocation race you mentioned is already handled by PRAMIN
>> serialization. Since threads must hold the PRAMIN lock to write page table
>> entries, concurrent writers are not possible:
>>
>>    Thread A: acquire PRAMIN lock
>>    Thread A: read PDE (via PRAMIN) -> NULL
>>    Thread A: alloc PT page, write PDE
>>    Thread A: release PRAMIN lock
>>
>>    Thread B: acquire PRAMIN lock
>>    Thread B: read PDE (via PRAMIN) -> sees A's pointer
>>    Thread B: uses existing PT page, no allocation needed
> 
> This won't work unfortunately.
> 
> We have to separate allocations and modifications of the page table. Or in other
> words, we must not allocate new PDEs or PTEs while holding the lock protecting
> the page table from modifications.

I will go over these concerns. Just to clarify - do you mean forbidding
*any* lock, or only non-atomic locks? I believe we can avoid non-atomic
locks completely - in fact, I wrote a patch to do just that before I read
this email. If we are to forbid any locking at all, that might require some
careful redesign to handle the above race AFAICS.

> 
> Once we have VM_BIND in nova-drm, we will have the situation that userspace
> passes jobs to modify the GPU's virtual address space and hence the page tables.

Thanks for listing all the concerns below, this is very valuable. I will 
go over all these and all cases before posting the v7 now that I have this.

--
Joel Fernandes


> Such a job has mainly three stages.
> 
>    (1) The submit stage.
> 
>        This is where the job is initialized, dependencies are set up and the
>        driver has to pre-allocate all kinds of structures that are required
>        throughout the subsequent stages of the job.
> 
>    (2) The run stage.
> 
>        This is the stage where the job is staged for execution and its DMA fence
>        has been made public (i.e. it is accessible by userspace).
> 
>        This is the stage where we are in the DMA fence signalling critical
>        section, hence we can't do any non-atomic allocations, since otherwise we
>        could deadlock in MMU notifier callbacks for instance.
> 
>        This is the stage where the page table is actually modified. Hence, we
>        can't acquire any locks that might be held elsewhere while doing
>        non-atomic allocations. Also note that this is transitive, e.g. if you
>        take lock A and somewhere else a lock B is taken while A is already held
>        and we do non-atomic allocations while holding B, then A can't be held in
>        the DMA fence signalling critical path either.
> 
>        It is also worth noting that this is the stage where we know the exact
>        operations we have to execute based on the VM_BIND request from userspace.
> 
>        For instance, in the submit stage we may only know that userspace wants
>        us to map a BO with a certain offset in the GPU's virtual address space
>        at [0x0, 0x1000000]. What we don't know is what exact operations this
>        requires, i.e. "What do we have to unmap first?", "Are there any
>        overlapping mappings that we have to truncate?", etc.
> 
>        So, we have to consider this when we pre-allocate in the submit stage.
> 
>    (3) The cleanup stage.
> 
>        This is where the job has been signaled and hence left the DMA fence
>        signalling critical section.
> 
>        In this stage the job is cleaned up, which includes freeing data that is
>        not required anymore, such as PTEs and PDEs.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6)
  2026-01-28 12:44   ` Joel Fernandes
@ 2026-01-29  0:01     ` Danilo Krummrich
  0 siblings, 0 replies; 71+ messages in thread
From: Danilo Krummrich @ 2026-01-29  0:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian Koenig, Jani Nikula, Joonas Lahtinen, Vivi Rodrigo,
	Tvrtko Ursulin, Rui Huang, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, John Hubbard,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On Wed Jan 28, 2026 at 1:44 PM CET, Joel Fernandes wrote:
> I will split into CList, GPU buddy, and Nova MM as you suggest.

Thanks, together with a proper changelog this will help a lot.

> One question: what version numbers should each split series use? CList was at
> v3 before being combined, and similar story for GPU buddy and Nova MM. Should
> I continue from the last version number they were posted with, or continue
> from v6?

I'd say from the last version is probably best. Maybe you also want to move out
of the RFC stage for some of them.

Thanks,
Danilo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-28 15:27           ` Joel Fernandes
@ 2026-01-29  0:09             ` Danilo Krummrich
  2026-01-29  1:02               ` John Hubbard
  2026-01-29  1:28               ` Joel Fernandes
  0 siblings, 2 replies; 71+ messages in thread
From: Danilo Krummrich @ 2026-01-29  0:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Wed Jan 28, 2026 at 4:27 PM CET, Joel Fernandes wrote:
> I will go over these concerns, just to clarify - do you mean forbidding 
> *any* lock or do you mean only forbidding non-atomic locks? I believe we 
> can avoid non-atomic locks completely - actually I just wrote a patch 
> before I read this email to do just. If we are to forbid any locking at 
> all, that might require some careful redesign to handle the above race 
> afaics.

It's not about the locks themselves, sleeping locks are fine too. It's about
holding locks that are held elsewhere when doing memory allocations that can
call back into MMU notifiers or the shrinker.

I.e. if in the fence signalling critical path you wait for a mutex that is held
elsewhere while allocating memory and the memory allocation calls back into the
shrinker, you may end up waiting for your own DMA fence to be signaled, which
causes a deadlock.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-29  0:09             ` Danilo Krummrich
@ 2026-01-29  1:02               ` John Hubbard
  2026-01-29  1:49                 ` Joel Fernandes
  2026-01-29  1:28               ` Joel Fernandes
  1 sibling, 1 reply; 71+ messages in thread
From: John Hubbard @ 2026-01-29  1:02 UTC (permalink / raw)
  To: Danilo Krummrich, Joel Fernandes
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On 1/28/26 4:09 PM, Danilo Krummrich wrote:
> On Wed Jan 28, 2026 at 4:27 PM CET, Joel Fernandes wrote:
>> I will go over these concerns, just to clarify - do you mean forbidding
>> *any* lock or do you mean only forbidding non-atomic locks? I believe we
>> can avoid non-atomic locks completely - actually I just wrote a patch
>> before I read this email to do just. If we are to forbid any locking at
>> all, that might require some careful redesign to handle the above race
>> afaics.
> 
> It's not about the locks themselves, sleeping locks are fine too. It's about
> holding locks that are held elsewhere when doing memory allocations that can
> call back into MMU notifiers or the shrinker.

If you look at core kernel mm, you'll find a similar constraint: avoid
holding any locks while allocating--unless you are in the reclaim code
itself.

Especially when dealing with page tables.

So this is looking familiar to me and I agree with the constraint, fwiw.

> 
> I.e. if in the fence signalling critical path you wait for a mutex that is held
> elsewhere while allocating memory and the memory allocation calls back into the
> shrinker, you may end up waiting for your own DMA fence to be signaled, which
> causes a deadlock.

Right, and the list of pitfalls such as this is basically limited only
by your imagination--it's long. :)

thanks,
-- 
John Hubbard

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-29  0:09             ` Danilo Krummrich
  2026-01-29  1:02               ` John Hubbard
@ 2026-01-29  1:28               ` Joel Fernandes
  1 sibling, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-29  1:28 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

Hi Danilo,

On 1/28/2026 7:09 PM, Danilo Krummrich wrote:
> On Wed Jan 28, 2026 at 4:27 PM CET, Joel Fernandes wrote:
>> I will go over these concerns, just to clarify - do you mean forbidding 
>> *any* lock or do you mean only forbidding non-atomic locks? I believe we 
>> can avoid non-atomic locks completely - actually I just wrote a patch 
>> before I read this email to do just. If we are to forbid any locking at 
>> all, that might require some careful redesign to handle the above race 
>> afaics.
> 
> It's not about the locks themselves, sleeping locks are fine too.

Ah, so in your last email by "non-atomic" you meant an allocation that can
trigger memory reclaim etc., right? I got confused by "non-atomic" because I
thought you were referring to acquiring a sleeping lock in a non-atomic
context (I also work on CPU scheduling/RCU, so the word atomic sometimes means

I believe we may have to use "try lock" on a mutex if we have to use these in
the future, in a path that cannot wait (such as a page fault handler), but
yes, I agree with you that we can use mutexes for these, with a combination of
try_lock and bottom-half deferrals. See additional comment [1].

Coming to the dma-fence deadlocks you mention, this sounds very similar to my
experiences with reclaim-deadlocks when I worked on the Ashmem Android driver.
Deja-vu :-D. The issue there was the memory shrinker would take a lock in the
ashmem driver during reclaim, which is a disaster if the lock was already held
and a memory allocation request triggered reclaim. I believe the DMA fence
usecase is also similar based on your description.

> It's about
> holding locks that are held elsewhere when doing memory allocations that can
> call back into MMU notifiers or the shrinker.
> 
> I.e. if in the fence signalling critical path you wait for a mutex that is held
> elsewhere while allocating memory and the memory allocation calls back into the
> shrinker, you may end up waiting for your own DMA fence to be signaled, which
> causes a deadlock.

Got it, I will spend the next day or so studying the DMA fence architecture,
but I mostly got the idea now. We need to be careful with reclaim locking, as
you stressed. I will analyze all the requirements to properly address this and
will reach out if I have any questions. Thanks for sharing your knowledge on
this!

--
Joel Fernandes

[1]
I can confirm, for completeness, that both Nouveau and OpenRM use mutexes for
PT/VMM-related locking. In interrupt contexts, OpenRM does a "try lock" on its
mutex AFAICS. This is similar to how Linux kernel mm page fault handling
acquires mmap_sem (via try-locking).

The Linux kernel does have per-PT spinlocks to handle the "two paths try to
install a PDE/PTE" race, but I don't think we need that at the moment for our
use cases, as we can keep it simple and rely on the VMM mutex. We can perhaps
add that in later if needed (or use finer-grained block-level locking), but
let me know if anyone disagrees with that.
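
The try_lock-plus-deferral pattern mentioned above can be sketched with a
plain userspace mutex (names like VmmState and try_handle_fault are
hypothetical, not Nouveau/OpenRM/nova-core API):

```rust
use std::sync::Mutex;

// Hypothetical per-VMM state protected by a sleeping lock.
struct VmmState {
    mapped_pages: usize,
}

// In a context that must not sleep (e.g. a fault handler), only *try* to
// take the mutex; on contention, report that the work must be deferred to
// a bottom half / worker that is allowed to sleep.
fn try_handle_fault(vmm: &Mutex<VmmState>) -> Result<usize, &'static str> {
    match vmm.try_lock() {
        Ok(mut state) => {
            // Fast path: the lock was free, do the page-table update now.
            state.mapped_pages += 1;
            Ok(state.mapped_pages)
        }
        // try_lock() never blocks: on contention it returns an error
        // immediately instead of sleeping.
        Err(_) => Err("contended: defer to a worker"),
    }
}

fn main() {
    let vmm = Mutex::new(VmmState { mapped_pages: 0 });
    println!("{:?}", try_handle_fault(&vmm)); // lock free: handled inline
    let _guard = vmm.lock().unwrap();         // simulate contention
    println!("{:?}", try_handle_fault(&vmm)); // contended: would defer
}
```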


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-29  1:02               ` John Hubbard
@ 2026-01-29  1:49                 ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-29  1:49 UTC (permalink / raw)
  To: John Hubbard, Danilo Krummrich
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev



On 1/28/2026 8:02 PM, John Hubbard wrote:
> On 1/28/26 4:09 PM, Danilo Krummrich wrote:
>> On Wed Jan 28, 2026 at 4:27 PM CET, Joel Fernandes wrote:
>>> I will go over these concerns, just to clarify - do you mean forbidding
>>> *any* lock or do you mean only forbidding non-atomic locks? I believe we
>>> can avoid non-atomic locks completely - actually I just wrote a patch
>>> before I read this email to do just. If we are to forbid any locking at
>>> all, that might require some careful redesign to handle the above race
>>> afaics.
>>
>> It's not about the locks themselves, sleeping locks are fine too. It's about
>> holding locks that are held elsewhere when doing memory allocations that can
>> call back into MMU notifiers or the shrinker.
> 
> If you look at core kernel mm, you'll find a similar constraint: avoid
> holding any locks while allocating--unless you are in the reclaim code
> itself.
> 
> Especially when dealing with page tables.
> 
> So this is looking familiar to me and I agree with the constraint, fwiw.

Right, so similar to core kernel mm, we need to separate PT allocation from
the lock needed for PT writing - essentially, never allocate PT pages in the
dma-fence signalling critical paths. We already have separate locks for both
(buddy versus VMM), so it should be doable with some adjustments. I will study
dma-fence further and revise the patches. Thanks.
--
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-28 12:04         ` Danilo Krummrich
  2026-01-28 15:27           ` Joel Fernandes
@ 2026-01-30  0:26           ` Joel Fernandes
  2026-01-30  1:11             ` John Hubbard
  2026-01-30  1:16             ` Gary Guo
  1 sibling, 2 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-30  0:26 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

Hi, Danilo, all,

Based on the discussion below and some research, I came up with the deadlock
scenarios that we need to handle in the next revision of these patches. Please
let me know if I missed anything. At the moment, off the top of my head, I
identified that we are doing GFP_KERNEL memory allocations inside the GPU
buddy allocator during map/unmap. I will work on solutions for that. Thanks.

All deadlock scenarios
----------------------
The gist is: in the DMA fence signalling critical path, we cannot acquire
resources (locks, memory allocations, etc.) that are already acquired
elsewhere while a fence is being waited on to be signaled. So we have to be
careful which resources we acquire, and we also need to be careful not to do
memory allocations under locks that we need in the DMA fence signalling
critical path (when doing the virtual memory map/unmap).

1. deadlock scenario 1: allocator deadlock (no locking needed to trigger it)

Fence Signal start (A) -> Alloc -> MMU notifier/Shrinker (B) -> Fence Wait (A)

ABA deadlock.

2. deadlock scenario 2: Same as 1, but ABBA scenario (2 CPUs).

CPU 0: Fence Signal start (A) -> Alloc (B)

CPU 1: Alloc -> MMU notifier or Shrinker (B) -> Fence Wait (A)

3. deadlock scenario 3: ABBA (and similar) deadlock, but with locking.

CPU 0: Fence Signal start (A) -> Lock (B)

CPU 1: Lock (B) -> Fence Wait (A)

4. deadlock scenario 4: Same as scenario 3, but the fence wait comes from the
allocation path.

rule: We cannot try to acquire locks in the DMA fence signaling critical path
if those locks are also acquired in paths that do reclaim-capable memory
allocations.

CPU 0: Fence Signal (A) -> Lock (B)

CPU 1: Lock (B) -> Alloc -> Fence Wait (A)

5. deadlock scenario 5: Transitive locking:

rule: We cannot try to acquire locks in the DMA fence signaling critical path
that are transitively waiting on the same DMA fence.

Fence Signal (A) -> Lock (B)

Lock (B) -> Lock(C)

Lock (C) -> Alloc -> Fence Wait (A)

ABBCCA deadlock.
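
As a sanity check, scenario 5 can be modelled as a wait-for graph and checked
for a cycle with a small DFS. This is illustrative userspace Rust, not driver
code; an edge (x, y) means "while holding or signalling x, we may wait on y",
and any cycle is a potential deadlock:

```rust
use std::collections::HashMap;

// Returns true if the wait-for graph described by `edges` contains a cycle.
fn has_cycle(edges: &[(&str, &str)]) -> bool {
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
    }

    // Depth-first search with states: 1 = on current path, 2 = finished.
    fn dfs<'a>(
        node: &'a str,
        adj: &HashMap<&'a str, Vec<&'a str>>,
        state: &mut HashMap<&'a str, u8>,
    ) -> bool {
        match state.get(node).copied() {
            Some(1) => return true,  // back edge onto the current path: cycle
            Some(2) => return false, // already fully explored
            _ => {}
        }
        state.insert(node, 1);
        if let Some(nexts) = adj.get(node) {
            for &next in nexts {
                if dfs(next, adj, state) {
                    return true;
                }
            }
        }
        state.insert(node, 2);
        false
    }

    let starts: Vec<&str> = adj.keys().copied().collect();
    let mut state = HashMap::new();
    starts.into_iter().any(|n| dfs(n, &adj, &mut state))
}

fn main() {
    // Scenario 5: Fence A -> Lock B, Lock B -> Lock C, Lock C -> Fence A.
    let cyclic = [("FenceA", "LockB"), ("LockB", "LockC"), ("LockC", "FenceA")];
    // Removing the "Lock C -> Fence A" edge (e.g. by pre-allocating so the
    // allocation under lock C never waits on the fence) breaks the cycle.
    let acyclic = [("FenceA", "LockB"), ("LockB", "LockC")];
    println!("scenario 5 deadlocks: {}", has_cycle(&cyclic));
    println!("after fix: {}", has_cycle(&acyclic));
}
```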


--
Joel Fernandes

On 1/28/2026 7:04 AM, Danilo Krummrich wrote:
> On Fri Jan 23, 2026 at 12:16 AM CET, Joel Fernandes wrote:
>> My plan is to make TLB and PRAMIN use immutable references in their function
>> calls and then implement internal locking. I've already done this for the GPU
>> buddy functions, so it should be doable, and we'll keep it consistent. As a
>> result, we will have finer-grain locking on the memory management objects
>> instead of requiring to globally lock a common GpuMm object. I'll plan on
>> doing this for v7.
>>
>> Also, the PTE allocation race you mentioned is already handled by PRAMIN
>> serialization. Since threads must hold the PRAMIN lock to write page table
>> entries, concurrent writers are not possible:
>>
>>   Thread A: acquire PRAMIN lock
>>   Thread A: read PDE (via PRAMIN) -> NULL
>>   Thread A: alloc PT page, write PDE
>>   Thread A: release PRAMIN lock
>>
>>   Thread B: acquire PRAMIN lock
>>   Thread B: read PDE (via PRAMIN) -> sees A's pointer
>>   Thread B: uses existing PT page, no allocation needed
> 
> This won't work unfortunately.
> 
> We have to separate allocations and modifications of the page table. Or in other
> words, we must not allocate new PDEs or PTEs while holding the lock protecting
> the page table from modifications.
> 
> Once we have VM_BIND in nova-drm, we will have the situation that userspace
> passes jobs to modify the GPU's virtual address space and hence the page tables.
> 
> Such a job has mainly three stages.
> 
>   (1) The submit stage.
> 
>       This is where the job is initialized, dependencies are set up and the
>       driver has to pre-allocate all kinds of structures that are required
>       throughout the subsequent stages of the job.
> 
>   (2) The run stage.
> 
>       This is the stage where the job is staged for execution and its DMA fence
>       has been made public (i.e. it is accessible by userspace).
> 
>       This is the stage where we are in the DMA fence signalling critical
>       section, hence we can't do any non-atomic allocations, since otherwise we
>       could deadlock in MMU notifier callbacks for instance.
> 
>       This is the stage where the page table is actually modified. Hence, we
>       can't acquire any locks that might be held elsewhere while doing
>       non-atomic allocations. Also note that this is transitive, e.g. if you
>       take lock A and somewhere else a lock B is taken while A is already held
>       and we do non-atomic allocations while holding B, then A can't be held in
>       the DMA fence signalling critical path either.
> 
>       It is also worth noting that this is the stage where we know the exact
>       operations we have to execute based on the VM_BIND request from userspace.
> 
>       For instance, in the submit stage we may only know that userspace wants
>       us to map a BO with a certain offset in the GPU's virtual address space
>       at [0x0, 0x1000000]. What we don't know is what exact operations this
>       requires, i.e. "What do we have to unmap first?", "Are there any
>       overlapping mappings that we have to truncate?", etc.
> 
>       So, we have to consider this when we pre-allocate in the submit stage.
> 
>   (3) The cleanup stage.
> 
>       This is where the job has been signaled and hence left the DMA fence
>       signalling critical section.
> 
>       In this stage the job is cleaned up, which includes freeing data that is
>       not required anymore, such as PTEs and PDEs.
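The three stages above can be sketched in code; this is a purely illustrative Rust sketch (the `Job` type, its fields, and the page size are assumptions, not nova-core API) showing fallible allocation confined to the submit stage:

```rust
// Hypothetical sketch of the three job stages (all names are illustrative,
// not nova-core code). The invariant: every fallible allocation happens in
// the submit stage; the run stage only consumes pre-allocated pages and can
// therefore sit inside the DMA fence signalling critical section.

pub struct Job {
    // Page-table pages pre-allocated at submit time, sized for the worst
    // case the VM_BIND request could require.
    prealloc: Vec<Box<[u8; 4096]>>,
}

impl Job {
    // (1) Submit stage: the only place we may do GFP_KERNEL-style allocation.
    pub fn submit(worst_case_pages: usize) -> Self {
        let prealloc = (0..worst_case_pages)
            .map(|_| Box::new([0u8; 4096]))
            .collect();
        Job { prealloc }
    }

    // (2) Run stage: inside the fence signalling critical section. Never
    // allocates; pops from the pre-allocated pool or reports exhaustion.
    pub fn take_pt_page(&mut self) -> Option<Box<[u8; 4096]>> {
        self.prealloc.pop()
    }

    // (3) Cleanup stage: after the fence has signalled, return unused pages
    // to the allocator (here: just count and drop them).
    pub fn cleanup(self) -> usize {
        self.prealloc.len()
    }
}
```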

-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30  0:26           ` Joel Fernandes
@ 2026-01-30  1:11             ` John Hubbard
  2026-01-30  1:59               ` Joel Fernandes
  2026-01-30  1:16             ` Gary Guo
  1 sibling, 1 reply; 71+ messages in thread
From: John Hubbard @ 2026-01-30  1:11 UTC (permalink / raw)
  To: Joel Fernandes, Danilo Krummrich
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On 1/29/26 4:26 PM, Joel Fernandes wrote:
> Hi, Danilo, all,
> 
> Based on the below discussion and research, I came up with some deadlock
> scenarios that we need to handle in the v6 series of these patches. Please let
> me know if I missed something below. At the moment, off the top I identified
> that we are doing GFP_KERNEL memory allocations inside GPU buddy allocator
> during map/unmap. I will work on solutions for that. Thanks.
> 
> All deadlock scenarios
> ----------------------
> The gist is, in the DMA fence signaling critical path we cannot acquire
> resources (locks, memory allocations, etc.) that are already acquired when a
> fence is being waited on to be signaled. So we have to be careful which resources
> we acquire, and also in which driver paths we do memory allocations under locks
> that we need in the dma-fence signaling critical path (when doing the virtual
> memory map/unmap).

unmap? Are you seeing any allocations happening during unmap? I don't
immediately see any, but that sounds surprising.

thanks,
-- 
John Hubbard

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30  0:26           ` Joel Fernandes
  2026-01-30  1:11             ` John Hubbard
@ 2026-01-30  1:16             ` Gary Guo
  2026-01-30  1:45               ` Joel Fernandes
  1 sibling, 1 reply; 71+ messages in thread
From: Gary Guo @ 2026-01-30  1:16 UTC (permalink / raw)
  To: Joel Fernandes, Danilo Krummrich
  Cc: Zhi Wang, linux-kernel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Fri Jan 30, 2026 at 12:26 AM GMT, Joel Fernandes wrote:
> Hi, Danilo, all,
>
> Based on the below discussion and research, I came up with some deadlock
> scenarios that we need to handle in the v6 series of these patches. Please let
> me know if I missed something below. At the moment, off the top I identified
> that we are doing GFP_KERNEL memory allocations inside GPU buddy allocator
> during map/unmap. I will work on solutions for that. Thanks.
>
> All deadlock scenarios
> ----------------------
> The gist is, in the DMA fence signaling critical path we cannot acquire
> resources (locks, memory allocations, etc.) that are already acquired when a
> fence is being waited on to be signaled. So we have to be careful which resources
> we acquire, and also in which driver paths we do memory allocations under locks
> that we need in the dma-fence signaling critical path (when doing the virtual
> memory map/unmap).

When thinking about deadlocks, it usually helps to think not in terms of detailed
scenarios (which would be hard to enumerate and easy to miss), but rather in
terms of the relative order of resource acquisition. All resources that you wait on
would need to form a partial order. Any violation could result in deadlocks.
This is also how lockdep checks.
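As a toy illustration of the partial-order idea (not lockdep itself; the `OrderChecker` type and the ranks are invented for this sketch), one can assign every waitable resource a rank, give the fence the lowest rank, and flag any acquisition that does not strictly increase:

```rust
// Toy rank-based order checker (illustrative only). Each waitable resource
// gets a rank; an acquisition chain is valid only if ranks strictly
// increase. A chain that violates this could close a cycle with another
// chain, which is the deadlock condition lockdep looks for.
pub struct OrderChecker {
    held: Vec<u32>, // ranks of currently held resources, in acquisition order
}

impl OrderChecker {
    pub fn new() -> Self {
        OrderChecker { held: Vec::new() }
    }

    // Returns false on an ordering violation (a potential deadlock).
    pub fn acquire(&mut self, rank: u32) -> bool {
        if self.held.last().map_or(false, |&top| rank <= top) {
            return false;
        }
        self.held.push(rank);
        true
    }

    pub fn release(&mut self, rank: u32) {
        self.held.retain(|&r| r != rank);
    }
}
```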

So to me all the cases you listed are the same...

Best,
Gary

>
> 1. deadlock scenario 1: allocator deadlock (no locking needed to trigger it)
>
> Fence Signal start (A) -> Alloc -> MMU notifier/Shrinker (B) -> Fence Wait (A)
>
> ABA deadlock.
>
> 2. deadlock scenario 2: Same as 1, but ABBA scenario (2 CPUs).
>
> CPU 0: Fence Signal start (A) -> Alloc (B)
>
> CPU 1: Alloc -> MMU notifier or Shrinker (B) -> Fence Wait (A)
>
> 3. deadlock scenario 3: ABBA (and similar) deadlock, but with locking.
>
> CPU 0: Fence Signal start (A) -> Lock (B)
>
> CPU 1: Lock (B) -> Fence Wait (A)
>
> 4. deadlock scenario 4: Same as scenario 3, but the fence wait comes from
> allocation path.
>
> rule: We cannot try to acquire locks in the DMA fence signaling critical path if
> those locks were already acquired in paths that do reclaimable memory allocations.
>
> CPU 0: Fence Signal (A) -> Lock (B)
>
> CPU 1: Lock (B) -> Alloc -> Fence Wait (A)
>
> 5. deadlock scenario 5: Transitive locking:
>
> rule: We cannot try to acquire locks in the DMA fence signaling critical path
> that are transitively waiting on the same DMA fence.
>
> Fence Signal (A) -> Lock (B)
>
> Lock (B) -> Lock(C)
>
> Lock (C) -> Alloc -> Fence Wait (A)
>
> ABBCCA deadlock.
>
>
> --
> Joel Fernandes
>
> On 1/28/2026 7:04 AM, Danilo Krummrich wrote:
>> On Fri Jan 23, 2026 at 12:16 AM CET, Joel Fernandes wrote:
>>> My plan is to make TLB and PRAMIN use immutable references in their function
>>> calls and then implement internal locking. I've already done this for the GPU
>>> buddy functions, so it should be doable, and we'll keep it consistent. As a
>>> result, we will have finer-grain locking on the memory management objects
>>> instead of requiring to globally lock a common GpuMm object. I'll plan on
>>> doing this for v7.
>>>
>>> Also, the PTE allocation race you mentioned is already handled by PRAMIN
>>> serialization. Since threads must hold the PRAMIN lock to write page table
>>> entries, concurrent writers are not possible:
>>>
>>>   Thread A: acquire PRAMIN lock
>>>   Thread A: read PDE (via PRAMIN) -> NULL
>>>   Thread A: alloc PT page, write PDE
>>>   Thread A: release PRAMIN lock
>>>
>>>   Thread B: acquire PRAMIN lock
>>>   Thread B: read PDE (via PRAMIN) -> sees A's pointer
>>>   Thread B: uses existing PT page, no allocation needed
>> 
>> This won't work unfortunately.
>> 
>> We have to separate allocations and modifications of the page table. Or in other
>> words, we must not allocate new PDEs or PTEs while holding the lock protecting
>> the page table from modifications.
>> 
>> Once we have VM_BIND in nova-drm, we will have the situation that userspace
>> passes jobs to modify the GPU's virtual address space and hence the page tables.
>> 
>> Such a job has mainly three stages.
>> 
>>   (1) The submit stage.
>> 
>>       This is where the job is initialized, dependencies are set up and the
>>       driver has to pre-allocate all kinds of structures that are required
>>       throughout the subsequent stages of the job.
>> 
>>   (2) The run stage.
>> 
>>       This is the stage where the job is staged for execution and its DMA fence
>>       has been made public (i.e. it is accessible by userspace).
>> 
>>       This is the stage where we are in the DMA fence signalling critical
>>       section, hence we can't do any non-atomic allocations, since otherwise we
>>       could deadlock in MMU notifier callbacks for instance.
>> 
>>       This is the stage where the page table is actually modified. Hence, we
>>       can't acquire any locks that might be held elsewhere while doing
>>       non-atomic allocations. Also note that this is transitive, e.g. if you
>>       take lock A and somewhere else a lock B is taken while A is already held
>>       and we do non-atomic allocations while holding B, then A can't be held in
>>       the DMA fence signalling critical path either.
>> 
>>       It is also worth noting that this is the stage where we know the exact
>>       operations we have to execute based on the VM_BIND request from userspace.
>> 
>>       For instance, in the submit stage we may only know that userspace wants
>>       us to map a BO with a certain offset in the GPU's virtual address space
>>       at [0x0, 0x1000000]. What we don't know is what exact operations this
>>       requires, i.e. "What do we have to unmap first?", "Are there any
>>       overlapping mappings that we have to truncate?", etc.
>> 
>>       So, we have to consider this when we pre-allocate in the submit stage.
>> 
>>   (3) The cleanup stage.
>> 
>>       This is where the job has been signaled and hence left the DMA fence
>>       signalling critical section.
>> 
>>       In this stage the job is cleaned up, which includes freeing data that is
>>       not required anymore, such as PTEs and PDEs.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30  1:16             ` Gary Guo
@ 2026-01-30  1:45               ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-30  1:45 UTC (permalink / raw)
  To: Gary Guo
  Cc: Danilo Krummrich, Zhi Wang, linux-kernel, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Vivi Rodrigo, Tvrtko Ursulin, Rui Huang,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev, Gary Guo

> On Jan 29, 2026, at 8:16 PM, Gary Guo <gary@garyguo.net> wrote:
>
> On Fri Jan 30, 2026 at 12:26 AM GMT, Joel Fernandes wrote:
>> Hi, Danilo, all,
>>
>> Based on the below discussion and research, I came up with some deadlock
>> scenarios that we need to handle in the v6 series of these patches. Please let
>> me know if I missed something below. At the moment, off the top I identified
>> that we are doing GFP_KERNEL memory allocations inside GPU buddy allocator
>> during map/unmap. I will work on solutions for that. Thanks.
>>
>> All deadlock scenarios
>> ----------------------
>> The gist is, in the DMA fence signaling critical path we cannot acquire
>> resources (locks, memory allocations, etc.) that are already acquired when a
>> fence is being waited on to be signaled. So we have to be careful which resources
>> we acquire, and also in which driver paths we do memory allocations under locks
>> that we need in the dma-fence signaling critical path (when doing the virtual
>> memory map/unmap).
>
> When thinking about deadlocks, it usually helps to think not in terms of detailed
> scenarios (which would be hard to enumerate and easy to miss), but rather in
> terms of the relative order of resource acquisition. All resources that you wait on
> would need to form a partial order. Any violation could result in deadlocks.
> This is also how lockdep checks.
>
> So to me all the cases you listed are the same...

Hmm, I am quite familiar with lockdep internals, but I don't see how all cases
are the same when there are different resources being acquired (locks versus
memory allocation, for instance). I think it helps to visualize different cases
based on different scenarios for a complete understanding of the issues, and mild
repetition is a good thing IMO - the goal is to not miss anything. But agreed,
that is how lockdep works. Lockdep just needs those relationships in its graph;
knowing the ordering is enough to flag issues. Speaking of lockdep, I have not
checked, but we should probably add support for fence signal/wait and resource
dependencies, to catch any potential issues as well.

Thanks for taking a look,

--
Joel Fernandes



>
> Best,
> Gary
>
>>
>> 1. deadlock scenario 1: allocator deadlock (no locking needed to trigger it)
>>
>> Fence Signal start (A) -> Alloc -> MMU notifier/Shrinker (B) -> Fence Wait (A)
>>
>> ABA deadlock.
>>
>> 2. deadlock scenario 2: Same as 1, but ABBA scenario (2 CPUs).
>>
>> CPU 0: Fence Signal start (A) -> Alloc (B)
>>
>> CPU 1: Alloc -> MMU notifier or Shrinker (B) -> Fence Wait (A)
>>
>> 3. deadlock scenario 3: ABBA (and similar) deadlock, but with locking.
>>
>> CPU 0: Fence Signal start (A) -> Lock (B)
>>
>> CPU 1: Lock (B) -> Fence Wait (A)
>>
>> 4. deadlock scenario 4: Same as scenario 3, but the fence wait comes from
>> allocation path.
>>
>> rule: We cannot try to acquire locks in the DMA fence signaling critical path if
>> those locks were already acquired in paths that do reclaimable memory allocations.
>>
>> CPU 0: Fence Signal (A) -> Lock (B)
>>
>> CPU 1: Lock (B) -> Alloc -> Fence Wait (A)
>>
>> 5. deadlock scenario 5: Transitive locking:
>>
>> rule: We cannot try to acquire locks in the DMA fence signaling critical path
>> that are transitively waiting on the same DMA fence.
>>
>> Fence Signal (A) -> Lock (B)
>>
>> Lock (B) -> Lock(C)
>>
>> Lock (C) -> Alloc -> Fence Wait (A)
>>
>> ABBCCA deadlock.
>>
>>
>> --
>> Joel Fernandes
>>
>>> On 1/28/2026 7:04 AM, Danilo Krummrich wrote:
>>> On Fri Jan 23, 2026 at 12:16 AM CET, Joel Fernandes wrote:
>>>> My plan is to make TLB and PRAMIN use immutable references in their function
>>>> calls and then implement internal locking. I've already done this for the GPU
>>>> buddy functions, so it should be doable, and we'll keep it consistent. As a
>>>> result, we will have finer-grain locking on the memory management objects
>>>> instead of requiring to globally lock a common GpuMm object. I'll plan on
>>>> doing this for v7.
>>>>
>>>> Also, the PTE allocation race you mentioned is already handled by PRAMIN
>>>> serialization. Since threads must hold the PRAMIN lock to write page table
>>>> entries, concurrent writers are not possible:
>>>>
>>>>  Thread A: acquire PRAMIN lock
>>>>  Thread A: read PDE (via PRAMIN) -> NULL
>>>>  Thread A: alloc PT page, write PDE
>>>>  Thread A: release PRAMIN lock
>>>>
>>>>  Thread B: acquire PRAMIN lock
>>>>  Thread B: read PDE (via PRAMIN) -> sees A's pointer
>>>>  Thread B: uses existing PT page, no allocation needed
>>>
>>> This won't work unfortunately.
>>>
>>> We have to separate allocations and modifications of the page table. Or in other
>>> words, we must not allocate new PDEs or PTEs while holding the lock protecting
>>> the page table from modifications.
>>>
>>> Once we have VM_BIND in nova-drm, we will have the situation that userspace
>>> passes jobs to modify the GPU's virtual address space and hence the page tables.
>>>
>>> Such a job has mainly three stages.
>>>
>>>  (1) The submit stage.
>>>
>>>      This is where the job is initialized, dependencies are set up and the
>>>      driver has to pre-allocate all kinds of structures that are required
>>>      throughout the subsequent stages of the job.
>>>
>>>  (2) The run stage.
>>>
>>>      This is the stage where the job is staged for execution and its DMA fence
>>>      has been made public (i.e. it is accessible by userspace).
>>>
>>>      This is the stage where we are in the DMA fence signalling critical
>>>      section, hence we can't do any non-atomic allocations, since otherwise we
>>>      could deadlock in MMU notifier callbacks for instance.
>>>
>>>      This is the stage where the page table is actually modified. Hence, we
>>>      can't acquire any locks that might be held elsewhere while doing
>>>      non-atomic allocations. Also note that this is transitive, e.g. if you
>>>      take lock A and somewhere else a lock B is taken while A is already held
>>>      and we do non-atomic allocations while holding B, then A can't be held in
>>>      the DMA fence signalling critical path either.
>>>
>>>      It is also worth noting that this is the stage where we know the exact
>>>      operations we have to execute based on the VM_BIND request from userspace.
>>>
>>>      For instance, in the submit stage we may only know that userspace wants
>>>      us to map a BO with a certain offset in the GPU's virtual address space
>>>      at [0x0, 0x1000000]. What we don't know is what exact operations this
>>>      requires, i.e. "What do we have to unmap first?", "Are there any
>>>      overlapping mappings that we have to truncate?", etc.
>>>
>>>      So, we have to consider this when we pre-allocate in the submit stage.
>>>
>>>  (3) The cleanup stage.
>>>
>>>      This is where the job has been signaled and hence left the DMA fence
>>>      signalling critical section.
>>>
>>>      In this stage the job is cleaned up, which includes freeing data that is
>>>      not required anymore, such as PTEs and PDEs.
>

-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30  1:11             ` John Hubbard
@ 2026-01-30  1:59               ` Joel Fernandes
  2026-01-30  3:38                 ` John Hubbard
  0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-30  1:59 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Zhi Wang, linux-kernel, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Rui Huang,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Bjorn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On 1/29/26 8:12 PM, John Hubbard wrote:
> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>> Based on the below discussion and research, I came up with some deadlock
>> scenarios that we need to handle in the v6 series of these patches.
>> [...]
>> memory allocations under locks that we need in the dma-fence signaling
>> critical path (when doing the virtual memory map/unmap)
> 
> unmap? Are you seeing any allocations happening during unmap? I don't
> immediately see any, but that sounds surprising.

Not allocations but we are acquiring locks during unmap. My understanding
is (at least some) unmaps have to also be done in the dma fence signaling
critical path (the run stage), but Danilo/you can correct me if I am wrong
on that. We cannot avoid all locking but those same locks cannot be held in
any other paths which do a memory allocation (as mentioned in one of the
deadlock scenarios), that is probably the main thing to check for unmap.

Thanks,
-- 
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30  1:59               ` Joel Fernandes
@ 2026-01-30  3:38                 ` John Hubbard
  2026-01-30 21:14                   ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: John Hubbard @ 2026-01-30  3:38 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Danilo Krummrich, Zhi Wang, linux-kernel, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Rui Huang,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Bjorn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On 1/29/26 5:59 PM, Joel Fernandes wrote:
> On 1/29/26 8:12 PM, John Hubbard wrote:
>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>> Based on the below discussion and research, I came up with some deadlock
>>> scenarios that we need to handle in the v6 series of these patches.
>>> [...]
>>> memory allocations under locks that we need in the dma-fence signaling
>>> critical path (when doing the virtual memory map/unmap)
>>
>> unmap? Are you seeing any allocations happening during unmap? I don't
>> immediately see any, but that sounds surprising.
> 
> Not allocations but we are acquiring locks during unmap. My understanding
> is (at least some) unmaps have to also be done in the dma fence signaling
> critical path (the run stage), but Danilo/you can correct me if I am wrong
> on that. We cannot avoid all locking but those same locks cannot be held in
> any other paths which do a memory allocation (as mentioned in one of the
> deadlock scenarios), that is probably the main thing to check for unmap.
> 

Right, OK we are on the same page now: no allocations happening on unmap,
but it can still deadlock, because the driver is typically going to
use a single lock to protect both map- and unmap-related calls
to the buddy allocator.

For the deadlock above, I think a good way to break that deadlock is
to not allow taking that lock in a fence signaling calling path.

So during an unmap, instead of "lock, unmap/free, unlock" it should
move the item to a deferred-free list, which is processed separately.
Of course, this is a little complex, because the allocation and reclaim
paths have to be aware of such lists if they get large.
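The deferred-free idea could look roughly like this (a Rust sketch with invented types; a real driver would use an intrusive list so that queuing itself never allocates):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Sketch of a deferred-free list (illustrative, not driver code). In the
// fence signalling path, unmap only queues the block handle under a small
// dedicated lock; a workqueue item later takes the buddy-allocator lock and
// performs the real free, so that lock never appears in the signalling path.
pub struct DeferredFree {
    pending: Mutex<VecDeque<u64>>, // block handles awaiting the real free
}

impl DeferredFree {
    pub fn new(capacity: usize) -> Self {
        DeferredFree { pending: Mutex::new(VecDeque::with_capacity(capacity)) }
    }

    // Fence-signalling safe: no allocation. Refuses to grow the queue
    // (a real implementation would use an intrusive list instead).
    pub fn defer(&self, block: u64) -> bool {
        let mut q = self.pending.lock().unwrap();
        if q.len() == q.capacity() {
            return false;
        }
        q.push_back(block);
        true
    }

    // Runs from a workqueue, outside the signalling path: here it is safe
    // to call into the buddy allocator's locked free path.
    pub fn process<F: FnMut(u64)>(&self, mut free: F) -> usize {
        let mut q = self.pending.lock().unwrap();
        let n = q.len();
        for b in q.drain(..) {
            free(b);
        }
        n
    }
}
```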



thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30  3:38                 ` John Hubbard
@ 2026-01-30 21:14                   ` Joel Fernandes
  2026-01-31  3:00                     ` Dave Airlie
  0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-01-30 21:14 UTC (permalink / raw)
  To: John Hubbard
  Cc: Danilo Krummrich, Zhi Wang, linux-kernel, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Rui Huang,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Bjorn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev



On 1/29/2026 10:38 PM, John Hubbard wrote:
> On 1/29/26 5:59 PM, Joel Fernandes wrote:
>> On 1/29/26 8:12 PM, John Hubbard wrote:
>>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>>> Based on the below discussion and research, I came up with some deadlock
>>>> scenarios that we need to handle in the v6 series of these patches.
>>>> [...]
>>>> memory allocations under locks that we need in the dma-fence signaling
>>>> critical path (when doing the virtual memory map/unmap)
>>>
>>> unmap? Are you seeing any allocations happening during unmap? I don't
>>> immediately see any, but that sounds surprising.
>>
>> Not allocations but we are acquiring locks during unmap. My understanding
>> is (at least some) unmaps have to also be done in the dma fence signaling
>> critical path (the run stage), but Danilo/you can correct me if I am wrong
>> on that. We cannot avoid all locking but those same locks cannot be held in
>> any other paths which do a memory allocation (as mentioned in one of the
>> deadlock scenarios), that is probably the main thing to check for unmap.
>>
> 
> Right, OK we are on the same page now: no allocations happening on unmap,
> but it can still deadlock, because the driver is typically going to
> use a single lock to protect both map- and unmap-related calls
> to the buddy allocator.

Yes exactly!

> 
> For the deadlock above, I think a good way to break that deadlock is
> to not allow taking that lock in a fence signaling calling path.
> 
> So during an unmap, instead of "lock, unmap/free, unlock" it should
> move the item to a deferred-free list, which is processed separately.
> Of course, this is a little complex, because the allocation and reclaim
> paths have to be aware of such lists if they get large.
Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
(whichever we take during map). The GPU buddy actually does GFP_KERNEL
allocations internally, which is problematic.

Some solutions / next steps:

1. allocating (VRAM and system memory) outside mm locks just before acquiring them.

2. pre-allocating both VRAM and system memory needed, before the DMA fence
critical paths (The issue is also to figure out how much memory to pre-allocate
for the page table pages based on the VM_BIND request. I think we can analyze
the page tables in the submit stage to make an estimate).

3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
(called virt_buddy), which itself does GFP_KERNEL memory allocations in the
allocate path. I am not sure what to do yet about this. ISTR the maple tree also
has similar issues.

4. Using non-reclaimable memory allocations where pre-allocation or
pre-allocated memory pools is not possible (I'd like to avoid this #4 so we
don't fail allocations when memory is scarce).
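Option 1 above might be sketched like this (illustrative `PageTable` type, not the nova-core API; the point is only the allocate-then-lock shape):

```rust
use std::sync::Mutex;

// Sketch of option 1 (invented types): do all allocation before taking the
// mm lock; under the lock we only install pages from the pre-built pool.
// Leftover pages are dropped after unlock, outside the critical section.
pub struct PageTable {
    pub entries: Mutex<Vec<Option<Box<[u8; 4096]>>>>,
}

pub fn map_range(pt: &PageTable, indices: &[usize]) {
    // 1. Allocate outside the lock: worst case, one page per index.
    let mut pool: Vec<Box<[u8; 4096]>> =
        indices.iter().map(|_| Box::new([0u8; 4096])).collect();

    // 2. Under the lock: no allocation, only consume from the pool.
    let mut entries = pt.entries.lock().unwrap();
    for &i in indices {
        if entries[i].is_none() {
            entries[i] = pool.pop();
        }
    }
    drop(entries);

    // 3. Any unused pages left in `pool` are freed here, after unlock.
}
```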

Will work on these issues for the v7. Thanks,

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-30 21:14                   ` Joel Fernandes
@ 2026-01-31  3:00                     ` Dave Airlie
  2026-01-31  3:21                       ` John Hubbard
                                         ` (2 more replies)
  0 siblings, 3 replies; 71+ messages in thread
From: Dave Airlie @ 2026-01-31  3:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: John Hubbard, Danilo Krummrich, Zhi Wang, linux-kernel,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher, Christian Koenig,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Rui Huang, Matthew Auld, Matthew Brost, Lucas De Marchi,
	Thomas Hellstrom, Helge Deller, Alice Ryhl, Miguel Ojeda,
	Alex Gaynor, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, Alistair Popple, Timur Tabi,
	Edwin Peer, Alexandre Courbot, Andrea Righi, Andy Ritger,
	Alexey Ivanov, Balbir Singh, Philipp Stanner, Elle Rhumsaa,
	Daniel Almeida, nouveau, dri-devel, rust-for-linux, linux-doc,
	amd-gfx, intel-gfx, intel-xe, linux-fbdev

On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
>
>
> On 1/29/2026 10:38 PM, John Hubbard wrote:
> > On 1/29/26 5:59 PM, Joel Fernandes wrote:
> >> On 1/29/26 8:12 PM, John Hubbard wrote:
> >>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
> >>>> Based on the below discussion and research, I came up with some deadlock
> >>>> scenarios that we need to handle in the v6 series of these patches.
> >>>> [...]
> >>>> memory allocations under locks that we need in the dma-fence signaling
> >>>> critical path (when doing the virtual memory map/unmap)
> >>>
> >>> unmap? Are you seeing any allocations happening during unmap? I don't
> >>> immediately see any, but that sounds surprising.
> >>
> >> Not allocations but we are acquiring locks during unmap. My understanding
> >> is (at least some) unmaps have to also be done in the dma fence signaling
> >> critical path (the run stage), but Danilo/you can correct me if I am wrong
> >> on that. We cannot avoid all locking but those same locks cannot be held in
> >> any other paths which do a memory allocation (as mentioned in one of the
> >> deadlock scenarios), that is probably the main thing to check for unmap.
> >>
> >
> > Right, OK we are on the same page now: no allocations happening on unmap,
> > but it can still deadlock, because the driver is typically going to
> > use a single lock to protect both map- and unmap-related calls
> > to the buddy allocator.
>
> Yes exactly!
>
> >
> > For the deadlock above, I think a good way to break that deadlock is
> > to not allow taking that lock in a fence signaling calling path.
> >
> > So during an unmap, instead of "lock, unmap/free, unlock" it should
> > move the item to a deferred-free list, which is processed separately.
> > Of course, this is a little complex, because the allocation and reclaim
> > paths have to be aware of such lists if they get large.
> Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
> (whichever we take during map). The GPU buddy actually does GFP_KERNEL
> allocations internally which is problematic.
>
> Some solutions / next steps:
>
> 1. allocating (VRAM and system memory) outside mm locks just before acquiring them.
>
> 2. pre-allocating both VRAM and system memory needed, before the DMA fence
> critical paths (The issue is also to figure out how much memory to pre-allocate
> for the page table pages based on the VM_BIND request. I think we can analyze
> the page tables in the submit stage to make an estimate).
>
> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
> allocate path. I am not sure what to do yet about this. ISTR the maple tree also
> has similar issues.
>
> 4. Using non-reclaimable memory allocations where pre-allocation or
> pre-allocated memory pools is not possible (I'd like to avoid this #4 so we
> don't fail allocations when memory is scarce).
>
> Will work on these issues for the v7. Thanks,

The way this works on nouveau at least (and I haven't yet read the
nova code in depth) is that we have 4 stages of vmm page table mgmt.

ref - locked with a ref lock - can allocate/free memory - just makes
sure the page tables exist and are reference counted
map - locked with a map lock - cannot allocate memory - fill in the
PTEs in the page table
unmap - locked with a map lock - cannot allocate memory - removes
entries in PTEs
unref - locked with a ref lock - can allocate/free memory - just drops
references and frees (not sure if it ever merges).

So maps and unmaps can be in fence signalling paths, but unrefs are
done in free job from a workqueue.
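The four-stage split above could be sketched in userspace Rust as follows (hypothetical names; a toy model, not nouveau's actual locking). The write lock stands in for the "ref lock" (structure may change, allocation allowed); the read lock stands in for the "map lock" (structure is fixed, PTEs are written in place, no allocation):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// A single page table with a fixed number of PTE slots.
struct PageTable {
    ptes: Vec<AtomicU64>, // allocated once by ref, never resized by map
    refs: usize,
}

/// Toy VMM with the four stages: ref/unref may allocate/free under the
/// write ("ref") lock; map/unmap only write existing slots under the
/// read ("map") lock.
struct Vmm {
    tables: RwLock<Vec<Option<PageTable>>>,
}

impl Vmm {
    fn new(n: usize) -> Self {
        Vmm { tables: RwLock::new((0..n).map(|_| None).collect()) }
    }

    /// Stage 1, ref: ensure the table exists and count a reference.
    fn ref_table(&self, i: usize) {
        let mut t = self.tables.write().unwrap();
        let slot = t[i].get_or_insert_with(|| PageTable {
            ptes: (0..512).map(|_| AtomicU64::new(0)).collect(),
            refs: 0,
        });
        slot.refs += 1;
    }

    /// Stage 2, map: fill a PTE; no allocation, fence-signalling safe.
    fn map(&self, i: usize, idx: usize, pte: u64) {
        let t = self.tables.read().unwrap();
        t[i].as_ref().expect("ref_table first").ptes[idx].store(pte, Ordering::Relaxed);
    }

    /// Stage 3, unmap: clear a PTE; no allocation, fence-signalling safe.
    fn unmap(&self, i: usize, idx: usize) {
        self.map(i, idx, 0);
    }

    /// Stage 4, unref: drop a reference; frees the table on the last one.
    fn unref_table(&self, i: usize) {
        let mut t = self.tables.write().unwrap();
        let drop_it = {
            let slot = t[i].as_mut().expect("ref_table first");
            slot.refs -= 1;
            slot.refs == 0
        };
        if drop_it {
            t[i] = None; // frees the PTE storage
        }
    }
}

fn main() {
    let vmm = Vmm::new(4);
    vmm.ref_table(0);
    vmm.map(0, 7, 0x1000 | 1);
    assert_eq!(vmm.tables.read().unwrap()[0].as_ref().unwrap().ptes[7].load(Ordering::Relaxed), 0x1001);
    vmm.unmap(0, 7);
    vmm.unref_table(0);
    assert!(vmm.tables.read().unwrap()[0].is_none());
}
```

The point the model captures is that everything reachable from the fence-signalling path (map/unmap) writes only into storage that ref already allocated, while unref, which frees, runs from free-job/workqueue context.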

Dave.
>
> --
> Joel Fernandes
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-31  3:00                     ` Dave Airlie
@ 2026-01-31  3:21                       ` John Hubbard
  2026-01-31 20:08                         ` Joel Fernandes
  2026-01-31 20:02                       ` Joel Fernandes
  2026-02-02  9:12                       ` Christian König
  2 siblings, 1 reply; 71+ messages in thread
From: John Hubbard @ 2026-01-31  3:21 UTC (permalink / raw)
  To: Dave Airlie, Joel Fernandes
  Cc: Danilo Krummrich, Zhi Wang, linux-kernel, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Rui Huang, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Trevor Gross,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, nouveau, dri-devel,
	rust-for-linux, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev

On 1/30/26 7:00 PM, Dave Airlie wrote:
> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>> On 1/29/2026 10:38 PM, John Hubbard wrote:
>>> On 1/29/26 5:59 PM, Joel Fernandes wrote:
>>>> On 1/29/26 8:12 PM, John Hubbard wrote:
>>>>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>>>>> [...]
>> Will work on these issues for the v7. Thanks,
> 
> The way this works on nouveau at least (and I haven't yet read the
> nova code in depth) is that we have 4 stages of vmm page table mgmt:
> 
> ref - locked with a ref lock - can allocate/free memory - just makes
> sure the page tables exist and are reference counted
> map - locked with a map lock - cannot allocate memory - fill in the
> PTEs in the page table
> unmap - locked with a map lock - cannot allocate memory - removes
> entries in PTEs
> unref - locked with a ref lock - can allocate/free memory - just drops
> references and frees (not sure if it ever merges).
> 
> So maps and unmaps can be in fence signalling paths, but unrefs are
> done in free job from a workqueue.
> 

Nice! Thanks Dave, I guess this is one time we really should have
taken a peek at nouveau for inspiration after all. :)

thanks,
-- 
John Hubbard



* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-31  3:00                     ` Dave Airlie
  2026-01-31  3:21                       ` John Hubbard
@ 2026-01-31 20:02                       ` Joel Fernandes
  2026-02-02  9:12                       ` Christian König
  2 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-31 20:02 UTC (permalink / raw)
  To: Dave Airlie
  Cc: John Hubbard, Danilo Krummrich, Zhi Wang,
	linux-kernel@vger.kernel.org, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian Koenig, Jani Nikula, Joonas Lahtinen, Vivi Rodrigo,
	Tvrtko Ursulin, Rui Huang, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, nouveau@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-doc@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org

Hi Dave,

> On Jan 30, 2026, at 10:01 PM, Dave Airlie <airlied@gmail.com> wrote:
> 
> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>> 
>> 
>> 
>>> On 1/29/2026 10:38 PM, John Hubbard wrote:
>>> On 1/29/26 5:59 PM, Joel Fernandes wrote:
>>>> On 1/29/26 8:12 PM, John Hubbard wrote:
>>>>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>>>>> Based on the below discussion and research, I came up with some deadlock
>>>>>> scenarios that we need to handle in the v6 series of these patches.
>>>>>> [...]
>>>>>> memory allocations under locks that we need in the dma-fence signaling
>>>>>> critical path (when doing the virtual memory map/unmap)
>>>>> 
>>>>> unmap? Are you seeing any allocations happening during unmap? I don't
>>>>> immediately see any, but that sounds surprising.
>>>> 
>>>> Not allocations but we are acquiring locks during unmap. My understanding
>>>> is (at least some) unmaps have to also be done in the dma fence signaling
>>>> critical path (the run stage), but Danilo/you can correct me if I am wrong
>>>> on that. We cannot avoid all locking but those same locks cannot be held in
>>>> any other paths which do a memory allocation (as mentioned in one of the
>>>> deadlock scenarios), that is probably the main thing to check for unmap.
>>>> 
>>> 
>>> Right, OK we are on the same page now: no allocations happening on unmap,
>>> but it can still deadlock, because the driver is typically going to
>>> use a single lock to protect both map-related and unmap-related calls
>>> to the buddy allocator.
>> 
>> Yes exactly!
>> 
>>> 
>>> For the deadlock above, I think a good way to break that deadlock is
>>> to not allow taking that lock in a fence signaling calling path.
>>> 
>>> So during an unmap, instead of "lock, unmap/free, unlock" it should
>>> move the item to a deferred-free list, which is processed separately.
>>> Of course, this is a little complex, because the allocation and reclaim
>>> has to be aware of such lists if they get large.
>> Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
>> (whichever we take during map). The GPU buddy actually does GFP_KERNEL
>> allocations internally which is problematic.
>> 
>> Some solutions / next steps:
>> 
>> 1. allocating (VRAM and system memory) outside mm locks just before acquiring them.
>> 
>> 2. pre-allocating both VRAM and system memory needed, before the DMA fence
>> critical paths (The issue is also to figure out how much memory to pre-allocate
>> for the page table pages based on the VM_BIND request. I think we can analyze
>> the page tables in the submit stage to make an estimate).
>> 
>> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
>> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
>> allocate path. I am not sure what to do about this yet. ISTR the maple tree also
>> has similar issues.
>> 
>> 4. Using non-reclaimable memory allocations where pre-allocation or
>> pre-allocated memory pools is not possible (I'd like to avoid this #4 so we
>> don't fail allocations when memory is scarce).
>> 
>> Will work on these issues for the v7. Thanks,
> 
> The way this works on nouveau at least (and I haven't yet read the
> nova code in depth) is that we have 4 stages of vmm page table mgmt:
> 
> ref - locked with a ref lock - can allocate/free memory - just makes
> sure the page tables exist and are reference counted
> map - locked with a map lock - cannot allocate memory - fill in the
> PTEs in the page table
> unmap - locked with a map lock - cannot allocate memory - removes
> entries in PTEs
> unref - locked with a ref lock - can allocate/free memory - just drops
> references and frees (not sure if it ever merges).

Thanks for sharing this; yes, this is similar to what I am coming up with.

One thing to note is that OpenRM (and the Linux kernel) have finer-grained locking.

But I think we can keep it simple initially, like Nouveau does, and add complexity progressively.

Joel Fernandes


> 
> So maps and unmaps can be in fence signalling paths, but unrefs are
> done in free job from a workqueue.
> 
> Dave.
>> 
>> --
>> Joel Fernandes
>> 


* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-31  3:21                       ` John Hubbard
@ 2026-01-31 20:08                         ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-01-31 20:08 UTC (permalink / raw)
  To: John Hubbard
  Cc: Dave Airlie, Danilo Krummrich, Zhi Wang,
	linux-kernel@vger.kernel.org, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian Koenig, Jani Nikula, Joonas Lahtinen, Vivi Rodrigo,
	Tvrtko Ursulin, Rui Huang, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alice Ryhl,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Trevor Gross, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Alexey Ivanov, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, Daniel Almeida, nouveau@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, rust-for-linux@vger.kernel.org,
	linux-doc@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org



> On Jan 30, 2026, at 10:21 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 1/30/26 7:00 PM, Dave Airlie wrote:
>>> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>> On 1/29/2026 10:38 PM, John Hubbard wrote:
>>>> On 1/29/26 5:59 PM, Joel Fernandes wrote:
>>>>> On 1/29/26 8:12 PM, John Hubbard wrote:
>>>>>>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>>>>>>> [...]
>>> Will work on these issues for the v7. Thanks,
>> 
>> The way this works on nouveau at least (and I haven't yet read the
>> nova code in depth) is that we have 4 stages of vmm page table mgmt:
>> 
>> ref - locked with a ref lock - can allocate/free memory - just makes
>> sure the page tables exist and are reference counted
>> map - locked with a map lock - cannot allocate memory - fill in the
>> PTEs in the page table
>> unmap - locked with a map lock - cannot allocate memory - removes
>> entries in PTEs
>> unref - locked with a ref lock - can allocate/free memory - just drops
>> references and frees (not sure if it ever merges).
>> 
>> So maps and unmaps can be in fence signalling paths, but unrefs are
>> done in free job from a workqueue.
>> 
> 
> Nice! Thanks Dave

Indeed, thanks Dave and John.


> , I guess this is one time we really should have
> taken a peek at nouveau for inspiration after all. :)

I have actually been referring to Nouveau, OpenRM, and the core kernel mm code for my research in this area; these have all been great references. :) Thanks,

Joel Fernandes


> 
> thanks,
> --
> John Hubbard
> 


* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-01-31  3:00                     ` Dave Airlie
  2026-01-31  3:21                       ` John Hubbard
  2026-01-31 20:02                       ` Joel Fernandes
@ 2026-02-02  9:12                       ` Christian König
  2026-02-04 23:42                         ` Joel Fernandes
  2 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2026-02-02  9:12 UTC (permalink / raw)
  To: Dave Airlie, Joel Fernandes
  Cc: John Hubbard, Danilo Krummrich, Zhi Wang, linux-kernel,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Rui Huang,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Bjorn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On 1/31/26 04:00, Dave Airlie wrote:
> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>>
>>
>> On 1/29/2026 10:38 PM, John Hubbard wrote:
>>> On 1/29/26 5:59 PM, Joel Fernandes wrote:
>>>> On 1/29/26 8:12 PM, John Hubbard wrote:
>>>>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>>>>> Based on the below discussion and research, I came up with some deadlock
>>>>>> scenarios that we need to handle in the v6 series of these patches.
>>>>>> [...]
>>>>>> memory allocations under locks that we need in the dma-fence signaling
>>>>>> critical path (when doing the virtual memory map/unmap)
>>>>>
>>>>> unmap? Are you seeing any allocations happening during unmap? I don't
>>>>> immediately see any, but that sounds surprising.
>>>>
>>>> Not allocations but we are acquiring locks during unmap. My understanding
>>>> is (at least some) unmaps have to also be done in the dma fence signaling
>>>> critical path (the run stage), but Danilo/you can correct me if I am wrong
>>>> on that. We cannot avoid all locking but those same locks cannot be held in
>>>> any other paths which do a memory allocation (as mentioned in one of the
>>>> deadlock scenarios), that is probably the main thing to check for unmap.
>>>>
>>>
>>> Right, OK we are on the same page now: no allocations happening on unmap,
>>> but it can still deadlock, because the driver is typically going to
>>> use a single lock to protect both map-related and unmap-related calls
>>> to the buddy allocator.
>>
>> Yes exactly!
>>
>>>
>>> For the deadlock above, I think a good way to break that deadlock is
>>> to not allow taking that lock in a fence signaling calling path.
>>>
>>> So during an unmap, instead of "lock, unmap/free, unlock" it should
>>> move the item to a deferred-free list, which is processed separately.
>>> Of course, this is a little complex, because the allocation and reclaim
>>> has to be aware of such lists if they get large.
>> Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
>> (whichever we take during map). The GPU buddy actually does GFP_KERNEL
>> allocations internally which is problematic.
>>
>> Some solutions / next steps:
>>
>> 1. allocating (VRAM and system memory) outside mm locks just before acquiring them.
>>
>> 2. pre-allocating both VRAM and system memory needed, before the DMA fence
>> critical paths (The issue is also to figure out how much memory to pre-allocate
>> for the page table pages based on the VM_BIND request. I think we can analyze
>> the page tables in the submit stage to make an estimate).
>>
>> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
>> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
>> allocate path. I am not sure what to do about this yet. ISTR the maple tree also
>> has similar issues.
>>
>> 4. Using non-reclaimable memory allocations where pre-allocation or
>> pre-allocated memory pools is not possible (I'd like to avoid this #4 so we
>> don't fail allocations when memory is scarce).
>>
>> Will work on these issues for the v7. Thanks,
> 
> The way this works on nouveau at least (and I haven't yet read the
> nova code in depth) is that we have 4 stages of vmm page table mgmt:
> 
> ref - locked with a ref lock - can allocate/free memory - just makes
> sure the page tables exist and are reference counted
> map - locked with a map lock - cannot allocate memory - fill in the
> PTEs in the page table
> unmap - locked with a map lock - cannot allocate memory - removes
> entries in PTEs
> unref - locked with a ref lock - can allocate/free memory - just drops
> references and frees (not sure if it ever merges).

On amdgpu, VM page tables are allocated and PTEs filled outside of the fence critical path.

Only invalidating PTEs, to signal that a shader needs to be taken off the HW, is inside the fence critical path, and no memory allocation is needed there.

Keep in mind that you not only need to avoid memory allocations inside the critical path, but must also not take locks under which memory is allocated.

Simona added some dma_fence_begin_signalling() and dma_fence_end_signalling() helpers to add lockdep annotations to the fence signaling path. Those have proven to be extremely useful since they allow lockdep to point out mistakes immediately and not just after hours of running on a test system.
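The kind of checking Christian describes can be illustrated with a tiny userspace analogue: a thread-local flag toggled by begin/end markers, and an allocation wrapper that reports a violation when called inside the critical section. The real kernel helpers are `dma_fence_begin_signalling()`/`dma_fence_end_signalling()` and work through lockdep; everything below is just a toy model of the idea, with all names hypothetical:

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread flag: are we inside a fence-signalling critical section?
    static IN_SIGNALLING: Cell<bool> = Cell::new(false);
}

/// Toy analogue of dma_fence_begin_signalling().
fn begin_signalling() {
    IN_SIGNALLING.with(|f| f.set(true));
}

/// Toy analogue of dma_fence_end_signalling().
fn end_signalling() {
    IN_SIGNALLING.with(|f| f.set(false));
}

/// Wrapper around a reclaim-capable allocation: returns an error instead of
/// the buffer when called from a signalling section, mimicking how lockdep
/// flags the mistake immediately rather than after hours of testing.
fn checked_alloc(len: usize) -> Result<Vec<u8>, &'static str> {
    if IN_SIGNALLING.with(|f| f.get()) {
        return Err("GFP_KERNEL-style allocation inside fence-signalling section");
    }
    Ok(vec![0u8; len])
}

fn main() {
    assert!(checked_alloc(4096).is_ok()); // fine outside the section
    begin_signalling();
    assert!(checked_alloc(4096).is_err()); // caught at the call site
    end_signalling();
    assert!(checked_alloc(4096).is_ok());
}
```

The real annotations go further: lockdep also catches locks taken inside the section that are elsewhere held across allocations, which is exactly the indirect deadlock discussed in this thread.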

Regards,
Christian.

> 
> So maps and unmaps can be in fence signalling paths, but unrefs are
> done in free job from a workqueue.
> 
> Dave.
>>
>> --
>> Joel Fernandes
>>



* Re: [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings
  2026-01-20 20:42 ` [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings Joel Fernandes
@ 2026-02-04  3:55   ` Dave Airlie
  2026-02-05  1:00     ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: Dave Airlie @ 2026-02-04  3:55 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

> +///
> +/// These flags control the allocation behavior of the buddy allocator.
> +#[derive(Clone, Copy, Default, PartialEq, Eq)]
> +pub struct BuddyFlags(usize);
> +
> +impl BuddyFlags {
> +    /// Range-based allocation from start to end addresses.
> +    pub const RANGE_ALLOCATION: usize = bindings::GPU_BUDDY_RANGE_ALLOCATION;
> +
> +    /// Allocate from top of address space downward.
> +    pub const TOPDOWN_ALLOCATION: usize = bindings::GPU_BUDDY_TOPDOWN_ALLOCATION;
> +
> +    /// Allocate physically contiguous blocks.
> +    pub const CONTIGUOUS_ALLOCATION: usize = bindings::GPU_BUDDY_CONTIGUOUS_ALLOCATION;
> +
> +    /// Request allocation from the cleared (zeroed) memory. The zero'ing is not
> +    /// done by the allocator, but by the caller before freeing old blocks.
> +    pub const CLEAR_ALLOCATION: usize = bindings::GPU_BUDDY_CLEAR_ALLOCATION;
> +
> +    /// Disable trimming of partially used blocks.
> +    pub const TRIM_DISABLE: usize = bindings::GPU_BUDDY_TRIM_DISABLE;
> +
> +    /// Mark blocks as cleared (zeroed) when freeing. When set during free,
> +    /// indicates that the caller has already zeroed the memory.
> +    pub const CLEARED: usize = bindings::GPU_BUDDY_CLEARED;
> +
> +    /// Create [`BuddyFlags`] from a raw value with validation.
> +    ///
> +    /// Use `|` operator to combine flags if needed, before calling this method.
> +    pub fn try_new(flags: usize) -> Result<Self> {
> +        // Flags must not exceed u32::MAX to satisfy the GPU buddy allocator C API.
> +        if flags > u32::MAX as usize {
> +            return Err(EINVAL);
> +        }
> +
> +        // `TOPDOWN_ALLOCATION` only works without `RANGE_ALLOCATION`. When both are
> +        // set, `TOPDOWN_ALLOCATION` is silently ignored by the allocator. Reject this.
> +        if (flags & Self::RANGE_ALLOCATION) != 0 && (flags & Self::TOPDOWN_ALLOCATION) != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        Ok(Self(flags))
> +    }
> +
> +    /// Get raw value of the flags.
> +    pub(crate) fn as_raw(self) -> usize {
> +        self.0
> +    }
> +}
> +
> +/// Parameters for creating a GPU buddy allocator.
> +#[derive(Clone, Copy)]
> +pub struct GpuBuddyParams {
> +    /// Base offset in bytes where the managed memory region starts.
> +    /// Allocations will be offset by this value.
> +    pub base_offset_bytes: u64,
> +    /// Total physical memory size managed by the allocator in bytes.
> +    pub physical_memory_size_bytes: u64,
> +    /// Minimum allocation unit / chunk size in bytes, must be >= 4KB.
> +    pub chunk_size_bytes: u64,
> +}
> +
> +/// Parameters for allocating blocks from a GPU buddy allocator.
> +#[derive(Clone, Copy)]
> +pub struct GpuBuddyAllocParams {
> +    /// Start of allocation range in bytes. Use 0 for beginning.
> +    pub start_range_address: u64,
> +    /// End of allocation range in bytes. Use 0 for entire range.
> +    pub end_range_address: u64,
> +    /// Total size to allocate in bytes.
> +    pub size_bytes: u64,
> +    /// Minimum block size for fragmented allocations in bytes.
> +    pub min_block_size_bytes: u64,
> +    /// Buddy allocator behavior flags.
> +    pub buddy_flags: BuddyFlags,
> +}
> +

(not a full review)

Any reason these two need Clone, Copy? I'm not seeing a use case for
that; maybe we should pass them as immutable references, but I don't
think there is any point in ever passing them by value.

Dave.


* Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
  2026-02-02  9:12                       ` Christian König
@ 2026-02-04 23:42                         ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-02-04 23:42 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: John Hubbard, Danilo Krummrich, Zhi Wang, linux-kernel,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Rui Huang,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alice Ryhl, Miguel Ojeda, Alex Gaynor, Boqun Feng,
	Gary Guo, Bjorn Roy Baron, Benno Lossin, Andreas Hindborg,
	Trevor Gross, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Alexey Ivanov,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Daniel Almeida,
	nouveau, dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev



On 2/2/2026 4:12 AM, Christian König wrote:
> On 1/31/26 04:00, Dave Airlie wrote:
>> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>> On 1/29/2026 10:38 PM, John Hubbard wrote:
[...]
>>>> For the deadlock above, I think a good way to break that deadlock is
>>>> to not allow taking that lock in a fence signaling calling path.
>>>>
>>>> So during an unmap, instead of "lock, unmap/free, unlock" it should
>>>> move the item to a deferred-free list, which is processed separately.
>>>> Of course, this is a little complex, because the allocation and reclaim
>>>> has to be aware of such lists if they get large.
>>> Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
>>> (whichever we take during map). The GPU buddy actually does GFP_KERNEL
>>> allocations internally which is problematic.
>>>
>>> Some solutions / next steps:
>>>
>>> 1. allocating (VRAM and system memory) outside mm locks just before acquiring them.
>>>
>>> 2. pre-allocating both VRAM and system memory needed, before the DMA fence
>>> critical paths (The issue is also to figure out how much memory to pre-allocate
>>> for the page table pages based on the VM_BIND request. I think we can analyze
>>> the page tables in the submit stage to make an estimate).
>>>
>>> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
>>> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
>>> allocate path. I am not sure what to do about this yet. ISTR the maple tree also
>>> has similar issues.
>>>
>>> 4. Using non-reclaimable memory allocations where pre-allocation or
>>> pre-allocated memory pools is not possible (I'd like to avoid this #4 so we
>>> don't fail allocations when memory is scarce).
>>>
>>> Will work on these issues for the v7. Thanks,
>>
>> The way this works on nouveau at least (and I haven't yet read the
>> nova code in depth) is that we have 4 stages of vmm page table mgmt:
>>
>> ref - locked with a ref lock - can allocate/free memory - just makes
>> sure the page tables exist and are reference counted
>> map - locked with a map lock - cannot allocate memory - fill in the
>> PTEs in the page table
>> unmap - locked with a map lock - cannot allocate memory - removes
>> entries in PTEs
>> unref - locked with a ref lock - can allocate/free memory - just drops
>> references and frees (not sure if it ever merges).
> 
> On amdgpu VM page tables are allocated and PTEs filled outside of the fence critical path.

Does that really work for async VM_BIND? If we're missing anything in nova-core
related to the timing of when we allocate and update the page tables, it
would be good to know.

My understanding is that you have to write the PTEs at the run stage of the job
in question, otherwise you may not know how to map. Are you saying amdgpu writes
them during the run stage but somehow before fence signaling?

> 
> Only invalidating PTEs to signal that a shader needs to be taken off the HW are inside the fence critical path and here no memory allocation is needed.
> 
> Keep in mind that you not only need to avoid having memory allocations inside the critical path, but also not take locks under which memory is allocated.

Yes, this part was clear to me from Danilo's email, as were the various
deadlock scenarios. See my analysis, where what you mention is among the cases I covered:
https://lore.kernel.org/all/20e04a3e-8d7d-47bc-9299-deadf8b9e992@nvidia.com/

> Simona added some dma_fence_begin_signalling() and dma_fence_end_signalling() helpers to add lockdep annotations to the fence signaling path. Those have proven to be extremely useful since they allow lockdep to point out mistakes immediately and not just after hours of running on a test system.
> 
Yeah, I looked. Nice! Thanks,

--
Joel Fernandes



* Re: [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings
  2026-02-04  3:55   ` Dave Airlie
@ 2026-02-05  1:00     ` Joel Fernandes
  0 siblings, 0 replies; 71+ messages in thread
From: Joel Fernandes @ 2026-02-05  1:00 UTC (permalink / raw)
  To: Dave Airlie
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev



On 2/3/2026 10:55 PM, Dave Airlie wrote:
>> +///
>> +/// These flags control the allocation behavior of the buddy allocator.
>> +#[derive(Clone, Copy, Default, PartialEq, Eq)]
>> +pub struct BuddyFlags(usize);
>> +
>> +impl BuddyFlags {
>> +    /// Range-based allocation from start to end addresses.
>> +    pub const RANGE_ALLOCATION: usize = bindings::GPU_BUDDY_RANGE_ALLOCATION;
>> +
>> +    /// Allocate from top of address space downward.
>> +    pub const TOPDOWN_ALLOCATION: usize = bindings::GPU_BUDDY_TOPDOWN_ALLOCATION;
>> +
>> +    /// Allocate physically contiguous blocks.
>> +    pub const CONTIGUOUS_ALLOCATION: usize = bindings::GPU_BUDDY_CONTIGUOUS_ALLOCATION;
>> +
>> +    /// Request allocation from the cleared (zeroed) memory. The zero'ing is not
>> +    /// done by the allocator, but by the caller before freeing old blocks.
>> +    pub const CLEAR_ALLOCATION: usize = bindings::GPU_BUDDY_CLEAR_ALLOCATION;
>> +
>> +    /// Disable trimming of partially used blocks.
>> +    pub const TRIM_DISABLE: usize = bindings::GPU_BUDDY_TRIM_DISABLE;
>> +
>> +    /// Mark blocks as cleared (zeroed) when freeing. When set during free,
>> +    /// indicates that the caller has already zeroed the memory.
>> +    pub const CLEARED: usize = bindings::GPU_BUDDY_CLEARED;
>> +
>> +    /// Create [`BuddyFlags`] from a raw value with validation.
>> +    ///
>> +    /// Use `|` operator to combine flags if needed, before calling this method.
>> +    pub fn try_new(flags: usize) -> Result<Self> {
>> +        // Flags must not exceed u32::MAX to satisfy the GPU buddy allocator C API.
>> +        if flags > u32::MAX as usize {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        // `TOPDOWN_ALLOCATION` only works without `RANGE_ALLOCATION`. When both are
>> +        // set, `TOPDOWN_ALLOCATION` is silently ignored by the allocator. Reject this.
>> +        if (flags & Self::RANGE_ALLOCATION) != 0 && (flags & Self::TOPDOWN_ALLOCATION) != 0 {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        Ok(Self(flags))
>> +    }
>> +
>> +    /// Get raw value of the flags.
>> +    pub(crate) fn as_raw(self) -> usize {
>> +        self.0
>> +    }
>> +}
>> +
>> +/// Parameters for creating a GPU buddy allocator.
>> +#[derive(Clone, Copy)]
>> +pub struct GpuBuddyParams {
>> +    /// Base offset in bytes where the managed memory region starts.
>> +    /// Allocations will be offset by this value.
>> +    pub base_offset_bytes: u64,
>> +    /// Total physical memory size managed by the allocator in bytes.
>> +    pub physical_memory_size_bytes: u64,
>> +    /// Minimum allocation unit / chunk size in bytes, must be >= 4KB.
>> +    pub chunk_size_bytes: u64,
>> +}
>> +
>> +/// Parameters for allocating blocks from a GPU buddy allocator.
>> +#[derive(Clone, Copy)]
>> +pub struct GpuBuddyAllocParams {
>> +    /// Start of allocation range in bytes. Use 0 for beginning.
>> +    pub start_range_address: u64,
>> +    /// End of allocation range in bytes. Use 0 for entire range.
>> +    pub end_range_address: u64,
>> +    /// Total size to allocate in bytes.
>> +    pub size_bytes: u64,
>> +    /// Minimum block size for fragmented allocations in bytes.
>> +    pub min_block_size_bytes: u64,
>> +    /// Buddy allocator behavior flags.
>> +    pub buddy_flags: BuddyFlags,
>> +}
>> +
> 
> (not a full review)
> 
> Any reason these two need Clone, Copy? I'm not seeing a use case for
> that, maybe we should pass them as non-mutable references, but I don't
> think there is any point in passing them by value ever.
Yes, one reason I did that is that the doctests reuse the same params. But I
could also just pass by reference as you suggest; it might remove some memory
copies in the doctests. I will make this change, thanks!
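The change agreed on here, dropping `Clone`/`Copy` and taking the params by shared reference, could look like this minimal sketch (the struct mirrors the patch's `GpuBuddyAllocParams`, but the function and its body are hypothetical stand-ins, not the real bindings):

```rust
/// Parameters for allocating blocks (mirrors the patch's GpuBuddyAllocParams,
/// but without #[derive(Clone, Copy)]).
pub struct GpuBuddyAllocParams {
    pub start_range_address: u64,
    pub end_range_address: u64,
    pub size_bytes: u64,
    pub min_block_size_bytes: u64,
}

/// Hypothetical allocation entry point taking the params by shared reference,
/// so callers (and doctests) can reuse one params value across several calls
/// without any copies.
pub fn alloc_blocks(params: &GpuBuddyAllocParams) -> Result<u64, &'static str> {
    if params.min_block_size_bytes == 0
        || params.size_bytes == 0
        || params.size_bytes % params.min_block_size_bytes != 0
    {
        return Err("invalid size");
    }
    // Stand-in for the real allocator: just echo the start of the range.
    Ok(params.start_range_address)
}

fn main() {
    let params = GpuBuddyAllocParams {
        start_range_address: 0,
        end_range_address: 1 << 20,
        size_bytes: 8192,
        min_block_size_bytes: 4096,
    };
    // The same params value is passed to multiple calls by reference.
    assert_eq!(alloc_blocks(&params), Ok(0));
    assert!(alloc_blocks(&params).is_ok());
}
```

Since the structs are plain data read once per call, `&GpuBuddyAllocParams` costs nothing over by-value passing and avoids implying that copies are meaningful.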

--
Joel Fernandes



* Re: [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up
  2026-01-20 20:42 ` [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up Joel Fernandes
@ 2026-02-05 20:55   ` Dave Airlie
  2026-02-06  1:04     ` Joel Fernandes
  0 siblings, 1 reply; 71+ messages in thread
From: Dave Airlie @ 2026-02-05 20:55 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Wed, 21 Jan 2026 at 06:44, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
> Move the DRM buddy allocator one level up so that it can be used by GPU
> drivers (for example, nova-core) that have use cases other than DRM (such
> as VFIO vGPU support). Modify the API, structures and Kconfigs to use
> "gpu_buddy" terminology, and adapt the drivers and tests to the new API.
>
> The commit cannot be split without breaking bisectability; however, no
> functional change is intended. Verified by running KUnit tests and
> build-testing various configurations.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

I suggested this and think it's a good idea.

Reviewed-by: Dave Airlie <airlied@redhat.com>


> ---
>  Documentation/gpu/drm-mm.rst                  |   10 +-
>  drivers/gpu/Kconfig                           |   13 +
>  drivers/gpu/Makefile                          |    2 +
>  drivers/gpu/buddy.c                           | 1310 +++++++++++++++++
>  drivers/gpu/drm/Kconfig                       |    1 +
>  drivers/gpu/drm/Kconfig.debug                 |    4 +-
>  drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       |    2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h    |   12 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   80 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |   20 +-
>  drivers/gpu/drm/drm_buddy.c                   | 1284 +---------------
>  drivers/gpu/drm/i915/Kconfig                  |    1 +
>  drivers/gpu/drm/i915/i915_scatterlist.c       |   10 +-
>  drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   55 +-
>  drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |    6 +-
>  .../drm/i915/selftests/intel_memory_region.c  |   20 +-
>  drivers/gpu/drm/tests/Makefile                |    1 -
>  .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |    5 +-
>  drivers/gpu/drm/ttm/tests/ttm_mock_manager.c  |   18 +-
>  drivers/gpu/drm/ttm/tests/ttm_mock_manager.h  |    4 +-
>  drivers/gpu/drm/xe/Kconfig                    |    1 +
>  drivers/gpu/drm/xe/xe_res_cursor.h            |   34 +-
>  drivers/gpu/drm/xe/xe_svm.c                   |   12 +-
>  drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |   73 +-
>  drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h    |    4 +-
>  drivers/gpu/tests/Makefile                    |    3 +
>  .../gpu_buddy_test.c}                         |  390 ++---
>  drivers/gpu/tests/gpu_random.c                |   48 +
>  drivers/gpu/tests/gpu_random.h                |   28 +
>  drivers/video/Kconfig                         |    2 +
>  include/drm/drm_buddy.h                       |  163 +-
>  include/linux/gpu_buddy.h                     |  177 +++
>  33 files changed, 1995 insertions(+), 1799 deletions(-)
>  create mode 100644 drivers/gpu/Kconfig
>  create mode 100644 drivers/gpu/buddy.c
>  create mode 100644 drivers/gpu/tests/Makefile
>  rename drivers/gpu/{drm/tests/drm_buddy_test.c => tests/gpu_buddy_test.c} (68%)
>  create mode 100644 drivers/gpu/tests/gpu_random.c
>  create mode 100644 drivers/gpu/tests/gpu_random.h
>  create mode 100644 include/linux/gpu_buddy.h
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index d55751cad67c..8e0d31230b29 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -509,8 +509,14 @@ DRM GPUVM Function References
>  DRM Buddy Allocator
>  ===================
>
> -DRM Buddy Function References
> ------------------------------
> +Buddy Allocator Function References (GPU buddy)
> +-----------------------------------------------
> +
> +.. kernel-doc:: drivers/gpu/buddy.c
> +   :export:
> +
> +DRM Buddy Specific Logging Function References
> +----------------------------------------------
>
>  .. kernel-doc:: drivers/gpu/drm/drm_buddy.c
>     :export:
> diff --git a/drivers/gpu/Kconfig b/drivers/gpu/Kconfig
> new file mode 100644
> index 000000000000..22dd29cd50b5
> --- /dev/null
> +++ b/drivers/gpu/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +config GPU_BUDDY
> +       bool
> +       help
> +         A page-based buddy allocator for GPU memory.
> +
> +config GPU_BUDDY_KUNIT_TEST
> +       tristate "KUnit tests for GPU buddy allocator" if !KUNIT_ALL_TESTS
> +       depends on GPU_BUDDY && KUNIT
> +       default KUNIT_ALL_TESTS
> +       help
> +         KUnit tests for the GPU buddy allocator.
> diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
> index 36a54d456630..5063caccabdf 100644
> --- a/drivers/gpu/Makefile
> +++ b/drivers/gpu/Makefile
> @@ -6,3 +6,5 @@ obj-y                   += host1x/ drm/ vga/
>  obj-$(CONFIG_IMX_IPUV3_CORE)   += ipu-v3/
>  obj-$(CONFIG_TRACE_GPU_MEM)            += trace/
>  obj-$(CONFIG_NOVA_CORE)                += nova-core/
> +obj-$(CONFIG_GPU_BUDDY)                += buddy.o
> +obj-y                          += tests/
> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
> new file mode 100644
> index 000000000000..1347c0436617
> --- /dev/null
> +++ b/drivers/gpu/buddy.c
> @@ -0,0 +1,1310 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include <kunit/test-bug.h>
> +
> +#include <linux/export.h>
> +#include <linux/gpu_buddy.h>
> +#include <linux/kmemleak.h>
> +#include <linux/module.h>
> +#include <linux/sizes.h>
> +
> +static struct kmem_cache *slab_blocks;
> +
> +static struct gpu_buddy_block *gpu_block_alloc(struct gpu_buddy *mm,
> +                                              struct gpu_buddy_block *parent,
> +                                              unsigned int order,
> +                                              u64 offset)
> +{
> +       struct gpu_buddy_block *block;
> +
> +       BUG_ON(order > GPU_BUDDY_MAX_ORDER);
> +
> +       block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL);
> +       if (!block)
> +               return NULL;
> +
> +       block->header = offset;
> +       block->header |= order;
> +       block->parent = parent;
> +
> +       RB_CLEAR_NODE(&block->rb);
> +
> +       BUG_ON(block->header & GPU_BUDDY_HEADER_UNUSED);
> +       return block;
> +}
> +
> +static void gpu_block_free(struct gpu_buddy *mm,
> +                          struct gpu_buddy_block *block)
> +{
> +       kmem_cache_free(slab_blocks, block);
> +}
> +
> +static enum gpu_buddy_free_tree
> +get_block_tree(struct gpu_buddy_block *block)
> +{
> +       return gpu_buddy_block_is_clear(block) ?
> +              GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
> +}
> +
> +static struct gpu_buddy_block *
> +rbtree_get_free_block(const struct rb_node *node)
> +{
> +       return node ? rb_entry(node, struct gpu_buddy_block, rb) : NULL;
> +}
> +
> +static struct gpu_buddy_block *
> +rbtree_last_free_block(struct rb_root *root)
> +{
> +       return rbtree_get_free_block(rb_last(root));
> +}
> +
> +static bool rbtree_is_empty(struct rb_root *root)
> +{
> +       return RB_EMPTY_ROOT(root);
> +}
> +
> +static bool gpu_buddy_block_offset_less(const struct gpu_buddy_block *block,
> +                                       const struct gpu_buddy_block *node)
> +{
> +       return gpu_buddy_block_offset(block) < gpu_buddy_block_offset(node);
> +}
> +
> +static bool rbtree_block_offset_less(struct rb_node *block,
> +                                    const struct rb_node *node)
> +{
> +       return gpu_buddy_block_offset_less(rbtree_get_free_block(block),
> +                                          rbtree_get_free_block(node));
> +}
> +
> +static void rbtree_insert(struct gpu_buddy *mm,
> +                         struct gpu_buddy_block *block,
> +                         enum gpu_buddy_free_tree tree)
> +{
> +       rb_add(&block->rb,
> +              &mm->free_trees[tree][gpu_buddy_block_order(block)],
> +              rbtree_block_offset_less);
> +}
> +
> +static void rbtree_remove(struct gpu_buddy *mm,
> +                         struct gpu_buddy_block *block)
> +{
> +       unsigned int order = gpu_buddy_block_order(block);
> +       enum gpu_buddy_free_tree tree;
> +       struct rb_root *root;
> +
> +       tree = get_block_tree(block);
> +       root = &mm->free_trees[tree][order];
> +
> +       rb_erase(&block->rb, root);
> +       RB_CLEAR_NODE(&block->rb);
> +}
> +
> +static void clear_reset(struct gpu_buddy_block *block)
> +{
> +       block->header &= ~GPU_BUDDY_HEADER_CLEAR;
> +}
> +
> +static void mark_cleared(struct gpu_buddy_block *block)
> +{
> +       block->header |= GPU_BUDDY_HEADER_CLEAR;
> +}
> +
> +static void mark_allocated(struct gpu_buddy *mm,
> +                          struct gpu_buddy_block *block)
> +{
> +       block->header &= ~GPU_BUDDY_HEADER_STATE;
> +       block->header |= GPU_BUDDY_ALLOCATED;
> +
> +       rbtree_remove(mm, block);
> +}
> +
> +static void mark_free(struct gpu_buddy *mm,
> +                     struct gpu_buddy_block *block)
> +{
> +       enum gpu_buddy_free_tree tree;
> +
> +       block->header &= ~GPU_BUDDY_HEADER_STATE;
> +       block->header |= GPU_BUDDY_FREE;
> +
> +       tree = get_block_tree(block);
> +       rbtree_insert(mm, block, tree);
> +}
> +
> +static void mark_split(struct gpu_buddy *mm,
> +                      struct gpu_buddy_block *block)
> +{
> +       block->header &= ~GPU_BUDDY_HEADER_STATE;
> +       block->header |= GPU_BUDDY_SPLIT;
> +
> +       rbtree_remove(mm, block);
> +}
> +
> +static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
> +{
> +       return s1 <= e2 && e1 >= s2;
> +}
> +
> +static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2)
> +{
> +       return s1 <= s2 && e1 >= e2;
> +}
> +
> +static struct gpu_buddy_block *
> +__get_buddy(struct gpu_buddy_block *block)
> +{
> +       struct gpu_buddy_block *parent;
> +
> +       parent = block->parent;
> +       if (!parent)
> +               return NULL;
> +
> +       if (parent->left == block)
> +               return parent->right;
> +
> +       return parent->left;
> +}
> +
> +static unsigned int __gpu_buddy_free(struct gpu_buddy *mm,
> +                                    struct gpu_buddy_block *block,
> +                                    bool force_merge)
> +{
> +       struct gpu_buddy_block *parent;
> +       unsigned int order;
> +
> +       while ((parent = block->parent)) {
> +               struct gpu_buddy_block *buddy;
> +
> +               buddy = __get_buddy(block);
> +
> +               if (!gpu_buddy_block_is_free(buddy))
> +                       break;
> +
> +               if (!force_merge) {
> +                       /*
> +                        * Check the clear state of the block and its
> +                        * buddy, and exit the loop if the states differ.
> +                        */
> +                       if (gpu_buddy_block_is_clear(block) !=
> +                           gpu_buddy_block_is_clear(buddy))
> +                               break;
> +
> +                       if (gpu_buddy_block_is_clear(block))
> +                               mark_cleared(parent);
> +               }
> +
> +               rbtree_remove(mm, buddy);
> +               if (force_merge && gpu_buddy_block_is_clear(buddy))
> +                       mm->clear_avail -= gpu_buddy_block_size(mm, buddy);
> +
> +               gpu_block_free(mm, block);
> +               gpu_block_free(mm, buddy);
> +
> +               block = parent;
> +       }
> +
> +       order = gpu_buddy_block_order(block);
> +       mark_free(mm, block);
> +
> +       return order;
> +}
> +
> +static int __force_merge(struct gpu_buddy *mm,
> +                        u64 start,
> +                        u64 end,
> +                        unsigned int min_order)
> +{
> +       unsigned int tree, order;
> +       int i;
> +
> +       if (!min_order)
> +               return -ENOMEM;
> +
> +       if (min_order > mm->max_order)
> +               return -EINVAL;
> +
> +       for_each_free_tree(tree) {
> +               for (i = min_order - 1; i >= 0; i--) {
> +                       struct rb_node *iter = rb_last(&mm->free_trees[tree][i]);
> +
> +                       while (iter) {
> +                               struct gpu_buddy_block *block, *buddy;
> +                               u64 block_start, block_end;
> +
> +                               block = rbtree_get_free_block(iter);
> +                               iter = rb_prev(iter);
> +
> +                               if (!block || !block->parent)
> +                                       continue;
> +
> +                               block_start = gpu_buddy_block_offset(block);
> +                               block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
> +
> +                               if (!contains(start, end, block_start, block_end))
> +                                       continue;
> +
> +                               buddy = __get_buddy(block);
> +                               if (!gpu_buddy_block_is_free(buddy))
> +                                       continue;
> +
> +                               WARN_ON(gpu_buddy_block_is_clear(block) ==
> +                                       gpu_buddy_block_is_clear(buddy));
> +
> +                               /*
> +                                * Advance to the next node when the current node is the buddy,
> +                                * as freeing the block will also remove its buddy from the tree.
> +                                */
> +                               if (iter == &buddy->rb)
> +                                       iter = rb_prev(iter);
> +
> +                               rbtree_remove(mm, block);
> +                               if (gpu_buddy_block_is_clear(block))
> +                                       mm->clear_avail -= gpu_buddy_block_size(mm, block);
> +
> +                               order = __gpu_buddy_free(mm, block, true);
> +                               if (order >= min_order)
> +                                       return 0;
> +                       }
> +               }
> +       }
> +
> +       return -ENOMEM;
> +}
> +
> +/**
> + * gpu_buddy_init - init memory manager
> + *
> + * @mm: GPU buddy manager to initialize
> + * @size: size in bytes to manage
> + * @chunk_size: minimum page size in bytes for our allocations
> + *
> + * Initializes the memory manager and its resources.
> + *
> + * Returns:
> + * 0 on success, error code on failure.
> + */
> +int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
> +{
> +       unsigned int i, j, root_count = 0;
> +       u64 offset = 0;
> +
> +       if (size < chunk_size)
> +               return -EINVAL;
> +
> +       if (chunk_size < SZ_4K)
> +               return -EINVAL;
> +
> +       if (!is_power_of_2(chunk_size))
> +               return -EINVAL;
> +
> +       size = round_down(size, chunk_size);
> +
> +       mm->size = size;
> +       mm->avail = size;
> +       mm->clear_avail = 0;
> +       mm->chunk_size = chunk_size;
> +       mm->max_order = ilog2(size) - ilog2(chunk_size);
> +
> +       BUG_ON(mm->max_order > GPU_BUDDY_MAX_ORDER);
> +
> +       mm->free_trees = kmalloc_array(GPU_BUDDY_MAX_FREE_TREES,
> +                                      sizeof(*mm->free_trees),
> +                                      GFP_KERNEL);
> +       if (!mm->free_trees)
> +               return -ENOMEM;
> +
> +       for_each_free_tree(i) {
> +               mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
> +                                                 sizeof(struct rb_root),
> +                                                 GFP_KERNEL);
> +               if (!mm->free_trees[i])
> +                       goto out_free_tree;
> +
> +               for (j = 0; j <= mm->max_order; ++j)
> +                       mm->free_trees[i][j] = RB_ROOT;
> +       }
> +
> +       mm->n_roots = hweight64(size);
> +
> +       mm->roots = kmalloc_array(mm->n_roots,
> +                                 sizeof(struct gpu_buddy_block *),
> +                                 GFP_KERNEL);
> +       if (!mm->roots)
> +               goto out_free_tree;
> +
> +       /*
> +        * Split into power-of-two blocks, in case we are given a size that is
> +        * not itself a power-of-two.
> +        */
> +       do {
> +               struct gpu_buddy_block *root;
> +               unsigned int order;
> +               u64 root_size;
> +
> +               order = ilog2(size) - ilog2(chunk_size);
> +               root_size = chunk_size << order;
> +
> +               root = gpu_block_alloc(mm, NULL, order, offset);
> +               if (!root)
> +                       goto out_free_roots;
> +
> +               mark_free(mm, root);
> +
> +               BUG_ON(root_count > mm->max_order);
> +               BUG_ON(gpu_buddy_block_size(mm, root) < chunk_size);
> +
> +               mm->roots[root_count] = root;
> +
> +               offset += root_size;
> +               size -= root_size;
> +               root_count++;
> +       } while (size);
> +
> +       return 0;
> +
> +out_free_roots:
> +       while (root_count--)
> +               gpu_block_free(mm, mm->roots[root_count]);
> +       kfree(mm->roots);
> +out_free_tree:
> +       while (i--)
> +               kfree(mm->free_trees[i]);
> +       kfree(mm->free_trees);
> +       return -ENOMEM;
> +}
> +EXPORT_SYMBOL(gpu_buddy_init);
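As an aside on the root-splitting loop in gpu_buddy_init() above, a standalone sketch (not kernel code; `root_orders` is a hypothetical helper) of how a non-power-of-two size decomposes into power-of-two roots using the same ilog2-based order arithmetic:

```rust
// Sketch of gpu_buddy_init()'s root carving: repeatedly take the largest
// power-of-two multiple of chunk_size that still fits in the remaining size.
fn root_orders(mut size: u64, chunk_size: u64) -> Vec<u32> {
    assert!(chunk_size.is_power_of_two() && size >= chunk_size);
    size -= size % chunk_size; // round_down(size, chunk_size)
    let mut orders = Vec::new();
    while size != 0 {
        // order = ilog2(size) - ilog2(chunk_size), as in the C loop.
        let order = size.ilog2() - chunk_size.ilog2();
        orders.push(order);
        size -= chunk_size << order; // consume this root's span
    }
    orders
}

fn main() {
    // 12 KiB with 4 KiB chunks: one order-1 root (8 KiB) plus one order-0
    // root (4 KiB), matching n_roots = hweight64(size) = 2 set bits.
    assert_eq!(root_orders(12 << 10, 4 << 10), vec![1, 0]);
}
```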
> +
> +/**
> + * gpu_buddy_fini - tear down the memory manager
> + *
> + * @mm: GPU buddy manager to free
> + *
> + * Clean up memory manager resources and the free trees.
> + */
> +void gpu_buddy_fini(struct gpu_buddy *mm)
> +{
> +       u64 root_size, size, start;
> +       unsigned int order;
> +       int i;
> +
> +       size = mm->size;
> +
> +       for (i = 0; i < mm->n_roots; ++i) {
> +               order = ilog2(size) - ilog2(mm->chunk_size);
> +               start = gpu_buddy_block_offset(mm->roots[i]);
> +               __force_merge(mm, start, start + size, order);
> +
> +               if (WARN_ON(!gpu_buddy_block_is_free(mm->roots[i])))
> +                       kunit_fail_current_test("buddy_fini() root");
> +
> +               gpu_block_free(mm, mm->roots[i]);
> +
> +               root_size = mm->chunk_size << order;
> +               size -= root_size;
> +       }
> +
> +       WARN_ON(mm->avail != mm->size);
> +
> +       for_each_free_tree(i)
> +               kfree(mm->free_trees[i]);
> +       kfree(mm->roots);
> +}
> +EXPORT_SYMBOL(gpu_buddy_fini);
> +
> +static int split_block(struct gpu_buddy *mm,
> +                      struct gpu_buddy_block *block)
> +{
> +       unsigned int block_order = gpu_buddy_block_order(block) - 1;
> +       u64 offset = gpu_buddy_block_offset(block);
> +
> +       BUG_ON(!gpu_buddy_block_is_free(block));
> +       BUG_ON(!gpu_buddy_block_order(block));
> +
> +       block->left = gpu_block_alloc(mm, block, block_order, offset);
> +       if (!block->left)
> +               return -ENOMEM;
> +
> +       block->right = gpu_block_alloc(mm, block, block_order,
> +                                      offset + (mm->chunk_size << block_order));
> +       if (!block->right) {
> +               gpu_block_free(mm, block->left);
> +               return -ENOMEM;
> +       }
> +
> +       mark_split(mm, block);
> +
> +       if (gpu_buddy_block_is_clear(block)) {
> +               mark_cleared(block->left);
> +               mark_cleared(block->right);
> +               clear_reset(block);
> +       }
> +
> +       mark_free(mm, block->left);
> +       mark_free(mm, block->right);
> +
> +       return 0;
> +}
> +
> +/**
> + * gpu_get_buddy - get buddy address
> + *
> + * @block: GPU buddy block
> + *
> + * Returns the corresponding buddy block for @block, or NULL
> + * if this is a root block and can't be merged further.
> + * The caller must provide locking to protect against
> + * concurrent allocate and free operations.
> + */
> +struct gpu_buddy_block *
> +gpu_get_buddy(struct gpu_buddy_block *block)
> +{
> +       return __get_buddy(block);
> +}
> +EXPORT_SYMBOL(gpu_get_buddy);
> +
> +/**
> + * gpu_buddy_reset_clear - reset blocks clear state
> + *
> + * @mm: GPU buddy manager
> + * @is_clear: blocks clear state
> + *
> + * Reset the clear state of each block in the free trees based on
> + * the @is_clear value.
> + */
> +void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear)
> +{
> +       enum gpu_buddy_free_tree src_tree, dst_tree;
> +       u64 root_size, size, start;
> +       unsigned int order;
> +       int i;
> +
> +       size = mm->size;
> +       for (i = 0; i < mm->n_roots; ++i) {
> +               order = ilog2(size) - ilog2(mm->chunk_size);
> +               start = gpu_buddy_block_offset(mm->roots[i]);
> +               __force_merge(mm, start, start + size, order);
> +
> +               root_size = mm->chunk_size << order;
> +               size -= root_size;
> +       }
> +
> +       src_tree = is_clear ? GPU_BUDDY_DIRTY_TREE : GPU_BUDDY_CLEAR_TREE;
> +       dst_tree = is_clear ? GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
> +
> +       for (i = 0; i <= mm->max_order; ++i) {
> +               struct rb_root *root = &mm->free_trees[src_tree][i];
> +               struct gpu_buddy_block *block, *tmp;
> +
> +               rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
> +                       rbtree_remove(mm, block);
> +                       if (is_clear) {
> +                               mark_cleared(block);
> +                               mm->clear_avail += gpu_buddy_block_size(mm, block);
> +                       } else {
> +                               clear_reset(block);
> +                               mm->clear_avail -= gpu_buddy_block_size(mm, block);
> +                       }
> +
> +                       rbtree_insert(mm, block, dst_tree);
> +               }
> +       }
> +}
> +EXPORT_SYMBOL(gpu_buddy_reset_clear);
> +
> +/**
> + * gpu_buddy_free_block - free a block
> + *
> + * @mm: GPU buddy manager
> + * @block: block to be freed
> + */
> +void gpu_buddy_free_block(struct gpu_buddy *mm,
> +                         struct gpu_buddy_block *block)
> +{
> +       BUG_ON(!gpu_buddy_block_is_allocated(block));
> +       mm->avail += gpu_buddy_block_size(mm, block);
> +       if (gpu_buddy_block_is_clear(block))
> +               mm->clear_avail += gpu_buddy_block_size(mm, block);
> +
> +       __gpu_buddy_free(mm, block, false);
> +}
> +EXPORT_SYMBOL(gpu_buddy_free_block);
> +
> +static void __gpu_buddy_free_list(struct gpu_buddy *mm,
> +                                 struct list_head *objects,
> +                                 bool mark_clear,
> +                                 bool mark_dirty)
> +{
> +       struct gpu_buddy_block *block, *on;
> +
> +       WARN_ON(mark_dirty && mark_clear);
> +
> +       list_for_each_entry_safe(block, on, objects, link) {
> +               if (mark_clear)
> +                       mark_cleared(block);
> +               else if (mark_dirty)
> +                       clear_reset(block);
> +               gpu_buddy_free_block(mm, block);
> +               cond_resched();
> +       }
> +       INIT_LIST_HEAD(objects);
> +}
> +
> +static void gpu_buddy_free_list_internal(struct gpu_buddy *mm,
> +                                        struct list_head *objects)
> +{
> +       /*
> +        * Don't touch the clear/dirty bit, since allocation is still internal
> +        * at this point. For example we might have just failed part of the
> +        * allocation.
> +        */
> +       __gpu_buddy_free_list(mm, objects, false, false);
> +}
> +
> +/**
> + * gpu_buddy_free_list - free blocks
> + *
> + * @mm: GPU buddy manager
> + * @objects: input list head to free blocks
> + * @flags: optional flags like GPU_BUDDY_CLEARED
> + */
> +void gpu_buddy_free_list(struct gpu_buddy *mm,
> +                        struct list_head *objects,
> +                        unsigned int flags)
> +{
> +       bool mark_clear = flags & GPU_BUDDY_CLEARED;
> +
> +       __gpu_buddy_free_list(mm, objects, mark_clear, !mark_clear);
> +}
> +EXPORT_SYMBOL(gpu_buddy_free_list);
> +
> +static bool block_incompatible(struct gpu_buddy_block *block, unsigned int flags)
> +{
> +       bool needs_clear = flags & GPU_BUDDY_CLEAR_ALLOCATION;
> +
> +       return needs_clear != gpu_buddy_block_is_clear(block);
> +}
> +
> +static struct gpu_buddy_block *
> +__alloc_range_bias(struct gpu_buddy *mm,
> +                  u64 start, u64 end,
> +                  unsigned int order,
> +                  unsigned long flags,
> +                  bool fallback)
> +{
> +       u64 req_size = mm->chunk_size << order;
> +       struct gpu_buddy_block *block;
> +       struct gpu_buddy_block *buddy;
> +       LIST_HEAD(dfs);
> +       int err;
> +       int i;
> +
> +       end = end - 1;
> +
> +       for (i = 0; i < mm->n_roots; ++i)
> +               list_add_tail(&mm->roots[i]->tmp_link, &dfs);
> +
> +       do {
> +               u64 block_start;
> +               u64 block_end;
> +
> +               block = list_first_entry_or_null(&dfs,
> +                                                struct gpu_buddy_block,
> +                                                tmp_link);
> +               if (!block)
> +                       break;
> +
> +               list_del(&block->tmp_link);
> +
> +               if (gpu_buddy_block_order(block) < order)
> +                       continue;
> +
> +               block_start = gpu_buddy_block_offset(block);
> +               block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
> +
> +               if (!overlaps(start, end, block_start, block_end))
> +                       continue;
> +
> +               if (gpu_buddy_block_is_allocated(block))
> +                       continue;
> +
> +               if (block_start < start || block_end > end) {
> +                       u64 adjusted_start = max(block_start, start);
> +                       u64 adjusted_end = min(block_end, end);
> +
> +                       if (round_down(adjusted_end + 1, req_size) <=
> +                           round_up(adjusted_start, req_size))
> +                               continue;
> +               }
> +
> +               if (!fallback && block_incompatible(block, flags))
> +                       continue;
> +
> +               if (contains(start, end, block_start, block_end) &&
> +                   order == gpu_buddy_block_order(block)) {
> +                       /*
> +                        * Find the free block within the range.
> +                        */
> +                       if (gpu_buddy_block_is_free(block))
> +                               return block;
> +
> +                       continue;
> +               }
> +
> +               if (!gpu_buddy_block_is_split(block)) {
> +                       err = split_block(mm, block);
> +                       if (unlikely(err))
> +                               goto err_undo;
> +               }
> +
> +               list_add(&block->right->tmp_link, &dfs);
> +               list_add(&block->left->tmp_link, &dfs);
> +       } while (1);
> +
> +       return ERR_PTR(-ENOSPC);
> +
> +err_undo:
> +       /*
> +        * We really don't want to leave around a bunch of split blocks, since
> +        * bigger is better, so make sure we merge everything back before we
> +        * free the allocated blocks.
> +        */
> +       buddy = __get_buddy(block);
> +       if (buddy &&
> +           (gpu_buddy_block_is_free(block) &&
> +            gpu_buddy_block_is_free(buddy)))
> +               __gpu_buddy_free(mm, block, false);
> +       return ERR_PTR(err);
> +}
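The alignment fit check in __alloc_range_bias() above (the round_down/round_up comparison) can be sketched standalone; `aligned_block_fits` is a hypothetical helper name, and req_size is assumed to be a power of two as in the allocator:

```rust
// Sketch of the clamped-range check: a naturally aligned block of req_size
// fits inside [adjusted_start, adjusted_end] only if rounding the range
// inward to req_size boundaries still leaves room.
fn aligned_block_fits(adjusted_start: u64, adjusted_end: u64, req_size: u64) -> bool {
    assert!(req_size.is_power_of_two());
    let round_up = |v: u64| v.div_ceil(req_size) * req_size;
    let round_down = |v: u64| (v / req_size) * req_size;
    // Negation of the C "continue" condition:
    // round_down(adjusted_end + 1, req_size) <= round_up(adjusted_start, req_size)
    round_down(adjusted_end + 1) > round_up(adjusted_start)
}

fn main() {
    // A 64 KiB request fits in the aligned range [0, 128K - 1] ...
    assert!(aligned_block_fits(0, (128 << 10) - 1, 64 << 10));
    // ... but not in the unaligned 64 KiB window [32K, 96K - 1].
    assert!(!aligned_block_fits(32 << 10, (96 << 10) - 1, 64 << 10));
}
```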
> +
> +static struct gpu_buddy_block *
> +__gpu_buddy_alloc_range_bias(struct gpu_buddy *mm,
> +                            u64 start, u64 end,
> +                            unsigned int order,
> +                            unsigned long flags)
> +{
> +       struct gpu_buddy_block *block;
> +       bool fallback = false;
> +
> +       block = __alloc_range_bias(mm, start, end, order,
> +                                  flags, fallback);
> +       if (IS_ERR(block))
> +               return __alloc_range_bias(mm, start, end, order,
> +                                         flags, !fallback);
> +
> +       return block;
> +}
> +
> +static struct gpu_buddy_block *
> +get_maxblock(struct gpu_buddy *mm,
> +            unsigned int order,
> +            enum gpu_buddy_free_tree tree)
> +{
> +       struct gpu_buddy_block *max_block = NULL, *block = NULL;
> +       struct rb_root *root;
> +       unsigned int i;
> +
> +       for (i = order; i <= mm->max_order; ++i) {
> +               root = &mm->free_trees[tree][i];
> +               block = rbtree_last_free_block(root);
> +               if (!block)
> +                       continue;
> +
> +               if (!max_block) {
> +                       max_block = block;
> +                       continue;
> +               }
> +
> +               if (gpu_buddy_block_offset(block) >
> +                   gpu_buddy_block_offset(max_block)) {
> +                       max_block = block;
> +               }
> +       }
> +
> +       return max_block;
> +}
> +
> +static struct gpu_buddy_block *
> +alloc_from_freetree(struct gpu_buddy *mm,
> +                   unsigned int order,
> +                   unsigned long flags)
> +{
> +       struct gpu_buddy_block *block = NULL;
> +       struct rb_root *root;
> +       enum gpu_buddy_free_tree tree;
> +       unsigned int tmp;
> +       int err;
> +
> +       tree = (flags & GPU_BUDDY_CLEAR_ALLOCATION) ?
> +               GPU_BUDDY_CLEAR_TREE : GPU_BUDDY_DIRTY_TREE;
> +
> +       if (flags & GPU_BUDDY_TOPDOWN_ALLOCATION) {
> +               block = get_maxblock(mm, order, tree);
> +               if (block)
> +                       /* Store the obtained block order */
> +                       tmp = gpu_buddy_block_order(block);
> +       } else {
> +               for (tmp = order; tmp <= mm->max_order; ++tmp) {
> +                       /* Get RB tree root for this order and tree */
> +                       root = &mm->free_trees[tree][tmp];
> +                       block = rbtree_last_free_block(root);
> +                       if (block)
> +                               break;
> +               }
> +       }
> +
> +       if (!block) {
> +               /* Try allocating from the other tree */
> +               tree = (tree == GPU_BUDDY_CLEAR_TREE) ?
> +                       GPU_BUDDY_DIRTY_TREE : GPU_BUDDY_CLEAR_TREE;
> +
> +               for (tmp = order; tmp <= mm->max_order; ++tmp) {
> +                       root = &mm->free_trees[tree][tmp];
> +                       block = rbtree_last_free_block(root);
> +                       if (block)
> +                               break;
> +               }
> +
> +               if (!block)
> +                       return ERR_PTR(-ENOSPC);
> +       }
> +
> +       BUG_ON(!gpu_buddy_block_is_free(block));
> +
> +       while (tmp != order) {
> +               err = split_block(mm, block);
> +               if (unlikely(err))
> +                       goto err_undo;
> +
> +               block = block->right;
> +               tmp--;
> +       }
> +       return block;
> +
> +err_undo:
> +       if (tmp != order)
> +               __gpu_buddy_free(mm, block, false);
> +       return ERR_PTR(err);
> +}
> +
> +static int __alloc_range(struct gpu_buddy *mm,
> +                        struct list_head *dfs,
> +                        u64 start, u64 size,
> +                        struct list_head *blocks,
> +                        u64 *total_allocated_on_err)
> +{
> +       struct gpu_buddy_block *block;
> +       struct gpu_buddy_block *buddy;
> +       u64 total_allocated = 0;
> +       LIST_HEAD(allocated);
> +       u64 end;
> +       int err;
> +
> +       end = start + size - 1;
> +
> +       do {
> +               u64 block_start;
> +               u64 block_end;
> +
> +               block = list_first_entry_or_null(dfs,
> +                                                struct gpu_buddy_block,
> +                                                tmp_link);
> +               if (!block)
> +                       break;
> +
> +               list_del(&block->tmp_link);
> +
> +               block_start = gpu_buddy_block_offset(block);
> +               block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
> +
> +               if (!overlaps(start, end, block_start, block_end))
> +                       continue;
> +
> +               if (gpu_buddy_block_is_allocated(block)) {
> +                       err = -ENOSPC;
> +                       goto err_free;
> +               }
> +
> +               if (contains(start, end, block_start, block_end)) {
> +                       if (gpu_buddy_block_is_free(block)) {
> +                               mark_allocated(mm, block);
> +                               total_allocated += gpu_buddy_block_size(mm, block);
> +                               mm->avail -= gpu_buddy_block_size(mm, block);
> +                               if (gpu_buddy_block_is_clear(block))
> +                                       mm->clear_avail -= gpu_buddy_block_size(mm, block);
> +                               list_add_tail(&block->link, &allocated);
> +                               continue;
> +                       } else if (!mm->clear_avail) {
> +                               err = -ENOSPC;
> +                               goto err_free;
> +                       }
> +               }
> +
> +               if (!gpu_buddy_block_is_split(block)) {
> +                       err = split_block(mm, block);
> +                       if (unlikely(err))
> +                               goto err_undo;
> +               }
> +
> +               list_add(&block->right->tmp_link, dfs);
> +               list_add(&block->left->tmp_link, dfs);
> +       } while (1);
> +
> +       if (total_allocated < size) {
> +               err = -ENOSPC;
> +               goto err_free;
> +       }
> +
> +       list_splice_tail(&allocated, blocks);
> +
> +       return 0;
> +
> +err_undo:
> +       /*
> +        * We really don't want to leave around a bunch of split blocks, since
> +        * bigger is better, so make sure we merge everything back before we
> +        * free the allocated blocks.
> +        */
> +       buddy = __get_buddy(block);
> +       if (buddy &&
> +           (gpu_buddy_block_is_free(block) &&
> +            gpu_buddy_block_is_free(buddy)))
> +               __gpu_buddy_free(mm, block, false);
> +
> +err_free:
> +       if (err == -ENOSPC && total_allocated_on_err) {
> +               list_splice_tail(&allocated, blocks);
> +               *total_allocated_on_err = total_allocated;
> +       } else {
> +               gpu_buddy_free_list_internal(mm, &allocated);
> +       }
> +
> +       return err;
> +}
> +
> +static int __gpu_buddy_alloc_range(struct gpu_buddy *mm,
> +                                  u64 start,
> +                                  u64 size,
> +                                  u64 *total_allocated_on_err,
> +                                  struct list_head *blocks)
> +{
> +       LIST_HEAD(dfs);
> +       int i;
> +
> +       for (i = 0; i < mm->n_roots; ++i)
> +               list_add_tail(&mm->roots[i]->tmp_link, &dfs);
> +
> +       return __alloc_range(mm, &dfs, start, size,
> +                            blocks, total_allocated_on_err);
> +}
> +
> +static int __alloc_contig_try_harder(struct gpu_buddy *mm,
> +                                    u64 size,
> +                                    u64 min_block_size,
> +                                    struct list_head *blocks)
> +{
> +       u64 rhs_offset, lhs_offset, lhs_size, filled;
> +       struct gpu_buddy_block *block;
> +       unsigned int tree, order;
> +       LIST_HEAD(blocks_lhs);
> +       unsigned long pages;
> +       u64 modify_size;
> +       int err;
> +
> +       modify_size = rounddown_pow_of_two(size);
> +       pages = modify_size >> ilog2(mm->chunk_size);
> +       order = fls(pages) - 1;
> +       if (order == 0)
> +               return -ENOSPC;
> +
> +       for_each_free_tree(tree) {
> +               struct rb_root *root;
> +               struct rb_node *iter;
> +
> +               root = &mm->free_trees[tree][order];
> +               if (rbtree_is_empty(root))
> +                       continue;
> +
> +               iter = rb_last(root);
> +               while (iter) {
> +                       block = rbtree_get_free_block(iter);
> +
> +                       /* Allocate blocks traversing RHS */
> +                       rhs_offset = gpu_buddy_block_offset(block);
> +                       err = __gpu_buddy_alloc_range(mm, rhs_offset, size,
> +                                                     &filled, blocks);
> +                       if (!err || err != -ENOSPC)
> +                               return err;
> +
> +                       lhs_size = max((size - filled), min_block_size);
> +                       if (!IS_ALIGNED(lhs_size, min_block_size))
> +                               lhs_size = round_up(lhs_size, min_block_size);
> +
> +                       /* Allocate blocks traversing LHS */
> +                       lhs_offset = gpu_buddy_block_offset(block) - lhs_size;
> +                       err = __gpu_buddy_alloc_range(mm, lhs_offset, lhs_size,
> +                                                     NULL, &blocks_lhs);
> +                       if (!err) {
> +                               list_splice(&blocks_lhs, blocks);
> +                               return 0;
> +                       } else if (err != -ENOSPC) {
> +                               gpu_buddy_free_list_internal(mm, blocks);
> +                               return err;
> +                       }
> +                       /* Free blocks for the next iteration */
> +                       gpu_buddy_free_list_internal(mm, blocks);
> +
> +                       iter = rb_prev(iter);
> +               }
> +       }
> +
> +       return -ENOSPC;
> +}
> +
> +/**
> + * gpu_buddy_block_trim - free unused pages
> + *
> + * @mm: GPU buddy manager
> + * @start: start address to begin the trimming.
> + * @new_size: original size requested
> + * @blocks: Input and output list of allocated blocks.
> + * MUST contain a single block as input to be trimmed.
> + * On success it will contain the newly allocated blocks
> + * making up the @new_size. Blocks always appear in
> + * ascending order.
> + *
> + * For contiguous allocations, we round the size up to the nearest
> + * power-of-two value; drivers consume the *actual* size, so the
> + * remaining portion is unused and can optionally be freed with this
> + * function.
> + *
> + * Returns:
> + * 0 on success, error code on failure.
> + */
> +int gpu_buddy_block_trim(struct gpu_buddy *mm,
> +                        u64 *start,
> +                        u64 new_size,
> +                        struct list_head *blocks)
> +{
> +       struct gpu_buddy_block *parent;
> +       struct gpu_buddy_block *block;
> +       u64 block_start, block_end;
> +       LIST_HEAD(dfs);
> +       u64 new_start;
> +       int err;
> +
> +       if (!list_is_singular(blocks))
> +               return -EINVAL;
> +
> +       block = list_first_entry(blocks,
> +                                struct gpu_buddy_block,
> +                                link);
> +
> +       block_start = gpu_buddy_block_offset(block);
> +       block_end = block_start + gpu_buddy_block_size(mm, block);
> +
> +       if (WARN_ON(!gpu_buddy_block_is_allocated(block)))
> +               return -EINVAL;
> +
> +       if (new_size > gpu_buddy_block_size(mm, block))
> +               return -EINVAL;
> +
> +       if (!new_size || !IS_ALIGNED(new_size, mm->chunk_size))
> +               return -EINVAL;
> +
> +       if (new_size == gpu_buddy_block_size(mm, block))
> +               return 0;
> +
> +       new_start = block_start;
> +       if (start) {
> +               new_start = *start;
> +
> +               if (new_start < block_start)
> +                       return -EINVAL;
> +
> +               if (!IS_ALIGNED(new_start, mm->chunk_size))
> +                       return -EINVAL;
> +
> +               if (range_overflows(new_start, new_size, block_end))
> +                       return -EINVAL;
> +       }
> +
> +       list_del(&block->link);
> +       mark_free(mm, block);
> +       mm->avail += gpu_buddy_block_size(mm, block);
> +       if (gpu_buddy_block_is_clear(block))
> +               mm->clear_avail += gpu_buddy_block_size(mm, block);
> +
> +       /* Prevent recursively freeing this node */
> +       parent = block->parent;
> +       block->parent = NULL;
> +
> +       list_add(&block->tmp_link, &dfs);
> +       err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL);
> +       if (err) {
> +               mark_allocated(mm, block);
> +               mm->avail -= gpu_buddy_block_size(mm, block);
> +               if (gpu_buddy_block_is_clear(block))
> +                       mm->clear_avail -= gpu_buddy_block_size(mm, block);
> +               list_add(&block->link, blocks);
> +       }
> +
> +       block->parent = parent;
> +       return err;
> +}
> +EXPORT_SYMBOL(gpu_buddy_block_trim);
> +
> +static struct gpu_buddy_block *
> +__gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
> +                        u64 start, u64 end,
> +                        unsigned int order,
> +                        unsigned long flags)
> +{
> +       if (flags & GPU_BUDDY_RANGE_ALLOCATION)
> +               /* Allocate traversing within the range */
> +               return __gpu_buddy_alloc_range_bias(mm, start, end,
> +                                                   order, flags);
> +       else
> +               /* Allocate from freetree */
> +               return alloc_from_freetree(mm, order, flags);
> +}
> +
> +/**
> + * gpu_buddy_alloc_blocks - allocate power-of-two blocks
> + *
> + * @mm: GPU buddy manager to allocate from
> + * @start: start of the allowed range for this block
> + * @end: end of the allowed range for this block
> + * @size: size of the allocation in bytes
> + * @min_block_size: alignment of the allocation
> + * @blocks: output list head to add allocated blocks
> + * @flags: GPU_BUDDY_*_ALLOCATION flags
> + *
> + * __gpu_buddy_alloc_range_bias() is called when range restrictions
> + * are in place; it traverses the tree and returns the desired block.
> + *
> + * alloc_from_freetree() is called when *no* range restrictions
> + * are enforced; it picks the block from the freetree.
> + *
> + * Returns:
> + * 0 on success, error code on failure.
> + */
> +int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
> +                          u64 start, u64 end, u64 size,
> +                          u64 min_block_size,
> +                          struct list_head *blocks,
> +                          unsigned long flags)
> +{
> +       struct gpu_buddy_block *block = NULL;
> +       u64 original_size, original_min_size;
> +       unsigned int min_order, order;
> +       LIST_HEAD(allocated);
> +       unsigned long pages;
> +       int err;
> +
> +       if (size < mm->chunk_size)
> +               return -EINVAL;
> +
> +       if (min_block_size < mm->chunk_size)
> +               return -EINVAL;
> +
> +       if (!is_power_of_2(min_block_size))
> +               return -EINVAL;
> +
> +       if (!IS_ALIGNED(start | end | size, mm->chunk_size))
> +               return -EINVAL;
> +
> +       if (end > mm->size)
> +               return -EINVAL;
> +
> +       if (range_overflows(start, size, mm->size))
> +               return -EINVAL;
> +
> +       /* Actual range allocation */
> +       if (start + size == end) {
> +               if (!IS_ALIGNED(start | end, min_block_size))
> +                       return -EINVAL;
> +
> +               return __gpu_buddy_alloc_range(mm, start, size, NULL, blocks);
> +       }
> +
> +       original_size = size;
> +       original_min_size = min_block_size;
> +
> +       /* Roundup the size to power of 2 */
> +       if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION) {
> +               size = roundup_pow_of_two(size);
> +               min_block_size = size;
> +       /* Align size value to min_block_size */
> +       } else if (!IS_ALIGNED(size, min_block_size)) {
> +               size = round_up(size, min_block_size);
> +       }
> +
> +       pages = size >> ilog2(mm->chunk_size);
> +       order = fls(pages) - 1;
> +       min_order = ilog2(min_block_size) - ilog2(mm->chunk_size);
> +
> +       do {
> +               order = min(order, (unsigned int)fls(pages) - 1);
> +               BUG_ON(order > mm->max_order);
> +               BUG_ON(order < min_order);
> +
> +               do {
> +                       block = __gpu_buddy_alloc_blocks(mm, start,
> +                                                        end,
> +                                                        order,
> +                                                        flags);
> +                       if (!IS_ERR(block))
> +                               break;
> +
> +                       if (order-- == min_order) {
> +                               /* Try allocation through force merge method */
> +                               if (mm->clear_avail &&
> +                                   !__force_merge(mm, start, end, min_order)) {
> +                                       block = __gpu_buddy_alloc_blocks(mm, start,
> +                                                                        end,
> +                                                                        min_order,
> +                                                                        flags);
> +                                       if (!IS_ERR(block)) {
> +                                               order = min_order;
> +                                               break;
> +                                       }
> +                               }
> +
> +                               /*
> +                                * Try contiguous block allocation through
> +                                * try harder method.
> +                                */
> +                               if (flags & GPU_BUDDY_CONTIGUOUS_ALLOCATION &&
> +                                   !(flags & GPU_BUDDY_RANGE_ALLOCATION))
> +                                       return __alloc_contig_try_harder(mm,
> +                                                                        original_size,
> +                                                                        original_min_size,
> +                                                                        blocks);
> +                               err = -ENOSPC;
> +                               goto err_free;
> +                       }
> +               } while (1);
> +
> +               mark_allocated(mm, block);
> +               mm->avail -= gpu_buddy_block_size(mm, block);
> +               if (gpu_buddy_block_is_clear(block))
> +                       mm->clear_avail -= gpu_buddy_block_size(mm, block);
> +               kmemleak_update_trace(block);
> +               list_add_tail(&block->link, &allocated);
> +
> +               pages -= BIT(order);
> +
> +               if (!pages)
> +                       break;
> +       } while (1);
> +
> +       /* Trim the allocated block to the required size */
> +       if (!(flags & GPU_BUDDY_TRIM_DISABLE) &&
> +           original_size != size) {
> +               struct list_head *trim_list;
> +               LIST_HEAD(temp);
> +               u64 trim_size;
> +
> +               trim_list = &allocated;
> +               trim_size = original_size;
> +
> +               if (!list_is_singular(&allocated)) {
> +                       block = list_last_entry(&allocated, typeof(*block), link);
> +                       list_move(&block->link, &temp);
> +                       trim_list = &temp;
> +                       trim_size = gpu_buddy_block_size(mm, block) -
> +                               (size - original_size);
> +               }
> +
> +               gpu_buddy_block_trim(mm,
> +                                    NULL,
> +                                    trim_size,
> +                                    trim_list);
> +
> +               if (!list_empty(&temp))
> +                       list_splice_tail(trim_list, &allocated);
> +       }
> +
> +       list_splice_tail(&allocated, blocks);
> +       return 0;
> +
> +err_free:
> +       gpu_buddy_free_list_internal(mm, &allocated);
> +       return err;
> +}
> +EXPORT_SYMBOL(gpu_buddy_alloc_blocks);
> +
> +/**
> + * gpu_buddy_block_print - print block information
> + *
> + * @mm: GPU buddy manager
> + * @block: GPU buddy block
> + */
> +void gpu_buddy_block_print(struct gpu_buddy *mm,
> +                          struct gpu_buddy_block *block)
> +{
> +       u64 start = gpu_buddy_block_offset(block);
> +       u64 size = gpu_buddy_block_size(mm, block);
> +
> +       pr_info("%#018llx-%#018llx: %llu\n", start, start + size, size);
> +}
> +EXPORT_SYMBOL(gpu_buddy_block_print);
> +
> +/**
> + * gpu_buddy_print - print allocator state
> + *
> + * @mm: GPU buddy manager
> + */
> +void gpu_buddy_print(struct gpu_buddy *mm)
> +{
> +       int order;
> +
> +       pr_info("chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
> +               mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
> +
> +       for (order = mm->max_order; order >= 0; order--) {
> +               struct gpu_buddy_block *block, *tmp;
> +               struct rb_root *root;
> +               u64 count = 0, free;
> +               unsigned int tree;
> +
> +               for_each_free_tree(tree) {
> +                       root = &mm->free_trees[tree][order];
> +
> +                       rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
> +                               BUG_ON(!gpu_buddy_block_is_free(block));
> +                               count++;
> +                       }
> +               }
> +
> +               free = count * (mm->chunk_size << order);
> +               if (free < SZ_1M)
> +                       pr_info("order-%2d free: %8llu KiB, blocks: %llu\n",
> +                               order, free >> 10, count);
> +               else
> +                       pr_info("order-%2d free: %8llu MiB, blocks: %llu\n",
> +                               order, free >> 20, count);
> +       }
> +}
> +EXPORT_SYMBOL(gpu_buddy_print);
> +
> +static void gpu_buddy_module_exit(void)
> +{
> +       kmem_cache_destroy(slab_blocks);
> +}
> +
> +static int __init gpu_buddy_module_init(void)
> +{
> +       slab_blocks = KMEM_CACHE(gpu_buddy_block, 0);
> +       if (!slab_blocks)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +module_init(gpu_buddy_module_init);
> +module_exit(gpu_buddy_module_exit);
> +
> +MODULE_DESCRIPTION("GPU Buddy Allocator");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 7e6bc0b3a589..0475defb37f0 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -220,6 +220,7 @@ config DRM_GPUSVM
>  config DRM_BUDDY
>         tristate
>         depends on DRM
> +       select GPU_BUDDY
>         help
>           A page based buddy allocator
>
> diff --git a/drivers/gpu/drm/Kconfig.debug b/drivers/gpu/drm/Kconfig.debug
> index 05dc43c0b8c5..1f4c408c7920 100644
> --- a/drivers/gpu/drm/Kconfig.debug
> +++ b/drivers/gpu/drm/Kconfig.debug
> @@ -71,6 +71,7 @@ config DRM_KUNIT_TEST
>         select DRM_KUNIT_TEST_HELPERS
>         select DRM_LIB_RANDOM
>         select DRM_SYSFB_HELPER
> +       select GPU_BUDDY
>         select PRIME_NUMBERS
>         default KUNIT_ALL_TESTS
>         help
> @@ -88,10 +89,11 @@ config DRM_TTM_KUNIT_TEST
>         tristate "KUnit tests for TTM" if !KUNIT_ALL_TESTS
>         default n
>         depends on DRM && KUNIT && MMU && (UML || COMPILE_TEST)
> -       select DRM_TTM
>         select DRM_BUDDY
> +       select DRM_TTM
>         select DRM_EXPORT_FOR_TESTS if m
>         select DRM_KUNIT_TEST_HELPERS
> +       select GPU_BUDDY
>         default KUNIT_ALL_TESTS
>         help
>           Enables unit tests for TTM, a GPU memory manager subsystem used
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 7f515be5185d..bb131543e1d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -23,6 +23,7 @@ config DRM_AMDGPU
>         select CRC16
>         select BACKLIGHT_CLASS_DEVICE
>         select INTERVAL_TREE
> +       select GPU_BUDDY
>         select DRM_BUDDY
>         select DRM_SUBALLOC_HELPER
>         select DRM_EXEC
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 2a6cf7963dde..e0bd8a68877f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -5654,7 +5654,7 @@ int amdgpu_ras_add_critical_region(struct amdgpu_device *adev,
>         struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>         struct amdgpu_vram_mgr_resource *vres;
>         struct ras_critical_region *region;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         int ret = 0;
>
>         if (!bo || !bo->tbo.resource)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> index be2e56ce1355..8908d9e08a30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> @@ -55,7 +55,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
>                                     uint64_t start, uint64_t size,
>                                     struct amdgpu_res_cursor *cur)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         struct list_head *head, *next;
>         struct drm_mm_node *node;
>
> @@ -71,7 +71,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
>                 head = &to_amdgpu_vram_mgr_resource(res)->blocks;
>
>                 block = list_first_entry_or_null(head,
> -                                                struct drm_buddy_block,
> +                                                struct gpu_buddy_block,
>                                                  link);
>                 if (!block)
>                         goto fallback;
> @@ -81,7 +81,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
>
>                         next = block->link.next;
>                         if (next != head)
> -                               block = list_entry(next, struct drm_buddy_block, link);
> +                               block = list_entry(next, struct gpu_buddy_block, link);
>                 }
>
>                 cur->start = amdgpu_vram_mgr_block_start(block) + start;
> @@ -125,7 +125,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
>   */
>  static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         struct drm_mm_node *node;
>         struct list_head *next;
>
> @@ -146,7 +146,7 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
>                 block = cur->node;
>
>                 next = block->link.next;
> -               block = list_entry(next, struct drm_buddy_block, link);
> +               block = list_entry(next, struct gpu_buddy_block, link);
>
>                 cur->node = block;
>                 cur->start = amdgpu_vram_mgr_block_start(block);
> @@ -175,7 +175,7 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
>   */
>  static inline bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>
>         switch (cur->mem_type) {
>         case TTM_PL_VRAM:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 9d934c07fa6b..6c06a9c9b13f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -23,6 +23,8 @@
>   */
>
>  #include <linux/dma-mapping.h>
> +
> +#include <drm/drm_buddy.h>
>  #include <drm/ttm/ttm_range_manager.h>
>  #include <drm/drm_drv.h>
>
> @@ -52,15 +54,15 @@ to_amdgpu_device(struct amdgpu_vram_mgr *mgr)
>         return container_of(mgr, struct amdgpu_device, mman.vram_mgr);
>  }
>
> -static inline struct drm_buddy_block *
> +static inline struct gpu_buddy_block *
>  amdgpu_vram_mgr_first_block(struct list_head *list)
>  {
> -       return list_first_entry_or_null(list, struct drm_buddy_block, link);
> +       return list_first_entry_or_null(list, struct gpu_buddy_block, link);
>  }
>
>  static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         u64 start, size;
>
>         block = amdgpu_vram_mgr_first_block(head);
> @@ -71,7 +73,7 @@ static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
>                 start = amdgpu_vram_mgr_block_start(block);
>                 size = amdgpu_vram_mgr_block_size(block);
>
> -               block = list_entry(block->link.next, struct drm_buddy_block, link);
> +               block = list_entry(block->link.next, struct gpu_buddy_block, link);
>                 if (start + size != amdgpu_vram_mgr_block_start(block))
>                         return false;
>         }
> @@ -81,7 +83,7 @@ static inline bool amdgpu_is_vram_mgr_blocks_contiguous(struct list_head *head)
>
>  static inline u64 amdgpu_vram_mgr_blocks_size(struct list_head *head)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         u64 size = 0;
>
>         list_for_each_entry(block, head, link)
> @@ -254,7 +256,7 @@ const struct attribute_group amdgpu_vram_mgr_attr_group = {
>   * Calculate how many bytes of the DRM BUDDY block are inside visible VRAM
>   */
>  static u64 amdgpu_vram_mgr_vis_size(struct amdgpu_device *adev,
> -                                   struct drm_buddy_block *block)
> +                                   struct gpu_buddy_block *block)
>  {
>         u64 start = amdgpu_vram_mgr_block_start(block);
>         u64 end = start + amdgpu_vram_mgr_block_size(block);
> @@ -279,7 +281,7 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>         struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>         struct ttm_resource *res = bo->tbo.resource;
>         struct amdgpu_vram_mgr_resource *vres = to_amdgpu_vram_mgr_resource(res);
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         u64 usage = 0;
>
>         if (amdgpu_gmc_vram_full_visible(&adev->gmc))
> @@ -299,15 +301,15 @@ static void amdgpu_vram_mgr_do_reserve(struct ttm_resource_manager *man)
>  {
>         struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>         struct amdgpu_device *adev = to_amdgpu_device(mgr);
> -       struct drm_buddy *mm = &mgr->mm;
> +       struct gpu_buddy *mm = &mgr->mm;
>         struct amdgpu_vram_reservation *rsv, *temp;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         uint64_t vis_usage;
>
>         list_for_each_entry_safe(rsv, temp, &mgr->reservations_pending, blocks) {
> -               if (drm_buddy_alloc_blocks(mm, rsv->start, rsv->start + rsv->size,
> +               if (gpu_buddy_alloc_blocks(mm, rsv->start, rsv->start + rsv->size,
>                                            rsv->size, mm->chunk_size, &rsv->allocated,
> -                                          DRM_BUDDY_RANGE_ALLOCATION))
> +                                          GPU_BUDDY_RANGE_ALLOCATION))
>                         continue;
>
>                 block = amdgpu_vram_mgr_first_block(&rsv->allocated);
> @@ -403,7 +405,7 @@ int amdgpu_vram_mgr_query_address_block_info(struct amdgpu_vram_mgr *mgr,
>                         uint64_t address, struct amdgpu_vram_block_info *info)
>  {
>         struct amdgpu_vram_mgr_resource *vres;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         u64 start, size;
>         int ret = -ENOENT;
>
> @@ -450,8 +452,8 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>         struct amdgpu_vram_mgr_resource *vres;
>         u64 size, remaining_size, lpfn, fpfn;
>         unsigned int adjust_dcc_size = 0;
> -       struct drm_buddy *mm = &mgr->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = &mgr->mm;
> +       struct gpu_buddy_block *block;
>         unsigned long pages_per_block;
>         int r;
>
> @@ -493,17 +495,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>         INIT_LIST_HEAD(&vres->blocks);
>
>         if (place->flags & TTM_PL_FLAG_TOPDOWN)
> -               vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
> +               vres->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
>
>         if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
> -               vres->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
> +               vres->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION;
>
>         if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED)
> -               vres->flags |= DRM_BUDDY_CLEAR_ALLOCATION;
> +               vres->flags |= GPU_BUDDY_CLEAR_ALLOCATION;
>
>         if (fpfn || lpfn != mgr->mm.size)
>                 /* Allocate blocks in desired range */
> -               vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
> +               vres->flags |= GPU_BUDDY_RANGE_ALLOCATION;
>
>         if (bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC &&
>             adev->gmc.gmc_funcs->get_dcc_alignment)
> @@ -516,7 +518,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>                 dcc_size = roundup_pow_of_two(vres->base.size + adjust_dcc_size);
>                 remaining_size = (u64)dcc_size;
>
> -               vres->flags |= DRM_BUDDY_TRIM_DISABLE;
> +               vres->flags |= GPU_BUDDY_TRIM_DISABLE;
>         }
>
>         mutex_lock(&mgr->lock);
> @@ -536,7 +538,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>
>                 BUG_ON(min_block_size < mm->chunk_size);
>
> -               r = drm_buddy_alloc_blocks(mm, fpfn,
> +               r = gpu_buddy_alloc_blocks(mm, fpfn,
>                                            lpfn,
>                                            size,
>                                            min_block_size,
> @@ -545,7 +547,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>
>                 if (unlikely(r == -ENOSPC) && pages_per_block == ~0ul &&
>                     !(place->flags & TTM_PL_FLAG_CONTIGUOUS)) {
> -                       vres->flags &= ~DRM_BUDDY_CONTIGUOUS_ALLOCATION;
> +                       vres->flags &= ~GPU_BUDDY_CONTIGUOUS_ALLOCATION;
>                         pages_per_block = max_t(u32, 2UL << (20UL - PAGE_SHIFT),
>                                                 tbo->page_alignment);
>
> @@ -566,7 +568,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>         list_add_tail(&vres->vres_node, &mgr->allocated_vres_list);
>
>         if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && adjust_dcc_size) {
> -               struct drm_buddy_block *dcc_block;
> +               struct gpu_buddy_block *dcc_block;
>                 unsigned long dcc_start;
>                 u64 trim_start;
>
> @@ -576,7 +578,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>                         roundup((unsigned long)amdgpu_vram_mgr_block_start(dcc_block),
>                                 adjust_dcc_size);
>                 trim_start = (u64)dcc_start;
> -               drm_buddy_block_trim(mm, &trim_start,
> +               gpu_buddy_block_trim(mm, &trim_start,
>                                      (u64)vres->base.size,
>                                      &vres->blocks);
>         }
> @@ -614,7 +616,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>         return 0;
>
>  error_free_blocks:
> -       drm_buddy_free_list(mm, &vres->blocks, 0);
> +       gpu_buddy_free_list(mm, &vres->blocks, 0);
>         mutex_unlock(&mgr->lock);
>  error_fini:
>         ttm_resource_fini(man, &vres->base);
> @@ -637,8 +639,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man,
>         struct amdgpu_vram_mgr_resource *vres = to_amdgpu_vram_mgr_resource(res);
>         struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>         struct amdgpu_device *adev = to_amdgpu_device(mgr);
> -       struct drm_buddy *mm = &mgr->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = &mgr->mm;
> +       struct gpu_buddy_block *block;
>         uint64_t vis_usage = 0;
>
>         mutex_lock(&mgr->lock);
> @@ -649,7 +651,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man,
>         list_for_each_entry(block, &vres->blocks, link)
>                 vis_usage += amdgpu_vram_mgr_vis_size(adev, block);
>
> -       drm_buddy_free_list(mm, &vres->blocks, vres->flags);
> +       gpu_buddy_free_list(mm, &vres->blocks, vres->flags);
>         amdgpu_vram_mgr_do_reserve(man);
>         mutex_unlock(&mgr->lock);
>
> @@ -688,7 +690,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>         if (!*sgt)
>                 return -ENOMEM;
>
> -       /* Determine the number of DRM_BUDDY blocks to export */
> +       /* Determine the number of GPU_BUDDY blocks to export */
>         amdgpu_res_first(res, offset, length, &cursor);
>         while (cursor.remaining) {
>                 num_entries++;
> @@ -704,10 +706,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>                 sg->length = 0;
>
>         /*
> -        * Walk down DRM_BUDDY blocks to populate scatterlist nodes
> -        * @note: Use iterator api to get first the DRM_BUDDY block
> +        * Walk down GPU_BUDDY blocks to populate scatterlist nodes
> +        * @note: Use the iterator API to get the first GPU_BUDDY block
>          * and the number of bytes from it. Access the following
> -        * DRM_BUDDY block(s) if more buffer needs to exported
> +        * GPU_BUDDY block(s) if more of the buffer needs to be exported
>          */
>         amdgpu_res_first(res, offset, length, &cursor);
>         for_each_sgtable_sg((*sgt), sg, i) {
> @@ -792,10 +794,10 @@ uint64_t amdgpu_vram_mgr_vis_usage(struct amdgpu_vram_mgr *mgr)
>  void amdgpu_vram_mgr_clear_reset_blocks(struct amdgpu_device *adev)
>  {
>         struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
> -       struct drm_buddy *mm = &mgr->mm;
> +       struct gpu_buddy *mm = &mgr->mm;
>
>         mutex_lock(&mgr->lock);
> -       drm_buddy_reset_clear(mm, false);
> +       gpu_buddy_reset_clear(mm, false);
>         mutex_unlock(&mgr->lock);
>  }
>
> @@ -815,7 +817,7 @@ static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,
>                                        size_t size)
>  {
>         struct amdgpu_vram_mgr_resource *mgr = to_amdgpu_vram_mgr_resource(res);
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>
>         /* Check each drm buddy block individually */
>         list_for_each_entry(block, &mgr->blocks, link) {
> @@ -848,7 +850,7 @@ static bool amdgpu_vram_mgr_compatible(struct ttm_resource_manager *man,
>                                        size_t size)
>  {
>         struct amdgpu_vram_mgr_resource *mgr = to_amdgpu_vram_mgr_resource(res);
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>
>         /* Check each drm buddy block individually */
>         list_for_each_entry(block, &mgr->blocks, link) {
> @@ -877,7 +879,7 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man,
>                                   struct drm_printer *printer)
>  {
>         struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
> -       struct drm_buddy *mm = &mgr->mm;
> +       struct gpu_buddy *mm = &mgr->mm;
>         struct amdgpu_vram_reservation *rsv;
>
>         drm_printf(printer, "  vis usage:%llu\n",
> @@ -930,7 +932,7 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
>         mgr->default_page_size = PAGE_SIZE;
>
>         man->func = &amdgpu_vram_mgr_func;
> -       err = drm_buddy_init(&mgr->mm, man->size, PAGE_SIZE);
> +       err = gpu_buddy_init(&mgr->mm, man->size, PAGE_SIZE);
>         if (err)
>                 return err;
>
> @@ -965,11 +967,11 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
>                 kfree(rsv);
>
>         list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) {
> -               drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0);
> +               gpu_buddy_free_list(&mgr->mm, &rsv->allocated, 0);
>                 kfree(rsv);
>         }
>         if (!adev->gmc.is_app_apu)
> -               drm_buddy_fini(&mgr->mm);
> +               gpu_buddy_fini(&mgr->mm);
>         mutex_unlock(&mgr->lock);
>
>         ttm_resource_manager_cleanup(man);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> index 5f5fd9a911c2..429a21a2e9b2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> @@ -24,11 +24,11 @@
>  #ifndef __AMDGPU_VRAM_MGR_H__
>  #define __AMDGPU_VRAM_MGR_H__
>
> -#include <drm/drm_buddy.h>
> +#include <linux/gpu_buddy.h>
>
>  struct amdgpu_vram_mgr {
>         struct ttm_resource_manager manager;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         /* protects access to buffer objects */
>         struct mutex lock;
>         struct list_head reservations_pending;
> @@ -57,19 +57,19 @@ struct amdgpu_vram_mgr_resource {
>         struct amdgpu_vres_task task;
>  };
>
> -static inline u64 amdgpu_vram_mgr_block_start(struct drm_buddy_block *block)
> +static inline u64 amdgpu_vram_mgr_block_start(struct gpu_buddy_block *block)
>  {
> -       return drm_buddy_block_offset(block);
> +       return gpu_buddy_block_offset(block);
>  }
>
> -static inline u64 amdgpu_vram_mgr_block_size(struct drm_buddy_block *block)
> +static inline u64 amdgpu_vram_mgr_block_size(struct gpu_buddy_block *block)
>  {
> -       return (u64)PAGE_SIZE << drm_buddy_block_order(block);
> +       return (u64)PAGE_SIZE << gpu_buddy_block_order(block);
>  }
>
> -static inline bool amdgpu_vram_mgr_is_cleared(struct drm_buddy_block *block)
> +static inline bool amdgpu_vram_mgr_is_cleared(struct gpu_buddy_block *block)
>  {
> -       return drm_buddy_block_is_clear(block);
> +       return gpu_buddy_block_is_clear(block);
>  }
>
>  static inline struct amdgpu_vram_mgr_resource *
> @@ -82,8 +82,8 @@ static inline void amdgpu_vram_mgr_set_cleared(struct ttm_resource *res)
>  {
>         struct amdgpu_vram_mgr_resource *ares = to_amdgpu_vram_mgr_resource(res);
>
> -       WARN_ON(ares->flags & DRM_BUDDY_CLEARED);
> -       ares->flags |= DRM_BUDDY_CLEARED;
> +       WARN_ON(ares->flags & GPU_BUDDY_CLEARED);
> +       ares->flags |= GPU_BUDDY_CLEARED;
>  }
>
>  int amdgpu_vram_mgr_query_address_block_info(struct amdgpu_vram_mgr *mgr,
> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
> index 2f279b46bd2c..188b36054e59 100644
> --- a/drivers/gpu/drm/drm_buddy.c
> +++ b/drivers/gpu/drm/drm_buddy.c
> @@ -3,1262 +3,25 @@
>   * Copyright © 2021 Intel Corporation
>   */
>
> -#include <kunit/test-bug.h>
> -
>  #include <linux/export.h>
> -#include <linux/kmemleak.h>
>  #include <linux/module.h>
>  #include <linux/sizes.h>
>
>  #include <drm/drm_buddy.h>
>  #include <drm/drm_print.h>
>
> -enum drm_buddy_free_tree {
> -       DRM_BUDDY_CLEAR_TREE = 0,
> -       DRM_BUDDY_DIRTY_TREE,
> -       DRM_BUDDY_MAX_FREE_TREES,
> -};
> -
> -static struct kmem_cache *slab_blocks;
> -
> -#define for_each_free_tree(tree) \
> -       for ((tree) = 0; (tree) < DRM_BUDDY_MAX_FREE_TREES; (tree)++)
> -
> -static struct drm_buddy_block *drm_block_alloc(struct drm_buddy *mm,
> -                                              struct drm_buddy_block *parent,
> -                                              unsigned int order,
> -                                              u64 offset)
> -{
> -       struct drm_buddy_block *block;
> -
> -       BUG_ON(order > DRM_BUDDY_MAX_ORDER);
> -
> -       block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL);
> -       if (!block)
> -               return NULL;
> -
> -       block->header = offset;
> -       block->header |= order;
> -       block->parent = parent;
> -
> -       RB_CLEAR_NODE(&block->rb);
> -
> -       BUG_ON(block->header & DRM_BUDDY_HEADER_UNUSED);
> -       return block;
> -}
> -
> -static void drm_block_free(struct drm_buddy *mm,
> -                          struct drm_buddy_block *block)
> -{
> -       kmem_cache_free(slab_blocks, block);
> -}
> -
> -static enum drm_buddy_free_tree
> -get_block_tree(struct drm_buddy_block *block)
> -{
> -       return drm_buddy_block_is_clear(block) ?
> -              DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
> -}
> -
> -static struct drm_buddy_block *
> -rbtree_get_free_block(const struct rb_node *node)
> -{
> -       return node ? rb_entry(node, struct drm_buddy_block, rb) : NULL;
> -}
> -
> -static struct drm_buddy_block *
> -rbtree_last_free_block(struct rb_root *root)
> -{
> -       return rbtree_get_free_block(rb_last(root));
> -}
> -
> -static bool rbtree_is_empty(struct rb_root *root)
> -{
> -       return RB_EMPTY_ROOT(root);
> -}
> -
> -static bool drm_buddy_block_offset_less(const struct drm_buddy_block *block,
> -                                       const struct drm_buddy_block *node)
> -{
> -       return drm_buddy_block_offset(block) < drm_buddy_block_offset(node);
> -}
> -
> -static bool rbtree_block_offset_less(struct rb_node *block,
> -                                    const struct rb_node *node)
> -{
> -       return drm_buddy_block_offset_less(rbtree_get_free_block(block),
> -                                          rbtree_get_free_block(node));
> -}
> -
> -static void rbtree_insert(struct drm_buddy *mm,
> -                         struct drm_buddy_block *block,
> -                         enum drm_buddy_free_tree tree)
> -{
> -       rb_add(&block->rb,
> -              &mm->free_trees[tree][drm_buddy_block_order(block)],
> -              rbtree_block_offset_less);
> -}
> -
> -static void rbtree_remove(struct drm_buddy *mm,
> -                         struct drm_buddy_block *block)
> -{
> -       unsigned int order = drm_buddy_block_order(block);
> -       enum drm_buddy_free_tree tree;
> -       struct rb_root *root;
> -
> -       tree = get_block_tree(block);
> -       root = &mm->free_trees[tree][order];
> -
> -       rb_erase(&block->rb, root);
> -       RB_CLEAR_NODE(&block->rb);
> -}
> -
> -static void clear_reset(struct drm_buddy_block *block)
> -{
> -       block->header &= ~DRM_BUDDY_HEADER_CLEAR;
> -}
> -
> -static void mark_cleared(struct drm_buddy_block *block)
> -{
> -       block->header |= DRM_BUDDY_HEADER_CLEAR;
> -}
> -
> -static void mark_allocated(struct drm_buddy *mm,
> -                          struct drm_buddy_block *block)
> -{
> -       block->header &= ~DRM_BUDDY_HEADER_STATE;
> -       block->header |= DRM_BUDDY_ALLOCATED;
> -
> -       rbtree_remove(mm, block);
> -}
> -
> -static void mark_free(struct drm_buddy *mm,
> -                     struct drm_buddy_block *block)
> -{
> -       enum drm_buddy_free_tree tree;
> -
> -       block->header &= ~DRM_BUDDY_HEADER_STATE;
> -       block->header |= DRM_BUDDY_FREE;
> -
> -       tree = get_block_tree(block);
> -       rbtree_insert(mm, block, tree);
> -}
> -
> -static void mark_split(struct drm_buddy *mm,
> -                      struct drm_buddy_block *block)
> -{
> -       block->header &= ~DRM_BUDDY_HEADER_STATE;
> -       block->header |= DRM_BUDDY_SPLIT;
> -
> -       rbtree_remove(mm, block);
> -}
> -
> -static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
> -{
> -       return s1 <= e2 && e1 >= s2;
> -}
> -
> -static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2)
> -{
> -       return s1 <= s2 && e1 >= e2;
> -}
> -
> -static struct drm_buddy_block *
> -__get_buddy(struct drm_buddy_block *block)
> -{
> -       struct drm_buddy_block *parent;
> -
> -       parent = block->parent;
> -       if (!parent)
> -               return NULL;
> -
> -       if (parent->left == block)
> -               return parent->right;
> -
> -       return parent->left;
> -}
> -
> -static unsigned int __drm_buddy_free(struct drm_buddy *mm,
> -                                    struct drm_buddy_block *block,
> -                                    bool force_merge)
> -{
> -       struct drm_buddy_block *parent;
> -       unsigned int order;
> -
> -       while ((parent = block->parent)) {
> -               struct drm_buddy_block *buddy;
> -
> -               buddy = __get_buddy(block);
> -
> -               if (!drm_buddy_block_is_free(buddy))
> -                       break;
> -
> -               if (!force_merge) {
> -                       /*
> -                        * Check the block and its buddy clear state and exit
> -                        * the loop if they both have the dissimilar state.
> -                        */
> -                       if (drm_buddy_block_is_clear(block) !=
> -                           drm_buddy_block_is_clear(buddy))
> -                               break;
> -
> -                       if (drm_buddy_block_is_clear(block))
> -                               mark_cleared(parent);
> -               }
> -
> -               rbtree_remove(mm, buddy);
> -               if (force_merge && drm_buddy_block_is_clear(buddy))
> -                       mm->clear_avail -= drm_buddy_block_size(mm, buddy);
> -
> -               drm_block_free(mm, block);
> -               drm_block_free(mm, buddy);
> -
> -               block = parent;
> -       }
> -
> -       order = drm_buddy_block_order(block);
> -       mark_free(mm, block);
> -
> -       return order;
> -}
> -
> -static int __force_merge(struct drm_buddy *mm,
> -                        u64 start,
> -                        u64 end,
> -                        unsigned int min_order)
> -{
> -       unsigned int tree, order;
> -       int i;
> -
> -       if (!min_order)
> -               return -ENOMEM;
> -
> -       if (min_order > mm->max_order)
> -               return -EINVAL;
> -
> -       for_each_free_tree(tree) {
> -               for (i = min_order - 1; i >= 0; i--) {
> -                       struct rb_node *iter = rb_last(&mm->free_trees[tree][i]);
> -
> -                       while (iter) {
> -                               struct drm_buddy_block *block, *buddy;
> -                               u64 block_start, block_end;
> -
> -                               block = rbtree_get_free_block(iter);
> -                               iter = rb_prev(iter);
> -
> -                               if (!block || !block->parent)
> -                                       continue;
> -
> -                               block_start = drm_buddy_block_offset(block);
> -                               block_end = block_start + drm_buddy_block_size(mm, block) - 1;
> -
> -                               if (!contains(start, end, block_start, block_end))
> -                                       continue;
> -
> -                               buddy = __get_buddy(block);
> -                               if (!drm_buddy_block_is_free(buddy))
> -                                       continue;
> -
> -                               WARN_ON(drm_buddy_block_is_clear(block) ==
> -                                       drm_buddy_block_is_clear(buddy));
> -
> -                               /*
> -                                * Advance to the next node when the current node is the buddy,
> -                                * as freeing the block will also remove its buddy from the tree.
> -                                */
> -                               if (iter == &buddy->rb)
> -                                       iter = rb_prev(iter);
> -
> -                               rbtree_remove(mm, block);
> -                               if (drm_buddy_block_is_clear(block))
> -                                       mm->clear_avail -= drm_buddy_block_size(mm, block);
> -
> -                               order = __drm_buddy_free(mm, block, true);
> -                               if (order >= min_order)
> -                                       return 0;
> -                       }
> -               }
> -       }
> -
> -       return -ENOMEM;
> -}
> -
> -/**
> - * drm_buddy_init - init memory manager
> - *
> - * @mm: DRM buddy manager to initialize
> - * @size: size in bytes to manage
> - * @chunk_size: minimum page size in bytes for our allocations
> - *
> - * Initializes the memory manager and its resources.
> - *
> - * Returns:
> - * 0 on success, error code on failure.
> - */
> -int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size)
> -{
> -       unsigned int i, j, root_count = 0;
> -       u64 offset = 0;
> -
> -       if (size < chunk_size)
> -               return -EINVAL;
> -
> -       if (chunk_size < SZ_4K)
> -               return -EINVAL;
> -
> -       if (!is_power_of_2(chunk_size))
> -               return -EINVAL;
> -
> -       size = round_down(size, chunk_size);
> -
> -       mm->size = size;
> -       mm->avail = size;
> -       mm->clear_avail = 0;
> -       mm->chunk_size = chunk_size;
> -       mm->max_order = ilog2(size) - ilog2(chunk_size);
> -
> -       BUG_ON(mm->max_order > DRM_BUDDY_MAX_ORDER);
> -
> -       mm->free_trees = kmalloc_array(DRM_BUDDY_MAX_FREE_TREES,
> -                                      sizeof(*mm->free_trees),
> -                                      GFP_KERNEL);
> -       if (!mm->free_trees)
> -               return -ENOMEM;
> -
> -       for_each_free_tree(i) {
> -               mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
> -                                                 sizeof(struct rb_root),
> -                                                 GFP_KERNEL);
> -               if (!mm->free_trees[i])
> -                       goto out_free_tree;
> -
> -               for (j = 0; j <= mm->max_order; ++j)
> -                       mm->free_trees[i][j] = RB_ROOT;
> -       }
> -
> -       mm->n_roots = hweight64(size);
> -
> -       mm->roots = kmalloc_array(mm->n_roots,
> -                                 sizeof(struct drm_buddy_block *),
> -                                 GFP_KERNEL);
> -       if (!mm->roots)
> -               goto out_free_tree;
> -
> -       /*
> -        * Split into power-of-two blocks, in case we are given a size that is
> -        * not itself a power-of-two.
> -        */
> -       do {
> -               struct drm_buddy_block *root;
> -               unsigned int order;
> -               u64 root_size;
> -
> -               order = ilog2(size) - ilog2(chunk_size);
> -               root_size = chunk_size << order;
> -
> -               root = drm_block_alloc(mm, NULL, order, offset);
> -               if (!root)
> -                       goto out_free_roots;
> -
> -               mark_free(mm, root);
> -
> -               BUG_ON(root_count > mm->max_order);
> -               BUG_ON(drm_buddy_block_size(mm, root) < chunk_size);
> -
> -               mm->roots[root_count] = root;
> -
> -               offset += root_size;
> -               size -= root_size;
> -               root_count++;
> -       } while (size);
> -
> -       return 0;
> -
> -out_free_roots:
> -       while (root_count--)
> -               drm_block_free(mm, mm->roots[root_count]);
> -       kfree(mm->roots);
> -out_free_tree:
> -       while (i--)
> -               kfree(mm->free_trees[i]);
> -       kfree(mm->free_trees);
> -       return -ENOMEM;
> -}
> -EXPORT_SYMBOL(drm_buddy_init);
> -
> -/**
> - * drm_buddy_fini - tear down the memory manager
> - *
> - * @mm: DRM buddy manager to free
> - *
> - * Cleanup memory manager resources and the freetree
> - */
> -void drm_buddy_fini(struct drm_buddy *mm)
> -{
> -       u64 root_size, size, start;
> -       unsigned int order;
> -       int i;
> -
> -       size = mm->size;
> -
> -       for (i = 0; i < mm->n_roots; ++i) {
> -               order = ilog2(size) - ilog2(mm->chunk_size);
> -               start = drm_buddy_block_offset(mm->roots[i]);
> -               __force_merge(mm, start, start + size, order);
> -
> -               if (WARN_ON(!drm_buddy_block_is_free(mm->roots[i])))
> -                       kunit_fail_current_test("buddy_fini() root");
> -
> -               drm_block_free(mm, mm->roots[i]);
> -
> -               root_size = mm->chunk_size << order;
> -               size -= root_size;
> -       }
> -
> -       WARN_ON(mm->avail != mm->size);
> -
> -       for_each_free_tree(i)
> -               kfree(mm->free_trees[i]);
> -       kfree(mm->roots);
> -}
> -EXPORT_SYMBOL(drm_buddy_fini);
> -
> -static int split_block(struct drm_buddy *mm,
> -                      struct drm_buddy_block *block)
> -{
> -       unsigned int block_order = drm_buddy_block_order(block) - 1;
> -       u64 offset = drm_buddy_block_offset(block);
> -
> -       BUG_ON(!drm_buddy_block_is_free(block));
> -       BUG_ON(!drm_buddy_block_order(block));
> -
> -       block->left = drm_block_alloc(mm, block, block_order, offset);
> -       if (!block->left)
> -               return -ENOMEM;
> -
> -       block->right = drm_block_alloc(mm, block, block_order,
> -                                      offset + (mm->chunk_size << block_order));
> -       if (!block->right) {
> -               drm_block_free(mm, block->left);
> -               return -ENOMEM;
> -       }
> -
> -       mark_split(mm, block);
> -
> -       if (drm_buddy_block_is_clear(block)) {
> -               mark_cleared(block->left);
> -               mark_cleared(block->right);
> -               clear_reset(block);
> -       }
> -
> -       mark_free(mm, block->left);
> -       mark_free(mm, block->right);
> -
> -       return 0;
> -}
> -
> -/**
> - * drm_get_buddy - get buddy address
> - *
> - * @block: DRM buddy block
> - *
> - * Returns the corresponding buddy block for @block, or NULL
> - * if this is a root block and can't be merged further.
> - * Requires some kind of locking to protect against
> - * any concurrent allocate and free operations.
> - */
> -struct drm_buddy_block *
> -drm_get_buddy(struct drm_buddy_block *block)
> -{
> -       return __get_buddy(block);
> -}
> -EXPORT_SYMBOL(drm_get_buddy);
> -
> -/**
> - * drm_buddy_reset_clear - reset blocks clear state
> - *
> - * @mm: DRM buddy manager
> - * @is_clear: blocks clear state
> - *
> - * Reset the clear state based on @is_clear value for each block
> - * in the freetree.
> - */
> -void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear)
> -{
> -       enum drm_buddy_free_tree src_tree, dst_tree;
> -       u64 root_size, size, start;
> -       unsigned int order;
> -       int i;
> -
> -       size = mm->size;
> -       for (i = 0; i < mm->n_roots; ++i) {
> -               order = ilog2(size) - ilog2(mm->chunk_size);
> -               start = drm_buddy_block_offset(mm->roots[i]);
> -               __force_merge(mm, start, start + size, order);
> -
> -               root_size = mm->chunk_size << order;
> -               size -= root_size;
> -       }
> -
> -       src_tree = is_clear ? DRM_BUDDY_DIRTY_TREE : DRM_BUDDY_CLEAR_TREE;
> -       dst_tree = is_clear ? DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
> -
> -       for (i = 0; i <= mm->max_order; ++i) {
> -               struct rb_root *root = &mm->free_trees[src_tree][i];
> -               struct drm_buddy_block *block, *tmp;
> -
> -               rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
> -                       rbtree_remove(mm, block);
> -                       if (is_clear) {
> -                               mark_cleared(block);
> -                               mm->clear_avail += drm_buddy_block_size(mm, block);
> -                       } else {
> -                               clear_reset(block);
> -                               mm->clear_avail -= drm_buddy_block_size(mm, block);
> -                       }
> -
> -                       rbtree_insert(mm, block, dst_tree);
> -               }
> -       }
> -}
> -EXPORT_SYMBOL(drm_buddy_reset_clear);
> -
> -/**
> - * drm_buddy_free_block - free a block
> - *
> - * @mm: DRM buddy manager
> - * @block: block to be freed
> - */
> -void drm_buddy_free_block(struct drm_buddy *mm,
> -                         struct drm_buddy_block *block)
> -{
> -       BUG_ON(!drm_buddy_block_is_allocated(block));
> -       mm->avail += drm_buddy_block_size(mm, block);
> -       if (drm_buddy_block_is_clear(block))
> -               mm->clear_avail += drm_buddy_block_size(mm, block);
> -
> -       __drm_buddy_free(mm, block, false);
> -}
> -EXPORT_SYMBOL(drm_buddy_free_block);
> -
> -static void __drm_buddy_free_list(struct drm_buddy *mm,
> -                                 struct list_head *objects,
> -                                 bool mark_clear,
> -                                 bool mark_dirty)
> -{
> -       struct drm_buddy_block *block, *on;
> -
> -       WARN_ON(mark_dirty && mark_clear);
> -
> -       list_for_each_entry_safe(block, on, objects, link) {
> -               if (mark_clear)
> -                       mark_cleared(block);
> -               else if (mark_dirty)
> -                       clear_reset(block);
> -               drm_buddy_free_block(mm, block);
> -               cond_resched();
> -       }
> -       INIT_LIST_HEAD(objects);
> -}
> -
> -static void drm_buddy_free_list_internal(struct drm_buddy *mm,
> -                                        struct list_head *objects)
> -{
> -       /*
> -        * Don't touch the clear/dirty bit, since allocation is still internal
> -        * at this point. For example we might have just failed part of the
> -        * allocation.
> -        */
> -       __drm_buddy_free_list(mm, objects, false, false);
> -}
> -
> -/**
> - * drm_buddy_free_list - free blocks
> - *
> - * @mm: DRM buddy manager
> - * @objects: input list head to free blocks
> - * @flags: optional flags like DRM_BUDDY_CLEARED
> - */
> -void drm_buddy_free_list(struct drm_buddy *mm,
> -                        struct list_head *objects,
> -                        unsigned int flags)
> -{
> -       bool mark_clear = flags & DRM_BUDDY_CLEARED;
> -
> -       __drm_buddy_free_list(mm, objects, mark_clear, !mark_clear);
> -}
> -EXPORT_SYMBOL(drm_buddy_free_list);
> -
> -static bool block_incompatible(struct drm_buddy_block *block, unsigned int flags)
> -{
> -       bool needs_clear = flags & DRM_BUDDY_CLEAR_ALLOCATION;
> -
> -       return needs_clear != drm_buddy_block_is_clear(block);
> -}
> -
> -static struct drm_buddy_block *
> -__alloc_range_bias(struct drm_buddy *mm,
> -                  u64 start, u64 end,
> -                  unsigned int order,
> -                  unsigned long flags,
> -                  bool fallback)
> -{
> -       u64 req_size = mm->chunk_size << order;
> -       struct drm_buddy_block *block;
> -       struct drm_buddy_block *buddy;
> -       LIST_HEAD(dfs);
> -       int err;
> -       int i;
> -
> -       end = end - 1;
> -
> -       for (i = 0; i < mm->n_roots; ++i)
> -               list_add_tail(&mm->roots[i]->tmp_link, &dfs);
> -
> -       do {
> -               u64 block_start;
> -               u64 block_end;
> -
> -               block = list_first_entry_or_null(&dfs,
> -                                                struct drm_buddy_block,
> -                                                tmp_link);
> -               if (!block)
> -                       break;
> -
> -               list_del(&block->tmp_link);
> -
> -               if (drm_buddy_block_order(block) < order)
> -                       continue;
> -
> -               block_start = drm_buddy_block_offset(block);
> -               block_end = block_start + drm_buddy_block_size(mm, block) - 1;
> -
> -               if (!overlaps(start, end, block_start, block_end))
> -                       continue;
> -
> -               if (drm_buddy_block_is_allocated(block))
> -                       continue;
> -
> -               if (block_start < start || block_end > end) {
> -                       u64 adjusted_start = max(block_start, start);
> -                       u64 adjusted_end = min(block_end, end);
> -
> -                       if (round_down(adjusted_end + 1, req_size) <=
> -                           round_up(adjusted_start, req_size))
> -                               continue;
> -               }
> -
> -               if (!fallback && block_incompatible(block, flags))
> -                       continue;
> -
> -               if (contains(start, end, block_start, block_end) &&
> -                   order == drm_buddy_block_order(block)) {
> -                       /*
> -                        * Find the free block within the range.
> -                        */
> -                       if (drm_buddy_block_is_free(block))
> -                               return block;
> -
> -                       continue;
> -               }
> -
> -               if (!drm_buddy_block_is_split(block)) {
> -                       err = split_block(mm, block);
> -                       if (unlikely(err))
> -                               goto err_undo;
> -               }
> -
> -               list_add(&block->right->tmp_link, &dfs);
> -               list_add(&block->left->tmp_link, &dfs);
> -       } while (1);
> -
> -       return ERR_PTR(-ENOSPC);
> -
> -err_undo:
> -       /*
> -        * We really don't want to leave around a bunch of split blocks, since
> -        * bigger is better, so make sure we merge everything back before we
> -        * free the allocated blocks.
> -        */
> -       buddy = __get_buddy(block);
> -       if (buddy &&
> -           (drm_buddy_block_is_free(block) &&
> -            drm_buddy_block_is_free(buddy)))
> -               __drm_buddy_free(mm, block, false);
> -       return ERR_PTR(err);
> -}
> -
> -static struct drm_buddy_block *
> -__drm_buddy_alloc_range_bias(struct drm_buddy *mm,
> -                            u64 start, u64 end,
> -                            unsigned int order,
> -                            unsigned long flags)
> -{
> -       struct drm_buddy_block *block;
> -       bool fallback = false;
> -
> -       block = __alloc_range_bias(mm, start, end, order,
> -                                  flags, fallback);
> -       if (IS_ERR(block))
> -               return __alloc_range_bias(mm, start, end, order,
> -                                         flags, !fallback);
> -
> -       return block;
> -}
> -
> -static struct drm_buddy_block *
> -get_maxblock(struct drm_buddy *mm,
> -            unsigned int order,
> -            enum drm_buddy_free_tree tree)
> -{
> -       struct drm_buddy_block *max_block = NULL, *block = NULL;
> -       struct rb_root *root;
> -       unsigned int i;
> -
> -       for (i = order; i <= mm->max_order; ++i) {
> -               root = &mm->free_trees[tree][i];
> -               block = rbtree_last_free_block(root);
> -               if (!block)
> -                       continue;
> -
> -               if (!max_block) {
> -                       max_block = block;
> -                       continue;
> -               }
> -
> -               if (drm_buddy_block_offset(block) >
> -                   drm_buddy_block_offset(max_block)) {
> -                       max_block = block;
> -               }
> -       }
> -
> -       return max_block;
> -}
> -
> -static struct drm_buddy_block *
> -alloc_from_freetree(struct drm_buddy *mm,
> -                   unsigned int order,
> -                   unsigned long flags)
> -{
> -       struct drm_buddy_block *block = NULL;
> -       struct rb_root *root;
> -       enum drm_buddy_free_tree tree;
> -       unsigned int tmp;
> -       int err;
> -
> -       tree = (flags & DRM_BUDDY_CLEAR_ALLOCATION) ?
> -               DRM_BUDDY_CLEAR_TREE : DRM_BUDDY_DIRTY_TREE;
> -
> -       if (flags & DRM_BUDDY_TOPDOWN_ALLOCATION) {
> -               block = get_maxblock(mm, order, tree);
> -               if (block)
> -                       /* Store the obtained block order */
> -                       tmp = drm_buddy_block_order(block);
> -       } else {
> -               for (tmp = order; tmp <= mm->max_order; ++tmp) {
> -                       /* Get RB tree root for this order and tree */
> -                       root = &mm->free_trees[tree][tmp];
> -                       block = rbtree_last_free_block(root);
> -                       if (block)
> -                               break;
> -               }
> -       }
> -
> -       if (!block) {
> -               /* Try allocating from the other tree */
> -               tree = (tree == DRM_BUDDY_CLEAR_TREE) ?
> -                       DRM_BUDDY_DIRTY_TREE : DRM_BUDDY_CLEAR_TREE;
> -
> -               for (tmp = order; tmp <= mm->max_order; ++tmp) {
> -                       root = &mm->free_trees[tree][tmp];
> -                       block = rbtree_last_free_block(root);
> -                       if (block)
> -                               break;
> -               }
> -
> -               if (!block)
> -                       return ERR_PTR(-ENOSPC);
> -       }
> -
> -       BUG_ON(!drm_buddy_block_is_free(block));
> -
> -       while (tmp != order) {
> -               err = split_block(mm, block);
> -               if (unlikely(err))
> -                       goto err_undo;
> -
> -               block = block->right;
> -               tmp--;
> -       }
> -       return block;
> -
> -err_undo:
> -       if (tmp != order)
> -               __drm_buddy_free(mm, block, false);
> -       return ERR_PTR(err);
> -}
> -
> -static int __alloc_range(struct drm_buddy *mm,
> -                        struct list_head *dfs,
> -                        u64 start, u64 size,
> -                        struct list_head *blocks,
> -                        u64 *total_allocated_on_err)
> -{
> -       struct drm_buddy_block *block;
> -       struct drm_buddy_block *buddy;
> -       u64 total_allocated = 0;
> -       LIST_HEAD(allocated);
> -       u64 end;
> -       int err;
> -
> -       end = start + size - 1;
> -
> -       do {
> -               u64 block_start;
> -               u64 block_end;
> -
> -               block = list_first_entry_or_null(dfs,
> -                                                struct drm_buddy_block,
> -                                                tmp_link);
> -               if (!block)
> -                       break;
> -
> -               list_del(&block->tmp_link);
> -
> -               block_start = drm_buddy_block_offset(block);
> -               block_end = block_start + drm_buddy_block_size(mm, block) - 1;
> -
> -               if (!overlaps(start, end, block_start, block_end))
> -                       continue;
> -
> -               if (drm_buddy_block_is_allocated(block)) {
> -                       err = -ENOSPC;
> -                       goto err_free;
> -               }
> -
> -               if (contains(start, end, block_start, block_end)) {
> -                       if (drm_buddy_block_is_free(block)) {
> -                               mark_allocated(mm, block);
> -                               total_allocated += drm_buddy_block_size(mm, block);
> -                               mm->avail -= drm_buddy_block_size(mm, block);
> -                               if (drm_buddy_block_is_clear(block))
> -                                       mm->clear_avail -= drm_buddy_block_size(mm, block);
> -                               list_add_tail(&block->link, &allocated);
> -                               continue;
> -                       } else if (!mm->clear_avail) {
> -                               err = -ENOSPC;
> -                               goto err_free;
> -                       }
> -               }
> -
> -               if (!drm_buddy_block_is_split(block)) {
> -                       err = split_block(mm, block);
> -                       if (unlikely(err))
> -                               goto err_undo;
> -               }
> -
> -               list_add(&block->right->tmp_link, dfs);
> -               list_add(&block->left->tmp_link, dfs);
> -       } while (1);
> -
> -       if (total_allocated < size) {
> -               err = -ENOSPC;
> -               goto err_free;
> -       }
> -
> -       list_splice_tail(&allocated, blocks);
> -
> -       return 0;
> -
> -err_undo:
> -       /*
> -        * We really don't want to leave around a bunch of split blocks, since
> -        * bigger is better, so make sure we merge everything back before we
> -        * free the allocated blocks.
> -        */
> -       buddy = __get_buddy(block);
> -       if (buddy &&
> -           (drm_buddy_block_is_free(block) &&
> -            drm_buddy_block_is_free(buddy)))
> -               __drm_buddy_free(mm, block, false);
> -
> -err_free:
> -       if (err == -ENOSPC && total_allocated_on_err) {
> -               list_splice_tail(&allocated, blocks);
> -               *total_allocated_on_err = total_allocated;
> -       } else {
> -               drm_buddy_free_list_internal(mm, &allocated);
> -       }
> -
> -       return err;
> -}
> -
> -static int __drm_buddy_alloc_range(struct drm_buddy *mm,
> -                                  u64 start,
> -                                  u64 size,
> -                                  u64 *total_allocated_on_err,
> -                                  struct list_head *blocks)
> -{
> -       LIST_HEAD(dfs);
> -       int i;
> -
> -       for (i = 0; i < mm->n_roots; ++i)
> -               list_add_tail(&mm->roots[i]->tmp_link, &dfs);
> -
> -       return __alloc_range(mm, &dfs, start, size,
> -                            blocks, total_allocated_on_err);
> -}
> -
> -static int __alloc_contig_try_harder(struct drm_buddy *mm,
> -                                    u64 size,
> -                                    u64 min_block_size,
> -                                    struct list_head *blocks)
> -{
> -       u64 rhs_offset, lhs_offset, lhs_size, filled;
> -       struct drm_buddy_block *block;
> -       unsigned int tree, order;
> -       LIST_HEAD(blocks_lhs);
> -       unsigned long pages;
> -       u64 modify_size;
> -       int err;
> -
> -       modify_size = rounddown_pow_of_two(size);
> -       pages = modify_size >> ilog2(mm->chunk_size);
> -       order = fls(pages) - 1;
> -       if (order == 0)
> -               return -ENOSPC;
> -
> -       for_each_free_tree(tree) {
> -               struct rb_root *root;
> -               struct rb_node *iter;
> -
> -               root = &mm->free_trees[tree][order];
> -               if (rbtree_is_empty(root))
> -                       continue;
> -
> -               iter = rb_last(root);
> -               while (iter) {
> -                       block = rbtree_get_free_block(iter);
> -
> -                       /* Allocate blocks traversing RHS */
> -                       rhs_offset = drm_buddy_block_offset(block);
> -                       err =  __drm_buddy_alloc_range(mm, rhs_offset, size,
> -                                                      &filled, blocks);
> -                       if (!err || err != -ENOSPC)
> -                               return err;
> -
> -                       lhs_size = max((size - filled), min_block_size);
> -                       if (!IS_ALIGNED(lhs_size, min_block_size))
> -                               lhs_size = round_up(lhs_size, min_block_size);
> -
> -                       /* Allocate blocks traversing LHS */
> -                       lhs_offset = drm_buddy_block_offset(block) - lhs_size;
> -                       err =  __drm_buddy_alloc_range(mm, lhs_offset, lhs_size,
> -                                                      NULL, &blocks_lhs);
> -                       if (!err) {
> -                               list_splice(&blocks_lhs, blocks);
> -                               return 0;
> -                       } else if (err != -ENOSPC) {
> -                               drm_buddy_free_list_internal(mm, blocks);
> -                               return err;
> -                       }
> -                       /* Free blocks for the next iteration */
> -                       drm_buddy_free_list_internal(mm, blocks);
> -
> -                       iter = rb_prev(iter);
> -               }
> -       }
> -
> -       return -ENOSPC;
> -}
> -
> -/**
> - * drm_buddy_block_trim - free unused pages
> - *
> - * @mm: DRM buddy manager
> - * @start: start address to begin the trimming.
> - * @new_size: original size requested
> - * @blocks: Input and output list of allocated blocks.
> - * MUST contain single block as input to be trimmed.
> - * On success will contain the newly allocated blocks
> - * making up the @new_size. Blocks always appear in
> - * ascending order
> - *
> - * For contiguous allocation, we round up the size to the nearest
> - * power of two value, drivers consume *actual* size, so remaining
> - * portions are unused and can be optionally freed with this function
> - *
> - * Returns:
> - * 0 on success, error code on failure.
> - */
> -int drm_buddy_block_trim(struct drm_buddy *mm,
> -                        u64 *start,
> -                        u64 new_size,
> -                        struct list_head *blocks)
> -{
> -       struct drm_buddy_block *parent;
> -       struct drm_buddy_block *block;
> -       u64 block_start, block_end;
> -       LIST_HEAD(dfs);
> -       u64 new_start;
> -       int err;
> -
> -       if (!list_is_singular(blocks))
> -               return -EINVAL;
> -
> -       block = list_first_entry(blocks,
> -                                struct drm_buddy_block,
> -                                link);
> -
> -       block_start = drm_buddy_block_offset(block);
> -       block_end = block_start + drm_buddy_block_size(mm, block);
> -
> -       if (WARN_ON(!drm_buddy_block_is_allocated(block)))
> -               return -EINVAL;
> -
> -       if (new_size > drm_buddy_block_size(mm, block))
> -               return -EINVAL;
> -
> -       if (!new_size || !IS_ALIGNED(new_size, mm->chunk_size))
> -               return -EINVAL;
> -
> -       if (new_size == drm_buddy_block_size(mm, block))
> -               return 0;
> -
> -       new_start = block_start;
> -       if (start) {
> -               new_start = *start;
> -
> -               if (new_start < block_start)
> -                       return -EINVAL;
> -
> -               if (!IS_ALIGNED(new_start, mm->chunk_size))
> -                       return -EINVAL;
> -
> -               if (range_overflows(new_start, new_size, block_end))
> -                       return -EINVAL;
> -       }
> -
> -       list_del(&block->link);
> -       mark_free(mm, block);
> -       mm->avail += drm_buddy_block_size(mm, block);
> -       if (drm_buddy_block_is_clear(block))
> -               mm->clear_avail += drm_buddy_block_size(mm, block);
> -
> -       /* Prevent recursively freeing this node */
> -       parent = block->parent;
> -       block->parent = NULL;
> -
> -       list_add(&block->tmp_link, &dfs);
> -       err =  __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL);
> -       if (err) {
> -               mark_allocated(mm, block);
> -               mm->avail -= drm_buddy_block_size(mm, block);
> -               if (drm_buddy_block_is_clear(block))
> -                       mm->clear_avail -= drm_buddy_block_size(mm, block);
> -               list_add(&block->link, blocks);
> -       }
> -
> -       block->parent = parent;
> -       return err;
> -}
> -EXPORT_SYMBOL(drm_buddy_block_trim);
> -
> -static struct drm_buddy_block *
> -__drm_buddy_alloc_blocks(struct drm_buddy *mm,
> -                        u64 start, u64 end,
> -                        unsigned int order,
> -                        unsigned long flags)
> -{
> -       if (flags & DRM_BUDDY_RANGE_ALLOCATION)
> -               /* Allocate traversing within the range */
> -               return  __drm_buddy_alloc_range_bias(mm, start, end,
> -                                                    order, flags);
> -       else
> -               /* Allocate from freetree */
> -               return alloc_from_freetree(mm, order, flags);
> -}
> -
> -/**
> - * drm_buddy_alloc_blocks - allocate power-of-two blocks
> - *
> - * @mm: DRM buddy manager to allocate from
> - * @start: start of the allowed range for this block
> - * @end: end of the allowed range for this block
> - * @size: size of the allocation in bytes
> - * @min_block_size: alignment of the allocation
> - * @blocks: output list head to add allocated blocks
> - * @flags: DRM_BUDDY_*_ALLOCATION flags
> - *
> - * alloc_range_bias() called on range limitations, which traverses
> - * the tree and returns the desired block.
> - *
> - * alloc_from_freetree() called when *no* range restrictions
> - * are enforced, which picks the block from the freetree.
> - *
> - * Returns:
> - * 0 on success, error code on failure.
> - */
> -int drm_buddy_alloc_blocks(struct drm_buddy *mm,
> -                          u64 start, u64 end, u64 size,
> -                          u64 min_block_size,
> -                          struct list_head *blocks,
> -                          unsigned long flags)
> -{
> -       struct drm_buddy_block *block = NULL;
> -       u64 original_size, original_min_size;
> -       unsigned int min_order, order;
> -       LIST_HEAD(allocated);
> -       unsigned long pages;
> -       int err;
> -
> -       if (size < mm->chunk_size)
> -               return -EINVAL;
> -
> -       if (min_block_size < mm->chunk_size)
> -               return -EINVAL;
> -
> -       if (!is_power_of_2(min_block_size))
> -               return -EINVAL;
> -
> -       if (!IS_ALIGNED(start | end | size, mm->chunk_size))
> -               return -EINVAL;
> -
> -       if (end > mm->size)
> -               return -EINVAL;
> -
> -       if (range_overflows(start, size, mm->size))
> -               return -EINVAL;
> -
> -       /* Actual range allocation */
> -       if (start + size == end) {
> -               if (!IS_ALIGNED(start | end, min_block_size))
> -                       return -EINVAL;
> -
> -               return __drm_buddy_alloc_range(mm, start, size, NULL, blocks);
> -       }
> -
> -       original_size = size;
> -       original_min_size = min_block_size;
> -
> -       /* Roundup the size to power of 2 */
> -       if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION) {
> -               size = roundup_pow_of_two(size);
> -               min_block_size = size;
> -       /* Align size value to min_block_size */
> -       } else if (!IS_ALIGNED(size, min_block_size)) {
> -               size = round_up(size, min_block_size);
> -       }
> -
> -       pages = size >> ilog2(mm->chunk_size);
> -       order = fls(pages) - 1;
> -       min_order = ilog2(min_block_size) - ilog2(mm->chunk_size);
> -
> -       do {
> -               order = min(order, (unsigned int)fls(pages) - 1);
> -               BUG_ON(order > mm->max_order);
> -               BUG_ON(order < min_order);
> -
> -               do {
> -                       block = __drm_buddy_alloc_blocks(mm, start,
> -                                                        end,
> -                                                        order,
> -                                                        flags);
> -                       if (!IS_ERR(block))
> -                               break;
> -
> -                       if (order-- == min_order) {
> -                               /* Try allocation through force merge method */
> -                               if (mm->clear_avail &&
> -                                   !__force_merge(mm, start, end, min_order)) {
> -                                       block = __drm_buddy_alloc_blocks(mm, start,
> -                                                                        end,
> -                                                                        min_order,
> -                                                                        flags);
> -                                       if (!IS_ERR(block)) {
> -                                               order = min_order;
> -                                               break;
> -                                       }
> -                               }
> -
> -                               /*
> -                                * Try contiguous block allocation through
> -                                * try harder method.
> -                                */
> -                               if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION &&
> -                                   !(flags & DRM_BUDDY_RANGE_ALLOCATION))
> -                                       return __alloc_contig_try_harder(mm,
> -                                                                        original_size,
> -                                                                        original_min_size,
> -                                                                        blocks);
> -                               err = -ENOSPC;
> -                               goto err_free;
> -                       }
> -               } while (1);
> -
> -               mark_allocated(mm, block);
> -               mm->avail -= drm_buddy_block_size(mm, block);
> -               if (drm_buddy_block_is_clear(block))
> -                       mm->clear_avail -= drm_buddy_block_size(mm, block);
> -               kmemleak_update_trace(block);
> -               list_add_tail(&block->link, &allocated);
> -
> -               pages -= BIT(order);
> -
> -               if (!pages)
> -                       break;
> -       } while (1);
> -
> -       /* Trim the allocated block to the required size */
> -       if (!(flags & DRM_BUDDY_TRIM_DISABLE) &&
> -           original_size != size) {
> -               struct list_head *trim_list;
> -               LIST_HEAD(temp);
> -               u64 trim_size;
> -
> -               trim_list = &allocated;
> -               trim_size = original_size;
> -
> -               if (!list_is_singular(&allocated)) {
> -                       block = list_last_entry(&allocated, typeof(*block), link);
> -                       list_move(&block->link, &temp);
> -                       trim_list = &temp;
> -                       trim_size = drm_buddy_block_size(mm, block) -
> -                               (size - original_size);
> -               }
> -
> -               drm_buddy_block_trim(mm,
> -                                    NULL,
> -                                    trim_size,
> -                                    trim_list);
> -
> -               if (!list_empty(&temp))
> -                       list_splice_tail(trim_list, &allocated);
> -       }
> -
> -       list_splice_tail(&allocated, blocks);
> -       return 0;
> -
> -err_free:
> -       drm_buddy_free_list_internal(mm, &allocated);
> -       return err;
> -}
> -EXPORT_SYMBOL(drm_buddy_alloc_blocks);
> -
>  /**
>   * drm_buddy_block_print - print block information
>   *
> - * @mm: DRM buddy manager
> - * @block: DRM buddy block
> + * @mm: GPU buddy manager
> + * @block: GPU buddy block
>   * @p: DRM printer to use
>   */
> -void drm_buddy_block_print(struct drm_buddy *mm,
> -                          struct drm_buddy_block *block,
> +void drm_buddy_block_print(struct gpu_buddy *mm, struct gpu_buddy_block *block,
>                            struct drm_printer *p)
>  {
> -       u64 start = drm_buddy_block_offset(block);
> -       u64 size = drm_buddy_block_size(mm, block);
> +       u64 start = gpu_buddy_block_offset(block);
> +       u64 size = gpu_buddy_block_size(mm, block);
>
>         drm_printf(p, "%#018llx-%#018llx: %llu\n", start, start + size, size);
>  }
> @@ -1267,18 +30,21 @@ EXPORT_SYMBOL(drm_buddy_block_print);
>  /**
>   * drm_buddy_print - print allocator state
>   *
> - * @mm: DRM buddy manager
> + * @mm: GPU buddy manager
>   * @p: DRM printer to use
>   */
> -void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
> +void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p)
>  {
>         int order;
>
> -       drm_printf(p, "chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
> -                  mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
> +       drm_printf(
> +               p,
> +               "chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
> +               mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20,
> +               mm->clear_avail >> 20);
>
>         for (order = mm->max_order; order >= 0; order--) {
> -               struct drm_buddy_block *block, *tmp;
> +               struct gpu_buddy_block *block, *tmp;
>                 struct rb_root *root;
>                 u64 count = 0, free;
>                 unsigned int tree;
> @@ -1286,8 +52,9 @@ void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
>                 for_each_free_tree(tree) {
>                         root = &mm->free_trees[tree][order];
>
> -                       rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
> -                               BUG_ON(!drm_buddy_block_is_free(block));
> +                       rbtree_postorder_for_each_entry_safe(block, tmp, root,
> +                                                            rb) {
> +                               BUG_ON(!gpu_buddy_block_is_free(block));
>                                 count++;
>                         }
>                 }
> @@ -1305,22 +72,5 @@ void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
>  }
>  EXPORT_SYMBOL(drm_buddy_print);
>
> -static void drm_buddy_module_exit(void)
> -{
> -       kmem_cache_destroy(slab_blocks);
> -}
> -
> -static int __init drm_buddy_module_init(void)
> -{
> -       slab_blocks = KMEM_CACHE(drm_buddy_block, 0);
> -       if (!slab_blocks)
> -               return -ENOMEM;
> -
> -       return 0;
> -}
> -
> -module_init(drm_buddy_module_init);
> -module_exit(drm_buddy_module_exit);
> -
> -MODULE_DESCRIPTION("DRM Buddy Allocator");
> +MODULE_DESCRIPTION("DRM-specific GPU Buddy Allocator Print Helpers");
>  MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 5e939004b646..859aeca87c19 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -38,6 +38,7 @@ config DRM_I915
>         select CEC_CORE if CEC_NOTIFIER
>         select VMAP_PFN
>         select DRM_TTM
> +       select GPU_BUDDY
>         select DRM_BUDDY
>         select AUXILIARY_BUS
>         help
> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c
> index 4d830740946d..6a34dae13769 100644
> --- a/drivers/gpu/drm/i915/i915_scatterlist.c
> +++ b/drivers/gpu/drm/i915/i915_scatterlist.c
> @@ -7,7 +7,7 @@
>  #include "i915_scatterlist.h"
>  #include "i915_ttm_buddy_manager.h"
>
> -#include <drm/drm_buddy.h>
> +#include <linux/gpu_buddy.h>
>  #include <drm/drm_mm.h>
>
>  #include <linux/slab.h>
> @@ -167,9 +167,9 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
>         struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
>         const u64 size = res->size;
>         const u32 max_segment = round_down(UINT_MAX, page_alignment);
> -       struct drm_buddy *mm = bman_res->mm;
> +       struct gpu_buddy *mm = bman_res->mm;
>         struct list_head *blocks = &bman_res->blocks;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         struct i915_refct_sgt *rsgt;
>         struct scatterlist *sg;
>         struct sg_table *st;
> @@ -202,8 +202,8 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
>         list_for_each_entry(block, blocks, link) {
>                 u64 block_size, offset;
>
> -               block_size = min_t(u64, size, drm_buddy_block_size(mm, block));
> -               offset = drm_buddy_block_offset(block);
> +               block_size = min_t(u64, size, gpu_buddy_block_size(mm, block));
> +               offset = gpu_buddy_block_offset(block);
>
>                 while (block_size) {
>                         u64 len;
> diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
> index d5c6e6605086..f43d7f2771ad 100644
> --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
> +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
> @@ -4,6 +4,7 @@
>   */
>
>  #include <linux/slab.h>
> +#include <linux/gpu_buddy.h>
>
>  #include <drm/drm_buddy.h>
>  #include <drm/drm_print.h>
> @@ -16,7 +17,7 @@
>
>  struct i915_ttm_buddy_manager {
>         struct ttm_resource_manager manager;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         struct list_head reserved;
>         struct mutex lock;
>         unsigned long visible_size;
> @@ -38,7 +39,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
>  {
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
>         struct i915_ttm_buddy_resource *bman_res;
> -       struct drm_buddy *mm = &bman->mm;
> +       struct gpu_buddy *mm = &bman->mm;
>         unsigned long n_pages, lpfn;
>         u64 min_page_size;
>         u64 size;
> @@ -57,13 +58,13 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
>         bman_res->mm = mm;
>
>         if (place->flags & TTM_PL_FLAG_TOPDOWN)
> -               bman_res->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
> +               bman_res->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
>
>         if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
> -               bman_res->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
> +               bman_res->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION;
>
>         if (place->fpfn || lpfn != man->size)
> -               bman_res->flags |= DRM_BUDDY_RANGE_ALLOCATION;
> +               bman_res->flags |= GPU_BUDDY_RANGE_ALLOCATION;
>
>         GEM_BUG_ON(!bman_res->base.size);
>         size = bman_res->base.size;
> @@ -89,7 +90,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
>                 goto err_free_res;
>         }
>
> -       err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
> +       err = gpu_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>                                      (u64)lpfn << PAGE_SHIFT,
>                                      (u64)n_pages << PAGE_SHIFT,
>                                      min_page_size,
> @@ -101,15 +102,15 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
>         if (lpfn <= bman->visible_size) {
>                 bman_res->used_visible_size = PFN_UP(bman_res->base.size);
>         } else {
> -               struct drm_buddy_block *block;
> +               struct gpu_buddy_block *block;
>
>                 list_for_each_entry(block, &bman_res->blocks, link) {
>                         unsigned long start =
> -                               drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +                               gpu_buddy_block_offset(block) >> PAGE_SHIFT;
>
>                         if (start < bman->visible_size) {
>                                 unsigned long end = start +
> -                                       (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +                                       (gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
>
>                                 bman_res->used_visible_size +=
>                                         min(end, bman->visible_size) - start;
> @@ -126,7 +127,7 @@ static int i915_ttm_buddy_man_alloc(struct ttm_resource_manager *man,
>         return 0;
>
>  err_free_blocks:
> -       drm_buddy_free_list(mm, &bman_res->blocks, 0);
> +       gpu_buddy_free_list(mm, &bman_res->blocks, 0);
>         mutex_unlock(&bman->lock);
>  err_free_res:
>         ttm_resource_fini(man, &bman_res->base);
> @@ -141,7 +142,7 @@ static void i915_ttm_buddy_man_free(struct ttm_resource_manager *man,
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
>
>         mutex_lock(&bman->lock);
> -       drm_buddy_free_list(&bman->mm, &bman_res->blocks, 0);
> +       gpu_buddy_free_list(&bman->mm, &bman_res->blocks, 0);
>         bman->visible_avail += bman_res->used_visible_size;
>         mutex_unlock(&bman->lock);
>
> @@ -156,8 +157,8 @@ static bool i915_ttm_buddy_man_intersects(struct ttm_resource_manager *man,
>  {
>         struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
> -       struct drm_buddy *mm = &bman->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = &bman->mm;
> +       struct gpu_buddy_block *block;
>
>         if (!place->fpfn && !place->lpfn)
>                 return true;
> @@ -176,9 +177,9 @@ static bool i915_ttm_buddy_man_intersects(struct ttm_resource_manager *man,
>         /* Check each drm buddy block individually */
>         list_for_each_entry(block, &bman_res->blocks, link) {
>                 unsigned long fpfn =
> -                       drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +                       gpu_buddy_block_offset(block) >> PAGE_SHIFT;
>                 unsigned long lpfn = fpfn +
> -                       (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +                       (gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
>
>                 if (place->fpfn < lpfn && place->lpfn > fpfn)
>                         return true;
> @@ -194,8 +195,8 @@ static bool i915_ttm_buddy_man_compatible(struct ttm_resource_manager *man,
>  {
>         struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
> -       struct drm_buddy *mm = &bman->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = &bman->mm;
> +       struct gpu_buddy_block *block;
>
>         if (!place->fpfn && !place->lpfn)
>                 return true;
> @@ -209,9 +210,9 @@ static bool i915_ttm_buddy_man_compatible(struct ttm_resource_manager *man,
>         /* Check each drm buddy block individually */
>         list_for_each_entry(block, &bman_res->blocks, link) {
>                 unsigned long fpfn =
> -                       drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +                       gpu_buddy_block_offset(block) >> PAGE_SHIFT;
>                 unsigned long lpfn = fpfn +
> -                       (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +                       (gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
>
>                 if (fpfn < place->fpfn || lpfn > place->lpfn)
>                         return false;
> @@ -224,7 +225,7 @@ static void i915_ttm_buddy_man_debug(struct ttm_resource_manager *man,
>                                      struct drm_printer *printer)
>  {
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>
>         mutex_lock(&bman->lock);
>         drm_printf(printer, "default_page_size: %lluKiB\n",
> @@ -293,7 +294,7 @@ int i915_ttm_buddy_man_init(struct ttm_device *bdev,
>         if (!bman)
>                 return -ENOMEM;
>
> -       err = drm_buddy_init(&bman->mm, size, chunk_size);
> +       err = gpu_buddy_init(&bman->mm, size, chunk_size);
>         if (err)
>                 goto err_free_bman;
>
> @@ -333,7 +334,7 @@ int i915_ttm_buddy_man_fini(struct ttm_device *bdev, unsigned int type)
>  {
>         struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
> -       struct drm_buddy *mm = &bman->mm;
> +       struct gpu_buddy *mm = &bman->mm;
>         int ret;
>
>         ttm_resource_manager_set_used(man, false);
> @@ -345,8 +346,8 @@ int i915_ttm_buddy_man_fini(struct ttm_device *bdev, unsigned int type)
>         ttm_set_driver_manager(bdev, type, NULL);
>
>         mutex_lock(&bman->lock);
> -       drm_buddy_free_list(mm, &bman->reserved, 0);
> -       drm_buddy_fini(mm);
> +       gpu_buddy_free_list(mm, &bman->reserved, 0);
> +       gpu_buddy_fini(mm);
>         bman->visible_avail += bman->visible_reserved;
>         WARN_ON_ONCE(bman->visible_avail != bman->visible_size);
>         mutex_unlock(&bman->lock);
> @@ -371,15 +372,15 @@ int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man,
>                                u64 start, u64 size)
>  {
>         struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
> -       struct drm_buddy *mm = &bman->mm;
> +       struct gpu_buddy *mm = &bman->mm;
>         unsigned long fpfn = start >> PAGE_SHIFT;
>         unsigned long flags = 0;
>         int ret;
>
> -       flags |= DRM_BUDDY_RANGE_ALLOCATION;
> +       flags |= GPU_BUDDY_RANGE_ALLOCATION;
>
>         mutex_lock(&bman->lock);
> -       ret = drm_buddy_alloc_blocks(mm, start,
> +       ret = gpu_buddy_alloc_blocks(mm, start,
>                                      start + size,
>                                      size, mm->chunk_size,
>                                      &bman->reserved,
> diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
> index d64620712830..4a92dcf09766 100644
> --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
> +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
> @@ -13,14 +13,14 @@
>
>  struct ttm_device;
>  struct ttm_resource_manager;
> -struct drm_buddy;
> +struct gpu_buddy;
>
>  /**
>   * struct i915_ttm_buddy_resource
>   *
>   * @base: struct ttm_resource base class we extend
>   * @blocks: the list of struct i915_buddy_block for this resource/allocation
> - * @flags: DRM_BUDDY_*_ALLOCATION flags
> + * @flags: GPU_BUDDY_*_ALLOCATION flags
>   * @used_visible_size: How much of this resource, if any, uses the CPU visible
>   * portion, in pages.
>   * @mm: the struct i915_buddy_mm for this resource
> @@ -33,7 +33,7 @@ struct i915_ttm_buddy_resource {
>         struct list_head blocks;
>         unsigned long flags;
>         unsigned long used_visible_size;
> -       struct drm_buddy *mm;
> +       struct gpu_buddy *mm;
>  };
>
>  /**
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index 7b856b5090f9..8307390943a2 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -6,7 +6,7 @@
>  #include <linux/prime_numbers.h>
>  #include <linux/sort.h>
>
> -#include <drm/drm_buddy.h>
> +#include <linux/gpu_buddy.h>
>
>  #include "../i915_selftest.h"
>
> @@ -371,7 +371,7 @@ static int igt_mock_splintered_region(void *arg)
>         struct drm_i915_private *i915 = mem->i915;
>         struct i915_ttm_buddy_resource *res;
>         struct drm_i915_gem_object *obj;
> -       struct drm_buddy *mm;
> +       struct gpu_buddy *mm;
>         unsigned int expected_order;
>         LIST_HEAD(objects);
>         u64 size;
> @@ -447,8 +447,8 @@ static int igt_mock_max_segment(void *arg)
>         struct drm_i915_private *i915 = mem->i915;
>         struct i915_ttm_buddy_resource *res;
>         struct drm_i915_gem_object *obj;
> -       struct drm_buddy_block *block;
> -       struct drm_buddy *mm;
> +       struct gpu_buddy_block *block;
> +       struct gpu_buddy *mm;
>         struct list_head *blocks;
>         struct scatterlist *sg;
>         I915_RND_STATE(prng);
> @@ -487,8 +487,8 @@ static int igt_mock_max_segment(void *arg)
>         mm = res->mm;
>         size = 0;
>         list_for_each_entry(block, blocks, link) {
> -               if (drm_buddy_block_size(mm, block) > size)
> -                       size = drm_buddy_block_size(mm, block);
> +               if (gpu_buddy_block_size(mm, block) > size)
> +                       size = gpu_buddy_block_size(mm, block);
>         }
>         if (size < max_segment) {
>                 pr_err("%s: Failed to create a huge contiguous block [> %u], largest block %lld\n",
> @@ -527,14 +527,14 @@ static u64 igt_object_mappable_total(struct drm_i915_gem_object *obj)
>         struct intel_memory_region *mr = obj->mm.region;
>         struct i915_ttm_buddy_resource *bman_res =
>                 to_ttm_buddy_resource(obj->mm.res);
> -       struct drm_buddy *mm = bman_res->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = bman_res->mm;
> +       struct gpu_buddy_block *block;
>         u64 total;
>
>         total = 0;
>         list_for_each_entry(block, &bman_res->blocks, link) {
> -               u64 start = drm_buddy_block_offset(block);
> -               u64 end = start + drm_buddy_block_size(mm, block);
> +               u64 start = gpu_buddy_block_offset(block);
> +               u64 end = start + gpu_buddy_block_size(mm, block);
>
>                 if (start < resource_size(&mr->io))
>                         total += min_t(u64, end, resource_size(&mr->io)) - start;
> diff --git a/drivers/gpu/drm/tests/Makefile b/drivers/gpu/drm/tests/Makefile
> index 87d5d5f9332a..d2e2e3d8349a 100644
> --- a/drivers/gpu/drm/tests/Makefile
> +++ b/drivers/gpu/drm/tests/Makefile
> @@ -7,7 +7,6 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
>         drm_atomic_test.o \
>         drm_atomic_state_test.o \
>         drm_bridge_test.o \
> -       drm_buddy_test.o \
>         drm_cmdline_parser_test.o \
>         drm_connector_test.o \
>         drm_damage_helper_test.o \
> diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
> index 2eda87882e65..ffa12473077c 100644
> --- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
> +++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
> @@ -3,6 +3,7 @@
>   * Copyright © 2023 Intel Corporation
>   */
>  #include <linux/delay.h>
> +#include <linux/gpu_buddy.h>
>  #include <linux/kthread.h>
>
>  #include <drm/ttm/ttm_resource.h>
> @@ -251,7 +252,7 @@ static void ttm_bo_validate_basic(struct kunit *test)
>                                    NULL, &dummy_ttm_bo_destroy);
>         KUNIT_EXPECT_EQ(test, err, 0);
>
> -       snd_place = ttm_place_kunit_init(test, snd_mem, DRM_BUDDY_TOPDOWN_ALLOCATION);
> +       snd_place = ttm_place_kunit_init(test, snd_mem, GPU_BUDDY_TOPDOWN_ALLOCATION);
>         snd_placement = ttm_placement_kunit_init(test, snd_place, 1);
>
>         err = ttm_bo_validate(bo, snd_placement, &ctx_val);
> @@ -263,7 +264,7 @@ static void ttm_bo_validate_basic(struct kunit *test)
>         KUNIT_EXPECT_TRUE(test, ttm_tt_is_populated(bo->ttm));
>         KUNIT_EXPECT_EQ(test, bo->resource->mem_type, snd_mem);
>         KUNIT_EXPECT_EQ(test, bo->resource->placement,
> -                       DRM_BUDDY_TOPDOWN_ALLOCATION);
> +                       GPU_BUDDY_TOPDOWN_ALLOCATION);
>
>         ttm_bo_fini(bo);
>         ttm_mock_manager_fini(priv->ttm_dev, snd_mem);
> diff --git a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c
> index dd395229e388..294d56d9067e 100644
> --- a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c
> +++ b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.c
> @@ -31,7 +31,7 @@ static int ttm_mock_manager_alloc(struct ttm_resource_manager *man,
>  {
>         struct ttm_mock_manager *manager = to_mock_mgr(man);
>         struct ttm_mock_resource *mock_res;
> -       struct drm_buddy *mm = &manager->mm;
> +       struct gpu_buddy *mm = &manager->mm;
>         u64 lpfn, fpfn, alloc_size;
>         int err;
>
> @@ -47,14 +47,14 @@ static int ttm_mock_manager_alloc(struct ttm_resource_manager *man,
>         INIT_LIST_HEAD(&mock_res->blocks);
>
>         if (place->flags & TTM_PL_FLAG_TOPDOWN)
> -               mock_res->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
> +               mock_res->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
>
>         if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
> -               mock_res->flags |= DRM_BUDDY_CONTIGUOUS_ALLOCATION;
> +               mock_res->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION;
>
>         alloc_size = (uint64_t)mock_res->base.size;
>         mutex_lock(&manager->lock);
> -       err = drm_buddy_alloc_blocks(mm, fpfn, lpfn, alloc_size,
> +       err = gpu_buddy_alloc_blocks(mm, fpfn, lpfn, alloc_size,
>                                      manager->default_page_size,
>                                      &mock_res->blocks,
>                                      mock_res->flags);
> @@ -67,7 +67,7 @@ static int ttm_mock_manager_alloc(struct ttm_resource_manager *man,
>         return 0;
>
>  error_free_blocks:
> -       drm_buddy_free_list(mm, &mock_res->blocks, 0);
> +       gpu_buddy_free_list(mm, &mock_res->blocks, 0);
>         ttm_resource_fini(man, &mock_res->base);
>         mutex_unlock(&manager->lock);
>
> @@ -79,10 +79,10 @@ static void ttm_mock_manager_free(struct ttm_resource_manager *man,
>  {
>         struct ttm_mock_manager *manager = to_mock_mgr(man);
>         struct ttm_mock_resource *mock_res = to_mock_mgr_resource(res);
> -       struct drm_buddy *mm = &manager->mm;
> +       struct gpu_buddy *mm = &manager->mm;
>
>         mutex_lock(&manager->lock);
> -       drm_buddy_free_list(mm, &mock_res->blocks, 0);
> +       gpu_buddy_free_list(mm, &mock_res->blocks, 0);
>         mutex_unlock(&manager->lock);
>
>         ttm_resource_fini(man, res);
> @@ -106,7 +106,7 @@ int ttm_mock_manager_init(struct ttm_device *bdev, u32 mem_type, u32 size)
>
>         mutex_init(&manager->lock);
>
> -       err = drm_buddy_init(&manager->mm, size, PAGE_SIZE);
> +       err = gpu_buddy_init(&manager->mm, size, PAGE_SIZE);
>
>         if (err) {
>                 kfree(manager);
> @@ -142,7 +142,7 @@ void ttm_mock_manager_fini(struct ttm_device *bdev, u32 mem_type)
>         ttm_resource_manager_set_used(man, false);
>
>         mutex_lock(&mock_man->lock);
> -       drm_buddy_fini(&mock_man->mm);
> +       gpu_buddy_fini(&mock_man->mm);
>         mutex_unlock(&mock_man->lock);
>
>         ttm_set_driver_manager(bdev, mem_type, NULL);
> diff --git a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h
> index e4c95f86a467..08710756fd8e 100644
> --- a/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h
> +++ b/drivers/gpu/drm/ttm/tests/ttm_mock_manager.h
> @@ -5,11 +5,11 @@
>  #ifndef TTM_MOCK_MANAGER_H
>  #define TTM_MOCK_MANAGER_H
>
> -#include <drm/drm_buddy.h>
> +#include <linux/gpu_buddy.h>
>
>  struct ttm_mock_manager {
>         struct ttm_resource_manager man;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         u64 default_page_size;
>         /* protects allocations of mock buffer objects */
>         struct mutex lock;
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index 4b288eb3f5b0..982ef754742e 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -11,6 +11,7 @@ config DRM_XE
>         # the shmem_readpage() which depends upon tmpfs
>         select SHMEM
>         select TMPFS
> +       select GPU_BUDDY
>         select DRM_BUDDY
>         select DRM_CLIENT_SELECTION
>         select DRM_KMS_HELPER
> diff --git a/drivers/gpu/drm/xe/xe_res_cursor.h b/drivers/gpu/drm/xe/xe_res_cursor.h
> index 4e00008b7081..5f4ab08c0686 100644
> --- a/drivers/gpu/drm/xe/xe_res_cursor.h
> +++ b/drivers/gpu/drm/xe/xe_res_cursor.h
> @@ -58,7 +58,7 @@ struct xe_res_cursor {
>         /** @dma_addr: Current element in a struct drm_pagemap_addr array */
>         const struct drm_pagemap_addr *dma_addr;
>         /** @mm: Buddy allocator for VRAM cursor */
> -       struct drm_buddy *mm;
> +       struct gpu_buddy *mm;
>         /**
>          * @dma_start: DMA start address for the current segment.
>          * This may be different to @dma_addr.addr since elements in
> @@ -69,7 +69,7 @@ struct xe_res_cursor {
>         u64 dma_seg_size;
>  };
>
> -static struct drm_buddy *xe_res_get_buddy(struct ttm_resource *res)
> +static struct gpu_buddy *xe_res_get_buddy(struct ttm_resource *res)
>  {
>         struct ttm_resource_manager *mgr;
>
> @@ -104,30 +104,30 @@ static inline void xe_res_first(struct ttm_resource *res,
>         case XE_PL_STOLEN:
>         case XE_PL_VRAM0:
>         case XE_PL_VRAM1: {
> -               struct drm_buddy_block *block;
> +               struct gpu_buddy_block *block;
>                 struct list_head *head, *next;
> -               struct drm_buddy *mm = xe_res_get_buddy(res);
> +               struct gpu_buddy *mm = xe_res_get_buddy(res);
>
>                 head = &to_xe_ttm_vram_mgr_resource(res)->blocks;
>
>                 block = list_first_entry_or_null(head,
> -                                                struct drm_buddy_block,
> +                                                struct gpu_buddy_block,
>                                                  link);
>                 if (!block)
>                         goto fallback;
>
> -               while (start >= drm_buddy_block_size(mm, block)) {
> -                       start -= drm_buddy_block_size(mm, block);
> +               while (start >= gpu_buddy_block_size(mm, block)) {
> +                       start -= gpu_buddy_block_size(mm, block);
>
>                         next = block->link.next;
>                         if (next != head)
> -                               block = list_entry(next, struct drm_buddy_block,
> +                               block = list_entry(next, struct gpu_buddy_block,
>                                                    link);
>                 }
>
>                 cur->mm = mm;
> -               cur->start = drm_buddy_block_offset(block) + start;
> -               cur->size = min(drm_buddy_block_size(mm, block) - start,
> +               cur->start = gpu_buddy_block_offset(block) + start;
> +               cur->size = min(gpu_buddy_block_size(mm, block) - start,
>                                 size);
>                 cur->remaining = size;
>                 cur->node = block;
> @@ -259,7 +259,7 @@ static inline void xe_res_first_dma(const struct drm_pagemap_addr *dma_addr,
>   */
>  static inline void xe_res_next(struct xe_res_cursor *cur, u64 size)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         struct list_head *next;
>         u64 start;
>
> @@ -295,18 +295,18 @@ static inline void xe_res_next(struct xe_res_cursor *cur, u64 size)
>                 block = cur->node;
>
>                 next = block->link.next;
> -               block = list_entry(next, struct drm_buddy_block, link);
> +               block = list_entry(next, struct gpu_buddy_block, link);
>
>
> -               while (start >= drm_buddy_block_size(cur->mm, block)) {
> -                       start -= drm_buddy_block_size(cur->mm, block);
> +               while (start >= gpu_buddy_block_size(cur->mm, block)) {
> +                       start -= gpu_buddy_block_size(cur->mm, block);
>
>                         next = block->link.next;
> -                       block = list_entry(next, struct drm_buddy_block, link);
> +                       block = list_entry(next, struct gpu_buddy_block, link);
>                 }
>
> -               cur->start = drm_buddy_block_offset(block) + start;
> -               cur->size = min(drm_buddy_block_size(cur->mm, block) - start,
> +               cur->start = gpu_buddy_block_offset(block) + start;
> +               cur->size = min(gpu_buddy_block_size(cur->mm, block) - start,
>                                 cur->remaining);
>                 cur->node = block;
>                 break;
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index f97e0af6a9b0..2b7e266f9bdd 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -688,7 +688,7 @@ static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
>         return PHYS_PFN(offset + vr->hpa_base);
>  }
>
> -static struct drm_buddy *vram_to_buddy(struct xe_vram_region *vram)
> +static struct gpu_buddy *vram_to_buddy(struct xe_vram_region *vram)
>  {
>         return &vram->ttm.mm;
>  }
> @@ -699,16 +699,16 @@ static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocati
>         struct xe_bo *bo = to_xe_bo(devmem_allocation);
>         struct ttm_resource *res = bo->ttm.resource;
>         struct list_head *blocks = &to_xe_ttm_vram_mgr_resource(res)->blocks;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         int j = 0;
>
>         list_for_each_entry(block, blocks, link) {
>                 struct xe_vram_region *vr = block->private;
> -               struct drm_buddy *buddy = vram_to_buddy(vr);
> -               u64 block_pfn = block_offset_to_pfn(vr, drm_buddy_block_offset(block));
> +               struct gpu_buddy *buddy = vram_to_buddy(vr);
> +               u64 block_pfn = block_offset_to_pfn(vr, gpu_buddy_block_offset(block));
>                 int i;
>
> -               for (i = 0; i < drm_buddy_block_size(buddy, block) >> PAGE_SHIFT; ++i)
> +               for (i = 0; i < gpu_buddy_block_size(buddy, block) >> PAGE_SHIFT; ++i)
>                         pfn[j++] = block_pfn + i;
>         }
>
> @@ -876,7 +876,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>         struct dma_fence *pre_migrate_fence = NULL;
>         struct xe_device *xe = vr->xe;
>         struct device *dev = xe->drm.dev;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         struct xe_validation_ctx vctx;
>         struct list_head *blocks;
>         struct drm_exec exec;
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index 9f70802fce92..8192957261e8 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -4,8 +4,9 @@
>   * Copyright (C) 2021-2022 Red Hat
>   */
>
> -#include <drm/drm_managed.h>
> +#include <drm/drm_buddy.h>
>  #include <drm/drm_drv.h>
> +#include <drm/drm_managed.h>
>
>  #include <drm/ttm/ttm_placement.h>
>  #include <drm/ttm/ttm_range_manager.h>
> @@ -17,16 +18,16 @@
>  #include "xe_ttm_vram_mgr.h"
>  #include "xe_vram_types.h"
>
> -static inline struct drm_buddy_block *
> +static inline struct gpu_buddy_block *
>  xe_ttm_vram_mgr_first_block(struct list_head *list)
>  {
> -       return list_first_entry_or_null(list, struct drm_buddy_block, link);
> +       return list_first_entry_or_null(list, struct gpu_buddy_block, link);
>  }
>
> -static inline bool xe_is_vram_mgr_blocks_contiguous(struct drm_buddy *mm,
> +static inline bool xe_is_vram_mgr_blocks_contiguous(struct gpu_buddy *mm,
>                                                     struct list_head *head)
>  {
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         u64 start, size;
>
>         block = xe_ttm_vram_mgr_first_block(head);
> @@ -34,12 +35,12 @@ static inline bool xe_is_vram_mgr_blocks_contiguous(struct drm_buddy *mm,
>                 return false;
>
>         while (head != block->link.next) {
> -               start = drm_buddy_block_offset(block);
> -               size = drm_buddy_block_size(mm, block);
> +               start = gpu_buddy_block_offset(block);
> +               size = gpu_buddy_block_size(mm, block);
>
> -               block = list_entry(block->link.next, struct drm_buddy_block,
> +               block = list_entry(block->link.next, struct gpu_buddy_block,
>                                    link);
> -               if (start + size != drm_buddy_block_offset(block))
> +               if (start + size != gpu_buddy_block_offset(block))
>                         return false;
>         }
>
> @@ -53,7 +54,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>  {
>         struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>         struct xe_ttm_vram_mgr_resource *vres;
> -       struct drm_buddy *mm = &mgr->mm;
> +       struct gpu_buddy *mm = &mgr->mm;
>         u64 size, min_page_size;
>         unsigned long lpfn;
>         int err;
> @@ -80,10 +81,10 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>         INIT_LIST_HEAD(&vres->blocks);
>
>         if (place->flags & TTM_PL_FLAG_TOPDOWN)
> -               vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
> +               vres->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION;
>
>         if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
> -               vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
> +               vres->flags |= GPU_BUDDY_RANGE_ALLOCATION;
>
>         if (WARN_ON(!vres->base.size)) {
>                 err = -EINVAL;
> @@ -119,27 +120,27 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>                 lpfn = max_t(unsigned long, place->fpfn + (size >> PAGE_SHIFT), lpfn);
>         }
>
> -       err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
> +       err = gpu_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>                                      (u64)lpfn << PAGE_SHIFT, size,
>                                      min_page_size, &vres->blocks, vres->flags);
>         if (err)
>                 goto error_unlock;
>
>         if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> -               if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks))
> +               if (!gpu_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks))
>                         size = vres->base.size;
>         }
>
>         if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
>                 vres->used_visible_size = size;
>         } else {
> -               struct drm_buddy_block *block;
> +               struct gpu_buddy_block *block;
>
>                 list_for_each_entry(block, &vres->blocks, link) {
> -                       u64 start = drm_buddy_block_offset(block);
> +                       u64 start = gpu_buddy_block_offset(block);
>
>                         if (start < mgr->visible_size) {
> -                               u64 end = start + drm_buddy_block_size(mm, block);
> +                               u64 end = start + gpu_buddy_block_size(mm, block);
>
>                                 vres->used_visible_size +=
>                                         min(end, mgr->visible_size) - start;
> @@ -159,11 +160,11 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>          * the object.
>          */
>         if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) {
> -               struct drm_buddy_block *block = list_first_entry(&vres->blocks,
> +               struct gpu_buddy_block *block = list_first_entry(&vres->blocks,
>                                                                  typeof(*block),
>                                                                  link);
>
> -               vres->base.start = drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +               vres->base.start = gpu_buddy_block_offset(block) >> PAGE_SHIFT;
>         } else {
>                 vres->base.start = XE_BO_INVALID_OFFSET;
>         }
> @@ -185,10 +186,10 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>         struct xe_ttm_vram_mgr_resource *vres =
>                 to_xe_ttm_vram_mgr_resource(res);
>         struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
> -       struct drm_buddy *mm = &mgr->mm;
> +       struct gpu_buddy *mm = &mgr->mm;
>
>         mutex_lock(&mgr->lock);
> -       drm_buddy_free_list(mm, &vres->blocks, 0);
> +       gpu_buddy_free_list(mm, &vres->blocks, 0);
>         mgr->visible_avail += vres->used_visible_size;
>         mutex_unlock(&mgr->lock);
>
> @@ -201,7 +202,7 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
>                                   struct drm_printer *printer)
>  {
>         struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
> -       struct drm_buddy *mm = &mgr->mm;
> +       struct gpu_buddy *mm = &mgr->mm;
>
>         mutex_lock(&mgr->lock);
>         drm_printf(printer, "default_page_size: %lluKiB\n",
> @@ -224,8 +225,8 @@ static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
>         struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>         struct xe_ttm_vram_mgr_resource *vres =
>                 to_xe_ttm_vram_mgr_resource(res);
> -       struct drm_buddy *mm = &mgr->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = &mgr->mm;
> +       struct gpu_buddy_block *block;
>
>         if (!place->fpfn && !place->lpfn)
>                 return true;
> @@ -235,9 +236,9 @@ static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
>
>         list_for_each_entry(block, &vres->blocks, link) {
>                 unsigned long fpfn =
> -                       drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +                       gpu_buddy_block_offset(block) >> PAGE_SHIFT;
>                 unsigned long lpfn = fpfn +
> -                       (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +                       (gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
>
>                 if (place->fpfn < lpfn && place->lpfn > fpfn)
>                         return true;
> @@ -254,8 +255,8 @@ static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
>         struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>         struct xe_ttm_vram_mgr_resource *vres =
>                 to_xe_ttm_vram_mgr_resource(res);
> -       struct drm_buddy *mm = &mgr->mm;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy *mm = &mgr->mm;
> +       struct gpu_buddy_block *block;
>
>         if (!place->fpfn && !place->lpfn)
>                 return true;
> @@ -265,9 +266,9 @@ static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
>
>         list_for_each_entry(block, &vres->blocks, link) {
>                 unsigned long fpfn =
> -                       drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +                       gpu_buddy_block_offset(block) >> PAGE_SHIFT;
>                 unsigned long lpfn = fpfn +
> -                       (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +                       (gpu_buddy_block_size(mm, block) >> PAGE_SHIFT);
>
>                 if (fpfn < place->fpfn || lpfn > place->lpfn)
>                         return false;
> @@ -297,7 +298,7 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
>
>         WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>
> -       drm_buddy_fini(&mgr->mm);
> +       gpu_buddy_fini(&mgr->mm);
>
>         ttm_resource_manager_cleanup(&mgr->manager);
>
> @@ -328,7 +329,7 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
>         mgr->visible_avail = io_size;
>
>         ttm_resource_manager_init(man, &xe->ttm, size);
> -       err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
> +       err = gpu_buddy_init(&mgr->mm, man->size, default_page_size);
>         if (err)
>                 return err;
>
> @@ -376,7 +377,7 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>         if (!*sgt)
>                 return -ENOMEM;
>
> -       /* Determine the number of DRM_BUDDY blocks to export */
> +       /* Determine the number of GPU_BUDDY blocks to export */
>         xe_res_first(res, offset, length, &cursor);
>         while (cursor.remaining) {
>                 num_entries++;
> @@ -393,10 +394,10 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>                 sg->length = 0;
>
>         /*
> -        * Walk down DRM_BUDDY blocks to populate scatterlist nodes
> -        * @note: Use iterator api to get first the DRM_BUDDY block
> +        * Walk down GPU_BUDDY blocks to populate scatterlist nodes
> +        * @note: Use the iterator API to first get the GPU_BUDDY block
>          * and the number of bytes from it. Access the following
> -        * DRM_BUDDY block(s) if more buffer needs to exported
> +        * GPU_BUDDY block(s) if more buffer needs to be exported
>          */
>         xe_res_first(res, offset, length, &cursor);
>         for_each_sgtable_sg((*sgt), sg, i) {
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> index a71e14818ec2..9106da056b49 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> @@ -6,7 +6,7 @@
>  #ifndef _XE_TTM_VRAM_MGR_TYPES_H_
>  #define _XE_TTM_VRAM_MGR_TYPES_H_
>
> -#include <drm/drm_buddy.h>
> +#include <linux/gpu_buddy.h>
>  #include <drm/ttm/ttm_device.h>
>
>  /**
> @@ -18,7 +18,7 @@ struct xe_ttm_vram_mgr {
>         /** @manager: Base TTM resource manager */
>         struct ttm_resource_manager manager;
> -       /** @mm: DRM buddy allocator which manages the VRAM */
> -       struct drm_buddy mm;
> +       /** @mm: GPU buddy allocator which manages the VRAM */
> +       struct gpu_buddy mm;
>         /** @visible_size: Proped size of the CPU visible portion */
>         u64 visible_size;
>         /** @visible_avail: CPU visible portion still unallocated */
> diff --git a/drivers/gpu/tests/Makefile b/drivers/gpu/tests/Makefile
> new file mode 100644
> index 000000000000..31a5ff44cb4e
> --- /dev/null
> +++ b/drivers/gpu/tests/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_GPU_BUDDY_KUNIT_TEST) += gpu_buddy_test.o gpu_random.o
> diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/tests/gpu_buddy_test.c
> similarity index 68%
> rename from drivers/gpu/drm/tests/drm_buddy_test.c
> rename to drivers/gpu/tests/gpu_buddy_test.c
> index 5f40b5343bd8..dcd4741a905d 100644
> --- a/drivers/gpu/drm/tests/drm_buddy_test.c
> +++ b/drivers/gpu/tests/gpu_buddy_test.c
> @@ -10,9 +10,9 @@
>  #include <linux/sched/signal.h>
>  #include <linux/sizes.h>
>
> -#include <drm/drm_buddy.h>
> +#include <linux/gpu_buddy.h>
>
> -#include "../lib/drm_random.h"
> +#include "gpu_random.h"
>
>  static unsigned int random_seed;
>
> @@ -21,9 +21,9 @@ static inline u64 get_size(int order, u64 chunk_size)
>         return (1 << order) * chunk_size;
>  }
>
> -static void drm_test_buddy_fragmentation_performance(struct kunit *test)
> +static void gpu_test_buddy_fragmentation_performance(struct kunit *test)
>  {
> -       struct drm_buddy_block *block, *tmp;
> +       struct gpu_buddy_block *block, *tmp;
>         int num_blocks, i, ret, count = 0;
>         LIST_HEAD(allocated_blocks);
>         unsigned long elapsed_ms;
> @@ -32,7 +32,7 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
>         LIST_HEAD(clear_list);
>         LIST_HEAD(dirty_list);
>         LIST_HEAD(free_list);
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         u64 mm_size = SZ_4G;
>         ktime_t start, end;
>
> @@ -47,7 +47,7 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
>          * quickly the allocator can satisfy larger, aligned requests from a pool of
>          * highly fragmented space.
>          */
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
>                                "buddy_init failed\n");
>
>         num_blocks = mm_size / SZ_64K;
> @@ -55,7 +55,7 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
>         start = ktime_get();
>         /* Allocate with maximum fragmentation - 8K blocks with 64K alignment */
>         for (i = 0; i < num_blocks; i++)
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
>                                                                     &allocated_blocks, 0),
>                                         "buddy_alloc hit an error size=%u\n", SZ_8K);
>
> @@ -68,21 +68,21 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
>         }
>
>         /* Free with different flags to ensure no coalescing */
> -       drm_buddy_free_list(&mm, &clear_list, DRM_BUDDY_CLEARED);
> -       drm_buddy_free_list(&mm, &dirty_list, 0);
> +       gpu_buddy_free_list(&mm, &clear_list, GPU_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &dirty_list, 0);
>
>         for (i = 0; i < num_blocks; i++)
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_64K, SZ_64K,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size, SZ_64K, SZ_64K,
>                                                                     &test_blocks, 0),
>                                         "buddy_alloc hit an error size=%u\n", SZ_64K);
> -       drm_buddy_free_list(&mm, &test_blocks, 0);
> +       gpu_buddy_free_list(&mm, &test_blocks, 0);
>
>         end = ktime_get();
>         elapsed_ms = ktime_to_ms(ktime_sub(end, start));
>
>         kunit_info(test, "Fragmented allocation took %lu ms\n", elapsed_ms);
>
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_fini(&mm);
>
>         /*
>          * Reverse free order under fragmentation
> @@ -96,13 +96,13 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
>          * deallocation occurs in the opposite order of allocation, exposing the
>          * cost difference between a linear freelist scan and an ordered tree lookup.
>          */
> -       ret = drm_buddy_init(&mm, mm_size, SZ_4K);
> +       ret = gpu_buddy_init(&mm, mm_size, SZ_4K);
>         KUNIT_ASSERT_EQ(test, ret, 0);
>
>         start = ktime_get();
>         /* Allocate maximum fragmentation */
>         for (i = 0; i < num_blocks; i++)
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size, SZ_8K, SZ_64K,
>                                                                     &allocated_blocks, 0),
>                                         "buddy_alloc hit an error size=%u\n", SZ_8K);
>
> @@ -111,28 +111,28 @@ static void drm_test_buddy_fragmentation_performance(struct kunit *test)
>                         list_move_tail(&block->link, &free_list);
>                 count++;
>         }
> -       drm_buddy_free_list(&mm, &free_list, DRM_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &free_list, GPU_BUDDY_CLEARED);
>
>         list_for_each_entry_safe_reverse(block, tmp, &allocated_blocks, link)
>                 list_move(&block->link, &reverse_list);
> -       drm_buddy_free_list(&mm, &reverse_list, DRM_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &reverse_list, GPU_BUDDY_CLEARED);
>
>         end = ktime_get();
>         elapsed_ms = ktime_to_ms(ktime_sub(end, start));
>
>         kunit_info(test, "Reverse-ordered free took %lu ms\n", elapsed_ms);
>
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_range_bias(struct kunit *test)
> +static void gpu_test_buddy_alloc_range_bias(struct kunit *test)
>  {
>         u32 mm_size, size, ps, bias_size, bias_start, bias_end, bias_rem;
> -       DRM_RND_STATE(prng, random_seed);
> +       GPU_RND_STATE(prng, random_seed);
>         unsigned int i, count, *order;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         unsigned long flags;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         LIST_HEAD(allocated);
>
>         bias_size = SZ_1M;
> @@ -142,11 +142,11 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>
>         kunit_info(test, "mm_size=%u, ps=%u\n", mm_size, ps);
>
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, ps),
>                                "buddy_init failed\n");
>
>         count = mm_size / bias_size;
> -       order = drm_random_order(count, &prng);
> +       order = gpu_random_order(count, &prng);
>         KUNIT_EXPECT_TRUE(test, order);
>
>         /*
> @@ -166,79 +166,79 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>
>                 /* internal round_up too big */
>                 KUNIT_ASSERT_TRUE_MSG(test,
> -                                     drm_buddy_alloc_blocks(&mm, bias_start,
> +                                     gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                              bias_end, bias_size + ps, bias_size,
>                                                              &allocated,
> -                                                            DRM_BUDDY_RANGE_ALLOCATION),
> +                                                            GPU_BUDDY_RANGE_ALLOCATION),
>                                       "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
>                                       bias_start, bias_end, bias_size, bias_size);
>
>                 /* size too big */
>                 KUNIT_ASSERT_TRUE_MSG(test,
> -                                     drm_buddy_alloc_blocks(&mm, bias_start,
> +                                     gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                              bias_end, bias_size + ps, ps,
>                                                              &allocated,
> -                                                            DRM_BUDDY_RANGE_ALLOCATION),
> +                                                            GPU_BUDDY_RANGE_ALLOCATION),
>                                       "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
>                                       bias_start, bias_end, bias_size + ps, ps);
>
>                 /* bias range too small for size */
>                 KUNIT_ASSERT_TRUE_MSG(test,
> -                                     drm_buddy_alloc_blocks(&mm, bias_start + ps,
> +                                     gpu_buddy_alloc_blocks(&mm, bias_start + ps,
>                                                              bias_end, bias_size, ps,
>                                                              &allocated,
> -                                                            DRM_BUDDY_RANGE_ALLOCATION),
> +                                                            GPU_BUDDY_RANGE_ALLOCATION),
>                                       "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
>                                       bias_start + ps, bias_end, bias_size, ps);
>
>                 /* bias misaligned */
>                 KUNIT_ASSERT_TRUE_MSG(test,
> -                                     drm_buddy_alloc_blocks(&mm, bias_start + ps,
> +                                     gpu_buddy_alloc_blocks(&mm, bias_start + ps,
>                                                              bias_end - ps,
>                                                              bias_size >> 1, bias_size >> 1,
>                                                              &allocated,
> -                                                            DRM_BUDDY_RANGE_ALLOCATION),
> +                                                            GPU_BUDDY_RANGE_ALLOCATION),
>                                       "buddy_alloc h didn't fail with bias(%x-%x), size=%u, ps=%u\n",
>                                       bias_start + ps, bias_end - ps, bias_size >> 1, bias_size >> 1);
>
>                 /* single big page */
>                 KUNIT_ASSERT_FALSE_MSG(test,
> -                                      drm_buddy_alloc_blocks(&mm, bias_start,
> +                                      gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                               bias_end, bias_size, bias_size,
>                                                               &tmp,
> -                                                             DRM_BUDDY_RANGE_ALLOCATION),
> +                                                             GPU_BUDDY_RANGE_ALLOCATION),
>                                        "buddy_alloc i failed with bias(%x-%x), size=%u, ps=%u\n",
>                                        bias_start, bias_end, bias_size, bias_size);
> -               drm_buddy_free_list(&mm, &tmp, 0);
> +               gpu_buddy_free_list(&mm, &tmp, 0);
>
>                 /* single page with internal round_up */
>                 KUNIT_ASSERT_FALSE_MSG(test,
> -                                      drm_buddy_alloc_blocks(&mm, bias_start,
> +                                      gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                               bias_end, ps, bias_size,
>                                                               &tmp,
> -                                                             DRM_BUDDY_RANGE_ALLOCATION),
> +                                                             GPU_BUDDY_RANGE_ALLOCATION),
>                                        "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
>                                        bias_start, bias_end, ps, bias_size);
> -               drm_buddy_free_list(&mm, &tmp, 0);
> +               gpu_buddy_free_list(&mm, &tmp, 0);
>
>                 /* random size within */
>                 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
>                 if (size)
>                         KUNIT_ASSERT_FALSE_MSG(test,
> -                                              drm_buddy_alloc_blocks(&mm, bias_start,
> +                                              gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                                       bias_end, size, ps,
>                                                                       &tmp,
> -                                                                     DRM_BUDDY_RANGE_ALLOCATION),
> +                                                                     GPU_BUDDY_RANGE_ALLOCATION),
>                                                "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
>                                                bias_start, bias_end, size, ps);
>
>                 bias_rem -= size;
>                 /* too big for current avail */
>                 KUNIT_ASSERT_TRUE_MSG(test,
> -                                     drm_buddy_alloc_blocks(&mm, bias_start,
> +                                     gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                              bias_end, bias_rem + ps, ps,
>                                                              &allocated,
> -                                                            DRM_BUDDY_RANGE_ALLOCATION),
> +                                                            GPU_BUDDY_RANGE_ALLOCATION),
>                                       "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
>                                       bias_start, bias_end, bias_rem + ps, ps);
>
> @@ -248,10 +248,10 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>                         size = max(size, ps);
>
>                         KUNIT_ASSERT_FALSE_MSG(test,
> -                                              drm_buddy_alloc_blocks(&mm, bias_start,
> +                                              gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                                       bias_end, size, ps,
>                                                                       &allocated,
> -                                                                     DRM_BUDDY_RANGE_ALLOCATION),
> +                                                                     GPU_BUDDY_RANGE_ALLOCATION),
>                                                "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
>                                                bias_start, bias_end, size, ps);
>                         /*
> @@ -259,15 +259,15 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>                          * unallocated, and ideally not always on the bias
>                          * boundaries.
>                          */
> -                       drm_buddy_free_list(&mm, &tmp, 0);
> +                       gpu_buddy_free_list(&mm, &tmp, 0);
>                 } else {
>                         list_splice_tail(&tmp, &allocated);
>                 }
>         }
>
>         kfree(order);
> -       drm_buddy_free_list(&mm, &allocated, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &allocated, 0);
> +       gpu_buddy_fini(&mm);
>
>         /*
>          * Something more free-form. Idea is to pick a random starting bias
> @@ -278,7 +278,7 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>          * allocated nodes in the middle of the address space.
>          */
>
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, ps),
>                                "buddy_init failed\n");
>
>         bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps);
> @@ -290,10 +290,10 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>                 u32 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
>
>                 KUNIT_ASSERT_FALSE_MSG(test,
> -                                      drm_buddy_alloc_blocks(&mm, bias_start,
> +                                      gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                               bias_end, size, ps,
>                                                               &allocated,
> -                                                             DRM_BUDDY_RANGE_ALLOCATION),
> +                                                             GPU_BUDDY_RANGE_ALLOCATION),
>                                        "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
>                                        bias_start, bias_end, size, ps);
>                 bias_rem -= size;
> @@ -319,24 +319,24 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>         KUNIT_ASSERT_EQ(test, bias_start, 0);
>         KUNIT_ASSERT_EQ(test, bias_end, mm_size);
>         KUNIT_ASSERT_TRUE_MSG(test,
> -                             drm_buddy_alloc_blocks(&mm, bias_start, bias_end,
> +                             gpu_buddy_alloc_blocks(&mm, bias_start, bias_end,
>                                                      ps, ps,
>                                                      &allocated,
> -                                                    DRM_BUDDY_RANGE_ALLOCATION),
> +                                                    GPU_BUDDY_RANGE_ALLOCATION),
>                               "buddy_alloc passed with bias(%x-%x), size=%u\n",
>                               bias_start, bias_end, ps);
>
> -       drm_buddy_free_list(&mm, &allocated, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &allocated, 0);
> +       gpu_buddy_fini(&mm);
>
>         /*
> -        * Allocate cleared blocks in the bias range when the DRM buddy's clear avail is
> +        * Allocate cleared blocks in the bias range when the GPU buddy's clear avail is
>          * zero. This will validate the bias range allocation in scenarios like system boot
>          * when no cleared blocks are available and exercise the fallback path too. The resulting
>          * blocks should always be dirty.
>          */
>
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, ps),
>                                "buddy_init failed\n");
>
>         bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps);
> @@ -344,11 +344,11 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>         bias_end = max(bias_end, bias_start + ps);
>         bias_rem = bias_end - bias_start;
>
> -       flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION;
> +       flags = GPU_BUDDY_CLEAR_ALLOCATION | GPU_BUDDY_RANGE_ALLOCATION;
>         size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
>
>         KUNIT_ASSERT_FALSE_MSG(test,
> -                              drm_buddy_alloc_blocks(&mm, bias_start,
> +                              gpu_buddy_alloc_blocks(&mm, bias_start,
>                                                       bias_end, size, ps,
>                                                       &allocated,
>                                                       flags),
> @@ -356,27 +356,27 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test)
>                                bias_start, bias_end, size, ps);
>
>         list_for_each_entry(block, &allocated, link)
> -               KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
> +               KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), false);
>
> -       drm_buddy_free_list(&mm, &allocated, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &allocated, 0);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_clear(struct kunit *test)
> +static void gpu_test_buddy_alloc_clear(struct kunit *test)
>  {
>         unsigned long n_pages, total, i = 0;
>         const unsigned long ps = SZ_4K;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         const int max_order = 12;
>         LIST_HEAD(allocated);
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         unsigned int order;
>         u32 mm_size, size;
>         LIST_HEAD(dirty);
>         LIST_HEAD(clean);
>
>         mm_size = SZ_4K << max_order;
> -       KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
> +       KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
>
>         KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
>
> @@ -389,11 +389,11 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
>          * is indeed all dirty pages and vice versa. Free it all again,
>          * keeping the dirty/clear status.
>          */
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                             5 * ps, ps, &allocated,
> -                                                           DRM_BUDDY_TOPDOWN_ALLOCATION),
> +                                                           GPU_BUDDY_TOPDOWN_ALLOCATION),
>                                 "buddy_alloc hit an error size=%lu\n", 5 * ps);
> -       drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
>
>         n_pages = 10;
>         do {
> @@ -406,37 +406,37 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
>                         flags = 0;
>                 } else {
>                         list = &clean;
> -                       flags = DRM_BUDDY_CLEAR_ALLOCATION;
> +                       flags = GPU_BUDDY_CLEAR_ALLOCATION;
>                 }
>
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                                     ps, ps, list,
>                                                                     flags),
>                                         "buddy_alloc hit an error size=%lu\n", ps);
>         } while (++i < n_pages);
>
>         list_for_each_entry(block, &clean, link)
> -               KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), true);
> +               KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), true);
>
>         list_for_each_entry(block, &dirty, link)
> -               KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
> +               KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), false);
>
> -       drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &clean, GPU_BUDDY_CLEARED);
>
>         /*
>          * Trying to go over the clear limit for some allocation.
>          * The allocation should never fail with reasonable page-size.
>          */
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                             10 * ps, ps, &clean,
> -                                                           DRM_BUDDY_CLEAR_ALLOCATION),
> +                                                           GPU_BUDDY_CLEAR_ALLOCATION),
>                                 "buddy_alloc hit an error size=%lu\n", 10 * ps);
>
> -       drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED);
> -       drm_buddy_free_list(&mm, &dirty, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &clean, GPU_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &dirty, 0);
> +       gpu_buddy_fini(&mm);
>
> -       KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
> +       KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
>
>         /*
>          * Create a new mm. Intentionally fragment the address space by creating
> @@ -458,34 +458,34 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
>                 else
>                         list = &clean;
>
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                                     ps, ps, list, 0),
>                                         "buddy_alloc hit an error size=%lu\n", ps);
>         } while (++i < n_pages);
>
> -       drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED);
> -       drm_buddy_free_list(&mm, &dirty, 0);
> +       gpu_buddy_free_list(&mm, &clean, GPU_BUDDY_CLEARED);
> +       gpu_buddy_free_list(&mm, &dirty, 0);
>
>         order = 1;
>         do {
>                 size = SZ_4K << order;
>
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                                     size, size, &allocated,
> -                                                                   DRM_BUDDY_CLEAR_ALLOCATION),
> +                                                                   GPU_BUDDY_CLEAR_ALLOCATION),
>                                         "buddy_alloc hit an error size=%u\n", size);
>                 total = 0;
>                 list_for_each_entry(block, &allocated, link) {
>                         if (size != mm_size)
> -                               KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
> -                       total += drm_buddy_block_size(&mm, block);
> +                               KUNIT_EXPECT_EQ(test, gpu_buddy_block_is_clear(block), false);
> +                       total += gpu_buddy_block_size(&mm, block);
>                 }
>                 KUNIT_EXPECT_EQ(test, total, size);
>
> -               drm_buddy_free_list(&mm, &allocated, 0);
> +               gpu_buddy_free_list(&mm, &allocated, 0);
>         } while (++order <= max_order);
>
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_fini(&mm);
>
>         /*
>          * Create a new mm with a non power-of-two size. Allocate a random size from each
> @@ -494,44 +494,44 @@ static void drm_test_buddy_alloc_clear(struct kunit *test)
>          */
>         mm_size = (SZ_4K << max_order) + (SZ_4K << (max_order - 2));
>
> -       KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
> +       KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
>         KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
>                                                             4 * ps, ps, &allocated,
> -                                                           DRM_BUDDY_RANGE_ALLOCATION),
> +                                                           GPU_BUDDY_RANGE_ALLOCATION),
>                                 "buddy_alloc hit an error size=%lu\n", 4 * ps);
> -       drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
> +       gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, SZ_4K << max_order,
>                                                             2 * ps, ps, &allocated,
> -                                                           DRM_BUDDY_CLEAR_ALLOCATION),
> +                                                           GPU_BUDDY_CLEAR_ALLOCATION),
>                                 "buddy_alloc hit an error size=%lu\n", 2 * ps);
> -       drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, SZ_4K << max_order, mm_size,
> +       gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, SZ_4K << max_order, mm_size,
>                                                             ps, ps, &allocated,
> -                                                           DRM_BUDDY_RANGE_ALLOCATION),
> +                                                           GPU_BUDDY_RANGE_ALLOCATION),
>                                 "buddy_alloc hit an error size=%lu\n", ps);
> -       drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &allocated, GPU_BUDDY_CLEARED);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_contiguous(struct kunit *test)
> +static void gpu_test_buddy_alloc_contiguous(struct kunit *test)
>  {
>         const unsigned long ps = SZ_4K, mm_size = 16 * 3 * SZ_4K;
>         unsigned long i, n_pages, total;
> -       struct drm_buddy_block *block;
> -       struct drm_buddy mm;
> +       struct gpu_buddy_block *block;
> +       struct gpu_buddy mm;
>         LIST_HEAD(left);
>         LIST_HEAD(middle);
>         LIST_HEAD(right);
>         LIST_HEAD(allocated);
>
> -       KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
> +       KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, mm_size, ps));
>
>         /*
>          * Idea is to fragment the address space by alternating block
>          * allocations between three different lists; one for left, middle and
>          * right. We can then free a list to simulate fragmentation. In
> -        * particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
> +        * particular we want to exercise the GPU_BUDDY_CONTIGUOUS_ALLOCATION,
>          * including the try_harder path.
>          */
>
> @@ -548,66 +548,66 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test)
>                 else
>                         list = &right;
>                 KUNIT_ASSERT_FALSE_MSG(test,
> -                                      drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +                                      gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                               ps, ps, list, 0),
>                                        "buddy_alloc hit an error size=%lu\n",
>                                        ps);
>         } while (++i < n_pages);
>
> -       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                            3 * ps, ps, &allocated,
> -                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
> +                                                          GPU_BUDDY_CONTIGUOUS_ALLOCATION),
>                                "buddy_alloc didn't error size=%lu\n", 3 * ps);
>
> -       drm_buddy_free_list(&mm, &middle, 0);
> -       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       gpu_buddy_free_list(&mm, &middle, 0);
> +       KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                            3 * ps, ps, &allocated,
> -                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
> +                                                          GPU_BUDDY_CONTIGUOUS_ALLOCATION),
>                                "buddy_alloc didn't error size=%lu\n", 3 * ps);
> -       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                            2 * ps, ps, &allocated,
> -                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
> +                                                          GPU_BUDDY_CONTIGUOUS_ALLOCATION),
>                                "buddy_alloc didn't error size=%lu\n", 2 * ps);
>
> -       drm_buddy_free_list(&mm, &right, 0);
> -       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       gpu_buddy_free_list(&mm, &right, 0);
> +       KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                            3 * ps, ps, &allocated,
> -                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
> +                                                          GPU_BUDDY_CONTIGUOUS_ALLOCATION),
>                                "buddy_alloc didn't error size=%lu\n", 3 * ps);
>         /*
>          * At this point we should have enough contiguous space for 2 blocks,
>          * however they are never buddies (since we freed middle and right) so
>          * will require the try_harder logic to find them.
>          */
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                             2 * ps, ps, &allocated,
> -                                                           DRM_BUDDY_CONTIGUOUS_ALLOCATION),
> +                                                           GPU_BUDDY_CONTIGUOUS_ALLOCATION),
>                                "buddy_alloc hit an error size=%lu\n", 2 * ps);
>
> -       drm_buddy_free_list(&mm, &left, 0);
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
> +       gpu_buddy_free_list(&mm, &left, 0);
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, 0, mm_size,
>                                                             3 * ps, ps, &allocated,
> -                                                           DRM_BUDDY_CONTIGUOUS_ALLOCATION),
> +                                                           GPU_BUDDY_CONTIGUOUS_ALLOCATION),
>                                "buddy_alloc hit an error size=%lu\n", 3 * ps);
>
>         total = 0;
>         list_for_each_entry(block, &allocated, link)
> -               total += drm_buddy_block_size(&mm, block);
> +               total += gpu_buddy_block_size(&mm, block);
>
>         KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
>
> -       drm_buddy_free_list(&mm, &allocated, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &allocated, 0);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_pathological(struct kunit *test)
> +static void gpu_test_buddy_alloc_pathological(struct kunit *test)
>  {
>         u64 mm_size, size, start = 0;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         const int max_order = 3;
>         unsigned long flags = 0;
>         int order, top;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         LIST_HEAD(blocks);
>         LIST_HEAD(holes);
>         LIST_HEAD(tmp);
> @@ -620,7 +620,7 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
>          */
>
>         mm_size = SZ_4K << max_order;
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
>                                "buddy_init failed\n");
>
>         KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
> @@ -630,18 +630,18 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
>                 block = list_first_entry_or_null(&blocks, typeof(*block), link);
>                 if (block) {
>                         list_del(&block->link);
> -                       drm_buddy_free_block(&mm, block);
> +                       gpu_buddy_free_block(&mm, block);
>                 }
>
>                 for (order = top; order--;) {
>                         size = get_size(order, mm.chunk_size);
> -                       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start,
> +                       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start,
>                                                                             mm_size, size, size,
>                                                                                 &tmp, flags),
>                                         "buddy_alloc hit -ENOMEM with order=%d, top=%d\n",
>                                         order, top);
>
> -                       block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +                       block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>                         KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>                         list_move_tail(&block->link, &blocks);
> @@ -649,45 +649,45 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test)
>
>                 /* There should be one final page for this sub-allocation */
>                 size = get_size(0, mm.chunk_size);
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                     size, size, &tmp, flags),
>                                                            "buddy_alloc hit -ENOMEM for hole\n");
>
> -               block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +               block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>                 KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>                 list_move_tail(&block->link, &holes);
>
>                 size = get_size(top, mm.chunk_size);
> -               KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                    size, size, &tmp, flags),
>                                                           "buddy_alloc unexpectedly succeeded at top-order %d/%d, it should be full!",
>                                                           top, max_order);
>         }
>
> -       drm_buddy_free_list(&mm, &holes, 0);
> +       gpu_buddy_free_list(&mm, &holes, 0);
>
>         /* Nothing larger than blocks of chunk_size now available */
>         for (order = 1; order <= max_order; order++) {
>                 size = get_size(order, mm.chunk_size);
> -               KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                    size, size, &tmp, flags),
>                                                           "buddy_alloc unexpectedly succeeded at order %d, it should be full!",
>                                                           order);
>         }
>
>         list_splice_tail(&holes, &blocks);
> -       drm_buddy_free_list(&mm, &blocks, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &blocks, 0);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
> +static void gpu_test_buddy_alloc_pessimistic(struct kunit *test)
>  {
>         u64 mm_size, size, start = 0;
> -       struct drm_buddy_block *block, *bn;
> +       struct gpu_buddy_block *block, *bn;
>         const unsigned int max_order = 16;
>         unsigned long flags = 0;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         unsigned int order;
>         LIST_HEAD(blocks);
>         LIST_HEAD(tmp);
> @@ -699,19 +699,19 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
>          */
>
>         mm_size = SZ_4K << max_order;
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
>                                "buddy_init failed\n");
>
>         KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
>
>         for (order = 0; order < max_order; order++) {
>                 size = get_size(order, mm.chunk_size);
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                     size, size, &tmp, flags),
>                                                            "buddy_alloc hit -ENOMEM with order=%d\n",
>                                                            order);
>
> -               block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +               block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>                 KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>                 list_move_tail(&block->link, &blocks);
> @@ -719,11 +719,11 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
>
>         /* And now the last remaining block available */
>         size = get_size(0, mm.chunk_size);
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                             size, size, &tmp, flags),
>                                                    "buddy_alloc hit -ENOMEM on final alloc\n");
>
> -       block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +       block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>         KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>         list_move_tail(&block->link, &blocks);
> @@ -731,58 +731,58 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test)
>         /* Should be completely full! */
>         for (order = max_order; order--;) {
>                 size = get_size(order, mm.chunk_size);
> -               KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                    size, size, &tmp, flags),
>                                                           "buddy_alloc unexpectedly succeeded, it should be full!");
>         }
>
>         block = list_last_entry(&blocks, typeof(*block), link);
>         list_del(&block->link);
> -       drm_buddy_free_block(&mm, block);
> +       gpu_buddy_free_block(&mm, block);
>
>         /* As we free in increasing size, we make available larger blocks */
>         order = 1;
>         list_for_each_entry_safe(block, bn, &blocks, link) {
>                 list_del(&block->link);
> -               drm_buddy_free_block(&mm, block);
> +               gpu_buddy_free_block(&mm, block);
>
>                 size = get_size(order, mm.chunk_size);
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                     size, size, &tmp, flags),
>                                                            "buddy_alloc hit -ENOMEM with order=%d\n",
>                                                            order);
>
> -               block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +               block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>                 KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>                 list_del(&block->link);
> -               drm_buddy_free_block(&mm, block);
> +               gpu_buddy_free_block(&mm, block);
>                 order++;
>         }
>
>         /* To confirm, now the whole mm should be available */
>         size = get_size(max_order, mm.chunk_size);
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                             size, size, &tmp, flags),
>                                                    "buddy_alloc (realloc) hit -ENOMEM with order=%d\n",
>                                                    max_order);
>
> -       block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +       block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>         KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>         list_del(&block->link);
> -       drm_buddy_free_block(&mm, block);
> -       drm_buddy_free_list(&mm, &blocks, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_block(&mm, block);
> +       gpu_buddy_free_list(&mm, &blocks, 0);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_optimistic(struct kunit *test)
> +static void gpu_test_buddy_alloc_optimistic(struct kunit *test)
>  {
>         u64 mm_size, size, start = 0;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         unsigned long flags = 0;
>         const int max_order = 16;
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>         LIST_HEAD(blocks);
>         LIST_HEAD(tmp);
>         int order;
> @@ -794,19 +794,19 @@ static void drm_test_buddy_alloc_optimistic(struct kunit *test)
>
>         mm_size = SZ_4K * ((1 << (max_order + 1)) - 1);
>
> -       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K),
> +       KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_init(&mm, mm_size, SZ_4K),
>                                "buddy_init failed\n");
>
>         KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
>
>         for (order = 0; order <= max_order; order++) {
>                 size = get_size(order, mm.chunk_size);
> -               KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +               KUNIT_ASSERT_FALSE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                                     size, size, &tmp, flags),
>                                                            "buddy_alloc hit -ENOMEM with order=%d\n",
>                                                            order);
>
> -               block = list_first_entry_or_null(&tmp, struct drm_buddy_block, link);
> +               block = list_first_entry_or_null(&tmp, struct gpu_buddy_block, link);
>                 KUNIT_ASSERT_TRUE_MSG(test, block, "alloc_blocks has no blocks\n");
>
>                 list_move_tail(&block->link, &blocks);
> @@ -814,80 +814,80 @@ static void drm_test_buddy_alloc_optimistic(struct kunit *test)
>
>         /* Should be completely full! */
>         size = get_size(0, mm.chunk_size);
> -       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size,
> +       KUNIT_ASSERT_TRUE_MSG(test, gpu_buddy_alloc_blocks(&mm, start, mm_size,
>                                                            size, size, &tmp, flags),
>                                                   "buddy_alloc unexpectedly succeeded, it should be full!");
>
> -       drm_buddy_free_list(&mm, &blocks, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &blocks, 0);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static void drm_test_buddy_alloc_limit(struct kunit *test)
> +static void gpu_test_buddy_alloc_limit(struct kunit *test)
>  {
>         u64 size = U64_MAX, start = 0;
> -       struct drm_buddy_block *block;
> +       struct gpu_buddy_block *block;
>         unsigned long flags = 0;
>         LIST_HEAD(allocated);
> -       struct drm_buddy mm;
> +       struct gpu_buddy mm;
>
> -       KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, size, SZ_4K));
> +       KUNIT_EXPECT_FALSE(test, gpu_buddy_init(&mm, size, SZ_4K));
>
> -       KUNIT_EXPECT_EQ_MSG(test, mm.max_order, DRM_BUDDY_MAX_ORDER,
> +       KUNIT_EXPECT_EQ_MSG(test, mm.max_order, GPU_BUDDY_MAX_ORDER,
>                             "mm.max_order(%d) != %d\n", mm.max_order,
> -                                               DRM_BUDDY_MAX_ORDER);
> +                                               GPU_BUDDY_MAX_ORDER);
>
>         size = mm.chunk_size << mm.max_order;
> -       KUNIT_EXPECT_FALSE(test, drm_buddy_alloc_blocks(&mm, start, size, size,
> +       KUNIT_EXPECT_FALSE(test, gpu_buddy_alloc_blocks(&mm, start, size, size,
>                                                         mm.chunk_size, &allocated, flags));
>
> -       block = list_first_entry_or_null(&allocated, struct drm_buddy_block, link);
> +       block = list_first_entry_or_null(&allocated, struct gpu_buddy_block, link);
>         KUNIT_EXPECT_TRUE(test, block);
>
> -       KUNIT_EXPECT_EQ_MSG(test, drm_buddy_block_order(block), mm.max_order,
> +       KUNIT_EXPECT_EQ_MSG(test, gpu_buddy_block_order(block), mm.max_order,
>                             "block order(%d) != %d\n",
> -                                               drm_buddy_block_order(block), mm.max_order);
> +                                               gpu_buddy_block_order(block), mm.max_order);
>
> -       KUNIT_EXPECT_EQ_MSG(test, drm_buddy_block_size(&mm, block),
> +       KUNIT_EXPECT_EQ_MSG(test, gpu_buddy_block_size(&mm, block),
>                             BIT_ULL(mm.max_order) * mm.chunk_size,
>                                                 "block size(%llu) != %llu\n",
> -                                               drm_buddy_block_size(&mm, block),
> +                                               gpu_buddy_block_size(&mm, block),
>                                                 BIT_ULL(mm.max_order) * mm.chunk_size);
>
> -       drm_buddy_free_list(&mm, &allocated, 0);
> -       drm_buddy_fini(&mm);
> +       gpu_buddy_free_list(&mm, &allocated, 0);
> +       gpu_buddy_fini(&mm);
>  }
>
> -static int drm_buddy_suite_init(struct kunit_suite *suite)
> +static int gpu_buddy_suite_init(struct kunit_suite *suite)
>  {
>         while (!random_seed)
>                 random_seed = get_random_u32();
>
> -       kunit_info(suite, "Testing DRM buddy manager, with random_seed=0x%x\n",
> +       kunit_info(suite, "Testing GPU buddy manager, with random_seed=0x%x\n",
>                    random_seed);
>
>         return 0;
>  }
>
> -static struct kunit_case drm_buddy_tests[] = {
> -       KUNIT_CASE(drm_test_buddy_alloc_limit),
> -       KUNIT_CASE(drm_test_buddy_alloc_optimistic),
> -       KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
> -       KUNIT_CASE(drm_test_buddy_alloc_pathological),
> -       KUNIT_CASE(drm_test_buddy_alloc_contiguous),
> -       KUNIT_CASE(drm_test_buddy_alloc_clear),
> -       KUNIT_CASE(drm_test_buddy_alloc_range_bias),
> -       KUNIT_CASE(drm_test_buddy_fragmentation_performance),
> +static struct kunit_case gpu_buddy_tests[] = {
> +       KUNIT_CASE(gpu_test_buddy_alloc_limit),
> +       KUNIT_CASE(gpu_test_buddy_alloc_optimistic),
> +       KUNIT_CASE(gpu_test_buddy_alloc_pessimistic),
> +       KUNIT_CASE(gpu_test_buddy_alloc_pathological),
> +       KUNIT_CASE(gpu_test_buddy_alloc_contiguous),
> +       KUNIT_CASE(gpu_test_buddy_alloc_clear),
> +       KUNIT_CASE(gpu_test_buddy_alloc_range_bias),
> +       KUNIT_CASE(gpu_test_buddy_fragmentation_performance),
>         {}
>  };
>
> -static struct kunit_suite drm_buddy_test_suite = {
> -       .name = "drm_buddy",
> -       .suite_init = drm_buddy_suite_init,
> -       .test_cases = drm_buddy_tests,
> +static struct kunit_suite gpu_buddy_test_suite = {
> +       .name = "gpu_buddy",
> +       .suite_init = gpu_buddy_suite_init,
> +       .test_cases = gpu_buddy_tests,
>  };
>
> -kunit_test_suite(drm_buddy_test_suite);
> +kunit_test_suite(gpu_buddy_test_suite);
>
>  MODULE_AUTHOR("Intel Corporation");
> -MODULE_DESCRIPTION("Kunit test for drm_buddy functions");
> +MODULE_DESCRIPTION("Kunit test for gpu_buddy functions");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/gpu/tests/gpu_random.c b/drivers/gpu/tests/gpu_random.c
> new file mode 100644
> index 000000000000..54f1f6a3a6c1
> --- /dev/null
> +++ b/drivers/gpu/tests/gpu_random.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bitops.h>
> +#include <linux/export.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include "gpu_random.h"
> +
> +u32 gpu_prandom_u32_max_state(u32 ep_ro, struct rnd_state *state)
> +{
> +       return upper_32_bits((u64)prandom_u32_state(state) * ep_ro);
> +}
> +EXPORT_SYMBOL(gpu_prandom_u32_max_state);
> +
> +void gpu_random_reorder(unsigned int *order, unsigned int count,
> +                       struct rnd_state *state)
> +{
> +       unsigned int i, j;
> +
> +       for (i = 0; i < count; ++i) {
> +               BUILD_BUG_ON(sizeof(unsigned int) > sizeof(u32));
> +               j = gpu_prandom_u32_max_state(count, state);
> +               swap(order[i], order[j]);
> +       }
> +}
> +EXPORT_SYMBOL(gpu_random_reorder);
> +
> +unsigned int *gpu_random_order(unsigned int count, struct rnd_state *state)
> +{
> +       unsigned int *order, i;
> +
> +       order = kmalloc_array(count, sizeof(*order), GFP_KERNEL);
> +       if (!order)
> +               return order;
> +
> +       for (i = 0; i < count; i++)
> +               order[i] = i;
> +
> +       gpu_random_reorder(order, count, state);
> +       return order;
> +}
> +EXPORT_SYMBOL(gpu_random_order);
> +
> +MODULE_DESCRIPTION("GPU Randomization Utilities");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/drivers/gpu/tests/gpu_random.h b/drivers/gpu/tests/gpu_random.h
> new file mode 100644
> index 000000000000..b68cf3448264
> --- /dev/null
> +++ b/drivers/gpu/tests/gpu_random.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __GPU_RANDOM_H__
> +#define __GPU_RANDOM_H__
> +
> +/* This is a temporary home for a couple of utility functions that should
> + * be transposed to lib/ at the earliest convenience.
> + */
> +
> +#include <linux/prandom.h>
> +
> +#define GPU_RND_STATE_INITIALIZER(seed__) ({                           \
> +       struct rnd_state state__;                                       \
> +       prandom_seed_state(&state__, (seed__));                         \
> +       state__;                                                        \
> +})
> +
> +#define GPU_RND_STATE(name__, seed__) \
> +       struct rnd_state name__ = GPU_RND_STATE_INITIALIZER(seed__)
> +
> +unsigned int *gpu_random_order(unsigned int count,
> +                              struct rnd_state *state);
> +void gpu_random_reorder(unsigned int *order,
> +                       unsigned int count,
> +                       struct rnd_state *state);
> +u32 gpu_prandom_u32_max_state(u32 ep_ro,
> +                             struct rnd_state *state);
> +
> +#endif /* !__GPU_RANDOM_H__ */
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index d51777df12d1..6ae1383b0e2e 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -37,6 +37,8 @@ source "drivers/char/agp/Kconfig"
>
>  source "drivers/gpu/vga/Kconfig"
>
> +source "drivers/gpu/Kconfig"
> +
>  source "drivers/gpu/host1x/Kconfig"
>  source "drivers/gpu/ipu-v3/Kconfig"
>  source "drivers/gpu/nova-core/Kconfig"
> diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
> index b909fa8f810a..3054369bebff 100644
> --- a/include/drm/drm_buddy.h
> +++ b/include/drm/drm_buddy.h
> @@ -6,166 +6,13 @@
>  #ifndef __DRM_BUDDY_H__
>  #define __DRM_BUDDY_H__
>
> -#include <linux/bitops.h>
> -#include <linux/list.h>
> -#include <linux/slab.h>
> -#include <linux/sched.h>
> -#include <linux/rbtree.h>
> +#include <linux/gpu_buddy.h>
>
>  struct drm_printer;
>
> -#define DRM_BUDDY_RANGE_ALLOCATION             BIT(0)
> -#define DRM_BUDDY_TOPDOWN_ALLOCATION           BIT(1)
> -#define DRM_BUDDY_CONTIGUOUS_ALLOCATION                BIT(2)
> -#define DRM_BUDDY_CLEAR_ALLOCATION             BIT(3)
> -#define DRM_BUDDY_CLEARED                      BIT(4)
> -#define DRM_BUDDY_TRIM_DISABLE                 BIT(5)
> -
> -struct drm_buddy_block {
> -#define DRM_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12)
> -#define DRM_BUDDY_HEADER_STATE  GENMASK_ULL(11, 10)
> -#define   DRM_BUDDY_ALLOCATED     (1 << 10)
> -#define   DRM_BUDDY_FREE          (2 << 10)
> -#define   DRM_BUDDY_SPLIT         (3 << 10)
> -#define DRM_BUDDY_HEADER_CLEAR  GENMASK_ULL(9, 9)
> -/* Free to be used, if needed in the future */
> -#define DRM_BUDDY_HEADER_UNUSED GENMASK_ULL(8, 6)
> -#define DRM_BUDDY_HEADER_ORDER  GENMASK_ULL(5, 0)
> -       u64 header;
> -
> -       struct drm_buddy_block *left;
> -       struct drm_buddy_block *right;
> -       struct drm_buddy_block *parent;
> -
> -       void *private; /* owned by creator */
> -
> -       /*
> -        * While the block is allocated by the user through drm_buddy_alloc*,
> -        * the user has ownership of the link, for example to maintain within
> -        * a list, if so desired. As soon as the block is freed with
> -        * drm_buddy_free* ownership is given back to the mm.
> -        */
> -       union {
> -               struct rb_node rb;
> -               struct list_head link;
> -       };
> -
> -       struct list_head tmp_link;
> -};
> -
> -/* Order-zero must be at least SZ_4K */
> -#define DRM_BUDDY_MAX_ORDER (63 - 12)
> -
> -/*
> - * Binary Buddy System.
> - *
> - * Locking should be handled by the user, a simple mutex around
> - * drm_buddy_alloc* and drm_buddy_free* should suffice.
> - */
> -struct drm_buddy {
> -       /* Maintain a free list for each order. */
> -       struct rb_root **free_trees;
> -
> -       /*
> -        * Maintain explicit binary tree(s) to track the allocation of the
> -        * address space. This gives us a simple way of finding a buddy block
> -        * and performing the potentially recursive merge step when freeing a
> -        * block.  Nodes are either allocated or free, in which case they will
> -        * also exist on the respective free list.
> -        */
> -       struct drm_buddy_block **roots;
> -
> -       /*
> -        * Anything from here is public, and remains static for the lifetime of
> -        * the mm. Everything above is considered do-not-touch.
> -        */
> -       unsigned int n_roots;
> -       unsigned int max_order;
> -
> -       /* Must be at least SZ_4K */
> -       u64 chunk_size;
> -       u64 size;
> -       u64 avail;
> -       u64 clear_avail;
> -};
> -
> -static inline u64
> -drm_buddy_block_offset(const struct drm_buddy_block *block)
> -{
> -       return block->header & DRM_BUDDY_HEADER_OFFSET;
> -}
> -
> -static inline unsigned int
> -drm_buddy_block_order(struct drm_buddy_block *block)
> -{
> -       return block->header & DRM_BUDDY_HEADER_ORDER;
> -}
> -
> -static inline unsigned int
> -drm_buddy_block_state(struct drm_buddy_block *block)
> -{
> -       return block->header & DRM_BUDDY_HEADER_STATE;
> -}
> -
> -static inline bool
> -drm_buddy_block_is_allocated(struct drm_buddy_block *block)
> -{
> -       return drm_buddy_block_state(block) == DRM_BUDDY_ALLOCATED;
> -}
> -
> -static inline bool
> -drm_buddy_block_is_clear(struct drm_buddy_block *block)
> -{
> -       return block->header & DRM_BUDDY_HEADER_CLEAR;
> -}
> -
> -static inline bool
> -drm_buddy_block_is_free(struct drm_buddy_block *block)
> -{
> -       return drm_buddy_block_state(block) == DRM_BUDDY_FREE;
> -}
> -
> -static inline bool
> -drm_buddy_block_is_split(struct drm_buddy_block *block)
> -{
> -       return drm_buddy_block_state(block) == DRM_BUDDY_SPLIT;
> -}
> -
> -static inline u64
> -drm_buddy_block_size(struct drm_buddy *mm,
> -                    struct drm_buddy_block *block)
> -{
> -       return mm->chunk_size << drm_buddy_block_order(block);
> -}
> -
> -int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size);
> -
> -void drm_buddy_fini(struct drm_buddy *mm);
> -
> -struct drm_buddy_block *
> -drm_get_buddy(struct drm_buddy_block *block);
> -
> -int drm_buddy_alloc_blocks(struct drm_buddy *mm,
> -                          u64 start, u64 end, u64 size,
> -                          u64 min_page_size,
> -                          struct list_head *blocks,
> -                          unsigned long flags);
> -
> -int drm_buddy_block_trim(struct drm_buddy *mm,
> -                        u64 *start,
> -                        u64 new_size,
> -                        struct list_head *blocks);
> -
> -void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear);
> -
> -void drm_buddy_free_block(struct drm_buddy *mm, struct drm_buddy_block *block);
> -
> -void drm_buddy_free_list(struct drm_buddy *mm,
> -                        struct list_head *objects,
> -                        unsigned int flags);
> -
> -void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p);
> -void drm_buddy_block_print(struct drm_buddy *mm,
> -                          struct drm_buddy_block *block,
> +/* DRM-specific GPU Buddy Allocator print helpers */
> +void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p);
> +void drm_buddy_block_print(struct gpu_buddy *mm,
> +                          struct gpu_buddy_block *block,
>                            struct drm_printer *p);
>  #endif
> diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
> new file mode 100644
> index 000000000000..3e4bd11ccb71
> --- /dev/null
> +++ b/include/linux/gpu_buddy.h
> @@ -0,0 +1,177 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#ifndef __GPU_BUDDY_H__
> +#define __GPU_BUDDY_H__
> +
> +#include <linux/bitops.h>
> +#include <linux/list.h>
> +#include <linux/rbtree.h>
> +#include <linux/slab.h>
> +#include <linux/sched.h>
> +
> +#define GPU_BUDDY_RANGE_ALLOCATION             BIT(0)
> +#define GPU_BUDDY_TOPDOWN_ALLOCATION           BIT(1)
> +#define GPU_BUDDY_CONTIGUOUS_ALLOCATION                BIT(2)
> +#define GPU_BUDDY_CLEAR_ALLOCATION             BIT(3)
> +#define GPU_BUDDY_CLEARED                      BIT(4)
> +#define GPU_BUDDY_TRIM_DISABLE                 BIT(5)
> +
> +enum gpu_buddy_free_tree {
> +       GPU_BUDDY_CLEAR_TREE = 0,
> +       GPU_BUDDY_DIRTY_TREE,
> +       GPU_BUDDY_MAX_FREE_TREES,
> +};
> +
> +#define for_each_free_tree(tree) \
> +       for ((tree) = 0; (tree) < GPU_BUDDY_MAX_FREE_TREES; (tree)++)
> +
> +struct gpu_buddy_block {
> +#define GPU_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12)
> +#define GPU_BUDDY_HEADER_STATE  GENMASK_ULL(11, 10)
> +#define   GPU_BUDDY_ALLOCATED     (1 << 10)
> +#define   GPU_BUDDY_FREE          (2 << 10)
> +#define   GPU_BUDDY_SPLIT         (3 << 10)
> +#define GPU_BUDDY_HEADER_CLEAR  GENMASK_ULL(9, 9)
> +/* Free to be used, if needed in the future */
> +#define GPU_BUDDY_HEADER_UNUSED GENMASK_ULL(8, 6)
> +#define GPU_BUDDY_HEADER_ORDER  GENMASK_ULL(5, 0)
> +       u64 header;
> +
> +       struct gpu_buddy_block *left;
> +       struct gpu_buddy_block *right;
> +       struct gpu_buddy_block *parent;
> +
> +       void *private; /* owned by creator */
> +
> +       /*
> +        * While the block is allocated by the user through gpu_buddy_alloc*,
> +        * the user has ownership of the link, for example to maintain within
> +        * a list, if so desired. As soon as the block is freed with
> +        * gpu_buddy_free* ownership is given back to the mm.
> +        */
> +       union {
> +               struct rb_node rb;
> +               struct list_head link;
> +       };
> +
> +       struct list_head tmp_link;
> +};
> +
> +/* Order-zero must be at least SZ_4K */
> +#define GPU_BUDDY_MAX_ORDER (63 - 12)
> +
> +/*
> + * Binary Buddy System.
> + *
> + * Locking should be handled by the user, a simple mutex around
> + * gpu_buddy_alloc* and gpu_buddy_free* should suffice.
> + */
> +struct gpu_buddy {
> +       /* Maintain a free list for each order. */
> +       struct rb_root **free_trees;
> +
> +       /*
> +        * Maintain explicit binary tree(s) to track the allocation of the
> +        * address space. This gives us a simple way of finding a buddy block
> +        * and performing the potentially recursive merge step when freeing a
> +        * block.  Nodes are either allocated or free, in which case they will
> +        * also exist on the respective free list.
> +        */
> +       struct gpu_buddy_block **roots;
> +
> +       /*
> +        * Anything from here is public, and remains static for the lifetime of
> +        * the mm. Everything above is considered do-not-touch.
> +        */
> +       unsigned int n_roots;
> +       unsigned int max_order;
> +
> +       /* Must be at least SZ_4K */
> +       u64 chunk_size;
> +       u64 size;
> +       u64 avail;
> +       u64 clear_avail;
> +};
> +
> +static inline u64
> +gpu_buddy_block_offset(const struct gpu_buddy_block *block)
> +{
> +       return block->header & GPU_BUDDY_HEADER_OFFSET;
> +}
> +
> +static inline unsigned int
> +gpu_buddy_block_order(struct gpu_buddy_block *block)
> +{
> +       return block->header & GPU_BUDDY_HEADER_ORDER;
> +}
> +
> +static inline unsigned int
> +gpu_buddy_block_state(struct gpu_buddy_block *block)
> +{
> +       return block->header & GPU_BUDDY_HEADER_STATE;
> +}
> +
> +static inline bool
> +gpu_buddy_block_is_allocated(struct gpu_buddy_block *block)
> +{
> +       return gpu_buddy_block_state(block) == GPU_BUDDY_ALLOCATED;
> +}
> +
> +static inline bool
> +gpu_buddy_block_is_clear(struct gpu_buddy_block *block)
> +{
> +       return block->header & GPU_BUDDY_HEADER_CLEAR;
> +}
> +
> +static inline bool
> +gpu_buddy_block_is_free(struct gpu_buddy_block *block)
> +{
> +       return gpu_buddy_block_state(block) == GPU_BUDDY_FREE;
> +}
> +
> +static inline bool
> +gpu_buddy_block_is_split(struct gpu_buddy_block *block)
> +{
> +       return gpu_buddy_block_state(block) == GPU_BUDDY_SPLIT;
> +}
> +
> +static inline u64
> +gpu_buddy_block_size(struct gpu_buddy *mm,
> +                    struct gpu_buddy_block *block)
> +{
> +       return mm->chunk_size << gpu_buddy_block_order(block);
> +}
> +
> +int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size);
> +
> +void gpu_buddy_fini(struct gpu_buddy *mm);
> +
> +struct gpu_buddy_block *
> +gpu_get_buddy(struct gpu_buddy_block *block);
> +
> +int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
> +                          u64 start, u64 end, u64 size,
> +                          u64 min_page_size,
> +                          struct list_head *blocks,
> +                          unsigned long flags);
> +
> +int gpu_buddy_block_trim(struct gpu_buddy *mm,
> +                        u64 *start,
> +                        u64 new_size,
> +                        struct list_head *blocks);
> +
> +void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear);
> +
> +void gpu_buddy_free_block(struct gpu_buddy *mm, struct gpu_buddy_block *block);
> +
> +void gpu_buddy_free_list(struct gpu_buddy *mm,
> +                        struct list_head *objects,
> +                        unsigned int flags);
> +
> +void gpu_buddy_print(struct gpu_buddy *mm);
> +void gpu_buddy_block_print(struct gpu_buddy *mm,
> +                          struct gpu_buddy_block *block);
> +#endif
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up
  2026-02-05 20:55   ` Dave Airlie
@ 2026-02-06  1:04     ` Joel Fernandes
  2026-02-06  1:07       ` Dave Airlie
  0 siblings, 1 reply; 71+ messages in thread
From: Joel Fernandes @ 2026-02-06  1:04 UTC (permalink / raw)
  To: Dave Airlie
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev



On 2/5/2026 3:55 PM, Dave Airlie wrote:
> On Wed, 21 Jan 2026 at 06:44, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>> Move the DRM buddy allocator one level up so that it can be used by GPU
>> drivers (for example, nova-core) that have use cases other than DRM
>> (such as VFIO vGPU support). Modify the API, structures and Kconfigs to
>> use "gpu_buddy" terminology. Adapt the drivers and tests to the new API.
>>
>> The commit cannot be split due to bisectability; however, no functional
>> change is intended. Verified by running KUnit tests and build-testing
>> various configurations.
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> 
> I suggested this and think it's a good idea.
> 
> Reviewed-by: Dave Airlie <airlied@redhat.com>
Thanks, Dave!

--
Joel Fernandes



* Re: [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up
  2026-02-06  1:04     ` Joel Fernandes
@ 2026-02-06  1:07       ` Dave Airlie
  0 siblings, 0 replies; 71+ messages in thread
From: Dave Airlie @ 2026-02-06  1:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, Jonathan Corbet, Alex Deucher,
	Christian König, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Matthew Auld, Matthew Brost,
	Lucas De Marchi, Thomas Hellström, Helge Deller,
	Danilo Krummrich, Alice Ryhl, Miguel Ojeda, Alex Gaynor,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Andreas Hindborg, Trevor Gross, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Alexey Ivanov, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, Daniel Almeida, joel, nouveau,
	dri-devel, rust-for-linux, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev

On Fri, 6 Feb 2026 at 11:04, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
>
>
> On 2/5/2026 3:55 PM, Dave Airlie wrote:
> > On Wed, 21 Jan 2026 at 06:44, Joel Fernandes <joelagnelf@nvidia.com> wrote:
> >>
> >> Move the DRM buddy allocator one level up so that it can be used by GPU
> >> drivers (for example, nova-core) that have use cases other than DRM
> >> (such as VFIO vGPU support). Modify the API, structures and Kconfigs to
> >> use "gpu_buddy" terminology. Adapt the drivers and tests to the new API.
> >>
> >> The commit cannot be split due to bisectability; however, no functional
> >> change is intended. Verified by running KUnit tests and build-testing
> >> various configurations.
> >>
> >> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> >
> > I suggested this and think it's a good idea.
> >
> > Reviewed-by: Dave Airlie <airlied@redhat.com>
> Thanks, Dave!

I'm going to apply this to drm-misc-next today but I'll move some of it around.

Dave.


end of thread, other threads:[~2026-02-06  1:07 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2026-01-20 20:42 [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 01/26] rust: clist: Add support to interface with C linked lists Joel Fernandes
2026-01-20 23:48   ` Gary Guo
2026-01-21 19:50     ` Joel Fernandes
2026-01-21 20:36       ` Gary Guo
2026-01-21 20:41         ` Joel Fernandes
2026-01-21 20:46         ` Joel Fernandes
2026-01-25  1:51         ` Joel Fernandes
2026-01-21  7:27   ` Zhi Wang
2026-01-21 18:12     ` Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 02/26] gpu: Move DRM buddy allocator one level up Joel Fernandes
2026-02-05 20:55   ` Dave Airlie
2026-02-06  1:04     ` Joel Fernandes
2026-02-06  1:07       ` Dave Airlie
2026-01-20 20:42 ` [PATCH RFC v6 03/26] rust: gpu: Add GPU buddy allocator bindings Joel Fernandes
2026-02-04  3:55   ` Dave Airlie
2026-02-05  1:00     ` Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 04/26] nova-core: mm: Select GPU_BUDDY for VRAM allocation Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN windows to write to VRAM Joel Fernandes
2026-01-21  8:07   ` Zhi Wang
2026-01-21 17:52     ` Joel Fernandes
2026-01-22 23:16       ` Joel Fernandes
2026-01-23 10:13         ` Zhi Wang
2026-01-23 12:59           ` Joel Fernandes
2026-01-28 12:04         ` Danilo Krummrich
2026-01-28 15:27           ` Joel Fernandes
2026-01-29  0:09             ` Danilo Krummrich
2026-01-29  1:02               ` John Hubbard
2026-01-29  1:49                 ` Joel Fernandes
2026-01-29  1:28               ` Joel Fernandes
2026-01-30  0:26           ` Joel Fernandes
2026-01-30  1:11             ` John Hubbard
2026-01-30  1:59               ` Joel Fernandes
2026-01-30  3:38                 ` John Hubbard
2026-01-30 21:14                   ` Joel Fernandes
2026-01-31  3:00                     ` Dave Airlie
2026-01-31  3:21                       ` John Hubbard
2026-01-31 20:08                         ` Joel Fernandes
2026-01-31 20:02                       ` Joel Fernandes
2026-02-02  9:12                       ` Christian König
2026-02-04 23:42                         ` Joel Fernandes
2026-01-30  1:16             ` Gary Guo
2026-01-30  1:45               ` Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 06/26] docs: gpu: nova-core: Document the PRAMIN aperture mechanism Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 07/26] nova-core: Add BAR1 aperture type and size constant Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 08/26] nova-core: gsp: Add BAR1 PDE base accessors Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 09/26] nova-core: mm: Add common memory management types Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 10/26] nova-core: mm: Add common types for all page table formats Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 11/26] nova-core: mm: Add MMU v2 page table types Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 12/26] nova-core: mm: Add MMU v3 " Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 13/26] nova-core: mm: Add unified page table entry wrapper enums Joel Fernandes
2026-01-21  9:54   ` Zhi Wang
2026-01-21 18:35     ` Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 14/26] nova-core: mm: Add TLB flush support Joel Fernandes
2026-01-21  9:59   ` Zhi Wang
2026-01-21 18:45     ` Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 15/26] nova-core: mm: Add GpuMm centralized memory manager Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 16/26] nova-core: mm: Add page table walker for MMU v2 Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 17/26] nova-core: mm: Add Virtual Memory Manager Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 18/26] nova-core: mm: Add virtual address range tracking to VMM Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 19/26] nova-core: mm: Add BAR1 user interface Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 20/26] nova-core: gsp: Return GspStaticInfo and FbLayout from boot() Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 21/26] nova-core: mm: Add memory management self-tests Joel Fernandes
2026-01-20 20:42 ` [PATCH RFC v6 22/26] nova-core: mm: Add PRAMIN aperture self-tests Joel Fernandes
2026-01-20 20:43 ` [PATCH RFC v6 23/26] nova-core: gsp: Extract usable FB region from GSP Joel Fernandes
2026-01-20 20:43 ` [PATCH RFC v6 24/26] nova-core: fb: Add usable_vram field to FbLayout Joel Fernandes
2026-01-20 20:43 ` [PATCH RFC v6 25/26] nova-core: mm: Use usable VRAM region for buddy allocator Joel Fernandes
2026-01-20 20:43 ` [PATCH RFC v6 26/26] nova-core: mm: Add BarUser to struct Gpu and create at boot Joel Fernandes
2026-01-28 11:37 ` [PATCH RFC v6 00/26] nova-core: Memory management infrastructure (v6) Danilo Krummrich
2026-01-28 12:44   ` Joel Fernandes
2026-01-29  0:01     ` Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox