[PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF

Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
@ 2026-05-27 14:45 Rodrigo Vivi
  2026-05-27 15:48 ` ✗ i915.CI.BAT: failure for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Rodrigo Vivi @ 2026-05-27 14:45 UTC (permalink / raw)
  To: intel-gfx, intel-xe
  Cc: Rodrigo Vivi, Ville Syrjälä, Michal Wajdeczko,
	Maarten Lankhorst

The PF GGTT allocator was initialised over a relative [0, usable_size)
range, with ggtt->start added on every address conversion to get the
actual hardware address.  Two consequences of that model were considered
"horrible hacks":

  - ggtt->start (the WOPCM offset) had to be carried around and added
    to every drm_mm result.
  - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
    of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
    untouched during the initial clear.

Fix this for the PF case by initialising drm_mm over the full hardware
GGTT range [0, total_size) and permanently reserving the two forbidden
zones:

  - [0, wopcm)           — inaccessible below WOPCM
  - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP

A new mm_offset field (zero for PF) carries the base offset used in
address conversions, unifying the existing VF relative model (where
mm_offset == vf_base) with the new PF absolute model.  The public
xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
[wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
config code are unaffected.

xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
so the VF recovery path remains a single O(1) WRITE_ONCE pair.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
 1 file changed, 101 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index a351c578b170..00a6cd2b8a51 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -137,6 +137,17 @@ struct xe_ggtt {
 	const struct xe_ggtt_pt_ops *pt_ops;
 	/** @mm: The memory manager used to manage individual GGTT allocations */
 	struct drm_mm mm;
+	/**
+	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
+	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
+	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
+	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
+	 */
+	u64 mm_offset;
+	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
+	struct drm_mm_node reserved_bottom;
+	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
+	struct drm_mm_node reserved_top;
 	/** @access_count: counts GGTT writes */
 	unsigned int access_count;
 	/** @wq: Dedicated unordered work queue to process node removals */
@@ -318,6 +329,10 @@ static void ggtt_fini_early(struct drm_device *drm, void *arg)
 {
 	struct xe_ggtt *ggtt = arg;
 
+	if (drm_mm_node_allocated(&ggtt->reserved_top))
+		drm_mm_remove_node(&ggtt->reserved_top);
+	if (drm_mm_node_allocated(&ggtt->reserved_bottom))
+		drm_mm_remove_node(&ggtt->reserved_bottom);
 	destroy_workqueue(ggtt->wq);
 	drm_mm_takedown(&ggtt->mm);
 }
@@ -354,16 +369,66 @@ static const struct xe_ggtt_pt_ops xelpg_pt_wa_ops = {
 	.ggtt_get_pte = xe_ggtt_get_pte,
 };
 
-static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size)
+/*
+ * __xe_ggtt_init_early - Generic drm_mm initialisation for both PF and VF.
+ *
+ * @start and @size define the usable GGTT range exposed through the public API.
+ * @mm_offset is added to every drm_mm node address to obtain the hardware
+ * address (0 for PF where the drm_mm spans real addresses; @start for VF).
+ * @mm_size is the total span passed to drm_mm_init() (equals @size for VF;
+ * equals the full hardware GGTT size for PF).
+ */
+static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size,
+				 u64 mm_offset, u64 mm_size)
 {
 	ggtt->start = start;
 	ggtt->size = size;
-	drm_mm_init(&ggtt->mm, 0, size);
+	ggtt->mm_offset = mm_offset;
+	drm_mm_init(&ggtt->mm, 0, mm_size);
+}
+
+/*
+ * __xe_ggtt_reserve_pf_nodes - Permanently reserve the forbidden GGTT zones.
+ *
+ * Must be called after __xe_ggtt_init_early() for the PF case.  Reserves
+ * [0, @wopcm) and [@guc_top, @total_size) so the allocator never hands them
+ * out.  On failure the drm_mm is torn down and the error is returned.
+ */
+static int __xe_ggtt_reserve_pf_nodes(struct xe_ggtt *ggtt, u64 wopcm,
+				      u64 guc_top, u64 total_size)
+{
+	int err;
+
+	/* Reserve [0, wopcm) — GuC cannot access below WOPCM */
+	if (wopcm > 0) {
+		ggtt->reserved_bottom.start = 0;
+		ggtt->reserved_bottom.size = wopcm;
+		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_bottom);
+		if (WARN_ON(err))
+			goto err_takedown;
+	}
+
+	/* Reserve [guc_top, total_size) — GuC cannot access above GUC_GGTT_TOP */
+	if (guc_top < total_size) {
+		ggtt->reserved_top.start = guc_top;
+		ggtt->reserved_top.size = total_size - guc_top;
+		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_top);
+		if (WARN_ON(err))
+			goto err_remove_bottom;
+	}
+
+	return 0;
+
+err_remove_bottom:
+	drm_mm_remove_node(&ggtt->reserved_bottom);
+err_takedown:
+	drm_mm_takedown(&ggtt->mm);
+	return err;
 }
 
 int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 start, u32 size)
 {
-	__xe_ggtt_init_early(ggtt, start, size);
+	__xe_ggtt_init_early(ggtt, start, size, start, size);
 	return 0;
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_ggtt_init_kunit);
@@ -405,8 +470,13 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
 			xe_tile_err(ggtt->tile, "Hardware reported no preallocated GSM\n");
 			return -ENOMEM;
 		}
+		/*
+		 * For PF, ggtt_size holds the full hardware GGTT size. The
+		 * WOPCM and GUC_GGTT_TOP limits are enforced via permanently
+		 * reserved drm_mm nodes rather than by capping the range.
+		 */
 		ggtt_start = wopcm;
-		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start;
+		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE;
 	} else {
 		ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
 		ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
@@ -423,9 +493,6 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
 	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
 		ggtt->flags |= XE_GGTT_FLAGS_64K;
 
-	if (ggtt_size + ggtt_start > GUC_GGTT_TOP)
-		ggtt_size = GUC_GGTT_TOP - ggtt_start;
-
 	if (GRAPHICS_VERx100(xe) >= 1270)
 		ggtt->pt_ops =
 			(ggtt->tile->media_gt && XE_GT_WA(ggtt->tile->media_gt, 22019338487)) ||
@@ -438,7 +505,19 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
 	if (!ggtt->wq)
 		return -ENOMEM;
 
-	__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size);
+	if (!IS_SRIOV_VF(xe)) {
+		__xe_ggtt_init_early(ggtt, ggtt_start, GUC_GGTT_TOP - ggtt_start,
+				     0, ggtt_size);
+		err = __xe_ggtt_reserve_pf_nodes(ggtt, ggtt_start, GUC_GGTT_TOP,
+						 ggtt_size);
+		if (err) {
+			destroy_workqueue(ggtt->wq);
+			return err;
+		}
+	} else {
+		__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size,
+				     ggtt_start, ggtt_size);
+	}
 
 	err = drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt);
 	if (err)
@@ -459,7 +538,7 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
 	/* Display may have allocated inside ggtt, so be careful with clearing here */
 	mutex_lock(&ggtt->lock);
 	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
-		xe_ggtt_clear(ggtt, ggtt->start + start, end - start);
+		xe_ggtt_clear(ggtt, ggtt->mm_offset + start, end - start);
 
 	xe_ggtt_invalidate(ggtt);
 	mutex_unlock(&ggtt->lock);
@@ -613,6 +692,7 @@ void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start)
 
 	/* pairs with READ_ONCE in xe_ggtt_node_addr() */
 	WRITE_ONCE(ggtt->start, new_start);
+	WRITE_ONCE(ggtt->mm_offset, new_start);
 }
 
 static int xe_ggtt_insert_node_locked(struct xe_ggtt_node *node,
@@ -815,20 +895,19 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 
 	mutex_lock(&ggtt->lock);
 	/*
-	 * When inheriting the initial framebuffer, the framebuffer is
-	 * physically located at VRAM address 0, and usually at GGTT address 0 too.
-	 *
-	 * The display code will ask for a GGTT allocation between end of BO and
-	 * remainder of GGTT, unaware that the start is reserved by WOPCM.
+	 * Convert caller-supplied hardware GGTT addresses to drm_mm-relative
+	 * coordinates. For PF mm_offset is 0 so this is a no-op; callers pass
+	 * real addresses and the reserved_bottom/top nodes prevent allocations
+	 * in the forbidden regions. For VF, mm_offset equals the VF base so
+	 * we convert to relative drm_mm space.
 	 */
-	if (start >= ggtt->start)
-		start -= ggtt->start;
+	if (start >= ggtt->mm_offset)
+		start -= ggtt->mm_offset;
 	else
 		start = 0;
 
-	/* Should never happen, but since we handle start, fail graciously for end */
-	if (end >= ggtt->start)
-		end -= ggtt->start;
+	if (end >= ggtt->mm_offset)
+		end -= ggtt->mm_offset;
 	else
 		end = 0;
 
@@ -923,7 +1002,7 @@ u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare)
 
 	mutex_lock(&ggtt->lock);
 	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
-		hole_start = max(hole_start, ggtt->start);
+		hole_start = max(hole_start, ggtt->mm_offset);
 		hole_start = ALIGN(hole_start, alignment);
 		hole_end = ALIGN_DOWN(hole_end, alignment);
 		if (hole_start >= hole_end)
@@ -1098,7 +1177,7 @@ u64 xe_ggtt_print_holes(struct xe_ggtt *ggtt, u64 alignment, struct drm_printer
 
 	mutex_lock(&ggtt->lock);
 	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
-		hole_start = max(hole_start, ggtt->start);
+		hole_start = max(hole_start, ggtt->mm_offset);
 		hole_start = ALIGN(hole_start, alignment);
 		hole_end = ALIGN_DOWN(hole_end, alignment);
 		if (hole_start >= hole_end)
@@ -1152,7 +1231,7 @@ u64 xe_ggtt_read_pte(struct xe_ggtt *ggtt, u64 offset)
 u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node)
 {
 	/* pairs with WRITE_ONCE in xe_ggtt_shift_nodes() */
-	return node->base.start + READ_ONCE(node->ggtt->start);
+	return node->base.start + READ_ONCE(node->ggtt->mm_offset);
 }
 
 /**
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* ✗ i915.CI.BAT: failure for drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-05-27 14:45 [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF Rodrigo Vivi
@ 2026-05-27 15:48 ` Patchwork
  2026-05-28 22:11 ` [PATCH] " Michal Wajdeczko
  2026-06-03 12:57 ` [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP Maarten Lankhorst
  2 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2026-05-27 15:48 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 2813 bytes --]

== Series Details ==

Series: drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
URL   : https://patchwork.freedesktop.org/series/167392/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_18561 -> Patchwork_167392v1
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_167392v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_167392v1, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_167392v1/index.html

Participating hosts (42 -> 40)
------------------------------

  Missing    (2): bat-dg2-13 fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_167392v1:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_module_load@load:
    - bat-apl-1:          [PASS][1] -> [DMESG-WARN][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_18561/bat-apl-1/igt@i915_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_167392v1/bat-apl-1/igt@i915_module_load@load.html

  
Known issues
------------

  Here are the changes found in Patchwork_167392v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@core_debugfs@read-all-entries:
    - bat-adlp-6:         [PASS][3] -> [DMESG-WARN][4] ([i915#15673])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_18561/bat-adlp-6/igt@core_debugfs@read-all-entries.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_167392v1/bat-adlp-6/igt@core_debugfs@read-all-entries.html

  
#### Possible fixes ####

  * igt@core_auth@basic-auth:
    - bat-adlp-6:         [DMESG-WARN][5] ([i915#15673]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_18561/bat-adlp-6/igt@core_auth@basic-auth.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_167392v1/bat-adlp-6/igt@core_auth@basic-auth.html

  
  [i915#15673]: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15673


Build changes
-------------

  * Linux: CI_DRM_18561 -> Patchwork_167392v1

  CI-20190529: 20190529
  CI_DRM_18561: 5390f2273d45bb259d88508828018c0fbbb79d32 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_8938: b024a3b67372962ff6e643d3998c5cf5acc07081 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_167392v1: 5390f2273d45bb259d88508828018c0fbbb79d32 @ git://anongit.freedesktop.org/gfx-ci/linux

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_167392v1/index.html

[-- Attachment #2: Type: text/html, Size: 3542 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-05-27 14:45 [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF Rodrigo Vivi
  2026-05-27 15:48 ` ✗ i915.CI.BAT: failure for " Patchwork
@ 2026-05-28 22:11 ` Michal Wajdeczko
  2026-05-28 22:28   ` Rodrigo Vivi
  2026-06-03 12:57 ` [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP Maarten Lankhorst
  2 siblings, 1 reply; 12+ messages in thread
From: Michal Wajdeczko @ 2026-05-28 22:11 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-gfx, intel-xe
  Cc: Ville Syrjälä, Maarten Lankhorst



On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> The PF GGTT allocator was initialised over a relative [0, usable_size)
> range, with ggtt->start added on every address conversion to get the
> actual hardware address.  Two consequences of that model were considered
> "horrible hacks":
> 
>   - ggtt->start (the WOPCM offset) had to be carried around and added
>     to every drm_mm result.

hmm, but this an internal detail of the xe_ggtt implementation, so why
would someone else complain about it?

>   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead

hmm, for the record, this GGTT cap on the top was added back in 2023

commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date:   Wed Jun 14 10:47:54 2023 -0700

    drm/xe: limit GGTT size to GUC_GGTT_TOP

+        * The GuC address space is limited on both ends of the GGTT, because
+        * the GuC shim HW redirects accesses to those addresses to other HW
+        * areas instead of going through the GGTT. On the bottom end, the GuC
+        * can't access offsets below the WOPCM size, while on the top side the
+        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
+        * checking each object to see if they are accessed by GuC or not, we
+        * just exclude those areas from the allocator. Additionally, to
+        * simplify the driver load, we use the maximum WOPCM size in this logic

>     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
>     untouched during the initial clear.

and that likely will not be changed by this patch as after allocating 'two
permanent zones', the drm_mm_for_each_hole will not iterate over them

> 
> Fix this for the PF case by initialising drm_mm over the full hardware
> GGTT range [0, total_size) and permanently reserving the two forbidden
> zones:
> 
>   - [0, wopcm)           — inaccessible below WOPCM
>   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP

that looks odds: why pretend to claim manageability of full [0, 4GB)
of the GGTT and then immediately permanently reserve two end zones to
end up with real [wopcm, GUC_TOP) which is what we already have?

> 
> A new mm_offset field (zero for PF) carries the base offset used in
> address conversions, unifying the existing VF relative model (where
> mm_offset == vf_base) with the new PF absolute model.

but public xe_ggtt API already uses absolute addressing in PF and VF

>  The public
> xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> config code are unaffected.
> 
> xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> so the VF recovery path remains a single O(1) WRITE_ONCE pair.

maybe it's just me - but I can't figure out the real rationale for this
patch - what did I miss?

> 
> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
>  1 file changed, 101 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index a351c578b170..00a6cd2b8a51 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -137,6 +137,17 @@ struct xe_ggtt {
>  	const struct xe_ggtt_pt_ops *pt_ops;
>  	/** @mm: The memory manager used to manage individual GGTT allocations */
>  	struct drm_mm mm;
> +	/**
> +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> +	 */
> +	u64 mm_offset;
> +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> +	struct drm_mm_node reserved_bottom;
> +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> +	struct drm_mm_node reserved_top;

maybe all we need is to separate concepts of:

* raw GGTT - fixed range [0, 4GB)

from

* allocable GGTT - configurable sub-range [start, end)
  * [wopcm, GUC_TOP) on PF
  * [base, base+size) on VF

and then we can continue to use drm_mm.init(0, end-start) to manage
that [start, end) range in a common way on both PF and VF?


>  	/** @access_count: counts GGTT writes */
>  	unsigned int access_count;
>  	/** @wq: Dedicated unordered work queue to process node removals */
> @@ -318,6 +329,10 @@ static void ggtt_fini_early(struct drm_device *drm, void *arg)
>  {
>  	struct xe_ggtt *ggtt = arg;
>  
> +	if (drm_mm_node_allocated(&ggtt->reserved_top))
> +		drm_mm_remove_node(&ggtt->reserved_top);
> +	if (drm_mm_node_allocated(&ggtt->reserved_bottom))
> +		drm_mm_remove_node(&ggtt->reserved_bottom);
>  	destroy_workqueue(ggtt->wq);
>  	drm_mm_takedown(&ggtt->mm);
>  }
> @@ -354,16 +369,66 @@ static const struct xe_ggtt_pt_ops xelpg_pt_wa_ops = {
>  	.ggtt_get_pte = xe_ggtt_get_pte,
>  };
>  
> -static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size)
> +/*
> + * __xe_ggtt_init_early - Generic drm_mm initialisation for both PF and VF.
> + *
> + * @start and @size define the usable GGTT range exposed through the public API.
> + * @mm_offset is added to every drm_mm node address to obtain the hardware
> + * address (0 for PF where the drm_mm spans real addresses; @start for VF).
> + * @mm_size is the total span passed to drm_mm_init() (equals @size for VF;
> + * equals the full hardware GGTT size for PF).
> + */
> +static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size,
> +				 u64 mm_offset, u64 mm_size)
>  {
>  	ggtt->start = start;
>  	ggtt->size = size;
> -	drm_mm_init(&ggtt->mm, 0, size);
> +	ggtt->mm_offset = mm_offset;
> +	drm_mm_init(&ggtt->mm, 0, mm_size);
> +}
> +
> +/*
> + * __xe_ggtt_reserve_pf_nodes - Permanently reserve the forbidden GGTT zones.
> + *
> + * Must be called after __xe_ggtt_init_early() for the PF case.  Reserves
> + * [0, @wopcm) and [@guc_top, @total_size) so the allocator never hands them
> + * out.  On failure the drm_mm is torn down and the error is returned.
> + */
> +static int __xe_ggtt_reserve_pf_nodes(struct xe_ggtt *ggtt, u64 wopcm,
> +				      u64 guc_top, u64 total_size)
> +{
> +	int err;
> +
> +	/* Reserve [0, wopcm) — GuC cannot access below WOPCM */
> +	if (wopcm > 0) {
> +		ggtt->reserved_bottom.start = 0;
> +		ggtt->reserved_bottom.size = wopcm;
> +		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_bottom);
> +		if (WARN_ON(err))
> +			goto err_takedown;
> +	}
> +
> +	/* Reserve [guc_top, total_size) — GuC cannot access above GUC_GGTT_TOP */
> +	if (guc_top < total_size) {
> +		ggtt->reserved_top.start = guc_top;
> +		ggtt->reserved_top.size = total_size - guc_top;
> +		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_top);
> +		if (WARN_ON(err))
> +			goto err_remove_bottom;
> +	}
> +
> +	return 0;
> +
> +err_remove_bottom:
> +	drm_mm_remove_node(&ggtt->reserved_bottom);
> +err_takedown:
> +	drm_mm_takedown(&ggtt->mm);
> +	return err;
>  }
>  
>  int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 start, u32 size)
>  {
> -	__xe_ggtt_init_early(ggtt, start, size);
> +	__xe_ggtt_init_early(ggtt, start, size, start, size);
>  	return 0;
>  }
>  EXPORT_SYMBOL_IF_KUNIT(xe_ggtt_init_kunit);
> @@ -405,8 +470,13 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
>  			xe_tile_err(ggtt->tile, "Hardware reported no preallocated GSM\n");
>  			return -ENOMEM;
>  		}
> +		/*
> +		 * For PF, ggtt_size holds the full hardware GGTT size. The
> +		 * WOPCM and GUC_GGTT_TOP limits are enforced via permanently
> +		 * reserved drm_mm nodes rather than by capping the range.
> +		 */
>  		ggtt_start = wopcm;
> -		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start;
> +		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE;
>  	} else {
>  		ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
>  		ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
> @@ -423,9 +493,6 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
>  	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
>  		ggtt->flags |= XE_GGTT_FLAGS_64K;
>  
> -	if (ggtt_size + ggtt_start > GUC_GGTT_TOP)
> -		ggtt_size = GUC_GGTT_TOP - ggtt_start;
> -
>  	if (GRAPHICS_VERx100(xe) >= 1270)
>  		ggtt->pt_ops =
>  			(ggtt->tile->media_gt && XE_GT_WA(ggtt->tile->media_gt, 22019338487)) ||
> @@ -438,7 +505,19 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
>  	if (!ggtt->wq)
>  		return -ENOMEM;
>  
> -	__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size);
> +	if (!IS_SRIOV_VF(xe)) {
> +		__xe_ggtt_init_early(ggtt, ggtt_start, GUC_GGTT_TOP - ggtt_start,
> +				     0, ggtt_size);
> +		err = __xe_ggtt_reserve_pf_nodes(ggtt, ggtt_start, GUC_GGTT_TOP,
> +						 ggtt_size);
> +		if (err) {
> +			destroy_workqueue(ggtt->wq);
> +			return err;
> +		}
> +	} else {
> +		__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size,
> +				     ggtt_start, ggtt_size);
> +	}
>  
>  	err = drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt);
>  	if (err)
> @@ -459,7 +538,7 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
>  	/* Display may have allocated inside ggtt, so be careful with clearing here */
>  	mutex_lock(&ggtt->lock);
>  	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
> -		xe_ggtt_clear(ggtt, ggtt->start + start, end - start);
> +		xe_ggtt_clear(ggtt, ggtt->mm_offset + start, end - start);
>  
>  	xe_ggtt_invalidate(ggtt);
>  	mutex_unlock(&ggtt->lock);
> @@ -613,6 +692,7 @@ void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start)
>  
>  	/* pairs with READ_ONCE in xe_ggtt_node_addr() */
>  	WRITE_ONCE(ggtt->start, new_start);
> +	WRITE_ONCE(ggtt->mm_offset, new_start);
>  }
>  
>  static int xe_ggtt_insert_node_locked(struct xe_ggtt_node *node,
> @@ -815,20 +895,19 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
>  
>  	mutex_lock(&ggtt->lock);
>  	/*
> -	 * When inheriting the initial framebuffer, the framebuffer is
> -	 * physically located at VRAM address 0, and usually at GGTT address 0 too.
> -	 *
> -	 * The display code will ask for a GGTT allocation between end of BO and
> -	 * remainder of GGTT, unaware that the start is reserved by WOPCM.
> +	 * Convert caller-supplied hardware GGTT addresses to drm_mm-relative
> +	 * coordinates. For PF mm_offset is 0 so this is a no-op; callers pass
> +	 * real addresses and the reserved_bottom/top nodes prevent allocations
> +	 * in the forbidden regions. For VF, mm_offset equals the VF base so
> +	 * we convert to relative drm_mm space.
>  	 */
> -	if (start >= ggtt->start)
> -		start -= ggtt->start;
> +	if (start >= ggtt->mm_offset)
> +		start -= ggtt->mm_offset;
>  	else
>  		start = 0;
>  
> -	/* Should never happen, but since we handle start, fail graciously for end */
> -	if (end >= ggtt->start)
> -		end -= ggtt->start;
> +	if (end >= ggtt->mm_offset)
> +		end -= ggtt->mm_offset;
>  	else
>  		end = 0;
>  
> @@ -923,7 +1002,7 @@ u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare)
>  
>  	mutex_lock(&ggtt->lock);
>  	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
> -		hole_start = max(hole_start, ggtt->start);
> +		hole_start = max(hole_start, ggtt->mm_offset);
>  		hole_start = ALIGN(hole_start, alignment);
>  		hole_end = ALIGN_DOWN(hole_end, alignment);
>  		if (hole_start >= hole_end)
> @@ -1098,7 +1177,7 @@ u64 xe_ggtt_print_holes(struct xe_ggtt *ggtt, u64 alignment, struct drm_printer
>  
>  	mutex_lock(&ggtt->lock);
>  	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
> -		hole_start = max(hole_start, ggtt->start);
> +		hole_start = max(hole_start, ggtt->mm_offset);
>  		hole_start = ALIGN(hole_start, alignment);
>  		hole_end = ALIGN_DOWN(hole_end, alignment);
>  		if (hole_start >= hole_end)
> @@ -1152,7 +1231,7 @@ u64 xe_ggtt_read_pte(struct xe_ggtt *ggtt, u64 offset)
>  u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node)
>  {
>  	/* pairs with WRITE_ONCE in xe_ggtt_shift_nodes() */
> -	return node->base.start + READ_ONCE(node->ggtt->start);
> +	return node->base.start + READ_ONCE(node->ggtt->mm_offset);
>  }
>  
>  /**


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-05-28 22:11 ` [PATCH] " Michal Wajdeczko
@ 2026-05-28 22:28   ` Rodrigo Vivi
  2026-05-29 10:50     ` Ville Syrjälä
  0 siblings, 1 reply; 12+ messages in thread
From: Rodrigo Vivi @ 2026-05-28 22:28 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: intel-gfx, intel-xe, Ville Syrjälä, Maarten Lankhorst

On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> 
> 
> On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > range, with ggtt->start added on every address conversion to get the
> > actual hardware address.  Two consequences of that model were considered
> > "horrible hacks":
> > 
> >   - ggtt->start (the WOPCM offset) had to be carried around and added
> >     to every drm_mm result.
> 
> hmm, but this an internal detail of the xe_ggtt implementation, so why
> would someone else complain about it?
> 
> >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> 
> hmm, for the record, this GGTT cap on the top was added back in 2023
> 
> commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Date:   Wed Jun 14 10:47:54 2023 -0700
> 
>     drm/xe: limit GGTT size to GUC_GGTT_TOP
> 
> +        * The GuC address space is limited on both ends of the GGTT, because
> +        * the GuC shim HW redirects accesses to those addresses to other HW
> +        * areas instead of going through the GGTT. On the bottom end, the GuC
> +        * can't access offsets below the WOPCM size, while on the top side the
> +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> +        * checking each object to see if they are accessed by GuC or not, we
> +        * just exclude those areas from the allocator. Additionally, to
> +        * simplify the driver load, we use the maximum WOPCM size in this logic
> 
> >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> >     untouched during the initial clear.
> 
> and that likely will not be changed by this patch as after allocating 'two
> permanent zones', the drm_mm_for_each_hole will not iterate over them

right...

> 
> > 
> > Fix this for the PF case by initialising drm_mm over the full hardware
> > GGTT range [0, total_size) and permanently reserving the two forbidden
> > zones:
> > 
> >   - [0, wopcm)           — inaccessible below WOPCM
> >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> 
> that looks odds: why pretend to claim manageability of full [0, 4GB)
> of the GGTT and then immediately permanently reserve two end zones to
> end up with real [wopcm, GUC_TOP) which is what we already have?

yes...

> 
> > 
> > A new mm_offset field (zero for PF) carries the base offset used in
> > address conversions, unifying the existing VF relative model (where
> > mm_offset == vf_base) with the new PF absolute model.
> 
> but public xe_ggtt API already uses absolute addressing in PF and VF

I know...

> 
> >  The public
> > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > config code are unaffected.
> > 
> > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> 
> maybe it's just me - but I can't figure out the real rationale for this
> patch - what did I miss?

This series:
https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/

And more specifically the discussion in this patch:
https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/

> 
> > 
> > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> >  1 file changed, 101 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index a351c578b170..00a6cd2b8a51 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -137,6 +137,17 @@ struct xe_ggtt {
> >  	const struct xe_ggtt_pt_ops *pt_ops;
> >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> >  	struct drm_mm mm;
> > +	/**
> > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > +	 */
> > +	u64 mm_offset;
> > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > +	struct drm_mm_node reserved_bottom;
> > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > +	struct drm_mm_node reserved_top;
> 
> maybe all we need is to separate concepts of:
> 
> * raw GGTT - fixed range [0, 4GB)
> 
> from
> 
> * allocable GGTT - configurable sub-range [start, end)
>   * [wopcm, GUC_TOP) on PF
>   * [base, base+size) on VF
> 
> and then we can continue to use drm_mm.init(0, end-start) to manage
> that [start, end) range in a common way on both PF and VF?

we need to be able to use a ggtt buffer that comes out of this range,
so I'm afraid it doesn't solve all the cases.

> 
> 
> >  	/** @access_count: counts GGTT writes */
> >  	unsigned int access_count;
> >  	/** @wq: Dedicated unordered work queue to process node removals */
> > @@ -318,6 +329,10 @@ static void ggtt_fini_early(struct drm_device *drm, void *arg)
> >  {
> >  	struct xe_ggtt *ggtt = arg;
> >  
> > +	if (drm_mm_node_allocated(&ggtt->reserved_top))
> > +		drm_mm_remove_node(&ggtt->reserved_top);
> > +	if (drm_mm_node_allocated(&ggtt->reserved_bottom))
> > +		drm_mm_remove_node(&ggtt->reserved_bottom);
> >  	destroy_workqueue(ggtt->wq);
> >  	drm_mm_takedown(&ggtt->mm);
> >  }
> > @@ -354,16 +369,66 @@ static const struct xe_ggtt_pt_ops xelpg_pt_wa_ops = {
> >  	.ggtt_get_pte = xe_ggtt_get_pte,
> >  };
> >  
> > -static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size)
> > +/*
> > + * __xe_ggtt_init_early - Generic drm_mm initialisation for both PF and VF.
> > + *
> > + * @start and @size define the usable GGTT range exposed through the public API.
> > + * @mm_offset is added to every drm_mm node address to obtain the hardware
> > + * address (0 for PF where the drm_mm spans real addresses; @start for VF).
> > + * @mm_size is the total span passed to drm_mm_init() (equals @size for VF;
> > + * equals the full hardware GGTT size for PF).
> > + */
> > +static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size,
> > +				 u64 mm_offset, u64 mm_size)
> >  {
> >  	ggtt->start = start;
> >  	ggtt->size = size;
> > -	drm_mm_init(&ggtt->mm, 0, size);
> > +	ggtt->mm_offset = mm_offset;
> > +	drm_mm_init(&ggtt->mm, 0, mm_size);
> > +}
> > +
> > +/*
> > + * __xe_ggtt_reserve_pf_nodes - Permanently reserve the forbidden GGTT zones.
> > + *
> > + * Must be called after __xe_ggtt_init_early() for the PF case.  Reserves
> > + * [0, @wopcm) and [@guc_top, @total_size) so the allocator never hands them
> > + * out.  On failure the drm_mm is torn down and the error is returned.
> > + */
> > +static int __xe_ggtt_reserve_pf_nodes(struct xe_ggtt *ggtt, u64 wopcm,
> > +				      u64 guc_top, u64 total_size)
> > +{
> > +	int err;
> > +
> > +	/* Reserve [0, wopcm) — GuC cannot access below WOPCM */
> > +	if (wopcm > 0) {
> > +		ggtt->reserved_bottom.start = 0;
> > +		ggtt->reserved_bottom.size = wopcm;
> > +		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_bottom);
> > +		if (WARN_ON(err))
> > +			goto err_takedown;
> > +	}
> > +
> > +	/* Reserve [guc_top, total_size) — GuC cannot access above GUC_GGTT_TOP */
> > +	if (guc_top < total_size) {
> > +		ggtt->reserved_top.start = guc_top;
> > +		ggtt->reserved_top.size = total_size - guc_top;
> > +		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_top);
> > +		if (WARN_ON(err))
> > +			goto err_remove_bottom;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_remove_bottom:
> > +	drm_mm_remove_node(&ggtt->reserved_bottom);
> > +err_takedown:
> > +	drm_mm_takedown(&ggtt->mm);
> > +	return err;
> >  }
> >  
> >  int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 start, u32 size)
> >  {
> > -	__xe_ggtt_init_early(ggtt, start, size);
> > +	__xe_ggtt_init_early(ggtt, start, size, start, size);
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_IF_KUNIT(xe_ggtt_init_kunit);
> > @@ -405,8 +470,13 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> >  			xe_tile_err(ggtt->tile, "Hardware reported no preallocated GSM\n");
> >  			return -ENOMEM;
> >  		}
> > +		/*
> > +		 * For PF, ggtt_size holds the full hardware GGTT size. The
> > +		 * WOPCM and GUC_GGTT_TOP limits are enforced via permanently
> > +		 * reserved drm_mm nodes rather than by capping the range.
> > +		 */
> >  		ggtt_start = wopcm;
> > -		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start;
> > +		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE;
> >  	} else {
> >  		ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
> >  		ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
> > @@ -423,9 +493,6 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> >  	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> >  		ggtt->flags |= XE_GGTT_FLAGS_64K;
> >  
> > -	if (ggtt_size + ggtt_start > GUC_GGTT_TOP)
> > -		ggtt_size = GUC_GGTT_TOP - ggtt_start;
> > -
> >  	if (GRAPHICS_VERx100(xe) >= 1270)
> >  		ggtt->pt_ops =
> >  			(ggtt->tile->media_gt && XE_GT_WA(ggtt->tile->media_gt, 22019338487)) ||
> > @@ -438,7 +505,19 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> >  	if (!ggtt->wq)
> >  		return -ENOMEM;
> >  
> > -	__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size);
> > +	if (!IS_SRIOV_VF(xe)) {
> > +		__xe_ggtt_init_early(ggtt, ggtt_start, GUC_GGTT_TOP - ggtt_start,
> > +				     0, ggtt_size);
> > +		err = __xe_ggtt_reserve_pf_nodes(ggtt, ggtt_start, GUC_GGTT_TOP,
> > +						 ggtt_size);
> > +		if (err) {
> > +			destroy_workqueue(ggtt->wq);
> > +			return err;
> > +		}
> > +	} else {
> > +		__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size,
> > +				     ggtt_start, ggtt_size);
> > +	}
> >  
> >  	err = drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt);
> >  	if (err)
> > @@ -459,7 +538,7 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
> >  	/* Display may have allocated inside ggtt, so be careful with clearing here */
> >  	mutex_lock(&ggtt->lock);
> >  	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
> > -		xe_ggtt_clear(ggtt, ggtt->start + start, end - start);
> > +		xe_ggtt_clear(ggtt, ggtt->mm_offset + start, end - start);
> >  
> >  	xe_ggtt_invalidate(ggtt);
> >  	mutex_unlock(&ggtt->lock);
> > @@ -613,6 +692,7 @@ void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start)
> >  
> >  	/* pairs with READ_ONCE in xe_ggtt_node_addr() */
> >  	WRITE_ONCE(ggtt->start, new_start);
> > +	WRITE_ONCE(ggtt->mm_offset, new_start);
> >  }
> >  
> >  static int xe_ggtt_insert_node_locked(struct xe_ggtt_node *node,
> > @@ -815,20 +895,19 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> >  
> >  	mutex_lock(&ggtt->lock);
> >  	/*
> > -	 * When inheriting the initial framebuffer, the framebuffer is
> > -	 * physically located at VRAM address 0, and usually at GGTT address 0 too.
> > -	 *
> > -	 * The display code will ask for a GGTT allocation between end of BO and
> > -	 * remainder of GGTT, unaware that the start is reserved by WOPCM.
> > +	 * Convert caller-supplied hardware GGTT addresses to drm_mm-relative
> > +	 * coordinates. For PF mm_offset is 0 so this is a no-op; callers pass
> > +	 * real addresses and the reserved_bottom/top nodes prevent allocations
> > +	 * in the forbidden regions. For VF, mm_offset equals the VF base so
> > +	 * we convert to relative drm_mm space.
> >  	 */
> > -	if (start >= ggtt->start)
> > -		start -= ggtt->start;
> > +	if (start >= ggtt->mm_offset)
> > +		start -= ggtt->mm_offset;
> >  	else
> >  		start = 0;
> >  
> > -	/* Should never happen, but since we handle start, fail graciously for end */
> > -	if (end >= ggtt->start)
> > -		end -= ggtt->start;
> > +	if (end >= ggtt->mm_offset)
> > +		end -= ggtt->mm_offset;
> >  	else
> >  		end = 0;
> >  
> > @@ -923,7 +1002,7 @@ u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare)
> >  
> >  	mutex_lock(&ggtt->lock);
> >  	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
> > -		hole_start = max(hole_start, ggtt->start);
> > +		hole_start = max(hole_start, ggtt->mm_offset);
> >  		hole_start = ALIGN(hole_start, alignment);
> >  		hole_end = ALIGN_DOWN(hole_end, alignment);
> >  		if (hole_start >= hole_end)
> > @@ -1098,7 +1177,7 @@ u64 xe_ggtt_print_holes(struct xe_ggtt *ggtt, u64 alignment, struct drm_printer
> >  
> >  	mutex_lock(&ggtt->lock);
> >  	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
> > -		hole_start = max(hole_start, ggtt->start);
> > +		hole_start = max(hole_start, ggtt->mm_offset);
> >  		hole_start = ALIGN(hole_start, alignment);
> >  		hole_end = ALIGN_DOWN(hole_end, alignment);
> >  		if (hole_start >= hole_end)
> > @@ -1152,7 +1231,7 @@ u64 xe_ggtt_read_pte(struct xe_ggtt *ggtt, u64 offset)
> >  u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node)
> >  {
> >  	/* pairs with WRITE_ONCE in xe_ggtt_shift_nodes() */
> > -	return node->base.start + READ_ONCE(node->ggtt->start);
> > +	return node->base.start + READ_ONCE(node->ggtt->mm_offset);
> >  }
> >  
> >  /**
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-05-28 22:28   ` Rodrigo Vivi
@ 2026-05-29 10:50     ` Ville Syrjälä
  2026-05-29 21:03       ` Rodrigo Vivi
  0 siblings, 1 reply; 12+ messages in thread
From: Ville Syrjälä @ 2026-05-29 10:50 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Michal Wajdeczko, intel-gfx, intel-xe, Maarten Lankhorst

On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote:
> On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> > 
> > 
> > On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > > range, with ggtt->start added on every address conversion to get the
> > > actual hardware address.  Two consequences of that model were considered
> > > "horrible hacks":
> > > 
> > >   - ggtt->start (the WOPCM offset) had to be carried around and added
> > >     to every drm_mm result.
> > 
> > hmm, but this an internal detail of the xe_ggtt implementation, so why
> > would someone else complain about it?
> > 
> > >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> > 
> > hmm, for the record, this GGTT cap on the top was added back in 2023
> > 
> > commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> > Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Date:   Wed Jun 14 10:47:54 2023 -0700
> > 
> >     drm/xe: limit GGTT size to GUC_GGTT_TOP
> > 
> > +        * The GuC address space is limited on both ends of the GGTT, because
> > +        * the GuC shim HW redirects accesses to those addresses to other HW
> > +        * areas instead of going through the GGTT. On the bottom end, the GuC
> > +        * can't access offsets below the WOPCM size, while on the top side the
> > +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> > +        * checking each object to see if they are accessed by GuC or not, we
> > +        * just exclude those areas from the allocator. Additionally, to
> > +        * simplify the driver load, we use the maximum WOPCM size in this logic
> > 
> > >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> > >     untouched during the initial clear.
> > 
> > and that likely will not be changed by this patch as after allocating 'two
> > permanent zones', the drm_mm_for_each_hole will not iterate over them
> 
> right...
> 
> > 
> > > 
> > > Fix this for the PF case by initialising drm_mm over the full hardware
> > > GGTT range [0, total_size) and permanently reserving the two forbidden
> > > zones:
> > > 
> > >   - [0, wopcm)           — inaccessible below WOPCM
> > >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> > 
> > that looks odds: why pretend to claim manageability of full [0, 4GB)
> > of the GGTT and then immediately permanently reserve two end zones to
> > end up with real [wopcm, GUC_TOP) which is what we already have?
> 
> yes...
> 
> > 
> > > 
> > > A new mm_offset field (zero for PF) carries the base offset used in
> > > address conversions, unifying the existing VF relative model (where
> > > mm_offset == vf_base) with the new PF absolute model.
> > 
> > but public xe_ggtt API already uses absolute addressing in PF and VF
> 
> I know...
> 
> > 
> > >  The public
> > > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > > config code are unaffected.
> > > 
> > > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> > 
> > maybe it's just me - but I can't figure out the real rationale for this
> > patch - what did I miss?
> 
> This series:
> https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/
> 
> And more specifically the discussion in this patch:
> https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/
> 
> > 
> > > 
> > > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> > >  1 file changed, 101 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > index a351c578b170..00a6cd2b8a51 100644
> > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > @@ -137,6 +137,17 @@ struct xe_ggtt {
> > >  	const struct xe_ggtt_pt_ops *pt_ops;
> > >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> > >  	struct drm_mm mm;
> > > +	/**
> > > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > > +	 */
> > > +	u64 mm_offset;
> > > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > > +	struct drm_mm_node reserved_bottom;
> > > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > > +	struct drm_mm_node reserved_top;
> > 
> > maybe all we need is to separate concepts of:
> > 
> > * raw GGTT - fixed range [0, 4GB)
> > 
> > from
> > 
> > * allocable GGTT - configurable sub-range [start, end)
> >   * [wopcm, GUC_TOP) on PF
> >   * [base, base+size) on VF
> > 
> > and then we can continue to use drm_mm.init(0, end-start) to manage
> > that [start, end) range in a common way on both PF and VF?
> 
> we need to be able to use a ggtt buffer that comes out of this range,
> so I'm afraid it doesn't solve all the cases.

Basically what the display needs is:
1. specify where in ggtt the buffer was originally placed by the GOP,
   this may be partially or fully inside these GuC reserved ranges
2. bind the buffer to some acceptable location (assuming the original
   location wasn't acceptable) without overwriting the PTEs for the
   original location

I suppose this could be achieved even with this "mm doesn't cover the
ends" hack, but step 1 there becomes a bit dodgy because we can't
insert the mm node if it's fully outside the mm. I suppose it could 
still work if you hide it in a function that only validates the real
ggtt offsets, but then ignores the fact that the node can't be
inserted due to being fully inside those reserved ranges. And then
whatever cleans up that original mm node must also ignore the fact
that the node maybe wasn't even allocated. And also
xe_ggtt_initial_clear() will need special code to clear the
reserved ranges.

My original idea was that we'd just include the reserved regions
in the mm, and then the display could just keep the buffer at its
original location, and later the guc code can reserve what is
left over. So we could skip step 2 above completely. But after
a second thought we probably don't want to skip that step because
we might free the display bo later, at which point we might free
up some of the reserved ranges. So I guess we'd still want to keep
step 2. But I think it'd still result in less special cases in the
code. We'd just need the guc code to reserve what it needs, after
the display code has rebound the bo to an acceptable location.

So we'd end up with:
1. insert node for the bo's original ggtt location
2. rebind the display bo to an acceptable ggtt location
3. undo step 1
4. xe_ggtt_initial_clear() (now also clears the reserved ranges
   without any special code)
5. guc steals the reserved ranges explicitly

So only two special cases left really, and all the rest
of the code is blissfully unaware of any of it.

Hmm, although hibernation might still be a slight issue for
xe_ggtt_initial_clear(). As in how would the reserved regions
get cleared during resume from hibernation? I have no idea 
how the current xe ggtt code handles resume at all...

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-05-29 10:50     ` Ville Syrjälä
@ 2026-05-29 21:03       ` Rodrigo Vivi
  2026-06-03 13:19         ` Ville Syrjälä
  0 siblings, 1 reply; 12+ messages in thread
From: Rodrigo Vivi @ 2026-05-29 21:03 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Michal Wajdeczko, intel-gfx, intel-xe, Maarten Lankhorst

On Fri, May 29, 2026 at 01:50:34PM +0300, Ville Syrjälä wrote:
> On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote:
> > On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> > > 
> > > 
> > > On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > > > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > > > range, with ggtt->start added on every address conversion to get the
> > > > actual hardware address.  Two consequences of that model were considered
> > > > "horrible hacks":
> > > > 
> > > >   - ggtt->start (the WOPCM offset) had to be carried around and added
> > > >     to every drm_mm result.
> > > 
> > > hmm, but this an internal detail of the xe_ggtt implementation, so why
> > > would someone else complain about it?
> > > 
> > > >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> > > 
> > > hmm, for the record, this GGTT cap on the top was added back in 2023
> > > 
> > > commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> > > Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > Date:   Wed Jun 14 10:47:54 2023 -0700
> > > 
> > >     drm/xe: limit GGTT size to GUC_GGTT_TOP
> > > 
> > > +        * The GuC address space is limited on both ends of the GGTT, because
> > > +        * the GuC shim HW redirects accesses to those addresses to other HW
> > > +        * areas instead of going through the GGTT. On the bottom end, the GuC
> > > +        * can't access offsets below the WOPCM size, while on the top side the
> > > +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> > > +        * checking each object to see if they are accessed by GuC or not, we
> > > +        * just exclude those areas from the allocator. Additionally, to
> > > +        * simplify the driver load, we use the maximum WOPCM size in this logic
> > > 
> > > >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> > > >     untouched during the initial clear.
> > > 
> > > and that likely will not be changed by this patch as after allocating 'two
> > > permanent zones', the drm_mm_for_each_hole will not iterate over them
> > 
> > right...
> > 
> > > 
> > > > 
> > > > Fix this for the PF case by initialising drm_mm over the full hardware
> > > > GGTT range [0, total_size) and permanently reserving the two forbidden
> > > > zones:
> > > > 
> > > >   - [0, wopcm)           — inaccessible below WOPCM
> > > >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> > > 
> > > that looks odds: why pretend to claim manageability of full [0, 4GB)
> > > of the GGTT and then immediately permanently reserve two end zones to
> > > end up with real [wopcm, GUC_TOP) which is what we already have?
> > 
> > yes...
> > 
> > > 
> > > > 
> > > > A new mm_offset field (zero for PF) carries the base offset used in
> > > > address conversions, unifying the existing VF relative model (where
> > > > mm_offset == vf_base) with the new PF absolute model.
> > > 
> > > but public xe_ggtt API already uses absolute addressing in PF and VF
> > 
> > I know...
> > 
> > > 
> > > >  The public
> > > > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > > > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > > > config code are unaffected.
> > > > 
> > > > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > > > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> > > 
> > > maybe it's just me - but I can't figure out the real rationale for this
> > > patch - what did I miss?
> > 
> > This series:
> > https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/
> > 
> > And more specifically the discussion in this patch:
> > https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/
> > 
> > > 
> > > > 
> > > > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> > > >  1 file changed, 101 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > index a351c578b170..00a6cd2b8a51 100644
> > > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > @@ -137,6 +137,17 @@ struct xe_ggtt {
> > > >  	const struct xe_ggtt_pt_ops *pt_ops;
> > > >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> > > >  	struct drm_mm mm;
> > > > +	/**
> > > > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > > > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > > > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > > > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > > > +	 */
> > > > +	u64 mm_offset;
> > > > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > > > +	struct drm_mm_node reserved_bottom;
> > > > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > > > +	struct drm_mm_node reserved_top;
> > > 
> > > maybe all we need is to separate concepts of:
> > > 
> > > * raw GGTT - fixed range [0, 4GB)
> > > 
> > > from
> > > 
> > > * allocable GGTT - configurable sub-range [start, end)
> > >   * [wopcm, GUC_TOP) on PF
> > >   * [base, base+size) on VF
> > > 
> > > and then we can continue to use drm_mm.init(0, end-start) to manage
> > > that [start, end) range in a common way on both PF and VF?
> > 
> > we need to be able to use a ggtt buffer that comes out of this range,
> > so I'm afraid it doesn't solve all the cases.
> 
> Basically what the display needs is:
> 1. specify where in ggtt the buffer was originally placed by the GOP,
>    this may be partially or fully inside these GuC reserved ranges
> 2. bind the buffer to some acceptable location (assuming the original
>    location wasn't acceptable) without overwriting the PTEs for the
>    original location
> 
> I suppose this could be achieved even with this "mm doesn't cover the
> ends" hack, but step 1 there becomes a bit dodgy because we can't
> insert the mm node if it's fully outside the mm. I suppose it could 
> still work if you hide it in a function that only validates the real
> ggtt offsets, but then ignores the fact that the node can't be
> inserted due to being fully inside those reserved ranges. And then
> whatever cleans up that original mm node must also ignore the fact
> that the node maybe wasn't even allocated. And also
> xe_ggtt_initial_clear() will need special code to clear the
> reserved ranges.

right, so basically we could keep the xe_ggtt as is and provide
2 hooks:

1. one to reserve the portion of the BIOS FB that goes
inside our managed ggtt area
2. a special clear for this area

And in between you do the rebind with existing infrastructure
to an empty region?! Is this what you are thinking now?

> 
> My original idea was that we'd just include the reserved regions
> in the mm, and then the display could just keep the buffer at its
> original location, and later the guc code can reserve what is
> left over. So we could skip step 2 above completely. But after
> a second thought we probably don't want to skip that step because
> we might free the display bo later, at which point we might free
> up some of the reserved ranges. So I guess we'd still want to keep
> step 2. But I think it'd still result in less special cases in the
> code. We'd just need the guc code to reserve what it needs, after
> the display code has rebound the bo to an acceptable location.
> 
> So we'd end up with:
> 1. insert node for the bo's original ggtt location
> 2. rebind the display bo to an acceptable ggtt location
> 3. undo step 1
> 4. xe_ggtt_initial_clear() (now also clears the reserved ranges
>    without any special code)
> 5. guc steals the reserved ranges explicitly
> 
> So only two special cases left really, and all the rest
> of the code is blissfully unaware of any of it.
> 
> Hmm, although hibernation might still be a slight issue for
> xe_ggtt_initial_clear(). As in how would the reserved regions
> get cleared during resume from hibernation? I have no idea 
> how the current xe ggtt code handles resume at all...

The resume should only restore the pinned bo's one by one, nothing
special. So I guess if we keep the original code we are okay,
but if we start to managing the full range with the reserved areas
we might have some difficulties here on the way...

> 
> -- 
> Ville Syrjälä
> Intel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP.
  2026-05-27 14:45 [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF Rodrigo Vivi
  2026-05-27 15:48 ` ✗ i915.CI.BAT: failure for " Patchwork
  2026-05-28 22:11 ` [PATCH] " Michal Wajdeczko
@ 2026-06-03 12:57 ` Maarten Lankhorst
  2026-06-03 13:11   ` Ville Syrjälä
  2 siblings, 1 reply; 12+ messages in thread
From: Maarten Lankhorst @ 2026-06-03 12:57 UTC (permalink / raw)
  To: Rodrigo Vivi, intel-gfx, intel-xe
  Cc: Ville Syrjälä, Michal Wajdeczko

Hello,

On 5/27/26 16:45, Rodrigo Vivi wrote:
> The PF GGTT allocator was initialised over a relative [0, usable_size)
> range, with ggtt->start added on every address conversion to get the
> actual hardware address.  Two consequences of that model were considered
> "horrible hacks":
> 
>   - ggtt->start (the WOPCM offset) had to be carried around and added
>     to every drm_mm result.
This is still the case, only we add a separate variable now that is usually 0.
We always have to ensure that this is correctly accounted
for, otherwise we get subtle bugs, so there's no reason not to use it,
if it's already used everywhere.

Whether it's 0 or wopcm_start doesn't change the result.

>   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
>     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
>     untouched during the initial clear.
This is a valid concern, but it's very easily be fixed inside the initial clear
by calling xe_ggtt_clear once more with a comment for !VF case.

Something like below is enough to fix it. Adding a reserved region at the end/start
is not the way to do so and just adds more complications.
---8<---
When initialising GGTT, we never clear the part above GUC_GGTT_TOP.
Add a member to &xe_ggtt that holds the full GGTT size, and clear it
during the pass at xe_ggtt_initial_clear.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
---
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 8ec23862477fc..497b99a932f0d 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -115,6 +115,8 @@ struct xe_ggtt {
 	u64 start;
 	/** @size: Total usable size of this GGTT */
 	u64 size;
+	/** @entire_size: complete size of the accessible GGTT, reserved regions inclusive */
+	u64 entire_size;
 	/**
 	 * @flags: Flags for this GGTT.
 	 * Acceptable flags:
@@ -406,7 +408,8 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
 			return -ENOMEM;
 		}
 		ggtt_start = wopcm;
-		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start;
+		ggtt->entire_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE;
+		ggtt_size = ggtt->entire_size - ggtt_start;
 	} else {
 		ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
 		ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
@@ -417,6 +420,7 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
 				    ggtt_start, ggtt_start + ggtt_size - 1);
 			return -ERANGE;
 		}
+		ggtt->entire_size = ggtt_size;
 	}
 
 	ggtt->gsm = ggtt->tile->mmio.regs + SZ_8M;
@@ -461,6 +465,10 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
 	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
 		xe_ggtt_clear(ggtt, ggtt->start + start, end - start);
 
+	end = ggtt->start + ggtt->size;
+	if (ggtt->entire_size > end)
+		xe_ggtt_clear(ggtt, end, ggtt->entire_size - end);
+
 	xe_ggtt_invalidate(ggtt);
 	mutex_unlock(&ggtt->lock);
 }


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP.
  2026-06-03 12:57 ` [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP Maarten Lankhorst
@ 2026-06-03 13:11   ` Ville Syrjälä
  0 siblings, 0 replies; 12+ messages in thread
From: Ville Syrjälä @ 2026-06-03 13:11 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: Rodrigo Vivi, intel-gfx, intel-xe, Michal Wajdeczko

On Wed, Jun 03, 2026 at 02:57:48PM +0200, Maarten Lankhorst wrote:
> Hello,
> 
> On 5/27/26 16:45, Rodrigo Vivi wrote:
> > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > range, with ggtt->start added on every address conversion to get the
> > actual hardware address.  Two consequences of that model were considered
> > "horrible hacks":
> > 
> >   - ggtt->start (the WOPCM offset) had to be carried around and added
> >     to every drm_mm result.
> This is still the case, only we add a separate variable now that is usually 0.
> We always have to ensure that this is correctly accounted
> for, otherwise we get subtle bugs, so there's no reason not to use it,
> if it's already used everywhere.
> 
> Whether it's 0 or wopcm_start doesn't change the result.
> 
> >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> >     untouched during the initial clear.
> This is a valid concern, but it's very easily be fixed inside the initial clear
> by calling xe_ggtt_clear once more with a comment for !VF case.
> 
> Something like below is enough to fix it. Adding a reserved region at the end/start
> is not the way to do so and just adds more complications.
> ---8<---
> When initialising GGTT, we never clear the part above GUC_GGTT_TOP.
> Add a member to &xe_ggtt that holds the full GGTT size, and clear it
> during the pass at xe_ggtt_initial_clear.
> 
> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
> ---
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index 8ec23862477fc..497b99a932f0d 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -115,6 +115,8 @@ struct xe_ggtt {
>  	u64 start;
>  	/** @size: Total usable size of this GGTT */
>  	u64 size;
> +	/** @entire_size: complete size of the accessible GGTT, reserved regions inclusive */
> +	u64 entire_size;
>  	/**
>  	 * @flags: Flags for this GGTT.
>  	 * Acceptable flags:
> @@ -406,7 +408,8 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
>  			return -ENOMEM;
>  		}
>  		ggtt_start = wopcm;
> -		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start;
> +		ggtt->entire_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE;
> +		ggtt_size = ggtt->entire_size - ggtt_start;
>  	} else {
>  		ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
>  		ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
> @@ -417,6 +420,7 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
>  				    ggtt_start, ggtt_start + ggtt_size - 1);
>  			return -ERANGE;
>  		}
> +		ggtt->entire_size = ggtt_size;
>  	}
>  
>  	ggtt->gsm = ggtt->tile->mmio.regs + SZ_8M;
> @@ -461,6 +465,10 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
>  	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
>  		xe_ggtt_clear(ggtt, ggtt->start + start, end - start);
>  
> +	end = ggtt->start + ggtt->size;
> +	if (ggtt->entire_size > end)
> +		xe_ggtt_clear(ggtt, end, ggtt->entire_size - end);

You'll need the same thing for the 0 to ggtt->start range.

> +
>  	xe_ggtt_invalidate(ggtt);
>  	mutex_unlock(&ggtt->lock);
>  }

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-05-29 21:03       ` Rodrigo Vivi
@ 2026-06-03 13:19         ` Ville Syrjälä
  2026-06-03 13:44           ` Rodrigo Vivi
  0 siblings, 1 reply; 12+ messages in thread
From: Ville Syrjälä @ 2026-06-03 13:19 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Michal Wajdeczko, intel-gfx, intel-xe, Maarten Lankhorst

On Fri, May 29, 2026 at 05:03:55PM -0400, Rodrigo Vivi wrote:
> On Fri, May 29, 2026 at 01:50:34PM +0300, Ville Syrjälä wrote:
> > On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote:
> > > On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> > > > 
> > > > 
> > > > On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > > > > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > > > > range, with ggtt->start added on every address conversion to get the
> > > > > actual hardware address.  Two consequences of that model were considered
> > > > > "horrible hacks":
> > > > > 
> > > > >   - ggtt->start (the WOPCM offset) had to be carried around and added
> > > > >     to every drm_mm result.
> > > > 
> > > > hmm, but this an internal detail of the xe_ggtt implementation, so why
> > > > would someone else complain about it?
> > > > 
> > > > >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> > > > 
> > > > hmm, for the record, this GGTT cap on the top was added back in 2023
> > > > 
> > > > commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> > > > Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > Date:   Wed Jun 14 10:47:54 2023 -0700
> > > > 
> > > >     drm/xe: limit GGTT size to GUC_GGTT_TOP
> > > > 
> > > > +        * The GuC address space is limited on both ends of the GGTT, because
> > > > +        * the GuC shim HW redirects accesses to those addresses to other HW
> > > > +        * areas instead of going through the GGTT. On the bottom end, the GuC
> > > > +        * can't access offsets below the WOPCM size, while on the top side the
> > > > +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> > > > +        * checking each object to see if they are accessed by GuC or not, we
> > > > +        * just exclude those areas from the allocator. Additionally, to
> > > > +        * simplify the driver load, we use the maximum WOPCM size in this logic
> > > > 
> > > > >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> > > > >     untouched during the initial clear.
> > > > 
> > > > and that likely will not be changed by this patch as after allocating 'two
> > > > permanent zones', the drm_mm_for_each_hole will not iterate over them
> > > 
> > > right...
> > > 
> > > > 
> > > > > 
> > > > > Fix this for the PF case by initialising drm_mm over the full hardware
> > > > > GGTT range [0, total_size) and permanently reserving the two forbidden
> > > > > zones:
> > > > > 
> > > > >   - [0, wopcm)           — inaccessible below WOPCM
> > > > >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> > > > 
> > > > that looks odds: why pretend to claim manageability of full [0, 4GB)
> > > > of the GGTT and then immediately permanently reserve two end zones to
> > > > end up with real [wopcm, GUC_TOP) which is what we already have?
> > > 
> > > yes...
> > > 
> > > > 
> > > > > 
> > > > > A new mm_offset field (zero for PF) carries the base offset used in
> > > > > address conversions, unifying the existing VF relative model (where
> > > > > mm_offset == vf_base) with the new PF absolute model.
> > > > 
> > > > but public xe_ggtt API already uses absolute addressing in PF and VF
> > > 
> > > I know...
> > > 
> > > > 
> > > > >  The public
> > > > > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > > > > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > > > > config code are unaffected.
> > > > > 
> > > > > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > > > > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> > > > 
> > > > maybe it's just me - but I can't figure out the real rationale for this
> > > > patch - what did I miss?
> > > 
> > > This series:
> > > https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/
> > > 
> > > And more specifically the discussion in this patch:
> > > https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/
> > > 
> > > > 
> > > > > 
> > > > > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> > > > >  1 file changed, 101 insertions(+), 22 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > index a351c578b170..00a6cd2b8a51 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > @@ -137,6 +137,17 @@ struct xe_ggtt {
> > > > >  	const struct xe_ggtt_pt_ops *pt_ops;
> > > > >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> > > > >  	struct drm_mm mm;
> > > > > +	/**
> > > > > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > > > > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > > > > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > > > > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > > > > +	 */
> > > > > +	u64 mm_offset;
> > > > > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > > > > +	struct drm_mm_node reserved_bottom;
> > > > > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > > > > +	struct drm_mm_node reserved_top;
> > > > 
> > > > maybe all we need is to separate concepts of:
> > > > 
> > > > * raw GGTT - fixed range [0, 4GB)
> > > > 
> > > > from
> > > > 
> > > > * allocable GGTT - configurable sub-range [start, end)
> > > >   * [wopcm, GUC_TOP) on PF
> > > >   * [base, base+size) on VF
> > > > 
> > > > and then we can continue to use drm_mm.init(0, end-start) to manage
> > > > that [start, end) range in a common way on both PF and VF?
> > > 
> > > we need to be able to use a ggtt buffer that comes out of this range,
> > > so I'm afraid it doesn't solve all the cases.
> > 
> > Basically what the display needs is:
> > 1. specify where in ggtt the buffer was originally placed by the GOP,
> >    this may be partially or fully inside these GuC reserved ranges
> > 2. bind the buffer to some acceptable location (assuming the original
> >    location wasn't acceptable) without overwriting the PTEs for the
> >    original location
> > 
> > I suppose this could be achieved even with this "mm doesn't cover the
> > ends" hack, but step 1 there becomes a bit dodgy because we can't
> > insert the mm node if it's fully outside the mm. I suppose it could 
> > still work if you hide it in a function that only validates the real
> > ggtt offsets, but then ignores the fact that the node can't be
> > inserted due to being fully inside those reserved ranges. And then
> > whatever cleans up that original mm node must also ignore the fact
> > that the node maybe wasn't even allocated. And also
> > xe_ggtt_initial_clear() will need special code to clear the
> > reserved ranges.
> 
> right, so basically we could keep the xe_ggtt as is and provide
> 2 hooks:
> 
> 1. one to reserve the portion of the BIOS FB that goes
> inside our managed ggtt area
> 2. a special clear for this area
> 
> And in between you do the rebind with existing infrastructure
> to an empty region?! Is this what you are thinking now?
> 
> > 
> > My original idea was that we'd just include the reserved regions
> > in the mm, and then the display could just keep the buffer at its
> > original location, and later the guc code can reserve what is
> > left over. So we could skip step 2 above completely. But after
> > a second thought we probably don't want to skip that step because
> > we might free the display bo later, at which point we might free
> > up some of the reserved ranges. So I guess we'd still want to keep
> > step 2. But I think it'd still result in less special cases in the
> > code. We'd just need the guc code to reserve what it needs, after
> > the display code has rebound the bo to an acceptable location.
> > 
> > So we'd end up with:
> > 1. insert node for the bo's original ggtt location
> > 2. rebind the display bo to an acceptable ggtt location
> > 3. undo step 1
> > 4. xe_ggtt_initial_clear() (now also clears the reserved ranges
> >    without any special code)
> > 5. guc steals the reserved ranges explicitly
> > 
> > So only two special cases left really, and all the rest
> > of the code is blissfully unaware of any of it.
> > 
> > Hmm, although hibernation might still be a slight issue for
> > xe_ggtt_initial_clear(). As in how would the reserved regions
> > get cleared during resume from hibernation? I have no idea 
> > how the current xe ggtt code handles resume at all...
> 
> The resume should only restore the pinned bo's one by one, nothing
> special.

Looks like currently xe_ggtt_initial_clear() is never even called
during resume from hibernation, so in that case parts of the GGTT
will be left with whatever garbage the GOP put there. So that's
one thing that needs fixing.

> So I guess if we keep the original code we are okay,
> but if we start to managing the full range with the reserved areas
> we might have some difficulties here on the way...

If we had a full range mm I suppose we'd need a bit of special code
to remove the reserved nodes before xe_ggtt_initial_clear() gets
called, at least for the resume from hibernation case.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-06-03 13:19         ` Ville Syrjälä
@ 2026-06-03 13:44           ` Rodrigo Vivi
  2026-06-03 14:54             ` Ville Syrjälä
  2026-06-10 11:26             ` Maarten Lankhorst
  0 siblings, 2 replies; 12+ messages in thread
From: Rodrigo Vivi @ 2026-06-03 13:44 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Michal Wajdeczko, intel-gfx, intel-xe, Maarten Lankhorst

On Wed, Jun 03, 2026 at 04:19:41PM +0300, Ville Syrjälä wrote:
> On Fri, May 29, 2026 at 05:03:55PM -0400, Rodrigo Vivi wrote:
> > On Fri, May 29, 2026 at 01:50:34PM +0300, Ville Syrjälä wrote:
> > > On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote:
> > > > On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> > > > > 
> > > > > 
> > > > > On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > > > > > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > > > > > range, with ggtt->start added on every address conversion to get the
> > > > > > actual hardware address.  Two consequences of that model were considered
> > > > > > "horrible hacks":
> > > > > > 
> > > > > >   - ggtt->start (the WOPCM offset) had to be carried around and added
> > > > > >     to every drm_mm result.
> > > > > 
> > > > > hmm, but this an internal detail of the xe_ggtt implementation, so why
> > > > > would someone else complain about it?
> > > > > 
> > > > > >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> > > > > 
> > > > > hmm, for the record, this GGTT cap on the top was added back in 2023
> > > > > 
> > > > > commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> > > > > Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > > Date:   Wed Jun 14 10:47:54 2023 -0700
> > > > > 
> > > > >     drm/xe: limit GGTT size to GUC_GGTT_TOP
> > > > > 
> > > > > +        * The GuC address space is limited on both ends of the GGTT, because
> > > > > +        * the GuC shim HW redirects accesses to those addresses to other HW
> > > > > +        * areas instead of going through the GGTT. On the bottom end, the GuC
> > > > > +        * can't access offsets below the WOPCM size, while on the top side the
> > > > > +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> > > > > +        * checking each object to see if they are accessed by GuC or not, we
> > > > > +        * just exclude those areas from the allocator. Additionally, to
> > > > > +        * simplify the driver load, we use the maximum WOPCM size in this logic
> > > > > 
> > > > > >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> > > > > >     untouched during the initial clear.
> > > > > 
> > > > > and that likely will not be changed by this patch as after allocating 'two
> > > > > permanent zones', the drm_mm_for_each_hole will not iterate over them
> > > > 
> > > > right...
> > > > 
> > > > > 
> > > > > > 
> > > > > > Fix this for the PF case by initialising drm_mm over the full hardware
> > > > > > GGTT range [0, total_size) and permanently reserving the two forbidden
> > > > > > zones:
> > > > > > 
> > > > > >   - [0, wopcm)           — inaccessible below WOPCM
> > > > > >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> > > > > 
> > > > > that looks odds: why pretend to claim manageability of full [0, 4GB)
> > > > > of the GGTT and then immediately permanently reserve two end zones to
> > > > > end up with real [wopcm, GUC_TOP) which is what we already have?
> > > > 
> > > > yes...
> > > > 
> > > > > 
> > > > > > 
> > > > > > A new mm_offset field (zero for PF) carries the base offset used in
> > > > > > address conversions, unifying the existing VF relative model (where
> > > > > > mm_offset == vf_base) with the new PF absolute model.
> > > > > 
> > > > > but public xe_ggtt API already uses absolute addressing in PF and VF
> > > > 
> > > > I know...
> > > > 
> > > > > 
> > > > > >  The public
> > > > > > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > > > > > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > > > > > config code are unaffected.
> > > > > > 
> > > > > > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > > > > > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> > > > > 
> > > > > maybe it's just me - but I can't figure out the real rationale for this
> > > > > patch - what did I miss?
> > > > 
> > > > This series:
> > > > https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/
> > > > 
> > > > And more specifically the discussion in this patch:
> > > > https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/
> > > > 
> > > > > 
> > > > > > 
> > > > > > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> > > > > >  1 file changed, 101 insertions(+), 22 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > > index a351c578b170..00a6cd2b8a51 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > > @@ -137,6 +137,17 @@ struct xe_ggtt {
> > > > > >  	const struct xe_ggtt_pt_ops *pt_ops;
> > > > > >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> > > > > >  	struct drm_mm mm;
> > > > > > +	/**
> > > > > > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > > > > > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > > > > > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > > > > > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > > > > > +	 */
> > > > > > +	u64 mm_offset;
> > > > > > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > > > > > +	struct drm_mm_node reserved_bottom;
> > > > > > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > > > > > +	struct drm_mm_node reserved_top;
> > > > > 
> > > > > maybe all we need is to separate concepts of:
> > > > > 
> > > > > * raw GGTT - fixed range [0, 4GB)
> > > > > 
> > > > > from
> > > > > 
> > > > > * allocable GGTT - configurable sub-range [start, end)
> > > > >   * [wopcm, GUC_TOP) on PF
> > > > >   * [base, base+size) on VF
> > > > > 
> > > > > and then we can continue to use drm_mm.init(0, end-start) to manage
> > > > > that [start, end) range in a common way on both PF and VF?
> > > > 
> > > > we need to be able to use a ggtt buffer that comes out of this range,
> > > > so I'm afraid it doesn't solve all the cases.
> > > 
> > > Basically what the display needs is:
> > > 1. specify where in ggtt the buffer was originally placed by the GOP,
> > >    this may be partially or fully inside these GuC reserved ranges
> > > 2. bind the buffer to some acceptable location (assuming the original
> > >    location wasn't acceptable) without overwriting the PTEs for the
> > >    original location
> > > 
> > > I suppose this could be achieved even with this "mm doesn't cover the
> > > ends" hack, but step 1 there becomes a bit dodgy because we can't
> > > insert the mm node if it's fully outside the mm. I suppose it could 
> > > still work if you hide it in a function that only validates the real
> > > ggtt offsets, but then ignores the fact that the node can't be
> > > inserted due to being fully inside those reserved ranges. And then
> > > whatever cleans up that original mm node must also ignore the fact
> > > that the node maybe wasn't even allocated. And also
> > > xe_ggtt_initial_clear() will need special code to clear the
> > > reserved ranges.
> > 
> > right, so basically we could keep the xe_ggtt as is and provide
> > 2 hooks:
> > 
> > 1. one to reserve the portion of the BIOS FB that goes
> > inside our managed ggtt area
> > 2. a special clear for this area
> > 
> > And in between you do the rebind with existing infrastructure
> > to an empty region?! Is this what you are thinking now?
> > 
> > > 
> > > My original idea was that we'd just include the reserved regions
> > > in the mm, and then the display could just keep the buffer at its
> > > original location, and later the guc code can reserve what is
> > > left over. So we could skip step 2 above completely. But after
> > > a second thought we probably don't want to skip that step because
> > > we might free the display bo later, at which point we might free
> > > up some of the reserved ranges. So I guess we'd still want to keep
> > > step 2. But I think it'd still result in less special cases in the
> > > code. We'd just need the guc code to reserve what it needs, after
> > > the display code has rebound the bo to an acceptable location.
> > > 
> > > So we'd end up with:
> > > 1. insert node for the bo's original ggtt location
> > > 2. rebind the display bo to an acceptable ggtt location
> > > 3. undo step 1
> > > 4. xe_ggtt_initial_clear() (now also clears the reserved ranges
> > >    without any special code)
> > > 5. guc steals the reserved ranges explicitly
> > > 
> > > So only two special cases left really, and all the rest
> > > of the code is blissfully unaware of any of it.
> > > 
> > > Hmm, although hibernation might still be a slight issue for
> > > xe_ggtt_initial_clear(). As in how would the reserved regions
> > > get cleared during resume from hibernation? I have no idea 
> > > how the current xe ggtt code handles resume at all...
> > 
> > The resume should only restore the pinned bo's one by one, nothing
> > special.
> 
> Looks like currently xe_ggtt_initial_clear() is never even called
> during resume from hibernation, so in that case parts of the GGTT
> will be left with whatever garbage the GOP put there. So that's
> one thing that needs fixing.
> 
> > So I guess if we keep the original code we are okay,
> > but if we start to managing the full range with the reserved areas
> > we might have some difficulties here on the way...
> 
> If we had a full range mm I suppose we'd need a bit of special code
> to remove the reserved nodes before xe_ggtt_initial_clear() gets
> called, at least for the resume from hibernation case.

it looks like Maarten suggestion fix the clear portion.
But for the steps 1 and 3 above we would need a special reservation
with a node area only within our managed area?!

something like (for step 1):

mm_start = max(start, ggtt->start);
mm_end = min(start + size, ggtt->start + ggtt->size);

node->base.start = mm_start - ggtt->start;
node->base.size = mm_end - mm_start;

drm_mm_reserve_node(&ggtt->mm, &node->base);

and another special function to delete this special node
if needed to be created?

Or what do you have on mind for the full area? I believe the full
area is this patch, but with some additions anyway since we need
to handle this buffer plus ensure it gets reserved when we don't
have it...

I'd like to avoid complications like the phys offset addition or
the full range if possible.

Could you please incorporate something simple in a v2 of your series?

Thanks,
Rodrigo.

> 
> -- 
> Ville Syrjälä
> Intel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-06-03 13:44           ` Rodrigo Vivi
@ 2026-06-03 14:54             ` Ville Syrjälä
  2026-06-10 11:26             ` Maarten Lankhorst
  1 sibling, 0 replies; 12+ messages in thread
From: Ville Syrjälä @ 2026-06-03 14:54 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Michal Wajdeczko, intel-gfx, intel-xe, Maarten Lankhorst

On Wed, Jun 03, 2026 at 09:44:17AM -0400, Rodrigo Vivi wrote:
> On Wed, Jun 03, 2026 at 04:19:41PM +0300, Ville Syrjälä wrote:
> > On Fri, May 29, 2026 at 05:03:55PM -0400, Rodrigo Vivi wrote:
> > > On Fri, May 29, 2026 at 01:50:34PM +0300, Ville Syrjälä wrote:
> > > > On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote:
> > > > > On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> > > > > > 
> > > > > > 
> > > > > > On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > > > > > > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > > > > > > range, with ggtt->start added on every address conversion to get the
> > > > > > > actual hardware address.  Two consequences of that model were considered
> > > > > > > "horrible hacks":
> > > > > > > 
> > > > > > >   - ggtt->start (the WOPCM offset) had to be carried around and added
> > > > > > >     to every drm_mm result.
> > > > > > 
> > > > > > hmm, but this an internal detail of the xe_ggtt implementation, so why
> > > > > > would someone else complain about it?
> > > > > > 
> > > > > > >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> > > > > > 
> > > > > > hmm, for the record, this GGTT cap on the top was added back in 2023
> > > > > > 
> > > > > > commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> > > > > > Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > > > Date:   Wed Jun 14 10:47:54 2023 -0700
> > > > > > 
> > > > > >     drm/xe: limit GGTT size to GUC_GGTT_TOP
> > > > > > 
> > > > > > +        * The GuC address space is limited on both ends of the GGTT, because
> > > > > > +        * the GuC shim HW redirects accesses to those addresses to other HW
> > > > > > +        * areas instead of going through the GGTT. On the bottom end, the GuC
> > > > > > +        * can't access offsets below the WOPCM size, while on the top side the
> > > > > > +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> > > > > > +        * checking each object to see if they are accessed by GuC or not, we
> > > > > > +        * just exclude those areas from the allocator. Additionally, to
> > > > > > +        * simplify the driver load, we use the maximum WOPCM size in this logic
> > > > > > 
> > > > > > >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> > > > > > >     untouched during the initial clear.
> > > > > > 
> > > > > > and that likely will not be changed by this patch as after allocating 'two
> > > > > > permanent zones', the drm_mm_for_each_hole will not iterate over them
> > > > > 
> > > > > right...
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Fix this for the PF case by initialising drm_mm over the full hardware
> > > > > > > GGTT range [0, total_size) and permanently reserving the two forbidden
> > > > > > > zones:
> > > > > > > 
> > > > > > >   - [0, wopcm)           — inaccessible below WOPCM
> > > > > > >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> > > > > > 
> > > > > > that looks odds: why pretend to claim manageability of full [0, 4GB)
> > > > > > of the GGTT and then immediately permanently reserve two end zones to
> > > > > > end up with real [wopcm, GUC_TOP) which is what we already have?
> > > > > 
> > > > > yes...
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > A new mm_offset field (zero for PF) carries the base offset used in
> > > > > > > address conversions, unifying the existing VF relative model (where
> > > > > > > mm_offset == vf_base) with the new PF absolute model.
> > > > > > 
> > > > > > but public xe_ggtt API already uses absolute addressing in PF and VF
> > > > > 
> > > > > I know...
> > > > > 
> > > > > > 
> > > > > > >  The public
> > > > > > > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > > > > > > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > > > > > > config code are unaffected.
> > > > > > > 
> > > > > > > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > > > > > > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> > > > > > 
> > > > > > maybe it's just me - but I can't figure out the real rationale for this
> > > > > > patch - what did I miss?
> > > > > 
> > > > > This series:
> > > > > https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/
> > > > > 
> > > > > And more specifically the discussion in this patch:
> > > > > https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> > > > > > >  1 file changed, 101 insertions(+), 22 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > > > index a351c578b170..00a6cd2b8a51 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > > > > > @@ -137,6 +137,17 @@ struct xe_ggtt {
> > > > > > >  	const struct xe_ggtt_pt_ops *pt_ops;
> > > > > > >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> > > > > > >  	struct drm_mm mm;
> > > > > > > +	/**
> > > > > > > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > > > > > > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > > > > > > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > > > > > > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > > > > > > +	 */
> > > > > > > +	u64 mm_offset;
> > > > > > > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > > > > > > +	struct drm_mm_node reserved_bottom;
> > > > > > > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > > > > > > +	struct drm_mm_node reserved_top;
> > > > > > 
> > > > > > maybe all we need is to separate concepts of:
> > > > > > 
> > > > > > * raw GGTT - fixed range [0, 4GB)
> > > > > > 
> > > > > > from
> > > > > > 
> > > > > > * allocable GGTT - configurable sub-range [start, end)
> > > > > >   * [wopcm, GUC_TOP) on PF
> > > > > >   * [base, base+size) on VF
> > > > > > 
> > > > > > and then we can continue to use drm_mm.init(0, end-start) to manage
> > > > > > that [start, end) range in a common way on both PF and VF?
> > > > > 
> > > > > we need to be able to use a ggtt buffer that comes out of this range,
> > > > > so I'm afraid it doesn't solve all the cases.
> > > > 
> > > > Basically what the display needs is:
> > > > 1. specify where in ggtt the buffer was originally placed by the GOP,
> > > >    this may be partially or fully inside these GuC reserved ranges
> > > > 2. bind the buffer to some acceptable location (assuming the original
> > > >    location wasn't acceptable) without overwriting the PTEs for the
> > > >    original location
> > > > 
> > > > I suppose this could be achieved even with this "mm doesn't cover the
> > > > ends" hack, but step 1 there becomes a bit dodgy because we can't
> > > > insert the mm node if it's fully outside the mm. I suppose it could 
> > > > still work if you hide it in a function that only validates the real
> > > > ggtt offsets, but then ignores the fact that the node can't be
> > > > inserted due to being fully inside those reserved ranges. And then
> > > > whatever cleans up that original mm node must also ignore the fact
> > > > that the node maybe wasn't even allocated. And also
> > > > xe_ggtt_initial_clear() will need special code to clear the
> > > > reserved ranges.
> > > 
> > > right, so basically we could keep the xe_ggtt as is and provide
> > > 2 hooks:
> > > 
> > > 1. one to reserve the portion of the BIOS FB that goes
> > > inside our managed ggtt area
> > > 2. a special clear for this area
> > > 
> > > And in between you do the rebind with existing infrastructure
> > > to an empty region?! Is this what you are thinking now?
> > > 
> > > > 
> > > > My original idea was that we'd just include the reserved regions
> > > > in the mm, and then the display could just keep the buffer at its
> > > > original location, and later the guc code can reserve what is
> > > > left over. So we could skip step 2 above completely. But after
> > > > a second thought we probably don't want to skip that step because
> > > > we might free the display bo later, at which point we might free
> > > > up some of the reserved ranges. So I guess we'd still want to keep
> > > > step 2. But I think it'd still result in less special cases in the
> > > > code. We'd just need the guc code to reserve what it needs, after
> > > > the display code has rebound the bo to an acceptable location.
> > > > 
> > > > So we'd end up with:
> > > > 1. insert node for the bo's original ggtt location
> > > > 2. rebind the display bo to an acceptable ggtt location
> > > > 3. undo step 1
> > > > 4. xe_ggtt_initial_clear() (now also clears the reserved ranges
> > > >    without any special code)
> > > > 5. guc steals the reserved ranges explicitly
> > > > 
> > > > So only two special cases left really, and all the rest
> > > > of the code is blissfully unaware of any of it.
> > > > 
> > > > Hmm, although hibernation might still be a slight issue for
> > > > xe_ggtt_initial_clear(). As in how would the reserved regions
> > > > get cleared during resume from hibernation? I have no idea 
> > > > how the current xe ggtt code handles resume at all...
> > > 
> > > The resume should only restore the pinned bo's one by one, nothing
> > > special.
> > 
> > Looks like currently xe_ggtt_initial_clear() is never even called
> > during resume from hibernation, so in that case parts of the GGTT
> > will be left with whatever garbage the GOP put there. So that's
> > one thing that needs fixing.
> > 
> > > So I guess if we keep the original code we are okay,
> > > but if we start to managing the full range with the reserved areas
> > > we might have some difficulties here on the way...
> > 
> > If we had a full range mm I suppose we'd need a bit of special code
> > to remove the reserved nodes before xe_ggtt_initial_clear() gets
> > called, at least for the resume from hibernation case.
> 
> it looks like Maarten suggestion fix the clear portion.
> But for the steps 1 and 3 above we would need a special reservation
> with a node area only within our managed area?!
> 
> something like (for step 1):
> 
> mm_start = max(start, ggtt->start);
> mm_end = min(start + size, ggtt->start + ggtt->size);
> 
> node->base.start = mm_start - ggtt->start;
> node->base.size = mm_end - mm_start;
> 
> drm_mm_reserve_node(&ggtt->mm, &node->base);
> 
> and another special function to delete this special node
> if needed to be created?

Something like that could work.

> Or what do you have on mind for the full area? I believe the full
> area is this patch, but with some additions anyway since we need
> to handle this buffer plus ensure it gets reserved when we don't
> have it...

From the display side the full mm approach shouldn't really
look any different.

> 
> I'd like to avoid complications like the phys offset addition

What phys_offset addition are we talking about here?

> or
> the full range if possible.
> 
> Could you please incorporate something simple in a v2 of your series?

If you mean me and my initial fb series then I have no plan to
post a v2 of that. What I posted is IMO just fine as is. The
remaining broken corner cases can be fixed afterwards once we
get that new thing to properly protect the current PTEs.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
  2026-06-03 13:44           ` Rodrigo Vivi
  2026-06-03 14:54             ` Ville Syrjälä
@ 2026-06-10 11:26             ` Maarten Lankhorst
  1 sibling, 0 replies; 12+ messages in thread
From: Maarten Lankhorst @ 2026-06-10 11:26 UTC (permalink / raw)
  To: Rodrigo Vivi, Ville Syrjälä
  Cc: Michal Wajdeczko, intel-gfx, intel-xe

Hello,

On 6/3/26 15:44, Rodrigo Vivi wrote:
> On Wed, Jun 03, 2026 at 04:19:41PM +0300, Ville Syrjälä wrote:
>> On Fri, May 29, 2026 at 05:03:55PM -0400, Rodrigo Vivi wrote:
>>> On Fri, May 29, 2026 at 01:50:34PM +0300, Ville Syrjälä wrote:
>>>> On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote:
>>>>> On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
>>>>>>
>>>>>>
>>>>>> On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
>>>>>>> The PF GGTT allocator was initialised over a relative [0, usable_size)
>>>>>>> range, with ggtt->start added on every address conversion to get the
>>>>>>> actual hardware address.  Two consequences of that model were considered
>>>>>>> "horrible hacks":
>>>>>>>
>>>>>>>   - ggtt->start (the WOPCM offset) had to be carried around and added
>>>>>>>     to every drm_mm result.
>>>>>>
>>>>>> hmm, but this an internal detail of the xe_ggtt implementation, so why
>>>>>> would someone else complain about it?
>>>>>>
>>>>>>>   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
>>>>>>
>>>>>> hmm, for the record, this GGTT cap on the top was added back in 2023
>>>>>>
>>>>>> commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
>>>>>> Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>> Date:   Wed Jun 14 10:47:54 2023 -0700
>>>>>>
>>>>>>     drm/xe: limit GGTT size to GUC_GGTT_TOP
>>>>>>
>>>>>> +        * The GuC address space is limited on both ends of the GGTT, because
>>>>>> +        * the GuC shim HW redirects accesses to those addresses to other HW
>>>>>> +        * areas instead of going through the GGTT. On the bottom end, the GuC
>>>>>> +        * can't access offsets below the WOPCM size, while on the top side the
>>>>>> +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
>>>>>> +        * checking each object to see if they are accessed by GuC or not, we
>>>>>> +        * just exclude those areas from the allocator. Additionally, to
>>>>>> +        * simplify the driver load, we use the maximum WOPCM size in this logic
>>>>>>
>>>>>>>     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
>>>>>>>     untouched during the initial clear.
>>>>>>
>>>>>> and that likely will not be changed by this patch as after allocating 'two
>>>>>> permanent zones', the drm_mm_for_each_hole will not iterate over them
>>>>>
>>>>> right...
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Fix this for the PF case by initialising drm_mm over the full hardware
>>>>>>> GGTT range [0, total_size) and permanently reserving the two forbidden
>>>>>>> zones:
>>>>>>>
>>>>>>>   - [0, wopcm)           — inaccessible below WOPCM
>>>>>>>   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
>>>>>>
>>>>>> that looks odds: why pretend to claim manageability of full [0, 4GB)
>>>>>> of the GGTT and then immediately permanently reserve two end zones to
>>>>>> end up with real [wopcm, GUC_TOP) which is what we already have?
>>>>>
>>>>> yes...
>>>>>
>>>>>>
>>>>>>>
>>>>>>> A new mm_offset field (zero for PF) carries the base offset used in
>>>>>>> address conversions, unifying the existing VF relative model (where
>>>>>>> mm_offset == vf_base) with the new PF absolute model.
>>>>>>
>>>>>> but public xe_ggtt API already uses absolute addressing in PF and VF
>>>>>
>>>>> I know...
>>>>>
>>>>>>
>>>>>>>  The public
>>>>>>> xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
>>>>>>> [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
>>>>>>> config code are unaffected.
>>>>>>>
>>>>>>> xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
>>>>>>> so the VF recovery path remains a single O(1) WRITE_ONCE pair.
>>>>>>
>>>>>> maybe it's just me - but I can't figure out the real rationale for this
>>>>>> patch - what did I miss?
>>>>>
>>>>> This series:
>>>>> https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/
>>>>>
>>>>> And more specifically the discussion in this patch:
>>>>> https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>>>> Assisted-by: GitHub-Copilot:claude-sonnet-4.6
>>>>>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
>>>>>>>  1 file changed, 101 insertions(+), 22 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>>>>>>> index a351c578b170..00a6cd2b8a51 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_ggtt.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
>>>>>>> @@ -137,6 +137,17 @@ struct xe_ggtt {
>>>>>>>  	const struct xe_ggtt_pt_ops *pt_ops;
>>>>>>>  	/** @mm: The memory manager used to manage individual GGTT allocations */
>>>>>>>  	struct drm_mm mm;
>>>>>>> +	/**
>>>>>>> +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
>>>>>>> +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
>>>>>>> +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
>>>>>>> +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
>>>>>>> +	 */
>>>>>>> +	u64 mm_offset;
>>>>>>> +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
>>>>>>> +	struct drm_mm_node reserved_bottom;
>>>>>>> +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
>>>>>>> +	struct drm_mm_node reserved_top;
>>>>>>
>>>>>> maybe all we need is to separate concepts of:
>>>>>>
>>>>>> * raw GGTT - fixed range [0, 4GB)
>>>>>>
>>>>>> from
>>>>>>
>>>>>> * allocable GGTT - configurable sub-range [start, end)
>>>>>>   * [wopcm, GUC_TOP) on PF
>>>>>>   * [base, base+size) on VF
>>>>>>
>>>>>> and then we can continue to use drm_mm.init(0, end-start) to manage
>>>>>> that [start, end) range in a common way on both PF and VF?
>>>>>
>>>>> we need to be able to use a ggtt buffer that comes out of this range,
>>>>> so I'm afraid it doesn't solve all the cases.
>>>>
>>>> Basically what the display needs is:
>>>> 1. specify where in ggtt the buffer was originally placed by the GOP,
>>>>    this may be partially or fully inside these GuC reserved ranges
>>>> 2. bind the buffer to some acceptable location (assuming the original
>>>>    location wasn't acceptable) without overwriting the PTEs for the
>>>>    original location
>>>>
>>>> I suppose this could be achieved even with this "mm doesn't cover the
>>>> ends" hack, but step 1 there becomes a bit dodgy because we can't
>>>> insert the mm node if it's fully outside the mm. I suppose it could 
>>>> still work if you hide it in a function that only validates the real
>>>> ggtt offsets, but then ignores the fact that the node can't be
>>>> inserted due to being fully inside those reserved ranges. And then
>>>> whatever cleans up that original mm node must also ignore the fact
>>>> that the node maybe wasn't even allocated. And also
>>>> xe_ggtt_initial_clear() will need special code to clear the
>>>> reserved ranges.
>>>
>>> right, so basically we could keep the xe_ggtt as is and provide
>>> 2 hooks:
>>>
>>> 1. one to reserve the portion of the BIOS FB that goes
>>> inside our managed ggtt area
>>> 2. a special clear for this area
>>>
>>> And in between you do the rebind with existing infrastructure
>>> to an empty region?! Is this what you are thinking now?
>>>
>>>>
>>>> My original idea was that we'd just include the reserved regions
>>>> in the mm, and then the display could just keep the buffer at its
>>>> original location, and later the guc code can reserve what is
>>>> left over. So we could skip step 2 above completely. But after
>>>> a second thought we probably don't want to skip that step because
>>>> we might free the display bo later, at which point we might free
>>>> up some of the reserved ranges. So I guess we'd still want to keep
>>>> step 2. But I think it'd still result in less special cases in the
>>>> code. We'd just need the guc code to reserve what it needs, after
>>>> the display code has rebound the bo to an acceptable location.
>>>>
>>>> So we'd end up with:
>>>> 1. insert node for the bo's original ggtt location
>>>> 2. rebind the display bo to an acceptable ggtt location
>>>> 3. undo step 1
>>>> 4. xe_ggtt_initial_clear() (now also clears the reserved ranges
>>>>    without any special code)
>>>> 5. guc steals the reserved ranges explicitly
>>>>
>>>> So only two special cases left really, and all the rest
>>>> of the code is blissfully unaware of any of it.
>>>>
>>>> Hmm, although hibernation might still be a slight issue for
>>>> xe_ggtt_initial_clear(). As in how would the reserved regions
>>>> get cleared during resume from hibernation? I have no idea 
>>>> how the current xe ggtt code handles resume at all...
>>>
>>> The resume should only restore the pinned bo's one by one, nothing
>>> special.
>>
>> Looks like currently xe_ggtt_initial_clear() is never even called
>> during resume from hibernation, so in that case parts of the GGTT
>> will be left with whatever garbage the GOP put there. So that's
>> one thing that needs fixing.
>>
>>> So I guess if we keep the original code we are okay,
>>> but if we start to managing the full range with the reserved areas
>>> we might have some difficulties here on the way...
>>
>> If we had a full range mm I suppose we'd need a bit of special code
>> to remove the reserved nodes before xe_ggtt_initial_clear() gets
>> called, at least for the resume from hibernation case.
> 
> it looks like Maarten suggestion fix the clear portion.
> But for the steps 1 and 3 above we would need a special reservation
> with a node area only within our managed area?!
> 
> something like (for step 1):
> 
> mm_start = max(start, ggtt->start);
> mm_end = min(start + size, ggtt->start + ggtt->size);
> 
> node->base.start = mm_start - ggtt->start;
> node->base.size = mm_end - mm_start;
> 
> drm_mm_reserve_node(&ggtt->mm, &node->base);

I completely missed this, but that function exists.

Extend xe_ggtt_insert_node to xe_ggtt_insert_node_at, and xe_ggtt_node_remove.
The display code needs to truncate start/end/size as needed, because
size is passed as argument.

The proposal I made in the original series that spawned this
discussion, had the same idea. You can argue the clipping should be done by
xe_ggtt, in which case I'm open for a xe_ggtt_node_reserve_region(begin, end),
with clipping done by xe_ggtt. But I believe xe_ggtt_insert_node_at is more
generic, and can be used in more places.

This also removes the workaround we added that required adding 'xe_ggtt_insert_bo_at'

> 
> and another special function to delete this special node
> if needed to be created?
> 
> Or what do you have on mind for the full area? I believe the full
> area is this patch, but with some additions anyway since we need
> to handle this buffer plus ensure it gets reserved when we don't
> have it...
> 
> I'd like to avoid complications like the phys offset addition or
> the full range if possible.
> 
> Could you please incorporate something simple in a v2 of your series?
> 
> Thanks,
> Rodrigo.
> 
>>
>> -- 
>> Ville Syrjälä
>> Intel

Kind regards,
~Maarten Lankhorst

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-06-10 11:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-27 14:45 [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF Rodrigo Vivi
2026-05-27 15:48 ` ✗ i915.CI.BAT: failure for " Patchwork
2026-05-28 22:11 ` [PATCH] " Michal Wajdeczko
2026-05-28 22:28   ` Rodrigo Vivi
2026-05-29 10:50     ` Ville Syrjälä
2026-05-29 21:03       ` Rodrigo Vivi
2026-06-03 13:19         ` Ville Syrjälä
2026-06-03 13:44           ` Rodrigo Vivi
2026-06-03 14:54             ` Ville Syrjälä
2026-06-10 11:26             ` Maarten Lankhorst
2026-06-03 12:57 ` [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP Maarten Lankhorst
2026-06-03 13:11   ` Ville Syrjälä

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox