Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	"Ville Syrjälä" <ville.syrjala@linux.intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>
Subject: Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF
Date: Thu, 28 May 2026 18:28:47 -0400	[thread overview]
Message-ID: <ahjBn2vYp5dppkYp@intel.com> (raw)
In-Reply-To: <c5b06289-f353-47eb-a918-102288fac7e9@intel.com>

On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote:
> 
> 
> On 5/27/2026 4:45 PM, Rodrigo Vivi wrote:
> > The PF GGTT allocator was initialised over a relative [0, usable_size)
> > range, with ggtt->start added on every address conversion to get the
> > actual hardware address.  Two consequences of that model were considered
> > "horrible hacks":
> > 
> >   - ggtt->start (the WOPCM offset) had to be carried around and added
> >     to every drm_mm result.
> 
> hmm, but this an internal detail of the xe_ggtt implementation, so why
> would someone else complain about it?
> 
> >   - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead
> 
> hmm, for the record, this GGTT cap on the top was added back in 2023
> 
> commit ab10e976fbda8349163ceee2ce99b2bfc97031b8
> Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Date:   Wed Jun 14 10:47:54 2023 -0700
> 
>     drm/xe: limit GGTT size to GUC_GGTT_TOP
> 
> +        * The GuC address space is limited on both ends of the GGTT, because
> +        * the GuC shim HW redirects accesses to those addresses to other HW
> +        * areas instead of going through the GGTT. On the bottom end, the GuC
> +        * can't access offsets below the WOPCM size, while on the top side the
> +        * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of
> +        * checking each object to see if they are accessed by GuC or not, we
> +        * just exclude those areas from the allocator. Additionally, to
> +        * simplify the driver load, we use the maximum WOPCM size in this logic
> 
> >     of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size)
> >     untouched during the initial clear.
> 
> and that likely will not be changed by this patch as after allocating 'two
> permanent zones', the drm_mm_for_each_hole will not iterate over them

right...

> 
> > 
> > Fix this for the PF case by initialising drm_mm over the full hardware
> > GGTT range [0, total_size) and permanently reserving the two forbidden
> > zones:
> > 
> >   - [0, wopcm)           — inaccessible below WOPCM
> >   - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP
> 
> that looks odds: why pretend to claim manageability of full [0, 4GB)
> of the GGTT and then immediately permanently reserve two end zones to
> end up with real [wopcm, GUC_TOP) which is what we already have?

yes...

> 
> > 
> > A new mm_offset field (zero for PF) carries the base offset used in
> > address conversions, unifying the existing VF relative model (where
> > mm_offset == vf_base) with the new PF absolute model.
> 
> but public xe_ggtt API already uses absolute addressing in PF and VF

I know...

> 
> >  The public
> > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable
> > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF
> > config code are unaffected.
> > 
> > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset
> > so the VF recovery path remains a single O(1) WRITE_ONCE pair.
> 
> maybe it's just me - but I can't figure out the real rationale for this
> patch - what did I miss?

This series:
https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/

And more specifically the discussion in this patch:
https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/

> 
> > 
> > Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Assisted-by: GitHub-Copilot:claude-sonnet-4.6
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++-------
> >  1 file changed, 101 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index a351c578b170..00a6cd2b8a51 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -137,6 +137,17 @@ struct xe_ggtt {
> >  	const struct xe_ggtt_pt_ops *pt_ops;
> >  	/** @mm: The memory manager used to manage individual GGTT allocations */
> >  	struct drm_mm mm;
> > +	/**
> > +	 * @mm_offset: base offset added to drm_mm node addresses to obtain hardware
> > +	 * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses).
> > +	 * For VF this equals @start (drm_mm uses relative addresses from VF base).
> > +	 * Updated atomically by xe_ggtt_shift_nodes() during VF recovery.
> > +	 */
> > +	u64 mm_offset;
> > +	/** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */
> > +	struct drm_mm_node reserved_bottom;
> > +	/** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */
> > +	struct drm_mm_node reserved_top;
> 
> maybe all we need is to separate concepts of:
> 
> * raw GGTT - fixed range [0, 4GB)
> 
> from
> 
> * allocable GGTT - configurable sub-range [start, end)
>   * [wopcm, GUC_TOP) on PF
>   * [base, base+size) on VF
> 
> and then we can continue to use drm_mm.init(0, end-start) to manage
> that [start, end) range in a common way on both PF and VF?

we need to be able to use a ggtt buffer that comes out of this range,
so I'm afraid it doesn't solve all the cases.

> 
> 
> >  	/** @access_count: counts GGTT writes */
> >  	unsigned int access_count;
> >  	/** @wq: Dedicated unordered work queue to process node removals */
> > @@ -318,6 +329,10 @@ static void ggtt_fini_early(struct drm_device *drm, void *arg)
> >  {
> >  	struct xe_ggtt *ggtt = arg;
> >  
> > +	if (drm_mm_node_allocated(&ggtt->reserved_top))
> > +		drm_mm_remove_node(&ggtt->reserved_top);
> > +	if (drm_mm_node_allocated(&ggtt->reserved_bottom))
> > +		drm_mm_remove_node(&ggtt->reserved_bottom);
> >  	destroy_workqueue(ggtt->wq);
> >  	drm_mm_takedown(&ggtt->mm);
> >  }
> > @@ -354,16 +369,66 @@ static const struct xe_ggtt_pt_ops xelpg_pt_wa_ops = {
> >  	.ggtt_get_pte = xe_ggtt_get_pte,
> >  };
> >  
> > -static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size)
> > +/*
> > + * __xe_ggtt_init_early - Generic drm_mm initialisation for both PF and VF.
> > + *
> > + * @start and @size define the usable GGTT range exposed through the public API.
> > + * @mm_offset is added to every drm_mm node address to obtain the hardware
> > + * address (0 for PF where the drm_mm spans real addresses; @start for VF).
> > + * @mm_size is the total span passed to drm_mm_init() (equals @size for VF;
> > + * equals the full hardware GGTT size for PF).
> > + */
> > +static void __xe_ggtt_init_early(struct xe_ggtt *ggtt, u64 start, u64 size,
> > +				 u64 mm_offset, u64 mm_size)
> >  {
> >  	ggtt->start = start;
> >  	ggtt->size = size;
> > -	drm_mm_init(&ggtt->mm, 0, size);
> > +	ggtt->mm_offset = mm_offset;
> > +	drm_mm_init(&ggtt->mm, 0, mm_size);
> > +}
> > +
> > +/*
> > + * __xe_ggtt_reserve_pf_nodes - Permanently reserve the forbidden GGTT zones.
> > + *
> > + * Must be called after __xe_ggtt_init_early() for the PF case.  Reserves
> > + * [0, @wopcm) and [@guc_top, @total_size) so the allocator never hands them
> > + * out.  On failure the drm_mm is torn down and the error is returned.
> > + */
> > +static int __xe_ggtt_reserve_pf_nodes(struct xe_ggtt *ggtt, u64 wopcm,
> > +				      u64 guc_top, u64 total_size)
> > +{
> > +	int err;
> > +
> > +	/* Reserve [0, wopcm) — GuC cannot access below WOPCM */
> > +	if (wopcm > 0) {
> > +		ggtt->reserved_bottom.start = 0;
> > +		ggtt->reserved_bottom.size = wopcm;
> > +		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_bottom);
> > +		if (WARN_ON(err))
> > +			goto err_takedown;
> > +	}
> > +
> > +	/* Reserve [guc_top, total_size) — GuC cannot access above GUC_GGTT_TOP */
> > +	if (guc_top < total_size) {
> > +		ggtt->reserved_top.start = guc_top;
> > +		ggtt->reserved_top.size = total_size - guc_top;
> > +		err = drm_mm_reserve_node(&ggtt->mm, &ggtt->reserved_top);
> > +		if (WARN_ON(err))
> > +			goto err_remove_bottom;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_remove_bottom:
> > +	drm_mm_remove_node(&ggtt->reserved_bottom);
> > +err_takedown:
> > +	drm_mm_takedown(&ggtt->mm);
> > +	return err;
> >  }
> >  
> >  int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 start, u32 size)
> >  {
> > -	__xe_ggtt_init_early(ggtt, start, size);
> > +	__xe_ggtt_init_early(ggtt, start, size, start, size);
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_IF_KUNIT(xe_ggtt_init_kunit);
> > @@ -405,8 +470,13 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> >  			xe_tile_err(ggtt->tile, "Hardware reported no preallocated GSM\n");
> >  			return -ENOMEM;
> >  		}
> > +		/*
> > +		 * For PF, ggtt_size holds the full hardware GGTT size. The
> > +		 * WOPCM and GUC_GGTT_TOP limits are enforced via permanently
> > +		 * reserved drm_mm nodes rather than by capping the range.
> > +		 */
> >  		ggtt_start = wopcm;
> > -		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start;
> > +		ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE;
> >  	} else {
> >  		ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile);
> >  		ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile);
> > @@ -423,9 +493,6 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> >  	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> >  		ggtt->flags |= XE_GGTT_FLAGS_64K;
> >  
> > -	if (ggtt_size + ggtt_start > GUC_GGTT_TOP)
> > -		ggtt_size = GUC_GGTT_TOP - ggtt_start;
> > -
> >  	if (GRAPHICS_VERx100(xe) >= 1270)
> >  		ggtt->pt_ops =
> >  			(ggtt->tile->media_gt && XE_GT_WA(ggtt->tile->media_gt, 22019338487)) ||
> > @@ -438,7 +505,19 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> >  	if (!ggtt->wq)
> >  		return -ENOMEM;
> >  
> > -	__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size);
> > +	if (!IS_SRIOV_VF(xe)) {
> > +		__xe_ggtt_init_early(ggtt, ggtt_start, GUC_GGTT_TOP - ggtt_start,
> > +				     0, ggtt_size);
> > +		err = __xe_ggtt_reserve_pf_nodes(ggtt, ggtt_start, GUC_GGTT_TOP,
> > +						 ggtt_size);
> > +		if (err) {
> > +			destroy_workqueue(ggtt->wq);
> > +			return err;
> > +		}
> > +	} else {
> > +		__xe_ggtt_init_early(ggtt, ggtt_start, ggtt_size,
> > +				     ggtt_start, ggtt_size);
> > +	}
> >  
> >  	err = drmm_add_action_or_reset(&xe->drm, ggtt_fini_early, ggtt);
> >  	if (err)
> > @@ -459,7 +538,7 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
> >  	/* Display may have allocated inside ggtt, so be careful with clearing here */
> >  	mutex_lock(&ggtt->lock);
> >  	drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
> > -		xe_ggtt_clear(ggtt, ggtt->start + start, end - start);
> > +		xe_ggtt_clear(ggtt, ggtt->mm_offset + start, end - start);
> >  
> >  	xe_ggtt_invalidate(ggtt);
> >  	mutex_unlock(&ggtt->lock);
> > @@ -613,6 +692,7 @@ void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start)
> >  
> >  	/* pairs with READ_ONCE in xe_ggtt_node_addr() */
> >  	WRITE_ONCE(ggtt->start, new_start);
> > +	WRITE_ONCE(ggtt->mm_offset, new_start);
> >  }
> >  
> >  static int xe_ggtt_insert_node_locked(struct xe_ggtt_node *node,
> > @@ -815,20 +895,19 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> >  
> >  	mutex_lock(&ggtt->lock);
> >  	/*
> > -	 * When inheriting the initial framebuffer, the framebuffer is
> > -	 * physically located at VRAM address 0, and usually at GGTT address 0 too.
> > -	 *
> > -	 * The display code will ask for a GGTT allocation between end of BO and
> > -	 * remainder of GGTT, unaware that the start is reserved by WOPCM.
> > +	 * Convert caller-supplied hardware GGTT addresses to drm_mm-relative
> > +	 * coordinates. For PF mm_offset is 0 so this is a no-op; callers pass
> > +	 * real addresses and the reserved_bottom/top nodes prevent allocations
> > +	 * in the forbidden regions. For VF, mm_offset equals the VF base so
> > +	 * we convert to relative drm_mm space.
> >  	 */
> > -	if (start >= ggtt->start)
> > -		start -= ggtt->start;
> > +	if (start >= ggtt->mm_offset)
> > +		start -= ggtt->mm_offset;
> >  	else
> >  		start = 0;
> >  
> > -	/* Should never happen, but since we handle start, fail graciously for end */
> > -	if (end >= ggtt->start)
> > -		end -= ggtt->start;
> > +	if (end >= ggtt->mm_offset)
> > +		end -= ggtt->mm_offset;
> >  	else
> >  		end = 0;
> >  
> > @@ -923,7 +1002,7 @@ u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare)
> >  
> >  	mutex_lock(&ggtt->lock);
> >  	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
> > -		hole_start = max(hole_start, ggtt->start);
> > +		hole_start = max(hole_start, ggtt->mm_offset);
> >  		hole_start = ALIGN(hole_start, alignment);
> >  		hole_end = ALIGN_DOWN(hole_end, alignment);
> >  		if (hole_start >= hole_end)
> > @@ -1098,7 +1177,7 @@ u64 xe_ggtt_print_holes(struct xe_ggtt *ggtt, u64 alignment, struct drm_printer
> >  
> >  	mutex_lock(&ggtt->lock);
> >  	drm_mm_for_each_hole(entry, mm, hole_start, hole_end) {
> > -		hole_start = max(hole_start, ggtt->start);
> > +		hole_start = max(hole_start, ggtt->mm_offset);
> >  		hole_start = ALIGN(hole_start, alignment);
> >  		hole_end = ALIGN_DOWN(hole_end, alignment);
> >  		if (hole_start >= hole_end)
> > @@ -1152,7 +1231,7 @@ u64 xe_ggtt_read_pte(struct xe_ggtt *ggtt, u64 offset)
> >  u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node)
> >  {
> >  	/* pairs with WRITE_ONCE in xe_ggtt_shift_nodes() */
> > -	return node->base.start + READ_ONCE(node->ggtt->start);
> > +	return node->base.start + READ_ONCE(node->ggtt->mm_offset);
> >  }
> >  
> >  /**
>

next prev parent reply	other threads:[~2026-05-28 22:28 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27 14:45 [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF Rodrigo Vivi
2026-05-27 15:48 ` ✗ i915.CI.BAT: failure for " Patchwork
2026-05-27 17:42 ` ✓ CI.KUnit: success " Patchwork
2026-05-27 18:55 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-28  1:47 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-05-28 22:11 ` [PATCH] " Michal Wajdeczko
2026-05-28 22:28   ` Rodrigo Vivi [this message]
2026-05-29 10:50     ` Ville Syrjälä
2026-05-29 21:03       ` Rodrigo Vivi
2026-06-03 13:19         ` Ville Syrjälä
2026-06-03 13:44           ` Rodrigo Vivi
2026-06-03 14:54             ` Ville Syrjälä
2026-06-03 12:57 ` [PATCH] drm/xe/ggtt: clear reserved area at GUC_GGTT_TOP Maarten Lankhorst
2026-06-03 13:11   ` Ville Syrjälä

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahjBn2vYp5dppkYp@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=michal.wajdeczko@intel.com \
    --cc=ville.syrjala@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.