Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Matt Roper <matthew.d.roper@intel.com>
To: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <matthew.auld@intel.com>,
	<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually
Date: Tue, 10 Feb 2026 13:05:25 -0800	[thread overview]
Message-ID: <20260210210525.GC4694@mdroper-desk1.amr.corp.intel.com> (raw)
In-Reply-To: <20260210125120.1329411-6-tejas.upadhyay@intel.com>

On Tue, Feb 10, 2026 at 06:21:22PM +0530, Tejas Upadhyay wrote:
> "eXtended Architecture" (XA) tagged memory—memory shared between the CPU and GPU

I'm pretty sure this expansion of "XA" is wrong; where are you seeing
this definition?  Everything in the bspec indicates that XA means "wb -
transient app" (similar to how "XD" is 'wb - transient display").  I'm
not sure why exactly they picked "X" to refer to transient in both of
these cases, but I've never seen any documentation that refers to it as
"extended."

> is treated differently from other GPU memory when the Media engine is power-gated.
> 
> XA is *always* flushed, like at the end-of-submssion (and maybe other

I assume you're referring to the fact that the driver performs flushes
at the end of submission (via PIPE_CONTROL or MI_FLUSH_DW), and that
depending on other state/optimizations in the system, those flushes may
flush the entire device cache, or may only flush the subset of cache
data that is not marked as transient.  The way you worded this was
confusing since it makes it sound like cache flushes happen
automatically somewhere in hardware/firmware.

> places), just that internally as an optimisation hw doesn't need to make
> that a full flush (which will also include XA) when Media is
> off/powergated, since it doesn't need to worry about GT caches vs Media
> coherency, and only CPU vs GPU coherency, so can make that flush a
> targeted XA flush, since stuff tagged with XA now means it's shared with
> the CPU. The main implication is that we now need to somehow flush non-XA before
> freeing system memory pages, otherwise dirty cachelines could be flushed after the free
> (like if Media suddenly turns on and does a full flush)

This description seems really confusing.  My understanding is that
marking something as wb-transient-app indicates that it might be
accessed by something other than our graphics/media IP (i.e., accessed
from the CPU, exported to another device, etc.), so transient data truly
does need to be flushed at the points in the driver where a flush
typically happens.

However when something is _not_ transient, then either:
 - it's "private" to the GPU and only our graphics/media IP will be
   accessing it
 - it's bound with a coherent PAT index so that outside observers like
   the CPU can snoop the device cache, even when the cache hasn't been
   flushed

If media is not active, then there's really no need to include
non-transient data when an device cache flush happens since there's no
real need for the data to get to RAM.  So that enables an optimization
(which comes in your next patch), that allows flushes to only operate on
the subset of the device cache tagged as "transient" if media is idle.

As you said, we eventually do want to force a flush of the non-transient
data as well once we're freeing the underlying pages.  So how do we do
that?  It's not clear to me how the changes below are accomplishing
that.  Is there a way to explicitly request a full device cache flush
(ignoring the transient vs non-transient tagging)?  Since the GuC
handles the optimization in the next patch (toggling whether flushes are
full flushes vs non-transient flushes depending on whether media is
active), I thought there might be some kind of GuC interface to request
"please do one full flush now, even if media is idle."


Matt

> 
> V2(MattA): Expand commit description
> 
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c      |  3 ++-
>  drivers/gpu/drm/xe/xe_device.c  | 23 +++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_device.h  |  1 +
>  drivers/gpu/drm/xe/xe_userptr.c |  3 ++-
>  4 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index e9180b01a4e4..4455886b211e 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -689,7 +689,8 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
>  
>  		if (!xe_vm_in_fault_mode(vm)) {
>  			drm_gpuvm_bo_evict(vm_bo, true);
> -			continue;
> +			if (!xe_device_needs_cache_flush(xe))
> +				continue;
>  		}
>  
>  		if (!idle) {
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 743c18e0c580..da2abed94bc0 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -1097,6 +1097,29 @@ static void tdf_request_sync(struct xe_device *xe)
>  	}
>  }
>  
> +/**
> + * xe_device_needs_cache_flush - Whether the cache needs to be flushed
> + * @xe: The device to check.
> + *
> + * Return: true if the device needs cache flush, false otherwise.
> + */
> +bool xe_device_needs_cache_flush(struct xe_device *xe)
> +{
> +	/* XA is *always* flushed, like at the end-of-submssion (and maybe other
> +	 * places), just that internally as an optimisation hw doesn't need to make
> +	 * that a full flush (which will also include XA) when Media is
> +	 * off/powergated, since it doesn't need to worry about GT caches vs Media
> +	 * coherency, and only CPU vs GPU coherency, so can make that flush a
> +	 * targeted XA flush, since stuff tagged with XA now means it's shared with
> +	 * the CPU. The main implication is that we now need to somehow flush non-XA before
> +	 * freeing system memory pages, otherwise dirty cachelines could be flushed after the free
> +	 * (like if Media suddenly turns on and does a full flush)
> +	 */
> +	if (GRAPHICS_VER(xe) >= 35 && !IS_DGFX(xe))
> +		return true;
> +	return false;
> +}
> +
>  void xe_device_l2_flush(struct xe_device *xe)
>  {
>  	struct xe_gt *gt;
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 39464650533b..baf386e0e037 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -184,6 +184,7 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
>  u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
>  u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
>  
> +bool xe_device_needs_cache_flush(struct xe_device *xe);
>  void xe_device_td_flush(struct xe_device *xe);
>  void xe_device_l2_flush(struct xe_device *xe);
>  
> diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userptr.c
> index e120323c43bc..b435ea7f9b66 100644
> --- a/drivers/gpu/drm/xe/xe_userptr.c
> +++ b/drivers/gpu/drm/xe/xe_userptr.c
> @@ -114,7 +114,8 @@ static void __vma_userptr_invalidate(struct xe_vm *vm, struct xe_userptr_vma *uv
>  				    false, MAX_SCHEDULE_TIMEOUT);
>  	XE_WARN_ON(err <= 0);
>  
> -	if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> +	if ((xe_vm_in_fault_mode(vm) || xe_device_needs_cache_flush(vm->xe)) &&
> +	    userptr->initial_bind) {
>  		err = xe_vm_invalidate_vma(vma);
>  		XE_WARN_ON(err);
>  	}
> -- 
> 2.52.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

next prev parent reply	other threads:[~2026-02-10 21:05 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 12:51 [PATCH 0/3] drm/xe/xe3p_lpg: L2 flush optimization Tejas Upadhyay
2026-02-10 12:51 ` [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually Tejas Upadhyay
2026-02-10 21:05   ` Matt Roper [this message]
2026-02-11  0:02     ` Matthew Brost
2026-02-11 19:06       ` Upadhyay, Tejas
2026-02-11 21:11         ` Matt Roper
2026-02-12  9:53           ` Matthew Auld
2026-02-13 11:17             ` Upadhyay, Tejas
2026-02-13 13:27               ` Matthew Auld
2026-02-13 13:30                 ` Souza, Jose
2026-02-13 16:23           ` Upadhyay, Tejas
2026-02-13 16:48             ` Souza, Jose
2026-02-13 17:16               ` Matt Roper
2026-02-13 17:31                 ` Souza, Jose
2026-02-13 17:31                 ` Matthew Auld
2026-02-16 10:23                   ` Thomas Hellström
2026-02-16 10:58                     ` Matthew Auld
2026-02-16 12:07                       ` Thomas Hellström
2026-02-16 14:55                         ` Matthew Auld
2026-02-16 15:38                           ` Thomas Hellström
2026-02-16 16:41                             ` Matthew Auld
2026-02-17  6:19                               ` Upadhyay, Tejas
2026-02-17  9:53                                 ` Thomas Hellström
2026-02-17 17:04                               ` Thomas Hellström
2026-02-17 18:41                                 ` Matthew Auld
2026-02-16 10:56             ` Thomas Hellström
2026-02-16 11:26               ` Upadhyay, Tejas
2026-02-13 17:29           ` Matthew Auld
2026-02-10 12:51 ` [PATCH 2/3] drm/xe/xe3p_lpg: Enable L2 flush optimization feature Tejas Upadhyay
2026-02-10 12:51 ` [PATCH 3/3] drm/xe/xe3p: Skip TD flush Tejas Upadhyay
2026-02-10 21:22   ` Matt Roper
2026-02-13 11:06     ` Upadhyay, Tejas
2026-02-10 13:35 ` ✗ CI.checkpatch: warning for drm/xe/xe3p_lpg: L2 flush optimization (rev2) Patchwork
2026-02-10 13:36 ` ✓ CI.KUnit: success " Patchwork
2026-02-10 14:33 ` ✗ Xe.CI.BAT: failure " Patchwork
2026-02-10 18:06 ` ✗ Xe.CI.FULL: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2025-11-25  9:43 [PATCH 0/3] drm/xe/xe3p_lpg: L2 flush optimization Tejas Upadhyay
2025-11-25  9:43 ` [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually Tejas Upadhyay
2025-11-25 10:17   ` Matthew Auld
2025-11-25 13:39     ` Souza, Jose
2025-11-25 15:06   ` Thomas Hellström
2025-11-25 15:31     ` Upadhyay, Tejas
2025-11-26 10:26       ` Thomas Hellström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260210210525.GC4694@mdroper-desk1.amr.corp.intel.com \
    --to=matthew.d.roper@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    --cc=tejas.upadhyay@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox