Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization

public inbox for intel-xe@lists.freedesktop.org
 help / color / mirror / Atom feed

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>,
	"Zhang, Carl" <carl.zhang@intel.com>,
	 "Upadhyay, Tejas" <tejas.upadhyay@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Cc: "Souza, Jose" <jose.souza@intel.com>,
	"Mrozek, Michal" <michal.mrozek@intel.com>
Subject: Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization
Date: Mon, 09 Mar 2026 18:30:05 +0100	[thread overview]
Message-ID: <a295b9cf93bb892505b8d209ca7806868c564b32.camel@linux.intel.com> (raw)
In-Reply-To: <07e5e84d-070b-4fb9-84b9-f80f63b14799@intel.com>

On Mon, 2026-03-09 at 17:22 +0000, Matthew Auld wrote:
> On 09/03/2026 15:29, Zhang, Carl wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Auld, Matthew <matthew.auld@intel.com>
> > > Sent: Friday, March 6, 2026 6:09 PM
> > > To: Zhang, Carl <carl.zhang@intel.com>; Upadhyay, Tejas
> > > <tejas.upadhyay@intel.com>; intel-xe@lists.freedesktop.org
> > > Cc: thomas.hellstrom@linux.intel.com; Souza, Jose
> > > <jose.souza@intel.com>;
> > > Mrozek, Michal <michal.mrozek@intel.com>
> > > Subject: Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to
> > > enable L2 flush
> > > optimization
> > > 
> > > On 06/03/2026 07:11, Zhang, Carl wrote:
> > > > My understanding:
> > > > 1. GuC uses a timer to monitor media activity status. The mode
> > > > becomes
> > > active only when no media tasks have been detected for 5 seconds.
> > > From the
> > > media perspective, this allows legacy behavior to be maintained
> > > without
> > > requiring any changes.
> > > > 2. The media UMD only needs to set usage hints to gmmlib, which
> > > > then
> > > manages the PAT index. Therefore, the UMD itself should not
> > > require
> > > changes—only gmmlib needs to be updated to return PAT index 19 on
> > > NVL for
> > > imported surfaces.
> > > > My open is:
> > > > 1. In some applications (e.g., ChromeOS), memory is allocated
> > > > centrally
> > > (possibly by minigbm) and then shared across different
> > > components. If there
> > > are no media tasks, the system operates in persistent mode.
> > > However, based
> > > on current interfaces, imported memory should be configured as
> > > transient +
> > > 1-way coherency. This raises a question: if this memory is used
> > > exclusively by
> > > compute (not media), is this the expected behavior?
> > > 
> > > 2way is also allowed.
> > > 
> >   
> > But 2way is slower than 1 way , right?
> 
> Likely it would be, I think. But for Media only, not sure.
> 
> > 
> > > > 2. For userptr memory that is used by only one component, I
> > > > believe 1-way
> > > coherency should be sufficient?
> > > 
> > > I think for 1) and 2), it mostly comes down to CPU/host <-> GPU
> > > coherency,
> > > right? If you don't use 2WAY or XA, userspace would now have to
> > > manually
> > > handle the coherency, in case in "persistent" mode. It doesn't
> > > matter if there
> > > is just one component/app, the coherency issue would still be
> > > there.
> > > 
> > For Media (vdbox, vebox), does not use L2 cache,  so, if it is
> > userptr
> > 1-way is enough, just need snoop cpu cache.
> > 
> > > For example, for 2) if you only use 1way without XA, then AFAIK
> > > you now
> > > need manual flushing, if GPU side is cached and CPU is expecting
> > > to see
> > > coherent view. Like say GPU writes something and CPU later reads
> > > it. The 1-
> > > way here would just ensure that GPU snoops the CPU caches on the
> > > first
> > > access. But if it then gets cached on GPU side, there is now no
> > > guaranteed
> > > flush when that GPU job is complete, when in "persistent" mode.
> > > 
> > a. my understanding , L2 is used only for RCS and VCS. Not for VCS
> > and VECS.
> > Please correct me if there is any misunderstanding.  So, if one
> > external resource
> > is only used by media . never be used by CCS RCS. whether we should
> > still have
> > such limitation.
> 
> Yeah, that matches my understanding. It sounded like Media access
> does 
> not go via l2, so I assume goes directly to system memory. Hence why
> l2 
> needs to be fully flushed if sharing something with Media (which is 
> assumed whenever Media is currently turned on).
> 
> > 
> > b. if there is media task, seems it cant be "persistent" mode. Of
> > course, maybe,
> > it is "persistent" mode, then media task comes, it turns to
> > "transient" mode.
> > But the data is still in cache , not flushed.  for this case, I
> > agree that any resource
> > used firstly by CCS or RCS then shared it to media, it should be
> > set to 2-way or
> > 1-way + XA.
> > 
> > > So assumption was that for userptr, the memory comes from the
> > > host, and
> > > access is likely shared with CPU/host, so seems reasonable you
> > > would want
> > > XA or 2WAY. For foreign imported memory you are likely sharing
> > > with host or
> > > some other device/driver, so seems reasonable you would probably
> > > want XA
> > > or 2WAY.
> > > 
> > > We can drop the restrictions, if userspace really needs it, but
> > > it would be up to
> > > userspace to deal with all the CPU/host vs GPU coherency fun, if
> > > applicable.
> > > The restrictions do simplify things a little on the KMD side,
> > > plus the validation
> > > angle in IGT.
> > > 
> > I am thinking whether it is too strict.  There is different
> > useagecases.
> 
> Do you know which index(s) you would pick instead, for Media only
> cases? 
> I think that is the only edge case you are concerned with here?
> 
> Is it not the case that if you are only submitting within Media and
> it 
> if it doesn't actually use l2, wouldn't the XA and l2 cache mode
> stuff 
> be a no-op anyway, since the access from Media side never goes
> through l2?

I was wondering whether that could actually be the case as well.

/Thomas


> 
> Do you know which index that we don't allow, would give 
> different/preferred behaviour for Media only case?
> 
> > 
> > > > 
> > > > Thanks
> > > > Carl
> > > > 
> > > > > -----Original Message-----
> > > > > From: Auld, Matthew <matthew.auld@intel.com>
> > > > > Sent: Thursday, March 5, 2026 10:00 PM
> > > > > To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> > > > > xe@lists.freedesktop.org
> > > > > Cc: thomas.hellstrom@linux.intel.com; Zhang, Carl
> > > <carl.zhang@intel.com>;
> > > > > Souza, Jose <jose.souza@intel.com>; Mrozek, Michal
> > > > > <michal.mrozek@intel.com>
> > > > > Subject: Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to
> > > > > enable L2
> > > flush
> > > > > optimization
> > > > > 
> > > > > On 05/03/2026 12:19, Tejas Upadhyay wrote:
> > > > > > When set, starting xe3p_lpg, the L2 flush optimization
> > > > > > feature will
> > > > > > control whether L2 is in Persistent or Transient mode
> > > > > > through
> > > > > > monitoring of media activity.
> > > > > > 
> > > > > > To enable L2 flush optimization include new feature flag
> > > > > > GUC_CTL_ENABLE_L2FLUSH_OPT for Novalake platforms when
> > > > > > media
> > > type
> > > > > is
> > > > > > detected.
> > > > > > 
> > > > > > Tighten UAPI validation to restrict userptr, svm and dmabuf
> > > > > > mappings
> > > > > > to be either 2WAY or XA+1WAY
> > > > > > 
> > > > > > V5(Thomas): logic correction
> > > > > > V4(MattA): Modify uapi doc and commit
> > > > > > V3(MattA): check valid op and pat_index value
> > > > > > V2(MattA): validate dma-buf bos and madvise pat-index
> > > > > > 
> > > > > > Acked-by: José Roberto de Souza <jose.souza@intel.com>
> > > > > > Acked-by: Michal Mrozek <michal.mrozek@intel.com>
> > > > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > > > > > ---
> > > > > >     drivers/gpu/drm/xe/xe_guc.c        |  3 +++
> > > > > >     drivers/gpu/drm/xe/xe_guc_fwif.h   |  1 +
> > > > > >     drivers/gpu/drm/xe/xe_vm.c         |  8 ++++++++
> > > > > >     drivers/gpu/drm/xe/xe_vm_madvise.c | 23
> > > > > > +++++++++++++++++++++++
> > > > > >     include/uapi/drm/xe_drm.h          |  4 +++-
> > > > > >     5 files changed, 38 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_guc.c
> > > > > > b/drivers/gpu/drm/xe/xe_guc.c
> > > > > > index 54d2fc780127..43dc4353206f 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_guc.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > > > > > @@ -98,6 +98,9 @@ static u32 guc_ctl_feature_flags(struct
> > > > > > xe_guc *guc)
> > > > > >     	if (xe_guc_using_main_gamctrl_queues(guc))
> > > > > >     		flags |= GUC_CTL_MAIN_GAMCTRL_QUEUES;
> > > > > > 
> > > > > > +	if (GRAPHICS_VER(xe) >= 35 && !IS_DGFX(xe) &&
> > > > > xe_gt_is_media_type(guc_to_gt(guc)))
> > > > > > +		flags |= GUC_CTL_ENABLE_L2FLUSH_OPT;
> > > > > 
> > > > > Pending whether we also need this on primary GT or not. Since
> > > > > it sounded
> > > > > like it would also need to know whether to do a targeted or
> > > > > full flush
> > > based
> > > > > on current Media status, and it's unclear if here we are
> > > > > meant to opt into
> > > that
> > > > > for every GT/GuC instance vs just the Media GuC.
> > > > > 
> > > > > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > > > > 
> > > > > > +
> > > > > >     	return flags;
> > > > > >     }
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > > b/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > > index bb8f71d38611..b73fae063fac 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > > @@ -67,6 +67,7 @@ struct guc_update_exec_queue_policy {
> > > > > >     #define   GUC_CTL_ENABLE_PSMI_LOGGING	BIT(7)
> > > > > >     #define   GUC_CTL_MAIN_GAMCTRL_QUEUES	BIT(9)
> > > > > >     #define   GUC_CTL_DISABLE_SCHEDULER	BIT(14)
> > > > > > +#define   GUC_CTL_ENABLE_L2FLUSH_OPT	BIT(15)
> > > > > > 
> > > > > >     #define GUC_CTL_DEBUG			3
> > > > > >     #define   GUC_LOG_VERBOSITY		REG_GENMASK(1, 0)
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > index da0ce0b3704c..0b236e08c158 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > @@ -3481,6 +3481,10 @@ static int
> > > > > > vm_bind_ioctl_check_args(struct
> > > > > xe_device *xe, struct xe_vm *vm,
> > > > > >     				 op ==
> > > > > DRM_XE_VM_BIND_OP_MAP_USERPTR) ||
> > > > > >     		    XE_IOCTL_DBG(xe, coh_mode ==
> > > > > > XE_COH_NONE &&
> > > > > >     				 op ==
> > > > > DRM_XE_VM_BIND_OP_MAP_USERPTR) ||
> > > > > > +		    XE_IOCTL_DBG(xe,
> > > > > > xe_device_is_l2_flush_optimized(xe) &&
> > > > > > +				 (op ==
> > > > > DRM_XE_VM_BIND_OP_MAP_USERPTR ||
> > > > > > +				  is_cpu_addr_mirror) &&
> > > > > > +				 (pat_index != 19 &&
> > > > > > coh_mode !=
> > > > > XE_COH_2WAY)) ||
> > > > > >     		    XE_IOCTL_DBG(xe, comp_en &&
> > > > > >     				 op ==
> > > > > DRM_XE_VM_BIND_OP_MAP_USERPTR) ||
> > > > > >     		    XE_IOCTL_DBG(xe, op ==
> > > > > DRM_XE_VM_BIND_OP_MAP_USERPTR && @@
> > > > > > -3615,6 +3619,10 @@ static int
> > > > > > xe_vm_bind_ioctl_validate_bo(struct
> > > > > xe_device *xe, struct xe_bo *bo,
> > > > > >     	if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach &&
> > > > > > comp_en))
> > > > > >     		return -EINVAL;
> > > > > > 
> > > > > > +	if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach &&
> > > > > xe_device_is_l2_flush_optimized(xe) &&
> > > > > > +			 (pat_index != 19 && coh_mode !=
> > > > > > XE_COH_2WAY)))
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > >     	/* If a BO is protected it can only be mapped if
> > > > > > the key is still valid */
> > > > > >     	if ((bind_flags & DRM_XE_VM_BIND_FLAG_CHECK_PXP)
> > > > > > &&
> > > > > xe_bo_is_protected(bo) &&
> > > > > >     	    op != DRM_XE_VM_BIND_OP_UNMAP && op !=
> > > > > > DRM_XE_VM_BIND_OP_UNMAP_ALL) diff --git
> > > > > > a/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > > > b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > > > index 07169586e35f..376c014239ee 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
> > > > > > @@ -411,6 +411,7 @@ int xe_vm_madvise_ioctl(struct
> > > > > > drm_device *dev,
> > > > > void *data, struct drm_file *fil
> > > > > >     	struct xe_vmas_in_madvise_range madvise_range =
> > > > > > {.addr = args-
> > > > > > start,
> > > > > >     							
> > > > > > .range =  args-
> > > > > > range, };
> > > > > >     	struct xe_madvise_details details;
> > > > > > +	u16 pat_index, coh_mode;
> > > > > >     	struct xe_vm *vm;
> > > > > >     	struct drm_exec exec;
> > > > > >     	int err, attr_type;
> > > > > > @@ -447,6 +448,17 @@ int xe_vm_madvise_ioctl(struct
> > > > > > drm_device
> > > *dev,
> > > > > void *data, struct drm_file *fil
> > > > > >     	if (err || !madvise_range.num_vmas)
> > > > > >     		goto madv_fini;
> > > > > > 
> > > > > > +	if (args->type == DRM_XE_MEM_RANGE_ATTR_PAT) {
> > > > > > +		pat_index = array_index_nospec(args-
> > > > > > >pat_index.val, xe-
> > > > > > pat.n_entries);
> > > > > > +		coh_mode = xe_pat_index_get_coh_mode(xe,
> > > > > > pat_index);
> > > > > > +		if (XE_IOCTL_DBG(xe,
> > > > > > madvise_range.has_svm_userptr_vmas
> > > > > &&
> > > > > > +				
> > > > > > xe_device_is_l2_flush_optimized(xe) &&
> > > > > > +				 (pat_index != 19 &&
> > > > > > coh_mode !=
> > > > > XE_COH_2WAY))) {
> > > > > > +			err = -EINVAL;
> > > > > > +			goto madv_fini;
> > > > > > +		}
> > > > > > +	}
> > > > > > +
> > > > > >     	if (madvise_range.has_bo_vmas) {
> > > > > >     		if (args->type ==
> > > > > > DRM_XE_MEM_RANGE_ATTR_ATOMIC) {
> > > > > >     			if (!check_bo_args_are_sane(vm,
> > > > > madvise_range.vmas, @@ -464,6
> > > > > > +476,17 @@ int xe_vm_madvise_ioctl(struct drm_device *dev,
> > > > > > void
> > > *data,
> > > > > > struct drm_file *fil
> > > > > > 
> > > > > >     				if (!bo)
> > > > > >     					continue;
> > > > > > +
> > > > > > +				if (args->type ==
> > > > > DRM_XE_MEM_RANGE_ATTR_PAT) {
> > > > > > +					if
> > > > > > (XE_IOCTL_DBG(xe, bo-
> > > > > > ttm.base.import_attach &&
> > > > > > +
> > > > > xe_device_is_l2_flush_optimized(xe) &&
> > > > > > +							
> > > > > > (pat_index != 19 &&
> > > > > > +							 
> > > > > > coh_mode !=
> > > > > XE_COH_2WAY))) {
> > > > > > +						err = -
> > > > > > EINVAL;
> > > > > > +						goto
> > > > > > err_fini;
> > > > > > +					}
> > > > > > +				}
> > > > > > +
> > > > > >     				err =
> > > > > > drm_exec_lock_obj(&exec, &bo-
> > > > > > ttm.base);
> > > > > >     				drm_exec_retry_on_contenti
> > > > > > on(&exec);
> > > > > >     				if (err)
> > > > > > diff --git a/include/uapi/drm/xe_drm.h
> > > > > > b/include/uapi/drm/xe_drm.h
> > > > > > index ef2565048bdf..862fed3cf1ed 100644
> > > > > > --- a/include/uapi/drm/xe_drm.h
> > > > > > +++ b/include/uapi/drm/xe_drm.h
> > > > > > @@ -1103,7 +1103,9 @@ struct drm_xe_vm_bind_op {
> > > > > >     	 * incoherent GT access is possible.
> > > > > >     	 *
> > > > > >     	 * Note: For userptr and externally imported dma-
> > > > > > buf the kernel
> > > > > expects
> > > > > > -	 * either 1WAY or 2WAY for the @pat_index.
> > > > > > +	 * either 1WAY or 2WAY for the @pat_index.
> > > > > > Starting from NVL-P, for
> > > > > > +	 * userptr, svm, madvise and externally imported
> > > > > > dma-buf the kernel
> > > > > expects
> > > > > > +	 * either 2WAY or 1WAY and XA @pat_index.
> > > > > >     	 *
> > > > > >     	 * For DRM_XE_VM_BIND_FLAG_NULL bindings there are
> > > > > > no KMD
> > > > > restrictions
> > > > > >     	 * on the @pat_index. For such mappings there is
> > > > > > no actual memory
> > > > > > being
> > > > 
> >

next prev parent reply	other threads:[~2026-03-09 17:30 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-05 12:19 [PATCH V6 0/4] drm/xe/xe3p_lpg: L2 flush optimization Tejas Upadhyay
2026-03-05 12:19 ` [PATCH V6 1/4] drm/xe/xe3p_lpg: flush shrinker bo cachelines manually Tejas Upadhyay
2026-03-05 12:19 ` [PATCH V6 2/4] drm/xe/pat: define coh_mode 2way Tejas Upadhyay
2026-03-05 12:19 ` [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization Tejas Upadhyay
2026-03-05 13:52   ` Thomas Hellström
2026-03-05 13:59   ` Matthew Auld
2026-03-06  5:46     ` Upadhyay, Tejas
2026-03-06  7:11     ` Zhang, Carl
2026-03-06  9:13       ` Upadhyay, Tejas
2026-03-06 10:08       ` Matthew Auld
2026-03-09 15:29         ` Zhang, Carl
2026-03-09 17:22           ` Matthew Auld
2026-03-09 17:30             ` Thomas Hellström [this message]
2026-03-11 14:58               ` Zhang, Carl
2026-03-05 12:19 ` [PATCH V6 4/4] drm/xe/xe3p: Skip TD flush Tejas Upadhyay
2026-03-06 12:16 ` ✓ CI.KUnit: success for drm/xe/xe3p_lpg: L2 flush optimization (rev7) Patchwork
2026-03-06 12:58 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-07 13:34 ` ✓ Xe.CI.FULL: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a295b9cf93bb892505b8d209ca7806868c564b32.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=carl.zhang@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jose.souza@intel.com \
    --cc=matthew.auld@intel.com \
    --cc=michal.mrozek@intel.com \
    --cc=tejas.upadhyay@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox