Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization

public inbox for intel-xe@lists.freedesktop.org
 help / color / mirror / Atom feed

From: Matthew Auld <matthew.auld@intel.com>
To: "Zhang, Carl" <carl.zhang@intel.com>,
	"Upadhyay, Tejas" <tejas.upadhyay@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Cc: "thomas.hellstrom@linux.intel.com"
	<thomas.hellstrom@linux.intel.com>,
	"Souza, Jose" <jose.souza@intel.com>,
	"Mrozek, Michal" <michal.mrozek@intel.com>
Subject: Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization
Date: Mon, 9 Mar 2026 17:22:31 +0000	[thread overview]
Message-ID: <07e5e84d-070b-4fb9-84b9-f80f63b14799@intel.com> (raw)
In-Reply-To: <PH0PR11MB55795DE53945E3945748F1628779A@PH0PR11MB5579.namprd11.prod.outlook.com>

On 09/03/2026 15:29, Zhang, Carl wrote:
> 
> 
>> -----Original Message-----
>> From: Auld, Matthew <matthew.auld@intel.com>
>> Sent: Friday, March 6, 2026 6:09 PM
>> To: Zhang, Carl <carl.zhang@intel.com>; Upadhyay, Tejas
>> <tejas.upadhyay@intel.com>; intel-xe@lists.freedesktop.org
>> Cc: thomas.hellstrom@linux.intel.com; Souza, Jose <jose.souza@intel.com>;
>> Mrozek, Michal <michal.mrozek@intel.com>
>> Subject: Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush
>> optimization
>>
>> On 06/03/2026 07:11, Zhang, Carl wrote:
>>> My understanding:
>>> 1. GuC uses a timer to monitor media activity status. The mode becomes
>> active only when no media tasks have been detected for 5 seconds. From the
>> media perspective, this allows legacy behavior to be maintained without
>> requiring any changes.
>>> 2. The media UMD only needs to set usage hints to gmmlib, which then
>> manages the PAT index. Therefore, the UMD itself should not require
>> changes—only gmmlib needs to be updated to return PAT index 19 on NVL for
>> imported surfaces.
>>> My open is:
>>> 1. In some applications (e.g., ChromeOS), memory is allocated centrally
>> (possibly by minigbm) and then shared across different components. If there
>> are no media tasks, the system operates in persistent mode. However, based
>> on current interfaces, imported memory should be configured as transient +
>> 1-way coherency. This raises a question: if this memory is used exclusively by
>> compute (not media), is this the expected behavior?
>>
>> 2way is also allowed.
>>
>   
> But 2way is slower than 1 way , right?

Likely it would be, I think. But for Media only, not sure.

> 
>>> 2. For userptr memory that is used by only one component, I believe 1-way
>> coherency should be sufficient?
>>
>> I think for 1) and 2), it mostly comes down to CPU/host <-> GPU coherency,
>> right? If you don't use 2WAY or XA, userspace would now have to manually
>> handle the coherency, in case in "persistent" mode. It doesn't matter if there
>> is just one component/app, the coherency issue would still be there.
>>
> For Media (vdbox, vebox), does not use L2 cache,  so, if it is userptr
> 1-way is enough, just need snoop cpu cache.
> 
>> For example, for 2) if you only use 1way without XA, then AFAIK you now
>> need manual flushing, if GPU side is cached and CPU is expecting to see
>> coherent view. Like say GPU writes something and CPU later reads it. The 1-
>> way here would just ensure that GPU snoops the CPU caches on the first
>> access. But if it then gets cached on GPU side, there is now no guaranteed
>> flush when that GPU job is complete, when in "persistent" mode.
>>
> a. my understanding , L2 is used only for RCS and VCS. Not for VCS and VECS.
> Please correct me if there is any misunderstanding.  So, if one external resource
> is only used by media . never be used by CCS RCS. whether we should still have
> such limitation.

Yeah, that matches my understanding. It sounded like Media access does 
not go via l2, so I assume goes directly to system memory. Hence why l2 
needs to be fully flushed if sharing something with Media (which is 
assumed whenever Media is currently turned on).

> 
> b. if there is media task, seems it cant be "persistent" mode. Of course, maybe,
> it is "persistent" mode, then media task comes, it turns to "transient" mode.
> But the data is still in cache , not flushed.  for this case, I agree that any resource
> used firstly by CCS or RCS then shared it to media, it should be set to 2-way or
> 1-way + XA.
> 
>> So assumption was that for userptr, the memory comes from the host, and
>> access is likely shared with CPU/host, so seems reasonable you would want
>> XA or 2WAY. For foreign imported memory you are likely sharing with host or
>> some other device/driver, so seems reasonable you would probably want XA
>> or 2WAY.
>>
>> We can drop the restrictions, if userspace really needs it, but it would be up to
>> userspace to deal with all the CPU/host vs GPU coherency fun, if applicable.
>> The restrictions do simplify things a little on the KMD side, plus the validation
>> angle in IGT.
>>
> I am thinking whether it is too strict.  There is different useagecases.

Do you know which index(s) you would pick instead, for Media only cases? 
I think that is the only edge case you are concerned with here?

Is it not the case that if you are only submitting within Media and it 
if it doesn't actually use l2, wouldn't the XA and l2 cache mode stuff 
be a no-op anyway, since the access from Media side never goes through l2?

Do you know which index that we don't allow, would give 
different/preferred behaviour for Media only case?

> 
>>>
>>> Thanks
>>> Carl
>>>
>>>> -----Original Message-----
>>>> From: Auld, Matthew <matthew.auld@intel.com>
>>>> Sent: Thursday, March 5, 2026 10:00 PM
>>>> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
>>>> xe@lists.freedesktop.org
>>>> Cc: thomas.hellstrom@linux.intel.com; Zhang, Carl
>> <carl.zhang@intel.com>;
>>>> Souza, Jose <jose.souza@intel.com>; Mrozek, Michal
>>>> <michal.mrozek@intel.com>
>>>> Subject: Re: [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2
>> flush
>>>> optimization
>>>>
>>>> On 05/03/2026 12:19, Tejas Upadhyay wrote:
>>>>> When set, starting xe3p_lpg, the L2 flush optimization feature will
>>>>> control whether L2 is in Persistent or Transient mode through
>>>>> monitoring of media activity.
>>>>>
>>>>> To enable L2 flush optimization include new feature flag
>>>>> GUC_CTL_ENABLE_L2FLUSH_OPT for Novalake platforms when media
>> type
>>>> is
>>>>> detected.
>>>>>
>>>>> Tighten UAPI validation to restrict userptr, svm and dmabuf mappings
>>>>> to be either 2WAY or XA+1WAY
>>>>>
>>>>> V5(Thomas): logic correction
>>>>> V4(MattA): Modify uapi doc and commit
>>>>> V3(MattA): check valid op and pat_index value
>>>>> V2(MattA): validate dma-buf bos and madvise pat-index
>>>>>
>>>>> Acked-by: José Roberto de Souza <jose.souza@intel.com>
>>>>> Acked-by: Michal Mrozek <michal.mrozek@intel.com>
>>>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/xe/xe_guc.c        |  3 +++
>>>>>     drivers/gpu/drm/xe/xe_guc_fwif.h   |  1 +
>>>>>     drivers/gpu/drm/xe/xe_vm.c         |  8 ++++++++
>>>>>     drivers/gpu/drm/xe/xe_vm_madvise.c | 23 +++++++++++++++++++++++
>>>>>     include/uapi/drm/xe_drm.h          |  4 +++-
>>>>>     5 files changed, 38 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>>>>> index 54d2fc780127..43dc4353206f 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>> @@ -98,6 +98,9 @@ static u32 guc_ctl_feature_flags(struct xe_guc *guc)
>>>>>     	if (xe_guc_using_main_gamctrl_queues(guc))
>>>>>     		flags |= GUC_CTL_MAIN_GAMCTRL_QUEUES;
>>>>>
>>>>> +	if (GRAPHICS_VER(xe) >= 35 && !IS_DGFX(xe) &&
>>>> xe_gt_is_media_type(guc_to_gt(guc)))
>>>>> +		flags |= GUC_CTL_ENABLE_L2FLUSH_OPT;
>>>>
>>>> Pending whether we also need this on primary GT or not. Since it sounded
>>>> like it would also need to know whether to do a targeted or full flush
>> based
>>>> on current Media status, and it's unclear if here we are meant to opt into
>> that
>>>> for every GT/GuC instance vs just the Media GuC.
>>>>
>>>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>>>>
>>>>> +
>>>>>     	return flags;
>>>>>     }
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h
>>>>> b/drivers/gpu/drm/xe/xe_guc_fwif.h
>>>>> index bb8f71d38611..b73fae063fac 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
>>>>> @@ -67,6 +67,7 @@ struct guc_update_exec_queue_policy {
>>>>>     #define   GUC_CTL_ENABLE_PSMI_LOGGING	BIT(7)
>>>>>     #define   GUC_CTL_MAIN_GAMCTRL_QUEUES	BIT(9)
>>>>>     #define   GUC_CTL_DISABLE_SCHEDULER	BIT(14)
>>>>> +#define   GUC_CTL_ENABLE_L2FLUSH_OPT	BIT(15)
>>>>>
>>>>>     #define GUC_CTL_DEBUG			3
>>>>>     #define   GUC_LOG_VERBOSITY		REG_GENMASK(1, 0)
>>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>>> index da0ce0b3704c..0b236e08c158 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>>> @@ -3481,6 +3481,10 @@ static int vm_bind_ioctl_check_args(struct
>>>> xe_device *xe, struct xe_vm *vm,
>>>>>     				 op ==
>>>> DRM_XE_VM_BIND_OP_MAP_USERPTR) ||
>>>>>     		    XE_IOCTL_DBG(xe, coh_mode == XE_COH_NONE &&
>>>>>     				 op ==
>>>> DRM_XE_VM_BIND_OP_MAP_USERPTR) ||
>>>>> +		    XE_IOCTL_DBG(xe, xe_device_is_l2_flush_optimized(xe) &&
>>>>> +				 (op ==
>>>> DRM_XE_VM_BIND_OP_MAP_USERPTR ||
>>>>> +				  is_cpu_addr_mirror) &&
>>>>> +				 (pat_index != 19 && coh_mode !=
>>>> XE_COH_2WAY)) ||
>>>>>     		    XE_IOCTL_DBG(xe, comp_en &&
>>>>>     				 op ==
>>>> DRM_XE_VM_BIND_OP_MAP_USERPTR) ||
>>>>>     		    XE_IOCTL_DBG(xe, op ==
>>>> DRM_XE_VM_BIND_OP_MAP_USERPTR && @@
>>>>> -3615,6 +3619,10 @@ static int xe_vm_bind_ioctl_validate_bo(struct
>>>> xe_device *xe, struct xe_bo *bo,
>>>>>     	if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach && comp_en))
>>>>>     		return -EINVAL;
>>>>>
>>>>> +	if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach &&
>>>> xe_device_is_l2_flush_optimized(xe) &&
>>>>> +			 (pat_index != 19 && coh_mode != XE_COH_2WAY)))
>>>>> +		return -EINVAL;
>>>>> +
>>>>>     	/* If a BO is protected it can only be mapped if the key is still valid */
>>>>>     	if ((bind_flags & DRM_XE_VM_BIND_FLAG_CHECK_PXP) &&
>>>> xe_bo_is_protected(bo) &&
>>>>>     	    op != DRM_XE_VM_BIND_OP_UNMAP && op !=
>>>>> DRM_XE_VM_BIND_OP_UNMAP_ALL) diff --git
>>>>> a/drivers/gpu/drm/xe/xe_vm_madvise.c
>>>>> b/drivers/gpu/drm/xe/xe_vm_madvise.c
>>>>> index 07169586e35f..376c014239ee 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>>>>> @@ -411,6 +411,7 @@ int xe_vm_madvise_ioctl(struct drm_device *dev,
>>>> void *data, struct drm_file *fil
>>>>>     	struct xe_vmas_in_madvise_range madvise_range = {.addr = args-
>>>>> start,
>>>>>     							 .range =  args-
>>>>> range, };
>>>>>     	struct xe_madvise_details details;
>>>>> +	u16 pat_index, coh_mode;
>>>>>     	struct xe_vm *vm;
>>>>>     	struct drm_exec exec;
>>>>>     	int err, attr_type;
>>>>> @@ -447,6 +448,17 @@ int xe_vm_madvise_ioctl(struct drm_device
>> *dev,
>>>> void *data, struct drm_file *fil
>>>>>     	if (err || !madvise_range.num_vmas)
>>>>>     		goto madv_fini;
>>>>>
>>>>> +	if (args->type == DRM_XE_MEM_RANGE_ATTR_PAT) {
>>>>> +		pat_index = array_index_nospec(args->pat_index.val, xe-
>>>>> pat.n_entries);
>>>>> +		coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
>>>>> +		if (XE_IOCTL_DBG(xe, madvise_range.has_svm_userptr_vmas
>>>> &&
>>>>> +				 xe_device_is_l2_flush_optimized(xe) &&
>>>>> +				 (pat_index != 19 && coh_mode !=
>>>> XE_COH_2WAY))) {
>>>>> +			err = -EINVAL;
>>>>> +			goto madv_fini;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>>     	if (madvise_range.has_bo_vmas) {
>>>>>     		if (args->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC) {
>>>>>     			if (!check_bo_args_are_sane(vm,
>>>> madvise_range.vmas, @@ -464,6
>>>>> +476,17 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void
>> *data,
>>>>> struct drm_file *fil
>>>>>
>>>>>     				if (!bo)
>>>>>     					continue;
>>>>> +
>>>>> +				if (args->type ==
>>>> DRM_XE_MEM_RANGE_ATTR_PAT) {
>>>>> +					if (XE_IOCTL_DBG(xe, bo-
>>>>> ttm.base.import_attach &&
>>>>> +
>>>> xe_device_is_l2_flush_optimized(xe) &&
>>>>> +							 (pat_index != 19 &&
>>>>> +							  coh_mode !=
>>>> XE_COH_2WAY))) {
>>>>> +						err = -EINVAL;
>>>>> +						goto err_fini;
>>>>> +					}
>>>>> +				}
>>>>> +
>>>>>     				err = drm_exec_lock_obj(&exec, &bo-
>>>>> ttm.base);
>>>>>     				drm_exec_retry_on_contention(&exec);
>>>>>     				if (err)
>>>>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>>>> index ef2565048bdf..862fed3cf1ed 100644
>>>>> --- a/include/uapi/drm/xe_drm.h
>>>>> +++ b/include/uapi/drm/xe_drm.h
>>>>> @@ -1103,7 +1103,9 @@ struct drm_xe_vm_bind_op {
>>>>>     	 * incoherent GT access is possible.
>>>>>     	 *
>>>>>     	 * Note: For userptr and externally imported dma-buf the kernel
>>>> expects
>>>>> -	 * either 1WAY or 2WAY for the @pat_index.
>>>>> +	 * either 1WAY or 2WAY for the @pat_index. Starting from NVL-P, for
>>>>> +	 * userptr, svm, madvise and externally imported dma-buf the kernel
>>>> expects
>>>>> +	 * either 2WAY or 1WAY and XA @pat_index.
>>>>>     	 *
>>>>>     	 * For DRM_XE_VM_BIND_FLAG_NULL bindings there are no KMD
>>>> restrictions
>>>>>     	 * on the @pat_index. For such mappings there is no actual memory
>>>>> being
>>>
>

next prev parent reply	other threads:[~2026-03-09 17:22 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-05 12:19 [PATCH V6 0/4] drm/xe/xe3p_lpg: L2 flush optimization Tejas Upadhyay
2026-03-05 12:19 ` [PATCH V6 1/4] drm/xe/xe3p_lpg: flush shrinker bo cachelines manually Tejas Upadhyay
2026-03-05 12:19 ` [PATCH V6 2/4] drm/xe/pat: define coh_mode 2way Tejas Upadhyay
2026-03-05 12:19 ` [PATCH V6 3/4] drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization Tejas Upadhyay
2026-03-05 13:52   ` Thomas Hellström
2026-03-05 13:59   ` Matthew Auld
2026-03-06  5:46     ` Upadhyay, Tejas
2026-03-06  7:11     ` Zhang, Carl
2026-03-06  9:13       ` Upadhyay, Tejas
2026-03-06 10:08       ` Matthew Auld
2026-03-09 15:29         ` Zhang, Carl
2026-03-09 17:22           ` Matthew Auld [this message]
2026-03-09 17:30             ` Thomas Hellström
2026-03-11 14:58               ` Zhang, Carl
2026-03-05 12:19 ` [PATCH V6 4/4] drm/xe/xe3p: Skip TD flush Tejas Upadhyay
2026-03-06 12:16 ` ✓ CI.KUnit: success for drm/xe/xe3p_lpg: L2 flush optimization (rev7) Patchwork
2026-03-06 12:58 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-07 13:34 ` ✓ Xe.CI.FULL: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=07e5e84d-070b-4fb9-84b9-f80f63b14799@intel.com \
    --to=matthew.auld@intel.com \
    --cc=carl.zhang@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jose.souza@intel.com \
    --cc=michal.mrozek@intel.com \
    --cc=tejas.upadhyay@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox