Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe: Allow per queue programming of COMMON_SLICE_CHICKEN3 bit13
Date: Fri, 13 Feb 2026 10:28:06 +0200	[thread overview]
Message-ID: <94490d05-606d-4c20-bc7b-a14bc7e5480f@intel.com> (raw)
In-Reply-To: <20260212205013.GN4694@mdroper-desk1.amr.corp.intel.com>

On 12/02/2026 22:50, Matt Roper wrote:
> On Thu, Feb 12, 2026 at 02:16:21PM -0500, Rodrigo Vivi wrote:
>> On Thu, Feb 05, 2026 at 08:52:48PM +0200, Lionel Landwerlin wrote:
>>> Similar to i915's commit cebc13de7e704b1355bea208a9f9cdb042c74588
>>> ("drm/i915: Whitelist COMMON_SLICE_CHICKEN3 for UMD access"), except
>>> people have decided to not rely on putting the register on the
>>> allowlist for UMD to program and instead have context/queue creation
>>> flag.
>> Could you please be more specific here?
>>
>> I don't understand why we cannot simply add an RTP rule to xe_reg_whitelist.
>>
>> If we really need to go with the uapi, it is fine by me. Just attend to
>> all Jose's request, plus fix this commit message and then ack from my side.
>>
>> I mean, I'd like to understand this a bit better on why and change a bit
>> this message here, because 'people' just seems to blame someone
>> without being transparent about it. Not good in a commit message.
>> Not a good 'why' reason. The commit message should be about why, not blames...
> I think the reasoning is that COMMON_SLICE_CHICKEN3 isn't approved for
> whitelisting according to bspec 73995 (whereas it was approved on Xe1
> and older platforms on their corresponding pages).  We (the KMD team)
> are only ever allowed to whitelist registers that are explicitly listed
> on those 'approved whitelist' pages, or that are requested by a
> documented workaround that comes with its own formal approval for
> whitelisting from the security teams.  There have been some mistakes
> made here in the past (OA is a big area that we really screwed up on and
> it snuck by our code review), but it's good to be more vigilant about
> proper handling of whitelists going forward.
>
> It looks like the motivation for userspace wanting to adjust this
> register in the first place is the performance tuning setting that just
> showed up on bspec 72161.  That tuning guidance was just added one week
> ago, so it's not clear to me whether whitelisting was proposed and
> rejected (in favor of a KMD-driven solution like Lionel is adding
> below), or whether there might already be a security review in flight
> that will eventually lead to it getting added to page 73995.
>
>
> Matt


The motivation to want to adjust this register existed for a long time 
as described in the BSpec pages listed in the commit message.

Why has this moved to a context creation flag on windows and stopped 
being described a performance tuning knob... I don't know.

I poked people to get the documentation updated (hence the 72161 
change), but as far as I know there is no security review pending.


Windows drivers switched to using this rather than put the register on 
the UMD allowlist and all the currently shipping products use this on 
DX12/Vulkan.

We're currently missing the ability to program this on Xe (on i915 the 
allowlist is still in effect).


-Lionel


>
>>> This is a recommended tuning setting for both gen12 and Xe_HP
>>> platforms.
>>>
>>> Bspec: 73993, 73994, 31870, 68331
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/regs/xe_gt_regs.h     |  1 +
>>>   drivers/gpu/drm/xe/xe_exec_queue.c       | 20 +++++++++++++++++++-
>>>   drivers/gpu/drm/xe/xe_exec_queue_types.h |  2 ++
>>>   drivers/gpu/drm/xe/xe_lrc.c              |  9 +++++++++
>>>   drivers/gpu/drm/xe/xe_lrc.h              |  1 +
>>>   include/uapi/drm/xe_drm.h                |  1 +
>>>   6 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> index 24fc64fc832e9..a1b89f0a22e14 100644
>>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> @@ -172,6 +172,7 @@
>>>   
>>>   #define COMMON_SLICE_CHICKEN3				XE_REG(0x7304, XE_REG_OPTION_MASKED)
>>>   #define XEHP_COMMON_SLICE_CHICKEN3			XE_REG_MCR(0x7304, XE_REG_OPTION_MASKED)
>>> +#define   STATE_CACHE_PERF_FIX_DISABLED			REG_BIT(13)
>>>   #define   DG1_FLOAT_POINT_BLEND_OPT_STRICT_MODE_EN	REG_BIT(12)
>>>   #define   XEHP_DUAL_SIMD8_SEQ_MERGE_DISABLE		REG_BIT(12)
>>>   #define   BLEND_EMB_FIX_DISABLE_IN_RCC			REG_BIT(11)
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> index 66d0e10ee2c4a..d20b4d84bbe06 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> @@ -292,6 +292,9 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>>   	if (!(exec_queue_flags & EXEC_QUEUE_FLAG_KERNEL))
>>>   		flags |= XE_LRC_CREATE_USER_CTX;
>>>   
>>> +	if (q->flags & EXEC_QUEUE_FLAG_STATE_CACHE_PERF_FIX)
>>> +		flags |= XE_LRC_STATE_CACHE_PERF_FIX;
>>> +
>>>   	err = q->ops->init(q);
>>>   	if (err)
>>>   		return err;
>>> @@ -850,6 +853,19 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
>>>   	return q->ops->set_multi_queue_priority(q, value);
>>>   }
>>>   
>>> +static int exec_queue_set_state_cache_perf_fix(struct xe_device *xe, struct xe_exec_queue *q,
>>> +					       u64 value)
>>> +{
>>> +	if (XE_IOCTL_DBG(xe,
>>> +			 q->class != XE_ENGINE_CLASS_RENDER &&
>>> +                         q->class != XE_ENGINE_CLASS_COMPUTE))
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	q->flags |= value != 0 ? EXEC_QUEUE_FLAG_STATE_CACHE_PERF_FIX : 0;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>>>   					     struct xe_exec_queue *q,
>>>   					     u64 value);
>>> @@ -862,6 +878,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
>>>   	[DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
>>>   	[DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY] =
>>>   							exec_queue_set_multi_queue_priority,
>>> +	[DRM_XE_EXEC_QUEUE_SET_STATE_CACHE_PERF_FIX] = exec_queue_set_state_cache_perf_fix,
>>>   };
>>>   
>>>   int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
>>> @@ -946,7 +963,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
>>>   			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
>>>   			 ext.property != DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE &&
>>>   			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP &&
>>> -			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
>>> +			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY &&
>>> +			 ext.property != DRM_XE_EXEC_QUEUE_SET_STATE_CACHE_PERF_FIX))
>>>   		return -EINVAL;
>>>   
>>>   	idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>>> index 3791fed34ffa5..f4f72d01eb8c8 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>>> @@ -134,6 +134,8 @@ struct xe_exec_queue {
>>>   #define EXEC_QUEUE_FLAG_LOW_LATENCY		BIT(5)
>>>   /* for migration (kernel copy, clear, bind) jobs */
>>>   #define EXEC_QUEUE_FLAG_MIGRATE			BIT(6)
>>> +/* for programming COMMON_SLICE_CHICKEN2 on first submission */
>>> +#define EXEC_QUEUE_FLAG_STATE_CACHE_PERF_FIX	BIT(7)
>>>   
>>>   	/**
>>>   	 * @flags: flags for this exec queue, should statically setup aside from ban
>>> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
>>> index 3db7968aa5e22..571d7ef303e1e 100644
>>> --- a/drivers/gpu/drm/xe/xe_lrc.c
>>> +++ b/drivers/gpu/drm/xe/xe_lrc.c
>>> @@ -14,6 +14,7 @@
>>>   #include "instructions/xe_gfxpipe_commands.h"
>>>   #include "instructions/xe_gfx_state_commands.h"
>>>   #include "regs/xe_engine_regs.h"
>>> +#include "regs/xe_gt_regs.h"
>>>   #include "regs/xe_lrc_layout.h"
>>>   #include "xe_bb.h"
>>>   #include "xe_bo.h"
>>> @@ -1443,6 +1444,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>>>   	struct xe_device *xe = gt_to_xe(gt);
>>>   	struct iosys_map map;
>>>   	u32 arb_enable;
>>> +	u32 state_cache_perf_fix[3];
>>>   	u32 bo_flags;
>>>   	int err;
>>>   
>>> @@ -1575,6 +1577,13 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>>>   	arb_enable = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>   	xe_lrc_write_ring(lrc, &arb_enable, sizeof(arb_enable));
>>>   
>>> +	if (init_flags & XE_LRC_STATE_CACHE_PERF_FIX) {
>>> +		state_cache_perf_fix[0] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1);
>>> +		state_cache_perf_fix[1] = COMMON_SLICE_CHICKEN3.addr;
>>> +		state_cache_perf_fix[2] = _MASKED_BIT_ENABLE(STATE_CACHE_PERF_FIX_DISABLED);
>>> +		xe_lrc_write_ring(lrc, state_cache_perf_fix, sizeof(state_cache_perf_fix));
>>> +	}
>>> +
>>>   	map = __xe_lrc_seqno_map(lrc);
>>>   	xe_map_write32(lrc_to_xe(lrc), &map, lrc->fence_ctx.next_seqno - 1);
>>>   
>>> diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
>>> index c307a3fd9ea28..083a2167aeef8 100644
>>> --- a/drivers/gpu/drm/xe/xe_lrc.h
>>> +++ b/drivers/gpu/drm/xe/xe_lrc.h
>>> @@ -49,6 +49,7 @@ struct xe_lrc_snapshot {
>>>   #define XE_LRC_CREATE_RUNALONE		BIT(0)
>>>   #define XE_LRC_CREATE_PXP		BIT(1)
>>>   #define XE_LRC_CREATE_USER_CTX		BIT(2)
>>> +#define XE_LRC_STATE_CACHE_PERF_FIX	BIT(3)
>>>   
>>>   struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
>>>   			     void *replay_state, u32 ring_size, u16 msix_vec, u32 flags);
>>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>> index 077e66a682e29..b69b95a56bd82 100644
>>> --- a/include/uapi/drm/xe_drm.h
>>> +++ b/include/uapi/drm/xe_drm.h
>>> @@ -1329,6 +1329,7 @@ struct drm_xe_exec_queue_create {
>>>   #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP		4
>>>   #define     DRM_XE_MULTI_GROUP_CREATE				(1ull << 63)
>>>   #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY	5
>>> +#define   DRM_XE_EXEC_QUEUE_SET_STATE_CACHE_PERF_FIX		6
>>>   	/** @extensions: Pointer to the first extension struct, if any */
>>>   	__u64 extensions;
>>>   
>>> -- 
>>> 2.43.0
>>>


      reply	other threads:[~2026-02-13  8:26 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05 18:52 [PATCH] drm/xe: Allow per queue programming of COMMON_SLICE_CHICKEN3 bit13 Lionel Landwerlin
2026-02-05 18:57 ` ✗ CI.checkpatch: warning for " Patchwork
2026-02-05 18:58 ` ✓ CI.KUnit: success " Patchwork
2026-02-05 19:48 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-06 18:29 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-02-12 18:56 ` [PATCH] " Souza, Jose
2026-02-12 19:16 ` Rodrigo Vivi
2026-02-12 20:50   ` Matt Roper
2026-02-13  8:28     ` Lionel Landwerlin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94490d05-606d-4c20-bc7b-a14bc7e5480f@intel.com \
    --to=lionel.g.landwerlin@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.d.roper@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox