* [PATCH 0/3] Xe2 performance tuning updates
@ 2024-09-18 20:47 Gustavo Sousa
2024-09-18 20:47 ` [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT Gustavo Sousa
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-18 20:47 UTC (permalink / raw)
To: intel-xe; +Cc: Matt Roper
This series contains updates for Xe2 recommended performance tuning
settings. It brings over a v2 of [1] as patch #1 and adds other relevant
updates with patches #2 and #3.
[1] https://patchwork.freedesktop.org/series/138776/
Gustavo Sousa (3):
drm/xe/xe2: Extend performance tuning to media GT
drm/xe/xe2: Assume tuning settings also apply for future media GT
drm/xe/xe2: Add performance tuning for L3 cache flushing
drivers/gpu/drm/xe/regs/xe_gt_regs.h | 12 ++++++++++
drivers/gpu/drm/xe/xe_tuning.c | 34 +++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 1 deletion(-)
--
2.46.1
* [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
@ 2024-09-18 20:47 ` Gustavo Sousa
2024-09-19 8:00 ` Upadhyay, Tejas
2024-09-18 20:47 ` [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future " Gustavo Sousa
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-18 20:47 UTC (permalink / raw)
To: intel-xe; +Cc: Matt Roper
With the exception of "Tuning: L3 cache - media", we are currently applying
recommended performance tuning settings only for the primary GT. Let's
also implement them for the media GT when applicable.
According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
only in Xe2_LPM and their offsets do not match their primary GT
counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
listed as a multicast range on the media GT. As such, we need to have
Xe2_LPM-specific definitions for those registers and apply the setting
only for that specific IP.
Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
offset on the media GT matches the one on the primary one. However, the
range that contains that register is not listed as a multicast
range, so we need two different entries for media.
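
As a rough illustration of the point above (not the driver's actual code), the
sketch below models one register offset described by two descriptors, one
multicast and one plain, and picks between them by media version. The names
sketch_reg, sketch_ctrl_mcr, sketch_ctrl_plain and sketch_pick_ctrl are
hypothetical; only the offset 0x4148 and the version numbers 2000 and 1301 come
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's register descriptor. */
struct sketch_reg {
	uint32_t offset;
	bool mcr;	/* true: access is treated as multicast */
};

/* Same offset, two descriptors, mirroring the two #defines in the patch. */
static const struct sketch_reg sketch_ctrl_mcr = { .offset = 0x4148, .mcr = true };
static const struct sketch_reg sketch_ctrl_plain = { .offset = 0x4148, .mcr = false };

/*
 * Pick the descriptor for a media GT, following the rules in the patch:
 * version 2000 uses the multicast definition, version 1301 the plain one.
 * Returns NULL when neither media entry applies.
 */
static const struct sketch_reg *sketch_pick_ctrl(unsigned int media_ver)
{
	if (media_ver == 2000)
		return &sketch_ctrl_mcr;
	if (media_ver == 1301)
		return &sketch_ctrl_plain;
	return NULL;
}
```

The design point is that the offset alone does not determine how the register
is accessed, so each IP needs its own table entry even when the offsets match.
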
v2:
- Fix implementation with respect to multicast vs non-multicast
registers. (Matt)
- Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
Compression Overfetch - media".
Bspec: 72161
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
---
drivers/gpu/drm/xe/regs/xe_gt_regs.h | 7 +++++++
drivers/gpu/drm/xe/xe_tuning.c | 24 ++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index cf21de3adca6..6ec2d2c11d77 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -80,6 +80,7 @@
#define LE_CACHEABILITY_MASK REG_GENMASK(1, 0)
#define LE_CACHEABILITY(value) REG_FIELD_PREP(LE_CACHEABILITY_MASK, value)
+#define XELPMP_STATELESS_COMPRESSION_CTRL XE_REG(0x4148)
#define STATELESS_COMPRESSION_CTRL XE_REG_MCR(0x4148)
#define UNIFIED_COMPRESSION_FORMAT REG_GENMASK(3, 0)
@@ -169,6 +170,8 @@
#define XEHP_SLICE_COMMON_ECO_CHICKEN1 XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
#define MSC_MSAA_REODER_BUF_BYPASS_DISABLE REG_BIT(14)
+#define XE2LPM_CCCHKNREG1 XE_REG(0x82a8)
+
#define VF_PREEMPTION XE_REG(0x83a4, XE_REG_OPTION_MASKED)
#define PREEMPTION_VERTEX_COUNT REG_GENMASK(15, 0)
@@ -399,6 +402,10 @@
#define SCRATCH1LPFC XE_REG(0xb474)
#define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
+#define XE2LPM_L3SQCREG2 XE_REG_MCR(0xb604)
+
+#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
+
#define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
#define XE2_TDF_CTRL XE_REG(0xb418)
diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
index faa1bf42e50e..7a5b852af8d7 100644
--- a/drivers/gpu/drm/xe/xe_tuning.c
+++ b/drivers/gpu/drm/xe/xe_tuning.c
@@ -42,20 +42,44 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
SET(CCCHKNREG1, L3CMPCTRL))
},
+ { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
+ XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
+ SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
+ },
{ XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"),
XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
},
+ { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"),
+ XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3, COMPPWOVERFETCHEN))
+ },
{ XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(SET(L3SQCREG2,
COMPMEMRD256BOVRFETCHEN))
},
+ { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
+ XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
+ COMPMEMRD256BOVRFETCHEN))
+ },
{ XE_RTP_NAME("Tuning: Stateless compression control"),
XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
},
+ { XE_RTP_NAME("Tuning: Stateless compression control - media"),
+ XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
+ REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
+ },
+ { XE_RTP_NAME("Tuning: Stateless compression control - media (Xe2_HPM)"),
+ XE_RTP_RULES(MEDIA_VERSION(1301)),
+ XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
+ REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
+ },
{}
};
--
2.46.1
* [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future media GT
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
2024-09-18 20:47 ` [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT Gustavo Sousa
@ 2024-09-18 20:47 ` Gustavo Sousa
2024-09-19 8:01 ` Upadhyay, Tejas
2024-09-18 20:47 ` [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing Gustavo Sousa
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-18 20:47 UTC (permalink / raw)
To: intel-xe; +Cc: Matt Roper
We already make the assumption that recommended tuning settings for
primary GT on Xe2 will also apply for future releases. Let's make the
same assumption for the media GT. We can come back and define closed
ranges when that becomes necessary.
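
To make the difference between an exact match and an open-ended range concrete,
here is a minimal sketch of the matching logic. It is an assumption about how
such a rule could be evaluated, not the driver's implementation: the sentinel
SKETCH_END_VERSION_UNDEFINED, the struct sketch_ver_rule and the function
sketch_ver_match are all hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical sentinel meaning "no upper bound". The driver's own
 * XE_RTP_END_VERSION_UNDEFINED may be defined differently.
 */
#define SKETCH_END_VERSION_UNDEFINED UINT_MAX

struct sketch_ver_rule {
	unsigned int start;	/* inclusive lower bound */
	unsigned int end;	/* inclusive upper bound, or the sentinel */
};

/* Return true when ver falls inside the rule's range. */
static bool sketch_ver_match(const struct sketch_ver_rule *rule, unsigned int ver)
{
	assert(rule->start <= rule->end);
	return ver >= rule->start &&
	       (rule->end == SKETCH_END_VERSION_UNDEFINED || ver <= rule->end);
}
```

With this model, a rule of { 2000, 2000 } matches only version 2000, while
{ 2000, SKETCH_END_VERSION_UNDEFINED } also matches any later version. Version
1301 is below the lower bound and is matched by neither.
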
Bspec: 72161
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
---
drivers/gpu/drm/xe/xe_tuning.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
index 7a5b852af8d7..f62622f0be85 100644
--- a/drivers/gpu/drm/xe/xe_tuning.c
+++ b/drivers/gpu/drm/xe/xe_tuning.c
@@ -33,7 +33,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)))
},
{ XE_RTP_NAME("Tuning: L3 cache - media"),
- XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(FIELD_SET(XE2LPM_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK,
REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)))
},
@@ -43,7 +43,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
SET(CCCHKNREG1, L3CMPCTRL))
},
{ XE_RTP_NAME("Tuning: Compression Overfetch - media"),
- XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
},
@@ -52,7 +52,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
},
{ XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"),
- XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3, COMPPWOVERFETCHEN))
},
{ XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
@@ -61,7 +61,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
COMPMEMRD256BOVRFETCHEN))
},
{ XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
- XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
COMPMEMRD256BOVRFETCHEN))
},
@@ -71,7 +71,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
},
{ XE_RTP_NAME("Tuning: Stateless compression control - media"),
- XE_RTP_RULES(MEDIA_VERSION(2000)),
+ XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
},
--
2.46.1
* [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
2024-09-18 20:47 ` [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT Gustavo Sousa
2024-09-18 20:47 ` [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future " Gustavo Sousa
@ 2024-09-18 20:47 ` Gustavo Sousa
2024-09-19 7:39 ` Pottumuttu, Sai Teja
2024-09-19 8:22 ` Upadhyay, Tejas
2024-09-18 23:50 ` ✓ CI.Patch_applied: success for Xe2 performance tuning updates Patchwork
` (2 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-18 20:47 UTC (permalink / raw)
To: intel-xe; +Cc: Matt Roper
A recommended performance tuning for LNL related to L3 cache flushing
was recently introduced in Bspec. Implement it.
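
For readers unfamiliar with the SET, CLR and FIELD_SET actions used in the
tuning table, the sketch below models them as a clear mask plus a set mask
applied in a read-modify-write step. This is an assumption about the general
technique, not a copy of the driver's code; every sketch_* name is
hypothetical. Bit 17 corresponds to RWFLUSHALLEN and the mask for bits 3:0 to
UNIFIED_COMPRESSION_FORMAT as defined in the patches.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BIT(n)		(UINT32_C(1) << (n))
#define SKETCH_GENMASK(h, l)	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

/* An action is a pair of masks: bits to clear first, then bits to set. */
struct sketch_action {
	uint32_t clr_bits;
	uint32_t set_bits;
};

/* Read-modify-write: clear, then set. */
static uint32_t sketch_apply(uint32_t old, struct sketch_action a)
{
	return (old & ~a.clr_bits) | a.set_bits;
}

/* SET: turn the given bits on. */
static struct sketch_action sketch_set(uint32_t bits)
{
	return (struct sketch_action){ .clr_bits = bits, .set_bits = bits };
}

/* CLR: turn the given bits off. */
static struct sketch_action sketch_clr(uint32_t bits)
{
	return (struct sketch_action){ .clr_bits = bits, .set_bits = 0 };
}

/* FIELD_SET: replace the whole field under mask with val. */
static struct sketch_action sketch_field_set(uint32_t mask, uint32_t val)
{
	assert((val & ~mask) == 0);
	return (struct sketch_action){ .clr_bits = mask, .set_bits = val };
}
```

Under this model, the new tuning entry amounts to
sketch_apply(old, sketch_set(SKETCH_BIT(17))), which leaves every other bit of
the register untouched.
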
Bspec: 70821
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
---
drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index 6ec2d2c11d77..ccd18cdd5b50 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -389,6 +389,9 @@
#define L3SQCREG3 XE_REG_MCR(0xb108)
#define COMPPWOVERFETCHEN REG_BIT(28)
+#define SCRATCH3LBCF XE_REG_MCR(0xb154)
+#define RWFLUSHALLEN REG_BIT(17)
+
#define XEHP_L3SQCREG5 XE_REG_MCR(0xb158)
#define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
@@ -406,6 +409,8 @@
#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
+#define XE2LPM_SCRATCH3LBCF XE_REG_MCR(0xb654)
+
#define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
#define XE2_TDF_CTRL XE_REG(0xb418)
diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
index f62622f0be85..4dd77b44ac82 100644
--- a/drivers/gpu/drm/xe/xe_tuning.c
+++ b/drivers/gpu/drm/xe/xe_tuning.c
@@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
},
+ { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
+ XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004, XE_RTP_END_VERSION_UNDEFINED)),
+ XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
+ },
+ { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
+ XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
+ XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
+ },
{}
};
--
2.46.1
* ✓ CI.Patch_applied: success for Xe2 performance tuning updates
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
` (2 preceding siblings ...)
2024-09-18 20:47 ` [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing Gustavo Sousa
@ 2024-09-18 23:50 ` Patchwork
2024-09-18 23:51 ` ✓ CI.checkpatch: " Patchwork
2024-09-18 23:51 ` ✗ CI.KUnit: failure " Patchwork
5 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2024-09-18 23:50 UTC (permalink / raw)
To: Gustavo Sousa; +Cc: intel-xe
== Series Details ==
Series: Xe2 performance tuning updates
URL : https://patchwork.freedesktop.org/series/138844/
State : success
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 2141344f9b4f drm-tip: 2024y-09m-18d-21h-15m-07s UTC integration manifest
=== git am output follows ===
Applying: drm/xe/xe2: Extend performance tuning to media GT
Applying: drm/xe/xe2: Assume tuning settings also apply for future media GT
Applying: drm/xe/xe2: Add performance tuning for L3 cache flushing
* ✓ CI.checkpatch: success for Xe2 performance tuning updates
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
` (3 preceding siblings ...)
2024-09-18 23:50 ` ✓ CI.Patch_applied: success for Xe2 performance tuning updates Patchwork
@ 2024-09-18 23:51 ` Patchwork
2024-09-18 23:51 ` ✗ CI.KUnit: failure " Patchwork
5 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2024-09-18 23:51 UTC (permalink / raw)
To: Gustavo Sousa; +Cc: intel-xe
== Series Details ==
Series: Xe2 performance tuning updates
URL : https://patchwork.freedesktop.org/series/138844/
State : success
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
c62d7e164862503a3662a095da1c6c9014248cb2
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 29ff5c767106d015524722190a456bb0cd35280c
Author: Gustavo Sousa <gustavo.sousa@intel.com>
Date: Wed Sep 18 17:47:31 2024 -0300
drm/xe/xe2: Add performance tuning for L3 cache flushing
A recommended performance tuning for LNL related to L3 cache flushing
was recently introduced in Bspec. Implement it.
Bspec: 70821
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
+ /mt/dim checkpatch 2141344f9b4f0bcd6bb20a45afaef94209743c0d drm-intel
b8fae12e6d77 drm/xe/xe2: Extend performance tuning to media GT
b949cee5a311 drm/xe/xe2: Assume tuning settings also apply for future media GT
29ff5c767106 drm/xe/xe2: Add performance tuning for L3 cache flushing
* ✗ CI.KUnit: failure for Xe2 performance tuning updates
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
` (4 preceding siblings ...)
2024-09-18 23:51 ` ✓ CI.checkpatch: " Patchwork
@ 2024-09-18 23:51 ` Patchwork
5 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2024-09-18 23:51 UTC (permalink / raw)
To: Gustavo Sousa; +Cc: intel-xe
== Series Details ==
Series: Xe2 performance tuning updates
URL : https://patchwork.freedesktop.org/series/138844/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:In file included from ../drivers/gpu/drm/xe/xe_rtp.h:14,
from ../drivers/gpu/drm/xe/xe_tuning.c:13:
../drivers/gpu/drm/xe/xe_rtp_helpers.h:81:39: error: ‘DROP_FIRST_ARG’ undeclared here (not in a function)
81 | #define XE_RTP_DROP_CAST(...) _XE_ESC(DROP_FIRST_ARG _XE_ESC __VA_ARGS__)
| ^~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:18:22: note: in definition of macro ‘_XE_ESC’
18 | #define _XE_ESC(...) __VA_ARGS__
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp.h:261:11: note: in expansion of macro ‘XE_RTP_DROP_CAST’
261 | { .reg = XE_RTP_DROP_CAST(reg_), \
| ^~~~~~~~~~~~~~~~
../include/linux/args.h:25:24: note: in expansion of macro ‘XE_RTP_ACTION_SET’
25 | #define __CONCAT(a, b) a ## b
| ^
../include/linux/args.h:26:27: note: in expansion of macro ‘__CONCAT’
26 | #define CONCATENATE(a, b) __CONCAT(a, b)
| ^~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:22:30: note: in expansion of macro ‘CONCATENATE’
22 | #define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b))
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:57:46: note: in expansion of macro ‘_XE_RTP_CONCAT’
57 | #define XE_RTP_PASTE_1(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_)
| ^~~~~~~~~~~~~~
../include/linux/args.h:25:24: note: in expansion of macro ‘XE_RTP_PASTE_1’
25 | #define __CONCAT(a, b) a ## b
| ^
../include/linux/args.h:26:27: note: in expansion of macro ‘__CONCAT’
26 | #define CONCATENATE(a, b) __CONCAT(a, b)
| ^~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:22:30: note: in expansion of macro ‘CONCATENATE’
22 | #define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b))
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:56:52: note: in expansion of macro ‘_XE_RTP_CONCAT’
56 | #define XE_RTP_PASTE_FOREACH(prefix_, sep_, args_) _XE_RTP_CONCAT(PASTE_, COUNT_ARGS args_)(prefix_, sep_, args_)
| ^~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp.h:420:3: note: in expansion of macro ‘XE_RTP_PASTE_FOREACH’
420 | XE_RTP_PASTE_FOREACH(ACTION_, COMMA, (__VA_ARGS__)) \
| ^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_tuning.c:85:4: note: in expansion of macro ‘XE_RTP_ACTIONS’
85 | XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
| ^~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:81:54: error: expected ‘}’ before ‘_XE_ESC’
81 | #define XE_RTP_DROP_CAST(...) _XE_ESC(DROP_FIRST_ARG _XE_ESC __VA_ARGS__)
| ^~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:18:22: note: in definition of macro ‘_XE_ESC’
18 | #define _XE_ESC(...) __VA_ARGS__
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp.h:261:11: note: in expansion of macro ‘XE_RTP_DROP_CAST’
261 | { .reg = XE_RTP_DROP_CAST(reg_), \
| ^~~~~~~~~~~~~~~~
../include/linux/args.h:25:24: note: in expansion of macro ‘XE_RTP_ACTION_SET’
25 | #define __CONCAT(a, b) a ## b
| ^
../include/linux/args.h:26:27: note: in expansion of macro ‘__CONCAT’
26 | #define CONCATENATE(a, b) __CONCAT(a, b)
| ^~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:22:30: note: in expansion of macro ‘CONCATENATE’
22 | #define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b))
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:57:46: note: in expansion of macro ‘_XE_RTP_CONCAT’
57 | #define XE_RTP_PASTE_1(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_)
| ^~~~~~~~~~~~~~
../include/linux/args.h:25:24: note: in expansion of macro ‘XE_RTP_PASTE_1’
25 | #define __CONCAT(a, b) a ## b
| ^
../include/linux/args.h:26:27: note: in expansion of macro ‘__CONCAT’
26 | #define CONCATENATE(a, b) __CONCAT(a, b)
| ^~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:22:30: note: in expansion of macro ‘CONCATENATE’
22 | #define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b))
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:56:52: note: in expansion of macro ‘_XE_RTP_CONCAT’
56 | #define XE_RTP_PASTE_FOREACH(prefix_, sep_, args_) _XE_RTP_CONCAT(PASTE_, COUNT_ARGS args_)(prefix_, sep_, args_)
| ^~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp.h:420:3: note: in expansion of macro ‘XE_RTP_PASTE_FOREACH’
420 | XE_RTP_PASTE_FOREACH(ACTION_, COMMA, (__VA_ARGS__)) \
| ^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_tuning.c:85:4: note: in expansion of macro ‘XE_RTP_ACTIONS’
85 | XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
| ^~~~~~~~~~~~~~
In file included from ../drivers/gpu/drm/xe/xe_tuning.c:13:
../drivers/gpu/drm/xe/xe_rtp.h:261:2: note: to match this ‘{’
261 | { .reg = XE_RTP_DROP_CAST(reg_), \
| ^
../include/linux/args.h:25:24: note: in expansion of macro ‘XE_RTP_ACTION_SET’
25 | #define __CONCAT(a, b) a ## b
| ^
../include/linux/args.h:26:27: note: in expansion of macro ‘__CONCAT’
26 | #define CONCATENATE(a, b) __CONCAT(a, b)
| ^~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:22:30: note: in expansion of macro ‘CONCATENATE’
22 | #define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b))
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:57:46: note: in expansion of macro ‘_XE_RTP_CONCAT’
57 | #define XE_RTP_PASTE_1(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_)
| ^~~~~~~~~~~~~~
../include/linux/args.h:25:24: note: in expansion of macro ‘XE_RTP_PASTE_1’
25 | #define __CONCAT(a, b) a ## b
| ^
../include/linux/args.h:26:27: note: in expansion of macro ‘__CONCAT’
26 | #define CONCATENATE(a, b) __CONCAT(a, b)
| ^~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:22:30: note: in expansion of macro ‘CONCATENATE’
22 | #define _XE_RTP_CONCAT(a, b) CONCATENATE(XE_RTP_, CONCATENATE(a, b))
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp_helpers.h:56:52: note: in expansion of macro ‘_XE_RTP_CONCAT’
56 | #define XE_RTP_PASTE_FOREACH(prefix_, sep_, args_) _XE_RTP_CONCAT(PASTE_, COUNT_ARGS args_)(prefix_, sep_, args_)
| ^~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_rtp.h:420:3: note: in expansion of macro ‘XE_RTP_PASTE_FOREACH’
420 | XE_RTP_PASTE_FOREACH(ACTION_, COMMA, (__VA_ARGS__)) \
| ^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_tuning.c:85:4: note: in expansion of macro ‘XE_RTP_ACTIONS’
85 | XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
| ^~~~~~~~~~~~~~
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/xe/xe_tuning.o] Error 1
make[7]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
156 | u64 ioread64_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
163 | u64 ioread64_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
170 | u64 ioread64be_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
178 | u64 ioread64be_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
make[6]: *** [../scripts/Makefile.build:485: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:485: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:485: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:485: drivers] Error 2
make[2]: *** [/kernel/Makefile:1926: .] Error 2
make[1]: *** [/kernel/Makefile:224: __sub-make] Error 2
make: *** [Makefile:224: __sub-make] Error 2
[23:51:05] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[23:51:09] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* Re: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-18 20:47 ` [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing Gustavo Sousa
@ 2024-09-19 7:39 ` Pottumuttu, Sai Teja
2024-09-19 18:46 ` Gustavo Sousa
2024-09-19 8:22 ` Upadhyay, Tejas
1 sibling, 1 reply; 18+ messages in thread
From: Pottumuttu, Sai Teja @ 2024-09-19 7:39 UTC (permalink / raw)
To: Gustavo Sousa, intel-xe; +Cc: Matt Roper, sai.teja.pottumuttu
On 19-09-2024 02:17, Gustavo Sousa wrote:
> A recommended performance tuning for LNL related to L3 cache flushing
> was recently introduced in Bspec. Implement it.
>
> Bspec: 70821
The correct BSpec should be 72161 I guess.
> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 6ec2d2c11d77..ccd18cdd5b50 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -389,6 +389,9 @@
> #define L3SQCREG3 XE_REG_MCR(0xb108)
> #define COMPPWOVERFETCHEN REG_BIT(28)
>
> +#define SCRATCH3LBCF XE_REG_MCR(0xb154)
> +#define RWFLUSHALLEN REG_BIT(17)
> +
> #define XEHP_L3SQCREG5 XE_REG_MCR(0xb158)
> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
>
> @@ -406,6 +409,8 @@
>
> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>
> +#define XE2LPM_SCRATCH3LBCF XE_REG_MCR(0xb654)
Just a general question: the register might exist on other platforms as
well, right? If so, would it be a good idea to call it SCRATCH3LBCF_MEDIA
instead?
> +
> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>
> #define XE2_TDF_CTRL XE_REG(0xb418)
> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
> index f62622f0be85..4dd77b44ac82 100644
> --- a/drivers/gpu/drm/xe/xe_tuning.c
> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> },
> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004, XE_RTP_END_VERSION_UNDEFINED)),
> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
The register should be SCRATCH3LBCF
Thank You
- Sai Teja
> + },
> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
> + },
> {}
> };
>
* RE: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
2024-09-18 20:47 ` [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT Gustavo Sousa
@ 2024-09-19 8:00 ` Upadhyay, Tejas
2024-09-19 18:08 ` Gustavo Sousa
0 siblings, 1 reply; 18+ messages in thread
From: Upadhyay, Tejas @ 2024-09-19 8:00 UTC (permalink / raw)
To: Sousa, Gustavo, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Gustavo
> Sousa
> Sent: Thursday, September 19, 2024 2:17 AM
> To: intel-xe@lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> Subject: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
>
> With the exception of "Tuning: L3 cache - media", we are currently applying
> recommended performance tuning settings only for the primary GT. Let's also
> implement them for the media GT when applicable.
>
> According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
> only in Xe2_LPM and their offsets do not match their primary GT
> counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
> listed as a multicast range on the media GT. As such, we need to have
> Xe2_LPM-specific definitions for those registers and apply the setting only for
> that specific IP.
>
> Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and
> the offset on the media GT matches the one on the primary one. However,
> the range that contains that register is not listed as a multicast range, so
> we need two different entries for media.
>
> v2:
> - Fix implementation with respect to multicast vs non-multicast
> registers. (Matt)
> - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
> Compression Overfetch - media".
>
> Bspec: 72161
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 7 +++++++
> drivers/gpu/drm/xe/xe_tuning.c | 24 ++++++++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index cf21de3adca6..6ec2d2c11d77 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -80,6 +80,7 @@
> #define LE_CACHEABILITY_MASK REG_GENMASK(1, 0)
> #define LE_CACHEABILITY(value)
> REG_FIELD_PREP(LE_CACHEABILITY_MASK, value)
>
> +#define XELPMP_STATELESS_COMPRESSION_CTRL XE_REG(0x4148)
Were you trying to say XE2LPM_ here? Also, this seems to be an MCR register.
> #define STATELESS_COMPRESSION_CTRL
> XE_REG_MCR(0x4148)
> #define UNIFIED_COMPRESSION_FORMAT REG_GENMASK(3, 0)
>
> @@ -169,6 +170,8 @@
> #define XEHP_SLICE_COMMON_ECO_CHICKEN1
> XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
> #define MSC_MSAA_REODER_BUF_BYPASS_DISABLE REG_BIT(14)
>
> +#define XE2LPM_CCCHKNREG1 XE_REG(0x82a8)
> +
> #define VF_PREEMPTION XE_REG(0x83a4,
> XE_REG_OPTION_MASKED)
> #define PREEMPTION_VERTEX_COUNT REG_GENMASK(15, 0)
>
> @@ -399,6 +402,10 @@
> #define SCRATCH1LPFC XE_REG(0xb474)
> #define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
>
> +#define XE2LPM_L3SQCREG2 XE_REG_MCR(0xb604)
> +
> +#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
> +
These are not marked as MCR in Bspec. Is there something I missed?
> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>
> #define XE2_TDF_CTRL XE_REG(0xb418)
> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
> index faa1bf42e50e..7a5b852af8d7 100644
> --- a/drivers/gpu/drm/xe/xe_tuning.c
> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> @@ -42,20 +42,44 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
> SET(CCCHKNREG1, L3CMPCTRL))
> },
> + { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
> + SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
> + },
> { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch
> in L3"),
> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
> XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
> },
> + { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch
> in L3 - media"),
> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3,
> COMPPWOVERFETCHEN))
> + },
> { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
> XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(SET(L3SQCREG2,
> COMPMEMRD256BOVRFETCHEN))
> },
> + { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
> + COMPMEMRD256BOVRFETCHEN))
> + },
> { XE_RTP_NAME("Tuning: Stateless compression control"),
> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
> XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
> UNIFIED_COMPRESSION_FORMAT,
>
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> },
> + { XE_RTP_NAME("Tuning: Stateless compression control - media"),
> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
> UNIFIED_COMPRESSION_FORMAT,
> +
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> + },
> + { XE_RTP_NAME("Tuning: Stateless compression control - media
> (Xe2_HPM)"),
> + XE_RTP_RULES(MEDIA_VERSION(1301)),
> +
> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
> UNIFIED_COMPRESSION_FORMAT,
> +
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> + },
> {}
> };
>
> --
> 2.46.1
* RE: [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future media GT
2024-09-18 20:47 ` [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future " Gustavo Sousa
@ 2024-09-19 8:01 ` Upadhyay, Tejas
0 siblings, 0 replies; 18+ messages in thread
From: Upadhyay, Tejas @ 2024-09-19 8:01 UTC (permalink / raw)
To: Sousa, Gustavo, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Gustavo
> Sousa
> Sent: Thursday, September 19, 2024 2:18 AM
> To: intel-xe@lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> Subject: [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future
> media GT
>
> We already make the assumption that recommended tuning settings for
> primary GT on Xe2 will also apply for future releases. Let's make the same
> assumption for the media GT. We can come back and define closed ranges
> when that becomes necessary.
>
> Bspec: 72161
> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
> ---
> drivers/gpu/drm/xe/xe_tuning.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
> index 7a5b852af8d7..f62622f0be85 100644
> --- a/drivers/gpu/drm/xe/xe_tuning.c
> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> @@ -33,7 +33,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>
> REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)))
> },
> { XE_RTP_NAME("Tuning: L3 cache - media"),
> - XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> +XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(FIELD_SET(XE2LPM_L3SQCREG5,
> L3_PWM_TIMER_INIT_VAL_MASK,
>
> REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)))
> },
> @@ -43,7 +43,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> SET(CCCHKNREG1, L3CMPCTRL))
> },
> { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
> - XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> +XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
> SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
> },
> @@ -52,7 +52,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
> },
> { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch
> in L3 - media"),
> - XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> +XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3,
> COMPPWOVERFETCHEN))
> },
> { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
> @@ -61,7 +61,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> COMPMEMRD256BOVRFETCHEN))
> },
> { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
> - XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> +XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
> COMPMEMRD256BOVRFETCHEN))
> },
> @@ -71,7 +71,7 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> },
> { XE_RTP_NAME("Tuning: Stateless compression control - media"),
> - XE_RTP_RULES(MEDIA_VERSION(2000)),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> +XE_RTP_END_VERSION_UNDEFINED)),
> XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
> UNIFIED_COMPRESSION_FORMAT,
Looks fine,
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> },
> --
> 2.46.1
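For readers following the change in this patch, the difference between an exact-version rule and an open-ended range can be sketched as below. This is only an illustration under stated assumptions, not the driver's rule-matching code: `VERSION_END_UNDEFINED`, `match_exact()` and `match_range()` are hypothetical names, and the sentinel value is an assumption (the real value of XE_RTP_END_VERSION_UNDEFINED is not shown in this thread).

```c
#include <stdbool.h>

/*
 * Hypothetical sentinel meaning "no upper bound". The value used by the
 * real XE_RTP_END_VERSION_UNDEFINED is not shown in this thread.
 */
#define VERSION_END_UNDEFINED 0xffffffffu

/* Exact match, analogous in spirit to MEDIA_VERSION(2000). */
static bool match_exact(unsigned int ver, unsigned int want)
{
	return ver == want;
}

/* Range match, analogous in spirit to MEDIA_VERSION_RANGE(start, end). */
static bool match_range(unsigned int ver, unsigned int start, unsigned int end)
{
	if (ver < start)
		return false;

	/* An undefined end means every later version also matches. */
	return end == VERSION_END_UNDEFINED || ver <= end;
}
```

With these helpers, a made-up future media version such as 3000 fails match_exact(3000, 2000) but passes match_range(3000, 2000, VERSION_END_UNDEFINED), while version 1301 fails both. That is the assumption this patch makes for the media GT.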
* RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-18 20:47 ` [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing Gustavo Sousa
2024-09-19 7:39 ` Pottumuttu, Sai Teja
@ 2024-09-19 8:22 ` Upadhyay, Tejas
2024-09-19 19:24 ` Gustavo Sousa
1 sibling, 1 reply; 18+ messages in thread
From: Upadhyay, Tejas @ 2024-09-19 8:22 UTC (permalink / raw)
To: Sousa, Gustavo, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Gustavo
> Sousa
> Sent: Thursday, September 19, 2024 2:18 AM
> To: intel-xe@lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> Subject: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
> flushing
>
> A recommended performance tuning for LNL related to L3 cache flushing was
> recently introduced in Bspec. Implement it.
>
> Bspec: 70821
Yes, the Bspec reference needs an update.
> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 6ec2d2c11d77..ccd18cdd5b50 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -389,6 +389,9 @@
> #define L3SQCREG3 XE_REG_MCR(0xb108)
> #define COMPPWOVERFETCHEN REG_BIT(28)
>
> +#define SCRATCH3LBCF
> XE_REG_MCR(0xb154)
Please name this just SCRATCH3, as Bspec does.
> +#define RWFLUSHALLEN REG_BIT(17)
> +
> #define XEHP_L3SQCREG5
> XE_REG_MCR(0xb158)
> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
>
> @@ -406,6 +409,8 @@
>
> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>
> +#define XE2LPM_SCRATCH3LBCF
> XE_REG_MCR(0xb654)
I agree with the other review that we should name this MEDIA_SCRATCH3 for future use as well. Also, this does not look like an MCR register. Please double-check.
With that addressed,
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> +
> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>
> #define XE2_TDF_CTRL XE_REG(0xb418)
> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
> index f62622f0be85..4dd77b44ac82 100644
> --- a/drivers/gpu/drm/xe/xe_tuning.c
> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>
> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
> UNIFIED_COMPRESSION_FORMAT,
>
> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> },
> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004,
> XE_RTP_END_VERSION_UNDEFINED)),
> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
> + },
> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> XE_RTP_END_VERSION_UNDEFINED)),
> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
> + },
> {}
> };
>
> --
> 2.46.1
* RE: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
2024-09-19 8:00 ` Upadhyay, Tejas
@ 2024-09-19 18:08 ` Gustavo Sousa
2024-09-20 5:42 ` Upadhyay, Tejas
0 siblings, 1 reply; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-19 18:08 UTC (permalink / raw)
To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
Quoting Upadhyay, Tejas (2024-09-19 05:00:22-03:00)
>
>
>> -----Original Message-----
>> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Gustavo
>> Sousa
>> Sent: Thursday, September 19, 2024 2:17 AM
>> To: intel-xe@lists.freedesktop.org
>> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
>> Subject: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
>>
>> With exception of "Tuning: L3 cache - media", we are currently applying
>> recommended performance tuning settings only for the primary GT. Let's also
>> implement them for the media GT when applicable.
>>
>> According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
>> only in Xe2_LPM and their offsets do not match their primary GT
>> counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
>> listed as a multicast range on the media GT. As such, we need to have
>> Xe2_LPM-specific definitions for those registers and apply the setting only for
>> that specific IP.
>>
>> Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and
>> the offset on the media GT matches the one on the primary one. However,
>> the range that contains that register is not listed as a multicast range, so
>> we need two different entries for media.
>>
>> v2:
>> - Fix implementation with respect to multicast vs non-multicast
>> registers. (Matt)
>> - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
>> Compression Overfetch - media".
>>
>> Bspec: 72161
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
>> ---
>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 7 +++++++
>> drivers/gpu/drm/xe/xe_tuning.c | 24 ++++++++++++++++++++++++
>> 2 files changed, 31 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> index cf21de3adca6..6ec2d2c11d77 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> @@ -80,6 +80,7 @@
>> #define LE_CACHEABILITY_MASK REG_GENMASK(1, 0)
>> #define LE_CACHEABILITY(value)
>> REG_FIELD_PREP(LE_CACHEABILITY_MASK, value)
>>
>> +#define XELPMP_STATELESS_COMPRESSION_CTRL XE_REG(0x4148)
>
>Were trying to say, XE2LPM_ here? Also this seems to be MCR register.
Yeah, you're right on both. I was looking at the steering spec for MTL
media instead of BMG's when adding this, and then used XELPMP_ thinking
that Xe_LPM+ also had that register.
Thanks for catching this. I'll update this in the next version of this
series.
It looks like we also need to fix the logic around MCR tables in our
driver, since we are selecting Xe_LPM+'s table for Xe2_LPM.
>
>> #define STATELESS_COMPRESSION_CTRL
>> XE_REG_MCR(0x4148)
>> #define UNIFIED_COMPRESSION_FORMAT REG_GENMASK(3, 0)
>>
>> @@ -169,6 +170,8 @@
>> #define XEHP_SLICE_COMMON_ECO_CHICKEN1
>> XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
>> #define MSC_MSAA_REODER_BUF_BYPASS_DISABLE REG_BIT(14)
>>
>> +#define XE2LPM_CCCHKNREG1 XE_REG(0x82a8)
>> +
>> #define VF_PREEMPTION XE_REG(0x83a4,
>> XE_REG_OPTION_MASKED)
>> #define PREEMPTION_VERTEX_COUNT REG_GENMASK(15, 0)
>>
>> @@ -399,6 +402,10 @@
>> #define SCRATCH1LPFC XE_REG(0xb474)
>> #define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
>>
>> +#define XE2LPM_L3SQCREG2 XE_REG_MCR(0xb604)
>> +
>> +#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>> +
>
>These are not marked MCR in bspec. Is there something I missed looking.
I just checked Bspec 71186 again and range [0x38B600:0x38B8FF] is marked
as multicast.
--
Gustavo Sousa
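As an illustration of the range check discussed above, the sketch below tests whether a media GT register offset falls inside the single multicast range quoted from Bspec 71186. It is not driver code: `MEDIA_GT_BASE` is an assumption inferred from the 0x38xxxx addresses in the quoted range, and `media_reg_in_quoted_mcr_range()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: media GT registers are offset by this base in MMIO space. */
#define MEDIA_GT_BASE		0x380000u

/* The one multicast range quoted in this thread (Bspec 71186). */
#define MEDIA_MCR_RANGE_START	0x38B600u
#define MEDIA_MCR_RANGE_END	0x38B8FFu

/*
 * Hypothetical helper: returns true if the GT-relative offset lands in
 * the quoted range. Other multicast ranges are NOT covered, so a false
 * result does not prove that a register is non-multicast.
 */
static bool media_reg_in_quoted_mcr_range(uint32_t gt_offset)
{
	uint32_t addr = MEDIA_GT_BASE + gt_offset;

	return addr >= MEDIA_MCR_RANGE_START && addr <= MEDIA_MCR_RANGE_END;
}
```

Under that assumption, the offsets 0xb604, 0xb608, 0xb654 and 0xb658 used by the XE2LPM_ definitions in this series all land inside the quoted range, while 0x82a8 does not.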
>
>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>>
>> #define XE2_TDF_CTRL XE_REG(0xb418)
>> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
>> index faa1bf42e50e..7a5b852af8d7 100644
>> --- a/drivers/gpu/drm/xe/xe_tuning.c
>> +++ b/drivers/gpu/drm/xe/xe_tuning.c
>> @@ -42,20 +42,44 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>> XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
>> SET(CCCHKNREG1, L3CMPCTRL))
>> },
>> + { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
>> + XE_RTP_RULES(MEDIA_VERSION(2000)),
>> + XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
>> + SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
>> + },
>> { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch
>> in L3"),
>> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
>> XE_RTP_END_VERSION_UNDEFINED)),
>> XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
>> },
>> + { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch
>> in L3 - media"),
>> + XE_RTP_RULES(MEDIA_VERSION(2000)),
>> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3,
>> COMPPWOVERFETCHEN))
>> + },
>> { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
>> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
>> XE_RTP_END_VERSION_UNDEFINED)),
>> XE_RTP_ACTIONS(SET(L3SQCREG2,
>> COMPMEMRD256BOVRFETCHEN))
>> },
>> + { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
>> + XE_RTP_RULES(MEDIA_VERSION(2000)),
>> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2,
>> + COMPMEMRD256BOVRFETCHEN))
>> + },
>> { XE_RTP_NAME("Tuning: Stateless compression control"),
>> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001,
>> XE_RTP_END_VERSION_UNDEFINED)),
>> XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
>> UNIFIED_COMPRESSION_FORMAT,
>>
>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> },
>> + { XE_RTP_NAME("Tuning: Stateless compression control - media"),
>> + XE_RTP_RULES(MEDIA_VERSION(2000)),
>> + XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL,
>> UNIFIED_COMPRESSION_FORMAT,
>> +
>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> + },
>> + { XE_RTP_NAME("Tuning: Stateless compression control - media
>> (Xe2_HPM)"),
>> + XE_RTP_RULES(MEDIA_VERSION(1301)),
>> +
>> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
>> UNIFIED_COMPRESSION_FORMAT,
>> +
>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> + },
>> {}
>> };
>>
>> --
>> 2.46.1
>
* Re: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-19 7:39 ` Pottumuttu, Sai Teja
@ 2024-09-19 18:46 ` Gustavo Sousa
0 siblings, 0 replies; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-19 18:46 UTC (permalink / raw)
To: Pottumuttu, Sai Teja, intel-xe; +Cc: Matt Roper, sai.teja.pottumuttu
Quoting Pottumuttu, Sai Teja (2024-09-19 04:39:00-03:00)
>
>On 19-09-2024 02:17, Gustavo Sousa wrote:
>> A recommended performance tuning for LNL related to L3 cache flushing
>> was recently introduced in Bspec. Implement it.
>>
>> Bspec: 70821
>
>The correct BSpec should be 72161 I guess.
Yes. Thanks!
>
>> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
>> ---
>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
>> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> index 6ec2d2c11d77..ccd18cdd5b50 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> @@ -389,6 +389,9 @@
>> #define L3SQCREG3 XE_REG_MCR(0xb108)
>> #define COMPPWOVERFETCHEN REG_BIT(28)
>>
>> +#define SCRATCH3LBCF XE_REG_MCR(0xb154)
>> +#define RWFLUSHALLEN REG_BIT(17)
>> +
>> #define XEHP_L3SQCREG5 XE_REG_MCR(0xb158)
>> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
>>
>> @@ -406,6 +409,8 @@
>>
>> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>>
>> +#define XE2LPM_SCRATCH3LBCF XE_REG_MCR(0xb654)
>
>Just a general question, the register might exist on other platforms as
>well right, so,
>
>would it be a good idea to call it SCRATCH3LBCF_MEDIA instead?
XE2LPM_ already indicates this as the media version of this register. I
think _MEDIA is unnecessary here.
>
>> +
>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>>
>> #define XE2_TDF_CTRL XE_REG(0xb418)
>> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
>> index f62622f0be85..4dd77b44ac82 100644
>> --- a/drivers/gpu/drm/xe/xe_tuning.c
>> +++ b/drivers/gpu/drm/xe/xe_tuning.c
>> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> },
>> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
>> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004, XE_RTP_END_VERSION_UNDEFINED)),
>> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
>
>The register should be SCRATCH3LBCF
Yeah. Thanks!
I thought I had build-tested this series... It turns out I did, but with
the wrong config :-( (i915 CI's kernel config instead of xe's).
--
Gustavo Sousa
>
>Thank You
>
>- Sai Teja
>
>> + },
>> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
>> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
>> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
>> + },
>> {}
>> };
>>
* RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-19 8:22 ` Upadhyay, Tejas
@ 2024-09-19 19:24 ` Gustavo Sousa
2024-09-19 19:36 ` Gustavo Sousa
0 siblings, 1 reply; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-19 19:24 UTC (permalink / raw)
To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
Quoting Upadhyay, Tejas (2024-09-19 05:22:50-03:00)
>
>
>> -----Original Message-----
>> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Gustavo
>> Sousa
>> Sent: Thursday, September 19, 2024 2:18 AM
>> To: intel-xe@lists.freedesktop.org
>> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
>> Subject: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
>> flushing
>>
>> A recommended performance tuning for LNL related to L3 cache flushing was
>> recently introduced in Bspec. Implement it.
>>
>> Bspec: 70821
>
>Yes bspec needs an update.
Yep.
>
>> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
>> ---
>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
>> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> index 6ec2d2c11d77..ccd18cdd5b50 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> @@ -389,6 +389,9 @@
>> #define L3SQCREG3 XE_REG_MCR(0xb108)
>> #define COMPPWOVERFETCHEN REG_BIT(28)
>>
>> +#define SCRATCH3LBCF
>> XE_REG_MCR(0xb154)
>
>Please name this SCRATCH3 only as bspec mentions.
Looking at our register database, it looks like there are other
"SCRATCH" registers from other units. So wouldn't it be better not to
use plain SCRATCH3 here?
Probably SCRATCH3_LBCF for more readability...
>
>> +#define RWFLUSHALLEN REG_BIT(17)
>> +
>> #define XEHP_L3SQCREG5
>> XE_REG_MCR(0xb158)
>> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
>>
>> @@ -406,6 +409,8 @@
>>
>> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>>
>> +#define XE2LPM_SCRATCH3LBCF
>> XE_REG_MCR(0xb654)
>
>Agree with other review that we should name this MEDIA_SCRATCH3 for future use well. Also this does not look to be MCR reg. Please double check.
I believe the prefix XE2LPM_ is enough to differentiate this from the
primary GT version of this register. It is common to prefix registers
with the name of the IP that introduces the difference with respect to
other versions of the register.
Please note that the same pattern is used for other registers that have
different offsets on the media GT.
--
Gustavo Sousa
>
>With that addressed,
>Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>
>> +
>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>>
>> #define XE2_TDF_CTRL XE_REG(0xb418)
>> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
>> index f62622f0be85..4dd77b44ac82 100644
>> --- a/drivers/gpu/drm/xe/xe_tuning.c
>> +++ b/drivers/gpu/drm/xe/xe_tuning.c
>> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>>
>> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
>> UNIFIED_COMPRESSION_FORMAT,
>>
>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> },
>> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
>> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004,
>> XE_RTP_END_VERSION_UNDEFINED)),
>> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
>> + },
>> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
>> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
>> XE_RTP_END_VERSION_UNDEFINED)),
>> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
>> + },
>> {}
>> };
>>
>> --
>> 2.46.1
>
* RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-19 19:24 ` Gustavo Sousa
@ 2024-09-19 19:36 ` Gustavo Sousa
2024-09-20 5:17 ` Upadhyay, Tejas
0 siblings, 1 reply; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-19 19:36 UTC (permalink / raw)
To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
Quoting Gustavo Sousa (2024-09-19 16:24:19-03:00)
>Quoting Upadhyay, Tejas (2024-09-19 05:22:50-03:00)
>>
>>
>>> -----Original Message-----
>>> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Gustavo
>>> Sousa
>>> Sent: Thursday, September 19, 2024 2:18 AM
>>> To: intel-xe@lists.freedesktop.org
>>> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
>>> Subject: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
>>> flushing
>>>
>>> A recommended performance tuning for LNL related to L3 cache flushing was
>>> recently introduced in Bspec. Implement it.
>>>
>>> Bspec: 70821
>>
>>Yes bspec needs an update.
>
>Yep.
>
>>
>>> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
>>> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
>>> 2 files changed, 13 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> index 6ec2d2c11d77..ccd18cdd5b50 100644
>>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>> @@ -389,6 +389,9 @@
>>> #define L3SQCREG3 XE_REG_MCR(0xb108)
>>> #define COMPPWOVERFETCHEN REG_BIT(28)
>>>
>>> +#define SCRATCH3LBCF
>>> XE_REG_MCR(0xb154)
>>
>>Please name this SCRATCH3 only as bspec mentions.
>
>Looking at our register database, it looks like there are other
>"SCRATCH" registers from other units. So wouldn't it be better not to
>use plain SCRATCH3 here?
>
>Probably SCRATCH3_LBCF for more readability...
>
>>
>>> +#define RWFLUSHALLEN REG_BIT(17)
>>> +
>>> #define XEHP_L3SQCREG5
>>> XE_REG_MCR(0xb158)
>>> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
>>>
>>> @@ -406,6 +409,8 @@
>>>
>>> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>>>
>>> +#define XE2LPM_SCRATCH3LBCF
>>> XE_REG_MCR(0xb654)
>>
>>Agree with other review that we should name this MEDIA_SCRATCH3 for future use well. Also this does not look to be MCR reg. Please double check.
>
>I believe the prefix XE2LPM_ is enough to differentiate this from the
>primary GT version of this register. It is common to prefix registers
>with the name of the IP that introduce the difference with respect to
>other versions of the register.
>
>Please note that the same pattern is used for other registers that have
>different offsets on the media GT.
Ah... I forgot to reply also with respect to the MCR-related comment.
Bspec 71186 lists range [0x38B600, 0x38B8FF] as multicast.
--
Gustavo Sousa
>
>--
>Gustavo Sousa
>
>>
>>With that addressed,
>>Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>>
>>> +
>>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>>>
>>> #define XE2_TDF_CTRL XE_REG(0xb418)
>>> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
>>> index f62622f0be85..4dd77b44ac82 100644
>>> --- a/drivers/gpu/drm/xe/xe_tuning.c
>>> +++ b/drivers/gpu/drm/xe/xe_tuning.c
>>> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>>>
>>> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
>>> UNIFIED_COMPRESSION_FORMAT,
>>>
>>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>>> },
>>> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
>>> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004,
>>> XE_RTP_END_VERSION_UNDEFINED)),
>>> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
>>> + },
>>> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
>>> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
>>> XE_RTP_END_VERSION_UNDEFINED)),
>>> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
>>> + },
>>> {}
>>> };
>>>
>>> --
>>> 2.46.1
>>
* RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-19 19:36 ` Gustavo Sousa
@ 2024-09-20 5:17 ` Upadhyay, Tejas
2024-09-20 11:53 ` Gustavo Sousa
0 siblings, 1 reply; 18+ messages in thread
From: Upadhyay, Tejas @ 2024-09-20 5:17 UTC (permalink / raw)
To: Sousa, Gustavo, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
> -----Original Message-----
> From: Sousa, Gustavo <gustavo.sousa@intel.com>
> Sent: Friday, September 20, 2024 1:06 AM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> xe@lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> Subject: RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
> flushing
>
> Quoting Gustavo Sousa (2024-09-19 16:24:19-03:00)
> >Quoting Upadhyay, Tejas (2024-09-19 05:22:50-03:00)
> >>
> >>
> >>> -----Original Message-----
> >>> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
> >>> Gustavo Sousa
> >>> Sent: Thursday, September 19, 2024 2:18 AM
> >>> To: intel-xe@lists.freedesktop.org
> >>> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> >>> Subject: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
> >>> flushing
> >>>
> >>> A recommended performance tuning for LNL related to L3 cache
> >>> flushing was recently introduced in Bspec. Implement it.
> >>>
> >>> Bspec: 70821
> >>
> >>Yes bspec needs an update.
> >
> >Yep.
> >
> >>
> >>> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
> >>> ---
> >>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
> >>> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
> >>> 2 files changed, 13 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >>> index 6ec2d2c11d77..ccd18cdd5b50 100644
> >>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >>> @@ -389,6 +389,9 @@
> >>> #define L3SQCREG3 XE_REG_MCR(0xb108)
> >>> #define COMPPWOVERFETCHEN REG_BIT(28)
> >>>
> >>> +#define SCRATCH3LBCF
> >>> XE_REG_MCR(0xb154)
> >>
> >>Please name this SCRATCH3 only as bspec mentions.
> >
> >Looking at our register database, it looks like there are other
> >"SCRATCH" registers from other units. So wouldn't it be better not to
> >use plain SCRATCH3 here?
> >
> >Probably SCRATCH3_LBCF for more readability...
In that case I don't have a strong objection, but media also has a scratch register. How about "SCRATCH3_LBCF_GFX"?
> >
> >>
> >>> +#define RWFLUSHALLEN REG_BIT(17)
> >>> +
> >>> #define XEHP_L3SQCREG5
> >>> XE_REG_MCR(0xb158)
> >>> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
> >>>
> >>> @@ -406,6 +409,8 @@
> >>>
> >>> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
> >>>
> >>> +#define XE2LPM_SCRATCH3LBCF
> >>> XE_REG_MCR(0xb654)
> >>
> >>Agree with other review that we should name this MEDIA_SCRATCH3 for
> future use well. Also this does not look to be MCR reg. Please double check.
> >
> >I believe the prefix XE2LPM_ is enough to differentiate this from the
> >primary GT version of this register. It is common to prefix registers
> >with the name of the IP that introduce the difference with respect to
> >other versions of the register.
> >
> >Please note that the same pattern is used for other registers that have
> >different offsets on the media GT.
>
> Ah... I forgot to reply also with respect to the MCR-related comment.
> Bspec 71186 lists range [0x38B600, 0x38B8FF] as multicast.
Oh, I did not realise there is a different range page for media; I was looking at the gfx one. Thanks for pointing that out. You can retain my Reviewed-by with that.
Tejas
>
> --
> Gustavo Sousa
>
> >
> >--
> >Gustavo Sousa
> >
> >>
> >>With that addressed,
> >>Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >>
> >>> +
> >>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
> >>>
> >>> #define XE2_TDF_CTRL XE_REG(0xb418)
> >>> diff --git a/drivers/gpu/drm/xe/xe_tuning.c
> >>> b/drivers/gpu/drm/xe/xe_tuning.c index f62622f0be85..4dd77b44ac82
> >>> 100644
> >>> --- a/drivers/gpu/drm/xe/xe_tuning.c
> >>> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> >>> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[]
> >>> = {
> >>>
> >>> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL,
> >>> UNIFIED_COMPRESSION_FORMAT,
> >>>
> >>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >>> },
> >>> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
> >>> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004,
> >>> XE_RTP_END_VERSION_UNDEFINED)),
> >>> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
> >>> + },
> >>> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
> >>> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000,
> >>> XE_RTP_END_VERSION_UNDEFINED)),
> >>> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
> >>> + },
> >>> {}
> >>> };
> >>>
> >>> --
> >>> 2.46.1
> >>
* RE: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
2024-09-19 18:08 ` Gustavo Sousa
@ 2024-09-20 5:42 ` Upadhyay, Tejas
0 siblings, 0 replies; 18+ messages in thread
From: Upadhyay, Tejas @ 2024-09-20 5:42 UTC (permalink / raw)
To: Sousa, Gustavo, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
> -----Original Message-----
> From: Sousa, Gustavo <gustavo.sousa@intel.com>
> Sent: Thursday, September 19, 2024 11:39 PM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> xe@lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> Subject: RE: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT
>
> Quoting Upadhyay, Tejas (2024-09-19 05:00:22-03:00)
> >
> >
> >> -----Original Message-----
> >> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
> >> Gustavo Sousa
> >> Sent: Thursday, September 19, 2024 2:17 AM
> >> To: intel-xe@lists.freedesktop.org
> >> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
> >> Subject: [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media
> >> GT
> >>
> >> With exception of "Tuning: L3 cache - media", we are currently
> >> applying recommended performance tuning settings only for the primary
> >> GT. Let's also implement them for the media GT when applicable.
> >>
> >> According to our spec, media GT registers CCCHKNREG1 and L3SQCREG*
> >> exist only in Xe2_LPM and their offsets do not match their primary GT
> >> counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
> >> listed as a multicast range on the media GT. As such, we need to have
> >> Xe2_LPM-specific definitions for those registers and apply the
> >> setting only for that specific IP.
> >>
> >> Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
> >> offset on the media GT matches the one on the primary one. However,
> >> the range that contains that register is not listed as a
> >> multicast range, so we need two different entries for media.
> >>
> >> v2:
> >> - Fix implementation with respect to multicast vs non-multicast
> >> registers. (Matt)
> >> - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
> >> Compression Overfetch - media".
> >>
> >> Bspec: 72161
> >> Cc: Matt Roper <matthew.d.roper@intel.com>
> >> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
> >> ---
> >> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 7 +++++++
> >> drivers/gpu/drm/xe/xe_tuning.c | 24 ++++++++++++++++++++++++
> >> 2 files changed, 31 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> index cf21de3adca6..6ec2d2c11d77 100644
> >> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> >> @@ -80,6 +80,7 @@
> >> #define LE_CACHEABILITY_MASK REG_GENMASK(1, 0)
> >> #define LE_CACHEABILITY(value)
> >> REG_FIELD_PREP(LE_CACHEABILITY_MASK, value)
> >>
> >> +#define XELPMP_STATELESS_COMPRESSION_CTRL XE_REG(0x4148)
> >
> >Were trying to say, XE2LPM_ here? Also this seems to be MCR register.
>
> Yeah, you're right on both. I was looking at steering spec for MTL media
> instead of BMG's when adding this and then used XELPMP_ thinking that
> Xe_LMP+ also had that register.
>
> Thanks for catching this. I'll update this on the next version of this series.
>
> It looks like we also need to fix the logic around MCR tables in our driver,
> since we are selecting Xe_LPM+'s table for Xe2_LPM.
>
> >
> >> #define STATELESS_COMPRESSION_CTRL
> >> XE_REG_MCR(0x4148)
> >> #define UNIFIED_COMPRESSION_FORMAT REG_GENMASK(3, 0)
> >>
> >> @@ -169,6 +170,8 @@
> >> #define XEHP_SLICE_COMMON_ECO_CHICKEN1
> >> XE_REG_MCR(0x731c, XE_REG_OPTION_MASKED)
> >> #define MSC_MSAA_REODER_BUF_BYPASS_DISABLE REG_BIT(14)
> >>
> >> +#define XE2LPM_CCCHKNREG1 XE_REG(0x82a8)
> >> +
> >> #define VF_PREEMPTION XE_REG(0x83a4, XE_REG_OPTION_MASKED)
> >> #define PREEMPTION_VERTEX_COUNT REG_GENMASK(15, 0)
> >>
> >> @@ -399,6 +402,10 @@
> >> #define SCRATCH1LPFC XE_REG(0xb474)
> >> #define EN_L3_RW_CCS_CACHE_FLUSH REG_BIT(0)
> >>
> >> +#define XE2LPM_L3SQCREG2 XE_REG_MCR(0xb604)
> >> +
> >> +#define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
> >> +
> >
> >These are not marked MCR in bspec. Is there something I missed?
>
> I just checked Bspec 71186 again and range [0x38B600:0x38B8FF] is marked
> as multicast.
OK, as I mentioned in the other comment, I completely missed the media
table when I looked at this. You can add my r-b once you incorporate
the comments above:
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Tejas
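
The multicast question above comes down to whether a media GT register's absolute address lands inside a range the steering table lists as multicast. A minimal sketch of that check follows. It assumes media GT registers sit at a fixed 0x380000 offset from their GT-relative address (which is consistent with 0xb604 and 0xb608 landing in the quoted [0x38B600, 0x38B8FF] range); the names `MEDIA_GT_OFFSET`, `mmio_range`, `media_mcr_ranges`, and `media_reg_is_mcr` are hypothetical and not taken from the driver, and the table holds only the single range quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: media GT registers live at GT-relative offset + 0x380000. */
#define MEDIA_GT_OFFSET 0x380000u

/* Inclusive address range, in the form steering tables are usually written. */
struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/* Only the range quoted in this thread; a real steering table has more. */
static const struct mmio_range media_mcr_ranges[] = {
	{ 0x38B600, 0x38B8FF },
};

/* Return true if a GT-relative media register offset is in a listed range. */
static bool media_reg_is_mcr(uint32_t gt_offset)
{
	uint32_t addr = gt_offset + MEDIA_GT_OFFSET;
	size_t i;

	for (i = 0; i < sizeof(media_mcr_ranges) / sizeof(media_mcr_ranges[0]); i++) {
		/* A malformed table entry would make the check meaningless. */
		assert(media_mcr_ranges[i].start <= media_mcr_ranges[i].end);
		if (addr >= media_mcr_ranges[i].start && addr <= media_mcr_ranges[i].end)
			return true;
	}

	return false;
}
```

With this partial table, 0xb604, 0xb608 and 0xb658 are reported as multicast, while 0x82a8 is not. Because the table is incomplete, a negative answer for other registers is not conclusive.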
>
> --
> Gustavo Sousa
>
> >
> >> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
> >>
> >> #define XE2_TDF_CTRL XE_REG(0xb418)
> >> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
> >> index faa1bf42e50e..7a5b852af8d7 100644
> >> --- a/drivers/gpu/drm/xe/xe_tuning.c
> >> +++ b/drivers/gpu/drm/xe/xe_tuning.c
> >> @@ -42,20 +42,44 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
> >> XE_RTP_ACTIONS(CLR(CCCHKNREG1, ENCOMPPERFFIX),
> >> SET(CCCHKNREG1, L3CMPCTRL))
> >> },
> >> + { XE_RTP_NAME("Tuning: Compression Overfetch - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(CLR(XE2LPM_CCCHKNREG1, ENCOMPPERFFIX),
> >> + SET(XE2LPM_CCCHKNREG1, L3CMPCTRL))
> >> + },
> >> { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"),
> >> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
> >> XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN))
> >> },
> >> + { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG3, COMPPWOVERFETCHEN))
> >> + },
> >> { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only"),
> >> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
> >> XE_RTP_ACTIONS(SET(L3SQCREG2, COMPMEMRD256BOVRFETCHEN))
> >> },
> >> + { XE_RTP_NAME("Tuning: L2 Overfetch Compressible Only - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(SET(XE2LPM_L3SQCREG2, COMPMEMRD256BOVRFETCHEN))
> >> + },
> >> { XE_RTP_NAME("Tuning: Stateless compression control"),
> >> XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)),
> >> XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
> >> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >> },
> >> + { XE_RTP_NAME("Tuning: Stateless compression control - media"),
> >> + XE_RTP_RULES(MEDIA_VERSION(2000)),
> >> + XE_RTP_ACTIONS(FIELD_SET(STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
> >> + REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >> + },
> >> + { XE_RTP_NAME("Tuning: Stateless compression control - media (Xe2_HPM)"),
> >> + XE_RTP_RULES(MEDIA_VERSION(1301)),
> >> + XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
> >> + REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
> >> + },
> >> {}
> >> };
> >>
> >> --
> >> 2.46.1
> >
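
The tuning entries in the patch above use three kinds of actions: CLR, SET and FIELD_SET. A minimal sketch of how such actions can be read as read-modify-write operations on a 32-bit register value follows. The helper names (`action_set`, `action_clr`, `action_field_set`) and the `BIT32`/`GENMASK32` macros are hypothetical stand-ins for illustration, not the driver's own; only the field layout of UNIFIED_COMPRESSION_FORMAT (bits 3:0) is taken from the quoted diff, and the value passed to the field helper is assumed to be already positioned within the field.

```c
#include <stdint.h>

/* Hypothetical stand-ins for single-bit and contiguous-mask helpers. */
#define BIT32(n)          (UINT32_C(1) << (n))
#define GENMASK32(hi, lo) ((UINT32_MAX >> (31 - (hi))) & (UINT32_MAX << (lo)))

/* Field layout as quoted in the diff: bits 3:0. */
#define UNIFIED_COMPRESSION_FORMAT GENMASK32(3, 0)

/* SET: turn on the given bits, leave everything else untouched. */
static uint32_t action_set(uint32_t old, uint32_t bits)
{
	return old | bits;
}

/* CLR: turn off the given bits, leave everything else untouched. */
static uint32_t action_clr(uint32_t old, uint32_t bits)
{
	return old & ~bits;
}

/* FIELD_SET: replace a multi-bit field with a value already shifted into place. */
static uint32_t action_field_set(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

Under this reading, writing 0 to UNIFIED_COMPRESSION_FORMAT clears bits 3:0 and preserves the remaining 28 bits, which is why a field action is used there instead of a plain SET or CLR.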
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing
2024-09-20 5:17 ` Upadhyay, Tejas
@ 2024-09-20 11:53 ` Gustavo Sousa
0 siblings, 0 replies; 18+ messages in thread
From: Gustavo Sousa @ 2024-09-20 11:53 UTC (permalink / raw)
To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org; +Cc: Roper, Matthew D
Quoting Upadhyay, Tejas (2024-09-20 02:17:44-03:00)
>
>
>> -----Original Message-----
>> From: Sousa, Gustavo <gustavo.sousa@intel.com>
>> Sent: Friday, September 20, 2024 1:06 AM
>> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>;
>> intel-xe@lists.freedesktop.org
>> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
>> Subject: RE: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
>> flushing
>>
>> Quoting Gustavo Sousa (2024-09-19 16:24:19-03:00)
>> >Quoting Upadhyay, Tejas (2024-09-19 05:22:50-03:00)
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
>> >>> Gustavo Sousa
>> >>> Sent: Thursday, September 19, 2024 2:18 AM
>> >>> To: intel-xe@lists.freedesktop.org
>> >>> Cc: Roper, Matthew D <matthew.d.roper@intel.com>
>> >>> Subject: [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache
>> >>> flushing
>> >>>
>> >>> A recommended performance tuning for LNL related to L3 cache
>> >>> flushing was recently introduced in Bspec. Implement it.
>> >>>
>> >>> Bspec: 70821
>> >>
>> >>Yes bspec needs an update.
>> >
>> >Yep.
>> >
>> >>
>> >>> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
>> >>> ---
>> >>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +++++
>> >>> drivers/gpu/drm/xe/xe_tuning.c | 8 ++++++++
>> >>> 2 files changed, 13 insertions(+)
>> >>>
>> >>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> >>> index 6ec2d2c11d77..ccd18cdd5b50 100644
>> >>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> >>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> >>> @@ -389,6 +389,9 @@
>> >>> #define L3SQCREG3 XE_REG_MCR(0xb108)
>> >>> #define COMPPWOVERFETCHEN REG_BIT(28)
>> >>>
>> >>> +#define SCRATCH3LBCF XE_REG_MCR(0xb154)
>> >>
>> >>Please name this SCRATCH3 only as bspec mentions.
>> >
>> >Looking at our register database, it looks like there are other
>> >"SCRATCH" registers from other units. So wouldn't it be better not to
>> >use plain SCRATCH3 here?
>> >
>> >Probably SCRATCH3_LBCF for more readability...
>
>In that case I don't have a strong objection, but media also has a scratch register. How about "SCRATCH3_LBCF_GFX"?
Yes, that's why it is being defined with the XE2LPM_ prefix, just like
the other media registers that have a different offset with respect
to graphics.
As an existing example already in the code, look at the definitions for
L3SQCREG5. I'm following the same pattern there.
--
Gustavo Sousa
>
>> >
>> >>
>> >>> +#define RWFLUSHALLEN REG_BIT(17)
>> >>> +
>> >>> #define XEHP_L3SQCREG5 XE_REG_MCR(0xb158)
>> >>> #define L3_PWM_TIMER_INIT_VAL_MASK REG_GENMASK(9, 0)
>> >>>
>> >>> @@ -406,6 +409,8 @@
>> >>>
>> >>> #define XE2LPM_L3SQCREG3 XE_REG_MCR(0xb608)
>> >>>
>> >>> +#define XE2LPM_SCRATCH3LBCF XE_REG_MCR(0xb654)
>> >>
>> >>Agree with the other review that we should name this MEDIA_SCRATCH3 for
>> >>future use as well. Also, this does not look to be an MCR reg. Please double check.
>> >
>> >I believe the prefix XE2LPM_ is enough to differentiate this from the
>> >primary GT version of this register. It is common to prefix registers
>> >with the name of the IP that introduces the difference with respect to
>> >other versions of the register.
>> >
>> >Please note that the same pattern is used for other registers that have
>> >different offsets on the media GT.
>>
>> Ah... I forgot to reply also with respect to the MCR-related comment.
>> Bspec 71186 lists range [0x38B600, 0x38B8FF] as multicast.
>
>Oh, I did not realise there is a different range page for media; I was looking at the gfx one. Thanks for pointing that out. You can retain my r-b with that.
Thanks!
Please, let me know if the r-b also stands with SCRATCH3_LBCF without
the _GFX suffix.
--
Gustavo Sousa
>
>Tejas
>>
>> --
>> Gustavo Sousa
>>
>> >
>> >--
>> >Gustavo Sousa
>> >
>> >>
>> >>With that addressed,
>> >>Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> >>
>> >>> +
>> >>> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>> >>>
>> >>> #define XE2_TDF_CTRL XE_REG(0xb418)
>> >>> diff --git a/drivers/gpu/drm/xe/xe_tuning.c b/drivers/gpu/drm/xe/xe_tuning.c
>> >>> index f62622f0be85..4dd77b44ac82 100644
>> >>> --- a/drivers/gpu/drm/xe/xe_tuning.c
>> >>> +++ b/drivers/gpu/drm/xe/xe_tuning.c
>> >>> @@ -80,6 +80,14 @@ static const struct xe_rtp_entry_sr gt_tunings[] = {
>> >>> XE_RTP_ACTIONS(FIELD_SET(XELPMP_STATELESS_COMPRESSION_CTRL, UNIFIED_COMPRESSION_FORMAT,
>> >>> REG_FIELD_PREP(UNIFIED_COMPRESSION_FORMAT, 0)))
>> >>> },
>> >>> + { XE_RTP_NAME("Tuning: L3 RW flush all Cache"),
>> >>> + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2004, XE_RTP_END_VERSION_UNDEFINED)),
>> >>> + XE_RTP_ACTIONS(SET(SCRATCH3, RWFLUSHALLEN))
>> >>> + },
>> >>> + { XE_RTP_NAME("Tuning: L3 RW flush all cache - media"),
>> >>> + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)),
>> >>> + XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3LBCF, RWFLUSHALLEN))
>> >>> + },
>> >>> {}
>> >>> };
>> >>>
>> >>> --
>> >>> 2.46.1
>> >>
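
The rules in the patch above select an IP either by exact version (MEDIA_VERSION(2000)) or by an open-ended range ending in XE_RTP_END_VERSION_UNDEFINED. A minimal sketch of how such matching can work follows. It assumes versions are encoded as major * 100 + minor (so 2004 stands for 20.04); `END_VERSION_UNDEFINED`, `version_rule` and `version_rule_matches` are hypothetical names used only for illustration and do not reproduce the driver's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for an undefined (open) upper bound. */
#define END_VERSION_UNDEFINED UINT32_MAX

/* Inclusive version range; an exact match is start == end. */
struct version_rule {
	uint32_t start;
	uint32_t end;
};

/* Return true if ver falls within the rule, treating the sentinel as "no upper limit". */
static bool version_rule_matches(const struct version_rule *rule, uint32_t ver)
{
	if (ver < rule->start)
		return false;

	return rule->end == END_VERSION_UNDEFINED || ver <= rule->end;
}
```

With an open-ended rule starting at 2004, version 2001 is rejected while 2004 and any later version match. That is the behaviour an open-ended range provides for IP versions released after the patch.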
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-09-20 11:53 UTC | newest]
Thread overview: 18+ messages
2024-09-18 20:47 [PATCH 0/3] Xe2 performance tuning updates Gustavo Sousa
2024-09-18 20:47 ` [PATCH 1/3] drm/xe/xe2: Extend performance tuning to media GT Gustavo Sousa
2024-09-19 8:00 ` Upadhyay, Tejas
2024-09-19 18:08 ` Gustavo Sousa
2024-09-20 5:42 ` Upadhyay, Tejas
2024-09-18 20:47 ` [PATCH 2/3] drm/xe/xe2: Assume tuning settings also apply for future " Gustavo Sousa
2024-09-19 8:01 ` Upadhyay, Tejas
2024-09-18 20:47 ` [PATCH 3/3] drm/xe/xe2: Add performance tuning for L3 cache flushing Gustavo Sousa
2024-09-19 7:39 ` Pottumuttu, Sai Teja
2024-09-19 18:46 ` Gustavo Sousa
2024-09-19 8:22 ` Upadhyay, Tejas
2024-09-19 19:24 ` Gustavo Sousa
2024-09-19 19:36 ` Gustavo Sousa
2024-09-20 5:17 ` Upadhyay, Tejas
2024-09-20 11:53 ` Gustavo Sousa
2024-09-18 23:50 ` ✓ CI.Patch_applied: success for Xe2 performance tuning updates Patchwork
2024-09-18 23:51 ` ✓ CI.checkpatch: " Patchwork
2024-09-18 23:51 ` ✗ CI.KUnit: failure " Patchwork