[RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency

From: Tejas Upadhyay <tejas.upadhyay@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, badal.nilawar@intel.com,
	vinay.belgaumkar@intel.com, michal.mrozek@intel.com,
	szymon.morek@intel.com, jose.souza@intel.com,
	lucas.demarchi@intel.com,
	Tejas Upadhyay <tejas.upadhyay@intel.com>
Subject: [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency
Date: Thu,  9 Jan 2025 17:37:05 +0530
Message-ID: <20250109120705.3021126-1-tejas.upadhyay@intel.com>

Allow the user to provide a low latency hint per exec queue. When the hint
is set, the KMD passes it to the GuC, which then handles this exec queue
specially: SLPC ramps the GT frequency aggressively every time it switches
to this exec queue.
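On the KMD side, the hint reduces to a single bit translation, mirroring
the logic added to init_policies() in xe_guc_submit.c. The following is a
minimal sketch that compiles in userspace and reuses the bit values defined
by this patch. The helper name slpc_freq_req_from_flags() is hypothetical.

```c
#include <stdint.h>

/* Bit values as defined by this patch: the uapi hint is bit 1 of
 * drm_xe_exec_queue_create.flags, the SLPC request bit is bit 28. */
#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT	(0x1u << 1)
#define SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE	(1u << 28)

/*
 * Hypothetical helper: translate exec queue creation flags into the value
 * sent to the GuC with the SLPM_GT_FREQUENCY policy. Queues without the
 * hint send 0, i.e. no special frequency handling.
 */
static uint32_t slpc_freq_req_from_flags(uint32_t queue_flags)
{
	uint32_t req = 0;

	if (queue_flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)
		req |= SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE;

	return req;
}
```

With the hint set, the helper yields 0x10000000. Otherwise, including when
only bit 0 is set, it yields 0.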

We need to enable the SLPC compute strategy during init, but it applies
only to exec queues that set this bit at exec queue creation.
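Conceptually, the global strategy and the per-queue request act as two
gates that must both be open. The following is a hypothetical model of
that rule as described above, not GuC firmware code. Only the two bit
values come from this patch, and slpc_ramps_aggressively() is an
illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values taken from this patch (guc_actions_slpc_abi.h). */
#define SLPC_OPTIMIZED_STRATEGY_COMPUTE		(1u << 0)
#define SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE	(1u << 28)

/*
 * Conceptual model only: the aggressive ramp applies when the strategy set
 * once at init enables compute AND the exec queue being switched to carries
 * the per-queue compute frequency request.
 */
static bool slpc_ramps_aggressively(uint32_t global_strategies,
				    uint32_t queue_freq_req)
{
	return (global_strategies & SLPC_OPTIMIZED_STRATEGY_COMPUTE) &&
	       (queue_freq_req & SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE);
}
```

Because the strategy is enabled unconditionally at init, the per-queue bit
is what decides whether a given exec queue gets the aggressive ramp.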

The improvement with this approach is shown below:

Before:

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz
    Kernel launch latency : 283.16 us

After:

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz
    Kernel launch latency : 63.38 us
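Put differently, the measured kernel launch latency dropped from 283.16 us
to 63.38 us. This is only arithmetic on the single before/after run shown
above, written as a small C sketch with hypothetical helper names.

```c
/* Kernel launch latency reported by clpeak above, in microseconds. */
#define LATENCY_BEFORE_US	283.16
#define LATENCY_AFTER_US	63.38

/* How many times lower the new latency is than the old one. */
static double latency_speedup(double before_us, double after_us)
{
	return before_us / after_us;
}

/* Share of the original latency that was removed, in percent. */
static double latency_reduction_pct(double before_us, double after_us)
{
	return 100.0 * (before_us - after_us) / before_us;
}
```

For the measured values, latency_speedup() gives about 4.47 and
latency_reduction_pct() about 77.6.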

The UMD indicates the low latency hint with a flag, as shown below:

    struct drm_xe_exec_queue_create exec_queue_create = {
         .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT, /* or 0 for no hint */
         .extensions = 0,
         .vm_id = vm,
         .width = 1,
         .num_placements = 1,
         .instances = to_user_pointer(&instance),
    };
    ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);
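Because the flags field now carries a meaningful bit, the KMD has to accept
the hint while still rejecting unknown bits. A mask-based check is one
common way to express this: it accepts 0 or the hint alone, and it rejects
the hint when combined with any other bit. The following is a minimal
sketch. The mask name and check_exec_queue_create_flags() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Flag value as defined by this patch in include/uapi/drm/xe_drm.h. */
#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT	(0x1u << 1)

/* Hypothetical mask of every creation flag the KMD understands. */
#define XE_EXEC_QUEUE_CREATE_VALID_FLAGS	DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT

/* Reject any bit outside the known mask, ioctl style (0 or -EINVAL). */
static int check_exec_queue_create_flags(uint32_t flags)
{
	if (flags & ~XE_EXEC_QUEUE_CREATE_VALID_FLAGS)
		return -EINVAL;

	return 0;
}
```

New hints can then be added by extending the mask, and the check itself
stays unchanged.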

Link to UMD PR: https://github.com/intel/compute-runtime/pull/794

Note: There is an outstanding issue on the GuC side where it cannot switch
to the maximum frequency according to the strategy indicated by the KMD.
For the experiment/test results above, a hardcoded approach was therefore
taken and passed to the GuC as a policy. Debugging on the GuC side is
ongoing in parallel.

V2:
  - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT value 1 is already planned for another
    hint, so use (0x1 << 1) instead (Szymon)
  - Add motivation to the description (Lucas)

Cc: dri-devel@lists.freedesktop.org
Cc: vinay.belgaumkar@intel.com
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Szymon Morek <szymon.morek@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h |  3 +++
 drivers/gpu/drm/xe/xe_exec_queue.c            |  7 ++++---
 drivers/gpu/drm/xe/xe_guc_pc.c                | 16 ++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_submit.c            |  7 +++++++
 include/uapi/drm/xe_drm.h                     |  3 ++-
 5 files changed, 32 insertions(+), 4 deletions(-)
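Each exec queue policy in init_policies() is sent to the GuC as a
key/length/value (KLV) entry. This patch adds one more entry,
SLPM_GT_FREQUENCY, which carries the compute request bit. The following
sketch shows how such a one-dword entry could be packed. It assumes the xe
KLV header layout of key in bits 31:16 and value length (in dwords) in
bits 15:0. The key id is a placeholder, not the real GuC value, and
klv_pack() and struct klv_1dw are hypothetical.

```c
#include <stdint.h>

/* Assumed KLV header layout: key in bits 31:16, length in bits 15:0. */
#define KLV_KEY_SHIFT	16
#define KLV_LEN_MASK	0xffffu

/* Placeholder for illustration only, NOT the real GuC key id. */
#define HYPOTHETICAL_KEY_SLPM_GT_FREQUENCY	0x1234u

struct klv_1dw {
	uint32_t kl;	/* key/length header */
	uint32_t value;	/* single-dword payload */
};

/* Pack one policy entry whose value is exactly one dword long. */
static struct klv_1dw klv_pack(uint16_t key, uint32_t value)
{
	struct klv_1dw klv = {
		.kl = ((uint32_t)key << KLV_KEY_SHIFT) | (1u & KLV_LEN_MASK),
		.value = value,
	};

	return klv;
}
```

For a low latency queue, the payload would be the 1u << 28 compute
request bit. For other queues, the payload would be 0.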

diff --git a/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
index 85abe4f09ae2..c50075b8270f 100644
--- a/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
@@ -174,6 +174,9 @@ struct slpc_task_state_data {
 	};
 } __packed;
 
+#define SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE	REG_BIT(28)
+#define SLPC_OPTIMIZED_STRATEGY_COMPUTE		REG_BIT(0)
+
 struct slpc_shared_data_header {
 	/* Total size in bytes of this shared buffer. */
 	u32 size;
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 8948f50ee58f..7747ba6c4bb8 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -553,7 +553,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 	u32 len;
 	int err;
 
-	if (XE_IOCTL_DBG(xe, args->flags) ||
+	if (XE_IOCTL_DBG(xe, args->flags &
+			 ~DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT) ||
 	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
 		return -EINVAL;
 
@@ -578,7 +579,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 
 		for_each_tile(tile, xe, id) {
 			struct xe_exec_queue *new;
-			u32 flags = EXEC_QUEUE_FLAG_VM;
+			u32 flags = args->flags | EXEC_QUEUE_FLAG_VM;
 
 			if (id)
 				flags |= EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD;
@@ -626,7 +627,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 		}
 
 		q = xe_exec_queue_create(xe, vm, logical_mask,
-					 args->width, hwe, 0,
+					 args->width, hwe, args->flags,
 					 args->extensions);
 		up_read(&vm->lock);
 		xe_vm_put(vm);
diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index df7f130fb663..ff0b98ccf1a7 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -992,6 +992,19 @@ static int pc_init_freqs(struct xe_guc_pc *pc)
 	return ret;
 }
 
+static int xe_guc_pc_set_strategy(struct xe_guc_pc *pc, u32 val)
+{
+	int ret = 0;
+
+	xe_pm_runtime_get(pc_to_xe(pc));
+	ret = pc_action_set_param(pc,
+				  SLPC_PARAM_STRATEGIES,
+				  val);
+	xe_pm_runtime_put(pc_to_xe(pc));
+
+	return ret;
+}
+
 /**
  * xe_guc_pc_start - Start GuC's Power Conservation component
  * @pc: Xe_GuC_PC instance
@@ -1052,6 +1065,9 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
 
 	ret = pc_action_setup_gucrc(pc, GUCRC_FIRMWARE_CONTROL);
 
+	/* Enable SLPC Optimized Strategy for compute */
+	xe_guc_pc_set_strategy(pc, SLPC_OPTIMIZED_STRATEGY_COMPUTE);
+
 out:
 	xe_force_wake_put(gt_to_fw(gt), fw_ref);
 	return ret;
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 9c36329fe857..88a1987ac360 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -15,6 +15,7 @@
 #include <drm/drm_managed.h>
 
 #include "abi/guc_actions_abi.h"
+#include "abi/guc_actions_slpc_abi.h"
 #include "abi/guc_klvs_abi.h"
 #include "regs/xe_lrc_layout.h"
 #include "xe_assert.h"
@@ -400,6 +401,7 @@ static void __guc_exec_queue_policy_add_##func(struct exec_queue_policy *policy,
 MAKE_EXEC_QUEUE_POLICY_ADD(execution_quantum, EXECUTION_QUANTUM)
 MAKE_EXEC_QUEUE_POLICY_ADD(preemption_timeout, PREEMPTION_TIMEOUT)
 MAKE_EXEC_QUEUE_POLICY_ADD(priority, SCHEDULING_PRIORITY)
+MAKE_EXEC_QUEUE_POLICY_ADD(slpc_ctx_freq_req, SLPM_GT_FREQUENCY)
 #undef MAKE_EXEC_QUEUE_POLICY_ADD
 
 static const int xe_exec_queue_prio_to_guc[] = {
@@ -414,14 +416,19 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
 	struct exec_queue_policy policy;
 	enum xe_exec_queue_priority prio = q->sched_props.priority;
 	u32 timeslice_us = q->sched_props.timeslice_us;
+	u32 slpc_ctx_freq_req = 0;
 	u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
 
 	xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
 
+	if (q->flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)
+		slpc_ctx_freq_req |= SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE;
+
 	__guc_exec_queue_policy_start_klv(&policy, q->guc->id);
 	__guc_exec_queue_policy_add_priority(&policy, xe_exec_queue_prio_to_guc[prio]);
 	__guc_exec_queue_policy_add_execution_quantum(&policy, timeslice_us);
 	__guc_exec_queue_policy_add_preemption_timeout(&policy, preempt_timeout_us);
+	__guc_exec_queue_policy_add_slpc_ctx_freq_req(&policy, slpc_ctx_freq_req);
 
 	xe_guc_ct_send(&guc->ct, (u32 *)&policy.h2g,
 		       __guc_exec_queue_policy_action_size(&policy), 0, 0);
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index f62689ca861a..bd0150d2200c 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1097,6 +1097,7 @@ struct drm_xe_vm_bind {
  *         .engine_class = DRM_XE_ENGINE_CLASS_RENDER,
  *     };
  *     struct drm_xe_exec_queue_create exec_queue_create = {
+ *          .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT, // or 0
  *          .extensions = 0,
  *          .vm_id = vm,
  *          .num_bb_per_exec = 1,
@@ -1110,7 +1111,6 @@ struct drm_xe_exec_queue_create {
 #define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY		0
 #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY		0
 #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE		1
-
 	/** @extensions: Pointer to the first extension struct, if any */
 	__u64 extensions;
 
@@ -1123,6 +1123,7 @@ struct drm_xe_exec_queue_create {
 	/** @vm_id: VM to use for this exec queue */
 	__u32 vm_id;
 
+#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT	(0x1 << 1)
 	/** @flags: MBZ */
 	__u32 flags;
 
-- 
2.34.1

Thread overview: 18+ messages
2025-01-09 12:07 Tejas Upadhyay [this message]
2025-01-09 12:35 ` [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency Mrozek, Michal
2025-01-09 14:31 ` ✓ CI.Patch_applied: success for drm/xe/guc: Use exec queue hints for GT frequency (rev2) Patchwork
2025-01-09 14:31 ` ✗ CI.checkpatch: warning " Patchwork
2025-01-09 14:33 ` ✓ CI.KUnit: success " Patchwork
2025-01-09 14:36 ` [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency Souza, Jose
2025-01-09 18:36   ` Belgaumkar, Vinay
2025-01-10  6:42     ` Upadhyay, Tejas
2025-01-09 14:51 ` ✓ CI.Build: success for drm/xe/guc: Use exec queue hints for GT frequency (rev2) Patchwork
2025-01-09 14:53 ` ✓ CI.Hooks: " Patchwork
2025-01-09 14:55 ` ✓ CI.checksparse: " Patchwork
2025-01-09 15:24 ` ✓ Xe.CI.BAT: " Patchwork
2025-01-09 17:37 ` [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency Zeng, Oak
2025-01-09 22:03   ` Belgaumkar, Vinay
2025-01-09 22:59     ` Zeng, Oak
2025-01-10  6:46       ` Upadhyay, Tejas
2025-01-11 22:27 ` ✗ Xe.CI.Full: failure for drm/xe/guc: Use exec queue hints for GT frequency (rev2) Patchwork