Intel-XE Archive on lore.kernel.org
From: Lucas De Marchi <lucas.demarchi@intel.com>
To: <intel-xe@lists.freedesktop.org>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>,
	Tvrtko Ursulin <tursulin@ursulin.net>,
	Lucas De Marchi <lucas.demarchi@intel.com>
Subject: [PATCH 6/7] drm/xe/client: Print runtime to fdinfo
Date: Mon, 15 Apr 2024 20:04:53 -0700	[thread overview]
Message-ID: <20240416030454.3739862-7-lucas.demarchi@intel.com> (raw)
In-Reply-To: <20240416030454.3739862-1-lucas.demarchi@intel.com>

Print the accumulated runtime, per client, when printing fdinfo.
Each time a query is done, it first does two things:

1) Loop through all the exec queues for the current client and
   accumulate the runtime, per engine class. CTX_TIMESTAMP, read from
   the context image, is used for that.

2) Read a "GPU timestamp" that can be used to tell "how much GPU
   time has passed" and that has the same unit/ref-clock as the one
   recording the runtime. RING_TIMESTAMP, read via MMIO, is used for that.

This second part is done once per engine class since the register is
replicated on all engines. It holds the same timestamp, at least on the
current GPUs this was tested on, so it could be simplified to a single
read. However, to play it safe and cover the case where the clock
differs in future for primary/media GTs, or across engine classes, just
read it per class.

This is exported to userspace as 2 numbers in fdinfo:

	drm-engine-<class>: <GPU_TIMESTAMP> <RUNTIME> ticks

Userspace is expected to collect at least 2 samples, which allows it to
calculate the engine busyness as:

		    RUNTIME1 - RUNTIME0
	busyness = ---------------------
			  T1 - T0

When calculating the overall system busyness, userspace can loop through
all the clients and add up the numbers.  Since the GPU timestamps will
differ slightly between clients, some fluctuation in accuracy is
expected, but that may be improved with a better hardware/GuC interface
in future, while maintaining the UAPI.
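For illustration, the userspace side of this calculation can be sketched
as below. This is a minimal sketch, not the igt gputop implementation;
the fdinfo key format follows the description above, while the helper
names are made up:

```python
# Parse two fdinfo snapshots and derive per-class busyness as
# busyness = (RUNTIME1 - RUNTIME0) / (T1 - T0).

def parse_fdinfo(text):
    """Return {class: (gpu_timestamp, runtime)} from fdinfo text."""
    samples = {}
    for line in text.splitlines():
        # Matches "drm-engine-<class>:\t<GPU_TIMESTAMP> <RUNTIME> ticks";
        # "drm-engine-capacity-<class>" lines have no "ticks" suffix.
        if line.startswith("drm-engine-") and line.endswith(" ticks"):
            key, _, value = line.partition(":")
            name = key[len("drm-engine-"):]
            ts, rt, _ = value.split()
            samples[name] = (int(ts), int(rt))
    return samples

def busyness(sample0, sample1):
    """Per-class busyness ratio between two fdinfo samples."""
    result = {}
    for name, (t1, rt1) in sample1.items():
        t0, rt0 = sample0.get(name, (t1, rt1))
        if t1 != t0:
            result[name] = (rt1 - rt0) / (t1 - t0)
    return result
```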

Another thing to point out is that userspace is expected to read any 2
samples every few seconds.  Given the update frequency of the counters
involved and that CTX_TIMESTAMP is 32 bits, it is expected to wrap every
25 ~ 30 seconds.  This could be mitigated by adding a workqueue to
accumulate the counters every so often, but that is additional
complexity for something userspace already does every few seconds in
tools like gputop (from igt), htop, nvtop, etc.
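For reference, a wrap-tolerant delta over such a 32-bit counter is plain
modular arithmetic, as sketched below. The 32-bit width comes from the
paragraph above; the helper name is made up:

```python
CTX_TIMESTAMP_BITS = 32  # counter width per the commit message above

def counter_delta(prev, curr, bits=CTX_TIMESTAMP_BITS):
    """Delta between two reads of a wrapping counter, valid as long
    as at most one wrap happened between the samples."""
    return (curr - prev) % (1 << bits)
```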

Test-with: https://lore.kernel.org/igt-dev/20240405060056.59379-1-lucas.demarchi@intel.com/
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_drm_client.c | 81 +++++++++++++++++++++++++++++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index 08f0b7c95901..79eb453bfb14 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -2,6 +2,7 @@
 /*
  * Copyright © 2023 Intel Corporation
  */
+#include "xe_drm_client.h"
 
 #include <drm/drm_print.h>
 #include <drm/xe_drm.h>
@@ -12,7 +13,10 @@
 #include "xe_bo.h"
 #include "xe_bo_types.h"
 #include "xe_device_types.h"
-#include "xe_drm_client.h"
+#include "xe_exec_queue.h"
+#include "xe_gt.h"
+#include "xe_hw_engine.h"
+#include "xe_pm.h"
 #include "xe_trace.h"
 
 /**
@@ -179,6 +183,80 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 	}
 }
 
 +static const u64 class_to_mask[] = {
 +	[XE_ENGINE_CLASS_RENDER] = XE_HW_ENGINE_RCS_MASK,
 +	[XE_ENGINE_CLASS_VIDEO_DECODE] = XE_HW_ENGINE_VCS_MASK,
 +	[XE_ENGINE_CLASS_VIDEO_ENHANCE] = XE_HW_ENGINE_VECS_MASK,
 +	[XE_ENGINE_CLASS_COPY] = XE_HW_ENGINE_BCS_MASK,
 +	[XE_ENGINE_CLASS_OTHER] = XE_HW_ENGINE_GSCCS_MASK,
 +	[XE_ENGINE_CLASS_COMPUTE] = XE_HW_ENGINE_CCS_MASK,
 +};
+
+static void show_runtime(struct drm_printer *p, struct drm_file *file)
+{
+	struct xe_file *xef = file->driver_priv;
+	struct xe_device *xe = xef->xe;
+	struct xe_gt *gt;
+	struct xe_hw_engine *hwe;
+	struct xe_exec_queue *q;
+	unsigned long i, id_hwe, id_gt, capacity[XE_ENGINE_CLASS_MAX] = { };
+	u64 gpu_timestamp, engine_mask = 0;
+	bool gpu_stamp = false;
+
+	xe_pm_runtime_get(xe);
+
+	mutex_lock(&xef->exec_queue.lock);
+	xa_for_each(&xef->exec_queue.xa, i, q)
+		xe_exec_queue_update_runtime(q);
+	mutex_unlock(&xef->exec_queue.lock);
+
+	for_each_gt(gt, xe, id_gt)
+		engine_mask |= gt->info.engine_mask;
+
+	BUILD_BUG_ON(ARRAY_SIZE(class_to_mask) != XE_ENGINE_CLASS_MAX);
+	for (i = 0; i < XE_ENGINE_CLASS_MAX; i++)
+		capacity[i] = hweight64(engine_mask & class_to_mask[i]);
+
+	/*
+	 * Iterate over all engines, printing the accumulated
+	 * runtime for this xef per engine class
+	 */
+	for_each_gt(gt, xe, id_gt) {
+		xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+		for_each_hw_engine(hwe, gt, id_hwe) {
+			const char *class_name;
+
+			if (!capacity[hwe->class])
+				continue;
+
 +			/*
 +			 * Use the first available engine to read a single GPU
 +			 * timestamp, reused for every engine class
 +			 */
+			if (!gpu_stamp) {
+				gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
+				gpu_stamp = true;
+			}
+
+			class_name = xe_hw_engine_class_to_str(hwe->class);
+
+			drm_printf(p, "drm-engine-%s:\t%llu %llu ticks\n",
+				   class_name, gpu_timestamp,
+				   xef->runtime[hwe->class]);
+
+			if (capacity[hwe->class] > 1)
+				drm_printf(p, "drm-engine-capacity-%s:\t%lu\n",
+					   class_name, capacity[hwe->class]);
+
+			/* engine class already handled, skip next iterations */
+			capacity[hwe->class] = 0;
+		}
+		xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
+	}
+
 +	xe_pm_runtime_put(xe);
+}
+
 /**
  * xe_drm_client_fdinfo() - Callback for fdinfo interface
  * @p: The drm_printer ptr
@@ -192,5 +270,6 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 void xe_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
 	show_meminfo(p, file);
+	show_runtime(p, file);
 }
 #endif
-- 
2.43.0



Thread overview: 46+ messages
2024-04-16  3:04 [PATCH 0/7] drm/xe: Per client usage Lucas De Marchi
2024-04-16  3:04 ` [PATCH 1/7] drm/xe/lrc: Add helper to capture context timestamp Lucas De Marchi
2024-04-16  3:04 ` [PATCH 2/7] drm/xe: Add helper to capture context runtime Lucas De Marchi
2024-04-16  5:26   ` Vivekanandan, Balasubramani
2024-04-16 13:42     ` Lucas De Marchi
2024-04-16 15:45       ` Vivekanandan, Balasubramani
2024-04-16 15:53         ` Lucas De Marchi
2024-04-16  3:04 ` [PATCH 3/7] drm/xe: Promote xe_hw_engine_class_to_str() Lucas De Marchi
2024-04-16  9:36   ` Nirmoy Das
2024-04-19 18:36     ` Zeng, Oak
2024-04-16  3:04 ` [PATCH 4/7] drm/xe: Add XE_ENGINE_CLASS_OTHER to str conversion Lucas De Marchi
2024-04-16  9:37   ` Nirmoy Das
2024-04-16  3:04 ` [PATCH 5/7] drm/xe: Add helper to capture engine timestamp Lucas De Marchi
2024-04-16 22:56   ` Umesh Nerlige Ramappa
2024-04-17  3:14     ` Lucas De Marchi
2024-04-18 18:24       ` Umesh Nerlige Ramappa
2024-04-16  3:04 ` Lucas De Marchi [this message]
2024-04-16 23:20   ` [PATCH 6/7] drm/xe/client: Print runtime to fdinfo Umesh Nerlige Ramappa
2024-04-17  3:11     ` Lucas De Marchi
2024-04-18 23:12       ` Umesh Nerlige Ramappa
2024-04-19 13:25         ` Lucas De Marchi
2024-04-16  3:04 ` [PATCH 7/7] HACK: simple gputop-like impl in python Lucas De Marchi
2024-04-16  3:17 ` ✓ CI.Patch_applied: success for drm/xe: Per client usage Patchwork
2024-04-16  3:17 ` ✗ CI.checkpatch: warning " Patchwork
2024-04-16  3:18 ` ✓ CI.KUnit: success " Patchwork
2024-04-16  3:30 ` ✓ CI.Build: " Patchwork
2024-04-16  3:32 ` ✓ CI.Hooks: " Patchwork
2024-04-16  3:34 ` ✓ CI.checksparse: " Patchwork
2024-04-16  3:59 ` ✗ CI.BAT: failure " Patchwork
2024-04-16  8:37 ` [PATCH 0/7] " Tvrtko Ursulin
2024-04-16 13:30   ` Lucas De Marchi
2024-04-16 13:51     ` Lucas De Marchi
2024-04-16 14:22       ` Tvrtko Ursulin
2024-04-16 18:29         ` Lucas De Marchi
2024-04-17  8:51           ` Tvrtko Ursulin
2024-04-17 19:05             ` Lucas De Marchi
2024-04-17 20:35               ` Umesh Nerlige Ramappa
2024-04-17 23:19                 ` Lucas De Marchi
2024-04-18  8:09                   ` Tvrtko Ursulin
2024-04-19 10:44                   ` Tvrtko Ursulin
2024-04-19 23:51                     ` Umesh Nerlige Ramappa
2024-04-22 10:40                       ` Tvrtko Ursulin
2024-04-22 17:17                         ` Umesh Nerlige Ramappa
2024-04-23  8:44                           ` Tvrtko Ursulin
2024-04-24  0:40                             ` Lucas De Marchi
2024-04-16 22:12 ` ✗ CI.FULL: failure for " Patchwork
