* [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe
@ 2025-01-16 23:07 Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 1/7] perf/core: Add PMU_EVENT_ATTR_ID_STRING Lucas De Marchi
` (6 more replies)
0 siblings, 7 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe; +Cc: Peter Zijlstra, linux-perf-users, Lucas De Marchi
Previous versions: https://patchwork.freedesktop.org/series/139121/
For perf/core: the first patch adds a new macro in include/linux/perf_event.h.
If it's OK, I'd like to merge it through the drm branch so we can add the
PMU implementation to the xe driver. I'm also Cc'ing you on the other
patches to show its usage.
For "drm/xe/pmu: Enable PMU interface", previously worked on by several
authors, I squashed several changes I had suggested in the past to clean it
up and reduce the boilerplate that resulted from copy-pasting from i915.
More detailed changes can be found at
https://gitlab.freedesktop.org/demarchi/xe/-/tags/tip-xe-pmu-v13
Compared to v12, this has only one event, as Vinay said he will rebase the
frequency event on top. I tested this on Lunar Lake with and without a
workload running, and the C6 residency seems reasonable.
Lucas De Marchi (4):
perf/core: Add PMU_EVENT_ATTR_ID_STRING
drm/xe/pmu: Assert max gt
drm/xe/pmu: Extract xe_pmu_event_update()
drm/xe/pmu: Add attribute skeleton
Vinay Belgaumkar (3):
drm/xe/pmu: Enable PMU interface
drm/xe/pmu: Hook up gt suspend notification
drm/xe/pmu: Add GT C6 events
drivers/gpu/drm/xe/Makefile | 2 +
drivers/gpu/drm/xe/xe_device.c | 3 +
drivers/gpu/drm/xe/xe_device_types.h | 4 +
drivers/gpu/drm/xe/xe_gt.c | 3 +
drivers/gpu/drm/xe/xe_pmu.c | 408 +++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_pmu.h | 22 ++
drivers/gpu/drm/xe/xe_pmu_types.h | 51 ++++
include/linux/perf_event.h | 7 +-
8 files changed, 498 insertions(+), 2 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
--
2.48.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v13 1/7] perf/core: Add PMU_EVENT_ATTR_ID_STRING
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface Lucas De Marchi
` (5 subsequent siblings)
6 siblings, 0 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe; +Cc: Peter Zijlstra, linux-perf-users, Lucas De Marchi
struct perf_pmu_events_attr has both id and event_str fields, but
PMU_EVENT_ATTR_STRING zeroes the id and only sets event_str. Add another
macro that allows setting both so drivers can make use of them. The id is
useful for determining the visibility of the attributes without resorting
to creating separate groups passed via attr_update, while the event_str is
still useful for attributes like *.unit or *.scale.
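For illustration, here is a minimal userspace mock of the pattern the new macro enables; the struct and helper names are simplified stand-ins for the kernel's, not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct perf_pmu_events_attr. */
struct mock_events_attr {
	unsigned long long id;
	const char *event_str;
};

/* Shape of the new PMU_EVENT_ATTR_ID_STRING: both id and event_str set. */
#define MOCK_EVENT_ATTR_ID_STRING(_var, _id, _str) \
	static struct mock_events_attr _var = { .id = (_id), .event_str = (_str) }

/* The old string-only macro can then be expressed in terms of the new one. */
#define MOCK_EVENT_ATTR_STRING(_var, _str) \
	MOCK_EVENT_ATTR_ID_STRING(_var, 0, _str)

MOCK_EVENT_ATTR_ID_STRING(unit_attr, 0x01, "ms");

/* An is_visible callback can now key off the id instead of needing a
 * separate attribute group passed via attr_update. */
static int mock_is_visible(const struct mock_events_attr *attr,
			   unsigned long long supported)
{
	return (supported >> attr->id) & 1;
}
```

Patch 6 in this series uses the id this way in its event_attr_is_visible() callback.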
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
include/linux/perf_event.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index cb99ec8c9e96f..423f21b51cb0f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1923,13 +1923,16 @@ static struct perf_pmu_events_attr _var = { \
.id = _id, \
};
-#define PMU_EVENT_ATTR_STRING(_name, _var, _str) \
+#define PMU_EVENT_ATTR_ID_STRING(_name, _var, _id, _str) \
static struct perf_pmu_events_attr _var = { \
.attr = __ATTR(_name, 0444, perf_event_sysfs_show, NULL), \
- .id = 0, \
+ .id = _id, \
.event_str = _str, \
};
+#define PMU_EVENT_ATTR_STRING(_name, _var, _str) \
+ PMU_EVENT_ATTR_ID_STRING(_name, _var, 0, _str)
+
#define PMU_EVENT_ATTR_ID(_name, _show, _id) \
(&((struct perf_pmu_events_attr[]) { \
{ .attr = __ATTR(_name, 0444, _show, NULL), \
--
2.48.0
* [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 1/7] perf/core: Add PMU_EVENT_ATTR_ID_STRING Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-17 18:30 ` Rodrigo Vivi
2025-01-16 23:07 ` [PATCH v13 3/7] drm/xe/pmu: Assert max gt Lucas De Marchi
` (4 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe
Cc: Peter Zijlstra, linux-perf-users, Vinay Belgaumkar, Rodrigo Vivi,
Lucas De Marchi
From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Basic PMU enablement patch. Set up the framework for adding
events/timers. A previous version of this patch was reviewed here:
https://patchwork.freedesktop.org/series/119504/
Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and
Riana Tauro, using i915 and rapl as reference implementation.
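The config encoding this PMU uses (GT id in bits 60-63, event id in bits 0-11) can be sketched in plain C. This is a userspace illustration: GENMASK_ULL()/FIELD_GET() are replaced with open-coded equivalents, and make_config() is an assumed helper that is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded equivalent of the kernel's GENMASK_ULL(h, l). */
#define MASK_ULL(h, l)	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define XE_PMU_EVENT_GT_MASK	MASK_ULL(63, 60)
#define XE_PMU_EVENT_ID_MASK	MASK_ULL(11, 0)

static unsigned int config_to_event_id(uint64_t config)
{
	return (unsigned int)(config & XE_PMU_EVENT_ID_MASK);
}

static unsigned int config_to_gt_id(uint64_t config)
{
	return (unsigned int)((config & XE_PMU_EVENT_GT_MASK) >> 60);
}

/* Illustrative only: build the config perf passes for "event=0xNN,gt=N". */
static uint64_t make_config(unsigned int gt, unsigned int id)
{
	return ((uint64_t)gt << 60) | (id & XE_PMU_EVENT_ID_MASK);
}
```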
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/Makefile | 2 +
drivers/gpu/drm/xe/xe_device.c | 3 +
drivers/gpu/drm/xe/xe_device_types.h | 4 +
drivers/gpu/drm/xe/xe_pmu.c | 300 +++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_pmu.h | 20 ++
drivers/gpu/drm/xe/xe_pmu_types.h | 43 ++++
6 files changed, 372 insertions(+)
create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 81f63258a7e19..b476866a2a68e 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -304,6 +304,8 @@ endif
xe-$(CONFIG_DRM_XE_DP_TUNNEL) += \
i915-display/intel_dp_tunnel.o
+xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
+
obj-$(CONFIG_DRM_XE) += xe.o
obj-$(CONFIG_DRM_XE_KUNIT_TEST) += tests/
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 0966d9697cafe..a0a80fa16c39a 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -49,6 +49,7 @@
#include "xe_pat.h"
#include "xe_pcode.h"
#include "xe_pm.h"
+#include "xe_pmu.h"
#include "xe_query.h"
#include "xe_sriov.h"
#include "xe_tile.h"
@@ -865,6 +866,8 @@ int xe_device_probe(struct xe_device *xe)
xe_oa_register(xe);
+ xe_pmu_register(&xe->pmu);
+
xe_debugfs_register(xe);
xe_hwmon_register(xe);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 16ebb2859877f..58e79e19deaad 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -18,6 +18,7 @@
#include "xe_memirq_types.h"
#include "xe_oa_types.h"
#include "xe_platform_types.h"
+#include "xe_pmu_types.h"
#include "xe_pt_types.h"
#include "xe_sriov_types.h"
#include "xe_step_types.h"
@@ -514,6 +515,9 @@ struct xe_device {
int mode;
} wedged;
+ /** @pmu: performance monitoring unit */
+ struct xe_pmu pmu;
+
#ifdef TEST_VM_OPS_ERROR
/**
* @vm_inject_error_position: inject errors at different places in VM
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
new file mode 100644
index 0000000000000..b63f819c54d02
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -0,0 +1,300 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
+#include <drm/xe_drm.h>
+
+#include "regs/xe_gt_regs.h"
+#include "xe_device.h"
+#include "xe_force_wake.h"
+#include "xe_gt_clock.h"
+#include "xe_gt_printk.h"
+#include "xe_mmio.h"
+#include "xe_macros.h"
+#include "xe_pm.h"
+#include "xe_pmu.h"
+
+/**
+ * DOC: Xe PMU (Performance Monitoring Unit)
+ *
+ * Expose events/counters like C6 residency and GT frequency to user land.
+ * Events will be per device, the GT can be selected with an extra config
+ * sub-field (bits 60-63).
+ *
+ * Perf tool can be used to list these counters from the command line.
+ *
+ * Example commands to list/record supported perf events-
+ *
+ * $ ls -ld /sys/bus/event_source/devices/xe_*
+ * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/events/
+ * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/format/
+ *
+ * The format directory has info regarding the configs that can be used.
+ *
+ * The standard perf tool can be used to grep for a certain event as well-
+ * $ perf list | grep c6
+ *
+ * To list a specific event for a GT at regular intervals-
+ * $ perf stat -e <event_name,gt=> -I <interval>
+ */
+
+#define XE_PMU_EVENT_GT_MASK GENMASK_ULL(63, 60)
+#define XE_PMU_EVENT_ID_MASK GENMASK_ULL(11, 0)
+
+static unsigned int config_to_event_id(u64 config)
+{
+ return FIELD_GET(XE_PMU_EVENT_ID_MASK, config);
+}
+
+static unsigned int config_to_gt_id(u64 config)
+{
+ return FIELD_GET(XE_PMU_EVENT_GT_MASK, config);
+}
+
+static struct xe_gt *event_to_gt(struct perf_event *event)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ u64 gt = config_to_gt_id(event->attr.config);
+
+ return xe_device_get_gt(xe, gt);
+}
+
+static bool event_supported(struct xe_pmu *pmu, unsigned int gt,
+ unsigned int id)
+{
+ if (gt >= XE_MAX_GT_PER_TILE)
+ return false;
+
+ return false;
+}
+
+static void xe_pmu_event_destroy(struct perf_event *event)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+
+ drm_WARN_ON(&xe->drm, event->parent);
+ drm_dev_put(&xe->drm);
+}
+
+static int xe_pmu_event_init(struct perf_event *event)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_pmu *pmu = &xe->pmu;
+ unsigned int id, gt;
+
+ if (!pmu->registered)
+ return -ENODEV;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* unsupported modes and filters */
+ if (event->attr.sample_period) /* no sampling */
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ gt = config_to_gt_id(event->attr.config);
+ id = config_to_event_id(event->attr.config);
+ if (!event_supported(pmu, gt, id))
+ return -ENOENT;
+
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ if (!event->parent) {
+ drm_dev_get(&xe->drm);
+ event->destroy = xe_pmu_event_destroy;
+ }
+
+ return 0;
+}
+
+static u64 __xe_pmu_event_read(struct perf_event *event)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_gt *gt = event_to_gt(event);
+ u64 val = 0;
+
+ if (!gt)
+ return 0;
+
+ return val;
+}
+
+static void xe_pmu_event_read(struct perf_event *event)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct hw_perf_event *hwc = &event->hw;
+ struct xe_pmu *pmu = &xe->pmu;
+ u64 prev, new;
+
+ if (!pmu->registered) {
+ event->hw.state = PERF_HES_STOPPED;
+ return;
+ }
+
+ prev = local64_read(&hwc->prev_count);
+ do {
+ new = __xe_pmu_event_read(event);
+ } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
+
+ local64_add(new - prev, &event->count);
+}
+
+static void xe_pmu_enable(struct perf_event *event)
+{
+ /*
+ * Store the current counter value so we can report the correct delta
+ * for all listeners. Even when the event was already enabled and has
+ * an existing non-zero value.
+ */
+ local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
+}
+
+static void xe_pmu_event_start(struct perf_event *event, int flags)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_pmu *pmu = &xe->pmu;
+
+ if (!pmu->registered)
+ return;
+
+ xe_pmu_enable(event);
+ event->hw.state = 0;
+}
+
+static void xe_pmu_event_stop(struct perf_event *event, int flags)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_pmu *pmu = &xe->pmu;
+
+ if (pmu->registered)
+ if (flags & PERF_EF_UPDATE)
+ xe_pmu_event_read(event);
+
+ event->hw.state = PERF_HES_STOPPED;
+}
+
+static int xe_pmu_event_add(struct perf_event *event, int flags)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_pmu *pmu = &xe->pmu;
+
+ if (!pmu->registered)
+ return -ENODEV;
+
+ if (flags & PERF_EF_START)
+ xe_pmu_event_start(event, flags);
+
+ return 0;
+}
+
+static void xe_pmu_event_del(struct perf_event *event, int flags)
+{
+ xe_pmu_event_stop(event, PERF_EF_UPDATE);
+}
+
+PMU_FORMAT_ATTR(gt, "config:60-63");
+PMU_FORMAT_ATTR(event, "config:0-11");
+
+static struct attribute *pmu_format_attrs[] = {
+ &format_attr_event.attr,
+ &format_attr_gt.attr,
+ NULL,
+};
+
+static const struct attribute_group pmu_format_attr_group = {
+ .name = "format",
+ .attrs = pmu_format_attrs,
+};
+
+static struct attribute *pmu_event_attrs[] = {
+ /* No events yet */
+ NULL,
+};
+
+static const struct attribute_group pmu_events_attr_group = {
+ .name = "events",
+ .attrs = pmu_event_attrs,
+};
+
+/**
+ * xe_pmu_unregister() - Remove/cleanup PMU registration
+ * @arg: Ptr to pmu
+ */
+static void xe_pmu_unregister(void *arg)
+{
+ struct xe_pmu *pmu = arg;
+ struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
+
+ if (!pmu->registered)
+ return;
+
+ pmu->registered = false;
+
+ perf_pmu_unregister(&pmu->base);
+ kfree(pmu->name);
+}
+
+/**
+ * xe_pmu_register() - Define basic PMU properties for Xe and add event callbacks.
+ * @pmu: the PMU object
+ *
+ * Returns 0 on success and an appropriate error code otherwise
+ */
+int xe_pmu_register(struct xe_pmu *pmu)
+{
+ struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
+ static const struct attribute_group *attr_groups[] = {
+ &pmu_format_attr_group,
+ &pmu_events_attr_group,
+ NULL
+ };
+ int ret = -ENOMEM;
+ char *name;
+
+ if (IS_SRIOV_VF(xe))
+ return 0;
+
+ raw_spin_lock_init(&pmu->lock);
+
+ name = kasprintf(GFP_KERNEL, "xe_%s",
+ dev_name(xe->drm.dev));
+ if (!name)
+ goto err;
+
+ /* tools/perf reserves colons as special. */
+ strreplace(name, ':', '_');
+
+ pmu->name = name;
+ pmu->base.attr_groups = attr_groups;
+ pmu->base.scope = PERF_PMU_SCOPE_SYS_WIDE;
+ pmu->base.module = THIS_MODULE;
+ pmu->base.task_ctx_nr = perf_invalid_context;
+ pmu->base.event_init = xe_pmu_event_init;
+ pmu->base.add = xe_pmu_event_add;
+ pmu->base.del = xe_pmu_event_del;
+ pmu->base.start = xe_pmu_event_start;
+ pmu->base.stop = xe_pmu_event_stop;
+ pmu->base.read = xe_pmu_event_read;
+
+ ret = perf_pmu_register(&pmu->base, pmu->name, -1);
+ if (ret)
+ goto err_name;
+
+ pmu->registered = true;
+
+ return devm_add_action_or_reset(xe->drm.dev, xe_pmu_unregister, pmu);
+
+err_name:
+ kfree(name);
+err:
+ drm_err(&xe->drm, "Failed to register PMU (ret=%d)!\n", ret);
+
+ return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
new file mode 100644
index 0000000000000..f9dfe77d00cb6
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pmu.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_PMU_H_
+#define _XE_PMU_H_
+
+#include "xe_pmu_types.h"
+
+struct xe_gt;
+
+#if IS_ENABLED(CONFIG_PERF_EVENTS)
+int xe_pmu_register(struct xe_pmu *pmu);
+#else
+static inline int xe_pmu_register(struct xe_pmu *pmu) { return 0; }
+#endif
+
+#endif
+
diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
new file mode 100644
index 0000000000000..e0cf7169f4fda
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pmu_types.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_PMU_TYPES_H_
+#define _XE_PMU_TYPES_H_
+
+#include <linux/perf_event.h>
+#include <linux/spinlock_types.h>
+
+enum {
+ __XE_NUM_PMU_SAMPLERS
+};
+
+#define XE_PMU_MAX_GT 2
+
+/**
+ * struct xe_pmu - PMU related data per Xe device
+ *
+ * Stores per device PMU info that includes event/perf attributes and
+ * sampling counters across all GTs for this device.
+ */
+struct xe_pmu {
+ /**
+ * @base: PMU base.
+ */
+ struct pmu base;
+ /**
+ * @registered: PMU is registered and not in the unregistering process.
+ */
+ bool registered;
+ /**
+ * @name: Name as registered with perf core.
+ */
+ const char *name;
+ /**
+ * @lock: Lock protecting enable mask and ref count handling.
+ */
+ raw_spinlock_t lock;
+};
+
+#endif
--
2.48.0
* [PATCH v13 3/7] drm/xe/pmu: Assert max gt
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 1/7] perf/core: Add PMU_EVENT_ATTR_ID_STRING Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-17 18:31 ` Rodrigo Vivi
2025-01-16 23:07 ` [PATCH v13 4/7] drm/xe/pmu: Hook up gt suspend notification Lucas De Marchi
` (3 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe; +Cc: Peter Zijlstra, linux-perf-users, Lucas De Marchi
XE_PMU_MAX_GT needs to be used due to a circular dependency, but we
should make sure it doesn't go out of sync with XE_MAX_GT_PER_TILE. Add a
compile-time check for that.
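In standard C the same guarantee can be sketched with C11 _Static_assert, which, like the kernel's BUILD_BUG_ON(), turns a mismatch into a build failure rather than a runtime check (the constant values below are illustrative; in the driver they come from separate headers):

```c
#include <assert.h>

/* Illustrative values; in the driver these come from separate headers. */
#define XE_MAX_GT_PER_TILE	2
#define XE_PMU_MAX_GT		2

/* Fails at compile time, like BUILD_BUG_ON(), if the two drift apart. */
_Static_assert(XE_MAX_GT_PER_TILE == XE_PMU_MAX_GT,
	       "XE_PMU_MAX_GT out of sync with XE_MAX_GT_PER_TILE");
```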
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_pmu.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index b63f819c54d02..2312c73a3ee16 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -258,6 +258,8 @@ int xe_pmu_register(struct xe_pmu *pmu)
int ret = -ENOMEM;
char *name;
+ BUILD_BUG_ON(XE_MAX_GT_PER_TILE != XE_PMU_MAX_GT);
+
if (IS_SRIOV_VF(xe))
return 0;
--
2.48.0
* [PATCH v13 4/7] drm/xe/pmu: Hook up gt suspend notification
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
` (2 preceding siblings ...)
2025-01-16 23:07 ` [PATCH v13 3/7] drm/xe/pmu: Assert max gt Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 5/7] drm/xe/pmu: Extract xe_pmu_event_update() Lucas De Marchi
` (2 subsequent siblings)
6 siblings, 0 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe
Cc: Peter Zijlstra, linux-perf-users, Vinay Belgaumkar,
Lucas De Marchi
From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
When the device is runtime suspended, it's not desirable to wake it up
just to read the idle residency - the measurement itself would then
interfere with the outcome. Hook the PMU up to the GT suspend flow so
it's possible to estimate the idle residency while the device is
suspended.
v2: Extract suspend notification as a preparatory step (Lucas)
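The estimation idea, as later used by the C6 event, can be sketched as follows. This is a simplification under the assumption that the GT stays suspended (and therefore idle) for the whole interval; the names and ms granularity are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* While suspended, idle residency grows at wall-clock rate, so the counter
 * can be estimated from the timestamp recorded at suspend time without
 * waking the device. */
static uint64_t estimate_residency_ms(uint64_t last_residency_ms,
				      uint64_t suspend_ts_ms,
				      uint64_t now_ms)
{
	return last_residency_ms + (now_ms - suspend_ts_ms);
}
```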
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_gt.c | 3 +++
drivers/gpu/drm/xe/xe_pmu.c | 18 ++++++++++++++++++
drivers/gpu/drm/xe/xe_pmu.h | 2 ++
drivers/gpu/drm/xe/xe_pmu_types.h | 4 ++++
4 files changed, 27 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 26e64530ada27..d3b9df6438d6c 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -47,6 +47,7 @@
#include "xe_mmio.h"
#include "xe_pat.h"
#include "xe_pm.h"
+#include "xe_pmu.h"
#include "xe_mocs.h"
#include "xe_reg_sr.h"
#include "xe_ring_ops.h"
@@ -877,6 +878,8 @@ int xe_gt_suspend(struct xe_gt *gt)
xe_gt_disable_host_l2_vram(gt);
+ xe_pmu_suspend(gt);
+
xe_force_wake_put(gt_to_fw(gt), fw_ref);
xe_gt_dbg(gt, "suspended\n");
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index 2312c73a3ee16..ce53cfb846470 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -223,6 +223,24 @@ static const struct attribute_group pmu_events_attr_group = {
.attrs = pmu_event_attrs,
};
+/**
+ * xe_pmu_suspend() - Save necessary state before suspending
+ * @gt: GT structure
+ */
+void xe_pmu_suspend(struct xe_gt *gt)
+{
+ struct xe_device *xe = gt_to_xe(gt);
+ struct xe_pmu *pmu = &xe->pmu;
+ unsigned long flags;
+
+ if (!pmu->registered)
+ return;
+
+ raw_spin_lock_irqsave(&pmu->lock, flags);
+ pmu->suspend_timestamp[gt->info.id] = ktime_get();
+ raw_spin_unlock_irqrestore(&pmu->lock, flags);
+}
+
/**
* xe_pmu_unregister() - Remove/cleanup PMU registration
* @arg: Ptr to pmu
diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
index f9dfe77d00cb6..c53c6a4a2ae55 100644
--- a/drivers/gpu/drm/xe/xe_pmu.h
+++ b/drivers/gpu/drm/xe/xe_pmu.h
@@ -12,8 +12,10 @@ struct xe_gt;
#if IS_ENABLED(CONFIG_PERF_EVENTS)
int xe_pmu_register(struct xe_pmu *pmu);
+void xe_pmu_suspend(struct xe_gt *gt);
#else
static inline int xe_pmu_register(struct xe_pmu *pmu) { return 0; }
+static inline void xe_pmu_suspend(struct xe_gt *gt) {}
#endif
#endif
diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
index e0cf7169f4fda..883e462852377 100644
--- a/drivers/gpu/drm/xe/xe_pmu_types.h
+++ b/drivers/gpu/drm/xe/xe_pmu_types.h
@@ -38,6 +38,10 @@ struct xe_pmu {
* @lock: Lock protecting enable mask and ref count handling.
*/
raw_spinlock_t lock;
+ /**
+ * @suspend_timestamp: Last time GT suspended
+ */
+ ktime_t suspend_timestamp[XE_PMU_MAX_GT];
};
#endif
--
2.48.0
* [PATCH v13 5/7] drm/xe/pmu: Extract xe_pmu_event_update()
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
` (3 preceding siblings ...)
2025-01-16 23:07 ` [PATCH v13 4/7] drm/xe/pmu: Hook up gt suspend notification Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 6/7] drm/xe/pmu: Add attribute skeleton Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events Lucas De Marchi
6 siblings, 0 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe; +Cc: Peter Zijlstra, linux-perf-users, Lucas De Marchi
Like other pmu drivers, keep the update separate from the read so it can
be called from other methods (like stop()) without side effects.
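A userspace sketch of the update pattern, using C11 atomics in place of the kernel's local64 ops. It is simplified: the kernel re-reads the hardware counter on each retry, while here the new value is passed in:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t prev_count;	/* last raw counter published */
static _Atomic uint64_t event_count;	/* accumulated event value */

static void event_update(uint64_t new)
{
	uint64_t prev = atomic_load(&prev_count);

	/* Publish the new snapshot; on failure 'prev' is reloaded and the
	 * exchange retried, so each delta is accounted exactly once. */
	while (!atomic_compare_exchange_weak(&prev_count, &prev, new))
		;

	atomic_fetch_add(&event_count, new - prev);
}
```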
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_pmu.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index ce53cfb846470..f8154bcad891a 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -126,18 +126,11 @@ static u64 __xe_pmu_event_read(struct perf_event *event)
return val;
}
-static void xe_pmu_event_read(struct perf_event *event)
+static void xe_pmu_event_update(struct perf_event *event)
{
- struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
struct hw_perf_event *hwc = &event->hw;
- struct xe_pmu *pmu = &xe->pmu;
u64 prev, new;
- if (!pmu->registered) {
- event->hw.state = PERF_HES_STOPPED;
- return;
- }
-
prev = local64_read(&hwc->prev_count);
do {
new = __xe_pmu_event_read(event);
@@ -146,6 +139,19 @@ static void xe_pmu_event_read(struct perf_event *event)
local64_add(new - prev, &event->count);
}
+static void xe_pmu_event_read(struct perf_event *event)
+{
+ struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_pmu *pmu = &xe->pmu;
+
+ if (!pmu->registered) {
+ event->hw.state = PERF_HES_STOPPED;
+ return;
+ }
+
+ xe_pmu_event_update(event);
+}
+
static void xe_pmu_enable(struct perf_event *event)
{
/*
@@ -175,7 +181,7 @@ static void xe_pmu_event_stop(struct perf_event *event, int flags)
if (pmu->registered)
if (flags & PERF_EF_UPDATE)
- xe_pmu_event_read(event);
+ xe_pmu_event_update(event);
event->hw.state = PERF_HES_STOPPED;
}
--
2.48.0
* [PATCH v13 6/7] drm/xe/pmu: Add attribute skeleton
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
` (4 preceding siblings ...)
2025-01-16 23:07 ` [PATCH v13 5/7] drm/xe/pmu: Extract xe_pmu_event_update() Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events Lucas De Marchi
6 siblings, 0 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe; +Cc: Peter Zijlstra, linux-perf-users, Lucas De Marchi
Add the generic support for defining new attributes. This uses
gt-c6-residency as the first attribute to bootstrap it, but its
implementation will be added by a follow-up commit: until proper support
is added, it will remain invisible in sysfs since the corresponding
bit is not set in the supported_events bitmap.
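The visibility gate described above boils down to a bitmap test. A standalone sketch of the event_supported() logic, with the pmu struct reduced to the bitmap itself:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE	8

/* An event id is supported only if it indexes into the bitmap and its bit
 * is set; with supported_events still zero, every attribute stays hidden. */
static int event_supported(uint64_t supported_events, unsigned int id)
{
	return id < sizeof(supported_events) * BITS_PER_BYTE &&
	       (supported_events & (1ULL << id)) != 0;
}
```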
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_pmu.c | 46 +++++++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_pmu_types.h | 4 +++
2 files changed, 48 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index f8154bcad891a..c2af82ec3f793 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -54,6 +54,8 @@ static unsigned int config_to_gt_id(u64 config)
return FIELD_GET(XE_PMU_EVENT_GT_MASK, config);
}
+#define XE_PMU_EVENT_GT_C6_RESIDENCY 0x01
+
static struct xe_gt *event_to_gt(struct perf_event *event)
{
struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
@@ -68,7 +70,8 @@ static bool event_supported(struct xe_pmu *pmu, unsigned int gt,
if (gt >= XE_MAX_GT_PER_TILE)
return false;
- return false;
+ return id < sizeof(pmu->supported_events) * BITS_PER_BYTE &&
+ pmu->supported_events & BIT_ULL(id);
}
static void xe_pmu_event_destroy(struct perf_event *event)
@@ -219,16 +222,53 @@ static const struct attribute_group pmu_format_attr_group = {
.attrs = pmu_format_attrs,
};
+static ssize_t event_attr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct perf_pmu_events_attr *pmu_attr =
+ container_of(attr, struct perf_pmu_events_attr, attr);
+
+ return sprintf(buf, "event=%#04llx\n", pmu_attr->id);
+}
+
+static umode_t event_attr_is_visible(struct kobject *kobj,
+ struct attribute *attr, int idx)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct perf_pmu_events_attr *pmu_attr =
+ container_of(attr, typeof(*pmu_attr), attr.attr);
+ struct xe_pmu *pmu =
+ container_of(dev_get_drvdata(dev), typeof(*pmu), base);
+
+ if (event_supported(pmu, 0, pmu_attr->id))
+ return attr->mode;
+
+ return 0;
+}
+
+#define XE_EVENT_ATTR(name_, v_, id_, unit_) \
+ PMU_EVENT_ATTR(name_, pmu_event_ ## v_, id_, event_attr_show) \
+ PMU_EVENT_ATTR_ID_STRING(name_.unit, pmu_event_unit_ ## v_, id_, unit_)
+
+XE_EVENT_ATTR(gt-c6-residency, gt_c6_residency, XE_PMU_EVENT_GT_C6_RESIDENCY, "ms")
+
static struct attribute *pmu_event_attrs[] = {
- /* No events yet */
+ &pmu_event_gt_c6_residency.attr.attr,
+ &pmu_event_unit_gt_c6_residency.attr.attr,
+
NULL,
};
static const struct attribute_group pmu_events_attr_group = {
.name = "events",
.attrs = pmu_event_attrs,
+ .is_visible = event_attr_is_visible,
};
+static void set_supported_events(struct xe_pmu *pmu)
+{
+}
+
/**
* xe_pmu_suspend() - Save necessary state before suspending
* @gt: GT structure
@@ -309,6 +349,8 @@ int xe_pmu_register(struct xe_pmu *pmu)
pmu->base.stop = xe_pmu_event_stop;
pmu->base.read = xe_pmu_event_read;
+ set_supported_events(pmu);
+
ret = perf_pmu_register(&pmu->base, pmu->name, -1);
if (ret)
goto err_name;
diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
index 883e462852377..341398a9e60dc 100644
--- a/drivers/gpu/drm/xe/xe_pmu_types.h
+++ b/drivers/gpu/drm/xe/xe_pmu_types.h
@@ -42,6 +42,10 @@ struct xe_pmu {
* @suspend_timestamp: Last time GT suspended
*/
ktime_t suspend_timestamp[XE_PMU_MAX_GT];
+ /**
+ * @supported_events: Bitmap of supported events, indexed by event id
+ */
+ u64 supported_events;
};
#endif
--
2.48.0
* [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
` (5 preceding siblings ...)
2025-01-16 23:07 ` [PATCH v13 6/7] drm/xe/pmu: Add attribute skeleton Lucas De Marchi
@ 2025-01-16 23:07 ` Lucas De Marchi
2025-01-17 18:44 ` Rodrigo Vivi
2025-01-21 6:16 ` Riana Tauro
6 siblings, 2 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-16 23:07 UTC (permalink / raw)
To: intel-xe
Cc: Peter Zijlstra, linux-perf-users, Vinay Belgaumkar,
Lucas De Marchi
From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Provide a PMU interface for GT C6 residency counters. The implementation
is ported over from the i915 PMU code. Residency is reported in units of
ms (like the sysfs entry in /sys/class/drm/card0/device/tile0/gt0/gtidle).
Sample usage and output:
$ perf list | grep gt-c6
xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
$ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
event=0x01
==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
ms
$ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
# time counts unit events
1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
Besides the rebase, which changed a lot of how the event is added,
here is a summary of other changes:
- Use xe_pm_runtime_get_if_active() when reading
xe_gt_idle_residency_msec(), as there's no guarantee the device won't
be suspended by the time the counter is read
- Drop sample[] from the pmu struct and only use the prev count from
the perf_event struct. This avoids mixing the counters reported to two
separate clients.
- Drop the ktime helpers and just use what's provided by
include/linux/ktime.h
drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index c2af82ec3f793..37df9d3cc110c 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -11,6 +11,7 @@
#include "xe_device.h"
#include "xe_force_wake.h"
#include "xe_gt_clock.h"
+#include "xe_gt_idle.h"
#include "xe_gt_printk.h"
#include "xe_mmio.h"
#include "xe_macros.h"
@@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
return 0;
}
-static u64 __xe_pmu_event_read(struct perf_event *event)
+static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
{
- struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+ struct xe_device *xe = gt_to_xe(gt);
+ unsigned long flags;
+ ktime_t t0;
+ s64 delta;
+
+ if (xe_pm_runtime_get_if_active(xe)) {
+ u64 val = xe_gt_idle_residency_msec(&gt->gtidle);
+
+ xe_pm_runtime_put(xe);
+
+ return val;
+ }
+
+ /*
+ * Estimate the idle residency by looking at the time the device was
+ * suspended: should be good enough as long as the sampling frequency is
+ * 2x or more than the suspend frequency.
+ */
+ raw_spin_lock_irqsave(&pmu->lock, flags);
+ t0 = pmu->suspend_timestamp[gt->info.id];
+ raw_spin_unlock_irqrestore(&pmu->lock, flags);
+
+ delta = ktime_ms_delta(ktime_get(), t0);
+
+ return prev + delta;
+}
+
+static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
+{
+ struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
+ struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
struct xe_gt *gt = event_to_gt(event);
- u64 val = 0;
if (!gt)
- return 0;
+ return prev;
+
+ switch (config_to_event_id(event->attr.config)) {
+ case XE_PMU_EVENT_GT_C6_RESIDENCY:
+ return read_gt_c6_residency(pmu, gt, prev);
+ }
- return val;
+ return prev;
}
static void xe_pmu_event_update(struct perf_event *event)
@@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
prev = local64_read(&hwc->prev_count);
do {
- new = __xe_pmu_event_read(event);
+ new = __xe_pmu_event_read(event, prev);
} while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
- local64_add(new - prev, &event->count);
+ if (new > prev)
+ local64_add(new - prev, &event->count);
}
static void xe_pmu_event_read(struct perf_event *event)
@@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
* for all listeners. Even when the event was already enabled and has
* an existing non-zero value.
*/
- local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
+ local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
}
static void xe_pmu_event_start(struct perf_event *event, int flags)
@@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
static void set_supported_events(struct xe_pmu *pmu)
{
+ struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
+
+ if (!xe->info.skip_guc_pc)
+ pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
}
/**
--
2.48.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface
2025-01-16 23:07 ` [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface Lucas De Marchi
@ 2025-01-17 18:30 ` Rodrigo Vivi
2025-01-17 22:19 ` Lucas De Marchi
0 siblings, 1 reply; 17+ messages in thread
From: Rodrigo Vivi @ 2025-01-17 18:30 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar
On Thu, Jan 16, 2025 at 03:07:13PM -0800, Lucas De Marchi wrote:
> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>
> Basic PMU enabling patch. Set up the basic framework
> for adding events/timers. This patch was previously
> reviewed here -
> https://patchwork.freedesktop.org/series/119504/
>
> Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and
> Riana Tauro, using i915 and rapl as reference implementation.
>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 2 +
> drivers/gpu/drm/xe/xe_device.c | 3 +
> drivers/gpu/drm/xe/xe_device_types.h | 4 +
> drivers/gpu/drm/xe/xe_pmu.c | 300 +++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pmu.h | 20 ++
> drivers/gpu/drm/xe/xe_pmu_types.h | 43 ++++
> 6 files changed, 372 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
> create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
> create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 81f63258a7e19..b476866a2a68e 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -304,6 +304,8 @@ endif
> xe-$(CONFIG_DRM_XE_DP_TUNNEL) += \
> i915-display/intel_dp_tunnel.o
>
> +xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
> +
> obj-$(CONFIG_DRM_XE) += xe.o
> obj-$(CONFIG_DRM_XE_KUNIT_TEST) += tests/
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 0966d9697cafe..a0a80fa16c39a 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -49,6 +49,7 @@
> #include "xe_pat.h"
> #include "xe_pcode.h"
> #include "xe_pm.h"
> +#include "xe_pmu.h"
> #include "xe_query.h"
> #include "xe_sriov.h"
> #include "xe_tile.h"
> @@ -865,6 +866,8 @@ int xe_device_probe(struct xe_device *xe)
>
> xe_oa_register(xe);
>
> + xe_pmu_register(&xe->pmu);
> +
> xe_debugfs_register(xe);
>
> xe_hwmon_register(xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 16ebb2859877f..58e79e19deaad 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -18,6 +18,7 @@
> #include "xe_memirq_types.h"
> #include "xe_oa_types.h"
> #include "xe_platform_types.h"
> +#include "xe_pmu_types.h"
> #include "xe_pt_types.h"
> #include "xe_sriov_types.h"
> #include "xe_step_types.h"
> @@ -514,6 +515,9 @@ struct xe_device {
> int mode;
> } wedged;
>
> + /** @pmu: performance monitoring unit */
> + struct xe_pmu pmu;
> +
> #ifdef TEST_VM_OPS_ERROR
> /**
> * @vm_inject_error_position: inject errors at different places in VM
> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
> new file mode 100644
> index 0000000000000..b63f819c54d02
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pmu.c
> @@ -0,0 +1,300 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_managed.h>
> +#include <drm/xe_drm.h>
> +
> +#include "regs/xe_gt_regs.h"
> +#include "xe_device.h"
> +#include "xe_force_wake.h"
> +#include "xe_gt_clock.h"
> +#include "xe_gt_printk.h"
> +#include "xe_mmio.h"
> +#include "xe_macros.h"
> +#include "xe_pm.h"
> +#include "xe_pmu.h"
> +
> +/**
> + * DOC: Xe PMU (Performance Monitoring Unit)
> + *
> + * Expose events/counters like C6 residency and GT frequency to user land.
GT-C6 please... C6 like this will get people confused with CPU C6.
> + * Events will be per device, the GT can be selected with an extra config
> + * sub-field (bits 60-63).
> + *
> + * Perf tool can be used to list these counters from the command line.
> + *
> + * Example commands to list/record supported perf events-
> + *
> + * $ ls -ld /sys/bus/event_source/devices/xe_*
> + * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/events/
> + * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/format/
> + *
> + * The format directory has info regarding the configs that can be used.
> + *
> + * The standard perf tool can be used to grep for a certain event as well-
> + * $ perf list | grep c6
> + *
> + * To list a specific event for a GT at regular intervals-
Why not ':' here and all the occurrences above?
> + * $ perf stat -e <event_name,xe_gt_id=> -I <interval>
> + */
> +
> +#define XE_PMU_EVENT_GT_MASK GENMASK_ULL(63, 60)
> +#define XE_PMU_EVENT_ID_MASK GENMASK_ULL(11, 0)
> +
> +static unsigned int config_to_event_id(u64 config)
> +{
> + return FIELD_GET(XE_PMU_EVENT_ID_MASK, config);
> +}
> +
> +static unsigned int config_to_gt_id(u64 config)
> +{
> + return FIELD_GET(XE_PMU_EVENT_GT_MASK, config);
> +}
> +
> +static struct xe_gt *event_to_gt(struct perf_event *event)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + u64 gt = config_to_gt_id(event->attr.config);
> +
> + return xe_device_get_gt(xe, gt);
> +}
> +
> +static bool event_supported(struct xe_pmu *pmu, unsigned int gt,
> + unsigned int id)
> +{
> + if (gt >= XE_MAX_GT_PER_TILE)
> + return false;
> +
> + return false;
> +}
> +
> +static void xe_pmu_event_destroy(struct perf_event *event)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> +
> + drm_WARN_ON(&xe->drm, event->parent);
> + drm_dev_put(&xe->drm);
> +}
> +
> +static int xe_pmu_event_init(struct perf_event *event)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_pmu *pmu = &xe->pmu;
> + unsigned int id, gt;
> +
> + if (!pmu->registered)
> + return -ENODEV;
> +
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + /* unsupported modes and filters */
> + if (event->attr.sample_period) /* no sampling */
> + return -EINVAL;
> +
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> + gt = config_to_gt_id(event->attr.config);
> + id = config_to_event_id(event->attr.config);
> + if (!event_supported(pmu, gt, id))
> + return -ENOENT;
> +
> + if (has_branch_stack(event))
> + return -EOPNOTSUPP;
> +
> + if (!event->parent) {
> + drm_dev_get(&xe->drm);
> + event->destroy = xe_pmu_event_destroy;
> + }
> +
> + return 0;
> +}
> +
> +static u64 __xe_pmu_event_read(struct perf_event *event)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_gt *gt = event_to_gt(event);
> + u64 val = 0;
> +
> + if (!gt)
> + return 0;
> +
> + return val;
> +}
> +
> +static void xe_pmu_event_read(struct perf_event *event)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct hw_perf_event *hwc = &event->hw;
> + struct xe_pmu *pmu = &xe->pmu;
> + u64 prev, new;
> +
> + if (!pmu->registered) {
> + event->hw.state = PERF_HES_STOPPED;
> + return;
> + }
> +
> + prev = local64_read(&hwc->prev_count);
> + do {
> + new = __xe_pmu_event_read(event);
> + } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
> +
> + local64_add(new - prev, &event->count);
> +}
> +
> +static void xe_pmu_enable(struct perf_event *event)
> +{
> + /*
> + * Store the current counter value so we can report the correct delta
> + * for all listeners. Even when the event was already enabled and has
> + * an existing non-zero value.
> + */
> + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
> +}
> +
> +static void xe_pmu_event_start(struct perf_event *event, int flags)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_pmu *pmu = &xe->pmu;
> +
> + if (!pmu->registered)
> + return;
> +
> + xe_pmu_enable(event);
> + event->hw.state = 0;
> +}
> +
> +static void xe_pmu_event_stop(struct perf_event *event, int flags)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_pmu *pmu = &xe->pmu;
> +
> + if (pmu->registered)
> + if (flags & PERF_EF_UPDATE)
> + xe_pmu_event_read(event);
> +
> + event->hw.state = PERF_HES_STOPPED;
> +}
> +
> +static int xe_pmu_event_add(struct perf_event *event, int flags)
> +{
> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_pmu *pmu = &xe->pmu;
> +
> + if (!pmu->registered)
> + return -ENODEV;
> +
> + if (flags & PERF_EF_START)
> + xe_pmu_event_start(event, flags);
> +
> + return 0;
> +}
> +
> +static void xe_pmu_event_del(struct perf_event *event, int flags)
> +{
> + xe_pmu_event_stop(event, PERF_EF_UPDATE);
> +}
> +
> +PMU_FORMAT_ATTR(gt, "config:60-63");
> +PMU_FORMAT_ATTR(event, "config:0-11");
> +
> +static struct attribute *pmu_format_attrs[] = {
> + &format_attr_event.attr,
> + &format_attr_gt.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group pmu_format_attr_group = {
> + .name = "format",
> + .attrs = pmu_format_attrs,
> +};
> +
> +static struct attribute *pmu_event_attrs[] = {
> + /* No events yet */
> + NULL,
> +};
> +
> +static const struct attribute_group pmu_events_attr_group = {
> + .name = "events",
> + .attrs = pmu_event_attrs,
> +};
> +
> +/**
> + * xe_pmu_unregister() - Remove/cleanup PMU registration
> + * @arg: Ptr to pmu
> + */
> +static void xe_pmu_unregister(void *arg)
> +{
> + struct xe_pmu *pmu = arg;
> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> +
> + if (!pmu->registered)
> + return;
> +
> + pmu->registered = false;
> +
> + perf_pmu_unregister(&pmu->base);
> + kfree(pmu->name);
> +}
> +
> +/**
> + * xe_pmu_register() - Define basic PMU properties for Xe and add event callbacks.
> + * @pmu: the PMU object
> + *
> + * Returns 0 on success and an appropriate error code otherwise
> + */
> +int xe_pmu_register(struct xe_pmu *pmu)
> +{
> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> + static const struct attribute_group *attr_groups[] = {
> + &pmu_format_attr_group,
> + &pmu_events_attr_group,
> + NULL
> + };
> + int ret = -ENOMEM;
> + char *name;
> +
> + if (IS_SRIOV_VF(xe))
> + return 0;
> +
> + raw_spin_lock_init(&pmu->lock);
> +
> + name = kasprintf(GFP_KERNEL, "xe_%s",
> + dev_name(xe->drm.dev));
> + if (!name)
> + goto err;
> +
> + /* tools/perf reserves colons as special. */
> + strreplace(name, ':', '_');
> +
> + pmu->name = name;
> + pmu->base.attr_groups = attr_groups;
> + pmu->base.scope = PERF_PMU_SCOPE_SYS_WIDE;
> + pmu->base.module = THIS_MODULE;
> + pmu->base.task_ctx_nr = perf_invalid_context;
> + pmu->base.event_init = xe_pmu_event_init;
> + pmu->base.add = xe_pmu_event_add;
> + pmu->base.del = xe_pmu_event_del;
> + pmu->base.start = xe_pmu_event_start;
> + pmu->base.stop = xe_pmu_event_stop;
> + pmu->base.read = xe_pmu_event_read;
> +
> + ret = perf_pmu_register(&pmu->base, pmu->name, -1);
> + if (ret)
> + goto err_name;
> +
> + pmu->registered = true;
> +
> + return devm_add_action_or_reset(xe->drm.dev, xe_pmu_unregister, pmu);
> +
> +err_name:
> + kfree(name);
> +err:
> + drm_err(&xe->drm, "Failed to register PMU (ret=%d)!\n", ret);
> +
> + return ret;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
> new file mode 100644
> index 0000000000000..f9dfe77d00cb6
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pmu.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PMU_H_
> +#define _XE_PMU_H_
> +
> +#include "xe_pmu_types.h"
> +
> +struct xe_gt;
> +
> +#if IS_ENABLED(CONFIG_PERF_EVENTS)
> +int xe_pmu_register(struct xe_pmu *pmu);
> +#else
> +static inline int xe_pmu_register(struct xe_pmu *pmu) { return 0; }
> +#endif
> +
> +#endif
> +
> diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
> new file mode 100644
> index 0000000000000..e0cf7169f4fda
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pmu_types.h
> @@ -0,0 +1,43 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_PMU_TYPES_H_
> +#define _XE_PMU_TYPES_H_
> +
> +#include <linux/perf_event.h>
> +#include <linux/spinlock_types.h>
> +
> +enum {
> + __XE_NUM_PMU_SAMPLERS
> +};
> +
> +#define XE_PMU_MAX_GT 2
> +
> +/**
> + * struct xe_pmu - PMU related data per Xe device
> + *
> + * Stores per device PMU info that includes event/perf attributes and
> + * sampling counters across all GTs for this device.
> + */
> +struct xe_pmu {
> + /**
> + * @base: PMU base.
> + */
> + struct pmu base;
> + /**
> + * @registered: PMU is registered and not in the unregistering process.
> + */
> + bool registered;
> + /**
> + * @name: Name as registered with perf core.
> + */
> + const char *name;
> + /**
> + * @lock: Lock protecting enable mask and ref count handling.
> + */
> + raw_spinlock_t lock;
> +};
> +
> +#endif
> --
> 2.48.0
>
With s/C6/GT-C6,
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
* Re: [PATCH v13 3/7] drm/xe/pmu: Assert max gt
2025-01-16 23:07 ` [PATCH v13 3/7] drm/xe/pmu: Assert max gt Lucas De Marchi
@ 2025-01-17 18:31 ` Rodrigo Vivi
0 siblings, 0 replies; 17+ messages in thread
From: Rodrigo Vivi @ 2025-01-17 18:31 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: intel-xe, Peter Zijlstra, linux-perf-users
On Thu, Jan 16, 2025 at 03:07:14PM -0800, Lucas De Marchi wrote:
> XE_PMU_MAX_GT needs to be used due to a circular dependency, but we
> should make sure it doesn't go out of sync with XE_MAX_GT_PER_TILE. Add a
> compile check for that.
>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pmu.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
> index b63f819c54d02..2312c73a3ee16 100644
> --- a/drivers/gpu/drm/xe/xe_pmu.c
> +++ b/drivers/gpu/drm/xe/xe_pmu.c
> @@ -258,6 +258,8 @@ int xe_pmu_register(struct xe_pmu *pmu)
> int ret = -ENOMEM;
> char *name;
>
> + BUILD_BUG_ON(XE_MAX_GT_PER_TILE != XE_PMU_MAX_GT);
good idea!
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> +
> if (IS_SRIOV_VF(xe))
> return 0;
>
> --
> 2.48.0
>
* Re: [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-16 23:07 ` [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events Lucas De Marchi
@ 2025-01-17 18:44 ` Rodrigo Vivi
2025-01-17 22:40 ` Lucas De Marchi
2025-01-21 6:16 ` Riana Tauro
1 sibling, 1 reply; 17+ messages in thread
From: Rodrigo Vivi @ 2025-01-17 18:44 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar
On Thu, Jan 16, 2025 at 03:07:18PM -0800, Lucas De Marchi wrote:
> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>
> Provide a PMU interface for GT C6 residency counters. The implementation
> is ported over from the i915 PMU code. Residency is provided in units of
> ms (like the sysfs entry in /sys/class/drm/card0/device/tile0/gt0/gtidle).
>
> Sample usage and output:
>
> $ perf list | grep gt-c6
> xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
>
> $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
> event=0x01
>
> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
> ms
>
> $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
> # time counts unit events
> 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
> 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
>
> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>
> Besides the rebase, that changed a lot how the event was added,
> here is a summary of other changes:
>
> - Use xe_pm_runtime_get_if_active() when reading
> xe_gt_idle_residency_msec() as there's no guarantee it will not be
> suspended anymore by the time it reads the counter
>
> - Drop sample[] from the pmu struct and only use the prev/counter from
> the perf_event struct. This avoids mixing the counter reported to 2
> separate clients.
>
> - Drop time ktime helpers and just use what's provided by
> include/linux/ktime.h
>
> drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
> 1 file changed, 48 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
> index c2af82ec3f793..37df9d3cc110c 100644
> --- a/drivers/gpu/drm/xe/xe_pmu.c
> +++ b/drivers/gpu/drm/xe/xe_pmu.c
> @@ -11,6 +11,7 @@
> #include "xe_device.h"
> #include "xe_force_wake.h"
> #include "xe_gt_clock.h"
> +#include "xe_gt_idle.h"
> #include "xe_gt_printk.h"
> #include "xe_mmio.h"
> #include "xe_macros.h"
> @@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
> return 0;
> }
>
> -static u64 __xe_pmu_event_read(struct perf_event *event)
> +static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
> {
> - struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_device *xe = gt_to_xe(gt);
> + unsigned long flags;
> + ktime_t t0;
> + s64 delta;
> +
> + if (xe_pm_runtime_get_if_active(xe)) {
> + u64 val = xe_gt_idle_residency_msec(&gt->gtidle);
> +
> + xe_pm_runtime_put(xe);
> +
> + return val;
> + }
> +
> + /*
> + * Estimate the idle residency by looking at the time the device was
> + * suspended: should be good enough as long as the sampling frequency is
> + * 2x or more than the suspend frequency.
> + */
> + raw_spin_lock_irqsave(&pmu->lock, flags);
> + t0 = pmu->suspend_timestamp[gt->info.id];
> + raw_spin_unlock_irqrestore(&pmu->lock, flags);
I believe we should simply avoid the patch 4 and this block and use
xe_pm_runtime_get(xe) instead of the if_active variant.
I'm afraid this won't be a good estimate since it mixes 2 different wakes
of 2 different IP blocks... the runtime pm is for the entire device, while
in runtime active the GTs can be sleeping in GT-C6.
or is there some deadlock risk in that scenario?
perhaps we should get runtime pm in upper layers when using PMU?
> +
> + delta = ktime_ms_delta(ktime_get(), t0);
> +
> + return prev + delta;
> +}
> +
> +static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
> +{
> + struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> struct xe_gt *gt = event_to_gt(event);
> - u64 val = 0;
>
> if (!gt)
> - return 0;
> + return prev;
> +
> + switch (config_to_event_id(event->attr.config)) {
> + case XE_PMU_EVENT_GT_C6_RESIDENCY:
> + return read_gt_c6_residency(pmu, gt, prev);
> + }
>
> - return val;
> + return prev;
> }
>
> static void xe_pmu_event_update(struct perf_event *event)
> @@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
>
> prev = local64_read(&hwc->prev_count);
> do {
> - new = __xe_pmu_event_read(event);
> + new = __xe_pmu_event_read(event, prev);
> } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
>
> - local64_add(new - prev, &event->count);
> + if (new > prev)
> + local64_add(new - prev, &event->count);
> }
>
> static void xe_pmu_event_read(struct perf_event *event)
> @@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
> * for all listeners. Even when the event was already enabled and has
> * an existing non-zero value.
> */
> - local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
> + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
> }
>
> static void xe_pmu_event_start(struct perf_event *event, int flags)
> @@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
>
> static void set_supported_events(struct xe_pmu *pmu)
> {
> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> +
> + if (!xe->info.skip_guc_pc)
> + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
> }
>
> /**
> --
> 2.48.0
>
* Re: [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface
2025-01-17 18:30 ` Rodrigo Vivi
@ 2025-01-17 22:19 ` Lucas De Marchi
0 siblings, 0 replies; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-17 22:19 UTC (permalink / raw)
To: Rodrigo Vivi; +Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar
On Fri, Jan 17, 2025 at 01:30:35PM -0500, Rodrigo Vivi wrote:
>On Thu, Jan 16, 2025 at 03:07:13PM -0800, Lucas De Marchi wrote:
>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>
>> Basic PMU enabling patch. Set up the basic framework
>> for adding events/timers. This patch was previously
>> reviewed here -
>> https://patchwork.freedesktop.org/series/119504/
>>
>> Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and
>> Riana Tauro, using i915 and rapl as reference implementation.
>>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>> drivers/gpu/drm/xe/Makefile | 2 +
>> drivers/gpu/drm/xe/xe_device.c | 3 +
>> drivers/gpu/drm/xe/xe_device_types.h | 4 +
>> drivers/gpu/drm/xe/xe_pmu.c | 300 +++++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_pmu.h | 20 ++
>> drivers/gpu/drm/xe/xe_pmu_types.h | 43 ++++
>> 6 files changed, 372 insertions(+)
>> create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
>> create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
>> create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index 81f63258a7e19..b476866a2a68e 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -304,6 +304,8 @@ endif
>> xe-$(CONFIG_DRM_XE_DP_TUNNEL) += \
>> i915-display/intel_dp_tunnel.o
>>
>> +xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
>> +
>> obj-$(CONFIG_DRM_XE) += xe.o
>> obj-$(CONFIG_DRM_XE_KUNIT_TEST) += tests/
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 0966d9697cafe..a0a80fa16c39a 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -49,6 +49,7 @@
>> #include "xe_pat.h"
>> #include "xe_pcode.h"
>> #include "xe_pm.h"
>> +#include "xe_pmu.h"
>> #include "xe_query.h"
>> #include "xe_sriov.h"
>> #include "xe_tile.h"
>> @@ -865,6 +866,8 @@ int xe_device_probe(struct xe_device *xe)
>>
>> xe_oa_register(xe);
>>
>> + xe_pmu_register(&xe->pmu);
>> +
>> xe_debugfs_register(xe);
>>
>> xe_hwmon_register(xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 16ebb2859877f..58e79e19deaad 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -18,6 +18,7 @@
>> #include "xe_memirq_types.h"
>> #include "xe_oa_types.h"
>> #include "xe_platform_types.h"
>> +#include "xe_pmu_types.h"
>> #include "xe_pt_types.h"
>> #include "xe_sriov_types.h"
>> #include "xe_step_types.h"
>> @@ -514,6 +515,9 @@ struct xe_device {
>> int mode;
>> } wedged;
>>
>> + /** @pmu: performance monitoring unit */
>> + struct xe_pmu pmu;
>> +
>> #ifdef TEST_VM_OPS_ERROR
>> /**
>> * @vm_inject_error_position: inject errors at different places in VM
>> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
>> new file mode 100644
>> index 0000000000000..b63f819c54d02
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pmu.c
>> @@ -0,0 +1,300 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2025 Intel Corporation
>> + */
>> +
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_managed.h>
>> +#include <drm/xe_drm.h>
>> +
>> +#include "regs/xe_gt_regs.h"
>> +#include "xe_device.h"
>> +#include "xe_force_wake.h"
>> +#include "xe_gt_clock.h"
>> +#include "xe_gt_printk.h"
>> +#include "xe_mmio.h"
>> +#include "xe_macros.h"
>> +#include "xe_pm.h"
>> +#include "xe_pmu.h"
>> +
>> +/**
>> + * DOC: Xe PMU (Performance Monitoring Unit)
>> + *
>> + * Expose events/counters like C6 residency and GT frequency to user land.
>
>GT-C6 please... C6 like this will get people confused with CPU C6.
>
>> + * Events will be per device, the GT can be selected with an extra config
>> + * sub-field (bits 60-63).
>> + *
>> + * Perf tool can be used to list these counters from the command line.
>> + *
>> + * Example commands to list/record supported perf events-
>> + *
>> + * $ ls -ld /sys/bus/event_source/devices/xe_*
>> + * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/events/
>> + * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/format/
>> + *
>> + * The format directory has info regarding the configs that can be used.
>> + *
>> + * The standard perf tool can be used to grep for a certain event as well-
>> + * $ perf list | grep c6
>> + *
>> + * To list a specific event for a GT at regular intervals-
>
>Why not ':' here and all the occurrences above?
yeah... I replaced a few I found in Vinay's commit messages. These ones
went unnoticed.
>
>> + * $ perf stat -e <event_name,xe_gt_id=> -I <interval>
>> + */
>> +
>> +#define XE_PMU_EVENT_GT_MASK GENMASK_ULL(63, 60)
>> +#define XE_PMU_EVENT_ID_MASK GENMASK_ULL(11, 0)
>> +
>> +static unsigned int config_to_event_id(u64 config)
>> +{
>> + return FIELD_GET(XE_PMU_EVENT_ID_MASK, config);
>> +}
>> +
>> +static unsigned int config_to_gt_id(u64 config)
>> +{
>> + return FIELD_GET(XE_PMU_EVENT_GT_MASK, config);
>> +}
>> +
>> +static struct xe_gt *event_to_gt(struct perf_event *event)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + u64 gt = config_to_gt_id(event->attr.config);
>> +
>> + return xe_device_get_gt(xe, gt);
>> +}
>> +
>> +static bool event_supported(struct xe_pmu *pmu, unsigned int gt,
>> + unsigned int id)
>> +{
>> + if (gt >= XE_MAX_GT_PER_TILE)
>> + return false;
>> +
>> + return false;
>> +}
>> +
>> +static void xe_pmu_event_destroy(struct perf_event *event)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> +
>> + drm_WARN_ON(&xe->drm, event->parent);
>> + drm_dev_put(&xe->drm);
>> +}
>> +
>> +static int xe_pmu_event_init(struct perf_event *event)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct xe_pmu *pmu = &xe->pmu;
>> + unsigned int id, gt;
>> +
>> + if (!pmu->registered)
>> + return -ENODEV;
>> +
>> + if (event->attr.type != event->pmu->type)
>> + return -ENOENT;
>> +
>> + /* unsupported modes and filters */
>> + if (event->attr.sample_period) /* no sampling */
>> + return -EINVAL;
>> +
>> + if (event->cpu < 0)
>> + return -EINVAL;
>> +
>> + gt = config_to_gt_id(event->attr.config);
>> + id = config_to_event_id(event->attr.config);
>> + if (!event_supported(pmu, gt, id))
>> + return -ENOENT;
>> +
>> + if (has_branch_stack(event))
>> + return -EOPNOTSUPP;
>> +
>> + if (!event->parent) {
>> + drm_dev_get(&xe->drm);
>> + event->destroy = xe_pmu_event_destroy;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static u64 __xe_pmu_event_read(struct perf_event *event)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct xe_gt *gt = event_to_gt(event);
>> + u64 val = 0;
>> +
>> + if (!gt)
>> + return 0;
>> +
>> + return val;
>> +}
>> +
>> +static void xe_pmu_event_read(struct perf_event *event)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct hw_perf_event *hwc = &event->hw;
>> + struct xe_pmu *pmu = &xe->pmu;
>> + u64 prev, new;
>> +
>> + if (!pmu->registered) {
>> + event->hw.state = PERF_HES_STOPPED;
>> + return;
>> + }
>> +
>> + prev = local64_read(&hwc->prev_count);
>> + do {
>> + new = __xe_pmu_event_read(event);
>> + } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
>> +
>> + local64_add(new - prev, &event->count);
>> +}
>> +
>> +static void xe_pmu_enable(struct perf_event *event)
>> +{
>> + /*
>> + * Store the current counter value so we can report the correct delta
>> + * for all listeners. Even when the event was already enabled and has
>> + * an existing non-zero value.
>> + */
>> + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
>> +}
>> +
>> +static void xe_pmu_event_start(struct perf_event *event, int flags)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct xe_pmu *pmu = &xe->pmu;
>> +
>> + if (!pmu->registered)
>> + return;
>> +
>> + xe_pmu_enable(event);
>> + event->hw.state = 0;
>> +}
>> +
>> +static void xe_pmu_event_stop(struct perf_event *event, int flags)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct xe_pmu *pmu = &xe->pmu;
>> +
>> + if (pmu->registered)
>> + if (flags & PERF_EF_UPDATE)
>> + xe_pmu_event_read(event);
>> +
>> + event->hw.state = PERF_HES_STOPPED;
>> +}
>> +
>> +static int xe_pmu_event_add(struct perf_event *event, int flags)
>> +{
>> + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct xe_pmu *pmu = &xe->pmu;
>> +
>> + if (!pmu->registered)
>> + return -ENODEV;
>> +
>> + if (flags & PERF_EF_START)
>> + xe_pmu_event_start(event, flags);
>> +
>> + return 0;
>> +}
>> +
>> +static void xe_pmu_event_del(struct perf_event *event, int flags)
>> +{
>> + xe_pmu_event_stop(event, PERF_EF_UPDATE);
>> +}
>> +
>> +PMU_FORMAT_ATTR(gt, "config:60-63");
>> +PMU_FORMAT_ATTR(event, "config:0-11");
>> +
>> +static struct attribute *pmu_format_attrs[] = {
>> + &format_attr_event.attr,
>> + &format_attr_gt.attr,
>> + NULL,
>> +};
>> +
>> +static const struct attribute_group pmu_format_attr_group = {
>> + .name = "format",
>> + .attrs = pmu_format_attrs,
>> +};
>> +
>> +static struct attribute *pmu_event_attrs[] = {
>> + /* No events yet */
>> + NULL,
>> +};
>> +
>> +static const struct attribute_group pmu_events_attr_group = {
>> + .name = "events",
>> + .attrs = pmu_event_attrs,
>> +};
>> +
>> +/**
>> + * xe_pmu_unregister() - Remove/cleanup PMU registration
>> + * @arg: Ptr to pmu
>> + */
>> +static void xe_pmu_unregister(void *arg)
>> +{
>> + struct xe_pmu *pmu = arg;
>> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> +
>> + if (!pmu->registered)
>> + return;
>> +
>> + pmu->registered = false;
>> +
>> + perf_pmu_unregister(&pmu->base);
>> + kfree(pmu->name);
>> +}
>> +
>> +/**
>> + * xe_pmu_register() - Define basic PMU properties for Xe and add event callbacks.
>> + * @pmu: the PMU object
>> + *
>> + * Returns 0 on success and an appropriate error code otherwise
>> + */
>> +int xe_pmu_register(struct xe_pmu *pmu)
>> +{
>> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> + static const struct attribute_group *attr_groups[] = {
>> + &pmu_format_attr_group,
>> + &pmu_events_attr_group,
>> + NULL
>> + };
>> + int ret = -ENOMEM;
>> + char *name;
>> +
>> + if (IS_SRIOV_VF(xe))
>> + return 0;
>> +
>> + raw_spin_lock_init(&pmu->lock);
>> +
>> + name = kasprintf(GFP_KERNEL, "xe_%s",
>> + dev_name(xe->drm.dev));
>> + if (!name)
>> + goto err;
>> +
>> + /* tools/perf reserves colons as special. */
>> + strreplace(name, ':', '_');
>> +
>> + pmu->name = name;
>> + pmu->base.attr_groups = attr_groups;
>> + pmu->base.scope = PERF_PMU_SCOPE_SYS_WIDE;
>> + pmu->base.module = THIS_MODULE;
>> + pmu->base.task_ctx_nr = perf_invalid_context;
>> + pmu->base.event_init = xe_pmu_event_init;
>> + pmu->base.add = xe_pmu_event_add;
>> + pmu->base.del = xe_pmu_event_del;
>> + pmu->base.start = xe_pmu_event_start;
>> + pmu->base.stop = xe_pmu_event_stop;
>> + pmu->base.read = xe_pmu_event_read;
>> +
>> + ret = perf_pmu_register(&pmu->base, pmu->name, -1);
>> + if (ret)
>> + goto err_name;
>> +
>> + pmu->registered = true;
>> +
>> + return devm_add_action_or_reset(xe->drm.dev, xe_pmu_unregister, pmu);
>> +
>> +err_name:
>> + kfree(name);
>> +err:
>> + drm_err(&xe->drm, "Failed to register PMU (ret=%d)!\n", ret);
>> +
>> + return ret;
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
>> new file mode 100644
>> index 0000000000000..f9dfe77d00cb6
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pmu.h
>> @@ -0,0 +1,20 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2025 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_PMU_H_
>> +#define _XE_PMU_H_
>> +
>> +#include "xe_pmu_types.h"
>> +
>> +struct xe_gt;
>> +
>> +#if IS_ENABLED(CONFIG_PERF_EVENTS)
>> +int xe_pmu_register(struct xe_pmu *pmu);
>> +#else
>> +static inline int xe_pmu_register(struct xe_pmu *pmu) { return 0; }
>> +#endif
>> +
>> +#endif
>> +
>> diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
>> new file mode 100644
>> index 0000000000000..e0cf7169f4fda
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pmu_types.h
>> @@ -0,0 +1,43 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2025 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_PMU_TYPES_H_
>> +#define _XE_PMU_TYPES_H_
>> +
>> +#include <linux/perf_event.h>
>> +#include <linux/spinlock_types.h>
>> +
>> +enum {
>> + __XE_NUM_PMU_SAMPLERS
>> +};
>> +
>> +#define XE_PMU_MAX_GT 2
>> +
>> +/**
>> + * struct xe_pmu - PMU related data per Xe device
>> + *
>> + * Stores per device PMU info that includes event/perf attributes and
>> + * sampling counters across all GTs for this device.
>> + */
>> +struct xe_pmu {
>> + /**
>> + * @base: PMU base.
>> + */
>> + struct pmu base;
>> + /**
>> + * @registered: PMU is registered and not in the unregistering process.
>> + */
>> + bool registered;
>> + /**
>> + * @name: Name as registered with perf core.
>> + */
>> + const char *name;
>> + /**
>> + * @lock: Lock protecting enable mask and ref count handling.
>> + */
>> + raw_spinlock_t lock;
>> +};
>> +
>> +#endif
>> --
>> 2.48.0
>>
>
>With s/C6/GT-C6,
>Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
thanks
Lucas De Marchi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-17 18:44 ` Rodrigo Vivi
@ 2025-01-17 22:40 ` Lucas De Marchi
2025-01-17 22:52 ` Rodrigo Vivi
0 siblings, 1 reply; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-17 22:40 UTC (permalink / raw)
To: Rodrigo Vivi; +Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar
On Fri, Jan 17, 2025 at 01:44:02PM -0500, Rodrigo Vivi wrote:
>On Thu, Jan 16, 2025 at 03:07:18PM -0800, Lucas De Marchi wrote:
>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>
>> Provide a PMU interface for GT C6 residency counters. The implementation
>> is ported over from the i915 PMU code. Residency is provided in units of
>> ms(like sysfs entry in - /sys/class/drm/card0/device/tile0/gt0/gtidle).
>>
>> Sample usage and output:
>>
>> $ perf list | grep gt-c6
>> xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
>>
>> $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
>> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
>> event=0x01
>>
>> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
>> ms
>>
>> $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
>> # time counts unit events
>> 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
>> 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
>>
>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>
>> Besides the rebase, that changed a lot how the event was added,
>> here is a summary of other changes:
>>
>> - Use xe_pm_runtime_get_if_active() when reading
>> xe_gt_idle_residency_msec() as there's not guarantee it will not be
>> suspended anymore by the time it reads the counter
>>
>> - Drop sample[] from the pmu struct and only use the prev/counter from
>> the perf_event struct. This avoids mixing the counter reported to 2
>> separate clients.
>>
>> - Drop time ktime helpers and just use what's provided by
>> include/linux/ktime.h
>>
>> drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
>> 1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
>> index c2af82ec3f793..37df9d3cc110c 100644
>> --- a/drivers/gpu/drm/xe/xe_pmu.c
>> +++ b/drivers/gpu/drm/xe/xe_pmu.c
>> @@ -11,6 +11,7 @@
>> #include "xe_device.h"
>> #include "xe_force_wake.h"
>> #include "xe_gt_clock.h"
>> +#include "xe_gt_idle.h"
>> #include "xe_gt_printk.h"
>> #include "xe_mmio.h"
>> #include "xe_macros.h"
>> @@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
>> return 0;
>> }
>>
>> -static u64 __xe_pmu_event_read(struct perf_event *event)
>> +static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
>> {
>> - struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>> + struct xe_device *xe = gt_to_xe(gt);
>> + unsigned long flags;
>> + ktime_t t0;
>> + s64 delta;
>> +
>> + if (xe_pm_runtime_get_if_active(xe)) {
>> + u64 val = xe_gt_idle_residency_msec(>->gtidle);
>> +
>> + xe_pm_runtime_put(xe);
>> +
>> + return val;
>> + }
>> +
>> + /*
>> + * Estimate the idle residency by looking at the time the device was
>> + * suspended: should be good enough as long as the sampling frequency is
>> + * 2x or more than the suspend frequency.
>> + */
>> + raw_spin_lock_irqsave(&pmu->lock, flags);
>> + t0 = pmu->suspend_timestamp[gt->info.id];
>> + raw_spin_unlock_irqrestore(&pmu->lock, flags);
>
>I believe we should simply avoid the patch 4 and this block and use
>xe_pm_runtime_get(xe) instead of the if_active variant.
the downside is that the GPU is going to be woken up just because we are monitoring it.
>
>I'm afraid this won't be a good estimative since this mix 2 different wakes
>of 2 different IP blocks... the runtime pm is for the entire device, while
>in runtime active the GTs can be sleeping in GT-C6.
in runtime active we can read the counter without the debug tool
(perf) disrupting the device state. The idea here is that when the device
is runtime-pm suspended it also counts as gt-c6, since suspend is a
stronger state: device suspended implies gt-c6, so we count the time since suspend.
>
>or did we get some deadlock risk on that scenario?
I don't think there's a deadlock... there's just the undesired behavior
of "when perf is running, the device is never suspended" - and since we
have plans to integrate perf with tools like gputop/qmassa/etc, it means
it will never suspend "when users are looking at it".
It's definitely simpler, if acceptable: "when looking at gt-c6, the
device will never suspend". We can also start with the simpler one and
split the more complex scenario with a patch on top, too.
>perhaps we should get runtime pm in upper layers when using PMU?
we'd need to do it when enabling/disabling the event. It should be
possible, yes. We can't do it in the read call though, since that would
need to be synchronous.
Lucas De Marchi
>
>> +
>> + delta = ktime_ms_delta(ktime_get(), t0);
>> +
>> + return prev + delta;
>> +}
>> +
>> +static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
>> +{
>> + struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
>> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> struct xe_gt *gt = event_to_gt(event);
>> - u64 val = 0;
>>
>> if (!gt)
>> - return 0;
>> + return prev;
>> +
>> + switch (config_to_event_id(event->attr.config)) {
>> + case XE_PMU_EVENT_GT_C6_RESIDENCY:
>> + return read_gt_c6_residency(pmu, gt, prev);
>> + }
>>
>> - return val;
>> + return prev;
>> }
>>
>> static void xe_pmu_event_update(struct perf_event *event)
>> @@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
>>
>> prev = local64_read(&hwc->prev_count);
>> do {
>> - new = __xe_pmu_event_read(event);
>> + new = __xe_pmu_event_read(event, prev);
>> } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
>>
>> - local64_add(new - prev, &event->count);
>> + if (new > prev)
>> + local64_add(new - prev, &event->count);
>> }
>>
>> static void xe_pmu_event_read(struct perf_event *event)
>> @@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
>> * for all listeners. Even when the event was already enabled and has
>> * an existing non-zero value.
>> */
>> - local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
>> + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
>> }
>>
>> static void xe_pmu_event_start(struct perf_event *event, int flags)
>> @@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
>>
>> static void set_supported_events(struct xe_pmu *pmu)
>> {
>> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> +
>> + if (!xe->info.skip_guc_pc)
>> + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
>> }
>>
>> /**
>> --
>> 2.48.0
>>
* Re: [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-17 22:40 ` Lucas De Marchi
@ 2025-01-17 22:52 ` Rodrigo Vivi
0 siblings, 0 replies; 17+ messages in thread
From: Rodrigo Vivi @ 2025-01-17 22:52 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar
On Fri, Jan 17, 2025 at 04:40:25PM -0600, Lucas De Marchi wrote:
> On Fri, Jan 17, 2025 at 01:44:02PM -0500, Rodrigo Vivi wrote:
> > On Thu, Jan 16, 2025 at 03:07:18PM -0800, Lucas De Marchi wrote:
> > > From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> > >
> > > Provide a PMU interface for GT C6 residency counters. The implementation
> > > is ported over from the i915 PMU code. Residency is provided in units of
> > > ms(like sysfs entry in - /sys/class/drm/card0/device/tile0/gt0/gtidle).
> > >
> > > Sample usage and output:
> > >
> > > $ perf list | grep gt-c6
> > > xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
> > >
> > > $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
> > > ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
> > > event=0x01
> > >
> > > ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
> > > ms
> > >
> > > $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
> > > # time counts unit events
> > > 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
> > > 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
> > >
> > > Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> > > ---
> > >
> > > Besides the rebase, that changed a lot how the event was added,
> > > here is a summary of other changes:
> > >
> > > - Use xe_pm_runtime_get_if_active() when reading
> > > xe_gt_idle_residency_msec() as there's not guarantee it will not be
> > > suspended anymore by the time it reads the counter
> > >
> > > - Drop sample[] from the pmu struct and only use the prev/counter from
> > > the perf_event struct. This avoids mixing the counter reported to 2
> > > separate clients.
> > >
> > > - Drop time ktime helpers and just use what's provided by
> > > include/linux/ktime.h
> > >
> > > drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
> > > 1 file changed, 48 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
> > > index c2af82ec3f793..37df9d3cc110c 100644
> > > --- a/drivers/gpu/drm/xe/xe_pmu.c
> > > +++ b/drivers/gpu/drm/xe/xe_pmu.c
> > > @@ -11,6 +11,7 @@
> > > #include "xe_device.h"
> > > #include "xe_force_wake.h"
> > > #include "xe_gt_clock.h"
> > > +#include "xe_gt_idle.h"
> > > #include "xe_gt_printk.h"
> > > #include "xe_mmio.h"
> > > #include "xe_macros.h"
> > > @@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
> > > return 0;
> > > }
> > >
> > > -static u64 __xe_pmu_event_read(struct perf_event *event)
> > > +static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
> > > {
> > > - struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> > > + struct xe_device *xe = gt_to_xe(gt);
> > > + unsigned long flags;
> > > + ktime_t t0;
> > > + s64 delta;
> > > +
> > > + if (xe_pm_runtime_get_if_active(xe)) {
> > > + u64 val = xe_gt_idle_residency_msec(>->gtidle);
> > > +
> > > + xe_pm_runtime_put(xe);
> > > +
> > > + return val;
> > > + }
> > > +
> > > + /*
> > > + * Estimate the idle residency by looking at the time the device was
> > > + * suspended: should be good enough as long as the sampling frequency is
> > > + * 2x or more than the suspend frequency.
> > > + */
> > > + raw_spin_lock_irqsave(&pmu->lock, flags);
> > > + t0 = pmu->suspend_timestamp[gt->info.id];
> > > + raw_spin_unlock_irqrestore(&pmu->lock, flags);
> >
> > I believe we should simply avoid the patch 4 and this block and use
> > xe_pm_runtime_get(xe) instead of the if_active variant.
>
> downside is that gpu is going to be awaken because we are monitoring it.
>
> >
> > I'm afraid this won't be a good estimative since this mix 2 different wakes
> > of 2 different IP blocks... the runtime pm is for the entire device, while
> > in runtime active the GTs can be sleeping in GT-C6.
>
> in runtime active we can read the counter without making the debug tool
> (perf) disrupt the device state. Idea here is that when it's on runtime
> pm suspended it also counts as gt-c6 since it's stronger than that.
> device suspended implies gt-6, so we count that since suspended.
hmm okay, that supersedes it indeed. But then it would be good if patch 4
saved the timestamp per device and not per GT, and we made this clear
with a comment there to avoid confusion...
>
> >
> > or did we get some deadlock risk on that scenario?
>
> I don't think there's a deadlock... there's just the undesired behavior
> of "when perf is running, the device is never suspended" - and since we
> have plans to integrate perf with tools like gputop/qmassa/etc, it means
> it will never suspend "when users are looking at it".
>
> It's definitely simpler, if acceptable: "when looking at gt-c6, the
> device will never suspend". We can also start with the simpler one and
> split the more complex scenario with a patch on top, too.
my baseline assumption was that if you are looking at perf it is because
you are doing something with the device, so it is not suspended in general...
But well, there's also the overall power consumption aspect, which would
be higher while looking at the device, and that indeed wouldn't be good
in the tools' charts...
so, okay, let's try to save power and get some good charts ;)
>
> > perhaps we should get runtime pm in upper layers when using PMU?
>
> we'd need to do it when enabling/disabling the event. It should be
> possible, yes. We can't do it on the read call though since we'd need to
> make that synchronously.
>
> Lucas De Marchi
>
> >
> > > +
> > > + delta = ktime_ms_delta(ktime_get(), t0);
> > > +
> > > + return prev + delta;
> > > +}
> > > +
> > > +static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
> > > +{
> > > + struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
> > > + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> > > struct xe_gt *gt = event_to_gt(event);
> > > - u64 val = 0;
> > >
> > > if (!gt)
> > > - return 0;
> > > + return prev;
> > > +
> > > + switch (config_to_event_id(event->attr.config)) {
> > > + case XE_PMU_EVENT_GT_C6_RESIDENCY:
> > > + return read_gt_c6_residency(pmu, gt, prev);
> > > + }
> > >
> > > - return val;
> > > + return prev;
> > > }
> > >
> > > static void xe_pmu_event_update(struct perf_event *event)
> > > @@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
> > >
> > > prev = local64_read(&hwc->prev_count);
> > > do {
> > > - new = __xe_pmu_event_read(event);
> > > + new = __xe_pmu_event_read(event, prev);
> > > } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
> > >
> > > - local64_add(new - prev, &event->count);
> > > + if (new > prev)
> > > + local64_add(new - prev, &event->count);
> > > }
> > >
> > > static void xe_pmu_event_read(struct perf_event *event)
> > > @@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
> > > * for all listeners. Even when the event was already enabled and has
> > > * an existing non-zero value.
> > > */
> > > - local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
> > > + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
> > > }
> > >
> > > static void xe_pmu_event_start(struct perf_event *event, int flags)
> > > @@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
> > >
> > > static void set_supported_events(struct xe_pmu *pmu)
> > > {
> > > + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> > > +
> > > + if (!xe->info.skip_guc_pc)
> > > + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
> > > }
> > >
> > > /**
> > > --
> > > 2.48.0
> > >
* Re: [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-16 23:07 ` [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events Lucas De Marchi
2025-01-17 18:44 ` Rodrigo Vivi
@ 2025-01-21 6:16 ` Riana Tauro
2025-01-21 20:38 ` Lucas De Marchi
1 sibling, 1 reply; 17+ messages in thread
From: Riana Tauro @ 2025-01-21 6:16 UTC (permalink / raw)
To: Lucas De Marchi, intel-xe
Cc: Peter Zijlstra, linux-perf-users, Vinay Belgaumkar
Hi Lucas
On 1/17/2025 4:37 AM, Lucas De Marchi wrote:
> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>
> Provide a PMU interface for GT C6 residency counters. The implementation
> is ported over from the i915 PMU code. Residency is provided in units of
> ms(like sysfs entry in - /sys/class/drm/card0/device/tile0/gt0/gtidle).
>
> Sample usage and output:
>
> $ perf list | grep gt-c6
> xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
>
> $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
> event=0x01
>
> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
> ms
>
> $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
> # time counts unit events
> 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
> 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
>
> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>
> Besides the rebase, that changed a lot how the event was added,
> here is a summary of other changes:
>
> - Use xe_pm_runtime_get_if_active() when reading
> xe_gt_idle_residency_msec() as there's not guarantee it will not be
> suspended anymore by the time it reads the counter
>
> - Drop sample[] from the pmu struct and only use the prev/counter from
> the perf_event struct. This avoids mixing the counter reported to 2
> separate clients.
>
> - Drop time ktime helpers and just use what's provided by
> include/linux/ktime.h
>
> drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
> 1 file changed, 48 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
> index c2af82ec3f793..37df9d3cc110c 100644
> --- a/drivers/gpu/drm/xe/xe_pmu.c
> +++ b/drivers/gpu/drm/xe/xe_pmu.c
> @@ -11,6 +11,7 @@
> #include "xe_device.h"
> #include "xe_force_wake.h"
> #include "xe_gt_clock.h"
> +#include "xe_gt_idle.h"
> #include "xe_gt_printk.h"
> #include "xe_mmio.h"
> #include "xe_macros.h"
> @@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
> return 0;
> }
>
> -static u64 __xe_pmu_event_read(struct perf_event *event)
> +static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
> {
> - struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
> + struct xe_device *xe = gt_to_xe(gt);
> + unsigned long flags;
> + ktime_t t0;
> + s64 delta;
> +
> + if (xe_pm_runtime_get_if_active(xe)) {
Seeing this lockdep warn on reading gt-c6-residency
[ 5032.731663] =============================
[ 5032.735699] [ BUG: Invalid wait context ]
[ 5032.745260] -----------------------------
[ 5032.749300] perf/3320 is trying to lock:
[ 5032.753260] ffff888105f2c238 (&dev->power.lock){-.-.}-{3:3}, at:
pm_runtime_get_conditional+0x26/0xb0
[ 5032.762528] other info that might help us debug this:
[ 5032.767613] context-{5:5}
[ 5032.770262] 3 locks held by perf/3320:
[ 5032.774045] #0: ffff88846f632048 (&cpuctx_mutex){+.+.}-{4:4}, at:
perf_event_ctx_lock_nested+0xba/0x230
[ 5032.783587] #1: ffff8881037482c0 (&event->child_mutex){+.+.}-{4:4},
at: perf_event_for_each_child+0x39/0x90
[ 5032.793480] #2: ffff88846f631fb8 (&cpuctx_lock){....}-{2:2}, at:
event_function+0x57/0x120
[ 5032.829907] Call Trace:
[ 5032.832384] <TASK>
[ 5032.834513] dump_stack_lvl+0x81/0xc0
[ 5032.838236] dump_stack+0x10/0x20
[ 5032.841586] __lock_acquire+0xa5a/0x2550
[ 5032.845548] lock_acquire+0xc3/0x2f0
[ 5032.849156] ? pm_runtime_get_conditional+0x26/0xb0
[ 5032.854066] ? is_bpf_text_address+0x71/0x120
[ 5032.858487] _raw_spin_lock_irqsave+0x4b/0x70
[ 5032.862892] ? pm_runtime_get_conditional+0x26/0xb0
[ 5032.867814] pm_runtime_get_conditional+0x26/0xb0
[ 5032.872570] pm_runtime_get_if_active+0x13/0x20
[ 5032.877141] xe_pm_runtime_get_if_active+0x12/0x20 [xe]
[ 5032.882545] __xe_pmu_event_read+0x116/0x2a0 [xe]
Thanks
Riana
> + u64 val = xe_gt_idle_residency_msec(>->gtidle);
> +
> + xe_pm_runtime_put(xe);
> +
> + return val;
> + }
> +
> + /*
> + * Estimate the idle residency by looking at the time the device was
> + * suspended: should be good enough as long as the sampling frequency is
> + * 2x or more than the suspend frequency.
> + */
> + raw_spin_lock_irqsave(&pmu->lock, flags);
> + t0 = pmu->suspend_timestamp[gt->info.id];
> + raw_spin_unlock_irqrestore(&pmu->lock, flags);
> +
> + delta = ktime_ms_delta(ktime_get(), t0);
> +
> + return prev + delta;
> +}
> +
> +static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
> +{
> + struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> struct xe_gt *gt = event_to_gt(event);
> - u64 val = 0;
>
> if (!gt)
> - return 0;
> + return prev;
> +
> + switch (config_to_event_id(event->attr.config)) {
> + case XE_PMU_EVENT_GT_C6_RESIDENCY:
> + return read_gt_c6_residency(pmu, gt, prev);
> + }
>
> - return val;
> + return prev;
> }
>
> static void xe_pmu_event_update(struct perf_event *event)
> @@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
>
> prev = local64_read(&hwc->prev_count);
> do {
> - new = __xe_pmu_event_read(event);
> + new = __xe_pmu_event_read(event, prev);
> } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
>
> - local64_add(new - prev, &event->count);
> + if (new > prev)
> + local64_add(new - prev, &event->count);
> }
>
> static void xe_pmu_event_read(struct perf_event *event)
> @@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
> * for all listeners. Even when the event was already enabled and has
> * an existing non-zero value.
> */
> - local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
> + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
> }
>
> static void xe_pmu_event_start(struct perf_event *event, int flags)
> @@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
>
> static void set_supported_events(struct xe_pmu *pmu)
> {
> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> +
> + if (!xe->info.skip_guc_pc)
> + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
> }
>
> /**
* Re: [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-21 6:16 ` Riana Tauro
@ 2025-01-21 20:38 ` Lucas De Marchi
2025-01-22 5:28 ` Riana Tauro
0 siblings, 1 reply; 17+ messages in thread
From: Lucas De Marchi @ 2025-01-21 20:38 UTC (permalink / raw)
To: Riana Tauro
Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar,
Rodrigo Vivi
On Tue, Jan 21, 2025 at 11:46:45AM +0530, Riana Tauro wrote:
>Hi Lucas
>
>On 1/17/2025 4:37 AM, Lucas De Marchi wrote:
>>From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>
>>Provide a PMU interface for GT C6 residency counters. The implementation
>>is ported over from the i915 PMU code. Residency is provided in units of
>>ms(like sysfs entry in - /sys/class/drm/card0/device/tile0/gt0/gtidle).
>>
>>Sample usage and output:
>>
>> $ perf list | grep gt-c6
>> xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
>>
>> $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
>> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
>> event=0x01
>>
>> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
>> ms
>>
>> $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
>> # time counts unit events
>> 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
>> 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
>>
>>Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>>---
>>
>>Besides the rebase, that changed a lot how the event was added,
>>here is a summary of other changes:
>>
>>- Use xe_pm_runtime_get_if_active() when reading
>> xe_gt_idle_residency_msec() as there's not guarantee it will not be
>> suspended anymore by the time it reads the counter
>>
>>- Drop sample[] from the pmu struct and only use the prev/counter from
>> the perf_event struct. This avoids mixing the counter reported to 2
>> separate clients.
>>
>>- Drop time ktime helpers and just use what's provided by
>> include/linux/ktime.h
>>
>> drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
>> 1 file changed, 48 insertions(+), 8 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
>>index c2af82ec3f793..37df9d3cc110c 100644
>>--- a/drivers/gpu/drm/xe/xe_pmu.c
>>+++ b/drivers/gpu/drm/xe/xe_pmu.c
>>@@ -11,6 +11,7 @@
>> #include "xe_device.h"
>> #include "xe_force_wake.h"
>> #include "xe_gt_clock.h"
>>+#include "xe_gt_idle.h"
>> #include "xe_gt_printk.h"
>> #include "xe_mmio.h"
>> #include "xe_macros.h"
>>@@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
>> return 0;
>> }
>>-static u64 __xe_pmu_event_read(struct perf_event *event)
>>+static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
>> {
>>- struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>>+ struct xe_device *xe = gt_to_xe(gt);
>>+ unsigned long flags;
>>+ ktime_t t0;
>>+ s64 delta;
>>+
>>+ if (xe_pm_runtime_get_if_active(xe)) {
>Seeing this lockdep warn on reading gt-c6-residency
>
>[ 5032.731663] =============================
>[ 5032.735699] [ BUG: Invalid wait context ]
>[ 5032.745260] -----------------------------
>[ 5032.749300] perf/3320 is trying to lock:
>[ 5032.753260] ffff888105f2c238 (&dev->power.lock){-.-.}-{3:3}, at:
>pm_runtime_get_conditional+0x26/0xb0
ugh... thanks. Currently we are papering over that in drm-tip since
i915's PMU is completely broken. I need to remember to revert that hack
from topic/core-for-CI to test things properly so xe doesn't meet the
same fate.
here dev->power.lock is a spinlock, but ....
>[ 5032.762528] other info that might help us debug this:
>[ 5032.767613] context-{5:5}
>[ 5032.770262] 3 locks held by perf/3320:
>[ 5032.774045] #0: ffff88846f632048 (&cpuctx_mutex){+.+.}-{4:4}, at:
>perf_event_ctx_lock_nested+0xba/0x230
>[ 5032.783587] #1: ffff8881037482c0
>(&event->child_mutex){+.+.}-{4:4}, at:
>perf_event_for_each_child+0x39/0x90
>[ 5032.793480] #2: ffff88846f631fb8 (&cpuctx_lock){....}-{2:2}, at:
>event_function+0x57/0x120
we are already holding a raw_spinlock.
The previous check for xe_pm_runtime_suspended() would get rid of the
warning since it doesn't take dev->power.lock, but it would also be
racy. I think using the simplified interface Rodrigo proposed would be
ok: just take the runtime pm reference when creating the event and
release it when closing. That brings back the issue of releasing an
event after the device unbinds, but hopefully that will be fixed soon
by Peter's patches to perf_pmu_unregister().
Thanks for checking.
Lucas De Marchi
>
>
>[ 5032.829907] Call Trace:
>[ 5032.832384] <TASK>
>[ 5032.834513] dump_stack_lvl+0x81/0xc0
>[ 5032.838236] dump_stack+0x10/0x20
>[ 5032.841586] __lock_acquire+0xa5a/0x2550
>[ 5032.845548] lock_acquire+0xc3/0x2f0
>[ 5032.849156] ? pm_runtime_get_conditional+0x26/0xb0
>[ 5032.854066] ? is_bpf_text_address+0x71/0x120
>[ 5032.858487] _raw_spin_lock_irqsave+0x4b/0x70
>[ 5032.862892] ? pm_runtime_get_conditional+0x26/0xb0
>[ 5032.867814] pm_runtime_get_conditional+0x26/0xb0
>[ 5032.872570] pm_runtime_get_if_active+0x13/0x20
>[ 5032.877141] xe_pm_runtime_get_if_active+0x12/0x20 [xe]
>[ 5032.882545] __xe_pmu_event_read+0x116/0x2a0 [xe]
>
>Thanks
>Riana
>
>>+ u64 val = xe_gt_idle_residency_msec(>->gtidle);
>>+
>>+ xe_pm_runtime_put(xe);
>>+
>>+ return val;
>>+ }
>>+
>>+ /*
>>+ * Estimate the idle residency by looking at the time the device was
>>+ * suspended: should be good enough as long as the sampling frequency is
>>+ * 2x or more than the suspend frequency.
>>+ */
>>+ raw_spin_lock_irqsave(&pmu->lock, flags);
>>+ t0 = pmu->suspend_timestamp[gt->info.id];
>>+ raw_spin_unlock_irqrestore(&pmu->lock, flags);
>>+
>>+ delta = ktime_ms_delta(ktime_get(), t0);
>>+
>>+ return prev + delta;
>>+}
>>+
>>+static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
>>+{
>>+ struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
>>+ struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> struct xe_gt *gt = event_to_gt(event);
>>- u64 val = 0;
>> if (!gt)
>>- return 0;
>>+ return prev;
>>+
>>+ switch (config_to_event_id(event->attr.config)) {
>>+ case XE_PMU_EVENT_GT_C6_RESIDENCY:
>>+ return read_gt_c6_residency(pmu, gt, prev);
>>+ }
>>- return val;
>>+ return prev;
>> }
>> static void xe_pmu_event_update(struct perf_event *event)
>>@@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
>> prev = local64_read(&hwc->prev_count);
>> do {
>>- new = __xe_pmu_event_read(event);
>>+ new = __xe_pmu_event_read(event, prev);
>> } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
>>- local64_add(new - prev, &event->count);
>>+ if (new > prev)
>>+ local64_add(new - prev, &event->count);
>> }
>> static void xe_pmu_event_read(struct perf_event *event)
>>@@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
>> * for all listeners. Even when the event was already enabled and has
>> * an existing non-zero value.
>> */
>>- local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
>>+ local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
>> }
>> static void xe_pmu_event_start(struct perf_event *event, int flags)
>>@@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
>> static void set_supported_events(struct xe_pmu *pmu)
>> {
>>+ struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>>+
>>+ if (!xe->info.skip_guc_pc)
>>+ pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
>> }
>> /**
>
* Re: [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events
2025-01-21 20:38 ` Lucas De Marchi
@ 2025-01-22 5:28 ` Riana Tauro
0 siblings, 0 replies; 17+ messages in thread
From: Riana Tauro @ 2025-01-22 5:28 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Peter Zijlstra, linux-perf-users, Vinay Belgaumkar,
Rodrigo Vivi, Gupta, Anshuman
On 1/22/2025 2:08 AM, Lucas De Marchi wrote:
> On Tue, Jan 21, 2025 at 11:46:45AM +0530, Riana Tauro wrote:
>> Hi Lucas
>>
>> On 1/17/2025 4:37 AM, Lucas De Marchi wrote:
>>> From: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>>
>>> Provide a PMU interface for GT C6 residency counters. The implementation
>>> is ported over from the i915 PMU code. Residency is reported in units of
>>> ms (like the sysfs entry /sys/class/drm/card0/device/tile0/gt0/gtidle).
>>>
>>> Sample usage and output:
>>>
>>> $ perf list | grep gt-c6
>>>   xe_0000_00_02.0/gt-c6-residency/                   [Kernel PMU event]
>>>
>>> $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
>>> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
>>> event=0x01
>>>
>>> ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
>>> ms
>>>
>>> $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
>>> #           time             counts unit events
>>>      1.001196056              1,001 ms   xe_0000_00_02.0/gt-c6-residency,gt=0/
>>>      2.005216219              1,003 ms   xe_0000_00_02.0/gt-c6-residency,gt=0/
>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>>> ---
>>>
>>> Besides the rebase, that changed a lot how the event was added,
>>> here is a summary of other changes:
>>>
>>> - Use xe_pm_runtime_get_if_active() when reading
>>>   xe_gt_idle_residency_msec() as there's no guarantee it will not
>>>   already be suspended by the time the counter is read
>>>
>>> - Drop sample[] from the pmu struct and only use the prev/counter from
>>>   the perf_event struct. This avoids mixing the counters reported to
>>>   two separate clients.
>>>
>>> - Drop the ktime helpers and just use what's provided by
>>>   include/linux/ktime.h
>>>
>>> drivers/gpu/drm/xe/xe_pmu.c | 56 +++++++++++++++++++++++++++++++------
>>> 1 file changed, 48 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
>>> index c2af82ec3f793..37df9d3cc110c 100644
>>> --- a/drivers/gpu/drm/xe/xe_pmu.c
>>> +++ b/drivers/gpu/drm/xe/xe_pmu.c
>>> @@ -11,6 +11,7 @@
>>> #include "xe_device.h"
>>> #include "xe_force_wake.h"
>>> #include "xe_gt_clock.h"
>>> +#include "xe_gt_idle.h"
>>> #include "xe_gt_printk.h"
>>> #include "xe_mmio.h"
>>> #include "xe_macros.h"
>>> @@ -117,16 +118,50 @@ static int xe_pmu_event_init(struct perf_event *event)
>>> return 0;
>>> }
>>> -static u64 __xe_pmu_event_read(struct perf_event *event)
>>> +static u64 read_gt_c6_residency(struct xe_pmu *pmu, struct xe_gt *gt, u64 prev)
>>> {
>>> - struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
>>> + struct xe_device *xe = gt_to_xe(gt);
>>> + unsigned long flags;
>>> + ktime_t t0;
>>> + s64 delta;
>>> +
>>> + if (xe_pm_runtime_get_if_active(xe)) {
>> I'm seeing this lockdep warning when reading gt-c6-residency:
>>
>> [ 5032.731663] =============================
>> [ 5032.735699] [ BUG: Invalid wait context ]
>> [ 5032.745260] -----------------------------
>> [ 5032.749300] perf/3320 is trying to lock:
>> [ 5032.753260] ffff888105f2c238 (&dev->power.lock){-.-.}-{3:3}, at: pm_runtime_get_conditional+0x26/0xb0
>
> ugh... thanks. Currently we are papering over that in drm-tip since
> i915's pmu is completely broken. I need to remember to revert that hack
> from topic/core-for-CI so I can test things properly and xe doesn't
> suffer the same fate.
>
> here dev->power.lock is a spinlock, but ....
>
>
>> [ 5032.762528] other info that might help us debug this:
>> [ 5032.767613] context-{5:5}
>> [ 5032.770262] 3 locks held by perf/3320:
>> [ 5032.774045] #0: ffff88846f632048 (&cpuctx_mutex){+.+.}-{4:4}, at: perf_event_ctx_lock_nested+0xba/0x230
>> [ 5032.783587] #1: ffff8881037482c0 (&event->child_mutex){+.+.}-{4:4}, at: perf_event_for_each_child+0x39/0x90
>> [ 5032.793480] #2: ffff88846f631fb8 (&cpuctx_lock){....}-{2:2}, at: event_function+0x57/0x120
>
> we are already holding a raw_spinlock.
Yeah, and it's seen after 6.13. That's why xe_pm_runtime_suspended() was
used, as it was not throwing any warning.

I'm seeing a similar one for force_wake too, for engine activity;
fw->lock is a spinlock. Any suggestions to avoid this without changing
it to a raw_spinlock?
[ 465.380652] ffff88810d5b8098 (&fw->lock){....}-{3:3}, at: xe_force_wake_get+0x1f9/0x8c0 [xe]
[ 465.389168] other info that might help us debug this:
[ 465.394221] context-{5:5}
[ 465.396847] 1 lock held by swapper/0/0:
[ 465.400682] #0: ffff88885f031fb8 (&cpuctx_lock){....}-{2:2}, at: __perf_event_read+0x60/0x230
For runtime_get, Rodrigo's suggestion would work, but taking force_wake
on event start and stop would cause power issues.
Thanks
Riana
>
> The previous check for xe_pm_runtime_suspended() would get rid of the
> warning since it doesn't take dev->power.lock, but it would also be
> racy. I think using the simplified interface Rodrigo proposed would be
> ok: just take the runtime PM reference when creating the event and
> release it when closing. It brings back the issue of releasing an
> event after the device unbinds, but hopefully that will be fixed soon
> by Peter's patches to perf_pmu_unregister().
>
> Thanks for checking.
>
> Lucas De Marchi
>
>>
>>
>> [ 5032.829907] Call Trace:
>> [ 5032.832384] <TASK>
>> [ 5032.834513] dump_stack_lvl+0x81/0xc0
>> [ 5032.838236] dump_stack+0x10/0x20
>> [ 5032.841586] __lock_acquire+0xa5a/0x2550
>> [ 5032.845548] lock_acquire+0xc3/0x2f0
>> [ 5032.849156] ? pm_runtime_get_conditional+0x26/0xb0
>> [ 5032.854066] ? is_bpf_text_address+0x71/0x120
>> [ 5032.858487] _raw_spin_lock_irqsave+0x4b/0x70
>> [ 5032.862892] ? pm_runtime_get_conditional+0x26/0xb0
>> [ 5032.867814] pm_runtime_get_conditional+0x26/0xb0
>> [ 5032.872570] pm_runtime_get_if_active+0x13/0x20
>> [ 5032.877141] xe_pm_runtime_get_if_active+0x12/0x20 [xe]
>> [ 5032.882545] __xe_pmu_event_read+0x116/0x2a0 [xe]
>>
>> Thanks
>> Riana
>>
>>> + u64 val = xe_gt_idle_residency_msec(&gt->gtidle);
>>> +
>>> + xe_pm_runtime_put(xe);
>>> +
>>> + return val;
>>> + }
>>> +
>>> + /*
>>> + * Estimate the idle residency by looking at the time the device was
>>> + * suspended: should be good enough as long as the sampling frequency is
>>> + * 2x or more than the suspend frequency.
>>> + */
>>> + raw_spin_lock_irqsave(&pmu->lock, flags);
>>> + t0 = pmu->suspend_timestamp[gt->info.id];
>>> + raw_spin_unlock_irqrestore(&pmu->lock, flags);
>>> +
>>> + delta = ktime_ms_delta(ktime_get(), t0);
>>> +
>>> + return prev + delta;
>>> +}
>>> +
>>> +static u64 __xe_pmu_event_read(struct perf_event *event, u64 prev)
>>> +{
>>> + struct xe_pmu *pmu = container_of(event->pmu, typeof(*pmu), base);
>>> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>>> struct xe_gt *gt = event_to_gt(event);
>>> - u64 val = 0;
>>> if (!gt)
>>> - return 0;
>>> + return prev;
>>> +
>>> + switch (config_to_event_id(event->attr.config)) {
>>> + case XE_PMU_EVENT_GT_C6_RESIDENCY:
>>> + return read_gt_c6_residency(pmu, gt, prev);
>>> + }
>>> - return val;
>>> + return prev;
>>> }
>>> static void xe_pmu_event_update(struct perf_event *event)
>>> @@ -136,10 +171,11 @@ static void xe_pmu_event_update(struct perf_event *event)
>>> prev = local64_read(&hwc->prev_count);
>>> do {
>>> - new = __xe_pmu_event_read(event);
>>> + new = __xe_pmu_event_read(event, prev);
>>> } while (!local64_try_cmpxchg(&hwc->prev_count, &prev, new));
>>> - local64_add(new - prev, &event->count);
>>> + if (new > prev)
>>> + local64_add(new - prev, &event->count);
>>> }
>>> static void xe_pmu_event_read(struct perf_event *event)
>>> @@ -162,7 +198,7 @@ static void xe_pmu_enable(struct perf_event *event)
>>> * for all listeners. Even when the event was already enabled and has
>>> * an existing non-zero value.
>>> */
>>> - local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
>>> + local64_set(&event->hw.prev_count, __xe_pmu_event_read(event, 0));
>>> }
>>> static void xe_pmu_event_start(struct perf_event *event, int flags)
>>> @@ -267,6 +303,10 @@ static const struct attribute_group pmu_events_attr_group = {
>>> static void set_supported_events(struct xe_pmu *pmu)
>>> {
>>> + struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>>> +
>>> + if (!xe->info.skip_guc_pc)
>>> + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY);
>>> }
>>> /**
>>
end of thread, other threads:[~2025-01-22 5:28 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-16 23:07 [PATCH v13 0/7] drm/xe/pmu: PMU interface for Xe Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 1/7] perf/core: Add PMU_EVENT_ATTR_ID_STRING Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 2/7] drm/xe/pmu: Enable PMU interface Lucas De Marchi
2025-01-17 18:30 ` Rodrigo Vivi
2025-01-17 22:19 ` Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 3/7] drm/xe/pmu: Assert max gt Lucas De Marchi
2025-01-17 18:31 ` Rodrigo Vivi
2025-01-16 23:07 ` [PATCH v13 4/7] drm/xe/pmu: Hook up gt suspend notification Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 5/7] drm/xe/pmu: Extract xe_pmu_event_update() Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 6/7] drm/xe/pmu: Add attribute skeleton Lucas De Marchi
2025-01-16 23:07 ` [PATCH v13 7/7] drm/xe/pmu: Add GT C6 events Lucas De Marchi
2025-01-17 18:44 ` Rodrigo Vivi
2025-01-17 22:40 ` Lucas De Marchi
2025-01-17 22:52 ` Rodrigo Vivi
2025-01-21 6:16 ` Riana Tauro
2025-01-21 20:38 ` Lucas De Marchi
2025-01-22 5:28 ` Riana Tauro