Intel-XE Archive on lore.kernel.org
* [PATCH v7 00/17] Add OA functionality to Xe
@ 2023-12-08  6:43 Ashutosh Dixit
  2023-12-08  6:43 ` [PATCH 01/17] drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types Ashutosh Dixit
                   ` (17 more replies)
  0 siblings, 18 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Here's the latest version of the Xe OA patches, this time with Xe2/LNL
support and some more uapi updates. The uapi updates include:
* The "Xe way" of using chained xe_user_extension structs to specify
  properties
* A new query for OA unit properties
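For reviewers new to the scheme, here is a minimal sketch of how chained
xe_user_extension structs work (the struct layout mirrors xe_user_extension
in xe_drm.h; the walker is illustrative, not the kernel's actual parser):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct xe_user_extension from xe_drm.h */
struct xe_user_extension {
	uint64_t next_extension;	/* user pointer to the next extension; 0 ends the chain */
	uint32_t name;			/* extension type id */
	uint32_t pad;
};

/* Walk a chain and count its entries, as an extension parser conceptually would */
static int count_extensions(uint64_t head)
{
	int n = 0;

	while (head) {
		const struct xe_user_extension *ext =
			(const struct xe_user_extension *)(uintptr_t)head;

		n++;
		head = ext->next_extension;
	}
	return n;
}
```

Userspace builds such a chain by pointing each struct's next_extension at
the next one and passing the head pointer in the ioctl argument.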

That said, some uapi pieces are still not done. These include:
* Previously proposed sync/fences scheme
* Single header per read, rather than header per report
* Other properties, such as OA buffer size and hold preemption, which are
  missing but will be added incrementally in the future

Most code review comments have been addressed, but some have not. These
include:
* Cleanup of verifying OA configs during ADD_CONFIG perf op
* Optimizing the OA buffer use case when only OAR/OAC is needed

Also, the patches have been completely redone and no longer start from the
i915 uapi, so they will need to be reviewed anew. Please review with
emphasis on the uapi.

This series is also available at:
        https://gitlab.freedesktop.org/adixit/kernel/-/tree/xe-oa

The series has been tested against this IGT series:
        https://gitlab.freedesktop.org/adixit/igt-gpu-tools/-/tree/xe-oa

v2: Fix build
v3: Rebase, due to s/xe_engine/xe_exec_queue/
v4: Re-run for testing
v5: Address review comments, new patches 11 through 17
v6: New patches 18 through 21
v7: Patches are completely redone and don't start with i915 version of the uapi

Ashutosh Dixit (17):
  drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream
    types
  drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  drm/xe/oa/uapi: Add oa_max_sample_rate sysctl
  drm/xe/oa/uapi: Add OA data formats
  drm/xe/oa/uapi: Initialize OA units
  drm/xe/oa/uapi: Add/remove OA config perf ops
  drm/xe/oa/uapi: Define and parse OA stream properties
  drm/xe/oa: OA stream initialization (OAG)
  drm/xe/oa/uapi: Expose OA stream fd
  drm/xe/oa/uapi: Read file_operation
  drm/xe/oa: Disable overrun mode for Xe2+ OAG
  drm/xe/oa: Add OAR support
  drm/xe/oa: Add OAC support
  drm/xe/oa/uapi: Query OA unit properties
  drm/xe/oa/uapi: OA buffer mmap
  drm/xe/oa: Add MMIO trigger support
  drm/xe/oa: Override GuC RC with OA on PVC

 drivers/gpu/drm/xe/Makefile                   |    2 +
 .../gpu/drm/xe/instructions/xe_mi_commands.h  |    3 +
 drivers/gpu/drm/xe/regs/xe_engine_regs.h      |    4 +-
 drivers/gpu/drm/xe/regs/xe_gt_regs.h          |    3 +
 drivers/gpu/drm/xe/regs/xe_oa_regs.h          |  107 +
 drivers/gpu/drm/xe/xe_device.c                |   13 +
 drivers/gpu/drm/xe/xe_device_types.h          |    4 +
 drivers/gpu/drm/xe/xe_gt_types.h              |    4 +
 drivers/gpu/drm/xe/xe_guc_pc.c                |   60 +
 drivers/gpu/drm/xe/xe_guc_pc.h                |    3 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h       |    2 +
 drivers/gpu/drm/xe/xe_lrc.c                   |   11 +-
 drivers/gpu/drm/xe/xe_lrc.h                   |    1 +
 drivers/gpu/drm/xe/xe_module.c                |   10 +
 drivers/gpu/drm/xe/xe_oa.c                    | 2506 +++++++++++++++++
 drivers/gpu/drm/xe/xe_oa.h                    |   31 +
 drivers/gpu/drm/xe/xe_oa_types.h              |  233 ++
 drivers/gpu/drm/xe/xe_perf.c                  |   67 +
 drivers/gpu/drm/xe/xe_perf.h                  |   20 +
 drivers/gpu/drm/xe/xe_query.c                 |   81 +
 drivers/gpu/drm/xe/xe_reg_whitelist.c         |   23 +
 include/uapi/drm/xe_drm.h                     |  278 ++
 22 files changed, 3460 insertions(+), 6 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_oa_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_oa.c
 create mode 100644 drivers/gpu/drm/xe/xe_oa.h
 create mode 100644 drivers/gpu/drm/xe/xe_oa_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_perf.c
 create mode 100644 drivers/gpu/drm/xe/xe_perf.h

-- 
2.41.0



* [PATCH 01/17] drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-08  6:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

In Xe, the plan is to support multiple types of perf counter streams (OA is
only one such stream type). Rather than introduce NxM ioctls for these
(N perf stream types with M ioctls per stream type), we multiplex the N
stream types and the M ops for each of them through a single PERF
ioctl. This multiplexing is the purpose of the PERF layer.

In addition to the PERF DRM ioctl, another set of ioctls on the PERF fd is
defined. These are expected to be common to the different PERF stream types
and are therefore defined at the PERF layer itself.
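As a rough illustration of the multiplexing, a caller of DRM_IOCTL_XE_PERF
packages each op along these lines (the struct layout is taken from this
patch; the stream type value and helper are hypothetical, since no stream
types are defined yet):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct drm_xe_perf_param from this patch */
struct drm_xe_perf_param {
	uint64_t extensions;	/* pointer to the first extension struct, if any */
	uint64_t perf_type;	/* enum drm_xe_perf_type */
	uint64_t perf_op;	/* enum drm_xe_perf_op */
	uint64_t param;		/* pointer to the op-specific params */
};

enum drm_xe_perf_op {
	DRM_XE_PERF_OP_STREAM_OPEN,
	DRM_XE_PERF_OP_ADD_CONFIG,
	DRM_XE_PERF_OP_REMOVE_CONFIG,
};

/* Hypothetical stream type: none are defined yet in this patch */
#define EXAMPLE_PERF_TYPE 0

/* Package one op the way a caller of DRM_IOCTL_XE_PERF would */
static struct drm_xe_perf_param make_perf_arg(uint64_t type, uint64_t op, void *params)
{
	struct drm_xe_perf_param arg = {
		.perf_type = type,
		.perf_op = op,
		.param = (uint64_t)(uintptr_t)params,
	};

	return arg;
}
```

The kernel side then switches on perf_type and perf_op and copies the
op-specific params from the param pointer.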

v2: Add param_size to 'struct drm_xe_perf_param' (Umesh)
v3: Rename 'enum drm_xe_perf_ops' to
    'enum drm_xe_perf_ioctls' (Guy Zadicario)
    Add DRM_ prefix to ioctl names to indicate uapi names
v4: Add 'enum drm_xe_perf_op' previously missed out (Guy Zadicario)
v5: Squash the ops and PERF layer patches into a single patch (Umesh)
    Remove param_size from struct 'drm_xe_perf_param' (Umesh)

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Guy Zadicario <gzadicario@habana.ai>
---
 drivers/gpu/drm/xe/Makefile    |  1 +
 drivers/gpu/drm/xe/xe_device.c |  2 ++
 drivers/gpu/drm/xe/xe_perf.c   | 21 +++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   | 16 ++++++++++
 include/uapi/drm/xe_drm.h      | 56 ++++++++++++++++++++++++++++++++++
 5 files changed, 96 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_perf.c
 create mode 100644 drivers/gpu/drm/xe/xe_perf.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 87f3fca0c0ee7..b719953d9d30f 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -101,6 +101,7 @@ xe-y += xe_bb.o \
 	xe_pat.o \
 	xe_pci.o \
 	xe_pcode.o \
+	xe_perf.o \
 	xe_pm.o \
 	xe_preempt_fence.o \
 	xe_pt.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 8423c817111bf..35616d1a81a31 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -31,6 +31,7 @@
 #include "xe_module.h"
 #include "xe_pat.h"
 #include "xe_pcode.h"
+#include "xe_perf.h"
 #include "xe_pm.h"
 #include "xe_query.h"
 #include "xe_tile.h"
@@ -126,6 +127,7 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_PERF, xe_perf_ioctl, DRM_RENDER_ALLOW),
 };
 
 static const struct file_operations xe_driver_fops = {
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
new file mode 100644
index 0000000000000..a130076b59aa2
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/errno.h>
+
+#include "xe_perf.h"
+
+int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct drm_xe_perf_param *arg = data;
+
+	if (arg->extensions)
+		return -EINVAL;
+
+	switch (arg->perf_type) {
+	default:
+		return -EINVAL;
+	}
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
new file mode 100644
index 0000000000000..254cc7cf49fef
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_PERF_H_
+#define _XE_PERF_H_
+
+#include <drm/xe_drm.h>
+
+struct drm_device;
+struct drm_file;
+
+int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+
+#endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index eb03a49c17a13..3539e0781d700 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -108,6 +108,8 @@ struct xe_user_extension {
 #define DRM_XE_EXEC_QUEUE_GET_PROPERTY	0x08
 #define DRM_XE_EXEC			0x09
 #define DRM_XE_WAIT_USER_FENCE		0x0a
+#define DRM_XE_PERF			0x0f
+
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_XE_DEVICE_QUERY		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_DEVICE_QUERY, struct drm_xe_device_query)
@@ -122,6 +124,7 @@ struct xe_user_extension {
 #define DRM_IOCTL_XE_EXEC_QUEUE_GET_PROPERTY	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_GET_PROPERTY, struct drm_xe_exec_queue_get_property)
 #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
+#define DRM_IOCTL_XE_PERF			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_PERF, struct drm_xe_perf_param)
 
 /**
  * struct drm_xe_engine_class_instance - instance of an engine class
@@ -1119,6 +1122,59 @@ struct drm_xe_wait_user_fence {
 #define DRM_XE_PMU_MEDIA_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 2)
 #define DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 3)
 
+/**
+ * enum drm_xe_perf_type - Perf stream types
+ */
+enum drm_xe_perf_type {
+	DRM_XE_PERF_TYPE_MAX,
+};
+
+/**
+ * enum drm_xe_perf_op - Perf stream ops
+ */
+enum drm_xe_perf_op {
+	/** @DRM_XE_PERF_OP_STREAM_OPEN: Open a perf counter stream */
+	DRM_XE_PERF_OP_STREAM_OPEN,
+
+	/** @DRM_XE_PERF_OP_ADD_CONFIG: Add perf stream config */
+	DRM_XE_PERF_OP_ADD_CONFIG,
+
+	/** @DRM_XE_PERF_OP_REMOVE_CONFIG: Remove perf stream config */
+	DRM_XE_PERF_OP_REMOVE_CONFIG,
+};
+
+/**
+ * struct drm_xe_perf_param - Perf layer param
+ *
+ * The perf layer enables multiplexing perf counter streams of multiple
+ * types. The actual params for a particular stream operation are supplied
+ * via the @param pointer (use __copy_from_user to get these params).
+ */
+struct drm_xe_perf_param {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+	/** @perf_type: Perf stream type, of enum @drm_xe_perf_type */
+	__u64 perf_type;
+	/** @perf_op: Perf op, of enum @drm_xe_perf_op */
+	__u64 perf_op;
+	/** @param: Pointer to actual stream params */
+	__u64 param;
+};
+
+/**
+ * enum drm_xe_perf_ioctls - Perf fd ioctl's
+ */
+enum drm_xe_perf_ioctls {
+	/** @DRM_XE_PERF_IOCTL_ENABLE: Enable data capture for a stream */
+	DRM_XE_PERF_IOCTL_ENABLE = _IO('i', 0x0),
+
+	/** @DRM_XE_PERF_IOCTL_DISABLE: Disable data capture for a stream */
+	DRM_XE_PERF_IOCTL_DISABLE = _IO('i', 0x1),
+
+	/** @DRM_XE_PERF_IOCTL_CONFIG: Change stream configuration */
+	DRM_XE_PERF_IOCTL_CONFIG = _IO('i', 0x2),
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.41.0



* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
  2023-12-08  6:43 ` [PATCH 01/17] drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-14  0:57   ` Umesh Nerlige Ramappa
                     ` (2 more replies)
  2023-12-08  6:43 ` [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl Ashutosh Dixit
                   ` (15 subsequent siblings)
  17 siblings, 3 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However, the
superuser can set the perf_stream_paranoid sysctl to 0 to allow
non-privileged users to access perf data as well. perf_stream_paranoid is
introduced at the perf layer so that different perf stream types can share
this access mechanism.
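The access check this sysctl feeds into is not part of this patch;
conceptually it behaves like the following sketch (modeled on i915's
perf_stream_paranoid handling; the helper name is illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* sysctl-controlled; defaults to paranoid (1), matching the patch */
static unsigned int xe_perf_stream_paranoid = 1;

/*
 * Sketch of the gate: with paranoid mode on, only a CAP_SYS_ADMIN
 * caller may open a stream; with it off, any user may.
 */
static int perf_stream_access_check(bool capable_sys_admin)
{
	if (xe_perf_stream_paranoid && !capable_sys_admin)
		return -EACCES;

	return 0;
}
```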

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 51bf69b7ab222..8629330d928b0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_pmu.h"
 #include "xe_sched_job.h"
 
@@ -71,6 +72,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index a130076b59aa2..37538e98dcc04 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,9 +4,13 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index 254cc7cf49fef..1ff0a07ebab30 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -11,6 +11,10 @@
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0



* [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
  2023-12-08  6:43 ` [PATCH 01/17] drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types Ashutosh Dixit
  2023-12-08  6:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-14  0:58   ` Umesh Nerlige Ramappa
  2024-01-24 14:11   ` Joel Granados
  2023-12-08  6:43 ` [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats Ashutosh Dixit
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Introduce the oa_max_sample_rate sysctl to set an upper limit on the
frequency of periodic OA reports.
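For context, the periodic report frequency is derived from an OA timer
exponent. A sketch of the relationship the limit guards, assuming the i915
formula period = (2 << exponent) / timestamp_frequency and a 19.2 MHz
reference clock (both are assumptions; the exact Xe computation may
differ):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Periodic OA report frequency for a given timer exponent, using the
 * i915 relationship period = (2 << exponent) / timestamp_frequency.
 */
static uint64_t oa_exponent_to_freq_hz(uint64_t ts_freq_hz, unsigned int exponent)
{
	return ts_freq_hz / (2ull << exponent);
}

/* Smallest timer exponent whose report frequency stays within max_rate_hz */
static unsigned int min_allowed_exponent(uint64_t ts_freq_hz, uint64_t max_rate_hz)
{
	unsigned int exponent = 0;

	while (oa_exponent_to_freq_hz(ts_freq_hz, exponent) > max_rate_hz)
		exponent++;

	return exponent;
}
```

Under these assumptions, the default 100000 Hz limit would reject timer
exponents below 7 on a 19.2 MHz reference clock.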

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/Makefile    |  1 +
 drivers/gpu/drm/xe/xe_device.c |  7 +++++
 drivers/gpu/drm/xe/xe_module.c |  5 ++++
 drivers/gpu/drm/xe/xe_oa.c     | 49 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa.h     | 16 +++++++++++
 5 files changed, 78 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_oa.c
 create mode 100644 drivers/gpu/drm/xe/xe_oa.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index b719953d9d30f..cf7e0e5261f73 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -98,6 +98,7 @@ xe-y += xe_bb.o \
 	xe_mmio.o \
 	xe_mocs.o \
 	xe_module.o \
+	xe_oa.o \
 	xe_pat.o \
 	xe_pci.o \
 	xe_pcode.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 35616d1a81a31..744d573eb2720 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -29,6 +29,7 @@
 #include "xe_irq.h"
 #include "xe_mmio.h"
 #include "xe_module.h"
+#include "xe_oa.h"
 #include "xe_pat.h"
 #include "xe_pcode.h"
 #include "xe_perf.h"
@@ -480,6 +481,10 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_heci_gsc_init(xe);
 
+	err = xe_oa_init(xe);
+	if (err)
+		goto err_irq_shutdown;
+
 	err = xe_display_init(xe);
 	if (err)
 		goto err_irq_shutdown;
@@ -526,6 +531,8 @@ void xe_device_remove(struct xe_device *xe)
 
 	xe_display_fini(xe);
 
+	xe_oa_fini(xe);
+
 	xe_heci_gsc_fini(xe);
 
 	xe_irq_shutdown(xe);
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 8629330d928b0..176d3e6ec8464 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -10,6 +10,7 @@
 
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
+#include "xe_oa.h"
 #include "xe_pci.h"
 #include "xe_perf.h"
 #include "xe_pmu.h"
@@ -76,6 +77,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_perf_sysctl_register,
 		.exit = xe_perf_sysctl_unregister,
 	},
+	{
+		.init = xe_oa_sysctl_register,
+		.exit = xe_oa_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
new file mode 100644
index 0000000000000..f4cacb4af47c5
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/sysctl.h>
+
+#include "xe_device.h"
+#include "xe_oa.h"
+
+static int xe_oa_sample_rate_hard_limit;
+static u32 xe_oa_max_sample_rate = 100000;
+
+static struct ctl_table_header *sysctl_header;
+
+int xe_oa_init(struct xe_device *xe)
+{
+	/* Choose a representative limit */
+	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
+	return 0;
+}
+
+void xe_oa_fini(struct xe_device *xe)
+{
+}
+
+static struct ctl_table oa_ctl_table[] = {
+	{
+	 .procname = "oa_max_sample_rate",
+	 .data = &xe_oa_max_sample_rate,
+	 .maxlen = sizeof(xe_oa_max_sample_rate),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = &xe_oa_sample_rate_hard_limit,
+	 },
+	{}
+};
+
+int xe_oa_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", oa_ctl_table);
+	return 0;
+}
+
+void xe_oa_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
new file mode 100644
index 0000000000000..1b81330c9708b
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_OA_H_
+#define _XE_OA_H_
+
+struct xe_device;
+
+int xe_oa_init(struct xe_device *xe);
+void xe_oa_fini(struct xe_device *xe);
+int xe_oa_sysctl_register(void);
+void xe_oa_sysctl_unregister(void);
+
+#endif
-- 
2.41.0



* [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (2 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-19  1:11   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units Ashutosh Dixit
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Add and initialize the supported OA data formats for various platforms
(including Xe2). Users can request OA data in any supported format.

Bspec: 52198, 60942, 61101
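To make the oa_formats table entries concrete: each initializer sets at
least the hardware counter-select value and the report size in bytes. A
minimal sketch (the struct and helper here are illustrative; the 16M
buffer size corresponds to the OABUFFER_SIZE_16M define added later in the
series):

```c
#include <assert.h>

/* Mirrors the two xe_oa_format fields that every table entry sets */
struct oa_format_example {
	unsigned int counter_select;	/* OACONTROL counter-select value */
	unsigned int size;		/* report size in bytes */
};

/* e.g. XE_OA_FORMAT_A32u40_A4u32_B8_C8 is initialized as { 5, 256 } */
static const struct oa_format_example a32u40_a4u32_b8_c8 = { 5, 256 };

/* Whole reports that fit in an OA buffer of buf_size bytes */
static unsigned int reports_in_buffer(unsigned int buf_size,
				      const struct oa_format_example *fmt)
{
	return buf_size / fmt->size;
}
```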

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  4 ++
 drivers/gpu/drm/xe/xe_oa.c           | 94 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa.h           |  2 +
 drivers/gpu/drm/xe/xe_oa_types.h     | 78 +++++++++++++++++++++++
 include/uapi/drm/xe_drm.h            | 10 +++
 5 files changed, 188 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_oa_types.h

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9a212dbdb8a49..842ca8b1a7408 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -15,6 +15,7 @@
 #include "xe_devcoredump_types.h"
 #include "xe_heci_gsc.h"
 #include "xe_gt_types.h"
+#include "xe_oa.h"
 #include "xe_platform_types.h"
 #include "xe_pt_types.h"
 #include "xe_pmu.h"
@@ -418,6 +419,9 @@ struct xe_device {
 	/** @heci_gsc: graphics security controller */
 	struct xe_heci_gsc heci_gsc;
 
+	/** @oa: oa perf counter subsystem */
+	struct xe_oa oa;
+
 	/** @needs_flr_on_fini: requests function-reset on fini */
 	bool needs_flr_on_fini;
 
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index f4cacb4af47c5..11662a81ef6d8 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -13,15 +13,109 @@ static u32 xe_oa_max_sample_rate = 100000;
 
 static struct ctl_table_header *sysctl_header;
 
+#define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
+
+static const struct xe_oa_format oa_formats[] = {
+	[XE_OA_FORMAT_C4_B8]			= { 7, 64 },
+	[XE_OA_FORMAT_A12]			= { 0, 64 },
+	[XE_OA_FORMAT_A12_B8_C8]		= { 2, 128 },
+	[XE_OA_FORMAT_A32u40_A4u32_B8_C8]	= { 5, 256 },
+	[XE_OAR_FORMAT_A32u40_A4u32_B8_C8]	= { 5, 256, DRM_FMT(OAR) },
+	[XE_OA_FORMAT_A24u40_A14u32_B8_C8]	= { 5, 256 },
+	[XE_OAC_FORMAT_A24u64_B8_C8]		= { 1, 320, DRM_FMT(OAC), HDR_64_BIT },
+	[XE_OAC_FORMAT_A22u32_R2u32_B8_C8]	= { 2, 192, DRM_FMT(OAC), HDR_64_BIT },
+	[XE_OAM_FORMAT_MPEC8u64_B8_C8]		= { 1, 192, DRM_FMT(OAM_MPEC), HDR_64_BIT },
+	[XE_OAM_FORMAT_MPEC8u32_B8_C8]		= { 2, 128, DRM_FMT(OAM_MPEC), HDR_64_BIT },
+	[XE_OA_FORMAT_PEC64u64]			= { 1, 576, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
+	[XE_OA_FORMAT_PEC64u64_B8_C8]		= { 1, 640, DRM_FMT(PEC), HDR_64_BIT, 1, 1 },
+	[XE_OA_FORMAT_PEC64u32]			= { 1, 320, DRM_FMT(PEC), HDR_64_BIT },
+	[XE_OA_FORMAT_PEC32u64_G1]		= { 5, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
+	[XE_OA_FORMAT_PEC32u32_G1]		= { 5, 192, DRM_FMT(PEC), HDR_64_BIT },
+	[XE_OA_FORMAT_PEC32u64_G2]		= { 6, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
+	[XE_OA_FORMAT_PEC32u32_G2]		= { 6, 192, DRM_FMT(PEC), HDR_64_BIT },
+	[XE_OA_FORMAT_PEC36u64_G1_32_G2_4]	= { 3, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
+	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
+};
+
+static void oa_format_add(struct xe_oa *oa, enum xe_oa_format_name format)
+{
+	__set_bit(format, oa->format_mask);
+}
+
+static void xe_oa_init_supported_formats(struct xe_oa *oa)
+{
+	switch (oa->xe->info.platform) {
+	case XE_TIGERLAKE:
+	case XE_ROCKETLAKE:
+	case XE_ALDERLAKE_S:
+	case XE_ALDERLAKE_P:
+	case XE_ALDERLAKE_N:
+	case XE_DG1:
+		oa_format_add(oa, XE_OA_FORMAT_A12);
+		oa_format_add(oa, XE_OA_FORMAT_A12_B8_C8);
+		oa_format_add(oa, XE_OA_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(oa, XE_OA_FORMAT_C4_B8);
+		break;
+
+	case XE_DG2:
+	case XE_PVC:
+		oa_format_add(oa, XE_OAR_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(oa, XE_OA_FORMAT_A24u40_A14u32_B8_C8);
+		oa_format_add(oa, XE_OAC_FORMAT_A24u64_B8_C8);
+		oa_format_add(oa, XE_OAC_FORMAT_A22u32_R2u32_B8_C8);
+		break;
+
+	case XE_METEORLAKE:
+		oa_format_add(oa, XE_OAR_FORMAT_A32u40_A4u32_B8_C8);
+		oa_format_add(oa, XE_OA_FORMAT_A24u40_A14u32_B8_C8);
+		oa_format_add(oa, XE_OAC_FORMAT_A24u64_B8_C8);
+		oa_format_add(oa, XE_OAC_FORMAT_A22u32_R2u32_B8_C8);
+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u64_B8_C8);
+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u32_B8_C8);
+		break;
+
+	case XE_LUNARLAKE:
+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u64_B8_C8);
+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u32_B8_C8);
+		oa_format_add(oa, XE_OA_FORMAT_PEC64u64);
+		oa_format_add(oa, XE_OA_FORMAT_PEC64u64_B8_C8);
+		oa_format_add(oa, XE_OA_FORMAT_PEC64u32);
+		oa_format_add(oa, XE_OA_FORMAT_PEC32u64_G1);
+		oa_format_add(oa, XE_OA_FORMAT_PEC32u32_G1);
+		oa_format_add(oa, XE_OA_FORMAT_PEC32u64_G2);
+		oa_format_add(oa, XE_OA_FORMAT_PEC32u32_G2);
+		oa_format_add(oa, XE_OA_FORMAT_PEC36u64_G1_32_G2_4);
+		oa_format_add(oa, XE_OA_FORMAT_PEC36u64_G1_4_G2_32);
+		break;
+
+	default:
+		drm_err(&oa->xe->drm, "Unknown platform\n");
+	}
+}
+
 int xe_oa_init(struct xe_device *xe)
 {
+	struct xe_oa *oa = &xe->oa;
+
+	/* Support OA only with GuC submission and Gen12+ */
+	if (XE_WARN_ON(!xe_device_uc_enabled(xe)) || XE_WARN_ON(GRAPHICS_VER(xe) < 12))
+		return 0;
+
+	oa->xe = xe;
+	oa->oa_formats = oa_formats;
+
 	/* Choose a representative limit */
 	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
+
+	xe_oa_init_supported_formats(oa);
 	return 0;
 }
 
 void xe_oa_fini(struct xe_device *xe)
 {
+	struct xe_oa *oa = &xe->oa;
+
+	oa->xe = NULL;
 }
 
 static struct ctl_table oa_ctl_table[] = {
diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
index 1b81330c9708b..2145c73176953 100644
--- a/drivers/gpu/drm/xe/xe_oa.h
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -6,6 +6,8 @@
 #ifndef _XE_OA_H_
 #define _XE_OA_H_
 
+#include "xe_oa_types.h"
+
 struct xe_device;
 
 int xe_oa_init(struct xe_device *xe);
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
new file mode 100644
index 0000000000000..3758bd2879cbb
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_OA_TYPES_H_
+#define _XE_OA_TYPES_H_
+
+#include <linux/math.h>
+#include <linux/types.h>
+
+enum xe_oa_report_header {
+	HDR_32_BIT = 0,
+	HDR_64_BIT,
+};
+
+enum xe_oa_format_name {
+	XE_OA_FORMAT_C4_B8 = 7,
+
+	/* Gen8+ */
+	XE_OA_FORMAT_A12,
+	XE_OA_FORMAT_A12_B8_C8,
+	XE_OA_FORMAT_A32u40_A4u32_B8_C8,
+
+	/* DG2 */
+	XE_OAR_FORMAT_A32u40_A4u32_B8_C8,
+	XE_OA_FORMAT_A24u40_A14u32_B8_C8,
+
+	/* DG2/MTL OAC */
+	XE_OAC_FORMAT_A24u64_B8_C8,
+	XE_OAC_FORMAT_A22u32_R2u32_B8_C8,
+
+	/* MTL OAM */
+	XE_OAM_FORMAT_MPEC8u64_B8_C8,
+	XE_OAM_FORMAT_MPEC8u32_B8_C8,
+
+	/* Xe2+ */
+	XE_OA_FORMAT_PEC64u64,
+	XE_OA_FORMAT_PEC64u64_B8_C8,
+	XE_OA_FORMAT_PEC64u32,
+	XE_OA_FORMAT_PEC32u64_G1,
+	XE_OA_FORMAT_PEC32u32_G1,
+	XE_OA_FORMAT_PEC32u64_G2,
+	XE_OA_FORMAT_PEC32u32_G2,
+	XE_OA_FORMAT_PEC36u64_G1_32_G2_4,
+	XE_OA_FORMAT_PEC36u64_G1_4_G2_32,
+
+	XE_OA_FORMAT_MAX,
+};
+
+/**
+ * struct xe_oa_format - Format fields for supported OA formats
+ */
+struct xe_oa_format {
+	u32 counter_select;
+	int size;
+	int type;
+	enum xe_oa_report_header header;
+	u16 counter_size;
+	u16 bc_report;
+};
+
+/**
+ * struct xe_oa - OA device level information
+ */
+struct xe_oa {
+	/** @xe: back pointer to xe device */
+	struct xe_device *xe;
+
+	/** @oa_formats: tracks all OA formats across platforms */
+	const struct xe_oa_format *oa_formats;
+
+#define FORMAT_MASK_SIZE DIV_ROUND_UP(XE_OA_FORMAT_MAX - 1, BITS_PER_LONG)
+
+	/** @format_mask: tracks valid OA formats for a platform */
+	unsigned long format_mask[FORMAT_MASK_SIZE];
+};
+#endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 3539e0781d700..5bfb2d5aba12a 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1175,6 +1175,16 @@ enum drm_xe_perf_ioctls {
 	DRM_XE_PERF_IOCTL_CONFIG = _IO('i', 0x2),
 };
 
+/** enum drm_xe_oa_format_type - OA format types */
+enum drm_xe_oa_format_type {
+	DRM_XE_OA_FMT_TYPE_OAG,
+	DRM_XE_OA_FMT_TYPE_OAR,
+	DRM_XE_OA_FMT_TYPE_OAM,
+	DRM_XE_OA_FMT_TYPE_OAC,
+	DRM_XE_OA_FMT_TYPE_OAM_MPEC,
+	DRM_XE_OA_FMT_TYPE_PEC,
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.41.0



* [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (3 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-19 16:11   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops Ashutosh Dixit
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Initialize the OA unit data structs for each GT during device probe. Also
assign OA units to hardware engines.
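One detail worth noting in the register header added here: OAM units can be
instantiated at different MMIO bases, so their registers are defined
relative to a per-unit base. A minimal sketch of that addressing (offsets
copied from xe_oa_regs.h in this patch; any concrete base value is
illustrative):

```c
#include <assert.h>

/* Offsets copied from xe_oa_regs.h in this patch */
#define OAM_HEAD_POINTER_OFFSET	0x1a0
#define OAM_TAIL_POINTER_OFFSET	0x1a4
#define OAM_BUFFER_OFFSET	0x1a8

/* OAM registers are addressed relative to a per-unit MMIO base */
static unsigned int oam_reg_addr(unsigned int base, unsigned int offset)
{
	return base + offset;
}
```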

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/regs/xe_oa_regs.h    |  96 ++++++++++++++
 drivers/gpu/drm/xe/xe_gt_types.h        |   4 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h |   2 +
 drivers/gpu/drm/xe/xe_oa.c              | 169 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa_types.h        |  56 ++++++++
 include/uapi/drm/xe_drm.h               |   6 +
 6 files changed, 333 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_oa_regs.h

diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
new file mode 100644
index 0000000000000..4455a5a42b01b
--- /dev/null
+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __XE_OA_REGS__
+#define __XE_OA_REGS__
+
+#define REG_EQUAL(reg, xe_reg) ((reg) == (xe_reg.addr))
+#define REG_EQUAL_MCR(reg, xe_reg) ((reg) == (xe_reg.__reg.addr))
+
+#define RPM_CONFIG1			XE_REG(0xd04)
+#define   GT_NOA_ENABLE			REG_BIT(9)
+
+#define EU_PERF_CNTL0			XE_REG(0xe458)
+#define EU_PERF_CNTL4			XE_REG(0xe45c)
+#define EU_PERF_CNTL1			XE_REG(0xe558)
+#define EU_PERF_CNTL5			XE_REG(0xe55c)
+#define EU_PERF_CNTL2			XE_REG(0xe658)
+#define EU_PERF_CNTL6			XE_REG(0xe65c)
+#define EU_PERF_CNTL3			XE_REG(0xe758)
+
+#define OA_TLB_INV_CR			XE_REG(0xceec)
+
+/* OAR unit */
+#define OAR_OACONTROL			XE_REG(0x2960)
+#define  OAR_OACONTROL_COUNTER_SEL_MASK	REG_GENMASK(3, 1)
+#define  OAR_OACONTROL_COUNTER_ENABLE	REG_BIT(0)
+
+#define OACTXCONTROL(base) XE_REG((base) + 0x360)
+#define OAR_OASTATUS			XE_REG(0x2968)
+#define  OA_COUNTER_RESUME		REG_BIT(0)
+
+/* OAG unit */
+#define OAG_OAGLBCTXCTRL		XE_REG(0x2b28)
+#define  OAG_OAGLBCTXCTRL_TIMER_PERIOD_MASK	REG_GENMASK(7, 2)
+#define  OAG_OAGLBCTXCTRL_TIMER_ENABLE		REG_BIT(1)
+#define  OAG_OAGLBCTXCTRL_COUNTER_RESUME	REG_BIT(0)
+
+#define OAG_OAHEADPTR				XE_REG(0xdb00)
+#define  OAG_OAHEADPTR_MASK			REG_GENMASK(31, 6)
+#define OAG_OATAILPTR				XE_REG(0xdb04)
+#define  OAG_OATAILPTR_MASK			REG_GENMASK(31, 6)
+
+#define OAG_OABUFFER		XE_REG(0xdb08)
+#define  OABUFFER_SIZE_MASK	REG_GENMASK(5, 3)
+#define  OABUFFER_SIZE_128K	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 0)
+#define  OABUFFER_SIZE_256K	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 1)
+#define  OABUFFER_SIZE_512K	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 2)
+#define  OABUFFER_SIZE_1M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 3)
+#define  OABUFFER_SIZE_2M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 4)
+#define  OABUFFER_SIZE_4M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 5)
+#define  OABUFFER_SIZE_8M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 6)
+#define  OABUFFER_SIZE_16M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 7)
+#define  OAG_OABUFFER_MEMORY_SELECT		REG_BIT(0) /* 0: PPGTT, 1: GGTT */
+
+#define OAG_OACONTROL				XE_REG(0xdaf4)
+#define  OAG_OACONTROL_OA_CCS_SELECT_MASK	REG_GENMASK(18, 16)
+#define  OAG_OACONTROL_OA_COUNTER_SEL_MASK	REG_GENMASK(4, 2)
+#define  OAG_OACONTROL_OA_COUNTER_ENABLE	REG_BIT(0)
+/* Common to all OA units */
+#define  OA_OACONTROL_REPORT_BC_MASK		REG_GENMASK(9, 9)
+#define  OA_OACONTROL_COUNTER_SIZE_MASK		REG_GENMASK(8, 8)
+
+#define OAG_OA_DEBUG XE_REG(0xdaf8, XE_REG_OPTION_MASKED)
+#define  OAG_OA_DEBUG_INCLUDE_CLK_RATIO			REG_BIT(6)
+#define  OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS		REG_BIT(5)
+#define  OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS		REG_BIT(2)
+#define  OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS	REG_BIT(1)
+
+#define OAG_OASTATUS XE_REG(0xdafc)
+#define  OAG_OASTATUS_COUNTER_OVERFLOW	REG_BIT(2)
+#define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
+#define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
+
+/* OAM unit */
+#define OAM_HEAD_POINTER_OFFSET			(0x1a0)
+#define OAM_TAIL_POINTER_OFFSET			(0x1a4)
+#define OAM_BUFFER_OFFSET			(0x1a8)
+#define OAM_CONTEXT_CONTROL_OFFSET		(0x1bc)
+#define OAM_CONTROL_OFFSET			(0x194)
+#define  OAM_CONTROL_COUNTER_SEL_MASK		REG_GENMASK(3, 1)
+#define OAM_DEBUG_OFFSET			(0x198)
+#define OAM_STATUS_OFFSET			(0x19c)
+#define OAM_MMIO_TRG_OFFSET			(0x1d0)
+
+#define OAM_HEAD_POINTER(base)			XE_REG((base) + OAM_HEAD_POINTER_OFFSET)
+#define OAM_TAIL_POINTER(base)			XE_REG((base) + OAM_TAIL_POINTER_OFFSET)
+#define OAM_BUFFER(base)			XE_REG((base) + OAM_BUFFER_OFFSET)
+#define OAM_CONTEXT_CONTROL(base)		XE_REG((base) + OAM_CONTEXT_CONTROL_OFFSET)
+#define OAM_CONTROL(base)			XE_REG((base) + OAM_CONTROL_OFFSET)
+#define OAM_DEBUG(base)				XE_REG((base) + OAM_DEBUG_OFFSET)
+#define OAM_STATUS(base)			XE_REG((base) + OAM_STATUS_OFFSET)
+#define OAM_MMIO_TRG(base)			XE_REG((base) + OAM_MMIO_TRG_OFFSET)
+
+#endif /* __XE_OA_REGS__ */
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index a7263738308ec..a4a0170996982 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -10,6 +10,7 @@
 #include "xe_gt_idle_types.h"
 #include "xe_hw_engine_types.h"
 #include "xe_hw_fence_types.h"
+#include "xe_oa.h"
 #include "xe_reg_sr_types.h"
 #include "xe_sa_types.h"
 #include "xe_uc_types.h"
@@ -347,6 +348,9 @@ struct xe_gt {
 		/** @oob: bitmap with active OOB workarounds */
 		unsigned long *oob;
 	} wa_active;
+
+	/** @oa: per-gt OA perf counter subsystem info */
+	struct xe_oa_gt oa;
 };
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h
index 39908dec042a4..4d2e2338db987 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine_types.h
+++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h
@@ -146,6 +146,8 @@ struct xe_hw_engine {
 	enum xe_hw_engine_id engine_id;
 	/** @eclass: pointer to per hw engine class interface */
 	struct xe_hw_engine_class_intf *eclass;
+	/** @oa_unit: oa unit for this hw engine */
+	struct xe_oa_unit *oa_unit;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 11662a81ef6d8..5ad3c9c78b4e9 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -5,7 +5,10 @@
 
 #include <linux/sysctl.h>
 
+#include "regs/xe_oa_regs.h"
 #include "xe_device.h"
+#include "xe_gt.h"
+#include "xe_mmio.h"
 #include "xe_oa.h"
 
 static int xe_oa_sample_rate_hard_limit;
@@ -13,6 +16,13 @@ static u32 xe_oa_max_sample_rate = 100000;
 
 static struct ctl_table_header *sysctl_header;
 
+enum {
+	XE_OA_UNIT_OAG = 0,
+	XE_OA_UNIT_OAM_SAMEDIA_0 = 0,
+	XE_OA_UNIT_MAX,
+	XE_OA_UNIT_INVALID = U32_MAX,
+};
+
 #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
 
 static const struct xe_oa_format oa_formats[] = {
@@ -37,6 +47,143 @@ static const struct xe_oa_format oa_formats[] = {
 	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
 };
 
+static u32 num_oa_units_per_gt(struct xe_gt *gt)
+{
+	return 1;
+}
+
+static u32 __hwe_oam_unit(struct xe_hw_engine *hwe)
+{
+	if (GRAPHICS_VERx100(gt_to_xe(hwe->gt)) >= 1270) {
+		/*
+		 * There's 1 SAMEDIA gt and 1 OAM per SAMEDIA gt. All media slices
+		 * within the gt use the same OAM. All MTL/LNL SKUs list 1 SA MEDIA.
+		 */
+		drm_WARN_ON(&gt_to_xe(hwe->gt)->drm,
+			    hwe->gt->info.type != XE_GT_TYPE_MEDIA);
+
+		return XE_OA_UNIT_OAM_SAMEDIA_0;
+	}
+
+	return XE_OA_UNIT_INVALID;
+}
+
+static u32 __hwe_oa_unit(struct xe_hw_engine *hwe)
+{
+	switch (hwe->class) {
+	case XE_ENGINE_CLASS_RENDER:
+	case XE_ENGINE_CLASS_COMPUTE:
+		return XE_OA_UNIT_OAG;
+
+	case XE_ENGINE_CLASS_VIDEO_DECODE:
+	case XE_ENGINE_CLASS_VIDEO_ENHANCE:
+		return __hwe_oam_unit(hwe);
+
+	default:
+		return XE_OA_UNIT_INVALID;
+	}
+}
+
+static struct xe_oa_regs __oam_regs(u32 base)
+{
+	return (struct xe_oa_regs) {
+		base,
+		OAM_HEAD_POINTER(base),
+		OAM_TAIL_POINTER(base),
+		OAM_BUFFER(base),
+		OAM_CONTEXT_CONTROL(base),
+		OAM_CONTROL(base),
+		OAM_DEBUG(base),
+		OAM_STATUS(base),
+		OAM_CONTROL_COUNTER_SEL_MASK,
+	};
+}
+
+static struct xe_oa_regs __oag_regs(void)
+{
+	return (struct xe_oa_regs) {
+		0,
+		OAG_OAHEADPTR,
+		OAG_OATAILPTR,
+		OAG_OABUFFER,
+		OAG_OAGLBCTXCTRL,
+		OAG_OACONTROL,
+		OAG_OA_DEBUG,
+		OAG_OASTATUS,
+		OAG_OACONTROL_OA_COUNTER_SEL_MASK,
+	};
+}
+
+static void __xe_oa_init_oa_units(struct xe_gt *gt)
+{
+	const u32 mtl_oa_base[] = {
+		[XE_OA_UNIT_OAM_SAMEDIA_0] = 0x393000,
+	};
+	int i, num_units = gt->oa.num_oa_units;
+
+	for (i = 0; i < num_units; i++) {
+		struct xe_oa_unit *u = &gt->oa.oa_unit[i];
+
+		if (i == XE_OA_UNIT_OAG && gt->info.type != XE_GT_TYPE_MEDIA) {
+			u->regs = __oag_regs();
+			u->type = DRM_XE_OA_UNIT_TYPE_OAG;
+		} else if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270) {
+			u->regs = __oam_regs(mtl_oa_base[i]);
+			u->type = DRM_XE_OA_UNIT_TYPE_OAM;
+		}
+
+		/* Set oa_unit_ids now to ensure ids remain contiguous */
+		u->oa_unit_id = gt_to_xe(gt)->oa.oa_unit_ids++;
+	}
+}
+
+static int xe_oa_init_gt(struct xe_gt *gt)
+{
+	u32 num_oa_units = num_oa_units_per_gt(gt);
+	struct xe_hw_engine *hwe;
+	enum xe_hw_engine_id id;
+	struct xe_oa_unit *u;
+
+	u = kcalloc(num_oa_units, sizeof(*u), GFP_KERNEL);
+	if (!u)
+		return -ENOMEM;
+
+	for_each_hw_engine(hwe, gt, id) {
+		u32 index = __hwe_oa_unit(hwe);
+
+		hwe->oa_unit = NULL;
+		if (index < num_oa_units) {
+			u[index].num_engines++;
+			hwe->oa_unit = &u[index];
+		}
+	}
+
+	/*
+	 * Fused-off engines can result in oa_units with num_engines == 0. These units
+	 * will appear in the OA unit query, but no perf streams can be opened on them.
+	 */
+	gt->oa.num_oa_units = num_oa_units;
+	gt->oa.oa_unit = u;
+
+	__xe_oa_init_oa_units(gt);
+
+	return 0;
+}
+
+static int xe_oa_init_oa_units(struct xe_oa *oa)
+{
+	struct xe_gt *gt;
+	int i, ret;
+
+	for_each_gt(gt, oa->xe, i) {
+		ret = xe_oa_init_gt(gt);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static void oa_format_add(struct xe_oa *oa, enum xe_oa_format_name format)
 {
 	__set_bit(format, oa->format_mask);
@@ -96,6 +243,8 @@ static void xe_oa_init_supported_formats(struct xe_oa *oa)
 int xe_oa_init(struct xe_device *xe)
 {
 	struct xe_oa *oa = &xe->oa;
+	struct xe_gt *gt;
+	int i, ret;
 
 	/* Support OA only with GuC submission and Gen12+ */
 	if (XE_WARN_ON(!xe_device_uc_enabled(xe)) || XE_WARN_ON(GRAPHICS_VER(xe) < 12))
@@ -104,16 +253,36 @@ int xe_oa_init(struct xe_device *xe)
 	oa->xe = xe;
 	oa->oa_formats = oa_formats;
 
+	for_each_gt(gt, xe, i)
+		mutex_init(&gt->oa.gt_lock);
+
 	/* Choose a representative limit */
 	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
 
+	ret = xe_oa_init_oa_units(oa);
+	if (ret) {
+		drm_err(&xe->drm, "OA initialization failed %d\n", ret);
+		goto exit;
+	}
+
 	xe_oa_init_supported_formats(oa);
 	return 0;
+exit:
+	oa->xe = NULL;
+	return ret;
 }
 
 void xe_oa_fini(struct xe_device *xe)
 {
 	struct xe_oa *oa = &xe->oa;
+	struct xe_gt *gt;
+	int i;
+
+	if (!oa->xe)
+		return;
+
+	for_each_gt(gt, xe, i)
+		kfree(gt->oa.oa_unit);
 
 	oa->xe = NULL;
 }
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
index 3758bd2879cbb..8f8cf6a2bf556 100644
--- a/drivers/gpu/drm/xe/xe_oa_types.h
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -8,6 +8,10 @@
 
 #include <linux/math.h>
 #include <linux/types.h>
+#include <linux/mutex.h>
+
+#include <drm/xe_drm.h>
+#include "regs/xe_reg_defs.h"
 
 enum xe_oa_report_header {
 	HDR_32_BIT = 0,
@@ -60,6 +64,55 @@ struct xe_oa_format {
 	u16 bc_report;
 };
 
+/**
+ * struct xe_oa_regs - Registers for each OA unit
+ */
+struct xe_oa_regs {
+	u32 base;
+	struct xe_reg oa_head_ptr;
+	struct xe_reg oa_tail_ptr;
+	struct xe_reg oa_buffer;
+	struct xe_reg oa_ctx_ctrl;
+	struct xe_reg oa_ctrl;
+	struct xe_reg oa_debug;
+	struct xe_reg oa_status;
+	u32 oa_ctrl_counter_select_mask;
+};
+
+/**
+ * struct xe_oa_unit - Hardware OA unit
+ */
+struct xe_oa_unit {
+	/** @oa_unit_id: identifier for the OA unit */
+	u16 oa_unit_id;
+
+	/** @type: Type of OA unit - OAM, OAG etc. */
+	enum drm_xe_oa_unit_type type;
+
+	/** @regs: OA registers for programming the OA unit */
+	struct xe_oa_regs regs;
+
+	/** @num_engines: number of engines attached to this OA unit */
+	u32 num_engines;
+
+	/** @exclusive_stream: The stream currently using the OA unit */
+	struct xe_oa_stream *exclusive_stream;
+};
+
+/**
+ * struct xe_oa_gt - OA per-gt information
+ */
+struct xe_oa_gt {
+	/** @lock: lock protecting create/destroy OA streams */
+	struct mutex gt_lock;
+
+	/** @num_oa_units: number of oa units for each gt */
+	u32 num_oa_units;
+
+	/** @oa_unit: array of oa_units */
+	struct xe_oa_unit *oa_unit;
+};
+
 /**
  * struct xe_oa - OA device level information
  */
@@ -74,5 +127,8 @@ struct xe_oa {
 
 	/** @format_mask: tracks valid OA formats for a platform */
 	unsigned long format_mask[FORMAT_MASK_SIZE];
+
+	/** @oa_unit_ids: tracks OA unit ids assigned across gts */
+	u16 oa_unit_ids;
 };
 #endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 5bfb2d5aba12a..778862a5b76d4 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1175,6 +1175,12 @@ enum drm_xe_perf_ioctls {
 	DRM_XE_PERF_IOCTL_CONFIG = _IO('i', 0x2),
 };
 
+/** enum drm_xe_oa_unit_type - OA unit types */
+enum drm_xe_oa_unit_type {
+	DRM_XE_OA_UNIT_TYPE_OAG,
+	DRM_XE_OA_UNIT_TYPE_OAM,
+};
+
 /** enum drm_xe_oa_format_type - OA format types */
 enum drm_xe_oa_format_type {
 	DRM_XE_OA_FMT_TYPE_OAG,
-- 
2.41.0



* [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (4 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-19 19:10   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties Ashutosh Dixit
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Introduce add/remove config perf ops for OA. OA configurations consist of a
set of event/counter select register address/value pairs. The add_config
perf op validates and stores such configurations and also exposes them in
the metrics sysfs. These configurations will be programmed into the OA unit
hardware when an OA stream using a configuration is opened. The OA stream can also
switch to other stored configurations.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c   |   4 +
 drivers/gpu/drm/xe/xe_oa.c       | 406 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa.h       |   9 +
 drivers/gpu/drm/xe/xe_oa_types.h |  10 +
 drivers/gpu/drm/xe/xe_perf.c     |  16 ++
 include/uapi/drm/xe_drm.h        |  25 ++
 6 files changed, 470 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 744d573eb2720..23fdd045b470a 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -495,6 +495,8 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_display_register(xe);
 
+	xe_oa_register(xe);
+
 	xe_debugfs_register(xe);
 
 	xe_pmu_register(&xe->pmu);
@@ -527,6 +529,8 @@ static void xe_device_remove_display(struct xe_device *xe)
 
 void xe_device_remove(struct xe_device *xe)
 {
+	xe_oa_unregister(xe);
+
 	xe_device_remove_display(xe);
 
 	xe_display_fini(xe);
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 5ad3c9c78b4e9..6a903bf4f87d1 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -10,6 +10,7 @@
 #include "xe_gt.h"
 #include "xe_mmio.h"
 #include "xe_oa.h"
+#include "xe_perf.h"
 
 static int xe_oa_sample_rate_hard_limit;
 static u32 xe_oa_max_sample_rate = 100000;
@@ -23,6 +24,28 @@ enum {
 	XE_OA_UNIT_INVALID = U32_MAX,
 };
 
+struct xe_oa_reg {
+	struct xe_reg addr;
+	u32 value;
+};
+
+struct xe_oa_config {
+	struct xe_oa *oa;
+
+	char uuid[UUID_STRING_LEN + 1];
+	int id;
+
+	const struct xe_oa_reg *regs;
+	u32 regs_len;
+
+	struct attribute_group sysfs_metric;
+	struct attribute *attrs[2];
+	struct kobj_attribute sysfs_metric_id;
+
+	struct kref ref;
+	struct rcu_head rcu;
+};
+
 #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
 
 static const struct xe_oa_format oa_formats[] = {
@@ -47,6 +70,377 @@ static const struct xe_oa_format oa_formats[] = {
 	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
 };
 
+static void xe_oa_config_release(struct kref *ref)
+{
+	struct xe_oa_config *oa_config =
+		container_of(ref, typeof(*oa_config), ref);
+
+	kfree(oa_config->regs);
+
+	kfree_rcu(oa_config, rcu);
+}
+
+static void xe_oa_config_put(struct xe_oa_config *oa_config)
+{
+	if (!oa_config)
+		return;
+
+	kref_put(&oa_config->ref, xe_oa_config_release);
+}
+
+static bool xe_oa_is_valid_flex_addr(struct xe_oa *oa, u32 addr)
+{
+	static const struct xe_reg flex_eu_regs[] = {
+		EU_PERF_CNTL0,
+		EU_PERF_CNTL1,
+		EU_PERF_CNTL2,
+		EU_PERF_CNTL3,
+		EU_PERF_CNTL4,
+		EU_PERF_CNTL5,
+		EU_PERF_CNTL6,
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(flex_eu_regs); i++) {
+		if (flex_eu_regs[i].addr == addr)
+			return true;
+	}
+	return false;
+}
+
+static bool xe_oa_reg_in_range_table(u32 addr, const struct xe_mmio_range *table)
+{
+	while (table->start || table->end) {
+		if (addr >= table->start && addr <= table->end)
+			return true;
+
+		table++;
+	}
+
+	return false;
+}
+
+static const struct xe_mmio_range xehp_oa_b_counters[] = {
+	{ .start = 0xdc48, .end = 0xdc48 },	/* OAA_ENABLE_REG */
+	{ .start = 0xdd00, .end = 0xdd48 },	/* OAG_LCE0_0 - OAA_LENABLE_REG */
+	{}
+};
+
+static const struct xe_mmio_range gen12_oa_b_counters[] = {
+	{ .start = 0x2b2c, .end = 0x2b2c },	/* OAG_OA_PESS */
+	{ .start = 0xd900, .end = 0xd91c },	/* OAG_OASTARTTRIG[1-8] */
+	{ .start = 0xd920, .end = 0xd93c },	/* OAG_OAREPORTTRIG1[1-8] */
+	{ .start = 0xd940, .end = 0xd97c },	/* OAG_CEC[0-7][0-1] */
+	{ .start = 0xdc00, .end = 0xdc3c },	/* OAG_SCEC[0-7][0-1] */
+	{ .start = 0xdc40, .end = 0xdc40 },	/* OAG_SPCTR_CNF */
+	{ .start = 0xdc44, .end = 0xdc44 },	/* OAA_DBG_REG */
+	{}
+};
+
+static const struct xe_mmio_range mtl_oam_b_counters[] = {
+	{ .start = 0x393000, .end = 0x39301c },	/* OAM_STARTTRIG1[1-8] */
+	{ .start = 0x393020, .end = 0x39303c },	/* OAM_REPORTTRIG1[1-8] */
+	{ .start = 0x393040, .end = 0x39307c },	/* OAM_CEC[0-7][0-1] */
+	{ .start = 0x393200, .end = 0x39323C },	/* MPES[0-7] */
+	{}
+};
+
+static const struct xe_mmio_range xe2_oa_b_counters[] = {
+	{ .start = 0x393200, .end = 0x39323C },	/* MPES_0_MPES_SAG - MPES_7_UPPER_MPES_SAG */
+	{ .start = 0x394200, .end = 0x39423C },	/* MPES_0_MPES_SCMI0 - MPES_7_UPPER_MPES_SCMI0 */
+	{ .start = 0x394A00, .end = 0x394A3C },	/* MPES_0_MPES_SCMI1 - MPES_7_UPPER_MPES_SCMI1 */
+	{},
+};
+
+static bool xe_oa_is_valid_b_counter_addr(struct xe_oa *oa, u32 addr)
+{
+	return xe_oa_reg_in_range_table(addr, xehp_oa_b_counters) ||
+		xe_oa_reg_in_range_table(addr, gen12_oa_b_counters) ||
+		xe_oa_reg_in_range_table(addr, mtl_oam_b_counters) ||
+		(GRAPHICS_VER(oa->xe) >= 20 &&
+		 xe_oa_reg_in_range_table(addr, xe2_oa_b_counters));
+}
+
+static const struct xe_mmio_range mtl_oa_mux_regs[] = {
+	{ .start = 0x0d00, .end = 0x0d04 },	/* RPM_CONFIG[0-1] */
+	{ .start = 0x0d0c, .end = 0x0d2c },	/* NOA_CONFIG[0-8] */
+	{ .start = 0x9840, .end = 0x9840 },	/* GDT_CHICKEN_BITS */
+	{ .start = 0x9884, .end = 0x9888 },	/* NOA_WRITE */
+	{ .start = 0x38d100, .end = 0x38d114 },	/* VISACTL */
+	{}
+};
+
+static const struct xe_mmio_range gen12_oa_mux_regs[] = {
+	{ .start = 0x0d00, .end = 0x0d04 },     /* RPM_CONFIG[0-1] */
+	{ .start = 0x0d0c, .end = 0x0d2c },     /* NOA_CONFIG[0-8] */
+	{ .start = 0x9840, .end = 0x9840 },	/* GDT_CHICKEN_BITS */
+	{ .start = 0x9884, .end = 0x9888 },	/* NOA_WRITE */
+	{ .start = 0x20cc, .end = 0x20cc },	/* WAIT_FOR_RC6_EXIT */
+	{}
+};
+
+static const struct xe_mmio_range xe2_oa_mux_regs[] = {
+	{ .start = 0x13000,  .end = 0x137FC },	/* PES_0_PESL0 - PES_63_UPPER_PESL3 */
+	{},
+};
+
+static bool xe_oa_is_valid_mux_addr(struct xe_oa *oa, u32 addr)
+{
+	if (GRAPHICS_VER(oa->xe) >= 20)
+		return xe_oa_reg_in_range_table(addr, xe2_oa_mux_regs);
+	else if (GRAPHICS_VERx100(oa->xe) >= 1270)
+		return xe_oa_reg_in_range_table(addr, mtl_oa_mux_regs);
+	else
+		return xe_oa_reg_in_range_table(addr, gen12_oa_mux_regs);
+}
+
+static bool xe_oa_is_valid_config_reg_addr(struct xe_oa *oa, u32 addr)
+{
+	return xe_oa_is_valid_flex_addr(oa, addr) ||
+		xe_oa_is_valid_b_counter_addr(oa, addr) ||
+		xe_oa_is_valid_mux_addr(oa, addr);
+}
+
+static struct xe_oa_reg *
+xe_oa_alloc_regs(struct xe_oa *oa, bool (*is_valid)(struct xe_oa *oa, u32 addr),
+		 u32 __user *regs, u32 n_regs)
+{
+	struct xe_oa_reg *oa_regs;
+	int err;
+	u32 i;
+
+	oa_regs = kmalloc_array(n_regs, sizeof(*oa_regs), GFP_KERNEL);
+	if (!oa_regs)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < n_regs; i++) {
+		u32 addr, value;
+
+		err = get_user(addr, regs);
+		if (err)
+			goto addr_err;
+
+		if (!is_valid(oa, addr)) {
+			drm_dbg(&oa->xe->drm, "Invalid oa_reg address: %X\n", addr);
+			err = -EINVAL;
+			goto addr_err;
+		}
+
+		err = get_user(value, regs + 1);
+		if (err)
+			goto addr_err;
+
+		oa_regs[i].addr = XE_REG(addr);
+		oa_regs[i].value = value;
+
+		regs += 2;
+	}
+
+	return oa_regs;
+
+addr_err:
+	kfree(oa_regs);
+	return ERR_PTR(err);
+}
+
+static ssize_t show_dynamic_id(struct kobject *kobj,
+			       struct kobj_attribute *attr,
+			       char *buf)
+{
+	struct xe_oa_config *oa_config =
+		container_of(attr, typeof(*oa_config), sysfs_metric_id);
+
+	return sprintf(buf, "%d\n", oa_config->id);
+}
+
+static int create_dynamic_oa_sysfs_entry(struct xe_oa *oa,
+					 struct xe_oa_config *oa_config)
+{
+	sysfs_attr_init(&oa_config->sysfs_metric_id.attr);
+	oa_config->sysfs_metric_id.attr.name = "id";
+	oa_config->sysfs_metric_id.attr.mode = 0444;
+	oa_config->sysfs_metric_id.show = show_dynamic_id;
+	oa_config->sysfs_metric_id.store = NULL;
+
+	oa_config->attrs[0] = &oa_config->sysfs_metric_id.attr;
+	oa_config->attrs[1] = NULL;
+
+	oa_config->sysfs_metric.name = oa_config->uuid;
+	oa_config->sysfs_metric.attrs = oa_config->attrs;
+
+	return sysfs_create_group(oa->metrics_kobj, &oa_config->sysfs_metric);
+}
+
+int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file)
+{
+	struct xe_oa *oa = &to_xe_device(dev)->oa;
+	struct drm_xe_oa_config param;
+	struct drm_xe_oa_config *arg = &param;
+	struct xe_oa_config *oa_config, *tmp;
+	struct xe_oa_reg *regs;
+	int err, id;
+
+	if (!oa->xe) {
+		drm_dbg(&to_xe_device(dev)->drm, "xe oa interface not available for this system\n");
+		return -ENODEV;
+	}
+
+	if (xe_perf_stream_paranoid && !perfmon_capable()) {
+		drm_dbg(&oa->xe->drm, "Insufficient privileges to add xe OA config\n");
+		return -EACCES;
+	}
+
+	err = __copy_from_user(&param, data, sizeof(param));
+	if (XE_IOCTL_DBG(oa->xe, err))
+		return -EFAULT;
+
+	if (!arg->regs_ptr || !arg->n_regs) {
+		drm_dbg(&oa->xe->drm, "No OA registers given\n");
+		return -EINVAL;
+	}
+
+	oa_config = kzalloc(sizeof(*oa_config), GFP_KERNEL);
+	if (!oa_config)
+		return -ENOMEM;
+
+	oa_config->oa = oa;
+	kref_init(&oa_config->ref);
+
+	if (!uuid_is_valid(arg->uuid)) {
+		drm_dbg(&oa->xe->drm, "Invalid uuid format for OA config\n");
+		err = -EINVAL;
+		goto reg_err;
+	}
+
+	/* Last character in oa_config->uuid will be 0 because oa_config is kzalloc'd */
+	memcpy(oa_config->uuid, arg->uuid, sizeof(arg->uuid));
+
+	oa_config->regs_len = arg->n_regs;
+	regs = xe_oa_alloc_regs(oa, xe_oa_is_valid_config_reg_addr,
+				u64_to_user_ptr(arg->regs_ptr),
+				arg->n_regs);
+	if (IS_ERR(regs)) {
+		drm_dbg(&oa->xe->drm, "Failed to create OA config for mux_regs\n");
+		err = PTR_ERR(regs);
+		goto reg_err;
+	}
+	oa_config->regs = regs;
+
+	err = mutex_lock_interruptible(&oa->metrics_lock);
+	if (err)
+		goto reg_err;
+
+	/* We shouldn't have too many configs, so this iteration shouldn't be too costly */
+	idr_for_each_entry(&oa->metrics_idr, tmp, id) {
+		if (!strcmp(tmp->uuid, oa_config->uuid)) {
+			drm_dbg(&oa->xe->drm, "OA config already exists with this uuid\n");
+			err = -EADDRINUSE;
+			goto sysfs_err;
+		}
+	}
+
+	err = create_dynamic_oa_sysfs_entry(oa, oa_config);
+	if (err) {
+		drm_dbg(&oa->xe->drm, "Failed to create sysfs entry for OA config\n");
+		goto sysfs_err;
+	}
+
+	/* Config id 0 is invalid, id 1 for kernel stored test config */
+	oa_config->id = idr_alloc(&oa->metrics_idr, oa_config, 2, 0, GFP_KERNEL);
+	if (oa_config->id < 0) {
+		drm_dbg(&oa->xe->drm, "Failed to allocate id for OA config\n");
+		err = oa_config->id;
+		goto sysfs_err;
+	}
+
+	mutex_unlock(&oa->metrics_lock);
+
+	drm_dbg(&oa->xe->drm, "Added config %s id=%i\n", oa_config->uuid, oa_config->id);
+
+	return oa_config->id;
+
+sysfs_err:
+	mutex_unlock(&oa->metrics_lock);
+reg_err:
+	xe_oa_config_put(oa_config);
+	drm_dbg(&oa->xe->drm, "Failed to add new OA config\n");
+	return err;
+}
+
+int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file)
+{
+	struct xe_oa *oa = &to_xe_device(dev)->oa;
+	struct xe_oa_config *oa_config;
+	u64 arg, *ptr = data;
+	int ret;
+
+	if (!oa->xe) {
+		drm_dbg(&to_xe_device(dev)->drm, "xe oa interface not available for this system\n");
+		return -ENODEV;
+	}
+
+	if (xe_perf_stream_paranoid && !perfmon_capable()) {
+		drm_dbg(&oa->xe->drm, "Insufficient privileges to remove xe OA config\n");
+		return -EACCES;
+	}
+
+	ret = get_user(arg, ptr);
+	if (XE_IOCTL_DBG(oa->xe, ret))
+		return ret;
+
+	ret = mutex_lock_interruptible(&oa->metrics_lock);
+	if (ret)
+		return ret;
+
+	oa_config = idr_find(&oa->metrics_idr, arg);
+	if (!oa_config) {
+		drm_dbg(&oa->xe->drm, "Failed to remove unknown OA config\n");
+		ret = -ENOENT;
+		goto err_unlock;
+	}
+
+	WARN_ON(arg != oa_config->id);
+
+	sysfs_remove_group(oa->metrics_kobj, &oa_config->sysfs_metric);
+	idr_remove(&oa->metrics_idr, arg);
+
+	mutex_unlock(&oa->metrics_lock);
+
+	drm_dbg(&oa->xe->drm, "Removed config %s id=%i\n", oa_config->uuid, oa_config->id);
+
+	xe_oa_config_put(oa_config);
+
+	return 0;
+
+err_unlock:
+	mutex_unlock(&oa->metrics_lock);
+	return ret;
+}
+
+void xe_oa_register(struct xe_device *xe)
+{
+	struct xe_oa *oa = &xe->oa;
+
+	if (!oa->xe)
+		return;
+
+	oa->metrics_kobj = kobject_create_and_add("metrics",
+						  &xe->drm.primary->kdev->kobj);
+}
+
+void xe_oa_unregister(struct xe_device *xe)
+{
+	struct xe_oa *oa = &xe->oa;
+
+	if (!oa->metrics_kobj)
+		return;
+
+	kobject_put(oa->metrics_kobj);
+	oa->metrics_kobj = NULL;
+}
+
 static u32 num_oa_units_per_gt(struct xe_gt *gt)
 {
 	return 1;
@@ -259,6 +653,9 @@ int xe_oa_init(struct xe_device *xe)
 	/* Choose a representative limit */
 	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
 
+	mutex_init(&oa->metrics_lock);
+	idr_init_base(&oa->metrics_idr, 1);
+
 	ret = xe_oa_init_oa_units(oa);
 	if (ret) {
 		drm_err(&xe->drm, "OA initialization failed %d\n", ret);
@@ -272,6 +669,12 @@ int xe_oa_init(struct xe_device *xe)
 	return ret;
 }
 
+static int destroy_config(int id, void *p, void *data)
+{
+	xe_oa_config_put(p);
+	return 0;
+}
+
 void xe_oa_fini(struct xe_device *xe)
 {
 	struct xe_oa *oa = &xe->oa;
@@ -284,6 +687,9 @@ void xe_oa_fini(struct xe_device *xe)
 	for_each_gt(gt, xe, i)
 		kfree(gt->oa.oa_unit);
 
+	idr_for_each(&oa->metrics_idr, destroy_config, oa);
+	idr_destroy(&oa->metrics_idr);
+
 	oa->xe = NULL;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
index 2145c73176953..e4863f8681b14 100644
--- a/drivers/gpu/drm/xe/xe_oa.h
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -8,11 +8,20 @@
 
 #include "xe_oa_types.h"
 
+struct drm_device;
+struct drm_file;
 struct xe_device;
 
 int xe_oa_init(struct xe_device *xe);
 void xe_oa_fini(struct xe_device *xe);
+void xe_oa_register(struct xe_device *xe);
+void xe_oa_unregister(struct xe_device *xe);
 int xe_oa_sysctl_register(void);
 void xe_oa_sysctl_unregister(void);
 
+int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file);
+int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
index 8f8cf6a2bf556..2985443df3080 100644
--- a/drivers/gpu/drm/xe/xe_oa_types.h
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -6,6 +6,7 @@
 #ifndef _XE_OA_TYPES_H_
 #define _XE_OA_TYPES_H_
 
+#include <linux/idr.h>
 #include <linux/math.h>
 #include <linux/types.h>
 #include <linux/mutex.h>
@@ -120,6 +121,15 @@ struct xe_oa {
 	/** @xe: back pointer to xe device */
 	struct xe_device *xe;
 
+	/** @metrics_kobj: kobj for metrics sysfs */
+	struct kobject *metrics_kobj;
+
+	/** @metrics_lock: lock protecting add/remove configs */
+	struct mutex metrics_lock;
+
+	/** @metrics_idr: List of dynamic configurations (struct xe_oa_config) */
+	struct idr metrics_idr;
+
 	/** @oa_formats: tracks all OA formats across platforms */
 	const struct xe_oa_format *oa_formats;
 
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index 37538e98dcc04..2aee4c7989486 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -6,11 +6,25 @@
 #include <linux/errno.h>
 #include <linux/sysctl.h>
 
+#include "xe_oa.h"
 #include "xe_perf.h"
 
 u32 xe_perf_stream_paranoid = true;
 static struct ctl_table_header *sysctl_header;
 
+static int xe_oa_ioctl(struct drm_device *dev, struct drm_xe_perf_param *arg,
+		       struct drm_file *file)
+{
+	switch (arg->perf_op) {
+	case DRM_XE_PERF_OP_ADD_CONFIG:
+		return xe_oa_add_config_ioctl(dev, (void *)arg->param, file);
+	case DRM_XE_PERF_OP_REMOVE_CONFIG:
+		return xe_oa_remove_config_ioctl(dev, (void *)arg->param, file);
+	default:
+		return -EINVAL;
+	}
+}
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,6 +33,8 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 
 	switch (arg->perf_type) {
+	case DRM_XE_PERF_TYPE_OA:
+		return xe_oa_ioctl(dev, arg, file);
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 778862a5b76d4..f17134828c093 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1126,6 +1126,7 @@ struct drm_xe_wait_user_fence {
  * enum drm_xe_perf_type - Perf stream types
  */
 enum drm_xe_perf_type {
+	DRM_XE_PERF_TYPE_OA,
 	DRM_XE_PERF_TYPE_MAX,
 };
 
@@ -1191,6 +1192,30 @@ enum drm_xe_oa_format_type {
 	DRM_XE_OA_FMT_TYPE_PEC,
 };
 
+/**
+ * struct drm_xe_oa_config - OA metric configuration
+ *
+ * Multiple OA configs can be added using @DRM_XE_PERF_OP_ADD_CONFIG. A
+ * particular config can be specified when opening an OA stream using
+ * @DRM_XE_OA_PROPERTY_OA_METRIC_SET property.
+ */
+struct drm_xe_oa_config {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @uuid: String formatted like "%\08x-%\04x-%\04x-%\04x-%\012x" */
+	char uuid[36];
+
+	/** @n_regs: Number of regs in @regs_ptr */
+	__u32 n_regs;
+
+	/**
+	 * @regs_ptr: Pointer to (register address, value) pairs for OA config
+	 * registers. Expected length of buffer is: (2 * sizeof(u32) * @n_regs).
+	 */
+	__u64 regs_ptr;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.41.0



* [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (5 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-09 22:53   ` Dixit, Ashutosh
                     ` (2 more replies)
  2023-12-08  6:43 ` [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG) Ashutosh Dixit
                   ` (10 subsequent siblings)
  17 siblings, 3 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Properties for OA streams are specified by user space, when the stream is
opened, as a chain of drm_xe_ext_set_property structs. Parse and validate
these stream properties.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_oa.c   | 372 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa.h   |   2 +
 drivers/gpu/drm/xe/xe_perf.c |   2 +
 include/uapi/drm/xe_drm.h    | 114 +++++++++++
 4 files changed, 490 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 6a903bf4f87d1..9b0bd58fcbc06 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -3,10 +3,13 @@
  * Copyright © 2023 Intel Corporation
  */
 
+#include <linux/nospec.h>
 #include <linux/sysctl.h>
 
+#include "regs/xe_gt_regs.h"
 #include "regs/xe_oa_regs.h"
 #include "xe_device.h"
+#include "xe_exec_queue.h"
 #include "xe_gt.h"
 #include "xe_mmio.h"
 #include "xe_oa.h"
@@ -46,6 +49,20 @@ struct xe_oa_config {
 	struct rcu_head rcu;
 };
 
+struct xe_oa_open_param {
+	u32 oa_unit_id;
+	bool sample;
+	u32 metric_set;
+	enum xe_oa_format_name oa_format;
+	int period_exponent;
+	u32 poll_period_us;
+	u32 open_flags;
+	int exec_queue_id;
+	int engine_instance;
+	struct xe_exec_queue *exec_q;
+	struct xe_hw_engine *hwe;
+};
+
 #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
 
 static const struct xe_oa_format oa_formats[] = {
@@ -88,6 +105,361 @@ static void xe_oa_config_put(struct xe_oa_config *oa_config)
 	kref_put(&oa_config->ref, xe_oa_config_release);
 }
 
+/*
+ * OA timestamp frequency = CS timestamp frequency on most platforms. On some
+ * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
+ * cases, return the adjusted CS timestamp frequency to the user.
+ */
+u32 xe_oa_timestamp_frequency(struct xe_gt *gt)
+{
+	u32 reg, shift;
+
+	/*
+	 * Wa_18013179988:dg2
+	 * Wa_14015568240:pvc
+	 * Wa_14015846243:mtl
+	 */
+	switch (gt_to_xe(gt)->info.platform) {
+	case XE_DG2:
+	case XE_PVC:
+	case XE_METEORLAKE:
+		xe_device_mem_access_get(gt_to_xe(gt));
+		reg = xe_mmio_read32(gt, RPM_CONFIG0);
+		xe_device_mem_access_put(gt_to_xe(gt));
+
+		shift = REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);
+		return gt->info.reference_clock << (3 - shift);
+
+	default:
+		return gt->info.reference_clock;
+	}
+}
+
+static u64 oa_exponent_to_ns(struct xe_gt *gt, int exponent)
+{
+	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
+	u32 den = xe_oa_timestamp_frequency(gt);
+
+	return div_u64(nom + den - 1, den);
+}
+
+static bool engine_supports_oa_format(const struct xe_hw_engine *hwe, int type)
+{
+	switch (hwe->oa_unit->type) {
+	case DRM_XE_OA_UNIT_TYPE_OAG:
+		return type == DRM_XE_OA_FMT_TYPE_OAG || type == DRM_XE_OA_FMT_TYPE_OAR ||
+			type == DRM_XE_OA_FMT_TYPE_OAC || type == DRM_XE_OA_FMT_TYPE_PEC;
+	case DRM_XE_OA_UNIT_TYPE_OAM:
+		return type == DRM_XE_OA_FMT_TYPE_OAM || type == DRM_XE_OA_FMT_TYPE_OAM_MPEC;
+	default:
+		return false;
+	}
+}
+
+static int decode_oa_format(struct xe_oa *oa, u64 fmt, enum xe_oa_format_name *name)
+{
+	u32 counter_size = FIELD_GET(DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE, fmt);
+	u32 counter_sel = FIELD_GET(DRM_XE_OA_FORMAT_MASK_COUNTER_SEL, fmt);
+	u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
+	u32 type = FIELD_GET(DRM_XE_OA_FORMAT_MASK_FMT_TYPE, fmt);
+	int idx;
+
+	for_each_set_bit(idx, oa->format_mask, XE_OA_FORMAT_MAX) {
+		const struct xe_oa_format *f = &oa->oa_formats[idx];
+
+		if (counter_size == f->counter_size && bc_report == f->bc_report &&
+		    type == f->type && counter_sel == f->counter_select) {
+			*name = idx;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
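The inverse operation, packing a format for the OA format property, can be sketched in userspace as below. The field layout mirrors the DRM_XE_OA_FORMAT_MASK_* defines in the uapi hunk of this patch; the field values themselves come from the Bspec and the arguments used here are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the four quantities decode_oa_format() above extracts: format type
 * in bits 0-7, counter select in 8-15, counter size in 16-23 and BC report
 * in 24-31.
 */
static uint64_t oa_pack_format(uint8_t fmt_type, uint8_t counter_sel,
			       uint8_t counter_size, uint8_t bc_report)
{
	return (uint64_t)fmt_type |
	       ((uint64_t)counter_sel << 8) |
	       ((uint64_t)counter_size << 16) |
	       ((uint64_t)bc_report << 24);
}
```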
+u16 xe_oa_unit_id(struct xe_hw_engine *hwe)
+{
+	return hwe->oa_unit && hwe->oa_unit->num_engines ?
+		hwe->oa_unit->oa_unit_id : U16_MAX;
+}
+
+static int xe_oa_assign_hwe(struct xe_oa *oa, struct xe_oa_open_param *param)
+{
+	struct xe_gt *gt;
+	int i, ret = 0;
+
+	if (param->exec_q) {
+		/* When we have an exec_q, get hwe from the exec_q */
+		for_each_gt(gt, oa->xe, i) {
+			param->hwe = xe_gt_hw_engine(gt, param->exec_q->class,
+						     param->engine_instance, true);
+			if (param->hwe)
+				break;
+		}
+		if (param->hwe && (xe_oa_unit_id(param->hwe) != param->oa_unit_id)) {
+			drm_dbg(&oa->xe->drm, "OA unit ID mismatch for exec_q\n");
+			ret = -EINVAL;
+		}
+	} else {
+		struct xe_hw_engine *hwe;
+		enum xe_hw_engine_id id;
+
+		/* Else just get the first hwe attached to the oa unit */
+		for_each_gt(gt, oa->xe, i) {
+			for_each_hw_engine(hwe, gt, id) {
+				if (xe_oa_unit_id(hwe) == param->oa_unit_id) {
+					param->hwe = hwe;
+					goto out;
+				}
+			}
+		}
+	}
+out:
+	if (!param->hwe) {
+		drm_dbg(&oa->xe->drm, "Unable to find hwe for OA unit ID %d\n",
+			param->oa_unit_id);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int xe_oa_set_prop_oa_unit_id(struct xe_oa *oa, u64 value,
+				     struct xe_oa_open_param *param)
+{
+	if (value >= oa->oa_unit_ids) {
+		drm_dbg(&oa->xe->drm, "OA unit ID out of range %llu\n", value);
+		return -EINVAL;
+	}
+	param->oa_unit_id = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_sample_oa(struct xe_oa *oa, u64 value,
+				    struct xe_oa_open_param *param)
+{
+	param->sample = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_metric_set(struct xe_oa *oa, u64 value,
+				     struct xe_oa_open_param *param)
+{
+	param->metric_set = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_oa_format(struct xe_oa *oa, u64 value,
+				    struct xe_oa_open_param *param)
+{
+	int ret = decode_oa_format(oa, value, &param->oa_format);
+
+	if (ret) {
+		drm_dbg(&oa->xe->drm, "Unsupported OA report format %#llx\n", value);
+		return ret;
+	}
+	return 0;
+}
+
+static int xe_oa_set_prop_oa_exponent(struct xe_oa *oa, u64 value,
+				      struct xe_oa_open_param *param)
+{
+#define OA_EXPONENT_MAX 31
+
+	if (value > OA_EXPONENT_MAX) {
+		drm_dbg(&oa->xe->drm, "OA timer exponent too high (> %u)\n", OA_EXPONENT_MAX);
+		return -EINVAL;
+	}
+	param->period_exponent = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_poll_oa_period(struct xe_oa *oa, u64 value,
+					 struct xe_oa_open_param *param)
+{
+	if (value < 100) {
+		drm_dbg(&oa->xe->drm, "OA timer too small (%lluus < 100us)\n", value);
+		return -EINVAL;
+	}
+	param->poll_period_us = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_open_flags(struct xe_oa *oa, u64 value,
+				     struct xe_oa_open_param *param)
+{
+	u32 known_open_flags =
+		DRM_XE_OA_FLAG_FD_CLOEXEC | DRM_XE_OA_FLAG_FD_NONBLOCK | DRM_XE_OA_FLAG_DISABLED;
+
+	if (value & ~known_open_flags) {
+		drm_dbg(&oa->xe->drm, "Unknown open_flag %#llx\n", value);
+		return -EINVAL;
+	}
+	param->open_flags = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_exec_queue_id(struct xe_oa *oa, u64 value,
+					struct xe_oa_open_param *param)
+{
+	param->exec_queue_id = value;
+	return 0;
+}
+
+static int xe_oa_set_prop_engine_instance(struct xe_oa *oa, u64 value,
+					  struct xe_oa_open_param *param)
+{
+	param->engine_instance = value;
+	return 0;
+}
+
+typedef int (*xe_oa_set_property_fn)(struct xe_oa *oa, u64 value,
+				     struct xe_oa_open_param *param);
+static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = {
+	[DRM_XE_OA_PROPERTY_OA_UNIT_ID] = xe_oa_set_prop_oa_unit_id,
+	[DRM_XE_OA_PROPERTY_SAMPLE_OA] = xe_oa_set_prop_sample_oa,
+	[DRM_XE_OA_PROPERTY_OA_METRIC_SET] = xe_oa_set_prop_metric_set,
+	[DRM_XE_OA_PROPERTY_OA_FORMAT] = xe_oa_set_prop_oa_format,
+	[DRM_XE_OA_PROPERTY_OA_EXPONENT] = xe_oa_set_prop_oa_exponent,
+	[DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US] = xe_oa_set_prop_poll_oa_period,
+	[DRM_XE_OA_PROPERTY_OPEN_FLAGS] = xe_oa_set_prop_open_flags,
+	[DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID] = xe_oa_set_prop_exec_queue_id,
+	[DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE] = xe_oa_set_prop_engine_instance,
+};
+
+static int xe_oa_user_ext_set_property(struct xe_oa *oa, u64 extension,
+				       struct xe_oa_open_param *param)
+{
+	u64 __user *address = u64_to_user_ptr(extension);
+	struct drm_xe_ext_set_property ext;
+	int err;
+	u32 idx;
+
+	err = __copy_from_user(&ext, address, sizeof(ext));
+	if (XE_IOCTL_DBG(oa->xe, err))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(oa->xe, ext.property >= ARRAY_SIZE(xe_oa_set_property_funcs)) ||
+	    XE_IOCTL_DBG(oa->xe, ext.pad))
+		return -EINVAL;
+
+	idx = array_index_nospec(ext.property, ARRAY_SIZE(xe_oa_set_property_funcs));
+	return xe_oa_set_property_funcs[idx](oa, ext.value, param);
+}
+
+typedef int (*xe_oa_user_extension_fn)(struct xe_oa *oa, u64 extension,
+				       struct xe_oa_open_param *param);
+static const xe_oa_user_extension_fn xe_oa_user_extension_funcs[] = {
+	[DRM_XE_OA_EXTENSION_SET_PROPERTY] = xe_oa_user_ext_set_property,
+};
+
+static int xe_oa_user_extensions(struct xe_oa *oa, u64 extension,
+				 struct xe_oa_open_param *param)
+{
+	u64 __user *address = u64_to_user_ptr(extension);
+	struct xe_user_extension ext;
+	int err;
+	u32 idx;
+
+	err = __copy_from_user(&ext, address, sizeof(ext));
+	if (XE_IOCTL_DBG(oa->xe, err))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(oa->xe, ext.pad) ||
+	    XE_IOCTL_DBG(oa->xe, ext.name >= ARRAY_SIZE(xe_oa_user_extension_funcs)))
+		return -EINVAL;
+
+	idx = array_index_nospec(ext.name, ARRAY_SIZE(xe_oa_user_extension_funcs));
+	err = xe_oa_user_extension_funcs[idx](oa, extension, param);
+	if (XE_IOCTL_DBG(oa->xe, err))
+		return err;
+
+	if (ext.next_extension)
+		return xe_oa_user_extensions(oa, ext.next_extension, param);
+
+	return 0;
+}
+
+int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct xe_oa *oa = &to_xe_device(dev)->oa;
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_oa_open_param dparam;
+	struct xe_oa_open_param param = {};
+	const struct xe_oa_format *f;
+	bool privileged_op = true;
+	int ret;
+
+	if (!oa->xe) {
+		drm_dbg(dev, "xe oa interface not available for this system\n");
+		return -ENODEV;
+	}
+
+	ret = __copy_from_user(&dparam, data, sizeof(dparam));
+	if (XE_IOCTL_DBG(oa->xe, ret))
+		return -EFAULT;
+
+	ret = xe_oa_user_extensions(oa, dparam.extensions, &param);
+	if (ret)
+		return ret;
+
+	if (param.exec_queue_id > 0) {
+		param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id);
+		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
+			return -ENOENT;
+	}
+
+	/*
+	 * Query based sampling (using MI_REPORT_PERF_COUNT) with OAR/OAC,
+	 * without global stream access, can be an unprivileged operation
+	 */
+	if (param.exec_q && !param.sample)
+		privileged_op = false;
+
+	if (privileged_op && xe_perf_stream_paranoid && !perfmon_capable()) {
+		drm_dbg(&oa->xe->drm, "Insufficient privileges to open xe perf stream\n");
+		ret = -EACCES;
+		goto err_exec_q;
+	}
+
+	if (!param.exec_q && !param.sample) {
+		drm_dbg(&oa->xe->drm, "Only OA report sampling supported\n");
+		ret = -EINVAL;
+		goto err_exec_q;
+	}
+
+	ret = xe_oa_assign_hwe(oa, &param);
+	if (ret)
+		goto err_exec_q;
+
+	f = &oa->oa_formats[param.oa_format];
+	if (!param.oa_format || !f->size ||
+	    !engine_supports_oa_format(param.hwe, f->type)) {
+		drm_dbg(&oa->xe->drm, "Invalid OA format %d type %d size %d for class %d\n",
+			param.oa_format, f->type, f->size, param.hwe->class);
+		ret = -EINVAL;
+		goto err_exec_q;
+	}
+
+	if (param.period_exponent > 0) {
+		u64 oa_period, oa_freq_hz;
+
+		oa_period = oa_exponent_to_ns(param.hwe->gt, param.period_exponent);
+		oa_freq_hz = div64_u64(NSEC_PER_SEC, oa_period);
+		if (oa_freq_hz > xe_oa_max_sample_rate && !perfmon_capable()) {
+			drm_dbg(&oa->xe->drm,
+				"OA exponent would exceed the max sampling frequency (sysctl dev.xe.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
+				xe_oa_max_sample_rate);
+			ret = -EACCES;
+			goto err_exec_q;
+		}
+	}
+err_exec_q:
+	if (ret < 0 && param.exec_q)
+		xe_exec_queue_put(param.exec_q);
+	return ret;
+}
+
 static bool xe_oa_is_valid_flex_addr(struct xe_oa *oa, u32 addr)
 {
 	static const struct xe_reg flex_eu_regs[] = {
diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
index e4863f8681b14..a0f9a876ea6b4 100644
--- a/drivers/gpu/drm/xe/xe_oa.h
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -19,6 +19,8 @@ void xe_oa_unregister(struct xe_device *xe);
 int xe_oa_sysctl_register(void);
 void xe_oa_sysctl_unregister(void);
 
+int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data,
+			    struct drm_file *file);
 int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
 			   struct drm_file *file);
 int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index 2aee4c7989486..2c0615481b7df 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -16,6 +16,8 @@ static int xe_oa_ioctl(struct drm_device *dev, struct drm_xe_perf_param *arg,
 		       struct drm_file *file)
 {
 	switch (arg->perf_op) {
+	case DRM_XE_PERF_OP_STREAM_OPEN:
+		return xe_oa_stream_open_ioctl(dev, (void *)arg->param, file);
 	case DRM_XE_PERF_OP_ADD_CONFIG:
 		return xe_oa_add_config_ioctl(dev, (void *)arg->param, file);
 	case DRM_XE_PERF_OP_REMOVE_CONFIG:
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index f17134828c093..8156301df7315 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1192,6 +1192,120 @@ enum drm_xe_oa_format_type {
 	DRM_XE_OA_FMT_TYPE_PEC,
 };
 
+/** enum drm_xe_oa_property_id - OA stream property id's */
+enum drm_xe_oa_property_id {
+	/**
+	 * @DRM_XE_OA_PROPERTY_OA_UNIT_ID: ID of the OA unit on which to open
+	 * the OA stream, see @oa_unit_id in 'struct
+	 * drm_xe_query_oa_units'. Defaults to 0 if not provided.
+	 */
+	DRM_XE_OA_PROPERTY_OA_UNIT_ID = 1,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_SAMPLE_OA: A value of 1 requests the inclusion of
+	 * raw OA unit reports as part of stream samples.
+	 */
+	DRM_XE_OA_PROPERTY_SAMPLE_OA,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_OA_METRIC_SET: OA metrics defining contents of OA
+	 * reports, previously added via @DRM_XE_PERF_OP_ADD_CONFIG.
+	 */
+	DRM_XE_OA_PROPERTY_OA_METRIC_SET,
+
+	/** @DRM_XE_OA_PROPERTY_OA_FORMAT: Perf counter report format */
+	DRM_XE_OA_PROPERTY_OA_FORMAT,
+	/**
+	 * OA formats are specified the same way as in the Bspec, in terms of
+	 * the following quantities: a. enum @drm_xe_oa_format_type,
+	 * b. counter select, c. counter size and d. BC report
+	 */
+#define DRM_XE_OA_FORMAT_MASK_FMT_TYPE		(0xff << 0)
+#define DRM_XE_OA_FORMAT_MASK_COUNTER_SEL	(0xff << 8)
+#define DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE	(0xff << 16)
+#define DRM_XE_OA_FORMAT_MASK_BC_REPORT		(0xff << 24)
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_OA_EXPONENT: Requests periodic OA unit sampling
+	 * with sampling frequency proportional to 2^(period_exponent + 1)
+	 */
+	DRM_XE_OA_PROPERTY_OA_EXPONENT,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US: Timer interval in microseconds
+	 * to check OA buffer for available data. Minimum allowed value is 100
+	 * microseconds. A default value is used by the driver if this parameter
+	 * is skipped. Larger timer values will reduce CPU consumption during OA
+	 * perf captures, but excessively large values could result in data loss
+	 * due to OA buffer overwrites.
+	 */
+	DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
+	 * directly applied to returned OA fd. DISABLED opens the OA stream in a
+	 * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
+	 */
+	DRM_XE_OA_PROPERTY_OPEN_FLAGS,
+#define DRM_XE_OA_FLAG_FD_CLOEXEC	(1 << 0)
+#define DRM_XE_OA_FLAG_FD_NONBLOCK	(1 << 1)
+#define DRM_XE_OA_FLAG_DISABLED		(1 << 2)
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID: Open the stream for a specific
+	 * @exec_queue_id. Perf queries can be executed on this exec queue.
+	 */
+	DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID,
+
+	/**
+	 * @DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE: Optional engine instance to
+	 * pass along with @DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID. Defaults to 0 if
+	 * not provided.
+	 */
+	DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE,
+
+	DRM_XE_OA_PROPERTY_MAX /* non-ABI */
+};
+
+/**
+ * struct drm_xe_oa_open_param - Params for opening an OA stream
+ *
+ * Stream params are specified as a chain of @drm_xe_ext_set_property
+ * struct's, with @property values from enum @drm_xe_oa_property_id and
+ * @xe_user_extension base.name set to @DRM_XE_OA_EXTENSION_SET_PROPERTY
+ */
+struct drm_xe_oa_open_param {
+#define DRM_XE_OA_EXTENSION_SET_PROPERTY	0
+	/** @extensions: Pointer to the first extension struct */
+	__u64 extensions;
+};
+
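The property chaining described above can be illustrated with the following hedged sketch. The struct layouts are local stand-ins so the pattern compiles standalone; real code uses the definitions from `<drm/xe_drm.h>`, and the exact layout of `struct drm_xe_ext_set_property` at this uapi revision may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the uapi structs (illustrative only) */
struct xe_user_extension {
	uint64_t next_extension;	/* pointer to the next extension, or 0 */
	uint32_t name;			/* DRM_XE_OA_EXTENSION_SET_PROPERTY */
	uint32_t pad;			/* MBZ */
};

struct drm_xe_ext_set_property {
	struct xe_user_extension base;
	uint32_t property;		/* enum drm_xe_oa_property_id value */
	uint32_t pad;			/* MBZ */
	uint64_t value;
};

/* Link n property extensions into the chain the kernel walks recursively */
static void oa_chain_properties(struct drm_xe_ext_set_property *ext,
				const uint32_t *props, const uint64_t *vals,
				int n)
{
	int i;

	memset(ext, 0, sizeof(*ext) * n);
	for (i = 0; i < n; i++) {
		ext[i].property = props[i];
		ext[i].value = vals[i];
		if (i + 1 < n)
			ext[i].base.next_extension =
				(uint64_t)(uintptr_t)&ext[i + 1];
	}
}
```

The address of `ext[0]` is then passed in @extensions of struct drm_xe_oa_open_param.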
+/** enum drm_xe_oa_record_type - Type of OA packet read from OA fd */
+enum drm_xe_oa_record_type {
+	/** @DRM_XE_OA_RECORD_SAMPLE: Regular OA data sample */
+	DRM_XE_OA_RECORD_SAMPLE = 1,
+
+	/** @DRM_XE_OA_RECORD_OA_REPORT_LOST: Status indicating lost OA reports */
+	DRM_XE_OA_RECORD_OA_REPORT_LOST = 2,
+
+	/**
+	 * @DRM_XE_OA_RECORD_OA_BUFFER_LOST: Status indicating lost OA
+	 * reports and OA buffer reset in the process
+	 */
+	DRM_XE_OA_RECORD_OA_BUFFER_LOST = 3,
+
+	DRM_XE_OA_RECORD_MAX /* non-ABI */
+};
+
+/** struct drm_xe_oa_record_header - Header for OA packets read from OA fd */
+struct drm_xe_oa_record_header {
+	/** @type: One of enum @drm_xe_oa_record_type */
+	__u16 type;
+	/** @pad: MBZ */
+	__u16 pad;
+	/** @size: size in bytes */
+	__u32 size;
+};
+
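Records read from the OA fd are each prefixed by the header above, so a buffer returned by read() can be walked record by record. A hedged sketch (assuming, per the driver's sample_size accounting, that @size covers the header plus payload; the struct is a local stand-in for the uapi definition):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the uapi header above */
struct drm_xe_oa_record_header {
	uint16_t type;
	uint16_t pad;
	uint32_t size;	/* total record size in bytes, header included */
};

/* Count well-formed header-prefixed records in a read() buffer */
static int oa_count_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct drm_xe_oa_record_header) <= len) {
		const struct drm_xe_oa_record_header *hdr =
			(const void *)(buf + off);

		if (hdr->size < sizeof(*hdr) || off + hdr->size > len)
			break;	/* truncated or malformed record */
		off += hdr->size;
		n++;
	}
	return n;
}
```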
 /**
  * struct drm_xe_oa_config - OA metric configuration
  *
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG)
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (6 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  2:31   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd Ashutosh Dixit
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Implement the majority of OA stream initialization (as part of the OA stream
open ioctl). The OAG buffer is allocated for receiving perf counter samples
from HW. The OAG unit is initialized and the selected OA metric configuration
is programmed into the OAG unit HW using a command/batch buffer.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/regs/xe_gt_regs.h |   3 +
 drivers/gpu/drm/xe/xe_oa.c           | 397 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa_types.h     |  82 ++++++
 3 files changed, 482 insertions(+)

diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index d318ec0efd7db..1b98b609f7fda 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -156,6 +156,8 @@
 
 #define SQCNT1					XE_REG_MCR(0x8718)
 #define XELPMP_SQCNT1				XE_REG(0x8718)
+#define   SQCNT1_PMON_ENABLE			REG_BIT(30)
+#define   SQCNT1_OABPC				REG_BIT(29)
 #define   ENFORCE_RAR				REG_BIT(23)
 
 #define XEHP_SQCM				XE_REG_MCR(0x8724)
@@ -365,6 +367,7 @@
 #define ROW_CHICKEN				XE_REG_MCR(0xe4f0, XE_REG_OPTION_MASKED)
 #define   UGM_BACKUP_MODE			REG_BIT(13)
 #define   MDQ_ARBITRATION_MODE			REG_BIT(12)
+#define   STALL_DOP_GATING_DISABLE		REG_BIT(5)
 #define   EARLY_EOT_DIS				REG_BIT(1)
 
 #define ROW_CHICKEN2				XE_REG_MCR(0xe4f4, XE_REG_OPTION_MASKED)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 9b0bd58fcbc06..d898610322d50 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -6,15 +6,26 @@
 #include <linux/nospec.h>
 #include <linux/sysctl.h>
 
+#include <drm/drm_drv.h>
+#include <drm/xe_drm.h>
+
+#include "instructions/xe_mi_commands.h"
 #include "regs/xe_gt_regs.h"
 #include "regs/xe_oa_regs.h"
 #include "xe_device.h"
 #include "xe_exec_queue.h"
+#include "xe_bb.h"
+#include "xe_bo.h"
 #include "xe_gt.h"
+#include "xe_gt_mcr.h"
 #include "xe_mmio.h"
 #include "xe_oa.h"
+#include "xe_sched_job.h"
 #include "xe_perf.h"
 
+#define DEFAULT_POLL_FREQUENCY_HZ 200
+#define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
+
 static int xe_oa_sample_rate_hard_limit;
 static u32 xe_oa_max_sample_rate = 100000;
 
@@ -63,6 +74,13 @@ struct xe_oa_open_param {
 	struct xe_hw_engine *hwe;
 };
 
+struct xe_oa_config_bo {
+	struct llist_node node;
+
+	struct xe_oa_config *oa_config;
+	struct xe_bb *bb;
+};
+
 #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
 
 static const struct xe_oa_format oa_formats[] = {
@@ -105,6 +123,381 @@ static void xe_oa_config_put(struct xe_oa_config *oa_config)
 	kref_put(&oa_config->ref, xe_oa_config_release);
 }
 
+static struct xe_oa_config *xe_oa_config_get(struct xe_oa_config *oa_config)
+{
+	return kref_get_unless_zero(&oa_config->ref) ? oa_config : NULL;
+}
+
+static struct xe_oa_config *xe_oa_get_oa_config(struct xe_oa *oa, int metrics_set)
+{
+	struct xe_oa_config *oa_config;
+
+	rcu_read_lock();
+	oa_config = idr_find(&oa->metrics_idr, metrics_set);
+	if (oa_config)
+		oa_config = xe_oa_config_get(oa_config);
+	rcu_read_unlock();
+
+	return oa_config;
+}
+
+static void free_oa_config_bo(struct xe_oa_config_bo *oa_bo)
+{
+	xe_oa_config_put(oa_bo->oa_config);
+	xe_bb_free(oa_bo->bb, NULL);
+	kfree(oa_bo);
+}
+
+static const struct xe_oa_regs *__oa_regs(struct xe_oa_stream *stream)
+{
+	return &stream->hwe->oa_unit->regs;
+}
+
+static int xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
+{
+	struct xe_sched_job *job;
+	struct dma_fence *fence;
+	long timeout;
+	int err = 0;
+
+	/* Kernel configuration is issued on stream->k_exec_q, not stream->exec_q */
+	job = xe_bb_create_job(stream->k_exec_q, bb);
+	if (IS_ERR(job)) {
+		err = PTR_ERR(job);
+		goto exit;
+	}
+
+	xe_sched_job_arm(job);
+	fence = dma_fence_get(&job->drm.s_fence->finished);
+	xe_sched_job_push(job);
+
+	timeout = dma_fence_wait_timeout(fence, false, HZ);
+	dma_fence_put(fence);
+	if (timeout < 0)
+		err = timeout;
+	else if (!timeout)
+		err = -ETIME;
+exit:
+	return err;
+}
+
+static void xe_oa_free_oa_buffer(struct xe_oa_stream *stream)
+{
+	xe_bo_unpin_map_no_vm(stream->oa_buffer.bo);
+}
+
+static void xe_oa_free_configs(struct xe_oa_stream *stream)
+{
+	struct xe_oa_config_bo *oa_bo, *tmp;
+
+	xe_oa_config_put(stream->oa_config);
+	llist_for_each_entry_safe(oa_bo, tmp, stream->oa_config_bos.first, node)
+		free_oa_config_bo(oa_bo);
+}
+
+#define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
+
+static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
+{
+	u32 sqcnt1;
+
+	/*
+	 * Wa_1508761755:xehpsdv, dg2
+	 * Enable thread stall DOP gating and EU DOP gating.
+	 */
+	if (stream->oa->xe->info.platform == XE_DG2) {
+		xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN,
+					  _MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));
+		xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN2,
+					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
+	}
+
+	/* Make sure we disable noa to save power. */
+	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
+
+	sqcnt1 = SQCNT1_PMON_ENABLE |
+		 (HAS_OA_BPC_REPORTING(stream->oa->xe) ? SQCNT1_OABPC : 0);
+
+	/* Reset PMON Enable to save power. */
+	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, sqcnt1, 0);
+}
+
+static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
+{
+	struct xe_bo *bo;
+
+	BUILD_BUG_ON_NOT_POWER_OF_2(XE_OA_BUFFER_SIZE);
+	BUILD_BUG_ON(XE_OA_BUFFER_SIZE < SZ_128K || XE_OA_BUFFER_SIZE > SZ_16M);
+
+	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
+				  XE_OA_BUFFER_SIZE, ttm_bo_type_kernel,
+				  XE_BO_CREATE_SYSTEM_BIT | XE_BO_CREATE_GGTT_BIT);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	stream->oa_buffer.bo = bo;
+	stream->oa_buffer.vaddr = bo->vmap.vaddr;
+	return 0;
+}
+
+static void write_cs_mi_lri(struct xe_bb *bb, const struct xe_oa_reg *reg_data, u32 n_regs)
+{
+	u32 i;
+
+#define MI_LOAD_REGISTER_IMM_MAX_REGS (126)
+
+	for (i = 0; i < n_regs; i++) {
+		if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) {
+			u32 n_lri = min_t(u32, n_regs - i,
+					  MI_LOAD_REGISTER_IMM_MAX_REGS);
+
+			bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(n_lri);
+		}
+		bb->cs[bb->len++] = reg_data[i].addr.addr;
+		bb->cs[bb->len++] = reg_data[i].value;
+	}
+}
+
+static int num_lri_dwords(int num_regs)
+{
+	int count = 0;
+
+	if (num_regs > 0) {
+		count += DIV_ROUND_UP(num_regs, MI_LOAD_REGISTER_IMM_MAX_REGS);
+		count += num_regs * 2;
+	}
+
+	return count;
+}
+
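The dword accounting above can be sanity-checked standalone: each batch of up to 126 registers costs one MI_LOAD_REGISTER_IMM header dword plus an (address, value) dword pair per register. A copy of the helper for illustration:

```c
#include <assert.h>

#define MI_LRI_MAX_REGS 126	/* mirrors MI_LOAD_REGISTER_IMM_MAX_REGS */

/* Standalone copy of num_lri_dwords() above */
static int num_lri_dwords(int num_regs)
{
	int count = 0;

	if (num_regs > 0) {
		/* one MI_LRI header per batch of up to 126 registers */
		count += (num_regs + MI_LRI_MAX_REGS - 1) / MI_LRI_MAX_REGS;
		/* an (address, value) dword pair per register */
		count += num_regs * 2;
	}
	return count;
}
```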
+static struct xe_oa_config_bo *
+__xe_oa_alloc_config_buffer(struct xe_oa_stream *stream, struct xe_oa_config *oa_config)
+{
+	struct xe_oa_config_bo *oa_bo;
+	size_t config_length;
+	struct xe_bb *bb;
+
+	oa_bo = kzalloc(sizeof(*oa_bo), GFP_KERNEL);
+	if (!oa_bo)
+		return ERR_PTR(-ENOMEM);
+
+	config_length = num_lri_dwords(oa_config->regs_len);
+	config_length = ALIGN(sizeof(u32) * config_length, XE_PAGE_SIZE) / sizeof(u32);
+
+	bb = xe_bb_new(stream->gt, config_length, false);
+	if (IS_ERR(bb))
+		goto err_free;
+
+	write_cs_mi_lri(bb, oa_config->regs, oa_config->regs_len);
+
+	oa_bo->bb = bb;
+	oa_bo->oa_config = xe_oa_config_get(oa_config);
+	llist_add(&oa_bo->node, &stream->oa_config_bos);
+
+	return oa_bo;
+err_free:
+	kfree(oa_bo);
+	return ERR_CAST(bb);
+}
+
+static struct xe_oa_config_bo *xe_oa_alloc_config_buffer(struct xe_oa_stream *stream)
+{
+	struct xe_oa_config *oa_config = stream->oa_config;
+	struct xe_oa_config_bo *oa_bo;
+
+	/* Look for the buffer in the already allocated BOs attached to the stream */
+	llist_for_each_entry(oa_bo, stream->oa_config_bos.first, node) {
+		if (oa_bo->oa_config == oa_config &&
+		    memcmp(oa_bo->oa_config->uuid, oa_config->uuid,
+			   sizeof(oa_config->uuid)) == 0)
+			goto out;
+	}
+
+	oa_bo = __xe_oa_alloc_config_buffer(stream, oa_config);
+out:
+	return oa_bo;
+}
+
+static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
+{
+#define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
+	struct xe_oa_config_bo *oa_bo;
+	int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
+
+	oa_bo = xe_oa_alloc_config_buffer(stream);
+	if (IS_ERR(oa_bo)) {
+		err = PTR_ERR(oa_bo);
+		goto exit;
+	}
+
+	err = xe_oa_submit_bb(stream, oa_bo->bb);
+
+	/* Additional empirical delay needed for NOA programming after registers are written */
+	usleep_range(us, 2 * us);
+exit:
+	return err;
+}
+
+static u32 oag_report_ctx_switches(const struct xe_oa_stream *stream)
+{
+	/* If user didn't require OA reports, ask HW not to emit ctx switch reports */
+	return _MASKED_FIELD(OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS,
+			     stream->sample ?
+			     0 : OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS);
+}
+
+static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
+{
+	u32 oa_debug, sqcnt1;
+
+	/*
+	 * Wa_1508761755:xehpsdv, dg2
+	 * EU NOA signals behave incorrectly if EU clock gating is enabled.
+	 * Disable thread stall DOP gating and EU DOP gating.
+	 */
+	if (stream->oa->xe->info.platform == XE_DG2) {
+		xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN,
+					  _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
+		xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN2,
+					  _MASKED_BIT_ENABLE(DISABLE_DOP_GATING));
+	}
+
+	/* Disable clk ratio reports */
+	oa_debug = OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS |
+		OAG_OA_DEBUG_INCLUDE_CLK_RATIO;
+
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_debug,
+			_MASKED_BIT_ENABLE(oa_debug) |
+			oag_report_ctx_switches(stream));
+
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctx_ctrl, stream->periodic ?
+			(OAG_OAGLBCTXCTRL_COUNTER_RESUME |
+			 OAG_OAGLBCTXCTRL_TIMER_ENABLE |
+			 REG_FIELD_PREP(OAG_OAGLBCTXCTRL_TIMER_PERIOD_MASK,
+					stream->period_exponent)) : 0);
+
+	/*
+	 * Initialize Super Queue Internal Cnt Register
+	 * Set PMON Enable in order to collect valid metrics
+	 * Enable bytes per clock reporting
+	 */
+	sqcnt1 = SQCNT1_PMON_ENABLE |
+		 (HAS_OA_BPC_REPORTING(stream->oa->xe) ? SQCNT1_OABPC : 0);
+
+	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, 0, sqcnt1);
+
+	return xe_oa_emit_oa_config(stream);
+}
+
+static int xe_oa_stream_init(struct xe_oa_stream *stream,
+			     struct xe_oa_open_param *param)
+{
+	struct xe_oa_unit *u = param->hwe->oa_unit;
+	struct xe_gt *gt = param->hwe->gt;
+	int ret;
+
+	stream->exec_q = param->exec_q;
+	stream->poll_period_ns = param->poll_period_us ?
+		param->poll_period_us * NSEC_PER_USEC : DEFAULT_POLL_PERIOD_NS;
+	stream->hwe = param->hwe;
+	stream->gt = stream->hwe->gt;
+	stream->sample_size = sizeof(struct drm_xe_oa_record_header);
+	stream->oa_buffer.format = &stream->oa->oa_formats[param->oa_format];
+
+	stream->sample = param->sample;
+	stream->sample_size += stream->oa_buffer.format->size;
+	stream->periodic = param->period_exponent > 0;
+	stream->period_exponent = param->period_exponent;
+
+	stream->oa_config = xe_oa_get_oa_config(stream->oa, param->metric_set);
+	if (!stream->oa_config) {
+		drm_dbg(&stream->oa->xe->drm, "Invalid OA config id=%i\n", param->metric_set);
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	ret = xe_oa_alloc_oa_buffer(stream);
+	if (ret)
+		goto err_free_configs;
+
+	/* Take runtime pm ref and forcewake to disable RC6 */
+	xe_device_mem_access_get(stream->oa->xe);
+	XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL));
+
+	stream->k_exec_q = xe_exec_queue_create(stream->oa->xe, NULL,
+						BIT(stream->hwe->logical_instance), 1,
+						stream->hwe, EXEC_QUEUE_FLAG_KERNEL);
+	if (IS_ERR(stream->k_exec_q)) {
+		ret = PTR_ERR(stream->k_exec_q);
+		drm_err(&stream->oa->xe->drm, "gt%d, hwe %s, xe_exec_queue_create failed=%d",
+			stream->gt->info.id, stream->hwe->name, ret);
+		goto err_fw_put;
+	}
+
+	ret = xe_oa_enable_metric_set(stream);
+	if (ret) {
+		drm_dbg(&stream->oa->xe->drm, "Unable to enable metric set\n");
+		goto err_put_k_exec_q;
+	}
+
+	drm_dbg(&stream->oa->xe->drm, "opening stream oa config uuid=%s\n",
+		stream->oa_config->uuid);
+
+	WRITE_ONCE(u->exclusive_stream, stream);
+
+	spin_lock_init(&stream->oa_buffer.ptr_lock);
+	mutex_init(&stream->stream_lock);
+
+	return 0;
+
+err_put_k_exec_q:
+	xe_oa_disable_metric_set(stream);
+	xe_exec_queue_put(stream->k_exec_q);
+err_fw_put:
+	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
+	xe_device_mem_access_put(stream->oa->xe);
+	xe_oa_free_oa_buffer(stream);
+err_free_configs:
+	xe_oa_free_configs(stream);
+exit:
+	return ret;
+}
+
+static int xe_oa_stream_open_ioctl_locked(struct xe_oa *oa,
+					  struct xe_oa_open_param *param)
+{
+	struct xe_oa_stream *stream;
+	int stream_fd;
+	int ret;
+
+	/* We currently only allow exclusive access */
+	if (param->hwe->oa_unit->exclusive_stream) {
+		drm_dbg(&oa->xe->drm, "OA unit already in use\n");
+		ret = -EBUSY;
+		goto exit;
+	}
+
+	stream = kzalloc(sizeof(*stream), GFP_KERNEL);
+	if (!stream) {
+		ret = -ENOMEM;
+		goto exit;
+	}
+
+	stream->oa = oa;
+	ret = xe_oa_stream_init(stream, param);
+	if (ret)
+		goto err_free;
+
+	/* Hold a reference on the drm device till stream_fd is released */
+	drm_dev_get(&stream->oa->xe->drm);
+
+	return stream_fd;
+err_free:
+	kfree(stream);
+exit:
+	return ret;
+}
+
 /*
  * OA timestamp frequency = CS timestamp frequency in most platforms. On some
  * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
@@ -454,6 +847,10 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file
 			goto err_exec_q;
 		}
 	}
+
+	mutex_lock(&param.hwe->gt->oa.gt_lock);
+	ret = xe_oa_stream_open_ioctl_locked(oa, &param);
+	mutex_unlock(&param.hwe->gt->oa.gt_lock);
 err_exec_q:
 	if (ret < 0 && param.exec_q)
 		xe_exec_queue_put(param.exec_q);
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
index 2985443df3080..05047226af8d1 100644
--- a/drivers/gpu/drm/xe/xe_oa_types.h
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -14,6 +14,8 @@
 #include <drm/xe_drm.h>
 #include "regs/xe_reg_defs.h"
 
+#define XE_OA_BUFFER_SIZE SZ_16M
+
 enum xe_oa_report_header {
 	HDR_32_BIT = 0,
 	HDR_64_BIT,
@@ -141,4 +143,84 @@ struct xe_oa {
 	/** @oa_unit_ids: tracks oa unit ids assigned across gt's */
 	u16 oa_unit_ids;
 };
+
+/**
+ * struct oa_buffer - State of the stream OA buffer
+ */
+struct oa_buffer {
+	/** @format: data format */
+	const struct xe_oa_format *format;
+
+	/** @bo: xe_bo backing the OA buffer */
+	struct xe_bo *bo;
+
+	/** @vaddr: mapped vaddr of the OA buffer */
+	u8 *vaddr;
+
+	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers */
+	spinlock_t ptr_lock;
+
+	/** @head: Cached head to read from */
+	u32 head;
+
+	/** @tail: The last verified cached tail where HW has completed writing */
+	u32 tail;
+};
+
+/**
+ * struct xe_oa_stream - state for a single open stream FD
+ */
+struct xe_oa_stream {
+	/** @oa: xe_oa backpointer */
+	struct xe_oa *oa;
+
+	/** @gt: gt associated with the oa stream */
+	struct xe_gt *gt;
+
+	/** @hwe: hardware engine associated with this oa stream */
+	struct xe_hw_engine *hwe;
+
+	/** @stream_lock: Lock serializing stream operations */
+	struct mutex stream_lock;
+
+	/** @sample: true if DRM_XE_OA_PROPERTY_SAMPLE_OA is provided */
+	bool sample;
+
+	/** @sample_size: Size of an OA record/packet plus the header */
+	int sample_size;
+
+	/** @exec_q: Exec queue corresponding to DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID */
+	struct xe_exec_queue *exec_q;
+
+	/** @k_exec_q: kernel exec_q used for OA programming batch submissions */
+	struct xe_exec_queue *k_exec_q;
+
+	/** @enabled: Whether the stream is currently enabled */
+	bool enabled;
+
+	/** @oa_config: OA configuration used by the stream */
+	struct xe_oa_config *oa_config;
+
+	/** @oa_config_bos: List of struct @xe_oa_config_bo's */
+	struct llist_head oa_config_bos;
+
+	/** @poll_check_timer: Timer to periodically check for data in the OA buffer */
+	struct hrtimer poll_check_timer;
+
+	/** @poll_wq: Wait queue for waiting for OA data to be available */
+	wait_queue_head_t poll_wq;
+
+	/** @pollin: Whether there is data available to read */
+	bool pollin;
+
+	/** @periodic: Whether periodic sampling is currently enabled */
+	bool periodic;
+
+	/** @period_exponent: OA unit sampling frequency is derived from this */
+	int period_exponent;
+
+	/** @oa_buffer: OA buffer for the stream */
+	struct oa_buffer oa_buffer;
+
+	/** @poll_period_ns: hrtimer period for checking OA buffer for available data */
+	u64 poll_period_ns;
+};
 #endif
-- 
2.41.0



* [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (7 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG) Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  2:52   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 10/17] drm/xe/oa/uapi: Read file_operation Ashutosh Dixit
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

The OA stream open perf op returns an fd with its own file_operations for
the newly initialized OA stream. These file_operations allow userspace to
enable or disable the stream, as well as apply a different metric
configuration for the OA stream. Userspace can also poll for data
availability. OA stream initialization is completed in this commit by
enabling the OA stream. When sampling is enabled, this starts an hrtimer
which periodically checks for data availability.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_oa.c | 373 +++++++++++++++++++++++++++++++++++++
 1 file changed, 373 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index d898610322d50..b6e94dba5f525 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -3,7 +3,9 @@
  * Copyright © 2023 Intel Corporation
  */
 
+#include <linux/anon_inodes.h>
 #include <linux/nospec.h>
+#include <linux/poll.h>
 #include <linux/sysctl.h>
 
 #include <drm/drm_drv.h>
@@ -23,6 +25,7 @@
 #include "xe_sched_job.h"
 #include "xe_perf.h"
 
+#define OA_TAKEN(tail, head)	(((tail) - (head)) & (XE_OA_BUFFER_SIZE - 1))
 #define DEFAULT_POLL_FREQUENCY_HZ 200
 #define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
 
@@ -153,6 +156,202 @@ static const struct xe_oa_regs *__oa_regs(struct xe_oa_stream *stream)
 	return &stream->hwe->oa_unit->regs;
 }
 
+static u32 xe_oa_hw_tail_read(struct xe_oa_stream *stream)
+{
+	return xe_mmio_read32(stream->gt, __oa_regs(stream)->oa_tail_ptr) &
+		OAG_OATAILPTR_MASK;
+}
+
+#define oa_report_header_64bit(__s) \
+	((__s)->oa_buffer.format->header == HDR_64_BIT)
+
+static u64 oa_report_id(struct xe_oa_stream *stream, void *report)
+{
+	return oa_report_header_64bit(stream) ? *(u64 *)report : *(u32 *)report;
+}
+
+static u64 oa_timestamp(struct xe_oa_stream *stream, void *report)
+{
+	return oa_report_header_64bit(stream) ?
+		*((u64 *)report + 1) :
+		*((u32 *)report + 1);
+}
+
+static bool xe_oa_buffer_check_unlocked(struct xe_oa_stream *stream)
+{
+	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
+	int report_size = stream->oa_buffer.format->size;
+	u32 tail, hw_tail;
+	unsigned long flags;
+	bool pollin;
+	u32 partial_report_size;
+
+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+
+	hw_tail = xe_oa_hw_tail_read(stream);
+	hw_tail -= gtt_offset;
+
+	/*
+	 * The tail pointer increases in 64 byte (cacheline size) increments, not in
+	 * report_size increments. Also the report size may not be a power of 2.
+	 * Compute the size of any partially landed report in the OA buffer.
+	 */
+	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
+	partial_report_size %= report_size;
+
+	/* Subtract partial amount off the tail */
+	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
+
+	tail = hw_tail;
+
+	/*
+	 * Walk the stream backward until we find a report with report id and timestamp
+	 * not 0. We can't tell whether a report has fully landed in memory before the
+	 * report id and timestamp of the following report have landed.
+	 *
+	 * This is assuming that the writes of the OA unit land in memory in the order
+	 * they were written.  If not : (╯°□°)╯︵ ┻━┻
+	 */
+	while (OA_TAKEN(tail, stream->oa_buffer.tail) >= report_size) {
+		void *report = stream->oa_buffer.vaddr + tail;
+
+		if (oa_report_id(stream, report) || oa_timestamp(stream, report))
+			break;
+
+		tail = OA_TAKEN(tail, report_size);
+	}
+
+	if (OA_TAKEN(hw_tail, tail) > report_size)
+		drm_dbg(&stream->oa->xe->drm,
+			"unlanded report(s) head=0x%x tail=0x%x hw_tail=0x%x\n",
+			stream->oa_buffer.head, tail, hw_tail);
+
+	stream->oa_buffer.tail = tail;
+
+	pollin = OA_TAKEN(stream->oa_buffer.tail,
+			  stream->oa_buffer.head) >= report_size;
+
+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+
+	return pollin;
+}
+
+static enum hrtimer_restart xe_oa_poll_check_timer_cb(struct hrtimer *hrtimer)
+{
+	struct xe_oa_stream *stream =
+		container_of(hrtimer, typeof(*stream), poll_check_timer);
+
+	if (xe_oa_buffer_check_unlocked(stream)) {
+		stream->pollin = true;
+		wake_up(&stream->poll_wq);
+	}
+
+	hrtimer_forward_now(hrtimer, ns_to_ktime(stream->poll_period_ns));
+
+	return HRTIMER_RESTART;
+}
+
+static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream)
+{
+	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
+	u32 oa_buf = gtt_offset | OABUFFER_SIZE_16M | OAG_OABUFFER_MEMORY_SELECT;
+	unsigned long flags;
+
+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_status, 0);
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_head_ptr,
+			gtt_offset & OAG_OAHEADPTR_MASK);
+	stream->oa_buffer.head = 0;
+
+	/*
+	 * PRM says: "This MMIO must be set before the OATAILPTR register and after the
+	 * OAHEADPTR register. This is to enable proper functionality of the overflow bit".
+	 */
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_buffer, oa_buf);
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_tail_ptr,
+			gtt_offset & OAG_OATAILPTR_MASK);
+
+	/* Mark that we need updated tail pointer to read from */
+	stream->oa_buffer.tail = 0;
+
+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+
+	/* Zero out the OA buffer since we rely on zero report id and timestamp fields */
+	memset(stream->oa_buffer.vaddr, 0, stream->oa_buffer.bo->size);
+}
+
+u32 __format_to_oactrl(const struct xe_oa_format *format, int counter_sel_mask)
+{
+	return ((format->counter_select << __bf_shf(counter_sel_mask)) & counter_sel_mask) |
+		REG_FIELD_PREP(OA_OACONTROL_REPORT_BC_MASK, format->bc_report) |
+		REG_FIELD_PREP(OA_OACONTROL_COUNTER_SIZE_MASK, format->counter_size);
+}
+
+static void xe_oa_enable(struct xe_oa_stream *stream)
+{
+	const struct xe_oa_format *format = stream->oa_buffer.format;
+	const struct xe_oa_regs *regs;
+	u32 val;
+
+	/*
+	 * BSpec: 46822: Bit 0. Even if stream->sample is 0, for OAR to function, the OA
+	 * buffer must be correctly initialized
+	 */
+	xe_oa_init_oa_buffer(stream);
+
+	regs = __oa_regs(stream);
+	val = __format_to_oactrl(format, regs->oa_ctrl_counter_select_mask) |
+		OAG_OACONTROL_OA_COUNTER_ENABLE;
+
+	xe_mmio_write32(stream->gt, regs->oa_ctrl, val);
+}
+
+static void xe_oa_disable(struct xe_oa_stream *stream)
+{
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctrl, 0);
+	if (xe_mmio_wait32(stream->gt, __oa_regs(stream)->oa_ctrl,
+			   OAG_OACONTROL_OA_COUNTER_ENABLE, 0, 50000, NULL, false))
+		drm_err(&stream->oa->xe->drm,
+			"wait for OA to be disabled timed out\n");
+
+	xe_mmio_write32(stream->gt, OA_TLB_INV_CR, 1);
+	if (xe_mmio_wait32(stream->gt, OA_TLB_INV_CR, 1, 0, 50000, NULL, false))
+		drm_err(&stream->oa->xe->drm,
+			"wait for OA tlb invalidate timed out\n");
+}
+
+static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
+				  struct file *file, poll_table *wait)
+{
+	__poll_t events = 0;
+
+	poll_wait(file, &stream->poll_wq, wait);
+
+	/*
+	 * We don't explicitly check whether there's something to read here since this
+	 * path may be hot depending on what else userspace is polling, or on the timeout
+	 * in use. We rely on hrtimer xe_oa_poll_check_timer_cb to notify us when there
+	 * are samples to read
+	 */
+	if (stream->pollin)
+		events |= EPOLLIN;
+
+	return events;
+}
+
+static __poll_t xe_oa_poll(struct file *file, poll_table *wait)
+{
+	struct xe_oa_stream *stream = file->private_data;
+	__poll_t ret;
+
+	mutex_lock(&stream->stream_lock);
+	ret = xe_oa_poll_locked(stream, file, wait);
+	mutex_unlock(&stream->stream_lock);
+
+	return ret;
+}
+
 static int xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
 {
 	struct xe_sched_job *job;
@@ -222,6 +421,26 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
 	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, sqcnt1, 0);
 }
 
+static void xe_oa_stream_destroy(struct xe_oa_stream *stream)
+{
+	struct xe_oa_unit *u = stream->hwe->oa_unit;
+	struct xe_gt *gt = stream->hwe->gt;
+
+	if (WARN_ON(stream != u->exclusive_stream))
+		return;
+
+	WRITE_ONCE(u->exclusive_stream, NULL);
+
+	xe_oa_disable_metric_set(stream);
+	xe_exec_queue_put(stream->k_exec_q);
+
+	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
+	xe_device_mem_access_put(stream->oa->xe);
+
+	xe_oa_free_oa_buffer(stream);
+	xe_oa_free_configs(stream);
+}
+
 static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
 {
 	struct xe_bo *bo;
@@ -389,6 +608,139 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
 	return xe_oa_emit_oa_config(stream);
 }
 
+static void xe_oa_stream_enable(struct xe_oa_stream *stream)
+{
+	stream->pollin = false;
+
+	xe_oa_enable(stream);
+
+	if (stream->sample)
+		hrtimer_start(&stream->poll_check_timer,
+			      ns_to_ktime(stream->poll_period_ns),
+			      HRTIMER_MODE_REL_PINNED);
+}
+
+static void xe_oa_stream_disable(struct xe_oa_stream *stream)
+{
+	xe_oa_disable(stream);
+
+	if (stream->sample)
+		hrtimer_cancel(&stream->poll_check_timer);
+}
+
+static void xe_oa_enable_locked(struct xe_oa_stream *stream)
+{
+	if (stream->enabled)
+		return;
+
+	stream->enabled = true;
+
+	xe_oa_stream_enable(stream);
+}
+
+static void xe_oa_disable_locked(struct xe_oa_stream *stream)
+{
+	if (!stream->enabled)
+		return;
+
+	stream->enabled = false;
+
+	xe_oa_stream_disable(stream);
+}
+
+static long xe_oa_config_locked(struct xe_oa_stream *stream,
+				unsigned long metrics_set)
+{
+	struct xe_oa_config *config;
+	long ret = stream->oa_config->id;
+
+	config = xe_oa_get_oa_config(stream->oa, metrics_set);
+	if (!config)
+		return -ENODEV;
+
+	if (config != stream->oa_config) {
+		int err;
+
+		err = xe_oa_emit_oa_config(stream);
+		if (!err)
+			config = xchg(&stream->oa_config, config);
+		else
+			ret = err;
+	}
+
+	xe_oa_config_put(config);
+
+	return ret;
+}
+
+static long xe_oa_ioctl_locked(struct xe_oa_stream *stream,
+			       unsigned int cmd,
+			       unsigned long arg)
+{
+	switch (cmd) {
+	case DRM_XE_PERF_IOCTL_ENABLE:
+		xe_oa_enable_locked(stream);
+		return 0;
+	case DRM_XE_PERF_IOCTL_DISABLE:
+		xe_oa_disable_locked(stream);
+		return 0;
+	case DRM_XE_PERF_IOCTL_CONFIG:
+		return xe_oa_config_locked(stream, arg);
+	}
+
+	return -EINVAL;
+}
+
+static long xe_oa_ioctl(struct file *file,
+			unsigned int cmd,
+			unsigned long arg)
+{
+	struct xe_oa_stream *stream = file->private_data;
+	long ret;
+
+	mutex_lock(&stream->stream_lock);
+	ret = xe_oa_ioctl_locked(stream, cmd, arg);
+	mutex_unlock(&stream->stream_lock);
+
+	return ret;
+}
+
+static void xe_oa_destroy_locked(struct xe_oa_stream *stream)
+{
+	if (stream->enabled)
+		xe_oa_disable_locked(stream);
+
+	xe_oa_stream_destroy(stream);
+
+	if (stream->exec_q)
+		xe_exec_queue_put(stream->exec_q);
+
+	kfree(stream);
+}
+
+static int xe_oa_release(struct inode *inode, struct file *file)
+{
+	struct xe_oa_stream *stream = file->private_data;
+	struct xe_gt *gt = stream->gt;
+
+	mutex_lock(&gt->oa.gt_lock);
+	xe_oa_destroy_locked(stream);
+	mutex_unlock(&gt->oa.gt_lock);
+
+	/* Release the reference the perf stream kept on the driver */
+	drm_dev_put(&gt_to_xe(gt)->drm);
+
+	return 0;
+}
+
+static const struct file_operations xe_oa_fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= no_llseek,
+	.release	= xe_oa_release,
+	.poll		= xe_oa_poll,
+	.unlocked_ioctl	= xe_oa_ioctl,
+};
+
 static int xe_oa_stream_init(struct xe_oa_stream *stream,
 			     struct xe_oa_open_param *param)
 {
@@ -445,6 +797,10 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
 
 	WRITE_ONCE(u->exclusive_stream, stream);
 
+	hrtimer_init(&stream->poll_check_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	stream->poll_check_timer.function = xe_oa_poll_check_timer_cb;
+	init_waitqueue_head(&stream->poll_wq);
+
 	spin_lock_init(&stream->oa_buffer.ptr_lock);
 	mutex_init(&stream->stream_lock);
 
@@ -467,6 +823,7 @@ static int xe_oa_stream_open_ioctl_locked(struct xe_oa *oa,
 					  struct xe_oa_open_param *param)
 {
 	struct xe_oa_stream *stream;
+	unsigned long f_flags = 0;
 	int stream_fd;
 	int ret;
 
@@ -488,10 +845,26 @@ static int xe_oa_stream_open_ioctl_locked(struct xe_oa *oa,
 	if (ret)
 		goto err_free;
 
+	if (param->open_flags & DRM_XE_OA_FLAG_FD_CLOEXEC)
+		f_flags |= O_CLOEXEC;
+	if (param->open_flags & DRM_XE_OA_FLAG_FD_NONBLOCK)
+		f_flags |= O_NONBLOCK;
+
+	stream_fd = anon_inode_getfd("[xe_oa]", &xe_oa_fops, stream, f_flags);
+	if (stream_fd < 0) {
+		ret = stream_fd;
+		goto err_destroy;
+	}
+
+	if (!(param->open_flags & DRM_XE_OA_FLAG_DISABLED))
+		xe_oa_enable_locked(stream);
+
 	/* Hold a reference on the drm device till stream_fd is released */
 	drm_dev_get(&stream->oa->xe->drm);
 
 	return stream_fd;
+err_destroy:
+	xe_oa_stream_destroy(stream);
 err_free:
 	kfree(stream);
 exit:
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 10/17] drm/xe/oa/uapi: Read file_operation
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (8 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  3:01   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG Ashutosh Dixit
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Implement the OA stream read file_operation. Both blocking and non-blocking
reads are supported. As part of the read system call, OA perf data is
copied from the OA buffer to the user buffer, with a packet header prepended
to each status and data packet.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_oa.c | 239 +++++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index b6e94dba5f525..5744436188dcd 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -170,6 +170,14 @@ static u64 oa_report_id(struct xe_oa_stream *stream, void *report)
 	return oa_report_header_64bit(stream) ? *(u64 *)report : *(u32 *)report;
 }
 
+static void oa_report_id_clear(struct xe_oa_stream *stream, u32 *report)
+{
+	if (oa_report_header_64bit(stream))
+		*(u64 *)report = 0;
+	else
+		*report = 0;
+}
+
 static u64 oa_timestamp(struct xe_oa_stream *stream, void *report)
 {
 	return oa_report_header_64bit(stream) ?
@@ -177,6 +185,14 @@ static u64 oa_timestamp(struct xe_oa_stream *stream, void *report)
 		*((u32 *)report + 1);
 }
 
+static void oa_timestamp_clear(struct xe_oa_stream *stream, u32 *report)
+{
+	if (oa_report_header_64bit(stream))
+		*(u64 *)&report[2] = 0;
+	else
+		report[1] = 0;
+}
+
 static bool xe_oa_buffer_check_unlocked(struct xe_oa_stream *stream)
 {
 	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
@@ -251,6 +267,134 @@ static enum hrtimer_restart xe_oa_poll_check_timer_cb(struct hrtimer *hrtimer)
 	return HRTIMER_RESTART;
 }
 
+static int xe_oa_append_status(struct xe_oa_stream *stream, char __user *buf,
+			       size_t count, size_t *offset,
+			       enum drm_xe_oa_record_type type)
+{
+	struct drm_xe_oa_record_header header = { type, 0, sizeof(header) };
+
+	if ((count - *offset) < header.size)
+		return -ENOSPC;
+
+	if (copy_to_user(buf + *offset, &header, sizeof(header)))
+		return -EFAULT;
+
+	*offset += header.size;
+
+	return 0;
+}
+
+static int xe_oa_append_sample(struct xe_oa_stream *stream, char __user *buf,
+			       size_t count, size_t *offset, const u8 *report)
+{
+	int report_size = stream->oa_buffer.format->size;
+	struct drm_xe_oa_record_header header;
+	int report_size_partial;
+	u8 *oa_buf_end;
+
+	header.type = DRM_XE_OA_RECORD_SAMPLE;
+	header.pad = 0;
+	header.size = stream->sample_size;
+
+	if ((count - *offset) < header.size)
+		return -ENOSPC;
+
+	buf += *offset;
+	if (copy_to_user(buf, &header, sizeof(header)))
+		return -EFAULT;
+	buf += sizeof(header);
+
+	oa_buf_end = stream->oa_buffer.vaddr + XE_OA_BUFFER_SIZE;
+	report_size_partial = oa_buf_end - report;
+
+	if (report_size_partial < report_size) {
+		if (copy_to_user(buf, report, report_size_partial))
+			return -EFAULT;
+		buf += report_size_partial;
+
+		if (copy_to_user(buf, stream->oa_buffer.vaddr,
+				 report_size - report_size_partial))
+			return -EFAULT;
+	} else if (copy_to_user(buf, report, report_size)) {
+		return -EFAULT;
+	}
+
+	*offset += header.size;
+
+	return 0;
+}
+
+static int xe_oa_append_reports(struct xe_oa_stream *stream, char __user *buf,
+				size_t count, size_t *offset)
+{
+	int report_size = stream->oa_buffer.format->size;
+	u8 *oa_buf_base = stream->oa_buffer.vaddr;
+	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
+	u32 mask = (XE_OA_BUFFER_SIZE - 1);
+	size_t start_offset = *offset;
+	unsigned long flags;
+	u32 head, tail;
+	int ret = 0;
+
+	if (drm_WARN_ON(&stream->oa->xe->drm, !stream->enabled))
+		return -EIO;
+
+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+
+	head = stream->oa_buffer.head;
+	tail = stream->oa_buffer.tail;
+
+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+
+	/* An out of bounds or misaligned head or tail pointer implies a driver bug */
+	if (drm_WARN_ONCE(&stream->oa->xe->drm,
+			  head > XE_OA_BUFFER_SIZE || tail > XE_OA_BUFFER_SIZE,
+			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
+			  head, tail))
+		return -EIO;
+
+	for (; OA_TAKEN(tail, head); head = (head + report_size) & mask) {
+		u8 *report = oa_buf_base + head;
+		u32 *report32 = (void *)report;
+
+		ret = xe_oa_append_sample(stream, buf, count, offset, report);
+		if (ret)
+			break;
+
+		if (is_power_of_2(report_size)) {
+			/* Clear out report id and timestamp to detect unlanded reports */
+			oa_report_id_clear(stream, report32);
+			oa_timestamp_clear(stream, report32);
+		} else {
+			u8 *oa_buf_end = stream->oa_buffer.vaddr +
+					 XE_OA_BUFFER_SIZE;
+			u32 part = oa_buf_end - (u8 *)report32;
+
+			/* Zero out the entire report */
+			if (report_size <= part) {
+				memset(report32, 0, report_size);
+			} else {
+				memset(report32, 0, part);
+				memset(oa_buf_base, 0, report_size - part);
+			}
+		}
+	}
+
+	if (start_offset != *offset) {
+		struct xe_reg oaheadptr = __oa_regs(stream)->oa_head_ptr;
+
+		spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+
+		xe_mmio_write32(stream->gt, oaheadptr,
+				(head + gtt_offset) & OAG_OAHEADPTR_MASK);
+		stream->oa_buffer.head = head;
+
+		spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+	}
+
+	return ret;
+}
+
 static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream)
 {
 	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
@@ -321,6 +465,100 @@ static void xe_oa_disable(struct xe_oa_stream *stream)
 			"wait for OA tlb invalidate timed out\n");
 }
 
+static int __xe_oa_read(struct xe_oa_stream *stream, char __user *buf,
+			size_t count, size_t *offset)
+{
+	struct xe_reg oastatus_reg = __oa_regs(stream)->oa_status;
+	u32 oastatus;
+	int ret;
+
+	if (drm_WARN_ON(&stream->oa->xe->drm, !stream->oa_buffer.vaddr))
+		return -EIO;
+
+	oastatus = xe_mmio_read32(stream->gt, oastatus_reg);
+
+	/* We treat OABUFFER_OVERFLOW as a significant error */
+	if (oastatus & OAG_OASTATUS_BUFFER_OVERFLOW) {
+		ret = xe_oa_append_status(stream, buf, count, offset,
+					  DRM_XE_OA_RECORD_OA_BUFFER_LOST);
+		if (ret)
+			return ret;
+
+		drm_dbg(&stream->oa->xe->drm,
+			"OA buffer overflow (exponent = %d): force restart\n",
+			stream->period_exponent);
+
+		xe_oa_disable(stream);
+		xe_oa_enable(stream);
+
+		/* oa_enable will re-init oabuffer and reset oastatus_reg */
+		oastatus = xe_mmio_read32(stream->gt, oastatus_reg);
+	}
+
+	if (oastatus & OAG_OASTATUS_REPORT_LOST) {
+		ret = xe_oa_append_status(stream, buf, count, offset,
+					  DRM_XE_OA_RECORD_OA_REPORT_LOST);
+		if (ret)
+			return ret;
+
+		xe_mmio_rmw32(stream->gt, oastatus_reg,
+			      OAG_OASTATUS_COUNTER_OVERFLOW |
+			      OAG_OASTATUS_REPORT_LOST, 0);
+	}
+
+	return xe_oa_append_reports(stream, buf, count, offset);
+}
+
+static int xe_oa_wait_unlocked(struct xe_oa_stream *stream)
+{
+	/* We might wait indefinitely if periodic sampling is not enabled */
+	if (!stream->periodic)
+		return -EIO;
+
+	return wait_event_interruptible(stream->poll_wq,
+					xe_oa_buffer_check_unlocked(stream));
+}
+
+static ssize_t xe_oa_read(struct file *file, char __user *buf,
+			  size_t count, loff_t *ppos)
+{
+	struct xe_oa_stream *stream = file->private_data;
+	size_t offset = 0;
+	int ret;
+
+	/* Can't read from disabled streams */
+	if (!stream->enabled || !stream->sample)
+		return -EIO;
+
+	if (!(file->f_flags & O_NONBLOCK)) {
+		do {
+			ret = xe_oa_wait_unlocked(stream);
+			if (ret)
+				return ret;
+
+			mutex_lock(&stream->stream_lock);
+			ret = __xe_oa_read(stream, buf, count, &offset);
+			mutex_unlock(&stream->stream_lock);
+		} while (!offset && !ret);
+	} else {
+		mutex_lock(&stream->stream_lock);
+		ret = __xe_oa_read(stream, buf, count, &offset);
+		mutex_unlock(&stream->stream_lock);
+	}
+
+	/*
+	 * Typically we clear pollin here in order to wait for the new hrtimer callback
+	 * before unblocking. The exception to this is if __xe_oa_read returns -ENOSPC,
+	 * which means that more OA data is available than could fit in the user provided
+	 * buffer. In this case we want the next poll() call to not block.
+	 */
+	if (ret != -ENOSPC)
+		stream->pollin = false;
+
+	/* Possible values for ret are 0, -EFAULT, -ENOSPC, -EIO, ... */
+	return offset ?: (ret ?: -EAGAIN);
+}
+
 static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
 				  struct file *file, poll_table *wait)
 {
@@ -738,6 +976,7 @@ static const struct file_operations xe_oa_fops = {
 	.llseek		= no_llseek,
 	.release	= xe_oa_release,
 	.poll		= xe_oa_poll,
+	.read		= xe_oa_read,
 	.unlocked_ioctl	= xe_oa_ioctl,
 };
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (9 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 10/17] drm/xe/oa/uapi: Read file_operation Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  3:05   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 12/17] drm/xe/oa: Add OAR support Ashutosh Dixit
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Xe2+ OAG requires special handling because non-power-of-2 report sizes are
not a sub-multiple of the OA buffer size and there are no partial reports
at the end of the buffer. This issue is present only when overrun mode is
enabled. Avoid adding this special handling by disabling overrun mode for
Xe2+ OAG.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/regs/xe_oa_regs.h | 1 +
 drivers/gpu/drm/xe/xe_oa.c           | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
index 4455a5a42b01b..7e2e875ccf80a 100644
--- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
@@ -52,6 +52,7 @@
 #define  OABUFFER_SIZE_4M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 5)
 #define  OABUFFER_SIZE_8M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 6)
 #define  OABUFFER_SIZE_16M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 7)
+#define  OAG_OABUFFER_DISABLE_OVERRUN_MODE	REG_BIT(1)
 #define  OAG_OABUFFER_MEMORY_SELECT		REG_BIT(0) /* 0: PPGTT, 1: GGTT */
 
 #define OAG_OACONTROL				XE_REG(0xdaf4)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 5744436188dcd..073476721377d 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -408,6 +408,14 @@ static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream)
 			gtt_offset & OAG_OAHEADPTR_MASK);
 	stream->oa_buffer.head = 0;
 
+	/*
+	 * For Xe2+, the OAG buffer size is not a multiple of the report size and there
+	 * are no partial reports at the end of the buffer when overrun mode is enabled.
+	 * Disable overrun mode to avoid this issue.
+	 */
+	if (GRAPHICS_VER(stream->oa->xe) >= 20 &&
+	    stream->hwe->oa_unit->type == DRM_XE_OA_UNIT_TYPE_OAG)
+		oa_buf |= OAG_OABUFFER_DISABLE_OVERRUN_MODE;
 	/*
 	 * PRM says: "This MMIO must be set before the OATAILPTR register and after the
 	 * OAHEADPTR register. This is to enable proper functionality of the overflow bit".
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 12/17] drm/xe/oa: Add OAR support
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (10 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  4:37   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 13/17] drm/xe/oa: Add OAC support Ashutosh Dixit
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Add OAR support to allow userspace to execute MI_REPORT_PERF_COUNT on
render engines. Configuration batches are used to program the OAR unit, as
well as to modify the render engine context image of a specified exec queue
(so that it has the correct register values when that context switches in).

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 .../gpu/drm/xe/instructions/xe_mi_commands.h  |   3 +
 drivers/gpu/drm/xe/regs/xe_engine_regs.h      |   3 +-
 drivers/gpu/drm/xe/xe_lrc.c                   |  11 +-
 drivers/gpu/drm/xe/xe_lrc.h                   |   1 +
 drivers/gpu/drm/xe/xe_oa.c                    | 216 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_oa_types.h              |   4 +
 6 files changed, 232 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
index 1cfa96167fde3..d333132b021e0 100644
--- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
@@ -45,6 +45,9 @@
 #define   MI_LRI_MMIO_REMAP_EN		REG_BIT(17)
 #define   MI_LRI_NUM_REGS(x)		XE_INSTR_NUM_DW(2 * (x) + 1)
 #define   MI_LRI_FORCE_POSTED		REG_BIT(12)
+#define   IS_MI_LRI_CMD(x)		(REG_FIELD_GET(MI_OPCODE, (x)) == \
+					 REG_FIELD_GET(MI_OPCODE, MI_LOAD_REGISTER_IMM))
+#define   MI_LRI_LEN(x)			(((x) & 0xff) + 1)
 
 #define MI_FLUSH_DW			__MI_INSTR(0x26)
 #define   MI_FLUSH_DW_STORE_INDEX	REG_BIT(21)
diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
index 444ff9b83bb1b..76c0938df05f3 100644
--- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
@@ -71,7 +71,8 @@
 #define RING_EXECLIST_STATUS_LO(base)		XE_REG((base) + 0x234)
 #define RING_EXECLIST_STATUS_HI(base)		XE_REG((base) + 0x234 + 4)
 
-#define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244)
+#define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244, XE_REG_OPTION_MASKED)
+#define	  CTX_CTRL_OAC_CONTEXT_ENABLE		REG_BIT(8)
 #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	REG_BIT(3)
 #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	REG_BIT(0)
 
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 17c0eb9e62cfb..8586e1f4a7fbc 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -565,12 +565,18 @@ u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc)
 
 /* Make the magic macros work */
 #define __xe_lrc_pphwsp_offset xe_lrc_pphwsp_offset
+#define __xe_lrc_regs_offset xe_lrc_regs_offset
 
 #define LRC_SEQNO_PPHWSP_OFFSET 512
 #define LRC_START_SEQNO_PPHWSP_OFFSET (LRC_SEQNO_PPHWSP_OFFSET + 8)
 #define LRC_PARALLEL_PPHWSP_OFFSET 2048
 #define LRC_PPHWSP_SIZE SZ_4K
 
+u32 xe_lrc_regs_offset(struct xe_lrc *lrc)
+{
+	return xe_lrc_pphwsp_offset(lrc) + LRC_PPHWSP_SIZE;
+}
+
 static size_t lrc_reg_size(struct xe_device *xe)
 {
 	if (GRAPHICS_VERx100(xe) >= 1250)
@@ -602,11 +608,6 @@ static inline u32 __xe_lrc_parallel_offset(struct xe_lrc *lrc)
 	return xe_lrc_pphwsp_offset(lrc) + LRC_PARALLEL_PPHWSP_OFFSET;
 }
 
-static inline u32 __xe_lrc_regs_offset(struct xe_lrc *lrc)
-{
-	return xe_lrc_pphwsp_offset(lrc) + LRC_PPHWSP_SIZE;
-}
-
 #define DECL_MAP_ADDR_HELPERS(elem) \
 static inline struct iosys_map __xe_lrc_##elem##_map(struct xe_lrc *lrc) \
 { \
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index 28b1d3f404d4f..d6d8aa8fb51eb 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -23,6 +23,7 @@ void xe_lrc_finish(struct xe_lrc *lrc);
 
 size_t xe_lrc_size(struct xe_device *xe, enum xe_engine_class class);
 u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc);
+u32 xe_lrc_regs_offset(struct xe_lrc *lrc);
 
 void xe_lrc_set_ring_head(struct xe_lrc *lrc, u32 head);
 u32 xe_lrc_ring_head(struct xe_lrc *lrc);
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 073476721377d..9d653d7722d1a 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -12,7 +12,9 @@
 #include <drm/xe_drm.h>
 
 #include "instructions/xe_mi_commands.h"
+#include "regs/xe_engine_regs.h"
 #include "regs/xe_gt_regs.h"
+#include "regs/xe_lrc_layout.h"
 #include "regs/xe_oa_regs.h"
 #include "xe_device.h"
 #include "xe_exec_queue.h"
@@ -20,6 +22,7 @@
 #include "xe_bo.h"
 #include "xe_gt.h"
 #include "xe_gt_mcr.h"
+#include "xe_lrc.h"
 #include "xe_mmio.h"
 #include "xe_oa.h"
 #include "xe_sched_job.h"
@@ -63,6 +66,12 @@ struct xe_oa_config {
 	struct rcu_head rcu;
 };
 
+struct flex {
+	struct xe_reg reg;
+	u32 offset;
+	u32 value;
+};
+
 struct xe_oa_open_param {
 	u32 oa_unit_id;
 	bool sample;
@@ -640,6 +649,119 @@ static void xe_oa_free_configs(struct xe_oa_stream *stream)
 		free_oa_config_bo(oa_bo);
 }
 
+static void xe_oa_store_flex(struct xe_oa_stream *stream, struct xe_lrc *lrc,
+			     struct xe_bb *bb, const struct flex *flex, u32 count)
+{
+	u32 offset = xe_bo_ggtt_addr(lrc->bo);
+
+	do {
+		bb->cs[bb->len++] = MI_STORE_DATA_IMM | BIT(22) /* GGTT */ | 2;
+		bb->cs[bb->len++] = offset + flex->offset * sizeof(u32);
+		bb->cs[bb->len++] = 0;
+		bb->cs[bb->len++] = flex->value;
+
+	} while (flex++, --count);
+}
+
+static int xe_oa_modify_context(struct xe_oa_stream *stream, struct xe_lrc *lrc,
+				const struct flex *flex, u32 count)
+{
+	struct xe_bb *bb;
+	int err;
+
+	bb = xe_bb_new(stream->gt, 4 * count + 1, false);
+	if (IS_ERR(bb)) {
+		err = PTR_ERR(bb);
+		goto exit;
+	}
+
+	xe_oa_store_flex(stream, lrc, bb, flex, count);
+
+	err = xe_oa_submit_bb(stream, bb);
+	xe_bb_free(bb, NULL);
+exit:
+	return err;
+}
+
+static void xe_oa_load_flex(struct xe_oa_stream *stream, struct xe_bb *bb,
+			    const struct flex *flex, u32 count)
+{
+	XE_WARN_ON(!count || count > 63);
+
+	bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(count);
+
+	do {
+		bb->cs[bb->len++] = flex->reg.addr;
+		bb->cs[bb->len++] = flex->value;
+
+	} while (flex++, --count);
+
+	bb->cs[bb->len++] = MI_NOOP;
+}
+
+static int xe_oa_modify_self(struct xe_oa_stream *stream,
+			     const struct flex *flex, u32 count)
+{
+	struct xe_bb *bb;
+	int err;
+
+	bb = xe_bb_new(stream->gt, 2 * count + 3, false);
+	if (IS_ERR(bb)) {
+		err = PTR_ERR(bb);
+		goto exit;
+	}
+
+	xe_oa_load_flex(stream, bb, flex, count);
+
+	err = xe_oa_submit_bb(stream, bb);
+	xe_bb_free(bb, NULL);
+exit:
+	return err;
+}
+
+#define OAR_OAC_OACONTROL_OFFSET 0x5B0
+
+static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable)
+{
+	const struct xe_oa_format *format = stream->oa_buffer.format;
+	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
+	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
+	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
+		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
+
+	struct flex regs_context[] = {
+		{
+			OACTXCONTROL(stream->hwe->mmio_base),
+			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
+			enable ? OA_COUNTER_RESUME : 0,
+		},
+		{
+			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
+			regs_offset + CTX_CONTEXT_CONTROL,
+			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
+				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0)
+		},
+	};
+	/* Offsets in regs_lri are not used since this configuration is applied using LRI */
+	struct flex regs_lri[] = {
+		{
+			OAR_OACONTROL,
+			OAR_OAC_OACONTROL_OFFSET + 1,
+			oacontrol,
+		},
+	};
+	int err;
+
+	/* Modify stream hwe context image with regs_context */
+	err = xe_oa_modify_context(stream, &stream->exec_q->lrc[0],
+				   regs_context, ARRAY_SIZE(regs_context));
+	if (err)
+		return err;
+
+	/* Apply regs_lri using LRI */
+	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
+}
+
 #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
 
 static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
@@ -657,6 +779,10 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
 					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
 	}
 
+	/* disable the context save/restore or OAR counters */
+	if (stream->exec_q)
+		xe_oa_configure_oar_context(stream, false);
+
 	/* Make sure we disable noa to save power. */
 	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
 
@@ -814,6 +940,7 @@ static u32 oag_report_ctx_switches(const struct xe_oa_stream *stream)
 static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
 {
 	u32 oa_debug, sqcnt1;
+	int ret;
 
 	/*
 	 * Wa_1508761755:xehpsdv, dg2
@@ -851,6 +978,12 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
 
 	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, 0, sqcnt1);
 
+	if (stream->exec_q) {
+		ret = xe_oa_configure_oar_context(stream, true);
+		if (ret)
+			return ret;
+	}
+
 	return xe_oa_emit_oa_config(stream);
 }
 
@@ -988,6 +1121,78 @@ static const struct file_operations xe_oa_fops = {
 	.unlocked_ioctl	= xe_oa_ioctl,
 };
 
+static bool engine_supports_mi_query(struct xe_hw_engine *hwe)
+{
+	return hwe->class == XE_ENGINE_CLASS_RENDER ||
+		hwe->class == XE_ENGINE_CLASS_COMPUTE;
+}
+
+static bool xe_oa_find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
+{
+	u32 idx = *offset;
+	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
+	bool found = false;
+
+	idx++;
+	for (; idx < len; idx += 2) {
+		if (state[idx] == reg) {
+			found = true;
+			break;
+		}
+	}
+
+	*offset = idx;
+	return found;
+}
+
+static u32 xe_oa_context_image_offset(struct xe_oa_stream *stream, u32 reg)
+{
+	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
+	u32 len = (xe_lrc_size(stream->oa->xe, stream->hwe->class) +
+		   lrc->ring.size) / sizeof(u32);
+	u32 offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
+	u32 *state = (u32 *)lrc->bo->vmap.vaddr;
+
+	if (drm_WARN_ON(&stream->oa->xe->drm, !state))
+		return U32_MAX;
+
+	for (; offset < len; ) {
+		if (IS_MI_LRI_CMD(state[offset])) {
+			/*
+			 * We expect reg-value pairs in MI_LRI command, so
+			 * MI_LRI_LEN() should be even
+			 */
+			drm_WARN_ON(&stream->oa->xe->drm,
+				    MI_LRI_LEN(state[offset]) & 0x1);
+
+			if (xe_oa_find_reg_in_lri(state, reg, &offset, len))
+				break;
+		} else {
+			offset++;
+		}
+	}
+
+	return offset < len ? offset : U32_MAX;
+}
+
+static int xe_oa_set_ctx_ctrl_offset(struct xe_oa_stream *stream)
+{
+	struct xe_reg reg = OACTXCONTROL(stream->hwe->mmio_base);
+	u32 offset = stream->oa->ctx_oactxctrl_offset[stream->hwe->class];
+
+	/* Do this only once. Failure is stored as offset of U32_MAX */
+	if (offset)
+		goto exit;
+
+	offset = xe_oa_context_image_offset(stream, reg.addr);
+	stream->oa->ctx_oactxctrl_offset[stream->hwe->class] = offset;
+
+	drm_dbg(&stream->oa->xe->drm, "%s oa ctx control at 0x%08x dword offset\n",
+		stream->hwe->name, offset);
+exit:
+	return offset && offset != U32_MAX ? 0 : -ENODEV;
+}
+
 static int xe_oa_stream_init(struct xe_oa_stream *stream,
 			     struct xe_oa_open_param *param)
 {
@@ -1008,6 +1213,17 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
 	stream->periodic = param->period_exponent > 0;
 	stream->period_exponent = param->period_exponent;
 
+	if (stream->exec_q && engine_supports_mi_query(stream->hwe)) {
+		/* If we don't find the context offset, just return error */
+		ret = xe_oa_set_ctx_ctrl_offset(stream);
+		if (ret) {
+			drm_err(&stream->oa->xe->drm,
+				"xe_oa_set_ctx_ctrl_offset failed for %s\n",
+				stream->hwe->name);
+			goto exit;
+		}
+	}
+
 	stream->oa_config = xe_oa_get_oa_config(stream->oa, param->metric_set);
 	if (!stream->oa_config) {
 		drm_dbg(&stream->oa->xe->drm, "Invalid OA config id=%i\n", param->metric_set);
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
index 05047226af8d1..bcd8d249faaec 100644
--- a/drivers/gpu/drm/xe/xe_oa_types.h
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -13,6 +13,7 @@
 
 #include <drm/xe_drm.h>
 #include "regs/xe_reg_defs.h"
+#include "xe_hw_engine_types.h"
 
 #define XE_OA_BUFFER_SIZE SZ_16M
 
@@ -132,6 +133,9 @@ struct xe_oa {
 	/** @metrics_idr: List of dynamic configurations (struct xe_oa_config) */
 	struct idr metrics_idr;
 
+	/** @ctx_oactxctrl_offset: offset of OACTXCONTROL register in context image */
+	u32 ctx_oactxctrl_offset[XE_ENGINE_CLASS_MAX];
+
 	/** @oa_formats: tracks all OA formats across platforms */
 	const struct xe_oa_format *oa_formats;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 13/17] drm/xe/oa: Add OAC support
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (11 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 12/17] drm/xe/oa: Add OAR support Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  4:59   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties Ashutosh Dixit
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Similar to OAR, allow userspace to execute MI_REPORT_PERF_COUNT on compute
engines of a specified exec queue.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/regs/xe_engine_regs.h |  1 +
 drivers/gpu/drm/xe/regs/xe_oa_regs.h     |  3 +
 drivers/gpu/drm/xe/xe_oa.c               | 81 +++++++++++++++++++++++-
 3 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
index 76c0938df05f3..045f9773f01f4 100644
--- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
@@ -73,6 +73,7 @@
 
 #define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244, XE_REG_OPTION_MASKED)
 #define	  CTX_CTRL_OAC_CONTEXT_ENABLE		REG_BIT(8)
+#define	  CTX_CTRL_RUN_ALONE			REG_BIT(7)
 #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	REG_BIT(3)
 #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	REG_BIT(0)
 
diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
index 7e2e875ccf80a..b66cd95b795e7 100644
--- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
@@ -74,6 +74,9 @@
 #define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
 #define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
 
+/* OAC unit */
+#define OAC_OACONTROL			XE_REG(0x15114)
+
 /* OAM unit */
 #define OAM_HEAD_POINTER_OFFSET			(0x1a0)
 #define OAM_TAIL_POINTER_OFFSET			(0x1a4)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 9d653d7722d1a..42f32d4359f2c 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -449,6 +449,19 @@ u32 __format_to_oactrl(const struct xe_oa_format *format, int counter_sel_mask)
 		REG_FIELD_PREP(OA_OACONTROL_COUNTER_SIZE_MASK, format->counter_size);
 }
 
+static u32 __oa_ccs_select(struct xe_oa_stream *stream)
+{
+	u32 val;
+
+	if (stream->hwe->class != XE_ENGINE_CLASS_COMPUTE)
+		return 0;
+
+	val = REG_FIELD_PREP(OAG_OACONTROL_OA_CCS_SELECT_MASK, stream->hwe->instance);
+	xe_assert(stream->oa->xe,
+		  REG_FIELD_GET(OAG_OACONTROL_OA_CCS_SELECT_MASK, val) == stream->hwe->instance);
+	return val;
+}
+
 static void xe_oa_enable(struct xe_oa_stream *stream)
 {
 	const struct xe_oa_format *format = stream->oa_buffer.format;
@@ -463,7 +476,7 @@ static void xe_oa_enable(struct xe_oa_stream *stream)
 
 	regs = __oa_regs(stream);
 	val = __format_to_oactrl(format, regs->oa_ctrl_counter_select_mask) |
-		OAG_OACONTROL_OA_COUNTER_ENABLE;
+		__oa_ccs_select(stream) | OAG_OACONTROL_OA_COUNTER_ENABLE;
 
 	xe_mmio_write32(stream->gt, regs->oa_ctrl, val);
 }
@@ -762,6 +775,64 @@ static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable)
 	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
 }
 
+static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable)
+{
+	const struct xe_oa_format *format = stream->oa_buffer.format;
+	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
+	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
+	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
+		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
+	struct flex regs_context[] = {
+		{
+			OACTXCONTROL(stream->hwe->mmio_base),
+			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
+			enable ? OA_COUNTER_RESUME : 0,
+		},
+		{
+			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
+			regs_offset + CTX_CONTEXT_CONTROL,
+			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
+				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0) |
+			_MASKED_FIELD(CTX_CTRL_RUN_ALONE,
+				      enable ? CTX_CTRL_RUN_ALONE : 0),
+		},
+	};
+	/* Offsets in regs_lri are not used since this configuration is applied using LRI */
+	struct flex regs_lri[] = {
+		{
+			OAC_OACONTROL,
+			OAR_OAC_OACONTROL_OFFSET + 1,
+			oacontrol,
+		},
+	};
+	int err;
+
+	/* Set ccs select to enable programming of OAC_OACONTROL */
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctrl, __oa_ccs_select(stream));
+
+	/* Modify stream hwe context image with regs_context */
+	err = xe_oa_modify_context(stream, &stream->exec_q->lrc[0],
+				   regs_context, ARRAY_SIZE(regs_context));
+	if (err)
+		return err;
+
+	/* Apply regs_lri using LRI */
+	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
+}
+
+static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable)
+{
+	switch (stream->hwe->class) {
+	case XE_ENGINE_CLASS_RENDER:
+		return xe_oa_configure_oar_context(stream, enable);
+	case XE_ENGINE_CLASS_COMPUTE:
+		return xe_oa_configure_oac_context(stream, enable);
+	default:
+		/* Video engines do not support MI_REPORT_PERF_COUNT */
+		return 0;
+	}
+}
+
 #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
 
 static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
@@ -781,7 +852,7 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
 
 	/* disable the context save/restore or OAR counters */
 	if (stream->exec_q)
-		xe_oa_configure_oar_context(stream, false);
+		xe_oa_configure_oa_context(stream, false);
 
 	/* Make sure we disable noa to save power. */
 	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
@@ -978,8 +1049,9 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
 
 	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, 0, sqcnt1);
 
+	/* Configure OAR/OAC */
 	if (stream->exec_q) {
-		ret = xe_oa_configure_oar_context(stream, true);
+		ret = xe_oa_configure_oa_context(stream, true);
 		if (ret)
 			return ret;
 	}
@@ -1636,6 +1708,9 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file
 		param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id);
 		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
 			return -ENOENT;
+
+		if (param.exec_q->width > 1)
+			drm_dbg(&oa->xe->drm, "exec_q->width > 1, programming only exec_q->lrc[0]\n");
 	}
 
 	/*
-- 
2.41.0



* [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (12 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 13/17] drm/xe/oa: Add OAC support Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-23  0:40   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap Ashutosh Dixit
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Implement a query to expose the properties of the OA units present on a device.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_oa.h    |  2 +
 drivers/gpu/drm/xe/xe_query.c | 81 +++++++++++++++++++++++++++++++++++
 include/uapi/drm/xe_drm.h     | 64 +++++++++++++++++++++++++++
 3 files changed, 147 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
index a0f9a876ea6b4..b88914693cdb3 100644
--- a/drivers/gpu/drm/xe/xe_oa.h
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -25,5 +25,7 @@ int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
 			   struct drm_file *file);
 int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *file);
+u32 xe_oa_timestamp_frequency(struct xe_gt *gt);
+u16 xe_oa_unit_id(struct xe_hw_engine *hwe);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 56d61bf596b2b..abe2ea088e2ec 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -501,6 +501,86 @@ static int query_gt_topology(struct xe_device *xe,
 	return 0;
 }
 
+static size_t calc_oa_unit_query_size(struct xe_device *xe)
+{
+	size_t size = sizeof(struct drm_xe_query_oa_units);
+	struct xe_gt *gt;
+	int i, id;
+
+	for_each_gt(gt, xe, id) {
+		for (i = 0; i < gt->oa.num_oa_units; i++) {
+			size += sizeof(struct drm_xe_oa_unit);
+			size += gt->oa.oa_unit[i].num_engines *
+				sizeof(struct drm_xe_engine_class_instance);
+		}
+	}
+
+	return size;
+}
+
+static int query_oa_units(struct xe_device *xe,
+			  struct drm_xe_device_query *query)
+{
+	void __user *query_ptr = u64_to_user_ptr(query->data);
+	size_t size = calc_oa_unit_query_size(xe);
+	struct drm_xe_query_oa_units *qoa;
+	enum xe_hw_engine_id hwe_id;
+	struct drm_xe_oa_unit *du;
+	struct xe_hw_engine *hwe;
+	struct xe_oa_unit *u;
+	int gt_id, i, j, ret;
+	struct xe_gt *gt;
+	u8 *pdu;
+
+	if (query->size == 0) {
+		query->size = size;
+		return 0;
+	} else if (XE_IOCTL_DBG(xe, query->size != size)) {
+		return -EINVAL;
+	}
+
+	qoa = kzalloc(size, GFP_KERNEL);
+	if (!qoa)
+		return -ENOMEM;
+
+	pdu = (u8 *)&qoa->oa_units[0];
+	for_each_gt(gt, xe, gt_id) {
+		for (i = 0; i < gt->oa.num_oa_units; i++) {
+			u = &gt->oa.oa_unit[i];
+			du = (struct drm_xe_oa_unit *)pdu;
+
+			du->oa_unit_id = u->oa_unit_id;
+			du->oa_unit_type = u->type;
+			du->gt_id = gt->info.id;
+			du->open_stream = !!u->exclusive_stream;
+			du->oa_timestamp_freq = xe_oa_timestamp_frequency(gt);
+			du->oa_buf_size = XE_OA_BUFFER_SIZE;
+			du->num_engines = u->num_engines;
+
+			for (j = 1; j < DRM_XE_OA_PROPERTY_MAX; j++)
+				du->capabilities |= BIT(j);
+
+			j = 0;
+			for_each_hw_engine(hwe, gt, hwe_id) {
+				if (xe_oa_unit_id(hwe) == u->oa_unit_id) {
+					du->eci[j].engine_class =
+						xe_to_user_engine_class[hwe->class];
+					du->eci[j].engine_instance = hwe->logical_instance;
+					du->eci[j].gt_id = gt->info.id;
+					j++;
+				}
+			}
+			pdu += sizeof(*du) + j * sizeof(du->eci[0]);
+			qoa->num_oa_units++;
+		}
+	}
+
+	ret = copy_to_user(query_ptr, qoa, size);
+	kfree(qoa);
+
+	return ret ? -EFAULT : 0;
+}
+
 static int (* const xe_query_funcs[])(struct xe_device *xe,
 				      struct drm_xe_device_query *query) = {
 	query_engines,
@@ -510,6 +590,7 @@ static int (* const xe_query_funcs[])(struct xe_device *xe,
 	query_hwconfig,
 	query_gt_topology,
 	query_engine_cycles,
+	query_oa_units,
 };
 
 int xe_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 8156301df7315..5f41c5bfe5e0e 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -517,6 +517,7 @@ struct drm_xe_device_query {
 #define DRM_XE_DEVICE_QUERY_HWCONFIG		4
 #define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY		5
 #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES	6
+#define DRM_XE_DEVICE_QUERY_OA_UNITS		7
 	/** @query: The type of data to query */
 	__u32 query;
 
@@ -1182,6 +1183,69 @@ enum drm_xe_oa_unit_type {
 	DRM_XE_OA_UNIT_TYPE_OAM,
 };
 
+/**
+ * struct drm_xe_query_oa_units - describe OA units
+ *
+ * If a query is made with a struct drm_xe_device_query where .query
+ * is equal to DRM_XE_DEVICE_QUERY_OA_UNITS, then the reply uses struct
+ * drm_xe_query_oa_units in .data.
+ *
+ * When there is an @open_stream, the query returns properties specific to
+ * that @open_stream. Else default properties are returned.
+ */
+struct drm_xe_query_oa_units {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @num_oa_units: number of OA units returned in oau[] */
+	__u32 num_oa_units;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	/** @reserved: MBZ */
+	__u64 reserved[4];
+
+	/** @oa_units: OA units returned for this device */
+	struct drm_xe_oa_unit {
+		/** @oa_unit_id: OA unit ID */
+		__u16 oa_unit_id;
+
+		/** @oa_unit_type: OA unit type of @drm_xe_oa_unit_type */
+		__u16 oa_unit_type;
+
+		/** @gt_id: GT ID for this OA unit */
+		__u16 gt_id;
+
+		/** @open_stream: True if a stream is open on the OA unit */
+		__u16 open_stream;
+
+		/** @internal_events: True if internal events are available */
+		__u16 internal_events;
+
+		/** @pad: MBZ */
+		__u16 pad;
+
+		/** @capabilities: OA capabilities bit-mask */
+		__u64 capabilities;
+
+		/** @oa_timestamp_freq: OA timestamp freq */
+		__u64 oa_timestamp_freq;
+
+		/** @oa_buf_size: OA buffer size */
+		__u64 oa_buf_size;
+
+		/** @reserved: MBZ */
+		__u64 reserved[4];
+
+		/** @num_engines: number of engines in @eci array */
+		__u64 num_engines;
+
+		/** @eci: engines attached to this OA unit */
+		struct drm_xe_engine_class_instance eci[];
+	} oa_units[];
+};
+
 /** enum drm_xe_oa_format_type - OA format types */
 enum drm_xe_oa_format_type {
 	DRM_XE_OA_FMT_TYPE_OAG,
-- 
2.41.0



* [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (13 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-23  2:39   ` Umesh Nerlige Ramappa
  2024-01-02 11:16   ` Thomas Hellström
  2023-12-08  6:43 ` [PATCH 16/17] drm/xe/oa: Add MMIO trigger support Ashutosh Dixit
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Allow the OA buffer to be mmap'd to userspace. This is needed for the MMIO
trigger use case. More generally, with whitelisted OA head/tail pointer
registers, userspace can receive and interpret OA data directly from the
mmap'd buffer without issuing read()s on the OA stream fd.

Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_oa.c | 53 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 42f32d4359f2c..97779cbb83ee8 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
 		return PTR_ERR(bo);
 
 	stream->oa_buffer.bo = bo;
+	/* mmap implementation requires OA buffer to be in system memory */
+	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
 	stream->oa_buffer.vaddr = bo->vmap.vaddr;
 	return 0;
 }
@@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode, struct file *file)
 	struct xe_oa_stream *stream = file->private_data;
 	struct xe_gt *gt = stream->gt;
 
+	/* Zap mmap's */
+	unmap_mapping_range(file->f_mapping, 0, -1, 1);
+
 	mutex_lock(&gt->oa.gt_lock);
 	xe_oa_destroy_locked(stream);
 	mutex_unlock(&gt->oa.gt_lock);
@@ -1184,6 +1189,53 @@ static int xe_oa_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct xe_oa_stream *stream = file->private_data;
+	struct xe_bo *bo = stream->oa_buffer.bo;
+	unsigned long start = vma->vm_start;
+	int i, ret;
+
+	if (xe_perf_stream_paranoid && !perfmon_capable()) {
+		drm_dbg(&stream->oa->xe->drm, "Insufficient privilege to map OA buffer\n");
+		return -EACCES;
+	}
+
+	/* Can mmap the entire OA buffer or nothing (no partial OA buffer mmaps) */
+	if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) {
+		drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must be OA buffer size\n");
+		return -EINVAL;
+	}
+
+	/* Only support VM_READ, enforce MAP_PRIVATE by checking for VM_MAYSHARE */
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_SHARED | VM_MAYSHARE)) {
+		drm_dbg(&stream->oa->xe->drm, "mmap must be read only\n");
+		return -EINVAL;
+	}
+
+	vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
+
+	/*
+	 * If the privileged parent forks and child drops root privilege, we do not want
+	 * the child to retain access to the mapped OA buffer. Explicitly set VM_DONTCOPY
+	 * to avoid such cases.
+	 */
+	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY);
+
+	xe_assert(stream->oa->xe, bo->ttm.ttm->num_pages ==
+		  (vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
+	for (i = 0; i < bo->ttm.ttm->num_pages; i++) {
+		ret = remap_pfn_range(vma, start, page_to_pfn(bo->ttm.ttm->pages[i]),
+				      PAGE_SIZE, vma->vm_page_prot);
+		if (ret)
+			break;
+
+		start += PAGE_SIZE;
+	}
+
+	return ret;
+}
+
 static const struct file_operations xe_oa_fops = {
 	.owner		= THIS_MODULE,
 	.llseek		= no_llseek,
@@ -1191,6 +1243,7 @@ static const struct file_operations xe_oa_fops = {
 	.poll		= xe_oa_poll,
 	.read		= xe_oa_read,
 	.unlocked_ioctl	= xe_oa_ioctl,
+	.mmap		= xe_oa_mmap,
 };
 
 static bool engine_supports_mi_query(struct xe_hw_engine *hwe)
-- 
2.41.0



* [PATCH 16/17] drm/xe/oa: Add MMIO trigger support
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (14 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-20  4:35   ` Umesh Nerlige Ramappa
  2023-12-08  6:43 ` [PATCH 17/17] drm/xe/oa: Override GuC RC with OA on PVC Ashutosh Dixit
  2023-12-08  9:22 ` ✗ CI.Patch_applied: failure for Add OA functionality to Xe (rev7) Patchwork
  17 siblings, 1 reply; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

Add MMIO trigger support and allow-list the registers required for the MMIO
trigger use case. The registers are whitelisted for the lifetime of the
driver, but MMIO triggering is enabled only for the duration of a stream.

Bspec: 45925, 60340, 61228

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/regs/xe_oa_regs.h  |  7 ++++++
 drivers/gpu/drm/xe/xe_oa.c            | 34 ++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_reg_whitelist.c | 23 ++++++++++++++++++
 include/uapi/drm/xe_drm.h             |  3 +++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
index b66cd95b795e7..1ce27a72079ad 100644
--- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
@@ -64,16 +64,23 @@
 #define  OA_OACONTROL_COUNTER_SIZE_MASK		REG_GENMASK(8, 8)
 
 #define OAG_OA_DEBUG XE_REG(0xdaf8, XE_REG_OPTION_MASKED)
+#define  OAG_OA_DEBUG_DISABLE_MMIO_TRG			REG_BIT(14)
+#define  OAG_OA_DEBUG_START_TRIGGER_SCOPE_CONTROL	REG_BIT(13)
+#define  OAG_OA_DEBUG_DISABLE_START_TRG_2_COUNT_QUAL	REG_BIT(8)
+#define  OAG_OA_DEBUG_DISABLE_START_TRG_1_COUNT_QUAL	REG_BIT(7)
 #define  OAG_OA_DEBUG_INCLUDE_CLK_RATIO			REG_BIT(6)
 #define  OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS		REG_BIT(5)
 #define  OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS		REG_BIT(2)
 #define  OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS	REG_BIT(1)
 
 #define OAG_OASTATUS XE_REG(0xdafc)
+#define  OAG_OASTATUS_MMIO_TRG_Q_FULL	REG_BIT(6)
 #define  OAG_OASTATUS_COUNTER_OVERFLOW	REG_BIT(2)
 #define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
 #define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
 
+#define OAG_MMIOTRIGGER			XE_REG(0xdb1c)
+
 /* OAC unit */
 #define OAC_OACONTROL			XE_REG(0x15114)
 
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 97779cbb83ee8..13c6e516d9169 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -525,6 +525,16 @@ static int __xe_oa_read(struct xe_oa_stream *stream, char __user *buf,
 		oastatus = xe_mmio_read32(stream->gt, oastatus_reg);
 	}
 
+	if (oastatus & OAG_OASTATUS_MMIO_TRG_Q_FULL) {
+		ret = xe_oa_append_status(stream, buf, count, offset,
+					  DRM_XE_OA_RECORD_OA_MMIO_TRG_Q_FULL);
+		if (ret)
+			return ret;
+
+		xe_mmio_rmw32(stream->gt, oastatus_reg,
+			      OAG_OASTATUS_MMIO_TRG_Q_FULL, 0);
+	}
+
 	if (oastatus & OAG_OASTATUS_REPORT_LOST) {
 		ret = xe_oa_append_status(stream, buf, count, offset,
 					  DRM_XE_OA_RECORD_OA_REPORT_LOST);
@@ -835,6 +845,13 @@ static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable)
 
 #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
 
+static u32 oag_configure_mmio_trigger(const struct xe_oa_stream *stream, bool enable)
+{
+	return _MASKED_FIELD(OAG_OA_DEBUG_DISABLE_MMIO_TRG,
+			     enable && stream && stream->sample ?
+			     0 : OAG_OA_DEBUG_DISABLE_MMIO_TRG);
+}
+
 static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
 {
 	u32 sqcnt1;
@@ -850,6 +867,9 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
 					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
 	}
 
+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_debug,
+			oag_configure_mmio_trigger(stream, false));
+
 	/* disable the context save/restore or OAR counters */
 	if (stream->exec_q)
 		xe_oa_configure_oa_context(stream, false);
@@ -1031,9 +1051,17 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
 	oa_debug = OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS |
 		OAG_OA_DEBUG_INCLUDE_CLK_RATIO;
 
+	if (GRAPHICS_VER(stream->oa->xe) >= 20)
+		oa_debug |=
+			/* The three bits below are needed to get PEC counters running */
+			OAG_OA_DEBUG_START_TRIGGER_SCOPE_CONTROL |
+			OAG_OA_DEBUG_DISABLE_START_TRG_2_COUNT_QUAL |
+			OAG_OA_DEBUG_DISABLE_START_TRG_1_COUNT_QUAL;
+
 	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_debug,
 			_MASKED_BIT_ENABLE(oa_debug) |
-			oag_report_ctx_switches(stream));
+			oag_report_ctx_switches(stream) |
+			oag_configure_mmio_trigger(stream, true));
 
 	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctx_ctrl, stream->periodic ?
 			(OAG_OAGLBCTXCTRL_COUNTER_RESUME |
@@ -2259,6 +2287,10 @@ static void __xe_oa_init_oa_units(struct xe_gt *gt)
 			u->type = DRM_XE_OA_UNIT_TYPE_OAM;
 		}
 
+		/* Ensure MMIO triggers remain disabled till there is a stream */
+		xe_mmio_write32(gt, u->regs.oa_debug,
+				oag_configure_mmio_trigger(NULL, false));
+
 		/* Set oa_unit_ids now to ensure ids remain contiguous */
 		u->oa_unit_id = gt_to_xe(gt)->oa.oa_unit_ids++;
 	}
diff --git a/drivers/gpu/drm/xe/xe_reg_whitelist.c b/drivers/gpu/drm/xe/xe_reg_whitelist.c
index e66ae1bdaf9c0..267af6759332b 100644
--- a/drivers/gpu/drm/xe/xe_reg_whitelist.c
+++ b/drivers/gpu/drm/xe/xe_reg_whitelist.c
@@ -7,6 +7,7 @@
 
 #include "regs/xe_engine_regs.h"
 #include "regs/xe_gt_regs.h"
+#include "regs/xe_oa_regs.h"
 #include "xe_gt_types.h"
 #include "xe_platform_types.h"
 #include "xe_rtp.h"
@@ -56,6 +57,28 @@ static const struct xe_rtp_entry_sr register_whitelist[] = {
 				   RING_FORCE_TO_NONPRIV_DENY,
 				   XE_RTP_ACTION_FLAG(ENGINE_BASE)))
 	},
+	{ XE_RTP_NAME("oa_reg_render"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED),
+		       ENGINE_CLASS(RENDER)),
+	  XE_RTP_ACTIONS(WHITELIST(OAG_MMIOTRIGGER,
+				   RING_FORCE_TO_NONPRIV_ACCESS_RW),
+			 WHITELIST(OAG_OASTATUS,
+				   RING_FORCE_TO_NONPRIV_ACCESS_RD),
+			 WHITELIST(OAG_OAHEADPTR,
+				   RING_FORCE_TO_NONPRIV_ACCESS_RD |
+				   RING_FORCE_TO_NONPRIV_RANGE_4))
+	},
+	{ XE_RTP_NAME("oa_reg_compute"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED),
+		       ENGINE_CLASS(COMPUTE)),
+	  XE_RTP_ACTIONS(WHITELIST(OAG_MMIOTRIGGER,
+				   RING_FORCE_TO_NONPRIV_ACCESS_RW),
+			 WHITELIST(OAG_OASTATUS,
+				   RING_FORCE_TO_NONPRIV_ACCESS_RD),
+			 WHITELIST(OAG_OAHEADPTR,
+				   RING_FORCE_TO_NONPRIV_ACCESS_RD |
+				   RING_FORCE_TO_NONPRIV_RANGE_4))
+	},
 	{}
 };
 
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 5f41c5bfe5e0e..34cd7d5206834 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1357,6 +1357,9 @@ enum drm_xe_oa_record_type {
 	 */
 	DRM_XE_OA_RECORD_OA_BUFFER_LOST = 3,
 
+	/** @DRM_XE_OA_RECORD_OA_MMIO_TRG_Q_FULL: Status indicating MMIO trigger queue full */
+	DRM_XE_OA_RECORD_OA_MMIO_TRG_Q_FULL = 4,
+
 	DRM_XE_OA_RECORD_MAX /* non-ABI */
 };
 
-- 
2.41.0



* [PATCH 17/17] drm/xe/oa: Override GuC RC with OA on PVC
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (15 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 16/17] drm/xe/oa: Add MMIO trigger support Ashutosh Dixit
@ 2023-12-08  6:43 ` Ashutosh Dixit
  2023-12-08  9:22 ` ✗ CI.Patch_applied: failure for Add OA functionality to Xe (rev7) Patchwork
  17 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2023-12-08  6:43 UTC (permalink / raw)
  To: intel-xe

On PVC, a workaround resets RCS/CCS engines before they enter RC6. This
breaks OA, which does not expect engine resets while it is in use. Fix it by
disabling RC6 for the duration of the stream.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_pc.c   | 60 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_pc.h   |  3 ++
 drivers/gpu/drm/xe/xe_oa.c       | 25 ++++++++++++-
 drivers/gpu/drm/xe/xe_oa_types.h |  3 ++
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index d2605a684b1c5..fb576d1fa8dc3 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -228,6 +228,27 @@ static int pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value)
 	return ret;
 }
 
+static int pc_action_unset_param(struct xe_guc_pc *pc, u8 id)
+{
+	struct xe_guc_ct *ct = &pc_to_guc(pc)->ct;
+	int ret;
+	u32 action[] = {
+		GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
+		SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1),
+		id,
+	};
+
+	if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_RUNNING))
+		return -EAGAIN;
+
+	ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0, 0);
+	if (ret)
+		drm_err(&pc_to_xe(pc)->drm, "GuC PC unset param failed: %pe",
+			ERR_PTR(ret));
+
+	return ret;
+}
+
 static int pc_action_setup_gucrc(struct xe_guc_pc *pc, u32 mode)
 {
 	struct xe_guc_ct *ct = &pc_to_guc(pc)->ct;
@@ -818,6 +839,45 @@ int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc)
 	return ret;
 }
 
+/**
+ * xe_guc_pc_override_gucrc_mode() - override GUCRC mode
+ * @pc: Xe_GuC_PC instance
+ * @mode: new value of the mode.
+ *
+ * Override the GUCRC mode.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int xe_guc_pc_override_gucrc_mode(struct xe_guc_pc *pc, enum slpc_gucrc_mode mode)
+{
+	int ret;
+
+	xe_device_mem_access_get(pc_to_xe(pc));
+	ret = pc_action_set_param(pc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
+	xe_device_mem_access_put(pc_to_xe(pc));
+
+	return ret;
+}
+
+/**
+ * xe_guc_pc_unset_gucrc_mode() - unset GUCRC mode override
+ * @pc: Xe_GuC_PC instance
+ *
+ * Unset the GUCRC mode override.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int xe_guc_pc_unset_gucrc_mode(struct xe_guc_pc *pc)
+{
+	int ret;
+
+	xe_device_mem_access_get(pc_to_xe(pc));
+	ret = pc_action_unset_param(pc, SLPC_PARAM_PWRGATE_RC_MODE);
+	xe_device_mem_access_put(pc_to_xe(pc));
+
+	return ret;
+}
+
 static void pc_init_pcode_freq(struct xe_guc_pc *pc)
 {
 	u32 min = DIV_ROUND_CLOSEST(pc->rpn_freq, GT_FREQUENCY_MULTIPLIER);
diff --git a/drivers/gpu/drm/xe/xe_guc_pc.h b/drivers/gpu/drm/xe/xe_guc_pc.h
index 054788e006f32..51b99c357e048 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.h
+++ b/drivers/gpu/drm/xe/xe_guc_pc.h
@@ -7,12 +7,15 @@
 #define _XE_GUC_PC_H_
 
 #include "xe_guc_pc_types.h"
+#include "abi/guc_actions_slpc_abi.h"
 
 int xe_guc_pc_init(struct xe_guc_pc *pc);
 void xe_guc_pc_fini(struct xe_guc_pc *pc);
 int xe_guc_pc_start(struct xe_guc_pc *pc);
 int xe_guc_pc_stop(struct xe_guc_pc *pc);
 int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc);
+int xe_guc_pc_override_gucrc_mode(struct xe_guc_pc *pc, enum slpc_gucrc_mode mode);
+int xe_guc_pc_unset_gucrc_mode(struct xe_guc_pc *pc);
 
 enum xe_gt_idle_state xe_guc_pc_c_status(struct xe_guc_pc *pc);
 u64 xe_guc_pc_rc6_residency(struct xe_guc_pc *pc);
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 13c6e516d9169..34b3f16333550 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -22,6 +22,7 @@
 #include "xe_bo.h"
 #include "xe_gt.h"
 #include "xe_gt_mcr.h"
+#include "xe_guc_pc.h"
 #include "xe_lrc.h"
 #include "xe_mmio.h"
 #include "xe_oa.h"
@@ -901,6 +902,10 @@ static void xe_oa_stream_destroy(struct xe_oa_stream *stream)
 	xe_device_mem_access_put(stream->oa->xe);
 
 	xe_oa_free_oa_buffer(stream);
+	/* Wa_1509372804:pvc: Unset the override of GUCRC mode to enable rc6 */
+	if (stream->override_gucrc)
+		XE_WARN_ON(xe_guc_pc_unset_gucrc_mode(&gt->uc.guc.pc));
+
 	xe_oa_free_configs(stream);
 }
 
@@ -1384,9 +1389,24 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
 		goto exit;
 	}
 
+	/*
+	 * Wa_1509372804:pvc
+	 *
+	 * GuC reset of engines causes OA to lose configuration
+	 * state. Prevent this by overriding GUCRC mode.
+	 */
+	if (stream->oa->xe->info.platform == XE_PVC) {
+		ret = xe_guc_pc_override_gucrc_mode(&gt->uc.guc.pc,
+						    SLPC_GUCRC_MODE_GUCRC_NO_RC6);
+		if (ret)
+			goto err_free_configs;
+
+		stream->override_gucrc = true;
+	}
+
 	ret = xe_oa_alloc_oa_buffer(stream);
 	if (ret)
-		goto err_free_configs;
+		goto err_unset_gucrc;
 
 	/* Take runtime pm ref and forcewake to disable RC6 */
 	xe_device_mem_access_get(stream->oa->xe);
@@ -1429,6 +1449,9 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 	xe_device_mem_access_put(stream->oa->xe);
 	xe_oa_free_oa_buffer(stream);
+err_unset_gucrc:
+	if (stream->override_gucrc)
+		XE_WARN_ON(xe_guc_pc_unset_gucrc_mode(&gt->uc.guc.pc));
 err_free_configs:
 	xe_oa_free_configs(stream);
 exit:
diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
index bcd8d249faaec..e11555bac7c79 100644
--- a/drivers/gpu/drm/xe/xe_oa_types.h
+++ b/drivers/gpu/drm/xe/xe_oa_types.h
@@ -226,5 +226,8 @@ struct xe_oa_stream {
 
 	/** @poll_period_ns: hrtimer period for checking OA buffer for available data */
 	u64 poll_period_ns;
+
+	/** @override_gucrc: GuC RC has been overridden for the OA stream */
+	bool override_gucrc;
 };
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* ✗ CI.Patch_applied: failure for Add OA functionality to Xe (rev7)
  2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
                   ` (16 preceding siblings ...)
  2023-12-08  6:43 ` [PATCH 17/17] drm/xe/oa: Override GuC RC with OA on PVC Ashutosh Dixit
@ 2023-12-08  9:22 ` Patchwork
  17 siblings, 0 replies; 68+ messages in thread
From: Patchwork @ 2023-12-08  9:22 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

== Series Details ==

Series: Add OA functionality to Xe (rev7)
URL   : https://patchwork.freedesktop.org/series/121084/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: c4235ef53 fixup! drm/xe/display: Implement display support
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_device_types.h:15
error: drivers/gpu/drm/xe/xe_device_types.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types
Applying: drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
Applying: drm/xe/oa/uapi: Add oa_max_sample_rate sysctl
Applying: drm/xe/oa/uapi: Add OA data formats
Patch failed at 0004 drm/xe/oa/uapi: Add OA data formats
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-08  6:43 ` [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties Ashutosh Dixit
@ 2023-12-09 22:53   ` Dixit, Ashutosh
  2023-12-19  2:59   ` Dixit, Ashutosh
  2023-12-19 23:23   ` Umesh Nerlige Ramappa
  2 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2023-12-09 22:53 UTC (permalink / raw)
  To: intel-xe; +Cc: Harish Chegondi

On Thu, 07 Dec 2023 22:43:19 -0800, Ashutosh Dixit wrote:
>
> +/**
> + * struct drm_xe_oa_open_param - Params for opening an OA stream
> + *
> + * Stream params are specified as a chain of @drm_xe_ext_set_property
> + * struct's, with @property values from enum @drm_xe_oa_property_id and
> + * @xe_user_extension base.name set to @DRM_XE_OA_EXTENSION_SET_PROPERTY
> + */
> +struct drm_xe_oa_open_param {
> +#define DRM_XE_OA_EXTENSION_SET_PROPERTY	0
> +	/** @extensions: Pointer to the first extension struct */
> +	__u64 extensions;
> +};

Harish pointed out that this struct is not really needed. We might as well
point the @param field in struct @drm_xe_perf_param to the first
@drm_xe_ext_set_property struct. I'll make this change in the next
revision. Thanks Harish.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2023-12-08  6:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
@ 2023-12-14  0:57   ` Umesh Nerlige Ramappa
  2023-12-19 20:28   ` Dixit, Ashutosh
  2024-01-24 14:10   ` Joel Granados
  2 siblings, 0 replies; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-14  0:57 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:14PM -0800, Ashutosh Dixit wrote:
>Normally only superuser/root can access perf counter data. However,
>superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
>users to also access perf data. perf_stream_paranoid is introduced at the
>perf layer to allow different perf stream types to share this access
>mechanism.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

lgtm,

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

>---
> drivers/gpu/drm/xe/xe_module.c |  5 +++++
> drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
> 3 files changed, 37 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
>index 51bf69b7ab222..8629330d928b0 100644
>--- a/drivers/gpu/drm/xe/xe_module.c
>+++ b/drivers/gpu/drm/xe/xe_module.c
>@@ -11,6 +11,7 @@
> #include "xe_drv.h"
> #include "xe_hw_fence.h"
> #include "xe_pci.h"
>+#include "xe_perf.h"
> #include "xe_pmu.h"
> #include "xe_sched_job.h"
>
>@@ -71,6 +72,10 @@ static const struct init_funcs init_funcs[] = {
> 		.init = xe_register_pci_driver,
> 		.exit = xe_unregister_pci_driver,
> 	},
>+	{
>+		.init = xe_perf_sysctl_register,
>+		.exit = xe_perf_sysctl_unregister,
>+	},
> };
>
> static int __init xe_init(void)
>diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
>index a130076b59aa2..37538e98dcc04 100644
>--- a/drivers/gpu/drm/xe/xe_perf.c
>+++ b/drivers/gpu/drm/xe/xe_perf.c
>@@ -4,9 +4,13 @@
>  */
>
> #include <linux/errno.h>
>+#include <linux/sysctl.h>
>
> #include "xe_perf.h"
>
>+u32 xe_perf_stream_paranoid = true;
>+static struct ctl_table_header *sysctl_header;
>+
> int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> {
> 	struct drm_xe_perf_param *arg = data;
>@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> 		return -EINVAL;
> 	}
> }
>+
>+static struct ctl_table perf_ctl_table[] = {
>+	{
>+	 .procname = "perf_stream_paranoid",
>+	 .data = &xe_perf_stream_paranoid,
>+	 .maxlen = sizeof(xe_perf_stream_paranoid),
>+	 .mode = 0644,
>+	 .proc_handler = proc_dointvec_minmax,
>+	 .extra1 = SYSCTL_ZERO,
>+	 .extra2 = SYSCTL_ONE,
>+	 },
>+	{}
>+};
>+
>+int xe_perf_sysctl_register(void)
>+{
>+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
>+	return 0;
>+}
>+
>+void xe_perf_sysctl_unregister(void)
>+{
>+	unregister_sysctl_table(sysctl_header);
>+}
>diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
>index 254cc7cf49fef..1ff0a07ebab30 100644
>--- a/drivers/gpu/drm/xe/xe_perf.h
>+++ b/drivers/gpu/drm/xe/xe_perf.h
>@@ -11,6 +11,10 @@
> struct drm_device;
> struct drm_file;
>
>+extern u32 xe_perf_stream_paranoid;
>+
> int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
>+int xe_perf_sysctl_register(void);
>+void xe_perf_sysctl_unregister(void);
>
> #endif
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl
  2023-12-08  6:43 ` [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl Ashutosh Dixit
@ 2023-12-14  0:58   ` Umesh Nerlige Ramappa
  2024-01-20  2:36     ` Dixit, Ashutosh
  2024-01-24 14:11   ` Joel Granados
  1 sibling, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-14  0:58 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:15PM -0800, Ashutosh Dixit wrote:
>Introduce oa_max_sample_rate sysctl to set a max limit on the frequency of
>periodic OA reports.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/Makefile    |  1 +
> drivers/gpu/drm/xe/xe_device.c |  7 +++++
> drivers/gpu/drm/xe/xe_module.c |  5 ++++
> drivers/gpu/drm/xe/xe_oa.c     | 49 ++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa.h     | 16 +++++++++++
> 5 files changed, 78 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_oa.c
> create mode 100644 drivers/gpu/drm/xe/xe_oa.h
>
>diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>index b719953d9d30f..cf7e0e5261f73 100644
>--- a/drivers/gpu/drm/xe/Makefile
>+++ b/drivers/gpu/drm/xe/Makefile
>@@ -98,6 +98,7 @@ xe-y += xe_bb.o \
> 	xe_mmio.o \
> 	xe_mocs.o \
> 	xe_module.o \
>+	xe_oa.o \
> 	xe_pat.o \
> 	xe_pci.o \
> 	xe_pcode.o \
>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>index 35616d1a81a31..744d573eb2720 100644
>--- a/drivers/gpu/drm/xe/xe_device.c
>+++ b/drivers/gpu/drm/xe/xe_device.c
>@@ -29,6 +29,7 @@
> #include "xe_irq.h"
> #include "xe_mmio.h"
> #include "xe_module.h"
>+#include "xe_oa.h"
> #include "xe_pat.h"
> #include "xe_pcode.h"
> #include "xe_perf.h"
>@@ -480,6 +481,10 @@ int xe_device_probe(struct xe_device *xe)
>
> 	xe_heci_gsc_init(xe);
>
>+	err = xe_oa_init(xe);
>+	if (err)
>+		goto err_irq_shutdown;
>+
> 	err = xe_display_init(xe);
> 	if (err)
> 		goto err_irq_shutdown;

^ this needs to do an xe_oa_fini on failure, so it should jump to a 
different/new goto label.

Umesh

>@@ -526,6 +531,8 @@ void xe_device_remove(struct xe_device *xe)
>
> 	xe_display_fini(xe);
>
>+	xe_oa_fini(xe);
>+
> 	xe_heci_gsc_fini(xe);
>
> 	xe_irq_shutdown(xe);
>diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
>index 8629330d928b0..176d3e6ec8464 100644
>--- a/drivers/gpu/drm/xe/xe_module.c
>+++ b/drivers/gpu/drm/xe/xe_module.c
>@@ -10,6 +10,7 @@
>
> #include "xe_drv.h"
> #include "xe_hw_fence.h"
>+#include "xe_oa.h"
> #include "xe_pci.h"
> #include "xe_perf.h"
> #include "xe_pmu.h"
>@@ -76,6 +77,10 @@ static const struct init_funcs init_funcs[] = {
> 		.init = xe_perf_sysctl_register,
> 		.exit = xe_perf_sysctl_unregister,
> 	},
>+	{
>+		.init = xe_oa_sysctl_register,
>+		.exit = xe_oa_sysctl_unregister,
>+	},
> };
>
> static int __init xe_init(void)
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>new file mode 100644
>index 0000000000000..f4cacb4af47c5
>--- /dev/null
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -0,0 +1,49 @@
>+// SPDX-License-Identifier: MIT
>+/*
>+ * Copyright © 2023 Intel Corporation
>+ */
>+
>+#include <linux/sysctl.h>
>+
>+#include "xe_device.h"
>+#include "xe_oa.h"
>+
>+static int xe_oa_sample_rate_hard_limit;
>+static u32 xe_oa_max_sample_rate = 100000;
>+
>+static struct ctl_table_header *sysctl_header;
>+
>+int xe_oa_init(struct xe_device *xe)
>+{
>+	/* Choose a representative limit */
>+	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
>+	return 0;
>+}
>+
>+void xe_oa_fini(struct xe_device *xe)
>+{
>+}
>+
>+static struct ctl_table oa_ctl_table[] = {
>+	{
>+	 .procname = "oa_max_sample_rate",
>+	 .data = &xe_oa_max_sample_rate,
>+	 .maxlen = sizeof(xe_oa_max_sample_rate),
>+	 .mode = 0644,
>+	 .proc_handler = proc_dointvec_minmax,
>+	 .extra1 = SYSCTL_ZERO,
>+	 .extra2 = &xe_oa_sample_rate_hard_limit,
>+	 },
>+	{}
>+};
>+
>+int xe_oa_sysctl_register(void)
>+{
>+	sysctl_header = register_sysctl("dev/xe", oa_ctl_table);
>+	return 0;
>+}
>+
>+void xe_oa_sysctl_unregister(void)
>+{
>+	unregister_sysctl_table(sysctl_header);
>+}
>diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
>new file mode 100644
>index 0000000000000..1b81330c9708b
>--- /dev/null
>+++ b/drivers/gpu/drm/xe/xe_oa.h
>@@ -0,0 +1,16 @@
>+/* SPDX-License-Identifier: MIT */
>+/*
>+ * Copyright © 2023 Intel Corporation
>+ */
>+
>+#ifndef _XE_OA_H_
>+#define _XE_OA_H_
>+
>+struct xe_device;
>+
>+int xe_oa_init(struct xe_device *xe);
>+void xe_oa_fini(struct xe_device *xe);
>+int xe_oa_sysctl_register(void);
>+void xe_oa_sysctl_unregister(void);
>+
>+#endif
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats
  2023-12-08  6:43 ` [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats Ashutosh Dixit
@ 2023-12-19  1:11   ` Umesh Nerlige Ramappa
  2023-12-19  1:17     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-19  1:11 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:16PM -0800, Ashutosh Dixit wrote:
>Add and initialize supported OA data formats for various platforms
>(including Xe2). User can request OA data in any supported format.
>
>Bspec: 52198, 60942, 61101
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/xe_device_types.h |  4 ++
> drivers/gpu/drm/xe/xe_oa.c           | 94 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa.h           |  2 +
> drivers/gpu/drm/xe/xe_oa_types.h     | 78 +++++++++++++++++++++++
> include/uapi/drm/xe_drm.h            | 10 +++
> 5 files changed, 188 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_oa_types.h
>
>diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>index 9a212dbdb8a49..842ca8b1a7408 100644
>--- a/drivers/gpu/drm/xe/xe_device_types.h
>+++ b/drivers/gpu/drm/xe/xe_device_types.h
>@@ -15,6 +15,7 @@
> #include "xe_devcoredump_types.h"
> #include "xe_heci_gsc.h"
> #include "xe_gt_types.h"
>+#include "xe_oa.h"
> #include "xe_platform_types.h"
> #include "xe_pt_types.h"
> #include "xe_pmu.h"
>@@ -418,6 +419,9 @@ struct xe_device {
> 	/** @heci_gsc: graphics security controller */
> 	struct xe_heci_gsc heci_gsc;
>
>+	/** @oa: oa perf counter subsystem */
>+	struct xe_oa oa;
>+
> 	/** @needs_flr_on_fini: requests function-reset on fini */
> 	bool needs_flr_on_fini;
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index f4cacb4af47c5..11662a81ef6d8 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -13,15 +13,109 @@ static u32 xe_oa_max_sample_rate = 100000;
>
> static struct ctl_table_header *sysctl_header;
>
>+#define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
>+
>+static const struct xe_oa_format oa_formats[] = {
>+	[XE_OA_FORMAT_C4_B8]			= { 7, 64 },
>+	[XE_OA_FORMAT_A12]			= { 0, 64 },
>+	[XE_OA_FORMAT_A12_B8_C8]		= { 2, 128 },
>+	[XE_OA_FORMAT_A32u40_A4u32_B8_C8]	= { 5, 256 },
>+	[XE_OAR_FORMAT_A32u40_A4u32_B8_C8]	= { 5, 256, DRM_FMT(OAR) },
>+	[XE_OA_FORMAT_A24u40_A14u32_B8_C8]	= { 5, 256 },
>+	[XE_OAC_FORMAT_A24u64_B8_C8]		= { 1, 320, DRM_FMT(OAC), HDR_64_BIT },
>+	[XE_OAC_FORMAT_A22u32_R2u32_B8_C8]	= { 2, 192, DRM_FMT(OAC), HDR_64_BIT },
>+	[XE_OAM_FORMAT_MPEC8u64_B8_C8]		= { 1, 192, DRM_FMT(OAM_MPEC), HDR_64_BIT },
>+	[XE_OAM_FORMAT_MPEC8u32_B8_C8]		= { 2, 128, DRM_FMT(OAM_MPEC), HDR_64_BIT },
>+	[XE_OA_FORMAT_PEC64u64]			= { 1, 576, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
>+	[XE_OA_FORMAT_PEC64u64_B8_C8]		= { 1, 640, DRM_FMT(PEC), HDR_64_BIT, 1, 1 },
>+	[XE_OA_FORMAT_PEC64u32]			= { 1, 320, DRM_FMT(PEC), HDR_64_BIT },
>+	[XE_OA_FORMAT_PEC32u64_G1]		= { 5, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
>+	[XE_OA_FORMAT_PEC32u32_G1]		= { 5, 192, DRM_FMT(PEC), HDR_64_BIT },
>+	[XE_OA_FORMAT_PEC32u64_G2]		= { 6, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
>+	[XE_OA_FORMAT_PEC32u32_G2]		= { 6, 192, DRM_FMT(PEC), HDR_64_BIT },
>+	[XE_OA_FORMAT_PEC36u64_G1_32_G2_4]	= { 3, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
>+	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
>+};
>+
>+static void oa_format_add(struct xe_oa *oa, enum xe_oa_format_name format)
>+{
>+	__set_bit(format, oa->format_mask);
>+}
>+
>+static void xe_oa_init_supported_formats(struct xe_oa *oa)
>+{
>+	switch (oa->xe->info.platform) {
>+	case XE_TIGERLAKE:
>+	case XE_ROCKETLAKE:
>+	case XE_ALDERLAKE_S:
>+	case XE_ALDERLAKE_P:
>+	case XE_ALDERLAKE_N:
>+	case XE_DG1:
>+		oa_format_add(oa, XE_OA_FORMAT_A12);
>+		oa_format_add(oa, XE_OA_FORMAT_A12_B8_C8);
>+		oa_format_add(oa, XE_OA_FORMAT_A32u40_A4u32_B8_C8);
>+		oa_format_add(oa, XE_OA_FORMAT_C4_B8);
>+		break;
>+
>+	case XE_DG2:
>+	case XE_PVC:
>+		oa_format_add(oa, XE_OAR_FORMAT_A32u40_A4u32_B8_C8);
>+		oa_format_add(oa, XE_OA_FORMAT_A24u40_A14u32_B8_C8);
>+		oa_format_add(oa, XE_OAC_FORMAT_A24u64_B8_C8);
>+		oa_format_add(oa, XE_OAC_FORMAT_A22u32_R2u32_B8_C8);
>+		break;
>+
>+	case XE_METEORLAKE:
>+		oa_format_add(oa, XE_OAR_FORMAT_A32u40_A4u32_B8_C8);
>+		oa_format_add(oa, XE_OA_FORMAT_A24u40_A14u32_B8_C8);
>+		oa_format_add(oa, XE_OAC_FORMAT_A24u64_B8_C8);
>+		oa_format_add(oa, XE_OAC_FORMAT_A22u32_R2u32_B8_C8);
>+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u64_B8_C8);
>+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u32_B8_C8);
>+		break;
>+
>+	case XE_LUNARLAKE:
>+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u64_B8_C8);
>+		oa_format_add(oa, XE_OAM_FORMAT_MPEC8u32_B8_C8);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC64u64);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC64u64_B8_C8);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC64u32);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC32u64_G1);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC32u32_G1);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC32u64_G2);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC32u32_G2);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC36u64_G1_32_G2_4);
>+		oa_format_add(oa, XE_OA_FORMAT_PEC36u64_G1_4_G2_32);
>+		break;
>+
>+	default:
>+		drm_err(&oa->xe->drm, "Unknown platform\n");
>+	}
>+}
>+
> int xe_oa_init(struct xe_device *xe)
> {
>+	struct xe_oa *oa = &xe->oa;
>+
>+	/* Support OA only with GuC submission and Gen12+ */
>+	if (XE_WARN_ON(!xe_device_uc_enabled(xe)) || XE_WARN_ON(GRAPHICS_VER(xe) < 12))
>+		return 0;
>+
>+	oa->xe = xe;
>+	oa->oa_formats = oa_formats;
>+
> 	/* Choose a representative limit */
> 	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
>+
>+	xe_oa_init_supported_formats(oa);
> 	return 0;
> }
>
> void xe_oa_fini(struct xe_device *xe)
> {
>+	struct xe_oa *oa = &xe->oa;
>+
>+	oa->xe = NULL;
> }
>
> static struct ctl_table oa_ctl_table[] = {
>diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
>index 1b81330c9708b..2145c73176953 100644
>--- a/drivers/gpu/drm/xe/xe_oa.h
>+++ b/drivers/gpu/drm/xe/xe_oa.h
>@@ -6,6 +6,8 @@
> #ifndef _XE_OA_H_
> #define _XE_OA_H_
>
>+#include "xe_oa_types.h"
>+
> struct xe_device;
>
> int xe_oa_init(struct xe_device *xe);
>diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
>new file mode 100644
>index 0000000000000..3758bd2879cbb
>--- /dev/null
>+++ b/drivers/gpu/drm/xe/xe_oa_types.h
>@@ -0,0 +1,78 @@
>+/* SPDX-License-Identifier: MIT */
>+/*
>+ * Copyright © 2023 Intel Corporation
>+ */
>+
>+#ifndef _XE_OA_TYPES_H_
>+#define _XE_OA_TYPES_H_
>+
>+#include <linux/math.h>
>+#include <linux/types.h>
>+
>+enum xe_oa_report_header {
>+	HDR_32_BIT = 0,
>+	HDR_64_BIT,
>+};
>+
>+enum xe_oa_format_name {
>+	XE_OA_FORMAT_C4_B8 = 7,

7? Leaving room for old formats? Not sure if it adds any value. Do you 
anticipate this driver being supported on pre-gen12? If not, IMO, we 
should just start with 0 OR 1 (if you want to use 0 for some special 
case).

rest of it, lgtm,
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Umesh

>+
>+	/* Gen8+ */
>+	XE_OA_FORMAT_A12,
>+	XE_OA_FORMAT_A12_B8_C8,
>+	XE_OA_FORMAT_A32u40_A4u32_B8_C8,
>+
>+	/* DG2 */
>+	XE_OAR_FORMAT_A32u40_A4u32_B8_C8,
>+	XE_OA_FORMAT_A24u40_A14u32_B8_C8,
>+
>+	/* DG2/MTL OAC */
>+	XE_OAC_FORMAT_A24u64_B8_C8,
>+	XE_OAC_FORMAT_A22u32_R2u32_B8_C8,
>+
>+	/* MTL OAM */
>+	XE_OAM_FORMAT_MPEC8u64_B8_C8,
>+	XE_OAM_FORMAT_MPEC8u32_B8_C8,
>+
>+	/* Xe2+ */
>+	XE_OA_FORMAT_PEC64u64,
>+	XE_OA_FORMAT_PEC64u64_B8_C8,
>+	XE_OA_FORMAT_PEC64u32,
>+	XE_OA_FORMAT_PEC32u64_G1,
>+	XE_OA_FORMAT_PEC32u32_G1,
>+	XE_OA_FORMAT_PEC32u64_G2,
>+	XE_OA_FORMAT_PEC32u32_G2,
>+	XE_OA_FORMAT_PEC36u64_G1_32_G2_4,
>+	XE_OA_FORMAT_PEC36u64_G1_4_G2_32,
>+
>+	XE_OA_FORMAT_MAX,
>+};
>+
>+/**
>+ * struct xe_oa_format - Format fields for supported OA formats
>+ */
>+struct xe_oa_format {
>+	u32 counter_select;
>+	int size;
>+	int type;
>+	enum xe_oa_report_header header;
>+	u16 counter_size;
>+	u16 bc_report;
>+};
>+
>+/**
>+ * struct xe_oa - OA device level information
>+ */
>+struct xe_oa {
>+	/** @xe: back pointer to xe device */
>+	struct xe_device *xe;
>+
>+	/** @oa_formats: tracks all OA formats across platforms */
>+	const struct xe_oa_format *oa_formats;
>+
>+#define FORMAT_MASK_SIZE DIV_ROUND_UP(XE_OA_FORMAT_MAX - 1, BITS_PER_LONG)
>+
>+	/** @format_mask: tracks valid OA formats for a platform */
>+	unsigned long format_mask[FORMAT_MASK_SIZE];
>+};
>+#endif
>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>index 3539e0781d700..5bfb2d5aba12a 100644
>--- a/include/uapi/drm/xe_drm.h
>+++ b/include/uapi/drm/xe_drm.h
>@@ -1175,6 +1175,16 @@ enum drm_xe_perf_ioctls {
> 	DRM_XE_PERF_IOCTL_CONFIG = _IO('i', 0x2),
> };
>
>+/** enum drm_xe_oa_format_type - OA format types */
>+enum drm_xe_oa_format_type {
>+	DRM_XE_OA_FMT_TYPE_OAG,
>+	DRM_XE_OA_FMT_TYPE_OAR,
>+	DRM_XE_OA_FMT_TYPE_OAM,
>+	DRM_XE_OA_FMT_TYPE_OAC,
>+	DRM_XE_OA_FMT_TYPE_OAM_MPEC,
>+	DRM_XE_OA_FMT_TYPE_PEC,
>+};
>+
> #if defined(__cplusplus)
> }
> #endif
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats
  2023-12-19  1:11   ` Umesh Nerlige Ramappa
@ 2023-12-19  1:17     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2023-12-19  1:17 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Mon, 18 Dec 2023 17:11:37 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> > +enum xe_oa_format_name {
> > +	XE_OA_FORMAT_C4_B8 = 7,
>
> 7? Leaving room for old formats? Not sure if it adds any value. Do you
> anticipate this driver being supported on pre-gen12? If not, IMO, we should
> just start with 0 OR 1 (if you want to use 0 for some special case).
>
> rest of it, lgtm,
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Yes good point, I'll fix this up. The OA format uapi does need review but
that's a later patch.

Thanks.
--
Ashutosh


>
> Umesh
>
> > +
> > +	/* Gen8+ */
> > +	XE_OA_FORMAT_A12,
> > +	XE_OA_FORMAT_A12_B8_C8,
> > +	XE_OA_FORMAT_A32u40_A4u32_B8_C8,
> > +
> > +	/* DG2 */
> > +	XE_OAR_FORMAT_A32u40_A4u32_B8_C8,
> > +	XE_OA_FORMAT_A24u40_A14u32_B8_C8,
> > +
> > +	/* DG2/MTL OAC */
> > +	XE_OAC_FORMAT_A24u64_B8_C8,
> > +	XE_OAC_FORMAT_A22u32_R2u32_B8_C8,
> > +
> > +	/* MTL OAM */
> > +	XE_OAM_FORMAT_MPEC8u64_B8_C8,
> > +	XE_OAM_FORMAT_MPEC8u32_B8_C8,
> > +
> > +	/* Xe2+ */
> > +	XE_OA_FORMAT_PEC64u64,
> > +	XE_OA_FORMAT_PEC64u64_B8_C8,
> > +	XE_OA_FORMAT_PEC64u32,
> > +	XE_OA_FORMAT_PEC32u64_G1,
> > +	XE_OA_FORMAT_PEC32u32_G1,
> > +	XE_OA_FORMAT_PEC32u64_G2,
> > +	XE_OA_FORMAT_PEC32u32_G2,
> > +	XE_OA_FORMAT_PEC36u64_G1_32_G2_4,
> > +	XE_OA_FORMAT_PEC36u64_G1_4_G2_32,
> > +
> > +	XE_OA_FORMAT_MAX,
> > +};
> > +
> > +/**
> > + * struct xe_oa_format - Format fields for supported OA formats
> > + */
> > +struct xe_oa_format {
> > +	u32 counter_select;
> > +	int size;
> > +	int type;
> > +	enum xe_oa_report_header header;
> > +	u16 counter_size;
> > +	u16 bc_report;
> > +};
> > +
> > +/**
> > + * struct xe_oa - OA device level information
> > + */
> > +struct xe_oa {
> > +	/** @xe: back pointer to xe device */
> > +	struct xe_device *xe;
> > +
> > +	/** @oa_formats: tracks all OA formats across platforms */
> > +	const struct xe_oa_format *oa_formats;
> > +
> > +#define FORMAT_MASK_SIZE DIV_ROUND_UP(XE_OA_FORMAT_MAX - 1, BITS_PER_LONG)
> > +
> > +	/** @format_mask: tracks valid OA formats for a platform */
> > +	unsigned long format_mask[FORMAT_MASK_SIZE];
> > +};
> > +#endif
> > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > index 3539e0781d700..5bfb2d5aba12a 100644
> > --- a/include/uapi/drm/xe_drm.h
> > +++ b/include/uapi/drm/xe_drm.h
> > @@ -1175,6 +1175,16 @@ enum drm_xe_perf_ioctls {
> >	DRM_XE_PERF_IOCTL_CONFIG = _IO('i', 0x2),
> > };
> >
> > +/** enum drm_xe_oa_format_type - OA format types */
> > +enum drm_xe_oa_format_type {
> > +	DRM_XE_OA_FMT_TYPE_OAG,
> > +	DRM_XE_OA_FMT_TYPE_OAR,
> > +	DRM_XE_OA_FMT_TYPE_OAM,
> > +	DRM_XE_OA_FMT_TYPE_OAC,
> > +	DRM_XE_OA_FMT_TYPE_OAM_MPEC,
> > +	DRM_XE_OA_FMT_TYPE_PEC,
> > +};
> > +
> > #if defined(__cplusplus)
> > }
> > #endif
> > -- 
> > 2.41.0
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-08  6:43 ` [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties Ashutosh Dixit
  2023-12-09 22:53   ` Dixit, Ashutosh
@ 2023-12-19  2:59   ` Dixit, Ashutosh
  2023-12-19 16:26     ` Umesh Nerlige Ramappa
  2023-12-19 23:23   ` Umesh Nerlige Ramappa
  2 siblings, 1 reply; 68+ messages in thread
From: Dixit, Ashutosh @ 2023-12-19  2:59 UTC (permalink / raw)
  To: intel-xe; +Cc: Harish Chegondi, gzadicario, Robert Krzemien

On Thu, 07 Dec 2023 22:43:19 -0800, Ashutosh Dixit wrote:
>
> +	/**
> +	 * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
> +	 * directly applied to returned OA fd. DISABLED opens the OA stream in a
> +	 * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
> +	 */
> +	DRM_XE_OA_PROPERTY_OPEN_FLAGS,
> +#define DRM_XE_OA_FLAG_FD_CLOEXEC	(1 << 0)
> +#define DRM_XE_OA_FLAG_FD_NONBLOCK	(1 << 1)
> +#define DRM_XE_OA_FLAG_DISABLED		(1 << 2)

I am wondering why these flags should be part of this uapi:

* O_CLOEXEC and O_NONBLOCK can be set on the returned stream fd using fcntl
  (see man 2 fcntl)
* DRM_XE_OA_FLAG_DISABLED can just be a stream open property; it doesn't
  need to be an fd flag.

Comments?

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units
  2023-12-08  6:43 ` [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units Ashutosh Dixit
@ 2023-12-19 16:11   ` Umesh Nerlige Ramappa
  2024-01-20  2:43     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-19 16:11 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:17PM -0800, Ashutosh Dixit wrote:
>Initialize OA unit data struct's for each gt during device probe. Also
>assign OA units for hardware engines.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/regs/xe_oa_regs.h    |  96 ++++++++++++++
> drivers/gpu/drm/xe/xe_gt_types.h        |   4 +
> drivers/gpu/drm/xe/xe_hw_engine_types.h |   2 +
> drivers/gpu/drm/xe/xe_oa.c              | 169 ++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa_types.h        |  56 ++++++++
> include/uapi/drm/xe_drm.h               |   6 +
> 6 files changed, 333 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/regs/xe_oa_regs.h
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>new file mode 100644
>index 0000000000000..4455a5a42b01b
>--- /dev/null
>+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>@@ -0,0 +1,96 @@
>+/* SPDX-License-Identifier: MIT */
>+/*
>+ * Copyright © 2023 Intel Corporation
>+ */
>+
>+#ifndef __XE_OA_REGS__
>+#define __XE_OA_REGS__
>+
>+#define REG_EQUAL(reg, xe_reg) ((reg) == (xe_reg.addr))
>+#define REG_EQUAL_MCR(reg, xe_reg) ((reg) == (xe_reg.__reg.addr))
>+
>+#define RPM_CONFIG1			XE_REG(0xd04)
>+#define   GT_NOA_ENABLE			REG_BIT(9)
>+
>+#define EU_PERF_CNTL0			XE_REG(0xe458)
>+#define EU_PERF_CNTL4			XE_REG(0xe45c)
>+#define EU_PERF_CNTL1			XE_REG(0xe558)
>+#define EU_PERF_CNTL5			XE_REG(0xe55c)
>+#define EU_PERF_CNTL2			XE_REG(0xe658)
>+#define EU_PERF_CNTL6			XE_REG(0xe65c)
>+#define EU_PERF_CNTL3			XE_REG(0xe758)
>+
>+#define OA_TLB_INV_CR			XE_REG(0xceec)
>+
>+/* OAR unit */
>+#define OAR_OACONTROL			XE_REG(0x2960)
>+#define  OAR_OACONTROL_COUNTER_SEL_MASK	REG_GENMASK(3, 1)
>+#define  OAR_OACONTROL_COUNTER_ENABLE	REG_BIT(0)
>+
>+#define OACTXCONTROL(base) XE_REG((base) + 0x360)
>+#define OAR_OASTATUS			XE_REG(0x2968)
>+#define  OA_COUNTER_RESUME		REG_BIT(0)
>+
>+/* OAG unit */
>+#define OAG_OAGLBCTXCTRL		XE_REG(0x2b28)
>+#define  OAG_OAGLBCTXCTRL_TIMER_PERIOD_MASK	REG_GENMASK(7, 2)
>+#define  OAG_OAGLBCTXCTRL_TIMER_ENABLE		REG_BIT(1)
>+#define  OAG_OAGLBCTXCTRL_COUNTER_RESUME	REG_BIT(0)
>+
>+#define OAG_OAHEADPTR				XE_REG(0xdb00)
>+#define  OAG_OAHEADPTR_MASK			REG_GENMASK(31, 6)
>+#define OAG_OATAILPTR				XE_REG(0xdb04)
>+#define  OAG_OATAILPTR_MASK			REG_GENMASK(31, 6)
>+
>+#define OAG_OABUFFER		XE_REG(0xdb08)
>+#define  OABUFFER_SIZE_MASK	REG_GENMASK(5, 3)
>+#define  OABUFFER_SIZE_128K	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 0)
>+#define  OABUFFER_SIZE_256K	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 1)
>+#define  OABUFFER_SIZE_512K	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 2)
>+#define  OABUFFER_SIZE_1M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 3)
>+#define  OABUFFER_SIZE_2M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 4)
>+#define  OABUFFER_SIZE_4M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 5)
>+#define  OABUFFER_SIZE_8M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 6)
>+#define  OABUFFER_SIZE_16M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 7)
>+#define  OAG_OABUFFER_MEMORY_SELECT		REG_BIT(0) /* 0: PPGTT, 1: GGTT */
>+
>+#define OAG_OACONTROL				XE_REG(0xdaf4)
>+#define  OAG_OACONTROL_OA_CCS_SELECT_MASK	REG_GENMASK(18, 16)
>+#define  OAG_OACONTROL_OA_COUNTER_SEL_MASK	REG_GENMASK(4, 2)
>+#define  OAG_OACONTROL_OA_COUNTER_ENABLE	REG_BIT(0)
>+/* Common to all OA units */
>+#define  OA_OACONTROL_REPORT_BC_MASK		REG_GENMASK(9, 9)
>+#define  OA_OACONTROL_COUNTER_SIZE_MASK		REG_GENMASK(8, 8)
>+
>+#define OAG_OA_DEBUG XE_REG(0xdaf8, XE_REG_OPTION_MASKED)
>+#define  OAG_OA_DEBUG_INCLUDE_CLK_RATIO			REG_BIT(6)
>+#define  OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS		REG_BIT(5)
>+#define  OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS		REG_BIT(2)
>+#define  OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS	REG_BIT(1)
>+
>+#define OAG_OASTATUS XE_REG(0xdafc)
>+#define  OAG_OASTATUS_COUNTER_OVERFLOW	REG_BIT(2)
>+#define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
>+#define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
>+
>+/* OAM unit */
>+#define OAM_HEAD_POINTER_OFFSET			(0x1a0)
>+#define OAM_TAIL_POINTER_OFFSET			(0x1a4)
>+#define OAM_BUFFER_OFFSET			(0x1a8)
>+#define OAM_CONTEXT_CONTROL_OFFSET		(0x1bc)
>+#define OAM_CONTROL_OFFSET			(0x194)
>+#define  OAM_CONTROL_COUNTER_SEL_MASK		REG_GENMASK(3, 1)
>+#define OAM_DEBUG_OFFSET			(0x198)
>+#define OAM_STATUS_OFFSET			(0x19c)
>+#define OAM_MMIO_TRG_OFFSET			(0x1d0)
>+
>+#define OAM_HEAD_POINTER(base)			XE_REG((base) + OAM_HEAD_POINTER_OFFSET)
>+#define OAM_TAIL_POINTER(base)			XE_REG((base) + OAM_TAIL_POINTER_OFFSET)
>+#define OAM_BUFFER(base)			XE_REG((base) + OAM_BUFFER_OFFSET)
>+#define OAM_CONTEXT_CONTROL(base)		XE_REG((base) + OAM_CONTEXT_CONTROL_OFFSET)
>+#define OAM_CONTROL(base)			XE_REG((base) + OAM_CONTROL_OFFSET)
>+#define OAM_DEBUG(base)				XE_REG((base) + OAM_DEBUG_OFFSET)
>+#define OAM_STATUS(base)			XE_REG((base) + OAM_STATUS_OFFSET)
>+#define OAM_MMIO_TRG(base)			XE_REG((base) + OAM_MMIO_TRG_OFFSET)
>+
>+#endif /* __XE_OA_REGS__ */
>diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
>index a7263738308ec..a4a0170996982 100644
>--- a/drivers/gpu/drm/xe/xe_gt_types.h
>+++ b/drivers/gpu/drm/xe/xe_gt_types.h
>@@ -10,6 +10,7 @@
> #include "xe_gt_idle_types.h"
> #include "xe_hw_engine_types.h"
> #include "xe_hw_fence_types.h"
>+#include "xe_oa.h"
> #include "xe_reg_sr_types.h"
> #include "xe_sa_types.h"
> #include "xe_uc_types.h"
>@@ -347,6 +348,9 @@ struct xe_gt {
> 		/** @oob: bitmap with active OOB workarounds */
> 		unsigned long *oob;
> 	} wa_active;
>+
>+	/** @oa: oa perf counter subsystem per gt info */
>+	struct xe_oa_gt oa;
> };
>
> #endif
>diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h
>index 39908dec042a4..4d2e2338db987 100644
>--- a/drivers/gpu/drm/xe/xe_hw_engine_types.h
>+++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h
>@@ -146,6 +146,8 @@ struct xe_hw_engine {
> 	enum xe_hw_engine_id engine_id;
> 	/** @eclass: pointer to per hw engine class interface */
> 	struct xe_hw_engine_class_intf *eclass;
>+	/** @oa_unit: oa unit for this hw engine */
>+	struct xe_oa_unit *oa_unit;
> };
>
> /**
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 11662a81ef6d8..5ad3c9c78b4e9 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -5,7 +5,10 @@
>
> #include <linux/sysctl.h>
>
>+#include "regs/xe_oa_regs.h"
> #include "xe_device.h"
>+#include "xe_gt.h"
>+#include "xe_mmio.h"
> #include "xe_oa.h"
>
> static int xe_oa_sample_rate_hard_limit;
>@@ -13,6 +16,13 @@ static u32 xe_oa_max_sample_rate = 100000;
>
> static struct ctl_table_header *sysctl_header;
>
>+enum {
>+	XE_OA_UNIT_OAG = 0,
>+	XE_OA_UNIT_OAM_SAMEDIA_0 = 0,
>+	XE_OA_UNIT_MAX,
>+	XE_OA_UNIT_INVALID = U32_MAX,
>+};

Right now, I think the enum is not needed, since we only define 0.

>+
> #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
>
> static const struct xe_oa_format oa_formats[] = {
>@@ -37,6 +47,143 @@ static const struct xe_oa_format oa_formats[] = {
> 	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
> };
>
>+static u32 num_oa_units_per_gt(struct xe_gt *gt)
>+{
>+	return 1;
>+}
>+
>+static u32 __hwe_oam_unit(struct xe_hw_engine *hwe)
>+{
>+	if (GRAPHICS_VERx100(gt_to_xe(hwe->gt)) >= 1270) {
>+		/*
>+		 * There's 1 SAMEDIA gt and 1 OAM per SAMEDIA gt. All media slices
>+		 * within the gt use the same OAM. All MTL/LNL SKUs list 1 SA MEDIA
>+		 */
>+		drm_WARN_ON(&gt_to_xe(hwe->gt)->drm,
>+			    hwe->gt->info.type != XE_GT_TYPE_MEDIA);
>+
>+		return XE_OA_UNIT_OAM_SAMEDIA_0;
>+	}
>+
>+	return XE_OA_UNIT_INVALID;
>+}
>+
>+static u32 __hwe_oa_unit(struct xe_hw_engine *hwe)
>+{
>+	switch (hwe->class) {
>+	case XE_ENGINE_CLASS_RENDER:
>+	case XE_ENGINE_CLASS_COMPUTE:
>+		return XE_OA_UNIT_OAG;
>+
>+	case XE_ENGINE_CLASS_VIDEO_DECODE:
>+	case XE_ENGINE_CLASS_VIDEO_ENHANCE:
>+		return __hwe_oam_unit(hwe);
>+
>+	default:
>+		return XE_OA_UNIT_INVALID;
>+	}
>+}
>+
>+static struct xe_oa_regs __oam_regs(u32 base)
>+{
>+	return (struct xe_oa_regs) {
>+		base,
>+		OAM_HEAD_POINTER(base),
>+		OAM_TAIL_POINTER(base),
>+		OAM_BUFFER(base),
>+		OAM_CONTEXT_CONTROL(base),
>+		OAM_CONTROL(base),
>+		OAM_DEBUG(base),
>+		OAM_STATUS(base),
>+		OAM_CONTROL_COUNTER_SEL_MASK,
>+	};
>+}
>+
>+static struct xe_oa_regs __oag_regs(void)
>+{
>+	return (struct xe_oa_regs) {
>+		0,
>+		OAG_OAHEADPTR,
>+		OAG_OATAILPTR,
>+		OAG_OABUFFER,
>+		OAG_OAGLBCTXCTRL,
>+		OAG_OACONTROL,
>+		OAG_OA_DEBUG,
>+		OAG_OASTATUS,
>+		OAG_OACONTROL_OA_COUNTER_SEL_MASK,
>+	};
>+}
>+
>+static void __xe_oa_init_oa_units(struct xe_gt *gt)
>+{
>+	const u32 mtl_oa_base[] = {
>+		[XE_OA_UNIT_OAM_SAMEDIA_0] = 0x393000,

The base can also be 0x13000 because intel_uncore will automagically add 
0x380000. I prefer 0x13000 so that the media-related MMIO adjustments 
happen in one place (intel_uncore). For functionality, it doesn't 
matter.
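For concreteness, the arithmetic behind the two choices (the names below are
illustrative only, not driver code; 0x380000 is the SA MEDIA GT MMIO offset
being referred to above):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration: the SA MEDIA GT's registers sit at a fixed 0x380000 offset
 * within the device MMIO BAR, so a GT-relative OAM base of 0x13000 and an
 * absolute base of 0x393000 name the same register.
 */
#define SA_MEDIA_GT_MMIO_OFFSET 0x380000u

static uint32_t media_gt_absolute_addr(uint32_t gt_relative_addr)
{
	return SA_MEDIA_GT_MMIO_OFFSET + gt_relative_addr;
}
```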

>+	};
>+	int i, num_units = gt->oa.num_oa_units;
>+
>+	for (i = 0; i < num_units; i++) {
>+		struct xe_oa_unit *u = &gt->oa.oa_unit[i];
>+
>+		if (i == XE_OA_UNIT_OAG && gt->info.type != XE_GT_TYPE_MEDIA) {

This is where I feel the enum can be dropped, since the decision can be 
made solely with gt->info.type.
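A minimal mock of that suggestion, picking the unit flavor from the GT type
alone with no index-based enum (types and names here are illustrative, not
the driver's actual structs):

```c
#include <assert.h>

/* Mocked stand-ins for the GT type and OA unit type. */
enum gt_type { GT_TYPE_MAIN, GT_TYPE_MEDIA };
enum oa_unit_type { OA_UNIT_TYPE_OAG, OA_UNIT_TYPE_OAM };

/* A media GT gets OAM registers; any other GT gets OAG registers. */
static enum oa_unit_type pick_oa_unit_type(enum gt_type type)
{
	return type == GT_TYPE_MEDIA ? OA_UNIT_TYPE_OAM : OA_UNIT_TYPE_OAG;
}
```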

>+			u->regs = __oag_regs();
>+			u->type = DRM_XE_OA_UNIT_TYPE_OAG;
>+		} else if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270) {
>+			u->regs = __oam_regs(mtl_oa_base[i]);
>+			u->type = DRM_XE_OA_UNIT_TYPE_OAM;
>+		}
>+
>+		/* Set oa_unit_ids now to ensure ids remain contiguous */
>+		u->oa_unit_id = gt_to_xe(gt)->oa.oa_unit_ids++;
>+	}
>+}
>+

All the above are minor comments, so with or without those addressed, 
this is

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks,
Umesh

>+static int xe_oa_init_gt(struct xe_gt *gt)
>+{
>+	u32 num_oa_units = num_oa_units_per_gt(gt);
>+	struct xe_hw_engine *hwe;
>+	enum xe_hw_engine_id id;
>+	struct xe_oa_unit *u;
>+
>+	u = kcalloc(num_oa_units, sizeof(*u), GFP_KERNEL);
>+	if (!u)
>+		return -ENOMEM;
>+
>+	for_each_hw_engine(hwe, gt, id) {
>+		u32 index = __hwe_oa_unit(hwe);
>+
>+		hwe->oa_unit = NULL;
>+		if (index < num_oa_units) {
>+			u[index].num_engines++;
>+			hwe->oa_unit = &u[index];
>+		}
>+	}
>+
>+	/*
>+	 * Fused off engines can result in oa_unit's with num_engines == 0. These units
>+	 * will appear in OA unit query, but no perf streams can be opened on them.
>+	 */
>+	gt->oa.num_oa_units = num_oa_units;
>+	gt->oa.oa_unit = u;
>+
>+	__xe_oa_init_oa_units(gt);
>+
>+	return 0;
>+}
>+
>+static int xe_oa_init_oa_units(struct xe_oa *oa)
>+{
>+	struct xe_gt *gt;
>+	int i, ret;
>+
>+	for_each_gt(gt, oa->xe, i) {
>+		ret = xe_oa_init_gt(gt);
>+		if (ret)
>+			return ret;
>+	}
>+
>+	return 0;
>+}
>+
> static void oa_format_add(struct xe_oa *oa, enum xe_oa_format_name format)
> {
> 	__set_bit(format, oa->format_mask);
>@@ -96,6 +243,8 @@ static void xe_oa_init_supported_formats(struct xe_oa *oa)
> int xe_oa_init(struct xe_device *xe)
> {
> 	struct xe_oa *oa = &xe->oa;
>+	struct xe_gt *gt;
>+	int i, ret;
>
> 	/* Support OA only with GuC submission and Gen12+ */
> 	if (XE_WARN_ON(!xe_device_uc_enabled(xe)) || XE_WARN_ON(GRAPHICS_VER(xe) < 12))
>@@ -104,16 +253,36 @@ int xe_oa_init(struct xe_device *xe)
> 	oa->xe = xe;
> 	oa->oa_formats = oa_formats;
>
>+	for_each_gt(gt, xe, i)
>+		mutex_init(&gt->oa.gt_lock);
>+
> 	/* Choose a representative limit */
> 	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
>
>+	ret = xe_oa_init_oa_units(oa);
>+	if (ret) {
>+		drm_err(&xe->drm, "OA initialization failed %d\n", ret);
>+		goto exit;
>+	}
>+
> 	xe_oa_init_supported_formats(oa);
> 	return 0;
>+exit:
>+	oa->xe = NULL;
>+	return ret;
> }
>
> void xe_oa_fini(struct xe_device *xe)
> {
> 	struct xe_oa *oa = &xe->oa;
>+	struct xe_gt *gt;
>+	int i;
>+
>+	if (!oa->xe)
>+		return;
>+
>+	for_each_gt(gt, xe, i)
>+		kfree(gt->oa.oa_unit);
>
> 	oa->xe = NULL;
> }
>diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
>index 3758bd2879cbb..8f8cf6a2bf556 100644
>--- a/drivers/gpu/drm/xe/xe_oa_types.h
>+++ b/drivers/gpu/drm/xe/xe_oa_types.h
>@@ -8,6 +8,10 @@
>
> #include <linux/math.h>
> #include <linux/types.h>
>+#include <linux/mutex.h>
>+
>+#include <drm/xe_drm.h>
>+#include "regs/xe_reg_defs.h"
>
> enum xe_oa_report_header {
> 	HDR_32_BIT = 0,
>@@ -60,6 +64,55 @@ struct xe_oa_format {
> 	u16 bc_report;
> };
>
>+/**
>+ * struct xe_oa_regs - Registers for each OA unit
>+ */
>+struct xe_oa_regs {
>+	u32 base;
>+	struct xe_reg oa_head_ptr;
>+	struct xe_reg oa_tail_ptr;
>+	struct xe_reg oa_buffer;
>+	struct xe_reg oa_ctx_ctrl;
>+	struct xe_reg oa_ctrl;
>+	struct xe_reg oa_debug;
>+	struct xe_reg oa_status;
>+	u32 oa_ctrl_counter_select_mask;
>+};
>+
>+/**
>+ * struct xe_oa_unit - Hardware OA unit
>+ */
>+struct xe_oa_unit {
>+	/** @oa_unit_id: identifier for the OA unit */
>+	u16 oa_unit_id;
>+
>+	/** @type: Type of OA unit - OAM, OAG etc. */
>+	enum drm_xe_oa_unit_type type;
>+
>+	/** @regs: OA registers for programming the OA unit */
>+	struct xe_oa_regs regs;
>+
>+	/** @num_engines: number of engines attached to this OA unit */
>+	u32 num_engines;
>+
>+	/** @exclusive_stream: The stream currently using the OA unit */
>+	struct xe_oa_stream *exclusive_stream;
>+};
>+
>+/**
>+ * struct xe_oa_gt - OA per-gt information
>+ */
>+struct xe_oa_gt {
>+	/** @lock: lock protecting create/destroy OA streams */
>+	struct mutex gt_lock;
>+
>+	/** @num_oa_units: number of oa units for each gt */
>+	u32 num_oa_units;
>+
>+	/** @oa_unit: array of oa_units */
>+	struct xe_oa_unit *oa_unit;
>+};
>+
> /**
>  * struct xe_oa - OA device level information
>  */
>@@ -74,5 +127,8 @@ struct xe_oa {
>
> 	/** @format_mask: tracks valid OA formats for a platform */
> 	unsigned long format_mask[FORMAT_MASK_SIZE];
>+
>+	/** @oa_unit_ids: tracks oa unit ids assigned across gt's */
>+	u16 oa_unit_ids;
> };
> #endif
>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>index 5bfb2d5aba12a..778862a5b76d4 100644
>--- a/include/uapi/drm/xe_drm.h
>+++ b/include/uapi/drm/xe_drm.h
>@@ -1175,6 +1175,12 @@ enum drm_xe_perf_ioctls {
> 	DRM_XE_PERF_IOCTL_CONFIG = _IO('i', 0x2),
> };
>
>+/** enum drm_xe_oa_unit_type - OA unit types */
>+enum drm_xe_oa_unit_type {
>+	DRM_XE_OA_UNIT_TYPE_OAG,
>+	DRM_XE_OA_UNIT_TYPE_OAM,
>+};
>+
> /** enum drm_xe_oa_format_type - OA format types */
> enum drm_xe_oa_format_type {
> 	DRM_XE_OA_FMT_TYPE_OAG,
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-19  2:59   ` Dixit, Ashutosh
@ 2023-12-19 16:26     ` Umesh Nerlige Ramappa
  2023-12-19 16:29       ` Lionel Landwerlin
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-19 16:26 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: Harish Chegondi, gzadicario, intel-xe, Robert Krzemien

On Mon, Dec 18, 2023 at 06:59:57PM -0800, Dixit, Ashutosh wrote:
>On Thu, 07 Dec 2023 22:43:19 -0800, Ashutosh Dixit wrote:
>>
>> +	/**
>> +	 * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
>> +	 * directly applied to returned OA fd. DISABLED opens the OA stream in a
>> +	 * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
>> +	 */
>> +	DRM_XE_OA_PROPERTY_OPEN_FLAGS,
>> +#define DRM_XE_OA_FLAG_FD_CLOEXEC	(1 << 0)
>> +#define DRM_XE_OA_FLAG_FD_NONBLOCK	(1 << 1)
>> +#define DRM_XE_OA_FLAG_DISABLED		(1 << 2)
>
>I am wondering why these flags should be part of this uapi:
>
>* O_CLOEXEC and O_NONBLOCK can be set on the returned stream fd using fcntl
>  (see man 2 fcntl)

I think the O_CLOEXEC was used so that a fork doesn't carry over the fd 
to the child. For the OA use case, we want to prevent that.  However, 
these flags don't really need to be passed separately. They can be flags 
in the stream open property.

Umesh

>* DRM_XE_OA_FLAG_DISABLED can just be a stream open property, doesn't 
>need
>  to be a fd flag.
>
>Comments?


>
>Thanks.
>--
>Ashutosh


* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-19 16:26     ` Umesh Nerlige Ramappa
@ 2023-12-19 16:29       ` Lionel Landwerlin
  2023-12-19 16:40         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 68+ messages in thread
From: Lionel Landwerlin @ 2023-12-19 16:29 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Dixit, Ashutosh
  Cc: Harish Chegondi, intel-xe, gzadicario, Robert Krzemien

On 19/12/2023 18:26, Umesh Nerlige Ramappa wrote:
> On Mon, Dec 18, 2023 at 06:59:57PM -0800, Dixit, Ashutosh wrote:
>> On Thu, 07 Dec 2023 22:43:19 -0800, Ashutosh Dixit wrote:
>>>
>>> +    /**
>>> +     * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
>>> +     * directly applied to returned OA fd. DISABLED opens the OA 
>>> stream in a
>>> +     * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
>>> +     */
>>> +    DRM_XE_OA_PROPERTY_OPEN_FLAGS,
>>> +#define DRM_XE_OA_FLAG_FD_CLOEXEC    (1 << 0)
>>> +#define DRM_XE_OA_FLAG_FD_NONBLOCK    (1 << 1)
>>> +#define DRM_XE_OA_FLAG_DISABLED        (1 << 2)
>>
>> I am wondering why these flags should be part of this uapi:
>>
>> * O_CLOEXEC and O_NONBLOCK can be set on the returned stream fd using 
>> fcntl
>>  (see man 2 fcntl)
>
> I think the O_CLOEXEC was used so that a fork doesn't carry over the 
> fd to the child. For the OA use case, we want to prevent that.  
> However, these flags don't really need to be passed separately. They 
> can be flags in the stream open property.
>
> Umesh
>

You know that the application can set those flags by using the fcntl() 
syscall?

It doesn't look like it's a useful feature to add in the driver.


-Lionel


>> * DRM_XE_OA_FLAG_DISABLED can just be a stream open property, doesn't 
>> need
>>  to be a fd flag.
>>
>> Comments?
>
>
>>
>> Thanks.
>> -- 
>> Ashutosh




* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-19 16:29       ` Lionel Landwerlin
@ 2023-12-19 16:40         ` Umesh Nerlige Ramappa
  2023-12-19 17:48           ` Lionel Landwerlin
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-19 16:40 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Harish Chegondi, intel-xe, gzadicario, Robert Krzemien

On Tue, Dec 19, 2023 at 06:29:56PM +0200, Lionel Landwerlin wrote:
>On 19/12/2023 18:26, Umesh Nerlige Ramappa wrote:
>>On Mon, Dec 18, 2023 at 06:59:57PM -0800, Dixit, Ashutosh wrote:
>>>On Thu, 07 Dec 2023 22:43:19 -0800, Ashutosh Dixit wrote:
>>>>
>>>>+    /**
>>>>+     * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
>>>>+     * directly applied to returned OA fd. DISABLED opens the 
>>>>OA stream in a
>>>>+     * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
>>>>+     */
>>>>+    DRM_XE_OA_PROPERTY_OPEN_FLAGS,
>>>>+#define DRM_XE_OA_FLAG_FD_CLOEXEC    (1 << 0)
>>>>+#define DRM_XE_OA_FLAG_FD_NONBLOCK    (1 << 1)
>>>>+#define DRM_XE_OA_FLAG_DISABLED        (1 << 2)
>>>
>>>I am wondering why these flags should be part of this uapi:
>>>
>>>* O_CLOEXEC and O_NONBLOCK can be set on the returned stream fd 
>>>using fcntl
>>> (see man 2 fcntl)
>>
>>I think the O_CLOEXEC was used so that a fork doesn't carry over the 
>>fd to the child. For the OA use case, we want to prevent that.  
>>However, these flags don't really need to be passed separately. They 
>>can be flags in the stream open property.
>>
>>Umesh
>>
>
>You know that the application can set those flags by using the fcntl() 
>syscall?
>
>It doesn't look like it's a useful feature to add in the driver.
>

Right. It does look like it's not needed in the driver.

I just don't know if there was a reason to include it in the same call 
as the stream open ioctl. My guess is that we didn't want them to be 
separate calls due to the nature of the OA use case: privileged and 
single user. The application could just open a stream fd and fork a 
bunch of threads, and all threads would have access to the stream fd 
(even if they drop root?).

Or I might be overthinking this. Maybe it's just there in the driver 
because fcntl mentions some races that may/may not apply to our use 
case. In practice, the application will likely call the fcntl right away 
and since OA does not support multiple users, the above concerns are not 
relevant, so fine to do it in fcntl.
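For reference, the userspace sequence being discussed is just a couple of
fcntl() calls on the returned fd (a sketch; the stream-open ioctl itself is
elided, and set_cloexec_nonblock is a hypothetical helper name):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * After a stream-open call returns a plain fd, userspace can set
 * close-on-exec and non-blocking mode itself instead of relying on
 * driver-side open flags.
 */
static int set_cloexec_nonblock(int fd)
{
	if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0)
		return -1;

	int flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	return 0;
}
```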

Thanks,
Umesh
>
>-Lionel
>
>
>>>* DRM_XE_OA_FLAG_DISABLED can just be a stream open property, 
>>>doesn't need
>>> to be a fd flag.
>>>
>>>Comments?
>>
>>
>>>
>>>Thanks.
>>>-- 
>>>Ashutosh
>
>


* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-19 16:40         ` Umesh Nerlige Ramappa
@ 2023-12-19 17:48           ` Lionel Landwerlin
  0 siblings, 0 replies; 68+ messages in thread
From: Lionel Landwerlin @ 2023-12-19 17:48 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa
  Cc: Harish Chegondi, intel-xe, gzadicario, Robert Krzemien

On 19/12/2023 18:40, Umesh Nerlige Ramappa wrote:
> On Tue, Dec 19, 2023 at 06:29:56PM +0200, Lionel Landwerlin wrote:
>> On 19/12/2023 18:26, Umesh Nerlige Ramappa wrote:
>>> On Mon, Dec 18, 2023 at 06:59:57PM -0800, Dixit, Ashutosh wrote:
>>>> On Thu, 07 Dec 2023 22:43:19 -0800, Ashutosh Dixit wrote:
>>>>>
>>>>> +    /**
>>>>> +     * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags 
>>>>> are
>>>>> +     * directly applied to returned OA fd. DISABLED opens the OA 
>>>>> stream in a
>>>>> +     * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
>>>>> +     */
>>>>> +    DRM_XE_OA_PROPERTY_OPEN_FLAGS,
>>>>> +#define DRM_XE_OA_FLAG_FD_CLOEXEC    (1 << 0)
>>>>> +#define DRM_XE_OA_FLAG_FD_NONBLOCK    (1 << 1)
>>>>> +#define DRM_XE_OA_FLAG_DISABLED        (1 << 2)
>>>>
>>>> I am wondering why these flags should be part of this uapi:
>>>>
>>>> * O_CLOEXEC and O_NONBLOCK can be set on the returned stream fd 
>>>> using fcntl
>>>>  (see man 2 fcntl)
>>>
>>> I think the O_CLOEXEC was used so that a fork doesn't carry over the 
>>> fd to the child. For the OA use case, we want to prevent that.  
>>> However, these flags don't really need to be passed separately. They 
>>> can be flags in the stream open property.
>>>
>>> Umesh
>>>
>>
>> You know that the application can set those flags by using the 
>> fcntl() syscall?
>>
>> It doesn't look like it's a useful feature to add in the driver.
>>
>
> Right. It does look like it's not needed in the driver.
>
> I just don't know if there was a reason to include it in the same call 
> as stream open ioctl. My guess is that we didn't want them to be 
> separate calls due the nature of OA use case - privileged and single 
> user. The application could just open a stream fd and fork a bunch of 
> threads and all threads would have access to the stream fd (even if 
> they drop root?).
> Or I might be overthinking this. Maybe it's just there in the driver 
> because fcntl mentions some races that may/may not apply to our use 
> case. In practice, the application will likely call the fcntl right 
> away and since OA does not support multiple users, the above concerns 
> are not relevant, so fine to do it in fcntl.


Yeah I think it's likely userspace driver code that will call fcntl 
right after OA open. So it doesn't look like the KMD needs to also 
handle that with additional flags.


-Lionel


>
> Thanks,
> Umesh
>>
>> -Lionel
>>
>>
>>>> * DRM_XE_OA_FLAG_DISABLED can just be a stream open property, 
>>>> doesn't need
>>>>  to be a fd flag.
>>>>
>>>> Comments?
>>>
>>>
>>>>
>>>> Thanks.
>>>> -- 
>>>> Ashutosh
>>
>>



* Re: [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops
  2023-12-08  6:43 ` [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops Ashutosh Dixit
@ 2023-12-19 19:10   ` Umesh Nerlige Ramappa
  2024-01-20  2:44     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-19 19:10 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:18PM -0800, Ashutosh Dixit wrote:
>Introduce add/remove config perf ops for OA. OA configurations consist of a
>set of event/counter select register address/value pairs. The add_config
>perf op validates and stores such configurations and also exposes them in
>the metrics sysfs. These configurations will be programmed to OA unit HW
>when an OA stream using a configuration is opened. The OA stream can also
>switch to other stored configurations.
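For context, a sketch of how userspace might lay out such a configuration:
n_regs (address, value) pairs packed as consecutive u32s, which is what a
regs_ptr-style field would point at (struct and function names here are
illustrative only, not part of the uapi):

```c
#include <assert.h>
#include <stdint.h>

/* One event/counter select register write: address plus value. */
struct oa_reg_pair {
	uint32_t addr;
	uint32_t value;
};

/* Pack n pairs into a flat u32 buffer of 2 * n entries: addr, value, ... */
static void pack_oa_regs(const struct oa_reg_pair *pairs, uint32_t n,
			 uint32_t *out)
{
	for (uint32_t i = 0; i < n; i++) {
		out[2 * i]     = pairs[i].addr;
		out[2 * i + 1] = pairs[i].value;
	}
}
```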
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/xe_device.c   |   4 +
> drivers/gpu/drm/xe/xe_oa.c       | 406 +++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa.h       |   9 +
> drivers/gpu/drm/xe/xe_oa_types.h |  10 +
> drivers/gpu/drm/xe/xe_perf.c     |  16 ++
> include/uapi/drm/xe_drm.h        |  25 ++
> 6 files changed, 470 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>index 744d573eb2720..23fdd045b470a 100644
>--- a/drivers/gpu/drm/xe/xe_device.c
>+++ b/drivers/gpu/drm/xe/xe_device.c
>@@ -495,6 +495,8 @@ int xe_device_probe(struct xe_device *xe)
>
> 	xe_display_register(xe);
>
>+	xe_oa_register(xe);
>+
> 	xe_debugfs_register(xe);
>
> 	xe_pmu_register(&xe->pmu);
>@@ -527,6 +529,8 @@ static void xe_device_remove_display(struct xe_device *xe)
>
> void xe_device_remove(struct xe_device *xe)
> {
>+	xe_oa_unregister(xe);
>+
> 	xe_device_remove_display(xe);
>
> 	xe_display_fini(xe);
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 5ad3c9c78b4e9..6a903bf4f87d1 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -10,6 +10,7 @@
> #include "xe_gt.h"
> #include "xe_mmio.h"
> #include "xe_oa.h"
>+#include "xe_perf.h"
>
> static int xe_oa_sample_rate_hard_limit;
> static u32 xe_oa_max_sample_rate = 100000;
>@@ -23,6 +24,28 @@ enum {
> 	XE_OA_UNIT_INVALID = U32_MAX,
> };
>
>+struct xe_oa_reg {
>+	struct xe_reg addr;
>+	u32 value;
>+};
>+
>+struct xe_oa_config {
>+	struct xe_oa *oa;
>+
>+	char uuid[UUID_STRING_LEN + 1];
>+	int id;
>+
>+	const struct xe_oa_reg *regs;
>+	u32 regs_len;
>+
>+	struct attribute_group sysfs_metric;
>+	struct attribute *attrs[2];
>+	struct kobj_attribute sysfs_metric_id;
>+
>+	struct kref ref;
>+	struct rcu_head rcu;
>+};
>+
> #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
>
> static const struct xe_oa_format oa_formats[] = {
>@@ -47,6 +70,377 @@ static const struct xe_oa_format oa_formats[] = {
> 	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
> };
>
>+static void xe_oa_config_release(struct kref *ref)
>+{
>+	struct xe_oa_config *oa_config =
>+		container_of(ref, typeof(*oa_config), ref);
>+
>+	kfree(oa_config->regs);
>+
>+	kfree_rcu(oa_config, rcu);
>+}
>+
>+static void xe_oa_config_put(struct xe_oa_config *oa_config)
>+{
>+	if (!oa_config)
>+		return;
>+
>+	kref_put(&oa_config->ref, xe_oa_config_release);
>+}
>+
>+static bool xe_oa_is_valid_flex_addr(struct xe_oa *oa, u32 addr)
>+{
>+	static const struct xe_reg flex_eu_regs[] = {
>+		EU_PERF_CNTL0,
>+		EU_PERF_CNTL1,
>+		EU_PERF_CNTL2,
>+		EU_PERF_CNTL3,
>+		EU_PERF_CNTL4,
>+		EU_PERF_CNTL5,
>+		EU_PERF_CNTL6,
>+	};
>+	int i;
>+
>+	for (i = 0; i < ARRAY_SIZE(flex_eu_regs); i++) {
>+		if (flex_eu_regs[i].addr == addr)
>+			return true;
>+	}
>+	return false;
>+}
>+
>+static bool xe_oa_reg_in_range_table(u32 addr, const struct xe_mmio_range *table)
>+{
>+	while (table->start || table->end) {

nit: why not start && end? I would expect both start and end defined for 
a range.
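To illustrate the distinction with a mocked-up walk (not the driver's types):
with the zero-terminated sentinel, "start || end" only stops at the all-zero
entry, whereas "start && end" would also stop at a hypothetical range that
begins at address 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/* Walk a table terminated by an all-zero sentinel entry. */
static bool addr_in_range_table(uint32_t addr, const struct mmio_range *table)
{
	while (table->start || table->end) {
		if (addr >= table->start && addr <= table->end)
			return true;
		table++;
	}
	return false;
}
```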
>+		if (addr >= table->start && addr <= table->end)
>+			return true;
>+
>+		table++;
>+	}
>+
>+	return false;
>+}
>+
>+static const struct xe_mmio_range xehp_oa_b_counters[] = {
>+	{ .start = 0xdc48, .end = 0xdc48 },	/* OAA_ENABLE_REG */
>+	{ .start = 0xdd00, .end = 0xdd48 },	/* OAG_LCE0_0 - OAA_LENABLE_REG */
>+	{}
>+};
>+
>+static const struct xe_mmio_range gen12_oa_b_counters[] = {
>+	{ .start = 0x2b2c, .end = 0x2b2c },	/* OAG_OA_PESS */
>+	{ .start = 0xd900, .end = 0xd91c },	/* OAG_OASTARTTRIG[1-8] */
>+	{ .start = 0xd920, .end = 0xd93c },	/* OAG_OAREPORTTRIG1[1-8] */
>+	{ .start = 0xd940, .end = 0xd97c },	/* OAG_CEC[0-7][0-1] */
>+	{ .start = 0xdc00, .end = 0xdc3c },	/* OAG_SCEC[0-7][0-1] */
>+	{ .start = 0xdc40, .end = 0xdc40 },	/* OAG_SPCTR_CNF */
>+	{ .start = 0xdc44, .end = 0xdc44 },	/* OAA_DBG_REG */
>+	{}
>+};
>+
>+static const struct xe_mmio_range mtl_oam_b_counters[] = {
>+	{ .start = 0x393000, .end = 0x39301c },	/* OAM_STARTTRIG1[1-8] */
>+	{ .start = 0x393020, .end = 0x39303c },	/* OAM_REPORTTRIG1[1-8] */
>+	{ .start = 0x393040, .end = 0x39307c },	/* OAM_CEC[0-7][0-1] */
>+	{ .start = 0x393200, .end = 0x39323C },	/* MPES[0-7] */
>+	{}
>+};
>+
>+static const struct xe_mmio_range xe2_oa_b_counters[] = {
>+	{ .start = 0x393200, .end = 0x39323C },	/* MPES_0_MPES_SAG - MPES_7_UPPER_MPES_SAG */
>+	{ .start = 0x394200, .end = 0x39423C },	/* MPES_0_MPES_SCMI0 - MPES_7_UPPER_MPES_SCMI0 */
>+	{ .start = 0x394A00, .end = 0x394A3C },	/* MPES_0_MPES_SCMI1 - MPES_7_UPPER_MPES_SCMI1 */
>+	{},
>+};
>+
>+static bool xe_oa_is_valid_b_counter_addr(struct xe_oa *oa, u32 addr)
>+{
>+	return xe_oa_reg_in_range_table(addr, xehp_oa_b_counters) ||
>+		xe_oa_reg_in_range_table(addr, gen12_oa_b_counters) ||
>+		xe_oa_reg_in_range_table(addr, mtl_oam_b_counters) ||
>+		(GRAPHICS_VER(oa->xe) >= 20 &&
>+		 xe_oa_reg_in_range_table(addr, xe2_oa_b_counters));
>+}
>+
>+static const struct xe_mmio_range mtl_oa_mux_regs[] = {
>+	{ .start = 0x0d00, .end = 0x0d04 },	/* RPM_CONFIG[0-1] */
>+	{ .start = 0x0d0c, .end = 0x0d2c },	/* NOA_CONFIG[0-8] */
>+	{ .start = 0x9840, .end = 0x9840 },	/* GDT_CHICKEN_BITS */
>+	{ .start = 0x9884, .end = 0x9888 },	/* NOA_WRITE */
>+	{ .start = 0x38d100, .end = 0x38d114},	/* VISACTL */
>+	{}
>+};
>+
>+static const struct xe_mmio_range gen12_oa_mux_regs[] = {
>+	{ .start = 0x0d00, .end = 0x0d04 },     /* RPM_CONFIG[0-1] */
>+	{ .start = 0x0d0c, .end = 0x0d2c },     /* NOA_CONFIG[0-8] */
>+	{ .start = 0x9840, .end = 0x9840 },	/* GDT_CHICKEN_BITS */
>+	{ .start = 0x9884, .end = 0x9888 },	/* NOA_WRITE */
>+	{ .start = 0x20cc, .end = 0x20cc },	/* WAIT_FOR_RC6_EXIT */
>+	{}
>+};
>+
>+static const struct xe_mmio_range xe2_oa_mux_regs[] = {
>+	{ .start = 0x13000,  .end = 0x137FC },	/* PES_0_PESL0 - PES_63_UPPER_PESL3 */
>+	{},
>+};
>+
>+static bool xe_oa_is_valid_mux_addr(struct xe_oa *oa, u32 addr)
>+{
>+	if (GRAPHICS_VER(oa->xe) >= 20)
>+		return xe_oa_reg_in_range_table(addr, xe2_oa_mux_regs);
>+	else if (GRAPHICS_VERx100(oa->xe) >= 1270)
>+		return xe_oa_reg_in_range_table(addr, mtl_oa_mux_regs);
>+	else
>+		return xe_oa_reg_in_range_table(addr, gen12_oa_mux_regs);
>+}
>+
>+static bool xe_oa_is_valid_config_reg_addr(struct xe_oa *oa, u32 addr)
>+{
>+	return xe_oa_is_valid_flex_addr(oa, addr) ||
>+		xe_oa_is_valid_b_counter_addr(oa, addr) ||
>+		xe_oa_is_valid_mux_addr(oa, addr);
>+}
>+
>+static struct xe_oa_reg *
>+xe_oa_alloc_regs(struct xe_oa *oa, bool (*is_valid)(struct xe_oa *oa, u32 addr),
>+		 u32 __user *regs, u32 n_regs)
>+{
>+	struct xe_oa_reg *oa_regs;
>+	int err;
>+	u32 i;
>+
>+	oa_regs = kmalloc_array(n_regs, sizeof(*oa_regs), GFP_KERNEL);
>+	if (!oa_regs)
>+		return ERR_PTR(-ENOMEM);
>+
>+	for (i = 0; i < n_regs; i++) {
>+		u32 addr, value;
>+
>+		err = get_user(addr, regs);
>+		if (err)
>+			goto addr_err;
>+
>+		if (!is_valid(oa, addr)) {
>+			drm_dbg(&oa->xe->drm, "Invalid oa_reg address: %X\n", addr);
>+			err = -EINVAL;
>+			goto addr_err;
>+		}
>+
>+		err = get_user(value, regs + 1);
>+		if (err)
>+			goto addr_err;
>+
>+		oa_regs[i].addr = XE_REG(addr);
>+		oa_regs[i].value = value;
>+
>+		regs += 2;
>+	}
>+
>+	return oa_regs;
>+
>+addr_err:
>+	kfree(oa_regs);
>+	return ERR_PTR(err);
>+}
>+
>+static ssize_t show_dynamic_id(struct kobject *kobj,
>+			       struct kobj_attribute *attr,
>+			       char *buf)
>+{
>+	struct xe_oa_config *oa_config =
>+		container_of(attr, typeof(*oa_config), sysfs_metric_id);
>+
>+	return sprintf(buf, "%d\n", oa_config->id);
>+}
>+
>+static int create_dynamic_oa_sysfs_entry(struct xe_oa *oa,
>+					 struct xe_oa_config *oa_config)
>+{
>+	sysfs_attr_init(&oa_config->sysfs_metric_id.attr);
>+	oa_config->sysfs_metric_id.attr.name = "id";
>+	oa_config->sysfs_metric_id.attr.mode = 0444;
>+	oa_config->sysfs_metric_id.show = show_dynamic_id;
>+	oa_config->sysfs_metric_id.store = NULL;
>+
>+	oa_config->attrs[0] = &oa_config->sysfs_metric_id.attr;
>+	oa_config->attrs[1] = NULL;
>+
>+	oa_config->sysfs_metric.name = oa_config->uuid;
>+	oa_config->sysfs_metric.attrs = oa_config->attrs;
>+
>+	return sysfs_create_group(oa->metrics_kobj, &oa_config->sysfs_metric);
>+}
>+
>+int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
>+			   struct drm_file *file)
>+{
>+	struct xe_oa *oa = &to_xe_device(dev)->oa;
>+	struct drm_xe_oa_config param;
>+	struct drm_xe_oa_config *arg = &param;
>+	struct xe_oa_config *oa_config, *tmp;
>+	struct xe_oa_reg *regs;
>+	int err, id;
>+
>+	if (!oa->xe) {
>+		drm_dbg(&oa->xe->drm, "xe oa interface not available for this system\n");
>+		return -ENODEV;
>+	}
>+
>+	if (xe_perf_stream_paranoid && !perfmon_capable()) {
>+		drm_dbg(&oa->xe->drm, "Insufficient privileges to add xe OA config\n");
>+		return -EACCES;
>+	}
>+
>+	err = __copy_from_user(&param, data, sizeof(param));
>+	if (XE_IOCTL_DBG(oa->xe, err))
>+		return -EFAULT;
>+
>+	if (!arg->regs_ptr || !arg->n_regs) {
>+		drm_dbg(&oa->xe->drm, "No OA registers given\n");
>+		return -EINVAL;
>+	}
>+
>+	oa_config = kzalloc(sizeof(*oa_config), GFP_KERNEL);
>+	if (!oa_config)
>+		return -ENOMEM;
>+
>+	oa_config->oa = oa;
>+	kref_init(&oa_config->ref);
>+
>+	if (!uuid_is_valid(arg->uuid)) {
>+		drm_dbg(&oa->xe->drm, "Invalid uuid format for OA config\n");
>+		err = -EINVAL;
>+		goto reg_err;
>+	}
>+
>+	/* Last character in oa_config->uuid will be 0 because oa_config is kzalloc */
>+	memcpy(oa_config->uuid, arg->uuid, sizeof(arg->uuid));
>+
>+	oa_config->regs_len = arg->n_regs;
>+	regs = xe_oa_alloc_regs(oa, xe_oa_is_valid_config_reg_addr,
>+				u64_to_user_ptr(arg->regs_ptr),
>+				arg->n_regs);
>+	if (IS_ERR(regs)) {
>+		drm_dbg(&oa->xe->drm, "Failed to create OA config for mux_regs\n");
>+		err = PTR_ERR(regs);
>+		goto reg_err;
>+	}
>+	oa_config->regs = regs;
>+
>+	err = mutex_lock_interruptible(&oa->metrics_lock);
>+	if (err)
>+		goto reg_err;
>+
>+	/* We shouldn't have too many configs, so this iteration shouldn't be too costly */
>+	idr_for_each_entry(&oa->metrics_idr, tmp, id) {
>+		if (!strcmp(tmp->uuid, oa_config->uuid)) {
>+			drm_dbg(&oa->xe->drm, "OA config already exists with this uuid\n");
>+			err = -EADDRINUSE;
>+			goto sysfs_err;
>+		}
>+	}
>+
>+	err = create_dynamic_oa_sysfs_entry(oa, oa_config);
>+	if (err) {
>+		drm_dbg(&oa->xe->drm, "Failed to create sysfs entry for OA config\n");
>+		goto sysfs_err;
>+	}
>+
>+	/* Config id 0 is invalid, id 1 for kernel stored test config */

The kernel doesn't store a test config anymore, so the comment can be 
updated. You could start with 1 below, though that's up to you.

>+	oa_config->id = idr_alloc(&oa->metrics_idr, oa_config, 2, 0, GFP_KERNEL);
>+	if (oa_config->id < 0) {
>+		drm_dbg(&oa->xe->drm, "Failed to create sysfs entry for OA config\n");
>+		err = oa_config->id;
>+		goto sysfs_err;
>+	}
>+
>+	mutex_unlock(&oa->metrics_lock);
>+
>+	drm_dbg(&oa->xe->drm, "Added config %s id=%i\n", oa_config->uuid, oa_config->id);
>+
>+	return oa_config->id;
>+
>+sysfs_err:
>+	mutex_unlock(&oa->metrics_lock);
>+reg_err:
>+	xe_oa_config_put(oa_config);
>+	drm_dbg(&oa->xe->drm, "Failed to add new OA config\n");
>+	return err;
>+}
>+
>+int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
>+			      struct drm_file *file)
>+{
>+	struct xe_oa *oa = &to_xe_device(dev)->oa;
>+	struct xe_oa_config *oa_config;
>+	u64 arg, *ptr = data;
>+	int ret;
>+
>+	if (!oa->xe) {
>+		drm_dbg(&oa->xe->drm, "xe oa interface not available for this system\n");
>+		return -ENODEV;
>+	}
>+
>+	if (xe_perf_stream_paranoid && !perfmon_capable()) {
>+		drm_dbg(&oa->xe->drm, "Insufficient privileges to remove xe OA config\n");
>+		return -EACCES;
>+	}
>+
>+	ret = get_user(arg, ptr);
>+	if (XE_IOCTL_DBG(oa->xe, ret))
>+		return ret;
>+
>+	ret = mutex_lock_interruptible(&oa->metrics_lock);
>+	if (ret)
>+		return ret;
>+
>+	oa_config = idr_find(&oa->metrics_idr, arg);
>+	if (!oa_config) {
>+		drm_dbg(&oa->xe->drm, "Failed to remove unknown OA config\n");
>+		ret = -ENOENT;
>+		goto err_unlock;
>+	}
>+
>+	WARN_ON(arg != oa_config->id);
>+
>+	sysfs_remove_group(oa->metrics_kobj, &oa_config->sysfs_metric);
>+	idr_remove(&oa->metrics_idr, arg);
>+
>+	mutex_unlock(&oa->metrics_lock);
>+
>+	drm_dbg(&oa->xe->drm, "Removed config %s id=%i\n", oa_config->uuid, oa_config->id);
>+
>+	xe_oa_config_put(oa_config);
>+
>+	return 0;
>+
>+err_unlock:
>+	mutex_unlock(&oa->metrics_lock);
>+	return ret;
>+}
>+
>+void xe_oa_register(struct xe_device *xe)
>+{
>+	struct xe_oa *oa = &xe->oa;
>+
>+	if (!oa->xe)
>+		return;
>+
>+	oa->metrics_kobj = kobject_create_and_add("metrics",
>+						  &xe->drm.primary->kdev->kobj);
>+}
>+
>+void xe_oa_unregister(struct xe_device *xe)
>+{
>+	struct xe_oa *oa = &xe->oa;
>+
>+	if (!oa->metrics_kobj)
>+		return;
>+
>+	kobject_put(oa->metrics_kobj);
>+	oa->metrics_kobj = NULL;
>+}
>+
> static u32 num_oa_units_per_gt(struct xe_gt *gt)
> {
> 	return 1;
>@@ -259,6 +653,9 @@ int xe_oa_init(struct xe_device *xe)
> 	/* Choose a representative limit */
> 	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
>
>+	mutex_init(&oa->metrics_lock);
>+	idr_init_base(&oa->metrics_idr, 1);
>+
> 	ret = xe_oa_init_oa_units(oa);
> 	if (ret) {
> 		drm_err(&xe->drm, "OA initialization failed %d\n", ret);
>@@ -272,6 +669,12 @@ int xe_oa_init(struct xe_device *xe)
> 	return ret;
> }
>
>+static int destroy_config(int id, void *p, void *data)
>+{
>+	xe_oa_config_put(p);
>+	return 0;
>+}
>+
> void xe_oa_fini(struct xe_device *xe)
> {
> 	struct xe_oa *oa = &xe->oa;
>@@ -284,6 +687,9 @@ void xe_oa_fini(struct xe_device *xe)
> 	for_each_gt(gt, xe, i)
> 		kfree(gt->oa.oa_unit);
>
>+	idr_for_each(&oa->metrics_idr, destroy_config, oa);
>+	idr_destroy(&oa->metrics_idr);
>+
> 	oa->xe = NULL;
> }
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
>index 2145c73176953..e4863f8681b14 100644
>--- a/drivers/gpu/drm/xe/xe_oa.h
>+++ b/drivers/gpu/drm/xe/xe_oa.h
>@@ -8,11 +8,20 @@
>
> #include "xe_oa_types.h"
>
>+struct drm_device;
>+struct drm_file;
> struct xe_device;
>
> int xe_oa_init(struct xe_device *xe);
> void xe_oa_fini(struct xe_device *xe);
>+void xe_oa_register(struct xe_device *xe);
>+void xe_oa_unregister(struct xe_device *xe);
> int xe_oa_sysctl_register(void);
> void xe_oa_sysctl_unregister(void);
>
>+int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
>+			   struct drm_file *file);
>+int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
>+			      struct drm_file *file);
>+
> #endif
>diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
>index 8f8cf6a2bf556..2985443df3080 100644
>--- a/drivers/gpu/drm/xe/xe_oa_types.h
>+++ b/drivers/gpu/drm/xe/xe_oa_types.h
>@@ -6,6 +6,7 @@
> #ifndef _XE_OA_TYPES_H_
> #define _XE_OA_TYPES_H__
>
>+#include <linux/idr.h>
> #include <linux/math.h>
> #include <linux/types.h>
> #include <linux/mutex.h>
>@@ -120,6 +121,15 @@ struct xe_oa {
> 	/** @xe: back pointer to xe device */
> 	struct xe_device *xe;
>
>+	/** @metrics_kobj: kobj for metrics sysfs */
>+	struct kobject *metrics_kobj;
>+
>+	/** @metrics_lock: lock protecting add/remove configs */
>+	struct mutex metrics_lock;
>+
>+	/** @metrics_idr: List of dynamic configurations (struct xe_oa_config) */
>+	struct idr metrics_idr;
>+
> 	/** @oa_formats: tracks all OA formats across platforms */
> 	const struct xe_oa_format *oa_formats;
>
>diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
>index 37538e98dcc04..2aee4c7989486 100644
>--- a/drivers/gpu/drm/xe/xe_perf.c
>+++ b/drivers/gpu/drm/xe/xe_perf.c
>@@ -6,11 +6,25 @@
> #include <linux/errno.h>
> #include <linux/sysctl.h>
>
>+#include "xe_oa.h"
> #include "xe_perf.h"
>
> u32 xe_perf_stream_paranoid = true;
> static struct ctl_table_header *sysctl_header;
>
>+static int xe_oa_ioctl(struct drm_device *dev, struct drm_xe_perf_param *arg,
>+		       struct drm_file *file)
>+{
>+	switch (arg->perf_op) {
>+	case DRM_XE_PERF_OP_ADD_CONFIG:
>+		return xe_oa_add_config_ioctl(dev, (void *)arg->param, file);
>+	case DRM_XE_PERF_OP_REMOVE_CONFIG:
>+		return xe_oa_remove_config_ioctl(dev, (void *)arg->param, file);
>+	default:
>+		return -EINVAL;
>+	}
>+}
>+
> int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> {
> 	struct drm_xe_perf_param *arg = data;
>@@ -19,6 +33,8 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> 		return -EINVAL;
>
> 	switch (arg->perf_type) {
>+	case DRM_XE_PERF_TYPE_OA:
>+		return xe_oa_ioctl(dev, arg, file);
> 	default:
> 		return -EINVAL;
> 	}
>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>index 778862a5b76d4..f17134828c093 100644
>--- a/include/uapi/drm/xe_drm.h
>+++ b/include/uapi/drm/xe_drm.h
>@@ -1126,6 +1126,7 @@ struct drm_xe_wait_user_fence {
>  * enum drm_xe_perf_type - Perf stream types
>  */
> enum drm_xe_perf_type {
>+	DRM_XE_PERF_TYPE_OA,
> 	DRM_XE_PERF_TYPE_MAX,
> };
>
>@@ -1191,6 +1192,30 @@ enum drm_xe_oa_format_type {
> 	DRM_XE_OA_FMT_TYPE_PEC,
> };
>
>+/**
>+ * struct drm_xe_oa_config - OA metric configuration
>+ *
>+ * Multiple OA configs can be added using @DRM_XE_PERF_OP_ADD_CONFIG. A
>+ * particular config can be specified when opening an OA stream using
>+ * @DRM_XE_OA_PROPERTY_OA_METRIC_SET property.
>+ */
>+struct drm_xe_oa_config {
>+	/** @extensions: Pointer to the first extension struct, if any */
>+	__u64 extensions;
>+
>+	/** * @uuid: String formatted like "%\08x-%\04x-%\04x-%\04x-%\012x" */
>+	char uuid[36];
>+
>+	/** @n_regs: Number of regs in @regs_ptr */
>+	__u32 n_regs;
>+
>+	/**
>+	 * @regs_ptr: Pointer to (register address, value) pairs for OA config
>+	 * registers. Expected length of buffer is: (2 * sizeof(u32) * @n_regs).
>+	 */
>+	__u64 regs_ptr;
>+};

minor nits above, otherwise lgtm,

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Umesh
>+
> #if defined(__cplusplus)
> }
> #endif
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2023-12-08  6:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
  2023-12-14  0:57   ` Umesh Nerlige Ramappa
@ 2023-12-19 20:28   ` Dixit, Ashutosh
  2024-01-20  2:35     ` Dixit, Ashutosh
  2024-01-24 14:10   ` Joel Granados
  2 siblings, 1 reply; 68+ messages in thread
From: Dixit, Ashutosh @ 2023-12-19 20:28 UTC (permalink / raw)
  To: intel-xe

On Thu, 07 Dec 2023 22:43:14 -0800, Ashutosh Dixit wrote:
>
> +static struct ctl_table perf_ctl_table[] = {
> +	{
> +	 .procname = "perf_stream_paranoid",
> +	 .data = &xe_perf_stream_paranoid,
> +	 .maxlen = sizeof(xe_perf_stream_paranoid),
> +	 .mode = 0644,
> +	 .proc_handler = proc_dointvec_minmax,
> +	 .extra1 = SYSCTL_ZERO,
> +	 .extra2 = SYSCTL_ONE,
> +	 },
> +	{}
> +};
> +
> +int xe_perf_sysctl_register(void)
> +{
> +	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
> +	return 0;
> +}

Any idea why this (and xe_oa_max_sample_rate) is created in /proc, rather
than as something attached to the module?

We would want it to be per-module rather than per-device, so that's one
reason. What are the options for creating per-module params? One is
module_param itself, in which case this would appear in
"/sys/module/xe/parameters/perf_stream_paranoid" rather than in
"/proc/sys/dev/xe/perf_stream_paranoid".

Module params are also slightly simpler to manage than /proc entries, I
think. Any other reason to prefer one over the other?

Comments?
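
For reference, the module_param route might look roughly like the sketch
below (a sketch only — since module_param has no built-in min/max, the 0/1
clamping that proc_dointvec_minmax did would move into a custom setter):

```c
/* Sketch: per-module knob via module_param_cb(); the 0..1 range check
 * previously done by proc_dointvec_minmax is emulated in a custom setter. */
static u32 xe_perf_stream_paranoid = 1;

static int paranoid_set(const char *val, const struct kernel_param *kp)
{
	u8 v;

	if (kstrtou8(val, 0, &v) || v > 1)
		return -EINVAL;
	return param_set_uint(val, kp);
}

static const struct kernel_param_ops paranoid_ops = {
	.set = paranoid_set,
	.get = param_get_uint,
};

module_param_cb(perf_stream_paranoid, &paranoid_ops,
		&xe_perf_stream_paranoid, 0644);
MODULE_PARM_DESC(perf_stream_paranoid,
		 "Restrict OA stream open to perfmon_capable() users");
```

This would then show up under /sys/module/xe/parameters/ as noted above.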

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-08  6:43 ` [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties Ashutosh Dixit
  2023-12-09 22:53   ` Dixit, Ashutosh
  2023-12-19  2:59   ` Dixit, Ashutosh
@ 2023-12-19 23:23   ` Umesh Nerlige Ramappa
  2024-01-20  2:48     ` Dixit, Ashutosh
  2 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-19 23:23 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:19PM -0800, Ashutosh Dixit wrote:
>Properties for OA streams are specified by user space, when the stream is
>opened, as a chain of drm_xe_ext_set_property struct's. Parse and validate
>these stream properties.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/xe_oa.c   | 372 +++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa.h   |   2 +
> drivers/gpu/drm/xe/xe_perf.c |   2 +
> include/uapi/drm/xe_drm.h    | 114 +++++++++++
> 4 files changed, 490 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 6a903bf4f87d1..9b0bd58fcbc06 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -3,10 +3,13 @@
>  * Copyright © 2023 Intel Corporation
>  */
>
>+#include <linux/nospec.h>
> #include <linux/sysctl.h>
>
>+#include "regs/xe_gt_regs.h"
> #include "regs/xe_oa_regs.h"
> #include "xe_device.h"
>+#include "xe_exec_queue.h"
> #include "xe_gt.h"
> #include "xe_mmio.h"
> #include "xe_oa.h"
>@@ -46,6 +49,20 @@ struct xe_oa_config {
> 	struct rcu_head rcu;
> };
>
>+struct xe_oa_open_param {
>+	u32 oa_unit_id;
>+	bool sample;
>+	u32 metric_set;
>+	enum xe_oa_format_name oa_format;
>+	int period_exponent;
>+	u32 poll_period_us;
>+	u32 open_flags;
>+	int exec_queue_id;
>+	int engine_instance;
>+	struct xe_exec_queue *exec_q;
>+	struct xe_hw_engine *hwe;
>+};
>+
> #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
>
> static const struct xe_oa_format oa_formats[] = {
>@@ -88,6 +105,361 @@ static void xe_oa_config_put(struct xe_oa_config *oa_config)
> 	kref_put(&oa_config->ref, xe_oa_config_release);
> }
>
>+/*
>+ * OA timestamp frequency = CS timestamp frequency in most platforms. On some
>+ * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
>+ * cases, return the adjusted CS timestamp frequency to the user.
>+ */
>+u32 xe_oa_timestamp_frequency(struct xe_gt *gt)
>+{
>+	u32 reg, shift;
>+
>+	/*
>+	 * Wa_18013179988:dg2
>+	 * Wa_14015568240:pvc
>+	 * Wa_14015846243:mtl
>+	 */
>+	switch (gt_to_xe(gt)->info.platform) {
>+	case XE_DG2:
>+	case XE_PVC:
>+	case XE_METEORLAKE:
>+		xe_device_mem_access_get(gt_to_xe(gt));
>+		reg = xe_mmio_read32(gt, RPM_CONFIG0);
>+		xe_device_mem_access_put(gt_to_xe(gt));
>+
>+		shift = REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);
>+		return gt->info.reference_clock << (3 - shift);
>+
>+	default:
>+		return gt->info.reference_clock;
>+	}
>+}
>+
>+static u64 oa_exponent_to_ns(struct xe_gt *gt, int exponent)
>+{
>+	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
>+	u32 den = xe_oa_timestamp_frequency(gt);
>+
>+	return div_u64(nom + den - 1, den);
>+}
>+
>+static bool engine_supports_oa_format(const struct xe_hw_engine *hwe, int type)
>+{
>+	switch (hwe->oa_unit->type) {
>+	case DRM_XE_OA_UNIT_TYPE_OAG:
>+		return type == DRM_XE_OA_FMT_TYPE_OAG || type == DRM_XE_OA_FMT_TYPE_OAR ||
>+			type == DRM_XE_OA_FMT_TYPE_OAC || type == DRM_XE_OA_FMT_TYPE_PEC;
>+	case DRM_XE_OA_UNIT_TYPE_OAM:
>+		return type == DRM_XE_OA_FMT_TYPE_OAM || type == DRM_XE_OA_FMT_TYPE_OAM_MPEC;
>+	default:
>+		return false;
>+	}
>+}
>+
>+static int decode_oa_format(struct xe_oa *oa, u64 fmt, enum xe_oa_format_name *name)
>+{
>+	u32 counter_size = FIELD_GET(DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE, fmt);
>+	u32 counter_sel = FIELD_GET(DRM_XE_OA_FORMAT_MASK_COUNTER_SEL, fmt);
>+	u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
>+	u32 type = FIELD_GET(DRM_XE_OA_FORMAT_MASK_FMT_TYPE, fmt);
>+	int idx;
>+
>+	for_each_set_bit(idx, oa->format_mask, XE_OA_FORMAT_MAX) {
>+		const struct xe_oa_format *f = &oa->oa_formats[idx];
>+
>+		if (counter_size == f->counter_size && bc_report == f->bc_report &&
>+		    type == f->type && counter_sel == f->counter_select) {
>+			*name = idx;
>+			return 0;
>+		}
>+	}
>+
>+	return -EINVAL;
>+}
>+
>+u16 xe_oa_unit_id(struct xe_hw_engine *hwe)
>+{
>+	return hwe->oa_unit && hwe->oa_unit->num_engines ?
>+		hwe->oa_unit->oa_unit_id : U16_MAX;
>+}
>+
>+static int xe_oa_assign_hwe(struct xe_oa *oa, struct xe_oa_open_param *param)
>+{
>+	struct xe_gt *gt;
>+	int i, ret = 0;
>+
>+	if (param->exec_q) {
>+		/* When we have an exec_q, get hwe from the exec_q */
>+		for_each_gt(gt, oa->xe, i) {

Looks like the exec_queue can submit to a specific gt. I think we should 
try to get the hwe from the same gt as the exec_q. Basically this:

if (param->exec_q->gt != gt)
	continue;

or you can just drop the for loop and assume that xe_gt_hw_engine is not 
supposed to fail for this gt (exec_q->gt).
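
Concretely, dropping the loop might look like the following (just a
sketch, assuming exec_q->gt is always a valid gt for the queue):

```c
if (param->exec_q) {
	/* hwe must come from the gt the exec queue submits to */
	param->hwe = xe_gt_hw_engine(param->exec_q->gt, param->exec_q->class,
				     param->engine_instance, true);
	if (!param->hwe || xe_oa_unit_id(param->hwe) != param->oa_unit_id) {
		drm_dbg(&oa->xe->drm, "OA unit ID mismatch for exec_q\n");
		return -EINVAL;
	}
	return 0;
}
```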


>+			param->hwe = xe_gt_hw_engine(gt, param->exec_q->class,
>+						     param->engine_instance, true);
>+			if (param->hwe)
>+				break;
>+		}
>+		if (param->hwe && (xe_oa_unit_id(param->hwe) != param->oa_unit_id)) {
>+			drm_dbg(&oa->xe->drm, "OA unit ID mismatch for exec_q\n");
>+			ret = -EINVAL;
>+		}
>+	} else {
>+		struct xe_hw_engine *hwe;
>+		enum xe_hw_engine_id id;
>+
>+		/* Else just get the first hwe attached to the oa unit */
>+		for_each_gt(gt, oa->xe, i) {
>+			for_each_hw_engine(hwe, gt, id) {
>+				if (xe_oa_unit_id(hwe) == param->oa_unit_id) {
>+					param->hwe = hwe;
>+					goto out;
>+				}
>+			}
>+		}
>+	}
>+out:
>+	if (!param->hwe) {
>+		drm_dbg(&oa->xe->drm, "Unable to find hwe for OA unit ID %d\n",
>+			param->oa_unit_id);
>+		ret = -EINVAL;
>+	}
>+
>+	return ret;
>+}
>+
>+static int xe_oa_set_prop_oa_unit_id(struct xe_oa *oa, u64 value,
>+				     struct xe_oa_open_param *param)
>+{
>+	if (value >= oa->oa_unit_ids) {
>+		drm_dbg(&oa->xe->drm, "OA unit ID out of range %lld\n", value);
>+		return -EINVAL;
>+	}
>+	param->oa_unit_id = value;
>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_sample_oa(struct xe_oa *oa, u64 value,
>+				    struct xe_oa_open_param *param)
>+{
>+	param->sample = value;
>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_metric_set(struct xe_oa *oa, u64 value,
>+				     struct xe_oa_open_param *param)
>+{
>+	param->metric_set = value;
>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_oa_format(struct xe_oa *oa, u64 value,
>+				    struct xe_oa_open_param *param)
>+{
>+	int ret = decode_oa_format(oa, value, &param->oa_format);
>+
>+	if (ret) {
>+		drm_dbg(&oa->xe->drm, "Unsupported OA report format %#llx\n", value);
>+		return ret;
>+	}
>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_oa_exponent(struct xe_oa *oa, u64 value,
>+				      struct xe_oa_open_param *param)
>+{
>+#define OA_EXPONENT_MAX 31
>+
>+	if (value > OA_EXPONENT_MAX) {
>+		drm_dbg(&oa->xe->drm, "OA timer exponent too high (> %u)\n", OA_EXPONENT_MAX);
>+		return -EINVAL;
>+	}
>+	param->period_exponent = value;

i915 has some additional logic where only root can sample at really high
frequencies, but since this is a root-only use case anyway, I don't know
what that logic achieved. I am assuming you intended to drop that logic,
which is okay; just confirming.

>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_poll_oa_period(struct xe_oa *oa, u64 value,
>+					 struct xe_oa_open_param *param)
>+{
>+	if (value < 100) {
>+		drm_dbg(&oa->xe->drm, "OA timer too small (%lldus < 100us)\n", value);
>+		return -EINVAL;
>+	}
>+	param->poll_period_us = value;

I am not sure if anyone ended up using this at all. It will also be
unused if we add interrupt support later. Any thoughts on adding
interrupt support in the future?

Also note that if we throttle poll to only signal the user after a set
number of reports are available, then this parameter is not of much use:
the poll throttling itself will reduce the CPU overhead this was trying
to address. Are there plans to bring that feature (poll throttling) to
Xe?

>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_open_flags(struct xe_oa *oa, u64 value,
>+				     struct xe_oa_open_param *param)
>+{
>+	u32 known_open_flags =
>+		DRM_XE_OA_FLAG_FD_CLOEXEC | DRM_XE_OA_FLAG_FD_NONBLOCK | DRM_XE_OA_FLAG_DISABLED;
>+
>+	if (value & ~known_open_flags) {
>+		drm_dbg(&oa->xe->drm, "Unknown open_flag %#llx\n", value);
>+		return -EINVAL;
>+	}
>+	param->open_flags = value;
>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_exec_queue_id(struct xe_oa *oa, u64 value,
>+					struct xe_oa_open_param *param)
>+{
>+	param->exec_queue_id = value;
>+	return 0;
>+}
>+
>+static int xe_oa_set_prop_engine_instance(struct xe_oa *oa, u64 value,
>+					  struct xe_oa_open_param *param)
>+{
>+	param->engine_instance = value;
>+	return 0;
>+}
>+
>+typedef int (*xe_oa_set_property_fn)(struct xe_oa *oa, u64 value,
>+				     struct xe_oa_open_param *param);
>+static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = {
>+	[DRM_XE_OA_PROPERTY_OA_UNIT_ID] = xe_oa_set_prop_oa_unit_id,
>+	[DRM_XE_OA_PROPERTY_SAMPLE_OA] = xe_oa_set_prop_sample_oa,
>+	[DRM_XE_OA_PROPERTY_OA_METRIC_SET] = xe_oa_set_prop_metric_set,
>+	[DRM_XE_OA_PROPERTY_OA_FORMAT] = xe_oa_set_prop_oa_format,
>+	[DRM_XE_OA_PROPERTY_OA_EXPONENT] = xe_oa_set_prop_oa_exponent,
>+	[DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US] = xe_oa_set_prop_poll_oa_period,
>+	[DRM_XE_OA_PROPERTY_OPEN_FLAGS] = xe_oa_set_prop_open_flags,
>+	[DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID] = xe_oa_set_prop_exec_queue_id,
>+	[DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE] = xe_oa_set_prop_engine_instance,
>+};
>+
>+static int xe_oa_user_ext_set_property(struct xe_oa *oa, u64 extension,
>+				       struct xe_oa_open_param *param)
>+{
>+	u64 __user *address = u64_to_user_ptr(extension);
>+	struct drm_xe_ext_set_property ext;
>+	int err;
>+	u32 idx;
>+
>+	err = __copy_from_user(&ext, address, sizeof(ext));
>+	if (XE_IOCTL_DBG(oa->xe, err))
>+		return -EFAULT;
>+
>+	if (XE_IOCTL_DBG(oa->xe, ext.property >= ARRAY_SIZE(xe_oa_set_property_funcs)) ||
>+	    XE_IOCTL_DBG(oa->xe, ext.pad))
>+		return -EINVAL;
>+
>+	idx = array_index_nospec(ext.property, ARRAY_SIZE(xe_oa_set_property_funcs));
>+	return xe_oa_set_property_funcs[idx](oa, ext.value, param);
>+}
>+
>+typedef int (*xe_oa_user_extension_fn)(struct xe_oa *oa, u64 extension,
>+				       struct xe_oa_open_param *param);
>+static const xe_oa_user_extension_fn xe_oa_user_extension_funcs[] = {
>+	[DRM_XE_OA_EXTENSION_SET_PROPERTY] = xe_oa_user_ext_set_property,
>+};
>+
>+static int xe_oa_user_extensions(struct xe_oa *oa, u64 extension,
>+				 struct xe_oa_open_param *param)
>+{
>+	u64 __user *address = u64_to_user_ptr(extension);
>+	struct xe_user_extension ext;
>+	int err;
>+	u32 idx;
>+
>+	err = __copy_from_user(&ext, address, sizeof(ext));
>+	if (XE_IOCTL_DBG(oa->xe, err))
>+		return -EFAULT;
>+
>+	if (XE_IOCTL_DBG(oa->xe, ext.pad) ||
>+	    XE_IOCTL_DBG(oa->xe, ext.name >= ARRAY_SIZE(xe_oa_user_extension_funcs)))
>+		return -EINVAL;
>+
>+	idx = array_index_nospec(ext.name, ARRAY_SIZE(xe_oa_user_extension_funcs));
>+	err = xe_oa_user_extension_funcs[idx](oa, extension, param);
>+	if (XE_IOCTL_DBG(oa->xe, err))
>+		return err;
>+
>+	if (ext.next_extension)
>+		return xe_oa_user_extensions(oa, ext.next_extension, param);

What if the user passes a circular list of extensions? The recursion here
has no depth limit, so a circular chain would never terminate. If it can
result in an issue, we should also add a test for it.
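
For illustration, a bounded iterative walk handles that case cleanly.
This is a hypothetical userspace model, not the driver code: the struct
layout and the 16-extension cap are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the extension-chain walk. A fixed
 * iteration cap turns a circular (or absurdly long) chain into a clean
 * -E2BIG instead of unbounded recursion.
 */
#define MAX_USER_EXTENSIONS 16

struct user_ext {
	uint64_t next_extension;	/* 0 terminates the chain */
	uint32_t name;
	uint32_t pad;
};

static int walk_extensions(uint64_t extension)
{
	int count = 0;

	while (extension) {
		const struct user_ext *ext =
			(const struct user_ext *)(uintptr_t)extension;

		if (++count > MAX_USER_EXTENSIONS)
			return -E2BIG;	/* chain too long, possibly circular */

		/* ... validate ext->name and dispatch here ... */
		extension = ext->next_extension;
	}
	return 0;
}
```

A circular chain then fails deterministically after MAX_USER_EXTENSIONS
iterations rather than overflowing the kernel stack.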

>+
>+	return 0;
>+}
>+
>+int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>+{
>+	struct xe_oa *oa = &to_xe_device(dev)->oa;
>+	struct xe_file *xef = to_xe_file(file);
>+	struct drm_xe_oa_open_param dparam;
>+	struct xe_oa_open_param param = {};
>+	const struct xe_oa_format *f;
>+	bool privileged_op = true;
>+	int ret;
>+
>+	if (!oa->xe) {
>+		drm_dbg(&oa->xe->drm, "xe oa interface not available for this system\n");
>+		return -ENODEV;
>+	}
>+
>+	ret = __copy_from_user(&dparam, data, sizeof(dparam));
>+	if (XE_IOCTL_DBG(oa->xe, ret))
>+		return -EFAULT;
>+
>+	ret = xe_oa_user_extensions(oa, dparam.extensions, &param);
>+	if (ret)
>+		return ret;
>+
>+	if (param.exec_queue_id > 0) {
>+		param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id);
>+		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
>+			return -ENOENT;
>+	}
>+
>+	/*
>+	 * Query based sampling (using MI_REPORT_PERF_COUNT) with OAR/OAC,
>+	 * without global stream access, can be an unprivileged operation
>+	 */
>+	if (param.exec_q && !param.sample)
>+		privileged_op = false;
>+
>+	if (privileged_op && xe_perf_stream_paranoid && !perfmon_capable()) {
>+		drm_dbg(&oa->xe->drm, "Insufficient privileges to open xe perf stream\n");
>+		ret = -EACCES;
>+		goto err_exec_q;
>+	}
>+
>+	if (!param.exec_q && !param.sample) {
>+		drm_dbg(&oa->xe->drm, "Only OA report sampling supported\n");
>+		ret = -EINVAL;
>+		goto err_exec_q;
>+	}
>+
>+	ret = xe_oa_assign_hwe(oa, &param);
>+	if (ret)
>+		goto err_exec_q;
>+
>+	f = &oa->oa_formats[param.oa_format];
>+	if (!param.oa_format || !f->size ||
>+	    !engine_supports_oa_format(param.hwe, f->type)) {
>+		drm_dbg(&oa->xe->drm, "Invalid OA format %d type %d size %d for class %d\n",
>+			param.oa_format, f->type, f->size, param.hwe->class);
>+		ret = -EINVAL;
>+		goto err_exec_q;
>+	}
>+
>+	if (param.period_exponent > 0) {
>+		u64 oa_period, oa_freq_hz;
>+
>+		oa_period = oa_exponent_to_ns(param.hwe->gt, param.period_exponent);
>+		oa_freq_hz = div64_u64(NSEC_PER_SEC, oa_period);
>+		if (oa_freq_hz > xe_oa_max_sample_rate && !perfmon_capable()) {
>+			drm_dbg(&oa->xe->drm,
>+				"OA exponent would exceed the max sampling frequency (sysctl dev.xe.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
>+				xe_oa_max_sample_rate);
>+			ret = -EACCES;
>+			goto err_exec_q;
>+		}
>+	}
>+err_exec_q:
>+	if (ret < 0 && param.exec_q)
>+		xe_exec_queue_put(param.exec_q);
>+	return ret;
>+}
>+
> static bool xe_oa_is_valid_flex_addr(struct xe_oa *oa, u32 addr)
> {
> 	static const struct xe_reg flex_eu_regs[] = {
>diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
>index e4863f8681b14..a0f9a876ea6b4 100644
>--- a/drivers/gpu/drm/xe/xe_oa.h
>+++ b/drivers/gpu/drm/xe/xe_oa.h
>@@ -19,6 +19,8 @@ void xe_oa_unregister(struct xe_device *xe);
> int xe_oa_sysctl_register(void);
> void xe_oa_sysctl_unregister(void);
>
>+int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data,
>+			    struct drm_file *file);
> int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
> 			   struct drm_file *file);
> int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
>diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
>index 2aee4c7989486..2c0615481b7df 100644
>--- a/drivers/gpu/drm/xe/xe_perf.c
>+++ b/drivers/gpu/drm/xe/xe_perf.c
>@@ -16,6 +16,8 @@ static int xe_oa_ioctl(struct drm_device *dev, struct drm_xe_perf_param *arg,
> 		       struct drm_file *file)
> {
> 	switch (arg->perf_op) {
>+	case DRM_XE_PERF_OP_STREAM_OPEN:
>+		return xe_oa_stream_open_ioctl(dev, (void *)arg->param, file);
> 	case DRM_XE_PERF_OP_ADD_CONFIG:
> 		return xe_oa_add_config_ioctl(dev, (void *)arg->param, file);
> 	case DRM_XE_PERF_OP_REMOVE_CONFIG:
>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>index f17134828c093..8156301df7315 100644
>--- a/include/uapi/drm/xe_drm.h
>+++ b/include/uapi/drm/xe_drm.h
>@@ -1192,6 +1192,120 @@ enum drm_xe_oa_format_type {
> 	DRM_XE_OA_FMT_TYPE_PEC,
> };
>
>+/** enum drm_xe_oa_property_id - OA stream property id's */
>+enum drm_xe_oa_property_id {
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_OA_UNIT_ID: ID of the OA unit on which to open
>+	 * the OA stream, see @oa_unit_id in 'struct
>+	 * drm_xe_query_oa_units'. Defaults to 0 if not provided.
>+	 */
>+	DRM_XE_OA_PROPERTY_OA_UNIT_ID = 1,
>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_SAMPLE_OA: A value of 1 requests the inclusion of
>+	 * raw OA unit reports as part of stream samples.
>+	 */
>+	DRM_XE_OA_PROPERTY_SAMPLE_OA,
>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_OA_METRIC_SET: OA metrics defining contents of OA
>+	 * reportst, previously added via @@DRM_XE_PERF_OP_ADD_CONFIG.

typo: reports

>+	 */
>+	DRM_XE_OA_PROPERTY_OA_METRIC_SET,
>+
>+	/** @DRM_XE_OA_PROPERTY_OA_FORMAT: Perf counter report format */
>+	DRM_XE_OA_PROPERTY_OA_FORMAT,
>+	/**
>+	 * OA_FORMAT's are specified the same way as in Bspec, in terms of
>+	 * the following quantities: a. enum @drm_xe_oa_format_type
>+	 * b. Counter select c. Counter size and d. BC report
>+	 */
>+#define DRM_XE_OA_FORMAT_MASK_FMT_TYPE		(0xff << 0)
>+#define DRM_XE_OA_FORMAT_MASK_COUNTER_SEL	(0xff << 8)
>+#define DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE	(0xff << 16)
>+#define DRM_XE_OA_FORMAT_MASK_BC_REPORT		(0xff << 24)

indentation/alignment is off I guess

>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_OA_EXPONENT: Requests periodic OA unit sampling
>+	 * with sampling frequency proportional to 2^(period_exponent + 1)
>+	 */
>+	DRM_XE_OA_PROPERTY_OA_EXPONENT,
>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US: Timer interval in microseconds
>+	 * to check OA buffer for available data. Minimum allowed value is 100
>+	 * microseconds. A default value is used by the driver if this parameter
>+	 * is skipped. Larger timer values will reduce cpu consumption during OA
>+	 * perf captures, but excessively large values could result in data loss
>+	 * due to OA buffer overwrites.
>+	 */
>+	DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US,

Again, very likely not used, but please confirm with the UMDs.

>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
>+	 * directly applied to returned OA fd. DISABLED opens the OA stream in a
>+	 * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
>+	 */
>+	DRM_XE_OA_PROPERTY_OPEN_FLAGS,
>+#define DRM_XE_OA_FLAG_FD_CLOEXEC	(1 << 0)
>+#define DRM_XE_OA_FLAG_FD_NONBLOCK	(1 << 1)
>+#define DRM_XE_OA_FLAG_DISABLED		(1 << 2)

Oh, I overlooked this before commenting earlier on the fcntl stuff. Looks
like you were already passing this in params. Anyway, fcntl should be
good for CLOEXEC/NONBLOCK.
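
For what it's worth, the fcntl alternative from userspace would be
something like the sketch below (demonstrated on an ordinary fd, since it
works the same on the OA stream fd):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK and FD_CLOEXEC on an already-open fd, instead of
 * requesting them via open flags at stream-open time. */
static int make_nonblock_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;
	return fcntl(fd, F_SETFD, FD_CLOEXEC) < 0 ? -1 : 0;
}
```

That would let the uapi drop DRM_XE_OA_FLAG_FD_CLOEXEC/NONBLOCK entirely
if desired.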

>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID: Open the stream for a specific
>+	 * @exec_queue_id. Perf queries can be executed on this exec queue.
>+	 */
>+	DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID,
>+
>+	/**
>+	 * @DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE: Optional engine instance to
>+	 * pass along with @DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID or will default to 0.
>+	 */
>+	DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE,
>+
>+	DRM_XE_OA_PROPERTY_MAX /* non-ABI */
>+};
>+
>+/**
>+ * struct drm_xe_oa_open_param - Params for opening an OA stream
>+ *
>+ * Stream params are specified as a chain of @drm_xe_ext_set_property
>+ * struct's, with @property values from enum @drm_xe_oa_property_id and
>+ * @xe_user_extension base.name set to @DRM_XE_OA_EXTENSION_SET_PROPERTY
>+ */
>+struct drm_xe_oa_open_param {
>+#define DRM_XE_OA_EXTENSION_SET_PROPERTY	0
>+	/** @extensions: Pointer to the first extension struct */
>+	__u64 extensions;
>+};
>+
>+/** enum drm_xe_oa_record_type - Type of OA packet read from OA fd */
>+enum drm_xe_oa_record_type {
>+	/** @DRM_XE_OA_RECORD_SAMPLE: Regular OA data sample */
>+	DRM_XE_OA_RECORD_SAMPLE = 1,
>+
>+	/** @DRM_XE_OA_RECORD_OA_REPORT_LOST: Status indicating lost OA reports */
>+	DRM_XE_OA_RECORD_OA_REPORT_LOST = 2,
>+
>+	/**
>+	 * @DRM_XE_OA_RECORD_OA_BUFFER_LOST: Status indicating lost OA
>+	 * reports and OA buffer reset in the process
>+	 */
>+	DRM_XE_OA_RECORD_OA_BUFFER_LOST = 3,
>+
>+	DRM_XE_OA_RECORD_MAX /* non-ABI */
>+};
>+
>+/** struct drm_xe_oa_record_header - Header for OA packets read from OA fd */
>+struct drm_xe_oa_record_header {
>+	/** @type: Of enum @drm_xe_oa_record_type */
>+	__u16 type;
>+	/** @pad: MBZ */
>+	__u16 pad;
>+	/** @size: size in bytes */
>+	__u32 size;
>+};

I think we want to drop the header completely, but I guess that's still
WIP. Any plans to allow read/write of the STATUS register? I am thinking
the driver could clear the register automatically but store the last
cleared value, to be returned to the user on a read, or something similar
where the user only ever needs to read it. I don't think the user will
try to recover anything if there is an error; in case of error the
capture is just reinitiated. Again, this needs UMD confirmation.

Thanks,
Umesh

>+
> /**
>  * struct drm_xe_oa_config - OA metric configuration
>  *
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG)
  2023-12-08  6:43 ` [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG) Ashutosh Dixit
@ 2023-12-20  2:31   ` Umesh Nerlige Ramappa
  2024-01-20  2:49     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  2:31 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:20PM -0800, Ashutosh Dixit wrote:
>Implement majority of OA stream initialization (as part of OA stream open)
>ioctl). OAG buffer is allocated for receiving perf counter samples from
>HW. OAG unit is initialized and the selected OA metric configuration is
>programmed into OAG unit HW using a command/batch buffer.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/regs/xe_gt_regs.h |   3 +
> drivers/gpu/drm/xe/xe_oa.c           | 397 +++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa_types.h     |  82 ++++++
> 3 files changed, 482 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>index d318ec0efd7db..1b98b609f7fda 100644
>--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>@@ -156,6 +156,8 @@
>
> #define SQCNT1					XE_REG_MCR(0x8718)
> #define XELPMP_SQCNT1				XE_REG(0x8718)
>+#define   SQCNT1_PMON_ENABLE			REG_BIT(30)
>+#define   SQCNT1_OABPC				REG_BIT(29)
> #define   ENFORCE_RAR				REG_BIT(23)

REG_BIT(29) indentation seems to be off

>
> #define XEHP_SQCM				XE_REG_MCR(0x8724)
>@@ -365,6 +367,7 @@
> #define ROW_CHICKEN				XE_REG_MCR(0xe4f0, XE_REG_OPTION_MASKED)
> #define   UGM_BACKUP_MODE			REG_BIT(13)
> #define   MDQ_ARBITRATION_MODE			REG_BIT(12)
>+#define   STALL_DOP_GATING_DISABLE		REG_BIT(5)
> #define   EARLY_EOT_DIS				REG_BIT(1)
>
> #define ROW_CHICKEN2				XE_REG_MCR(0xe4f4, XE_REG_OPTION_MASKED)
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 9b0bd58fcbc06..d898610322d50 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -6,15 +6,26 @@
> #include <linux/nospec.h>
> #include <linux/sysctl.h>
>
>+#include <drm/drm_drv.h>
>+#include <drm/xe_drm.h>
>+
>+#include "instructions/xe_mi_commands.h"
> #include "regs/xe_gt_regs.h"
> #include "regs/xe_oa_regs.h"
> #include "xe_device.h"
> #include "xe_exec_queue.h"
>+#include "xe_bb.h"
>+#include "xe_bo.h"
> #include "xe_gt.h"
>+#include "xe_gt_mcr.h"
> #include "xe_mmio.h"
> #include "xe_oa.h"
>+#include "xe_sched_job.h"
> #include "xe_perf.h"
>
>+#define DEFAULT_POLL_FREQUENCY_HZ 200
>+#define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
>+
> static int xe_oa_sample_rate_hard_limit;
> static u32 xe_oa_max_sample_rate = 100000;
>
>@@ -63,6 +74,13 @@ struct xe_oa_open_param {
> 	struct xe_hw_engine *hwe;
> };
>
>+struct xe_oa_config_bo {
>+	struct llist_node node;
>+
>+	struct xe_oa_config *oa_config;
>+	struct xe_bb *bb;
>+};
>+
> #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
>
> static const struct xe_oa_format oa_formats[] = {
>@@ -105,6 +123,381 @@ static void xe_oa_config_put(struct xe_oa_config *oa_config)
> 	kref_put(&oa_config->ref, xe_oa_config_release);
> }
>
>+static struct xe_oa_config *xe_oa_config_get(struct xe_oa_config *oa_config)
>+{
>+	return kref_get_unless_zero(&oa_config->ref) ? oa_config : NULL;
>+}
>+
>+static struct xe_oa_config *xe_oa_get_oa_config(struct xe_oa *oa, int metrics_set)
>+{
>+	struct xe_oa_config *oa_config;
>+
>+	rcu_read_lock();
>+	oa_config = idr_find(&oa->metrics_idr, metrics_set);
>+	if (oa_config)
>+		oa_config = xe_oa_config_get(oa_config);
>+	rcu_read_unlock();
>+
>+	return oa_config;
>+}
>+
>+static void free_oa_config_bo(struct xe_oa_config_bo *oa_bo)
>+{
>+	xe_oa_config_put(oa_bo->oa_config);
>+	xe_bb_free(oa_bo->bb, NULL);
>+	kfree(oa_bo);
>+}
>+
>+static const struct xe_oa_regs *__oa_regs(struct xe_oa_stream *stream)
>+{
>+	return &stream->hwe->oa_unit->regs;
>+}
>+
>+static int xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
>+{
>+	struct xe_sched_job *job;
>+	struct dma_fence *fence;
>+	long timeout;
>+	int err = 0;
>+
>+	/* Kernel configuration is issued on stream->k_exec_q, not stream->exec_q */
>+	job = xe_bb_create_job(stream->k_exec_q, bb);
>+	if (IS_ERR(job)) {
>+		err = PTR_ERR(job);
>+		goto exit;
>+	}
>+
>+	xe_sched_job_arm(job);
>+	fence = dma_fence_get(&job->drm.s_fence->finished);
>+	xe_sched_job_push(job);
>+
>+	timeout = dma_fence_wait_timeout(fence, false, HZ);
>+	dma_fence_put(fence);
>+	if (timeout < 0)
>+		err = timeout;
>+	else if (!timeout)
>+		err = -ETIME;
>+exit:
>+	return err;
>+}
>+
>+static void xe_oa_free_oa_buffer(struct xe_oa_stream *stream)
>+{
>+	xe_bo_unpin_map_no_vm(stream->oa_buffer.bo);
>+}
>+
>+static void xe_oa_free_configs(struct xe_oa_stream *stream)
>+{
>+	struct xe_oa_config_bo *oa_bo, *tmp;
>+
>+	xe_oa_config_put(stream->oa_config);
>+	llist_for_each_entry_safe(oa_bo, tmp, stream->oa_config_bos.first, node)
>+		free_oa_config_bo(oa_bo);
>+}
>+
>+#define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
>+
>+static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
>+{
>+	u32 sqcnt1;
>+
>+	/*
>+	 * Wa_1508761755:xehpsdv, dg2
>+	 * Enable thread stall DOP gating and EU DOP gating.
>+	 */
>+	if (stream->oa->xe->info.platform == XE_DG2) {
>+		xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN,
>+					  _MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));
>+		xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN2,
>+					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
>+	}
>+
>+	/* Make sure we disable noa to save power. */
>+	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
>+
>+	sqcnt1 = SQCNT1_PMON_ENABLE |
>+		 (HAS_OA_BPC_REPORTING(stream->oa->xe) ? SQCNT1_OABPC : 0);
>+
>+	/* Reset PMON Enable to save power. */
>+	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, sqcnt1, 0);
>+}
>+
>+static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
>+{
>+	struct xe_bo *bo;
>+
>+	BUILD_BUG_ON_NOT_POWER_OF_2(XE_OA_BUFFER_SIZE);
>+	BUILD_BUG_ON(XE_OA_BUFFER_SIZE < SZ_128K || XE_OA_BUFFER_SIZE > SZ_16M);
>+
>+	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
>+				  XE_OA_BUFFER_SIZE, ttm_bo_type_kernel,
>+				  XE_BO_CREATE_SYSTEM_BIT | XE_BO_CREATE_GGTT_BIT);
>+	if (IS_ERR(bo))
>+		return PTR_ERR(bo);
>+
>+	stream->oa_buffer.bo = bo;
>+	stream->oa_buffer.vaddr = bo->vmap.vaddr;
>+	return 0;
>+}
>+
>+static void write_cs_mi_lri(struct xe_bb *bb, const struct xe_oa_reg *reg_data, u32 n_regs)
>+{
>+	u32 i;
>+
>+#define MI_LOAD_REGISTER_IMM_MAX_REGS (126)
>+
>+	for (i = 0; i < n_regs; i++) {
>+		if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) {
>+			u32 n_lri = min_t(u32, n_regs - i,
>+					  MI_LOAD_REGISTER_IMM_MAX_REGS);
>+
>+			bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(n_lri);
>+		}
>+		bb->cs[bb->len++] = reg_data[i].addr.addr;
>+		bb->cs[bb->len++] = reg_data[i].value;
>+	}
>+}
>+
>+static int num_lri_dwords(int num_regs)
>+{
>+	int count = 0;
>+
>+	if (num_regs > 0) {
>+		count += DIV_ROUND_UP(num_regs, MI_LOAD_REGISTER_IMM_MAX_REGS);
>+		count += num_regs * 2;
>+	}
>+
>+	return count;
>+}
>+
>+static struct xe_oa_config_bo *
>+__xe_oa_alloc_config_buffer(struct xe_oa_stream *stream, struct xe_oa_config *oa_config)
>+{
>+	struct xe_oa_config_bo *oa_bo;
>+	size_t config_length;
>+	struct xe_bb *bb;
>+
>+	oa_bo = kzalloc(sizeof(*oa_bo), GFP_KERNEL);
>+	if (!oa_bo)
>+		return ERR_PTR(-ENOMEM);
>+
>+	config_length = num_lri_dwords(oa_config->regs_len);
>+	config_length = ALIGN(sizeof(u32) * config_length, XE_PAGE_SIZE) / sizeof(u32);
>+
>+	bb = xe_bb_new(stream->gt, config_length, false);
>+	if (IS_ERR(bb))
>+		goto err_free;
>+
>+	write_cs_mi_lri(bb, oa_config->regs, oa_config->regs_len);
>+
>+	oa_bo->bb = bb;
>+	oa_bo->oa_config = xe_oa_config_get(oa_config);
>+	llist_add(&oa_bo->node, &stream->oa_config_bos);
>+
>+	return oa_bo;
>+err_free:
>+	kfree(oa_bo);
>+	return ERR_CAST(bb);
>+}
>+
>+static struct xe_oa_config_bo *xe_oa_alloc_config_buffer(struct xe_oa_stream *stream)
>+{
>+	struct xe_oa_config *oa_config = stream->oa_config;
>+	struct xe_oa_config_bo *oa_bo;
>+
>+	/* Look for the buffer in the already allocated BOs attached to the stream */
>+	llist_for_each_entry(oa_bo, stream->oa_config_bos.first, node) {
>+		if (oa_bo->oa_config == oa_config &&
>+		    memcmp(oa_bo->oa_config->uuid, oa_config->uuid,
>+			   sizeof(oa_config->uuid)) == 0)
>+			goto out;
>+	}
>+
>+	oa_bo = __xe_oa_alloc_config_buffer(stream, oa_config);
>+out:
>+	return oa_bo;
>+}
>+
>+static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
>+{
>+#define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
>+	struct xe_oa_config_bo *oa_bo;
>+	int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
>+
>+	oa_bo = xe_oa_alloc_config_buffer(stream);
>+	if (IS_ERR(oa_bo)) {
>+		err = PTR_ERR(oa_bo);
>+		goto exit;
>+	}
>+
>+	err = xe_oa_submit_bb(stream, oa_bo->bb);
>+
>+	/* Additional empirical delay needed for NOA programming after registers are written */
>+	usleep_range(us, 2 * us);

Are we planning to signal a user fence or something similar to indicate
completion? I haven't tracked that aspect much.

The rest is familiar and lgtm,

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Umesh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd
  2023-12-08  6:43 ` [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd Ashutosh Dixit
@ 2023-12-20  2:52   ` Umesh Nerlige Ramappa
  2024-01-20  2:50     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  2:52 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:21PM -0800, Ashutosh Dixit wrote:
>The OA stream open perf op returns an fd with its own file_operations for
>the newly initialized OA stream. These file_operations allow userspace to
>enable or disable the stream, as well as apply a different metric
>configuration for the OA stream. Userspace can also poll for data
>availability. OA stream initialization is completed in this commit by
>enabling the OA stream. When sampling is enabled this starts an hrtimer
>which periodically checks for data availability.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

lgtm

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

>---
> drivers/gpu/drm/xe/xe_oa.c | 373 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 373 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index d898610322d50..b6e94dba5f525 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -3,7 +3,9 @@
>  * Copyright © 2023 Intel Corporation
>  */
>
>+#include <linux/anon_inodes.h>
> #include <linux/nospec.h>
>+#include <linux/poll.h>
> #include <linux/sysctl.h>
>
> #include <drm/drm_drv.h>
>@@ -23,6 +25,7 @@
> #include "xe_sched_job.h"
> #include "xe_perf.h"
>
>+#define OA_TAKEN(tail, head)	(((tail) - (head)) & (XE_OA_BUFFER_SIZE - 1))
> #define DEFAULT_POLL_FREQUENCY_HZ 200
> #define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
>
>@@ -153,6 +156,202 @@ static const struct xe_oa_regs *__oa_regs(struct xe_oa_stream *stream)
> 	return &stream->hwe->oa_unit->regs;
> }
>
>+static u32 xe_oa_hw_tail_read(struct xe_oa_stream *stream)
>+{
>+	return xe_mmio_read32(stream->gt, __oa_regs(stream)->oa_tail_ptr) &
>+		OAG_OATAILPTR_MASK;
>+}
>+
>+#define oa_report_header_64bit(__s) \
>+	((__s)->oa_buffer.format->header == HDR_64_BIT)
>+
>+static u64 oa_report_id(struct xe_oa_stream *stream, void *report)
>+{
>+	return oa_report_header_64bit(stream) ? *(u64 *)report : *(u32 *)report;
>+}
>+
>+static u64 oa_timestamp(struct xe_oa_stream *stream, void *report)
>+{
>+	return oa_report_header_64bit(stream) ?
>+		*((u64 *)report + 1) :
>+		*((u32 *)report + 1);
>+}
>+
>+static bool xe_oa_buffer_check_unlocked(struct xe_oa_stream *stream)
>+{
>+	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
>+	int report_size = stream->oa_buffer.format->size;
>+	u32 tail, hw_tail;
>+	unsigned long flags;
>+	bool pollin;
>+	u32 partial_report_size;
>+
>+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>+
>+	hw_tail = xe_oa_hw_tail_read(stream);
>+	hw_tail -= gtt_offset;
>+
>+	/*
>+	 * The tail pointer increases in 64 byte (cacheline size), not in report_size
>+	 * increments. Also report size may not be a power of 2. Compute potential
>+	 * partially landed report in OA buffer.
>+	 */
>+	partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail);
>+	partial_report_size %= report_size;
>+
>+	/* Subtract partial amount off the tail */
>+	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
>+
>+	tail = hw_tail;
>+
>+	/*
>+	 * Walk the stream backward until we find a report with report id and timestamp
>+	 * not 0. We can't tell whether a report has fully landed in memory before the
>+	 * report id and timestamp of the following report have landed.
>+	 *
>+	 * This is assuming that the writes of the OA unit land in memory in the order
>+	 * they were written.  If not : (╯°□°)╯︵ ┻━┻
>+	 */
>+	while (OA_TAKEN(tail, stream->oa_buffer.tail) >= report_size) {
>+		void *report = stream->oa_buffer.vaddr + tail;
>+
>+		if (oa_report_id(stream, report) || oa_timestamp(stream, report))
>+			break;
>+
>+		tail = OA_TAKEN(tail, report_size);
>+	}
>+
>+	if (OA_TAKEN(hw_tail, tail) > report_size)
>+		drm_dbg(&stream->oa->xe->drm,
>+			"unlanded report(s) head=0x%x tail=0x%x hw_tail=0x%x\n",
>+			stream->oa_buffer.head, tail, hw_tail);
>+
>+	stream->oa_buffer.tail = tail;
>+
>+	pollin = OA_TAKEN(stream->oa_buffer.tail,
>+			  stream->oa_buffer.head) >= report_size;
>+
>+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>+
>+	return pollin;
>+}
>+
>+static enum hrtimer_restart xe_oa_poll_check_timer_cb(struct hrtimer *hrtimer)
>+{
>+	struct xe_oa_stream *stream =
>+		container_of(hrtimer, typeof(*stream), poll_check_timer);
>+
>+	if (xe_oa_buffer_check_unlocked(stream)) {
>+		stream->pollin = true;
>+		wake_up(&stream->poll_wq);
>+	}
>+
>+	hrtimer_forward_now(hrtimer, ns_to_ktime(stream->poll_period_ns));
>+
>+	return HRTIMER_RESTART;
>+}
>+
>+static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream)
>+{
>+	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
>+	u32 oa_buf = gtt_offset | OABUFFER_SIZE_16M | OAG_OABUFFER_MEMORY_SELECT;
>+	unsigned long flags;
>+
>+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>+
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_status, 0);
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_head_ptr,
>+			gtt_offset & OAG_OAHEADPTR_MASK);
>+	stream->oa_buffer.head = 0;
>+
>+	/*
>+	 * PRM says: "This MMIO must be set before the OATAILPTR register and after the
>+	 * OAHEADPTR register. This is to enable proper functionality of the overflow bit".
>+	 */
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_buffer, oa_buf);
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_tail_ptr,
>+			gtt_offset & OAG_OATAILPTR_MASK);
>+
>+	/* Mark that we need updated tail pointer to read from */
>+	stream->oa_buffer.tail = 0;
>+
>+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>+
>+	/* Zero out the OA buffer since we rely on zero report id and timestamp fields */
>+	memset(stream->oa_buffer.vaddr, 0, stream->oa_buffer.bo->size);
>+}
>+
>+u32 __format_to_oactrl(const struct xe_oa_format *format, int counter_sel_mask)
>+{
>+	return ((format->counter_select << __bf_shf(counter_sel_mask)) & counter_sel_mask) |
>+		REG_FIELD_PREP(OA_OACONTROL_REPORT_BC_MASK, format->bc_report) |
>+		REG_FIELD_PREP(OA_OACONTROL_COUNTER_SIZE_MASK, format->counter_size);
>+}
>+
>+static void xe_oa_enable(struct xe_oa_stream *stream)
>+{
>+	const struct xe_oa_format *format = stream->oa_buffer.format;
>+	const struct xe_oa_regs *regs;
>+	u32 val;
>+
>+	/*
>+	 * BSpec: 46822: Bit 0. Even if stream->sample is 0, for OAR to function, the OA
>+	 * buffer must be correctly initialized
>+	 */
>+	xe_oa_init_oa_buffer(stream);
>+
>+	regs = __oa_regs(stream);
>+	val = __format_to_oactrl(format, regs->oa_ctrl_counter_select_mask) |
>+		OAG_OACONTROL_OA_COUNTER_ENABLE;
>+
>+	xe_mmio_write32(stream->gt, regs->oa_ctrl, val);
>+}
>+
>+static void xe_oa_disable(struct xe_oa_stream *stream)
>+{
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctrl, 0);
>+	if (xe_mmio_wait32(stream->gt, __oa_regs(stream)->oa_ctrl,
>+			   OAG_OACONTROL_OA_COUNTER_ENABLE, 0, 50000, NULL, false))
>+		drm_err(&stream->oa->xe->drm,
>+			"wait for OA to be disabled timed out\n");
>+
>+	xe_mmio_write32(stream->gt, OA_TLB_INV_CR, 1);
>+	if (xe_mmio_wait32(stream->gt, OA_TLB_INV_CR, 1, 0, 50000, NULL, false))
>+		drm_err(&stream->oa->xe->drm,
>+			"wait for OA tlb invalidate timed out\n");
>+}
>+
>+static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
>+				  struct file *file, poll_table *wait)
>+{
>+	__poll_t events = 0;
>+
>+	poll_wait(file, &stream->poll_wq, wait);
>+
>+	/*
>+	 * We don't explicitly check whether there's something to read here since this
>+	 * path may be hot depending on what else userspace is polling, or on the timeout
>+	 * in use. We rely on hrtimer xe_oa_poll_check_timer_cb to notify us when there
>+	 * are samples to read
>+	 */
>+	if (stream->pollin)
>+		events |= EPOLLIN;
>+
>+	return events;
>+}
>+
>+static __poll_t xe_oa_poll(struct file *file, poll_table *wait)
>+{
>+	struct xe_oa_stream *stream = file->private_data;
>+	__poll_t ret;
>+
>+	mutex_lock(&stream->stream_lock);
>+	ret = xe_oa_poll_locked(stream, file, wait);
>+	mutex_unlock(&stream->stream_lock);
>+
>+	return ret;
>+}
>+
> static int xe_oa_submit_bb(struct xe_oa_stream *stream, struct xe_bb *bb)
> {
> 	struct xe_sched_job *job;
>@@ -222,6 +421,26 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
> 	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, sqcnt1, 0);
> }
>
>+static void xe_oa_stream_destroy(struct xe_oa_stream *stream)
>+{
>+	struct xe_oa_unit *u = stream->hwe->oa_unit;
>+	struct xe_gt *gt = stream->hwe->gt;
>+
>+	if (WARN_ON(stream != u->exclusive_stream))
>+		return;
>+
>+	WRITE_ONCE(u->exclusive_stream, NULL);
>+
>+	xe_oa_disable_metric_set(stream);
>+	xe_exec_queue_put(stream->k_exec_q);
>+
>+	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
>+	xe_device_mem_access_put(stream->oa->xe);
>+
>+	xe_oa_free_oa_buffer(stream);
>+	xe_oa_free_configs(stream);
>+}
>+
> static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
> {
> 	struct xe_bo *bo;
>@@ -389,6 +608,139 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
> 	return xe_oa_emit_oa_config(stream);
> }
>
>+static void xe_oa_stream_enable(struct xe_oa_stream *stream)
>+{
>+	stream->pollin = false;
>+
>+	xe_oa_enable(stream);
>+
>+	if (stream->sample)
>+		hrtimer_start(&stream->poll_check_timer,
>+			      ns_to_ktime(stream->poll_period_ns),
>+			      HRTIMER_MODE_REL_PINNED);
>+}
>+
>+static void xe_oa_stream_disable(struct xe_oa_stream *stream)
>+{
>+	xe_oa_disable(stream);
>+
>+	if (stream->sample)
>+		hrtimer_cancel(&stream->poll_check_timer);
>+}
>+
>+static void xe_oa_enable_locked(struct xe_oa_stream *stream)
>+{
>+	if (stream->enabled)
>+		return;
>+
>+	stream->enabled = true;
>+
>+	xe_oa_stream_enable(stream);
>+}
>+
>+static void xe_oa_disable_locked(struct xe_oa_stream *stream)
>+{
>+	if (!stream->enabled)
>+		return;
>+
>+	stream->enabled = false;
>+
>+	xe_oa_stream_disable(stream);
>+}
>+
>+static long xe_oa_config_locked(struct xe_oa_stream *stream,
>+				unsigned long metrics_set)
>+{
>+	struct xe_oa_config *config;
>+	long ret = stream->oa_config->id;
>+
>+	config = xe_oa_get_oa_config(stream->oa, metrics_set);
>+	if (!config)
>+		return -ENODEV;
>+
>+	if (config != stream->oa_config) {
>+		int err;
>+
>+		err = xe_oa_emit_oa_config(stream);
>+		if (!err)
>+			config = xchg(&stream->oa_config, config);
>+		else
>+			ret = err;
>+	}
>+
>+	xe_oa_config_put(config);
>+
>+	return ret;
>+}
>+
>+static long xe_oa_ioctl_locked(struct xe_oa_stream *stream,
>+			       unsigned int cmd,
>+			       unsigned long arg)
>+{
>+	switch (cmd) {
>+	case DRM_XE_PERF_IOCTL_ENABLE:
>+		xe_oa_enable_locked(stream);
>+		return 0;
>+	case DRM_XE_PERF_IOCTL_DISABLE:
>+		xe_oa_disable_locked(stream);
>+		return 0;
>+	case DRM_XE_PERF_IOCTL_CONFIG:
>+		return xe_oa_config_locked(stream, arg);
>+	}
>+
>+	return -EINVAL;
>+}
>+
>+static long xe_oa_ioctl(struct file *file,
>+			unsigned int cmd,
>+			unsigned long arg)
>+{
>+	struct xe_oa_stream *stream = file->private_data;
>+	long ret;
>+
>+	mutex_lock(&stream->stream_lock);
>+	ret = xe_oa_ioctl_locked(stream, cmd, arg);
>+	mutex_unlock(&stream->stream_lock);
>+
>+	return ret;
>+}
>+
>+static void xe_oa_destroy_locked(struct xe_oa_stream *stream)
>+{
>+	if (stream->enabled)
>+		xe_oa_disable_locked(stream);
>+
>+	xe_oa_stream_destroy(stream);
>+
>+	if (stream->exec_q)
>+		xe_exec_queue_put(stream->exec_q);
>+
>+	kfree(stream);
>+}
>+
>+static int xe_oa_release(struct inode *inode, struct file *file)
>+{
>+	struct xe_oa_stream *stream = file->private_data;
>+	struct xe_gt *gt = stream->gt;
>+
>+	mutex_lock(&gt->oa.gt_lock);
>+	xe_oa_destroy_locked(stream);
>+	mutex_unlock(&gt->oa.gt_lock);
>+
>+	/* Release the reference the perf stream kept on the driver */
>+	drm_dev_put(&gt_to_xe(gt)->drm);
>+
>+	return 0;
>+}
>+
>+static const struct file_operations xe_oa_fops = {
>+	.owner		= THIS_MODULE,
>+	.llseek		= no_llseek,
>+	.release	= xe_oa_release,
>+	.poll		= xe_oa_poll,
>+	.unlocked_ioctl	= xe_oa_ioctl,
>+};
>+
> static int xe_oa_stream_init(struct xe_oa_stream *stream,
> 			     struct xe_oa_open_param *param)
> {
>@@ -445,6 +797,10 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
>
> 	WRITE_ONCE(u->exclusive_stream, stream);
>
>+	hrtimer_init(&stream->poll_check_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>+	stream->poll_check_timer.function = xe_oa_poll_check_timer_cb;
>+	init_waitqueue_head(&stream->poll_wq);
>+
> 	spin_lock_init(&stream->oa_buffer.ptr_lock);
> 	mutex_init(&stream->stream_lock);
>
>@@ -467,6 +823,7 @@ static int xe_oa_stream_open_ioctl_locked(struct xe_oa *oa,
> 					  struct xe_oa_open_param *param)
> {
> 	struct xe_oa_stream *stream;
>+	unsigned long f_flags = 0;
> 	int stream_fd;
> 	int ret;
>
>@@ -488,10 +845,26 @@ static int xe_oa_stream_open_ioctl_locked(struct xe_oa *oa,
> 	if (ret)
> 		goto err_free;
>
>+	if (param->open_flags & DRM_XE_OA_FLAG_FD_CLOEXEC)
>+		f_flags |= O_CLOEXEC;
>+	if (param->open_flags & DRM_XE_OA_FLAG_FD_NONBLOCK)
>+		f_flags |= O_NONBLOCK;
>+
>+	stream_fd = anon_inode_getfd("[xe_oa]", &xe_oa_fops, stream, f_flags);
>+	if (stream_fd < 0) {
>+		ret = stream_fd;
>+		goto err_destroy;
>+	}
>+
>+	if (!(param->open_flags & DRM_XE_OA_FLAG_DISABLED))
>+		xe_oa_enable_locked(stream);
>+
> 	/* Hold a reference on the drm device till stream_fd is released */
> 	drm_dev_get(&stream->oa->xe->drm);
>
> 	return stream_fd;
>+err_destroy:
>+	xe_oa_stream_destroy(stream);
> err_free:
> 	kfree(stream);
> exit:
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 10/17] drm/xe/oa/uapi: Read file_operation
  2023-12-08  6:43 ` [PATCH 10/17] drm/xe/oa/uapi: Read file_operation Ashutosh Dixit
@ 2023-12-20  3:01   ` Umesh Nerlige Ramappa
  2024-01-20  2:51     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  3:01 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:22PM -0800, Ashutosh Dixit wrote:
>Implement the OA stream read file_operation. Both blocking and non-blocking
>reads are supported. As part of read system call, the read copies OA perf
>data from the OA buffer to the user buffer, after appending packet headers
>for status and data packets.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/xe_oa.c | 239 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 239 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index b6e94dba5f525..5744436188dcd 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -170,6 +170,14 @@ static u64 oa_report_id(struct xe_oa_stream *stream, void *report)
> 	return oa_report_header_64bit(stream) ? *(u64 *)report : *(u32 *)report;
> }
>
>+static void oa_report_id_clear(struct xe_oa_stream *stream, u32 *report)
>+{
>+	if (oa_report_header_64bit(stream))
>+		*(u64 *)report = 0;
>+	else
>+		*report = 0;
>+}
>+
> static u64 oa_timestamp(struct xe_oa_stream *stream, void *report)
> {
> 	return oa_report_header_64bit(stream) ?
>@@ -177,6 +185,14 @@ static u64 oa_timestamp(struct xe_oa_stream *stream, void *report)
> 		*((u32 *)report + 1);
> }
>
>+static void oa_timestamp_clear(struct xe_oa_stream *stream, u32 *report)
>+{
>+	if (oa_report_header_64bit(stream))
>+		*(u64 *)&report[2] = 0;
>+	else
>+		report[1] = 0;
>+}
>+
> static bool xe_oa_buffer_check_unlocked(struct xe_oa_stream *stream)
> {
> 	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
>@@ -251,6 +267,134 @@ static enum hrtimer_restart xe_oa_poll_check_timer_cb(struct hrtimer *hrtimer)
> 	return HRTIMER_RESTART;
> }
>
>+static int xe_oa_append_status(struct xe_oa_stream *stream, char __user *buf,
>+			       size_t count, size_t *offset,
>+			       enum drm_xe_oa_record_type type)

space/indent ^

>+{
>+	struct drm_xe_oa_record_header header = { type, 0, sizeof(header) };
>+
>+	if ((count - *offset) < header.size)
>+		return -ENOSPC;
>+
>+	if (copy_to_user(buf + *offset, &header, sizeof(header)))
>+		return -EFAULT;
>+
>+	*offset += header.size;
>+
>+	return 0;
>+}
>+
>+static int xe_oa_append_sample(struct xe_oa_stream *stream, char __user *buf,
>+			       size_t count, size_t *offset, const u8 *report)

space/indent ^ and a couple more places in this patch.

>+{
>+	int report_size = stream->oa_buffer.format->size;
>+	struct drm_xe_oa_record_header header;
>+	int report_size_partial;
>+	u8 *oa_buf_end;
>+
>+	header.type = DRM_XE_OA_RECORD_SAMPLE;
>+	header.pad = 0;
>+	header.size = stream->sample_size;
>+
>+	if ((count - *offset) < header.size)
>+		return -ENOSPC;
>+
>+	buf += *offset;
>+	if (copy_to_user(buf, &header, sizeof(header)))
>+		return -EFAULT;
>+	buf += sizeof(header);
>+
>+	oa_buf_end = stream->oa_buffer.vaddr + XE_OA_BUFFER_SIZE;
>+	report_size_partial = oa_buf_end - report;
>+
>+	if (report_size_partial < report_size) {
>+		if (copy_to_user(buf, report, report_size_partial))
>+			return -EFAULT;
>+		buf += report_size_partial;
>+
>+		if (copy_to_user(buf, stream->oa_buffer.vaddr,
>+				 report_size - report_size_partial))
>+			return -EFAULT;
>+	} else if (copy_to_user(buf, report, report_size)) {
>+		return -EFAULT;
>+	}
>+
>+	*offset += header.size;
>+
>+	return 0;
>+}
>+
>+static int xe_oa_append_reports(struct xe_oa_stream *stream, char __user *buf,
>+				size_t count, size_t *offset)
>+{
>+	int report_size = stream->oa_buffer.format->size;
>+	u8 *oa_buf_base = stream->oa_buffer.vaddr;
>+	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
>+	u32 mask = (XE_OA_BUFFER_SIZE - 1);
>+	size_t start_offset = *offset;
>+	unsigned long flags;
>+	u32 head, tail;
>+	int ret = 0;
>+
>+	if (drm_WARN_ON(&stream->oa->xe->drm, !stream->enabled))
>+		return -EIO;
>+
>+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>+
>+	head = stream->oa_buffer.head;
>+	tail = stream->oa_buffer.tail;
>+
>+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>+
>+	/* An out of bounds or misaligned head or tail pointer implies a driver bug */
>+	if (drm_WARN_ONCE(&stream->oa->xe->drm,
>+			  head > XE_OA_BUFFER_SIZE || tail > XE_OA_BUFFER_SIZE,
>+			  "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>+			  head, tail))
>+		return -EIO;
>+
>+	for (; OA_TAKEN(tail, head); head = (head + report_size) & mask) {
>+		u8 *report = oa_buf_base + head;
>+		u32 *report32 = (void *)report;
>+
>+		ret = xe_oa_append_sample(stream, buf, count, offset, report);
>+		if (ret)
>+			break;
>+
>+		if (is_power_of_2(report_size)) {
>+			/* Clear out report id and timestamp to detect unlanded reports */
>+			oa_report_id_clear(stream, report32);
>+			oa_timestamp_clear(stream, report32);
>+		} else {
>+			u8 *oa_buf_end = stream->oa_buffer.vaddr +
>+					 XE_OA_BUFFER_SIZE;
>+			u32 part = oa_buf_end - (u8 *)report32;
>+
>+			/* Zero out the entire report */
>+			if (report_size <= part) {
>+				memset(report32, 0, report_size);
>+			} else {
>+				memset(report32, 0, part);
>+				memset(oa_buf_base, 0, report_size - part);
>+			}
>+		}
>+	}
>+
>+	if (start_offset != *offset) {
>+		struct xe_reg oaheadptr = __oa_regs(stream)->oa_head_ptr;
>+
>+		spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>+
>+		xe_mmio_write32(stream->gt, oaheadptr,
>+				(head + gtt_offset) & OAG_OAHEADPTR_MASK);
>+		stream->oa_buffer.head = head;
>+
>+		spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>+	}
>+
>+	return ret;
>+}
>+
> static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream)
> {
> 	u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo);
>@@ -321,6 +465,100 @@ static void xe_oa_disable(struct xe_oa_stream *stream)
> 			"wait for OA tlb invalidate timed out\n");
> }
>
>+static int __xe_oa_read(struct xe_oa_stream *stream, char __user *buf,
>+			size_t count, size_t *offset)
>+{
>+	struct xe_reg oastatus_reg = __oa_regs(stream)->oa_status;
>+	u32 oastatus;
>+	int ret;
>+
>+	if (drm_WARN_ON(&stream->oa->xe->drm, !stream->oa_buffer.vaddr))
>+		return -EIO;
>+
>+	oastatus = xe_mmio_read32(stream->gt, oastatus_reg);
>+
>+	/* We treat OABUFFER_OVERFLOW as a significant error */
>+	if (oastatus & OAG_OASTATUS_BUFFER_OVERFLOW) {
>+		ret = xe_oa_append_status(stream, buf, count, offset,
>+					  DRM_XE_OA_RECORD_OA_BUFFER_LOST);
>+		if (ret)
>+			return ret;
>+
>+		drm_dbg(&stream->oa->xe->drm,
>+			"OA buffer overflow (exponent = %d): force restart\n",
>+			stream->period_exponent);
>+
>+		xe_oa_disable(stream);
>+		xe_oa_enable(stream);
>+
>+		/* oa_enable will re-init oabuffer and reset oastatus_reg */
>+		oastatus = xe_mmio_read32(stream->gt, oastatus_reg);
>+	}
>+
>+	if (oastatus & OAG_OASTATUS_REPORT_LOST) {
>+		ret = xe_oa_append_status(stream, buf, count, offset,
>+					  DRM_XE_OA_RECORD_OA_REPORT_LOST);
>+		if (ret)
>+			return ret;
>+
>+		xe_mmio_rmw32(stream->gt, oastatus_reg,
>+			      OAG_OASTATUS_COUNTER_OVERFLOW |
>+			      OAG_OASTATUS_REPORT_LOST, 0);
>+	}
>+
>+	return xe_oa_append_reports(stream, buf, count, offset);
>+}
>+
>+static int xe_oa_wait_unlocked(struct xe_oa_stream *stream)
>+{
>+	/* We might wait indefinitely if periodic sampling is not enabled */
>+	if (!stream->periodic)
>+		return -EIO;
>+
>+	return wait_event_interruptible(stream->poll_wq,
>+					xe_oa_buffer_check_unlocked(stream));
>+}
>+
>+static ssize_t xe_oa_read(struct file *file, char __user *buf,
>+			  size_t count, loff_t *ppos)
>+{
>+	struct xe_oa_stream *stream = file->private_data;
>+	size_t offset = 0;
>+	int ret;
>+
>+	/* Can't read from disabled streams */
>+	if (!stream->enabled || !stream->sample)
>+		return -EIO;
>+
>+	if (!(file->f_flags & O_NONBLOCK)) {
>+		do {
>+			ret = xe_oa_wait_unlocked(stream);
>+			if (ret)
>+				return ret;
>+
>+			mutex_lock(&stream->stream_lock);
>+			ret = __xe_oa_read(stream, buf, count, &offset);
>+			mutex_unlock(&stream->stream_lock);
>+		} while (!offset && !ret);
>+	} else {
>+		mutex_lock(&stream->stream_lock);
>+		ret = __xe_oa_read(stream, buf, count, &offset);
>+		mutex_unlock(&stream->stream_lock);
>+	}
>+
>+	/*
>+	 * Typically we clear pollin here in order to wait for the new hrtimer callback
>+	 * before unblocking. The exception to this is if __xe_oa_read returns -ENOSPC,
>+	 * which means that more OA data is available than could fit in the user provided
>+	 * buffer. In this case we want the next poll() call to not block.
>+	 */
>+	if (ret != -ENOSPC)
>+		stream->pollin = false;
>+
>+	/* Possible values for ret are 0, -EFAULT, -ENOSPC, -EIO, ... */
>+	return offset ?: (ret ?: -EAGAIN);
>+}
>+
> static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> 				  struct file *file, poll_table *wait)
> {
>@@ -738,6 +976,7 @@ static const struct file_operations xe_oa_fops = {
> 	.llseek		= no_llseek,
> 	.release	= xe_oa_release,
> 	.poll		= xe_oa_poll,
>+	.read		= xe_oa_read,
> 	.unlocked_ioctl	= xe_oa_ioctl,
> };

With some indents addressed, this is:

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

>
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG
  2023-12-08  6:43 ` [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG Ashutosh Dixit
@ 2023-12-20  3:05   ` Umesh Nerlige Ramappa
  2024-01-20  2:51     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  3:05 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:23PM -0800, Ashutosh Dixit wrote:
>Xe2+ OAG requires special handling because non-power-of-2 report sizes are
>not a sub-multiple of the OA buffer size and there are no partial reports
>at the end of the buffer. This issue is present only when overrun mode is
>enabled. Avoid adding this special handling by disabling overrun mode for
>Xe2+ OAG.

As you mentioned earlier, maybe disable overrun mode for all platforms OR
let the user control the setting.

Otherwise, this is

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>


>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/regs/xe_oa_regs.h | 1 +
> drivers/gpu/drm/xe/xe_oa.c           | 8 ++++++++
> 2 files changed, 9 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>index 4455a5a42b01b..7e2e875ccf80a 100644
>--- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>@@ -52,6 +52,7 @@
> #define  OABUFFER_SIZE_4M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 5)
> #define  OABUFFER_SIZE_8M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 6)
> #define  OABUFFER_SIZE_16M	REG_FIELD_PREP(OABUFFER_SIZE_MASK, 7)
>+#define  OAG_OABUFFER_DISABLE_OVERRUN_MODE	REG_BIT(1)
> #define  OAG_OABUFFER_MEMORY_SELECT		REG_BIT(0) /* 0: PPGTT, 1: GGTT */
>
> #define OAG_OACONTROL				XE_REG(0xdaf4)
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 5744436188dcd..073476721377d 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -408,6 +408,14 @@ static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream)
> 			gtt_offset & OAG_OAHEADPTR_MASK);
> 	stream->oa_buffer.head = 0;
>
>+	/*
>+	 * For Xe2+, OAG buffer is not a multiple of report size and there are no partial
>+	 * reports at the end of the buffer when overrun mode is enabled. Disable overrun
>+	 * mode to avoid this issue.
>+	 */
>+	if (GRAPHICS_VER(stream->oa->xe) >= 20 &&
>+	    stream->hwe->oa_unit->type == DRM_XE_OA_UNIT_TYPE_OAG)
>+		oa_buf |= OAG_OABUFFER_DISABLE_OVERRUN_MODE;
> 	/*
> 	 * PRM says: "This MMIO must be set before the OATAILPTR register and after the
> 	 * OAHEADPTR register. This is to enable proper functionality of the overflow bit".
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 16/17] drm/xe/oa: Add MMIO trigger support
  2023-12-08  6:43 ` [PATCH 16/17] drm/xe/oa: Add MMIO trigger support Ashutosh Dixit
@ 2023-12-20  4:35   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  4:35 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:28PM -0800, Ashutosh Dixit wrote:
>Add MMIO trigger support and allow-list required registers for MMIO trigger
>use case. Registers are whitelisted for the lifetime of the driver but MMIO
>trigger is enabled only for the duration of the stream.
>
>Bspec: 45925, 60340, 61228
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

lgtm
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

>---
> drivers/gpu/drm/xe/regs/xe_oa_regs.h  |  7 ++++++
> drivers/gpu/drm/xe/xe_oa.c            | 34 ++++++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_reg_whitelist.c | 23 ++++++++++++++++++
> include/uapi/drm/xe_drm.h             |  3 +++
> 4 files changed, 66 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>index b66cd95b795e7..1ce27a72079ad 100644
>--- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>@@ -64,16 +64,23 @@
> #define  OA_OACONTROL_COUNTER_SIZE_MASK		REG_GENMASK(8, 8)
>
> #define OAG_OA_DEBUG XE_REG(0xdaf8, XE_REG_OPTION_MASKED)
>+#define  OAG_OA_DEBUG_DISABLE_MMIO_TRG			REG_BIT(14)
>+#define  OAG_OA_DEBUG_START_TRIGGER_SCOPE_CONTROL	REG_BIT(13)
>+#define  OAG_OA_DEBUG_DISABLE_START_TRG_2_COUNT_QUAL	REG_BIT(8)
>+#define  OAG_OA_DEBUG_DISABLE_START_TRG_1_COUNT_QUAL	REG_BIT(7)
> #define  OAG_OA_DEBUG_INCLUDE_CLK_RATIO			REG_BIT(6)
> #define  OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS		REG_BIT(5)
> #define  OAG_OA_DEBUG_DISABLE_GO_1_0_REPORTS		REG_BIT(2)
> #define  OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS	REG_BIT(1)
>
> #define OAG_OASTATUS XE_REG(0xdafc)
>+#define  OAG_OASTATUS_MMIO_TRG_Q_FULL	REG_BIT(6)
> #define  OAG_OASTATUS_COUNTER_OVERFLOW	REG_BIT(2)
> #define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
> #define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
>
>+#define OAG_MMIOTRIGGER			XE_REG(0xdb1c)
>+
> /* OAC unit */
> #define OAC_OACONTROL			XE_REG(0x15114)
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 97779cbb83ee8..13c6e516d9169 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -525,6 +525,16 @@ static int __xe_oa_read(struct xe_oa_stream *stream, char __user *buf,
> 		oastatus = xe_mmio_read32(stream->gt, oastatus_reg);
> 	}
>
>+	if (oastatus & OAG_OASTATUS_MMIO_TRG_Q_FULL) {
>+		ret = xe_oa_append_status(stream, buf, count, offset,
>+					  DRM_XE_OA_RECORD_OA_MMIO_TRG_Q_FULL);
>+		if (ret)
>+			return ret;
>+
>+		xe_mmio_rmw32(stream->gt, oastatus_reg,
>+			      OAG_OASTATUS_MMIO_TRG_Q_FULL, 0);
>+	}
>+
> 	if (oastatus & OAG_OASTATUS_REPORT_LOST) {
> 		ret = xe_oa_append_status(stream, buf, count, offset,
> 					  DRM_XE_OA_RECORD_OA_REPORT_LOST);
>@@ -835,6 +845,13 @@ static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable)
>
> #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
>
>+static u32 oag_configure_mmio_trigger(const struct xe_oa_stream *stream, bool enable)
>+{
>+	return _MASKED_FIELD(OAG_OA_DEBUG_DISABLE_MMIO_TRG,
>+			     enable && stream && stream->sample ?
>+			     0 : OAG_OA_DEBUG_DISABLE_MMIO_TRG);
>+}
>+
> static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
> {
> 	u32 sqcnt1;
>@@ -850,6 +867,9 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
> 					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
> 	}
>
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_debug,
>+			oag_configure_mmio_trigger(stream, false));
>+
> 	/* disable the context save/restore or OAR counters */
> 	if (stream->exec_q)
> 		xe_oa_configure_oa_context(stream, false);
>@@ -1031,9 +1051,17 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
> 	oa_debug = OAG_OA_DEBUG_DISABLE_CLK_RATIO_REPORTS |
> 		OAG_OA_DEBUG_INCLUDE_CLK_RATIO;
>
>+	if (GRAPHICS_VER(stream->oa->xe) >= 20)
>+		oa_debug |=
>+			/* The three bits below are needed to get PEC counters running */
>+			OAG_OA_DEBUG_START_TRIGGER_SCOPE_CONTROL |
>+			OAG_OA_DEBUG_DISABLE_START_TRG_2_COUNT_QUAL |
>+			OAG_OA_DEBUG_DISABLE_START_TRG_1_COUNT_QUAL;
>+
> 	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_debug,
> 			_MASKED_BIT_ENABLE(oa_debug) |
>-			oag_report_ctx_switches(stream));
>+			oag_report_ctx_switches(stream) |
>+			oag_configure_mmio_trigger(stream, true));
>
> 	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctx_ctrl, stream->periodic ?
> 			(OAG_OAGLBCTXCTRL_COUNTER_RESUME |
>@@ -2259,6 +2287,10 @@ static void __xe_oa_init_oa_units(struct xe_gt *gt)
> 			u->type = DRM_XE_OA_UNIT_TYPE_OAM;
> 		}
>
>+		/* Ensure MMIO triggers remain disabled till there is a stream */
>+		xe_mmio_write32(gt, u->regs.oa_debug,
>+				oag_configure_mmio_trigger(NULL, false));
>+
> 		/* Set oa_unit_ids now to ensure ids remain contiguous */
> 		u->oa_unit_id = gt_to_xe(gt)->oa.oa_unit_ids++;
> 	}
>diff --git a/drivers/gpu/drm/xe/xe_reg_whitelist.c b/drivers/gpu/drm/xe/xe_reg_whitelist.c
>index e66ae1bdaf9c0..267af6759332b 100644
>--- a/drivers/gpu/drm/xe/xe_reg_whitelist.c
>+++ b/drivers/gpu/drm/xe/xe_reg_whitelist.c
>@@ -7,6 +7,7 @@
>
> #include "regs/xe_engine_regs.h"
> #include "regs/xe_gt_regs.h"
>+#include "regs/xe_oa_regs.h"
> #include "xe_gt_types.h"
> #include "xe_platform_types.h"
> #include "xe_rtp.h"
>@@ -56,6 +57,28 @@ static const struct xe_rtp_entry_sr register_whitelist[] = {
> 				   RING_FORCE_TO_NONPRIV_DENY,
> 				   XE_RTP_ACTION_FLAG(ENGINE_BASE)))
> 	},
>+	{ XE_RTP_NAME("oa_reg_render"),
>+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED),
>+		       ENGINE_CLASS(RENDER)),
>+	  XE_RTP_ACTIONS(WHITELIST(OAG_MMIOTRIGGER,
>+				   RING_FORCE_TO_NONPRIV_ACCESS_RW),
>+			 WHITELIST(OAG_OASTATUS,
>+				   RING_FORCE_TO_NONPRIV_ACCESS_RD),
>+			 WHITELIST(OAG_OAHEADPTR,
>+				   RING_FORCE_TO_NONPRIV_ACCESS_RD |
>+				   RING_FORCE_TO_NONPRIV_RANGE_4))
>+	},
>+	{ XE_RTP_NAME("oa_reg_compute"),
>+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED),
>+		       ENGINE_CLASS(COMPUTE)),
>+	  XE_RTP_ACTIONS(WHITELIST(OAG_MMIOTRIGGER,
>+				   RING_FORCE_TO_NONPRIV_ACCESS_RW),
>+			 WHITELIST(OAG_OASTATUS,
>+				   RING_FORCE_TO_NONPRIV_ACCESS_RD),
>+			 WHITELIST(OAG_OAHEADPTR,
>+				   RING_FORCE_TO_NONPRIV_ACCESS_RD |
>+				   RING_FORCE_TO_NONPRIV_RANGE_4))
>+	},
> 	{}
> };
>
>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>index 5f41c5bfe5e0e..34cd7d5206834 100644
>--- a/include/uapi/drm/xe_drm.h
>+++ b/include/uapi/drm/xe_drm.h
>@@ -1357,6 +1357,9 @@ enum drm_xe_oa_record_type {
> 	 */
> 	DRM_XE_OA_RECORD_OA_BUFFER_LOST = 3,
>
>+	/** @DRM_XE_OA_RECORD_OA_MMIO_TRG_Q_FULL: Status indicating MMIO trigger queue full */
>+	DRM_XE_OA_RECORD_OA_MMIO_TRG_Q_FULL = 4,
>+
> 	DRM_XE_OA_RECORD_MAX /* non-ABI */
> };
>
>-- 
>2.41.0
>

* Re: [PATCH 12/17] drm/xe/oa: Add OAR support
  2023-12-08  6:43 ` [PATCH 12/17] drm/xe/oa: Add OAR support Ashutosh Dixit
@ 2023-12-20  4:37   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  4:37 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:24PM -0800, Ashutosh Dixit wrote:
>Add OAR support to allow userspace to execute MI_REPORT_PERF_COUNT on
>render engines. Configuration batches are used to program the OAR unit, as
>well as modifying the render engine context image of a specified exec queue
>(to have correct register values when that context switches in).
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

>---
> .../gpu/drm/xe/instructions/xe_mi_commands.h  |   3 +
> drivers/gpu/drm/xe/regs/xe_engine_regs.h      |   3 +-
> drivers/gpu/drm/xe/xe_lrc.c                   |  11 +-
> drivers/gpu/drm/xe/xe_lrc.h                   |   1 +
> drivers/gpu/drm/xe/xe_oa.c                    | 216 ++++++++++++++++++
> drivers/gpu/drm/xe/xe_oa_types.h              |   4 +
> 6 files changed, 232 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>index 1cfa96167fde3..d333132b021e0 100644
>--- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>+++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>@@ -45,6 +45,9 @@
> #define   MI_LRI_MMIO_REMAP_EN		REG_BIT(17)
> #define   MI_LRI_NUM_REGS(x)		XE_INSTR_NUM_DW(2 * (x) + 1)
> #define   MI_LRI_FORCE_POSTED		REG_BIT(12)
>+#define   IS_MI_LRI_CMD(x)		(REG_FIELD_GET(MI_OPCODE, (x)) == \
>+					 REG_FIELD_GET(MI_OPCODE, MI_LOAD_REGISTER_IMM))
>+#define   MI_LRI_LEN(x)			(((x) & 0xff) + 1)
>
> #define MI_FLUSH_DW			__MI_INSTR(0x26)
> #define   MI_FLUSH_DW_STORE_INDEX	REG_BIT(21)
>diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
>index 444ff9b83bb1b..76c0938df05f3 100644
>--- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
>+++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
>@@ -71,7 +71,8 @@
> #define RING_EXECLIST_STATUS_LO(base)		XE_REG((base) + 0x234)
> #define RING_EXECLIST_STATUS_HI(base)		XE_REG((base) + 0x234 + 4)
>
>-#define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244)
>+#define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244, XE_REG_OPTION_MASKED)
>+#define	  CTX_CTRL_OAC_CONTEXT_ENABLE		REG_BIT(8)
> #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	REG_BIT(3)
> #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	REG_BIT(0)
>
>diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
>index 17c0eb9e62cfb..8586e1f4a7fbc 100644
>--- a/drivers/gpu/drm/xe/xe_lrc.c
>+++ b/drivers/gpu/drm/xe/xe_lrc.c
>@@ -565,12 +565,18 @@ u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc)
>
> /* Make the magic macros work */
> #define __xe_lrc_pphwsp_offset xe_lrc_pphwsp_offset
>+#define __xe_lrc_regs_offset xe_lrc_regs_offset
>
> #define LRC_SEQNO_PPHWSP_OFFSET 512
> #define LRC_START_SEQNO_PPHWSP_OFFSET (LRC_SEQNO_PPHWSP_OFFSET + 8)
> #define LRC_PARALLEL_PPHWSP_OFFSET 2048
> #define LRC_PPHWSP_SIZE SZ_4K
>
>+u32 xe_lrc_regs_offset(struct xe_lrc *lrc)
>+{
>+	return xe_lrc_pphwsp_offset(lrc) + LRC_PPHWSP_SIZE;
>+}
>+
> static size_t lrc_reg_size(struct xe_device *xe)
> {
> 	if (GRAPHICS_VERx100(xe) >= 1250)
>@@ -602,11 +608,6 @@ static inline u32 __xe_lrc_parallel_offset(struct xe_lrc *lrc)
> 	return xe_lrc_pphwsp_offset(lrc) + LRC_PARALLEL_PPHWSP_OFFSET;
> }
>
>-static inline u32 __xe_lrc_regs_offset(struct xe_lrc *lrc)
>-{
>-	return xe_lrc_pphwsp_offset(lrc) + LRC_PPHWSP_SIZE;
>-}
>-
> #define DECL_MAP_ADDR_HELPERS(elem) \
> static inline struct iosys_map __xe_lrc_##elem##_map(struct xe_lrc *lrc) \
> { \
>diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
>index 28b1d3f404d4f..d6d8aa8fb51eb 100644
>--- a/drivers/gpu/drm/xe/xe_lrc.h
>+++ b/drivers/gpu/drm/xe/xe_lrc.h
>@@ -23,6 +23,7 @@ void xe_lrc_finish(struct xe_lrc *lrc);
>
> size_t xe_lrc_size(struct xe_device *xe, enum xe_engine_class class);
> u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc);
>+u32 xe_lrc_regs_offset(struct xe_lrc *lrc);
>
> void xe_lrc_set_ring_head(struct xe_lrc *lrc, u32 head);
> u32 xe_lrc_ring_head(struct xe_lrc *lrc);
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 073476721377d..9d653d7722d1a 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -12,7 +12,9 @@
> #include <drm/xe_drm.h>
>
> #include "instructions/xe_mi_commands.h"
>+#include "regs/xe_engine_regs.h"
> #include "regs/xe_gt_regs.h"
>+#include "regs/xe_lrc_layout.h"
> #include "regs/xe_oa_regs.h"
> #include "xe_device.h"
> #include "xe_exec_queue.h"
>@@ -20,6 +22,7 @@
> #include "xe_bo.h"
> #include "xe_gt.h"
> #include "xe_gt_mcr.h"
>+#include "xe_lrc.h"
> #include "xe_mmio.h"
> #include "xe_oa.h"
> #include "xe_sched_job.h"
>@@ -63,6 +66,12 @@ struct xe_oa_config {
> 	struct rcu_head rcu;
> };
>
>+struct flex {
>+	struct xe_reg reg;
>+	u32 offset;
>+	u32 value;
>+};
>+
> struct xe_oa_open_param {
> 	u32 oa_unit_id;
> 	bool sample;
>@@ -640,6 +649,119 @@ static void xe_oa_free_configs(struct xe_oa_stream *stream)
> 		free_oa_config_bo(oa_bo);
> }
>
>+static void xe_oa_store_flex(struct xe_oa_stream *stream, struct xe_lrc *lrc,
>+			     struct xe_bb *bb, const struct flex *flex, u32 count)
>+{
>+	u32 offset = xe_bo_ggtt_addr(lrc->bo);
>+
>+	do {
>+		bb->cs[bb->len++] = MI_STORE_DATA_IMM | BIT(22) /* GGTT */ | 2;
>+		bb->cs[bb->len++] = offset + flex->offset * sizeof(u32);
>+		bb->cs[bb->len++] = 0;
>+		bb->cs[bb->len++] = flex->value;
>+
>+	} while (flex++, --count);
>+}
>+
>+static int xe_oa_modify_context(struct xe_oa_stream *stream, struct xe_lrc *lrc,
>+				const struct flex *flex, u32 count)
>+{
>+	struct xe_bb *bb;
>+	int err;
>+
>+	bb = xe_bb_new(stream->gt, 4 * count + 1, false);
>+	if (IS_ERR(bb)) {
>+		err = PTR_ERR(bb);
>+		goto exit;
>+	}
>+
>+	xe_oa_store_flex(stream, lrc, bb, flex, count);
>+
>+	err = xe_oa_submit_bb(stream, bb);
>+	xe_bb_free(bb, NULL);
>+exit:
>+	return err;
>+}
>+
>+static void xe_oa_load_flex(struct xe_oa_stream *stream, struct xe_bb *bb,
>+			    const struct flex *flex, u32 count)
>+{
>+	XE_WARN_ON(!count || count > 63);
>+
>+	bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(count);
>+
>+	do {
>+		bb->cs[bb->len++] = flex->reg.addr;
>+		bb->cs[bb->len++] = flex->value;
>+
>+	} while (flex++, --count);
>+
>+	bb->cs[bb->len++] = MI_NOOP;
>+}
>+
>+static int xe_oa_modify_self(struct xe_oa_stream *stream,
>+			     const struct flex *flex, u32 count)
>+{
>+	struct xe_bb *bb;
>+	int err;
>+
>+	bb = xe_bb_new(stream->gt, 2 * count + 3, false);
>+	if (IS_ERR(bb)) {
>+		err = PTR_ERR(bb);
>+		goto exit;
>+	}
>+
>+	xe_oa_load_flex(stream, bb, flex, count);
>+
>+	err = xe_oa_submit_bb(stream, bb);
>+	xe_bb_free(bb, NULL);
>+exit:
>+	return err;
>+}
>+
>+#define OAR_OAC_OACONTROL_OFFSET 0x5B0
>+
>+static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable)
>+{
>+	const struct xe_oa_format *format = stream->oa_buffer.format;
>+	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
>+	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
>+	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
>+		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
>+
>+	struct flex regs_context[] = {
>+		{
>+			OACTXCONTROL(stream->hwe->mmio_base),
>+			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
>+			enable ? OA_COUNTER_RESUME : 0,
>+		},
>+		{
>+			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
>+			regs_offset + CTX_CONTEXT_CONTROL,
>+			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
>+				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0)
>+		},
>+	};
>+	/* Offsets in regs_lri are not used since this configuration is applied using LRI */
>+	struct flex regs_lri[] = {
>+		{
>+			OAR_OACONTROL,
>+			OAR_OAC_OACONTROL_OFFSET + 1,
>+			oacontrol,
>+		},
>+	};
>+	int err;
>+
>+	/* Modify stream hwe context image with regs_context */
>+	err = xe_oa_modify_context(stream, &stream->exec_q->lrc[0],
>+				   regs_context, ARRAY_SIZE(regs_context));
>+	if (err)
>+		return err;
>+
>+	/* Apply regs_lri using LRI */
>+	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
>+}
>+
> #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
>
> static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
>@@ -657,6 +779,10 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
> 					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
> 	}
>
>+	/* disable the context save/restore or OAR counters */
>+	if (stream->exec_q)
>+		xe_oa_configure_oar_context(stream, false);
>+
> 	/* Make sure we disable noa to save power. */
> 	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
>
>@@ -814,6 +940,7 @@ static u32 oag_report_ctx_switches(const struct xe_oa_stream *stream)
> static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
> {
> 	u32 oa_debug, sqcnt1;
>+	int ret;
>
> 	/*
> 	 * Wa_1508761755:xehpsdv, dg2
>@@ -851,6 +978,12 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
>
> 	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, 0, sqcnt1);
>
>+	if (stream->exec_q) {
>+		ret = xe_oa_configure_oar_context(stream, true);
>+		if (ret)
>+			return ret;
>+	}
>+
> 	return xe_oa_emit_oa_config(stream);
> }
>
>@@ -988,6 +1121,78 @@ static const struct file_operations xe_oa_fops = {
> 	.unlocked_ioctl	= xe_oa_ioctl,
> };
>
>+static bool engine_supports_mi_query(struct xe_hw_engine *hwe)
>+{
>+	return hwe->class == XE_ENGINE_CLASS_RENDER ||
>+		hwe->class == XE_ENGINE_CLASS_COMPUTE;
>+}
>+
>+static bool xe_oa_find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
>+{
>+	u32 idx = *offset;
>+	u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
>+	bool found = false;
>+
>+	idx++;
>+	for (; idx < len; idx += 2) {
>+		if (state[idx] == reg) {
>+			found = true;
>+			break;
>+		}
>+	}
>+
>+	*offset = idx;
>+	return found;
>+}
>+
>+static u32 xe_oa_context_image_offset(struct xe_oa_stream *stream, u32 reg)
>+{
>+	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
>+	u32 len = (xe_lrc_size(stream->oa->xe, stream->hwe->class) +
>+		   lrc->ring.size) / sizeof(u32);
>+	u32 offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
>+	u32 *state = (u32 *)lrc->bo->vmap.vaddr;
>+
>+	if (drm_WARN_ON(&stream->oa->xe->drm, !state))
>+		return U32_MAX;
>+
>+	for (; offset < len; ) {
>+		if (IS_MI_LRI_CMD(state[offset])) {
>+			/*
>+			 * We expect reg-value pairs in MI_LRI command, so
>+			 * MI_LRI_LEN() should be even
>+			 */
>+			drm_WARN_ON(&stream->oa->xe->drm,
>+				    MI_LRI_LEN(state[offset]) & 0x1);
>+
>+			if (xe_oa_find_reg_in_lri(state, reg, &offset, len))
>+				break;
>+		} else {
>+			offset++;
>+		}
>+	}
>+
>+	return offset < len ? offset : U32_MAX;
>+}
>+
>+static int xe_oa_set_ctx_ctrl_offset(struct xe_oa_stream *stream)
>+{
>+	struct xe_reg reg = OACTXCONTROL(stream->hwe->mmio_base);
>+	u32 offset = stream->oa->ctx_oactxctrl_offset[stream->hwe->class];
>+
>+	/* Do this only once. Failure is stored as offset of U32_MAX */
>+	if (offset)
>+		goto exit;
>+
>+	offset = xe_oa_context_image_offset(stream, reg.addr);
>+	stream->oa->ctx_oactxctrl_offset[stream->hwe->class] = offset;
>+
>+	drm_dbg(&stream->oa->xe->drm, "%s oa ctx control at 0x%08x dword offset\n",
>+		stream->hwe->name, offset);
>+exit:
>+	return offset && offset != U32_MAX ? 0 : -ENODEV;
>+}
>+
> static int xe_oa_stream_init(struct xe_oa_stream *stream,
> 			     struct xe_oa_open_param *param)
> {
>@@ -1008,6 +1213,17 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
> 	stream->periodic = param->period_exponent > 0;
> 	stream->period_exponent = param->period_exponent;
>
>+	if (stream->exec_q && engine_supports_mi_query(stream->hwe)) {
>+		/* If we don't find the context offset, just return error */
>+		ret = xe_oa_set_ctx_ctrl_offset(stream);
>+		if (ret) {
>+			drm_err(&stream->oa->xe->drm,
>+				"xe_oa_set_ctx_ctrl_offset failed for %s\n",
>+				stream->hwe->name);
>+			goto exit;
>+		}
>+	}
>+
> 	stream->oa_config = xe_oa_get_oa_config(stream->oa, param->metric_set);
> 	if (!stream->oa_config) {
> 		drm_dbg(&stream->oa->xe->drm, "Invalid OA config id=%i\n", param->metric_set);
>diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
>index 05047226af8d1..bcd8d249faaec 100644
>--- a/drivers/gpu/drm/xe/xe_oa_types.h
>+++ b/drivers/gpu/drm/xe/xe_oa_types.h
>@@ -13,6 +13,7 @@
>
> #include <drm/xe_drm.h>
> #include "regs/xe_reg_defs.h"
>+#include "xe_hw_engine_types.h"
>
> #define XE_OA_BUFFER_SIZE SZ_16M
>
>@@ -132,6 +133,9 @@ struct xe_oa {
> 	/** @metrics_idr: List of dynamic configurations (struct xe_oa_config) */
> 	struct idr metrics_idr;
>
>+	/** @ctx_oactxctrl_offset: offset of OACTXCONTROL register in context image */
>+	u32 ctx_oactxctrl_offset[XE_ENGINE_CLASS_MAX];
>+
> 	/** @oa_formats: tracks all OA formats across platforms */
> 	const struct xe_oa_format *oa_formats;
>
>-- 
>2.41.0
>

* Re: [PATCH 13/17] drm/xe/oa: Add OAC support
  2023-12-08  6:43 ` [PATCH 13/17] drm/xe/oa: Add OAC support Ashutosh Dixit
@ 2023-12-20  4:59   ` Umesh Nerlige Ramappa
  2024-01-20  2:52     ` FIXME " Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-20  4:59 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:25PM -0800, Ashutosh Dixit wrote:
>Similar to OAR, allow userspace to execute MI_REPORT_PERF_COUNT on compute
>engines of a specified exec queue.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/regs/xe_engine_regs.h |  1 +
> drivers/gpu/drm/xe/regs/xe_oa_regs.h     |  3 +
> drivers/gpu/drm/xe/xe_oa.c               | 81 +++++++++++++++++++++++-
> 3 files changed, 82 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
>index 76c0938df05f3..045f9773f01f4 100644
>--- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
>+++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
>@@ -73,6 +73,7 @@
>
> #define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244, XE_REG_OPTION_MASKED)
> #define	  CTX_CTRL_OAC_CONTEXT_ENABLE		REG_BIT(8)
>+#define	  CTX_CTRL_RUN_ALONE			REG_BIT(7)
> #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	REG_BIT(3)
> #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	REG_BIT(0)
>
>diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>index 7e2e875ccf80a..b66cd95b795e7 100644
>--- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>+++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
>@@ -74,6 +74,9 @@
> #define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
> #define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
>
>+/* OAC unit */
>+#define OAC_OACONTROL			XE_REG(0x15114)
>+
> /* OAM unit */
> #define OAM_HEAD_POINTER_OFFSET			(0x1a0)
> #define OAM_TAIL_POINTER_OFFSET			(0x1a4)
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 9d653d7722d1a..42f32d4359f2c 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -449,6 +449,19 @@ u32 __format_to_oactrl(const struct xe_oa_format *format, int counter_sel_mask)
> 		REG_FIELD_PREP(OA_OACONTROL_COUNTER_SIZE_MASK, format->counter_size);
> }
>
>+static u32 __oa_ccs_select(struct xe_oa_stream *stream)
>+{
>+	u32 val;
>+
>+	if (stream->hwe->class != XE_ENGINE_CLASS_COMPUTE)
>+		return 0;
>+
>+	val = REG_FIELD_PREP(OAG_OACONTROL_OA_CCS_SELECT_MASK, stream->hwe->instance);
>+	xe_assert(stream->oa->xe,
>+		  REG_FIELD_GET(OAG_OACONTROL_OA_CCS_SELECT_MASK, val) == stream->hwe->instance);

Why is there a need to do REG_FIELD_GET? I thought the REG_FIELD_PREP is 
just a bitwise operation. Are you expecting coherency issues?

>+	return val;
>+}
>+
> static void xe_oa_enable(struct xe_oa_stream *stream)
> {
> 	const struct xe_oa_format *format = stream->oa_buffer.format;
>@@ -463,7 +476,7 @@ static void xe_oa_enable(struct xe_oa_stream *stream)
>
> 	regs = __oa_regs(stream);
> 	val = __format_to_oactrl(format, regs->oa_ctrl_counter_select_mask) |
>-		OAG_OACONTROL_OA_COUNTER_ENABLE;
>+		__oa_ccs_select(stream) | OAG_OACONTROL_OA_COUNTER_ENABLE;
>
> 	xe_mmio_write32(stream->gt, regs->oa_ctrl, val);
> }
>@@ -762,6 +775,64 @@ static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable)
> 	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
> }
>
>+static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable)
>+{
>+	const struct xe_oa_format *format = stream->oa_buffer.format;
>+	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
>+	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
>+	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
>+		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
>+	struct flex regs_context[] = {
>+		{
>+			OACTXCONTROL(stream->hwe->mmio_base),
>+			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
>+			enable ? OA_COUNTER_RESUME : 0,
>+		},
>+		{
>+			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
>+			regs_offset + CTX_CONTEXT_CONTROL,
>+			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
>+				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0) |
>+			_MASKED_FIELD(CTX_CTRL_RUN_ALONE,
>+				      enable ? CTX_CTRL_RUN_ALONE : 0),
>+		},
>+	};
>+	/* Offsets in regs_lri are not used since this configuration is applied using LRI */
>+	struct flex regs_lri[] = {
>+		{
>+			OAC_OACONTROL,
>+			OAR_OAC_OACONTROL_OFFSET + 1,
>+			oacontrol,
>+		},
>+	};
>+	int err;
>+
>+	/* Set ccs select to enable programming of OAC_OACONTROL */
>+	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctrl, __oa_ccs_select(stream));
>+
>+	/* Modify stream hwe context image with regs_context */
>+	err = xe_oa_modify_context(stream, &stream->exec_q->lrc[0],
>+				   regs_context, ARRAY_SIZE(regs_context));
>+	if (err)
>+		return err;
>+
>+	/* Apply regs_lri using LRI */
>+	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));

I think in i915, with execlist scheduling, there was a kernel context
that, once scheduled and run, indicated that all other contexts were
done executing - effectively GPU idle. The modify-self was (IMO) only
needed to update that kernel context in this scenario.

GuC has no concept of a kernel context (at least not in i915; not sure
if things changed in Xe). If that still holds, all the modify-self
calls can be dropped (from both OAR and OAC).

Otherwise, this is good too, so

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

The other question I have (unrelated to this patch) is whether we need
PWR_CLK_STATE configured in all contexts. It's a bit hazy whether that
was only needed for older gens or whether it also applies to newer
platforms (gen12_configure_all_contexts() in i915).

Thanks,
Umesh


>+}
>+
>+static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable)
>+{
>+	switch (stream->hwe->class) {
>+	case XE_ENGINE_CLASS_RENDER:
>+		return xe_oa_configure_oar_context(stream, enable);
>+	case XE_ENGINE_CLASS_COMPUTE:
>+		return xe_oa_configure_oac_context(stream, enable);
>+	default:
>+		/* Video engines do not support MI_REPORT_PERF_COUNT */
>+		return 0;
>+	}
>+}
>+
> #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
>
> static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
>@@ -781,7 +852,7 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
>
> 	/* disable the context save/restore or OAR counters */
> 	if (stream->exec_q)
>-		xe_oa_configure_oar_context(stream, false);
>+		xe_oa_configure_oa_context(stream, false);
>
> 	/* Make sure we disable noa to save power. */
> 	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
>@@ -978,8 +1049,9 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
>
> 	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, 0, sqcnt1);
>
>+	/* Configure OAR/OAC */
> 	if (stream->exec_q) {
>-		ret = xe_oa_configure_oar_context(stream, true);
>+		ret = xe_oa_configure_oa_context(stream, true);
> 		if (ret)
> 			return ret;
> 	}
>@@ -1636,6 +1708,9 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file
> 		param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id);
> 		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
> 			return -ENOENT;
>+
>+		if (param.exec_q->width > 1)
>+			drm_dbg(&oa->xe->drm, "exec_q->width > 1, programming only exec_q->lrc[0]\n");
> 	}
>
> 	/*
>-- 
>2.41.0
>

* Re: [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties
  2023-12-08  6:43 ` [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties Ashutosh Dixit
@ 2023-12-23  0:40   ` Umesh Nerlige Ramappa
  2024-01-20  3:10     ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-23  0:40 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:26PM -0800, Ashutosh Dixit wrote:
>Implement query for properties of OA units present on a device.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/xe_oa.h    |  2 +
> drivers/gpu/drm/xe/xe_query.c | 81 +++++++++++++++++++++++++++++++++++
> include/uapi/drm/xe_drm.h     | 64 +++++++++++++++++++++++++++
> 3 files changed, 147 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
>index a0f9a876ea6b4..b88914693cdb3 100644
>--- a/drivers/gpu/drm/xe/xe_oa.h
>+++ b/drivers/gpu/drm/xe/xe_oa.h
>@@ -25,5 +25,7 @@ int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
> 			   struct drm_file *file);
> int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
> 			      struct drm_file *file);
>+u32 xe_oa_timestamp_frequency(struct xe_gt *gt);
>+u16 xe_oa_unit_id(struct xe_hw_engine *hwe);
>
> #endif
>diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
>index 56d61bf596b2b..abe2ea088e2ec 100644
>--- a/drivers/gpu/drm/xe/xe_query.c
>+++ b/drivers/gpu/drm/xe/xe_query.c
>@@ -501,6 +501,86 @@ static int query_gt_topology(struct xe_device *xe,
> 	return 0;
> }
>
>+static size_t calc_oa_unit_query_size(struct xe_device *xe)
>+{
>+	size_t size = sizeof(struct drm_xe_query_oa_units);
>+	struct xe_gt *gt;
>+	int i, id;
>+
>+	for_each_gt(gt, xe, id) {
>+		for (i = 0; i < gt->oa.num_oa_units; i++) {
>+			size += sizeof(struct drm_xe_oa_unit);
>+			size += gt->oa.oa_unit[i].num_engines *
>+				sizeof(struct drm_xe_engine_class_instance);
>+		}
>+	}
>+
>+	return size;
>+}
>+
>+static int query_oa_units(struct xe_device *xe,
>+			  struct drm_xe_device_query *query)
>+{
>+	void __user *query_ptr = u64_to_user_ptr(query->data);
>+	size_t size = calc_oa_unit_query_size(xe);
>+	struct drm_xe_query_oa_units *qoa;
>+	enum xe_hw_engine_id hwe_id;
>+	struct drm_xe_oa_unit *du;
>+	struct xe_hw_engine *hwe;
>+	struct xe_oa_unit *u;
>+	int gt_id, i, j, ret;
>+	struct xe_gt *gt;
>+	u8 *pdu;
>+
>+	if (query->size == 0) {
>+		query->size = size;
>+		return 0;
>+	} else if (XE_IOCTL_DBG(xe, query->size != size)) {
>+		return -EINVAL;
>+	}
>+
>+	qoa = kzalloc(size, GFP_KERNEL);
>+	if (!qoa)
>+		return -ENOMEM;
>+
>+	pdu = (u8 *)&qoa->oa_units[0];
>+	for_each_gt(gt, xe, gt_id) {
>+		for (i = 0; i < gt->oa.num_oa_units; i++) {
>+			u = &gt->oa.oa_unit[i];
>+			du = (struct drm_xe_oa_unit *)pdu;
>+
>+			du->oa_unit_id = u->oa_unit_id;
>+			du->oa_unit_type = u->type;
>+			du->gt_id = gt->info.id;
>+			du->open_stream = !!u->exclusive_stream;
>+			du->oa_timestamp_freq = xe_oa_timestamp_frequency(gt);
>+			du->oa_buf_size = XE_OA_BUFFER_SIZE;
>+			du->num_engines = u->num_engines;
>+
>+			for (j = 1; j < DRM_XE_OA_PROPERTY_MAX; j++)
>+				du->capabilities |= BIT(j);
>+
>+			j = 0;
>+			for_each_hw_engine(hwe, gt, hwe_id) {
>+				if (xe_oa_unit_id(hwe) == u->oa_unit_id) {
>+					du->eci[j].engine_class =
>+						xe_to_user_engine_class[hwe->class];
>+					du->eci[j].engine_instance = hwe->logical_instance;
>+					du->eci[j].gt_id = gt->info.id;
>+					j++;
>+				}
>+			}
>+			pdu += sizeof(*du) + j * sizeof(du->eci[0]);
>+			qoa->num_oa_units++;
>+		}
>+	}
>+
>+	ret = copy_to_user(query_ptr, qoa, size);
>+	kfree(qoa);
>+
>+	return ret ? -EFAULT : 0;
>+}
>+
> static int (* const xe_query_funcs[])(struct xe_device *xe,
> 				      struct drm_xe_device_query *query) = {
> 	query_engines,
>@@ -510,6 +590,7 @@ static int (* const xe_query_funcs[])(struct xe_device *xe,
> 	query_hwconfig,
> 	query_gt_topology,
> 	query_engine_cycles,
>+	query_oa_units,
> };
>
> int xe_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>index 8156301df7315..5f41c5bfe5e0e 100644
>--- a/include/uapi/drm/xe_drm.h
>+++ b/include/uapi/drm/xe_drm.h
>@@ -517,6 +517,7 @@ struct drm_xe_device_query {
> #define DRM_XE_DEVICE_QUERY_HWCONFIG		4
> #define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY		5
> #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES	6
>+#define DRM_XE_DEVICE_QUERY_OA_UNITS		7
> 	/** @query: The type of data to query */
> 	__u32 query;
>
>@@ -1182,6 +1183,69 @@ enum drm_xe_oa_unit_type {
> 	DRM_XE_OA_UNIT_TYPE_OAM,
> };
>
>+/**
>+ * struct drm_xe_query_oa_units - describe OA units
>+ *
>+ * If a query is made with a struct drm_xe_device_query where .query
>+ * is equal to DRM_XE_DEVICE_QUERY_OA_UNITS, then the reply uses struct
>+ * drm_xe_query_oa_units in .data.
>+ *
>+ * When there is an @open_stream, the query returns properties specific to
>+ * that @open_stream. Else default properties are returned.
>+ */
>+struct drm_xe_query_oa_units {
>+	/** @extensions: Pointer to the first extension struct, if any */
>+	__u64 extensions;
>+
>+	/** @num_oa_units: number of OA units returned in oau[] */
>+	__u32 num_oa_units;
>+
>+	/** @pad: MBZ */
>+	__u32 pad;
>+
>+	/** @reserved: MBZ */
>+	__u64 reserved[4];

For some reason I had assumed reserved fields are added only at the end 
of a uAPI struct — not sure, though.

>+
>+	/** @oa_units: OA units returned for this device */
>+	struct drm_xe_oa_unit {
>+		/** @oa_unit_id: OA unit ID */
>+		__u16 oa_unit_id;
>+
>+		/** @oa_unit_type: OA unit type of @drm_xe_oa_unit_type */
>+		__u16 oa_unit_type;
>+
>+		/** @gt_id: GT ID for this OA unit */
>+		__u16 gt_id;
>+
>+		/** @open_stream: True if a stream is open on the OA unit */
>+		__u16 open_stream;
>+
>+		/** @internal_events: True if internal events are available */
>+		__u16 internal_events;
>+
>+		/** @pad: MBZ */
>+		__u16 pad;

__u16 pad[3] would be needed here for 64-bit alignment of the following __u64
>+
>+		/** @capabilities: OA capabilities bit-mask */
>+		__u64 capabilities;
>+
>+		/** @oa_timestamp_freq: OA timestamp freq */
>+		__u64 oa_timestamp_freq;
>+
>+		/** @oa_buf_size: OA buffer size */
>+		__u64 oa_buf_size;
>+
>+		/** @reserved: MBZ */
>+		__u64 reserved[4];
>+
>+		/** @num_engines: number of engines in @eci array */
>+		__u64 num_engines;
>+
>+		/** @eci: engines attached to this OA unit */
>+		struct drm_xe_engine_class_instance eci[];
>+	} oa_units[];

Nesting of flexible arrays; not sure about that. I think some compilers 
may throw an error/warning. Sending an old message from Joonas offline.

In general, the pad and reserved fields feel sprinkled throughout the 
structure. If we can avoid that in a way that they all end up at the 
end of the struct, I think that would look better. Not sure about the 
technical aspect though; I always assumed they were meant to be at the 
end (but then the structs are nested anyway, so really not sure).

Thanks,
Umesh

>+};
>+
> /** enum drm_xe_oa_format_type - OA format types */
> enum drm_xe_oa_format_type {
> 	DRM_XE_OA_FMT_TYPE_OAG,
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2023-12-08  6:43 ` [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap Ashutosh Dixit
@ 2023-12-23  2:39   ` Umesh Nerlige Ramappa
  2024-01-20  3:11     ` Dixit, Ashutosh
  2024-01-02 11:16   ` Thomas Hellström
  1 sibling, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-12-23  2:39 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:27PM -0800, Ashutosh Dixit wrote:
>Allow the OA buffer to be mmap'd to userspace. This is needed for the MMIO
>trigger use case. Even otherwise, with whitelisted OA head/tail ptr
>registers, userspace can receive/interpret OA data from the mmap'd buffer
>without issuing read()'s on the OA stream fd.
>
>Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/xe/xe_oa.c | 53 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>index 42f32d4359f2c..97779cbb83ee8 100644
>--- a/drivers/gpu/drm/xe/xe_oa.c
>+++ b/drivers/gpu/drm/xe/xe_oa.c
>@@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
> 		return PTR_ERR(bo);
>
> 	stream->oa_buffer.bo = bo;
>+	/* mmap implementation requires OA buffer to be in system memory */
>+	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
> 	stream->oa_buffer.vaddr = bo->vmap.vaddr;
> 	return 0;
> }
>@@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode, struct file *file)
> 	struct xe_oa_stream *stream = file->private_data;
> 	struct xe_gt *gt = stream->gt;
>
>+	/* Zap mmap's */
>+	unmap_mapping_range(file->f_mapping, 0, -1, 1);
>+
> 	mutex_lock(&gt->oa.gt_lock);
> 	xe_oa_destroy_locked(stream);
> 	mutex_unlock(&gt->oa.gt_lock);
>@@ -1184,6 +1189,53 @@ static int xe_oa_release(struct inode *inode, struct file *file)
> 	return 0;
> }
>
>+static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma)
>+{
>+	struct xe_oa_stream *stream = file->private_data;
>+	struct xe_bo *bo = stream->oa_buffer.bo;
>+	unsigned long start = vma->vm_start;
>+	int i, ret;
>+
>+	if (xe_perf_stream_paranoid && !perfmon_capable()) {
>+		drm_dbg(&stream->oa->xe->drm, "Insufficient privilege to map OA buffer\n");
>+		return -EACCES;
>+	}
>+
>+	/* Can mmap the entire OA buffer or nothing (no partial OA buffer mmaps) */
>+	if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) {
>+		drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must be OA buffer size\n");
>+		return -EINVAL;
>+	}
>+
>+	/* Only support VM_READ, enforce MAP_PRIVATE by checking for VM_MAYSHARE */
>+	if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_SHARED | VM_MAYSHARE)) {
>+		drm_dbg(&stream->oa->xe->drm, "mmap must be read only\n");
>+		return -EINVAL;
>+	}
>+
>+	vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
>+
>+	/*
>+	 * If the privileged parent forks and child drops root privilege, we do not want
>+	 * the child to retain access to the mapped OA buffer. Explicitly set VM_DONTCOPY
>+	 * to avoid such cases.
>+	 */
>+	vm_flags_set(vma, vma->vm_flags | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY);

It would help to just use vm_flags_mod(), where you can specify both the 
set and clear flags in one call.

And then, just to be paranoid about it, maybe add an assert to check that 
the flags were applied correctly.

Assuming you ran the existing mmap tests for this, I think we should 
also add an mremap case; that should fail with -EINVAL since this 
is a private mapping.

>+
>+	xe_assert(stream->oa->xe, bo->ttm.ttm->num_pages ==
>+		  (vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
>+	for (i = 0; i < bo->ttm.ttm->num_pages; i++) {
>+		ret = remap_pfn_range(vma, start, page_to_pfn(bo->ttm.ttm->pages[i]),
>+				      PAGE_SIZE, vma->vm_page_prot);

vma->vm_page_prot is set from the state of vm_flags that existed at the 
mmap_region() level. We have modified those flags here, so we must 
update vma->vm_page_prot with vm_get_page_prot(vma->vm_flags).
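Putting both points together, the fixup might look roughly like this (untested sketch against the patch context, not standalone code; vm_flags_mod() takes the set and clear masks in a single call):

```c
	/* Sketch: apply set/clear in one step, then refresh the PTE
	 * protections so they match the final vm_flags. */
	vm_flags_mod(vma,
		     VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY,
		     VM_MAYWRITE | VM_MAYEXEC);
	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
```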

Thanks,
Umesh

>+		if (ret)
>+			break;
>+
>+		start += PAGE_SIZE;
>+	}
>+
>+	return ret;
>+}
>+
> static const struct file_operations xe_oa_fops = {
> 	.owner		= THIS_MODULE,
> 	.llseek		= no_llseek,
>@@ -1191,6 +1243,7 @@ static const struct file_operations xe_oa_fops = {
> 	.poll		= xe_oa_poll,
> 	.read		= xe_oa_read,
> 	.unlocked_ioctl	= xe_oa_ioctl,
>+	.mmap		= xe_oa_mmap,
> };
>
> static bool engine_supports_mi_query(struct xe_hw_engine *hwe)
>-- 
>2.41.0
>


* Re: [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2023-12-08  6:43 ` [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap Ashutosh Dixit
  2023-12-23  2:39   ` Umesh Nerlige Ramappa
@ 2024-01-02 11:16   ` Thomas Hellström
  2024-01-08 19:50     ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 68+ messages in thread
From: Thomas Hellström @ 2024-01-02 11:16 UTC (permalink / raw)
  To: Ashutosh Dixit, intel-xe

On Thu, 2023-12-07 at 22:43 -0800, Ashutosh Dixit wrote:
> Allow the OA buffer to be mmap'd to userspace. This is needed for the
> MMIO
> trigger use case. Even otherwise, with whitelisted OA head/tail ptr
> registers, userspace can receive/interpret OA data from the mmap'd
> buffer
> without issuing read()'s on the OA stream fd.
> 
> Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_oa.c | 53
> ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> index 42f32d4359f2c..97779cbb83ee8 100644
> --- a/drivers/gpu/drm/xe/xe_oa.c
> +++ b/drivers/gpu/drm/xe/xe_oa.c
> @@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct
> xe_oa_stream *stream)
>  		return PTR_ERR(bo);
>  
>  	stream->oa_buffer.bo = bo;
> +	/* mmap implementation requires OA buffer to be in system
> memory */
> +	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
>  	stream->oa_buffer.vaddr = bo->vmap.vaddr;
>  	return 0;
>  }
> @@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode,
> struct file *file)
>  	struct xe_oa_stream *stream = file->private_data;
>  	struct xe_gt *gt = stream->gt;
>  
> +	/* Zap mmap's */
> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
> +

Can release() get called at all if there is a live mapping? Meaning
the unmap_mapping_range() shouldn't be needed?

/Thomas


>  	mutex_lock(&gt->oa.gt_lock);
>  	xe_oa_destroy_locked(stream);
>  	mutex_unlock(&gt->oa.gt_lock);
> @@ -1184,6 +1189,53 @@ static int xe_oa_release(struct inode *inode,
> struct file *file)
>  	return 0;
>  }
>  
> +static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct xe_oa_stream *stream = file->private_data;
> +	struct xe_bo *bo = stream->oa_buffer.bo;
> +	unsigned long start = vma->vm_start;
> +	int i, ret;
> +
> +	if (xe_perf_stream_paranoid && !perfmon_capable()) {
> +		drm_dbg(&stream->oa->xe->drm, "Insufficient
> privilege to map OA buffer\n");
> +		return -EACCES;
> +	}
> +
> +	/* Can mmap the entire OA buffer or nothing (no partial OA
> buffer mmaps) */
> +	if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) {
> +		drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must
> be OA buffer size\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Only support VM_READ, enforce MAP_PRIVATE by checking for
> VM_MAYSHARE */
> +	if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_SHARED |
> VM_MAYSHARE)) {
> +		drm_dbg(&stream->oa->xe->drm, "mmap must be read
> only\n");
> +		return -EINVAL;
> +	}
> +
> +	vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
> +
> +	/*
> +	 * If the privileged parent forks and child drops root
> privilege, we do not want
> +	 * the child to retain access to the mapped OA buffer.
> Explicitly set VM_DONTCOPY
> +	 * to avoid such cases.
> +	 */
> +	vm_flags_set(vma, vma->vm_flags | VM_PFNMAP | VM_DONTEXPAND
> | VM_DONTDUMP | VM_DONTCOPY);
> +
> +	xe_assert(stream->oa->xe, bo->ttm.ttm->num_pages ==
> +		  (vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
> +	for (i = 0; i < bo->ttm.ttm->num_pages; i++) {
> +		ret = remap_pfn_range(vma, start, page_to_pfn(bo-
> >ttm.ttm->pages[i]),
> +				      PAGE_SIZE, vma->vm_page_prot);
> +		if (ret)
> +			break;
> +
> +		start += PAGE_SIZE;
> +	}
> +
> +	return ret;
> +}
> +
>  static const struct file_operations xe_oa_fops = {
>  	.owner		= THIS_MODULE,
>  	.llseek		= no_llseek,
> @@ -1191,6 +1243,7 @@ static const struct file_operations xe_oa_fops
> = {
>  	.poll		= xe_oa_poll,
>  	.read		= xe_oa_read,
>  	.unlocked_ioctl	= xe_oa_ioctl,
> +	.mmap		= xe_oa_mmap,
>  };
>  
>  static bool engine_supports_mi_query(struct xe_hw_engine *hwe)



* Re: [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2024-01-02 11:16   ` Thomas Hellström
@ 2024-01-08 19:50     ` Umesh Nerlige Ramappa
  2024-01-09  5:14       ` Dixit, Ashutosh
  0 siblings, 1 reply; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-01-08 19:50 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

On Tue, Jan 02, 2024 at 12:16:12PM +0100, Thomas Hellström wrote:
>On Thu, 2023-12-07 at 22:43 -0800, Ashutosh Dixit wrote:
>> Allow the OA buffer to be mmap'd to userspace. This is needed for the
>> MMIO
>> trigger use case. Even otherwise, with whitelisted OA head/tail ptr
>> registers, userspace can receive/interpret OA data from the mmap'd
>> buffer
>> without issuing read()'s on the OA stream fd.
>>
>> Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_oa.c | 53
>> ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 53 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>> index 42f32d4359f2c..97779cbb83ee8 100644
>> --- a/drivers/gpu/drm/xe/xe_oa.c
>> +++ b/drivers/gpu/drm/xe/xe_oa.c
>> @@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct
>> xe_oa_stream *stream)
>>  		return PTR_ERR(bo);
>>  
>>  	stream->oa_buffer.bo = bo;
>> +	/* mmap implementation requires OA buffer to be in system
>> memory */
>> +	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
>>  	stream->oa_buffer.vaddr = bo->vmap.vaddr;
>>  	return 0;
>>  }
>> @@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode,
>> struct file *file)
>>  	struct xe_oa_stream *stream = file->private_data;
>>  	struct xe_gt *gt = stream->gt;
>>  
>> +	/* Zap mmap's */
>> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
>> +
>
>Can release() get called at all if there is a live mapping()? Meaning
>the unmap_mapping_range() shouldn't be needed?

If the user closes the fd but has not called munmap, then release will not 
get called. If unmap_mapping_range() doesn't do anything extra compared 
to munmap(), then I agree that we could drop this.

Umesh
>
>/Thomas
>
>
>>  	mutex_lock(&gt->oa.gt_lock);
>>  	xe_oa_destroy_locked(stream);
>>  	mutex_unlock(&gt->oa.gt_lock);
>> @@ -1184,6 +1189,53 @@ static int xe_oa_release(struct inode *inode,
>> struct file *file)
>>  	return 0;
>>  }
>>  
>> +static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	struct xe_oa_stream *stream = file->private_data;
>> +	struct xe_bo *bo = stream->oa_buffer.bo;
>> +	unsigned long start = vma->vm_start;
>> +	int i, ret;
>> +
>> +	if (xe_perf_stream_paranoid && !perfmon_capable()) {
>> +		drm_dbg(&stream->oa->xe->drm, "Insufficient
>> privilege to map OA buffer\n");
>> +		return -EACCES;
>> +	}
>> +
>> +	/* Can mmap the entire OA buffer or nothing (no partial OA
>> buffer mmaps) */
>> +	if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) {
>> +		drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must
>> be OA buffer size\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Only support VM_READ, enforce MAP_PRIVATE by checking for
>> VM_MAYSHARE */
>> +	if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_SHARED |
>> VM_MAYSHARE)) {
>> +		drm_dbg(&stream->oa->xe->drm, "mmap must be read
>> only\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
>> +
>> +	/*
>> +	 * If the privileged parent forks and child drops root
>> privilege, we do not want
>> +	 * the child to retain access to the mapped OA buffer.
>> Explicitly set VM_DONTCOPY
>> +	 * to avoid such cases.
>> +	 */
>> +	vm_flags_set(vma, vma->vm_flags | VM_PFNMAP | VM_DONTEXPAND
>> | VM_DONTDUMP | VM_DONTCOPY);
>> +
>> +	xe_assert(stream->oa->xe, bo->ttm.ttm->num_pages ==
>> +		  (vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
>> +	for (i = 0; i < bo->ttm.ttm->num_pages; i++) {
>> +		ret = remap_pfn_range(vma, start, page_to_pfn(bo-
>> >ttm.ttm->pages[i]),
>> +				      PAGE_SIZE, vma->vm_page_prot);
>> +		if (ret)
>> +			break;
>> +
>> +		start += PAGE_SIZE;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static const struct file_operations xe_oa_fops = {
>>  	.owner		= THIS_MODULE,
>>  	.llseek		= no_llseek,
>> @@ -1191,6 +1243,7 @@ static const struct file_operations xe_oa_fops
>> = {
>>  	.poll		= xe_oa_poll,
>>  	.read		= xe_oa_read,
>>  	.unlocked_ioctl	= xe_oa_ioctl,
>> +	.mmap		= xe_oa_mmap,
>>  };
>>  
>>  static bool engine_supports_mi_query(struct xe_hw_engine *hwe)
>


* Re: [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2024-01-08 19:50     ` Umesh Nerlige Ramappa
@ 2024-01-09  5:14       ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-09  5:14 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Mon, 08 Jan 2024 11:50:40 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Thomas/Umesh,

> On Tue, Jan 02, 2024 at 12:16:12PM +0100, Thomas Hellström wrote:
> > On Thu, 2023-12-07 at 22:43 -0800, Ashutosh Dixit wrote:
> >> Allow the OA buffer to be mmap'd to userspace. This is needed for the
> >> MMIO
> >> trigger use case. Even otherwise, with whitelisted OA head/tail ptr
> >> registers, userspace can receive/interpret OA data from the mmap'd
> >> buffer
> >> without issuing read()'s on the OA stream fd.
> >>
> >> Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> >> ---
> >>  drivers/gpu/drm/xe/xe_oa.c | 53
> >> ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 53 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> >> index 42f32d4359f2c..97779cbb83ee8 100644
> >> --- a/drivers/gpu/drm/xe/xe_oa.c
> >> +++ b/drivers/gpu/drm/xe/xe_oa.c
> >> @@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct
> >> xe_oa_stream *stream)
> >>  		return PTR_ERR(bo);
> >>
> >>  	stream->oa_buffer.bo = bo;
> >> +	/* mmap implementation requires OA buffer to be in system
> >> memory */
> >> +	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
> >>  	stream->oa_buffer.vaddr = bo->vmap.vaddr;
> >>  	return 0;
> >>  }
> >> @@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode,
> >> struct file *file)
> >>  	struct xe_oa_stream *stream = file->private_data;
> >>  	struct xe_gt *gt = stream->gt;
> >>
> >> +	/* Zap mmap's */
> >> +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
> >> +
> >
> > Can release() get called at all if there is a live mapping()? Meaning
> > the unmap_mapping_range() shouldn't be needed?
>
> If user closes the fd, but has not called unmap, then release will not get
> called. If unmap_mapping_range() doesn't do anything extra compared to
> unmap(), then I agree that we could drop this.

I am removing unmap_mapping_range. I checked and found that:

* If munmap is not called and fd is closed, release is not called
* However, release is called (even in the above case) when the process
  exits. But at process exit, I think we can safely assume that resources
  allocated for the process will be automatically freed. So there seems to
  be no reason to retain unmap_mapping_range.
* Also, in general, I am not seeing unmap_mapping_range being called in
  other places in the kernel which use remap_pfn_range.

So removing unmap_mapping_range because of the above reasons.

Thanks.
--
Ashutosh


* Re: [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2023-12-19 20:28   ` Dixit, Ashutosh
@ 2024-01-20  2:35     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:35 UTC (permalink / raw)
  To: intel-xe

On Tue, 19 Dec 2023 12:28:51 -0800, Dixit, Ashutosh wrote:
>
> On Thu, 07 Dec 2023 22:43:14 -0800, Ashutosh Dixit wrote:
> >
> > +static struct ctl_table perf_ctl_table[] = {
> > +	{
> > +	 .procname = "perf_stream_paranoid",
> > +	 .data = &xe_perf_stream_paranoid,
> > +	 .maxlen = sizeof(xe_perf_stream_paranoid),
> > +	 .mode = 0644,
> > +	 .proc_handler = proc_dointvec_minmax,
> > +	 .extra1 = SYSCTL_ZERO,
> > +	 .extra2 = SYSCTL_ONE,
> > +	 },
> > +	{}
> > +};
> > +
> > +int xe_perf_sysctl_register(void)
> > +{
> > +	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
> > +	return 0;
> > +}
>
> Any idea why this (and xe_oa_max_sample_rate) is created in /proc, rather
> than as something attached to the module?
>
> We would want it to be per module, rather than per device, so that's one
> reason. What are the options for creating per module params? One is
> module_param itself. In that case this would appear in
> "/sys/module/xe/parameters/perf_stream_paranoid" rather than in
> "/proc/sys/dev/xe/perf_stream_paranoid".
>
> Module params are slightly simpler to manage than /proc stuff I think. Any
> other reason to prefer one over the other?
>
> Comments?

Looking into i915 history, it seems these were created in /proc to follow
what the kernel perf subsystem does. However, considering there is nothing
common between the kernel perf and xe/i915 perf subsystems, there seems to
be little reason to place these files where the kernel subsystem places
similar files (if the files are indeed similar). These files might as well
be created as xe module params or device-level sysfs. A module param is
probably the fewest lines of code.

Anyway, I've left them in /proc for now and it probably doesn't matter a
lot, but I think we should consider whether to move them to module params
or sysfs.
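For comparison, the module-param alternative would be just a few lines (hypothetical sketch, not part of the patch; the default shown is illustrative):

```c
/* Would show up as /sys/module/xe/parameters/perf_stream_paranoid */
static int xe_perf_stream_paranoid = 1;
module_param_named(perf_stream_paranoid, xe_perf_stream_paranoid, int, 0644);
MODULE_PARM_DESC(perf_stream_paranoid,
		 "Require perfmon capability to open OA streams (default: 1)");
```

One difference worth noting: unlike proc_dointvec_minmax with extra1/extra2, a plain int module param does no 0..1 range clamping, so the value would need validating at the point of use.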

Thanks.
--
Ashutosh


* Re: [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl
  2023-12-14  0:58   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:36     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:36 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Wed, 13 Dec 2023 16:58:52 -0800, Umesh Nerlige Ramappa wrote:
>
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -29,6 +29,7 @@
> > #include "xe_irq.h"
> > #include "xe_mmio.h"
> > #include "xe_module.h"
> > +#include "xe_oa.h"
> > #include "xe_pat.h"
> > #include "xe_pcode.h"
> > #include "xe_perf.h"
> > @@ -480,6 +481,10 @@ int xe_device_probe(struct xe_device *xe)
> >
> >	xe_heci_gsc_init(xe);
> >
> > +	err = xe_oa_init(xe);
> > +	if (err)
> > +		goto err_irq_shutdown;
> > +
> >	err = xe_display_init(xe);
> >	if (err)
> >		goto err_irq_shutdown;
>
> ^ this needs to do an xe_oa_fini on failure, so it should use a
> different/new goto label.

Good point, fixed in v8. Though the "drm/xe/oa/uapi: Add oa_max_sample_rate
sysctl" patch is gone in v8, so the change has been made in
"drm/xe/oa/uapi: Add OA data formats".


* Re: [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units
  2023-12-19 16:11   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:43     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:43 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 08:11:58 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index 11662a81ef6d8..5ad3c9c78b4e9 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -5,7 +5,10 @@
> >
> > #include <linux/sysctl.h>
> >
> > +#include "regs/xe_oa_regs.h"
> > #include "xe_device.h"
> > +#include "xe_gt.h"
> > +#include "xe_mmio.h"
> > #include "xe_oa.h"
> >
> > static int xe_oa_sample_rate_hard_limit;
> > @@ -13,6 +16,13 @@ static u32 xe_oa_max_sample_rate = 100000;
> >
> > static struct ctl_table_header *sysctl_header;
> >
> > +enum {
> > +	XE_OA_UNIT_OAG = 0,
> > +	XE_OA_UNIT_OAM_SAMEDIA_0 = 0,
> > +	XE_OA_UNIT_MAX,
> > +	XE_OA_UNIT_INVALID = U32_MAX,
> > +};
>
> Right now, I think the enum is not needed since we are only defining 0.

Removed in v8. There was my spiel on this part in i915 too:

https://patchwork.freedesktop.org/patch/527601/?series=115330&rev=1#comment_955388

> > +
> > #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
> >
> > static const struct xe_oa_format oa_formats[] = {
> > @@ -37,6 +47,143 @@ static const struct xe_oa_format oa_formats[] = {
> >	[XE_OA_FORMAT_PEC36u64_G1_4_G2_32]	= { 4, 320, DRM_FMT(PEC), HDR_64_BIT, 1, 0 },
> > };
> >
> > +static u32 num_oa_units_per_gt(struct xe_gt *gt)
> > +{
> > +	return 1;
> > +}
> > +
> > +static u32 __hwe_oam_unit(struct xe_hw_engine *hwe)
> > +{
> > +	if (GRAPHICS_VERx100(gt_to_xe(hwe->gt)) >= 1270) {
> > +		/*
> > +		 * There's 1 SAMEDIA gt and 1 OAM per SAMEDIA gt. All media slices
> > +		 * within the gt use the same OAM. All MTL/LNL SKUs list 1 SA MEDIA
> > +		 */
> > +		drm_WARN_ON(&gt_to_xe(hwe->gt)->drm,
> > +			    hwe->gt->info.type != XE_GT_TYPE_MEDIA);
> > +
> > +		return XE_OA_UNIT_OAM_SAMEDIA_0;
> > +	}
> > +
> > +	return XE_OA_UNIT_INVALID;
> > +}
> > +
> > +static u32 __hwe_oa_unit(struct xe_hw_engine *hwe)
> > +{
> > +	switch (hwe->class) {
> > +	case XE_ENGINE_CLASS_RENDER:
> > +	case XE_ENGINE_CLASS_COMPUTE:
> > +		return XE_OA_UNIT_OAG;
> > +
> > +	case XE_ENGINE_CLASS_VIDEO_DECODE:
> > +	case XE_ENGINE_CLASS_VIDEO_ENHANCE:
> > +		return __hwe_oam_unit(hwe);
> > +
> > +	default:
> > +		return XE_OA_UNIT_INVALID;
> > +	}
> > +}
> > +
> > +static struct xe_oa_regs __oam_regs(u32 base)
> > +{
> > +	return (struct xe_oa_regs) {
> > +		base,
> > +		OAM_HEAD_POINTER(base),
> > +		OAM_TAIL_POINTER(base),
> > +		OAM_BUFFER(base),
> > +		OAM_CONTEXT_CONTROL(base),
> > +		OAM_CONTROL(base),
> > +		OAM_DEBUG(base),
> > +		OAM_STATUS(base),
> > +		OAM_CONTROL_COUNTER_SEL_MASK,
> > +	};
> > +}
> > +
> > +static struct xe_oa_regs __oag_regs(void)
> > +{
> > +	return (struct xe_oa_regs) {
> > +		0,
> > +		OAG_OAHEADPTR,
> > +		OAG_OATAILPTR,
> > +		OAG_OABUFFER,
> > +		OAG_OAGLBCTXCTRL,
> > +		OAG_OACONTROL,
> > +		OAG_OA_DEBUG,
> > +		OAG_OASTATUS,
> > +		OAG_OACONTROL_OA_COUNTER_SEL_MASK,
> > +	};
> > +}
> > +
> > +static void __xe_oa_init_oa_units(struct xe_gt *gt)
> > +{
> > +	const u32 mtl_oa_base[] = {
> > +		[XE_OA_UNIT_OAM_SAMEDIA_0] = 0x393000,
>
> The base can also be 0x13000 because intel_uncore will automagically add
> 0x380000. I prefer 0x13000 so that the media-related MMIO adjustments
> happen in one place, intel_uncore. For functionality, it doesn't matter.

Changed to 0x13000, since you are right, GSI steering will take care of it.

> > +	};
> > +	int i, num_units = gt->oa.num_oa_units;
> > +
> > +	for (i = 0; i < num_units; i++) {
> > +		struct xe_oa_unit *u = &gt->oa.oa_unit[i];
> > +
> > +		if (i == XE_OA_UNIT_OAG && gt->info.type != XE_GT_TYPE_MEDIA) {
>
> This is where I feel enum can be dropped since decision can solely be made
> with gt->info.type.

Done.

>
> > +			u->regs = __oag_regs();
> > +			u->type = DRM_XE_OA_UNIT_TYPE_OAG;
> > +		} else if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270) {
> > +			u->regs = __oam_regs(mtl_oa_base[i]);
> > +			u->type = DRM_XE_OA_UNIT_TYPE_OAM;
> > +		}
> > +
> > +		/* Set oa_unit_ids now to ensure ids remain contiguous */
> > +		u->oa_unit_id = gt_to_xe(gt)->oa.oa_unit_ids++;
> > +	}
> > +}
> > +
>
> All the above are minor comments, so with or without those addressed, this
> is
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks.
--
Ashutosh


* Re: [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops
  2023-12-19 19:10   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:44     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:44 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 11:10:22 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> > +static bool xe_oa_reg_in_range_table(u32 addr, const struct xe_mmio_range *table)
> > +{
> > +	while (table->start || table->end) {
>
> nit: why not start && end? I would expect both start and end to be defined
> for a range.

Done (it was a carry-over from i915).

Note that the plan here is to clean up these checks (remove separate
mux/flex/b_c counter register checks) eventually, as discussed in a
previous review. I am tracking this task but haven't got round to it
yet. We can do this cleanup later since it doesn't affect uapi.

> > +	/* Config id 0 is invalid, id 1 for kernel stored test config */
>
> kernel doesn't store a test config anymore, so the comment can be
> updated. You can start with 1 below, though that's up to you.
>
> > +	oa_config->id = idr_alloc(&oa->metrics_idr, oa_config, 2, 0, GFP_KERNEL);
> > +	if (oa_config->id < 0) {
> > +		drm_dbg(&oa->xe->drm, "Failed to create sysfs entry for OA config\n");
> > +		err = oa_config->id;
> > +		goto sysfs_err;
> > +	}

Done, dropped comment and starting with 1 now.

> minor nits above, otherwise lgtm,
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks.
--
Ashutosh


* Re: [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties
  2023-12-19 23:23   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:48     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:48 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 15:23:59 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Thu, Dec 07, 2023 at 10:43:19PM -0800, Ashutosh Dixit wrote:
> > Properties for OA streams are specified by user space, when the stream is
> > opened, as a chain of drm_xe_ext_set_property struct's. Parse and validate
> > these stream properties.
> >
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_oa.c   | 372 +++++++++++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_oa.h   |   2 +
> > drivers/gpu/drm/xe/xe_perf.c |   2 +
> > include/uapi/drm/xe_drm.h    | 114 +++++++++++
> > 4 files changed, 490 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index 6a903bf4f87d1..9b0bd58fcbc06 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -3,10 +3,13 @@
> >  * Copyright © 2023 Intel Corporation
> >  */
> >
> > +#include <linux/nospec.h>
> > #include <linux/sysctl.h>
> >
> > +#include "regs/xe_gt_regs.h"
> > #include "regs/xe_oa_regs.h"
> > #include "xe_device.h"
> > +#include "xe_exec_queue.h"
> > #include "xe_gt.h"
> > #include "xe_mmio.h"
> > #include "xe_oa.h"
> > @@ -46,6 +49,20 @@ struct xe_oa_config {
> >	struct rcu_head rcu;
> > };
> >
> > +struct xe_oa_open_param {
> > +	u32 oa_unit_id;
> > +	bool sample;
> > +	u32 metric_set;
> > +	enum xe_oa_format_name oa_format;
> > +	int period_exponent;
> > +	u32 poll_period_us;
> > +	u32 open_flags;
> > +	int exec_queue_id;
> > +	int engine_instance;
> > +	struct xe_exec_queue *exec_q;
> > +	struct xe_hw_engine *hwe;
> > +};
> > +
> > #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
> >
> > static const struct xe_oa_format oa_formats[] = {
> > @@ -88,6 +105,361 @@ static void xe_oa_config_put(struct xe_oa_config *oa_config)
> >	kref_put(&oa_config->ref, xe_oa_config_release);
> > }
> >
> > +/*
> > + * OA timestamp frequency = CS timestamp frequency in most platforms. On some
> > + * platforms OA unit ignores the CTC_SHIFT and the 2 timestamps differ. In such
> > + * cases, return the adjusted CS timestamp frequency to the user.
> > + */
> > +u32 xe_oa_timestamp_frequency(struct xe_gt *gt)
> > +{
> > +	u32 reg, shift;
> > +
> > +	/*
> > +	 * Wa_18013179988:dg2
> > +	 * Wa_14015568240:pvc
> > +	 * Wa_14015846243:mtl
> > +	 */
> > +	switch (gt_to_xe(gt)->info.platform) {
> > +	case XE_DG2:
> > +	case XE_PVC:
> > +	case XE_METEORLAKE:
> > +		xe_device_mem_access_get(gt_to_xe(gt));
> > +		reg = xe_mmio_read32(gt, RPM_CONFIG0);
> > +		xe_device_mem_access_put(gt_to_xe(gt));
> > +
> > +		shift = REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);
> > +		return gt->info.reference_clock << (3 - shift);
> > +
> > +	default:
> > +		return gt->info.reference_clock;
> > +	}
> > +}
> > +
> > +static u64 oa_exponent_to_ns(struct xe_gt *gt, int exponent)
> > +{
> > +	u64 nom = (2ULL << exponent) * NSEC_PER_SEC;
> > +	u32 den = xe_oa_timestamp_frequency(gt);
> > +
> > +	return div_u64(nom + den - 1, den);
> > +}
> > +
> > +static bool engine_supports_oa_format(const struct xe_hw_engine *hwe, int type)
> > +{
> > +	switch (hwe->oa_unit->type) {
> > +	case DRM_XE_OA_UNIT_TYPE_OAG:
> > +		return type == DRM_XE_OA_FMT_TYPE_OAG || type == DRM_XE_OA_FMT_TYPE_OAR ||
> > +			type == DRM_XE_OA_FMT_TYPE_OAC || type == DRM_XE_OA_FMT_TYPE_PEC;
> > +	case DRM_XE_OA_UNIT_TYPE_OAM:
> > +		return type == DRM_XE_OA_FMT_TYPE_OAM || type == DRM_XE_OA_FMT_TYPE_OAM_MPEC;
> > +	default:
> > +		return false;
> > +	}
> > +}
> > +
> > +static int decode_oa_format(struct xe_oa *oa, u64 fmt, enum xe_oa_format_name *name)
> > +{
> > +	u32 counter_size = FIELD_GET(DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE, fmt);
> > +	u32 counter_sel = FIELD_GET(DRM_XE_OA_FORMAT_MASK_COUNTER_SEL, fmt);
> > +	u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
> > +	u32 type = FIELD_GET(DRM_XE_OA_FORMAT_MASK_FMT_TYPE, fmt);
> > +	int idx;
> > +
> > +	for_each_set_bit(idx, oa->format_mask, XE_OA_FORMAT_MAX) {
> > +		const struct xe_oa_format *f = &oa->oa_formats[idx];
> > +
> > +		if (counter_size == f->counter_size && bc_report == f->bc_report &&
> > +		    type == f->type && counter_sel == f->counter_select) {
> > +			*name = idx;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +u16 xe_oa_unit_id(struct xe_hw_engine *hwe)
> > +{
> > +	return hwe->oa_unit && hwe->oa_unit->num_engines ?
> > +		hwe->oa_unit->oa_unit_id : U16_MAX;
> > +}
> > +
> > +static int xe_oa_assign_hwe(struct xe_oa *oa, struct xe_oa_open_param *param)
> > +{
> > +	struct xe_gt *gt;
> > +	int i, ret = 0;
> > +
> > +	if (param->exec_q) {
> > +		/* When we have an exec_q, get hwe from the exec_q */
> > +		for_each_gt(gt, oa->xe, i) {
>
> Looks like the exec_queue can submit to a specific gt. I think we should
> try to get the hwe from the same gt as the exec_q. Basically this:
>
> if (param->exec_q->gt != gt)
>	continue;
>
> or you can just drop the for loop and assume that xe_gt_hw_engine is not
> supposed to fail for this gt (exec_q->gt).

Thanks for spotting this. I've gone ahead and changed to the second option
above in v8.

>
> > +			param->hwe = xe_gt_hw_engine(gt, param->exec_q->class,
> > +						     param->engine_instance, true);
> > +			if (param->hwe)
> > +				break;
> > +		}
> > +		if (param->hwe && (xe_oa_unit_id(param->hwe) != param->oa_unit_id)) {
> > +			drm_dbg(&oa->xe->drm, "OA unit ID mismatch for exec_q\n");
> > +			ret = -EINVAL;
> > +		}
> > +	} else {
> > +		struct xe_hw_engine *hwe;
> > +		enum xe_hw_engine_id id;
> > +
> > +		/* Else just get the first hwe attached to the oa unit */
> > +		for_each_gt(gt, oa->xe, i) {
> > +			for_each_hw_engine(hwe, gt, id) {
> > +				if (xe_oa_unit_id(hwe) == param->oa_unit_id) {
> > +					param->hwe = hwe;
> > +					goto out;
> > +				}
> > +			}
> > +		}
> > +	}
> > +out:
> > +	if (!param->hwe) {
> > +		drm_dbg(&oa->xe->drm, "Unable to find hwe for OA unit ID %d\n",
> > +			param->oa_unit_id);
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int xe_oa_set_prop_oa_unit_id(struct xe_oa *oa, u64 value,
> > +				     struct xe_oa_open_param *param)
> > +{
> > +	if (value >= oa->oa_unit_ids) {
> > +		drm_dbg(&oa->xe->drm, "OA unit ID out of range %lld\n", value);
> > +		return -EINVAL;
> > +	}
> > +	param->oa_unit_id = value;
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_sample_oa(struct xe_oa *oa, u64 value,
> > +				    struct xe_oa_open_param *param)
> > +{
> > +	param->sample = value;
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_metric_set(struct xe_oa *oa, u64 value,
> > +				     struct xe_oa_open_param *param)
> > +{
> > +	param->metric_set = value;
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_oa_format(struct xe_oa *oa, u64 value,
> > +				    struct xe_oa_open_param *param)
> > +{
> > +	int ret = decode_oa_format(oa, value, &param->oa_format);
> > +
> > +	if (ret) {
> > +		drm_dbg(&oa->xe->drm, "Unsupported OA report format %#llx\n", value);
> > +		return ret;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_oa_exponent(struct xe_oa *oa, u64 value,
> > +				      struct xe_oa_open_param *param)
> > +{
> > +#define OA_EXPONENT_MAX 31
> > +
> > +	if (value > OA_EXPONENT_MAX) {
> > +		drm_dbg(&oa->xe->drm, "OA timer exponent too high (> %u)\n", OA_EXPONENT_MAX);
> > +		return -EINVAL;
> > +	}
> > +	param->period_exponent = value;
>
> I think i915 has some additional logic where only root can sample at really
> high frequencies, but well, this is a root only use case, so I don't know
> what that logic achieved. Hoping that you intended to drop that logic,
> which is okay. Just confirming.

Good point, I have now removed that logic. This also deleted stuff like
xe_oa_max_sample_rate and xe_oa_sample_rate_hard_limit, so the previous
patch "drm/xe/oa/uapi: Add oa_max_sample_rate sysctl" is now dropped.

>
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_poll_oa_period(struct xe_oa *oa, u64 value,
> > +					 struct xe_oa_open_param *param)
> > +{
> > +	if (value < 100) {
> > +		drm_dbg(&oa->xe->drm, "OA timer too small (%lldus < 100us)\n", value);
> > +		return -EINVAL;
> > +	}
> > +	param->poll_period_us = value;
>
> I am not sure if anyone ended up using this at all. This will be unused if
> we add interrupt support in future. Any thoughts on adding interrupt
> support in future?

I think features like interrupts will need to be implemented incrementally
in the future, as long as they don't violate (or cause changes to) the
current uapi (which I hope we can merge before pondering these sorts of
features).

> Also note that if we throttle poll to only signal the user after a set
> number of reports are available, then this parameter is not of much use.
> The poll throttling itself will reduce the CPU overhead that this was
> trying to address. Are there plans to bring that feature to XE (poll
> throttling)?

If it's needed, we will add it (incrementally) later. For now I have gone
ahead and dropped DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US.

>
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_open_flags(struct xe_oa *oa, u64 value,
> > +				     struct xe_oa_open_param *param)
> > +{
> > +	u32 known_open_flags =
> > +		DRM_XE_OA_FLAG_FD_CLOEXEC | DRM_XE_OA_FLAG_FD_NONBLOCK | DRM_XE_OA_FLAG_DISABLED;
> > +
> > +	if (value & ~known_open_flags) {
> > +		drm_dbg(&oa->xe->drm, "Unknown open_flag %#llx\n", value);
> > +		return -EINVAL;
> > +	}
> > +	param->open_flags = value;
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_exec_queue_id(struct xe_oa *oa, u64 value,
> > +					struct xe_oa_open_param *param)
> > +{
> > +	param->exec_queue_id = value;
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_set_prop_engine_instance(struct xe_oa *oa, u64 value,
> > +					  struct xe_oa_open_param *param)
> > +{
> > +	param->engine_instance = value;
> > +	return 0;
> > +}
> > +
> > +typedef int (*xe_oa_set_property_fn)(struct xe_oa *oa, u64 value,
> > +				     struct xe_oa_open_param *param);
> > +static const xe_oa_set_property_fn xe_oa_set_property_funcs[] = {
> > +	[DRM_XE_OA_PROPERTY_OA_UNIT_ID] = xe_oa_set_prop_oa_unit_id,
> > +	[DRM_XE_OA_PROPERTY_SAMPLE_OA] = xe_oa_set_prop_sample_oa,
> > +	[DRM_XE_OA_PROPERTY_OA_METRIC_SET] = xe_oa_set_prop_metric_set,
> > +	[DRM_XE_OA_PROPERTY_OA_FORMAT] = xe_oa_set_prop_oa_format,
> > +	[DRM_XE_OA_PROPERTY_OA_EXPONENT] = xe_oa_set_prop_oa_exponent,
> > +	[DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US] = xe_oa_set_prop_poll_oa_period,
> > +	[DRM_XE_OA_PROPERTY_OPEN_FLAGS] = xe_oa_set_prop_open_flags,
> > +	[DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID] = xe_oa_set_prop_exec_queue_id,
> > +	[DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE] = xe_oa_set_prop_engine_instance,
> > +};
> > +
> > +static int xe_oa_user_ext_set_property(struct xe_oa *oa, u64 extension,
> > +				       struct xe_oa_open_param *param)
> > +{
> > +	u64 __user *address = u64_to_user_ptr(extension);
> > +	struct drm_xe_ext_set_property ext;
> > +	int err;
> > +	u32 idx;
> > +
> > +	err = __copy_from_user(&ext, address, sizeof(ext));
> > +	if (XE_IOCTL_DBG(oa->xe, err))
> > +		return -EFAULT;
> > +
> > +	if (XE_IOCTL_DBG(oa->xe, ext.property >= ARRAY_SIZE(xe_oa_set_property_funcs)) ||
> > +	    XE_IOCTL_DBG(oa->xe, ext.pad))
> > +		return -EINVAL;
> > +
> > +	idx = array_index_nospec(ext.property, ARRAY_SIZE(xe_oa_set_property_funcs));
> > +	return xe_oa_set_property_funcs[idx](oa, ext.value, param);
> > +}
> > +
> > +typedef int (*xe_oa_user_extension_fn)(struct xe_oa *oa, u64 extension,
> > +				       struct xe_oa_open_param *param);
> > +static const xe_oa_user_extension_fn xe_oa_user_extension_funcs[] = {
> > +	[DRM_XE_OA_EXTENSION_SET_PROPERTY] = xe_oa_user_ext_set_property,
> > +};
> > +
> > +static int xe_oa_user_extensions(struct xe_oa *oa, u64 extension,
> > +				 struct xe_oa_open_param *param)
> > +{
> > +	u64 __user *address = u64_to_user_ptr(extension);
> > +	struct xe_user_extension ext;
> > +	int err;
> > +	u32 idx;
> > +
> > +	err = __copy_from_user(&ext, address, sizeof(ext));
> > +	if (XE_IOCTL_DBG(oa->xe, err))
> > +		return -EFAULT;
> > +
> > +	if (XE_IOCTL_DBG(oa->xe, ext.pad) ||
> > +	    XE_IOCTL_DBG(oa->xe, ext.name >= ARRAY_SIZE(xe_oa_user_extension_funcs)))
> > +		return -EINVAL;
> > +
> > +	idx = array_index_nospec(ext.name, ARRAY_SIZE(xe_oa_user_extension_funcs));
> > +	err = xe_oa_user_extension_funcs[idx](oa, extension, param);
> > +	if (XE_IOCTL_DBG(oa->xe, err))
> > +		return err;
> > +
> > +	if (ext.next_extension)
> > +		return xe_oa_user_extensions(oa, ext.next_extension, param);
>
> What if the user passed a circular list of extensions? If it will result in
> an issue, we should also add a test for it.

Good catch. I have added a check for the number of extensions passed, the
same technique which is used in exec_queue_user_extensions.

>
> > +
> > +	return 0;
> > +}
> > +
> > +int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > +{
> > +	struct xe_oa *oa = &to_xe_device(dev)->oa;
> > +	struct xe_file *xef = to_xe_file(file);
> > +	struct drm_xe_oa_open_param dparam;
> > +	struct xe_oa_open_param param = {};
> > +	const struct xe_oa_format *f;
> > +	bool privileged_op = true;
> > +	int ret;
> > +
> > +	if (!oa->xe) {
> > +		drm_dbg(&oa->xe->drm, "xe oa interface not available for this system\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	ret = __copy_from_user(&dparam, data, sizeof(dparam));
> > +	if (XE_IOCTL_DBG(oa->xe, ret))
> > +		return -EFAULT;
> > +
> > +	ret = xe_oa_user_extensions(oa, dparam.extensions, &param);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (param.exec_queue_id > 0) {
> > +		param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id);
> > +		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
> > +			return -ENOENT;
> > +	}
> > +
> > +	/*
> > +	 * Query based sampling (using MI_REPORT_PERF_COUNT) with OAR/OAC,
> > +	 * without global stream access, can be an unprivileged operation
> > +	 */
> > +	if (param.exec_q && !param.sample)
> > +		privileged_op = false;
> > +
> > +	if (privileged_op && xe_perf_stream_paranoid && !perfmon_capable()) {
> > +		drm_dbg(&oa->xe->drm, "Insufficient privileges to open xe perf stream\n");
> > +		ret = -EACCES;
> > +		goto err_exec_q;
> > +	}
> > +
> > +	if (!param.exec_q && !param.sample) {
> > +		drm_dbg(&oa->xe->drm, "Only OA report sampling supported\n");
> > +		ret = -EINVAL;
> > +		goto err_exec_q;
> > +	}
> > +
> > +	ret = xe_oa_assign_hwe(oa, &param);
> > +	if (ret)
> > +		goto err_exec_q;
> > +
> > +	f = &oa->oa_formats[param.oa_format];
> > +	if (!param.oa_format || !f->size ||
> > +	    !engine_supports_oa_format(param.hwe, f->type)) {
> > +		drm_dbg(&oa->xe->drm, "Invalid OA format %d type %d size %d for class %d\n",
> > +			param.oa_format, f->type, f->size, param.hwe->class);
> > +		ret = -EINVAL;
> > +		goto err_exec_q;
> > +	}
> > +
> > +	if (param.period_exponent > 0) {
> > +		u64 oa_period, oa_freq_hz;
> > +
> > +		oa_period = oa_exponent_to_ns(param.hwe->gt, param.period_exponent);
> > +		oa_freq_hz = div64_u64(NSEC_PER_SEC, oa_period);
> > +		if (oa_freq_hz > xe_oa_max_sample_rate && !perfmon_capable()) {
> > +			drm_dbg(&oa->xe->drm,
> > +				"OA exponent would exceed the max sampling frequency (sysctl dev.xe.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
> > +				xe_oa_max_sample_rate);
> > +			ret = -EACCES;
> > +			goto err_exec_q;
> > +		}
> > +	}
> > +err_exec_q:
> > +	if (ret < 0 && param.exec_q)
> > +		xe_exec_queue_put(param.exec_q);
> > +	return ret;
> > +}
> > +
> > static bool xe_oa_is_valid_flex_addr(struct xe_oa *oa, u32 addr)
> > {
> >	static const struct xe_reg flex_eu_regs[] = {
> > diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
> > index e4863f8681b14..a0f9a876ea6b4 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.h
> > +++ b/drivers/gpu/drm/xe/xe_oa.h
> > @@ -19,6 +19,8 @@ void xe_oa_unregister(struct xe_device *xe);
> > int xe_oa_sysctl_register(void);
> > void xe_oa_sysctl_unregister(void);
> >
> > +int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data,
> > +			    struct drm_file *file);
> > int xe_oa_add_config_ioctl(struct drm_device *dev, void *data,
> >			   struct drm_file *file);
> > int xe_oa_remove_config_ioctl(struct drm_device *dev, void *data,
> > diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
> > index 2aee4c7989486..2c0615481b7df 100644
> > --- a/drivers/gpu/drm/xe/xe_perf.c
> > +++ b/drivers/gpu/drm/xe/xe_perf.c
> > @@ -16,6 +16,8 @@ static int xe_oa_ioctl(struct drm_device *dev, struct drm_xe_perf_param *arg,
> >		       struct drm_file *file)
> > {
> >	switch (arg->perf_op) {
> > +	case DRM_XE_PERF_OP_STREAM_OPEN:
> > +		return xe_oa_stream_open_ioctl(dev, (void *)arg->param, file);
> >	case DRM_XE_PERF_OP_ADD_CONFIG:
> >		return xe_oa_add_config_ioctl(dev, (void *)arg->param, file);
> >	case DRM_XE_PERF_OP_REMOVE_CONFIG:
> > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > index f17134828c093..8156301df7315 100644
> > --- a/include/uapi/drm/xe_drm.h
> > +++ b/include/uapi/drm/xe_drm.h
> > @@ -1192,6 +1192,120 @@ enum drm_xe_oa_format_type {
> >	DRM_XE_OA_FMT_TYPE_PEC,
> > };
> >
> > +/** enum drm_xe_oa_property_id - OA stream property id's */
> > +enum drm_xe_oa_property_id {
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_OA_UNIT_ID: ID of the OA unit on which to open
> > +	 * the OA stream, see @oa_unit_id in 'struct
> > +	 * drm_xe_query_oa_units'. Defaults to 0 if not provided.
> > +	 */
> > +	DRM_XE_OA_PROPERTY_OA_UNIT_ID = 1,
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_SAMPLE_OA: A value of 1 requests the inclusion of
> > +	 * raw OA unit reports as part of stream samples.
> > +	 */
> > +	DRM_XE_OA_PROPERTY_SAMPLE_OA,
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_OA_METRIC_SET: OA metrics defining contents of OA
> > +	 * reportst, previously added via @@DRM_XE_PERF_OP_ADD_CONFIG.
>
> typo: reports

Done, thanks.

>
> > +	 */
> > +	DRM_XE_OA_PROPERTY_OA_METRIC_SET,
> > +
> > +	/** @DRM_XE_OA_PROPERTY_OA_FORMAT: Perf counter report format */
> > +	DRM_XE_OA_PROPERTY_OA_FORMAT,
> > +	/**
> > +	 * OA_FORMAT's are specified the same way as in Bspec, in terms of
> > +	 * the following quantities: a. enum @drm_xe_oa_format_type
> > +	 * b. Counter select c. Counter size and d. BC report
> > +	 */
> > +#define DRM_XE_OA_FORMAT_MASK_FMT_TYPE		(0xff << 0)
> > +#define DRM_XE_OA_FORMAT_MASK_COUNTER_SEL	(0xff << 8)
> > +#define DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE	(0xff << 16)
> > +#define DRM_XE_OA_FORMAT_MASK_BC_REPORT		(0xff << 24)
>
> indentation/alignment is off I guess

Once again, indentation/alignment is fine. It is the presence of the
additional '+' which disturbs the tabs and makes the alignment appear
off.

>
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_OA_EXPONENT: Requests periodic OA unit sampling
> > +	 * with sampling frequency proportional to 2^(period_exponent + 1)
> > +	 */
> > +	DRM_XE_OA_PROPERTY_OA_EXPONENT,
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US: Timer interval in microseconds
> > +	 * to check OA buffer for available data. Minimum allowed value is 100
> > +	 * microseconds. A default value is used by the driver if this parameter
> > +	 * is skipped. Larger timer values will reduce cpu consumption during OA
> > +	 * perf captures, but excessively large values could result in data loss
> > +	 * due to OA buffer overwrites.
> > +	 */
> > +	DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US,
>
> Again, very likely not used, but please confirm with the UMDs though.

This is dropped for now. See above.

>
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_OPEN_FLAGS: CLOEXEC and NONBLOCK flags are
> > +	 * directly applied to returned OA fd. DISABLED opens the OA stream in a
> > +	 * DISABLED state (see @DRM_XE_PERF_IOCTL_ENABLE).
> > +	 */
> > +	DRM_XE_OA_PROPERTY_OPEN_FLAGS,
> > +#define DRM_XE_OA_FLAG_FD_CLOEXEC	(1 << 0)
> > +#define DRM_XE_OA_FLAG_FD_NONBLOCK	(1 << 1)
> > +#define DRM_XE_OA_FLAG_DISABLED		(1 << 2)
>
> oh, overlooked this before commenting earlier on the fcntl stuff. Looks
> like you already were passing this in params. Anyways, fcntl should be good
> for CLOEXEC/NONBLOCK.

Yes, CLOEXEC/NONBLOCK are dropped (verified fcntl works fine). DISABLED is
converted into a new property DRM_XE_OA_PROPERTY_OA_DISABLED.

>
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID: Open the stream for a specific
> > +	 * @exec_queue_id. Perf queries can be executed on this exec queue.
> > +	 */
> > +	DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID,
> > +
> > +	/**
> > +	 * @DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE: Optional engine instance to
> > +	 * pass along with @DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID or will default to 0.
> > +	 */
> > +	DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE,
> > +
> > +	DRM_XE_OA_PROPERTY_MAX /* non-ABI */
> > +};
> > +
> > +/**
> > + * struct drm_xe_oa_open_param - Params for opening an OA stream
> > + *
> > + * Stream params are specified as a chain of @drm_xe_ext_set_property
> > + * struct's, with @property values from enum @drm_xe_oa_property_id and
> > + * @xe_user_extension base.name set to @DRM_XE_OA_EXTENSION_SET_PROPERTY
> > + */
> > +struct drm_xe_oa_open_param {
> > +#define DRM_XE_OA_EXTENSION_SET_PROPERTY	0
> > +	/** @extensions: Pointer to the first extension struct */
> > +	__u64 extensions;
> > +};
> > +
> > +/** enum drm_xe_oa_record_type - Type of OA packet read from OA fd */
> > +enum drm_xe_oa_record_type {
> > +	/** @DRM_XE_OA_RECORD_SAMPLE: Regular OA data sample */
> > +	DRM_XE_OA_RECORD_SAMPLE = 1,
> > +
> > +	/** @DRM_XE_OA_RECORD_OA_REPORT_LOST: Status indicating lost OA reports */
> > +	DRM_XE_OA_RECORD_OA_REPORT_LOST = 2,
> > +
> > +	/**
> > +	 * @DRM_XE_OA_RECORD_OA_BUFFER_LOST: Status indicating lost OA
> > +	 * reports and OA buffer reset in the process
> > +	 */
> > +	DRM_XE_OA_RECORD_OA_BUFFER_LOST = 3,
> > +
> > +	DRM_XE_OA_RECORD_MAX /* non-ABI */
> > +};
> > +
> > +/** struct drm_xe_oa_record_header - Header for OA packets read from OA fd */
> > +struct drm_xe_oa_record_header {
> > +	/** @type: Of enum @drm_xe_oa_record_type */
> > +	__u16 type;
> > +	/** @pad: MBZ */
> > +	__u16 pad;
> > +	/** @size: size in bytes */
> > +	__u32 size;
> > +};
>
> I think we want to drop the header completely, but I guess that's still
> wip. Any plans to allow read/write of STATUS reg? I am thinking i915 can
> clear the register automatically but store the last cleared value that can
> be returned to the user on a read or something similar where user only ever
> needs to read it. I don't think user will try to recover anything if there
> is an error. In case of error the capture is just reinitiated. Again, need
> UMD confirmation here.

I have eliminated the report header, so the read just provides the OA
buffer data in integral number of reports (num_reports = read_size /
format_size). Also provided a new stream fd ioctl DRM_XE_PERF_IOCTL_STATUS,
which returns the status reg and clears it (so for now user can only read
the status register, not write it). Take a look at xe_oa_status_locked() in
the patch "drm/xe/oa/uapi: Read file_operation". If there is an error, it's
the user's responsibility to reinitiate the capture (by, say, using
disable/enable).

>
> > +
> > /**
> >  * struct drm_xe_oa_config - OA metric configuration
> >  *
> > --
> > 2.41.0
> >

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG)
  2023-12-20  2:31   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:49     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:49 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 18:31:45 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> > diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > index d318ec0efd7db..1b98b609f7fda 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> > @@ -156,6 +156,8 @@
> >
> > #define SQCNT1					XE_REG_MCR(0x8718)
> > #define XELPMP_SQCNT1				XE_REG(0x8718)
> > +#define   SQCNT1_PMON_ENABLE			REG_BIT(30)
> > +#define   SQCNT1_OABPC				REG_BIT(29)
> > #define   ENFORCE_RAR				REG_BIT(23)
>
> REG_BIT(29) indentation seems to be off

Indentation is ok, I've explained this elsewhere.

> > +static int xe_oa_emit_oa_config(struct xe_oa_stream *stream)
> > +{
> > +#define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
> > +	struct xe_oa_config_bo *oa_bo;
> > +	int err, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
> > +
> > +	oa_bo = xe_oa_alloc_config_buffer(stream);
> > +	if (IS_ERR(oa_bo)) {
> > +		err = PTR_ERR(oa_bo);
> > +		goto exit;
> > +	}
> > +
> > +	err = xe_oa_submit_bb(stream, oa_bo->bb);
> > +
> > +	/* Additional empirical delay needed for NOA programming after registers are written */
> > +	usleep_range(us, 2 * us);
>
> Are we planning to signal user fence or something to indicate completion? I
> haven't tracked that aspect much.

Yes that's the plan. This usleep() will be removed when we implement that.

>
> The reset is familiar and lgtm,
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd
  2023-12-20  2:52   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:50     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:50 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 18:52:58 -0800, Umesh Nerlige Ramappa wrote:
>
> On Thu, Dec 07, 2023 at 10:43:21PM -0800, Ashutosh Dixit wrote:
> > The OA stream open perf op returns an fd with its own file_operations for
> > the newly initialized OA stream. These file_operations allow userspace to
> > enable or disable the stream, as well as apply a different metric
> > configuration for the OA stream. Userspace can also poll for data
> > availability. OA stream initialization is completed in this commit by
> > enabling the OA stream. When sampling is enabled this starts a hrtimer
> > which periodically checks for data availability.
> >
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>
> lgtm
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks, though the question here is whether we can avoid introducing the
read() interface and just use mmap. For v8, I'm assuming we need read()
too, but let's revisit afterwards.

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG
  2023-12-20  3:05   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:51     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:51 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 19:05:56 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Thu, Dec 07, 2023 at 10:43:23PM -0800, Ashutosh Dixit wrote:
> > Xe2+ OAG requires special handling because non-power-of-2 report sizes are
> > not a sub-multiple of the OA buffer size and there are no partial reports
> > at the end of the buffer. This issue is present only when overrun mode is
> > enabled. Avoid adding this special handling by disabling overrun mode for
> > Xe2+ OAG.
>
> Like you mentioned earlier, maybe disable overrun for all platforms OR have
> the user control the setting.

This patch is mostly a workaround till Xe handles this new HW behavior. But
yes, we should see if it is possible to disable overrun mode altogether.

I was actually suggesting that we keep overrun mode always enabled and skip
the HEAD pointer update (if we don't expose the read() interface). Anyway,
let's discuss this some more. I will also check with the UMDs.

> Otherwise, this is
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 10/17] drm/xe/oa/uapi: Read file_operation
  2023-12-20  3:01   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:51     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:51 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 19:01:57 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> > +static int xe_oa_append_status(struct xe_oa_stream *stream, char __user *buf,
> > +			       size_t count, size_t *offset,
> > +			       enum drm_xe_oa_record_type type)
>
> space/indent ^
>
> > +{
> > +	struct drm_xe_oa_record_header header = { type, 0, sizeof(header) };
> > +
> > +	if ((count - *offset) < header.size)
> > +		return -ENOSPC;
> > +
> > +	if (copy_to_user(buf + *offset, &header, sizeof(header)))
> > +		return -EFAULT;
> > +
> > +	*offset += header.size;
> > +
> > +	return 0;
> > +}
> > +
> > +static int xe_oa_append_sample(struct xe_oa_stream *stream, char __user *buf,
> > +			       size_t count, size_t *offset, const u8 *report)
>
> space/indent ^ and a couple more places, in this patch.
>
> With some indents addressed, this is:

Indents are fine and patches have gone through checkpatch. These indents
may look off in the patches sometimes (say due to the additional +
character) but in reality they are fine.

> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 13/17] drm/xe/oa: Add OAC support
  2023-12-20  4:59   ` Umesh Nerlige Ramappa
@ 2024-01-20  2:52     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  2:52 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Tue, 19 Dec 2023 20:59:29 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Thu, Dec 07, 2023 at 10:43:25PM -0800, Ashutosh Dixit wrote:
> > Similar to OAR, allow userspace to execute MI_REPORT_PERF_COUNT on compute
> > engines of a specified exec queue.
> >
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_engine_regs.h |  1 +
> > drivers/gpu/drm/xe/regs/xe_oa_regs.h     |  3 +
> > drivers/gpu/drm/xe/xe_oa.c               | 81 +++++++++++++++++++++++-
> > 3 files changed, 82 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
> > index 76c0938df05f3..045f9773f01f4 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
> > @@ -73,6 +73,7 @@
> >
> > #define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244, XE_REG_OPTION_MASKED)
> > #define	  CTX_CTRL_OAC_CONTEXT_ENABLE		REG_BIT(8)
> > +#define	  CTX_CTRL_RUN_ALONE			REG_BIT(7)
> > #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	REG_BIT(3)
> > #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	REG_BIT(0)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
> > index 7e2e875ccf80a..b66cd95b795e7 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h
> > @@ -74,6 +74,9 @@
> > #define  OAG_OASTATUS_BUFFER_OVERFLOW	REG_BIT(1)
> > #define  OAG_OASTATUS_REPORT_LOST	REG_BIT(0)
> >
> > +/* OAC unit */
> > +#define OAC_OACONTROL			XE_REG(0x15114)
> > +
> > /* OAM unit */
> > #define OAM_HEAD_POINTER_OFFSET			(0x1a0)
> > #define OAM_TAIL_POINTER_OFFSET			(0x1a4)
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index 9d653d7722d1a..42f32d4359f2c 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -449,6 +449,19 @@ u32 __format_to_oactrl(const struct xe_oa_format *format, int counter_sel_mask)
> >		REG_FIELD_PREP(OA_OACONTROL_COUNTER_SIZE_MASK, format->counter_size);
> > }
> >
> > +static u32 __oa_ccs_select(struct xe_oa_stream *stream)
> > +{
> > +	u32 val;
> > +
> > +	if (stream->hwe->class != XE_ENGINE_CLASS_COMPUTE)
> > +		return 0;
> > +
> > +	val = REG_FIELD_PREP(OAG_OACONTROL_OA_CCS_SELECT_MASK, stream->hwe->instance);
> > +	xe_assert(stream->oa->xe,
> > +		  REG_FIELD_GET(OAG_OACONTROL_OA_CCS_SELECT_MASK, val) == stream->hwe->instance);
>
> Why is there a need to do REG_FIELD_GET? I thought the REG_FIELD_PREP is
> just a bitwise operation. Are you expecting coherency issues?

No, the check is that hwe->instance can fit in 3 bits
(OAG_OACONTROL_OA_CCS_SELECT_MASK).

>
> > +}
> > +
> > static void xe_oa_enable(struct xe_oa_stream *stream)
> > {
> >	const struct xe_oa_format *format = stream->oa_buffer.format;
> > @@ -463,7 +476,7 @@ static void xe_oa_enable(struct xe_oa_stream *stream)
> >
> >	regs = __oa_regs(stream);
> >	val = __format_to_oactrl(format, regs->oa_ctrl_counter_select_mask) |
> > -		OAG_OACONTROL_OA_COUNTER_ENABLE;
> > +		__oa_ccs_select(stream) | OAG_OACONTROL_OA_COUNTER_ENABLE;
> >
> >	xe_mmio_write32(stream->gt, regs->oa_ctrl, val);
> > }
> > @@ -762,6 +775,64 @@ static int xe_oa_configure_oar_context(struct xe_oa_stream *stream, bool enable)
> >	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
> > }
> >
> > +static int xe_oa_configure_oac_context(struct xe_oa_stream *stream, bool enable)
> > +{
> > +	const struct xe_oa_format *format = stream->oa_buffer.format;
> > +	struct xe_lrc *lrc = &stream->exec_q->lrc[0];
> > +	u32 regs_offset = xe_lrc_regs_offset(lrc) / sizeof(u32);
> > +	u32 oacontrol = __format_to_oactrl(format, OAR_OACONTROL_COUNTER_SEL_MASK) |
> > +		(enable ? OAR_OACONTROL_COUNTER_ENABLE : 0);
> > +	struct flex regs_context[] = {
> > +		{
> > +			OACTXCONTROL(stream->hwe->mmio_base),
> > +			stream->oa->ctx_oactxctrl_offset[stream->hwe->class] + 1,
> > +			enable ? OA_COUNTER_RESUME : 0,
> > +		},
> > +		{
> > +			RING_CONTEXT_CONTROL(stream->hwe->mmio_base),
> > +			regs_offset + CTX_CONTEXT_CONTROL,
> > +			_MASKED_FIELD(CTX_CTRL_OAC_CONTEXT_ENABLE,
> > +				      enable ? CTX_CTRL_OAC_CONTEXT_ENABLE : 0) |
> > +			_MASKED_FIELD(CTX_CTRL_RUN_ALONE,
> > +				      enable ? CTX_CTRL_RUN_ALONE : 0),
> > +		},
> > +	};
> > +	/* Offsets in regs_lri are not used since this configuration is applied using LRI */
> > +	struct flex regs_lri[] = {
> > +		{
> > +			OAC_OACONTROL,
> > +			OAR_OAC_OACONTROL_OFFSET + 1,
> > +			oacontrol,
> > +		},
> > +	};
> > +	int err;
> > +
> > +	/* Set ccs select to enable programming of OAC_OACONTROL */
> > +	xe_mmio_write32(stream->gt, __oa_regs(stream)->oa_ctrl, __oa_ccs_select(stream));
> > +
> > +	/* Modify stream hwe context image with regs_context */
> > +	err = xe_oa_modify_context(stream, &stream->exec_q->lrc[0],
> > +				   regs_context, ARRAY_SIZE(regs_context));
> > +	if (err)
> > +		return err;
> > +
> > +	/* Apply regs_lri using LRI */
> > +	return xe_oa_modify_self(stream, regs_lri, ARRAY_SIZE(regs_lri));
>
> I think in i915, for execlist scheduling, there was a kernel context that
> was scheduled and when it ran, it would indicate that all other contexts
> are done executing - kinda GPU idle. The modify self was (IMO) only needed
> to update the kernel context in this scenario.

Hmm, in i915 modify_self uses the "pinned context" (not the kernel
context), afaict. In any case there are significant differences between
this code in Xe and the equivalent code in i915 (due to the exclusive use
of stream->k_exec_q in Xe).

> GuC does not have a concept of kernel context (at least not in i915, not
> sure if things changed in XE). If so, all the modify self can be dropped
> (both from OAR and OAC).

I tried replacing modify_self with an MMIO write to
OAR_OACONTROL/OAC_OACONTROL, but it doesn't work. So it seems those
registers can only be programmed via MI_LOAD_REGISTER_* commands, not via
xe_mmio_write32.

So what I have done instead is just rename modify_self and refactor the
code a bit.

> Otherwise, this is good too, so
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
> The other query I have (unrelated to this patch) is if we need the
> PWR_CLK_STATE state configured in all contexts. It's a bit hazy if that was
> only needed for older gens or if it is applicable to newer
> platforms. (gen12_configure_all_contexts() in i915).

Lionel confirmed that PWR_CLK_STATE is not needed for Gen12+ (only needed
for Gen11 and older).

Thanks.
--
Ashutosh

> > +}
> > +
> > +static int xe_oa_configure_oa_context(struct xe_oa_stream *stream, bool enable)
> > +{
> > +	switch (stream->hwe->class) {
> > +	case XE_ENGINE_CLASS_RENDER:
> > +		return xe_oa_configure_oar_context(stream, enable);
> > +	case XE_ENGINE_CLASS_COMPUTE:
> > +		return xe_oa_configure_oac_context(stream, enable);
> > +	default:
> > +		/* Video engines do not support MI_REPORT_PERF_COUNT */
> > +		return 0;
> > +	}
> > +}
> > +
> > #define HAS_OA_BPC_REPORTING(xe) (GRAPHICS_VERx100(xe) >= 1255)
> >
> > static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
> > @@ -781,7 +852,7 @@ static void xe_oa_disable_metric_set(struct xe_oa_stream *stream)
> >
> >	/* disable the context save/restore or OAR counters */
> >	if (stream->exec_q)
> > -		xe_oa_configure_oar_context(stream, false);
> > +		xe_oa_configure_oa_context(stream, false);
> >
> >	/* Make sure we disable noa to save power. */
> >	xe_mmio_rmw32(stream->gt, RPM_CONFIG1, GT_NOA_ENABLE, 0);
> > @@ -978,8 +1049,9 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream)
> >
> >	xe_mmio_rmw32(stream->gt, XELPMP_SQCNT1, 0, sqcnt1);
> >
> > +	/* Configure OAR/OAC */
> >	if (stream->exec_q) {
> > -		ret = xe_oa_configure_oar_context(stream, true);
> > +		ret = xe_oa_configure_oa_context(stream, true);
> >		if (ret)
> >			return ret;
> >	}
> > @@ -1636,6 +1708,9 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, void *data, struct drm_file
> >		param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id);
> >		if (XE_IOCTL_DBG(oa->xe, !param.exec_q))
> >			return -ENOENT;
> > +
> > +		if (param.exec_q->width > 1)
> > +			drm_dbg(&oa->xe->drm, "exec_q->width > 1, programming only exec_q->lrc[0]\n");
> >	}
> >
> >	/*
> > --
> > 2.41.0
> >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties
  2023-12-23  0:40   ` Umesh Nerlige Ramappa
@ 2024-01-20  3:10     ` Dixit, Ashutosh
  0 siblings, 0 replies; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  3:10 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Fri, 22 Dec 2023 16:40:47 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > index 8156301df7315..5f41c5bfe5e0e 100644
> > --- a/include/uapi/drm/xe_drm.h
> > +++ b/include/uapi/drm/xe_drm.h
> > @@ -517,6 +517,7 @@ struct drm_xe_device_query {
> > #define DRM_XE_DEVICE_QUERY_HWCONFIG		4
> > #define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY		5
> > #define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES	6
> > +#define DRM_XE_DEVICE_QUERY_OA_UNITS		7
> >	/** @query: The type of data to query */
> >	__u32 query;
> >
> > @@ -1182,6 +1183,69 @@ enum drm_xe_oa_unit_type {
> >	DRM_XE_OA_UNIT_TYPE_OAM,
> > };
> >
> > +/**
> > + * struct drm_xe_query_oa_units - describe OA units
> > + *
> > + * If a query is made with a struct drm_xe_device_query where .query
> > + * is equal to DRM_XE_DEVICE_QUERY_OA_UNITS, then the reply uses struct
> > + * drm_xe_query_oa_units in .data.
> > + *
> > + * When there is an @open_stream, the query returns properties specific to
> > + * that @open_stream. Else default properties are returned.
> > + */
> > +struct drm_xe_query_oa_units {
> > +	/** @extensions: Pointer to the first extension struct, if any */
> > +	__u64 extensions;
> > +
> > +	/** @num_oa_units: number of OA units returned in oau[] */
> > +	__u32 num_oa_units;
> > +
> > +	/** @pad: MBZ */
> > +	__u32 pad;
> > +
> > +	/** @reserved: MBZ */
> > +	__u64 reserved[4];
>
> For some reason I have assumed reserved fields are added only at the end of
> the uApi struct, not sure though.

I have removed this in v8 and also brought 'struct drm_xe_query_oa_units'
in line with the other query structs (see e.g. query_engines or
query_mem_regions).

>
> > +
> > +	/** @oa_units: OA units returned for this device */
> > +	struct drm_xe_oa_unit {
> > +		/** @oa_unit_id: OA unit ID */
> > +		__u16 oa_unit_id;
> > +
> > +		/** @oa_unit_type: OA unit type of @drm_xe_oa_unit_type */
> > +		__u16 oa_unit_type;
> > +
> > +		/** @gt_id: GT ID for this OA unit */
> > +		__u16 gt_id;
> > +
> > +		/** @open_stream: True if a stream is open on the OA unit */
> > +		__u16 open_stream;
> > +
> > +		/** @internal_events: True if internal events are available */
> > +		__u16 internal_events;
> > +
> > +		/** @pad: MBZ */
> > +		__u16 pad;
>
> __u16 pad[3] for 64bit alignment

internal_events and pad above are also removed.

> > +
> > +		/** @capabilities: OA capabilities bit-mask */
> > +		__u64 capabilities;
> > +
> > +		/** @oa_timestamp_freq: OA timestamp freq */
> > +		__u64 oa_timestamp_freq;
> > +
> > +		/** @oa_buf_size: OA buffer size */
> > +		__u64 oa_buf_size;
> > +
> > +		/** @reserved: MBZ */
> > +		__u64 reserved[4];
> > +
> > +		/** @num_engines: number of engines in @eci array */
> > +		__u64 num_engines;
> > +
> > +		/** @eci: engines attached to this OA unit */
> > +		struct drm_xe_engine_class_instance eci[];
> > +	} oa_units[];
>
> nesting of flexible arrays; not sure about that. i think some compilers may
> throw an error/warning. Sending an old message from Joonas offline.

From what I saw, that old message is inconclusive. The Windows developers
have not explained what they are doing, and in any case there is no reason
for a Windows UMD to talk to the Linux KMD; Windows can #ifdef the struct
out at their end if needed.
It talks about this error:

https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2233

	class B {
		char zeroarray[];
	};

	B array2[100];   // C2233

The above is obviously wrong, but it is not what we are doing in the
struct above (we have not sized the variable-length struct, only indicated
that a variable-length struct sits inside another variable-length struct,
which is legitimate as long as we index correctly into the arrays).

So I am ignoring this. Please let me know if you disagree. Or if you have
any suggestions about alternative ways of doing this, we could look into
it.

> In general, I feel the pad and reserved fields are sprinkled into the
> structure. If we can avoid that in a way that they are all located at the
> end of the struct, I think that would look good. Not sure about the
> technical aspect though. I always assumed they were meant to be at the end
> (but then structs are nested anyways, so really not sure).

In v8 there is only a single reserved[4] array, just before num_engines in
'struct drm_xe_oa_unit', in case we need to add extra fields later on
(after that come num_engines and the variable-length eci[] array, which
are better kept together).


> > +};
> > +
> > /** enum drm_xe_oa_format_type - OA format types */
> > enum drm_xe_oa_format_type {
> >	DRM_XE_OA_FMT_TYPE_OAG,
> > --
> > 2.41.0
> >

Thanks.
--
Ashutosh

* Re: [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2023-12-23  2:39   ` Umesh Nerlige Ramappa
@ 2024-01-20  3:11     ` Dixit, Ashutosh
  2024-02-06 23:51       ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 68+ messages in thread
From: Dixit, Ashutosh @ 2024-01-20  3:11 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-xe

On Fri, 22 Dec 2023 18:39:14 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Thu, Dec 07, 2023 at 10:43:27PM -0800, Ashutosh Dixit wrote:
> > Allow the OA buffer to be mmap'd to userspace. This is needed for the MMIO
> > trigger use case. Even otherwise, with whitelisted OA head/tail ptr
> > registers, userspace can receive/interpret OA data from the mmap'd buffer
> > without issuing read()'s on the OA stream fd.
> >
> > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_oa.c | 53 ++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 53 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index 42f32d4359f2c..97779cbb83ee8 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
> >		return PTR_ERR(bo);
> >
> >	stream->oa_buffer.bo = bo;
> > +	/* mmap implementation requires OA buffer to be in system memory */
> > +	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
> >	stream->oa_buffer.vaddr = bo->vmap.vaddr;
> >	return 0;
> > }
> > @@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode, struct file *file)
> >	struct xe_oa_stream *stream = file->private_data;
> >	struct xe_gt *gt = stream->gt;
> >
> > +	/* Zap mmap's */
> > +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
> > +
> >	mutex_lock(&gt->oa.gt_lock);
> >	xe_oa_destroy_locked(stream);
> >	mutex_unlock(&gt->oa.gt_lock);
> > @@ -1184,6 +1189,53 @@ static int xe_oa_release(struct inode *inode, struct file *file)
> >	return 0;
> > }
> >
> > +static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct xe_oa_stream *stream = file->private_data;
> > +	struct xe_bo *bo = stream->oa_buffer.bo;
> > +	unsigned long start = vma->vm_start;
> > +	int i, ret;
> > +
> > +	if (xe_perf_stream_paranoid && !perfmon_capable()) {
> > +		drm_dbg(&stream->oa->xe->drm, "Insufficient privilege to map OA buffer\n");
> > +		return -EACCES;
> > +	}
> > +
> > +	/* Can mmap the entire OA buffer or nothing (no partial OA buffer mmaps) */
> > +	if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) {
> > +		drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must be OA buffer size\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Only support VM_READ, enforce MAP_PRIVATE by checking for VM_MAYSHARE */
> > +	if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_SHARED | VM_MAYSHARE)) {
> > +		drm_dbg(&stream->oa->xe->drm, "mmap must be read only\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
> > +
> > +	/*
> > +	 * If the privileged parent forks and child drops root privilege, we do not want
> > +	 * the child to retain access to the mapped OA buffer. Explicitly set VM_DONTCOPY
> > +	 * to avoid such cases.
> > +	 */
> > +	vm_flags_set(vma, vma->vm_flags | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY);
>
> Would help to just use the vm_flags_mod where you can specify both set and
> clear flags.

Yes, good idea, done.

>
> And then just to be paranoid about it, maybe add an assert to check that
> the flags applied correctly.

Don't think this is needed; the code in vm_flags_mod is pretty clear, so
skipping this.

> Assuming you ran the existing mmap tests for this.

Yes, existing IGT's have been ported to Xe.

> I think we should also add an mremap case. I think that should fail with
> EINVAL since this is a private mapping.

Ah, the mremap man page says "mremap() expands (or shrinks) an existing
memory mapping", so the relevant flag seems to be VM_DONTEXPAND rather than
VM_SHARED/VM_MAYSHARE. I will see about adding a mremap test.

>
> > +
> > +	xe_assert(stream->oa->xe, bo->ttm.ttm->num_pages ==
> > +		  (vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
> > +	for (i = 0; i < bo->ttm.ttm->num_pages; i++) {
> > +		ret = remap_pfn_range(vma, start, page_to_pfn(bo->ttm.ttm->pages[i]),
> > +				      PAGE_SIZE, vma->vm_page_prot);
>
> vma->vm_page_prot is set to the state of vm_flags that existed at the
> mmap_region() level. We have modified those flags here and we must update
> the vma_page_prot with vm_get_page_prot(vma->vm_flags).

If we need to do this, I think the call to use would be
vma_set_page_prot(). But looking at vm_get_page_prot, the vm_flags which
affect vma->vm_page_prot are (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), and
note that this function modifies none of these: if any of
(VM_WRITE|VM_EXEC|VM_SHARED) is set, we just return an error
(vma->vm_page_prot seems to determine what we write to the PTEs, whereas
the flags we modify here are more like "software" vm_flags).

Also, we would see test failures if we didn't do this correctly, and we
are not seeing any.

So I am a little torn about this: vma_set_page_prot() doesn't seem to be
really needed, but may be nice to have; it is also rarely used in the
kernel.

I am copying Thomas; let's see what he thinks. I'm skipping this for now,
but if Thomas or you say we should add it, I'll go ahead and add it.

Thanks.
--
Ashutosh

* Re: [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2023-12-08  6:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
  2023-12-14  0:57   ` Umesh Nerlige Ramappa
  2023-12-19 20:28   ` Dixit, Ashutosh
@ 2024-01-24 14:10   ` Joel Granados
  2 siblings, 0 replies; 68+ messages in thread
From: Joel Granados @ 2024-01-24 14:10 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:14PM -0800, Ashutosh Dixit wrote:
> Normally only superuser/root can access perf counter data. However,
> superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
> users to also access perf data. perf_stream_paranoid is introduced at the
> perf layer to allow different perf stream types to share this access
> mechanism.
> 
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_module.c |  5 +++++
>  drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> index 51bf69b7ab222..8629330d928b0 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -11,6 +11,7 @@
>  #include "xe_drv.h"
>  #include "xe_hw_fence.h"
>  #include "xe_pci.h"
> +#include "xe_perf.h"
>  #include "xe_pmu.h"
>  #include "xe_sched_job.h"
>  
> @@ -71,6 +72,10 @@ static const struct init_funcs init_funcs[] = {
>  		.init = xe_register_pci_driver,
>  		.exit = xe_unregister_pci_driver,
>  	},
> +	{
> +		.init = xe_perf_sysctl_register,
> +		.exit = xe_perf_sysctl_unregister,
> +	},
>  };
>  
>  static int __init xe_init(void)
> diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
> index a130076b59aa2..37538e98dcc04 100644
> --- a/drivers/gpu/drm/xe/xe_perf.c
> +++ b/drivers/gpu/drm/xe/xe_perf.c
> @@ -4,9 +4,13 @@
>   */
>  
>  #include <linux/errno.h>
> +#include <linux/sysctl.h>
>  
>  #include "xe_perf.h"
>  
> +u32 xe_perf_stream_paranoid = true;
> +static struct ctl_table_header *sysctl_header;
> +
>  int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  {
>  	struct drm_xe_perf_param *arg = data;
> @@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		return -EINVAL;
>  	}
>  }
> +
> +static struct ctl_table perf_ctl_table[] = {
> +	{
> +	 .procname = "perf_stream_paranoid",
> +	 .data = &xe_perf_stream_paranoid,
> +	 .maxlen = sizeof(xe_perf_stream_paranoid),
> +	 .mode = 0644,
> +	 .proc_handler = proc_dointvec_minmax,
> +	 .extra1 = SYSCTL_ZERO,
> +	 .extra2 = SYSCTL_ONE,
> +	 },
> +	{}
We no longer need the sentinel moving forward. At the moment it works both
with and without a sentinel, but that check will be removed within the next
two releases. I suggest you just remove the sentinel and end it like this:

...
  .extra2 = SYSCTL_ONE,
  },
...

Furthermore: isn't the space after the tab in the struct initializer
supposed to be a tab?
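
Putting both comments together, the table would then look roughly like
this (a sketch only; the same applies to the oa_ctl_table added in patch
03):

```c
static struct ctl_table perf_ctl_table[] = {
	{
		.procname = "perf_stream_paranoid",
		.data = &xe_perf_stream_paranoid,
		.maxlen = sizeof(xe_perf_stream_paranoid),
		.mode = 0644,
		.proc_handler = proc_dointvec_minmax,
		.extra1 = SYSCTL_ZERO,
		.extra2 = SYSCTL_ONE,
	},
	/* no {} sentinel: per the comment above, the sentinel check is
	 * going away */
};
```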

best

> +};
> +
> +int xe_perf_sysctl_register(void)
> +{
> +	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
> +	return 0;
> +}
> +
> +void xe_perf_sysctl_unregister(void)
> +{
> +	unregister_sysctl_table(sysctl_header);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
> index 254cc7cf49fef..1ff0a07ebab30 100644
> --- a/drivers/gpu/drm/xe/xe_perf.h
> +++ b/drivers/gpu/drm/xe/xe_perf.h
> @@ -11,6 +11,10 @@
>  struct drm_device;
>  struct drm_file;
>  
> +extern u32 xe_perf_stream_paranoid;
> +
>  int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
> +int xe_perf_sysctl_register(void);
> +void xe_perf_sysctl_unregister(void);
>  
>  #endif
> -- 
> 2.41.0
> 

-- 

Joel Granados

* Re: [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl
  2023-12-08  6:43 ` [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl Ashutosh Dixit
  2023-12-14  0:58   ` Umesh Nerlige Ramappa
@ 2024-01-24 14:11   ` Joel Granados
  1 sibling, 0 replies; 68+ messages in thread
From: Joel Granados @ 2024-01-24 14:11 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-xe

On Thu, Dec 07, 2023 at 10:43:15PM -0800, Ashutosh Dixit wrote:
> Introduce oa_max_sample_rate sysctl to set a max limit on the frequency of
> periodic OA reports.
> 
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile    |  1 +
>  drivers/gpu/drm/xe/xe_device.c |  7 +++++
>  drivers/gpu/drm/xe/xe_module.c |  5 ++++
>  drivers/gpu/drm/xe/xe_oa.c     | 49 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_oa.h     | 16 +++++++++++
>  5 files changed, 78 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_oa.c
>  create mode 100644 drivers/gpu/drm/xe/xe_oa.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index b719953d9d30f..cf7e0e5261f73 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -98,6 +98,7 @@ xe-y += xe_bb.o \
>  	xe_mmio.o \
>  	xe_mocs.o \
>  	xe_module.o \
> +	xe_oa.o \
>  	xe_pat.o \
>  	xe_pci.o \
>  	xe_pcode.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 35616d1a81a31..744d573eb2720 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -29,6 +29,7 @@
>  #include "xe_irq.h"
>  #include "xe_mmio.h"
>  #include "xe_module.h"
> +#include "xe_oa.h"
>  #include "xe_pat.h"
>  #include "xe_pcode.h"
>  #include "xe_perf.h"
> @@ -480,6 +481,10 @@ int xe_device_probe(struct xe_device *xe)
>  
>  	xe_heci_gsc_init(xe);
>  
> +	err = xe_oa_init(xe);
> +	if (err)
> +		goto err_irq_shutdown;
> +
>  	err = xe_display_init(xe);
>  	if (err)
>  		goto err_irq_shutdown;
> @@ -526,6 +531,8 @@ void xe_device_remove(struct xe_device *xe)
>  
>  	xe_display_fini(xe);
>  
> +	xe_oa_fini(xe);
> +
>  	xe_heci_gsc_fini(xe);
>  
>  	xe_irq_shutdown(xe);
> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> index 8629330d928b0..176d3e6ec8464 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -10,6 +10,7 @@
>  
>  #include "xe_drv.h"
>  #include "xe_hw_fence.h"
> +#include "xe_oa.h"
>  #include "xe_pci.h"
>  #include "xe_perf.h"
>  #include "xe_pmu.h"
> @@ -76,6 +77,10 @@ static const struct init_funcs init_funcs[] = {
>  		.init = xe_perf_sysctl_register,
>  		.exit = xe_perf_sysctl_unregister,
>  	},
> +	{
> +		.init = xe_oa_sysctl_register,
> +		.exit = xe_oa_sysctl_unregister,
> +	},
>  };
>  
>  static int __init xe_init(void)
> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> new file mode 100644
> index 0000000000000..f4cacb4af47c5
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_oa.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#include <linux/sysctl.h>
> +
> +#include "xe_device.h"
> +#include "xe_oa.h"
> +
> +static int xe_oa_sample_rate_hard_limit;
> +static u32 xe_oa_max_sample_rate = 100000;
> +
> +static struct ctl_table_header *sysctl_header;
> +
> +int xe_oa_init(struct xe_device *xe)
> +{
> +	/* Choose a representative limit */
> +	xe_oa_sample_rate_hard_limit = xe_root_mmio_gt(xe)->info.reference_clock / 2;
> +	return 0;
> +}
> +
> +void xe_oa_fini(struct xe_device *xe)
> +{
> +}
> +
> +static struct ctl_table oa_ctl_table[] = {
> +	{
> +	 .procname = "oa_max_sample_rate",
> +	 .data = &xe_oa_max_sample_rate,
> +	 .maxlen = sizeof(xe_oa_max_sample_rate),
> +	 .mode = 0644,
> +	 .proc_handler = proc_dointvec_minmax,
> +	 .extra1 = SYSCTL_ZERO,
> +	 .extra2 = &xe_oa_sample_rate_hard_limit,
> +	 },
> +	{}
Remove sentinel here as well.

> +};
> +
> +int xe_oa_sysctl_register(void)
> +{
> +	sysctl_header = register_sysctl("dev/xe", oa_ctl_table);
> +	return 0;
> +}
> +
> +void xe_oa_sysctl_unregister(void)
> +{
> +	unregister_sysctl_table(sysctl_header);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
> new file mode 100644
> index 0000000000000..1b81330c9708b
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_oa.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#ifndef _XE_OA_H_
> +#define _XE_OA_H_
> +
> +struct xe_device;
> +
> +int xe_oa_init(struct xe_device *xe);
> +void xe_oa_fini(struct xe_device *xe);
> +int xe_oa_sysctl_register(void);
> +void xe_oa_sysctl_unregister(void);
> +
> +#endif
> -- 
> 2.41.0
> 

-- 

Joel Granados

* Re: [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap
  2024-01-20  3:11     ` Dixit, Ashutosh
@ 2024-02-06 23:51       ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 68+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-02-06 23:51 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-xe, Thomas Hellström

On Fri, Jan 19, 2024 at 07:11:22PM -0800, Dixit, Ashutosh wrote:
>On Fri, 22 Dec 2023 18:39:14 -0800, Umesh Nerlige Ramappa wrote:
>>
>
>Hi Umesh,
>
>> On Thu, Dec 07, 2023 at 10:43:27PM -0800, Ashutosh Dixit wrote:
>> > Allow the OA buffer to be mmap'd to userspace. This is needed for the MMIO
>> > trigger use case. Even otherwise, with whitelisted OA head/tail ptr
>> > registers, userspace can receive/interpret OA data from the mmap'd buffer
>> > without issuing read()'s on the OA stream fd.
>> >
>> > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/xe_oa.c | 53 ++++++++++++++++++++++++++++++++++++++
>> > 1 file changed, 53 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
>> > index 42f32d4359f2c..97779cbb83ee8 100644
>> > --- a/drivers/gpu/drm/xe/xe_oa.c
>> > +++ b/drivers/gpu/drm/xe/xe_oa.c
>> > @@ -898,6 +898,8 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream)
>> >		return PTR_ERR(bo);
>> >
>> >	stream->oa_buffer.bo = bo;
>> > +	/* mmap implementation requires OA buffer to be in system memory */
>> > +	xe_assert(stream->oa->xe, bo->vmap.is_iomem == 0);
>> >	stream->oa_buffer.vaddr = bo->vmap.vaddr;
>> >	return 0;
>> > }
>> > @@ -1174,6 +1176,9 @@ static int xe_oa_release(struct inode *inode, struct file *file)
>> >	struct xe_oa_stream *stream = file->private_data;
>> >	struct xe_gt *gt = stream->gt;
>> >
>> > +	/* Zap mmap's */
>> > +	unmap_mapping_range(file->f_mapping, 0, -1, 1);
>> > +
>> >	mutex_lock(&gt->oa.gt_lock);
>> >	xe_oa_destroy_locked(stream);
>> >	mutex_unlock(&gt->oa.gt_lock);
>> > @@ -1184,6 +1189,53 @@ static int xe_oa_release(struct inode *inode, struct file *file)
>> >	return 0;
>> > }
>> >
>> > +static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma)
>> > +{
>> > +	struct xe_oa_stream *stream = file->private_data;
>> > +	struct xe_bo *bo = stream->oa_buffer.bo;
>> > +	unsigned long start = vma->vm_start;
>> > +	int i, ret;
>> > +
>> > +	if (xe_perf_stream_paranoid && !perfmon_capable()) {
>> > +		drm_dbg(&stream->oa->xe->drm, "Insufficient privilege to map OA buffer\n");
>> > +		return -EACCES;
>> > +	}
>> > +
>> > +	/* Can mmap the entire OA buffer or nothing (no partial OA buffer mmaps) */
>> > +	if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) {
>> > +		drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must be OA buffer size\n");
>> > +		return -EINVAL;
>> > +	}
>> > +
>> > +	/* Only support VM_READ, enforce MAP_PRIVATE by checking for VM_MAYSHARE */
>> > +	if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_SHARED | VM_MAYSHARE)) {
>> > +		drm_dbg(&stream->oa->xe->drm, "mmap must be read only\n");
>> > +		return -EINVAL;
>> > +	}
>> > +
>> > +	vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
>> > +
>> > +	/*
>> > +	 * If the privileged parent forks and child drops root privilege, we do not want
>> > +	 * the child to retain access to the mapped OA buffer. Explicitly set VM_DONTCOPY
>> > +	 * to avoid such cases.
>> > +	 */
>> > +	vm_flags_set(vma, vma->vm_flags | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY);
>>
>> Would help to just use the vm_flags_mod where you can specify both set and
>> clear flags.
>
>Yes, good idea, done.
>
>>
>> And then just to be paranoid about it, maybe add an assert to check that
>> the flags applied correctly.
>
>Don't think this is needed, code in vm_flags_mod is pretty clear. So
>skipping this.
>
>> Assuming you ran the existing mmap tests for this.
>
>Yes, existing IGT's have been ported to Xe.
>
>> I think we should also add an mremap case. I think that should fail with
>> EINVAL since this is a private mapping.
>
>Ah, mremap description says "mremap() expands (or shrinks) an existing
>memory mapping" so the relevant flag seems to be VM_DONTEXPAND rather than
>VM_SHARED/VM_MAYSHARE. I will see about adding a mremap test.
>
>>
>> > +
>> > +	xe_assert(stream->oa->xe, bo->ttm.ttm->num_pages ==
>> > +		  (vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
>> > +	for (i = 0; i < bo->ttm.ttm->num_pages; i++) {
>> > +		ret = remap_pfn_range(vma, start, page_to_pfn(bo->ttm.ttm->pages[i]),
>> > +				      PAGE_SIZE, vma->vm_page_prot);
>>
>> vma->vm_page_prot is set to the state of vm_flags that existed at the
>> mmap_region() level. We have modified those flags here and we must update
>> the vma_page_prot with vm_get_page_prot(vma->vm_flags).
>
>If we need to do this I think the call to use would be
>vma_set_page_prot(). But, looking at vm_get_page_prot, vm_flags which
>affect vma->vm_page_prot are (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED). And note
>that in this function we have not modified any of these, if
>(VM_WRITE|VM_EXEC|VM_SHARED) is set, we just return error
>(vma->vm_page_prot seems to determine what we write to the PTE's and the
>flags we are modifying here are more like "software" vm_flags).
>
>Also we would see some test failures if we didn't do this correctly, and we
>are not seeing test failures.
>
>So I am a little torn about this, vma_set_page_prot() doesn't seem to be
>really needed, but may be nice to have, but it is also rarely used in the
>kernel.

I think I misread this earlier. I agree with your description, so I am 
okay to leave it as is.

Thanks,
Umesh
>
>I am copying Thomas and let's see what he thinks. I'm skipping adding this
>for now but if Thomas or you say we should add it, I'll go ahead add it.
>
>Thanks.
>--
>Ashutosh

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-03-12  3:38 [PATCH v12 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-03-12  3:39 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-03-12  3:39 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 110b69864656..42085a4b56be 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -66,6 +67,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index a130076b59aa..37538e98dcc0 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,9 +4,13 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index 254cc7cf49fe..1ff0a07ebab3 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -11,6 +11,10 @@
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-03-15  1:35 [PATCH 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-03-15  1:35 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-03-15  1:35 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 110b69864656..42085a4b56be 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -66,6 +67,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index a130076b59aa..37538e98dcc0 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,9 +4,13 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index 254cc7cf49fe..1ff0a07ebab3 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -11,6 +11,10 @@
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-05-24 19:01 [PATCH v14 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-05-24 19:01 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-05-24 19:01 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 3edeb30d5ccb..893858a2eea0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -78,6 +79,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index a130076b59aa..37538e98dcc0 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,9 +4,13 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index 254cc7cf49fe..1ff0a07ebab3 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -11,6 +11,10 @@
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-05-27  1:43 [PATCH v15 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-05-27  1:43 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-05-27  1:43 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 3edeb30d5ccb..893858a2eea0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -78,6 +79,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index a130076b59aa..37538e98dcc0 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,9 +4,13 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index 254cc7cf49fe..1ff0a07ebab3 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -11,6 +11,10 @@
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-06-07 20:43 [PATCH v16 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-06-07 20:43 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-06-07 20:43 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 3edeb30d5ccb..893858a2eea0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -78,6 +79,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index a130076b59aa..37538e98dcc0 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,9 +4,13 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_xe_perf_param *arg = data;
@@ -19,3 +23,27 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index 254cc7cf49fe..1ff0a07ebab3 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -11,6 +11,10 @@
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-06-12  2:05 [PATCH v17 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-06-12  2:05 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-06-12  2:05 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

v2: Add kernel doc for non-static functions (Michal)

Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 40 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  6 +++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 3edeb30d5ccb..893858a2eea0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -78,6 +79,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index 2963174ecd0e..f619cf50b453 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,11 +4,15 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include <drm/xe_drm.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 /**
  * xe_perf_ioctl - The top level perf layer ioctl
  * @dev: @drm_device
@@ -32,3 +36,39 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+/**
+ * xe_perf_sysctl_register - Register "perf_stream_paranoid" sysctl
+ *
+ * Normally only superuser/root can access perf counter data. However,
+ * superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
+ * users to also access perf data.
+ *
+ * Return: always returns 0
+ */
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+/**
+ * xe_perf_sysctl_unregister - Unregister "perf_stream_paranoid" sysctl
+ */
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index e7e258eaf0a9..53a8377a1bb1 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -6,9 +6,15 @@
 #ifndef _XE_PERF_H_
 #define _XE_PERF_H_
 
+#include <linux/types.h>
+
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-06-17 22:36 [PATCH v18 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-06-17 22:36 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-06-17 22:36 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

v2: Add kernel doc for non-static functions (Michal)

Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 40 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  6 +++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 3edeb30d5ccb..893858a2eea0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -78,6 +79,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index 2963174ecd0e..f619cf50b453 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,11 +4,15 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include <drm/xe_drm.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 /**
  * xe_perf_ioctl - The top level perf layer ioctl
  * @dev: @drm_device
@@ -32,3 +36,39 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+/**
+ * xe_perf_sysctl_register - Register "perf_stream_paranoid" sysctl
+ *
+ * Normally only superuser/root can access perf counter data. However,
+ * superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
+ * users to also access perf data.
+ *
+ * Return: always returns 0
+ */
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+/**
+ * xe_perf_sysctl_unregister - Unregister "perf_stream_paranoid" sysctl
+ */
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index e7e258eaf0a9..53a8377a1bb1 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -6,9 +6,15 @@
 #ifndef _XE_PERF_H_
 #define _XE_PERF_H_
 
+#include <linux/types.h>
+
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl
  2024-06-18  1:45 [PATCH v19 00/17] Add OA functionality to Xe Ashutosh Dixit
@ 2024-06-18  1:45 ` Ashutosh Dixit
  0 siblings, 0 replies; 68+ messages in thread
From: Ashutosh Dixit @ 2024-06-18  1:45 UTC (permalink / raw)
  To: intel-xe

Normally only superuser/root can access perf counter data. However,
superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
users to also access perf data. perf_stream_paranoid is introduced at the
perf layer to allow different perf stream types to share this access
mechanism.

v2: Add kernel doc for non-static functions (Michal)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/xe_module.c |  5 +++++
 drivers/gpu/drm/xe/xe_perf.c   | 40 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_perf.h   |  6 +++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 3edeb30d5ccb..893858a2eea0 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_perf.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -78,6 +79,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
 	},
+	{
+		.init = xe_perf_sysctl_register,
+		.exit = xe_perf_sysctl_unregister,
+	},
 };
 
 static int __init xe_init(void)
diff --git a/drivers/gpu/drm/xe/xe_perf.c b/drivers/gpu/drm/xe/xe_perf.c
index 2963174ecd0e..f619cf50b453 100644
--- a/drivers/gpu/drm/xe/xe_perf.c
+++ b/drivers/gpu/drm/xe/xe_perf.c
@@ -4,11 +4,15 @@
  */
 
 #include <linux/errno.h>
+#include <linux/sysctl.h>
 
 #include <drm/xe_drm.h>
 
 #include "xe_perf.h"
 
+u32 xe_perf_stream_paranoid = true;
+static struct ctl_table_header *sysctl_header;
+
 /**
  * xe_perf_ioctl - The top level perf layer ioctl
  * @dev: @drm_device
@@ -32,3 +36,39 @@ int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		return -EINVAL;
 	}
 }
+
+static struct ctl_table perf_ctl_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &xe_perf_stream_paranoid,
+	 .maxlen = sizeof(xe_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = SYSCTL_ZERO,
+	 .extra2 = SYSCTL_ONE,
+	 },
+	{}
+};
+
+/**
+ * xe_perf_sysctl_register - Register "perf_stream_paranoid" sysctl
+ *
+ * Normally only superuser/root can access perf counter data. However,
+ * superuser can set perf_stream_paranoid sysctl to 0 to allow non-privileged
+ * users to also access perf data.
+ *
+ * Return: always returns 0
+ */
+int xe_perf_sysctl_register(void)
+{
+	sysctl_header = register_sysctl("dev/xe", perf_ctl_table);
+	return 0;
+}
+
+/**
+ * xe_perf_sysctl_unregister - Unregister "perf_stream_paranoid" sysctl
+ */
+void xe_perf_sysctl_unregister(void)
+{
+	unregister_sysctl_table(sysctl_header);
+}
diff --git a/drivers/gpu/drm/xe/xe_perf.h b/drivers/gpu/drm/xe/xe_perf.h
index e7e258eaf0a9..53a8377a1bb1 100644
--- a/drivers/gpu/drm/xe/xe_perf.h
+++ b/drivers/gpu/drm/xe/xe_perf.h
@@ -6,9 +6,15 @@
 #ifndef _XE_PERF_H_
 #define _XE_PERF_H_
 
+#include <linux/types.h>
+
 struct drm_device;
 struct drm_file;
 
+extern u32 xe_perf_stream_paranoid;
+
 int xe_perf_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_perf_sysctl_register(void);
+void xe_perf_sysctl_unregister(void);
 
 #endif
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2024-06-18  1:46 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-08  6:43 [PATCH v7 00/17] Add OA functionality to Xe Ashutosh Dixit
2023-12-08  6:43 ` [PATCH 01/17] drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types Ashutosh Dixit
2023-12-08  6:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2023-12-14  0:57   ` Umesh Nerlige Ramappa
2023-12-19 20:28   ` Dixit, Ashutosh
2024-01-20  2:35     ` Dixit, Ashutosh
2024-01-24 14:10   ` Joel Granados
2023-12-08  6:43 ` [PATCH 03/17] drm/xe/oa/uapi: Add oa_max_sample_rate sysctl Ashutosh Dixit
2023-12-14  0:58   ` Umesh Nerlige Ramappa
2024-01-20  2:36     ` Dixit, Ashutosh
2024-01-24 14:11   ` Joel Granados
2023-12-08  6:43 ` [PATCH 04/17] drm/xe/oa/uapi: Add OA data formats Ashutosh Dixit
2023-12-19  1:11   ` Umesh Nerlige Ramappa
2023-12-19  1:17     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 05/17] drm/xe/oa/uapi: Initialize OA units Ashutosh Dixit
2023-12-19 16:11   ` Umesh Nerlige Ramappa
2024-01-20  2:43     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 06/17] drm/xe/oa/uapi: Add/remove OA config perf ops Ashutosh Dixit
2023-12-19 19:10   ` Umesh Nerlige Ramappa
2024-01-20  2:44     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 07/17] drm/xe/oa/uapi: Define and parse OA stream properties Ashutosh Dixit
2023-12-09 22:53   ` Dixit, Ashutosh
2023-12-19  2:59   ` Dixit, Ashutosh
2023-12-19 16:26     ` Umesh Nerlige Ramappa
2023-12-19 16:29       ` Lionel Landwerlin
2023-12-19 16:40         ` Umesh Nerlige Ramappa
2023-12-19 17:48           ` Lionel Landwerlin
2023-12-19 23:23   ` Umesh Nerlige Ramappa
2024-01-20  2:48     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 08/17] drm/xe/oa: OA stream initialization (OAG) Ashutosh Dixit
2023-12-20  2:31   ` Umesh Nerlige Ramappa
2024-01-20  2:49     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 09/17] drm/xe/oa/uapi: Expose OA stream fd Ashutosh Dixit
2023-12-20  2:52   ` Umesh Nerlige Ramappa
2024-01-20  2:50     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 10/17] drm/xe/oa/uapi: Read file_operation Ashutosh Dixit
2023-12-20  3:01   ` Umesh Nerlige Ramappa
2024-01-20  2:51     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 11/17] drm/xe/oa: Disable overrun mode for Xe2+ OAG Ashutosh Dixit
2023-12-20  3:05   ` Umesh Nerlige Ramappa
2024-01-20  2:51     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 12/17] drm/xe/oa: Add OAR support Ashutosh Dixit
2023-12-20  4:37   ` Umesh Nerlige Ramappa
2023-12-08  6:43 ` [PATCH 13/17] drm/xe/oa: Add OAC support Ashutosh Dixit
2023-12-20  4:59   ` Umesh Nerlige Ramappa
2024-01-20  2:52     ` FIXME " Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 14/17] drm/xe/oa/uapi: Query OA unit properties Ashutosh Dixit
2023-12-23  0:40   ` Umesh Nerlige Ramappa
2024-01-20  3:10     ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 15/17] drm/xe/oa/uapi: OA buffer mmap Ashutosh Dixit
2023-12-23  2:39   ` Umesh Nerlige Ramappa
2024-01-20  3:11     ` Dixit, Ashutosh
2024-02-06 23:51       ` Umesh Nerlige Ramappa
2024-01-02 11:16   ` Thomas Hellström
2024-01-08 19:50     ` Umesh Nerlige Ramappa
2024-01-09  5:14       ` Dixit, Ashutosh
2023-12-08  6:43 ` [PATCH 16/17] drm/xe/oa: Add MMIO trigger support Ashutosh Dixit
2023-12-20  4:35   ` Umesh Nerlige Ramappa
2023-12-08  6:43 ` [PATCH 17/17] drm/xe/oa: Override GuC RC with OA on PVC Ashutosh Dixit
2023-12-08  9:22 ` ✗ CI.Patch_applied: failure for Add OA functionality to Xe (rev7) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2024-03-12  3:38 [PATCH v12 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-03-12  3:39 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-03-15  1:35 [PATCH 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-03-15  1:35 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-05-24 19:01 [PATCH v14 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-05-24 19:01 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-05-27  1:43 [PATCH v15 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-05-27  1:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-06-07 20:43 [PATCH v16 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-06-07 20:43 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-06-12  2:05 [PATCH v17 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-06-12  2:05 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-06-17 22:36 [PATCH v18 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-06-17 22:36 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit
2024-06-18  1:45 [PATCH v19 00/17] Add OA functionality to Xe Ashutosh Dixit
2024-06-18  1:45 ` [PATCH 02/17] drm/xe/perf/uapi: Add perf_stream_paranoid sysctl Ashutosh Dixit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox