Intel-XE Archive on lore.kernel.org
* [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3
@ 2024-12-09 13:32 Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 01/26] ptrace: export ptrace_may_access Mika Kuoppala
                   ` (31 more replies)
  0 siblings, 32 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Mika Kuoppala

Hi,

This is a continuation of the first and second submissions of
Intel Xe GPU debug support:

v1: https://lists.freedesktop.org/archives/intel-xe/2024-July/043605.html
v2: https://lists.freedesktop.org/archives/intel-xe/2024-October/052260.html

New features in v3:

 - EXEC_QUEUE_PLACEMENT events providing detailed information
   about the engines participating in an exec queue. (Dominik Grzegorzek)

 - EU thread page fault support (Gwan-gyeong Mun)

 - Fixed access to VRAM-backed storage (Matthew Brost)
   Essential for BMG enablement. This work has already been merged
   into the xe driver, and eudebug takes advantage of it
   (ttm_bo_access) [8].
   
 - Support for Pantherlake (Dominik Grzegorzek)

v3 supports:
 - Lunarlake (LNL)
 - Battlemage (BMG)
 - Pantherlake (PTL)

Thanks to all contributors!

Latest code can be found in:
[1] https://gitlab.freedesktop.org/miku/kernel/-/tree/eudebug-dev

Branch for this submission:
[2] https://gitlab.freedesktop.org/miku/kernel/-/tree/eudebug-v3

README/instructions:
[3] https://gitlab.freedesktop.org/miku/kernel

IGT tests (need the config switch 'xe_eudebug' to be set):
[4] https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
https://gitlab.freedesktop.org/gfx-ci/i915-infra/-/blob/master/kconfig/debug.kconfig

The userspace consumer of this uapi:
[5] https://github.com/intel/compute-runtime
Event loop and thread control interaction can be found at:
https://github.com/intel/compute-runtime/tree/master/level_zero/tools/source/debug/linux/xe
And the wrappers in:
https://github.com/intel/compute-runtime/tree/master/shared/source/os_interface/linux/xe
https://github.com/intel/compute-runtime/blob/master/shared/source/os_interface/linux/xe/ioctl_helper_xe_debugger.cpp
Note that XE support is disabled by default; you will need
NEO_ENABLE_XE_EU_DEBUG_SUPPORT enabled in order to test.

GDB support:
[6] https://github.com/intel/gdb/tree/upstream/intelgt-mvp
[7] https://github.com/intel/gdb/tree/upstream/intelgt-mvp-plus
The GDB team is preparing its own mailing list submission based on the
above and on v3. I will reply to this cover letter and update the README
when that happens.

[8]: https://lists.freedesktop.org/archives/intel-xe/2024-November/060247.html
Fix non-contiguous VRAM BO access in Xe

Thanks,
Mika

Andrzej Hajda (2):
  drm/xe: add system memory page iterator support to xe_res_cursor
  drm/xe/eudebug: implement userptr_vma access

Christoph Manszewski (3):
  drm/xe/eudebug: Add vm bind and vm bind ops
  drm/xe/eudebug: Dynamically toggle debugger functionality
  drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test

Dominik Grzegorzek (11):
  drm/xe/eudebug: Introduce exec_queue events
  drm/xe/eudebug: Introduce exec queue placements event
  drm/xe/eudebug: hw enablement for eudebug
  drm/xe: Add EUDEBUG_ENABLE exec queue property
  drm/xe/eudebug: Introduce per device attention scan worker
  drm/xe/eudebug: Introduce EU control interface
  drm/xe: Debug metadata create/destroy ioctls
  drm/xe: Attach debug metadata to vma
  drm/xe/eudebug: Add debug metadata support for xe_eudebug
  drm/xe/eudebug/ptl: Add support for extra attention register
  drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3

Gwan-gyeong Mun (4):
  drm/xe/eudebug: Add read/count/compare helper for eu attention
  drm/xe/eudebug: Introduce EU pagefault handling interface
  drm/xe/vm: Support for adding null page VMA to VM on request
  drm/xe/eudebug: Enable EU pagefault handling

Mika Kuoppala (6):
  ptrace: export ptrace_may_access
  drm/xe/eudebug: Introduce eudebug support
  drm/xe/eudebug: Introduce discovery for resources
  drm/xe/eudebug: Add UFENCE events with acks
  drm/xe/eudebug: vm open/pread/pwrite
  drm/xe/eudebug: Implement vm_bind_op discovery

 drivers/gpu/drm/xe/Kconfig                   |   10 +
 drivers/gpu/drm/xe/Makefile                  |    4 +
 drivers/gpu/drm/xe/regs/xe_engine_regs.h     |    7 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h         |   43 +
 drivers/gpu/drm/xe/tests/xe_eudebug.c        |  176 +
 drivers/gpu/drm/xe/tests/xe_live_test_mod.c  |    5 +
 drivers/gpu/drm/xe/xe_debug_metadata.c       |  233 +
 drivers/gpu/drm/xe/xe_debug_metadata.h       |  102 +
 drivers/gpu/drm/xe/xe_debug_metadata_types.h |   25 +
 drivers/gpu/drm/xe/xe_device.c               |   25 +-
 drivers/gpu/drm/xe/xe_device.h               |   36 +
 drivers/gpu/drm/xe/xe_device_types.h         |   54 +
 drivers/gpu/drm/xe/xe_eudebug.c              | 4451 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_eudebug.h              |  128 +
 drivers/gpu/drm/xe/xe_eudebug_types.h        |  448 ++
 drivers/gpu/drm/xe/xe_exec.c                 |    2 +-
 drivers/gpu/drm/xe/xe_exec_queue.c           |   56 +-
 drivers/gpu/drm/xe/xe_exec_queue.h           |    2 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h     |    7 +
 drivers/gpu/drm/xe/xe_execlist.c             |    2 +-
 drivers/gpu/drm/xe/xe_gt_debug.c             |  212 +
 drivers/gpu/drm/xe/xe_gt_debug.h             |   46 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c         |   87 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.h         |    2 +
 drivers/gpu/drm/xe/xe_hw_engine.c            |    1 +
 drivers/gpu/drm/xe/xe_lrc.c                  |   16 +-
 drivers/gpu/drm/xe/xe_lrc.h                  |    4 +-
 drivers/gpu/drm/xe/xe_oa.c                   |    3 +-
 drivers/gpu/drm/xe/xe_reg_sr.c               |   21 +-
 drivers/gpu/drm/xe/xe_reg_sr.h               |    4 +-
 drivers/gpu/drm/xe/xe_res_cursor.h           |   51 +-
 drivers/gpu/drm/xe/xe_rtp.c                  |    2 +-
 drivers/gpu/drm/xe/xe_sync.c                 |   45 +-
 drivers/gpu/drm/xe/xe_sync.h                 |    8 +-
 drivers/gpu/drm/xe/xe_sync_types.h           |   28 +-
 drivers/gpu/drm/xe/xe_vm.c                   |  196 +-
 drivers/gpu/drm/xe/xe_vm.h                   |    5 +
 drivers/gpu/drm/xe/xe_vm_types.h             |   40 +
 drivers/gpu/drm/xe/xe_wa_oob.rules           |    2 +
 include/uapi/drm/xe_drm.h                    |   96 +-
 include/uapi/drm/xe_drm_eudebug.h            |  256 +
 kernel/ptrace.c                              |    1 +
 42 files changed, 6869 insertions(+), 73 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/tests/xe_eudebug.c
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.c
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.h
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug.c
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug.h
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.c
 create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.h
 create mode 100644 include/uapi/drm/xe_drm_eudebug.h

-- 
2.43.0



* [PATCH 01/26] ptrace: export ptrace_may_access
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-10  4:29   ` Christoph Hellwig
  2024-12-09 13:32 ` [PATCH 02/26] drm/xe/eudebug: Introduce eudebug support Mika Kuoppala
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Mika Kuoppala, Oleg Nesterov,
	linux-kernel, Dave Airlie, Lucas De Marchi, Matthew Brost,
	Andi Shyti, Joonas Lahtinen, Maciej Patelczyk, Dominik Grzegorzek,
	Jonathan Cavitt, Andi Shyti

The xe driver would like to allow fine-grained access control for a
GDB debugger using ptrace. Without this export, the only option would
be to check for CAP_SYS_ADMIN.

The check intended for an ioctl to attach a GPU debugger
is similar to the ptrace use case: allow a calling process
to manipulate a target process if it has the necessary
capabilities or the same permissions, as described in
Documentation/process/adding-syscalls.rst.

Export the ptrace_may_access function so that a GPU debugger is
subject to the same access control as a CPU debugger.

v2: proper commit message (Lucas)

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: Dave Airlie <airlied@redhat.com>
CC: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
CC: Andi Shyti <andi.shyti@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
CC: Maciej Patelczyk <maciej.patelczyk@linux.intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 kernel/ptrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d5f89f9ef29f..86be1805ebd8 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -354,6 +354,7 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 	task_unlock(task);
 	return !err;
 }
+EXPORT_SYMBOL_GPL(ptrace_may_access);
 
 static int check_ptrace_options(unsigned long data)
 {
-- 
2.43.0



* [PATCH 02/26] drm/xe/eudebug: Introduce eudebug support
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 01/26] ptrace: export ptrace_may_access Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 03/26] drm/xe/eudebug: Introduce discovery for resources Mika Kuoppala
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Mika Kuoppala, Maarten Lankhorst,
	Lucas De Marchi, Dominik Grzegorzek, Andi Shyti, Matt Roper,
	Matthew Brost, Zbigniew Kempczyński, Andrzej Hajda,
	Maciej Patelczyk, Jonathan Cavitt, Christoph Manszewski

With the eudebug event interface, a user space debugger process (like
gdb) is able to keep track of resources created by another process
(a debuggee using drm/xe) and act upon those resources.

For example, the debugger can find a client vm which contains the
isa/elf for a particular shader/eu-kernel and then inspect and modify
it (for example, installing a breakpoint).

The debugger first opens a connection to xe with a drm ioctl specifying
the target pid to connect to. This returns an anon fd handle that can
then be used to listen for events with a dedicated ioctl.

This patch introduces the eudebug connection and event queuing, adding
client create/destroy and vm create/destroy events as a baseline.
More events needed for full debugger operation will be introduced
in follow-up patches.

The resource tracking parts are inspired by the work of
Maciej Patelczyk on resource handling for i915. Chris Wilson
suggested the improvement of a two-way mapping, which makes it easy
to use the resource map as definitive bookkeeping of what resources
have been replayed to the debugger in the discovery phase (in a
follow-up patch).

v2: - Kconfig support (Matthew)
    - ptraced access control (Lucas)
    - pass expected event length to user (Zbigniew)
    - only track long running VMs
    - checkpatch (Tilak)
    - include order (Andrzej)
    - 32bit fixes (Andrzej)
    - cleaner get_task_struct
    - remove xa_array and use clients.list for tracking (Mika)

v3: - adapt to removal of clients.lock (Mika)
    - create_event cleanup (Christoph)

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
---
 drivers/gpu/drm/xe/Kconfig            |   10 +
 drivers/gpu/drm/xe/Makefile           |    2 +
 drivers/gpu/drm/xe/xe_device.c        |   10 +
 drivers/gpu/drm/xe/xe_device_types.h  |   35 +
 drivers/gpu/drm/xe/xe_eudebug.c       | 1118 +++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_eudebug.h       |   46 +
 drivers/gpu/drm/xe/xe_eudebug_types.h |  169 ++++
 drivers/gpu/drm/xe/xe_vm.c            |    7 +-
 include/uapi/drm/xe_drm.h             |   21 +
 include/uapi/drm/xe_drm_eudebug.h     |   56 ++
 10 files changed, 1473 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug.c
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug.h
 create mode 100644 drivers/gpu/drm/xe/xe_eudebug_types.h
 create mode 100644 include/uapi/drm/xe_drm_eudebug.h

diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index b51a2bde73e2..ed97730b1af3 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -87,6 +87,16 @@ config DRM_XE_FORCE_PROBE
 
 	  Use "!*" to block the probe of the driver for all known devices.
 
+config DRM_XE_EUDEBUG
+	bool "Enable gdb debugger support (eudebug)"
+	depends on DRM_XE
+	default y
+	help
+	  Choose this option if you want to add support for a debugger (gdb)
+	  to attach to a process using Xe and debug gpu/gpgpu programs.
+	  With debugger support, Xe will provide an interface for a debugger
+	  process to track, inspect and modify resources.
+
 menu "drm/Xe Debugging"
 depends on DRM_XE
 depends on EXPERT
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 7730e0596299..deabcdd3ea52 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -116,6 +116,8 @@ xe-y += xe_bb.o \
 	xe_wa.o \
 	xe_wopcm.o
 
+xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o
+
 xe-$(CONFIG_HMM_MIRROR) += xe_hmm.o
 
 # graphics hardware monitoring (HWMON) support
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index d6fccea1e083..9ed0de1eba0b 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -28,6 +28,7 @@
 #include "xe_dma_buf.h"
 #include "xe_drm_client.h"
 #include "xe_drv.h"
+#include "xe_eudebug.h"
 #include "xe_exec.h"
 #include "xe_exec_queue.h"
 #include "xe_force_wake.h"
@@ -100,6 +101,8 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
 		put_task_struct(task);
 	}
 
+	xe_eudebug_file_open(xef);
+
 	return 0;
 }
 
@@ -153,6 +156,8 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
 
 	xe_pm_runtime_get(xe);
 
+	xe_eudebug_file_close(xef);
+
 	/*
 	 * No need for exec_queue.lock here as there is no contention for it
 	 * when FD is closing as IOCTLs presumably can't be modifying the
@@ -191,6 +196,7 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_EUDEBUG_CONNECT, xe_eudebug_connect_ioctl, DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
@@ -281,6 +287,8 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
 {
 	struct xe_device *xe = to_xe_device(dev);
 
+	xe_eudebug_fini(xe);
+
 	if (xe->preempt_fence_wq)
 		destroy_workqueue(xe->preempt_fence_wq);
 
@@ -352,6 +360,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	INIT_LIST_HEAD(&xe->pinned.external_vram);
 	INIT_LIST_HEAD(&xe->pinned.evicted);
 
+	xe_eudebug_init(xe);
+
 	xe->preempt_fence_wq = alloc_ordered_workqueue("xe-preempt-fence-wq",
 						       WQ_MEM_RECLAIM);
 	xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 1373a222f5a5..9f04e6476195 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -383,6 +383,17 @@ struct xe_device {
 		struct workqueue_struct *wq;
 	} sriov;
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	/** @clients: eudebug clients info */
+	struct {
+		/** @clients.lock: Protects client list */
+		spinlock_t lock;
+
+		/** @clients.list: client list for eudebug discovery */
+		struct list_head list;
+	} clients;
+#endif
+
 	/** @usm: unified memory state */
 	struct {
 		/** @usm.asid: convert a ASID to VM */
@@ -525,6 +536,23 @@ struct xe_device {
 	u8 vm_inject_error_position;
 #endif
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	/** @eudebug: debugger connection list and globals for device */
+	struct {
+		/** @lock: protects the list of connections */
+		spinlock_t lock;
+
+		/** @list: list of connections, aka debuggers */
+		struct list_head list;
+
+		/** @session_count: session counter to track connections */
+		u64 session_count;
+
+		/** @available: is the debugging functionality available */
+		bool available;
+	} eudebug;
+#endif
+
 	/* private: */
 
 #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
@@ -642,6 +670,13 @@ struct xe_file {
 
 	/** @refcount: ref count of this xe file */
 	struct kref refcount;
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	struct {
+		/** @client_link: list entry in xe_device.clients.list */
+		struct list_head client_link;
+	} eudebug;
+#endif
 };
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
new file mode 100644
index 000000000000..bbb5f1e81bb8
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -0,0 +1,1118 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/delay.h>
+#include <linux/poll.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_managed.h>
+
+#include "xe_assert.h"
+#include "xe_device.h"
+#include "xe_eudebug.h"
+#include "xe_eudebug_types.h"
+#include "xe_macros.h"
+#include "xe_vm.h"
+
+/*
+ * If no event read by userspace is detected during this period, assume a
+ * userspace problem and disconnect the debugger to allow forward progress.
+ */
+#define XE_EUDEBUG_NO_READ_DETECTED_TIMEOUT_MS (25 * 1000)
+
+#define for_each_debugger_rcu(debugger, head) \
+	list_for_each_entry_rcu((debugger), (head), connection_link)
+#define for_each_debugger(debugger, head) \
+	list_for_each_entry((debugger), (head), connection_link)
+
+#define cast_event(T, event) container_of((event), typeof(*(T)), base)
+
+#define XE_EUDEBUG_DBG_STR "eudbg: %lld:%lu:%s (%d/%d) -> (%d/%d): "
+#define XE_EUDEBUG_DBG_ARGS(d) (d)->session, \
+		atomic_long_read(&(d)->events.seqno), \
+		READ_ONCE(d->connection.status) <= 0 ? "disconnected" : "", \
+		current->pid, \
+		task_tgid_nr(current), \
+		(d)->target_task->pid, \
+		task_tgid_nr((d)->target_task)
+
+#define eu_err(d, fmt, ...) drm_err(&(d)->xe->drm, XE_EUDEBUG_DBG_STR # fmt, \
+				    XE_EUDEBUG_DBG_ARGS(d), ##__VA_ARGS__)
+#define eu_warn(d, fmt, ...) drm_warn(&(d)->xe->drm, XE_EUDEBUG_DBG_STR # fmt, \
+				      XE_EUDEBUG_DBG_ARGS(d), ##__VA_ARGS__)
+#define eu_dbg(d, fmt, ...) drm_dbg(&(d)->xe->drm, XE_EUDEBUG_DBG_STR # fmt, \
+				    XE_EUDEBUG_DBG_ARGS(d), ##__VA_ARGS__)
+
+#define xe_eudebug_assert(d, ...) xe_assert((d)->xe, ##__VA_ARGS__)
+
+#define struct_member(T, member) (((T *)0)->member)
+
+/* Keep 1:1 parity with uapi events */
+#define write_member(T_out, ptr, member, value) { \
+	BUILD_BUG_ON(sizeof(*ptr) != sizeof(T_out)); \
+	BUILD_BUG_ON(offsetof(typeof(*ptr), member) != \
+		     offsetof(typeof(T_out), member)); \
+	BUILD_BUG_ON(sizeof(ptr->member) != sizeof(value)); \
+	BUILD_BUG_ON(sizeof(struct_member(T_out, member)) != sizeof(value)); \
+	BUILD_BUG_ON(!typecheck(typeof((ptr)->member), value));	\
+	(ptr)->member = (value); \
+	}
+
+static struct xe_eudebug_event *
+event_fifo_pending(struct xe_eudebug *d)
+{
+	struct xe_eudebug_event *event;
+
+	if (kfifo_peek(&d->events.fifo, &event))
+		return event;
+
+	return NULL;
+}
+
+/*
+ * This is racy as we don't take the lock for read, but all the
+ * callsites can handle the race so we can live without the lock.
+ */
+__no_kcsan
+static unsigned int
+event_fifo_num_events_peek(const struct xe_eudebug * const d)
+{
+	return kfifo_len(&d->events.fifo);
+}
+
+static bool
+xe_eudebug_detached(struct xe_eudebug *d)
+{
+	int status;
+
+	spin_lock(&d->connection.lock);
+	status = d->connection.status;
+	spin_unlock(&d->connection.lock);
+
+	return status <= 0;
+}
+
+static int
+xe_eudebug_error(const struct xe_eudebug * const d)
+{
+	const int status = READ_ONCE(d->connection.status);
+
+	return status <= 0 ? status : 0;
+}
+
+static unsigned int
+event_fifo_has_events(struct xe_eudebug *d)
+{
+	if (xe_eudebug_detached(d))
+		return 1;
+
+	return event_fifo_num_events_peek(d);
+}
+
+static const struct rhashtable_params rhash_res = {
+	.head_offset = offsetof(struct xe_eudebug_handle, rh_head),
+	.key_len = sizeof_field(struct xe_eudebug_handle, key),
+	.key_offset = offsetof(struct xe_eudebug_handle, key),
+	.automatic_shrinking = true,
+};
+
+static struct xe_eudebug_resource *
+resource_from_type(struct xe_eudebug_resources * const res, const int t)
+{
+	return &res->rt[t];
+}
+
+static struct xe_eudebug_resources *
+xe_eudebug_resources_alloc(void)
+{
+	struct xe_eudebug_resources *res;
+	int err;
+	int i;
+
+	res = kzalloc(sizeof(*res), GFP_ATOMIC);
+	if (!res)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&res->lock);
+
+	for (i = 0; i < XE_EUDEBUG_RES_TYPE_COUNT; i++) {
+		xa_init_flags(&res->rt[i].xa, XA_FLAGS_ALLOC1);
+		err = rhashtable_init(&res->rt[i].rh, &rhash_res);
+
+		if (err)
+			break;
+	}
+
+	if (err) {
+		while (i--) {
+			xa_destroy(&res->rt[i].xa);
+			rhashtable_destroy(&res->rt[i].rh);
+		}
+
+		kfree(res);
+		return ERR_PTR(err);
+	}
+
+	return res;
+}
+
+static void res_free_fn(void *ptr, void *arg)
+{
+	XE_WARN_ON(ptr);
+	kfree(ptr);
+}
+
+static void
+xe_eudebug_destroy_resources(struct xe_eudebug *d)
+{
+	struct xe_eudebug_resources *res = d->res;
+	struct xe_eudebug_handle *h;
+	unsigned long j;
+	int i;
+	int err;
+
+	mutex_lock(&res->lock);
+	for (i = 0; i < XE_EUDEBUG_RES_TYPE_COUNT; i++) {
+		struct xe_eudebug_resource *r = &res->rt[i];
+
+		xa_for_each(&r->xa, j, h) {
+			struct xe_eudebug_handle *t;
+
+			err = rhashtable_remove_fast(&r->rh,
+						     &h->rh_head,
+						     rhash_res);
+			xe_eudebug_assert(d, !err);
+			t = xa_erase(&r->xa, h->id);
+			xe_eudebug_assert(d, t == h);
+			kfree(t);
+		}
+	}
+	mutex_unlock(&res->lock);
+
+	for (i = 0; i < XE_EUDEBUG_RES_TYPE_COUNT; i++) {
+		struct xe_eudebug_resource *r = &res->rt[i];
+
+		rhashtable_free_and_destroy(&r->rh, res_free_fn, NULL);
+		xe_eudebug_assert(d, xa_empty(&r->xa));
+		xa_destroy(&r->xa);
+	}
+
+	mutex_destroy(&res->lock);
+
+	kfree(res);
+}
+
+static void xe_eudebug_free(struct kref *ref)
+{
+	struct xe_eudebug *d = container_of(ref, typeof(*d), ref);
+	struct xe_eudebug_event *event;
+
+	while (kfifo_get(&d->events.fifo, &event))
+		kfree(event);
+
+	xe_eudebug_destroy_resources(d);
+	put_task_struct(d->target_task);
+
+	xe_eudebug_assert(d, !kfifo_len(&d->events.fifo));
+
+	kfree_rcu(d, rcu);
+}
+
+static void xe_eudebug_put(struct xe_eudebug *d)
+{
+	kref_put(&d->ref, xe_eudebug_free);
+}
+
+static struct task_struct *find_get_target(const pid_t nr)
+{
+	struct task_struct *task;
+
+	rcu_read_lock();
+	task = pid_task(find_pid_ns(nr, task_active_pid_ns(current)), PIDTYPE_PID);
+	if (task)
+		get_task_struct(task);
+	rcu_read_unlock();
+
+	return task;
+}
+
+static int
+xe_eudebug_attach(struct xe_device *xe, struct xe_eudebug *d,
+		  const pid_t pid_nr)
+{
+	struct task_struct *target;
+	struct xe_eudebug *iter;
+	int ret = 0;
+
+	target = find_get_target(pid_nr);
+	if (!target)
+		return -ENOENT;
+
+	if (!ptrace_may_access(target, PTRACE_MODE_READ_REALCREDS)) {
+		put_task_struct(target);
+		return -EACCES;
+	}
+
+	XE_WARN_ON(d->connection.status != 0);
+
+	spin_lock(&xe->eudebug.lock);
+	for_each_debugger(iter, &xe->eudebug.list) {
+		if (!same_thread_group(iter->target_task, target))
+			continue;
+
+		ret = -EBUSY;
+	}
+
+	if (!ret && xe->eudebug.session_count + 1 == 0)
+		ret = -ENOSPC;
+
+	if (!ret) {
+		d->connection.status = XE_EUDEBUG_STATUS_CONNECTED;
+		d->xe = xe;
+		d->target_task = get_task_struct(target);
+		d->session = ++xe->eudebug.session_count;
+		kref_get(&d->ref);
+		list_add_tail_rcu(&d->connection_link, &xe->eudebug.list);
+	}
+	spin_unlock(&xe->eudebug.lock);
+
+	put_task_struct(target);
+
+	return ret;
+}
+
+static bool xe_eudebug_detach(struct xe_device *xe,
+			      struct xe_eudebug *d,
+			      const int err)
+{
+	bool detached = false;
+
+	XE_WARN_ON(err > 0);
+
+	spin_lock(&d->connection.lock);
+	if (d->connection.status == XE_EUDEBUG_STATUS_CONNECTED) {
+		d->connection.status = err;
+		detached = true;
+	}
+	spin_unlock(&d->connection.lock);
+
+	if (!detached)
+		return false;
+
+	spin_lock(&xe->eudebug.lock);
+	list_del_rcu(&d->connection_link);
+	spin_unlock(&xe->eudebug.lock);
+
+	eu_dbg(d, "session %lld detached with %d", d->session, err);
+
+	/* Our ref with the connection_link */
+	xe_eudebug_put(d);
+
+	return true;
+}
+
+static int _xe_eudebug_disconnect(struct xe_eudebug *d,
+				  const int err)
+{
+	wake_up_all(&d->events.write_done);
+	wake_up_all(&d->events.read_done);
+
+	return xe_eudebug_detach(d->xe, d, err);
+}
+
+#define xe_eudebug_disconnect(_d, _err) ({ \
+	if (_xe_eudebug_disconnect((_d), (_err))) { \
+		if ((_err) == 0 || (_err) == -ETIMEDOUT) \
+			eu_dbg(d, "Session closed (%d)", (_err)); \
+		else \
+			eu_err(d, "Session disconnected, err = %d (%s:%d)", \
+			       (_err), __func__, __LINE__); \
+	} \
+})
+
+static int xe_eudebug_release(struct inode *inode, struct file *file)
+{
+	struct xe_eudebug *d = file->private_data;
+
+	xe_eudebug_disconnect(d, 0);
+	xe_eudebug_put(d);
+
+	return 0;
+}
+
+static __poll_t xe_eudebug_poll(struct file *file, poll_table *wait)
+{
+	struct xe_eudebug * const d = file->private_data;
+	__poll_t ret = 0;
+
+	poll_wait(file, &d->events.write_done, wait);
+
+	if (xe_eudebug_detached(d)) {
+		ret |= EPOLLHUP;
+		if (xe_eudebug_error(d))
+			ret |= EPOLLERR;
+	}
+
+	if (event_fifo_num_events_peek(d))
+		ret |= EPOLLIN;
+
+	return ret;
+}
+
+static ssize_t xe_eudebug_read(struct file *file,
+			       char __user *buf,
+			       size_t count,
+			       loff_t *ppos)
+{
+	return -EINVAL;
+}
+
+static struct xe_eudebug *
+xe_eudebug_for_task_get(struct xe_device *xe,
+			struct task_struct *task)
+{
+	struct xe_eudebug *d, *iter;
+
+	d = NULL;
+
+	rcu_read_lock();
+	for_each_debugger_rcu(iter, &xe->eudebug.list) {
+		if (!same_thread_group(iter->target_task, task))
+			continue;
+
+		if (kref_get_unless_zero(&iter->ref))
+			d = iter;
+
+		break;
+	}
+	rcu_read_unlock();
+
+	return d;
+}
+
+static struct task_struct *find_task_get(struct xe_file *xef)
+{
+	struct task_struct *task;
+	struct pid *pid;
+
+	rcu_read_lock();
+	pid = rcu_dereference(xef->drm->pid);
+	task = pid_task(pid, PIDTYPE_PID);
+	if (task)
+		get_task_struct(task);
+	rcu_read_unlock();
+
+	return task;
+}
+
+static struct xe_eudebug *
+xe_eudebug_get(struct xe_file *xef)
+{
+	struct task_struct *task;
+	struct xe_eudebug *d;
+
+	d = NULL;
+	task = find_task_get(xef);
+	if (task) {
+		d = xe_eudebug_for_task_get(to_xe_device(xef->drm->minor->dev),
+					    task);
+		put_task_struct(task);
+	}
+
+	if (!d)
+		return NULL;
+
+	if (xe_eudebug_detached(d)) {
+		xe_eudebug_put(d);
+		return NULL;
+	}
+
+	return d;
+}
+
+static int xe_eudebug_queue_event(struct xe_eudebug *d,
+				  struct xe_eudebug_event *event)
+{
+	const u64 wait_jiffies = msecs_to_jiffies(1000);
+	u64 last_read_detected_ts, last_head_seqno, start_ts;
+
+	xe_eudebug_assert(d, event->len > sizeof(struct xe_eudebug_event));
+	xe_eudebug_assert(d, event->type);
+	xe_eudebug_assert(d, event->type != DRM_XE_EUDEBUG_EVENT_READ);
+
+	start_ts = ktime_get();
+	last_read_detected_ts = start_ts;
+	last_head_seqno = 0;
+
+	do  {
+		struct xe_eudebug_event *head;
+		u64 head_seqno;
+		bool was_queued;
+
+		if (xe_eudebug_detached(d))
+			break;
+
+		spin_lock(&d->events.lock);
+		head = event_fifo_pending(d);
+		if (head)
+			head_seqno = head->seqno;
+		else
+			head_seqno = 0;
+
+		was_queued = kfifo_in(&d->events.fifo, &event, 1);
+		spin_unlock(&d->events.lock);
+
+		wake_up_all(&d->events.write_done);
+
+		if (was_queued) {
+			event = NULL;
+			break;
+		}
+
+		XE_WARN_ON(!head_seqno);
+
+		/* If we detect progress, restart timeout */
+		if (last_head_seqno != head_seqno)
+			last_read_detected_ts = ktime_get();
+
+		last_head_seqno = head_seqno;
+
+		wait_event_interruptible_timeout(d->events.read_done,
+						 !kfifo_is_full(&d->events.fifo),
+						 wait_jiffies);
+
+	} while (ktime_ms_delta(ktime_get(), last_read_detected_ts) <
+		 XE_EUDEBUG_NO_READ_DETECTED_TIMEOUT_MS);
+
+	if (event) {
+		eu_dbg(d,
+		       "event %llu queue failed (blocked %lld ms, avail %d)",
+		       event ? event->seqno : 0,
+		       ktime_ms_delta(ktime_get(), start_ts),
+		       kfifo_avail(&d->events.fifo));
+
+		kfree(event);
+
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+static struct xe_eudebug_handle *
+alloc_handle(const int type, const u64 key)
+{
+	struct xe_eudebug_handle *h;
+
+	h = kzalloc(sizeof(*h), GFP_ATOMIC);
+	if (!h)
+		return NULL;
+
+	h->key = key;
+
+	return h;
+}
+
+static struct xe_eudebug_handle *
+__find_handle(struct xe_eudebug_resource *r,
+	      const u64 key)
+{
+	struct xe_eudebug_handle *h;
+
+	h = rhashtable_lookup_fast(&r->rh,
+				   &key,
+				   rhash_res);
+	return h;
+}
+
+static int find_handle(struct xe_eudebug_resources *res,
+		       const int type,
+		       const void *p)
+{
+	const u64 key = (uintptr_t)p;
+	struct xe_eudebug_resource *r;
+	struct xe_eudebug_handle *h;
+	int id;
+
+	if (XE_WARN_ON(!key))
+		return -EINVAL;
+
+	r = resource_from_type(res, type);
+
+	mutex_lock(&res->lock);
+	h = __find_handle(r, key);
+	id = h ? h->id : -ENOENT;
+	mutex_unlock(&res->lock);
+
+	return id;
+}
+
+static int _xe_eudebug_add_handle(struct xe_eudebug *d,
+				  int type,
+				  void *p,
+				  u64 *seqno,
+				  int *handle)
+{
+	const u64 key = (uintptr_t)p;
+	struct xe_eudebug_resource *r;
+	struct xe_eudebug_handle *h, *o;
+	int err;
+
+	if (XE_WARN_ON(!p))
+		return -EINVAL;
+
+	if (xe_eudebug_detached(d))
+		return -ENOTCONN;
+
+	h = alloc_handle(type, key);
+	if (!h)
+		return -ENOMEM;
+
+	r = resource_from_type(d->res, type);
+
+	mutex_lock(&d->res->lock);
+	o = __find_handle(r, key);
+	if (!o) {
+		err = xa_alloc(&r->xa, &h->id, h, xa_limit_31b, GFP_KERNEL);
+
+		if (h->id >= INT_MAX) {
+			xa_erase(&r->xa, h->id);
+			err = -ENOSPC;
+		}
+
+		if (!err)
+			err = rhashtable_insert_fast(&r->rh,
+						     &h->rh_head,
+						     rhash_res);
+
+		if (err) {
+			xa_erase(&r->xa, h->id);
+		} else {
+			if (seqno)
+				*seqno = atomic_long_inc_return(&d->events.seqno);
+		}
+	} else {
+		xe_eudebug_assert(d, o->id);
+		err = -EEXIST;
+	}
+	mutex_unlock(&d->res->lock);
+
+	if (handle)
+		*handle = o ? o->id : h->id;
+
+	if (err) {
+		kfree(h);
+		XE_WARN_ON(err > 0);
+		return err;
+	}
+
+	xe_eudebug_assert(d, h->id);
+
+	return h->id;
+}
+
+static int xe_eudebug_add_handle(struct xe_eudebug *d,
+				 int type,
+				 void *p,
+				 u64 *seqno)
+{
+	int ret;
+
+	ret = _xe_eudebug_add_handle(d, type, p, seqno, NULL);
+	if (ret == -EEXIST || ret == -ENOTCONN) {
+		eu_dbg(d, "%d on adding %d", ret, type);
+		return 0;
+	}
+
+	if (ret < 0)
+		xe_eudebug_disconnect(d, ret);
+
+	return ret;
+}
+
+static int _xe_eudebug_remove_handle(struct xe_eudebug *d, int type, void *p,
+				     u64 *seqno)
+{
+	const u64 key = (uintptr_t)p;
+	struct xe_eudebug_resource *r;
+	struct xe_eudebug_handle *h, *xa_h;
+	int ret;
+
+	if (XE_WARN_ON(!key))
+		return -EINVAL;
+
+	if (xe_eudebug_detached(d))
+		return -ENOTCONN;
+
+	r = resource_from_type(d->res, type);
+
+	mutex_lock(&d->res->lock);
+	h = __find_handle(r, key);
+	if (h) {
+		ret = rhashtable_remove_fast(&r->rh,
+					     &h->rh_head,
+					     rhash_res);
+		xe_eudebug_assert(d, !ret);
+		xa_h = xa_erase(&r->xa, h->id);
+		xe_eudebug_assert(d, xa_h == h);
+		if (!ret) {
+			ret = h->id;
+			if (seqno)
+				*seqno = atomic_long_inc_return(&d->events.seqno);
+		}
+	} else {
+		ret = -ENOENT;
+	}
+	mutex_unlock(&d->res->lock);
+
+	kfree(h);
+
+	xe_eudebug_assert(d, ret);
+
+	return ret;
+}
+
+static int xe_eudebug_remove_handle(struct xe_eudebug *d, int type, void *p,
+				    u64 *seqno)
+{
+	int ret;
+
+	ret = _xe_eudebug_remove_handle(d, type, p, seqno);
+	if (ret == -ENOENT || ret == -ENOTCONN) {
+		eu_dbg(d, "%d on removing %d", ret, type);
+		return 0;
+	}
+
+	if (ret < 0)
+		xe_eudebug_disconnect(d, ret);
+
+	return ret;
+}
+
+static struct xe_eudebug_event *
+xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
+			u32 len)
+{
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM;
+	const u16 known_flags =
+		DRM_XE_EUDEBUG_EVENT_CREATE |
+		DRM_XE_EUDEBUG_EVENT_DESTROY |
+		DRM_XE_EUDEBUG_EVENT_STATE_CHANGE |
+		DRM_XE_EUDEBUG_EVENT_NEED_ACK;
+	struct xe_eudebug_event *event;
+
+	BUILD_BUG_ON(type > max_event);
+
+	xe_eudebug_assert(d, type <= max_event);
+	xe_eudebug_assert(d, !(~known_flags & flags));
+	xe_eudebug_assert(d, len > sizeof(*event));
+
+	event = kzalloc(len, GFP_KERNEL);
+	if (!event)
+		return NULL;
+
+	event->len = len;
+	event->type = type;
+	event->flags = flags;
+	event->seqno = seqno;
+
+	return event;
+}
+
+static long xe_eudebug_read_event(struct xe_eudebug *d,
+				  const u64 arg,
+				  const bool wait)
+{
+	struct xe_device *xe = d->xe;
+	struct drm_xe_eudebug_event __user * const user_orig =
+		u64_to_user_ptr(arg);
+	struct drm_xe_eudebug_event user_event;
+	struct xe_eudebug_event *event;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM;
+	long ret = 0;
+
+	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, !user_event.type))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, user_event.type > max_event))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, user_event.type != DRM_XE_EUDEBUG_EVENT_READ))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, user_event.len < sizeof(*user_orig)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, user_event.flags))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, user_event.reserved))
+		return -EINVAL;
+
+	/* XXX: define wait time in connect arguments ? */
+	if (wait) {
+		ret = wait_event_interruptible_timeout(d->events.write_done,
+						       event_fifo_has_events(d),
+						       msecs_to_jiffies(5 * 1000));
+
+		if (XE_IOCTL_DBG(xe, ret < 0))
+			return ret;
+	}
+
+	ret = 0;
+	spin_lock(&d->events.lock);
+	event = event_fifo_pending(d);
+	if (event) {
+		if (user_event.len < event->len) {
+			ret = -EMSGSIZE;
+		} else if (!kfifo_out(&d->events.fifo, &event, 1)) {
+			eu_warn(d, "internal fifo corruption");
+			ret = -ENOTCONN;
+		}
+	}
+	spin_unlock(&d->events.lock);
+
+	wake_up_all(&d->events.read_done);
+
+	if (ret == -EMSGSIZE && put_user(event->len, &user_orig->len))
+		ret = -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, ret))
+		return ret;
+
+	if (!event) {
+		if (xe_eudebug_detached(d))
+			return -ENOTCONN;
+		if (!wait)
+			return -EAGAIN;
+
+		return -ENOENT;
+	}
+
+	if (copy_to_user(user_orig, event, event->len))
+		ret = -EFAULT;
+	else
+		eu_dbg(d, "event read: type=%u, flags=0x%x, seqno=%llu", event->type,
+		       event->flags, event->seqno);
+
+	kfree(event);
+
+	return ret;
+}
+
+static long xe_eudebug_ioctl(struct file *file,
+			     unsigned int cmd,
+			     unsigned long arg)
+{
+	struct xe_eudebug * const d = file->private_data;
+	long ret;
+
+	switch (cmd) {
+	case DRM_XE_EUDEBUG_IOCTL_READ_EVENT:
+		ret = xe_eudebug_read_event(d, arg,
+					    !(file->f_flags & O_NONBLOCK));
+		break;
+
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static const struct file_operations fops = {
+	.owner		= THIS_MODULE,
+	.release	= xe_eudebug_release,
+	.poll		= xe_eudebug_poll,
+	.read		= xe_eudebug_read,
+	.unlocked_ioctl	= xe_eudebug_ioctl,
+};
+
+static int
+xe_eudebug_connect(struct xe_device *xe,
+		   struct drm_xe_eudebug_connect *param)
+{
+	const u64 known_open_flags = 0;
+	unsigned long f_flags = 0;
+	struct xe_eudebug *d;
+	int fd, err;
+
+	if (param->extensions)
+		return -EINVAL;
+
+	if (!param->pid)
+		return -EINVAL;
+
+	if (param->flags & ~known_open_flags)
+		return -EINVAL;
+
+	if (param->version && param->version != DRM_XE_EUDEBUG_VERSION)
+		return -EINVAL;
+
+	param->version = DRM_XE_EUDEBUG_VERSION;
+
+	if (!xe->eudebug.available)
+		return -EOPNOTSUPP;
+
+	d = kzalloc(sizeof(*d), GFP_KERNEL);
+	if (!d)
+		return -ENOMEM;
+
+	kref_init(&d->ref);
+	spin_lock_init(&d->connection.lock);
+	init_waitqueue_head(&d->events.write_done);
+	init_waitqueue_head(&d->events.read_done);
+
+	spin_lock_init(&d->events.lock);
+	INIT_KFIFO(d->events.fifo);
+
+	d->res = xe_eudebug_resources_alloc();
+	if (IS_ERR(d->res)) {
+		err = PTR_ERR(d->res);
+		goto err_free;
+	}
+
+	err = xe_eudebug_attach(xe, d, param->pid);
+	if (err)
+		goto err_free_res;
+
+	fd = anon_inode_getfd("[xe_eudebug]", &fops, d, f_flags);
+	if (fd < 0) {
+		err = fd;
+		goto err_detach;
+	}
+
+	eu_dbg(d, "connected session %lld", d->session);
+
+	return fd;
+
+err_detach:
+	xe_eudebug_detach(xe, d, err);
+err_free_res:
+	xe_eudebug_destroy_resources(d);
+err_free:
+	kfree(d);
+
+	return err;
+}
+
+int xe_eudebug_connect_ioctl(struct drm_device *dev,
+			     void *data,
+			     struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct drm_xe_eudebug_connect * const param = data;
+	int ret = 0;
+
+	ret = xe_eudebug_connect(xe, param);
+
+	return ret;
+}
+
+void xe_eudebug_init(struct xe_device *xe)
+{
+	spin_lock_init(&xe->eudebug.lock);
+	INIT_LIST_HEAD(&xe->eudebug.list);
+
+	spin_lock_init(&xe->clients.lock);
+	INIT_LIST_HEAD(&xe->clients.list);
+
+	xe->eudebug.available = true;
+}
+
+void xe_eudebug_fini(struct xe_device *xe)
+{
+	xe_assert(xe, list_empty_careful(&xe->eudebug.list));
+}
+
+static int send_open_event(struct xe_eudebug *d, u32 flags, const u64 handle,
+			   const u64 seqno)
+{
+	struct xe_eudebug_event *event;
+	struct xe_eudebug_event_open *eo;
+
+	if (!handle)
+		return -EINVAL;
+
+	if (XE_WARN_ON((long)handle >= INT_MAX))
+		return -EINVAL;
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_OPEN, seqno,
+					flags, sizeof(*eo));
+	if (!event)
+		return -ENOMEM;
+
+	eo = cast_event(eo, event);
+
+	write_member(struct drm_xe_eudebug_event_client, eo,
+		     client_handle, handle);
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+static int client_create_event(struct xe_eudebug *d, struct xe_file *xef)
+{
+	u64 seqno;
+	int ret;
+
+	ret = xe_eudebug_add_handle(d, XE_EUDEBUG_RES_TYPE_CLIENT, xef, &seqno);
+	if (ret > 0)
+		ret = send_open_event(d, DRM_XE_EUDEBUG_EVENT_CREATE,
+				      ret, seqno);
+
+	return ret;
+}
+
+static int client_destroy_event(struct xe_eudebug *d, struct xe_file *xef)
+{
+	u64 seqno;
+	int ret;
+
+	ret = xe_eudebug_remove_handle(d, XE_EUDEBUG_RES_TYPE_CLIENT,
+				       xef, &seqno);
+	if (ret > 0)
+		ret = send_open_event(d, DRM_XE_EUDEBUG_EVENT_DESTROY,
+				      ret, seqno);
+
+	return ret;
+}
+
+#define xe_eudebug_event_put(_d, _err) ({ \
+	if ((_err)) \
+		xe_eudebug_disconnect((_d), (_err)); \
+	xe_eudebug_put((_d)); \
+	})
+
+void xe_eudebug_file_open(struct xe_file *xef)
+{
+	struct xe_eudebug *d;
+
+	INIT_LIST_HEAD(&xef->eudebug.client_link);
+	spin_lock(&xef->xe->clients.lock);
+	list_add_tail(&xef->eudebug.client_link, &xef->xe->clients.list);
+	spin_unlock(&xef->xe->clients.lock);
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, client_create_event(d, xef));
+}
+
+void xe_eudebug_file_close(struct xe_file *xef)
+{
+	struct xe_eudebug *d;
+
+	d = xe_eudebug_get(xef);
+	if (d)
+		xe_eudebug_event_put(d, client_destroy_event(d, xef));
+
+	spin_lock(&xef->xe->clients.lock);
+	list_del_init(&xef->eudebug.client_link);
+	spin_unlock(&xef->xe->clients.lock);
+}
+
+static int send_vm_event(struct xe_eudebug *d, u32 flags,
+			 const u64 client_handle,
+			 const u64 vm_handle,
+			 const u64 seqno)
+{
+	struct xe_eudebug_event *event;
+	struct xe_eudebug_event_vm *e;
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_VM,
+					seqno, flags, sizeof(*e));
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_vm, e, client_handle, client_handle);
+	write_member(struct drm_xe_eudebug_event_vm, e, vm_handle, vm_handle);
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+static int vm_create_event(struct xe_eudebug *d,
+			   struct xe_file *xef, struct xe_vm *vm)
+{
+	int h_c, h_vm;
+	u64 seqno;
+	int ret;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return 0;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
+	if (h_c < 0)
+		return h_c;
+
+	xe_eudebug_assert(d, h_c);
+
+	h_vm = xe_eudebug_add_handle(d, XE_EUDEBUG_RES_TYPE_VM, vm, &seqno);
+	if (h_vm <= 0)
+		return h_vm;
+
+	ret = send_vm_event(d, DRM_XE_EUDEBUG_EVENT_CREATE, h_c, h_vm, seqno);
+
+	return ret;
+}
+
+static int vm_destroy_event(struct xe_eudebug *d,
+			    struct xe_file *xef, struct xe_vm *vm)
+{
+	int h_c, h_vm;
+	u64 seqno;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return 0;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
+	if (h_c < 0) {
+		eu_warn(d, "no client found for vm");
+		return h_c;
+	}
+
+	xe_eudebug_assert(d, h_c);
+
+	h_vm = xe_eudebug_remove_handle(d, XE_EUDEBUG_RES_TYPE_VM, vm, &seqno);
+	if (h_vm <= 0)
+		return h_vm;
+
+	return send_vm_event(d, DRM_XE_EUDEBUG_EVENT_DESTROY, h_c, h_vm, seqno);
+}
+
+void xe_eudebug_vm_create(struct xe_file *xef, struct xe_vm *vm)
+{
+	struct xe_eudebug *d;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return;
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, vm_create_event(d, xef, vm));
+}
+
+void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm)
+{
+	struct xe_eudebug *d;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return;
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, vm_destroy_event(d, xef, vm));
+}
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
new file mode 100644
index 000000000000..e3247365f72f
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_EUDEBUG_H_
+#define _XE_EUDEBUG_H_
+
+struct drm_device;
+struct drm_file;
+struct xe_device;
+struct xe_file;
+struct xe_vm;
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+
+int xe_eudebug_connect_ioctl(struct drm_device *dev,
+			     void *data,
+			     struct drm_file *file);
+
+void xe_eudebug_init(struct xe_device *xe);
+void xe_eudebug_fini(struct xe_device *xe);
+
+void xe_eudebug_file_open(struct xe_file *xef);
+void xe_eudebug_file_close(struct xe_file *xef);
+
+void xe_eudebug_vm_create(struct xe_file *xef, struct xe_vm *vm);
+void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm);
+
+#else
+
+static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
+					   void *data,
+					   struct drm_file *file) { return 0; }
+
+static inline void xe_eudebug_init(struct xe_device *xe) { }
+static inline void xe_eudebug_fini(struct xe_device *xe) { }
+
+static inline void xe_eudebug_file_open(struct xe_file *xef) { }
+static inline void xe_eudebug_file_close(struct xe_file *xef) { }
+
+static inline void xe_eudebug_vm_create(struct xe_file *xef, struct xe_vm *vm) { }
+static inline void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm) { }
+
+#endif /* CONFIG_DRM_XE_EUDEBUG */
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
new file mode 100644
index 000000000000..a5185f18f640
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -0,0 +1,169 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __XE_EUDEBUG_TYPES_H_
+#define __XE_EUDEBUG_TYPES_H_
+
+#include <linux/completion.h>
+#include <linux/kfifo.h>
+#include <linux/kref.h>
+#include <linux/mutex.h>
+#include <linux/rbtree.h>
+#include <linux/rhashtable.h>
+#include <linux/wait.h>
+#include <linux/xarray.h>
+
+#include <uapi/drm/xe_drm.h>
+
+struct xe_device;
+struct task_struct;
+struct xe_eudebug_event;
+
+#define CONFIG_DRM_XE_DEBUGGER_EVENT_QUEUE_SIZE 64
+
+/**
+ * struct xe_eudebug_handle - eudebug resource handle
+ */
+struct xe_eudebug_handle {
+	/** @key: key value in rhashtable <key:id> */
+	u64 key;
+
+	/** @id: opaque handle id for xarray <id:key> */
+	int id;
+
+	/** @rh_head: rhashtable head */
+	struct rhash_head rh_head;
+};
+
+/**
+ * struct xe_eudebug_resource - Resource map for one resource
+ */
+struct xe_eudebug_resource {
+	/** @xa: xarrays for <id->key> */
+	struct xarray xa;
+
+	/** @rh: rhashtable for <key->id> */
+	struct rhashtable rh;
+};
+
+#define XE_EUDEBUG_RES_TYPE_CLIENT	0
+#define XE_EUDEBUG_RES_TYPE_VM		1
+#define XE_EUDEBUG_RES_TYPE_COUNT	(XE_EUDEBUG_RES_TYPE_VM + 1)
+
+/**
+ * struct xe_eudebug_resources - eudebug resources for all types
+ */
+struct xe_eudebug_resources {
+	/** @lock: guards access to rt */
+	struct mutex lock;
+
+	/** @rt: resource maps for all types */
+	struct xe_eudebug_resource rt[XE_EUDEBUG_RES_TYPE_COUNT];
+};
+
+/**
+ * struct xe_eudebug - Top level struct for eudebug: the connection
+ */
+struct xe_eudebug {
+	/** @ref: kref counter for this struct */
+	struct kref ref;
+
+	/** @rcu: rcu_head for rcu destruction */
+	struct rcu_head rcu;
+
+	/** @connection_link: our link into the xe_device:eudebug.list */
+	struct list_head connection_link;
+
+	struct {
+		/** @status: connected = 1, disconnected = error */
+#define XE_EUDEBUG_STATUS_CONNECTED 1
+		int status;
+
+		/** @lock: guards access to status */
+		spinlock_t lock;
+	} connection;
+
+	/** @xe: the parent device we are serving */
+	struct xe_device *xe;
+
+	/** @target_task: the task that we are debugging */
+	struct task_struct *target_task;
+
+	/** @res: the resource maps we track for target_task */
+	struct xe_eudebug_resources *res;
+
+	/** @session: session number for this connection (for logs) */
+	u64 session;
+
+	/** @events: kfifo queue of to-be-delivered events */
+	struct {
+		/** @lock: guards access to fifo */
+		spinlock_t lock;
+
+		/** @fifo: queue of events pending */
+		DECLARE_KFIFO(fifo,
+			      struct xe_eudebug_event *,
+			      CONFIG_DRM_XE_DEBUGGER_EVENT_QUEUE_SIZE);
+
+		/** @write_done: waitqueue for signalling write to fifo */
+		wait_queue_head_t write_done;
+
+		/** @read_done: waitqueue for signalling read from fifo */
+		wait_queue_head_t read_done;
+
+		/** @seqno: seqno counter to stamp events for fifo */
+		atomic_long_t seqno;
+	} events;
+
+};
+
+/**
+ * struct xe_eudebug_event - Internal base event struct for eudebug
+ */
+struct xe_eudebug_event {
+	/** @len: length of this event, including payload */
+	u32 len;
+
+	/** @type: message type */
+	u16 type;
+
+	/** @flags: message flags */
+	u16 flags;
+
+	/** @seqno: sequence number for ordering */
+	u64 seqno;
+
+	/** @reserved: reserved field MBZ */
+	u64 reserved;
+
+	/** @data: payload bytes */
+	u8 data[];
+};
+
+/**
+ * struct xe_eudebug_event_open - Internal event for client open/close
+ */
+struct xe_eudebug_event_open {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	/** @client_handle: opaque handle for client */
+	u64 client_handle;
+};
+
+/**
+ * struct xe_eudebug_event_vm - Internal event for vm open/close
+ */
+struct xe_eudebug_event_vm {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	/** @client_handle: handle of the client that owns the vm */
+	u64 client_handle;
+
+	/** @vm_handle: handle of the vm being created/destroyed */
+	u64 vm_handle;
+};
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 7788680da4e5..6f16049f4f6e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -25,6 +25,7 @@
 #include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_drm_client.h"
+#include "xe_eudebug.h"
 #include "xe_exec_queue.h"
 #include "xe_gt_pagefault.h"
 #include "xe_gt_tlb_invalidation.h"
@@ -1797,6 +1798,8 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
 
 	args->vm_id = id;
 
+	xe_eudebug_vm_create(xef, vm);
+
 	return 0;
 
 err_close_and_put:
@@ -1828,8 +1831,10 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
 		xa_erase(&xef->vm.xa, args->vm_id);
 	mutex_unlock(&xef->vm.lock);
 
-	if (!err)
+	if (!err) {
+		xe_eudebug_vm_destroy(xef, vm);
 		xe_vm_close_and_put(vm);
+	}
 
 	return err;
 }
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 4a8a4a63e99c..78479100a0b6 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -102,6 +102,7 @@ extern "C" {
 #define DRM_XE_EXEC			0x09
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
+#define DRM_XE_EUDEBUG_CONNECT		0x0c
 
 /* Must be kept compact -- no holes */
 
@@ -117,6 +118,7 @@ extern "C" {
 #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
+#define DRM_IOCTL_XE_EUDEBUG_CONNECT		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_EUDEBUG_CONNECT, struct drm_xe_eudebug_connect)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1713,6 +1715,25 @@ struct drm_xe_oa_stream_info {
 	__u64 reserved[3];
 };
 
+/*
+ * Debugger ABI (ioctl and events) Version History:
+ * 0 - No debugger available
+ * 1 - Initial version
+ */
+#define DRM_XE_EUDEBUG_VERSION 1
+
+struct drm_xe_eudebug_connect {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @pid: Target process ID (input) */
+	__u64 pid;
+
+	/** @flags: MBZ */
+	__u32 flags;
+
+	/**
+	 * @version: input: requested ABI version, or zero to accept any;
+	 * output: the ABI (ioctl / events) version supported by the driver
+	 */
+	__u32 version;
+};
+
+#include "xe_drm_eudebug.h"
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
new file mode 100644
index 000000000000..acf6071c82bf
--- /dev/null
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _UAPI_XE_DRM_EUDEBUG_H_
+#define _UAPI_XE_DRM_EUDEBUG_H_
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+/**
+ * Do a eudebug event read for a debugger connection.
+ *
+ * This ioctl is available in debug version 1.
+ */
+#define DRM_XE_EUDEBUG_IOCTL_READ_EVENT _IO('j', 0x0)
+
+/* XXX: Document events to match their internal counterparts when moved to xe_drm.h */
+struct drm_xe_eudebug_event {
+	__u32 len;
+
+	__u16 type;
+#define DRM_XE_EUDEBUG_EVENT_NONE		0
+#define DRM_XE_EUDEBUG_EVENT_READ		1
+#define DRM_XE_EUDEBUG_EVENT_OPEN		2
+#define DRM_XE_EUDEBUG_EVENT_VM			3
+
+	__u16 flags;
+#define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
+#define DRM_XE_EUDEBUG_EVENT_DESTROY		(1 << 1)
+#define DRM_XE_EUDEBUG_EVENT_STATE_CHANGE	(1 << 2)
+#define DRM_XE_EUDEBUG_EVENT_NEED_ACK		(1 << 3)
+	__u64 seqno;
+	__u64 reserved;
+};
+
+struct drm_xe_eudebug_event_client {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle; /* This is unique per debug connection */
+};
+
+struct drm_xe_eudebug_event_vm {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 vm_handle;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
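For illustration, here is a minimal userspace-style sketch of how a debugger might consume this event stream after `DRM_XE_EUDEBUG_IOCTL_READ_EVENT` fills a buffer. This is not part of the patch; the struct and macros are mirrored locally from the uapi header above, and the actual connect/ioctl plumbing is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi header above (normally from xe_drm_eudebug.h). */
struct drm_xe_eudebug_event {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	uint64_t seqno;
	uint64_t reserved;
};

#define DRM_XE_EUDEBUG_EVENT_READ	1
#define DRM_XE_EUDEBUG_EVENT_OPEN	2
#define DRM_XE_EUDEBUG_EVENT_VM		3
#define DRM_XE_EUDEBUG_EVENT_CREATE	(1 << 0)

/*
 * Decode one event header from a raw buffer, as a debugger would after
 * the READ_EVENT ioctl returns. Returns the event type, or -1 if the
 * buffer is too short to hold a header.
 */
static int decode_event_type(const void *buf, size_t buflen,
			     uint64_t *seqno_out)
{
	struct drm_xe_eudebug_event ev;

	if (buflen < sizeof(ev))
		return -1;

	memcpy(&ev, buf, sizeof(ev));
	if (seqno_out)
		*seqno_out = ev.seqno;

	return ev.type;
}
```

In a real debugger loop, an `-EMSGSIZE` return from the ioctl means the kernel wrote back the required `len`, so the caller reallocates and retries.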
-- 
2.43.0



* [PATCH 03/26] drm/xe/eudebug: Introduce discovery for resources
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 01/26] ptrace: export ptrace_may_access Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 02/26] drm/xe/eudebug: Introduce eudebug support Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 04/26] drm/xe/eudebug: Introduce exec_queue events Mika Kuoppala
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Mika Kuoppala, Matthew Brost,
	Dominik Grzegorzek, Maciej Patelczyk

A debugger connection can happen long after the client has
created and destroyed an arbitrary number of resources.

We need to play back all currently existing resources to the
debugger. The client is held until this so-called discovery
process, executed by a workqueue, is complete.

This patch is based on the discovery work by Maciej Patelczyk
for the i915 driver.

v2: - use rw_semaphore to block drm_ioctls during discovery (Matthew)
    - only lock according to ioctl at play (Dominik)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Co-developed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com> #locking
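To make the playback idea above concrete, here is a toy userspace analogy (purely illustrative, not the kernel implementation): resources that existed before the debugger attached are replayed as CREATE events, stamped from the same seqno counter used for live events, so the debugger observes one totally ordered stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the internal event; names are made up. */
struct toy_event {
	uint64_t seqno;
	int resource_id;
};

static uint64_t toy_seqno;	/* analogous to d->events.seqno */

/*
 * Replay pre-existing resources as CREATE events. Each replayed event
 * draws a fresh seqno, so later live events sort strictly after it.
 */
static size_t discover_resources(const int *ids, size_t count,
				 struct toy_event *out)
{
	size_t i;

	for (i = 0; i < count; i++) {
		out[i].seqno = ++toy_seqno;
		out[i].resource_id = ids[i];
	}

	return i;
}
```

In the patch itself, the same ordering guarantee is what the write-side `discovery_lock` provides: new ioctls that could create or destroy resources are held out until the replay is complete.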
---
 drivers/gpu/drm/xe/xe_device.c        |  10 +-
 drivers/gpu/drm/xe/xe_device.h        |  34 +++++++
 drivers/gpu/drm/xe/xe_device_types.h  |   6 ++
 drivers/gpu/drm/xe/xe_eudebug.c       | 135 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug_types.h |   7 ++
 5 files changed, 185 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 9ed0de1eba0b..f051612908de 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -209,8 +209,11 @@ static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return -ECANCELED;
 
 	ret = xe_pm_runtime_get_ioctl(xe);
-	if (ret >= 0)
+	if (ret >= 0) {
+		xe_eudebug_discovery_lock(xe, cmd);
 		ret = drm_ioctl(file, cmd, arg);
+		xe_eudebug_discovery_unlock(xe, cmd);
+	}
 	xe_pm_runtime_put(xe);
 
 	return ret;
@@ -227,8 +230,11 @@ static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned lo
 		return -ECANCELED;
 
 	ret = xe_pm_runtime_get_ioctl(xe);
-	if (ret >= 0)
+	if (ret >= 0) {
+		xe_eudebug_discovery_lock(xe, cmd);
 		ret = drm_compat_ioctl(file, cmd, arg);
+		xe_eudebug_discovery_unlock(xe, cmd);
+	}
 	xe_pm_runtime_put(xe);
 
 	return ret;
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index f1fbfe916867..088831a6b863 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -7,6 +7,7 @@
 #define _XE_DEVICE_H_
 
 #include <drm/drm_util.h>
+#include <drm/drm_ioctl.h>
 
 #include "xe_device_types.h"
 #include "xe_gt_types.h"
@@ -205,4 +206,37 @@ void xe_file_put(struct xe_file *xef);
 #define LNL_FLUSH_WORK(wrk__) \
 	flush_work(wrk__)
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+static inline int xe_eudebug_needs_lock(const unsigned int cmd)
+{
+	const unsigned int xe_cmd = DRM_IOCTL_NR(cmd) - DRM_COMMAND_BASE;
+
+	switch (xe_cmd) {
+	case DRM_XE_VM_CREATE:
+	case DRM_XE_VM_DESTROY:
+	case DRM_XE_VM_BIND:
+	case DRM_XE_EXEC_QUEUE_CREATE:
+	case DRM_XE_EXEC_QUEUE_DESTROY:
+	case DRM_XE_EUDEBUG_CONNECT:
+		return 1;
+	}
+
+	return 0;
+}
+
+static inline void xe_eudebug_discovery_lock(struct xe_device *xe, unsigned int cmd)
+{
+	if (xe_eudebug_needs_lock(cmd))
+		down_read(&xe->eudebug.discovery_lock);
+}
+static inline void xe_eudebug_discovery_unlock(struct xe_device *xe, unsigned int cmd)
+{
+	if (xe_eudebug_needs_lock(cmd))
+		up_read(&xe->eudebug.discovery_lock);
+}
+#else
+static inline void xe_eudebug_discovery_lock(struct xe_device *xe, unsigned int cmd) { }
+static inline void xe_eudebug_discovery_unlock(struct xe_device *xe, unsigned int cmd) { }
+#endif /* CONFIG_DRM_XE_EUDEBUG */
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9f04e6476195..9941ea1400c6 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -550,6 +550,12 @@ struct xe_device {
 
 		/** @available: is the debugging functionality available */
 		bool available;
+
+		/** @ordered_wq: ordered workqueue used for discovery */
+		struct workqueue_struct *ordered_wq;
+
+		/** @discovery_lock: taken for write during discovery to block xe ioctls */
+		struct rw_semaphore discovery_lock;
 	} eudebug;
 #endif
 
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index bbb5f1e81bb8..228bc36342ba 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -299,6 +299,8 @@ static bool xe_eudebug_detach(struct xe_device *xe,
 	}
 	spin_unlock(&d->connection.lock);
 
+	flush_work(&d->discovery_work);
+
 	if (!detached)
 		return false;
 
@@ -409,7 +411,7 @@ static struct task_struct *find_task_get(struct xe_file *xef)
 }
 
 static struct xe_eudebug *
-xe_eudebug_get(struct xe_file *xef)
+_xe_eudebug_get(struct xe_file *xef)
 {
 	struct task_struct *task;
 	struct xe_eudebug *d;
@@ -433,6 +435,24 @@ xe_eudebug_get(struct xe_file *xef)
 	return d;
 }
 
+static struct xe_eudebug *
+xe_eudebug_get(struct xe_file *xef)
+{
+	struct xe_eudebug *d;
+
+	lockdep_assert_held(&xef->xe->eudebug.discovery_lock);
+
+	d = _xe_eudebug_get(xef);
+	if (d) {
+		if (!completion_done(&d->discovery)) {
+			xe_eudebug_put(d);
+			d = NULL;
+		}
+	}
+
+	return d;
+}
+
 static int xe_eudebug_queue_event(struct xe_eudebug *d,
 				  struct xe_eudebug_event *event)
 {
@@ -813,6 +833,10 @@ static long xe_eudebug_ioctl(struct file *file,
 	struct xe_eudebug * const d = file->private_data;
 	long ret;
 
+	if (cmd != DRM_XE_EUDEBUG_IOCTL_READ_EVENT &&
+	    !completion_done(&d->discovery))
+		return -EBUSY;
+
 	switch (cmd) {
 	case DRM_XE_EUDEBUG_IOCTL_READ_EVENT:
 		ret = xe_eudebug_read_event(d, arg,
@@ -834,6 +858,8 @@ static const struct file_operations fops = {
 	.unlocked_ioctl	= xe_eudebug_ioctl,
 };
 
+static void discovery_work_fn(struct work_struct *work);
+
 static int
 xe_eudebug_connect(struct xe_device *xe,
 		   struct drm_xe_eudebug_connect *param)
@@ -868,9 +894,11 @@ xe_eudebug_connect(struct xe_device *xe,
 	spin_lock_init(&d->connection.lock);
 	init_waitqueue_head(&d->events.write_done);
 	init_waitqueue_head(&d->events.read_done);
+	init_completion(&d->discovery);
 
 	spin_lock_init(&d->events.lock);
 	INIT_KFIFO(d->events.fifo);
+	INIT_WORK(&d->discovery_work, discovery_work_fn);
 
 	d->res = xe_eudebug_resources_alloc();
 	if (IS_ERR(d->res)) {
@@ -888,6 +916,9 @@ xe_eudebug_connect(struct xe_device *xe,
 		goto err_detach;
 	}
 
+	kref_get(&d->ref);
+	queue_work(xe->eudebug.ordered_wq, &d->discovery_work);
+
 	eu_dbg(d, "connected session %lld", d->session);
 
 	return fd;
@@ -922,13 +953,18 @@ void xe_eudebug_init(struct xe_device *xe)
 
 	spin_lock_init(&xe->clients.lock);
 	INIT_LIST_HEAD(&xe->clients.list);
+	init_rwsem(&xe->eudebug.discovery_lock);
 
-	xe->eudebug.available = true;
+	xe->eudebug.ordered_wq = alloc_ordered_workqueue("xe-eudebug-ordered-wq", 0);
+	xe->eudebug.available = !!xe->eudebug.ordered_wq;
 }
 
 void xe_eudebug_fini(struct xe_device *xe)
 {
 	xe_assert(xe, list_empty_careful(&xe->eudebug.list));
+
+	if (xe->eudebug.ordered_wq)
+		destroy_workqueue(xe->eudebug.ordered_wq);
 }
 
 static int send_open_event(struct xe_eudebug *d, u32 flags, const u64 handle,
@@ -994,21 +1030,25 @@ void xe_eudebug_file_open(struct xe_file *xef)
 	struct xe_eudebug *d;
 
 	INIT_LIST_HEAD(&xef->eudebug.client_link);
+
+	down_read(&xef->xe->eudebug.discovery_lock);
+
 	spin_lock(&xef->xe->clients.lock);
 	list_add_tail(&xef->eudebug.client_link, &xef->xe->clients.list);
 	spin_unlock(&xef->xe->clients.lock);
 
 	d = xe_eudebug_get(xef);
-	if (!d)
-		return;
+	if (d)
+		xe_eudebug_event_put(d, client_create_event(d, xef));
 
-	xe_eudebug_event_put(d, client_create_event(d, xef));
+	up_read(&xef->xe->eudebug.discovery_lock);
 }
 
 void xe_eudebug_file_close(struct xe_file *xef)
 {
 	struct xe_eudebug *d;
 
+	down_read(&xef->xe->eudebug.discovery_lock);
 	d = xe_eudebug_get(xef);
 	if (d)
 		xe_eudebug_event_put(d, client_destroy_event(d, xef));
@@ -1016,6 +1056,8 @@ void xe_eudebug_file_close(struct xe_file *xef)
 	spin_lock(&xef->xe->clients.lock);
 	list_del_init(&xef->eudebug.client_link);
 	spin_unlock(&xef->xe->clients.lock);
+
+	up_read(&xef->xe->eudebug.discovery_lock);
 }
 
 static int send_vm_event(struct xe_eudebug *d, u32 flags,
@@ -1116,3 +1158,86 @@ void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm)
 
 	xe_eudebug_event_put(d, vm_destroy_event(d, xef, vm));
 }
+
+static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
+{
+	struct xe_vm *vm;
+	unsigned long i;
+	int err;
+
+	err = client_create_event(d, xef);
+	if (err)
+		return err;
+
+	xa_for_each(&xef->vm.xa, i, vm) {
+		err = vm_create_event(d, xef, vm);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static bool xe_eudebug_task_match(struct xe_eudebug *d, struct xe_file *xef)
+{
+	struct task_struct *task;
+	bool match;
+
+	task = find_task_get(xef);
+	if (!task)
+		return false;
+
+	match = same_thread_group(d->target_task, task);
+
+	put_task_struct(task);
+
+	return match;
+}
+
+static void discover_clients(struct xe_device *xe, struct xe_eudebug *d)
+{
+	struct xe_file *xef;
+	int err;
+
+	list_for_each_entry(xef, &xe->clients.list, eudebug.client_link) {
+		if (xe_eudebug_detached(d))
+			break;
+
+		if (xe_eudebug_task_match(d, xef))
+			err = discover_client(d, xef);
+		else
+			err = 0;
+
+		if (err) {
+			eu_dbg(d, "discover client %p: %d\n", xef, err);
+			break;
+		}
+	}
+}
+
+static void discovery_work_fn(struct work_struct *work)
+{
+	struct xe_eudebug *d = container_of(work, typeof(*d),
+					    discovery_work);
+	struct xe_device *xe = d->xe;
+
+	if (xe_eudebug_detached(d)) {
+		complete_all(&d->discovery);
+		xe_eudebug_put(d);
+		return;
+	}
+
+	down_write(&xe->eudebug.discovery_lock);
+
+	eu_dbg(d, "Discovery start for %lld\n", d->session);
+
+	discover_clients(xe, d);
+
+	eu_dbg(d, "Discovery end for %lld\n", d->session);
+
+	complete_all(&d->discovery);
+
+	up_write(&xe->eudebug.discovery_lock);
+
+	xe_eudebug_put(d);
+}
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index a5185f18f640..080a821db3e4 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -19,6 +19,7 @@
 struct xe_device;
 struct task_struct;
 struct xe_eudebug_event;
+struct workqueue_struct;
 
 #define CONFIG_DRM_XE_DEBUGGER_EVENT_QUEUE_SIZE 64
 
@@ -96,6 +97,12 @@ struct xe_eudebug {
 	/** @session: session number for this connection (for logs) */
 	u64 session;
 
+	/** @discovery: completion to wait for discovery */
+	struct completion discovery;
+
+	/** @discovery_work: worker to discover resources for target_task */
+	struct work_struct discovery_work;
+
 	/** @events: kfifo queue of to-be-delivered events */
 	struct {
 		/** @lock: guards access to fifo */
-- 
2.43.0



* [PATCH 04/26] drm/xe/eudebug: Introduce exec_queue events
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (2 preceding siblings ...)
  2024-12-09 13:32 ` [PATCH 03/26] drm/xe/eudebug: Introduce discovery for resources Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 05/26] drm/xe/eudebug: Introduce exec queue placements event Mika Kuoppala
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Maciej Patelczyk,
	Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

Inform the debugger about creation and destruction of exec_queues.

1) Use the user engine class types instead of the internal
   xe_engine_class enum in the exec_queue event.

2) During discovery, do not advertise every exec_queue created, only
   the ones whose class is render or compute.

v2: - Only track long running queues
    - Checkpatch (Tilak)

v3: __counted_by added

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c       | 189 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug.h       |   7 +
 drivers/gpu/drm/xe/xe_eudebug_types.h |  31 ++++-
 drivers/gpu/drm/xe/xe_exec_queue.c    |   5 +
 include/uapi/drm/xe_drm_eudebug.h     |  12 ++
 5 files changed, 241 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 228bc36342ba..3ca46ec838b9 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -14,6 +14,7 @@
 #include "xe_device.h"
 #include "xe_eudebug.h"
 #include "xe_eudebug_types.h"
+#include "xe_exec_queue.h"
 #include "xe_macros.h"
 #include "xe_vm.h"
 
@@ -716,7 +717,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -751,7 +752,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -1159,8 +1160,183 @@ void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm)
 	xe_eudebug_event_put(d, vm_destroy_event(d, xef, vm));
 }
 
+static bool exec_queue_class_is_tracked(enum xe_engine_class class)
+{
+	return class == XE_ENGINE_CLASS_COMPUTE ||
+		class == XE_ENGINE_CLASS_RENDER;
+}
+
+static const u16 xe_to_user_engine_class[] = {
+	[XE_ENGINE_CLASS_RENDER] = DRM_XE_ENGINE_CLASS_RENDER,
+	[XE_ENGINE_CLASS_COPY] = DRM_XE_ENGINE_CLASS_COPY,
+	[XE_ENGINE_CLASS_VIDEO_DECODE] = DRM_XE_ENGINE_CLASS_VIDEO_DECODE,
+	[XE_ENGINE_CLASS_VIDEO_ENHANCE] = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE,
+	[XE_ENGINE_CLASS_COMPUTE] = DRM_XE_ENGINE_CLASS_COMPUTE,
+};
+
+static int send_exec_queue_event(struct xe_eudebug *d, u32 flags,
+				 u64 client_handle, u64 vm_handle,
+				 u64 exec_queue_handle, enum xe_engine_class class,
+				 u32 width, u64 *lrc_handles, u64 seqno)
+{
+	struct xe_eudebug_event *event;
+	struct xe_eudebug_event_exec_queue *e;
+	const u32 sz = struct_size(e, lrc_handle, width);
+	const u32 xe_engine_class = xe_to_user_engine_class[class];
+
+	if (!exec_queue_class_is_tracked(class))
+		return -EINVAL;
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE,
+					seqno, flags, sz);
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_exec_queue, e, client_handle, client_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue, e, vm_handle, vm_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue, e, exec_queue_handle,
+		     exec_queue_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue, e, engine_class, xe_engine_class);
+	write_member(struct drm_xe_eudebug_event_exec_queue, e, width, width);
+
+	memcpy(e->lrc_handle, lrc_handles, width * sizeof(*lrc_handles));
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+static int exec_queue_create_event(struct xe_eudebug *d,
+				   struct xe_file *xef, struct xe_exec_queue *q)
+{
+	int h_c, h_vm, h_queue;
+	u64 h_lrc[XE_HW_ENGINE_MAX_INSTANCE], seqno;
+	int i;
+
+	if (!xe_exec_queue_is_lr(q))
+		return 0;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_vm = find_handle(d->res, XE_EUDEBUG_RES_TYPE_VM, q->vm);
+	if (h_vm < 0)
+		return h_vm;
+
+	if (XE_WARN_ON(q->width >= XE_HW_ENGINE_MAX_INSTANCE))
+		return -EINVAL;
+
+	for (i = 0; i < q->width; i++) {
+		int h, ret;
+
+		ret = _xe_eudebug_add_handle(d,
+					     XE_EUDEBUG_RES_TYPE_LRC,
+					     q->lrc[i],
+					     NULL,
+					     &h);
+
+		if (ret < 0 && ret != -EEXIST)
+			return ret;
+
+		XE_WARN_ON(!h);
+
+		h_lrc[i] = h;
+	}
+
+	h_queue = xe_eudebug_add_handle(d, XE_EUDEBUG_RES_TYPE_EXEC_QUEUE, q, &seqno);
+	if (h_queue <= 0)
+		return h_queue;
+
+	/* No need to clean up the added handles on error: if we fail
+	 * we disconnect
+	 */
+
+	return send_exec_queue_event(d, DRM_XE_EUDEBUG_EVENT_CREATE,
+				     h_c, h_vm, h_queue, q->class,
+				     q->width, h_lrc, seqno);
+}
+
+static int exec_queue_destroy_event(struct xe_eudebug *d,
+				    struct xe_file *xef,
+				    struct xe_exec_queue *q)
+{
+	int h_c, h_vm, h_queue;
+	u64 h_lrc[XE_HW_ENGINE_MAX_INSTANCE], seqno;
+	int i;
+
+	if (!xe_exec_queue_is_lr(q))
+		return 0;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_vm = find_handle(d->res, XE_EUDEBUG_RES_TYPE_VM, q->vm);
+	if (h_vm < 0)
+		return h_vm;
+
+	if (XE_WARN_ON(q->width >= XE_HW_ENGINE_MAX_INSTANCE))
+		return -EINVAL;
+
+	h_queue = xe_eudebug_remove_handle(d,
+					   XE_EUDEBUG_RES_TYPE_EXEC_QUEUE,
+					   q,
+					   &seqno);
+	if (h_queue <= 0)
+		return h_queue;
+
+	for (i = 0; i < q->width; i++) {
+		int ret;
+
+		ret = _xe_eudebug_remove_handle(d,
+						XE_EUDEBUG_RES_TYPE_LRC,
+						q->lrc[i],
+						NULL);
+		if (ret < 0 && ret != -ENOENT)
+			return ret;
+
+		XE_WARN_ON(!ret);
+
+		h_lrc[i] = ret;
+	}
+
+	return send_exec_queue_event(d, DRM_XE_EUDEBUG_EVENT_DESTROY,
+				     h_c, h_vm, h_queue, q->class,
+				     q->width, h_lrc, seqno);
+}
+
+void xe_eudebug_exec_queue_create(struct xe_file *xef, struct xe_exec_queue *q)
+{
+	struct xe_eudebug *d;
+
+	if (!exec_queue_class_is_tracked(q->class))
+		return;
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, exec_queue_create_event(d, xef, q));
+}
+
+void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q)
+{
+	struct xe_eudebug *d;
+
+	if (!exec_queue_class_is_tracked(q->class))
+		return;
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, exec_queue_destroy_event(d, xef, q));
+}
+
 static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 {
+	struct xe_exec_queue *q;
 	struct xe_vm *vm;
 	unsigned long i;
 	int err;
@@ -1175,6 +1351,15 @@ static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 			break;
 	}
 
+	xa_for_each(&xef->exec_queue.xa, i, q) {
+		if (!exec_queue_class_is_tracked(q->class))
+			continue;
+
+		err = exec_queue_create_event(d, xef, q);
+		if (err)
+			break;
+	}
+
 	return err;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index e3247365f72f..326ddbd50651 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -10,6 +10,7 @@ struct drm_file;
 struct xe_device;
 struct xe_file;
 struct xe_vm;
+struct xe_exec_queue;
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
 
@@ -26,6 +27,9 @@ void xe_eudebug_file_close(struct xe_file *xef);
 void xe_eudebug_vm_create(struct xe_file *xef, struct xe_vm *vm);
 void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm);
 
+void xe_eudebug_exec_queue_create(struct xe_file *xef, struct xe_exec_queue *q);
+void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q);
+
 #else
 
 static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
@@ -41,6 +45,9 @@ static inline void xe_eudebug_file_close(struct xe_file *xef) { }
 static inline void xe_eudebug_vm_create(struct xe_file *xef, struct xe_vm *vm) { }
 static inline void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm) { }
 
+static inline void xe_eudebug_exec_queue_create(struct xe_file *xef, struct xe_exec_queue *q) { }
+static inline void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q) { }
+
 #endif /* CONFIG_DRM_XE_EUDEBUG */
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index 080a821db3e4..4824c4159036 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -50,7 +50,9 @@ struct xe_eudebug_resource {
 
 #define XE_EUDEBUG_RES_TYPE_CLIENT	0
 #define XE_EUDEBUG_RES_TYPE_VM		1
-#define XE_EUDEBUG_RES_TYPE_COUNT	(XE_EUDEBUG_RES_TYPE_VM + 1)
+#define XE_EUDEBUG_RES_TYPE_EXEC_QUEUE	2
+#define XE_EUDEBUG_RES_TYPE_LRC		3
+#define XE_EUDEBUG_RES_TYPE_COUNT	(XE_EUDEBUG_RES_TYPE_LRC + 1)
 
 /**
  * struct xe_eudebug_resources - eudebug resources for all types
@@ -173,4 +175,31 @@ struct xe_eudebug_event_vm {
 	u64 vm_handle;
 };
 
+/**
+ * struct xe_eudebug_event_exec_queue - Internal event for
+ * exec_queue create/destroy
+ */
+struct xe_eudebug_event_exec_queue {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	/** @client_handle: client for the engine create/destroy */
+	u64 client_handle;
+
+	/** @vm_handle: vm handle for the engine create/destroy */
+	u64 vm_handle;
+
+	/** @exec_queue_handle: engine handle */
+	u64 exec_queue_handle;
+
+	/** @engine_class: engine class */
+	u32 engine_class;
+
+	/** @width: submission width (number of BBs per exec) for this exec queue */
+	u32 width;
+
+	/** @lrc_handle: handles for each logical ring context created with this exec queue */
+	u64 lrc_handle[] __counted_by(width);
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index aab9e561153d..7f5d8af778be 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -23,6 +23,7 @@
 #include "xe_ring_ops_types.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
+#include "xe_eudebug.h"
 
 enum xe_exec_queue_sched_prop {
 	XE_EXEC_QUEUE_JOB_TIMEOUT = 0,
@@ -654,6 +655,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
 
 	args->exec_queue_id = id;
 
+	xe_eudebug_exec_queue_create(xef, q);
+
 	return 0;
 
 kill_exec_queue:
@@ -840,6 +843,8 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
 	if (q->vm && q->hwe->hw_engine_group)
 		xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
 
+	xe_eudebug_exec_queue_destroy(xef, q);
+
 	xe_exec_queue_kill(q);
 
 	trace_xe_exec_queue_close(q);
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index acf6071c82bf..ac44e890152a 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -26,6 +26,7 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_READ		1
 #define DRM_XE_EUDEBUG_EVENT_OPEN		2
 #define DRM_XE_EUDEBUG_EVENT_VM			3
+#define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE		4
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
@@ -49,6 +50,17 @@ struct drm_xe_eudebug_event_vm {
 	__u64 vm_handle;
 };
 
+struct drm_xe_eudebug_event_exec_queue {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 vm_handle;
+	__u64 exec_queue_handle;
+	__u32 engine_class;
+	__u32 width;
+	__u64 lrc_handle[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 05/26] drm/xe/eudebug: Introduce exec queue placements event
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (3 preceding siblings ...)
  2024-12-09 13:32 ` [PATCH 04/26] drm/xe/eudebug: Introduce exec_queue events Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 06/26] drm/xe/eudebug: hw enablement for eudebug Mika Kuoppala
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

This commit introduces the DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS
event, which provides the dbgUMD with information about the hw engines
utilized during execution. The event is sent for every logical ring
context (lrc) in scenarios involving parallel submission.

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c       | 99 ++++++++++++++++++++++++---
 drivers/gpu/drm/xe/xe_eudebug_types.h | 26 +++++++
 include/uapi/drm/xe_drm_eudebug.h     | 17 +++++
 3 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 3ca46ec838b9..cbcf7a72fdba 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -717,7 +717,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -752,7 +752,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -1206,12 +1206,88 @@ static int send_exec_queue_event(struct xe_eudebug *d, u32 flags,
 	return xe_eudebug_queue_event(d, event);
 }
 
-static int exec_queue_create_event(struct xe_eudebug *d,
-				   struct xe_file *xef, struct xe_exec_queue *q)
+static int send_exec_queue_placements_event(struct xe_eudebug *d,
+					    u64 client_handle, u64 vm_handle,
+					    u64 exec_queue_handle, u64 lrc_handle,
+					    u32 num_placements, u64 *instances,
+					    u64 seqno)
+{
+	struct xe_eudebug_event_exec_queue_placements *e;
+	const u32 sz = struct_size(e, instances, num_placements);
+	struct xe_eudebug_event *event;
+
+	event = xe_eudebug_create_event(d,
+					DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS,
+					seqno, DRM_XE_EUDEBUG_EVENT_CREATE, sz);
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_exec_queue_placements, e, client_handle,
+		     client_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue_placements, e, vm_handle, vm_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue_placements, e, exec_queue_handle,
+		     exec_queue_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue_placements, e, lrc_handle, lrc_handle);
+	write_member(struct drm_xe_eudebug_event_exec_queue_placements, e, num_placements,
+		     num_placements);
+
+	memcpy(e->instances, instances, num_placements * sizeof(*instances));
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+static int send_exec_queue_placements_events(struct xe_eudebug *d, struct xe_exec_queue *q,
+					     u64 client_handle, u64 vm_handle,
+					     u64 exec_queue_handle, u64 *lrc_handles)
+{
+	struct drm_xe_engine_class_instance eci[XE_HW_ENGINE_MAX_INSTANCE] = {};
+	unsigned long mask = q->logical_mask;
+	u32 num_placements = 0;
+	int ret, i, j;
+	u64 seqno;
+
+	for_each_set_bit(i, &mask, sizeof(q->logical_mask) * 8) {
+		if (XE_WARN_ON(num_placements == XE_HW_ENGINE_MAX_INSTANCE))
+			break;
+
+		eci[num_placements].engine_class = xe_to_user_engine_class[q->class];
+		eci[num_placements].engine_instance = i;
+		eci[num_placements++].gt_id = q->gt->info.id;
+	}
+
+	ret = 0;
+	for (i = 0; i < q->width; i++) {
+		seqno = atomic_long_inc_return(&d->events.seqno);
+
+		ret = send_exec_queue_placements_event(d, client_handle, vm_handle,
+						       exec_queue_handle, lrc_handles[i],
+						       num_placements, (u64 *)eci, seqno);
+		if (ret)
+			return ret;
+
+		/*
+		 * Parallel submissions must be logically contiguous,
+		 * so the next placement is just q->logical_mask << 1
+		 */
+		for (j = 0; j < num_placements; j++) {
+			eci[j].engine_instance++;
+			XE_WARN_ON(eci[j].engine_instance >= XE_HW_ENGINE_MAX_INSTANCE);
+		}
+	}
+
+	return ret;
+}
+
+static int exec_queue_create_events(struct xe_eudebug *d,
+				    struct xe_file *xef, struct xe_exec_queue *q)
 {
 	int h_c, h_vm, h_queue;
 	u64 h_lrc[XE_HW_ENGINE_MAX_INSTANCE], seqno;
 	int i;
+	int ret = 0;
 
 	if (!xe_exec_queue_is_lr(q))
 		return 0;
@@ -1252,9 +1328,14 @@ static int exec_queue_create_event(struct xe_eudebug *d,
 	 * we disconnect
 	 */
 
-	return send_exec_queue_event(d, DRM_XE_EUDEBUG_EVENT_CREATE,
-				     h_c, h_vm, h_queue, q->class,
-				     q->width, h_lrc, seqno);
+	ret = send_exec_queue_event(d, DRM_XE_EUDEBUG_EVENT_CREATE,
+				  h_c, h_vm, h_queue, q->class,
+				  q->width, h_lrc, seqno);
+
+	if (ret)
+		return ret;
+
+	return send_exec_queue_placements_events(d, q, h_c, h_vm, h_queue, h_lrc);
 }
 
 static int exec_queue_destroy_event(struct xe_eudebug *d,
@@ -1317,7 +1398,7 @@ void xe_eudebug_exec_queue_create(struct xe_file *xef, struct xe_exec_queue *q)
 	if (!d)
 		return;
 
-	xe_eudebug_event_put(d, exec_queue_create_event(d, xef, q));
+	xe_eudebug_event_put(d, exec_queue_create_events(d, xef, q));
 }
 
 void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q)
@@ -1355,7 +1436,7 @@ static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 		if (!exec_queue_class_is_tracked(q->class))
 			continue;
 
-		err = exec_queue_create_event(d, xef, q);
+		err = exec_queue_create_events(d, xef, q);
 		if (err)
 			break;
 	}
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index 4824c4159036..bdffdfb1abff 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -202,4 +202,30 @@ struct xe_eudebug_event_exec_queue {
 	u64 lrc_handle[] __counted_by(width);
 };
 
+/**
+ * struct xe_eudebug_event_exec_queue_placements - Internal event for
+ * exec_queue placements
+ */
+struct xe_eudebug_event_exec_queue_placements {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	/** @client_handle: client for the engine create/destroy */
+	u64 client_handle;
+
+	/** @vm_handle: vm handle for the engine create/destroy */
+	u64 vm_handle;
+
+	/** @exec_queue_handle: engine handle */
+	u64 exec_queue_handle;
+
+	/** @lrc_handle: lrc handle */
+	u64 lrc_handle;
+
+	/** @num_placements: all possible placements for given lrc */
+	u32 num_placements;
+
+	/** @pad: padding */
+	u32 pad;
+
+	/** @instances: num_placements sized array of struct drm_xe_engine_class_instance */
+	u64 instances[] __counted_by(num_placements);
+};
+
 #endif
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index ac44e890152a..21690008a869 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -27,6 +27,7 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_OPEN		2
 #define DRM_XE_EUDEBUG_EVENT_VM			3
 #define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE		4
+#define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS 5
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
@@ -61,6 +62,22 @@ struct drm_xe_eudebug_event_exec_queue {
 	__u64 lrc_handle[];
 };
 
+struct drm_xe_eudebug_event_exec_queue_placements {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 vm_handle;
+	__u64 exec_queue_handle;
+	__u64 lrc_handle;
+	__u32 num_placements;
+	__u32 pad;
+	/**
+	 * @instances: num_placements sized array of struct
+	 * drm_xe_engine_class_instance
+	 */
+	__u64 instances[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 06/26] drm/xe/eudebug: hw enablement for eudebug
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (4 preceding siblings ...)
  2024-12-09 13:32 ` [PATCH 05/26] drm/xe/eudebug: Introduce exec queue placements event Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 07/26] drm/xe: Add EUDEBUG_ENABLE exec queue property Mika Kuoppala
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

In order to turn on debug capabilities (i.e. breakpoints), TD_CTL
and some other registers need to be programmed. Implement eudebug
mode enabling, including the eudebug-related workarounds.

v2: Move workarounds to xe_wa_oob. Use reg_sr directly instead of
xe_rtp, as it is better suited for the dynamic manipulation of those
registers that we do later in the series.
v3: get rid of undefining XE_MCR_REG (Mika)

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/regs/xe_engine_regs.h |  4 ++
 drivers/gpu/drm/xe/regs/xe_gt_regs.h     | 10 +++++
 drivers/gpu/drm/xe/xe_eudebug.c          | 49 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_eudebug.h          |  3 ++
 drivers/gpu/drm/xe/xe_hw_engine.c        |  2 +
 drivers/gpu/drm/xe/xe_wa_oob.rules       |  2 +
 6 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
index 7c78496e6213..e45c4d5378e5 100644
--- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
@@ -115,6 +115,10 @@
 
 #define INDIRECT_RING_STATE(base)		XE_REG((base) + 0x108)
 
+#define CS_DEBUG_MODE2(base)			XE_REG((base) + 0xd8, XE_REG_OPTION_MASKED)
+#define   INST_STATE_CACHE_INVALIDATE		REG_BIT(6)
+#define   GLOBAL_DEBUG_ENABLE			REG_BIT(5)
+
 #define RING_BBADDR(base)			XE_REG((base) + 0x140)
 #define RING_BBADDR_UDW(base)			XE_REG((base) + 0x168)
 
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index 162f18e975da..cd8c49a9000f 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -455,6 +455,14 @@
 #define   DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA	REG_BIT(15)
 #define   CLEAR_OPTIMIZATION_DISABLE			REG_BIT(6)
 
+#define TD_CTL					XE_REG_MCR(0xe400)
+#define   TD_CTL_FEH_AND_FEE_ENABLE		REG_BIT(7) /* forced halt and exception */
+#define   TD_CTL_FORCE_EXTERNAL_HALT		REG_BIT(6)
+#define   TD_CTL_FORCE_THREAD_BREAKPOINT_ENABLE	REG_BIT(4)
+#define   TD_CTL_FORCE_EXCEPTION		REG_BIT(3)
+#define   TD_CTL_BREAKPOINT_ENABLE		REG_BIT(2)
+#define   TD_CTL_GLOBAL_DEBUG_ENABLE		REG_BIT(0) /* XeHP */
+
 #define CACHE_MODE_SS				XE_REG_MCR(0xe420, XE_REG_OPTION_MASKED)
 #define   DISABLE_ECC				REG_BIT(5)
 #define   ENABLE_PREFETCH_INTO_IC		REG_BIT(3)
@@ -481,11 +489,13 @@
 #define   MDQ_ARBITRATION_MODE			REG_BIT(12)
 #define   STALL_DOP_GATING_DISABLE		REG_BIT(5)
 #define   EARLY_EOT_DIS				REG_BIT(1)
 
 #define ROW_CHICKEN2				XE_REG_MCR(0xe4f4, XE_REG_OPTION_MASKED)
 #define   DISABLE_READ_SUPPRESSION		REG_BIT(15)
 #define   DISABLE_EARLY_READ			REG_BIT(14)
 #define   ENABLE_LARGE_GRF_MODE			REG_BIT(12)
+#define   XEHPC_DISABLE_BTB			REG_BIT(11)
 #define   PUSH_CONST_DEREF_HOLD_DIS		REG_BIT(8)
 #define   DISABLE_TDL_SVHS_GATING		REG_BIT(1)
 #define   DISABLE_DOP_GATING			REG_BIT(0)
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index cbcf7a72fdba..fecb7c8a9779 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -10,13 +10,21 @@
 
 #include <drm/drm_managed.h>
 
+#include <generated/xe_wa_oob.h>
+
+#include "regs/xe_gt_regs.h"
+#include "regs/xe_engine_regs.h"
+
 #include "xe_assert.h"
 #include "xe_device.h"
 #include "xe_eudebug.h"
 #include "xe_eudebug_types.h"
 #include "xe_exec_queue.h"
 #include "xe_macros.h"
+#include "xe_reg_sr.h"
+#include "xe_rtp.h"
 #include "xe_vm.h"
+#include "xe_wa.h"
 
 /*
  * If there is no detected event read by userspace, during this period, assume
@@ -947,6 +955,47 @@ int xe_eudebug_connect_ioctl(struct drm_device *dev,
 	return ret;
 }
 
+static void add_sr_entry(struct xe_hw_engine *hwe,
+			 struct xe_reg_mcr mcr_reg,
+			 u32 mask)
+{
+	const struct xe_reg_sr_entry sr_entry = {
+		.reg = mcr_reg.__reg,
+		.clr_bits = mask,
+		.set_bits = mask,
+		.read_mask = mask,
+	};
+
+	xe_reg_sr_add(&hwe->reg_sr, &sr_entry, hwe->gt);
+}
+
+void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe)
+{
+	struct xe_gt *gt = hwe->gt;
+	struct xe_device *xe = gt_to_xe(gt);
+
+	if (!xe->eudebug.available)
+		return;
+
+	if (!xe_rtp_match_first_render_or_compute(gt, hwe))
+		return;
+
+	if (XE_WA(gt, 18022722726))
+		add_sr_entry(hwe, ROW_CHICKEN, STALL_DOP_GATING_DISABLE);
+
+	if (XE_WA(gt, 14015474168))
+		add_sr_entry(hwe, ROW_CHICKEN2, XEHPC_DISABLE_BTB);
+
+	if (xe->info.graphics_verx100 >= 1200)
+		add_sr_entry(hwe, TD_CTL,
+			     TD_CTL_BREAKPOINT_ENABLE |
+			     TD_CTL_FORCE_THREAD_BREAKPOINT_ENABLE |
+			     TD_CTL_FEH_AND_FEE_ENABLE);
+
+	if (xe->info.graphics_verx100 >= 1250)
+		add_sr_entry(hwe, TD_CTL, TD_CTL_GLOBAL_DEBUG_ENABLE);
+}
+
 void xe_eudebug_init(struct xe_device *xe)
 {
 	spin_lock_init(&xe->eudebug.lock);
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index 326ddbd50651..3cd6bc7bb682 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -11,6 +11,7 @@ struct xe_device;
 struct xe_file;
 struct xe_vm;
 struct xe_exec_queue;
+struct xe_hw_engine;
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
 
@@ -20,6 +21,7 @@ int xe_eudebug_connect_ioctl(struct drm_device *dev,
 
 void xe_eudebug_init(struct xe_device *xe);
 void xe_eudebug_fini(struct xe_device *xe);
+void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe);
 
 void xe_eudebug_file_open(struct xe_file *xef);
 void xe_eudebug_file_close(struct xe_file *xef);
@@ -38,6 +40,7 @@ static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
 
 static inline void xe_eudebug_init(struct xe_device *xe) { }
 static inline void xe_eudebug_fini(struct xe_device *xe) { }
+static inline void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe) { }
 
 static inline void xe_eudebug_file_open(struct xe_file *xef) { }
 static inline void xe_eudebug_file_close(struct xe_file *xef) { }
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index c4b0dc3be39c..8a188ddc99f4 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -16,6 +16,7 @@
 #include "xe_assert.h"
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_eudebug.h"
 #include "xe_execlist.h"
 #include "xe_force_wake.h"
 #include "xe_gsc.h"
@@ -558,6 +559,7 @@ static void hw_engine_init_early(struct xe_gt *gt, struct xe_hw_engine *hwe,
 	xe_tuning_process_engine(hwe);
 	xe_wa_process_engine(hwe);
 	hw_engine_setup_default_state(hwe);
+	xe_eudebug_init_hw_engine(hwe);
 
 	xe_reg_sr_init(&hwe->reg_whitelist, hwe->name, gt_to_xe(gt));
 	xe_reg_whitelist_process_engine(hwe);
diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
index 3ed12a85cc60..cc2f28663072 100644
--- a/drivers/gpu/drm/xe/xe_wa_oob.rules
+++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
@@ -42,3 +42,5 @@
 no_media_l3	MEDIA_VERSION(3000)
 14022866841	GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0)
 		MEDIA_VERSION(3000), MEDIA_STEP(A0, B0)
+18022722726	GRAPHICS_VERSION_RANGE(1250, 1274)
+14015474168	PLATFORM(PVC)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 07/26] drm/xe: Add EUDEBUG_ENABLE exec queue property
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (5 preceding siblings ...)
  2024-12-09 13:32 ` [PATCH 06/26] drm/xe/eudebug: hw enablement for eudebug Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:32 ` [PATCH 08/26] drm/xe/eudebug: Introduce per device attention scan worker Mika Kuoppala
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Matthew Brost,
	Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

Introduce an immutable exec queue property, eudebug, whose value is
a set of flags enabling eudebug-specific feature(s).

For now, the engine lrc will use this flag to set up the runalone
hw feature. Runalone is used to ensure that only one hw engine of
the group [rcs0, ccs0-3] is active on a tile.

Note: unlike i915, xe allows the user to set runalone also on
devices with a single render/compute engine. It should not make much
difference, but it leaves control to the user.

v2: - check CONFIG_DRM_XE_EUDEBUG and LR mode (Matthew)
    - disable preempt (Dominik)
    - lrc_create remove from engine init

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c          |  4 +--
 drivers/gpu/drm/xe/xe_exec_queue.c       | 46 ++++++++++++++++++++++--
 drivers/gpu/drm/xe/xe_exec_queue.h       |  2 ++
 drivers/gpu/drm/xe/xe_exec_queue_types.h |  7 ++++
 drivers/gpu/drm/xe/xe_execlist.c         |  2 +-
 drivers/gpu/drm/xe/xe_lrc.c              | 16 +++++++--
 drivers/gpu/drm/xe/xe_lrc.h              |  4 ++-
 include/uapi/drm/xe_drm.h                |  3 +-
 8 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index fecb7c8a9779..4644d6846aae 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -1338,7 +1338,7 @@ static int exec_queue_create_events(struct xe_eudebug *d,
 	int i;
 	int ret = 0;
 
-	if (!xe_exec_queue_is_lr(q))
+	if (!xe_exec_queue_is_debuggable(q))
 		return 0;
 
 	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
@@ -1395,7 +1395,7 @@ static int exec_queue_destroy_event(struct xe_eudebug *d,
 	u64 h_lrc[XE_HW_ENGINE_MAX_INSTANCE], seqno;
 	int i;
 
-	if (!xe_exec_queue_is_lr(q))
+	if (!xe_exec_queue_is_debuggable(q))
 		return 0;
 
 	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 7f5d8af778be..cca46a32723e 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -109,6 +109,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
 static int __xe_exec_queue_init(struct xe_exec_queue *q)
 {
 	struct xe_vm *vm = q->vm;
+	u32 flags = 0;
 	int i, err;
 
 	if (vm) {
@@ -117,8 +118,11 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q)
 			return err;
 	}
 
+	if (q->eudebug_flags & EXEC_QUEUE_EUDEBUG_FLAG_ENABLE)
+		flags |= LRC_CREATE_RUNALONE;
+
 	for (i = 0; i < q->width; ++i) {
-		q->lrc[i] = xe_lrc_create(q->hwe, q->vm, SZ_16K);
+		q->lrc[i] = xe_lrc_create(q->hwe, q->vm, SZ_16K, flags);
 		if (IS_ERR(q->lrc[i])) {
 			err = PTR_ERR(q->lrc[i]);
 			goto err_unlock;
@@ -403,6 +407,42 @@ static int exec_queue_set_timeslice(struct xe_device *xe, struct xe_exec_queue *
 	return 0;
 }
 
+static int exec_queue_set_eudebug(struct xe_device *xe, struct xe_exec_queue *q,
+				  u64 value)
+{
+	const u64 known_flags = DRM_XE_EXEC_QUEUE_EUDEBUG_FLAG_ENABLE;
+
+	if (XE_IOCTL_DBG(xe, (q->class != XE_ENGINE_CLASS_RENDER &&
+			      q->class != XE_ENGINE_CLASS_COMPUTE)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, (value & ~known_flags)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)))
+		return -EOPNOTSUPP;
+
+	if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_lr(q)))
+		return -EINVAL;
+	/*
+	 * We want to explicitly set the global feature if
+	 * property is set.
+	 */
+	if (XE_IOCTL_DBG(xe,
+			 !(value & DRM_XE_EXEC_QUEUE_EUDEBUG_FLAG_ENABLE)))
+		return -EINVAL;
+
+	q->eudebug_flags = EXEC_QUEUE_EUDEBUG_FLAG_ENABLE;
+	q->sched_props.preempt_timeout_us = 0;
+
+	return 0;
+}
+
+int xe_exec_queue_is_debuggable(struct xe_exec_queue *q)
+{
+	return q->eudebug_flags & EXEC_QUEUE_EUDEBUG_FLAG_ENABLE;
+}
+
 typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
 					     struct xe_exec_queue *q,
 					     u64 value);
@@ -410,6 +450,7 @@ typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
 static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
 	[DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
 	[DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
+	[DRM_XE_EXEC_QUEUE_SET_PROPERTY_EUDEBUG] = exec_queue_set_eudebug,
 };
 
 static int exec_queue_user_ext_set_property(struct xe_device *xe,
@@ -429,7 +470,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
 			 ARRAY_SIZE(exec_queue_set_property_funcs)) ||
 	    XE_IOCTL_DBG(xe, ext.pad) ||
 	    XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
-			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE))
+			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
+			 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_EUDEBUG))
 		return -EINVAL;
 
 	idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index 90c7f73eab88..421d8dc89814 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -85,4 +85,6 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
 				      struct xe_vm *vm);
 void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
 
+int xe_exec_queue_is_debuggable(struct xe_exec_queue *q);
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 1158b6062a6c..03f3ad235e4b 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -90,6 +90,13 @@ struct xe_exec_queue {
 	 */
 	unsigned long flags;
 
+	/**
+	 * @eudebug_flags: immutable eudebug flags for this exec queue.
+	 * Set up with DRM_XE_EXEC_QUEUE_SET_PROPERTY_EUDEBUG.
+	 */
+#define EXEC_QUEUE_EUDEBUG_FLAG_ENABLE		BIT(0)
+	unsigned long eudebug_flags;
+
 	union {
 		/** @multi_gt_list: list head for VM bind engines if multi-GT */
 		struct list_head multi_gt_list;
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index a8c416a48812..84b69a5dd361 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -265,7 +265,7 @@ struct xe_execlist_port *xe_execlist_port_create(struct xe_device *xe,
 
 	port->hwe = hwe;
 
-	port->lrc = xe_lrc_create(hwe, NULL, SZ_16K);
+	port->lrc = xe_lrc_create(hwe, NULL, SZ_16K, 0);
 	if (IS_ERR(port->lrc)) {
 		err = PTR_ERR(port->lrc);
 		goto err;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 22e58c6e2a35..4ff217ca5474 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -876,7 +876,7 @@ static void xe_lrc_finish(struct xe_lrc *lrc)
 #define PVC_CTX_ACC_CTR_THOLD	(0x2a + 1)
 
 static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
-		       struct xe_vm *vm, u32 ring_size)
+		       struct xe_vm *vm, u32 ring_size, u32 flags)
 {
 	struct xe_gt *gt = hwe->gt;
 	struct xe_tile *tile = gt_to_tile(gt);
@@ -993,6 +993,16 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 	map = __xe_lrc_start_seqno_map(lrc);
 	xe_map_write32(lrc_to_xe(lrc), &map, lrc->fence_ctx.next_seqno - 1);
 
+	if (flags & LRC_CREATE_RUNALONE) {
+		u32 ctx_control = xe_lrc_read_ctx_reg(lrc, CTX_CONTEXT_CONTROL);
+
+		drm_dbg(&xe->drm, "read CTX_CONTEXT_CONTROL: 0x%x\n", ctx_control);
+		ctx_control |= _MASKED_BIT_ENABLE(CTX_CTRL_RUN_ALONE);
+		drm_dbg(&xe->drm, "written CTX_CONTEXT_CONTROL: 0x%x\n", ctx_control);
+
+		xe_lrc_write_ctx_reg(lrc, CTX_CONTEXT_CONTROL, ctx_control);
+	}
+
 	return 0;
 
 err_lrc_finish:
@@ -1012,7 +1022,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
  * upon failure.
  */
 struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
-			     u32 ring_size)
+			     u32 ring_size, u32 flags)
 {
 	struct xe_lrc *lrc;
 	int err;
@@ -1021,7 +1031,7 @@ struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
 	if (!lrc)
 		return ERR_PTR(-ENOMEM);
 
-	err = xe_lrc_init(lrc, hwe, vm, ring_size);
+	err = xe_lrc_init(lrc, hwe, vm, ring_size, flags);
 	if (err) {
 		kfree(lrc);
 		return ERR_PTR(err);
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index b459dcab8787..3e5656752831 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -41,8 +41,10 @@ struct xe_lrc_snapshot {
 
 #define LRC_PPHWSP_SCRATCH_ADDR (0x34 * 4)
 
+#define LRC_CREATE_RUNALONE	BIT(0)
+
 struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
-			     u32 ring_size);
+			     u32 ring_size, u32 flags);
 void xe_lrc_destroy(struct kref *ref);
 
 /**
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 78479100a0b6..d0b9ef0799b2 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1112,7 +1112,8 @@ struct drm_xe_exec_queue_create {
 #define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY		0
 #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY		0
 #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE		1
-
+#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_EUDEBUG		2
+#define     DRM_XE_EXEC_QUEUE_EUDEBUG_FLAG_ENABLE		(1 << 0)
 	/** @extensions: Pointer to the first extension struct, if any */
 	__u64 extensions;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 08/26] drm/xe/eudebug: Introduce per device attention scan worker
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (6 preceding siblings ...)
  2024-12-09 13:32 ` [PATCH 07/26] drm/xe: Add EUDEBUG_ENABLE exec queue property Mika Kuoppala
@ 2024-12-09 13:32 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 09/26] drm/xe/eudebug: Introduce EU control interface Mika Kuoppala
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek,
	Christoph Manszewski, Maciej Patelczyk, Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

Scan for EU debugging attention bits periodically to detect if some EU
thread has entered the system routine (SIP) due to an EU thread exception.

Make the scanning interval 10 times slower when there is no debugger
connection open. Send an attention event whenever we see attention while
a debugger is present. If there is no active debugger connection, reset
the GT.

Based on work by the authors and other folks who were part of the
attention handling in i915.

v2: - use xa_array for files
    - null ptr deref fix for non-debugged context (Dominik)
    - checkpatch (Tilak)
    - use discovery_lock during list traversal

v3: - engine status per gen improvements, force_wake ref
    - __counted_by (Mika)

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/Makefile              |   1 +
 drivers/gpu/drm/xe/regs/xe_engine_regs.h |   3 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h     |   7 +
 drivers/gpu/drm/xe/xe_device.c           |   2 +
 drivers/gpu/drm/xe/xe_device_types.h     |   3 +
 drivers/gpu/drm/xe/xe_eudebug.c          | 410 ++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug.h          |   2 +
 drivers/gpu/drm/xe/xe_eudebug_types.h    |  32 ++
 drivers/gpu/drm/xe/xe_gt_debug.c         | 148 ++++++++
 drivers/gpu/drm/xe/xe_gt_debug.h         |  21 ++
 include/uapi/drm/xe_drm_eudebug.h        |  13 +
 11 files changed, 640 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.c
 create mode 100644 drivers/gpu/drm/xe/xe_gt_debug.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index deabcdd3ea52..33f457e4fcd3 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -48,6 +48,7 @@ xe-y += xe_bb.o \
 	xe_gt_clock.o \
 	xe_gt_freq.o \
 	xe_gt_idle.o \
+	xe_gt_debug.o \
 	xe_gt_mcr.o \
 	xe_gt_pagefault.o \
 	xe_gt_sysfs.o \
diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
index e45c4d5378e5..83b26cb174d6 100644
--- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h
@@ -133,6 +133,9 @@
 #define RING_EXECLIST_STATUS_LO(base)		XE_REG((base) + 0x234)
 #define RING_EXECLIST_STATUS_HI(base)		XE_REG((base) + 0x234 + 4)
 
+#define RING_CURRENT_LRCA(base)			XE_REG((base) + 0x240)
+#define   CURRENT_LRCA_VALID			REG_BIT(0)
+
 #define RING_CONTEXT_CONTROL(base)		XE_REG((base) + 0x244, XE_REG_OPTION_MASKED)
 #define	  CTX_CTRL_OAC_CONTEXT_ENABLE		REG_BIT(8)
 #define	  CTX_CTRL_RUN_ALONE			REG_BIT(7)
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index cd8c49a9000f..a20331b6c20e 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -467,6 +467,8 @@
 #define   DISABLE_ECC				REG_BIT(5)
 #define   ENABLE_PREFETCH_INTO_IC		REG_BIT(3)
 
+#define TD_ATT(x)				XE_REG_MCR(0xe470 + (x) * 4)
+
 #define ROW_CHICKEN4				XE_REG_MCR(0xe48c, XE_REG_OPTION_MASKED)
 #define   DISABLE_GRF_CLEAR			REG_BIT(13)
 #define   XEHP_DIS_BBL_SYSPIPE			REG_BIT(11)
@@ -547,6 +549,11 @@
 #define   CCS_MODE_CSLICE(cslice, ccs) \
 	((ccs) << ((cslice) * CCS_MODE_CSLICE_WIDTH))
 
+#define RCU_DEBUG_1				XE_REG(0x14a00)
+#define   RCU_DEBUG_1_ENGINE_STATUS		REG_GENMASK(2, 0)
+#define   RCU_DEBUG_1_RUNALONE_ACTIVE		REG_BIT(2)
+#define   RCU_DEBUG_1_CONTEXT_ACTIVE		REG_BIT(0)
+
 #define FORCEWAKE_ACK_GT			XE_REG(0x130044)
 
 /* Applicable for all FORCEWAKE_DOMAIN and FORCEWAKE_ACK_DOMAIN regs */
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index f051612908de..dc0336215912 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -777,6 +777,8 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_debugfs_register(xe);
 
+	xe_eudebug_init_late(xe);
+
 	xe_hwmon_register(xe);
 
 	for_each_gt(gt, xe, id)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9941ea1400c6..7b893a86d83f 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -556,6 +556,9 @@ struct xe_device {
 
 		/** discovery_lock: used for discovery to block xe ioctls */
 		struct rw_semaphore discovery_lock;
+
+		/** @attention_scan: attention scan worker */
+		struct delayed_work attention_scan;
 	} eudebug;
 #endif
 
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 4644d6846aae..39e927100222 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -20,9 +20,17 @@
 #include "xe_eudebug.h"
 #include "xe_eudebug_types.h"
 #include "xe_exec_queue.h"
+#include "xe_force_wake.h"
+#include "xe_gt.h"
+#include "xe_gt_debug.h"
+#include "xe_hw_engine.h"
+#include "xe_lrc.h"
 #include "xe_macros.h"
+#include "xe_mmio.h"
+#include "xe_pm.h"
 #include "xe_reg_sr.h"
 #include "xe_rtp.h"
+#include "xe_sched_job.h"
 #include "xe_vm.h"
 #include "xe_wa.h"
 
@@ -725,7 +733,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_EU_ATTENTION;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -760,7 +768,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_EU_ATTENTION;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -867,6 +875,392 @@ static const struct file_operations fops = {
 	.unlocked_ioctl	= xe_eudebug_ioctl,
 };
 
+static int __current_lrca(struct xe_hw_engine *hwe, u32 *lrc_hw)
+{
+	u32 lrc_reg;
+
+	lrc_reg = xe_hw_engine_mmio_read32(hwe, RING_CURRENT_LRCA(0));
+
+	if (!(lrc_reg & CURRENT_LRCA_VALID))
+		return -ENOENT;
+
+	*lrc_hw = lrc_reg & GENMASK(31, 12);
+
+	return 0;
+}
+
+static int current_lrca(struct xe_hw_engine *hwe, u32 *lrc_hw)
+{
+	unsigned int fw_ref;
+	int ret;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(hwe->gt), hwe->domain);
+	if (!fw_ref)
+		return -ETIMEDOUT;
+
+	ret = __current_lrca(hwe, lrc_hw);
+
+	xe_force_wake_put(gt_to_fw(hwe->gt), fw_ref);
+
+	return ret;
+}
+
+static bool lrca_equals(u32 a, u32 b)
+{
+	return (a & GENMASK(31, 12)) == (b & GENMASK(31, 12));
+}
+
+static int match_exec_queue_lrca(struct xe_exec_queue *q, u32 lrc_hw)
+{
+	int i;
+
+	for (i = 0; i < q->width; i++)
+		if (lrca_equals(lower_32_bits(xe_lrc_descriptor(q->lrc[i])), lrc_hw))
+			return i;
+
+	return -1;
+}
+
+static int rcu_debug1_engine_index(const struct xe_hw_engine * const hwe)
+{
+	if (hwe->class == XE_ENGINE_CLASS_RENDER) {
+		XE_WARN_ON(hwe->instance);
+		return 0;
+	}
+
+	XE_WARN_ON(hwe->instance > 3);
+
+	return hwe->instance + 1;
+}
+
+static u32 engine_status_xe1(const struct xe_hw_engine * const hwe,
+			     u32 rcu_debug1)
+{
+	const unsigned int first = 7;
+	const unsigned int incr = 3;
+	const unsigned int i = rcu_debug1_engine_index(hwe);
+	const unsigned int shift = first + (i * incr);
+
+	return (rcu_debug1 >> shift) & RCU_DEBUG_1_ENGINE_STATUS;
+}
+
+static u32 engine_status_xe2(const struct xe_hw_engine * const hwe,
+			     u32 rcu_debug1)
+{
+	const unsigned int first = 7;
+	const unsigned int incr = 4;
+	const unsigned int i = rcu_debug1_engine_index(hwe);
+	const unsigned int shift = first + (i * incr);
+
+	return (rcu_debug1 >> shift) & RCU_DEBUG_1_ENGINE_STATUS;
+}
+
+static u32 engine_status(const struct xe_hw_engine * const hwe,
+			 u32 rcu_debug1)
+{
+	u32 status = 0;
+
+	if (GRAPHICS_VER(gt_to_xe(hwe->gt)) < 20)
+		status = engine_status_xe1(hwe, rcu_debug1);
+	else if (GRAPHICS_VER(gt_to_xe(hwe->gt)) < 30)
+		status = engine_status_xe2(hwe, rcu_debug1);
+	else
+		XE_WARN_ON(GRAPHICS_VER(gt_to_xe(hwe->gt)));
+
+	return status;
+}
+
+static bool engine_is_runalone_set(const struct xe_hw_engine * const hwe,
+				   u32 rcu_debug1)
+{
+	return engine_status(hwe, rcu_debug1) & RCU_DEBUG_1_RUNALONE_ACTIVE;
+}
+
+static bool engine_is_context_set(const struct xe_hw_engine * const hwe,
+				  u32 rcu_debug1)
+{
+	return engine_status(hwe, rcu_debug1) & RCU_DEBUG_1_CONTEXT_ACTIVE;
+}
+
+static bool engine_has_runalone(const struct xe_hw_engine * const hwe)
+{
+	return hwe->class == XE_ENGINE_CLASS_RENDER ||
+		hwe->class == XE_ENGINE_CLASS_COMPUTE;
+}
+
+static struct xe_hw_engine *get_runalone_active_hw_engine(struct xe_gt *gt)
+{
+	struct xe_hw_engine *hwe, *first = NULL;
+	unsigned int num_active, id, fw_ref;
+	u32 val;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+	if (!fw_ref) {
+		drm_dbg(&gt_to_xe(gt)->drm, "eudbg: runalone failed to get force wake\n");
+		return NULL;
+	}
+
+	val = xe_mmio_read32(&gt->mmio, RCU_DEBUG_1);
+	xe_force_wake_put(gt_to_fw(gt), fw_ref);
+
+	drm_dbg(&gt_to_xe(gt)->drm, "eudbg: runalone RCU_DEBUG_1 = 0x%08x\n", val);
+
+	num_active = 0;
+	for_each_hw_engine(hwe, gt, id) {
+		bool runalone, ctx;
+
+		if (!engine_has_runalone(hwe))
+			continue;
+
+		runalone = engine_is_runalone_set(hwe, val);
+		ctx = engine_is_context_set(hwe, val);
+
+		drm_dbg(&gt_to_xe(gt)->drm, "eudbg: engine %s: runalone=%s, context=%s",
+			hwe->name, runalone ? "active" : "inactive",
+			ctx ? "active" : "inactive");
+
+		/*
+		 * On earlier gen12 the context status seems to be idle when
+		 * it has raised attention. We have to omit the active bit.
+		 */
+		if (IS_DGFX(gt_to_xe(gt)))
+			ctx = true;
+
+		if (runalone && ctx) {
+			num_active++;
+
+			drm_dbg(&gt_to_xe(gt)->drm, "eudbg: runalone engine %s %s",
+				hwe->name, first ? "selected" : "found");
+			if (!first)
+				first = hwe;
+		}
+	}
+
+	if (num_active > 1)
+		drm_err(&gt_to_xe(gt)->drm, "eudbg: %d runalone engines active!",
+			num_active);
+
+	return first;
+}
+
+static struct xe_exec_queue *runalone_active_queue_get(struct xe_gt *gt, int *lrc_idx)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_exec_queue *q, *found = NULL;
+	struct xe_hw_engine *active;
+	struct xe_file *xef;
+	unsigned long i;
+	int idx, err;
+	u32 lrc_hw;
+
+	active = get_runalone_active_hw_engine(gt);
+	if (!active) {
+		drm_dbg(&gt_to_xe(gt)->drm, "Runalone engine not found!");
+		return ERR_PTR(-ENOENT);
+	}
+
+	err = current_lrca(active, &lrc_hw);
+	if (err)
+		return ERR_PTR(err);
+
+	/* Take write so that we can safely check the lists */
+	down_write(&xe->eudebug.discovery_lock);
+	list_for_each_entry(xef, &xe->clients.list, eudebug.client_link) {
+		xa_for_each(&xef->exec_queue.xa, i, q) {
+			if (q->gt != gt)
+				continue;
+
+			if (q->class != active->class)
+				continue;
+
+			if (xe_exec_queue_is_idle(q))
+				continue;
+
+			idx = match_exec_queue_lrca(q, lrc_hw);
+			if (idx < 0)
+				continue;
+
+			found = xe_exec_queue_get(q);
+
+			if (lrc_idx)
+				*lrc_idx = idx;
+
+			break;
+		}
+
+		if (found)
+			break;
+	}
+	up_write(&xe->eudebug.discovery_lock);
+
+	if (!found)
+		return ERR_PTR(-ENOENT);
+
+	if (XE_WARN_ON(current_lrca(active, &lrc_hw)) &&
+	    XE_WARN_ON(match_exec_queue_lrca(found, lrc_hw) < 0)) {
+		xe_exec_queue_put(found);
+		return ERR_PTR(-ENOENT);
+	}
+
+	return found;
+}
+
+static int send_attention_event(struct xe_eudebug *d, struct xe_exec_queue *q, int lrc_idx)
+{
+	struct xe_eudebug_event_eu_attention *ea;
+	struct xe_eudebug_event *event;
+	int h_c, h_queue, h_lrc;
+	u32 size = xe_gt_eu_attention_bitmap_size(q->gt);
+	u32 sz = struct_size(ea, bitmask, size);
+	int ret;
+
+	XE_WARN_ON(lrc_idx < 0 || lrc_idx >= q->width);
+
+	XE_WARN_ON(!xe_exec_queue_is_debuggable(q));
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, q->vm->xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_queue = find_handle(d->res, XE_EUDEBUG_RES_TYPE_EXEC_QUEUE, q);
+	if (h_queue < 0)
+		return h_queue;
+
+	h_lrc = find_handle(d->res, XE_EUDEBUG_RES_TYPE_LRC, q->lrc[lrc_idx]);
+	if (h_lrc < 0)
+		return h_lrc;
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_EU_ATTENTION, 0,
+					DRM_XE_EUDEBUG_EVENT_STATE_CHANGE, sz);
+
+	if (!event)
+		return -ENOSPC;
+
+	ea = cast_event(ea, event);
+	write_member(struct drm_xe_eudebug_event_eu_attention, ea, client_handle, (u64)h_c);
+	write_member(struct drm_xe_eudebug_event_eu_attention, ea, exec_queue_handle, (u64)h_queue);
+	write_member(struct drm_xe_eudebug_event_eu_attention, ea, lrc_handle, (u64)h_lrc);
+	write_member(struct drm_xe_eudebug_event_eu_attention, ea, bitmask_size, size);
+
+	mutex_lock(&d->eu_lock);
+	event->seqno = atomic_long_inc_return(&d->events.seqno);
+	ret = xe_gt_eu_attention_bitmap(q->gt, &ea->bitmask[0], ea->bitmask_size);
+	mutex_unlock(&d->eu_lock);
+
+	if (ret)
+		return ret;
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+
+static int xe_send_gt_attention(struct xe_gt *gt)
+{
+	struct xe_eudebug *d;
+	struct xe_exec_queue *q;
+	int ret, lrc_idx;
+
+	if (list_empty_careful(&gt_to_xe(gt)->eudebug.list))
+		return -ENOTCONN;
+
+	q = runalone_active_queue_get(gt, &lrc_idx);
+	if (IS_ERR(q))
+		return PTR_ERR(q);
+
+	if (!xe_exec_queue_is_debuggable(q)) {
+		ret = -EPERM;
+		goto err_exec_queue_put;
+	}
+
+	d = _xe_eudebug_get(q->vm->xef);
+	if (!d) {
+		ret = -ENOTCONN;
+		goto err_exec_queue_put;
+	}
+
+	if (!completion_done(&d->discovery)) {
+		eu_dbg(d, "discovery not yet done\n");
+		ret = -EBUSY;
+		goto err_eudebug_put;
+	}
+
+	ret = send_attention_event(d, q, lrc_idx);
+	if (ret)
+		xe_eudebug_disconnect(d, ret);
+
+err_eudebug_put:
+	xe_eudebug_put(d);
+err_exec_queue_put:
+	xe_exec_queue_put(q);
+
+	return ret;
+}
+
+static int xe_eudebug_handle_gt_attention(struct xe_gt *gt)
+{
+	int ret;
+
+	ret = xe_gt_eu_threads_needing_attention(gt);
+	if (ret <= 0)
+		return ret;
+
+	ret = xe_send_gt_attention(gt);
+
+	/* Discovery in progress, fake it */
+	if (ret == -EBUSY)
+		return 0;
+
+	return ret;
+}
+
+#define XE_EUDEBUG_ATTENTION_INTERVAL 100
+static void attention_scan_fn(struct work_struct *work)
+{
+	struct xe_device *xe = container_of(work, typeof(*xe), eudebug.attention_scan.work);
+	long delay = msecs_to_jiffies(XE_EUDEBUG_ATTENTION_INTERVAL);
+	struct xe_gt *gt;
+	u8 gt_id;
+
+	if (list_empty_careful(&xe->eudebug.list))
+		delay *= 10;
+
+	if (delay >= HZ)
+		delay = round_jiffies_up_relative(delay);
+
+	if (xe_pm_runtime_get_if_active(xe)) {
+		for_each_gt(gt, xe, gt_id) {
+			int ret;
+
+			if (gt->info.type != XE_GT_TYPE_MAIN)
+				continue;
+
+			ret = xe_eudebug_handle_gt_attention(gt);
+			if (ret) {
+				// TODO: error capture
+				drm_info(&gt_to_xe(gt)->drm,
+					 "gt:%d unable to handle eu attention ret=%d\n",
+					 gt_id, ret);
+
+				xe_gt_reset_async(gt);
+			}
+		}
+
+		xe_pm_runtime_put(xe);
+	}
+
+	schedule_delayed_work(&xe->eudebug.attention_scan, delay);
+}
+
+static void attention_scan_cancel(struct xe_device *xe)
+{
+	cancel_delayed_work_sync(&xe->eudebug.attention_scan);
+}
+
+static void attention_scan_flush(struct xe_device *xe)
+{
+	mod_delayed_work(system_wq, &xe->eudebug.attention_scan, 0);
+}
+
 static void discovery_work_fn(struct work_struct *work);
 
 static int
@@ -901,6 +1295,7 @@ xe_eudebug_connect(struct xe_device *xe,
 
 	kref_init(&d->ref);
 	spin_lock_init(&d->connection.lock);
+	mutex_init(&d->eu_lock);
 	init_waitqueue_head(&d->events.write_done);
 	init_waitqueue_head(&d->events.read_done);
 	init_completion(&d->discovery);
@@ -927,6 +1322,7 @@ xe_eudebug_connect(struct xe_device *xe,
 
 	kref_get(&d->ref);
 	queue_work(xe->eudebug.ordered_wq, &d->discovery_work);
+	attention_scan_flush(xe);
 
 	eu_dbg(d, "connected session %lld", d->session);
 
@@ -1004,13 +1400,23 @@ void xe_eudebug_init(struct xe_device *xe)
 	spin_lock_init(&xe->clients.lock);
 	INIT_LIST_HEAD(&xe->clients.list);
 	init_rwsem(&xe->eudebug.discovery_lock);
+	INIT_DELAYED_WORK(&xe->eudebug.attention_scan, attention_scan_fn);
 
 	xe->eudebug.ordered_wq = alloc_ordered_workqueue("xe-eudebug-ordered-wq", 0);
 	xe->eudebug.available = !!xe->eudebug.ordered_wq;
 }
 
+void xe_eudebug_init_late(struct xe_device *xe)
+{
+	if (!xe->eudebug.available)
+		return;
+
+	attention_scan_flush(xe);
+}
+
 void xe_eudebug_fini(struct xe_device *xe)
 {
+	attention_scan_cancel(xe);
 	xe_assert(xe, list_empty_careful(&xe->eudebug.list));
 
 	if (xe->eudebug.ordered_wq)
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index 3cd6bc7bb682..1fe86bec99e1 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -20,6 +20,7 @@ int xe_eudebug_connect_ioctl(struct drm_device *dev,
 			     struct drm_file *file);
 
 void xe_eudebug_init(struct xe_device *xe);
+void xe_eudebug_init_late(struct xe_device *xe);
 void xe_eudebug_fini(struct xe_device *xe);
 void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe);
 
@@ -39,6 +40,7 @@ static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
 					   struct drm_file *file) { return 0; }
 
 static inline void xe_eudebug_init(struct xe_device *xe) { }
+static inline void xe_eudebug_init_late(struct xe_device *xe) { }
 static inline void xe_eudebug_fini(struct xe_device *xe) { }
 static inline void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe) { }
 
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index bdffdfb1abff..410b3ecccc12 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -105,6 +105,9 @@ struct xe_eudebug {
 	/** @discovery_work: worker to discover resources for target_task */
 	struct work_struct discovery_work;
 
+	/** eu_lock: guards operations on eus (eu thread control and attention) */
+	struct mutex eu_lock;
+
 	/** @events: kfifo queue of to-be-delivered events */
 	struct {
 		/** @lock: guards access to fifo */
@@ -228,4 +231,33 @@ struct xe_eudebug_event_exec_queue_placements {
 	u64 instances[]; __counted_by(num_placements);
 };
 
+/**
+ * struct xe_eudebug_event_eu_attention - Internal event for EU attention
+ */
+struct xe_eudebug_event_eu_attention {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	/** @client_handle: client for the attention */
+	u64 client_handle;
+
+	/** @exec_queue_handle: handle of exec_queue which raised attention */
+	u64 exec_queue_handle;
+
+	/** @lrc_handle: lrc handle of the workload which raised attention */
+	u64 lrc_handle;
+
+	/** @flags: eu attention event flags, currently MBZ */
+	u32 flags;
+
+	/** @bitmask_size: size of the bitmask, specific to device */
+	u32 bitmask_size;
+
+	/**
+	 * @bitmask: reflects threads currently signalling attention,
+	 * starting from natural hardware order of DSS=0, eu=0
+	 */
+	u8 bitmask[] __counted_by(bitmask_size);
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.c b/drivers/gpu/drm/xe/xe_gt_debug.c
new file mode 100644
index 000000000000..c4f0d11a20a6
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_debug.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "regs/xe_gt_regs.h"
+#include "xe_device.h"
+#include "xe_force_wake.h"
+#include "xe_gt.h"
+#include "xe_gt_topology.h"
+#include "xe_gt_debug.h"
+#include "xe_gt_mcr.h"
+#include "xe_pm.h"
+#include "xe_macros.h"
+
+static int xe_gt_foreach_dss_group_instance(struct xe_gt *gt,
+					    int (*fn)(struct xe_gt *gt,
+						      void *data,
+						      u16 group,
+						      u16 instance),
+					    void *data)
+{
+	const enum xe_force_wake_domains fw_domains = XE_FW_GT;
+	unsigned int dss, fw_ref;
+	u16 group, instance;
+	int ret = 0;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), fw_domains);
+	if (!fw_ref)
+		return -ETIMEDOUT;
+
+	for_each_dss_steering(dss, gt, group, instance) {
+		ret = fn(gt, data, group, instance);
+		if (ret)
+			break;
+	}
+
+	xe_force_wake_put(gt_to_fw(gt), fw_ref);
+
+	return ret;
+}
+
+static int read_first_attention_mcr(struct xe_gt *gt, void *data,
+				    u16 group, u16 instance)
+{
+	unsigned int row;
+
+	for (row = 0; row < 2; row++) {
+		u32 val;
+
+		val = xe_gt_mcr_unicast_read(gt, TD_ATT(row), group, instance);
+
+		if (val)
+			return 1;
+	}
+
+	return 0;
+}
+
+#define MAX_EUS_PER_ROW 4u
+#define MAX_THREADS 8u
+
+/**
+ * xe_gt_eu_attention_bitmap_size - query size of the attention bitmask
+ *
+ * @gt: pointer to struct xe_gt
+ *
+ * Return: size in bytes.
+ */
+int xe_gt_eu_attention_bitmap_size(struct xe_gt *gt)
+{
+	xe_dss_mask_t dss_mask;
+
+	bitmap_or(dss_mask, gt->fuse_topo.c_dss_mask,
+		  gt->fuse_topo.g_dss_mask, XE_MAX_DSS_FUSE_BITS);
+
+	return  bitmap_weight(dss_mask, XE_MAX_DSS_FUSE_BITS) *
+		TD_EU_ATTENTION_MAX_ROWS * MAX_THREADS *
+		MAX_EUS_PER_ROW / 8;
+}
+
+struct attn_read_iter {
+	struct xe_gt *gt;
+	unsigned int i;
+	unsigned int size;
+	u8 *bits;
+};
+
+static int read_eu_attentions_mcr(struct xe_gt *gt, void *data,
+				  u16 group, u16 instance)
+{
+	struct attn_read_iter * const iter = data;
+	unsigned int row;
+
+	for (row = 0; row < TD_EU_ATTENTION_MAX_ROWS; row++) {
+		u32 val;
+
+		if (iter->i >= iter->size)
+			return 0;
+
+		XE_WARN_ON(iter->i + sizeof(val) > xe_gt_eu_attention_bitmap_size(gt));
+
+		val = xe_gt_mcr_unicast_read(gt, TD_ATT(row), group, instance);
+
+		memcpy(&iter->bits[iter->i], &val, sizeof(val));
+		iter->i += sizeof(val);
+	}
+
+	return 0;
+}
+
+/**
+ * xe_gt_eu_attention_bitmap - query host attention
+ *
+ * @gt: pointer to struct xe_gt
+ *
+ * Return: 0 on success, negative otherwise.
+ */
+int xe_gt_eu_attention_bitmap(struct xe_gt *gt, u8 *bits,
+			      unsigned int bitmap_size)
+{
+	struct attn_read_iter iter = {
+		.gt = gt,
+		.i = 0,
+		.size = bitmap_size,
+		.bits = bits
+	};
+
+	return xe_gt_foreach_dss_group_instance(gt, read_eu_attentions_mcr, &iter);
+}
+
+/**
+ * xe_gt_eu_threads_needing_attention - Query host attention
+ *
+ * @gt: pointer to struct xe_gt
+ *
+ * Return: 1 if threads waiting host attention, 0 otherwise.
+ */
+int xe_gt_eu_threads_needing_attention(struct xe_gt *gt)
+{
+	int err;
+
+	err = xe_gt_foreach_dss_group_instance(gt, read_first_attention_mcr, NULL);
+
+	XE_WARN_ON(err < 0);
+
+	return err < 0 ? 0 : err;
+}
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.h b/drivers/gpu/drm/xe/xe_gt_debug.h
new file mode 100644
index 000000000000..3f13dbb17a5f
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_debug.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __XE_GT_DEBUG_
+#define __XE_GT_DEBUG_
+
+#define TD_EU_ATTENTION_MAX_ROWS 2u
+
+#include "xe_gt_types.h"
+
+#define XE_GT_ATTENTION_TIMEOUT_MS 100
+
+int xe_gt_eu_threads_needing_attention(struct xe_gt *gt);
+
+int xe_gt_eu_attention_bitmap_size(struct xe_gt *gt);
+int xe_gt_eu_attention_bitmap(struct xe_gt *gt, u8 *bits,
+			      unsigned int bitmap_size);
+
+#endif
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index 21690008a869..144c7cf888bb 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -28,12 +28,14 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_VM			3
 #define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE		4
 #define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS 5
+#define DRM_XE_EUDEBUG_EVENT_EU_ATTENTION	6
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
 #define DRM_XE_EUDEBUG_EVENT_DESTROY		(1 << 1)
 #define DRM_XE_EUDEBUG_EVENT_STATE_CHANGE	(1 << 2)
 #define DRM_XE_EUDEBUG_EVENT_NEED_ACK		(1 << 3)
+
 	__u64 seqno;
 	__u64 reserved;
 };
@@ -78,6 +80,17 @@ struct drm_xe_eudebug_event_exec_queue_placements {
 	__u64 instances[];
 };
 
+struct drm_xe_eudebug_event_eu_attention {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 exec_queue_handle;
+	__u64 lrc_handle;
+	__u32 flags;
+	__u32 bitmask_size;
+	__u8 bitmask[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0



* [PATCH 09/26] drm/xe/eudebug: Introduce EU control interface
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (7 preceding siblings ...)
  2024-12-09 13:32 ` [PATCH 08/26] drm/xe/eudebug: Introduce per device attention scan worker Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 10/26] drm/xe/eudebug: Add vm bind and vm bind ops Mika Kuoppala
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Maciej Patelczyk,
	Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

Introduce EU control functionality, which allows the EU debugger to
interrupt and resume EU threads, and to query their current state
during execution. Provide an abstraction layer, so that in the future
the GuC will only need to provide the appropriate callbacks.

Based on the implementation created by the authors and other
contributors within the i915 driver.

v2: - checkpatch (Maciej)
    - lrc index off by one fix (Mika)
    - checkpatch (Tilak)
    - 32bit fixes (Andrzej, Mika)
    - find_resource_get for client (Mika)

v3: - fw ref (Mika)

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/regs/xe_gt_regs.h  |   2 +
 drivers/gpu/drm/xe/xe_eudebug.c       | 515 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug_types.h |  24 ++
 drivers/gpu/drm/xe/xe_gt_debug.c      |  12 +-
 drivers/gpu/drm/xe/xe_gt_debug.h      |   6 +
 include/uapi/drm/xe_drm_eudebug.h     |  21 +-
 6 files changed, 560 insertions(+), 20 deletions(-)
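For context (not part of the patch), the userspace side of the new EU_CONTROL ioctl can be sketched as follows. The struct is a local mirror of the uapi addition at the end of this patch, and the validation helper mirrors the -EINVAL checks in do_eu_control() (zero flags, hardware-register-granularity bitmask size); the device fd plumbing is assumed:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct drm_xe_eudebug_eu_control from the uapi header. */
struct xe_eu_control_args {
	uint64_t client_handle;
	uint32_t cmd;       /* INTERRUPT_ALL = 0, STOPPED = 1, RESUME = 2 */
	uint32_t flags;     /* must be zero; unknown flags are rejected */
	uint64_t seqno;     /* written back by the kernel on success */
	uint64_t exec_queue_handle;
	uint64_t lrc_handle;
	uint32_t reserved;
	uint32_t bitmask_size;
	uint64_t bitmask_ptr;
};

/* Mirror of the kernel-side sanity checks in do_eu_control(). */
static int eu_control_args_valid(const struct xe_eu_control_args *a)
{
	if (a->flags)
		return 0;
	/* Only hardware-register granularity (u32 multiples) is accepted. */
	if (a->bitmask_size % sizeof(uint32_t))
		return 0;
	return 1;
}
```

A caller would then issue ioctl(debugger_fd, DRM_XE_EUDEBUG_IOCTL_EU_CONTROL, &args) and, for STOPPED, read the returned bitmask and seqno.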

diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index a20331b6c20e..5fcf06835ef0 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -478,6 +478,8 @@
 #define   THREAD_EX_ARB_MODE			REG_GENMASK(3, 2)
 #define   THREAD_EX_ARB_MODE_RR_AFTER_DEP	REG_FIELD_PREP(THREAD_EX_ARB_MODE, 0x2)
 
+#define TD_CLR(i)				XE_REG_MCR(0xe490 + (i) * 4)
+
 #define ROW_CHICKEN3				XE_REG_MCR(0xe49c, XE_REG_OPTION_MASKED)
 #define   XE2_EUPEND_CHK_FLUSH_DIS		REG_BIT(14)
 #define   DIS_FIX_EOT1_FLUSH			REG_BIT(9)
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 39e927100222..81d03a860b7f 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -23,6 +23,7 @@
 #include "xe_force_wake.h"
 #include "xe_gt.h"
 #include "xe_gt_debug.h"
+#include "xe_gt_mcr.h"
 #include "xe_hw_engine.h"
 #include "xe_lrc.h"
 #include "xe_macros.h"
@@ -587,6 +588,64 @@ static int find_handle(struct xe_eudebug_resources *res,
 	return id;
 }
 
+static void *find_resource__unlocked(struct xe_eudebug_resources *res,
+				     const int type,
+				     const u32 id)
+{
+	struct xe_eudebug_resource *r;
+	struct xe_eudebug_handle *h;
+
+	r = resource_from_type(res, type);
+	h = xa_load(&r->xa, id);
+
+	return h ? (void *)(uintptr_t)h->key : NULL;
+}
+
+static void *find_resource(struct xe_eudebug_resources *res,
+			   const int type,
+			   const u32 id)
+{
+	void *p;
+
+	mutex_lock(&res->lock);
+	p = find_resource__unlocked(res, type, id);
+	mutex_unlock(&res->lock);
+
+	return p;
+}
+
+static struct xe_file *find_client_get(struct xe_eudebug *d, const u32 id)
+{
+	struct xe_file *xef;
+
+	mutex_lock(&d->res->lock);
+	xef = find_resource__unlocked(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, id);
+	if (xef)
+		xe_file_get(xef);
+	mutex_unlock(&d->res->lock);
+
+	return xef;
+}
+
+static struct xe_exec_queue *find_exec_queue_get(struct xe_eudebug *d,
+						 u32 id)
+{
+	struct xe_exec_queue *q;
+
+	mutex_lock(&d->res->lock);
+	q = find_resource__unlocked(d->res, XE_EUDEBUG_RES_TYPE_EXEC_QUEUE, id);
+	if (q)
+		xe_exec_queue_get(q);
+	mutex_unlock(&d->res->lock);
+
+	return q;
+}
+
+static struct xe_lrc *find_lrc(struct xe_eudebug *d, const u32 id)
+{
+	return find_resource(d->res, XE_EUDEBUG_RES_TYPE_LRC, id);
+}
+
 static int _xe_eudebug_add_handle(struct xe_eudebug *d,
 				  int type,
 				  void *p,
@@ -843,6 +902,177 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 	return ret;
 }
 
+static int do_eu_control(struct xe_eudebug *d,
+			 const struct drm_xe_eudebug_eu_control * const arg,
+			 struct drm_xe_eudebug_eu_control __user * const user_ptr)
+{
+	void __user * const bitmask_ptr = u64_to_user_ptr(arg->bitmask_ptr);
+	struct xe_device *xe = d->xe;
+	u8 *bits = NULL;
+	unsigned int hw_attn_size, attn_size;
+	struct xe_exec_queue *q;
+	struct xe_file *xef;
+	struct xe_lrc *lrc;
+	u64 seqno;
+	int ret;
+
+	if (xe_eudebug_detached(d))
+		return -ENOTCONN;
+
+	/* Accept only hardware reg granularity mask */
+	if (XE_IOCTL_DBG(xe, !IS_ALIGNED(arg->bitmask_size, sizeof(u32))))
+		return -EINVAL;
+
+	xef = find_client_get(d, arg->client_handle);
+	if (XE_IOCTL_DBG(xe, !xef))
+		return -EINVAL;
+
+	q = find_exec_queue_get(d, arg->exec_queue_handle);
+	if (XE_IOCTL_DBG(xe, !q)) {
+		xe_file_put(xef);
+		return -EINVAL;
+	}
+
+	if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_debuggable(q))) {
+		ret = -EINVAL;
+		goto queue_put;
+	}
+
+	if (XE_IOCTL_DBG(xe, xef != q->vm->xef)) {
+		ret = -EINVAL;
+		goto queue_put;
+	}
+
+	lrc = find_lrc(d, arg->lrc_handle);
+	if (XE_IOCTL_DBG(xe, !lrc)) {
+		ret = -EINVAL;
+		goto queue_put;
+	}
+
+	hw_attn_size = xe_gt_eu_attention_bitmap_size(q->gt);
+	attn_size = arg->bitmask_size;
+
+	if (attn_size > hw_attn_size)
+		attn_size = hw_attn_size;
+
+	if (attn_size > 0) {
+		bits = kmalloc(attn_size, GFP_KERNEL);
+		if (!bits) {
+			ret = -ENOMEM;
+			goto queue_put;
+		}
+
+		if (copy_from_user(bits, bitmask_ptr, attn_size)) {
+			ret = -EFAULT;
+			goto out_free;
+		}
+	}
+
+	if (!pm_runtime_active(xe->drm.dev)) {
+		ret = -EIO;
+		goto out_free;
+	}
+
+	ret = -EINVAL;
+	mutex_lock(&d->eu_lock);
+
+	switch (arg->cmd) {
+	case DRM_XE_EUDEBUG_EU_CONTROL_CMD_INTERRUPT_ALL:
+		/* Make sure we don't promise anything but interrupting all */
+		if (!attn_size)
+			ret = d->ops->interrupt_all(d, q, lrc);
+		break;
+	case DRM_XE_EUDEBUG_EU_CONTROL_CMD_STOPPED:
+		ret = d->ops->stopped(d, q, lrc, bits, attn_size);
+		break;
+	case DRM_XE_EUDEBUG_EU_CONTROL_CMD_RESUME:
+		ret = d->ops->resume(d, q, lrc, bits, attn_size);
+		break;
+	default:
+		break;
+	}
+
+	if (ret == 0)
+		seqno = atomic_long_inc_return(&d->events.seqno);
+
+	mutex_unlock(&d->eu_lock);
+
+	if (ret)
+		goto out_free;
+
+	if (put_user(seqno, &user_ptr->seqno)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	if (copy_to_user(bitmask_ptr, bits, attn_size)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	if (hw_attn_size != arg->bitmask_size)
+		if (put_user(hw_attn_size, &user_ptr->bitmask_size))
+			ret = -EFAULT;
+
+out_free:
+	kfree(bits);
+queue_put:
+	xe_exec_queue_put(q);
+	xe_file_put(xef);
+
+	return ret;
+}
+
+static long xe_eudebug_eu_control(struct xe_eudebug *d, const u64 arg)
+{
+	struct drm_xe_eudebug_eu_control __user * const user_ptr =
+		u64_to_user_ptr(arg);
+	struct drm_xe_eudebug_eu_control user_arg;
+	struct xe_device *xe = d->xe;
+	struct xe_file *xef;
+	int ret;
+
+	if (XE_IOCTL_DBG(xe, !(_IOC_DIR(DRM_XE_EUDEBUG_IOCTL_EU_CONTROL) & _IOC_WRITE)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !(_IOC_DIR(DRM_XE_EUDEBUG_IOCTL_EU_CONTROL) & _IOC_READ)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, _IOC_SIZE(DRM_XE_EUDEBUG_IOCTL_EU_CONTROL) != sizeof(user_arg)))
+		return -EINVAL;
+
+	if (copy_from_user(&user_arg,
+			   user_ptr,
+			   sizeof(user_arg)))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, user_arg.flags))
+		return -EINVAL;
+
+	if (!access_ok(u64_to_user_ptr(user_arg.bitmask_ptr), user_arg.bitmask_size))
+		return -EFAULT;
+
+	eu_dbg(d,
+	       "eu_control: client_handle=%llu, cmd=%u, flags=0x%x, exec_queue_handle=%llu, bitmask_size=%u\n",
+	       user_arg.client_handle, user_arg.cmd, user_arg.flags, user_arg.exec_queue_handle,
+	       user_arg.bitmask_size);
+
+	xef = find_client_get(d, user_arg.client_handle);
+	if (XE_IOCTL_DBG(xe, !xef))
+		return -EINVAL; /* As this is user input */
+
+	ret = do_eu_control(d, &user_arg, user_ptr);
+
+	xe_file_put(xef);
+
+	eu_dbg(d,
+	       "eu_control: client_handle=%llu, cmd=%u, flags=0x%x, exec_queue_handle=%llu, bitmask_size=%u ret=%d\n",
+	       user_arg.client_handle, user_arg.cmd, user_arg.flags, user_arg.exec_queue_handle,
+	       user_arg.bitmask_size, ret);
+
+	return ret;
+}
+
 static long xe_eudebug_ioctl(struct file *file,
 			     unsigned int cmd,
 			     unsigned long arg)
@@ -859,6 +1089,10 @@ static long xe_eudebug_ioctl(struct file *file,
 		ret = xe_eudebug_read_event(d, arg,
 					    !(file->f_flags & O_NONBLOCK));
 		break;
+	case DRM_XE_EUDEBUG_IOCTL_EU_CONTROL:
+		ret = xe_eudebug_eu_control(d, arg);
+		eu_dbg(d, "ioctl cmd=EU_CONTROL ret=%ld\n", ret);
+		break;
 
 	default:
 		ret = -EINVAL;
@@ -1043,23 +1277,17 @@ static struct xe_hw_engine *get_runalone_active_hw_engine(struct xe_gt *gt)
 	return first;
 }
 
-static struct xe_exec_queue *runalone_active_queue_get(struct xe_gt *gt, int *lrc_idx)
+static struct xe_exec_queue *active_hwe_to_exec_queue(struct xe_hw_engine *hwe, int *lrc_idx)
 {
-	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_device *xe = gt_to_xe(hwe->gt);
+	struct xe_gt *gt = hwe->gt;
 	struct xe_exec_queue *q, *found = NULL;
-	struct xe_hw_engine *active;
 	struct xe_file *xef;
 	unsigned long i;
 	int idx, err;
 	u32 lrc_hw;
 
-	active = get_runalone_active_hw_engine(gt);
-	if (!active) {
-		drm_dbg(&gt_to_xe(gt)->drm, "Runalone engine not found!");
-		return ERR_PTR(-ENOENT);
-	}
-
-	err = current_lrca(active, &lrc_hw);
+	err = current_lrca(hwe, &lrc_hw);
 	if (err)
 		return ERR_PTR(err);
 
@@ -1070,7 +1298,7 @@ static struct xe_exec_queue *runalone_active_queue_get(struct xe_gt *gt, int *lr
 			if (q->gt != gt)
 				continue;
 
-			if (q->class != active->class)
+			if (q->class != hwe->class)
 				continue;
 
 			if (xe_exec_queue_is_idle(q))
@@ -1096,7 +1324,7 @@ static struct xe_exec_queue *runalone_active_queue_get(struct xe_gt *gt, int *lr
 	if (!found)
 		return ERR_PTR(-ENOENT);
 
-	if (XE_WARN_ON(current_lrca(active, &lrc_hw)) &&
+	if (XE_WARN_ON(current_lrca(hwe, &lrc_hw)) &&
 	    XE_WARN_ON(match_exec_queue_lrca(found, lrc_hw) < 0)) {
 		xe_exec_queue_put(found);
 		return ERR_PTR(-ENOENT);
@@ -1105,6 +1333,19 @@ static struct xe_exec_queue *runalone_active_queue_get(struct xe_gt *gt, int *lr
 	return found;
 }
 
+static struct xe_exec_queue *runalone_active_queue_get(struct xe_gt *gt, int *lrc_idx)
+{
+	struct xe_hw_engine *active;
+
+	active = get_runalone_active_hw_engine(gt);
+	if (!active) {
+		drm_dbg(&gt_to_xe(gt)->drm, "Runalone engine not found!");
+		return ERR_PTR(-ENOENT);
+	}
+
+	return active_hwe_to_exec_queue(active, lrc_idx);
+}
+
 static int send_attention_event(struct xe_eudebug *d, struct xe_exec_queue *q, int lrc_idx)
 {
 	struct xe_eudebug_event_eu_attention *ea;
@@ -1153,7 +1394,6 @@ static int send_attention_event(struct xe_eudebug *d, struct xe_exec_queue *q, i
 	return xe_eudebug_queue_event(d, event);
 }
 
-
 static int xe_send_gt_attention(struct xe_gt *gt)
 {
 	struct xe_eudebug *d;
@@ -1261,6 +1501,254 @@ static void attention_scan_flush(struct xe_device *xe)
 	mod_delayed_work(system_wq, &xe->eudebug.attention_scan, 0);
 }
 
+static int xe_eu_control_interrupt_all(struct xe_eudebug *d,
+				       struct xe_exec_queue *q,
+				       struct xe_lrc *lrc)
+{
+	struct xe_gt *gt = q->hwe->gt;
+	struct xe_device *xe = d->xe;
+	struct xe_exec_queue *active;
+	struct xe_hw_engine *hwe;
+	unsigned int fw_ref;
+	int lrc_idx, ret;
+	u32 lrc_hw;
+	u32 td_ctl;
+
+	hwe = get_runalone_active_hw_engine(gt);
+	if (XE_IOCTL_DBG(xe, !hwe)) {
+		drm_dbg(&gt_to_xe(gt)->drm, "Runalone engine not found!");
+		return -EINVAL;
+	}
+
+	active = active_hwe_to_exec_queue(hwe, &lrc_idx);
+	if (XE_IOCTL_DBG(xe, IS_ERR(active)))
+		return PTR_ERR(active);
+
+	if (XE_IOCTL_DBG(xe, q != active)) {
+		xe_exec_queue_put(active);
+		return -EINVAL;
+	}
+	xe_exec_queue_put(active);
+
+	if (XE_IOCTL_DBG(xe, lrc_idx >= q->width || q->lrc[lrc_idx] != lrc))
+		return -EINVAL;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), hwe->domain);
+	if (!fw_ref)
+		return -ETIMEDOUT;
+
+	/* Additional check just before issuing MMIO writes */
+	ret = __current_lrca(hwe, &lrc_hw);
+	if (ret)
+		goto put_fw;
+
+	if (!lrca_equals(lower_32_bits(xe_lrc_descriptor(lrc)), lrc_hw)) {
+		ret = -EBUSY;
+		goto put_fw;
+	}
+
+	td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
+
+	/* Halt on next thread dispatch */
+	if (!(td_ctl & TD_CTL_FORCE_EXTERNAL_HALT))
+		xe_gt_mcr_multicast_write(gt, TD_CTL,
+					  td_ctl | TD_CTL_FORCE_EXTERNAL_HALT);
+	else
+		eu_warn(d, "TD_CTL force external halt bit already set!\n");
+
+	/*
+	 * The sleep is needed because some interrupts are ignored
+	 * by the HW, hence we allow the HW some time to acknowledge
+	 * that.
+	 */
+	usleep_range(100, 110);
+
+	/* Halt regardless of thread dependencies */
+	if (!(td_ctl & TD_CTL_FORCE_EXCEPTION))
+		xe_gt_mcr_multicast_write(gt, TD_CTL,
+					  td_ctl | TD_CTL_FORCE_EXCEPTION);
+	else
+		eu_warn(d, "TD_CTL force exception bit already set!\n");
+
+	usleep_range(100, 110);
+
+	xe_gt_mcr_multicast_write(gt, TD_CTL, td_ctl &
+				  ~(TD_CTL_FORCE_EXTERNAL_HALT | TD_CTL_FORCE_EXCEPTION));
+
+	/*
+	 * If we stopped the wrong context, emit a warning.
+	 * Nothing else we can do for now.
+	 */
+	ret = __current_lrca(hwe, &lrc_hw);
+	if (ret || !lrca_equals(lower_32_bits(xe_lrc_descriptor(lrc)), lrc_hw))
+		eu_warn(d, "xe_eudebug: interrupted wrong context.");
+
+put_fw:
+	xe_force_wake_put(gt_to_fw(gt), fw_ref);
+
+	return ret;
+}
+
+struct ss_iter {
+	struct xe_eudebug *debugger;
+	unsigned int i;
+
+	unsigned int size;
+	u8 *bits;
+};
+
+static int check_attn_mcr(struct xe_gt *gt, void *data,
+			  u16 group, u16 instance)
+{
+	struct ss_iter *iter = data;
+	struct xe_eudebug *d = iter->debugger;
+	unsigned int row;
+
+	for (row = 0; row < TD_EU_ATTENTION_MAX_ROWS; row++) {
+		u32 val, cur = 0;
+
+		if (iter->i >= iter->size)
+			return 0;
+
+		if (XE_WARN_ON((iter->i + sizeof(val)) >
+				(xe_gt_eu_attention_bitmap_size(gt))))
+			return -EIO;
+
+		memcpy(&val, &iter->bits[iter->i], sizeof(val));
+		iter->i += sizeof(val);
+
+		cur = xe_gt_mcr_unicast_read(gt, TD_ATT(row), group, instance);
+
+		if ((val | cur) != cur) {
+			eu_dbg(d,
+			       "WRONG CLEAR (%u:%u:%u) TD_CLR: 0x%08x; TD_ATT: 0x%08x\n",
+			       group, instance, row, val, cur);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int clear_attn_mcr(struct xe_gt *gt, void *data,
+			  u16 group, u16 instance)
+{
+	struct ss_iter *iter = data;
+	struct xe_eudebug *d = iter->debugger;
+	unsigned int row;
+
+	for (row = 0; row < TD_EU_ATTENTION_MAX_ROWS; row++) {
+		u32 val;
+
+		if (iter->i >= iter->size)
+			return 0;
+
+		if (XE_WARN_ON((iter->i + sizeof(val)) >
+				(xe_gt_eu_attention_bitmap_size(gt))))
+			return -EIO;
+
+		memcpy(&val, &iter->bits[iter->i], sizeof(val));
+		iter->i += sizeof(val);
+
+		if (!val)
+			continue;
+
+		xe_gt_mcr_unicast_write(gt, TD_CLR(row), val,
+					group, instance);
+
+		eu_dbg(d,
+		       "TD_CLR: (%u:%u:%u): 0x%08x\n",
+		       group, instance, row, val);
+	}
+
+	return 0;
+}
+
+static int xe_eu_control_resume(struct xe_eudebug *d,
+				struct xe_exec_queue *q,
+				struct xe_lrc *lrc,
+				u8 *bits, unsigned int bitmask_size)
+{
+	struct xe_device *xe = d->xe;
+	struct ss_iter iter = {
+		.debugger = d,
+		.i = 0,
+		.size = bitmask_size,
+		.bits = bits
+	};
+	int ret = 0;
+	struct xe_exec_queue *active;
+	int lrc_idx;
+
+	active = runalone_active_queue_get(q->gt, &lrc_idx);
+	if (IS_ERR(active))
+		return PTR_ERR(active);
+
+	if (XE_IOCTL_DBG(xe, q != active)) {
+		xe_exec_queue_put(active);
+		return -EBUSY;
+	}
+	xe_exec_queue_put(active);
+
+	if (XE_IOCTL_DBG(xe, lrc_idx >= q->width || q->lrc[lrc_idx] != lrc))
+		return -EBUSY;
+
+	/*
+	 * hsdes: 18021122357
+	 * We need to avoid clearing attention bits that are not set
+	 * in order to avoid the EOT hang on PVC.
+	 */
+	if (GRAPHICS_VERx100(d->xe) == 1260) {
+		ret = xe_gt_foreach_dss_group_instance(q->gt, check_attn_mcr, &iter);
+		if (ret)
+			return ret;
+
+		iter.i = 0;
+	}
+
+	xe_gt_foreach_dss_group_instance(q->gt, clear_attn_mcr, &iter);
+	return 0;
+}
+
+static int xe_eu_control_stopped(struct xe_eudebug *d,
+				 struct xe_exec_queue *q,
+				 struct xe_lrc *lrc,
+				 u8 *bits, unsigned int bitmask_size)
+{
+	struct xe_device *xe = d->xe;
+	struct xe_exec_queue *active;
+	int lrc_idx;
+
+	if (XE_WARN_ON(!q) || XE_WARN_ON(!q->gt))
+		return -EINVAL;
+
+	active = runalone_active_queue_get(q->gt, &lrc_idx);
+	if (IS_ERR(active))
+		return PTR_ERR(active);
+
+	if (active) {
+		if (XE_IOCTL_DBG(xe, q != active)) {
+			xe_exec_queue_put(active);
+			return -EBUSY;
+		}
+
+		if (XE_IOCTL_DBG(xe, lrc_idx >= q->width || q->lrc[lrc_idx] != lrc)) {
+			xe_exec_queue_put(active);
+			return -EBUSY;
+		}
+	}
+
+	xe_exec_queue_put(active);
+
+	return xe_gt_eu_attention_bitmap(q->gt, bits, bitmask_size);
+}
+
+static struct xe_eudebug_eu_control_ops eu_control = {
+	.interrupt_all = xe_eu_control_interrupt_all,
+	.stopped = xe_eu_control_stopped,
+	.resume = xe_eu_control_resume,
+};
+
 static void discovery_work_fn(struct work_struct *work);
 
 static int
@@ -1320,6 +1808,7 @@ xe_eudebug_connect(struct xe_device *xe,
 		goto err_detach;
 	}
 
+	d->ops = &eu_control;
 	kref_get(&d->ref);
 	queue_work(xe->eudebug.ordered_wq, &d->discovery_work);
 	attention_scan_flush(xe);
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index 410b3ecccc12..e1d4e31b32ec 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -18,8 +18,12 @@
 
 struct xe_device;
 struct task_struct;
+struct xe_eudebug;
 struct xe_eudebug_event;
+struct xe_hw_engine;
 struct workqueue_struct;
+struct xe_exec_queue;
+struct xe_lrc;
 
 #define CONFIG_DRM_XE_DEBUGGER_EVENT_QUEUE_SIZE 64
 
@@ -65,6 +69,24 @@ struct xe_eudebug_resources {
 	struct xe_eudebug_resource rt[XE_EUDEBUG_RES_TYPE_COUNT];
 };
 
+/**
+ * struct xe_eudebug_eu_control_ops - interface for eu thread
+ * state control backend
+ */
+struct xe_eudebug_eu_control_ops {
+	/** @interrupt_all: interrupts workload active on given hwe */
+	int (*interrupt_all)(struct xe_eudebug *e, struct xe_exec_queue *q,
+			     struct xe_lrc *lrc);
+
+	/** @resume: resumes threads reflected by bitmask active on given hwe */
+	int (*resume)(struct xe_eudebug *e, struct xe_exec_queue *q,
+		      struct xe_lrc *lrc, u8 *bitmap, unsigned int bitmap_size);
+
+	/** @stopped: returns bitmap reflecting threads which signal attention */
+	int (*stopped)(struct xe_eudebug *e, struct xe_exec_queue *q,
+		       struct xe_lrc *lrc, u8 *bitmap, unsigned int bitmap_size);
+};
+
 /**
  * struct xe_eudebug - Top level struct for eudebug: the connection
  */
@@ -128,6 +150,8 @@ struct xe_eudebug {
 		atomic_long_t seqno;
 	} events;
 
+	/** @ops: operations for eu_control */
+	struct xe_eudebug_eu_control_ops *ops;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.c b/drivers/gpu/drm/xe/xe_gt_debug.c
index c4f0d11a20a6..f35b9df5e41b 100644
--- a/drivers/gpu/drm/xe/xe_gt_debug.c
+++ b/drivers/gpu/drm/xe/xe_gt_debug.c
@@ -13,12 +13,12 @@
 #include "xe_pm.h"
 #include "xe_macros.h"
 
-static int xe_gt_foreach_dss_group_instance(struct xe_gt *gt,
-					    int (*fn)(struct xe_gt *gt,
-						      void *data,
-						      u16 group,
-						      u16 instance),
-					    void *data)
+int xe_gt_foreach_dss_group_instance(struct xe_gt *gt,
+				     int (*fn)(struct xe_gt *gt,
+					       void *data,
+					       u16 group,
+					       u16 instance),
+				     void *data)
 {
 	const enum xe_force_wake_domains fw_domains = XE_FW_GT;
 	unsigned int dss, fw_ref;
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.h b/drivers/gpu/drm/xe/xe_gt_debug.h
index 3f13dbb17a5f..342082699ff6 100644
--- a/drivers/gpu/drm/xe/xe_gt_debug.h
+++ b/drivers/gpu/drm/xe/xe_gt_debug.h
@@ -13,6 +13,12 @@
 #define XE_GT_ATTENTION_TIMEOUT_MS 100
 
 int xe_gt_eu_threads_needing_attention(struct xe_gt *gt);
+int xe_gt_foreach_dss_group_instance(struct xe_gt *gt,
+				     int (*fn)(struct xe_gt *gt,
+					       void *data,
+					       u16 group,
+					       u16 instance),
+				     void *data);
 
 int xe_gt_eu_attention_bitmap_size(struct xe_gt *gt);
 int xe_gt_eu_attention_bitmap(struct xe_gt *gt, u8 *bits,
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index 144c7cf888bb..ccfbe976c509 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -15,7 +15,8 @@ extern "C" {
  *
  * This ioctl is available in debug version 1.
  */
-#define DRM_XE_EUDEBUG_IOCTL_READ_EVENT _IO('j', 0x0)
+#define DRM_XE_EUDEBUG_IOCTL_READ_EVENT		_IO('j', 0x0)
+#define DRM_XE_EUDEBUG_IOCTL_EU_CONTROL		_IOWR('j', 0x2, struct drm_xe_eudebug_eu_control)
 
 /* XXX: Document events to match their internal counterparts when moved to xe_drm.h */
 struct drm_xe_eudebug_event {
@@ -91,6 +92,24 @@ struct drm_xe_eudebug_event_eu_attention {
 	__u8 bitmask[];
 };
 
+struct drm_xe_eudebug_eu_control {
+	__u64 client_handle;
+
+#define DRM_XE_EUDEBUG_EU_CONTROL_CMD_INTERRUPT_ALL	0
+#define DRM_XE_EUDEBUG_EU_CONTROL_CMD_STOPPED		1
+#define DRM_XE_EUDEBUG_EU_CONTROL_CMD_RESUME		2
+	__u32 cmd;
+	__u32 flags;
+
+	__u64 seqno;
+
+	__u64 exec_queue_handle;
+	__u64 lrc_handle;
+	__u32 reserved;
+	__u32 bitmask_size;
+	__u64 bitmask_ptr;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 10/26] drm/xe/eudebug: Add vm bind and vm bind ops
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (8 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 09/26] drm/xe/eudebug: Introduce EU control interface Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 11/26] drm/xe/eudebug: Add UFENCE events with acks Mika Kuoppala
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Christoph Manszewski, Mika Kuoppala

From: Christoph Manszewski <christoph.manszewski@intel.com>

Add events dedicated to tracking vma bind and vma unbind operations.
The events are generated for MAP and UNMAP operations performed on an
xe_vma, for both BOs and userptrs.

As one bind can result in multiple operations and can fail in the
middle, we store the events until the full successful chain of
operations can be relayed to the debugger.

Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
Co-developed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c       | 318 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug.h       |  13 ++
 drivers/gpu/drm/xe/xe_eudebug_types.h |  29 +++
 drivers/gpu/drm/xe/xe_vm.c            |  16 +-
 drivers/gpu/drm/xe/xe_vm_types.h      |  13 ++
 include/uapi/drm/xe_drm_eudebug.h     |  64 ++++++
 6 files changed, 449 insertions(+), 4 deletions(-)
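For context (not part of the patch), the collect-then-flush pattern described above — queue per-op events on a local list during the bind, then either relay the whole chain on success or drop the partial chain on failure — can be sketched generically as below; names are illustrative, not the driver's:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for the per-VM pending-event list this patch adds. */
struct pending_event {
	int payload;
	struct pending_event *next;
};

struct bind_ctx {
	struct pending_event *head, **tail;
	int relayed;   /* events actually delivered to the debugger */
};

static void bind_start(struct bind_ctx *c)
{
	c->head = NULL;
	c->tail = &c->head;
	c->relayed = 0;
}

/* Each op appends to the local list instead of queueing immediately. */
static int bind_op_add(struct bind_ctx *c, int payload)
{
	struct pending_event *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->payload = payload;
	e->next = NULL;
	*c->tail = e;
	c->tail = &e->next;
	return 0;
}

/* On success relay the full chain; on error free it without relaying. */
static void bind_end(struct bind_ctx *c, int err)
{
	struct pending_event *e = c->head;

	while (e) {
		struct pending_event *next = e->next;

		if (!err)
			c->relayed++;
		free(e);
		e = next;
	}
	c->head = NULL;
	c->tail = &c->head;
}
```

This mirrors the shape of xe_eudebug_vm_bind_start/op_add/end in the patch: the debugger only ever sees a complete, successful sequence of bind ops, never a half-applied one.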

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 81d03a860b7f..f544f60d7d6b 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -792,7 +792,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_EU_ATTENTION;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -827,7 +827,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_EU_ATTENTION;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -2359,6 +2359,320 @@ void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q)
 	xe_eudebug_event_put(d, exec_queue_destroy_event(d, xef, q));
 }
 
+static int xe_eudebug_queue_bind_event(struct xe_eudebug *d,
+				       struct xe_vm *vm,
+				       struct xe_eudebug_event *event)
+{
+	struct xe_eudebug_event_envelope *env;
+
+	lockdep_assert_held_write(&vm->lock);
+
+	env = kmalloc(sizeof(*env), GFP_KERNEL);
+	if (!env)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&env->link);
+	env->event = event;
+
+	spin_lock(&vm->eudebug.lock);
+	list_add_tail(&env->link, &vm->eudebug.events);
+
+	if (event->type == DRM_XE_EUDEBUG_EVENT_VM_BIND_OP)
+		++vm->eudebug.ops;
+	spin_unlock(&vm->eudebug.lock);
+
+	return 0;
+}
+
+static int queue_vm_bind_event(struct xe_eudebug *d,
+			       struct xe_vm *vm,
+			       u64 client_handle,
+			       u64 vm_handle,
+			       u32 bind_flags,
+			       u32 num_ops, u64 *seqno)
+{
+	struct xe_eudebug_event_vm_bind *e;
+	struct xe_eudebug_event *event;
+	const u32 sz = sizeof(*e);
+	const u32 base_flags = DRM_XE_EUDEBUG_EVENT_STATE_CHANGE;
+
+	*seqno = atomic_long_inc_return(&d->events.seqno);
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_VM_BIND,
+					*seqno, base_flags, sz);
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+	write_member(struct drm_xe_eudebug_event_vm_bind, e, client_handle, client_handle);
+	write_member(struct drm_xe_eudebug_event_vm_bind, e, vm_handle, vm_handle);
+	write_member(struct drm_xe_eudebug_event_vm_bind, e, flags, bind_flags);
+	write_member(struct drm_xe_eudebug_event_vm_bind, e, num_binds, num_ops);
+
+	/* If in discovery, no need to collect ops */
+	if (!completion_done(&d->discovery)) {
+		XE_WARN_ON(!num_ops);
+		return xe_eudebug_queue_event(d, event);
+	}
+
+	return xe_eudebug_queue_bind_event(d, vm, event);
+}
+
+static int vm_bind_event(struct xe_eudebug *d,
+			 struct xe_vm *vm,
+			 u32 num_ops,
+			 u64 *seqno)
+{
+	int h_c, h_vm;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, vm->xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_vm = find_handle(d->res, XE_EUDEBUG_RES_TYPE_VM, vm);
+	if (h_vm < 0)
+		return h_vm;
+
+	return queue_vm_bind_event(d, vm, h_c, h_vm, 0,
+				   num_ops, seqno);
+}
+
+static int vm_bind_op_event(struct xe_eudebug *d,
+			    struct xe_vm *vm,
+			    const u32 flags,
+			    const u64 bind_ref_seqno,
+			    const u64 num_extensions,
+			    u64 addr, u64 range,
+			    u64 *op_seqno)
+{
+	struct xe_eudebug_event_vm_bind_op *e;
+	struct xe_eudebug_event *event;
+	const u32 sz = sizeof(*e);
+
+	*op_seqno = atomic_long_inc_return(&d->events.seqno);
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_VM_BIND_OP,
+					*op_seqno, flags, sz);
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_vm_bind_op, e, vm_bind_ref_seqno, bind_ref_seqno);
+	write_member(struct drm_xe_eudebug_event_vm_bind_op, e, num_extensions, num_extensions);
+	write_member(struct drm_xe_eudebug_event_vm_bind_op, e, addr, addr);
+	write_member(struct drm_xe_eudebug_event_vm_bind_op, e, range, range);
+
+	/* If in discovery, no need to collect ops */
+	if (!completion_done(&d->discovery))
+		return xe_eudebug_queue_event(d, event);
+
+	return xe_eudebug_queue_bind_event(d, vm, event);
+}
+
+static int vm_bind_op(struct xe_eudebug *d, struct xe_vm *vm,
+		      const u32 flags, const u64 bind_ref_seqno,
+		      u64 addr, u64 range)
+{
+	u64 op_seqno = 0;
+	u64 num_extensions = 0;
+	int ret;
+
+	ret = vm_bind_op_event(d, vm, flags, bind_ref_seqno, num_extensions,
+			       addr, range, &op_seqno);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+void xe_eudebug_vm_init(struct xe_vm *vm)
+{
+	INIT_LIST_HEAD(&vm->eudebug.events);
+	spin_lock_init(&vm->eudebug.lock);
+	vm->eudebug.ops = 0;
+	vm->eudebug.ref_seqno = 0;
+}
+
+void xe_eudebug_vm_bind_start(struct xe_vm *vm)
+{
+	struct xe_eudebug *d;
+	u64 seqno = 0;
+	int err;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return;
+
+	d = xe_eudebug_get(vm->xef);
+	if (!d)
+		return;
+
+	lockdep_assert_held_write(&vm->lock);
+
+	if (XE_WARN_ON(!list_empty(&vm->eudebug.events)) ||
+	    XE_WARN_ON(vm->eudebug.ops) ||
+	    XE_WARN_ON(vm->eudebug.ref_seqno)) {
+		eu_err(d, "bind busy on %s", __func__);
+		xe_eudebug_disconnect(d, -EINVAL);
+	}
+
+	err = vm_bind_event(d, vm, 0, &seqno);
+	if (err) {
+		eu_err(d, "error %d on %s", err, __func__);
+		xe_eudebug_disconnect(d, err);
+	}
+
+	spin_lock(&vm->eudebug.lock);
+	XE_WARN_ON(vm->eudebug.ref_seqno);
+	vm->eudebug.ref_seqno = seqno;
+	vm->eudebug.ops = 0;
+	spin_unlock(&vm->eudebug.lock);
+
+	xe_eudebug_put(d);
+}
+
+void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range)
+{
+	struct xe_eudebug *d;
+	u32 flags;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return;
+
+	switch (op) {
+	case DRM_XE_VM_BIND_OP_MAP:
+	case DRM_XE_VM_BIND_OP_MAP_USERPTR:
+	{
+		flags = DRM_XE_EUDEBUG_EVENT_CREATE;
+		break;
+	}
+	case DRM_XE_VM_BIND_OP_UNMAP:
+	case DRM_XE_VM_BIND_OP_UNMAP_ALL:
+		flags = DRM_XE_EUDEBUG_EVENT_DESTROY;
+		break;
+	default:
+		flags = 0;
+		break;
+	}
+
+	if (!flags)
+		return;
+
+	d = xe_eudebug_get(vm->xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, vm_bind_op(d, vm, flags, 0, addr, range));
+}
+
+static struct xe_eudebug_event *fetch_bind_event(struct xe_vm * const vm)
+{
+	struct xe_eudebug_event_envelope *env;
+	struct xe_eudebug_event *e = NULL;
+
+	spin_lock(&vm->eudebug.lock);
+	env = list_first_entry_or_null(&vm->eudebug.events,
+				       struct xe_eudebug_event_envelope, link);
+	if (env) {
+		e = env->event;
+		list_del(&env->link);
+	}
+	spin_unlock(&vm->eudebug.lock);
+
+	kfree(env);
+
+	return e;
+}
+
+static void fill_vm_bind_fields(struct xe_vm *vm,
+				struct xe_eudebug_event *e,
+				bool ufence,
+				u32 bind_ops)
+{
+	struct xe_eudebug_event_vm_bind *eb = cast_event(eb, e);
+
+	eb->flags = ufence ?
+		DRM_XE_EUDEBUG_EVENT_VM_BIND_FLAG_UFENCE : 0;
+	eb->num_binds = bind_ops;
+}
+
+static void fill_vm_bind_op_fields(struct xe_vm *vm,
+				   struct xe_eudebug_event *e,
+				   u64 ref_seqno)
+{
+	struct xe_eudebug_event_vm_bind_op *op;
+
+	if (e->type != DRM_XE_EUDEBUG_EVENT_VM_BIND_OP)
+		return;
+
+	op = cast_event(op, e);
+	op->vm_bind_ref_seqno = ref_seqno;
+}
+
+void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int bind_err)
+{
+	struct xe_eudebug_event *e;
+	struct xe_eudebug *d;
+	u32 bind_ops;
+	u64 ref;
+
+	if (!xe_vm_in_lr_mode(vm))
+		return;
+
+	spin_lock(&vm->eudebug.lock);
+	ref = vm->eudebug.ref_seqno;
+	vm->eudebug.ref_seqno = 0;
+	bind_ops = vm->eudebug.ops;
+	vm->eudebug.ops = 0;
+	spin_unlock(&vm->eudebug.lock);
+
+	e = fetch_bind_event(vm);
+	if (!e)
+		return;
+
+	d = NULL;
+	if (!bind_err && ref) {
+		d = xe_eudebug_get(vm->xef);
+		if (d) {
+			if (bind_ops) {
+				fill_vm_bind_fields(vm, e, has_ufence, bind_ops);
+			} else {
+				/*
+				 * If there were no ops we are interested in,
+				 * we can omit the whole sequence.
+				 */
+				xe_eudebug_put(d);
+				d = NULL;
+			}
+		}
+	}
+
+	while (e) {
+		int err = 0;
+
+		if (d) {
+			err = xe_eudebug_queue_event(d, e);
+			if (!err)
+				e = NULL;
+		}
+
+		if (err) {
+			xe_eudebug_disconnect(d, err);
+			xe_eudebug_put(d);
+			d = NULL;
+		}
+
+		kfree(e);
+
+		e = fetch_bind_event(vm);
+		if (e && ref)
+			fill_vm_bind_op_fields(vm, e, ref);
+	}
+
+	if (d)
+		xe_eudebug_put(d);
+}
+
 static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 {
 	struct xe_exec_queue *q;
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index 1fe86bec99e1..ccc7202b3308 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -5,11 +5,14 @@
 
 #ifndef _XE_EUDEBUG_H_
 
+#include <linux/types.h>
+
 struct drm_device;
 struct drm_file;
 struct xe_device;
 struct xe_file;
 struct xe_vm;
+struct xe_vma;
 struct xe_exec_queue;
 struct xe_hw_engine;
 
@@ -33,6 +36,11 @@ void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm);
 void xe_eudebug_exec_queue_create(struct xe_file *xef, struct xe_exec_queue *q);
 void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q);
 
+void xe_eudebug_vm_init(struct xe_vm *vm);
+void xe_eudebug_vm_bind_start(struct xe_vm *vm);
+void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range);
+void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int err);
+
 #else
 
 static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
@@ -53,6 +61,11 @@ static inline void xe_eudebug_vm_destroy(struct xe_file *xef, struct xe_vm *vm)
 static inline void xe_eudebug_exec_queue_create(struct xe_file *xef, struct xe_exec_queue *q) { }
 static inline void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q) { }
 
+static inline void xe_eudebug_vm_init(struct xe_vm *vm) { }
+static inline void xe_eudebug_vm_bind_start(struct xe_vm *vm) { }
+static inline void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range) { }
+static inline void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int err) { }
+
 #endif /* CONFIG_DRM_XE_EUDEBUG */
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index e1d4e31b32ec..cbc316ec3593 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -177,6 +177,11 @@ struct xe_eudebug_event {
 	u8 data[];
 };
 
+struct xe_eudebug_event_envelope {
+	struct list_head link;
+	struct xe_eudebug_event *event;
+};
+
 /**
  * struct xe_eudebug_event_open - Internal event for client open/close
  */
@@ -284,4 +289,28 @@ struct xe_eudebug_event_eu_attention {
 	u8 bitmask[] __counted_by(bitmask_size);
 };
 
+/**
+ * struct xe_eudebug_event_vm_bind - Internal event for vm bind/unbind operation
+ */
+struct xe_eudebug_event_vm_bind {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	u64 client_handle;
+	u64 vm_handle;
+
+	u32 flags;
+	u32 num_binds;
+};
+
+struct xe_eudebug_event_vm_bind_op {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+	u64 vm_bind_ref_seqno;
+	u64 num_extensions;
+
+	u64 addr; /* Zero for unmap all ? */
+	u64 range; /* Zero for unmap all ? */
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 6f16049f4f6e..e83420473763 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1413,6 +1413,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_init(&vm->rftree[id]);
 
+	xe_eudebug_vm_init(vm);
+
 	vm->pt_ops = &xelp_pt_ops;
 
 	/*
@@ -1641,6 +1643,8 @@ static void vm_destroy_work_func(struct work_struct *w)
 	struct xe_tile *tile;
 	u8 id;
 
+	xe_eudebug_vm_bind_end(vm, 0, -ENOENT);
+
 	/* xe_vm_close_and_put was not called? */
 	xe_assert(xe, !vm->size);
 
@@ -2651,7 +2655,7 @@ static void vm_bind_ioctl_ops_fini(struct xe_vm *vm, struct xe_vma_ops *vops,
 				   struct dma_fence *fence)
 {
 	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, vops->q);
-	struct xe_user_fence *ufence;
+	struct xe_user_fence *ufence = NULL;
 	struct xe_vma_op *op;
 	int i;
 
@@ -2666,6 +2670,9 @@ static void vm_bind_ioctl_ops_fini(struct xe_vm *vm, struct xe_vma_ops *vops,
 			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va),
 				       fence);
 	}
+
+	xe_eudebug_vm_bind_end(vm, ufence, 0);
+
 	if (ufence)
 		xe_sync_ufence_put(ufence);
 	for (i = 0; i < vops->num_syncs; i++)
@@ -3078,6 +3085,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		if (err)
 			goto unwind_ops;
 
+		xe_eudebug_vm_bind_op_add(vm, op, addr, range);
+
 #ifdef TEST_VM_OPS_ERROR
 		if (flags & FORCE_OP_ERROR) {
 			vops.inject_error = true;
@@ -3101,8 +3110,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	err = vm_bind_ioctl_ops_execute(vm, &vops);
 
 unwind_ops:
-	if (err && err != -ENODATA)
+	if (err && err != -ENODATA) {
+		xe_eudebug_vm_bind_end(vm, num_ufence > 0, err);
 		vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
+	}
+
 	xe_vma_ops_fini(&vops);
 	for (i = args->num_binds - 1; i >= 0; --i)
 		if (ops[i])
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 7f9a303e51d8..557b047ebdd7 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -282,6 +282,19 @@ struct xe_vm {
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
 	struct xe_file *xef;
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	struct {
+		/** @lock: Lock for eudebug_bind members */
+		spinlock_t lock;
+		/** @events: List of vm bind ops gathered */
+		struct list_head events;
+		/** @ops: How many operations we have stored */
+		u32 ops;
+		/** @ref_seqno: Reference to the VM_BIND that the ops relate */
+		u64 ref_seqno;
+	} eudebug;
+#endif
 };
 
 /** struct xe_vma_op_map - VMA map operation */
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index ccfbe976c509..cc34c522fa4d 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -30,6 +30,8 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE		4
 #define DRM_XE_EUDEBUG_EVENT_EXEC_QUEUE_PLACEMENTS 5
 #define DRM_XE_EUDEBUG_EVENT_EU_ATTENTION	6
+#define DRM_XE_EUDEBUG_EVENT_VM_BIND		7
+#define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP		8
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
@@ -110,6 +112,68 @@ struct drm_xe_eudebug_eu_control {
 	__u64 bitmask_ptr;
 };
 
+/*
+ *  When a client (debuggee) calls vm_bind_ioctl(), the following
+ *  event sequence will be created (for the debugger):
+ *
+ *  ┌───────────────────────┐
+ *  │  EVENT_VM_BIND        ├───────┬─┬─┐
+ *  └───────────────────────┘       │ │ │
+ *      ┌───────────────────────┐   │ │ │
+ *      │ EVENT_VM_BIND_OP #1   ├───┘ │ │
+ *      └───────────────────────┘     │ │
+ *                 ...                │ │
+ *      ┌───────────────────────┐     │ │
+ *      │ EVENT_VM_BIND_OP #n   ├─────┘ │
+ *      └───────────────────────┘       │
+ *                                      │
+ *      ┌───────────────────────┐       │
+ *      │ EVENT_UFENCE          ├───────┘
+ *      └───────────────────────┘
+ *
+ * All the events below VM_BIND reference the VM_BIND they are
+ * associated with through the .vm_bind_ref_seqno field.
+ * EVENT_UFENCE is only included if the client attached a sync of
+ * type UFENCE to its vm_bind_ioctl().
+ *
+ * When EVENT_UFENCE is sent by the driver, all the OPs of the
+ * original VM_BIND have completed and the [addr, range] contained
+ * in them are present and modifiable through the vm accessors.
+ * Accessing [addr, range] before the related ufence event leads to
+ * undefined results, as the actual bind operations are asynchronous
+ * and the backing storage might not be there at the moment the
+ * event is received.
+ *
+ * The client's UFENCE sync is held by the driver: the client's
+ * drm_xe_wait_ufence will not complete and the value of the ufence
+ * will not appear until the ufence is acked by the debugger process
+ * calling DRM_XE_EUDEBUG_IOCTL_ACK_EVENT with event_ufence.base.seqno.
+ * This signals the fence, .value updates and the wait completes,
+ * allowing the client to continue.
+ *
+ */
+
+struct drm_xe_eudebug_event_vm_bind {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 vm_handle;
+
+	__u32 flags;
+#define DRM_XE_EUDEBUG_EVENT_VM_BIND_FLAG_UFENCE (1 << 0)
+
+	__u32 num_binds;
+};
+
+struct drm_xe_eudebug_event_vm_bind_op {
+	struct drm_xe_eudebug_event base;
+	__u64 vm_bind_ref_seqno; /* *_event_vm_bind.base.seqno */
+	__u64 num_extensions;
+
+	__u64 addr; /* XXX: Zero for unmap all? */
+	__u64 range; /* XXX: Zero for unmap all? */
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 11/26] drm/xe/eudebug: Add UFENCE events with acks
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (9 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 10/26] drm/xe/eudebug: Add vm bind and vm bind ops Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 12/26] drm/xe/eudebug: vm open/pread/pwrite Mika Kuoppala
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Mika Kuoppala, Andrzej Hajda

When a vma is in place, the debugger needs to intercept before
userspace proceeds with the workload, for example to install
a breakpoint in an EU shader.

Attach the debugger to the xe_user_fence, send a UFENCE event
and stall the normal user fence signal path so that it yields
if there is a debugger attached to the ufence.

When an ack (ioctl) is received for the corresponding seqno,
signal the ufence.

v2: - return err instead of 0 to guarantee signalling (Dominik)
    - checkpatch (Tilak)
    - Kconfig (Mika, Andrzej)
    - use lock instead of cmpxchg (Mika)

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c       | 283 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug.h       |  16 ++
 drivers/gpu/drm/xe/xe_eudebug_types.h |  13 ++
 drivers/gpu/drm/xe/xe_exec.c          |   2 +-
 drivers/gpu/drm/xe/xe_oa.c            |   3 +-
 drivers/gpu/drm/xe/xe_sync.c          |  45 ++--
 drivers/gpu/drm/xe/xe_sync.h          |   8 +-
 drivers/gpu/drm/xe/xe_sync_types.h    |  28 ++-
 drivers/gpu/drm/xe/xe_vm.c            |   4 +-
 include/uapi/drm/xe_drm_eudebug.h     |  13 ++
 10 files changed, 385 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index f544f60d7d6b..3cf3616e546d 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -32,6 +32,7 @@
 #include "xe_reg_sr.h"
 #include "xe_rtp.h"
 #include "xe_sched_job.h"
+#include "xe_sync.h"
 #include "xe_vm.h"
 #include "xe_wa.h"
 
@@ -239,11 +240,119 @@ static void xe_eudebug_free(struct kref *ref)
 	kfree_rcu(d, rcu);
 }
 
-static void xe_eudebug_put(struct xe_eudebug *d)
+void xe_eudebug_put(struct xe_eudebug *d)
 {
 	kref_put(&d->ref, xe_eudebug_free);
 }
 
+struct xe_eudebug_ack {
+	struct rb_node rb_node;
+	u64 seqno;
+	u64 ts_insert;
+	struct xe_user_fence *ufence;
+};
+
+#define fetch_ack(x) rb_entry(x, struct xe_eudebug_ack, rb_node)
+
+static int compare_ack(const u64 a, const u64 b)
+{
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+
+	return 0;
+}
+
+static int ack_insert_cmp(struct rb_node * const node,
+			  const struct rb_node * const p)
+{
+	return compare_ack(fetch_ack(node)->seqno,
+			   fetch_ack(p)->seqno);
+}
+
+static int ack_lookup_cmp(const void * const key,
+			  const struct rb_node * const node)
+{
+	return compare_ack(*(const u64 *)key,
+			   fetch_ack(node)->seqno);
+}
+
+static struct xe_eudebug_ack *remove_ack(struct xe_eudebug *d, u64 seqno)
+{
+	struct rb_root * const root = &d->acks.tree;
+	struct rb_node *node;
+
+	spin_lock(&d->acks.lock);
+	node = rb_find(&seqno, root, ack_lookup_cmp);
+	if (node)
+		rb_erase(node, root);
+	spin_unlock(&d->acks.lock);
+
+	if (!node)
+		return NULL;
+
+	return rb_entry_safe(node, struct xe_eudebug_ack, rb_node);
+}
+
+static void ufence_signal_worker(struct work_struct *w)
+{
+	struct xe_user_fence * const ufence =
+		container_of(w, struct xe_user_fence, eudebug.worker);
+
+	if (READ_ONCE(ufence->signalled))
+		xe_sync_ufence_signal(ufence);
+
+	xe_sync_ufence_put(ufence);
+}
+
+static void kick_ufence_worker(struct xe_user_fence *f)
+{
+	queue_work(f->xe->eudebug.ordered_wq, &f->eudebug.worker);
+}
+
+static void handle_ack(struct xe_eudebug *d, struct xe_eudebug_ack *ack,
+		       bool on_disconnect)
+{
+	struct xe_user_fence *f = ack->ufence;
+	u64 signalled_by;
+	bool signal = false;
+
+	spin_lock(&f->eudebug.lock);
+	if (!f->eudebug.signalled_seqno) {
+		f->eudebug.signalled_seqno = ack->seqno;
+		signal = true;
+	}
+	signalled_by = f->eudebug.signalled_seqno;
+	spin_unlock(&f->eudebug.lock);
+
+	if (signal)
+		kick_ufence_worker(f);
+	else
+		xe_sync_ufence_put(f);
+
+	eu_dbg(d, "ACK: seqno=%llu: signalled by %llu (%s) (held %lluus)",
+	       ack->seqno, signalled_by,
+	       on_disconnect ? "disconnect" : "debugger",
+	       ktime_us_delta(ktime_get(), ack->ts_insert));
+
+	kfree(ack);
+}
+
+static void release_acks(struct xe_eudebug *d)
+{
+	struct xe_eudebug_ack *ack, *n;
+	struct rb_root root;
+
+	spin_lock(&d->acks.lock);
+	root = d->acks.tree;
+	d->acks.tree = RB_ROOT;
+	spin_unlock(&d->acks.lock);
+
+	rbtree_postorder_for_each_entry_safe(ack, n, &root, rb_node)
+		handle_ack(d, ack, true);
+}
+
 static struct task_struct *find_get_target(const pid_t nr)
 {
 	struct task_struct *task;
@@ -328,6 +437,8 @@ static bool xe_eudebug_detach(struct xe_device *xe,
 
 	eu_dbg(d, "session %lld detached with %d", d->session, err);
 
+	release_acks(d);
+
 	/* Our ref with the connection_link */
 	xe_eudebug_put(d);
 
@@ -453,7 +564,7 @@ _xe_eudebug_get(struct xe_file *xef)
 	return d;
 }
 
-static struct xe_eudebug *
+struct xe_eudebug *
 xe_eudebug_get(struct xe_file *xef)
 {
 	struct xe_eudebug *d;
@@ -792,7 +903,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -827,7 +938,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -902,6 +1013,44 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 	return ret;
 }
 
+static long
+xe_eudebug_ack_event_ioctl(struct xe_eudebug *d,
+			   const unsigned int cmd,
+			   const u64 arg)
+{
+	struct drm_xe_eudebug_ack_event __user * const user_ptr =
+		u64_to_user_ptr(arg);
+	struct drm_xe_eudebug_ack_event user_arg;
+	struct xe_eudebug_ack *ack;
+	struct xe_device *xe = d->xe;
+
+	if (XE_IOCTL_DBG(xe, _IOC_SIZE(cmd) < sizeof(user_arg)))
+		return -EINVAL;
+
+	/* Userland write */
+	if (XE_IOCTL_DBG(xe, !(_IOC_DIR(cmd) & _IOC_WRITE)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, copy_from_user(&user_arg,
+					    user_ptr,
+					    sizeof(user_arg))))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, user_arg.flags))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, xe_eudebug_detached(d)))
+		return -ENOTCONN;
+
+	ack = remove_ack(d, user_arg.seqno);
+	if (XE_IOCTL_DBG(xe, !ack))
+		return -EINVAL;
+
+	handle_ack(d, ack, false);
+
+	return 0;
+}
+
 static int do_eu_control(struct xe_eudebug *d,
 			 const struct drm_xe_eudebug_eu_control * const arg,
 			 struct drm_xe_eudebug_eu_control __user * const user_ptr)
@@ -1093,7 +1242,10 @@ static long xe_eudebug_ioctl(struct file *file,
 		ret = xe_eudebug_eu_control(d, arg);
 		eu_dbg(d, "ioctl cmd=EU_CONTROL ret=%ld\n", ret);
 		break;
-
+	case DRM_XE_EUDEBUG_IOCTL_ACK_EVENT:
+		ret = xe_eudebug_ack_event_ioctl(d, cmd, arg);
+		eu_dbg(d, "ioctl cmd=ACK_EVENT ret=%ld\n", ret);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1792,6 +1944,9 @@ xe_eudebug_connect(struct xe_device *xe,
 	INIT_KFIFO(d->events.fifo);
 	INIT_WORK(&d->discovery_work, discovery_work_fn);
 
+	spin_lock_init(&d->acks.lock);
+	d->acks.tree = RB_ROOT;
+
 	d->res = xe_eudebug_resources_alloc();
 	if (IS_ERR(d->res)) {
 		err = PTR_ERR(d->res);
@@ -2486,6 +2641,70 @@ static int vm_bind_op(struct xe_eudebug *d, struct xe_vm *vm,
 	return 0;
 }
 
+static int xe_eudebug_track_ufence(struct xe_eudebug *d,
+				   struct xe_user_fence *f,
+				   u64 seqno)
+{
+	struct xe_eudebug_ack *ack;
+	struct rb_node *old;
+
+	ack = kzalloc(sizeof(*ack), GFP_KERNEL);
+	if (!ack)
+		return -ENOMEM;
+
+	ack->seqno = seqno;
+	ack->ts_insert = ktime_get();
+
+	spin_lock(&d->acks.lock);
+	old = rb_find_add(&ack->rb_node,
+			  &d->acks.tree, ack_insert_cmp);
+	if (!old) {
+		kref_get(&f->refcount);
+		ack->ufence = f;
+	}
+	spin_unlock(&d->acks.lock);
+
+	if (old) {
+		eu_dbg(d, "ACK: seqno=%llu: already exists", seqno);
+		kfree(ack);
+		return -EEXIST;
+	}
+
+	eu_dbg(d, "ACK: seqno=%llu: tracking started", seqno);
+
+	return 0;
+}
+
+static int vm_bind_ufence_event(struct xe_eudebug *d,
+				struct xe_user_fence *ufence)
+{
+	struct xe_eudebug_event *event;
+	struct xe_eudebug_event_vm_bind_ufence *e;
+	const u32 sz = sizeof(*e);
+	const u32 flags = DRM_XE_EUDEBUG_EVENT_CREATE |
+		DRM_XE_EUDEBUG_EVENT_NEED_ACK;
+	u64 seqno;
+	int ret;
+
+	seqno = atomic_long_inc_return(&d->events.seqno);
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE,
+					seqno, flags, sz);
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_vm_bind_ufence,
+		     e, vm_bind_ref_seqno, ufence->eudebug.bind_ref_seqno);
+
+	ret = xe_eudebug_track_ufence(d, ufence, seqno);
+	if (!ret)
+		ret = xe_eudebug_queue_event(d, event);
+
+	return ret;
+}
+
 void xe_eudebug_vm_init(struct xe_vm *vm)
 {
 	INIT_LIST_HEAD(&vm->eudebug.events);
@@ -2673,6 +2892,24 @@ void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int bind_err)
 		xe_eudebug_put(d);
 }
 
+int xe_eudebug_vm_bind_ufence(struct xe_user_fence *ufence)
+{
+	struct xe_eudebug *d;
+	int err;
+
+	d = ufence->eudebug.debugger;
+	if (!d || xe_eudebug_detached(d))
+		return -ENOTCONN;
+
+	err = vm_bind_ufence_event(d, ufence);
+	if (err) {
+		eu_err(d, "error %d on %s", err, __func__);
+		xe_eudebug_disconnect(d, err);
+	}
+
+	return err;
+}
+
 static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 {
 	struct xe_exec_queue *q;
@@ -2765,3 +3002,39 @@ static void discovery_work_fn(struct work_struct *work)
 
 	xe_eudebug_put(d);
 }
+
+void xe_eudebug_ufence_init(struct xe_user_fence *ufence,
+			    struct xe_file *xef,
+			    struct xe_vm *vm)
+{
+	u64 bind_ref;
+
+	/* Drop if OA */
+	if (!vm)
+		return;
+
+	spin_lock(&vm->eudebug.lock);
+	bind_ref = vm->eudebug.ref_seqno;
+	spin_unlock(&vm->eudebug.lock);
+
+	spin_lock_init(&ufence->eudebug.lock);
+	INIT_WORK(&ufence->eudebug.worker, ufence_signal_worker);
+
+	ufence->eudebug.signalled_seqno = 0;
+
+	if (bind_ref) {
+		ufence->eudebug.debugger = xe_eudebug_get(xef);
+
+		if (ufence->eudebug.debugger)
+			ufence->eudebug.bind_ref_seqno = bind_ref;
+	}
+}
+
+void xe_eudebug_ufence_fini(struct xe_user_fence *ufence)
+{
+	if (!ufence->eudebug.debugger)
+		return;
+
+	xe_eudebug_put(ufence->eudebug.debugger);
+	ufence->eudebug.debugger = NULL;
+}
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index ccc7202b3308..13ba0167b31b 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -15,6 +15,7 @@ struct xe_vm;
 struct xe_vma;
 struct xe_exec_queue;
 struct xe_hw_engine;
+struct xe_user_fence;
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
 
@@ -41,6 +42,13 @@ void xe_eudebug_vm_bind_start(struct xe_vm *vm);
 void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range);
 void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int err);
 
+int xe_eudebug_vm_bind_ufence(struct xe_user_fence *ufence);
+void xe_eudebug_ufence_init(struct xe_user_fence *ufence, struct xe_file *xef, struct xe_vm *vm);
+void xe_eudebug_ufence_fini(struct xe_user_fence *ufence);
+
+struct xe_eudebug *xe_eudebug_get(struct xe_file *xef);
+void xe_eudebug_put(struct xe_eudebug *d);
+
 #else
 
 static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
@@ -66,6 +74,14 @@ static inline void xe_eudebug_vm_bind_start(struct xe_vm *vm) { }
 static inline void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range) { }
 static inline void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int err) { }
 
+static inline int xe_eudebug_vm_bind_ufence(struct xe_user_fence *ufence) { return 0; }
+static inline void xe_eudebug_ufence_init(struct xe_user_fence *ufence,
+					  struct xe_file *xef, struct xe_vm *vm) { }
+static inline void xe_eudebug_ufence_fini(struct xe_user_fence *ufence) { }
+
+static inline struct xe_eudebug *xe_eudebug_get(struct xe_file *xef) { return NULL; }
+static inline void xe_eudebug_put(struct xe_eudebug *d) { }
+
 #endif /* CONFIG_DRM_XE_EUDEBUG */
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index cbc316ec3593..ffb0dc71430a 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -150,6 +150,14 @@ struct xe_eudebug {
 		atomic_long_t seqno;
 	} events;
 
+	/** @acks: user fences tracked by this debugger */
+	struct {
+		/** @lock: guards access to tree */
+		spinlock_t lock;
+
+		/** @tree: rbtree of pending acks, keyed by seqno */
+		struct rb_root tree;
+	} acks;
+
 	/** @ops: operations for eu_control */
 	struct xe_eudebug_eu_control_ops *ops;
 };
@@ -313,4 +321,9 @@ struct xe_eudebug_event_vm_bind_op {
 	u64 range; /* Zero for unmap all ? */
 };
 
+struct xe_eudebug_event_vm_bind_ufence {
+	struct xe_eudebug_event base;
+	u64 vm_bind_ref_seqno;
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 31cca938956f..17dd7a3f8354 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -159,7 +159,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	vm = q->vm;
 
 	for (num_syncs = 0; num_syncs < args->num_syncs; num_syncs++) {
-		err = xe_sync_entry_parse(xe, xef, &syncs[num_syncs],
+		err = xe_sync_entry_parse(xe, xef, vm, &syncs[num_syncs],
 					  &syncs_user[num_syncs], SYNC_PARSE_FLAG_EXEC |
 					  (xe_vm_in_lr_mode(vm) ?
 					   SYNC_PARSE_FLAG_LR_MODE : 0));
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 8dd55798ab31..a32dc3fdabe7 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -1379,7 +1379,8 @@ static int xe_oa_parse_syncs(struct xe_oa *oa, struct xe_oa_open_param *param)
 	}
 
 	for (num_syncs = 0; num_syncs < param->num_syncs; num_syncs++) {
-		ret = xe_sync_entry_parse(oa->xe, param->xef, &param->syncs[num_syncs],
+		ret = xe_sync_entry_parse(oa->xe, param->xef, NULL,
+					  &param->syncs[num_syncs],
 					  &param->syncs_user[num_syncs], 0);
 		if (ret)
 			goto err_syncs;
diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
index 42f5bebd09e5..3e7398983b52 100644
--- a/drivers/gpu/drm/xe/xe_sync.c
+++ b/drivers/gpu/drm/xe/xe_sync.c
@@ -15,27 +15,20 @@
 #include <uapi/drm/xe_drm.h>
 
 #include "xe_device_types.h"
+#include "xe_eudebug.h"
 #include "xe_exec_queue.h"
 #include "xe_macros.h"
 #include "xe_sched_job_types.h"
 
-struct xe_user_fence {
-	struct xe_device *xe;
-	struct kref refcount;
-	struct dma_fence_cb cb;
-	struct work_struct worker;
-	struct mm_struct *mm;
-	u64 __user *addr;
-	u64 value;
-	int signalled;
-};
-
 static void user_fence_destroy(struct kref *kref)
 {
 	struct xe_user_fence *ufence = container_of(kref, struct xe_user_fence,
 						 refcount);
 
 	mmdrop(ufence->mm);
+
+	xe_eudebug_ufence_fini(ufence);
+
 	kfree(ufence);
 }
 
@@ -49,7 +42,10 @@ static void user_fence_put(struct xe_user_fence *ufence)
 	kref_put(&ufence->refcount, user_fence_destroy);
 }
 
-static struct xe_user_fence *user_fence_create(struct xe_device *xe, u64 addr,
+static struct xe_user_fence *user_fence_create(struct xe_device *xe,
+					       struct xe_file *xef,
+					       struct xe_vm *vm,
+					       u64 addr,
 					       u64 value)
 {
 	struct xe_user_fence *ufence;
@@ -70,12 +66,14 @@ static struct xe_user_fence *user_fence_create(struct xe_device *xe, u64 addr,
 	ufence->mm = current->mm;
 	mmgrab(ufence->mm);
 
+	xe_eudebug_ufence_init(ufence, xef, vm);
+
 	return ufence;
 }
 
-static void user_fence_worker(struct work_struct *w)
+void xe_sync_ufence_signal(struct xe_user_fence *ufence)
 {
-	struct xe_user_fence *ufence = container_of(w, struct xe_user_fence, worker);
+	XE_WARN_ON(!ufence->signalled);
 
 	if (mmget_not_zero(ufence->mm)) {
 		kthread_use_mm(ufence->mm);
@@ -87,12 +85,25 @@ static void user_fence_worker(struct work_struct *w)
 		drm_dbg(&ufence->xe->drm, "mmget_not_zero() failed, ufence wasn't signaled\n");
 	}
 
+	wake_up_all(&ufence->xe->ufence_wq);
+}
+
+static void user_fence_worker(struct work_struct *w)
+{
+	struct xe_user_fence *ufence = container_of(w, struct xe_user_fence, worker);
+	int ret;
+
 	/*
 	 * Wake up waiters only after updating the ufence state, allowing the UMD
 	 * to safely reuse the same ufence without encountering -EBUSY errors.
 	 */
 	WRITE_ONCE(ufence->signalled, 1);
-	wake_up_all(&ufence->xe->ufence_wq);
+
+	/* Let's see if the debugger wants to track this */
+	ret = xe_eudebug_vm_bind_ufence(ufence);
+	if (ret)
+		xe_sync_ufence_signal(ufence);
+
 	user_fence_put(ufence);
 }
 
@@ -111,6 +122,7 @@ static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 }
 
 int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
+			struct xe_vm *vm,
 			struct xe_sync_entry *sync,
 			struct drm_xe_sync __user *sync_user,
 			unsigned int flags)
@@ -192,7 +204,8 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
 		if (exec) {
 			sync->addr = sync_in.addr;
 		} else {
-			sync->ufence = user_fence_create(xe, sync_in.addr,
+			sync->ufence = user_fence_create(xe, xef, vm,
+							 sync_in.addr,
 							 sync_in.timeline_value);
 			if (XE_IOCTL_DBG(xe, IS_ERR(sync->ufence)))
 				return PTR_ERR(sync->ufence);
diff --git a/drivers/gpu/drm/xe/xe_sync.h b/drivers/gpu/drm/xe/xe_sync.h
index 256ffc1e54dc..f5bec2b1b4f6 100644
--- a/drivers/gpu/drm/xe/xe_sync.h
+++ b/drivers/gpu/drm/xe/xe_sync.h
@@ -9,8 +9,12 @@
 #include "xe_sync_types.h"
 
 struct xe_device;
-struct xe_exec_queue;
 struct xe_file;
+struct xe_exec_queue;
+struct drm_syncobj;
+struct dma_fence;
+struct dma_fence_chain;
+struct drm_xe_sync;
 struct xe_sched_job;
 struct xe_vm;
 
@@ -19,6 +23,7 @@ struct xe_vm;
 #define SYNC_PARSE_FLAG_DISALLOW_USER_FENCE	BIT(2)
 
 int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
+			struct xe_vm *vm,
 			struct xe_sync_entry *sync,
 			struct drm_xe_sync __user *sync_user,
 			unsigned int flags);
@@ -40,5 +45,6 @@ struct xe_user_fence *__xe_sync_ufence_get(struct xe_user_fence *ufence);
 struct xe_user_fence *xe_sync_ufence_get(struct xe_sync_entry *sync);
 void xe_sync_ufence_put(struct xe_user_fence *ufence);
 int xe_sync_ufence_get_status(struct xe_user_fence *ufence);
+void xe_sync_ufence_signal(struct xe_user_fence *ufence);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_sync_types.h b/drivers/gpu/drm/xe/xe_sync_types.h
index 30ac3f51993b..dcd3165e66a7 100644
--- a/drivers/gpu/drm/xe/xe_sync_types.h
+++ b/drivers/gpu/drm/xe/xe_sync_types.h
@@ -6,13 +6,31 @@
 #ifndef _XE_SYNC_TYPES_H_
 #define _XE_SYNC_TYPES_H_
 
+#include <linux/dma-fence-array.h>
+#include <linux/kref.h>
+#include <linux/spinlock.h>
 #include <linux/types.h>
 
-struct drm_syncobj;
-struct dma_fence;
-struct dma_fence_chain;
-struct drm_xe_sync;
-struct user_fence;
+struct xe_user_fence {
+	struct xe_device *xe;
+	struct kref refcount;
+	struct dma_fence_cb cb;
+	struct work_struct worker;
+	struct mm_struct *mm;
+	u64 __user *addr;
+	u64 value;
+	int signalled;
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	struct {
+		spinlock_t lock;
+		struct xe_eudebug *debugger;
+		u64 bind_ref_seqno;
+		u64 signalled_seqno;
+		struct work_struct worker;
+	} eudebug;
+#endif
+};
 
 struct xe_sync_entry {
 	struct drm_syncobj *syncobj;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index e83420473763..0f17bc8b627b 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3037,9 +3037,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		}
 	}
 
+	xe_eudebug_vm_bind_start(vm);
+
 	syncs_user = u64_to_user_ptr(args->syncs);
 	for (num_syncs = 0; num_syncs < args->num_syncs; num_syncs++) {
-		err = xe_sync_entry_parse(xe, xef, &syncs[num_syncs],
+		err = xe_sync_entry_parse(xe, xef, vm, &syncs[num_syncs],
 					  &syncs_user[num_syncs],
 					  (xe_vm_in_lr_mode(vm) ?
 					   SYNC_PARSE_FLAG_LR_MODE : 0) |
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index cc34c522fa4d..1d5f1411c9a8 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -17,6 +17,7 @@ extern "C" {
  */
 #define DRM_XE_EUDEBUG_IOCTL_READ_EVENT		_IO('j', 0x0)
 #define DRM_XE_EUDEBUG_IOCTL_EU_CONTROL		_IOWR('j', 0x2, struct drm_xe_eudebug_eu_control)
+#define DRM_XE_EUDEBUG_IOCTL_ACK_EVENT		_IOW('j', 0x4, struct drm_xe_eudebug_ack_event)
 
 /* XXX: Document events to match their internal counterparts when moved to xe_drm.h */
 struct drm_xe_eudebug_event {
@@ -32,6 +33,7 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_EU_ATTENTION	6
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND		7
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP		8
+#define DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE	9
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
@@ -174,6 +176,17 @@ struct drm_xe_eudebug_event_vm_bind_op {
 	__u64 range; /* XXX: Zero for unmap all? */
 };
 
+struct drm_xe_eudebug_event_vm_bind_ufence {
+	struct drm_xe_eudebug_event base;
+	__u64 vm_bind_ref_seqno; /* *_event_vm_bind.base.seqno */
+};
+
+struct drm_xe_eudebug_ack_event {
+	__u32 type;
+	__u32 flags; /* MBZ */
+	__u64 seqno;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 12/26] drm/xe/eudebug: vm open/pread/pwrite
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (10 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 11/26] drm/xe/eudebug: Add UFENCE events with acks Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 13/26] drm/xe: add system memory page iterator support to xe_res_cursor Mika Kuoppala
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Mika Kuoppala, Matthew Brost

The debugger needs read and write access to the client's vm,
for example to inspect ISA/ELF and to set up breakpoints.

Add an ioctl to open the target vm, given the debugger client
and a vm_handle, and hook up pread/pwrite support.

Open takes a timeout argument so that standard fsync can be
used for explicit flushing between cpu/gpu for the target vm.

Implement this for bo backed storage. Userptr will be
done in a following patch.

v2: - checkpatch (Maciej)
    - 32bit fixes (Andrzej)
    - bo_vmap (Mika)
    - fix vm leak if can't allocate k_buffer (Mika)
    - assert vm write held for vma (Matthew)

v3: - fw ref, ttm_bo_access
    - timeout boundary check (Dominik)
    - dont try to copy to user on zero bytes (Mika)

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/regs/xe_gt_regs.h |  24 ++
 drivers/gpu/drm/xe/xe_eudebug.c      | 442 +++++++++++++++++++++++++++
 include/uapi/drm/xe_drm_eudebug.h    |  19 ++
 3 files changed, 485 insertions(+)

diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index 5fcf06835ef0..4c620f95b466 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -551,6 +551,30 @@
 #define   CCS_MODE_CSLICE(cslice, ccs) \
 	((ccs) << ((cslice) * CCS_MODE_CSLICE_WIDTH))
 
+#define RCU_ASYNC_FLUSH				XE_REG(0x149fc)
+#define   RCU_ASYNC_FLUSH_IN_PROGRESS	REG_BIT(31)
+#define   RCU_ASYNC_FLUSH_ENGINE_ID_SHIFT	28
+#define   RCU_ASYNC_FLUSH_ENGINE_ID_DECODE1 REG_BIT(26)
+#define   RCU_ASYNC_FLUSH_AMFS		REG_BIT(8)
+#define   RCU_ASYNC_FLUSH_PREFETCH	REG_BIT(7)
+#define   RCU_ASYNC_FLUSH_DATA_PORT	REG_BIT(6)
+#define   RCU_ASYNC_FLUSH_DATA_CACHE	REG_BIT(5)
+#define   RCU_ASYNC_FLUSH_HDC_PIPELINE	REG_BIT(4)
+#define   RCU_ASYNC_INVALIDATE_HDC_PIPELINE REG_BIT(3)
+#define   RCU_ASYNC_INVALIDATE_CONSTANT_CACHE REG_BIT(2)
+#define   RCU_ASYNC_INVALIDATE_TEXTURE_CACHE REG_BIT(1)
+#define   RCU_ASYNC_INVALIDATE_INSTRUCTION_CACHE REG_BIT(0)
+#define   RCU_ASYNC_FLUSH_AND_INVALIDATE_ALL ( \
+	RCU_ASYNC_FLUSH_AMFS | \
+	RCU_ASYNC_FLUSH_PREFETCH | \
+	RCU_ASYNC_FLUSH_DATA_PORT | \
+	RCU_ASYNC_FLUSH_DATA_CACHE | \
+	RCU_ASYNC_FLUSH_HDC_PIPELINE | \
+	RCU_ASYNC_INVALIDATE_HDC_PIPELINE | \
+	RCU_ASYNC_INVALIDATE_CONSTANT_CACHE | \
+	RCU_ASYNC_INVALIDATE_TEXTURE_CACHE | \
+	RCU_ASYNC_INVALIDATE_INSTRUCTION_CACHE)
+
 #define RCU_DEBUG_1				XE_REG(0x14a00)
 #define   RCU_DEBUG_1_ENGINE_STATUS		REG_GENMASK(2, 0)
 #define   RCU_DEBUG_1_RUNALONE_ACTIVE		REG_BIT(2)
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 3cf3616e546d..9d87df75348b 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -5,9 +5,12 @@
 
 #include <linux/anon_inodes.h>
 #include <linux/delay.h>
+#include <linux/file.h>
 #include <linux/poll.h>
 #include <linux/uaccess.h>
+#include <linux/vmalloc.h>
 
+#include <drm/drm_drv.h>
 #include <drm/drm_managed.h>
 
 #include <generated/xe_wa_oob.h>
@@ -16,6 +19,7 @@
 #include "regs/xe_engine_regs.h"
 
 #include "xe_assert.h"
+#include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_eudebug.h"
 #include "xe_eudebug_types.h"
@@ -1222,6 +1226,8 @@ static long xe_eudebug_eu_control(struct xe_eudebug *d, const u64 arg)
 	return ret;
 }
 
+static long xe_eudebug_vm_open_ioctl(struct xe_eudebug *d, unsigned long arg);
+
 static long xe_eudebug_ioctl(struct file *file,
 			     unsigned int cmd,
 			     unsigned long arg)
@@ -1246,6 +1252,11 @@ static long xe_eudebug_ioctl(struct file *file,
 		ret = xe_eudebug_ack_event_ioctl(d, cmd, arg);
 		eu_dbg(d, "ioctl cmd=EVENT_ACK ret=%ld\n", ret);
 		break;
+	case DRM_XE_EUDEBUG_IOCTL_VM_OPEN:
+		ret = xe_eudebug_vm_open_ioctl(d, arg);
+		eu_dbg(d, "ioctl cmd=VM_OPEN ret=%ld\n", ret);
+		break;
+
 	default:
 		ret = -EINVAL;
 	}
@@ -3038,3 +3049,434 @@ void xe_eudebug_ufence_fini(struct xe_user_fence *ufence)
 	xe_eudebug_put(ufence->eudebug.debugger);
 	ufence->eudebug.debugger = NULL;
 }
+
+static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
+				 void *buf, u64 len, bool write)
+{
+	struct xe_bo *bo;
+	u64 bytes;
+
+	lockdep_assert_held_write(&xe_vma_vm(vma)->lock);
+
+	if (XE_WARN_ON(offset_in_vma >= xe_vma_size(vma)))
+		return -EINVAL;
+
+	bytes = min_t(u64, len, xe_vma_size(vma) - offset_in_vma);
+	if (!bytes)
+		return 0;
+
+	bo = xe_bo_get(xe_vma_bo(vma));
+	if (bo) {
+		int ret;
+
+		ret = ttm_bo_access(&bo->ttm, offset_in_vma, buf, bytes, write);
+
+		xe_bo_put(bo);
+
+		return ret;
+	}
+
+	return -EINVAL;
+}
+
+static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
+				void *buf, u64 len, bool write)
+{
+	struct xe_vma *vma;
+	int ret;
+
+	down_write(&vm->lock);
+
+	vma = xe_vm_find_overlapping_vma(vm, offset, len);
+	if (vma) {
+		/* XXX: why find overlapping returns below start? */
+		if (offset < xe_vma_start(vma) ||
+		    offset >= (xe_vma_start(vma) + xe_vma_size(vma))) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/* Offset into vma */
+		offset -= xe_vma_start(vma);
+		ret = xe_eudebug_vma_access(vma, offset, buf, len, write);
+	} else {
+		ret = -EINVAL;
+	}
+
+out:
+	up_write(&vm->lock);
+
+	return ret;
+}
+
+struct vm_file {
+	struct xe_eudebug *debugger;
+	struct xe_file *xef;
+	struct xe_vm *vm;
+	u64 flags;
+	u64 client_id;
+	u64 vm_handle;
+	unsigned int timeout_us;
+};
+
+static ssize_t __vm_read_write(struct xe_vm *vm,
+			       void *bb,
+			       char __user *r_buffer,
+			       const char __user *w_buffer,
+			       unsigned long offset,
+			       unsigned long len,
+			       const bool write)
+{
+	ssize_t ret;
+
+	if (!len)
+		return 0;
+
+	if (write) {
+		ret = copy_from_user(bb, w_buffer, len);
+		if (ret)
+			return -EFAULT;
+
+		ret = xe_eudebug_vm_access(vm, offset, bb, len, true);
+		if (ret <= 0)
+			return ret;
+
+		len = ret;
+	} else {
+		ret = xe_eudebug_vm_access(vm, offset, bb, len, false);
+		if (ret <= 0)
+			return ret;
+
+		len = ret;
+
+		ret = copy_to_user(r_buffer, bb, len);
+		if (ret)
+			return -EFAULT;
+	}
+
+	return len;
+}
+
+static struct xe_vm *find_vm_get(struct xe_eudebug *d, const u32 id)
+{
+	struct xe_vm *vm;
+
+	mutex_lock(&d->res->lock);
+	vm = find_resource__unlocked(d->res, XE_EUDEBUG_RES_TYPE_VM, id);
+	if (vm)
+		xe_vm_get(vm);
+
+	mutex_unlock(&d->res->lock);
+
+	return vm;
+}
+
+static ssize_t __xe_eudebug_vm_access(struct file *file,
+				      char __user *r_buffer,
+				      const char __user *w_buffer,
+				      size_t count, loff_t *__pos)
+{
+	struct vm_file *vmf = file->private_data;
+	struct xe_eudebug * const d = vmf->debugger;
+	struct xe_device * const xe = d->xe;
+	const bool write = !!w_buffer;
+	struct xe_vm *vm;
+	ssize_t copied = 0;
+	ssize_t bytes_left = count;
+	ssize_t ret;
+	unsigned long alloc_len;
+	loff_t pos = *__pos;
+	void *k_buffer;
+
+	if (XE_IOCTL_DBG(xe, write && r_buffer))
+		return -EINVAL;
+
+	vm = find_vm_get(d, vmf->vm_handle);
+	if (XE_IOCTL_DBG(xe, !vm))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, vm != vmf->vm)) {
+		eu_warn(d, "vm_access(%s): vm handle mismatch client_handle=%llu, vm_handle=%llu, flags=0x%llx, pos=%llu, count=%zu\n",
+			write ? "write" : "read",
+			vmf->client_id, vmf->vm_handle, vmf->flags, pos, count);
+		xe_vm_put(vm);
+		return -EINVAL;
+	}
+
+	if (!count) {
+		xe_vm_put(vm);
+		return 0;
+	}
+
+	alloc_len = min_t(unsigned long, ALIGN(count, PAGE_SIZE), 64 * SZ_1M);
+	do  {
+		k_buffer = vmalloc(alloc_len);
+		if (k_buffer)
+			break;
+
+		alloc_len >>= 1;
+	} while (alloc_len > PAGE_SIZE);
+
+	if (XE_IOCTL_DBG(xe, !k_buffer)) {
+		xe_vm_put(vm);
+		return -ENOMEM;
+	}
+
+	do {
+		const ssize_t len = min_t(ssize_t, bytes_left, alloc_len);
+
+		ret = __vm_read_write(vm, k_buffer,
+				      write ? NULL : r_buffer + copied,
+				      write ? w_buffer + copied : NULL,
+				      pos + copied,
+				      len,
+				      write);
+		if (ret <= 0)
+			break;
+
+		bytes_left -= ret;
+		copied += ret;
+	} while (bytes_left > 0);
+
+	vfree(k_buffer);
+	xe_vm_put(vm);
+
+	if (XE_WARN_ON(copied < 0))
+		copied = 0;
+
+	*__pos += copied;
+
+	return copied ?: ret;
+}
+
+static ssize_t xe_eudebug_vm_read(struct file *file,
+				  char __user *buffer,
+				  size_t count, loff_t *pos)
+{
+	return __xe_eudebug_vm_access(file, buffer, NULL, count, pos);
+}
+
+static ssize_t xe_eudebug_vm_write(struct file *file,
+				   const char __user *buffer,
+				   size_t count, loff_t *pos)
+{
+	return __xe_eudebug_vm_access(file, NULL, buffer, count, pos);
+}
+
+static int engine_rcu_flush(struct xe_eudebug *d,
+			    struct xe_hw_engine *hwe,
+			    unsigned int timeout_us)
+{
+	const struct xe_reg psmi_addr = RING_PSMI_CTL(hwe->mmio_base);
+	struct xe_gt *gt = hwe->gt;
+	unsigned int fw_ref;
+	u32 mask = RCU_ASYNC_FLUSH_AND_INVALIDATE_ALL;
+	u32 psmi_ctrl;
+	u32 id;
+	int ret;
+
+	if (hwe->class == XE_ENGINE_CLASS_RENDER)
+		id = 0;
+	else if (hwe->class == XE_ENGINE_CLASS_COMPUTE)
+		id = hwe->instance + 1;
+	else
+		return -EINVAL;
+
+	if (id < 8)
+		mask |= id << RCU_ASYNC_FLUSH_ENGINE_ID_SHIFT;
+	else
+		mask |= (id - 8) << RCU_ASYNC_FLUSH_ENGINE_ID_SHIFT |
+			RCU_ASYNC_FLUSH_ENGINE_ID_DECODE1;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), hwe->domain);
+	if (!fw_ref)
+		return -ETIMEDOUT;
+
+	/* Prevent concurrent flushes */
+	mutex_lock(&d->eu_lock);
+	psmi_ctrl = xe_mmio_read32(&gt->mmio, psmi_addr);
+	if (!(psmi_ctrl & IDLE_MSG_DISABLE))
+		xe_mmio_write32(&gt->mmio, psmi_addr, _MASKED_BIT_ENABLE(IDLE_MSG_DISABLE));
+
+	/* XXX: Timeout is per operation but in here we flush previous */
+	ret = xe_mmio_wait32(&gt->mmio, RCU_ASYNC_FLUSH,
+			     RCU_ASYNC_FLUSH_IN_PROGRESS, 0,
+			     timeout_us, NULL, false);
+	if (ret)
+		goto out;
+
+	xe_mmio_write32(&gt->mmio, RCU_ASYNC_FLUSH, mask);
+
+	ret = xe_mmio_wait32(&gt->mmio, RCU_ASYNC_FLUSH,
+			     RCU_ASYNC_FLUSH_IN_PROGRESS, 0,
+			     timeout_us, NULL, false);
+out:
+	if (!(psmi_ctrl & IDLE_MSG_DISABLE))
+		xe_mmio_write32(&gt->mmio, psmi_addr, _MASKED_BIT_DISABLE(IDLE_MSG_DISABLE));
+
+	mutex_unlock(&d->eu_lock);
+	xe_force_wake_put(gt_to_fw(gt), fw_ref);
+
+	return ret;
+}
+
+static int xe_eudebug_vm_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct vm_file *vmf = file->private_data;
+	struct xe_eudebug *d = vmf->debugger;
+	struct xe_gt *gt;
+	int gt_id;
+	int ret = -EINVAL;
+
+	eu_dbg(d, "vm_fsync: client_handle=%llu, vm_handle=%llu, flags=0x%llx, start=%llu, end=%llu datasync=%d\n",
+	       vmf->client_id, vmf->vm_handle, vmf->flags, start, end, datasync);
+
+	for_each_gt(gt, d->xe, gt_id) {
+		struct xe_hw_engine *hwe;
+		enum xe_hw_engine_id id;
+
+		/* XXX: vm open per engine? */
+		for_each_hw_engine(hwe, gt, id) {
+			if (hwe->class != XE_ENGINE_CLASS_RENDER &&
+			    hwe->class != XE_ENGINE_CLASS_COMPUTE)
+				continue;
+
+			ret = engine_rcu_flush(d, hwe, vmf->timeout_us);
+			if (ret)
+				break;
+		}
+	}
+
+	return ret;
+}
+
+static int xe_eudebug_vm_release(struct inode *inode, struct file *file)
+{
+	struct vm_file *vmf = file->private_data;
+	struct xe_eudebug *d = vmf->debugger;
+
+	eu_dbg(d, "vm_release: client_handle=%llu, vm_handle=%llu, flags=0x%llx",
+	       vmf->client_id, vmf->vm_handle, vmf->flags);
+
+	xe_vm_put(vmf->vm);
+	xe_file_put(vmf->xef);
+	xe_eudebug_put(d);
+	drm_dev_put(&d->xe->drm);
+
+	kfree(vmf);
+
+	return 0;
+}
+
+static const struct file_operations vm_fops = {
+	.owner   = THIS_MODULE,
+	.llseek  = generic_file_llseek,
+	.read    = xe_eudebug_vm_read,
+	.write   = xe_eudebug_vm_write,
+	.fsync   = xe_eudebug_vm_fsync,
+	.mmap    = NULL,
+	.release = xe_eudebug_vm_release,
+};
+
+static long
+xe_eudebug_vm_open_ioctl(struct xe_eudebug *d, unsigned long arg)
+{
+	const u64 max_timeout_ns = DRM_XE_EUDEBUG_VM_SYNC_MAX_TIMEOUT_NSECS;
+	struct drm_xe_eudebug_vm_open param;
+	struct xe_device * const xe = d->xe;
+	struct vm_file *vmf = NULL;
+	struct xe_file *xef;
+	struct xe_vm *vm;
+	struct file *file;
+	long ret = 0;
+	int fd;
+
+	if (XE_IOCTL_DBG(xe, _IOC_SIZE(DRM_XE_EUDEBUG_IOCTL_VM_OPEN) != sizeof(param)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !(_IOC_DIR(DRM_XE_EUDEBUG_IOCTL_VM_OPEN) & _IOC_WRITE)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, copy_from_user(&param, (void __user *)arg, sizeof(param))))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, param.flags))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, param.timeout_ns > max_timeout_ns))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, xe_eudebug_detached(d)))
+		return -ENOTCONN;
+
+	xef = find_client_get(d, param.client_handle);
+	if (xef)
+		vm = find_vm_get(d, param.vm_handle);
+	else
+		vm = NULL;
+
+	if (XE_IOCTL_DBG(xe, !xef))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !vm)) {
+		ret = -EINVAL;
+		goto out_file_put;
+	}
+
+	vmf = kzalloc(sizeof(*vmf), GFP_KERNEL);
+	if (XE_IOCTL_DBG(xe, !vmf)) {
+		ret = -ENOMEM;
+		goto out_vm_put;
+	}
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (XE_IOCTL_DBG(xe, fd < 0)) {
+		ret = fd;
+		goto out_free;
+	}
+
+	kref_get(&d->ref);
+	vmf->debugger = d;
+	vmf->vm = vm;
+	vmf->xef = xef;
+	vmf->flags = param.flags;
+	vmf->client_id = param.client_handle;
+	vmf->vm_handle = param.vm_handle;
+	vmf->timeout_us = div64_u64(param.timeout_ns, 1000ull);
+
+	file = anon_inode_getfile("[xe_eudebug.vm]", &vm_fops, vmf, O_RDWR);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		XE_IOCTL_DBG(xe, ret);
+		file = NULL;
+		goto out_fd_put;
+	}
+
+	file->f_mode |= FMODE_PREAD | FMODE_PWRITE |
+		FMODE_READ | FMODE_WRITE | FMODE_LSEEK;
+
+	fd_install(fd, file);
+
+	eu_dbg(d, "vm_open: client_handle=%llu, handle=%llu, flags=0x%llx, fd=%d",
+	       vmf->client_id, vmf->vm_handle, vmf->flags, fd);
+
+	XE_WARN_ON(ret);
+
+	drm_dev_get(&xe->drm);
+
+	return fd;
+
+out_fd_put:
+	put_unused_fd(fd);
+	xe_eudebug_put(d);
+out_free:
+	kfree(vmf);
+out_vm_put:
+	xe_vm_put(vm);
+out_file_put:
+	xe_file_put(xef);
+
+	XE_WARN_ON(ret >= 0);
+
+	return ret;
+}
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index 1d5f1411c9a8..a5f13563b3b9 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -18,6 +18,7 @@ extern "C" {
 #define DRM_XE_EUDEBUG_IOCTL_READ_EVENT		_IO('j', 0x0)
 #define DRM_XE_EUDEBUG_IOCTL_EU_CONTROL		_IOWR('j', 0x2, struct drm_xe_eudebug_eu_control)
 #define DRM_XE_EUDEBUG_IOCTL_ACK_EVENT		_IOW('j', 0x4, struct drm_xe_eudebug_ack_event)
+#define DRM_XE_EUDEBUG_IOCTL_VM_OPEN		_IOW('j', 0x1, struct drm_xe_eudebug_vm_open)
 
 /* XXX: Document events to match their internal counterparts when moved to xe_drm.h */
 struct drm_xe_eudebug_event {
@@ -187,6 +188,24 @@ struct drm_xe_eudebug_ack_event {
 	__u64 seqno;
 };
 
+struct drm_xe_eudebug_vm_open {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @client_handle: id of client */
+	__u64 client_handle;
+
+	/** @vm_handle: id of vm */
+	__u64 vm_handle;
+
+	/** @flags: flags */
+	__u64 flags;
+
+#define DRM_XE_EUDEBUG_VM_SYNC_MAX_TIMEOUT_NSECS (10ULL * NSEC_PER_SEC)
+	/** @timeout_ns: Timeout value in nanoseconds for sync operations (fsync) */
+	__u64 timeout_ns;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 13/26] drm/xe: add system memory page iterator support to xe_res_cursor
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (11 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 12/26] drm/xe/eudebug: vm open/pread/pwrite Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access Mika Kuoppala
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Andrzej Hajda, Mika Kuoppala

From: Andrzej Hajda <andrzej.hajda@intel.com>

Currently xe_res_cursor allows iteration only over the DMA side of
scatter gather tables. Add support for iterating over the system
memory pages backing a scatter gather table as well.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_res_cursor.h | 51 +++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_res_cursor.h b/drivers/gpu/drm/xe/xe_res_cursor.h
index dca374b6521c..c1f39a680ae0 100644
--- a/drivers/gpu/drm/xe/xe_res_cursor.h
+++ b/drivers/gpu/drm/xe/xe_res_cursor.h
@@ -129,18 +129,35 @@ static inline void __xe_res_sg_next(struct xe_res_cursor *cur)
 {
 	struct scatterlist *sgl = cur->sgl;
 	u64 start = cur->start;
+	unsigned int len;
 
-	while (start >= sg_dma_len(sgl)) {
-		start -= sg_dma_len(sgl);
+	while (true) {
+		len = (cur->mem_type == XE_PL_SYSTEM) ? sgl->length : sg_dma_len(sgl);
+		if (start < len)
+			break;
+		start -= len;
 		sgl = sg_next(sgl);
 		XE_WARN_ON(!sgl);
 	}
-
 	cur->start = start;
-	cur->size = sg_dma_len(sgl) - start;
+	cur->size = len - start;
 	cur->sgl = sgl;
 }
 
+static inline void __xe_res_first_sg(const struct sg_table *sg,
+				   u64 start, u64 size,
+				   struct xe_res_cursor *cur, u32 mem_type)
+{
+	XE_WARN_ON(!sg);
+	cur->node = NULL;
+	cur->start = start;
+	cur->remaining = size;
+	cur->size = 0;
+	cur->sgl = sg->sgl;
+	cur->mem_type = mem_type;
+	__xe_res_sg_next(cur);
+}
+
 /**
  * xe_res_first_sg - initialize a xe_res_cursor with a scatter gather table
  *
@@ -155,14 +172,24 @@ static inline void xe_res_first_sg(const struct sg_table *sg,
 				   u64 start, u64 size,
 				   struct xe_res_cursor *cur)
 {
-	XE_WARN_ON(!sg);
-	cur->node = NULL;
-	cur->start = start;
-	cur->remaining = size;
-	cur->size = 0;
-	cur->sgl = sg->sgl;
-	cur->mem_type = XE_PL_TT;
-	__xe_res_sg_next(cur);
+	__xe_res_first_sg(sg, start, size, cur, XE_PL_TT);
+}
+
+/**
+ * xe_res_first_sg_system - initialize a xe_res_cursor for iterating over system memory pages
+ *
+ * @sg: scatter gather table to walk
+ * @start: Start of the range
+ * @size: Size of the range
+ * @cur: cursor object to initialize
+ *
+ * Start walking over the range of allocations between @start and @size
+ */
+static inline void xe_res_first_sg_system(const struct sg_table *sg,
+				   u64 start, u64 size,
+				   struct xe_res_cursor *cur)
+{
+	__xe_res_first_sg(sg, start, size, cur, XE_PL_SYSTEM);
 }
 
 /**
-- 
2.43.0



* [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (12 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 13/26] drm/xe: add system memory page iterator support to xe_res_cursor Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 14:03   ` Christian König
                     ` (2 more replies)
  2024-12-09 13:33 ` [PATCH 15/26] drm/xe: Debug metadata create/destroy ioctls Mika Kuoppala
                   ` (17 subsequent siblings)
  31 siblings, 3 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Andrzej Hajda, Maciej Patelczyk,
	Mika Kuoppala, Jonathan Cavitt

From: Andrzej Hajda <andrzej.hajda@intel.com>

The debugger needs to read/write the program's vmas, including
userptr_vmas. Since hmm_range_fault is used to pin userptr vmas,
it is possible to map those vmas from the debugger context.

v2: pin pages vs notifier, move to vm.c (Matthew)
v3: - iterate over system pages instead of DMA, fixes iommu enabled
    - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
---
 drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
 drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h      |  3 +++
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 9d87df75348b..e5949e4dcad8 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
 		return ret;
 	}
 
-	return -EINVAL;
+	return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
+				    buf, bytes, write);
 }
 
 static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 0f17bc8b627b..224ff9e16941 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
 	}
 	kvfree(snap);
 }
+
+int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
+			 void *buf, u64 len, bool write)
+{
+	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
+	struct xe_userptr *up = &uvma->userptr;
+	struct xe_res_cursor cur = {};
+	int cur_len, ret = 0;
+
+	while (true) {
+		down_read(&vm->userptr.notifier_lock);
+		if (!xe_vma_userptr_check_repin(uvma))
+			break;
+
+		spin_lock(&vm->userptr.invalidated_lock);
+		list_del_init(&uvma->userptr.invalidate_link);
+		spin_unlock(&vm->userptr.invalidated_lock);
+
+		up_read(&vm->userptr.notifier_lock);
+		ret = xe_vma_userptr_pin_pages(uvma);
+		if (ret)
+			return ret;
+	}
+
+	if (!up->sg) {
+		ret = -EINVAL;
+		goto out_unlock_notifier;
+	}
+
+	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
+	     xe_res_next(&cur, cur_len)) {
+		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
+
+		cur_len = min(cur.size, cur.remaining);
+		if (write)
+			memcpy(ptr, buf, cur_len);
+		else
+			memcpy(buf, ptr, cur_len);
+		kunmap_local(ptr);
+		buf += cur_len;
+	}
+	ret = len;
+
+out_unlock_notifier:
+	up_read(&vm->userptr.notifier_lock);
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 23adb7442881..372ad40ad67f 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
 void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
 void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
 void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
+
+int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
+			 void *buf, u64 len, bool write);
-- 
2.43.0



* [PATCH 15/26] drm/xe: Debug metadata create/destroy ioctls
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (13 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 16/26] drm/xe: Attach debug metadata to vma Mika Kuoppala
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Matthew Auld,
	Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

As a part of the eu debug feature, introduce debug metadata objects.
These are used to pass metadata between the client and the debugger
by attaching them to vm_bind operations.

todo: WORK_IN_PROGRESS_* defines need to be reworded/refined when
      the real usage and need are established by l0+gdb.

v2: - include uapi/drm/xe_drm.h
    - metadata behind kconfig (Mika)
    - dont leak args->id on error (Matt Auld)

Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/Makefile                  |   3 +-
 drivers/gpu/drm/xe/xe_debug_metadata.c       | 107 +++++++++++++++++++
 drivers/gpu/drm/xe/xe_debug_metadata.h       |  50 +++++++++
 drivers/gpu/drm/xe/xe_debug_metadata_types.h |  25 +++++
 drivers/gpu/drm/xe/xe_device.c               |   5 +
 drivers/gpu/drm/xe/xe_device.h               |   2 +
 drivers/gpu/drm/xe/xe_device_types.h         |   7 ++
 drivers/gpu/drm/xe/xe_eudebug.c              |  13 +++
 include/uapi/drm/xe_drm.h                    |  53 ++++++++-
 9 files changed, 263 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.c
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata.h
 create mode 100644 drivers/gpu/drm/xe/xe_debug_metadata_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 33f457e4fcd3..e7dc299ea178 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -117,7 +117,8 @@ xe-y += xe_bb.o \
 	xe_wa.o \
 	xe_wopcm.o
 
-xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o
+xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o \
+	xe_debug_metadata.o
 
 xe-$(CONFIG_HMM_MIRROR) += xe_hmm.o
 
diff --git a/drivers/gpu/drm/xe/xe_debug_metadata.c b/drivers/gpu/drm/xe/xe_debug_metadata.c
new file mode 100644
index 000000000000..1dfed9aed285
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_debug_metadata.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+#include "xe_debug_metadata.h"
+
+#include <drm/drm_device.h>
+#include <drm/drm_file.h>
+#include <uapi/drm/xe_drm.h>
+
+#include "xe_device.h"
+#include "xe_macros.h"
+
+static void xe_debug_metadata_release(struct kref *ref)
+{
+	struct xe_debug_metadata *mdata = container_of(ref, struct xe_debug_metadata, refcount);
+
+	kvfree(mdata->ptr);
+	kfree(mdata);
+}
+
+void xe_debug_metadata_put(struct xe_debug_metadata *mdata)
+{
+	kref_put(&mdata->refcount, xe_debug_metadata_release);
+}
+
+int xe_debug_metadata_create_ioctl(struct drm_device *dev,
+				   void *data,
+				   struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_debug_metadata_create *args = data;
+	struct xe_debug_metadata *mdata;
+	int err;
+	u32 id;
+
+	if (XE_IOCTL_DBG(xe, args->extensions))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, args->type > DRM_XE_DEBUG_METADATA_PROGRAM_MODULE))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !args->user_addr || !args->len))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !access_ok(u64_to_user_ptr(args->user_addr), args->len)))
+		return -EFAULT;
+
+	mdata = kzalloc(sizeof(*mdata), GFP_KERNEL);
+	if (!mdata)
+		return -ENOMEM;
+
+	mdata->len = args->len;
+	mdata->type = args->type;
+
+	mdata->ptr = kvmalloc(mdata->len, GFP_KERNEL);
+	if (!mdata->ptr) {
+		kfree(mdata);
+		return -ENOMEM;
+	}
+	kref_init(&mdata->refcount);
+
+	err = copy_from_user(mdata->ptr, u64_to_user_ptr(args->user_addr), mdata->len);
+	if (err) {
+		err = -EFAULT;
+		goto put_mdata;
+	}
+
+	mutex_lock(&xef->eudebug.metadata.lock);
+	err = xa_alloc(&xef->eudebug.metadata.xa, &id, mdata, xa_limit_32b, GFP_KERNEL);
+	mutex_unlock(&xef->eudebug.metadata.lock);
+
+	if (err)
+		goto put_mdata;
+
+	args->metadata_id = id;
+
+	return 0;
+
+put_mdata:
+	xe_debug_metadata_put(mdata);
+	return err;
+}
+
+int xe_debug_metadata_destroy_ioctl(struct drm_device *dev,
+				    void *data,
+				    struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_debug_metadata_destroy * const args = data;
+	struct xe_debug_metadata *mdata;
+
+	if (XE_IOCTL_DBG(xe, args->extensions))
+		return -EINVAL;
+
+	mutex_lock(&xef->eudebug.metadata.lock);
+	mdata = xa_erase(&xef->eudebug.metadata.xa, args->metadata_id);
+	mutex_unlock(&xef->eudebug.metadata.lock);
+	if (XE_IOCTL_DBG(xe, !mdata))
+		return -ENOENT;
+
+	xe_debug_metadata_put(mdata);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_debug_metadata.h b/drivers/gpu/drm/xe/xe_debug_metadata.h
new file mode 100644
index 000000000000..3266c25e657e
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_debug_metadata.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_DEBUG_METADATA_H_
+#define _XE_DEBUG_METADATA_H_
+
+struct drm_device;
+struct drm_file;
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+
+#include "xe_debug_metadata_types.h"
+
+void xe_debug_metadata_put(struct xe_debug_metadata *mdata);
+
+int xe_debug_metadata_create_ioctl(struct drm_device *dev,
+				   void *data,
+				   struct drm_file *file);
+
+int xe_debug_metadata_destroy_ioctl(struct drm_device *dev,
+				    void *data,
+				    struct drm_file *file);
+#else /* CONFIG_DRM_XE_EUDEBUG */
+
+#include <linux/errno.h>
+
+struct xe_debug_metadata;
+
+static inline void xe_debug_metadata_put(struct xe_debug_metadata *mdata) { }
+
+static inline int xe_debug_metadata_create_ioctl(struct drm_device *dev,
+						 void *data,
+						 struct drm_file *file)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int xe_debug_metadata_destroy_ioctl(struct drm_device *dev,
+						  void *data,
+						  struct drm_file *file)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_DRM_XE_EUDEBUG */
+
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_debug_metadata_types.h b/drivers/gpu/drm/xe/xe_debug_metadata_types.h
new file mode 100644
index 000000000000..624852920f58
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_debug_metadata_types.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_DEBUG_METADATA_TYPES_H_
+#define _XE_DEBUG_METADATA_TYPES_H_
+
+#include <linux/kref.h>
+
+struct xe_debug_metadata {
+	/** @type: type of given metadata */
+	u64 type;
+
+	/** @ptr: copy of userptr, given as a metadata payload */
+	void *ptr;
+
+	/** @len: length, in bytes of the metadata */
+	u64 len;
+
+	/** @ref: reference count */
+	struct kref refcount;
+};
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index dc0336215912..a7a715475184 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -25,6 +25,7 @@
 #include "xe_bo.h"
 #include "xe_debugfs.h"
 #include "xe_devcoredump.h"
+#include "xe_debug_metadata.h"
 #include "xe_dma_buf.h"
 #include "xe_drm_client.h"
 #include "xe_drv.h"
@@ -197,6 +198,10 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_EUDEBUG_CONNECT, xe_eudebug_connect_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_DEBUG_METADATA_CREATE, xe_debug_metadata_create_ioctl,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_DEBUG_METADATA_DESTROY, xe_debug_metadata_destroy_ioctl,
+			  DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 088831a6b863..f89f1f0fc25e 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -218,6 +218,8 @@ static inline int xe_eudebug_needs_lock(const unsigned int cmd)
 	case DRM_XE_EXEC_QUEUE_CREATE:
 	case DRM_XE_EXEC_QUEUE_DESTROY:
 	case DRM_XE_EUDEBUG_CONNECT:
+	case DRM_XE_DEBUG_METADATA_CREATE:
+	case DRM_XE_DEBUG_METADATA_DESTROY:
 		return 1;
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 7b893a86d83f..4ab9f06eba2d 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -684,6 +684,13 @@ struct xe_file {
 	struct {
 		/** @client_link: list entry in xe_device.clients.list */
 		struct list_head client_link;
+
+		struct {
+			/** @xa: xarray to store debug metadata */
+			struct xarray xa;
+			/** @lock: protects debug metadata xarray */
+			struct mutex lock;
+		} metadata;
 	} eudebug;
 #endif
 };
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index e5949e4dcad8..e9092ed0b344 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -21,6 +21,7 @@
 #include "xe_assert.h"
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_debug_metadata.h"
 #include "xe_eudebug.h"
 #include "xe_eudebug_types.h"
 #include "xe_exec_queue.h"
@@ -2141,6 +2142,8 @@ void xe_eudebug_file_open(struct xe_file *xef)
 	struct xe_eudebug *d;
 
 	INIT_LIST_HEAD(&xef->eudebug.client_link);
+	mutex_init(&xef->eudebug.metadata.lock);
+	xa_init_flags(&xef->eudebug.metadata.xa, XA_FLAGS_ALLOC1);
 
 	down_read(&xef->xe->eudebug.discovery_lock);
 
@@ -2158,12 +2161,22 @@ void xe_eudebug_file_open(struct xe_file *xef)
 void xe_eudebug_file_close(struct xe_file *xef)
 {
 	struct xe_eudebug *d;
+	unsigned long idx;
+	struct xe_debug_metadata *mdata;
 
 	down_read(&xef->xe->eudebug.discovery_lock);
 	d = xe_eudebug_get(xef);
 	if (d)
 		xe_eudebug_event_put(d, client_destroy_event(d, xef));
 
+	mutex_lock(&xef->eudebug.metadata.lock);
+	xa_for_each(&xef->eudebug.metadata.xa, idx, mdata)
+		xe_debug_metadata_put(mdata);
+	mutex_unlock(&xef->eudebug.metadata.lock);
+
+	xa_destroy(&xef->eudebug.metadata.xa);
+	mutex_destroy(&xef->eudebug.metadata.lock);
+
 	spin_lock(&xef->xe->clients.lock);
 	list_del_init(&xef->eudebug.client_link);
 	spin_unlock(&xef->xe->clients.lock);
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index d0b9ef0799b2..1a452a8d2a2a 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -103,7 +103,8 @@ extern "C" {
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
 #define DRM_XE_EUDEBUG_CONNECT		0x0c
-
+#define DRM_XE_DEBUG_METADATA_CREATE	0x0d
+#define DRM_XE_DEBUG_METADATA_DESTROY	0x0e
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_XE_DEVICE_QUERY		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_DEVICE_QUERY, struct drm_xe_device_query)
@@ -119,6 +120,8 @@ extern "C" {
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
 #define DRM_IOCTL_XE_EUDEBUG_CONNECT		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_EUDEBUG_CONNECT, struct drm_xe_eudebug_connect)
+#define DRM_IOCTL_XE_DEBUG_METADATA_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_DEBUG_METADATA_CREATE, struct drm_xe_debug_metadata_create)
+#define DRM_IOCTL_XE_DEBUG_METADATA_DESTROY	DRM_IOW(DRM_COMMAND_BASE + DRM_XE_DEBUG_METADATA_DESTROY, struct drm_xe_debug_metadata_destroy)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1733,6 +1736,54 @@ struct drm_xe_eudebug_connect {
 	__u32 version; /* output: current ABI (ioctl / events) version */
 };
 
+/**
+ * struct drm_xe_debug_metadata_create - Create debug metadata
+ *
+ * Add a region of user memory to be marked as debug metadata.
+ * When the debugger attaches, the metadata regions will be delivered
+ * to the debugger. The debugger can then map these regions to help
+ * decode the program state.
+ *
+ * Returns a handle to the created metadata entry.
+ */
+struct drm_xe_debug_metadata_create {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+#define DRM_XE_DEBUG_METADATA_ELF_BINARY     0
+#define DRM_XE_DEBUG_METADATA_PROGRAM_MODULE 1
+#define WORK_IN_PROGRESS_DRM_XE_DEBUG_METADATA_MODULE_AREA 2
+#define WORK_IN_PROGRESS_DRM_XE_DEBUG_METADATA_SBA_AREA 3
+#define WORK_IN_PROGRESS_DRM_XE_DEBUG_METADATA_SIP_AREA 4
+#define WORK_IN_PROGRESS_DRM_XE_DEBUG_METADATA_NUM (1 + \
+	  WORK_IN_PROGRESS_DRM_XE_DEBUG_METADATA_SIP_AREA)
+
+	/** @type: Type of metadata */
+	__u64 type;
+
+	/** @user_addr: pointer to start of the metadata */
+	__u64 user_addr;
+
+	/** @len: length, in bytes, of the metadata */
+	__u64 len;
+
+	/** @metadata_id: created metadata handle (out) */
+	__u32 metadata_id;
+};
+
+/**
+ * struct drm_xe_debug_metadata_destroy - Destroy debug metadata
+ *
+ * Destroy debug metadata.
+ */
+struct drm_xe_debug_metadata_destroy {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @metadata_id: metadata handle to destroy */
+	__u32 metadata_id;
+};
+
 #include "xe_drm_eudebug.h"
 
 #if defined(__cplusplus)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 16/26] drm/xe: Attach debug metadata to vma
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (14 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 15/26] drm/xe: Debug metadata create/destroy ioctls Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 17/26] drm/xe/eudebug: Add debug metadata support for xe_eudebug Mika Kuoppala
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Maciej Patelczyk,
	Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

Introduce a vm_bind_op extension that enables users to attach metadata
objects to each [OP_MAP|OP_MAP_USERPTR] operation. This interface
will be used by the EU debugger to relay information about the
contents of specified VMAs from the debuggee to the debugger process.

v2: move vma metadata handling behind Kconfig (Mika)

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_debug_metadata.c | 120 +++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_debug_metadata.h |  52 +++++++++++
 drivers/gpu/drm/xe/xe_vm.c             |  99 +++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm_types.h       |  27 ++++++
 include/uapi/drm/xe_drm.h              |  19 ++++
 5 files changed, 313 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_debug_metadata.c b/drivers/gpu/drm/xe/xe_debug_metadata.c
index 1dfed9aed285..b045bdd77235 100644
--- a/drivers/gpu/drm/xe/xe_debug_metadata.c
+++ b/drivers/gpu/drm/xe/xe_debug_metadata.c
@@ -10,6 +10,113 @@
 
 #include "xe_device.h"
 #include "xe_macros.h"
+#include "xe_vm.h"
+
+void xe_eudebug_free_vma_metadata(struct xe_eudebug_vma_metadata *mdata)
+{
+	struct xe_vma_debug_metadata *vmad, *tmp;
+
+	list_for_each_entry_safe(vmad, tmp, &mdata->list, link) {
+		list_del(&vmad->link);
+		kfree(vmad);
+	}
+}
+
+static struct xe_vma_debug_metadata *
+vma_new_debug_metadata(u32 metadata_id, u64 cookie)
+{
+	struct xe_vma_debug_metadata *vmad;
+
+	vmad = kzalloc(sizeof(*vmad), GFP_KERNEL);
+	if (!vmad)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&vmad->link);
+
+	vmad->metadata_id = metadata_id;
+	vmad->cookie = cookie;
+
+	return vmad;
+}
+
+int xe_eudebug_copy_vma_metadata(struct xe_eudebug_vma_metadata *from,
+				 struct xe_eudebug_vma_metadata *to)
+{
+	struct xe_vma_debug_metadata *vmad, *copy;
+
+	list_for_each_entry(vmad, &from->list, link) {
+		copy = vma_new_debug_metadata(vmad->metadata_id, vmad->cookie);
+		if (IS_ERR(copy))
+			return PTR_ERR(copy);
+
+		list_add_tail(&copy->link, &to->list);
+	}
+
+	return 0;
+}
+
+static int vma_new_debug_metadata_op(struct xe_vma_op *op,
+				     u32 metadata_id, u64 cookie,
+				     u64 flags)
+{
+	struct xe_vma_debug_metadata *vmad;
+
+	vmad = vma_new_debug_metadata(metadata_id, cookie);
+	if (IS_ERR(vmad))
+		return PTR_ERR(vmad);
+
+	list_add_tail(&vmad->link, &op->map.eudebug.metadata.list);
+
+	return 0;
+}
+
+int vm_bind_op_ext_attach_debug(struct xe_device *xe,
+				struct xe_file *xef,
+				struct drm_gpuva_ops *ops,
+				u32 operation, u64 extension)
+{
+	u64 __user *address = u64_to_user_ptr(extension);
+	struct drm_xe_vm_bind_op_ext_attach_debug ext;
+	struct xe_debug_metadata *mdata;
+	struct drm_gpuva_op *__op;
+	int err;
+
+	err = copy_from_user(&ext, address, sizeof(ext));
+	if (XE_IOCTL_DBG(xe, err))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe,
+			 operation != DRM_XE_VM_BIND_OP_MAP_USERPTR &&
+			 operation != DRM_XE_VM_BIND_OP_MAP))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, ext.flags))
+		return -EINVAL;
+
+	mdata = xe_debug_metadata_get(xef, (u32)ext.metadata_id);
+	if (XE_IOCTL_DBG(xe, !mdata))
+		return -ENOENT;
+
+	/* we only care that the metadata exists at the time of attach */
+	xe_debug_metadata_put(mdata);
+
+	if (!ops)
+		return 0;
+
+	drm_gpuva_for_each_op(__op, ops) {
+		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+
+		if (op->base.op == DRM_GPUVA_OP_MAP) {
+			err = vma_new_debug_metadata_op(op,
+							ext.metadata_id,
+							ext.cookie,
+							ext.flags);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
 
 static void xe_debug_metadata_release(struct kref *ref)
 {
@@ -24,6 +131,19 @@ void xe_debug_metadata_put(struct xe_debug_metadata *mdata)
 	kref_put(&mdata->refcount, xe_debug_metadata_release);
 }
 
+struct xe_debug_metadata *xe_debug_metadata_get(struct xe_file *xef, u32 id)
+{
+	struct xe_debug_metadata *mdata;
+
+	mutex_lock(&xef->eudebug.metadata.lock);
+	mdata = xa_load(&xef->eudebug.metadata.xa, id);
+	if (mdata)
+		kref_get(&mdata->refcount);
+	mutex_unlock(&xef->eudebug.metadata.lock);
+
+	return mdata;
+}
+
 int xe_debug_metadata_create_ioctl(struct drm_device *dev,
 				   void *data,
 				   struct drm_file *file)
diff --git a/drivers/gpu/drm/xe/xe_debug_metadata.h b/drivers/gpu/drm/xe/xe_debug_metadata.h
index 3266c25e657e..ba913a4d6def 100644
--- a/drivers/gpu/drm/xe/xe_debug_metadata.h
+++ b/drivers/gpu/drm/xe/xe_debug_metadata.h
@@ -6,13 +6,18 @@
 #ifndef _XE_DEBUG_METADATA_H_
 #define _XE_DEBUG_METADATA_H_
 
+#include <linux/types.h>
+
 struct drm_device;
 struct drm_file;
+struct xe_file;
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
 
 #include "xe_debug_metadata_types.h"
+#include "xe_vm_types.h"
 
+struct xe_debug_metadata *xe_debug_metadata_get(struct xe_file *xef, u32 id);
 void xe_debug_metadata_put(struct xe_debug_metadata *mdata);
 
 int xe_debug_metadata_create_ioctl(struct drm_device *dev,
@@ -22,11 +27,35 @@ int xe_debug_metadata_create_ioctl(struct drm_device *dev,
 int xe_debug_metadata_destroy_ioctl(struct drm_device *dev,
 				    void *data,
 				    struct drm_file *file);
+
+static inline void xe_eudebug_move_vma_metadata(struct xe_eudebug_vma_metadata *from,
+						struct xe_eudebug_vma_metadata *to)
+{
+	list_splice_tail_init(&from->list, &to->list);
+}
+
+int xe_eudebug_copy_vma_metadata(struct xe_eudebug_vma_metadata *from,
+				 struct xe_eudebug_vma_metadata *to);
+void xe_eudebug_free_vma_metadata(struct xe_eudebug_vma_metadata *mdata);
+
+int vm_bind_op_ext_attach_debug(struct xe_device *xe,
+				struct xe_file *xef,
+				struct drm_gpuva_ops *ops,
+				u32 operation, u64 extension);
+
 #else /* CONFIG_DRM_XE_EUDEBUG */
 
 #include <linux/errno.h>
 
 struct xe_debug_metadata;
+struct xe_device;
+struct xe_eudebug_vma_metadata;
+struct drm_gpuva_ops;
+
+static inline struct xe_debug_metadata *xe_debug_metadata_get(struct xe_file *xef, u32 id)
+{
+	return NULL;
+}
 
 static inline void xe_debug_metadata_put(struct xe_debug_metadata *mdata) { }
 
@@ -44,6 +73,29 @@ static inline int xe_debug_metadata_destroy_ioctl(struct drm_device *dev,
 	return -EOPNOTSUPP;
 }
 
+static inline void xe_eudebug_move_vma_metadata(struct xe_eudebug_vma_metadata *from,
+						struct xe_eudebug_vma_metadata *to)
+{
+}
+
+static inline int xe_eudebug_copy_vma_metadata(struct xe_eudebug_vma_metadata *from,
+					       struct xe_eudebug_vma_metadata *to)
+{
+	return 0;
+}
+
+static inline void xe_eudebug_free_vma_metadata(struct xe_eudebug_vma_metadata *mdata)
+{
+}
+
+static inline int vm_bind_op_ext_attach_debug(struct xe_device *xe,
+					      struct xe_file *xef,
+					      struct drm_gpuva_ops *ops,
+					      u32 operation, u64 extension)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_DRM_XE_EUDEBUG */
 
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 224ff9e16941..19c0b36c10b1 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -23,6 +23,7 @@
 #include "regs/xe_gtt_defs.h"
 #include "xe_assert.h"
 #include "xe_bo.h"
+#include "xe_debug_metadata.h"
 #include "xe_device.h"
 #include "xe_drm_client.h"
 #include "xe_eudebug.h"
@@ -944,6 +945,9 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			vma->gpuva.gem.obj = &bo->ttm.base;
 	}
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	INIT_LIST_HEAD(&vma->eudebug.metadata.list);
+#endif
 	INIT_LIST_HEAD(&vma->combined_links.rebind);
 
 	INIT_LIST_HEAD(&vma->gpuva.gem.entry);
@@ -1036,6 +1040,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 		xe_bo_put(xe_vma_bo(vma));
 	}
 
+	xe_eudebug_free_vma_metadata(&vma->eudebug.metadata);
 	xe_vma_free(vma);
 }
 
@@ -1979,6 +1984,9 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
 			op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE;
 			op->map.pat_index = pat_index;
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+			INIT_LIST_HEAD(&op->map.eudebug.metadata.list);
+#endif
 		} else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
 			op->prefetch.region = prefetch_region;
 		}
@@ -2170,11 +2178,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 			flags |= op->map.dumpable ?
 				VMA_CREATE_FLAG_DUMPABLE : 0;
 
-			vma = new_vma(vm, &op->base.map, op->map.pat_index,
-				      flags);
+			vma = new_vma(vm, &op->base.map, op->map.pat_index, flags);
 			if (IS_ERR(vma))
 				return PTR_ERR(vma);
 
+			xe_eudebug_move_vma_metadata(&op->map.eudebug.metadata,
+						     &vma->eudebug.metadata);
+
 			op->map.vma = vma;
 			if (op->map.immediate || !xe_vm_in_fault_mode(vm))
 				xe_vma_ops_incr_pt_update_ops(vops,
@@ -2205,6 +2215,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
+				xe_eudebug_move_vma_metadata(&old->eudebug.metadata,
+							     &vma->eudebug.metadata);
+
 				op->remap.prev = vma;
 
 				/*
@@ -2244,6 +2257,16 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
+				if (op->base.remap.prev) {
+					err = xe_eudebug_copy_vma_metadata(&op->remap.prev->eudebug.metadata,
+									   &vma->eudebug.metadata);
+					if (err)
+						return err;
+				} else {
+					xe_eudebug_move_vma_metadata(&old->eudebug.metadata,
+								     &vma->eudebug.metadata);
+				}
+
 				op->remap.next = vma;
 
 				/*
@@ -2294,6 +2317,7 @@ static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
 		if (op->map.vma) {
+			xe_eudebug_free_vma_metadata(&op->map.eudebug.metadata);
 			prep_vma_destroy(vm, op->map.vma, post_commit);
 			xe_vma_destroy_unlocked(op->map.vma);
 		}
@@ -2532,6 +2556,58 @@ static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops)
 	}
 
 	return number_tiles;
+}
+
+typedef int (*xe_vm_bind_op_user_extension_fn)(struct xe_device *xe,
+					       struct xe_file *xef,
+					       struct drm_gpuva_ops *ops,
+					       u32 operation, u64 extension);
+
+static const xe_vm_bind_op_user_extension_fn vm_bind_op_extension_funcs[] = {
+	[XE_VM_BIND_OP_EXTENSIONS_ATTACH_DEBUG] = vm_bind_op_ext_attach_debug,
+};
+
+#define MAX_USER_EXTENSIONS	16
+static int vm_bind_op_user_extensions(struct xe_device *xe,
+				      struct xe_file *xef,
+				      struct drm_gpuva_ops *ops,
+				      u32 operation,
+				      u64 extensions, int ext_number)
+{
+	u64 __user *address = u64_to_user_ptr(extensions);
+	struct drm_xe_user_extension ext;
+	int err;
+
+	if (XE_IOCTL_DBG(xe, ext_number >= MAX_USER_EXTENSIONS))
+		return -E2BIG;
+
+	err = copy_from_user(&ext, address, sizeof(ext));
+	if (XE_IOCTL_DBG(xe, err))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, ext.pad) ||
+	    XE_IOCTL_DBG(xe, ext.name >=
+			 ARRAY_SIZE(vm_bind_op_extension_funcs)))
+		return -EINVAL;
+
+	err = vm_bind_op_extension_funcs[ext.name](xe, xef, ops,
+						   operation, extensions);
+	if (XE_IOCTL_DBG(xe, err))
+		return err;
+
+	if (ext.next_extension)
+		return vm_bind_op_user_extensions(xe, xef, ops,
+						  operation, ext.next_extension,
+						  ++ext_number);
+
+	return 0;
+}
+
+static int vm_bind_op_user_extensions_check(struct xe_device *xe,
+					    struct xe_file *xef,
+					    u32 operation, u64 extensions)
+{
+	return vm_bind_op_user_extensions(xe, xef, NULL, operation, extensions, 0);
 }
 
 static struct dma_fence *ops_execute(struct xe_vm *vm,
@@ -2729,6 +2805,7 @@ ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO);
 #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
 
 static int vm_bind_ioctl_check_args(struct xe_device *xe,
+				    struct xe_file *xef,
 				    struct drm_xe_vm_bind *args,
 				    struct drm_xe_vm_bind_op **bind_ops)
 {
@@ -2773,6 +2850,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 		u64 obj_offset = (*bind_ops)[i].obj_offset;
 		u32 prefetch_region = (*bind_ops)[i].prefetch_mem_region_instance;
 		bool is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
+		u64 extensions = (*bind_ops)[i].extensions;
 		u16 pat_index = (*bind_ops)[i].pat_index;
 		u16 coh_mode;
 
@@ -2833,6 +2911,13 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 			err = -EINVAL;
 			goto free_bind_ops;
 		}
+
+		if (extensions) {
+			err = vm_bind_op_user_extensions_check(xe, xef, op, extensions);
+			if (err)
+				goto free_bind_ops;
+		}
+
 	}
 
 	return 0;
@@ -2944,7 +3029,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	int err;
 	int i;
 
-	err = vm_bind_ioctl_check_args(xe, args, &bind_ops);
+	err = vm_bind_ioctl_check_args(xe, xef, args, &bind_ops);
 	if (err)
 		return err;
 
@@ -3073,11 +3158,17 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 obj_offset = bind_ops[i].obj_offset;
 		u32 prefetch_region = bind_ops[i].prefetch_mem_region_instance;
 		u16 pat_index = bind_ops[i].pat_index;
+		u64 extensions = bind_ops[i].extensions;
 
 		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
 						  addr, range, op, flags,
 						  prefetch_region, pat_index);
-		if (IS_ERR(ops[i])) {
+		if (!IS_ERR(ops[i]) && extensions) {
+			err = vm_bind_op_user_extensions(xe, xef, ops[i],
+							 op, extensions, 0);
+			if (err)
+				goto unwind_ops;
+		} else if (IS_ERR(ops[i])) {
 			err = PTR_ERR(ops[i]);
 			ops[i] = NULL;
 			goto unwind_ops;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 557b047ebdd7..1c5776194e54 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -70,6 +70,14 @@ struct xe_userptr {
 #endif
 };
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+struct xe_eudebug_vma_metadata {
+	struct list_head list;
+};
+#else
+struct xe_eudebug_vma_metadata { };
+#endif
+
 struct xe_vma {
 	/** @gpuva: Base GPUVA object */
 	struct drm_gpuva gpuva;
@@ -121,6 +129,11 @@ struct xe_vma {
 	 * Needs to be signalled before UNMAP can be processed.
 	 */
 	struct xe_user_fence *ufence;
+
+	struct {
+		/** @metadata: List of vma debug metadata */
+		struct xe_eudebug_vma_metadata metadata;
+	} eudebug;
 };
 
 /**
@@ -311,6 +324,10 @@ struct xe_vma_op_map {
 	bool dumpable;
 	/** @pat_index: The pat index to use for this operation. */
 	u16 pat_index;
+	struct {
+		/** @metadata: List of vma debug metadata */
+		struct xe_eudebug_vma_metadata metadata;
+	} eudebug;
 };
 
 /** struct xe_vma_op_remap - VMA remap operation */
@@ -388,4 +405,14 @@ struct xe_vma_ops {
 #endif
 };
 
+struct xe_vma_debug_metadata {
+	/** @metadata_id: id of the attached xe_debug_metadata */
+	u32 metadata_id;
+	/** @cookie: user defined cookie */
+	u64 cookie;
+
+	/** @link: list of metadata attached to vma */
+	struct list_head link;
+};
+
 #endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 1a452a8d2a2a..176c348c3fdd 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -888,6 +888,23 @@ struct drm_xe_vm_destroy {
 	__u64 reserved[2];
 };
 
+struct drm_xe_vm_bind_op_ext_attach_debug {
+	/** @base: base user extension */
+	struct drm_xe_user_extension base;
+
+	/** @metadata_id: Debug metadata handle (from metadata create) to attach */
+	__u64 metadata_id;
+
+	/** @flags: Flags */
+	__u64 flags;
+
+	/** @cookie: Cookie */
+	__u64 cookie;
+
+	/** @reserved: Reserved */
+	__u64 reserved;
+};
+
 /**
  * struct drm_xe_vm_bind_op - run bind operations
  *
@@ -912,7 +929,8 @@ struct drm_xe_vm_destroy {
  *    handle MBZ, and the BO offset MBZ. This flag is intended to
  *    implement VK sparse bindings.
  */
 struct drm_xe_vm_bind_op {
+#define XE_VM_BIND_OP_EXTENSIONS_ATTACH_DEBUG 0
 	/** @extensions: Pointer to the first extension struct, if any */
 	__u64 extensions;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 17/26] drm/xe/eudebug: Add debug metadata support for xe_eudebug
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (15 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 16/26] drm/xe: Attach debug metadata to vma Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 18/26] drm/xe/eudebug: Implement vm_bind_op discovery Mika Kuoppala
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Maciej Patelczyk,
	Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

Reflect debug metadata resource creation/destroy as events passed to the
debugger. Introduce an ioctl that allows reading metadata content on
demand.

Each VMA can have multiple metadata entries attached; they are passed in
from userspace on BIND or copied on an internal remap.

On VM BIND, the Xe EU debugger reports VMA metadata attachments by
sending the corresponding VM_BIND_OP_METADATA events during the bind
ioctl.

v2:  - checkpatch (Maciej, Tilak)
     - struct alignment (Matthew)
     - Kconfig (Mika)

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_debug_metadata.c |   8 +-
 drivers/gpu/drm/xe/xe_eudebug.c        | 330 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug.h        |  21 +-
 drivers/gpu/drm/xe/xe_eudebug_types.h  |  27 +-
 drivers/gpu/drm/xe/xe_vm.c             |   2 +-
 include/uapi/drm/xe_drm_eudebug.h      |  30 +++
 6 files changed, 406 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_debug_metadata.c b/drivers/gpu/drm/xe/xe_debug_metadata.c
index b045bdd77235..172fe2b33557 100644
--- a/drivers/gpu/drm/xe/xe_debug_metadata.c
+++ b/drivers/gpu/drm/xe/xe_debug_metadata.c
@@ -9,6 +9,7 @@
 #include <uapi/drm/xe_drm.h>
 
 #include "xe_device.h"
+#include "xe_eudebug.h"
 #include "xe_macros.h"
 #include "xe_vm.h"
 
@@ -158,7 +159,7 @@ int xe_debug_metadata_create_ioctl(struct drm_device *dev,
 	if (XE_IOCTL_DBG(xe, args->extensions))
 		return -EINVAL;
 
-	if (XE_IOCTL_DBG(xe, args->type > DRM_XE_DEBUG_METADATA_PROGRAM_MODULE))
+	if (XE_IOCTL_DBG(xe, args->type >= WORK_IN_PROGRESS_DRM_XE_DEBUG_METADATA_NUM))
 		return -EINVAL;
 
 	if (XE_IOCTL_DBG(xe, !args->user_addr || !args->len))
@@ -194,8 +195,10 @@ int xe_debug_metadata_create_ioctl(struct drm_device *dev,
 	if (err)
 		goto put_mdata;
 
 	args->metadata_id = id;
 
+	xe_eudebug_debug_metadata_create(xef, mdata);
+
 	return 0;
 
 put_mdata:
@@ -221,6 +225,8 @@ int xe_debug_metadata_destroy_ioctl(struct drm_device *dev,
 	if (XE_IOCTL_DBG(xe, !mdata))
 		return -ENOENT;
 
+	xe_eudebug_debug_metadata_destroy(xef, mdata);
+
 	xe_debug_metadata_put(mdata);
 
 	return 0;
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index e9092ed0b344..2514b880d871 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -20,15 +20,17 @@
 
 #include "xe_assert.h"
 #include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_debug_metadata.h"
 #include "xe_eudebug.h"
 #include "xe_eudebug_types.h"
 #include "xe_exec_queue.h"
+#include "xe_exec_queue_types.h"
 #include "xe_force_wake.h"
 #include "xe_gt.h"
 #include "xe_gt_debug.h"
 #include "xe_gt_mcr.h"
+#include "xe_guc_exec_queue_types.h"
 #include "xe_hw_engine.h"
 #include "xe_lrc.h"
 #include "xe_macros.h"
@@ -908,7 +911,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -943,7 +946,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -1227,6 +1230,90 @@ static long xe_eudebug_eu_control(struct xe_eudebug *d, const u64 arg)
 	return ret;
 }
 
+static struct xe_debug_metadata *find_metadata_get(struct xe_eudebug *d,
+						   u32 id)
+{
+	struct xe_debug_metadata *m;
+
+	mutex_lock(&d->res->lock);
+	m = find_resource__unlocked(d->res, XE_EUDEBUG_RES_TYPE_METADATA, id);
+	if (m)
+		kref_get(&m->refcount);
+	mutex_unlock(&d->res->lock);
+
+	return m;
+}
+
+static long xe_eudebug_read_metadata(struct xe_eudebug *d,
+				     unsigned int cmd,
+				     const u64 arg)
+{
+	struct drm_xe_eudebug_read_metadata user_arg;
+	struct xe_debug_metadata *mdata;
+	struct xe_file *xef;
+	struct xe_device *xe = d->xe;
+	long ret = 0;
+
+	if (XE_IOCTL_DBG(xe, !(_IOC_DIR(cmd) & _IOC_WRITE)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !(_IOC_DIR(cmd) & _IOC_READ)))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, _IOC_SIZE(cmd) < sizeof(user_arg)))
+		return -EINVAL;
+
+	if (copy_from_user(&user_arg, u64_to_user_ptr(arg), sizeof(user_arg)))
+		return -EFAULT;
+
+	if (XE_IOCTL_DBG(xe, user_arg.flags))
+		return -EINVAL;
+
+	if (!access_ok(u64_to_user_ptr(user_arg.ptr), user_arg.size))
+		return -EFAULT;
+
+	if (xe_eudebug_detached(d))
+		return -ENOTCONN;
+
+	eu_dbg(d,
+	       "read metadata: client_handle=%llu, metadata_handle=%llu, flags=0x%x",
+	       user_arg.client_handle, user_arg.metadata_handle, user_arg.flags);
+
+	xef = find_client_get(d, user_arg.client_handle);
+	if (XE_IOCTL_DBG(xe, !xef))
+		return -EINVAL;
+
+	mdata = find_metadata_get(d, (u32)user_arg.metadata_handle);
+	if (XE_IOCTL_DBG(xe, !mdata)) {
+		xe_file_put(xef);
+		return -EINVAL;
+	}
+
+	if (user_arg.size) {
+		if (user_arg.size < mdata->len) {
+			ret = -EINVAL;
+			goto metadata_put;
+		}
+
+		/* This limits us to a maximum payload size of 2G */
+		if (copy_to_user(u64_to_user_ptr(user_arg.ptr),
+				 mdata->ptr, mdata->len)) {
+			ret = -EFAULT;
+			goto metadata_put;
+		}
+	}
+
+	user_arg.size = mdata->len;
+
+	if (copy_to_user(u64_to_user_ptr(arg), &user_arg, sizeof(user_arg)))
+		ret = -EFAULT;
+
+metadata_put:
+	xe_debug_metadata_put(mdata);
+	xe_file_put(xef);
+	return ret;
+}
+
 static long xe_eudebug_vm_open_ioctl(struct xe_eudebug *d, unsigned long arg);
 
 static long xe_eudebug_ioctl(struct file *file,
@@ -1257,7 +1344,11 @@ static long xe_eudebug_ioctl(struct file *file,
 		ret = xe_eudebug_vm_open_ioctl(d, arg);
 		eu_dbg(d, "ioctl cmd=VM_OPEN ret=%ld\n", ret);
 		break;
 
+	case DRM_XE_EUDEBUG_IOCTL_READ_METADATA:
+		ret = xe_eudebug_read_metadata(d, cmd, arg);
+		eu_dbg(d, "ioctl cmd=READ_METADATA ret=%ld\n", ret);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -2649,19 +2739,145 @@ static int vm_bind_op_event(struct xe_eudebug *d,
 	return xe_eudebug_queue_bind_event(d, vm, event);
 }
 
+static int vm_bind_op_metadata_event(struct xe_eudebug *d,
+				     struct xe_vm *vm,
+				     u32 flags,
+				     u64 ref_seqno,
+				     u64 metadata_handle,
+				     u64 metadata_cookie)
+{
+	struct xe_eudebug_event_vm_bind_op_metadata *e;
+	struct xe_eudebug_event *event;
+	const u32 sz = sizeof(*e);
+	u64 seqno;
+
+	seqno = atomic_long_inc_return(&d->events.seqno);
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA,
+					seqno, flags, sz);
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_vm_bind_op_metadata, e,
+		     vm_bind_op_ref_seqno, ref_seqno);
+	write_member(struct drm_xe_eudebug_event_vm_bind_op_metadata, e,
+		     metadata_handle, metadata_handle);
+	write_member(struct drm_xe_eudebug_event_vm_bind_op_metadata, e,
+		     metadata_cookie, metadata_cookie);
+
+	/* If in discovery, no need to collect ops */
+	if (!completion_done(&d->discovery))
+		return xe_eudebug_queue_event(d, event);
+
+	return xe_eudebug_queue_bind_event(d, vm, event);
+}
+
+static int vm_bind_op_metadata_count(struct xe_eudebug *d,
+				     struct xe_vm *vm,
+				     struct list_head *debug_metadata)
+{
+	struct xe_vma_debug_metadata *metadata;
+	struct xe_debug_metadata *mdata;
+	int h_m = 0, metadata_count = 0;
+
+	if (!debug_metadata)
+		return 0;
+
+	list_for_each_entry(metadata, debug_metadata, link) {
+		mdata = xe_debug_metadata_get(vm->xef, metadata->metadata_id);
+		if (mdata) {
+			h_m = find_handle(d->res, XE_EUDEBUG_RES_TYPE_METADATA, mdata);
+			xe_debug_metadata_put(mdata);
+		}
+
+		if (!mdata || h_m < 0) {
+			if (!mdata) {
+				eu_err(d, "Metadata::%u not found.",
+				       metadata->metadata_id);
+			} else {
+				eu_err(d, "Metadata::%u not in the xe debugger",
+				       metadata->metadata_id);
+			}
+			xe_eudebug_disconnect(d, -ENOENT);
+			return -ENOENT;
+		}
+		metadata_count++;
+	}
+	return metadata_count;
+}
+
+static int vm_bind_op_metadata(struct xe_eudebug *d, struct xe_vm *vm,
+			       const u32 flags,
+			       const u64 op_ref_seqno,
+			       struct list_head *debug_metadata)
+{
+	struct xe_vma_debug_metadata *metadata;
+	int h_m = 0; /* handle space range = <1, MAX_INT>, return 0 if metadata not attached */
+	int metadata_count = 0;
+	int ret;
+
+	if (!debug_metadata)
+		return 0;
+
+	XE_WARN_ON(flags != DRM_XE_EUDEBUG_EVENT_CREATE);
+
+	list_for_each_entry(metadata, debug_metadata, link) {
+		struct xe_debug_metadata *mdata;
+
+		mdata = xe_debug_metadata_get(vm->xef, metadata->metadata_id);
+		if (mdata) {
+			h_m = find_handle(d->res, XE_EUDEBUG_RES_TYPE_METADATA, mdata);
+			xe_debug_metadata_put(mdata);
+		}
+
+		if (!mdata || h_m < 0) {
+			eu_err(d, "Attached debug metadata::%u not found!\n",
+			       metadata->metadata_id);
+			return -ENOENT;
+		}
+
+		ret = vm_bind_op_metadata_event(d, vm, flags, op_ref_seqno,
+						h_m, metadata->cookie);
+		if (ret < 0)
+			return ret;
+
+		metadata_count++;
+	}
+
+	return metadata_count;
+}
+
 static int vm_bind_op(struct xe_eudebug *d, struct xe_vm *vm,
 		      const u32 flags, const u64 bind_ref_seqno,
-		      u64 addr, u64 range)
+		      u64 addr, u64 range,
+		      struct list_head *debug_metadata)
 {
 	u64 op_seqno = 0;
-	u64 num_extensions = 0;
+	u64 num_extensions;
 	int ret;
 
+	ret = vm_bind_op_metadata_count(d, vm, debug_metadata);
+	if (ret < 0)
+		return ret;
+
+	num_extensions = ret;
+
 	ret = vm_bind_op_event(d, vm, flags, bind_ref_seqno, num_extensions,
 			       addr, range, &op_seqno);
 	if (ret)
 		return ret;
 
+	ret = vm_bind_op_metadata(d, vm, flags, op_seqno, debug_metadata);
+	if (ret < 0)
+		return ret;
+
+	if (ret != num_extensions) {
+		eu_err(d, "Inconsistency in metadata detected.");
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -2774,9 +2990,11 @@ void xe_eudebug_vm_bind_start(struct xe_vm *vm)
 	xe_eudebug_put(d);
 }
 
-void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range)
+void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range,
+			       struct drm_gpuva_ops *ops)
 {
 	struct xe_eudebug *d;
+	struct list_head *debug_metadata = NULL;
 	u32 flags;
 
 	if (!xe_vm_in_lr_mode(vm))
@@ -2786,7 +3004,17 @@ void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range)
 	case DRM_XE_VM_BIND_OP_MAP:
 	case DRM_XE_VM_BIND_OP_MAP_USERPTR:
 	{
+		struct drm_gpuva_op *__op;
+
 		flags = DRM_XE_EUDEBUG_EVENT_CREATE;
+
+		/* An OP_MAP, if present, is the last op and there is only one */
+		drm_gpuva_for_each_op(__op, ops) {
+			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
+
+			if (op->base.op == DRM_GPUVA_OP_MAP)
+				debug_metadata = &op->map.vma->eudebug.metadata.list;
+		}
 		break;
 	}
 	case DRM_XE_VM_BIND_OP_UNMAP:
@@ -2805,7 +3033,8 @@ void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range)
 	if (!d)
 		return;
 
-	xe_eudebug_event_put(d, vm_bind_op(d, vm, flags, 0, addr, range));
+	xe_eudebug_event_put(d, vm_bind_op(d, vm, flags, 0, addr, range,
+					   debug_metadata));
 }
 
 static struct xe_eudebug_event *fetch_bind_event(struct xe_vm * const vm)
@@ -2934,8 +3163,89 @@ int xe_eudebug_vm_bind_ufence(struct xe_user_fence *ufence)
 	return err;
 }
 
+static int send_debug_metadata_event(struct xe_eudebug *d, u32 flags,
+				     u64 client_handle, u64 metadata_handle,
+				     u64 type, u64 len, u64 seqno)
+{
+	struct xe_eudebug_event *event;
+	struct xe_eudebug_event_metadata *e;
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_METADATA, seqno,
+					flags, sizeof(*e));
+	if (!event)
+		return -ENOMEM;
+
+	e = cast_event(e, event);
+
+	write_member(struct drm_xe_eudebug_event_metadata, e, client_handle, client_handle);
+	write_member(struct drm_xe_eudebug_event_metadata, e, metadata_handle, metadata_handle);
+	write_member(struct drm_xe_eudebug_event_metadata, e, type, type);
+	write_member(struct drm_xe_eudebug_event_metadata, e, len, len);
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+static int debug_metadata_create_event(struct xe_eudebug *d,
+				       struct xe_file *xef, struct xe_debug_metadata *m)
+{
+	int h_c, h_m;
+	u64 seqno;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_m = xe_eudebug_add_handle(d, XE_EUDEBUG_RES_TYPE_METADATA, m, &seqno);
+	if (h_m <= 0)
+		return h_m;
+
+	return send_debug_metadata_event(d, DRM_XE_EUDEBUG_EVENT_CREATE,
+					 h_c, h_m, m->type, m->len, seqno);
+}
+
+static int debug_metadata_destroy_event(struct xe_eudebug *d,
+					struct xe_file *xef, struct xe_debug_metadata *m)
+{
+	int h_c, h_m;
+	u64 seqno;
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_m = xe_eudebug_remove_handle(d, XE_EUDEBUG_RES_TYPE_METADATA, m, &seqno);
+	if (h_m < 0)
+		return h_m;
+
+	return send_debug_metadata_event(d, DRM_XE_EUDEBUG_EVENT_DESTROY,
+					 h_c, h_m, m->type, m->len, seqno);
+}
+
+void xe_eudebug_debug_metadata_create(struct xe_file *xef, struct xe_debug_metadata *m)
+{
+	struct xe_eudebug *d;
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, debug_metadata_create_event(d, xef, m));
+}
+
+void xe_eudebug_debug_metadata_destroy(struct xe_file *xef, struct xe_debug_metadata *m)
+{
+	struct xe_eudebug *d;
+
+	d = xe_eudebug_get(xef);
+	if (!d)
+		return;
+
+	xe_eudebug_event_put(d, debug_metadata_destroy_event(d, xef, m));
+}
+
 static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 {
+	struct xe_debug_metadata *m;
 	struct xe_exec_queue *q;
 	struct xe_vm *vm;
 	unsigned long i;
@@ -2945,6 +3255,12 @@ static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 	if (err)
 		return err;
 
+	xa_for_each(&xef->eudebug.metadata.xa, i, m) {
+		err = debug_metadata_create_event(d, xef, m);
+		if (err)
+			break;
+	}
+
 	xa_for_each(&xef->vm.xa, i, vm) {
 		err = vm_create_event(d, xef, vm);
 		if (err)
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index 13ba0167b31b..572493d341ff 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -16,6 +16,8 @@ struct xe_vma;
 struct xe_exec_queue;
 struct xe_hw_engine;
 struct xe_user_fence;
+struct xe_debug_metadata;
+struct drm_gpuva_ops;
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
 
@@ -39,7 +41,8 @@ void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_exec_queue *q)
 
 void xe_eudebug_vm_init(struct xe_vm *vm);
 void xe_eudebug_vm_bind_start(struct xe_vm *vm);
-void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range);
+void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range,
+			       struct drm_gpuva_ops *ops);
 void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int err);
 
 int xe_eudebug_vm_bind_ufence(struct xe_user_fence *ufence);
@@ -49,6 +52,9 @@ void xe_eudebug_ufence_fini(struct xe_user_fence *ufence);
 struct xe_eudebug *xe_eudebug_get(struct xe_file *xef);
 void xe_eudebug_put(struct xe_eudebug *d);
 
+void xe_eudebug_debug_metadata_create(struct xe_file *xef, struct xe_debug_metadata *m);
+void xe_eudebug_debug_metadata_destroy(struct xe_file *xef, struct xe_debug_metadata *m);
+
 #else
 
 static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
@@ -71,7 +77,8 @@ static inline void xe_eudebug_exec_queue_destroy(struct xe_file *xef, struct xe_
 
 static inline void xe_eudebug_vm_init(struct xe_vm *vm) { }
 static inline void xe_eudebug_vm_bind_start(struct xe_vm *vm) { }
-static inline void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range) { }
+static inline void xe_eudebug_vm_bind_op_add(struct xe_vm *vm, u32 op, u64 addr, u64 range,
+					     struct drm_gpuva_ops *ops) { }
 static inline void xe_eudebug_vm_bind_end(struct xe_vm *vm, bool has_ufence, int err) { }
 
 static inline int xe_eudebug_vm_bind_ufence(struct xe_user_fence *ufence) { return 0; }
@@ -82,6 +89,16 @@ static inline void xe_eudebug_ufence_fini(struct xe_user_fence *ufence) { }
 static inline struct xe_eudebug *xe_eudebug_get(struct xe_file *xef) { return NULL; }
 static inline void xe_eudebug_put(struct xe_eudebug *d) { }
 
+static inline void xe_eudebug_debug_metadata_create(struct xe_file *xef,
+						    struct xe_debug_metadata *m)
+{
+}
+
+static inline void xe_eudebug_debug_metadata_destroy(struct xe_file *xef,
+						     struct xe_debug_metadata *m)
+{
+}
+
 #endif /* CONFIG_DRM_XE_EUDEBUG */
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index ffb0dc71430a..a69051b04698 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -56,7 +56,8 @@ struct xe_eudebug_resource {
 #define XE_EUDEBUG_RES_TYPE_VM		1
 #define XE_EUDEBUG_RES_TYPE_EXEC_QUEUE	2
 #define XE_EUDEBUG_RES_TYPE_LRC		3
-#define XE_EUDEBUG_RES_TYPE_COUNT	(XE_EUDEBUG_RES_TYPE_LRC + 1)
+#define XE_EUDEBUG_RES_TYPE_METADATA	4
+#define XE_EUDEBUG_RES_TYPE_COUNT	(XE_EUDEBUG_RES_TYPE_METADATA + 1)
 
 /**
  * struct xe_eudebug_resources - eudebug resources for all types
@@ -326,4 +327,28 @@ struct xe_eudebug_event_vm_bind_ufence {
 	u64 vm_bind_ref_seqno;
 };
 
+struct xe_eudebug_event_metadata {
+	struct xe_eudebug_event base;
+
+	/** @client_handle: client owning the metadata */
+	u64 client_handle;
+
+	/** @metadata_handle: handle of the debug metadata created/destroyed */
+	u64 metadata_handle;
+
+	/** @type: metadata type, refer to xe_drm.h for options */
+	u64 type;
+
+	/** @len: size of the metadata payload */
+	u64 len;
+};
+
+struct xe_eudebug_event_vm_bind_op_metadata {
+	struct xe_eudebug_event base;
+	u64 vm_bind_op_ref_seqno;
+
+	u64 metadata_handle;
+	u64 metadata_cookie;
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 19c0b36c10b1..474521d0fea9 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3178,7 +3178,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		if (err)
 			goto unwind_ops;
 
-		xe_eudebug_vm_bind_op_add(vm, op, addr, range);
+		xe_eudebug_vm_bind_op_add(vm, op, addr, range, ops[i]);
 
 #ifdef TEST_VM_OPS_ERROR
 		if (flags & FORCE_OP_ERROR) {
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index a5f13563b3b9..3c4d1b511acd 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -19,6 +19,7 @@ extern "C" {
 #define DRM_XE_EUDEBUG_IOCTL_EU_CONTROL		_IOWR('j', 0x2, struct drm_xe_eudebug_eu_control)
 #define DRM_XE_EUDEBUG_IOCTL_ACK_EVENT		_IOW('j', 0x4, struct drm_xe_eudebug_ack_event)
 #define DRM_XE_EUDEBUG_IOCTL_VM_OPEN		_IOW('j', 0x1, struct drm_xe_eudebug_vm_open)
+#define DRM_XE_EUDEBUG_IOCTL_READ_METADATA	_IOWR('j', 0x3, struct drm_xe_eudebug_read_metadata)
 
 /* XXX: Document events to match their internal counterparts when moved to xe_drm.h */
 struct drm_xe_eudebug_event {
@@ -35,6 +36,8 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND		7
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP		8
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE	9
+#define DRM_XE_EUDEBUG_EVENT_METADATA		10
+#define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA 11
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
@@ -206,6 +209,33 @@ struct drm_xe_eudebug_vm_open {
 	__u64 timeout_ns;
 };
 
+struct drm_xe_eudebug_read_metadata {
+	__u64 client_handle;
+	__u64 metadata_handle;
+	__u32 flags;
+	__u32 reserved;
+	__u64 ptr;
+	__u64 size;
+};
+
+struct drm_xe_eudebug_event_metadata {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 metadata_handle;
+	/* XXX: Refer to xe_drm.h for fields */
+	__u64 type;
+	__u64 len;
+};
+
+struct drm_xe_eudebug_event_vm_bind_op_metadata {
+	struct drm_xe_eudebug_event base;
+	__u64 vm_bind_op_ref_seqno; /* *_event_vm_bind_op.base.seqno */
+
+	__u64 metadata_handle;
+	__u64 metadata_cookie;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0
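For orientation, a debugger client consumes the new METADATA create event and then pulls the payload with DRM_XE_EUDEBUG_IOCTL_READ_METADATA. A minimal userspace sketch of that flow follows; the mirrored struct layouts are copied from the header above, but the helper name, field defaults and error handling are illustrative assumptions, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the uapi structs added above (layout copied from the
 * patch); a real client would include xe_drm_eudebug.h instead.
 */
struct ex_event_metadata {
	uint64_t client_handle;
	uint64_t metadata_handle;
	uint64_t type;
	uint64_t len;
};

struct ex_read_metadata {
	uint64_t client_handle;
	uint64_t metadata_handle;
	uint32_t flags;
	uint32_t reserved;
	uint64_t ptr;
	uint64_t size;
};

/*
 * Size a buffer from the create event and fill the READ_METADATA ioctl
 * argument. Returns the buffer (caller frees), or NULL on allocation
 * failure. The ioctl call itself is only sketched in the comment below.
 */
static void *ex_prepare_read_metadata(const struct ex_event_metadata *ev,
				      struct ex_read_metadata *rm)
{
	void *buf = malloc(ev->len);

	if (!buf)
		return NULL;

	rm->client_handle = ev->client_handle;
	rm->metadata_handle = ev->metadata_handle;
	rm->flags = 0;
	rm->reserved = 0;
	rm->ptr = (uintptr_t)buf;
	rm->size = ev->len;

	/* ioctl(debugger_fd, DRM_XE_EUDEBUG_IOCTL_READ_METADATA, rm); */
	return buf;
}
```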



* [PATCH 18/26] drm/xe/eudebug: Implement vm_bind_op discovery
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (16 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 17/26] drm/xe/eudebug: Add debug metadata support for xe_eudebug Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 19/26] drm/xe/eudebug: Dynamically toggle debugger functionality Mika Kuoppala
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Mika Kuoppala, Matthew Brost

Follow the vm_bind / vm_bind_op event sequence when
discovering a vm and the vmas it contains.
Send events for the ops and attach metadata if available.

v2: - Fix bad op ref seqno (Christoph)
    - With the discovery semaphore, we don't need the vm lock (Matthew)

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c | 45 +++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 2514b880d871..e17b8f98c7b6 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3243,6 +3243,47 @@ void xe_eudebug_debug_metadata_destroy(struct xe_file *xef, struct xe_debug_meta
 	xe_eudebug_event_put(d, debug_metadata_destroy_event(d, xef, m));
 }
 
+static int vm_discover_binds(struct xe_eudebug *d, struct xe_vm *vm)
+{
+	struct drm_gpuva *va;
+	unsigned int num_ops = 0, send_ops = 0;
+	u64 ref_seqno = 0;
+	int err;
+
+	/*
+	 * Currently only vm_bind_ioctl inserts vmas, and with
+	 * the discovery lock held we have exclusivity.
+	 */
+	lockdep_assert_held_write(&d->xe->eudebug.discovery_lock);
+
+	drm_gpuvm_for_each_va(va, &vm->gpuvm)
+		num_ops++;
+
+	if (!num_ops)
+		return 0;
+
+	err = vm_bind_event(d, vm, num_ops, &ref_seqno);
+	if (err)
+		return err;
+
+	drm_gpuvm_for_each_va(va, &vm->gpuvm) {
+		struct xe_vma *vma = container_of(va, struct xe_vma, gpuva);
+
+		if (send_ops >= num_ops)
+			break;
+
+		err = vm_bind_op(d, vm, DRM_XE_EUDEBUG_EVENT_CREATE, ref_seqno,
+				 xe_vma_start(vma), xe_vma_size(vma),
+				 &vma->eudebug.metadata.list);
+		if (err)
+			return err;
+
+		send_ops++;
+	}
+
+	return num_ops == send_ops ? 0 : -EINVAL;
+}
+
 static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 {
 	struct xe_debug_metadata *m;
@@ -3265,6 +3306,10 @@ static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
 		err = vm_create_event(d, xef, vm);
 		if (err)
 			break;
+
+		err = vm_discover_binds(d, vm);
+		if (err)
+			break;
 	}
 
 	xa_for_each(&xef->exec_queue.xa, i, q) {
-- 
2.43.0



* [PATCH 19/26] drm/xe/eudebug: Dynamically toggle debugger functionality
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (17 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 18/26] drm/xe/eudebug: Implement vm_bind_op discovery Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 20/26] drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test Mika Kuoppala
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Christoph Manszewski,
	Dominik Grzegorzek, Maciej Patelczyk, Mika Kuoppala

From: Christoph Manszewski <christoph.manszewski@intel.com>

Make it possible to dynamically enable/disable debugger functionality,
including the setting and unsetting of required hw register values via a
sysfs entry located at '/sys/class/drm/card<X>/device/enable_eudebug'.

This entry uses 'kstrtobool' and as such it accepts inputs as documented
by this function, in particular '0' and '1'.
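As a usage sketch, enabling the debugger from userspace is a single write to that attribute. The helper below is illustrative only (helper name, card path and error handling are assumptions); taking the path as a parameter keeps it independent of the card index:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" or "0" to the enable_eudebug attribute, e.g.
 * "/sys/class/drm/card0/device/enable_eudebug".
 * Returns 0 on success, -1 on error (the kernel side may refuse with
 * -EBUSY while a debugger is still connected).
 */
static int ex_set_eudebug(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ret = -1;

	if (!f)
		return -1;

	if (fprintf(f, "%d\n", enable ? 1 : 0) > 0)
		ret = 0;

	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```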

v2: use new discovery_lock to gain exclusivity (Mika)
v3: remove init_late and init_hw_engine (Dominik)

Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_device.c       |   2 -
 drivers/gpu/drm/xe/xe_device_types.h |   3 +
 drivers/gpu/drm/xe/xe_eudebug.c      | 128 +++++++++++++++++++++++----
 drivers/gpu/drm/xe/xe_eudebug.h      |   4 -
 drivers/gpu/drm/xe/xe_exec_queue.c   |   5 ++
 drivers/gpu/drm/xe/xe_hw_engine.c    |   1 -
 drivers/gpu/drm/xe/xe_reg_sr.c       |  21 +++--
 drivers/gpu/drm/xe/xe_reg_sr.h       |   4 +-
 drivers/gpu/drm/xe/xe_rtp.c          |   2 +-
 9 files changed, 137 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index a7a715475184..3045f2a2ca1d 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -782,8 +782,6 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_debugfs_register(xe);
 
-	xe_eudebug_init_late(xe);
-
 	xe_hwmon_register(xe);
 
 	for_each_gt(gt, xe, id)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 4ab9f06eba2d..f081af5e729d 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -557,6 +557,9 @@ struct xe_device {
 		/** discovery_lock: used for discovery to block xe ioctls */
 		struct rw_semaphore discovery_lock;
 
+		/** @enable: is the debugging functionality enabled */
+		bool enable;
+
 		/** @attention_scan: attention scan worker */
 		struct delayed_work attention_scan;
 	} eudebug;
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index e17b8f98c7b6..fe947d5350d8 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -2028,9 +2028,6 @@ xe_eudebug_connect(struct xe_device *xe,
 
 	param->version = DRM_XE_EUDEBUG_VERSION;
 
-	if (!xe->eudebug.available)
-		return -EOPNOTSUPP;
-
 	d = kzalloc(sizeof(*d), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
@@ -2090,28 +2087,30 @@ int xe_eudebug_connect_ioctl(struct drm_device *dev,
 {
 	struct xe_device *xe = to_xe_device(dev);
 	struct drm_xe_eudebug_connect * const param = data;
-	int ret = 0;
 
-	ret = xe_eudebug_connect(xe, param);
+	lockdep_assert_held(&xe->eudebug.discovery_lock);
 
-	return ret;
+	if (!xe->eudebug.enable)
+		return -ENODEV;
+
+	return xe_eudebug_connect(xe, param);
 }
 
 static void add_sr_entry(struct xe_hw_engine *hwe,
 			 struct xe_reg_mcr mcr_reg,
-			 u32 mask)
+			 u32 mask, bool enable)
 {
 	const struct xe_reg_sr_entry sr_entry = {
 		.reg = mcr_reg.__reg,
 		.clr_bits = mask,
-		.set_bits = mask,
+		.set_bits = enable ? mask : 0,
 		.read_mask = mask,
 	};
 
-	xe_reg_sr_add(&hwe->reg_sr, &sr_entry, hwe->gt);
+	xe_reg_sr_add(&hwe->reg_sr, &sr_entry, hwe->gt, true);
 }
 
-void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe)
+static void xe_eudebug_reinit_hw_engine(struct xe_hw_engine *hwe, bool enable)
 {
 	struct xe_gt *gt = hwe->gt;
 	struct xe_device *xe = gt_to_xe(gt);
@@ -2123,23 +2122,113 @@ void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe)
 		return;
 
 	if (XE_WA(gt, 18022722726))
-		add_sr_entry(hwe, ROW_CHICKEN, STALL_DOP_GATING_DISABLE);
+		add_sr_entry(hwe, ROW_CHICKEN,
+			     STALL_DOP_GATING_DISABLE, enable);
 
 	if (XE_WA(gt, 14015474168))
-		add_sr_entry(hwe, ROW_CHICKEN2, XEHPC_DISABLE_BTB);
+		add_sr_entry(hwe, ROW_CHICKEN2,
+			     XEHPC_DISABLE_BTB,
+			     enable);
 
 	if (xe->info.graphics_verx100 >= 1200)
 		add_sr_entry(hwe, TD_CTL,
 			     TD_CTL_BREAKPOINT_ENABLE |
 			     TD_CTL_FORCE_THREAD_BREAKPOINT_ENABLE |
-			     TD_CTL_FEH_AND_FEE_ENABLE);
+			     TD_CTL_FEH_AND_FEE_ENABLE,
+			     enable);
 
 	if (xe->info.graphics_verx100 >= 1250)
-		add_sr_entry(hwe, TD_CTL, TD_CTL_GLOBAL_DEBUG_ENABLE);
+		add_sr_entry(hwe, TD_CTL,
+			     TD_CTL_GLOBAL_DEBUG_ENABLE, enable);
+}
+
+static int xe_eudebug_enable(struct xe_device *xe, bool enable)
+{
+	struct xe_gt *gt;
+	int i;
+	u8 id;
+
+	if (!xe->eudebug.available)
+		return -EOPNOTSUPP;
+
+	/*
+	 * The connect ioctl holds the read lock, so taking
+	 * the write lock serializes against it.
+	 */
+	down_write(&xe->eudebug.discovery_lock);
+
+	if (!enable && !list_empty(&xe->eudebug.list)) {
+		up_write(&xe->eudebug.discovery_lock);
+		return -EBUSY;
+	}
+
+	if (enable == xe->eudebug.enable) {
+		up_write(&xe->eudebug.discovery_lock);
+		return 0;
+	}
+
+	for_each_gt(gt, xe, id) {
+		for (i = 0; i < ARRAY_SIZE(gt->hw_engines); i++) {
+			if (!(gt->info.engine_mask & BIT(i)))
+				continue;
+
+			xe_eudebug_reinit_hw_engine(&gt->hw_engines[i], enable);
+		}
+
+		xe_gt_reset_async(gt);
+		flush_work(&gt->reset.worker);
+	}
+
+	xe->eudebug.enable = enable;
+	up_write(&xe->eudebug.discovery_lock);
+
+	if (enable)
+		attention_scan_flush(xe);
+	else
+		attention_scan_cancel(xe);
+
+	return 0;
+}
+
+static ssize_t enable_eudebug_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct xe_device *xe = pdev_to_xe_device(to_pci_dev(dev));
+
+	return sysfs_emit(buf, "%u\n", xe->eudebug.enable);
+}
+
+static ssize_t enable_eudebug_store(struct device *dev, struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct xe_device *xe = pdev_to_xe_device(to_pci_dev(dev));
+	bool enable;
+	int ret;
+
+	ret = kstrtobool(buf, &enable);
+	if (ret)
+		return ret;
+
+	ret = xe_eudebug_enable(xe, enable);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(enable_eudebug);
+
+static void xe_eudebug_sysfs_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+
+	sysfs_remove_file(&xe->drm.dev->kobj, &dev_attr_enable_eudebug.attr);
 }
 
 void xe_eudebug_init(struct xe_device *xe)
 {
+	struct device *dev = xe->drm.dev;
+	int ret;
+
 	spin_lock_init(&xe->eudebug.lock);
 	INIT_LIST_HEAD(&xe->eudebug.list);
 
@@ -2150,14 +2239,17 @@ void xe_eudebug_init(struct xe_device *xe)
 
 	xe->eudebug.ordered_wq = alloc_ordered_workqueue("xe-eudebug-ordered-wq", 0);
 	xe->eudebug.available = !!xe->eudebug.ordered_wq;
-}
 
-void xe_eudebug_init_late(struct xe_device *xe)
-{
 	if (!xe->eudebug.available)
 		return;
 
-	attention_scan_flush(xe);
+	ret = sysfs_create_file(&xe->drm.dev->kobj, &dev_attr_enable_eudebug.attr);
+	if (ret)
+		drm_warn(&xe->drm, "eudebug sysfs init failed: %d, debugger unavailable\n", ret);
+	else
+		devm_add_action_or_reset(dev, xe_eudebug_sysfs_fini, xe);
+
+	xe->eudebug.available = ret == 0;
 }
 
 void xe_eudebug_fini(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index 572493d341ff..a08abf796cc1 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -26,9 +26,7 @@ int xe_eudebug_connect_ioctl(struct drm_device *dev,
 			     struct drm_file *file);
 
 void xe_eudebug_init(struct xe_device *xe);
-void xe_eudebug_init_late(struct xe_device *xe);
 void xe_eudebug_fini(struct xe_device *xe);
-void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe);
 
 void xe_eudebug_file_open(struct xe_file *xef);
 void xe_eudebug_file_close(struct xe_file *xef);
@@ -62,9 +60,7 @@ static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
 					   struct drm_file *file) { return 0; }
 
 static inline void xe_eudebug_init(struct xe_device *xe) { }
-static inline void xe_eudebug_init_late(struct xe_device *xe) { }
 static inline void xe_eudebug_fini(struct xe_device *xe) { }
-static inline void xe_eudebug_init_hw_engine(struct xe_hw_engine *hwe) { }
 
 static inline void xe_eudebug_file_open(struct xe_file *xef) { }
 static inline void xe_eudebug_file_close(struct xe_file *xef) { }
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index cca46a32723e..044a0f2e1873 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -432,6 +432,11 @@ static int exec_queue_set_eudebug(struct xe_device *xe, struct xe_exec_queue *q,
 			 !(value & DRM_XE_EXEC_QUEUE_EUDEBUG_FLAG_ENABLE)))
 		return -EINVAL;
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	if (XE_IOCTL_DBG(xe, !xe->eudebug.enable))
+		return -EPERM;
+#endif
+
 	q->eudebug_flags = EXEC_QUEUE_EUDEBUG_FLAG_ENABLE;
 	q->sched_props.preempt_timeout_us = 0;
 
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 8a188ddc99f4..c734aae88a57 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -559,7 +559,6 @@ static void hw_engine_init_early(struct xe_gt *gt, struct xe_hw_engine *hwe,
 	xe_tuning_process_engine(hwe);
 	xe_wa_process_engine(hwe);
 	hw_engine_setup_default_state(hwe);
-	xe_eudebug_init_hw_engine(hwe);
 
 	xe_reg_sr_init(&hwe->reg_whitelist, hwe->name, gt_to_xe(gt));
 	xe_reg_whitelist_process_engine(hwe);
diff --git a/drivers/gpu/drm/xe/xe_reg_sr.c b/drivers/gpu/drm/xe/xe_reg_sr.c
index e1a0e27cda14..e3a539c1c08e 100644
--- a/drivers/gpu/drm/xe/xe_reg_sr.c
+++ b/drivers/gpu/drm/xe/xe_reg_sr.c
@@ -93,22 +93,31 @@ static void reg_sr_inc_error(struct xe_reg_sr *sr)
 
 int xe_reg_sr_add(struct xe_reg_sr *sr,
 		  const struct xe_reg_sr_entry *e,
-		  struct xe_gt *gt)
+		  struct xe_gt *gt,
+		  bool overwrite)
 {
 	unsigned long idx = e->reg.addr;
 	struct xe_reg_sr_entry *pentry = xa_load(&sr->xa, idx);
 	int ret;
 
 	if (pentry) {
-		if (!compatible_entries(pentry, e)) {
+		if (overwrite && e->set_bits) {
+			pentry->clr_bits |= e->clr_bits;
+			pentry->set_bits |= e->set_bits;
+			pentry->read_mask |= e->read_mask;
+		} else if (overwrite && !e->set_bits) {
+			pentry->clr_bits |= e->clr_bits;
+			pentry->set_bits &= ~e->clr_bits;
+			pentry->read_mask |= e->read_mask;
+		} else if (!compatible_entries(pentry, e)) {
 			ret = -EINVAL;
 			goto fail;
+		} else {
+			pentry->clr_bits |= e->clr_bits;
+			pentry->set_bits |= e->set_bits;
+			pentry->read_mask |= e->read_mask;
 		}
 
-		pentry->clr_bits |= e->clr_bits;
-		pentry->set_bits |= e->set_bits;
-		pentry->read_mask |= e->read_mask;
-
 		return 0;
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_reg_sr.h b/drivers/gpu/drm/xe/xe_reg_sr.h
index 51fbba423e27..d67fafdcd847 100644
--- a/drivers/gpu/drm/xe/xe_reg_sr.h
+++ b/drivers/gpu/drm/xe/xe_reg_sr.h
@@ -6,6 +6,8 @@
 #ifndef _XE_REG_SR_
 #define _XE_REG_SR_
 
+#include <linux/types.h>
+
 /*
  * Reg save/restore bookkeeping
  */
@@ -21,7 +23,7 @@ int xe_reg_sr_init(struct xe_reg_sr *sr, const char *name, struct xe_device *xe)
 void xe_reg_sr_dump(struct xe_reg_sr *sr, struct drm_printer *p);
 
 int xe_reg_sr_add(struct xe_reg_sr *sr, const struct xe_reg_sr_entry *e,
-		  struct xe_gt *gt);
+		  struct xe_gt *gt, bool overwrite);
 void xe_reg_sr_apply_mmio(struct xe_reg_sr *sr, struct xe_gt *gt);
 void xe_reg_sr_apply_whitelist(struct xe_hw_engine *hwe);
 
diff --git a/drivers/gpu/drm/xe/xe_rtp.c b/drivers/gpu/drm/xe/xe_rtp.c
index b13d4d62f0b1..6006f7c90cac 100644
--- a/drivers/gpu/drm/xe/xe_rtp.c
+++ b/drivers/gpu/drm/xe/xe_rtp.c
@@ -153,7 +153,7 @@ static void rtp_add_sr_entry(const struct xe_rtp_action *action,
 	};
 
 	sr_entry.reg.addr += mmio_base;
-	xe_reg_sr_add(sr, &sr_entry, gt);
+	xe_reg_sr_add(sr, &sr_entry, gt, false);
 }
 
 static bool rtp_process_one_sr(const struct xe_rtp_entry_sr *entry,
-- 
2.43.0
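The new overwrite mode of xe_reg_sr_add() is easiest to see in isolation: with overwrite set, an entry carrying set_bits ORs into the existing entry, while an entry with only clr_bits strips those bits from set_bits, so the same mask can arm and disarm a register field. A simplified model of that merge (stand-in struct and function names, not the kernel types):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct xe_reg_sr_entry: just the merged bit state. */
struct sr_entry_model {
	uint32_t clr_bits;
	uint32_t set_bits;
	uint32_t read_mask;
};

/* Mirrors the branches added to xe_reg_sr_add() above. */
static void sr_merge(struct sr_entry_model *p, const struct sr_entry_model *e,
		     int overwrite)
{
	if (overwrite && e->set_bits) {
		/* enable: accumulate the new bits */
		p->clr_bits |= e->clr_bits;
		p->set_bits |= e->set_bits;
	} else if (overwrite) {
		/* disable: clear exactly the bits named by clr_bits */
		p->clr_bits |= e->clr_bits;
		p->set_bits &= ~e->clr_bits;
	} else {
		/* legacy path: entries must be compatible, then accumulate */
		p->clr_bits |= e->clr_bits;
		p->set_bits |= e->set_bits;
	}
	p->read_mask |= e->read_mask;
}
```

Enabling adds the mask to set_bits; disabling passes the same mask with set_bits == 0, which removes it again, matching what add_sr_entry() does with its new 'enable' argument.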



* [PATCH 20/26] drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (18 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 19/26] drm/xe/eudebug: Dynamically toggle debugger functionality Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 21/26] drm/xe/eudebug/ptl: Add support for extra attention register Mika Kuoppala
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Christoph Manszewski, Mika Kuoppala

From: Christoph Manszewski <christoph.manszewski@intel.com>

Introduce a kunit test for eudebug. For now it checks the dynamic
application of workarounds (WAs).

v2: adapt to removal of call_for_each_device (Mika)
v3: s/FW_RENDER/FORCEWAKE_ALL (Mika)

Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/tests/xe_eudebug.c       | 176 ++++++++++++++++++++
 drivers/gpu/drm/xe/tests/xe_live_test_mod.c |   5 +
 drivers/gpu/drm/xe/xe_eudebug.c             |   4 +
 3 files changed, 185 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/tests/xe_eudebug.c

diff --git a/drivers/gpu/drm/xe/tests/xe_eudebug.c b/drivers/gpu/drm/xe/tests/xe_eudebug.c
new file mode 100644
index 000000000000..d47e4ff259cb
--- /dev/null
+++ b/drivers/gpu/drm/xe/tests/xe_eudebug.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0 AND MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <kunit/visibility.h>
+
+#include "tests/xe_kunit_helpers.h"
+#include "tests/xe_pci_test.h"
+#include "tests/xe_test.h"
+
+#undef XE_REG_MCR
+#define XE_REG_MCR(r_, ...)	((const struct xe_reg_mcr){					\
+				 .__reg = XE_REG_INITIALIZER(r_,  ##__VA_ARGS__, .mcr = 1)	\
+				 })
+
+static const char *reg_to_str(struct xe_reg reg)
+{
+	if (reg.raw == TD_CTL.__reg.raw)
+		return "TD_CTL";
+	else if (reg.raw == CS_DEBUG_MODE2(RENDER_RING_BASE).raw)
+		return "CS_DEBUG_MODE2";
+	else if (reg.raw == ROW_CHICKEN.__reg.raw)
+		return "ROW_CHICKEN";
+	else if (reg.raw == ROW_CHICKEN2.__reg.raw)
+		return "ROW_CHICKEN2";
+	else if (reg.raw == ROW_CHICKEN3.__reg.raw)
+		return "ROW_CHICKEN3";
+	else
+		return "UNKNOWN REG";
+}
+
+static u32 get_reg_mask(struct xe_device *xe, struct xe_reg reg)
+{
+	struct kunit *test = kunit_get_current_test();
+	u32 val = 0;
+
+	if (reg.raw == TD_CTL.__reg.raw) {
+		val = TD_CTL_BREAKPOINT_ENABLE |
+		      TD_CTL_FORCE_THREAD_BREAKPOINT_ENABLE |
+		      TD_CTL_FEH_AND_FEE_ENABLE;
+
+		if (GRAPHICS_VERx100(xe) >= 1250)
+			val |= TD_CTL_GLOBAL_DEBUG_ENABLE;
+
+	} else if (reg.raw == CS_DEBUG_MODE2(RENDER_RING_BASE).raw) {
+		val = GLOBAL_DEBUG_ENABLE;
+	} else if (reg.raw == ROW_CHICKEN.__reg.raw) {
+		val = STALL_DOP_GATING_DISABLE;
+	} else if (reg.raw == ROW_CHICKEN2.__reg.raw) {
+		val = XEHPC_DISABLE_BTB;
+	} else if (reg.raw == ROW_CHICKEN3.__reg.raw) {
+		val = XE2_EUPEND_CHK_FLUSH_DIS;
+	} else {
+		kunit_warn(test, "Invalid register selection: %u\n", reg.raw);
+	}
+
+	return val;
+}
+
+static u32 get_reg_expected(struct xe_device *xe, struct xe_reg reg, bool enable_eudebug)
+{
+	u32 reg_mask = get_reg_mask(xe, reg);
+	u32 reg_bits = 0;
+
+	if (enable_eudebug || reg.raw == ROW_CHICKEN3.__reg.raw)
+		reg_bits = reg_mask;
+	else
+		reg_bits = 0;
+
+	return reg_bits;
+}
+
+static void check_reg(struct xe_gt *gt, bool enable_eudebug, struct xe_reg reg)
+{
+	struct kunit *test = kunit_get_current_test();
+	struct xe_device *xe = gt_to_xe(gt);
+	u32 reg_bits_expected = get_reg_expected(xe, reg, enable_eudebug);
+	u32 reg_mask = get_reg_mask(xe, reg);
+	u32 reg_bits = 0;
+
+	if (reg.mcr)
+		reg_bits = xe_gt_mcr_unicast_read_any(gt, (struct xe_reg_mcr){.__reg = reg});
+	else
+		reg_bits = xe_mmio_read32(&gt->mmio, reg);
+
+	reg_bits &= reg_mask;
+
+	kunit_printk(KERN_DEBUG, test, "%s bits: expected == 0x%x; actual == 0x%x\n",
+		     reg_to_str(reg), reg_bits_expected, reg_bits);
+	KUNIT_EXPECT_EQ_MSG(test, reg_bits_expected, reg_bits,
+			    "Invalid bits set for %s\n", reg_to_str(reg));
+}
+
+static void __check_regs(struct xe_gt *gt, bool enable_eudebug)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+
+	if (GRAPHICS_VERx100(xe) >= 1200)
+		check_reg(gt, enable_eudebug, TD_CTL.__reg);
+
+	if (GRAPHICS_VERx100(xe) >= 1250 && GRAPHICS_VERx100(xe) <= 1274)
+		check_reg(gt, enable_eudebug, ROW_CHICKEN.__reg);
+
+	if (xe->info.platform == XE_PVC)
+		check_reg(gt, enable_eudebug, ROW_CHICKEN2.__reg);
+
+	if (GRAPHICS_VERx100(xe) >= 2000 && GRAPHICS_VERx100(xe) <= 2004)
+		check_reg(gt, enable_eudebug, ROW_CHICKEN3.__reg);
+}
+
+static void check_regs(struct xe_device *xe, bool enable_eudebug)
+{
+	struct kunit *test = kunit_get_current_test();
+	struct xe_gt *gt;
+	unsigned int fw_ref;
+	u8 id;
+
+	kunit_printk(KERN_DEBUG, test, "Check regs for eudebug %s\n",
+		     enable_eudebug ? "enabled" : "disabled");
+
+	xe_pm_runtime_get(xe);
+	for_each_gt(gt, xe, id) {
+		if (xe_gt_is_media_type(gt))
+			continue;
+
+		/* XXX: Figure out per platform proper domain */
+		fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
+		KUNIT_ASSERT_TRUE_MSG(test, fw_ref, "Forcewake failed.\n");
+
+		__check_regs(gt, enable_eudebug);
+
+		xe_force_wake_put(gt_to_fw(gt), fw_ref);
+	}
+	xe_pm_runtime_put(xe);
+}
+
+static int toggle_reg_value(struct xe_device *xe)
+{
+	struct kunit *test = kunit_get_current_test();
+	bool enable_eudebug = xe->eudebug.enable;
+
+	kunit_printk(KERN_DEBUG, test, "Test eudebug WAs for graphics version: %u\n",
+		     GRAPHICS_VERx100(xe));
+
+	check_regs(xe, enable_eudebug);
+
+	xe_eudebug_enable(xe, !enable_eudebug);
+	check_regs(xe, !enable_eudebug);
+
+	xe_eudebug_enable(xe, enable_eudebug);
+	check_regs(xe, enable_eudebug);
+
+	return 0;
+}
+
+static void xe_eudebug_toggle_reg_kunit(struct kunit *test)
+{
+	struct xe_device *xe = test->priv;
+
+	toggle_reg_value(xe);
+}
+
+static struct kunit_case xe_eudebug_tests[] = {
+	KUNIT_CASE_PARAM(xe_eudebug_toggle_reg_kunit,
+			 xe_pci_live_device_gen_param),
+	{}
+};
+
+VISIBLE_IF_KUNIT
+struct kunit_suite xe_eudebug_test_suite = {
+	.name = "xe_eudebug",
+	.test_cases = xe_eudebug_tests,
+	.init = xe_kunit_helper_xe_device_live_test_init,
+};
+EXPORT_SYMBOL_IF_KUNIT(xe_eudebug_test_suite);
diff --git a/drivers/gpu/drm/xe/tests/xe_live_test_mod.c b/drivers/gpu/drm/xe/tests/xe_live_test_mod.c
index 5f14737c8210..7dd8a0a4bdfd 100644
--- a/drivers/gpu/drm/xe/tests/xe_live_test_mod.c
+++ b/drivers/gpu/drm/xe/tests/xe_live_test_mod.c
@@ -15,6 +15,11 @@ kunit_test_suite(xe_dma_buf_test_suite);
 kunit_test_suite(xe_migrate_test_suite);
 kunit_test_suite(xe_mocs_test_suite);
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+extern struct kunit_suite xe_eudebug_test_suite;
+kunit_test_suite(xe_eudebug_test_suite);
+#endif
+
 MODULE_AUTHOR("Intel Corporation");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("xe live kunit tests");
diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index fe947d5350d8..f44cc0f8290e 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3947,3 +3947,7 @@ xe_eudebug_vm_open_ioctl(struct xe_eudebug *d, unsigned long arg)
 
 	return ret;
 }
+
+#if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
+#include "tests/xe_eudebug.c"
+#endif
-- 
2.43.0



* [PATCH 21/26] drm/xe/eudebug/ptl: Add support for extra attention register
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (19 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 20/26] drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 22/26] drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3 Mika Kuoppala
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

xe3 can set bits in an additional attention register, EU_ATT1.
Recalculate the bitmap size and make sure we read all required data.

Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c  | 4 ++--
 drivers/gpu/drm/xe/xe_gt_debug.c | 8 ++++----
 drivers/gpu/drm/xe/xe_gt_debug.h | 8 ++++++--
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index f44cc0f8290e..c259e5804386 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -1858,7 +1858,7 @@ static int check_attn_mcr(struct xe_gt *gt, void *data,
 	struct xe_eudebug *d = iter->debugger;
 	unsigned int row;
 
-	for (row = 0; row < TD_EU_ATTENTION_MAX_ROWS; row++) {
+	for (row = 0; row < xe_gt_debug_eu_att_rows(gt); row++) {
 		u32 val, cur = 0;
 
 		if (iter->i >= iter->size)
@@ -1891,7 +1891,7 @@ static int clear_attn_mcr(struct xe_gt *gt, void *data,
 	struct xe_eudebug *d = iter->debugger;
 	unsigned int row;
 
-	for (row = 0; row < TD_EU_ATTENTION_MAX_ROWS; row++) {
+	for (row = 0; row < xe_gt_debug_eu_att_rows(gt); row++) {
 		u32 val;
 
 		if (iter->i >= iter->size)
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.c b/drivers/gpu/drm/xe/xe_gt_debug.c
index f35b9df5e41b..49f24db9da9c 100644
--- a/drivers/gpu/drm/xe/xe_gt_debug.c
+++ b/drivers/gpu/drm/xe/xe_gt_debug.c
@@ -74,9 +74,9 @@ int xe_gt_eu_attention_bitmap_size(struct xe_gt *gt)
 	bitmap_or(dss_mask, gt->fuse_topo.c_dss_mask,
 		  gt->fuse_topo.g_dss_mask, XE_MAX_DSS_FUSE_BITS);
 
-	return  bitmap_weight(dss_mask, XE_MAX_DSS_FUSE_BITS) *
-		TD_EU_ATTENTION_MAX_ROWS * MAX_THREADS *
-		MAX_EUS_PER_ROW / 8;
+	return bitmap_weight(dss_mask, XE_MAX_DSS_FUSE_BITS) *
+	       xe_gt_debug_eu_att_rows(gt) * MAX_THREADS *
+	       MAX_EUS_PER_ROW / 8;
 }
 
 struct attn_read_iter {
@@ -92,7 +92,7 @@ static int read_eu_attentions_mcr(struct xe_gt *gt, void *data,
 	struct attn_read_iter * const iter = data;
 	unsigned int row;
 
-	for (row = 0; row < TD_EU_ATTENTION_MAX_ROWS; row++) {
+	for (row = 0; row < xe_gt_debug_eu_att_rows(gt); row++) {
 		u32 val;
 
 		if (iter->i >= iter->size)
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.h b/drivers/gpu/drm/xe/xe_gt_debug.h
index 342082699ff6..1edb667154f1 100644
--- a/drivers/gpu/drm/xe/xe_gt_debug.h
+++ b/drivers/gpu/drm/xe/xe_gt_debug.h
@@ -6,12 +6,16 @@
 #ifndef __XE_GT_DEBUG_
 #define __XE_GT_DEBUG_
 
-#define TD_EU_ATTENTION_MAX_ROWS 2u
-
+#include "xe_device_types.h"
 #include "xe_gt_types.h"
 
 #define XE_GT_ATTENTION_TIMEOUT_MS 100
 
+static inline unsigned int xe_gt_debug_eu_att_rows(struct xe_gt *gt)
+{
+	return (GRAPHICS_VERx100(gt_to_xe(gt)) >= 3000) ? 4u : 2u;
+}
+
 int xe_gt_eu_threads_needing_attention(struct xe_gt *gt);
 int xe_gt_foreach_dss_group_instance(struct xe_gt *gt,
 				     int (*fn)(struct xe_gt *gt,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 22/26] drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (20 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 21/26] drm/xe/eudebug/ptl: Add support for extra attention register Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 23/26] drm/xe/eudebug: Add read/count/compare helper for eu attention Mika Kuoppala
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Dominik Grzegorzek, Mika Kuoppala

From: Dominik Grzegorzek <dominik.grzegorzek@intel.com>

The format of Register_RenderControlUnitDebug1 differs from previous
generations. Adjust the decoding so that it matches the PTL/xe3 format.

Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index c259e5804386..09b455a96571 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -1443,6 +1443,17 @@ static u32 engine_status_xe2(const struct xe_hw_engine * const hwe,
 	return (rcu_debug1 >> shift) & RCU_DEBUG_1_ENGINE_STATUS;
 }
 
+static u32 engine_status_xe3(const struct xe_hw_engine * const hwe,
+			     u32 rcu_debug1)
+{
+	const unsigned int first = 6;
+	const unsigned int incr = 4;
+	const unsigned int i = rcu_debug1_engine_index(hwe);
+	const unsigned int shift = first + (i * incr);
+
+	return (rcu_debug1 >> shift) & RCU_DEBUG_1_ENGINE_STATUS;
+}
+
 static u32 engine_status(const struct xe_hw_engine * const hwe,
 			 u32 rcu_debug1)
 {
@@ -1452,6 +1463,8 @@ static u32 engine_status(const struct xe_hw_engine * const hwe,
 		status = engine_status_xe1(hwe, rcu_debug1);
 	else if (GRAPHICS_VER(gt_to_xe(hwe->gt)) < 30)
 		status = engine_status_xe2(hwe, rcu_debug1);
+	else if (GRAPHICS_VER(gt_to_xe(hwe->gt)) < 35)
+		status = engine_status_xe3(hwe, rcu_debug1);
 	else
 		XE_WARN_ON(GRAPHICS_VER(gt_to_xe(hwe->gt)));
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 23/26] drm/xe/eudebug: Add read/count/compare helper for eu attention
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (21 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 22/26] drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3 Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 24/26] drm/xe/eudebug: Introduce EU pagefault handling interface Mika Kuoppala
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Gwan-gyeong Mun, Mika Kuoppala

From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>

Add the xe_eu_attentions structure to capture and store eu attention
bits. Add a function to count the number of eu threads whose attention
bits are turned on, and a function to count the number of eu threads
whose state has changed between two attention snapshots.

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_gt_debug.c | 64 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_debug.h | 15 ++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_debug.c b/drivers/gpu/drm/xe/xe_gt_debug.c
index 49f24db9da9c..a20e1e57212c 100644
--- a/drivers/gpu/drm/xe/xe_gt_debug.c
+++ b/drivers/gpu/drm/xe/xe_gt_debug.c
@@ -3,6 +3,7 @@
  * Copyright © 2023 Intel Corporation
  */
 
+#include <linux/delay.h>
 #include "regs/xe_gt_regs.h"
 #include "xe_device.h"
 #include "xe_force_wake.h"
@@ -146,3 +147,66 @@ int xe_gt_eu_threads_needing_attention(struct xe_gt *gt)
 
 	return err < 0 ? 0 : err;
 }
+
+static inline unsigned int
+xe_eu_attentions_count(const struct xe_eu_attentions *a)
+{
+	return bitmap_weight((void *)a->att, a->size * BITS_PER_BYTE);
+}
+
+void xe_gt_eu_attentions_read(struct xe_gt *gt,
+			      struct xe_eu_attentions *a,
+			      const unsigned int settle_time_ms)
+{
+	unsigned int prev = 0;
+	ktime_t end, now;
+
+	now = ktime_get_raw();
+	end = ktime_add_ms(now, settle_time_ms);
+
+	a->ts = 0;
+	a->size = min_t(int,
+			xe_gt_eu_attention_bitmap_size(gt),
+			sizeof(a->att));
+
+	do {
+		unsigned int attn;
+
+		xe_gt_eu_attention_bitmap(gt, a->att, a->size);
+		attn = xe_eu_attentions_count(a);
+
+		now = ktime_get_raw();
+
+		if (a->ts == 0)
+			a->ts = now;
+		else if (attn && attn != prev)
+			a->ts = now;
+
+		prev = attn;
+
+		if (settle_time_ms)
+			udelay(5);
+
+		/*
+		 * XXX We are gathering data for production SIP to find
+		 * the upper limit of settle time. For now, we wait full
+		 * timeout value regardless.
+		 */
+	} while (ktime_before(now, end));
+}
+
+unsigned int xe_eu_attentions_xor_count(const struct xe_eu_attentions *a,
+					const struct xe_eu_attentions *b)
+{
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (XE_WARN_ON(a->size != b->size))
+		return 0;
+
+	for (i = 0; i < a->size; i++)
+		if (a->att[i] ^ b->att[i])
+			count++;
+
+	return count;
+}
diff --git a/drivers/gpu/drm/xe/xe_gt_debug.h b/drivers/gpu/drm/xe/xe_gt_debug.h
index 1edb667154f1..1d50b93235ae 100644
--- a/drivers/gpu/drm/xe/xe_gt_debug.h
+++ b/drivers/gpu/drm/xe/xe_gt_debug.h
@@ -11,6 +11,15 @@
 
 #define XE_GT_ATTENTION_TIMEOUT_MS 100
 
+struct xe_eu_attentions {
+#define XE_MAX_EUS 1024
+#define XE_MAX_THREADS 10
+
+	u8 att[DIV_ROUND_UP(XE_MAX_EUS * XE_MAX_THREADS, BITS_PER_BYTE)];
+	unsigned int size;
+	ktime_t ts;
+};
+
 static inline unsigned int xe_gt_debug_eu_att_rows(struct xe_gt *gt)
 {
 	return (GRAPHICS_VERx100(gt_to_xe(gt)) >= 3000) ? 4u : 2u;
@@ -28,4 +37,10 @@ int xe_gt_eu_attention_bitmap_size(struct xe_gt *gt);
 int xe_gt_eu_attention_bitmap(struct xe_gt *gt, u8 *bits,
 			      unsigned int bitmap_size);
 
+void xe_gt_eu_attentions_read(struct xe_gt *gt,
+			      struct xe_eu_attentions *a,
+			      const unsigned int settle_time_ms);
+
+unsigned int xe_eu_attentions_xor_count(const struct xe_eu_attentions *a,
+					const struct xe_eu_attentions *b);
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 24/26] drm/xe/eudebug: Introduce EU pagefault handling interface
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (22 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 23/26] drm/xe/eudebug: Add read/count/compare helper for eu attention Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 25/26] drm/xe/vm: Support for adding null page VMA to VM on request Mika Kuoppala
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Gwan-gyeong Mun, Mika Kuoppala

From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>

The XE2 (and PVC) HW has a limitation that a pagefault due to an invalid
access will halt the corresponding EUs. To work around this, introduce
EU pagefault handling functionality, which unhalts the pagefaulted eu
threads and lets the EU debugger be informed about the attention state
of eu threads during execution.

When a pagefault occurs, send the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event
to the client connected to xe_eudebug after the pagefault has been
handled. The pagefault eudebug event uses the newly added
drm_xe_eudebug_event_pagefault type.
While a pagefault is being handled, sending the
DRM_XE_EUDEBUG_EVENT_EU_ATTENTION event to the client is suppressed.

Pagefault event delivery follows the policy below.
(1) If EU debugger discovery has completed and the pagefaulted eu
    threads turn on their attention bits, the pagefault handler
    delivers the pagefault event directly.
(2) If a pagefault occurs during the eu debugger discovery process, the
    pagefault handler queues a pagefault event and sends the queued
    event once discovery has completed and the pagefaulted eu threads
    turn on their attention bits.
(3) If a pagefaulted eu thread fails to turn on its attention bit
    within the specified time, the attention scan worker sends the
    pagefault event when it detects that the attention bit has turned on.

If multiple eu threads pagefault on the same invalid address, send a
single pagefault event (of DRM_XE_EUDEBUG_EVENT_PAGEFAULT type) to the
user debugger instead of one event per eu thread. If eu threads other
than the one that caused the earlier pagefault access new invalid
addresses, send a new pagefault event.

As the attention scan worker sends the eu attention event whenever the
attention bit is turned on, the user debugger receives an attention
event immediately after the pagefault event. In this case, the
pagefault event always precedes the attention event.

When the user debugger receives an attention event after a pagefault
event, it can detect whether additional breakpoints or interrupts
occurred besides the existing pagefault by comparing the eu threads
where the pagefault occurred against the eu threads whose attention
bits are newly enabled.

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c       | 489 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_eudebug.h       |  28 ++
 drivers/gpu/drm/xe/xe_eudebug_types.h |  94 +++++
 drivers/gpu/drm/xe/xe_gt_pagefault.c  |   4 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.h  |   2 +
 include/uapi/drm/xe_drm_eudebug.h     |  13 +
 6 files changed, 626 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 09b455a96571..0fd0958c5790 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -31,6 +31,7 @@
 #include "xe_gt.h"
 #include "xe_gt_debug.h"
 #include "xe_gt_mcr.h"
+#include "xe_gt_pagefault.h"
 #include "xe_guc_exec_queue_types.h"
 #include "xe_hw_engine.h"
 #include "xe_lrc.h"
@@ -236,10 +237,17 @@ static void xe_eudebug_free(struct kref *ref)
 {
 	struct xe_eudebug *d = container_of(ref, typeof(*d), ref);
 	struct xe_eudebug_event *event;
+	struct xe_eudebug_pagefault *pf, *pf_temp;
 
 	while (kfifo_get(&d->events.fifo, &event))
 		kfree(event);
 
+	/* Since it's the last reference, there is no race here */
+	list_for_each_entry_safe(pf, pf_temp, &d->pagefaults, list) {
+		xe_exec_queue_put(pf->q);
+		kfree(pf);
+	}
+
 	xe_eudebug_destroy_resources(d);
 	put_task_struct(d->target_task);
 
@@ -911,7 +919,7 @@ static struct xe_eudebug_event *
 xe_eudebug_create_event(struct xe_eudebug *d, u16 type, u64 seqno, u16 flags,
 			u32 len)
 {
-	const u16 max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA;
+	const u16 max_event = DRM_XE_EUDEBUG_EVENT_PAGEFAULT;
 	const u16 known_flags =
 		DRM_XE_EUDEBUG_EVENT_CREATE |
 		DRM_XE_EUDEBUG_EVENT_DESTROY |
@@ -946,7 +954,7 @@ static long xe_eudebug_read_event(struct xe_eudebug *d,
 		u64_to_user_ptr(arg);
 	struct drm_xe_eudebug_event user_event;
 	struct xe_eudebug_event *event;
-	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA;
+	const unsigned int max_event = DRM_XE_EUDEBUG_EVENT_PAGEFAULT;
 	long ret = 0;
 
 	if (XE_IOCTL_DBG(xe, copy_from_user(&user_event, user_orig, sizeof(user_event))))
@@ -1067,6 +1075,7 @@ static int do_eu_control(struct xe_eudebug *d,
 	struct xe_device *xe = d->xe;
 	u8 *bits = NULL;
 	unsigned int hw_attn_size, attn_size;
+	struct dma_fence *pf_fence;
 	struct xe_exec_queue *q;
 	struct xe_file *xef;
 	struct xe_lrc *lrc;
@@ -1132,6 +1141,23 @@ static int do_eu_control(struct xe_eudebug *d,
 
 	ret = -EINVAL;
 	mutex_lock(&d->eu_lock);
+	rcu_read_lock();
+	pf_fence = dma_fence_get_rcu_safe(&d->pf_fence);
+	rcu_read_unlock();
+
+	while (pf_fence) {
+		mutex_unlock(&d->eu_lock);
+		ret = dma_fence_wait(pf_fence, true);
+		dma_fence_put(pf_fence);
+
+		if (ret)
+			goto out_free;
+
+		mutex_lock(&d->eu_lock);
+		rcu_read_lock();
+		pf_fence = dma_fence_get_rcu_safe(&d->pf_fence);
+		rcu_read_unlock();
+	}
 
 	switch (arg->cmd) {
 	case DRM_XE_EUDEBUG_EU_CONTROL_CMD_INTERRUPT_ALL:
@@ -1720,6 +1746,182 @@ static int xe_eudebug_handle_gt_attention(struct xe_gt *gt)
 	return ret;
 }
 
+static int send_pagefault_event(struct xe_eudebug *d, struct xe_eudebug_pagefault *pf)
+{
+	struct xe_eudebug_event_pagefault *ep;
+	struct xe_eudebug_event *event;
+	int h_c, h_queue, h_lrc;
+	u32 size = xe_gt_eu_attention_bitmap_size(pf->q->gt) * 3;
+	u32 sz = struct_size(ep, bitmask, size);
+
+	XE_WARN_ON(pf->lrc_idx < 0 || pf->lrc_idx >= pf->q->width);
+
+	XE_WARN_ON(!xe_exec_queue_is_debuggable(pf->q));
+
+	h_c = find_handle(d->res, XE_EUDEBUG_RES_TYPE_CLIENT, pf->q->vm->xef);
+	if (h_c < 0)
+		return h_c;
+
+	h_queue = find_handle(d->res, XE_EUDEBUG_RES_TYPE_EXEC_QUEUE, pf->q);
+	if (h_queue < 0)
+		return h_queue;
+
+	h_lrc = find_handle(d->res, XE_EUDEBUG_RES_TYPE_LRC, pf->q->lrc[pf->lrc_idx]);
+	if (h_lrc < 0)
+		return h_lrc;
+
+	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_PAGEFAULT, 0,
+					DRM_XE_EUDEBUG_EVENT_STATE_CHANGE, sz);
+
+	if (!event)
+		return -ENOSPC;
+
+	ep = cast_event(ep, event);
+	write_member(struct xe_eudebug_event_pagefault, ep, client_handle, (u64)h_c);
+	write_member(struct xe_eudebug_event_pagefault, ep, exec_queue_handle, (u64)h_queue);
+	write_member(struct xe_eudebug_event_pagefault, ep, lrc_handle, (u64)h_lrc);
+	write_member(struct xe_eudebug_event_pagefault, ep, bitmask_size, size);
+	write_member(struct xe_eudebug_event_pagefault, ep, pagefault_address, pf->fault.addr);
+
+	memcpy(ep->bitmask, pf->attentions.before.att, pf->attentions.before.size);
+	memcpy(ep->bitmask + pf->attentions.before.size,
+	       pf->attentions.after.att, pf->attentions.after.size);
+	memcpy(ep->bitmask + pf->attentions.before.size + pf->attentions.after.size,
+	       pf->attentions.resolved.att, pf->attentions.resolved.size);
+
+	event->seqno = atomic_long_inc_return(&d->events.seqno);
+
+	return xe_eudebug_queue_event(d, event);
+}
+
+static int send_pagefault(struct xe_gt *gt, struct xe_eudebug_pagefault *pf,
+			  bool from_attention_scan)
+{
+	struct xe_eudebug *d;
+	struct xe_exec_queue *q;
+	int ret, lrc_idx;
+
+	if (list_empty_careful(&gt_to_xe(gt)->eudebug.list))
+		return -ENOTCONN;
+
+	q = runalone_active_queue_get(gt, &lrc_idx);
+	if (IS_ERR(q))
+		return PTR_ERR(q);
+
+	if (!xe_exec_queue_is_debuggable(q)) {
+		ret = -EPERM;
+		goto out_exec_queue_put;
+	}
+
+	d = _xe_eudebug_get(q->vm->xef);
+	if (!d) {
+		ret = -ENOTCONN;
+		goto out_exec_queue_put;
+	}
+
+	if (!completion_done(&d->discovery)) {
+		eu_dbg(d, "discovery not yet done\n");
+		ret = -EBUSY;
+		goto out_eudebug_put;
+	}
+
+	if (pf->deferred_resolved) {
+		xe_gt_eu_attentions_read(gt, &pf->attentions.resolved,
+					 XE_GT_ATTENTION_TIMEOUT_MS);
+
+		if (!xe_eu_attentions_xor_count(&pf->attentions.after,
+						&pf->attentions.resolved) &&
+		    !from_attention_scan) {
+			eu_dbg(d, "xe attentions not yet updated\n");
+			ret = -EBUSY;
+			goto out_eudebug_put;
+		}
+	}
+
+	ret = send_pagefault_event(d, pf);
+	if (ret)
+		xe_eudebug_disconnect(d, ret);
+
+out_eudebug_put:
+	xe_eudebug_put(d);
+out_exec_queue_put:
+	xe_exec_queue_put(q);
+
+	return ret;
+}
+
+static int send_queued_pagefault(struct xe_eudebug *d, bool from_attention_scan)
+{
+	struct xe_eudebug_pagefault *pf, *pf_temp;
+	int ret = 0;
+
+	mutex_lock(&d->pf_lock);
+	list_for_each_entry_safe(pf, pf_temp, &d->pagefaults, list) {
+		struct xe_gt *gt = pf->q->gt;
+
+		ret = send_pagefault(gt, pf, from_attention_scan);
+
+		/* if resolved attentions are not updated */
+		if (ret == -EBUSY)
+			break;
+
+		/* decrease the reference count of xe_exec_queue obtained from pagefault handler */
+		xe_exec_queue_put(pf->q);
+		list_del(&pf->list);
+		kfree(pf);
+
+		if (ret)
+			break;
+	}
+	mutex_unlock(&d->pf_lock);
+
+	return ret;
+}
+
+static int handle_gt_queued_pagefault(struct xe_gt *gt)
+{
+	struct xe_exec_queue *q;
+	struct xe_eudebug *d;
+	int ret, lrc_idx;
+
+	ret = xe_gt_eu_threads_needing_attention(gt);
+	if (ret <= 0)
+		return ret;
+
+	if (list_empty_careful(&gt_to_xe(gt)->eudebug.list))
+		return -ENOTCONN;
+
+	q = runalone_active_queue_get(gt, &lrc_idx);
+	if (IS_ERR(q))
+		return PTR_ERR(q);
+
+	if (!xe_exec_queue_is_debuggable(q)) {
+		ret = -EPERM;
+		goto out_exec_queue_put;
+	}
+
+	d = _xe_eudebug_get(q->vm->xef);
+	if (!d) {
+		ret = -ENOTCONN;
+		goto out_exec_queue_put;
+	}
+
+	if (!completion_done(&d->discovery)) {
+		eu_dbg(d, "discovery not yet done\n");
+		ret = -EBUSY;
+		goto out_eudebug_put;
+	}
+
+	ret = send_queued_pagefault(d, true);
+
+out_eudebug_put:
+	xe_eudebug_put(d);
+out_exec_queue_put:
+	xe_exec_queue_put(q);
+
+	return ret;
+}
+
 #define XE_EUDEBUG_ATTENTION_INTERVAL 100
 static void attention_scan_fn(struct work_struct *work)
 {
@@ -1741,6 +1943,8 @@ static void attention_scan_fn(struct work_struct *work)
 			if (gt->info.type != XE_GT_TYPE_MAIN)
 				continue;
 
+			handle_gt_queued_pagefault(gt);
+
 			ret = xe_eudebug_handle_gt_attention(gt);
 			if (ret) {
 				// TODO: error capture
@@ -2048,6 +2252,8 @@ xe_eudebug_connect(struct xe_device *xe,
 	kref_init(&d->ref);
 	spin_lock_init(&d->connection.lock);
 	mutex_init(&d->eu_lock);
+	mutex_init(&d->pf_lock);
+	INIT_LIST_HEAD(&d->pagefaults);
 	init_waitqueue_head(&d->events.write_done);
 	init_waitqueue_head(&d->events.read_done);
 	init_completion(&d->discovery);
@@ -3490,6 +3696,8 @@ static void discovery_work_fn(struct work_struct *work)
 
 	up_write(&xe->eudebug.discovery_lock);
 
+	send_queued_pagefault(d, false);
+
 	xe_eudebug_put(d);
 }
 
@@ -3961,6 +4169,283 @@ xe_eudebug_vm_open_ioctl(struct xe_eudebug *d, unsigned long arg)
 	return ret;
 }
 
+static int queue_pagefault(struct xe_gt *gt, struct xe_eudebug_pagefault *pf)
+{
+	struct xe_eudebug *d;
+
+	if (list_empty_careful(&gt_to_xe(gt)->eudebug.list))
+		return -ENOTCONN;
+
+	d = _xe_eudebug_get(pf->q->vm->xef);
+	if (IS_ERR_OR_NULL(d))
+		return -EINVAL;
+
+	mutex_lock(&d->pf_lock);
+	list_add_tail(&pf->list, &d->pagefaults);
+	mutex_unlock(&d->pf_lock);
+
+	xe_eudebug_put(d);
+
+	return 0;
+}
+
+static int handle_pagefault(struct xe_gt *gt, struct xe_eudebug_pagefault *pf)
+{
+	int ret;
+
+	ret = send_pagefault(gt, pf, false);
+
+	/*
+	 * if debugger discovery is not completed or resolved attentions are not
+	 * updated, then queue pagefault
+	 */
+	if (ret == -EBUSY) {
+		ret = queue_pagefault(gt, pf);
+		if (!ret)
+			goto out;
+	}
+
+	xe_exec_queue_put(pf->q);
+	kfree(pf);
+
+out:
+	return ret;
+}
+
+static const char *
+pagefault_get_driver_name(struct dma_fence *dma_fence)
+{
+	return "xe";
+}
+
+static const char *
+pagefault_fence_get_timeline_name(struct dma_fence *dma_fence)
+{
+	return "eudebug_pagefault_fence";
+}
+
+static const struct dma_fence_ops pagefault_fence_ops = {
+	.get_driver_name = pagefault_get_driver_name,
+	.get_timeline_name = pagefault_fence_get_timeline_name,
+};
+
+struct pagefault_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+};
+
+static struct pagefault_fence *pagefault_fence_create(void)
+{
+	struct pagefault_fence *fence;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (fence == NULL)
+		return NULL;
+
+	spin_lock_init(&fence->lock);
+	dma_fence_init(&fence->base, &pagefault_fence_ops, &fence->lock,
+		       dma_fence_context_alloc(1), 1);
+
+	return fence;
+}
+
+struct xe_eudebug_pagefault *
+xe_eudebug_pagefault_create(struct xe_gt *gt, struct xe_vm *vm, u64 page_addr,
+			    u8 fault_type, u8 fault_level, u8 access_type)
+{
+	struct pagefault_fence *pf_fence;
+	struct xe_eudebug_pagefault *pf;
+	struct xe_vma *vma = NULL;
+	struct xe_exec_queue *q;
+	struct dma_fence *fence;
+	struct xe_eudebug *d;
+	unsigned int fw_ref;
+	int lrc_idx;
+	u32 td_ctl;
+
+	down_read(&vm->lock);
+	vma = xe_gt_pagefault_lookup_vma(vm, page_addr);
+	up_read(&vm->lock);
+
+	if (vma)
+		return NULL;
+
+	d = _xe_eudebug_get(vm->xef);
+	if (!d)
+		return NULL;
+
+	q = runalone_active_queue_get(gt, &lrc_idx);
+	if (IS_ERR(q))
+		goto err_put_eudebug;
+
+	if (!xe_exec_queue_is_debuggable(q))
+		goto err_put_exec_queue;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), q->hwe->domain);
+	if (!fw_ref)
+		goto err_put_exec_queue;
+
+	/*
+	 * If there is no debug functionality (TD_CTL_GLOBAL_DEBUG_ENABLE, etc.),
+	 * don't proceed with the pagefault routine for the eu debugger.
+	 */
+
+	td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
+	if (!td_ctl)
+		goto err_put_fw;
+
+	pf = kzalloc(sizeof(*pf), GFP_KERNEL);
+	if (!pf)
+		goto err_put_fw;
+
+	attention_scan_cancel(gt_to_xe(gt));
+
+	mutex_lock(&d->eu_lock);
+	rcu_read_lock();
+	fence = dma_fence_get_rcu_safe(&d->pf_fence);
+	rcu_read_unlock();
+
+	if (fence) {
+		/*
+		 * TODO: If the new incoming pagefaulted address is different
+		 * from the pagefaulted address it is currently handling on the
+		 * same ASID, it needs a routine to wait here and then do the
+		 * following pagefault.
+		 */
+		dma_fence_put(fence);
+		goto err_unlock_eu_lock;
+	}
+
+	pf_fence = pagefault_fence_create();
+	if (!pf_fence)
+		goto err_unlock_eu_lock;
+
+	d->pf_fence = &pf_fence->base;
+	mutex_unlock(&d->eu_lock);
+
+	INIT_LIST_HEAD(&pf->list);
+
+	xe_gt_eu_attentions_read(gt, &pf->attentions.before, 0);
+
+	/* Halt on next thread dispatch */
+	while (!(td_ctl & TD_CTL_FORCE_EXTERNAL_HALT)) {
+		xe_gt_mcr_multicast_write(gt, TD_CTL,
+					  td_ctl | TD_CTL_FORCE_EXTERNAL_HALT);
+		/*
+		 * The sleep is needed because some interrupts are ignored
+		 * by the HW, hence we allow the HW some time to acknowledge
+		 * that.
+		 */
+		udelay(200);
+		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
+	}
+
+	/* Halt regardless of thread dependencies */
+	while (!(td_ctl & TD_CTL_FORCE_EXCEPTION)) {
+		xe_gt_mcr_multicast_write(gt, TD_CTL,
+					  td_ctl | TD_CTL_FORCE_EXCEPTION);
+		udelay(200);
+		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
+	}
+
+	xe_gt_eu_attentions_read(gt, &pf->attentions.after,
+				 XE_GT_ATTENTION_TIMEOUT_MS);
+
+	/*
+	 * xe_exec_queue_put() will be called from xe_eudebug_pagefault_destroy()
+	 * or handle_pagefault()
+	 */
+	pf->q = q;
+	pf->lrc_idx = lrc_idx;
+	pf->fault.addr = page_addr;
+	pf->fault.type = fault_type;
+	pf->fault.level = fault_level;
+	pf->fault.access = access_type;
+
+	xe_force_wake_put(gt_to_fw(gt), fw_ref);
+	xe_eudebug_put(d);
+
+	return pf;
+
+err_unlock_eu_lock:
+	mutex_unlock(&d->eu_lock);
+	attention_scan_flush(gt_to_xe(gt));
+	kfree(pf);
+err_put_fw:
+	xe_force_wake_put(gt_to_fw(gt), fw_ref);
+err_put_exec_queue:
+	xe_exec_queue_put(q);
+err_put_eudebug:
+	xe_eudebug_put(d);
+
+	return NULL;
+}
+
+void
+xe_eudebug_pagefault_process(struct xe_gt *gt, struct xe_eudebug_pagefault *pf)
+{
+	xe_gt_eu_attentions_read(gt, &pf->attentions.resolved,
+				 XE_GT_ATTENTION_TIMEOUT_MS);
+
+	if (!xe_eu_attentions_xor_count(&pf->attentions.after,
+					&pf->attentions.resolved))
+		pf->deferred_resolved = true;
+}
+
+void
+xe_eudebug_pagefault_destroy(struct xe_gt *gt, struct xe_vm *vm,
+			     struct xe_eudebug_pagefault *pf, bool send_event)
+{
+	struct xe_eudebug *d;
+	unsigned int fw_ref;
+	u32 td_ctl;
+
+	fw_ref = xe_force_wake_get(gt_to_fw(gt), pf->q->hwe->domain);
+	if (!fw_ref) {
+		struct xe_device *xe = gt_to_xe(gt);
+
+		drm_warn(&xe->drm, "Forcewake fail: Can not recover TD_CTL");
+	} else {
+		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
+		xe_gt_mcr_multicast_write(gt, TD_CTL, td_ctl &
+					  ~(TD_CTL_FORCE_EXTERNAL_HALT | TD_CTL_FORCE_EXCEPTION));
+		xe_force_wake_put(gt_to_fw(gt), fw_ref);
+	}
+
+	if (send_event)
+		handle_pagefault(gt, pf);
+
+	d = _xe_eudebug_get(vm->xef);
+	if (d) {
+		struct dma_fence *fence;
+
+		mutex_lock(&d->eu_lock);
+		rcu_read_lock();
+		fence = dma_fence_get_rcu_safe(&d->pf_fence);
+		rcu_read_unlock();
+
+		if (fence) {
+			if (send_event)
+				dma_fence_signal(fence);
+
+			dma_fence_put(fence); /* deref for dma_fence_get_rcu_safe() */
+			dma_fence_put(fence); /* deref for dma_fence_init() */
+		}
+
+		d->pf_fence = NULL;
+		mutex_unlock(&d->eu_lock);
+
+		xe_eudebug_put(d);
+	}
+
+	if (!send_event) {
+		xe_exec_queue_put(pf->q);
+		kfree(pf);
+	}
+
+	attention_scan_flush(gt_to_xe(gt));
+}
+
 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
 #include "tests/xe_eudebug.c"
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
index a08abf796cc1..cf1df4e2c6a6 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.h
+++ b/drivers/gpu/drm/xe/xe_eudebug.h
@@ -11,6 +11,7 @@ struct drm_device;
 struct drm_file;
 struct xe_device;
 struct xe_file;
+struct xe_gt;
 struct xe_vm;
 struct xe_vma;
 struct xe_exec_queue;
@@ -18,6 +19,7 @@ struct xe_hw_engine;
 struct xe_user_fence;
 struct xe_debug_metadata;
 struct drm_gpuva_ops;
+struct xe_eudebug_pagefault;
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
 
@@ -53,6 +55,13 @@ void xe_eudebug_put(struct xe_eudebug *d);
 void xe_eudebug_debug_metadata_create(struct xe_file *xef, struct xe_debug_metadata *m);
 void xe_eudebug_debug_metadata_destroy(struct xe_file *xef, struct xe_debug_metadata *m);
 
+struct xe_eudebug_pagefault *xe_eudebug_pagefault_create(struct xe_gt *gt, struct xe_vm *vm,
+							 u64 page_addr, u8 fault_type,
+							 u8 fault_level, u8 access_type);
+void xe_eudebug_pagefault_process(struct xe_gt *gt, struct xe_eudebug_pagefault *pf);
+void xe_eudebug_pagefault_destroy(struct xe_gt *gt, struct xe_vm *vm,
+				  struct xe_eudebug_pagefault *pf, bool send_event);
+
 #else
 
 static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
@@ -95,6 +104,25 @@ static inline void xe_eudebug_debug_metadata_destroy(struct xe_file *xef,
 {
 }
 
+static inline struct xe_eudebug_pagefault *
+xe_eudebug_pagefault_create(struct xe_gt *gt, struct xe_vm *vm, u64 page_addr,
+			    u8 fault_type, u8 fault_level, u8 access_type)
+{
+	return NULL;
+}
+
+static inline void
+xe_eudebug_pagefault_process(struct xe_gt *gt, struct xe_eudebug_pagefault *pf)
+{
+}
+
+static inline void xe_eudebug_pagefault_destroy(struct xe_gt *gt,
+						struct xe_vm *vm,
+						struct xe_eudebug_pagefault *pf,
+						bool send_event)
+{
+}
+
 #endif /* CONFIG_DRM_XE_EUDEBUG */
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h
index a69051b04698..00853dacd477 100644
--- a/drivers/gpu/drm/xe/xe_eudebug_types.h
+++ b/drivers/gpu/drm/xe/xe_eudebug_types.h
@@ -16,6 +16,8 @@
 
 #include <uapi/drm/xe_drm.h>
 
+#include "xe_gt_debug.h"
+
 struct xe_device;
 struct task_struct;
 struct xe_eudebug;
@@ -161,6 +163,16 @@ struct xe_eudebug {
 
 	/** @ops operations for eu_control */
 	struct xe_eudebug_eu_control_ops *ops;
+
+	/** @pf_lock: guards access to the pagefaults list */
+	struct mutex pf_lock;
+	/** @pagefaults: xe_eudebug_pagefault list for pagefault event queuing */
+	struct list_head pagefaults;
+	/**
+	 * @pf_fence: fence on operations of eus (eu thread control and attention)
+	 * when page faults are being handled, protected by @eu_lock.
+	 */
+	struct dma_fence __rcu *pf_fence;
 };
 
 /**
@@ -351,4 +363,86 @@ struct xe_eudebug_event_vm_bind_op_metadata {
 	u64 metadata_cookie;
 };
 
+/**
+ * struct xe_eudebug_event_pagefault - Internal event for EU Pagefault
+ */
+struct xe_eudebug_event_pagefault {
+	/** @base: base event */
+	struct xe_eudebug_event base;
+
+	/** @client_handle: client for the Pagefault */
+	u64 client_handle;
+
+	/** @exec_queue_handle: handle of exec_queue which raised Pagefault */
+	u64 exec_queue_handle;
+
+	/** @lrc_handle: lrc handle of the workload which raised Pagefault */
+	u64 lrc_handle;
+
+	/** @flags: eu Pagefault event flags, currently MBZ */
+	u32 flags;
+
+	/**
+	 * @bitmask_size: sum of size before/after/resolved att bits.
+	 * It has three times the size of xe_eudebug_event_eu_attention.bitmask_size.
+	 */
+	u32 bitmask_size;
+
+	/** @pagefault_address: The ppgtt address where the Pagefault occurred */
+	u64 pagefault_address;
+
+	/**
+	 * @bitmask: Bitmask of thread attentions starting from natural,
+	 * hardware order of DSS=0, eu=0, 8 attention bits per eu.
+	 * The order of the bitmask array is before, after, resolved.
+	 */
+	u8 bitmask[];
+};
+
+/**
+ * struct xe_eudebug_pagefault - eudebug structure for queuing pagefault
+ */
+struct xe_eudebug_pagefault {
+	/** @list: link into the xe_eudebug.pagefaults */
+	struct list_head list;
+	/** @q: exec_queue which raised pagefault */
+	struct xe_exec_queue *q;
+	/** @lrc_idx: lrc index of the workload which raised pagefault */
+	int lrc_idx;
+
+	/** @fault: raw partial pagefault data passed from the GuC */
+	struct {
+		/** @addr: ppgtt address where the pagefault occurred */
+		u64 addr;
+		/** @type: fault type passed from the GuC */
+		int type;
+		/** @level: page table level of the fault */
+		int level;
+		/** @access: access type of the fault */
+		int access;
+	} fault;
+
+	/** @attentions: attention bit snapshots around page fault WA handling */
+	struct {
+		/** @before: state of attention bits before page fault WA processing */
+		struct xe_eu_attentions before;
+		/**
+		 * @after: state of attention bits during page fault WA processing.
+		 * It includes eu threads whose attention bits are turned on for
+		 * reasons other than the page fault WA (breakpoint, interrupt, etc.).
+		 */
+		struct xe_eu_attentions after;
+		/**
+		 * @resolved: state of the attention bits after page fault WA.
+		 * It includes the eu thread that caused the page fault.
+		 * To determine the eu thread that caused the page fault,
+		 * XOR attentions.after with attentions.resolved.
+		 */
+		struct xe_eu_attentions resolved;
+	} attentions;
+
+	/**
+	 * @deferred_resolved: set when the eu thread fails to turn on its
+	 * attention bits within a certain time after page fault WA processing;
+	 * attentions.resolved is then updated again once the bits are ready.
+	 */
+	bool deferred_resolved;
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 2606cd396df5..5558342b8e07 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -79,7 +79,7 @@ static bool vma_matches(struct xe_vma *vma, u64 page_addr)
 	return true;
 }
 
-static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
+struct xe_vma *xe_gt_pagefault_lookup_vma(struct xe_vm *vm, u64 page_addr)
 {
 	struct xe_vma *vma = NULL;
 
@@ -225,7 +225,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		goto unlock_vm;
 	}
 
-	vma = lookup_vma(vm, pf->page_addr);
+	vma = xe_gt_pagefault_lookup_vma(vm, pf->page_addr);
 	if (!vma) {
 		err = -EINVAL;
 		goto unlock_vm;
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
index 839c065a5e4c..3c0628b79f33 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
@@ -10,10 +10,12 @@
 
 struct xe_gt;
 struct xe_guc;
+struct xe_vm;
 
 int xe_gt_pagefault_init(struct xe_gt *gt);
 void xe_gt_pagefault_reset(struct xe_gt *gt);
 int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len);
 int xe_guc_access_counter_notify_handler(struct xe_guc *guc, u32 *msg, u32 len);
+struct xe_vma *xe_gt_pagefault_lookup_vma(struct xe_vm *vm, u64 page_addr);
 
 #endif	/* _XE_GT_PAGEFAULT_ */
diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
index 3c4d1b511acd..e43576c7bc5e 100644
--- a/include/uapi/drm/xe_drm_eudebug.h
+++ b/include/uapi/drm/xe_drm_eudebug.h
@@ -38,6 +38,7 @@ struct drm_xe_eudebug_event {
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE	9
 #define DRM_XE_EUDEBUG_EVENT_METADATA		10
 #define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_METADATA 11
+#define DRM_XE_EUDEBUG_EVENT_PAGEFAULT		12
 
 	__u16 flags;
 #define DRM_XE_EUDEBUG_EVENT_CREATE		(1 << 0)
@@ -236,6 +237,18 @@ struct drm_xe_eudebug_event_vm_bind_op_metadata {
 	__u64 metadata_cookie;
 };
 
+struct drm_xe_eudebug_event_pagefault {
+	struct drm_xe_eudebug_event base;
+
+	__u64 client_handle;
+	__u64 exec_queue_handle;
+	__u64 lrc_handle;
+	__u32 flags;
+	__u32 bitmask_size;
+	__u64 pagefault_address;
+	__u8 bitmask[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 25/26] drm/xe/vm: Support for adding null page VMA to VM on request
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (23 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 24/26] drm/xe/eudebug: Introduce EU pagefault handling interface Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 13:33 ` [PATCH 26/26] drm/xe/eudebug: Enable EU pagefault handling Mika Kuoppala
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, christian.koenig, Gwan-gyeong Mun, Oak Zeng,
	Niranjana Vishwanathapura, Stuart Summers, Matthew Brost,
	Bruce Chang, Mika Kuoppala

From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>

The XE2 (and PVC) HW has a limitation that a pagefault due to an
invalid access will halt the corresponding EUs. So, in order to
activate the debugger, the KMD needs to install a temporary page to
unhalt the EUs. This is intended for pagefault handling while the EU
debugger is running. The idea is to install a null page VMA if the
pagefault comes from an invalid access. After the null page PTE is
installed, the debugger can continue to run and inspect without
causing a fatal failure, reset or stop.
Based on Bruce's implementation [1].

[1] https://lore.kernel.org/intel-xe/20230829231648.4438-1-yu.bruce.chang@intel.com/
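
As a loose userspace analogue of the null-page idea (illustrative only,
with no relation to the kernel implementation below): a SIGSEGV handler
can map a zero-filled page at the faulting address so the interrupted
access resumes and reads zeroes instead of killing the process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustrative userspace sketch, not part of the series: on an invalid
 * access, install a zero-filled "null page" at the faulting address so
 * the interrupted access can be retried and completes with zeroes.
 */
static void segv_handler(int sig, siginfo_t *si, void *uc)
{
	(void)sig; (void)uc;
	long psz = sysconf(_SC_PAGESIZE);
	/* round the faulting address down to its page */
	void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(psz - 1));

	/* map a zeroed page over it; the faulting instruction is retried */
	mmap(page, psz, PROT_READ | PROT_WRITE,
	     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}
```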

Cc: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 23 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h |  2 ++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 474521d0fea9..ff45e5264aed 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3552,3 +3552,26 @@ int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
 	up_read(&vm->userptr.notifier_lock);
 	return ret;
 }
+
+struct xe_vma *xe_vm_create_null_vma(struct xe_vm *vm, u64 addr)
+{
+	struct xe_vma *vma;
+	u32 page_size;
+	int err;
+
+	if (xe_vm_is_closed_or_banned(vm))
+		return ERR_PTR(-ENOENT);
+
+	page_size = vm->flags & XE_VM_FLAG_64K ? SZ_64K : SZ_4K;
+	vma = xe_vma_create(vm, NULL, 0, addr, addr + page_size - 1, 0, VMA_CREATE_FLAG_IS_NULL);
+	if (IS_ERR_OR_NULL(vma))
+		return vma;
+
+	err = xe_vm_insert_vma(vm, vma);
+	if (err) {
+		xe_vma_destroy_late(vma);
+		return ERR_PTR(err);
+	}
+
+	return vma;
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 372ad40ad67f..2ae3749cfd82 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -283,3 +283,5 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
 
 int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
 			 void *buf, u64 len, bool write);
+
+struct xe_vma *xe_vm_create_null_vma(struct xe_vm *vm, u64 addr);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 26/26] drm/xe/eudebug: Enable EU pagefault handling
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (24 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 25/26] drm/xe/vm: Support for adding null page VMA to VM on request Mika Kuoppala
@ 2024-12-09 13:33 ` Mika Kuoppala
  2024-12-09 14:37 ` ✓ CI.Patch_applied: success for Intel Xe GPU debug support (eudebug) v3 Patchwork
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-09 13:33 UTC (permalink / raw)
  To: intel-xe; +Cc: dri-devel, christian.koenig, Gwan-gyeong Mun, Mika Kuoppala

From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>

The XE2 (and PVC) HW has a limitation that a pagefault due to an
invalid access will halt the corresponding EUs. To solve this problem,
enable the EU pagefault handling functionality, which allows
pagefaulted EU threads to be unhalted and the EU debugger to be
informed about the attention state of EU threads during execution.

If a pagefault occurs, send the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event to
the client connected to the xe_eudebug after handling the pagefault.

Pagefault handling is a mechanism that allows a stalled EU thread to
enter SIP mode by installing a temporary null page in the page table
entry where the pagefault happened.

A brief description of the pagefault handling flow between the KMD and
the EU threads is as follows:

(1) An EU thread accesses an unallocated address.
(2) A pagefault happens and the EU thread stalls.
(3) The XE KMD sets a forced EU thread exception to allow the running
    EU threads to enter SIP mode (the KMD sets the ForceException /
    ForceExternalHalt bits of the TD_CTL register).
    Non-stalled (non-pagefaulted) EU threads enter SIP mode.
(4) The XE KMD installs a temporary null page in the page table entry
    of the address where the pagefault happened.
(5) The XE KMD replies to the GuC that the pagefault was handled
    successfully.
(6) The stalled EU thread resumes, as the pagefault condition has been
    resolved.
(7) The resumed EU thread enters SIP mode due to the forced exception
    set in (3).

As this feature is designed to only work when eudebug is enabled, it
should have no impact on the regular recoverable pagefault code path.
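
On the debugger side, the resulting event can be consumed as sketched
below. The helper is hypothetical; it only relies on the documented
layout where bitmask[] packs three equally sized attention snapshots
(before, after, resolved) back to back.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical debugger-side helper: bitmask_size is three times the
 * size of one attention snapshot.  XOR-ing the "after" and "resolved"
 * snapshots leaves exactly the attention bits raised by the EU threads
 * that page faulted.
 */
static void pf_faulted_threads(const uint8_t *bitmask, uint32_t bitmask_size,
			       uint8_t *out /* one snapshot worth */)
{
	uint32_t att_size = bitmask_size / 3;	/* size of one snapshot */
	const uint8_t *after = bitmask + att_size;
	const uint8_t *resolved = bitmask + 2 * att_size;

	for (uint32_t i = 0; i < att_size; i++)
		out[i] = after[i] ^ resolved[i];
}
```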

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 83 +++++++++++++++++++++++++---
 1 file changed, 75 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 5558342b8e07..4e2883e19018 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -13,6 +13,7 @@
 
 #include "abi/guc_actions_abi.h"
 #include "xe_bo.h"
+#include "xe_eudebug.h"
 #include "xe_gt.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_guc.h"
@@ -199,12 +200,16 @@ static struct xe_vm *asid_to_vm(struct xe_device *xe, u32 asid)
 	return vm;
 }
 
-static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
+static int handle_pagefault_start(struct xe_gt *gt, struct pagefault *pf,
+				  struct xe_vm **pf_vm,
+				  struct xe_eudebug_pagefault **eudebug_pf_out)
 {
-	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_eudebug_pagefault *eudebug_pf;
 	struct xe_tile *tile = gt_to_tile(gt);
-	struct xe_vm *vm;
+	struct xe_device *xe = gt_to_xe(gt);
+	bool destroy_eudebug_pf = false;
 	struct xe_vma *vma = NULL;
+	struct xe_vm *vm;
 	int err;
 
 	/* SW isn't expected to handle TRTT faults */
@@ -215,6 +220,10 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	if (IS_ERR(vm))
 		return PTR_ERR(vm);
 
+	eudebug_pf = xe_eudebug_pagefault_create(gt, vm, pf->page_addr,
+						 pf->fault_type, pf->fault_level,
+						 pf->access_type);
+
 	/*
 	 * TODO: Change to read lock? Using write lock for simplicity.
 	 */
@@ -227,8 +236,27 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	vma = xe_gt_pagefault_lookup_vma(vm, pf->page_addr);
 	if (!vma) {
-		err = -EINVAL;
-		goto unlock_vm;
+		if (eudebug_pf)
+			vma = xe_vm_create_null_vma(vm, pf->page_addr);
+
+		if (IS_ERR_OR_NULL(vma)) {
+			err = -EINVAL;
+			if (eudebug_pf)
+				destroy_eudebug_pf = true;
+
+			goto unlock_vm;
+		}
+	} else {
+		/*
+		 * When the eudebug_pagefault instance was created there was
+		 * no vma containing the ppgtt address of the pagefault, but
+		 * after reacquiring vm->lock there is one: while this
+		 * context did not hold vm->lock, another context allocated
+		 * a vma covering the faulting address, so the
+		 * eudebug_pagefault is no longer needed.
+		 */
+		if (eudebug_pf)
+			destroy_eudebug_pf = true;
 	}
 
 	err = handle_vma_pagefault(tile, pf, vma);
@@ -237,11 +265,43 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	if (!err)
 		vm->usm.last_fault_vma = vma;
 	up_write(&vm->lock);
-	xe_vm_put(vm);
+
+	if (destroy_eudebug_pf) {
+		xe_eudebug_pagefault_destroy(gt, vm, eudebug_pf, false);
+		*eudebug_pf_out = NULL;
+	} else {
+		*eudebug_pf_out = eudebug_pf;
+	}
+
+	/* Keep the VM alive for the lifetime of the eudebug pagefault instance. */
+	if (!*eudebug_pf_out) {
+		xe_vm_put(vm);
+		*pf_vm = NULL;
+	} else {
+		*pf_vm = vm;
+	}
 
 	return err;
 }
 
+static void handle_pagefault_end(struct xe_gt *gt, struct xe_vm *vm,
+				 struct xe_eudebug_pagefault *eudebug_pf)
+{
+	/* nothing to do if there is no eudebug_pagefault */
+	if (!eudebug_pf)
+		return;
+
+	xe_eudebug_pagefault_process(gt, eudebug_pf);
+
+	/*
+	 * TODO: Remove VMA added to handle eudebug pagefault
+	 */
+
+	xe_eudebug_pagefault_destroy(gt, vm, eudebug_pf, true);
+
+	xe_vm_put(vm);
+}
+
 static int send_pagefault_reply(struct xe_guc *guc,
 				struct xe_guc_pagefault_reply *reply)
 {
@@ -367,7 +427,10 @@ static void pf_queue_work_func(struct work_struct *w)
 	threshold = jiffies + msecs_to_jiffies(USM_QUEUE_MAX_RUNTIME_MS);
 
 	while (get_pagefault(pf_queue, &pf)) {
-		ret = handle_pagefault(gt, &pf);
+		struct xe_eudebug_pagefault *eudebug_pf = NULL;
+		struct xe_vm *vm = NULL;
+
+		ret = handle_pagefault_start(gt, &pf, &vm, &eudebug_pf);
 		if (unlikely(ret)) {
 			print_pagefault(xe, &pf);
 			pf.fault_unsuccessful = 1;
@@ -385,7 +448,11 @@ static void pf_queue_work_func(struct work_struct *w)
 			FIELD_PREP(PFR_ENG_CLASS, pf.engine_class) |
 			FIELD_PREP(PFR_PDATA, pf.pdata);
 
-		send_pagefault_reply(&gt->uc.guc, &reply);
+		ret = send_pagefault_reply(&gt->uc.guc, &reply);
+		if (unlikely(ret))
+			drm_dbg(&xe->drm, "GuC Pagefault reply failed: %d\n", ret);
+
+		handle_pagefault_end(gt, vm, eudebug_pf);
 
 		if (time_after(jiffies, threshold) &&
 		    pf_queue->tail != pf_queue->head) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 13:33 ` [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access Mika Kuoppala
@ 2024-12-09 14:03   ` Christian König
  2024-12-09 14:56     ` Joonas Lahtinen
  2024-12-09 15:31     ` Simona Vetter
  2024-12-16 14:17   ` [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite Mika Kuoppala
  2024-12-20 11:31   ` Mika Kuoppala
  2 siblings, 2 replies; 63+ messages in thread
From: Christian König @ 2024-12-09 14:03 UTC (permalink / raw)
  To: Mika Kuoppala, intel-xe, lkml, Linux MM
  Cc: dri-devel, Andrzej Hajda, Maciej Patelczyk, Jonathan Cavitt

On 09.12.24 at 14:33, Mika Kuoppala wrote:
> From: Andrzej Hajda <andrzej.hajda@intel.com>
>
> Debugger needs to read/write program's vmas including userptr_vma.
> Since hmm_range_fault is used to pin userptr vmas, it is possible
> to map those vmas from debugger context.

Oh, this implementation is extremely questionable as well. Adding the 
LKML and the MM list as well.

First of all hmm_range_fault() does *not* pin anything!

In other words you don't have a page reference when the function 
returns, but rather just a sequence number you can check for modifications.
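
As an aside, the usual retry pattern around hmm_range_fault(), sketched
here in pseudocode rather than taken from this series, makes that
sequence number check explicit:

```
again:
	range.notifier_seq = mmu_interval_read_begin(&notifier);
	ret = hmm_range_fault(&range);	/* fills pfns, takes no page refs */
	if (ret)
		return ret;
	take the driver lock that serializes against invalidation;
	if (mmu_interval_read_retry(&notifier, range.notifier_seq))
		goto again;	/* the mapping changed underneath us */
	/* pfns are only valid while that lock is held */
```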

> v2: pin pages vs notifier, move to vm.c (Matthew)
> v3: - iterate over system pages instead of DMA, fixes iommu enabled
>      - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
>
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
> ---
>   drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
>   drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_vm.h      |  3 +++
>   3 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> index 9d87df75348b..e5949e4dcad8 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug.c
> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
>   		return ret;
>   	}
>   
> -	return -EINVAL;
> +	return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
> +				    buf, bytes, write);
>   }
>   
>   static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 0f17bc8b627b..224ff9e16941 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>   	}
>   	kvfree(snap);
>   }
> +
> +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> +			 void *buf, u64 len, bool write)
> +{
> +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> +	struct xe_userptr *up = &uvma->userptr;
> +	struct xe_res_cursor cur = {};
> +	int cur_len, ret = 0;
> +
> +	while (true) {
> +		down_read(&vm->userptr.notifier_lock);
> +		if (!xe_vma_userptr_check_repin(uvma))
> +			break;
> +
> +		spin_lock(&vm->userptr.invalidated_lock);
> +		list_del_init(&uvma->userptr.invalidate_link);
> +		spin_unlock(&vm->userptr.invalidated_lock);
> +
> +		up_read(&vm->userptr.notifier_lock);
> +		ret = xe_vma_userptr_pin_pages(uvma);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (!up->sg) {
> +		ret = -EINVAL;
> +		goto out_unlock_notifier;
> +	}
> +
> +	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
> +	     xe_res_next(&cur, cur_len)) {
> +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;

The interface basically creates a side channel to access userptrs in the 
way a userspace application would do without actually going through 
userspace.

That is generally not something a device driver should ever do as far as 
I can see.

> +
> +		cur_len = min(cur.size, cur.remaining);
> +		if (write)
> +			memcpy(ptr, buf, cur_len);
> +		else
> +			memcpy(buf, ptr, cur_len);
> +		kunmap_local(ptr);
> +		buf += cur_len;
> +	}
> +	ret = len;
> +
> +out_unlock_notifier:
> +	up_read(&vm->userptr.notifier_lock);

I just strongly hope that this will prevent the mapping from changing.

Regards,
Christian.

> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 23adb7442881..372ad40ad67f 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>   void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>   void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>   void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> +
> +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> +			 void *buf, u64 len, bool write);


^ permalink raw reply	[flat|nested] 63+ messages in thread

* ✓ CI.Patch_applied: success for Intel Xe GPU debug support (eudebug) v3
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (25 preceding siblings ...)
  2024-12-09 13:33 ` [PATCH 26/26] drm/xe/eudebug: Enable EU pagefault handling Mika Kuoppala
@ 2024-12-09 14:37 ` Patchwork
  2024-12-09 14:38 ` ✗ CI.checkpatch: warning " Patchwork
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2024-12-09 14:37 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-xe

== Series Details ==

Series: Intel Xe GPU debug support (eudebug) v3
URL   : https://patchwork.freedesktop.org/series/142295/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: c266f2fbb9c0 drm-tip: 2024y-12m-09d-14h-13m-53s UTC integration manifest
=== git am output follows ===
Applying: ptrace: export ptrace_may_access
Applying: drm/xe/eudebug: Introduce eudebug support
Applying: drm/xe/eudebug: Introduce discovery for resources
Applying: drm/xe/eudebug: Introduce exec_queue events
Applying: drm/xe/eudebug: Introduce exec queue placements event
Applying: drm/xe/eudebug: hw enablement for eudebug
Applying: drm/xe: Add EUDEBUG_ENABLE exec queue property
Applying: drm/xe/eudebug: Introduce per device attention scan worker
Applying: drm/xe/eudebug: Introduce EU control interface
Applying: drm/xe/eudebug: Add vm bind and vm bind ops
Applying: drm/xe/eudebug: Add UFENCE events with acks
Applying: drm/xe/eudebug: vm open/pread/pwrite
Applying: drm/xe: add system memory page iterator support to xe_res_cursor
Applying: drm/xe/eudebug: implement userptr_vma access
Applying: drm/xe: Debug metadata create/destroy ioctls
Applying: drm/xe: Attach debug metadata to vma
Applying: drm/xe/eudebug: Add debug metadata support for xe_eudebug
Applying: drm/xe/eudebug: Implement vm_bind_op discovery
Applying: drm/xe/eudebug: Dynamically toggle debugger functionality
Applying: drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test
Applying: drm/xe/eudebug/ptl: Add support for extra attention register
Applying: drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3
Applying: drm/xe/eudebug: Add read/count/compare helper for eu attention
Applying: drm/xe/eudebug: Introduce EU pagefault handling interface
Applying: drm/xe/vm: Support for adding null page VMA to VM on request
Applying: drm/xe/eudebug: Enable EU pagefault handling



^ permalink raw reply	[flat|nested] 63+ messages in thread

* ✗ CI.checkpatch: warning for Intel Xe GPU debug support (eudebug) v3
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (26 preceding siblings ...)
  2024-12-09 14:37 ` ✓ CI.Patch_applied: success for Intel Xe GPU debug support (eudebug) v3 Patchwork
@ 2024-12-09 14:38 ` Patchwork
  2024-12-09 14:39 ` ✗ CI.KUnit: failure " Patchwork
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2024-12-09 14:38 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-xe

== Series Details ==

Series: Intel Xe GPU debug support (eudebug) v3
URL   : https://patchwork.freedesktop.org/series/142295/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 00b6bd24a242c98271f20d013a487ae9666d092e
Author: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Date:   Mon Dec 9 15:33:17 2024 +0200

    drm/xe/eudebug: Enable EU pagefault handling
    
    The XE2 (and PVC) HW has a limitation that the pagefault due to invalid
    access will halt the corresponding EUs. To solve this problem, enable
    EU pagefault handling functionality, which allows to unhalt pagefaulted
    eu threads and to EU debugger to get inform about the eu attentions state
    of EU threads during execution.
    
    If a pagefault occurs, send the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event to
    the client connected to the xe_eudebug after handling the pagefault.
    
    The pagefault handling is a mechanism that allows a stalled EU thread to
    enter SIP mode by installing a temporal null page to the page table entry
    where the pagefault happened.
    
    A brief description of the page fault handling mechanism flow between KMD
    and the eu thread is as follows
    
    (1) eu thread accesses unallocated address
    (2) pagefault happens and eu thread stalls
    (3) XE kmd set an force eu thread exception to allow the running eu thread
        to enter SIP mode (kmd set ForceException / ForceExternalHalt bit of
        TD_CTL register)
        Not stalled (none-pagefaulted) eu threads enter SIP mode
    (4) XE kmd installs temporal null page to the pagetable entry of the
        address where pagefault happened.
    (5) XE kmd replies pagefault successful message to GUC
    (6) stalled eu thread resumes as per pagefault condition has resolved
    (7) resumed eu thread enters SIP mode due to force exception set by (3)
    
    As designed this feature to only work when eudbug is enabled, it should
    have no impact to regular recoverable pagefault code path.
    
    Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
    Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
+ /mt/dim checkpatch c266f2fbb9c0250af17b14788364766adf802a4a drm-intel
729648a2eb39 ptrace: export ptrace_may_access
ac2f364d93cd drm/xe/eudebug: Introduce eudebug support
-:211: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#211: 
new file mode 100644

-:249: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#249: FILE: drivers/gpu/drm/xe/xe_eudebug.c:34:
+#define XE_EUDEBUG_DBG_ARGS(d) (d)->session, \
+		atomic_long_read(&(d)->events.seqno), \
+		READ_ONCE(d->connection.status) <= 0 ? "disconnected" : "", \
+		current->pid, \
+		task_tgid_nr(current), \
+		(d)->target_task->pid, \
+		task_tgid_nr((d)->target_task)

-:249: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'd' - possible side-effects?
#249: FILE: drivers/gpu/drm/xe/xe_eudebug.c:34:
+#define XE_EUDEBUG_DBG_ARGS(d) (d)->session, \
+		atomic_long_read(&(d)->events.seqno), \
+		READ_ONCE(d->connection.status) <= 0 ? "disconnected" : "", \
+		current->pid, \
+		task_tgid_nr(current), \
+		(d)->target_task->pid, \
+		task_tgid_nr((d)->target_task)

-:257: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'd' - possible side-effects?
#257: FILE: drivers/gpu/drm/xe/xe_eudebug.c:42:
+#define eu_err(d, fmt, ...) drm_err(&(d)->xe->drm, XE_EUDEBUG_DBG_STR # fmt, \
+				    XE_EUDEBUG_DBG_ARGS(d), ##__VA_ARGS__)

-:259: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'd' - possible side-effects?
#259: FILE: drivers/gpu/drm/xe/xe_eudebug.c:44:
+#define eu_warn(d, fmt, ...) drm_warn(&(d)->xe->drm, XE_EUDEBUG_DBG_STR # fmt, \
+				      XE_EUDEBUG_DBG_ARGS(d), ##__VA_ARGS__)

-:261: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'd' - possible side-effects?
#261: FILE: drivers/gpu/drm/xe/xe_eudebug.c:46:
+#define eu_dbg(d, fmt, ...) drm_dbg(&(d)->xe->drm, XE_EUDEBUG_DBG_STR # fmt, \
+				    XE_EUDEBUG_DBG_ARGS(d), ##__VA_ARGS__)

-:266: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'T' may be better as '(T)' to avoid precedence issues
#266: FILE: drivers/gpu/drm/xe/xe_eudebug.c:51:
+#define struct_member(T, member) (((T *)0)->member)

-:266: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#266: FILE: drivers/gpu/drm/xe/xe_eudebug.c:51:
+#define struct_member(T, member) (((T *)0)->member)

-:269: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#269: FILE: drivers/gpu/drm/xe/xe_eudebug.c:54:
+#define write_member(T_out, ptr, member, value) { \
+	BUILD_BUG_ON(sizeof(*ptr) != sizeof(T_out)); \
+	BUILD_BUG_ON(offsetof(typeof(*ptr), member) != \
+		     offsetof(typeof(T_out), member)); \
+	BUILD_BUG_ON(sizeof(ptr->member) != sizeof(value)); \
+	BUILD_BUG_ON(sizeof(struct_member(T_out, member)) != sizeof(value)); \
+	BUILD_BUG_ON(!typecheck(typeof((ptr)->member), value));	\
+	(ptr)->member = (value); \
+	}

-:269: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'ptr' may be better as '(ptr)' to avoid precedence issues
#269: FILE: drivers/gpu/drm/xe/xe_eudebug.c:54:
+#define write_member(T_out, ptr, member, value) { \
+	BUILD_BUG_ON(sizeof(*ptr) != sizeof(T_out)); \
+	BUILD_BUG_ON(offsetof(typeof(*ptr), member) != \
+		     offsetof(typeof(T_out), member)); \
+	BUILD_BUG_ON(sizeof(ptr->member) != sizeof(value)); \
+	BUILD_BUG_ON(sizeof(struct_member(T_out, member)) != sizeof(value)); \
+	BUILD_BUG_ON(!typecheck(typeof((ptr)->member), value));	\
+	(ptr)->member = (value); \
+	}

-:269: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#269: FILE: drivers/gpu/drm/xe/xe_eudebug.c:54:
+#define write_member(T_out, ptr, member, value) { \
+	BUILD_BUG_ON(sizeof(*ptr) != sizeof(T_out)); \
+	BUILD_BUG_ON(offsetof(typeof(*ptr), member) != \
+		     offsetof(typeof(T_out), member)); \
+	BUILD_BUG_ON(sizeof(ptr->member) != sizeof(value)); \
+	BUILD_BUG_ON(sizeof(struct_member(T_out, member)) != sizeof(value)); \
+	BUILD_BUG_ON(!typecheck(typeof((ptr)->member), value));	\
+	(ptr)->member = (value); \
+	}

-:269: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'value' - possible side-effects?
#269: FILE: drivers/gpu/drm/xe/xe_eudebug.c:54:
+#define write_member(T_out, ptr, member, value) { \
+	BUILD_BUG_ON(sizeof(*ptr) != sizeof(T_out)); \
+	BUILD_BUG_ON(offsetof(typeof(*ptr), member) != \
+		     offsetof(typeof(T_out), member)); \
+	BUILD_BUG_ON(sizeof(ptr->member) != sizeof(value)); \
+	BUILD_BUG_ON(sizeof(struct_member(T_out, member)) != sizeof(value)); \
+	BUILD_BUG_ON(!typecheck(typeof((ptr)->member), value));	\
+	(ptr)->member = (value); \
+	}

-:541: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_err' - possible side-effects?
#541: FILE: drivers/gpu/drm/xe/xe_eudebug.c:326:
+#define xe_eudebug_disconnect(_d, _err) ({ \
+	if (_xe_eudebug_disconnect((_d), (_err))) { \
+		if ((_err) == 0 || (_err) == -ETIMEDOUT) \
+			eu_dbg(d, "Session closed (%d)", (_err)); \
+		else \
+			eu_err(d, "Session disconnected, err = %d (%s:%d)", \
+			       (_err), __func__, __LINE__); \
+	} \
+})

-:1201: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_d' - possible side-effects?
#1201: FILE: drivers/gpu/drm/xe/xe_eudebug.c:986:
+#define xe_eudebug_event_put(_d, _err) ({ \
+	if ((_err)) \
+		xe_eudebug_disconnect((_d), (_err)); \
+	xe_eudebug_put((_d)); \
+	})

-:1201: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_err' - possible side-effects?
#1201: FILE: drivers/gpu/drm/xe/xe_eudebug.c:986:
+#define xe_eudebug_event_put(_d, _err) ({ \
+	if ((_err)) \
+		xe_eudebug_disconnect((_d), (_err)); \
+	xe_eudebug_put((_d)); \
+	})

-:1610: WARNING:LONG_LINE: line length of 130 exceeds 100 columns
#1610: FILE: include/uapi/drm/xe_drm.h:121:
+#define DRM_IOCTL_XE_EUDEBUG_CONNECT		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_EUDEBUG_CONNECT, struct drm_xe_eudebug_connect)

total: 1 errors, 2 warnings, 13 checks, 1577 lines checked
c97ad7125e5c drm/xe/eudebug: Introduce discovery for resources
-:95: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#95: FILE: drivers/gpu/drm/xe/xe_device.h:232:
+}
+static inline void xe_eudebug_discovery_unlock(struct xe_device *xe, unsigned int cmd)

total: 0 errors, 0 warnings, 1 checks, 318 lines checked
17ef44596324 drm/xe/eudebug: Introduce exec_queue events
9b1663a6ba62 drm/xe/eudebug: Introduce exec queue placements event
-:78: CHECK:BRACES: Blank lines aren't necessary after an open brace '{'
#78: FILE: drivers/gpu/drm/xe/xe_eudebug.c:1245:
+{
+

-:135: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#135: FILE: drivers/gpu/drm/xe/xe_eudebug.c:1332:
+	ret = send_exec_queue_event(d, DRM_XE_EUDEBUG_EVENT_CREATE,
+				  h_c, h_vm, h_queue, q->class,

total: 0 errors, 0 warnings, 2 checks, 198 lines checked
d6539334fb02 drm/xe/eudebug: hw enablement for eudebug
71935b719213 drm/xe: Add EUDEBUG_ENABLE exec queue property
faa9fdc51a4f drm/xe/eudebug: Introduce per device attention scan worker
-:429: CHECK:LINE_SPACING: Please don't use multiple blank lines
#429: FILE: drivers/gpu/drm/xe/xe_eudebug.c:1156:
+
+

-:649: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#649: 
new file mode 100644

total: 0 errors, 1 warnings, 1 checks, 770 lines checked
af1c94d3a52a drm/xe/eudebug: Introduce EU control interface
2ecba6c79908 drm/xe/eudebug: Add vm bind and vm bind ops
19cfd471bec6 drm/xe/eudebug: Add UFENCE events with acks
-:667: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#667: FILE: drivers/gpu/drm/xe/xe_sync_types.h:26:
+		spinlock_t lock;

total: 0 errors, 0 warnings, 1 checks, 635 lines checked
e9ba92596063 drm/xe/eudebug: vm open/pread/pwrite
43ef77a37d6f drm/xe: add system memory page iterator support to xe_res_cursor
-:41: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#41: FILE: drivers/gpu/drm/xe/xe_res_cursor.h:148:
+static inline void __xe_res_first_sg(const struct sg_table *sg,
+				   u64 start, u64 size,

-:83: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#83: FILE: drivers/gpu/drm/xe/xe_res_cursor.h:189:
+static inline void xe_res_first_sg_system(const struct sg_table *sg,
+				   u64 start, u64 size,

total: 0 errors, 0 warnings, 2 checks, 71 lines checked
ea30cab25fb5 drm/xe/eudebug: implement userptr_vma access
ab7e2f12115d drm/xe: Debug metadata create/destroy ioctls
-:36: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#36: 
new file mode 100644

-:202: CHECK:LINE_SPACING: Please don't use multiple blank lines
#202: FILE: drivers/gpu/drm/xe/xe_debug_metadata.h:49:
+
+

-:351: WARNING:LONG_LINE: line length of 143 exceeds 100 columns
#351: FILE: include/uapi/drm/xe_drm.h:123:
+#define DRM_IOCTL_XE_DEBUG_METADATA_CREATE	 DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_DEBUG_METADATA_CREATE, struct drm_xe_debug_metadata_create)

-:352: WARNING:LONG_LINE: line length of 144 exceeds 100 columns
#352: FILE: include/uapi/drm/xe_drm.h:124:
+#define DRM_IOCTL_XE_DEBUG_METADATA_DESTROY	 DRM_IOW(DRM_COMMAND_BASE + DRM_XE_DEBUG_METADATA_DESTROY, struct drm_xe_debug_metadata_destroy)

total: 0 errors, 3 warnings, 1 checks, 337 lines checked
673be9f4e1cb drm/xe: Attach debug metadata to vma
-:315: WARNING:LONG_LINE: line length of 109 exceeds 100 columns
#315: FILE: drivers/gpu/drm/xe/xe_vm.c:2261:
+					err = xe_eudebug_copy_vma_metadata(&op->remap.prev->eudebug.metadata,

total: 0 errors, 1 warnings, 0 checks, 485 lines checked
6950a7c43f9e drm/xe/eudebug: Add debug metadata support for xe_eudebug
-:48: CHECK:LINE_SPACING: Please don't use multiple blank lines
#48: FILE: drivers/gpu/drm/xe/xe_debug_metadata.c:198:
 
+

total: 0 errors, 0 warnings, 1 checks, 599 lines checked
8f29285760a5 drm/xe/eudebug: Implement vm_bind_op discovery
59784a8a0666 drm/xe/eudebug: Dynamically toggle debugger functionality
1bbfdddec9c4 drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test
-:16: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#16: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 194 lines checked
7419000de5f6 drm/xe/eudebug/ptl: Add support for extra attention register
cc6676ac179e drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3
86497262e6c8 drm/xe/eudebug: Add read/count/compare helper for eu attention
1a6695e5836f drm/xe/eudebug: Introduce EU pagefault handling interface
-:409: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#409: FILE: drivers/gpu/drm/xe/xe_eudebug.c:4234:
+	spinlock_t lock;

-:417: CHECK:COMPARISON_TO_NULL: Comparison to NULL could be written "!fence"
#417: FILE: drivers/gpu/drm/xe/xe_eudebug.c:4242:
+	if (fence == NULL)

-:514: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see function description of usleep_range() and udelay().
#514: FILE: drivers/gpu/drm/xe/xe_eudebug.c:4339:
+		udelay(200);

-:522: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see function description of usleep_range() and udelay().
#522: FILE: drivers/gpu/drm/xe/xe_eudebug.c:4347:
+		udelay(200);

total: 0 errors, 0 warnings, 4 checks, 774 lines checked
09839463caad drm/xe/vm: Support for adding null page VMA to VM on request
-:15: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#15: 
[1] https://lore.kernel.org/intel-xe/20230829231648.4438-1-yu.bruce.chang@intel.com/

total: 0 errors, 1 warnings, 0 checks, 31 lines checked
00b6bd24a242 drm/xe/eudebug: Enable EU pagefault handling



^ permalink raw reply	[flat|nested] 63+ messages in thread

* ✗ CI.KUnit: failure for Intel Xe GPU debug support (eudebug) v3
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (27 preceding siblings ...)
  2024-12-09 14:38 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-12-09 14:39 ` Patchwork
  2024-12-16 14:22 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev2) Patchwork
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2024-12-09 14:39 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-xe

== Series Details ==

Series: Intel Xe GPU debug support (eudebug) v3
URL   : https://patchwork.freedesktop.org/series/142295/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[14:38:55] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:39:00] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
ERROR:root:../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
In file included from ../include/linux/module.h:22,
                 from ../include/linux/device/driver.h:21,
                 from ../include/linux/device.h:32,
                 from ../include/linux/auxiliary_bus.h:11,
                 from ../include/linux/intel_vsec.h:5,
                 from ../drivers/gpu/drm/xe/xe_vsec.c:7:
../drivers/gpu/drm/xe/xe_vsec.c:233:18: error: expected ‘,’ or ‘;’ before ‘INTEL_VSEC’
  233 | MODULE_IMPORT_NS(INTEL_VSEC);
      |                  ^~~~~~~~~~
../include/linux/moduleparam.h:26:61: note: in definition of macro ‘__MODULE_INFO’
   26 |                 = __MODULE_INFO_PREFIX __stringify(tag) "=" info
      |                                                             ^~~~
../include/linux/module.h:299:33: note: in expansion of macro ‘MODULE_INFO’
  299 | #define MODULE_IMPORT_NS(ns)    MODULE_INFO(import_ns, ns)
      |                                 ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vsec.c:233:1: note: in expansion of macro ‘MODULE_IMPORT_NS’
  233 | MODULE_IMPORT_NS(INTEL_VSEC);
      | ^~~~~~~~~~~~~~~~
make[7]: *** [../scripts/Makefile.build:194: drivers/gpu/drm/xe/xe_vsec.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:440: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:440: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:440: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:440: drivers] Error 2
make[2]: *** [/kernel/Makefile:1989: .] Error 2
make[1]: *** [/kernel/Makefile:251: __sub-make] Error 2
make: *** [Makefile:251: __sub-make] Error 2

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 14:03   ` Christian König
@ 2024-12-09 14:56     ` Joonas Lahtinen
  2024-12-09 15:31     ` Simona Vetter
  1 sibling, 0 replies; 63+ messages in thread
From: Joonas Lahtinen @ 2024-12-09 14:56 UTC (permalink / raw)
  To: Christian König, Linux MM, Mika Kuoppala, intel-xe, lkml
  Cc: dri-devel, Andrzej Hajda, Maciej Patelczyk, Jonathan Cavitt,
	Thomas Hellstrom, Matthew Brost

(+ Thomas and Matt B who this was reviewed with as a concept)

Quoting Christian König (2024-12-09 16:03:04)
> On 09.12.24 at 14:33, Mika Kuoppala wrote:

<SNIP>

> > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > +                      void *buf, u64 len, bool write)
> > +{
> > +     struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > +     struct xe_userptr *up = &uvma->userptr;
> > +     struct xe_res_cursor cur = {};
> > +     int cur_len, ret = 0;
> > +
> > +     while (true) {
> > +             down_read(&vm->userptr.notifier_lock);
> > +             if (!xe_vma_userptr_check_repin(uvma))
> > +                     break;
> > +
> > +             spin_lock(&vm->userptr.invalidated_lock);
> > +             list_del_init(&uvma->userptr.invalidate_link);
> > +             spin_unlock(&vm->userptr.invalidated_lock);
> > +
> > +             up_read(&vm->userptr.notifier_lock);
> > +             ret = xe_vma_userptr_pin_pages(uvma);
> > +             if (ret)
> > +                     return ret;
> > +     }
> > +
> > +     if (!up->sg) {
> > +             ret = -EINVAL;
> > +             goto out_unlock_notifier;
> > +     }
> > +
> > +     for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
> > +          xe_res_next(&cur, cur_len)) {
> > +             void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
> 
> The interface basically creates a side channel to access userptrs in the
> way a userspace application would do without actually going through
> userspace.

That's not quite the case here.

The whole point of the debugger's ability to access memory is to access
any VMA in the GPU VM, emulating as closely as possible how the EUs
themselves would perform the access.

As mentioned in the other threads, that also involves special cache flushes
to make sure the contents have been flushed in and out of the EU caches when,
for example, we're modifying instruction code for breakpoints.

What the memory access function should do is to establish a temporary
pinning situation where the memory would be accessible just like it would
be for a short batchbuffer, but without interfering with the command streamers.

If this should be established in a different way from this patch, then
we should adapt of course.

> That is generally not something a device driver should ever do as far as 
> I can see.

Given the above explanation, does it make more sense? For debugging
purposes, we try to emulate the EU threads themselves accessing the
memory, not userspace at all.

Regards, Joonas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 14:03   ` Christian König
  2024-12-09 14:56     ` Joonas Lahtinen
@ 2024-12-09 15:31     ` Simona Vetter
  2024-12-09 15:42       ` Christian König
  2024-12-12  8:49       ` Thomas Hellström
  1 sibling, 2 replies; 63+ messages in thread
From: Simona Vetter @ 2024-12-09 15:31 UTC (permalink / raw)
  To: Christian König
  Cc: Mika Kuoppala, intel-xe, lkml, Linux MM, dri-devel, Andrzej Hajda,
	Maciej Patelczyk, Jonathan Cavitt

On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> On 09.12.24 at 14:33, Mika Kuoppala wrote:
> > From: Andrzej Hajda <andrzej.hajda@intel.com>
> > 
> > Debugger needs to read/write program's vmas including userptr_vma.
> > Since hmm_range_fault is used to pin userptr vmas, it is possible
> > to map those vmas from debugger context.
> 
> Oh, this implementation is extremely questionable as well. Adding the LKML
> and the MM list as well.
> 
> First of all hmm_range_fault() does *not* pin anything!
> 
> In other words you don't have a page reference when the function returns,
> but rather just a sequence number you can check for modifications.

I think it's all there: it holds the invalidation lock during the critical
access section, drops it when reacquiring pages, and retries until it works.

I think the issue is more that everyone hand-rolls userptr. Probably time
we standardize that and put it into gpuvm as an optional part, with
consistent locking, naming (like not calling it _pin_pages when it's an
unpinned userptr), kerneldoc and all the nice things so that we
stop consistently getting confused by other drivers' userptr code.

I think that was on the plan originally as an eventual step, I guess time
to pump that up. Matt/Thomas, thoughts?
-Sima

> 
> > v2: pin pages vs notifier, move to vm.c (Matthew)
> > v3: - iterate over system pages instead of DMA, fixes iommu enabled
> >      - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
> > 
> > Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
> > ---
> >   drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
> >   drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
> >   drivers/gpu/drm/xe/xe_vm.h      |  3 +++
> >   3 files changed, 52 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> > index 9d87df75348b..e5949e4dcad8 100644
> > --- a/drivers/gpu/drm/xe/xe_eudebug.c
> > +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> > @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
> >   		return ret;
> >   	}
> > -	return -EINVAL;
> > +	return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
> > +				    buf, bytes, write);
> >   }
> >   static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 0f17bc8b627b..224ff9e16941 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
> >   	}
> >   	kvfree(snap);
> >   }
> > +
> > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > +			 void *buf, u64 len, bool write)
> > +{
> > +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > +	struct xe_userptr *up = &uvma->userptr;
> > +	struct xe_res_cursor cur = {};
> > +	int cur_len, ret = 0;
> > +
> > +	while (true) {
> > +		down_read(&vm->userptr.notifier_lock);
> > +		if (!xe_vma_userptr_check_repin(uvma))
> > +			break;
> > +
> > +		spin_lock(&vm->userptr.invalidated_lock);
> > +		list_del_init(&uvma->userptr.invalidate_link);
> > +		spin_unlock(&vm->userptr.invalidated_lock);
> > +
> > +		up_read(&vm->userptr.notifier_lock);
> > +		ret = xe_vma_userptr_pin_pages(uvma);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	if (!up->sg) {
> > +		ret = -EINVAL;
> > +		goto out_unlock_notifier;
> > +	}
> > +
> > +	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
> > +	     xe_res_next(&cur, cur_len)) {
> > +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
> 
> The interface basically creates a side channel to access userptrs in the way
> a userspace application would do without actually going through userspace.
> 
> That is generally not something a device driver should ever do as far as I
> can see.
> 
> > +
> > +		cur_len = min(cur.size, cur.remaining);
> > +		if (write)
> > +			memcpy(ptr, buf, cur_len);
> > +		else
> > +			memcpy(buf, ptr, cur_len);
> > +		kunmap_local(ptr);
> > +		buf += cur_len;
> > +	}
> > +	ret = len;
> > +
> > +out_unlock_notifier:
> > +	up_read(&vm->userptr.notifier_lock);
> 
> I just strongly hope that this will prevent the mapping from changing.
> 
> Regards,
> Christian.
> 
> > +	return ret;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > index 23adb7442881..372ad40ad67f 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
> >   void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
> >   void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
> >   void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> > +
> > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > +			 void *buf, u64 len, bool write);
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 15:31     ` Simona Vetter
@ 2024-12-09 15:42       ` Christian König
  2024-12-09 15:45         ` Christian König
                           ` (2 more replies)
  2024-12-12  8:49       ` Thomas Hellström
  1 sibling, 3 replies; 63+ messages in thread
From: Christian König @ 2024-12-09 15:42 UTC (permalink / raw)
  To: Mika Kuoppala, intel-xe, lkml, Linux MM, dri-devel, Andrzej Hajda,
	Maciej Patelczyk, Jonathan Cavitt

On 09.12.24 at 16:31, Simona Vetter wrote:
> On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
>> On 09.12.24 at 14:33, Mika Kuoppala wrote:
>>> From: Andrzej Hajda <andrzej.hajda@intel.com>
>>>
>>> Debugger needs to read/write program's vmas including userptr_vma.
>>> Since hmm_range_fault is used to pin userptr vmas, it is possible
>>> to map those vmas from debugger context.
>> Oh, this implementation is extremely questionable as well. Adding the LKML
>> and the MM list as well.
>>
>> First of all hmm_range_fault() does *not* pin anything!
>>
>> In other words you don't have a page reference when the function returns,
>> but rather just a sequence number you can check for modifications.
> I think it's all there, holds the invalidation lock during the critical
> access/section, drops it when reacquiring pages, retries until it works.
>
> I think the issue is more that everyone hand-rolls userptr.

Well that is part of the issue.

The general problem here is that the eudebug interface tries to simulate
the memory accesses as they would have been performed by the hardware.

What the debugger should probably do is to cleanly attach to the
application, find out which CPU address is mapped to which
GPU address, and then use the standard ptrace interfaces.

The whole interface re-invents a lot of functionality which is already
there, just because you don't like the idea of attaching to the debugged
application from userspace.

As far as I can see this whole idea is extremely questionable. This 
looks like re-inventing the wheel in a different color.

Regards,
Christian.

>   Probably time
> we standardize that and put it into gpuvm as an optional part, with
> consistent locking, naming (like not calling it _pin_pages when it's
> unpinned userptr), kerneldoc and all the nice things so that we
> stop consistently getting confused by other driver's userptr code.
>
> I think that was on the plan originally as an eventual step, I guess time
> to pump that up. Matt/Thomas, thoughts?
> -Sima
>
>>> v2: pin pages vs notifier, move to vm.c (Matthew)
>>> v3: - iterate over system pages instead of DMA, fixes iommu enabled
>>>       - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
>>>
>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
>>> ---
>>>    drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
>>>    drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
>>>    drivers/gpu/drm/xe/xe_vm.h      |  3 +++
>>>    3 files changed, 52 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
>>> index 9d87df75348b..e5949e4dcad8 100644
>>> --- a/drivers/gpu/drm/xe/xe_eudebug.c
>>> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
>>> @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
>>>    		return ret;
>>>    	}
>>> -	return -EINVAL;
>>> +	return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
>>> +				    buf, bytes, write);
>>>    }
>>>    static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>> index 0f17bc8b627b..224ff9e16941 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>> @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
>>>    	}
>>>    	kvfree(snap);
>>>    }
>>> +
>>> +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
>>> +			 void *buf, u64 len, bool write)
>>> +{
>>> +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
>>> +	struct xe_userptr *up = &uvma->userptr;
>>> +	struct xe_res_cursor cur = {};
>>> +	int cur_len, ret = 0;
>>> +
>>> +	while (true) {
>>> +		down_read(&vm->userptr.notifier_lock);
>>> +		if (!xe_vma_userptr_check_repin(uvma))
>>> +			break;
>>> +
>>> +		spin_lock(&vm->userptr.invalidated_lock);
>>> +		list_del_init(&uvma->userptr.invalidate_link);
>>> +		spin_unlock(&vm->userptr.invalidated_lock);
>>> +
>>> +		up_read(&vm->userptr.notifier_lock);
>>> +		ret = xe_vma_userptr_pin_pages(uvma);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	if (!up->sg) {
>>> +		ret = -EINVAL;
>>> +		goto out_unlock_notifier;
>>> +	}
>>> +
>>> +	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
>>> +	     xe_res_next(&cur, cur_len)) {
>>> +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
>> The interface basically creates a side channel to access userptrs in the way
>> a userspace application would do without actually going through userspace.
>>
>> That is generally not something a device driver should ever do as far as I
>> can see.
>>
>>> +
>>> +		cur_len = min(cur.size, cur.remaining);
>>> +		if (write)
>>> +			memcpy(ptr, buf, cur_len);
>>> +		else
>>> +			memcpy(buf, ptr, cur_len);
>>> +		kunmap_local(ptr);
>>> +		buf += cur_len;
>>> +	}
>>> +	ret = len;
>>> +
>>> +out_unlock_notifier:
>>> +	up_read(&vm->userptr.notifier_lock);
>> I just strongly hope that this will prevent the mapping from changing.
>>
>> Regards,
>> Christian.
>>
>>> +	return ret;
>>> +}
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>>> index 23adb7442881..372ad40ad67f 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.h
>>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>>> @@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>>>    void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>>>    void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>>>    void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
>>> +
>>> +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
>>> +			 void *buf, u64 len, bool write);


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 15:42       ` Christian König
@ 2024-12-09 15:45         ` Christian König
  2024-12-10  9:33         ` Joonas Lahtinen
  2024-12-10 11:17         ` Simona Vetter
  2 siblings, 0 replies; 63+ messages in thread
From: Christian König @ 2024-12-09 15:45 UTC (permalink / raw)
  To: Mika Kuoppala, intel-xe, lkml, Linux MM, dri-devel, Andrzej Hajda,
	Maciej Patelczyk, Jonathan Cavitt

On 09.12.24 at 16:42, Christian König wrote:
> On 09.12.24 at 16:31, Simona Vetter wrote:
>> On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
>>> On 09.12.24 at 14:33, Mika Kuoppala wrote:
>>>> From: Andrzej Hajda <andrzej.hajda@intel.com>
>>>>
>>>> Debugger needs to read/write program's vmas including userptr_vma.
>>>> Since hmm_range_fault is used to pin userptr vmas, it is possible
>>>> to map those vmas from debugger context.
>>> Oh, this implementation is extremely questionable as well. Adding 
>>> the LKML
>>> and the MM list as well.
>>>
>>> First of all hmm_range_fault() does *not* pin anything!
>>>
>>> In other words you don't have a page reference when the function 
>>> returns,
>>> but rather just a sequence number you can check for modifications.
>> I think it's all there, holds the invalidation lock during the critical
>> access/section, drops it when reacquiring pages, retries until it works.

One thing I'm missing: Is it possible that mappings are created read-only?

E.g. that you have a read-only mapping of libc and then write through 
this interface to it?

Offhand I don't see anything preventing this (well, it could be that you
don't allow creating read-only mappings).

Regards,
Christian.

>>
>> I think the issue is more that everyone hand-rolls userptr.
>
> Well that is part of the issue.
>
> The general problem here is that the eudebug interface tries to
> simulate the memory accesses as they would have been performed by the hardware.
>
> What the debugger should probably do is to cleanly attach to the 
> application, get the information which CPU address is mapped to which 
> GPU address and then use the standard ptrace interfaces.
>
> The whole interface re-invents a lot of functionality which is already
> there, just because you don't like the idea of attaching to the debugged
> application from userspace.
>
> As far as I can see this whole idea is extremely questionable. This 
> looks like re-inventing the wheel in a different color.
>
> Regards,
> Christian.
>
>>   Probably time
>> we standardize that and put it into gpuvm as an optional part, with
>> consistent locking, naming (like not calling it _pin_pages when it's
>> unpinned userptr), kerneldoc and all the nice things so that we
>> stop consistently getting confused by other driver's userptr code.
>>
>> I think that was on the plan originally as an eventual step, I guess 
>> time
>> to pump that up. Matt/Thomas, thoughts?
>> -Sima
>>
>>>> v2: pin pages vs notifier, move to vm.c (Matthew)
>>>> v3: - iterate over system pages instead of DMA, fixes iommu enabled
>>>>       - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
>>>>
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
>>>>    drivers/gpu/drm/xe/xe_vm.c      | 47 
>>>> +++++++++++++++++++++++++++++++++
>>>>    drivers/gpu/drm/xe/xe_vm.h      |  3 +++
>>>>    3 files changed, 52 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c 
>>>> b/drivers/gpu/drm/xe/xe_eudebug.c
>>>> index 9d87df75348b..e5949e4dcad8 100644
>>>> --- a/drivers/gpu/drm/xe/xe_eudebug.c
>>>> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
>>>> @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct 
>>>> xe_vma *vma, u64 offset_in_vma,
>>>>            return ret;
>>>>        }
>>>> -    return -EINVAL;
>>>> +    return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
>>>> +                    buf, bytes, write);
>>>>    }
>>>>    static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>> index 0f17bc8b627b..224ff9e16941 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>> @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct 
>>>> xe_vm_snapshot *snap)
>>>>        }
>>>>        kvfree(snap);
>>>>    }
>>>> +
>>>> +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
>>>> +             void *buf, u64 len, bool write)
>>>> +{
>>>> +    struct xe_vm *vm = xe_vma_vm(&uvma->vma);
>>>> +    struct xe_userptr *up = &uvma->userptr;
>>>> +    struct xe_res_cursor cur = {};
>>>> +    int cur_len, ret = 0;
>>>> +
>>>> +    while (true) {
>>>> +        down_read(&vm->userptr.notifier_lock);
>>>> +        if (!xe_vma_userptr_check_repin(uvma))
>>>> +            break;
>>>> +
>>>> +        spin_lock(&vm->userptr.invalidated_lock);
>>>> + list_del_init(&uvma->userptr.invalidate_link);
>>>> +        spin_unlock(&vm->userptr.invalidated_lock);
>>>> +
>>>> +        up_read(&vm->userptr.notifier_lock);
>>>> +        ret = xe_vma_userptr_pin_pages(uvma);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    if (!up->sg) {
>>>> +        ret = -EINVAL;
>>>> +        goto out_unlock_notifier;
>>>> +    }
>>>> +
>>>> +    for (xe_res_first_sg_system(up->sg, offset, len, &cur); 
>>>> cur.remaining;
>>>> +         xe_res_next(&cur, cur_len)) {
>>>> +        void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
>>> The interface basically creates a side channel to access userptrs in 
>>> the way
>>> a userspace application would do without actually going through
>>> userspace.
>>>
>>> That is generally not something a device driver should ever do as 
>>> far as I
>>> can see.
>>>
>>>> +
>>>> +        cur_len = min(cur.size, cur.remaining);
>>>> +        if (write)
>>>> +            memcpy(ptr, buf, cur_len);
>>>> +        else
>>>> +            memcpy(buf, ptr, cur_len);
>>>> +        kunmap_local(ptr);
>>>> +        buf += cur_len;
>>>> +    }
>>>> +    ret = len;
>>>> +
>>>> +out_unlock_notifier:
>>>> +    up_read(&vm->userptr.notifier_lock);
>>> I just strongly hope that this will prevent the mapping from changing.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +    return ret;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>>>> index 23adb7442881..372ad40ad67f 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.h
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>>>> @@ -280,3 +280,6 @@ struct xe_vm_snapshot 
>>>> *xe_vm_snapshot_capture(struct xe_vm *vm);
>>>>    void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>>>>    void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct 
>>>> drm_printer *p);
>>>>    void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
>>>> +
>>>> +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
>>>> +             void *buf, u64 len, bool write);
>


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/26] ptrace: export ptrace_may_access
  2024-12-09 13:32 ` [PATCH 01/26] ptrace: export ptrace_may_access Mika Kuoppala
@ 2024-12-10  4:29   ` Christoph Hellwig
  2024-12-12  9:16     ` Joonas Lahtinen
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2024-12-10  4:29 UTC (permalink / raw)
  To: Mika Kuoppala
  Cc: intel-xe, dri-devel, christian.koenig, Oleg Nesterov,
	linux-kernel, Dave Airlie, Lucas De Marchi, Matthew Brost,
	Andi Shyti, Joonas Lahtinen, Maciej Patelczyk, Dominik Grzegorzek,
	Jonathan Cavitt, Andi Shyti

On Mon, Dec 09, 2024 at 03:32:52PM +0200, Mika Kuoppala wrote:
> xe driver would like to allow fine grained access control
> for GDB debugger using ptrace. Without this export, the only
> option would be to check for CAP_SYS_ADMIN.
> 
> The check intended for an ioctl to attach a GPU debugger
> is similar to the ptrace use case: allow a calling process
> to manipulate a target process if it has the necessary
> capabilities or the same permissions, as described in
> Documentation/process/adding-syscalls.rst.
> 
> Export ptrace_may_access function to allow GPU debugger to
> have identical access control for debugger(s)
> as a CPU debugger.

This seems to miss an actual user, or you forgot to Cc linux-kernel on it.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 15:42       ` Christian König
  2024-12-09 15:45         ` Christian König
@ 2024-12-10  9:33         ` Joonas Lahtinen
  2024-12-10 10:00           ` Christian König
  2024-12-10 11:17         ` Simona Vetter
  2 siblings, 1 reply; 63+ messages in thread
From: Joonas Lahtinen @ 2024-12-10  9:33 UTC (permalink / raw)
  To: Andrzej Hajda, Christian König, Jonathan Cavitt, Linux MM,
	Maciej Patelczyk, Mika Kuoppala, dri-devel, intel-xe, lkml

Quoting Christian König (2024-12-09 17:42:32)
> Am 09.12.24 um 16:31 schrieb Simona Vetter:
> > On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> >> Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
> >>> From: Andrzej Hajda <andrzej.hajda@intel.com>
> >>>
> >>> Debugger needs to read/write program's vmas including userptr_vma.
> >>> Since hmm_range_fault is used to pin userptr vmas, it is possible
> >>> to map those vmas from debugger context.
> >> Oh, this implementation is extremely questionable as well. Adding the LKML
> >> and the MM list as well.
> >>
> >> First of all hmm_range_fault() does *not* pin anything!
> >>
> >> In other words you don't have a page reference when the function returns,
> >> but rather just a sequence number you can check for modifications.
> > I think it's all there, holds the invalidation lock during the critical
> > access/section, drops it when reacquiring pages, retries until it works.
> >
> > I think the issue is more that everyone hand-rolls userptr.
> 
> Well that is part of the issue.
> 
> The general problem here is that the eudebug interface tries to simulate 
> the memory accesses as they would have happened by the hardware.

Could you elaborate on what exactly the problem with that is?

It's pretty much the equivalent of ptrace() poke/peek but for GPU memory.
And it is exactly the kind of interface that makes sense for debugger as
GPU memory != CPU memory, and they don't need to align at all.

> What the debugger should probably do is to cleanly attach to the 
> application, get the information which CPU address is mapped to which 
> GPU address and then use the standard ptrace interfaces.

I don't quite agree here -- at all. "Which CPU address is mapped to
which GPU address" makes no sense when the GPU address space and CPU
address space are completely controlled by the userspace driver/application.

Please try to consider things outside of the ROCm architecture.

Something like a register scratch region or EU instructions should not
even be mapped to the CPU address space, as the CPU has no business
accessing it during normal operation. And the backing of such a region will
vary per context/LRC for the same virtual address, per EU thread.

You seem to be suggesting to rewrite even our userspace driver to behave
the same way as ROCm driver does just so that we could implement debug memory
accesses via ptrace() to the CPU address space.

That seems a bit of a radical suggestion, especially given the drawbacks
pointed out in your suggested design.

> The whole interface re-invents a lot of functionality which is already 
> there 

I'm not really sure I would call adding a single interface for memory
reading and writing to be "re-inventing a lot of functionality".

All the functionality behind this interface will be needed by GPU core
dumping, anyway. Just like for the other patch series.

> just because you don't like the idea to attach to the debugged 
> application in userspace.

A few points that have been brought up as drawback to the
GPU debug through ptrace(), but to recap a few relevant ones for this
discussion:

- You can only really support GDB stop-all mode, or at least have to
  stop all the CPU threads while you control the GPU threads, to
  avoid interference. Elaborated on this more in the other threads.
- Controlling the GPU threads will always interfere with CPU threads.
  Doesn't seem feasible to single-step an EU thread while CPU threads
  continue to run freely?
- You are very much restricted by the CPU VA ~ GPU VA alignment
  requirement, which is not true for OpenGL or Vulkan etc. Seems
  like one of the reasons why ROCm debugging is not easily extendable
  outside compute?
- You have to expose extra memory to CPU process just for GPU
  debugger access and keep track of GPU VA for each. Makes the GPU more
  prone to OOB writes from CPU. Exactly what not mapping the memory
  to CPU tried to protect the GPU from to begin with.

> As far as I can see this whole idea is extremely questionable. This 
> looks like re-inventing the wheel in a different color.

I see it like reinventing a round wheel compared to an octagonal wheel.

Could you elaborate with facts much more on your position why the ROCm
debugger design is an absolute must for others to adopt?

Otherwise it just looks like you are trying to prevent others from
implementing a more flexible debugging interface through vague comments about
"questionable design" without going into details. Not listing many concrete
benefits, nor addressing the very concretely expressed drawbacks of your
suggested design, makes it seem like a very biased, non-technical discussion.

So while review interest and any comments are very much appreciated, please
also work on providing a bit more reasoning and facts instead of just claiming
things. That'll help make the discussion much more fruitful.

Regards, Joonas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-10  9:33         ` Joonas Lahtinen
@ 2024-12-10 10:00           ` Christian König
  2024-12-10 11:57             ` Joonas Lahtinen
  0 siblings, 1 reply; 63+ messages in thread
From: Christian König @ 2024-12-10 10:00 UTC (permalink / raw)
  To: Joonas Lahtinen, Andrzej Hajda, Jonathan Cavitt, Linux MM,
	Maciej Patelczyk, Mika Kuoppala, dri-devel, intel-xe, lkml,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 7394 bytes --]

Am 10.12.24 um 10:33 schrieb Joonas Lahtinen:
> Quoting Christian König (2024-12-09 17:42:32)
>> Am 09.12.24 um 16:31 schrieb Simona Vetter:
>>> On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
>>>> Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
>>>>> From: Andrzej Hajda<andrzej.hajda@intel.com>
>>>>>
>>>>> Debugger needs to read/write program's vmas including userptr_vma.
>>>>> Since hmm_range_fault is used to pin userptr vmas, it is possible
>>>>> to map those vmas from debugger context.
>>>> Oh, this implementation is extremely questionable as well. Adding the LKML
>>>> and the MM list as well.
>>>>
>>>> First of all hmm_range_fault() does *not* pin anything!
>>>>
>>>> In other words you don't have a page reference when the function returns,
>>>> but rather just a sequence number you can check for modifications.
>>> I think it's all there, holds the invalidation lock during the critical
>>> access/section, drops it when reacquiring pages, retries until it works.
>>>
>>> I think the issue is more that everyone hand-rolls userptr.
>> Well that is part of the issue.
>>
>> The general problem here is that the eudebug interface tries to simulate
>> the memory accesses as they would have happened by the hardware.
> Could you elaborate, what is that a problem in that, exactly?
>
> It's pretty much the equivalent of ptrace() poke/peek but for GPU memory.

Exactly that here. You try to debug the GPU without taking control of 
the CPU process.

This means that you have to re-implement, for the GPU, all the debug
functionality which was previously implemented for the CPU process.

And that in turn creates a massive attack surface for security related 
problems, especially when you start messing with things like userptrs 
which have a very low level interaction with core memory management.

> And it is exactly the kind of interface that makes sense for debugger as
> GPU memory != CPU memory, and they don't need to align at all.

And that is what I strongly disagree on. When you debug the GPU it is 
mandatory to gain control of the CPU process as well.

The CPU process is basically the overseer of the GPU activity, so it 
should know everything about the GPU operation, for example what a 
mapping actually means.

The kernel driver and the hardware only have the information necessary 
to execute the work prepared by the CPU process. So the information 
available is limited to begin with.

>> What the debugger should probably do is to cleanly attach to the
>> application, get the information which CPU address is mapped to which
>> GPU address and then use the standard ptrace interfaces.
> I don't quite agree here -- at all. "Which CPU address is mapped to
> which GPU address" makes no sense when the GPU address space and CPU
> address space is completely controlled by the userspace driver/application.

Yeah, that's the reason why you should ask the userspace 
driver/application for the necessary information and not go over the 
kernel to debug things.

> Please try to consider things outside of the ROCm architecture.

Well I consider a good part of the ROCm architecture rather broken 
exactly because we haven't pushed back hard enough on bad ideas.

> Something like a register scratch region or EU instructions should not
> even be mapped to CPU address space as CPU has no business accessing it
> during normal operation. And backing of such region will vary per
> context/LRC on the same virtual address per EU thread.
>
> You seem to be suggesting to rewrite even our userspace driver to behave
> the same way as ROCm driver does just so that we could implement debug memory
> accesses via ptrace() to the CPU address space.

Oh, well certainly not. That ROCm has a 1-to-1 mapping between CPU and
GPU is one thing I've pushed back on massively, and it has now proven to be
problematic.

> That seems bit of a radical suggestion, especially given the drawbacks
> pointed out in your suggested design.
>
>> The whole interface re-invents a lot of functionality which is already
>> there
> I'm not really sure I would call adding a single interface for memory
> reading and writing to be "re-inventing a lot of functionality".
>
> All the functionality behind this interface will be needed by GPU core
> dumping, anyway. Just like for the other patch series.

As far as I can see, exactly that is an absolute no-go. Device core 
dumping should *never ever* touch memory imported by userptrs.

That's what process core dumping is good for.

>> just because you don't like the idea to attach to the debugged
>> application in userspace.
> A few points that have been brought up as drawback to the
> GPU debug through ptrace(), but to recap a few relevant ones for this
> discussion:
>
> - You can only really support GDB stop-all mode or at least have to
>    stop all the CPU threads while you control the GPU threads to
>    avoid interference. Elaborated on this on the other threads more.
> - Controlling the GPU threads will always interfere with CPU threads.
>    Doesn't seem feasible to single-step an EU thread while CPU threads
>    continue to run freely?

I would say no.

> - You are very much restricted by the CPU VA ~ GPU VA alignment
>    requirement, which is not true for OpenGL or Vulkan etc. Seems
>    like one of the reasons why ROCm debugging is not easily extendable
>    outside compute?

Well as long as you can't take debugged threads from the hardware you 
can pretty much forget any OpenGL or Vulkan debugging with this 
interface since it violates the dma_fence restrictions in the kernel.

> - You have to expose extra memory to CPU process just for GPU
>    debugger access and keep track of GPU VA for each. Makes the GPU more
>    prone to OOB writes from CPU. Exactly what not mapping the memory
>    to CPU tried to protect the GPU from to begin with.
>
>> As far as I can see this whole idea is extremely questionable. This
>> looks like re-inventing the wheel in a different color.
> I see it like reinventing a round wheel compared to octagonal wheel.
>
> Could you elaborate with facts much more on your position why the ROCm
> debugger design is an absolute must for others to adopt?

Well I'm trying to prevent some of the mistakes we did with the ROCm design.

And trying to re-invent well proven kernel interfaces is one of the big 
mistakes made in the ROCm design.

If you really want to expose an interface to userspace which walks the 
process page table, installs an MMU notifier, kmaps the resulting page 
and then memcpy to/from it then you absolutely *must* run that by guys 
like Christoph Hellwig, Andrew and even Linus.

I'm pretty sure that those guys will note that a device driver should 
absolutely not mess with such stuff.

Regards,
Christian.

> Otherwise it just looks like you are trying to prevent others from
> implementing a more flexible debugging interface through vague comments about
> "questionable design" without going into details. Not listing much concrete
> benefits nor addressing the very concretely expressed drawbacks of your
> suggested design, makes it seem like a very biased non-technical discussion.
>
> So while review interest and any comments are very much appreciated, please
> also work on providing bit more reasoning and facts instead of just claiming
> things. That'll help make the discussion much more fruitful.
>
> Regards, Joonas

[-- Attachment #2: Type: text/html, Size: 10875 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 15:42       ` Christian König
  2024-12-09 15:45         ` Christian König
  2024-12-10  9:33         ` Joonas Lahtinen
@ 2024-12-10 11:17         ` Simona Vetter
  2 siblings, 0 replies; 63+ messages in thread
From: Simona Vetter @ 2024-12-10 11:17 UTC (permalink / raw)
  To: Christian König
  Cc: Mika Kuoppala, intel-xe, lkml, Linux MM, dri-devel, Andrzej Hajda,
	Maciej Patelczyk, Jonathan Cavitt

On Mon, Dec 09, 2024 at 04:42:32PM +0100, Christian König wrote:
> Am 09.12.24 um 16:31 schrieb Simona Vetter:
> > On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> > > Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
> > > > From: Andrzej Hajda <andrzej.hajda@intel.com>
> > > > 
> > > > Debugger needs to read/write program's vmas including userptr_vma.
> > > > Since hmm_range_fault is used to pin userptr vmas, it is possible
> > > > to map those vmas from debugger context.
> > > Oh, this implementation is extremely questionable as well. Adding the LKML
> > > and the MM list as well.
> > > 
> > > First of all hmm_range_fault() does *not* pin anything!
> > > 
> > > In other words you don't have a page reference when the function returns,
> > > but rather just a sequence number you can check for modifications.
> > I think it's all there, holds the invalidation lock during the critical
> > access/section, drops it when reacquiring pages, retries until it works.
> > 
> > I think the issue is more that everyone hand-rolls userptr.
> 
> Well that is part of the issue.

Yeah I ignored the other part, because that didn't seem super interesting
really.

Like for compute you can make the architectural assumption that gpu
addresses match cpu addresses, and this all becomes easy. Or at least
easier, there's still the issue of having to call into the driver for gpu
side flushes.

But for 3d userptr that's not the case, and you get two options:
- Have some tracking structure that umd and debugger agree on, so stable
  abi fun and all that, so you can find the mapping. And I think in some
  cases this would need to be added first.
- Just ask the kernel, which already knows.

Like for cpu mmaps we also don't inject tracking code into mmap/munmap, we
just ask the kernel through /proc/pid/maps. This sounds like the same
question, probably should have a similar answer.

I guess you can bikeshed a bit about whether the kernel should only
do the address translation and then you do a ptrace poke/peek for the
actual access. But again you need some flushes, so this might be a bit
silly.

But fundamentally this makes sense to me to ask the entity that actually
knows how userptr areas map to memory.
-Sima

> 
> The general problem here is that the eudebug interface tries to simulate the
> memory accesses as they would have happened by the hardware.
> 
> What the debugger should probably do is to cleanly attach to the
> application, get the information which CPU address is mapped to which GPU
> address and then use the standard ptrace interfaces.
> 
> The whole interface re-invents a lot of functionality which is already there
> just because you don't like the idea to attach to the debugged application
> in userspace.
> 
> As far as I can see this whole idea is extremely questionable. This looks
> like re-inventing the wheel in a different color.
> 
> Regards,
> Christian.
> 
> >   Probably time
> > we standardize that and put it into gpuvm as an optional part, with
> > consistent locking, naming (like not calling it _pin_pages when it's
> > unpinned userptr), kerneldoc and all the nice things so that we
> > stop consistently getting confused by other driver's userptr code.
> > 
> > I think that was on the plan originally as an eventual step, I guess time
> > to pump that up. Matt/Thomas, thoughts?
> > -Sima
> > 
> > > > v2: pin pages vs notifier, move to vm.c (Matthew)
> > > > v3: - iterate over system pages instead of DMA, fixes iommu enabled
> > > >       - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
> > > > 
> > > > Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > > > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
> > > > ---
> > > >    drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
> > > >    drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++++++
> > > >    drivers/gpu/drm/xe/xe_vm.h      |  3 +++
> > > >    3 files changed, 52 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> > > > index 9d87df75348b..e5949e4dcad8 100644
> > > > --- a/drivers/gpu/drm/xe/xe_eudebug.c
> > > > +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> > > > @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
> > > >    		return ret;
> > > >    	}
> > > > -	return -EINVAL;
> > > > +	return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
> > > > +				    buf, bytes, write);
> > > >    }
> > > >    static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > index 0f17bc8b627b..224ff9e16941 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
> > > >    	}
> > > >    	kvfree(snap);
> > > >    }
> > > > +
> > > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > > > +			 void *buf, u64 len, bool write)
> > > > +{
> > > > +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > > > +	struct xe_userptr *up = &uvma->userptr;
> > > > +	struct xe_res_cursor cur = {};
> > > > +	int cur_len, ret = 0;
> > > > +
> > > > +	while (true) {
> > > > +		down_read(&vm->userptr.notifier_lock);
> > > > +		if (!xe_vma_userptr_check_repin(uvma))
> > > > +			break;
> > > > +
> > > > +		spin_lock(&vm->userptr.invalidated_lock);
> > > > +		list_del_init(&uvma->userptr.invalidate_link);
> > > > +		spin_unlock(&vm->userptr.invalidated_lock);
> > > > +
> > > > +		up_read(&vm->userptr.notifier_lock);
> > > > +		ret = xe_vma_userptr_pin_pages(uvma);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > > +	if (!up->sg) {
> > > > +		ret = -EINVAL;
> > > > +		goto out_unlock_notifier;
> > > > +	}
> > > > +
> > > > +	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
> > > > +	     xe_res_next(&cur, cur_len)) {
> > > > +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
> > > The interface basically creates a side channel to access userptrs in the way
> > > an userspace application would do without actually going through userspace.
> > > 
> > > That is generally not something a device driver should ever do as far as I
> > > can see.
> > > 
> > > > +
> > > > +		cur_len = min(cur.size, cur.remaining);
> > > > +		if (write)
> > > > +			memcpy(ptr, buf, cur_len);
> > > > +		else
> > > > +			memcpy(buf, ptr, cur_len);
> > > > +		kunmap_local(ptr);
> > > > +		buf += cur_len;
> > > > +	}
> > > > +	ret = len;
> > > > +
> > > > +out_unlock_notifier:
> > > > +	up_read(&vm->userptr.notifier_lock);
> > > I just strongly hope that this will prevent the mapping from changing.
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > +	return ret;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > > > index 23adb7442881..372ad40ad67f 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > > @@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
> > > >    void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
> > > >    void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
> > > >    void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> > > > +
> > > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > > > +			 void *buf, u64 len, bool write);
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-10 10:00           ` Christian König
@ 2024-12-10 11:57             ` Joonas Lahtinen
  2024-12-10 14:03               ` Christian König
  0 siblings, 1 reply; 63+ messages in thread
From: Joonas Lahtinen @ 2024-12-10 11:57 UTC (permalink / raw)
  To: Andrzej Hajda, Christian König, Christoph Hellwig,
	Jonathan Cavitt, Linux MM, Maciej Patelczyk, Mika Kuoppala,
	dri-devel, intel-xe, lkml

Quoting Christian König (2024-12-10 12:00:48)
> Am 10.12.24 um 10:33 schrieb Joonas Lahtinen:
> 
>     Quoting Christian König (2024-12-09 17:42:32)
> 
>         Am 09.12.24 um 16:31 schrieb Simona Vetter:
> 
>             On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> 
>                 Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
> 
>                     From: Andrzej Hajda <andrzej.hajda@intel.com>
> 
>                     Debugger needs to read/write program's vmas including userptr_vma.
>                     Since hmm_range_fault is used to pin userptr vmas, it is possible
>                     to map those vmas from debugger context.
> 
>                 Oh, this implementation is extremely questionable as well. Adding the LKML
>                 and the MM list as well.
> 
>                 First of all hmm_range_fault() does *not* pin anything!
> 
>                 In other words you don't have a page reference when the function returns,
>                 but rather just a sequence number you can check for modifications.
> 
>             I think it's all there, holds the invalidation lock during the critical
>             access/section, drops it when reacquiring pages, retries until it works.
> 
>             I think the issue is more that everyone hand-rolls userptr.
> 
>         Well that is part of the issue.
> 
>         The general problem here is that the eudebug interface tries to simulate
>         the memory accesses as they would have happened by the hardware.
> 
>     Could you elaborate, what is that a problem in that, exactly?
> 
>     It's pretty much the equivalent of ptrace() poke/peek but for GPU memory.
> 
> 
> Exactly that here. You try to debug the GPU without taking control of the CPU
> process.

You seem to have a built-in expectation that the CPU threads and memory space
must be interfered with in order to debug a completely different set of threads
and memory space elsewhere that executes independently. I don't quite see why?

In debugging massively parallel workloads, it's a huge drawback to be limited
to stop-all mode in GDB. If the ROCm folks are fine with such a limitation, I
have nothing against them keeping it. It was simply a starting design
principle here to avoid such a limitation.

> This means that you have to re-implement all debug functionalities which where
> previously invested for the CPU process for the GPU once more.

Seems like a strawman argument. Can you list the "all interfaces" being added
that would instead be possible via indirection through ptrace(), beyond peek/poke?

> And that in turn creates a massive attack surface for security related
> problems, especially when you start messing with things like userptrs which
> have a very low level interaction with core memory management.

Again, just seems like a strawman argument. You seem to generalize to some massive
attack surface of hypothetical interfaces which you don't list. We're talking
about memory peek/poke here.

Can you explain the high-level difference from security perspective for
temporarily pinning userptr pages to write them to page tables for GPU to
execute a dma-fence workload with and temporarily pinning pages for
peek/poke?

>     And it is exactly the kind of interface that makes sense for debugger as
>     GPU memory != CPU memory, and they don't need to align at all.
> 
> 
> And that is what I strongly disagree on. When you debug the GPU it is mandatory
> to gain control of the CPU process as well.

You are free to disagree on that. I simply don't agree and have in this
and previous email presented multiple reasons as to why not. We can
agree to disagree on the topic.

> The CPU process is basically the overseer of the GPU activity, so it should
> know everything about the GPU operation, for example what a mapping actually
> means.

How does that relate to what is being discussed here? You just seem to
explain how you think userspace driver should work: Maintain a shadow
tree of each ppGTT VM layout? I don't agree on that, but I think it is
slightly irrelevant here.

> The kernel driver and the hardware only have the information necessary to
> execute the work prepared by the CPU process. So the information available is
> limited to begin with.

And the point here is? Are you saying the kernel does not know the actual
mappings maintained in the GPU page tables?

>         What the debugger should probably do is to cleanly attach to the
>         application, get the information which CPU address is mapped to which
>         GPU address and then use the standard ptrace interfaces.
> 
>     I don't quite agree here -- at all. "Which CPU address is mapped to
>     which GPU address" makes no sense when the GPU address space and CPU
>     address space is completely controlled by the userspace driver/application.
> 
> 
> Yeah, that's the reason why you should ask the userspace driver/application for
> the necessary information and not go over the kernel to debug things.

What hypothetical necessary information are you referring to exactly?

I already explained there are good reasons not to map all the GPU memory
into the CPU address space.

>     Please try to consider things outside of the ROCm architecture.
> 
> 
> Well I consider a good part of the ROCm architecture rather broken exactly
> because we haven't pushed back hard enough on bad ideas.
> 
> 
>     Something like a register scratch region or EU instructions should not
>     even be mapped to CPU address space as CPU has no business accessing it
>     during normal operation. And backing of such region will vary per
>     context/LRC on the same virtual address per EU thread.
> 
>     You seem to be suggesting to rewrite even our userspace driver to behave
>     the same way as ROCm driver does just so that we could implement debug memory
>     accesses via ptrace() to the CPU address space.
> 
> 
> Oh, well certainly not. That ROCm has an 1 to 1 mapping between CPU and GPU is
> one thing I've pushed back massively on and has now proven to be problematic.

Right, so is your claim then that instead of being 1:1 the CPU address space
should be a superset of all GPU address spaces instead to make sure
ptrace() can modify all memory?

Cause I'm slightly lost here as you don't give much reasoning, just
claim things to be certain way.

>     That seems bit of a radical suggestion, especially given the drawbacks
>     pointed out in your suggested design.
> 
> 
>         The whole interface re-invents a lot of functionality which is already
>         there
> 
>     I'm not really sure I would call adding a single interface for memory
>     reading and writing to be "re-inventing a lot of functionality".
> 
>     All the functionality behind this interface will be needed by GPU core
>     dumping, anyway. Just like for the other patch series.
> 
> 
> As far as I can see exactly that's an absolutely no-go. Device core dumping
> should *never ever* touch memory imported by userptrs.

Could you again elaborate on what the great difference is compared to the
short-term pinning used for dma-fence workloads? Just the kmap?

> That's what process core dumping is good for.

Not really sure I agree. If you do not dump the memory as seen by the
GPU, then you need to go parsing the CPU address space in order to make
sense of which buffers were mapped where, and the CPU memory contents
containing that metadata could be corrupt, as we're dealing with a crashing
app to begin with.

Big point of relying to the information from GPU VM for the GPU memory layout
is that it won't be corrupted by rogue memory accesses in CPU process.

>         just because you don't like the idea to attach to the debugged
>         application in userspace.
> 
>     A few points that have been brought up as drawback to the
>     GPU debug through ptrace(), but to recap a few relevant ones for this
>     discussion:
> 
>     - You can only really support GDB stop-all mode or at least have to
>       stop all the CPU threads while you control the GPU threads to
>       avoid interference. Elaborated on this on the other threads more.
>     - Controlling the GPU threads will always interfere with CPU threads.
>       Doesn't seem feasible to single-step an EU thread while CPU threads
>       continue to run freely?
> 
> 
> I would say no.

Should this be understood to mean that you agree these are limitations of the ROCm
debug architecture?

>     - You are very much restricted by the CPU VA ~ GPU VA alignment
>       requirement, which is not true for OpenGL or Vulkan etc. Seems
>       like one of the reasons why ROCm debugging is not easily extendable
>       outside compute?
> 
> 
> Well as long as you can't take debugged threads from the hardware you can
> pretty much forget any OpenGL or Vulkan debugging with this interface since it
> violates the dma_fence restrictions in the kernel.

Agreed. However, that doesn't mean that because you can't do it right now, you
should design an architecture that actively prevents you from doing it in the future.

>     - You have to expose extra memory to CPU process just for GPU
>       debugger access and keep track of GPU VA for each. Makes the GPU more
>       prone to OOB writes from CPU. Exactly what not mapping the memory
>       to CPU tried to protect the GPU from to begin with.
> 
> 
>         As far as I can see this whole idea is extremely questionable. This
>         looks like re-inventing the wheel in a different color.
> 
>     I see it like reinventing a round wheel compared to octagonal wheel.
> 
>     Could you elaborate with facts much more on your position why the ROCm
>     debugger design is an absolute must for others to adopt?
> 
> 
> Well I'm trying to prevent some of the mistakes we did with the ROCm design.

Well, I would say that the above limitations are direct results of the ROCm
debugging design. So while we're eager to learn about how you perceive
GPU debugging should work, would you mind addressing the above
shortcomings?

> And trying to re-invent well proven kernel interfaces is one of the big
> mistakes made in the ROCm design.

Appreciate the feedback. Please work on the presentation a bit, as it currently
doesn't seem very helpful and appears to be just an attempt to throw a spanner
in the works.

> If you really want to expose an interface to userspace

To a debugger process, enabled only behind a flag.

> which walks the process
> page table, installs an MMU notifier

This part is already done to put a userptr into the GPU page tables to
begin with. So hopefully this is not too controversial.

> kmaps the resulting page

In addition to having it in the page tables where GPU can access it.

> and then memcpy
> to/from it then you absolutely *must* run that by guys like Christoph Hellwig,
> Andrew and even Linus.

Surely; that is why we're seeking out review.

We could also in theory use an in-kernel GPU context on the GPU hardware for
doing the peek/poke operations on userptr.

But that seems like a high-overhead thing to do: it means setting up a
transfer per data word and going over the PCI bus twice, compared to
accessing the memory directly by the CPU when it trivially can.

So this is the current proposal.

Regards, Joonas

> 
> I'm pretty sure that those guys will note that a device driver should
> absolutely not mess with such stuff.
> 
> Regards,
> Christian.
> 
> 
>     Otherwise it just looks like you are trying to prevent others from
>     implementing a more flexible debugging interface through vague comments about
>     "questionable design" without going into details. Not listing much concrete
>     benefits nor addressing the very concretely expressed drawbacks of your
>     suggested design, makes it seem like a very biased non-technical discussion.
> 
>     So while review interest and any comments are very much appreciated, please
>     also work on providing bit more reasoning and facts instead of just claiming
>     things. That'll help make the discussion much more fruitful.
> 
>     Regards, Joonas
> 
>


* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-10 11:57             ` Joonas Lahtinen
@ 2024-12-10 14:03               ` Christian König
  2024-12-11 12:59                 ` Joonas Lahtinen
  0 siblings, 1 reply; 63+ messages in thread
From: Christian König @ 2024-12-10 14:03 UTC (permalink / raw)
  To: Joonas Lahtinen, Andrzej Hajda, Christoph Hellwig,
	Jonathan Cavitt, Linux MM, Maciej Patelczyk, Mika Kuoppala,
	dri-devel, intel-xe, lkml


Am 10.12.24 um 12:57 schrieb Joonas Lahtinen:
> Quoting Christian König (2024-12-10 12:00:48)
>> Am 10.12.24 um 10:33 schrieb Joonas Lahtinen:
>>
>>      Quoting Christian König (2024-12-09 17:42:32)
>>
>>          Am 09.12.24 um 16:31 schrieb Simona Vetter:
>>
>>              On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
>>
>>                  Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
>>
>>                      From: Andrzej Hajda<andrzej.hajda@intel.com>
>>
>>                      Debugger needs to read/write program's vmas including userptr_vma.
>>                      Since hmm_range_fault is used to pin userptr vmas, it is possible
>>                      to map those vmas from debugger context.
>>
>>                  Oh, this implementation is extremely questionable as well. Adding the LKML
>>                  and the MM list as well.
>>
>>                  First of all hmm_range_fault() does *not* pin anything!
>>
>>                  In other words you don't have a page reference when the function returns,
>>                  but rather just a sequence number you can check for modifications.
>>
>>              I think it's all there, holds the invalidation lock during the critical
>>              access/section, drops it when reacquiring pages, retries until it works.
>>
>>              I think the issue is more that everyone hand-rolls userptr.
>>
>>          Well that is part of the issue.
>>
>>          The general problem here is that the eudebug interface tries to simulate
>>          the memory accesses as they would have happened by the hardware.
>>
>>      Could you elaborate, what is that a problem in that, exactly?
>>
>>      It's pretty much the equivalent of ptrace() poke/peek but for GPU memory.
>>
>>
>> Exactly that here. You try to debug the GPU without taking control of the CPU
>> process.
> You seem to have a built-in expectation that the CPU threads and memory space
> must be interfered with in order to debug a completely different set of threads
> and memory space elsewhere that executes independently. I don't quite see why?

Because the GPU only gets the information it needs to execute the commands.

A simple example would be single-stepping through the high level shader
code. That is usually not available to the GPU, but only to the
application that has submitted the work.

The GPU only sees the result of compiling the high level code into low
level assembler.

> In debugging massively parallel workloads, it's a huge drawback to be limited to
> stop all mode in GDB. If ROCm folks are fine with such limitation, I have nothing
> against them keeping that limitation. Just it was a starting design principle for
> this design to avoid such a limitation.

Well, that's the part I don't understand. Why is that a drawback?

>> This means that you have to re-implement all debug functionalities which where
>> previously invested for the CPU process for the GPU once more.
> Seems like a strawman argument. Can you list the "all interfaces" being added
> that would be possible via indirection via ptrace() beyond peek/poke?
>
>> And that in turn creates a massive attack surface for security related
>> problems, especially when you start messing with things like userptrs which
>> have a very low level interaction with core memory management.
> Again, just seems like a strawman argument. You seem to generalize to some massive
> attack surface of hypothetical interfaces which you don't list. We're talking
> about memory peek/poke here.

That peek/poke interface is more than enough to cause problems.

> Can you explain the high-level difference from security perspective for
> temporarily pinning userptr pages to write them to page tables for GPU to
> execute a dma-fence workload with and temporarily pinning pages for
> peek/poke?

If you want to access userptr imported pages from the GPU going through 
the hops of using hmm_range_fault()/get_user_pages() plus an MMU 
notifier is a must have.

For a CPU based debugging interface that isn't necessary, you can just 
look directly into the application address space with existing interfaces.

>>      And it is exactly the kind of interface that makes sense for debugger as
>>      GPU memory != CPU memory, and they don't need to align at all.
>>
>>
>> And that is what I strongly disagree on. When you debug the GPU it is mandatory
>> to gain control of the CPU process as well.
> You are free to disagree on that. I simply don't agree and have in this
> and previous email presented multiple reasons as to why not. We can
> agree to disagree on the topic.

Yeah, that's ok. I also think we can agree that this doesn't matter
for the discussion.

The question is rather should the userptr functionality be used for 
debugging or not.

>> The CPU process is basically the overseer of the GPU activity, so it should
>> know everything about the GPU operation, for example what a mapping actually
>> means.
> How does that relate to what is being discussed here? You just seem to
> explain how you think userspace driver should work: Maintain a shadow
> tree of each ppGTT VM layout? I don't agree on that, but I think it is
> slightly irrelevant here.

I'm trying to understand why you want to debug only the GPU without also 
attaching to the CPU process.

>> The kernel driver and the hardware only have the information necessary to
>> execute the work prepared by the CPU process. So the information available is
>> limited to begin with.
> And the point here is? Are you saying kernel does not know the actual mappings
> maintained in the GPU page tables?

The kernel knows that; the question is why doesn't userspace know it?

On the other hand I have to agree that this isn't much of a problem.

If userspace really doesn't know what is mapped where in the GPU's VM 
address space then an IOCTL to query that is probably not an issue.

>>          What the debugger should probably do is to cleanly attach to the
>>          application, get the information which CPU address is mapped to which
>>          GPU address and then use the standard ptrace interfaces.
>>
>>      I don't quite agree here -- at all. "Which CPU address is mapped to
>>      which GPU address" makes no sense when the GPU address space and CPU
>>      address space is completely controlled by the userspace driver/application.
>>
>>
>> Yeah, that's the reason why you should ask the userspace driver/application for
>> the necessary information and not go over the kernel to debug things.
> What hypothetical necessary information are you referring to exactly?

What you said before: "the GPU address space and CPU address space is 
completely controlled by the userspace driver/application". When that's 
the case, then why as the kernel for help? The driver/application is in 
control.
> I already explained there are good reasons not to map all the GPU memory
> into the CPU address space.

Well I still don't fully agree with that argument, but compared to 
using userptr the peek/poke on a GEM handle is basically harmless.

>>      Please try to consider things outside of the ROCm architecture.
>>
>>
>> Well I consider a good part of the ROCm architecture rather broken exactly
>> because we haven't pushed back hard enough on bad ideas.
>>
>>
>>      Something like a register scratch region or EU instructions should not
>>      even be mapped to CPU address space as CPU has no business accessing it
>>      during normal operation. And backing of such region will vary per
>>      context/LRC on the same virtual address per EU thread.
>>
>>      You seem to be suggesting to rewrite even our userspace driver to behave
>>      the same way as ROCm driver does just so that we could implement debug memory
>>      accesses via ptrace() to the CPU address space.
>>
>>
>> Oh, well certainly not. That ROCm has an 1 to 1 mapping between CPU and GPU is
>> one thing I've pushed back massively on and has now proven to be problematic.
> Right, so is your claim then that instead of being 1:1 the CPU address space
> should be a superset of all GPU address spaces instead to make sure
> ptrace() can modify all memory?

Well why not? Mapping a BO and not accessing it has only minimal overhead.

We already considered making that mandatory for TTM drivers for 
better OOM killer handling. That approach was discontinued, but 
certainly not because of the overhead.

> Cause I'm slightly lost here as you don't give much reasoning, just
> claim things to be a certain way.

Ok, that's certainly not what I'm trying to express.

Things don't need to be in a certain way, especially not in the way ROCm 
does things.

But you should not try to re-create GPU accesses with the CPU, 
especially when that isn't memory you have control over in the sense 
that it was allocated through your driver stack.

>>      That seems bit of a radical suggestion, especially given the drawbacks
>>      pointed out in your suggested design.
>>
>>
>>          The whole interface re-invents a lot of functionality which is already
>>          there
>>
>>      I'm not really sure I would call adding a single interface for memory
>>      reading and writing to be "re-inventing a lot of functionality".
>>
>>      All the functionality behind this interface will be needed by GPU core
>>      dumping, anyway. Just like for the other patch series.
>>
>>
>> As far as I can see exactly that's an absolutely no-go. Device core dumping
>> should *never ever* touch memory imported by userptrs.
> Could you again elaborate on what the great difference is to short term
> pinning to use in dma-fence workloads? Just the kmap?

The big difference is that the memory doesn't belong to the driver who 
is core dumping.

That is just something you have imported from the MM subsystem, e.g. 
anonymous memory and file backed mappings.

We also don't allow mmap() of dma-bufs on importing devices for 
similar reasons.

>> That's what process core dumping is good for.
> Not really sure I agree. If you do not dump the memory as seen by the
> GPU, then you need to go parsing the CPU address space in order to make
> sense which buffers were mapped where and that CPU memory contents containing
> metadata could be corrupt as we're dealing with a crashing app to begin with.
>
> Big point of relying to the information from GPU VM for the GPU memory layout
> is that it won't be corrupted by rogue memory accesses in CPU process.

Well, that you don't want to use potentially corrupted information is a 
good argument, but why not just dump information like "range 
0xabcd-0xbcde came as userptr from process 1 VMA 0x1234-0x5678"?

A process address space is not really something a device driver should 
be messing with.

>
>>          just because you don't like the idea to attach to the debugged
>>          application in userspace.
>>
>>      A few points that have been brought up as drawback to the
>>      GPU debug through ptrace(), but to recap a few relevant ones for this
>>      discussion:
>>
>>      - You can only really support GDB stop-all mode or at least have to
>>        stop all the CPU threads while you control the GPU threads to
>>        avoid interference. Elaborated on this on the other threads more.
>>      - Controlling the GPU threads will always interfere with CPU threads.
>>        Doesn't seem feasible to single-step an EU thread while CPU threads
>>        continue to run freely?
>>
>>
>> I would say no.
> Should this be understood that you agree these are limitations of the ROCm
> debug architecture?

ROCm has a bunch of design decisions I would say we should never ever 
repeat:

1. Forcing a 1 to 1 model between GPU address space and CPU address space.

2. Using a separate file descriptor additional to the DRM render node.

3. Attaching information and context to the CPU process instead of the 
DRM render node.
....

But stopping the world, i.e. both CPU and GPU threads, when you want to 
debug something is not one of the problematic decisions.

That's why I'm really surprised that you insist so much on that.

>>      - You are very much restricted by the CPU VA ~ GPU VA alignment
>>        requirement, which is not true for OpenGL or Vulkan etc. Seems
>>        like one of the reasons why ROCm debugging is not easily extendable
>>        outside compute?
>>
>>
>> Well as long as you can't take debugged threads from the hardware you can
>> pretty much forget any OpenGL or Vulkan debugging with this interface since it
>> violates the dma_fence restrictions in the kernel.
> Agreed. However, that doesn't mean that because you can't do it right now, you
> should design an architecture that actively prevents you from doing that in the future.

Good point. That's what I can totally agree on as well.

>>      - You have to expose extra memory to CPU process just for GPU
>>        debugger access and keep track of GPU VA for each. Makes the GPU more
>>        prone to OOB writes from CPU. Exactly what not mapping the memory
>>        to CPU tried to protect the GPU from to begin with.
>>
>>
>>          As far as I can see this whole idea is extremely questionable. This
>>          looks like re-inventing the wheel in a different color.
>>
>>      I see it like reinventing a round wheel compared to octagonal wheel.
>>
>>      Could you elaborate with facts much more on your position why the ROCm
>>      debugger design is an absolute must for others to adopt?
>>
>>
>> Well I'm trying to prevent some of the mistakes we did with the ROCm design.
> Well, I would say that the above limitations are direct results of the ROCm
> debugging design. So while we're eager to learn about how you perceive
> GPU debugging should work, would you mind addressing the above
> shortcomings?

Yeah, absolutely. That you don't have a 1 to 1 mapping on the GPU is a 
step in the right direction if you ask me.

>> And trying to re-invent well proven kernel interfaces is one of the big
>> mistakes made in the ROCm design.
> Appreciate the feedback. Please work on the presentation a bit, as it currently
> doesn't seem very helpful and appears to be just an attempt to throw a spanner
> in the works.
>
>> If you really want to expose an interface to userspace
> To a debugger process, enabled only behind a flag.
>
>> which walks the process
>> page table, installs an MMU notifier
> This part is already done to put a userptr into the GPU page tables to
> begin with. So hopefully this is not too controversial.
>
>> kmaps the resulting page
> In addition to having it in the page tables where GPU can access it.
>
>> and then memcpy
>> to/from it then you absolutely *must* run that by guys like Christoph Hellwig,
>> Andrew and even Linus.
> Surely; that is why we're seeking out review.
>
> We could also in theory use an in-kernel GPU context on the GPU hardware for
> doing the peek/poke operations on userptr.

Yeah, I mean that should clearly work out. We have something similar.

> But that seems like a high-overhead thing to do: it means setting up a
> transfer per data word and going over the PCI bus twice, compared to
> accessing the memory directly by the CPU when it trivially can.

Understandable, but that will create another way of accessing process 
memory.

Regards,
Christian.

>
> So this is the current proposal.
>
> Regards, Joonas
>
>> I'm pretty sure that those guys will note that a device driver should
>> absolutely not mess with such stuff.
>>
>> Regards,
>> Christian.
>>
>>
>>      Otherwise it just looks like you are trying to prevent others from
>>      implementing a more flexible debugging interface through vague comments about
>>      "questionable design" without going into details. Not listing much concrete
>>      benefits nor addressing the very concretely expressed drawbacks of your
>>      suggested design, makes it seem like a very biased non-technical discussion.
>>
>>      So while review interest and any comments are very much appreciated, please
>>      also work on providing bit more reasoning and facts instead of just claiming
>>      things. That'll help make the discussion much more fruitful.
>>
>>      Regards, Joonas
>>
>>



* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-10 14:03               ` Christian König
@ 2024-12-11 12:59                 ` Joonas Lahtinen
  2024-12-17 14:12                   ` Joonas Lahtinen
  0 siblings, 1 reply; 63+ messages in thread
From: Joonas Lahtinen @ 2024-12-11 12:59 UTC (permalink / raw)
  To: Andrzej Hajda, Christian König, Christoph Hellwig,
	Jonathan Cavitt, Linux MM, Maciej Patelczyk, Mika Kuoppala,
	dri-devel, intel-xe, lkml

First of all, I do appreciate you taking the time to explain your positions
much more verbosely this time.

Quoting Christian König (2024-12-10 16:03:14)
> Am 10.12.24 um 12:57 schrieb Joonas Lahtinen:
> 
>     Quoting Christian König (2024-12-10 12:00:48)
> 
>         Am 10.12.24 um 10:33 schrieb Joonas Lahtinen:
> 
>             Quoting Christian König (2024-12-09 17:42:32)
> 
>                 Am 09.12.24 um 16:31 schrieb Simona Vetter:
> 
>                     On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> 
>                         Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
> 
>                             From: Andrzej Hajda <andrzej.hajda@intel.com>
> 
>                             Debugger needs to read/write program's vmas including userptr_vma.
>                             Since hmm_range_fault is used to pin userptr vmas, it is possible
>                             to map those vmas from debugger context.
> 
>                         Oh, this implementation is extremely questionable as well. Adding the LKML
>                         and the MM list as well.
> 
>                         First of all hmm_range_fault() does *not* pin anything!
> 
>                         In other words you don't have a page reference when the function returns,
>                         but rather just a sequence number you can check for modifications.
> 
>                     I think it's all there, holds the invalidation lock during the critical
>                     access/section, drops it when reacquiring pages, retries until it works.
> 
>                     I think the issue is more that everyone hand-rolls userptr.
> 
>                 Well that is part of the issue.
> 
>                 The general problem here is that the eudebug interface tries to simulate
>                 the memory accesses as they would have happened by the hardware.
> 
>             Could you elaborate, what is that a problem in that, exactly?
> 
>             It's pretty much the equivalent of ptrace() poke/peek but for GPU memory.
> 
> 
>         Exactly that here. You try to debug the GPU without taking control of the CPU
>         process.
> 
>     You seem to have a built-in expectation that the CPU threads and memory space
>     must be interfered with in order to debug a completely different set of threads
>     and memory space elsewhere that executes independently. I don't quite see why?
> 
> 
> Because the GPU only gets the information it needs to execute the commands.

Right, but even for the CPU process, the debug symbols are not part of the
execution address space either. There, similarly, only the instructions
generated by the compiler are present, and the debug symbols are separate.
They may be obtainable by parsing /proc/<PID>/exe but can also be in a
completely different file on a different host machine.

> A simple example would be to single step through the high level shader code.
> That is usually not available to the GPU, but only to the application who has
> submitted the work.
> 
> The GPU only sees the result of the compiler from high level into low level
> assembler.

If we had a unified executable format where both the GPU and CPU
instructions were part of a single executable file, the DWARF information
for both CPU and GPU could be part of it too.

Then GDB, by loading the executable file, would have all the debug
information it needed. There would be no need to introspect the CPU process
in order to debug the GPU, just as there is no need to introspect a CPU
process to debug that CPU process.

While we don't currently have that, and GPU instructions are often JIT
generated, we tried to make life easier by having the userspace driver
provide the DWARF information it just generated for the code it JITed as
VM_BIND metadata for a VMA; we make a copy and store it safely to avoid
corruption by rogue CPU process writes.

Historically it was exported to a file and then loaded by GDB from that
separate file, making the user experience quite bad.

So to recap: while for JIT scenarios, and for lack of a unified carrier
format for GPU and CPU instructions, there is some information that is
convenient to have in the CPU address space, I don't think it is a
necessity at all. I guess we could equally export
/sys/class/drm/.../clients/<ID>/{load_map,dwarf_symbols} or whatever,
similar to /proc/<PID>/{maps,exe}.

TL;DR While getting the information from CPU process for JIT scenarios is
convenient for now, I don't think it is a must or explicitly required.

>     In debugging massively parallel workloads, it's a huge drawback to be limited to
>     stop all mode in GDB. If ROCm folks are fine with such limitation, I have nothing
>     against them keeping that limitation. Just it was a starting design principle for
>     this design to avoid such a limitation.
> 
> 
> Well, that's the part I don't understand. Why is that a drawback?

Hmm, the same as the drawback of only supporting stop-all mode for CPU
threads during CPU debugging? You will not be able to stop and observe a
single thread while letting the other threads run.

What if the CPU threads are, for example, supposed to react to memory
semaphores/fences written by a GPU thread, and you want to debug by doing
those memory writes from the GPU thread on the GDB command line?

Again, not being limited to stop-all mode was an input to the design
phase from the folks doing in-field debugging, so I'm probably not going
to be able to list all the good reasons for it.

And as the CPU side supports it: even if you did not support it for GPU
debugging, if adding the GPU to the equation prevented using the existing
feature for CPU debugging, that would feel like a regression in user
experience.

I think both of those are major drawbacks, but we can of course seek out
further opinions if it's highly relevant. I'm pretty sure myself at this
point that if a feature is desirable for CPU threaded debugging, it'll
very shortly be asked to be available for the GPU.

That seems to be the trend for any CPU debug feature, even if some are
less feasible than others due to the differences of GPUs and CPUs.

>         This means that you have to re-implement all debug functionalities which where
>         previously invested for the CPU process for the GPU once more.
> 
>     Seems like a strawman argument. Can you list the "all interfaces" being added
>     that would be possible via indirection via ptrace() beyond peek/poke?
> 
> 
>         And that in turn creates a massive attack surface for security related
>         problems, especially when you start messing with things like userptrs which
>         have a very low level interaction with core memory management.
> 
>     Again, just seems like a strawman argument. You seem to generalize to some massive
>     attack surface of hypothetical interfaces which you don't list. We're talking
>     about memory peek/poke here.
> 
> 
> That peek/poke interface is more than enough to cause problems.

Ok, just wanted to make sure we're talking about concrete things. Happy
to discuss any other problems too, but for now let's focus on the
peek/poke then, and not get sidetracked.

>     Can you explain the high-level difference from security perspective for
>     temporarily pinning userptr pages to write them to page tables for GPU to
>     execute a dma-fence workload with and temporarily pinning pages for
>     peek/poke?
> 
> 
> If you want to access userptr imported pages from the GPU going through the
> hops of using hmm_range_fault()/get_user_pages() plus an MMU notifier is a must
> have.

Right, the intent was always to make this as close to an EU thread memory
access as possible, from both the locking and the Linux core MM point of
view, so if we need to improve on that front, we should look into it.

> For a CPU based debugging interface that isn't necessary, you can just look
> directly into the application address space with existing interfaces.

First, this is only even possible when you have mapped everything the GPUs
have access to into the CPU address space as well, and maintain a load map
for each individual <DRM client, GPU VM, GPU VMA> => CPU address.

I don't see the need to do that tracking in userspace just
for debugging, because the kernel already has to do all the work.

Second, mapping every GPU VMA to CPU address space will exhaust the
vm.max_map_count [1] quite a bit faster. This problem can already be hit if
a naive userspace application tries to create too many aligned_alloc
blocks for userptr instead of pooling memory.

Third, when GPU VM size == CPU VM size for modern hardware, you will run
out of VA space in the CPU VM once you consider that you have to add VA
blocks of (num DRM clients) * (num VM) * (VM size), where (num VM) roughly
equals (number of engines) * 3.

And ultimately, I'm pretty sure processes like 32-bit games and emulators,
and even demanding compute applications, actually expect to be able to use
most of the CPU address space :) So I don't think we should have a design
that expects to be able to consume significant portions of the CPU VA
space (especially if it is just for debug-time functionality).

[1] Documentation/admin-guide/sysctl/vm.rst#max_map_count

>             And it is exactly the kind of interface that makes sense for debugger as
>             GPU memory != CPU memory, and they don't need to align at all.
> 
> 
>         And that is what I strongly disagree on. When you debug the GPU it is mandatory
>         to gain control of the CPU process as well.
> 
>     You are free to disagree on that. I simply don't agree and have in this
>     and previous email presented multiple reasons as to why not. We can
>     agree to disagree on the topic.
> 
> 
> Yeah, that's ok. I also think we can agree on that this doesn't matter for the
> discussion.
> 
> The question is rather should the userptr functionality be used for debugging
> or not.
> 
> 
>         The CPU process is basically the overseer of the GPU activity, so it should
>         know everything about the GPU operation, for example what a mapping actually
>         means.
> 
>     How does that relate to what is being discussed here? You just seem to
>     explain how you think userspace driver should work: Maintain a shadow
>     tree of each ppGTT VM layout? I don't agree on that, but I think it is
>     slightly irrelevant here.
> 
> 
> I'm trying to understand why you want to debug only the GPU without also
> attaching to the CPU process.

Mostly to ensure we're not limited to stop-all mode as described above and to
have a clean independent implementation for the thread run-control between the
"inferiors" in GDB. Say you have CPU threads and 2 sets of GPU threads (3
inferiors in total). We don't want the CPU inferior to be impacted by
the user requesting to control the GPU inferiors.

I know the ROCm GDB implementation takes a different approach, and I'm
not quite sure how you folks plan on supporting multi-GPU debugging.

I would spin the question in the opposite direction: if you don't need
anything from the CPU process, why would you make them dependent and
interfering?

(Reminder, the peek/poke target page has been made available to the GPU
page tables, so we don't want anything from the CPU process per se, we
want to know which page the GPU IOMMU unit would get for its access.

For all practical purposes, the CPU process could have already exited, and
that should not matter if an EU is still executing on the GPU.)

>         The kernel driver and the hardware only have the information necessary to
>         execute the work prepared by the CPU process. So the information available is
>         limited to begin with.
> 
>     And the point here is? Are you saying kernel does not know the actual mappings
>     maintained in the GPU page tables?
> 
> 
> The kernel knows that, the question is why does userspace don't know that?
> 
> On the other hand I have to agree that this isn't much of a problem.
> 
> If userspace really doesn't know what is mapped where in the GPU's VM address
> space then an IOCTL to query that is probably not an issue.
> 
>                 What the debugger should probably do is to cleanly attach to the
>                 application, get the information which CPU address is mapped to which
>                 GPU address and then use the standard ptrace interfaces.
> 
>             I don't quite agree here -- at all. "Which CPU address is mapped to
>             which GPU address" makes no sense when the GPU address space and CPU
>             address space is completely controlled by the userspace driver/application.
> 
> 
>         Yeah, that's the reason why you should ask the userspace driver/application for
>         the necessary information and not go over the kernel to debug things.
> 
>     What hypothetical necessary information are you referring to exactly?
> 
> 
> What you said before: "the GPU address space and CPU address space is
> completely controlled by the userspace driver/application". When that's the
> case, then why ask the kernel for help? The driver/application is in control.

I guess the emphasis should have been on the application part. The debugger can
agree with the userspace driver on conventions to facilitate debugging, but not
with the application code.

However, I agree that a query IOCTL could be avoided by maintaining shadow
address space tracking, in case the ptrace() approach to debugging was
otherwise favorable.

>     I already explained there are good reasons not to map all the GPU memory
>     into the CPU address space.
> 
> 
> Well I still don't fully agree to that argumentation, but compared to using
> userptr the peek/poke on a GEM handle is basically harmless.

(Sidenote: We don't expose BO handles at all via debugger interface. The debugger
interface fully relies on GPU addresses for everything.)

But it sounds like we're converging on the conclusion that the focus of the
discussion is really only on the controversy of whether or not to touch
userptr with the debugger peek/poke interface.

>             Please try to consider things outside of the ROCm architecture.
> 
> 
>         Well I consider a good part of the ROCm architecture rather broken exactly
>         because we haven't pushed back hard enough on bad ideas.
> 
> 
>             Something like a register scratch region or EU instructions should not
>             even be mapped to CPU address space as CPU has no business accessing it
>             during normal operation. And backing of such region will vary per
>             context/LRC on the same virtual address per EU thread.
> 
>             You seem to be suggesting to rewrite even our userspace driver to behave
>             the same way as ROCm driver does just so that we could implement debug memory
>             accesses via ptrace() to the CPU address space.
> 
> 
>         Oh, well certainly not. That ROCm has a 1 to 1 mapping between CPU and GPU is
>         one thing I've pushed back massively on and has now proven to be problematic.
> 
>     Right, so is your claim then that instead of being 1:1 the CPU address space
>     should be a superset of all GPU address spaces instead to make sure
>     ptrace() can modify all memory?
> 
> 
> Well why not? Mapping a BO and not accessing it has only minimal overhead.
> 
> We already considered making that mandatory for TTM drivers for better OOM
> killer handling. That approach was discontinued, but certainly not for the
> overhead.

I listed the reasons earlier in this message.

>     Cause I'm slightly lost here as you don't give much reasoning, just
>     claim things to be a certain way.
> 
> 
> Ok, that's certainly not what I'm trying to express.
> 
> Things don't need to be in a certain way, especially not in the way ROCm does
> things.
> 
> But you should not try to re-create GPU accesses with the CPU, especially when
> that isn't memory you have control over in the sense that it was allocated
> through your driver stack.

I guess that's what I don't quite follow.

These are memory pages that are temporarily pinned and made available via GPU
PTEs to the GPU IOMMU, which will inherently be able to read/write them
outside of the CPU caching domain.

Not sure why replacing "Submit GPU workload to peek/poke such page pinned
behind PTE" with "Use CPU to peek/poke because userptr is system memory
anyway" seems so controversial, or why it would cause much more complexity
than userptr in general?

>             That seems bit of a radical suggestion, especially given the drawbacks
>             pointed out in your suggested design.
> 
> 
>                 The whole interface re-invents a lot of functionality which is already
>                 there
> 
>             I'm not really sure I would call adding a single interface for memory
>             reading and writing to be "re-inventing a lot of functionality".
> 
>             All the functionality behind this interface will be needed by GPU core
>             dumping, anyway. Just like for the other patch series.
> 
> 
>         As far as I can see exactly that's an absolutely no-go. Device core dumping
>         should *never ever* touch memory imported by userptrs.
> 
>     Could you again elaborate on what the great difference is to short term
>     pinning to use in dma-fence workloads? Just the kmap?
> 
> 
> The big difference is that the memory doesn't belong to the driver who is core
> dumping.

But the driver who is core dumping is holding a temporary pin on that
memory anyway, and has it in the GPU page tables.

The CPU side of the memory dump would only reflect the CPU side memory
contents at dump time. It may have different contents than the GPU side
depending on cache flush timing. Maybe this will not be true once CXL or
some other coherency protocol is everywhere, but for now it is.

So those two memory dumps may actually have different contents, and that
difference might actually be the bug we're trying to debug. For GPU
debugging, we're specifically interested in what the GPU threads' view of
the memory was.

So I think it is more complex than that.

> That is just something you have imported from the MM subsystem, e.g. anonymous
> memory and file backed mappings.
> 
> We also don't allow to mmap() dma-bufs on importing devices for similar
> reasons.

That is a reasonable limitation for userspace applications.

And at no point has there been a suggestion to expose such an API for normal
userspace to shoot itself in the foot.

However, a debugger is not an average userspace consumer. For example, it
needs to be able to modify read-only memory (like the EU instructions) and
then do special cache flushes to magically change those instructions.

I wouldn't want to expose such functionality as a regular IOCTL for an
application.

>         That's what process core dumping is good for.
> 
>     Not really sure I agree. If you do not dump the memory as seen by the
>     GPU, then you need to go parsing the CPU address space in order to make
>     sense which buffers were mapped where and that CPU memory contents containing
>     metadata could be corrupt as we're dealing with a crashing app to begin with.
> 
>     Big point of relying to the information from GPU VM for the GPU memory layout
>     is that it won't be corrupted by rogue memory accesses in CPU process.
> 
> 
> Well that you don't want to use potentially corrupted information is a good
> argument, but why just not dump an information like "range 0xabcd-0xbcde came
> as userptr from process 1 VMA 0x1234-0x5678" ?

I guess that could be done for interactive debugging (but would again
add the ptrace() dependency).

In theory you could probably also come up with such a convention for ELF to
support core dumps, but I would have to defer to folks more knowledgeable on
the topic.

Feels like that would make things more complex via indirection compared
to existing memory maps.

> A process address space is not really something a device driver should be
> messing with.
> 
> 
> 
> 
>                 just because you don't like the idea to attach to the debugged
>                 application in userspace.
> 
>             A few points that have been brought up as drawback to the
>             GPU debug through ptrace(), but to recap a few relevant ones for this
>             discussion:
> 
>             - You can only really support GDB stop-all mode or at least have to
>               stop all the CPU threads while you control the GPU threads to
>               avoid interference. Elaborated on this on the other threads more.
>             - Controlling the GPU threads will always interfere with CPU threads.
>               Doesn't seem feasible to single-step an EU thread while CPU threads
>               continue to run freely?
> 
> 
>         I would say no.
> 
>     Should this be understood that you agree these are limitations of the ROCm
>     debug architecture?
> 
> 
> ROCm has a bunch of design decisions I would say we should never ever repeat:
> 
> 1. Forcing a 1 to 1 model between GPU address space and CPU address space.
> 
> 2. Using a separate file descriptor additional to the DRM render node.
> 
> 3. Attaching information and context to the CPU process instead of the DRM
> render node.
> ....
> 
> But stopping the world, e.g. both CPU and GPU threads if you want to debug
> something is not one of the problematic decisions.
> 
> That's why I'm really surprised that you insist so much on that.

I'm hoping the above explanations clarify my position further.

Again, I would ask myself: "Why add a dependency that is not needed?"

>             - You are very much restricted by the CPU VA ~ GPU VA alignment
>               requirement, which is not true for OpenGL or Vulkan etc. Seems
>               like one of the reasons why ROCm debugging is not easily extendable
>               outside compute?
> 
> 
>         Well as long as you can't take debugged threads from the hardware you can
>         pretty much forget any OpenGL or Vulkan debugging with this interface since it
>         violates the dma_fence restrictions in the kernel.
> 
>     Agreed. However, just because you can't do it right now doesn't mean you
>     should design an architecture that actively prevents you from doing it in
>     the future.
> 
> 
> Good point. That's what I can totally agree on as well.
> 
> 
>             - You have to expose extra memory to CPU process just for GPU
>               debugger access and keep track of GPU VA for each. Makes the GPU more
>               prone to OOB writes from CPU. Exactly what not mapping the memory
>               to CPU tried to protect the GPU from to begin with.
> 
> 
>                 As far as I can see this whole idea is extremely questionable. This
>                 looks like re-inventing the wheel in a different color.
> 
>             I see it like reinventing a round wheel compared to octagonal wheel.
> 
>             Could you elaborate with facts much more on your position why the ROCm
>             debugger design is an absolute must for others to adopt?
> 
> 
>         Well I'm trying to prevent some of the mistakes we did with the ROCm design.
> 
>     Well, I would say that the above limitations are direct results of the ROCm
>     debugging design. So while we're eager to learn about how you perceive
>     GPU debugging should work, would you mind addressing the above
>     shortcomings?
> 
> 
> Yeah, absolutely. That you don't have a 1 to 1 mapping on the GPU is a step in
> the right direction if you ask me.

Right, that is to keep open the possibility of extending to OpenGL/Vulkan etc.

>         And trying to re-invent well proven kernel interfaces is one of the big
>         mistakes made in the ROCm design.
> 
>     Appreciate the feedback. Please work on the representation a bit as it currently
>     doesn't seem very helpful but appears just as an attempt to try to throw a spanner
>     in the works.
> 
> 
>         If you really want to expose an interface to userspace
> 
>     To a debugger process, enabled only behind a flag.
> 
> 
>         which walks the process
>         page table, installs an MMU notifier
> 
>     This part is already done to put an userptr to the GPU page tables to
>     begin with. So hopefully not too controversial.
> 
> 
>         kmaps the resulting page
> 
>     In addition to having it in the page tables where GPU can access it.
> 
> 
>         and then memcpy
>         to/from it then you absolutely *must* run that by guys like Christoph Hellwig,
>         Andrew and even Linus.
> 
>     Surely, that is why we're seeking out for review.
> 
>     We could also in theory use an in-kernel GPU context on the GPU hardware for
>     doing the peek/poke operations on userptr.
> 
> 
> Yeah, I mean that should clearly work out. We have something similar.

Right, and that might actually be desirable for the more special GPU VMAs
like interconnect addresses.

However, userptr is one of the cases where it makes the least sense, given
that we'd have to set up a transfer over the bus, and the transfer would
read system memory over the bus and write the result back to system memory
over the bus.

And this is just to avoid kmap'ing a page that would otherwise be
already temporarily pinned for being in the PTEs.

I'm not saying it can't be done, but I just don't feel like it's a
technically sound solution.

>     But that seems like a high-overhead thing to do due to the overhead of
>     setting up a transfer per data word and going over the PCI bus twice
>     compared to accessing the memory directly by CPU when it trivially can.
> 
> 
> Understandable, but that will create another way of accessing process memory.

Well, we hopefully should be able to align with the regular flow of
temporarily pinning a page and making it available to the PTEs, but instead
of making it available to the PTEs, do the peek/poke and then release the
page right away.

I'm hoping to build the case that this makes a lot of sense for peek/poke
performance (which is important for single-stepping), and that it should not
be a burden in terms of new locking chains.

And thanks once again for taking the time to share the details behind
the thinking and bearing with all my questions.

It seems like the peek/poke access to userptr is the big remaining open
where opinions differ, so maybe we should first focus on aligning on it.
It impacts both core dumping and interactive debugging.

Regards, Joonas

> 
> Regards,
> Christian.
> 
> 
> 
>     So this is the current proposal.
> 
>     Regards, Joonas
> 
> 
>         I'm pretty sure that those guys will note that a device driver should
>         absolutely not mess with such stuff.
> 
>         Regards,
>         Christian.
> 
> 
>             Otherwise it just looks like you are trying to prevent others from
>             implementing a more flexible debugging interface through vague comments about
>             "questionable design" without going into details. Not listing much concrete
>             benefits nor addressing the very concretely expressed drawbacks of your
>             suggested design, makes it seem like a very biased non-technical discussion.
> 
>             So while review interest and any comments are very much appreciated, please
>             also work on providing bit more reasoning and facts instead of just claiming
>             things. That'll help make the discussion much more fruitful.
> 
>             Regards, Joonas
> 
> 
> 
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-09 15:31     ` Simona Vetter
  2024-12-09 15:42       ` Christian König
@ 2024-12-12  8:49       ` Thomas Hellström
  2024-12-12 10:12         ` Simona Vetter
  1 sibling, 1 reply; 63+ messages in thread
From: Thomas Hellström @ 2024-12-12  8:49 UTC (permalink / raw)
  To: Christian König, Mika Kuoppala, intel-xe, lkml, Linux MM,
	dri-devel, Andrzej Hajda, Maciej Patelczyk, Jonathan Cavitt

On Mon, 2024-12-09 at 16:31 +0100, Simona Vetter wrote:
> On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> > Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
> > > From: Andrzej Hajda <andrzej.hajda@intel.com>
> > > 
> > > Debugger needs to read/write program's vmas including
> > > userptr_vma.
> > > Since hmm_range_fault is used to pin userptr vmas, it is possible
> > > to map those vmas from debugger context.
> > 
> > Oh, this implementation is extremely questionable as well. Adding
> > the LKML
> > and the MM list as well.
> > 
> > First of all hmm_range_fault() does *not* pin anything!
> > 
> > In other words you don't have a page reference when the function
> > returns,
> > but rather just a sequence number you can check for modifications.
> 
> I think it's all there, holds the invalidation lock during the
> critical
> access/section, drops it when reacquiring pages, retries until it
> works.
> 
> I think the issue is more that everyone hand-rolls userptr. Probably
> time
> we standardize that and put it into gpuvm as an optional part, with
> consistent locking, naming (like not calling it _pin_pages when it's
> unpinnged userptr), kerneldoc and all the nice things so that we
> stop consistently getting confused by other driver's userptr code.
> 
> I think that was on the plan originally as an eventual step, I guess
> time
> to pump that up. Matt/Thomas, thoughts?

It looks like we have this planned and ongoing but there are some
complications and thoughts.

1) A drm_gpuvm implementation would be based on vma userptrs, and would
be pretty straightforward based on xe's current implementation and, as
you say, renaming.

2) Current Intel work to land this on the drm level is based on
drm_gpusvm (minus migration to VRAM). I'm not fully sure yet how this
will integrate with drm_gpuvm.

3) Christian mentioned a plan to have a common userptr implementation
based off drm_exec. I figure that would be bo-based like the amdgpu
implementation still is. Possibly i915 would be interested in this, but
I think any VM_BIND based driver would want to use drm_gpuvm /
drm_gpusvm implementation, which is also typically O(1), since userptrs
are considered vm-local.

Ideas / suggestions welcome

> -Sima
> 
> > 
> > > v2: pin pages vs notifier, move to vm.c (Matthew)
> > > v3: - iterate over system pages instead of DMA, fixes iommu
> > > enabled
> > >      - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
> > > 
> > > Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> > > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
> > > ---
> > >   drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
> > >   drivers/gpu/drm/xe/xe_vm.c      | 47
> > > +++++++++++++++++++++++++++++++++
> > >   drivers/gpu/drm/xe/xe_vm.h      |  3 +++
> > >   3 files changed, 52 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_eudebug.c
> > > b/drivers/gpu/drm/xe/xe_eudebug.c
> > > index 9d87df75348b..e5949e4dcad8 100644
> > > --- a/drivers/gpu/drm/xe/xe_eudebug.c
> > > +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> > > @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct
> > > xe_vma *vma, u64 offset_in_vma,
> > >   		return ret;
> > >   	}
> > > -	return -EINVAL;
> > > +	return xe_vm_userptr_access(to_userptr_vma(vma),
> > > offset_in_vma,
> > > +				    buf, bytes, write);
> > >   }
> > >   static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > b/drivers/gpu/drm/xe/xe_vm.c
> > > index 0f17bc8b627b..224ff9e16941 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct
> > > xe_vm_snapshot *snap)
> > >   	}
> > >   	kvfree(snap);
> > >   }
> > > +
> > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64
> > > offset,
> > > +			 void *buf, u64 len, bool write)
> > > +{
> > > +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > > +	struct xe_userptr *up = &uvma->userptr;
> > > +	struct xe_res_cursor cur = {};
> > > +	int cur_len, ret = 0;
> > > +
> > > +	while (true) {
> > > +		down_read(&vm->userptr.notifier_lock);
> > > +		if (!xe_vma_userptr_check_repin(uvma))
> > > +			break;
> > > +
> > > +		spin_lock(&vm->userptr.invalidated_lock);
> > > +		list_del_init(&uvma->userptr.invalidate_link);
> > > +		spin_unlock(&vm->userptr.invalidated_lock);
> > > +
> > > +		up_read(&vm->userptr.notifier_lock);
> > > +		ret = xe_vma_userptr_pin_pages(uvma);
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > > +
> > > +	if (!up->sg) {
> > > +		ret = -EINVAL;
> > > +		goto out_unlock_notifier;
> > > +	}
> > > +
> > > +	for (xe_res_first_sg_system(up->sg, offset, len, &cur);
> > > cur.remaining;
> > > +	     xe_res_next(&cur, cur_len)) {
> > > +		void *ptr = kmap_local_page(sg_page(cur.sgl)) +
> > > cur.start;
> > 
> > The interface basically creates a side channel to access userptrs
> > in the way
> > an userspace application would do without actually going through
> > userspace.
> > 
> > That is generally not something a device driver should ever do as
> > far as I
> > can see.
> > 
> > > +
> > > +		cur_len = min(cur.size, cur.remaining);
> > > +		if (write)
> > > +			memcpy(ptr, buf, cur_len);
> > > +		else
> > > +			memcpy(buf, ptr, cur_len);
> > > +		kunmap_local(ptr);
> > > +		buf += cur_len;
> > > +	}
> > > +	ret = len;
> > > +
> > > +out_unlock_notifier:
> > > +	up_read(&vm->userptr.notifier_lock);
> > 
> > I just strongly hope that this will prevent the mapping from
> > changing.
> > 
> > Regards,
> > Christian.
> > 
> > > +	return ret;
> > > +}
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > > b/drivers/gpu/drm/xe/xe_vm.h
> > > index 23adb7442881..372ad40ad67f 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > @@ -280,3 +280,6 @@ struct xe_vm_snapshot
> > > *xe_vm_snapshot_capture(struct xe_vm *vm);
> > >   void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot
> > > *snap);
> > >   void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct
> > > drm_printer *p);
> > >   void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> > > +
> > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64
> > > offset,
> > > +			 void *buf, u64 len, bool write);
> > 
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/26] ptrace: export ptrace_may_access
  2024-12-10  4:29   ` Christoph Hellwig
@ 2024-12-12  9:16     ` Joonas Lahtinen
  0 siblings, 0 replies; 63+ messages in thread
From: Joonas Lahtinen @ 2024-12-12  9:16 UTC (permalink / raw)
  To: Christoph Hellwig, Mika Kuoppala
  Cc: intel-xe, dri-devel, christian.koenig, Oleg Nesterov,
	linux-kernel, Dave Airlie, Lucas De Marchi, Matthew Brost,
	Andi Shyti, Maciej Patelczyk, Dominik Grzegorzek, Jonathan Cavitt,
	Andi Shyti

Quoting Christoph Hellwig (2024-12-10 06:29:38)
> On Mon, Dec 09, 2024 at 03:32:52PM +0200, Mika Kuoppala wrote:
> > xe driver would like to allow fine grained access control
> > for GDB debugger using ptrace. Without this export, the only
> > option would be to check for CAP_SYS_ADMIN.
> > 
> > The check intended for an ioctl to attach a GPU debugger
> > is similar to the ptrace use case: allow a calling process
> > to manipulate a target process if it has the necessary
> > capabilities or the same permissions, as described in
> > Documentation/process/adding-syscalls.rst.
> > 
> > Export ptrace_may_access function to allow GPU debugger to
> > have identical access control for debugger(s)
> > as a CPU debugger.
> 
> This seems to miss an actual user, or you forgot to Cc linux-kernel on it.

Right, that is a miss on our side. For the time being, the whole series
can be found in lore archive:

https://lore.kernel.org/dri-devel/20241209133318.1806472-1-mika.kuoppala@linux.intel.com/

The user is introduced in patch: [PATCH 03/26] drm/xe/eudebug: Introduce discovery for resources [1]

Essentially, we want to check if PID1 has permission to ptrace PID2, before we grant the
permission for PID1 to debug the GPU address space/memory of PID2.

Mika, please do Cc the relevant other patches of the series to LKML for next iteration.

Regards, Joonas

[1] https://lore.kernel.org/dri-devel/20241209133318.1806472-1-mika.kuoppala@linux.intel.com/T/#md3d005faaaac1ba01451b139a634e5545c2a266f

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-12  8:49       ` Thomas Hellström
@ 2024-12-12 10:12         ` Simona Vetter
  2024-12-13 19:39           ` Matthew Brost
  0 siblings, 1 reply; 63+ messages in thread
From: Simona Vetter @ 2024-12-12 10:12 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Christian König, Mika Kuoppala, intel-xe, lkml, Linux MM,
	dri-devel, Andrzej Hajda, Maciej Patelczyk, Jonathan Cavitt

On Thu, Dec 12, 2024 at 09:49:24AM +0100, Thomas Hellström wrote:
> On Mon, 2024-12-09 at 16:31 +0100, Simona Vetter wrote:
> > On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> > > Am 09.12.24 um 14:33 schrieb Mika Kuoppala:
> > > > From: Andrzej Hajda <andrzej.hajda@intel.com>
> > > > 
> > > > Debugger needs to read/write program's vmas including
> > > > userptr_vma.
> > > > Since hmm_range_fault is used to pin userptr vmas, it is possible
> > > > to map those vmas from debugger context.
> > > 
> > > Oh, this implementation is extremely questionable as well. Adding
> > > the LKML
> > > and the MM list as well.
> > > 
> > > First of all hmm_range_fault() does *not* pin anything!
> > > 
> > > In other words you don't have a page reference when the function
> > > returns,
> > > but rather just a sequence number you can check for modifications.
> > 
> > I think it's all there, holds the invalidation lock during the
> > critical
> > access/section, drops it when reacquiring pages, retries until it
> > works.
> > 
> > I think the issue is more that everyone hand-rolls userptr. Probably
> > time
> > we standardize that and put it into gpuvm as an optional part, with
> > consistent locking, naming (like not calling it _pin_pages when it's
> > unpinnged userptr), kerneldoc and all the nice things so that we
> > stop consistently getting confused by other driver's userptr code.
> > 
> > I think that was on the plan originally as an eventual step, I guess
> > time
> > to pump that up. Matt/Thomas, thoughts?
> 
> It looks like we have this planned and ongoing but there are some
> complications and thoughts.
> 
> 1) A drm_gpuvm implementation would be based on vma userptrs, and would
> be pretty straightforward based on xe's current implementation and, as
> you say, renaming.
> 
> 2) Current Intel work to land this on the drm level is based on
> drm_gpusvm (minus migration to VRAM). I'm not fully sure yet how this
> will integrate with drm_gpuvm.
> 
> 3) Christian mentioned a plan to have a common userptr implementation
> based off drm_exec. I figure that would be bo-based like the amdgpu
> implemeentation still is. Possibly i915 would be interested in this but
> I think any VM_BIND based driver would want to use drm_gpuvm /
> drm_gpusvm implementation, which is also typically O(1), since userptrs
> are considered vm-local.
> 
> Ideas / suggestions welcome

So just discussed this a bit with Joonas, and if we use access_remote_vm
for the userptr access instead of hand-rolling then we really only need
bare-bones data structure changes in gpuvm, and nothing more. So

- add the mm pointer to struct drm_gpuvm
- add a flag indicating that it's a userptr + userspace address to struct
  drm_gpuva
- since we already have userptr in drivers I guess there shouldn't be any
  need to adjust the actual drm_gpuvm code to cope with these

Then with this you can write the access helper using access_remote_vm
since that does the entire remote mm walking internally, and so there's
no need to also have all the mmu notifier and locking lifted to gpuvm. But
it does already give us some great places to put relevant kerneldocs (not
just for debugging architecture, but userptr stuff in general), which is
already a solid step forward.
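Very roughly, the helper could then look something like this (a non-runnable
kernel-side sketch: the drm_gpuvm/drm_gpuva field names and the flag are
entirely made up, and locking is elided; access_remote_vm() does the remote
page-table walk, faulting and kmap internally):

```c
/* Sketch only: gpuvm->mm, va->userptr and DRM_GPUVA_USERPTR are
 * hypothetical names for the additions described above. */
static int drm_gpuva_access_userptr(struct drm_gpuvm *gpuvm,
				    struct drm_gpuva *va, u64 offset,
				    void *buf, int len, bool write)
{
	unsigned int gup_flags = write ? FOLL_WRITE : 0;

	if (!(va->flags & DRM_GPUVA_USERPTR))
		return -EINVAL;

	/* Walks gpuvm->mm remotely, faulting pages in as needed. */
	return access_remote_vm(gpuvm->mm, va->userptr + offset,
				buf, len, gup_flags);
}
```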

Plus I think it'd also be a solid first step that we need no matter
what for figuring out the questions/options you have above.

Thoughts?
-Sima

> 
> > -Sima
> > 
> > > 
> > > > v2: pin pages vs notifier, move to vm.c (Matthew)
> > > > v3: - iterate over system pages instead of DMA, fixes iommu
> > > > enabled
> > > >      - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
> > > > 
> > > > Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > > > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
> > > > ---
> > > >   drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
> > > >   drivers/gpu/drm/xe/xe_vm.c      | 47
> > > > +++++++++++++++++++++++++++++++++
> > > >   drivers/gpu/drm/xe/xe_vm.h      |  3 +++
> > > >   3 files changed, 52 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_eudebug.c
> > > > b/drivers/gpu/drm/xe/xe_eudebug.c
> > > > index 9d87df75348b..e5949e4dcad8 100644
> > > > --- a/drivers/gpu/drm/xe/xe_eudebug.c
> > > > +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> > > > @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct
> > > > xe_vma *vma, u64 offset_in_vma,
> > > >   		return ret;
> > > >   	}
> > > > -	return -EINVAL;
> > > > +	return xe_vm_userptr_access(to_userptr_vma(vma),
> > > > offset_in_vma,
> > > > +				    buf, bytes, write);
> > > >   }
> > > >   static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > > b/drivers/gpu/drm/xe/xe_vm.c
> > > > index 0f17bc8b627b..224ff9e16941 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct
> > > > xe_vm_snapshot *snap)
> > > >   	}
> > > >   	kvfree(snap);
> > > >   }
> > > > +
> > > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64
> > > > offset,
> > > > +			 void *buf, u64 len, bool write)
> > > > +{
> > > > +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > > > +	struct xe_userptr *up = &uvma->userptr;
> > > > +	struct xe_res_cursor cur = {};
> > > > +	int cur_len, ret = 0;
> > > > +
> > > > +	while (true) {
> > > > +		down_read(&vm->userptr.notifier_lock);
> > > > +		if (!xe_vma_userptr_check_repin(uvma))
> > > > +			break;
> > > > +
> > > > +		spin_lock(&vm->userptr.invalidated_lock);
> > > > +		list_del_init(&uvma->userptr.invalidate_link);
> > > > +		spin_unlock(&vm->userptr.invalidated_lock);
> > > > +
> > > > +		up_read(&vm->userptr.notifier_lock);
> > > > +		ret = xe_vma_userptr_pin_pages(uvma);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > > +	if (!up->sg) {
> > > > +		ret = -EINVAL;
> > > > +		goto out_unlock_notifier;
> > > > +	}
> > > > +
> > > > +	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
> > > > +	     xe_res_next(&cur, cur_len)) {
> > > > +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
> > > 
> > > The interface basically creates a side channel to access userptrs in the
> > > way a userspace application would do without actually going through
> > > userspace.
> > > 
> > > That is generally not something a device driver should ever do as far
> > > as I can see.
> > > 
> > > > +
> > > > +		cur_len = min(cur.size, cur.remaining);
> > > > +		if (write)
> > > > +			memcpy(ptr, buf, cur_len);
> > > > +		else
> > > > +			memcpy(buf, ptr, cur_len);
> > > > +		kunmap_local(ptr);
> > > > +		buf += cur_len;
> > > > +	}
> > > > +	ret = len;
> > > > +
> > > > +out_unlock_notifier:
> > > > +	up_read(&vm->userptr.notifier_lock);
> > > 
> > > I just strongly hope that this will prevent the mapping from
> > > changing.
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > +	return ret;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > > > index 23adb7442881..372ad40ad67f 100644
> > > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > > @@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
> > > >   void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
> > > >   void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
> > > >   void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> > > > +
> > > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > > > +			 void *buf, u64 len, bool write);
> > > 
> > 
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-12 10:12         ` Simona Vetter
@ 2024-12-13 19:39           ` Matthew Brost
  0 siblings, 0 replies; 63+ messages in thread
From: Matthew Brost @ 2024-12-13 19:39 UTC (permalink / raw)
  To: Simona Vetter
  Cc: Thomas Hellström, Christian König, Mika Kuoppala,
	intel-xe, lkml, Linux MM, dri-devel, Andrzej Hajda,
	Maciej Patelczyk, Jonathan Cavitt

On Thu, Dec 12, 2024 at 11:12:39AM +0100, Simona Vetter wrote:
> On Thu, Dec 12, 2024 at 09:49:24AM +0100, Thomas Hellström wrote:
> > On Mon, 2024-12-09 at 16:31 +0100, Simona Vetter wrote:
> > > On Mon, Dec 09, 2024 at 03:03:04PM +0100, Christian König wrote:
> > > > On 09.12.24 14:33, Mika Kuoppala wrote:
> > > > > From: Andrzej Hajda <andrzej.hajda@intel.com>
> > > > > 
> > > > > Debugger needs to read/write program's vmas including
> > > > > userptr_vma.
> > > > > Since hmm_range_fault is used to pin userptr vmas, it is possible
> > > > > to map those vmas from debugger context.
> > > > 
> > > > Oh, this implementation is extremely questionable as well. Adding
> > > > the LKML
> > > > and the MM list as well.
> > > > 
> > > > First of all hmm_range_fault() does *not* pin anything!
> > > > 
> > > > In other words you don't have a page reference when the function
> > > > returns,
> > > > but rather just a sequence number you can check for modifications.
> > > 
> > > I think it's all there, holds the invalidation lock during the
> > > critical
> > > access/section, drops it when reacquiring pages, retries until it
> > > works.
> > > 
> > > I think the issue is more that everyone hand-rolls userptr. Probably
> > > time
> > > we standardize that and put it into gpuvm as an optional part, with
> > > consistent locking, naming (like not calling it _pin_pages when it's
> > > unpinned userptr), kerneldoc and all the nice things so that we
> > > stop consistently getting confused by other drivers' userptr code.
> > > 
> > > I think that was on the plan originally as an eventual step, I guess
> > > time
> > > to pump that up. Matt/Thomas, thoughts?
> > 
> > It looks like we have this planned and ongoing but there are some
> > complications and thoughts.
> > 
> > 1) A drm_gpuvm implementation would be based on vma userptrs, and would
> > be pretty straightforward based on xe's current implementation and, as
> > you say, renaming.
> > 

My thoughts...

Standardize userptr gpuvmas in gpuvm a bit. In Xe I think we basically set
the BO to NULL in the gpuvmas and then have some helpers in Xe to determine
if a gpuvma is a userptr. I think some of this code could be moved into gpuvm
so drivers are doing this in a standard way.

I think NULL bindings also set the BO to NULL, so perhaps we standardize
that in gpuvm as well.

> > 2) Current Intel work to land this on the drm level is based on
> > drm_gpusvm (minus migration to VRAM). I'm not fully sure yet how this
> > will integrate with drm_gpuvm.
> > 

Implement the userptr locking / page collection (i.e. the hmm_range_fault
call) on top of gpusvm. Perhaps decouple the current page collection
from drm_gpusvm_range into an embedded struct like drm_gpusvm_devmem.
The plan was to more or less land gpusvm, which is on the list addressing
Thomas's feedback, before doing the userptr rework on top.

As of now, a different engineer will own this rework, of course with Thomas
and myself providing guidance and welcoming community input. Xe will likely
be the first user, so if we have to tweak things as more drivers start to
use this, that is fine and we will be open to any changes.

> > 3) Christian mentioned a plan to have a common userptr implementation
> > based off drm_exec. I figure that would be bo-based like the amdgpu
> > implementation still is. Possibly i915 would be interested in this but
> > I think any VM_BIND based driver would want to use drm_gpuvm /
> > drm_gpusvm implementation, which is also typically O(1), since userptrs
> > are considered vm-local.

I don't think any new driver would want a userptr implementation based
on drm_exec because of having to use BOs, and that isn't required if
drm_gpuvm / drm_gpusvm is used, which I suspect all new drivers will use.
Sure, it could be useful for amdgpu / i915, but for Xe we certainly wouldn't
want this, nor would any VM-bind-only driver.

> > 
> > Ideas / suggestions welcome
> 
> So just discussed this a bit with Joonas, and if we use access_remote_vm
> for the userptr access instead of hand-rolling then we really only need
> bare-bones data structure changes in gpuvm, and nothing more. So
> 
> - add the mm pointer to struct drm_gpuvm
> - add a flag indicating that it's a userptr + userspace address to struct
>   drm_gpuva
> - since we already have userptr in drivers I guess there shouldn't be any
>   need to adjust the actual drm_gpuvm code to cope with these
> 
> Then with this you can write the access helper using access_remote_vm
> since that does the entire remote mm walking internally, and so there's
> no need to also have all the mmu notifier and locking lifted to gpuvm. But
> it does already give us some great places to put relevant kerneldocs (not
> just for debugging architecture, but userptr stuff in general), which is
> already a solid step forward.
> 
> Plus I think it'd also be a solid first step that we need no matter
> what for figuring out the questions/options you have above.
> 
> Thoughts?

This seems like it could work with everything I've written above. Maybe
this lives in gpusvm though, so we have a clear divide where gpuvm is GPU
address space and gpusvm is CPU address space. Kind of a bikeshed, but I
agree in general that if we need to access / modify userptrs this should
live in common code.

Do we view this userptr rework as a blocker for EuDebug? My thinking is
we don't, as we (Intel) have fully committed to a common userptr
implementation.

FWIW, I really don't like the implementation in this patch, and I stated
this many times, but that feedback seems to have been ignored yet again.
I'd prefer an open-coded hmm_range_fault loop for now rather than a new
xe_res_cursor concept that will get thrown away.

Matt

> -Sima
> 
> > 
> > > -Sima
> > > 
> > > > 
> > > > > v2: pin pages vs notifier, move to vm.c (Matthew)
> > > > > v3: - iterate over system pages instead of DMA, fixes iommu enabled
> > > > >     - s/xe_uvma_access/xe_vm_uvma_access/ (Matt)
> > > > > 
> > > > > Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > > > > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> > > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v1
> > > > > ---
> > > > >   drivers/gpu/drm/xe/xe_eudebug.c |  3 ++-
> > > > >   drivers/gpu/drm/xe/xe_vm.c      | 47 +++++++++++++++++++++++++++++
> > > > >   drivers/gpu/drm/xe/xe_vm.h      |  3 +++
> > > > >   3 files changed, 52 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> > > > > index 9d87df75348b..e5949e4dcad8 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_eudebug.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> > > > > @@ -3076,7 +3076,8 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
> > > > >   		return ret;
> > > > >   	}
> > > > > -	return -EINVAL;
> > > > > +	return xe_vm_userptr_access(to_userptr_vma(vma), offset_in_vma,
> > > > > +				    buf, bytes, write);
> > > > >   }
> > > > >   static int xe_eudebug_vm_access(struct xe_vm *vm, u64 offset,
> > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > > > index 0f17bc8b627b..224ff9e16941 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > @@ -3414,3 +3414,50 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
> > > > >   	}
> > > > >   	kvfree(snap);
> > > > >   }
> > > > > +
> > > > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > > > > +			 void *buf, u64 len, bool write)
> > > > > +{
> > > > > +	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > > > > +	struct xe_userptr *up = &uvma->userptr;
> > > > > +	struct xe_res_cursor cur = {};
> > > > > +	int cur_len, ret = 0;
> > > > > +
> > > > > +	while (true) {
> > > > > +		down_read(&vm->userptr.notifier_lock);
> > > > > +		if (!xe_vma_userptr_check_repin(uvma))
> > > > > +			break;
> > > > > +
> > > > > +		spin_lock(&vm->userptr.invalidated_lock);
> > > > > +		list_del_init(&uvma->userptr.invalidate_link);
> > > > > +		spin_unlock(&vm->userptr.invalidated_lock);
> > > > > +
> > > > > +		up_read(&vm->userptr.notifier_lock);
> > > > > +		ret = xe_vma_userptr_pin_pages(uvma);
> > > > > +		if (ret)
> > > > > +			return ret;
> > > > > +	}
> > > > > +
> > > > > +	if (!up->sg) {
> > > > > +		ret = -EINVAL;
> > > > > +		goto out_unlock_notifier;
> > > > > +	}
> > > > > +
> > > > > +	for (xe_res_first_sg_system(up->sg, offset, len, &cur); cur.remaining;
> > > > > +	     xe_res_next(&cur, cur_len)) {
> > > > > +		void *ptr = kmap_local_page(sg_page(cur.sgl)) + cur.start;
> > > > 
> > > > The interface basically creates a side channel to access userptrs in the
> > > > way a userspace application would do without actually going through
> > > > userspace.
> > > > 
> > > > That is generally not something a device driver should ever do as far
> > > > as I can see.
> > > > 
> > > > > +
> > > > > +		cur_len = min(cur.size, cur.remaining);
> > > > > +		if (write)
> > > > > +			memcpy(ptr, buf, cur_len);
> > > > > +		else
> > > > > +			memcpy(buf, ptr, cur_len);
> > > > > +		kunmap_local(ptr);
> > > > > +		buf += cur_len;
> > > > > +	}
> > > > > +	ret = len;
> > > > > +
> > > > > +out_unlock_notifier:
> > > > > +	up_read(&vm->userptr.notifier_lock);
> > > > 
> > > > I just strongly hope that this will prevent the mapping from
> > > > changing.
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > > +	return ret;
> > > > > +}
> > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > > > > index 23adb7442881..372ad40ad67f 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > > > @@ -280,3 +280,6 @@ struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
> > > > >   void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
> > > > >   void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
> > > > >   void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> > > > > +
> > > > > +int xe_vm_userptr_access(struct xe_userptr_vma *uvma, u64 offset,
> > > > > +			 void *buf, u64 len, bool write);
> > > > 
> > > 
> > 
> 
> -- 
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2024-12-09 13:33 ` [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access Mika Kuoppala
  2024-12-09 14:03   ` Christian König
@ 2024-12-16 14:17   ` Mika Kuoppala
  2024-12-20 11:31   ` Mika Kuoppala
  2 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-16 14:17 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Mika Kuoppala, Matthew Brost, Andrzej Hajda,
	Thomas Hellström, Christian König, Joonas Lahtinen,
	Simona Vetter

Implement debugger vm access for userptrs.

When the bind is done, take a ref to the current task so that
we know from which vm the address was bound. Then, during
debugger pread/pwrite, we use this target task as a
parameter to access the debuggee vm with access_process_vm().

This is based on suggestions from Thomas, Joonas and Simona.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c  | 12 ++++++++++++
 drivers/gpu/drm/xe/xe_vm.c       | 11 +++++++++++
 drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 9d87df75348b..980b5a1383ad 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3074,6 +3074,18 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
 		xe_bo_put(bo);
 
 		return ret;
+	} else if (xe_vma_is_userptr(vma)) {
+		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+
+		/*
+		 * XXX: access_remote_vm() would fit as userptr notifier has
+		 * mm ref so we would not need to carry task ref at all.
+		 * But access_remote_vm is not exported. access_process_vm()
+		 * is exported so use it instead.
+		 */
+		return access_process_vm(userptr->eudebug.task,
+					 xe_vma_userptr(vma), buf, bytes,
+					 write ? FOLL_WRITE : 0);
 	}
 
 	return -EINVAL;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 0f17bc8b627b..c23bb4547d66 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -999,6 +999,14 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			}
 
 			userptr->notifier_seq = LONG_MAX;
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+			/*
+			 * We could use the mm which is on notifier. But
+			 * the access_remote_vm() is not exported. Thus
+			 * we get reference to task for access_process_vm()
+			 */
+			userptr->eudebug.task = get_task_struct(current);
+#endif
 		}
 
 		xe_vm_get(vm);
@@ -1023,6 +1031,9 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 		if (userptr->sg)
 			xe_hmm_userptr_free_sg(uvma);
 
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+		put_task_struct(userptr->eudebug.task);
+#endif
 		/*
 		 * Since userptr pages are not pinned, we can't remove
 		 * the notifer until we're sure the GPU is not accessing
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 557b047ebdd7..26176ccbcbbc 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -68,6 +68,12 @@ struct xe_userptr {
 #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
 	u32 divisor;
 #endif
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	struct {
+		struct task_struct *task;
+	} eudebug;
+#endif
 };
 
 struct xe_vma {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev2)
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (28 preceding siblings ...)
  2024-12-09 14:39 ` ✗ CI.KUnit: failure " Patchwork
@ 2024-12-16 14:22 ` Patchwork
  2024-12-20 14:36 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev3) Patchwork
  2025-01-13 16:15 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev4) Patchwork
  31 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2024-12-16 14:22 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-xe

== Series Details ==

Series: Intel Xe GPU debug support (eudebug) v3 (rev2)
URL   : https://patchwork.freedesktop.org/series/142295/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 5c2a17b3ae90 drm-tip: 2024y-12m-16d-13h-54m-55s UTC integration manifest
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_exec_queue.c:117
error: drivers/gpu/drm/xe/xe_exec_queue.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_execlist.c:265
error: drivers/gpu/drm/xe/xe_execlist.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_lrc.c:876
error: drivers/gpu/drm/xe/xe_lrc.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_lrc.h:41
error: drivers/gpu/drm/xe/xe_lrc.h: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: ptrace: export ptrace_may_access
Applying: drm/xe/eudebug: Introduce eudebug support
Applying: drm/xe/eudebug: Introduce discovery for resources
Applying: drm/xe/eudebug: Introduce exec_queue events
Applying: drm/xe/eudebug: Introduce exec queue placements event
Applying: drm/xe/eudebug: hw enablement for eudebug
Applying: drm/xe: Add EUDEBUG_ENABLE exec queue property
Patch failed at 0007 drm/xe: Add EUDEBUG_ENABLE exec queue property
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-11 12:59                 ` Joonas Lahtinen
@ 2024-12-17 14:12                   ` Joonas Lahtinen
  2024-12-20 12:47                     ` Mika Kuoppala
  0 siblings, 1 reply; 63+ messages in thread
From: Joonas Lahtinen @ 2024-12-17 14:12 UTC (permalink / raw)
  To: Andrzej Hajda, Christian König, Christoph Hellwig,
	Jonathan Cavitt, Linux MM, Maciej Patelczyk, Mika Kuoppala,
	dri-devel, intel-xe, lkml

Quoting Joonas Lahtinen (2024-12-11 14:59:33)
> Quoting Christian König (2024-12-10 16:03:14)

<SNIP>

> > If you really want to expose an interface to userspace which walks the process
> > page table, installs an MMU notifier, kmaps the resulting page and then memcpy
> > to/from it then you absolutely *must* run that by guys like Christoph Hellwig,
> > Andrew and even Linus.

> > I'm pretty sure that those guys will note that a device driver should
> > absolutely not mess with such stuff.

<SNIP>

> >     But that seems like a high-overhead thing to do due to the overhead of
> >     setting up a transfer per data word and going over the PCI bus twice
> >     compared to accessing the memory directly by CPU when it trivially can.
> > 
> > 
> > Understandable, but that will create another way of accessing process memory.

Based on this feedback and some further discussion, we now have an alternative
implementation for this interface via access_process_vm function posted by Mika:

https://lore.kernel.org/dri-devel/20241216141721.2051279-1-mika.kuoppala@linux.intel.com/

It's a couple of dozen lines, doesn't need any open-coded kmapping, and only
utilizes the pre-existing memory access functions.

Hopefully that would address the above concerns?

Regards, Joonas

PS. It could still be optimized further to directly use the struct mm
from within the mm notifier, and go with access_remote_vm using that,
but that would require a new symbol export.

For demonstration it is implemented by grabbing the task_struct and using
the already exported access_process_vm function.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2024-12-09 13:33 ` [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access Mika Kuoppala
  2024-12-09 14:03   ` Christian König
  2024-12-16 14:17   ` [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite Mika Kuoppala
@ 2024-12-20 11:31   ` Mika Kuoppala
  2024-12-20 12:56     ` Christian König
  2024-12-23 10:31     ` Thomas Hellström
  2 siblings, 2 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-20 11:31 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Mika Kuoppala, Matthew Brost, Andrzej Hajda,
	Thomas Hellström, Dominik Grzegorzek, Christian König,
	Joonas Lahtinen, Simona Vetter

Implement debugger vm access for userptrs.

When the bind is done, take a ref to the current task so that
we know from which vm the address was bound. Then, during
debugger pread/pwrite, we use this target task as a
parameter to access the debuggee vm with access_process_vm().

This is based on suggestions from Thomas, Joonas and Simona.

v2: need to add offset into vma (Dominik)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c  | 13 +++++++++++++
 drivers/gpu/drm/xe/xe_vm.c       |  4 ++++
 drivers/gpu/drm/xe/xe_vm.h       | 28 +++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
 4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 9d87df75348b..8b29192ab110 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3074,6 +3074,19 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
 		xe_bo_put(bo);
 
 		return ret;
+	} else if (xe_vma_is_userptr(vma)) {
+		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+
+		/*
+		 * XXX: access_remote_vm() would fit as userptr notifier has
+		 * mm ref so we would not need to carry task ref at all.
+		 * But access_remote_vm is not exported. access_process_vm()
+		 * is exported so use it instead.
+		 */
+		return access_process_vm(userptr->eudebug.task,
+					 xe_vma_userptr(vma) + offset_in_vma,
+					 buf, bytes,
+					 write ? FOLL_WRITE : 0);
 	}
 
 	return -EINVAL;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 1cb21325d8dd..235ae2db5188 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -999,6 +999,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			}
 
 			userptr->notifier_seq = LONG_MAX;
+
+			xe_eudebug_track_userptr_task(userptr);
 		}
 
 		xe_vm_get(vm);
@@ -1023,6 +1025,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 		if (userptr->sg)
 			xe_hmm_userptr_free_sg(uvma);
 
+		xe_eudebug_untrack_userptr_task(userptr);
+
 		/*
 		 * Since userptr pages are not pinned, we can't remove
 		 * the notifer until we're sure the GPU is not accessing
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 23adb7442881..4334cf2b0d9d 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -274,9 +274,35 @@ static inline void vm_dbg(const struct drm_device *dev,
 			  const char *format, ...)
 { /* noop */ }
 #endif
-#endif
 
 struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
 void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
 void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
 void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
+{
+	/*
+	 * We could use the mm which is on notifier. But
+	 * the access_remote_vm() is not exported. Thus
+	 * we get reference to task for access_process_vm()
+	 */
+	userptr->eudebug.task = get_task_struct(current);
+}
+
+static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
+{
+	put_task_struct(userptr->eudebug.task);
+}
+#else
+static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
+{
+}
+
+static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
+{
+}
+#endif /* CONFIG_DRM_XE_EUDEBUG */
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 557b047ebdd7..26176ccbcbbc 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -68,6 +68,12 @@ struct xe_userptr {
 #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
 	u32 divisor;
 #endif
+
+#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
+	struct {
+		struct task_struct *task;
+	} eudebug;
+#endif
 };
 
 struct xe_vma {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access
  2024-12-17 14:12                   ` Joonas Lahtinen
@ 2024-12-20 12:47                     ` Mika Kuoppala
  0 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2024-12-20 12:47 UTC (permalink / raw)
  To: Joonas Lahtinen, Andrzej Hajda, Christian König,
	Christoph Hellwig, Jonathan Cavitt, Linux MM, Maciej Patelczyk,
	dri-devel, intel-xe, lkml

Joonas Lahtinen <joonas.lahtinen@linux.intel.com> writes:

> Quoting Joonas Lahtinen (2024-12-11 14:59:33)
>> Quoting Christian König (2024-12-10 16:03:14)
>
> <SNIP>
>
>> > If you really want to expose an interface to userspace which walks the process
>> > page table, installs an MMU notifier, kmaps the resulting page and then memcpy
>> > to/from it then you absolutely *must* run that by guys like Christoph Hellwig,
>> > Andrew and even Linus.
>
>> > I'm pretty sure that those guys will note that a device driver should
>> > absolutely not mess with such stuff.
>
> <SNIP>
>
>> >     But that seems like a high-overhead thing to do due to the overhead of
>> >     setting up a transfer per data word and going over the PCI bus twice
>> >     compared to accessing the memory directly by CPU when it trivially can.
>> > 
>> > 
>> > Understandable, but that will create another way of accessing process memory.
>
> Based on this feedback and some further discussion, we now have an alternative
> implementation for this interface via access_process_vm function posted by Mika:
>
> https://lore.kernel.org/dri-devel/20241216141721.2051279-1-mika.kuoppala@linux.intel.com/

v2:
https://lore.kernel.org/dri-devel/20241220113108.2386842-1-mika.kuoppala@linux.intel.com/
-Mika

>
> It's a couple of dozen lines, doesn't need any open-coded kmapping, and only
> utilizes the pre-existing memory access functions.
>
> Hopefully that would address the above concerns?
>
> Regards, Joonas
>
> PS. It could still be optimized further to directly use the struct mm
> from within the mm notifier, and go with access_remote_vm using that,
> but that would require a new symbol export.
>
> For demonstration it is implemented by grabbing the task_struct and using
> the already exported access_process_vm function.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2024-12-20 11:31   ` Mika Kuoppala
@ 2024-12-20 12:56     ` Christian König
  2025-01-29  8:03       ` Joonas Lahtinen
  2024-12-23 10:31     ` Thomas Hellström
  1 sibling, 1 reply; 63+ messages in thread
From: Christian König @ 2024-12-20 12:56 UTC (permalink / raw)
  To: Mika Kuoppala, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Thomas Hellström,
	Dominik Grzegorzek, Joonas Lahtinen, Simona Vetter

On 20.12.24 12:31, Mika Kuoppala wrote:
> Implement debugger vm access for userptrs.
>
> When the bind is done, take a ref to the current task so that
> we know from which vm the address was bound. Then, during
> debugger pread/pwrite, we use this target task as a
> parameter to access the debuggee vm with access_process_vm().
>
> This is based on suggestions from Thomas, Joonas and Simona.

Yeah that looks much saner to me. I still have a couple of comments on 
the general approach, but I'm going to write that up after my vacation.

Regards,
Christian.

>
> v2: need to add offset into vma (Dominik)
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/xe/xe_eudebug.c  | 13 +++++++++++++
>   drivers/gpu/drm/xe/xe_vm.c       |  4 ++++
>   drivers/gpu/drm/xe/xe_vm.h       | 28 +++++++++++++++++++++++++++-
>   drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
>   4 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> index 9d87df75348b..8b29192ab110 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug.c
> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> @@ -3074,6 +3074,19 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
>   		xe_bo_put(bo);
>   
>   		return ret;
> +	} else if (xe_vma_is_userptr(vma)) {
> +		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
> +
> +		/*
> +		 * XXX: access_remote_vm() would fit as userptr notifier has
> +		 * mm ref so we would not need to carry task ref at all.
> +		 * But access_remote_vm is not exported. access_process_vm()
> +		 * is exported so use it instead.
> +		 */
> +		return access_process_vm(userptr->eudebug.task,
> +					 xe_vma_userptr(vma) + offset_in_vma,
> +					 buf, bytes,
> +					 write ? FOLL_WRITE : 0);
>   	}
>   
>   	return -EINVAL;
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 1cb21325d8dd..235ae2db5188 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -999,6 +999,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>   			}
>   
>   			userptr->notifier_seq = LONG_MAX;
> +
> +			xe_eudebug_track_userptr_task(userptr);
>   		}
>   
>   		xe_vm_get(vm);
> @@ -1023,6 +1025,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
>   		if (userptr->sg)
>   			xe_hmm_userptr_free_sg(uvma);
>   
> +		xe_eudebug_untrack_userptr_task(userptr);
> +
>   		/*
>   		 * Since userptr pages are not pinned, we can't remove
>   		 * the notifer until we're sure the GPU is not accessing
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 23adb7442881..4334cf2b0d9d 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -274,9 +274,35 @@ static inline void vm_dbg(const struct drm_device *dev,
>   			  const char *format, ...)
>   { /* noop */ }
>   #endif
> -#endif
>   
>   struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>   void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>   void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>   void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
> +static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
> +{
> +	/*
> +	 * We could use the mm which is on notifier. But
> +	 * the access_remote_vm() is not exported. Thus
> +	 * we get reference to task for access_process_vm()
> +	 */
> +	userptr->eudebug.task = get_task_struct(current);
> +}
> +
> +static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
> +{
> +	put_task_struct(userptr->eudebug.task);
> +}
> +#else
> +static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
> +{
> +}
> +
> +static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
> +{
> +}
> +#endif /* CONFIG_DRM_XE_EUDEBUG */
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 557b047ebdd7..26176ccbcbbc 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -68,6 +68,12 @@ struct xe_userptr {
>   #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
>   	u32 divisor;
>   #endif
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
> +	struct {
> +		struct task_struct *task;
> +	} eudebug;
> +#endif
>   };
>   
>   struct xe_vma {


^ permalink raw reply	[flat|nested] 63+ messages in thread

* ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev3)
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (29 preceding siblings ...)
  2024-12-16 14:22 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev2) Patchwork
@ 2024-12-20 14:36 ` Patchwork
  2025-01-13 16:15 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev4) Patchwork
  31 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2024-12-20 14:36 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-xe

== Series Details ==

Series: Intel Xe GPU debug support (eudebug) v3 (rev3)
URL   : https://patchwork.freedesktop.org/series/142295/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: d57431066901 drm-tip: 2024y-12m-20d-11h-06m-45s UTC integration manifest
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_exec_queue.c:117
error: drivers/gpu/drm/xe/xe_exec_queue.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_execlist.c:265
error: drivers/gpu/drm/xe/xe_execlist.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_lrc.c:876
error: drivers/gpu/drm/xe/xe_lrc.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_lrc.h:41
error: drivers/gpu/drm/xe/xe_lrc.h: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: ptrace: export ptrace_may_access
Applying: drm/xe/eudebug: Introduce eudebug support
Applying: drm/xe/eudebug: Introduce discovery for resources
Applying: drm/xe/eudebug: Introduce exec_queue events
Applying: drm/xe/eudebug: Introduce exec queue placements event
Applying: drm/xe/eudebug: hw enablement for eudebug
Applying: drm/xe: Add EUDEBUG_ENABLE exec queue property
Patch failed at 0007 drm/xe: Add EUDEBUG_ENABLE exec queue property
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2024-12-20 11:31   ` Mika Kuoppala
  2024-12-20 12:56     ` Christian König
@ 2024-12-23 10:31     ` Thomas Hellström
  2025-01-13 13:22       ` Mika Kuoppala
                         ` (2 more replies)
  1 sibling, 3 replies; 63+ messages in thread
From: Thomas Hellström @ 2024-12-23 10:31 UTC (permalink / raw)
  To: Mika Kuoppala, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Dominik Grzegorzek,
	Christian König, Joonas Lahtinen, Simona Vetter

On Fri, 2024-12-20 at 13:31 +0200, Mika Kuoppala wrote:
> Implement debugger vm access for userptrs.
> 
> When bind is done, take ref to current task so that
> we know from which vm the address was bound. Then during
> debugger pread/pwrite we use this target task as
> parameter to access the debuggee vm with access_process_vm().
> 
> This is based on suggestions from Thomas, Joonas and Simona.
> 
> v2: need to add offset into vma (Dominik)
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_eudebug.c  | 13 +++++++++++++
>  drivers/gpu/drm/xe/xe_vm.c       |  4 ++++
>  drivers/gpu/drm/xe/xe_vm.h       | 28 +++++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
>  4 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> index 9d87df75348b..8b29192ab110 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug.c
> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> @@ -3074,6 +3074,19 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,

AFAICT all across the core mm code, unsigned long is used for mm
offsets, rather than u64, which we use for gpu- and physical offsets.


>  		xe_bo_put(bo);
>  
>  		return ret;
> +	} else if (xe_vma_is_userptr(vma)) {
> +		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
> +
> +		/*
> +		 * XXX: access_remote_vm() would fit as userptr notifier has
> +		 * mm ref so we would not need to carry task ref at all.
> +		 * But access_remote_vm is not exported. access_process_vm()
> +		 * is exported so use it instead.
> +		 */

Could we add a follow-up patch that exports access_remote_vm() and
changes this code to use access_remote_vm() instead?



> +		return access_process_vm(userptr->eudebug.task,
> +					 xe_vma_userptr(vma) + offset_in_vma,
> +					 buf, bytes,
> +					 write ? FOLL_WRITE : 0);
>  	}
>  
>  	return -EINVAL;
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 1cb21325d8dd..235ae2db5188 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -999,6 +999,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>  			}
>  
>  			userptr->notifier_seq = LONG_MAX;
> +
> +			xe_eudebug_track_userptr_task(userptr);
>  		}
>  
>  		xe_vm_get(vm);
> @@ -1023,6 +1025,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
>  		if (userptr->sg)
>  			xe_hmm_userptr_free_sg(uvma);
>  
> +		xe_eudebug_untrack_userptr_task(userptr);
> +
>  		/*
>  		 * Since userptr pages are not pinned, we can't remove
>  		 * the notifer until we're sure the GPU is not accessing
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 23adb7442881..4334cf2b0d9d 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -274,9 +274,35 @@ static inline void vm_dbg(const struct drm_device *dev,
>  			  const char *format, ...)
>  { /* noop */ }
>  #endif
> -#endif
>  
>  struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>  void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>  void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>  void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
> +static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
> +{
> +	/*
> +	 * We could use the mm which is on notifier. But
> +	 * the access_remote_vm() is not exported. Thus
> +	 * we get reference to task for access_process_vm()
> +	 */
> +	userptr->eudebug.task = get_task_struct(current);
> +}
> +
> +static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
> +{
> +	put_task_struct(userptr->eudebug.task);
> +}
> +#else
> +static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
> +{
> +}
> +
> +static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
> +{
> +}
> +#endif /* CONFIG_DRM_XE_EUDEBUG */
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 557b047ebdd7..26176ccbcbbc 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -68,6 +68,12 @@ struct xe_userptr {
>  #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
>  	u32 divisor;
>  #endif
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
> +	struct {
> +		struct task_struct *task;
> +	} eudebug;
> +#endif
>  };
>  
>  struct xe_vma {

Otherwise LGTM.
Thanks,
Thomas



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2024-12-23 10:31     ` Thomas Hellström
@ 2025-01-13 13:22       ` Mika Kuoppala
  2025-01-13 13:32       ` [PATCH 13/27] mm: export access_remote_vm symbol for debugger use Mika Kuoppala
  2025-01-13 13:32       ` [PATCH 14/27] drm/xe/eudebug: userptr vm access pread/pwrite Mika Kuoppala
  2 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2025-01-13 13:22 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Dominik Grzegorzek,
	Christian König, Joonas Lahtinen, Simona Vetter

Thomas Hellström <thomas.hellstrom@linux.intel.com> writes:

> On Fri, 2024-12-20 at 13:31 +0200, Mika Kuoppala wrote:
>> Implement debugger vm access for userptrs.
>> 
>> When bind is done, take ref to current task so that
>> we know from which vm the address was bound. Then during
>> debugger pread/pwrite we use this target task as
>> parameter to access the debuggee vm with access_process_vm().
>> 
>> This is based on suggestions from Thomas, Joonas and Simona.
>> 
>> v2: need to add offset into vma (Dominik)
>> 
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Simona Vetter <simona@ffwll.ch>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_eudebug.c  | 13 +++++++++++++
>>  drivers/gpu/drm/xe/xe_vm.c       |  4 ++++
>>  drivers/gpu/drm/xe/xe_vm.h       | 28 +++++++++++++++++++++++++++-
>>  drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
>>  4 files changed, 50 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c
>> b/drivers/gpu/drm/xe/xe_eudebug.c
>> index 9d87df75348b..8b29192ab110 100644
>> --- a/drivers/gpu/drm/xe/xe_eudebug.c
>> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
>> @@ -3074,6 +3074,19 @@ static int xe_eudebug_vma_access(struct xe_vma
>> *vma, u64 offset_in_vma,
>
> AFAICT all across the core mm code, unsigned long is used for mm
> offsets, rather than u64, which we use for gpu- and physical offsets.

Yup, changed these in the patch introducing the pread/pwrite.

>
>
>>  		xe_bo_put(bo);
>>  
>>  		return ret;
>> +	} else if (xe_vma_is_userptr(vma)) {
>> +		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
>> +
>> +		/*
>> +		 * XXX: access_remote_vm() would fit as userptr notifier has
>> +		 * mm ref so we would not need to carry task ref at all.
>> +		 * But access_remote_vm is not exported. access_process_vm()
>> +		 * is exported so use it instead.
>> +		 */
>
> Could we add a follow-up patch that exports access_remote_vm() and
> changes this code to use access_remote_vm() instead?
>

Here is the diff:

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 996fcb4b0e9e..3fdafbf30209 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3763,16 +3763,25 @@ static int xe_eudebug_vma_access(struct xe_vma *vma, u64 offset_in_vma,
 		return ret;
 	} else if (xe_vma_is_userptr(vma)) {
 		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+		struct xe_vm *vm = xe_vma_vm(vma);
+		struct mm_struct *mm = NULL;
+		int ret;
 
-		/*
-		 * XXX: access_remote_vm() would fit as userptr notifier has
-		 * mm ref so we would not need to carry task ref at all.
-		 * But access_remote_vm is not exported. access_process_vm()
-		 * is exported so use it instead.
-		 */
-		return access_process_vm(userptr->eudebug.task,
-					 xe_vma_userptr(vma), buf, bytes,
-					 write ? FOLL_WRITE : 0);
+		down_read(&vm->userptr.notifier_lock);
+		if (mmget_not_zero(userptr->notifier.mm))
+			mm = userptr->notifier.mm;
+		up_read(&vm->userptr.notifier_lock);
+
+		if (!mm)
+			return -EFAULT;
+
+		ret = access_remote_vm(mm,
+				       xe_vma_userptr(vma) + offset_in_vma,
+				       buf, bytes,
+				       write ? FOLL_WRITE : 0);
+		mmput(mm);
+
+		return ret;
 	}
 
 	return -EINVAL;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index cbc7fdb74166..04157b6b26ea 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1003,14 +1003,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			}
 
 			userptr->notifier_seq = LONG_MAX;
-#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
-			/*
-			 * We could use the mm which is on notifier. But
-			 * the access_remote_vm() is not exported. Thus
-			 * we get reference to task for access_process_vm()
-			 */
-			userptr->eudebug.task = get_task_struct(current);
-#endif
 		}
 
 		xe_vm_get(vm);
@@ -1035,9 +1027,6 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 		if (userptr->sg)
 			xe_hmm_userptr_free_sg(uvma);
 
-#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
-		put_task_struct(userptr->eudebug.task);
-#endif
 		/*
 		 * Since userptr pages are not pinned, we can't remove
 		 * the notifer until we're sure the GPU is not accessing
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 0be999dd513f..1c5776194e54 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -68,12 +68,6 @@ struct xe_userptr {
 #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
 	u32 divisor;
 #endif
-
-#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
-	struct {
-		struct task_struct *task;
-	} eudebug;
-#endif
 };
 
 #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)

I will also reply with the export patch and
the complete patch. For reference, they can also be found here:
https://gitlab.freedesktop.org/miku/kernel/-/commit/3ffbc66fb6dd2ff0a9f5f282266a97e073f10deb
https://gitlab.freedesktop.org/miku/kernel/-/commit/ee2ebe9a5debabf984b2cfab34bf0996ace63ab7

Thanks,
-Mika

>
>
>> +		return access_process_vm(userptr->eudebug.task,
>> +					 xe_vma_userptr(vma) + offset_in_vma,
>> +					 buf, bytes,
>> +					 write ? FOLL_WRITE : 0);
>>  	}
>>  
>>  	return -EINVAL;
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 1cb21325d8dd..235ae2db5188 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -999,6 +999,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>  			}
>>  
>>  			userptr->notifier_seq = LONG_MAX;
>> +
>> +			xe_eudebug_track_userptr_task(userptr);
>>  		}
>>  
>>  		xe_vm_get(vm);
>> @@ -1023,6 +1025,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
>>  		if (userptr->sg)
>>  			xe_hmm_userptr_free_sg(uvma);
>>  
>> +		xe_eudebug_untrack_userptr_task(userptr);
>> +
>>  		/*
>>  		 * Since userptr pages are not pinned, we can't remove
>>  		 * the notifer until we're sure the GPU is not accessing
>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>> index 23adb7442881..4334cf2b0d9d 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.h
>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>> @@ -274,9 +274,35 @@ static inline void vm_dbg(const struct drm_device *dev,
>>  			  const char *format, ...)
>>  { /* noop */ }
>>  #endif
>> -#endif
>>  
>>  struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
>>  void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
>>  void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
>>  void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
>> +
>> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
>> +static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
>> +{
>> +	/*
>> +	 * We could use the mm which is on notifier. But
>> +	 * the access_remote_vm() is not exported. Thus
>> +	 * we get reference to task for access_process_vm()
>> +	 */
>> +	userptr->eudebug.task = get_task_struct(current);
>> +}
>> +
>> +static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
>> +{
>> +	put_task_struct(userptr->eudebug.task);
>> +}
>> +#else
>> +static inline void xe_eudebug_track_userptr_task(struct xe_userptr *userptr)
>> +{
>> +}
>> +
>> +static inline void xe_eudebug_untrack_userptr_task(struct xe_userptr *userptr)
>> +{
>> +}
>> +#endif /* CONFIG_DRM_XE_EUDEBUG */
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index 557b047ebdd7..26176ccbcbbc 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -68,6 +68,12 @@ struct xe_userptr {
>>  #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
>>  	u32 divisor;
>>  #endif
>> +
>> +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
>> +	struct {
>> +		struct task_struct *task;
>> +	} eudebug;
>> +#endif
>>  };
>>  
>>  struct xe_vma {
>
> Otherwise LGTM.
> Thanks,
> Thomas

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 14/27] drm/xe/eudebug: userptr vm access pread/pwrite
@ 2025-01-13 13:26 Mika Kuoppala
  0 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2025-01-13 13:26 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Mika Kuoppala, Matthew Brost, Andrzej Hajda,
	Thomas Hellström, Dominik Grzegorzek, Christian König,
	Joonas Lahtinen, Simona Vetter

Implement debugger vm access for userptrs.

When a userptr bind is done, an mmu notifier is added by core xe.
Later, when the debugger wants to access the target memory, this
notifier can be used, as it carries the struct mm of the target.

Implement userptr vm access for debugger pread/pwrite,
using the notifier mm passed to access_remote_vm().

This is based on suggestions from Thomas, Joonas and Simona.

v2: need to add offset into vma (Dominik)
v3: use exported access_remote_vm (Thomas)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 210d9eeab1a7..25f18aa5447b 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3077,6 +3077,27 @@ static int xe_eudebug_vma_access(struct xe_vma *vma,
 
 		xe_bo_put(bo);
 
+		return ret;
+	} else if (xe_vma_is_userptr(vma)) {
+		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+		struct xe_vm *vm = xe_vma_vm(vma);
+		struct mm_struct *mm = NULL;
+		int ret;
+
+		down_read(&vm->userptr.notifier_lock);
+		if (mmget_not_zero(userptr->notifier.mm))
+			mm = userptr->notifier.mm;
+		up_read(&vm->userptr.notifier_lock);
+
+		if (!mm)
+			return -EFAULT;
+
+		ret = access_remote_vm(mm,
+				       xe_vma_userptr(vma) + offset_in_vma,
+				       buf, bytes,
+				       write ? FOLL_WRITE : 0);
+		mmput(mm);
+
 		return ret;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 13/27] mm: export access_remote_vm symbol for debugger use
  2024-12-23 10:31     ` Thomas Hellström
  2025-01-13 13:22       ` Mika Kuoppala
@ 2025-01-13 13:32       ` Mika Kuoppala
  2025-01-13 13:32       ` [PATCH 14/27] drm/xe/eudebug: userptr vm access pread/pwrite Mika Kuoppala
  2 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2025-01-13 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Mika Kuoppala, Matthew Brost, Andrzej Hajda,
	Thomas Hellström, Christian König, Joonas Lahtinen,
	Simona Vetter

Export access_remote_vm as GPL symbol to allow debuggers (eudebug)
to access and modify memory in target VMs by tracking VM_BIND
operations. While access_process_vm() is already exported, it would
require maintaining task references in the debugger.

Since the mm reference is already present in the userptr's mm notifier
implementation, exporting access_remote_vm allows that existing reference
to be used directly without needing to obtain and maintain additional
task references just for memory access.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 mm/memory.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory.c b/mm/memory.c
index 398c031be9ba..9b7c71c83db5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6690,6 +6690,7 @@ int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 {
 	return __access_remote_vm(mm, addr, buf, len, gup_flags);
 }
+EXPORT_SYMBOL_GPL(access_remote_vm);
 
 /*
  * Access another process' address space.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 14/27] drm/xe/eudebug: userptr vm access pread/pwrite
  2024-12-23 10:31     ` Thomas Hellström
  2025-01-13 13:22       ` Mika Kuoppala
  2025-01-13 13:32       ` [PATCH 13/27] mm: export access_remote_vm symbol for debugger use Mika Kuoppala
@ 2025-01-13 13:32       ` Mika Kuoppala
  2 siblings, 0 replies; 63+ messages in thread
From: Mika Kuoppala @ 2025-01-13 13:32 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Mika Kuoppala, Matthew Brost, Andrzej Hajda,
	Thomas Hellström, Dominik Grzegorzek, Christian König,
	Joonas Lahtinen, Simona Vetter

Implement debugger vm access for userptrs.

When a userptr bind is done, an mmu notifier is added by core xe.
Later, when the debugger wants to access the target memory, this
notifier can be used, as it carries the struct mm of the target.

Implement userptr vm access for debugger pread/pwrite,
using the notifier mm passed to access_remote_vm().

This is based on suggestions from Thomas, Joonas and Simona.

v2: need to add offset into vma (Dominik)
v3: use exported access_remote_vm (Thomas)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_eudebug.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
index 210d9eeab1a7..25f18aa5447b 100644
--- a/drivers/gpu/drm/xe/xe_eudebug.c
+++ b/drivers/gpu/drm/xe/xe_eudebug.c
@@ -3077,6 +3077,27 @@ static int xe_eudebug_vma_access(struct xe_vma *vma,
 
 		xe_bo_put(bo);
 
+		return ret;
+	} else if (xe_vma_is_userptr(vma)) {
+		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+		struct xe_vm *vm = xe_vma_vm(vma);
+		struct mm_struct *mm = NULL;
+		int ret;
+
+		down_read(&vm->userptr.notifier_lock);
+		if (mmget_not_zero(userptr->notifier.mm))
+			mm = userptr->notifier.mm;
+		up_read(&vm->userptr.notifier_lock);
+
+		if (!mm)
+			return -EFAULT;
+
+		ret = access_remote_vm(mm,
+				       xe_vma_userptr(vma) + offset_in_vma,
+				       buf, bytes,
+				       write ? FOLL_WRITE : 0);
+		mmput(mm);
+
 		return ret;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev4)
  2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
                   ` (30 preceding siblings ...)
  2024-12-20 14:36 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev3) Patchwork
@ 2025-01-13 16:15 ` Patchwork
  31 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2025-01-13 16:15 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-xe

== Series Details ==

Series: Intel Xe GPU debug support (eudebug) v3 (rev4)
URL   : https://patchwork.freedesktop.org/series/142295/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 26615f392d22 drm-tip: 2025y-01m-13d-14h-56m-16s UTC integration manifest
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_exec_queue.c:117
error: drivers/gpu/drm/xe/xe_exec_queue.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_execlist.c:265
error: drivers/gpu/drm/xe/xe_execlist.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_lrc.c:876
error: drivers/gpu/drm/xe/xe_lrc.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_lrc.h:41
error: drivers/gpu/drm/xe/xe_lrc.h: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: ptrace: export ptrace_may_access
Applying: drm/xe/eudebug: Introduce eudebug support
Applying: drm/xe/eudebug: Introduce discovery for resources
Applying: drm/xe/eudebug: Introduce exec_queue events
Applying: drm/xe/eudebug: Introduce exec queue placements event
Applying: drm/xe/eudebug: hw enablement for eudebug
Applying: drm/xe: Add EUDEBUG_ENABLE exec queue property
Patch failed at 0007 drm/xe: Add EUDEBUG_ENABLE exec queue property
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2024-12-20 12:56     ` Christian König
@ 2025-01-29  8:03       ` Joonas Lahtinen
  2025-01-29 10:33         ` Christian König
  0 siblings, 1 reply; 63+ messages in thread
From: Joonas Lahtinen @ 2025-01-29  8:03 UTC (permalink / raw)
  To: Christian König, Mika Kuoppala, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Thomas Hellström,
	Dominik Grzegorzek, Simona Vetter

Quoting Christian König (2024-12-20 14:56:14)
> Am 20.12.24 um 12:31 schrieb Mika Kuoppala:
> > Implement debugger vm access for userptrs.
> >
> > When bind is done, take ref to current task so that
> > we know from which vm the address was bound. Then during
> > debugger pread/pwrite we use this target task as
> > parameter to access the debuggee vm with access_process_vm().
> >
> > This is based on suggestions from Thomas, Joonas and Simona.
> 
> Yeah that looks much saner to me. I still have a couple of comments on 
> the general approach, but I'm going to write that up after my vacation.

I see you've had some issues with mail servers, so just pinging here to
see if any replies have got lost.

Would be great to reach a consensus on the high level details before
spinning off further series addressing the smaller items.

Regards, Joonas

> 
> Regards,
> Christian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2025-01-29  8:03       ` Joonas Lahtinen
@ 2025-01-29 10:33         ` Christian König
  2025-01-29 18:18           ` Joonas Lahtinen
  0 siblings, 1 reply; 63+ messages in thread
From: Christian König @ 2025-01-29 10:33 UTC (permalink / raw)
  To: Joonas Lahtinen, Mika Kuoppala, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Thomas Hellström,
	Dominik Grzegorzek, Simona Vetter

Am 29.01.25 um 09:03 schrieb Joonas Lahtinen:
> Quoting Christian König (2024-12-20 14:56:14)
>> Am 20.12.24 um 12:31 schrieb Mika Kuoppala:
>>> Implement debugger vm access for userptrs.
>>>
>>> When bind is done, take ref to current task so that
>>> we know from which vm the address was bound. Then during
>>> debugger pread/pwrite we use this target task as
>>> parameter to access the debuggee vm with access_process_vm().
>>>
>>> This is based on suggestions from Thomas, Joonas and Simona.
>> Yeah that looks much saner to me. I still have a couple of comments on
>> the general approach, but I'm going to write that up after my vacation.
> I see you've had some issues with mail servers, so just pinging here to
> see if any replies have got lost.

No, I'm just overworked and have like 10 things I need to take care of 
at the same time :(

> Would be great to reach a consensus on the high level details before
> spinning off further series addressing the smaller items.

I would say that attaching debug metadata to the GPU VMA doesn't look 
like the best design, but if you just do that inside XE it won't affect 
any other part of the kernel.

My other concern I have is using ptrace_may_access, I would still try to 
avoid that.

What if you first grab the DRM render node file descriptor which 
represents the GPU address space you want to debug with pidfd_getfd() 
and then either create the eudebug file descriptor from an IOCTL or 
implement the necessary IOCTLs on the DRM render node directly?

That would make it unnecessary to export ptrace_may_access.

Regards,
Christian.

>
> Regards, Joonas
>
>> Regards,
>> Christian.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2025-01-29 10:33         ` Christian König
@ 2025-01-29 18:18           ` Joonas Lahtinen
  2025-01-30 12:09             ` Christian König
  0 siblings, 1 reply; 63+ messages in thread
From: Joonas Lahtinen @ 2025-01-29 18:18 UTC (permalink / raw)
  To: Christian König, Mika Kuoppala, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Thomas Hellström,
	Dominik Grzegorzek, Simona Vetter

Quoting Christian König (2025-01-29 12:33:52)
> Am 29.01.25 um 09:03 schrieb Joonas Lahtinen:
> > Quoting Christian König (2024-12-20 14:56:14)
> >> Am 20.12.24 um 12:31 schrieb Mika Kuoppala:
> >>> Implement debugger vm access for userptrs.
> >>>
> >>> When bind is done, take ref to current task so that
> >>> we know from which vm the address was bound. Then during
> >>> debugger pread/pwrite we use this target task as
> >>> parameter to access the debuggee vm with access_process_vm().
> >>>
> >>> This is based on suggestions from Thomas, Joonas and Simona.
> >> Yeah that looks much saner to me. I still have a couple of comments on
> >> the general approach, but I'm going to write that up after my vacation.
> > I see you've had some issues with mail servers, so just pinging here to
> > see if any replies have got lost.
> 
> No, I'm just overworked and have like 10 things I need to take care of 
> at the same time :(

Ack.

> > Would be great to reach a consensus on the high level details before
> > spinning off further series addressing the smaller items.
> 
> I would say that attaching debug metadata to the GPU VMA doesn't look 
> like the best design, but if you just do that inside XE it won't affect 
> any other part of the kernel.

It just grew out of convenience of implementation on the side of VM_BIND.

The other alternative would be to maintain a secondary load map in the
kernel in a separate data structure from GPU VMA.

I was actually going to suggest such a thing as a common DRM feature: a GPU VMA
metadata interface or a parallel "GPU loadmap" interface. It'd allow
userspace tooling to go more easily from a GPU EU IP to the module that
was loaded at that address. Kind of a step 0 towards a backtrace for GPU.

Can you elaborate on what your concern is with the VMA metadata
attachment?

> My other concern I have is using ptrace_may_access, I would still try to 
> avoid that.
> 
> What if you first grab the DRM render node file descriptor which 
> represents the GPU address space you want to debug with pidfd_getfd() 
> and then either create the eudebug file descriptor from an IOCTL or 
> implement the necessary IOCTLs on the DRM render node directly?
> 
> That would make it unnecessary to export ptrace_may_access.

We're prototyping this. At this point there are a few recognized caveats:

1. There is a limitation that you don't get a notification when your
target PID opens a DRM client, so GDB or another application would have
to keep polling for the FDs. I'll have to check with the team how that
would fit on the GDB side.

2. Debugging multiple DRM clients (to the same GPU) under one PID now
requires separate debugger connections. This may break the way the debugger
locking is currently implemented for the discovery phase to prevent parallel
IOCTLs from running. Will have to look into it once we have a working
prototype.

3. Last but not least, we'll have to compare which LSM security
modules and other conditions are checked on the pidfd_getfd() path for
access restrictions.

The reason for using ptrace_may_access() was to have a clear 1:1 mapping
between a user being allowed to ptrace() a PID to control CPU threads and
being allowed to do debugger IOCTLs to control GPU threads. So if a user
can attach to a PID with GDB, they would certainly also be able to debug
the GPU portion.

If there is divergence, I don't see a big benefit in going to
pidfd_getfd(). We could just as well not export ptrace_may_access()
and YOLO the access check by comparing euid and such, which is close
to what ptrace does but not exactly the same (just like the
pidfd_getfd() access checks).

However I believe this would not be a fundamental blocker for the
series? If this were the only remaining disagreement, I guess we
could just do a CAP_SYS_ADMIN check initially until we can find
something agreeable.

Regards, Joonas

> 
> Regards,
> Christian.
> 
> >
> > Regards, Joonas
> >
> >> Regards,
> >> Christian.
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite
  2025-01-29 18:18           ` Joonas Lahtinen
@ 2025-01-30 12:09             ` Christian König
  0 siblings, 0 replies; 63+ messages in thread
From: Christian König @ 2025-01-30 12:09 UTC (permalink / raw)
  To: Joonas Lahtinen, Mika Kuoppala, intel-xe
  Cc: dri-devel, Matthew Brost, Andrzej Hajda, Thomas Hellström,
	Dominik Grzegorzek, Simona Vetter

[-- Attachment #1: Type: text/plain, Size: 4514 bytes --]

Am 29.01.25 um 19:18 schrieb Joonas Lahtinen:
>>> Would be great to reach a consensus on the high level details before
>>> spinning off further series addressing the smaller items.
>> I would say that attaching debug metadata to the GPU VMA doesn't look
>> like the best design, but if you just do that inside XE it won't affect
>> any other part of the kernel.
> It just grew out of convenience of implementation on the side of VM_BIND.
>
> The other alternative would be to maintain a secondary load map in the
> kernel in a separate data structure from GPU VMA.
>
> I was actually going to suggest such a thing as a common DRM facility: a GPU VMA
> metadata interface or a parallel "GPU loadmap" interface. It'd allow
> userspace tooling to more easily go from a GPU EU IP to the module that
> was loaded at that address. Kind of a step 0 towards backtraces for GPUs.
>
> Can you elaborate on what your concern is with the VMA metadata
> attachment?

In general we should try to avoid putting data into the kernel that the 
kernel doesn't need.

In other words, we don't put the debug metadata for a CPU process on 
the CPU VMA either.

You could reduce the amount of data massively if you just attach the 
source of the debug metadata to the GPU VMA.

E.g. instead of the symbol table itself, just where to find it: a VA in 
the CPU process, a filesystem location or something like that.

>> My other concern I have is using ptrace_may_access, I would still try to
>> avoid that.
>>
>> What if you first grab the DRM render node file descriptor which
>> represents the GPU address space you want to debug with pidfd_getfd()
>> and then either create the eudebug file descriptor from an IOCTL or
>> implement the necessary IOCTLs on the DRM render node directly?
>>
>> That would make it unnecessary to export ptrace_may_access.
> We're prototyping this. At this point there are some caveats recognized:
>
> 1. There is a limitation that you don't get a notification when your
> target PID opens a DRM client; GDB or another application would have
> to keep polling for the FDs. I'll have to check with the team how that
> would fit on the GDB side.

Well, there is the dnotify/inotify API which allows a process to be 
notified of certain filesystem events.

I've never used it, but it potentially provides an event when a specific 
file is opened.

On the other hand I have no idea what permissions are necessary to use 
it, potentially root.

> 2. Debugging multiple DRM clients (to the same GPU) under one PID now
> requires separate debugger connections. This may break the way the debugger
> locking is currently implemented for the discovery phase to prevent parallel
> IOCTLs from running. Will have to look into it once we have a working
> prototype.

Well multiple DRM clients would also have multiple GPU VM address 
spaces, wouldn't they?

So you would need multiple connections, one for each DRM client as well.

> 3. Last but not least, we'll have to compare which LSM security
> modules and other conditions are checked on the pidfd_getfd() path for
> access restrictions.
>
> The reason for using ptrace_may_access() was to have a clear 1:1 mapping
> between a user being allowed to ptrace() a PID to control CPU threads and
> being allowed to do debugger IOCTLs to control GPU threads. So if a user
> can attach to a PID with GDB, they would certainly also be able to debug
> the GPU portion.
>
> If there is divergence, I don't see a big benefit in going to
> pidfd_getfd(). We could just as well not export ptrace_may_access()
> and YOLO the access check by comparing euid and such, which is close
> to what ptrace does but not exactly the same (just like the
> pidfd_getfd() access checks).

Well, that argumentation is exactly the reverse of why I suggested not 
using ptrace_may_access().

When an administrator has used LSM to block pidfd_getfd(), a driver 
which uses a driver-specific IOCTL to bypass that restriction is 
actually a really bad idea.

> However I believe this would not be a fundamental blocker for the
> series? If this were the only remaining disagreement, I guess we
> could just do a CAP_SYS_ADMIN check initially until we can find
> something agreeable.

Well as soon as anything is merged upstream it becomes UAPI, which in 
turn means that changing it fundamentally becomes really hard to do.

What you could do is to expose the interface through debugfs and 
explicitly state that it isn't stable in any way.

Regards,
Christian.

>
> Regards, Joonas
>
>> Regards,
>> Christian.
>>
>>> Regards, Joonas
>>>
>>>> Regards,
>>>> Christian.

[-- Attachment #2: Type: text/html, Size: 6750 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2025-01-30 12:09 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-09 13:32 [PATCH 00/26] Intel Xe GPU debug support (eudebug) v3 Mika Kuoppala
2024-12-09 13:32 ` [PATCH 01/26] ptrace: export ptrace_may_access Mika Kuoppala
2024-12-10  4:29   ` Christoph Hellwig
2024-12-12  9:16     ` Joonas Lahtinen
2024-12-09 13:32 ` [PATCH 02/26] drm/xe/eudebug: Introduce eudebug support Mika Kuoppala
2024-12-09 13:32 ` [PATCH 03/26] drm/xe/eudebug: Introduce discovery for resources Mika Kuoppala
2024-12-09 13:32 ` [PATCH 04/26] drm/xe/eudebug: Introduce exec_queue events Mika Kuoppala
2024-12-09 13:32 ` [PATCH 05/26] drm/xe/eudebug: Introduce exec queue placements event Mika Kuoppala
2024-12-09 13:32 ` [PATCH 06/26] drm/xe/eudebug: hw enablement for eudebug Mika Kuoppala
2024-12-09 13:32 ` [PATCH 07/26] drm/xe: Add EUDEBUG_ENABLE exec queue property Mika Kuoppala
2024-12-09 13:32 ` [PATCH 08/26] drm/xe/eudebug: Introduce per device attention scan worker Mika Kuoppala
2024-12-09 13:33 ` [PATCH 09/26] drm/xe/eudebug: Introduce EU control interface Mika Kuoppala
2024-12-09 13:33 ` [PATCH 10/26] drm/xe/eudebug: Add vm bind and vm bind ops Mika Kuoppala
2024-12-09 13:33 ` [PATCH 11/26] drm/xe/eudebug: Add UFENCE events with acks Mika Kuoppala
2024-12-09 13:33 ` [PATCH 12/26] drm/xe/eudebug: vm open/pread/pwrite Mika Kuoppala
2024-12-09 13:33 ` [PATCH 13/26] drm/xe: add system memory page iterator support to xe_res_cursor Mika Kuoppala
2024-12-09 13:33 ` [PATCH 14/26] drm/xe/eudebug: implement userptr_vma access Mika Kuoppala
2024-12-09 14:03   ` Christian König
2024-12-09 14:56     ` Joonas Lahtinen
2024-12-09 15:31     ` Simona Vetter
2024-12-09 15:42       ` Christian König
2024-12-09 15:45         ` Christian König
2024-12-10  9:33         ` Joonas Lahtinen
2024-12-10 10:00           ` Christian König
2024-12-10 11:57             ` Joonas Lahtinen
2024-12-10 14:03               ` Christian König
2024-12-11 12:59                 ` Joonas Lahtinen
2024-12-17 14:12                   ` Joonas Lahtinen
2024-12-20 12:47                     ` Mika Kuoppala
2024-12-10 11:17         ` Simona Vetter
2024-12-12  8:49       ` Thomas Hellström
2024-12-12 10:12         ` Simona Vetter
2024-12-13 19:39           ` Matthew Brost
2024-12-16 14:17   ` [PATCH 13/26] RFC drm/xe/eudebug: userptr vm pread/pwrite Mika Kuoppala
2024-12-20 11:31   ` Mika Kuoppala
2024-12-20 12:56     ` Christian König
2025-01-29  8:03       ` Joonas Lahtinen
2025-01-29 10:33         ` Christian König
2025-01-29 18:18           ` Joonas Lahtinen
2025-01-30 12:09             ` Christian König
2024-12-23 10:31     ` Thomas Hellström
2025-01-13 13:22       ` Mika Kuoppala
2025-01-13 13:32       ` [PATCH 13/27] mm: export access_remote_vm symbol for debugger use Mika Kuoppala
2025-01-13 13:32       ` [PATCH 14/27] drm/xe/eudebug: userptr vm access pread/pwrite Mika Kuoppala
2024-12-09 13:33 ` [PATCH 15/26] drm/xe: Debug metadata create/destroy ioctls Mika Kuoppala
2024-12-09 13:33 ` [PATCH 16/26] drm/xe: Attach debug metadata to vma Mika Kuoppala
2024-12-09 13:33 ` [PATCH 17/26] drm/xe/eudebug: Add debug metadata support for xe_eudebug Mika Kuoppala
2024-12-09 13:33 ` [PATCH 18/26] drm/xe/eudebug: Implement vm_bind_op discovery Mika Kuoppala
2024-12-09 13:33 ` [PATCH 19/26] drm/xe/eudebug: Dynamically toggle debugger functionality Mika Kuoppala
2024-12-09 13:33 ` [PATCH 20/26] drm/xe/eudebug_test: Introduce xe_eudebug wa kunit test Mika Kuoppala
2024-12-09 13:33 ` [PATCH 21/26] drm/xe/eudebug/ptl: Add support for extra attention register Mika Kuoppala
2024-12-09 13:33 ` [PATCH 22/26] drm/xe/eudebug/ptl: Add RCU_DEBUG_1 register support for xe3 Mika Kuoppala
2024-12-09 13:33 ` [PATCH 23/26] drm/xe/eudebug: Add read/count/compare helper for eu attention Mika Kuoppala
2024-12-09 13:33 ` [PATCH 24/26] drm/xe/eudebug: Introduce EU pagefault handling interface Mika Kuoppala
2024-12-09 13:33 ` [PATCH 25/26] drm/xe/vm: Support for adding null page VMA to VM on request Mika Kuoppala
2024-12-09 13:33 ` [PATCH 26/26] drm/xe/eudebug: Enable EU pagefault handling Mika Kuoppala
2024-12-09 14:37 ` ✓ CI.Patch_applied: success for Intel Xe GPU debug support (eudebug) v3 Patchwork
2024-12-09 14:38 ` ✗ CI.checkpatch: warning " Patchwork
2024-12-09 14:39 ` ✗ CI.KUnit: failure " Patchwork
2024-12-16 14:22 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev2) Patchwork
2024-12-20 14:36 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev3) Patchwork
2025-01-13 16:15 ` ✗ CI.Patch_applied: failure for Intel Xe GPU debug support (eudebug) v3 (rev4) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2025-01-13 13:26 [PATCH 14/27] drm/xe/eudebug: userptr vm access pread/pwrite Mika Kuoppala

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox