* [RFC PATCH 01/29] dma-fence: Add dma_fence_preempt base class
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 02/29] dma-fence: Add dma_fence_user_fence Matthew Brost
` (28 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Add a dma_fence_preempt base class, with driver ops to implement
preemption, based on the existing Xe preempt fence implementation. The
common code is annotated with dma-fence signalling sections to help
ensure correct driver usage.
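For context, a minimal driver-side sketch of how this is intended to be
used (illustrative only, not part of the patch; "my_queue" and its
suspend helpers are hypothetical stand-ins, mirroring the Xe conversion
later in this series):

/* Hypothetical driver object; my_queue_*() are placeholders. */
struct my_preempt_fence {
	struct dma_fence_preempt base;
	struct my_queue *q;
};

static struct my_preempt_fence *to_my_fence(struct dma_fence_preempt *f)
{
	return container_of(f, struct my_preempt_fence, base);
}

/* Issue the preemption request; must not block. */
static int my_preempt(struct dma_fence_preempt *f)
{
	return my_queue_suspend(to_my_fence(f)->q);
}

/* Runs from the fence's workqueue; may block until the HW is idle. */
static int my_preempt_wait(struct dma_fence_preempt *f)
{
	return my_queue_suspend_wait(to_my_fence(f)->q);
}

/* Called after the fence has signaled; drop references, kick rebinds, etc. */
static void my_preempt_finished(struct dma_fence_preempt *f)
{
	my_queue_put(to_my_fence(f)->q);
}

static const struct dma_fence_preempt_ops my_preempt_ops = {
	.preempt		= my_preempt,
	.preempt_wait		= my_preempt_wait,
	.preempt_finished	= my_preempt_finished,
};

/* wq must be a WQ_MEM_RECLAIM workqueue owned by the driver. */
static void my_preempt_fence_arm(struct my_preempt_fence *pf,
				 struct workqueue_struct *wq,
				 u64 context, u64 seqno)
{
	dma_fence_preempt_init(&pf->base, &my_preempt_ops, wq, context, seqno);
}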
Cc: Dave Airlie <airlied@redhat.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/dma-buf/Makefile | 2 +-
drivers/dma-buf/dma-fence-preempt.c | 133 ++++++++++++++++++++++++++++
include/linux/dma-fence-preempt.h | 56 ++++++++++++
3 files changed, 190 insertions(+), 1 deletion(-)
create mode 100644 drivers/dma-buf/dma-fence-preempt.c
create mode 100644 include/linux/dma-fence-preempt.h
diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 70ec901edf2c..c25500bb38b5 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
- dma-fence-unwrap.o dma-resv.o
+ dma-fence-preempt.o dma-fence-unwrap.o dma-resv.o
obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o
obj-$(CONFIG_DMABUF_HEAPS) += heaps/
obj-$(CONFIG_SYNC_FILE) += sync_file.o
diff --git a/drivers/dma-buf/dma-fence-preempt.c b/drivers/dma-buf/dma-fence-preempt.c
new file mode 100644
index 000000000000..6e6ce7ea7421
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-preempt.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <linux/dma-fence-preempt.h>
+#include <linux/dma-resv.h>
+
+static void dma_fence_preempt_work_func(struct work_struct *w)
+{
+ bool cookie = dma_fence_begin_signalling();
+ struct dma_fence_preempt *pfence =
+ container_of(w, typeof(*pfence), work);
+ const struct dma_fence_preempt_ops *ops = pfence->ops;
+ int err = pfence->base.error;
+
+ if (!err) {
+ err = ops->preempt_wait(pfence);
+ if (err)
+ dma_fence_set_error(&pfence->base, err);
+ }
+
+ dma_fence_signal(&pfence->base);
+ ops->preempt_finished(pfence);
+
+ dma_fence_end_signalling(cookie);
+}
+
+static const char *
+dma_fence_preempt_get_driver_name(struct dma_fence *fence)
+{
+ return "dma_fence_preempt";
+}
+
+static const char *
+dma_fence_preempt_get_timeline_name(struct dma_fence *fence)
+{
+ return "ordered";
+}
+
+static void dma_fence_preempt_issue(struct dma_fence_preempt *pfence)
+{
+ int err;
+
+ err = pfence->ops->preempt(pfence);
+ if (err)
+ dma_fence_set_error(&pfence->base, err);
+
+ queue_work(pfence->wq, &pfence->work);
+}
+
+static void dma_fence_preempt_cb(struct dma_fence *fence,
+ struct dma_fence_cb *cb)
+{
+ struct dma_fence_preempt *pfence =
+ container_of(cb, typeof(*pfence), cb);
+
+ dma_fence_preempt_issue(pfence);
+}
+
+static void dma_fence_preempt_delay(struct dma_fence_preempt *pfence)
+{
+ struct dma_fence *fence;
+ int err;
+
+ fence = pfence->ops->preempt_delay(pfence);
+ if (WARN_ON_ONCE(!fence || IS_ERR(fence)))
+ return;
+
+ err = dma_fence_add_callback(fence, &pfence->cb, dma_fence_preempt_cb);
+ if (err == -ENOENT)
+ dma_fence_preempt_issue(pfence);
+}
+
+static bool dma_fence_preempt_enable_signaling(struct dma_fence *fence)
+{
+ struct dma_fence_preempt *pfence =
+ container_of(fence, typeof(*pfence), base);
+
+ if (pfence->ops->preempt_delay)
+ dma_fence_preempt_delay(pfence);
+ else
+ dma_fence_preempt_issue(pfence);
+
+ return true;
+}
+
+static const struct dma_fence_ops preempt_fence_ops = {
+ .get_driver_name = dma_fence_preempt_get_driver_name,
+ .get_timeline_name = dma_fence_preempt_get_timeline_name,
+ .enable_signaling = dma_fence_preempt_enable_signaling,
+};
+
+/**
+ * dma_fence_is_preempt() - Is preempt fence
+ *
+ * @fence: Preempt fence
+ *
+ * Return: True if preempt fence, False otherwise
+ */
+bool dma_fence_is_preempt(const struct dma_fence *fence)
+{
+ return fence->ops == &preempt_fence_ops;
+}
+EXPORT_SYMBOL(dma_fence_is_preempt);
+
+/**
+ * dma_fence_preempt_init() - Initialize preempt fence
+ *
+ * @fence: Preempt fence
+ * @ops: Preempt fence operations
+ * @wq: Work queue for preempt wait, should have WQ_MEM_RECLAIM set
+ * @context: Fence context
+ * @seqno: Fence sequence number
+ */
+void dma_fence_preempt_init(struct dma_fence_preempt *fence,
+ const struct dma_fence_preempt_ops *ops,
+ struct workqueue_struct *wq,
+ u64 context, u64 seqno)
+{
+ /*
+ * XXX: We really want to check wq for WQ_MEM_RECLAIM here but
+ * workqueue_struct is private.
+ */
+
+ fence->ops = ops;
+ fence->wq = wq;
+ INIT_WORK(&fence->work, dma_fence_preempt_work_func);
+ spin_lock_init(&fence->lock);
+ dma_fence_init(&fence->base, &preempt_fence_ops,
+ &fence->lock, context, seqno);
+}
+EXPORT_SYMBOL(dma_fence_preempt_init);
diff --git a/include/linux/dma-fence-preempt.h b/include/linux/dma-fence-preempt.h
new file mode 100644
index 000000000000..28d803f89527
--- /dev/null
+++ b/include/linux/dma-fence-preempt.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef __LINUX_DMA_FENCE_PREEMPT_H
+#define __LINUX_DMA_FENCE_PREEMPT_H
+
+#include <linux/dma-fence.h>
+#include <linux/workqueue.h>
+
+struct dma_fence_preempt;
+struct dma_resv;
+
+/**
+ * struct dma_fence_preempt_ops - Preempt fence operations
+ *
+ * These functions should be implemented by the driver.
+ */
+struct dma_fence_preempt_ops {
+ /** @preempt_delay: Preempt execution with a delay */
+ struct dma_fence *(*preempt_delay)(struct dma_fence_preempt *fence);
+ /** @preempt: Preempt execution */
+ int (*preempt)(struct dma_fence_preempt *fence);
+ /** @preempt_wait: Wait for preempt of execution to complete */
+ int (*preempt_wait)(struct dma_fence_preempt *fence);
+ /** @preempt_finished: Signal that the preempt has finished */
+ void (*preempt_finished)(struct dma_fence_preempt *fence);
+};
+
+/**
+ * struct dma_fence_preempt - Embedded preempt fence base class
+ */
+struct dma_fence_preempt {
+ /** @base: Fence base class */
+ struct dma_fence base;
+ /** @lock: Spinlock for fence handling */
+ spinlock_t lock;
+ /** @cb: Callback preempt delay */
+ struct dma_fence_cb cb;
+ /** @ops: Preempt fence operations */
+ const struct dma_fence_preempt_ops *ops;
+ /** @wq: Work queue for preempt wait */
+ struct workqueue_struct *wq;
+ /** @work: Work struct for preempt wait */
+ struct work_struct work;
+};
+
+bool dma_fence_is_preempt(const struct dma_fence *fence);
+
+void dma_fence_preempt_init(struct dma_fence_preempt *fence,
+ const struct dma_fence_preempt_ops *ops,
+ struct workqueue_struct *wq,
+ u64 context, u64 seqno);
+
+#endif
--
2.34.1
* [RFC PATCH 02/29] dma-fence: Add dma_fence_user_fence
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 01/29] dma-fence: Add dma_fence_preempt base class Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 03/29] drm/xe: Use dma_fence_preempt base class Matthew Brost
` (27 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Normalize user fence attachment to a DMA fence. A user fence is a simple
seqno write to memory, implemented by attaching a DMA fence callback
that writes out the seqno. The intended use case is importing a
dma-fence into the kernel and exporting a user fence.
Helpers are added to allocate, attach, and free a dma_fence_user_fence.
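A minimal usage sketch (illustrative fragment only; the dma-fence, the
BO backing the user fence, and its iosys_map are assumed to be set up
elsewhere):

	struct dma_fence_user_fence *uf;

	/* Allocate up front so the signaling path cannot fail. */
	uf = dma_fence_user_fence_alloc();
	if (!uf)
		return -ENOMEM;

	/*
	 * 'map' points at the seqno location inside the BO backing the
	 * user fence; 'fence' must also be installed in that BO's
	 * dma-resv so the memory cannot move before the write lands.
	 */
	dma_fence_user_fence_attach(fence, uf, &map, seqno);

	/* uf is freed by the callback once 'fence' signals. */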
Cc: Dave Airlie <airlied@redhat.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/dma-buf/Makefile | 2 +-
drivers/dma-buf/dma-fence-user-fence.c | 73 ++++++++++++++++++++++++++
include/linux/dma-fence-user-fence.h | 31 +++++++++++
3 files changed, 105 insertions(+), 1 deletion(-)
create mode 100644 drivers/dma-buf/dma-fence-user-fence.c
create mode 100644 include/linux/dma-fence-user-fence.h
diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index c25500bb38b5..ba9ba339319e 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
- dma-fence-preempt.o dma-fence-unwrap.o dma-resv.o
+ dma-fence-preempt.o dma-fence-unwrap.o dma-fence-user-fence.o dma-resv.o
obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o
obj-$(CONFIG_DMABUF_HEAPS) += heaps/
obj-$(CONFIG_SYNC_FILE) += sync_file.o
diff --git a/drivers/dma-buf/dma-fence-user-fence.c b/drivers/dma-buf/dma-fence-user-fence.c
new file mode 100644
index 000000000000..5a4b289bacb8
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-user-fence.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <linux/dma-fence-user-fence.h>
+#include <linux/slab.h>
+
+static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+ struct dma_fence_user_fence *user_fence =
+ container_of(cb, struct dma_fence_user_fence, cb);
+
+ if (user_fence->map.is_iomem)
+ writeq(user_fence->seqno, user_fence->map.vaddr_iomem);
+ else
+ *(u64 *)user_fence->map.vaddr = user_fence->seqno;
+
+ dma_fence_user_fence_free(user_fence);
+}
+
+/**
+ * dma_fence_user_fence_alloc() - Allocate user fence
+ *
+ * Return: Allocated struct dma_fence_user_fence on success, NULL on failure
+ */
+struct dma_fence_user_fence *dma_fence_user_fence_alloc(void)
+{
+ return kmalloc(sizeof(struct dma_fence_user_fence), GFP_KERNEL);
+}
+EXPORT_SYMBOL(dma_fence_user_fence_alloc);
+
+/**
+ * dma_fence_user_fence_free() - Free user fence
+ *
+ * Free a user fence. Should only be called if dma_fence_user_fence_attach()
+ * was not called, to clean up the original allocation from
+ * dma_fence_user_fence_alloc().
+ */
+void dma_fence_user_fence_free(struct dma_fence_user_fence *user_fence)
+{
+ kfree(user_fence);
+}
+EXPORT_SYMBOL(dma_fence_user_fence_free);
+
+/**
+ * dma_fence_user_fence_attach() - Attach user fence to dma-fence
+ *
+ * @fence: fence
+ * @user_fence: user fence
+ * @map: IOSYS map to write seqno to
+ * @seqno: seqno to write to IOSYS map
+ *
+ * Attach a user fence, which is a seqno write to an IOSYS map, to a DMA fence.
+ * The caller must guarantee that the memory in the IOSYS map doesn't move
+ * before the fence signals. This is typically done by installing the DMA fence
+ * into the BO's DMA reservation bookkeeping slot from which the IOSYS map
+ * was derived.
+ */
+void dma_fence_user_fence_attach(struct dma_fence *fence,
+ struct dma_fence_user_fence *user_fence,
+ struct iosys_map *map, u64 seqno)
+{
+ int err;
+
+ user_fence->map = *map;
+ user_fence->seqno = seqno;
+
+ err = dma_fence_add_callback(fence, &user_fence->cb, user_fence_cb);
+ if (err == -ENOENT)
+ user_fence_cb(NULL, &user_fence->cb);
+}
+EXPORT_SYMBOL(dma_fence_user_fence_attach);
diff --git a/include/linux/dma-fence-user-fence.h b/include/linux/dma-fence-user-fence.h
new file mode 100644
index 000000000000..8678129c7d56
--- /dev/null
+++ b/include/linux/dma-fence-user-fence.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef __LINUX_DMA_FENCE_USER_FENCE_H
+#define __LINUX_DMA_FENCE_USER_FENCE_H
+
+#include <linux/dma-fence.h>
+#include <linux/iosys-map.h>
+
+/** struct dma_fence_user_fence - User fence */
+struct dma_fence_user_fence {
+ /** @cb: dma-fence callback used to attach user fence to dma-fence */
+ struct dma_fence_cb cb;
+ /** @map: IOSYS map to write seqno to */
+ struct iosys_map map;
+ /** @seqno: seqno to write to IOSYS map */
+ u64 seqno;
+};
+
+struct dma_fence_user_fence *dma_fence_user_fence_alloc(void);
+
+void dma_fence_user_fence_free(struct dma_fence_user_fence *user_fence);
+
+void dma_fence_user_fence_attach(struct dma_fence *fence,
+ struct dma_fence_user_fence *user_fence,
+ struct iosys_map *map,
+ u64 seqno);
+
+#endif
--
2.34.1
* [RFC PATCH 03/29] drm/xe: Use dma_fence_preempt base class
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 01/29] dma-fence: Add dma_fence_preempt base class Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 02/29] dma-fence: Add dma_fence_user_fence Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 04/29] drm/xe: Allocate doorbells for UMD exec queues Matthew Brost
` (26 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Use the dma_fence_preempt base class in Xe instead of open-coding the
preemption implementation.
Cc: Dave Airlie <airlied@redhat.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/dma-buf/dma-fence-preempt.c | 5 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 3 +
drivers/gpu/drm/xe/xe_hw_engine_group.c | 4 +-
drivers/gpu/drm/xe/xe_preempt_fence.c | 80 ++++++---------------
drivers/gpu/drm/xe/xe_preempt_fence.h | 2 +-
drivers/gpu/drm/xe/xe_preempt_fence_types.h | 11 +--
6 files changed, 34 insertions(+), 71 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-preempt.c b/drivers/dma-buf/dma-fence-preempt.c
index 6e6ce7ea7421..bcc5e5cec919 100644
--- a/drivers/dma-buf/dma-fence-preempt.c
+++ b/drivers/dma-buf/dma-fence-preempt.c
@@ -8,11 +8,11 @@
static void dma_fence_preempt_work_func(struct work_struct *w)
{
- bool cookie = dma_fence_begin_signalling();
struct dma_fence_preempt *pfence =
container_of(w, typeof(*pfence), work);
const struct dma_fence_preempt_ops *ops = pfence->ops;
int err = pfence->base.error;
+ bool cookie = dma_fence_begin_signalling();
if (!err) {
err = ops->preempt_wait(pfence);
@@ -23,6 +23,7 @@ static void dma_fence_preempt_work_func(struct work_struct *w)
dma_fence_signal(&pfence->base);
ops->preempt_finished(pfence);
+ /* The entire worker is signaling path, thus annotate the entirety */
dma_fence_end_signalling(cookie);
}
@@ -109,7 +110,7 @@ EXPORT_SYMBOL(dma_fence_is_preempt);
*
* @fence: Preempt fence
* @ops: Preempt fence operations
- * @wq: Work queue for preempt wait, should have WQ_MEM_RECLAIM set
+ * @wq: Work queue for preempt wait, must have WQ_MEM_RECLAIM set
* @context: Fence context
* @seqno: Fence sequence number
*/
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index f9ecee5364d8..58a3f4bb3887 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1603,6 +1603,9 @@ static int guc_exec_queue_suspend_wait(struct xe_exec_queue *q)
struct xe_guc *guc = exec_queue_to_guc(q);
int ret;
+ if (exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q))
+ return -ECANCELED;
+
/*
* Likely don't need to check exec_queue_killed() as we clear
* suspend_pending upon kill but to be paranoid but races in which
diff --git a/drivers/gpu/drm/xe/xe_hw_engine_group.c b/drivers/gpu/drm/xe/xe_hw_engine_group.c
index 82750520a90a..8ed5410c3964 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine_group.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine_group.c
@@ -163,7 +163,7 @@ int xe_hw_engine_group_add_exec_queue(struct xe_hw_engine_group *group, struct x
if (xe_vm_in_fault_mode(q->vm) && group->cur_mode == EXEC_MODE_DMA_FENCE) {
q->ops->suspend(q);
err = q->ops->suspend_wait(q);
- if (err)
+ if (err == -ETIME)
goto err_suspend;
xe_hw_engine_group_resume_faulting_lr_jobs(group);
@@ -236,7 +236,7 @@ static int xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_group
continue;
err = q->ops->suspend_wait(q);
- if (err)
+ if (err == -ETIME)
goto err_suspend;
}
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
index 83fbeea5aa20..80a8bc82f3cc 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence.c
+++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
@@ -4,73 +4,40 @@
*/
#include "xe_preempt_fence.h"
-
-#include <linux/slab.h>
-
#include "xe_exec_queue.h"
#include "xe_vm.h"
-static void preempt_fence_work_func(struct work_struct *w)
+static struct xe_exec_queue *to_exec_queue(struct dma_fence_preempt *fence)
{
- bool cookie = dma_fence_begin_signalling();
- struct xe_preempt_fence *pfence =
- container_of(w, typeof(*pfence), preempt_work);
- struct xe_exec_queue *q = pfence->q;
-
- if (pfence->error) {
- dma_fence_set_error(&pfence->base, pfence->error);
- } else if (!q->ops->reset_status(q)) {
- int err = q->ops->suspend_wait(q);
-
- if (err)
- dma_fence_set_error(&pfence->base, err);
- } else {
- dma_fence_set_error(&pfence->base, -ENOENT);
- }
-
- dma_fence_signal(&pfence->base);
- /*
- * Opt for keep everything in the fence critical section. This looks really strange since we
- * have just signalled the fence, however the preempt fences are all signalled via single
- * global ordered-wq, therefore anything that happens in this callback can easily block
- * progress on the entire wq, which itself may prevent other published preempt fences from
- * ever signalling. Therefore try to keep everything here in the callback in the fence
- * critical section. For example if something below grabs a scary lock like vm->lock,
- * lockdep should complain since we also hold that lock whilst waiting on preempt fences to
- * complete.
- */
- xe_vm_queue_rebind_worker(q->vm);
- xe_exec_queue_put(q);
- dma_fence_end_signalling(cookie);
+ return container_of(fence, struct xe_preempt_fence, base)->q;
}
-static const char *
-preempt_fence_get_driver_name(struct dma_fence *fence)
+static int xe_preempt_fence_preempt(struct dma_fence_preempt *fence)
{
- return "xe";
+ struct xe_exec_queue *q = to_exec_queue(fence);
+
+ return q->ops->suspend(q);
}
-static const char *
-preempt_fence_get_timeline_name(struct dma_fence *fence)
+static int xe_preempt_fence_preempt_wait(struct dma_fence_preempt *fence)
{
- return "preempt";
+ struct xe_exec_queue *q = to_exec_queue(fence);
+
+ return q->ops->suspend_wait(q);
}
-static bool preempt_fence_enable_signaling(struct dma_fence *fence)
+static void xe_preempt_fence_preempt_finished(struct dma_fence_preempt *fence)
{
- struct xe_preempt_fence *pfence =
- container_of(fence, typeof(*pfence), base);
- struct xe_exec_queue *q = pfence->q;
+ struct xe_exec_queue *q = to_exec_queue(fence);
- pfence->error = q->ops->suspend(q);
- queue_work(q->vm->xe->preempt_fence_wq, &pfence->preempt_work);
- return true;
+ xe_vm_queue_rebind_worker(q->vm);
+ xe_exec_queue_put(q);
}
-static const struct dma_fence_ops preempt_fence_ops = {
- .get_driver_name = preempt_fence_get_driver_name,
- .get_timeline_name = preempt_fence_get_timeline_name,
- .enable_signaling = preempt_fence_enable_signaling,
+static const struct dma_fence_preempt_ops xe_preempt_fence_ops = {
+ .preempt = xe_preempt_fence_preempt,
+ .preempt_wait = xe_preempt_fence_preempt_wait,
+ .preempt_finished = xe_preempt_fence_preempt_finished,
};
/**
@@ -95,7 +62,6 @@ struct xe_preempt_fence *xe_preempt_fence_alloc(void)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&pfence->link);
- INIT_WORK(&pfence->preempt_work, preempt_fence_work_func);
return pfence;
}
@@ -134,11 +100,11 @@ xe_preempt_fence_arm(struct xe_preempt_fence *pfence, struct xe_exec_queue *q,
{
list_del_init(&pfence->link);
pfence->q = xe_exec_queue_get(q);
- spin_lock_init(&pfence->lock);
- dma_fence_init(&pfence->base, &preempt_fence_ops,
- &pfence->lock, context, seqno);
- return &pfence->base;
+ dma_fence_preempt_init(&pfence->base, &xe_preempt_fence_ops,
+ q->vm->xe->preempt_fence_wq, context, seqno);
+
+ return &pfence->base.base;
}
/**
@@ -169,5 +135,5 @@ xe_preempt_fence_create(struct xe_exec_queue *q,
bool xe_fence_is_xe_preempt(const struct dma_fence *fence)
{
- return fence->ops == &preempt_fence_ops;
+ return dma_fence_is_preempt(fence);
}
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.h b/drivers/gpu/drm/xe/xe_preempt_fence.h
index 9406c6fea525..7b56d12c0786 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence.h
+++ b/drivers/gpu/drm/xe/xe_preempt_fence.h
@@ -25,7 +25,7 @@ xe_preempt_fence_arm(struct xe_preempt_fence *pfence, struct xe_exec_queue *q,
static inline struct xe_preempt_fence *
to_preempt_fence(struct dma_fence *fence)
{
- return container_of(fence, struct xe_preempt_fence, base);
+ return container_of(fence, struct xe_preempt_fence, base.base);
}
/**
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence_types.h b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
index 312c3372a49f..f12b89f7dc35 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence_types.h
+++ b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
@@ -6,8 +6,7 @@
#ifndef _XE_PREEMPT_FENCE_TYPES_H_
#define _XE_PREEMPT_FENCE_TYPES_H_
-#include <linux/dma-fence.h>
-#include <linux/workqueue.h>
+#include <linux/dma-fence-preempt.h>
struct xe_exec_queue;
@@ -18,17 +17,11 @@ struct xe_exec_queue;
*/
struct xe_preempt_fence {
/** @base: dma fence base */
- struct dma_fence base;
+ struct dma_fence_preempt base;
/** @link: link into list of pending preempt fences */
struct list_head link;
/** @q: exec queue for this preempt fence */
struct xe_exec_queue *q;
- /** @preempt_work: work struct which issues preemption */
- struct work_struct preempt_work;
- /** @lock: dma-fence fence lock */
- spinlock_t lock;
- /** @error: preempt fence is in error state */
- int error;
};
#endif
--
2.34.1
* [RFC PATCH 04/29] drm/xe: Allocate doorbells for UMD exec queues
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (2 preceding siblings ...)
2024-11-18 23:35 ` [RFC PATCH 03/29] drm/xe: Use dma_fence_preempt base class Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 05/29] drm/xe: Add doorbell ID to snapshot capture Matthew Brost
` (25 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
These doorbells will be mapped to user space for UMD submission. Add
infrastructure to the GuC submission backend to manage them.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 7 ++
drivers/gpu/drm/xe/xe_guc_submit.c | 107 +++++++++++++++++--
3 files changed, 106 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 1158b6062a6c..7f68587d4021 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -83,6 +83,8 @@ struct xe_exec_queue {
#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD BIT(3)
/* kernel exec_queue only, set priority to highest level */
#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
+/* queue used for UMD submission */
+#define EXEC_QUEUE_FLAG_UMD_SUBMISSION BIT(5)
/**
* @flags: flags for this exec queue, should statically setup aside from ban
diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
index 4c39f01e4f52..2d53af75ed75 100644
--- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
@@ -47,6 +47,13 @@ struct xe_guc_exec_queue {
u16 id;
/** @suspend_wait: wait queue used to wait on pending suspends */
wait_queue_head_t suspend_wait;
+ /** @db: doorbell state */
+ struct {
+ /** @db.id: doorbell ID */
+ int id;
+ /** @db.dpa: doorbell device physical address */
+ u64 dpa;
+ } db;
/** @suspend_pending: a suspend of the exec_queue is pending */
bool suspend_pending;
};
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 58a3f4bb3887..cc7a98c1343e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -29,6 +29,7 @@
#include "xe_guc.h"
#include "xe_guc_capture.h"
#include "xe_guc_ct.h"
+#include "xe_guc_db_mgr.h"
#include "xe_guc_exec_queue_types.h"
#include "xe_guc_id_mgr.h"
#include "xe_guc_submit_types.h"
@@ -67,6 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
#define EXEC_QUEUE_STATE_BANNED (1 << 9)
#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
#define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11)
+#define EXEC_QUEUE_STATE_DB_REGISTERED (1 << 12)
static bool exec_queue_registered(struct xe_exec_queue *q)
{
@@ -218,6 +220,16 @@ static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
}
+static bool exec_queue_doorbell_registered(struct xe_exec_queue *q)
+{
+ return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_DB_REGISTERED;
+}
+
+static void set_exec_queue_doorbell_registered(struct xe_exec_queue *q)
+{
+ atomic_or(EXEC_QUEUE_STATE_DB_REGISTERED, &q->guc->state);
+}
+
static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
{
return (atomic_read(&q->guc->state) &
@@ -354,13 +366,6 @@ static int alloc_guc_id(struct xe_guc *guc, struct xe_exec_queue *q)
return ret;
}
-static void release_guc_id(struct xe_guc *guc, struct xe_exec_queue *q)
-{
- mutex_lock(&guc->submission_state.lock);
- __release_guc_id(guc, q, q->width);
- mutex_unlock(&guc->submission_state.lock);
-}
-
struct exec_queue_policy {
u32 count;
struct guc_update_exec_queue_policy h2g;
@@ -1238,7 +1243,13 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
if (xe_exec_queue_is_lr(q))
cancel_work_sync(&ge->lr_tdr);
- release_guc_id(guc, q);
+
+ mutex_lock(&guc->submission_state.lock);
+ if (q->guc->db.id >= 0)
+ xe_guc_db_mgr_release_id_locked(&guc->dbm, q->guc->db.id);
+ __release_guc_id(guc, q, q->width);
+ mutex_unlock(&guc->submission_state.lock);
+
xe_sched_entity_fini(&ge->entity);
xe_sched_fini(&ge->sched);
@@ -1273,6 +1284,8 @@ static void __guc_exec_queue_fini(struct xe_guc *guc, struct xe_exec_queue *q)
guc_exec_queue_fini_async(q);
}
+static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id);
+
static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg)
{
struct xe_exec_queue *q = msg->private_data;
@@ -1281,6 +1294,9 @@ static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg)
xe_gt_assert(guc_to_gt(guc), !(q->flags & EXEC_QUEUE_FLAG_PERMANENT));
trace_xe_exec_queue_cleanup_entity(q);
+ if (exec_queue_doorbell_registered(q))
+ deallocate_doorbell(guc, q->guc->id);
+
if (exec_queue_registered(q))
disable_scheduling_deregister(guc, q);
else
@@ -1399,6 +1415,53 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
xe_pm_runtime_put(xe);
}
+static int allocate_doorbell(struct xe_guc *guc, u16 guc_id, int doorbell_id,
+ u64 gpa)
+{
+ u32 action[] = {
+ XE_GUC_ACTION_ALLOCATE_DOORBELL,
+ guc_id,
+ doorbell_id,
+ lower_32_bits(gpa),
+ upper_32_bits(gpa),
+ 0,
+ };
+
+ return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
+}
+
+static void deallocate_doorbell(struct xe_guc *guc, u16 guc_id)
+{
+ u32 action[] = {
+ XE_GUC_ACTION_DEALLOCATE_DOORBELL,
+ guc_id
+ };
+
+ xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
+}
+
+#define GUC_MMIO_DB_BAR_OFFSET SZ_4M
+
+static int create_doorbell(struct xe_guc *guc, struct xe_exec_queue *q)
+{
+ int ret;
+
+ set_exec_queue_doorbell_registered(q);
+ xe_guc_submit_reset_wait(guc);
+
+ q->guc->db.dpa = GUC_MMIO_DB_BAR_OFFSET + PAGE_SIZE * q->guc->db.id;
+ register_exec_queue(q);
+ enable_scheduling(q);
+
+ ret = allocate_doorbell(guc, q->guc->id, q->guc->db.id, q->guc->db.dpa);
+ if (ret) {
+ disable_scheduling_deregister(guc, q);
+ return ret;
+ }
+
+ return 0;
+}
+
static const struct drm_sched_backend_ops drm_sched_ops = {
.run_job = guc_exec_queue_run_job,
.free_job = guc_exec_queue_free_job,
@@ -1415,7 +1478,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
struct xe_guc *guc = exec_queue_to_guc(q);
struct xe_guc_exec_queue *ge;
long timeout;
- int err, i;
+ int err, i, db_id = 0;
xe_gt_assert(guc_to_gt(guc), xe_device_uc_enabled(guc_to_xe(guc)));
@@ -1458,14 +1521,35 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
if (xe_guc_read_stopped(guc))
xe_sched_stop(sched);
+ q->guc->db.id = -1;
+ if (q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION) {
+ db_id = xe_guc_db_mgr_reserve_id_locked(&guc->dbm);
+ if (db_id < 0) {
+ err = db_id;
+ goto err_id;
+ }
+ }
+
mutex_unlock(&guc->submission_state.lock);
+ if (q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION) {
+ q->guc->db.id = db_id;
+ err = create_doorbell(guc, q);
+ if (err)
+ goto err_db;
+ }
+
xe_exec_queue_assign_name(q, q->guc->id);
trace_xe_exec_queue_create(q);
return 0;
+err_db:
+ mutex_lock(&guc->submission_state.lock);
+ xe_guc_db_mgr_release_id_locked(&guc->dbm, q->guc->db.id);
+err_id:
+ __release_guc_id(guc, q, q->width);
err_entity:
mutex_unlock(&guc->submission_state.lock);
xe_sched_entity_fini(&ge->entity);
@@ -1699,7 +1783,10 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
struct xe_sched_job *job = xe_sched_first_pending_job(sched);
bool ban = false;
- if (job) {
+ if (exec_queue_doorbell_registered(q)) {
+ /* TODO: Ban via UMD shim too */
+ ban = true;
+ } else if (job) {
if ((xe_sched_job_started(job) &&
!xe_sched_job_completed(job)) ||
xe_sched_invalidate_job(job, 2)) {
--
2.34.1
* [RFC PATCH 05/29] drm/xe: Add doorbell ID to snapshot capture
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (3 preceding siblings ...)
2024-11-18 23:35 ` [RFC PATCH 04/29] drm/xe: Allocate doorbells for UMD exec queues Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 06/29] drm/xe: Break submission ring out into its own BO Matthew Brost
` (24 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Useful for debugging hangs with doorbells.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
drivers/gpu/drm/xe/xe_guc_submit_types.h | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index cc7a98c1343e..c226c7b3245d 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2227,6 +2227,7 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
return NULL;
snapshot->guc.id = q->guc->id;
+ snapshot->guc.db_id = q->guc->db.id;
memcpy(&snapshot->name, &q->name, sizeof(snapshot->name));
snapshot->class = q->class;
snapshot->logical_mask = q->logical_mask;
@@ -2321,6 +2322,7 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
drm_printf(p, "\tClass: %d\n", snapshot->class);
drm_printf(p, "\tLogical mask: 0x%x\n", snapshot->logical_mask);
drm_printf(p, "\tWidth: %d\n", snapshot->width);
+ drm_printf(p, "\tDoorbell ID: %d\n", snapshot->guc.db_id);
drm_printf(p, "\tRef: %d\n", snapshot->refcount);
drm_printf(p, "\tTimeout: %ld (ms)\n", snapshot->sched_timeout);
drm_printf(p, "\tTimeslice: %u (us)\n",
diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
index dc7456c34583..12fef7848b78 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
@@ -113,6 +113,8 @@ struct xe_guc_submit_exec_queue_snapshot {
u32 wqi_tail;
/** @guc.id: GuC id for this exec_queue */
u16 id;
+ /** @guc.db_id: Doorbell id */
+ u16 db_id;
} guc;
/**
--
2.34.1
* [RFC PATCH 06/29] drm/xe: Break submission ring out into its own BO
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (4 preceding siblings ...)
2024-11-18 23:35 ` [RFC PATCH 05/29] drm/xe: Add doorbell ID to snapshot capture Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 07/29] drm/xe: Break indirect ring state " Matthew Brost
` (23 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Start laying the groundwork for UMD submission. This will allow mmapping
the submission ring to user space.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_lrc.c | 38 +++++++++++++++++++++++++------
drivers/gpu/drm/xe/xe_lrc_types.h | 9 ++++++--
2 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 22e58c6e2a35..758648b6a711 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -632,7 +632,7 @@ static inline u32 __xe_lrc_ring_offset(struct xe_lrc *lrc)
u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc)
{
- return lrc->ring.size;
+ return 0;
}
/* Make the magic macros work */
@@ -712,7 +712,21 @@ static inline u32 __maybe_unused __xe_lrc_##elem##_ggtt_addr(struct xe_lrc *lrc)
return xe_bo_ggtt_addr(lrc->bo) + __xe_lrc_##elem##_offset(lrc); \
} \
-DECL_MAP_ADDR_HELPERS(ring)
+#define DECL_MAP_RING_ADDR_HELPERS(elem) \
+static inline struct iosys_map __xe_lrc_##elem##_map(struct xe_lrc *lrc) \
+{ \
+ struct iosys_map map = lrc->submission_ring->vmap; \
+\
+ xe_assert(lrc_to_xe(lrc), !iosys_map_is_null(&map)); \
+ iosys_map_incr(&map, __xe_lrc_##elem##_offset(lrc)); \
+ return map; \
+} \
+static inline u32 __maybe_unused __xe_lrc_##elem##_ggtt_addr(struct xe_lrc *lrc) \
+{ \
+ return xe_bo_ggtt_addr(lrc->submission_ring) + __xe_lrc_##elem##_offset(lrc); \
+} \
+
+DECL_MAP_RING_ADDR_HELPERS(ring)
DECL_MAP_ADDR_HELPERS(pphwsp)
DECL_MAP_ADDR_HELPERS(seqno)
DECL_MAP_ADDR_HELPERS(regs)
@@ -722,6 +736,7 @@ DECL_MAP_ADDR_HELPERS(ctx_timestamp)
DECL_MAP_ADDR_HELPERS(parallel)
DECL_MAP_ADDR_HELPERS(indirect_ring)
+#undef DECL_MAP_RING_ADDR_HELPERS
#undef DECL_MAP_ADDR_HELPERS
/**
@@ -866,10 +881,8 @@ static void xe_lrc_set_ppgtt(struct xe_lrc *lrc, struct xe_vm *vm)
static void xe_lrc_finish(struct xe_lrc *lrc)
{
xe_hw_fence_ctx_finish(&lrc->fence_ctx);
- xe_bo_lock(lrc->bo, false);
- xe_bo_unpin(lrc->bo);
- xe_bo_unlock(lrc->bo);
- xe_bo_put(lrc->bo);
+ xe_bo_unpin_map_no_vm(lrc->bo);
+ xe_bo_unpin_map_no_vm(lrc->submission_ring);
}
#define PVC_CTX_ASID (0x2e + 1)
@@ -889,7 +902,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
kref_init(&lrc->refcount);
lrc->flags = 0;
- lrc_size = ring_size + xe_gt_lrc_size(gt, hwe->class);
+ lrc_size = xe_gt_lrc_size(gt, hwe->class);
if (xe_gt_has_indirect_ring_state(gt))
lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE;
@@ -905,6 +918,17 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
if (IS_ERR(lrc->bo))
return PTR_ERR(lrc->bo);
+ lrc->submission_ring = xe_bo_create_pin_map(xe, tile, vm, ring_size,
+ ttm_bo_type_kernel,
+ XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+ XE_BO_FLAG_GGTT |
+ XE_BO_FLAG_GGTT_INVALIDATE);
+ if (IS_ERR(lrc->submission_ring)) {
+ err = PTR_ERR(lrc->submission_ring);
+ lrc->submission_ring = NULL;
+ goto err_lrc_finish;
+ }
+
lrc->size = lrc_size;
lrc->tile = gt_to_tile(hwe->gt);
lrc->ring.size = ring_size;
diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h
index 71ecb453f811..3ad9ac2d644f 100644
--- a/drivers/gpu/drm/xe/xe_lrc_types.h
+++ b/drivers/gpu/drm/xe/xe_lrc_types.h
@@ -17,11 +17,16 @@ struct xe_bo;
*/
struct xe_lrc {
/**
- * @bo: buffer object (memory) for logical ring context, per process HW
- * status page, and submission ring.
+ * @bo: buffer object (memory) for logical ring context and per process
+ * HW status page.
*/
struct xe_bo *bo;
+ /**
+ * @submission_ring: buffer object (memory) for submission_ring
+ */
+ struct xe_bo *submission_ring;
+
/** @size: size of lrc including any indirect ring state page */
u32 size;
--
2.34.1
* [RFC PATCH 07/29] drm/xe: Break indirect ring state out into its own BO
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (5 preceding siblings ...)
2024-11-18 23:35 ` [RFC PATCH 06/29] drm/xe: Break submission ring out into its own BO Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:35 ` [RFC PATCH 08/29] drm/xe: Clear GGTT in xe_bo_restore_kernel Matthew Brost
` (22 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Start laying the groundwork for UMD submission. This will allow mmapping
the indirect ring state to user space.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_lrc.c | 79 ++++++++++++++++++++++---------
drivers/gpu/drm/xe/xe_lrc_types.h | 7 ++-
2 files changed, 63 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 758648b6a711..e3c1773191bd 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -74,10 +74,6 @@ size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class)
size = 2 * SZ_4K;
}
- /* Add indirect ring state page */
- if (xe_gt_has_indirect_ring_state(gt))
- size += LRC_INDIRECT_RING_STATE_SIZE;
-
return size;
}
@@ -694,8 +690,7 @@ static u32 __xe_lrc_ctx_timestamp_offset(struct xe_lrc *lrc)
static inline u32 __xe_lrc_indirect_ring_offset(struct xe_lrc *lrc)
{
- /* Indirect ring state page is at the very end of LRC */
- return lrc->size - LRC_INDIRECT_RING_STATE_SIZE;
+ return 0;
}
#define DECL_MAP_ADDR_HELPERS(elem) \
@@ -726,6 +721,20 @@ static inline u32 __maybe_unused __xe_lrc_##elem##_ggtt_addr(struct xe_lrc *lrc)
return xe_bo_ggtt_addr(lrc->submission_ring) + __xe_lrc_##elem##_offset(lrc); \
} \
+#define DECL_MAP_INDIRECT_ADDR_HELPERS(elem) \
+static inline struct iosys_map __xe_lrc_##elem##_map(struct xe_lrc *lrc) \
+{ \
+ struct iosys_map map = lrc->indirect_state->vmap; \
+\
+ xe_assert(lrc_to_xe(lrc), !iosys_map_is_null(&map)); \
+ iosys_map_incr(&map, __xe_lrc_##elem##_offset(lrc)); \
+ return map; \
+} \
+static inline u32 __maybe_unused __xe_lrc_##elem##_ggtt_addr(struct xe_lrc *lrc) \
+{ \
+ return xe_bo_ggtt_addr(lrc->indirect_state) + __xe_lrc_##elem##_offset(lrc); \
+} \
+
DECL_MAP_RING_ADDR_HELPERS(ring)
DECL_MAP_ADDR_HELPERS(pphwsp)
DECL_MAP_ADDR_HELPERS(seqno)
@@ -734,8 +743,9 @@ DECL_MAP_ADDR_HELPERS(start_seqno)
DECL_MAP_ADDR_HELPERS(ctx_job_timestamp)
DECL_MAP_ADDR_HELPERS(ctx_timestamp)
DECL_MAP_ADDR_HELPERS(parallel)
-DECL_MAP_ADDR_HELPERS(indirect_ring)
+DECL_MAP_INDIRECT_ADDR_HELPERS(indirect_ring)
+#undef DECL_MAP_INDIRECT_ADDR_HELPERS
#undef DECL_MAP_RING_ADDR_HELPERS
#undef DECL_MAP_ADDR_HELPERS
@@ -845,25 +855,27 @@ void xe_lrc_write_ctx_reg(struct xe_lrc *lrc, int reg_nr, u32 val)
xe_map_write32(xe, &map, val);
}
-static void *empty_lrc_data(struct xe_hw_engine *hwe)
+static void *empty_lrc_data(struct xe_hw_engine *hwe, bool has_default)
{
struct xe_gt *gt = hwe->gt;
void *data;
u32 *regs;
- data = kzalloc(xe_gt_lrc_size(gt, hwe->class), GFP_KERNEL);
+ data = kzalloc(xe_gt_lrc_size(gt, hwe->class) +
+ LRC_INDIRECT_RING_STATE_SIZE, GFP_KERNEL);
if (!data)
return NULL;
/* 1st page: Per-Process of HW status Page */
- regs = data + LRC_PPHWSP_SIZE;
- set_offsets(regs, reg_offsets(gt_to_xe(gt), hwe->class), hwe);
- set_context_control(regs, hwe);
- set_memory_based_intr(regs, hwe);
- reset_stop_ring(regs, hwe);
+ if (!has_default) {
+ regs = data + LRC_PPHWSP_SIZE;
+ set_offsets(regs, reg_offsets(gt_to_xe(gt), hwe->class), hwe);
+ set_context_control(regs, hwe);
+ set_memory_based_intr(regs, hwe);
+ reset_stop_ring(regs, hwe);
+ }
if (xe_gt_has_indirect_ring_state(gt)) {
- regs = data + xe_gt_lrc_size(gt, hwe->class) -
- LRC_INDIRECT_RING_STATE_SIZE;
+ regs = data + xe_gt_lrc_size(gt, hwe->class);
set_offsets(regs, xe2_indirect_ring_state_offsets, hwe);
}
@@ -883,6 +895,7 @@ static void xe_lrc_finish(struct xe_lrc *lrc)
xe_hw_fence_ctx_finish(&lrc->fence_ctx);
xe_bo_unpin_map_no_vm(lrc->bo);
xe_bo_unpin_map_no_vm(lrc->submission_ring);
+ xe_bo_unpin_map_no_vm(lrc->indirect_state);
}
#define PVC_CTX_ASID (0x2e + 1)
@@ -903,8 +916,6 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
kref_init(&lrc->refcount);
lrc->flags = 0;
lrc_size = xe_gt_lrc_size(gt, hwe->class);
- if (xe_gt_has_indirect_ring_state(gt))
- lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE;
/*
* FIXME: Perma-pinning LRC as we don't yet support moving GGTT address
@@ -929,6 +940,22 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
goto err_lrc_finish;
}
+ if (xe_gt_has_indirect_ring_state(gt)) {
+ lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE;
+
+ lrc->indirect_state = xe_bo_create_pin_map(xe, tile, vm,
+ LRC_INDIRECT_RING_STATE_SIZE,
+ ttm_bo_type_kernel,
+ XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+ XE_BO_FLAG_GGTT |
+ XE_BO_FLAG_GGTT_INVALIDATE);
+ if (IS_ERR(lrc->indirect_state)) {
+ err = PTR_ERR(lrc->indirect_state);
+ lrc->indirect_state = NULL;
+ goto err_lrc_finish;
+ }
+ }
+
lrc->size = lrc_size;
lrc->tile = gt_to_tile(hwe->gt);
lrc->ring.size = ring_size;
@@ -938,8 +965,8 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
xe_hw_fence_ctx_init(&lrc->fence_ctx, hwe->gt,
hwe->fence_irq, hwe->name);
- if (!gt->default_lrc[hwe->class]) {
- init_data = empty_lrc_data(hwe);
+ if (!gt->default_lrc[hwe->class] || xe_gt_has_indirect_ring_state(gt)) {
+ init_data = empty_lrc_data(hwe, !!gt->default_lrc[hwe->class]);
if (!init_data) {
err = -ENOMEM;
goto err_lrc_finish;
@@ -951,7 +978,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
* values
*/
map = __xe_lrc_pphwsp_map(lrc);
- if (!init_data) {
+ if (gt->default_lrc[hwe->class]) {
xe_map_memset(xe, &map, 0, 0, LRC_PPHWSP_SIZE); /* PPHWSP */
xe_map_memcpy_to(xe, &map, LRC_PPHWSP_SIZE,
gt->default_lrc[hwe->class] + LRC_PPHWSP_SIZE,
@@ -959,9 +986,17 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
} else {
xe_map_memcpy_to(xe, &map, 0, init_data,
xe_gt_lrc_size(gt, hwe->class));
- kfree(init_data);
}
+ if (xe_gt_has_indirect_ring_state(gt)) {
+ map = __xe_lrc_indirect_ring_map(lrc);
+ xe_map_memcpy_to(xe, &map, 0, init_data +
+ xe_gt_lrc_size(gt, hwe->class),
+ LRC_INDIRECT_RING_STATE_SIZE);
+ }
+
+ kfree(init_data);
+
if (vm) {
xe_lrc_set_ppgtt(lrc, vm);
diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h
index 3ad9ac2d644f..3be708c82313 100644
--- a/drivers/gpu/drm/xe/xe_lrc_types.h
+++ b/drivers/gpu/drm/xe/xe_lrc_types.h
@@ -27,7 +27,12 @@ struct xe_lrc {
*/
struct xe_bo *submission_ring;
- /** @size: size of lrc including any indirect ring state page */
+ /**
+ * @indirect_state: buffer object (memory) for indirect state
+ */
+ struct xe_bo *indirect_state;
+
+ /** @size: size of lrc */
u32 size;
/** @tile: tile which this LRC belongs to */
--
2.34.1
* [RFC PATCH 08/29] drm/xe: Clear GGTT in xe_bo_restore_kernel
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (6 preceding siblings ...)
2024-11-18 23:35 ` [RFC PATCH 07/29] drm/xe: Break indirect ring state " Matthew Brost
@ 2024-11-18 23:35 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 09/29] FIXME: drm/xe: Add pad to ring and indirect state Matthew Brost
` (21 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:35 UTC (permalink / raw)
To: igt-dev
Part of what xe_bo_restore_kernel does is restore BOs' GGTT mappings,
which may have been lost during a power state change. What is missing
is restoring the GGTT entries that have no BO mapping to a known state
(e.g., scratch pages). Update xe_bo_restore_kernel to clear the entire
GGTT before restoring the BOs' GGTT mappings.
v2:
- Include missing local change of tile and id variable (CI)
v3:
- Fixed kernel doc (CI)
v4:
- Only clear holes (CI)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_bo_evict.c | 8 +++++++-
drivers/gpu/drm/xe/xe_ggtt.c | 19 ++++++++++++++++---
drivers/gpu/drm/xe/xe_ggtt.h | 2 ++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
index 8fb2be061003..d7bb3dbb41d6 100644
--- a/drivers/gpu/drm/xe/xe_bo_evict.c
+++ b/drivers/gpu/drm/xe/xe_bo_evict.c
@@ -123,7 +123,8 @@ int xe_bo_evict_all(struct xe_device *xe)
* @xe: xe device
*
* Move kernel BOs from temporary (typically system) memory to VRAM via CPU. All
- * moves done via TTM calls.
+ * moves done via TTM calls. All GGTT mappings are restored too, first by
+ * clearing the GGTT to a known state and then restoring BOs' GGTT mappings.
*
* This function should be called early, before trying to init the GT, on device
* resume.
@@ -131,8 +132,13 @@ int xe_bo_evict_all(struct xe_device *xe)
int xe_bo_restore_kernel(struct xe_device *xe)
{
struct xe_bo *bo;
+ struct xe_tile *tile;
+ u8 id;
int ret;
+ for_each_tile(tile, xe, id)
+ xe_ggtt_clear(tile->mem.ggtt);
+
spin_lock(&xe->pinned.lock);
for (;;) {
bo = list_first_entry_or_null(&xe->pinned.evicted,
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 558fac8bb6fb..2fc498b89878 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -140,7 +140,7 @@ static void xe_ggtt_set_pte_and_flush(struct xe_ggtt *ggtt, u64 addr, u64 pte)
ggtt_update_access_counter(ggtt);
}
-static void xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
+static void __xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
{
u16 pat_index = tile_to_xe(ggtt->tile)->pat.idx[XE_CACHE_WB];
u64 end = start + size - 1;
@@ -160,6 +160,19 @@ static void xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
}
}
+static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt);
+
+/**
+ * xe_ggtt_clear() - GGTT clear
+ * @ggtt: the &xe_ggtt to be cleared
+ *
+ * Clear the entire GGTT to a known state
+ */
+void xe_ggtt_clear(struct xe_ggtt *ggtt)
+{
+ xe_ggtt_initial_clear(ggtt);
+}
+
static void ggtt_fini_early(struct drm_device *drm, void *arg)
{
struct xe_ggtt *ggtt = arg;
@@ -277,7 +290,7 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
/* Display may have allocated inside ggtt, so be careful with clearing here */
mutex_lock(&ggtt->lock);
drm_mm_for_each_hole(hole, &ggtt->mm, start, end)
- xe_ggtt_clear(ggtt, start, end - start);
+ __xe_ggtt_clear(ggtt, start, end - start);
xe_ggtt_invalidate(ggtt);
mutex_unlock(&ggtt->lock);
@@ -294,7 +307,7 @@ static void ggtt_node_remove(struct xe_ggtt_node *node)
mutex_lock(&ggtt->lock);
if (bound)
- xe_ggtt_clear(ggtt, node->base.start, node->base.size);
+ __xe_ggtt_clear(ggtt, node->base.start, node->base.size);
drm_mm_remove_node(&node->base);
node->base.size = 0;
mutex_unlock(&ggtt->lock);
diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
index 27e7d67de004..b7ae440cdebf 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.h
+++ b/drivers/gpu/drm/xe/xe_ggtt.h
@@ -13,6 +13,8 @@ struct drm_printer;
int xe_ggtt_init_early(struct xe_ggtt *ggtt);
int xe_ggtt_init(struct xe_ggtt *ggtt);
+void xe_ggtt_clear(struct xe_ggtt *ggtt);
+
struct xe_ggtt_node *xe_ggtt_node_init(struct xe_ggtt *ggtt);
void xe_ggtt_node_fini(struct xe_ggtt_node *node);
int xe_ggtt_node_insert_balloon(struct xe_ggtt_node *node,
--
2.34.1
* [RFC PATCH 09/29] FIXME: drm/xe: Add pad to ring and indirect state
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (7 preceding siblings ...)
2024-11-18 23:35 ` [RFC PATCH 08/29] drm/xe: Clear GGTT in xe_bo_restore_kernel Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 10/29] drm/xe: Enable indirect ring on media GT Matthew Brost
` (20 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Unsure why, but without this, intermittent hangs occur on GuC context
switches.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_lrc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index e3c1773191bd..9633e5e700f6 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -929,7 +929,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
if (IS_ERR(lrc->bo))
return PTR_ERR(lrc->bo);
- lrc->submission_ring = xe_bo_create_pin_map(xe, tile, vm, ring_size,
+ lrc->submission_ring = xe_bo_create_pin_map(xe, tile, vm, SZ_32K,
ttm_bo_type_kernel,
XE_BO_FLAG_VRAM_IF_DGFX(tile) |
XE_BO_FLAG_GGTT |
@@ -943,8 +943,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
if (xe_gt_has_indirect_ring_state(gt)) {
lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE;
- lrc->indirect_state = xe_bo_create_pin_map(xe, tile, vm,
- LRC_INDIRECT_RING_STATE_SIZE,
+ lrc->indirect_state = xe_bo_create_pin_map(xe, tile, vm, SZ_8K,
ttm_bo_type_kernel,
XE_BO_FLAG_VRAM_IF_DGFX(tile) |
XE_BO_FLAG_GGTT |
--
2.34.1
* [RFC PATCH 10/29] drm/xe: Enable indirect ring on media GT
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (8 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 09/29] FIXME: drm/xe: Add pad to ring and indirect state Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 11/29] drm/xe: Don't add pinned mappings to VM bulk move Matthew Brost
` (19 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
The media GT supports indirect ring state, which is required for UMD
submission, so enable it by default.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 9b81e7d00a86..a27450e63cf9 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -209,6 +209,7 @@ static const struct xe_media_desc media_xelpmp = {
static const struct xe_media_desc media_xe2 = {
.name = "Xe2_LPM / Xe2_HPM / Xe3_LPM",
+ .has_indirect_ring_state = 1,
.hw_engine_mask =
GENMASK(XE_HW_ENGINE_VCS7, XE_HW_ENGINE_VCS0) |
GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0) |
--
2.34.1
* [RFC PATCH 11/29] drm/xe: Don't add pinned mappings to VM bulk move
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (9 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 10/29] drm/xe: Enable indirect ring on media GT Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 12/29] drm/xe: Add exec queue post init extension processing Matthew Brost
` (18 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
We don't want kernel pinned resources (ring, indirect state) in the VM's
bulk move as these are unevictable.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 549866da5cd1..96dbc88b1f55 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1470,6 +1470,9 @@ __xe_bo_create_locked(struct xe_device *xe,
{
struct xe_bo *bo = NULL;
int err;
+ bool want_bulk = vm && !xe_vm_in_fault_mode(vm) &&
+ flags & XE_BO_FLAG_USER &&
+ !(flags & (XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT));
if (vm)
xe_vm_assert_held(vm);
@@ -1488,9 +1491,7 @@ __xe_bo_create_locked(struct xe_device *xe,
}
bo = ___xe_bo_create_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
- vm && !xe_vm_in_fault_mode(vm) &&
- flags & XE_BO_FLAG_USER ?
- &vm->lru_bulk_move : NULL, size,
+ want_bulk ? &vm->lru_bulk_move : NULL, size,
cpu_caching, type, flags);
if (IS_ERR(bo))
return bo;
@@ -1781,9 +1782,6 @@ int xe_bo_pin(struct xe_bo *bo)
struct xe_device *xe = xe_bo_device(bo);
int err;
- /* We currently don't expect user BO to be pinned */
- xe_assert(xe, !xe_bo_is_user(bo));
-
/* Pinned object must be in GGTT or have pinned flag */
xe_assert(xe, bo->flags & (XE_BO_FLAG_PINNED |
XE_BO_FLAG_GGTT));
--
2.34.1
* [RFC PATCH 12/29] drm/xe: Add exec queue post init extension processing
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (10 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 11/29] drm/xe: Don't add pinned mappings to VM bulk move Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI memory barrier Matthew Brost
` (17 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Add exec queue post-init extension processing, which is needed for more
complex extensions that return data to the user.
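A hypothetical post-init extension handler, to illustrate why a second
pass is needed (no such extension exists in this patch; the struct and
field names below are made up):

/*
 * Runs after the exec queue is fully initialized, so it can report
 * values that only exist post-init back through the user's extension
 * struct (e.g. an offset the UMD needs for direct submission).
 */
static int exec_queue_user_ext_report_example(struct xe_device *xe,
					      struct xe_exec_queue *q,
					      u64 extension)
{
	struct drm_xe_ext_example __user *uext = u64_to_user_ptr(extension);
	u64 value = 0;	/* filled from 'q' state known only after init */

	return put_user(value, &uext->value) ? -EFAULT : 0;
}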
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 48 ++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index aab9e561153d..f402988b4fc0 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -33,6 +33,8 @@ enum xe_exec_queue_sched_prop {
static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
u64 extensions, int ext_number);
+static int exec_queue_user_extensions_post_init(struct xe_device *xe, struct xe_exec_queue *q,
+ u64 extensions, int ext_number);
static void __xe_exec_queue_free(struct xe_exec_queue *q)
{
@@ -446,6 +448,10 @@ static const xe_exec_queue_user_extension_fn exec_queue_user_extension_funcs[] =
[DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = exec_queue_user_ext_set_property,
};
+static const xe_exec_queue_user_extension_fn exec_queue_user_extension_post_init_funcs[] = {
+ [DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = NULL,
+};
+
#define MAX_USER_EXTENSIONS 16
static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
u64 extensions, int ext_number)
@@ -480,6 +486,42 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
return 0;
}
+static int exec_queue_user_extensions_post_init(struct xe_device *xe, struct xe_exec_queue *q,
+ u64 extensions, int ext_number)
+{
+ u64 __user *address = u64_to_user_ptr(extensions);
+ struct drm_xe_user_extension ext;
+ int err;
+ u32 idx;
+
+ if (XE_IOCTL_DBG(xe, ext_number >= MAX_USER_EXTENSIONS))
+ return -E2BIG;
+
+ err = __copy_from_user(&ext, address, sizeof(ext));
+ if (XE_IOCTL_DBG(xe, err))
+ return -EFAULT;
+
+ if (XE_IOCTL_DBG(xe, ext.pad) ||
+ XE_IOCTL_DBG(xe, ext.name >=
+ ARRAY_SIZE(exec_queue_user_extension_post_init_funcs)))
+ return -EINVAL;
+
+ idx = array_index_nospec(ext.name,
+ ARRAY_SIZE(exec_queue_user_extension_post_init_funcs));
+ if (exec_queue_user_extension_post_init_funcs[idx]) {
+ err = exec_queue_user_extension_post_init_funcs[idx](xe, q, extensions);
+ if (XE_IOCTL_DBG(xe, err))
+ return err;
+ }
+
+ if (ext.next_extension)
+ return exec_queue_user_extensions_post_init(xe, q,
+ ext.next_extension,
+ ++ext_number);
+
+ return 0;
+}
+
static u32 calc_validate_logical_mask(struct xe_device *xe, struct xe_gt *gt,
struct drm_xe_engine_class_instance *eci,
u16 width, u16 num_placements)
@@ -647,6 +689,12 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
q->xef = xe_file_get(xef);
+ if (args->extensions) {
+ err = exec_queue_user_extensions_post_init(xe, q, args->extensions, 0);
+ if (err)
+ goto kill_exec_queue;
+ }
+
/* user id alloc must always be last in ioctl to prevent UAF */
err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b, GFP_KERNEL);
if (err)
--
2.34.1
* [RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI memory barrier
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (11 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 12/29] drm/xe: Add exec queue post init extension processing Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 14/29] drm/xe: Add support for mmapping doorbells to user space Matthew Brost
` (16 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
From: Tejas Upadhyay <tejas.upadhyay@intel.com>
To avoid having userspace use MI_MEM_FENCE, we are adding a
mechanism for userspace to generate a PCI memory barrier with low
overhead (an IOCTL call, as well as a write to VRAM, would add some
overhead).
This is implemented by memory-mapping a page as uncached that is
backed by MMIO on the dGPU, thus allowing userspace to write to the
page without invoking an IOCTL.
We select the MMIO range so that it is not accessible from the PCI
bus; the MMIO writes themselves are ignored, but the PCI memory
barrier still takes effect as the MMIO filtering happens after the
memory barrier effect.
When the special defined offset is detected in mmap(), we map the 4K
page which contains the last page of the doorbell MMIO range to
userspace for this purpose.
For the user to query the special offset, we add a special flag to the
mmap_offset ioctl, which needs to be passed as follows:
struct drm_xe_gem_mmap_offset mmo = {
.handle = 0, /* this must be 0 */
.flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER,
};
igt_ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo);
map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo.offset);
Note: Test coverage for this is added by the IGT series at
https://patchwork.freedesktop.org/series/140368/.
The UMD PR implementing the test will be attached to this patch once
it is ready.
V6(MAuld)
- Move physical mmap to fault handler
- Modify kernel-doc and attach UMD PR when ready
V5(MAuld)
- Return invalid early in case of non 4K PAGE_SIZE
- Format kernel-doc and add note for 4K PAGE_SIZE HW limit
V4(MAuld)
- Add kernel-doc for uapi change
- Restrict page size to 4K
V3(MAuld)
- Remove offset definition from UAPI to be able to change it later
- Edit commit message for special flag addition
V2(MAuld)
- Add fault handler with dummy page to handle unplug device
- Add Build check for special offset to be below normal start page
- Test d3hot, mapping seems to be valid in d3hot as well
- Add more info to commit message
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 16 ++++-
drivers/gpu/drm/xe/xe_bo.h | 2 +
drivers/gpu/drm/xe/xe_device.c | 103 ++++++++++++++++++++++++++++++++-
include/uapi/drm/xe_drm.h | 29 +++++++++-
4 files changed, 147 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 96dbc88b1f55..f948262e607f 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2138,9 +2138,23 @@ int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
return -EINVAL;
- if (XE_IOCTL_DBG(xe, args->flags))
+ if (XE_IOCTL_DBG(xe, args->flags &
+ ~DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER))
return -EINVAL;
+ if (args->flags & DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER) {
+ if (XE_IOCTL_DBG(xe, args->handle))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, PAGE_SIZE > SZ_4K))
+ return -EINVAL;
+
+ BUILD_BUG_ON(((XE_PCI_BARRIER_MMAP_OFFSET >> XE_PTE_SHIFT) +
+ SZ_4K) >= DRM_FILE_PAGE_OFFSET_START);
+ args->offset = XE_PCI_BARRIER_MMAP_OFFSET;
+ return 0;
+ }
+
gem_obj = drm_gem_object_lookup(file, args->handle);
if (XE_IOCTL_DBG(xe, !gem_obj))
return -ENOENT;
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 7fa44a0138b0..e7724965d3f1 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -63,6 +63,8 @@
#define XE_BO_PROPS_INVALID (-1)
+#define XE_PCI_BARRIER_MMAP_OFFSET (0x50 << XE_PTE_SHIFT)
+
struct sg_table;
struct xe_bo *xe_bo_alloc(void);
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 930bb2750e2e..f6069db795e7 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -231,12 +231,113 @@ static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned lo
#define xe_drm_compat_ioctl NULL
#endif
+static void barrier_open(struct vm_area_struct *vma)
+{
+ drm_dev_get(vma->vm_private_data);
+}
+
+static void barrier_close(struct vm_area_struct *vma)
+{
+ drm_dev_put(vma->vm_private_data);
+}
+
+static void barrier_release_dummy_page(struct drm_device *dev, void *res)
+{
+ struct page *dummy_page = (struct page *)res;
+
+ __free_page(dummy_page);
+}
+
+static vm_fault_t barrier_fault(struct vm_fault *vmf)
+{
+ struct drm_device *dev = vmf->vma->vm_private_data;
+ struct vm_area_struct *vma = vmf->vma;
+ vm_fault_t ret = VM_FAULT_NOPAGE;
+ pgprot_t prot;
+ int idx;
+
+ prot = vm_get_page_prot(vma->vm_flags);
+
+ if (drm_dev_enter(dev, &idx)) {
+ unsigned long pfn;
+
+#define LAST_DB_PAGE_OFFSET 0x7ff001
+ pfn = PHYS_PFN(pci_resource_start(to_pci_dev(dev->dev), 0) +
+ LAST_DB_PAGE_OFFSET);
+ ret = vmf_insert_pfn_prot(vma, vma->vm_start, pfn,
+ pgprot_noncached(prot));
+ drm_dev_exit(idx);
+ } else {
+ struct page *page;
+
+ /* Allocate new dummy page to map all the VA range in this VMA to it*/
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return VM_FAULT_OOM;
+
+ /* Set the page to be freed using drmm release action */
+ if (drmm_add_action_or_reset(dev, barrier_release_dummy_page, page))
+ return VM_FAULT_OOM;
+
+ ret = vmf_insert_pfn_prot(vma, vma->vm_start, page_to_pfn(page),
+ prot);
+ }
+
+ return ret;
+}
+
+static const struct vm_operations_struct vm_ops_barrier = {
+ .open = barrier_open,
+ .close = barrier_close,
+ .fault = barrier_fault,
+};
+
+static int xe_pci_barrier_mmap(struct file *filp,
+ struct vm_area_struct *vma)
+{
+ struct drm_file *priv = filp->private_data;
+ struct drm_device *dev = priv->minor->dev;
+
+ if (vma->vm_end - vma->vm_start > SZ_4K)
+ return -EINVAL;
+
+ if (is_cow_mapping(vma->vm_flags))
+ return -EINVAL;
+
+ if (vma->vm_flags & (VM_READ | VM_EXEC))
+ return -EINVAL;
+
+ vm_flags_clear(vma, VM_MAYREAD | VM_MAYEXEC);
+ vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
+ vma->vm_ops = &vm_ops_barrier;
+ vma->vm_private_data = dev;
+ drm_dev_get(vma->vm_private_data);
+
+ return 0;
+}
+
+static int xe_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ struct drm_file *priv = filp->private_data;
+ struct drm_device *dev = priv->minor->dev;
+
+ if (drm_dev_is_unplugged(dev))
+ return -ENODEV;
+
+ switch (vma->vm_pgoff) {
+ case XE_PCI_BARRIER_MMAP_OFFSET >> XE_PTE_SHIFT:
+ return xe_pci_barrier_mmap(filp, vma);
+ }
+
+ return drm_gem_mmap(filp, vma);
+}
+
static const struct file_operations xe_driver_fops = {
.owner = THIS_MODULE,
.open = drm_open,
.release = drm_release_noglobal,
.unlocked_ioctl = xe_drm_ioctl,
- .mmap = drm_gem_mmap,
+ .mmap = xe_mmap,
.poll = drm_poll,
.read = drm_read,
.compat_ioctl = xe_drm_compat_ioctl,
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 4a8a4a63e99c..6490b16b1217 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -811,6 +811,32 @@ struct drm_xe_gem_create {
/**
* struct drm_xe_gem_mmap_offset - Input of &DRM_IOCTL_XE_GEM_MMAP_OFFSET
+ *
+ * The @flags can be:
+ * - %DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER - For user to query special offset
+ * for use in mmap ioctl. Writing to the returned mmap address will generate a
+ * PCI memory barrier with low overhead (avoiding IOCTL call as well as writing
+ * to VRAM which would also add overhead), acting like an MI_MEM_FENCE
+ * instruction.
+ *
+ * Note: The mmap size can be at most 4K, due to HW limitations. As a result
+ * this interface is only supported on CPU architectures that support 4K page
+ * size. The mmap_offset ioctl will detect this and gracefully return an
+ * error, where userspace is expected to have a different fallback method for
+ * triggering a barrier.
+ *
+ * Roughly the usage would be as follows:
+ *
+ * .. code-block:: C
+ *
+ * struct drm_xe_gem_mmap_offset mmo = {
+ * .handle = 0, // must be set to 0
+ * .flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER,
+ * };
+ *
+ * err = ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo);
+ * map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo.offset);
+ * map[i] = 0xdeadbeaf; // issue barrier
*/
struct drm_xe_gem_mmap_offset {
/** @extensions: Pointer to the first extension struct, if any */
@@ -819,7 +845,8 @@ struct drm_xe_gem_mmap_offset {
/** @handle: Handle for the object being mapped. */
__u32 handle;
- /** @flags: Must be zero */
+#define DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER (1 << 0)
+ /** @flags: Flags */
__u32 flags;
/** @offset: The fake offset to use for subsequent mmap call */
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 14/29] drm/xe: Add support for mmapping doorbells to user space
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (12 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI memory barrier Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 15/29] drm/xe: Add support for mmapping submission ring and indirect ring state " Matthew Brost
` (15 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Doorbells need to be mapped to user space for UMD direct submission;
add support for this.
FIXME: Wildly insecure as anyone can pick an MMIO doorbell offset; this
will need to be randomized and a unique offset tied to the FD. Can be
done in later revs before upstreaming.
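Roughly, a UMD would map and ring a doorbell as follows (an
illustrative sketch only; the doorbell fake offset is assumed to be
returned by the usermap exec queue extension added later in this
series, and error handling is trimmed):
    /* doorbell_offset: fake offset returned by the KMD (assumption) */
    uint32_t *db = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd,
                        doorbell_offset);
    if (db == MAP_FAILED)
        return -errno;
    /* any write to the doorbell page notifies the GuC of new work */
    *db = 0;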
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo.h | 3 ++
drivers/gpu/drm/xe/xe_device.c | 73 ++++++++++++++++++++++++++++++++++
2 files changed, 76 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index e7724965d3f1..2772d42ac057 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -64,6 +64,9 @@
#define XE_BO_PROPS_INVALID (-1)
#define XE_PCI_BARRIER_MMAP_OFFSET (0x50 << XE_PTE_SHIFT)
+#define XE_MMIO_DOORBELL_MMAP_OFFSET (0x100 << XE_PTE_SHIFT)
+#define XE_MMIO_DOORBELL_PFN_START (SZ_4M >> XE_PTE_SHIFT)
+#define XE_MMIO_DOORBELL_PFN_COUNT (256)
struct sg_table;
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index f6069db795e7..bbdff4308b2e 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -316,6 +316,75 @@ static int xe_pci_barrier_mmap(struct file *filp,
return 0;
}
+static vm_fault_t doorbell_fault(struct vm_fault *vmf)
+{
+ struct drm_device *dev = vmf->vma->vm_private_data;
+ struct vm_area_struct *vma = vmf->vma;
+ vm_fault_t ret = VM_FAULT_NOPAGE;
+ pgprot_t prot;
+ int idx;
+
+ prot = vm_get_page_prot(vma->vm_flags);
+
+ if (drm_dev_enter(dev, &idx)) {
+ unsigned long pfn;
+
+ pfn = PHYS_PFN(pci_resource_start(to_pci_dev(dev->dev), 0) +
+ (XE_MMIO_DOORBELL_PFN_START << XE_PTE_SHIFT));
+ pfn += vma->vm_pgoff & (XE_MMIO_DOORBELL_PFN_COUNT - 1);
+
+ ret = vmf_insert_pfn_prot(vma, vma->vm_start, pfn,
+ pgprot_noncached(prot));
+ drm_dev_exit(idx);
+ } else {
+ struct page *page;
+
+ /* Allocate new dummy page to map all the VA range in this VMA to it*/
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return VM_FAULT_OOM;
+
+ /* Set the page to be freed using drmm release action */
+ if (drmm_add_action_or_reset(dev, barrier_release_dummy_page, page))
+ return VM_FAULT_OOM;
+
+ ret = vmf_insert_pfn_prot(vma, vma->vm_start, page_to_pfn(page),
+ prot);
+ }
+
+ return ret;
+}
+
+static const struct vm_operations_struct vm_ops_doorbell = {
+ .open = barrier_open,
+ .close = barrier_close,
+ .fault = doorbell_fault,
+};
+
+static int xe_mmio_doorbell_mmap(struct file *filp,
+ struct vm_area_struct *vma)
+{
+ struct drm_file *priv = filp->private_data;
+ struct drm_device *dev = priv->minor->dev;
+
+ if (vma->vm_end - vma->vm_start > SZ_4K)
+ return -EINVAL;
+
+ if (is_cow_mapping(vma->vm_flags))
+ return -EINVAL;
+
+ if (vma->vm_flags & VM_EXEC)
+ return -EINVAL;
+
+ vm_flags_clear(vma, VM_MAYEXEC);
+ vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
+ vma->vm_ops = &vm_ops_doorbell;
+ vma->vm_private_data = dev;
+ drm_dev_get(vma->vm_private_data);
+
+ return 0;
+}
+
static int xe_mmap(struct file *filp, struct vm_area_struct *vma)
{
struct drm_file *priv = filp->private_data;
@@ -327,6 +396,10 @@ static int xe_mmap(struct file *filp, struct vm_area_struct *vma)
switch (vma->vm_pgoff) {
case XE_PCI_BARRIER_MMAP_OFFSET >> XE_PTE_SHIFT:
return xe_pci_barrier_mmap(filp, vma);
+ case (XE_MMIO_DOORBELL_MMAP_OFFSET >> XE_PTE_SHIFT) ...
+ ((XE_MMIO_DOORBELL_MMAP_OFFSET >> XE_PTE_SHIFT) +
+ XE_MMIO_DOORBELL_PFN_COUNT - 1):
+ return xe_mmio_doorbell_mmap(filp, vma);
}
return drm_gem_mmap(filp, vma);
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 15/29] drm/xe: Add support for mmapping submission ring and indirect ring state to user space
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (13 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 14/29] drm/xe: Add support for mmapping doorbells to user space Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 16/29] drm/xe/uapi: Define UMD exec queue mapping uAPI Matthew Brost
` (14 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
The ring and indirect ring state need to be mapped to user space for
UMD direct submission; add support for this.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 3 ---
drivers/gpu/drm/xe/xe_exec_queue.c | 2 +-
drivers/gpu/drm/xe/xe_execlist.c | 2 +-
drivers/gpu/drm/xe/xe_lrc.c | 29 ++++++++++++++++++++++-------
drivers/gpu/drm/xe/xe_lrc.h | 4 ++--
5 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index f948262e607f..a87871f1cb95 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1311,9 +1311,6 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
size_t aligned_size;
int err;
- /* Only kernel objects should set GT */
- xe_assert(xe, !tile || type == ttm_bo_type_kernel);
-
if (XE_WARN_ON(!size)) {
xe_bo_free(bo);
return ERR_PTR(-EINVAL);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index f402988b4fc0..aef5b130e7f8 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -119,7 +119,7 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q)
}
for (i = 0; i < q->width; ++i) {
- q->lrc[i] = xe_lrc_create(q->hwe, q->vm, SZ_16K);
+ q->lrc[i] = xe_lrc_create(q, q->hwe, q->vm, SZ_16K);
if (IS_ERR(q->lrc[i])) {
err = PTR_ERR(q->lrc[i]);
goto err_unlock;
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index a8c416a48812..93f76280d453 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -265,7 +265,7 @@ struct xe_execlist_port *xe_execlist_port_create(struct xe_device *xe,
port->hwe = hwe;
- port->lrc = xe_lrc_create(hwe, NULL, SZ_16K);
+ port->lrc = xe_lrc_create(NULL, hwe, NULL, SZ_16K);
if (IS_ERR(port->lrc)) {
err = PTR_ERR(port->lrc);
goto err;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 9633e5e700f6..8a79470b52ae 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -901,8 +901,9 @@ static void xe_lrc_finish(struct xe_lrc *lrc)
#define PVC_CTX_ASID (0x2e + 1)
#define PVC_CTX_ACC_CTR_THOLD (0x2a + 1)
-static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
- struct xe_vm *vm, u32 ring_size)
+static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
+ struct xe_hw_engine *hwe, struct xe_vm *vm,
+ u32 ring_size)
{
struct xe_gt *gt = hwe->gt;
struct xe_tile *tile = gt_to_tile(gt);
@@ -911,6 +912,11 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
void *init_data = NULL;
u32 arb_enable;
u32 lrc_size;
+ bool user_queue = q && q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION;
+ enum ttm_bo_type submit_type = user_queue ? ttm_bo_type_device :
+ ttm_bo_type_kernel;
+ unsigned int submit_flags = user_queue ?
+ XE_BO_FLAG_USER : 0;
int err;
kref_init(&lrc->refcount);
@@ -930,7 +936,8 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
return PTR_ERR(lrc->bo);
lrc->submission_ring = xe_bo_create_pin_map(xe, tile, vm, SZ_32K,
- ttm_bo_type_kernel,
+ submit_type,
+ submit_flags |
XE_BO_FLAG_VRAM_IF_DGFX(tile) |
XE_BO_FLAG_GGTT |
XE_BO_FLAG_GGTT_INVALIDATE);
@@ -944,7 +951,8 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE;
lrc->indirect_state = xe_bo_create_pin_map(xe, tile, vm, SZ_8K,
- ttm_bo_type_kernel,
+ submit_type,
+ submit_flags |
XE_BO_FLAG_VRAM_IF_DGFX(tile) |
XE_BO_FLAG_GGTT |
XE_BO_FLAG_GGTT_INVALIDATE);
@@ -955,6 +963,12 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
}
}
+ /* Wait for clear */
+ if (user_queue)
+ dma_resv_wait_timeout(xe_vm_resv(vm),
+ DMA_RESV_USAGE_KERNEL,
+ false, MAX_SCHEDULE_TIMEOUT);
+
lrc->size = lrc_size;
lrc->tile = gt_to_tile(hwe->gt);
lrc->ring.size = ring_size;
@@ -1060,6 +1074,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
/**
* xe_lrc_create - Create a LRC
+ * @q: Execution queue
* @hwe: Hardware Engine
* @vm: The VM (address space)
* @ring_size: LRC ring size
@@ -1069,8 +1084,8 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
* Return pointer to created LRC upon success and an error pointer
* upon failure.
*/
-struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
- u32 ring_size)
+struct xe_lrc *xe_lrc_create(struct xe_exec_queue *q, struct xe_hw_engine *hwe,
+ struct xe_vm *vm, u32 ring_size)
{
struct xe_lrc *lrc;
int err;
@@ -1079,7 +1094,7 @@ struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
if (!lrc)
return ERR_PTR(-ENOMEM);
- err = xe_lrc_init(lrc, hwe, vm, ring_size);
+ err = xe_lrc_init(lrc, q, hwe, vm, ring_size);
if (err) {
kfree(lrc);
return ERR_PTR(err);
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index b459dcab8787..23d71283c79d 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -41,8 +41,8 @@ struct xe_lrc_snapshot {
#define LRC_PPHWSP_SCRATCH_ADDR (0x34 * 4)
-struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
- u32 ring_size);
+struct xe_lrc *xe_lrc_create(struct xe_exec_queue *q, struct xe_hw_engine *hwe,
+ struct xe_vm *vm, u32 ring_size);
void xe_lrc_destroy(struct kref *ref);
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 16/29] drm/xe/uapi: Define UMD exec queue mapping uAPI
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (14 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 15/29] drm/xe: Add support for mmapping submission ring and indirect ring state " Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 17/29] drm/xe: Add usermap exec queue extension Matthew Brost
` (13 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Define the UMD exec queue mapping uAPI. The submit ring, indirect LRC
state (ring head, tail, etc.), and doorbell are securely mapped to user
space. The ring is a VM PPGTT address, while the indirect LRC state and
doorbell mappings are provided via fake offsets, like BOs.
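Roughly, usage from a UMD could look as follows (an illustrative sketch
only, not final uAPI; the ring VA is assumed to already be bound in the
VM and error handling is omitted):
    struct drm_xe_exec_queue_ext_usermap ext = {
        .base.name = DRM_XE_EXEC_QUEUE_EXTENSION_USERMAP,
        .version = DRM_XE_EXEC_QUEUE_USERMAP_VERSION_XE2_REV0,
        .ring_size = 0x4000,  /* 16K, must be 4k aligned */
        .ring_addr = ring_va, /* assumption: ring BO already bound here */
    };
    struct drm_xe_exec_queue_create create = {
        .extensions = (uintptr_t)&ext,
        /* vm_id, width, num_placements, instances filled in as usual */
    };

    ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);

    /* the KMD writes the fake offsets back into the extension */
    state = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                 ext.indirect_ring_state_offset);
    db = mmap(NULL, 0x1000, PROT_WRITE, MAP_SHARED, fd,
              ext.doorbell_offset);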
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
include/uapi/drm/xe_drm.h | 56 +++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 6490b16b1217..9356a714a2e0 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1111,6 +1111,61 @@ struct drm_xe_vm_bind {
__u64 reserved[2];
};
+/**
+ * struct drm_xe_exec_queue_ext_usermap
+ */
+struct drm_xe_exec_queue_ext_usermap {
+ /** @base: base user extension */
+ struct drm_xe_user_extension base;
+
+ /** @flags: MBZ */
+ __u32 flags;
+
+ /** @version: Version of usermap */
+#define DRM_XE_EXEC_QUEUE_USERMAP_VERSION_XE2_REV0 0
+ __u32 version;
+
+ /**
+ * @ring_size: The ring size. 4k-2M valid, must be 4k aligned. User
+ * space has to pad allocation / mapping to avoid prefetch faults.
+ * Prefetch size is platform dependent.
+ */
+ __u32 ring_size;
+
+ /** @pad: MBZ */
+ __u32 pad;
+
+ /**
+ * @ring_addr: Ring address mapped within the VM, should be mapped as
+ * UC.
+ */
+ __u64 ring_addr;
+
+ /**
+ * @indirect_ring_state_offset: The fake indirect ring state offset to
+ * use for subsequent mmap call. Always 4k in size.
+ */
+ __u64 indirect_ring_state_offset;
+
+ /**
+ * @doorbell_offset: The fake doorbell offset to use for subsequent mmap
+ * call. Always 4k in size.
+ */
+ __u64 doorbell_offset;
+
+ /** @doorbell_page_offset: The doorbell offset within the mmapped page */
+ __u32 doorbell_page_offset;
+
+ /**
+ * @indirect_ring_state_handle: Indirect ring state buffer object
+ * handle. Allocated by KMD and must be closed by user.
+ */
+ __u32 indirect_ring_state_handle;
+
+ /** @reserved: Reserved */
+ __u64 reserved[2];
+};
+
/**
* struct drm_xe_exec_queue_create - Input of &DRM_IOCTL_XE_EXEC_QUEUE_CREATE
*
@@ -1138,6 +1193,7 @@ struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
+#define DRM_XE_EXEC_QUEUE_EXTENSION_USERMAP 1
/** @extensions: Pointer to the first extension struct, if any */
__u64 extensions;
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 17/29] drm/xe: Add usermap exec queue extension
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (15 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 16/29] drm/xe/uapi: Define UMD exec queue mapping uAPI Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 18/29] drm/xe: Drop EXEC_QUEUE_FLAG_UMD_SUBMISSION flag Matthew Brost
` (12 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Implement the uAPI which maps submit rings, indirect LRC state, and
doorbells to user space. This is required for UMD direct submission.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 125 ++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 13 +++
drivers/gpu/drm/xe/xe_execlist.c | 2 +-
drivers/gpu/drm/xe/xe_lrc.c | 59 +++++++----
drivers/gpu/drm/xe/xe_lrc.h | 2 +-
5 files changed, 176 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index aef5b130e7f8..c8d45133eb59 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -11,6 +11,7 @@
#include <drm/drm_file.h>
#include <uapi/drm/xe_drm.h>
+#include "xe_bo.h"
#include "xe_device.h"
#include "xe_gt.h"
#include "xe_hw_engine_class_sysfs.h"
@@ -38,12 +39,18 @@ static int exec_queue_user_extensions_post_init(struct xe_device *xe, struct xe_
static void __xe_exec_queue_free(struct xe_exec_queue *q)
{
+ struct xe_device *xe = q->vm ? q->vm->xe : NULL;
+
if (q->vm)
xe_vm_put(q->vm);
if (q->xef)
xe_file_put(q->xef);
+ if (q->usermap)
+ xe_pm_runtime_put(xe);
+
+ kfree(q->usermap);
kfree(q);
}
@@ -110,6 +117,8 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
static int __xe_exec_queue_init(struct xe_exec_queue *q)
{
struct xe_vm *vm = q->vm;
+ u64 ring_addr = q->usermap ? q->usermap->ring_addr : 0;
+ u32 ring_size = q->usermap ? q->usermap->ring_size : SZ_16K;
int i, err;
if (vm) {
@@ -119,7 +128,8 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q)
}
for (i = 0; i < q->width; ++i) {
- q->lrc[i] = xe_lrc_create(q, q->hwe, q->vm, SZ_16K);
+ q->lrc[i] = xe_lrc_create(q, q->hwe, q->vm, ring_size,
+ ring_addr);
if (IS_ERR(q->lrc[i])) {
err = PTR_ERR(q->lrc[i]);
goto err_unlock;
@@ -444,12 +454,125 @@ typedef int (*xe_exec_queue_user_extension_fn)(struct xe_device *xe,
struct xe_exec_queue *q,
u64 extension);
+static int exec_queue_user_ext_usermap(struct xe_device *xe,
+ struct xe_exec_queue *q,
+ u64 extension)
+{
+ u64 __user *address = u64_to_user_ptr(extension);
+ struct drm_xe_exec_queue_ext_usermap ext;
+ int err;
+
+ /* Just parse args and make sure they are sane */
+
+ if (XE_IOCTL_DBG(xe, !xe_gt_has_indirect_ring_state(q->gt)))
+ return -EOPNOTSUPP;
+
+ if (XE_IOCTL_DBG(xe, q->width != 1))
+ return -EOPNOTSUPP;
+
+ if (XE_IOCTL_DBG(xe, q->flags & (EXEC_QUEUE_FLAG_KERNEL |
+ EXEC_QUEUE_FLAG_PERMANENT |
+ EXEC_QUEUE_FLAG_VM |
+ EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD)))
+ return -EOPNOTSUPP;
+
+ if (XE_IOCTL_DBG(xe, q->width != 1))
+ return -EOPNOTSUPP;
+
+ /*
+ * XXX: More or less free to support this but targeting Mesa for now as
+ * LR mode has ULLS.
+ */
+ if (XE_IOCTL_DBG(xe, xe_vm_in_lr_mode(q->vm)))
+ return -EOPNOTSUPP;
+
+ if (XE_IOCTL_DBG(xe, q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION))
+ return -EINVAL;
+
+ err = __copy_from_user(&ext, address, sizeof(ext));
+ if (XE_IOCTL_DBG(xe, err))
+ return -EFAULT;
+
+ if (XE_IOCTL_DBG(xe, ext.reserved[0] || ext.reserved[1]))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, ext.pad))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, ext.flags))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, ext.ring_size < SZ_4K ||
+ ext.ring_size > SZ_2M ||
+ ext.ring_size & ~PAGE_MASK))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, ext.version !=
+ DRM_XE_EXEC_QUEUE_USERMAP_VERSION_XE2_REV0))
+ return -EINVAL;
+
+ q->usermap = kzalloc(sizeof(struct xe_exec_queue_usermap), GFP_KERNEL);
+ if (!q->usermap)
+ return -ENOMEM;
+
+ q->usermap->ring_size = ext.ring_size;
+ q->usermap->ring_addr = ext.ring_addr;
+
+ xe_pm_runtime_get_noresume(xe);
+ q->flags |= EXEC_QUEUE_FLAG_UMD_SUBMISSION;
+
+ return 0;
+}
+
+static int exec_queue_user_ext_post_init_usermap(struct xe_device *xe,
+ struct xe_exec_queue *q,
+ u64 extension)
+{
+ struct drm_xe_exec_queue_ext_usermap ext;
+ struct xe_lrc *lrc = q->lrc[0];
+ u64 __user *address = u64_to_user_ptr(extension);
+ u32 indirect_ring_state_handle;
+ int err;
+
+ err = __copy_from_user(&ext, address, sizeof(ext));
+ if (XE_IOCTL_DBG(xe, err))
+ return -EFAULT;
+
+ err = drm_gem_handle_create(q->xef->drm,
+ &lrc->indirect_state->ttm.base,
+ &indirect_ring_state_handle);
+ if (err)
+ return err;
+
+ ext.indirect_ring_state_offset =
+ drm_vma_node_offset_addr(&lrc->indirect_state->ttm.base.vma_node);
+ ext.indirect_ring_state_handle = indirect_ring_state_handle;
+ ext.doorbell_offset = XE_MMIO_DOORBELL_MMAP_OFFSET +
+ SZ_4K * q->guc->db.id;
+ ext.doorbell_page_offset = 0;
+
+ err = copy_to_user(address, &ext, sizeof(ext));
+ if (XE_IOCTL_DBG(xe, err)) {
+ err = -EFAULT;
+ goto close_handles;
+ }
+
+ return 0;
+
+close_handles:
+ drm_gem_handle_delete(q->xef->drm, indirect_ring_state_handle);
+
+ return err;
+}
+
static const xe_exec_queue_user_extension_fn exec_queue_user_extension_funcs[] = {
[DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = exec_queue_user_ext_set_property,
+ [DRM_XE_EXEC_QUEUE_EXTENSION_USERMAP] = exec_queue_user_ext_usermap,
};
static const xe_exec_queue_user_extension_fn exec_queue_user_extension_post_init_funcs[] = {
[DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = NULL,
+ [DRM_XE_EXEC_QUEUE_EXTENSION_USERMAP] = exec_queue_user_ext_post_init_usermap,
};
#define MAX_USER_EXTENSIONS 16
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 7f68587d4021..b30b5ee910fa 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -31,6 +31,16 @@ enum xe_exec_queue_priority {
XE_EXEC_QUEUE_PRIORITY_COUNT
};
+/**
+ * struct xe_exec_queue_usermap - Execution queue usermap (UMD submission)
+ */
+struct xe_exec_queue_usermap {
+ /** @ring_addr: ring address (PPGTT) */
+ u64 ring_addr;
+ /** @ring_size: ring size */
+ u32 ring_size;
+};
+
/**
* struct xe_exec_queue - Execution queue
*
@@ -130,6 +140,9 @@ struct xe_exec_queue {
struct list_head link;
} lr;
+ /** @usermap: user map interface */
+ struct xe_exec_queue_usermap *usermap;
+
/** @ops: submission backend exec queue operations */
const struct xe_exec_queue_ops *ops;
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index 93f76280d453..803c84b2e4ed 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -265,7 +265,7 @@ struct xe_execlist_port *xe_execlist_port_create(struct xe_device *xe,
port->hwe = hwe;
- port->lrc = xe_lrc_create(NULL, hwe, NULL, SZ_16K);
+ port->lrc = xe_lrc_create(NULL, hwe, NULL, SZ_16K, 0);
if (IS_ERR(port->lrc)) {
err = PTR_ERR(port->lrc);
goto err;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 8a79470b52ae..8d5a65724c04 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -903,7 +903,7 @@ static void xe_lrc_finish(struct xe_lrc *lrc)
static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
struct xe_hw_engine *hwe, struct xe_vm *vm,
- u32 ring_size)
+ u32 ring_size, u64 ring_addr)
{
struct xe_gt *gt = hwe->gt;
struct xe_tile *tile = gt_to_tile(gt);
@@ -919,6 +919,8 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
XE_BO_FLAG_USER : 0;
int err;
+ xe_assert(xe, (!user_queue && !ring_addr) || (user_queue && ring_addr));
+
kref_init(&lrc->refcount);
lrc->flags = 0;
lrc_size = xe_gt_lrc_size(gt, hwe->class);
@@ -935,16 +937,18 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
if (IS_ERR(lrc->bo))
return PTR_ERR(lrc->bo);
- lrc->submission_ring = xe_bo_create_pin_map(xe, tile, vm, SZ_32K,
- submit_type,
- submit_flags |
- XE_BO_FLAG_VRAM_IF_DGFX(tile) |
- XE_BO_FLAG_GGTT |
- XE_BO_FLAG_GGTT_INVALIDATE);
- if (IS_ERR(lrc->submission_ring)) {
- err = PTR_ERR(lrc->submission_ring);
- lrc->submission_ring = NULL;
- goto err_lrc_finish;
+ if (!user_queue) {
+ lrc->submission_ring = xe_bo_create_pin_map(xe, tile, vm, SZ_32K,
+ submit_type,
+ submit_flags |
+ XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+ XE_BO_FLAG_GGTT |
+ XE_BO_FLAG_GGTT_INVALIDATE);
+ if (IS_ERR(lrc->submission_ring)) {
+ err = PTR_ERR(lrc->submission_ring);
+ lrc->submission_ring = NULL;
+ goto err_lrc_finish;
+ }
}
if (xe_gt_has_indirect_ring_state(gt)) {
@@ -1018,12 +1022,19 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
}
if (xe_gt_has_indirect_ring_state(gt)) {
- xe_lrc_write_ctx_reg(lrc, CTX_INDIRECT_RING_STATE,
- __xe_lrc_indirect_ring_ggtt_addr(lrc));
-
- xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START,
- __xe_lrc_ring_ggtt_addr(lrc));
- xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START_UDW, 0);
+ if (ring_addr) { /* PPGTT */
+ xe_lrc_write_ctx_reg(lrc, CTX_INDIRECT_RING_STATE,
+ __xe_lrc_indirect_ring_ggtt_addr(lrc) | BIT(0));
+ xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START,
+ ring_addr);
+ } else {
+ xe_lrc_write_ctx_reg(lrc, CTX_INDIRECT_RING_STATE,
+ __xe_lrc_indirect_ring_ggtt_addr(lrc));
+ xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START,
+ __xe_lrc_ring_ggtt_addr(lrc));
+ }
+ xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START_UDW,
+ ring_addr >> 32);
xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_HEAD, 0);
xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_TAIL, lrc->ring.tail);
xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_CTL,
@@ -1056,8 +1067,10 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
lrc->desc |= FIELD_PREP(LRC_ENGINE_CLASS, hwe->class);
}
- arb_enable = MI_ARB_ON_OFF | MI_ARB_ENABLE;
- xe_lrc_write_ring(lrc, &arb_enable, sizeof(arb_enable));
+ if (lrc->submission_ring) {
+ arb_enable = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+ xe_lrc_write_ring(lrc, &arb_enable, sizeof(arb_enable));
+ }
map = __xe_lrc_seqno_map(lrc);
xe_map_write32(lrc_to_xe(lrc), &map, lrc->fence_ctx.next_seqno - 1);
@@ -1078,6 +1091,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
* @hwe: Hardware Engine
* @vm: The VM (address space)
* @ring_size: LRC ring size
+ * @ring_addr: LRC ring address, only valid for usermap queues
*
* Allocate and initialize the Logical Ring Context (LRC).
*
@@ -1085,7 +1099,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
* upon failure.
*/
struct xe_lrc *xe_lrc_create(struct xe_exec_queue *q, struct xe_hw_engine *hwe,
- struct xe_vm *vm, u32 ring_size)
+ struct xe_vm *vm, u32 ring_size, u64 ring_addr)
{
struct xe_lrc *lrc;
int err;
@@ -1094,7 +1108,7 @@ struct xe_lrc *xe_lrc_create(struct xe_exec_queue *q, struct xe_hw_engine *hwe,
if (!lrc)
return ERR_PTR(-ENOMEM);
- err = xe_lrc_init(lrc, q, hwe, vm, ring_size);
+ err = xe_lrc_init(lrc, q, hwe, vm, ring_size, ring_addr);
if (err) {
kfree(lrc);
return ERR_PTR(err);
@@ -1717,7 +1731,8 @@ struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc)
xe_vm_get(lrc->bo->vm);
snapshot->context_desc = xe_lrc_ggtt_addr(lrc);
- snapshot->ring_addr = __xe_lrc_ring_ggtt_addr(lrc);
+ snapshot->ring_addr = lrc->submission_ring ?
+ __xe_lrc_ring_ggtt_addr(lrc) : 0;
snapshot->indirect_context_desc = xe_lrc_indirect_ring_ggtt_addr(lrc);
snapshot->head = xe_lrc_ring_head(lrc);
snapshot->tail.internal = lrc->ring.tail;
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index 23d71283c79d..a7facfa8bf51 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -42,7 +42,7 @@ struct xe_lrc_snapshot {
#define LRC_PPHWSP_SCRATCH_ADDR (0x34 * 4)
struct xe_lrc *xe_lrc_create(struct xe_exec_queue *q, struct xe_hw_engine *hwe,
- struct xe_vm *vm, u32 ring_size);
+ struct xe_vm *vm, u32 ring_size, u64 ring_addr);
void xe_lrc_destroy(struct kref *ref);
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 18/29] drm/xe: Drop EXEC_QUEUE_FLAG_UMD_SUBMISSION flag
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (16 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 17/29] drm/xe: Add usermap exec queue extension Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 19/29] drm/xe: Do not allow usermap exec queues in exec IOCTL Matthew Brost
` (11 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Use the xe_exec_queue_is_usermap() helper instead.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 3 +--
drivers/gpu/drm/xe/xe_exec_queue.h | 5 +++++
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 --
drivers/gpu/drm/xe/xe_guc_submit.c | 4 ++--
drivers/gpu/drm/xe/xe_lrc.c | 4 ++--
5 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index c8d45133eb59..a22f089ccec6 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -486,7 +486,7 @@ static int exec_queue_user_ext_usermap(struct xe_device *xe,
if (XE_IOCTL_DBG(xe, xe_vm_in_lr_mode(q->vm)))
return -EOPNOTSUPP;
- if (XE_IOCTL_DBG(xe, q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION))
+ if (XE_IOCTL_DBG(xe, xe_exec_queue_is_usermap(q)))
return -EINVAL;
err = __copy_from_user(&ext, address, sizeof(ext));
@@ -519,7 +519,6 @@ static int exec_queue_user_ext_usermap(struct xe_device *xe,
q->usermap->ring_addr = ext.ring_addr;
xe_pm_runtime_get_noresume(xe);
- q->flags |= EXEC_QUEUE_FLAG_UMD_SUBMISSION;
return 0;
}
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index 90c7f73eab88..a4a1dbf5b977 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -57,6 +57,11 @@ static inline bool xe_exec_queue_is_parallel(struct xe_exec_queue *q)
return q->width > 1;
}
+static inline bool xe_exec_queue_is_usermap(struct xe_exec_queue *q)
+{
+ return !!q->usermap;
+}
+
bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
bool xe_exec_queue_ring_full(struct xe_exec_queue *q);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index b30b5ee910fa..26ce85b8d163 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -93,8 +93,6 @@ struct xe_exec_queue {
#define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD BIT(3)
/* kernel exec_queue only, set priority to highest level */
#define EXEC_QUEUE_FLAG_HIGH_PRIORITY BIT(4)
-/* queue used for UMD submission */
-#define EXEC_QUEUE_FLAG_UMD_SUBMISSION BIT(5)
/**
* @flags: flags for this exec queue, should statically setup aside from ban
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index c226c7b3245d..59d2e08797f5 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1522,7 +1522,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
xe_sched_stop(sched);
q->guc->db.id = -1;
- if (q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION) {
+ if (xe_exec_queue_is_usermap(q)) {
db_id = xe_guc_db_mgr_reserve_id_locked(&guc->dbm);
if (db_id < 0) {
err = db_id;
@@ -1532,7 +1532,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
mutex_unlock(&guc->submission_state.lock);
- if (q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION) {
+ if (xe_exec_queue_is_usermap(q)) {
q->guc->db.id = db_id;
err = create_doorbell(guc, q);
if (err)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 8d5a65724c04..e8675624966d 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -18,7 +18,7 @@
#include "xe_bo.h"
#include "xe_device.h"
#include "xe_drm_client.h"
-#include "xe_exec_queue_types.h"
+#include "xe_exec_queue.h"
#include "xe_gt.h"
#include "xe_gt_printk.h"
#include "xe_hw_fence.h"
@@ -912,7 +912,7 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_exec_queue *q,
void *init_data = NULL;
u32 arb_enable;
u32 lrc_size;
- bool user_queue = q && q->flags & EXEC_QUEUE_FLAG_UMD_SUBMISSION;
+ bool user_queue = q && xe_exec_queue_is_usermap(q);
enum ttm_bo_type submit_type = user_queue ? ttm_bo_type_device :
ttm_bo_type_kernel;
unsigned int submit_flags = user_queue ?
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 19/29] drm/xe: Do not allow usermap exec queues in exec IOCTL
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (17 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 18/29] drm/xe: Drop EXEC_QUEUE_FLAG_UMD_SUBMISSION flag Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 20/29] drm/xe: Teach GuC backend to kill usermap queues Matthew Brost
` (10 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Not supported at the moment; something may be needed for the case where
no doorbells are available.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 31cca938956f..898e4718d639 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -132,7 +132,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
if (XE_IOCTL_DBG(xe, !q))
return -ENOENT;
- if (XE_IOCTL_DBG(xe, q->flags & EXEC_QUEUE_FLAG_VM)) {
+ if (XE_IOCTL_DBG(xe, q->flags & EXEC_QUEUE_FLAG_VM ||
+ xe_exec_queue_is_usermap(q))) {
err = -EINVAL;
goto err_exec_queue;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 20/29] drm/xe: Teach GuC backend to kill usermap queues
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (18 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 19/29] drm/xe: Do not allow usermap exec queues in exec IOCTL Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 21/29] drm/xe: Enable preempt fences on " Matthew Brost
` (9 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Usermap exec queue teardown (kill) differs from that of other exec
queues as no job is available, a doorbell is mapped, and the kill
should be immediate.
A follow-up could unify LR queue cleanup with usermap, but keep this as
a separate flow for now.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 56 +++++++++++++++++++-
2 files changed, 55 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
index 2d53af75ed75..c6c58e414b19 100644
--- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
@@ -29,7 +29,7 @@ struct xe_guc_exec_queue {
* a message needs to sent through the GPU scheduler but memory
* allocations are not allowed.
*/
-#define MAX_STATIC_MSG_TYPE 3
+#define MAX_STATIC_MSG_TYPE 4
struct xe_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
/** @lr_tdr: long running TDR worker */
struct work_struct lr_tdr;
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 59d2e08797f5..82071a0ec91e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -230,6 +230,11 @@ static void set_exec_queue_doorbell_registered(struct xe_exec_queue *q)
atomic_or(EXEC_QUEUE_STATE_DB_REGISTERED, &q->guc->state);
}
+static void clear_exec_queue_doorbell_registered(struct xe_exec_queue *q)
+{
+ atomic_and(~EXEC_QUEUE_STATE_DB_REGISTERED, &q->guc->state);
+}
+
static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
{
return (atomic_read(&q->guc->state) &
@@ -798,6 +803,8 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
}
+static void guc_exec_queue_kill_user(struct xe_exec_queue *q);
+
static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
{
struct xe_guc *guc = exec_queue_to_guc(q);
@@ -806,7 +813,9 @@ static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
/** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
wake_up_all(&xe->ufence_wq);
- if (xe_exec_queue_is_lr(q))
+ if (xe_exec_queue_is_usermap(q))
+ guc_exec_queue_kill_user(q);
+ else if (xe_exec_queue_is_lr(q))
queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
else
xe_sched_tdr_queue_imm(&q->guc->sched);
@@ -1294,8 +1303,10 @@ static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg)
xe_gt_assert(guc_to_gt(guc), !(q->flags & EXEC_QUEUE_FLAG_PERMANENT));
trace_xe_exec_queue_cleanup_entity(q);
- if (exec_queue_doorbell_registered(q))
+ if (exec_queue_doorbell_registered(q)) {
+ clear_exec_queue_doorbell_registered(q);
deallocate_doorbell(guc, q->guc->id);
+ }
if (exec_queue_registered(q))
disable_scheduling_deregister(guc, q);
@@ -1382,10 +1393,29 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
}
}
+static void __guc_exec_queue_process_msg_kill_user(struct xe_sched_msg *msg)
+{
+ struct xe_exec_queue *q = msg->private_data;
+ struct xe_guc *guc = exec_queue_to_guc(q);
+
+ if (!xe_lrc_ring_is_idle(q->lrc[0]))
+ xe_gt_dbg(q->gt, "Killing non-idle usermap queue: guc_id=%d",
+ q->guc->id);
+
+ if (exec_queue_doorbell_registered(q)) {
+ clear_exec_queue_doorbell_registered(q);
+ deallocate_doorbell(guc, q->guc->id);
+ }
+
+ if (exec_queue_registered(q))
+ disable_scheduling_deregister(guc, q);
+}
+
#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
#define SET_SCHED_PROPS 2
#define SUSPEND 3
#define RESUME 4
+#define KILL_USER 5
#define OPCODE_MASK 0xf
#define MSG_LOCKED BIT(8)
@@ -1408,6 +1438,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
case RESUME:
__guc_exec_queue_process_msg_resume(msg);
break;
+ case KILL_USER:
+ __guc_exec_queue_process_msg_kill_user(msg);
+ break;
default:
XE_WARN_ON("Unknown message type");
}
@@ -1600,6 +1633,7 @@ static bool guc_exec_queue_try_add_msg(struct xe_exec_queue *q,
#define STATIC_MSG_CLEANUP 0
#define STATIC_MSG_SUSPEND 1
#define STATIC_MSG_RESUME 2
+#define STATIC_MSG_KILL_USER 3
static void guc_exec_queue_fini(struct xe_exec_queue *q)
{
struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_CLEANUP;
@@ -1725,6 +1759,24 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
xe_sched_msg_unlock(sched);
}
+static void guc_exec_queue_kill_user(struct xe_exec_queue *q)
+{
+ struct xe_gpu_scheduler *sched = &q->guc->sched;
+ struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_KILL_USER;
+
+ if (exec_queue_extra_ref(q))
+ return;
+
+ set_exec_queue_banned(q);
+
+ xe_sched_msg_lock(sched);
+ if (guc_exec_queue_try_add_msg(q, msg, KILL_USER)) {
+ set_exec_queue_extra_ref(q);
+ xe_exec_queue_get(q);
+ }
+ xe_sched_msg_unlock(sched);
+}
+
static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
{
return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 21/29] drm/xe: Enable preempt fences on usermap queues
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (19 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 20/29] drm/xe: Teach GuC backend to kill usermap queues Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 22/29] drm/xe/uapi: Add uAPI to convert user semaphore to / from drm syncobj Matthew Brost
` (8 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Preempt fences are used by usermap queues to implement dynamic memory
management (BO eviction, userptr invalidation); enable preempt fences
on usermap queues.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 3 ++-
drivers/gpu/drm/xe/xe_pt.c | 3 +--
drivers/gpu/drm/xe/xe_vm.c | 18 ++++++++----------
drivers/gpu/drm/xe/xe_vm.h | 2 +-
4 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index a22f089ccec6..987584090263 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -794,7 +794,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
if (IS_ERR(q))
return PTR_ERR(q);
- if (xe_vm_in_preempt_fence_mode(vm)) {
+ if (xe_vm_in_preempt_fence_mode(vm) ||
+ xe_exec_queue_is_usermap(q)) {
q->lr.context = dma_fence_context_alloc(1);
err = xe_vm_add_compute_exec_queue(vm, q);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 684dc075deac..a75667346ab3 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1882,8 +1882,7 @@ static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
* the rebind worker
*/
if (pt_update_ops->wait_vm_bookkeep &&
- xe_vm_in_preempt_fence_mode(vm) &&
- !current->mm)
+ vm->preempt.num_exec_queues && !current->mm)
xe_vm_queue_rebind_worker(vm);
}
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2e67648ed512..16bc1b82d950 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -229,7 +229,8 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
int err;
bool wait;
- xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
+ xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm) ||
+ xe_exec_queue_is_usermap(q));
down_write(&vm->lock);
err = drm_gpuvm_exec_lock(&vm_exec);
@@ -280,7 +281,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
*/
void xe_vm_remove_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
{
- if (!xe_vm_in_preempt_fence_mode(vm))
+ if (!xe_vm_in_preempt_fence_mode(vm) && !xe_exec_queue_is_usermap(q))
return;
down_write(&vm->lock);
@@ -487,7 +488,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
long wait;
int __maybe_unused tries = 0;
- xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
+ xe_assert(vm->xe, !xe_vm_in_fault_mode(vm));
trace_xe_vm_rebind_worker_enter(vm);
down_write(&vm->lock);
@@ -1467,10 +1468,9 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
vm->batch_invalidate_tlb = true;
}
- if (vm->flags & XE_VM_FLAG_LR_MODE) {
- INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
+ INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
+ if (vm->flags & XE_VM_FLAG_LR_MODE)
vm->batch_invalidate_tlb = false;
- }
/* Fill pt_root after allocating scratch tables */
for_each_tile(tile, xe, id) {
@@ -1543,8 +1543,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
xe_assert(xe, !vm->preempt.num_exec_queues);
xe_vm_close(vm);
- if (xe_vm_in_preempt_fence_mode(vm))
- flush_work(&vm->preempt.rebind_work);
+ flush_work(&vm->preempt.rebind_work);
down_write(&vm->lock);
for_each_tile(tile, xe, id) {
@@ -1644,8 +1643,7 @@ static void vm_destroy_work_func(struct work_struct *w)
/* xe_vm_close_and_put was not called? */
xe_assert(xe, !vm->size);
- if (xe_vm_in_preempt_fence_mode(vm))
- flush_work(&vm->preempt.rebind_work);
+ flush_work(&vm->preempt.rebind_work);
mutex_destroy(&vm->snap_mutex);
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index c864dba35e1d..4391dbaeba51 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -216,7 +216,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma);
static inline void xe_vm_queue_rebind_worker(struct xe_vm *vm)
{
- xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
+ xe_assert(vm->xe, !xe_vm_in_fault_mode(vm));
queue_work(vm->xe->ordered_wq, &vm->preempt.rebind_work);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 22/29] drm/xe/uapi: Add uAPI to convert user semaphore to / from drm syncobj
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (20 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 21/29] drm/xe: Enable preempt fences on " Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 23/29] drm/xe: Add user fence IRQ handler Matthew Brost
` (7 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Simple interface to allow user space to share user syncs with kernel
syncs (dma-fences). The idea is also that when user syncs are converted
to kernel syncs, preemption is guarded against until the kernel sync
signals. This is required to adhere to dma-fencing rules (no memory
allocations may be done in the path of a dma-fence, and resume after
preemption requires memory allocations).
FIXME: uAPI likely to change, perhaps in a drm generic way. Currently
enough for a PoC and to enable initial Mesa development.
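Roughly, converting a user semaphore into a syncobj could look as
follows (an illustrative sketch only, given the uAPI may change; the
semaphore BO is assumed to be bound to the VM and the syncobj already
created):
    struct drm_xe_semaphore sem = {
        .handle = sem_bo_handle, /* assumption: BO bound to the VM */
        .offset = 0,             /* QW-aligned semaphore location */
        .seqno = wait_value,
    };
    struct drm_xe_sync sync = {
        .type = DRM_XE_SYNC_TYPE_SYNCOBJ,
        .flags = DRM_XE_SYNC_FLAG_SIGNAL, /* semaphore -> syncobj */
        .handle = syncobj_handle,
    };
    struct drm_xe_vm_convert_fence convert = {
        .vm_id = vm_id,
        .num_syncs = 1,
        .syncs = (uintptr_t)&sync,
        .semaphores = (uintptr_t)&sem,
    };

    ioctl(fd, DRM_IOCTL_XE_VM_CONVERT_FENCE, &convert);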
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
include/uapi/drm/xe_drm.h | 62 +++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 9356a714a2e0..0cd473d2d91b 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -102,6 +102,7 @@ extern "C" {
#define DRM_XE_EXEC 0x09
#define DRM_XE_WAIT_USER_FENCE 0x0a
#define DRM_XE_OBSERVATION 0x0b
+#define DRM_XE_VM_CONVERT_FENCE 0x0c
/* Must be kept compact -- no holes */
@@ -117,6 +118,7 @@ extern "C" {
#define DRM_IOCTL_XE_EXEC DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
#define DRM_IOCTL_XE_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
#define DRM_IOCTL_XE_OBSERVATION DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
+#define DRM_IOCTL_XE_VM_CONVERT_FENCE DRM_IOW(DRM_COMMAND_BASE + DRM_XE_VM_CONVERT_FENCE, struct drm_xe_vm_convert_fence)
/**
* DOC: Xe IOCTL Extensions
@@ -1796,6 +1798,66 @@ struct drm_xe_oa_stream_info {
__u64 reserved[3];
};
+/**
+ * struct drm_xe_semaphore - Semaphore
+ */
+struct drm_xe_semaphore {
+ /**
+ * @handle: Handle for the semaphore. Must be bound to the VM when
+ * passed into drm_xe_vm_convert_fence.
+ */
+ __u32 handle;
+
+ /** @offset: Offset in BO for semaphore, must QW aligned */
+ __u32 offset;
+
+ /** @seqno: Sequence number of semaphore */
+ __u64 seqno;
+
+ /** @token: Semaphore token - MBZ as not supported yet */
+ __u64 token;
+
+ /** @reserved: reserved for future use */
+ __u64 reserved[2];
+};
+
+/**
+ * struct drm_xe_vm_convert_fence - Convert semaphore to / from syncobj
+ *
+ * DRM_XE_SYNC_FLAG_SIGNAL set indicates semaphore -> syncobj
+ * DRM_XE_SYNC_FLAG_SIGNAL clear indicates syncobj -> semaphore
+ */
+struct drm_xe_vm_convert_fence {
+ /**
+ * @extensions: Pointer to the first extension struct, if any
+ */
+ __u64 extensions;
+
+ /** @vm_id: VM ID */
+ __u32 vm_id;
+
+ /** @flags: Flags - MBZ */
+ __u32 flags;
+
+ /** @pad: MBZ */
+ __u32 pad;
+
+ /**
+ * @num_syncs: Number of struct drm_xe_sync and struct drm_xe_semaphore
+ * in arrays.
+ */
+ __u32 num_syncs;
+
+ /** @syncs: Pointer to struct drm_xe_sync array. */
+ __u64 syncs;
+
+ /** @semaphores: Pointer to struct drm_xe_semaphore array. */
+ __u64 semaphores;
+
+ /** @reserved: reserved for future use */
+ __u64 reserved[2];
+};
+
#if defined(__cplusplus)
}
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 23/29] drm/xe: Add user fence IRQ handler
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (21 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 22/29] drm/xe/uapi: Add uAPI to convert user semaphore to / from drm syncobj Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 24/29] drm/xe: Add xe_hw_fence_user_init Matthew Brost
` (6 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Imported user fences will not be tied to a specific queue or hardware
engine class. Therefore, a device IRQ handler is needed to signal the
associated exported DMA fences.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 4 ++++
drivers/gpu/drm/xe/xe_device_types.h | 3 +++
drivers/gpu/drm/xe/xe_hw_engine.c | 4 +++-
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index bbdff4308b2e..573b5f3df0c8 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -39,6 +39,7 @@
#include "xe_gt_sriov_vf.h"
#include "xe_guc.h"
#include "xe_hw_engine_group.h"
+#include "xe_hw_fence.h"
#include "xe_hwmon.h"
#include "xe_irq.h"
#include "xe_memirq.h"
@@ -902,6 +903,7 @@ int xe_device_probe(struct xe_device *xe)
if (err)
goto err;
+ xe_hw_fence_irq_init(&xe->user_fence_irq);
for_each_gt(gt, xe, id) {
last_gt = id;
@@ -944,6 +946,7 @@ int xe_device_probe(struct xe_device *xe)
xe_oa_fini(xe);
err_fini_gt:
+ xe_hw_fence_irq_finish(&xe->user_fence_irq);
for_each_gt(gt, xe, id) {
if (id < last_gt)
xe_gt_remove(gt);
@@ -979,6 +982,7 @@ void xe_device_remove(struct xe_device *xe)
xe_heci_gsc_fini(xe);
+ xe_hw_fence_irq_finish(&xe->user_fence_irq);
for_each_gt(gt, xe, id)
xe_gt_remove(gt);
}
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 8592f1b02db1..3ac118c6f85e 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -507,6 +507,9 @@ struct xe_device {
int mode;
} wedged;
+ /** @user_fence_irq: User fence IRQ handler */
+ struct xe_hw_fence_irq user_fence_irq;
+
#ifdef TEST_VM_OPS_ERROR
/**
* @vm_inject_error_position: inject errors at different places in VM
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index c4b0dc3be39c..2c9aa5343971 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -822,8 +822,10 @@ void xe_hw_engine_handle_irq(struct xe_hw_engine *hwe, u16 intr_vec)
if (hwe->irq_handler)
hwe->irq_handler(hwe, intr_vec);
- if (intr_vec & GT_RENDER_USER_INTERRUPT)
+ if (intr_vec & GT_RENDER_USER_INTERRUPT) {
+ xe_hw_fence_irq_run(>_to_xe(hwe->gt)->user_fence_irq);
xe_hw_fence_irq_run(hwe->fence_irq);
+ }
}
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* [RFC PATCH 24/29] drm/xe: Add xe_hw_fence_user_init
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (22 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 23/29] drm/xe: Add user fence IRQ handler Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 25/29] drm/xe: Add a message lock to the Xe GPU scheduler Matthew Brost
` (5 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Add xe_hw_fence_user_init, which can create a struct xe_hw_fence from
user input rather than from internal LRC state. Used to import user
fences and export them as dma-fences.
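As a usage sketch (mirroring how the xe_sync layer uses this later in the
series; assumes the xe_hw_fence_alloc() helper, with 'bo', 'offset' and
'seqno' being caller-provided and error handling trimmed):

  static struct dma_fence *make_user_fence(struct xe_device *xe,
                                           struct xe_bo *bo,
                                           u64 offset, u64 seqno)
  {
          struct iosys_map map = bo->vmap;
          struct dma_fence *fence;

          /* Point the map at the user's 8-byte seqno slot within the BO. */
          iosys_map_incr(&map, offset);

          fence = xe_hw_fence_alloc();
          if (IS_ERR(fence))
                  return fence;

          /* Signals once 'seqno' is written at bo + offset. */
          xe_hw_fence_user_init(fence, xe, map, seqno);

          return fence;
  }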
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_hw_fence.c | 17 +++++++++++++++++
drivers/gpu/drm/xe/xe_hw_fence.h | 3 +++
2 files changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index 0b4f12be3692..2ea4d8bca6eb 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -263,3 +263,20 @@ void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx,
trace_xe_hw_fence_create(hw_fence);
}
+
+void xe_hw_fence_user_init(struct dma_fence *fence, struct xe_device *xe,
+ struct iosys_map seqno_map, u64 seqno)
+{
+ struct xe_hw_fence *hw_fence =
+ container_of(fence, typeof(*hw_fence), dma);
+
+ hw_fence->xe = xe;
+ snprintf(hw_fence->name, sizeof(hw_fence->name), "user");
+ hw_fence->seqno_map = seqno_map;
+
+ INIT_LIST_HEAD(&hw_fence->irq_link);
+ dma_fence_init(fence, &xe_hw_fence_ops, &xe->user_fence_irq.lock,
+ dma_fence_context_alloc(1), seqno);
+
+ trace_xe_hw_fence_create(hw_fence);
+}
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h
index f13a1c4982c7..76571ef2ef36 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.h
+++ b/drivers/gpu/drm/xe/xe_hw_fence.h
@@ -30,4 +30,7 @@ void xe_hw_fence_free(struct dma_fence *fence);
void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx,
struct iosys_map seqno_map);
+void xe_hw_fence_user_init(struct dma_fence *fence, struct xe_device *xe,
+ struct iosys_map seqno_map, u64 seqno);
+
#endif
--
2.34.1
* [RFC PATCH 25/29] drm/xe: Add a message lock to the Xe GPU scheduler
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (23 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 24/29] drm/xe: Add xe_hw_fence_user_init Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 26/29] drm/xe: Always wait on preempt fences in vma_check_userptr Matthew Brost
` (4 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Stop abusing the job list lock for messages and use a dedicated lock.
This lock will soon need to be taken in IRQ contexts, so take it with
irqsave for simplicity. This can be tweaked in a follow-up as needed.
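For illustration, the intended usage pattern with the new lock (a
hypothetical caller; this is essentially what xe_sched_add_msg does, but
it is now also safe from IRQ context):

  static void queue_msg_from_any_context(struct xe_gpu_scheduler *sched,
                                         struct xe_sched_msg *msg)
  {
          unsigned long flags;

          /* msg_lock is taken with irqsave, so this is also safe from the
           * user-fence IRQ path added later in this series. */
          xe_sched_msg_lock(sched, flags);
          xe_sched_add_msg_locked(sched, msg);
          xe_sched_msg_unlock(sched, flags);
  }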
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 19 ++++++++++++-------
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 12 ++++--------
drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 ++
drivers/gpu/drm/xe/xe_guc_submit.c | 15 +++++++++------
4 files changed, 27 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
index 50361b4638f9..55ccfb587523 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
@@ -14,25 +14,27 @@ static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
static void xe_sched_process_msg_queue_if_ready(struct xe_gpu_scheduler *sched)
{
struct xe_sched_msg *msg;
+ unsigned long flags;
- xe_sched_msg_lock(sched);
+ xe_sched_msg_lock(sched, flags);
msg = list_first_entry_or_null(&sched->msgs, struct xe_sched_msg, link);
if (msg)
xe_sched_process_msg_queue(sched);
- xe_sched_msg_unlock(sched);
+ xe_sched_msg_unlock(sched, flags);
}
static struct xe_sched_msg *
xe_sched_get_msg(struct xe_gpu_scheduler *sched)
{
struct xe_sched_msg *msg;
+ unsigned long flags;
- xe_sched_msg_lock(sched);
+ xe_sched_msg_lock(sched, flags);
msg = list_first_entry_or_null(&sched->msgs,
struct xe_sched_msg, link);
if (msg)
list_del_init(&msg->link);
- xe_sched_msg_unlock(sched);
+ xe_sched_msg_unlock(sched, flags);
return msg;
}
@@ -64,6 +66,7 @@ int xe_sched_init(struct xe_gpu_scheduler *sched,
struct device *dev)
{
sched->ops = xe_ops;
+ spin_lock_init(&sched->msg_lock);
INIT_LIST_HEAD(&sched->msgs);
INIT_WORK(&sched->work_process_msg, xe_sched_process_msg_work);
@@ -98,15 +101,17 @@ void xe_sched_submission_resume_tdr(struct xe_gpu_scheduler *sched)
void xe_sched_add_msg(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg)
{
- xe_sched_msg_lock(sched);
+ unsigned long flags;
+
+ xe_sched_msg_lock(sched, flags);
xe_sched_add_msg_locked(sched, msg);
- xe_sched_msg_unlock(sched);
+ xe_sched_msg_unlock(sched, flags);
}
void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg)
{
- lockdep_assert_held(&sched->base.job_list_lock);
+ lockdep_assert_held(&sched->msg_lock);
list_add_tail(&msg->link, &sched->msgs);
xe_sched_process_msg_queue(sched);
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
index c250ea773491..3238de26dcfe 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
@@ -29,15 +29,11 @@ void xe_sched_add_msg(struct xe_gpu_scheduler *sched,
void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg);
-static inline void xe_sched_msg_lock(struct xe_gpu_scheduler *sched)
-{
- spin_lock(&sched->base.job_list_lock);
-}
+#define xe_sched_msg_lock(sched, flags) \
+ spin_lock_irqsave(&sched->msg_lock, flags)
-static inline void xe_sched_msg_unlock(struct xe_gpu_scheduler *sched)
-{
- spin_unlock(&sched->base.job_list_lock);
-}
+#define xe_sched_msg_unlock(sched, flags) \
+ spin_unlock_irqrestore(&sched->msg_lock, flags)
static inline void xe_sched_stop(struct xe_gpu_scheduler *sched)
{
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
index 6731b13da8bb..c8e0352ef941 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
@@ -47,6 +47,8 @@ struct xe_gpu_scheduler {
const struct xe_sched_backend_ops *ops;
/** @msgs: list of messages to be processed in @work_process_msg */
struct list_head msgs;
+ /** @msg_lock: Lock for messages */
+ spinlock_t msg_lock;
/** @work_process_msg: processes messages */
struct work_struct work_process_msg;
};
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 82071a0ec91e..3efd2000c0a2 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1704,14 +1704,15 @@ static int guc_exec_queue_suspend(struct xe_exec_queue *q)
{
struct xe_gpu_scheduler *sched = &q->guc->sched;
struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_SUSPEND;
+ unsigned long flags;
if (exec_queue_killed_or_banned_or_wedged(q))
return -EINVAL;
- xe_sched_msg_lock(sched);
+ xe_sched_msg_lock(sched, flags);
if (guc_exec_queue_try_add_msg(q, msg, SUSPEND))
q->guc->suspend_pending = true;
- xe_sched_msg_unlock(sched);
+ xe_sched_msg_unlock(sched, flags);
return 0;
}
@@ -1751,30 +1752,32 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
struct xe_gpu_scheduler *sched = &q->guc->sched;
struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_RESUME;
struct xe_guc *guc = exec_queue_to_guc(q);
+ unsigned long flags;
xe_gt_assert(guc_to_gt(guc), !q->guc->suspend_pending);
- xe_sched_msg_lock(sched);
+ xe_sched_msg_lock(sched, flags);
guc_exec_queue_try_add_msg(q, msg, RESUME);
- xe_sched_msg_unlock(sched);
+ xe_sched_msg_unlock(sched, flags);
}
static void guc_exec_queue_kill_user(struct xe_exec_queue *q)
{
struct xe_gpu_scheduler *sched = &q->guc->sched;
struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_KILL_USER;
+ unsigned long flags;
if (exec_queue_extra_ref(q))
return;
set_exec_queue_banned(q);
- xe_sched_msg_lock(sched);
+ xe_sched_msg_lock(sched, flags);
if (guc_exec_queue_try_add_msg(q, msg, KILL_USER)) {
set_exec_queue_extra_ref(q);
xe_exec_queue_get(q);
}
- xe_sched_msg_unlock(sched);
+ xe_sched_msg_unlock(sched, flags);
}
static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
--
2.34.1
* [RFC PATCH 26/29] drm/xe: Always wait on preempt fences in vma_check_userptr
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (24 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 25/29] drm/xe: Add a message lock to the Xe GPU scheduler Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 27/29] drm/xe: Teach xe_sync layer about drm_xe_semaphore Matthew Brost
` (3 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
The assumption that only a VM in preempt fence mode has preempt fences
attached is not true; preempt fences can also be attached to a dma-resv
VM if user queues are open.
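The wait itself is unchanged by this patch; sketched below only to
illustrate the kind of dma-resv wait the check guards (not the exact
driver code):

  struct dma_resv_iter cursor;
  struct dma_fence *fence;
  long timeout;

  dma_resv_for_each_fence(&cursor, xe_vm_resv(vm),
                          DMA_RESV_USAGE_BOOKKEEP, fence) {
          timeout = dma_fence_wait_timeout(fence, false,
                                           MAX_SCHEDULE_TIMEOUT);
          if (timeout <= 0)
                  break;
  }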
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index a75667346ab3..1efe17b0b1f8 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1231,7 +1231,7 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
&vm->userptr.invalidated);
spin_unlock(&vm->userptr.invalidated_lock);
- if (xe_vm_in_preempt_fence_mode(vm)) {
+ if (vm->preempt.num_exec_queues) {
struct dma_resv_iter cursor;
struct dma_fence *fence;
long err;
--
2.34.1
* [RFC PATCH 27/29] drm/xe: Teach xe_sync layer about drm_xe_semaphore
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (25 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 26/29] drm/xe: Always wait on preempt fences in vma_check_userptr Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 28/29] drm/xe: Add VM convert fence IOCTL Matthew Brost
` (2 subsequent siblings)
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Teach the xe_sync layer about drm_xe_semaphore, which is used to
import / export user fences.
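From userspace, a semaphore entry is just a GEM handle plus an 8-byte
aligned seqno slot within that BO (sketch; field names match the parsing
code below):

  struct drm_xe_semaphore sem = {
          .handle = bo_handle,    /* GEM handle of the BO holding the seqno */
          .offset = seqno_offset, /* 8-byte aligned offset within that BO */
          .seqno  = value,        /* value to wait for / to write on signal */
          /* .token and .reserved[] must be zero */
  };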
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_sync.c | 90 ++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_sync.h | 8 +++
drivers/gpu/drm/xe/xe_sync_types.h | 5 +-
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
index 42f5bebd09e5..ac4510ad52a9 100644
--- a/drivers/gpu/drm/xe/xe_sync.c
+++ b/drivers/gpu/drm/xe/xe_sync.c
@@ -6,6 +6,7 @@
#include "xe_sync.h"
#include <linux/dma-fence-array.h>
+#include <linux/dma-fence-user-fence.h>
#include <linux/kthread.h>
#include <linux/sched/mm.h>
#include <linux/uaccess.h>
@@ -14,11 +15,15 @@
#include <drm/drm_syncobj.h>
#include <uapi/drm/xe_drm.h>
+#include "xe_bo.h"
#include "xe_device_types.h"
#include "xe_exec_queue.h"
+#include "xe_hw_fence.h"
#include "xe_macros.h"
#include "xe_sched_job_types.h"
+#define IS_UNINSTALLED_HW_FENCE BIT(31)
+
struct xe_user_fence {
struct xe_device *xe;
struct kref refcount;
@@ -211,6 +216,74 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
return 0;
}
+int xe_sync_semaphore_parse(struct xe_device *xe, struct xe_file *xef,
+ struct xe_sync_entry *sync,
+ struct drm_xe_semaphore __user *semaphore_user,
+ unsigned int flags)
+{
+ struct drm_xe_semaphore semaphore_in;
+ struct drm_gem_object *gem_obj;
+ struct xe_bo *bo;
+
+ if (copy_from_user(&semaphore_in, semaphore_user,
+ sizeof(*semaphore_user)))
+ return -EFAULT;
+
+ if (XE_IOCTL_DBG(xe, semaphore_in.offset & 0x7 ||
+ !semaphore_in.handle || semaphore_in.token ||
+ semaphore_in.reserved[0] || semaphore_in.reserved[1]))
+ return -EINVAL;
+
+ gem_obj = drm_gem_object_lookup(xef->drm, semaphore_in.handle);
+ if (XE_IOCTL_DBG(xe, !gem_obj))
+ return -ENOENT;
+
+ bo = gem_to_xe_bo(gem_obj);
+
+ if (XE_IOCTL_DBG(xe, bo->size < semaphore_in.offset)) {
+ xe_bo_put(bo);
+ return -EINVAL;
+ }
+
+ if (flags & DRM_XE_SYNC_FLAG_SIGNAL) {
+ struct iosys_map vmap;
+ struct dma_fence *fence;
+
+ sync->chain_fence = dma_fence_chain_alloc();
+ if (!sync->chain_fence) {
+ xe_bo_put(bo);
+ dma_fence_chain_free(sync->chain_fence);
+ return -ENOMEM;
+ }
+
+ fence = xe_hw_fence_alloc();
+ if (IS_ERR(fence)) {
+ xe_bo_put(bo);
+ return PTR_ERR(fence);
+ }
+
+ vmap = bo->vmap;
+ iosys_map_incr(&vmap, semaphore_in.offset);
+
+ xe_hw_fence_user_init(fence, xe, vmap, semaphore_in.seqno);
+ sync->fence = fence;
+ sync->flags = IS_UNINSTALLED_HW_FENCE;
+ } else {
+ sync->user_fence = dma_fence_user_fence_alloc();
+ if (XE_IOCTL_DBG(xe, !sync->user_fence)) {
+ xe_bo_put(bo);
+ return -ENOMEM;
+ }
+
+ sync->addr = semaphore_in.offset;
+ sync->timeline_value = semaphore_in.seqno;
+ sync->flags = DRM_XE_SYNC_FLAG_SIGNAL;
+ }
+ sync->bo = bo;
+
+ return 0;
+}
+
int xe_sync_entry_add_deps(struct xe_sync_entry *sync, struct xe_sched_job *job)
{
if (sync->fence)
@@ -249,17 +322,34 @@ void xe_sync_entry_signal(struct xe_sync_entry *sync, struct dma_fence *fence)
user_fence_put(sync->ufence);
dma_fence_put(fence);
}
+ } else if (sync->user_fence) {
+ struct iosys_map vmap = sync->bo->vmap;
+
+ iosys_map_incr(&vmap, sync->addr);
+ dma_fence_user_fence_attach(fence, sync->user_fence,
+ &vmap, sync->timeline_value);
+ sync->user_fence = NULL;
}
}
+void xe_sync_entry_hw_fence_installed(struct xe_sync_entry *sync)
+{
+ sync->flags &= ~IS_UNINSTALLED_HW_FENCE;
+}
+
void xe_sync_entry_cleanup(struct xe_sync_entry *sync)
{
if (sync->syncobj)
drm_syncobj_put(sync->syncobj);
+ xe_bo_put(sync->bo);
+ if (sync->flags & IS_UNINSTALLED_HW_FENCE)
+ dma_fence_set_error(sync->fence, -ECANCELED);
dma_fence_put(sync->fence);
dma_fence_chain_free(sync->chain_fence);
if (sync->ufence)
user_fence_put(sync->ufence);
+ if (sync->user_fence)
+ dma_fence_user_fence_free(sync->user_fence);
}
/**
diff --git a/drivers/gpu/drm/xe/xe_sync.h b/drivers/gpu/drm/xe/xe_sync.h
index 256ffc1e54dc..fd56929e37cc 100644
--- a/drivers/gpu/drm/xe/xe_sync.h
+++ b/drivers/gpu/drm/xe/xe_sync.h
@@ -8,6 +8,9 @@
#include "xe_sync_types.h"
+struct drm_xe_semaphore;
+struct drm_xe_sync;
+
struct xe_device;
struct xe_exec_queue;
struct xe_file;
@@ -22,10 +25,15 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
struct xe_sync_entry *sync,
struct drm_xe_sync __user *sync_user,
unsigned int flags);
+int xe_sync_semaphore_parse(struct xe_device *xe, struct xe_file *xef,
+ struct xe_sync_entry *sync,
+ struct drm_xe_semaphore __user *semaphore_user,
+ unsigned int flags);
int xe_sync_entry_add_deps(struct xe_sync_entry *sync,
struct xe_sched_job *job);
void xe_sync_entry_signal(struct xe_sync_entry *sync,
struct dma_fence *fence);
+void xe_sync_entry_hw_fence_installed(struct xe_sync_entry *sync);
void xe_sync_entry_cleanup(struct xe_sync_entry *sync);
struct dma_fence *
xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
diff --git a/drivers/gpu/drm/xe/xe_sync_types.h b/drivers/gpu/drm/xe/xe_sync_types.h
index 30ac3f51993b..28e846c29122 100644
--- a/drivers/gpu/drm/xe/xe_sync_types.h
+++ b/drivers/gpu/drm/xe/xe_sync_types.h
@@ -11,14 +11,17 @@
struct drm_syncobj;
struct dma_fence;
struct dma_fence_chain;
-struct drm_xe_sync;
+struct dma_fence_user_fence;
struct user_fence;
+struct xe_bo;
struct xe_sync_entry {
struct drm_syncobj *syncobj;
struct dma_fence *fence;
struct dma_fence_chain *chain_fence;
struct xe_user_fence *ufence;
+ struct dma_fence_user_fence *user_fence;
+ struct xe_bo *bo;
u64 addr;
u64 timeline_value;
u32 type;
--
2.34.1
* [RFC PATCH 28/29] drm/xe: Add VM convert fence IOCTL
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (26 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 27/29] drm/xe: Teach xe_sync layer about drm_xe_semaphore Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-18 23:36 ` [RFC PATCH 29/29] drm/xe: Add user fence TDR Matthew Brost
2024-11-19 4:05 ` ✗ Fi.CI.BUILD: failure for UMD direct submission in Xe Patchwork
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
Basically a version of the resume worker which also converts user syncs
to kernel syncs (dma-fences) and vice versa. The exported dma-fences in
the conversion guard against preemption, which is required to avoid
breaking dma-fence rules (no memory allocations in the dma-fence
signaling path; resume requires memory allocations).
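A userspace sketch of the conversion call (assumes the
DRM_IOCTL_XE_VM_CONVERT_FENCE wrapper and the struct layout added by the
uAPI patch earlier in this series; drm_xe_sync field names follow the
existing uAPI; error handling omitted):

  struct drm_xe_sync sync = {
          .type = DRM_XE_SYNC_TYPE_SYNCOBJ,
          .flags = DRM_XE_SYNC_FLAG_SIGNAL, /* signal this syncobj from the user semaphore */
          .handle = syncobj_handle,
  };
  struct drm_xe_semaphore sem = {
          .handle = bo_handle,
          .offset = seqno_offset,
          .seqno = seqno,
  };
  struct drm_xe_vm_convert_fence args = {
          .vm_id = vm_id,
          .num_syncs = 1,
          .syncs = (__u64)(uintptr_t)&sync,
          .semaphores = (__u64)(uintptr_t)&sem,
  };

  ioctl(fd, DRM_IOCTL_XE_VM_CONVERT_FENCE, &args);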
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 1 +
drivers/gpu/drm/xe/xe_preempt_fence.c | 9 +
drivers/gpu/drm/xe/xe_vm.c | 247 +++++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_vm.h | 2 +
drivers/gpu/drm/xe/xe_vm_types.h | 4 +
5 files changed, 254 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 573b5f3df0c8..56dd26eddd92 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -191,6 +191,7 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
+ DRM_IOCTL_DEF_DRV(XE_VM_CONVERT_FENCE, xe_vm_convert_fence_ioctl, DRM_RENDER_ALLOW),
};
static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
index 80a8bc82f3cc..c225f3cc82a3 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence.c
+++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
@@ -12,6 +12,14 @@ static struct xe_exec_queue *to_exec_queue(struct dma_fence_preempt *fence)
return container_of(fence, struct xe_preempt_fence, base)->q;
}
+static struct dma_fence *
+xe_preempt_fence_preempt_delay(struct dma_fence_preempt *fence)
+{
+ struct xe_exec_queue *q = to_exec_queue(fence);
+
+ return q->vm->preempt.exported_fence ?: dma_fence_get_stub();
+}
+
static int xe_preempt_fence_preempt(struct dma_fence_preempt *fence)
{
struct xe_exec_queue *q = to_exec_queue(fence);
@@ -35,6 +43,7 @@ static void xe_preempt_fence_preempt_finished(struct dma_fence_preempt *fence)
}
static const struct dma_fence_preempt_ops xe_preempt_fence_ops = {
+ .preempt_delay = xe_preempt_fence_preempt_delay,
.preempt = xe_preempt_fence_preempt,
.preempt_wait = xe_preempt_fence_preempt_wait,
.preempt_finished = xe_preempt_fence_preempt_finished,
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 16bc1b82d950..5078aeea2bd8 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -6,6 +6,7 @@
#include "xe_vm.h"
#include <linux/dma-fence-array.h>
+#include <linux/dma-fence-chain.h>
#include <linux/nospec.h>
#include <drm/drm_exec.h>
@@ -441,29 +442,44 @@ int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
}
static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
- bool *done)
+ int extra_fence_count, bool *done)
{
int err;
+ *done = false;
+
err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0);
if (err)
return err;
- if (xe_vm_is_idle(vm)) {
+ if (xe_vm_in_preempt_fence_mode(vm) && xe_vm_is_idle(vm)) {
vm->preempt.rebind_deactivated = true;
*done = true;
return 0;
}
+ err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);
+ if (err)
+ return err;
+
if (!preempt_fences_waiting(vm)) {
*done = true;
+
+ if (extra_fence_count) {
+ struct drm_gem_object *obj;
+ unsigned long index;
+
+ drm_exec_for_each_locked_object(exec, index, obj) {
+ err = dma_resv_reserve_fences(obj->resv,
+ extra_fence_count);
+ if (err)
+ return err;
+ }
+ }
+
return 0;
}
- err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0);
- if (err)
- return err;
-
err = wait_for_existing_preempt_fences(vm);
if (err)
return err;
@@ -474,7 +490,8 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
* The fence reservation here is intended for the new preempt fences
* we attach at the end of the rebind work.
*/
- return xe_vm_validate_rebind(vm, exec, vm->preempt.num_exec_queues);
+ return xe_vm_validate_rebind(vm, exec, vm->preempt.num_exec_queues +
+ extra_fence_count);
}
static void preempt_rebind_work_func(struct work_struct *w)
@@ -509,9 +526,9 @@ static void preempt_rebind_work_func(struct work_struct *w)
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
- bool done = false;
+ bool done;
- err = xe_preempt_work_begin(&exec, vm, &done);
+ err = xe_preempt_work_begin(&exec, vm, 0, &done);
drm_exec_retry_on_contention(&exec);
if (err || done) {
drm_exec_fini(&exec);
@@ -1638,6 +1655,7 @@ static void vm_destroy_work_func(struct work_struct *w)
container_of(w, struct xe_vm, destroy_work);
struct xe_device *xe = vm->xe;
struct xe_tile *tile;
+ struct dma_fence *fence;
u8 id;
/* xe_vm_close_and_put was not called? */
@@ -1660,6 +1678,9 @@ static void vm_destroy_work_func(struct work_struct *w)
if (vm->xef)
xe_file_put(vm->xef);
+ dma_fence_chain_for_each(fence, vm->preempt.exported_fence);
+ dma_fence_put(vm->preempt.exported_fence);
+
kfree(vm);
}
@@ -3403,3 +3424,211 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap)
}
kvfree(snap);
}
+
+static int check_semaphores(struct xe_vm *vm, struct xe_sync_entry *syncs,
+ struct drm_exec *exec, int num_syncs)
+{
+ int i, j;
+
+ for (i = 0; i < num_syncs; ++i) {
+ struct xe_bo *bo = syncs[i].bo;
+ struct drm_gem_object *obj = &bo->ttm.base;
+
+ if (bo->vm == vm)
+ continue;
+
+ for (j = 0; j < exec->num_objects; ++j) {
+ if (obj == exec->objects[j])
+ break;
+ }
+
+ if (j == exec->num_objects)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file)
+{
+ struct xe_device *xe = to_xe_device(dev);
+ struct xe_file *xef = to_xe_file(file);
+ struct drm_xe_vm_convert_fence __user *args = data;
+ struct drm_xe_sync __user *syncs_user;
+ struct drm_xe_semaphore __user *semaphores_user;
+ struct xe_sync_entry *syncs = NULL;
+ struct xe_vm *vm;
+ int err = 0, i, num_syncs = 0;
+ bool done = false;
+ struct drm_exec exec;
+ unsigned int fence_count = 0;
+ LIST_HEAD(preempt_fences);
+ ktime_t end = 0;
+ long wait;
+ int __maybe_unused tries = 0;
+ struct dma_fence *fence, *prev = NULL;
+
+ if (XE_IOCTL_DBG(xe, args->extensions || args->flags ||
+ args->reserved[0] || args->reserved[1] ||
+ args->pad))
+ return -EINVAL;
+
+ vm = xe_vm_lookup(xef, args->vm_id);
+ if (XE_IOCTL_DBG(xe, !vm))
+ return -EINVAL;
+
+ err = down_write_killable(&vm->lock);
+ if (err)
+ goto put_vm;
+
+ if (XE_IOCTL_DBG(xe, xe_vm_is_closed_or_banned(vm))) {
+ err = -ENOENT;
+ goto release_vm_lock;
+ }
+
+ syncs = kcalloc(args->num_syncs * 2, sizeof(*syncs), GFP_KERNEL);
+ if (!syncs) {
+ err = -ENOMEM;
+ goto release_vm_lock;
+ }
+
+ syncs_user = u64_to_user_ptr(args->syncs);
+ semaphores_user = u64_to_user_ptr(args->semaphores);
+ for (i = 0; i < args->num_syncs; i++, num_syncs++) {
+ struct xe_sync_entry *sync = &syncs[i];
+ struct xe_sync_entry *semaphore_sync =
+ &syncs[args->num_syncs + i];
+
+ err = xe_sync_entry_parse(xe, xef, sync, &syncs_user[i],
+ SYNC_PARSE_FLAG_DISALLOW_USER_FENCE);
+ if (err)
+ goto release_syncs;
+
+ err = xe_sync_semaphore_parse(xe, xef, semaphore_sync,
+ &semaphores_user[i],
+ sync->flags);
+ if (err) {
+ xe_sync_entry_cleanup(&syncs[i]);
+ goto release_syncs;
+ }
+ }
+
+retry:
+ if (xe_vm_userptr_check_repin(vm)) {
+ err = xe_vm_userptr_pin(vm);
+ if (err)
+ goto release_syncs;
+ }
+
+ drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+
+ drm_exec_until_all_locked(&exec) {
+ err = xe_preempt_work_begin(&exec, vm, num_syncs, &done);
+ drm_exec_retry_on_contention(&exec);
+ if (err) {
+ drm_exec_fini(&exec);
+ if (err && xe_vm_validate_should_retry(&exec, err, &end))
+ err = -EAGAIN;
+
+ goto release_syncs;
+ }
+ }
+
+ if (XE_IOCTL_DBG(xe, check_semaphores(vm, syncs + num_syncs,
+ &exec, num_syncs))) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
+ if (!done) {
+ err = alloc_preempt_fences(vm, &preempt_fences, &fence_count);
+ if (err)
+ goto out_unlock;
+
+ wait = dma_resv_wait_timeout(xe_vm_resv(vm),
+ DMA_RESV_USAGE_KERNEL,
+ false, MAX_SCHEDULE_TIMEOUT);
+ if (wait <= 0) {
+ err = -ETIME;
+ goto out_unlock;
+ }
+ }
+
+#define retry_required(__tries, __vm) \
+ (IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT) ? \
+ (!(__tries)++ || __xe_vm_userptr_needs_repin(__vm)) : \
+ __xe_vm_userptr_needs_repin(__vm))
+
+ down_read(&vm->userptr.notifier_lock);
+ if (retry_required(tries, vm)) {
+ up_read(&vm->userptr.notifier_lock);
+ err = -EAGAIN;
+ goto out_unlock;
+ }
+
+#undef retry_required
+
+ /* Point of no return. */
+ xe_assert(vm->xe, list_empty(&vm->rebind_list));
+
+ for (i = 0; i < num_syncs; i++) {
+ struct xe_sync_entry *sync = &syncs[i];
+ struct xe_sync_entry *semaphore_sync = &syncs[num_syncs + i];
+
+ if (sync->flags & DRM_XE_SYNC_FLAG_SIGNAL) {
+ xe_sync_entry_signal(sync, semaphore_sync->fence);
+ xe_sync_entry_hw_fence_installed(semaphore_sync);
+
+ dma_fence_put(prev);
+ prev = dma_fence_get(vm->preempt.exported_fence);
+
+ dma_fence_chain_init(semaphore_sync->chain_fence,
+ prev, semaphore_sync->fence,
+ vm->preempt.seqno++);
+
+ vm->preempt.exported_fence =
+ &semaphore_sync->chain_fence->base;
+ semaphore_sync->chain_fence = NULL;
+
+ semaphore_sync->fence = NULL; /* Ref owned by chain */
+ } else {
+ xe_sync_entry_signal(semaphore_sync, sync->fence);
+ drm_gpuvm_resv_add_fence(&vm->gpuvm, &exec,
+ dma_fence_chain_contained(sync->fence),
+ DMA_RESV_USAGE_BOOKKEEP,
+ DMA_RESV_USAGE_BOOKKEEP);
+ }
+ }
+
+ dma_fence_chain_for_each(fence, prev);
+ dma_fence_put(prev);
+
+ if (!done) {
+ spin_lock(&vm->xe->ttm.lru_lock);
+ ttm_lru_bulk_move_tail(&vm->lru_bulk_move);
+ spin_unlock(&vm->xe->ttm.lru_lock);
+
+ arm_preempt_fences(vm, &preempt_fences);
+ resume_and_reinstall_preempt_fences(vm, &exec);
+ }
+ up_read(&vm->userptr.notifier_lock);
+
+out_unlock:
+ drm_exec_fini(&exec);
+release_syncs:
+ while (err != -EAGAIN && num_syncs--) {
+ xe_sync_entry_cleanup(&syncs[num_syncs]);
+ xe_sync_entry_cleanup(&syncs[args->num_syncs + num_syncs]);
+ }
+release_vm_lock:
+ if (err == -EAGAIN)
+ goto retry;
+ up_write(&vm->lock);
+put_vm:
+ xe_vm_put(vm);
+ free_preempt_fences(&preempt_fences);
+ kfree(syncs);
+
+ return err;
+}
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 4391dbaeba51..c1c70239cc91 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -181,6 +181,8 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
+int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file);
void xe_vm_close_and_put(struct xe_vm *vm);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 7f9a303e51d8..c5cb83722706 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -254,6 +254,10 @@ struct xe_vm {
* BOs
*/
struct work_struct rebind_work;
+ /** @seqno: Seqno of exported dma-fences */
+ u64 seqno;
+ /** @exported_fence: Chain of exported dma-fences */
+ struct dma_fence *exported_fence;
} preempt;
/** @um: unified memory state */
--
2.34.1
* [RFC PATCH 29/29] drm/xe: Add user fence TDR
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (27 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 28/29] drm/xe: Add VM convert fence IOCTL Matthew Brost
@ 2024-11-18 23:36 ` Matthew Brost
2024-11-19 4:05 ` ✗ Fi.CI.BUILD: failure for UMD direct submission in Xe Patchwork
29 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2024-11-18 23:36 UTC (permalink / raw)
To: igt-dev
We cannot let user fences exported as dma-fences run forever. Add a TDR
to protect against this. If the TDR fires, the entire VM is killed, as
dma-fences are not tied to an individual queue.
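The TDR is a single delayed work per VM that always tracks the earliest
pending deadline; each signaled fence either re-arms it for the next
pending item or cancels it. A minimal sketch of the re-arming step
(illustrative only, not the driver code):

  static void rearm_tdr_example(struct workqueue_struct *wq,
                                struct delayed_work *tdr,
                                u64 deadline)     /* absolute, in jiffies */
  {
          u64 now = get_jiffies_64();

          /* Fire immediately if the earliest deadline has already passed. */
          mod_delayed_work(wq, tdr, deadline > now ? deadline - now : 0);
  }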
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 164 +++++++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_vm_types.h | 22 +++++
2 files changed, 179 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 5078aeea2bd8..8b475e76bfe0 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -30,6 +30,7 @@
#include "xe_exec_queue.h"
#include "xe_gt_pagefault.h"
#include "xe_gt_tlb_invalidation.h"
+#include "xe_hw_fence.h"
#include "xe_migrate.h"
#include "xe_pat.h"
#include "xe_pm.h"
@@ -336,11 +337,15 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
if (unlocked)
xe_vm_lock(vm, false);
- vm->flags |= XE_VM_FLAG_BANNED;
- trace_xe_vm_kill(vm);
+ if (!(vm->flags & XE_VM_FLAG_BANNED)) {
+ vm->flags |= XE_VM_FLAG_BANNED;
+ trace_xe_vm_kill(vm);
- list_for_each_entry(q, &vm->preempt.exec_queues, lr.link)
- q->ops->kill(q);
+ list_for_each_entry(q, &vm->preempt.exec_queues, lr.link)
+ q->ops->kill(q);
+
+ /* TODO: Unmap usermap doorbells */
+ }
if (unlocked)
xe_vm_unlock(vm);
@@ -1393,6 +1398,9 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
}
}
+static void userfence_tdr(struct work_struct *w);
+static void userfence_kill(struct work_struct *w);
+
struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
{
struct drm_gem_object *vm_resv_obj;
@@ -1517,6 +1525,12 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
}
}
+ spin_lock_init(&vm->userfence.lock);
+ INIT_LIST_HEAD(&vm->userfence.pending_list);
+ vm->userfence.timeout = HZ * 5;
+ INIT_DELAYED_WORK(&vm->userfence.tdr, userfence_tdr);
+ INIT_WORK(&vm->userfence.kill_work, userfence_kill);
+
if (number_tiles > 1)
vm->composite_fence_ctx = dma_fence_context_alloc(1);
@@ -1562,6 +1576,9 @@ void xe_vm_close_and_put(struct xe_vm *vm)
xe_vm_close(vm);
flush_work(&vm->preempt.rebind_work);
+ flush_delayed_work(&vm->userfence.tdr);
+ flush_work(&vm->userfence.kill_work);
+
down_write(&vm->lock);
for_each_tile(tile, xe, id) {
if (vm->q[id])
@@ -3449,6 +3466,114 @@ static int check_semaphores(struct xe_vm *vm, struct xe_sync_entry *syncs,
return 0;
}
+struct tdr_item {
+ struct dma_fence *fence;
+ struct xe_vm *vm;
+ struct list_head link;
+ struct dma_fence_cb cb;
+ u64 deadline;
+};
+
+static void userfence_kill(struct work_struct *w)
+{
+ struct xe_vm *vm =
+ container_of(w, struct xe_vm, userfence.kill_work);
+
+ down_write(&vm->lock);
+ xe_vm_kill(vm, true);
+ up_write(&vm->lock);
+}
+
+static void userfence_tdr(struct work_struct *w)
+{
+ struct xe_vm *vm =
+ container_of(w, struct xe_vm, userfence.tdr.work);
+ struct tdr_item *tdr_item;
+ bool timeout = false, cookie = dma_fence_begin_signalling();
+
+ xe_hw_fence_irq_stop(&vm->xe->user_fence_irq);
+
+ spin_lock_irq(&vm->userfence.lock);
+ list_for_each_entry(tdr_item, &vm->userfence.pending_list, link) {
+ if (!dma_fence_is_signaled(tdr_item->fence)) {
+ drm_notice(&vm->xe->drm,
+ "Timedout usermap fence: seqno=%llu, deadline=%llu, jiffies=%llu",
+ tdr_item->fence->seqno, tdr_item->deadline,
+ get_jiffies_64());
+ dma_fence_set_error(tdr_item->fence, -ETIME);
+ timeout = true;
+ vm->userfence.timeout = 0;
+ }
+ }
+ spin_unlock_irq(&vm->userfence.lock);
+
+ xe_hw_fence_irq_start(&vm->xe->user_fence_irq);
+
+ /*
+ * This is dma-fence signaling path so we cannot take the locks requires
+ * to kill a VM. Defer killing to a worker.
+ */
+ if (timeout)
+ schedule_work(&vm->userfence.kill_work);
+
+ dma_fence_end_signalling(cookie);
+}
+
+static void userfence_fence_cb(struct dma_fence *fence,
+ struct dma_fence_cb *cb)
+{
+ struct tdr_item *next, *tdr_item = container_of(cb, struct tdr_item, cb);
+ struct xe_vm *vm = tdr_item->vm;
+ struct xe_gt *gt = xe_device_get_gt(vm->xe, 0);
+
+ if (fence)
+ spin_lock(&vm->userfence.lock);
+ else
+ spin_lock_irq(&vm->userfence.lock);
+
+ list_del(&tdr_item->link);
+ next = list_first_entry_or_null(&vm->userfence.pending_list,
+ typeof(*next), link);
+ if (next)
+ mod_delayed_work(gt->ordered_wq, &vm->userfence.tdr,
+ next->deadline - get_jiffies_64());
+ else
+ cancel_delayed_work(&vm->userfence.tdr);
+
+ if (fence)
+ spin_unlock(&vm->userfence.lock);
+ else
+ spin_unlock_irq(&vm->userfence.lock);
+
+ dma_fence_put(tdr_item->fence);
+ xe_vm_put(tdr_item->vm);
+ kfree(tdr_item);
+}
+
+static void userfence_tdr_add(struct xe_vm *vm, struct tdr_item *tdr_item,
+ struct dma_fence *fence)
+{
+ struct xe_gt *gt = xe_device_get_gt(vm->xe, 0);
+ int ret;
+
+ tdr_item->fence = dma_fence_get(fence);
+ tdr_item->vm = xe_vm_get(vm);
+ INIT_LIST_HEAD(&tdr_item->link);
+ tdr_item->deadline = vm->userfence.timeout + get_jiffies_64();
+
+ spin_lock_irq(&vm->userfence.lock);
+ list_add_tail(&tdr_item->link, &vm->userfence.pending_list);
+ if (list_is_singular(&vm->userfence.pending_list))
+ mod_delayed_work(gt->ordered_wq,
+ &vm->userfence.tdr,
+ vm->userfence.timeout);
+ spin_unlock_irq(&vm->userfence.lock);
+
+ ret = dma_fence_add_callback(fence, &tdr_item->cb, userfence_fence_cb);
+ if (ret == -ENOENT)
+ userfence_fence_cb(NULL, &tdr_item->cb);
+}
+
int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
{
@@ -3459,6 +3584,7 @@ int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
struct drm_xe_semaphore __user *semaphores_user;
struct xe_sync_entry *syncs = NULL;
struct xe_vm *vm;
+ struct tdr_item **tdr_items = NULL;
int err = 0, i, num_syncs = 0;
bool done = false;
struct drm_exec exec;
@@ -3493,6 +3619,12 @@ int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
goto release_vm_lock;
}
+ tdr_items = kcalloc(args->num_syncs, sizeof(*tdr_items), GFP_KERNEL);
+ if (!tdr_items) {
+ err = -ENOMEM;
+ goto release_vm_lock;
+ }
+
syncs_user = u64_to_user_ptr(args->syncs);
semaphores_user = u64_to_user_ptr(args->semaphores);
for (i = 0; i < args->num_syncs; i++, num_syncs++) {
@@ -3505,6 +3637,15 @@ int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
if (err)
goto release_syncs;
+ if (sync->flags & DRM_XE_SYNC_FLAG_SIGNAL) {
+ tdr_items[i] = kmalloc(sizeof(struct tdr_item),
+ GFP_KERNEL);
+ if (!tdr_items[i]) {
+ err = -ENOMEM;
+ xe_sync_entry_cleanup(&syncs[i]);
+ goto release_syncs;
+ }
+ }
+
err = xe_sync_semaphore_parse(xe, xef, semaphore_sync,
&semaphores_user[i],
sync->flags);
@@ -3591,6 +3732,10 @@ int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
&semaphore_sync->chain_fence->base;
semaphore_sync->chain_fence = NULL;
+ userfence_tdr_add(vm, tdr_items[i],
+ semaphore_sync->fence);
+ tdr_items[i] = 0;
+
semaphore_sync->fence = NULL; /* Ref owned by chain */
} else {
xe_sync_entry_signal(semaphore_sync, sync->fence);
@@ -3617,9 +3762,13 @@ int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
out_unlock:
drm_exec_fini(&exec);
release_syncs:
- while (err != -EAGAIN && num_syncs--) {
- xe_sync_entry_cleanup(&syncs[num_syncs]);
- xe_sync_entry_cleanup(&syncs[args->num_syncs + num_syncs]);
+ if (err != -EAGAIN) {
+ for (i = 0; i < num_syncs; ++i)
+ kfree(tdr_items[i]);
+ while (num_syncs--) {
+ xe_sync_entry_cleanup(&syncs[num_syncs]);
+ xe_sync_entry_cleanup(&syncs[args->num_syncs + num_syncs]);
+ }
}
release_vm_lock:
if (err == -EAGAIN)
@@ -3629,6 +3778,7 @@ int xe_vm_convert_fence_ioctl(struct drm_device *dev, void *data,
xe_vm_put(vm);
free_preempt_fences(&preempt_fences);
kfree(syncs);
+ kfree(tdr_items);
return err;
}
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index c5cb83722706..49cac5716f72 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -260,6 +260,28 @@ struct xe_vm {
struct dma_fence *exported_fence;
} preempt;
+ /** @userfence: User fence state */
+ struct {
+ /**
+ * @userfence.lock: fence lock
+ */
+ spinlock_t lock;
+ /**
+ * @userfence.pending_list: pending fence list, protected by
+ * userfence.lock
+ */
+ struct list_head pending_list;
+ /** @userfence.tdr: fence TDR */
+ struct delayed_work tdr;
+ /** @userfence.kill_work: worker that kills the VM on user fence timeout */
+ struct work_struct kill_work;
+ /**
+ * @userfence.timeout: Fence timeout period, protected by
+ * userfence.lock
+ */
+ u32 timeout;
+ } userfence;
+
/** @um: unified memory state */
struct {
/** @asid: address space ID, unique to each VM */
--
2.34.1
* ✗ Fi.CI.BUILD: failure for UMD direct submission in Xe
2024-11-18 23:35 [RFC PATCH 00/29] UMD direct submission in Xe Matthew Brost
` (28 preceding siblings ...)
2024-11-18 23:36 ` [RFC PATCH 29/29] drm/xe: Add user fence TDR Matthew Brost
@ 2024-11-19 4:05 ` Patchwork
29 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2024-11-19 4:05 UTC (permalink / raw)
To: Matthew Brost; +Cc: igt-dev
== Series Details ==
Series: UMD direct submission in Xe
URL : https://patchwork.freedesktop.org/series/141523/
State : failure
== Summary ==
Applying: dma-fence: Add dma_fence_preempt base class
Patch failed at 0001 dma-fence: Add dma_fence_preempt base class
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".