AMD-GFX Archive on lore.kernel.org
* [PATCH v11 00/28] AMDGPU usermode queues
@ 2024-09-09 20:05 Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 01/28] drm/amdgpu: UAPI for user queue management Shashank Sharma
                   ` (23 more replies)
  0 siblings, 24 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma

This patch series introduces the base code for AMDGPU usermode queues for gfx
workloads. Usermode queues are a method of submitting GPU workloads to the
graphics hardware without any interaction with the kernel/DRM schedulers. With
this method, a userspace graphics application can create its own workqueue
and submit work directly to the GPU HW.

The general idea of how Userqueues are supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Other supporting buffer objects as required by the target IP engine
    (shadow, GDS, etc.; this information is available via AMDGPU_INFO_IOCTL).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this series, passing the GPU addresses of these objects (read ptr,
  write ptr, queue base address, shadow, gds) along with the doorbell
  object and the 32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application maps the GPU buffers in process address space.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data in the queue, the app must write the
  number of dwords added to the queue into the doorbell offset and the
  WPTR buffer. The GPU will start fetching the data as soon as this
  write is done.
- This series adds usermode queue support for all three MES based IPs
  (GFX, SDMA and Compute).
- This series also adds eviction fences to handle migration of the
  userqueue mapped buffers by TTM.
- For synchronization of userqueues, we have added a secure semaphore
  IOCTL, which is being reviewed separately here:
  https://patchwork.freedesktop.org/patch/611971/
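
To make the submission step concrete, here is a rough userspace-side model
of the ring/WPTR/doorbell update described above. All names and sizes are
illustrative; in the real flow the ring, WPTR and doorbell are GPU buffers
created and mapped through the UAPI added in this series:

```c
#include <assert.h>
#include <stdint.h>

#define RING_DWORDS 256u  /* illustrative ring size in dwords */

struct user_ring {
	uint32_t buf[RING_DWORDS]; /* stands in for the queue BO */
	uint64_t wptr;             /* stands in for the WPTR BO */
	uint64_t doorbell;         /* stands in for the doorbell slot */
};

/*
 * Copy 'count' dwords of packet data into the ring, then publish the
 * new write pointer through both the WPTR buffer and the doorbell,
 * as the cover letter describes. Returns the new wptr value.
 */
static uint64_t ring_submit(struct user_ring *ring, const uint32_t *pkt,
			    uint32_t count)
{
	for (uint32_t i = 0; i < count; i++)
		ring->buf[(ring->wptr + i) % RING_DWORDS] = pkt[i];

	ring->wptr += count;         /* monotonically increasing dword count */
	ring->doorbell = ring->wptr; /* doorbell write triggers the HW fetch */
	return ring->wptr;
}
```

A real client would also need memory barriers between the packet writes and
the doorbell write; that detail is omitted in this sketch.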

The libdrm UAPI changes for this series can be found here
(this MR also contains an example test utility which demonstrates
the usage of the userqueue UAPI):
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

MESA changes consuming this series can be seen in the MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Arvind Yadav (4):
  drm/amdgpu: enable SDMA usermode queues
  drm/amdgpu: Add input fence to sync bo unmap
  drm/amdgpu: fix MES GFX mask
  Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"

Shashank Sharma (18):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: add helpers to create userqueue object
  drm/amdgpu: create MES-V11 usermode queue for GFX
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues
  drm/amdgpu: enable GFX-V11 userqueue support
  drm/amdgpu: enable compute/gfx usermode queue
  drm/amdgpu: update userqueue BOs and PDs
  drm/amdgpu: add kernel config for gfx-userqueue
  drm/amdgpu: add gfx eviction fence helpers
  drm/amdgpu: add userqueue suspend/resume functions
  drm/amdgpu: suspend gfx userqueues
  drm/amdgpu: resume gfx userqueues
  Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"

 drivers/gpu/drm/amd/amdgpu/Kconfig            |   8 +
 drivers/gpu/drm/amd/amdgpu/Makefile           |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  10 +
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 297 ++++++++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  67 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |  68 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  11 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |   3 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 713 ++++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  74 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 644 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  42 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  16 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 395 ++++++++++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 +
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |   5 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 100 +++
 drivers/gpu/drm/amd/include/v11_structs.h     |   4 +-
 include/uapi/drm/amdgpu_drm.h                 | 252 +++++++
 22 files changed, 2722 insertions(+), 45 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.45.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v11 01/28] drm/amdgpu: UAPI for user queue management
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 02/28] drm/amdgpu: add usermode queue base code Shashank Sharma
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, Shashank Sharma

From: Alex Deucher <alexander.deucher@amd.com>

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app fills this structure and requests that
the graphics driver add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into the GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.
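
To illustrate how a client might fill this UAPI, here is a hedged sketch.
The struct below is a local mirror of the request half of the new structure
(real clients use the definition from include/uapi/drm/amdgpu_drm.h), and
the helper only checks the alignment rules that the UAPI comments document;
all addresses are placeholders:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local mirror of drm_amdgpu_userq_in, for illustration only --
 * real clients include amdgpu_drm.h instead of redefining this.
 */
struct userq_in_mirror {
	uint32_t op, queue_id, ip_type, flags;
	uint32_t doorbell_handle, doorbell_offset;
	uint64_t queue_va, queue_size, rptr_va, wptr_va;
	uint64_t mqd, mqd_size;
};

/*
 * Check the constraints stated in the UAPI comments: queue_size must
 * be 256-byte aligned, and the RPTR/WPTR objects 8-byte aligned.
 */
static int userq_in_valid(const struct userq_in_mirror *in)
{
	return (in->queue_size % 256 == 0) &&
	       (in->rptr_va % 8 == 0) &&
	       (in->wptr_va % 8 == 0);
}
```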

V2: Addressed review comments from Alex and Christian
    - Make the doorbell offset's comment clearer
    - Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
    - Updated the UAPI doc (Pierre-Eric)
    - Created a Union for engine specific MQDs (Alex)
    - Added Christian's R-B
V5:
    - Add variables for GDS and CSA in MQD structure (Alex)
    - Make MQD data a ptr-size pair instead of union (Alex)

V9:
   - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
     drm_amdgpu_userq_mqd, as it is being used for SDMA and
     compute queues as well

V10:
    - keep drm_amdgpu_userq_mqd IP independent, moving the
      _gfx_v11 objects into a separate structure in another patch
      (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 90 +++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3e488b0119eb..bd8d47a55553 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_USERQ		0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -319,6 +321,94 @@ union drm_amdgpu_ctx {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE	1
+#define AMDGPU_USERQ_OP_FREE	2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_in {
+	/** AMDGPU_USERQ_OP_* */
+	__u32	op;
+	/** Queue handle for USERQ_OP_FREE */
+	__u32	queue_id;
+	/** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+	__u32   ip_type;
+	/**
+	 * @flags: flags to indicate special function for queue like secure
+	 * buffer (TMZ). Unused for now.
+	 */
+	__u32   flags;
+	/**
+	 * @doorbell_handle: the handle of doorbell GEM object
+	 * associated to this client.
+	 */
+	__u32   doorbell_handle;
+	/**
+	 * @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
+	 * Kernel will generate absolute doorbell offset using doorbell_handle
+	 * and doorbell_offset in the doorbell bo.
+	 */
+	__u32   doorbell_offset;
+
+	/**
+	 * @queue_va: Virtual address of the GPU memory which holds the queue
+	 * object. The queue holds the workload packets.
+	 */
+	__u64   queue_va;
+	/**
+	 * @queue_size: Size of the queue in bytes, this needs to be 256-byte
+	 * aligned.
+	 */
+	__u64   queue_size;
+	/**
+	 * @rptr_va : Virtual address of the GPU memory which holds the ring RPTR.
+	 * This object must be at least 8 byte in size and aligned to 8-byte offset.
+	 */
+	__u64   rptr_va;
+	/**
+	 * @wptr_va : Virtual address of the GPU memory which holds the ring WPTR.
+	 * This object must be at least 8 byte in size and aligned to 8-byte offset.
+	 *
+	 * Queue, RPTR and WPTR can come from the same object, as long as the size
+	 * and alignment related requirements are met.
+	 */
+	__u64   wptr_va;
+	/**
+	 * @mqd: Queue descriptor for USERQ_OP_CREATE
+	 * MQD data can be of different size for different GPU IP/engine and
+	 * their respective versions/revisions, so this points to a __u64 *
+	 * which holds MQD of this usermode queue.
+	 */
+	__u64 mqd;
+	/**
+	 * @size: size of MQD data in bytes, it must match the MQD structure
+	 * size of the respective engine/revision defined in UAPI for ex, for
+	 * gfx_v11 workloads, size = sizeof(drm_amdgpu_userq_mqd_gfx_v11).
+	 */
+	__u64 mqd_size;
+};
+
+struct drm_amdgpu_userq_out {
+	/** Queue handle */
+	__u32	queue_id;
+	/** Flags */
+	__u32	flags;
+};
+
+union drm_amdgpu_userq {
+	struct drm_amdgpu_userq_in in;
+	struct drm_amdgpu_userq_out out;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.45.1



* [PATCH v11 02/28] drm/amdgpu: add usermode queue base code
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 01/28] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 03/28] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig

This patch adds IP-independent skeleton code for amdgpu
usermode queues. It contains:
- A new file with init functions for usermode queues.
- A queue context manager in the driver private data.
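
As a userspace analogue of the manager this patch adds (the kernel version
pairs a mutex with an idr whose base is 1; the fixed-size table below merely
mimics that handle space, and all names are illustrative):

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_QUEUES 512 /* mirrors AMDGPU_MAX_USERQ_COUNT */

/* Simplified stand-in for struct amdgpu_userq_mgr: a lock plus an id
 * table whose first valid handle is 1, like idr_init_base(idr, 1). */
struct uq_mgr_model {
	pthread_mutex_t lock;
	void *slots[MAX_QUEUES]; /* slot 0 is never handed out */
};

static void uq_mgr_model_init(struct uq_mgr_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	memset(m->slots, 0, sizeof(m->slots));
}

/* Allocate the lowest free handle >= 1, or -1 if the table is full. */
static int uq_mgr_model_alloc(struct uq_mgr_model *m, void *queue)
{
	int id = -1;

	pthread_mutex_lock(&m->lock);
	for (int i = 1; i < MAX_QUEUES; i++) {
		if (!m->slots[i]) {
			m->slots[i] = queue;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&m->lock);
	return id;
}
```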

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

V6: Rebase
V9: Rebase
V10: Rebase + Alex's R-B

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Change-Id: I6585d012a7ead1105bf43a7b91f361d7dd20a9a9
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 ++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 61 +++++++++++++++++++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 34943b866687..dcf64b965bdf 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -250,6 +250,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add gfx usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6e6580ab7e04..57a418eec3d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -112,6 +112,7 @@
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
+#include "amdgpu_userqueue.h"
 #if defined(CONFIG_DRM_AMD_ISP)
 #include "amdgpu_isp.h"
 #endif
@@ -493,6 +494,7 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	struct amdgpu_userq_mgr	userq_mgr;
 	/** GPU partition selection */
 	uint32_t		xcp_id;
 };
@@ -1052,6 +1054,7 @@ struct amdgpu_device {
 	bool                            enable_uni_mes;
 	struct amdgpu_mes               mes;
 	struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
+	const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 
 	/* df */
 	struct amdgpu_df                df;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 82bde5132dc6..d92f01f3ea44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d9fde38f6ee2..019a377620ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -45,6 +45,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_reset.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1392,6 +1393,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+	if (r)
+		DRM_WARN("Can't setup usermode queues, use legacy workload submission only\n");
+
 	file_priv->driver_priv = fpriv;
 	goto out_suspend;
 
@@ -1461,6 +1466,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
+	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
 	if (pasid)
 		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index 000000000000..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
+{
+	mutex_init(&userq_mgr->userq_mutex);
+	idr_init_base(&userq_mgr->userq_idr, 1);
+	userq_mgr->adev = adev;
+
+	return 0;
+}
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
+{
+	idr_destroy(&userq_mgr->userq_idr);
+	mutex_destroy(&userq_mgr->userq_mutex);
+}
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
new file mode 100644
index 000000000000..93ebe4b61682
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDGPU_USERQUEUE_H_
+#define AMDGPU_USERQUEUE_H_
+
+#define AMDGPU_MAX_USERQ_COUNT 512
+
+struct amdgpu_mqd_prop;
+
+struct amdgpu_usermode_queue {
+	int			queue_type;
+	uint64_t		doorbell_handle;
+	uint64_t		doorbell_index;
+	uint64_t		flags;
+	struct amdgpu_mqd_prop	*userq_prop;
+	struct amdgpu_userq_mgr *userq_mgr;
+	struct amdgpu_vm	*vm;
+};
+
+struct amdgpu_userq_funcs {
+	int (*mqd_create)(struct amdgpu_userq_mgr *uq_mgr,
+			  struct drm_amdgpu_userq_in *args,
+			  struct amdgpu_usermode_queue *queue);
+	void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
+			    struct amdgpu_usermode_queue *uq);
+};
+
+/* Usermode queues for gfx */
+struct amdgpu_userq_mgr {
+	struct idr			userq_idr;
+	struct mutex			userq_mutex;
+	struct amdgpu_device		*adev;
+};
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
+
+#endif
-- 
2.45.1



* [PATCH v11 03/28] drm/amdgpu: add new IOCTL for usermode queue
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 01/28] drm/amdgpu: UAPI for user queue management Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 02/28] drm/amdgpu: add usermode queue base code Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 04/28] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig

This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.
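
A minimal model of the op dispatch this patch implements (the return values
mirror the kernel paths, with -22 standing in for -EINVAL; the kernel does
the flags check inside the create path, and the names here are illustrative):

```c
#include <assert.h>

enum { UQ_OP_CREATE = 1, UQ_OP_FREE = 2 }; /* match AMDGPU_USERQ_OP_* */

/*
 * Mirror the validation done by the new ioctl: an unknown op is
 * rejected, and (per the V11 change) a create request with non-zero
 * flags is rejected too.
 */
static int uq_ioctl_model(unsigned int op, unsigned int flags)
{
	switch (op) {
	case UQ_OP_CREATE:
		if (flags)
			return -22; /* -EINVAL: flags must be zero for now */
		return 0;
	case UQ_OP_FREE:
		return 0;
	default:
		return -22; /* -EINVAL: invalid user queue op */
	}
}
```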

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
 - Remove wrapper functions for queue indexing (Christian)
 - Do not save the queue id/idr in queue itself (Christian)
 - Move the idr allocation in the IP independent generic space
  (Christian)

V6:
 - Check the validity of input IP type (Christian)

V7:
 - Move uq_func from uq_mgr to adev (Alex)
 - Add missing free(queue) for error cases (Yifan)

V9:
 - Rebase

V10: Addressed review comments from Christian, and added R-B:
 - Do not initialize the local variable
 - Convert DRM_ERROR to DEBUG.

V11:
  - check the input flags to be zero (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 3 files changed, 123 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d92f01f3ea44..79db64d30c18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2951,6 +2951,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..cf7fe68d9277 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,126 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+	return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *uq_funcs;
+	struct amdgpu_usermode_queue *queue;
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+	if (!queue) {
+		DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+		mutex_unlock(&uq_mgr->userq_mutex);
+		return -EINVAL;
+	}
+
+	uq_funcs = adev->userq_funcs[queue->queue_type];
+	uq_funcs->mqd_destroy(uq_mgr, queue);
+	idr_remove(&uq_mgr->userq_idr, queue_id);
+	kfree(queue);
+
+	mutex_unlock(&uq_mgr->userq_mutex);
+	return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *uq_funcs;
+	struct amdgpu_usermode_queue *queue;
+	int qid, r = 0;
+
+	if (args->in.flags) {
+		DRM_ERROR("Usermode queue flags not supported yet\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	uq_funcs = adev->userq_funcs[args->in.ip_type];
+	if (!uq_funcs) {
+		DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
+		r = -EINVAL;
+		goto unlock;
+	}
+
+	queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+	if (!queue) {
+		DRM_ERROR("Failed to allocate memory for queue\n");
+		r = -ENOMEM;
+		goto unlock;
+	}
+	queue->doorbell_handle = args->in.doorbell_handle;
+	queue->doorbell_index = args->in.doorbell_offset;
+	queue->queue_type = args->in.ip_type;
+	queue->flags = args->in.flags;
+	queue->vm = &fpriv->vm;
+
+	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
+	if (r) {
+		DRM_ERROR("Failed to create Queue\n");
+		kfree(queue);
+		goto unlock;
+	}
+
+	qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
+	if (qid < 0) {
+		DRM_ERROR("Failed to allocate a queue id\n");
+		uq_funcs->mqd_destroy(uq_mgr, queue);
+		kfree(queue);
+		r = -ENOMEM;
+		goto unlock;
+	}
+	args->out.queue_id = qid;
+
+unlock:
+	mutex_unlock(&uq_mgr->userq_mutex);
+	return r;
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *filp)
+{
+	union drm_amdgpu_userq *args = data;
+	int r;
+
+	switch (args->in.op) {
+	case AMDGPU_USERQ_OP_CREATE:
+		r = amdgpu_userqueue_create(filp, args);
+		if (r)
+			DRM_ERROR("Failed to create usermode queue\n");
+		break;
+
+	case AMDGPU_USERQ_OP_FREE:
+		r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
+		if (r)
+			DRM_ERROR("Failed to destroy usermode queue\n");
+		break;
+
+	default:
+		DRM_DEBUG_DRIVER("Invalid user queue op specified: %d\n", args->in.op);
+		return -EINVAL;
+	}
+
+	return r;
+}
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 93ebe4b61682..b739274c72e1 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -54,6 +54,8 @@ struct amdgpu_userq_mgr {
 	struct amdgpu_device		*adev;
 };
 
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
-- 
2.45.1



* [PATCH v11 04/28] drm/amdgpu: add helpers to create userqueue object
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (2 preceding siblings ...)
  2024-09-09 20:05 ` [PATCH v11 03/28] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 05/28] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig

This patch introduces struct amdgpu_userq_obj and helper
functions to create and destroy this object. The helper
functions create/destroy a base amdgpu_bo, kmap/unmap it, and
save the respective GPU and CPU addresses in the encapsulating
userqueue object.

These helpers will be used to create/destroy userqueue MQD, WPTR
and FW areas.
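
A rough userspace analogue of what the create/destroy helpers do (allocate,
map, zero, record addresses). There is no real BO or GART mapping here;
gpu_addr is just a stand-in value, and all names are illustrative:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct amdgpu_userq_obj. */
struct userq_obj_model {
	void     *cpu_ptr;
	uint64_t  gpu_addr;
};

/* Allocate a page-aligned, zeroed buffer and record both "addresses",
 * loosely mirroring create-BO + GART alloc + kmap in the real helper. */
static int userq_obj_model_create(struct userq_obj_model *o, size_t size)
{
	if (posix_memalign(&o->cpu_ptr, 4096, size))
		return -12; /* -ENOMEM */
	memset(o->cpu_ptr, 0, size); /* the helper zeroes the mapped BO */
	o->gpu_addr = (uint64_t)(uintptr_t)o->cpu_ptr; /* stand-in only */
	return 0;
}

static void userq_obj_model_destroy(struct userq_obj_model *o)
{
	free(o->cpu_ptr);
	o->cpu_ptr = NULL;
}
```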

V7:
- Forked out this new patch from V11-gfx-userqueue patch to prevent
  that patch from growing very big.
- Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
  for eviction fences (Christian)

V9:
 - Rebase
V10:
 - Added Alex's R-B

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 13 ++++
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index cf7fe68d9277..501324dde343 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 	return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_userq_obj *userq_obj,
+				   int size)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_bo_param bp;
+	int r;
+
+	memset(&bp, 0, sizeof(bp));
+	bp.byte_align = PAGE_SIZE;
+	bp.domain = AMDGPU_GEM_DOMAIN_GTT;
+	bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+		   AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+	bp.type = ttm_bo_type_kernel;
+	bp.size = size;
+	bp.resv = NULL;
+	bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+	r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
+	if (r) {
+		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+		return r;
+	}
+
+	r = amdgpu_bo_reserve(userq_obj->obj, true);
+	if (r) {
+		DRM_ERROR("Failed to reserve BO to map (%d)", r);
+		goto free_obj;
+	}
+
+	r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
+	if (r) {
+		DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
+		goto unresv;
+	}
+
+	r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
+	if (r) {
+		DRM_ERROR("Failed to map BO for userqueue (%d)", r);
+		goto unresv;
+	}
+
+	userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
+	amdgpu_bo_unreserve(userq_obj->obj);
+	memset(userq_obj->cpu_ptr, 0, size);
+	return 0;
+
+unresv:
+	amdgpu_bo_unreserve(userq_obj->obj);
+
+free_obj:
+	amdgpu_bo_unref(&userq_obj->obj);
+	return r;
+}
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_userq_obj *userq_obj)
+{
+	amdgpu_bo_kunmap(userq_obj->obj);
+	amdgpu_bo_unref(&userq_obj->obj);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index b739274c72e1..bbd29f68b8d4 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+	void		 *cpu_ptr;
+	uint64_t	 gpu_addr;
+	struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
 	int			queue_type;
 	uint64_t		doorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_mqd_prop	*userq_prop;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
+	struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
@@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_userq_obj *userq_obj,
+				   int size);
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+				     struct amdgpu_userq_obj *userq_obj);
 #endif
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 05/28] drm/amdgpu: create MES-V11 usermode queue for GFX
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (3 preceding siblings ...)
  2024-09-09 20:05 ` [PATCH v11 04/28] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue Shashank Sharma
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

A memory queue descriptor (MQD) of a userqueue defines it in
the HW's context. As the MQD format can vary between graphics
IPs, we need GFX GEN-specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
  functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.

V1: Worked on review comments from Alex:
    - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
    - Reuse the existing adev->mqd[ip] for MQD creation
    - Formatting and arrangement of code

V3:
    - Integration with doorbell manager

V4: Review comments addressed:
    - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
    - Align name of structure members (Luben)
    - Don't break up the Cc tag list and the Sob tag list in commit
      message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
     calls while creating MQD object to amdgpu_bo_create() once eviction
     fences are ready (Christian).

V7:
   - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
   - Use memdup_user instead of copy_from_user (Christian)

V9:
   - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
     that it can be reused for SDMA userqueues as well (Shashank, Alex)

V10: Addressed review comments from Alex
   - Made this patch independent of the IP engine (GFX/SDMA/Compute) and
     specific to MES V11 only, using the generic MQD structure.
   - Split out a separate patch to enable GFX support from here.
   - Verify that the MQD VA address is non-NULL.
   - Add a separate header file.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Change-Id: I855f895a4822ef015957542bc17eabb166b792e6
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |  3 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 98 +++++++++++++++++++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  | 30 ++++++
 3 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index dcf64b965bdf..d9bf70251eba 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -173,7 +173,8 @@ amdgpu-y += \
 amdgpu-y += \
 	amdgpu_mes.o \
 	mes_v11_0.o \
-	mes_v12_0.o
+	mes_v12_0.o \
+	mes_v11_0_userqueue.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index 000000000000..63fd48a5b8b0
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_gfx.h"
+#include "v11_structs.h"
+#include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
+
+static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
+				      struct drm_amdgpu_userq_in *args_in,
+				      struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
+	struct drm_amdgpu_userq_in *mqd_user = args_in;
+	struct amdgpu_mqd_prop *userq_props;
+	int r;
+
+	/* Structure to initialize MQD for userqueue using generic MQD init function */
+	userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
+	if (!userq_props) {
+		DRM_ERROR("Failed to allocate memory for userq_props\n");
+		return -ENOMEM;
+	}
+
+	if (!mqd_user->wptr_va || !mqd_user->rptr_va ||
+	    !mqd_user->queue_va || mqd_user->queue_size == 0) {
+		DRM_ERROR("Invalid MQD parameters for userqueue\n");
+		r = -EINVAL;
+		goto free_props;
+	}
+
+	r = amdgpu_userqueue_create_object(uq_mgr, &queue->mqd, mqd_hw_default->mqd_size);
+	if (r) {
+		DRM_ERROR("Failed to create MQD object for userqueue\n");
+		goto free_props;
+	}
+
+	/* Initialize the MQD BO with user given values */
+	userq_props->wptr_gpu_addr = mqd_user->wptr_va;
+	userq_props->rptr_gpu_addr = mqd_user->rptr_va;
+	userq_props->queue_size = mqd_user->queue_size;
+	userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
+	userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
+	userq_props->use_doorbell = true;
+
+	queue->userq_prop = userq_props;
+
+	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
+	if (r) {
+		DRM_ERROR("Failed to initialize MQD for userqueue\n");
+		goto free_mqd;
+	}
+
+	return 0;
+
+free_mqd:
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
+
+free_props:
+	kfree(userq_props);
+
+	return r;
+}
+
+static void
+mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
+			    struct amdgpu_usermode_queue *queue)
+{
+	kfree(queue->userq_prop);
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
+}
+
+const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
+	.mqd_create = mes_v11_0_userq_mqd_create,
+	.mqd_destroy = mes_v11_0_userq_mqd_destroy,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
new file mode 100644
index 000000000000..2c102361ca82
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef MES_V11_0_USERQ_H
+#define MES_V11_0_USERQ_H
+#include "amdgpu_userqueue.h"
+
+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
+#endif
-- 
2.45.1



* [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (4 preceding siblings ...)
  2024-09-09 20:05 ` [PATCH v11 05/28] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-10-18 17:39   ` Alex Deucher
  2024-09-09 20:05 ` [PATCH v11 07/28] drm/amdgpu: map usermode queue into MES Shashank Sharma
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

The MES FW expects us to allocate at least one page of context
space each for gang- and process-related context data. This
patch creates a joint object for both and calculates the GPU
address offsets of these spaces.

V1: Addressed review comments on RFC patch:
    Alex: Make this function IP specific

V2: Addressed review comments from Christian
    - Allocate only one object for total FW space, and calculate
      offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
    - Remove shadow from FW space list from cover letter (Alex)
    - Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
    Addressed review comments:
    - Use lower_32_bits instead of mask (Christian)
    - gfx_v11_0 instead of gfx_v11 in function names (Alex)
    - Shadow and GDS objects are now coming from userspace (Christian,
      Alex)

V6:
    - Add a comment to replace amdgpu_bo_create_kernel() with
      amdgpu_bo_create() during fw_ctx object creation (Christian).
    - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
      of generic queue structure and make it gen11 specific (Alex).

V7:
   - Using helper function to create/destroy userqueue objects.
   - Removed FW object space allocation.

V8:
   - Updating FW object address from user values.

V9:
   - Updated function names from gfx_v11_* to mes_v11_*

V10:
   - making this patch independent of IP based changes, moving any
     GFX object related changes in GFX specific patch (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 63fd48a5b8b0..2486ea2d72fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,31 @@
 #include "mes_v11_0.h"
 #include "mes_v11_0_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+					    struct amdgpu_usermode_queue *queue,
+					    struct drm_amdgpu_userq_in *mqd_user)
+{
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	int r, size;
+
+	/*
+	 * The FW expects at least one page space allocated for
+	 * process ctx and gang ctx each. Create an object
+	 * for the same.
+	 */
+	size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+	r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+	if (r) {
+		DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
+		return r;
+	}
+
+	return 0;
+}
+
 static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 				      struct drm_amdgpu_userq_in *args_in,
 				      struct amdgpu_usermode_queue *queue)
@@ -73,6 +98,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* Create BO for FW operations */
+	r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+	if (r) {
+		DRM_ERROR("Failed to create FW ctx space for userqueue (%d)\n", r);
+		goto free_mqd;
+	}
+
 	return 0;
 
 free_mqd:
@@ -88,6 +120,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *queue)
 {
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
+	struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.45.1



* [PATCH v11 07/28] drm/amdgpu: map usermode queue into MES
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (5 preceding siblings ...)
  2024-09-09 20:05 ` [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-09 20:05 ` [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART Shashank Sharma
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig

This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
    - Map/Unmap should be IP specific.
V2:
    Addressed review comments from Christian:
    - Fix the wptr_mc_addr calculation (moved into another patch)
    Addressed review comments from Alex:
    - Do not add fptrs for map/unmap

V3:  Integration with doorbell manager
V4:  Rebase
V5:  Use gfx_v11_0 for function names (Alex)
V6:  Removed queue->proc/gang/fw_ctx_address variables and did the
     address calculations locally to keep the queue structure
     GEN-independent (Alex)
V7:  Added R-B from Alex
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 2486ea2d72fe..a1bc6f488928 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,69 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+			       struct amdgpu_usermode_queue *queue,
+			       struct amdgpu_mqd_prop *userq_props)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	struct mes_add_queue_input queue_input;
+	int r;
+
+	memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+	queue_input.process_va_start = 0;
+	queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+
+	/* set process quantum to 10 ms and gang quantum to 1 ms as default */
+	queue_input.process_quantum = 100000;
+	queue_input.gang_quantum = 10000;
+	queue_input.paging = false;
+
+	queue_input.process_context_addr = ctx->gpu_addr;
+	queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+	queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+	queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+	queue_input.process_id = queue->vm->pasid;
+	queue_input.queue_type = queue->queue_type;
+	queue_input.mqd_addr = queue->mqd.gpu_addr;
+	queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+	queue_input.queue_size = userq_props->queue_size >> 2;
+	queue_input.doorbell_offset = userq_props->doorbell_index;
+	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+	amdgpu_mes_lock(&adev->mes);
+	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+	amdgpu_mes_unlock(&adev->mes);
+	if (r) {
+		DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+		return r;
+	}
+
+	DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
+	return 0;
+}
+
+static void mes_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+				  struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct mes_remove_queue_input queue_input;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	int r;
+
+	memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+	queue_input.doorbell_offset = queue->doorbell_index;
+	queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+
+	amdgpu_mes_lock(&adev->mes);
+	r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+	amdgpu_mes_unlock(&adev->mes);
+	if (r)
+		DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 					    struct amdgpu_usermode_queue *queue,
 					    struct drm_amdgpu_userq_in *mqd_user)
@@ -105,8 +168,18 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* Map userqueue into FW using MES */
+	r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
+	if (r) {
+		DRM_ERROR("Failed to map userqueue into FW\n");
+		goto free_ctx;
+	}
+
 	return 0;
 
+free_ctx:
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
+
 free_mqd:
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 
@@ -120,6 +193,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *queue)
 {
+	mes_v11_0_userq_unmap(uq_mgr, queue);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
-- 
2.45.1



* [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (6 preceding siblings ...)
  2024-09-09 20:05 ` [PATCH v11 07/28] drm/amdgpu: map usermode queue into MES Shashank Sharma
@ 2024-09-09 20:05 ` Shashank Sharma
  2024-09-16 12:39   ` Christian König
  2024-09-09 20:06 ` [PATCH v11 09/28] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:05 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

To support oversubscription, the MES FW expects WPTR BOs to be
mapped into the GART before workloads are submitted to usermode
queues. This patch adds a function for that.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
    - Either pin object or allocate from GART, but not both.
    - All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
    - Do not take vm->eviction_lock
    - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Remove unused adev (Harish)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 76 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a1bc6f488928..90511abaef05 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,73 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
+{
+	int ret;
+
+	ret = amdgpu_bo_reserve(bo, true);
+	if (ret) {
+		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+		goto err_reserve_bo_failed;
+	}
+
+	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+	if (ret) {
+		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+		goto err_map_bo_gart_failed;
+	}
+
+	amdgpu_bo_unreserve(bo);
+	bo = amdgpu_bo_ref(bo);
+
+	return 0;
+
+err_map_bo_gart_failed:
+	amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+	return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+			      struct amdgpu_usermode_queue *queue,
+			      uint64_t wptr)
+{
+	struct amdgpu_bo_va_mapping *wptr_mapping;
+	struct amdgpu_vm *wptr_vm;
+	struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+	int ret;
+
+	wptr_vm = queue->vm;
+	ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+	if (ret)
+		return ret;
+
+	wptr &= AMDGPU_GMC_HOLE_MASK;
+	wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+	amdgpu_bo_unreserve(wptr_vm->root.bo);
+	if (!wptr_mapping) {
+		DRM_ERROR("Failed to lookup wptr bo\n");
+		return -EINVAL;
+	}
+
+	wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+	if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+		DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
+		return -EINVAL;
+	}
+
+	ret = mes_v11_0_map_gtt_bo_to_gart(wptr_obj->obj);
+	if (ret) {
+		DRM_ERROR("Failed to map wptr bo to GART\n");
+		return ret;
+	}
+
+	queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+	return 0;
+}
+
 static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 			       struct amdgpu_usermode_queue *queue,
 			       struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +128,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 	queue_input.queue_size = userq_props->queue_size >> 2;
 	queue_input.doorbell_offset = userq_props->doorbell_index;
 	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+	queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
 
 	amdgpu_mes_lock(&adev->mes);
 	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -168,6 +236,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* FW expects WPTR BOs to be mapped into GART */
+	r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
+	if (r) {
+		DRM_ERROR("Failed to create WPTR mapping\n");
+		goto free_ctx;
+	}
+
 	/* Map userqueue into FW using MES */
 	r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
 	if (r) {
@@ -194,6 +269,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *queue)
 {
 	mes_v11_0_userq_unmap(uq_mgr, queue);
+	amdgpu_bo_unref(&queue->wptr_obj.obj);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 643f31474bd8..ffe8a3d73756 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
 	struct amdgpu_userq_obj fw_obj;
+	struct amdgpu_userq_obj wptr_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.45.1



* [PATCH v11 09/28] drm/amdgpu: generate doorbell index for userqueue
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (7 preceding siblings ...)
  2024-09-09 20:05 ` [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 10/28] drm/amdgpu: cleanup leftover queues Shashank Sharma
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig

Userspace sends us the doorbell object and the relative doorbell
index within the object to be used for the usermode queue, but the
FW expects the absolute doorbell index on the PCI BAR in the MQD.
This patch adds a function to convert this relative doorbell index
to the absolute doorbell index.

V5:  Fix the db object reference leak (Christian)
V6:  Pin the doorbell bo in userqueue_create() function, and unpin it
     in userqueue destroy (Christian)
V7:  Added missing kfree for queue in error cases
     Added Alex's R-B
V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 59 +++++++++++++++++++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 501324dde343..3c9f804478d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -94,6 +94,53 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
 	amdgpu_bo_unref(&userq_obj->obj);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+				     struct amdgpu_usermode_queue *queue,
+				     struct drm_file *filp,
+				     uint32_t doorbell_offset)
+{
+	uint64_t index;
+	struct drm_gem_object *gobj;
+	struct amdgpu_userq_obj *db_obj = &queue->db_obj;
+	int r;
+
+	gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+	if (gobj == NULL) {
+		DRM_ERROR("Can't find GEM object for doorbell\n");
+		return -EINVAL;
+	}
+
+	db_obj->obj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+	drm_gem_object_put(gobj);
+
+	/* Pin the BO before generating the index, unpin in queue destroy */
+	r = amdgpu_bo_pin(db_obj->obj, AMDGPU_GEM_DOMAIN_DOORBELL);
+	if (r) {
+		DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+		goto unref_bo;
+	}
+
+	r = amdgpu_bo_reserve(db_obj->obj, true);
+	if (r) {
+		DRM_ERROR("[Usermode queues] Failed to reserve doorbell object\n");
+		goto unpin_bo;
+	}
+
+	index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_obj->obj,
+					     doorbell_offset, sizeof(u64));
+	DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+	amdgpu_bo_unreserve(db_obj->obj);
+	return index;
+
+unpin_bo:
+	amdgpu_bo_unpin(db_obj->obj);
+
+unref_bo:
+	amdgpu_bo_unref(&db_obj->obj);
+	return r;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -114,6 +161,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 
 	uq_funcs = adev->userq_funcs[queue->queue_type];
 	uq_funcs->mqd_destroy(uq_mgr, queue);
+	amdgpu_bo_unpin(queue->db_obj.obj);
+	amdgpu_bo_unref(&queue->db_obj.obj);
 	idr_remove(&uq_mgr->userq_idr, queue_id);
 	kfree(queue);
 
@@ -129,6 +178,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	struct amdgpu_device *adev = uq_mgr->adev;
 	const struct amdgpu_userq_funcs *uq_funcs;
 	struct amdgpu_usermode_queue *queue;
+	uint64_t index;
 	int qid, r = 0;
 
 	if (args->in.flags) {
@@ -157,6 +207,15 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	queue->flags = args->in.flags;
 	queue->vm = &fpriv->vm;
 
+	/* Convert relative doorbell offset into absolute doorbell index */
+	index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
+	if (index == (uint64_t)-EINVAL) {
+		DRM_ERROR("Failed to get doorbell for queue\n");
+		kfree(queue);
+		goto unlock;
+	}
+	queue->doorbell_index = index;
+
 	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
 	if (r) {
 		DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 90511abaef05..bc9ce5233a7d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -220,6 +220,7 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 	userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
 	userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
 	userq_props->use_doorbell = true;
+	userq_props->doorbell_index = queue->doorbell_index;
 
 	queue->userq_prop = userq_props;
 
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index ffe8a3d73756..a653e31350c5 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
+	struct amdgpu_userq_obj	db_obj;
 	struct amdgpu_userq_obj fw_obj;
 	struct amdgpu_userq_obj wptr_obj;
 };
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 10/28] drm/amdgpu: cleanup leftover queues
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (8 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 09/28] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 11/28] drm/amdgpu: enable GFX-V11 userqueue support Shashank Sharma
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig,
	Bas Nieuwenhuizen

This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.

V7:  Added Alex's R-B
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 27 ++++++++++++++-----
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 3c9f804478d5..64a063ec3b27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -26,6 +26,19 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+			 struct amdgpu_usermode_queue *queue,
+			 int queue_id)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[queue->queue_type];
+
+	uq_funcs->mqd_destroy(uq_mgr, queue);
+	idr_remove(&uq_mgr->userq_idr, queue_id);
+	kfree(queue);
+}
+
 static struct amdgpu_usermode_queue *
 amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 {
@@ -146,8 +159,6 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-	struct amdgpu_device *adev = uq_mgr->adev;
-	const struct amdgpu_userq_funcs *uq_funcs;
 	struct amdgpu_usermode_queue *queue;
 
 	mutex_lock(&uq_mgr->userq_mutex);
@@ -159,13 +170,9 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 		return -EINVAL;
 	}
 
-	uq_funcs = adev->userq_funcs[queue->queue_type];
-	uq_funcs->mqd_destroy(uq_mgr, queue);
 	amdgpu_bo_unpin(queue->db_obj.obj);
 	amdgpu_bo_unref(&queue->db_obj.obj);
-	idr_remove(&uq_mgr->userq_idr, queue_id);
-	kfree(queue);
-
+	amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
 	mutex_unlock(&uq_mgr->userq_mutex);
 	return 0;
 }
@@ -276,6 +283,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+	uint32_t queue_id;
+	struct amdgpu_usermode_queue *queue;
+
+	idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+		amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
 	idr_destroy(&userq_mgr->userq_idr);
 	mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 11/28] drm/amdgpu: enable GFX-V11 userqueue support
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (9 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 10/28] drm/amdgpu: cleanup leftover queues Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 12/28] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

This patch enables GFX-v11 IP support in the usermode queue base
code. Specifically, it:
- adds a GFX-v11 specific MQD structure
- sets up the IP functions to create and destroy MQDs
- programs the MQD with the objects coming from userspace

V10: introduced this separate patch for GFX V11 enabling (Alex).
V11: Addressed review comments:
     - update the comments in GFX mqd structure informing user about using
       the INFO IOCTL for object sizes (Alex)
     - rename struct drm_amdgpu_userq_mqd_gfx_v11 to
       drm_amdgpu_userq_mqd_gfx11 (Marek)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  6 ++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  3 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 28 +++++++++++++++++++
 include/uapi/drm/amdgpu_drm.h                 | 19 +++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 64a063ec3b27..5cb984c509c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -188,6 +188,12 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	uint64_t index;
 	int qid, r = 0;
 
+	/* Usermode queues are only supported for GFX IP as of now */
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+		return -EINVAL;
+	}
+
 	if (args->in.flags) {
 		DRM_ERROR("Usermode queue flags not supported yet\n");
 		return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index d3e8be82a172..e68874fd0ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -49,6 +49,7 @@
 #include "gfx_v11_0_3.h"
 #include "nbio_v4_3.h"
 #include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
 
 #define GFX11_NUM_GFX_RINGS		1
 #define GFX11_MEC_HPD_SIZE	2048
@@ -1552,6 +1553,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 2;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		break;
 	case IP_VERSION(11, 0, 1):
 	case IP_VERSION(11, 0, 4):
@@ -1564,6 +1566,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 1;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		break;
 	default:
 		adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index bc9ce5233a7d..bcfa0d1ef7bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -180,6 +180,34 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 		return r;
 	}
 
+	/* Shadow, GDS and CSA objects come directly from userspace */
+	if (mqd_user->ip_type == AMDGPU_HW_IP_GFX) {
+		struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+		struct drm_amdgpu_userq_mqd_gfx11 *mqd_gfx_v11;
+
+		if (mqd_user->mqd_size != sizeof(*mqd_gfx_v11) || !mqd_user->mqd) {
+			DRM_ERROR("Invalid GFX MQD\n");
+			return -EINVAL;
+		}
+
+		mqd_gfx_v11 = memdup_user(u64_to_user_ptr(mqd_user->mqd), mqd_user->mqd_size);
+		if (IS_ERR(mqd_gfx_v11)) {
+			DRM_ERROR("Failed to read user MQD\n");
+			amdgpu_userqueue_destroy_object(uq_mgr, ctx);
+			return -ENOMEM;
+		}
+
+		mqd->shadow_base_lo = mqd_gfx_v11->shadow_va & 0xFFFFFFFC;
+		mqd->shadow_base_hi = upper_32_bits(mqd_gfx_v11->shadow_va);
+
+		mqd->gds_bkup_base_lo = mqd_gfx_v11->gds_va & 0xFFFFFFFC;
+		mqd->gds_bkup_base_hi = upper_32_bits(mqd_gfx_v11->gds_va);
+
+		mqd->fw_work_area_base_lo = mqd_gfx_v11->csa_va & 0xFFFFFFFC;
+		mqd->fw_work_area_base_hi = upper_32_bits(mqd_gfx_v11->csa_va);
+		kfree(mqd_gfx_v11);
+	}
+
 	return 0;
 }
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index bd8d47a55553..895d64982498 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -409,6 +409,25 @@ union drm_amdgpu_userq {
 	struct drm_amdgpu_userq_out out;
 };
 
+/* GFX V11 IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_gfx11 {
+	/**
+	 * @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
+	 * Use AMDGPU_INFO_IOCTL to find the exact size of the object.
+	 */
+	__u64   shadow_va;
+	/**
+	 * @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
+	 * Use AMDGPU_INFO_IOCTL to find the exact size of the object.
+	 */
+	__u64   gds_va;
+	/**
+	 * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+	 * Use AMDGPU_INFO_IOCTL to find the exact size of the object.
+	 */
+	__u64   csa_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.45.1



* [PATCH v11 12/28] drm/amdgpu: enable SDMA usermode queues
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (10 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 11/28] drm/amdgpu: enable GFX-V11 userqueue support Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 13/28] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx
  Cc: Arvind Yadav, Christian König, Alex Deucher, Shashank Sharma,
	Srinivasan Shanmugam

From: Arvind Yadav <Arvind.Yadav@amd.com>

This patch makes the necessary modifications to enable SDMA
usermode queues using the existing userqueue infrastructure.

V9:  introduced this patch in the series
V10: use header file instead of extern (Alex)
V11: rename drm_amdgpu_userq_mqd_sdma_gfx_v11 to
     drm_amdgpu_userq_mqd_sdma_gfx11 (Marek)

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Change-Id: I782acfc08fef0fa5302e665173788fc03dbc51e1
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c  |  2 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c   | 18 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c         |  2 ++
 include/uapi/drm/amdgpu_drm.h                  | 10 ++++++++++
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 5cb984c509c2..2c5747cc492e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	int qid, r = 0;
 
 	/* Usermode queues are only supported for GFX IP as of now */
-	if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
 		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index bcfa0d1ef7bf..dc5359742774 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -206,6 +206,24 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 		mqd->fw_work_area_base_lo = mqd_gfx_v11->csa_va & 0xFFFFFFFC;
 		mqd->fw_work_area_base_hi = upper_32_bits(mqd_gfx_v11->csa_va);
 		kfree(mqd_gfx_v11);
+	} else if (mqd_user->ip_type == AMDGPU_HW_IP_DMA) {
+		struct v11_sdma_mqd *mqd = queue->mqd.cpu_ptr;
+		struct drm_amdgpu_userq_mqd_sdma_gfx11 *mqd_sdma_v11;
+
+		if (mqd_user->mqd_size != sizeof(*mqd_sdma_v11) || !mqd_user->mqd) {
+			DRM_ERROR("Invalid SDMA MQD\n");
+			return -EINVAL;
+		}
+
+		mqd_sdma_v11 = memdup_user(u64_to_user_ptr(mqd_user->mqd), mqd_user->mqd_size);
+		if (IS_ERR(mqd_sdma_v11)) {
+			DRM_ERROR("Failed to read sdma user MQD\n");
+			amdgpu_userqueue_destroy_object(uq_mgr, ctx);
+			return -ENOMEM;
+		}
+
+		mqd->sdmax_rlcx_csa_addr_lo = mqd_sdma_v11->csa_va & 0xFFFFFFFC;
+		mqd->sdmax_rlcx_csa_addr_hi = upper_32_bits(mqd_sdma_v11->csa_va);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 208a1fa9d4e7..62f6f015c685 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -43,6 +43,7 @@
 #include "sdma_common.h"
 #include "sdma_v6_0.h"
 #include "v11_structs.h"
+#include "mes_v11_0_userqueue.h"
 
 MODULE_FIRMWARE("amdgpu/sdma_6_0_0.bin");
 MODULE_FIRMWARE("amdgpu/sdma_6_0_1.bin");
@@ -1340,6 +1341,7 @@ static int sdma_v6_0_sw_init(void *handle)
 	else
 		DRM_ERROR("Failed to allocated memory for SDMA IP Dump\n");
 
+	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
 	return r;
 }
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 895d64982498..3ea067242b19 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -428,6 +428,16 @@ struct drm_amdgpu_userq_mqd_gfx11 {
 	__u64   csa_va;
 };
 
+/* GFX V11 SDMA IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_sdma_gfx11 {
+	/**
+	 * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+	 * This must be from a separate GPU object; use the AMDGPU_INFO
+	 * IOCTL to get the size.
+	 */
+	__u64   csa_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.45.1



* [PATCH v11 13/28] drm/amdgpu: enable compute/gfx usermode queue
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (11 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 12/28] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 14/28] drm/amdgpu: update userqueue BOs and PDs Shashank Sharma
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

This patch makes the changes required to enable compute
workload support using the existing usermode queue
infrastructure.

V9:  Patch introduced
V10: Add custom IP specific mqd structure for compute (Alex)
V11: Rename drm_amdgpu_userq_mqd_compute_gfx_v11 to
     drm_amdgpu_userq_mqd_compute_gfx11 (Marek)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  4 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  2 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 23 +++++++++++++++++++
 include/uapi/drm/amdgpu_drm.h                 | 10 ++++++++
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 2c5747cc492e..5173718c3848 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,9 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	int qid, r = 0;
 
 	/* Usermode queues are only supported for GFX IP as of now */
-	if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX &&
+	    args->in.ip_type != AMDGPU_HW_IP_DMA &&
+	    args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
 		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index e68874fd0ff9..82a8df56240e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1554,6 +1554,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
 		break;
 	case IP_VERSION(11, 0, 1):
 	case IP_VERSION(11, 0, 4):
@@ -1567,6 +1568,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
 		break;
 	default:
 		adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index dc5359742774..e70b8e429e9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -268,6 +268,29 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 	userq_props->use_doorbell = true;
 	userq_props->doorbell_index = queue->doorbell_index;
 
+	if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+		struct drm_amdgpu_userq_mqd_compute_gfx11 *compute_mqd;
+
+		if (mqd_user->mqd_size != sizeof(*compute_mqd)) {
+			DRM_ERROR("Invalid compute IP MQD size\n");
+			r = -EINVAL;
+			goto free_mqd;
+		}
+
+		compute_mqd = memdup_user(u64_to_user_ptr(mqd_user->mqd), mqd_user->mqd_size);
+		if (IS_ERR(compute_mqd)) {
+			DRM_ERROR("Failed to read user MQD\n");
+			r = -ENOMEM;
+			goto free_mqd;
+		}
+
+		userq_props->eop_gpu_addr = compute_mqd->eop_va;
+		userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+		userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+		userq_props->hqd_active = false;
+		kfree(compute_mqd);
+	}
+
 	queue->userq_prop = userq_props;
 
 	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3ea067242b19..6eac46e0f3fd 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -438,6 +438,16 @@ struct drm_amdgpu_userq_mqd_sdma_gfx11 {
 	__u64   csa_va;
 };
 
+/* GFX V11 Compute IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_compute_gfx11 {
+	/**
+	 * @eop_va: Virtual address of the GPU memory to hold the EOP buffer.
+	 * This must be from a separate GPU object, and must be at least
+	 * one page in size.
+	 */
+	__u64   eop_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.45.1



* [PATCH v11 14/28] drm/amdgpu: update userqueue BOs and PDs
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (12 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 13/28] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 15/28] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx
  Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Felix Kuehling,
	Arvind Yadav

This patch updates the VM_IOCTL to allow userspace to synchronize
the mapping/unmapping of a BO in the page table.

The major changes are:
- it adds a DRM timeline syncobj (handle and point) as an input
  parameter to the VM IOCTL.
- this object is used by the kernel to sync the update of the BO in
  the page table during the mapping of the object.
- the kernel also synchronizes the tlb flush of the page table entry of
  this object during the unmapping (Added in this series:
  https://patchwork.freedesktop.org/series/131276/ and
  https://patchwork.freedesktop.org/patch/584182/)
- the userspace can wait on this timeline, and then the BO is ready to
  be consumed by the GPU.

V2:
 - remove the eviction fence coupling

V3:
 - added the drm timeline support instead of input/output fence
   (Christian)

V4:
 - made timeline 64-bit (Christian)
 - bug fix (Arvind)

V5: GLCTS bug fix (Arvind)
V6: Rename syncobj_handle -> timeline_syncobj_out
    Rename point -> timeline_point_in (Marek)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Change-Id: I0942942641e095408a95d4ab6e2e9d813f0f78db
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 14 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 89 ++++++++++++++++++-
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  3 +
 include/uapi/drm/amdgpu_drm.h                 |  4 +
 4 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index ebb3f87ef4f6..f4529f2fad97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -647,7 +647,7 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 	if (!amdgpu_vm_ready(vm))
 		return;
 
-	r = amdgpu_vm_clear_freed(adev, vm, NULL);
+	r = amdgpu_vm_clear_freed(adev, vm, &vm->last_update);
 	if (r)
 		goto error;
 
@@ -825,10 +825,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	default:
 		break;
 	}
-	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !adev->debug_vm)
+	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !adev->debug_vm) {
 		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
 					args->operation);
 
+		if (args->timeline_syncobj_out && args->timeline_point_in) {
+			r = amdgpu_userqueue_update_bo_mapping(filp, bo_va, args->operation,
+							       args->timeline_syncobj_out,
+							       args->timeline_point_in);
+			if (r) {
+				DRM_ERROR("Failed to update userqueue mapping (%u)\n", r);
+			}
+		}
+	}
+
 error:
 	drm_exec_fini(&exec);
 	drm_gem_object_put(gobj);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 5173718c3848..c9cc935caabd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -21,7 +21,7 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-
+#include <drm/drm_syncobj.h>
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
@@ -154,6 +154,87 @@ amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
 	return r;
 }
 
+static int
+amdgpu_userqueue_validate_vm_bo(void *_unused, struct amdgpu_bo *bo)
+{
+	struct ttm_operation_ctx ctx = { false, false };
+	int ret;
+
+	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
+
+	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (ret)
+		DRM_ERROR("Fail to validate\n");
+
+	return ret;
+}
+
+int amdgpu_userqueue_update_bo_mapping(struct drm_file *filp, struct amdgpu_bo_va *bo_va,
+				       uint32_t operation, uint32_t syncobj_handle,
+				       uint64_t point)
+{
+	struct amdgpu_bo *bo = bo_va ? bo_va->base.bo : NULL;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_vm *vm = &fpriv->vm;
+	struct drm_syncobj *syncobj;
+	struct dma_fence_chain *chain;
+	struct dma_fence *last_update;
+
+	/*  Find the sync object */
+	syncobj = drm_syncobj_find(filp, syncobj_handle);
+	if (!syncobj)
+		return -ENOENT;
+
+	/* Allocate the chain node */
+	chain = dma_fence_chain_alloc();
+	if (!chain) {
+		drm_syncobj_put(syncobj);
+		return -ENOMEM;
+	}
+
+	/*  Determine the last update fence */
+	if ((bo && (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)) ||
+	    (operation == AMDGPU_VA_OP_UNMAP) ||
+	    (operation == AMDGPU_VA_OP_CLEAR))
+		last_update = vm->last_update;
+	else
+		last_update = bo_va->last_pt_update;
+
+	/* Add given point to timeline */
+	drm_syncobj_add_point(syncobj, chain, last_update, point);
+	return 0;
+}
+
+static int
+amdgpu_userqueue_update_vm(struct amdgpu_userq_mgr *uq_mgr,
+			   struct amdgpu_vm *vm)
+{
+	int ret;
+
+	ret = amdgpu_bo_reserve(vm->root.bo, true);
+	if (ret) {
+		DRM_ERROR("Reserve failed\n");
+		return ret;
+	}
+
+	/* Validate page directory of the vm */
+	ret = amdgpu_userqueue_validate_vm_bo(NULL, vm->root.bo);
+	if (ret) {
+		DRM_ERROR("Failed to validate PT BOs\n");
+		goto unresv;
+	}
+
+	ret = amdgpu_bo_sync_wait(vm->root.bo, AMDGPU_FENCE_OWNER_VM, false);
+	if (ret) {
+		DRM_ERROR("Sync failed\n");
+		goto unresv;
+	}
+
+unresv:
+	amdgpu_bo_unreserve(vm->root.bo);
+	return ret;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -222,6 +303,12 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	queue->flags = args->in.flags;
 	queue->vm = &fpriv->vm;
 
+	r = amdgpu_userqueue_update_vm(uq_mgr, queue->vm);
+	if (r) {
+		DRM_ERROR("Failed to update vm\n");
+		goto unlock;
+	}
+
 	/* Convert relative doorbell offset into absolute doorbell index */
 	index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
 	if (index == (uint64_t)-EINVAL) {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index a653e31350c5..d31e43404640 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -76,4 +76,7 @@ int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
 
 void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
 				     struct amdgpu_userq_obj *userq_obj);
+int amdgpu_userqueue_update_bo_mapping(struct drm_file *filp, struct amdgpu_bo_va *bo_va,
+				       uint32_t operation, uint32_t syncobj_handle,
+				       uint64_t point);
 #endif
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 6eac46e0f3fd..7367e72a38e9 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -721,6 +721,10 @@ struct drm_amdgpu_gem_va {
 	__u64 offset_in_bo;
 	/** Specify mapping size. Must be correctly aligned. */
 	__u64 map_size;
+	/** Sync object handle to wait for userqueue sync */
+	__u32 timeline_syncobj_out;
+	/** Timeline point */
+	__u64 timeline_point_in;
 };
 
 #define AMDGPU_HW_IP_GFX          0
-- 
2.45.1



* [PATCH v11 15/28] drm/amdgpu: add kernel config for gfx-userqueue
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (13 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 14/28] drm/amdgpu: update userqueue BOs and PDs Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers Shashank Sharma
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

This patch:
- adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
- moves the userqueue initialization code for all IPs under
  this flag

so that the userqueue works only when the config is enabled.

V9:  Introduce this patch
V10: Call it CONFIG_DRM_AMDGPU_NAVI3X_USERQ instead of
     CONFIG_DRM_AMDGPU_USERQ_GFX (Christian)
V11: Add GFX in the config help description message.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Change-Id: I509a1fc9eb9ae1adddd1e042ae4456737333a606
---
 drivers/gpu/drm/amd/amdgpu/Kconfig     | 8 ++++++++
 drivers/gpu/drm/amd/amdgpu/Makefile    | 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 0051fb1b437f..b7f41177b3b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -92,6 +92,14 @@ config DRM_AMDGPU_WERROR
 	  Add -Werror to the build flags for amdgpu.ko.
 	  Only enable this if you are warning code for amdgpu.ko.
 
+config DRM_AMDGPU_NAVI3X_USERQ
+	bool "Enable Navi 3x gfx usermode queues"
+	depends on DRM_AMDGPU
+	default n
+	help
+	  Choose this option to enable GFX usermode queue support for GFX/SDMA/Compute
+	  workload submission. This feature is supported on Navi 3X only.
+
 source "drivers/gpu/drm/amd/acp/Kconfig"
 source "drivers/gpu/drm/amd/display/Kconfig"
 source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index d9bf70251eba..beb8442b4e3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -174,7 +174,9 @@ amdgpu-y += \
 	amdgpu_mes.o \
 	mes_v11_0.o \
 	mes_v12_0.o \
-	mes_v11_0_userqueue.o
+
+# add GFX userqueue support
+amdgpu-$(CONFIG_DRM_AMDGPU_NAVI3X_USERQ) += mes_v11_0_userqueue.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 82a8df56240e..f3d034f2d4fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1553,8 +1553,10 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 2;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
 		break;
 	case IP_VERSION(11, 0, 1):
 	case IP_VERSION(11, 0, 4):
@@ -1567,8 +1569,10 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 1;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
 		break;
 	default:
 		adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 62f6f015c685..bb11917ad855 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1341,7 +1341,10 @@ static int sdma_v6_0_sw_init(void *handle)
 	else
 		DRM_ERROR("Failed to allocated memory for SDMA IP Dump\n");
 
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
 	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
 	return r;
 }
 
-- 
2.45.1



* [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (14 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 15/28] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-16 14:14   ` Christian König
  2024-09-09 20:06 ` [PATCH v11 22/28] drm/amdgpu: add userqueue suspend/resume functions Shashank Sharma
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Christian Koenig, Alex Deucher, Arvind Yadav

This patch adds a basic eviction fence framework for the gfx buffers.
The idea is:
- One eviction fence is created per gfx process, at kms_open.
- This fence is attached to all the gem buffers created
  by this process.
- This fence is detached from all the gem buffers at postclose_kms.

This framework will be further used for usermode queues.

V2: Addressed review comments from Christian
    - keep fence_ctx and fence_seq directly in fpriv
    - eviction_fence should be dynamically allocated
    - do not save eviction fence instance in BO, there could be many
      such fences attached to one BO
    - use dma_resv_replace_fence() in detach

V3: Addressed review comments from Christian
    - eviction fence create and destroy functions should be called only once
      from fpriv create/destroy
    - use dma_fence_put() in eviction_fence_destroy

V4: Addressed review comments from Christian:
    - create a separate ev_fence_mgr structure
    - cleanup fence init part
    - do not add a domain for fence owner KGD

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Change-Id: I7a8d27d7172bafbfe34aa9decf2cd36655948275
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +-
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 148 ++++++++++++++++++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  65 ++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
 6 files changed, 231 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index ff5621697c68..0643078d1225 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -66,7 +66,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
 	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
 	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
 	amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o amdgpu_dev_coredump.o \
-	amdgpu_userq_fence.o
+	amdgpu_userq_fence.o amdgpu_eviction_fence.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 76ada47b1875..0013bfc74024 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -113,6 +113,7 @@
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
 #include "amdgpu_userqueue.h"
+#include "amdgpu_eviction_fence.h"
 #if defined(CONFIG_DRM_AMD_ISP)
 #include "amdgpu_isp.h"
 #endif
@@ -481,7 +482,6 @@ struct amdgpu_flip_work {
 	bool				async;
 };
 
-
 /*
  * file private structure
  */
@@ -495,6 +495,10 @@ struct amdgpu_fpriv {
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
 	struct amdgpu_userq_mgr	userq_mgr;
+
+	/* Eviction fence infra */
+	struct amdgpu_eviction_fence_mgr evf_mgr;
+
 	/** GPU partition selection */
 	uint32_t		xcp_id;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
new file mode 100644
index 000000000000..2d474cb11cf9
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include <linux/sched.h>
+#include "amdgpu.h"
+
+static const char *
+amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
+{
+	return "amdgpu";
+}
+
+static const char *
+amdgpu_eviction_fence_get_timeline_name(struct dma_fence *f)
+{
+	struct amdgpu_eviction_fence *ef;
+
+	ef = container_of(f, struct amdgpu_eviction_fence, base);
+	return ef->timeline_name;
+}
+
+static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
+	.use_64bit_seqno = true,
+	.get_driver_name = amdgpu_eviction_fence_get_driver_name,
+	.get_timeline_name = amdgpu_eviction_fence_get_timeline_name,
+};
+
+int amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr *evf_mgr)
+{
+	int ret;
+
+	spin_lock(&evf_mgr->ev_fence_lock);
+	ret = dma_fence_signal(&evf_mgr->ev_fence->base);
+	spin_unlock(&evf_mgr->ev_fence_lock);
+	return ret;
+}
+
+struct amdgpu_eviction_fence *
+amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
+{
+	struct amdgpu_eviction_fence *ev_fence;
+
+	ev_fence = kzalloc(sizeof(*ev_fence), GFP_KERNEL);
+	if (!ev_fence)
+		return NULL;
+
+	get_task_comm(ev_fence->timeline_name, current);
+	spin_lock_init(&ev_fence->lock);
+	dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops,
+		       &ev_fence->lock, evf_mgr->ev_fence_ctx,
+		       atomic_inc_return(&evf_mgr->ev_fence_seq));
+	return ev_fence;
+}
+
+void amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr *evf_mgr)
+{
+	if (!evf_mgr->ev_fence)
+		return;
+
+	if (!dma_fence_is_signaled(&evf_mgr->ev_fence->base))
+		dma_fence_wait(&evf_mgr->ev_fence->base, false);
+
+	/* Last unref of ev_fence */
+	spin_lock(&evf_mgr->ev_fence_lock);
+	dma_fence_put(&evf_mgr->ev_fence->base);
+	evf_mgr->ev_fence = NULL;
+	spin_unlock(&evf_mgr->ev_fence_lock);
+}
+
+int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
+				 struct amdgpu_bo *bo)
+{
+	struct dma_fence *ef;
+	struct amdgpu_eviction_fence *ev_fence = evf_mgr->ev_fence;
+	struct dma_resv *resv = bo->tbo.base.resv;
+	int ret;
+
+	if (!ev_fence || !resv)
+		return 0;
+
+	ef = &ev_fence->base;
+	ret = dma_resv_reserve_fences(resv, 1);
+	if (ret) {
+		dma_fence_wait(ef, false);
+		return ret;
+	}
+
+	spin_lock(&evf_mgr->ev_fence_lock);
+	dma_resv_add_fence(resv, ef, DMA_RESV_USAGE_BOOKKEEP);
+	spin_unlock(&evf_mgr->ev_fence_lock);
+	return 0;
+}
+
+void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
+				  struct amdgpu_bo *bo)
+{
+	struct dma_fence *stub;
+	struct amdgpu_eviction_fence *ev_fence = evf_mgr->ev_fence;
+
+	if (!ev_fence)
+		return;
+
+	spin_lock(&evf_mgr->ev_fence_lock);
+	stub = dma_fence_get_stub();
+	dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
+				stub, DMA_RESV_USAGE_BOOKKEEP);
+	dma_fence_put(stub);
+	spin_unlock(&evf_mgr->ev_fence_lock);
+}
+
+void amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr)
+{
+
+	/* This needs to be done one time per open */
+	atomic_set(&evf_mgr->ev_fence_seq, 0);
+	evf_mgr->ev_fence_ctx = dma_fence_context_alloc(1);
+	spin_lock_init(&evf_mgr->ev_fence_lock);
+
+	spin_lock(&evf_mgr->ev_fence_lock);
+	evf_mgr->ev_fence = amdgpu_eviction_fence_create(evf_mgr);
+	if (!evf_mgr->ev_fence) {
+		DRM_ERROR("Failed to create eviction fence\n");
+		goto unlock;
+	}
+
+unlock:
+	spin_unlock(&evf_mgr->ev_fence_lock);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
new file mode 100644
index 000000000000..b47ab1307ec5
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDGPU_EV_FENCE_H_
+#define AMDGPU_EV_FENCE_H_
+
+struct amdgpu_eviction_fence {
+	struct dma_fence base;
+	spinlock_t	 lock;
+	char		 timeline_name[TASK_COMM_LEN];
+	struct amdgpu_userq_mgr *uq_mgr;
+};
+
+struct amdgpu_eviction_fence_mgr {
+	u64			ev_fence_ctx;
+	atomic_t		ev_fence_seq;
+	spinlock_t 		ev_fence_lock;
+	struct amdgpu_eviction_fence *ev_fence;
+};
+
+/* Eviction fence helper functions */
+struct amdgpu_eviction_fence *
+amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr);
+
+void
+amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr *evf_mgr);
+
+int
+amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
+			     struct amdgpu_bo *bo);
+
+void
+amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
+			     struct amdgpu_bo *bo);
+
+void
+amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr);
+
+int
+amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr *evf_mgr);
+
+int
+amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv);
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index f4529f2fad97..c9b4a6ce3f14 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -186,6 +186,13 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
 		bo_va = amdgpu_vm_bo_add(adev, vm, abo);
 	else
 		++bo_va->ref_count;
+
+	if (!vm->is_compute_context || !vm->process_info) {
+		/* attach gfx eviction fence */
+		if (amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo))
+			DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
+	}
+
 	amdgpu_bo_unreserve(abo);
 
 	/* Validate and add eviction fence to DMABuf imports with dynamic
@@ -236,6 +243,8 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	struct drm_exec exec;
 	long r;
 
+	amdgpu_eviction_fence_detach(&fpriv->evf_mgr, bo);
+
 	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
 	drm_exec_until_all_locked(&exec) {
 		r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 019a377620ce..e7fb13e20197 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1391,6 +1391,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 	mutex_init(&fpriv->bo_list_lock);
 	idr_init_base(&fpriv->bo_list_handles, 1);
 
+	amdgpu_eviction_fence_init(&fpriv->evf_mgr);
+
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
 	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
@@ -1464,6 +1466,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 		amdgpu_bo_unreserve(pd);
 	}
 
+	amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
 	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 22/28] drm/amdgpu: add userqueue suspend/resume functions
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (15 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues Shashank Sharma
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

This patch adds userqueue suspend/resume functions at the
core MES V11 IP level.

V2: use true/false for queue_active status (Christian)
    added Christian's R-B

V3: reset/set queue status in mqd.create and mqd.destroy

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  5 +++
 2 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index b3aa49ff1a87..51c9a215ae77 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -331,6 +331,7 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_ctx;
 	}
 
+	queue->queue_active = true;
 	return 0;
 
 free_ctx:
@@ -354,9 +355,41 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
+	queue->queue_active = false;
+}
+
+static int mes_v11_0_userq_suspend(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_usermode_queue *queue)
+{
+	if (queue->queue_active) {
+		mes_v11_0_userq_unmap(uq_mgr, queue);
+		queue->queue_active = false;
+	}
+
+	return 0;
+}
+
+static int mes_v11_0_userq_resume(struct amdgpu_userq_mgr *uq_mgr,
+				  struct amdgpu_usermode_queue *queue)
+{
+	int ret;
+
+	if (queue->queue_active)
+		return 0;
+
+	ret = mes_v11_0_userq_map(uq_mgr, queue, queue->userq_prop);
+	if (ret) {
+		DRM_ERROR("Failed to resume queue\n");
+		return ret;
+	}
+
+	queue->queue_active = true;
+	return 0;
 }
 
 const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
 	.mqd_create = mes_v11_0_userq_mqd_create,
 	.mqd_destroy = mes_v11_0_userq_mqd_destroy,
+	.suspend = mes_v11_0_userq_suspend,
+	.resume = mes_v11_0_userq_resume,
 };
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 77a33f9e37f8..37be29048f42 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -37,6 +37,7 @@ struct amdgpu_userq_obj {
 
 struct amdgpu_usermode_queue {
 	int			queue_type;
+	uint8_t			queue_active;
 	uint64_t		doorbell_handle;
 	uint64_t		doorbell_index;
 	uint64_t		flags;
@@ -57,6 +58,10 @@ struct amdgpu_userq_funcs {
 			  struct amdgpu_usermode_queue *queue);
 	void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *uq);
+	int (*suspend)(struct amdgpu_userq_mgr *uq_mgr,
+		       struct amdgpu_usermode_queue *queue);
+	int (*resume)(struct amdgpu_userq_mgr *uq_mgr,
+		      struct amdgpu_usermode_queue *queue);
 };
 
 /* Usermode queues for gfx */
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (16 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 22/28] drm/amdgpu: add userqueue suspend/resume functions Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-17 11:58   ` Christian König
  2024-09-09 20:06 ` [PATCH v11 24/28] drm/amdgpu: resume " Shashank Sharma
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

This patch adds suspend support for gfx userqueues. It typically does
the following:
- adds an enable_signaling function for the eviction fence, so that it
  can trigger the userqueue suspend,
- adds a delayed work function for suspending the userqueues, which
  suspends all the queues under this userq manager and signals the
  eviction fence,
- adds a reference to the userq manager in the eviction fence container
  so that it can be used in the suspend function.

V2: Addressed Christian's review comments:
    - schedule suspend work immediately

V4: Addressed Christian's review comments:
    - wait for pending uq fences before starting suspend, added
      queue->last_fence for the same
    - accommodate ev_fence_mgr into existing code
    - some bug fixes and NULL checks

V5: Addressed Christian's review comments (gitlab)
    - Wait for eviction fence to get signaled in destroy, dont signal it
    - Wait for eviction fence to get signaled in replace fence, dont signal it

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Change-Id: Ib60a7feda5544e3badc87bd1a991931ee726ee82
---
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 149 ++++++++++++++++++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   2 +
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 100 ++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  10 ++
 6 files changed, 272 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 2d474cb11cf9..3d4fc704adb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -22,8 +22,12 @@
  *
  */
 #include <linux/sched.h>
+#include <drm/drm_exec.h>
 #include "amdgpu.h"
 
+#define work_to_evf_mgr(w, name) container_of(w, struct amdgpu_eviction_fence_mgr, name)
+#define evf_mgr_to_fpriv(e) container_of(e, struct amdgpu_fpriv, evf_mgr)
+
 static const char *
 amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
 {
@@ -39,10 +43,150 @@ amdgpu_eviction_fence_get_timeline_name(struct dma_fence *f)
 	return ef->timeline_name;
 }
 
+static void
+amdgpu_eviction_fence_update_fence(struct amdgpu_eviction_fence_mgr *evf_mgr,
+				   struct amdgpu_eviction_fence *new_ef)
+{
+	struct dma_fence *old_ef = &evf_mgr->ev_fence->base;
+
+	spin_lock(&evf_mgr->ev_fence_lock);
+	dma_fence_put(old_ef);
+	evf_mgr->ev_fence = new_ef;
+	spin_unlock(&evf_mgr->ev_fence_lock);
+}
+
+int
+amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv)
+{
+	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
+	struct amdgpu_vm *vm = &fpriv->vm;
+	struct amdgpu_eviction_fence *old_ef, *new_ef;
+	struct amdgpu_bo_va *bo_va, *tmp;
+	int ret;
+
+	old_ef = evf_mgr->ev_fence;
+	if (old_ef && !dma_fence_is_signaled(&old_ef->base)) {
+		DRM_DEBUG_DRIVER("Old EF not signaled yet\n");
+		dma_fence_wait(&old_ef->base, true);
+	}
+
+	new_ef = amdgpu_eviction_fence_create(evf_mgr);
+	if (!new_ef) {
+		DRM_ERROR("Failed to create new eviction fence\n");
+		return -ENOMEM;
+	}
+
+	/* Replace fences and free old one */
+	amdgpu_eviction_fence_update_fence(evf_mgr, new_ef);
+
+	/* Attach new eviction fence to BOs */
+	list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+		struct amdgpu_bo *bo = bo_va->base.bo;
+
+		if (!bo)
+			continue;
+
+		/* Skip pinned BOs */
+		if (bo->tbo.pin_count)
+			continue;
+
+		ret = amdgpu_eviction_fence_attach(evf_mgr, bo);
+		if (ret) {
+			DRM_ERROR("Failed to attach new eviction fence\n");
+			goto free_err;
+		}
+	}
+
+	return 0;
+
+free_err:
+	kfree(new_ef);
+	return ret;
+}
+
+static void
+amdgpu_eviction_fence_suspend_worker(struct work_struct *work)
+{
+	struct amdgpu_eviction_fence_mgr *evf_mgr = work_to_evf_mgr(work, suspend_work.work);
+	struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
+	struct amdgpu_vm *vm = &fpriv->vm;
+	struct amdgpu_bo_va *bo_va, *tmp;
+	struct drm_exec exec;
+	struct amdgpu_bo *bo;
+	int ret;
+
+	/* Signal old eviction fence */
+	ret = amdgpu_eviction_fence_signal(evf_mgr);
+	if (ret) {
+		DRM_ERROR("Failed to signal eviction fence err=%d\n", ret);
+		return;
+	}
+
+	/* Cleanup old eviction fence entry */
+	amdgpu_eviction_fence_destroy(evf_mgr);
+
+	/* Do not replace eviction fence if fd is getting closed */
+	if (evf_mgr->eviction_allowed)
+		return;
+
+	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
+	drm_exec_until_all_locked(&exec) {
+		ret = amdgpu_vm_lock_pd(vm, &exec, 2);
+		drm_exec_retry_on_contention(&exec);
+		if (unlikely(ret)) {
+			DRM_ERROR("Failed to lock PD\n");
+			goto unlock_drm;
+		}
+
+		/* Lock the done list */
+		list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+			bo = bo_va->base.bo;
+			if (!bo) continue;
+
+			ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
+			drm_exec_retry_on_contention(&exec);
+			if (unlikely(ret))
+				goto unlock_drm;
+		}
+	}
+	/* Replace old eviction fence with new one */
+	ret = amdgpu_eviction_fence_replace_fence(fpriv);
+	if (ret)
+		DRM_ERROR("Failed to replace eviction fence\n");
+unlock_drm:
+	drm_exec_fini(&exec);
+}
+
+static bool amdgpu_eviction_fence_enable_signaling(struct dma_fence *f)
+{
+	struct amdgpu_eviction_fence_mgr *evf_mgr;
+	struct amdgpu_eviction_fence *ev_fence;
+	struct amdgpu_userq_mgr *uq_mgr;
+	struct amdgpu_fpriv *fpriv;
+
+	if (!f)
+		return true;
+
+	ev_fence = to_ev_fence(f);
+	uq_mgr = ev_fence->uq_mgr;
+	fpriv = uq_mgr_to_fpriv(uq_mgr);
+	evf_mgr = &fpriv->evf_mgr;
+
+	if (uq_mgr->num_userqs)
+		/* If userqueues are active, suspend userqueues */
+		schedule_delayed_work(&uq_mgr->suspend_work, 0);
+	else
+		/* Else just signal and replace eviction fence */
+		schedule_delayed_work(&evf_mgr->suspend_work, 0);
+
+	return true;
+}
+
 static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
 	.use_64bit_seqno = true,
 	.get_driver_name = amdgpu_eviction_fence_get_driver_name,
 	.get_timeline_name = amdgpu_eviction_fence_get_timeline_name,
+	.enable_signaling = amdgpu_eviction_fence_enable_signaling,
 };
 
 int amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr *evf_mgr)
@@ -59,11 +203,14 @@ struct amdgpu_eviction_fence *
 amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
 {
 	struct amdgpu_eviction_fence *ev_fence;
+	struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
 
 	ev_fence = kzalloc(sizeof(*ev_fence), GFP_KERNEL);
 	if (!ev_fence)
 		return NULL;
 
+	ev_fence->uq_mgr = uq_mgr;
 	get_task_comm(ev_fence->timeline_name, current);
 	spin_lock_init(&ev_fence->lock);
 	dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops,
@@ -143,6 +290,8 @@ void amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr)
 		goto unlock;
 	}
 
+	INIT_DELAYED_WORK(&evf_mgr->suspend_work, amdgpu_eviction_fence_suspend_worker);
+
 unlock:
 	spin_unlock(&evf_mgr->ev_fence_lock);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
index b47ab1307ec5..712fabf09fc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
@@ -37,6 +37,8 @@ struct amdgpu_eviction_fence_mgr {
 	atomic_t		ev_fence_seq;
 	spinlock_t 		ev_fence_lock;
 	struct amdgpu_eviction_fence *ev_fence;
+	struct delayed_work	suspend_work;
+	bool eviction_allowed;
 };
 
 /* Eviction fence helper functions */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e7fb13e20197..88f3a885b1dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1434,6 +1434,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 {
 	struct amdgpu_device *adev = drm_to_adev(dev);
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
+	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
 	struct amdgpu_bo_list *list;
 	struct amdgpu_bo *pd;
 	u32 pasid;
@@ -1466,6 +1467,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 		amdgpu_bo_unreserve(pd);
 	}
 
+	evf_mgr->eviction_allowed = true;
 	amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 614953b0fc19..4cf65aba9a9b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -455,10 +455,18 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
 	if (r)
 		goto exec_fini;
 
-	for (i = 0; i < num_bo_handles; i++)
+	/* Save the fence to wait for during suspend */
+	dma_fence_put(queue->last_fence);
+	queue->last_fence = dma_fence_get(fence);
+
+	for (i = 0; i < num_bo_handles; i++) {
+		if (!gobj[i] || !gobj[i]->resv)
+			continue;
+
 		dma_resv_add_fence(gobj[i]->resv, fence,
 				   dma_resv_usage_rw(args->bo_flags &
 				   AMDGPU_USERQ_BO_WRITE));
+	}
 
 	/* Add the created fence to syncobj/BO's */
 	for (i = 0; i < num_syncobj_handles; i++)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index ba986d55ceeb..979174f80993 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -22,6 +22,7 @@
  *
  */
 #include <drm/drm_syncobj.h>
+#include <drm/drm_exec.h>
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
@@ -282,6 +283,7 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 	amdgpu_bo_unpin(queue->db_obj.obj);
 	amdgpu_bo_unref(&queue->db_obj.obj);
 	amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
+	uq_mgr->num_userqs--;
 	mutex_unlock(&uq_mgr->userq_mutex);
 	return 0;
 }
@@ -369,6 +371,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 		goto unlock;
 	}
 	args->out.queue_id = qid;
+	uq_mgr->num_userqs++;
 
 unlock:
 	mutex_unlock(&uq_mgr->userq_mutex);
@@ -402,12 +405,109 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
 	return r;
 }
 
+static int
+amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *userq_funcs;
+	struct amdgpu_usermode_queue *queue;
+	int queue_id, ret = 0;
+
+	userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
+
+	/* Suspend all the queues for this process */
+	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
+		ret = userq_funcs->suspend(uq_mgr, queue);
+		if (ret)
+			DRM_ERROR("Failed to suspend queue\n");
+	}
+
+	return ret;
+}
+
+static int
+amdgpu_userqueue_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
+{
+	struct amdgpu_usermode_queue *queue;
+	int queue_id, ret;
+
+	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
+		struct dma_fence *f;
+		struct drm_exec exec;
+
+		drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+		drm_exec_until_all_locked(&exec) {
+			f = queue->last_fence;
+			drm_exec_retry_on_contention(&exec);
+		}
+		drm_exec_fini(&exec);
+
+		if (!f || dma_fence_is_signaled(f))
+			continue;
+		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
+		if (ret <= 0) {
+			DRM_ERROR("Timed out waiting for fence f=%p\n", f);
+			return -ETIMEDOUT;
+		}
+	}
+
+	return 0;
+}
+
+static void
+amdgpu_userqueue_suspend_worker(struct work_struct *work)
+{
+	int ret;
+	struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, suspend_work.work);
+	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
+	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
+
+	/* Wait for any pending userqueue fence to signal */
+	ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
+	if (ret) {
+		DRM_ERROR("Not suspending userqueue, timeout waiting for work\n");
+		return;
+	}
+
+	mutex_lock(&uq_mgr->userq_mutex);
+	ret = amdgpu_userqueue_suspend_all(uq_mgr);
+	if (ret) {
+		DRM_ERROR("Failed to evict userqueue\n");
+		goto unlock;
+	}
+
+	/* Signal current eviction fence */
+	ret = amdgpu_eviction_fence_signal(evf_mgr);
+	if (ret) {
+		DRM_ERROR("Can't signal eviction fence to suspend\n");
+		goto unlock;
+	}
+
+	/* Cleanup old eviction fence entry */
+	amdgpu_eviction_fence_destroy(evf_mgr);
+
+unlock:
+	mutex_unlock(&uq_mgr->userq_mutex);
+}
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
+	struct amdgpu_fpriv *fpriv;
+
 	mutex_init(&userq_mgr->userq_mutex);
 	idr_init_base(&userq_mgr->userq_idr, 1);
 	userq_mgr->adev = adev;
+	userq_mgr->num_userqs = 0;
+
+	fpriv = uq_mgr_to_fpriv(userq_mgr);
+	if (!fpriv->evf_mgr.ev_fence) {
+		DRM_ERROR("Eviction fence not initialized yet\n");
+		return -EINVAL;
+	}
 
+	/* This reference is required for suspend work */
+	fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
+	INIT_DELAYED_WORK(&userq_mgr->suspend_work, amdgpu_userqueue_suspend_worker);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 37be29048f42..8b3b50fa8b5b 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -27,6 +27,10 @@
 
 #define AMDGPU_MAX_USERQ_COUNT 512
 
+#define to_ev_fence(f) container_of(f, struct amdgpu_eviction_fence, base)
+#define work_to_uq_mgr(w, name) container_of(w, struct amdgpu_userq_mgr, name)
+#define uq_mgr_to_fpriv(u) container_of(u, struct amdgpu_fpriv, userq_mgr)
+
 struct amdgpu_mqd_prop;
 
 struct amdgpu_userq_obj {
@@ -50,6 +54,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_userq_obj wptr_obj;
 	struct xarray		uq_fence_drv_xa;
 	struct amdgpu_userq_fence_driver *fence_drv;
+	struct dma_fence	*last_fence;
 };
 
 struct amdgpu_userq_funcs {
@@ -69,6 +74,9 @@ struct amdgpu_userq_mgr {
 	struct idr			userq_idr;
 	struct mutex			userq_mutex;
 	struct amdgpu_device		*adev;
+
+	struct delayed_work		suspend_work;
+	int num_userqs;
 };
 
 int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
@@ -86,4 +94,6 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
 int amdgpu_userqueue_update_bo_mapping(struct drm_file *filp, struct amdgpu_bo_va *bo_va,
 				       uint32_t operation, uint32_t syncobj_handle,
 				       uint64_t point);
+
+int amdgpu_userqueue_enable_signaling(struct dma_fence *f);
 #endif
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 24/28] drm/amdgpu: resume gfx userqueues
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (17 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-17 12:30   ` Christian König
  2024-09-09 20:06 ` [PATCH v11 25/28] drm/amdgpu: Add input fence to sync bo unmap Shashank Sharma
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma, Alex Deucher, Christian Koenig, Arvind Yadav

This patch adds support for userqueue resume. It does the following:
- adds a new delayed work for resuming all the queues.
- schedules this delayed work from the suspend work.
- validates the BOs and replaces the eviction fence before resuming all
  the queues running under this instance of userq manager.

V2: Addressed Christian's review comments:
    - declare local variables like ret at the bottom.
    - lock all the object first, then start attaching the new fence.
    - don't replace the old eviction fence, just attach the new one
    - no error logs for drm_exec_lock failures
    - no need to reserve BOs once they are locked via drm_exec
    - schedule the resume worker immediately (not after 100 ms)
    - check for NULL BO (Arvind)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   1 +
 2 files changed, 121 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 979174f80993..e7f7354e0c0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -405,6 +405,122 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
 	return r;
 }
 
+static int
+amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *userq_funcs;
+	struct amdgpu_usermode_queue *queue;
+	int queue_id, ret = 0;
+
+	userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
+
+	/* Resume all the queues for this process */
+	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
+		ret = userq_funcs->resume(uq_mgr, queue);
+		if (ret)
+			DRM_ERROR("Failed to resume queue %d\n", queue_id);
+	}
+
+	return ret;
+}
+
+static int
+amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
+{
+	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
+	struct amdgpu_vm *vm = &fpriv->vm;
+	struct amdgpu_bo_va *bo_va, *tmp;
+	struct drm_exec exec;
+	struct amdgpu_bo *bo;
+	int ret;
+
+	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
+	drm_exec_until_all_locked(&exec) {
+		ret = amdgpu_vm_lock_pd(vm, &exec, 2);
+		drm_exec_retry_on_contention(&exec);
+		if (unlikely(ret)) {
+			DRM_ERROR("Failed to lock PD\n");
+			goto unlock_all;
+		}
+
+		/* Lock the done list */
+		list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+			bo = bo_va->base.bo;
+			if (!bo)
+				continue;
+
+			ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
+			drm_exec_retry_on_contention(&exec);
+			if (unlikely(ret))
+				goto unlock_all;
+		}
+
+		/* Lock the invalidated list */
+		list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
+			bo = bo_va->base.bo;
+			if (!bo)
+				continue;
+
+			ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
+			drm_exec_retry_on_contention(&exec);
+			if (unlikely(ret))
+				goto unlock_all;
+		}
+	}
+
+	/* Now validate BOs */
+	list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
+		bo = bo_va->base.bo;
+		if (!bo)
+			continue;
+
+		ret = amdgpu_userqueue_validate_vm_bo(NULL, bo);
+		if (ret) {
+			DRM_ERROR("Failed to validate BO\n");
+			goto unlock_all;
+		}
+	}
+
+	/* Handle the moved BOs */
+	ret = amdgpu_vm_handle_moved(uq_mgr->adev, vm, &exec.ticket);
+	if (ret) {
+		DRM_ERROR("Failed to handle moved BOs\n");
+		goto unlock_all;
+	}
+
+	ret = amdgpu_eviction_fence_replace_fence(fpriv);
+	if (ret)
+		DRM_ERROR("Failed to replace eviction fence\n");
+
+unlock_all:
+	drm_exec_fini(&exec);
+	return ret;
+}
+
+static void amdgpu_userqueue_resume_worker(struct work_struct *work)
+{
+	struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, resume_work.work);
+	int ret;
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	ret = amdgpu_userqueue_validate_bos(uq_mgr);
+	if (ret) {
+		DRM_ERROR("Failed to validate BOs to restore\n");
+		goto unlock;
+	}
+
+	ret = amdgpu_userqueue_resume_all(uq_mgr);
+	if (ret) {
+		DRM_ERROR("Failed to resume all queues\n");
+		goto unlock;
+	}
+
+unlock:
+	mutex_unlock(&uq_mgr->userq_mutex);
+}
+
 static int
 amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
 {
@@ -486,6 +602,9 @@ amdgpu_userqueue_suspend_worker(struct work_struct *work)
 	/* Cleanup old eviction fence entry */
 	amdgpu_eviction_fence_destroy(evf_mgr);
 
+	/* Schedule a work to restore userqueue */
+	schedule_delayed_work(&uq_mgr->resume_work, 0);
+
 unlock:
 	mutex_unlock(&uq_mgr->userq_mutex);
 }
@@ -508,6 +627,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 	/* This reference is required for suspend work */
 	fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
 	INIT_DELAYED_WORK(&userq_mgr->suspend_work, amdgpu_userqueue_suspend_worker);
+	INIT_DELAYED_WORK(&userq_mgr->resume_work, amdgpu_userqueue_resume_worker);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 8b3b50fa8b5b..d035b5c2b14b 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -76,6 +76,7 @@ struct amdgpu_userq_mgr {
 	struct amdgpu_device		*adev;
 
 	struct delayed_work		suspend_work;
+	struct delayed_work		resume_work;
 	int num_userqs;
 };
 
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 25/28] drm/amdgpu: Add input fence to sync bo unmap
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (18 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 24/28] drm/amdgpu: resume " Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:06 ` [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask Shashank Sharma
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx
  Cc: Arvind Yadav, Alex Deucher, Christian Koenig, Arvind Yadav,
	Shashank Sharma

From: Arvind Yadav <Arvind.Yadav@amd.com>

This patch adds input fences to the GEM VA IOCTL for unmapping an object.
The kernel will unmap the BO only after the given fences have signaled.

V2: Bug fix (Arvind)
V3: Bug fix (Arvind)
V4: Rename UAPI objects as per UAPI review (Marek)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Change-Id: Ib1572da97b640d80e39d73c9c166fa1759d720b5
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 41 +++++++++++++++++++++++++
 include/uapi/drm/amdgpu_drm.h           |  4 +++
 2 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index c9b4a6ce3f14..7823faa3dbaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -36,6 +36,7 @@
 #include <drm/drm_exec.h>
 #include <drm/drm_gem_ttm_helper.h>
 #include <drm/ttm/ttm_tt.h>
+#include <drm/drm_syncobj.h>
 
 #include "amdgpu.h"
 #include "amdgpu_display.h"
@@ -45,6 +46,39 @@
 
 static const struct drm_gem_object_funcs amdgpu_gem_object_funcs;
 
+static void amdgpu_userqueue_add_input_fence(struct drm_file *filp,
+					     uint64_t syncobj_handles_array,
+					     uint32_t num_syncobj_handles)
+{
+	struct dma_fence *fence;
+	uint32_t *syncobj_handles;
+	int ret, i;
+
+	if (!num_syncobj_handles)
+		return;
+
+	syncobj_handles = memdup_user(u64_to_user_ptr(syncobj_handles_array),
+				      sizeof(uint32_t) * num_syncobj_handles);
+	if (IS_ERR(syncobj_handles)) {
+		DRM_ERROR("Failed to get the syncobj handles err = %ld\n",
+			  PTR_ERR(syncobj_handles));
+		return;
+	}
+
+	for (i = 0; i < num_syncobj_handles; i++) {
+
+		if (!syncobj_handles[i])
+			continue;
+
+		ret = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, &fence);
+		if (ret)
+			continue;
+
+		dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+	}
+
+	kfree(syncobj_handles);
+}
+
 static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf)
 {
 	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
@@ -809,6 +843,13 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		bo_va = NULL;
 	}
 
+	if (args->operation == AMDGPU_VA_OP_UNMAP ||
+	    args->operation == AMDGPU_VA_OP_CLEAR ||
+	    args->operation == AMDGPU_VA_OP_REPLACE)
+		amdgpu_userqueue_add_input_fence(filp,
+						 args->input_fence_syncobj_array_in,
+						 args->num_syncobj_handles_in);
+
 	switch (args->operation) {
 	case AMDGPU_VA_OP_MAP:
 		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 1dc1dba6b024..8dd0d1808e37 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -840,6 +840,10 @@ struct drm_amdgpu_gem_va {
 	__u32 timeline_syncobj_out;
 	/** Timeline point */
 	__u64 timeline_point_in;
+	/** Array of sync object handle to wait for given input fences */
+	__u64 input_fence_syncobj_array_in;
+	/** the number of syncobj handles in @input_fence_syncobj_array_in */
+	__u32 num_syncobj_handles_in;
 };
 
 #define AMDGPU_HW_IP_GFX          0
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (19 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 25/28] drm/amdgpu: Add input fence to sync bo unmap Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-17 12:21   ` Christian König
  2024-09-09 20:06 ` [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV" Shashank Sharma
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx
  Cc: Arvind Yadav, Christian König, Alex Deucher, Shashank Sharma,
	Arvind Yadav

From: Arvind Yadav <Arvind.Yadav@amd.com>

The current MES GFX mask prevents the FW from enabling oversubscription.
This patch does the following:
- Fixes the mask values and adds a description for them
- Removes the central mask setup and makes it IP specific, as the mask
  differs with the number of pipes and queues.

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Change-Id: I86f5b89c5527c23df94edc707c69c78819f4c8cf
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++++++--
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index f7d5d4f08a53..dbf19122dfc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 		adev->mes.compute_hqd_mask[i] = 0xc;
 	}
 
-	for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-		adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
-
 	for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
 		if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
 		    IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 96788c0f42f1..45e3508f0f8e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -109,8 +109,8 @@ struct amdgpu_mes {
 
 	uint32_t                        vmid_mask_gfxhub;
 	uint32_t                        vmid_mask_mmhub;
-	uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
 	uint32_t                        gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
+	uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
 	uint32_t                        sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
 	uint32_t                        aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
 	uint32_t                        sch_ctx_offs[AMDGPU_MAX_MES_PIPES];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 2911c45cfbe0..d2610a664b2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -653,8 +653,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
 		mes_set_hw_res_pkt.compute_hqd_mask[i] =
 			mes->compute_hqd_mask[i];
 
-	for (i = 0; i < MAX_GFX_PIPES; i++)
-		mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+	/*
+	 * GFX pipe 0 queue 0 is being used by kernel
+	 * Set GFX pipe 0 queue 1 for MES scheduling
+	 * GFX pipe 1 can't be used for MES due to HW limitation.
+	 */
+	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
 	for (i = 0; i < MAX_SDMA_PIPES; i++)
 		mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (20 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-09 20:31   ` Alex Deucher
  2024-09-09 20:06 ` [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO" Shashank Sharma
  2024-09-19 16:59 ` [PATCH v11 00/28] AMDGPU usermode queues Alex Deucher
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Shashank Sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

This reverts commit 81af32520e7aaa337fe132f16c12ce54170187ea.

This commit prevents a usermode queue client from getting the shadow
related information.

Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dbf3bcadee32..1f0f7ec0facc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -661,12 +661,8 @@ static void gfx_v11_0_check_fw_cp_gfx_shadow(struct amdgpu_device *adev)
 	case IP_VERSION(11, 0, 3):
 		if ((adev->gfx.me_fw_version >= 1505) &&
 		    (adev->gfx.pfp_fw_version >= 1600) &&
-		    (adev->gfx.mec_fw_version >= 512)) {
-			if (amdgpu_sriov_vf(adev))
-				adev->gfx.cp_gfx_shadow = true;
-			else
-				adev->gfx.cp_gfx_shadow = false;
-		}
+		    (adev->gfx.mec_fw_version >= 512))
+			adev->gfx.cp_gfx_shadow = true;
 		break;
 	default:
 		adev->gfx.cp_gfx_shadow = false;
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (21 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV" Shashank Sharma
@ 2024-09-09 20:06 ` Shashank Sharma
  2024-09-17 12:25   ` Christian König
  2024-09-19 16:59 ` [PATCH v11 00/28] AMDGPU usermode queues Alex Deucher
  23 siblings, 1 reply; 38+ messages in thread
From: Shashank Sharma @ 2024-09-09 20:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav

From: Arvind Yadav <Arvind.Yadav@amd.com>

This reverts commit 6be2ad4f0073c541146caa66c5ae936c955a8224.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 7823faa3dbaa..2e3c974a3340 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -365,10 +365,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle, initial_domain;
 	int r;
 
-	/* reject DOORBELLs until userspace code to use it is available */
-	if (args->in.domains & AMDGPU_GEM_DOMAIN_DOORBELL)
-		return -EINVAL;
-
 	/* reject invalid gem flags */
 	if (flags & ~(AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
 		      AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"
  2024-09-09 20:06 ` [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV" Shashank Sharma
@ 2024-09-09 20:31   ` Alex Deucher
  2024-09-11  9:20     ` Sharma, Shashank
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Deucher @ 2024-09-09 20:31 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Shashank Sharma

On Mon, Sep 9, 2024 at 4:18 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This reverts commit 81af32520e7aaa337fe132f16c12ce54170187ea.
>
> This commit prevents a usermode queue client to get the shadow related
> information.
>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index dbf3bcadee32..1f0f7ec0facc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -661,12 +661,8 @@ static void gfx_v11_0_check_fw_cp_gfx_shadow(struct amdgpu_device *adev)
>         case IP_VERSION(11, 0, 3):
>                 if ((adev->gfx.me_fw_version >= 1505) &&
>                     (adev->gfx.pfp_fw_version >= 1600) &&
> -                   (adev->gfx.mec_fw_version >= 512)) {
> -                       if (amdgpu_sriov_vf(adev))
> -                               adev->gfx.cp_gfx_shadow = true;
> -                       else
> -                               adev->gfx.cp_gfx_shadow = false;
> -               }
> +                   (adev->gfx.mec_fw_version >= 512))
> +                       adev->gfx.cp_gfx_shadow = true;

We need to be a bit more surgical about this.  Setting
adev->gfx.cp_gfx_shadow = true will also enable
gfx_v11_0_ring_emit_gfx_shadow() to execute on kernel queues, which we
don't want.  We just want to enable the query for the shadow and csa
sizes.  Probably easiest to just add a new INFO IOCTL query for that
so we don't break the old query.  I.e., userspace looks for non-0
shadow and csa sizes to determine whether or not to enable shadowing
with kernel queues.

Alex

>                 break;
>         default:
>                 adev->gfx.cp_gfx_shadow = false;
> --
> 2.45.1
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"
  2024-09-09 20:31   ` Alex Deucher
@ 2024-09-11  9:20     ` Sharma, Shashank
  0 siblings, 0 replies; 38+ messages in thread
From: Sharma, Shashank @ 2024-09-11  9:20 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Shashank Sharma

Hello Alex

On 09/09/2024 22:31, Alex Deucher wrote:
> On Mon, Sep 9, 2024 at 4:18 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This reverts commit 81af32520e7aaa337fe132f16c12ce54170187ea.
>>
>> This commit prevents a usermode queue client to get the shadow related
>> information.
>>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 8 ++------
>>   1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index dbf3bcadee32..1f0f7ec0facc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -661,12 +661,8 @@ static void gfx_v11_0_check_fw_cp_gfx_shadow(struct amdgpu_device *adev)
>>          case IP_VERSION(11, 0, 3):
>>                  if ((adev->gfx.me_fw_version >= 1505) &&
>>                      (adev->gfx.pfp_fw_version >= 1600) &&
>> -                   (adev->gfx.mec_fw_version >= 512)) {
>> -                       if (amdgpu_sriov_vf(adev))
>> -                               adev->gfx.cp_gfx_shadow = true;
>> -                       else
>> -                               adev->gfx.cp_gfx_shadow = false;
>> -               }
>> +                   (adev->gfx.mec_fw_version >= 512))
>> +                       adev->gfx.cp_gfx_shadow = true;
> We need to be a bit more surgical about this.  Setting
> adev->gfx.cp_gfx_shadow = true, will also enable
> gfx_v11_0_ring_emit_gfx_shadow() to execute on kernel queues which we
> don't want.  We just want to enable the query for the shadow and csa
> sizes.  Probably easiest to just add a new INFO IOCTL query for that
> so we don't break the old query.  I.e., userspace looks for non-0
> shadow and csa sizes to determine whether or not to enable shadowing
> with kernel queues.

I agree, I will fine tune this approach instead of reverting the patch.

Shashank

> Alex
>
>>                  break;
>>          default:
>>                  adev->gfx.cp_gfx_shadow = false;
>> --
>> 2.45.1
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART
  2024-09-09 20:05 ` [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2024-09-16 12:39   ` Christian König
  0 siblings, 0 replies; 38+ messages in thread
From: Christian König @ 2024-09-16 12:39 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, Christian Koenig, Arvind Yadav

On 09.09.24 at 22:05, Shashank Sharma wrote:
> To support oversubscription, MES FW expects WPTR BOs to
> be mapped into GART, before they are submitted to usermode
> queues. This patch adds a function for the same.
>
> V4: fix the wptr value before mapping lookup (Bas, Christian).
>
> V5: Addressed review comments from Christian:
>      - Either pin object or allocate from GART, but not both.
>      - All the handling must be done with the VM locks held.
>
> V7: Addressed review comments from Christian:
>      - Do not take vm->eviction_lock
>      - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>
> V8:  Rebase
> V9:  Changed the function names from gfx_v11* to mes_v11*
> V10: Remove unused adev (Harish)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 76 +++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>   2 files changed, 77 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index a1bc6f488928..90511abaef05 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -30,6 +30,73 @@
>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>   
> +static int
> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
> +{
> +	int ret;
> +
> +	ret = amdgpu_bo_reserve(bo, true);
> +	if (ret) {
> +		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> +		goto err_reserve_bo_failed;
> +	}
> +
> +	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> +	if (ret) {
> +		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> +		goto err_map_bo_gart_failed;
> +	}
> +
> +	amdgpu_bo_unreserve(bo);
> +	bo = amdgpu_bo_ref(bo);
> +
> +	return 0;
> +
> +err_map_bo_gart_failed:
> +	amdgpu_bo_unreserve(bo);
> +err_reserve_bo_failed:
> +	return ret;
> +}
> +
> +static int
> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
> +			      struct amdgpu_usermode_queue *queue,
> +			      uint64_t wptr)
> +{
> +	struct amdgpu_bo_va_mapping *wptr_mapping;
> +	struct amdgpu_vm *wptr_vm;
> +	struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
> +	int ret;
> +
> +	wptr_vm = queue->vm;
> +	ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> +	if (ret)
> +		return ret;
> +
> +	wptr &= AMDGPU_GMC_HOLE_MASK;
> +	wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
> +	amdgpu_bo_unreserve(wptr_vm->root.bo);
> +	if (!wptr_mapping) {
> +		DRM_ERROR("Failed to lookup wptr bo\n");
> +		return -EINVAL;
> +	}
> +
> +	wptr_obj->obj = wptr_mapping->bo_va->base.bo;
> +	if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
> +		DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = mes_v11_0_map_gtt_bo_to_gart(wptr_obj->obj);
> +	if (ret) {
> +		DRM_ERROR("Failed to map wptr bo to GART\n");
> +		return ret;
> +	}
> +
> +	queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> +	return 0;
> +}
> +
>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>   			       struct amdgpu_usermode_queue *queue,
>   			       struct amdgpu_mqd_prop *userq_props)
> @@ -61,6 +128,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>   	queue_input.queue_size = userq_props->queue_size >> 2;
>   	queue_input.doorbell_offset = userq_props->doorbell_index;
>   	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +	queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>   
>   	amdgpu_mes_lock(&adev->mes);
>   	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> @@ -168,6 +236,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>   		goto free_mqd;
>   	}
>   
> +	/* FW expects WPTR BOs to be mapped into GART */
> +	r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
> +	if (r) {
> +		DRM_ERROR("Failed to create WPTR mapping\n");
> +		goto free_ctx;
> +	}
> +
>   	/* Map userqueue into FW using MES */
>   	r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>   	if (r) {
> @@ -194,6 +269,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>   			    struct amdgpu_usermode_queue *queue)
>   {
>   	mes_v11_0_userq_unmap(uq_mgr, queue);
> +	amdgpu_bo_unref(&queue->wptr_obj.obj);
>   	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>   	kfree(queue->userq_prop);
>   	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 643f31474bd8..ffe8a3d73756 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>   	struct amdgpu_vm	*vm;
>   	struct amdgpu_userq_obj mqd;
>   	struct amdgpu_userq_obj fw_obj;
> +	struct amdgpu_userq_obj wptr_obj;
>   };
>   
>   struct amdgpu_userq_funcs {


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers
  2024-09-09 20:06 ` [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers Shashank Sharma
@ 2024-09-16 14:14   ` Christian König
  2024-09-25  9:08     ` Sharma, Shashank
  0 siblings, 1 reply; 38+ messages in thread
From: Christian König @ 2024-09-16 14:14 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Christian Koenig, Alex Deucher, Arvind Yadav

On 09.09.24 at 22:06, Shashank Sharma wrote:
> This patch adds basic eviction fence framework for the gfx buffers.
> The idea is to:
> - One eviction fence is created per gfx process, at kms_open.
> - This fence is attached to all the gem buffers created
>    by this process.
> - This fence is detached to all the gem buffers at postclose_kms.
>
> This framework will be further used for usermode queues.
>
> V2: Addressed review comments from Christian
>      - keep fence_ctx and fence_seq directly in fpriv
>      - evcition_fence should be dynamically allocated
>      - do not save eviction fence instance in BO, there could be many
>        such fences attached to one BO
>      - use dma_resv_replace_fence() in detach
>
> V3: Addressed review comments from Christian
>      - eviction fence create and destroy functions should be called only once
>        from fpriv create/destroy
>      - use dma_fence_put() in eviction_fence_destroy
>
> V4: Addressed review comments from Christian:
>      - create a separate ev_fence_mgr structure
>      - cleanup fence init part
>      - do not add a domain for fence owner KGD
>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> Change-Id: I7a8d27d7172bafbfe34aa9decf2cd36655948275
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +-
>   .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 148 ++++++++++++++++++
>   .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  65 ++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   9 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
>   6 files changed, 231 insertions(+), 2 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index ff5621697c68..0643078d1225 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -66,7 +66,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
>   	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>   	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
>   	amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o amdgpu_dev_coredump.o \
> -	amdgpu_userq_fence.o
> +	amdgpu_userq_fence.o amdgpu_eviction_fence.o
>   
>   amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 76ada47b1875..0013bfc74024 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -113,6 +113,7 @@
>   #include "amdgpu_seq64.h"
>   #include "amdgpu_reg_state.h"
>   #include "amdgpu_userqueue.h"
> +#include "amdgpu_eviction_fence.h"
>   #if defined(CONFIG_DRM_AMD_ISP)
>   #include "amdgpu_isp.h"
>   #endif
> @@ -481,7 +482,6 @@ struct amdgpu_flip_work {
>   	bool				async;
>   };
>   
> -
>   /*
>    * file private structure
>    */
> @@ -495,6 +495,10 @@ struct amdgpu_fpriv {
>   	struct idr		bo_list_handles;
>   	struct amdgpu_ctx_mgr	ctx_mgr;
>   	struct amdgpu_userq_mgr	userq_mgr;
> +
> +	/* Eviction fence infra */
> +	struct amdgpu_eviction_fence_mgr evf_mgr;
> +
>   	/** GPU partition selection */
>   	uint32_t		xcp_id;
>   };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> new file mode 100644
> index 000000000000..2d474cb11cf9
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -0,0 +1,148 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2024 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include <linux/sched.h>
> +#include "amdgpu.h"
> +
> +static const char *
> +amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
> +{
> +	return "amdgpu";
> +}
> +
> +static const char *
> +amdgpu_eviction_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	struct amdgpu_eviction_fence *ef;
> +
> +	ef = container_of(f, struct amdgpu_eviction_fence, base);
> +	return ef->timeline_name;
> +}
> +
> +static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
> +	.use_64bit_seqno = true,
> +	.get_driver_name = amdgpu_eviction_fence_get_driver_name,
> +	.get_timeline_name = amdgpu_eviction_fence_get_timeline_name,
> +};
> +
> +int amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr *evf_mgr)
> +{
> +	int ret;
> +
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	ret = dma_fence_signal(&evf_mgr->ev_fence->base);
> +	spin_unlock(&evf_mgr->ev_fence_lock);
> +	return ret;
> +}
> +
> +struct amdgpu_eviction_fence *
> +amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
> +{
> +	struct amdgpu_eviction_fence *ev_fence;
> +
> +	ev_fence = kzalloc(sizeof(*ev_fence), GFP_KERNEL);
> +	if (!ev_fence)
> +		return NULL;
> +
> +	get_task_comm(ev_fence->timeline_name, current);
> +	spin_lock_init(&ev_fence->lock);
> +	dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops,
> +		       &ev_fence->lock, evf_mgr->ev_fence_ctx,
> +		       atomic_inc_return(&evf_mgr->ev_fence_seq));
> +	return ev_fence;
> +}
> +
> +void amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr *evf_mgr)
> +{
> +	if (!evf_mgr->ev_fence)
> +		return;
> +
> +	if (!dma_fence_is_signaled(&evf_mgr->ev_fence->base))

You can drop that if; dma_fence_wait() will check that anyway.

> +		dma_fence_wait(&evf_mgr->ev_fence->base, false);
> +
> +	/* Last unref of ev_fence */
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	dma_fence_put(&evf_mgr->ev_fence->base);
> +	evf_mgr->ev_fence = NULL;
> +	spin_unlock(&evf_mgr->ev_fence_lock);
> +}
> +
> +int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> +				 struct amdgpu_bo *bo)
> +{
> +	struct dma_fence *ef;
> +	struct amdgpu_eviction_fence *ev_fence = evf_mgr->ev_fence;
> +	struct dma_resv *resv = bo->tbo.base.resv;
> +	int ret;
> +
> +	if (!ev_fence || !resv)
> +		return 0;
> +
> +	ef = &ev_fence->base;
> +	ret = dma_resv_reserve_fences(resv, 1);
> +	if (ret) {
> +		dma_fence_wait(ef, false);
> +		return ret;
> +	}
> +
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	dma_resv_add_fence(resv, ef, DMA_RESV_USAGE_BOOKKEEP);
> +	spin_unlock(&evf_mgr->ev_fence_lock);

That spinlock is protecting evf_mgr->ev_fence, isn't it?

In that case you probably shouldn't dereference it outside of the spinlock.
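For illustration, an untested sketch of the attach path with the dereference moved under the lock (reserving the fence slot first, since dma_resv_reserve_fences() can sleep):

```c
	ret = dma_resv_reserve_fences(resv, 1);
	if (ret)
		return ret;

	spin_lock(&evf_mgr->ev_fence_lock);
	if (evf_mgr->ev_fence)
		dma_resv_add_fence(resv, &evf_mgr->ev_fence->base,
				   DMA_RESV_USAGE_BOOKKEEP);
	spin_unlock(&evf_mgr->ev_fence_lock);
```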

> +	return 0;
> +}
> +
> +void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> +				  struct amdgpu_bo *bo)
> +{
> +	struct dma_fence *stub;
> +	struct amdgpu_eviction_fence *ev_fence = evf_mgr->ev_fence;
> +
> +	if (!ev_fence)
> +		return;
> +
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	stub = dma_fence_get_stub();
> +	dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
> +				stub, DMA_RESV_USAGE_BOOKKEEP);
> +	dma_fence_put(stub);
> +	spin_unlock(&evf_mgr->ev_fence_lock);

This operation doesn't need the spinlock since we are not accessing 
evf_mgr->ev_fence.

> +}
> +
> +void amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr)
> +{
> +
> +	/* This needs to be done one time per open */
> +	atomic_set(&evf_mgr->ev_fence_seq, 0);
> +	evf_mgr->ev_fence_ctx = dma_fence_context_alloc(1);
> +	spin_lock_init(&evf_mgr->ev_fence_lock);
> +
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	evf_mgr->ev_fence = amdgpu_eviction_fence_create(evf_mgr);

amdgpu_eviction_fence_create() will call kmalloc, doing that while 
holding the spinlock is a bad idea.

You need to do something like:

tmp = amdgpu_eviction_fence_create(evf_mgr);
spin_lock(&evf_mgr->ev_fence_lock);
evf_mgr->ev_fence = tmp;
spin_unlock(&evf_mgr->ev_fence_lock);

> +	if (!evf_mgr->ev_fence) {
> +		DRM_ERROR("Failed to create eviction fence\n");
> +		goto unlock;
> +	}
> +
> +unlock:
> +	spin_unlock(&evf_mgr->ev_fence_lock);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> new file mode 100644
> index 000000000000..b47ab1307ec5
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright 2023 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef AMDGPU_EV_FENCE_H_
> +#define AMDGPU_EV_FENCE_H_
> +
> +struct amdgpu_eviction_fence {
> +	struct dma_fence base;
> +	spinlock_t	 lock;
> +	char		 timeline_name[TASK_COMM_LEN];
> +	struct amdgpu_userq_mgr *uq_mgr;
> +};
> +
> +struct amdgpu_eviction_fence_mgr {
> +	u64			ev_fence_ctx;
> +	atomic_t		ev_fence_seq;
> +	spinlock_t 		ev_fence_lock;
> +	struct amdgpu_eviction_fence *ev_fence;
> +};
> +
> +/* Eviction fence helper functions */
> +struct amdgpu_eviction_fence *
> +amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr);
> +
> +void
> +amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr *evf_mgr);
> +
> +int
> +amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> +			     struct amdgpu_bo *bo);
> +
> +void
> +amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> +			     struct amdgpu_bo *bo);
> +
> +void
> +amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr);
> +
> +int
> +amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr *evf_mgr);
> +
> +int
> +amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv);
> +#endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index f4529f2fad97..c9b4a6ce3f14 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -186,6 +186,13 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
>   		bo_va = amdgpu_vm_bo_add(adev, vm, abo);
>   	else
>   		++bo_va->ref_count;
> +
> +	if (!vm->is_compute_context || !vm->process_info) {

I said it before: we should really drop this line, since the user queues
are completely independent of that.

> +		/* attach gfx eviction fence */
> +		if (amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo))
> +			DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
> +	}
> +
>   	amdgpu_bo_unreserve(abo);
>   
>   	/* Validate and add eviction fence to DMABuf imports with dynamic
> @@ -236,6 +243,8 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
>   	struct drm_exec exec;
>   	long r;
>   
> +	amdgpu_eviction_fence_detach(&fpriv->evf_mgr, bo);
> +

We should probably skip that call for per-VM BOs, otherwise we will
also detach the page tables accidentally.

BTW: Where do we attach the eviction fence to the page tables?

Regards,
Christian.

>   	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
>   	drm_exec_until_all_locked(&exec) {
>   		r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 019a377620ce..e7fb13e20197 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1391,6 +1391,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>   	mutex_init(&fpriv->bo_list_lock);
>   	idr_init_base(&fpriv->bo_list_handles, 1);
>   
> +	amdgpu_eviction_fence_init(&fpriv->evf_mgr);
> +
>   	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>   
>   	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
> @@ -1464,6 +1466,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>   		amdgpu_bo_unreserve(pd);
>   	}
>   
> +	amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
>   	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>   	amdgpu_vm_fini(adev, &fpriv->vm);
>   	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues
  2024-09-09 20:06 ` [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues Shashank Sharma
@ 2024-09-17 11:58   ` Christian König
  2024-09-25  9:13     ` Sharma, Shashank
  0 siblings, 1 reply; 38+ messages in thread
From: Christian König @ 2024-09-17 11:58 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, Christian Koenig, Arvind Yadav

On 09.09.24 22:06, Shashank Sharma wrote:
> This patch adds suspend support for gfx userqueues. It typically does
> the following:
> - adds an enable_signaling function for the eviction fence, so that it
>    can trigger the userqueue suspend,
> - adds a delayed function for suspending the userqueues, to suspend all
>    the queues under this userq manager and signals the eviction fence,
> - adds reference of userq manager in the eviction fence container so
>    that it can be used in the suspend function.
>
> V2: Addressed Christian's review comments:
>      - schedule suspend work immediately
>
> V4: Addressed Christian's review comments:
>      - wait for pending uq fences before starting suspend, added
>        queue->last_fence for the same
>      - accommodate ev_fence_mgr into existing code
>      - some bug fixes and NULL checks
>
> V5: Addressed Christian's review comments (gitlab)
>      - Wait for eviction fence to get signaled in destroy, don't signal it
>      - Wait for eviction fence to get signaled in replace fence, don't signal it
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> Change-Id: Ib60a7feda5544e3badc87bd1a991931ee726ee82
> ---
>   .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 149 ++++++++++++++++++
>   .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   2 +
>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   |  10 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 100 ++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  10 ++
>   6 files changed, 272 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index 2d474cb11cf9..3d4fc704adb1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -22,8 +22,12 @@
>    *
>    */
>   #include <linux/sched.h>
> +#include <drm/drm_exec.h>
>   #include "amdgpu.h"
>   
> +#define work_to_evf_mgr(w, name) container_of(w, struct amdgpu_eviction_fence_mgr, name)
> +#define evf_mgr_to_fpriv(e) container_of(e, struct amdgpu_fpriv, evf_mgr)
> +
>   static const char *
>   amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
>   {
> @@ -39,10 +43,150 @@ amdgpu_eviction_fence_get_timeline_name(struct dma_fence *f)
>   	return ef->timeline_name;
>   }
>   
> +static void
> +amdgpu_eviction_fence_update_fence(struct amdgpu_eviction_fence_mgr *evf_mgr,
> +				   struct amdgpu_eviction_fence *new_ef)
> +{
> +	struct dma_fence *old_ef = &evf_mgr->ev_fence->base;

The spinlock is protecting evf_mgr->ev_fence so this access without 
holding the spinlock here is illegal.

I think you should just drop the local variable.
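I.e., an untested sketch without the local variable, so the old fence is only touched under the lock:

```c
	spin_lock(&evf_mgr->ev_fence_lock);
	dma_fence_put(&evf_mgr->ev_fence->base);
	evf_mgr->ev_fence = new_ef;
	spin_unlock(&evf_mgr->ev_fence_lock);
```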

> +
> +	spin_lock(&evf_mgr->ev_fence_lock);
> +	dma_fence_put(old_ef);
> +	evf_mgr->ev_fence = new_ef;
> +	spin_unlock(&evf_mgr->ev_fence_lock);
> +}
> +
> +int
> +amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv)
> +{
> +	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
> +	struct amdgpu_vm *vm = &fpriv->vm;
> +	struct amdgpu_eviction_fence *old_ef, *new_ef;
> +	struct amdgpu_bo_va *bo_va, *tmp;
> +	int ret;
> +
> +	old_ef = evf_mgr->ev_fence;
> +	if (old_ef && !dma_fence_is_signaled(&old_ef->base)) {
> +		DRM_DEBUG_DRIVER("Old EF not signaled yet\n");
> +		dma_fence_wait(&old_ef->base, true);
> +	}

Please completely drop that.

> +
> +	new_ef = amdgpu_eviction_fence_create(evf_mgr);
> +	if (!new_ef) {
> +		DRM_ERROR("Failed to create new eviction fence\n");
> +		return -ENOMEM;
> +	}
> +
> +	/* Replace fences and free old one */
> +	amdgpu_eviction_fence_update_fence(evf_mgr, new_ef);
> +
> +	/* Attach new eviction fence to BOs */
> +	list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {

It's probably better to use drm_exec_for_each_locked() here.

> +		struct amdgpu_bo *bo = bo_va->base.bo;
> +
> +		if (!bo)
> +			continue;
> +
> +		/* Skip pinned BOs */
> +		if (bo->tbo.pin_count)
> +			continue;

Clearly a bad idea, even pinned BOs need the eviction fence because they 
can be unpinned at any time.

> +
> +		ret = amdgpu_eviction_fence_attach(evf_mgr, bo);
> +		if (ret) {
> +			DRM_ERROR("Failed to attach new eviction fence\n");
> +			goto free_err;
> +		}
> +	}
> +
> +	return 0;
> +
> +free_err:
> +	kfree(new_ef);
> +	return ret;
> +}
> +
> +static void
> +amdgpu_eviction_fence_suspend_worker(struct work_struct *work)
> +{
> +	struct amdgpu_eviction_fence_mgr *evf_mgr = work_to_evf_mgr(work, suspend_work.work);
> +	struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
> +	struct amdgpu_vm *vm = &fpriv->vm;
> +	struct amdgpu_bo_va *bo_va, *tmp;
> +	struct drm_exec exec;
> +	struct amdgpu_bo *bo;
> +	int ret;
> +
> +	/* Signal old eviction fence */
> +	ret = amdgpu_eviction_fence_signal(evf_mgr);
> +	if (ret) {
> +		DRM_ERROR("Failed to signal eviction fence err=%d\n", ret);
> +		return;
> +	}
> +
> +	/* Cleanup old eviction fence entry */
> +	amdgpu_eviction_fence_destroy(evf_mgr);

Offhand that looks like a bad idea to me. The eviction fence should
never become NULL unless the fd is closed.

In general we need to make sure that nothing races here, e.g. we always 
need a defensive ordering.

Something like:
1. Lock all BOs
2. Create new eviction fence,
3. Publish eviction fence in the evf_mgr.
4. Add the eviction fence to the BOs.
5. Drop locks on all BOs.

This way concurrently opening/closing BOs should always see the right 
eviction fence.
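For illustration, an untested sketch of that ordering (the use of drm_exec_for_each_locked_object() and the exact set of locked objects are assumptions here, error handling omitted):

```c
	/* 1. Lock the PD and all BOs of the VM */
	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
	drm_exec_until_all_locked(&exec) {
		ret = amdgpu_vm_lock_pd(vm, &exec, 2);
		drm_exec_retry_on_contention(&exec);
	}

	/* 2. Create the new eviction fence */
	new_ef = amdgpu_eviction_fence_create(evf_mgr);

	/* 3. Publish it in the evf_mgr */
	spin_lock(&evf_mgr->ev_fence_lock);
	old_ef = evf_mgr->ev_fence;
	evf_mgr->ev_fence = new_ef;
	spin_unlock(&evf_mgr->ev_fence_lock);

	/* 4. Add the new fence to all locked BOs */
	drm_exec_for_each_locked_object(&exec, index, obj)
		amdgpu_eviction_fence_attach(evf_mgr, gem_to_amdgpu_bo(obj));
	dma_fence_put(&old_ef->base);

	/* 5. Drop the locks again */
	drm_exec_fini(&exec);
```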

Regards,
Christian.

> +
> +	/* Do not replace eviction fence if fd is getting closed */
> +	if (evf_mgr->eviction_allowed)
> +		return;
> +
> +	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
> +	drm_exec_until_all_locked(&exec) {
> +		ret = amdgpu_vm_lock_pd(vm, &exec, 2);
> +		drm_exec_retry_on_contention(&exec);
> +		if (unlikely(ret)) {
> +			DRM_ERROR("Failed to lock PD\n");
> +			goto unlock_drm;
> +		}
> +
> +		/* Lock the done list */
> +		list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
> +			bo = bo_va->base.bo;
> +			if (!bo)
> +				continue;
> +
> +			ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
> +			drm_exec_retry_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto unlock_drm;
> +		}
> +	}
> +	/* Replace old eviction fence with new one */
> +	ret = amdgpu_eviction_fence_replace_fence(fpriv);
> +	if (ret)
> +		DRM_ERROR("Failed to replace eviction fence\n");
> +unlock_drm:
> +	drm_exec_fini(&exec);
> +}
> +
> +static bool amdgpu_eviction_fence_enable_signaling(struct dma_fence *f)
> +{
> +	struct amdgpu_eviction_fence_mgr *evf_mgr;
> +	struct amdgpu_eviction_fence *ev_fence;
> +	struct amdgpu_userq_mgr *uq_mgr;
> +	struct amdgpu_fpriv *fpriv;
> +
> +	if (!f)
> +		return true;
> +
> +	ev_fence = to_ev_fence(f);
> +	uq_mgr = ev_fence->uq_mgr;
> +	fpriv = uq_mgr_to_fpriv(uq_mgr);
> +	evf_mgr = &fpriv->evf_mgr;
> +
> +	if (uq_mgr->num_userqs)

I don't think you should make that decision here. At least offhand that
looks racy.

Probably better to always trigger the suspend work in the uq manager.
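An untested sketch of what that could look like, with the queue-count decision left to the worker:

```c
static bool amdgpu_eviction_fence_enable_signaling(struct dma_fence *f)
{
	struct amdgpu_eviction_fence *ev_fence = to_ev_fence(f);

	/* Always hand off to the uq manager; its worker can safely
	 * inspect the queue state under the userq mutex.
	 */
	schedule_delayed_work(&ev_fence->uq_mgr->suspend_work, 0);
	return true;
}
```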

> +		/* If userqueues are active, suspend userqueues */
> +		schedule_delayed_work(&uq_mgr->suspend_work, 0);
> +	else
> +		/* Else just signal and replace eviction fence */
> +		schedule_delayed_work(&evf_mgr->suspend_work, 0);
> +
> +	return true;
> +}
> +
>   static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
>   	.use_64bit_seqno = true,
>   	.get_driver_name = amdgpu_eviction_fence_get_driver_name,
>   	.get_timeline_name = amdgpu_eviction_fence_get_timeline_name,
> +	.enable_signaling = amdgpu_eviction_fence_enable_signaling,
>   };
>   
>   int amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr *evf_mgr)
> @@ -59,11 +203,14 @@ struct amdgpu_eviction_fence *
>   amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
>   {
>   	struct amdgpu_eviction_fence *ev_fence;
> +	struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
> +	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>   
>   	ev_fence = kzalloc(sizeof(*ev_fence), GFP_KERNEL);
>   	if (!ev_fence)
>   		return NULL;
>   
> +	ev_fence->uq_mgr = uq_mgr;
>   	get_task_comm(ev_fence->timeline_name, current);
>   	spin_lock_init(&ev_fence->lock);
>   	dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops,
> @@ -143,6 +290,8 @@ void amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr)
>   		goto unlock;
>   	}
>   
> +	INIT_DELAYED_WORK(&evf_mgr->suspend_work, amdgpu_eviction_fence_suspend_worker);
> +
>   unlock:
>   	spin_unlock(&evf_mgr->ev_fence_lock);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> index b47ab1307ec5..712fabf09fc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> @@ -37,6 +37,8 @@ struct amdgpu_eviction_fence_mgr {
>   	atomic_t		ev_fence_seq;
>   	spinlock_t 		ev_fence_lock;
>   	struct amdgpu_eviction_fence *ev_fence;
> +	struct delayed_work	suspend_work;
> +	bool eviction_allowed;
>   };
>   
>   /* Eviction fence helper functions */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index e7fb13e20197..88f3a885b1dc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1434,6 +1434,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>   {
>   	struct amdgpu_device *adev = drm_to_adev(dev);
>   	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
> +	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
>   	struct amdgpu_bo_list *list;
>   	struct amdgpu_bo *pd;
>   	u32 pasid;
> @@ -1466,6 +1467,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>   		amdgpu_bo_unreserve(pd);
>   	}
>   
> +	evf_mgr->eviction_allowed = true;
>   	amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
>   	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>   	amdgpu_vm_fini(adev, &fpriv->vm);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 614953b0fc19..4cf65aba9a9b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -455,10 +455,18 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>   	if (r)
>   		goto exec_fini;
>   
> -	for (i = 0; i < num_bo_handles; i++)
> +	/* Save the fence to wait for during suspend */
> +	dma_fence_put(queue->last_fence);
> +	queue->last_fence = dma_fence_get(fence);
> +
> +	for (i = 0; i < num_bo_handles; i++) {
> +		if (!gobj || !gobj[i]->resv)
> +			continue;
> +
>   		dma_resv_add_fence(gobj[i]->resv, fence,
>   				   dma_resv_usage_rw(args->bo_flags &
>   				   AMDGPU_USERQ_BO_WRITE));
> +	}
>   
>   	/* Add the created fence to syncobj/BO's */
>   	for (i = 0; i < num_syncobj_handles; i++)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index ba986d55ceeb..979174f80993 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -22,6 +22,7 @@
>    *
>    */
>   #include <drm/drm_syncobj.h>
> +#include <drm/drm_exec.h>
>   #include "amdgpu.h"
>   #include "amdgpu_vm.h"
>   #include "amdgpu_userqueue.h"
> @@ -282,6 +283,7 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>   	amdgpu_bo_unpin(queue->db_obj.obj);
>   	amdgpu_bo_unref(&queue->db_obj.obj);
>   	amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
> +	uq_mgr->num_userqs--;
>   	mutex_unlock(&uq_mgr->userq_mutex);
>   	return 0;
>   }
> @@ -369,6 +371,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>   		goto unlock;
>   	}
>   	args->out.queue_id = qid;
> +	uq_mgr->num_userqs++;
>   
>   unlock:
>   	mutex_unlock(&uq_mgr->userq_mutex);
> @@ -402,12 +405,109 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>   	return r;
>   }
>   
> +static int
> +amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	const struct amdgpu_userq_funcs *userq_funcs;
> +	struct amdgpu_usermode_queue *queue;
> +	int queue_id, ret = 0;
> +
> +	userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
> +
> +	/* Suspend all the queues for this process */
> +	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
> +		ret = userq_funcs->suspend(uq_mgr, queue);
> +		if (ret)
> +			DRM_ERROR("Failed to suspend queue\n");
> +	}
> +
> +	return ret;
> +}
> +
> +static int
> +amdgpu_userqueue_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +	struct amdgpu_usermode_queue *queue;
> +	int queue_id, ret;
> +
> +	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
> +		struct dma_fence *f;
> +		struct drm_exec exec;
> +
> +		drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> +		drm_exec_until_all_locked(&exec) {
> +			f = queue->last_fence;
> +			drm_exec_retry_on_contention(&exec);
> +		}
> +		drm_exec_fini(&exec);
> +
> +		if (!f || dma_fence_is_signaled(f))
> +			continue;
> +		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
> +		if (ret <= 0) {
> +			DRM_ERROR("Timed out waiting for fence f=%p\n", f);
> +			return -ETIMEDOUT;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static void
> +amdgpu_userqueue_suspend_worker(struct work_struct *work)
> +{
> +	int ret;
> +	struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, suspend_work.work);
> +	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
> +	struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
> +
> +	/* Wait for any pending userqueue fence to signal */
> +	ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
> +	if (ret) {
> +		DRM_ERROR("Not suspending userqueue, timeout waiting for work\n");
> +		return;
> +	}
> +
> +	mutex_lock(&uq_mgr->userq_mutex);
> +	ret = amdgpu_userqueue_suspend_all(uq_mgr);
> +	if (ret) {
> +		DRM_ERROR("Failed to evict userqueue\n");
> +		goto unlock;
> +	}
> +
> +	/* Signal current eviction fence */
> +	ret = amdgpu_eviction_fence_signal(evf_mgr);
> +	if (ret) {
> +		DRM_ERROR("Can't signal eviction fence to suspend\n");
> +		goto unlock;
> +	}
> +
> +	/* Cleanup old eviction fence entry */
> +	amdgpu_eviction_fence_destroy(evf_mgr);
> +
> +unlock:
> +	mutex_unlock(&uq_mgr->userq_mutex);
> +}
> +
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>   {
> +	struct amdgpu_fpriv *fpriv;
> +
>   	mutex_init(&userq_mgr->userq_mutex);
>   	idr_init_base(&userq_mgr->userq_idr, 1);
>   	userq_mgr->adev = adev;
> +	userq_mgr->num_userqs = 0;
> +
> +	fpriv = uq_mgr_to_fpriv(userq_mgr);
> +	if (!fpriv->evf_mgr.ev_fence) {
> +		DRM_ERROR("Eviction fence not initialized yet\n");
> +		return -EINVAL;
> +	}
>   
> +	/* This reference is required for suspend work */
> +	fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
> +	INIT_DELAYED_WORK(&userq_mgr->suspend_work, amdgpu_userqueue_suspend_worker);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 37be29048f42..8b3b50fa8b5b 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -27,6 +27,10 @@
>   
>   #define AMDGPU_MAX_USERQ_COUNT 512
>   
> +#define to_ev_fence(f) container_of(f, struct amdgpu_eviction_fence, base)
> +#define work_to_uq_mgr(w, name) container_of(w, struct amdgpu_userq_mgr, name)
> +#define uq_mgr_to_fpriv(u) container_of(u, struct amdgpu_fpriv, userq_mgr)
> +
>   struct amdgpu_mqd_prop;
>   
>   struct amdgpu_userq_obj {
> @@ -50,6 +54,7 @@ struct amdgpu_usermode_queue {
>   	struct amdgpu_userq_obj wptr_obj;
>   	struct xarray		uq_fence_drv_xa;
>   	struct amdgpu_userq_fence_driver *fence_drv;
> +	struct dma_fence	*last_fence;
>   };
>   
>   struct amdgpu_userq_funcs {
> @@ -69,6 +74,9 @@ struct amdgpu_userq_mgr {
>   	struct idr			userq_idr;
>   	struct mutex			userq_mutex;
>   	struct amdgpu_device		*adev;
> +
> +	struct delayed_work		suspend_work;
> +	int num_userqs;
>   };
>   
>   int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
> @@ -86,4 +94,6 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
>   int amdgpu_userqueue_update_bo_mapping(struct drm_file *filp, struct amdgpu_bo_va *bo_va,
>   				       uint32_t operation, uint32_t syncobj_handle,
>   				       uint64_t point);
> +
> +int amdgpu_userqueue_enable_signaling(struct dma_fence *f);
>   #endif


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask
  2024-09-09 20:06 ` [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask Shashank Sharma
@ 2024-09-17 12:21   ` Christian König
  0 siblings, 0 replies; 38+ messages in thread
From: Christian König @ 2024-09-17 12:21 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Christian König, Alex Deucher

On 09.09.24 22:06, Shashank Sharma wrote:
> From: Arvind Yadav <Arvind.Yadav@amd.com>
>
> Current MES GFX mask prevents FW to enable oversubscription. This patch
> does the following:
> - Fixes the mask values and adds a description for the same
> - Removes the central mask setup and makes it IP specific, as it would
>    be different when the number of pipes and queues are different.
>
> Cc: Christian König <Christian.Koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> Change-Id: I86f5b89c5527c23df94edc707c69c78819f4c8cf

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 +-
>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++++++--
>   3 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> index f7d5d4f08a53..dbf19122dfc3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> @@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
>   		adev->mes.compute_hqd_mask[i] = 0xc;
>   	}
>   
> -	for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
> -		adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
> -
>   	for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
>   		if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
>   		    IP_VERSION(6, 0, 0))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index 96788c0f42f1..45e3508f0f8e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -109,8 +109,8 @@ struct amdgpu_mes {
>   
>   	uint32_t                        vmid_mask_gfxhub;
>   	uint32_t                        vmid_mask_mmhub;
> -	uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
>   	uint32_t                        gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
> +	uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
>   	uint32_t                        sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
>   	uint32_t                        aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
>   	uint32_t                        sch_ctx_offs[AMDGPU_MAX_MES_PIPES];
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index 2911c45cfbe0..d2610a664b2a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -653,8 +653,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
>   		mes_set_hw_res_pkt.compute_hqd_mask[i] =
>   			mes->compute_hqd_mask[i];
>   
> -	for (i = 0; i < MAX_GFX_PIPES; i++)
> -		mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
> +	/*
> +	 * GFX pipe 0 queue 0 is being used by kernel
> +	 * Set GFX pipe 0 queue 1 for MES scheduling
> +	 * GFX pipe 1 can't be used for MES due to HW limitation.
> +	 */
> +	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
> +	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
>   
>   	for (i = 0; i < MAX_SDMA_PIPES; i++)
>   		mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"
  2024-09-09 20:06 ` [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO" Shashank Sharma
@ 2024-09-17 12:25   ` Christian König
  0 siblings, 0 replies; 38+ messages in thread
From: Christian König @ 2024-09-17 12:25 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav

On 09.09.24 at 22:06, Shashank Sharma wrote:
> From: Arvind Yadav <Arvind.Yadav@amd.com>
>
> This reverts commit 6be2ad4f0073c541146caa66c5ae936c955a8224.

Signed-off-by line is missing; apart from that, Reviewed-by: Christian König 
<christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 ----
>   1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 7823faa3dbaa..2e3c974a3340 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -365,10 +365,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>   	uint32_t handle, initial_domain;
>   	int r;
>   
> -	/* reject DOORBELLs until userspace code to use it is available */
> -	if (args->in.domains & AMDGPU_GEM_DOMAIN_DOORBELL)
> -		return -EINVAL;
> -
>   	/* reject invalid gem flags */
>   	if (flags & ~(AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
>   		      AMDGPU_GEM_CREATE_NO_CPU_ACCESS |



* Re: [PATCH v11 24/28] drm/amdgpu: resume gfx userqueues
  2024-09-09 20:06 ` [PATCH v11 24/28] drm/amdgpu: resume " Shashank Sharma
@ 2024-09-17 12:30   ` Christian König
  2024-09-25  9:15     ` Sharma, Shashank
  0 siblings, 1 reply; 38+ messages in thread
From: Christian König @ 2024-09-17 12:30 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, Christian Koenig, Arvind Yadav

On 09.09.24 at 22:06, Shashank Sharma wrote:
> This patch adds support for userqueue resume. What it typically does is
> this:
> - adds a new delayed work for resuming all the queues.
> - schedules this delayed work from the suspend work.
> - validates the BOs and replaces the eviction fence before resuming all
>    the queues running under this instance of userq manager.
>
> V2: Addressed Christian's review comments:
>      - declare local variables like ret at the bottom.
>      - lock all the object first, then start attaching the new fence.
>      - dont replace old eviction fence, just attach new eviction fence.
>      - no error logs for drm_exec_lock failures
>      - no need to reserve bos after drm_exec_locked
>      - schedule the resume worker immediately (not after 100 ms)
>      - check for NULL BO (Arvind)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   1 +
>   2 files changed, 121 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 979174f80993..e7f7354e0c0e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -405,6 +405,122 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>   	return r;
>   }
>   
> +static int
> +amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	const struct amdgpu_userq_funcs *userq_funcs;
> +	struct amdgpu_usermode_queue *queue;
> +	int queue_id, ret;
> +
> +	userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
> +
> +	/* Resume all the queues for this process */
> +	idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
> +		ret = userq_funcs->resume(uq_mgr, queue);
> +		if (ret)
> +			DRM_ERROR("Failed to resume queue %d\n", queue_id);
> +	}
> +
> +	return ret;
> +}
> +
> +static int
> +amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
> +	struct amdgpu_vm *vm = &fpriv->vm;
> +	struct amdgpu_bo_va *bo_va, *tmp;
> +	struct drm_exec exec;
> +	struct amdgpu_bo *bo;
> +	int ret;
> +
> +	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
> +	drm_exec_until_all_locked(&exec) {
> +		ret = amdgpu_vm_lock_pd(vm, &exec, 2);
> +		drm_exec_retry_on_contention(&exec);
> +		if (unlikely(ret)) {
> +			DRM_ERROR("Failed to lock PD\n");

I would drop those error messages in the low level function.

The most likely reason (apart from contention) why locking a BO fails is 
that we were interrupted, and in that case we actually don't want to 
print anything.

Apart from that I really need to wrap my head around the VM code once 
more, but this should probably work for now.

Regards,
Christian.

> +			goto unlock_all;
> +		}
> +
> +		/* Lock the done list */
> +		list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
> +			bo = bo_va->base.bo;
> +			if (!bo)
> +				continue;
> +
> +			ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
> +			drm_exec_retry_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto unlock_all;
> +		}
> +
> +		/* Lock the invalidated list */
> +		list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
> +			bo = bo_va->base.bo;
> +			if (!bo)
> +				continue;
> +
> +			ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
> +			drm_exec_retry_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto unlock_all;
> +		}
> +	}
> +
> +	/* Now validate BOs */
> +	list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
> +		bo = bo_va->base.bo;
> +		if (!bo)
> +			continue;
> +
> +		ret = amdgpu_userqueue_validate_vm_bo(NULL, bo);
> +		if (ret) {
> +			DRM_ERROR("Failed to validate BO\n");
> +			goto unlock_all;
> +		}
> +	}
> +
> +	/* Handle the moved BOs */
> +	ret = amdgpu_vm_handle_moved(uq_mgr->adev, vm, &exec.ticket);
> +	if (ret) {
> +		DRM_ERROR("Failed to handle moved BOs\n");
> +		goto unlock_all;
> +	}
> +
> +	ret = amdgpu_eviction_fence_replace_fence(fpriv);
> +	if (ret)
> +		DRM_ERROR("Failed to replace eviction fence\n");
> +
> +unlock_all:
> +	drm_exec_fini(&exec);
> +	return ret;
> +}
> +
> +static void amdgpu_userqueue_resume_worker(struct work_struct *work)
> +{
> +	struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, resume_work.work);
> +	int ret;
> +
> +	mutex_lock(&uq_mgr->userq_mutex);
> +
> +	ret = amdgpu_userqueue_validate_bos(uq_mgr);
> +	if (ret) {
> +		DRM_ERROR("Failed to validate BOs to restore\n");
> +		goto unlock;
> +	}
> +
> +	ret = amdgpu_userqueue_resume_all(uq_mgr);
> +	if (ret) {
> +		DRM_ERROR("Failed to resume all queues\n");
> +		goto unlock;
> +	}
> +
> +unlock:
> +	mutex_unlock(&uq_mgr->userq_mutex);
> +}
> +
>   static int
>   amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
>   {
> @@ -486,6 +602,9 @@ amdgpu_userqueue_suspend_worker(struct work_struct *work)
>   	/* Cleanup old eviction fence entry */
>   	amdgpu_eviction_fence_destroy(evf_mgr);
>   
> +	/* Schedule a work to restore userqueue */
> +	schedule_delayed_work(&uq_mgr->resume_work, 0);
> +
>   unlock:
>   	mutex_unlock(&uq_mgr->userq_mutex);
>   }
> @@ -508,6 +627,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>   	/* This reference is required for suspend work */
>   	fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
>   	INIT_DELAYED_WORK(&userq_mgr->suspend_work, amdgpu_userqueue_suspend_worker);
> +	INIT_DELAYED_WORK(&userq_mgr->resume_work, amdgpu_userqueue_resume_worker);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 8b3b50fa8b5b..d035b5c2b14b 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -76,6 +76,7 @@ struct amdgpu_userq_mgr {
>   	struct amdgpu_device		*adev;
>   
>   	struct delayed_work		suspend_work;
> +	struct delayed_work		resume_work;
>   	int num_userqs;
>   };
>   



* Re: [PATCH v11 00/28] AMDGPU usermode queues
  2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
                   ` (22 preceding siblings ...)
  2024-09-09 20:06 ` [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO" Shashank Sharma
@ 2024-09-19 16:59 ` Alex Deucher
  2024-09-25  9:14   ` Sharma, Shashank
  23 siblings, 1 reply; 38+ messages in thread
From: Alex Deucher @ 2024-09-19 16:59 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx

On Mon, Sep 9, 2024 at 4:07 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch series introduces base code of AMDGPU usermode queues for gfx
> workloads. Usermode queues is a method of GPU workload submission into the
> graphics hardware without any interaction with kernel/DRM schedulers. In
> this method, a userspace graphics application can create its own workqueue
> and submit it directly in the GPU HW.
>
> The general idea of how Userqueues are supposed to work:
> - The application creates the following GPU objects:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
>   - Other supporting buffer objects as per target IP engine (shadow, GDS
>     etc, information available with AMDGPU_INFO_IOCTL)

The queue, rptr, wptr, and metadata buffers don't have to be separate
buffers.  Userspace could suballocate them out of the same buffer.  We
just need the virtual addresses.  However, we need to keep track of
the GPU virtual addresses used by the user queue for these buffers and
prevent them from being unmapped until the queue is destroyed, similar
to what we do on the KFD side.  Otherwise, the user could unmap one of
the buffers and submit work to the user queue which could cause it to
hang.
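The bookkeeping Alex describes (remember which VA ranges a live user queue depends on and refuse to unmap them until the queue is destroyed) could be sketched roughly like this; all names and the fixed-size array are illustrative, not the actual amdgpu implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUEUE_VAS 8

/* Hypothetical per-queue tracking of pinned GPU VA ranges. */
struct userq_va_track {
	uint64_t start[MAX_QUEUE_VAS];
	uint64_t size[MAX_QUEUE_VAS];
	int count;
};

static void userq_pin_va(struct userq_va_track *t, uint64_t va, uint64_t size)
{
	t->start[t->count] = va;
	t->size[t->count] = size;
	t->count++;
}

/* An unmap request must be rejected if it overlaps any range pinned by
 * a live queue; otherwise submitting to the queue could hang the HW. */
static bool va_unmap_allowed(const struct userq_va_track *t,
			     uint64_t va, uint64_t size)
{
	for (int i = 0; i < t->count; i++) {
		if (va < t->start[i] + t->size[i] && t->start[i] < va + size)
			return false;
	}
	return true;
}
```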

Alex

> - The application picks a 32-bit offset in the doorbell page for this
>   queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the GPU addresses of these objects (read ptr,
>   write ptr, queue base address, shadow, gds) with doorbell object and
>   32-bit doorbell offset in the doorbell page.
> - The kernel creates the queue and maps it in the HW.
> - The application maps the GPU buffers in process address space.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - After filling the workload data in the queue, the app must write the
>   number of dwords added in the queue into the doorbell offset and the
>   WPTR buffer. The GPU will start fetching the data as soon as it's done.
> - This series adds usermode queue support for all three MES based IPs
>   (GFX, SDMA and Compute).
> - This series also adds eviction fences to handle migration of the
>   userqueue mapped buffers by TTM.
> - For synchronization of userqueues, we have added a secure semaphores
>   IOCTL which is getting reviewed separately here:
>   https://patchwork.freedesktop.org/patch/611971/
>
> libDRM UAPI changes for this series can be found here:
> (This also contains an example test utility which demonstrates
> the usage of userqueue UAPI)
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>
> MESA changes consuming this series can be seen in the MR here:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010
>
> Alex Deucher (1):
>   drm/amdgpu: UAPI for user queue management
>
> Arvind Yadav (4):
>   drm/amdgpu: enable SDMA usermode queues
>   drm/amdgpu: Add input fence to sync bo unmap
>   drm/amdgpu: fix MES GFX mask
>   Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"
>
> Shashank Sharma (18):
>   drm/amdgpu: add usermode queue base code
>   drm/amdgpu: add new IOCTL for usermode queue
>   drm/amdgpu: add helpers to create userqueue object
>   drm/amdgpu: create MES-V11 usermode queue for GFX
>   drm/amdgpu: create context space for usermode queue
>   drm/amdgpu: map usermode queue into MES
>   drm/amdgpu: map wptr BO into GART
>   drm/amdgpu: generate doorbell index for userqueue
>   drm/amdgpu: cleanup leftover queues
>   drm/amdgpu: enable GFX-V11 userqueue support
>   drm/amdgpu: enable compute/gfx usermode queue
>   drm/amdgpu: update userqueue BOs and PDs
>   drm/amdgpu: add kernel config for gfx-userqueue
>   drm/amdgpu: add gfx eviction fence helpers
>   drm/amdgpu: add userqueue suspend/resume functions
>   drm/amdgpu: suspend gfx userqueues
>   drm/amdgpu: resume gfx userqueues
>   Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"
>
>  drivers/gpu/drm/amd/amdgpu/Kconfig            |   8 +
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  10 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  11 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  10 +
>  .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 297 ++++++++
>  .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  67 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |  68 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  11 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |   3 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 713 ++++++++++++++++++
>  .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  74 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 644 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  42 +-
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  16 +-
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 395 ++++++++++
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 +
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |   5 +
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    | 100 +++
>  drivers/gpu/drm/amd/include/v11_structs.h     |   4 +-
>  include/uapi/drm/amdgpu_drm.h                 | 252 +++++++
>  22 files changed, 2722 insertions(+), 45 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> --
> 2.45.1
>
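The submission flow in the quoted cover letter (fill the workload packets into the queue, update the WPTR buffer, then write the doorbell so the GPU starts fetching) can be sketched from the userspace side roughly as follows; the struct layout, names, and packet format are purely illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace view of a user queue: in reality these are
 * GPU-visible buffers mapped into the process address space. */
struct userq_view {
	uint32_t *queue;		/* ring buffer of packets (dwords) */
	volatile uint64_t *wptr;	/* write pointer BO */
	volatile uint64_t *doorbell;	/* slot inside the mapped doorbell page */
	uint32_t ring_dwords;		/* ring size in dwords, power of two */
};

static void userq_submit(struct userq_view *q, const uint32_t *pkt, uint32_t ndw)
{
	uint64_t wptr = *q->wptr;

	/* Copy the workload packets into the ring. */
	for (uint32_t i = 0; i < ndw; i++)
		q->queue[(wptr + i) & (q->ring_dwords - 1)] = pkt[i];

	/* Publish the new write pointer, then ring the doorbell so the
	 * GPU starts fetching. */
	*q->wptr = wptr + ndw;
	*q->doorbell = wptr + ndw;
}
```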


* Re: [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers
  2024-09-16 14:14   ` Christian König
@ 2024-09-25  9:08     ` Sharma, Shashank
  0 siblings, 0 replies; 38+ messages in thread
From: Sharma, Shashank @ 2024-09-25  9:08 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Christian Koenig, Alex Deucher, Arvind Yadav

Hey Christian,

On 16/09/2024 16:14, Christian König wrote:
> On 09.09.24 at 22:06, Shashank Sharma wrote:
>> This patch adds basic eviction fence framework for the gfx buffers.
>> The idea is to:
>> - One eviction fence is created per gfx process, at kms_open.
>> - This fence is attached to all the gem buffers created
>>    by this process.
>> - This fence is detached to all the gem buffers at postclose_kms.
>>
>> This framework will be further used for usermode queues.
>>
>> V2: Addressed review comments from Christian
>>      - keep fence_ctx and fence_seq directly in fpriv
>>      - evcition_fence should be dynamically allocated
>>      - do not save eviction fence instance in BO, there could be many
>>        such fences attached to one BO
>>      - use dma_resv_replace_fence() in detach
>>
>> V3: Addressed review comments from Christian
>>      - eviction fence create and destroy functions should be called 
>> only once
>>        from fpriv create/destroy
>>      - use dma_fence_put() in eviction_fence_destroy
>>
>> V4: Addressed review comments from Christian:
>>      - create a separate ev_fence_mgr structure
>>      - cleanup fence init part
>>      - do not add a domain for fence owner KGD
>>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> Change-Id: I7a8d27d7172bafbfe34aa9decf2cd36655948275
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +-
>>   .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 148 ++++++++++++++++++
>>   .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  65 ++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   9 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
>>   6 files changed, 231 insertions(+), 2 deletions(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index ff5621697c68..0643078d1225 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -66,7 +66,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
>> amdgpu_kms.o \
>>       amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>>       amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
>>       amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o 
>> amdgpu_dev_coredump.o \
>> -    amdgpu_userq_fence.o
>> +    amdgpu_userq_fence.o amdgpu_eviction_fence.o
>>     amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 76ada47b1875..0013bfc74024 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -113,6 +113,7 @@
>>   #include "amdgpu_seq64.h"
>>   #include "amdgpu_reg_state.h"
>>   #include "amdgpu_userqueue.h"
>> +#include "amdgpu_eviction_fence.h"
>>   #if defined(CONFIG_DRM_AMD_ISP)
>>   #include "amdgpu_isp.h"
>>   #endif
>> @@ -481,7 +482,6 @@ struct amdgpu_flip_work {
>>       bool                async;
>>   };
>>   -
>>   /*
>>    * file private structure
>>    */
>> @@ -495,6 +495,10 @@ struct amdgpu_fpriv {
>>       struct idr        bo_list_handles;
>>       struct amdgpu_ctx_mgr    ctx_mgr;
>>       struct amdgpu_userq_mgr    userq_mgr;
>> +
>> +    /* Eviction fence infra */
>> +    struct amdgpu_eviction_fence_mgr evf_mgr;
>> +
>>       /** GPU partition selection */
>>       uint32_t        xcp_id;
>>   };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>> new file mode 100644
>> index 000000000000..2d474cb11cf9
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>> @@ -0,0 +1,148 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright 2024 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +#include <linux/sched.h>
>> +#include "amdgpu.h"
>> +
>> +static const char *
>> +amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
>> +{
>> +    return "amdgpu";
>> +}
>> +
>> +static const char *
>> +amdgpu_eviction_fence_get_timeline_name(struct dma_fence *f)
>> +{
>> +    struct amdgpu_eviction_fence *ef;
>> +
>> +    ef = container_of(f, struct amdgpu_eviction_fence, base);
>> +    return ef->timeline_name;
>> +}
>> +
>> +static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
>> +    .use_64bit_seqno = true,
>> +    .get_driver_name = amdgpu_eviction_fence_get_driver_name,
>> +    .get_timeline_name = amdgpu_eviction_fence_get_timeline_name,
>> +};
>> +
>> +int amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr)
>> +{
>> +    int ret;
>> +
>> +    spin_lock(&evf_mgr->ev_fence_lock);
>> +    ret = dma_fence_signal(&evf_mgr->ev_fence->base);
>> +    spin_unlock(&evf_mgr->ev_fence_lock);
>> +    return ret;
>> +}
>> +
>> +struct amdgpu_eviction_fence *
>> +amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
>> +{
>> +    struct amdgpu_eviction_fence *ev_fence;
>> +
>> +    ev_fence = kzalloc(sizeof(*ev_fence), GFP_KERNEL);
>> +    if (!ev_fence)
>> +        return NULL;
>> +
>> +    get_task_comm(ev_fence->timeline_name, current);
>> +    spin_lock_init(&ev_fence->lock);
>> +    dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops,
>> +               &ev_fence->lock, evf_mgr->ev_fence_ctx,
>> + atomic_inc_return(&evf_mgr->ev_fence_seq));
>> +    return ev_fence;
>> +}
>> +
>> +void amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr)
>> +{
>> +    if (!evf_mgr->ev_fence)
>> +        return;
>> +
>> +    if (!dma_fence_is_signaled(&evf_mgr->ev_fence->base))
>
> You can drop that if, dma_fence_wait() will check that anyway.
Noted
>
>> + dma_fence_wait(&evf_mgr->ev_fence->base, false);
>> +
>> +    /* Last unref of ev_fence */
>> +    spin_lock(&evf_mgr->ev_fence_lock);
>> +    dma_fence_put(&evf_mgr->ev_fence->base);
>> +    evf_mgr->ev_fence = NULL;
>> +    spin_unlock(&evf_mgr->ev_fence_lock);
>> +}
>> +
>> +int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr,
>> +                 struct amdgpu_bo *bo)
>> +{
>> +    struct dma_fence *ef;
>> +    struct amdgpu_eviction_fence *ev_fence = evf_mgr->ev_fence;
>> +    struct dma_resv *resv = bo->tbo.base.resv;
>> +    int ret;
>> +
>> +    if (!ev_fence || !resv)
>> +        return 0;
>> +
>> +    ef = &ev_fence->base;
>> +    ret = dma_resv_reserve_fences(resv, 1);
>> +    if (ret) {
>> +        dma_fence_wait(ef, false);
>> +        return ret;
>> +    }
>> +
>> +    spin_lock(&evf_mgr->ev_fence_lock);
>> +    dma_resv_add_fence(resv, ef, DMA_RESV_USAGE_BOOKKEEP);
>> +    spin_unlock(&evf_mgr->ev_fence_lock);
>
> That spinlock is protecting evf_mgr->ev_fence, isn't it?
>
> In that case you probably shouldn't dereference it outside of the 
> spinlock.
Noted
>
>> +    return 0;
>> +}
>> +
>> +void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr,
>> +                  struct amdgpu_bo *bo)
>> +{
>> +    struct dma_fence *stub;
>> +    struct amdgpu_eviction_fence *ev_fence = evf_mgr->ev_fence;
>> +
>> +    if (!ev_fence)
>> +        return;
>> +
>> +    spin_lock(&evf_mgr->ev_fence_lock);
>> +    stub = dma_fence_get_stub();
>> +    dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
>> +                stub, DMA_RESV_USAGE_BOOKKEEP);
>> +    dma_fence_put(stub);
>> +    spin_unlock(&evf_mgr->ev_fence_lock);
>
> This operation doesn't need the spinlock since we are not accessing 
> evf_mgr->ev_fence.

Actually, the intent of this lock was to serialize updates inside 
evf_mgr across the various queues of a process, so I wanted to use it 
for any updates to evf_mgr. Do you think that's unnecessary?

>
>> +}
>> +
>> +void amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr)
>> +{
>> +
>> +    /* This needs to be done one time per open */
>> +    atomic_set(&evf_mgr->ev_fence_seq, 0);
>> +    evf_mgr->ev_fence_ctx = dma_fence_context_alloc(1);
>> +    spin_lock_init(&evf_mgr->ev_fence_lock);
>> +
>> +    spin_lock(&evf_mgr->ev_fence_lock);
>> +    evf_mgr->ev_fence = amdgpu_eviction_fence_create(evf_mgr);
>
> amdgpu_eviction_fence_create() will call kmalloc, doing that while 
> holding the spinlock is a bad idea.
>
> You need to do something like:
>
> tmp = amdgpu_eviction_fence_create(....);
> spin_lock(...);
> evf_mgr->ev_fence = tmp;
> ...
Noted
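For reference, the allocate-outside-the-lock pattern Christian describes above can be sketched in userspace like this (an atomic flag stands in for the kernel spinlock; all names are invented for the sketch):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fence { int seq; };

struct fence_mgr {
	atomic_flag lock;	/* stand-in for the kernel spinlock */
	struct fence *fence;
};

static int mgr_install_fence(struct fence_mgr *mgr, int seq)
{
	/* The allocation may sleep, so it must happen before the
	 * spinlock is taken. */
	struct fence *tmp = malloc(sizeof(*tmp));

	if (!tmp)
		return -1;
	tmp->seq = seq;

	while (atomic_flag_test_and_set(&mgr->lock))
		;			/* spin_lock() */
	mgr->fence = tmp;		/* only the publication is locked */
	atomic_flag_clear(&mgr->lock);	/* spin_unlock() */
	return 0;
}
```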
>
>> +    if (!evf_mgr->ev_fence) {
>> +        DRM_ERROR("Failed to craete eviction fence\n");
>> +        goto unlock;
>> +    }
>> +
>> +unlock:
>> +    spin_unlock(&evf_mgr->ev_fence_lock);
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>> new file mode 100644
>> index 000000000000..b47ab1307ec5
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>> @@ -0,0 +1,65 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright 2023 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef AMDGPU_EV_FENCE_H_
>> +#define AMDGPU_EV_FENCE_H_
>> +
>> +struct amdgpu_eviction_fence {
>> +    struct dma_fence base;
>> +    spinlock_t     lock;
>> +    char         timeline_name[TASK_COMM_LEN];
>> +    struct amdgpu_userq_mgr *uq_mgr;
>> +};
>> +
>> +struct amdgpu_eviction_fence_mgr {
>> +    u64            ev_fence_ctx;
>> +    atomic_t        ev_fence_seq;
>> +    spinlock_t         ev_fence_lock;
>> +    struct amdgpu_eviction_fence *ev_fence;
>> +};
>> +
>> +/* Eviction fence helper functions */
>> +struct amdgpu_eviction_fence *
>> +amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr);
>> +
>> +void
>> +amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr);
>> +
>> +int
>> +amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
>> +                 struct amdgpu_bo *bo);
>> +
>> +void
>> +amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
>> +                 struct amdgpu_bo *bo);
>> +
>> +void
>> +amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr);
>> +
>> +int
>> +amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr);
>> +
>> +int
>> +amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv);
>> +#endif
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index f4529f2fad97..c9b4a6ce3f14 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -186,6 +186,13 @@ static int amdgpu_gem_object_open(struct 
>> drm_gem_object *obj,
>>           bo_va = amdgpu_vm_bo_add(adev, vm, abo);
>>       else
>>           ++bo_va->ref_count;
>> +
>> +    if (!vm->is_compute_context || !vm->process_info) {
>
> I said it before, we should really drop this line since the user 
> queues are completely independent of that.
I can see that the compute code is trying to segregate the KFD UQs, so I 
thought we should do the same.
>
>> +        /* attach gfx eviction fence */
>> +        if (amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo))
>> +            DRM_DEBUG_DRIVER("Failed to attach eviction fence to 
>> BO\n");
>> +    }
>> +
>>       amdgpu_bo_unreserve(abo);
>>         /* Validate and add eviction fence to DMABuf imports with 
>> dynamic
>> @@ -236,6 +243,8 @@ static void amdgpu_gem_object_close(struct 
>> drm_gem_object *obj,
>>       struct drm_exec exec;
>>       long r;
>>   +    amdgpu_eviction_fence_detach(&fpriv->evf_mgr, bo);
>> +
>
> We should probably skip that call for per VM BOs, or otherwise we will 
> also detach the page tables accidentally.

Agreed, but we have not added the fence to the VM root yet, due to the 
dependency as discussed. There will be a separate patch for that, 
following this one.

- Shashank

>
> BTW: Where do we attach the eviction fence to the page tables?
>
> Regards,
> Christian.
>
>>       drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
>>       drm_exec_until_all_locked(&exec) {
>>           r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 019a377620ce..e7fb13e20197 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -1391,6 +1391,8 @@ int amdgpu_driver_open_kms(struct drm_device 
>> *dev, struct drm_file *file_priv)
>>       mutex_init(&fpriv->bo_list_lock);
>>       idr_init_base(&fpriv->bo_list_handles, 1);
>>   +    amdgpu_eviction_fence_init(&fpriv->evf_mgr);
>> +
>>       amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>>         r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
>> @@ -1464,6 +1466,7 @@ void amdgpu_driver_postclose_kms(struct 
>> drm_device *dev,
>>           amdgpu_bo_unreserve(pd);
>>       }
>>   +    amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
>>       amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>>       amdgpu_vm_fini(adev, &fpriv->vm);
>>       amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>


* Re: [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues
  2024-09-17 11:58   ` Christian König
@ 2024-09-25  9:13     ` Sharma, Shashank
  0 siblings, 0 replies; 38+ messages in thread
From: Sharma, Shashank @ 2024-09-25  9:13 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Alex Deucher, Christian Koenig, Arvind Yadav


On 17/09/2024 13:58, Christian König wrote:
> On 09.09.24 at 22:06, Shashank Sharma wrote:
>> This patch adds suspend support for gfx userqueues. It typically does
>> the following:
>> - adds an enable_signaling function for the eviction fence, so that it
>>    can trigger the userqueue suspend,
>> - adds a delayed function for suspending the userqueues, to suspend all
>>    the queues under this userq manager and signals the eviction fence,
>> - adds reference of userq manager in the eviction fence container so
>>    that it can be used in the suspend function.
>>
>> V2: Addressed Christian's review comments:
>>      - schedule suspend work immediately
>>
>> V4: Addressed Christian's review comments:
>>      - wait for pending uq fences before starting suspend, added
>>        queue->last_fence for the same
>>      - accommodate ev_fence_mgr into existing code
>>      - some bug fixes and NULL checks
>>
>> V5: Addressed Christian's review comments (gitlab)
>>      - Wait for eviction fence to get signaled in destroy, don't 
>> signal it
>>      - Wait for eviction fence to get signaled in replace fence, don't 
>> signal it
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> Change-Id: Ib60a7feda5544e3badc87bd1a991931ee726ee82
>> ---
>>   .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 149 ++++++++++++++++++
>>   .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   2 +
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   |  10 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 100 ++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  10 ++
>>   6 files changed, 272 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>> index 2d474cb11cf9..3d4fc704adb1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>> @@ -22,8 +22,12 @@
>>    *
>>    */
>>   #include <linux/sched.h>
>> +#include <drm/drm_exec.h>
>>   #include "amdgpu.h"
>>   +#define work_to_evf_mgr(w, name) container_of(w, struct 
>> amdgpu_eviction_fence_mgr, name)
>> +#define evf_mgr_to_fpriv(e) container_of(e, struct amdgpu_fpriv, 
>> evf_mgr)
>> +
>>   static const char *
>>   amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
>>   {
>> @@ -39,10 +43,150 @@ amdgpu_eviction_fence_get_timeline_name(struct 
>> dma_fence *f)
>>       return ef->timeline_name;
>>   }
>>   +static void
>> +amdgpu_eviction_fence_update_fence(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr,
>> +                   struct amdgpu_eviction_fence *new_ef)
>> +{
>> +    struct dma_fence *old_ef = &evf_mgr->ev_fence->base;
>
> The spinlock is protecting evf_mgr->ev_fence so this access without 
> holding the spinlock here is illegal.
>
> I think you should just drop the local variable.
>
Agreed
>> +
>> +    spin_lock(&evf_mgr->ev_fence_lock);
>> +    dma_fence_put(old_ef);
>> +    evf_mgr->ev_fence = new_ef;
>> +    spin_unlock(&evf_mgr->ev_fence_lock);
>> +}
>> +
>> +int
>> +amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv)
>> +{
>> +    struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_eviction_fence *old_ef, *new_ef;
>> +    struct amdgpu_bo_va *bo_va, *tmp;
>> +    int ret;
>> +
>> +    old_ef = evf_mgr->ev_fence;
>> +    if (old_ef && !dma_fence_is_signaled(&old_ef->base)) {
>> +        DRM_DEBUG_DRIVER("Old EF not signaled yet\n");
>> +        dma_fence_wait(&old_ef->base, true);
>> +    }
>
> Please completely drop that.

I need some clarification on this: if we reach this point to replace the 
eviction fence, but the previous eviction fence has not signaled yet, 
should we not wait for the old ev_fence to signal before replacing it?

>
>> +
>> +    new_ef = amdgpu_eviction_fence_create(evf_mgr);
>> +    if (!new_ef) {
>> +        DRM_ERROR("Failed to create new eviction fence\n");
>> +        return ret;
>> +    }
>> +
>> +    /* Replace fences and free old one */
>> +    amdgpu_eviction_fence_update_fence(evf_mgr, new_ef);
>> +
>> +    /* Attach new eviction fence to BOs */
>> +    list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
>
> It's probably better to use drm_exec_for_each_locked() here.
Noted,
>
>> +        struct amdgpu_bo *bo = bo_va->base.bo;
>> +
>> +        if (!bo)
>> +            continue;
>> +
>> +        /* Skip pinned BOs */
>> +        if (bo->tbo.pin_count)
>> +            continue;
>
> Clearly a bad idea, even pinned BOs need the eviction fence because 
> they can be unpinned at any time.
Noted
>
>> +
>> +        ret = amdgpu_eviction_fence_attach(evf_mgr, bo);
>> +        if (ret) {
>> +            DRM_ERROR("Failed to attach new eviction fence\n");
>> +            goto free_err;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +
>> +free_err:
>> +    kfree(new_ef);
>> +    return ret;
>> +}
>> +
>> +static void
>> +amdgpu_eviction_fence_suspend_worker(struct work_struct *work)
>> +{
>> +    struct amdgpu_eviction_fence_mgr *evf_mgr = 
>> work_to_evf_mgr(work, suspend_work.work);
>> +    struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_bo_va *bo_va, *tmp;
>> +    struct drm_exec exec;
>> +    struct amdgpu_bo *bo;
>> +    int ret;
>> +
>> +    /* Signal old eviction fence */
>> +    ret = amdgpu_eviction_fence_signal(evf_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to signal eviction fence err=%d\n", ret);
>> +        return;
>> +    }
>> +
>> +    /* Cleanup old eviction fence entry */
>> +    amdgpu_eviction_fence_destroy(evf_mgr);
>
> Off hand that looks like a bad idea to me. The eviction fence should 
> never become NULL unless the fd is closed.
>
> In general we need to make sure that nothing races here, e.g. we 
> always need a defensive ordering.
>
> Something like:
> 1. Lock all BOs
> 2. Create new eviction fence,
> 3. Publish eviction fence in the evf_mgr.
> 4. Add the eviction fence to the BOs.
> 5. Drop locks on all BOs.
>
> This way concurrently opening/closing BOs should always see the right 
> eviction fence.

Noted, I will do the sequence change as suggested.

- Shashank

>
> Regards,
> Christian.
>
>> +
>> +    /* Do not replace eviction fence if fd is getting closed */
>> +    if (evf_mgr->eviction_allowed)
>> +        return;
>> +
>> +    drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
>> +    drm_exec_until_all_locked(&exec) {
>> +        ret = amdgpu_vm_lock_pd(vm, &exec, 2);
>> +        drm_exec_retry_on_contention(&exec);
>> +        if (unlikely(ret)) {
>> +            DRM_ERROR("Failed to lock PD\n");
>> +            goto unlock_drm;
>> +        }
>> +
>> +        /* Lock the done list */
>> +        list_for_each_entry_safe(bo_va, tmp, &vm->done, 
>> base.vm_status) {
>> +            bo = bo_va->base.bo;
>> +            if (!bo) continue;
>> +
>> +            ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
>> +            drm_exec_retry_on_contention(&exec);
>> +            if (unlikely(ret))
>> +                goto unlock_drm;
>> +        }
>> +    }
>> +    /* Replace old eviction fence with new one */
>> +    ret = amdgpu_eviction_fence_replace_fence(fpriv);
>> +    if (ret)
>> +        DRM_ERROR("Failed to replace eviction fence\n");
>> +unlock_drm:
>> +    drm_exec_fini(&exec);
>> +}
>> +
>> +static bool amdgpu_eviction_fence_enable_signaling(struct dma_fence *f)
>> +{
>> +    struct amdgpu_eviction_fence_mgr *evf_mgr;
>> +    struct amdgpu_eviction_fence *ev_fence;
>> +    struct amdgpu_userq_mgr *uq_mgr;
>> +    struct amdgpu_fpriv *fpriv;
>> +
>> +    if (!f)
>> +        return true;
>> +
>> +    ev_fence = to_ev_fence(f);
>> +    uq_mgr = ev_fence->uq_mgr;
>> +    fpriv = uq_mgr_to_fpriv(uq_mgr);
>> +    evf_mgr = &fpriv->evf_mgr;
>> +
>> +    if (uq_mgr->num_userqs)
>
> I don't think you should make that decision here. At least off hand 
> that looks racy.
>
> Probably better to always trigger the suspend work in the uq manager.
>
>> +        /* If userqueues are active, suspend userqueues */
>> +        schedule_delayed_work(&uq_mgr->suspend_work, 0);
>> +    else
>> +        /* Else just signal and replace eviction fence */
>> +        schedule_delayed_work(&evf_mgr->suspend_work, 0);
>> +
>> +    return true;
>> +}
>> +
>>   static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
>>       .use_64bit_seqno = true,
>>       .get_driver_name = amdgpu_eviction_fence_get_driver_name,
>>       .get_timeline_name = amdgpu_eviction_fence_get_timeline_name,
>> +    .enable_signaling = amdgpu_eviction_fence_enable_signaling,
>>   };
>>     int amdgpu_eviction_fence_signal(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr)
>> @@ -59,11 +203,14 @@ struct amdgpu_eviction_fence *
>>   amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr 
>> *evf_mgr)
>>   {
>>       struct amdgpu_eviction_fence *ev_fence;
>> +    struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>>         ev_fence = kzalloc(sizeof(*ev_fence), GFP_KERNEL);
>>       if (!ev_fence)
>>           return NULL;
>>   +    ev_fence->uq_mgr = uq_mgr;
>>       get_task_comm(ev_fence->timeline_name, current);
>>       spin_lock_init(&ev_fence->lock);
>>       dma_fence_init(&ev_fence->base, &amdgpu_eviction_fence_ops,
>> @@ -143,6 +290,8 @@ void amdgpu_eviction_fence_init(struct 
>> amdgpu_eviction_fence_mgr *evf_mgr)
>>           goto unlock;
>>       }
>>   +    INIT_DELAYED_WORK(&evf_mgr->suspend_work, 
>> amdgpu_eviction_fence_suspend_worker);
>> +
>>   unlock:
>>       spin_unlock(&evf_mgr->ev_fence_lock);
>>   }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>> index b47ab1307ec5..712fabf09fc1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>> @@ -37,6 +37,8 @@ struct amdgpu_eviction_fence_mgr {
>>       atomic_t        ev_fence_seq;
>>       spinlock_t         ev_fence_lock;
>>       struct amdgpu_eviction_fence *ev_fence;
>> +    struct delayed_work    suspend_work;
>> +    bool eviction_allowed;
>>   };
>>     /* Eviction fence helper functions */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index e7fb13e20197..88f3a885b1dc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -1434,6 +1434,7 @@ void amdgpu_driver_postclose_kms(struct 
>> drm_device *dev,
>>   {
>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>       struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>> +    struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
>>       struct amdgpu_bo_list *list;
>>       struct amdgpu_bo *pd;
>>       u32 pasid;
>> @@ -1466,6 +1467,7 @@ void amdgpu_driver_postclose_kms(struct 
>> drm_device *dev,
>>           amdgpu_bo_unreserve(pd);
>>       }
>>   +    evf_mgr->eviction_allowed = true;
>>       amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
>>       amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>>       amdgpu_vm_fini(adev, &fpriv->vm);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> index 614953b0fc19..4cf65aba9a9b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> @@ -455,10 +455,18 @@ int amdgpu_userq_signal_ioctl(struct drm_device 
>> *dev, void *data,
>>       if (r)
>>           goto exec_fini;
>>   -    for (i = 0; i < num_bo_handles; i++)
>> +    /* Save the fence to wait for during suspend */
>> +    dma_fence_put(queue->last_fence);
>> +    queue->last_fence = dma_fence_get(fence);
>> +
>> +    for (i = 0; i < num_bo_handles; i++) {
>> +        if (!gobj || !gobj[i]->resv)
>> +            continue;
>> +
>>           dma_resv_add_fence(gobj[i]->resv, fence,
>>                      dma_resv_usage_rw(args->bo_flags &
>>                      AMDGPU_USERQ_BO_WRITE));
>> +    }
>>         /* Add the created fence to syncobj/BO's */
>>       for (i = 0; i < num_syncobj_handles; i++)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index ba986d55ceeb..979174f80993 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -22,6 +22,7 @@
>>    *
>>    */
>>   #include <drm/drm_syncobj.h>
>> +#include <drm/drm_exec.h>
>>   #include "amdgpu.h"
>>   #include "amdgpu_vm.h"
>>   #include "amdgpu_userqueue.h"
>> @@ -282,6 +283,7 @@ amdgpu_userqueue_destroy(struct drm_file *filp, 
>> int queue_id)
>>       amdgpu_bo_unpin(queue->db_obj.obj);
>>       amdgpu_bo_unref(&queue->db_obj.obj);
>>       amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
>> +    uq_mgr->num_userqs--;
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       return 0;
>>   }
>> @@ -369,6 +371,7 @@ amdgpu_userqueue_create(struct drm_file *filp, 
>> union drm_amdgpu_userq *args)
>>           goto unlock;
>>       }
>>       args->out.queue_id = qid;
>> +    uq_mgr->num_userqs++;
>>     unlock:
>>       mutex_unlock(&uq_mgr->userq_mutex);
>> @@ -402,12 +405,109 @@ int amdgpu_userq_ioctl(struct drm_device *dev, 
>> void *data,
>>       return r;
>>   }
>>   +static int
>> +amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    const struct amdgpu_userq_funcs *userq_funcs;
>> +    struct amdgpu_usermode_queue *queue;
>> +    int queue_id, ret;
>> +
>> +    userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
>> +
>> +    /* Suspend all the queues for this process */
>> +    idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
>> +        ret = userq_funcs->suspend(uq_mgr, queue);
>> +        if (ret)
>> +            DRM_ERROR("Failed to suspend queue\n");
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static int
>> +amdgpu_userqueue_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    struct amdgpu_usermode_queue *queue;
>> +    int queue_id, ret;
>> +
>> +    idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
>> +        struct dma_fence *f;
>> +        struct drm_exec exec;
>> +
>> +        drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>> +        drm_exec_until_all_locked(&exec) {
>> +            f = queue->last_fence;
>> +            drm_exec_retry_on_contention(&exec);
>> +        }
>> +        drm_exec_fini(&exec);
>> +
>> +        if (!f || dma_fence_is_signaled(f))
>> +            continue;
>> +        ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>> +        if (ret <= 0) {
>> +            DRM_ERROR("Timed out waiting for fence f=%p\n", f);
>> +            return -ETIMEDOUT;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static void
>> +amdgpu_userqueue_suspend_worker(struct work_struct *work)
>> +{
>> +    int ret;
>> +    struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, 
>> suspend_work.work);
>> +    struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
>> +    struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
>> +
>> +    /* Wait for any pending userqueue fence to signal */
>> +    ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Not suspending userqueue, timeout waiting for 
>> work\n");
>> +        return;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +    ret = amdgpu_userqueue_suspend_all(uq_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to evict userqueue\n");
>> +        goto unlock;
>> +    }
>> +
>> +    /* Signal current eviction fence */
>> +    ret = amdgpu_eviction_fence_signal(evf_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Can't signal eviction fence to suspend\n");
>> +        goto unlock;
>> +    }
>> +
>> +    /* Cleanup old eviction fence entry */
>> +    amdgpu_eviction_fence_destroy(evf_mgr);
>> +
>> +unlock:
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +}
>> +
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>> struct amdgpu_device *adev)
>>   {
>> +    struct amdgpu_fpriv *fpriv;
>> +
>>       mutex_init(&userq_mgr->userq_mutex);
>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>       userq_mgr->adev = adev;
>> +    userq_mgr->num_userqs = 0;
>> +
>> +    fpriv = uq_mgr_to_fpriv(userq_mgr);
>> +    if (!fpriv->evf_mgr.ev_fence) {
>> +        DRM_ERROR("Eviction fence not initialized yet\n");
>> +        return -EINVAL;
>> +    }
>>   +    /* This reference is required for suspend work */
>> +    fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
>> +    INIT_DELAYED_WORK(&userq_mgr->suspend_work, 
>> amdgpu_userqueue_suspend_worker);
>>       return 0;
>>   }
>>   diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 37be29048f42..8b3b50fa8b5b 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -27,6 +27,10 @@
>>     #define AMDGPU_MAX_USERQ_COUNT 512
>>   +#define to_ev_fence(f) container_of(f, struct 
>> amdgpu_eviction_fence, base)
>> +#define work_to_uq_mgr(w, name) container_of(w, struct 
>> amdgpu_userq_mgr, name)
>> +#define uq_mgr_to_fpriv(u) container_of(u, struct amdgpu_fpriv, 
>> userq_mgr)
>> +
>>   struct amdgpu_mqd_prop;
>>     struct amdgpu_userq_obj {
>> @@ -50,6 +54,7 @@ struct amdgpu_usermode_queue {
>>       struct amdgpu_userq_obj wptr_obj;
>>       struct xarray        uq_fence_drv_xa;
>>       struct amdgpu_userq_fence_driver *fence_drv;
>> +    struct dma_fence    *last_fence;
>>   };
>>     struct amdgpu_userq_funcs {
>> @@ -69,6 +74,9 @@ struct amdgpu_userq_mgr {
>>       struct idr            userq_idr;
>>       struct mutex            userq_mutex;
>>       struct amdgpu_device        *adev;
>> +
>> +    struct delayed_work        suspend_work;
>> +    int num_userqs;
>>   };
>>     int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct 
>> drm_file *filp);
>> @@ -86,4 +94,6 @@ void amdgpu_userqueue_destroy_object(struct 
>> amdgpu_userq_mgr *uq_mgr,
>>   int amdgpu_userqueue_update_bo_mapping(struct drm_file *filp, 
>> struct amdgpu_bo_va *bo_va,
>>                          uint32_t operation, uint32_t syncobj_handle,
>>                          uint64_t point);
>> +
>> +int amdgpu_userqueue_enable_signaling(struct dma_fence *f);
>>   #endif
>


* Re: [PATCH v11 00/28] AMDGPU usermode queues
  2024-09-19 16:59 ` [PATCH v11 00/28] AMDGPU usermode queues Alex Deucher
@ 2024-09-25  9:14   ` Sharma, Shashank
  0 siblings, 0 replies; 38+ messages in thread
From: Sharma, Shashank @ 2024-09-25  9:14 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx


On 19/09/2024 18:59, Alex Deucher wrote:
> On Mon, Sep 9, 2024 at 4:07 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> This patch series introduces base code of AMDGPU usermode queues for gfx
>> workloads. Usermode queues is a method of GPU workload submission into the
>> graphics hardware without any interaction with kernel/DRM schedulers. In
>> this method, a userspace graphics application can create its own workqueue
>> and submit it directly in the GPU HW.
>>
>> The general idea of how Userqueues are supposed to work:
>> - The application creates the following GPU objects:
>>    - A queue object to hold the workload packets.
>>    - A read pointer object.
>>    - A write pointer object.
>>    - A doorbell page.
>>    - Other supporting buffer objects as per target IP engine (shadow, GDS
>>      etc, information available with AMDGPU_INFO_IOCTL)
> the queue, rptr, wptr, and metadata buffers don't have to be separate
> buffers.  Userspace could suballocate them out of the same buffer.  We
> just need the virtual addresses.  However, we need to keep track of
> the GPU virtual addresses used by the user queue for these buffers and
> prevent them from being unmapped until the queue is destroyed, similar
> to what we do on the KFD side.  Otherwise, the user could unmap one of
> the buffers and submit work to the user queue which could cause it to
> hang.
Noted, thanks Alex.
> Alex
>
>> - The application picks a 32-bit offset in the doorbell page for this
>>    queue.
>> - The application uses the usermode_queue_create IOCTL introduced in
>>    this patch, by passing the GPU addresses of these objects (read ptr,
>>    write ptr, queue base address, shadow, gds) with doorbell object and
>>    32-bit doorbell offset in the doorbell page.
>> - The kernel creates the queue and maps it in the HW.
>> - The application maps the GPU buffers in process address space.
>> - The application can start submitting the data in the queue as soon as
>>    the kernel IOCTL returns.
>> - After filling the workload data in the queue, the app must write the
>>    number of dwords added in the queue into the doorbell offset and the
>>    WPTR buffer. The GPU will start fetching the data as soon as it's done.
>> - This series adds usermode queue support for all three MES based IPs
>>    (GFX, SDMA and Compute).
>> - This series also adds eviction fences to handle migration of the
>>    userqueue mapped buffers by TTM.
>> - For synchronization of userqueues, we have added a secure semaphores
>>    IOCTL which is getting reviewed separately here:
>>    https://patchwork.freedesktop.org/patch/611971/
>>
>> libDRM UAPI changes for this series can be found here:
>> (This also contains an example test utility which demonstrates
>> the usage of userqueue UAPI)
>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>
>> MESA changes consuming this series can be seen in the MR here:
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010
>>
>> Alex Deucher (1):
>>    drm/amdgpu: UAPI for user queue management
>>
>> Arvind Yadav (4):
>>    drm/amdgpu: enable SDMA usermode queues
>>    drm/amdgpu: Add input fence to sync bo unmap
>>    drm/amdgpu: fix MES GFX mask
>>    Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"
>>
>> Shashank Sharma (18):
>>    drm/amdgpu: add usermode queue base code
>>    drm/amdgpu: add new IOCTL for usermode queue
>>    drm/amdgpu: add helpers to create userqueue object
>>    drm/amdgpu: create MES-V11 usermode queue for GFX
>>    drm/amdgpu: create context space for usermode queue
>>    drm/amdgpu: map usermode queue into MES
>>    drm/amdgpu: map wptr BO into GART
>>    drm/amdgpu: generate doorbell index for userqueue
>>    drm/amdgpu: cleanup leftover queues
>>    drm/amdgpu: enable GFX-V11 userqueue support
>>    drm/amdgpu: enable compute/gfx usermode queue
>>    drm/amdgpu: update userqueue BOs and PDs
>>    drm/amdgpu: add kernel config for gfx-userqueue
>>    drm/amdgpu: add gfx eviction fence helpers
>>    drm/amdgpu: add userqueue suspend/resume functions
>>    drm/amdgpu: suspend gfx userqueues
>>    drm/amdgpu: resume gfx userqueues
>>    Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"
>>
>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |   8 +
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |  10 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  11 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   5 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  10 +
>>   .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 297 ++++++++
>>   .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  67 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |  68 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  11 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |   3 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   2 +-
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 713 ++++++++++++++++++
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  74 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 644 ++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  42 +-
>>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  16 +-
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 395 ++++++++++
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 +
>>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |   5 +
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    | 100 +++
>>   drivers/gpu/drm/amd/include/v11_structs.h     |   4 +-
>>   include/uapi/drm/amdgpu_drm.h                 | 252 +++++++
>>   22 files changed, 2722 insertions(+), 45 deletions(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
>>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>
>> --
>> 2.45.1
>>


* Re: [PATCH v11 24/28] drm/amdgpu: resume gfx userqueues
  2024-09-17 12:30   ` Christian König
@ 2024-09-25  9:15     ` Sharma, Shashank
  0 siblings, 0 replies; 38+ messages in thread
From: Sharma, Shashank @ 2024-09-25  9:15 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Alex Deucher, Christian Koenig, Arvind Yadav


On 17/09/2024 14:30, Christian König wrote:
> Am 09.09.24 um 22:06 schrieb Shashank Sharma:
>> This patch adds support for userqueue resume. What it typically does is
>> this:
>> - adds a new delayed work for resuming all the queues.
>> - schedules this delayed work from the suspend work.
>> - validates the BOs and replaces the eviction fence before resuming all
>>    the queues running under this instance of userq manager.
>>
>> V2: Addressed Christian's review comments:
>>      - declare local variables like ret at the bottom.
>>      - lock all the object first, then start attaching the new fence.
>>      - dont replace old eviction fence, just attach new eviction fence.
>>      - no error logs for drm_exec_lock failures
>>      - no need to reserve bos after drm_exec_locked
>>      - schedule the resume worker immediately (not after 100 ms)
>>      - check for NULL BO (Arvind)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   1 +
>>   2 files changed, 121 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 979174f80993..e7f7354e0c0e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -405,6 +405,122 @@ int amdgpu_userq_ioctl(struct drm_device *dev, 
>> void *data,
>>       return r;
>>   }
>>   +static int
>> +amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    const struct amdgpu_userq_funcs *userq_funcs;
>> +    struct amdgpu_usermode_queue *queue;
>> +    int queue_id, ret;
>> +
>> +    userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
>> +
>> +    /* Resume all the queues for this process */
>> +    idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
>> +        ret = userq_funcs->resume(uq_mgr, queue);
>> +        if (ret)
>> +            DRM_ERROR("Failed to resume queue %d\n", queue_id);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static int
>> +amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_bo_va *bo_va, *tmp;
>> +    struct drm_exec exec;
>> +    struct amdgpu_bo *bo;
>> +    int ret;
>> +
>> +    drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
>> +    drm_exec_until_all_locked(&exec) {
>> +        ret = amdgpu_vm_lock_pd(vm, &exec, 2);
>> +        drm_exec_retry_on_contention(&exec);
>> +        if (unlikely(ret)) {
>> +            DRM_ERROR("Failed to lock PD\n");
>
> I would drop those error messages in the low level function.
>
> The most likely reason (apart from contention) that locking a BO fails 
> is that we were interrupted, and in that case we actually don't want to 
> print anything.
>
> Apart from that I really need to wrap my head around the VM code once 
> more, but that here should probably work for now.

Noted, I will remove the error message.

- Shashank

>
> Regards,
> Christian.
>
>> +            goto unlock_all;
>> +        }
>> +
>> +        /* Lock the done list */
>> +        list_for_each_entry_safe(bo_va, tmp, &vm->done, 
>> base.vm_status) {
>> +            bo = bo_va->base.bo;
>> +            if (!bo)
>> +                continue;
>> +
>> +            ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
>> +            drm_exec_retry_on_contention(&exec);
>> +            if (unlikely(ret))
>> +                goto unlock_all;
>> +        }
>> +
>> +        /* Lock the invalidated list */
>> +        list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, 
>> base.vm_status) {
>> +            bo = bo_va->base.bo;
>> +            if (!bo)
>> +                continue;
>> +
>> +            ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
>> +            drm_exec_retry_on_contention(&exec);
>> +            if (unlikely(ret))
>> +                goto unlock_all;
>> +        }
>> +    }
>> +
>> +    /* Now validate BOs */
>> +    list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, 
>> base.vm_status) {
>> +        bo = bo_va->base.bo;
>> +        if (!bo)
>> +            continue;
>> +
>> +        ret = amdgpu_userqueue_validate_vm_bo(NULL, bo);
>> +        if (ret) {
>> +            DRM_ERROR("Failed to validate BO\n");
>> +            goto unlock_all;
>> +        }
>> +    }
>> +
>> +    /* Handle the moved BOs */
>> +    ret = amdgpu_vm_handle_moved(uq_mgr->adev, vm, &exec.ticket);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to handle moved BOs\n");
>> +        goto unlock_all;
>> +    }
>> +
>> +    ret = amdgpu_eviction_fence_replace_fence(fpriv);
>> +    if (ret)
>> +        DRM_ERROR("Failed to replace eviction fence\n");
>> +
>> +unlock_all:
>> +    drm_exec_fini(&exec);
>> +    return ret;
>> +}
>> +
>> +static void amdgpu_userqueue_resume_worker(struct work_struct *work)
>> +{
>> +    struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, 
>> resume_work.work);
>> +    int ret;
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +
>> +    ret = amdgpu_userqueue_validate_bos(uq_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to validate BOs to restore\n");
>> +        goto unlock;
>> +    }
>> +
>> +    ret = amdgpu_userqueue_resume_all(uq_mgr);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to resume all queues\n");
>> +        goto unlock;
>> +    }
>> +
>> +unlock:
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +}
>> +
>>   static int
>>   amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
>>   {
>> @@ -486,6 +602,9 @@ amdgpu_userqueue_suspend_worker(struct 
>> work_struct *work)
>>       /* Cleanup old eviction fence entry */
>>       amdgpu_eviction_fence_destroy(evf_mgr);
>>   +    /* Schedule a work to restore userqueue */
>> +    schedule_delayed_work(&uq_mgr->resume_work, 0);
>> +
>>   unlock:
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>   }
>> @@ -508,6 +627,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
>> *userq_mgr, struct amdgpu_devi
>>       /* This reference is required for suspend work */
>>       fpriv->evf_mgr.ev_fence->uq_mgr = userq_mgr;
>>       INIT_DELAYED_WORK(&userq_mgr->suspend_work, 
>> amdgpu_userqueue_suspend_worker);
>> +    INIT_DELAYED_WORK(&userq_mgr->resume_work, 
>> amdgpu_userqueue_resume_worker);
>>       return 0;
>>   }
>>   diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 8b3b50fa8b5b..d035b5c2b14b 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -76,6 +76,7 @@ struct amdgpu_userq_mgr {
>>       struct amdgpu_device        *adev;
>>         struct delayed_work        suspend_work;
>> +    struct delayed_work        resume_work;
>>       int num_userqs;
>>   };
>


* Re: [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue
  2024-09-09 20:05 ` [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue Shashank Sharma
@ 2024-10-18 17:39   ` Alex Deucher
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Deucher @ 2024-10-18 17:39 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Alex Deucher, Christian Koenig, Arvind Yadav

On Mon, Sep 9, 2024 at 4:07 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> The MES FW expects us to allocate at least one page of context
> space to process gang- and process-related context data. This
> patch creates a joint object for both and calculates the GPU
> address offsets of these spaces.
>
> V1: Addressed review comments on RFC patch:
>     Alex: Make this function IP specific
>
> V2: Addressed review comments from Christian
>     - Allocate only one object for total FW space, and calculate
>       offsets for each of these objects.
>
> V3: Integration with doorbell manager
>
> V4: Review comments:
>     - Remove shadow from FW space list from cover letter (Alex)
>     - Alignment of macro (Luben)
>
> V5: Merged patches 5 and 6 into this single patch
>     Addressed review comments:
>     - Use lower_32_bits instead of mask (Christian)
>     - gfx_v11_0 instead of gfx_v11 in function names (Alex)
>     - Shadow and GDS objects are now coming from userspace (Christian,
>       Alex)
>
> V6:
>     - Add a comment to replace amdgpu_bo_create_kernel() with
>       amdgpu_bo_create() during fw_ctx object creation (Christian).
>     - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
>       of generic queue structure and make it gen11 specific (Alex).
>
> V7:
>    - Using helper function to create/destroy userqueue objects.
>    - Removed FW object space allocation.
>
> V8:
>    - Updating FW object address from user values.
>
> V9:
>    - updated function name from gfx_v11_* to mes_v11_*
>
> V10:
>    - making this patch independent of IP based changes, moving any
>      GFX object related changes in GFX specific patch (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Acked-by: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>  2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index 63fd48a5b8b0..2486ea2d72fe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -27,6 +27,31 @@
>  #include "mes_v11_0.h"
>  #include "mes_v11_0_userqueue.h"
>
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE

I just realized these are set to PAGE_SIZE.  That's probably not what
we want, since PAGE_SIZE could be really large on some systems.  I
would change these to align with whatever sizes and alignments the
firmware expects.  4K is probably a good place to start, but maybe
that is bigger than we need.

Alex

> +
> +static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                           struct amdgpu_usermode_queue *queue,
> +                                           struct drm_amdgpu_userq_in *mqd_user)
> +{
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       int r, size;
> +
> +       /*
> +        * The FW expects at least one page space allocated for
> +        * process ctx and gang ctx each. Create an object
> +        * for the same.
> +        */
> +       size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
> +       r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
> +       if (r) {
> +               DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> +               return r;
> +       }
> +
> +       return 0;
> +}
> +
>  static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                                       struct drm_amdgpu_userq_in *args_in,
>                                       struct amdgpu_usermode_queue *queue)
> @@ -73,6 +98,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                 goto free_mqd;
>         }
>
> +       /* Create BO for FW operations */
> +       r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
> +       if (r) {
> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> +               goto free_mqd;
> +       }
> +
>         return 0;
>
>  free_mqd:
> @@ -88,6 +120,7 @@ static void
>  mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>                             struct amdgpu_usermode_queue *queue)
>  {
> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>         kfree(queue->userq_prop);
>         amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>  }
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index bbd29f68b8d4..643f31474bd8 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_vm        *vm;
>         struct amdgpu_userq_obj mqd;
> +       struct amdgpu_userq_obj fw_obj;
>  };
>
>  struct amdgpu_userq_funcs {
> --
> 2.45.1
>



Thread overview: 38+ messages
2024-09-09 20:05 [PATCH v11 00/28] AMDGPU usermode queues Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 01/28] drm/amdgpu: UAPI for user queue management Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 02/28] drm/amdgpu: add usermode queue base code Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 03/28] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 04/28] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 05/28] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue Shashank Sharma
2024-10-18 17:39   ` Alex Deucher
2024-09-09 20:05 ` [PATCH v11 07/28] drm/amdgpu: map usermode queue into MES Shashank Sharma
2024-09-09 20:05 ` [PATCH v11 08/28] drm/amdgpu: map wptr BO into GART Shashank Sharma
2024-09-16 12:39   ` Christian König
2024-09-09 20:06 ` [PATCH v11 09/28] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 10/28] drm/amdgpu: cleanup leftover queues Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 11/28] drm/amdgpu: enable GFX-V11 userqueue support Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 12/28] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 13/28] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 14/28] drm/amdgpu: update userqueue BOs and PDs Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 15/28] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers Shashank Sharma
2024-09-16 14:14   ` Christian König
2024-09-25  9:08     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 22/28] drm/amdgpu: add userqueue suspend/resume functions Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues Shashank Sharma
2024-09-17 11:58   ` Christian König
2024-09-25  9:13     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 24/28] drm/amdgpu: resume " Shashank Sharma
2024-09-17 12:30   ` Christian König
2024-09-25  9:15     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 25/28] drm/amdgpu: Add input fence to sync bo unmap Shashank Sharma
2024-09-09 20:06 ` [PATCH v11 26/28] drm/amdgpu: fix MES GFX mask Shashank Sharma
2024-09-17 12:21   ` Christian König
2024-09-09 20:06 ` [PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV" Shashank Sharma
2024-09-09 20:31   ` Alex Deucher
2024-09-11  9:20     ` Sharma, Shashank
2024-09-09 20:06 ` [PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO" Shashank Sharma
2024-09-17 12:25   ` Christian König
2024-09-19 16:59 ` [PATCH v11 00/28] AMDGPU usermode queues Alex Deucher
2024-09-25  9:14   ` Sharma, Shashank
