AMD-GFX Archive on lore.kernel.org
* [PATCH v4 00/11] Add CWSR support to user queues
@ 2026-01-22 10:39 Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 01/11] drm/amdgpu: Add helper function to get xcc count Lijo Lazar
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

This series ports some of the CWSR (Compute Wave Save Restore) functionality from KFD to the KGD side for user queues.

Changes included in this series:

v1:
  Allocate TBA/TMA regions for the first-level handler; the first-level handler always comes
  from the driver. Presently, this takes care of only dGPU allocations; APU support is TBD. A
  backend to add a second-level handler is included, but no IOCTL is provided yet. The TBA is
  allocated only once and the TMA is allocated per VM; both are tracked by a cwsr object
  maintained in the userqueue manager.

  Add save area and control stack calculations to the KGD side, along with support to specify
  save/restore area params while creating user queues. TBD: IOCTL parameters need to be
  modified to specify save area params. Also need to confirm size calculations with multi-XCC
  and obtain the number of XCCs used by a userqueue manager.
 
v2:
  Remove association of cwsr with the user queue manager (Christian)
  Add IOCTL support to query cwsr size, set cwsr parameters for user queues, and set the
  second-level handler.
  TBD: Handle level1 trap handler allocation for APUs.

v3:
  Removed 'TBD: Handle level1 trap handler allocation for APUs' (confirmed that APUs also use
  the same path).
  Rebase against amd-staging-drm-next.
  Fixes for issues reported by Jesse Zhang:
    Keep 2 pages for cwsr handler (TBA) and 1 page for TMA.
    Add cwsr_enabled in addition to cwsr_supported and use it to avoid NULL pointer dereferences.
v4:
  Add disable userqueue check (Alex)
  Fix usage of __free (Krzysztof)
  Relocate userqueue trap VA to avoid conflict (Jesse)
  Check the user save area size against the minimum size required (Alex). Control stack size
  is still matched exactly.
  Set trap enable flag (Jesse)
  Rename input parameter for consistency (Alex)
  Add new function to set debug trap flag.

Lijo Lazar (11):
  drm/amdgpu: Add helper function to get xcc count
  drm/amdgpu: Add cwsr functions
  drm/amdgpu: Fill cwsr save area details
  drm/amdgpu: Add user save area params validation
  drm/amdgpu: Add cwsr to device init/fini sequence
  drm/amdgpu: Add first level cwsr handler to userq
  drm/amdgpu: Add user save area params to mqd input
  drm/amdgpu: Add ioctl to get cwsr details
  drm/amdgpu: Add ioctl support for cwsr params
  drm/amdgpu: Add ioctl to set level2 handler
  drm/amdgpu: Add interface to set debug trap flag

 drivers/gpu/drm/amd/amdgpu/Makefile        |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c   | 634 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h   | 100 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  29 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c  |  24 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h  |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h     |  13 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h    |  22 +
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c     |  14 +
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c |  28 +
 include/uapi/drm/amdgpu_drm.h              |  56 ++
 14 files changed, 942 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h

-- 
2.49.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 01/11] drm/amdgpu: Add helper function to get xcc count
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 02/11] drm/amdgpu: Add cwsr functions Lijo Lazar
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx
  Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang,
	Alex Deucher

Add a helper function to get the number of XCCs given a partition id. If
there is no partition manager, return 1 by default.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index 8058e8f35d41..b780c12b07e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -217,4 +217,26 @@ amdgpu_get_next_xcp(struct amdgpu_xcp_mgr *xcp_mgr, int *from)
 	for (i = 0, xcp = amdgpu_get_next_xcp(xcp_mgr, &i); xcp; \
 	     ++i, xcp = amdgpu_get_next_xcp(xcp_mgr, &i))
 
+static inline int amdgpu_xcp_get_num_xcc(struct amdgpu_xcp_mgr *xcp_mgr,
+					 int xcp_id)
+{
+	struct amdgpu_xcp *xcp;
+	uint32_t xcc_mask;
+	int i, r;
+
+	if (!xcp_mgr || xcp_id == AMDGPU_XCP_NO_PARTITION)
+		return 1;
+	for_each_xcp(xcp_mgr, xcp, i) {
+		if (xcp->id == xcp_id) {
+			r = amdgpu_xcp_get_inst_details(xcp, AMDGPU_XCP_GFX,
+							&xcc_mask);
+			if (unlikely(r))
+				return 1;
+			else
+				return hweight32(xcc_mask);
+		}
+	}
+
+	return 1;
+}
 #endif
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 02/11] drm/amdgpu: Add cwsr functions
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 01/11] drm/amdgpu: Add helper function to get xcc count Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-23 20:41   ` Alex Deucher
  2026-01-22 10:39 ` [PATCH v4 03/11] drm/amdgpu: Fill cwsr save area details Lijo Lazar
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add functions related to CWSR handling inside the amdgpu framework.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile      |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 364 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  67 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   |  13 +-
 5 files changed, 445 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 8e22882b66aa..3b563c73bb66 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -67,7 +67,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
 	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
 	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
 	amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o amdgpu_dev_coredump.o \
-	amdgpu_cper.o amdgpu_userq_fence.o amdgpu_eviction_fence.o amdgpu_ip.o
+	amdgpu_cper.o amdgpu_userq_fence.o amdgpu_eviction_fence.o amdgpu_ip.o amdgpu_cwsr.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c11535c44c6..0ace28c170bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -328,6 +328,7 @@ struct kfd_vm_fault_info;
 struct amdgpu_hive_info;
 struct amdgpu_reset_context;
 struct amdgpu_reset_control;
+struct amdgpu_cwsr_isa;
 
 enum amdgpu_cp_irq {
 	AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP = 0,
@@ -1237,6 +1238,8 @@ struct amdgpu_device {
 	 * Must be last --ends in a flexible-array member.
 	 */
 	struct amdgpu_kfd_dev		kfd;
+
+	struct amdgpu_cwsr_info *cwsr_info;
 };
 
 static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
new file mode 100644
index 000000000000..f2d3837366bf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
@@ -0,0 +1,364 @@
+/*
+ * Copyright 2025 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <drm/drm_exec.h>
+
+#include "amdgpu.h"
+#include "cwsr_trap_handler.h"
+#include "amdgpu_cwsr.h"
+
+extern int cwsr_enable;
+
+#define AMDGPU_CWSR_TBA_MAX_SIZE (2 * AMDGPU_GPU_PAGE_SIZE)
+#define AMDGPU_CWSR_TMA_MAX_SIZE (AMDGPU_GPU_PAGE_SIZE)
+#define AMDGPU_CWSR_TMA_OFFSET (AMDGPU_CWSR_TBA_MAX_SIZE)
+
+enum amdgpu_cwsr_region {
+	AMDGPU_CWSR_TBA,
+	AMDGPU_CWSR_TMA,
+};
+
+static inline uint64_t amdgpu_cwsr_tba_vaddr(struct amdgpu_device *adev)
+{
+	uint64_t addr = AMDGPU_VA_RESERVED_TRAP_UQ_START(adev);
+
+	addr = amdgpu_gmc_sign_extend(addr);
+
+	return addr;
+}
+
+static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
+{
+	uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
+
+	if (!cwsr_enable || adev->gfx.disable_uq ||
+	    gc_ver < IP_VERSION(9, 0, 1))
+		return false;
+
+	return true;
+}
+
+static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
+					 struct amdgpu_cwsr_info *cwsr_info)
+{
+	uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
+
+	if (gc_ver < IP_VERSION(9, 0, 1)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx8_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx8_hex);
+	} else if (gc_ver == IP_VERSION(9, 4, 1)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_arcturus_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_arcturus_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_arcturus_hex);
+	} else if (gc_ver == IP_VERSION(9, 4, 2)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_aldebaran_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_aldebaran_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_aldebaran_hex);
+	} else if (gc_ver == IP_VERSION(9, 4, 3) ||
+		   gc_ver == IP_VERSION(9, 4, 4)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_4_3_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx9_4_3_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx9_4_3_hex);
+	} else if (gc_ver == IP_VERSION(9, 5, 0)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_5_0_hex) > PAGE_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx9_5_0_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx9_5_0_hex);
+	} else if (gc_ver < IP_VERSION(10, 1, 1)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx9_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx9_hex);
+	} else if (gc_ver < IP_VERSION(10, 3, 0)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_nv1x_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_nv1x_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_nv1x_hex);
+	} else if (gc_ver < IP_VERSION(11, 0, 0)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx10_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx10_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx10_hex);
+	} else if (gc_ver < IP_VERSION(12, 0, 0)) {
+		/* The gfx11 cwsr trap handler must fit inside a single page. */
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx11_hex) > PAGE_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx11_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx11_hex);
+	} else if (gc_ver < IP_VERSION(12, 1, 0)) {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx12_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx12_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx12_hex);
+	} else {
+		BUILD_BUG_ON(sizeof(cwsr_trap_gfx12_1_0_hex) >
+			     AMDGPU_CWSR_TBA_MAX_SIZE);
+		cwsr_info->isa_buf = cwsr_trap_gfx12_1_0_hex;
+		cwsr_info->isa_sz = sizeof(cwsr_trap_gfx12_1_0_hex);
+	}
+}
+
+int amdgpu_cwsr_init(struct amdgpu_device *adev)
+{
+	struct amdgpu_cwsr_info *cwsr_info __free(kfree) =
+		kzalloc(sizeof(*cwsr_info), GFP_KERNEL);
+	void *ptr;
+	int r;
+
+	if (!amdgpu_cwsr_is_supported(adev))
+		return -EOPNOTSUPP;
+
+	if (!cwsr_info)
+		return -ENOMEM;
+	amdgpu_cwsr_init_isa_details(adev, cwsr_info);
+
+	if (!cwsr_info->isa_sz)
+		return -EOPNOTSUPP;
+
+	r = amdgpu_bo_create_kernel(adev, AMDGPU_CWSR_TBA_MAX_SIZE, PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_GTT, &cwsr_info->isa_bo,
+				    NULL, &ptr);
+	if (r)
+		return r;
+
+	memcpy(ptr, cwsr_info->isa_buf, cwsr_info->isa_sz);
+	adev->cwsr_info = no_free_ptr(cwsr_info);
+
+	return 0;
+}
+
+void amdgpu_cwsr_fini(struct amdgpu_device *adev)
+{
+	if (!amdgpu_cwsr_is_enabled(adev))
+		return;
+
+	amdgpu_bo_free_kernel(&adev->cwsr_info->isa_bo, NULL, NULL);
+	kfree(adev->cwsr_info);
+	adev->cwsr_info = NULL;
+}
+
+/*
+ * amdgpu_cwsr_map_region maps a CWSR trap handler region into the given VM.
+ * It maps the virtual address returned by amdgpu_cwsr_tba_vaddr() (plus the
+ * TMA offset for the TMA region), so that each compute queue can use this
+ * virtual address for wave save/restore operations to support compute
+ * preemption. It should be called during amdgpu_vm_init.
+ */
+static int amdgpu_cwsr_map_region(struct amdgpu_device *adev,
+				  struct amdgpu_vm *vm,
+				  struct amdgpu_cwsr_trap_obj *cwsr,
+				  enum amdgpu_cwsr_region region)
+{
+	uint64_t cwsr_addr, va_flags, va;
+	struct amdgpu_bo_va **bo_va;
+	struct amdgpu_bo *bo;
+	uint32_t size;
+	int r;
+
+	if (!cwsr || !vm)
+		return -EINVAL;
+
+	cwsr_addr = amdgpu_cwsr_tba_vaddr(adev);
+
+	if (region == AMDGPU_CWSR_TBA) {
+		size = AMDGPU_CWSR_TBA_MAX_SIZE;
+		bo_va = &cwsr->tba_va;
+		bo = adev->cwsr_info->isa_bo;
+		va = cwsr_addr;
+		va_flags = (AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
+			    AMDGPU_VM_PAGE_EXECUTABLE);
+	} else {
+		size = AMDGPU_CWSR_TMA_MAX_SIZE;
+		bo_va = &cwsr->tma_va;
+		bo = cwsr->tma_bo;
+		va = cwsr_addr + AMDGPU_CWSR_TMA_OFFSET;
+		va_flags = (AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE);
+	}
+
+	*bo_va = amdgpu_vm_bo_add(adev, vm, bo);
+	if (!*bo_va)
+		return -ENOMEM;
+
+	va &= AMDGPU_GMC_HOLE_MASK;
+	r = amdgpu_vm_bo_map(adev, *bo_va, va, 0, size, va_flags);
+	if (r) {
+		dev_err(adev->dev, "failed to do bo map of %s region, err=%d\n",
+			(region == AMDGPU_CWSR_TBA ? "tba" : "tma"), r);
+		amdgpu_vm_bo_del(adev, *bo_va);
+		*bo_va = NULL;
+		return r;
+	}
+
+	r = amdgpu_vm_bo_update(adev, *bo_va, false);
+	if (r) {
+		dev_err(adev->dev,
+			"failed to do page table update of %s region, err=%d\n",
+			(region == AMDGPU_CWSR_TBA ? "tba" : "tma"), r);
+		amdgpu_vm_bo_del(adev, *bo_va);
+		*bo_va = NULL;
+		return r;
+	}
+
+	if (region == AMDGPU_CWSR_TBA)
+		cwsr->tba_gpu_va_addr = va;
+	else
+		cwsr->tma_gpu_va_addr = va;
+
+	return 0;
+}
+
+static int amdgpu_cwsr_unmap_region(struct amdgpu_device *adev,
+				    struct amdgpu_cwsr_trap_obj *cwsr,
+				    enum amdgpu_cwsr_region region)
+{
+	struct amdgpu_bo_va **bo_va;
+	uint64_t va;
+	int r;
+
+	if (!cwsr)
+		return -EINVAL;
+
+	if (region == AMDGPU_CWSR_TBA) {
+		bo_va = &cwsr->tba_va;
+		va = cwsr->tba_gpu_va_addr;
+	} else {
+		bo_va = &cwsr->tma_va;
+		va = cwsr->tma_gpu_va_addr;
+	}
+
+	r = amdgpu_vm_bo_unmap(adev, *bo_va, va);
+	if (r) {
+		dev_err(adev->dev,
+			"failed to do bo_unmap on CWSR trap handler, err=%d\n",
+			r);
+		return r;
+	}
+
+	amdgpu_vm_bo_del(adev, *bo_va);
+	*bo_va = NULL;
+
+	return r;
+}
+
+int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+		      struct amdgpu_cwsr_trap_obj **trap_obj)
+{
+	struct amdgpu_cwsr_trap_obj *cwsr;
+	struct amdgpu_bo *bo;
+	struct drm_exec exec;
+	int r;
+
+	if (!amdgpu_cwsr_is_enabled(adev))
+		return -EOPNOTSUPP;
+	if (!vm || !trap_obj)
+		return -EINVAL;
+	cwsr = kzalloc(sizeof(*cwsr), GFP_KERNEL);
+	if (!cwsr)
+		return -ENOMEM;
+
+	bo = adev->cwsr_info->isa_bo;
+	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+	drm_exec_until_all_locked(&exec) {
+		r = amdgpu_vm_lock_pd(vm, &exec, 0);
+		if (likely(!r))
+			r = drm_exec_lock_obj(&exec, &bo->tbo.base);
+		drm_exec_retry_on_contention(&exec);
+		if (unlikely(r)) {
+			dev_err(adev->dev,
+				"failed to reserve for CWSR allocs: err=%d\n",
+				r);
+			goto err;
+		}
+	}
+
+	r = amdgpu_bo_create_kernel(adev, AMDGPU_CWSR_TMA_MAX_SIZE, PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_GTT, &cwsr->tma_bo, NULL,
+				    &cwsr->tma_cpu_addr);
+	if (r)
+		goto err;
+
+	r = amdgpu_cwsr_map_region(adev, vm, cwsr, AMDGPU_CWSR_TMA);
+	if (r)
+		goto err;
+	r = amdgpu_cwsr_map_region(adev, vm, cwsr, AMDGPU_CWSR_TBA);
+	if (r) {
+		amdgpu_cwsr_unmap_region(adev, cwsr, AMDGPU_CWSR_TMA);
+		goto err;
+	}
+
+err:
+	drm_exec_fini(&exec);
+	if (r) {
+		amdgpu_bo_free_kernel(&cwsr->tma_bo, NULL, NULL);
+		kfree(cwsr);
+		*trap_obj = NULL;
+	} else {
+		*trap_obj = cwsr;
+	}
+
+	return r;
+}
+
+void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+		      struct amdgpu_cwsr_trap_obj **trap_obj)
+{
+	struct amdgpu_bo *tba_bo;
+	struct amdgpu_bo *tma_bo;
+	struct drm_exec exec;
+	int r;
+
+	if (!trap_obj || !*trap_obj || !(*trap_obj)->tma_bo)
+		return;
+	tba_bo = adev->cwsr_info->isa_bo;
+	tma_bo = (*trap_obj)->tma_bo;
+
+	if (!tba_bo || !tma_bo)
+		return;
+
+	drm_exec_init(&exec, 0, 0);
+	drm_exec_until_all_locked(&exec)
+	{
+		r = amdgpu_vm_lock_pd(vm, &exec, 0);
+		if (likely(!r))
+			r = drm_exec_lock_obj(&exec, &tba_bo->tbo.base);
+		drm_exec_retry_on_contention(&exec);
+		if (likely(!r))
+			r = drm_exec_lock_obj(&exec, &tma_bo->tbo.base);
+		drm_exec_retry_on_contention(&exec);
+		if (unlikely(r)) {
+			dev_err(adev->dev,
+				"failed to reserve CWSR BOs: err=%d\n", r);
+			goto err;
+		}
+	}
+
+	amdgpu_cwsr_unmap_region(adev, *trap_obj, AMDGPU_CWSR_TBA);
+	amdgpu_cwsr_unmap_region(adev, *trap_obj, AMDGPU_CWSR_TMA);
+err:
+	drm_exec_fini(&exec);
+	amdgpu_bo_free_kernel(&(*trap_obj)->tma_bo, NULL, NULL);
+	kfree(*trap_obj);
+	*trap_obj = NULL;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
new file mode 100644
index 000000000000..26ed9308f70b
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright 2025 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef AMDGPU_CWSR_H
+#define AMDGPU_CWSR_H
+
+#include <linux/types.h>
+
+struct amdgpu_bo;
+struct amdgpu_bo_va;
+struct amdgpu_device;
+struct amdgpu_vm;
+
+/**
+ * struct amdgpu_cwsr_trap_obj - CWSR (Compute Wave Save Restore) trap
+ * handler tracking
+ * @tma_gpu_va_addr: GPU virtual address of the TMA region
+ * @tba_gpu_va_addr: GPU virtual address of the TBA region
+ * @tma_bo: buffer object backing the TMA region
+ * @tba_va: BO virtual address mapping of the TBA region
+ * @tma_va: BO virtual address mapping of the TMA region
+ * @tma_cpu_addr: CPU address of the TMA region
+ */
+struct amdgpu_cwsr_trap_obj {
+	uint64_t tma_gpu_va_addr;
+	uint64_t tba_gpu_va_addr;
+
+	struct amdgpu_bo *tma_bo;
+	struct amdgpu_bo_va *tba_va;
+	struct amdgpu_bo_va *tma_va;
+	void *tma_cpu_addr;
+};
+
+struct amdgpu_cwsr_info {
+	/* cwsr isa */
+	struct amdgpu_bo *isa_bo;
+	const void *isa_buf;
+	uint32_t isa_sz;
+};
+
+int amdgpu_cwsr_init(struct amdgpu_device *adev);
+void amdgpu_cwsr_fini(struct amdgpu_device *adev);
+
+int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+		      struct amdgpu_cwsr_trap_obj **cwsr_obj);
+void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+		      struct amdgpu_cwsr_trap_obj **cwsr_obj);
+static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
+{
+	return adev->cwsr_info != NULL;
+}
+
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 139642eacdd0..9bde17815a6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -176,10 +176,17 @@ struct amdgpu_bo_vm;
 #define AMDGPU_VA_RESERVED_TRAP_SIZE		(2ULL << 12)
 #define AMDGPU_VA_RESERVED_TRAP_START(adev)	(AMDGPU_VA_RESERVED_SEQ64_START(adev) \
 						 - AMDGPU_VA_RESERVED_TRAP_SIZE)
+/* TBD: Ideally, the existing TRAP VA should suffice. There is a conflict with
+ * the KFD mapping that needs to be resolved. Revisit later.
+ */
+#define AMDGPU_VA_RESERVED_TRAP_UQ_SIZE (3ULL << 12)
+#define AMDGPU_VA_RESERVED_TRAP_UQ_START(adev) \
+	(AMDGPU_VA_RESERVED_TRAP_START(adev) - AMDGPU_VA_RESERVED_TRAP_UQ_SIZE)
+
 #define AMDGPU_VA_RESERVED_BOTTOM		(1ULL << 16)
-#define AMDGPU_VA_RESERVED_TOP			(AMDGPU_VA_RESERVED_TRAP_SIZE + \
-						 AMDGPU_VA_RESERVED_SEQ64_SIZE + \
-						 AMDGPU_VA_RESERVED_CSA_SIZE)
+#define AMDGPU_VA_RESERVED_TOP                                            \
+	(AMDGPU_VA_RESERVED_TRAP_UQ_SIZE + AMDGPU_VA_RESERVED_TRAP_SIZE + \
+	 AMDGPU_VA_RESERVED_SEQ64_SIZE + AMDGPU_VA_RESERVED_CSA_SIZE)
 
 /* See vm_update_mode */
 #define AMDGPU_VM_USE_CPU_FOR_GFX (1 << 0)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 03/11] drm/amdgpu: Fill cwsr save area details
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 01/11] drm/amdgpu: Add helper function to get xcc count Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 02/11] drm/amdgpu: Add cwsr functions Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 04/11] drm/amdgpu: Add user save area params validation Lijo Lazar
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx
  Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang,
	Alex Deucher

Calculate the control stack and total save area sizes required.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 104 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |   4 +
 2 files changed, 108 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
index f2d3837366bf..80020fd33ce6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
@@ -32,6 +32,13 @@ extern int cwsr_enable;
 #define AMDGPU_CWSR_TMA_MAX_SIZE (AMDGPU_GPU_PAGE_SIZE)
 #define AMDGPU_CWSR_TMA_OFFSET (AMDGPU_CWSR_TBA_MAX_SIZE)
 
+#define SGPR_SIZE_PER_CU 0x4000
+#define LDS_SIZE_PER_CU 0x10000
+#define HWREG_SIZE_PER_CU 0x1000
+#define DEBUGGER_BYTES_ALIGN 64
+#define DEBUGGER_BYTES_PER_WAVE 32
+#define SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER 40
+
 enum amdgpu_cwsr_region {
 	AMDGPU_CWSR_TBA,
 	AMDGPU_CWSR_TMA,
@@ -121,6 +128,101 @@ static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
 	}
 }
 
+static uint32_t amdgpu_cwsr_get_vgpr_size_per_cu(struct amdgpu_device *adev)
+{
+	uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
+	uint32_t vgpr_size;
+
+	switch (gc_ver) {
+	case IP_VERSION(9, 4, 1): /* GFX_VERSION_ARCTURUS */
+	case IP_VERSION(9, 4, 2): /* GFX_VERSION_ALDEBARAN */
+	case IP_VERSION(9, 4, 3): /* GFX_VERSION_AQUA_VANJARAM */
+	case IP_VERSION(9, 4, 4): /* GFX_VERSION_AQUA_VANJARAM */
+	case IP_VERSION(9, 5, 0):
+		vgpr_size = 0x80000;
+		break;
+	case IP_VERSION(11, 0, 0):
+	case IP_VERSION(11, 0, 2):
+	case IP_VERSION(11, 0, 3):
+	case IP_VERSION(12, 0, 0):
+	case IP_VERSION(12, 0, 1):
+		vgpr_size = 0x60000;
+		break;
+	default:
+		vgpr_size = 0x40000;
+		break;
+	}
+
+	return vgpr_size;
+}
+
+static uint32_t amdgpu_cwsr_get_wg_ctxt_size_per_cu(struct amdgpu_device *adev)
+{
+	uint32_t lds_sz_per_cu;
+
+	lds_sz_per_cu =
+		(amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) ?
+			(adev->gfx.cu_info.lds_size << 10) :
+			LDS_SIZE_PER_CU;
+
+	return amdgpu_cwsr_get_vgpr_size_per_cu(adev) + SGPR_SIZE_PER_CU +
+	       lds_sz_per_cu + HWREG_SIZE_PER_CU;
+}
+
+static uint32_t amdgpu_cwsr_ctl_stack_bytes_per_wave(struct amdgpu_device *adev)
+{
+	uint32_t sz;
+
+	if (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(10, 1, 0))
+		sz = 12;
+	else
+		sz = 8;
+	return sz;
+}
+
+static void amdgpu_cwsr_init_save_area_info(struct amdgpu_device *adev,
+					    struct amdgpu_cwsr_info *cwsr_info)
+{
+	struct amdgpu_gfx_config *gfx_info = &adev->gfx.config;
+	uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
+	uint32_t ctl_stack_size, wg_data_size, dbg_mem_size;
+	uint32_t array_count;
+	uint32_t wave_num;
+	uint32_t cu_num;
+
+	if (gc_ver < IP_VERSION(9, 0, 1))
+		return;
+
+	array_count = gfx_info->max_shader_engines * gfx_info->max_sh_per_se;
+
+	cu_num = adev->gfx.cu_info.number / NUM_XCC(adev->gfx.xcc_mask);
+	wave_num = (gc_ver < IP_VERSION(10, 1, 0)) ? /* GFX_VERSION_NAVI10 */
+			   min(cu_num * 40,
+			       array_count / gfx_info->max_sh_per_se * 512) :
+			   cu_num * 32;
+
+	wg_data_size = ALIGN(cu_num * amdgpu_cwsr_get_wg_ctxt_size_per_cu(adev),
+			     PAGE_SIZE);
+	ctl_stack_size =
+		wave_num * amdgpu_cwsr_ctl_stack_bytes_per_wave(adev) + 8;
+	ctl_stack_size =
+		ALIGN(SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER + ctl_stack_size,
+		      PAGE_SIZE);
+	dbg_mem_size =
+		ALIGN(wave_num * DEBUGGER_BYTES_PER_WAVE, DEBUGGER_BYTES_ALIGN);
+	/*
+	 * HW design limits control stack size to 0x7000.
+	 * This is insufficient for theoretical PM4 cases
+	 * but sufficient for AQL, limited by SPI events.
+	 */
+	if (IP_VERSION_MAJ(gc_ver) == 10)
+		ctl_stack_size = min(ctl_stack_size, 0x7000);
+
+	cwsr_info->xcc_ctl_stack_sz = ctl_stack_size;
+	cwsr_info->xcc_cwsr_sz = ctl_stack_size + wg_data_size;
+	cwsr_info->xcc_dbg_mem_sz = dbg_mem_size;
+}
+
 int amdgpu_cwsr_init(struct amdgpu_device *adev)
 {
 	struct amdgpu_cwsr_info *cwsr_info __free(kfree) =
@@ -145,6 +247,8 @@ int amdgpu_cwsr_init(struct amdgpu_device *adev)
 		return r;
 
 	memcpy(ptr, cwsr_info->isa_buf, cwsr_info->isa_sz);
+
+	amdgpu_cwsr_init_save_area_info(adev, cwsr_info);
 	adev->cwsr_info = no_free_ptr(cwsr_info);
 
 	return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
index 26ed9308f70b..3c80d057bbed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
@@ -50,6 +50,10 @@ struct amdgpu_cwsr_info {
 	struct amdgpu_bo *isa_bo;
 	const void *isa_buf;
 	uint32_t isa_sz;
+	/* cwsr size info per XCC */
+	uint32_t xcc_ctl_stack_sz;
+	uint32_t xcc_dbg_mem_sz;
+	uint32_t xcc_cwsr_sz;
 };
 
 int amdgpu_cwsr_init(struct amdgpu_device *adev);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (2 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 03/11] drm/amdgpu: Fill cwsr save area details Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-23 20:44   ` Alex Deucher
  2026-01-22 10:39 ` [PATCH v4 05/11] drm/amdgpu: Add cwsr to device init/fini sequence Lijo Lazar
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add an interface to validate user-provided save area parameters. Address
validation is not done here and is expected to be done by the caller.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 44 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h | 11 ++++++
 2 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
index 80020fd33ce6..32d9398cd1d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
@@ -64,6 +64,15 @@ static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
 	return true;
 }
 
+uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc)
+{
+	if (!amdgpu_cwsr_is_enabled(adev))
+		return 0;
+
+	return num_xcc *
+	       (adev->cwsr_info->xcc_cwsr_sz + adev->cwsr_info->xcc_dbg_mem_sz);
+}
+
 static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
 					 struct amdgpu_cwsr_info *cwsr_info)
 {
@@ -425,6 +434,41 @@ int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	return r;
 }
 
+int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
+				struct amdgpu_cwsr_params *cwsr_params,
+				int num_xcc)
+{
+	struct amdgpu_cwsr_info *cwsr_info = adev->cwsr_info;
+
+	if (!amdgpu_cwsr_is_enabled(adev))
+		return -EOPNOTSUPP;
+
+	if (!cwsr_params)
+		return -EINVAL;
+
+	/*
+	 * Only control stack and save area size details checked. Address validation needs to be
+	 * carried out separately.
+	 */
+	if (cwsr_params->ctl_stack_sz !=
+	    (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
+		dev_dbg(adev->dev,
+			"queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
+			cwsr_params->ctl_stack_sz,
+			num_xcc * cwsr_info->xcc_ctl_stack_sz);
+		return -EINVAL;
+	}
+
+	if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
+		dev_dbg(adev->dev,
+			"queue cwsr size 0x%x less than min node cwsr size 0x%x\n",
+			cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		      struct amdgpu_cwsr_trap_obj **trap_obj)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
index 3c80d057bbed..96b03a8ed99b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
@@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
 	uint32_t xcc_cwsr_sz;
 };
 
+struct amdgpu_cwsr_params {
+	uint64_t ctx_save_area_address;
+	/* cwsr size info */
+	uint32_t ctl_stack_sz;
+	uint32_t cwsr_sz;
+};
+
 int amdgpu_cwsr_init(struct amdgpu_device *adev);
 void amdgpu_cwsr_fini(struct amdgpu_device *adev);
 
@@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
 	return adev->cwsr_info != NULL;
 }
 
+uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
+int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
+				struct amdgpu_cwsr_params *cwsr_params,
+				int num_xcc);
 #endif
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
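[Editorial sketch] The size validation in patch 04 above scales per-XCC sizes by the XCC count and rejects a queue whose control stack size does not match exactly or whose save area is below the minimum. A minimal standalone sketch of that check (the struct and function names here are illustrative, not the driver's API):

```c
#include <assert.h>
#include <stdint.h>

/* Per-XCC sizes as a node would report them (illustrative). */
struct cwsr_node_info {
	uint32_t xcc_ctl_stack_sz;
	uint32_t xcc_cwsr_sz;
};

/*
 * Mirror of the checks in amdgpu_cwsr_validate_params(): the control
 * stack size must match exactly; the save area must be at least the
 * node minimum. Returns 0 on success, -1 on mismatch.
 */
static int validate_cwsr_sizes(const struct cwsr_node_info *info, int num_xcc,
			       uint32_t ctl_stack_sz, uint32_t cwsr_sz)
{
	if (ctl_stack_sz != info->xcc_ctl_stack_sz * (uint32_t)num_xcc)
		return -1;
	if (cwsr_sz < info->xcc_cwsr_sz * (uint32_t)num_xcc)
		return -1;
	return 0;
}
```

Note that the asymmetry (exact match for the control stack, lower bound for the save area) matches the two conditions in the patch.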

* [PATCH v4 05/11] drm/amdgpu: Add cwsr to device init/fini sequence
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (3 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 04/11] drm/amdgpu: Add user save area params validation Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 06/11] drm/amdgpu: Add first level cwsr handler to userq Lijo Lazar
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx
  Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang,
	Alex Deucher

Initialize cwsr handler related info during device initialization.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 362ab2b34498..d84e513613d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -78,6 +78,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_virt.h"
 #include "amdgpu_dev_coredump.h"
+#include "amdgpu_cwsr.h"
 
 #include <linux/suspend.h>
 #include <drm/task_barrier.h>
@@ -3171,6 +3172,12 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 
 	r = amdgpu_cper_init(adev);
 
+	if (!r) {
+		r = amdgpu_cwsr_init(adev);
+		if (r == -EOPNOTSUPP)
+			r = 0;
+	}
+
 init_failed:
 
 	return r;
@@ -3561,6 +3568,7 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	amdgpu_cwsr_fini(adev);
 	amdgpu_cper_fini(adev);
 
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 06/11] drm/amdgpu: Add first level cwsr handler to userq
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (4 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 05/11] drm/amdgpu: Add cwsr to device init/fini sequence Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-22 10:39 ` [PATCH v4 07/11] drm/amdgpu: Add user save area params to mqd input Lijo Lazar
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx
  Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang,
	Alex Deucher

Add cwsr_trap_obj to the render file handle. It maps the first level cwsr
handler into the VM with which the file handle is associated. Use the
cwsr trap object's tba/tma addresses for the userqueue.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h   | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 8 ++++++++
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 7 +++++++
 4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0ace28c170bb..218d8030a07c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -329,6 +329,7 @@ struct amdgpu_hive_info;
 struct amdgpu_reset_context;
 struct amdgpu_reset_control;
 struct amdgpu_cwsr_isa;
+struct amdgpu_cwsr_trap_obj;
 
 enum amdgpu_cp_irq {
 	AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP = 0,
@@ -449,6 +450,7 @@ struct amdgpu_fpriv {
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
 	struct amdgpu_userq_mgr	userq_mgr;
+	struct amdgpu_cwsr_trap_obj *cwsr_trap;
 
 	/* Eviction fence infra */
 	struct amdgpu_eviction_fence_mgr evf_mgr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
index 96b03a8ed99b..b54240d40a6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
@@ -79,4 +79,10 @@ uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
 int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
 				struct amdgpu_cwsr_params *cwsr_params,
 				int num_xcc);
+static inline bool amdgpu_cwsr_has_dbg_wa(struct amdgpu_device *adev)
+{
+	uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
+
+	return gc_ver >= IP_VERSION(11, 0, 0) && gc_ver <= IP_VERSION(11, 0, 3);
+}
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 728033a88078..fed15a922346 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -46,6 +46,7 @@
 #include "amdgpu_reset.h"
 #include "amd_pcie.h"
 #include "amdgpu_userq.h"
+#include "amdgpu_cwsr.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1512,6 +1513,12 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 			 "Failed to init usermode queue manager (%d), use legacy workload submission only\n",
 			 r);
 
+	if (amdgpu_cwsr_is_enabled(adev)) {
+		r = amdgpu_cwsr_alloc(adev, &fpriv->vm, &fpriv->cwsr_trap);
+		if (r)
+			dev_dbg(adev->dev, "cwsr trap not enabled\n");
+	}
+
 	r = amdgpu_eviction_fence_init(&fpriv->evf_mgr);
 	if (r)
 		goto error_vm;
@@ -1584,6 +1591,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	}
 
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
+	amdgpu_cwsr_free(adev, &fpriv->vm, &fpriv->cwsr_trap);
 	amdgpu_vm_fini(adev, &fpriv->vm);
 
 	if (pasid)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
index f2309d72bbe6..27917614b1a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
@@ -26,6 +26,7 @@
 #include "amdgpu_gfx.h"
 #include "mes_userqueue.h"
 #include "amdgpu_userq_fence.h"
+#include "amdgpu_cwsr.h"
 
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
@@ -136,6 +137,7 @@ static int convert_to_mes_priority(int priority)
 static int mes_userq_map(struct amdgpu_usermode_queue *queue)
 {
 	struct amdgpu_userq_mgr *uq_mgr = queue->userq_mgr;
+	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
 	struct amdgpu_device *adev = uq_mgr->adev;
 	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
 	struct amdgpu_mqd_prop *userq_props = queue->userq_prop;
@@ -165,6 +167,11 @@ static int mes_userq_map(struct amdgpu_usermode_queue *queue)
 	queue_input.doorbell_offset = userq_props->doorbell_index;
 	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
 	queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
+	if (fpriv->cwsr_trap) {
+		queue_input.tba_addr = fpriv->cwsr_trap->tba_gpu_va_addr;
+		queue_input.tma_addr = fpriv->cwsr_trap->tma_gpu_va_addr;
+		queue_input.trap_en = !amdgpu_cwsr_has_dbg_wa(adev);
+	}
 
 	amdgpu_mes_lock(&adev->mes);
 	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
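[Editorial sketch] Patch 06 gates `trap_en` on `amdgpu_cwsr_has_dbg_wa()`, which tests whether the GC IP version falls in the 11.0.0-11.0.3 range. A self-contained sketch of that range test, re-defining the version packing locally (the real `IP_VERSION` macro lives in the driver's headers; the encoding below is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * amdgpu packs IP versions into a single integer so that ranges can be
 * compared numerically (illustrative re-definition for this sketch).
 */
#define IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/*
 * GC 11.0.0 through 11.0.3 need the debug workaround; on those parts
 * the patch programs trap_en = !has_dbg_wa, i.e. the trap is left off.
 */
static int cwsr_has_dbg_wa(uint32_t gc_ver)
{
	return gc_ver >= IP_VERSION(11, 0, 0) && gc_ver <= IP_VERSION(11, 0, 3);
}
```

Packing major/minor/revision into one word makes the two comparisons above equivalent to a lexicographic version compare.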

* [PATCH v4 07/11] drm/amdgpu: Add user save area params to mqd input
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (5 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 06/11] drm/amdgpu: Add first level cwsr handler to userq Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-23 20:47   ` Alex Deucher
  2026-01-22 10:39 ` [PATCH v4 08/11] drm/amdgpu: Add ioctl to get cwsr details Lijo Lazar
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add user save area parameters to mqd properties for queue creation.
Validate the parameters before using them for mqd initialization.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c  | 24 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h  |  5 +++++
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c     | 14 +++++++++++++
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 16 +++++++++++++++
 5 files changed, 63 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 218d8030a07c..26b757c95579 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -814,6 +814,10 @@ struct amdgpu_mqd_prop {
 	uint64_t fence_address;
 	bool tmz_queue;
 	bool kernel_queue;
+	/* cwsr params */
+	uint64_t ctx_save_area_addr;
+	uint32_t ctx_save_area_size;
+	uint32_t ctl_stack_size;
 };
 
 struct amdgpu_mqd {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 37a526a1085f..119b84a0703e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -33,6 +33,7 @@
 #include "amdgpu_userq.h"
 #include "amdgpu_hmm.h"
 #include "amdgpu_userq_fence.h"
+#include "amdgpu_cwsr.h"
 
 u32 amdgpu_userq_get_supported_ip_mask(struct amdgpu_device *adev)
 {
@@ -265,6 +266,29 @@ int amdgpu_userq_input_va_validate(struct amdgpu_device *adev,
 	return r;
 }
 
+int amdgpu_userq_input_cwsr_params_validate(
+	struct amdgpu_usermode_queue *queue,
+	struct amdgpu_cwsr_params *cwsr_params)
+{
+	struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(queue->userq_mgr);
+	struct amdgpu_device *adev = queue->userq_mgr->adev;
+	uint32_t cwsr_size;
+	int num_xcc;
+	int r;
+
+	num_xcc = amdgpu_xcp_get_num_xcc(adev->xcp_mgr, fpriv->xcp_id);
+	r = amdgpu_cwsr_validate_params(queue->userq_mgr->adev, cwsr_params,
+					num_xcc);
+	if (r)
+		return r;
+	cwsr_size = amdgpu_cwsr_size_needed(queue->userq_mgr->adev, num_xcc);
+	if (!cwsr_size)
+		return -EOPNOTSUPP;
+
+	return amdgpu_userq_input_va_validate(
+		adev, queue, cwsr_params->ctx_save_area_address, cwsr_size);
+}
+
 static bool amdgpu_userq_buffer_va_mapped(struct amdgpu_vm *vm, u64 addr)
 {
 	struct amdgpu_bo_va_mapping *mapping;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
index 5845d8959034..a64292bc24dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
@@ -41,6 +41,7 @@ enum amdgpu_userq_state {
 };
 
 struct amdgpu_mqd_prop;
+struct amdgpu_cwsr_params;
 
 struct amdgpu_userq_obj {
 	void		 *cpu_ptr;
@@ -157,4 +158,8 @@ int amdgpu_userq_input_va_validate(struct amdgpu_device *adev,
 int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
 				       struct amdgpu_bo_va_mapping *mapping,
 				       uint64_t saddr);
+int amdgpu_userq_input_cwsr_params_validate(
+	struct amdgpu_usermode_queue *queue,
+	struct amdgpu_cwsr_params *cwsr_params);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 40660b05f979..5f6a6f630495 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -3243,6 +3243,20 @@ static int gfx_v12_0_compute_mqd_init(struct amdgpu_device *adev, void *m,
 	mqd->fence_address_lo = lower_32_bits(prop->fence_address);
 	mqd->fence_address_hi = upper_32_bits(prop->fence_address);
 
+	/* If non-zero, assume cwsr is enabled */
+	if (prop->ctx_save_area_addr) {
+		mqd->cp_hqd_persistent_state |=
+			(1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
+		mqd->cp_hqd_ctx_save_base_addr_lo =
+			lower_32_bits(prop->ctx_save_area_addr);
+		mqd->cp_hqd_ctx_save_base_addr_hi =
+			upper_32_bits(prop->ctx_save_area_addr);
+		mqd->cp_hqd_ctx_save_size = prop->ctx_save_area_size;
+		mqd->cp_hqd_cntl_stack_size = prop->ctl_stack_size;
+		mqd->cp_hqd_cntl_stack_offset = prop->ctl_stack_size;
+		mqd->cp_hqd_wg_state_offset = prop->ctl_stack_size;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
index 27917614b1a8..7ad8297eb0d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
@@ -314,6 +314,7 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
 
 	if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
 		struct drm_amdgpu_userq_mqd_compute_gfx11 *compute_mqd;
+		struct amdgpu_cwsr_params cwsr_params;
 
 		if (mqd_user->mqd_size != sizeof(*compute_mqd)) {
 			DRM_ERROR("Invalid compute IP MQD size\n");
@@ -339,6 +340,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
 		userq_props->hqd_active = false;
 		userq_props->tmz_queue =
 			mqd_user->flags & AMDGPU_USERQ_CREATE_FLAGS_QUEUE_SECURE;
+
+		if (amdgpu_cwsr_is_enabled(adev)) {
+			cwsr_params.ctx_save_area_address =
+				userq_props->ctx_save_area_addr;
+			cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
+			cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
+
+			r = amdgpu_userq_input_cwsr_params_validate(
+				queue, &cwsr_params);
+			if (r) {
+				kfree(compute_mqd);
+				goto free_mqd;
+			}
+		}
+
 		kfree(compute_mqd);
 	} else if (queue->queue_type == AMDGPU_HW_IP_GFX) {
 		struct drm_amdgpu_userq_mqd_gfx11 *mqd_gfx_v11;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
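[Editorial sketch] The MQD programming in patch 07 implies a simple save-area layout: the control stack occupies the first `ctl_stack_size` bytes, and wave/workgroup state starts immediately after it (`cp_hqd_cntl_stack_offset == cp_hqd_wg_state_offset == ctl_stack_size`). A sketch of the derived offsets (struct and field names here are illustrative, not the hardware MQD):

```c
#include <assert.h>
#include <stdint.h>

/* Derived offsets within the user-supplied save/restore buffer. */
struct cwsr_layout {
	uint32_t ctl_stack_offset; /* end of the control stack region */
	uint32_t wg_state_offset;  /* start of wave/workgroup state */
	uint32_t wg_state_size;    /* remainder of the save area */
};

/*
 * Mirrors the relationships programmed into the MQD by
 * gfx_v12_0_compute_mqd_init(): both offsets equal the control stack
 * size, so wave state begins right after the control stack.
 */
static struct cwsr_layout cwsr_layout(uint32_t save_area_size,
				      uint32_t ctl_stack_size)
{
	struct cwsr_layout l = {
		.ctl_stack_offset = ctl_stack_size,
		.wg_state_offset = ctl_stack_size,
		.wg_state_size = save_area_size - ctl_stack_size,
	};
	return l;
}
```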

* [PATCH v4 08/11] drm/amdgpu: Add ioctl to get cwsr details
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (6 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 07/11] drm/amdgpu: Add user save area params to mqd input Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-23 20:48   ` Alex Deucher
  2026-01-22 10:39 ` [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params Lijo Lazar
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add an ioctl to return size information required for CWSR regions.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 21 +++++++++++++++++++++
 include/uapi/drm/amdgpu_drm.h           | 16 ++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index fed15a922346..992bcdf3fc1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1426,6 +1426,27 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 			return -EINVAL;
 		}
 	}
+	case AMDGPU_INFO_CWSR: {
+		struct drm_amdgpu_info_cwsr cwsr_info;
+		int num_xcc;
+
+		fpriv = (struct amdgpu_fpriv *)filp->driver_priv;
+		if (!amdgpu_cwsr_is_enabled(adev) || !fpriv->cwsr_trap)
+			return -EOPNOTSUPP;
+		num_xcc = amdgpu_xcp_get_num_xcc(adev->xcp_mgr, fpriv->xcp_id);
+		cwsr_info.ctl_stack_size =
+			adev->cwsr_info->xcc_ctl_stack_sz * num_xcc;
+		cwsr_info.dbg_mem_size =
+			adev->cwsr_info->xcc_dbg_mem_sz * num_xcc;
+		cwsr_info.min_save_area_size =
+			adev->cwsr_info->xcc_cwsr_sz * num_xcc;
+		if (copy_to_user(out, &cwsr_info,
+				 min((size_t)size, sizeof(cwsr_info))))
+			return -EFAULT;
+
+		return 0;
+	}
+
 	default:
 		DRM_DEBUG_KMS("Invalid request %d\n", info->query);
 		return -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index ab2bf47553e1..c178b8e0bd3f 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1269,6 +1269,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
 #define AMDGPU_INFO_GPUVM_FAULT			0x23
 /* query FW object size and alignment */
 #define AMDGPU_INFO_UQ_FW_AREAS			0x24
+/* query CWSR size and alignment */
+#define AMDGPU_INFO_CWSR			0x25
 
 #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT	0
 #define AMDGPU_INFO_MMR_SE_INDEX_MASK	0xff
@@ -1648,6 +1650,20 @@ struct drm_amdgpu_info_uq_metadata {
 	};
 };
 
+/**
+ * struct drm_amdgpu_info_cwsr - cwsr information
+ *
+ * Provides cwsr-related size details. Userspace should size its buffers based on this.
+ */
+struct drm_amdgpu_info_cwsr {
+	/* Control stack size */
+	__u32 ctl_stack_size;
+	/* Debug memory area size */
+	__u32 dbg_mem_size;
+	/* Minimum save area size required */
+	__u32 min_save_area_size;
+};
+
 /*
  * Supported GPU families
  */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
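[Editorial sketch] The `AMDGPU_INFO_CWSR` query in patch 08 copies at most `min(user buffer size, sizeof(reply))` bytes, so userspace built against a smaller (older) reply struct still receives a valid prefix. A sketch of that truncating copy, with `memcpy` standing in for `copy_to_user()` (the struct mirrors `drm_amdgpu_info_cwsr` from the patch; names are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reply layout mirroring struct drm_amdgpu_info_cwsr. */
struct info_cwsr {
	uint32_t ctl_stack_size;
	uint32_t dbg_mem_size;
	uint32_t min_save_area_size;
};

/*
 * Copy at most the caller's buffer size; a short buffer gets a valid
 * prefix of the reply instead of an overflow. Returns bytes copied.
 */
static size_t copy_info(void *out, size_t out_sz, const struct info_cwsr *info)
{
	size_t n = out_sz < sizeof(*info) ? out_sz : sizeof(*info);

	memcpy(out, info, n);
	return n;
}
```

This prefix-truncation pattern is what allows new fields to be appended to an info reply without breaking existing userspace.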

* [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (7 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 08/11] drm/amdgpu: Add ioctl to get cwsr details Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-23 20:51   ` Alex Deucher
  2026-01-22 10:39 ` [PATCH v4 10/11] drm/amdgpu: Add ioctl to set level2 handler Lijo Lazar
  2026-01-22 10:40 ` [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag Lijo Lazar
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add cwsr parameters to the userqueue ioctl. Userspace should pass the GPU
virtual address of the save/restore buffer and the size allocated. These
parameters are supported only for user compute queues.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 13 +++++++++----
 include/uapi/drm/amdgpu_drm.h              | 16 ++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
index 7ad8297eb0d8..2765317f04df 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
@@ -343,16 +343,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
 
 		if (amdgpu_cwsr_is_enabled(adev)) {
 			cwsr_params.ctx_save_area_address =
-				userq_props->ctx_save_area_addr;
-			cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
-			cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
-
+				compute_mqd->ctx_save_area_va;
+			cwsr_params.cwsr_sz = compute_mqd->ctx_save_area_size;
+			cwsr_params.ctl_stack_sz = compute_mqd->ctl_stack_size;
 			r = amdgpu_userq_input_cwsr_params_validate(
 				queue, &cwsr_params);
 			if (r) {
 				kfree(compute_mqd);
 				goto free_mqd;
 			}
+			userq_props->ctx_save_area_addr =
+				compute_mqd->ctx_save_area_va;
+			userq_props->ctx_save_area_size =
+				compute_mqd->ctx_save_area_size;
+			userq_props->ctl_stack_size =
+				compute_mqd->ctl_stack_size;
 		}
 
 		kfree(compute_mqd);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c178b8e0bd3f..b7a858365174 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -460,6 +460,22 @@ struct drm_amdgpu_userq_mqd_compute_gfx11 {
 	 * to get the size.
 	 */
 	__u64   eop_va;
+	/**
+	 * @ctx_save_area_va: Virtual address of the GPU memory for the save/restore
+	 * buffer. This must be a separate GPU object; use the AMDGPU_INFO IOCTL to
+	 * get the required size. It holds the control stack, wave context and debugger memory.
+	 */
+	__u64 ctx_save_area_va;
+	/**
+	 * @ctx_save_area_size:  Total size (in bytes) allocated for save/restore buffer.
+	 * Use AMDGPU_INFO IOCTL to get the size.
+	 */
+	__u32 ctx_save_area_size;
+	/**
+	 * @ctl_stack_size: Size (in bytes) of control stack region in the save/restore buffer.
+	 * Use AMDGPU_INFO IOCTL to get the size.
+	 */
+	__u32 ctl_stack_size;
 };
 
 /* userq signal/wait ioctl */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 10/11] drm/amdgpu: Add ioctl to set level2 handler
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (8 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params Lijo Lazar
@ 2026-01-22 10:39 ` Lijo Lazar
  2026-01-23 20:52   ` Alex Deucher
  2026-01-22 10:40 ` [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag Lijo Lazar
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:39 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add an ioctl to set the tba/tma of the level2 trap handler.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 105 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  11 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   2 +
 include/uapi/drm/amdgpu_drm.h            |  24 ++++++
 5 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 26b757c95579..c3dfd84c2962 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1575,7 +1575,6 @@ int amdgpu_enable_vblank_kms(struct drm_crtc *crtc);
 void amdgpu_disable_vblank_kms(struct drm_crtc *crtc);
 int amdgpu_info_ioctl(struct drm_device *dev, void *data,
 		      struct drm_file *filp);
-
 /*
  * functions used by amdgpu_encoder.c
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
index 32d9398cd1d1..70f444afece0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
@@ -510,3 +510,108 @@ void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	kfree(*trap_obj);
 	*trap_obj = NULL;
 }
+
+static int amdgpu_cwsr_validate_user_addr(struct amdgpu_device *adev,
+					  struct amdgpu_vm *vm,
+					  struct amdgpu_cwsr_usr_addr *usr_addr)
+{
+	struct amdgpu_bo_va_mapping *va_map;
+	uint64_t addr;
+	uint32_t size;
+	int r;
+
+	addr = (usr_addr->addr & AMDGPU_GMC_HOLE_MASK) >> AMDGPU_GPU_PAGE_SHIFT;
+	size = usr_addr->size >> AMDGPU_GPU_PAGE_SHIFT;
+
+	r = amdgpu_bo_reserve(vm->root.bo, false);
+	if (r)
+		return r;
+
+	va_map = amdgpu_vm_bo_lookup_mapping(vm, addr);
+	if (!va_map) {
+		r = -EINVAL;
+		goto err;
+	}
+	/* validate whether resident in the VM mapping range */
+	if (addr >= va_map->start && va_map->last - addr + 1 >= size) {
+		amdgpu_bo_unreserve(vm->root.bo);
+		return 0;
+	}
+
+	r = -EINVAL;
+err:
+	amdgpu_bo_unreserve(vm->root.bo);
+
+	return r;
+}
+
+static int amdgpu_cwsr_set_l2_trap_handler(
+	struct amdgpu_device *adev, struct amdgpu_vm *vm,
+	struct amdgpu_cwsr_trap_obj *cwsr_obj, struct amdgpu_cwsr_usr_addr *tma,
+	struct amdgpu_cwsr_usr_addr *tba)
+{
+	uint64_t *l1tma;
+	int r;
+
+	if (!amdgpu_cwsr_is_enabled(adev))
+		return -EOPNOTSUPP;
+
+	if (!cwsr_obj || !cwsr_obj->tma_cpu_addr || !tma || !tba)
+		return -EINVAL;
+	r = amdgpu_cwsr_validate_user_addr(adev, vm, tma);
+	if (r)
+		return r;
+	r = amdgpu_cwsr_validate_user_addr(adev, vm, tba);
+	if (r)
+		return r;
+
+	l1tma = (uint64_t *)(cwsr_obj->tma_cpu_addr);
+	l1tma[0] = tma->addr;
+	l1tma[1] = tba->addr;
+
+	return 0;
+}
+
+/*
+ * Userspace cwsr related ioctl
+ */
+/**
+ * amdgpu_cwsr_ioctl - Handle cwsr specific requests.
+ *
+ * @dev: drm device pointer
+ * @data: request object
+ * @filp: drm filp
+ *
+ * This function performs cwsr and trap handler related operations.
+ * Returns 0 on success, error code on failure.
+ */
+int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
+{
+	struct amdgpu_device *adev = drm_to_adev(dev);
+	union drm_amdgpu_cwsr *cwsr = data;
+	struct amdgpu_fpriv *fpriv;
+	int r;
+
+	fpriv = (struct amdgpu_fpriv *)filp->driver_priv;
+
+	if (!fpriv->cwsr_trap)
+		return -EOPNOTSUPP;
+
+	switch (cwsr->in.op) {
+	case AMDGPU_CWSR_OP_SET_L2_TRAP: {
+		struct amdgpu_cwsr_usr_addr tba;
+		struct amdgpu_cwsr_usr_addr tma;
+
+		tba.addr = cwsr->in.l2trap.tba_va;
+		tba.size = cwsr->in.l2trap.tba_sz;
+		tma.addr = cwsr->in.l2trap.tma_va;
+		tma.size = cwsr->in.l2trap.tma_sz;
+		r = amdgpu_cwsr_set_l2_trap_handler(
+			adev, &fpriv->vm, fpriv->cwsr_trap, &tma, &tba);
+	} break;
+	default:
+		return -EINVAL;
+	}
+
+	return r;
+}
\ No newline at end of file
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
index b54240d40a6c..c9f61e393fde 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
@@ -31,7 +31,7 @@ struct amdgpu_device;
 struct amdgpu_vm;
 
 /**
- * struct amdgpu_cwsr_obj - CWSR (Compute Wave Save Restore) buffer tracking
+ * struct amdgpu_cwsr_trap_obj - CWSR (Compute Wave Save Restore) buffer tracking
  * @bo: Buffer object for CWSR area
  * @bo_va: Buffer object virtual address mapping
  */
@@ -63,6 +63,11 @@ struct amdgpu_cwsr_params {
 	uint32_t cwsr_sz;
 };
 
+struct amdgpu_cwsr_usr_addr {
+	uint64_t addr;
+	uint32_t size;
+};
+
 int amdgpu_cwsr_init(struct amdgpu_device *adev);
 void amdgpu_cwsr_fini(struct amdgpu_device *adev);
 
@@ -85,4 +90,8 @@ static inline bool amdgpu_cwsr_has_dbg_wa(struct amdgpu_device *adev)
 
 	return gc_ver >= IP_VERSION(11, 0, 0) && gc_ver <= IP_VERSION(11, 0, 3);
 }
+
+int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *filp);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 771c89c84608..7fbd106fff8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -53,6 +53,7 @@
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_userq.h"
+#include "amdgpu_cwsr.h"
 #include "amdgpu_userq_fence.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
@@ -3074,6 +3075,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_SCHED, amdgpu_sched_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(AMDGPU_BO_LIST, amdgpu_bo_list_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_FENCE_TO_HANDLE, amdgpu_cs_fence_to_handle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_CWSR, amdgpu_cwsr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	/* KMS */
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index b7a858365174..a36e3e2e679c 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -58,6 +58,7 @@ extern "C" {
 #define DRM_AMDGPU_USERQ_SIGNAL		0x17
 #define DRM_AMDGPU_USERQ_WAIT		0x18
 #define DRM_AMDGPU_GEM_LIST_HANDLES	0x19
+#define DRM_AMDGPU_CWSR		0x1a
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -79,6 +80,8 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_USERQ_SIGNAL	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_SIGNAL, struct drm_amdgpu_userq_signal)
 #define DRM_IOCTL_AMDGPU_USERQ_WAIT	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_WAIT, struct drm_amdgpu_userq_wait)
 #define DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_LIST_HANDLES, struct drm_amdgpu_gem_list_handles)
+#define DRM_IOCTL_AMDGPU_CWSR \
+	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_CWSR, union drm_amdgpu_cwsr)
 
 /**
  * DOC: memory domains
@@ -1680,6 +1683,27 @@ struct drm_amdgpu_info_cwsr {
 	__u32 min_save_area_size;
 };
 
+/* cwsr ioctl */
+#define AMDGPU_CWSR_OP_SET_L2_TRAP 1
+
+struct drm_amdgpu_cwsr_in {
+	/* AMDGPU_CWSR_OP_* */
+	__u32 op;
+	struct {
+		/* Level 2 trap handler base address */
+		__u64 tba_va;
+		/* Level 2 trap handler buffer size (in bytes) */
+		__u32 tba_sz;
+		/* Level 2 trap memory buffer address */
+		__u64 tma_va;
+		/* Level 2 trap memory buffer size (in bytes) */
+		__u32 tma_sz;
+	} l2trap;
+};
+
+union drm_amdgpu_cwsr {
+	struct drm_amdgpu_cwsr_in in;
+};
 /*
  * Supported GPU families
  */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
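[Editorial sketch] Patch 10 validates a user-supplied level-2 trap handler address by converting it to page units and checking that the whole range falls inside one VM mapping: `addr >= va_map->start && va_map->last - addr + 1 >= size`. A standalone sketch of that containment test (names are illustrative; an explicit `addr <= map_last` guard is added here because, unlike the driver, this sketch has no lookup that already guarantees `addr` lies inside the mapping):

```c
#include <assert.h>
#include <stdint.h>

/*
 * True if [addr, addr + size) lies entirely within the mapping
 * [map_start, map_last], all in page units. map_last is inclusive,
 * hence the "- addr + 1" when computing the space left after addr.
 */
static int range_in_mapping(uint64_t map_start, uint64_t map_last,
			    uint64_t addr, uint64_t size)
{
	if (addr < map_start || addr > map_last)
		return 0;
	return map_last - addr + 1 >= size;
}
```

The `addr <= map_last` check also keeps the unsigned subtraction from wrapping when `addr` is past the end of the mapping.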

* [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag
  2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
                   ` (9 preceding siblings ...)
  2026-01-22 10:39 ` [PATCH v4 10/11] drm/amdgpu: Add ioctl to set level2 handler Lijo Lazar
@ 2026-01-22 10:40 ` Lijo Lazar
  2026-01-23 20:53   ` Alex Deucher
  10 siblings, 1 reply; 28+ messages in thread
From: Lijo Lazar @ 2026-01-22 10:40 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Alexander.Deucher, Christian.Koenig, Jesse.Zhang

Add an interface to set the debugger trap flag in the TMA region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 19 ++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  3 +++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
index 70f444afece0..663b91c8e6f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
@@ -19,7 +19,6 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  */
-
 #include <drm/drm_exec.h>
 
 #include "amdgpu.h"
@@ -614,4 +613,22 @@ int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	}
 
 	return r;
+}
+
+int amdgpu_cwsr_set_trap_debug_flag(struct amdgpu_device *adev,
+				    struct amdgpu_cwsr_trap_obj *cwsr_obj,
+				    bool enabled)
+{
+	uint64_t *l1tma;
+
+	if (!amdgpu_cwsr_is_enabled(adev))
+		return -EOPNOTSUPP;
+
+	if (!cwsr_obj)
+		return -EINVAL;
+
+	l1tma = (uint64_t *)(cwsr_obj->tma_cpu_addr);
+	l1tma[2] = enabled;
+
+	return 0;
 }
\ No newline at end of file
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
index c9f61e393fde..a32044b07b45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
@@ -93,5 +93,8 @@ static inline bool amdgpu_cwsr_has_dbg_wa(struct amdgpu_device *adev)
 
 int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data,
 		      struct drm_file *filp);
+int amdgpu_cwsr_set_trap_debug_flag(struct amdgpu_device *adev,
+				    struct amdgpu_cwsr_trap_obj *cwsr_obj,
+				    bool enabled);
 
 #endif
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
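[Editorial sketch] Taken together, patches 10 and 11 use the first-level TMA page as a small descriptor read by the first-level trap handler: 64-bit slot 0 holds the level-2 TMA address, slot 1 the level-2 TBA address, and slot 2 the debug-trap flag. A sketch of that layout (the enum and helper are illustrative, not driver code):

```c
#include <assert.h>
#include <stdint.h>

/* Slot layout of the level-1 TMA page, as written by the two patches. */
enum l1_tma_slot {
	L1_TMA_L2_TMA = 0,   /* level-2 trap memory address */
	L1_TMA_L2_TBA = 1,   /* level-2 trap handler base address */
	L1_TMA_DBG_FLAG = 2, /* debugger trap enable flag */
};

/* Populate the descriptor the way the two ioctl paths do. */
static void l1_tma_set(uint64_t *l1tma, uint64_t l2_tma, uint64_t l2_tba,
		       int dbg_enabled)
{
	l1tma[L1_TMA_L2_TMA] = l2_tma;
	l1tma[L1_TMA_L2_TBA] = l2_tba;
	l1tma[L1_TMA_DBG_FLAG] = dbg_enabled ? 1 : 0;
}
```

Keeping the slot indices in one enum documents the contract between the CPU-side writers (the two ioctls) and the GPU-side first-level handler that reads the page.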

* Re: [PATCH v4 02/11] drm/amdgpu: Add cwsr functions
  2026-01-22 10:39 ` [PATCH v4 02/11] drm/amdgpu: Add cwsr functions Lijo Lazar
@ 2026-01-23 20:41   ` Alex Deucher
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:41 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 6:37 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add functions related to cwsr handling inside amdgpu framework.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile      |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 364 +++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  67 +++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   |  13 +-
>  5 files changed, 445 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 8e22882b66aa..3b563c73bb66 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -67,7 +67,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
>         amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>         amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
>         amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o amdgpu_dev_coredump.o \
> -       amdgpu_cper.o amdgpu_userq_fence.o amdgpu_eviction_fence.o amdgpu_ip.o
> +       amdgpu_cper.o amdgpu_userq_fence.o amdgpu_eviction_fence.o amdgpu_ip.o amdgpu_cwsr.o
>
>  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 9c11535c44c6..0ace28c170bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -328,6 +328,7 @@ struct kfd_vm_fault_info;
>  struct amdgpu_hive_info;
>  struct amdgpu_reset_context;
>  struct amdgpu_reset_control;
> +struct amdgpu_cwsr_isa;
>
>  enum amdgpu_cp_irq {
>         AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP = 0,
> @@ -1237,6 +1238,8 @@ struct amdgpu_device {
>          * Must be last --ends in a flexible-array member.
>          */
>         struct amdgpu_kfd_dev           kfd;
> +
> +       struct amdgpu_cwsr_info *cwsr_info;
>  };
>
>  static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> new file mode 100644
> index 000000000000..f2d3837366bf
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> @@ -0,0 +1,364 @@
> +/*
> + * Copyright 2025 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <drm/drm_exec.h>
> +
> +#include "amdgpu.h"
> +#include "cwsr_trap_handler.h"
> +#include "amdgpu_cwsr.h"
> +
> +extern int cwsr_enable;
> +
> +#define AMDGPU_CWSR_TBA_MAX_SIZE (2 * AMDGPU_GPU_PAGE_SIZE)
> +#define AMDGPU_CWSR_TMA_MAX_SIZE (AMDGPU_GPU_PAGE_SIZE)
> +#define AMDGPU_CWSR_TMA_OFFSET (AMDGPU_CWSR_TBA_MAX_SIZE)
> +
> +enum amdgpu_cwsr_region {
> +       AMDGPU_CWSR_TBA,
> +       AMDGPU_CWSR_TMA,
> +};
> +
> +static inline uint64_t amdgpu_cwsr_tba_vaddr(struct amdgpu_device *adev)
> +{
> +       uint64_t addr = AMDGPU_VA_RESERVED_TRAP_UQ_START(adev);
> +
> +       addr = amdgpu_gmc_sign_extend(addr);
> +
> +       return addr;
> +}
> +
> +static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
> +{
> +       uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
> +
> +       if (!cwsr_enable || adev->gfx.disable_uq ||
> +           gc_ver < IP_VERSION(9, 0, 1))
> +               return false;
> +
> +       return true;
> +}
> +
> +static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
> +                                        struct amdgpu_cwsr_info *cwsr_info)
> +{
> +       uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
> +
> +       if (gc_ver < IP_VERSION(9, 0, 1)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx8_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx8_hex);
> +       } else if (gc_ver == IP_VERSION(9, 4, 1)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_arcturus_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_arcturus_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_arcturus_hex);
> +       } else if (gc_ver == IP_VERSION(9, 4, 2)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_aldebaran_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_aldebaran_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_aldebaran_hex);
> +       } else if (gc_ver == IP_VERSION(9, 4, 3) ||
> +                  gc_ver == IP_VERSION(9, 4, 4)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_4_3_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx9_4_3_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx9_4_3_hex);
> +       } else if (gc_ver == IP_VERSION(9, 5, 0)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_5_0_hex) > PAGE_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx9_5_0_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx9_5_0_hex);
> +       } else if (gc_ver < IP_VERSION(10, 1, 1)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx9_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx9_hex);
> +       } else if (gc_ver < IP_VERSION(10, 3, 0)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_nv1x_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_nv1x_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_nv1x_hex);
> +       } else if (gc_ver < IP_VERSION(11, 0, 0)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx10_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx10_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx10_hex);
> +       } else if (gc_ver < IP_VERSION(12, 0, 0)) {
> +               /* The gfx11 cwsr trap handler must fit inside a
> +                * single page. */
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx11_hex) > PAGE_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx11_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx11_hex);
> +       } else if (gc_ver < IP_VERSION(12, 1, 0)) {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx12_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx12_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx12_hex);
> +       } else {
> +               BUILD_BUG_ON(sizeof(cwsr_trap_gfx12_1_0_hex) >
> +                            AMDGPU_CWSR_TBA_MAX_SIZE);
> +               cwsr_info->isa_buf = cwsr_trap_gfx12_1_0_hex;
> +               cwsr_info->isa_sz = sizeof(cwsr_trap_gfx12_1_0_hex);
> +       }
> +}
> +
> +int amdgpu_cwsr_init(struct amdgpu_device *adev)
> +{
> +       struct amdgpu_cwsr_info *cwsr_info __free(kfree) =
> +               kzalloc(sizeof(*cwsr_info), GFP_KERNEL);
> +       void *ptr;
> +       int r;
> +
> +       if (!amdgpu_cwsr_is_supported(adev))
> +               return -EOPNOTSUPP;
> +
> +       if (!cwsr_info)
> +               return -ENOMEM;
> +       amdgpu_cwsr_init_isa_details(adev, cwsr_info);
> +
> +       if (!cwsr_info->isa_sz)
> +               return -EOPNOTSUPP;
> +
> +       r = amdgpu_bo_create_kernel(adev, AMDGPU_CWSR_TBA_MAX_SIZE, PAGE_SIZE,
> +                                   AMDGPU_GEM_DOMAIN_GTT, &cwsr_info->isa_bo,
> +                                   NULL, &ptr);
> +       if (r)
> +               return r;
> +
> +       memcpy(ptr, cwsr_info->isa_buf, cwsr_info->isa_sz);
> +       adev->cwsr_info = no_free_ptr(cwsr_info);
> +
> +       return 0;
> +}
> +
> +void amdgpu_cwsr_fini(struct amdgpu_device *adev)
> +{
> +       if (!amdgpu_cwsr_is_enabled(adev))
> +               return;
> +
> +       amdgpu_bo_free_kernel(&adev->cwsr_info->isa_bo, NULL, NULL);
> +       kfree(adev->cwsr_info);
> +       adev->cwsr_info = NULL;
> +}
> +
> +/*
> + * amdgpu_cwsr_map_region() should be called during amdgpu_vm_init.
> + * It maps the TBA/TMA region at the virtual address derived from
> + * amdgpu_cwsr_tba_vaddr() into this VM, and each compute queue can use
> + * these virtual addresses for wave save/restore operations to support
> + * compute preemption.
> + */
> +static int amdgpu_cwsr_map_region(struct amdgpu_device *adev,
> +                                 struct amdgpu_vm *vm,
> +                                 struct amdgpu_cwsr_trap_obj *cwsr,
> +                                 enum amdgpu_cwsr_region region)
> +{
> +       uint64_t cwsr_addr, va_flags, va;
> +       struct amdgpu_bo_va **bo_va;
> +       struct amdgpu_bo *bo;
> +       uint32_t size;
> +       int r;
> +
> +       if (!cwsr || !vm)
> +               return -EINVAL;
> +
> +       cwsr_addr = amdgpu_cwsr_tba_vaddr(adev);
> +
> +       if (region == AMDGPU_CWSR_TBA) {
> +               size = AMDGPU_CWSR_TBA_MAX_SIZE;
> +               bo_va = &cwsr->tba_va;
> +               bo = adev->cwsr_info->isa_bo;
> +               va = cwsr_addr;
> +               va_flags = (AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
> +                           AMDGPU_VM_PAGE_EXECUTABLE);
> +       } else {
> +               size = AMDGPU_CWSR_TMA_MAX_SIZE;
> +               bo_va = &cwsr->tma_va;
> +               bo = cwsr->tma_bo;
> +               va = cwsr_addr + AMDGPU_CWSR_TMA_OFFSET;
> +               va_flags = (AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE);
> +       }
> +
> +       *bo_va = amdgpu_vm_bo_add(adev, vm, bo);
> +       if (!*bo_va)
> +               return -ENOMEM;
> +
> +       va &= AMDGPU_GMC_HOLE_MASK;
> +       r = amdgpu_vm_bo_map(adev, *bo_va, va, 0, size, va_flags);
> +       if (r) {
> +               dev_err(adev->dev, "failed to do bo map of %s region, err=%d\n",
> +                       (region == AMDGPU_CWSR_TBA ? "tba" : "tma"), r);
> +               amdgpu_vm_bo_del(adev, *bo_va);
> +               *bo_va = NULL;
> +               return r;
> +       }
> +
> +       r = amdgpu_vm_bo_update(adev, *bo_va, false);
> +       if (r) {
> +               dev_err(adev->dev,
> +                       "failed to do page table update of %s region, err=%d\n",
> +                       (region == AMDGPU_CWSR_TBA ? "tba" : "tma"), r);
> +               amdgpu_vm_bo_del(adev, *bo_va);
> +               *bo_va = NULL;
> +               return r;
> +       }
> +
> +       if (region == AMDGPU_CWSR_TBA)
> +               cwsr->tba_gpu_va_addr = va;
> +       else
> +               cwsr->tma_gpu_va_addr = va;
> +
> +       return 0;
> +}
> +
> +static int amdgpu_cwsr_unmap_region(struct amdgpu_device *adev,
> +                                   struct amdgpu_cwsr_trap_obj *cwsr,
> +                                   enum amdgpu_cwsr_region region)
> +{
> +       struct amdgpu_bo_va **bo_va;
> +       uint64_t va;
> +       int r;
> +
> +       if (!cwsr)
> +               return -EINVAL;
> +
> +       if (region == AMDGPU_CWSR_TBA) {
> +               bo_va = &cwsr->tba_va;
> +               va = cwsr->tba_gpu_va_addr;
> +       } else {
> +               bo_va = &cwsr->tma_va;
> +               va = cwsr->tma_gpu_va_addr;
> +       }
> +
> +       r = amdgpu_vm_bo_unmap(adev, *bo_va, va);
> +       if (r) {
> +               dev_err(adev->dev,
> +                       "failed to do bo_unmap on CWSR trap handler, err=%d\n",
> +                       r);
> +               return r;
> +       }
> +
> +       amdgpu_vm_bo_del(adev, *bo_va);
> +       *bo_va = NULL;
> +
> +       return r;
> +}
> +
> +int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> +                     struct amdgpu_cwsr_trap_obj **trap_obj)
> +{
> +       struct amdgpu_cwsr_trap_obj *cwsr;
> +       struct amdgpu_bo *bo;
> +       struct drm_exec exec;
> +       int r;
> +
> +       if (!amdgpu_cwsr_is_enabled(adev))
> +               return -EOPNOTSUPP;
> +       if (!vm || !trap_obj)
> +               return -EINVAL;
> +       cwsr = kzalloc(sizeof(*cwsr), GFP_KERNEL);
> +       if (!cwsr)
> +               return -ENOMEM;
> +
> +       bo = adev->cwsr_info->isa_bo;
> +       drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> +       drm_exec_until_all_locked(&exec) {
> +               r = amdgpu_vm_lock_pd(vm, &exec, 0);
> +               if (likely(!r))
> +                       r = drm_exec_lock_obj(&exec, &bo->tbo.base);
> +               drm_exec_retry_on_contention(&exec);
> +               if (unlikely(r)) {
> +                       dev_err(adev->dev,
> +                               "failed to reserve for CWSR allocs: err=%d\n",
> +                               r);
> +                       goto err;
> +               }
> +       }
> +
> +       r = amdgpu_bo_create_kernel(adev, AMDGPU_CWSR_TMA_MAX_SIZE, PAGE_SIZE,
> +                                   AMDGPU_GEM_DOMAIN_GTT, &cwsr->tma_bo, NULL,
> +                                   &cwsr->tma_cpu_addr);
> +       if (r)
> +               goto err;
> +
> +       r = amdgpu_cwsr_map_region(adev, vm, cwsr, AMDGPU_CWSR_TMA);
> +       if (r)
> +               goto err;
> +       r = amdgpu_cwsr_map_region(adev, vm, cwsr, AMDGPU_CWSR_TBA);
> +       if (r) {
> +               amdgpu_cwsr_unmap_region(adev, cwsr, AMDGPU_CWSR_TMA);
> +               goto err;
> +       }
> +
> +err:
> +       drm_exec_fini(&exec);
> +       if (r) {
> +               amdgpu_bo_free_kernel(&cwsr->tma_bo, NULL, NULL);
> +               kfree(cwsr);
> +               *trap_obj = NULL;
> +       } else {
> +               *trap_obj = cwsr;
> +       }
> +
> +       return r;
> +}
> +
> +void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> +                     struct amdgpu_cwsr_trap_obj **trap_obj)
> +{
> +       struct amdgpu_bo *tba_bo;
> +       struct amdgpu_bo *tma_bo;
> +       struct drm_exec exec;
> +       int r;
> +
> +       if (!trap_obj || !*trap_obj || !(*trap_obj)->tma_bo)
> +               return;
> +       tba_bo = adev->cwsr_info->isa_bo;
> +       tma_bo = (*trap_obj)->tma_bo;
> +
> +       if (!tba_bo || !tma_bo)
> +               return;
> +
> +       drm_exec_init(&exec, 0, 0);
> +       drm_exec_until_all_locked(&exec)
> +       {
> +               r = amdgpu_vm_lock_pd(vm, &exec, 0);
> +               if (likely(!r))
> +                       r = drm_exec_lock_obj(&exec, &tba_bo->tbo.base);
> +               drm_exec_retry_on_contention(&exec);
> +               if (likely(!r))
> +                       r = drm_exec_lock_obj(&exec, &tma_bo->tbo.base);
> +               drm_exec_retry_on_contention(&exec);
> +               if (unlikely(r)) {
> +                       dev_err(adev->dev,
> +                               "failed to reserve CWSR BOs: err=%d\n", r);
> +                       goto err;
> +               }
> +       }
> +
> +       amdgpu_cwsr_unmap_region(adev, *trap_obj, AMDGPU_CWSR_TBA);
> +       amdgpu_cwsr_unmap_region(adev, *trap_obj, AMDGPU_CWSR_TMA);
> +err:
> +       drm_exec_fini(&exec);
> +       amdgpu_bo_free_kernel(&(*trap_obj)->tma_bo, NULL, NULL);
> +       kfree(*trap_obj);
> +       *trap_obj = NULL;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> new file mode 100644
> index 000000000000..26ed9308f70b
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> @@ -0,0 +1,67 @@
> +/*
> + * Copyright 2025 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef AMDGPU_CWSR_H
> +#define AMDGPU_CWSR_H
> +
> +#include <linux/types.h>
> +
> +struct amdgpu_bo;
> +struct amdgpu_bo_va;
> +struct amdgpu_device;
> +struct amdgpu_vm;
> +
> +/**
> + * struct amdgpu_cwsr_trap_obj - CWSR (Compute Wave Save Restore) buffer tracking
> + * @tma_bo: Buffer object backing the per-VM TMA region
> + * @tma_cpu_addr: CPU address of the TMA region
> + */
> +struct amdgpu_cwsr_trap_obj {
> +       uint64_t tma_gpu_va_addr;
> +       uint64_t tba_gpu_va_addr;
> +
> +       struct amdgpu_bo *tma_bo;
> +       struct amdgpu_bo_va *tba_va;
> +       struct amdgpu_bo_va *tma_va;
> +       void *tma_cpu_addr;
> +};
> +
> +struct amdgpu_cwsr_info {
> +       /* cwsr isa */
> +       struct amdgpu_bo *isa_bo;
> +       const void *isa_buf;
> +       uint32_t isa_sz;
> +};
> +
> +int amdgpu_cwsr_init(struct amdgpu_device *adev);
> +void amdgpu_cwsr_fini(struct amdgpu_device *adev);
> +
> +int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> +                     struct amdgpu_cwsr_trap_obj **cwsr_obj);
> +void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> +                     struct amdgpu_cwsr_trap_obj **cwsr_obj);
> +static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
> +{
> +       return adev->cwsr_info != NULL;
> +}
> +
> +#endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 139642eacdd0..9bde17815a6a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -176,10 +176,17 @@ struct amdgpu_bo_vm;
>  #define AMDGPU_VA_RESERVED_TRAP_SIZE           (2ULL << 12)
>  #define AMDGPU_VA_RESERVED_TRAP_START(adev)    (AMDGPU_VA_RESERVED_SEQ64_START(adev) \
>                                                  - AMDGPU_VA_RESERVED_TRAP_SIZE)
> +/* TBD: Ideally, existing TRAP VA should do. There is a conflict with KFD mapping that needs to be
> + * resolved. Revisit later.
> + */
> +#define AMDGPU_VA_RESERVED_TRAP_UQ_SIZE (3ULL << 12)
> +#define AMDGPU_VA_RESERVED_TRAP_UQ_START(adev) \
> +       (AMDGPU_VA_RESERVED_TRAP_START(adev) - AMDGPU_VA_RESERVED_TRAP_UQ_SIZE)
> +
>  #define AMDGPU_VA_RESERVED_BOTTOM              (1ULL << 16)
> -#define AMDGPU_VA_RESERVED_TOP                 (AMDGPU_VA_RESERVED_TRAP_SIZE + \
> -                                                AMDGPU_VA_RESERVED_SEQ64_SIZE + \
> -                                                AMDGPU_VA_RESERVED_CSA_SIZE)
> +#define AMDGPU_VA_RESERVED_TOP                                            \
> +       (AMDGPU_VA_RESERVED_TRAP_UQ_SIZE + AMDGPU_VA_RESERVED_TRAP_SIZE + \
> +        AMDGPU_VA_RESERVED_SEQ64_SIZE + AMDGPU_VA_RESERVED_CSA_SIZE)
>
>  /* See vm_update_mode */
>  #define AMDGPU_VM_USE_CPU_FOR_GFX (1 << 0)
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-22 10:39 ` [PATCH v4 04/11] drm/amdgpu: Add user save area params validation Lijo Lazar
@ 2026-01-23 20:44   ` Alex Deucher
  2026-01-27  5:35     ` Lazar, Lijo
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:44 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add an interface to validate user provided save area parameters. Address
> validation is not done and expected to be done outside.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 44 ++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h | 11 ++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> index 80020fd33ce6..32d9398cd1d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> @@ -64,6 +64,15 @@ static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
>         return true;
>  }
>
> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc)
> +{
> +       if (!amdgpu_cwsr_is_enabled(adev))
> +               return 0;
> +
> +       return num_xcc *
> +              (adev->cwsr_info->xcc_cwsr_sz + adev->cwsr_info->xcc_dbg_mem_sz);

These could overflow if userspace passes in especially large values.

Alex

> +}
> +
>  static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
>                                          struct amdgpu_cwsr_info *cwsr_info)
>  {
> @@ -425,6 +434,41 @@ int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>         return r;
>  }
>
> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
> +                               struct amdgpu_cwsr_params *cwsr_params,
> +                               int num_xcc)
> +{
> +       struct amdgpu_cwsr_info *cwsr_info = adev->cwsr_info;
> +
> +       if (!amdgpu_cwsr_is_enabled(adev))
> +               return -EOPNOTSUPP;
> +
> +       if (!cwsr_params)
> +               return -EINVAL;
> +
> +       /*
> +        * Only control stack and save area size details checked. Address validation needs to be
> +        * carried out separately.
> +        */
> +       if (cwsr_params->ctl_stack_sz !=
> +           (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
> +               dev_dbg(adev->dev,
> +                       "queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
> +                       cwsr_params->ctl_stack_sz,
> +                       num_xcc * cwsr_info->xcc_ctl_stack_sz);
> +               return -EINVAL;
> +       }
> +
> +       if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
> +               dev_dbg(adev->dev,
> +                       "queue cwsr size 0x%x less than node cwsr size 0x%x\n",
> +                       cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
>  void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>                       struct amdgpu_cwsr_trap_obj **trap_obj)
>  {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> index 3c80d057bbed..96b03a8ed99b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> @@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
>         uint32_t xcc_cwsr_sz;
>  };
>
> +struct amdgpu_cwsr_params {
> +       uint64_t ctx_save_area_address;
> +       /* cwsr size info */
> +       uint32_t ctl_stack_sz;
> +       uint32_t cwsr_sz;
> +};
> +
>  int amdgpu_cwsr_init(struct amdgpu_device *adev);
>  void amdgpu_cwsr_fini(struct amdgpu_device *adev);
>
> @@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
>         return adev->cwsr_info != NULL;
>  }
>
> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
> +                               struct amdgpu_cwsr_params *cwsr_params,
> +                               int num_xcc);
>  #endif
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 07/11] drm/amdgpu: Add user save area params to mqd input
  2026-01-22 10:39 ` [PATCH v4 07/11] drm/amdgpu: Add user save area params to mqd input Lijo Lazar
@ 2026-01-23 20:47   ` Alex Deucher
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:47 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add user save area parameters to mqd properties for queue creation.
> Validate the parameters before using for mqd initialization.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  4 ++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c  | 24 ++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h  |  5 +++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c     | 14 +++++++++++++
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 16 +++++++++++++++
>  5 files changed, 63 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 218d8030a07c..26b757c95579 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -814,6 +814,10 @@ struct amdgpu_mqd_prop {
>         uint64_t fence_address;
>         bool tmz_queue;
>         bool kernel_queue;
> +       /* cwsr params */
> +       uint64_t ctx_save_area_addr;
> +       uint32_t ctx_save_area_size;
> +       uint32_t ctl_stack_size;
>  };
>
>  struct amdgpu_mqd {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 37a526a1085f..119b84a0703e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -33,6 +33,7 @@
>  #include "amdgpu_userq.h"
>  #include "amdgpu_hmm.h"
>  #include "amdgpu_userq_fence.h"
> +#include "amdgpu_cwsr.h"
>
>  u32 amdgpu_userq_get_supported_ip_mask(struct amdgpu_device *adev)
>  {
> @@ -265,6 +266,29 @@ int amdgpu_userq_input_va_validate(struct amdgpu_device *adev,
>         return r;
>  }
>
> +int amdgpu_userq_input_cwsr_params_validate(
> +       struct amdgpu_usermode_queue *queue,
> +       struct amdgpu_cwsr_params *cwsr_params)
> +{
> +       struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(queue->userq_mgr);
> +       struct amdgpu_device *adev = queue->userq_mgr->adev;
> +       uint32_t cwsr_size;
> +       int num_xcc;
> +       int r;
> +
> +       num_xcc = amdgpu_xcp_get_num_xcc(adev->xcp_mgr, fpriv->xcp_id);
> +       r = amdgpu_cwsr_validate_params(queue->userq_mgr->adev, cwsr_params,
> +                                       num_xcc);
> +       if (r)
> +               return r;
> +       cwsr_size = amdgpu_cwsr_size_needed(queue->userq_mgr->adev, num_xcc);
> +       if (!cwsr_size)
> +               return -EOPNOTSUPP;
> +
> +       return amdgpu_userq_input_va_validate(
> +               adev, queue, cwsr_params->ctx_save_area_address, cwsr_size);
> +}
> +
>  static bool amdgpu_userq_buffer_va_mapped(struct amdgpu_vm *vm, u64 addr)
>  {
>         struct amdgpu_bo_va_mapping *mapping;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
> index 5845d8959034..a64292bc24dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
> @@ -41,6 +41,7 @@ enum amdgpu_userq_state {
>  };
>
>  struct amdgpu_mqd_prop;
> +struct amdgpu_cwsr_params;
>
>  struct amdgpu_userq_obj {
>         void             *cpu_ptr;
> @@ -157,4 +158,8 @@ int amdgpu_userq_input_va_validate(struct amdgpu_device *adev,
>  int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>                                        struct amdgpu_bo_va_mapping *mapping,
>                                        uint64_t saddr);
> +int amdgpu_userq_input_cwsr_params_validate(
> +       struct amdgpu_usermode_queue *queue,
> +       struct amdgpu_cwsr_params *cwsr_params);
> +
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> index 40660b05f979..5f6a6f630495 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> @@ -3243,6 +3243,20 @@ static int gfx_v12_0_compute_mqd_init(struct amdgpu_device *adev, void *m,
>         mqd->fence_address_lo = lower_32_bits(prop->fence_address);
>         mqd->fence_address_hi = upper_32_bits(prop->fence_address);
>
> +       /* If non-zero, assume cwsr is enabled */
> +       if (prop->ctx_save_area_addr) {
> +               mqd->cp_hqd_persistent_state |=
> +                       (1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
> +               mqd->cp_hqd_ctx_save_base_addr_lo =
> +                       lower_32_bits(prop->ctx_save_area_addr);
> +               mqd->cp_hqd_ctx_save_base_addr_hi =
> +                       upper_32_bits(prop->ctx_save_area_addr);
> +               mqd->cp_hqd_ctx_save_size = prop->ctx_save_area_size;
> +               mqd->cp_hqd_cntl_stack_size = prop->ctl_stack_size;
> +               mqd->cp_hqd_cntl_stack_offset = prop->ctl_stack_size;
> +               mqd->cp_hqd_wg_state_offset = prop->ctl_stack_size;
> +       }
> +
>         return 0;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> index 27917614b1a8..7ad8297eb0d8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> @@ -314,6 +314,7 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
>
>         if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
>                 struct drm_amdgpu_userq_mqd_compute_gfx11 *compute_mqd;
> +               struct amdgpu_cwsr_params cwsr_params;
>
>                 if (mqd_user->mqd_size != sizeof(*compute_mqd)) {
>                         DRM_ERROR("Invalid compute IP MQD size\n");
> @@ -339,6 +340,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
>                 userq_props->hqd_active = false;
>                 userq_props->tmz_queue =
>                         mqd_user->flags & AMDGPU_USERQ_CREATE_FLAGS_QUEUE_SECURE;
> +
> +               if (amdgpu_cwsr_is_enabled(adev)) {
> +                       cwsr_params.ctx_save_area_address =
> +                               userq_props->ctx_save_area_addr;
> +                       cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
> +                       cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
> +
> +                       r = amdgpu_userq_input_cwsr_params_validate(
> +                               queue, &cwsr_params);
> +                       if (r) {
> +                               kfree(compute_mqd);
> +                               goto free_mqd;
> +                       }
> +               }
> +
>                 kfree(compute_mqd);
>         } else if (queue->queue_type == AMDGPU_HW_IP_GFX) {
>                 struct drm_amdgpu_userq_mqd_gfx11 *mqd_gfx_v11;
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 08/11] drm/amdgpu: Add ioctl to get cwsr details
  2026-01-22 10:39 ` [PATCH v4 08/11] drm/amdgpu: Add ioctl to get cwsr details Lijo Lazar
@ 2026-01-23 20:48   ` Alex Deucher
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:48 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 6:02 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add an ioctl to return size information required for CWSR regions.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 21 +++++++++++++++++++++
>  include/uapi/drm/amdgpu_drm.h           | 16 ++++++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index fed15a922346..992bcdf3fc1c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1426,6 +1426,27 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>                         return -EINVAL;
>                 }
>         }
> +       case AMDGPU_INFO_CWSR: {
> +               struct drm_amdgpu_info_cwsr cwsr_info;
> +               int num_xcc, r;
> +
> +               fpriv = (struct amdgpu_fpriv *)filp->driver_priv;
> +               if (!amdgpu_cwsr_is_enabled(adev) || !fpriv->cwsr_trap)
> +                       return -EOPNOTSUPP;
> +               num_xcc = amdgpu_xcp_get_num_xcc(adev->xcp_mgr, fpriv->xcp_id);
> +               cwsr_info.ctl_stack_size =
> +                       adev->cwsr_info->xcc_ctl_stack_sz * num_xcc;
> +               cwsr_info.dbg_mem_size =
> +                       adev->cwsr_info->xcc_dbg_mem_sz * num_xcc;
> +               cwsr_info.min_save_area_size =
> +                       adev->cwsr_info->xcc_cwsr_sz * num_xcc;
> +               r = copy_to_user(out, &cwsr_info,
> +                                min((size_t)size, sizeof(cwsr_info))) ?
> +                           -EFAULT :
> +                           0;
> +               return r;
> +       }
> +
>         default:
>                 DRM_DEBUG_KMS("Invalid request %d\n", info->query);
>                 return -EINVAL;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index ab2bf47553e1..c178b8e0bd3f 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -1269,6 +1269,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
>  #define AMDGPU_INFO_GPUVM_FAULT                        0x23
>  /* query FW object size and alignment */
>  #define AMDGPU_INFO_UQ_FW_AREAS                        0x24
> +/* query CWSR size and alignment */
> +#define AMDGPU_INFO_CWSR                       0x25
>
>  #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0
>  #define AMDGPU_INFO_MMR_SE_INDEX_MASK  0xff
> @@ -1648,6 +1650,20 @@ struct drm_amdgpu_info_uq_metadata {
>         };
>  };
>
> +/**
> + * struct drm_amdgpu_info_cwsr - cwsr information
> + *
> + * Gives cwsr related size details. User needs to allocate buffer based on this.
> + */
> +struct drm_amdgpu_info_cwsr {
> +       /* Control stack size */
> +       __u32 ctl_stack_size;
> +       /* Debug memory area size */
> +       __u32 dbg_mem_size;
> +       /* Minimum save area size required */
> +       __u32 min_save_area_size;
> +};
> +
>  /*
>   * Supported GPU families
>   */
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params
  2026-01-22 10:39 ` [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params Lijo Lazar
@ 2026-01-23 20:51   ` Alex Deucher
  2026-01-27  5:44     ` Lazar, Lijo
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:51 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add CWSR parameters to the userqueue ioctl. User space should pass the
> GPU virtual address of the save/restore buffer and the allocated size.
> These are supported only for user compute queues.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 13 +++++++++----
>  include/uapi/drm/amdgpu_drm.h              | 16 ++++++++++++++++
>  2 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> index 7ad8297eb0d8..2765317f04df 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> @@ -343,16 +343,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
>
>                 if (amdgpu_cwsr_is_enabled(adev)) {
>                         cwsr_params.ctx_save_area_address =
> -                               userq_props->ctx_save_area_addr;
> -                       cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
> -                       cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
> -
> +                               compute_mqd->ctx_save_area_va;
> +                       cwsr_params.cwsr_sz = compute_mqd->ctx_save_area_size;
> +                       cwsr_params.ctl_stack_sz = compute_mqd->ctl_stack_size;
>                         r = amdgpu_userq_input_cwsr_params_validate(
>                                 queue, &cwsr_params);
>                         if (r) {
>                                 kfree(compute_mqd);
>                                 goto free_mqd;
>                         }
> +                       userq_props->ctx_save_area_addr =
> +                               compute_mqd->ctx_save_area_va;
> +                       userq_props->ctx_save_area_size =
> +                               compute_mqd->ctx_save_area_size;
> +                       userq_props->ctl_stack_size =
> +                               compute_mqd->ctl_stack_size;
>                 }
>
>                 kfree(compute_mqd);
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index c178b8e0bd3f..b7a858365174 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -460,6 +460,22 @@ struct drm_amdgpu_userq_mqd_compute_gfx11 {
>          * to get the size.
>          */
>         __u64   eop_va;
> +       /**
> +        * @ctx_save_area_va: Virtual address of the GPU memory for save/restore buffer.
> +        * This must be from a separate GPU object, and use AMDGPU_INFO IOCTL
> +        * to get the size. This includes control stack, wave context and debugger memory.
> +        */
> +       __u64 ctx_save_area_va;
> +       /**
> +        * @ctx_save_area_size:  Total size (in bytes) allocated for save/restore buffer.
> +        * Use AMDGPU_INFO IOCTL to get the size.
> +        */
> +       __u32 ctx_save_area_size;
> +       /**
> +        * @ctl_stack_size: Size (in bytes) of control stack region in the save/restore buffer.
> +        * Use AMDGPU_INFO IOCTL to get the size.
> +        */
> +       __u32 ctl_stack_size;

Does it matter where the ctl_stack is within the save area?

Alex

>  };
>
>  /* userq signal/wait ioctl */
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 10/11] drm/amdgpu: Add ioctl to set level2 handler
  2026-01-22 10:39 ` [PATCH v4 10/11] drm/amdgpu: Add ioctl to set level2 handler Lijo Lazar
@ 2026-01-23 20:52   ` Alex Deucher
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:52 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 6:02 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add an ioctl to set the TBA/TMA of the level 2 trap handler.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 105 +++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  11 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   2 +
>  include/uapi/drm/amdgpu_drm.h            |  24 ++++++
>  5 files changed, 141 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 26b757c95579..c3dfd84c2962 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1575,7 +1575,6 @@ int amdgpu_enable_vblank_kms(struct drm_crtc *crtc);
>  void amdgpu_disable_vblank_kms(struct drm_crtc *crtc);
>  int amdgpu_info_ioctl(struct drm_device *dev, void *data,
>                       struct drm_file *filp);
> -
>  /*
>   * functions used by amdgpu_encoder.c
>   */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> index 32d9398cd1d1..70f444afece0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> @@ -510,3 +510,108 @@ void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>         kfree(*trap_obj);
>         *trap_obj = NULL;
>  }
> +
> +static int amdgpu_cwsr_validate_user_addr(struct amdgpu_device *adev,
> +                                         struct amdgpu_vm *vm,
> +                                         struct amdgpu_cwsr_usr_addr *usr_addr)
> +{
> +       struct amdgpu_bo_va_mapping *va_map;
> +       uint64_t addr;
> +       uint32_t size;
> +       int r;
> +
> +       addr = (usr_addr->addr & AMDGPU_GMC_HOLE_MASK) >> AMDGPU_GPU_PAGE_SHIFT;
> +       size = usr_addr->size >> AMDGPU_GPU_PAGE_SHIFT;
> +
> +       r = amdgpu_bo_reserve(vm->root.bo, false);
> +       if (r)
> +               return r;
> +
> +       va_map = amdgpu_vm_bo_lookup_mapping(vm, addr);
> +       if (!va_map) {
> +               r = -EINVAL;
> +               goto err;
> +       }
> +       /* validate whether resident in the VM mapping range */
> +       if (addr >= va_map->start && va_map->last - addr + 1 >= size) {
> +               amdgpu_bo_unreserve(vm->root.bo);
> +               return 0;
> +       }
> +
> +       r = -EINVAL;
> +err:
> +       amdgpu_bo_unreserve(vm->root.bo);
> +
> +       return r;
> +}
> +
> +static int amdgpu_cwsr_set_l2_trap_handler(
> +       struct amdgpu_device *adev, struct amdgpu_vm *vm,
> +       struct amdgpu_cwsr_trap_obj *cwsr_obj, struct amdgpu_cwsr_usr_addr *tma,
> +       struct amdgpu_cwsr_usr_addr *tba)
> +{
> +       uint64_t *l1tma;
> +       int r;
> +
> +       if (!amdgpu_cwsr_is_enabled(adev))
> +               return -EOPNOTSUPP;
> +
> +       if (!cwsr_obj || !cwsr_obj->tma_cpu_addr || !tma || !tba)
> +               return -EINVAL;
> +       r = amdgpu_cwsr_validate_user_addr(adev, vm, tma);
> +       if (r)
> +               return r;
> +       r = amdgpu_cwsr_validate_user_addr(adev, vm, tba);
> +       if (r)
> +               return r;
> +
> +       l1tma = (uint64_t *)(cwsr_obj->tma_cpu_addr);
> +       l1tma[0] = tma->addr;
> +       l1tma[1] = tba->addr;
> +
> +       return 0;
> +}
> +
> +/*
> + * Userspace cwsr related ioctl
> + */
> +/**
> + * amdgpu_cwsr_ioctl - Handle cwsr specific requests.
> + *
> + * @dev: drm device pointer
> + * @data: request object
> + * @filp: drm filp
> + *
> + * This function is used to perform cwsr and trap handler related operations
> + * Returns 0 on success, error code on failure.
> + */
> +int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> +{
> +       struct amdgpu_device *adev = drm_to_adev(dev);
> +       union drm_amdgpu_cwsr *cwsr = data;
> +       struct amdgpu_fpriv *fpriv;
> +       int r;
> +
> +       fpriv = (struct amdgpu_fpriv *)filp->driver_priv;
> +
> +       if (!fpriv->cwsr_trap)
> +               return -EOPNOTSUPP;
> +
> +       switch (cwsr->in.op) {
> +       case AMDGPU_CWSR_OP_SET_L2_TRAP: {
> +               struct amdgpu_cwsr_usr_addr tba;
> +               struct amdgpu_cwsr_usr_addr tma;
> +
> +               tba.addr = cwsr->in.l2trap.tba_va;
> +               tba.size = cwsr->in.l2trap.tba_sz;
> +               tma.addr = cwsr->in.l2trap.tma_va;
> +               tma.size = cwsr->in.l2trap.tma_sz;
> +               r = amdgpu_cwsr_set_l2_trap_handler(
> +                       adev, &fpriv->vm, fpriv->cwsr_trap, &tma, &tba);
> +       } break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       return r;
> +}
> \ No newline at end of file
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> index b54240d40a6c..c9f61e393fde 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> @@ -31,7 +31,7 @@ struct amdgpu_device;
>  struct amdgpu_vm;
>
>  /**
> - * struct amdgpu_cwsr_obj - CWSR (Compute Wave Save Restore) buffer tracking
> + * struct amdgpu_cwsr_trap_obj - CWSR (Compute Wave Save Restore) buffer tracking
>   * @bo: Buffer object for CWSR area
>   * @bo_va: Buffer object virtual address mapping
>   */
> @@ -63,6 +63,11 @@ struct amdgpu_cwsr_params {
>         uint32_t cwsr_sz;
>  };
>
> +struct amdgpu_cwsr_usr_addr {
> +       uint64_t addr;
> +       uint32_t size;
> +};
> +
>  int amdgpu_cwsr_init(struct amdgpu_device *adev);
>  void amdgpu_cwsr_fini(struct amdgpu_device *adev);
>
> @@ -85,4 +90,8 @@ static inline bool amdgpu_cwsr_has_dbg_wa(struct amdgpu_device *adev)
>
>         return gc_ver >= IP_VERSION(11, 0, 0) && gc_ver <= IP_VERSION(11, 0, 3);
>  }
> +
> +int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data,
> +                     struct drm_file *filp);
> +
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 771c89c84608..7fbd106fff8b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -53,6 +53,7 @@
>  #include "amdgpu_sched.h"
>  #include "amdgpu_xgmi.h"
>  #include "amdgpu_userq.h"
> +#include "amdgpu_cwsr.h"
>  #include "amdgpu_userq_fence.h"
>  #include "../amdxcp/amdgpu_xcp_drv.h"
>
> @@ -3074,6 +3075,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>         DRM_IOCTL_DEF_DRV(AMDGPU_SCHED, amdgpu_sched_ioctl, DRM_MASTER),
>         DRM_IOCTL_DEF_DRV(AMDGPU_BO_LIST, amdgpu_bo_list_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(AMDGPU_FENCE_TO_HANDLE, amdgpu_cs_fence_to_handle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(AMDGPU_CWSR, amdgpu_cwsr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         /* KMS */
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index b7a858365174..a36e3e2e679c 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -58,6 +58,7 @@ extern "C" {
>  #define DRM_AMDGPU_USERQ_SIGNAL                0x17
>  #define DRM_AMDGPU_USERQ_WAIT          0x18
>  #define DRM_AMDGPU_GEM_LIST_HANDLES    0x19
> +#define DRM_AMDGPU_CWSR 0x20
>
>  #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>  #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -79,6 +80,8 @@ extern "C" {
>  #define DRM_IOCTL_AMDGPU_USERQ_SIGNAL  DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_SIGNAL, struct drm_amdgpu_userq_signal)
>  #define DRM_IOCTL_AMDGPU_USERQ_WAIT    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_WAIT, struct drm_amdgpu_userq_wait)
>  #define DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_LIST_HANDLES, struct drm_amdgpu_gem_list_handles)
> +#define DRM_IOCTL_AMDGPU_CWSR \
> +       DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_CWSR, union drm_amdgpu_cwsr)
>
>  /**
>   * DOC: memory domains
> @@ -1680,6 +1683,27 @@ struct drm_amdgpu_info_cwsr {
>         __u32 min_save_area_size;
>  };
>
> +/* cwsr ioctl */
> +#define AMDGPU_CWSR_OP_SET_L2_TRAP 1
> +
> +struct drm_amdgpu_cwsr_in {
> +       /* AMDGPU_CWSR_OP_* */
> +       __u32 op;
> +       struct {
> +               /* Level 2 trap handler base address */
> +               __u64 tba_va;
> +               /* Level 2 trap handler buffer size (in bytes) */
> +               __u32 tba_sz;
> +               /* Level 2 trap memory buffer address */
> +               __u64 tma_va;
> +               /* Level 2 trap memory buffer size (in bytes) */
> +               __u32 tma_sz;
> +       } l2trap;
> +};
> +
> +union drm_amdgpu_cwsr {
> +       struct drm_amdgpu_cwsr_in in;
> +};
>  /*
>   * Supported GPU families
>   */
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag
  2026-01-22 10:40 ` [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag Lijo Lazar
@ 2026-01-23 20:53   ` Alex Deucher
  2026-01-27 12:36     ` Lazar, Lijo
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2026-01-23 20:53 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Thu, Jan 22, 2026 at 5:42 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> Add interface to set debugger trap flag in TMA region.
>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 19 ++++++++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  3 +++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> index 70f444afece0..663b91c8e6f3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> @@ -19,7 +19,6 @@
>   * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>   * OTHER DEALINGS IN THE SOFTWARE.
>   */
> -

Spurious change.

>  #include <drm/drm_exec.h>
>
>  #include "amdgpu.h"
> @@ -614,4 +613,22 @@ int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>         }
>
>         return r;
> +}
> +
> +int amdgpu_cwsr_set_trap_debug_flag(struct amdgpu_device *adev,
> +                                   struct amdgpu_cwsr_trap_obj *cwsr_obj,
> +                                   bool enabled)
> +{
> +       uint64_t *l1tma;
> +
> +       if (!amdgpu_cwsr_is_enabled(adev))
> +               return -EOPNOTSUPP;
> +
> +       if (!cwsr_obj)
> +               return -EINVAL;
> +
> +       l1tma = (uint64_t *)(cwsr_obj->tma_cpu_addr);
> +       l1tma[2] = enabled;
> +
> +       return 0;
>  }
> \ No newline at end of file
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> index c9f61e393fde..a32044b07b45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> @@ -93,5 +93,8 @@ static inline bool amdgpu_cwsr_has_dbg_wa(struct amdgpu_device *adev)
>
>  int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data,
>                       struct drm_file *filp);
> +int amdgpu_cwsr_set_trap_debug_flag(struct amdgpu_device *adev,
> +                                   struct amdgpu_cwsr_trap_obj *cwsr_obj,
> +                                   bool enabled);
>

Nothing uses this yet?

Alex

>  #endif
> --
> 2.49.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-23 20:44   ` Alex Deucher
@ 2026-01-27  5:35     ` Lazar, Lijo
  2026-01-27  5:55       ` Alex Deucher
  0 siblings, 1 reply; 28+ messages in thread
From: Lazar, Lijo @ 2026-01-27  5:35 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang



On 24-Jan-26 2:14 AM, Alex Deucher wrote:
> On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>
>> Add an interface to validate user provided save area parameters. Address
>> validation is not done and expected to be done outside.
>>
>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 44 ++++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h | 11 ++++++
>>   2 files changed, 55 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>> index 80020fd33ce6..32d9398cd1d1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>> @@ -64,6 +64,15 @@ static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
>>          return true;
>>   }
>>
>> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc)
>> +{
>> +       if (!amdgpu_cwsr_is_enabled(adev))
>> +               return 0;
>> +
>> +       return num_xcc *
>> +              (adev->cwsr_info->xcc_cwsr_sz + adev->cwsr_info->xcc_dbg_mem_sz);
> 
> These could overflow if userspace passes in especially large values.
> 

Sorry, I didn't get that. cwsr_info contains driver calculated values. 
This function returns the size required.

Thanks,
Lijo

> Alex
> 
>> +}
>> +
>>   static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
>>                                           struct amdgpu_cwsr_info *cwsr_info)
>>   {
>> @@ -425,6 +434,41 @@ int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>          return r;
>>   }
>>
>> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
>> +                               struct amdgpu_cwsr_params *cwsr_params,
>> +                               int num_xcc)
>> +{
>> +       struct amdgpu_cwsr_info *cwsr_info = adev->cwsr_info;
>> +
>> +       if (!amdgpu_cwsr_is_enabled(adev))
>> +               return -EOPNOTSUPP;
>> +
>> +       if (!cwsr_params)
>> +               return -EINVAL;
>> +
>> +       /*
>> +        * Only control stack and save area size details checked. Address validation needs to be
>> +        * carried out separately.
>> +        */
>> +       if (cwsr_params->ctl_stack_sz !=
>> +           (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
>> +               dev_dbg(adev->dev,
>> +                       "queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
>> +                       cwsr_params->ctl_stack_sz,
>> +                       num_xcc * cwsr_info->xcc_ctl_stack_sz);
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
>> +               dev_dbg(adev->dev,
>> +                       "queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
>> +                       cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
>> +               return -EINVAL;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>                        struct amdgpu_cwsr_trap_obj **trap_obj)
>>   {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>> index 3c80d057bbed..96b03a8ed99b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>> @@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
>>          uint32_t xcc_cwsr_sz;
>>   };
>>
>> +struct amdgpu_cwsr_params {
>> +       uint64_t ctx_save_area_address;
>> +       /* cwsr size info */
>> +       uint32_t ctl_stack_sz;
>> +       uint32_t cwsr_sz;
>> +};
>> +
>>   int amdgpu_cwsr_init(struct amdgpu_device *adev);
>>   void amdgpu_cwsr_fini(struct amdgpu_device *adev);
>>
>> @@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
>>          return adev->cwsr_info != NULL;
>>   }
>>
>> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
>> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
>> +                               struct amdgpu_cwsr_params *cwsr_params,
>> +                               int num_xcc);
>>   #endif
>> --
>> 2.49.0
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params
  2026-01-23 20:51   ` Alex Deucher
@ 2026-01-27  5:44     ` Lazar, Lijo
  2026-01-28 11:59       ` Lancelot SIX
  0 siblings, 1 reply; 28+ messages in thread
From: Lazar, Lijo @ 2026-01-27  5:44 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang, Yat Sin, David, Lancelot.Six



On 24-Jan-26 2:21 AM, Alex Deucher wrote:
> On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>
>> Add cwsr parameters to userqueue ioctl. User should pass the GPU virtual
>> address for save/restore buffer, and size allocated. They are supported
>> only for user compute queues.
>>
>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 13 +++++++++----
>>   include/uapi/drm/amdgpu_drm.h              | 16 ++++++++++++++++
>>   2 files changed, 25 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>> index 7ad8297eb0d8..2765317f04df 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>> @@ -343,16 +343,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
>>
>>                  if (amdgpu_cwsr_is_enabled(adev)) {
>>                          cwsr_params.ctx_save_area_address =
>> -                               userq_props->ctx_save_area_addr;
>> -                       cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
>> -                       cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
>> -
>> +                               compute_mqd->ctx_save_area_va;
>> +                       cwsr_params.cwsr_sz = compute_mqd->ctx_save_area_size;
>> +                       cwsr_params.ctl_stack_sz = compute_mqd->ctl_stack_size;
>>                          r = amdgpu_userq_input_cwsr_params_validate(
>>                                  queue, &cwsr_params);
>>                          if (r) {
>>                                  kfree(compute_mqd);
>>                                  goto free_mqd;
>>                          }
>> +                       userq_props->ctx_save_area_addr =
>> +                               compute_mqd->ctx_save_area_va;
>> +                       userq_props->ctx_save_area_size =
>> +                               compute_mqd->ctx_save_area_size;
>> +                       userq_props->ctl_stack_size =
>> +                               compute_mqd->ctl_stack_size;
>>                  }
>>
>>                  kfree(compute_mqd);
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index c178b8e0bd3f..b7a858365174 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -460,6 +460,22 @@ struct drm_amdgpu_userq_mqd_compute_gfx11 {
>>           * to get the size.
>>           */
>>          __u64   eop_va;
>> +       /**
>> +        * @ctx_save_area_va: Virtual address of the GPU memory for save/restore buffer.
>> +        * This must be from a separate GPU object, and use AMDGPU_INFO IOCTL
>> +        * to get the size. This includes control stack, wave context and debugger memory.
>> +        */
>> +       __u64 ctx_save_area_va;
>> +       /**
>> +        * @ctx_save_area_size:  Total size (in bytes) allocated for save/restore buffer.
>> +        * Use AMDGPU_INFO IOCTL to get the size.
>> +        */
>> +       __u32 ctx_save_area_size;
>> +       /**
>> +        * @ctl_stack_size: Size (in bytes) of control stack region in the save/restore buffer.
>> +        * Use AMDGPU_INFO IOCTL to get the size.
>> +        */
>> +       __u32 ctl_stack_size;
> 
> Does it matter where the ctl_stack is within the save area?
> 

This is the legacy way. Probably, this can be avoided. Adding David and 
Lancelot.

Hi David/Lancelot,

Do you have the background of userspace passing back control stack size?

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c#L260

Can the driver assume that the context save area takes care of everything 
and that the user allotted it as per the right control stack size?

Thanks,
Lijo

> Alex
> 
>>   };
>>
>>   /* userq signal/wait ioctl */
>> --
>> 2.49.0
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-27  5:35     ` Lazar, Lijo
@ 2026-01-27  5:55       ` Alex Deucher
  2026-01-27  6:11         ` Lazar, Lijo
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2026-01-27  5:55 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang

On Tue, Jan 27, 2026 at 12:35 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 24-Jan-26 2:14 AM, Alex Deucher wrote:
> > On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
> >>
> >> Add an interface to validate user provided save area parameters. Address
> >> validation is not done and expected to be done outside.
> >>
> >> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 44 ++++++++++++++++++++++++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h | 11 ++++++
> >>   2 files changed, 55 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> >> index 80020fd33ce6..32d9398cd1d1 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
> >> @@ -64,6 +64,15 @@ static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
> >>          return true;
> >>   }
> >>
> >> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc)
> >> +{
> >> +       if (!amdgpu_cwsr_is_enabled(adev))
> >> +               return 0;
> >> +
> >> +       return num_xcc *
> >> +              (adev->cwsr_info->xcc_cwsr_sz + adev->cwsr_info->xcc_dbg_mem_sz);
> >
> > These could overflow if userspace passes in especially large values.
> >
>
> Sorry, I didn't get that. cwsr_info contains driver calculated values.
> This function returns the size required.

Sorry, I mixed this up.  See below.

>
> Thanks,
> Lijo
>
> > Alex
> >
> >> +}
> >> +
> >>   static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
> >>                                           struct amdgpu_cwsr_info *cwsr_info)
> >>   {
> >> @@ -425,6 +434,41 @@ int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> >>          return r;
> >>   }
> >>
> >> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
> >> +                               struct amdgpu_cwsr_params *cwsr_params,
> >> +                               int num_xcc)
> >> +{
> >> +       struct amdgpu_cwsr_info *cwsr_info = adev->cwsr_info;
> >> +
> >> +       if (!amdgpu_cwsr_is_enabled(adev))
> >> +               return -EOPNOTSUPP;
> >> +
> >> +       if (!cwsr_params)
> >> +               return -EINVAL;
> >> +
> >> +       /*
> >> +        * Only control stack and save area size details checked. Address validation needs to be
> >> +        * carried out separately.
> >> +        */
> >> +       if (cwsr_params->ctl_stack_sz !=
> >> +           (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
> >> +               dev_dbg(adev->dev,
> >> +                       "queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
> >> +                       cwsr_params->ctl_stack_sz,
> >> +                       num_xcc * cwsr_info->xcc_ctl_stack_sz);
> >> +               return -EINVAL;
> >> +       }
> >> +
> >> +       if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
> >> +               dev_dbg(adev->dev,
> >> +                       "queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
> >> +                       cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
> >> +               return -EINVAL;
> >> +       }

cwsr_params->cwsr_sz has no upper bound check.  Can this cause an
overflow elsewhere?

Alex

> >> +
> >> +       return 0;
> >> +}
> >> +
> >>   void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> >>                        struct amdgpu_cwsr_trap_obj **trap_obj)
> >>   {
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> >> index 3c80d057bbed..96b03a8ed99b 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> >> @@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
> >>          uint32_t xcc_cwsr_sz;
> >>   };
> >>
> >> +struct amdgpu_cwsr_params {
> >> +       uint64_t ctx_save_area_address;
> >> +       /* cwsr size info */
> >> +       uint32_t ctl_stack_sz;
> >> +       uint32_t cwsr_sz;
> >> +};
> >> +
> >>   int amdgpu_cwsr_init(struct amdgpu_device *adev);
> >>   void amdgpu_cwsr_fini(struct amdgpu_device *adev);
> >>
> >> @@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
> >>          return adev->cwsr_info != NULL;
> >>   }
> >>
> >> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
> >> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
> >> +                               struct amdgpu_cwsr_params *cwsr_params,
> >> +                               int num_xcc);
> >>   #endif
> >> --
> >> 2.49.0
> >>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-27  5:55       ` Alex Deucher
@ 2026-01-27  6:11         ` Lazar, Lijo
  2026-01-28 12:30           ` Lancelot SIX
  0 siblings, 1 reply; 28+ messages in thread
From: Lazar, Lijo @ 2026-01-27  6:11 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang, Lancelot.Six, Yat Sin, David



On 27-Jan-26 11:25 AM, Alex Deucher wrote:
> On Tue, Jan 27, 2026 at 12:35 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>
>>
>>
>> On 24-Jan-26 2:14 AM, Alex Deucher wrote:
>>> On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>>
>>>> Add an interface to validate user-provided save area parameters. Address
>>>> validation is not done here and is expected to be done outside.
>>>>
>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 44 ++++++++++++++++++++++++
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h | 11 ++++++
>>>>    2 files changed, 55 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>>>> index 80020fd33ce6..32d9398cd1d1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>>>> @@ -64,6 +64,15 @@ static inline bool amdgpu_cwsr_is_supported(struct amdgpu_device *adev)
>>>>           return true;
>>>>    }
>>>>
>>>> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc)
>>>> +{
>>>> +       if (!amdgpu_cwsr_is_enabled(adev))
>>>> +               return 0;
>>>> +
>>>> +       return num_xcc *
>>>> +              (adev->cwsr_info->xcc_cwsr_sz + adev->cwsr_info->xcc_dbg_mem_sz);
>>>
>>> These could overflow if userspace passes in especially large values.
>>>
>>
>> Sorry, I didn't get that. cwsr_info contains driver-calculated values.
>> This function returns the size required.
> 
> Sorry, I mixed this up.  See below.
> 
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>>> +}
>>>> +
>>>>    static void amdgpu_cwsr_init_isa_details(struct amdgpu_device *adev,
>>>>                                            struct amdgpu_cwsr_info *cwsr_info)
>>>>    {
>>>> @@ -425,6 +434,41 @@ int amdgpu_cwsr_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>           return r;
>>>>    }
>>>>
>>>> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
>>>> +                               struct amdgpu_cwsr_params *cwsr_params,
>>>> +                               int num_xcc)
>>>> +{
>>>> +       struct amdgpu_cwsr_info *cwsr_info = adev->cwsr_info;
>>>> +
>>>> +       if (!amdgpu_cwsr_is_enabled(adev))
>>>> +               return -EOPNOTSUPP;
>>>> +
>>>> +       if (!cwsr_params)
>>>> +               return -EINVAL;
>>>> +
>>>> +       /*
>>>> +        * Only control stack and save area size details checked. Address validation needs to be
>>>> +        * carried out separately.
>>>> +        */
>>>> +       if (cwsr_params->ctl_stack_sz !=
>>>> +           (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
>>>> +               dev_dbg(adev->dev,
>>>> +                       "queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
>>>> +                       cwsr_params->ctl_stack_sz,
>>>> +                       num_xcc * cwsr_info->xcc_ctl_stack_sz);
>>>> +               return -EINVAL;
>>>> +       }
>>>> +
>>>> +       if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
>>>> +               dev_dbg(adev->dev,
>>>> +                       "queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
>>>> +                       cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
>>>> +               return -EINVAL;
>>>> +       }
> 
> cwsr_params->cwsr_sz has no upper bound check.  Can this cause an
> overflow elsewhere?
> 

We could restrict to a max limit of 2 * cwsr size required. Adding 
David/Lancelot as well.

Thanks,
Lijo
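For illustration, a bounded check along the lines suggested above could look like the sketch below. This is userspace C against stand-in copies of the structures quoted in this thread, not code from the series; the 2x cap and the helper name are assumptions, and 64-bit arithmetic is used so the num_xcc multiplications cannot wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Minimal stand-ins for the driver structures quoted in this thread. */
struct amdgpu_cwsr_info {
	uint32_t xcc_ctl_stack_sz;
	uint32_t xcc_cwsr_sz;
};

struct amdgpu_cwsr_params {
	uint64_t ctx_save_area_address;
	uint32_t ctl_stack_sz;
	uint32_t cwsr_sz;
};

/*
 * Sketch of the size validation with an upper bound: accept anything
 * between the required size and twice the required size (the
 * "2 * cwsr size" limit proposed above). The multiplications are done
 * in 64 bits so they cannot overflow uint32_t.
 */
static int cwsr_validate_sz(const struct amdgpu_cwsr_info *info,
			    const struct amdgpu_cwsr_params *p, int num_xcc)
{
	uint64_t min_sz = (uint64_t)info->xcc_cwsr_sz * num_xcc;
	uint64_t max_sz = 2 * min_sz;

	if (p->ctl_stack_sz != (uint64_t)info->xcc_ctl_stack_sz * num_xcc)
		return -EINVAL;
	if (p->cwsr_sz < min_sz || p->cwsr_sz > max_sz)
		return -EINVAL;
	return 0;
}
```

The only change relative to the posted validation is the `> max_sz` arm of the second check.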

> Alex
> 
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>    void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>                         struct amdgpu_cwsr_trap_obj **trap_obj)
>>>>    {
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>>>> index 3c80d057bbed..96b03a8ed99b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>>>> @@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
>>>>           uint32_t xcc_cwsr_sz;
>>>>    };
>>>>
>>>> +struct amdgpu_cwsr_params {
>>>> +       uint64_t ctx_save_area_address;
>>>> +       /* cwsr size info */
>>>> +       uint32_t ctl_stack_sz;
>>>> +       uint32_t cwsr_sz;
>>>> +};
>>>> +
>>>>    int amdgpu_cwsr_init(struct amdgpu_device *adev);
>>>>    void amdgpu_cwsr_fini(struct amdgpu_device *adev);
>>>>
>>>> @@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
>>>>           return adev->cwsr_info != NULL;
>>>>    }
>>>>
>>>> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
>>>> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
>>>> +                               struct amdgpu_cwsr_params *cwsr_params,
>>>> +                               int num_xcc);
>>>>    #endif
>>>> --
>>>> 2.49.0
>>>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag
  2026-01-23 20:53   ` Alex Deucher
@ 2026-01-27 12:36     ` Lazar, Lijo
  0 siblings, 0 replies; 28+ messages in thread
From: Lazar, Lijo @ 2026-01-27 12:36 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang



On 24-Jan-26 2:23 AM, Alex Deucher wrote:
> On Thu, Jan 22, 2026 at 5:42 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>
>> Add interface to set debugger trap flag in TMA region.
>>
>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c | 19 ++++++++++++++++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h |  3 +++
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>> index 70f444afece0..663b91c8e6f3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.c
>> @@ -19,7 +19,6 @@
>>    * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>    * OTHER DEALINGS IN THE SOFTWARE.
>>    */
>> -
> 
> Spurious change.
> 
>>   #include <drm/drm_exec.h>
>>
>>   #include "amdgpu.h"
>> @@ -614,4 +613,22 @@ int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>          }
>>
>>          return r;
>> +}
>> +
>> +int amdgpu_cwsr_set_trap_debug_flag(struct amdgpu_device *adev,
>> +                                   struct amdgpu_cwsr_trap_obj *cwsr_obj,
>> +                                   bool enabled)
>> +{
>> +       uint64_t *l1tma;
>> +
>> +       if (!amdgpu_cwsr_is_enabled(adev))
>> +               return -EOPNOTSUPP;
>> +
>> +       if (!cwsr_obj)
>> +               return -EINVAL;
>> +
>> +       l1tma = (uint64_t *)(cwsr_obj->tma_cpu_addr);
>> +       l1tma[2] = enabled;
>> +
>> +       return 0;
>>   }
>> \ No newline at end of file
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>> index c9f61e393fde..a32044b07b45 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>> @@ -93,5 +93,8 @@ static inline bool amdgpu_cwsr_has_dbg_wa(struct amdgpu_device *adev)
>>
>>   int amdgpu_cwsr_ioctl(struct drm_device *dev, void *data,
>>                        struct drm_file *filp);
>> +int amdgpu_cwsr_set_trap_debug_flag(struct amdgpu_device *adev,
>> +                                   struct amdgpu_cwsr_trap_obj *cwsr_obj,
>> +                                   bool enabled);
>>
> 
> Nothing uses this yet?
> 

This is added to account for kfd_process_set_trap_debug_flag():

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_process.c#L1505

Usage with the new design needs to be finalized.

Thanks,
Lijo
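For context, the flag write in patch 11 can be sketched as below. Word index 2 as the debug-trap flag is taken from the patch itself; treating the first two 64-bit words as second-level trap handler pointers is an assumption carried over from KFD and should be verified against the trap handler source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the TMA layout implied by patch 11 and by KFD's
 * kfd_process_set_trap_debug_flag(): the TMA is viewed as an array of
 * 64-bit words, and word index 2 carries the "debug trap enabled" flag
 * read by the trap handler. The use of words 0 and 1 for the
 * second-level handler is an assumption here.
 */
enum { TMA_DEBUG_FLAG_IDX = 2 };

static void set_trap_debug_flag(uint64_t *tma, bool enabled)
{
	tma[TMA_DEBUG_FLAG_IDX] = enabled;
}
```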


> Alex
> 
>>   #endif
>> --
>> 2.49.0
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params
  2026-01-27  5:44     ` Lazar, Lijo
@ 2026-01-28 11:59       ` Lancelot SIX
  2026-01-28 13:21         ` Lazar, Lijo
  0 siblings, 1 reply; 28+ messages in thread
From: Lancelot SIX @ 2026-01-28 11:59 UTC (permalink / raw)
  To: Lazar, Lijo, Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang, Yat Sin, David, Kim, Jonathan



On 27/01/2026 05:44, Lazar, Lijo wrote:
> 
> 
> On 24-Jan-26 2:21 AM, Alex Deucher wrote:
>> On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>
>>> Add cwsr parameters to the userqueue ioctl. Userspace should pass the GPU
>>> virtual address of the save/restore buffer and the size allocated. These
>>> parameters are supported only for user compute queues.
>>>
>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 13 +++++++++----
>>>   include/uapi/drm/amdgpu_drm.h              | 16 ++++++++++++++++
>>>   2 files changed, 25 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>>> index 7ad8297eb0d8..2765317f04df 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>>> @@ -343,16 +343,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
>>>
>>>                  if (amdgpu_cwsr_is_enabled(adev)) {
>>>                          cwsr_params.ctx_save_area_address =
>>> -                               userq_props->ctx_save_area_addr;
>>> -                       cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
>>> -                       cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
>>> -
>>> +                               compute_mqd->ctx_save_area_va;
>>> +                       cwsr_params.cwsr_sz = compute_mqd->ctx_save_area_size;
>>> +                       cwsr_params.ctl_stack_sz = compute_mqd->ctl_stack_size;
>>>                          r = amdgpu_userq_input_cwsr_params_validate(
>>>                                  queue, &cwsr_params);
>>>                          if (r) {
>>>                                  kfree(compute_mqd);
>>>                                  goto free_mqd;
>>>                          }
>>> +                       userq_props->ctx_save_area_addr =
>>> +                               compute_mqd->ctx_save_area_va;
>>> +                       userq_props->ctx_save_area_size =
>>> +                               compute_mqd->ctx_save_area_size;
>>> +                       userq_props->ctl_stack_size =
>>> +                               compute_mqd->ctl_stack_size;
>>>                  }
>>>
>>>                  kfree(compute_mqd);
>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>> index c178b8e0bd3f..b7a858365174 100644
>>> --- a/include/uapi/drm/amdgpu_drm.h
>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>> @@ -460,6 +460,22 @@ struct drm_amdgpu_userq_mqd_compute_gfx11 {
>>>           * to get the size.
>>>           */
>>>          __u64   eop_va;
>>> +       /**
>>> +        * @ctx_save_area_va: Virtual address of the GPU memory for save/restore buffer.
>>> +        * This must be from a separate GPU object, and use AMDGPU_INFO IOCTL
>>> +        * to get the size. This includes control stack, wave context and debugger memory.
>>> +        */
>>> +       __u64 ctx_save_area_va;
>>> +       /**
>>> +        * @ctx_save_area_size:  Total size (in bytes) allocated for save/restore buffer.
>>> +        * Use AMDGPU_INFO IOCTL to get the size.
>>> +        */
>>> +       __u32 ctx_save_area_size;
>>> +       /**
>>> +        * @ctl_stack_size: Size (in bytes) of control stack region in the save/restore buffer.
>>> +        * Use AMDGPU_INFO IOCTL to get the size.
>>> +        */
>>> +       __u32 ctl_stack_size;
>>
>> Does it matter where the ctl_stack is within the save area?
>>
> 
> This is the legacy way. Probably, this can be avoided. Adding David and 
> Lancelot.
> 
> Hi David/Lancelot,
> 
> Do you have the background of userspace passing back control stack size?
> 
> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c#L260
> 
> Can driver assume that context save area takes care of everything and 
> assume that user allotted as per the right control stack size?
> 
> Thanks,
> Lijo

Hi,

As far as ROCr is concerned, the control stack is just an element that 
contributes to the size that needs to be allocated for the CWSR area.  I 
do not expect ROCr needs to know anything about it if it can query the 
driver for the minimum size the CWSR allocation should be.

If userspace processes are interested in accessing the control stack 
(like the debugger for example), the way to access it and know its 
current size is by reading the CWSR area header maintained by the 
driver.  See "struct kfd_context_save_area_header", which contains the 
effective size (of valid data).  This struct is at the beginning of the 
cwsr area (ctx_save_area_va), and contains everything needed to 
effectively decode CWSR.

Does that answer your question?

Best,
Lancelot.

cc Jonathan.
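For reference, the header Lancelot describes can be consumed roughly as sketched below. The struct layout here is an approximation of the in-tree struct kfd_context_save_area_header; field names and offsets are assumptions and should be checked against kfd_priv.h before relying on them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Approximation of the CWSR area header. The authoritative definition
 * is struct kfd_context_save_area_header in the kernel's kfd_priv.h;
 * this layout is an assumption for illustration only.
 */
struct ctx_save_area_header {
	struct {
		uint32_t control_stack_offset;
		uint32_t control_stack_size;
		uint32_t wave_state_offset;
		uint32_t wave_state_size;
	} wave_state;
	uint32_t debug_offset;
	uint32_t debug_size;
	uint32_t err_payload_addr_lo;
	uint32_t err_payload_addr_hi;
	uint32_t err_event_id;
	uint32_t reserved1;
};

/*
 * A consumer such as the debugger reads the header at the start of the
 * CWSR area (ctx_save_area_va) to locate the currently valid control
 * stack data, instead of relying on a size passed at queue creation.
 */
static const uint8_t *ctl_stack_data(const uint8_t *cwsr_area,
				     uint32_t *size_out)
{
	struct ctx_save_area_header hdr;

	memcpy(&hdr, cwsr_area, sizeof(hdr));
	*size_out = hdr.wave_state.control_stack_size;
	return cwsr_area + hdr.wave_state.control_stack_offset;
}
```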

> 
>> Alex
>>
>>>   };
>>>
>>>   /* userq signal/wait ioctl */
>>> -- 
>>> 2.49.0
>>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-27  6:11         ` Lazar, Lijo
@ 2026-01-28 12:30           ` Lancelot SIX
  2026-01-28 16:06             ` Alex Deucher
  0 siblings, 1 reply; 28+ messages in thread
From: Lancelot SIX @ 2026-01-28 12:30 UTC (permalink / raw)
  To: Lazar, Lijo, Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang, Yat Sin, David, Kim, Jonathan, Morichetti, Laurent

>>>>> +       /*
>>>>> +        * Only control stack and save area size details checked. Address validation needs to be
>>>>> +        * carried out separately.
>>>>> +        */
>>>>> +       if (cwsr_params->ctl_stack_sz !=
>>>>> +           (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
>>>>> +               dev_dbg(adev->dev,
>>>>> +                       "queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
>>>>> +                       cwsr_params->ctl_stack_sz,
>>>>> +                       num_xcc * cwsr_info->xcc_ctl_stack_sz);
>>>>> +               return -EINVAL;
>>>>> +       }
>>>>> +
>>>>> +       if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
>>>>> +               dev_dbg(adev->dev,
>>>>> +                       "queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
>>>>> +                       cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
>>>>> +               return -EINVAL;
>>>>> +       }
>>
>> cwsr_params->cwsr_sz has no upper bound check.  Can this cause an
>> overflow elsewhere?
>>
> 
> We could restrict to a max limit of 2 * cwsr size required. Adding 
> David/Lancelot as well.
> 
> Thanks,
> Lijo
> 

Hi,

The CWSR size should allow room for userspace to choose the amount 
allocated for use by the debugger.  I am not sure what limit would make 
sense, as I can't really predict what will be needed in the future, but 
I really don't see how we could need more than the cwsr size (which 
itself can contain the entire state of what is running on the queue).

Best,
Lancelot.

cc Jonathan/Laurent

>> Alex
>>
>>>>> +
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>>>    void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>>                         struct amdgpu_cwsr_trap_obj **trap_obj)
>>>>>    {
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>>>>> index 3c80d057bbed..96b03a8ed99b 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
>>>>> @@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
>>>>>           uint32_t xcc_cwsr_sz;
>>>>>    };
>>>>>
>>>>> +struct amdgpu_cwsr_params {
>>>>> +       uint64_t ctx_save_area_address;
>>>>> +       /* cwsr size info */
>>>>> +       uint32_t ctl_stack_sz;
>>>>> +       uint32_t cwsr_sz;
>>>>> +};
>>>>> +
>>>>>    int amdgpu_cwsr_init(struct amdgpu_device *adev);
>>>>>    void amdgpu_cwsr_fini(struct amdgpu_device *adev);
>>>>>
>>>>> @@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
>>>>>           return adev->cwsr_info != NULL;
>>>>>    }
>>>>>
>>>>> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
>>>>> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
>>>>> +                               struct amdgpu_cwsr_params *cwsr_params,
>>>>> +                               int num_xcc);
>>>>>    #endif
>>>>> -- 
>>>>> 2.49.0
>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params
  2026-01-28 11:59       ` Lancelot SIX
@ 2026-01-28 13:21         ` Lazar, Lijo
  0 siblings, 0 replies; 28+ messages in thread
From: Lazar, Lijo @ 2026-01-28 13:21 UTC (permalink / raw)
  To: Lancelot SIX, Alex Deucher
  Cc: amd-gfx, Hawking.Zhang, Alexander.Deucher, Christian.Koenig,
	Jesse.Zhang, Yat Sin, David, Kim, Jonathan



On 28-Jan-26 5:29 PM, Lancelot SIX wrote:
> 
> 
> On 27/01/2026 05:44, Lazar, Lijo wrote:
>>
>>
>> On 24-Jan-26 2:21 AM, Alex Deucher wrote:
>>> On Thu, Jan 22, 2026 at 5:52 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>>
>>>> Add cwsr parameters to the userqueue ioctl. Userspace should pass the GPU
>>>> virtual address of the save/restore buffer and the size allocated. These
>>>> parameters are supported only for user compute queues.
>>>>
>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 13 +++++++++----
>>>>   include/uapi/drm/amdgpu_drm.h              | 16 ++++++++++++++++
>>>>   2 files changed, 25 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>>>> index 7ad8297eb0d8..2765317f04df 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
>>>> @@ -343,16 +343,21 @@ static int mes_userq_mqd_create(struct amdgpu_usermode_queue *queue,
>>>>
>>>>                  if (amdgpu_cwsr_is_enabled(adev)) {
>>>>                          cwsr_params.ctx_save_area_address =
>>>> -                               userq_props->ctx_save_area_addr;
>>>> -                       cwsr_params.cwsr_sz = userq_props->ctx_save_area_size;
>>>> -                       cwsr_params.ctl_stack_sz = userq_props->ctl_stack_size;
>>>> -
>>>> +                               compute_mqd->ctx_save_area_va;
>>>> +                       cwsr_params.cwsr_sz = compute_mqd->ctx_save_area_size;
>>>> +                       cwsr_params.ctl_stack_sz = compute_mqd->ctl_stack_size;
>>>>                          r = amdgpu_userq_input_cwsr_params_validate(
>>>>                                  queue, &cwsr_params);
>>>>                          if (r) {
>>>>                                  kfree(compute_mqd);
>>>>                                  goto free_mqd;
>>>>                          }
>>>> +                       userq_props->ctx_save_area_addr =
>>>> +                               compute_mqd->ctx_save_area_va;
>>>> +                       userq_props->ctx_save_area_size =
>>>> +                               compute_mqd->ctx_save_area_size;
>>>> +                       userq_props->ctl_stack_size =
>>>> +                               compute_mqd->ctl_stack_size;
>>>>                  }
>>>>
>>>>                  kfree(compute_mqd);
>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>> index c178b8e0bd3f..b7a858365174 100644
>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>> @@ -460,6 +460,22 @@ struct drm_amdgpu_userq_mqd_compute_gfx11 {
>>>>           * to get the size.
>>>>           */
>>>>          __u64   eop_va;
>>>> +       /**
>>>> +        * @ctx_save_area_va: Virtual address of the GPU memory for save/restore buffer.
>>>> +        * This must be from a separate GPU object, and use AMDGPU_INFO IOCTL
>>>> +        * to get the size. This includes control stack, wave context and debugger memory.
>>>> +        */
>>>> +       __u64 ctx_save_area_va;
>>>> +       /**
>>>> +        * @ctx_save_area_size:  Total size (in bytes) allocated for save/restore buffer.
>>>> +        * Use AMDGPU_INFO IOCTL to get the size.
>>>> +        */
>>>> +       __u32 ctx_save_area_size;
>>>> +       /**
>>>> +        * @ctl_stack_size: Size (in bytes) of control stack region in the save/restore buffer.
>>>> +        * Use AMDGPU_INFO IOCTL to get the size.
>>>> +        */
>>>> +       __u32 ctl_stack_size;
>>>
>>> Does it matter where the ctl_stack is within the save area?
>>>
>>
>> This is the legacy way. Probably, this can be avoided. Adding David 
>> and Lancelot.
>>
>> Hi David/Lancelot,
>>
>> Do you have the background of userspace passing back control stack size?
>>
>> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c#L260
>>
>> Can driver assume that context save area takes care of everything and 
>> assume that user allotted as per the right control stack size?
>>
>> Thanks,
>> Lijo
> 
> Hi,
> 
> As far as ROCr is concerned, the control stack is just an element that 
> contributes to the size that needs to be allocated for the CWSR area.  I 
> do not expect ROCr needs to know anything about it if it can query the 
> driver for the minimum size the CWSR allocation should be.
> 
> If userspace processes are interested in accessing the control stack 
> (like the debugger for example), the way to access it and know its 
> current size is by reading the CWSR area header maintained by the 
> driver.  See "struct kfd_context_save_area_header", which contains the 
> effective size (of valid data).  This struct is at the beginning of the 
> cwsr area (ctx_save_area_va), and contains everything needed to 
> effectively decode CWSR.
> 
> Does that answer your question?
> 

Thanks, that clarifies. The control stack size is expected to be passed to 
the mqd. I think the driver can use the size it calculated as long as the 
user has allocated the minimum size required for the whole save area. Will 
remove this from the input parameters.

Thanks for the pointer to the save area header. An interface to query the 
used size is still missing.

Thanks,
Lijo

> Best,
> Lancelot.
> 
> cc Jonathan.
> 
>>
>>> Alex
>>>
>>>>   };
>>>>
>>>>   /* userq signal/wait ioctl */
>>>> -- 
>>>> 2.49.0
>>>>
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/11] drm/amdgpu: Add user save area params validation
  2026-01-28 12:30           ` Lancelot SIX
@ 2026-01-28 16:06             ` Alex Deucher
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Deucher @ 2026-01-28 16:06 UTC (permalink / raw)
  To: Lancelot SIX
  Cc: Lazar, Lijo, amd-gfx, Hawking.Zhang, Alexander.Deucher,
	Christian.Koenig, Jesse.Zhang, Yat Sin, David, Kim, Jonathan,
	Morichetti, Laurent

On Wed, Jan 28, 2026 at 7:30 AM Lancelot SIX <Lancelot.Six@amd.com> wrote:
>
> >>>>> +       /*
> >>>>> +        * Only control stack and save area size details checked. Address validation needs to be
> >>>>> +        * carried out separately.
> >>>>> +        */
> >>>>> +       if (cwsr_params->ctl_stack_sz !=
> >>>>> +           (cwsr_info->xcc_ctl_stack_sz * num_xcc)) {
> >>>>> +               dev_dbg(adev->dev,
> >>>>> +                       "queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
> >>>>> +                       cwsr_params->ctl_stack_sz,
> >>>>> +                       num_xcc * cwsr_info->xcc_ctl_stack_sz);
> >>>>> +               return -EINVAL;
> >>>>> +       }
> >>>>> +
> >>>>> +       if (cwsr_params->cwsr_sz < (cwsr_info->xcc_cwsr_sz * num_xcc)) {
> >>>>> +               dev_dbg(adev->dev,
> >>>>> +                       "queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
> >>>>> +                       cwsr_params->cwsr_sz, num_xcc * cwsr_info->xcc_cwsr_sz);
> >>>>> +               return -EINVAL;
> >>>>> +       }
> >>
> >> cwsr_params->cwsr_sz has no upper bound check.  Can this cause an
> >> overflow elsewhere?
> >>
> >
> > We could restrict to a max limit of 2 * cwsr size required. Adding
> > David/Lancelot as well.
> >
> > Thanks,
> > Lijo
> >
>
> Hi,
>
> The CWSR size should allow room for userspace to choose the amount
> allocated for use by the debugger.  I am not sure what limit would make
> sense, as I can't really predict what will be needed in the future, but
> I really don't see how we could need more than the cwsr size (which
> itself can contain the entire state of what is running on the queue).
>

We can always make it bigger in the future if we need it.  Unbounded
seems ripe for an overflow somewhere.

Alex
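Independent of the limit chosen, the size arithmetic itself can be made overflow-proof. The sketch below is generic userspace C (not code from this series) using the compiler builtins that the kernel's check_add_overflow()/check_mul_overflow() helpers from <linux/overflow.h> wrap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Overflow-checked version of the size computation discussed in this
 * thread: num_xcc * (xcc_cwsr_sz + xcc_dbg_mem_sz). Returns false if
 * any intermediate result would not fit in uint32_t; on success *out
 * holds the total size. Kernel code would spell this with
 * check_add_overflow()/check_mul_overflow(), which wrap the same
 * builtins.
 */
static bool cwsr_size_needed_checked(uint32_t xcc_cwsr_sz,
				     uint32_t xcc_dbg_mem_sz,
				     uint32_t num_xcc, uint32_t *out)
{
	uint32_t per_xcc;

	if (__builtin_add_overflow(xcc_cwsr_sz, xcc_dbg_mem_sz, &per_xcc))
		return false;
	return !__builtin_mul_overflow(num_xcc, per_xcc, out);
}
```

With driver-calculated per-XCC sizes the check can never fire today, but it keeps the helper safe if any operand ever becomes user-influenced.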

> Best,
> Lancelot.
>
> cc Jonathan/Laurent
>
> >> Alex
> >>
> >>>>> +
> >>>>> +       return 0;
> >>>>> +}
> >>>>> +
> >>>>>    void amdgpu_cwsr_free(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> >>>>>                         struct amdgpu_cwsr_trap_obj **trap_obj)
> >>>>>    {
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> >>>>> index 3c80d057bbed..96b03a8ed99b 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cwsr.h
> >>>>> @@ -56,6 +56,13 @@ struct amdgpu_cwsr_info {
> >>>>>           uint32_t xcc_cwsr_sz;
> >>>>>    };
> >>>>>
> >>>>> +struct amdgpu_cwsr_params {
> >>>>> +       uint64_t ctx_save_area_address;
> >>>>> +       /* cwsr size info */
> >>>>> +       uint32_t ctl_stack_sz;
> >>>>> +       uint32_t cwsr_sz;
> >>>>> +};
> >>>>> +
> >>>>>    int amdgpu_cwsr_init(struct amdgpu_device *adev);
> >>>>>    void amdgpu_cwsr_fini(struct amdgpu_device *adev);
> >>>>>
> >>>>> @@ -68,4 +75,8 @@ static inline bool amdgpu_cwsr_is_enabled(struct amdgpu_device *adev)
> >>>>>           return adev->cwsr_info != NULL;
> >>>>>    }
> >>>>>
> >>>>> +uint32_t amdgpu_cwsr_size_needed(struct amdgpu_device *adev, int num_xcc);
> >>>>> +int amdgpu_cwsr_validate_params(struct amdgpu_device *adev,
> >>>>> +                               struct amdgpu_cwsr_params *cwsr_params,
> >>>>> +                               int num_xcc);
> >>>>>    #endif
> >>>>> --
> >>>>> 2.49.0
> >>>>>
> >>>
> >
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-01-28 16:07 UTC | newest]

Thread overview: 28+ messages
2026-01-22 10:39 [PATCH v4 00/11] Add CWSR support to user queues Lijo Lazar
2026-01-22 10:39 ` [PATCH v4 01/11] drm/amdgpu: Add helper function to get xcc count Lijo Lazar
2026-01-22 10:39 ` [PATCH v4 02/11] drm/amdgpu: Add cwsr functions Lijo Lazar
2026-01-23 20:41   ` Alex Deucher
2026-01-22 10:39 ` [PATCH v4 03/11] drm/amdgpu: Fill cwsr save area details Lijo Lazar
2026-01-22 10:39 ` [PATCH v4 04/11] drm/amdgpu: Add user save area params validation Lijo Lazar
2026-01-23 20:44   ` Alex Deucher
2026-01-27  5:35     ` Lazar, Lijo
2026-01-27  5:55       ` Alex Deucher
2026-01-27  6:11         ` Lazar, Lijo
2026-01-28 12:30           ` Lancelot SIX
2026-01-28 16:06             ` Alex Deucher
2026-01-22 10:39 ` [PATCH v4 05/11] drm/amdgpu: Add cwsr to device init/fini sequence Lijo Lazar
2026-01-22 10:39 ` [PATCH v4 06/11] drm/amdgpu: Add first level cwsr handler to userq Lijo Lazar
2026-01-22 10:39 ` [PATCH v4 07/11] drm/amdgpu: Add user save area params to mqd input Lijo Lazar
2026-01-23 20:47   ` Alex Deucher
2026-01-22 10:39 ` [PATCH v4 08/11] drm/amdgpu: Add ioctl to get cwsr details Lijo Lazar
2026-01-23 20:48   ` Alex Deucher
2026-01-22 10:39 ` [PATCH v4 09/11] drm/amdgpu: Add ioctl support for cwsr params Lijo Lazar
2026-01-23 20:51   ` Alex Deucher
2026-01-27  5:44     ` Lazar, Lijo
2026-01-28 11:59       ` Lancelot SIX
2026-01-28 13:21         ` Lazar, Lijo
2026-01-22 10:39 ` [PATCH v4 10/11] drm/amdgpu: Add ioctl to set level2 handler Lijo Lazar
2026-01-23 20:52   ` Alex Deucher
2026-01-22 10:40 ` [PATCH v4 11/11] drm/amdgpu: Add interface to set debug trap flag Lijo Lazar
2026-01-23 20:53   ` Alex Deucher
2026-01-27 12:36     ` Lazar, Lijo
