* [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields
@ 2026-04-30 16:03 Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 02/10] drm/amdgpu/mes11: plumb unmap_flag_addr + NOTIFY_WORK_ON_UNMAPPED_QUEUE Jesse Zhang
` (9 more replies)
0 siblings, 10 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian Koenig, Jesse.zhang, Jesse Zhang
From: "Jesse.zhang" <Jesse.zhang@amd.com>
Kernel-side abstraction work for the SDMA user-mode-queue plumbing that
lands in the subsequent per-engine patches:
- mes_add_queue_input gains is_user_mode_submission and
unmap_flag_addr. Without is_user_mode_submission MES treats SDMA
queues as kernel-managed and uses the end-of-MQD slot for the unmap
flag, so PROTECTED_FENCE at the tail of every SDMA IB looks like a
"queue done" signal and MES gangs the queue out forever.
- mes_misc_opcode gains MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE
with a notify_work.priority_level payload. This wakes a ganged-out
SDMA UMQ so that subsequent IBs get re-mapped (SDMA has no
CP_UNMAPPED_DOORBELL HW intercept).
Also surface the matching firmware bits in mes_v12_api_def.h:
is_user_mode_submission / enable_perf_profiling /
exclude_process_limit / is_video_blit_queue bitfields in
MESAPI__ADD_QUEUE, and the unmap_flag_addr packet field.
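The bit budget of the new flags can be checked with a small sketch. The
structs below are not the real MESAPI__ADD_QUEUE layout: earlier_flags is
a hypothetical stand-in that assumes 17 flag bits precede the old 15-bit
reserved field, and the struct names are made up for illustration. The
sketch only shows that four one-bit flags plus an 11-bit reserved field
fill the same 15 bits, so the flags word should stay one dword.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the flags dword before this patch. */
struct add_queue_flags_old {
	uint32_t earlier_flags : 17;	/* assumed width of the existing flags */
	uint32_t reserved : 15;
};

/* Same dword after the patch: 4 new flags carved out of reserved. */
struct add_queue_flags_new {
	uint32_t earlier_flags : 17;
	uint32_t is_video_blit_queue : 1;
	uint32_t is_user_mode_submission : 1;
	uint32_t enable_perf_profiling : 1;
	uint32_t exclude_process_limit : 1;
	uint32_t reserved : 11;		/* 15 - 4 = 11 */
};

/* Both layouts must still fit in exactly one 32-bit word. */
static_assert(sizeof(struct add_queue_flags_old) == sizeof(uint32_t),
	      "old flags word must be one dword");
static_assert(sizeof(struct add_queue_flags_new) == sizeof(uint32_t),
	      "new flags word must be one dword");
```

A compile-time check of this kind is one way to catch an accidental
change in packet size when reserved bits are reassigned.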
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 7 +++++++
drivers/gpu/drm/amd/include/mes_v12_api_def.h | 12 +++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index cafc5caae822..705056de94b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -265,6 +265,8 @@ struct mes_add_queue_input {
uint32_t exclusively_scheduled;
uint32_t sh_mem_config_data;
uint32_t vm_cntx_cntl;
+ uint32_t is_user_mode_submission;
+ uint64_t unmap_flag_addr;
};
struct mes_remove_queue_input {
@@ -343,6 +345,7 @@ enum mes_misc_opcode {
MES_MISC_OP_WRM_REG_WR_WAIT,
MES_MISC_OP_SET_SHADER_DEBUGGER,
MES_MISC_OP_CHANGE_CONFIG,
+ MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE,
};
struct mes_misc_op_input {
@@ -397,6 +400,10 @@ struct mes_misc_op_input {
uint32_t tdr_delay;
} tdr_config;
} change_config;
+
+ struct {
+ uint32_t priority_level;
+ } notify_work;
};
};
diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
index e541a43714a1..cd6e60184a06 100644
--- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
@@ -381,7 +381,11 @@ union MESAPI__ADD_QUEUE {
uint32_t exclusively_scheduled : 1;
uint32_t is_long_running : 1;
uint32_t is_dwm_queue : 1;
- uint32_t reserved : 15;
+ uint32_t is_video_blit_queue : 1;
+ uint32_t is_user_mode_submission : 1;
+ uint32_t enable_perf_profiling : 1;
+ uint32_t exclude_process_limit : 1;
+ uint32_t reserved : 11;
};
struct MES_API_STATUS api_status;
uint64_t tma_addr;
@@ -393,6 +397,12 @@ union MESAPI__ADD_QUEUE {
uint32_t queue_id;
uint32_t alignment_mode_setting;
uint32_t full_sh_mem_config_data;
+ /*
+ * MC addr where MES writes 1 when it unmaps the queue. Used
+ * by user-mode SDMA UMQs so the kernel/userspace can detect
+ * the unmapped state and re-arm work via NOTIFY_WORK_ON_UNMAPPED_QUEUE.
+ */
+ uint64_t unmap_flag_addr;
};
uint32_t max_dwords_in_api[API_FRAME_SIZE_IN_DWORDS];
--
2.49.0
* [PATCH v4 02/10] drm/amdgpu/mes11: plumb unmap_flag_addr + NOTIFY_WORK_ON_UNMAPPED_QUEUE
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 03/10] drm/amdgpu/mes12: plumb is_user_mode_submission, unmap_flag_addr, NOTIFY Jesse Zhang
` (8 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian Koenig, Jesse.zhang, Jesse Zhang
From: "Jesse.zhang" <Jesse.zhang@amd.com>
Pass the new mes_add_queue_input.unmap_flag_addr through to the
MESAPI__ADD_QUEUE packet, and route MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE
to the matching MESAPI_MISC opcode.
Note: the MES v11 firmware spec does not (yet) carry a per-queue
is_user_mode_submission bit, so SDMA UMQs on chips with MES v11 may
still see PROTECTED_FENCE-as-queue-done behaviour after the first IB
until firmware adds the bit. The wakeup mechanism (NOTIFY) is wired
up so that path is ready when firmware lands.
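The routing described above can be sketched in isolation. All types,
enum names and enum values below are hypothetical, trimmed-down mirrors
of the driver structures, not the real definitions; only the shape of
the translation (op to firmware opcode, priority_level to
queue_sch_level) follows this patch.

```c
#include <stdint.h>

/* Hypothetical driver-side ops; values are illustrative only. */
enum misc_op {
	MISC_OP_CHANGE_CONFIG,
	MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE,
};

/* Hypothetical firmware opcodes; values are illustrative only. */
enum fw_misc_opcode {
	FW_MISC_CHANGE_CONFIG = 1,
	FW_MISC_NOTIFY_WORK_ON_UNMAPPED_QUEUE = 2,
};

struct misc_op_input {
	enum misc_op op;
	struct {
		uint32_t priority_level;
	} notify_work;
};

struct misc_pkt {
	uint32_t opcode;
	uint32_t queue_sch_level;
};

/* Returns 0 on success, -1 for an op this sketch does not handle. */
static int route_misc_op(const struct misc_op_input *in, struct misc_pkt *pkt)
{
	switch (in->op) {
	case MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE:
		pkt->opcode = FW_MISC_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
		pkt->queue_sch_level = in->notify_work.priority_level;
		return 0;
	default:
		return -1;
	}
}
```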
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index a926a330700e..575cc4a684b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -362,6 +362,16 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.exclusively_scheduled = input->exclusively_scheduled;
+ /*
+ * unmap_flag_addr is plumbed through but only honoured by MES when
+ * the global use_add_queue_unmap_flag_addr flag is set in
+ * SET_HW_RESOURCES. MES v11 firmware spec does not carry a
+ * per-queue is_user_mode_submission bit, so SDMA UMQs on chips with
+ * MES v11 may still see PROTECTED_FENCE-as-queue-done behaviour
+ * until firmware adds the bit.
+ */
+ mes_add_queue_pkt.unmap_flag_addr = input->unmap_flag_addr;
+
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
offsetof(union MESAPI__ADD_QUEUE, api_status));
@@ -660,6 +670,10 @@ static int mes_v11_0_misc_op(struct amdgpu_mes *mes,
misc_pkt.change_config.option.bits.limit_single_process =
input->change_config.option.limit_single_process;
break;
+ case MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE:
+ misc_pkt.opcode = MESAPI_MISC__NOTIFY_WORK_ON_UNMAPPED_QUEUE;
+ misc_pkt.queue_sch_level = input->notify_work.priority_level;
+ break;
default:
drm_err(adev_to_drm(mes->adev), "unsupported misc op (%d)\n", input->op);
--
2.49.0
* [PATCH v4 03/10] drm/amdgpu/mes12: plumb is_user_mode_submission, unmap_flag_addr, NOTIFY
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 02/10] drm/amdgpu/mes11: plumb unmap_flag_addr + NOTIFY_WORK_ON_UNMAPPED_QUEUE Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 04/10] drm/amdgpu/mes_userqueue: mark SDMA UMQs as user-mode submission Jesse Zhang
` (7 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian Koenig, Jesse Zhang
Pass is_user_mode_submission and unmap_flag_addr from
mes_add_queue_input through to MESAPI__ADD_QUEUE in both mes_v12_0
add_hw_queue paths, and route MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE
to the matching MESAPI_MISC opcode.
The kernel-side caller that actually sets is_user_mode_submission for
SDMA UMQs lives in a later patch; this one is just the engine-level
plumbing.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 023c7345ea54..5acc505533f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -342,6 +342,8 @@ static int mes_v12_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.trap_en = input->trap_en;
mes_add_queue_pkt.skip_process_ctx_clear = input->skip_process_ctx_clear;
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
+ mes_add_queue_pkt.is_user_mode_submission = input->is_user_mode_submission;
+ mes_add_queue_pkt.unmap_flag_addr = input->unmap_flag_addr;
/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
@@ -697,6 +699,10 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,
misc_pkt.change_config.option.bits.limit_single_process =
input->change_config.option.limit_single_process;
break;
+ case MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE:
+ misc_pkt.opcode = MESAPI_MISC__NOTIFY_WORK_ON_UNMAPPED_QUEUE;
+ misc_pkt.queue_sch_level = input->notify_work.priority_level;
+ break;
default:
DRM_ERROR("unsupported misc op (%d)\n", input->op);
--
2.49.0
* [PATCH v4 04/10] drm/amdgpu/mes_userqueue: mark SDMA UMQs as user-mode submission
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 02/10] drm/amdgpu/mes11: plumb unmap_flag_addr + NOTIFY_WORK_ON_UNMAPPED_QUEUE Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 03/10] drm/amdgpu/mes12: plumb is_user_mode_submission, unmap_flag_addr, NOTIFY Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 05/10] amdgpu: add global aggregated doorbell bo Jesse Zhang
` (6 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian Koenig, Jesse.zhang, Jesse Zhang
From: "Jesse.zhang" <Jesse.zhang@amd.com>
For AMDGPU_HW_IP_DMA queues, set mes_add_queue_input.is_user_mode_submission
and a stable unmap_flag_addr (a kernel-owned dword in the MQD
object's tail padding). This tells MES to use the new wptr_mc /
unmap_flag scheme so the PROTECTED_FENCE at the tail of every SDMA
IB no longer terminates the queue. Combined with the
NOTIFY_WORK_ON_UNMAPPED_QUEUE wakeup added in a follow-up patch, this
lets multi-IB submissions on a single SDMA UMQ work end-to-end.
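As a minimal sketch, the address arithmetic used for unmap_flag_addr in
the hunk below can be written as a standalone helper. The function name
is hypothetical, and the sketch assumes (without checking) that the MQD
allocation is large enough for the resulting dword to fall inside it.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression in the diff:
 * MQD base address + MQD size + one dword.
 */
static uint64_t sdma_unmap_flag_addr(uint64_t mqd_gpu_addr, uint32_t mqd_size)
{
	return mqd_gpu_addr + mqd_size + sizeof(uint32_t);
}
```

For example, an MQD at 0x100000 with a 512-byte MQD size would place the
flag at 0x100204 under this expression.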
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
index d12cd1b7790b..3dbcddb46b24 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
@@ -165,6 +165,28 @@ static int mes_userq_map(struct amdgpu_usermode_queue *queue)
queue_input.doorbell_offset = userq_props->doorbell_index;
queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
+ /*
+ * SDMA UMQs need is_user_mode_submission so MES treats them as user
+ * queues (using the new wptr_mc_addr / unmap_flag_addr scheme).
+ * Without this MES uses end-of-MQD for unmap_flag, sees PROTECTED_FENCE
+ * as a "queue done" signal, and gangs the queue out forever. Combined
+ * with NOTIFY_WORK_ON_UNMAPPED_QUEUE poke from amdgpu_userq_signal_ioctl
+ * this lets multi-IB submissions work. Use queue->mqd.gpu_addr +
+ * mqd_size as a stable kernel-owned location for unmap_flag — userspace
+ * never reads it; the kernel just needs SOMETHING valid to give MES.
+ */
+ if (queue->queue_type == AMDGPU_HW_IP_DMA) {
+ queue_input.is_user_mode_submission = 1;
+ /*
+ * Same offset MES would derive in legacy mode
+ * (get_unmap_flag_addr_from_end_of_mqd in MES src 12). Lives
+ * inside the allocated MQD object's tail padding so it's a
+ * valid MC address; the kernel never reads it back — its only
+ * purpose is to keep MES happy.
+ */
+ queue_input.unmap_flag_addr = queue->mqd.gpu_addr +
+ adev->mqds[queue->queue_type].mqd_size + sizeof(u32);
+ }
amdgpu_mes_lock(&adev->mes);
r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
--
2.49.0
* [PATCH v4 05/10] amdgpu: add global aggregated doorbell bo
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (2 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 04/10] drm/amdgpu/mes_userqueue: mark SDMA UMQs as user-mode submission Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 06/10] drm/amdgpu: add AMDGPU_INFO_DOORBELL Jesse Zhang
` (5 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx
Cc: Alexander.Deucher, Christian Koenig, David (Ming Qiang) Wu,
Alex Deucher
From: "David (Ming Qiang) Wu" <David.Wu3@amd.com>
Allocate aggregated doorbell bo at device level so it can be used for
multiple IPs. Also add AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL so
userspace can obtain a handle to this BO via AMDGPU_GEM_OP_OPEN_GLOBAL
and mmap it for direct doorbell rings.
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c | 14 ++++++++++++++
include/uapi/drm/amdgpu_drm.h | 4 ++++
3 files changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 39894e38fee4..047d6bb6f2a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1196,6 +1196,9 @@ struct amdgpu_device {
struct amdgpu_uma_carveout_info uma_info;
+ /* aggregated doorbell */
+ struct amdgpu_bo *agdb_bo;
+
/* KFD
* Must be last --ends in a flexible-array member.
*/
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
index bc7858567321..5a63a501d48e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
@@ -148,6 +148,7 @@ uint32_t amdgpu_doorbell_index_on_bar(struct amdgpu_device *adev,
*/
int amdgpu_doorbell_create_kernel_doorbells(struct amdgpu_device *adev)
{
+ struct amdgpu_bo_param bp;
int r;
int size;
@@ -176,6 +177,19 @@ int amdgpu_doorbell_create_kernel_doorbells(struct amdgpu_device *adev)
}
adev->doorbell.num_kernel_doorbells = size / sizeof(u32);
+
+ /* allocate aggregated doorbell bo at device level */
+ if (!adev->agdb_bo) {
+ memset(&bp, 0, sizeof(bp));
+ bp.type = ttm_bo_type_device;
+ bp.size = AMDGPU_GPU_PAGE_SIZE;
+ bp.byte_align = AMDGPU_GPU_PAGE_SIZE;
+ bp.domain = AMDGPU_GEM_DOMAIN_DOORBELL;
+ bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+ return amdgpu_bo_create(adev, &bp, &adev->agdb_bo);
+ }
+
return 0;
}
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 9f3090db2f16..062ae4741fd6 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -806,6 +806,10 @@ union drm_amdgpu_wait_fences {
#define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO 0
#define AMDGPU_GEM_OP_SET_PLACEMENT 1
#define AMDGPU_GEM_OP_GET_MAPPING_INFO 2
+#define AMDGPU_GEM_OP_OPEN_GLOBAL 3
+
+#define AMDGPU_GEM_GLOBAL_MMIO_REMAP 0
+#define AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL 1
struct drm_amdgpu_gem_vm_entry {
/* Start of mapping (in bytes) */
--
2.49.0
* [PATCH v4 06/10] drm/amdgpu: add AMDGPU_INFO_DOORBELL
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (3 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 05/10] amdgpu: add global aggregated doorbell bo Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-05-01 13:37 ` Alex Deucher
2026-04-30 16:03 ` [PATCH v4 07/10] drm/amdgpu: Add AMDGPU_GEM_OP_OPEN_GLOBAL Jesse Zhang
` (4 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx
Cc: Alexander.Deucher, Christian Koenig, David (Ming Qiang) Wu,
Alex Deucher
From: "David (Ming Qiang) Wu" <David.Wu3@amd.com>
Use it to get the doorbell range and aggregated doorbell enablement
and offset. This patch only supports VCN for now.
V2 - drop VPE and use vcn.agdb_offset saved in
umsch_mm_agdb_index_init() (suggested by Alex)
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 18 ++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
include/uapi/drm/amdgpu_drm.h | 13 +++++++++++++
3 files changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d88e4994c8c1..a3beeff800bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1425,6 +1425,24 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
return -EINVAL;
}
}
+ case AMDGPU_INFO_DOORBELL: {
+ struct drm_amdgpu_info_doorbell doorbell_info = {};
+ /* note: may need to check asic_type */
+ switch (info->query_hw_ip.type) {
+ case AMDGPU_HW_IP_VCN_ENC:
+ if (adev->agdb_bo) {
+ doorbell_info.agdb_enable = 1;
+ doorbell_info.agdb_offset = adev->vcn.agdb_offset;
+ }
+ doorbell_info.index_start = adev->doorbell_index.vcn.vcn_ring0_1 << 1;
+ doorbell_info.index_end = (adev->doorbell_index.vcn.vcn_ring6_7 << 1) + 1;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return copy_to_user(out, &doorbell_info,
+ min((size_t)size, sizeof(doorbell_info))) ? -EFAULT : 0;
+ }
default:
DRM_DEBUG_KMS("Invalid request %d\n", info->query);
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index 82624b44e661..f07920594295 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -368,6 +368,7 @@ struct amdgpu_vcn {
struct mutex workload_profile_mutex;
u32 reg_count;
const struct amdgpu_hwip_reg_entry *reg_list;
+ uint32_t agdb_offset;
};
struct amdgpu_fw_shared_rb_ptrs_struct {
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 062ae4741fd6..3ffdd2f8c418 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1276,6 +1276,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
#define AMDGPU_INFO_GPUVM_FAULT 0x23
/* query FW object size and alignment */
#define AMDGPU_INFO_UQ_FW_AREAS 0x24
+/* query doorbell info */
+#define AMDGPU_INFO_DOORBELL 0x25
#define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0
#define AMDGPU_INFO_MMR_SE_INDEX_MASK 0xff
@@ -1677,6 +1679,17 @@ struct drm_amdgpu_info_uq_metadata {
#define AMDGPU_FAMILY_GC_11_5_4 154 /* GC 11.5.4 */
#define AMDGPU_FAMILY_GC_12_0_0 152 /* GC 12.0.0 */
+/* for AMDGPU_INFO_DOORBELL query */
+struct drm_amdgpu_info_doorbell {
+ __u32 index_start;
+ /* could be equal to index_start */
+ __u32 index_end;
+ /* aggregated doorbell, 0 for disable */
+ __u32 agdb_enable;
+ /* if agdb_enable, it is a value in [index_start, index_end] */
+ __u32 agdb_offset;
+};
+
#if defined(__cplusplus)
}
#endif
--
2.49.0
* [PATCH v4 07/10] drm/amdgpu: Add AMDGPU_GEM_OP_OPEN_GLOBAL
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (4 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 06/10] drm/amdgpu: add AMDGPU_INFO_DOORBELL Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 08/10] drm/amdgpu/mes: route NORMAL aggregated doorbell through global agdb_bo Jesse Zhang
` (3 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx
Cc: Alexander.Deucher, Christian Koenig, Christian König,
Alex Deucher, Leo Liu, Ruijing Dong, David (Ming Qiang) Wu,
Srinivasan Shanmugam
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Instead of abusing the create IOCTL to open global BOs, add a new
AMDGPU_GEM_OP_OPEN_GLOBAL functionality.
The new AMDGPU_GEM_OP_OPEN_GLOBAL functionality expects an enum which
tells it which global BO to open and copies the information about the BO
to userspace similar to the AMDGPU_GEM_OP_GET_GEM_CREATE_INFO operation.
The advantage is that we don't start overloading the create IOCTL with
tons of special cases, and opening the global BOs no longer requires
knowing their exact size and parameters in userspace.
This keeps the GEM create path simpler and avoids exposing internal
allocation details to userspace.
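A rough userspace-side sketch of filling in such a request follows. The
struct is a simplified local mirror containing only the fields this
patch touches, not the real struct drm_amdgpu_gem_op, and the helper
name is hypothetical; the two constants are the values added to
amdgpu_drm.h earlier in this series. A real caller would include the
UAPI header and submit the request through the GEM op ioctl.

```c
#include <stdint.h>
#include <string.h>

/* Values as added to include/uapi/drm/amdgpu_drm.h in this series. */
#define AMDGPU_GEM_OP_OPEN_GLOBAL		3
#define AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL	1

/* Simplified local mirror; only the fields referenced by this patch. */
struct gem_op_req {
	uint32_t handle;	/* in: AMDGPU_GEM_GLOBAL_* ID, out: GEM handle */
	uint32_t op;
	uint64_t value;		/* user pointer for create_info, or 0 */
	uint32_t padding;	/* must be 0, otherwise the ioctl returns -EINVAL */
};

/* Hypothetical helper: prepare an OPEN_GLOBAL request. */
static void prepare_open_global(struct gem_op_req *req, uint32_t global_id,
				uint64_t info_ptr)
{
	memset(req, 0, sizeof(*req));
	req->handle = global_id;
	req->op = AMDGPU_GEM_OP_OPEN_GLOBAL;
	req->value = info_ptr;	/* 0 skips the create_info copy */
}
```

On success the kernel overwrites handle with a per-file GEM handle, so
the caller should read it back from the same structure.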
v2 (Srini):
- Centralize global BO ID to BO mapping into a helper.
- Return -EOPNOTSUPP if the requested global BO is not available.
- Allow args->value == 0 for AMDGPU_GEM_OP_OPEN_GLOBAL so callers that
only need a handle can skip the create_info copy_to_user().
- Clarify the input/output semantics of the handle field for OPEN_GLOBAL
in the UAPI documentation.
- Avoid potential NULL dereference for MMIO_REMAP on unsupported
hardware. (David)
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Cc: Ruijing Dong <ruijing.dong@amd.com>
Cc: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 44 +++++++++++++++++++++----
include/uapi/drm/amdgpu_drm.h | 8 ++++-
2 files changed, 45 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 0071d6957828..3b486f2bb2e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -999,25 +999,44 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
return r;
}
+static struct amdgpu_bo *
+amdgpu_get_global_bo(struct amdgpu_device *adev, u32 id)
+{
+ switch (id) {
+ case AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL:
+ return adev->agdb_bo;
+ default:
+ return NULL;
+ }
+}
+
int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp)
{
+ struct amdgpu_fpriv *fpriv = filp->driver_priv;
struct drm_amdgpu_gem_op *args = data;
struct drm_gem_object *gobj;
struct amdgpu_vm_bo_base *base;
struct amdgpu_bo *robj;
struct drm_exec exec;
- struct amdgpu_fpriv *fpriv = filp->driver_priv;
int r;
if (args->padding)
return -EINVAL;
- gobj = drm_gem_object_lookup(filp, args->handle);
- if (!gobj)
- return -ENOENT;
+ if (args->op == AMDGPU_GEM_OP_OPEN_GLOBAL) {
+ robj = amdgpu_get_global_bo(drm_to_adev(dev), args->handle);
+ if (!robj)
+ return -EOPNOTSUPP;
+ gobj = &robj->tbo.base;
+ drm_gem_object_get(gobj);
+ } else {
+ gobj = drm_gem_object_lookup(filp, args->handle);
+ if (!gobj)
+ return -ENOENT;
- robj = gem_to_amdgpu_bo(gobj);
+ robj = gem_to_amdgpu_bo(gobj);
+ }
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
DRM_EXEC_IGNORE_DUPLICATES, 0);
@@ -1036,17 +1055,27 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
}
switch (args->op) {
+ case AMDGPU_GEM_OP_OPEN_GLOBAL:
case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
struct drm_amdgpu_gem_create_in info;
- void __user *out = u64_to_user_ptr(args->value);
+ void __user *out = NULL;
info.bo_size = robj->tbo.base.size;
info.alignment = robj->tbo.page_alignment << PAGE_SHIFT;
info.domains = robj->preferred_domains;
info.domain_flags = robj->flags;
drm_exec_fini(&exec);
+ /*
+ * For OPEN_GLOBAL, allow args->value == 0 when the caller
+ * only wants a handle and does not need the create_info.
+ */
+ if (args->op == AMDGPU_GEM_OP_OPEN_GLOBAL && !args->value)
+ break;
+
+ out = u64_to_user_ptr(args->value);
if (copy_to_user(out, &info, sizeof(info)))
r = -EFAULT;
+
break;
}
case AMDGPU_GEM_OP_SET_PLACEMENT:
@@ -1130,6 +1159,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
r = -EINVAL;
}
+ if (!r && args->op == AMDGPU_GEM_OP_OPEN_GLOBAL)
+ r = drm_gem_handle_create(filp, gobj, &args->handle);
+
drm_gem_object_put(gobj);
return r;
out_exec:
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3ffdd2f8c418..1e04771ed3e5 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -827,7 +827,13 @@ struct drm_amdgpu_gem_vm_entry {
/* Sets or returns a value associated with a buffer. */
struct drm_amdgpu_gem_op {
- /** GEM object handle */
+ /**
+ * GEM object handle or AMDGPU_GEM_GLOBAL_*.
+ *
+ * For AMDGPU_GEM_OP_OPEN_GLOBAL:
+ * - input: handle = AMDGPU_GEM_GLOBAL_* ID
+ * - output: handle = per-file GEM handle to that global BO
+ */
__u32 handle;
/** AMDGPU_GEM_OP_* */
__u32 op;
--
2.49.0
* [PATCH v4 08/10] drm/amdgpu/mes: route NORMAL aggregated doorbell through global agdb_bo
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (5 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 07/10] drm/amdgpu: Add AMDGPU_GEM_OP_OPEN_GLOBAL Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 09/10] drm/amdgpu/userq: report SDMA UMQ doorbell info via AMDGPU_INFO_DOORBELL Jesse Zhang
` (2 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx
Cc: Alexander.Deucher, Christian Koenig, Jesse.zhang, Alex Deucher,
Jesse Zhang
From: "Jesse.zhang" <Jesse.zhang@amd.com>
SDMA UMQ submits cannot reliably wake the SDMA engine via the per-queue
doorbell because LSDMA programs the HQD-slot DOORBELL_OFFSET register
asynchronously after MES MAP_QUEUE — early rings hit a stale slot and
get silently dropped. Going through the MES aggregated doorbell works,
but today's slot lives inside the MES kernel doorbell page, which is
not user-mappable. This forces every SDMA UMQ submit through
amdgpu_userq_signal_ioctl just so the kernel can WDOORBELL32() the
agg slot — defeating the purpose of user mode queues.
Allocate the NORMAL-priority aggregated doorbell inside the global
adev->agdb_bo. MES is told via SET_HW_RESOURCES to monitor the new
location — code unchanged because it reads
mes->aggregated_doorbells[NORMAL] which now points into agdb_bo.
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 24 ++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 9 +++++++++
2 files changed, 33 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index bdf2561b5404..974dd7a3fc4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -28,6 +28,7 @@
#include "amdgpu.h"
#include "soc15_common.h"
#include "amdgpu_mes_ctx.h"
+#include "amdgpu_doorbell.h"
#define AMDGPU_MES_MAX_NUM_OF_QUEUES_PER_PROCESS 1024
#define AMDGPU_ONE_DOORBELL_SIZE 8
@@ -57,6 +58,29 @@ static int amdgpu_mes_doorbell_init(struct amdgpu_device *adev)
set_bit(i, mes->doorbell_bitmap);
}
+ /*
+ * Relocate the NORMAL-priority aggregated doorbell into the global
+ * aggregated-doorbell BO so userspace SDMA UMQs can mmap it (via
+ * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL) and ring it directly,
+ * bypassing signal_ioctl. GFX/COMPUTE NORMAL queues are unaffected
+ * functionally — they don't actively ring the agg doorbell; the HW
+ * CP_UNMAPPED_DOORBELL intercept wakes MES for them. SDMA has no
+ * such intercept and needs an explicit wake path. Other priorities
+ * stay in the MES kernel doorbell page.
+ */
+ if (adev->agdb_bo) {
+ int r = amdgpu_bo_reserve(adev->agdb_bo, true);
+
+ if (r)
+ return r;
+ adev->sdma.agdb_offset = AMDGPU_NAVI10_DOORBELL_sDMA_ENGINE0 << 1;
+ adev->mes.aggregated_doorbells[AMDGPU_MES_PRIORITY_LEVEL_NORMAL] =
+ amdgpu_doorbell_index_on_bar(adev, adev->agdb_bo,
+ adev->sdma.agdb_offset,
+ sizeof(u32));
+ amdgpu_bo_unreserve(adev->agdb_bo);
+ }
+
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
index 2bf365609775..11cefb4c3a27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
@@ -144,6 +144,15 @@ struct amdgpu_sdma {
struct list_head reset_callback_list;
bool no_user_submission;
bool disable_uq;
+ /*
+ * Dword offset within adev->agdb_bo for the SDMA UMQ aggregated
+ * doorbell. MES is told via SET_HW_RESOURCES to monitor this slot
+ * for NORMAL priority wakes. Userspace mmaps adev->agdb_bo (via
+ * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL) and writes WPTR here to
+ * wake SDMA UMQs without going through signal_ioctl. Reported via
+ * AMDGPU_INFO_DOORBELL with query_hw_ip.type = AMDGPU_HW_IP_DMA.
+ */
+ uint32_t agdb_offset;
void (*get_csa_info)(struct amdgpu_device *adev,
struct amdgpu_sdma_csa_info *csa_info);
};
--
2.49.0
* [PATCH v4 09/10] drm/amdgpu/userq: report SDMA UMQ doorbell info via AMDGPU_INFO_DOORBELL
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (6 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 08/10] drm/amdgpu/mes: route NORMAL aggregated doorbell through global agdb_bo Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-05-01 13:35 ` Alex Deucher
2026-04-30 16:03 ` [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit Jesse Zhang
2026-05-01 13:26 ` [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Alex Deucher
9 siblings, 1 reply; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian Koenig, Jesse.zhang, Jesse Zhang
From: "Jesse.zhang" <Jesse.zhang@amd.com>
Extend AMDGPU_INFO_DOORBELL with an AMDGPU_HW_IP_DMA case so userspace
can discover:
- The SDMA UMQ doorbell BAR range (index_start..index_end), used to
validate per-queue doorbell offsets.
- Whether an aggregated doorbell slot exists in adev->agdb_bo
(agdb_enable) and its dword offset within that BO (agdb_offset).
A user-mode driver opens the global aggregated-doorbell BO via
AMDGPU_GEM_OP_OPEN_GLOBAL with AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL,
mmaps it, and writes the queue's 32-bit WPTR at the byte offset
agdb_offset * 4 to wake MES — bypassing amdgpu_userq_signal_ioctl
entirely.
This is the user-visible half of moving the SDMA UMQ wake path out of
the kernel. The kernel-side wake is removed in a follow-up patch.
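The userspace steps described above can be sketched as follows. The
struct is a local mirror of struct drm_amdgpu_info_doorbell from the
earlier patch, and all three helper names are hypothetical. The mapping
pointer is assumed to come from mmap() of the global
aggregated-doorbell BO, which the sketch does not perform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_amdgpu_info_doorbell. */
struct doorbell_info {
	uint32_t index_start;
	uint32_t index_end;	/* inclusive, may equal index_start */
	uint32_t agdb_enable;	/* 0 means no aggregated doorbell */
	uint32_t agdb_offset;	/* dword offset within the BO */
};

/* Check a per-queue doorbell index against the reported range. */
static bool doorbell_index_valid(const struct doorbell_info *info,
				 uint32_t index)
{
	return index >= info->index_start && index <= info->index_end;
}

/* Byte offset of the aggregated doorbell slot: agdb_offset * 4. */
static size_t agdb_byte_offset(const struct doorbell_info *info)
{
	return (size_t)info->agdb_offset * sizeof(uint32_t);
}

/*
 * Write the 32-bit WPTR to the aggregated doorbell slot.
 * agdb_map is assumed to point at the start of the mapped BO.
 * Returns false if the aggregated doorbell is not enabled.
 */
static bool ring_agdb(volatile uint32_t *agdb_map,
		      const struct doorbell_info *info, uint32_t wptr)
{
	if (!info->agdb_enable)
		return false;
	agdb_map[info->agdb_offset] = wptr;
	return true;
}
```

Indexing the mapping as a dword array keeps the byte-offset
multiplication implicit; agdb_byte_offset() is only needed when the
caller works with a byte pointer instead.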
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a3beeff800bf..1c6368d25b7d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1437,6 +1437,17 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
doorbell_info.index_start = adev->doorbell_index.vcn.vcn_ring0_1 << 1;
doorbell_info.index_end = (adev->doorbell_index.vcn.vcn_ring6_7 << 1) + 1;
break;
+ case AMDGPU_HW_IP_DMA:
+ if (adev->agdb_bo) {
+ doorbell_info.agdb_enable = 1;
+ doorbell_info.agdb_offset = adev->sdma.agdb_offset;
+ }
+ doorbell_info.index_start =
+ adev->doorbell_index.sdma_engine[0] << 1;
+ doorbell_info.index_end =
+ (adev->doorbell_index.sdma_engine[adev->sdma.num_instances - 1]
+ << 1) + 1;
+ break;
default:
return -EINVAL;
}
--
2.49.0
* [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (7 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 09/10] drm/amdgpu/userq: report SDMA UMQ doorbell info via AMDGPU_INFO_DOORBELL Jesse Zhang
@ 2026-04-30 16:03 ` Jesse Zhang
2026-05-01 13:30 ` Alex Deucher
2026-05-01 13:26 ` [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Alex Deucher
9 siblings, 1 reply; 17+ messages in thread
From: Jesse Zhang @ 2026-04-30 16:03 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian Koenig, Jesse.zhang, Jesse Zhang
From: "Jesse.zhang" <Jesse.zhang@amd.com>
Pair the userspace aggregated-doorbell ring (added by the
AMDGPU_INFO_DOORBELL / AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
the previous patches) with a kernel-side
MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in amdgpu_userq_signal_ioctl
for SDMA UMQs.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29 +++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index a58342c2ac44..50e275b51c9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
/* drop the reference acquired in fence creation function */
dma_fence_put(fence);
+ /*
+ * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW intercept, so
+ * once MES gangs the queue out (after the first IB's PROTECTED_FENCE
+ * idles the queue), per-queue doorbell rings hit a mapped-out HW
+ * slot and are silently dropped — FENCE IRQ never fires.
+ *
+ * Userspace rings the priority's MES aggregated doorbell directly
+ * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
+ * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL). That alone, however, is
+ * not enough on current MES12 firmware — MES will not scan the
+ * priority's queue list unless its hasReadyQueues flag is set.
+ * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES then
+ * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
+ *
+ * This is a kernel-side companion to the userspace agg doorbell
+ * ring; remove once firmware learns to wake on bare aggregated
+ * doorbell.
+ */
+ if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
+ adev->enable_mes && adev->mes.funcs->misc_op) {
+ struct mes_misc_op_input op = { 0 };
+
+ op.op = MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
+ op.notify_work.priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+ amdgpu_mes_lock(&adev->mes);
+ (void)adev->mes.funcs->misc_op(&adev->mes, &op);
+ amdgpu_mes_unlock(&adev->mes);
+ }
+
exec_fini:
drm_exec_fini(&exec);
put_gobj_write:
--
2.49.0
* Re: [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
` (8 preceding siblings ...)
2026-04-30 16:03 ` [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit Jesse Zhang
@ 2026-05-01 13:26 ` Alex Deucher
9 siblings, 0 replies; 17+ messages in thread
From: Alex Deucher @ 2026-05-01 13:26 UTC (permalink / raw)
To: Jesse Zhang; +Cc: amd-gfx, Alexander.Deucher, Christian Koenig
On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <Jesse.Zhang@amd.com> wrote:
>
> From: "Jesse.zhang" <Jesse.zhang@amd.com>
>
> Kernel-side abstraction work for the SDMA usermode-queue plumbing
> landed in subsequent per-engine patches:
>
> - mes_add_queue_input gains is_user_mode_submission and
> unmap_flag_addr. Without is_user_mode_submission MES treats SDMA
> queues as kernel-managed and uses the end-of-MQD slot for the unmap
> flag, so PROTECTED_FENCE at the tail of every SDMA IB looks like a
> "queue done" signal and MES gangs the queue out forever.
>
> - mes_misc_opcode gains MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE
> with a notify_work.priority_level payload. This wakes a gangs-out
> SDMA UMQ so subsequent IBs get re-mapped (SDMA has no
> CP_UNMAPPED_DOORBELL HW intercept).
>
> Also surface the matching firmware bits in mes_v12_api_def.h:
> is_user_mode_submission / enable_perf_profiling /
> exclude_process_limit / is_video_blit_queue bitfields in
> MESAPI__ADD_QUEUE, and the unmap_flag_addr packet field.
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 7 +++++++
> drivers/gpu/drm/amd/include/mes_v12_api_def.h | 12 +++++++++++-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index cafc5caae822..705056de94b0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -265,6 +265,8 @@ struct mes_add_queue_input {
> uint32_t exclusively_scheduled;
> uint32_t sh_mem_config_data;
> uint32_t vm_cntx_cntl;
> + uint32_t is_user_mode_submission;
> + uint64_t unmap_flag_addr;
> };
>
> struct mes_remove_queue_input {
> @@ -343,6 +345,7 @@ enum mes_misc_opcode {
> MES_MISC_OP_WRM_REG_WR_WAIT,
> MES_MISC_OP_SET_SHADER_DEBUGGER,
> MES_MISC_OP_CHANGE_CONFIG,
> + MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE,
> };
>
> struct mes_misc_op_input {
> @@ -397,6 +400,10 @@ struct mes_misc_op_input {
> uint32_t tdr_delay;
> } tdr_config;
> } change_config;
> +
> + struct {
> + uint32_t priority_level;
> + } notify_work;
> };
> };
>
> diff --git a/drivers/gpu/drm/amd/include/mes_v12_api_def.h b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> index e541a43714a1..cd6e60184a06 100644
> --- a/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> +++ b/drivers/gpu/drm/amd/include/mes_v12_api_def.h
> @@ -381,7 +381,11 @@ union MESAPI__ADD_QUEUE {
> uint32_t exclusively_scheduled : 1;
> uint32_t is_long_running : 1;
> uint32_t is_dwm_queue : 1;
> - uint32_t reserved : 15;
> + uint32_t is_video_blit_queue : 1;
> + uint32_t is_user_mode_submission : 1;
> + uint32_t enable_perf_profiling : 1;
> + uint32_t exclude_process_limit : 1;
> + uint32_t reserved : 11;
> };
> struct MES_API_STATUS api_status;
> uint64_t tma_addr;
> @@ -393,6 +397,12 @@ union MESAPI__ADD_QUEUE {
> uint32_t queue_id;
> uint32_t alignment_mode_setting;
> uint32_t full_sh_mem_config_data;
> + /*
> + * MC addr where MES writes 1 when it unmaps the queue. Used
> + * by user-mode SDMA UMQs so the kernel/userspace can detect
> + * the unmapped state and re-arm work via NOTIFY_WORK_ON_UNMAPPED_QUEUE.
> + */
> + uint64_t unmap_flag_addr;
> };
>
> uint32_t max_dwords_in_api[API_FRAME_SIZE_IN_DWORDS];
> --
> 2.49.0
>
* Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit
2026-04-30 16:03 ` [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit Jesse Zhang
@ 2026-05-01 13:30 ` Alex Deucher
2026-05-04 9:01 ` Christian König
0 siblings, 1 reply; 17+ messages in thread
From: Alex Deucher @ 2026-05-01 13:30 UTC (permalink / raw)
To: Jesse Zhang; +Cc: amd-gfx, Alexander.Deucher, Christian Koenig
On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <Jesse.Zhang@amd.com> wrote:
>
> From: "Jesse.zhang" <Jesse.zhang@amd.com>
>
> Pair the userspace aggregated-doorbell ring (added by the
> AMDGPU_INFO_DOORBELL / AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
> the previous patches) with a kernel-side
> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in amdgpu_userq_signal_ioctl
> for SDMA UMQs.
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
How will this work if the user doesn't use this IOCTL? Protected
fences are optional. An application can create a user queue and never
use a protected fence. Why don't KFD SDMA queues need this special
treatment?
Alex
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29 +++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index a58342c2ac44..50e275b51c9e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> /* drop the reference acquired in fence creation function */
> dma_fence_put(fence);
>
> + /*
> + * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW intercept, so
> + * once MES gangs the queue out (after the first IB's PROTECTED_FENCE
> + * idles the queue), per-queue doorbell rings hit a mapped-out HW
> + * slot and are silently dropped — FENCE IRQ never fires.
> + *
> + * Userspace rings the priority's MES aggregated doorbell directly
> + * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
> + * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL). That alone, however, is
> + * not enough on current MES12 firmware — MES will not scan the
> + * priority's queue list unless its hasReadyQueues flag is set.
> + * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES then
> + * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
> + *
> + * This is a kernel-side companion to the userspace agg doorbell
> + * ring; remove once firmware learns to wake on bare aggregated
> + * doorbell.
> + */
> + if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
> + adev->enable_mes && adev->mes.funcs->misc_op) {
> + struct mes_misc_op_input op = { 0 };
> +
> + op.op = MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
> + op.notify_work.priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> + amdgpu_mes_lock(&adev->mes);
> + (void)adev->mes.funcs->misc_op(&adev->mes, &op);
> + amdgpu_mes_unlock(&adev->mes);
> + }
> +
> exec_fini:
> drm_exec_fini(&exec);
> put_gobj_write:
> --
> 2.49.0
>
* Re: [PATCH v4 09/10] drm/amdgpu/userq: report SDMA UMQ doorbell info via AMDGPU_INFO_DOORBELL
2026-04-30 16:03 ` [PATCH v4 09/10] drm/amdgpu/userq: report SDMA UMQ doorbell info via AMDGPU_INFO_DOORBELL Jesse Zhang
@ 2026-05-01 13:35 ` Alex Deucher
0 siblings, 0 replies; 17+ messages in thread
From: Alex Deucher @ 2026-05-01 13:35 UTC (permalink / raw)
To: Jesse Zhang; +Cc: amd-gfx, Alexander.Deucher, Christian Koenig
On Thu, Apr 30, 2026 at 12:19 PM Jesse Zhang <Jesse.Zhang@amd.com> wrote:
>
> From: "Jesse.zhang" <Jesse.zhang@amd.com>
>
> Extend AMDGPU_INFO_DOORBELL with an AMDGPU_HW_IP_DMA case so userspace
> can discover:
>
> - The SDMA UMQ doorbell BAR range (index_start..index_end), used to
> validate per-queue doorbell offsets.
> - Whether an aggregated doorbell slot exists in adev->agdb_bo
> (agdb_enable) and its dword offset within that BO (agdb_offset).
>
> A user-mode driver opens the global aggregated-doorbell BO via
> AMDGPU_GEM_OP_OPEN_GLOBAL with AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL,
> mmaps it, and writes the queue's 32-bit WPTR at the byte offset
> agdb_offset * 4 to wake MES — bypassing amdgpu_userq_signal_ioctl
> entirely.
>
> This is the user-visible half of moving the SDMA UMQ wake path out of
> the kernel. The kernel-side wake is removed in a follow-up patch.
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index a3beeff800bf..1c6368d25b7d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1437,6 +1437,17 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> doorbell_info.index_start = adev->doorbell_index.vcn.vcn_ring0_1 << 1;
> doorbell_info.index_end = (adev->doorbell_index.vcn.vcn_ring6_7 << 1) + 1;
> break;
> + case AMDGPU_HW_IP_DMA:
> + if (adev->agdb_bo) {
I think this IOCTL should return an error if adev->agdb_bo is NULL. We
probably also need a per-IP flag to determine if the aggregated
doorbell is required or not. That way we can easily support other IP
versions that may not require the aggregated doorbell.
Can you also add a patch to add doorbell info support for compute and
gfx for consistency?
Thanks,
Alex
> + doorbell_info.agdb_enable = 1;
> + doorbell_info.agdb_offset = adev->sdma.agdb_offset;
> + }
> + doorbell_info.index_start =
> + adev->doorbell_index.sdma_engine[0] << 1;
> + doorbell_info.index_end =
> + (adev->doorbell_index.sdma_engine[adev->sdma.num_instances - 1]
> + << 1) + 1;
> + break;
> default:
> return -EINVAL;
> }
> --
> 2.49.0
>
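
One possible reading of the review comment above, as a userspace model: fail the query when the IP needs the aggregated doorbell but the BO is missing, and let a per-IP flag decide whether it is needed at all. Everything here is an assumption made for illustration: `struct ip_doorbell_desc`, its `agdb_required` flag, `query_doorbell_info()` and the choice of `-EINVAL` do not come from the series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the fields of struct drm_amdgpu_info_doorbell from patch 06. */
struct doorbell_info {
	uint32_t index_start;
	uint32_t index_end;
	uint32_t agdb_enable;
	uint32_t agdb_offset;
};

/* Hypothetical per-IP description; not a structure from the series. */
struct ip_doorbell_desc {
	bool agdb_required;	/* hypothetical "needs aggregated doorbell" flag */
	uint32_t agdb_offset;
	uint32_t index_start;
	uint32_t index_end;
};

/*
 * agdb_bo stands in for adev->agdb_bo. The error code is an
 * assumption; the review only asks for "an error".
 */
static int query_doorbell_info(const void *agdb_bo,
			       const struct ip_doorbell_desc *ip,
			       struct doorbell_info *out)
{
	if (ip->agdb_required && !agdb_bo)
		return -EINVAL;

	out->index_start = ip->index_start;
	out->index_end = ip->index_end;
	out->agdb_enable = ip->agdb_required ? 1 : 0;
	out->agdb_offset = ip->agdb_required ? ip->agdb_offset : 0;
	return 0;
}
```

With such a flag, an IP version that does not need the aggregated doorbell would still get its doorbell range back even when no BO was allocated.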
* Re: [PATCH v4 06/10] drm/amdgpu: add AMDGPU_INFO_DOORBELL
2026-04-30 16:03 ` [PATCH v4 06/10] drm/amdgpu: add AMDGPU_INFO_DOORBELL Jesse Zhang
@ 2026-05-01 13:37 ` Alex Deucher
0 siblings, 0 replies; 17+ messages in thread
From: Alex Deucher @ 2026-05-01 13:37 UTC (permalink / raw)
To: Jesse Zhang
Cc: amd-gfx, Alexander.Deucher, Christian Koenig,
David (Ming Qiang) Wu
On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <Jesse.Zhang@amd.com> wrote:
>
> From: "David (Ming Qiang) Wu" <David.Wu3@amd.com>
>
> Use it to get the doorbell range and aggregated doorbell enablement
> and offset. This patch only supports VCN for now.
>
> V2 - drop VPE and use vcn.agdb_offset saved in
> umsch_mm_agdb_index_init() (suggested by Alex)
>
> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 18 ++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
> include/uapi/drm/amdgpu_drm.h | 13 +++++++++++++
> 3 files changed, 32 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index d88e4994c8c1..a3beeff800bf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1425,6 +1425,24 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> return -EINVAL;
> }
> }
> + case AMDGPU_INFO_DOORBELL: {
> + struct drm_amdgpu_info_doorbell doorbell_info = {};
> + /* note: may need to check asic_type */
> + switch (info->query_hw_ip.type) {
> + case AMDGPU_HW_IP_VCN_ENC:
> + if (adev->agdb_bo) {
I think this IOCTL should return an error if adev->agdb_bo is NULL. We
probably also need a per-IP flag to determine if the aggregated
doorbell is required or not. That way we can easily support other IP
versions that may not require the aggregated doorbell.
> + doorbell_info.agdb_enable = 1;
> + doorbell_info.agdb_offset = adev->vcn.agdb_offset;
> + }
> + doorbell_info.index_start = adev->doorbell_index.vcn.vcn_ring0_1 << 1;
> + doorbell_info.index_end = (adev->doorbell_index.vcn.vcn_ring6_7 << 1) + 1;
> + break;
> + default:
> + return -EINVAL;
> + }
> + return copy_to_user(out, &doorbell_info,
> + min((size_t)size, sizeof(doorbell_info))) ? -EFAULT : 0;
> + }
> default:
> DRM_DEBUG_KMS("Invalid request %d\n", info->query);
> return -EINVAL;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> index 82624b44e661..f07920594295 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> @@ -368,6 +368,7 @@ struct amdgpu_vcn {
> struct mutex workload_profile_mutex;
> u32 reg_count;
> const struct amdgpu_hwip_reg_entry *reg_list;
> + uint32_t agdb_offset;
> };
>
> struct amdgpu_fw_shared_rb_ptrs_struct {
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 062ae4741fd6..3ffdd2f8c418 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
I'd split this part out as a separate patch.
Alex
> @@ -1276,6 +1276,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
> #define AMDGPU_INFO_GPUVM_FAULT 0x23
> /* query FW object size and alignment */
> #define AMDGPU_INFO_UQ_FW_AREAS 0x24
> +/* query doorbell info */
> +#define AMDGPU_INFO_DOORBELL 0x25
>
> #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0
> #define AMDGPU_INFO_MMR_SE_INDEX_MASK 0xff
> @@ -1677,6 +1679,17 @@ struct drm_amdgpu_info_uq_metadata {
> #define AMDGPU_FAMILY_GC_11_5_4 154 /* GC 11.5.4 */
> #define AMDGPU_FAMILY_GC_12_0_0 152 /* GC 12.0.0 */
>
> +/* for AMDGPU_INFO_DOORBELL query */
> +struct drm_amdgpu_info_doorbell {
> + __u32 index_start;
> + /* could be equal to index_start */
> + __u32 index_end;
> + /* aggregated doorbell, 0 for disable */
> + __u32 agdb_enable;
> + /* if agdb_enable, it is a value in [index_start, index_end] */
> + __u32 agdb_offset;
> +};
> +
> #if defined(__cplusplus)
> }
> #endif
> --
> 2.49.0
>
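
The copy at the end of the AMDGPU_INFO_DOORBELL case is bounded by the smaller of the caller's buffer size and the structure size, so a caller with a short buffer receives only the leading fields. A small userspace model of that behaviour follows; `copy_info()` is a hypothetical stand-in for the `copy_to_user()` call and returns the number of bytes copied instead of an error code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct drm_amdgpu_info_doorbell from the quoted diff. */
struct doorbell_info {
	uint32_t index_start;
	uint32_t index_end;
	uint32_t agdb_enable;
	uint32_t agdb_offset;
};

/*
 * Copy at most out_size bytes and never more than the structure,
 * modelling min((size_t)size, sizeof(doorbell_info)) in the patch.
 */
static size_t copy_info(void *out, size_t out_size,
			const struct doorbell_info *info)
{
	size_t n = out_size < sizeof(*info) ? out_size : sizeof(*info);

	memcpy(out, info, n);
	return n;
}
```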
* Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit
2026-05-01 13:30 ` Alex Deucher
@ 2026-05-04 9:01 ` Christian König
2026-05-06 6:08 ` Zhang, Jesse(Jie)
0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2026-05-04 9:01 UTC (permalink / raw)
To: Alex Deucher, Jesse Zhang; +Cc: amd-gfx, Alexander.Deucher
On 5/1/26 15:30, Alex Deucher wrote:
> On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <Jesse.Zhang@amd.com> wrote:
>>
>> From: "Jesse.zhang" <Jesse.zhang@amd.com>
>>
>> Pair the userspace aggregated-doorbell ring (added by the
>> AMDGPU_INFO_DOORBELL / AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
>> the previous patches) with a kernel-side
>> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in amdgpu_userq_signal_ioctl
>> for SDMA UMQs.
>>
>> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
>
> How will this work if the user doesn't use this IOCTL? protected
> fences are optional. An application can create a user queue and never
> use a protected fence. Why don't KFD SDMA queues need this special
> treatment?
Yeah, agreed, that whole approach doesn't work.
What we could do is something similar to the MM queues, where userspace needs to signal both a per-queue doorbell and an aggregated one for the queue type.
Regards,
Christian.
>
> Alex
>
>> ---
>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29 +++++++++++++++++++
>> 1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> index a58342c2ac44..50e275b51c9e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> @@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>> /* drop the reference acquired in fence creation function */
>> dma_fence_put(fence);
>>
>> + /*
>> + * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW intercept, so
>> + * once MES gangs the queue out (after the first IB's PROTECTED_FENCE
>> + * idles the queue), per-queue doorbell rings hit a mapped-out HW
>> + * slot and are silently dropped — FENCE IRQ never fires.
>> + *
>> + * Userspace rings the priority's MES aggregated doorbell directly
>> + * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
>> + * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL). That alone, however, is
>> + * not enough on current MES12 firmware — MES will not scan the
>> + * priority's queue list unless its hasReadyQueues flag is set.
>> + * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES then
>> + * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
>> + *
>> + * This is a kernel-side companion to the userspace agg doorbell
>> + * ring; remove once firmware learns to wake on bare aggregated
>> + * doorbell.
>> + */
>> + if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
>> + adev->enable_mes && adev->mes.funcs->misc_op) {
>> + struct mes_misc_op_input op = { 0 };
>> +
>> + op.op = MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
>> + op.notify_work.priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>> + amdgpu_mes_lock(&adev->mes);
>> + (void)adev->mes.funcs->misc_op(&adev->mes, &op);
>> + amdgpu_mes_unlock(&adev->mes);
>> + }
>> +
>> exec_fini:
>> drm_exec_fini(&exec);
>> put_gobj_write:
>> --
>> 2.49.0
>>
* RE: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit
2026-05-04 9:01 ` Christian König
@ 2026-05-06 6:08 ` Zhang, Jesse(Jie)
2026-05-06 7:21 ` Christian König
0 siblings, 1 reply; 17+ messages in thread
From: Zhang, Jesse(Jie) @ 2026-05-06 6:08 UTC (permalink / raw)
To: Koenig, Christian, Alex Deucher
Cc: amd-gfx@lists.freedesktop.org, Deucher, Alexander
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, May 4, 2026 5:01 PM
> To: Alex Deucher <alexdeucher@gmail.com>; Zhang, Jesse(Jie)
> <Jesse.Zhang@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA
> UMQ submit
>
> On 5/1/26 15:30, Alex Deucher wrote:
> > On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <Jesse.Zhang@amd.com>
> wrote:
> >>
> >> From: "Jesse.zhang" <Jesse.zhang@amd.com>
> >>
> >> Pair the userspace aggregated-doorbell ring (added by the
> >> AMDGPU_INFO_DOORBELL /
> AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
> >> the previous patches) with a kernel-side
> >> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in
> >> amdgpu_userq_signal_ioctl for SDMA UMQs.
> >>
> >> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
> >
> > How will this work if the user doesn't use this IOCTL? protected
> > fences are optional. An application can create a user queue and never
> > use a protected fence. Why don't KFD SDMA queues need this special
> > treatment?
>
> Yeah, agreed, that whole approach doesn't work.
>
> What we could do is something similar to the MM queues, where userspace needs to
> signal both a per-queue doorbell and an aggregated one for the queue type.
>
> Regards,
> Christian.
Hi Christian, Alex,
Agreed, and will drop this patch.
The MM-style userspace ABI is already in place: David's agdb_bo
(AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL + GEM_OP_OPEN_GLOBAL) plus patch 9
(AMDGPU_INFO_DOORBELL reports the SDMA agdb slot). IGT rings per-queue +
aggregated on every submit.
The remaining gap: on MES12, a bare agg_db ring does NOT wake an
unmapped SDMA UMQ. MES needs hasReadyQueues set, which today only
NOTIFY_WORK_ON_UNMAPPED_QUEUE flips. This is by design, not Linux-only:
the Windows UMQ path uses the same contract, in which MES writes 1 to
*unmap_flag_addr on preempt and the UMD checks the flag and calls NOTIFY
before ringing doorbells on the next submit.
Next version v5 (matches Windows):
- Drop patch 10.
- Keep David's ABI + INFO_DOORBELL.
- Add a small standalone NOTIFY ioctl (e.g. AMDGPU_USERQ_OP_NOTIFY_WORK)
so UMQ apps call it on demand.
Is it the right direction?
Attached test results (current v4, with the to-be-dropped signal_ioctl NOTIFY):
HW / fw : gfx12
Test : IGT amd_userq_sdma stress, 100 iters
Result : 100/100 PASS
So the agg_db + NOTIFY mechanism works on hardware.
Thanks,
Jesse
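
The submit-time contract described in this message (check the unmap flag, NOTIFY if it is set, then ring both doorbells) can be modelled roughly as below. This only illustrates the proposal, which is rejected in the next reply. `struct umq`, `umq_submit()` and the `notify` callback are hypothetical names; the callback stands in for whatever NOTIFY entry point would exist, and the sketch deliberately does not model who clears the flag, since the thread does not say.

```c
#include <stdint.h>

/* Hypothetical userspace view of one SDMA user-mode queue. */
struct umq {
	volatile uint32_t *unmap_flag;	/* MES writes 1 here when it unmaps the queue */
	volatile uint32_t *queue_db;	/* per-queue doorbell */
	volatile uint32_t *agg_db;	/* aggregated doorbell slot */
	int (*notify)(void *ctx);	/* stand-in for the proposed NOTIFY call */
	void *ctx;
};

/* Test double: counts how often NOTIFY would have been issued. */
static int count_notify(void *ctx)
{
	++*(int *)ctx;
	return 0;
}

/*
 * Submit path per the description above: NOTIFY only when the queue is
 * flagged as unmapped, then ring per-queue and aggregated doorbells.
 */
static int umq_submit(struct umq *q, uint32_t wptr)
{
	if (*q->unmap_flag) {
		int ret = q->notify(q->ctx);

		if (ret)
			return ret;
	}
	*q->queue_db = wptr;
	*q->agg_db = wptr;
	return 0;
}
```

In this model the NOTIFY cost is paid only on submits that find the flag set, which is the "on demand" behaviour the message describes.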
>
> >
> > Alex
> >
> >> ---
> >> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29
> +++++++++++++++++++
> >> 1 file changed, 29 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> index a58342c2ac44..50e275b51c9e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> @@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device
> *dev, void *data,
> >> /* drop the reference acquired in fence creation function */
> >> dma_fence_put(fence);
> >>
> >> + /*
> >> + * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW
> intercept, so
> >> + * once MES gangs the queue out (after the first IB's
> PROTECTED_FENCE
> >> + * idles the queue), per-queue doorbell rings hit a mapped-out HW
> >> + * slot and are silently dropped — FENCE IRQ never fires.
> >> + *
> >> + * Userspace rings the priority's MES aggregated doorbell directly
> >> + * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
> >> + * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL). That alone,
> however, is
> >> + * not enough on current MES12 firmware — MES will not scan the
> >> + * priority's queue list unless its hasReadyQueues flag is set.
> >> + * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES
> then
> >> + * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
> >> + *
> >> + * This is a kernel-side companion to the userspace agg doorbell
> >> + * ring; remove once firmware learns to wake on bare aggregated
> >> + * doorbell.
> >> + */
> >> + if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
> >> + adev->enable_mes && adev->mes.funcs->misc_op) {
> >> + struct mes_misc_op_input op = { 0 };
> >> +
> >> + op.op =
> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
> >> + op.notify_work.priority_level =
> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> >> + amdgpu_mes_lock(&adev->mes);
> >> + (void)adev->mes.funcs->misc_op(&adev->mes, &op);
> >> + amdgpu_mes_unlock(&adev->mes);
> >> + }
> >> +
> >> exec_fini:
> >> drm_exec_fini(&exec);
> >> put_gobj_write:
> >> --
> >> 2.49.0
> >>
* Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit
2026-05-06 6:08 ` Zhang, Jesse(Jie)
@ 2026-05-06 7:21 ` Christian König
0 siblings, 0 replies; 17+ messages in thread
From: Christian König @ 2026-05-06 7:21 UTC (permalink / raw)
To: Zhang, Jesse(Jie), Alex Deucher
Cc: amd-gfx@lists.freedesktop.org, Deucher, Alexander
On 5/6/26 08:08, Zhang, Jesse(Jie) wrote:
> AMD General
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Monday, May 4, 2026 5:01 PM
>> To: Alex Deucher <alexdeucher@gmail.com>; Zhang, Jesse(Jie)
>> <Jesse.Zhang@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>> <Alexander.Deucher@amd.com>
>> Subject: Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA
>> UMQ submit
>>
>> On 5/1/26 15:30, Alex Deucher wrote:
>>> On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <Jesse.Zhang@amd.com>
>> wrote:
>>>>
>>>> From: "Jesse.zhang" <Jesse.zhang@amd.com>
>>>>
>>>> Pair the userspace aggregated-doorbell ring (added by the
>>>> AMDGPU_INFO_DOORBELL /
>> AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
>>>> the previous patches) with a kernel-side
>>>> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in
>>>> amdgpu_userq_signal_ioctl for SDMA UMQs.
>>>>
>>>> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
>>>
>>> How will this work if the user doesn't use this IOCTL? protected
>>> fences are optional. An application can create a user queue and never
>>> use a protected fence. Why don't KFD SDMA queues need this special
>>> treatment?
>>
>> Yeah, agreed, that whole approach doesn't work.
>>
>> What we could do is something similar to the MM queues, where userspace needs to
>> signal both a per-queue doorbell and an aggregated one for the queue type.
>>
>> Regards,
>> Christian.
> Hi Christian, Alex,
>
> Agreed, and will drop this patch.
>
> The MM-style userspace ABI is already in place: David's agdb_bo
> (AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL + GEM_OP_OPEN_GLOBAL) plus patch 9
> (AMDGPU_INFO_DOORBELL reports the SDMA agdb slot). IGT rings per-queue +
> aggregated on every submit.
>
> The remaining gap: on MES12, a bare agg_db ring does NOT wake an
> unmapped SDMA UMQ. MES needs hasReadyQueues set, which today only
> NOTIFY_WORK_ON_UNMAPPED_QUEUE flips. This is by design, not Linux-only:
> the Windows UMQ path uses the same contract, in which MES writes 1 to
> *unmap_flag_addr on preempt and the UMD checks the flag and calls NOTIFY
> before ringing doorbells on the next submit.
>
> Next version v5 (matches Windows):
>
> - Drop patch 10.
> - Keep David's ABI + INFO_DOORBELL.
> - Add a small standalone NOTIFY ioctl (e.g. AMDGPU_USERQ_OP_NOTIFY_WORK)
Completely NAK to that approach. This not only results in problems with GFX userqueues but also completely breaks ROCm.
It looks like we need to go back to the drawing board with the FW team and avoid such workarounds.
Regards,
Christian.
> so UMQ apps call it on demand.
>
> Is it the right direction?
>
> Attached test results (current v4, with the to-be-dropped signal_ioctl NOTIFY):
>
> HW / fw : gfx12
> Test : IGT amd_userq_sdma stress, 100 iters
> Result : 100/100 PASS
>
> So the agg_db + NOTIFY mechanism works on hardware.
>
> Thanks,
> Jesse
>>
>>>
>>> Alex
>>>
>>>> ---
>>>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29
>> +++++++++++++++++++
>>>> 1 file changed, 29 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> index a58342c2ac44..50e275b51c9e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> @@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device
>> *dev, void *data,
>>>> /* drop the reference acquired in fence creation function */
>>>> dma_fence_put(fence);
>>>>
>>>> + /*
>>>> + * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW
>> intercept, so
>>>> + * once MES gangs the queue out (after the first IB's
>> PROTECTED_FENCE
>>>> + * idles the queue), per-queue doorbell rings hit a mapped-out HW
>>>> + * slot and are silently dropped — FENCE IRQ never fires.
>>>> + *
>>>> + * Userspace rings the priority's MES aggregated doorbell directly
>>>> + * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
>>>> + * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL). That alone,
>> however, is
>>>> + * not enough on current MES12 firmware — MES will not scan the
>>>> + * priority's queue list unless its hasReadyQueues flag is set.
>>>> + * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES
>> then
>>>> + * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
>>>> + *
>>>> + * This is a kernel-side companion to the userspace agg doorbell
>>>> + * ring; remove once firmware learns to wake on bare aggregated
>>>> + * doorbell.
>>>> + */
>>>> + if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
>>>> + adev->enable_mes && adev->mes.funcs->misc_op) {
>>>> + struct mes_misc_op_input op = { 0 };
>>>> +
>>>> + op.op =
>> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
>>>> + op.notify_work.priority_level =
>> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>> + amdgpu_mes_lock(&adev->mes);
>>>> + (void)adev->mes.funcs->misc_op(&adev->mes, &op);
>>>> + amdgpu_mes_unlock(&adev->mes);
>>>> + }
>>>> +
>>>> exec_fini:
>>>> drm_exec_fini(&exec);
>>>> put_gobj_write:
>>>> --
>>>> 2.49.0
>>>>
>
end of thread, other threads:[~2026-05-06 7:22 UTC | newest]
Thread overview: 17+ messages
2026-04-30 16:03 [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 02/10] drm/amdgpu/mes11: plumb unmap_flag_addr + NOTIFY_WORK_ON_UNMAPPED_QUEUE Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 03/10] drm/amdgpu/mes12: plumb is_user_mode_submission, unmap_flag_addr, NOTIFY Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 04/10] drm/amdgpu/mes_userqueue: mark SDMA UMQs as user-mode submission Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 05/10] amdgpu: add global aggregated doorbell bo Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 06/10] drm/amdgpu: add AMDGPU_INFO_DOORBELL Jesse Zhang
2026-05-01 13:37 ` Alex Deucher
2026-04-30 16:03 ` [PATCH v4 07/10] drm/amdgpu: Add AMDGPU_GEM_OP_OPEN_GLOBAL Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 08/10] drm/amdgpu/mes: route NORMAL aggregated doorbell through global agdb_bo Jesse Zhang
2026-04-30 16:03 ` [PATCH v4 09/10] drm/amdgpu/userq: report SDMA UMQ doorbell info via AMDGPU_INFO_DOORBELL Jesse Zhang
2026-05-01 13:35 ` Alex Deucher
2026-04-30 16:03 ` [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA UMQ submit Jesse Zhang
2026-05-01 13:30 ` Alex Deucher
2026-05-04 9:01 ` Christian König
2026-05-06 6:08 ` Zhang, Jesse(Jie)
2026-05-06 7:21 ` Christian König
2026-05-01 13:26 ` [PATCH v4 01/10] drm/amdgpu/mes: add NOTIFY_WORK_ON_UNMAPPED_QUEUE op + ADD_QUEUE fields Alex Deucher