* [PATCH 01/42] drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 02/42] drm/amdgpu: fix error handling in ib_schedule() Alex Deucher
` (41 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
The per-queue reset flag is only set when SR-IOV is
disabled, so this check is unnecessary: the function
will never be called under SR-IOV.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
index aae7328973d18..50ed7fb0e941c 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
@@ -1145,9 +1145,6 @@ static int jpeg_v4_0_3_ring_reset(struct amdgpu_ring *ring,
unsigned int vmid,
struct amdgpu_fence *timedout_fence)
{
- if (amdgpu_sriov_vf(ring->adev))
- return -EOPNOTSUPP;
-
amdgpu_ring_reset_helper_begin(ring, timedout_fence);
jpeg_v4_0_3_core_stall_reset(ring);
jpeg_v4_0_3_start_jrbc(ring);
--
2.52.0
* [PATCH 02/42] drm/amdgpu: fix error handling in ib_schedule()
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
If fence emission fails, free the fence instead of leaking it
on the error path.
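The fix replaces an early return with a jump to a cleanup label. A
minimal, self-contained sketch of that idiom, using stub types and
illustrative names rather than the real amdgpu structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Stub fence type; stand-in for the driver's fence object. */
struct fence { int seqno; };

/* Stand-in for fence emission; fails on demand. */
static int emit_fence(struct fence *af, int should_fail)
{
	(void)af;
	return should_fail ? -22 : 0;	/* -EINVAL-style error */
}

static int schedule_ib(int should_fail, struct fence **f_out)
{
	struct fence *af = calloc(1, sizeof(*af));
	int r;

	if (!af)
		return -12;		/* -ENOMEM-style error */

	r = emit_fence(af, should_fail);
	if (r)
		goto free_fence;	/* was "return r;", which leaked af */

	*f_out = af;			/* success: caller owns the fence */
	return 0;

free_fence:
	free(af);
	return r;
}
```

On the failure path the allocation is now released before returning;
the success path still transfers ownership to the caller.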
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 586a58facca10..72ec455fa932c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -302,7 +302,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
if (job && job->vmid)
amdgpu_vmid_reset(adev, ring->vm_hub, job->vmid);
amdgpu_ring_undo(ring);
- return r;
+ goto free_fence;
}
*f = &af->base;
/* get a ref for the job */
--
2.52.0
* [PATCH 03/42] drm/amdgpu: add new job ids
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Use these for the gfx, sdma, and vpe IB tests and for kernel
shaders. The end goal is to get rid of direct IB submission
without a job structure.
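For reference, the new IDs continue the descending numbering from the
top of the u64 range (the existing IDs in the header end at ...602ULL).
The values below are the four added by this patch; the relation to
U64_MAX is an observation about the numbering pattern, not something
the patch itself states:

```c
#include <assert.h>
#include <stdint.h>

/* The four job IDs added by this patch, copied verbatim from the
 * header; each is a distinct value just below UINT64_MAX. */
#define AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST  (18446744073709551601ULL)
#define AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST (18446744073709551600ULL)
#define AMDGPU_KERNEL_JOB_ID_VPE_RING_TEST  (18446744073709551599ULL)
#define AMDGPU_KERNEL_JOB_ID_RUN_SHADER     (18446744073709551598ULL)
```

Keeping kernel-internal IDs at the top of the u64 range presumably
keeps them distinct from ordinary job numbering that starts low.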
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 7abf069d17d42..56a88e14a0448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -59,6 +59,10 @@ enum amdgpu_ib_pool_type;
#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB (18446744073709551604ULL)
#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP (18446744073709551603ULL)
#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST (18446744073709551602ULL)
+#define AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST (18446744073709551601ULL)
+#define AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST (18446744073709551600ULL)
+#define AMDGPU_KERNEL_JOB_ID_VPE_RING_TEST (18446744073709551599ULL)
+#define AMDGPU_KERNEL_JOB_ID_RUN_SHADER (18446744073709551598ULL)
struct amdgpu_job {
struct drm_sched_job base;
--
2.52.0
* [PATCH 04/42] drm/amdgpu/vpe: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
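The conversion follows the same ownership pattern across this series:
the job owns its IB, a successful direct submit hands the job off, and
the caller frees the job only when the submit call itself fails. A
minimal sketch with stub types and hypothetical names (not the real
amdgpu API):

```c
#include <assert.h>
#include <stdlib.h>

struct job { int ib; };		/* stub: the job embeds its IB */

static int jobs_alive;		/* leak detector for the sketch */

static int job_alloc(struct job **out)
{
	*out = calloc(1, sizeof(**out));
	if (!*out)
		return -12;	/* -ENOMEM-style error */
	jobs_alive++;
	return 0;
}

static void job_free(struct job *j)
{
	jobs_alive--;
	free(j);
}

static int job_submit_direct(struct job *j, int fail)
{
	if (fail)
		return -1;	/* caller still owns the job */
	job_free(j);		/* stand-in for the scheduler consuming it */
	return 0;
}

static int ring_test(int submit_fails)
{
	struct job *job;
	int r = job_alloc(&job);

	if (r)
		return r;

	r = job_submit_direct(job, submit_fails);
	if (r) {
		job_free(job);	/* only the submit-failure path frees */
		return r;
	}
	return 0;		/* job was consumed; no explicit free */
}
```

This is why the diffs drop the unconditional amdgpu_ib_free() in the
error path and add amdgpu_job_free() only under the submit-failure
check.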
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 37 ++++++++++++++-----------
1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index fd881388d6125..9fb1946be1ba2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -817,7 +817,8 @@ static int vpe_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
const uint32_t test_pattern = 0xdeadbeef;
- struct amdgpu_ib ib = {};
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
uint32_t index;
uint64_t wb_addr;
@@ -832,23 +833,28 @@ static int vpe_ring_test_ib(struct amdgpu_ring *ring, long timeout)
adev->wb.wb[index] = 0;
wb_addr = adev->wb.gpu_addr + (index * 4);
- ret = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+ ret = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_VPE_RING_TEST);
if (ret)
goto err0;
-
- ib.ptr[0] = VPE_CMD_HEADER(VPE_CMD_OPCODE_FENCE, 0);
- ib.ptr[1] = lower_32_bits(wb_addr);
- ib.ptr[2] = upper_32_bits(wb_addr);
- ib.ptr[3] = test_pattern;
- ib.ptr[4] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
- ib.ptr[5] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
- ib.ptr[6] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
- ib.ptr[7] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
- ib.length_dw = 8;
-
- ret = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (ret)
+ ib = &job->ibs[0];
+
+ ib->ptr[0] = VPE_CMD_HEADER(VPE_CMD_OPCODE_FENCE, 0);
+ ib->ptr[1] = lower_32_bits(wb_addr);
+ ib->ptr[2] = upper_32_bits(wb_addr);
+ ib->ptr[3] = test_pattern;
+ ib->ptr[4] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
+ ib->ptr[5] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
+ ib->ptr[6] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
+ ib->ptr[7] = VPE_CMD_HEADER(VPE_CMD_OPCODE_NOP, 0);
+ ib->length_dw = 8;
+
+ ret = amdgpu_job_submit_direct(job, ring, &f);
+ if (ret) {
+ amdgpu_job_free(job);
goto err1;
+ }
ret = dma_fence_wait_timeout(f, false, timeout);
if (ret <= 0) {
@@ -859,7 +865,6 @@ static int vpe_ring_test_ib(struct amdgpu_ring *ring, long timeout)
ret = (le32_to_cpu(adev->wb.wb[index]) == test_pattern) ? 0 : -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 05/42] drm/amdgpu/gfx6: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 73223d97a87f5..2f8aa99f17480 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1895,24 +1895,29 @@ static int gfx_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
struct dma_fence *f = NULL;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
uint32_t tmp = 0;
long r;
WREG32(mmSCRATCH_REG0, 0xCAFEDEAD);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r)
return r;
- ib.ptr[0] = PACKET3(PACKET3_SET_CONFIG_REG, 1);
- ib.ptr[1] = mmSCRATCH_REG0 - PACKET3_SET_CONFIG_REG_START;
- ib.ptr[2] = 0xDEADBEEF;
- ib.length_dw = 3;
+ ib = &job->ibs[0];
+ ib->ptr[0] = PACKET3(PACKET3_SET_CONFIG_REG, 1);
+ ib->ptr[1] = mmSCRATCH_REG0 - PACKET3_SET_CONFIG_REG_START;
+ ib->ptr[2] = 0xDEADBEEF;
+ ib->length_dw = 3;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto error;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1928,7 +1933,6 @@ static int gfx_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
error:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
return r;
}
--
2.52.0
* [PATCH 06/42] drm/amdgpu/gfx7: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 2b691452775bc..fa235b981c2e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2291,25 +2291,31 @@ static void gfx_v7_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
static int gfx_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
uint32_t tmp = 0;
long r;
WREG32(mmSCRATCH_REG0, 0xCAFEDEAD);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r)
return r;
- ib.ptr[0] = PACKET3(PACKET3_SET_UCONFIG_REG, 1);
- ib.ptr[1] = mmSCRATCH_REG0 - PACKET3_SET_UCONFIG_REG_START;
- ib.ptr[2] = 0xDEADBEEF;
- ib.length_dw = 3;
+ ib = &job->ibs[0];
+ ib->ptr[0] = PACKET3(PACKET3_SET_UCONFIG_REG, 1);
+ ib->ptr[1] = mmSCRATCH_REG0 - PACKET3_SET_UCONFIG_REG_START;
+ ib->ptr[2] = 0xDEADBEEF;
+ ib->length_dw = 3;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto error;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -2325,7 +2331,6 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
error:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
return r;
}
--
2.52.0
* [PATCH 07/42] drm/amdgpu/gfx8: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 139 +++++++++++++-------------
1 file changed, 72 insertions(+), 67 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a6b4c8f41dc11..4736216cd0211 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -868,9 +868,9 @@ static int gfx_v8_0_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
-
unsigned int index;
uint64_t gpu_addr;
uint32_t tmp;
@@ -882,22 +882,26 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 20, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 20,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r)
goto err1;
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib = &job->ibs[0];
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -914,7 +918,6 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
@@ -1474,7 +1477,8 @@ static const u32 sec_ded_counter_registers[] =
static int gfx_v8_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
{
struct amdgpu_ring *ring = &adev->gfx.compute_ring[0];
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
int r, i;
u32 tmp;
@@ -1505,106 +1509,108 @@ static int gfx_v8_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
total_size += sizeof(sgpr_init_compute_shader);
/* allocate an indirect buffer to put the commands in */
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, total_size,
- AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, total_size,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_RUN_SHADER);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%d).\n", r);
return r;
}
+ ib = &job->ibs[0];
/* load the compute shaders */
for (i = 0; i < ARRAY_SIZE(vgpr_init_compute_shader); i++)
- ib.ptr[i + (vgpr_offset / 4)] = vgpr_init_compute_shader[i];
+ ib->ptr[i + (vgpr_offset / 4)] = vgpr_init_compute_shader[i];
for (i = 0; i < ARRAY_SIZE(sgpr_init_compute_shader); i++)
- ib.ptr[i + (sgpr_offset / 4)] = sgpr_init_compute_shader[i];
+ ib->ptr[i + (sgpr_offset / 4)] = sgpr_init_compute_shader[i];
/* init the ib length to 0 */
- ib.length_dw = 0;
+ ib->length_dw = 0;
/* VGPR */
/* write the register state for the compute dispatch */
for (i = 0; i < ARRAY_SIZE(vgpr_init_regs); i += 2) {
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
- ib.ptr[ib.length_dw++] = vgpr_init_regs[i] - PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = vgpr_init_regs[i + 1];
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
+ ib->ptr[ib->length_dw++] = vgpr_init_regs[i] - PACKET3_SET_SH_REG_START;
+ ib->ptr[ib->length_dw++] = vgpr_init_regs[i + 1];
}
/* write the shader start address: mmCOMPUTE_PGM_LO, mmCOMPUTE_PGM_HI */
- gpu_addr = (ib.gpu_addr + (u64)vgpr_offset) >> 8;
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
- ib.ptr[ib.length_dw++] = mmCOMPUTE_PGM_LO - PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = lower_32_bits(gpu_addr);
- ib.ptr[ib.length_dw++] = upper_32_bits(gpu_addr);
+ gpu_addr = (ib->gpu_addr + (u64)vgpr_offset) >> 8;
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
+ ib->ptr[ib->length_dw++] = mmCOMPUTE_PGM_LO - PACKET3_SET_SH_REG_START;
+ ib->ptr[ib->length_dw++] = lower_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = upper_32_bits(gpu_addr);
/* write dispatch packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
- ib.ptr[ib.length_dw++] = 8; /* x */
- ib.ptr[ib.length_dw++] = 1; /* y */
- ib.ptr[ib.length_dw++] = 1; /* z */
- ib.ptr[ib.length_dw++] =
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
+ ib->ptr[ib->length_dw++] = 8; /* x */
+ ib->ptr[ib->length_dw++] = 1; /* y */
+ ib->ptr[ib->length_dw++] = 1; /* z */
+ ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
/* write CS partial flush packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
- ib.ptr[ib.length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
+ ib->ptr[ib->length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
/* SGPR1 */
/* write the register state for the compute dispatch */
for (i = 0; i < ARRAY_SIZE(sgpr1_init_regs); i += 2) {
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
- ib.ptr[ib.length_dw++] = sgpr1_init_regs[i] - PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = sgpr1_init_regs[i + 1];
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
+ ib->ptr[ib->length_dw++] = sgpr1_init_regs[i] - PACKET3_SET_SH_REG_START;
+ ib->ptr[ib->length_dw++] = sgpr1_init_regs[i + 1];
}
/* write the shader start address: mmCOMPUTE_PGM_LO, mmCOMPUTE_PGM_HI */
- gpu_addr = (ib.gpu_addr + (u64)sgpr_offset) >> 8;
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
- ib.ptr[ib.length_dw++] = mmCOMPUTE_PGM_LO - PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = lower_32_bits(gpu_addr);
- ib.ptr[ib.length_dw++] = upper_32_bits(gpu_addr);
+ gpu_addr = (ib->gpu_addr + (u64)sgpr_offset) >> 8;
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
+ ib->ptr[ib->length_dw++] = mmCOMPUTE_PGM_LO - PACKET3_SET_SH_REG_START;
+ ib->ptr[ib->length_dw++] = lower_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = upper_32_bits(gpu_addr);
/* write dispatch packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
- ib.ptr[ib.length_dw++] = 8; /* x */
- ib.ptr[ib.length_dw++] = 1; /* y */
- ib.ptr[ib.length_dw++] = 1; /* z */
- ib.ptr[ib.length_dw++] =
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
+ ib->ptr[ib->length_dw++] = 8; /* x */
+ ib->ptr[ib->length_dw++] = 1; /* y */
+ ib->ptr[ib->length_dw++] = 1; /* z */
+ ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
/* write CS partial flush packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
- ib.ptr[ib.length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
+ ib->ptr[ib->length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
/* SGPR2 */
/* write the register state for the compute dispatch */
for (i = 0; i < ARRAY_SIZE(sgpr2_init_regs); i += 2) {
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
- ib.ptr[ib.length_dw++] = sgpr2_init_regs[i] - PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = sgpr2_init_regs[i + 1];
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
+ ib->ptr[ib->length_dw++] = sgpr2_init_regs[i] - PACKET3_SET_SH_REG_START;
+ ib->ptr[ib->length_dw++] = sgpr2_init_regs[i + 1];
}
/* write the shader start address: mmCOMPUTE_PGM_LO, mmCOMPUTE_PGM_HI */
- gpu_addr = (ib.gpu_addr + (u64)sgpr_offset) >> 8;
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
- ib.ptr[ib.length_dw++] = mmCOMPUTE_PGM_LO - PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = lower_32_bits(gpu_addr);
- ib.ptr[ib.length_dw++] = upper_32_bits(gpu_addr);
+ gpu_addr = (ib->gpu_addr + (u64)sgpr_offset) >> 8;
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
+ ib->ptr[ib->length_dw++] = mmCOMPUTE_PGM_LO - PACKET3_SET_SH_REG_START;
+ ib->ptr[ib->length_dw++] = lower_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = upper_32_bits(gpu_addr);
/* write dispatch packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
- ib.ptr[ib.length_dw++] = 8; /* x */
- ib.ptr[ib.length_dw++] = 1; /* y */
- ib.ptr[ib.length_dw++] = 1; /* z */
- ib.ptr[ib.length_dw++] =
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
+ ib->ptr[ib->length_dw++] = 8; /* x */
+ ib->ptr[ib->length_dw++] = 1; /* y */
+ ib->ptr[ib->length_dw++] = 1; /* z */
+ ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
/* write CS partial flush packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
- ib.ptr[ib.length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
+ ib->ptr[ib->length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
/* shedule the ib on the ring */
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
+ r = amdgpu_job_submit_direct(job, ring, &f);
if (r) {
drm_err(adev_to_drm(adev), "ib submit failed (%d).\n", r);
+ amdgpu_job_free(job);
goto fail;
}
@@ -1629,7 +1635,6 @@ static int gfx_v8_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
RREG32(sec_ded_counter_registers[i]);
fail:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
return r;
--
2.52.0
* [PATCH 08/42] drm/amdgpu/gfx9: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 138 +++++++++++++-------------
1 file changed, 71 insertions(+), 67 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 7e9d753f4a808..36f0300a21bfa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1224,9 +1224,9 @@ static int gfx_v9_0_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v9_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
-
unsigned index;
uint64_t gpu_addr;
uint32_t tmp;
@@ -1238,22 +1238,26 @@ static int gfx_v9_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 20, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 20,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r)
goto err1;
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib = &job->ibs[0];
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1270,7 +1274,6 @@ static int gfx_v9_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
@@ -4624,7 +4627,8 @@ static int gfx_v9_0_do_edc_gds_workarounds(struct amdgpu_device *adev)
static int gfx_v9_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
{
struct amdgpu_ring *ring = &adev->gfx.compute_ring[0];
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
int r, i;
unsigned total_size, vgpr_offset, sgpr_offset;
@@ -4670,9 +4674,9 @@ static int gfx_v9_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
total_size += sizeof(sgpr_init_compute_shader);
/* allocate an indirect buffer to put the commands in */
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, total_size,
- AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, total_size,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_RUN_SHADER);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%d).\n", r);
return r;
@@ -4680,102 +4684,103 @@ static int gfx_v9_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
/* load the compute shaders */
for (i = 0; i < vgpr_init_shader_size/sizeof(u32); i++)
- ib.ptr[i + (vgpr_offset / 4)] = vgpr_init_shader_ptr[i];
+ ib->ptr[i + (vgpr_offset / 4)] = vgpr_init_shader_ptr[i];
for (i = 0; i < ARRAY_SIZE(sgpr_init_compute_shader); i++)
- ib.ptr[i + (sgpr_offset / 4)] = sgpr_init_compute_shader[i];
+ ib->ptr[i + (sgpr_offset / 4)] = sgpr_init_compute_shader[i];
/* init the ib length to 0 */
- ib.length_dw = 0;
+ ib->length_dw = 0;
/* VGPR */
/* write the register state for the compute dispatch */
for (i = 0; i < gpr_reg_size; i++) {
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
- ib.ptr[ib.length_dw++] = SOC15_REG_ENTRY_OFFSET(vgpr_init_regs_ptr[i])
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
+ ib->ptr[ib->length_dw++] = SOC15_REG_ENTRY_OFFSET(vgpr_init_regs_ptr[i])
- PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = vgpr_init_regs_ptr[i].reg_value;
+ ib->ptr[ib->length_dw++] = vgpr_init_regs_ptr[i].reg_value;
}
/* write the shader start address: mmCOMPUTE_PGM_LO, mmCOMPUTE_PGM_HI */
- gpu_addr = (ib.gpu_addr + (u64)vgpr_offset) >> 8;
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
- ib.ptr[ib.length_dw++] = SOC15_REG_OFFSET(GC, 0, mmCOMPUTE_PGM_LO)
+ gpu_addr = (ib->gpu_addr + (u64)vgpr_offset) >> 8;
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
+ ib->ptr[ib->length_dw++] = SOC15_REG_OFFSET(GC, 0, mmCOMPUTE_PGM_LO)
- PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = lower_32_bits(gpu_addr);
- ib.ptr[ib.length_dw++] = upper_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = lower_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = upper_32_bits(gpu_addr);
/* write dispatch packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
- ib.ptr[ib.length_dw++] = compute_dim_x * 2; /* x */
- ib.ptr[ib.length_dw++] = 1; /* y */
- ib.ptr[ib.length_dw++] = 1; /* z */
- ib.ptr[ib.length_dw++] =
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
+ ib->ptr[ib->length_dw++] = compute_dim_x * 2; /* x */
+ ib->ptr[ib->length_dw++] = 1; /* y */
+ ib->ptr[ib->length_dw++] = 1; /* z */
+ ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
/* write CS partial flush packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
- ib.ptr[ib.length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
+ ib->ptr[ib->length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
/* SGPR1 */
/* write the register state for the compute dispatch */
for (i = 0; i < gpr_reg_size; i++) {
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
- ib.ptr[ib.length_dw++] = SOC15_REG_ENTRY_OFFSET(sgpr1_init_regs[i])
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
+ ib->ptr[ib->length_dw++] = SOC15_REG_ENTRY_OFFSET(sgpr1_init_regs[i])
- PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = sgpr1_init_regs[i].reg_value;
+ ib->ptr[ib->length_dw++] = sgpr1_init_regs[i].reg_value;
}
/* write the shader start address: mmCOMPUTE_PGM_LO, mmCOMPUTE_PGM_HI */
- gpu_addr = (ib.gpu_addr + (u64)sgpr_offset) >> 8;
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
- ib.ptr[ib.length_dw++] = SOC15_REG_OFFSET(GC, 0, mmCOMPUTE_PGM_LO)
+ gpu_addr = (ib->gpu_addr + (u64)sgpr_offset) >> 8;
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
+ ib->ptr[ib->length_dw++] = SOC15_REG_OFFSET(GC, 0, mmCOMPUTE_PGM_LO)
- PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = lower_32_bits(gpu_addr);
- ib.ptr[ib.length_dw++] = upper_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = lower_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = upper_32_bits(gpu_addr);
/* write dispatch packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
- ib.ptr[ib.length_dw++] = compute_dim_x / 2 * sgpr_work_group_size; /* x */
- ib.ptr[ib.length_dw++] = 1; /* y */
- ib.ptr[ib.length_dw++] = 1; /* z */
- ib.ptr[ib.length_dw++] =
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
+ ib->ptr[ib->length_dw++] = compute_dim_x / 2 * sgpr_work_group_size; /* x */
+ ib->ptr[ib->length_dw++] = 1; /* y */
+ ib->ptr[ib->length_dw++] = 1; /* z */
+ ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
/* write CS partial flush packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
- ib.ptr[ib.length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
+ ib->ptr[ib->length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
/* SGPR2 */
/* write the register state for the compute dispatch */
for (i = 0; i < gpr_reg_size; i++) {
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
- ib.ptr[ib.length_dw++] = SOC15_REG_ENTRY_OFFSET(sgpr2_init_regs[i])
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 1);
+ ib->ptr[ib->length_dw++] = SOC15_REG_ENTRY_OFFSET(sgpr2_init_regs[i])
- PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = sgpr2_init_regs[i].reg_value;
+ ib->ptr[ib->length_dw++] = sgpr2_init_regs[i].reg_value;
}
/* write the shader start address: mmCOMPUTE_PGM_LO, mmCOMPUTE_PGM_HI */
- gpu_addr = (ib.gpu_addr + (u64)sgpr_offset) >> 8;
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
- ib.ptr[ib.length_dw++] = SOC15_REG_OFFSET(GC, 0, mmCOMPUTE_PGM_LO)
+ gpu_addr = (ib->gpu_addr + (u64)sgpr_offset) >> 8;
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_SET_SH_REG, 2);
+ ib->ptr[ib->length_dw++] = SOC15_REG_OFFSET(GC, 0, mmCOMPUTE_PGM_LO)
- PACKET3_SET_SH_REG_START;
- ib.ptr[ib.length_dw++] = lower_32_bits(gpu_addr);
- ib.ptr[ib.length_dw++] = upper_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = lower_32_bits(gpu_addr);
+ ib->ptr[ib->length_dw++] = upper_32_bits(gpu_addr);
/* write dispatch packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
- ib.ptr[ib.length_dw++] = compute_dim_x / 2 * sgpr_work_group_size; /* x */
- ib.ptr[ib.length_dw++] = 1; /* y */
- ib.ptr[ib.length_dw++] = 1; /* z */
- ib.ptr[ib.length_dw++] =
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_DISPATCH_DIRECT, 3);
+ ib->ptr[ib->length_dw++] = compute_dim_x / 2 * sgpr_work_group_size; /* x */
+ ib->ptr[ib->length_dw++] = 1; /* y */
+ ib->ptr[ib->length_dw++] = 1; /* z */
+ ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
/* write CS partial flush packet */
- ib.ptr[ib.length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
- ib.ptr[ib.length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
+ ib->ptr[ib->length_dw++] = PACKET3(PACKET3_EVENT_WRITE, 0);
+ ib->ptr[ib->length_dw++] = EVENT_TYPE(7) | EVENT_INDEX(4);
/* shedule the ib on the ring */
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
+ r = amdgpu_job_submit_direct(job, ring, &f);
if (r) {
drm_err(adev_to_drm(adev), "ib schedule failed (%d).\n", r);
+ amdgpu_job_free(job);
goto fail;
}
@@ -4787,7 +4792,6 @@ static int gfx_v9_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
}
fail:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
return r;
--
2.52.0
* [PATCH 09/42] drm/amdgpu/gfx9.4.2: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 ++++++++-----------------
1 file changed, 8 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
index 8058ea91ecafd..424b05b84ea74 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
@@ -345,12 +345,13 @@ const struct soc15_reg_entry sgpr64_init_regs_aldebaran[] = {
static int gfx_v9_4_2_run_shader(struct amdgpu_device *adev,
struct amdgpu_ring *ring,
- struct amdgpu_ib *ib,
const u32 *shader_ptr, u32 shader_size,
const struct soc15_reg_entry *init_regs, u32 regs_size,
u32 compute_dim_x, u64 wb_gpu_addr, u32 pattern,
struct dma_fence **fence_ptr)
{
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
int r, i;
uint32_t total_size, shader_offset;
u64 gpu_addr;
@@ -360,10 +361,9 @@ static int gfx_v9_4_2_run_shader(struct amdgpu_device *adev,
shader_offset = total_size;
total_size += ALIGN(shader_size, 256);
- /* allocate an indirect buffer to put the commands in */
- memset(ib, 0, sizeof(*ib));
- r = amdgpu_ib_get(adev, NULL, total_size,
- AMDGPU_IB_POOL_DIRECT, ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, total_size,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_RUN_SHADER);
if (r) {
dev_err(adev->dev, "failed to get ib (%d).\n", r);
return r;
@@ -408,11 +408,11 @@ static int gfx_v9_4_2_run_shader(struct amdgpu_device *adev,
ib->ptr[ib->length_dw++] =
REG_SET_FIELD(0, COMPUTE_DISPATCH_INITIATOR, COMPUTE_SHADER_EN, 1);
- /* shedule the ib on the ring */
- r = amdgpu_ib_schedule(ring, 1, ib, NULL, fence_ptr);
+ /* schedule the ib on the ring */
+ r = amdgpu_job_submit_direct(job, ring, fence_ptr);
if (r) {
dev_err(adev->dev, "ib submit failed (%d).\n", r);
- amdgpu_ib_free(ib, NULL);
+ amdgpu_job_free(job);
}
return r;
}
@@ -493,7 +493,6 @@ static int gfx_v9_4_2_do_sgprs_init(struct amdgpu_device *adev)
int wb_size = adev->gfx.config.max_shader_engines *
CU_ID_MAX * SIMD_ID_MAX * WAVE_ID_MAX;
struct amdgpu_ib wb_ib;
- struct amdgpu_ib disp_ibs[3];
struct dma_fence *fences[3];
u32 pattern[3] = { 0x1, 0x5, 0xa };
@@ -514,7 +513,6 @@ static int gfx_v9_4_2_do_sgprs_init(struct amdgpu_device *adev)
r = gfx_v9_4_2_run_shader(adev,
&adev->gfx.compute_ring[0],
- &disp_ibs[0],
sgpr112_init_compute_shader_aldebaran,
sizeof(sgpr112_init_compute_shader_aldebaran),
sgpr112_init_regs_aldebaran,
@@ -539,7 +537,6 @@ static int gfx_v9_4_2_do_sgprs_init(struct amdgpu_device *adev)
r = gfx_v9_4_2_run_shader(adev,
&adev->gfx.compute_ring[1],
- &disp_ibs[1],
sgpr96_init_compute_shader_aldebaran,
sizeof(sgpr96_init_compute_shader_aldebaran),
sgpr96_init_regs_aldebaran,
@@ -579,7 +576,6 @@ static int gfx_v9_4_2_do_sgprs_init(struct amdgpu_device *adev)
memset(wb_ib.ptr, 0, (1 + wb_size) * sizeof(uint32_t));
r = gfx_v9_4_2_run_shader(adev,
&adev->gfx.compute_ring[0],
- &disp_ibs[2],
sgpr64_init_compute_shader_aldebaran,
sizeof(sgpr64_init_compute_shader_aldebaran),
sgpr64_init_regs_aldebaran,
@@ -611,13 +607,10 @@ static int gfx_v9_4_2_do_sgprs_init(struct amdgpu_device *adev)
}
disp2_failed:
- amdgpu_ib_free(&disp_ibs[2], NULL);
dma_fence_put(fences[2]);
disp1_failed:
- amdgpu_ib_free(&disp_ibs[1], NULL);
dma_fence_put(fences[1]);
disp0_failed:
- amdgpu_ib_free(&disp_ibs[0], NULL);
dma_fence_put(fences[0]);
pro_end:
amdgpu_ib_free(&wb_ib, NULL);
@@ -637,7 +630,6 @@ static int gfx_v9_4_2_do_vgprs_init(struct amdgpu_device *adev)
int wb_size = adev->gfx.config.max_shader_engines *
CU_ID_MAX * SIMD_ID_MAX * WAVE_ID_MAX;
struct amdgpu_ib wb_ib;
- struct amdgpu_ib disp_ib;
struct dma_fence *fence;
u32 pattern = 0xa;
@@ -657,7 +649,6 @@ static int gfx_v9_4_2_do_vgprs_init(struct amdgpu_device *adev)
r = gfx_v9_4_2_run_shader(adev,
&adev->gfx.compute_ring[0],
- &disp_ib,
vgpr_init_compute_shader_aldebaran,
sizeof(vgpr_init_compute_shader_aldebaran),
vgpr_init_regs_aldebaran,
@@ -687,7 +678,6 @@ static int gfx_v9_4_2_do_vgprs_init(struct amdgpu_device *adev)
}
disp_failed:
- amdgpu_ib_free(&disp_ib, NULL);
dma_fence_put(fence);
pro_end:
amdgpu_ib_free(&wb_ib, NULL);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread* [PATCH 10/42] drm/amdgpu/gfx9.4.3: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 29 ++++++++++++++-----------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index ad4d442e7345e..d78b2c2ae13a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -451,9 +451,9 @@ static int gfx_v9_4_3_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v9_4_3_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
-
unsigned index;
uint64_t gpu_addr;
uint32_t tmp;
@@ -465,22 +465,26 @@ static int gfx_v9_4_3_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 20, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 20,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r)
goto err1;
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib = &job->ibs[0];
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -497,7 +501,6 @@ static int gfx_v9_4_3_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 11/42] drm/amdgpu/gfx10: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 29 ++++++++++++++------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 41bbedb8e157e..496121bdc1de1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4071,15 +4071,14 @@ static int gfx_v10_0_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v10_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned int index;
uint64_t gpu_addr;
uint32_t *cpu_ptr;
long r;
- memset(&ib, 0, sizeof(ib));
-
r = amdgpu_device_wb_get(adev, &index);
if (r)
return r;
@@ -4088,22 +4087,27 @@ static int gfx_v10_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
cpu_ptr = &adev->wb.wb[index];
- r = amdgpu_ib_get(adev, NULL, 20, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 20,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err1;
}
+ ib = &job->ibs[0];
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -4118,7 +4122,6 @@ static int gfx_v10_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
else
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 12/42] drm/amdgpu/gfx11: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 29 ++++++++++++++------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 3a4ca104b1612..5ad2516a60240 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -604,7 +604,8 @@ static int gfx_v11_0_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v11_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
uint64_t gpu_addr;
@@ -616,8 +617,6 @@ static int gfx_v11_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
return 0;
- memset(&ib, 0, sizeof(ib));
-
r = amdgpu_device_wb_get(adev, &index);
if (r)
return r;
@@ -626,22 +625,27 @@ static int gfx_v11_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
cpu_ptr = &adev->wb.wb[index];
- r = amdgpu_ib_get(adev, NULL, 20, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 20,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err1;
}
+ ib = &job->ibs[0];
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -656,7 +660,6 @@ static int gfx_v11_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
else
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 13/42] drm/amdgpu/gfx12: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 29 ++++++++++++++------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 6cd16f016c374..5862b5f60a6ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -494,7 +494,8 @@ static int gfx_v12_0_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v12_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
uint64_t gpu_addr;
@@ -506,8 +507,6 @@ static int gfx_v12_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
return 0;
- memset(&ib, 0, sizeof(ib));
-
r = amdgpu_device_wb_get(adev, &index);
if (r)
return r;
@@ -516,22 +515,27 @@ static int gfx_v12_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
cpu_ptr = &adev->wb.wb[index];
- r = amdgpu_ib_get(adev, NULL, 16, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 16,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err1;
}
+ ib = &job->ibs[0];
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -546,7 +550,6 @@ static int gfx_v12_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
else
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 14/42] drm/amdgpu/gfx12.1: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 29 ++++++++++++++------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c
index 86cc90a662965..7d02569cd4738 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c
@@ -306,7 +306,8 @@ static int gfx_v12_1_ring_test_ring(struct amdgpu_ring *ring)
static int gfx_v12_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
uint64_t gpu_addr;
@@ -318,8 +319,6 @@ static int gfx_v12_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
return 0;
- memset(&ib, 0, sizeof(ib));
-
r = amdgpu_device_wb_get(adev, &index);
if (r)
return r;
@@ -328,22 +327,27 @@ static int gfx_v12_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
adev->wb.wb[index] = cpu_to_le32(0xCAFEDEAD);
cpu_ptr = &adev->wb.wb[index];
- r = amdgpu_ib_get(adev, NULL, 16, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 16,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_GFX_RING_TEST);
if (r) {
dev_err(adev->dev, "amdgpu: failed to get ib (%ld).\n", r);
goto err1;
}
+ ib = &job->ibs[0];
- ib.ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
- ib.ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
- ib.ptr[2] = lower_32_bits(gpu_addr);
- ib.ptr[3] = upper_32_bits(gpu_addr);
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
+ ib->ptr[0] = PACKET3(PACKET3_WRITE_DATA, 3);
+ ib->ptr[1] = WRITE_DATA_DST_SEL(5) | WR_CONFIRM;
+ ib->ptr[2] = lower_32_bits(gpu_addr);
+ ib->ptr[3] = upper_32_bits(gpu_addr);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err2;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -358,7 +362,6 @@ static int gfx_v12_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
else
r = -EINVAL;
err2:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err1:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 15/42] drm/amdgpu/si_dma: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/si_dma.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
index 74fcaa340d9b1..b67bd343f795f 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
@@ -259,7 +259,8 @@ static int si_dma_ring_test_ring(struct amdgpu_ring *ring)
static int si_dma_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
u32 tmp = 0;
@@ -273,20 +274,25 @@ static int si_dma_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
tmp = 0xCAFEDEAD;
adev->wb.wb[index] = cpu_to_le32(tmp);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r)
goto err0;
- ib.ptr[0] = DMA_PACKET(DMA_PACKET_WRITE, 0, 0, 0, 1);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr) & 0xff;
- ib.ptr[3] = 0xDEADBEEF;
- ib.length_dw = 4;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib = &job->ibs[0];
+ ib->ptr[0] = DMA_PACKET(DMA_PACKET_WRITE, 0, 0, 0, 1);
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr) & 0xff;
+ ib->ptr[3] = 0xDEADBEEF;
+ ib->length_dw = 4;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -302,7 +308,6 @@ static int si_dma_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 16/42] drm/amdgpu/cik_sdma: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 31 ++++++++++++++++-----------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 9e8715b4739da..e2ca96f5a7cfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -652,7 +652,8 @@ static int cik_sdma_ring_test_ring(struct amdgpu_ring *ring)
static int cik_sdma_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
u32 tmp = 0;
@@ -666,22 +667,27 @@ static int cik_sdma_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
tmp = 0xCAFEDEAD;
adev->wb.wb[index] = cpu_to_le32(tmp);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r)
goto err0;
+ ib = &job->ibs[0];
- ib.ptr[0] = SDMA_PACKET(SDMA_OPCODE_WRITE,
+ ib->ptr[0] = SDMA_PACKET(SDMA_OPCODE_WRITE,
SDMA_WRITE_SUB_OPCODE_LINEAR, 0);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = 1;
- ib.ptr[4] = 0xDEADBEEF;
- ib.length_dw = 5;
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = 1;
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->length_dw = 5;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -697,7 +703,6 @@ static int cik_sdma_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 17/42] drm/amdgpu/sdma2.4: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 38 ++++++++++++++------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 92ce580647cdc..46263d50cc9ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -584,7 +584,8 @@ static int sdma_v2_4_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v2_4_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
u32 tmp = 0;
@@ -598,26 +599,30 @@ static int sdma_v2_4_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
tmp = 0xCAFEDEAD;
adev->wb.wb[index] = cpu_to_le32(tmp);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r)
goto err0;
- ib.ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(1);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(1);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -633,7 +638,6 @@ static int sdma_v2_4_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 18/42] drm/amdgpu/sdma3: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 38 ++++++++++++++------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 1c076bd1cf73e..f9f05768072ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -858,7 +858,8 @@ static int sdma_v3_0_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v3_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
u32 tmp = 0;
@@ -872,26 +873,30 @@ static int sdma_v3_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
tmp = 0xCAFEDEAD;
adev->wb.wb[index] = cpu_to_le32(tmp);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r)
goto err0;
- ib.ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(1);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(1);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -906,7 +911,6 @@ static int sdma_v3_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
else
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
* [PATCH 19/42] drm/amdgpu/sdma4: switch to using job for IBs
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 38 ++++++++++++++------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index f38004e6064e5..56d2832ccba2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1516,7 +1516,8 @@ static int sdma_v4_0_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v4_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -1530,26 +1531,30 @@ static int sdma_v4_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
tmp = 0xCAFEDEAD;
adev->wb.wb[index] = cpu_to_le32(tmp);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r)
goto err0;
- ib.ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1565,7 +1570,6 @@ static int sdma_v4_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
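[Editorial note: the conversion in this and the surrounding patches follows one pattern across the SDMA generations: instead of allocating a bare amdgpu_ib on the stack and scheduling it directly, the test allocates a job that embeds its IB, submits via the job path, and frees the job explicitly only if the submit fails. A minimal compileable sketch of that shape, using stand-in types and functions rather than the real amdgpu API:]

```c
#include <stdlib.h>

/* Stand-ins for the real amdgpu structures; the real job embeds
 * its IBs, which is the point of the conversion. */
struct sim_ib  { unsigned ptr[8]; unsigned length_dw; };
struct sim_job { struct sim_ib ibs[1]; };

/* Stands in for amdgpu_job_alloc_with_ib(): job and IB come
 * from a single allocation. */
static int job_alloc_with_ib(struct sim_job **out)
{
	*out = calloc(1, sizeof(**out));
	return *out ? 0 : -12;		/* -ENOMEM */
}

static void job_free(struct sim_job *job) { free(job); }

/* Stands in for amdgpu_job_submit_direct(); on success it
 * consumes the job, as the real scheduler eventually would. */
static int job_submit_direct(struct sim_job *job)
{
	if (job->ibs[0].length_dw == 0)
		return -22;		/* -EINVAL; caller still owns job */
	job_free(job);			/* scheduler owns it now */
	return 0;
}

/* The new test-IB flow: fill job->ibs[0] in place and submit;
 * the job is freed explicitly only when the submit fails. */
static int ring_test_ib(void)
{
	struct sim_job *job;
	struct sim_ib *ib;
	int r = job_alloc_with_ib(&job);

	if (r)
		return r;
	ib = &job->ibs[0];
	ib->ptr[0] = 0xDEADBEEF;	/* payload, as in the tests */
	ib->length_dw = 1;
	r = job_submit_direct(job);
	if (r) {
		job_free(job);		/* mirrors the new error path */
		return r;
	}
	return 0;
}
```

[The key difference from the removed code is ownership: the IB lives inside the job, so there is no separate amdgpu_ib_free() on the cleanup path.]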
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 20/42] drm/amdgpu/sdma4.4.2: switch to using job for IBs
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (18 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 19/42] drm/amdgpu/sdma4: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 21/42] drm/amdgpu/sdma5: " Alex Deucher
` (22 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 38 +++++++++++++-----------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index a1443990d5c60..dd8d6a572710d 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1112,7 +1112,8 @@ static int sdma_v4_4_2_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v4_4_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -1126,26 +1127,30 @@ static int sdma_v4_4_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
tmp = 0xCAFEDEAD;
adev->wb.wb[index] = cpu_to_le32(tmp);
- memset(&ib, 0, sizeof(ib));
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r)
goto err0;
- ib.ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1161,7 +1166,6 @@ static int sdma_v4_4_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
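[Editorial note: the ring tests being converted all share the same write-back check: seed a write-back slot with 0xCAFEDEAD, emit an SDMA WRITE_LINEAR packet that stores 0xDEADBEEF to that slot's GPU address, then poll for the new value. A simulated sketch of that check, with the engine replaced by a plain function since nothing here touches hardware:]

```c
#include <stdint.h>

/* One simulated write-back slot (stands in for adev->wb.wb[index]). */
static uint32_t wb_slot;

/* Stands in for the SDMA_SUBOP_WRITE_LINEAR packet: the engine
 * writes one dword to the given address. */
static void engine_write_linear(uint32_t *dst, uint32_t val)
{
	*dst = val;
}

/* Seed the slot with a known stale value, run the write, and
 * verify that the new value landed. */
static int wb_ring_test(void)
{
	wb_slot = 0xCAFEDEAD;
	engine_write_linear(&wb_slot, 0xDEADBEEF);
	return (wb_slot == 0xDEADBEEF) ? 0 : -22;	/* -EINVAL */
}
```

[Seeding with a distinct sentinel matters: if the engine never executes the packet, the stale 0xCAFEDEAD is still there and the test fails rather than passing by accident.]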
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 21/42] drm/amdgpu/sdma5: switch to using job for IBs
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (19 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 20/42] drm/amdgpu/sdma4.4.2: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 22/42] drm/amdgpu/sdma5.2: " Alex Deucher
` (21 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 37 ++++++++++++++------------
1 file changed, 20 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 7811cbb1f7ba3..786f1776fa30d 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -1074,7 +1074,8 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v5_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -1082,7 +1083,6 @@ static int sdma_v5_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
u64 gpu_addr;
tmp = 0xCAFEDEAD;
- memset(&ib, 0, sizeof(ib));
r = amdgpu_device_wb_get(adev, &index);
if (r) {
@@ -1093,27 +1093,31 @@ static int sdma_v5_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(tmp);
- r = amdgpu_ib_get(adev, NULL, 256,
- AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err0;
}
- ib.ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1133,7 +1137,6 @@ static int sdma_v5_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 22/42] drm/amdgpu/sdma5.2: switch to using job for IBs
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (20 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 21/42] drm/amdgpu/sdma5: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 23/42] drm/amdgpu/sdma6: " Alex Deucher
` (20 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 36 ++++++++++++++------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index dbe5b8f109f6a..49005b96aa3f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -974,7 +974,8 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v5_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -982,7 +983,6 @@ static int sdma_v5_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
u64 gpu_addr;
tmp = 0xCAFEDEAD;
- memset(&ib, 0, sizeof(ib));
r = amdgpu_device_wb_get(adev, &index);
if (r) {
@@ -993,26 +993,31 @@ static int sdma_v5_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(tmp);
- r = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err0;
}
- ib.ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1032,7 +1037,6 @@ static int sdma_v5_2_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 23/42] drm/amdgpu/sdma6: switch to using job for IBs
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (21 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 22/42] drm/amdgpu/sdma5.2: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 24/42] drm/amdgpu/sdma7: " Alex Deucher
` (19 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 36 ++++++++++++++------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index eec659194718d..210ea6ba6212f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -981,7 +981,8 @@ static int sdma_v6_0_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -989,7 +990,6 @@ static int sdma_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
u64 gpu_addr;
tmp = 0xCAFEDEAD;
- memset(&ib, 0, sizeof(ib));
r = amdgpu_device_wb_get(adev, &index);
if (r) {
@@ -1000,26 +1000,31 @@ static int sdma_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(tmp);
- r = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err0;
}
- ib.ptr[0] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_COPY_LINEAR_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1039,7 +1044,6 @@ static int sdma_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 24/42] drm/amdgpu/sdma7: switch to using job for IBs
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (22 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 23/42] drm/amdgpu/sdma6: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 25/42] drm/amdgpu/sdma7.1: " Alex Deucher
` (18 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 36 ++++++++++++++------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index 8d16ef257bcb9..3b4417d19212e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -997,7 +997,8 @@ static int sdma_v7_0_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -1005,7 +1006,6 @@ static int sdma_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
u64 gpu_addr;
tmp = 0xCAFEDEAD;
- memset(&ib, 0, sizeof(ib));
r = amdgpu_device_wb_get(adev, &index);
if (r) {
@@ -1016,26 +1016,31 @@ static int sdma_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(tmp);
- r = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r) {
drm_err(adev_to_drm(adev), "failed to get ib (%ld).\n", r);
goto err0;
}
- ib.ptr[0] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_COPY_LINEAR_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1055,7 +1060,6 @@ static int sdma_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 25/42] drm/amdgpu/sdma7.1: switch to using job for IBs
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (23 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 24/42] drm/amdgpu/sdma7: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 26/42] drm/amdgpu: require a job to schedule an IB Alex Deucher
` (17 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Switch to using a job structure for IBs.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 36 ++++++++++++++------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
index 5bc45c3e00d18..d71a546bdde61 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
@@ -987,7 +987,8 @@ static int sdma_v7_1_ring_test_ring(struct amdgpu_ring *ring)
static int sdma_v7_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib ib;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
struct dma_fence *f = NULL;
unsigned index;
long r;
@@ -995,7 +996,6 @@ static int sdma_v7_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
u64 gpu_addr;
tmp = 0xCAFEDEAD;
- memset(&ib, 0, sizeof(ib));
r = amdgpu_device_wb_get(adev, &index);
if (r) {
@@ -1006,26 +1006,31 @@ static int sdma_v7_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
gpu_addr = adev->wb.gpu_addr + (index * 4);
adev->wb.wb[index] = cpu_to_le32(tmp);
- r = amdgpu_ib_get(adev, NULL, 256, AMDGPU_IB_POOL_DIRECT, &ib);
+ r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, 256,
+ AMDGPU_IB_POOL_DIRECT, &job,
+ AMDGPU_KERNEL_JOB_ID_SDMA_RING_TEST);
if (r) {
DRM_ERROR("amdgpu: failed to get ib (%ld).\n", r);
goto err0;
}
- ib.ptr[0] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_WRITE) |
+ ib = &job->ibs[0];
+ ib->ptr[0] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_WRITE) |
SDMA_PKT_COPY_LINEAR_HEADER_SUB_OP(SDMA_SUBOP_WRITE_LINEAR);
- ib.ptr[1] = lower_32_bits(gpu_addr);
- ib.ptr[2] = upper_32_bits(gpu_addr);
- ib.ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
- ib.ptr[4] = 0xDEADBEEF;
- ib.ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
- ib.length_dw = 8;
-
- r = amdgpu_ib_schedule(ring, 1, &ib, NULL, &f);
- if (r)
+ ib->ptr[1] = lower_32_bits(gpu_addr);
+ ib->ptr[2] = upper_32_bits(gpu_addr);
+ ib->ptr[3] = SDMA_PKT_WRITE_UNTILED_DW_3_COUNT(0);
+ ib->ptr[4] = 0xDEADBEEF;
+ ib->ptr[5] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[6] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->ptr[7] = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP);
+ ib->length_dw = 8;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r) {
+ amdgpu_job_free(job);
goto err1;
+ }
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
@@ -1045,7 +1050,6 @@ static int sdma_v7_1_ring_test_ib(struct amdgpu_ring *ring, long timeout)
r = -EINVAL;
err1:
- amdgpu_ib_free(&ib, NULL);
dma_fence_put(f);
err0:
amdgpu_device_wb_free(adev, index);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 26/42] drm/amdgpu: require a job to schedule an IB
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (24 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 25/42] drm/amdgpu/sdma7.1: " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset Alex Deucher
` (16 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Remove the old direct submit path. This simplifies
the code.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 106 ++++++++-------------
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 +-
4 files changed, 45 insertions(+), 71 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 67a01c4f38855..e8f1266cd8575 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -680,7 +680,7 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
job->vmid = vmid;
job->num_ibs = 1;
- ret = amdgpu_ib_schedule(ring, 1, ib, job, &f);
+ ret = amdgpu_ib_schedule(ring, job, &f);
if (ret) {
drm_err(adev_to_drm(adev), "failed to schedule IB.\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 72ec455fa932c..fb2e08ea248c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -103,8 +103,6 @@ void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f)
* amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
*
* @ring: ring index the IB is associated with
- * @num_ibs: number of IBs to schedule
- * @ibs: IB objects to schedule
* @job: job to schedule
* @f: fence created during this submission
*
@@ -121,12 +119,11 @@ void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f)
* a CONST_IB), it will be put on the ring prior to the DE IB. Prior
* to SI there was just a DE IB.
*/
-int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
- struct amdgpu_ib *ibs, struct amdgpu_job *job,
+int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
struct dma_fence **f)
{
struct amdgpu_device *adev = ring->adev;
- struct amdgpu_ib *ib = &ibs[0];
+ struct amdgpu_ib *ib;
struct dma_fence *tmp = NULL;
struct amdgpu_fence *af;
bool need_ctx_switch;
@@ -142,64 +139,51 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
unsigned int i;
int r = 0;
- if (num_ibs == 0)
+ if (!job)
+ return -EINVAL;
+ if (job->num_ibs == 0)
return -EINVAL;
- /* ring tests don't use a job */
- if (job) {
- vm = job->vm;
- fence_ctx = job->base.s_fence ?
- job->base.s_fence->finished.context : 0;
- shadow_va = job->shadow_va;
- csa_va = job->csa_va;
- gds_va = job->gds_va;
- init_shadow = job->init_shadow;
- af = job->hw_fence;
- /* Save the context of the job for reset handling.
- * The driver needs this so it can skip the ring
- * contents for guilty contexts.
- */
- af->context = fence_ctx;
- /* the vm fence is also part of the job's context */
- job->hw_vm_fence->context = fence_ctx;
- } else {
- vm = NULL;
- fence_ctx = 0;
- shadow_va = 0;
- csa_va = 0;
- gds_va = 0;
- init_shadow = false;
- af = kzalloc(sizeof(*af), GFP_ATOMIC);
- if (!af)
- return -ENOMEM;
- }
+ ib = &job->ibs[0];
+ vm = job->vm;
+ fence_ctx = job->base.s_fence ?
+ job->base.s_fence->finished.context : 0;
+ shadow_va = job->shadow_va;
+ csa_va = job->csa_va;
+ gds_va = job->gds_va;
+ init_shadow = job->init_shadow;
+ af = job->hw_fence;
+ /* Save the context of the job for reset handling.
+ * The driver needs this so it can skip the ring
+ * contents for guilty contexts.
+ */
+ af->context = fence_ctx;
+ /* the vm fence is also part of the job's context */
+ job->hw_vm_fence->context = fence_ctx;
if (!ring->sched.ready) {
dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", ring->name);
- r = -EINVAL;
- goto free_fence;
+ return -EINVAL;
}
if (vm && !job->vmid) {
dev_err(adev->dev, "VM IB without ID\n");
- r = -EINVAL;
- goto free_fence;
+ return -EINVAL;
}
if ((ib->flags & AMDGPU_IB_FLAGS_SECURE) &&
(!ring->funcs->secure_submission_supported)) {
dev_err(adev->dev, "secure submissions not supported on ring <%s>\n", ring->name);
- r = -EINVAL;
- goto free_fence;
+ return -EINVAL;
}
- alloc_size = ring->funcs->emit_frame_size + num_ibs *
+ alloc_size = ring->funcs->emit_frame_size + job->num_ibs *
ring->funcs->emit_ib_size;
r = amdgpu_ring_alloc(ring, alloc_size);
if (r) {
dev_err(adev->dev, "scheduling IB failed (%d).\n", r);
- goto free_fence;
+ return r;
}
need_ctx_switch = ring->current_ctx != fence_ctx;
@@ -225,12 +209,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
if (ring->funcs->insert_start)
ring->funcs->insert_start(ring);
- if (job) {
- r = amdgpu_vm_flush(ring, job, need_pipe_sync);
- if (r) {
- amdgpu_ring_undo(ring);
- return r;
- }
+ r = amdgpu_vm_flush(ring, job, need_pipe_sync);
+ if (r) {
+ amdgpu_ring_undo(ring);
+ return r;
}
amdgpu_ring_ib_begin(ring);
@@ -248,7 +230,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
if (need_ctx_switch)
status |= AMDGPU_HAVE_CTX_SWITCH;
- if (job && ring->funcs->emit_cntxcntl) {
+ if (ring->funcs->emit_cntxcntl) {
status |= job->preamble_status;
status |= job->preemption_status;
amdgpu_ring_emit_cntxcntl(ring, status);
@@ -257,15 +239,15 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
/* Setup initial TMZiness and send it off.
*/
secure = false;
- if (job && ring->funcs->emit_frame_cntl) {
+ if (ring->funcs->emit_frame_cntl) {
secure = ib->flags & AMDGPU_IB_FLAGS_SECURE;
amdgpu_ring_emit_frame_cntl(ring, true, secure);
}
- for (i = 0; i < num_ibs; ++i) {
- ib = &ibs[i];
+ for (i = 0; i < job->num_ibs; ++i) {
+ ib = &job->ibs[i];
- if (job && ring->funcs->emit_frame_cntl) {
+ if (ring->funcs->emit_frame_cntl) {
if (secure != !!(ib->flags & AMDGPU_IB_FLAGS_SECURE)) {
amdgpu_ring_emit_frame_cntl(ring, false, secure);
secure = !secure;
@@ -277,7 +259,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
status &= ~AMDGPU_HAVE_CTX_SWITCH;
}
- if (job && ring->funcs->emit_frame_cntl)
+ if (ring->funcs->emit_frame_cntl)
amdgpu_ring_emit_frame_cntl(ring, false, secure);
amdgpu_device_invalidate_hdp(adev, ring);
@@ -286,7 +268,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
fence_flags |= AMDGPU_FENCE_FLAG_TC_WB_ONLY;
/* wrap the last IB with fence */
- if (job && job->uf_addr) {
+ if (job->uf_addr) {
amdgpu_ring_emit_fence(ring, job->uf_addr, job->uf_sequence,
fence_flags | AMDGPU_FENCE_FLAG_64BIT);
}
@@ -299,15 +281,14 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
r = amdgpu_fence_emit(ring, af, fence_flags);
if (r) {
dev_err(adev->dev, "failed to emit fence (%d)\n", r);
- if (job && job->vmid)
+ if (job->vmid)
amdgpu_vmid_reset(adev, ring->vm_hub, job->vmid);
amdgpu_ring_undo(ring);
- goto free_fence;
+ return r;
}
*f = &af->base;
/* get a ref for the job */
- if (job)
- dma_fence_get(*f);
+ dma_fence_get(*f);
if (ring->funcs->insert_end)
ring->funcs->insert_end(ring);
@@ -315,7 +296,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
amdgpu_ring_patch_cond_exec(ring, cond_exec);
ring->current_ctx = fence_ctx;
- if (job && ring->funcs->emit_switch_buffer)
+ if (ring->funcs->emit_switch_buffer)
amdgpu_ring_emit_switch_buffer(ring);
if (ring->funcs->emit_wave_limit &&
@@ -334,11 +315,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
amdgpu_ring_commit(ring);
return 0;
-
-free_fence:
- if (!job)
- kfree(af);
- return r;
}
/**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 7f5d01164897f..d94b85e4e28a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -367,7 +367,7 @@ int amdgpu_job_submit_direct(struct amdgpu_job *job, struct amdgpu_ring *ring,
int r;
job->base.sched = &ring->sched;
- r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job, fence);
+ r = amdgpu_ib_schedule(ring, job, fence);
if (r)
return r;
@@ -437,8 +437,7 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
dev_dbg(adev->dev, "Skip scheduling IBs in ring(%s)",
ring->name);
} else {
- r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job,
- &fence);
+ r = amdgpu_ib_schedule(ring, job, &fence);
if (r)
dev_err(adev->dev,
"Error scheduling IBs (%d) in ring(%s)", r,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 87c9df6c2ecfe..cf56babb2527a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -569,8 +569,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
enum amdgpu_ib_pool_type pool,
struct amdgpu_ib *ib);
void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f);
-int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
- struct amdgpu_ib *ibs, struct amdgpu_job *job,
+int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
struct dma_fence **f);
int amdgpu_ib_pool_init(struct amdgpu_device *adev);
void amdgpu_ib_pool_fini(struct amdgpu_device *adev);
--
2.52.0
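[Editorial note: the payoff of requiring a job in amdgpu_ib_schedule() is that the function no longer allocates a fence for the jobless path, so the `free_fence` goto label and its conditional kfree() collapse into plain early returns. A stand-in sketch of the simplified control flow, with illustrative names only:]

```c
/* Stand-in types; the real fence is owned by the job, which is
 * what makes the early returns safe (nothing to free here). */
struct sim_fence { int emitted; };
struct sim_sched_job { struct sim_fence *hw_fence; int num_ibs; };

/* With a job required, every error path is a plain return: the
 * function never owns an allocation it would need to clean up. */
static int schedule_job(struct sim_sched_job *job)
{
	if (!job || job->num_ibs == 0)
		return -22;		/* -EINVAL, no cleanup needed */
	job->hw_fence->emitted = 1;	/* stands in for fence emit */
	return 0;
}

/* Build a one-IB job and schedule it. */
static int schedule_demo(void)
{
	struct sim_fence f = { 0 };
	struct sim_sched_job job = { &f, 1 };

	if (schedule_job(&job))
		return -1;
	return f.emitted;
}
```

[In the removed code, the jobless branch kzalloc'd the fence itself, which is why every early exit had to funnel through a label that conditionally freed it.]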
^ permalink raw reply related [flat|nested] 66+ messages in thread

* [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (25 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 26/42] drm/amdgpu: require a job to schedule an IB Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-13 13:12 ` Christian König
2026-01-08 14:48 ` [PATCH 28/42] drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion() Alex Deucher
` (15 subsequent siblings)
42 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Mark fences with errors before we reset the rings as
we may end up signalling fences as part of the reset
sequence. The error needs to be set before the fence
is signalled.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 600e6bb98af7a..5defdebd7091e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -872,6 +872,10 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
drm_sched_wqueue_stop(&ring->sched);
/* back up the non-guilty commands */
amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
+ /* signal the guilty fence and set an error on all fences from the context */
+ if (guilty_fence)
+ amdgpu_fence_driver_guilty_force_completion(guilty_fence);
+
}
int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
@@ -885,9 +889,6 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
if (r)
return r;
- /* signal the guilty fence and set an error on all fences from the context */
- if (guilty_fence)
- amdgpu_fence_driver_guilty_force_completion(guilty_fence);
/* Re-emit the non-guilty commands */
if (ring->ring_backup_entries_to_copy) {
amdgpu_ring_alloc_reemit(ring, ring->ring_backup_entries_to_copy);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset
2026-01-08 14:48 ` [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset Alex Deucher
@ 2026-01-13 13:12 ` Christian König
2026-01-13 15:39 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Christian König @ 2026-01-13 13:12 UTC (permalink / raw)
To: Alex Deucher, amd-gfx
On 1/8/26 15:48, Alex Deucher wrote:
> Mark fences with errors before we reset the rings as
> we may end up signalling fences as part of the reset
> sequence. The error needs to be set before the fence
> is signalled.
Setting the error is a good idea, but signaling the fence before the reset is clearly a NAK.
Fence signaling can only happen after we are sure that the DMA operation has been canceled.
Regards,
Christian.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 600e6bb98af7a..5defdebd7091e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -872,6 +872,10 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
> drm_sched_wqueue_stop(&ring->sched);
> /* back up the non-guilty commands */
> amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
> + /* signal the guilty fence and set an error on all fences from the context */
> + if (guilty_fence)
> + amdgpu_fence_driver_guilty_force_completion(guilty_fence);
> +
> }
>
> int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> @@ -885,9 +889,6 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> if (r)
> return r;
>
> - /* signal the guilty fence and set an error on all fences from the context */
> - if (guilty_fence)
> - amdgpu_fence_driver_guilty_force_completion(guilty_fence);
> /* Re-emit the non-guilty commands */
> if (ring->ring_backup_entries_to_copy) {
> amdgpu_ring_alloc_reemit(ring, ring->ring_backup_entries_to_copy);
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset
2026-01-13 13:12 ` Christian König
@ 2026-01-13 15:39 ` Alex Deucher
2026-01-13 21:23 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 15:39 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 8:13 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 1/8/26 15:48, Alex Deucher wrote:
> > Mark fences with errors before we reset the rings as
> > we may end up signalling fences as part of the reset
> > sequence. The error needs to be set before the fence
> > is signalled.
>
> Setting the error is a good idea, but signaling the fence before the reset is clearly a NAK.
>
> Fence signaling can only happen after we are sure that the DMA operation has been canceled.
This function doesn't actually signal any fences any more. It just
sets errors on the fences. That's why I renamed it in the next patch.
Alex
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > index 600e6bb98af7a..5defdebd7091e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -872,6 +872,10 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
> > drm_sched_wqueue_stop(&ring->sched);
> > /* back up the non-guilty commands */
> > amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
> > + /* signal the guilty fence and set an error on all fences from the context */
> > + if (guilty_fence)
> > + amdgpu_fence_driver_guilty_force_completion(guilty_fence);
> > +
> > }
> >
> > int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> > @@ -885,9 +889,6 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> > if (r)
> > return r;
> >
> > - /* signal the guilty fence and set an error on all fences from the context */
> > - if (guilty_fence)
> > - amdgpu_fence_driver_guilty_force_completion(guilty_fence);
> > /* Re-emit the non-guilty commands */
> > if (ring->ring_backup_entries_to_copy) {
> > amdgpu_ring_alloc_reemit(ring, ring->ring_backup_entries_to_copy);
>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset
2026-01-13 15:39 ` Alex Deucher
@ 2026-01-13 21:23 ` Alex Deucher
0 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 21:23 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 10:39 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Jan 13, 2026 at 8:13 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > On 1/8/26 15:48, Alex Deucher wrote:
> > > Mark fences with errors before we reset the rings as
> > > we may end up signalling fences as part of the reset
> > > sequence. The error needs to be set before the fence
> > > is signalled.
> >
> > Setting the error is a good idea, but signaling the fence before the reset is clearly a NAK.
> >
> > Fence signaling can only happen after we are sure that the DMA operation has been canceled.
>
> This function doesn't actually signal any fences any more. It just
> sets errors on the fences. That's why I renamed it in the next patch.
I've reordered these to make that clearer.
Alex
>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 7 ++++---
> > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > > index 600e6bb98af7a..5defdebd7091e 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > > @@ -872,6 +872,10 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
> > > drm_sched_wqueue_stop(&ring->sched);
> > > /* back up the non-guilty commands */
> > > amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
> > > + /* signal the guilty fence and set an error on all fences from the context */
> > > + if (guilty_fence)
> > > + amdgpu_fence_driver_guilty_force_completion(guilty_fence);
> > > +
> > > }
> > >
> > > int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> > > @@ -885,9 +889,6 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> > > if (r)
> > > return r;
> > >
> > > - /* signal the guilty fence and set an error on all fences from the context */
> > > - if (guilty_fence)
> > > - amdgpu_fence_driver_guilty_force_completion(guilty_fence);
> > > /* Re-emit the non-guilty commands */
> > > if (ring->ring_backup_entries_to_copy) {
> > > amdgpu_ring_alloc_reemit(ring, ring->ring_backup_entries_to_copy);
> >
^ permalink raw reply [flat|nested] 66+ messages in thread
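[Editorial sketch] The ordering constraint debated in this thread -- an error attached to a fence is only visible to waiters if it is set before the fence signals -- can be modeled with a toy fence. All names below (`toy_fence`, `toy_fence_set_error`, `toy_fence_signal`) are hypothetical stand-ins, not the real dma_fence/amdgpu API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fence; illustrative only, not the dma_fence API. */
struct toy_fence {
	bool signaled;
	int error;	/* 0 or a negative errno, e.g. -ECANCELED */
};

/* Setting an error after the fence has signaled is too late: waiters
 * may already have observed success.  Model that case as a failure. */
static bool toy_fence_set_error(struct toy_fence *f, int error)
{
	if (f->signaled)
		return false;
	f->error = error;
	return true;
}

/* Waiters observe whatever error was set before the signal. */
static int toy_fence_signal(struct toy_fence *f)
{
	f->signaled = true;
	return f->error;
}
```

Moving the error-setting into the reset "begin" helper, as the patch does, amounts to guaranteeing the equivalent of toy_fence_set_error() runs before any path that can reach toy_fence_signal().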
* [PATCH 28/42] drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (26 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 27/42] drm/amdgpu: mark fences with errors before ring reset Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset Alex Deucher
` (14 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
The function no longer signals the fence, so rename it to
better match what it does.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3a23cce5f769a..6f37fc45458a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -709,12 +709,12 @@ void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
*/
/**
- * amdgpu_fence_driver_guilty_force_completion - force signal of specified sequence
+ * amdgpu_fence_driver_update_timedout_fence_state - Update fence state and set errors
*
- * @af: fence of the ring to signal
+ * @af: fence of the ring to update
*
*/
-void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
+void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af)
{
struct dma_fence *unprocessed;
struct dma_fence __rcu **ptr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 5defdebd7091e..b03e3f5d40000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -874,7 +874,7 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
/* signal the guilty fence and set an error on all fences from the context */
if (guilty_fence)
- amdgpu_fence_driver_guilty_force_completion(guilty_fence);
+ amdgpu_fence_driver_update_timedout_fence_state(guilty_fence);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index cf56babb2527a..86a788d476957 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -161,7 +161,7 @@ extern const struct drm_sched_backend_ops amdgpu_sched_ops;
void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error);
void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
-void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af);
+void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af);
void amdgpu_fence_save_wptr(struct amdgpu_fence *af);
int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring);
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (27 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 28/42] drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion() Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-13 13:17 ` Christian König
2026-01-08 14:48 ` [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma() Alex Deucher
` (13 subsequent siblings)
42 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
We only want to stop the work queues, not mess with the
pending list so just stop the work queues.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 80572f71ff627..868ab5314c0d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
if (!amdgpu_ring_sched_ready(ring))
continue;
- drm_sched_stop(&ring->sched, job ? &job->base : NULL);
+ drm_sched_wqueue_stop(&ring->sched);
if (need_emergency_restart)
amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
@@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
if (!amdgpu_ring_sched_ready(ring))
continue;
- drm_sched_start(&ring->sched, 0);
+ drm_sched_wqueue_start(&ring->sched);
}
if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
2026-01-08 14:48 ` [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset Alex Deucher
@ 2026-01-13 13:17 ` Christian König
2026-01-13 13:34 ` Philipp Stanner
0 siblings, 1 reply; 66+ messages in thread
From: Christian König @ 2026-01-13 13:17 UTC (permalink / raw)
To: Alex Deucher, amd-gfx, Philipp Stanner
On 1/8/26 15:48, Alex Deucher wrote:
> We only want to stop the work queues, not mess with the
> pending list so just stop the work queues.
Oh, yes please! I can't remember how long we have worked towards that.
But we also need to change the return code so that the scheduler now re-inserts the job into the pending list.
Adding Philip on CC to double check what I say above.
Regards,
Christian.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 80572f71ff627..868ab5314c0d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
> if (!amdgpu_ring_sched_ready(ring))
> continue;
>
> - drm_sched_stop(&ring->sched, job ? &job->base : NULL);
> + drm_sched_wqueue_stop(&ring->sched);
>
> if (need_emergency_restart)
> amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
> @@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
> if (!amdgpu_ring_sched_ready(ring))
> continue;
>
> - drm_sched_start(&ring->sched, 0);
> + drm_sched_wqueue_start(&ring->sched);
> }
>
> if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
2026-01-13 13:17 ` Christian König
@ 2026-01-13 13:34 ` Philipp Stanner
2026-01-13 14:37 ` Christian König
0 siblings, 1 reply; 66+ messages in thread
From: Philipp Stanner @ 2026-01-13 13:34 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx, Philipp Stanner
On Tue, 2026-01-13 at 14:17 +0100, Christian König wrote:
> On 1/8/26 15:48, Alex Deucher wrote:
> > We only want to stop the work queues, not mess with the
> > pending list so just stop the work queues.
Ideally amdgpu could stop touching the pending_list altogether forever,
as discussed at XDC. Is work for that in the pipe? Is that what this
patch is for?
>
> Oh, yes please! I can't remember how long we have worked towards that.
>
> But we also need to change the return code so that the scheduler now re-inserts the job into the pending list.
You're referring to false-positive timeouts. Porting users to that
typically consists of adding that return code and also removing
whatever the driver used to do to inject the non-timedout job into the
scheduler again.
How is that being done here?
P.
>
> Adding Philip on CC to double check what I say above.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 80572f71ff627..868ab5314c0d1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
> > if (!amdgpu_ring_sched_ready(ring))
> > continue;
> >
> > - drm_sched_stop(&ring->sched, job ? &job->base : NULL);
> > + drm_sched_wqueue_stop(&ring->sched);
> >
> > if (need_emergency_restart)
> > amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
> > @@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
> > if (!amdgpu_ring_sched_ready(ring))
> > continue;
> >
> > - drm_sched_start(&ring->sched, 0);
> > + drm_sched_wqueue_start(&ring->sched);
> > }
> >
> > if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
2026-01-13 13:34 ` Philipp Stanner
@ 2026-01-13 14:37 ` Christian König
2026-01-13 15:16 ` Philipp Stanner
2026-01-13 16:46 ` Alex Deucher
0 siblings, 2 replies; 66+ messages in thread
From: Christian König @ 2026-01-13 14:37 UTC (permalink / raw)
To: phasta, Alex Deucher, amd-gfx
On 1/13/26 14:34, Philipp Stanner wrote:
> On Tue, 2026-01-13 at 14:17 +0100, Christian König wrote:
>> On 1/8/26 15:48, Alex Deucher wrote:
>>> We only want to stop the work queues, not mess with the
>>> pending list so just stop the work queues.
>
> Ideally amdgpu could stop touching the pending_list altogether forever,
> as discussed at XDC. Is work for that in the pipe? Is that what this
> patch is for?
Yes.
>
>>
>> Oh, yes please! I can't remember how long we have worked towards that.
>>
>> But we also need to change the return code so that the scheduler now re-inserts the job into the pending list.
>
> You're referring to false-positive timeouts. Porting users to that
> typically consists of adding that return code and also removing
> whatever the driver used to do to inject the non-timedout job into the
> scheduler again.
>
> How is that being done here?
Previously drm_sched_stop() would insert the job back into the pending list after stopping the scheduler thread.
But when that is replaced with drm_sched_wqueue_stop() then that won't happen any more. That is a good thing and prevents us from running into problems like UAF because the HW fence signaled.
As far as I can see we should start returning DRM_GPU_SCHED_STAT_NO_HANG from amdgpu even when there was actually a hang (maybe rename the return code).
Regards,
Christian.
>
> P.
>
>>
>> Adding Philip on CC to double check what I say above.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 80572f71ff627..868ab5314c0d1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
>>> if (!amdgpu_ring_sched_ready(ring))
>>> continue;
>>>
>>> - drm_sched_stop(&ring->sched, job ? &job->base : NULL);
>>> + drm_sched_wqueue_stop(&ring->sched);
>>>
>>> if (need_emergency_restart)
>>> amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
>>> @@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
>>> if (!amdgpu_ring_sched_ready(ring))
>>> continue;
>>>
>>> - drm_sched_start(&ring->sched, 0);
>>> + drm_sched_wqueue_start(&ring->sched);
>>> }
>>>
>>> if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
>>
>
^ permalink raw reply [flat|nested] 66+ messages in thread
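[Editorial sketch] The difference Christian describes can be modeled with a toy pending list: the old-style stop detaches the timed-out job (which the driver then had to re-insert), while a wqueue-style stop only parks the worker and leaves the list alone. Everything here (`toy_sched` and friends) is an illustrative sketch, not the drm_sched implementation:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Minimal model of a scheduler's pending list; names are illustrative. */
struct toy_sched {
	int pending[MAX_PENDING];	/* job ids still on the hardware */
	size_t count;
	int running;			/* 1 while the submission worker runs */
};

static void toy_submit(struct toy_sched *s, int job_id)
{
	s->pending[s->count++] = job_id;
}

/* Like drm_sched_wqueue_stop(): only parks the worker; the pending
 * list is untouched, so the driver never has to re-insert jobs. */
static void toy_wqueue_stop(struct toy_sched *s)
{
	s->running = 0;
}

/* Like the old drm_sched_stop() behavior under discussion: it also
 * detaches the timed-out job, which then had to be put back later. */
static void toy_sched_stop(struct toy_sched *s, int job_id)
{
	size_t i;

	s->running = 0;
	for (i = 0; i < s->count; i++) {
		if (s->pending[i] == job_id) {
			s->pending[i] = s->pending[--s->count];
			return;
		}
	}
}
```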
* Re: [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
2026-01-13 14:37 ` Christian König
@ 2026-01-13 15:16 ` Philipp Stanner
2026-01-13 16:46 ` Alex Deucher
1 sibling, 0 replies; 66+ messages in thread
From: Philipp Stanner @ 2026-01-13 15:16 UTC (permalink / raw)
To: Christian König, phasta, Alex Deucher, amd-gfx
On Tue, 2026-01-13 at 15:37 +0100, Christian König wrote:
> On 1/13/26 14:34, Philipp Stanner wrote:
> > On Tue, 2026-01-13 at 14:17 +0100, Christian König wrote:
> > > On 1/8/26 15:48, Alex Deucher wrote:
> > > > We only want to stop the work queues, not mess with the
> > > > pending list so just stop the work queues.
> >
> > Ideally amdgpu could stop touching the pending_list altogether forever,
> > as discussed at XDC. Is work for that in the pipe? Is that what this
> > patch is for?
>
> Yes.
Good!
>
> >
> > >
> > > Oh, yes please! I can't remember how long we have worked towards
> > > that.
> > >
> > > But we also need to change the return code so that the scheduler
> > > now re-inserts the job into the pending list.
> >
> > You're referring to false-positive timeouts. Porting users to that
> > typically consists of adding that return code and also removing
> > whatever the driver used to do to inject the non-timedout job into
> > the
> > scheduler again.
> >
> > How is that being done here?
>
> Previously drm_sched_stop() would insert the job back into the
> pending list after stopping the scheduler thread.
Why does it even do that?
The entire function drm_sched_stop() looks insane to me. I'm not even
sure we're talking about the same reinserting.
Why would anyone have any interest at all in keeping a broken job
in the pending_list?
>
> But when that is replaced with drm_sched_wqueue_stop() then that
> won't happen any more. That is a good thing and prevents us from
> running into problems like UAF because the HW fence signaled.
>
> As far as I can see we should start returning
> DRM_GPU_SCHED_STAT_NO_HANG from amdgpu even when there was actually a
> hang (maybe rename the return code).
Well, no. That's not how NO_HANG was designed.
Why would you want that?
P.
>
> Regards,
> Christian.
>
> >
> > P.
> >
> > >
> > > Adding Philip on CC to double check what I say above.
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > > ---
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > > index 80572f71ff627..868ab5314c0d1 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > > @@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
> > > > if (!amdgpu_ring_sched_ready(ring))
> > > > continue;
> > > >
> > > > - drm_sched_stop(&ring->sched, job ? &job->base : NULL);
> > > > + drm_sched_wqueue_stop(&ring->sched);
> > > >
> > > > if (need_emergency_restart)
> > > > amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
> > > > @@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
> > > > if (!amdgpu_ring_sched_ready(ring))
> > > > continue;
> > > >
> > > > - drm_sched_start(&ring->sched, 0);
> > > > + drm_sched_wqueue_start(&ring->sched);
> > > > }
> > > >
> > > > if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
> > >
> >
>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
2026-01-13 14:37 ` Christian König
2026-01-13 15:16 ` Philipp Stanner
@ 2026-01-13 16:46 ` Alex Deucher
1 sibling, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 16:46 UTC (permalink / raw)
To: Christian König; +Cc: phasta, Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 10:57 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 1/13/26 14:34, Philipp Stanner wrote:
> > On Tue, 2026-01-13 at 14:17 +0100, Christian König wrote:
> >> On 1/8/26 15:48, Alex Deucher wrote:
> >>> We only want to stop the work queues, not mess with the
> >>> pending list so just stop the work queues.
> >
> > Ideally amdgpu could stop touching the pending_list altogether forever,
> > as discussed at XDC. Is work for that in the pipe? Is that what this
> > patch is for?
>
> Yes.
>
> >
> >>
> >> Oh, yes please! I can't remember how long we have worked towards that.
> >>
> >> But we also need to change the return code so that the scheduler now re-inserts the job into the pending list.
> >
> > You're referring to false-positive timeouts. Porting users to that
> > typically consists of adding that return code and also removing
> > whatever the driver used to do to inject the non-timedout job into the
> > scheduler again.
> >
> > How is that being done here?
>
> Previously drm_sched_stop() would insert the job back into the pending list after stopping the scheduler thread.
>
> But when that is replaced with drm_sched_wqueue_stop() then that won't happen any more. That is a good thing and prevents us from running into problems like UAF because the HW fence signaled.
>
> As far as I can see we should start returning DRM_GPU_SCHED_STAT_NO_HANG from amdgpu even when there was actually a hang (maybe rename the return code).
>
We already return DRM_GPU_SCHED_STAT_NOMINAL unconditionally. The
only other option is DRM_GPU_SCHED_STAT_ENODEV, which is not correct.
As far as I can see, there is nothing else to do. The fence will be
signalled after the adapter reset.
Alex
> Regards,
> Christian.
>
> >
> > P.
> >
> >>
> >> Adding Philip on CC to double check what I say above.
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index 80572f71ff627..868ab5314c0d1 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
> >>> if (!amdgpu_ring_sched_ready(ring))
> >>> continue;
> >>>
> >>> - drm_sched_stop(&ring->sched, job ? &job->base : NULL);
> >>> + drm_sched_wqueue_stop(&ring->sched);
> >>>
> >>> if (need_emergency_restart)
> >>> amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
> >>> @@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
> >>> if (!amdgpu_ring_sched_ready(ring))
> >>> continue;
> >>>
> >>> - drm_sched_start(&ring->sched, 0);
> >>> + drm_sched_wqueue_start(&ring->sched);
> >>> }
> >>>
> >>> if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
> >>
> >
>
^ permalink raw reply [flat|nested] 66+ messages in thread
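[Editorial sketch] The return-code contract being debated above can be sketched as a toy timeout handler; the enum values are hypothetical stand-ins for the drm_gpu_sched_stat codes named in the thread (NOMINAL, ENODEV, NO_HANG):

```c
#include <assert.h>
#include <stdbool.h>

enum toy_sched_stat {
	TOY_STAT_NOMINAL,	/* reset handled; scheduler keeps running */
	TOY_STAT_ENODEV,	/* device is gone; stop scheduling */
	TOY_STAT_NO_HANG,	/* false positive; re-insert the job */
};

/* Sketch of a timeout handler choosing a status.  Per the thread,
 * amdgpu today returns NOMINAL unconditionally; Christian's suggestion
 * amounts to returning NO_HANG so the scheduler re-inserts the job
 * into the pending list instead of the driver doing it by hand. */
static enum toy_sched_stat toy_job_timedout(bool device_present, bool real_hang)
{
	if (!device_present)
		return TOY_STAT_ENODEV;
	if (!real_hang)
		return TOY_STAT_NO_HANG;
	return TOY_STAT_NOMINAL;
}
```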
* [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma()
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (28 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-13 13:22 ` Christian König
2026-01-08 14:48 ` [PATCH 31/42] drm/amdgpu: plumb timedout fence through to force completion Alex Deucher
` (12 subsequent siblings)
42 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
It was leftover from when the driver supported drm sched
resubmit. That was dropped long ago, so drop this as well.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 868ab5314c0d1..c9954dd8d83c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5808,9 +5808,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
amdgpu_fence_driver_isr_toggle(adev, false);
- if (job && job->vm)
- drm_sched_increase_karma(&job->base);
-
r = amdgpu_reset_prepare_hwcontext(adev, reset_context);
/* If reset handler not implemented, continue; otherwise return */
if (r == -EOPNOTSUPP)
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma()
2026-01-08 14:48 ` [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma() Alex Deucher
@ 2026-01-13 13:22 ` Christian König
2026-01-13 21:27 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Christian König @ 2026-01-13 13:22 UTC (permalink / raw)
To: Alex Deucher, amd-gfx
On 1/8/26 15:48, Alex Deucher wrote:
> It was leftover from when the driver supported drm sched
> resubmit. That was dropped long ago, so drop this as well.
We unfortunately still need that to update the guilty flag in the context so that amdgpu_ctx_query2() works correctly.
But we could change the code in amdgpu_ctx_query2() to check the individual entities for error codes instead.
Regards,
Christian.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 868ab5314c0d1..c9954dd8d83c8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5808,9 +5808,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>
> amdgpu_fence_driver_isr_toggle(adev, false);
>
> - if (job && job->vm)
> - drm_sched_increase_karma(&job->base);
> -
> r = amdgpu_reset_prepare_hwcontext(adev, reset_context);
> /* If reset handler not implemented, continue; otherwise return */
> if (r == -EOPNOTSUPP)
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma()
2026-01-13 13:22 ` Christian König
@ 2026-01-13 21:27 ` Alex Deucher
2026-01-13 21:45 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 21:27 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 8:42 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 1/8/26 15:48, Alex Deucher wrote:
> > It was leftover from when the driver supported drm sched
> > resubmit. That was dropped long ago, so drop this as well.
>
> We unfortunately still need that to update the guilty flag in the context so that amdgpu_ctx_query2() works correctly.
I don't think it matters? We don't call this for per queue resets and
the errors seem to make their way up to userspace properly. Maybe it
would be better to move drm_sched_increase_karma() into
amdgpu_job_timedout() so it covers both queue resets and adapter
resets.
Alex
>
> But we could change the code in amdgpu_ctx_query2() to check the individual entities for error codes instead.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 868ab5314c0d1..c9954dd8d83c8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -5808,9 +5808,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
> >
> > amdgpu_fence_driver_isr_toggle(adev, false);
> >
> > - if (job && job->vm)
> > - drm_sched_increase_karma(&job->base);
> > -
> > r = amdgpu_reset_prepare_hwcontext(adev, reset_context);
> > /* If reset handler not implemented, continue; otherwise return */
> > if (r == -EOPNOTSUPP)
>
* Re: [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma()
2026-01-13 21:27 ` Alex Deucher
@ 2026-01-13 21:45 ` Alex Deucher
0 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 21:45 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 4:27 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Jan 13, 2026 at 8:42 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > On 1/8/26 15:48, Alex Deucher wrote:
> > > It was leftover from when the driver supported drm sched
> > > resubmit. That was dropped long ago, so drop this as well.
> >
> > We unfortunately still need that to update the guilty flag in the context so that amdgpu_ctx_query2() works correctly.
>
> I don't think it matters? We don't call this for per queue resets and
> the errors seem to make their way up to userspace properly. Maybe it
> would be better to move drm_sched_increase_karma() into
> amdgpu_job_timedout() so it covers both queue resets and adapter
> resets.
Calling drm_sched_increase_karma() appears to not do the right thing.
If I keep it in place, the context always shows up as innocent. If I
move it up to amdgpu_job_timedout(), even per queue reset contexts
show up as innocent. The behavior is better with it removed.
Alex
>
> Alex
>
> >
> > But we could change the code in amdgpu_ctx_query2() to check the individual entities for error codes instead.
> >
> > Regards,
> > Christian.
> >
> > >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
> > > 1 file changed, 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 868ab5314c0d1..c9954dd8d83c8 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -5808,9 +5808,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
> > >
> > > amdgpu_fence_driver_isr_toggle(adev, false);
> > >
> > > - if (job && job->vm)
> > > - drm_sched_increase_karma(&job->base);
> > > -
> > > r = amdgpu_reset_prepare_hwcontext(adev, reset_context);
> > > /* If reset handler not implemented, continue; otherwise return */
> > > if (r == -EOPNOTSUPP)
> >
* [PATCH 31/42] drm/amdgpu: plumb timedout fence through to force completion
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (29 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 30/42] drm/amdgpu: drop drm_sched_increase_karma() Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 32/42] drm/amdgpu: change function signature for emit_pipeline_sync() Alex Deucher
` (11 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
When we do a full adapter reset, if we know the timed out
fence, mark it with -ETIME rather than -ECANCELED so it
gets properly handled by userspace.
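The error-selection policy this patch introduces can be modeled with a minimal standalone sketch (the `fence` struct and names below are illustrative stand-ins, not the real `struct dma_fence` API): the fence known to have timed out is marked -ETIME, while every other still-pending fence on the ring is cancelled with -ECANCELED.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct dma_fence: tracks only an error and a
 * signaled flag.  Names here are illustrative, not the kernel API. */
struct fence {
	int error;
	int signaled;
};

#define ETIME_ERR     (-62)	/* stands in for -ETIME */
#define ECANCELED_ERR (-125)	/* stands in for -ECANCELED */

/* Model of the force-completion policy from this patch: the fence that
 * actually timed out gets -ETIME; every other unsignaled fence is
 * cancelled with -ECANCELED; already-signaled fences are untouched. */
static void force_completion(struct fence *fences, size_t n,
			     struct fence *timedout)
{
	for (size_t i = 0; i < n; ++i) {
		if (fences[i].signaled)
			continue;
		fences[i].error = (&fences[i] == timedout) ?
				  ETIME_ERR : ECANCELED_ERR;
		fences[i].signaled = 1;
	}
}
```

Passing NULL (no known timed out fence) degenerates to the old behavior where everything pending is simply cancelled.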
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 28 +++++++++++++++++----
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 21 ++++++++++------
7 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1f3e52637326b..e36c8e3cfb0f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1960,7 +1960,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
/* swap out the old fences */
amdgpu_ib_preempt_fences_swap(ring, fences);
- amdgpu_fence_driver_force_completion(ring);
+ amdgpu_fence_driver_force_completion(ring, NULL);
/* resubmit unfinished jobs */
amdgpu_ib_preempt_job_recovery(&ring->sched);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c9954dd8d83c8..d77c3e6552a8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5783,6 +5783,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
{
int i, r = 0;
struct amdgpu_job *job = NULL;
+ struct dma_fence *fence = NULL;
struct amdgpu_device *tmp_adev = reset_context->reset_req_dev;
bool need_full_reset =
test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
@@ -5795,6 +5796,9 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
amdgpu_fence_driver_isr_toggle(adev, true);
+ if (job)
+ fence = &job->hw_fence->base;
+
/* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -5803,7 +5807,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
continue;
/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
- amdgpu_fence_driver_force_completion(ring);
+ amdgpu_fence_driver_force_completion(ring, fence);
}
amdgpu_fence_driver_isr_toggle(adev, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 6f37fc45458a3..b1cf9550c259b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -568,7 +568,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
r = -ENODEV;
/* no need to trigger GPU reset as we are unloading */
if (r)
- amdgpu_fence_driver_force_completion(ring);
+ amdgpu_fence_driver_force_completion(ring, NULL);
if (!drm_dev_is_unplugged(adev_to_drm(adev)) &&
ring->fence_drv.irq_src &&
@@ -683,16 +683,34 @@ void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error)
* amdgpu_fence_driver_force_completion - force signal latest fence of ring
*
* @ring: fence of the ring to signal
+ * @timedout_fence: fence of the timedout job
*
*/
-void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
+void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring,
+ struct dma_fence *timedout_fence)
{
- amdgpu_fence_driver_set_error(ring, -ECANCELED);
+ struct amdgpu_fence_driver *drv = &ring->fence_drv;
+ unsigned long flags;
+
+ spin_lock_irqsave(&drv->lock, flags);
+ for (unsigned int i = 0; i <= drv->num_fences_mask; ++i) {
+ struct dma_fence *fence;
+
+ fence = rcu_dereference_protected(drv->fences[i],
+ lockdep_is_held(&drv->lock));
+ if (fence && !dma_fence_is_signaled_locked(fence)) {
+ if (fence == timedout_fence)
+ dma_fence_set_error(fence, -ETIME);
+ else
+ dma_fence_set_error(fence, -ECANCELED);
+ }
+ }
+ spin_unlock_irqrestore(&drv->lock, flags);
+
amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
amdgpu_fence_process(ring);
}
-
/*
* Kernel queue reset handling
*
@@ -753,7 +771,7 @@ void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af)
if (reemitted) {
/* if we've already reemitted once then just cancel everything */
- amdgpu_fence_driver_force_completion(af->ring);
+ amdgpu_fence_driver_force_completion(af->ring, &af->base);
af->ring->ring_backup_entries_to_copy = 0;
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 86a788d476957..ce095427611fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -160,7 +160,8 @@ struct amdgpu_fence {
extern const struct drm_sched_backend_ops amdgpu_sched_ops;
void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error);
-void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring,
+ struct dma_fence *timedout_fence);
void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af);
void amdgpu_fence_save_wptr(struct amdgpu_fence *af);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
index 8b8a04138711c..c270a40de5e5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
@@ -597,10 +597,10 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id,
* to be submitted to the queues after the reset is complete.
*/
if (!ret) {
- amdgpu_fence_driver_force_completion(gfx_ring);
+ amdgpu_fence_driver_force_completion(gfx_ring, NULL);
drm_sched_wqueue_start(&gfx_ring->sched);
if (adev->sdma.has_page_queue) {
- amdgpu_fence_driver_force_completion(page_ring);
+ amdgpu_fence_driver_force_completion(page_ring, NULL);
drm_sched_wqueue_start(&page_ring->sched);
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 9d5cca7da1d9e..3a3bc0d370fa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -512,7 +512,7 @@ int amdgpu_uvd_resume(struct amdgpu_device *adev)
}
memset_io(ptr, 0, size);
/* to restore uvd fence seq */
- amdgpu_fence_driver_force_completion(&adev->uvd.inst[i].ring);
+ amdgpu_fence_driver_force_completion(&adev->uvd.inst[i].ring, NULL);
}
}
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 75ae9b429420e..d22c8980fa42b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -1482,15 +1482,16 @@ int vcn_set_powergating_state(struct amdgpu_ip_block *ip_block,
/**
* amdgpu_vcn_reset_engine - Reset a specific VCN engine
- * @adev: Pointer to the AMDGPU device
- * @instance_id: VCN engine instance to reset
+ * @ring: Pointer to the VCN ring
+ * @timedout_fence: fence that timed out
*
* Returns: 0 on success, or a negative error code on failure.
*/
-static int amdgpu_vcn_reset_engine(struct amdgpu_device *adev,
- uint32_t instance_id)
+static int amdgpu_vcn_reset_engine(struct amdgpu_ring *ring,
+ struct amdgpu_fence *timedout_fence)
{
- struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[instance_id];
+ struct amdgpu_device *adev = ring->adev;
+ struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
int r, i;
mutex_lock(&vinst->engine_reset_mutex);
@@ -1514,9 +1515,13 @@ static int amdgpu_vcn_reset_engine(struct amdgpu_device *adev,
if (r)
goto unlock;
}
- amdgpu_fence_driver_force_completion(&vinst->ring_dec);
+ amdgpu_fence_driver_force_completion(&vinst->ring_dec,
+ (&vinst->ring_dec == ring) ?
+ &timedout_fence->base : NULL);
for (i = 0; i < vinst->num_enc_rings; i++)
- amdgpu_fence_driver_force_completion(&vinst->ring_enc[i]);
+ amdgpu_fence_driver_force_completion(&vinst->ring_enc[i],
+ (&vinst->ring_enc[i] == ring) ?
+ &timedout_fence->base : NULL);
/* Restart the scheduler's work queue for the dec and enc rings
* if they were stopped by this function. This allows new tasks
@@ -1552,7 +1557,7 @@ int amdgpu_vcn_ring_reset(struct amdgpu_ring *ring,
if (adev->vcn.inst[ring->me].using_unified_queue)
return -EINVAL;
- return amdgpu_vcn_reset_engine(adev, ring->me);
+ return amdgpu_vcn_reset_engine(ring, timedout_fence);
}
int amdgpu_vcn_reg_dump_init(struct amdgpu_device *adev,
--
2.52.0
* [PATCH 32/42] drm/amdgpu: change function signature for emit_pipeline_sync()
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (30 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 31/42] drm/amdgpu: plumb timedout fence through to force completion Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 33/42] drm/amdgpu: drop extra parameter for vm_flush Alex Deucher
` (10 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Pass the seq as a parameter. No intended functional change.
This paves the way for future improvements to queue reset
handling by making the sync point explicit rather than implicit.
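The difference can be sketched with a toy model (the `toy_ring` struct and function names below are illustrative assumptions, not the real amdgpu types): a pipeline sync stalls the ring until the fence memory reaches a sequence number, and this patch moves that number from an implicit read of `ring->fence_drv.sync_seq` to an explicit parameter chosen by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a ring's fence state; names are illustrative only. */
struct toy_ring {
	uint32_t completed_seq;	/* last seq the HW has signaled */
	uint32_t sync_seq;	/* last seq emitted on the ring */
};

/* Old style: the wait target is read implicitly from the ring, so the
 * sync always waits for the latest emitted fence.  The signed-delta
 * compare handles 32-bit sequence wraparound. */
static int sync_reached_implicit(const struct toy_ring *ring)
{
	return (int32_t)(ring->completed_seq - ring->sync_seq) >= 0;
}

/* New style (this patch): the caller names the sync point, so a reset
 * path can wait on a specific fence instead of whatever happens to be
 * the latest. */
static int sync_reached_explicit(const struct toy_ring *ring, uint32_t seq)
{
	return (int32_t)(ring->completed_seq - seq) >= 0;
}
```

With the existing callers passing `ring->fence_drv.sync_seq`, the two forms are equivalent, which is why there is no functional change here.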
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/si_dma.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 ++++----
drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 4 ++--
25 files changed, 63 insertions(+), 50 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index ce095427611fb..da437c349aab9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -252,7 +252,7 @@ struct amdgpu_ring_funcs {
uint32_t flags);
void (*emit_fence)(struct amdgpu_ring *ring, uint64_t addr,
uint64_t seq, unsigned flags);
- void (*emit_pipeline_sync)(struct amdgpu_ring *ring);
+ void (*emit_pipeline_sync)(struct amdgpu_ring *ring, u32 seq);
void (*emit_vm_flush)(struct amdgpu_ring *ring, unsigned vmid,
uint64_t pd_addr);
void (*emit_hdp_flush)(struct amdgpu_ring *ring);
@@ -436,7 +436,7 @@ struct amdgpu_ring {
#define amdgpu_ring_get_wptr(r) (r)->funcs->get_wptr((r))
#define amdgpu_ring_set_wptr(r) (r)->funcs->set_wptr((r))
#define amdgpu_ring_emit_ib(r, job, ib, flags) ((r)->funcs->emit_ib((r), (job), (ib), (flags)))
-#define amdgpu_ring_emit_pipeline_sync(r) (r)->funcs->emit_pipeline_sync((r))
+#define amdgpu_ring_emit_pipeline_sync(r, s) (r)->funcs->emit_pipeline_sync((r), (s))
#define amdgpu_ring_emit_vm_flush(r, vmid, addr) (r)->funcs->emit_vm_flush((r), (vmid), (addr))
#define amdgpu_ring_emit_fence(r, addr, seq, flags) (r)->funcs->emit_fence((r), (addr), (seq), (flags))
#define amdgpu_ring_emit_gds_switch(r, v, db, ds, wb, ws, ab, as) (r)->funcs->emit_gds_switch((r), (v), (db), (ds), (wb), (ws), (ab), (as))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0eccb31793ca7..c05a9f80053d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -820,7 +820,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
ring->cond_exe_gpu_addr);
if (need_pipe_sync)
- amdgpu_ring_emit_pipeline_sync(ring);
+ amdgpu_ring_emit_pipeline_sync(ring, ring->fence_drv.sync_seq);
if (cleaner_shader_needed)
ring->funcs->emit_cleaner_shader(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 9fb1946be1ba2..54ee78c034cdb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -564,9 +564,9 @@ static void vpe_ring_emit_fence(struct amdgpu_ring *ring, uint64_t addr,
}
-static void vpe_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void vpe_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
vpe_ring_emit_pred_exec(ring, 0, 6);
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index e2ca96f5a7cfb..21b3a815bf2a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -821,12 +821,13 @@ static void cik_sdma_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib)
* cik_sdma_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void cik_sdma_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void cik_sdma_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 496121bdc1de1..e0e125eef9ac5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -8736,10 +8736,10 @@ static void gfx_v10_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
amdgpu_ring_write(ring, 0);
}
-static void gfx_v10_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v10_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
gfx_v10_0_wait_reg_mem(ring, usepfp, 1, 0, lower_32_bits(addr),
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 5ad2516a60240..cc9ac87c5be02 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5966,10 +5966,10 @@ static void gfx_v11_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
amdgpu_ring_write(ring, 0);
}
-static void gfx_v11_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v11_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
gfx_v11_0_wait_reg_mem(ring, usepfp, 1, 0, lower_32_bits(addr),
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 5862b5f60a6ee..cbe175145286b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -4483,10 +4483,10 @@ static void gfx_v12_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
amdgpu_ring_write(ring, 0);
}
-static void gfx_v12_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v12_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
gfx_v12_0_wait_reg_mem(ring, usepfp, 1, 0, lower_32_bits(addr),
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c
index 7d02569cd4738..b7e1d7546267c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c
@@ -3381,10 +3381,10 @@ static void gfx_v12_1_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
amdgpu_ring_write(ring, 0);
}
-static void gfx_v12_1_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v12_1_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
gfx_v12_1_wait_reg_mem(ring, usepfp, 1, 0, lower_32_bits(addr),
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 2f8aa99f17480..fcc1e75146e90 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -2287,10 +2287,10 @@ static int gfx_v6_0_cp_resume(struct amdgpu_device *adev)
return 0;
}
-static void gfx_v6_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v6_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index fa235b981c2e9..4ffff8ad4dc83 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -3096,14 +3096,15 @@ static int gfx_v7_0_cp_resume(struct amdgpu_device *adev)
* gfx_v7_0_ring_emit_pipeline_sync - cik vm flush using the CP
*
* @ring: the ring to emit the commands to
+ * @seq: sequence number to wait for
*
* Sync the command pipeline with the PFP. E.g. wait for everything
* to be completed.
*/
-static void gfx_v7_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v7_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 4736216cd0211..f88cfef175c0f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6176,10 +6176,10 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
}
-static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 36f0300a21bfa..07fe959abe0d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -5606,7 +5606,8 @@ static void gfx_v9_0_emit_mem_sync(struct amdgpu_ring *ring)
amdgpu_ring_write(ring, 0x0000000A); /* POLL_INTERVAL */
}
-static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
gfx_v9_0_ring_emit_event_write(ring, VS_PARTIAL_FLUSH, 4);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index d78b2c2ae13a3..fb731e877c99c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -2904,10 +2904,10 @@ static void gfx_v9_4_3_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
amdgpu_ring_write(ring, 0);
}
-static void gfx_v9_4_3_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void gfx_v9_4_3_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
gfx_v9_4_3_wait_reg_mem(ring, usepfp, 1, 0,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 46263d50cc9ef..42dca080e1dd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -756,12 +756,13 @@ static void sdma_v2_4_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v2_4_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v2_4_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v2_4_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index f9f05768072ad..b6e0d035c27eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -1029,12 +1029,13 @@ static void sdma_v3_0_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v3_0_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v3_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v3_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 56d2832ccba2d..ae6b9f344e20d 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1691,12 +1691,13 @@ static void sdma_v4_0_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v4_0_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v4_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v4_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index dd8d6a572710d..86b800e2b4329 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1287,12 +1287,13 @@ static void sdma_v4_4_2_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *
* sdma_v4_4_2_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v4_4_2_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v4_4_2_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 786f1776fa30d..c5dc727c7b448 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -1259,12 +1259,13 @@ static void sdma_v5_0_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v5_0_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v5_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v5_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 49005b96aa3f2..3076734462d25 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -1160,12 +1160,13 @@ static void sdma_v5_2_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v5_2_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v5_2_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v5_2_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 210ea6ba6212f..fbac29485d0c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1165,12 +1165,13 @@ static void sdma_v6_0_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v6_0_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v6_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v6_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index 3b4417d19212e..bb9fae2c8dee0 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -1185,12 +1185,13 @@ static void sdma_v7_0_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v7_0_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v7_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v7_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
index d71a546bdde61..5efdb4dcbed97 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
@@ -1182,12 +1182,13 @@ static void sdma_v7_1_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib
* sdma_v7_1_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void sdma_v7_1_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void sdma_v7_1_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
index b67bd343f795f..3f5fe58c47165 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
@@ -428,12 +428,13 @@ static void si_dma_ring_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib)
* si_dma_ring_emit_pipeline_sync - sync the pipeline
*
* @ring: amdgpu_ring pointer
+ * @seq: seq number to wait on
*
* Make sure all previous operations are completed (CIK).
*/
-static void si_dma_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void si_dma_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
/* wait for idle */
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index ecd7ead7a60b1..ef9a822ec6701 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -1089,9 +1089,9 @@ static void uvd_v6_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
amdgpu_ring_write(ring, 0xC);
}
-static void uvd_v6_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void uvd_v6_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
amdgpu_ring_write(ring, PACKET0(mmUVD_GPCOM_VCPU_DATA0, 0));
@@ -1118,9 +1118,9 @@ static void uvd_v6_0_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count)
}
}
-static void uvd_v6_0_enc_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void uvd_v6_0_enc_ring_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
amdgpu_ring_write(ring, HEVC_ENC_CMD_WAIT_GE);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 03d79e464f04f..4a6f16c8e9c1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -887,9 +887,9 @@ static void vce_v3_0_emit_vm_flush(struct amdgpu_ring *ring,
amdgpu_ring_write(ring, VCE_CMD_END);
}
-static void vce_v3_0_emit_pipeline_sync(struct amdgpu_ring *ring)
+static void vce_v3_0_emit_pipeline_sync(struct amdgpu_ring *ring,
+ u32 seq)
{
- uint32_t seq = ring->fence_drv.sync_seq;
uint64_t addr = ring->fence_drv.gpu_addr;
amdgpu_ring_write(ring, VCE_CMD_WAIT_GE);
--
2.52.0
* [PATCH 33/42] drm/amdgpu: drop extra parameter for vm_flush
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (31 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 32/42] drm/amdgpu: change function signature for emit_pipeline_sync() Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 34/42] drm/amdgpu: move need_ctx_switch into amdgpu_job Alex Deucher
` (9 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
We can store the pipe sync state in the job, so there
is no need for an extra parameter.
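The pattern here can be sketched in a few lines: instead of passing a flag down the call chain, the decision phase records its result (and the sequence number valid at decision time) in the job, and the emit phase later consumes only job state. This is a minimal illustrative model, not the real amdgpu API; the struct and function names are simplified stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real struct amdgpu_job and struct
 * amdgpu_ring carry far more state than shown here. */
struct job {
	bool need_pipe_sync;    /* replaces the old need_pipe_sync parameter */
	uint32_t pipe_sync_seq; /* sequence number captured at decision time */
};

struct ring {
	uint32_t sync_seq;    /* latest fence sequence on the ring */
	uint32_t last_waited; /* sequence the ring last waited on */
};

/* Decision phase: record in the job whether a pipeline sync is
 * needed and which sequence number to wait on. */
static void job_decide_pipe_sync(struct job *job, const struct ring *ring,
				 bool sync)
{
	job->need_pipe_sync = sync;
	if (sync)
		job->pipe_sync_seq = ring->sync_seq;
}

/* Emit phase: consumes only job state, no extra parameter. */
static void ring_flush(struct ring *ring, const struct job *job)
{
	if (job->need_pipe_sync)
		ring->last_waited = job->pipe_sync_seq;
}
```

Capturing `pipe_sync_seq` at decision time also means a later bump of `ring->sync_seq` cannot change which fence the flush waits on.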
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 4 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 10 ++++------
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
4 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index fb2e08ea248c6..d9db39d56fa2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -134,7 +134,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
bool secure, init_shadow;
u64 shadow_va, csa_va, gds_va;
int vmid = AMDGPU_JOB_GET_VMID(job);
- bool need_pipe_sync = false;
unsigned int cond_exec;
unsigned int i;
int r = 0;
@@ -191,7 +190,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) ||
need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) {
- need_pipe_sync = true;
+ job->need_pipe_sync = true;
+ job->pipe_sync_seq = ring->fence_drv.sync_seq;
if (tmp)
trace_amdgpu_ib_pipe_sync(job, tmp);
@@ -209,7 +209,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
if (ring->funcs->insert_start)
ring->funcs->insert_start(ring);
- r = amdgpu_vm_flush(ring, job, need_pipe_sync);
+ r = amdgpu_vm_flush(ring, job);
if (r) {
amdgpu_ring_undo(ring);
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 56a88e14a0448..908239d45bd3f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -101,6 +101,10 @@ struct amdgpu_job {
bool enforce_isolation;
bool run_cleaner_shader;
+ /* job state */
+ bool need_pipe_sync;
+ u32 pipe_sync_seq;
+
uint32_t num_ibs;
struct amdgpu_ib ibs[];
};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index c05a9f80053d4..cea359f2084ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -761,15 +761,13 @@ bool amdgpu_vm_need_pipeline_sync(struct amdgpu_ring *ring,
*
* @ring: ring to use for flush
* @job: related job
- * @need_pipe_sync: is pipe sync needed
*
* Emit a VM flush when it is necessary.
*
* Returns:
* 0 on success, errno otherwise.
*/
-int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
- bool need_pipe_sync)
+int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
{
struct amdgpu_device *adev = ring->adev;
struct amdgpu_isolation *isolation = &adev->isolation[ring->xcp_id];
@@ -810,7 +808,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
ring->funcs->emit_cleaner_shader && job->base.s_fence &&
&job->base.s_fence->scheduled == isolation->spearhead;
- if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync &&
+ if (!vm_flush_needed && !gds_switch_needed && !job->need_pipe_sync &&
!cleaner_shader_needed)
return 0;
@@ -819,8 +817,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
patch = amdgpu_ring_init_cond_exec(ring,
ring->cond_exe_gpu_addr);
- if (need_pipe_sync)
- amdgpu_ring_emit_pipeline_sync(ring, ring->fence_drv.sync_seq);
+ if (job->need_pipe_sync)
+ amdgpu_ring_emit_pipeline_sync(ring, job->pipe_sync_seq);
if (cleaner_shader_needed)
ring->funcs->emit_cleaner_shader(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 139642eacdd0c..89b76639cb273 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -512,7 +512,7 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
struct ww_acquire_ctx *ticket,
int (*callback)(void *p, struct amdgpu_bo *bo),
void *param);
-int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, bool need_pipe_sync);
+int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job);
int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
struct amdgpu_vm *vm, bool immediate);
int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
--
2.52.0
* [PATCH 34/42] drm/amdgpu: move need_ctx_switch into amdgpu_job
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (32 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 33/42] drm/amdgpu: drop extra parameter for vm_flush Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 35/42] drm/amdgpu: store vm flush state in amdgpu_job Alex Deucher
` (8 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
No intended functional change. Needed to help separate
the IB scheduling and emit logic.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 7 +++----
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 1 +
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index d9db39d56fa2d..8c36eaba151e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -126,7 +126,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
struct amdgpu_ib *ib;
struct dma_fence *tmp = NULL;
struct amdgpu_fence *af;
- bool need_ctx_switch;
struct amdgpu_vm *vm;
uint64_t fence_ctx;
uint32_t status = 0, alloc_size;
@@ -185,10 +184,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
return r;
}
- need_ctx_switch = ring->current_ctx != fence_ctx;
+ job->need_ctx_switch = ring->current_ctx != fence_ctx;
if (ring->funcs->emit_pipeline_sync && job &&
((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) ||
- need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) {
+ job->need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) {
job->need_pipe_sync = true;
job->pipe_sync_seq = ring->fence_drv.sync_seq;
@@ -227,7 +226,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_device_flush_hdp(adev, ring);
- if (need_ctx_switch)
+ if (job->need_ctx_switch)
status |= AMDGPU_HAVE_CTX_SWITCH;
if (ring->funcs->emit_cntxcntl) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 908239d45bd3f..21e1941ce356a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -104,6 +104,7 @@ struct amdgpu_job {
/* job state */
bool need_pipe_sync;
u32 pipe_sync_seq;
+ bool need_ctx_switch;
uint32_t num_ibs;
struct amdgpu_ib ibs[];
--
2.52.0
* [PATCH 35/42] drm/amdgpu: store vm flush state in amdgpu_job
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (33 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 34/42] drm/amdgpu: move need_ctx_switch into amdgpu_job Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 36/42] drm/amdgpu: split fence init and emit logic Alex Deucher
` (7 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Store the vm flush state in the job rather than in local variables.
No intended functional change. Needed to help separate
the vm flush and emit logic.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 54 ++++++++++++-------------
2 files changed, 30 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 21e1941ce356a..d53c13322a648 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -105,6 +105,9 @@ struct amdgpu_job {
bool need_pipe_sync;
u32 pipe_sync_seq;
bool need_ctx_switch;
+ bool vm_flush_needed;
+ bool cleaner_shader_needed;
+ bool pasid_mapping_needed;
uint32_t num_ibs;
struct amdgpu_ib ibs[];
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index cea359f2084ca..e480a65dbdb1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -774,42 +774,40 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
unsigned vmhub = ring->vm_hub;
struct amdgpu_vmid_mgr *id_mgr = &adev->vm_manager.id_mgr[vmhub];
struct amdgpu_vmid *id = &id_mgr->ids[job->vmid];
- bool spm_update_needed = job->spm_update_needed;
- bool gds_switch_needed = ring->funcs->emit_gds_switch &&
- job->gds_switch_needed;
- bool vm_flush_needed = job->vm_needs_flush;
- bool cleaner_shader_needed = false;
- bool pasid_mapping_needed = false;
struct dma_fence *fence = NULL;
unsigned int patch;
int r;
+ job->gds_switch_needed = ring->funcs->emit_gds_switch &&
+ job->gds_switch_needed;
+ job->vm_flush_needed = job->vm_needs_flush;
+
if (amdgpu_vmid_had_gpu_reset(adev, id)) {
- gds_switch_needed = true;
- vm_flush_needed = true;
- pasid_mapping_needed = true;
- spm_update_needed = true;
+ job->gds_switch_needed = true;
+ job->vm_flush_needed = true;
+ job->pasid_mapping_needed = true;
+ job->spm_update_needed = true;
}
mutex_lock(&id_mgr->lock);
if (id->pasid != job->pasid || !id->pasid_mapping ||
!dma_fence_is_signaled(id->pasid_mapping))
- pasid_mapping_needed = true;
+ job->pasid_mapping_needed = true;
mutex_unlock(&id_mgr->lock);
- gds_switch_needed &= !!ring->funcs->emit_gds_switch;
- vm_flush_needed &= !!ring->funcs->emit_vm_flush &&
- job->vm_pd_addr != AMDGPU_BO_INVALID_OFFSET;
- pasid_mapping_needed &= adev->gmc.gmc_funcs->emit_pasid_mapping &&
+ job->gds_switch_needed &= !!ring->funcs->emit_gds_switch;
+ job->vm_flush_needed &= !!ring->funcs->emit_vm_flush &&
+ job->vm_pd_addr != AMDGPU_BO_INVALID_OFFSET;
+ job->pasid_mapping_needed &= adev->gmc.gmc_funcs->emit_pasid_mapping &&
ring->funcs->emit_wreg;
- cleaner_shader_needed = job->run_cleaner_shader &&
+ job->cleaner_shader_needed = job->run_cleaner_shader &&
adev->gfx.enable_cleaner_shader &&
ring->funcs->emit_cleaner_shader && job->base.s_fence &&
&job->base.s_fence->scheduled == isolation->spearhead;
- if (!vm_flush_needed && !gds_switch_needed && !job->need_pipe_sync &&
- !cleaner_shader_needed)
+ if (!job->vm_flush_needed && !job->gds_switch_needed && !job->need_pipe_sync &&
+ !job->cleaner_shader_needed)
return 0;
amdgpu_ring_ib_begin(ring);
@@ -820,29 +818,31 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
if (job->need_pipe_sync)
amdgpu_ring_emit_pipeline_sync(ring, job->pipe_sync_seq);
- if (cleaner_shader_needed)
+ if (job->cleaner_shader_needed)
ring->funcs->emit_cleaner_shader(ring);
- if (vm_flush_needed) {
+ if (job->vm_flush_needed) {
trace_amdgpu_vm_flush(ring, job->vmid, job->vm_pd_addr);
amdgpu_ring_emit_vm_flush(ring, job->vmid, job->vm_pd_addr);
}
- if (pasid_mapping_needed)
+ if (job->pasid_mapping_needed)
amdgpu_gmc_emit_pasid_mapping(ring, job->vmid, job->pasid);
- if (spm_update_needed && adev->gfx.rlc.funcs->update_spm_vmid)
+ if (job->spm_update_needed && adev->gfx.rlc.funcs->update_spm_vmid)
adev->gfx.rlc.funcs->update_spm_vmid(adev, ring->xcc_id, ring, job->vmid);
if (ring->funcs->emit_gds_switch &&
- gds_switch_needed) {
+ job->gds_switch_needed) {
amdgpu_ring_emit_gds_switch(ring, job->vmid, job->gds_base,
job->gds_size, job->gws_base,
job->gws_size, job->oa_base,
job->oa_size);
}
- if (vm_flush_needed || pasid_mapping_needed || cleaner_shader_needed) {
+ if (job->vm_flush_needed ||
+ job->pasid_mapping_needed ||
+ job->cleaner_shader_needed) {
r = amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
if (r)
return r;
@@ -851,7 +851,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
dma_fence_get(fence);
}
- if (vm_flush_needed) {
+ if (job->vm_flush_needed) {
mutex_lock(&id_mgr->lock);
dma_fence_put(id->last_flush);
id->last_flush = dma_fence_get(fence);
@@ -860,7 +860,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
mutex_unlock(&id_mgr->lock);
}
- if (pasid_mapping_needed) {
+ if (job->pasid_mapping_needed) {
mutex_lock(&id_mgr->lock);
id->pasid = job->pasid;
dma_fence_put(id->pasid_mapping);
@@ -872,7 +872,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
* Make sure that all other submissions wait for the cleaner shader to
* finish before we push them to the HW.
*/
- if (cleaner_shader_needed) {
+ if (job->cleaner_shader_needed) {
trace_amdgpu_cleaner_shader(ring, fence);
mutex_lock(&adev->enforce_isolation_mutex);
dma_fence_put(isolation->spearhead);
--
2.52.0
* [PATCH 36/42] drm/amdgpu: split fence init and emit logic
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (34 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 35/42] drm/amdgpu: store vm flush state in amdgpu_job Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 37/42] drm/amdgpu: split vm flush and vm flush " Alex Deucher
` (6 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
No intended functional change. Needed to help separate
the IB scheduling and emit logic.
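The shape of this split can be sketched briefly: the fallible part (slot bookkeeping, sequence assignment) is separated from the infallible part (writing ring packets), so a caller can still back out between the two. This is an illustrative model under simplified assumptions, not the driver's actual fence code; the names and the power-of-two slot ring are stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FENCES 4 /* must be a power of two for the mask below */

struct fence {
	uint32_t seqno;
};

struct fence_drv {
	uint32_t sync_seq;                 /* last assigned sequence */
	struct fence *fences[NUM_FENCES];  /* in-flight fence slots */
};

/* Fallible part: reserve a slot and assign a sequence number.
 * Nothing has been written to the ring yet if this fails. */
static int fence_init(struct fence_drv *drv, struct fence *f)
{
	uint32_t seq = drv->sync_seq + 1;
	unsigned int idx = seq & (NUM_FENCES - 1);

	if (drv->fences[idx])  /* slot still busy: caller must back out */
		return -1;
	drv->sync_seq = seq;
	f->seqno = seq;
	drv->fences[idx] = f;
	return 0;
}

/* Infallible part: "emit" the fence packet for the seqno chosen
 * above; here it just returns the seqno it would have written. */
static uint32_t fence_emit(const struct fence *f)
{
	return f->seqno;
}
```

With the split, error handling (undoing the ring, resetting the vmid) needs to happen only between `fence_init()` and `fence_emit()`, never after packets have been written.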
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +++++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++-
4 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index b1cf9550c259b..c1cb47e92d08c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -104,13 +104,11 @@ static void amdgpu_fence_save_fence_wptr_end(struct amdgpu_fence *af)
*
* @ring: ring the fence is associated with
* @af: amdgpu fence input
- * @flags: flags to pass into the subordinate .emit_fence() call
*
* Emits a fence command on the requested ring (all asics).
* Returns 0 on success, -ENOMEM on failure.
*/
-int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
- unsigned int flags)
+int amdgpu_fence_init(struct amdgpu_ring *ring, struct amdgpu_fence *af)
{
struct amdgpu_device *adev = ring->adev;
struct dma_fence *fence;
@@ -126,11 +124,6 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
&ring->fence_drv.lock,
adev->fence_context + ring->idx, seq);
- amdgpu_fence_save_fence_wptr_start(af);
- amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
- seq, flags | AMDGPU_FENCE_FLAG_INT);
- amdgpu_fence_save_fence_wptr_end(af);
- amdgpu_fence_save_wptr(af);
pm_runtime_get_noresume(adev_to_drm(adev)->dev);
ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
if (unlikely(rcu_dereference_protected(*ptr, 1))) {
@@ -158,6 +151,16 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
return 0;
}
+void amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
+ unsigned int flags)
+{
+ amdgpu_fence_save_fence_wptr_start(af);
+ amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
+ af->base.seqno, flags | AMDGPU_FENCE_FLAG_INT);
+ amdgpu_fence_save_fence_wptr_end(af);
+ amdgpu_fence_save_wptr(af);
+}
+
/**
* amdgpu_fence_emit_polling - emit a fence on the requeste ring
*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 8c36eaba151e6..d2f03060d9d3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -277,7 +277,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr);
}
- r = amdgpu_fence_emit(ring, af, fence_flags);
+ r = amdgpu_fence_init(ring, af);
if (r) {
dev_err(adev->dev, "failed to emit fence (%d)\n", r);
if (job->vmid)
@@ -285,6 +285,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_ring_undo(ring);
return r;
}
+ amdgpu_fence_emit(ring, af, fence_flags);
*f = &af->base;
/* get a ref for the job */
dma_fence_get(*f);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index da437c349aab9..8aab82af2e0e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -173,8 +173,9 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev);
void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);
-int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
- unsigned int flags);
+int amdgpu_fence_init(struct amdgpu_ring *ring, struct amdgpu_fence *af);
+void amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
+ unsigned int flags);
int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
uint32_t timeout);
bool amdgpu_fence_process(struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e480a65dbdb1c..374991520ad2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -843,9 +843,10 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
if (job->vm_flush_needed ||
job->pasid_mapping_needed ||
job->cleaner_shader_needed) {
- r = amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
+ r = amdgpu_fence_init(ring, job->hw_vm_fence);
if (r)
return r;
+ amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
fence = &job->hw_vm_fence->base;
/* get a ref for the job */
dma_fence_get(fence);
--
2.52.0
* [PATCH 37/42] drm/amdgpu: split vm flush and vm flush emit logic
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (35 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 36/42] drm/amdgpu: split fence init and emit logic Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 38/42] drm/amdgpu: split ib schedule and ib " Alex Deucher
` (5 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
No intended functional change. Split the logic into
two functions, one to set the state and one to use
the state to emit the ring contents.
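The two-phase structure described above can be sketched as follows: a decision function computes and stores flags (including a deferred fence-emission flag), and an emit function later replays that state on the ring, with an early return when nothing is needed. This is a hedged sketch with simplified stand-in names, not the real amdgpu_vm_flush()/amdgpu_vm_emit_flush() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative job state; the real struct amdgpu_job carries these
 * flags alongside much more. */
struct flush_job {
	bool vm_flush_needed;
	bool gds_switch_needed;
	bool need_pipe_sync;
	bool cleaner_shader_needed;
	bool emit_vm_fence; /* fence emission is deferred to phase 2 */
};

/* Phase 1: decide.  Runs before anything touches the ring, so a
 * failure here (e.g. fence slot allocation) is still undoable. */
static int vm_flush_decide(struct flush_job *job, bool needs_flush)
{
	job->vm_flush_needed = needs_flush;
	if (job->vm_flush_needed)
		job->emit_vm_fence = true;
	return 0;
}

/* Phase 2: emit.  Purely consumes job state; returns the number of
 * ring operations it would have emitted. */
static int vm_flush_emit(const struct flush_job *job)
{
	int emitted = 0;

	if (!job->vm_flush_needed && !job->gds_switch_needed &&
	    !job->need_pipe_sync && !job->cleaner_shader_needed)
		return 0; /* nothing to do: skip ring setup entirely */

	if (job->need_pipe_sync)
		emitted++;
	if (job->vm_flush_needed)
		emitted++;
	if (job->emit_vm_fence)
		emitted++;
	return emitted;
}
```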
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 92 ++++++++++++++-----------
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 1 +
4 files changed, 56 insertions(+), 39 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index d2f03060d9d3a..54d7a975a74c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -213,6 +213,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_ring_undo(ring);
return r;
}
+ amdgpu_vm_emit_flush(ring, job);
amdgpu_ring_ib_begin(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index d53c13322a648..72d50602a8e52 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -108,6 +108,7 @@ struct amdgpu_job {
bool vm_flush_needed;
bool cleaner_shader_needed;
bool pasid_mapping_needed;
+ bool emit_vm_fence;
uint32_t num_ibs;
struct amdgpu_ib ibs[];
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 374991520ad2c..6c84677daad4e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -756,6 +756,57 @@ bool amdgpu_vm_need_pipeline_sync(struct amdgpu_ring *ring,
return false;
}
+void amdgpu_vm_emit_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
+{
+ struct amdgpu_device *adev = ring->adev;
+ unsigned int patch;
+
+ if (!job->vm_flush_needed && !job->gds_switch_needed && !job->need_pipe_sync &&
+ !job->cleaner_shader_needed)
+ return;
+
+ amdgpu_ring_ib_begin(ring);
+ if (ring->funcs->init_cond_exec)
+ patch = amdgpu_ring_init_cond_exec(ring,
+ ring->cond_exe_gpu_addr);
+
+ if (job->need_pipe_sync)
+ amdgpu_ring_emit_pipeline_sync(ring, job->pipe_sync_seq);
+
+ if (job->cleaner_shader_needed)
+ ring->funcs->emit_cleaner_shader(ring);
+
+ if (job->vm_flush_needed)
+ amdgpu_ring_emit_vm_flush(ring, job->vmid, job->vm_pd_addr);
+
+ if (job->pasid_mapping_needed)
+ amdgpu_gmc_emit_pasid_mapping(ring, job->vmid, job->pasid);
+
+ if (job->spm_update_needed && adev->gfx.rlc.funcs->update_spm_vmid)
+ adev->gfx.rlc.funcs->update_spm_vmid(adev, ring->xcc_id, ring, job->vmid);
+
+ if (ring->funcs->emit_gds_switch &&
+ job->gds_switch_needed) {
+ amdgpu_ring_emit_gds_switch(ring, job->vmid, job->gds_base,
+ job->gds_size, job->gws_base,
+ job->gws_size, job->oa_base,
+ job->oa_size);
+ }
+
+ if (job->emit_vm_fence)
+ amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
+
+ amdgpu_ring_patch_cond_exec(ring, patch);
+
+ /* the double SWITCH_BUFFER here *cannot* be skipped by COND_EXEC */
+ if (ring->funcs->emit_switch_buffer) {
+ amdgpu_ring_emit_switch_buffer(ring);
+ amdgpu_ring_emit_switch_buffer(ring);
+ }
+
+ amdgpu_ring_ib_end(ring);
+}
+
/**
* amdgpu_vm_flush - hardware flush the vm
*
@@ -775,7 +826,6 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
struct amdgpu_vmid_mgr *id_mgr = &adev->vm_manager.id_mgr[vmhub];
struct amdgpu_vmid *id = &id_mgr->ids[job->vmid];
struct dma_fence *fence = NULL;
- unsigned int patch;
int r;
job->gds_switch_needed = ring->funcs->emit_gds_switch &&
@@ -810,35 +860,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
!job->cleaner_shader_needed)
return 0;
- amdgpu_ring_ib_begin(ring);
- if (ring->funcs->init_cond_exec)
- patch = amdgpu_ring_init_cond_exec(ring,
- ring->cond_exe_gpu_addr);
-
- if (job->need_pipe_sync)
- amdgpu_ring_emit_pipeline_sync(ring, job->pipe_sync_seq);
-
- if (job->cleaner_shader_needed)
- ring->funcs->emit_cleaner_shader(ring);
-
- if (job->vm_flush_needed) {
+ if (job->vm_flush_needed)
trace_amdgpu_vm_flush(ring, job->vmid, job->vm_pd_addr);
- amdgpu_ring_emit_vm_flush(ring, job->vmid, job->vm_pd_addr);
- }
-
- if (job->pasid_mapping_needed)
- amdgpu_gmc_emit_pasid_mapping(ring, job->vmid, job->pasid);
-
- if (job->spm_update_needed && adev->gfx.rlc.funcs->update_spm_vmid)
- adev->gfx.rlc.funcs->update_spm_vmid(adev, ring->xcc_id, ring, job->vmid);
-
- if (ring->funcs->emit_gds_switch &&
- job->gds_switch_needed) {
- amdgpu_ring_emit_gds_switch(ring, job->vmid, job->gds_base,
- job->gds_size, job->gws_base,
- job->gws_size, job->oa_base,
- job->oa_size);
- }
if (job->vm_flush_needed ||
job->pasid_mapping_needed ||
@@ -846,7 +869,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
r = amdgpu_fence_init(ring, job->hw_vm_fence);
if (r)
return r;
- amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
+ job->emit_vm_fence = true;
fence = &job->hw_vm_fence->base;
/* get a ref for the job */
dma_fence_get(fence);
@@ -882,15 +905,6 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
}
dma_fence_put(fence);
- amdgpu_ring_patch_cond_exec(ring, patch);
-
- /* the double SWITCH_BUFFER here *cannot* be skipped by COND_EXEC */
- if (ring->funcs->emit_switch_buffer) {
- amdgpu_ring_emit_switch_buffer(ring);
- amdgpu_ring_emit_switch_buffer(ring);
- }
-
- amdgpu_ring_ib_end(ring);
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 89b76639cb273..0ce37aab8b518 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -512,6 +512,7 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
struct ww_acquire_ctx *ticket,
int (*callback)(void *p, struct amdgpu_bo *bo),
void *param);
+void amdgpu_vm_emit_flush(struct amdgpu_ring *ring, struct amdgpu_job *job);
int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job);
int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
struct amdgpu_vm *vm, bool immediate);
--
2.52.0
* [PATCH 38/42] drm/amdgpu: split ib schedule and ib emit logic
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (36 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 37/42] drm/amdgpu: split vm flush and vm flush " Alex Deucher
@ 2026-01-08 14:48 ` Alex Deucher
2026-01-08 14:48 ` [PATCH 39/42] drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout() Alex Deucher
` (4 subsequent siblings)
42 siblings, 0 replies; 66+ messages in thread
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
No intended functional change. Split the logic into
two functions, one to set the state and one to use
the state to emit the ring contents.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 201 ++++++++++++++-----------
1 file changed, 110 insertions(+), 91 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 54d7a975a74c0..0e648fbe0980f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -99,40 +99,15 @@ void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f)
amdgpu_sa_bo_free(&ib->sa_bo, f);
}
-/**
- * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
- *
- * @ring: ring index the IB is associated with
- * @job: job to schedule
- * @f: fence created during this submission
- *
- * Schedule an IB on the associated ring (all asics).
- * Returns 0 on success, error on failure.
- *
- * On SI, there are two parallel engines fed from the primary ring,
- * the CE (Constant Engine) and the DE (Drawing Engine). Since
- * resource descriptors have moved to memory, the CE allows you to
- * prime the caches while the DE is updating register state so that
- * the resource descriptors will be already in cache when the draw is
- * processed. To accomplish this, the userspace driver submits two
- * IBs, one for the CE and one for the DE. If there is a CE IB (called
- * a CONST_IB), it will be put on the ring prior to the DE IB. Prior
- * to SI there was just a DE IB.
- */
-int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
- struct dma_fence **f)
+static int amdgpu_ib_emit(struct amdgpu_ring *ring, struct amdgpu_job *job)
{
struct amdgpu_device *adev = ring->adev;
+ int vmid = AMDGPU_JOB_GET_VMID(job);
struct amdgpu_ib *ib;
- struct dma_fence *tmp = NULL;
- struct amdgpu_fence *af;
- struct amdgpu_vm *vm;
- uint64_t fence_ctx;
uint32_t status = 0, alloc_size;
+ u64 shadow_va, csa_va, gds_va;
unsigned int fence_flags = 0;
bool secure, init_shadow;
- u64 shadow_va, csa_va, gds_va;
- int vmid = AMDGPU_JOB_GET_VMID(job);
unsigned int cond_exec;
unsigned int i;
int r = 0;
@@ -143,61 +118,23 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
return -EINVAL;
ib = &job->ibs[0];
- vm = job->vm;
- fence_ctx = job->base.s_fence ?
- job->base.s_fence->finished.context : 0;
shadow_va = job->shadow_va;
csa_va = job->csa_va;
gds_va = job->gds_va;
init_shadow = job->init_shadow;
- af = job->hw_fence;
- /* Save the context of the job for reset handling.
- * The driver needs this so it can skip the ring
- * contents for guilty contexts.
- */
- af->context = fence_ctx;
- /* the vm fence is also part of the job's context */
- job->hw_vm_fence->context = fence_ctx;
-
- if (!ring->sched.ready) {
- dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", ring->name);
- return -EINVAL;
- }
- if (vm && !job->vmid) {
- dev_err(adev->dev, "VM IB without ID\n");
- return -EINVAL;
- }
-
- if ((ib->flags & AMDGPU_IB_FLAGS_SECURE) &&
- (!ring->funcs->secure_submission_supported)) {
- dev_err(adev->dev, "secure submissions not supported on ring <%s>\n", ring->name);
- return -EINVAL;
- }
+ if (ib->flags & AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE)
+ fence_flags |= AMDGPU_FENCE_FLAG_TC_WB_ONLY;
alloc_size = ring->funcs->emit_frame_size + job->num_ibs *
ring->funcs->emit_ib_size;
r = amdgpu_ring_alloc(ring, alloc_size);
if (r) {
- dev_err(adev->dev, "scheduling IB failed (%d).\n", r);
+ dev_err(adev->dev, "Ring allocation for IB failed (%d).\n", r);
return r;
}
- job->need_ctx_switch = ring->current_ctx != fence_ctx;
- if (ring->funcs->emit_pipeline_sync && job &&
- ((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) ||
- job->need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) {
-
- job->need_pipe_sync = true;
- job->pipe_sync_seq = ring->fence_drv.sync_seq;
-
- if (tmp)
- trace_amdgpu_ib_pipe_sync(job, tmp);
-
- dma_fence_put(tmp);
- }
-
if ((ib->flags & AMDGPU_IB_FLAG_EMIT_MEM_SYNC) && ring->funcs->emit_mem_sync)
ring->funcs->emit_mem_sync(ring);
@@ -208,11 +145,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
if (ring->funcs->insert_start)
ring->funcs->insert_start(ring);
- r = amdgpu_vm_flush(ring, job);
- if (r) {
- amdgpu_ring_undo(ring);
- return r;
- }
amdgpu_vm_emit_flush(ring, job);
amdgpu_ring_ib_begin(ring);
@@ -264,9 +196,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_device_invalidate_hdp(adev, ring);
- if (ib->flags & AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE)
- fence_flags |= AMDGPU_FENCE_FLAG_TC_WB_ONLY;
-
/* wrap the last IB with fence */
if (job->uf_addr) {
amdgpu_ring_emit_fence(ring, job->uf_addr, job->uf_sequence,
@@ -278,25 +207,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr);
}
- r = amdgpu_fence_init(ring, af);
- if (r) {
- dev_err(adev->dev, "failed to emit fence (%d)\n", r);
- if (job->vmid)
- amdgpu_vmid_reset(adev, ring->vm_hub, job->vmid);
- amdgpu_ring_undo(ring);
- return r;
- }
- amdgpu_fence_emit(ring, af, fence_flags);
- *f = &af->base;
- /* get a ref for the job */
- dma_fence_get(*f);
+ amdgpu_fence_emit(ring, job->hw_fence, fence_flags);
if (ring->funcs->insert_end)
ring->funcs->insert_end(ring);
amdgpu_ring_patch_cond_exec(ring, cond_exec);
- ring->current_ctx = fence_ctx;
if (ring->funcs->emit_switch_buffer)
amdgpu_ring_emit_switch_buffer(ring);
@@ -310,7 +227,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
* fence so we know what rings contents to backup
* after we reset the queue.
*/
- amdgpu_fence_save_wptr(af);
+ amdgpu_fence_save_wptr(job->hw_fence);
amdgpu_ring_ib_end(ring);
amdgpu_ring_commit(ring);
@@ -318,6 +235,108 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
return 0;
}
+/**
+ * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
+ *
+ * @ring: ring the IB is associated with
+ * @job: job to schedule
+ * @f: fence created during this submission
+ *
+ * Schedule an IB on the associated ring (all asics).
+ * Returns 0 on success, error on failure.
+ *
+ * On SI, there are two parallel engines fed from the primary ring,
+ * the CE (Constant Engine) and the DE (Drawing Engine). Since
+ * resource descriptors have moved to memory, the CE allows you to
+ * prime the caches while the DE is updating register state so that
+ * the resource descriptors will be already in cache when the draw is
+ * processed. To accomplish this, the userspace driver submits two
+ * IBs, one for the CE and one for the DE. If there is a CE IB (called
+ * a CONST_IB), it will be put on the ring prior to the DE IB. Prior
+ * to SI there was just a DE IB.
+ */
+int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
+ struct dma_fence **f)
+{
+ struct amdgpu_device *adev = ring->adev;
+ struct dma_fence *tmp = NULL;
+ struct amdgpu_fence *af;
+ struct amdgpu_ib *ib;
+ struct amdgpu_vm *vm;
+ uint64_t fence_ctx;
+ int r = 0;
+
+ if (!job)
+ return -EINVAL;
+ if (job->num_ibs == 0)
+ return -EINVAL;
+
+ ib = &job->ibs[0];
+ vm = job->vm;
+ fence_ctx = job->base.s_fence ?
+ job->base.s_fence->finished.context : 0;
+ af = job->hw_fence;
+ /* Save the context of the job for reset handling.
+ * The driver needs this so it can skip the ring
+ * contents for guilty contexts.
+ */
+ af->context = fence_ctx;
+ /* the vm fence is also part of the job's context */
+ job->hw_vm_fence->context = fence_ctx;
+
+ if (!ring->sched.ready) {
+ dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", ring->name);
+ return -EINVAL;
+ }
+
+ if (vm && !job->vmid) {
+ dev_err(adev->dev, "VM IB without ID\n");
+ return -EINVAL;
+ }
+
+ if ((ib->flags & AMDGPU_IB_FLAGS_SECURE) &&
+ (!ring->funcs->secure_submission_supported)) {
+ dev_err(adev->dev, "secure submissions not supported on ring <%s>\n", ring->name);
+ return -EINVAL;
+ }
+
+ job->need_ctx_switch = ring->current_ctx != fence_ctx;
+ if (ring->funcs->emit_pipeline_sync && job &&
+ ((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) ||
+ job->need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) {
+
+ job->need_pipe_sync = true;
+ job->pipe_sync_seq = ring->fence_drv.sync_seq;
+
+ if (tmp)
+ trace_amdgpu_ib_pipe_sync(job, tmp);
+
+ dma_fence_put(tmp);
+ }
+
+ r = amdgpu_vm_flush(ring, job);
+ if (r)
+ return r;
+
+ r = amdgpu_fence_init(ring, af);
+ if (r) {
+ dev_err(adev->dev, "failed to emit fence (%d)\n", r);
+ if (job->vmid)
+ amdgpu_vmid_reset(adev, ring->vm_hub, job->vmid);
+ return r;
+ }
+ *f = &af->base;
+ /* get a ref for the job */
+ dma_fence_get(*f);
+
+ r = amdgpu_ib_emit(ring, job);
+ if (r)
+ return r;
+ ring->current_ctx = fence_ctx;
+
+ return 0;
+}
+
/**
* amdgpu_ib_pool_init - Init the IB (Indirect Buffer) pool
*
--
2.52.0
* [PATCH 39/42] drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Centralize the scheduler stop/start in amdgpu_job_timedout()
rather than in the ring reset helpers. No intended functional
change. This is needed for upcoming reemit changes.
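The resulting ordering can be sketched as follows. This is a toy model, not the driver code; `model_ring`, `sched_stop()`, and the other names here are illustrative stand-ins for the amdgpu/drm_sched structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the timeout path after this patch. */
struct model_ring {
	bool sched_running;
	bool healthy;
};

static void sched_stop(struct model_ring *r)  { r->sched_running = false; }
static void sched_start(struct model_ring *r) { r->sched_running = true;  }

/* Stand-in for the IP-specific reset callback. */
static int ring_reset(struct model_ring *r, bool reset_works)
{
	if (!reset_works)
		return -1;
	r->healthy = true;
	return 0;
}

/* With the stop/start bracket in the common timeout handler, every
 * reset callback runs with the scheduler quiesced. */
static int job_timedout(struct model_ring *r, bool reset_works)
{
	sched_stop(r);		/* nobody else touches the ring buffer */
	int ret = ring_reset(r, reset_works);
	if (!ret)
		sched_start(r);	/* resume only on success */
	return ret;
}
```

On failure the scheduler stays stopped, leaving the ring quiesced for the fallback (e.g. a full adapter reset).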
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 4 ----
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d94b85e4e28a4..3834c1b288eab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -135,8 +135,12 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
ring->funcs->reset) {
dev_err(adev->dev, "Starting %s ring reset\n",
s_job->sched->name);
+ /* Stop the scheduler to prevent anybody else from touching the ring buffer. */
+ drm_sched_wqueue_stop(&ring->sched);
r = amdgpu_ring_reset(ring, job->vmid, job->hw_fence);
if (!r) {
+ /* Start the scheduler again */
+ drm_sched_wqueue_start(&ring->sched);
atomic_inc(&ring->adev->gpu_reset_counter);
dev_err(adev->dev, "Ring %s reset succeeded\n",
ring->sched.name);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index b03e3f5d40000..8cb10d71ee733 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -868,8 +868,6 @@ bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring)
void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
struct amdgpu_fence *guilty_fence)
{
- /* Stop the scheduler to prevent anybody else from touching the ring buffer. */
- drm_sched_wqueue_stop(&ring->sched);
/* back up the non-guilty commands */
amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
/* signal the guilty fence and set an error on all fences from the context */
@@ -896,8 +894,6 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
amdgpu_ring_write(ring, ring->ring_backup[i]);
amdgpu_ring_commit(ring);
}
- /* Start the scheduler again */
- drm_sched_wqueue_start(&ring->sched);
return 0;
}
--
2.52.0
* [PATCH 40/42] drm/amdgpu: add an all_instance_rings_reset ring flag
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Some IPs only support instance reset. If there are multiple
rings on the instance, they will all be reset. Add a flag to
note this case.
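How the flag is consumed can be sketched like this — a hypothetical model (the names are illustrative, not the driver's): when the whole instance is reset, per-ring reemit is skipped because the instance reinitialization already covers every ring on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: rings whose reset is instance-wide opt out of
 * per-ring reemit. */
struct model_ring {
	bool all_instance_rings_reset;
	int reemitted;
};

static int reemit_jobs(struct model_ring *r, int pending)
{
	if (r->all_instance_rings_reset)
		return 0;	/* instance reset already handled it */
	r->reemitted += pending;
	return 0;
}
```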
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 ++
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 ++
4 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 8aab82af2e0e0..63272425a12f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -388,6 +388,7 @@ struct amdgpu_ring {
u32 doorbell_index;
bool use_doorbell;
bool use_pollmem;
+ bool all_instance_rings_reset;
unsigned wptr_offs;
u64 wptr_gpu_addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 86b800e2b4329..e508703d24d33 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1469,6 +1469,7 @@ static int sdma_v4_4_2_sw_init(struct amdgpu_ip_block *ip_block)
ring = &adev->sdma.instance[i].ring;
ring->ring_obj = NULL;
ring->use_doorbell = true;
+ ring->all_instance_rings_reset = true;
aid_id = adev->sdma.instance[i].aid_id;
DRM_DEBUG("SDMA %d use_doorbell being set to: [%s]\n", i,
@@ -1490,6 +1491,7 @@ static int sdma_v4_4_2_sw_init(struct amdgpu_ip_block *ip_block)
ring = &adev->sdma.instance[i].page;
ring->ring_obj = NULL;
ring->use_doorbell = true;
+ ring->all_instance_rings_reset = true;
/* doorbell index of page queue is assigned right after
* gfx queue on the same instance
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
index cebee453871c1..694eaa61c4b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
@@ -334,6 +334,7 @@ static int vcn_v2_5_sw_init(struct amdgpu_ip_block *ip_block)
ring = &adev->vcn.inst[j].ring_dec;
ring->use_doorbell = true;
+ ring->all_instance_rings_reset = true;
ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
(amdgpu_sriov_vf(adev) ? 2*j : 8*j);
@@ -354,6 +355,7 @@ static int vcn_v2_5_sw_init(struct amdgpu_ip_block *ip_block)
ring = &adev->vcn.inst[j].ring_enc[i];
ring->use_doorbell = true;
+ ring->all_instance_rings_reset = true;
ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
(amdgpu_sriov_vf(adev) ? (1 + i + 2*j) : (2 + i + 8*j));
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 02d5c5af65f23..cda3154692b35 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -234,6 +234,7 @@ static int vcn_v3_0_sw_init(struct amdgpu_ip_block *ip_block)
ring = &adev->vcn.inst[i].ring_dec;
ring->use_doorbell = true;
+ ring->all_instance_rings_reset = true;
if (amdgpu_sriov_vf(adev)) {
ring->doorbell_index = vcn_doorbell_index + i * (adev->vcn.inst[i].num_enc_rings + 1);
} else {
@@ -258,6 +259,7 @@ static int vcn_v3_0_sw_init(struct amdgpu_ip_block *ip_block)
ring = &adev->vcn.inst[i].ring_enc[j];
ring->use_doorbell = true;
+ ring->all_instance_rings_reset = true;
if (amdgpu_sriov_vf(adev)) {
ring->doorbell_index = vcn_doorbell_index + i * (adev->vcn.inst[i].num_enc_rings + 1) + 1 + j;
} else {
--
2.52.0
* [PATCH 41/42] drm/amdgpu: rework reset reemit handling
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Rather than saving and replaying the raw ring contents, use the
state stored in the job and fence to reemit the submissions
explicitly. This greatly simplifies reemission and allows the
same jobs to be reemitted over and over if there are multiple
ring resets.
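The idea can be sketched with a toy model (all names here are illustrative, not the amdgpu structures): each job keeps its own submission state, so after a reset the pending list can simply be walked and reemitted; a guilty/cancelled job has its fence emitted but its payload skipped, mirroring the new `skip_ib` flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of reemit-from-job-state. */
struct model_job {
	int payload;		/* stands in for the job's IBs */
	bool skip_ib;		/* guilty/cancelled: emit only the fence */
	struct model_job *next;
};

/* "Ring" that just records what was emitted. */
struct model_ring {
	int emitted[8];
	int fences;
	size_t n;
};

static void emit_job(struct model_ring *r, struct model_job *j)
{
	if (!j->skip_ib)
		r->emitted[r->n++] = j->payload;	/* replay the IBs */
	r->fences++;					/* fence is always emitted */
}

/* Because each job carries its own state, the pending list can be
 * replayed after every reset, not just once. */
static void reemit_pending(struct model_ring *r, struct model_job *head)
{
	for (struct model_job *j = head; j; j = j->next)
		emit_job(r, j);
}
```

Contrast with the removed scheme, which copied dwords between saved wptr offsets out of the ring buffer and so could only be replayed once.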
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 91 ++---------------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 35 ++++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 29 ++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 50 +------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 15 +---
5 files changed, 65 insertions(+), 155 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index c1cb47e92d08c..28691f9b6e32d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -89,16 +89,6 @@ static u32 amdgpu_fence_read(struct amdgpu_ring *ring)
return seq;
}
-static void amdgpu_fence_save_fence_wptr_start(struct amdgpu_fence *af)
-{
- af->fence_wptr_start = af->ring->wptr;
-}
-
-static void amdgpu_fence_save_fence_wptr_end(struct amdgpu_fence *af)
-{
- af->fence_wptr_end = af->ring->wptr;
-}
-
/**
* amdgpu_fence_emit - emit a fence on the requested ring
*
@@ -154,11 +144,8 @@ int amdgpu_fence_init(struct amdgpu_ring *ring, struct amdgpu_fence *af)
void amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
unsigned int flags)
{
- amdgpu_fence_save_fence_wptr_start(af);
amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
af->base.seqno, flags | AMDGPU_FENCE_FLAG_INT);
- amdgpu_fence_save_fence_wptr_end(af);
- amdgpu_fence_save_wptr(af);
}
/**
@@ -743,7 +730,6 @@ void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af)
struct amdgpu_ring *ring = af->ring;
unsigned long flags;
u32 seq, last_seq;
- bool reemitted = false;
last_seq = amdgpu_fence_read(ring) & ring->fence_drv.num_fences_mask;
seq = ring->fence_drv.sync_seq & ring->fence_drv.num_fences_mask;
@@ -761,84 +747,17 @@ void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af)
if (unprocessed && !dma_fence_is_signaled_locked(unprocessed)) {
fence = container_of(unprocessed, struct amdgpu_fence, base);
- if (fence->reemitted > 1)
- reemitted = true;
- else if (fence == af)
+ if (fence == af) {
dma_fence_set_error(&fence->base, -ETIME);
- else if (fence->context == af->context)
+ fence->skip_ib = true;
+ } else if (fence->context == af->context) {
dma_fence_set_error(&fence->base, -ECANCELED);
- }
- rcu_read_unlock();
- } while (last_seq != seq);
- spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
-
- if (reemitted) {
- /* if we've already reemitted once then just cancel everything */
- amdgpu_fence_driver_force_completion(af->ring, &af->base);
- af->ring->ring_backup_entries_to_copy = 0;
- }
-}
-
-void amdgpu_fence_save_wptr(struct amdgpu_fence *af)
-{
- af->wptr = af->ring->wptr;
-}
-
-static void amdgpu_ring_backup_unprocessed_command(struct amdgpu_ring *ring,
- u64 start_wptr, u32 end_wptr)
-{
- unsigned int first_idx = start_wptr & ring->buf_mask;
- unsigned int last_idx = end_wptr & ring->buf_mask;
- unsigned int i;
-
- /* Backup the contents of the ring buffer. */
- for (i = first_idx; i != last_idx; ++i, i &= ring->buf_mask)
- ring->ring_backup[ring->ring_backup_entries_to_copy++] = ring->ring[i];
-}
-
-void amdgpu_ring_backup_unprocessed_commands(struct amdgpu_ring *ring,
- struct amdgpu_fence *guilty_fence)
-{
- struct dma_fence *unprocessed;
- struct dma_fence __rcu **ptr;
- struct amdgpu_fence *fence;
- u64 wptr;
- u32 seq, last_seq;
-
- last_seq = amdgpu_fence_read(ring) & ring->fence_drv.num_fences_mask;
- seq = ring->fence_drv.sync_seq & ring->fence_drv.num_fences_mask;
- wptr = ring->fence_drv.signalled_wptr;
- ring->ring_backup_entries_to_copy = 0;
-
- do {
- last_seq++;
- last_seq &= ring->fence_drv.num_fences_mask;
-
- ptr = &ring->fence_drv.fences[last_seq];
- rcu_read_lock();
- unprocessed = rcu_dereference(*ptr);
-
- if (unprocessed && !dma_fence_is_signaled(unprocessed)) {
- fence = container_of(unprocessed, struct amdgpu_fence, base);
-
- /* save everything if the ring is not guilty, otherwise
- * just save the content from other contexts.
- */
- if (!fence->reemitted &&
- (!guilty_fence || (fence->context != guilty_fence->context))) {
- amdgpu_ring_backup_unprocessed_command(ring, wptr,
- fence->wptr);
- } else if (!fence->reemitted) {
- /* always save the fence */
- amdgpu_ring_backup_unprocessed_command(ring,
- fence->fence_wptr_start,
- fence->fence_wptr_end);
+ fence->skip_ib = true;
}
- wptr = fence->wptr;
- fence->reemitted++;
}
rcu_read_unlock();
} while (last_seq != seq);
+ spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
}
/*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 0e648fbe0980f..15a7daf5b9fa8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -99,7 +99,7 @@ void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f)
amdgpu_sa_bo_free(&ib->sa_bo, f);
}
-static int amdgpu_ib_emit(struct amdgpu_ring *ring, struct amdgpu_job *job)
+int amdgpu_ib_emit(struct amdgpu_ring *ring, struct amdgpu_job *job)
{
struct amdgpu_device *adev = ring->adev;
int vmid = AMDGPU_JOB_GET_VMID(job);
@@ -135,6 +135,31 @@ static int amdgpu_ib_emit(struct amdgpu_ring *ring, struct amdgpu_job *job)
return r;
}
+ if (job->hw_fence->skip_ib) {
+ if (ring->funcs->insert_start)
+ ring->funcs->insert_start(ring);
+ if (job->emit_vm_fence) {
+ amdgpu_ring_ib_begin(ring);
+ if (ring->funcs->init_cond_exec)
+ cond_exec = amdgpu_ring_init_cond_exec(ring,
+ ring->cond_exe_gpu_addr);
+ amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
+ amdgpu_ring_ib_end(ring);
+ amdgpu_ring_patch_cond_exec(ring, cond_exec);
+ }
+ amdgpu_ring_ib_begin(ring);
+ if (ring->funcs->init_cond_exec)
+ cond_exec = amdgpu_ring_init_cond_exec(ring,
+ ring->cond_exe_gpu_addr);
+ amdgpu_fence_emit(ring, job->hw_fence, fence_flags);
+ if (ring->funcs->insert_end)
+ ring->funcs->insert_end(ring);
+ amdgpu_ring_patch_cond_exec(ring, cond_exec);
+ amdgpu_ring_ib_end(ring);
+ amdgpu_ring_commit(ring);
+ return 0;
+ }
+
if ((ib->flags & AMDGPU_IB_FLAG_EMIT_MEM_SYNC) && ring->funcs->emit_mem_sync)
ring->funcs->emit_mem_sync(ring);
@@ -221,14 +246,6 @@ static int amdgpu_ib_emit(struct amdgpu_ring *ring, struct amdgpu_job *job)
ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
ring->funcs->emit_wave_limit(ring, false);
- /* Save the wptr associated with this fence.
- * This must be last for resets to work properly
- * as we need to save the wptr associated with this
- * fence so we know what rings contents to backup
- * after we reset the queue.
- */
- amdgpu_fence_save_wptr(job->hw_fence);
-
amdgpu_ring_ib_end(ring);
amdgpu_ring_commit(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 3834c1b288eab..91821207636ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -85,6 +85,33 @@ static void amdgpu_job_core_dump(struct amdgpu_device *adev,
}
}
+static int amdgpu_job_reemit_jobs(struct drm_sched_job *timedout_s_job)
+{
+ struct amdgpu_job *timedout_job = to_amdgpu_job(timedout_s_job);
+ struct amdgpu_ring *ring = to_amdgpu_ring(timedout_s_job->sched);
+ struct drm_gpu_scheduler *sched = &ring->sched;
+ struct drm_sched_job *s_job, *tmp;
+ int r;
+
+ /* skip reemit if we reset all the rings on an instance */
+ if (ring->all_instance_rings_reset)
+ return 0;
+
+ r = amdgpu_ib_emit(ring, timedout_job);
+ if (r)
+ return r;
+ list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
+ struct amdgpu_job *job = to_amdgpu_job(s_job);
+
+ r = amdgpu_ib_emit(ring, job);
+ if (r)
+ return r;
+ }
+
+ return 0;
+}
+
+
static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
{
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -138,6 +165,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
/* Stop the scheduler to prevent anybody else from touching the ring buffer. */
drm_sched_wqueue_stop(&ring->sched);
r = amdgpu_ring_reset(ring, job->vmid, job->hw_fence);
+ if (!r)
+ r = amdgpu_job_reemit_jobs(s_job);
if (!r) {
/* Start the scheduler again */
drm_sched_wqueue_start(&ring->sched);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 8cb10d71ee733..1d94707fc86d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -104,29 +104,6 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned int ndw)
return 0;
}
-/**
- * amdgpu_ring_alloc_reemit - allocate space on the ring buffer for reemit
- *
- * @ring: amdgpu_ring structure holding ring information
- * @ndw: number of dwords to allocate in the ring buffer
- *
- * Allocate @ndw dwords in the ring buffer (all asics).
- * doesn't check the max_dw limit as we may be reemitting
- * several submissions.
- */
-static void amdgpu_ring_alloc_reemit(struct amdgpu_ring *ring, unsigned int ndw)
-{
- /* Align requested size with padding so unlock_commit can
- * pad safely */
- ndw = (ndw + ring->funcs->align_mask) & ~ring->funcs->align_mask;
-
- ring->count_dw = ndw;
- ring->wptr_old = ring->wptr;
-
- if (ring->funcs->begin_use)
- ring->funcs->begin_use(ring);
-}
-
/**
* amdgpu_ring_insert_nop - insert NOP packets
*
@@ -370,12 +347,6 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
/* Initialize cached_rptr to 0 */
ring->cached_rptr = 0;
- if (!ring->ring_backup) {
- ring->ring_backup = kvzalloc(ring->ring_size, GFP_KERNEL);
- if (!ring->ring_backup)
- return -ENOMEM;
- }
-
/* Allocate ring buffer */
if (ring->ring_obj == NULL) {
r = amdgpu_bo_create_kernel(adev, ring->ring_size + ring->funcs->extra_bytes,
@@ -386,7 +357,6 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
(void **)&ring->ring);
if (r) {
dev_err(adev->dev, "(%d) ring create failed\n", r);
- kvfree(ring->ring_backup);
return r;
}
amdgpu_ring_clear_ring(ring);
@@ -430,8 +400,6 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
amdgpu_bo_free_kernel(&ring->ring_obj,
&ring->gpu_addr,
(void **)&ring->ring);
- kvfree(ring->ring_backup);
- ring->ring_backup = NULL;
dma_fence_put(ring->vmid_wait);
ring->vmid_wait = NULL;
@@ -868,8 +836,6 @@ bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring)
void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
struct amdgpu_fence *guilty_fence)
{
- /* back up the non-guilty commands */
- amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
/* signal the guilty fence and set an error on all fences from the context */
if (guilty_fence)
amdgpu_fence_driver_update_timedout_fence_state(guilty_fence);
@@ -879,22 +845,8 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
struct amdgpu_fence *guilty_fence)
{
- unsigned int i;
- int r;
-
/* verify that the ring is functional */
- r = amdgpu_ring_test_ring(ring);
- if (r)
- return r;
-
- /* Re-emit the non-guilty commands */
- if (ring->ring_backup_entries_to_copy) {
- amdgpu_ring_alloc_reemit(ring, ring->ring_backup_entries_to_copy);
- for (i = 0; i < ring->ring_backup_entries_to_copy; i++)
- amdgpu_ring_write(ring, ring->ring_backup[i]);
- amdgpu_ring_commit(ring);
- }
- return 0;
+ return amdgpu_ring_test_ring(ring);
}
bool amdgpu_ring_is_reset_type_supported(struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 63272425a12f6..eae82b802399f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -150,11 +150,8 @@ struct amdgpu_fence {
u64 wptr;
/* fence context for resets */
u64 context;
- /* has this fence been reemitted */
- unsigned int reemitted;
- /* wptr for the fence for the submission */
- u64 fence_wptr_start;
- u64 fence_wptr_end;
+ /* fence state */
+ bool skip_ib;
};
extern const struct drm_sched_backend_ops amdgpu_sched_ops;
@@ -163,7 +160,6 @@ void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error);
void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring,
struct dma_fence *timedout_fence);
void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af);
-void amdgpu_fence_save_wptr(struct amdgpu_fence *af);
int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring);
int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
@@ -312,9 +308,6 @@ struct amdgpu_ring {
struct amdgpu_bo *ring_obj;
uint32_t *ring;
- /* backups for resets */
- uint32_t *ring_backup;
- unsigned int ring_backup_entries_to_copy;
unsigned rptr_offs;
u64 rptr_gpu_addr;
u32 *rptr_cpu_addr;
@@ -572,14 +565,14 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
enum amdgpu_ib_pool_type pool,
struct amdgpu_ib *ib);
void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f);
+int amdgpu_ib_emit(struct amdgpu_ring *ring, struct amdgpu_job *job);
int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
struct dma_fence **f);
+
int amdgpu_ib_pool_init(struct amdgpu_device *adev);
void amdgpu_ib_pool_fini(struct amdgpu_device *adev);
int amdgpu_ib_ring_tests(struct amdgpu_device *adev);
bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring);
-void amdgpu_ring_backup_unprocessed_commands(struct amdgpu_ring *ring,
- struct amdgpu_fence *guilty_fence);
void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
struct amdgpu_fence *guilty_fence);
int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
--
2.52.0
* [PATCH 42/42] drm/amdgpu: simplify per queue reset code
From: Alex Deucher @ 2026-01-08 14:48 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Drop the helpers and move all of the logic into the
common code. The reset callbacks just reset the
ring. All of the other logic (reemit, fence error
handling, etc.) is handled in the common code.
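The resulting split follows a template-method shape, sketched below as a hypothetical model (names are illustrative): the common timeout path handles fence error state, reemit, and verification around an IP-specific callback that does nothing but the reset itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the split: the per-IP callback only resets the
 * queue; everything else lives in common code. */
struct model_ring {
	bool hung;
	int (*reset)(struct model_ring *r);	/* IP-specific */
	int reemits, tests;
};

static int ip_reset(struct model_ring *r)
{
	r->hung = false;	/* just the hardware reset */
	return 0;
}

static int common_timeout(struct model_ring *r)
{
	/* common: mark the timed-out fence state (elided) */
	int ret = r->reset(r);	/* IP callback */
	if (ret)
		return ret;
	r->reemits++;		/* common: reemit pending jobs */
	r->tests++;		/* common: verify the ring works */
	return r->hung ? -1 : 0;
}
```

Adding a new IP then only requires supplying the reset callback, with no per-IP begin/end helper boilerplate.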
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 16 ----------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 4 ----
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 4 +---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 ++------
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 8 ++------
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 8 ++------
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 8 ++------
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 5 +----
drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c | 3 +--
drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 4 +---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 4 +---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 4 +---
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 4 +---
drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 4 +---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 +---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +--
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 3 +--
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 4 +---
28 files changed, 31 insertions(+), 96 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 91821207636ea..eb0ca53af0ebb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -164,6 +164,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
s_job->sched->name);
/* Stop the scheduler to prevent anybody else from touching the ring buffer. */
drm_sched_wqueue_stop(&ring->sched);
+ if (!ring->all_instance_rings_reset)
+ amdgpu_fence_driver_update_timedout_fence_state(job->hw_fence);
r = amdgpu_ring_reset(ring, job->vmid, job->hw_fence);
if (!r)
r = amdgpu_job_reemit_jobs(s_job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 1d94707fc86d9..24ec9fa0fb979 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -833,22 +833,6 @@ bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring)
return true;
}
-void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
- struct amdgpu_fence *guilty_fence)
-{
- /* signal the guilty fence and set an error on all fences from the context */
- if (guilty_fence)
- amdgpu_fence_driver_update_timedout_fence_state(guilty_fence);
-
-}
-
-int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
- struct amdgpu_fence *guilty_fence)
-{
- /* verify that the ring is functional */
- return amdgpu_ring_test_ring(ring);
-}
-
bool amdgpu_ring_is_reset_type_supported(struct amdgpu_ring *ring,
u32 reset_type)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index eae82b802399f..69395cab9ed3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -573,10 +573,6 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev);
void amdgpu_ib_pool_fini(struct amdgpu_device *adev);
int amdgpu_ib_ring_tests(struct amdgpu_device *adev);
bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring);
-void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
- struct amdgpu_fence *guilty_fence);
-int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
- struct amdgpu_fence *guilty_fence);
bool amdgpu_ring_is_reset_type_supported(struct amdgpu_ring *ring,
u32 reset_type);
#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 54ee78c034cdb..e1bcf8a5a36d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -911,8 +911,6 @@ static int vpe_ring_reset(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE,
AMD_PG_STATE_GATE);
if (r)
@@ -922,7 +920,7 @@ static int vpe_ring_reset(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static ssize_t amdgpu_get_vpe_reset_mask(struct device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index e0e125eef9ac5..73f709fe293d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -9517,8 +9517,6 @@ static int gfx_v10_0_reset_kgq(struct amdgpu_ring *ring,
if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues)
return -EINVAL;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
spin_lock_irqsave(&kiq->ring_lock, flags);
if (amdgpu_ring_alloc(kiq_ring, 5 + 7 + 7)) {
@@ -9566,7 +9564,7 @@ static int gfx_v10_0_reset_kgq(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int gfx_v10_0_reset_kcq(struct amdgpu_ring *ring,
@@ -9582,8 +9580,6 @@ static int gfx_v10_0_reset_kcq(struct amdgpu_ring *ring,
if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues)
return -EINVAL;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
spin_lock_irqsave(&kiq->ring_lock, flags);
if (amdgpu_ring_alloc(kiq_ring, kiq->pmf->unmap_queues_size)) {
@@ -9636,7 +9632,7 @@ static int gfx_v10_0_reset_kcq(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static void gfx_v10_ip_print(struct amdgpu_ip_block *ip_block, struct drm_printer *p)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index cc9ac87c5be02..8019cac16a181 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -6833,8 +6833,6 @@ static int gfx_v11_0_reset_kgq(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(ring->adev, ring, vmid, false, 0);
if (r) {
@@ -6856,7 +6854,7 @@ static int gfx_v11_0_reset_kgq(struct amdgpu_ring *ring,
return r;
}
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int gfx_v11_0_reset_compute_pipe(struct amdgpu_ring *ring)
@@ -6996,8 +6994,6 @@ static int gfx_v11_0_reset_kcq(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
int r = 0;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(ring->adev, ring, vmid, true, 0);
if (r) {
dev_warn(adev->dev, "fail(%d) to reset kcq and try pipe reset\n", r);
@@ -7017,7 +7013,7 @@ static int gfx_v11_0_reset_kcq(struct amdgpu_ring *ring,
return r;
}
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static void gfx_v11_ip_print(struct amdgpu_ip_block *ip_block, struct drm_printer *p)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index cbe175145286b..0afca15c5cd15 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -5308,8 +5308,6 @@ static int gfx_v12_0_reset_kgq(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(ring->adev, ring, vmid, false, 0);
if (r) {
dev_warn(adev->dev, "reset via MES failed and try pipe reset %d\n", r);
@@ -5330,7 +5328,7 @@ static int gfx_v12_0_reset_kgq(struct amdgpu_ring *ring,
return r;
}
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int gfx_v12_0_reset_compute_pipe(struct amdgpu_ring *ring)
@@ -5423,8 +5421,6 @@ static int gfx_v12_0_reset_kcq(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(ring->adev, ring, vmid, true, 0);
if (r) {
dev_warn(adev->dev, "fail(%d) to reset kcq and try pipe reset\n", r);
@@ -5444,7 +5440,7 @@ static int gfx_v12_0_reset_kcq(struct amdgpu_ring *ring,
return r;
}
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static void gfx_v12_0_ring_begin_use(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 07fe959abe0d7..f5d8e7cb78f04 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -7211,8 +7211,6 @@ static int gfx_v9_0_reset_kgq(struct amdgpu_ring *ring,
u32 tmp;
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
spin_lock_irqsave(&kiq->ring_lock, flags);
if (amdgpu_ring_alloc(kiq_ring, 5)) {
@@ -7261,7 +7259,7 @@ static int gfx_v9_0_reset_kgq(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
@@ -7277,8 +7275,6 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues)
return -EINVAL;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
spin_lock_irqsave(&kiq->ring_lock, flags);
if (amdgpu_ring_alloc(kiq_ring, kiq->pmf->unmap_queues_size)) {
@@ -7334,7 +7330,7 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
DRM_ERROR("fail to remap queue\n");
return r;
}
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static void gfx_v9_ip_print(struct amdgpu_ip_block *ip_block, struct drm_printer *p)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index fb731e877c99c..f26569ca03d51 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -3560,8 +3560,6 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues)
return -EINVAL;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
spin_lock_irqsave(&kiq->ring_lock, flags);
if (amdgpu_ring_alloc(kiq_ring, kiq->pmf->unmap_queues_size)) {
@@ -3624,8 +3622,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
goto pipe_reset;
}
-
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
enum amdgpu_gfx_cp_ras_mem_id {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
index 9fe8d10ab2705..9f3e042e13745 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
@@ -772,14 +772,13 @@ static int jpeg_v2_0_ring_reset(struct amdgpu_ring *ring,
{
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = jpeg_v2_0_stop(ring->adev);
if (r)
return r;
r = jpeg_v2_0_start(ring->adev);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v2_0_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
index 20983f126b490..0db65c58b67a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
@@ -650,10 +650,9 @@ static int jpeg_v2_5_ring_reset(struct amdgpu_ring *ring,
unsigned int vmid,
struct amdgpu_fence *timedout_fence)
{
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
jpeg_v2_5_stop_inst(ring->adev, ring->me);
jpeg_v2_5_start_inst(ring->adev, ring->me);
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v2_5_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
index 98f5e0622bc58..f8e09bbf7ff1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
@@ -564,14 +564,13 @@ static int jpeg_v3_0_ring_reset(struct amdgpu_ring *ring,
{
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = jpeg_v3_0_stop(ring->adev);
if (r)
return r;
r = jpeg_v3_0_start(ring->adev);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v3_0_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
index 0bd83820dd20c..c7b0b1773d3e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
@@ -729,14 +729,13 @@ static int jpeg_v4_0_ring_reset(struct amdgpu_ring *ring,
{
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = jpeg_v4_0_stop(ring->adev);
if (r)
return r;
r = jpeg_v4_0_start(ring->adev);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v4_0_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
index 50ed7fb0e941c..daddfbf6e2d8f 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
@@ -1145,10 +1145,9 @@ static int jpeg_v4_0_3_ring_reset(struct amdgpu_ring *ring,
unsigned int vmid,
struct amdgpu_fence *timedout_fence)
{
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
jpeg_v4_0_3_core_stall_reset(ring);
jpeg_v4_0_3_start_jrbc(ring);
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v4_0_3_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
index 54fd9c800c40a..96b8cbba382b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
@@ -774,14 +774,13 @@ static int jpeg_v4_0_5_ring_reset(struct amdgpu_ring *ring,
{
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = jpeg_v4_0_5_stop(ring->adev);
if (r)
return r;
r = jpeg_v4_0_5_start(ring->adev);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v4_0_5_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c
index 46bf15dce2bd0..43cff4db2c153 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c
@@ -650,14 +650,13 @@ static int jpeg_v5_0_0_ring_reset(struct amdgpu_ring *ring,
{
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = jpeg_v5_0_0_stop(ring->adev);
if (r)
return r;
r = jpeg_v5_0_0_start(ring->adev);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v5_0_0_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c
index ab0bf880d3d8a..146ec7b1d0ab9 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c
@@ -844,10 +844,9 @@ static int jpeg_v5_0_1_ring_reset(struct amdgpu_ring *ring,
unsigned int vmid,
struct amdgpu_fence *timedout_fence)
{
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
jpeg_v5_0_1_core_stall_reset(ring);
jpeg_v5_0_1_init_jrbc(ring);
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v5_0_1_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c
index 1821dced936fb..56c4a37520925 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c
@@ -633,14 +633,13 @@ static int jpeg_v5_3_0_ring_reset(struct amdgpu_ring *ring,
{
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = jpeg_v5_3_0_stop(ring->adev);
if (r)
return r;
r = jpeg_v5_3_0_start(ring->adev);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amd_ip_funcs jpeg_v5_3_0_ip_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index c5dc727c7b448..c983cadbdb808 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -1557,15 +1557,13 @@ static int sdma_v5_0_reset_queue(struct amdgpu_ring *ring,
return -EINVAL;
}
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
amdgpu_amdkfd_suspend(adev, true);
r = amdgpu_sdma_reset_engine(adev, ring->me, true);
amdgpu_amdkfd_resume(adev, true);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int sdma_v5_0_stop_queue(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 3076734462d25..cc9ecfa8673bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -1473,15 +1473,13 @@ static int sdma_v5_2_reset_queue(struct amdgpu_ring *ring,
return -EINVAL;
}
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
amdgpu_amdkfd_suspend(adev, true);
r = amdgpu_sdma_reset_engine(adev, ring->me, true);
amdgpu_amdkfd_resume(adev, true);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int sdma_v5_2_stop_queue(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index fbac29485d0c8..69aa10265891e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1602,8 +1602,6 @@ static int sdma_v6_0_reset_queue(struct amdgpu_ring *ring,
return -EINVAL;
}
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(adev, ring, vmid, true, 0);
if (r)
return r;
@@ -1612,7 +1610,7 @@ static int sdma_v6_0_reset_queue(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static int sdma_v6_0_set_trap_irq_state(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index bb9fae2c8dee0..1425c0c2ca9a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -814,8 +814,6 @@ static int sdma_v7_0_reset_queue(struct amdgpu_ring *ring,
return -EINVAL;
}
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(adev, ring, vmid, true, 0);
if (r)
return r;
@@ -824,7 +822,7 @@ static int sdma_v7_0_reset_queue(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
/**
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
index 5efdb4dcbed97..95768ff7f9985 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c
@@ -805,8 +805,6 @@ static int sdma_v7_1_reset_queue(struct amdgpu_ring *ring,
return -EINVAL;
}
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
r = amdgpu_mes_reset_legacy_queue(adev, ring, vmid, true, 0);
if (r)
return r;
@@ -815,7 +813,7 @@ static int sdma_v7_1_reset_queue(struct amdgpu_ring *ring,
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
/**
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index d17219be50f39..151da0e405a80 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1964,14 +1964,13 @@ static int vcn_v4_0_ring_reset(struct amdgpu_ring *ring,
struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = vcn_v4_0_stop(vinst);
if (r)
return r;
r = vcn_v4_0_start(vinst);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static struct amdgpu_ring_funcs vcn_v4_0_unified_ring_vm_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
index cb7123ec1a5d1..68cfa648a82c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
@@ -1605,8 +1605,6 @@ static int vcn_v4_0_3_ring_reset(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
vcn_inst = GET_INST(VCN, ring->me);
r = amdgpu_dpm_reset_vcn(adev, 1 << vcn_inst);
@@ -1621,7 +1619,7 @@ static int vcn_v4_0_3_ring_reset(struct amdgpu_ring *ring,
vcn_v4_0_3_hw_init_inst(vinst);
vcn_v4_0_3_start_dpg_mode(vinst, adev->vcn.inst[ring->me].indirect_sram);
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amdgpu_ring_funcs vcn_v4_0_3_unified_ring_vm_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index 1f6a22983c0dd..c9c32ef1f1317 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -1469,14 +1469,13 @@ static int vcn_v4_0_5_ring_reset(struct amdgpu_ring *ring,
struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = vcn_v4_0_5_stop(vinst);
if (r)
return r;
r = vcn_v4_0_5_start(vinst);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static struct amdgpu_ring_funcs vcn_v4_0_5_unified_ring_vm_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c
index 0202df5db1e12..34459b4b9b987 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c
@@ -1189,14 +1189,13 @@ static int vcn_v5_0_0_ring_reset(struct amdgpu_ring *ring,
struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
int r;
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
r = vcn_v5_0_0_stop(vinst);
if (r)
return r;
r = vcn_v5_0_0_start(vinst);
if (r)
return r;
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amdgpu_ring_funcs vcn_v5_0_0_unified_ring_vm_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
index 8bd457dea4cff..06b4399ef295a 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
@@ -1310,8 +1310,6 @@ static int vcn_v5_0_1_ring_reset(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
- amdgpu_ring_reset_helper_begin(ring, timedout_fence);
-
vcn_inst = GET_INST(VCN, ring->me);
r = amdgpu_dpm_reset_vcn(adev, 1 << vcn_inst);
@@ -1323,7 +1321,7 @@ static int vcn_v5_0_1_ring_reset(struct amdgpu_ring *ring,
vcn_v5_0_1_hw_init_inst(adev, ring->me);
vcn_v5_0_1_start_dpg_mode(vinst, vinst->indirect_sram);
- return amdgpu_ring_reset_helper_end(ring, timedout_fence);
+ return amdgpu_ring_test_ring(ring);
}
static const struct amdgpu_ring_funcs vcn_v5_0_1_unified_ring_vm_funcs = {
--
2.52.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-08 14:48 [PATCH 00/42] Improvements for IB handling Alex Deucher
` (41 preceding siblings ...)
2026-01-08 14:48 ` [PATCH 42/42] drm/amdgpu: simplify per queue reset code Alex Deucher
@ 2026-01-13 13:31 ` Christian König
2026-01-13 14:10 ` Alex Deucher
2026-01-13 21:17 ` Alex Deucher
42 siblings, 2 replies; 66+ messages in thread
From: Christian König @ 2026-01-13 13:31 UTC (permalink / raw)
To: Alex Deucher, amd-gfx
Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
Comment on patch #4, which also affects patches #5-#26.
Comments on patches #27 and #28. If #28 comes before #27, that would potentially solve the issue with #27.
Patch #31: Reviewed-by: Christian König <christian.koenig@amd.com>
Patches #32-#40: those look extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
Regards,
Christian.
On 1/8/26 15:48, Alex Deucher wrote:
> This set contains a number of bug fixes and cleanups for
> IB handling that I worked on over the holidays.
>
> Patches 1-2:
> Simple bug fixes.
>
> Patches 3-26:
> Removes the direct submit path for IBs and requires
> that all IB submissions use a job structure. This
> greatly simplifies the IB submission code.
>
> Patches 27-42:
> Split IB state setup and ring emission. This keeps all
> of the IB state in the job. This greatly simplifies
> re-emission of non-timed-out jobs after a ring reset and
> allows for re-emission multiple times if multiple resets
> happen in a row. It also properly handles the dma fence
> error handling for timedout jobs with adapter resets.
>
> Alex Deucher (42):
> drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
> drm/amdgpu: fix error handling in ib_schedule()
> drm/amdgpu: add new job ids
> drm/amdgpu/vpe: switch to using job for IBs
> drm/amdgpu/gfx6: switch to using job for IBs
> drm/amdgpu/gfx7: switch to using job for IBs
> drm/amdgpu/gfx8: switch to using job for IBs
> drm/amdgpu/gfx9: switch to using job for IBs
> drm/amdgpu/gfx9.4.2: switch to using job for IBs
> drm/amdgpu/gfx9.4.3: switch to using job for IBs
> drm/amdgpu/gfx10: switch to using job for IBs
> drm/amdgpu/gfx11: switch to using job for IBs
> drm/amdgpu/gfx12: switch to using job for IBs
> drm/amdgpu/gfx12.1: switch to using job for IBs
> drm/amdgpu/si_dma: switch to using job for IBs
> drm/amdgpu/cik_sdma: switch to using job for IBs
> drm/amdgpu/sdma2.4: switch to using job for IBs
> drm/amdgpu/sdma3: switch to using job for IBs
> drm/amdgpu/sdma4: switch to using job for IBs
> drm/amdgpu/sdma4.4.2: switch to using job for IBs
> drm/amdgpu/sdma5: switch to using job for IBs
> drm/amdgpu/sdma5.2: switch to using job for IBs
> drm/amdgpu/sdma6: switch to using job for IBs
> drm/amdgpu/sdma7: switch to using job for IBs
> drm/amdgpu/sdma7.1: switch to using job for IBs
> drm/amdgpu: require a job to schedule an IB
> drm/amdgpu: mark fences with errors before ring reset
> drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
> drm/amdgpu: don't call drm_sched_stop/start() in asic reset
> drm/amdgpu: drop drm_sched_increase_karma()
> drm/amdgpu: plumb timedout fence through to force completion
> drm/amdgpu: change function signature for emit_pipeline_sync()
> drm/amdgpu: drop extra parameter for vm_flush
> drm/amdgpu: move need_ctx_switch into amdgpu_job
> drm/amdgpu: store vm flush state in amdgpu_job
> drm/amdgpu: split fence init and emit logic
> drm/amdgpu: split vm flush and vm flush emit logic
> drm/amdgpu: split ib schedule and ib emit logic
> drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
> drm/amdgpu: add an all_instance_rings_reset ring flag
> drm/amdgpu: rework reset reemit handling
> drm/amdgpu: simplify per queue reset code
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 136 +++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 289 ++++++++++----------
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 40 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 13 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 67 -----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 37 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 21 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 141 +++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 45 +--
> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 36 ++-
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 41 ++-
> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 41 ++-
> drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 41 ++-
> drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 33 ++-
> drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 28 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 30 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 143 +++++-----
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 149 +++++-----
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +--
> drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 6 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 43 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 43 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 43 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 45 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 46 ++--
> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 45 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 45 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 45 +--
> drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 45 +--
> drivers/gpu/drm/amd/amdgpu/si_dma.c | 34 ++-
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 4 +-
> drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 +
> drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 +-
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 4 +-
> 54 files changed, 952 insertions(+), 966 deletions(-)
>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 13:31 ` [PATCH 00/42] Improvements for IB handling Christian König
@ 2026-01-13 14:10 ` Alex Deucher
2026-01-13 14:47 ` Christian König
2026-01-13 21:17 ` Alex Deucher
1 sibling, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 14:10 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 8:57 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Comment on patch #4, which also affects patches #5-#26.
>
> Comments on patches #27 and #28. If #28 comes before #27, that would potentially solve the issue with #27.
>
> Patch #31: Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Patches #32-#40: those look extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
>
> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
>
I disagree. If the ring emit functions are purely emitting
packets to the ring, it's a much cleaner approach than trying to save
and restore packet sequences repeatedly. If the relevant state is
stored in the job, you can re-emit it and get the same ring state each
time. If you end up with multiple queue resets in a row, it gets
really complex to try to save and restore opaque ring contents. By
the time you fix up the state tracking to handle that, you end up
pretty close to this solution.
Alex
> Regards,
> Christian.
>
> On 1/8/26 15:48, Alex Deucher wrote:
> > [snip]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 14:10 ` Alex Deucher
@ 2026-01-13 14:47 ` Christian König
2026-01-13 15:34 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Christian König @ 2026-01-13 14:47 UTC (permalink / raw)
To: Alex Deucher; +Cc: Alex Deucher, amd-gfx
On 1/13/26 15:10, Alex Deucher wrote:
> On Tue, Jan 13, 2026 at 8:57 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Comment on patch #4 which also affects patches #5-#26.
>>
>> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
>>
>> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Patches #32-#40: that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
>>
>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
>>
>
> I disagree. If the ring emit functions are purely just emitting
> packets to the ring, it's a much cleaner approach than trying to save
> and restore packet sequences repeatedly.
Exactly that's the problem, this is not what they do.
See gfx_v11_0_ring_emit_gfx_shadow() for an example:
...
	/*
	 * We start with skipping the prefix SET_Q_MODE and always executing
	 * the postfix SET_Q_MODE packet. This is changed below with a
	 * WRITE_DATA command when the postfix executed.
	 */
	amdgpu_ring_write(ring, shadow_va ? 1 : 0);
	amdgpu_ring_write(ring, 0);

	if (ring->set_q_mode_offs) {
		uint64_t addr;

		addr = amdgpu_bo_gpu_offset(ring->ring_obj);
		addr += ring->set_q_mode_offs << 2;
		end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
	}
...
	if (shadow_va) {
		uint64_t token = shadow_va ^ csa_va ^ gds_va ^ vmid;

		/*
		 * If the tokens match try to skip the last postfix SET_Q_MODE
		 * packet to avoid saving/restoring the state all the time.
		 */
		if (ring->set_q_mode_ptr && ring->set_q_mode_token == token)
			*ring->set_q_mode_ptr = 0;

		ring->set_q_mode_token = token;
	} else {
		ring->set_q_mode_ptr = &ring->ring[ring->set_q_mode_offs];
	}

	ring->set_q_mode_offs = offs;
}
Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
And that is just the tip of the iceberg, we have tons of state like this.
> If the relevant state is
> stored in the job, you can re-emit it and get the same ring state each
> time.
No, you can't. The background is that the relevant state is not job dependent, but inter-job dependent.
In other words, it doesn't depend on which job is executing now, but rather on which one was executed right before it.
Or even worse, in the case of the set_q_mode packet, it depends on the job executed after the one you want to execute.
I absolutely cannot see how stuff like that is supposed to work with re-submission.
> If you end up with multiple queue resets in a row, it gets
> really complex to try and save and restore opaque ring contents. By
> the time you fix up the state tracking to handle that, you end up
> pretty close to this solution.
Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
Updating a few fence pointers on re-submission is absolutely trivial compared to that.
Regards,
Christian.
>
> Alex
>
>> Regards,
>> Christian.
>>
>> On 1/8/26 15:48, Alex Deucher wrote:
>>> This set contains a number of bug fixes and cleanups for
>>> IB handling that I worked on over the holidays.
>>>
>>> Patches 1-2:
>>> Simple bug fixes.
>>>
>>> Patches 3-26:
>>> Removes the direct submit path for IBs and requires
>>> that all IB submissions use a job structure. This
>>> greatly simplifies the IB submission code.
>>>
>>> Patches 27-42:
>>> Split IB state setup and ring emission. This keeps all
>>> of the IB state in the job. This greatly simplifies
>>> re-emission of non-timed-out jobs after a ring reset and
>>> allows for re-emission multiple times if multiple resets
>>> happen in a row. It also properly handles the dma fence
>>> error handling for timedout jobs with adapter resets.
>>>
>>> Alex Deucher (42):
>>> drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
>>> drm/amdgpu: fix error handling in ib_schedule()
>>> drm/amdgpu: add new job ids
>>> drm/amdgpu/vpe: switch to using job for IBs
>>> drm/amdgpu/gfx6: switch to using job for IBs
>>> drm/amdgpu/gfx7: switch to using job for IBs
>>> drm/amdgpu/gfx8: switch to using job for IBs
>>> drm/amdgpu/gfx9: switch to using job for IBs
>>> drm/amdgpu/gfx9.4.2: switch to using job for IBs
>>> drm/amdgpu/gfx9.4.3: switch to using job for IBs
>>> drm/amdgpu/gfx10: switch to using job for IBs
>>> drm/amdgpu/gfx11: switch to using job for IBs
>>> drm/amdgpu/gfx12: switch to using job for IBs
>>> drm/amdgpu/gfx12.1: switch to using job for IBs
>>> drm/amdgpu/si_dma: switch to using job for IBs
>>> drm/amdgpu/cik_sdma: switch to using job for IBs
>>> drm/amdgpu/sdma2.4: switch to using job for IBs
>>> drm/amdgpu/sdma3: switch to using job for IBs
>>> drm/amdgpu/sdma4: switch to using job for IBs
>>> drm/amdgpu/sdma4.4.2: switch to using job for IBs
>>> drm/amdgpu/sdma5: switch to using job for IBs
>>> drm/amdgpu/sdma5.2: switch to using job for IBs
>>> drm/amdgpu/sdma6: switch to using job for IBs
>>> drm/amdgpu/sdma7: switch to using job for IBs
>>> drm/amdgpu/sdma7.1: switch to using job for IBs
>>> drm/amdgpu: require a job to schedule an IB
>>> drm/amdgpu: mark fences with errors before ring reset
>>> drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
>>> drm/amdgpu: don't call drm_sched_stop/start() in asic reset
>>> drm/amdgpu: drop drm_sched_increase_karma()
>>> drm/amdgpu: plumb timedout fence through to force completion
>>> drm/amdgpu: change function signature for emit_pipeline_sync()
>>> drm/amdgpu: drop extra parameter for vm_flush
>>> drm/amdgpu: move need_ctx_switch into amdgpu_job
>>> drm/amdgpu: store vm flush state in amdgpu_job
>>> drm/amdgpu: split fence init and emit logic
>>> drm/amdgpu: split vm flush and vm flush emit logic
>>> drm/amdgpu: split ib schedule and ib emit logic
>>> drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
>>> drm/amdgpu: add an all_instance_rings_reset ring flag
>>> drm/amdgpu: rework reset reemit handling
>>> drm/amdgpu: simplify per queue reset code
>>>
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 136 +++------
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 289 ++++++++++----------
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 40 ++-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 13 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 67 -----
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 37 +--
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 21 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 141 +++++-----
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 45 +--
>>> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 36 ++-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 41 ++-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 41 ++-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 41 ++-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 33 ++-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 28 +-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 30 +-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 143 +++++-----
>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 149 +++++-----
>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 +-
>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +--
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 6 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 43 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 43 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 43 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 45 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 46 ++--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 45 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 45 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 45 +--
>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 45 +--
>>> drivers/gpu/drm/amd/amdgpu/si_dma.c | 34 ++-
>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 +-
>>> drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 4 +-
>>> drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 +
>>> drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +
>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 +-
>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 3 +-
>>> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 4 +-
>>> 54 files changed, 952 insertions(+), 966 deletions(-)
>>>
>>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 14:47 ` Christian König
@ 2026-01-13 15:34 ` Alex Deucher
2026-01-13 22:36 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 15:34 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 9:48 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 1/13/26 15:10, Alex Deucher wrote:
> > On Tue, Jan 13, 2026 at 8:57 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >>
> >> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
> >>
> >> Comment on patch #4 which also affects patches #5-#26.
> >>
> >> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
> >>
> >> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
> >>
> >> Patches #32-#40: that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
> >>
> >> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
> >>
> >
> > I disagree. If the ring emit functions are purely just emitting
> > packets to the ring, it's a much cleaner approach than trying to save
> > and restore packet sequences repeatedly.
>
> Exactly that's the problem, this is not what they do.
>
> See gfx_v11_0_ring_emit_gfx_shadow() for an example:
>
> ...
> 	/*
> 	 * We start with skipping the prefix SET_Q_MODE and always executing
> 	 * the postfix SET_Q_MODE packet. This is changed below with a
> 	 * WRITE_DATA command when the postfix executed.
> 	 */
> 	amdgpu_ring_write(ring, shadow_va ? 1 : 0);
> 	amdgpu_ring_write(ring, 0);
>
> 	if (ring->set_q_mode_offs) {
> 		uint64_t addr;
>
> 		addr = amdgpu_bo_gpu_offset(ring->ring_obj);
> 		addr += ring->set_q_mode_offs << 2;
> 		end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
> 	}
> ...
> 	if (shadow_va) {
> 		uint64_t token = shadow_va ^ csa_va ^ gds_va ^ vmid;
>
> 		/*
> 		 * If the tokens match try to skip the last postfix SET_Q_MODE
> 		 * packet to avoid saving/restoring the state all the time.
> 		 */
> 		if (ring->set_q_mode_ptr && ring->set_q_mode_token == token)
> 			*ring->set_q_mode_ptr = 0;
>
> 		ring->set_q_mode_token = token;
> 	} else {
> 		ring->set_q_mode_ptr = &ring->ring[ring->set_q_mode_offs];
> 	}
>
> 	ring->set_q_mode_offs = offs;
> }
>
> Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
>
> And that is just the tip of the iceberg, we have tons of state like this.
There is not much more than that. I looked when I wrote these
patches. Even this state should be handled correctly. In this case,
the state is saved in the job at the original submission time and is
explicitly passed to the emit ring functions. As such the original
state is reproduced. In this case, ring->set_q_mode_offs and
ring->set_q_mode_ptr get reset in gfx_v11_0_ring_emit_vm_flush().
Then they get set as appropriate based on the saved state in the job
in gfx_v11_0_ring_emit_gfx_shadow(). It emits the same ring state
again.
>
> > If the relevant state is
> > stored in the job, you can re-emit it and get the same ring state each
> > time.
>
> No, you can't. The background is that the relevant state is not job dependent, but inter-job dependent.
>
> In other words, it doesn't depend on which job is executing now, but rather on which one was executed right before it.
>
> Or even worse, in the case of the set_q_mode packet, it depends on the job executed after the one you want to execute.
>
> I absolutely cannot see how stuff like that is supposed to work with re-submission.
All you need to do is save the state that was used to emit the packets
in the original submission.
>
> > If you end up with multiple queue resets in a row, it gets
> > really complex to try and save and restore opaque ring contents. By
> > the time you fix up the state tracking to handle that, you end up
> > pretty close to this solution.
>
> Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
>
> Updating a few fence pointers on re-submission is absolutely trivial compared to that.
It's not that easy. If you want to just emit the fences for bad
contexts rather than the whole IB stream, you can also potentially
mess up the ring state. You'd end up needing a pile of pointers that
need to be recalculated on every reset to try and re-emit the
appropriate state again. This approach also paves the way for
re-emitting state for all queues after adapter reset when VRAM is not
lost.
Alex
>
> Regards,
> Christian.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 15:34 ` Alex Deucher
@ 2026-01-13 22:36 ` Alex Deucher
2026-01-14 10:45 ` Christian König
0 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 22:36 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 10:34 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Jan 13, 2026 at 9:48 AM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > On 1/13/26 15:10, Alex Deucher wrote:
> > > On Tue, Jan 13, 2026 at 8:57 AM Christian König
> > > <christian.koenig@amd.com> wrote:
> > >>
> > >> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
> > >>
> > >> Comment on patch #4 which also affects patches #5-#26.
> > >>
> > >> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
> > >>
> > >> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
> > >>
> > >> Patches #32-#40: that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
> > >>
> > >> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
> > >>
> > >
> > > I disagree. If the ring emit functions are purely just emitting
> > > packets to the ring, it's a much cleaner approach than trying to save
> > > and restore packet sequences repeatedly.
> >
> > Exactly that's the problem, this is not what they do.
> >
> > See gfx_v11_0_ring_emit_gfx_shadow() for an example:
> >
> > ...
> > 	/*
> > 	 * We start with skipping the prefix SET_Q_MODE and always executing
> > 	 * the postfix SET_Q_MODE packet. This is changed below with a
> > 	 * WRITE_DATA command when the postfix executed.
> > 	 */
> > 	amdgpu_ring_write(ring, shadow_va ? 1 : 0);
> > 	amdgpu_ring_write(ring, 0);
> >
> > 	if (ring->set_q_mode_offs) {
> > 		uint64_t addr;
> >
> > 		addr = amdgpu_bo_gpu_offset(ring->ring_obj);
> > 		addr += ring->set_q_mode_offs << 2;
> > 		end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
> > 	}
> > ...
> > 	if (shadow_va) {
> > 		uint64_t token = shadow_va ^ csa_va ^ gds_va ^ vmid;
> >
> > 		/*
> > 		 * If the tokens match try to skip the last postfix SET_Q_MODE
> > 		 * packet to avoid saving/restoring the state all the time.
> > 		 */
> > 		if (ring->set_q_mode_ptr && ring->set_q_mode_token == token)
> > 			*ring->set_q_mode_ptr = 0;
> >
> > 		ring->set_q_mode_token = token;
> > 	} else {
> > 		ring->set_q_mode_ptr = &ring->ring[ring->set_q_mode_offs];
> > 	}
> >
> > 	ring->set_q_mode_offs = offs;
> > }
> >
> > Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
> >
> > And that is just the tip of the iceberg, we have tons of state like this.
>
> There is not much more than that. I looked when I wrote these
> patches. Even this state should be handled correctly. In this case,
> the state is saved in the job at the original submission time and is
> explicitly passed to the emit ring functions. As such the original
> state is reproduced. In this case, ring->set_q_mode_offs and
> ring->set_q_mode_ptr get reset in gfx_v11_0_ring_emit_vm_flush().
> Then they get set as appropriate based on the saved state in the job
> in gfx_v11_0_ring_emit_gfx_shadow(). It emits the same ring state
> again.
>
I just fixed up the set_q handling locally. I added a helper which
saves the state of the ring (any ring->set_q values, etc.) in the job
before we schedule the IB. Then after the reset I restore the ring
state before re-emitting the IB state. At that point the ring has the
same state it had before the queue was reset and the state gets
updated in the ring as the IBs are reemitted.
That's it. The only other state dependent on the ring was the seq
number to wait on for pipeline sync and I fixed that by making it
explicit.
Alex
^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 22:36 ` Alex Deucher
@ 2026-01-14 10:45 ` Christian König
2026-01-14 16:36 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Christian König @ 2026-01-14 10:45 UTC (permalink / raw)
To: Alex Deucher; +Cc: Alex Deucher, amd-gfx
On 1/13/26 23:36, Alex Deucher wrote:
> On Tue, Jan 13, 2026 at 10:34 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>
>> On Tue, Jan 13, 2026 at 9:48 AM Christian König
>> <christian.koenig@amd.com> wrote:
>>>
>>> On 1/13/26 15:10, Alex Deucher wrote:
>>>> On Tue, Jan 13, 2026 at 8:57 AM Christian König
>>>> <christian.koenig@amd.com> wrote:
>>>>>
>>>>> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> Comment on patch #4 which also affects patches #5-#26.
>>>>>
>>>>> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
>>>>>
>>>>> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> Patches #32-#40: that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
>>>>>
>>>>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
>>>>>
>>>>
>>>> I disagree. If the ring emit functions are purely just emitting
>>>> packets to the ring, it's a much cleaner approach than trying to save
>>>> and restore packet sequences repeatedly.
>>>
>>> Exactly that's the problem, this is not what they do.
>>>
>>> See gfx_v11_0_ring_emit_gfx_shadow() for an example:
>>>
>>> ...
>>> /*
>>> * We start with skipping the prefix SET_Q_MODE and always executing
>>> * the postfix SET_Q_MODE packet. This is changed below with a
>>> * WRITE_DATA command when the postfix executed.
>>> */
>>> amdgpu_ring_write(ring, shadow_va ? 1 : 0);
>>> amdgpu_ring_write(ring, 0);
>>>
>>> if (ring->set_q_mode_offs) {
>>> uint64_t addr;
>>>
>>> addr = amdgpu_bo_gpu_offset(ring->ring_obj);
>>> addr += ring->set_q_mode_offs << 2;
>>> end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
>>> }
>>> ...
>>> if (shadow_va) {
>>> uint64_t token = shadow_va ^ csa_va ^ gds_va ^ vmid;
>>>
>>> /*
>>> * If the tokens match try to skip the last postfix SET_Q_MODE
>>> * packet to avoid saving/restoring the state all the time.
>>> */
>>> if (ring->set_q_mode_ptr && ring->set_q_mode_token == token)
>>> *ring->set_q_mode_ptr = 0;
>>>
>>> ring->set_q_mode_token = token;
>>> } else {
>>> ring->set_q_mode_ptr = &ring->ring[ring->set_q_mode_offs];
>>> }
>>>
>>> ring->set_q_mode_offs = offs;
>>> }
>>>
>>> Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
>>>
>>> And that is just the tip of the iceberg, we have tons of state like this.
>>
>> There is not much more than that. I looked when I wrote these
>> patches. Even this state should be handled correctly. In this case,
>> the state is saved in the job at the original submission time and is
>> explicitly passed to the emit ring functions. As such the original
>> state is reproduced. In this case, ring->set_q_mode_offs and
>> ring->set_q_mode_ptr get reset in gfx_v11_0_ring_emit_vm_flush().
>> Then they get set as appropriate based on the saved state in the job
>> in gfx_v11_0_ring_emit_gfx_shadow(). It emits the same ring state
>> again.
>>
>
> I just fixed up the set_q handling locally. I added a helper which
> saves the state of the ring (any ring->set_q values, etc.) in the job
> before we schedule the IB. Then after the reset I restore the ring
> state before re-emitting the IB state.
Exactly that doesn't work.

The set_q_mode handling works by looking at the next job in the queue
and determining, based on the PM4 code, whether executing the packet
is necessary or not.

When we drop some jobs from execution because they belong to the same
context as the one that caused the timeout, we write incorrect
commands into the PM4 stream when re-emitting.

We would need to extend the handling in a way where we can say this
job is now skipped, but pretend that it isn't so that the set_q_mode
handling still works, and then still not execute the IBs in the job.

Long story short, that is seriously not going to work. So an
absolutely clear NAK from my side to this approach.

What we could do to avoid problems and patching pointers in the
command stream is to emit only the fence signaling for skipped jobs
and fill everything else with NOPs.
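[Editor's sketch] That NOP-fill fallback can be modeled as a toy, entirely outside the driver: overwrite a skipped job's portion of the ring with NOPs while leaving its fence packet intact so the job still signals completion. All names and opcodes below are made up for the sketch; real PM4 packets are multi-dword and engine specific.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 64u
#define PKT_NOP   0x10000000u /* placeholder NOP opcode, not real PM4 */
#define PKT_FENCE 0x20000000u /* placeholder fence packet */

/*
 * Overwrite a skipped job's range of the ring with NOPs, keeping the
 * fence packet so completion is still signaled.  Offsets are in
 * dwords; the modulo handles ring wrap-around.
 */
static void skip_job_keep_fence(uint32_t *ring, unsigned int start,
				unsigned int end, unsigned int fence_offs,
				unsigned int fence_len)
{
	unsigned int i;

	for (i = start; i < end; i++) {
		if (i >= fence_offs && i < fence_offs + fence_len)
			continue; /* keep the fence packet */
		ring[i % RING_SIZE] = PKT_NOP;
	}
}
```

The point of the sketch is only that the skipped region needs no pointer patching: everything except the fence becomes inert.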
Regards,
Christian.
> At that point the ring has the
> same state it had before the queue was reset and the state gets
> updated in the ring as the IBs are reemitted.
>
> That's it. The only other state dependent on the ring was the seq
> number to wait on for pipeline sync and I fixed that by making it
> explicit.
>
> Alex
>
>>>
>>>> If the relevant state is
>>>> stored in the job, you can re-emit it and get the same ring state each
>>>> time.
>>>
>>> No, you can't. Background is that the relevant state is not job dependent, but inter job dependent.
>>>
>>> In other words it doesn't depend on what job is executing now but rather which one was executed right before that one.
>>>
>>> Or even worse in the case of the set_q_mode packet on the job dependent after the one you want to execute.
>>>
>>> I can absolutely not see how stuff like that should work with re-submission.
>>
>> All you need to do is save the state that was used to emit the packets
>> in the original submission.
>>
>>>
>>>> If you end up with multiple queue resets in a row, it gets
>>>> really complex to try and save and restore opaque ring contents. By
>>>> the time you fix up the state tracking to handle that, you end up
>>>> pretty close to this solution.
>>>
>>> Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
>>>
>>> Updating a few fence pointers on re-submission is absolutely trivial compared to that.
>>
>> It's not that easy. If you want to just emit the fences for bad
>> contexts rather than the whole IB stream, you can also potentially
>> mess up the ring state. You'd end up needing a pile of pointers that
>> need to be recalculated on every reset to try and re-emit the
>> appropriate state again. This approach also paves the way for
>> re-emitting state for all queues after adapter reset when VRAM is not
>> lost.
>>
>> Alex
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Alex
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 1/8/26 15:48, Alex Deucher wrote:
>>>>>> This set contains a number of bug fixes and cleanups for
>>>>>> IB handling that I worked on over the holidays.
>>>>>>
>>>>>> Patches 1-2:
>>>>>> Simple bug fixes.
>>>>>>
>>>>>> Patches 3-26:
>>>>>> Removes the direct submit path for IBs and requires
>>>>>> that all IB submissions use a job structure. This
>>>>>> greatly simplifies the IB submission code.
>>>>>>
>>>>>> Patches 27-42:
>>>>>> Split IB state setup and ring emission. This keeps all
>>>>>> of the IB state in the job. This greatly simplifies
>>>>>> re-emission of non-timed-out jobs after a ring reset and
>>>>>> allows for re-emission multiple times if multiple resets
>>>>>> happen in a row. It also properly handles the dma fence
>>>>>> error handling for timedout jobs with adapter resets.
>>>>>>
>>>>>> Alex Deucher (42):
>>>>>> drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
>>>>>> drm/amdgpu: fix error handling in ib_schedule()
>>>>>> drm/amdgpu: add new job ids
>>>>>> drm/amdgpu/vpe: switch to using job for IBs
>>>>>> drm/amdgpu/gfx6: switch to using job for IBs
>>>>>> drm/amdgpu/gfx7: switch to using job for IBs
>>>>>> drm/amdgpu/gfx8: switch to using job for IBs
>>>>>> drm/amdgpu/gfx9: switch to using job for IBs
>>>>>> drm/amdgpu/gfx9.4.2: switch to using job for IBs
>>>>>> drm/amdgpu/gfx9.4.3: switch to using job for IBs
>>>>>> drm/amdgpu/gfx10: switch to using job for IBs
>>>>>> drm/amdgpu/gfx11: switch to using job for IBs
>>>>>> drm/amdgpu/gfx12: switch to using job for IBs
>>>>>> drm/amdgpu/gfx12.1: switch to using job for IBs
>>>>>> drm/amdgpu/si_dma: switch to using job for IBs
>>>>>> drm/amdgpu/cik_sdma: switch to using job for IBs
>>>>>> drm/amdgpu/sdma2.4: switch to using job for IBs
>>>>>> drm/amdgpu/sdma3: switch to using job for IBs
>>>>>> drm/amdgpu/sdma4: switch to using job for IBs
>>>>>> drm/amdgpu/sdma4.4.2: switch to using job for IBs
>>>>>> drm/amdgpu/sdma5: switch to using job for IBs
>>>>>> drm/amdgpu/sdma5.2: switch to using job for IBs
>>>>>> drm/amdgpu/sdma6: switch to using job for IBs
>>>>>> drm/amdgpu/sdma7: switch to using job for IBs
>>>>>> drm/amdgpu/sdma7.1: switch to using job for IBs
>>>>>> drm/amdgpu: require a job to schedule an IB
>>>>>> drm/amdgpu: mark fences with errors before ring reset
>>>>>> drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
>>>>>> drm/amdgpu: don't call drm_sched_stop/start() in asic reset
>>>>>> drm/amdgpu: drop drm_sched_increase_karma()
>>>>>> drm/amdgpu: plumb timedout fence through to force completion
>>>>>> drm/amdgpu: change function signature for emit_pipeline_sync()
>>>>>> drm/amdgpu: drop extra parameter for vm_flush
>>>>>> drm/amdgpu: move need_ctx_switch into amdgpu_job
>>>>>> drm/amdgpu: store vm flush state in amdgpu_job
>>>>>> drm/amdgpu: split fence init and emit logic
>>>>>> drm/amdgpu: split vm flush and vm flush emit logic
>>>>>> drm/amdgpu: split ib schedule and ib emit logic
>>>>>> drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
>>>>>> drm/amdgpu: add an all_instance_rings_reset ring flag
>>>>>> drm/amdgpu: rework reset reemit handling
>>>>>> drm/amdgpu: simplify per queue reset code
>>>>>>
>>>>>> [diffstat snipped]
>>>>>>
>>>>>
>>>
^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/42] Improvements for IB handling
2026-01-14 10:45 ` Christian König
@ 2026-01-14 16:36 ` Alex Deucher
2026-01-15 9:07 ` Christian König
0 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-14 16:36 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Wed, Jan 14, 2026 at 5:45 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 1/13/26 23:36, Alex Deucher wrote:
> > On Tue, Jan 13, 2026 at 10:34 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> >>
> >> On Tue, Jan 13, 2026 at 9:48 AM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>>
> >>> On 1/13/26 15:10, Alex Deucher wrote:
> >>>> On Tue, Jan 13, 2026 at 8:57 AM Christian König
> >>>> <christian.koenig@amd.com> wrote:
> >>>>>
> >>>>> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>>>
> >>>>> Comment on patch #4 which also affects patches #5-#26.
> >>>>>
> >>>>> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
> >>>>>
> >>>>> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>>>
> >>>>> Patches #32-#40: that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
> >>>>>
> >>>>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
> >>>>>
> >>>>
> >>>> I disagree. If the ring emit functions are purely just emitting
> >>>> packets to the ring, it's a much cleaner approach than trying to save
> >>>> and restore packet sequences repeatedly.
> >>>
> >>> Exactly that's the problem, this is not what they do.
> >>>
> >>> See gfx_v11_0_ring_emit_gfx_shadow() for an example:
> >>>
> >>> [code example snipped]
> >>>
> >>> Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
> >>>
> >>> And that is just the tip of the iceberg, we have tons of state like this.
> >>
> >> There is not much more than that. I looked when I wrote these
> >> patches. Even this state should be handled correctly. In this case,
> >> the state is saved in the job at the original submission time and is
> >> explicitly passed to the emit ring functions. As such the original
> >> state is reproduced. In this case, ring->set_q_mode_offs and
> >> ring->set_q_mode_ptr get reset in gfx_v11_0_ring_emit_vm_flush().
> >> Then they get set as appropriate based on the saved state in the job
> >> in gfx_v11_0_ring_emit_gfx_shadow(). It emits the same ring state
> >> again.
> >>
> >
> > I just fixed up the set_q handling locally. I added a helper which
> > saves the state of the ring (any ring->set_q values, etc.) in the job
> > before we schedule the IB. Then after the reset I restore the ring
> > state before re-emitting the IB state.
>
> Exactly that doesn't work.
>
> The set_q_mode handling works by looking at the next job in the queue and determining, based on the PM4 code, whether executing the packet is necessary or not.
>
> When we drop some jobs from execution because they belong to the same context as the one that caused the timeout, we write incorrect commands into the PM4 stream when re-emitting.
>
> We would need to extend the handling in a way where we can say this job is now skipped, but pretend that it isn't so that the set_q_mode handling still works, and then still not execute the IBs in the job.
>
Explicit re-emit is the only way this can easily work correctly. We
save the ring state and job state in the job, and then after the
reset we replay the state and re-emit a proper, coherent packet
stream. When we re-emit, we update the offsets as appropriate so
that the logic works properly as you replay the job stream. You can
skip the IBs for the timed-out context, but as long as the rest of
the logic is there, everything works. Saving and restoring the
opaque ring contents is much harder because you need to either save
a bunch of pointers or try to determine which offsets to patch, etc.
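[Editor's sketch] The save/replay idea can be illustrated with a toy model (struct layout and helper names are purely illustrative, not the driver's API): capture the ring-side set_q_mode bookkeeping in the job at first submission, then restore it onto the ring before re-emitting, so the replayed stream reproduces the original state.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of the per-ring state discussed in this thread. */
struct toy_ring {
	uint32_t set_q_mode_offs;
	uint64_t set_q_mode_token;
};

struct toy_job {
	struct toy_ring saved; /* ring state captured at first submission */
};

/* Call before scheduling the IB for the first time. */
static void job_save_ring_state(struct toy_job *job,
				const struct toy_ring *ring)
{
	job->saved = *ring;
}

/* Call after a queue reset, before re-emitting this job. */
static void job_restore_ring_state(const struct toy_job *job,
				   struct toy_ring *ring)
{
	*ring = job->saved;
}
```

With this shape the replay loop restores, re-emits, and the normal emit path updates the ring state forward again, even across repeated resets.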
Alex
> Long story short that is seriously not going to work. So absolutely clear NAK from my side to this approach.
>
> What we could do to avoid problems and patching pointers in the command stream is to emit only the fence signaling for skipped jobs and fill everything else with NOPs.
>
> Regards,
> Christian.
>
> > At that point the ring has the
> > same state it had before the queue was reset and the state gets
> > updated in the ring as the IBs are reemitted.
> >
> > That's it. The only other state dependent on the ring was the seq
> > number to wait on for pipeline sync and I fixed that by making it
> > explicit.
> >
> > Alex
> >
> >>>
> >>>> If the relevant state is
> >>>> stored in the job, you can re-emit it and get the same ring state each
> >>>> time.
> >>>
> >>> No, you can't. Background is that the relevant state is not job dependent, but inter job dependent.
> >>>
> >>> In other words it doesn't depend on what job is executing now but rather which one was executed right before that one.
> >>>
> >>> Or even worse in the case of the set_q_mode packet on the job dependent after the one you want to execute.
> >>>
> >>> I can absolutely not see how stuff like that should work with re-submission.
> >>
> >> All you need to do is save the state that was used to emit the packets
> >> in the original submission.
> >>
> >>>
> >>>> If you end up with multiple queue resets in a row, it gets
> >>>> really complex to try and save and restore opaque ring contents. By
> >>>> the time you fix up the state tracking to handle that, you end up
> >>>> pretty close to this solution.
> >>>
> >>> Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
> >>>
> >>> Updating a few fence pointers on re-submission is absolutely trivial compared to that.
> >>
> >> It's not that easy. If you want to just emit the fences for bad
> >> contexts rather than the whole IB stream, you can also potentially
> >> mess up the ring state. You'd end up needing a pile of pointers that
> >> need to be recalculated on every reset to try and re-emit the
> >> appropriate state again. This approach also paves the way for
> >> re-emitting state for all queues after adapter reset when VRAM is not
> >> lost.
> >>
> >> Alex
> >>
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>
> >>>> Alex
> >>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>> On 1/8/26 15:48, Alex Deucher wrote:
> >>>>>> [cover letter, patch list and diffstat snipped]
> >>>>>>
> >>>>>
> >>>
>
^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/42] Improvements for IB handling
2026-01-14 16:36 ` Alex Deucher
@ 2026-01-15 9:07 ` Christian König
2026-01-15 14:08 ` Alex Deucher
0 siblings, 1 reply; 66+ messages in thread
From: Christian König @ 2026-01-15 9:07 UTC (permalink / raw)
To: Alex Deucher; +Cc: Alex Deucher, amd-gfx
On 1/14/26 17:36, Alex Deucher wrote:
> On Wed, Jan 14, 2026 at 5:45 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 1/13/26 23:36, Alex Deucher wrote:
>>> On Tue, Jan 13, 2026 at 10:34 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>>>
>>>> On Tue, Jan 13, 2026 at 9:48 AM Christian König
>>>> <christian.koenig@amd.com> wrote:
>>>>>
>>>>> On 1/13/26 15:10, Alex Deucher wrote:
>>>>>> On Tue, Jan 13, 2026 at 8:57 AM Christian König
>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>
>>>>>>> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>>>
>>>>>>> Comment on patch #4 which also affects patches #5-#26.
>>>>>>>
>>>>>>> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
>>>>>>>
>>>>>>> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>>>
>>>>>>> Patches #32-#40: that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
>>>>>>>
>>>>>>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
>>>>>>>
>>>>>>
>>>>>> I disagree. If the ring emit functions are purely just emitting
>>>>>> packets to the ring, it's a much cleaner approach than trying to save
>>>>>> and restore packet sequences repeatedly.
>>>>>
>>>>> Exactly that's the problem, this is not what they do.
>>>>>
>>>>> See gfx_v11_0_ring_emit_gfx_shadow() for an example:
>>>>>
>>>>> [code example snipped]
>>>>>
>>>>> Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
>>>>>
>>>>> And that is just the tip of the iceberg, we have tons of state like this.
>>>>
>>>> There is not much more than that. I looked when I wrote these
>>>> patches. Even this state should be handled correctly. In this case,
>>>> the state is saved in the job at the original submission time and is
>>>> explicitly passed to the emit ring functions. As such the original
>>>> state is reproduced. In this case, ring->set_q_mode_offs and
>>>> ring->set_q_mode_ptr get reset in gfx_v11_0_ring_emit_vm_flush().
>>>> Then they get set as appropriate based on the saved state in the job
>>>> in gfx_v11_0_ring_emit_gfx_shadow(). It emits the same ring state
>>>> again.
>>>>
>>>
>>> I just fixed up the set_q handling locally. I added a helper which
>>> saves the state of the ring (any ring->set_q values, etc.) in the job
>>> before we schedule the IB. Then after the reset I restore the ring
>>> state before re-emitting the IB state.
>>
>> Exactly that doesn't work.
>>
>> See the set_q_mode handling works by looking at the next job in the queue and determining based on the PM4 code whether executing the packet is necessary or not.
>>
>> When we drop some jobs from execution because they belong to the same context as the one who caused the timeout we write incorrect commands into the PM4 stream when re-emitting.
>>
>> We would need to extend the handling in a way where we can say ok this job is now skipped, but we need to pretend that it isn't so that the set_q_mode handling works and then still not execute the IBs in the job.
>>
>
> Explicit re-emit is the only way this can easily work correctly. We
> save the ring state and job state in the job and then we replay
> the state and re-emit a proper coherent packet stream after the reset.
> When we re-emit, we update the offsets as appropriate so that the
> logic works properly as you replay the job stream. You can skip the
> IBs for the timedout context, but as long as the rest of the logic is
> there, everything works. Saving and restoring the opaque ring
> contents is much harder because you need to either save a bunch of
> pointers or try and determine which offsets to patch, etc.
Or you tell the HW to continue at the place where it stopped executing before the reset and use the conditional execute all jobs are wrapped in anyway to determine whether they should execute or not, or overwrite the commands with NOPs for engines which don't use the conditional execute.
Re-emitting the command stream would only be necessary if we need to change the commands in any way, and even if we did, I would say that we should not emit the commands again at all.
I have patches in the pipeline to remove the job object from the reset path, so that we can free it up directly after submission again and completely solve all the lifetime issues we had with that.
Re-emitting completely breaks that again.
Christian.
>
> Alex
>
>> Long story short that is seriously not going to work. So absolutely clear NAK from my side to this approach.
>>
>> What we could do to avoid problems and patching pointers in the command stream is to emit only the fence signaling for skipped jobs and fill everything else with NOPs.
>>
>> Regards,
>> Christian.
>>
>>> At that point the ring has the
>>> same state it had before the queue was reset and the state gets
>>> updated in the ring as the IBs are reemitted.
>>>
>>> That's it. The only other state dependent on the ring was the seq
>>> number to wait on for pipeline sync and I fixed that by making it
>>> explicit.
>>>
>>> Alex
>>>
>>>>>
>>>>>> If the relevant state is
>>>>>> stored in the job, you can re-emit it and get the same ring state each
>>>>>> time.
>>>>>
>>>>> No, you can't. Background is that the relevant state is not job dependent, but inter job dependent.
>>>>>
>>>>> In other words it doesn't depend on what job is executing now but rather which one was executed right before that one.
>>>>>
>>>>> Or even worse in the case of the set_q_mode packet on the job dependent after the one you want to execute.
>>>>>
>>>>> I can absolutely not see how stuff like that should work with re-submission.
>>>>
>>>> All you need to do is save the state that was used to emit the packets
>>>> in the original submission.
>>>>
>>>>>
>>>>>> If you end up with multiple queue resets in a row, it gets
>>>>>> really complex to try and save and restore opaque ring contents. By
>>>>>> the time you fix up the state tracking to handle that, you end up
>>>>>> pretty close to this solution.
>>>>>
>>>>> Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
>>>>>
>>>>> Updating a few fence pointers on re-submission is absolutely trivial compared to that.
>>>>
>>>> It's not that easy. If you want to just emit the fences for bad
>>>> contexts rather than the whole IB stream, you can also potentially
>>>> mess up the ring state. You'd end up needing a pile of pointers that
>>>> need to be recalculated on every reset to try and re-emit the
>>>> appropriate state again. This approach also paves the way for
>>>> re-emitting state for all queues after adapter reset when VRAM is not
>>>> lost.
>>>>
>>>> Alex
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 1/8/26 15:48, Alex Deucher wrote:
>>>>>>>> This set contains a number of bug fixes and cleanups for
>>>>>>>> IB handling that I worked on over the holidays.
>>>>>>>>
>>>>>>>> Patches 1-2:
>>>>>>>> Simple bug fixes.
>>>>>>>>
>>>>>>>> Patches 3-26:
>>>>>>>> Removes the direct submit path for IBs and requires
>>>>>>>> that all IB submissions use a job structure. This
>>>>>>>> greatly simplifies the IB submission code.
>>>>>>>>
>>>>>>>> Patches 27-42:
>>>>>>>> Split IB state setup and ring emission. This keeps all
>>>>>>>> of the IB state in the job. This greatly simplifies
>>>>>>>> re-emission of non-timed-out jobs after a ring reset and
>>>>>>>> allows for re-emission multiple times if multiple resets
>>>>>>>> happen in a row. It also properly handles the dma fence
>>>>>>>> error handling for timedout jobs with adapter resets.
>>>>>>>>
>>>>>>>> Alex Deucher (42):
>>>>>>>> drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
>>>>>>>> drm/amdgpu: fix error handling in ib_schedule()
>>>>>>>> drm/amdgpu: add new job ids
>>>>>>>> drm/amdgpu/vpe: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx6: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx7: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx8: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx9: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx9.4.2: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx9.4.3: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx10: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx11: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx12: switch to using job for IBs
>>>>>>>> drm/amdgpu/gfx12.1: switch to using job for IBs
>>>>>>>> drm/amdgpu/si_dma: switch to using job for IBs
>>>>>>>> drm/amdgpu/cik_sdma: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma2.4: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma3: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma4: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma4.4.2: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma5: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma5.2: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma6: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma7: switch to using job for IBs
>>>>>>>> drm/amdgpu/sdma7.1: switch to using job for IBs
>>>>>>>> drm/amdgpu: require a job to schedule an IB
>>>>>>>> drm/amdgpu: mark fences with errors before ring reset
>>>>>>>> drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
>>>>>>>> drm/amdgpu: don't call drm_sched_stop/start() in asic reset
>>>>>>>> drm/amdgpu: drop drm_sched_increase_karma()
>>>>>>>> drm/amdgpu: plumb timedout fence through to force completion
>>>>>>>> drm/amdgpu: change function signature for emit_pipeline_sync()
>>>>>>>> drm/amdgpu: drop extra parameter for vm_flush
>>>>>>>> drm/amdgpu: move need_ctx_switch into amdgpu_job
>>>>>>>> drm/amdgpu: store vm flush state in amdgpu_job
>>>>>>>> drm/amdgpu: split fence init and emit logic
>>>>>>>> drm/amdgpu: split vm flush and vm flush emit logic
>>>>>>>> drm/amdgpu: split ib schedule and ib emit logic
>>>>>>>> drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
>>>>>>>> drm/amdgpu: add an all_instance_rings_reset ring flag
>>>>>>>> drm/amdgpu: rework reset reemit handling
>>>>>>>> drm/amdgpu: simplify per queue reset code
>>>>>>>>
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 136 +++------
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 289 ++++++++++----------
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 40 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 13 +
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 67 -----
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 37 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 21 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 141 +++++-----
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 45 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 36 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 41 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 41 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 41 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 33 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 28 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 30 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 143 +++++-----
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 149 +++++-----
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 6 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 43 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 43 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 43 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 45 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 46 ++--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 45 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 45 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 45 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 45 +--
>>>>>>>> drivers/gpu/drm/amd/amdgpu/si_dma.c | 34 ++-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 4 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 +
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 3 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 4 +-
>>>>>>>> 54 files changed, 952 insertions(+), 966 deletions(-)
>>>>>>>>
>>>>>>>
>>>>>
>>
^ permalink raw reply [flat|nested] 66+ messages in thread

* Re: [PATCH 00/42] Improvements for IB handling
2026-01-15 9:07 ` Christian König
@ 2026-01-15 14:08 ` Alex Deucher
2026-01-15 14:54 ` Christian König
0 siblings, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-15 14:08 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Thu, Jan 15, 2026 at 4:08 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 1/14/26 17:36, Alex Deucher wrote:
> > On Wed, Jan 14, 2026 at 5:45 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >>
> >> On 1/13/26 23:36, Alex Deucher wrote:
> >>> On Tue, Jan 13, 2026 at 10:34 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> >>>>
> >>>> On Tue, Jan 13, 2026 at 9:48 AM Christian König
> >>>> <christian.koenig@amd.com> wrote:
> >>>>>
> >>>>> On 1/13/26 15:10, Alex Deucher wrote:
> >>>>>> On Tue, Jan 13, 2026 at 8:57 AM Christian König
> >>>>>> <christian.koenig@amd.com> wrote:
> >>>>>>>
> >>>>>>> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>>>>>
> >>>>>>> Comment on patch #4 which also affects patches #5-#26.
> >>>>>>>
> >>>>>>> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
> >>>>>>>
> >>>>>>> Patches #31: Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>>>>>
> >>>>>>> Patches #32-#40 that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
> >>>>>>>
> >>>>>>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
> >>>>>>>
> >>>>>>
> >>>>>> I disagree. If the ring emit functions are purely just emitting
> >>>>>> packets to the ring, it's a much cleaner approach than trying to save
> >>>>>> and restore packet sequences repeatedly.
> >>>>>
> >>>>> Exactly that's the problem, this is not what they do.
> >>>>>
> >>>>> See gfx_v11_0_ring_emit_gfx_shadow() for an example:
> >>>>>
> >>>>> ...
> >>>>> /*
> >>>>> * We start with skipping the prefix SET_Q_MODE and always executing
> >>>>> * the postfix SET_Q_MODE packet. This is changed below with a
> >>>>> * WRITE_DATA command when the postfix executed.
> >>>>> */
> >>>>> amdgpu_ring_write(ring, shadow_va ? 1 : 0);
> >>>>> amdgpu_ring_write(ring, 0);
> >>>>>
> >>>>> if (ring->set_q_mode_offs) {
> >>>>> uint64_t addr;
> >>>>>
> >>>>> addr = amdgpu_bo_gpu_offset(ring->ring_obj);
> >>>>> addr += ring->set_q_mode_offs << 2;
> >>>>> end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
> >>>>> }
> >>>>> ...
> >>>>> if (shadow_va) {
> >>>>> uint64_t token = shadow_va ^ csa_va ^ gds_va ^ vmid;
> >>>>>
> >>>>> /*
> >>>>> * If the tokens match try to skip the last postfix SET_Q_MODE
> >>>>> * packet to avoid saving/restoring the state all the time.
> >>>>> */
> >>>>> if (ring->set_q_mode_ptr && ring->set_q_mode_token == token)
> >>>>> *ring->set_q_mode_ptr = 0;
> >>>>>
> >>>>> ring->set_q_mode_token = token;
> >>>>> } else {
> >>>>> ring->set_q_mode_ptr = &ring->ring[ring->set_q_mode_offs];
> >>>>> }
> >>>>>
> >>>>> ring->set_q_mode_offs = offs;
> >>>>> }
> >>>>>
> >>>>> Executing this multiple times is simply not possible without saving set_q_mode_offs, the token and the CPU pointer (and restoring the CPU pointer content).
> >>>>>
> >>>>> And that is just the tip of the iceberg, we have tons of state like this.
> >>>>
> >>>> There is not much more than that. I looked when I wrote these
> >>>> patches. Even this state should be handled correctly. In this case,
> >>>> the state is saved in the job at the original submission time and is
> >>>> explicitly passed to the emit ring functions. As such the original
> >>>> state is reproduced. In this case, ring->set_q_mode_offs and
> >>>> ring->set_q_mode_ptr get reset in gfx_v11_0_ring_emit_vm_flush().
> >>>> Then they get set as appropriate based on the saved state in the job
> >>>> in gfx_v11_0_ring_emit_gfx_shadow(). It emits the same ring state
> >>>> again.
> >>>>
> >>>
> >>> I just fixed up the set_q handling locally. I added a helper which
> >>> saves the state of the ring (any ring->set_q values, etc.) in the job
> >>> before we schedule the IB. Then after the reset I restore the ring
> >>> state before re-emitting the IB state.
> >>
> >> Exactly that doesn't work.
> >>
> >> See the set_q_mode handling works by looking at the next job in the queue and determining based on the PM4 code whether executing the packet is necessary or not.
> >>
> >> When we drop some jobs from execution because they belong to the same context as the one who caused the timeout we write incorrect commands into the PM4 stream when re-emitting.
> >>
> >> We would need to extend the handling in a way where we can say ok this job is now skipped, but we need to pretend that it isn't so that the set_q_mode handling works and then still not execute the IBs in the job.
> >>
> >
> > Explicit re-emit is the only way this can easily work correctly. We
> > save the ring state and job state in the job and then we replay
> > the state and re-emit a proper coherent packet stream after the reset.
> > When we re-emit, we update the offsets as appropriate so that the
> > logic works properly as you replay the job stream. You can skip the
> > IBs for the timedout context, but as long as the rest of the logic is
> > there, everything works. Saving and restoring the opaque ring
> > contents is much harder because you need to either save a bunch of
> > pointers or try and determine which offsets to patch, etc.
>
> Or you tell the HW to continue at the place where it stopped executing before the reset and use the conditional execute all jobs are wrapped in anyway to determine whether they should execute or not, or overwrite the commands with NOPs for engines which don't use the conditional execute.
>
Not all rings retain their contents after a reset, and some may not
be able to start at arbitrary ring ptr locations. Plus only gfx and
compute have conditional execution support. For everything else you
need to adjust the packet stream.
> Re-emitting the command stream would only be necessary if we need to change the commands in any way, and even if we did, I would say that we should not emit the commands again at all.
>
The only case where we need to mess with anything is to support the
set_q stuff and that is only supported on one gfx11 chip specifically
for SR-IOV.
> I have patches in the pipeline to remove the job object from the reset path, so that we can free it up directly after submission again and completely solve all the lifetime issues we had with that.
>
I don't really see any lifetime issues with the job after we fix the
whole sched stop/start stuff. Moreover, having the job around (or we
could hang the state on the fence, but that is less clean because
> there are potentially two fences per job that you need to keep track
> of, sharing common state) makes it much easier to re-emit the packet
stream after a reset. It's a lot easier to just call the emit
functions on a clean ring than to deal with opaque ring contents.
Depending on the ring you end up needing to keep lots of pointers to
mark fences and job boundaries. Then if you have to re-emit the same
job multiple times, you have to re-adjust all of the pointers, plus
deal with skipping the IBs while still emitting the fences.
Alex
> Re-emitting completely breaks that again.
>
> Christian.
>
> >
> > Alex
> >
> >> Long story short that is seriously not going to work. So absolutely clear NAK from my side to this approach.
> >>
> >> What we could do to avoid problems and patching pointers in the command stream is to emit only the fence signaling for skipped jobs and fill everything else with NOPs.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> At that point the ring has the
> >>> same state it had before the queue was reset and the state gets
> >>> updated in the ring as the IBs are reemitted.
> >>>
> >>> That's it. The only other state dependent on the ring was the seq
> >>> number to wait on for pipeline sync and I fixed that by making it
> >>> explicit.
> >>>
> >>> Alex
> >>>
> >>>>>
> >>>>>> If the relevant state is
> >>>>>> stored in the job, you can re-emit it and get the same ring state each
> >>>>>> time.
> >>>>>
> >>>>> No, you can't. Background is that the relevant state is not job dependent, but inter job dependent.
> >>>>>
> >>>>> In other words it doesn't depend on what job is executing now but rather which one was executed right before that one.
> >>>>>
> >>>>> Or even worse in the case of the set_q_mode packet on the job dependent after the one you want to execute.
> >>>>>
> >>>>> I can absolutely not see how stuff like that should work with re-submission.
> >>>>
> >>>> All you need to do is save the state that was used to emit the packets
> >>>> in the original submission.
> >>>>
> >>>>>
> >>>>>> If you end up with multiple queue resets in a row, it gets
> >>>>>> really complex to try and save and restore opaque ring contents. By
> >>>>>> the time you fix up the state tracking to handle that, you end up
> >>>>>> pretty close to this solution.
> >>>>>
> >>>>> Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
> >>>>>
> >>>>> Updating a few fence pointers on re-submission is absolutely trivial compared to that.
> >>>>
> >>>> It's not that easy. If you want to just emit the fences for bad
> >>>> contexts rather than the whole IB stream, you can also potentially
> >>>> mess up the ring state. You'd end up needing a pile of pointers that
> >>>> need to be recalculated on every reset to try and re-emit the
> >>>> appropriate state again. This approach also paves the way for
> >>>> re-emitting state for all queues after adapter reset when VRAM is not
> >>>> lost.
> >>>>
> >>>> Alex
> >>>>
> >>>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>>
> >>>>>> Alex
> >>>>>>
> >>>>>>> Regards,
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>> On 1/8/26 15:48, Alex Deucher wrote:
> >>>>>>>> [...]
> >>>>>>>
> >>>>>
> >>
>
^ permalink raw reply [flat|nested] 66+ messages in thread

* Re: [PATCH 00/42] Improvements for IB handling
2026-01-15 14:08 ` Alex Deucher
@ 2026-01-15 14:54 ` Christian König
0 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2026-01-15 14:54 UTC (permalink / raw)
To: Alex Deucher; +Cc: Alex Deucher, amd-gfx
On 1/15/26 15:08, Alex Deucher wrote:
>>>
>>> Explicit re-emit is the only way this can easily work correctly. We
>>> save the ring state and and job state in the job and then we replay
>>> the state and re-emit a proper coherent packet stream after the reset.
>>> When we re-emit, we update the offsets as appropriate so that the
>>> logic works properly as you replay the job stream. You can skip the
>>> IBs for the timedout context, but as long as the rest of the logic is
>>> there, everything works. Saving and restoring the opaque ring
>>> contents is much harder because you need to either save a bunch of
>>> pointers or try and determine which offsets to patch, etc.
>>
>> Or you tell the HW to continue at the place where it stopped executing before the reset and use the conditional execute all jobs are wrapped in anyway to determine whether they should execute or not, or overwrite the commands with NOPs for engines which don't use the conditional execute.
>>
>
> Not all rings retain the contents of the ring after a reset
In that case I think we should not re-emit the work at all.
For example, even if VRAM is not lost after a GPU reset, the remaining state (VM etc.) is gone and can't be restored easily as far as I can see.
We should absolutely not re-emit pending jobs in that case. It's basically a gamble whether that works or not.
> or may not
> be able to start at arbitrary ring ptr locations.
Every engine can do that, we just have to insert NOPs until we end up at the specific location.
> Plus only gfx and
> compute have conditional execution support. For everything else you
> need to adjust the packet stream.
That is harmless, just overwrite with NOPs. That's certainly something all rings can do.
>> Re-emitting the command stream would only be necessary if we need to change the commands in any way, and even if we did, I would say that we should not emit the commands again at all.
>>
>
> The only case where we need to mess with anything is to support the
> set_q stuff and that is only supported on one gfx11 chip specifically
> for SR-IOV.
>
>> I have patches in the pipeline to remove the job object from the reset path, so that we can free it up directly after submission again and completely solve all the lifetime issues we had with that.
>>
>
> I don't really see any lifetime issues with the job after we fix the
> whole sched stop/start stuff.
It's not only the job object itself, but also all objects it eventually points to.
For example job->vm is invalid after you initially emitted the job and exactly that has caused issues tons of times.
> Moreover, having the job around (or we
> could hang the state on the fence, but that is less clean because
> there are potentially two fences per job that you need to keep track
> of that share common state) makes it much easier to re-emit the packet
> stream after a reset. It's a lot easier to just call the emit
> functions on a clean ring than to deal with opaque ring contents.
> Depending on the ring you end up needing to keep lots of pointers to
> mark fences and job boundaries. Then if you have to re-emit the same
> job multiple times, you have to re-adjust all of the pointers, plus
> deal with skipping the IBs while still emitting the fences.
And exactly that is what I'm trying to prevent. Emitting jobs multiple times was an extremely bad idea.
I can't count how many hours we have spent over the last 10 years just trying to get that working, and we still have the same problems we had at the beginning.
So I see absolutely no chance that this will change.
Christian.
>
> Alex
>
>> Re-emitting completely breaks that again.
>>
>> Christian.
>>
>>>
>>> Alex
>>>
>>>> Long story short that is seriously not going to work. So absolutely clear NAK from my side to this approach.
>>>>
>>>> What we could do to avoid problems and patching pointers in the command stream is to emit only the fence signaling for skipped jobs and fill everything else with NOPs.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> At that point the ring has the
>>>>> same state it had before the queue was reset and the state gets
>>>>> updated in the ring as the IBs are reemitted.
>>>>>
>>>>> That's it. The only other state dependent on the ring was the seq
>>>>> number to wait on for pipeline sync and I fixed that by making it
>>>>> explicit.
>>>>>
>>>>> Alex
>>>>>
>>>>>>>
>>>>>>>> If the relevant state is
>>>>>>>> stored in the job, you can re-emit it and get the same ring state each
>>>>>>>> time.
>>>>>>>
>>>>>>> No, you can't. Background is that the relevant state is not job dependent, but inter job dependent.
>>>>>>>
>>>>>>> In other words it doesn't depend on what job is executing now but rather which one was executed right before that one.
>>>>>>>
>>>>>>> Or even worse in the case of the set_q_mode packet on the job dependent after the one you want to execute.
>>>>>>>
>>>>>>> I can absolutely not see how stuff like that should work with re-submission.
>>>>>>
>>>>>> All you need to do is save the state that was used to emit the packets
>>>>>> in the original submission.
>>>>>>
>>>>>>>
>>>>>>>> If you end up with multiple queue resets in a row, it gets
>>>>>>>> really complex to try and save and restore opaque ring contents. By
>>>>>>>> the time you fix up the state tracking to handle that, you end up
>>>>>>>> pretty close to this solution.
>>>>>>>
>>>>>>> Not even remotely, you have tons of state we would need to save and restore and a lot of that is outside of the job.
>>>>>>>
>>>>>>> Updating a few fence pointers on re-submission is absolutely trivial compared to that.
>>>>>>
>>>>>> It's not that easy. If you want to just emit the fences for bad
>>>>>> contexts rather than the whole IB stream, you can also potentially
>>>>>> mess up the ring state. You'd end up needing a pile of pointers that
>>>>>> need to be recalculated on every reset to try and re-emit the
>>>>>> appropriate state again. This approach also paves the way for
>>>>>> re-emitting state for all queues after adapter reset when VRAM is not
>>>>>> lost.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 1/8/26 15:48, Alex Deucher wrote:
>>>>>>>>>> This set contains a number of bug fixes and cleanups for
>>>>>>>>>> IB handling that I worked on over the holidays.
>>>>>>>>>>
>>>>>>>>>> Patches 1-2:
>>>>>>>>>> Simple bug fixes.
>>>>>>>>>>
>>>>>>>>>> Patches 3-26:
>>>>>>>>>> Removes the direct submit path for IBs and requires
>>>>>>>>>> that all IB submissions use a job structure. This
>>>>>>>>>> greatly simplifies the IB submission code.
>>>>>>>>>>
>>>>>>>>>> Patches 27-42:
>>>>>>>>>> Split IB state setup and ring emission. This keeps all
>>>>>>>>>> of the IB state in the job. This greatly simplifies
>>>>>>>>>> re-emission of non-timed-out jobs after a ring reset and
>>>>>>>>>> allows for re-emission multiple times if multiple resets
>>>>>>>>>> happen in a row. It also properly handles the dma fence
>>>>>>>>>> error handling for timedout jobs with adapter resets.
>>>>>>>>>>
>>>>>>>>>> Alex Deucher (42):
>>>>>>>>>> drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
>>>>>>>>>> drm/amdgpu: fix error handling in ib_schedule()
>>>>>>>>>> drm/amdgpu: add new job ids
>>>>>>>>>> drm/amdgpu/vpe: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx6: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx7: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx8: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx9: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx9.4.2: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx9.4.3: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx10: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx11: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx12: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/gfx12.1: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/si_dma: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/cik_sdma: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma2.4: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma3: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma4: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma4.4.2: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma5: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma5.2: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma6: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma7: switch to using job for IBs
>>>>>>>>>> drm/amdgpu/sdma7.1: switch to using job for IBs
>>>>>>>>>> drm/amdgpu: require a job to schedule an IB
>>>>>>>>>> drm/amdgpu: mark fences with errors before ring reset
>>>>>>>>>> drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
>>>>>>>>>> drm/amdgpu: don't call drm_sched_stop/start() in asic reset
>>>>>>>>>> drm/amdgpu: drop drm_sched_increase_karma()
>>>>>>>>>> drm/amdgpu: plumb timedout fence through to force completion
>>>>>>>>>> drm/amdgpu: change function signature for emit_pipeline_sync()
>>>>>>>>>> drm/amdgpu: drop extra parameter for vm_flush
>>>>>>>>>> drm/amdgpu: move need_ctx_switch into amdgpu_job
>>>>>>>>>> drm/amdgpu: store vm flush state in amdgpu_job
>>>>>>>>>> drm/amdgpu: split fence init and emit logic
>>>>>>>>>> drm/amdgpu: split vm flush and vm flush emit logic
>>>>>>>>>> drm/amdgpu: split ib schedule and ib emit logic
>>>>>>>>>> drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
>>>>>>>>>> drm/amdgpu: add an all_instance_rings_reset ring flag
>>>>>>>>>> drm/amdgpu: rework reset reemit handling
>>>>>>>>>> drm/amdgpu: simplify per queue reset code
>>>>>>>>>>
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 136 +++------
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 289 ++++++++++----------
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 40 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 13 +
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 67 -----
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 37 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 21 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 141 +++++-----
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 45 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 36 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 41 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 41 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 41 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 33 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 28 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 30 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 143 +++++-----
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 149 +++++-----
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 6 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 43 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 43 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 43 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 45 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 46 ++--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 45 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 45 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 45 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 45 +--
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/si_dma.c | 34 ++-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 4 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 +
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 3 +-
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 4 +-
>>>>>>>>>> 54 files changed, 952 insertions(+), 966 deletions(-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
>>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 13:31 ` [PATCH 00/42] Improvements for IB handling Christian König
2026-01-13 14:10 ` Alex Deucher
@ 2026-01-13 21:17 ` Alex Deucher
2026-01-14 10:35 ` Christian König
1 sibling, 1 reply; 66+ messages in thread
From: Alex Deucher @ 2026-01-13 21:17 UTC (permalink / raw)
To: Christian König; +Cc: Alex Deucher, amd-gfx
On Tue, Jan 13, 2026 at 8:57 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Comment on patch #4 which also affects patches #5-#26.
What was your comment on patch 4? I don't see that reply on the mailing list.
Alex
>
> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
>
> Patch #31: Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Patches #32-#40 that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
>
> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
>
> Regards,
> Christian.
>
> On 1/8/26 15:48, Alex Deucher wrote:
> > [cover letter trimmed]
>
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 00/42] Improvements for IB handling
2026-01-13 21:17 ` Alex Deucher
@ 2026-01-14 10:35 ` Christian König
0 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2026-01-14 10:35 UTC (permalink / raw)
To: Alex Deucher; +Cc: Alex Deucher, amd-gfx
On 1/13/26 22:17, Alex Deucher wrote:
> On Tue, Jan 13, 2026 at 8:57 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> Patches #1-#3: Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Comment on patch #4 which also affects patches #5-#26.
>
> What was your comment on patch 4? I don't see that reply on the mailing list.
That we didn't use the job because we couldn't allocate memory while in GPU reset.
We could use GFP_ATOMIC when allocating from the GPU reset IB pool to solve this.
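A minimal userspace sketch of that idea: pick a non-sleeping allocation flavor when called from the reset path. The gfp_t values and names here are simplified stand-ins, not the kernel's real definitions:

```c
/* Illustrative only: models choosing GFP_ATOMIC over GFP_KERNEL when
 * allocating a job from the GPU reset path, where sleeping is not allowed. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0x01u  /* may sleep to reclaim memory */
#define GFP_ATOMIC 0x02u  /* must not sleep; may dip into reserves */

/* Pick allocation flags based on context, mirroring the idea of a
 * dedicated GPU-reset IB pool that must be refilled without sleeping. */
static gfp_t job_alloc_gfp(int in_gpu_reset)
{
	return in_gpu_reset ? GFP_ATOMIC : GFP_KERNEL;
}
```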
Christian.
>
> Alex
>
>>
>> Comment on patch #27 and #28. When #28 comes before #27 then that would potentially solve the issue with #27.
>>
>> Patch #31: Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Patches #32-#40 that looks extremely questionable to me. I've intentionally removed that state from the job because it isn't job dependent and sometimes has inter-job meaning.
>>
>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max and I'm clearly against doing that over and over again. Saving the ring content clearly seems to be the safer approach.
>>
>> Regards,
>> Christian.
>>
>> On 1/8/26 15:48, Alex Deucher wrote:
>>> [cover letter trimmed]
>>
^ permalink raw reply [flat|nested] 66+ messages in thread