AMD-GFX Archive on lore.kernel.org
From: Alex Deucher <alexander.deucher@amd.com>
To: <amd-gfx@lists.freedesktop.org>
Cc: Alex Deucher <alexander.deucher@amd.com>,
	Jesse Zhang <jesse.zhang@amd.com>
Subject: [PATCH 01/10] drm/amdgpu: re-add the bad job to the pending list for ring resets
Date: Tue, 20 Jan 2026 22:00:48 -0500
Message-ID: <20260121030057.1683102-2-alexander.deucher@amd.com>
In-Reply-To: <20260121030057.1683102-1-alexander.deucher@amd.com>

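The bad job needs to be re-added to the pending list before the
scheduler is restarted after a ring reset. Move the
drm_sched_wqueue_stop()/drm_sched_wqueue_start() calls out of the
ring reset helpers and into amdgpu_job_timedout() so the job can be
re-added after a successful reset and before the restart.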

Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
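For readers skimming the series, the ordering this patch sets up in
amdgpu_job_timedout() can be sketched as a small userspace model. This is
only an illustration, not driver code: every toy_* name is hypothetical, the
pending list is a plain singly linked list rather than the kernel's
list_head, and the sketch assumes the timed-out job is not on the pending
list when the handler runs. What the driver does after a failed ring reset
is outside the hunk below, so the model simply leaves the queue stopped in
that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types or functions. */
struct toy_job {
	struct toy_job *next;
	int id;
};

struct toy_sched {
	struct toy_job *pending;	/* head of the toy pending list */
	bool wqueue_running;		/* models the scheduler work queue state */
};

static void toy_wqueue_stop(struct toy_sched *s)
{
	s->wqueue_running = false;
}

static void toy_wqueue_start(struct toy_sched *s)
{
	s->wqueue_running = true;
}

/* Push a job onto the head of the pending list. */
static void toy_pending_add(struct toy_sched *s, struct toy_job *j)
{
	j->next = s->pending;
	s->pending = j;
}

/* Return true if the job is currently on the pending list. */
static bool toy_pending_contains(const struct toy_sched *s,
				 const struct toy_job *j)
{
	const struct toy_job *p;

	for (p = s->pending; p; p = p->next)
		if (p == j)
			return true;
	return false;
}

/*
 * Models the sequence in the patched timeout handler:
 * stop the queue, reset the ring, and only on success re-add the
 * bad job and restart the queue. reset_result stands in for the
 * return value of the ring reset (0 means success).
 */
static int toy_job_timedout(struct toy_sched *s, struct toy_job *bad,
			    int reset_result)
{
	assert(s && bad);

	toy_wqueue_stop(s);
	if (reset_result == 0) {
		toy_pending_add(s, bad);	/* must happen before the restart */
		toy_wqueue_start(s);
	}
	return reset_result;
}
```

In this model, a successful reset leaves the job on the pending list with
the queue running, while a failed reset leaves the list untouched and the
queue stopped.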
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 4 ----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1daa9145b217e..ec8d74db62758 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -135,8 +135,14 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	    ring->funcs->reset) {
 		dev_err(adev->dev, "Starting %s ring reset\n",
 			s_job->sched->name);
+		/* Stop the scheduler to prevent anybody else from touching the ring buffer. */
+		drm_sched_wqueue_stop(&ring->sched);
 		r = amdgpu_ring_reset(ring, job->vmid, job->hw_fence);
 		if (!r) {
+			/* add the job back to the pending list */
+			list_add(&s_job->list, &s_job->sched->pending_list);
+			/* Start the scheduler again */
+			drm_sched_wqueue_start(&ring->sched);
 			atomic_inc(&ring->adev->gpu_reset_counter);
 			dev_err(adev->dev, "Ring %s reset succeeded\n",
 				ring->sched.name);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index b82357c657237..129ad51386535 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -868,8 +868,6 @@ bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring)
 void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
 				    struct amdgpu_fence *guilty_fence)
 {
-	/* Stop the scheduler to prevent anybody else from touching the ring buffer. */
-	drm_sched_wqueue_stop(&ring->sched);
 	/* back up the non-guilty commands */
 	amdgpu_ring_backup_unprocessed_commands(ring, guilty_fence);
 }
@@ -895,8 +893,6 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
 			amdgpu_ring_write(ring, ring->ring_backup[i]);
 		amdgpu_ring_commit(ring);
 	}
-	/* Start the scheduler again */
-	drm_sched_wqueue_start(&ring->sched);
 	return 0;
 }
 
-- 
2.52.0


Thread overview: 16+ messages
2026-01-21  3:00 [PATCH 00/10] Improvements for IB handling V5 Alex Deucher
2026-01-21  3:00 ` Alex Deucher [this message]
2026-01-21  3:00 ` [PATCH 02/10] drm/amdgpu/job: use GFP_ATOMIC while in gpu reset Alex Deucher
2026-01-21  8:37   ` Zhang, Jesse(Jie)
2026-01-21  3:00 ` [PATCH 03/10] drm/amdgpu: switch all IPs to using job for IBs Alex Deucher
2026-01-21  3:00 ` [PATCH 04/10] drm/amdgpu: require a job to schedule an IB Alex Deucher
2026-01-21  3:00 ` [PATCH 05/10] drm/amdgpu: don't call drm_sched_stop/start() in asic reset Alex Deucher
2026-01-21  3:00 ` [PATCH 06/10] drm/amdgpu/cs: return -ETIME for guilty contexts Alex Deucher
2026-01-21  3:00 ` [PATCH 07/10] drm/amdgpu: plumb timedout fence through to force completion Alex Deucher
2026-01-21  3:00 ` [PATCH 08/10] drm/amdgpu: simplify VCN reset helper Alex Deucher
2026-01-21  3:00 ` [PATCH 09/10] drm/amdgpu: Call drm_sched_increase_karma() for ring resets Alex Deucher
2026-01-21  8:38   ` Zhang, Jesse(Jie)
2026-01-21  3:00 ` [PATCH 10/10] drm/amdgpu: rework ring reset backup and reemit v4 Alex Deucher
2026-01-21  8:38   ` Zhang, Jesse(Jie)
  -- strict thread matches above, loose matches on Subject: below --
2026-01-20  1:34 [PATCH 00/10] Improvements for IB handling V4 Alex Deucher
2026-01-20  1:34 ` [PATCH 01/10] drm/amdgpu: re-add the bad job to the pending list for ring resets Alex Deucher
2026-01-20  7:23   ` Zhang, Jesse(Jie)
