From: "Christian König" <christian.koenig@amd.com>
To: Alex Deucher <alexander.deucher@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 10/10] drm/amdgpu: rework ring reset backup and reemit
Date: Mon, 19 Jan 2026 14:19:59 +0100
Message-ID: <ca96231d-3152-49e4-9fac-750d6ef3c868@amd.com>
In-Reply-To: <20260116162027.21550-11-alexander.deucher@amd.com>
On 1/16/26 17:20, Alex Deucher wrote:
> Store the start and end wptrs in the IB fence. On queue
> reset, only save the ring contents of the non-guilty contexts.
> Since the VM fence is a sub-fence of the IB fence, the IB
> fence stores the start and end wptrs for both fences as the IB
> state encapsulates the VM fence state.
Most of this looks like a step in the right direction, but that part is not such a good idea.
Even for a guilty fence we still want to re-submit the VM stuff, since submissions from other contexts might depend on getting this right.
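To make the distinction concrete, something along these lines is what I have in mind. This is a standalone sketch with made-up names (classify(), the REEMIT_* actions, struct sim_fence), not the actual driver code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of what goes back on the ring after a
 * queue reset.  Names are illustrative, not the amdgpu API. */
enum reemit_action {
	REEMIT_FULL,	/* replay the whole backed-up packet stream */
	REEMIT_VM,	/* replay only the VM flush/fence portion   */
	EMIT_FENCE_ONLY	/* just signal the fence, drop the IB       */
};

struct sim_fence {
	unsigned long long context;
	bool is_vm_fence;
};

/* Suggested rule: a guilty context loses its IB, but its VM updates
 * are still replayed, because later submissions from other contexts
 * may depend on the page table state being correct. */
static enum reemit_action classify(const struct sim_fence *f,
				   unsigned long long guilty_ctx)
{
	if (f->context != guilty_ctx)
		return REEMIT_FULL;
	if (f->is_vm_fence)
		return REEMIT_VM;
	return EMIT_FENCE_ONLY;
}
```

In other words the branch on `fence->context == guilty_fence->context` needs a third case for the VM sub-fence instead of folding it into the fence-only path.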
Regards,
Christian.
>
> For reemit, only reemit the state for non-guilty contexts; for
> guilty contexts, just emit the fences. Split the reemit per
> fence; when we reemit, update the start and end wptrs
> with the new values from reemit. This allows us to
> reemit jobs repeatedly as the wptrs get properly updated
> each time. Submitting the fence directly also simplifies
> the logic as we no longer need to store additional wptrs for
> each fence within the IB ring sequence. If the context is
> guilty, we emit the fence(s) and if not, we reemit the
> whole IB ring sequence for that IB.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 98 +++++++++--------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 9 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 37 +--------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 20 ++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
> 5 files changed, 54 insertions(+), 111 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index d48f61076c06a..541cdbe1e28bf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -89,16 +89,6 @@ static u32 amdgpu_fence_read(struct amdgpu_ring *ring)
> return seq;
> }
>
> -static void amdgpu_fence_save_fence_wptr_start(struct amdgpu_fence *af)
> -{
> - af->fence_wptr_start = af->ring->wptr;
> -}
> -
> -static void amdgpu_fence_save_fence_wptr_end(struct amdgpu_fence *af)
> -{
> - af->fence_wptr_end = af->ring->wptr;
> -}
> -
> /**
> * amdgpu_fence_emit - emit a fence on the requested ring
> *
> @@ -126,11 +116,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
> &ring->fence_drv.lock,
> adev->fence_context + ring->idx, seq);
>
> - amdgpu_fence_save_fence_wptr_start(af);
> + af->flags = flags | AMDGPU_FENCE_FLAG_INT;
> amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
> - seq, flags | AMDGPU_FENCE_FLAG_INT);
> - amdgpu_fence_save_fence_wptr_end(af);
> - amdgpu_fence_save_wptr(af);
> + seq, af->flags);
> +
> pm_runtime_get_noresume(adev_to_drm(adev)->dev);
> ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
> if (unlikely(rcu_dereference_protected(*ptr, 1))) {
> @@ -241,7 +230,6 @@ bool amdgpu_fence_process(struct amdgpu_ring *ring)
>
> do {
> struct dma_fence *fence, **ptr;
> - struct amdgpu_fence *am_fence;
>
> ++last_seq;
> last_seq &= drv->num_fences_mask;
> @@ -254,12 +242,6 @@ bool amdgpu_fence_process(struct amdgpu_ring *ring)
> if (!fence)
> continue;
>
> - /* Save the wptr in the fence driver so we know what the last processed
> - * wptr was. This is required for re-emitting the ring state for
> - * queues that are reset but are not guilty and thus have no guilty fence.
> - */
> - am_fence = container_of(fence, struct amdgpu_fence, base);
> - drv->signalled_wptr = am_fence->wptr;
> dma_fence_signal(fence);
> dma_fence_put(fence);
> pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> @@ -727,25 +709,27 @@ void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring,
> */
>
> /**
> - * amdgpu_fence_driver_update_timedout_fence_state - Update fence state and set errors
> + * amdgpu_ring_set_fence_errors_and_reemit - Set dma_fence errors and reemit
> *
> - * @af: fence of the ring to update
> + * @guilty_fence: fence of the ring to update
> *
> */
> -void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af)
> +void amdgpu_ring_set_fence_errors_and_reemit(struct amdgpu_ring *ring,
> + struct amdgpu_fence *guilty_fence)
> {
> struct dma_fence *unprocessed;
> struct dma_fence __rcu **ptr;
> struct amdgpu_fence *fence;
> - struct amdgpu_ring *ring = af->ring;
> unsigned long flags;
> u32 seq, last_seq;
> - bool reemitted = false;
> + unsigned int i;
>
> last_seq = amdgpu_fence_read(ring) & ring->fence_drv.num_fences_mask;
> seq = ring->fence_drv.sync_seq & ring->fence_drv.num_fences_mask;
>
> /* mark all fences from the guilty context with an error */
> + amdgpu_ring_alloc(ring, ring->ring_backup_entries_to_copy +
> + 20 * ring->guilty_fence_count);
> spin_lock_irqsave(&ring->fence_drv.lock, flags);
> do {
> last_seq++;
> @@ -758,39 +742,41 @@ void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af)
> if (unprocessed && !dma_fence_is_signaled_locked(unprocessed)) {
> fence = container_of(unprocessed, struct amdgpu_fence, base);
>
> - if (fence->reemitted > 1)
> - reemitted = true;
> - else if (fence == af)
> + if (fence == guilty_fence)
> dma_fence_set_error(&fence->base, -ETIME);
> - else if (fence->context == af->context)
> + else if (fence->context == guilty_fence->context)
> dma_fence_set_error(&fence->base, -ECANCELED);
> +
> + if (fence->context == guilty_fence->context) {
> + /* just emit the fence for the guilty context */
> + amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
> + fence->base.seqno, fence->flags);
> + } else {
> + /* reemit the packet stream and update wptrs */
> + fence->wptr_start = ring->wptr;
> + for (i = 0; i < fence->backup_size; i++)
> + amdgpu_ring_write(ring, ring->ring_backup[fence->backup_idx + i]);
> + fence->wptr_end = ring->wptr;
> + }
> }
> rcu_read_unlock();
> } while (last_seq != seq);
> spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
> -
> - if (reemitted) {
> - /* if we've already reemitted once then just cancel everything */
> - amdgpu_fence_driver_force_completion(af->ring, &af->base);
> - af->ring->ring_backup_entries_to_copy = 0;
> - }
> -}
> -
> -void amdgpu_fence_save_wptr(struct amdgpu_fence *af)
> -{
> - af->wptr = af->ring->wptr;
> + amdgpu_ring_commit(ring);
> }
>
> static void amdgpu_ring_backup_unprocessed_command(struct amdgpu_ring *ring,
> - u64 start_wptr, u64 end_wptr)
> + struct amdgpu_fence *af)
> {
> - unsigned int first_idx = start_wptr & ring->buf_mask;
> - unsigned int last_idx = end_wptr & ring->buf_mask;
> + unsigned int first_idx = af->wptr_start & ring->buf_mask;
> + unsigned int last_idx = af->wptr_end & ring->buf_mask;
> unsigned int i;
>
> /* Backup the contents of the ring buffer. */
> + af->backup_idx = ring->ring_backup_entries_to_copy;
> for (i = first_idx; i != last_idx; ++i, i &= ring->buf_mask)
> ring->ring_backup[ring->ring_backup_entries_to_copy++] = ring->ring[i];
> + af->backup_size = ring->ring_backup_entries_to_copy - af->backup_idx;
> }
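Side note: the masked copy loop above can be modeled standalone to sanity-check the wraparound handling. Buffer size, function name and variable names below are mine, not the driver's:

```c
#include <assert.h>

#define RING_SIZE 8			/* must be a power of two */
#define BUF_MASK  (RING_SIZE - 1)

/* Standalone model of the backup copy: save the ring contents between
 * two wptr values, masking the index each iteration so the copy wraps
 * around the end of the buffer, like the loop in
 * amdgpu_ring_backup_unprocessed_command(). */
static unsigned int backup_range(const unsigned int *ring,
				 unsigned long long wptr_start,
				 unsigned long long wptr_end,
				 unsigned int *backup)
{
	unsigned int first_idx = wptr_start & BUF_MASK;
	unsigned int last_idx = wptr_end & BUF_MASK;
	unsigned int i, copied = 0;

	for (i = first_idx; i != last_idx; ++i, i &= BUF_MASK)
		backup[copied++] = ring[i];
	return copied;
}
```

A range that crosses the end of the buffer, say wptrs 6 through 10 on an 8-entry ring, copies indices 6, 7, 0, 1 in that order.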
>
> void amdgpu_ring_backup_unprocessed_commands(struct amdgpu_ring *ring,
> @@ -799,13 +785,12 @@ void amdgpu_ring_backup_unprocessed_commands(struct amdgpu_ring *ring,
> struct dma_fence *unprocessed;
> struct dma_fence __rcu **ptr;
> struct amdgpu_fence *fence;
> - u64 wptr;
> u32 seq, last_seq;
>
> last_seq = amdgpu_fence_read(ring) & ring->fence_drv.num_fences_mask;
> seq = ring->fence_drv.sync_seq & ring->fence_drv.num_fences_mask;
> - wptr = ring->fence_drv.signalled_wptr;
> ring->ring_backup_entries_to_copy = 0;
> + ring->guilty_fence_count = 0;
>
> do {
> last_seq++;
> @@ -818,21 +803,12 @@ void amdgpu_ring_backup_unprocessed_commands(struct amdgpu_ring *ring,
> if (unprocessed && !dma_fence_is_signaled(unprocessed)) {
> fence = container_of(unprocessed, struct amdgpu_fence, base);
>
> - /* save everything if the ring is not guilty, otherwise
> - * just save the content from other contexts.
> - */
> - if (!fence->reemitted &&
> - (!guilty_fence || (fence->context != guilty_fence->context))) {
> - amdgpu_ring_backup_unprocessed_command(ring, wptr,
> - fence->wptr);
> - } else if (!fence->reemitted) {
> - /* always save the fence */
> - amdgpu_ring_backup_unprocessed_command(ring,
> - fence->fence_wptr_start,
> - fence->fence_wptr_end);
> - }
> - wptr = fence->wptr;
> - fence->reemitted++;
> + /* keep track of the guilty fences for reemit */
> + if (fence->context == guilty_fence->context)
> + ring->guilty_fence_count++;
> + /* Non-vm fence has all the state. Backup the non-guilty contexts */
> + if (!fence->is_vm_fence && (fence->context != guilty_fence->context))
> + amdgpu_ring_backup_unprocessed_command(ring, fence);
> }
> rcu_read_unlock();
> } while (last_seq != seq);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index fb58637ed1507..865a803026c3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -185,6 +185,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
> dev_err(adev->dev, "scheduling IB failed (%d).\n", r);
> return r;
> }
> + af->wptr_start = ring->wptr;
>
> need_ctx_switch = ring->current_ctx != fence_ctx;
> if (ring->funcs->emit_pipeline_sync && job &&
> @@ -303,13 +304,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, struct amdgpu_job *job,
> ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
> ring->funcs->emit_wave_limit(ring, false);
>
> - /* Save the wptr associated with this fence.
> - * This must be last for resets to work properly
> - * as we need to save the wptr associated with this
> - * fence so we know what rings contents to backup
> - * after we reset the queue.
> - */
> - amdgpu_fence_save_wptr(af);
> + af->wptr_end = ring->wptr;
>
> amdgpu_ring_ib_end(ring);
> amdgpu_ring_commit(ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index b82357c657237..df37335127979 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -104,29 +104,6 @@ int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned int ndw)
> return 0;
> }
>
> -/**
> - * amdgpu_ring_alloc_reemit - allocate space on the ring buffer for reemit
> - *
> - * @ring: amdgpu_ring structure holding ring information
> - * @ndw: number of dwords to allocate in the ring buffer
> - *
> - * Allocate @ndw dwords in the ring buffer (all asics).
> - * doesn't check the max_dw limit as we may be reemitting
> - * several submissions.
> - */
> -static void amdgpu_ring_alloc_reemit(struct amdgpu_ring *ring, unsigned int ndw)
> -{
> - /* Align requested size with padding so unlock_commit can
> - * pad safely */
> - ndw = (ndw + ring->funcs->align_mask) & ~ring->funcs->align_mask;
> -
> - ring->count_dw = ndw;
> - ring->wptr_old = ring->wptr;
> -
> - if (ring->funcs->begin_use)
> - ring->funcs->begin_use(ring);
> -}
> -
> /**
> * amdgpu_ring_insert_nop - insert NOP packets
> *
> @@ -877,7 +854,6 @@ void amdgpu_ring_reset_helper_begin(struct amdgpu_ring *ring,
> int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> struct amdgpu_fence *guilty_fence)
> {
> - unsigned int i;
> int r;
>
> /* verify that the ring is functional */
> @@ -885,16 +861,9 @@ int amdgpu_ring_reset_helper_end(struct amdgpu_ring *ring,
> if (r)
> return r;
>
> - /* set an error on all fences from the context */
> - if (guilty_fence)
> - amdgpu_fence_driver_update_timedout_fence_state(guilty_fence);
> - /* Re-emit the non-guilty commands */
> - if (ring->ring_backup_entries_to_copy) {
> - amdgpu_ring_alloc_reemit(ring, ring->ring_backup_entries_to_copy);
> - for (i = 0; i < ring->ring_backup_entries_to_copy; i++)
> - amdgpu_ring_write(ring, ring->ring_backup[i]);
> - amdgpu_ring_commit(ring);
> - }
> + /* set an error on all fences from the context and reemit */
> + amdgpu_ring_set_fence_errors_and_reemit(ring, guilty_fence);
> +
> /* Start the scheduler again */
> drm_sched_wqueue_start(&ring->sched);
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index ce095427611fb..6dca1ccd2fc2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -121,7 +121,6 @@ struct amdgpu_fence_driver {
> /* sync_seq is protected by ring emission lock */
> uint32_t sync_seq;
> atomic_t last_seq;
> - u64 signalled_wptr;
> bool initialized;
> struct amdgpu_irq_src *irq_src;
> unsigned irq_type;
> @@ -146,15 +145,17 @@ struct amdgpu_fence {
> struct amdgpu_ring *ring;
> ktime_t start_timestamp;
>
> + bool is_vm_fence;
> + unsigned int flags;
> /* wptr for the total submission for resets */
> - u64 wptr;
> + u64 wptr_start;
> + u64 wptr_end;
> /* fence context for resets */
> u64 context;
> - /* has this fence been reemitted */
> - unsigned int reemitted;
> - /* wptr for the fence for the submission */
> - u64 fence_wptr_start;
> - u64 fence_wptr_end;
> + /* idx into the ring backup */
> + unsigned int backup_idx;
> + unsigned int backup_size;
> +
> };
>
> extern const struct drm_sched_backend_ops amdgpu_sched_ops;
> @@ -162,8 +163,8 @@ extern const struct drm_sched_backend_ops amdgpu_sched_ops;
> void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error);
> void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring,
> struct dma_fence *timedout_fence);
> -void amdgpu_fence_driver_update_timedout_fence_state(struct amdgpu_fence *af);
> -void amdgpu_fence_save_wptr(struct amdgpu_fence *af);
> +void amdgpu_ring_set_fence_errors_and_reemit(struct amdgpu_ring *ring,
> + struct amdgpu_fence *guilty_fence);
>
> int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring);
> int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
> @@ -314,6 +315,7 @@ struct amdgpu_ring {
> /* backups for resets */
> uint32_t *ring_backup;
> unsigned int ring_backup_entries_to_copy;
> + unsigned int guilty_fence_count;
> unsigned rptr_offs;
> u64 rptr_gpu_addr;
> u32 *rptr_cpu_addr;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 6a2ea200d90c8..bd2c01b1ef18f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -848,6 +848,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
> r = amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
> if (r)
> return r;
> + job->hw_vm_fence->is_vm_fence = true;
> fence = &job->hw_vm_fence->base;
> /* get a ref for the job */
> dma_fence_get(fence);
Thread overview: 20+ messages
2026-01-16 16:20 [PATCH 00/10] Improvements for IB handling V3 Alex Deucher
2026-01-16 16:20 ` [PATCH 01/10] drm/amdgpu: fix type for wptr in ring backup Alex Deucher
2026-01-19 12:18 ` Christian König
2026-01-16 16:20 ` [PATCH 02/10] drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion() Alex Deucher
2026-01-19 12:22 ` Christian König
2026-01-16 16:20 ` [PATCH 03/10] drm/amdgpu/job: use GFP_ATOMIC while in gpu reset Alex Deucher
2026-01-19 12:27 ` Christian König
2026-01-16 16:20 ` [PATCH 04/10] drm/amdgpu: switch all IPs to using job for IBs Alex Deucher
2026-01-19 12:31 ` Christian König
2026-01-16 16:20 ` [PATCH 05/10] drm/amdgpu: require a job to schedule an IB Alex Deucher
2026-01-19 12:41 ` Christian König
2026-01-16 16:20 ` [PATCH 06/10] drm/amdgpu: don't call drm_sched_stop/start() in asic reset Alex Deucher
2026-01-19 12:42 ` Christian König
2026-01-16 16:20 ` [PATCH 07/10] drm/amdgpu: drop drm_sched_increase_karma() Alex Deucher
2026-01-19 12:44 ` Christian König
2026-01-16 16:20 ` [PATCH 08/10] drm/amdgpu: plumb timedout fence through to force completion Alex Deucher
2026-01-16 16:20 ` [PATCH 09/10] drm/amdgpu: simplify VCN reset helper Alex Deucher
2026-01-16 16:20 ` [PATCH 10/10] drm/amdgpu: rework ring reset backup and reemit Alex Deucher
2026-01-19 13:19 ` Christian König [this message]
2026-01-20 2:41 ` Timur Kristóf