From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: phasta@kernel.org, Alex Deucher <alexander.deucher@amd.com>,
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 29/42] drm/amdgpu: don't call drm_sched_stop/start() in asic reset
Date: Tue, 13 Jan 2026 15:37:50 +0100
Message-ID: <bcd6ee8a-b951-4088-94c7-b9d260fe0c48@gmail.com>
In-Reply-To: <8a8dbf04b6d13d67541dc2bc1fb91769def373c2.camel@mailbox.org>

On 1/13/26 14:34, Philipp Stanner wrote:
> On Tue, 2026-01-13 at 14:17 +0100, Christian König wrote:
>> On 1/8/26 15:48, Alex Deucher wrote:
>>> We only want to stop the work queues, not mess with the
>>> pending list so just stop the work queues.
>
> Ideally amdgpu could stop touching the pending_list altogether forever,
> as discussed at XDC. Is work for that in the pipe? Is that what this
> patch is for?
Yes.
>
>>
>> Oh, yes please! I can't remember how long we have worked towards that.
>>
>> But we also need to change the return code so that the scheduler now re-inserts the job into the pending list.
>
> You're referring to false-positive timeouts. Porting users to that
> typically consists of adding that return code and also removing
> whatever the driver used to do to inject the non-timedout job into the
> scheduler again.
>
> How is that being done here?

Previously, drm_sched_stop() would insert the job back into the pending list after stopping the scheduler thread.

But once that is replaced with drm_sched_wqueue_stop(), that won't happen any more. That is a good thing, because it prevents us from running into problems like a UAF when the HW fence has already signaled.

As far as I can see, amdgpu should start returning DRM_GPU_SCHED_STAT_NO_HANG even when there actually was a hang, so that the scheduler re-inserts the job itself (and maybe we should rename the return code).
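
To make the intended flow concrete, here is a toy userspace model in C. It is only a sketch and not kernel code: every toy_* name is hypothetical, the pending list is a plain singly linked list, and TOY_STAT_NO_HANG merely stands in for DRM_GPU_SCHED_STAT_NO_HANG. The assumption being illustrated is that the scheduler unlinks the job before calling the driver's timeout handler and puts it back itself when the handler reports "no hang", so the driver only pauses and resumes the work queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code; every toy_* name is hypothetical. */

enum toy_sched_stat {
	TOY_STAT_RESET,		/* driver recovered; job is not handed back */
	TOY_STAT_NO_HANG,	/* models DRM_GPU_SCHED_STAT_NO_HANG */
};

struct toy_sched;

struct toy_job {
	int id;
	struct toy_sched *sched;
	struct toy_job *next;
};

struct toy_sched {
	struct toy_job *pending;	/* owned by the scheduler only */
	bool wqueue_running;
	enum toy_sched_stat (*timedout_job)(struct toy_job *job);
};

/* Models drm_sched_wqueue_stop()/start(): pause submission, nothing else. */
static void toy_wqueue_stop(struct toy_sched *s)  { s->wqueue_running = false; }
static void toy_wqueue_start(struct toy_sched *s) { s->wqueue_running = true; }

/* Driver timeout callback: resets "hardware" without touching s->pending. */
static enum toy_sched_stat toy_driver_timedout(struct toy_job *job)
{
	toy_wqueue_stop(job->sched);
	/* ... the asic reset would happen here ... */
	toy_wqueue_start(job->sched);
	return TOY_STAT_NO_HANG;	/* ask the scheduler to re-insert the job */
}

/* Scheduler timeout path: unlink the oldest job, then consult the driver. */
static enum toy_sched_stat toy_handle_timeout(struct toy_sched *s)
{
	struct toy_job *job = s->pending;
	enum toy_sched_stat stat;

	assert(job != NULL);
	s->pending = job->next;
	job->next = NULL;

	stat = s->timedout_job(job);
	if (stat == TOY_STAT_NO_HANG) {
		/* Re-insertion is done by the scheduler, not by the driver. */
		job->next = s->pending;
		s->pending = job;
	}
	return stat;
}
```

In this model the driver never reads or writes the pending list; after toy_handle_timeout() returns, the job is back at the head of the list and the work queue is running again.
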
Regards,
Christian.
>
> P.
>
>>
>> Adding Philipp on CC to double check what I say above.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 80572f71ff627..868ab5314c0d1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -6301,7 +6301,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
>>>  			if (!amdgpu_ring_sched_ready(ring))
>>>  				continue;
>>>
>>> -			drm_sched_stop(&ring->sched, job ? &job->base : NULL);
>>> +			drm_sched_wqueue_stop(&ring->sched);
>>>
>>>  			if (need_emergency_restart)
>>>  				amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
>>> @@ -6385,7 +6385,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
>>>  			if (!amdgpu_ring_sched_ready(ring))
>>>  				continue;
>>>
>>> -			drm_sched_start(&ring->sched, 0);
>>> +			drm_sched_wqueue_start(&ring->sched);
>>>  		}
>>>
>>>  		if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
>>
>