From: "Christian König" <christian.koenig@amd.com>
To: "André Almeida" <andrealmeid@igalia.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch,
"Raag Jadav" <raag.jadav@intel.com>,
rodrigo.vivi@intel.com, jani.nikula@linux.intel.com,
"Xaver Hugl" <xaver.hugl@gmail.com>,
"Krzysztof Karas" <krzysztof.karas@intel.com>
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v9 6/6] drm/amdgpu: Make use of drm_wedge_task_info
Date: Wed, 18 Jun 2025 09:29:04 +0200
Message-ID: <a0f508fd-3277-4839-a4b6-e6bc56546f6c@amd.com>
In-Reply-To: <63b4fb79-8132-4c05-bcac-3238366899d9@igalia.com>
On 6/17/25 15:22, André Almeida wrote:
> On 17/06/2025 10:07, Christian König wrote:
>> On 6/17/25 14:49, André Almeida wrote:
>>> To notify userspace about which task (if any) caused the device to enter a
>>> wedged state, make use of the drm_wedge_task_info parameter, filling it
>>> with the task PID and name.
>>>
>>> Signed-off-by: André Almeida <andrealmeid@igalia.com>
>>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Do you have commit rights for drm-misc-next?
>>
>
> Thanks for the reviews!
>
> I do have access, but if you don't mind, can you push this one?
Sure, but give me till the end of today.
(And maybe ping me next week should I forget about it).
Regards,
Christian.
>
>> Regards,
>> Christian.
>>
>>> ---
>>> v8:
>>> - Drop check before calling amdgpu_vm_put_task_info()
>>> - Drop local variable `info`
>>> v7:
>>> - Remove struct cast; we can now use `info = &ti->task`
>>> - Fix struct lifetime: move amdgpu_vm_put_task_info() after the
>>> drm_dev_wedged_event() call
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +++++++++++--
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 +++++--
>>> 2 files changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 8a0f36f33f13..a59f194e3360 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -6363,8 +6363,17 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>> atomic_set(&adev->reset_domain->reset_res, r);
>>> - if (!r)
>>> - drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
>>> + if (!r) {
>>> + struct amdgpu_task_info *ti = NULL;
>>> +
>>> + if (job)
>>> + ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);
>>> +
>>> + drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE,
>>> + ti ? &ti->task : NULL);
>>> +
>>> + amdgpu_vm_put_task_info(ti);
>>> + }
>>> return r;
>>> }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 0c1381b527fe..1e24590ae144 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -89,6 +89,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>>> {
>>> struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>> struct amdgpu_job *job = to_amdgpu_job(s_job);
>>> + struct drm_wedge_task_info *info = NULL;
>>> struct amdgpu_task_info *ti;
>>> struct amdgpu_device *adev = ring->adev;
>>> int idx;
>>> @@ -125,7 +126,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>>> ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
>>> if (ti) {
>>> amdgpu_vm_print_task_info(adev, ti);
>>> - amdgpu_vm_put_task_info(ti);
>>> + info = &ti->task;
>>> }
>>> /* attempt a per ring reset */
>>> @@ -164,13 +165,15 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>>> if (amdgpu_ring_sched_ready(ring))
>>> drm_sched_start(&ring->sched, 0);
>>> dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
>>> - drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
>>> + drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, info);
>>> goto exit;
>>> }
>>> dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
>>> }
>>> dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
>>> + amdgpu_vm_put_task_info(ti);
>>> +
>>> if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>> struct amdgpu_reset_context reset_context;
>>> memset(&reset_context, 0, sizeof(reset_context));
>>
>