From: "Christian König" <christian.koenig@amd.com>
To: vitaly.prosyak@amd.com, amd-gfx@lists.freedesktop.org, "Khatri,
Sunil" <Sunil.Khatri@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Jesse Zhang <jesse.zhang@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: fix NULL pointer dereference in amdgpu_devcoredump_format
Date: Fri, 10 Apr 2026 16:20:06 +0200
Message-ID: <b5573653-080e-460a-906e-cff8f3dd85a1@amd.com>
In-Reply-To: <20260410013639.129917-2-vitaly.prosyak@amd.com>
On 4/10/26 03:35, vitaly.prosyak@amd.com wrote:
> From: Vitaly Prosyak <vitaly.prosyak@amd.com>
>
> A race condition in the devcoredump code causes a NULL pointer
> dereference in amdgpu_devcoredump_format() when two GPU resets occur
> in quick succession.
>
> The sequence of events:
>
> 1. First reset calls amdgpu_coredump(), creates coredump1, sets
> adev->coredump = coredump1, and queues the deferred work.
> 2. The deferred work begins executing (work_pending() returns false
> since the work is now running, not just queued).
> 3. A second reset calls amdgpu_coredump(). work_pending() returns
> false because the work is running, so amdgpu_coredump() proceeds:
> creates coredump2, overwrites adev->coredump = coredump2, and
> re-queues the deferred work with queue_work().
> 4. The first deferred work finishes and unconditionally sets
> adev->coredump = NULL, destroying the reference to coredump2.
> 5. The re-queued deferred work starts and reads
> adev->coredump = NULL. It then passes this NULL into
> amdgpu_devcoredump_format() which dereferences coredump->adev
> (offset 0 in the struct), triggering:
>
> KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> RIP: 0010:amdgpu_devcoredump_format+0xa6/0x36b0 [amdgpu]
>
> This was observed during the amd_deadlock IGT test, where multiple
> subtests trigger rapid ring resets. The dmesg log shows four
> coredumps created in quick succession (at 102.377s, 104.424s,
> 104.492s, and 104.497s, the last three within 73ms), with the crash
> occurring 13ms after the last one.
>
> Fix this with three changes:
>
> - Replace work_pending() with work_busy() in amdgpu_coredump() to
> also reject new coredumps while the deferred work is executing,
> not just when it is queued. This closes the main race window.
>
> - Add a defensive NULL check for adev->coredump at the start of
> amdgpu_devcoredump_deferred_work() to prevent the crash if the
> race still occurs (work_busy() is advisory, not a full barrier).
>
> - Guard the unconditional coredump->pasid = job->pasid assignment
> with a NULL check on job, since callers can pass job=NULL (as
> evidenced by the existing if (job && job->pasid) pattern).
>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jesse Zhang <jesse.zhang@amd.com>
> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> index 8edec416fe2b..5cfd9ecccdf2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> @@ -464,6 +464,9 @@ static void amdgpu_devcoredump_deferred_work(struct work_struct *work)
> struct amdgpu_device *adev = container_of(work, typeof(*adev), coredump_work);
> struct amdgpu_coredump_info *coredump = adev->coredump;
>
> + if (!coredump)
> + goto end;
> +
> /* Do a one-time preparation of the coredump output because
> * repeatingly calling drm_coredump_printer is very slow.
> */
> @@ -499,7 +502,7 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
> int i, off, idx;
>
> /* No need to generate a new coredump if there's one in progress already. */
> - if (work_pending(&adev->coredump_work))
> + if (work_busy(&adev->coredump_work))
> return;
>
> if (job && job->pasid)
> @@ -511,7 +514,8 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
>
> coredump->skip_vram_check = skip_vram_check;
> coredump->reset_vram_lost = vram_lost;
> - coredump->pasid = job->pasid;
> + if (job)
> + coredump->pasid = job->pasid;
Sunil also sent out a patch fixing this which looked a little bit better.
Please sync up on the code and review each other's patches.
Thanks,
Christian.
>
> if (job && job->pasid) {
> struct amdgpu_task_info *ti;