public inbox for amd-gfx@lists.freedesktop.org
From: "Christian König" <christian.koenig@amd.com>
To: vitaly.prosyak@amd.com, amd-gfx@lists.freedesktop.org, "Khatri,
	Sunil" <Sunil.Khatri@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Jesse Zhang <jesse.zhang@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: fix NULL pointer dereference in amdgpu_devcoredump_format
Date: Fri, 10 Apr 2026 16:20:06 +0200
Message-ID: <b5573653-080e-460a-906e-cff8f3dd85a1@amd.com>
In-Reply-To: <20260410013639.129917-2-vitaly.prosyak@amd.com>

On 4/10/26 03:35, vitaly.prosyak@amd.com wrote:
> From: Vitaly Prosyak <vitaly.prosyak@amd.com>
> 
> A race condition in the devcoredump code causes a NULL pointer
> dereference in amdgpu_devcoredump_format() when two GPU resets occur
> in quick succession.
> 
> The sequence of events:
> 
> 1. First reset calls amdgpu_coredump(), creates coredump1, sets
>    adev->coredump = coredump1, and queues the deferred work.
> 2. The deferred work begins executing (work_pending() returns false
>    since the work is now running, not just queued).
> 3. A second reset calls amdgpu_coredump(). work_pending() returns
>    false because the work is running, so amdgpu_coredump() proceeds:
>    creates coredump2, overwrites adev->coredump = coredump2, and
>    re-queues the deferred work with queue_work().
> 4. The first deferred work finishes and unconditionally sets
>    adev->coredump = NULL, destroying the reference to coredump2.
> 5. The re-queued deferred work starts and reads
>    adev->coredump = NULL. It then passes this NULL into
>    amdgpu_devcoredump_format() which dereferences coredump->adev
>    (offset 0 in the struct), triggering:
> 
>    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
>    RIP: 0010:amdgpu_devcoredump_format+0xa6/0x36b0 [amdgpu]
> 
> This was observed during the amd_deadlock IGT test where multiple
> subtests trigger rapid ring resets. The dmesg log shows four
> coredumps created within 120ms (at 102.377s, 104.424s, 104.492s,
> and 104.497s), with the crash occurring 13ms after the last one.
> 
> Fix this with three changes:
> 
> - Replace work_pending() with work_busy() in amdgpu_coredump() to
>   also reject new coredumps while the deferred work is executing,
>   not just when it is queued. This closes the main race window.
> 
> - Add a defensive NULL check for adev->coredump at the start of
>   amdgpu_devcoredump_deferred_work() to prevent the crash if the
>   race still occurs (work_busy() is advisory, not a full barrier).
> 
> - Guard the unconditional coredump->pasid = job->pasid assignment
>   with a NULL check on job, since callers can pass job=NULL (as
>   evidenced by the existing if (job && job->pasid) pattern).
> 
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jesse Zhang <jesse.zhang@amd.com>
> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> index 8edec416fe2b..5cfd9ecccdf2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
> @@ -464,6 +464,9 @@ static void amdgpu_devcoredump_deferred_work(struct work_struct *work)
>  	struct amdgpu_device *adev = container_of(work, typeof(*adev), coredump_work);
>  	struct amdgpu_coredump_info *coredump = adev->coredump;
>  
> +	if (!coredump)
> +		goto end;
> +
>  	/* Do a one-time preparation of the coredump output because
>  	 * repeatingly calling drm_coredump_printer is very slow.
>  	 */
> @@ -499,7 +502,7 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
>  	int i, off, idx;
>  
>  	/* No need to generate a new coredump if there's one in progress already. */
> -	if (work_pending(&adev->coredump_work))
> +	if (work_busy(&adev->coredump_work))
>  		return;
>  
>  	if (job && job->pasid)
> @@ -511,7 +514,8 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check,
>  
>  	coredump->skip_vram_check = skip_vram_check;
>  	coredump->reset_vram_lost = vram_lost;
> -	coredump->pasid = job->pasid;
> +	if (job)
> +		coredump->pasid = job->pasid;

Sunil also sent out a patch to fix this, which looked a little better.

Please sync up on the code and review each other's patches.

Thanks,
Christian.

>  
>  	if (job && job->pasid) {
>  		struct amdgpu_task_info *ti;
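
For readers following the race described in the commit message, the five steps can be replayed in a small userspace model. This is only a sketch under stated assumptions: the model_* names and replay_race() are hypothetical stand-ins and not kernel or amdgpu APIs, the work item is reduced to two flags, and the steps run sequentially rather than on real threads. It assumes work_pending() only observes the queued state, while work_busy() also observes a running callback, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one work item; not the kernel API. */
struct model_work {
	bool pending;	/* queued, callback not started yet */
	bool running;	/* callback currently executing */
};

/* Stand-in for the relevant parts of struct amdgpu_device. */
struct model_dev {
	struct model_work work;
	const char *coredump;	/* plays the role of adev->coredump */
};

/* Models work_pending(): true only while the work is queued. */
static bool model_work_pending(const struct model_work *w)
{
	return w->pending;
}

/* Models work_busy(): true while the work is queued or running. */
static bool model_work_busy(const struct model_work *w)
{
	return w->pending || w->running;
}

/* Models amdgpu_coredump(); returns true if a new dump was accepted. */
static bool model_coredump(struct model_dev *d, const char *name, bool use_busy)
{
	bool in_progress = use_busy ? model_work_busy(&d->work)
				    : model_work_pending(&d->work);

	if (in_progress)
		return false;

	d->coredump = name;
	d->work.pending = true;	/* models queue_work() */
	return true;
}

/* The deferred work starts: it leaves the queue and begins running. */
static void model_work_start(struct model_dev *d)
{
	assert(d->work.pending);	/* only queued work can start */
	d->work.pending = false;
	d->work.running = true;
}

/* The deferred work finishes and unconditionally clears the pointer. */
static void model_work_finish(struct model_dev *d)
{
	d->coredump = NULL;
	d->work.running = false;
}

/* Replays steps 1-5; returns true if a second work run sees NULL. */
bool replay_race(bool use_busy)
{
	struct model_dev d = { .coredump = NULL };

	model_coredump(&d, "coredump1", use_busy);	/* step 1 */
	model_work_start(&d);				/* step 2 */
	model_coredump(&d, "coredump2", use_busy);	/* step 3 */
	model_work_finish(&d);				/* step 4 */

	if (!d.work.pending)
		return false;	/* second dump was rejected, nothing re-queued */

	model_work_start(&d);				/* step 5 */
	return d.coredump == NULL;
}
```

In this model, replay_race(false) returns true because the re-queued work starts with a NULL pointer, and replay_race(true) returns false because the second dump is rejected in step 3. The model does not cover the remaining window that the defensive NULL check in the patch is meant to handle.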


