AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org,
	Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Subject: Re: [PATCH v3] drm/amdgpu: Avoid NULL dereference in discovery topology coredump path v3
Date: Wed, 18 Mar 2026 11:33:40 +0100	[thread overview]
Message-ID: <3d062269-15fd-4bc9-9d36-ce47a11137e3@amd.com> (raw)
In-Reply-To: <20260318103059.1976682-1-srinivasan.shanmugam@amd.com>



On 3/18/26 11:30, Srinivasan Shanmugam wrote:
> When a GPU fault or timeout happens, the driver creates a devcoredump
> to collect debug information.
> 
> During this, amdgpu_devcoredump_format() calls
> amdgpu_discovery_dump() to print IP discovery data.
> 
> amdgpu_discovery_dump() uses:
>   adev->discovery.ip_top
> 
> and then accesses:
>   ip_top->die_kset
> 
> amdgpu_discovery_dump() uses adev->discovery.ip_top. However,
> ip_top may be NULL if the discovery topology was never initialized.
> 
> The current code does not check for this before using ip_top. As a
> result, when ip_top is NULL, the coredump worker crashes while taking
> the spinlock for ip_top->die_kset.
> 
> Fix this by checking for a missing ip_top before walking the discovery
> topology. If it is unavailable, print a short message in the dump and
> return safely.
> 
> - If ip_top is NULL, print a message and skip the dump
> - Also add the same check in the cleanup path
> 
> This makes the coredump and cleanup paths safe even when the
> discovery topology is not available.
> 
> KASAN trace:
> [  522.228252] [IGT] amd_deadlock: starting subtest amdgpu-deadlock-sdma
> [  522.240681] [IGT] amd_deadlock: starting dynamic subtest amdgpu-deadlock-sdma
> 
> ...
> 
> [  522.952317] Write of size 4 at addr 0000000000000050 by task kworker/u129:5/5434
> [  522.937526] BUG: KASAN: null-ptr-deref in _raw_spin_lock+0x66/0xc0
> [  522.967659] Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
> 
> ...
> 
> [  522.969445] Call Trace:
> [  522.969508]  _raw_spin_lock+0x66/0xc0
> [  522.969518]  ? __pfx__raw_spin_lock+0x10/0x10
> [  522.969534]  amdgpu_discovery_dump+0x61/0x530 [amdgpu]
> [  522.971346]  ? pick_next_task_fair+0x3f6/0x1c60
> [  522.971363]  amdgpu_devcoredump_format+0x84f/0x26f0 [amdgpu]
> [  522.973188]  ? __pfx_amdgpu_devcoredump_format+0x10/0x10 [amdgpu]
> [  522.975012]  ? psi_task_switch+0x2b5/0x9b0
> [  522.975027]  ? __pfx___drm_printfn_coredump+0x10/0x10 [drm]
> [  522.975198]  ? __pfx___drm_puts_coredump+0x10/0x10 [drm]
> [  522.975366]  ? __schedule+0x113c/0x38d0
> [  522.975381]  amdgpu_devcoredump_deferred_work+0x4c/0x1f0 [amdgpu]
> 
> v2: Updated commit message - Clarified that ip_top is not freed, it can
>     just be NULL if discovery was not initialized. (Christian/Lijo)
> 
> v3: Removed the extra drm_warn() for sysfs init failure as sysfs already
>     reports errors. (Christian)
> 
> Fixes: 7083eb8982fb ("drm/amdgpu: include ip discovery data in devcoredump")
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index f7f37d93d0ce..6be1f971a31a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -1395,6 +1395,9 @@ static void amdgpu_discovery_sysfs_fini(struct amdgpu_device *adev)
>  	struct list_head *el, *tmp;
>  	struct kset *die_kset;
>  
> +	if (!ip_top)
> +		return;
> +
>  	die_kset = &ip_top->die_kset;
>  	spin_lock(&die_kset->list_lock);
>  	list_for_each_prev_safe(el, tmp, &die_kset->list) {
> @@ -1419,9 +1422,15 @@ void amdgpu_discovery_dump(struct amdgpu_device *adev, struct drm_printer *p)
>  	struct ip_hw_instance *ip_inst;
>  	int i = 0, j;
>  
> +	drm_printf(p, "\nHW IP Discovery\n");
> +
> +	if (!ip_top) {
> +		drm_printf(p, "ip discovery topology unavailable\n");
> +		return;
> +	}
> +
>  	die_kset = &ip_top->die_kset;
>  
> -	drm_printf(p, "\nHW IP Discovery\n");
>  	spin_lock(&die_kset->list_lock);
>  	list_for_each(el_die, &die_kset->list) {
>  		drm_printf(p, "die %d\n", i++);


  reply	other threads:[~2026-03-18 10:33 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-18 10:30 [PATCH v3] drm/amdgpu: Avoid NULL dereference in discovery topology coredump path v3 Srinivasan Shanmugam
2026-03-18 10:33 ` Christian König [this message]
2026-03-18 10:57 ` Lazar, Lijo
2026-03-18 11:11   ` SHANMUGAM, SRINIVASAN
2026-03-19  4:15     ` Lazar, Lijo
2026-03-19  4:47       ` SHANMUGAM, SRINIVASAN
2026-03-19  5:00         ` Lazar, Lijo
2026-03-19  7:38           ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d062269-15fd-4bc9-9d36-ce47a11137e3@amd.com \
    --to=christian.koenig@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=pierre-eric.pelloux-prayer@amd.com \
    --cc=srinivasan.shanmugam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox