public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: "Lazar, Lijo" <lijo.lazar@amd.com>
To: Alex Deucher <alexander.deucher@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Jon Doron <jond@wiz.io>, stable@vger.kernel.org
Subject: Re: [PATCH] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove
Date: Thu, 22 Jan 2026 10:37:27 +0530	[thread overview]
Message-ID: <9d5291d6-9e1f-4df4-ad0b-ba7543d8a2af@amd.com> (raw)
In-Reply-To: <20260121182447.2434085-1-alexander.deucher@amd.com>



On 21-Jan-26 11:54 PM, Alex Deucher wrote:
> From: Jon Doron <jond@wiz.io>
> 
> On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
> ih2 interrupt ring buffers are not initialized. This is by design, as
> these secondary IH rings are only available on discrete GPUs. See
> vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
> AMD_IS_APU is set.
> 
> However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
> get the timestamp of the last interrupt entry. When retry faults are
> enabled on APUs (noretry=0), this function is called from the SVM page
> fault recovery path, resulting in a NULL pointer dereference when
> amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
> 
> The crash manifests as:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000004
>    RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
>    Call Trace:
>     amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
>     svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
>     amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
>     gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
>     amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
>     amdgpu_ih_process+0x84/0x100 [amdgpu]
> 
> This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
> IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
> noretry=1 to noretry=0, enabling retry fault handling and thus
> exercising the buggy code path.
> 
> Fix this by adding a check for ih1.ring_size before attempting to use
> it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
> Rework retry fault removal").  This is needed if the hardware doesn't
> support secondary HW IH rings.
> 
> v2: additional updates (Alex)
> 
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
> Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jon Doron <jond@wiz.io>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 8e65fec9f534e..243d75917458a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -498,8 +498,13 @@ void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
>   
>   	if (adev->irq.retry_cam_enabled)
>   		return;
> +	else if (adev->irq.ih1.ring_size)
> +		ih = &adev->irq.ih1;
> +	else if (adev->irq.ih_soft.enabled)
> +		ih = &adev->irq.ih_soft;

Faults are delegated to soft ring when retry_cam is enabled - 
https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c#L541

That matches with the original logic in d299441654f ("drm/amdgpu: Rework 
retry fault removal").

To match exactly with the logic in above commit, I think it should use 
soft ring only when retry cam is enabled. Presently, it's returning 
without doing anything.

Thanks,
Lijo

> +	else
> +		return;
>   
> -	ih = &adev->irq.ih1;
>   	/* Get the WPTR of the last entry in IH ring */
>   	last_wptr = amdgpu_ih_get_wptr(adev, ih);
>   	/* Order wptr with ring data. */


  reply	other threads:[~2026-01-22  5:07 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-21 18:24 [PATCH] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove Alex Deucher
2026-01-22  5:07 ` Lazar, Lijo [this message]
2026-01-22 15:00   ` Timur Kristóf
2026-01-23  7:17     ` Lazar, Lijo
2026-01-23  8:06       ` Lazar, Lijo
2026-01-22 15:01 ` Timur Kristóf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9d5291d6-9e1f-4df4-ad0b-ba7543d8a2af@amd.com \
    --to=lijo.lazar@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=jond@wiz.io \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox