All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Timur Kristóf" <timur.kristof@gmail.com>
To: Alex Deucher <alexander.deucher@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Jon Doron <jond@wiz.io>,
	stable@vger.kernel.org, "Lazar, Lijo" <lijo.lazar@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove
Date: Thu, 22 Jan 2026 16:00:41 +0100	[thread overview]
Message-ID: <4882409.vXUDI8C0e8@timur-hyperion> (raw)
In-Reply-To: <9d5291d6-9e1f-4df4-ad0b-ba7543d8a2af@amd.com>

On Thursday, January 22, 2026 6:07:27 AM Central European Standard Time Lazar, 
Lijo wrote:
> On 21-Jan-26 11:54 PM, Alex Deucher wrote:
> > From: Jon Doron <jond@wiz.io>
> > 
> > On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
> > ih2 interrupt ring buffers are not initialized. This is by design, as
> > these secondary IH rings are only available on discrete GPUs. See
> > vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
> > AMD_IS_APU is set.
> > 
> > However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
> > get the timestamp of the last interrupt entry. When retry faults are
> > enabled on APUs (noretry=0), this function is called from the SVM page
> > fault recovery path, resulting in a NULL pointer dereference when
> > amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
> > 
> > The crash manifests as:
> >    BUG: kernel NULL pointer dereference, address: 0000000000000004
> >    RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
> >    
> >    Call Trace:
> >     amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
> >     svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
> >     amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
> >     gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
> >     amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
> >     amdgpu_ih_process+0x84/0x100 [amdgpu]
> > 
> > This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
> > IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
> > noretry=1 to noretry=0, enabling retry fault handling and thus
> > exercising the buggy code path.
> > 
> > Fix this by adding a check for ih1.ring_size before attempting to use
> > it. Also restore the soft_ih support from commit dd299441654f
> > ("drm/amdgpu:
> > Rework retry fault removal").  This is needed if the hardware doesn't
> > support secondary HW IH rings.
> > 
> > v2: additional updates (Alex)
> > 
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
> > Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jon Doron <jond@wiz.io>
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> > 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 ++++++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index
> > 8e65fec9f534e..243d75917458a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -498,8 +498,13 @@ void amdgpu_gmc_filter_faults_remove(struct
> > amdgpu_device *adev, uint64_t addr,> 
> >   	if (adev->irq.retry_cam_enabled)
> >   	
> >   		return;
> > 
> > +	else if (adev->irq.ih1.ring_size)
> > +		ih = &adev->irq.ih1;
> > +	else if (adev->irq.ih_soft.enabled)
> > +		ih = &adev->irq.ih_soft;
> 
> Faults are delegated to soft ring when retry_cam is enabled -
> https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drive
> rs/gpu/drm/amd/amdgpu/amdgpu_gmc.c#L541

Hi,

As far as I know the retry CAM is not available on APUs.
Please correct me if I'm wrong.

Thanks,
Timur

> 
> That matches with the original logic in d299441654f ("drm/amdgpu: Rework
> retry fault removal").
> 
> To match exactly with the logic in above commit, I think it should use
> soft ring only when retry cam is enabled. Presently, it's returning
> without doing anything.
> 
> Thanks,
> Lijo
> 
> > +	else
> > +		return;
> > 
> > -	ih = &adev->irq.ih1;
> > 
> >   	/* Get the WPTR of the last entry in IH ring */
> >   	last_wptr = amdgpu_ih_get_wptr(adev, ih);
> >   	/* Order wptr with ring data. */





  parent reply	other threads:[~2026-01-22 15:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-21 18:24 [PATCH] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove Alex Deucher
2026-01-22  5:07 ` Lazar, Lijo
2026-01-22 14:02   ` Alex Deucher
2026-01-22 15:00   ` Timur Kristóf [this message]
2026-01-22 22:47     ` Philip Yang
2026-01-23  7:17     ` Lazar, Lijo
2026-01-23  8:06       ` Lazar, Lijo
2026-01-22 15:01 ` Timur Kristóf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4882409.vXUDI8C0e8@timur-hyperion \
    --to=timur.kristof@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=jond@wiz.io \
    --cc=lijo.lazar@amd.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.