* FAILED: patch "[PATCH] drm/amdgpu: fix NULL pointer dereference in" failed to apply to 6.6-stable tree
@ 2026-02-03 14:26 gregkh
2026-02-04 0:26 ` [PATCH 6.6.y 1/2] drm/amdkfd: Don't use sw fault filter if retry cam enabled Sasha Levin
0 siblings, 1 reply; 3+ messages in thread
From: gregkh @ 2026-02-03 14:26 UTC (permalink / raw)
To: jond, Philip.Yang, alexander.deucher, timur.kristof; +Cc: stable
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8b1ecc9377bc641533cd9e76dfa3aee3cd04a007
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2026020358-resemble-wildness-53b5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8b1ecc9377bc641533cd9e76dfa3aee3cd04a007 Mon Sep 17 00:00:00 2001
From: Jon Doron <jond@wiz.io>
Date: Sat, 20 Dec 2025 15:04:40 +0200
Subject: [PATCH] drm/amdgpu: fix NULL pointer dereference in
amdgpu_gmc_filter_faults_remove
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
ih2 interrupt ring buffers are not initialized. This is by design, as
these secondary IH rings are only available on discrete GPUs. See
vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
AMD_IS_APU is set.
However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
get the timestamp of the last interrupt entry. When retry faults are
enabled on APUs (noretry=0), this function is called from the SVM page
fault recovery path, resulting in a NULL pointer dereference when
amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
The crash manifests as:
BUG: kernel NULL pointer dereference, address: 0000000000000004
RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
Call Trace:
amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
amdgpu_ih_process+0x84/0x100 [amdgpu]
This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
noretry=1 to noretry=0, enabling retry fault handling and thus
exercising the buggy code path.
Fix this by adding a check for ih1.ring_size before attempting to use
it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
Rework retry fault removal"). This is needed if the hardware doesn't
support secondary HW IH rings.
v2: additional updates (Alex)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6ce8d536c80aa1f059e82184f0d1994436b1d526)
Cc: stable@vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 7e623f91f2d7..d9c7ad297293 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -498,8 +498,13 @@ void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
if (adev->irq.retry_cam_enabled)
return;
+ else if (adev->irq.ih1.ring_size)
+ ih = &adev->irq.ih1;
+ else if (adev->irq.ih_soft.enabled)
+ ih = &adev->irq.ih_soft;
+ else
+ return;
- ih = &adev->irq.ih1;
/* Get the WPTR of the last entry in IH ring */
last_wptr = amdgpu_ih_get_wptr(adev, ih);
/* Order wptr with ring data. */
^ permalink raw reply related [flat|nested] 3+ messages in thread* [PATCH 6.6.y 1/2] drm/amdkfd: Don't use sw fault filter if retry cam enabled
2026-02-03 14:26 FAILED: patch "[PATCH] drm/amdgpu: fix NULL pointer dereference in" failed to apply to 6.6-stable tree gregkh
@ 2026-02-04 0:26 ` Sasha Levin
2026-02-04 0:26 ` [PATCH 6.6.y 2/2] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove Sasha Levin
0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2026-02-04 0:26 UTC (permalink / raw)
To: stable; +Cc: Philip Yang, Christian König, Alex Deucher, Sasha Levin
From: Philip Yang <Philip.Yang@amd.com>
[ Upstream commit e61801f162ddcf8874c820639483ec4849b0fb0b ]
If retry cam enabled, we don't use sw retry fault filter and add fault
into sw filter ring, so we shouldn't remove fault from sw filter.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 8b1ecc9377bc ("drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 0b6a0e149f1c4..9b225acdcf974 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -452,7 +452,10 @@ void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
uint32_t hash;
uint64_t tmp;
- ih = adev->irq.retry_cam_enabled ? &adev->irq.ih_soft : &adev->irq.ih1;
+ if (adev->irq.retry_cam_enabled)
+ return;
+
+ ih = &adev->irq.ih1;
/* Get the WPTR of the last entry in IH ring */
last_wptr = amdgpu_ih_get_wptr(adev, ih);
/* Order wptr with ring data. */
--
2.51.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* [PATCH 6.6.y 2/2] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove
2026-02-04 0:26 ` [PATCH 6.6.y 1/2] drm/amdkfd: Don't use sw fault filter if retry cam enabled Sasha Levin
@ 2026-02-04 0:26 ` Sasha Levin
0 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2026-02-04 0:26 UTC (permalink / raw)
To: stable
Cc: Jon Doron, Timur Kristóf, Philip Yang, Alex Deucher,
Sasha Levin
From: Jon Doron <jond@wiz.io>
[ Upstream commit 8b1ecc9377bc641533cd9e76dfa3aee3cd04a007 ]
On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
ih2 interrupt ring buffers are not initialized. This is by design, as
these secondary IH rings are only available on discrete GPUs. See
vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
AMD_IS_APU is set.
However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
get the timestamp of the last interrupt entry. When retry faults are
enabled on APUs (noretry=0), this function is called from the SVM page
fault recovery path, resulting in a NULL pointer dereference when
amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
The crash manifests as:
BUG: kernel NULL pointer dereference, address: 0000000000000004
RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
Call Trace:
amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
amdgpu_ih_process+0x84/0x100 [amdgpu]
This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
noretry=1 to noretry=0, enabling retry fault handling and thus
exercising the buggy code path.
Fix this by adding a check for ih1.ring_size before attempting to use
it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
Rework retry fault removal"). This is needed if the hardware doesn't
support secondary HW IH rings.
v2: additional updates (Alex)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6ce8d536c80aa1f059e82184f0d1994436b1d526)
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 9b225acdcf974..3c24637f3d6e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -454,8 +454,13 @@ void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
if (adev->irq.retry_cam_enabled)
return;
+ else if (adev->irq.ih1.ring_size)
+ ih = &adev->irq.ih1;
+ else if (adev->irq.ih_soft.enabled)
+ ih = &adev->irq.ih_soft;
+ else
+ return;
- ih = &adev->irq.ih1;
/* Get the WPTR of the last entry in IH ring */
last_wptr = amdgpu_ih_get_wptr(adev, ih);
/* Order wptr with ring data. */
--
2.51.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-02-04 0:26 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-03 14:26 FAILED: patch "[PATCH] drm/amdgpu: fix NULL pointer dereference in" failed to apply to 6.6-stable tree gregkh
2026-02-04 0:26 ` [PATCH 6.6.y 1/2] drm/amdkfd: Don't use sw fault filter if retry cam enabled Sasha Levin
2026-02-04 0:26 ` [PATCH 6.6.y 2/2] drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox