* [PATCH] drm/amdgpu: fix zero-size GDS range init on RDNA4
From: arjan @ 2026-04-20 21:57 UTC
To: amd-gfx
Cc: Arjan van de Ven, Alex Deucher, Christian König, dri-devel, linux-kernel

From: Arjan van de Ven <arjan@linux.intel.com>

RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
resources. The gfx_v12_0 initialisation code correctly leaves
adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
zero to reflect this.

amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
each of these resources regardless of size. When the size is zero,
amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.

Guard against this by returning 0 early from
amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
resource manager registration for hardware resources that are absent,
without affecting any other GPU type.

Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index afaaab6496def..8075ac735321e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
 				    unsigned int type,
 				    uint64_t size_in_page)
 {
+	if (!size_in_page)
+		return 0;
+
 	return ttm_range_man_init(&adev->mman.bdev, type,
 				  false, size_in_page);
 }
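The failure mode described in the commit message can be modelled in user space. What follows is a minimal sketch, not kernel code: range_is_invalid(), model_range_man_init() and model_init_on_chip() are hypothetical names introduced only for illustration, and returning -1 stands in for the BUG that the message attributes to DRM_MM_BUG_ON(start + size <= start). It assumes, as the message states, that the range manager is initialised with start = 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the condition quoted in the commit message:
 * start + size <= start. It is true for size == 0 and also when the
 * unsigned addition wraps around.
 */
bool range_is_invalid(uint64_t start, uint64_t size)
{
	return start + size <= start;
}

/*
 * Hypothetical model of the range manager init path. Where the commit
 * message says the kernel would BUG, this model returns -1 instead.
 * start is assumed to be 0, matching drm_mm_init(mm, 0, size).
 */
int model_range_man_init(uint64_t size_in_page)
{
	if (range_is_invalid(0, size_in_page))
		return -1;
	return 0;
}

/*
 * Model of the patched amdgpu_ttm_init_on_chip(): an absent resource
 * (size zero) is skipped before the range manager is ever reached.
 */
int model_init_on_chip(uint64_t size_in_page)
{
	if (!size_in_page)
		return 0;
	return model_range_man_init(size_in_page);
}
```

In this model, model_range_man_init(0) reports the failure, while model_init_on_chip(0) returns 0 without reaching the range check, and a non-zero size passes through both paths unchanged. Whether the real check is active on a given kernel build is outside what this sketch models.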
* Re: [PATCH] drm/amdgpu: fix zero-size GDS range init on RDNA4
From: Christian König @ 2026-04-21 6:42 UTC
To: arjan, amd-gfx
Cc: Alex Deucher, dri-devel, linux-kernel

On 4/20/26 23:57, arjan@linux.intel.com wrote:
>
> RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
> resources. The gfx_v12_0 initialisation code correctly leaves
> adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
> zero to reflect this.
>
> amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
> each of these resources regardless of size. When the size is zero,
> amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
> which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
> DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
> zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.

Mhm in general not a bad idea, but we are having tons of GFX 12 systems in our test machines and nothing is crashing there.

We are clearly missing something here. Is that on an upstream kernel or something backported?

Regards,
Christian.

>
> Guard against this by returning 0 early from
> amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
> resource manager registration for hardware resources that are absent,
> without affecting any other GPU type.
>
> Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
> Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
> Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index afaaab6496def..8075ac735321e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
>  				    unsigned int type,
>  				    uint64_t size_in_page)
>  {
> +	if (!size_in_page)
> +		return 0;
> +
>  	return ttm_range_man_init(&adev->mman.bdev, type,
>  				  false, size_in_page);
>  }