From: Ahmed Elmetwally <en22ue@gmail.com>
To: alexander.deucher@amd.com, christian.koenig@amd.com
Cc: airlied@gmail.com, simona@ffwll.ch,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, Ahmed Elmetwally <en22ue@gmail.com>
Subject: [PATCH] drm/amdgpu: clamp user gartsize against device capacity
Date: Thu, 14 May 2026 20:49:37 +0100
Message-ID: <20260514194937.35649-1-en22ue@gmail.com>

When the amdgpu.gartsize= module parameter requests a GART aperture so
large that the resulting page-table BO cannot be allocated from VRAM,
gmc_v11_0_sw_init() aborts with -ENOMEM during the kernel BO pin in
amdgpu_gart_table_vram_alloc(). This fails the amdgpu probe entirely and
leaves the device without a /dev/dri node.

The GART page table holds one 8-byte entry per AMDGPU_GPU_PAGE_SIZE
(4 KiB) page, i.e. table_size = gart_size / 512, and the table BO is
allocated from real VRAM. On small-VRAM SoCs (APUs with stolen VRAM;
Strix Halo at ~1 GiB is the case at hand), a gartsize whose table BO is
larger than what can be pinned in VRAM is silently accepted today and
only fails far downstream.

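The sizing arithmetic above can be checked with a small userspace sketch. It is not driver code: gart_table_bytes() is a hypothetical helper, and the constants assume AMDGPU_GPU_PAGE_SIZE is 4096 and that each entry is 8 bytes, as stated above.

```c
#include <stdint.h>

/* Assumed to match AMDGPU_GPU_PAGE_SIZE in the driver (4 KiB). */
#define GPU_PAGE_SIZE 4096ULL
/* One 8-byte entry per GPU page, per the description above. */
#define GART_PTE_SIZE 8ULL

/*
 * Hypothetical helper: size in bytes of the page table needed to map
 * a GART aperture of gart_mib MiB.
 */
uint64_t gart_table_bytes(uint64_t gart_mib)
{
	uint64_t gart_bytes = gart_mib << 20;

	/* Number of GPU pages times bytes per entry, i.e. gart_bytes / 512. */
	return gart_bytes / GPU_PAGE_SIZE * GART_PTE_SIZE;
}
```

Under these assumptions, gartsize=262144 yields a 512 MiB table, while the 512 MiB auto default needs only 1 MiB.
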
Reproducer on Strix Halo (gfx1151, 1 GiB stolen VRAM):

  /etc/modprobe.d/amdgpu-tuning.conf:
    options amdgpu gartsize=262144 # 256 GiB -> 512 MiB page table

  dmesg:
    amdgpu: [gmc_v11_0]*ERROR* GART aperture is needed by the driver but
    no memory has been pinned for it ...
    amdgpu 0000:c6:00.0: amdgpu: SW IP initialize failed
    amdgpu 0000:c6:00.0: amdgpu: sw_init of IP block <gmc_v11_0> failed -12

Recovery requires booting with amdgpu.gartsize=N on the kernel command
line, so the user must already know the cause and has to work from
rescue media with the GPU offline.

The user-supplied gartsize is a tuning hint, not a correctness
requirement. Compute the largest GART size whose page-table BO fits
within real_vram_size / 8 (leaving 7/8 of VRAM for everything else); if
the user value exceeds it, log a warning and fall back to the per-IP
auto default. With this patch, the modprobe.d entry above produces:

  amdgpu 0000:c6:00.0: amdgpu: amdgpu.gartsize=262144 MiB exceeds device
  capacity (real_vram=1024 MiB, max sensible=65536 MiB); clamping to
  default 512 MiB

and the device probes normally.

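For reviewers who want to sanity-check the numbers in that warning, the limit and the fallback can be modeled in plain userspace C. This is a sketch under assumptions, not the kernel helper: max_gart_bytes() and pick_gart_bytes() are hypothetical names, the page size is assumed to be 4096, and the VRAM size is passed in directly instead of being read from adev->gmc.real_vram_size.

```c
#include <stdint.h>

/* Assumed to match AMDGPU_GPU_PAGE_SIZE in the driver (4 KiB). */
#define GPU_PAGE_SIZE 4096ULL
/* Bytes per GART page-table entry. */
#define GART_PTE_SIZE 8ULL

/*
 * Hypothetical model: largest GART size in bytes whose page table
 * still fits in 1/8 of VRAM.
 */
uint64_t max_gart_bytes(uint64_t real_vram_bytes)
{
	return (real_vram_bytes / 8) * (GPU_PAGE_SIZE / GART_PTE_SIZE);
}

/*
 * Hypothetical model of the fallback: returns the GART size in bytes
 * that would be used. user_mib == -1 means "auto". A request above the
 * limit is not trimmed to the limit; the default is used instead.
 */
uint64_t pick_gart_bytes(uint64_t real_vram_bytes, int user_mib,
			 uint32_t default_mib)
{
	uint64_t want_bytes;

	if (user_mib == -1)
		return (uint64_t)default_mib << 20;

	/* Other negative values convert to a huge u64 and are rejected below. */
	want_bytes = (uint64_t)user_mib << 20;

	if (want_bytes > max_gart_bytes(real_vram_bytes))
		return (uint64_t)default_mib << 20;

	return want_bytes;
}
```

With 1 GiB of VRAM the limit works out to 65536 MiB, so a request of 262144 MiB falls back to the 512 MiB default, while a request of exactly 65536 MiB is still accepted.
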
The helper lives in amdgpu_gmc.c so the other gmc_v*_0 backends can
adopt it in follow-up patches; this patch only wires gmc_v11_0 because
that is where the failure was observed.
Signed-off-by: Ahmed Elmetwally <en22ue@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 47 +++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 +
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 6 +---
3 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -330,6 +330,53 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
 	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 			mc->gart_size >> 20, mc->gart_start, mc->gart_end);
 }
+
+/**
+ * amdgpu_gmc_validate_gart_size - clamp amdgpu.gartsize against VRAM capacity
+ *
+ * @adev: amdgpu device (real_vram_size must already be populated)
+ * @user_mb: value of the amdgpu_gart_size module parameter (MiB),
+ * or -1 for auto
+ * @default_mb: per-IP auto default in MiB
+ *
+ * The GART page table is allocated from VRAM and sized as
+ * gart_size / AMDGPU_GPU_PAGE_SIZE * sizeof(u64). A typoed or mis-pasted
+ * amdgpu.gartsize value (e.g. 262144 MiB on a 1 GiB-VRAM APU) produces a
+ * page-table BO that cannot be pinned, aborting GPU probe. Cap the page
+ * table at 1/8 of real VRAM, warn, and fall back to the auto default.
+ * Returns gart_size in bytes.
+ */
+u64 amdgpu_gmc_validate_gart_size(struct amdgpu_device *adev,
+				  int user_mb, u32 default_mb)
+{
+	u64 want_bytes, max_bytes;
+
+	if (user_mb == -1)
+		return (u64)default_mb << 20;
+
+	want_bytes = (u64)user_mb << 20;
+
+	/*
+	 * page-table BO bytes = gart_bytes / AMDGPU_GPU_PAGE_SIZE * 8
+	 * Constraint: page-table BO <= real_vram_size / 8
+	 * gart_bytes <= (real_vram_size / 8) * (AMDGPU_GPU_PAGE_SIZE / 8)
+	 */
+	max_bytes = (adev->gmc.real_vram_size / 8) *
+		    (AMDGPU_GPU_PAGE_SIZE / 8);
+
+	if (want_bytes > max_bytes) {
+		dev_warn(adev->dev,
+			 "amdgpu.gartsize=%d MiB exceeds device capacity "
+			 "(real_vram=%llu MiB, max sensible=%llu MiB); "
+			 "clamping to default %u MiB\n",
+			 user_mb,
+			 adev->gmc.real_vram_size >> 20,
+			 max_bytes >> 20,
+			 default_mb);
+		return (u64)default_mb << 20;
+	}
+
+	return want_bytes;
+}
/**
* amdgpu_gmc_agp_location - try to find AGP location
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -416,6 +416,8 @@ void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,
 void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
 			      struct amdgpu_gmc *mc,
 			      enum amdgpu_gart_placement gart_placement);
+u64 amdgpu_gmc_validate_gart_size(struct amdgpu_device *adev,
+				  int user_mb, u32 default_mb);
 void amdgpu_gmc_agp_location(struct amdgpu_device *adev,
 			     struct amdgpu_gmc *mc);
 void amdgpu_gmc_set_agp_default(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -708,11 +708,7 @@ static int gmc_v11_0_mc_init(struct amdgpu_device *adev)
 	if (adev->gmc.visible_vram_size > adev->gmc.real_vram_size)
 		adev->gmc.visible_vram_size = adev->gmc.real_vram_size;
 
-	/* set the gart size */
-	if (amdgpu_gart_size == -1)
-		adev->gmc.gart_size = 512ULL << 20;
-	else
-		adev->gmc.gart_size = (u64)amdgpu_gart_size << 20;
+	adev->gmc.gart_size = amdgpu_gmc_validate_gart_size(adev,
+							    amdgpu_gart_size, 512);
 
 	gmc_v11_0_vram_gtt_location(adev, &adev->gmc);
--
2.45.0