From: Ahmed Elmetwally
To: alexander.deucher@amd.com, christian.koenig@amd.com
Cc: airlied@gmail.com, simona@ffwll.ch, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Ahmed Elmetwally
Subject: [PATCH] drm/amdgpu: clamp user gartsize against device capacity
Date: Thu, 14 May 2026 20:49:37 +0100
Message-ID: <20260514194937.35649-1-en22ue@gmail.com>

When the user-supplied amdgpu.gartsize= module parameter requests a GART
aperture so large that the resulting page-table BO cannot be allocated
from VRAM, gmc_v11_0_sw_init() aborts with -ENOMEM while pinning the
kernel BO in amdgpu_gart_table_vram_alloc(). This fails amdgpu probe
entirely and leaves the device without a /dev/dri node.

The GART page table holds one 8-byte entry per AMDGPU_GPU_PAGE_SIZE
(4 KiB) page, i.e. table_size = gart_size / 512, and the table BO is
allocated from real VRAM. On small-VRAM SoCs (APUs with stolen VRAM; the
case at hand is Strix Halo with ~1 GiB), a user-supplied gartsize whose
table BO is larger than what can be pinned in VRAM is silently accepted
today and only fails far downstream.

Reproducer on Strix Halo (gfx1151, 1 GiB stolen VRAM):

  /etc/modprobe.d/amdgpu-tuning.conf:
    options amdgpu gartsize=262144   # 256 GiB -> 512 MiB page table

  dmesg:
    amdgpu: [gmc_v11_0]*ERROR* GART aperture is needed by the driver but no memory has been pinned for it
    ...
    amdgpu 0000:c6:00.0: amdgpu: SW IP initialize failed
    amdgpu 0000:c6:00.0: amdgpu: sw_init of IP block failed -12

Recovery requires booting with amdgpu.gartsize=N on the kernel command
line, i.e. the user must already know the cause and work from rescue
media with the GPU offline.

The user-supplied gartsize is a tuning hint, not a correctness
requirement. Compute the largest sensible value, namely the one whose
page-table BO still fits within real_vram_size / 8 (leaving 7/8 of VRAM
for everything else). If the user value exceeds it, log a warning and
fall back to the per-IP auto default.

With this patch, the same modprobe.d entry above produces:

    amdgpu 0000:c6:00.0: amdgpu: amdgpu.gartsize=262144 MiB exceeds device capacity (real_vram=1024 MiB, max sensible=65536 MiB); clamping to default 512 MiB

and the device probes normally.

The helper lives in amdgpu_gmc.c so that the other gmc_v*_0 backends can
adopt it in follow-up patches; this patch only wires up gmc_v11_0
because that is where the failure was observed.

Signed-off-by: Ahmed Elmetwally
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  2 +
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  |  7 +---
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -330,6 +330,54 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
 	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 			mc->gart_size >> 20, mc->gart_start, mc->gart_end);
 }
+
+/**
+ * amdgpu_gmc_validate_gart_size - clamp amdgpu.gartsize against VRAM capacity
+ *
+ * @adev: amdgpu device (real_vram_size must already be populated)
+ * @user_mb: value of the amdgpu_gart_size module parameter (MiB),
+ *           or -1 for auto
+ * @default_mb: per-IP auto default in MiB
+ *
+ * The GART page table is allocated from VRAM and sized as
+ * gart_size / AMDGPU_GPU_PAGE_SIZE * sizeof(u64). A typoed or mis-pasted
+ * amdgpu.gartsize value (e.g. 262144 MiB on a 1 GiB-VRAM APU) produces a
+ * page-table BO that cannot be pinned, aborting GPU probe. Cap the page
+ * table at 1/8 of real VRAM, warn, and fall back to the auto default.
+ * Return: gart_size in bytes.
+ */
+u64 amdgpu_gmc_validate_gart_size(struct amdgpu_device *adev,
+				  int user_mb, u32 default_mb)
+{
+	u64 want_bytes, max_bytes;
+
+	if (user_mb == -1)
+		return (u64)default_mb << 20;
+
+	want_bytes = (u64)user_mb << 20;
+
+	/*
+	 * page-table BO bytes = gart_bytes / AMDGPU_GPU_PAGE_SIZE * 8
+	 * Constraint: page-table BO <= real_vram_size / 8
+	 *   gart_bytes <= (real_vram_size / 8) * (AMDGPU_GPU_PAGE_SIZE / 8)
+	 */
+	max_bytes = (adev->gmc.real_vram_size / 8) *
+		    (AMDGPU_GPU_PAGE_SIZE / 8);
+
+	if (want_bytes > max_bytes) {
+		dev_warn(adev->dev,
+			 "amdgpu.gartsize=%d MiB exceeds device capacity "
+			 "(real_vram=%llu MiB, max sensible=%llu MiB); "
+			 "clamping to default %u MiB\n",
+			 user_mb,
+			 adev->gmc.real_vram_size >> 20,
+			 max_bytes >> 20,
+			 default_mb);
+		return (u64)default_mb << 20;
+	}
+
+	return want_bytes;
+}
 
 /**
  * amdgpu_gmc_agp_location - try to find AGP location
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -416,6 +416,8 @@ void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc,
 void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
 			      struct amdgpu_gmc *mc,
 			      enum amdgpu_gart_placement gart_placement);
+u64 amdgpu_gmc_validate_gart_size(struct amdgpu_device *adev,
+				  int user_mb, u32 default_mb);
 void amdgpu_gmc_agp_location(struct amdgpu_device *adev,
 			     struct amdgpu_gmc *mc);
 void amdgpu_gmc_set_agp_default(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -708,11 +708,8 @@ static int gmc_v11_0_mc_init(struct amdgpu_device *adev)
 	if (adev->gmc.visible_vram_size > adev->gmc.real_vram_size)
 		adev->gmc.visible_vram_size = adev->gmc.real_vram_size;
 
-	/* set the gart size */
-	if (amdgpu_gart_size == -1)
-		adev->gmc.gart_size = 512ULL << 20;
-	else
-		adev->gmc.gart_size = (u64)amdgpu_gart_size << 20;
+	adev->gmc.gart_size = amdgpu_gmc_validate_gart_size(adev,
+							    amdgpu_gart_size, 512);
 
 	gmc_v11_0_vram_gtt_location(adev, &adev->gmc);
 
-- 
2.45.0
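The sizing rule can be sanity-checked outside the kernel. Below is a minimal userspace sketch, not part of the patch: the model_* functions and MODEL_* constants are hypothetical stand-ins, and it assumes AMDGPU_GPU_PAGE_SIZE is 4096 with one 8-byte PTE per page (the ratio behind table_size = gart_size / 512 in the commit message). It mirrors the decision made by amdgpu_gmc_validate_gart_size(): -1 selects the default, a request above the 1/8-of-VRAM limit falls back to the default, and anything else is used as-is.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB GPU pages, matching "gart_size / 512" above. */
#define MODEL_GPU_PAGE_SIZE 4096ULL
/* Assumption: one 64-bit PTE per GPU page. */
#define MODEL_PTE_SIZE 8ULL

static_assert(MODEL_GPU_PAGE_SIZE % MODEL_PTE_SIZE == 0,
	      "page size must be a multiple of the PTE size");

/* Hypothetical helper: VRAM bytes needed for a page table covering gart_bytes. */
static uint64_t model_gart_table_bytes(uint64_t gart_bytes)
{
	return gart_bytes / MODEL_GPU_PAGE_SIZE * MODEL_PTE_SIZE;
}

/* Hypothetical helper: largest GART whose page table fits in 1/8 of VRAM. */
static uint64_t model_max_gart_bytes(uint64_t real_vram_bytes)
{
	return (real_vram_bytes / 8) * (MODEL_GPU_PAGE_SIZE / MODEL_PTE_SIZE);
}

/*
 * Hypothetical userspace model of the helper's decision:
 * -1 means auto, an oversized request falls back to the default,
 * anything else is returned unchanged (all results in bytes).
 */
static uint64_t model_validate_gart_size(uint64_t real_vram_bytes,
					 int user_mb, uint32_t default_mb)
{
	uint64_t want, max;

	if (user_mb == -1)
		return (uint64_t)default_mb << 20;

	/* Negative values other than -1 wrap to a huge unsigned size. */
	want = (uint64_t)user_mb << 20;
	max = model_max_gart_bytes(real_vram_bytes);
	if (want > max) {
		fprintf(stderr,
			"gartsize=%d MiB exceeds max sensible=%" PRIu64
			" MiB; using default %" PRIu32 " MiB\n",
			user_mb, max >> 20, default_mb);
		return (uint64_t)default_mb << 20;
	}
	return want;
}
```

For the Strix Halo numbers in the commit message (real_vram = 1 GiB), model_gart_table_bytes(262144ULL << 20) gives 512 MiB, model_max_gart_bytes(1024ULL << 20) gives 65536 MiB, and model_validate_gart_size(1024ULL << 20, 262144, 512) returns the 512 MiB default, while a request of exactly 65536 MiB is still accepted.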