[PATCH AUTOSEL 6.15 032/110] drm/amdgpu: Add basic validation for RAS header

AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Lijo Lazar <lijo.lazar@amd.com>,
	Hawking Zhang <Hawking.Zhang@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	christian.koenig@amd.com, airlied@gmail.com, simona@ffwll.ch,
	matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com,
	tao.zhou1@amd.com, ganglxie@amd.com, candice.li@amd.com,
	Stanley.Yang@amd.com, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org
Subject: [PATCH AUTOSEL 6.15 032/110] drm/amdgpu: Add basic validation for RAS header
Date: Sun,  1 Jun 2025 19:23:14 -0400	[thread overview]
Message-ID: <20250601232435.3507697-32-sashal@kernel.org> (raw)
In-Reply-To: <20250601232435.3507697-1-sashal@kernel.org>

From: Lijo Lazar <lijo.lazar@amd.com>

[ Upstream commit 5df0d6addb7e9b6f71f7162d1253762a5be9138e ]

If RAS header read from EEPROM is corrupted, it could result in trying
to allocate huge memory for reading the records. Add some validation to
header fields.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

Based on my analysis, here is my assessment: **YES** This commit should
be backported to stable kernel trees. Here's my extensive analysis: ##
Security Vulnerability Analysis **Critical Memory Allocation
Vulnerability:** The commit fixes a serious security vulnerability where
corrupted RAS (Reliability, Availability, and Serviceability) EEPROM
header data could cause huge memory allocations. In
`/home/sasha/linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:3011`, the
code performs: ```c bps = kcalloc(control->ras_num_recs, sizeof(*bps),
GFP_KERNEL); ``` **Before the fix:** If `control->ras_num_recs` contains
a corrupted large value (e.g., 0xFFFFFFFF), this would attempt to
allocate `0xFFFFFFFF 0001-Fix-Clippy-warnings.patch 0002-Enhance-
inference-prompt-to-utilize-CVEKERNELDIR-whe.patch 0003-Update-to-
latest-version-of-clap.patch Cargo.lock Cargo.toml LICENSE README.md
analyze_merge_commit.sh dpp_rcg_backport_analysis.md
drm_bridge_analysis.txt drm_imagination_register_update_analysis.md
drm_mediatek_mtk_dpi_refactoring_analysis.md io_uring_analysis.txt
ksmbd_analysis.txt merge_commit_analysis.txt model prompt src target
test_gpio_cleanup.txt test_patch.txt verisilicon_av1_4k_analysis.md 24
bytes` = ~96GB of memory, likely causing: 1. System memory exhaustion 2.
Denial of service 3. Potential system crash/instability ## Code Changes
Analysis **1. Version Validation Enhancement:** The fix replaces a
simple `if/else` with a robust `switch` statement: ```c // Before: Only
checked for version >= V2_1 if (hdr->version >= RAS_TABLE_VER_V2_1) { //
After: Explicit validation of known versions switch (hdr->version) {
case RAS_TABLE_VER_V2_1: case RAS_TABLE_VER_V3: // Future-proofing //
V2.1+ handling break; case RAS_TABLE_VER_V1: // V1 handling break;
default: dev_err(adev->dev, "RAS header invalid, unsupported version:
%u", hdr->version); return -EINVAL; } ``` **2. Record Count Bounds
Checking:** Critical addition of bounds validation: ```c if
(control->ras_num_recs > control->ras_max_record_count) {
dev_err(adev->dev, "RAS header invalid, records in header: %u max
allowed :%u", control->ras_num_recs, control->ras_max_record_count);
return -EINVAL; } ``` This prevents the memory allocation attack by
ensuring `ras_num_recs` cannot exceed reasonable bounds. ## Stable Tree
Criteria Assessment ✅ **Fixes important security bug:** Prevents DoS via
memory exhaustion ✅ **Small and contained:** Only adds validation logic,
no functional changes ✅ **Clear side effects:** None - only adds error
checking ✅ **No architectural changes:** Pure validation enhancement ✅
**Touches critical subsystem:** Graphics driver reliability/security ✅
**Minimal regression risk:** Only adds stricter validation ## Comparison
with Historical Patterns Looking at the provided similar commits (all
marked "NO"), they were architectural refactoring changes that: - Split
functions (`ras_eeprom_init into init and check functions`) - Added new
infrastructure (`Hook EEPROM table to RAS`) - Wrapped function calls
(`RAS xfer to read/write`) - Added entirely new features (`Add RAS
EEPROM table`) In contrast, this commit is a **security hardening fix**
that adds essential input validation to prevent memory exhaustion
attacks. ## Impact Assessment **Risk if NOT backported:** - Systems with
AMD GPUs vulnerable to DoS attacks via corrupted EEPROM data - Potential
for system instability when corrupted data triggers massive allocations
- Security exposure in enterprise/server environments using AMD GPUs
**Risk if backported:** - Minimal: Only adds validation, existing
correct data will continue to work - Possible rejection of previously
accepted (but actually corrupted) EEPROM data - this is desired behavior
This fix addresses a clear security vulnerability with minimal code
changes and should definitely be backported to stable trees to protect
users from memory exhaustion attacks via corrupted RAS EEPROM headers.

 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    | 22 ++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 0ea7cfaf3587d..e979a6086178c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1392,17 +1392,33 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
 
 	__decode_table_header_from_buf(hdr, buf);
 
-	if (hdr->version >= RAS_TABLE_VER_V2_1) {
+	switch (hdr->version) {
+	case RAS_TABLE_VER_V2_1:
+	case RAS_TABLE_VER_V3:
 		control->ras_num_recs = RAS_NUM_RECS_V2_1(hdr);
 		control->ras_record_offset = RAS_RECORD_START_V2_1;
 		control->ras_max_record_count = RAS_MAX_RECORD_COUNT_V2_1;
-	} else {
+		break;
+	case RAS_TABLE_VER_V1:
 		control->ras_num_recs = RAS_NUM_RECS(hdr);
 		control->ras_record_offset = RAS_RECORD_START;
 		control->ras_max_record_count = RAS_MAX_RECORD_COUNT;
+		break;
+	default:
+		dev_err(adev->dev,
+			"RAS header invalid, unsupported version: %u",
+			hdr->version);
+		return -EINVAL;
 	}
-	control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset);
 
+	if (control->ras_num_recs > control->ras_max_record_count) {
+		dev_err(adev->dev,
+			"RAS header invalid, records in header: %u max allowed :%u",
+			control->ras_num_recs, control->ras_max_record_count);
+		return -EINVAL;
+	}
+
+	control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset);
 	control->ras_num_mca_recs = 0;
 	control->ras_num_pa_recs = 0;
 	return 0;
-- 
2.39.5

next prev parent reply	other threads:[~2025-06-01 23:26 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-01 23:22 [PATCH AUTOSEL 6.15 001/110] drm/amd/display: disable DPP RCG before DPP CLK enable Sasha Levin
2025-06-01 23:22 ` [PATCH AUTOSEL 6.15 003/110] drm/amdgpu/gfx6: fix CSIB handling Sasha Levin
2025-06-01 23:22 ` [PATCH AUTOSEL 6.15 008/110] drm/amdgpu: Fix API status offset for MES queue reset Sasha Levin
2025-06-01 23:22 ` [PATCH AUTOSEL 6.15 009/110] drm/amd/display: DCN32 null data check Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 020/110] drm/amdkfd: Drop workaround for GC v9.4.3 revID 0 Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 021/110] drm/amdgpu/gfx11: fix CSIB handling Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 025/110] drm/amd/display: Avoid divide by zero by initializing dummy pitch to 1 Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 029/110] drm/amd/display: Add NULL pointer checks in dm_force_atomic_commit() Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 031/110] drm/amd/display: Skip to enable dsc if it has been off Sasha Levin
2025-06-01 23:23 ` Sasha Levin [this message]
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 035/110] drm/amd/display: Do Not Consider DSC if Valid Config Not Found Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 037/110] drm/amdgpu/gfx10: fix CSIB handling Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 041/110] drm/amd/display: fix zero value for APU watermark_c Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 043/110] drm/amdgpu/gfx7: fix CSIB handling Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 049/110] drm/amd/display: Update IPS sequential_ono requirement checks Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 050/110] drm/amd/display: Correct SSC enable detection for DCN351 Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 051/110] drm/amd/display: Fix Vertical Interrupt definitions for dcn32, dcn401 Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 055/110] drm/amdgpu: fix MES GFX mask Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 056/110] drm/amdgpu: Disallow partition query during reset Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 059/110] drm/amdgpu/gfx8: fix CSIB handling Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 060/110] drm/amd/display: disable EASF narrow filter sharpening Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 061/110] drm/amdgpu/gfx9: fix CSIB handling Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 062/110] drm/amd/display: Fix VUpdate offset calculations for dcn401 Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 064/110] drm/amd/pm: Reset SMU v13.0.x custom settings Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 065/110] drm/amd/display: Correct prefetch calculation Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 066/110] drm/amd/display: Restructure DMI quirks Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 069/110] drm/amdkfd: Set SDMA_RLCx_IB_CNTL/SWITCH_INSIDE_IB Sasha Levin
2025-06-01 23:23 ` [PATCH AUTOSEL 6.15 076/110] drm/amdgpu: Add indirect L1_TLB_CNTL reg programming for VFs Sasha Levin

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:0ea7cfaf3587 dfblob:e979a6086178 )
 OR (
bs:"[PATCH AUTOSEL 6.15 032/110] drm/amdgpu: Add basic validation for RAS header" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250601232435.3507697-32-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Hawking.Zhang@amd.com \
    --cc=Stanley.Yang@amd.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=candice.li@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ganglxie@amd.com \
    --cc=lijo.lazar@amd.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=matthias.bgg@gmail.com \
    --cc=patches@lists.linux.dev \
    --cc=simona@ffwll.ch \
    --cc=stable@vger.kernel.org \
    --cc=tao.zhou1@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox