From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 403383A381E; Mon, 20 Apr 2026 13:31:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776691878; cv=none; b=EABHiucHCsVa3OUm7i+o+SyJ3zR5Vgh3tjf2TIw/NuvSBEVt8rxRCLlu9ppE9POGwH7cyGDDvPpy48zRCTTfQMZevbz2+dzEwyhpypGFjPE/gLNSIAkin7df8/I1G/zCz7WeN/TrNl0UcuBhFi9yY2LJiKTm2evNcHk9ZRJYttI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776691878; c=relaxed/simple; bh=1VT6tlFrlUK8WOAd+dzOVnbdWLK8JPn8qKU+z7+uWnI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ry/TwpTzl4XybFeFvtWk8/JXvZzt8ZNcU8q3OX+5SP7XQG1DG6pmZ1kXBwiocxkEc2CnsJl/QvIo+W+rJjH0IrBcW6oeYucnepZKNSB84VrIH2jvNyvEvGPza4CqG5/5ziKymymditzPO1sSvdsEiUeblkuyl2PeS3znnDef5kw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FfCM4MxT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FfCM4MxT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B7659C2BCB4; Mon, 20 Apr 2026 13:31:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1776691878; bh=1VT6tlFrlUK8WOAd+dzOVnbdWLK8JPn8qKU+z7+uWnI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FfCM4MxTfjwnaoamVCEHCtuPscWiiVHEWtc6HKGpcpBTrwx3hVmv23YkxJaE8ajVM H6GcsLi6wHSDOP7rz/biNjxFDmvr3xOfKNSt6OowuRzmfO+3mVUMY4sYWfCD4XfgXj eknbLIjnY2SzNpC2MqfaIRUK2kYZpoVoFU3dWzGqojIS89aeU+sis/G2RSMyPmR7tW OnarQZzqbrr86ngaoInTSFDRVpU7GkGp3g2qLzTYBmVPVwHFWNrOLr2VC/6RCrBpo7 mobJ6yl/hXeqkU8HUAolks8rjnnglPJN3vMXvLI/dMbhyXmnMio+HC6AZnNFLkKgFO hu8dh2oSZOTDg== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Asad Kamal , Lijo Lazar , Alex Deucher , Sasha Levin , kenneth.feng@amd.com, christian.koenig@amd.com, airlied@gmail.com, simona@ffwll.ch, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH AUTOSEL 7.0] drm/amd/pm: Avoid overflow when sorting pp_feature list Date: Mon, 20 Apr 2026 09:20:41 -0400 Message-ID: <20260420132314.1023554-247-sashal@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org> References: <20260420132314.1023554-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 7.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Asad Kamal [ Upstream commit 8e8f6bda8a84f41c4002bca44ac89a5b3f8c7df2 ] pp_features sorting uses int8_t sort_feature[] to store driver feature enum indices. On newer ASICs the enum index can exceed 127, causing signed overflow and silently dropping entries from the output. Switch the array to int16_t so all enum indices are preserved. Signed-off-by: Asad Kamal Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- LLM Generated explanations, may be completely bogus: Now I have all the information needed for a complete analysis. Let me compile the full walkthrough. --- ## PHASE 1: COMMIT MESSAGE FORENSICS ### Step 1.1: Subject Line - **Subsystem:** `drm/amd/pm` (AMD GPU power management) - **Action verb:** "Avoid" (preventing a bug) - **Summary:** Avoid integer overflow in `sort_feature[]` array when enum indices exceed `int8_t` max (127) ### Step 1.2: Tags - `Signed-off-by: Asad Kamal ` - AMD employee, author - `Reviewed-by: Lijo Lazar ` - AMD GPU PM subsystem reviewer - `Signed-off-by: Alex Deucher ` - AMD DRM maintainer - No Fixes: tag, no Cc: stable, no Reported-by (expected for autosel candidates) ### Step 1.3: Commit Body Bug: `int8_t sort_feature[]` stores enum indices that can exceed 127 on newer ASICs. Signed overflow wraps values to negative, and the subsequent `< 0` check silently drops those entries from the sysfs output. Fix: widen to `int16_t`. ### Step 1.4: Hidden Bug Fix Detection This is explicitly described as an overflow fix. Not hidden. ## PHASE 2: DIFF ANALYSIS ### Step 2.1: Inventory - 1 file changed: `drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c` - 1 line added, 1 line removed - Function modified: `smu_cmn_get_pp_feature_mask()` ### Step 2.2: Code Flow **Before:** `int8_t sort_feature[MAX(SMU_FEATURE_COUNT, SMU_FEATURE_MAX)]` - can hold values -128 to 127. **After:** `int16_t sort_feature[MAX(SMU_FEATURE_COUNT, SMU_FEATURE_MAX)]` - can hold values -32768 to 32767. The array is initialized to `-1` via `memset(sort_feature, -1, sizeof(...))`, then populated with enum index `i` (0 to `SMU_FEATURE_COUNT-1`). Entries remaining `-1` are skipped via the `< 0` check. With `int8_t`, any `i >= 128` overflows to a negative value, falsely triggering the skip. ### Step 2.3: Bug Mechanism **Integer overflow / type bug.** `SMU_FEATURE_COUNT = 135` (verified by counting enum entries). Indices 128-134 (7 features: `APT_SQ_THROTTLE`, `APT_PF_DCS`, `GFX_EDC_XVMIN`, `GFX_DIDT_XVMIN`, `FAN_ABNORMAL`, `PIT`, `HROM_EN`) overflow `int8_t`, wrapping to negative values and being silently dropped. ### Step 2.4: Fix Quality - Obviously correct: widening the type eliminates the overflow - `memset(-1)` still works correctly: fills all bytes with `0xFF`, making each `int16_t` element `0xFFFF = -1` in two's complement (confirmed by the author in review discussion and correct by C standard) - No regression risk: the type widening is strictly safe; no logic changes - Minimal and surgical: 1-line change ## PHASE 3: GIT HISTORY ### Step 3.1: Blame The `int8_t` type was introduced in commit `6f73d6762694c` ("drm/amd/pm: optimize the interface for dpm feature status query", dated 2022-05-25, by Evan Quan). Originally (commit `7dbf78051f75f1`, 2020), the array was `uint32_t sort_feature[SMU_FEATURE_COUNT]` with no overflow possibility. The refactoring in 6f73d6762694c downsized the type to `int8_t` (using `-1` as sentinel). ### Step 3.2: Fixes: tag No Fixes: tag present. The logical "Fixes:" would be `6f73d6762694c` (introduced `int8_t`) + `25d48f2eb0af1` (pushed enum count past 127). ### Step 3.3: Related Changes Recent changes to `smu_cmn.c` include significant refactoring of feature mask handling (`7b88453a476c9` etc.), but none address this specific overflow. ### Step 3.4: Author Asad Kamal is an AMD employee who regularly contributes to `drm/amd/pm`. Multiple recent commits in the subsystem. ### Step 3.5: Dependencies No dependencies. The fix is self-contained. ## PHASE 4: MAILING LIST DISCUSSION ### Step 4.1: Original Submission Found via `b4 dig -c 8e8f6bda8a84f`: https://patch.msgid.link/20260302061242.3062232-1-asad.kamal@amd.com ### Step 4.2: Review Discussion - **Lijo Lazar** (AMD reviewer): Gave `Reviewed-by` immediately - **Kevin Wang** raised a concern about `memset(-1)` correctness with `int16_t` — asking whether it would correctly initialize all elements to `-1` - **Asad Kamal** correctly explained: "memset fills all bytes with 0xFF. For int16_t, that becomes 0xFFFF, which is -1 in two's complement." - **Kevin Wang** accepted the explanation: "Based on private discussions, please continue to submit the code." - No NAKs, no concerns about the fix itself, only a clarification question that was satisfactorily resolved. ### Step 4.3-4.5: No external bug reports. No stable-specific discussion found. ## PHASE 5: CODE SEMANTIC ANALYSIS ### Step 5.1-5.2: Call Chain `smu_cmn_get_pp_feature_mask()` is called via sysfs: user reads `pp_features` -> `amdgpu_pm.c:amdgpu_dpm_get_ppfeature_status()` -> `smu_sys_get_pp_feature_mask()` -> `smu_cmn_get_pp_feature_mask()`. Used by **17 different GPU backends** (verified: SMU v11, v12, v13, v14, v15 variants). ### Step 5.4: User Reachability Directly reachable from userspace via sysfs read. Any user or monitoring tool reading GPU feature status triggers this code. ## PHASE 6: STABLE TREE ANALYSIS ### Step 6.1: Buggy Code in Stable Trees - `int8_t` introduced in `6f73d6762694c` (v5.19/v6.0 era) - Overflow-triggering features added in `25d48f2eb0af1` (v6.12) - **The overflow is triggerable in v6.12+ stable trees** where both the `int8_t` type and the >127 enum count coexist - For older stable trees (6.6.y, 6.1.y), SMU_FEATURE_COUNT is still < 128, so no overflow yet — but future backported features could trigger it ### Step 6.2: Backport Difficulty Clean apply expected — the change is a single-line type change with no context dependencies. ## PHASE 7: SUBSYSTEM CONTEXT ### Step 7.1: Subsystem - **drm/amd/pm** — AMD GPU power management, IMPORTANT criticality - Affects all users with AMD GPUs using swSMU (modern AMD GPUs: RDNA2+, CDNA) ### Step 7.2: Activity Very actively developed — many recent commits. ## PHASE 8: IMPACT AND RISK ASSESSMENT ### Step 8.1: Affected Users All users with AMD GPUs running SMU v14.0.2/3 or newer (Radeon RX 8000 series and similar), or any ASIC whose feature mapping exceeds index 127. ### Step 8.2: Trigger Conditions - **Trigger:** Any read of `pp_features` sysfs node - **Frequency:** Common — monitoring tools, manual inspection, power management tools read this - **Unprivileged:** Yes, sysfs readable by any user ### Step 8.3: Severity - **MEDIUM:** Incorrect/incomplete sysfs output. Not a crash or security issue, but features are silently dropped, making power management monitoring unreliable. ### Step 8.4: Risk-Benefit - **Benefit:** Fixes incorrect sysfs output for AMD GPU users; prevents silent data loss in feature reporting - **Risk:** Extremely low — 1-line type change, no logic modification, correctness of `memset(-1)` with `int16_t` verified in review and mathematically sound - **Stack increase:** `sort_feature` grows from 128 bytes (`int8_t[128]`) to 256 bytes (`int16_t[128]`); negligible for a stack- allocated array ## PHASE 9: FINAL SYNTHESIS ### Step 9.1: Evidence Summary **FOR backporting:** - Fixes a real, verifiable integer overflow bug - SMU_FEATURE_COUNT = 135 > 127, confirmed to overflow `int8_t` - 7 power management features silently dropped from sysfs output - 1-line fix, obviously correct, minimal risk - Reviewed by AMD engineer, no objections after clarification - Used by 17 GPU backends across all modern AMD GPUs - Signed off by Alex Deucher (AMD DRM maintainer) **AGAINST backporting:** - Not a crash or security issue (incorrect output only) - Only affects v6.12+ trees where enum count exceeds 127 - No Fixes: tag or Cc: stable ### Step 9.2: Stable Rules Checklist 1. **Obviously correct and tested?** YES — trivial type widening, reviewed 2. **Fixes a real bug?** YES — integer overflow causing features to be silently dropped 3. **Important issue?** MEDIUM — not crash/security, but correctness bug in user-visible output 4. **Small and contained?** YES — 1 line, 1 file 5. **No new features or APIs?** Correct — no new features 6. **Can apply to stable?** YES — clean apply expected ### Step 9.3: Exception Categories None applicable — this is a standard bug fix. ### Step 9.4: Decision This is a clean, minimal, well-reviewed bug fix for a verifiable integer overflow that causes incorrect user-visible behavior on modern AMD GPUs. It meets all stable criteria. ## Verification - [Phase 1] Parsed tags: Reviewed-by Lijo Lazar, SOB from Alex Deucher (AMD maintainer) - [Phase 2] Diff analysis: 1 line changed, `int8_t` -> `int16_t` in `smu_cmn_get_pp_feature_mask()` - [Phase 2] Verified SMU_FEATURE_COUNT = 135 by counting enum entries in `smu_types.h` — 7 features exceed index 127 - [Phase 3] git blame: `int8_t` introduced in `6f73d6762694c` (v5.19/v6.0 era, Evan Quan, 2022) - [Phase 3] Overflow-triggering features added in `25d48f2eb0af1` (v6.12, 2024-09-10) - [Phase 3] Original type was `uint32_t` in `7dbf78051f75f1` (2020) — no overflow possible - [Phase 4] b4 dig -c found submission: https://patch.msgid.link/20260302061242.3062232-1-asad.kamal@amd.com - [Phase 4] b4 dig -w: AMD team members CC'd (lijo.lazar, hawking.zhang, le.ma, alexander.deucher, kevinyang.wang) - [Phase 4] Review discussion decoded from base64: Kevin Wang raised memset concern, Asad explained correctly, Kevin approved - [Phase 5] Traced call chain: sysfs read -> `amdgpu_dpm_get_ppfeature_status()` -> `smu_sys_get_pp_feature_mask()` -> target function - [Phase 5] Verified 17 GPU backends use this function (SMU v11, v12, v13, v14, v15) - [Phase 6] Bug triggerable in v6.12+ (both int8_t type and >127 enum present) - [Phase 8] Severity: MEDIUM (incorrect sysfs output, not crash/security) - [Phase 8] Risk: Very low (1-line type change, no logic change, stack grows by 128 bytes) **YES** drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index 6fd50c2fd20e0..97ed66cb47472 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -880,7 +880,7 @@ static const char *smu_get_feature_name(struct smu_context *smu, size_t smu_cmn_get_pp_feature_mask(struct smu_context *smu, char *buf) { - int8_t sort_feature[MAX(SMU_FEATURE_COUNT, SMU_FEATURE_MAX)]; + int16_t sort_feature[MAX(SMU_FEATURE_COUNT, SMU_FEATURE_MAX)]; struct smu_feature_bits feature_mask; uint32_t features[2]; int i, feature_index; -- 2.53.0