From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: "Jesse.Zhang" <Jesse.Zhang@amd.com>,
Jesse Zhang <jesse.zhang@amd.com>,
"Shaoyun . liu" <Shaoyun.liu@amd.com>,
Prike Liang <Prike.Liang@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Sasha Levin <sashal@kernel.org>,
christian.koenig@amd.com, airlied@gmail.com, simona@ffwll.ch,
sunil.khatri@amd.com, Jack.Xiao@amd.com,
srinivasan.shanmugam@amd.com, shaoyun.liu@amd.com,
Jiadong.Zhu@amd.com, Hawking.Zhang@amd.com, michael.chen@amd.com,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.14 008/102] drm/amdgpu: Fix API status offset for MES queue reset
Date: Sun, 1 Jun 2025 19:28:00 -0400
Message-ID: <20250601232937.3510379-8-sashal@kernel.org>
In-Reply-To: <20250601232937.3510379-1-sashal@kernel.org>
From: "Jesse.Zhang" <Jesse.Zhang@amd.com>
[ Upstream commit ad7c088e31f026d71fe87fd09473fafb7d6ed006 ]
The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were
using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset
for api_status. Since these functions handle queue reset operations, they
should use MESAPI__RESET union instead.
This fixes the polling of API status during hardware queue reset operations
in the MES for both v11 and v12 versions.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
**YES** This commit should be backported to stable kernel trees based on
the following analysis:

## Bug Analysis

The commit fixes a clear functional bug where `mes_v11_0_reset_hw_queue()`
and `mes_v12_0_reset_hw_queue()` functions were using the wrong union type
(`MESAPI__REMOVE_QUEUE`) to get the API status offset for polling
completion. Since these functions handle **reset** operations, they
should use the `MESAPI__RESET` union instead.

## Code Changes Assessment

1. **Small and contained**: The fix changes only 2 lines across 2 files:
   - `mes_v11_0.c:480`: `offsetof(union MESAPI__REMOVE_QUEUE, api_status)`
     → `offsetof(union MESAPI__RESET, api_status)`
   - `mes_v12_0.c:500`: Same change
2. **Logical correctness**: The functions create `union MESAPI__RESET
   mes_reset_queue_pkt` packets but were polling using the wrong union
   offset. This is a clear mismatch that could cause incorrect polling
   behavior.
3. **No architectural changes**: This is purely a bug fix that aligns the
   API status polling with the correct packet type.

## Impact Assessment

- **Affects GPU queue management**: MES (Micro Engine Scheduler) queue
  reset is a critical operation for AMD GPUs
- **Potential for silent failures**: Wrong offset polling could lead to
  improper completion detection
- **User-visible impact**: Could cause GPU hangs or incorrect queue state
  management
- **Low regression risk**: The fix aligns the code with its intended
  design

## Comparison with Similar Commits

All 5 provided similar commits were marked "NO" for backporting because
they:

- Added new APIs/features (commits #1, #2)
- Made functional changes to queue reset behavior (commits #3, #4, #5)

In contrast, this commit:

- Fixes an existing bug without adding features
- Maintains existing behavior while correcting implementation
- Follows the stable tree rule of "important bugfixes with minimal risk"

## Stable Tree Criteria Met

✅ **Important bugfix**: Fixes incorrect API status polling in GPU queue
reset
✅ **Minimal risk**: Two-line change with clear logic
✅ **No new features**: Pure correctness fix
✅ **Contained change**: Limited to MES driver subsystem
✅ **Clear side effects**: Improves reliability of queue reset operations

This commit represents exactly the type of focused bug fix that stable
trees are designed to include.
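
## Illustration (hypothetical layouts)

The mismatch described above can be sketched in plain C. The unions,
field names, and helper below (`demo_remove_pkt`, `demo_reset_pkt`,
`demo_api_status`, `demo_fence_value_at`) are invented for illustration
and are not the real MES definitions; the sketch only assumes that two
packet unions may place `api_status` at different byte offsets. Whether
the real `MESAPI__REMOVE_QUEUE` and `MESAPI__RESET` offsets actually
differ is not stated in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status block; NOT the real MES api_status layout. */
struct demo_api_status {
	uint64_t fence_addr;
	uint64_t fence_value;
};

/* Hypothetical "remove" packet: two 32-bit fields precede api_status. */
union demo_remove_pkt {
	struct {
		uint32_t header;
		uint32_t doorbell_offset;
		struct demo_api_status api_status;
	};
	uint32_t dwords[16];
};

/* Hypothetical "reset" packet: extra fields push api_status further out. */
union demo_reset_pkt {
	struct {
		uint32_t header;
		uint32_t doorbell_offset;
		uint64_t gang_context_addr;
		uint64_t wptr_addr;
		struct demo_api_status api_status;
	};
	uint32_t dwords[16];
};

/* In this sketch the two offsets differ, so they are not interchangeable. */
static_assert(offsetof(union demo_remove_pkt, api_status) !=
	      offsetof(union demo_reset_pkt, api_status),
	      "demo layouts are expected to place api_status differently");

/*
 * Stand-in for a generic "submit and poll" helper: it only knows the
 * packet as raw bytes and trusts the caller-supplied status offset.
 */
static uint64_t demo_fence_value_at(const void *pkt, size_t status_offset)
{
	struct demo_api_status st;

	memcpy(&st, (const uint8_t *)pkt + status_offset, sizeof(st));
	return st.fence_value;
}
```

In this sketch, passing `offsetof(union demo_remove_pkt, api_status)`
together with a `union demo_reset_pkt` makes the helper read unrelated
packet fields as if they were the status block, which is the kind of
mismatch the patch removes by taking the offset from the same union
type as the packet being submitted.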
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 68bb334393bb6..ec7ef8763f932 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -477,7 +477,7 @@ static int mes_v11_0_reset_hw_queue(struct amdgpu_mes *mes,
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
- offsetof(union MESAPI__REMOVE_QUEUE, api_status));
+ offsetof(union MESAPI__RESET, api_status));
}
static int mes_v11_0_map_legacy_queue(struct amdgpu_mes *mes,
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 6b121c2723d66..53d059a2a42e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -500,7 +500,7 @@ static int mes_v12_0_reset_hw_queue(struct amdgpu_mes *mes,
return mes_v12_0_submit_pkt_and_poll_completion(mes, pipe,
&mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
- offsetof(union MESAPI__REMOVE_QUEUE, api_status));
+ offsetof(union MESAPI__RESET, api_status));
}
static int mes_v12_0_map_legacy_queue(struct amdgpu_mes *mes,
--
2.39.5