AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: "Jesse.Zhang" <Jesse.Zhang@amd.com>,
	Jesse Zhang <jesse.zhang@amd.com>,
	"Shaoyun . liu" <Shaoyun.liu@amd.com>,
	Prike Liang <Prike.Liang@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	christian.koenig@amd.com, airlied@gmail.com, simona@ffwll.ch,
	sunil.khatri@amd.com, Jack.Xiao@amd.com,
	srinivasan.shanmugam@amd.com, shaoyun.liu@amd.com,
	Jiadong.Zhu@amd.com, Hawking.Zhang@amd.com, michael.chen@amd.com,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.14 008/102] drm/amdgpu: Fix API status offset for MES queue reset
Date: Sun,  1 Jun 2025 19:28:00 -0400	[thread overview]
Message-ID: <20250601232937.3510379-8-sashal@kernel.org> (raw)
In-Reply-To: <20250601232937.3510379-1-sashal@kernel.org>

From: "Jesse.Zhang" <Jesse.Zhang@amd.com>

[ Upstream commit ad7c088e31f026d71fe87fd09473fafb7d6ed006 ]

The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were
using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset
for api_status. Since these functions handle queue reset operations, they
should use MESAPI__RESET union instead.

This fixes the polling of API status during hardware queue reset operations
in the MES for both v11 and v12 versions.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

**YES** This commit should be backported to stable kernel trees based on
the following analysis: ## Bug Analysis The commit fixes a clear
functional bug where `mes_v11_0_reset_hw_queue()` and
`mes_v12_0_reset_hw_queue()` functions were using the wrong union type
(`MESAPI__REMOVE_QUEUE`) to get the API status offset for polling
completion. Since these functions handle **reset** operations, they
should use the `MESAPI__RESET` union instead. ## Code Changes Assessment
1. **Small and contained**: The fix changes only 2 lines across 2 files:
- `mes_v11_0.c:480`: `offsetof(union MESAPI__REMOVE_QUEUE, api_status)`
→ `offsetof(union MESAPI__RESET, api_status)` - `mes_v12_0.c:500`: Same
change 2. **Logical correctness**: The functions create `union
MESAPI__RESET mes_reset_queue_pkt` packets but were polling using the
wrong union offset. This is a clear mismatch that could cause incorrect
polling behavior. 3. **No architectural changes**: This is purely a bug
fix that aligns the API status polling with the correct packet type. ##
Impact Assessment - **Affects GPU queue management**: MES (Micro Engine
Scheduler) queue reset is a critical operation for AMD GPUs -
**Potential for silent failures**: Wrong offset polling could lead to
improper completion detection - **User-visible impact**: Could cause GPU
hangs or incorrect queue state management - **Low regression risk**: The
fix aligns the code with its intended design ## Comparison with Similar
Commits All 5 provided similar commits were marked "NO" for backporting
because they: - Added new APIs/features (commits #1, #2) - Made
functional changes to queue reset behavior (commits #3, #4, #5) In
contrast, this commit: - Fixes an existing bug without adding features -
Maintains existing behavior while correcting implementation - Follows
the stable tree rule of "important bugfixes with minimal risk" ## Stable
Tree Criteria Met ✅ **Important bugfix**: Fixes incorrect API status
polling in GPU queue reset ✅ **Minimal risk**: Two-line change with
clear logic ✅ **No new features**: Pure correctness fix ✅ **Contained
change**: Limited to MES driver subsystem ✅ **Clear side effects**:
Improves reliability of queue reset operations This commit represents
exactly the type of focused bug fix that stable trees are designed to
include.

 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 68bb334393bb6..ec7ef8763f932 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -477,7 +477,7 @@ static int mes_v11_0_reset_hw_queue(struct amdgpu_mes *mes,
 
 	return mes_v11_0_submit_pkt_and_poll_completion(mes,
 			&mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
-			offsetof(union MESAPI__REMOVE_QUEUE, api_status));
+			offsetof(union MESAPI__RESET, api_status));
 }
 
 static int mes_v11_0_map_legacy_queue(struct amdgpu_mes *mes,
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 6b121c2723d66..53d059a2a42e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -500,7 +500,7 @@ static int mes_v12_0_reset_hw_queue(struct amdgpu_mes *mes,
 
 	return mes_v12_0_submit_pkt_and_poll_completion(mes, pipe,
 			&mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
-			offsetof(union MESAPI__REMOVE_QUEUE, api_status));
+			offsetof(union MESAPI__RESET, api_status));
 }
 
 static int mes_v12_0_map_legacy_queue(struct amdgpu_mes *mes,
-- 
2.39.5


  parent reply	other threads:[~2025-06-01 23:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-01 23:27 [PATCH AUTOSEL 6.14 001/102] drm/amd/display: disable DPP RCG before DPP CLK enable Sasha Levin
2025-06-01 23:27 ` [PATCH AUTOSEL 6.14 003/102] drm/amdgpu/gfx6: fix CSIB handling Sasha Levin
2025-06-01 23:28 ` Sasha Levin [this message]
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 009/102] drm/amd/display: DCN32 null data check Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 020/102] drm/amdkfd: Drop workaround for GC v9.4.3 revID 0 Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 021/102] drm/amdgpu/gfx11: fix CSIB handling Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 025/102] drm/amd/display: Avoid divide by zero by initializing dummy pitch to 1 Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 029/102] drm/amd/display: Add NULL pointer checks in dm_force_atomic_commit() Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 031/102] drm/amd/display: Skip to enable dsc if it has been off Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 034/102] drm/amd/display: Do Not Consider DSC if Valid Config Not Found Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 036/102] drm/amdgpu/gfx10: fix CSIB handling Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 040/102] drm/amd/display: fix zero value for APU watermark_c Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 042/102] drm/amdgpu/gfx7: fix CSIB handling Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 047/102] drm/amd/display: Correct SSC enable detection for DCN351 Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 051/102] drm/amdgpu: fix MES GFX mask Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 052/102] drm/amdgpu: Disallow partition query during reset Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 055/102] drm/amdgpu/gfx8: fix CSIB handling Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 056/102] drm/amd/display: disable EASF narrow filter sharpening Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 057/102] drm/amdgpu/gfx9: fix CSIB handling Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 058/102] drm/amd/display: Fix VUpdate offset calculations for dcn401 Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 060/102] drm/amd/display: Correct prefetch calculation Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 061/102] drm/amd/display: Restructure DMI quirks Sasha Levin
2025-06-01 23:28 ` [PATCH AUTOSEL 6.14 064/102] drm/amdkfd: Set SDMA_RLCx_IB_CNTL/SWITCH_INSIDE_IB Sasha Levin
2025-06-01 23:29 ` [PATCH AUTOSEL 6.14 070/102] drm/amdgpu: Add indirect L1_TLB_CNTL reg programming for VFs Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250601232937.3510379-8-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Hawking.Zhang@amd.com \
    --cc=Jack.Xiao@amd.com \
    --cc=Jesse.Zhang@amd.com \
    --cc=Jiadong.Zhu@amd.com \
    --cc=Prike.Liang@amd.com \
    --cc=Shaoyun.liu@amd.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.chen@amd.com \
    --cc=patches@lists.linux.dev \
    --cc=simona@ffwll.ch \
    --cc=srinivasan.shanmugam@amd.com \
    --cc=stable@vger.kernel.org \
    --cc=sunil.khatri@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox