* [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx debugbus read
[not found] <20250902120833.1342615-1-sashal@kernel.org>
@ 2025-09-02 12:08 ` Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx TPL1 cluster snapshot Sasha Levin
` (2 subsequent siblings)
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-09-02 12:08 UTC (permalink / raw)
To: patches, stable
Cc: Rob Clark, Sasha Levin, linux-arm-msm, dri-devel, freedreno
From: Rob Clark <robin.clark@oss.qualcomm.com>
[ Upstream commit 13ed0a1af263b56a5ebbf38ab7163cbc9dcb009e ]
The bitfield positions changed in a7xx.
v2: Don't open-code the bitfield building
v3: Also fix cx_debugbus
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/666659/
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the
following reasons:
## Bug Fix Analysis
The commit fixes a functional bug in the MSM DRM driver's debugbus
reading functionality for Adreno 7xx GPUs. Specifically:
1. **Incorrect Bitfield Positions**: The code was using A6XX bitfield
definitions for A7XX hardware, but the hardware changed the bitfield
positions. Looking at the changes in `a6xx.xml`:
- A6XX uses bits 15:8 for `PING_BLK_SEL`
- A7XX uses bits 24:16 for `PING_BLK_SEL`
This mismatch would cause incorrect values to be written to the debug
bus selection registers, resulting in wrong or garbage data being read.
2. **Affects Core Debugging Functionality**: The debugbus is critical
for GPU state capture during crashes and debugging. Without this fix,
debugging A7XX GPUs would be severely impaired as the debugbus reads
would return incorrect data.
## Stable Criteria Evaluation
1. **Fixes a real bug**: Yes - incorrect register programming leading to
wrong debug data
2. **Small and contained**: Yes - only 46 lines changed across 2 files
3. **No major side effects**: The changes are conditional based on GPU
family and only affect debug bus reading
4. **No architectural changes**: Simple conditional logic to use correct
bitfield definitions
5. **Critical subsystem impact**: Moderate - affects GPU debugging
capabilities which are important for diagnosing GPU hangs/crashes
6. **Already upstream**: Yes - commit
13ed0a1af263b56a5ebbf38ab7163cbc9dcb009e is already merged upstream
## Code Safety Analysis
The fix is safe because:
- It adds proper family checks: `if (to_adreno_gpu(gpu)->info->family >=
ADRENO_7XX_GEN1)`
- Maintains backward compatibility for A6XX hardware
- Only changes register programming for debug bus reads, not normal GPU
operation
- The fix pattern is applied consistently to both `debugbus_read()` and
`cx_debugbus_read()` functions
## Impact Assessment
Without this fix, users with A7XX GPUs would experience:
- Incorrect GPU state dumps when GPU crashes occur
- Inability to properly debug GPU issues
- Misleading diagnostic information that could hamper bug resolution
This is a clear bugfix that restores proper functionality for A7XX
hardware without introducing new features or risky changes, making it an
ideal candidate for stable backporting.
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 32 ++++++++++++++-----
drivers/gpu/drm/msm/registers/adreno/a6xx.xml | 14 +++++++-
2 files changed, 37 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index f46bc906ca2a3..61850e2802914 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -174,8 +174,15 @@ static int a6xx_crashdumper_run(struct msm_gpu *gpu,
static int debugbus_read(struct msm_gpu *gpu, u32 block, u32 offset,
u32 *data)
{
- u32 reg = A6XX_DBGC_CFG_DBGBUS_SEL_D_PING_INDEX(offset) |
- A6XX_DBGC_CFG_DBGBUS_SEL_D_PING_BLK_SEL(block);
+ u32 reg;
+
+ if (to_adreno_gpu(gpu)->info->family >= ADRENO_7XX_GEN1) {
+ reg = A7XX_DBGC_CFG_DBGBUS_SEL_D_PING_INDEX(offset) |
+ A7XX_DBGC_CFG_DBGBUS_SEL_D_PING_BLK_SEL(block);
+ } else {
+ reg = A6XX_DBGC_CFG_DBGBUS_SEL_D_PING_INDEX(offset) |
+ A6XX_DBGC_CFG_DBGBUS_SEL_D_PING_BLK_SEL(block);
+ }
gpu_write(gpu, REG_A6XX_DBGC_CFG_DBGBUS_SEL_A, reg);
gpu_write(gpu, REG_A6XX_DBGC_CFG_DBGBUS_SEL_B, reg);
@@ -198,11 +205,18 @@ static int debugbus_read(struct msm_gpu *gpu, u32 block, u32 offset,
readl((ptr) + ((offset) << 2))
/* read a value from the CX debug bus */
-static int cx_debugbus_read(void __iomem *cxdbg, u32 block, u32 offset,
+static int cx_debugbus_read(struct msm_gpu *gpu, void __iomem *cxdbg, u32 block, u32 offset,
u32 *data)
{
- u32 reg = A6XX_CX_DBGC_CFG_DBGBUS_SEL_A_PING_INDEX(offset) |
- A6XX_CX_DBGC_CFG_DBGBUS_SEL_A_PING_BLK_SEL(block);
+ u32 reg;
+
+ if (to_adreno_gpu(gpu)->info->family >= ADRENO_7XX_GEN1) {
+ reg = A7XX_CX_DBGC_CFG_DBGBUS_SEL_A_PING_INDEX(offset) |
+ A7XX_CX_DBGC_CFG_DBGBUS_SEL_A_PING_BLK_SEL(block);
+ } else {
+ reg = A6XX_CX_DBGC_CFG_DBGBUS_SEL_A_PING_INDEX(offset) |
+ A6XX_CX_DBGC_CFG_DBGBUS_SEL_A_PING_BLK_SEL(block);
+ }
cxdbg_write(cxdbg, REG_A6XX_CX_DBGC_CFG_DBGBUS_SEL_A, reg);
cxdbg_write(cxdbg, REG_A6XX_CX_DBGC_CFG_DBGBUS_SEL_B, reg);
@@ -315,7 +329,8 @@ static void a6xx_get_debugbus_block(struct msm_gpu *gpu,
ptr += debugbus_read(gpu, block->id, i, ptr);
}
-static void a6xx_get_cx_debugbus_block(void __iomem *cxdbg,
+static void a6xx_get_cx_debugbus_block(struct msm_gpu *gpu,
+ void __iomem *cxdbg,
struct a6xx_gpu_state *a6xx_state,
const struct a6xx_debugbus_block *block,
struct a6xx_gpu_state_obj *obj)
@@ -330,7 +345,7 @@ static void a6xx_get_cx_debugbus_block(void __iomem *cxdbg,
obj->handle = block;
for (ptr = obj->data, i = 0; i < block->count; i++)
- ptr += cx_debugbus_read(cxdbg, block->id, i, ptr);
+ ptr += cx_debugbus_read(gpu, cxdbg, block->id, i, ptr);
}
static void a6xx_get_debugbus_blocks(struct msm_gpu *gpu,
@@ -527,7 +542,8 @@ static void a6xx_get_debugbus(struct msm_gpu *gpu,
int i;
for (i = 0; i < nr_cx_debugbus_blocks; i++)
- a6xx_get_cx_debugbus_block(cxdbg,
+ a6xx_get_cx_debugbus_block(gpu,
+ cxdbg,
a6xx_state,
&cx_debugbus_blocks[i],
&a6xx_state->cx_debugbus[i]);
diff --git a/drivers/gpu/drm/msm/registers/adreno/a6xx.xml b/drivers/gpu/drm/msm/registers/adreno/a6xx.xml
index d860fd94feae8..86fab2750ba7b 100644
--- a/drivers/gpu/drm/msm/registers/adreno/a6xx.xml
+++ b/drivers/gpu/drm/msm/registers/adreno/a6xx.xml
@@ -594,10 +594,14 @@ by a particular renderpass/blit.
<reg32 offset="0x0600" name="DBGC_CFG_DBGBUS_SEL_A"/>
<reg32 offset="0x0601" name="DBGC_CFG_DBGBUS_SEL_B"/>
<reg32 offset="0x0602" name="DBGC_CFG_DBGBUS_SEL_C"/>
- <reg32 offset="0x0603" name="DBGC_CFG_DBGBUS_SEL_D">
+ <reg32 offset="0x0603" name="DBGC_CFG_DBGBUS_SEL_D" variants="A6XX">
<bitfield high="7" low="0" name="PING_INDEX"/>
<bitfield high="15" low="8" name="PING_BLK_SEL"/>
</reg32>
+ <reg32 offset="0x0603" name="DBGC_CFG_DBGBUS_SEL_D" variants="A7XX-">
+ <bitfield high="7" low="0" name="PING_INDEX"/>
+ <bitfield high="24" low="16" name="PING_BLK_SEL"/>
+ </reg32>
<reg32 offset="0x0604" name="DBGC_CFG_DBGBUS_CNTLT">
<bitfield high="5" low="0" name="TRACEEN"/>
<bitfield high="14" low="12" name="GRANU"/>
@@ -3796,6 +3800,14 @@ by a particular renderpass/blit.
<reg32 offset="0x0030" name="CFG_DBGBUS_TRACE_BUF2"/>
</domain>
+<domain name="A7XX_CX_DBGC" width="32">
+ <!-- Bitfields shifted, but otherwise the same: -->
+ <reg32 offset="0x0000" name="CFG_DBGBUS_SEL_A" variants="A7XX-">
+ <bitfield high="7" low="0" name="PING_INDEX"/>
+ <bitfield high="24" low="16" name="PING_BLK_SEL"/>
+ </reg32>
+</domain>
+
<domain name="A6XX_CX_MISC" width="32" prefix="variant" varset="chip">
<reg32 offset="0x0001" name="SYSTEM_CACHE_CNTL_0"/>
<reg32 offset="0x0002" name="SYSTEM_CACHE_CNTL_1"/>
--
2.50.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx TPL1 cluster snapshot
[not found] <20250902120833.1342615-1-sashal@kernel.org>
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx debugbus read Sasha Levin
@ 2025-09-02 12:08 ` Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix debugbus snapshot Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix order of selector programming in cluster snapshot Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-09-02 12:08 UTC (permalink / raw)
To: patches, stable
Cc: Rob Clark, Sasha Levin, linux-arm-msm, dri-devel, freedreno
From: Rob Clark <robin.clark@oss.qualcomm.com>
[ Upstream commit e9621ef610c4a600678da5d8020d4a0dfe686faa ]
Later gens have both a PIPE_BR and PIPE_NONE section. The snapshot tool
seems to expect this for x1-85 as well. I guess this was just a bug in
downstream kgsl, which went unnoticed?
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/666662/
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's the
extensive analysis:
## Technical Analysis
### Nature of the Fix
The commit fixes a bug in GPU crash dump snapshot generation for Adreno
a7xx GPUs (specifically the x1-85 variant). The issue is in the TPL1
(Texture Processing Level 1) cluster snapshot register definitions where
the PIPE_NONE section was missing.
### Code Changes Analysis
1. **In `adreno_gen7_0_0_snapshot.h`**:
- The original code had only
`gen7_0_0_tpl1_noncontext_pipe_br_registers` which included all
registers (0x0b600-0x0b633)
- The fix splits this into two separate arrays:
- `gen7_0_0_tpl1_noncontext_pipe_none_registers`: Contains the bulk
of registers (0x0b600-0x0b633)
- `gen7_0_0_tpl1_noncontext_pipe_br_registers`: Now only contains
register 0x0b600
2. **In `adreno_gen7_2_0_snapshot.h`**:
- Adds a new entry to the `gen7_2_0_sptp_clusters` array:
```c
{ A7XX_CLUSTER_NONE, A7XX_TP0_NCTX_REG, A7XX_PIPE_NONE, 0,
A7XX_USPTP,
gen7_0_0_tpl1_noncontext_pipe_none_registers, 0xb600 },
```
- This ensures both PIPE_BR and PIPE_NONE sections are captured in
the snapshot
### Why This is a Bug Fix
1. **Incorrect Snapshot Data**: Without this fix, GPU crash dumps would
be missing critical register data from the PIPE_NONE section
2. **Tool Compatibility**: The commit message states "The snapshot tool
seems to expect this for x1-85 as well", indicating external tools
that parse these snapshots expect this format
3. **Consistency with Later Generations**: The commit aligns x1-85
(gen7_0_0) behavior with later generations that already have both
sections
### Backport Suitability Criteria
✅ **Fixes a real bug**: Missing register data in crash dumps affects
debugging capabilities
✅ **Small and contained**: Only modifies snapshot header definitions -
no runtime code changes
✅ **Low risk**: Changes are purely additive to snapshot data structures
✅ **No architectural changes**: Simply corrects register definitions
✅ **No new features**: Only fixes existing snapshot functionality
✅ **Important for stability**: Proper crash dumps are crucial for
debugging GPU issues in production
### Context from Repository
Looking at recent fixes in the same area:
- Multiple snapshot-related fixes have been made recently
(f28c9fc2c82de, e1d5ccf0a62b8, 06dd5d86c6aef)
- The gen7_9_0 variant already has PIPE_NONE definitions, confirming
this is the expected pattern
- This appears to be fixing a downstream KGSL (Kernel Graphics Support
Layer) bug that was carried over
### Risk Assessment
**Very Low Risk** because:
- Only affects devcoredump generation (post-crash debugging data)
- No impact on normal GPU operation
- Changes are data-only (register arrays)
- Aligns with existing patterns in other GPU generations
This is an ideal stable backport candidate as it fixes a specific bug
with minimal code changes and virtually no risk of regression.
drivers/gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h | 11 +++++++++--
drivers/gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h | 2 ++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h b/drivers/gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h
index cb66ece6606b5..4f305de5d7304 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h
@@ -668,12 +668,19 @@ static const u32 gen7_0_0_sp_noncontext_pipe_lpac_usptp_registers[] = {
};
static_assert(IS_ALIGNED(sizeof(gen7_0_0_sp_noncontext_pipe_lpac_usptp_registers), 8));
-/* Block: TPl1 Cluster: noncontext Pipeline: A7XX_PIPE_BR */
-static const u32 gen7_0_0_tpl1_noncontext_pipe_br_registers[] = {
+/* Block: TPl1 Cluster: noncontext Pipeline: A7XX_PIPE_NONE */
+static const u32 gen7_0_0_tpl1_noncontext_pipe_none_registers[] = {
0x0b600, 0x0b600, 0x0b602, 0x0b602, 0x0b604, 0x0b604, 0x0b608, 0x0b60c,
0x0b60f, 0x0b621, 0x0b630, 0x0b633,
UINT_MAX, UINT_MAX,
};
+static_assert(IS_ALIGNED(sizeof(gen7_0_0_tpl1_noncontext_pipe_none_registers), 8));
+
+/* Block: TPl1 Cluster: noncontext Pipeline: A7XX_PIPE_BR */
+static const u32 gen7_0_0_tpl1_noncontext_pipe_br_registers[] = {
+ 0x0b600, 0x0b600,
+ UINT_MAX, UINT_MAX,
+};
static_assert(IS_ALIGNED(sizeof(gen7_0_0_tpl1_noncontext_pipe_br_registers), 8));
/* Block: TPl1 Cluster: noncontext Pipeline: A7XX_PIPE_LPAC */
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h b/drivers/gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h
index 6f8ad50f32ce1..8d44b9377207c 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h
@@ -573,6 +573,8 @@ static struct gen7_sptp_cluster_registers gen7_2_0_sptp_clusters[] = {
gen7_0_0_sp_noncontext_pipe_lpac_usptp_registers, 0xaf80 },
{ A7XX_CLUSTER_NONE, A7XX_TP0_NCTX_REG, A7XX_PIPE_BR, 0, A7XX_USPTP,
gen7_0_0_tpl1_noncontext_pipe_br_registers, 0xb600 },
+ { A7XX_CLUSTER_NONE, A7XX_TP0_NCTX_REG, A7XX_PIPE_NONE, 0, A7XX_USPTP,
+ gen7_0_0_tpl1_noncontext_pipe_none_registers, 0xb600 },
{ A7XX_CLUSTER_NONE, A7XX_TP0_NCTX_REG, A7XX_PIPE_LPAC, 0, A7XX_USPTP,
gen7_0_0_tpl1_noncontext_pipe_lpac_registers, 0xb780 },
{ A7XX_CLUSTER_SP_PS, A7XX_SP_CTX0_3D_CPS_REG, A7XX_PIPE_BR, 0, A7XX_HLSQ_STATE,
--
2.50.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix debugbus snapshot
[not found] <20250902120833.1342615-1-sashal@kernel.org>
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx debugbus read Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx TPL1 cluster snapshot Sasha Levin
@ 2025-09-02 12:08 ` Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix order of selector programming in cluster snapshot Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-09-02 12:08 UTC (permalink / raw)
To: patches, stable
Cc: Rob Clark, Connor Abbott, Sasha Levin, linux-arm-msm, dri-devel,
freedreno
From: Rob Clark <robin.clark@oss.qualcomm.com>
[ Upstream commit a814ba2d7b847cff15565bbab781df89e190619c ]
We weren't setting the # of captured debugbus blocks.
Reported-by: Connor Abbott <cwabbott0@gmail.com>
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/666660/
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the
following reasons:
## Bug Impact Analysis
1. **Clear bug fix**: The code analysis shows that
`a7xx_get_debugbus_blocks()` allocates memory for
`total_debugbus_blocks` worth of debugbus data and populates all of
them, but critically fails to set `a6xx_state->nr_debugbus =
total_debugbus_blocks`. This is clearly visible at line 442 of the
fixed code.
2. **User-visible impact**: Without setting `nr_debugbus`, the function
`a6xx_show_debugbus()` at line 1949 will iterate 0 times (`for (i =
0; i < a6xx_state->nr_debugbus; i++)`), meaning NO debugbus data will
be shown in GPU crash dumps for a7xx GPUs. This severely impacts
debugging capabilities when GPU hangs or crashes occur.
3. **Regression timeline**: This bug was introduced in commit
64d6255650d4e0 ("drm/msm: More fully implement devcoredump for a7xx")
from January 2024, which added the `a7xx_get_debugbus_blocks()`
function but forgot to set the counter. The a6xx version of this
function correctly sets `nr_debugbus` at lines 372 and 384.
## Stable Tree Criteria Met
1. **Real bug affecting users**: Yes - debugbus data is completely
missing from a7xx GPU crash dumps
2. **Small and contained fix**: Yes - single line addition:
`a6xx_state->nr_debugbus = total_debugbus_blocks;`
3. **No architectural changes**: The fix simply sets an existing counter
variable that was forgotten
4. **Minimal regression risk**: The change only affects the specific
code path for a7xx GPUs and simply enables already-allocated and
populated data to be displayed
5. **Critical debugging functionality**: GPU crash dumps are essential
for debugging graphics driver issues in production
## Technical Details
The bug is in the a7xx-specific path where:
- Memory is allocated for `total_debugbus_blocks` entries (line 426)
- All blocks are populated via `a6xx_get_debugbus_block()` calls (lines
430-439)
- But `nr_debugbus` is never set, leaving it at 0
- This causes `a6xx_show_debugbus()` to skip all debugbus output since
it loops from 0 to `nr_debugbus`
The fix correctly sets `nr_debugbus = total_debugbus_blocks` after
populating all the data, matching the pattern used in the a6xx
equivalent function.
This is a perfect candidate for stable backporting as it fixes a clear
functional regression in debugging infrastructure without any risk of
destabilizing the system.
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index a85d3df7a5fac..f46bc906ca2a3 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -423,8 +423,9 @@ static void a7xx_get_debugbus_blocks(struct msm_gpu *gpu,
a6xx_state, &a7xx_debugbus_blocks[gbif_debugbus_blocks[i]],
&a6xx_state->debugbus[i + debugbus_blocks_count]);
}
- }
+ a6xx_state->nr_debugbus = total_debugbus_blocks;
+ }
}
static void a6xx_get_debugbus(struct msm_gpu *gpu,
--
2.50.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix order of selector programming in cluster snapshot
[not found] <20250902120833.1342615-1-sashal@kernel.org>
` (2 preceding siblings ...)
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix debugbus snapshot Sasha Levin
@ 2025-09-02 12:08 ` Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-09-02 12:08 UTC (permalink / raw)
To: patches, stable
Cc: Rob Clark, Akhil P Oommen, Sasha Levin, linux-arm-msm, dri-devel,
freedreno
From: Rob Clark <robin.clark@oss.qualcomm.com>
[ Upstream commit a506578d8909e7e6f0d545af9850ccd4318bf6cf ]
Program the selector _after_ selecting the aperture. This aligns with
the downstream driver, and fixes a case where we were failing to capture
ctx0 regs (and presumably what we thought were ctx1 regs were actually
ctx0).
Suggested-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/666655/
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Bug Fix Nature
The commit fixes a **clear functional bug** in the GPU crash
dump/snapshot functionality for Qualcomm Adreno GPUs (specifically the
a7xx series). The bug causes incorrect register capture during GPU crash
dumps:
- **Symptom**: Failing to capture ctx0 (context 0) registers correctly
- **Impact**: ctx1 registers were actually ctx0 registers, meaning
critical debugging information was lost or misrepresented
## Code Analysis
The fix is **extremely simple and surgical** - it only reorders two
blocks of code in the `a7xx_get_cluster()` function:
**Before the fix:**
```c
/* Some clusters need a selector register to be programmed too */
if (cluster->sel)
in += CRASHDUMP_WRITE(in, cluster->sel->cd_reg, cluster->sel->val);
in += CRASHDUMP_WRITE(in, REG_A7XX_CP_APERTURE_CNTL_CD, ...);
```
**After the fix:**
```c
in += CRASHDUMP_WRITE(in, REG_A7XX_CP_APERTURE_CNTL_CD, ...);
/* Some clusters need a selector register to be programmed too */
if (cluster->sel)
in += CRASHDUMP_WRITE(in, cluster->sel->cd_reg, cluster->sel->val);
```
The aperture control register (`REG_A7XX_CP_APERTURE_CNTL_CD`) must be
programmed **before** the selector register. This is a classic hardware
programming sequence issue where register order matters.
## Stable Backport Criteria Met
1. **Fixes a real bug**: Yes - incorrect crash dump data affects
debugging capability
2. **Small and contained**: Yes - only 4 lines moved, no logic changes
3. **No architectural changes**: Correct - purely a reordering fix
4. **Minimal regression risk**: Yes - only affects crash dump path, not
normal operation
5. **Aligns with vendor driver**: The commit explicitly states it
"aligns with the downstream driver"
6. **No new features**: Correct - purely a bug fix
## Additional Supporting Evidence
- The commit has already been marked with "Upstream commit
a506578d8909..." suggesting it's been accepted upstream
- Similar fixes in the same file (e.g., `f28c9fc2c82de drm/msm: Fix
debugbus snapshot`) show a pattern of fixing crash dump issues
- The fix is isolated to the crash dump code path
(`a7xx_get_cluster()`), which is only executed during GPU error
recovery
- The author (Rob Clark) is a maintainer of the MSM DRM driver, lending
credibility to the fix
This is an ideal stable backport candidate - it fixes a clear bug with
minimal code change and virtually no risk of regression.
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 61850e2802914..6e8dbd27addbe 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -776,15 +776,15 @@ static void a7xx_get_cluster(struct msm_gpu *gpu,
size_t datasize;
int i, regcount = 0;
- /* Some clusters need a selector register to be programmed too */
- if (cluster->sel)
- in += CRASHDUMP_WRITE(in, cluster->sel->cd_reg, cluster->sel->val);
-
in += CRASHDUMP_WRITE(in, REG_A7XX_CP_APERTURE_CNTL_CD,
A7XX_CP_APERTURE_CNTL_CD_PIPE(cluster->pipe_id) |
A7XX_CP_APERTURE_CNTL_CD_CLUSTER(cluster->cluster_id) |
A7XX_CP_APERTURE_CNTL_CD_CONTEXT(cluster->context_id));
+ /* Some clusters need a selector register to be programmed too */
+ if (cluster->sel)
+ in += CRASHDUMP_WRITE(in, cluster->sel->cd_reg, cluster->sel->val);
+
for (i = 0; cluster->regs[i] != UINT_MAX; i += 2) {
int count = RANGE(cluster->regs, i);
--
2.50.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-09-02 12:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250902120833.1342615-1-sashal@kernel.org>
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx debugbus read Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix a7xx TPL1 cluster snapshot Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix debugbus snapshot Sasha Levin
2025-09-02 12:08 ` [PATCH AUTOSEL 6.16-6.12] drm/msm: Fix order of selector programming in cluster snapshot Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).