From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Rob Clark <robdclark@chromium.org>,
Rob Clark <robin.clark@oss.qualcomm.com>,
Sasha Levin <sashal@kernel.org>,
lumag@kernel.org, linux-arm-msm@vger.kernel.org,
dri-devel@lists.freedesktop.org, freedreno@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.15 02/20] drm/msm: Fix a fence leak in submit error path
Date: Tue, 24 Jun 2025 00:11:01 -0400 [thread overview]
Message-ID: <20250624041120.83191-2-sashal@kernel.org> (raw)
In-Reply-To: <20250624041120.83191-1-sashal@kernel.org>
From: Rob Clark <robdclark@chromium.org>
[ Upstream commit 5d319f75ccf7f0927425a7545aa1a22b3eedc189 ]
In error paths, we could unref the submit without calling
drm_sched_entity_push_job(), so msm_job_free() will never get
called. Since drm_sched_job_cleanup() will NULL out the
s_fence, we can use that to detect this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/653584/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Let me analyze based on what I know from the commit:
**YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Analysis of the Fence Leak Fix
### 1. **Bug Description**
The commit fixes a resource leak in the MSM GPU driver's submit error
path. When errors occur after `drm_sched_job_arm()` but before
`drm_sched_entity_push_job()`, the scheduler fence (`s_fence`) is leaked
because `drm_sched_job_cleanup()` is never called.
### 2. **Code Changes Analysis**
The fix adds this critical code to `__msm_gem_submit_destroy()`:
```c
/*
- In error paths, we could unref the submit without calling
- drm_sched_entity_push_job(), so msm_job_free() will never
- get called. Since drm_sched_job_cleanup() will NULL out
- s_fence, we can use that to detect this case.
*/
if (submit->base.s_fence)
drm_sched_job_cleanup(&submit->base);
```
This is a defensive check that ensures proper cleanup of scheduler
resources when the submit is destroyed without going through the normal
scheduler path.
### 3. **Why This is a Good Backport Candidate**
**a) Fixes a Real Bug:** This addresses a concrete resource leak that
affects system stability. Each leaked fence consumes memory and other
kernel resources that are never freed.
**b) Small and Contained:** The fix is minimal - just 9 lines of code
including comments. It doesn't change any APIs or data structures.
**c) Clear Error Path Fix:** The fix targets a specific error handling
path without affecting the normal execution flow. When
`drm_sched_entity_push_job()` is called successfully, `msm_job_free()`
handles the cleanup as before.
**d) Follows Established Patterns:** The fix uses the same pattern as
the hw_fence cleanup (checking refcount) that's already in the code,
making it consistent with existing error handling.
**e) No Architectural Changes:** This is purely a bug fix that plugs a
resource leak. It doesn't introduce new features or change behavior.
**f) Critical Subsystem:** GPU drivers are important for system
stability and user experience. Memory leaks in GPU submission paths can
accumulate quickly under heavy graphics workloads.
### 4. **Comparison with Similar Commits**
Looking at the similar commits provided:
- The first commit (drm/msm: Fix wait_fence submitqueue leak) with
backport status YES also fixes a resource leak in error paths
- It's similarly small, contained, and fixes a clear bug without
architectural changes
### 5. **Risk Assessment**
**Low Risk:**
- The added check only executes in the destroy path when `s_fence` is
non-NULL
- If `drm_sched_entity_push_job()` was called, `msm_job_free()` will
have already cleaned up `s_fence`, making it NULL
- The fix is defensive and won't cause issues even if called
unnecessarily
### 6. **Stable Criteria Met**
According to stable tree rules:
- ✓ Fixes a bug (fence resource leak)
- ✓ Minimal change (9 lines)
- ✓ Obviously correct (clear logic with explanatory comment)
- ✓ Already tested (has Patchwork link indicating review)
- ✓ No new features
- ✓ Fixes a real issue users could encounter
The fence leak could accumulate over time, especially in scenarios with
GPU submission failures, leading to memory exhaustion and system
instability. This makes it an important fix for stable kernels.
drivers/gpu/drm/msm/msm_gem_submit.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 3e9aa2cc38ef9..b2aeaecaa39b3 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -85,6 +85,15 @@ void __msm_gem_submit_destroy(struct kref *kref)
container_of(kref, struct msm_gem_submit, ref);
unsigned i;
+ /*
+ * In error paths, we could unref the submit without calling
+ * drm_sched_entity_push_job(), so msm_job_free() will never
+ * get called. Since drm_sched_job_cleanup() will NULL out
+ * s_fence, we can use that to detect this case.
+ */
+ if (submit->base.s_fence)
+ drm_sched_job_cleanup(&submit->base);
+
if (submit->fence_id) {
spin_lock(&submit->queue->idr_lock);
idr_remove(&submit->queue->fence_idr, submit->fence_id);
--
2.39.5
next prev parent reply other threads:[~2025-06-24 4:11 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-24 4:11 [PATCH AUTOSEL 6.15 01/20] x86/platform/amd: move final timeout check to after final sleep Sasha Levin
2025-06-24 4:11 ` Sasha Levin [this message]
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 03/20] drm/msm: Fix another leak in the submit error path Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 04/20] ALSA: sb: Don't allow changing the DMA mode during operations Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 05/20] ALSA: sb: Force to disable DMAs once when DMA mode is changed Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 06/20] ata: libata-acpi: Do not assume 40 wire cable if no devices are enabled Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 07/20] ata: pata_cs5536: fix build on 32-bit UML Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 08/20] ASoC: amd: yc: Add quirk for MSI Bravo 17 D7VF internal mic Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 09/20] platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 10/20] genirq/irq_sim: Initialize work context pointers properly Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 11/20] powerpc: Fix struct termio related ioctl macros Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 12/20] ASoC: amd: yc: update quirk data for HP Victus Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 13/20] regulator: fan53555: add enable_time support and soft-start times Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 14/20] scsi: target: Fix NULL pointer dereference in core_scsi3_decode_spec_i_port() Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 15/20] aoe: defer rexmit timer downdev work to workqueue Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 16/20] wifi: mac80211: drop invalid source address OCB frames Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 17/20] wifi: ath6kl: remove WARN on bad firmware input Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 18/20] ACPICA: Refuse to evaluate a method if arguments are missing Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 19/20] mtd: spinand: fix memory leak of ECC engine conf Sasha Levin
2025-06-24 4:11 ` [PATCH AUTOSEL 6.15 20/20] rcu: Return early if callback is not specified Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250624041120.83191-2-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=freedreno@lists.freedesktop.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=lumag@kernel.org \
--cc=patches@lists.linux.dev \
--cc=robdclark@chromium.org \
--cc=robin.clark@oss.qualcomm.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox