From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Yifan Zhang <yifan1.zhang@amd.com>,
Philip Yang <Philip.Yang@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Sasha Levin <sashal@kernel.org>,
Felix.Kuehling@amd.com, amd-gfx@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.6 02/18] amd/amdkfd: fix a kfd_process ref leak
Date: Mon, 9 Jun 2025 09:46:36 -0400 [thread overview]
Message-ID: <20250609134652.1344323-2-sashal@kernel.org> (raw)
In-Reply-To: <20250609134652.1344323-1-sashal@kernel.org>
From: Yifan Zhang <yifan1.zhang@amd.com>
[ Upstream commit 90237b16ec1d7afa16e2173cc9a664377214cdd9 ]
This patch is to fix a kfd_prcess ref leak.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
**YES**
This commit should be backported to stable kernel trees. Here's my
detailed analysis:
## Bug Analysis
This patch fixes a clear **resource leak bug** where a reference count
is not properly decremented in an error path. Specifically:
1. **The leak occurs in `kfd_signal_poison_consumed_event()`** at line
1334 where the function returns early without calling
`kfd_unref_process(p)`:
```c
if (unlikely(user_gpu_id == -EINVAL)) {
WARN_ONCE(1, "Could not get user_gpu_id from dev->id:%x\n", dev->id);
return; // BUG: Missing kfd_unref_process(p)
}
```
2. **The function `kfd_lookup_process_by_pasid()` increments the
reference count** on the kfd_process object (as confirmed by
examining the kernel source). This means every successful call must
be paired with `kfd_unref_process()`.
3. **The fix is minimal and correct**: It simply adds the missing
`kfd_unref_process(p)` call before the early return, ensuring the
reference count is properly decremented.
## Why This Should Be Backported
1. **It's a clear bug fix**: This fixes a reference count leak that will
cause memory to be leaked every time this error path is hit. Over
time, this could lead to significant memory exhaustion.
2. **The fix is minimal and contained**: Only one line is added, making
it low-risk for regression. The change doesn't alter any logic or
introduce new features.
3. **It affects user-visible behavior**: When GPU errors occur and this
code path is triggered repeatedly, the memory leak could impact
system stability and performance.
4. **Similar to previously backported commits**: Looking at the
historical examples, commit #4 which also fixed a memory leak in
create_process failure was marked for backporting and included "Cc:
stable@vger.kernel.org".
5. **The bug is in error handling code**: While the error condition
might be rare (invalid GPU ID), when it does occur, the leak happens
every time. Error handling bugs are particularly important to fix
because they can accumulate unnoticed.
6. **No architectural changes**: This is a simple bug fix that doesn't
change any APIs, data structures, or introduce new functionality.
The commit follows the stable kernel rules perfectly: it fixes a real
bug, is minimal in scope, has clear consequences if not fixed (memory
leak), and doesn't introduce new features or risky changes.
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 0f58be65132fc..2b07c0000df6e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -1287,6 +1287,7 @@ void kfd_signal_poison_consumed_event(struct kfd_node *dev, u32 pasid)
user_gpu_id = kfd_process_get_user_gpu_id(p, dev->id);
if (unlikely(user_gpu_id == -EINVAL)) {
WARN_ONCE(1, "Could not get user_gpu_id from dev->id:%x\n", dev->id);
+ kfd_unref_process(p);
return;
}
--
2.39.5
next prev parent reply other threads:[~2025-06-09 13:46 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-09 13:46 [PATCH AUTOSEL 6.6 01/18] md/md-bitmap: fix dm-raid max_write_behind setting Sasha Levin
2025-06-09 13:46 ` Sasha Levin [this message]
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 03/18] bcache: fix NULL pointer in cache_set_flush() Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 04/18] drm/scheduler: signal scheduled fence when kill job Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 05/18] iio: pressure: zpa2326: Use aligned_s64 for the timestamp Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 06/18] um: Add cmpxchg8b_emu and checksum functions to asm-prototypes.h Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 07/18] um: use proper care when taking mmap lock during segfault Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 08/18] coresight: Only check bottom two claim bits Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 09/18] usb: dwc2: also exit clock_gating when stopping udc while suspended Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 10/18] iio: adc: ad_sigma_delta: Fix use of uninitialized status_pos Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 11/18] misc: tps6594-pfsm: Add NULL pointer check in tps6594_pfsm_probe() Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 12/18] usb: potential integer overflow in usbg_make_tpg() Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 13/18] tty: serial: uartlite: register uart driver in init Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 14/18] usb: common: usb-conn-gpio: use a unique name for usb connector device Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 15/18] usb: Add checks for snprintf() calls in usb_alloc_dev() Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 16/18] usb: cdc-wdm: avoid setting WDM_READ for ZLP-s Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 17/18] usb: typec: displayport: Receive DP Status Update NAK request exit dp altmode Sasha Levin
2025-06-09 13:46 ` [PATCH AUTOSEL 6.6 18/18] usb: typec: mux: do not return on EOPNOTSUPP in {mux, switch}_set Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250609134652.1344323-2-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=Felix.Kuehling@amd.com \
--cc=Philip.Yang@amd.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
--cc=yifan1.zhang@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox