From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: David Francis <David.Francis@amd.com>,
Felix Kuehling <felix.kuehling@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Sasha Levin <sashal@kernel.org>,
Felix.Kuehling@amd.com, amd-gfx@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.17-6.1] drm/amdgpu: Allow kfd CRIU with no buffer objects
Date: Sat, 25 Oct 2025 11:55:18 -0400 [thread overview]
Message-ID: <20251025160905.3857885-87-sashal@kernel.org> (raw)
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>
From: David Francis <David.Francis@amd.com>
[ Upstream commit 85705b18ae7674347f8675f64b2b3115fb1d5629 ]
The kfd CRIU checkpoint ioctl would return an error if trying
to checkpoint a process with no kfd buffer objects.
This is a normal case and should not be an error.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes: Previously, the CRIU path rejected processes with no
KFD buffer objects by requiring both a non-NULL `bos` pointer and a
non-zero `num_bos`. The commit relaxes this so that a process with
zero BOs is treated as a normal case instead of an error.
- Precise change: In `criu_restore`, the validation changes from
rejecting zero BOs to only requiring `args->bos` when there actually
are BOs:
- New check only requires `bos` if `num_bos > 0`:
`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`
- `(args->num_bos > 0 && !args->bos) || !args->devices ||
!args->priv_data || !args->priv_data_size || !args->num_devices`
- This removes the old unconditional `!args->bos` and `!args->num_bos`
rejection.
- Why it’s correct and safe:
- Downstream restore code already handles zero BOs correctly:
- Size checks scale with `num_bos`:
`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2439-2440`
- Zero-length allocations are fine; `kvmalloc_array(args->num_bos,
...)` and `kvzalloc(...)` safely handle `num_bos == 0` and
`kvfree` is safe:
`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2445-2453, 2463-2467,
2495-2499`
- `copy_from_user` and `copy_to_user` with size 0 are no-ops and
safe even if `bos` is NULL:
`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2455-2461, 2487-2492`
- The loop over BOs naturally skips when `num_bos == 0`:
`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2479-2485`
- For `num_bos > 0`, the new check still requires a valid `bos`
pointer, preserving existing behavior where needed.
- Scope and risk:
- Small, localized input validation fix in KFD CRIU restore path only;
no architectural changes.
- No impact on other subsystems; error handling paths remain
unchanged.
- Regression risk is minimal because it only relaxes a reject
condition for a valid scenario and downstream code already supports
zero BOs.
- User impact:
- Fixes spurious `-EINVAL` on CRIU operations for processes without
KFD BOs, which is a normal scenario per the commit message.
- Improves reliability of CRIU-based workflows for AMD GPU compute
processes.
- Stable backport criteria:
- Important bugfix affecting real users.
- Minimal, contained change with low risk.
- No new features or API changes; aligns behavior with existing code
expectations.
Note: While the commit message mentions the checkpoint ioctl, this
change updates the restore validation
(`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`). It still
satisfies stable criteria by correcting CRIU handling for the no-BO case
on restore, with the downstream code already safely handling `num_bos ==
0`.
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 43115a3744694..8535a52a62cab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2571,8 +2571,8 @@ static int criu_restore(struct file *filep,
pr_debug("CRIU restore (num_devices:%u num_bos:%u num_objects:%u priv_data_size:%llu)\n",
args->num_devices, args->num_bos, args->num_objects, args->priv_data_size);
- if (!args->bos || !args->devices || !args->priv_data || !args->priv_data_size ||
- !args->num_devices || !args->num_bos)
+ if ((args->num_bos > 0 && !args->bos) || !args->devices || !args->priv_data ||
+ !args->priv_data_size || !args->num_devices)
return -EINVAL;
mutex_lock(&p->mutex);
--
2.51.0
next prev parent reply other threads:[~2025-10-25 16:13 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20251025160905.3857885-1-sashal@kernel.org>
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] amd/amdkfd: enhance kfd process check in switch partition Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-6.12] drm/amdgpu: fix nullptr err of vm_handle_moved Sasha Levin
2025-10-25 15:55 ` Sasha Levin [this message]
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/amd/pm: refine amdgpu pm sysfs node error code Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.6] drm/amdkfd: Handle lack of READ permissions in SVM mapping Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.6] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: return -ENOTTY for unsupported IOCTLs Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] drm/amdkfd: fix vram allocation failure for a special case Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251025160905.3857885-87-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=David.Francis@amd.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=felix.kuehling@amd.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox