[PATCH AUTOSEL 6.17] amd/amdkfd: enhance kfd process check in switch partition

AMD-GFX Archive on lore.kernel.org

* [PATCH AUTOSEL 6.17] amd/amdkfd: enhance kfd process check in switch partition
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
@ 2025-10-25 15:54 ` Sasha Levin
  2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-6.12] drm/amdgpu: fix nullptr err of vm_handle_moved Sasha Levin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 15:54 UTC
  To: patches, stable
  Cc: Yifan Zhang, Philip.Yang, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Yifan Zhang <yifan1.zhang@amd.com>

[ Upstream commit 45da20e00d5da842e17dfc633072b127504f0d0e ]

Currently, the partition switch only checks whether kfd_processes_table is
empty. A kfd_processes_table entry is deleted in
kfd_process_notifier_release, but kfd_process teardown happens later, in
kfd_process_wq_release.

Consider two processes:

Process A (workqueue) -> kfd_process_wq_release -> accesses kfd_node member
Process B switches partition -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw
-> kfd_node teardown.

Processes A and B may race, as shown in the dmesg log below.

This patch resolves the race by adding an atomic kfd_process counter,
kfd_processes_count. It is incremented when a kfd process is created and
decremented when kfd_process_wq_release finishes.

v2: Make kfd_processes_count per kfd_dev, move the decrement to
kfd_process_destroy_pdds, and fix a bug. (Philip Yang)

[3966658.307702] divide error: 0000 [#1] SMP NOPTI
[3966658.350818]  i10nm_edac
[3966658.356318] CPU: 124 PID: 38435 Comm: kworker/124:0 Kdump: loaded Tainted
[3966658.356890] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
[3966658.362839]  nfit
[3966658.366457] RIP: 0010:kfd_get_num_sdma_engines+0x17/0x40 [amdgpu]
[3966658.366460] Code: 00 00 e9 ac 81 02 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 4f 08 48 8b b7 00 01 00 00 8b 81 58 26 03 00 99 <f7> be b8 01 00 00 80 b9 70 2e 00 00 00 74 0b 83 f8 02 ba 02 00 00
[3966658.380967]  x86_pkg_temp_thermal
[3966658.391529] RSP: 0018:ffffc900a0edfdd8 EFLAGS: 00010246
[3966658.391531] RAX: 0000000000000008 RBX: ffff8974e593b800 RCX: ffff888645900000
[3966658.391531] RDX: 0000000000000000 RSI: ffff888129154400 RDI: ffff888129151c00
[3966658.391532] RBP: ffff8883ad79d400 R08: 0000000000000000 R09: ffff8890d2750af4
[3966658.391532] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000
[3966658.391533] R13: ffff8883ad79d400 R14: ffffe87ff662ba00 R15: ffff8974e593b800
[3966658.391533] FS:  0000000000000000(0000) GS:ffff88fe7f600000(0000) knlGS:0000000000000000
[3966658.391534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3966658.391534] CR2: 0000000000d71000 CR3: 000000dd0e970004 CR4: 0000000002770ee0
[3966658.391535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3966658.391535] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[3966658.391536] PKRU: 55555554
[3966658.391536] Call Trace:
[3966658.391674]  deallocate_sdma_queue+0x38/0xa0 [amdgpu]
[3966658.391762]  process_termination_cpsch+0x1ed/0x480 [amdgpu]
[3966658.399754]  intel_powerclamp
[3966658.402831]  kfd_process_dequeue_from_all_devices+0x5b/0xc0 [amdgpu]
[3966658.402908]  kfd_process_wq_release+0x1a/0x1a0 [amdgpu]
[3966658.410516]  coretemp
[3966658.434016]  process_one_work+0x1ad/0x380
[3966658.434021]  worker_thread+0x49/0x310
[3966658.438963]  kvm_intel
[3966658.446041]  ? process_one_work+0x380/0x380
[3966658.446045]  kthread+0x118/0x140
[3966658.446047]  ? __kthread_bind_mask+0x60/0x60
[3966658.446050]  ret_from_fork+0x1f/0x30
[3966658.446053] Modules linked in: kpatch_20765354(OEK)
[3966658.455310]  kvm
[3966658.464534]  mptcp_diag xsk_diag raw_diag unix_diag af_packet_diag netlink_diag udp_diag act_pedit act_mirred act_vlan cls_flower kpatch_21951273(OEK) kpatch_18424469(OEK) kpatch_19749756(OEK)
[3966658.473462]  idxd_mdev
[3966658.482306]  kpatch_17971294(OEK) sch_ingress xt_conntrack amdgpu(OE) amdxcp(OE) amddrm_buddy(OE) amd_sched(OE) amdttm(OE) amdkcl(OE) intel_ifs iptable_mangle tcm_loop target_core_pscsi tcp_diag target_core_file inet_diag target_core_iblock target_core_user target_core_mod coldpgs kpatch_18383292(OEK) ip6table_nat ip6table_filter ip6_tables ip_set_hash_ipportip ip_set_hash_ipportnet ip_set_hash_ipport ip_set_bitmap_port xt_comment iptable_nat nf_nat iptable_filter ip_tables ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sn_core_odd(OE) i40e overlay binfmt_misc tun bonding(OE) aisqos(OE) aisqos_hotfixes(OE) rfkill uio_pci_generic uio cuse fuse nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm idxd_mdev
[3966658.491237]  vfio_pci
[3966658.501196]  vfio_pci vfio_virqfd mdev vfio_iommu_type1 vfio iax_crypto intel_pmt_telemetry iTCO_wdt intel_pmt_class iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq
[3966658.508537]  vfio_virqfd
[3966658.517569]  snd_seq_device ipmi_ssif isst_if_mbox_pci isst_if_mmio pcspkr snd_pcm idxd intel_uncore ses isst_if_common intel_vsec idxd_bus enclosure snd_timer mei_me snd i2c_i801 i2c_smbus mei i2c_ismt soundcore joydev acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad vfat fat
[3966658.526851]  mdev
[3966658.536096]  nfsd auth_rpcgss nfs_acl lockd grace slb_vtoa(OE) sunrpc dm_mod hookers mlx5_ib(OE) ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm mlx5_core(OE) mlxfw(OE)
[3966658.540381]  vfio_iommu_type1
[3966658.544341]  nvme mpt3sas tls drm nvme_core pci_hyperv_intf raid_class psample libcrc32c crc32c_intel mlxdevm(OE) i2c_core
[3966658.551254]  vfio
[3966658.558742]  scsi_transport_sas wmi pinctrl_emmitsburg sd_mod t10_pi sg ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE)
[3966658.563004]  iax_crypto
[3966658.570988]  [last unloaded: diagnose]
[3966658.571027] ---[ end trace cc9dbb180f9ae537 ]---

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Philip.Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES
- The crash the commit describes is real: when
  `kfd_process_notifier_release` removes a process from
  `kfd_processes_table`, the subsequent `kfd_process_wq_release` can
  still touch `kfd_node` while `kgd2kfd_check_and_lock_kfd` allows a
  partition switch to proceed, tearing the device down and triggering
  the reported divide error in `kfd_get_num_sdma_engines`. The
  regression was introduced when commit `96f75f9594466f` relaxed the
  partition switch guard to rely only on the hash table; the new trace
  shows we now have a use-after-free window.
- The fix is tight and well scoped: it adds a per-device atomic counter
  at `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:386` and initializes it in
  `kgd2kfd_probe` (`drivers/gpu/drm/amd/amdkfd/kfd_device.c:498`).
  `kgd2kfd_check_and_lock_kfd` now refuses partition switches while that
  counter is non-zero
  (`drivers/gpu/drm/amd/amdkfd/kfd_device.c:1495-1503`), preventing the
  race.
- The counter is balanced across process lifecycle: it increments
  whenever a process device descriptor is created
  (`drivers/gpu/drm/amd/amdkfd/kfd_process.c:1644-1654`) and decrements
  when the descriptor is destroyed in the workqueue cleanup
  (`drivers/gpu/drm/amd/amdkfd/kfd_process.c:1085-1093`). Because
  `kfd_process_destroy_pdds` zeroes `p->n_pdds` after the loop, double
  decrements are prevented.
- Side effects are minimal: the patch touches only amdkfd code,
  introduces no API/ABI changes, and relies on existing synchronization
  (`kfd_processes_mutex` and atomics). The new counter simply gatekeeps
  the existing teardown path, so regression risk is low. No follow-up
  fixes are required.
- For stable backports, ensure the base tree already contains the
  compute-partition switch support from `96f75f9594466f`; earlier
  kernels that never allowed switching with live processes don’t hit
  this race and wouldn’t benefit. On trees with that support, this
  change cleanly applies and prevents a hard crash, making it an
  excellent stable candidate.

Next step: cherry-pick 45da20e00d5da842e17dfc633072b127504f0d0e onto the
relevant stable branches and run the usual amdgpu/amdkfd partition-
switch regression tests.
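
The gating idea described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: `struct model_kfd_dev`
and the `model_*` helpers are hypothetical stand-ins (not amdkfd functions),
and C11 `<stdatomic.h>` is used in place of the kernel's `atomic_t` API.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct kfd_dev; only the counter is modeled. */
struct model_kfd_dev {
	atomic_int processes_count;	/* live per-process device descriptors */
};

/* Models descriptor creation: one more user of the device. */
static void model_pdd_create(struct model_kfd_dev *kfd)
{
	atomic_fetch_add(&kfd->processes_count, 1);
}

/* Models the end of descriptor teardown, after the last device access. */
static void model_pdd_destroy(struct model_kfd_dev *kfd)
{
	atomic_fetch_sub(&kfd->processes_count, 1);
}

/* Models the partition-switch gate: refuse while any descriptor is alive. */
static int model_check_and_lock(struct model_kfd_dev *kfd)
{
	if (atomic_load(&kfd->processes_count) != 0)
		return -EBUSY;
	return 0;
}

/* Walks one create/destroy cycle and checks the gate at each step. */
static void model_counter_example(void)
{
	struct model_kfd_dev kfd;

	atomic_init(&kfd.processes_count, 0);
	assert(model_check_and_lock(&kfd) == 0);	/* idle: switch allowed */
	model_pdd_create(&kfd);
	assert(model_check_and_lock(&kfd) == -EBUSY);	/* teardown pending */
	model_pdd_destroy(&kfd);
	assert(model_check_and_lock(&kfd) == 0);	/* teardown finished */
}
```

The point of the model is that the gate stays closed until teardown has
actually completed, not merely until the process has left a lookup table.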

 drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 10 ++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  4 ++++
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 051a00152b089..e9cfb80bd4366 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -495,6 +495,7 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
 	mutex_init(&kfd->doorbell_mutex);
 
 	ida_init(&kfd->doorbell_ida);
+	atomic_set(&kfd->kfd_processes_count, 0);
 
 	return kfd;
 }
@@ -1493,6 +1494,15 @@ int kgd2kfd_check_and_lock_kfd(struct kfd_dev *kfd)
 
 	mutex_lock(&kfd_processes_mutex);
 
+	/* kfd_processes_count is per kfd_dev, return -EBUSY without
+	 * further check
+	 */
+	if (!!atomic_read(&kfd->kfd_processes_count)) {
+		pr_debug("process_wq_release not finished\n");
+		r = -EBUSY;
+		goto out;
+	}
+
 	if (hash_empty(kfd_processes_table) && !kfd_is_locked(kfd))
 		goto out;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d01ef5ac07666..70ef051511bb1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -382,6 +382,8 @@ struct kfd_dev {
 
 	/* for dynamic partitioning */
 	int kfd_dev_lock;
+
+	atomic_t kfd_processes_count;
 };
 
 enum kfd_mempool {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 5be28c6c4f6aa..ddfe30c13e9d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1088,6 +1088,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 			pdd->runtime_inuse = false;
 		}
 
+		atomic_dec(&pdd->dev->kfd->kfd_processes_count);
+
 		kfree(pdd);
 		p->pdds[i] = NULL;
 	}
@@ -1649,6 +1651,8 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev,
 	/* Init idr used for memory handle translation */
 	idr_init(&pdd->alloc_idr);
 
+	atomic_inc(&dev->kfd->kfd_processes_count);
+
 	return pdd;
 }
 
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.12] drm/amdgpu: fix nullptr err of vm_handle_moved
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
  2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] amd/amdkfd: enhance kfd process check in switch partition Sasha Levin
@ 2025-10-25 15:54 ` Sasha Levin
  2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.1] drm/amdgpu: Allow kfd CRIU with no buffer objects Sasha Levin
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 15:54 UTC
  To: patches, stable
  Cc: Heng Zhou, Kasiviswanathan, Harish, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Heng Zhou <Heng.Zhou@amd.com>

[ Upstream commit 859958a7faefe5b7742b7b8cdbc170713d4bf158 ]

If an amdgpu_bo_va is fpriv->prt_va, its bo is always NULL.
Such an amdgpu_bo_va should therefore be updated separately, before
amdgpu_vm_handle_moved is called.

Signed-off-by: Heng Zhou <Heng.Zhou@amd.com>
Reviewed-by: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- Bug impact: The commit fixes a real, user-visible NULL pointer
  dereference during KFD process BO restore. In the KFD restore path,
  PRT (partial resident texture) mappings live in `fpriv->prt_va`, which
  is a VM mapping with no backing BO. This mapping can appear in the
  VM’s “moved/invalidated” lists, and `amdgpu_vm_handle_moved()` will
  then dereference `bo_va->base.bo`, causing a NULL deref. Specifically,
  `amdgpu_vm_handle_moved()` dereferences `bo_va->base.bo` in the
  invalidated loop to fetch the reservation object:
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1608 and
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1618. That’s unsafe for PRT VA
  since its BO is always NULL.

- Why the bug exists: `fpriv->prt_va` is created with a NULL BO (as
  intended) at drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:1428, which sets
  up a special VA mapping without a BO: it calls `amdgpu_vm_bo_add(adev,
  &fpriv->vm, NULL)`. Consequently, any generic handling that assumes
  `bo_va->base.bo` is non-NULL can crash if the PRT VA ends up in the
  VM’s invalidation or movement queues.

- What the change does: The patch updates the PRT mapping before calling
  the generic VM “handle moved” pass, ensuring the PRT VA is not present
  in those lists when the code that assumes a non-NULL BO runs.
  - Before: In the restore path, after validating PDs/PTs, the code
    directly calls `amdgpu_vm_handle_moved(adev, peer_vm, &exec.ticket)`
    for all VMs (drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2992).
  - After: It first derives `fpriv` from the VM, then explicitly updates
    `fpriv->prt_va` with `amdgpu_vm_bo_update(adev, fpriv->prt_va,
    false)` before calling `amdgpu_vm_handle_moved()` (as per the diff).
    This mirrors how command submission already handles PRT VA before
    calling handle_moved (see
    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1149–1191).
  - Rationale: `amdgpu_vm_bo_update()` safely supports `bo_va->base.bo
    == NULL` (PRT case) and moves the mapping’s state to “done” without
    dereferencing a BO, see
    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1245–1390. It uses the VM’s
    fence (`vm->last_update`) instead of a BO fence when `bo == NULL`,
    and it moves the mapping out of the invalidated/moved state via
    `amdgpu_vm_bo_done(&bo_va->base)`.

- Safety and minimality:
  - The fix is small, localized to
    `amdgpu_amdkfd_gpuvm_restore_process_bos()`
    (drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2986–3004 region),
    with no architectural changes.
  - It follows an established pattern already present in the CS path:
    `amdgpu_cs` updates `fpriv->prt_va` before
    `amdgpu_vm_handle_moved()`
    (drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1149–1191).
  - It operates while PD/PTs are already reserved via the DRM
    exec/ticket path, matching the expectations of both
    `amdgpu_vm_bo_update()` and `amdgpu_vm_handle_moved()`.
  - The additional log message changes (`pr_debug` → `dev_dbg` with PID)
    are non-functional and low risk.

- Stable tree criteria:
  - Fixes a real crash (NULL deref), so it’s an important bug fix.
  - Change is small and contained to the AMDGPU KFD restore path.
  - No new features or API changes; no architectural rework.
  - Regression risk is low, because it matches the existing, proven
    sequence used in command submission code.
  - Touches a critical subsystem (AMDGPU), but the pattern is already
    used elsewhere, reducing risk.

- Applicability across stable branches:
  - This is relevant for stable kernels that have the KFD restore path
    structured like the one in your 6.17 workspace (with “Update
    mappings not managed by KFD” and the call to
    `amdgpu_vm_handle_moved()` within
    `amdgpu_amdkfd_gpuvm_restore_process_bos` at
    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:2986–3004).
  - Older long-term branches (e.g., 5.4 in your `$CVEKERNELTREE`) have a
    different implementation of the restore path and do not invoke
    `amdgpu_vm_handle_moved()` there (see
    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:1952 in that tree).
    For those, this exact backport is not applicable or needed.

Conclusion: This is a correct, minimal, and safe bug fix preventing a
NULL pointer crash in the KFD eviction-restore path and mirrors existing
correct handling in CS paths. It should be backported to stable trees
that have this KFD restore flow.
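
The ordering argument above can be shown with a toy userspace model. It is a
sketch only: the `model_*` types and functions are hypothetical and greatly
simplified, and the generic pass returns -1 where the analysis above says the
kernel would dereference a NULL BO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a BO and its VM mapping. */
struct model_bo {
	int touched;		/* counts visits by the generic pass */
};

struct model_bo_va {
	struct model_bo *bo;	/* NULL for the PRT mapping */
	int invalidated;	/* 1 while the mapping awaits an update */
};

/* Models an update that tolerates bo == NULL and marks the mapping done. */
static int model_bo_update(struct model_bo_va *va)
{
	va->invalidated = 0;
	return 0;
}

/* Models a generic pass that uses bo for every invalidated mapping. */
static int model_handle_moved(struct model_bo_va **list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!list[i]->invalidated)
			continue;
		if (!list[i]->bo)
			return -1;	/* stands in for the NULL dereference */
		list[i]->bo->touched++;
		list[i]->invalidated = 0;
	}
	return 0;
}

/* Updating the NULL-BO mapping first keeps it out of the generic pass. */
static void model_prt_example(void)
{
	struct model_bo bo = { 0 };
	struct model_bo_va normal = { &bo, 1 };
	struct model_bo_va prt = { NULL, 1 };
	struct model_bo_va *list[] = { &normal, &prt };

	assert(model_bo_update(&prt) == 0);
	assert(model_handle_moved(list, 2) == 0);
	assert(bo.touched == 1);
}
```

Skipping the separate update in this model makes the generic pass fail on the
PRT entry, which is the situation the patch avoids.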

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 902eac2c685f3..30d4a47535882 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2993,9 +2993,22 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
 		struct amdgpu_device *adev = amdgpu_ttm_adev(
 			peer_vm->root.bo->tbo.bdev);
 
+		struct amdgpu_fpriv *fpriv =
+			container_of(peer_vm, struct amdgpu_fpriv, vm);
+
+		ret = amdgpu_vm_bo_update(adev, fpriv->prt_va, false);
+		if (ret) {
+			dev_dbg(adev->dev,
+				"Memory eviction: handle PRT moved failed, pid %8d. Try again.\n",
+				pid_nr(process_info->pid));
+			goto validate_map_fail;
+		}
+
 		ret = amdgpu_vm_handle_moved(adev, peer_vm, &exec.ticket);
 		if (ret) {
-			pr_debug("Memory eviction: handle moved failed. Try again\n");
+			dev_dbg(adev->dev,
+				"Memory eviction: handle moved failed, pid %8d. Try again.\n",
+				pid_nr(process_info->pid));
 			goto validate_map_fail;
 		}
 	}
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.1] drm/amdgpu: Allow kfd CRIU with no buffer objects
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
  2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] amd/amdkfd: enhance kfd process check in switch partition Sasha Levin
  2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-6.12] drm/amdgpu: fix nullptr err of vm_handle_moved Sasha Levin
@ 2025-10-25 15:55 ` Sasha Levin
  2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/amd/pm: refine amdgpu pm sysfs node error code Sasha Levin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 15:55 UTC
  To: patches, stable
  Cc: David Francis, Felix Kuehling, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: David Francis <David.Francis@amd.com>

[ Upstream commit 85705b18ae7674347f8675f64b2b3115fb1d5629 ]

The kfd CRIU checkpoint ioctl would return an error if trying
to checkpoint a process with no kfd buffer objects.

This is a normal case and should not be an error.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What it fixes: Previously, the CRIU path rejected processes with no
  KFD buffer objects by requiring both a non-NULL `bos` pointer and a
  non-zero `num_bos`. The commit relaxes this so that a process with
  zero BOs is treated as a normal case instead of an error.

- Precise change: In `criu_restore`, the validation changes from
  rejecting zero BOs to only requiring `args->bos` when there actually
  are BOs:
  - New check only requires `bos` if `num_bos > 0`:
    `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`
    - `(args->num_bos > 0 && !args->bos) || !args->devices ||
      !args->priv_data || !args->priv_data_size || !args->num_devices`
  - This removes the old unconditional `!args->bos` and `!args->num_bos`
    rejection.

- Why it’s correct and safe:
  - Downstream restore code already handles zero BOs correctly:
    - Size checks scale with `num_bos`:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2439-2440`
    - Zero-length allocations are fine; `kvmalloc_array(args->num_bos,
      ...)` and `kvzalloc(...)` safely handle `num_bos == 0` and
      `kvfree` is safe:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2445-2453, 2463-2467,
      2495-2499`
    - `copy_from_user` and `copy_to_user` with size 0 are no-ops and
      safe even if `bos` is NULL:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2455-2461, 2487-2492`
    - The loop over BOs naturally skips when `num_bos == 0`:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2479-2485`
  - For `num_bos > 0`, the new check still requires a valid `bos`
    pointer, preserving existing behavior where needed.

- Scope and risk:
  - Small, localized input validation fix in KFD CRIU restore path only;
    no architectural changes.
  - No impact on other subsystems; error handling paths remain
    unchanged.
  - Regression risk is minimal because it only relaxes a reject
    condition for a valid scenario and downstream code already supports
    zero BOs.

- User impact:
  - Fixes spurious `-EINVAL` on CRIU operations for processes without
    KFD BOs, which is a normal scenario per the commit message.
  - Improves reliability of CRIU-based workflows for AMD GPU compute
    processes.

- Stable backport criteria:
  - Important bugfix affecting real users.
  - Minimal, contained change with low risk.
  - No new features or API changes; aligns behavior with existing code
    expectations.

Note: While the commit message mentions the checkpoint ioctl, this
change updates the restore validation
(`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`). It still
satisfies stable criteria by correcting CRIU handling for the no-BO case
on restore, with the downstream code already safely handling `num_bos ==
0`.
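
The before/after validation can be compared side by side in a small userspace
sketch. The struct below is a hypothetical subset of the restore arguments
(field names follow the diff, types are assumed), and the two predicates copy
the old and new conditions from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the restore arguments; types are assumptions. */
struct model_criu_args {
	uint64_t devices;
	uint64_t bos;
	uint64_t priv_data;
	uint64_t priv_data_size;
	uint32_t num_devices;
	uint32_t num_bos;
};

/* Old condition: rejects any request with zero BOs. */
static int model_old_check(const struct model_criu_args *a)
{
	if (!a->bos || !a->devices || !a->priv_data || !a->priv_data_size ||
	    !a->num_devices || !a->num_bos)
		return -EINVAL;
	return 0;
}

/* New condition: a bos pointer is required only when num_bos > 0. */
static int model_new_check(const struct model_criu_args *a)
{
	if ((a->num_bos > 0 && !a->bos) || !a->devices || !a->priv_data ||
	    !a->priv_data_size || !a->num_devices)
		return -EINVAL;
	return 0;
}

/* A process with no BOs is rejected by the old check, accepted by the new. */
static void model_criu_example(void)
{
	struct model_criu_args no_bos = {
		.devices = 0x1000, .bos = 0, .priv_data = 0x2000,
		.priv_data_size = 64, .num_devices = 1, .num_bos = 0,
	};

	assert(model_old_check(&no_bos) == -EINVAL);
	assert(model_new_check(&no_bos) == 0);
}
```

A request that claims BOs but passes a NULL `bos` pointer is still rejected by
both predicates.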

 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 43115a3744694..8535a52a62cab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2571,8 +2571,8 @@ static int criu_restore(struct file *filep,
 	pr_debug("CRIU restore (num_devices:%u num_bos:%u num_objects:%u priv_data_size:%llu)\n",
 		 args->num_devices, args->num_bos, args->num_objects, args->priv_data_size);
 
-	if (!args->bos || !args->devices || !args->priv_data || !args->priv_data_size ||
-	    !args->num_devices || !args->num_bos)
+	if ((args->num_bos > 0 && !args->bos) || !args->devices || !args->priv_data ||
+	    !args->priv_data_size || !args->num_devices)
 		return -EINVAL;
 
 	mutex_lock(&p->mutex);
-- 
2.51.0



* [PATCH AUTOSEL 6.17] drm/amd/pm: refine amdgpu pm sysfs node error code
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.1] drm/amdgpu: Allow kfd CRIU with no buffer objects Sasha Levin
@ 2025-10-25 15:57 ` Sasha Levin
  2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.6] drm/amdkfd: Handle lack of READ permissions in SVM mapping Sasha Levin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 15:57 UTC
  To: patches, stable
  Cc: Yang Wang, Lijo Lazar, Alex Deucher, Sasha Levin, kenneth.feng,
	amd-gfx

From: Yang Wang <kevinyang.wang@amd.com>

[ Upstream commit cf32515a70618c0fb2319bd4a855f4d9447940a8 ]

v1:
Returns different error codes based on the scenario to help the user app understand
the AMDGPU device status when an exception occurs.

v2:
Change -ENODEV to -EBUSY.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What changed
  - The function `amdgpu_pm_dev_state_check()` now returns `-EBUSY`
    instead of `-EPERM` when the device is in GPU reset or system
    suspend:
    - `drivers/gpu/drm/amd/pm/amdgpu_pm.c:112`: `if
      (amdgpu_in_reset(adev)) return -EBUSY;` (was `-EPERM`)
    - `drivers/gpu/drm/amd/pm/amdgpu_pm.c:115`: `if (adev->in_suspend &&
      !runpm_check) return -EBUSY;` (was `-EPERM`)
  - This function gates access in `amdgpu_pm_get_access()` and
    `amdgpu_pm_get_access_if_active()`:
    - `drivers/gpu/drm/amd/pm/amdgpu_pm.c:133`: `ret =
      amdgpu_pm_dev_state_check(adev, true);`
    - `drivers/gpu/drm/amd/pm/amdgpu_pm.c:153`: `ret =
      amdgpu_pm_dev_state_check(adev, false);`
  - Numerous PM-related sysfs show/store handlers directly return the
    `ret` from these helpers (e.g., `amdgpu_get_power_dpm_state()`
    returns `ret` on failure), so the errno visible to userspace changes
    from `-EPERM` to `-EBUSY` when the device is resetting or suspended
    (example call and return: `drivers/gpu/drm/amd/pm/amdgpu_pm.c:217`
    onward in the `amdgpu_get_power_dpm_state` path shows the pattern of
    `ret = ...; if (ret) return ret;`).

- Why it’s a bug fix suitable for stable
  - Correctness/semantics: `-EPERM` indicates a permissions problem,
    which is misleading here; the device is temporarily unavailable due
    to reset or suspend. `-EBUSY` accurately communicates a transient
    busy state and invites retry, which aligns better with userspace
    expectations and error handling.
  - Scope and risk: The change is tiny and localized to return codes in
    a single helper. It does not alter call sequences, state checks, PM
    flows, or locking. No ABI or uAPI additions, no
    structural/architectural changes.
  - Impacted surface: Only sysfs PM nodes’ errno in specific exceptional
    states. In-kernel callers are not affected (the helpers are
    `static`). Userspace seeing `-EBUSY` instead of `-EPERM` is an
    improvement for diagnostics and retry logic. AMDGPU already returns
    `-EBUSY` in analogous busy conditions elsewhere, so this aligns with
    existing patterns.
  - Stability: No performance, functional, or security regression
    vectors are introduced. The remaining `-EPERM` usage in
    `amdgpu_pm_get_access_if_active()` when the device is not active
    (`drivers/gpu/drm/amd/pm/amdgpu_pm.c:163`) is untouched, keeping
    behavior consistent for that distinct case.

- Backport considerations
  - The patch is self-contained and minimal. If the target stable trees
    already have `amdgpu_pm_dev_state_check()` and the access helpers,
    this applies cleanly. If older trees did the checks inline in each
    sysfs op, the backport would require equivalent one-line
    replacements at those sites; still trivial and low risk.
  - No user-visible feature change; only corrected errno in exceptional
    conditions, which is a common and acceptable class of stable fixes.

Given the small, precise nature of the change, its correctness benefit,
and minimal regression risk, this is a good candidate for stable
backporting.
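
The errno change and its effect on a caller can be modeled in userspace. This
is a sketch under assumptions: `struct model_adev` only mirrors the three
flags used in the diff, and `model_should_retry()` is a hypothetical userspace
policy, not something the patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in carrying only the state flags used in the diff. */
struct model_adev {
	bool in_reset;
	bool in_suspend;
	bool in_runpm;
};

/* Mirrors the patched helper: transient states now report -EBUSY. */
static int model_pm_dev_state_check(const struct model_adev *adev, bool runpm)
{
	bool runpm_check = runpm ? adev->in_runpm : false;

	if (adev->in_reset)
		return -EBUSY;

	if (adev->in_suspend && !runpm_check)
		return -EBUSY;

	return 0;
}

/* Hypothetical caller policy: retry on a busy device, not on -EPERM. */
static bool model_should_retry(int err)
{
	return err == -EBUSY;
}

/* A device in reset is reported as busy, so the caller may retry. */
static void model_pm_example(void)
{
	struct model_adev resetting = { .in_reset = true };
	struct model_adev idle = { 0 };

	assert(model_pm_dev_state_check(&resetting, true) == -EBUSY);
	assert(model_should_retry(model_pm_dev_state_check(&resetting, true)));
	assert(model_pm_dev_state_check(&idle, true) == 0);
	assert(!model_should_retry(-EPERM));
}
```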

 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 5fbfe7333b54d..1fca183827c7c 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -110,9 +110,10 @@ static int amdgpu_pm_dev_state_check(struct amdgpu_device *adev, bool runpm)
 	bool runpm_check = runpm ? adev->in_runpm : false;
 
 	if (amdgpu_in_reset(adev))
-		return -EPERM;
+		return -EBUSY;
+
 	if (adev->in_suspend && !runpm_check)
-		return -EPERM;
+		return -EBUSY;
 
 	return 0;
 }
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.6] drm/amdkfd: Handle lack of READ permissions in SVM mapping
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/amd/pm: refine amdgpu pm sysfs node error code Sasha Levin
@ 2025-10-25 15:58 ` Sasha Levin
  2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.6] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Sasha Levin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 15:58 UTC
  To: patches, stable
  Cc: Kent Russell, Felix Kuehling, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Kent Russell <kent.russell@amd.com>

[ Upstream commit 0ed704d058cec7643a716a21888d58c7d03f2c3e ]

HMM assumes that pages have READ permissions by default. Inside
svm_range_validate_and_map, we add READ permissions then add WRITE
permissions if the VMA isn't read-only. This will conflict with regions
that only have PROT_WRITE or have PROT_NONE. When that happens,
svm_range_restore_work will continue to retry silently, giving the
impression of a hang if pr_debug isn't enabled to show the retries.

If pages don't have READ permissions, simply unmap them and continue. If
they weren't mapped in the first place, this would be a no-op. Since x86
doesn't support write-only, and PROT_NONE doesn't allow reads or writes
anyways, this will allow the svm range validation to continue without
getting stuck in a loop forever on mappings we can't use with HMM.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

Explanation
- Bug fixed: The change addresses an indefinite retry loop (apparent
  hang) in HMM-backed SVM mapping when encountering VMAs without read
  permission, specifically write-only mappings and PROT_NONE. The loop
  is triggered because HMM assumes READ by default and the existing code
  adds READ then WRITE in svm_range_validate_and_map. That conflicts
  with mappings that lack READ and causes svm_range_restore_work to
  silently retry forever.
- Core change: In drivers/gpu/drm/amd/amdkfd/kfd_svm.c inside
  svm_range_validate_and_map, after resolving the VMA and before calling
  amdgpu_hmm_range_get_pages, the patch adds a guard:
  - Check: if (!(vma->vm_flags & VM_READ)) { … continue; }
  - Behavior on no-READ: Acquire range lock, optionally pr_debug if
    VM_WRITE is set without VM_READ, compute the intersection of the
    current address range with prange, call
    svm_range_unmap_from_gpus(prange, s, e,
    KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU), unlock, advance addr to next,
    and continue.
  - This explicitly treats PROT_NONE and write-only VMAs as unmappable
    for HMM/SVM and avoids mapping attempts that will never succeed.
- Containment: The change is localized to a single function and code
  path used during SVM range validation/mapping. No APIs or data
  structures are changed. It only affects the slow path when
  encountering a VMA without VM_READ; normal mappings (with READ) follow
  the existing flow unchanged.
- User impact: Prevents a hang-like condition (endless retry) that users
  would experience during SVM range restoration/validation when a VMA
  has PROT_NONE or write-only protection. This is a real, user-facing
  bug that can stall workloads using KFD/HMM.
- Risk assessment:
  - Minimal risk: The unmap is a no-op if the pages were not mapped (“If
    they weren't mapped in the first place, this would be a no-op”), and
    otherwise it correctly tears down mappings that cannot be used by
    HMM anyway (x86 doesn’t support write-only, and PROT_NONE allows
    neither reads nor writes).
  - No architectural changes: No redesign or wide-reaching behavior
    changes outside of this corner case. Logging is via pr_debug, so no
    noisy kernel logs in production.
- Stable criteria:
  - Important bugfix affecting users: Yes (prevents indefinite
    retry/hang).
  - Small and contained: Yes (~20 lines, one function).
  - No new features: Correct.
  - No broad side effects: Correct; behavior is limited to VMAs lacking
    READ, which cannot be supported by HMM.
  - Explicit stable tag: Not shown in the snippet, but the fix clearly
    meets stable backport guidelines due to the hang avoidance and
    limited scope.

Notes for backporters
- Interface compatibility: The patch relies on existing primitives
  present in amdkfd SVM code paths: vma_lookup, svm_range_lock/unlock,
  svm_range_unmap_from_gpus, KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU, and
  prange fields (start/last). These are stable in recent kernels that
  have SVM/HMM in KFD.
- Unit consistency check: Ensure the parameters passed to
  svm_range_unmap_from_gpus(prange, s, e, …) are in the units expected
  by your target stable branch. In the shown diff, s/e are computed as s
  = max(start, prange->start) and e = min(end, prange->last) where
  start/end appear to be byte addresses (start = map_start <<
  PAGE_SHIFT) and prange->start/last are often page indices in KFD SVM
  code. Upstream code likely uses consistent units (either all pages or
  all bytes). When backporting, verify that s/e match the function’s
  expected units (adjust by PAGE_SHIFT if necessary) to avoid off-by-
  PAGE_SHIFT mistakes.
- Validation suggestion: Reproduce with a user VMA set to PROT_NONE or
  write-only protection and trigger SVM range validation (e.g., by
  causing GPU access). Before the fix, svm_range_restore_work would
  continuously retry; after the fix, the range is unmapped and
  validation proceeds without looping.
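
The unit check described above can be sketched in plain userspace C. This
is a minimal illustration, not the driver's code: `range_sketch`,
`clamp_to_range`, and `SKETCH_PAGE_SHIFT` are hypothetical names, and the
sketch assumes 4 KiB pages, a byte-addressed half-open input range, and
inclusive page indices in the stored range.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two prange fields the patch reads. */
struct range_sketch {
	unsigned long start; /* first page index, inclusive */
	unsigned long last;  /* last page index, inclusive */
};

/*
 * Clamp the byte range [start_bytes, end_bytes) to the page-index range
 * stored in r. Both sides are converted to page indices before comparing,
 * so the result is in one consistent unit. Returns false when the two
 * ranges do not intersect.
 */
static bool clamp_to_range(const struct range_sketch *r,
			   unsigned long start_bytes, unsigned long end_bytes,
			   unsigned long *s, unsigned long *e)
{
	unsigned long first_page, last_page;

	if (end_bytes <= start_bytes)
		return false;

	first_page = start_bytes >> SKETCH_PAGE_SHIFT;
	last_page = (end_bytes - 1) >> SKETCH_PAGE_SHIFT;

	*s = first_page > r->start ? first_page : r->start;
	*e = last_page < r->last ? last_page : r->last;
	return *e >= *s;
}
```

Whether the target branch expects pages or bytes at the
svm_range_unmap_from_gpus call site still has to be confirmed against that
branch's source; the sketch only shows why both operands of max/min need
the same unit.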

Why this is safe and needed
- The patch turns an unrecoverable, retry-forever condition into a
  deterministic handling path by unmapping and moving on. It does not
  try to force unsupported permissions and does not alter behavior for
  the common case. It matches HMM’s requirement that mappings be at
  least readable and avoids futile retry cycles. This is precisely the
  kind of small, correctness-oriented fix that minimizes regression risk
  and improves robustness for users of amdkfd/HMM.

 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 3d8b20828c068..cecdbcea0bb90 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1714,6 +1714,29 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 
 			next = min(vma->vm_end, end);
 			npages = (next - addr) >> PAGE_SHIFT;
+			/* HMM requires at least READ permissions. If provided with PROT_NONE,
+			 * unmap the memory. If it's not already mapped, this is a no-op
+			 * If PROT_WRITE is provided without READ, warn first then unmap
+			 */
+			if (!(vma->vm_flags & VM_READ)) {
+				unsigned long e, s;
+
+				svm_range_lock(prange);
+				if (vma->vm_flags & VM_WRITE)
+					pr_debug("VM_WRITE without VM_READ is not supported");
+				s = max(start, prange->start);
+				e = min(end, prange->last);
+				if (e >= s)
+					r = svm_range_unmap_from_gpus(prange, s, e,
+						       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
+				svm_range_unlock(prange);
+				/* If unmap returns non-zero, we'll bail on the next for loop
+				 * iteration, so just leave r and continue
+				 */
+				addr = next;
+				continue;
+			}
+
 			WRITE_ONCE(p->svms.faulting_task, current);
 			r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
 						       readonly, owner, NULL,
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.6] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.6] drm/amdkfd: Handle lack of READ permissions in SVM mapping Sasha Levin
@ 2025-10-25 15:59 ` Sasha Levin
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: return -ENOTTY for unsupported IOCTLs Sasha Levin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 15:59 UTC (permalink / raw)
  To: patches, stable
  Cc: Yifan Zhang, Philip Yang, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Yifan Zhang <yifan1.zhang@amd.com>

[ Upstream commit 99d7181bca34e96fbf61bdb6844918bdd4df2814 ]

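There is a race between amdgpu_amdkfd_device_fini_sw and interrupt
handling: if amdgpu_amdkfd_device_fini_sw is running between
kfd_cleanup_nodes and kfree(kfd) when a KGD interrupt is generated, the
interrupt handler dereferences a node pointer that has already been
cleared.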

kernel panic log:

BUG: kernel NULL pointer dereference, address: 0000000000000098
amdgpu 0000:c8:00.0: amdgpu: Requesting 4 partitions through PSP
PGD d78c68067 P4D d78c68067
kfd kfd: amdgpu: Allocated 3969056 bytes on gart
PUD 1465b8067 PMD 0
Oops: 0002 [#1] SMP NOPTI
kfd kfd: amdgpu: Total number of KFD nodes to be created: 4
CPU: 115 PID: 0 Comm: swapper/115 Kdump: loaded Tainted: G S W OE K
RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40
Code: 89 e0 41 5c c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41 54 9c 41 5c fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 0a 4c 89 e0 41 5c
89 c6 e8 07 38 5d
RSP: 0018:ffffc9001a6b0e28 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
RDX: 0000000000000001 RSI: ffff8883bb623e00 RDI: 0000000000000098
RBP: ffff8883bb000000 R08: ffff888100055020 R09: ffff888100055020
R10: 0000000000000000 R11: 0000000000000000 R12: 0900000000000002
R13: ffff888f2b97da00 R14: 0000000000000098 R15: ffff8883babdf000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000098 CR3: 0000000e7cae2006 CR4: 0000000002770ce0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
kgd2kfd_interrupt+0x6b/0x1f0 [amdgpu]
? amdgpu_fence_process+0xa4/0x150 [amdgpu]
kfd kfd: amdgpu: Node: 0, interrupt_bitmap: 3 YcpxFl Rant tErace
amdgpu_irq_dispatch+0x165/0x210 [amdgpu]
amdgpu_ih_process+0x80/0x100 [amdgpu]
amdgpu: Virtual CRAT table created for GPU
amdgpu_irq_handler+0x1f/0x60 [amdgpu]
__handle_irq_event_percpu+0x3d/0x170
amdgpu: Topology: Add dGPU node [0x74a2:0x1002]
handle_irq_event+0x5a/0xc0
handle_edge_irq+0x93/0x240
kfd kfd: amdgpu: KFD node 1 partition 0 size 49148M
asm_call_irq_on_stack+0xf/0x20
</IRQ>
common_interrupt+0xb3/0x130
asm_common_interrupt+0x1e/0x40
5.10.134-010.a1i5000.a18.x86_64 #1

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- `kfd_cleanup_nodes` nulls out each entry in `kfd->nodes` while tearing
  the device down (`drivers/gpu/drm/amd/amdkfd/kfd_device.c:654-676`).
  If an interrupt fires in that window, the old code in
  `kgd2kfd_interrupt` still dereferences the stale slot and immediately
  touches `node->interrupt_lock`, so a NULL entry explodes exactly as
  shown in the panic log (offset 0x98 into a NULL `node`).
- The patch now defends that loop: before grabbing the lock it verifies
  the slot is still populated and bails out if it is already NULL
  (`drivers/gpu/drm/amd/amdkfd/kfd_device.c:1137-1144`). That mirrors
  the teardown progress—once the first node is gone we are already in
  device finalization—so dropping the interrupt is harmless, and more
  importantly it eliminates the crash.
- The change is tiny, self-contained, and has no functional impact while
  the device is operational because `kfd->nodes[i]` remains non-NULL
  outside of teardown. It only touches this ISR path and does not rely
  on newer framework helpers, so it applies cleanly to older trees.
- The bug has already been observed on production kernels (the panic log
  is from a 5.10-based kernel), making this a real user-visible bug rather
  than a theoretical race. Given the severity (interrupt-time NULL
  dereference) and the minimal risk of the guard, it is a strong stable
  backport candidate.
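
The guard described above can be modelled in userspace C as follows. This
is only a sketch under stated assumptions: `dev_sketch`, `node_sketch`,
and `dispatch_interrupt` are hypothetical stand-ins for the kfd structures
and the ISR loop, and locking is omitted entirely.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 8

/* Hypothetical stand-in for a kfd node; only the fields the loop needs. */
struct node_sketch {
	int interrupts_active;
	unsigned int handled; /* interrupts delivered to this node */
};

/* Hypothetical stand-in for the device holding the node table. */
struct dev_sketch {
	unsigned int num_nodes;
	struct node_sketch *nodes[SKETCH_MAX_NODES];
};

/*
 * Walk the node table and deliver the interrupt to each active node.
 * A NULL slot means teardown has started, so the interrupt is dropped.
 * Returns how many nodes were visited before returning.
 */
static unsigned int dispatch_interrupt(struct dev_sketch *dev)
{
	unsigned int i, visited = 0;

	for (i = 0; i < dev->num_nodes; i++) {
		/* Read the slot once, then test and use the same value. */
		struct node_sketch *node = dev->nodes[i];

		if (!node)
			return visited;

		if (node->interrupts_active)
			node->handled++;
		visited++;
	}
	return visited;
}
```

The sketch reads the slot into a local before testing it, which differs
slightly from the patch (the patch tests `kfd->nodes[i]` and then reads it
again). It also says nothing about the lifetime of the device structure
itself, which the guard alone does not protect.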

 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 349c351e242b5..051a00152b089 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	}
 
 	for (i = 0; i < kfd->num_nodes; i++) {
-		node = kfd->nodes[i];
+		/* Race if another thread in b/w
+		 * kfd_cleanup_nodes and kfree(kfd),
+		 * when kfd->nodes[i] = NULL
+		 */
+		if (kfd->nodes[i])
+			node = kfd->nodes[i];
+		else
+			return;
+
 		spin_lock_irqsave(&node->interrupt_lock, flags);
 
 		if (node->interrupts_active
-- 
2.51.0



* [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: return -ENOTTY for unsupported IOCTLs
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.6] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Sasha Levin
@ 2025-10-25 16:00 ` Sasha Levin
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] drm/amdkfd: fix vram allocation failure for a special case Sasha Levin
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 16:00 UTC (permalink / raw)
  To: patches, stable
  Cc: Geoffrey McRae, Alex Deucher, Felix Kuehling, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Geoffrey McRae <geoffrey.mcrae@amd.com>

[ Upstream commit 57af162bfc8c05332a28c4d458d246cc46d2746d ]

Some kfd ioctls may not be available depending on the kernel version the
user is running. In that case we need to report -ENOTTY so userland can
determine the cause of the ioctl failure.

Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What changed
  - Unsupported amdkfd ioctls now return -ENOTTY instead of the previous
    default -EINVAL on two early error paths:
    - When `_IOC_NR(cmd)` is beyond the table: `nr >=
      AMDKFD_CORE_IOCTL_COUNT` now sets `retcode = -ENOTTY` before `goto
      err_i1` (drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:3256).
    - When the ioctl number is outside the defined KFD command range:
      the `else` branch after the `(nr >= AMDKFD_COMMAND_START) && (nr <
      AMDKFD_COMMAND_END)` check now sets `retcode = -ENOTTY` before
      `goto err_i1` (drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:3270).
  - Prior to this change, both paths fell through with `retcode`
    initialized to -EINVAL
    (drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:3253).

- Why it matters
  - KFD adds ioctls over time; user space built against newer headers
    may issue an ioctl unknown to an older kernel. Returning -ENOTTY is
    the canonical “inappropriate ioctl for device/unsupported ioctl”
    signal, allowing user space to distinguish “unsupported on this
    kernel” from “bad arguments to a supported ioctl” (which should be
    -EINVAL).
  - This aligns KFD’s error semantics with kernel ioctl conventions used
    elsewhere. For example, DRM core returns -ENOTTY for unsupported
    ioctl types (drivers/gpu/drm/drm_ioctl.c:782), and many drivers use
    -ENOTTY to indicate unsupported ioctls.

- Scope and risk
  - Change is small and tightly scoped to `kfd_ioctl` error handling for
    unsupported commands only
    (drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:3244).
  - No architectural changes, no new UAPI or features, and no functional
    change for supported ioctls.
  - Potential regression risk is low: only the errno for “unsupported
    ioctl” changes from -EINVAL to -ENOTTY. Well-behaved user space
    should treat -ENOTTY (not -EINVAL) as the indicator for an
    unsupported ioctl. This is correcting an API bug rather than
    changing intended behavior.
  - Security impact is nil; control flow and data handling are
    unchanged.

- Stable backport criteria
  - Fixes a real user-visible bug: ambiguous errno on unsupported ioctls
    made feature detection and fallback logic in userland unreliable.
  - Minimal, self-contained patch limited to amdkfd.
  - No side effects beyond improving errno correctness for unsupported
    ioctls.
  - Consistent with prior amdkfd errno cleanups (e.g., “Return proper
    error code for gws alloc API”) and general kernel ioctl practices.

Given these points, this is a good, low-risk candidate for stable
backport.
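
From the user-space side, the distinction described above could be
consumed roughly as in the sketch below. The enum and
`classify_ioctl_result` are hypothetical helpers written for this
illustration; they are not part of any KFD or libdrm API.

```c
#include <errno.h>

/* Hypothetical classification of an ioctl outcome for feature probing. */
enum ioctl_probe_result {
	IOCTL_PROBE_OK,          /* the call succeeded */
	IOCTL_PROBE_UNSUPPORTED, /* ENOTTY: this kernel lacks the ioctl */
	IOCTL_PROBE_BAD_ARGS,    /* EINVAL: ioctl exists, arguments rejected */
	IOCTL_PROBE_OTHER_ERROR, /* anything else */
};

/*
 * ret is the return value of ioctl(); err is the errno value captured
 * immediately after the call.
 */
static enum ioctl_probe_result classify_ioctl_result(int ret, int err)
{
	if (ret >= 0)
		return IOCTL_PROBE_OK;

	switch (err) {
	case ENOTTY:
		return IOCTL_PROBE_UNSUPPORTED;
	case EINVAL:
		return IOCTL_PROBE_BAD_ARGS;
	default:
		return IOCTL_PROBE_OTHER_ERROR;
	}
}
```

A caller would store the ioctl() return value first and then pass it with
errno, for example `int ret = ioctl(fd, cmd, &args);` followed by
`classify_ioctl_result(ret, errno)`. On kernels without this patch an
unknown KFD ioctl still yields EINVAL, so the "bad arguments" result stays
ambiguous there.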

 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 828a9ceef1e76..79ed3be63d0dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -3252,8 +3252,10 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	int retcode = -EINVAL;
 	bool ptrace_attached = false;
 
-	if (nr >= AMDKFD_CORE_IOCTL_COUNT)
+	if (nr >= AMDKFD_CORE_IOCTL_COUNT) {
+		retcode = -ENOTTY;
 		goto err_i1;
+	}
 
 	if ((nr >= AMDKFD_COMMAND_START) && (nr < AMDKFD_COMMAND_END)) {
 		u32 amdkfd_size;
@@ -3266,8 +3268,10 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 			asize = amdkfd_size;
 
 		cmd = ioctl->cmd;
-	} else
+	} else {
+		retcode = -ENOTTY;
 		goto err_i1;
+	}
 
 	dev_dbg(kfd_device, "ioctl cmd 0x%x (#0x%x), arg 0x%lx\n", cmd, nr, arg);
 
-- 
2.51.0



* [PATCH AUTOSEL 6.17-6.1] drm/amdkfd: fix vram allocation failure for a special case
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (6 preceding siblings ...)
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: return -ENOTTY for unsupported IOCTLs Sasha Levin
@ 2025-10-25 16:00 ` Sasha Levin
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 16:00 UTC (permalink / raw)
  To: patches, stable
  Cc: Eric Huang, Harish Kasiviswanathan, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Eric Huang <jinhuieric.huang@amd.com>

[ Upstream commit 93aa919ca05bec544b17ee9a1bfe394ce6c94bd8 ]

When only VRAM is allocated without a VA (va_addr is 0) and an existing
SVM range falls within the resulting [0, size) range, the VRAM
allocation fails. The SVM usage check should be skipped for this case.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

This is a small, targeted bug fix that prevents erroneous -EADDRINUSE
failures when userspace allocates a VRAM buffer without providing a VA
(i.e., `va_addr == 0`). The change is confined to the KFD ioctl path and
poses minimal regression risk while fixing a real user-visible issue.

What changed
- In `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:1045`, inside
  `kfd_ioctl_alloc_memory_of_gpu`, the SVM overlap check was amended to
  skip a special case:
  - New guard added at `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:1071`:
    - `if (!(!args->va_addr && (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM))
      && interval_tree_iter_first(...)) { ... return -EADDRINUSE; }`
  - Practically, this means the SVM interval-tree overlap check is
    bypassed only when:
    - `args->va_addr == 0` (no VA requested), and
    - `flags` includes `KFD_IOC_ALLOC_MEM_FLAGS_VRAM`.
  - Previously, the overlap check was unconditional, which could falsely
    report “Address already allocated by SVM” when VA is 0 (see the
    surrounding context at
    `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:1064-1079`).

Why it’s a bug fix
- The commit message accurately describes a failure mode: when
  allocating VRAM-only without a VA (VA=0) and there exists an SVM range
  that falls in that [0, size) range, the ioctl incorrectly returns
  `-EADDRINUSE`. For VRAM-only allocations without a VA, SVM address-
  range conflicts are irrelevant and should not block allocation.
- The code change corrects this by skipping the SVM overlap check for
  that specific case, avoiding a false-positive error.

Safety and scope
- Minimal, localized change: It adds a single conditional guard and
  comment in one function. No ABI or architectural changes.
- Confined to AMD KFD user memory allocation path; does not touch core
  MM, scheduler, or unrelated GPU subsystems.
- Consistency with mapping rules: mapping requires a non-zero VA. In
  `kfd_mem_attach` (called during mapping), mapping with `mem->va == 0`
  is rejected
  (`drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:858-930`, check at
  “if (!va) { ... return -EINVAL; }”). This ensures that skipping the
  SVM check for VA=0 can’t accidentally permit an overlapping SVM GPU-VA
  mapping later: mapping at VA=0 is inherently invalid and denied. Thus
  the change strictly avoids a spurious allocation-time error without
  enabling unsafe mappings.
- Flags behavior matches UAPI: `KFD_IOC_ALLOC_MEM_FLAGS_VRAM` is
  intended for VRAM allocations (`include/uapi/linux/kfd_ioctl.h:407`).
  VRAM-only allocations with VA=0 are valid for certain use cases (e.g.,
  export or CPU-visible VRAM on large BAR), and should not be blocked by
  SVM interval checks.

Stable backport criteria
- Fixes a real bug affecting users (spurious -EADDRINUSE on valid VRAM-
  only allocations).
- Change is small and contained, with clear intent and low regression
  risk.
- No new features or architectural shifts.
- Touches only driver code in a single path
  (`kfd_ioctl_alloc_memory_of_gpu`), no widespread side effects.

Conclusion
- This is a clear, minimal bug fix that prevents erroneous allocation
  failures and aligns with the mapping semantics already enforced
  elsewhere. It is suitable for stable backport.
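
The new condition can be read as a small predicate. The sketch below
restates it in standalone C; `needs_svm_overlap_check` and
`SKETCH_ALLOC_FLAG_VRAM` are hypothetical names, and the flag's bit
position is an assumption standing in for KFD_IOC_ALLOC_MEM_FLAGS_VRAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for KFD_IOC_ALLOC_MEM_FLAGS_VRAM; the bit value is assumed. */
#define SKETCH_ALLOC_FLAG_VRAM (1u << 0)

/*
 * Mirrors the first half of the patched condition: the SVM interval-tree
 * lookup is skipped only for a VRAM allocation that carries no VA.
 */
static bool needs_svm_overlap_check(uint64_t va_addr, uint32_t flags)
{
	bool vram_without_va = !va_addr && (flags & SKETCH_ALLOC_FLAG_VRAM);

	return !vram_without_va;
}
```

Every other combination, including a zero VA with a non-VRAM flag, still
goes through the overlap lookup as before.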

 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 79ed3be63d0dd..43115a3744694 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1070,7 +1070,12 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	svm_range_list_lock_and_flush_work(&p->svms, current->mm);
 	mutex_lock(&p->svms.lock);
 	mmap_write_unlock(current->mm);
-	if (interval_tree_iter_first(&p->svms.objects,
+
+	/* Skip a special case that allocates VRAM without VA,
+	 * VA will be invalid of 0.
+	 */
+	if (!(!args->va_addr && (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)) &&
+	    interval_tree_iter_first(&p->svms.objects,
 				     args->va_addr >> PAGE_SHIFT,
 				     (args->va_addr + args->size - 1) >> PAGE_SHIFT)) {
 		pr_err("Address: 0x%llx already allocated by SVM\n",
-- 
2.51.0



* [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption
       [not found] <20251025160905.3857885-1-sashal@kernel.org>
                   ` (7 preceding siblings ...)
  2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] drm/amdkfd: fix vram allocation failure for a special case Sasha Levin
@ 2025-10-25 16:00 ` Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-10-25 16:00 UTC (permalink / raw)
  To: patches, stable
  Cc: Amber Lin, Harish Kasiviswanathan, Alex Deucher, Sasha Levin,
	Felix.Kuehling, amd-gfx

From: Amber Lin <Amber.Lin@amd.com>

[ Upstream commit f3820e9d356132e18405cd7606e22dc87ccfa6d1 ]

When KFD asks CP to preempt queues, CP preempts the CP queues and also
requests SDMA to preempt SDMA queues, using UNMAP_LATENCY as the timeout.
queue_preemption_timeout_ms defaults to 9000 ms and can be configured via
a module parameter, but KFD_UNMAP_LATENCY_MS is hard-coded to 4000 ms.
This patch ties KFD_UNMAP_LATENCY_MS to queue_preemption_timeout_ms so
that on a slow system such as an emulator, both CP and SDMA slowness are
taken into account.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What changed
  - Replaces hard-coded `KFD_UNMAP_LATENCY_MS (4000)` with a value
    derived from the existing module parameter
    `queue_preemption_timeout_ms`: `((queue_preemption_timeout_ms -
    queue_preemption_timeout_ms / 10) >> 1)` in
    `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:120`. This budgets ~45% of
    the total preemption timeout for each of the two SDMA engines,
    leaving ~10% for CP overhead, per the new comment in
    `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:114`.
  - `queue_preemption_timeout_ms` is already a public module parameter
    with default 9000 ms in
    `drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:833`, documented at
    `drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:835`, and declared for KFD
    use at `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:195`.

- Why it matters (bug and impact)
  - When KFD asks CP to preempt queues, CP also requests SDMA to preempt
    SDMA queues with an UNMAP latency. The driver waits for the CP fence
    using `queue_preemption_timeout_ms` (see
    `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2402`), but
    previously SDMA’s UNMAP latency was fixed at 4000 ms. This mismatch
    can cause spurious preemption timeouts on slow systems (e.g.,
    emulators) or when users tune the module parameter, leading to
    preempt failures and potential error paths like “The cp might be in
    an unrecoverable state due to an unsuccessful queues preemption.”
  - By tying `KFD_UNMAP_LATENCY_MS` to `queue_preemption_timeout_ms`,
    the SDMA preemption budget scales consistently with the CP fence
    wait, avoiding premature timeouts and improving reliability.

- Where the new value is used
  - Programmed into MES/PM4 packets (units of 100 ms):
    `packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY_MS / 100;` in
    `drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c:129` and
    `drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c:205`.
  - Passed as the timeout when destroying MQDs (preempt/unmap paths):
    calls to `mqd_mgr->destroy_mqd(..., KFD_UNMAP_LATENCY_MS, ...)` in
    `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:884`,
    `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:996`, and
    `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:1175`.
  - Used for resetting hung queues via `hqd_reset(...,
    KFD_UNMAP_LATENCY_MS)` in
    `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2230`.

- Stable criteria assessment
  - Fixes a real-world reliability issue (timeouts/mismatched budgets)
    that affects users, especially on slow systems and when
    `queue_preemption_timeout_ms` is tuned.
  - Change is small, contained to a single macro in one header
    (`kfd_priv.h`) with clear rationale and no architectural
    refactoring.
  - Side effects are minimal: default behavior remains effectively
    unchanged (for 9000 ms, `KFD_UNMAP_LATENCY_MS` becomes 4050 ms,
    which still programs 40 once integer-divided into 100 ms units),
    while non-default configurations become consistent and safer.
  - Touches KFD/amdgpu preemption logic but only adjusts a timeout
    parameter already designed to be user-configurable; no new features
    introduced.

Given the above, this is a low-risk, correctness-improving timeout
alignment and a good candidate for backporting to stable.
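
The arithmetic behind the new macro can be checked with a few lines of
standalone C. The helper names below are hypothetical and the parameter is
passed as a plain `int` argument for illustration; only the expression
itself is taken from the patch.

```c
/*
 * Same expression as the patched KFD_UNMAP_LATENCY_MS macro: drop 10% of
 * the total timeout for CP work, then split the remainder in two.
 */
static int unmap_latency_ms(int queue_preemption_timeout_ms)
{
	return (queue_preemption_timeout_ms -
		queue_preemption_timeout_ms / 10) >> 1;
}

/* Value written into a packet field that counts in 100 ms units. */
static int unmap_latency_field(int queue_preemption_timeout_ms)
{
	return unmap_latency_ms(queue_preemption_timeout_ms) / 100;
}
```

With the default of 9000 ms this gives 4050 ms and a packet field of 40,
the same field value that the old fixed 4000 ms produced. A timeout of
20000 ms gives 9000 ms and a field of 90.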

 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 67694bcd94646..d01ef5ac07666 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -111,7 +111,14 @@
 
 #define KFD_KERNEL_QUEUE_SIZE 2048
 
-#define KFD_UNMAP_LATENCY_MS	(4000)
+/*  KFD_UNMAP_LATENCY_MS is the timeout CP waiting for SDMA preemption. One XCC
+ *  can be associated to 2 SDMA engines. queue_preemption_timeout_ms is the time
+ *  driver waiting for CP returning the UNMAP_QUEUE fence. Thus the math is
+ *  queue_preemption_timeout_ms = sdma_preemption_time * 2 + cp workload
+ *  The format here makes CP workload 10% of total timeout
+ */
+#define KFD_UNMAP_LATENCY_MS	\
+	((queue_preemption_timeout_ms - queue_preemption_timeout_ms / 10) >> 1)
 
 #define KFD_MAX_SDMA_QUEUES	128
 
-- 
2.51.0



end of thread, other threads:[~2025-10-25 16:27 UTC | newest]

Thread overview: 9+ messages
     [not found] <20251025160905.3857885-1-sashal@kernel.org>
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] amd/amdkfd: enhance kfd process check in switch partition Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-6.12] drm/amdgpu: fix nullptr err of vm_handle_moved Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.1] drm/amdgpu: Allow kfd CRIU with no buffer objects Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/amd/pm: refine amdgpu pm sysfs node error code Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.6] drm/amdkfd: Handle lack of READ permissions in SVM mapping Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.6] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: return -ENOTTY for unsupported IOCTLs Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] drm/amdkfd: fix vram allocation failure for a special case Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-5.4] drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption Sasha Levin
