* [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
@ 2025-09-24 15:29 Yifan Zhang
  2025-09-24 15:29 ` [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition Yifan Zhang
  2025-09-24 22:48 ` [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Philip Yang
  0 siblings, 2 replies; 16+ messages in thread
From: Yifan Zhang @ 2025-09-24 15:29 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar, Yifan Zhang

There is a race between amdgpu_amdkfd_device_fini_sw and the interrupt
path: if amdgpu_amdkfd_device_fini_sw runs in between kfd_cleanup_nodes
and kfree(kfd) while a KGD interrupt is generated, the interrupt handler
dereferences an already-freed kfd->nodes[i] entry.

kernel panic log (interleaved with concurrent probe messages):

BUG: kernel NULL pointer dereference, address: 0000000000000098
amdgpu 0000:c8:00.0: amdgpu: Requesting 4 partitions through PSP
PGD d78c68067 P4D d78c68067
kfd kfd: amdgpu: Allocated 3969056 bytes on gart
PUD 1465b8067 PMD 0
Oops: 0002 [#1] SMP NOPTI
kfd kfd: amdgpu: Total number of KFD nodes to be created: 4
CPU: 115 PID: 0 Comm: swapper/115 Kdump: loaded Tainted: G S W OE K
RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40
Code: 89 e0 41 5c c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41 54 9c 41 5c fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 0a 4c 89 e0 41 5c 89 c6 e8 07 38 5d
RSP: 0018:ffffc9001a6b0e28 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
RDX: 0000000000000001 RSI: ffff8883bb623e00 RDI: 0000000000000098
RBP: ffff8883bb000000 R08: ffff888100055020 R09: ffff888100055020
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: ffff888f2b97da00 R14: 0000000000000098 R15: ffff8883babdf000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000098 CR3: 0000000e7cae2006 CR4: 0000000002770ce0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 kgd2kfd_interrupt+0x6b/0x1f0 [amdgpu]
 ? amdgpu_fence_process+0xa4/0x150 [amdgpu]
kfd kfd: amdgpu: Node: 0, interrupt_bitmap: 3
 amdgpu_irq_dispatch+0x165/0x210 [amdgpu]
 amdgpu_ih_process+0x80/0x100 [amdgpu]
amdgpu: Virtual CRAT table created for GPU
 amdgpu_irq_handler+0x1f/0x60 [amdgpu]
 __handle_irq_event_percpu+0x3d/0x170
amdgpu: Topology: Add dGPU node [0x74a2:0x1002]
 handle_irq_event+0x5a/0xc0
 handle_edge_irq+0x93/0x240
kfd kfd: amdgpu: KFD node 1 partition 0 size 49148M
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 common_interrupt+0xb3/0x130
 asm_common_interrupt+0x1e/0x40
5.10.134-010.ali5000.al8.x86_64 #1

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 349c351e242b..051a00152b08 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	}
 
 	for (i = 0; i < kfd->num_nodes; i++) {
-		node = kfd->nodes[i];
+		/* Bail out if another thread is between
+		 * kfd_cleanup_nodes and kfree(kfd) and has
+		 * set kfd->nodes[i] to NULL.
+		 */
+		if (kfd->nodes[i])
+			node = kfd->nodes[i];
+		else
+			return;
+
 		spin_lock_irqsave(&node->interrupt_lock, flags);
 
 		if (node->interrupts_active
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition
  2025-09-24 15:29 [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Yifan Zhang
@ 2025-09-24 15:29 ` Yifan Zhang
  2025-09-24 23:14   ` Philip Yang
  2025-09-26 19:52   ` Chen, Xiaogang
  2025-09-24 22:48 ` [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Philip Yang
  1 sibling, 2 replies; 16+ messages in thread
From: Yifan Zhang @ 2025-09-24 15:29 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar, Yifan Zhang

The current switch-partition path only checks whether kfd_processes_table
is empty. A kfd_processes_table entry is deleted in
kfd_process_notifier_release, but the kfd_process itself is torn down
later, in kfd_process_wq_release.

Consider two concurrent paths:

Process A (workqueue):        kfd_process_wq_release -> access kfd_node members
Process B (switch partition): amdgpu_xcp_pre_partition_switch ->
                              amdgpu_amdkfd_device_fini_sw -> kfd_node teardown

Processes A and B may race as shown in the dmesg log below.

Resolve the race by adding an atomic counter, kfd_processes_count, which
is incremented when a kfd process is created and decremented when
kfd_process_wq_release finishes.

v2: Make kfd_processes_count per kfd_dev, move the decrement to
kfd_process_destroy_pdds, and fix a bug.
(Philip Yang)

[3966658.307702] divide error: 0000 [#1] SMP NOPTI
[3966658.350818] i10nm_edac
[3966658.356318] CPU: 124 PID: 38435 Comm: kworker/124:0 Kdump: loaded Tainted
[3966658.356890] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
[3966658.362839] nfit
[3966658.366457] RIP: 0010:kfd_get_num_sdma_engines+0x17/0x40 [amdgpu]
[3966658.366460] Code: 00 00 e9 ac 81 02 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 4f 08 48 8b b7 00 01 00 00 8b 81 58 26 03 00 99 <f7> be b8 01 00 00 80 b9 70 2e 00 00 00 74 0b 83 f8 02 ba 02 00 00
[3966658.380967] x86_pkg_temp_thermal
[3966658.391529] RSP: 0018:ffffc900a0edfdd8 EFLAGS: 00010246
[3966658.391531] RAX: 0000000000000008 RBX: ffff8974e593b800 RCX: ffff888645900000
[3966658.391531] RDX: 0000000000000000 RSI: ffff888129154400 RDI: ffff888129151c00
[3966658.391532] RBP: ffff8883ad79d400 R08: 0000000000000000 R09: ffff8890d2750af4
[3966658.391532] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000
[3966658.391533] R13: ffff8883ad79d400 R14: ffffe87ff662ba00 R15: ffff8974e593b800
[3966658.391533] FS: 0000000000000000(0000) GS:ffff88fe7f600000(0000) knlGS:0000000000000000
[3966658.391534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3966658.391534] CR2: 0000000000d71000 CR3: 000000dd0e970004 CR4: 0000000002770ee0
[3966658.391535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3966658.391535] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[3966658.391536] PKRU: 55555554
[3966658.391536] Call Trace:
[3966658.391674] deallocate_sdma_queue+0x38/0xa0 [amdgpu]
[3966658.391762] process_termination_cpsch+0x1ed/0x480 [amdgpu]
[3966658.399754] intel_powerclamp
[3966658.402831] kfd_process_dequeue_from_all_devices+0x5b/0xc0 [amdgpu]
[3966658.402908] kfd_process_wq_release+0x1a/0x1a0 [amdgpu]
[3966658.410516] coretemp
[3966658.434016] process_one_work+0x1ad/0x380
[3966658.434021] worker_thread+0x49/0x310
[3966658.438963] kvm_intel
[3966658.446041] ? process_one_work+0x380/0x380
[3966658.446045] kthread+0x118/0x140
[3966658.446047] ? __kthread_bind_mask+0x60/0x60
[3966658.446050] ret_from_fork+0x1f/0x30
[3966658.446053] Modules linked in: kpatch_20765354(OEK)
[3966658.455310] kvm
[3966658.464534] mptcp_diag xsk_diag raw_diag unix_diag af_packet_diag netlink_diag udp_diag act_pedit act_mirred act_vlan cls_flower kpatch_21951273(OEK) kpatch_18424469(OEK) kpatch_19749756(OEK)
[3966658.473462] idxd_mdev
[3966658.482306] kpatch_17971294(OEK) sch_ingress xt_conntrack amdgpu(OE) amdxcp(OE) amddrm_buddy(OE) amd_sched(OE) amdttm(OE) amdkcl(OE) intel_ifs iptable_mangle tcm_loop target_core_pscsi tcp_diag target_core_file inet_diag target_core_iblock target_core_user target_core_mod coldpgs kpatch_18383292(OEK) ip6table_nat ip6table_filter ip6_tables ip_set_hash_ipportip ip_set_hash_ipportnet ip_set_hash_ipport ip_set_bitmap_port xt_comment iptable_nat nf_nat iptable_filter ip_tables ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sn_core_odd(OE) i40e overlay binfmt_misc tun bonding(OE) aisqos(OE) aisqos_hotfixes(OE) rfkill uio_pci_generic uio cuse fuse nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm idxd_mdev
[3966658.491237] vfio_pci
[3966658.501196] vfio_pci vfio_virqfd mdev vfio_iommu_type1 vfio iax_crypto intel_pmt_telemetry iTCO_wdt intel_pmt_class iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq
[3966658.508537] vfio_virqfd
[3966658.517569] snd_seq_device ipmi_ssif isst_if_mbox_pci isst_if_mmio pcspkr snd_pcm idxd intel_uncore ses isst_if_common intel_vsec idxd_bus enclosure snd_timer mei_me snd i2c_i801 i2c_smbus mei i2c_ismt soundcore joydev acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad vfat fat
[3966658.526851] mdev
[3966658.536096] nfsd auth_rpcgss nfs_acl lockd grace slb_vtoa(OE) sunrpc dm_mod hookers mlx5_ib(OE) ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm mlx5_core(OE) mlxfw(OE)
[3966658.540381] vfio_iommu_type1
[3966658.544341] nvme mpt3sas tls drm nvme_core pci_hyperv_intf raid_class psample libcrc32c crc32c_intel mlxdevm(OE) i2c_core
[3966658.551254] vfio
[3966658.558742] scsi_transport_sas wmi pinctrl_emmitsburg sd_mod t10_pi sg ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE)
[3966658.563004] iax_crypto
[3966658.570988] [last unloaded: diagnose]
[3966658.571027] ---[ end trace cc9dbb180f9ae537 ]---

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 9 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++++
 3 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 051a00152b08..488c8c0e6ccd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -1493,6 +1493,15 @@ int kgd2kfd_check_and_lock_kfd(struct kfd_dev *kfd)
 
 	mutex_lock(&kfd_processes_mutex);
 
+	/* kfd_processes_count is per kfd_dev, return -EBUSY without
+	 * further check
+	 */
+	if (!!atomic_read(&kfd->kfd_processes_count)) {
+		pr_debug("process_wq_release not finished\n");
+		r = -EBUSY;
+		goto out;
+	}
+
 	if (hash_empty(kfd_processes_table) && !kfd_is_locked(kfd))
 		goto out;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d01ef5ac0766..70ef051511bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -382,6 +382,8 @@ struct kfd_dev {
 
 	/* for dynamic partitioning */
 	int kfd_dev_lock;
+
+	atomic_t kfd_processes_count;
 };
 
 enum kfd_mempool {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 5be28c6c4f6a..ddfe30c13e9d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1088,6 +1088,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pdd->runtime_inuse = false;
 	}
 
+	atomic_dec(&pdd->dev->kfd->kfd_processes_count);
+
 	kfree(pdd);
 	p->pdds[i] = NULL;
 }
@@ -1649,6 +1651,8 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev,
 	/* Init idr used for memory handle translation */
 	idr_init(&pdd->alloc_idr);
 
+	atomic_inc(&dev->kfd->kfd_processes_count);
+
 	return pdd;
 }
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition
  2025-09-24 15:29 ` [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition Yifan Zhang
@ 2025-09-24 23:14   ` Philip Yang
  2025-09-26 19:52   ` Chen, Xiaogang
  1 sibling, 0 replies; 16+ messages in thread
From: Philip Yang @ 2025-09-24 23:14 UTC (permalink / raw)
  To: Yifan Zhang, amd-gfx
  Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar

On 2025-09-24 11:29, Yifan Zhang wrote:
> current switch partition only check if kfd_processes_table is empty.
> kfd_prcesses_table entry is deleted in kfd_process_notifier_release, but
> kfd_process tear down is in kfd_process_wq_release.
>
> consider two processes:
>
> Process A (workqueue) -> kfd_process_wq_release -> Access kfd_node member
> Process B switch partition -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw
> -> kfd_node tear down.
>
> Process A and B may trigger a race as shown in dmesg log.
>
> This patch is to resolve the race by adding an atomic kfd_process counter
> kfd_processes_count, it increment as create kfd process, decrement as
> finish kfd_process_wq_release.
>
> v2: Put kfd_processes_count per kfd_dev, move decrement to kfd_process_destroy_pdds
> and bug fix.
(Philip Yang)
>
> [snip dmesg log quoted in full above]
>
> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>

Because the kfd structure is kzalloc'ed, atomic_set to 0 is not needed on
x86; I don't know about other platforms.
To be safe:

@@ -495,6 +495,7 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
 	mutex_init(&kfd->doorbell_mutex);
 	ida_init(&kfd->doorbell_ida);
+	atomic_set(&kfd->kfd_processes_count, 0);
 
 	return kfd;
 }

With this added, this patch is

Reviewed-by: Philip Yang <Philip.Yang@amd.com>

> [snip patch diff quoted in full above]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition
  2025-09-24 15:29 ` [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition Yifan Zhang
  2025-09-24 23:14   ` Philip Yang
@ 2025-09-26 19:52   ` Chen, Xiaogang
  2025-09-29 13:09     ` Philip Yang
  1 sibling, 1 reply; 16+ messages in thread
From: Chen, Xiaogang @ 2025-09-26 19:52 UTC (permalink / raw)
  To: Yifan Zhang, amd-gfx
  Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar

On 9/24/2025 10:29 AM, Yifan Zhang wrote:
> current switch partition only check if kfd_processes_table is empty.
> kfd_prcesses_table entry is deleted in kfd_process_notifier_release, but
> kfd_process tear down is in kfd_process_wq_release.
>
> consider two processes:
>
> Process A (workqueue) -> kfd_process_wq_release -> Access kfd_node member
> Process B switch partition -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw
> -> kfd_node tear down.
>
> Process A and B may trigger a race as shown in dmesg log.
>
> This patch is to resolve the race by adding an atomic kfd_process counter
> kfd_processes_count, it increment as create kfd process, decrement as
> finish kfd_process_wq_release.
>
> v2: Put kfd_processes_count per kfd_dev, move decrement to kfd_process_destroy_pdds
> and bug fix.
(Philip Yang)
>
> [snip dmesg log quoted in full above]
>
> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
> ---
> [snip kfd_device.c hunk quoted in full above]
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index d01ef5ac0766..70ef051511bb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -382,6 +382,8 @@ struct kfd_dev {
>
>  	/* for dynamic partitioning */
>  	int kfd_dev_lock;
> +
> +	atomic_t kfd_processes_count;

Why do we need to add a kfd process count per kfd_dev? A kfd process uses
all kfd nodes on the system. Is there a case where a kfd process uses only
some kfd_devs?

This does not seem to be the root cause or the solution.

Regards

Xiaogang

>  };
>
>  enum kfd_mempool {
> [snip kfd_process.c hunk quoted in full above]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition 2025-09-26 19:52 ` Chen, Xiaogang @ 2025-09-29 13:09 ` Philip Yang 0 siblings, 0 replies; 16+ messages in thread From: Philip Yang @ 2025-09-29 13:09 UTC (permalink / raw) To: Chen, Xiaogang, Yifan Zhang, amd-gfx Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar On 2025-09-26 15:52, Chen, Xiaogang wrote: > > On 9/24/2025 10:29 AM, Yifan Zhang wrote: >> current switch partition only check if kfd_processes_table is empty. >> kfd_prcesses_table entry is deleted in kfd_process_notifier_release, but >> kfd_process tear down is in kfd_process_wq_release. >> >> consider two processes: >> >> Process A (workqueue) -> kfd_process_wq_release -> Access kfd_node >> member >> Process B switch partition -> amdgpu_xcp_pre_partition_switch -> >> amdgpu_amdkfd_device_fini_sw >> -> kfd_node tear down. >> >> Process A and B may trigger a race as shown in dmesg log. >> >> This patch is to resolve the race by adding an atomic kfd_process >> counter >> kfd_processes_count, it increment as create kfd process, decrement as >> finish kfd_process_wq_release. >> >> v2: Put kfd_processes_count per kfd_dev, move decrement to >> kfd_process_destroy_pdds >> and bug fix. 
(Philip Yang) >> >> [3966658.307702] divide error: 0000 [#1] SMP NOPTI >> [3966658.350818] i10nm_edac >> [3966658.356318] CPU: 124 PID: 38435 Comm: kworker/124:0 Kdump: >> loaded Tainted >> [3966658.356890] Workqueue: kfd_process_wq kfd_process_wq_release >> [amdgpu] >> [3966658.362839] nfit >> [3966658.366457] RIP: 0010:kfd_get_num_sdma_engines+0x17/0x40 [amdgpu] >> [3966658.366460] Code: 00 00 e9 ac 81 02 00 66 66 2e 0f 1f 84 00 00 >> 00 00 00 90 0f 1f 44 00 00 48 8b 4f 08 48 8b b7 00 01 00 00 8b 81 58 >> 26 03 00 99 <f7> be b8 01 00 00 80 b9 70 2e 00 00 00 74 0b 83 f8 02 >> ba 02 00 00 >> [3966658.380967] x86_pkg_temp_thermal >> [3966658.391529] RSP: 0018:ffffc900a0edfdd8 EFLAGS: 00010246 >> [3966658.391531] RAX: 0000000000000008 RBX: ffff8974e593b800 RCX: >> ffff888645900000 >> [3966658.391531] RDX: 0000000000000000 RSI: ffff888129154400 RDI: >> ffff888129151c00 >> [3966658.391532] RBP: ffff8883ad79d400 R08: 0000000000000000 R09: >> ffff8890d2750af4 >> [3966658.391532] R10: 0000000000000018 R11: 0000000000000018 R12: >> 0000000000000000 >> [3966658.391533] R13: ffff8883ad79d400 R14: ffffe87ff662ba00 R15: >> ffff8974e593b800 >> [3966658.391533] FS: 0000000000000000(0000) >> GS:ffff88fe7f600000(0000) knlGS:0000000000000000 >> [3966658.391534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [3966658.391534] CR2: 0000000000d71000 CR3: 000000dd0e970004 CR4: >> 0000000002770ee0 >> [3966658.391535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> [3966658.391535] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: >> 0000000000000400 >> [3966658.391536] PKRU: 55555554 >> [3966658.391536] Call Trace: >> [3966658.391674] deallocate_sdma_queue+0x38/0xa0 [amdgpu] >> [3966658.391762] process_termination_cpsch+0x1ed/0x480 [amdgpu] >> [3966658.399754] intel_powerclamp >> [3966658.402831] kfd_process_dequeue_from_all_devices+0x5b/0xc0 >> [amdgpu] >> [3966658.402908] kfd_process_wq_release+0x1a/0x1a0 [amdgpu] >> [3966658.410516] coretemp >> 
[3966658.434016] process_one_work+0x1ad/0x380 >> [3966658.434021] worker_thread+0x49/0x310 >> [3966658.438963] kvm_intel >> [3966658.446041] ? process_one_work+0x380/0x380 >> [3966658.446045] kthread+0x118/0x140 >> [3966658.446047] ? __kthread_bind_mask+0x60/0x60 >> [3966658.446050] ret_from_fork+0x1f/0x30 >> [3966658.446053] Modules linked in: kpatch_20765354(OEK) >> [3966658.455310] kvm >> [3966658.464534] mptcp_diag xsk_diag raw_diag unix_diag >> af_packet_diag netlink_diag udp_diag act_pedit act_mirred act_vlan >> cls_flower kpatch_21951273(OEK) kpatch_18424469(OEK) >> kpatch_19749756(OEK) >> [3966658.473462] idxd_mdev >> [3966658.482306] kpatch_17971294(OEK) sch_ingress xt_conntrack >> amdgpu(OE) amdxcp(OE) amddrm_buddy(OE) amd_sched(OE) amdttm(OE) >> amdkcl(OE) intel_ifs iptable_mangle tcm_loop target_core_pscsi >> tcp_diag target_core_file inet_diag target_core_iblock >> target_core_user target_core_mod coldpgs kpatch_18383292(OEK) >> ip6table_nat ip6table_filter ip6_tables ip_set_hash_ipportip >> ip_set_hash_ipportnet ip_set_hash_ipport ip_set_bitmap_port >> xt_comment iptable_nat nf_nat iptable_filter ip_tables ip_set >> ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 >> nf_defrag_ipv4 sn_core_odd(OE) i40e overlay binfmt_misc tun >> bonding(OE) aisqos(OE) aisqos_hotfixes(OE) rfkill uio_pci_generic uio >> cuse fuse nf_tables nfnetlink intel_rapl_msr intel_rapl_common >> intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit >> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm idxd_mdev >> [3966658.491237] vfio_pci >> [3966658.501196] vfio_pci vfio_virqfd mdev vfio_iommu_type1 vfio >> iax_crypto intel_pmt_telemetry iTCO_wdt intel_pmt_class >> iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul >> ghash_clmulni_intel rapl intel_cstate snd_hda_intel snd_intel_dspcfg >> snd_hda_codec snd_hda_core snd_hwdep snd_seq >> [3966658.508537] vfio_virqfd >> [3966658.517569] snd_seq_device ipmi_ssif isst_if_mbox_pci >> 
isst_if_mmio pcspkr snd_pcm idxd intel_uncore ses isst_if_common >> intel_vsec idxd_bus enclosure snd_timer mei_me snd i2c_i801 i2c_smbus >> mei i2c_ismt soundcore joydev acpi_ipmi ipmi_si ipmi_devintf >> ipmi_msghandler acpi_power_meter acpi_pad vfat fat >> [3966658.526851] mdev >> [3966658.536096] nfsd auth_rpcgss nfs_acl lockd grace slb_vtoa(OE) >> sunrpc dm_mod hookers mlx5_ib(OE) ast i2c_algo_bit drm_vram_helper >> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops >> drm_ttm_helper ttm mlx5_core(OE) mlxfw(OE) >> [3966658.540381] vfio_iommu_type1 >> [3966658.544341] nvme mpt3sas tls drm nvme_core pci_hyperv_intf >> raid_class psample libcrc32c crc32c_intel mlxdevm(OE) i2c_core >> [3966658.551254] vfio >> [3966658.558742] scsi_transport_sas wmi pinctrl_emmitsburg sd_mod >> t10_pi sg ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) >> iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE) >> [3966658.563004] iax_crypto >> [3966658.570988] [last unloaded: diagnose] >> [3966658.571027] ---[ end trace cc9dbb180f9ae537 ]--- >> >> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> >> --- >> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 9 +++++++++ >> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ >> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++++ >> 3 files changed, 15 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c >> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c >> index 051a00152b08..488c8c0e6ccd 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c >> @@ -1493,6 +1493,15 @@ int kgd2kfd_check_and_lock_kfd(struct kfd_dev >> *kfd) >> mutex_lock(&kfd_processes_mutex); >> + /* kfd_processes_count is per kfd_dev, return -EBUSY without >> + * further check >> + */ >> + if (!!atomic_read(&kfd->kfd_processes_count)) { >> + pr_debug("process_wq_release not finished\n"); >> + r = -EBUSY; >> + goto out; >> + } >> + >> if (hash_empty(kfd_processes_table) && 
>> !kfd_is_locked(kfd))
>>  		goto out;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index d01ef5ac0766..70ef051511bb 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -382,6 +382,8 @@ struct kfd_dev {
>>  	/* for dynamic partitioning */
>>  	int kfd_dev_lock;
>> +
>> +	atomic_t kfd_processes_count;
>
> Why do we need a kfd process count per kfd_dev? A kfd process uses all
> kfd nodes on the system. Is there a case where a kfd process uses only
> some kfd_devs?
>
Yes, a cgroup can exclude devices from a process. A per-kfd_dev count
allows a device partition switch while a process is still running, as
long as that process's cgroup excludes this device.

Regards,
Philip

> It does not seem to be the root cause or the solution.
>
> Regards
>
> Xiaogang

>>  };
>>  enum kfd_mempool {
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index 5be28c6c4f6a..ddfe30c13e9d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -1088,6 +1088,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
>>  		pdd->runtime_inuse = false;
>>  	}
>> +	atomic_dec(&pdd->dev->kfd->kfd_processes_count);
>> +
>>  	kfree(pdd);
>>  	p->pdds[i] = NULL;
>>  }
>> @@ -1649,6 +1651,8 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev,
>>  	/* Init idr used for memory handle translation */
>>  	idr_init(&pdd->alloc_idr);
>> +	atomic_inc(&dev->kfd->kfd_processes_count);
>> +
>>  	return pdd;
>>  }

^ permalink raw reply [flat|nested] 16+ messages in thread
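The per-device counting scheme in the patch above can be sketched in plain userspace C to show the intended gating. All `sim_*` names and the open-coded EBUSY value are invented for this sketch; this is not the real KFD API.

```c
#include <assert.h>
#include <stdatomic.h>

#define SIM_EBUSY 16  /* stand-in for the kernel's EBUSY */

struct sim_dev {
	atomic_int kfd_processes_count;  /* one reference per pdd, as in the patch */
};

/* Mirrors the kfd_create_process_device_data() hunk: take a per-device reference. */
static void sim_pdd_create(struct sim_dev *dev)
{
	atomic_fetch_add(&dev->kfd_processes_count, 1);
}

/* Mirrors the kfd_process_destroy_pdds() hunk: drop the reference with the pdd. */
static void sim_pdd_destroy(struct sim_dev *dev)
{
	atomic_fetch_sub(&dev->kfd_processes_count, 1);
}

/* Mirrors the kgd2kfd_check_and_lock_kfd() hunk: refuse to lock for a
 * partition switch while any process still holds a pdd on this device. */
static int sim_check_and_lock(struct sim_dev *dev)
{
	if (atomic_load(&dev->kfd_processes_count) != 0)
		return -SIM_EBUSY;
	return 0;  /* would proceed to take the lock here */
}
```

The point Xiaogang raises still applies: this only changes *when* the switch is refused, it does not by itself order teardown against interrupts.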
* Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-24 15:29 [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Yifan Zhang
  2025-09-24 15:29 ` [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition Yifan Zhang
@ 2025-09-24 22:48 ` Philip Yang
  2025-09-25  6:19   ` Lazar, Lijo
  2025-09-25 19:11   ` Chen, Xiaogang
  1 sibling, 2 replies; 16+ messages in thread
From: Philip Yang @ 2025-09-24 22:48 UTC (permalink / raw)
To: Yifan Zhang, amd-gfx
Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar

On 2025-09-24 11:29, Yifan Zhang wrote:
> There is a race between amdgpu_amdkfd_device_fini_sw and interrupt
> handling: if amdgpu_amdkfd_device_fini_sw runs in between
> kfd_cleanup_nodes and kfree(kfd), a KGD interrupt can still be
> generated and dereference a freed node.
>
> kernel panic log:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000098
> amdgpu 0000:c8:00.0: amdgpu: Requesting 4 partitions through PSP
> PGD d78c68067 P4D d78c68067
> kfd kfd: amdgpu: Allocated 3969056 bytes on gart
> PUD 1465b8067 PMD 0
> Oops: 0002 [#1] SMP NOPTI
> kfd kfd: amdgpu: Total number of KFD nodes to be created: 4
> CPU: 115 PID: 0 Comm: swapper/115 Kdump: loaded Tainted: G S W OE K
> RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40
> Code: 89 e0 41 5c c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41 54 9c 41 5c fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 8a 4c 89 e0 41 5c 89 c6 e8 07 38 5d
> RSP: 0018:ffffc9001a6b0e28 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
> RDX: 0000000000000001 RSI: ffff8883bb623e00 RDI: 0000000000000098
> RBP: ffff8883bb000000 R08: ffff888100055020 R09: ffff888100055020
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: ffff888f2b97da00 R14: 0000000000000098 R15: ffff8883babdf000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000098 CR3: 0000000e7cae2006 CR4: 0000000002770ce0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <IRQ>
>  kgd2kfd_interrupt+0x6b/0x1f0 [amdgpu]
>  ? amdgpu_fence_process+0xa4/0x150 [amdgpu]
> kfd kfd: amdgpu: Node: 0, interrupt_bitmap: 3
>  amdgpu_irq_dispatch+0x165/0x210 [amdgpu]
>  amdgpu_ih_process+0x80/0x100 [amdgpu]
> amdgpu: Virtual CRAT table created for GPU
>  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
>  __handle_irq_event_percpu+0x3d/0x170
> amdgpu: Topology: Add dGPU node [0x74a2:0x1002]
>  handle_irq_event+0x5a/0xc0
>  handle_edge_irq+0x93/0x240
> kfd kfd: amdgpu: KFD node 1 partition 0 size 49148M
>  asm_call_irq_on_stack+0xf/0x20
>  </IRQ>
>  common_interrupt+0xb3/0x130
>  asm_common_interrupt+0x1e/0x40
> 5.10.134-010.ali5000.al8.x86_64 #1
>
> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 349c351e242b..051a00152b08 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>  	}
>
>  	for (i = 0; i < kfd->num_nodes; i++) {
> -		node = kfd->nodes[i];
> +		/* Race if another thread in b/w
> +		 * kfd_cleanup_nodes and kfree(kfd),
> +		 * when kfd->nodes[i] = NULL
> +		 */
> +		if (kfd->nodes[i])
> +			node = kfd->nodes[i];
> +		else
> +			return;
> +
>  		spin_lock_irqsave(&node->interrupt_lock, flags);
>
>  		if (node->interrupts_active

^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-24 22:48 ` [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Philip Yang
@ 2025-09-25  6:19   ` Lazar, Lijo
  2025-09-25  6:41     ` Zhang, Yifan
  2025-09-25 19:11   ` Chen, Xiaogang
  1 sibling, 1 reply; 16+ messages in thread
From: Lazar, Lijo @ 2025-09-25 6:19 UTC (permalink / raw)
To: Yang, Philip, Zhang, Yifan, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix

[Public]

> Race if another thread in b/w kfd_cleanup_nodes

This code is there before cleanup of nodes.

	flush_workqueue(kfd->ih_wq);
	destroy_workqueue(kfd->ih_wq);

Shouldn't the interrupt handling code check if interrupt handling is enabled rather than checking individual node NULL pointers?

Thanks,
Lijo

-----Original Message-----
From: Yang, Philip <Philip.Yang@amd.com>
Sent: Thursday, September 25, 2025 4:19 AM
To: Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com>
Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

On 2025-09-24 11:29, Yifan Zhang wrote:
> There is race in amdgpu_amdkfd_device_fini_sw and interrupt.
> if amdgpu_amdkfd_device_fini_sw run in b/w kfd_cleanup_nodes and
> kfree(kfd), and KGD interrupt generated.
> [snip]

^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-25  6:19 ` Lazar, Lijo
@ 2025-09-25  6:41   ` Zhang, Yifan
  2025-09-25  6:58     ` Lazar, Lijo
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Yifan @ 2025-09-25 6:41 UTC (permalink / raw)
To: Lazar, Lijo, Yang, Philip, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix

[Public]

The interrupts come from KGD and are still active after the ih_wq flush,
even once kfd_dev is freed. And after the knode is freed,
node->interrupts_active is also inaccessible.

The race condition is as below:

Interrupt handling                            switch partition process
        |                                             |
        |                                     flush_workqueue(kfd->ih_wq);
        |                                     destroy_workqueue(kfd->ih_wq);
amdgpu_irq_dispatch                                   |
  amdgpu_amdkfd_interrupt                             |
    kgd2kfd_interrupt                                 |
        |                                     kfd_cleanup_nodes
        |                                     kfree(knode);
spin_lock_irqsave(&node->interrupt_lock, flags); /* NULL pointer */

-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@amd.com>
Sent: Thursday, September 25, 2025 2:19 PM
To: Yang, Philip <Philip.Yang@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[Public]

> Race if another thread in b/w kfd_cleanup_nodes

This code is there before cleanup of nodes.

	flush_workqueue(kfd->ih_wq);
	destroy_workqueue(kfd->ih_wq);

Shouldn't the interrupt handling code check if interrupt handling is enabled rather than checking individual node NULL pointers?
Thanks,
Lijo

-----Original Message-----
From: Yang, Philip <Philip.Yang@amd.com>
Sent: Thursday, September 25, 2025 4:19 AM
To: Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com>
Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[snip]

^ permalink raw reply [flat|nested] 16+ messages in thread
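For illustration, the per-node NULL check the patch adds can be modeled single-threaded in userspace C. The `sim_*` names are invented for this sketch, and a single-threaded model cannot show the real race: teardown must still publish NULL before freeing, with the ordering/locking discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_NUM_NODES 4  /* illustrative, matches the 4-partition log above */

struct sim_node { int interrupts_handled; };

struct sim_kfd {
	struct sim_node *nodes[SIM_NUM_NODES];
};

/* Mirrors the kgd2kfd_interrupt() hunk: stop at the first NULL slot
 * instead of dereferencing a node already freed by kfd_cleanup_nodes(). */
static void sim_interrupt(struct sim_kfd *kfd)
{
	for (int i = 0; i < SIM_NUM_NODES; i++) {
		struct sim_node *node = kfd->nodes[i];

		if (!node)
			return;  /* teardown already ran for this slot */
		node->interrupts_handled++;
	}
}

/* Mirrors kfd_cleanup_nodes(): publish NULL in the slot before freeing. */
static void sim_cleanup(struct sim_kfd *kfd)
{
	for (int i = 0; i < SIM_NUM_NODES; i++) {
		struct sim_node *n = kfd->nodes[i];

		kfd->nodes[i] = NULL;
		free(n);
	}
}
```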
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-25  6:41 ` Zhang, Yifan
@ 2025-09-25  6:58   ` Lazar, Lijo
  2025-09-25  7:06     ` Lazar, Lijo
  0 siblings, 1 reply; 16+ messages in thread
From: Lazar, Lijo @ 2025-09-25 6:58 UTC (permalink / raw)
To: Zhang, Yifan, Yang, Philip, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix

[Public]

I meant something like this.

	destroy_workqueue(kfd->ih_wq);
	kfd->ih_wq = NULL;

Then check for NULL at the beginning of kgd2kfd_interrupt. If there is no IH workqueue, then there is no interrupt handling capability.

May also be within the loop. Not sure if that is really required; if some work is already scheduled previously, it should be inside flush_workqueue() of cleanup_nodes.
The race condition is as below: Interrupt handling switch partition process | | flush_workqueue(kfd->ih_wq); | destroy_workqueue(kfd->ih_wq); amdgpu_irq_dispatch | amdgpu_amdkfd_interrupt | kgd2kfd_interrupt | | kfd_cleanup_nodes | kfree(knode); spin_lock_irqsave(&node->interrupt_lock, flags); | //NULL Pointer -----Original Message----- From: Lazar, Lijo <Lijo.Lazar@amd.com> Sent: Thursday, September 25, 2025 2:19 PM To: Yang, Philip <Philip.Yang@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com> Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw [Public] > Race if another thread in b/w kfd_cleanup_nodes This code is there before cleanup of nodes. flush_workqueue(kfd->ih_wq); destroy_workqueue(kfd->ih_wq); Shouldn't the interrupt handling code check if interrupt handling is enabled rather than checking individual node NULL pointers? Thanks, Lijo -----Original Message----- From: Yang, Philip <Philip.Yang@amd.com> Sent: Thursday, September 25, 2025 4:19 AM To: Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com> Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw On 2025-09-24 11:29, Yifan Zhang wrote: > There is race in amdgpu_amdkfd_device_fini_sw and interrupt. > if amdgpu_amdkfd_device_fini_sw run in b/w kfd_cleanup_nodes and > kfree(kfd), and KGD interrupt generated. 
> > kernel panic log: > > BUG: kernel NULL pointer dereference, address: 0000000000000098 amdgpu > 0000:c8:00.0: amdgpu: Requesting 4 partitions through PSP > > PGD d78c68067 P4D d78c68067 > > kfd kfd: amdgpu: Allocated 3969056 bytes on gart > > PUD 1465b8067 PMD @ > > Oops: @002 [#1] SMP NOPTI > > kfd kfd: amdgpu: Total number of KFD nodes to be created: 4 > CPU: 115 PID: @ Comm: swapper/115 Kdump: loaded Tainted: G S W OE K > > RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40 > > Code: 89 e@ 41 5c c3 cc cc cc cc 66 66 2e Of 1f 84 00 00 00 00 00 OF > 1f 40 00 Of 1f 44% 00 00 41 54 9c 41 5c fa 31 cO ba 01 00 00 00 <fO> > OF b1 17 75 Ba 4c 89 e@ 41 Sc > > 89 c6 e8 07 38 5d > > RSP: 0018: ffffc90@1a6b0e28 EFLAGS: 00010046 > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018 > 0000000000000001 RSI: ffff8883bb623e00 RDI: 0000000000000098 > ffff8883bb000000 RO8: ffff888100055020 ROO: ffff888100055020 > 0000000000000000 R11: 0000000000000000 R12: 0900000000000002 > ffff888F2b97da0@ R14: @000000000000098 R15: ffff8883babdfo00 > > CS: 010 DS: 0000 ES: 0000 CRO: 0000000080050033 > > CR2: 0000000000000098 CR3: 0000000e7cae2006 CR4: 0000000002770ce0 > 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > 0000000000000000 DR6: 00000000fffeO7FO DR7: 0000000000000400 > > PKRU: 55555554 > > Call Trace: > > <IRQ> > > kgd2kfd_interrupt+@x6b/0x1f@ [amdgpu] > > ? 
amdgpu_fence_process+0xa4/0x150 [amdgpu] > > kfd kfd: amdgpu: Node: 0, interrupt_bitmap: 3 YcpxFl Rant tErace > > amdgpu_irq_dispatch+0x165/0x210 [amdgpu] > > amdgpu_ih_process+0x80/0x100 [amdgpu] > > amdgpu: Virtual CRAT table created for GPU > > amdgpu_irq_handler+0x1f/@x60 [amdgpu] > > __handle_irq_event_percpu+0x3d/0x170 > > amdgpu: Topology: Add dGPU node [0x74a2:0x1002] > > handle_irq_event+0x5a/@xcO > > handle_edge_irq+0x93/0x240 > > kfd kfd: amdgpu: KFD node 1 partition @ size 49148M > > asm_call_irq_on_stack+0xf/@x20 > > </IRQ> > > common_interrupt+0xb3/0x130 > > asm_common_interrupt+0x1le/0x40 > > 5.10.134-010.a1i5000.a18.x86_64 #1 > > Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> > --- > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > index 349c351e242b..051a00152b08 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > @@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry) > } > > for (i = 0; i < kfd->num_nodes; i++) { > - node = kfd->nodes[i]; > + /* Race if another thread in b/w > + * kfd_cleanup_nodes and kfree(kfd), > + * when kfd->nodes[i] = NULL > + */ > + if (kfd->nodes[i]) > + node = kfd->nodes[i]; > + else > + return; > + > spin_lock_irqsave(&node->interrupt_lock, flags); > > if (node->interrupts_active ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-25  6:58 ` Lazar, Lijo
@ 2025-09-25  7:06   ` Lazar, Lijo
  2025-09-25  9:54     ` Zhang, Yifan
  0 siblings, 1 reply; 16+ messages in thread
From: Lazar, Lijo @ 2025-09-25 7:06 UTC (permalink / raw)
To: Lazar, Lijo, Zhang, Yifan, Yang, Philip, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix

[Public]

On a second thought, probably this will require some locking. For ex: it's quite possible that destroy_workqueue(kfd->ih_wq) and queue_work(node->kfd->ih_wq, &node->interrupt_work) could be happening back-to-back. Node is not yet deleted.

Thanks,
Lijo

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Lazar, Lijo
Sent: Thursday, September 25, 2025 12:29 PM
To: Zhang, Yifan <Yifan1.Zhang@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[Public]

I meant something like this.

	destroy_workqueue(kfd->ih_wq);
	kfd->ih_wq = NULL;

Then check for NULL at the beginning of kgd2kfd_interrupt. If there is no IH workqueue, then there is no interrupt handling capability.

May also be within the loop. Not sure if that is really required; if some work is already scheduled previously, it should be inside flush_workqueue() of cleanup_nodes.
	for (i = 0; kfd->ih_wq && i < kfd->num_nodes; i++)

Thanks,
Lijo

-----Original Message-----
From: Zhang, Yifan <Yifan1.Zhang@amd.com>
Sent: Thursday, September 25, 2025 12:11 PM
To: Lazar, Lijo <Lijo.Lazar@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[snip]

^ permalink raw reply [flat|nested] 16+ messages in thread
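The ordering concern raised here — no queue_work() may land after the workqueue is destroyed — can be modeled with an explicit active flag that teardown clears first. This is a userspace sketch with C11 atomics; the `sim_*` names are invented, and the flush/destroy calls are elided as comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sim_node3 {
	atomic_bool interrupts_active;
	int queued;
};

/* Producer side (interrupt path): only queue work while the node is active. */
static void sim_try_queue(struct sim_node3 *n)
{
	if (atomic_load(&n->interrupts_active))
		n->queued++;  /* stands in for queue_work(kfd->ih_wq, ...) */
}

/* Teardown side: clear the flag first, so no new work can be queued;
 * only then is it safe to flush and destroy the workqueue. */
static void sim_teardown(struct sim_node3 *n)
{
	atomic_store(&n->interrupts_active, false);
	/* flush_workqueue(kfd->ih_wq); destroy_workqueue(kfd->ih_wq); */
}
```

A producer that checked the flag just before teardown cleared it can still be mid-queue; closing that window is exactly why locking (or a flush after the flag flip) is needed, as the reply says.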
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-25  7:06 ` Lazar, Lijo
@ 2025-09-25  9:54   ` Zhang, Yifan
  2025-09-26  6:49     ` Lazar, Lijo
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Yifan @ 2025-09-25 9:54 UTC (permalink / raw)
To: Lazar, Lijo, Yang, Philip, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix

[Public]

Hi Lijo,

Do you mean a change like below? Besides readability, is there a functional improvement?

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index e9cfb80bd436..86676acd9cbe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -663,6 +663,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes)
 	 */
 	flush_workqueue(kfd->ih_wq);
 	destroy_workqueue(kfd->ih_wq);
+	kfd->ih_wq = NULL;

 	for (i = 0; i < num_nodes; i++) {
 		knode = kfd->nodes[i];
@@ -1125,7 +1126,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	unsigned long flags;
 	struct kfd_node *node;

-	if (!kfd->init_complete)
+	if (!kfd->init_complete || !kfd->ih_wq)
 		return;

 	if (kfd->device_info.ih_ring_entry_size > sizeof(patched_ihre)) {

Regarding synchronizing destroy_workqueue(kfd->ih_wq) against
queue_work(node->kfd->ih_wq, &node->interrupt_work): that looks like
another issue; node->interrupts_active should be set to false before
destroy_workqueue(kfd->ih_wq) is called. I think something like below:

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index e9cfb80bd436..92d9fa99e954 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -662,6 +662,9 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes)
 	 * can't be created because we stopped interrupt handling above.
 	 */
 	flush_workqueue(kfd->ih_wq);
+	for (i = 0; i < num_nodes; i++)
+		kfd_interrupt_exit(kfd->nodes[i]);
+
 	destroy_workqueue(kfd->ih_wq);

 	for (i = 0; i < num_nodes; i++) {

-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@amd.com>
Sent: Thursday, September 25, 2025 3:06 PM
To: Lazar, Lijo <Lijo.Lazar@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[Public]

On a second thought, probably this will require some locking. For ex: it's quite possible that destroy_workqueue(kfd->ih_wq) and queue_work(node->kfd->ih_wq, &node->interrupt_work) could be happening back-to-back. Node is not yet deleted.

Thanks,
Lijo

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Lazar, Lijo
Sent: Thursday, September 25, 2025 12:29 PM
To: Zhang, Yifan <Yifan1.Zhang@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
for (i = 0; kfd->ih_wq && i < kfd->num_nodes; i++) Thanks, Lijo -----Original Message----- From: Zhang, Yifan <Yifan1.Zhang@amd.com> Sent: Thursday, September 25, 2025 12:11 PM To: Lazar, Lijo <Lijo.Lazar@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com> Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw [Public] The interrupts are from KGD, still active after flush ih_wq and kfd_dev is freed. And after knode is freed, node->interrupts_active is also inaccessible. The race condition is as below: Interrupt handling switch partition process | | flush_workqueue(kfd->ih_wq); | destroy_workqueue(kfd->ih_wq); amdgpu_irq_dispatch | amdgpu_amdkfd_interrupt | kgd2kfd_interrupt | | kfd_cleanup_nodes | kfree(knode); spin_lock_irqsave(&node->interrupt_lock, flags); | //NULL Pointer -----Original Message----- From: Lazar, Lijo <Lijo.Lazar@amd.com> Sent: Thursday, September 25, 2025 2:19 PM To: Yang, Philip <Philip.Yang@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com> Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw [Public] > Race if another thread in b/w kfd_cleanup_nodes This code is there before cleanup of nodes. flush_workqueue(kfd->ih_wq); destroy_workqueue(kfd->ih_wq); Shouldn't the interrupt handling code check if interrupt handling is enabled rather than checking individual node NULL pointers? 
Thanks, Lijo -----Original Message----- From: Yang, Philip <Philip.Yang@amd.com> Sent: Thursday, September 25, 2025 4:19 AM To: Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com> Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw On 2025-09-24 11:29, Yifan Zhang wrote: > There is race in amdgpu_amdkfd_device_fini_sw and interrupt. > if amdgpu_amdkfd_device_fini_sw run in b/w kfd_cleanup_nodes and > kfree(kfd), and KGD interrupt generated. > > kernel panic log: > > BUG: kernel NULL pointer dereference, address: 0000000000000098 amdgpu > 0000:c8:00.0: amdgpu: Requesting 4 partitions through PSP > > PGD d78c68067 P4D d78c68067 > > kfd kfd: amdgpu: Allocated 3969056 bytes on gart > > PUD 1465b8067 PMD @ > > Oops: @002 [#1] SMP NOPTI > > kfd kfd: amdgpu: Total number of KFD nodes to be created: 4 > CPU: 115 PID: @ Comm: swapper/115 Kdump: loaded Tainted: G S W OE K > > RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40 > > Code: 89 e@ 41 5c c3 cc cc cc cc 66 66 2e Of 1f 84 00 00 00 00 00 OF > 1f 40 00 Of 1f 44% 00 00 41 54 9c 41 5c fa 31 cO ba 01 00 00 00 <fO> > OF b1 17 75 Ba 4c 89 e@ 41 Sc > > 89 c6 e8 07 38 5d > > RSP: 0018: ffffc90@1a6b0e28 EFLAGS: 00010046 > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018 > 0000000000000001 RSI: ffff8883bb623e00 RDI: 0000000000000098 > ffff8883bb000000 RO8: ffff888100055020 ROO: ffff888100055020 > 0000000000000000 R11: 0000000000000000 R12: 0900000000000002 > ffff888F2b97da0@ R14: @000000000000098 R15: ffff8883babdfo00 > > CS: 010 DS: 0000 ES: 0000 CRO: 0000000080050033 > > CR2: 0000000000000098 CR3: 0000000e7cae2006 CR4: 0000000002770ce0 > 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > 0000000000000000 DR6: 00000000fffeO7FO DR7: 0000000000000400 > > PKRU: 55555554 > > Call Trace: > > <IRQ> > 
> kgd2kfd_interrupt+@x6b/0x1f@ [amdgpu] > > ? amdgpu_fence_process+0xa4/0x150 [amdgpu] > > kfd kfd: amdgpu: Node: 0, interrupt_bitmap: 3 YcpxFl Rant tErace > > amdgpu_irq_dispatch+0x165/0x210 [amdgpu] > > amdgpu_ih_process+0x80/0x100 [amdgpu] > > amdgpu: Virtual CRAT table created for GPU > > amdgpu_irq_handler+0x1f/@x60 [amdgpu] > > __handle_irq_event_percpu+0x3d/0x170 > > amdgpu: Topology: Add dGPU node [0x74a2:0x1002] > > handle_irq_event+0x5a/@xcO > > handle_edge_irq+0x93/0x240 > > kfd kfd: amdgpu: KFD node 1 partition @ size 49148M > > asm_call_irq_on_stack+0xf/@x20 > > </IRQ> > > common_interrupt+0xb3/0x130 > > asm_common_interrupt+0x1le/0x40 > > 5.10.134-010.a1i5000.a18.x86_64 #1 > > Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> > --- > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > index 349c351e242b..051a00152b08 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > @@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry) > } > > for (i = 0; i < kfd->num_nodes; i++) { > - node = kfd->nodes[i]; > + /* Race if another thread in b/w > + * kfd_cleanup_nodes and kfree(kfd), > + * when kfd->nodes[i] = NULL > + */ > + if (kfd->nodes[i]) > + node = kfd->nodes[i]; > + else > + return; > + > spin_lock_irqsave(&node->interrupt_lock, flags); > > if (node->interrupts_active ^ permalink raw reply related [flat|nested] 16+ messages in thread
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-25  9:54       ` Zhang, Yifan
@ 2025-09-26  6:49         ` Lazar, Lijo
  2025-09-26 14:33           ` Zhang, Yifan
  0 siblings, 1 reply; 16+ messages in thread
From: Lazar, Lijo @ 2025-09-26  6:49 UTC (permalink / raw)
  To: Zhang, Yifan, Yang, Philip, amd-gfx@lists.freedesktop.org
  Cc: Deucher, Alexander, Kuehling, Felix

[Public]

The intention is to let the kgd2kfd_interrupt thread know that KFD is done with interrupt handling, so it can exit at the earliest point (even without going through the kfd node loop). I was thinking of checking the ih_wq value for NULL, but since that value is not written under a lock, the kgd2kfd_interrupt thread does not necessarily see the NULL value immediately.

> if (!kfd->init_complete && !kfd->ih_wq)

I think this should be:

	if (!kfd->ih_wq || !kfd->init_complete)

As for kfd_interrupt_exit(knode), it is better to keep it before flush_workqueue. That indicates that no more queueing is allowed from any node.

Thanks,
Lijo

-----Original Message-----
From: Zhang, Yifan <Yifan1.Zhang@amd.com>
Sent: Thursday, September 25, 2025 3:25 PM
To: Lazar, Lijo <Lijo.Lazar@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[earlier quoted thread snipped -- identical to the message above]

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-26  6:49         ` Lazar, Lijo
@ 2025-09-26 14:33           ` Zhang, Yifan
  0 siblings, 0 replies; 16+ messages in thread
From: Zhang, Yifan @ 2025-09-26 14:33 UTC (permalink / raw)
  To: Lazar, Lijo, Yang, Philip, amd-gfx@lists.freedesktop.org
  Cc: Deucher, Alexander, Kuehling, Felix

[-- Attachment #1: Type: text/plain, Size: 14028 bytes --]

[Public]

kfd_interrupt_exit needs to be split into two parts, because knode->ih_fifo is still needed while ih_wq is being flushed. A change like this:

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index e9cfb80bd436..270459826147 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -655,19 +655,33 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes)
 {
 	struct kfd_node *knode;
 	unsigned int i;
+	unsigned long flags;

 	/*
 	 * flush_work ensures that there are no outstanding
 	 * work-queue items that will access interrupt_ring. New work items
 	 * can't be created because we stopped interrupt handling above.
 	 */
+	for (i = 0; i < num_nodes; i++) {
+		knode = kfd->nodes[i];
+		spin_lock_irqsave(&knode->interrupt_lock, flags);
+		knode->interrupts_active = false;
+		spin_unlock_irqrestore(&knode->interrupt_lock, flags);
+	}
+
 	flush_workqueue(kfd->ih_wq);
 	destroy_workqueue(kfd->ih_wq);

+	for (i = 0; i < num_nodes; i++) {
+		spin_lock_irqsave(&knode->interrupt_lock, flags);
+		knode->kfd->ih_wq = NULL;
+		spin_unlock_irqrestore(&knode->interrupt_lock, flags);
+		kfifo_free(&knode->ih_fifo);
+	}
+
 	for (i = 0; i < num_nodes; i++) {
 		knode = kfd->nodes[i];
 		device_queue_manager_uninit(knode->dqm);
-		kfd_interrupt_exit(knode);
 		kfd_topology_remove_device(knode);
 		if (knode->gws)
 			amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
@@ -1125,7 +1139,8 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	unsigned long flags;
 	struct kfd_node *node;

-	if (!kfd->init_complete)
+	if (!kfd->init_complete || !kfd->ih_wq)
 		return;

 	if (kfd->device_info.ih_ring_entry_size > sizeof(patched_ihre)) {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
index 783c2f5a04e4..be10c0a9d391 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -87,21 +87,6 @@ int kfd_interrupt_init(struct kfd_node *node)
 	return 0;
 }

-void kfd_interrupt_exit(struct kfd_node *node)
-{
-	/*
-	 * Stop the interrupt handler from writing to the ring and scheduling
-	 * workqueue items. The spinlock ensures that any interrupt running
-	 * after we have unlocked sees interrupts_active = false.
-	 */
-	unsigned long flags;
-
-	spin_lock_irqsave(&node->interrupt_lock, flags);
-	node->interrupts_active = false;
-	spin_unlock_irqrestore(&node->interrupt_lock, flags);
-	kfifo_free(&node->ih_fifo);
-}
-

________________________________
From: Lazar, Lijo <Lijo.Lazar@amd.com>
Sent: Friday, September 26, 2025 2:49 PM
To: Zhang, Yifan <Yifan1.Zhang@amd.com>; Yang, Philip <Philip.Yang@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

[earlier quoted thread snipped -- identical to the messages above]

[-- Attachment #2: Type: text/html, Size: 35205 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-24 22:48 ` [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Philip Yang
  2025-09-25  6:19   ` Lazar, Lijo
@ 2025-09-25 19:11   ` Chen, Xiaogang
  2025-09-26  1:15     ` Zhang, Yifan
  1 sibling, 1 reply; 16+ messages in thread
From: Chen, Xiaogang @ 2025-09-25 19:11 UTC (permalink / raw)
  To: Philip Yang, Yifan Zhang, amd-gfx
  Cc: Alexander.Deucher, Felix.Kuehling, Philip.Yang, Lijo.Lazar

On 9/24/2025 5:48 PM, Philip Yang wrote:
>
> On 2025-09-24 11:29, Yifan Zhang wrote:
>> There is race in amdgpu_amdkfd_device_fini_sw and interrupt.
>> if amdgpu_amdkfd_device_fini_sw run in b/w kfd_cleanup_nodes and
>> kfree(kfd), and KGD interrupt generated.
>>
>> [kernel panic log snipped -- see the original posting above]
>>
>> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
> Reviewed-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> @@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>>   	}
>>   	for (i = 0; i < kfd->num_nodes; i++) {
>> -		node = kfd->nodes[i];
>> +		/* Race if another thread in b/w
>> +		 * kfd_cleanup_nodes and kfree(kfd),
>> +		 * when kfd->nodes[i] = NULL
>> +		 */
>> +		if (kfd->nodes[i])
>> +			node = kfd->nodes[i];
>> +		else
>> +			return;
>> +

KFD interrupts are handled later in the node->kfd->ih_wq workqueue, in interrupt_wq, which uses kfd->nodes[i]. Checking "kfd->nodes[i] == NULL" here is not enough, since kfd_cleanup_nodes can free a kfd node at any time.

Regards

Xiaogang

>>   		spin_lock_irqsave(&node->interrupt_lock, flags);
>>   		if (node->interrupts_active

^ permalink raw reply	[flat|nested] 16+ messages in thread
* RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-25 19:11 ` Chen, Xiaogang
@ 2025-09-26  1:15 ` Zhang, Yifan
  2025-09-26 15:26   ` Chen, Xiaogang
  0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Yifan @ 2025-09-26  1:15 UTC (permalink / raw)
To: Chen, Xiaogang, Yang, Philip, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix, Yang, Philip, Lazar, Lijo

[AMD Official Use Only - AMD Internal Distribution Only]

flush_workqueue(kfd->ih_wq) and destroy_workqueue(kfd->ih_wq) in kfd_cleanup_nodes clean up pending work items, and the node->interrupts_active check prevents new work items from being enqueued. So after kfd_cleanup_nodes frees a kfd node, there are no pending kfd ih_wq work items.

I agree there is still a potential race here, since the kfd->nodes[i] NULL check is not protected by a lock. I will address it together with the issue Lijo mentioned.

-----Original Message-----
From: Chen, Xiaogang <Xiaogang.Chen@amd.com>
Sent: Friday, September 26, 2025 3:11 AM
To: Yang, Philip <Philip.Yang@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com>
Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

On 9/24/2025 5:48 PM, Philip Yang wrote:
>
> On 2025-09-24 11:29, Yifan Zhang wrote:
>> There is a race between amdgpu_amdkfd_device_fini_sw and the interrupt
>> handler, if amdgpu_amdkfd_device_fini_sw runs in between
>> kfd_cleanup_nodes and kfree(kfd) and a KGD interrupt is generated.
>>
>> kernel panic log:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000098
>> [register dump and call trace trimmed; quoted in full in the original patch mail]
>>
>> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
> Reviewed-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index 349c351e242b..051a00152b08 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -1133,7 +1133,15 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd,
>> const void *ih_ring_entry)
>>   	}
>>   	for (i = 0; i < kfd->num_nodes; i++) {
>> -		node = kfd->nodes[i];
>> +		/* Race if another thread in b/w
>> +		 * kfd_cleanup_nodes and kfree(kfd),
>> +		 * when kfd->nodes[i] = NULL
>> +		 */
>> +		if (kfd->nodes[i])
>> +			node = kfd->nodes[i];
>> +		else
>> +			return;
>> +

The KFD interrupt is handled later in the workqueue node->kfd->ih_wq by interrupt_wq, which uses kfd->nodes[i]. Checking "kfd->nodes[i] == NULL" here is not enough, as kfd_cleanup_nodes can free a kfd node at any time.

Regards

Xiaogang

>>   		spin_lock_irqsave(&node->interrupt_lock, flags);
>>   		if (node->interrupts_active
* Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  2025-09-26  1:15 ` Zhang, Yifan
@ 2025-09-26 15:26 ` Chen, Xiaogang
  0 siblings, 0 replies; 16+ messages in thread
From: Chen, Xiaogang @ 2025-09-26 15:26 UTC (permalink / raw)
To: Zhang, Yifan, Yang, Philip, amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander, Kuehling, Felix, Lazar, Lijo

On 9/25/2025 8:15 PM, Zhang, Yifan wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> flush_workqueue(kfd->ih_wq) and destroy_workqueue(kfd->ih_wq) in kfd_cleanup_nodes clean up pending work items, and the node->interrupts_active check prevents new work items from being enqueued. So after kfd_cleanup_nodes frees a kfd node, there are no pending kfd ih_wq work items.
>
> I agree there is still a potential race here, since the kfd->nodes[i] NULL check is not protected by a lock. I will address it together with the issue Lijo mentioned.

Yes, the ih_wq is flushed and destroyed before the kfd node gets freed. But still, kfd->nodes[i] can become NULL at any time while kgd2kfd_interrupt runs in a different task. I think the real issue is that kfd->nodes[i] is not checked under node->interrupt_lock. Both kfd_cleanup_nodes and kgd2kfd_interrupt need to take this lock when they operate on a kfd node. Currently only kgd2kfd_interrupt uses it, and its locked region does not cover the kfd->nodes[i] access; kfd_cleanup_nodes also needs to take the lock to exclude kgd2kfd_interrupt from operating on the node.
Regards

Xiaogang

>
> -----Original Message-----
> From: Chen, Xiaogang <Xiaogang.Chen@amd.com>
> Sent: Friday, September 26, 2025 3:11 AM
> To: Yang, Philip <Philip.Yang@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com>
> Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
>
> On 9/24/2025 5:48 PM, Philip Yang wrote:
>> On 2025-09-24 11:29, Yifan Zhang wrote:
>>> There is a race between amdgpu_amdkfd_device_fini_sw and the interrupt
>>> handler, if amdgpu_amdkfd_device_fini_sw runs in between
>>> kfd_cleanup_nodes and kfree(kfd) and a KGD interrupt is generated.
>>>
>>> kernel panic log:
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000098
>>> [register dump, call trace, and patch hunk trimmed; quoted in full in the original patch mail]
>>>
>>> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
>> Reviewed-by: Philip Yang <Philip.Yang@amd.com>
> The KFD interrupt is handled later in the workqueue node->kfd->ih_wq by interrupt_wq, which uses kfd->nodes[i]. Checking "kfd->nodes[i] == NULL" here is not enough, as kfd_cleanup_nodes can free a kfd node at any time.
>
> Regards
>
> Xiaogang
>
>>>   		spin_lock_irqsave(&node->interrupt_lock, flags);
>>>   		if (node->interrupts_active
end of thread, other threads: [~2025-09-29 13:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-24 15:29 [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Yifan Zhang
2025-09-24 15:29 ` [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition Yifan Zhang
2025-09-24 23:14   ` Philip Yang
2025-09-26 19:52     ` Chen, Xiaogang
2025-09-29 13:09       ` Philip Yang
2025-09-24 22:48 ` [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw Philip Yang
2025-09-25  6:19   ` Lazar, Lijo
2025-09-25  6:41     ` Zhang, Yifan
2025-09-25  6:58       ` Lazar, Lijo
2025-09-25  7:06         ` Lazar, Lijo
2025-09-25  9:54           ` Zhang, Yifan
2025-09-26  6:49             ` Lazar, Lijo
2025-09-26 14:33               ` Zhang, Yifan
2025-09-25 19:11 ` Chen, Xiaogang
2025-09-26  1:15   ` Zhang, Yifan
2025-09-26 15:26     ` Chen, Xiaogang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox