From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Philip Yang <Philip.Yang@amd.com>, Ruili Ji <ruili.ji@amd.com>,
Felix Kuehling <Felix.Kuehling@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Sasha Levin <sashal@kernel.org>,
christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@linux.ie,
daniel@ffwll.ch, amd-gfx@lists.freedesktop.org,
dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.16 017/109] drm/amdkfd: svm range restore work deadlock when process exit
Date: Fri, 1 Apr 2022 10:31:24 -0400
Message-ID: <20220401143256.1950537-17-sashal@kernel.org>
In-Reply-To: <20220401143256.1950537-1-sashal@kernel.org>
From: Philip Yang <Philip.Yang@amd.com>
[ Upstream commit 6225bb3a88d22594aacea2485dc28ca12d596721 ]
kfd_process_notifier_release flushes svm_range_restore_work, which
calls svm_range_list_lock_and_flush_work to flush the deferred_list
work. But if the mmput in the deferred_list work releases the last mm
user, it calls exit_mmap -> notifier_release, which deadlocks with the
backtrace below.
Move the flush of svm_range_restore_work to kfd_process_wq_release to
avoid the deadlock. svm_range_restore_work then takes a task->mm
reference so that the mm cannot go away while ranges are validated and
mapped to the GPU.
Workqueue: events svm_range_deferred_list_work [amdgpu]
Call Trace:
wait_for_completion+0x94/0x100
__flush_work+0x12a/0x1e0
__cancel_work_timer+0x10e/0x190
cancel_delayed_work_sync+0x13/0x20
kfd_process_notifier_release+0x98/0x2a0 [amdgpu]
__mmu_notifier_release+0x74/0x1f0
exit_mmap+0x170/0x200
mmput+0x5d/0x130
svm_range_deferred_list_work+0x104/0x230 [amdgpu]
process_one_work+0x220/0x3c0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reported-by: Ruili Ji <ruili.ji@amd.com>
Tested-by: Ruili Ji <ruili.ji@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 -
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 15 +++++++++------
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b993011cfa64..990228711108 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1150,7 +1150,6 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
cancel_delayed_work_sync(&p->eviction_work);
cancel_delayed_work_sync(&p->restore_work);
- cancel_delayed_work_sync(&p->svms.restore_work);
mutex_lock(&p->mutex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ea1c5aaf659a..a1b0c6bda803 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1589,13 +1589,14 @@ static void svm_range_restore_work(struct work_struct *work)
pr_debug("restore svm ranges\n");
- /* kfd_process_notifier_release destroys this worker thread. So during
- * the lifetime of this thread, kfd_process and mm will be valid.
- */
p = container_of(svms, struct kfd_process, svms);
- mm = p->mm;
- if (!mm)
+
+ /* Keep mm reference when svm_range_validate_and_map ranges */
+ mm = get_task_mm(p->lead_thread);
+ if (!mm) {
+ pr_debug("svms 0x%p process mm gone\n", svms);
return;
+ }
svm_range_list_lock_and_flush_work(svms, mm);
mutex_lock(&svms->lock);
@@ -1649,6 +1650,7 @@ static void svm_range_restore_work(struct work_struct *work)
out_reschedule:
mutex_unlock(&svms->lock);
mmap_write_unlock(mm);
+ mmput(mm);
/* If validation failed, reschedule another attempt */
if (evicted_ranges) {
@@ -2779,6 +2781,8 @@ void svm_range_list_fini(struct kfd_process *p)
pr_debug("pasid 0x%x svms 0x%p\n", p->pasid, &p->svms);
+ cancel_delayed_work_sync(&p->svms.restore_work);
+
/* Ensure list work is finished before process is destroyed */
flush_work(&p->svms.deferred_list_work);
@@ -2789,7 +2793,6 @@ void svm_range_list_fini(struct kfd_process *p)
atomic_inc(&p->svms.drain_pagefaults);
svm_range_drain_retry_fault(&p->svms);
-
list_for_each_entry_safe(prange, next, &p->svms.list, list) {
svm_range_unlink(prange);
svm_range_remove_notifier(prange);
--
2.34.1
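For readers following the patch, the get_task_mm()/mmput() pairing added to svm_range_restore_work can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: fake_mm, fake_task, fake_get_task_mm, fake_mmput and fake_restore_work are hypothetical stand-ins, and the real kernel helpers also handle locking and kernel threads, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct: only a user count. */
struct fake_mm {
	int users;      /* loosely analogous to mm_users */
	bool released;  /* set when the last user is dropped */
};

/* Hypothetical stand-in for a task whose mm may already be gone. */
struct fake_task {
	struct fake_mm *mm;
};

/* Loosely like get_task_mm(): take a reference, or return NULL. */
static struct fake_mm *fake_get_task_mm(struct fake_task *task)
{
	struct fake_mm *mm = task->mm;

	if (!mm)
		return NULL;
	mm->users++;
	return mm;
}

/* Loosely like mmput(): drop a reference; the last drop tears down. */
static void fake_mmput(struct fake_mm *mm)
{
	assert(mm->users > 0);
	if (--mm->users == 0)
		mm->released = true;
}

/*
 * Shape of the patched worker: pin the mm for the duration of the
 * work, bail out early if the process mm is already gone, and drop
 * the reference on the way out. Returns false on the early exit.
 */
static bool fake_restore_work(struct fake_task *task)
{
	struct fake_mm *mm = fake_get_task_mm(task);

	if (!mm)
		return false;

	/* Validation and mapping would run here; mm stays alive. */
	assert(!mm->released);

	fake_mmput(mm);
	return true;
}
```

In this sketch the worker leaves the user count unchanged when it finishes, and it simply returns when the task no longer has an mm, which mirrors the new "process mm gone" early exit in the patch.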