public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Philip Yang <Philip.Yang@amd.com>,
	Felix Kuehling <felix.kuehling@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	Felix.Kuehling@amd.com, christian.koenig@amd.com,
	Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.6 20/31] drm/amdkfd: Fix lock dependency warning with srcu
Date: Sun, 28 Jan 2024 11:12:50 -0500	[thread overview]
Message-ID: <20240128161315.201999-20-sashal@kernel.org> (raw)
In-Reply-To: <20240128161315.201999-1-sashal@kernel.org>

From: Philip Yang <Philip.Yang@amd.com>

[ Upstream commit 2a9de42e8d3c82c6990d226198602be44f43f340 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
------------------------------------------------------
kworker/0:2/996 is trying to acquire lock:
        (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0

but task is already holding lock:
        ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at:
	process_one_work+0x211/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}:
        __flush_work+0x88/0x4f0
        svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
        svm_range_set_attr+0xd6/0x14c0 [amdgpu]
        kfd_ioctl+0x1d1/0x630 [amdgpu]
        __x64_sys_ioctl+0x88/0xc0

-> #2 (&info->lock#2){+.+.}-{3:3}:
        __mutex_lock+0x99/0xc70
        amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
        restore_process_helper+0x22/0x80 [amdgpu]
        restore_process_worker+0x2d/0xa0 [amdgpu]
        process_one_work+0x29b/0x560
        worker_thread+0x3d/0x3d0

-> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}:
        __flush_work+0x88/0x4f0
        __cancel_work_timer+0x12c/0x1c0
        kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
        __mmu_notifier_release+0xad/0x240
        exit_mmap+0x6a/0x3a0
        mmput+0x6a/0x120
        do_exit+0x322/0xb90
        do_group_exit+0x37/0xa0
        __x64_sys_exit_group+0x18/0x20
        do_syscall_64+0x38/0x80

-> #0 (srcu){.+.+}-{0:0}:
        __lock_acquire+0x1521/0x2510
        lock_sync+0x5f/0x90
        __synchronize_srcu+0x4f/0x1a0
        __mmu_notifier_release+0x128/0x240
        exit_mmap+0x6a/0x3a0
        mmput+0x6a/0x120
        svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
        process_one_work+0x29b/0x560
        worker_thread+0x3d/0x3d0

other info that might help us debug this:
Chain exists of:
  srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work)

Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
        lock((work_completion)(&svms->deferred_list_work));
                        lock(&info->lock#2);
			lock((work_completion)(&svms->deferred_list_work));
        sync(srcu);

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index a4c911fa1675..b51224a85a38 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2343,8 +2343,10 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 		mutex_unlock(&svms->lock);
 		mmap_write_unlock(mm);
 
-		/* Pairs with mmget in svm_range_add_list_work */
-		mmput(mm);
+		/* Pairs with mmget in svm_range_add_list_work. If dropping the
+		 * last mm refcount, schedule release work to avoid circular locking
+		 */
+		mmput_async(mm);
 
 		spin_lock(&svms->deferred_list_lock);
 	}
-- 
2.43.0


  parent reply	other threads:[~2024-01-28 16:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-28 16:12 [PATCH AUTOSEL 6.6 01/31] PCI: Only override AMD USB controller if required Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 02/31] PCI: switchtec: Fix stdev_release() crash after surprise hot remove Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 03/31] perf cs-etm: Bump minimum OpenCSD version to ensure a bugfix is present Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 04/31] xhci: fix possible null pointer deref during xhci urb enqueue Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 05/31] extcon: fix possible name leak in extcon_dev_register() Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 06/31] usb: hub: Replace hardcoded quirk value with BIT() macro Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 07/31] usb: hub: Add quirk to decrease IN-ep poll interval for Microchip USB491x hub Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 08/31] selftests/sgx: Fix linker script asserts Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 09/31] tty: allow TIOCSLCKTRMIOS with CAP_CHECKPOINT_RESTORE Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 10/31] fs/kernfs/dir: obey S_ISGID Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 11/31] spmi: mediatek: Fix UAF on device remove Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 12/31] power: supply: qcom_battmgr: Register the power supplies after PDR is up Sasha Levin
2024-01-29 13:04   ` Sebastian Reichel
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 13/31] PCI: Fix 64GT/s effective data rate calculation Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 14/31] PCI/AER: Decode Requester ID when no error info found Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 15/31] 9p: Fix initialisation of netfs_inode for 9p Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 16/31] usb: xhci-plat: fix usb disconnect issue after s4 Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 17/31] misc: lis3lv02d_i2c: Add missing setting of the reg_ctrl callback Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 18/31] libsubcmd: Fix memory leak in uniq() Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 19/31] drm/amdkfd: Fix lock dependency warning Sasha Levin
2024-01-28 16:12 ` Sasha Levin [this message]
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 21/31] virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 22/31] blk-mq: fix IO hang from sbitmap wakeup race Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 23/31] ceph: reinitialize mds feature bit even when session in open Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 24/31] ceph: fix deadlock or deadcode of misusing dget() Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 25/31] ceph: fix invalid pointer access if get_quota_realm return ERR_PTR Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 26/31] drm/amdgpu: fix avg vs input power reporting on smu7 Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 27/31] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()' Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 28/31] drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()' Sasha Levin
2024-01-28 16:12 ` [PATCH AUTOSEL 6.6 29/31] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()' Sasha Levin
2024-01-28 16:13 ` [PATCH AUTOSEL 6.6 30/31] drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()' Sasha Levin
2024-01-28 16:13 ` [PATCH AUTOSEL 6.6 31/31] i2c: rk3x: Adjust mask/value offset for i2c2 on rv1126 Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240128161315.201999-20-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Philip.Yang@amd.com \
    --cc=Xinhui.Pan@amd.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=felix.kuehling@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox