From: Matthew Auld <matthew.auld@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: [Intel-xe] [PATCH v14 09/10] drm/xe: drop xe_device_mem_access_get() from invalidation_vma
Date: Mon, 17 Jul 2023 12:25:12 +0100
Message-ID: <20230717112502.32379-21-matthew.auld@intel.com>
In-Reply-To: <20230717112502.32379-12-matthew.auld@intel.com>
Lockdep gives the following splat:
[ 594.158863] ffff888140da53f0 (&vm->userptr.notifier_lock){++++}-{3:3}, at: vma_userptr_invalidate+0xeb/0x330 [xe]
[ 594.158921]
but task is already holding lock:
[ 594.158926] ffffffff82761940 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x0/0x1c0
[ 594.158941]
which lock already depends on the new lock.
[ 594.158947]
the existing dependency chain (in reverse order) is:
[ 594.158953]
-> #5 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
[ 594.158961] fs_reclaim_acquire+0x68/0xd0
[ 594.158969] __kmem_cache_alloc_node+0x2c/0x1b0
[ 594.158975] kmalloc_node_trace+0x1d/0xb0
[ 594.158983] alloc_worker+0x18/0x50
[ 594.158989] init_rescuer.part.0+0x13/0xa0
[ 594.158995] workqueue_init+0xdf/0x210
[ 594.159001] kernel_init_freeable+0x5c/0x2f0
[ 594.159009] kernel_init+0x11/0x1a0
[ 594.159017] ret_from_fork+0x29/0x50
[ 594.159023]
-> #4 (fs_reclaim){+.+.}-{0:0}:
[ 594.159031] fs_reclaim_acquire+0xa0/0xd0
[ 594.159037] __kmem_cache_alloc_node+0x2c/0x1b0
[ 594.159042] kmalloc_trace+0x20/0xb0
[ 594.159048] acpi_device_add+0x25a/0x3f0
[ 594.159056] acpi_add_single_object+0x387/0x750
[ 594.159063] acpi_bus_check_add+0x108/0x280
[ 594.159069] acpi_bus_scan+0x34/0xf0
[ 594.159075] acpi_scan_init+0xed/0x2b0
[ 594.159082] acpi_init+0x21e/0x520
[ 594.159087] do_one_initcall+0x53/0x260
[ 594.159092] kernel_init_freeable+0x18a/0x2f0
[ 594.159099] kernel_init+0x11/0x1a0
[ 594.159105] ret_from_fork+0x29/0x50
[ 594.159110]
-> #3 (acpi_device_lock){+.+.}-{3:3}:
[ 594.159117] __mutex_lock+0x95/0xd10
[ 594.159122] acpi_enable_wakeup_device_power+0x30/0x120
[ 594.159130] __acpi_device_wakeup_enable+0x34/0x110
[ 594.159138] acpi_pm_set_device_wakeup+0x55/0x140
[ 594.159143] __pci_enable_wake+0x56/0xb0
[ 594.159150] pci_finish_runtime_suspend+0x35/0x80
[ 594.159157] pci_pm_runtime_suspend+0xb5/0x1a0
[ 594.159162] __rpm_callback+0x3c/0x110
[ 594.159170] rpm_callback+0x58/0x70
[ 594.159176] rpm_suspend+0x15c/0x6f0
[ 594.159182] pm_runtime_work+0x9b/0xb0
[ 594.159188] process_one_work+0x263/0x520
[ 594.159195] worker_thread+0x4d/0x3b0
[ 594.159200] kthread+0xeb/0x120
[ 594.159206] ret_from_fork+0x29/0x50
[ 594.159211]
-> #2 (acpi_wakeup_lock){+.+.}-{3:3}:
[ 594.159218] __mutex_lock+0x95/0xd10
[ 594.159223] acpi_pm_set_device_wakeup+0x7a/0x140
[ 594.159228] __pci_enable_wake+0x77/0xb0
[ 594.159234] pci_pm_runtime_resume+0x70/0xd0
[ 594.159240] __rpm_callback+0x3c/0x110
[ 594.159246] rpm_callback+0x58/0x70
[ 594.159252] rpm_resume+0x50d/0x7a0
[ 594.159258] rpm_resume+0x267/0x7a0
[ 594.159264] __pm_runtime_resume+0x45/0x90
[ 594.159270] xe_pm_runtime_resume_and_get+0x12/0x50 [xe]
[ 594.159314] xe_device_mem_access_get+0x97/0xc0 [xe]
[ 594.159346] hw_engines+0x65/0xf0 [xe]
[ 594.159380] seq_read_iter+0x10d/0x4b0
[ 594.159385] seq_read+0x9e/0xd0
[ 594.159390] full_proxy_read+0x4e/0x80
[ 594.159396] vfs_read+0xb6/0x310
[ 594.159401] ksys_read+0x60/0xe0
[ 594.159406] do_syscall_64+0x38/0x90
[ 594.159413] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 594.159419]
-> #1 (&xe->mem_access.lock){+.+.}-{3:3}:
[ 594.159427] xe_device_mem_access_get+0x43/0xc0 [xe]
[ 594.159457] xe_gt_tlb_invalidation_vma+0x53/0x190 [xe]
[ 594.159490] invalidation_fence_init+0x1d2/0x2c0 [xe]
[ 594.159529] __xe_pt_unbind_vma+0x151/0x4e0 [xe]
[ 594.159564] vm_bind_ioctl+0x48a/0xae0 [xe]
[ 594.159602] async_op_work_func+0x20c/0x530 [xe]
[ 594.159634] process_one_work+0x263/0x520
[ 594.159640] worker_thread+0x4d/0x3b0
[ 594.159646] kthread+0xeb/0x120
[ 594.159650] ret_from_fork+0x29/0x50
[ 594.159655]
-> #0 (&vm->userptr.notifier_lock){++++}-{3:3}:
[ 594.159663] __lock_acquire+0x16fa/0x2850
[ 594.159670] lock_acquire+0xd2/0x2e0
[ 594.159676] down_write+0x36/0xd0
[ 594.159681] vma_userptr_invalidate+0xeb/0x330 [xe]
[ 594.159714] __mmu_notifier_invalidate_range_start+0x239/0x2a0
[ 594.159722] unmap_vmas+0x1ac/0x1c0
[ 594.159727] unmap_region+0xb5/0x120
[ 594.159732] do_vmi_align_munmap+0x2be/0x430
[ 594.159739] do_vmi_munmap+0xea/0x120
[ 594.159744] __vm_munmap+0x9c/0x160
[ 594.159750] __x64_sys_munmap+0x12/0x20
[ 594.159756] do_syscall_64+0x38/0x90
[ 594.159761] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 594.159768]
other info that might help us debug this:
[ 594.159773] Chain exists of:
&vm->userptr.notifier_lock --> fs_reclaim -->
mmu_notifier_invalidate_range_start
[ 594.159785] Possible unsafe locking scenario:
[ 594.159790]        CPU0                    CPU1
[ 594.159794]        ----                    ----
[ 594.159797]   lock(mmu_notifier_invalidate_range_start);
[ 594.159802]                                lock(fs_reclaim);
[ 594.159808]                                lock(mmu_notifier_invalidate_range_start);
[ 594.159814]   lock(&vm->userptr.notifier_lock);
[ 594.159819]
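
To make the chain above easier to follow, here is a toy userspace model of
the dependency graph, not lockdep's actual algorithm. The enum, struct and
function names are all made up for illustration. Each edge is read off one
"-> #N" entry in the splat, and the sketch assumes that edge #1 (mem_access.lock
taken under notifier_lock) only comes from the xe_device_mem_access_get()
call in the TLB invalidation path, as the #1 backtrace suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the splat; this enum is illustrative only. */
enum lock_class {
	NOTIFIER_LOCK,		/* &vm->userptr.notifier_lock */
	MEM_ACCESS_LOCK,	/* &xe->mem_access.lock */
	ACPI_WAKEUP_LOCK,
	ACPI_DEVICE_LOCK,
	FS_RECLAIM,
	MMU_NOTIFIER,		/* mmu_notifier_invalidate_range_start */
	NR_LOCKS
};

/* dep[a][b] is true when b has been acquired while a was held. */
struct lock_graph {
	bool dep[NR_LOCKS][NR_LOCKS];
};

static void graph_add(struct lock_graph *g, enum lock_class held,
		      enum lock_class taken)
{
	assert(held != taken);	/* self-dependencies are not modelled */
	g->dep[held][taken] = true;
}

/* Depth-first search: is 'target' reachable from 'from'? */
static bool reaches(const struct lock_graph *g, int from, int target,
		    bool *seen)
{
	if (from == target)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reaches(g, next, target, seen))
			return true;
	return false;
}

/* Would taking 'taken' while holding 'held' close a cycle in the graph? */
static bool would_deadlock(const struct lock_graph *g, enum lock_class held,
			   enum lock_class taken)
{
	bool seen[NR_LOCKS] = { false };

	return reaches(g, taken, held, seen);
}

/* Record chain entries #1..#5; #1 exists only with the explicit get. */
static void record_splat_chain(struct lock_graph *g, bool get_in_invalidation)
{
	memset(g, 0, sizeof(*g));
	if (get_in_invalidation)
		graph_add(g, NOTIFIER_LOCK, MEM_ACCESS_LOCK);	/* #1 */
	graph_add(g, MEM_ACCESS_LOCK, ACPI_WAKEUP_LOCK);	/* #2 */
	graph_add(g, ACPI_WAKEUP_LOCK, ACPI_DEVICE_LOCK);	/* #3 */
	graph_add(g, ACPI_DEVICE_LOCK, FS_RECLAIM);		/* #4 */
	graph_add(g, FS_RECLAIM, MMU_NOTIFIER);			/* #5 */
}
```

Entry #0 is the new acquisition being checked: notifier_lock taken while
mmu_notifier_invalidate_range_start is held. In this model,
would_deadlock(&g, MMU_NOTIFIER, NOTIFIER_LOCK) reports a cycle only when
edge #1 is recorded.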
The VM should already be holding a mem_access.ref here, so this looks
like a false positive, and we can just drop the explicit mem_access
get/put in xe_gt_tlb_invalidation_vma(). The GGTT invalidation path also
takes care to hold a mem_access.ref, so it should be fine there as well,
and we already assert that we hold a mem_access.ref for the GuC
communication underneath.
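
As a rough illustration of the pattern described above (the outer caller
owns the reference, the leaf only checks for it), here is a minimal
userspace sketch. Everything prefixed toy_ is hypothetical and is not the
xe driver's API; in particular the leaf returns an error where the real
code asserts, purely so the behaviour can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; not the real struct xe_device. */
struct toy_device {
	int mem_access_ref;	/* outstanding references */
	int resume_calls;	/* times the slow, lock-taking wake path ran */
};

/* Outer entry point: may take PM locks, so call it before other locks. */
static void toy_mem_access_get(struct toy_device *dev)
{
	if (dev->mem_access_ref++ == 0)
		dev->resume_calls++;	/* stands in for runtime resume */
}

static void toy_mem_access_put(struct toy_device *dev)
{
	assert(dev->mem_access_ref > 0);
	dev->mem_access_ref--;
}

static bool toy_mem_access_ongoing(const struct toy_device *dev)
{
	return dev->mem_access_ref > 0;
}

/*
 * Leaf path, possibly reached with other locks held: it takes no
 * reference of its own and only checks that the caller holds one.
 * Returns 0 on success, -1 if no reference is held.
 */
static int toy_send_invalidation(struct toy_device *dev)
{
	if (!toy_mem_access_ongoing(dev))
		return -1;
	return 0;
}
```

A caller would do toy_mem_access_get(), then any number of
toy_send_invalidation() calls, then toy_mem_access_put(). The leaf never
enters the wake path, so it cannot pull the PM locks in underneath whatever
locks are held at that point.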
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index e2b85559257c..cad0ade595ec 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -248,7 +248,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
struct xe_device *xe = gt_to_xe(gt);
#define MAX_TLB_INVALIDATION_LEN 7
u32 action[MAX_TLB_INVALIDATION_LEN];
- int len = 0, ret;
+ int len = 0;
XE_BUG_ON(!vma);
@@ -302,11 +302,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
XE_BUG_ON(len > MAX_TLB_INVALIDATION_LEN);
- xe_device_mem_access_get(xe);
- ret = send_tlb_invalidation(&gt->uc.guc, fence, action, len);
- xe_device_mem_access_put(xe);
-
- return ret;
+ return send_tlb_invalidation(&gt->uc.guc, fence, action, len);
}
/**
--
2.41.0
Thread overview: 24+ messages
2023-07-17 11:25 [Intel-xe] [PATCH v14 00/10] xe_device_mem_access fixes and related bits Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 01/10] drm/xe: fix xe_device_mem_access_get() races Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 02/10] drm/xe/vm: tidy up xe_runtime_pm usage Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 03/10] drm/xe/debugfs: grab mem_access around forcewake Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 04/10] drm/xe/guc_pc: add missing mem_access for freq_rpe_show Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 05/10] drm/xe/mmio: grab mem_access in xe_mmio_ioctl Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 06/10] drm/xe: ensure correct access_put ordering Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 07/10] drm/xe: drop xe_device_mem_access_get() from guc_ct_send Matthew Auld
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 08/10] drm/xe/ggtt: prime ggtt->lock against FS_RECLAIM Matthew Auld
2023-07-17 11:25 ` Matthew Auld [this message]
2023-07-17 11:25 ` [Intel-xe] [PATCH v14 10/10] drm/xe: add lockdep annotation for xe_device_mem_access_get() Matthew Auld
2023-07-17 11:28 ` [Intel-xe] ✓ CI.Patch_applied: success for xe_device_mem_access fixes and related bits (rev4) Patchwork
2023-07-17 11:28 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-07-17 11:29 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-07-17 11:33 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-17 11:34 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-17 11:35 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-18 12:50 ` [Intel-xe] ✓ CI.Patch_applied: success for xe_device_mem_access fixes and related bits (rev6) Patchwork
2023-07-18 12:50 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-07-18 12:51 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-07-18 12:55 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-18 12:56 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-18 12:57 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-18 13:31 ` [Intel-xe] ○ CI.BAT: info " Patchwork