All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
	dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Subject: dma_buf_detach lockdep splat
Date: Wed, 26 Jun 2024 17:58:02 +0200	[thread overview]
Message-ID: <40611e5ff8c32b99e7863293a0baac078f323d8f.camel@linux.intel.com> (raw)

Hi!

I'm seeing the below lockdep splat 1) with the xe driver in an imported
dma-buf object destruction path.

It's not because we hold the dma_resv lock at that point, but rather
because we hold *another* dma_resv lock at that point, and the dma_resv
detach happens when the object is idle, in this case it was idle at the
final put(), and dma_buf_detach() is called in the putting process.

Holding another dma-buf lock might happen as part of
drm_exec_unlock_all, or simply if the wider vm dma_resv was held at
object put time, so it's not an uncommon pattern, even if the drm_exec
instance can be fixed by putting all bos after unlocking them all.

Two solutions coming to mind here:

1) Provide a dma_buf_detach_locked()
2) Have TTM always take the delayed delete path for imported dma-buf
objects.

I'd prefer 1) since I think the correct place to call this is in the
TTM callback delete_mem_notify() where the bo is already locked, and I
figure non-TTM gem backends may come to suffer from the same problem.

Opinions, suggestions?

[1]
[   99.136161] ============================================
[   99.136162] WARNING: possible recursive locking detected
[   99.136163] 6.10.0-rc2+ #6 Tainted: G     U            
[   99.136165] --------------------------------------------
[   99.136166] glxgears:sh0/4675 is trying to acquire lock:
[   99.136167] ffff9967dcdd91a8 (reservation_ww_class_mutex){+.+.}-
{3:3}, at: dma_buf_detach+0x3b/0xf0
[   99.136184] 
               but task is already holding lock:
[   99.136186] ffff9967d8c145a8 (reservation_ww_class_mutex){+.+.}-
{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[   99.136191] 
               other info that might help us debug this:
[   99.136192]  Possible unsafe locking scenario:

[   99.136194]        CPU0
[   99.136194]        ----
[   99.136195]   lock(reservation_ww_class_mutex);
[   99.136197]   lock(reservation_ww_class_mutex);
[   99.136199] 
                *** DEADLOCK ***

[   99.136199]  May be due to missing lock nesting notation

[   99.136200] 5 locks held by glxgears:sh0/4675:
[   99.136202]  #0: ffff9967d8c104c8 (&xef->vm.lock){+.+.}-{3:3}, at:
xe_file_close+0xde/0x1c0 [xe]
[   99.136272]  #1: ffff9967d5bb7480 (&vm->lock){++++}-{3:3}, at:
xe_vm_close_and_put+0x161/0x9b0 [xe]
[   99.136350]  #2: ffff9967ef88a970 (&val->lock){.+.+}-{3:3}, at:
xe_validation_ctx_init+0x6d/0x70 [xe]
[   99.136440]  #3: ffffbd6a085577b8
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
xe_vma_destroy_unlocked+0x7f/0xe0 [xe]
[   99.136546]  #4: ffff9967d8c145a8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[   99.136552] 
               stack backtrace:
[   99.136553] CPU: 10 PID: 4675 Comm: glxgears:sh0 Tainted: G     U  
6.10.0-rc2+ #6
[   99.136555] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[   99.136557] Call Trace:
[   99.136558]  <TASK>
[   99.136560]  dump_stack_lvl+0x77/0xb0
[   99.136564]  __lock_acquire+0x1232/0x2160
[   99.136569]  lock_acquire+0xcb/0x2d0
[   99.136570]  ? dma_buf_detach+0x3b/0xf0
[   99.136574]  ? __lock_acquire+0x417/0x2160
[   99.136577]  __ww_mutex_lock.constprop.0+0xd0/0x13b0
[   99.136580]  ? dma_buf_detach+0x3b/0xf0
[   99.136584]  ? dma_buf_detach+0x3b/0xf0
[   99.136588]  ? ww_mutex_lock+0x2b/0x90
[   99.136590]  ww_mutex_lock+0x2b/0x90
[   99.136592]  dma_buf_detach+0x3b/0xf0
[   99.136595]  drm_prime_gem_destroy+0x2f/0x40 [drm]
[   99.136638]  xe_ttm_bo_destroy+0x32/0x220 [xe]
[   99.136734]  ? __mutex_unlock_slowpath+0x3a/0x290
[   99.136738]  drm_exec_unlock_all+0xa1/0xd0 [drm_exec]
[   99.136741]  drm_exec_fini+0x12/0xb0 [drm_exec]
[   99.136743]  xe_validation_ctx_fini+0x15/0x40 [xe]
[   99.136848]  xe_vma_destroy_unlocked+0xb1/0xe0 [xe]
[   99.136954]  xe_vm_close_and_put+0x41a/0x9b0 [xe]
[   99.137056]  ? xa_find+0xe3/0x1e0
[   99.137060]  xe_file_close+0x10a/0x1c0 [xe]
[   99.137157]  drm_file_free+0x22a/0x280 [drm]
[   99.137193]  drm_release_noglobal+0x22/0x70 [drm]
[   99.137227]  __fput+0xf1/0x2d0
[   99.137231]  task_work_run+0x59/0x90
[   99.137235]  do_exit+0x330/0xb40
[   99.137238]  do_group_exit+0x36/0xa0
[   99.137241]  get_signal+0xbd2/0xbe0
[   99.137245]  arch_do_signal_or_restart+0x3e/0x240
[   99.137249]  syscall_exit_to_user_mode+0x1e7/0x290
[   99.137252]  do_syscall_64+0xa1/0x180
[   99.137255]  ? _raw_spin_unlock+0x23/0x40
[   99.137257]  ? look_up_lock_class+0x6f/0x120
[   99.137261]  ? __lock_acquire+0x417/0x2160
[   99.137264]  ? lock_acquire+0xcb/0x2d0
[   99.137266]  ? __set_task_comm+0x28/0x1e0
[   99.137268]  ? find_held_lock+0x2b/0x80
[   99.137271]  ? __set_task_comm+0xe1/0x1e0
[   99.137273]  ? lock_release+0xca/0x290
[   99.137277]  ? __do_sys_prctl+0x245/0xab0
[   99.137279]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[   99.137281]  ? syscall_exit_to_user_mode+0xb0/0x290
[   99.137284]  ? do_syscall_64+0xa1/0x180
[   99.137286]  ? cpuset_cpus_allowed+0x36/0x140
[   99.137289]  ? find_held_lock+0x2b/0x80
[   99.137291]  ? find_held_lock+0x2b/0x80
[   99.137294]  ? __sched_setaffinity+0x78/0x240
[   99.137297]  ? kfree+0xe2/0x310
[   99.137301]  ? kfree+0x202/0x310
[   99.137303]  ? __sched_setaffinity+0x78/0x240
[   99.137305]  ? __x64_sys_sched_setaffinity+0x69/0xb0
[   99.137307]  ? kfree+0xe2/0x310
[   99.137310]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[   99.137312]  ? syscall_exit_to_user_mode+0xb0/0x290
[   99.137315]  ? do_syscall_64+0xa1/0x180
[   99.137317]  ? trace_hardirqs_off+0x4b/0xc0
[   99.137321]  ? clear_bhb_loop+0x45/0xa0
[   99.137325]  ? clear_bhb_loop+0x45/0xa0
[   99.137327]  ? clear_bhb_loop+0x45/0xa0
[   99.137330]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   99.137333] RIP: 0033:0x7fda70ee6169
[   99.137351] Code: Unable to access opcode bytes at 0x7fda70ee613f.
[   99.137352] RSP: 002b:00007fda5fdffc80 EFLAGS: 00000246 ORIG_RAX:
00000000000000ca
[   99.137354] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX:
00007fda70ee6169
[   99.137356] RDX: 0000000000000000 RSI: 0000000000000189 RDI:
0000564a96f45b30
[   99.137358] RBP: 00007fda5fdffcb0 R08: 0000000000000000 R09:
00000000ffffffff
[   99.137359] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000000000
[   99.137360] R13: 0000000000000000 R14: 0000000000000000 R15:
0000564a96f45b30
[   99.137365]  </TASK>


             reply	other threads:[~2024-06-26 15:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-26 15:58 Thomas Hellström [this message]
2024-06-27  8:04 ` dma_buf_detach lockdep splat Daniel Vetter
2024-06-27  8:25   ` Christian König
2024-06-27 12:03     ` [Linaro-mm-sig] " Thomas Hellström
2024-06-27 12:18   ` Thomas Hellström
2024-06-28 18:06     ` Daniel Vetter
2024-07-01  8:56       ` Thomas Hellström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40611e5ff8c32b99e7863293a0baac078f323d8f.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dmitry.osipenko@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.