From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [RFC 19/34] drm/xe: Remove pm_runtime lockdep
Date: Thu, 15 Feb 2024 17:47:48 -0500 [thread overview]
Message-ID: <Zc6UlCthcR61eI4y@intel.com> (raw)
In-Reply-To: <fc63da97-8aee-4027-a723-723ef7ac9567@intel.com>
On Mon, Feb 05, 2024 at 11:54:45AM +0000, Matthew Auld wrote:
> On 26/01/2024 20:30, Rodrigo Vivi wrote:
> > This lockdep was initially designed for the mem_access,
> > where the mem_access needed to run the resume sync from
> > the innerbounds and count the references.
> >
> > With the runtime moving to the outer bounds of the driver
> > and the mem_access replaced by the pure rpm get/put
> > references, it is no longer needed and it is in a matter
>
> We are also calling it in workers like invalidation_fence_work_func(),
> xe_sched_process_msg_work() etc. It is still quite easy to deadlock with
> that.
>
> > of fact just splatting may false positives as the following:
>
> Yeah, calling invalidation_fence_work_func() directly in the callers context
> is a false positive since ioctl has the rpm ref. But what about calling that
> in the async case where invalidation_fence_work_func() is only called when
> the in-fence signals? We are sure that is safe? It should be simple to fix
> the false positive here, no? If we just delete all the annotations we get
> zero help from lockdep for the more complicated cases. Also IIRC lockdep
> doesn't show you every splat once it triggers once, so who knows what else
> is lurking and whether that is also false positive?
>
> >
> > -> #1 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}:
> > [ 384.778761] xe_pm_runtime_get+0xa3/0x100 [xe]
> > [ 384.783871] invalidation_fence_work_func+0x7f/0x2b0 [xe]
> > [ 384.789942] invalidation_fence_init+0x8c2/0xce0 [xe]
> > [ 384.795671] __xe_pt_unbind_vma+0x4a7/0x1be0 [xe]
> > [ 384.801050] xe_vm_unbind+0x22f/0xc70 [xe]
> > [ 384.805821] __xe_vma_op_execute+0xc67/0x1af0 [xe]
> > [ 384.811286] xe_vm_bind_ioctl+0x3a36/0x66c0 [xe]
> > [ 384.816579] drm_ioctl_kernel+0x14a/0x2c0
> > [ 384.821132] drm_ioctl+0x4c6/0xab0
> > [ 384.825073] xe_drm_ioctl+0xa1/0xe0 [xe]
> > [ 384.829651] __x64_sys_ioctl+0x130/0x1a0
> > [ 384.834115] do_syscall_64+0x5c/0xe0
> > [ 384.838232] entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > [ 384.843829]
> > -> #0 (reservation_ww_class_mutex){+.+.}-{4:4}:
> > [ 384.850911] __lock_acquire+0x3261/0x6330
> > [ 384.855462] lock_acquire+0x19b/0x4d0
> > [ 384.859666] __ww_mutex_lock.constprop.0+0x1d8/0x3500
> > [ 384.865263] ww_mutex_lock+0x38/0x150
> > [ 384.869465] xe_bo_lock+0x41/0x70 [xe]
> > [ 384.873869] xe_bo_evict_all+0x7ad/0xa40 [xe]
> > [ 384.878883] xe_pm_runtime_suspend+0x297/0x340 [xe]
> > [ 384.884431] xe_pci_runtime_suspend+0x3b/0x1e0 [xe]
> > [ 384.889975] pci_pm_runtime_suspend+0x168/0x540
> > [ 384.895052] __rpm_callback+0xa9/0x390
> > [ 384.899343] rpm_callback+0x1aa/0x210
> > [ 384.903543] rpm_suspend+0x2ea/0x14c0
> > [ 384.907746] pm_runtime_work+0x133/0x170
> > [ 384.912213] process_one_work+0x73b/0x1230
> > [ 384.916853] worker_thread+0x726/0x1320
> > [ 384.921237] kthread+0x2ee/0x3d0
> > [ 384.925005] ret_from_fork+0x2d/0x70
> > [ 384.929120] ret_from_fork_asm+0x1b/0x30
> > [ 384.933585]
> > other info that might help us debug this:
> >
> > [ 384.941625] Possible unsafe locking scenario:
> >
> > [ 384.947572] CPU0 CPU1
> > [ 384.952123] ---- ----
> > [ 384.956676] lock(xe_pm_runtime_lockdep_map);
> > [ 384.961140] lock(reservation_ww_class_mutex);
> > [ 384.968220] lock(xe_pm_runtime_lockdep_map);
> > [ 384.975214] lock(reservation_ww_class_mutex);
> > [ 384.979765]
> > *** DEADLOCK ***
> >
> > In a matter of fact, there's actually a third lock that is not
> > in this picture:
> > spin_lock_irq(&dev->power.lock);
> > and INIT_WORK(&dev->power.work, pm_runtime_work);
> >
> > The pm_callback_task will ensure that there's no recursive
> > calls of the resume function and it will increase the
> > reference counter anyway.
> >
> > Then, the pm_runtime workqueue and spin locks will avoid that
> > any resume and suspend operations happens in parallel with
> > other resume and suspend operations.
> >
> > With that, the only thing that we are actually doing here is
> > to wrongly train the lockdep, basically saying that we will
> > acquire some locks on resume and on suspend concurrently,
> > entirely ignoring its serialization and protection.
> >
> > The above scenario is simply not possible because there's
> > a serialization with the spin_lock_irq(&dev->power.lock)
> > before each operation. However we are telling the lockep
> > that the lock(xe_pm_runtime_lockdep_map) occurs before
> > the &dev->power.lock and lockdep is not capable to see
> > that other protection.
>
> Can you share some more info here? AFAIK dev->power.lock is an RPM core lock
> that protects internal state like transitioning between ACTIVE, SUSPENDING,
> RESUMING etc. It is never held when calling any of our rpm callbacks, so it
> should never factor in to xe_pm_runtime_lockdep_map.
>
> The overall thing should look like this:
>
> xe_rpm_get:
> lock_map_acquire(&xe_pm_runtime_lockdep_map);
> lock_map_release(&xe_pm_runtime_lockdep_map);
> .....
> RPM core grabs dev->power.lock
>
> rpm resume callback: (RPM has dropped dev->power.lock)
^ this is exactly the problem.
RPM doesn't drop the dev->power.lock while the callback
is called. it relaxes while waiting for other transaction
to finish, but it is hold by the time it gets to the callback.
> lock_map_acquire(&xe_pm_runtime_lockdep_map);
> ....do resume stuff
> lock_map_release(&xe_pm_runtime_lockdep_map);
>
> rpm suspend callback: (RPM has dropped dev->power.lock)
> lock_map_acquire(&xe_pm_runtime_lockdep_map);
> .....do suspend stuff
> lock_map_release(&xe_pm_runtime_lockdep_map);
>
> >
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_pm.c | 55 --------------------------------------
> > 1 file changed, 55 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index 86bf225dba02..f49e449d9fb7 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -67,12 +67,6 @@
> > */
> > -#ifdef CONFIG_LOCKDEP
> > -struct lockdep_map xe_pm_runtime_lockdep_map = {
> > - .name = "xe_pm_runtime_lockdep_map"
> > -};
> > -#endif
> > -
> > /**
> > * xe_pm_suspend - Helper for System suspend, i.e. S0->S3 / S0->S2idle
> > * @xe: xe device instance
> > @@ -291,29 +285,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > /* Disable access_ongoing asserts and prevent recursive pm calls */
> > xe_pm_write_callback_task(xe, current);
> > - /*
> > - * The actual xe_pm_runtime_put() is always async underneath, so
> > - * exactly where that is called should makes no difference to us. However
> > - * we still need to be very careful with the locks that this callback
> > - * acquires and the locks that are acquired and held by any callers of
> > - * xe_runtime_pm_get(). We already have the matching annotation
> > - * on that side, but we also need it here. For example lockdep should be
> > - * able to tell us if the following scenario is in theory possible:
> > - *
> > - * CPU0 | CPU1 (kworker)
> > - * lock(A) |
> > - * | xe_pm_runtime_suspend()
> > - * | lock(A)
> > - * xe_pm_runtime_get() |
> > - *
> > - * This will clearly deadlock since rpm core needs to wait for
> > - * xe_pm_runtime_suspend() to complete, but here we are holding lock(A)
> > - * on CPU0 which prevents CPU1 making forward progress. With the
> > - * annotation here and in xe_pm_runtime_get() lockdep will see
> > - * the potential lock inversion and give us a nice splat.
> > - */
> > - lock_map_acquire(&xe_pm_runtime_lockdep_map);
> > -
> > /*
> > * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
> > * also checks and delets bo entry from user fault list.
> > @@ -341,7 +312,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > if (xe->d3cold.allowed)
> > xe_display_pm_runtime_suspend(xe);
> > out:
> > - lock_map_release(&xe_pm_runtime_lockdep_map);
> > xe_pm_write_callback_task(xe, NULL);
> > return err;
> > }
> > @@ -361,8 +331,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > /* Disable access_ongoing asserts and prevent recursive pm calls */
> > xe_pm_write_callback_task(xe, current);
> > - lock_map_acquire(&xe_pm_runtime_lockdep_map);
> > -
> > /*
> > * It can be possible that xe has allowed d3cold but other pcie devices
> > * in gfx card soc would have blocked d3cold, therefore card has not
> > @@ -400,31 +368,10 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > goto out;
> > }
> > out:
> > - lock_map_release(&xe_pm_runtime_lockdep_map);
> > xe_pm_write_callback_task(xe, NULL);
> > return err;
> > }
> > -/*
> > - * For places where resume is synchronous it can be quite easy to deadlock
> > - * if we are not careful. Also in practice it might be quite timing
> > - * sensitive to ever see the 0 -> 1 transition with the callers locks
> > - * held, so deadlocks might exist but are hard for lockdep to ever see.
> > - * With this in mind, help lockdep learn about the potentially scary
> > - * stuff that can happen inside the runtime_resume callback by acquiring
> > - * a dummy lock (it doesn't protect anything and gets compiled out on
> > - * non-debug builds). Lockdep then only needs to see the
> > - * xe_pm_runtime_lockdep_map -> runtime_resume callback once, and then can
> > - * hopefully validate all the (callers_locks) -> xe_pm_runtime_lockdep_map.
> > - * For example if the (callers_locks) are ever grabbed in the
> > - * runtime_resume callback, lockdep should give us a nice splat.
> > - */
> > -static void pm_runtime_lockdep_training(void)
> > -{
> > - lock_map_acquire(&xe_pm_runtime_lockdep_map);
> > - lock_map_release(&xe_pm_runtime_lockdep_map);
> > -}
> > -
> > /**
> > * xe_pm_runtime_get - Get a runtime_pm reference and resume synchronously
> > * @xe: xe device instance
> > @@ -436,7 +383,6 @@ void xe_pm_runtime_get(struct xe_device *xe)
> > if (xe_pm_read_callback_task(xe) == current)
> > return;
> > - pm_runtime_lockdep_training();
> > pm_runtime_resume(xe->drm.dev);
> > }
> > @@ -466,7 +412,6 @@ int xe_pm_runtime_get_sync(struct xe_device *xe)
> > if (WARN_ON(xe_pm_read_callback_task(xe) == current))
> > return -ELOOP;
> > - pm_runtime_lockdep_training();
> > return pm_runtime_get_sync(xe->drm.dev);
> > }
next prev parent reply other threads:[~2024-02-15 22:47 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-26 20:30 [RFC 00/34] Kill mem_access v2 Rodrigo Vivi
2024-01-26 20:30 ` [RFC 01/34] Revert "drm/xe/uc: Store firmware binary in system-memory backed BO" Rodrigo Vivi
2024-01-26 20:30 ` [RFC 02/34] drm/xe: Document Xe PM component Rodrigo Vivi
2024-01-29 10:38 ` Francois Dugast
2024-01-26 20:30 ` [RFC 03/34] drm/xe: Fix display runtime_pm handling Rodrigo Vivi
2024-02-05 9:11 ` Matthew Auld
2024-02-14 18:05 ` Rodrigo Vivi
2024-02-15 9:30 ` Matthew Auld
2024-02-15 22:19 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 04/34] drm/xe: Create a xe_pm_runtime_resume_and_get variant for display Rodrigo Vivi
2024-01-26 20:30 ` [RFC 05/34] drm/xe: Convert xe_pm_runtime_{get, put} to void and protect from recursion Rodrigo Vivi
2024-01-26 20:30 ` [RFC 06/34] drm/xe: Prepare display for D3Cold Rodrigo Vivi
2024-01-26 20:30 ` [RFC 07/34] drm/xe: Convert mem_access assertion towards the runtime_pm state Rodrigo Vivi
2024-02-05 9:55 ` Matthew Auld
2024-02-14 18:15 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 08/34] drm/xe: Runtime PM wake on every IOCTL Rodrigo Vivi
2024-02-05 9:39 ` Matthew Auld
2024-01-26 20:30 ` [RFC 09/34] drm/xe: Convert kunit tests from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 9:57 ` Matthew Auld
2024-01-26 20:30 ` [RFC 10/34] drm/xe: Convert scheduler towards direct pm_runtime Rodrigo Vivi
2024-02-05 10:46 ` Matthew Auld
2024-01-26 20:30 ` [RFC 11/34] drm/xe: Runtime PM wake on every sysfs call Rodrigo Vivi
2024-02-05 10:55 ` Matthew Auld
2024-02-14 18:48 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 12/34] drm/xe: Ensure device is awake before removing it Rodrigo Vivi
2024-02-05 11:05 ` Matthew Auld
2024-02-14 18:51 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 13/34] drm/xe: Remove mem_access from guc_pc calls Rodrigo Vivi
2024-02-05 11:08 ` Matthew Auld
2024-01-26 20:30 ` [RFC 14/34] drm/xe: Runtime PM wake on every debugfs call Rodrigo Vivi
2024-02-05 11:10 ` Matthew Auld
2024-02-14 18:57 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 15/34] drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls Rodrigo Vivi
2024-02-05 11:15 ` Matthew Auld
2024-01-26 20:30 ` [RFC 16/34] drm/xe: Removing extra mem_access protection from runtime pm Rodrigo Vivi
2024-02-05 11:23 ` Matthew Auld
2024-01-26 20:30 ` [RFC 17/34] drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls Rodrigo Vivi
2024-02-05 11:25 ` Matthew Auld
2024-01-26 20:30 ` [RFC 18/34] drm/xe: Move lockdep protection from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 11:31 ` Matthew Auld
2024-01-26 20:30 ` [RFC 19/34] drm/xe: Remove pm_runtime lockdep Rodrigo Vivi
2024-02-05 11:54 ` Matthew Auld
2024-02-15 22:47 ` Rodrigo Vivi [this message]
2024-02-20 17:48 ` Matthew Auld
2024-02-28 16:53 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 20/34] drm/xe: Stop checking for power_lost on D3Cold Rodrigo Vivi
2024-01-26 20:30 ` [RFC 21/34] drm/xe: Convert GuC CT paths from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 12:23 ` Matthew Auld
2024-02-28 16:51 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 22/34] drm/xe: Keep D0 for the entire duration of a LR VM Rodrigo Vivi
2024-01-26 20:30 ` [RFC 23/34] drm/xe: Ensure D0 on TLB invalidation Rodrigo Vivi
2024-02-05 12:41 ` Matthew Auld
2024-01-26 20:30 ` [RFC 24/34] drm/xe: Remove useless mem_access protection for query ioctls Rodrigo Vivi
2024-02-05 12:43 ` Matthew Auld
2024-01-26 20:30 ` [RFC 25/34] drm/xe: Convert gsc_work from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 13:11 ` Matthew Auld
2024-01-26 20:30 ` [RFC 26/34] drm/xe: VMs don't need the mem_access protection anymore Rodrigo Vivi
2024-02-05 13:29 ` Matthew Auld
2024-02-15 22:37 ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 27/34] drm/xe: Remove useless mem_access during probe Rodrigo Vivi
2024-02-05 13:18 ` Matthew Auld
2024-01-26 20:30 ` [RFC 28/34] drm/xe: Remove mem_access from suspend and resume functions Rodrigo Vivi
2024-02-05 13:30 ` Matthew Auld
2024-01-26 20:30 ` [RFC 29/34] drm/xe: Convert gt_reset from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 13:33 ` Matthew Auld
2024-01-26 20:30 ` [RFC 30/34] drm/xe: Remove useless mem_access on PAT dumps Rodrigo Vivi
2024-02-05 13:34 ` Matthew Auld
2024-01-26 20:30 ` [RFC 31/34] drm/xe: Remove inner mem_access protections Rodrigo Vivi
2024-01-26 20:30 ` [RFC 32/34] drm/xe: Kill xe_device_mem_access_{get*,put} Rodrigo Vivi
2024-01-26 20:30 ` [RFC 33/34] drm/xe: Remove unused runtime pm helper Rodrigo Vivi
2024-01-26 20:30 ` [RFC 34/34] drm/xe: Enable D3Cold on 'low' VRAM utilization Rodrigo Vivi
2024-01-29 12:12 ` Matthew Auld
2024-01-29 19:01 ` Vivi, Rodrigo
2024-01-30 15:01 ` Gupta, Anshuman
2024-01-26 20:39 ` ✓ CI.Patch_applied: success for Kill mem_access v2 Patchwork
2024-01-26 20:40 ` ✗ CI.checkpatch: warning " Patchwork
2024-01-26 20:40 ` ✗ CI.KUnit: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zc6UlCthcR61eI4y@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.auld@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.