From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 4/9] drm/xe: Move xe_irq runtime suspend and resume out of lockdep
Date: Tue, 5 Mar 2024 17:45:44 -0500 [thread overview]
Message-ID: <ZeegmIhyBz5aCwcm@intel.com> (raw)
In-Reply-To: <13054dd0-51cd-4dd4-8b14-4587037acf2f@intel.com>
On Tue, Mar 05, 2024 at 11:07:37AM +0000, Matthew Auld wrote:
> On 04/03/2024 18:21, Rodrigo Vivi wrote:
> > Now that the mem_access xe_pm_runtime_lockdep_map was moved to protect all
> > the sync resume calls, lockdep is saying:
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(xe_pm_runtime_lockdep_map);
> >                                lock(&power_domains->lock);
> >                                lock(xe_pm_runtime_lockdep_map);
> >   lock(&power_domains->lock);
> >
> > -> #1 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}:
> >        xe_pm_runtime_resume_and_get+0x6a/0x190 [xe]
> >        release_async_put_domains+0x26/0xa0 [xe]
> >        intel_display_power_put_async_work+0xcb/0x1f0 [xe]
> >
> > -> #0 (&power_domains->lock){+.+.}-{4:4}:
> >        __lock_acquire+0x3259/0x62c0
> >        lock_acquire+0x19b/0x4c0
> >        __mutex_lock+0x16b/0x1a10
> >        intel_display_power_is_enabled+0x1f/0x40 [xe]
> >        gen11_display_irq_reset+0x1f2/0xcc0 [xe]
> >        xe_irq_reset+0x43d/0x1cb0 [xe]
> >        xe_irq_resume+0x52/0x660 [xe]
> >        xe_pm_runtime_resume+0x7d/0xdc0 [xe]
> >
> > This is likely a false positive.
> >
> > This lockdep is created to protect races from the inner callers
>
> There is no real lock here so it doesn't protect anything AFAIK. It is just
> about mapping the hidden dependencies between locks held when waking up the
> device and locks acquired in the resume and suspend callbacks.
Indeed a bad phrase. Would something like
'This lockdep map is created to warn us if we are at risk of introducing inner
callers that hold locks which the suspend/resume callbacks also need'
make it better?
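
Just to spell out the pattern we are talking about, here is a simplified
sketch of the annotation only, not the exact xe_pm.c code (the function
bodies are reduced to the lockdep-map handling):

#include <linux/lockdep.h>
#include <linux/pm_runtime.h>
#include "xe_device_types.h"

/* Fake map: there is no real lock behind it, only dependency tracking. */
static struct lockdep_map xe_pm_runtime_lockdep_map = {
	.name = "xe_pm_runtime_lockdep_map"
};

int xe_pm_runtime_resume_and_get(struct xe_device *xe)
{
	/*
	 * Every synchronous wake-up "touches" the map, so lockdep records
	 * caller_lock -> xe_pm_runtime_lockdep_map for whatever the caller
	 * holds at this point.
	 */
	lock_map_acquire(&xe_pm_runtime_lockdep_map);
	lock_map_release(&xe_pm_runtime_lockdep_map);

	return pm_runtime_resume_and_get(xe->drm.dev);
}

int xe_pm_runtime_resume(struct xe_device *xe)
{
	/*
	 * The callback runs with the map "held", so every lock taken in here
	 * is recorded as xe_pm_runtime_lockdep_map -> callback_lock. A lock
	 * that shows up on both sides closes the cycle lockdep warns about.
	 */
	lock_map_acquire(&xe_pm_runtime_lockdep_map);
	/* ... actual resume work ... */
	lock_map_release(&xe_pm_runtime_lockdep_map);

	return 0;
}
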
>
> > of get-and-resume-sync that are within holding various memory access locks
> > with the resume and suspend itself that can also be trying to grab these
> > memory access locks.
> >
> > This is not the case here, for sure. The &power_domains->lock seems to be
> > sufficient to protect any race and there's no counter part to get deadlocked
> > with.
>
> What is meant by "race" here? The lockdep splat is saying that one or both
> of the resume or suspend callbacks is grabbing some lock, but that same lock
> is also held when potentially waking up the device. From lockdep POV that is
> a potential deadlock.
The only lock here is &power_domains->lock, which could be grabbed at both suspend
and resume. But even if we don't trust that only one of those operations can
happen at a time, what is the other lock that could possibly be held in a way
that causes this theoretical deadlock?
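
To make sure we are looking at the same thing, here is my reading of the splat
reduced to the classic AB-BA shape. The two functions below are only
placeholders for the call chains in the splat, not real code, and
"power_domains" stands for the display power domains struct:

/*
 *   A = &power_domains->lock (a real mutex)
 *   B = xe_pm_runtime_lockdep_map (the fake map, no real lock)
 */
static void caller_side(struct xe_device *xe)		/* chain #1 */
{
	/* intel_display_power_put_async_work() -> release_async_put_domains() */
	mutex_lock(&power_domains->lock);		/* takes A ... */
	xe_pm_runtime_resume_and_get(xe);		/* ... then touches B */
	mutex_unlock(&power_domains->lock);
}

static void callback_side(struct xe_device *xe)		/* chain #0 */
{
	lock_map_acquire(&xe_pm_runtime_lockdep_map);	/* B "held" ... */
	/*
	 * xe_irq_resume() -> xe_irq_reset() -> gen11_display_irq_reset() ->
	 * intel_display_power_is_enabled():
	 */
	mutex_lock(&power_domains->lock);		/* ... then takes A */
	mutex_unlock(&power_domains->lock);
	lock_map_release(&xe_pm_runtime_lockdep_map);
}
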
>
> If we are saying that it is impossible to actually wake up the device in
> this particular case then can we rather make the caller use _noresume() or
> if_active()?
I'm trying to avoid touching the i915-display runtime-pm code. :/
At some point I even thought about making all of the i915-display runtime-pm
calls no-ops on xe and having the runtime_pm idle callback check whether a
display is connected, but there are so many places where the code takes
different decisions depending on whether runtime_pm is in use or not that it
would complicate things a bit anyway.
>
> >
> > Also worth mentioning that on i915, intel_display_power_put_async_work
> > also gets and resumes synchronously, the runtime pm get/put
> > also resets the irq, and that code was never problematic.
> >
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_pm.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index b534a194a9ef..919250e38ae0 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -347,7 +347,10 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> >  		goto out;
> >  	}
> > +	lock_map_release(&xe_pm_runtime_lockdep_map);
> >  	xe_irq_suspend(xe);
> > +	xe_pm_write_callback_task(xe, NULL);
> > +	return 0;
> >  out:
> >  	lock_map_release(&xe_pm_runtime_lockdep_map);
> >  	xe_pm_write_callback_task(xe, NULL);
> > @@ -369,6 +372,8 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >  	/* Disable access_ongoing asserts and prevent recursive pm calls */
> >  	xe_pm_write_callback_task(xe, current);
> > +	xe_irq_resume(xe);
> > +
> >  	lock_map_acquire(&xe_pm_runtime_lockdep_map);
> >  	/*
> > @@ -395,8 +400,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >  		goto out;
> >  	}
> > -	xe_irq_resume(xe);
> > -
> >  	for_each_gt(gt, xe, id)
> >  		xe_gt_resume(gt);
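
For clarity, this is roughly how the two callbacks end up ordered with this
change (a sketch based only on the hunks above; the "..." comments are
unchanged code not shown here):

int xe_pm_runtime_suspend(struct xe_device *xe)
{
	/* ... */
	lock_map_acquire(&xe_pm_runtime_lockdep_map);
	/* ... the actual suspend work, errors jump to the "out:" path ... */
	lock_map_release(&xe_pm_runtime_lockdep_map);

	xe_irq_suspend(xe);		/* now outside the annotated region */
	xe_pm_write_callback_task(xe, NULL);
	return 0;
	/* "out:" error path unchanged: release the map, clear the task */
}

int xe_pm_runtime_resume(struct xe_device *xe)
{
	/* Disable access_ongoing asserts and prevent recursive pm calls */
	xe_pm_write_callback_task(xe, current);

	xe_irq_resume(xe);		/* now before the annotated region */

	lock_map_acquire(&xe_pm_runtime_lockdep_map);
	/* ... d3cold handling, for_each_gt() -> xe_gt_resume(), etc. ... */
	lock_map_release(&xe_pm_runtime_lockdep_map);
	xe_pm_write_callback_task(xe, NULL);
	return 0;
	/* error paths unchanged */
}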