From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Imre Deak <imre.deak@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe/pm: Also avoid missing outer rpm warning on system suspend
Date: Fri, 20 Dec 2024 13:34:01 -0500 [thread overview]
Message-ID: <Z2W4mTZBwFPujqzM@intel.com> (raw)
In-Reply-To: <Z2WZNp0YZAehJfoL@ideak-desk.fi.intel.com>
On Fri, Dec 20, 2024 at 06:20:06PM +0200, Imre Deak wrote:
> On Fri, Dec 20, 2024 at 09:55:04AM -0500, Rodrigo Vivi wrote:
> > On Wed, Dec 18, 2024 at 04:33:09PM +0200, Imre Deak wrote:
> > > On Tue, Dec 17, 2024 at 06:05:47PM -0500, Rodrigo Vivi wrote:
> > > > We have some cases where display is releasing power domains at
> > > > release_async_put_domains() where intel_runtime_pm_get_noresume()
> > > > is called, but no outer protection. In Xe this will trigger our
> > > > traditional warning.
> > >
> > > I suppose by outer protection you mean an RPM reference that is
> > > guaranteed to be held at the point (that is right before)
> > > release_async_put_domains() calls intel_runtime_pm_get_noresume(). This
> > > is guaranteed, i.e. such an RPM reference is held by definition (by the
> > > power domain reference that is being put).
> >
> > not actually.
> > The outer rpm reference needs to be a reference on the outer bounds
> > that ensures the device is awake. _noresume calls should only be used
> > in inner places where you know there's something already ensuring
> > that the device is awake but you don't want to take the risk of that
> > reference being lost while you are in the middle of your sequence,
> > so you call the 'noresume' as an extra thing to ensure that you can
> > go to the end without device getting suspended because the other
> > reference got dropped.
>
> Yes, that is what I meant. In case of release_async_put_domains() it is
> sure that the device is awake and hence there is no runtime resume
> needed. The power domain reference being put holds a runtime PM
> reference. So the "no outer protection" reasoning in the commit log is
> not correct.
>
> The reason for the WARN that this patch fixes is simply that
> pm_runtime_get_if_in_use() used by xe to check for an outer RPM
> reference fails if it is called either during runtime suspend/resume or
> system suspend/resume. The existing code took this already into account
> for the runtime suspend/resume case, but it didn't take it into account
> for system suspend/resume. After this patch the outer protection check
> will work the same way for both the runtime and system s/r case,
> removing the WARN in the latter case.
great then, we are in the same page.
>
> > > Instead, the actual reason for triggering the warn - IIUC - is that
> > > intel_runtime_pm_get_if_in_use() called from
> > > xe_pm_runtime_get_noresume() (probably for the exact reason to check if
> > > an outer RPM is held) fails if it is called while system suspending /
> > > resuming. This is the same scenario as when
> > > intel_runtime_pm_get_if_in_use() would fail if called during runtime
> > > suspending / resuming and - worked around earlier I assume - by
> > > suppressing the warning in this case using xe_pm_suspending_or_resuming().
> >
> > The get_if_in_use is only the choice inside our _noresume so we can
> > properly check if the device was really awake and warn that we have
> > an unprotected case that we need to handle properly. If we were sure
> > to have all the outer protections in place already, we could safely
> > just use the _noresume option from the rpm directly.
> >
> > > So in this fix the above workaround to suppress the warning is just
> > > extended to the system suspend/resume case.
> > >
> > > > However, this case should be safe because it is triggered from the
> > > > system suspend path, where we certainly won't be transitioning to rpm
> > > > suspend.
> > > >
> > > > This wouldn't happen if the display pm sequences, including
> > > > all irq flow was in sync between i915 and xe. So, while we
> > > > don't get there, let's not raise warnings when we are in this
> > > > system suspend path.
> > >
> > > I think the issue fixed in this patch is just a consequence of how the
> > > outer RPM check works using xe_pm_suspending_or_resuming() and wouldn't
> > > change even after the IRQ related issues are fixed.
> >
> > If there's other cases where this release_async_put_domains is called
> > out of the suspend path, this warning here is showing that we do
> > need an extra runtime_pm_get right at the beginning of the workqueue.
> > And this patch here would only be masking this warning in this case
> > here, while leaving the release_async_put_domains unprotected.
>
> Fixing the IRQ handling doesn't change how pm_runtime_get_if_in_use()
> works and hence how its return value is ignored in the outer protection
> check during runtime and system s/r.
indeed!
so, pushed to drm-xe-next.
Thank you so much for the suggestion, review and insights here
>
> > > > Suggested-by: Imre Deak <imre.deak@intel.com>
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > >
> > > With the above understanding:
> > > Reviewed-by: Imre Deak <imre.deak@intel.com>
> > >
> > > > ---
> > > > drivers/gpu/drm/xe/xe_pm.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > index a6761cb769b2..c6e57af0144c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > @@ -7,6 +7,7 @@
> > > >
> > > > #include <linux/fault-inject.h>
> > > > #include <linux/pm_runtime.h>
> > > > +#include <linux/suspend.h>
> > > >
> > > > #include <drm/drm_managed.h>
> > > > #include <drm/ttm/ttm_placement.h>
> > > > @@ -607,7 +608,8 @@ static bool xe_pm_suspending_or_resuming(struct xe_device *xe)
> > > > struct device *dev = xe->drm.dev;
> > > >
> > > > return dev->power.runtime_status == RPM_SUSPENDING ||
> > > > - dev->power.runtime_status == RPM_RESUMING;
> > > > + dev->power.runtime_status == RPM_RESUMING ||
> > > > + pm_suspend_target_state != PM_SUSPEND_ON;
> > > > #else
> > > > return false;
> > > > #endif
> > > > --
> > > > 2.47.1
> > > >
next prev parent reply other threads:[~2024-12-20 18:34 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-17 23:05 [PATCH] drm/xe/pm: Also avoid missing outer rpm warning on system suspend Rodrigo Vivi
2024-12-18 3:05 ` ✓ CI.Patch_applied: success for " Patchwork
2024-12-18 3:05 ` ✓ CI.checkpatch: " Patchwork
2024-12-18 3:07 ` ✓ CI.KUnit: " Patchwork
2024-12-18 3:25 ` ✓ CI.Build: " Patchwork
2024-12-18 3:27 ` ✓ CI.Hooks: " Patchwork
2024-12-18 3:29 ` ✓ CI.checksparse: " Patchwork
2024-12-18 4:03 ` ✓ Xe.CI.BAT: " Patchwork
2024-12-18 13:43 ` ✗ Xe.CI.Full: failure " Patchwork
2024-12-18 14:33 ` [PATCH] " Imre Deak
2024-12-20 14:55 ` Rodrigo Vivi
2024-12-20 16:20 ` Imre Deak
2024-12-20 18:34 ` Rodrigo Vivi [this message]
2024-12-20 19:15 ` Imre Deak
2024-12-20 19:21 ` Rodrigo Vivi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z2W4mTZBwFPujqzM@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=imre.deak@intel.com \
--cc=intel-xe@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox