From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Imre Deak <imre.deak@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe/pm: Also avoid missing outer rpm warning on system suspend
Date: Fri, 20 Dec 2024 13:34:01 -0500 [thread overview]
Message-ID: <Z2W4mTZBwFPujqzM@intel.com> (raw)
In-Reply-To: <Z2WZNp0YZAehJfoL@ideak-desk.fi.intel.com>
On Fri, Dec 20, 2024 at 06:20:06PM +0200, Imre Deak wrote:
> On Fri, Dec 20, 2024 at 09:55:04AM -0500, Rodrigo Vivi wrote:
> > On Wed, Dec 18, 2024 at 04:33:09PM +0200, Imre Deak wrote:
> > > On Tue, Dec 17, 2024 at 06:05:47PM -0500, Rodrigo Vivi wrote:
> > > > We have some cases where display is releasing power domains at
> > > > release_async_put_domains() where intel_runtime_pm_get_noresume()
> > > > is called, but no outer protection. In Xe this will trigger our
> > > > traditional warning.
> > >
> > > I suppose by outer protection you mean an RPM reference that is
> > > guaranteed to be held at the point (that is right before)
> > > release_async_put_domains() calls intel_runtime_pm_get_noresume(). This
> > > is guaranteed, i.e. such an RPM reference is held by definition (by the
> > > power domain reference that is being put).
> >
> > not actually.
> > The outer rpm reference needs to be a reference on the outer bounds
> > that ensures the device is awake. _noresume calls should only be used
> > in inner places where you know there's something already ensuring
> > that the device is awake but you don't want to take the risk of that
> > reference being lost while you are in the middle of your sequence,
> > so you call the 'noresume' as an extra thing to ensure that you can
> > go to the end without device getting suspended because the other
> > reference got dropped.
>
> Yes, that is what I meant. In case of release_async_put_domains() it is
> sure that the device is awake and hence there is no runtime resume
> needed. The power domain reference being put holds a runtime PM
> reference. So the "no outer protection" reasoning in the commit log is
> not correct.
>
> The reason for the WARN that this patch fixes is simply that
> pm_runtime_get_if_in_use() used by xe to check for an outer RPM
> reference fails if it is called either during runtime suspend/resume or
> system suspend/resume. The existing code took this already into account
> for the runtime suspend/resume case, but it didn't take it into account
> for system suspend/resume. After this patch the outer protection check
> will work the same way for both the runtime and system s/r case,
> removing the WARN in the latter case.
great then, we are in the same page.
>
> > > Instead, the actual reason for triggering the warn - IIUC - is that
> > > intel_runtime_pm_get_if_in_use() called from
> > > xe_pm_runtime_get_noresume() (probably for the exact reason to check if
> > > an outer RPM is held) fails if it is called while system suspending /
> > > resuming. This is the same scenario as when
> > > intel_runtime_pm_get_if_in_use() would fail if called during runtime
> > > suspending / resuming and - worked around earlier I assume - by
> > > suppressing the warning in this case using xe_pm_suspending_or_resuming().
> >
> > The get_if_in_use is only the choice inside our _noresume so we can
> > properly check if the device was really awake and warn that we have
> > an unprotected case that we need to handle properly. If we were sure
> > to have all the outer protections in place already, we could safely
> > just use the _noresume option from the rpm directly.
> >
> > > So in this fix the above workaround to suppress the warning is just
> > > extended to the system suspend/resume case.
> > >
> > > > However, this case should be safe because it is triggered from the
> > > > system suspend path, where we certainly won't be transitioning to rpm
> > > > suspend.
> > > >
> > > > This wouldn't happen if the display pm sequences, including
> > > > all irq flow was in sync between i915 and xe. So, while we
> > > > don't get there, let's not raise warnings when we are in this
> > > > system suspend path.
> > >
> > > I think the issue fixed in this patch is just a consequence of how the
> > > outer RPM check works using xe_pm_suspending_or_resuming() and wouldn't
> > > change even after the IRQ related issues are fixed.
> >
> > If there's other cases where this release_async_put_domains is called
> > out of the suspend path, this warning here is showing that we do
> > need an extra runtime_pm_get right at the beginning of the workqueue.
> > And this patch here would only be masking this warning in this case
> > here, while leaving the release_async_put_domains unprotected.
>
> Fixing the IRQ handling doesn't change how pm_runtime_get_if_in_use()
> works and hence how its return value is ignored in the outer protection
> check during runtime and system s/r.
indeed!
so, pushed to drm-xe-next.
Thank you so much for the suggestion, review and insights here
>
> > > > Suggested-by: Imre Deak <imre.deak@intel.com>
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > >
> > > With the above understanding:
> > > Reviewed-by: Imre Deak <imre.deak@intel.com>
> > >
> > > > ---
> > > > drivers/gpu/drm/xe/xe_pm.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > index a6761cb769b2..c6e57af0144c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > @@ -7,6 +7,7 @@
> > > >
> > > > #include <linux/fault-inject.h>
> > > > #include <linux/pm_runtime.h>
> > > > +#include <linux/suspend.h>
> > > >
> > > > #include <drm/drm_managed.h>
> > > > #include <drm/ttm/ttm_placement.h>
> > > > @@ -607,7 +608,8 @@ static bool xe_pm_suspending_or_resuming(struct xe_device *xe)
> > > > struct device *dev = xe->drm.dev;
> > > >
> > > > return dev->power.runtime_status == RPM_SUSPENDING ||
> > > > - dev->power.runtime_status == RPM_RESUMING;
> > > > + dev->power.runtime_status == RPM_RESUMING ||
> > > > + pm_suspend_target_state != PM_SUSPEND_ON;
> > > > #else
> > > > return false;
> > > > #endif
> > > > --
> > > > 2.47.1
> > > >
next prev parent reply other threads:[~2024-12-20 18:34 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-17 23:05 [PATCH] drm/xe/pm: Also avoid missing outer rpm warning on system suspend Rodrigo Vivi
2024-12-18 3:05 ` ✓ CI.Patch_applied: success for " Patchwork
2024-12-18 3:05 ` ✓ CI.checkpatch: " Patchwork
2024-12-18 3:07 ` ✓ CI.KUnit: " Patchwork
2024-12-18 3:25 ` ✓ CI.Build: " Patchwork
2024-12-18 3:27 ` ✓ CI.Hooks: " Patchwork
2024-12-18 3:29 ` ✓ CI.checksparse: " Patchwork
2024-12-18 4:03 ` ✓ Xe.CI.BAT: " Patchwork
2024-12-18 13:43 ` ✗ Xe.CI.Full: failure " Patchwork
2024-12-18 14:33 ` [PATCH] " Imre Deak
2024-12-20 14:55 ` Rodrigo Vivi
2024-12-20 16:20 ` Imre Deak
2024-12-20 18:34 ` Rodrigo Vivi [this message]
2024-12-20 19:15 ` Imre Deak
2024-12-20 19:21 ` Rodrigo Vivi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z2W4mTZBwFPujqzM@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=imre.deak@intel.com \
--cc=intel-xe@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.