From: Raag Jadav <raag.jadav@intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
lucas.demarchi@intel.com, rodrigo.vivi@intel.com, "Nilawar,
Badal" <badal.nilawar@intel.com>,
intel-xe@lists.freedesktop.org, anshuman.gupta@intel.com
Subject: Re: [PATCH] drm/xe/d3cold: Set power state to D3Cold during s2idle/s3
Date: Wed, 2 Apr 2025 11:34:07 +0300 [thread overview]
Message-ID: <Z-z2fz97RwX2kBya@black.fi.intel.com> (raw)
In-Reply-To: <Z-znv8D3qIcVX-p1@black.fi.intel.com>
On Wed, Apr 02, 2025 at 10:31:16AM +0300, Raag Jadav wrote:
> On Tue, Apr 01, 2025 at 09:35:49PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Apr 1, 2025 at 7:53 PM Raag Jadav <raag.jadav@intel.com> wrote:
>
> ...
>
> > > That's not what I meant here. There are multiple issues at play.
> > >
> > > 1. An AER is reported[*] on root port during system suspend even before we
> > > reach any of the driver PM callback. From initial investigation it seems
> > > like a case of misbehaviour by some child device, but this is a different
> > > issue entirely.
> > >
> > > [*] irq/120-aerdrv-145 [002] ..... 1264.981023: <stack trace>
> > > => xe_pm_runtime_resume
> > > => xe_pci_runtime_resume
> > > => pci_pm_runtime_resume
> > > => __rpm_callback
> > > => rpm_callback
> > > => rpm_resume
> > > => __pm_runtime_resume
> > > => pci_pm_runtime_get_sync
> > > => __pci_walk_bus
> > > => pci_walk_bus
> > > => pcie_do_recovery
> > > => aer_process_err_devices
> > > => aer_isr
> > >
> > > 2. Setting explicit pci_set_power_state(pdev, PCI_D3cold) from xe_pm_suspend().
> > > Although we see many drivers do it for their case, it's quite a questionable
> > > choice (atleast IMHO) to hard suspend the device from driver PM callback
> > > without any regard to runtime_usage counter. It can hide potential issues
> > > like AER during system suspend (regardless of whether or not it is supported
> > > by the driver, since it is supposed to keep the device active on such a
> > > catastrophic failure anyway), but I'll leave it to the experts to decide.
> >
> > If the driver does not set DPM_FLAG_SMART_SUSPEND, and xe doesn't set
> > it AFAICS (at least not in the mainline), pci_pm_suspend() will resume
> > the device from runtime suspend before invoking its driver's callback.
> >
> > This guarantees that the device is always RPM_ACTIVE when
> > xe_pci_suspend() runs and it cannot be runtime-suspended because the
> > core is holding a runtime PM reference on it (and so the runtime PM
> > usage counter is always greater than zero when xe_pci_suspend() runs).
> >
> > This means that neither xe_pci_runtime_suspend() nor
> > xe_pci_runtime_resume() can run concurrently with respect to it, so
> > xe_pci_suspend() can change the power state of the device etc. safely.
>
> Ah, I failed to notice that __pm_runtime_resume() is taking a spin_lock_irqsave().
> Thanks for clearing this up.
On second thought, pcie_do_recovery() can still race with xe_pci_suspend(),
is this understanding correct?
I'm assuming this is why it's much safer to do pci_save_state() and
pci_prepare_to_sleep() only in ->noirq() callbacks like originally done
by PCI PM, right?
> > Of course, it needs to call pci_save_state() before doing any of that,
> > but it does so.
> >
> > This is expected to work, but resuming the device from runtime suspend
> > during a system suspend transition can be avoided (see below).
> >
> > However, on the resume side, pci_pm_resume_noirq() runs first and
> > because dev_pm_skip_resume() returns 'false' (since
> > DPM_FLAG_SMART_SUSPEND is unset) and skip_bus_pm is 'false' (since the
> > device is not in D0), it will call pci_pm_default_resume_early() which
> > will (1) put the device into D0 unconditionally and (2) restore its
> > state.
> >
> > This means that by the time when xe_pci_resume() runs, the power state
> > of the device is already D0 and its state has already been restored,
> > so the initial part of it is arguably redundant.
> >
> > There also appears to be a problem with calling pci_disable_device()
> > after pci_save_state() in xe_pci_suspend() because restoring the
> > device's state in pci_pm_default_resume_early() will inadvertently
> > cause it to be enabled AFAICS. I would call pci_save_state() after
> > pci_disable_device(), so the device would still be disabled after
> > running pci_pm_default_resume_early() and xe_pci_resume() would enable
> > it as appropriate.
> >
> > > 2. Since we're already setting D0 and D3 states in our runtime PM callbacks,
> > > utilizing DPM_FLAG_SMART_* flags will allow us to skip unnecessary
> > > runtime_resume during system suspend and let PCI PM take care of all
> > > the PCI stuff in a smarter way (even skip D3hot/cold if it's already in it)
> > > without us needing to duplicate code and do everything manually.
> >
> > That's true in principle, but care needs to be taken.
> >
> > First of all, DPM_FLAG_SMART_PREPARE and DPM_FLAG_SMART_SUSPEND are
> > about different things.
> >
> > DPM_FLAG_SMART_PREPARE is about the direct-complete optimization that
> > allows the suspend and resume of the device to be skipped completely
> > if some additional conditions are met and it allows the driver to
> > control the behavior related to this optimization (see "Entering
> > System Suspend" in Documentation/driver-api/pm/devices.rst for
> > details). The bottom line here is that because xe doesn't provide a
> > ->prepare() callback in xe_pm_ops, this flag will have no effect. xe
> > may want to set DPM_FLAG_NO_DIRECT_COMPLETE to disable the
> > direct-complete optimization completely, though (i915 does this for
> > some reason unclear to me).
> >
> > DPM_FLAG_SMART_SUSPEND is about leaving the device in runtime suspend
> > across the entire system suspend transition if it is still
> > runtime-suspended after running its driver's ->suspend() callback.
> >
> > This means in particular that if DPM_FLAG_SMART_SUSPEND is set, the
> > ->suspend() callback may run in parallel with xe_pci_runtime_resume(),
> > so it cannot be the current xe_pci_suspend() because it cannot
> > manipulate the device or call pci_save_state() on it or similar. The
> > most straightforward way to overcome this would be to point
> > ->suspend_late() to xe_pci_suspend(), but this would cause the device
> > to be suspended later unless it was runtime-suspended already. In
> > turn, suspending it later may cause the total duration of the system
> > suspend transition to increase (because the device will now be
> > suspended in a different phase of the transition).
> >
> > The suspend time can be distributed somewhat differently if the core
> > is allowed to take care of putting the device into a low-power state.
> > Simply removing the pci_save_state() and pci_set_power_state() calls
> > from xe_pci_suspend() should be sufficient to make this happen and
> > that would be my first step (after fixing the device disable/enable
> > issue described above).
> >
> > Next, you can point ->suspend_late(), instead of ->suspend(), to
> > xe_pci_suspend() and that is still expected to work, but the total
> > system suspend timing will change.
> >
> > If this works, you'll be ready to set DPM_FLAG_SMART_SUSPEND.
>
> Excellent. While I did some experimentation of my own, this definitely makes
> things more clear. Thank you for your detailed feedback.
>
> I'm still trying to wrap my head around the total suspend duration argument.
> Do you mean the time taken by xe_pci_suspend() itself can change with
> ->suspend_late()? Or are there any other side-effects to it?
>
> Raag
next prev parent reply other threads:[~2025-04-02 8:34 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-27 16:19 [PATCH] drm/xe/d3cold: Set power state to D3Cold during s2idle/s3 Badal Nilawar
2025-03-27 16:24 ` ✓ CI.Patch_applied: success for " Patchwork
2025-03-27 16:25 ` ✗ CI.checkpatch: warning " Patchwork
2025-03-27 16:26 ` ✓ CI.KUnit: success " Patchwork
2025-03-27 16:32 ` [PATCH] " Nilawar, Badal
2025-03-27 17:14 ` Rodrigo Vivi
2025-03-28 16:02 ` Rodrigo Vivi
2025-03-29 5:20 ` Raag Jadav
2025-03-31 16:15 ` Rodrigo Vivi
2025-03-31 20:18 ` Wysocki, Rafael J
2025-04-01 17:53 ` Raag Jadav
2025-04-01 19:35 ` Rafael J. Wysocki
2025-04-02 7:31 ` Raag Jadav
2025-04-02 8:34 ` Raag Jadav [this message]
2025-04-02 10:31 ` Rafael J. Wysocki
2025-04-03 3:12 ` Raag Jadav
2025-04-03 11:12 ` Rafael J. Wysocki
2025-04-02 10:19 ` Rafael J. Wysocki
2025-04-02 11:19 ` Gupta, Anshuman
2025-04-03 7:36 ` Nilawar, Badal
2025-04-03 11:16 ` Rafael J. Wysocki
2025-03-27 16:42 ` ✓ CI.Build: success for " Patchwork
2025-03-27 16:44 ` ✓ CI.Hooks: " Patchwork
2025-03-27 16:46 ` ✓ CI.checksparse: " Patchwork
2025-03-27 17:08 ` ✓ Xe.CI.BAT: " Patchwork
2025-03-28 1:49 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z-z2fz97RwX2kBya@black.fi.intel.com \
--to=raag.jadav@intel.com \
--cc=anshuman.gupta@intel.com \
--cc=badal.nilawar@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
--cc=rafael.j.wysocki@intel.com \
--cc=rafael@kernel.org \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox