From: Matthew Brost <matthew.brost@intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Raag Jadav <raag.jadav@intel.com>, <lucas.demarchi@intel.com>,
<rodrigo.vivi@intel.com>, <intel-xe@lists.freedesktop.org>,
<riana.tauro@intel.com>, <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v5 2/2] drm/xe/gt: Introduce runtime suspend/resume
Date: Thu, 16 Oct 2025 11:38:25 -0700
Message-ID: <aPE7oYJa037sxbaU@lstrano-desk.jf.intel.com>
In-Reply-To: <1aa6c0b4-73fd-4603-b414-802b4e15442a@intel.com>
On Thu, Oct 16, 2025 at 08:32:23AM -0700, Daniele Ceraolo Spurio wrote:
>
>
> On 10/14/2025 12:30 AM, Raag Jadav wrote:
> > If power state is retained between suspend/resume cycle, we don't need
> > to perform full GT re-initialization. Introduce runtime helpers for GT
> > which greatly reduce suspend/resume delay.
> >
> > v2: Drop redundant xe_gt_sanitize() and xe_guc_ct_stop() (Daniele)
> > Use runtime naming for guc helpers (Daniele)
> > v3: Drop redundant logging, add kernel doc (Michal)
> > Use runtime naming for ct helpers (Michal)
> > v4: Fix tags (Rodrigo)
> > v5: Include host_l2_vram workaround (Daniele)
> > Reuse xe_guc_submit_enable/disable() helpers (Daniele)
>
> Based on the reply about VF behavior in the previous rev, I am thinking this
> is not the correct approach to this.
> If on a VF the runtime_suspend/resume functions are not called at all like
> you said, it means that the VF driver needs to be able to cope with the fact
> that the HW can lose power without it being directly notified if its rpm
> refcount is 0. This in turn means that on a VF the driver can't rely on what
> you do in xe_uc_runtime_suspend/resume() to idle the state and must instead
> guarantee that the state is already idled when the last rpm ref is released
> and that a new rpm ref is taken before restarting anything (which might
> already be true). AFAIK there is no difference in the SW state management
> of queues and CTBs between PF and VF, so if we achieve that on a VF we'll
> also have it on PF/Native, which means that there will be no need to
> pause/unpause CTBs and exec_queues. The only thing that the PF would need to
> do in the rpm flow is program the HW (e.g. the host_l2_vram stuff and the
> irq re-enabling).
>
> tl;dr, if we guarantee that:
> 1 - if the rpm refcount is 0 then there is no activity on HW, so nothing
> that needs to be paused (which might already be true)
> 2 - an rpm ref is taken before any activity is started (which might also
> already be true)
>
> Then we're guaranteeing that there is nothing to pause/unpause at runtime
> suspend/resume time, so we're safe skipping those calls entirely on VF while
> on native/PF we can just focus on the HW re-programming.
>
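For illustration only, the two guarantees in the tl;dr above can be modelled
with a toy refcount. This is a sketch, not code from the patch or the xe
driver; every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not xe driver API. */
struct toy_dev {
	int rpm_refcount;	/* outstanding runtime-PM references */
	int active_jobs;	/* work currently running on the "HW" */
};

static void toy_rpm_get(struct toy_dev *d)
{
	d->rpm_refcount++;
}

static void toy_rpm_put(struct toy_dev *d)
{
	assert(d->rpm_refcount > 0);
	/* Guarantee 1: the last ref may only be dropped once the HW is idle. */
	assert(d->rpm_refcount > 1 || d->active_jobs == 0);
	d->rpm_refcount--;
}

/* Guarantee 2: activity is started only while a ref is held. */
static bool toy_job_start(struct toy_dev *d)
{
	if (d->rpm_refcount == 0)
		return false;
	d->active_jobs++;
	return true;
}

static void toy_job_done(struct toy_dev *d)
{
	assert(d->active_jobs > 0);
	d->active_jobs--;
}

/* With both guarantees there is nothing left to pause at suspend time. */
static bool toy_can_runtime_suspend(const struct toy_dev *d)
{
	return d->rpm_refcount == 0 && d->active_jobs == 0;
}
```

In this model a refcount of 0 implies an idle device, so a suspend hook has
no queues or CTBs to pause and only needs to deal with HW programming.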
> BTW, any idea how this is working with the current code? Given that we're
> re-loading the GuC on runtime resume, are the VFs getting disconnected and

VFs hold an RPM ref. See pf_enable_vfs() in xe_pci_sriov.c.

Matt

> having to detect that and re-connect? Or are we just disabling rpm if a VF
> is enabled? Because if our approach is that rpm is just not supported if VFs
> are in use then we can keep your current approach and add an assert to make
> sure we're not runtime suspending if the vf count is > 0.
>
> Daniele
>
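A rough sketch of how the two points above fit together, assuming (per the
reply) that enabling VFs takes an RPM ref for as long as they are in use.
With that ref held, the suggested assert on the VF count can never fire. The
toy_* names are hypothetical and do not mirror the real SR-IOV or PM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the xe SR-IOV or PM API. */
struct toy_pf {
	int rpm_refcount;	/* outstanding runtime-PM references */
	int num_vfs;		/* currently enabled VFs */
};

/* Enabling VFs pins the device awake by taking one runtime-PM ref. */
static void toy_enable_vfs(struct toy_pf *pf, int num_vfs)
{
	assert(pf->num_vfs == 0 && num_vfs > 0);
	pf->rpm_refcount++;
	pf->num_vfs = num_vfs;
}

/* Disabling VFs drops that ref again. */
static void toy_disable_vfs(struct toy_pf *pf)
{
	assert(pf->num_vfs > 0);
	pf->num_vfs = 0;
	pf->rpm_refcount--;
}

/* Returns true if the suspend ran; it is refused while any ref is held. */
static bool toy_runtime_suspend(struct toy_pf *pf)
{
	if (pf->rpm_refcount > 0)
		return false;
	/* The suggested sanity check: never suspend with VFs in use. */
	assert(pf->num_vfs == 0);
	return true;
}
```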
> >
> > Co-developed-by: Riana Tauro <riana.tauro@intel.com>
> > Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt.c | 60 ++++++++++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_gt.h | 2 ++
> > drivers/gpu/drm/xe/xe_guc.c | 34 +++++++++++++++++++
> > drivers/gpu/drm/xe/xe_guc.h | 2 ++
> > drivers/gpu/drm/xe/xe_guc_ct.c | 27 +++++++++++++++
> > drivers/gpu/drm/xe/xe_guc_ct.h | 2 ++
> > drivers/gpu/drm/xe/xe_pm.c | 10 +++---
> > drivers/gpu/drm/xe/xe_uc.c | 28 ++++++++++++++++
> > drivers/gpu/drm/xe/xe_uc.h | 2 ++
> > 9 files changed, 162 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index d8e94fb8b9bd..0eacca14ccbb 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -1003,6 +1003,66 @@ int xe_gt_resume(struct xe_gt *gt)
> > return err;
> > }
> > +/**
> > + * xe_gt_runtime_suspend() - GT runtime suspend
> > + * @gt: the GT object
> > + *
> > + * Return: 0 on success, negative error code otherwise.
> > + */
> > +int xe_gt_runtime_suspend(struct xe_gt *gt)
> > +{
> > + unsigned int fw_ref;
> > + int err = -ETIMEDOUT;
> > +
> > + xe_gt_dbg(gt, "runtime suspending\n");
> > +
> > + fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> > + if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL))
> > + goto err_force_wake;
> > +
> > + xe_uc_runtime_suspend(&gt->uc);
> > + xe_gt_disable_host_l2_vram(gt);
> > +
> > + xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > + xe_gt_dbg(gt, "runtime suspended\n");
> > +
> > + return 0;
> > +
> > +err_force_wake:
> > + xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > + return err;
> > +}
> > +
> > +/**
> > + * xe_gt_runtime_resume() - GT runtime resume
> > + * @gt: the GT object
> > + *
> > + * Return: 0 on success, negative error code otherwise.
> > + */
> > +int xe_gt_runtime_resume(struct xe_gt *gt)
> > +{
> > + unsigned int fw_ref;
> > + int err = -ETIMEDOUT;
> > +
> > + xe_gt_dbg(gt, "runtime resuming\n");
> > +
> > + fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> > + if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL))
> > + goto err_force_wake;
> > +
> > + xe_gt_enable_host_l2_vram(gt);
> > + xe_uc_runtime_resume(&gt->uc);
> > +
> > + xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > + xe_gt_dbg(gt, "runtime resumed\n");
> > +
> > + return 0;
> > +
> > +err_force_wake:
> > + xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > + return err;
> > +}
> > +
> > struct xe_hw_engine *xe_gt_hw_engine(struct xe_gt *gt,
> > enum xe_engine_class class,
> > u16 instance, bool logical)
> > diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> > index 5df2ffe3ff83..ceb633ec22d0 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.h
> > +++ b/drivers/gpu/drm/xe/xe_gt.h
> > @@ -52,6 +52,8 @@ int xe_gt_suspend(struct xe_gt *gt);
> > void xe_gt_shutdown(struct xe_gt *gt);
> > int xe_gt_resume(struct xe_gt *gt);
> > void xe_gt_reset_async(struct xe_gt *gt);
> > +int xe_gt_runtime_resume(struct xe_gt *gt);
> > +int xe_gt_runtime_suspend(struct xe_gt *gt);
> > void xe_gt_sanitize(struct xe_gt *gt);
> > int xe_gt_sanitize_freq(struct xe_gt *gt);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index d94490979adc..6262ca1c1d42 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -1599,6 +1599,40 @@ int xe_guc_start(struct xe_guc *guc)
> > return xe_guc_submit_start(guc);
> > }
> > +/**
> > + * xe_guc_runtime_suspend() - GuC runtime suspend
> > + * @guc: The GuC object
> > + *
> > + * Stop further runs of submission tasks on given GuC and runtime suspend
> > + * GuC CT.
> > + */
> > +void xe_guc_runtime_suspend(struct xe_guc *guc)
> > +{
> > + xe_guc_submit_pause(guc);
> > + xe_guc_submit_disable(guc);
> > + xe_guc_ct_runtime_suspend(&guc->ct);
> > +}
> > +
> > +/**
> > + * xe_guc_runtime_resume() - GuC runtime resume
> > + * @guc: The GuC object
> > + *
> > + * Runtime resume GuC CT and allow further runs of submission tasks on
> > + * given GuC.
> > + */
> > +void xe_guc_runtime_resume(struct xe_guc *guc)
> > +{
> > + /*
> > + * Runtime PM flows are not applicable for VFs, so it's safe to
> > + * directly enable IRQ.
> > + */
> > + guc_enable_irq(guc);
> > +
> > + xe_guc_ct_runtime_resume(&guc->ct);
> > + xe_guc_submit_enable(guc);
> > + xe_guc_submit_unpause(guc);
> > +}
> > +
> > void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p)
> > {
> > struct xe_gt *gt = guc_to_gt(guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h
> > index 1cca05967e62..0165e941a352 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.h
> > +++ b/drivers/gpu/drm/xe/xe_guc.h
> > @@ -35,6 +35,8 @@ int xe_guc_upload(struct xe_guc *guc);
> > int xe_guc_min_load_for_hwconfig(struct xe_guc *guc);
> > int xe_guc_enable_communication(struct xe_guc *guc);
> > int xe_guc_opt_in_features_enable(struct xe_guc *guc);
> > +void xe_guc_runtime_suspend(struct xe_guc *guc);
> > +void xe_guc_runtime_resume(struct xe_guc *guc);
> > int xe_guc_suspend(struct xe_guc *guc);
> > void xe_guc_notify(struct xe_guc *guc);
> > int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index 3ae1e8db143a..2232e872dbd6 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -634,6 +634,33 @@ void xe_guc_ct_stop(struct xe_guc_ct *ct)
> > stop_g2h_handler(ct);
> > }
> > +/**
> > + * xe_guc_ct_runtime_suspend() - GuC CT runtime suspend
> > + * @ct: the &xe_guc_ct
> > + *
> > + * Set GuC CT to disabled state.
> > + */
> > +void xe_guc_ct_runtime_suspend(struct xe_guc_ct *ct)
> > +{
> > + /*
> > + * Since we're already in runtime suspend path, we shouldn't have pending
> > + * messages. But if there happen to be any, we'd probably want them to be
> > + * thrown as errors for further investigation.
> > + */
> > + xe_guc_ct_disable(ct);
> > +}
> > +
> > +/**
> > + * xe_guc_ct_runtime_resume() - GuC CT runtime resume
> > + * @ct: the &xe_guc_ct
> > + *
> > + * Restart GuC CT and set it to enabled state.
> > + */
> > +void xe_guc_ct_runtime_resume(struct xe_guc_ct *ct)
> > +{
> > + xe_guc_ct_restart(ct);
> > +}
> > +
> > static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len)
> > {
> > struct guc_ctb *h2g = &ct->ctbs.h2g;
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> > index ca1ce2b3c354..5599939f8fe1 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> > @@ -17,6 +17,8 @@ int xe_guc_ct_init_post_hwconfig(struct xe_guc_ct *ct);
> > int xe_guc_ct_enable(struct xe_guc_ct *ct);
> > int xe_guc_ct_restart(struct xe_guc_ct *ct);
> > void xe_guc_ct_disable(struct xe_guc_ct *ct);
> > +void xe_guc_ct_runtime_resume(struct xe_guc_ct *ct);
> > +void xe_guc_ct_runtime_suspend(struct xe_guc_ct *ct);
> > void xe_guc_ct_stop(struct xe_guc_ct *ct);
> > void xe_guc_ct_flush_and_stop(struct xe_guc_ct *ct);
> > void xe_guc_ct_fast_path(struct xe_guc_ct *ct);
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index 53507e09f7bc..403a61e98ad8 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -591,7 +591,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > }
> > for_each_gt(gt, xe, id) {
> > - err = xe_gt_suspend(gt);
> > + err = xe->d3cold.allowed ? xe_gt_suspend(gt) : xe_gt_runtime_suspend(gt);
> > if (err)
> > goto out_resume;
> > }
> > @@ -633,10 +633,10 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > xe_rpm_lockmap_acquire(xe);
> > - for_each_gt(gt, xe, id)
> > - xe_gt_idle_disable_c6(gt);
> > -
> > if (xe->d3cold.allowed) {
> > + for_each_gt(gt, xe, id)
> > + xe_gt_idle_disable_c6(gt);
> > +
> > err = xe_pcode_ready(xe, true);
> > if (err)
> > goto out;
> > @@ -657,7 +657,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > xe_irq_resume(xe);
> > for_each_gt(gt, xe, id)
> > - xe_gt_resume(gt);
> > + xe->d3cold.allowed ? xe_gt_resume(gt) : xe_gt_runtime_resume(gt);
> > xe_display_pm_runtime_resume(xe);
> > diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> > index 465bda355443..6a58b33248f5 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.c
> > +++ b/drivers/gpu/drm/xe/xe_uc.c
> > @@ -301,6 +301,34 @@ int xe_uc_suspend(struct xe_uc *uc)
> > return xe_guc_suspend(&uc->guc);
> > }
> > +/**
> > + * xe_uc_runtime_suspend() - UC runtime suspend
> > + * @uc: the UC object
> > + *
> > + * Runtime suspend all UCs.
> > + */
> > +void xe_uc_runtime_suspend(struct xe_uc *uc)
> > +{
> > + if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > + return;
> > +
> > + xe_guc_runtime_suspend(&uc->guc);
> > +}
> > +
> > +/**
> > + * xe_uc_runtime_resume() - UC runtime resume
> > + * @uc: the UC object
> > + *
> > + * Runtime resume all UCs.
> > + */
> > +void xe_uc_runtime_resume(struct xe_uc *uc)
> > +{
> > + if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > + return;
> > +
> > + xe_guc_runtime_resume(&uc->guc);
> > +}
> > +
> > /**
> > * xe_uc_declare_wedged() - Declare UC wedged
> > * @uc: the UC object
> > diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
> > index 21c9306098cf..5398da1a8097 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.h
> > +++ b/drivers/gpu/drm/xe/xe_uc.h
> > @@ -14,6 +14,8 @@ int xe_uc_init_post_hwconfig(struct xe_uc *uc);
> > int xe_uc_load_hw(struct xe_uc *uc);
> > void xe_uc_gucrc_disable(struct xe_uc *uc);
> > int xe_uc_reset_prepare(struct xe_uc *uc);
> > +void xe_uc_runtime_resume(struct xe_uc *uc);
> > +void xe_uc_runtime_suspend(struct xe_uc *uc);
> > void xe_uc_stop_prepare(struct xe_uc *uc);
> > void xe_uc_stop(struct xe_uc *uc);
> > int xe_uc_start(struct xe_uc *uc);
>
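As a reading aid for the xe_pm.c hunk above, here is a minimal stand-alone
sketch of the per-GT dispatch: the full path when d3cold is allowed, the
lighter runtime path otherwise, stopping at the first error. The toy_* types
and helpers are hypothetical stand-ins, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a GT; counts which suspend path was taken. */
struct toy_gt {
	int full_calls;		/* times the full suspend path ran */
	int runtime_calls;	/* times the runtime suspend path ran */
	int fail;		/* error code to return, 0 for success */
};

static int toy_gt_suspend(struct toy_gt *gt)
{
	gt->full_calls++;
	return gt->fail;
}

static int toy_gt_runtime_suspend(struct toy_gt *gt)
{
	gt->runtime_calls++;
	return gt->fail;
}

/* Pick the path per GT and bail out on the first failure. */
static int toy_pm_runtime_suspend(struct toy_gt *gts, int count,
				  bool d3cold_allowed)
{
	int i, err;

	for (i = 0; i < count; i++) {
		err = d3cold_allowed ? toy_gt_suspend(&gts[i]) :
				       toy_gt_runtime_suspend(&gts[i]);
		if (err)
			return err;
	}

	return 0;
}
```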
Thread overview: 18+ messages
2025-10-14 7:30 [PATCH v5 0/2] Introduce GT runtime suspend/resume Raag Jadav
2025-10-14 7:30 ` [PATCH v5 1/2] drm/xe/guc: Make xe_guc_submit_pause() available for non-VF cases Raag Jadav
2025-10-14 18:09 ` Matthew Brost
2025-10-15 7:04 ` Raag Jadav
2025-10-16 12:21 ` Raag Jadav
2025-10-16 19:01 ` Matthew Brost
2025-10-14 7:30 ` [PATCH v5 2/2] drm/xe/gt: Introduce runtime suspend/resume Raag Jadav
2025-10-16 15:32 ` Daniele Ceraolo Spurio
2025-10-16 18:38 ` Matthew Brost [this message]
2025-10-16 18:47 ` Daniele Ceraolo Spurio
2025-10-20 11:48 ` Raag Jadav
2025-10-14 7:42 ` ✓ CI.KUnit: success for Introduce GT runtime suspend/resume (rev2) Patchwork
2025-10-14 8:19 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-14 15:40 ` ✗ Xe.CI.Full: failure " Patchwork
2025-10-16 13:03 ` ✗ CI.checkpatch: warning for Introduce GT runtime suspend/resume (rev3) Patchwork
2025-10-16 13:04 ` ✓ CI.KUnit: success " Patchwork
2025-10-16 14:08 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-17 9:43 ` ✓ Xe.CI.Full: " Patchwork