Re: [PATCH v5 2/2] drm/xe/gt: Introduce runtime suspend/resume


From: Matthew Brost <matthew.brost@intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Raag Jadav <raag.jadav@intel.com>, <lucas.demarchi@intel.com>,
	<rodrigo.vivi@intel.com>, <intel-xe@lists.freedesktop.org>,
	<riana.tauro@intel.com>, <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v5 2/2] drm/xe/gt: Introduce runtime suspend/resume
Date: Thu, 16 Oct 2025 11:38:25 -0700
Message-ID: <aPE7oYJa037sxbaU@lstrano-desk.jf.intel.com>
In-Reply-To: <1aa6c0b4-73fd-4603-b414-802b4e15442a@intel.com>

On Thu, Oct 16, 2025 at 08:32:23AM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 10/14/2025 12:30 AM, Raag Jadav wrote:
> > If power state is retained between suspend/resume cycle, we don't need
> > to perform full GT re-initialization. Introduce runtime helpers for GT
> > which greatly reduce suspend/resume delay.
> > 
> > v2: Drop redundant xe_gt_sanitize() and xe_guc_ct_stop() (Daniele)
> >      Use runtime naming for guc helpers (Daniele)
> > v3: Drop redundant logging, add kernel doc (Michal)
> >      Use runtime naming for ct helpers (Michal)
> > v4: Fix tags (Rodrigo)
> > v5: Include host_l2_vram workaround (Daniele)
> >      Reuse xe_guc_submit_enable/disable() helpers (Daniele)
> 
> Based on the reply about VF behavior in the previous rev, I don't think
> this is the correct approach.
> If on a VF the runtime_suspend/resume functions are not called at all like
> you said, it means that the VF driver needs to be able to cope with the fact
> that the HW can lose power without it being directly notified if its rpm
> refcount is 0. This in turn means that on a VF the driver can't rely on what
> you do in xe_uc_runtime_suspend/resume() to idle the state and must instead
> guarantee that the state is already idled when the last rpm ref is released
> and that a new rpm ref is taken before restarting anything (which might
> already be true). AFAIK there is no difference in the SW state management
> of queues and CTBs between PF and VF, so if we achieve that on a VF we'll
> also have it on PF/Native, which means that there will be no need to
> pause/unpause CTBs and exec_queues. The only thing that the PF would need to
> do in the rpm flow is program the HW (e.g. the host_l2_vram stuff and the
> irq re-enabling).
> 
> tl;dr, if we guarantee that:
> 1 - if the rpm refcount is 0 then there is no activity on HW, so nothing
> that needs to be paused (which might already be true)
> 2 - an rpm ref is taken before any activity is started (which might also
> already be true)
> 
> Then we're guaranteeing that there is nothing to pause/unpause at runtime
> suspend/resume time, so we're safe skipping those calls entirely on VF while
> on native/PF we can just focus on the HW re-programming.
> 
> BTW, any idea how this is working with the current code? Given that we're
> re-loading the GuC on runtime resume, are the VFs getting disconnected and

VFs hold an RPM ref. See pf_enable_vfs in xe_pci_sriov.c.

Matt

> having to detect that and re-connect? Or are we just disabling rpm if a VF
> is enabled? Because if our approach is that rpm is just not supported if VFs
> are in use then we can keep your current approach and add an assert to make
> sure we're not runtime suspending if the vf count is > 0.
> 
> Daniele
> 
> > 
> > Co-developed-by: Riana Tauro <riana.tauro@intel.com>
> > Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_gt.c     | 60 ++++++++++++++++++++++++++++++++++
> >   drivers/gpu/drm/xe/xe_gt.h     |  2 ++
> >   drivers/gpu/drm/xe/xe_guc.c    | 34 +++++++++++++++++++
> >   drivers/gpu/drm/xe/xe_guc.h    |  2 ++
> >   drivers/gpu/drm/xe/xe_guc_ct.c | 27 +++++++++++++++
> >   drivers/gpu/drm/xe/xe_guc_ct.h |  2 ++
> >   drivers/gpu/drm/xe/xe_pm.c     | 10 +++---
> >   drivers/gpu/drm/xe/xe_uc.c     | 28 ++++++++++++++++
> >   drivers/gpu/drm/xe/xe_uc.h     |  2 ++
> >   9 files changed, 162 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index d8e94fb8b9bd..0eacca14ccbb 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -1003,6 +1003,66 @@ int xe_gt_resume(struct xe_gt *gt)
> >   	return err;
> >   }
> > +/**
> > + * xe_gt_runtime_suspend() - GT runtime suspend
> > + * @gt: the GT object
> > + *
> > + * Return: 0 on success, negative error code otherwise.
> > + */
> > +int xe_gt_runtime_suspend(struct xe_gt *gt)
> > +{
> > +	unsigned int fw_ref;
> > +	int err = -ETIMEDOUT;
> > +
> > +	xe_gt_dbg(gt, "runtime suspending\n");
> > +
> > +	fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> > +	if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL))
> > +		goto err_force_wake;
> > +
> > +	xe_uc_runtime_suspend(&gt->uc);
> > +	xe_gt_disable_host_l2_vram(gt);
> > +
> > +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > +	xe_gt_dbg(gt, "runtime suspended\n");
> > +
> > +	return 0;
> > +
> > +err_force_wake:
> > +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > +	return err;
> > +}
> > +
> > +/**
> > + * xe_gt_runtime_resume() - GT runtime resume
> > + * @gt: the GT object
> > + *
> > + * Return: 0 on success, negative error code otherwise.
> > + */
> > +int xe_gt_runtime_resume(struct xe_gt *gt)
> > +{
> > +	unsigned int fw_ref;
> > +	int err = -ETIMEDOUT;
> > +
> > +	xe_gt_dbg(gt, "runtime resuming\n");
> > +
> > +	fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> > +	if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL))
> > +		goto err_force_wake;
> > +
> > +	xe_gt_enable_host_l2_vram(gt);
> > +	xe_uc_runtime_resume(&gt->uc);
> > +
> > +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > +	xe_gt_dbg(gt, "runtime resumed\n");
> > +
> > +	return 0;
> > +
> > +err_force_wake:
> > +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > +	return err;
> > +}
> > +
> >   struct xe_hw_engine *xe_gt_hw_engine(struct xe_gt *gt,
> >   				     enum xe_engine_class class,
> >   				     u16 instance, bool logical)
> > diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> > index 5df2ffe3ff83..ceb633ec22d0 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.h
> > +++ b/drivers/gpu/drm/xe/xe_gt.h
> > @@ -52,6 +52,8 @@ int xe_gt_suspend(struct xe_gt *gt);
> >   void xe_gt_shutdown(struct xe_gt *gt);
> >   int xe_gt_resume(struct xe_gt *gt);
> >   void xe_gt_reset_async(struct xe_gt *gt);
> > +int xe_gt_runtime_resume(struct xe_gt *gt);
> > +int xe_gt_runtime_suspend(struct xe_gt *gt);
> >   void xe_gt_sanitize(struct xe_gt *gt);
> >   int xe_gt_sanitize_freq(struct xe_gt *gt);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index d94490979adc..6262ca1c1d42 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -1599,6 +1599,40 @@ int xe_guc_start(struct xe_guc *guc)
> >   	return xe_guc_submit_start(guc);
> >   }
> > +/**
> > + * xe_guc_runtime_suspend() - GuC runtime suspend
> > + * @guc: The GuC object
> > + *
> > + * Stop further runs of submission tasks on given GuC and runtime suspend
> > + * GuC CT.
> > + */
> > +void xe_guc_runtime_suspend(struct xe_guc *guc)
> > +{
> > +	xe_guc_submit_pause(guc);
> > +	xe_guc_submit_disable(guc);
> > +	xe_guc_ct_runtime_suspend(&guc->ct);
> > +}
> > +
> > +/**
> > + * xe_guc_runtime_resume() - GuC runtime resume
> > + * @guc: The GuC object
> > + *
> > + * Runtime resume GuC CT and allow further runs of submission tasks on
> > + * given GuC.
> > + */
> > +void xe_guc_runtime_resume(struct xe_guc *guc)
> > +{
> > +	/*
> > +	 * Runtime PM flows are not applicable for VFs, so it's safe to
> > +	 * directly enable IRQ.
> > +	 */
> > +	guc_enable_irq(guc);
> > +
> > +	xe_guc_ct_runtime_resume(&guc->ct);
> > +	xe_guc_submit_enable(guc);
> > +	xe_guc_submit_unpause(guc);
> > +}
> > +
> >   void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p)
> >   {
> >   	struct xe_gt *gt = guc_to_gt(guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h
> > index 1cca05967e62..0165e941a352 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.h
> > +++ b/drivers/gpu/drm/xe/xe_guc.h
> > @@ -35,6 +35,8 @@ int xe_guc_upload(struct xe_guc *guc);
> >   int xe_guc_min_load_for_hwconfig(struct xe_guc *guc);
> >   int xe_guc_enable_communication(struct xe_guc *guc);
> >   int xe_guc_opt_in_features_enable(struct xe_guc *guc);
> > +void xe_guc_runtime_suspend(struct xe_guc *guc);
> > +void xe_guc_runtime_resume(struct xe_guc *guc);
> >   int xe_guc_suspend(struct xe_guc *guc);
> >   void xe_guc_notify(struct xe_guc *guc);
> >   int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index 3ae1e8db143a..2232e872dbd6 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -634,6 +634,33 @@ void xe_guc_ct_stop(struct xe_guc_ct *ct)
> >   	stop_g2h_handler(ct);
> >   }
> > +/**
> > + * xe_guc_ct_runtime_suspend() - GuC CT runtime suspend
> > + * @ct: the &xe_guc_ct
> > + *
> > + * Set GuC CT to disabled state.
> > + */
> > +void xe_guc_ct_runtime_suspend(struct xe_guc_ct *ct)
> > +{
> > +	/*
> > +	 * Since we're already in runtime suspend path, we shouldn't have pending
> > +	 * messages. But if there happen to be any, we'd probably want them to be
> > +	 * thrown as errors for further investigation.
> > +	 */
> > +	xe_guc_ct_disable(ct);
> > +}
> > +
> > +/**
> > + * xe_guc_ct_runtime_resume() - GuC CT runtime resume
> > + * @ct: the &xe_guc_ct
> > + *
> > + * Restart GuC CT and set it to enabled state.
> > + */
> > +void xe_guc_ct_runtime_resume(struct xe_guc_ct *ct)
> > +{
> > +	xe_guc_ct_restart(ct);
> > +}
> > +
> >   static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len)
> >   {
> >   	struct guc_ctb *h2g = &ct->ctbs.h2g;
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> > index ca1ce2b3c354..5599939f8fe1 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> > @@ -17,6 +17,8 @@ int xe_guc_ct_init_post_hwconfig(struct xe_guc_ct *ct);
> >   int xe_guc_ct_enable(struct xe_guc_ct *ct);
> >   int xe_guc_ct_restart(struct xe_guc_ct *ct);
> >   void xe_guc_ct_disable(struct xe_guc_ct *ct);
> > +void xe_guc_ct_runtime_resume(struct xe_guc_ct *ct);
> > +void xe_guc_ct_runtime_suspend(struct xe_guc_ct *ct);
> >   void xe_guc_ct_stop(struct xe_guc_ct *ct);
> >   void xe_guc_ct_flush_and_stop(struct xe_guc_ct *ct);
> >   void xe_guc_ct_fast_path(struct xe_guc_ct *ct);
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index 53507e09f7bc..403a61e98ad8 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -591,7 +591,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> >   	}
> >   	for_each_gt(gt, xe, id) {
> > -		err = xe_gt_suspend(gt);
> > +		err = xe->d3cold.allowed ? xe_gt_suspend(gt) : xe_gt_runtime_suspend(gt);
> >   		if (err)
> >   			goto out_resume;
> >   	}
> > @@ -633,10 +633,10 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >   	xe_rpm_lockmap_acquire(xe);
> > -	for_each_gt(gt, xe, id)
> > -		xe_gt_idle_disable_c6(gt);
> > -
> >   	if (xe->d3cold.allowed) {
> > +		for_each_gt(gt, xe, id)
> > +			xe_gt_idle_disable_c6(gt);
> > +
> >   		err = xe_pcode_ready(xe, true);
> >   		if (err)
> >   			goto out;
> > @@ -657,7 +657,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >   	xe_irq_resume(xe);
> >   	for_each_gt(gt, xe, id)
> > -		xe_gt_resume(gt);
> > +		xe->d3cold.allowed ? xe_gt_resume(gt) : xe_gt_runtime_resume(gt);
> >   	xe_display_pm_runtime_resume(xe);
> > diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> > index 465bda355443..6a58b33248f5 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.c
> > +++ b/drivers/gpu/drm/xe/xe_uc.c
> > @@ -301,6 +301,34 @@ int xe_uc_suspend(struct xe_uc *uc)
> >   	return xe_guc_suspend(&uc->guc);
> >   }
> > +/**
> > + * xe_uc_runtime_suspend() - UC runtime suspend
> > + * @uc: the UC object
> > + *
> > + * Runtime suspend all UCs.
> > + */
> > +void xe_uc_runtime_suspend(struct xe_uc *uc)
> > +{
> > +	if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > +		return;
> > +
> > +	xe_guc_runtime_suspend(&uc->guc);
> > +}
> > +
> > +/**
> > + * xe_uc_runtime_resume() - UC runtime resume
> > + * @uc: the UC object
> > + *
> > + * Runtime resume all UCs.
> > + */
> > +void xe_uc_runtime_resume(struct xe_uc *uc)
> > +{
> > +	if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > +		return;
> > +
> > +	xe_guc_runtime_resume(&uc->guc);
> > +}
> > +
> >   /**
> >    * xe_uc_declare_wedged() - Declare UC wedged
> >    * @uc: the UC object
> > diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
> > index 21c9306098cf..5398da1a8097 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.h
> > +++ b/drivers/gpu/drm/xe/xe_uc.h
> > @@ -14,6 +14,8 @@ int xe_uc_init_post_hwconfig(struct xe_uc *uc);
> >   int xe_uc_load_hw(struct xe_uc *uc);
> >   void xe_uc_gucrc_disable(struct xe_uc *uc);
> >   int xe_uc_reset_prepare(struct xe_uc *uc);
> > +void xe_uc_runtime_resume(struct xe_uc *uc);
> > +void xe_uc_runtime_suspend(struct xe_uc *uc);
> >   void xe_uc_stop_prepare(struct xe_uc *uc);
> >   void xe_uc_stop(struct xe_uc *uc);
> >   int xe_uc_start(struct xe_uc *uc);
>

Thread overview: 18+ messages
2025-10-14  7:30 [PATCH v5 0/2] Introduce GT runtime suspend/resume Raag Jadav
2025-10-14  7:30 ` [PATCH v5 1/2] drm/xe/guc: Make xe_guc_submit_pause() available for non-VF cases Raag Jadav
2025-10-14 18:09   ` Matthew Brost
2025-10-15  7:04     ` Raag Jadav
2025-10-16 12:21       ` Raag Jadav
2025-10-16 19:01         ` Matthew Brost
2025-10-14  7:30 ` [PATCH v5 2/2] drm/xe/gt: Introduce runtime suspend/resume Raag Jadav
2025-10-16 15:32   ` Daniele Ceraolo Spurio
2025-10-16 18:38     ` Matthew Brost [this message]
2025-10-16 18:47       ` Daniele Ceraolo Spurio
2025-10-20 11:48         ` Raag Jadav
2025-10-14  7:42 ` ✓ CI.KUnit: success for Introduce GT runtime suspend/resume (rev2) Patchwork
2025-10-14  8:19 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-14 15:40 ` ✗ Xe.CI.Full: failure " Patchwork
2025-10-16 13:03 ` ✗ CI.checkpatch: warning for Introduce GT runtime suspend/resume (rev3) Patchwork
2025-10-16 13:04 ` ✓ CI.KUnit: success " Patchwork
2025-10-16 14:08 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-17  9:43 ` ✓ Xe.CI.Full: " Patchwork
