From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>,
<intel-xe@lists.freedesktop.org>,
Matthew Brost <matthew.brost@intel.com>
Subject: Re: [PATCH v2] drm/xe: remove GuC reload in D3Hot path
Date: Thu, 27 Jun 2024 21:00:23 -0400 [thread overview]
Message-ID: <Zn4LJ1qAFUAVpxYs@intel.com> (raw)
In-Reply-To: <25eccae7-3bfe-40c7-b400-738ab77c47ca@intel.com>
On Mon, Jun 24, 2024 at 10:10:48AM +0200, Michal Wajdeczko wrote:
> Hi Riana,
>
>
> On 24.06.2024 09:27, Riana Tauro wrote:
> > Currently GuC is reloaded for both runtime suspend
> > and system suspend. For D3hot <-> D0 transitions
> > no power is lost on suspend and GuC reload is not
> > necessary.
>
> quite aggressive line wrap at ~50, try wrap a ~72
> (only commit titles should be around ~50)
>
> >
> > remove GuC reload from D3Hot path and only enable/disable
> > CTB communications
>
> "Remove .... communication."
>
> >
> > v2: rebase
> >
> > Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt.c | 19 ++++++++++---
> > drivers/gpu/drm/xe/xe_gt.h | 4 +--
> > drivers/gpu/drm/xe/xe_guc.c | 52 +++++++++++++++++++++++++++++++++-
> > drivers/gpu/drm/xe/xe_guc.h | 2 ++
> > drivers/gpu/drm/xe/xe_guc_ct.c | 35 ++++++++++++++++++-----
> > drivers/gpu/drm/xe/xe_guc_ct.h | 3 +-
> > drivers/gpu/drm/xe/xe_pm.c | 8 +++---
> > drivers/gpu/drm/xe/xe_uc.c | 30 ++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_uc.h | 2 ++
> > 9 files changed, 136 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index 759634cff1d8..5b48e4db9776 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -776,8 +776,9 @@ void xe_gt_suspend_prepare(struct xe_gt *gt)
> > XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
> > }
> >
> > -int xe_gt_suspend(struct xe_gt *gt)
> > +int xe_gt_suspend(struct xe_gt *gt, bool runtime)
>
> as you are changing function signature, please add full kernel-doc for
> this public function
>
> > {
> > + struct xe_device *xe = gt_to_xe(gt);
> > int err;
> >
> > xe_gt_dbg(gt, "suspending\n");
> > @@ -787,7 +788,11 @@ int xe_gt_suspend(struct xe_gt *gt)
> > if (err)
> > goto err_msg;
> >
> > - err = xe_uc_suspend(>->uc);
> > + if (runtime && !xe->d3cold.allowed)
>
> nit: maybe instead of just looking at some internal flag, it would be
> better to define helpers:
>
> static inline bool xe_device_d3cold_allowed(xe) { .. }
> static inline bool xe_device_d3hot_allowed(xe) { .. }
>
> > + err = xe_uc_runtime_suspend(>->uc);
> > + else
> > + err = xe_uc_suspend(>->uc);
> > +
> > if (err)
> > goto err_force_wake;
> >
> > @@ -825,8 +830,9 @@ int xe_gt_sanitize_freq(struct xe_gt *gt)
> > return ret;
> > }
> >
> > -int xe_gt_resume(struct xe_gt *gt)
> > +int xe_gt_resume(struct xe_gt *gt, bool runtime)
>
> add kernel-doc
>
> > {
> > + struct xe_device *xe = gt_to_xe(gt);
> > int err;
> >
> > xe_gt_dbg(gt, "resuming\n");
> > @@ -834,7 +840,12 @@ int xe_gt_resume(struct xe_gt *gt)
> > if (err)
> > goto err_msg;
> >
> > - err = do_gt_restart(gt);
> > + /* Do not reload GuC at runtime resume from D3Hot to D0 */
>
> maybe better to explain "why" than "what":
>
> /* GuC is still alive at D3hot, no need to reload it */
>
> > + if (runtime && !xe->d3cold.allowed)
> > + xe_uc_runtime_resume(>->uc);
> > + else
> > + err = do_gt_restart(gt);
> > +
> > if (err)
> > goto err_force_wake;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> > index 1123fdfc4ebc..4c942e4716ad 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.h
> > +++ b/drivers/gpu/drm/xe/xe_gt.h
> > @@ -52,8 +52,8 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt);
> > void xe_gt_record_user_engines(struct xe_gt *gt);
> >
> > void xe_gt_suspend_prepare(struct xe_gt *gt);
> > -int xe_gt_suspend(struct xe_gt *gt);
> > -int xe_gt_resume(struct xe_gt *gt);
> > +int xe_gt_suspend(struct xe_gt *gt, bool runtime);
> > +int xe_gt_resume(struct xe_gt *gt, bool runtime);
> > void xe_gt_reset_async(struct xe_gt *gt);
> > void xe_gt_sanitize(struct xe_gt *gt);
> > int xe_gt_sanitize_freq(struct xe_gt *gt);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index 172b65a50e31..1ea1d00f93f3 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -870,7 +870,7 @@ int xe_guc_enable_communication(struct xe_guc *guc)
> > guc_enable_irq(guc);
> > }
> >
> > - err = xe_guc_ct_enable(&guc->ct);
> > + err = xe_guc_ct_register(&guc->ct);
> > if (err)
> > return err;
> >
> > @@ -1101,6 +1101,56 @@ void xe_guc_sanitize(struct xe_guc *guc)
> > guc->submission_state.enabled = false;
> > }
> >
> > +/**
> > + * xe_guc_runtime_suspend - GuC runtime suspend
> > + * @guc: GuC object
> > + *
> > + * This function waits for any outstanding CTB
> > + * and disables CTB communication before entering
> > + * D3Hot
> > + *
> > + * Return: 0 on success, negative error code on error.
> > + */
> > +int xe_guc_runtime_suspend(struct xe_guc *guc)
> > +{
> > + struct xe_guc_ct *ct = &guc->ct;
> > +
> > + /*
> > + * Wait for any outstanding CTB before tearing down communication
> > + * with GuC.
> > + */
> > + if (!wait_event_timeout(ct->wq, !ct->g2h_outstanding, (HZ / 5)))
> > + return -ETIMEDOUT;
> > +
> > + /* Disable CTB */
> > + xe_guc_ct_disable(&guc->ct);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * xe_guc_runtime_resume - GuC runtime resume
> > + * @guc: GuC object
> > + *
> > + * This function enables CTB communication
> > + * and interrupts.
> > + */
> > +void xe_guc_runtime_resume(struct xe_guc *guc)
> > +{
> > + /*
> > + * Power is not lost when resuming from D3Hot,
> > + * hence it is not necessary to reload GuC
> > + * everytime. Only enable interrupts and
> > + * CTB communication during resume
> > + */
> > + guc_enable_irq(guc);
>
> this will break VFs
>
> can't you just use xe_guc_enable_communication() as it was ?
> it did the same steps what you are trying in this function
>
> > +
> > + xe_mmio_rmw32(guc_to_gt(guc), PMINTRMSK,
> > + ARAT_EXPIRED_INTRMSK, 0);
>
> if power not lost we likely don't need this (as this is a default value
> and we already set it once as part of __xe_guc_upload()
>
> > +
> > + xe_guc_ct_enable(&guc->ct);
> > +}
> > +
> > int xe_guc_reset_prepare(struct xe_guc *guc)
> > {
> > return xe_guc_submit_reset_prepare(guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h
> > index af59c9545753..a01999bb4a8e 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.h
> > +++ b/drivers/gpu/drm/xe/xe_guc.h
> > @@ -22,6 +22,8 @@ int xe_guc_upload(struct xe_guc *guc);
> > int xe_guc_min_load_for_hwconfig(struct xe_guc *guc);
> > int xe_guc_enable_communication(struct xe_guc *guc);
> > int xe_guc_suspend(struct xe_guc *guc);
> > +int xe_guc_runtime_suspend(struct xe_guc *guc);
> > +void xe_guc_runtime_resume(struct xe_guc *guc);
> > void xe_guc_notify(struct xe_guc *guc);
> > int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr);
> > int xe_guc_mmio_send(struct xe_guc *guc, const u32 *request, u32 len);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index b4137fe195a4..400680c429fd 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -360,7 +360,16 @@ static void ct_exit_safe_mode(struct xe_guc_ct *ct)
> > xe_gt_dbg(ct_to_gt(ct), "GuC CT safe-mode disabled\n");
> > }
> >
> > -int xe_guc_ct_enable(struct xe_guc_ct *ct)
> > +/**
> > + * xe_guc_ct_register - register GuC CTB
> > + * @ct: GuC CT object
> > + *
> > + * register GuC CTB's and enable GuC CT
> > + * communication channel.
> > + *
> > + * Return: 0 on success, negative error code on error.
> > + */
> > +int xe_guc_ct_register(struct xe_guc_ct *ct)
>
> hmm, can we have common naming convention across all driver components?
>
> for the initialization there was pattern like:
>
> - init_early
> - init
> - enable
> - disable
> - fini
>
> and the "_register" suffix was mostly used by functions that registers
> the sysfs/debugfs attributes
>
> now CTB is splitting 'enable' step into 'register' and 'enable' where
> the latter can't be used alone and be even more misleading
>
> + Rodrigo
yeap, I fully agree with this and all the other points from Michal on
this review.
>
> > {
> > struct xe_device *xe = ct_to_xe(ct);
> > struct xe_gt *gt = ct_to_gt(ct);
> > @@ -383,11 +392,7 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct)
> > if (err)
> > goto err_out;
> >
> > - xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_ENABLED);
> > -
> > - smp_mb();
> > - wake_up_all(&ct->wq);
> > - xe_gt_dbg(gt, "GuC CT communication channel enabled\n");
> > + xe_guc_ct_enable(ct);
> >
> > if (ct_needs_safe_mode(ct))
> > ct_enter_safe_mode(ct);
> > @@ -396,10 +401,26 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct)
> >
> > err_out:
> > xe_gt_err(gt, "Failed to enable GuC CT (%pe)\n", ERR_PTR(err));
> > -
> > return err;
> > }
> >
> > +/**
> > + * xe_guc_ct_enable - Set GuC CT to enabled state
> > + * @ct: GuC CT object
> > + *
> > + * Set GuC CT to enabled state
> > + */
> > +void xe_guc_ct_enable(struct xe_guc_ct *ct)
> > +{
> > + struct xe_gt *gt = ct_to_gt(ct);
> > +
> > + xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_ENABLED);
> > +
> > + smp_mb();
> > + wake_up_all(&ct->wq);
> > + xe_gt_dbg(gt, "GuC CT communication channel enabled\n");
>
> this seems wrong, as CTB was enabled and fully functional once you
> called guc_ct_control_toggle(true), decoupling reporting of what have
> already happened in the previous steps is IMO just misleading
>
> + Matt
>
> > +}
> > +
> > static void stop_g2h_handler(struct xe_guc_ct *ct)
> > {
> > cancel_work_sync(&ct->g2h_worker);
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> > index 105bb8e99a8d..f786b3ec2b68 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> > @@ -11,7 +11,8 @@
> > struct drm_printer;
> >
> > int xe_guc_ct_init(struct xe_guc_ct *ct);
> > -int xe_guc_ct_enable(struct xe_guc_ct *ct);
> > +int xe_guc_ct_register(struct xe_guc_ct *ct);
> > +void xe_guc_ct_enable(struct xe_guc_ct *ct);
> > void xe_guc_ct_disable(struct xe_guc_ct *ct);
> > void xe_guc_ct_stop(struct xe_guc_ct *ct);
> > void xe_guc_ct_fast_path(struct xe_guc_ct *ct);
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index de3b5df65e48..070d28d193c7 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -99,7 +99,7 @@ int xe_pm_suspend(struct xe_device *xe)
> > xe_display_pm_suspend(xe, false);
> >
> > for_each_gt(gt, xe, id) {
> > - err = xe_gt_suspend(gt);
> > + err = xe_gt_suspend(gt, false);
> > if (err) {
> > xe_display_pm_resume(xe, false);
> > goto err;
> > @@ -154,7 +154,7 @@ int xe_pm_resume(struct xe_device *xe)
> > xe_display_pm_resume(xe, false);
> >
> > for_each_gt(gt, xe, id)
> > - xe_gt_resume(gt);
> > + xe_gt_resume(gt, false);
> >
> > err = xe_bo_restore_user(xe);
> > if (err)
> > @@ -370,7 +370,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > }
> >
> > for_each_gt(gt, xe, id) {
> > - err = xe_gt_suspend(gt);
> > + err = xe_gt_suspend(gt, true);
> > if (err)
> > goto out;
> > }
> > @@ -423,7 +423,7 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > xe_irq_resume(xe);
> >
> > for_each_gt(gt, xe, id)
> > - xe_gt_resume(gt);
> > + xe_gt_resume(gt, true);
> >
> > if (xe->d3cold.allowed) {
> > xe_display_pm_resume(xe, true);
> > diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> > index 0f240534fb72..997ab4e82931 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.c
> > +++ b/drivers/gpu/drm/xe/xe_uc.c
> > @@ -288,6 +288,36 @@ int xe_uc_suspend(struct xe_uc *uc)
> > return xe_guc_suspend(&uc->guc);
> > }
> >
> > +/**
> > + * xe_uc_runtime_suspend - uC runtime suspend
> > + * @uc: uC object
> > + *
> > + * Return: 0 on success, negative error code on error.
> > + */
> > +int xe_uc_runtime_suspend(struct xe_uc *uc)
> > +{
> > + /* GuC submission not enabled, nothing to do */
>
> nit: this comment doesn't bring too much new info and likely can be
> dropped without losing any clarity of the code
>
> > + if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > + return 0;
> > +
> > + return xe_guc_runtime_suspend(&uc->guc);
> > +}
> > +
> > +/**
> > + * xe_uc_runtime_resume - uC runtime resume
> > + * @uc: uC object
> > + *
> > + * Called while resuming from D3Hot
> > + */
> > +void xe_uc_runtime_resume(struct xe_uc *uc)
> > +{
> > + /* GuC submission not enabled, nothing to do */
> > + if (!xe_device_uc_enabled(uc_to_xe(uc)))
> > + return;
> > +
> > + xe_guc_runtime_resume(&uc->guc);
> > +}
> > +
> > /**
> > * xe_uc_remove() - Clean up the UC structures before driver removal
> > * @uc: the UC object
> > diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
> > index 11856f24e6f9..dd483ea43b9b 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.h
> > +++ b/drivers/gpu/drm/xe/xe_uc.h
> > @@ -15,6 +15,8 @@ int xe_uc_init_hw(struct xe_uc *uc);
> > int xe_uc_fini_hw(struct xe_uc *uc);
> > void xe_uc_gucrc_disable(struct xe_uc *uc);
> > int xe_uc_reset_prepare(struct xe_uc *uc);
> > +int xe_uc_runtime_suspend(struct xe_uc *uc);
> > +void xe_uc_runtime_resume(struct xe_uc *uc);
> > void xe_uc_stop_prepare(struct xe_uc *uc);
> > void xe_uc_stop(struct xe_uc *uc);
> > int xe_uc_start(struct xe_uc *uc);
next prev parent reply other threads:[~2024-06-28 1:00 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-24 7:27 [PATCH v2] drm/xe: remove GuC reload in D3Hot path Riana Tauro
2024-06-24 7:20 ` ✓ CI.Patch_applied: success for drm/xe: remove GuC reload in D3Hot path (rev2) Patchwork
2024-06-24 7:20 ` ✗ CI.checkpatch: warning " Patchwork
2024-06-24 7:22 ` ✓ CI.KUnit: success " Patchwork
2024-06-24 7:41 ` ✓ CI.Build: " Patchwork
2024-06-24 7:43 ` ✗ CI.Hooks: failure " Patchwork
2024-06-24 7:44 ` ✓ CI.checksparse: success " Patchwork
2024-06-24 8:10 ` [PATCH v2] drm/xe: remove GuC reload in D3Hot path Michal Wajdeczko
2024-06-24 10:50 ` Riana Tauro
2024-06-28 1:00 ` Rodrigo Vivi [this message]
2024-06-24 8:24 ` ✓ CI.BAT: success for drm/xe: remove GuC reload in D3Hot path (rev2) Patchwork
2024-06-24 9:50 ` ✗ CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zn4LJ1qAFUAVpxYs@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=riana.tauro@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.