From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
<intel-xe@lists.freedesktop.org>
Cc: Michal Winiarski <michal.winiarski@intel.com>,
Matthew Brost <matthew.brost@intel.com>,
Tomasz Lis <tomasz.lis@intel.com>
Subject: Re: [PATCH v2] drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV
Date: Mon, 3 Nov 2025 12:16:03 +0100
Message-ID: <41719271-c1d3-4218-b882-c84c002eb693@intel.com>
In-Reply-To: <20251023064610.4499-2-satyanarayana.k.v.p@intel.com>
On 10/23/2025 8:46 AM, Satyanarayana K V P wrote:
> With SRIOV enabled, while resuming from S4, there is a possibility that,
> the VM might have been suspended on one VF and resumed on another VF.
> Since GGTT space is not virtualized, we need to fixup all the GGTT
> references while resuming.
>
> While resuming from S4, check whether the GGTT space is same or not for
> the given VF and fix-up if it is different.
is this an optimization, or does the regular MIGRATION notification and
flow with the GuC not work in this case?
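For reference, the "fix up all the GGTT references" step in the commit message conceptually means moving every GGTT address the VF holds by the difference between the new and the old GGTT base. The sketch below models only that arithmetic. struct ggtt_node_model and ggtt_shift_nodes() are hypothetical names, not xe driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a pinned object's GGTT mapping (not an xe type). */
struct ggtt_node_model {
	uint64_t addr;	/* GGTT address as currently seen by this VF */
};

/*
 * Hypothetical helper: move every node by (new_base - old_base).
 * Unsigned wrap-around keeps the result correct whether the new base
 * is above or below the old one.
 * Returns the applied shift; 0 means the base did not change.
 */
static uint64_t ggtt_shift_nodes(struct ggtt_node_model *nodes, size_t count,
				 uint64_t old_base, uint64_t new_base)
{
	uint64_t shift = new_base - old_base;
	size_t i;

	assert(nodes || !count);

	if (!shift)
		return 0;

	for (i = 0; i < count; i++)
		nodes[i].addr += shift;

	return shift;
}
```

If the base is unchanged the helper is a no-op. That same "did the base move?" check is what would let a resume path decide whether any fixup is needed at all.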
>
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Michal Winiarski <michal.winiarski@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Tomasz Lis <tomasz.lis@intel.com>
> ---
> V1 -> V2:
> - Rebased to latest drm-tip.
> ---
> drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 32 +++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_sriov_vf.h | 1 +
> drivers/gpu/drm/xe/xe_pm.c | 12 +++++++++++
> 3 files changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index d0b102ab6ce8..38528495478f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -1375,3 +1375,35 @@ void xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt)
> HZ * 5);
> xe_gt_WARN_ON(gt, !ret);
> }
> +
> +/**
> + * xe_sriov_vf_fixup_ggtt - Fix up GGTT on resume from S4.
* xe_sriov_vf_fixup_ggtt() - ...
and I don't see anything special about S4 in the function code below
> + * @gt: the &xe_gt.
> + *
> + * This function shall be called only by VF.
this only states a requirement for the caller
> + * Main GT and media GT share the same GGTT space. So, fixups are needed
> + * only for Main GT.
this only explains why there is nothing to do on media,
but we still don't know what is special about S4
> + *
> + * Returns: 0 if the operation completed successfully, or a negative
> + * error code otherwise.
> + */
> +int xe_gt_sriov_vf_fixup_ggtt(struct xe_gt *gt)
> +{
> + int err = 0;
no need to initialize here
> +
> + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> +
> + if (gt->info.type != XE_GT_TYPE_MAIN)
> + return err;
hmm, you could return 0 here, but vf_post_migration_fixups() called
below actually does something for the media GT too (hwsp rebases), so
maybe we shouldn't skip that step here?
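To illustrate the point above: the early return could be narrowed so that only the GGTT-specific step is limited to the main GT, while per-GT steps such as the hwsp rebase still run on the media GT. This is a hypothetical model. gt_model, gt_fixup_model() and the step counters are made up for illustration and are not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GT type and per-GT state (not xe types). */
enum gt_type_model { GT_MODEL_MAIN, GT_MODEL_MEDIA };

struct gt_model {
	enum gt_type_model type;
	bool is_vf;
	int ggtt_shifts;	/* how many times the shared GGTT was shifted */
	int hwsp_rebases;	/* how many times this GT's HWSP was rebased */
};

static int gt_fixup_model(struct gt_model *gt)
{
	assert(gt->is_vf);	/* mirrors xe_gt_assert(gt, IS_SRIOV_VF(...)) */

	/* GGTT is shared between main and media GT: shift it only once */
	if (gt->type == GT_MODEL_MAIN)
		gt->ggtt_shifts++;

	/* per-GT fixups still have to run on the media GT as well */
	gt->hwsp_rebases++;

	return 0;
}
```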
> +
> + xe_guc_comm_init_early(&gt->uc.guc);
shouldn't be needed, since xe_gt should already be fully initialized
> + err = xe_gt_sriov_vf_bootstrap(gt);
this is much more than the simple "do fixup" given in the description,
and "vf_bootstrap" just does a handshake with the GuC; it doesn't query
for any new self config
> + if (err)
> + goto out;
> +
> + err = vf_post_migration_fixups(gt);
instead, the GGTT query is done as part of the call above (which is
already part of the recovery flow)
so the question is:
* why can't we wait until migration recovery is triggered?
or is it never triggered, because the GuC never returns any VF_MIGRATION status?
if so, maybe we should initiate the recovery once we learn that the GGTT base has changed?
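Here is a rough sketch of that alternative. It assumes the VF keeps the GGTT base it learned before suspend and can query the current one on resume. vf_model, vf_resume_check_ggtt() and the recovery_queued flag are hypothetical stand-ins, not existing xe functions. The flag represents kicking off the existing recovery flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VF state; none of these names exist in the xe driver. */
struct vf_model {
	uint64_t ggtt_base;	/* GGTT base learned before suspend */
	bool recovery_queued;	/* stands in for kicking the recovery worker */
};

/*
 * On resume, compare the freshly queried GGTT base with the saved one and
 * only start the regular migration recovery when it actually moved.
 * The saved base is left alone: recovery re-queries the config and does
 * the shift itself.
 */
static void vf_resume_check_ggtt(struct vf_model *vf, uint64_t queried_base)
{
	assert(vf);

	if (queried_base != vf->ggtt_base)
		vf->recovery_queued = true;
}
```

Keeping this hook to a pure "did it move?" decision means the resume path does not duplicate the bootstrap/fixup sequence that the recovery flow already owns.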
> +
> +out:
> + return err;
> +}
> +
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> index af40276790fa..17004223f33a 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> @@ -39,5 +39,6 @@ void xe_gt_sriov_vf_print_runtime(struct xe_gt *gt, struct drm_printer *p);
> void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p);
>
> void xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt);
> +int xe_gt_sriov_vf_fixup_ggtt(struct xe_gt *gt);
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index 210298c4bcb1..68bf08ae62f6 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -19,6 +19,7 @@
> #include "xe_ggtt.h"
> #include "xe_gt.h"
> #include "xe_gt_idle.h"
> +#include "xe_gt_sriov_vf.h"
> #include "xe_i2c.h"
> #include "xe_irq.h"
> #include "xe_late_bind_fw.h"
> @@ -248,6 +249,17 @@ int xe_pm_resume(struct xe_device *xe)
>
> xe_display_pm_resume_early(xe);
>
> + /* GGTT fixups (if needed) have to be done before restoring BOs */
be more specific: ".. before restoring pinned BOs"
> + if (IS_SRIOV_VF(xe)) {
> + for_each_gt(gt, xe, id) {
but maybe instead of a new for_each_gt loop here, update the previous one:
for_each_gt(gt, xe, id)
- xe_gt_idle_disable_c6(gt);
+ xe_gt_resume_early(gt);
and there call VF specific code there:
+void xe_gt_resume_early(struct xe_gt *gt)
+{
+	if (IS_SRIOV_VF(gt_to_xe(gt)))
+		xe_gt_sriov_vf_resume_early(gt);
+	else
+		xe_gt_idle_disable_c6(gt);
+}
> + err = xe_gt_sriov_vf_fixup_ggtt(gt);
> + if (err) {
> + drm_err(&xe->drm, "GGTT fixups failed with %d\n", err);
nit: use xe_gt_err(gt, ...), and it's likely better to move that message into the fixup function itself
> + return err;
> + }
> + }
> + }
> +
> /*
> * This only restores pinned memory which is the memory required for the
> * GT(s) to resume.
Thread overview: 8+ messages
2025-10-23 6:46 [PATCH v2] drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV Satyanarayana K V P
2025-10-23 7:06 ` ✓ CI.KUnit: success for drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV (rev2) Patchwork
2025-10-23 7:44 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-23 13:25 ` ✗ Xe.CI.Full: failure " Patchwork
2025-10-24 2:49 ` [PATCH v2] drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV Matthew Brost
2025-10-24 4:09 ` K V P, Satyanarayana
2025-10-24 15:58 ` Matthew Brost
2025-11-03 11:16 ` Michal Wajdeczko [this message]