Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Michal Winiarski <michal.winiarski@intel.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Tomasz Lis <tomasz.lis@intel.com>
Subject: Re: [PATCH v2] drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV
Date: Mon, 3 Nov 2025 12:16:03 +0100
Message-ID: <41719271-c1d3-4218-b882-c84c002eb693@intel.com>
In-Reply-To: <20251023064610.4499-2-satyanarayana.k.v.p@intel.com>



On 10/23/2025 8:46 AM, Satyanarayana K V P wrote:
> With SRIOV enabled, while resuming from S4, there is a possibility that,
> the VM might have been suspended on one VF and resumed on another VF.
> Since GGTT space is not virtualized, we need to fixup all the GGTT
> references while resuming.
> 
> While resuming from S4, check whether the GGTT space is same or not for
> the given VF and fix-up if it is different.

is this an optimization, or does the regular MIGRATION notification and
recovery flow with the GuC not work in this case?
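
For reference, as I read it, the check described in the commit message
boils down to comparing the GGTT base recorded before suspend with the one
reported after resume and, if they differ, shifting every recorded GGTT
address by the difference. A minimal userspace sketch of just that
arithmetic; struct ggtt_range, ggtt_shift() and ggtt_fixup_refs() are
hypothetical names, not the xe API, and the size of the assigned range is
assumed to be unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the GGTT range assigned to this VF. */
struct ggtt_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Shift to apply to every GGTT address recorded before suspend.
 * 0 means the VF came back with the same GGTT base: nothing to fix.
 * Assumption: the size of the assigned range is unchanged across S4.
 */
static int64_t ggtt_shift(const struct ggtt_range *saved,
			  const struct ggtt_range *now)
{
	assert(saved->size == now->size);
	return (int64_t)(now->base - saved->base);
}

/* Rebase recorded GGTT addresses by 'shift' (u64 wrap-around arithmetic). */
static void ggtt_fixup_refs(uint64_t *refs, size_t n, int64_t shift)
{
	for (size_t i = 0; i < n; i++)
		refs[i] += (uint64_t)shift;
}
```

If this is all that differs after S4, the existing migration recovery
already covers it, which is why the question above matters.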

> 
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Michal Winiarski <michal.winiarski@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Tomasz Lis <tomasz.lis@intel.com>
> ---
> V1 -> V2:
> - Rebased to latest drm-tip.
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 32 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.h |  1 +
>  drivers/gpu/drm/xe/xe_pm.c          | 12 +++++++++++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index d0b102ab6ce8..38528495478f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -1375,3 +1375,35 @@ void xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt)
>  					       HZ * 5);
>  	xe_gt_WARN_ON(gt, !ret);
>  }
> +
> +/**
> + * xe_sriov_vf_fixup_ggtt - Fix up GGTT on resume from S4.

    * xe_gt_sriov_vf_fixup_ggtt() - ...

and I don't see anything S4-specific in the function code below

> + * @gt: the &xe_gt.
> + *
> + * This function shall be called only by VF.

this only states a requirement for the caller

> + * Main GT and media GT share the same GGTT space. So, fixups are needed
> + * only for Main GT.

this only explains why there is nothing to do on the media GT

but it still doesn't tell us what is special about S4

> + *
> + * Returns: 0 if the operation completed successfully, or a negative
> + * error code otherwise.
> + */
> +int xe_gt_sriov_vf_fixup_ggtt(struct xe_gt *gt)
> +{
> +	int err = 0;

no need to initialize here

> +
> +	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> +
> +	if (gt->info.type != XE_GT_TYPE_MAIN)
> +		return err;

hmm, you could just return 0 here, but vf_post_migration_fixups(),
called below, actually does something for the media GT as well (HWSP
rebases), so maybe we shouldn't skip that step here?
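
Roughly what I mean, as a userspace model: only the main GT touches the
shared GGTT, but the per-GT part of the fixups still runs on the media GT.
gt_model, gt_vf_fixup() and the counters are hypothetical stand-ins, and
treating the HWSP rebase as the only per-GT step is an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GT types; not the real xe definitions. */
enum gt_type { GT_TYPE_MAIN, GT_TYPE_MEDIA };

struct gt_model {
	enum gt_type type;
	bool is_vf;
	int ggtt_fixups;  /* times the shared GGTT was fixed up via this GT */
	int hwsp_rebases; /* times this GT's own state was rebased */
};

/*
 * The GGTT is shared, so only the main GT fixes it up, but the
 * per-GT fixups are not skipped for the media GT.
 */
static int gt_vf_fixup(struct gt_model *gt)
{
	assert(gt->is_vf); /* mirrors the VF-only assert in the patch */

	if (gt->type == GT_TYPE_MAIN)
		gt->ggtt_fixups++;

	gt->hwsp_rebases++;
	return 0;
}
```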

> +
> +	xe_guc_comm_init_early(&gt->uc.guc);

shouldn't be needed, since the xe_gt should already be fully initialized

> +	err = xe_gt_sriov_vf_bootstrap(gt);

this is much more than the simple "do fixup" given in the description

and xe_gt_sriov_vf_bootstrap() only does a handshake with the GuC; it
doesn't query for any new self config

> +	if (err)
> +		goto out;
> +
> +	err = vf_post_migration_fixups(gt);

instead, the GGTT query is done as part of the call above (which is
already part of the migration recovery flow)

so the question is:

* why can't we wait until the migration recovery is triggered?

or is it that it will never be triggered, because the GuC doesn't
return any VF_MIGRATION status in this case?

if so, maybe we should just initiate the recovery once we learn that
the GGTT base has changed?
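
A minimal sketch of that alternative, assuming the freshly queried GGTT
base is available at resume time; vf_state, vf_resume_check_ggtt() and the
recovery_queued flag are hypothetical and only stand in for queuing the
existing recovery worker:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VF state; names are illustrative, not the xe API. */
struct vf_state {
	bool is_vf;
	uint64_t ggtt_base;   /* GGTT base recorded before suspend */
	bool recovery_queued; /* stands in for queuing the recovery worker */
};

/*
 * On resume, compare the freshly queried GGTT base with the recorded one
 * and, only if it moved, hand over to the regular migration recovery
 * (which already re-queries the config and applies the fixups).
 * Returns true if recovery was queued.
 */
static bool vf_resume_check_ggtt(struct vf_state *vf, uint64_t queried_base)
{
	assert(vf->is_vf);

	if (queried_base == vf->ggtt_base)
		return false;

	vf->recovery_queued = true;
	return true;
}
```

This keeps a single fixup path instead of open-coding bootstrap plus
fixups in the PM code.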

> +
> +out:
> +	return err;
> +}
> +
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> index af40276790fa..17004223f33a 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> @@ -39,5 +39,6 @@ void xe_gt_sriov_vf_print_runtime(struct xe_gt *gt, struct drm_printer *p);
>  void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p);
>  
>  void xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt);
> +int xe_gt_sriov_vf_fixup_ggtt(struct xe_gt *gt);
>  
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index 210298c4bcb1..68bf08ae62f6 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -19,6 +19,7 @@
>  #include "xe_ggtt.h"
>  #include "xe_gt.h"
>  #include "xe_gt_idle.h"
> +#include "xe_gt_sriov_vf.h"
>  #include "xe_i2c.h"
>  #include "xe_irq.h"
>  #include "xe_late_bind_fw.h"
> @@ -248,6 +249,17 @@ int xe_pm_resume(struct xe_device *xe)
>  
>  	xe_display_pm_resume_early(xe);
>  
> +	/* GGTT fixups (if needed) have to be done before restoring BOs */

be more specific: "... before restoring pinned BOs"

> +	if (IS_SRIOV_VF(xe)) {
> +		for_each_gt(gt, xe, id) {

but maybe, instead of adding a new for_each_gt loop here, update the previous one:

	for_each_gt(gt, xe, id)
-		xe_gt_idle_disable_c6(gt);
+		xe_gt_resume_early(gt);

and call the VF-specific code from there:

+void xe_gt_resume_early(struct xe_gt *gt)
+{
+	if (IS_SRIOV_VF(gt_to_xe(gt)))
+		xe_gt_sriov_vf_resume_early(gt);
+	else
+		xe_gt_idle_disable_c6(gt);
+}


> +			err = xe_gt_sriov_vf_fixup_ggtt(gt);
> +			if (err) {
> +				drm_err(&xe->drm, "GGTT fixups failed with %d\n", err);

nit: use xe_gt_err(gt, ...), and it's likely better to move that message into the fixup function itself
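
Something along these lines, as a userspace sketch: the helper logs its
own failure with a GT prefix and the caller just propagates the error.
gt_stub and the gt_err() macro are hypothetical stand-ins loosely modelled
on xe_gt_err(), and step_err only simulates the outcome of the real fixup
steps:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical GT stand-in; the message is captured instead of logged. */
struct gt_stub {
	unsigned int id;
	char last_err[64];
};

/* Illustrative GT-prefixed logger, loosely modelled on xe_gt_err(). */
#define gt_err(gt, fmt, ...) \
	snprintf((gt)->last_err, sizeof((gt)->last_err), "GT%u: " fmt, \
		 (gt)->id, __VA_ARGS__)

/*
 * The fixup helper reports its own failure, so the caller in the resume
 * path only has to propagate the error code.
 */
static int gt_fixup_ggtt(struct gt_stub *gt, int step_err)
{
	if (step_err)
		gt_err(gt, "GGTT fixups failed with %d\n", step_err);

	return step_err;
}
```

The loop in xe_pm_resume() then reduces to "if (err) return err;".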

> +				return err;
> +			}
> +		}
> +	}
> +
>  	/*
>  	 * This only restores pinned memory which is the memory required for the
>  	 * GT(s) to resume.



Thread overview: 8+ messages
2025-10-23  6:46 [PATCH v2] drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV Satyanarayana K V P
2025-10-23  7:06 ` ✓ CI.KUnit: success for drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV (rev2) Patchwork
2025-10-23  7:44 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-23 13:25 ` ✗ Xe.CI.Full: failure " Patchwork
2025-10-24  2:49 ` [PATCH v2] drm/xe/vf: Fix up GGTT on S4 resume under SR-IOV Matthew Brost
2025-10-24  4:09   ` K V P, Satyanarayana
2025-10-24 15:58     ` Matthew Brost
2025-11-03 11:16 ` Michal Wajdeczko [this message]
