Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Tomasz Lis <tomasz.lis@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>,
	"Piotr Piórkowski" <piotr.piorkowski@intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Satyanarayana K V P" <satyanarayana.k.v.p@intel.com>
Subject: Re: [PATCH v3 3/5] drm/xe/vf: Skip fixups on VF migration before getting GGTT info
Date: Fri, 17 Oct 2025 00:20:57 +0200
Message-ID: <d3d9ae10-5128-47cf-8c6f-3aa9ec9f0202@intel.com>
In-Reply-To: <20251016120511.856792-4-tomasz.lis@intel.com>



On 10/16/2025 2:05 PM, Tomasz Lis wrote:
> The GuC RESFIX state should be achievable only after a successful
> handshake. If VF KMD has no GGTT configuration yet and we still got
> into RESFIX state, then either we're dealing with unclean initial
> state due to unusual actions before probe, or the migration
> happened while xe init (started by probe) was running.
> 
> In 1st case (VF migration before probe), we should just skip migration.
> Init procedure will ensure exit from RESFIX state as it starts GuC
> handshake with a reset.
> 
> In 2nd case (VF migration during xe init), the migration procedure
> should execute normally if GGTT configuration was already acquired
> from GuC, and can be skipped if it was not acquired.

we initiate recovery as part of the MEMIRQ handling

but to get a MEMIRQ interrupt from the GuC we need to set it up first:
to set up the MEMIRQ vector we need to obtain GGTT data from the GuC,
and only then can we register this MEMIRQ vector in the GuC

so valid GGTT data seems to be a hard prerequisite for this scenario

thus vf_ggtt_queried() should always be true, no?
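
to make the ordering argument concrete, here is a minimal standalone model of it.
this is not xe code: every name below is hypothetical and only mirrors the steps
listed above, under the assumption that each step really does depend on the
previous one

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the VF init ordering; none of these names exist in xe. */
struct vf_model {
	bool ggtt_queried;      /* GGTT data obtained from GuC */
	bool memirq_ready;      /* MEMIRQ vector set up (assumed to need GGTT data) */
	bool memirq_registered; /* MEMIRQ vector registered in GuC */
};

/* Step 1: obtain GGTT data from GuC. */
static void model_query_ggtt(struct vf_model *vf)
{
	vf->ggtt_queried = true;
}

/* Step 2: set up the MEMIRQ vector; refused while GGTT data is missing. */
static bool model_setup_memirq(struct vf_model *vf)
{
	if (!vf->ggtt_queried)
		return false;
	vf->memirq_ready = true;
	return true;
}

/* Step 3: register the vector in GuC; refused while there is no vector. */
static bool model_register_memirq(struct vf_model *vf)
{
	if (!vf->memirq_ready)
		return false;
	vf->memirq_registered = true;
	return true;
}

/* Recovery is started from MEMIRQ handling, so it needs step 3 to be done. */
static bool model_recovery_can_start(const struct vf_model *vf)
{
	return vf->memirq_registered;
}

/* In this model, recovery being able to start implies GGTT data is present. */
static void model_check_invariant(const struct vf_model *vf)
{
	assert(!model_recovery_can_start(vf) || vf->ggtt_queried);
}
```

if the real init sequence matches this model, the new check can never fail on
the recovery path; if it does not match, it would help to know which step can
be reached without GGTT data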

> 
> This solution will avoid crashes due to the VF migration running
> on non-initialized xe sub-structures. But it is not enough to allow
> fully reliable migration during driver probe. In particular, the
> situation where the probe might not end successfully, is:
> 
> * The VF is paused and migrated after GuC reset (vf_bootstrap) but
> before config is acquired (vf_query_config). In such case, GuC may
> remain in RESFIX state, leading to timeouting requests.
> 
> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index 34c68de6e2f3..bb0b71a47125 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -1149,6 +1149,12 @@ void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p)
>  		   pf_version->major, pf_version->minor);
>  }
>  
> +static bool vf_ggtt_queried(struct xe_tile *tile)
> +{
> +	guard(mutex)(&tile->mem.ggtt->lock);
> +	return xe_tile_sriov_vf_ggtt(tile) != 0;
> +}
> +
>  static bool vf_post_migration_shutdown(struct xe_gt *gt)
>  {
>  	struct xe_device *xe = gt_to_xe(gt);
> @@ -1260,6 +1266,11 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
>  	xe_gt_sriov_dbg(gt, "migration recovery in progress\n");
>  
>  	xe_pm_runtime_get(xe);
> +
> +	/* If during init and before GGTT configuration, skip the procedure. */
> +	if (!vf_ggtt_queried(gt_to_tile(gt)))
> +		goto skip;
> +
>  	retry = vf_post_migration_shutdown(gt);
>  	if (retry)
>  		goto queue;
> @@ -1282,6 +1293,7 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
>  
>  	vf_post_migration_kickstart(gt);
>  
> +skip:
>  	xe_pm_runtime_put(xe);
>  	xe_gt_sriov_notice(gt, "migration recovery ended\n");
>  	return;
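
for reference, the control flow added by the last two hunks can be modeled
standalone as below. it is only a sketch: the counters and model_* helpers are
hypothetical stand-ins for xe_pm_runtime_get()/xe_pm_runtime_put() and the
fixup steps, and the retry/queue path is left out. it shows that the skip path
still drops the runtime PM reference it took

```c
#include <assert.h>
#include <stdbool.h>

static int pm_refcount; /* hypothetical stand-in for the runtime PM reference */
static int fixups_run;  /* counts how many times the fixup steps were executed */

static void model_pm_get(void) { pm_refcount++; }
static void model_pm_put(void) { pm_refcount--; }

/* Returns true if the fixups were executed, false if they were skipped. */
static bool model_recovery(bool ggtt_queried)
{
	bool ran = false;

	model_pm_get();

	/* Mirrors the new check: no GGTT configuration yet, nothing to fix up. */
	if (!ggtt_queried)
		goto skip;

	fixups_run++;
	ran = true;

skip:
	/* Both paths meet here, so the get above is always balanced. */
	model_pm_put();
	assert(pm_refcount >= 0);
	return ran;
}
```

note that in the patch the "migration recovery ended" notice sits after the
skip label, so it is printed on the skip path as well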



Thread overview: 18+ messages
2025-10-16 12:05 [PATCH v3 0/5] drm/xe/vf: Minor fixes to post-migration recovery Tomasz Lis
2025-10-16 12:05 ` [PATCH v3 1/5] drm/xe/vf: Helper for telling whether CCS migration BBs are needed Tomasz Lis
2025-10-16 19:57   ` Michal Wajdeczko
2025-10-16 21:51     ` Lis, Tomasz
2025-10-16 12:05 ` [PATCH v3 2/5] drm/xe/vf: Fix GuC FW check for VF migration support Tomasz Lis
2025-10-16 21:55   ` Michal Wajdeczko
2025-10-17 15:31     ` Lis, Tomasz
2025-10-17 15:40       ` Michal Wajdeczko
2025-10-16 12:05 ` [PATCH v3 3/5] drm/xe/vf: Skip fixups on VF migration before getting GGTT info Tomasz Lis
2025-10-16 22:20   ` Michal Wajdeczko [this message]
2025-10-20 19:30     ` Lis, Tomasz
2025-10-16 12:05 ` [PATCH v3 4/5] drm/xe: Assert that VF will never use fixed placement of BOs Tomasz Lis
2025-10-16 12:05 ` [PATCH v3 5/5] drm/xe/vf: Do not disable VF migration on ATS-M Tomasz Lis
2025-10-16 22:25   ` Michal Wajdeczko
2025-10-20 19:22     ` Lis, Tomasz
2025-10-16 12:11 ` ✓ CI.KUnit: success for drm/xe/vf: Minor fixes to post-migration recovery (rev3) Patchwork
2025-10-16 12:57 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-17  8:23 ` ✗ Xe.CI.Full: failure " Patchwork
