Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Matthew Brost <matthew.brost@intel.com>,
	Tomasz Lis <tomasz.lis@intel.com>
Subject: Re: [PATCH v7 3/4] drm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migration
Date: Sat, 29 Nov 2025 21:27:36 +0100
Message-ID: <91d0b4c4-8209-44c5-8d1d-8960ca02144a@intel.com>
In-Reply-To: <20251128133052.17120-9-satyanarayana.k.v.p@intel.com>



On 11/28/2025 2:30 PM, Satyanarayana K V P wrote:
> Handle GuC response `XE_GUC_RESPONSE_VF_MIGRATED` as a special case in the
> VF post-migration recovery flow. When this error occurs, it indicates that
> a new migration was detected while the resource fixup process was still in
> progress. Instead of failing immediately, requeue the VF into the recovery
> path to allow proper handling of the new migration event.
> 
> This improves robustness of VF recovery in SR-IOV environments where
> migrations can overlap with resource fixup steps.
> 
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Tomasz Lis <tomasz.lis@intel.com>
> 
> ---
> V6 -> V7:
> - New commit.
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 9 +++++++--
>  drivers/gpu/drm/xe/xe_guc.c         | 6 ++++++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index fd7dd4a4739d..937554657440 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -1256,14 +1256,19 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
>  	}
>  
>  	err = vf_post_migration_fixups(gt);
> -	if (err)
> +	if (unlikely(err))

No need to add 'unlikely' anywhere; we are not on a critical path.

>  		goto fail;
>  
>  	vf_post_migration_rearm(gt);
>  
>  	err = vf_post_migration_resfix_done(gt, marker);
> -	if (err)
> +	if (unlikely(err == -EREMCHG))
> +		goto queue;

maybe it's better to code that as:

	if (err) {
		if (err == -EREMCHG)
			goto queue;
		...
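
To illustrate the nesting suggested above, here is a minimal standalone sketch, not driver code. The enum and the function `classify_resfix_done()` are hypothetical names made up for this example; `err` stands in for the value returned by the RESFIX_DONE step. The fallback definition of `EREMCHG` is only there so the sketch builds on systems whose <errno.h> lacks it.

```c
#include <assert.h>
#include <errno.h>

/* Fallback for non-Linux libcs; 78 is the Linux asm-generic value. */
#ifndef EREMCHG
#define EREMCHG 78
#endif

/* Hypothetical outcomes of one recovery pass; not names from the xe driver. */
enum recovery_outcome {
	RECOVERY_DONE,
	RECOVERY_REQUEUED,
	RECOVERY_FAILED,
};

/*
 * Hypothetical stand-in for the error handling after the RESFIX_DONE step:
 * one outer 'if (err)' with the requeue case checked first inside it.
 */
static enum recovery_outcome classify_resfix_done(int err)
{
	if (err) {
		/* A new migration arrived during fixups: requeue recovery. */
		if (err == -EREMCHG)
			return RECOVERY_REQUEUED;
		/* Any other error is treated as a real failure. */
		return RECOVERY_FAILED;
	}
	return RECOVERY_DONE;
}
```

With this shape the success path tests `err` once, and the special case sits next to the generic failure handling instead of in front of it.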

> +	if (unlikely(err)) {
> +		xe_gt_sriov_err(gt, "Recovery failed at GuC RESFIX_DONE step (%pe)\n",
> +				ERR_PTR(err));
>  		goto fail;

Shouldn't this error message be part of the earlier patch?

> +	}
>  
>  	vf_post_migration_kickstart(gt);
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 88376bc2a483..f0407bab9a0c 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -1484,6 +1484,12 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
>  		u32 hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, header);
>  		u32 error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, header);
>  
> +		if (unlikely(error == XE_GUC_RESPONSE_VF_MIGRATED)) {
> +			xe_gt_dbg(gt, "GuC mmio request %#x rejected due to MIGRATION (hint %#x)\n",
> +				  request[0], hint);
> +			return -EREMCHG;
> +		}
> +
>  		xe_gt_err(gt, "GuC mmio request %#x: failure %#x hint %#x\n",
>  			  request[0], error, hint);
>  		return -ENXIO;
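
For readers following the hunk above, the mapping from a GuC failure header to an errno can be sketched in isolation. This is a rough illustration under stated assumptions, not the driver's code: the `SKETCH_*` masks assume the error code sits in bits 15:0 and the hint in bits 27:16, `SKETCH_RESPONSE_VF_MIGRATED` is a placeholder value (the real `XE_GUC_RESPONSE_VF_MIGRATED` code is not shown in this thread), and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Fallback for non-Linux libcs; 78 is the Linux asm-generic value. */
#ifndef EREMCHG
#define EREMCHG 78
#endif

/* Assumed field layout, for illustration only. */
#define SKETCH_FAILURE_ERROR_MASK	0x0000ffffu
#define SKETCH_FAILURE_HINT_MASK	0x0fff0000u
#define SKETCH_FAILURE_HINT_SHIFT	16

/* Placeholder, NOT the real XE_GUC_RESPONSE_VF_MIGRATED value. */
#define SKETCH_RESPONSE_VF_MIGRATED	0x0107u

/* Extract the error code from a failure header (assumed layout). */
static uint32_t sketch_failure_error(uint32_t header)
{
	return header & SKETCH_FAILURE_ERROR_MASK;
}

/* Extract the hint from a failure header (assumed layout). */
static uint32_t sketch_failure_hint(uint32_t header)
{
	return (header & SKETCH_FAILURE_HINT_MASK) >> SKETCH_FAILURE_HINT_SHIFT;
}

/*
 * Map a failure header to an errno: the "migrated" response becomes
 * -EREMCHG so the caller can requeue; everything else stays -ENXIO.
 */
static int sketch_failure_to_errno(uint32_t header)
{
	if (sketch_failure_error(header) == SKETCH_RESPONSE_VF_MIGRATED)
		return -EREMCHG;
	return -ENXIO;
}
```

Returning a distinct errno keeps the requeue decision in the caller, which can then tell "retry because of a new migration" apart from a genuine GuC failure.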


Thread overview: 15+ messages
2025-11-28 13:30 [PATCH v7 0/4] VF double migration Satyanarayana K V P
2025-11-28 13:30 ` [PATCH v7 1/4] drm/xe/vf: Enable VF migration only on supported GuC versions Satyanarayana K V P
2025-11-28 14:29   ` Michal Wajdeczko
2025-11-28 13:30 ` [PATCH v7 2/4] drm/xe/vf: Introduce RESFIX start marker support Satyanarayana K V P
2025-11-29 20:01   ` Michal Wajdeczko
2025-12-01  9:26     ` K V P, Satyanarayana
2025-11-28 13:30 ` [PATCH v7 3/4] drm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migration Satyanarayana K V P
2025-11-29 20:27   ` Michal Wajdeczko [this message]
2025-11-28 13:30 ` [PATCH v7 4/4] drm/xe/vf: Add debugfs entries to test VF double migration Satyanarayana K V P
2025-11-29 21:07   ` Michal Wajdeczko
2025-12-01  6:04   ` Adam Miszczak
2025-11-28 14:21 ` ✗ CI.checkpatch: warning for VF double migration (rev7) Patchwork
2025-11-28 14:22 ` ✓ CI.KUnit: success " Patchwork
2025-11-28 15:33 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-28 16:50 ` ✗ Xe.CI.Full: failure " Patchwork
