From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Tomasz Lis <tomasz.lis@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>,
"Piotr Piórkowski" <piotr.piorkowski@intel.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Satyanarayana K V P" <satyanarayana.k.v.p@intel.com>
Subject: Re: [PATCH v3 2/5] drm/xe/vf: Fix GuC FW check for VF migration support
Date: Thu, 16 Oct 2025 23:55:15 +0200 [thread overview]
Message-ID: <ddf9bf4a-f7f3-4a1a-908f-2b8dd6d066da@intel.com> (raw)
In-Reply-To: <20251016120511.856792-3-tomasz.lis@intel.com>
On 10/16/2025 2:05 PM, Tomasz Lis wrote:
> The check was done before GuC ABI version could be acquired.
> Comparing only to zeros provides very stable results, though
> not the ones expected.
instead of above sentence, better say that this was triggering:
<4> [174.830604] xe 0000:00:02.1: [drm] Assertion `gt->sriov.vf.guc_version.major` failed!
...
>
> This change dislodged part of the VF migration support check
> and moved it to after GuC handshake.
and describe your changes in imperative mood
[1] https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
>
> v2: Use xe_sriov_vf_ccs_migration_bb_needed()
you can keep change log under ---
>
> Tested-by: Matthew Brost <matthew.brost@intel.com>
I guess above was true for # rev1
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6349
> Fixes: be5590c384f3 ("drm/xe/vf: Enable CCS save/restore only on supported GUC versions")
> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 41 +++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_sriov_vf.h | 1 +
> drivers/gpu/drm/xe/xe_guc.c | 2 ++
> drivers/gpu/drm/xe/xe_sriov_vf.c | 10 -------
> 4 files changed, 44 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index 46518e629ba3..34c68de6e2f3 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -314,6 +314,47 @@ static int guc_action_vf_notify_resfix_done(struct xe_guc *guc)
> return ret > 0 ? -EPROTO : ret;
> }
>
> +static void vf_disable_migration(struct xe_gt *gt, const char *fmt, ...)
> +{
> + struct xe_device *xe = gt_to_xe(gt);
> + struct va_format vaf;
> + va_list va_args;
> +
> + xe_gt_assert(gt, IS_SRIOV_VF(xe));
> +
> + va_start(va_args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &va_args;
> + xe_gt_sriov_notice(gt, "migration disabled: %pV\n", &vaf);
> + va_end(va_args);
> +
> + xe->sriov.vf.migration.enabled = false;
this looks like a layer violation
and we already have a function that wraps that at the device level
maybe just promote device-level vf_disable_migration(xe,...) from xe_sriov_vf.c
and call it from this gt-level place ?
hmm, but see below [2]
> +}
> +
> +/**
> + * xe_gt_sriov_vf_guc_check_migration_support - Check for disable migration due to GuC.
> + * @gt: the &xe_gt struct instance linked to target GuC
> + *
> + * Performs late disable of VF migration feature in case GuC FW cannot support it.
> + */
> +void xe_gt_sriov_vf_guc_check_migration_support(struct xe_gt *gt)
> +{
> + struct xe_device *xe = gt_to_xe(gt);
> +
> + if (!xe_sriov_vf_migration_supported(xe))
> + return;
> +
> + if (xe_sriov_vf_ccs_migration_bb_needed(xe)) {
> + struct xe_uc_fw_version guc_version;
> +
> + xe_gt_sriov_vf_guc_versions(gt, NULL, &guc_version);
> + if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 23, 0))
> + return vf_disable_migration(gt,
> + "CCS migration requires GuC ABI >= 1.23 but only %u.%u found",
> + guc_version.major, guc_version.minor);
since we split migration checks from one place,
this CCS GuC ABI condition shall be placed in sriov_vf_ccs.c subcomponent
> + }
> +}
> +
> /**
> * vf_notify_resfix_done - Notify GuC about resource fixups apply completed.
> * @gt: the &xe_gt struct instance linked to target GuC
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> index af40276790fa..60a3b9b05b20 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> @@ -26,6 +26,7 @@ void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
> int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
> int xe_gt_sriov_vf_init(struct xe_gt *gt);
> bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
> +void xe_gt_sriov_vf_guc_check_migration_support(struct xe_gt *gt);
>
> u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
> u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index d94490979adc..3c4e64233b3a 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -713,6 +713,8 @@ static int vf_guc_init_noalloc(struct xe_guc *guc)
> if (err)
> return err;
>
> + xe_gt_sriov_vf_guc_check_migration_support(gt);
> +
[2] so this is now going through these layers:
guc_vf vf_guc_init_noalloc
gt_vf xe_gt_sriov_vf_guc_check_migration_support
xe_vf xe_sriov_vf_migration_supported
xe_vf_ccs xe_sriov_vf_ccs_migration_bb_needed
gt_vf xe_gt_sriov_vf_guc_versions
gt_vf vf_disable_migration
xe_vf xe->sriov.vf.migration.enabled
so maybe better leave this VF GuC init as-is and just make "late" checks
in xe_device_probe either in xe_sriov_init_late
xe xe_sriov_init_late
xe_vf xe_sriov_vf_init_late
xe_vf xe_sriov_vf_migration_supported
xe_vf_ccs xe_sriov_vf_ccs_init_late
xe_vf_ccs xe_sriov_vf_ccs_migration_bb_needed
gt_vf xe_gt_sriov_vf_guc_versions
xe_vf vf_disable_migration
or after for_each_gt/xe_gt_init_early loop
xe xe_device_probe
xe_vf xe_sriov_vf_check_migration
xe_vf xe_sriov_vf_migration_supported
xe_vf_ccs xe_sriov_vf_ccs_init_late
xe_vf_ccs xe_sriov_vf_ccs_migration_bb_needed
gt_vf xe_gt_sriov_vf_guc_versions
xe_vf vf_disable_migration
or just make it as part of the xe_sriov_vf_ccs_init()
since before that point CCS migration is not working either
> err = xe_gt_sriov_vf_query_config(gt);
> if (err)
> return err;
> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
> index 911d5720917b..5fb042c05112 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
> @@ -163,16 +163,6 @@ static void vf_migration_init_early(struct xe_device *xe)
> return vf_disable_migration(xe, "requires gfx version >= 20, but only %u found",
> GRAPHICS_VER(xe));
>
> - if (!IS_DGFX(xe)) {
> - struct xe_uc_fw_version guc_version;
> -
> - xe_gt_sriov_vf_guc_versions(xe_device_get_gt(xe, 0), NULL, &guc_version);
> - if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 23, 0))
> - return vf_disable_migration(xe,
> - "CCS migration requires GuC ABI >= 1.23 but only %u.%u found",
> - guc_version.major, guc_version.minor);
> - }
> -
> xe->sriov.vf.migration.enabled = true;
> xe_sriov_dbg(xe, "migration support enabled\n");
this would be non-reliable, as we might still disable migration later on
so we should either remove it completely (assuming its "enabled" until explicitly disabled)
or reverse the logic and use this flag instead:
xe->sriov.vf.migration.disabled
> }
next prev parent reply other threads:[~2025-10-16 21:55 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 12:05 [PATCH v3 0/5] drm/xe/vf: Minor fixes to post-migration recovery Tomasz Lis
2025-10-16 12:05 ` [PATCH v3 1/5] drm/xe/vf: Helper for telling whether CCS migration BBs are needed Tomasz Lis
2025-10-16 19:57 ` Michal Wajdeczko
2025-10-16 21:51 ` Lis, Tomasz
2025-10-16 12:05 ` [PATCH v3 2/5] drm/xe/vf: Fix GuC FW check for VF migration support Tomasz Lis
2025-10-16 21:55 ` Michal Wajdeczko [this message]
2025-10-17 15:31 ` Lis, Tomasz
2025-10-17 15:40 ` Michal Wajdeczko
2025-10-16 12:05 ` [PATCH v3 3/5] drm/xe/vf: Skip fixups on VF migration before getting GGTT info Tomasz Lis
2025-10-16 22:20 ` Michal Wajdeczko
2025-10-20 19:30 ` Lis, Tomasz
2025-10-16 12:05 ` [PATCH v3 4/5] drm/xe: Assert that VF will never use fixed placement of BOs Tomasz Lis
2025-10-16 12:05 ` [PATCH v3 5/5] drm/xe/vf: Do not disable VF migration on ATS-M Tomasz Lis
2025-10-16 22:25 ` Michal Wajdeczko
2025-10-20 19:22 ` Lis, Tomasz
2025-10-16 12:11 ` ✓ CI.KUnit: success for drm/xe/vf: Minor fixes to post-migration recovery (rev3) Patchwork
2025-10-16 12:57 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-17 8:23 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ddf9bf4a-f7f3-4a1a-908f-2b8dd6d066da@intel.com \
--to=michal.wajdeczko@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=michal.winiarski@intel.com \
--cc=piotr.piorkowski@intel.com \
--cc=satyanarayana.k.v.p@intel.com \
--cc=tomasz.lis@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox