From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
<intel-xe@lists.freedesktop.org>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>,
"Piotr Piórkowski" <piotr.piorkowski@intel.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Satyanarayana K V P" <satyanarayana.k.v.p@intel.com>
Subject: Re: [PATCH v3 2/5] drm/xe/vf: Fix GuC FW check for VF migration support
Date: Fri, 17 Oct 2025 17:31:21 +0200 [thread overview]
Message-ID: <a705eb88-0795-41b8-becb-cc41e2479785@intel.com> (raw)
In-Reply-To: <ddf9bf4a-f7f3-4a1a-908f-2b8dd6d066da@intel.com>
[-- Attachment #1: Type: text/plain, Size: 8513 bytes --]
On 10/16/2025 11:55 PM, Michal Wajdeczko wrote:
>
> On 10/16/2025 2:05 PM, Tomasz Lis wrote:
>> The check was done before GuC ABI version could be acquired.
>> Comparing only to zeros provides very stable results, though
>> not the ones expected.
> instead of above sentence, better say that this was triggering:
>
> <4> [174.830604] xe 0000:00:02.1: [drm] Assertion `gt->sriov.vf.guc_version.major` failed!
> ...
ok.
>
>
>> This change dislodged part of the VF migration support check
>> and moved it to after GuC handshake.
> and describe your changes in imperative mood
>
> [1]https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
will do.
>
>> v2: Use xe_sriov_vf_ccs_migration_bb_needed()
> you can keep change log under ---
ack.
>
>> Tested-by: Matthew Brost<matthew.brost@intel.com>
> I guess above was true for # rev1
ack.
>
>> Closes:https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6349
>> Fixes: be5590c384f3 ("drm/xe/vf: Enable CCS save/restore only on supported GUC versions")
>> Signed-off-by: Tomasz Lis<tomasz.lis@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 41 +++++++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_gt_sriov_vf.h | 1 +
>> drivers/gpu/drm/xe/xe_guc.c | 2 ++
>> drivers/gpu/drm/xe/xe_sriov_vf.c | 10 -------
>> 4 files changed, 44 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> index 46518e629ba3..34c68de6e2f3 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
>> @@ -314,6 +314,47 @@ static int guc_action_vf_notify_resfix_done(struct xe_guc *guc)
>> return ret > 0 ? -EPROTO : ret;
>> }
>>
>> +static void vf_disable_migration(struct xe_gt *gt, const char *fmt, ...)
>> +{
>> + struct xe_device *xe = gt_to_xe(gt);
>> + struct va_format vaf;
>> + va_list va_args;
>> +
>> + xe_gt_assert(gt, IS_SRIOV_VF(xe));
>> +
>> + va_start(va_args, fmt);
>> + vaf.fmt = fmt;
>> + vaf.va = &va_args;
>> + xe_gt_sriov_notice(gt, "migration disabled: %pV\n", &vaf);
>> + va_end(va_args);
>> +
>> + xe->sriov.vf.migration.enabled = false;
> this looks like a layer violation
>
> and we already have a function that wraps that at the device level
>
> maybe just promote device-level vf_disable_migration(xe,...) from xe_sriov_vf.c
> and call it from this gt-level place ?
>
> hmm, but see below [2]
>
>> +}
>> +
>> +/**
>> + * xe_gt_sriov_vf_guc_check_migration_support - Check for disable migration due to GuC.
>> + * @gt: the &xe_gt struct instance linked to target GuC
>> + *
>> + * Performs late disable of VF migration feature in case GuC FW cannot support it.
>> + */
>> +void xe_gt_sriov_vf_guc_check_migration_support(struct xe_gt *gt)
>> +{
>> + struct xe_device *xe = gt_to_xe(gt);
>> +
>> + if (!xe_sriov_vf_migration_supported(xe))
>> + return;
>> +
>> + if (xe_sriov_vf_ccs_migration_bb_needed(xe)) {
>> + struct xe_uc_fw_version guc_version;
>> +
>> + xe_gt_sriov_vf_guc_versions(gt, NULL, &guc_version);
>> + if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 23, 0))
>> + return vf_disable_migration(gt,
>> + "CCS migration requires GuC ABI >= 1.23 but only %u.%u found",
>> + guc_version.major, guc_version.minor);
> since we split migration checks from one place,
> this CCS GuC ABI condition shall be placed in sriov_vf_ccs.c subcomponent
will move.
>
>> + }
>> +}
>> +
>> /**
>> * vf_notify_resfix_done - Notify GuC about resource fixups apply completed.
>> * @gt: the &xe_gt struct instance linked to target GuC
>> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> index af40276790fa..60a3b9b05b20 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
>> @@ -26,6 +26,7 @@ void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
>> int xe_gt_sriov_vf_init_early(struct xe_gt *gt);
>> int xe_gt_sriov_vf_init(struct xe_gt *gt);
>> bool xe_gt_sriov_vf_recovery_pending(struct xe_gt *gt);
>> +void xe_gt_sriov_vf_guc_check_migration_support(struct xe_gt *gt);
>>
>> u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
>> u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index d94490979adc..3c4e64233b3a 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -713,6 +713,8 @@ static int vf_guc_init_noalloc(struct xe_guc *guc)
>> if (err)
>> return err;
>>
>> + xe_gt_sriov_vf_guc_check_migration_support(gt);
>> +
> [2] so this is now going through these layers:
>
> guc_vf vf_guc_init_noalloc
> gt_vf xe_gt_sriov_vf_guc_check_migration_support
> xe_vf xe_sriov_vf_migration_supported
> xe_vf_ccs xe_sriov_vf_ccs_migration_bb_needed
> gt_vf xe_gt_sriov_vf_guc_versions
> gt_vf vf_disable_migration
> xe_vf xe->sriov.vf.migration.enabled
>
> so maybe better leave this VF GuC init as-is and just make "late" checks
> in xe_device_probe either in xe_sriov_init_late
>
> xe xe_sriov_init_late
> xe_vf xe_sriov_vf_init_late
> xe_vf xe_sriov_vf_migration_supported
> xe_vf_ccs xe_sriov_vf_ccs_init_late
> xe_vf_ccs xe_sriov_vf_ccs_migration_bb_needed
> gt_vf xe_gt_sriov_vf_guc_versions
> xe_vf vf_disable_migration
>
> or after for_each_gt/xe_gt_init_early loop
>
> xe xe_device_probe
> xe_vf xe_sriov_vf_check_migration
> xe_vf xe_sriov_vf_migration_supported
> xe_vf_ccs xe_sriov_vf_ccs_init_late
> xe_vf_ccs xe_sriov_vf_ccs_migration_bb_needed
> gt_vf xe_gt_sriov_vf_guc_versions
> xe_vf vf_disable_migration
>
> or just make it as part of the xe_sriov_vf_ccs_init()
> since before that point CCS migration is not working either
The check can be done later than where I put it; but it needs to be
before IRQs are enabled. Both kinds of these are enabled in `xe_gt_init()`:
MEMIRQs in
gt_init_with_gt_forcewake->xe_uc_init->xe_guc_enable_communication
MMIO IRQs in gt_init_with_gt_forcewake->xe_irq_enable_hwe
The `xe_sriov_vf_ccs_init` is currently called after `xe_gt_init`. I'm
not completely sure if this is correct placement .. it might be, if
there are no CCS metadata set at that point.
GuC should have no problem assisting to migration without CCS metadata
transfer, so that placement could be ok.
But that's definitely too late for figuring out whether we support
migration at all.
So this eliminates the `xe_sriov_vf_init_late` and
`xe_sriov_vf_ccs_init` options.
For the 2nd option - this can be done. But does it really make sense to
put a single-platform workaround check directly in `xe_device_probe`?
I can do this - if you consider this option acceptable. Though
personally I see no reason for trying to rip it out of
`xe_gt_init_early`. It can be easily turned from per-gt check to single
GT check (by checking only the primary GuC which will actually be
responsible for scheduling the CCS save/restore BB execution), but that
gives an option rather than a reason.
Any place between `xe_gt_init_early()` and `xe_gt_init()` is ok for me.
>
>> err = xe_gt_sriov_vf_query_config(gt);
>> if (err)
>> return err;
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index 911d5720917b..5fb042c05112 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -163,16 +163,6 @@ static void vf_migration_init_early(struct xe_device *xe)
>> return vf_disable_migration(xe, "requires gfx version >= 20, but only %u found",
>> GRAPHICS_VER(xe));
>>
>> - if (!IS_DGFX(xe)) {
>> - struct xe_uc_fw_version guc_version;
>> -
>> - xe_gt_sriov_vf_guc_versions(xe_device_get_gt(xe, 0), NULL, &guc_version);
>> - if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 23, 0))
>> - return vf_disable_migration(xe,
>> - "CCS migration requires GuC ABI >= 1.23 but only %u.%u found",
>> - guc_version.major, guc_version.minor);
>> - }
>> -
>> xe->sriov.vf.migration.enabled = true;
>> xe_sriov_dbg(xe, "migration support enabled\n");
> this would be non-reliable, as we might still disable migration later on
>
> so we should either remove it completely (assuming its "enabled" until explicitly disabled)
> or reverse the logic and use this flag instead:
>
> xe->sriov.vf.migration.disabled
will remove, it doesn't make sense here.
-Tomasz
>
>> }
[-- Attachment #2: Type: text/html, Size: 12093 bytes --]
next prev parent reply other threads:[~2025-10-17 15:31 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 12:05 [PATCH v3 0/5] drm/xe/vf: Minor fixes to post-migration recovery Tomasz Lis
2025-10-16 12:05 ` [PATCH v3 1/5] drm/xe/vf: Helper for telling whether CCS migration BBs are needed Tomasz Lis
2025-10-16 19:57 ` Michal Wajdeczko
2025-10-16 21:51 ` Lis, Tomasz
2025-10-16 12:05 ` [PATCH v3 2/5] drm/xe/vf: Fix GuC FW check for VF migration support Tomasz Lis
2025-10-16 21:55 ` Michal Wajdeczko
2025-10-17 15:31 ` Lis, Tomasz [this message]
2025-10-17 15:40 ` Michal Wajdeczko
2025-10-16 12:05 ` [PATCH v3 3/5] drm/xe/vf: Skip fixups on VF migration before getting GGTT info Tomasz Lis
2025-10-16 22:20 ` Michal Wajdeczko
2025-10-20 19:30 ` Lis, Tomasz
2025-10-16 12:05 ` [PATCH v3 4/5] drm/xe: Assert that VF will never use fixed placement of BOs Tomasz Lis
2025-10-16 12:05 ` [PATCH v3 5/5] drm/xe/vf: Do not disable VF migration on ATS-M Tomasz Lis
2025-10-16 22:25 ` Michal Wajdeczko
2025-10-20 19:22 ` Lis, Tomasz
2025-10-16 12:11 ` ✓ CI.KUnit: success for drm/xe/vf: Minor fixes to post-migration recovery (rev3) Patchwork
2025-10-16 12:57 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-17 8:23 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a705eb88-0795-41b8-becb-cc41e2479785@intel.com \
--to=tomasz.lis@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=michal.winiarski@intel.com \
--cc=piotr.piorkowski@intel.com \
--cc=satyanarayana.k.v.p@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox