Intel-XE Archive on lore.kernel.org
From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>,
	Lucas De Marchi <lucas.demarchi@intel.com>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>,
	"Piotr Piórkowski" <piotr.piorkowski@intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Satyanarayana K V P" <satyanarayana.k.v.p@intel.com>
Subject: Re: [PATCH v4 2/4] drm/xe/vf: Fix GuC FW check for VF migration support
Date: Tue, 21 Oct 2025 01:46:57 +0200
Message-ID: <3784d12b-26ea-4554-ad48-f90885fca2d3@intel.com>
In-Reply-To: <05e13fdb-cad6-43da-95e2-174ccc919833@intel.com>


On 10/21/2025 12:17 AM, Michal Wajdeczko wrote:
>
> On 10/20/2025 10:58 PM, Tomasz Lis wrote:
>> The check whether GuC ABI version meets requirements shall be
>> performed after said version is received from GuC.
>>
>> Doing it in wrong order was triggering a warning:
>> xe 0000:00:02.1: [drm] Assertion `gt->sriov.vf.guc_version.major` failed!
>>
>> With this change, part of the VF migration support check is split
>> off and moved to after the GuC handshake.
>>
>> v2: Use xe_sriov_vf_ccs_migration_bb_needed()
>>
>> v3: Update commit message, move check funct to ccs module (Michal),
>>   rename xe_sriov_vf_migration_disable(), remove its duplicate
>>
>> Tested-by: Matthew Brost <matthew.brost@intel.com> #v1
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6349
>> Fixes: be5590c384f3 ("drm/xe/vf: Enable CCS save/restore only on supported GUC versions")
> hmm, but this was likely working fine until this:
>
> commit ff1d2b5e3d28a62e79c89d2b2ab28ef5eaab84d8
> Author: Matt Roper <matthew.d.roper@intel.com>
> Date:   Mon Oct 13 13:09:50 2025 -0700
>
>      drm/xe: Read VF GMD_ID with a specifically-allocated dummy GT
>
> so maybe Fixes: shall point that commit instead?
>
> @Lucas ?
Will accept whatever you say is right. My understanding is that the
Fixes: tag should name a commit, so I filled it with the commit which
introduced the code. The comparison never worked, though maybe before
Matt's commit that fact was concealed. Ever since the commit I pointed
to, we have not been comparing against a version received from GuC.
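
A minimal standalone sketch of why the ordering matters, assuming the
version reads as all zeroes until the GuC handshake completes.
struct fw_version, pack_version() and ccs_migration_version_ok() are
hypothetical stand-ins, not the driver's xe_uc_fw_version or
MAKE_GUC_VER(), whose layout may differ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's firmware version struct. */
struct fw_version {
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/* Hypothetical packing helper; the real macro may use another layout. */
static uint32_t pack_version(uint32_t major, uint32_t minor, uint32_t patch)
{
	return (major << 16) | (minor << 8) | patch;
}

/* True when the version meets the 1.23.0 minimum quoted in the patch. */
static bool ccs_migration_version_ok(const struct fw_version *v)
{
	assert(v != NULL);
	return pack_version(v->major, v->minor, v->patch) >=
	       pack_version(1, 23, 0);
}
```

With a zeroed struct the check fails no matter which firmware is
loaded, so the comparison only means something once the handshake has
filled in the version.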
>
>> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_sriov_vf.c     | 24 ++++++++----------------
>>   drivers/gpu/drm/xe/xe_sriov_vf.h     |  1 +
>>   drivers/gpu/drm/xe/xe_sriov_vf_ccs.c | 25 +++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_sriov_vf_ccs.h |  1 +
>>   4 files changed, 35 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index 911d5720917b..3a3cd9c35aa8 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -133,7 +133,7 @@ bool xe_sriov_vf_migration_supported(struct xe_device *xe)
>>   	return xe->sriov.vf.migration.enabled;
>>   }
>>   
>> -static void vf_disable_migration(struct xe_device *xe, const char *fmt, ...)
>> +void xe_sriov_vf_migration_disable(struct xe_device *xe, const char *fmt, ...)
>>   {
>>   	struct va_format vaf;
>>   	va_list va_args;
>> @@ -156,25 +156,15 @@ static void vf_migration_init_early(struct xe_device *xe)
>>   	 * supported at production quality.
>>   	 */
>>   	if (!IS_ENABLED(CONFIG_DRM_XE_DEBUG))
>> -		return vf_disable_migration(xe,
>> -					    "experimental feature not available on production builds");
>> +		return xe_sriov_vf_migration_disable(xe,
>> +				"experimental feature not available on production builds");
>>   
>>   	if (GRAPHICS_VER(xe) < 20)
>> -		return vf_disable_migration(xe, "requires gfx version >= 20, but only %u found",
>> -					    GRAPHICS_VER(xe));
>> -
>> -	if (!IS_DGFX(xe)) {
>> -		struct xe_uc_fw_version guc_version;
>> -
>> -		xe_gt_sriov_vf_guc_versions(xe_device_get_gt(xe, 0), NULL, &guc_version);
>> -		if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 23, 0))
>> -			return vf_disable_migration(xe,
>> -						    "CCS migration requires GuC ABI >= 1.23 but only %u.%u found",
>> -						    guc_version.major, guc_version.minor);
>> -	}
>> +		return xe_sriov_vf_migration_disable(xe,
>> +				"requires gfx version >= 20, but only %u found",
>> +				GRAPHICS_VER(xe));
>>   
>>   	xe->sriov.vf.migration.enabled = true;
> as said earlier, we should change the logic to
>
>   	xe->sriov.vf.migration.disabled = true;
>
> and set this in xe_sriov_vf_migration_disable()
Not sure where this is going, but I will invert it.
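
One possible reading of the inverted logic, as a standalone sketch
rather than driver code. struct vf_migration, migration_disable() and
migration_supported() are hypothetical names, and a plain reason string
stands in for the printf-style arguments of the real function:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state: zero-initialized means "not disabled". */
struct vf_migration {
	bool disabled;
};

/* The only writer of the flag; logs the reason the first time. */
static void migration_disable(struct vf_migration *m, const char *reason)
{
	if (!m->disabled)
		fprintf(stderr, "migration disabled: %s\n", reason);
	m->disabled = true;
}

/* Support is simply the absence of any disable request. */
static bool migration_supported(const struct vf_migration *m)
{
	return !m->disabled;
}
```

With this shape, an early check and a late check can both call the
disable helper, and no code path has to set an "enabled" flag after all
checks have passed.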
>> -	xe_sriov_dbg(xe, "migration support enabled\n");
>>   }
>>   
>>   /**
>> @@ -198,6 +188,8 @@ int xe_sriov_vf_init_late(struct xe_device *xe)
>>   {
>>   	int err = 0;
>>   
>> +	xe_sriov_vf_migration_ccs_bb_support_check(xe);
> why not move that to xe_sriov_vf_ccs_init() called just below?
>
> then you will not have to export
>
> 	xe_sriov_vf_migration_ccs_bb_support_check
> nor
> 	xe_sriov_vf_migration_ccs_bb_needed
>
> as everything will be in xe_sriov_vf_ccs.c

right, at this point it makes sense.
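
A rough standalone sketch of that layout, with the check folded into
the init function so both helpers can stay static in one file. All
names and fields here (struct dev, ccs_bb_needed(), ccs_support_check(),
ccs_init()) are hypothetical, and the return-0-when-disabled behaviour
is an assumption, not something decided in this thread:

```c
#include <stdbool.h>

/* Hypothetical device state, reduced to what the sketch needs. */
struct dev {
	bool migration_disabled;
	bool needs_ccs_bb;
	unsigned int guc_major;
	unsigned int guc_minor;
	bool ccs_ready;
};

/* File-local: no header export needed once the caller is in this file. */
static bool ccs_bb_needed(const struct dev *d)
{
	return d->needs_ccs_bb;
}

/* File-local late check, run after the firmware version is known. */
static void ccs_support_check(struct dev *d)
{
	if (d->migration_disabled || !ccs_bb_needed(d))
		return;

	/* Require version 1.23 or newer, as in the quoted patch. */
	if (d->guc_major < 1 || (d->guc_major == 1 && d->guc_minor < 23))
		d->migration_disabled = true;
}

/* The only exported entry point in this sketch. */
int ccs_init(struct dev *d)
{
	ccs_support_check(d);
	if (d->migration_disabled)
		return 0;	/* nothing to set up, not an error */

	d->ccs_ready = true;
	return 0;
}
```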

-Tomasz

>> +
>>   	if (xe_sriov_vf_migration_supported(xe))
>>   		err = xe_sriov_vf_ccs_init(xe);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.h b/drivers/gpu/drm/xe/xe_sriov_vf.h
>> index 4df95266b261..e967d4166a43 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.h
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.h
>> @@ -14,6 +14,7 @@ struct xe_device;
>>   void xe_sriov_vf_init_early(struct xe_device *xe);
>>   int xe_sriov_vf_init_late(struct xe_device *xe);
>>   bool xe_sriov_vf_migration_supported(struct xe_device *xe);
>> +void xe_sriov_vf_migration_disable(struct xe_device *xe, const char *fmt, ...);
>>   void xe_sriov_vf_debugfs_register(struct xe_device *xe, struct dentry *root);
>>   
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
>> index a2d61b37ff21..02d0fcd26399 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
>> @@ -10,6 +10,8 @@
>>   #include "xe_device.h"
>>   #include "xe_exec_queue.h"
>>   #include "xe_exec_queue_types.h"
>> +#include "xe_gt_sriov_vf.h"
>> +#include "xe_guc.h"
>>   #include "xe_guc_submit.h"
>>   #include "xe_lrc.h"
>>   #include "xe_migrate.h"
>> @@ -275,6 +277,29 @@ bool xe_sriov_vf_migration_ccs_bb_needed(struct xe_device *xe)
>>   	return !IS_DGFX(xe) && xe_device_has_flat_ccs(xe);
>>   }
>>   
>> +/**
>> + * xe_sriov_vf_migration_ccs_bb_support_check - Check for disable migration due to FW version.
>> + * @xe: the &xe_device instance.
>> + *
>> + * Performs late disable of VF migration feature in case GuC FW cannot support it.
>> + */
>> +void xe_sriov_vf_migration_ccs_bb_support_check(struct xe_device *xe)
>> +{
>> +	if (!xe_sriov_vf_migration_supported(xe))
>> +		return;
>> +
>> +	if (xe_sriov_vf_migration_ccs_bb_needed(xe)) {
>> +		struct xe_gt *gt = xe_device_get_gt(xe, 0);
>> +		struct xe_uc_fw_version guc_version;
>> +
>> +		xe_gt_sriov_vf_guc_versions(gt, NULL, &guc_version);
>> +		if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 23, 0))
>> +			return xe_sriov_vf_migration_disable(xe,
>> +				"CCS migration requires GuC ABI >= 1.23 but only %u.%u found",
>> +				guc_version.major, guc_version.minor);
>> +	}
>> +}
>> +
>>   static void xe_sriov_vf_ccs_fini(void *arg)
>>   {
>>   	struct xe_sriov_vf_ccs_ctx *ctx = arg;
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.h b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.h
>> index 0e6b27016dac..2844628269d1 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.h
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.h
>> @@ -15,6 +15,7 @@ struct xe_device;
>>   struct xe_bo;
>>   
>>   bool xe_sriov_vf_migration_ccs_bb_needed(struct xe_device *xe);
>> +void xe_sriov_vf_migration_ccs_bb_support_check(struct xe_device *xe);
>>   
>>   int xe_sriov_vf_ccs_init(struct xe_device *xe);
>>   int xe_sriov_vf_ccs_attach_bo(struct xe_bo *bo);

Thread overview: 15+ messages
2025-10-20 20:58 [PATCH v4 0/4] drm/xe/vf: Minor fixes to post-migration recovery Tomasz Lis
2025-10-20 20:58 ` [PATCH v4 1/4] drm/xe/vf: Helper for telling whether CCS migration BBs are needed Tomasz Lis
2025-10-20 20:58 ` [PATCH v4 2/4] drm/xe/vf: Fix GuC FW check for VF migration support Tomasz Lis
2025-10-20 22:17   ` Michal Wajdeczko
2025-10-20 23:46     ` Lis, Tomasz [this message]
2025-10-20 20:58 ` [PATCH v4 3/4] drm/xe: Assert that VF will never use fixed placement of BOs Tomasz Lis
2025-10-20 21:59   ` Michal Wajdeczko
2025-10-20 22:48     ` Lis, Tomasz
2025-10-21 15:04       ` Michal Wajdeczko
2025-10-21 17:20         ` Lis, Tomasz
2025-10-20 20:58 ` [PATCH v4 4/4] drm/xe/vf: Do not disable VF migration on ATS-M Tomasz Lis
2025-10-20 21:53   ` Michal Wajdeczko
2025-10-21 10:51 ` ✓ CI.KUnit: success for drm/xe/vf: Minor fixes to post-migration recovery (rev4) Patchwork
2025-10-21 12:41 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-21 13:38 ` ✓ Xe.CI.Full: " Patchwork
