Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>,
	"Piotr Piórkowski" <piotr.piorkowski@intel.com>,
	"Satyanarayana K V P" <satyanarayana.k.v.p@intel.com>
Subject: Re: [PATCH v1] drm/xe/vf: Fail migration recovery if fixups needed but platform not supported
Date: Wed, 14 May 2025 01:16:04 +0200	[thread overview]
Message-ID: <de2c653a-5153-4432-a177-960bcd731d83@intel.com> (raw)
In-Reply-To: <454bf5c7-1d8e-42ba-9443-e7d7f61cc421@intel.com>


On 13.05.2025 13:21, Michal Wajdeczko wrote:
>
> On 13.05.2025 01:06, Tomasz Lis wrote:
>> The post-migration recovery needs to be fully implemented for a
>> specific platform in order to make continuation of workloads
>> possible.
>>
>> New platforms introduce changes which affect the recovery procedure,
>> and without a clear verification of support this leads to errors
>> with no straight forward error message explaining the cause.
>>
>> This patch fixes that issue - it introduces a message to be logged
>> when the current driver is known to not support the current platform.
>>
>> Wedging the driver immediately also decreases the amount of
>> additional errors which would come afterwards if the driver continued
>> operation.
>>
>> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_sriov_vf.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index 2674fa948fda..f21f98f5d25f 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -224,6 +224,11 @@ static void vf_post_migration_notify_resfix_done(struct xe_device *xe)
>>   	drm_dbg(&xe->drm, "another recovery imminent, skipping notifications\n");
>>   }
>>   
>> +static bool fixups_supported(struct xe_device *xe)
>> +{
> can we have some TODO comment here explaining what conditions we expect
> to be added here? or maybe we can start with CONFIG_XE_DEBUG to indicate
> early development phase?
do you mean `return IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV);`?
>> +	return false;
>> +}
>> +
>>   static void vf_post_migration_recovery(struct xe_device *xe)
>>   {
>>   	bool need_fixups;
>> @@ -243,6 +248,11 @@ static void vf_post_migration_recovery(struct xe_device *xe)
>>   		vf_post_migration_fixup_ctb(xe);
>>   
>>   	vf_post_migration_notify_resfix_done(xe);
>> +	if (need_fixups && !fixups_supported(xe)) {
>> +		drm_err(&xe->drm, "migration recovery not supported by this module version\n");
> we already have drm_err in the fail: section, do we need this extra one?
The idea behind introducing this error is to inform the user 
specifically that the module version used has no support of VF 
migration. We do need it, it's the point.
> if yes, can we make the message more specific
What specific information do you think should be conveyed there?
>   (and maybe the reason
> should be printed in fixups_supported() as for now it's all magic)

I was planning that function as a conditions check function, not 
decision function. Such functions typically just do the checks and 
return the result.

What's the concept you have?

> also, since support likely will not change between one migration and the
> other, maybe it should be just a single drm_info() message printed
> during a VF boot that any later migration will fail, without waiting
> until the first migration happen to surprise the user

The idea is to make sure user sees the message after fail - and he will 
first search for error message.

We've settled on providing the information when the issue happened; we 
can re-evaluate that if you think it's not the right call.

>> +		err = -ENOTRECOVERABLE;
>> +		goto fail;
>> +	}
> hmm, and this whole chunk seems to be placed in a wrong place - if
> fixups are not supported, why did we attempt to fixup CTB few lines
> above

My idea was to allow testing the existing recovery even if it's incomplete.

But it doesn't matter to me - can change if no fixups at all is better 
for whatever reason.

>   and claim that fixups are done? can you please explain

we never claim that to the user. We only take GuC out of RESFIX state.

As stated above, I have no problem with shifting the verification to 
earlier, it does not matter.

-Tomasz

>>   	xe_pm_runtime_put(xe);
>>   	drm_notice(&xe->drm, "migration recovery ended\n");
>>   	return;

  reply	other threads:[~2025-05-13 23:16 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 23:06 [PATCH v1] drm/xe/vf: Fail migration recovery if fixups needed but platform not supported Tomasz Lis
2025-05-12 23:12 ` ✓ CI.Patch_applied: success for " Patchwork
2025-05-12 23:12 ` ✓ CI.checkpatch: " Patchwork
2025-05-12 23:13 ` ✓ CI.KUnit: " Patchwork
2025-05-12 23:23 ` ✗ CI.Build: failure " Patchwork
2025-05-13 11:21 ` [PATCH v1] " Michal Wajdeczko
2025-05-13 23:16   ` Lis, Tomasz [this message]
2025-05-13 14:40 ` ✓ CI.Patch_applied: success for " Patchwork
2025-05-13 14:40 ` ✓ CI.checkpatch: " Patchwork
2025-05-13 14:42 ` ✓ CI.KUnit: " Patchwork
2025-05-13 14:52 ` ✓ CI.Build: " Patchwork
2025-05-14  5:28 ` ✓ CI.Patch_applied: success for drm/xe/vf: Fail migration recovery if fixups needed but platform not supported (rev2) Patchwork
2025-05-14  5:28 ` ✓ CI.checkpatch: " Patchwork
2025-05-14  5:30 ` ✓ CI.KUnit: " Patchwork
2025-05-14  5:40 ` ✓ CI.Build: " Patchwork
2025-05-14  5:42 ` ✓ CI.Hooks: " Patchwork
2025-05-14  5:44 ` ✓ CI.checksparse: " Patchwork
2025-05-14  6:07 ` ✓ Xe.CI.BAT: " Patchwork
2025-05-14  8:01 ` ✗ Xe.CI.Full: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de2c653a-5153-4432-a177-960bcd731d83@intel.com \
    --to=tomasz.lis@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=michal.wajdeczko@intel.com \
    --cc=michal.winiarski@intel.com \
    --cc=piotr.piorkowski@intel.com \
    --cc=satyanarayana.k.v.p@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox