Intel-XE Archive on lore.kernel.org
From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>
Subject: Re: [PATCH 3/4] drm/xe/vf: Start post-migration fixups with GuC MMIO handshake
Date: Mon, 23 Sep 2024 23:11:18 +0200
Message-ID: <c164b4cc-952f-4c43-8d21-10da823ca64d@intel.com>
In-Reply-To: <b015dd94-4b39-4f9a-ad1e-8ec7b7cfe346@intel.com>


On 23.09.2024 14:02, Michal Wajdeczko wrote:
>
> On 21.09.2024 00:29, Tomasz Lis wrote:
>> During post-migration recovery, only MMIO communication to GuC is
>> allowed. But that communication requires initialization.
> shouldn't this be patch 2/4 ie. before actually trying to send anything

GuC is obligated to accept RESFIX_DONE or RESET while in the RESFIX
state, regardless of whether the KMD says "Hi" first.

This means the patches are in the correct order.
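
To illustrate the rule, here is a toy userspace model. It is only a
sketch: the enum, struct and function names are made up for this mail
and are not GuC firmware or xe driver code. Rejecting RESFIX_DONE
outside the RESFIX state, and accepting RESET in every state, are
assumptions of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: a toy model of the rule above, not GuC or xe code. */
enum vf_state { VF_RESFIX, VF_RUNNING, VF_RESET_DONE };
enum vf_msg { MSG_HANDSHAKE, MSG_RESFIX_DONE, MSG_RESET };

struct guc_model {
	enum vf_state state;
	bool handshake_seen;	/* recorded, but never checked for acceptance */
};

/* Returns true if the modelled GuC accepts the message in its current state. */
static bool guc_model_send(struct guc_model *guc, enum vf_msg msg)
{
	assert(guc);

	switch (msg) {
	case MSG_HANDSHAKE:
		guc->handshake_seen = true;
		return true;
	case MSG_RESFIX_DONE:
		/* Assumption: only meaningful while fixups are pending. */
		if (guc->state != VF_RESFIX)
			return false;
		guc->state = VF_RUNNING;
		return true;
	case MSG_RESET:
		/* Assumption: the model accepts a reset in any state. */
		guc->state = VF_RESET_DONE;
		guc->handshake_seen = false;
		return true;
	}
	return false;
}
```

Note that neither the RESFIX_DONE nor the RESET branch looks at
handshake_seen, which is the whole point: in this model the handshake
is not a precondition for leaving the RESFIX state.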

>
>> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_sriov_vf.c | 40 ++++++++++++++++++++++++++++++++
>>   1 file changed, 40 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index 459fa936aaba..3cea2d21525f 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -22,6 +22,36 @@ void xe_sriov_vf_init_early(struct xe_device *xe)
>>   	INIT_WORK(&xe->sriov.vf.migration_worker, migration_worker_func);
>>   }
>>   
>> +/**
>> + * vf_post_migration_reinit_guc - Re-initialize GuC communication.
>> + * @xe: the &xe_device struct instance
>> + *
>> + * After migration, we need to reestablish communication with GuC and
>> + * re-query all VF configuration to make sure they match previous
>> + * provisioning. Note that most of VF provisioning shall be the same,
>> + * except GGTT range, since GGTT is not virtualized per-VF.
>> + *
>> + * Returns: 0 if the operation completed successfully, or a negative error
> correct tag is "Return:" see [1]
>
> [1] https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
ack
>
>> + * code otherwise.
>> + */
>> +static int vf_post_migration_reinit_guc(struct xe_device *xe)
>> +{
>> +	struct xe_gt *gt;
>> +	unsigned int id;
>> +	int err, ret;
>> +
>> +	err = 0;
>> +	xe_pm_runtime_get(xe);
> again, maybe PM can be done once in vf_post_migration_recovery()

We could, but we do not need to hold the PM reference during fixups.
From what I understand, the finer the granularity, the better.
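
A rough userspace sketch of the difference, assuming a plain reference
count stands in for runtime PM and that the fixup steps do not touch
the hardware. The names pm_model, pm_get, pm_put and do_step are
invented for this example and are not the xe_pm API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a runtime-PM reference count; not xe_pm. */
struct pm_model {
	int refcount;		/* device may suspend only when this is 0 */
	int steps_awake;	/* steps executed while a reference was held */
};

static void pm_get(struct pm_model *pm)
{
	pm->refcount++;
}

static void pm_put(struct pm_model *pm)
{
	assert(pm->refcount > 0);
	pm->refcount--;
}

/* One unit of recovery work; needs_hw marks steps that touch the device. */
static void do_step(struct pm_model *pm, bool needs_hw)
{
	if (needs_hw)
		assert(pm->refcount > 0);
	if (pm->refcount > 0)
		pm->steps_awake++;
}

/* Fine-grained: only the handshake step holds the reference. */
static int recovery_fine(struct pm_model *pm)
{
	pm_get(pm);
	do_step(pm, true);	/* handshake */
	pm_put(pm);
	do_step(pm, false);	/* fixups (assumed not to need the device) */
	do_step(pm, false);
	return pm->steps_awake;
}

/* Coarse-grained: one reference held across the whole recovery. */
static int recovery_coarse(struct pm_model *pm)
{
	pm_get(pm);
	do_step(pm, true);
	do_step(pm, false);
	do_step(pm, false);
	pm_put(pm);
	return pm->steps_awake;
}
```

In this model the fine-grained variant keeps the device awake for one
step and the coarse-grained one for all three, while both end with the
reference count back at zero.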

>
>> +	for_each_gt(gt, xe, id) {
>> +		ret = xe_gt_sriov_vf_bootstrap(gt);
>> +		if (!err)
>> +			err = ret;
>> +	}
>> +	xe_pm_runtime_put(xe);
>> +
>> +	return err;
> do we care about sending a reset to those GuCs that successfully
> completed handshake or we assume that going wedge is sufficient?

On wedge, all we should do is silence the hardware, and hardware left
in the RESFIX state is already silent.

We can't guarantee that GuCs won't be in RESFIX state anyway, because
another migration might happen after the wedging.
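
For reference, the control flow being discussed, reduced to a
standalone sketch: every GT is attempted, the first error is kept, and
any failure ends in wedging with no reset sent to the GTs that did
complete the handshake. The gt_model struct and both functions are
stand-ins written for this mail, not xe driver code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GT and the outcome of its bootstrap. */
struct gt_model {
	int bootstrap_result;	/* 0 or a negative errno value */
	bool attempted;
};

/* Try every GT, even after a failure, and report the first error seen. */
static int reinit_all(struct gt_model *gts, size_t count)
{
	int err = 0;

	for (size_t i = 0; i < count; i++) {
		int ret = gts[i].bootstrap_result;

		gts[i].attempted = true;
		if (!err)
			err = ret;
	}
	return err;
}

/*
 * Returns true on success. On failure the device is marked wedged and
 * nothing else is sent to the GTs that did complete the handshake.
 */
static bool recover(struct gt_model *gts, size_t count, bool *wedged)
{
	int err = reinit_all(gts, count);

	*wedged = (err != 0);
	return err == 0;
}
```

With three GTs whose results are 0, -EIO and -ETIMEDOUT, reinit_all()
visits all three and returns -EIO, so recover() reports failure and
sets the wedged flag.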

-Tomasz

>
>> +}
>> +
>>   /*
>>    * vf_post_migration_notify_resfix_done - Notify all GuCs about resource fixups apply finished.
>>    * @xe: the &xe_device struct instance
>> @@ -44,10 +74,20 @@ static void vf_post_migration_notify_resfix_done(struct xe_device *xe)
>>   
>>   static void vf_post_migration_recovery(struct xe_device *xe)
>>   {
>> +	int err;
>> +
>>   	drm_dbg(&xe->drm, "migration recovery in progress\n");
>> +	err = vf_post_migration_reinit_guc(xe);
>> +	if (unlikely(err))
>> +		goto fail;
>> +
>>   	/* FIXME: add the recovery steps */
>>   	vf_post_migration_notify_resfix_done(xe);
>>   	drm_notice(&xe->drm, "migration recovery completed\n");
>> +	return;
>> +fail:
>> +	drm_err(&xe->drm, "migration recovery failed (%pe)\n", ERR_PTR(err));
>> +	xe_device_declare_wedged(xe);
>>   }
>>   
>>   static void migration_worker_func(struct work_struct *w)

Thread overview: 13+ messages
2024-09-20 22:29 [PATCH 0/4] drm/xe/vf: Post-migration recovery worker basis Tomasz Lis
2024-09-20 22:29 ` [PATCH 1/4] drm/xe/vf: React to MIGRATED interrupt Tomasz Lis
2024-09-23  8:39   ` Michal Wajdeczko
2024-09-20 22:29 ` [PATCH 2/4] drm/xe/vf: Send RESFIX_DONE message at end of VF restore Tomasz Lis
2024-09-23 11:55   ` Michal Wajdeczko
2024-09-23 20:52     ` Lis, Tomasz
2024-09-20 22:29 ` [PATCH 3/4] drm/xe/vf: Start post-migration fixups with GuC MMIO handshake Tomasz Lis
2024-09-23 12:02   ` Michal Wajdeczko
2024-09-23 21:11     ` Lis, Tomasz [this message]
2024-09-20 22:29 ` [PATCH 4/4] drm/xe/vf: Defer fixups if migrated twice fast Tomasz Lis
2024-09-20 22:34 ` ✓ CI.Patch_applied: success for drm/xe/vf: Post-migration recovery worker basis Patchwork
2024-09-20 22:35 ` ✗ CI.checkpatch: warning " Patchwork
2024-09-20 22:35 ` ✗ CI.KUnit: failure " Patchwork