From: "Lis, Tomasz" <tomasz.lis@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
<intel-xe@lists.freedesktop.org>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>
Subject: Re: [PATCH 3/4] drm/xe/vf: Start post-migration fixups with GuC MMIO handshake
Date: Mon, 23 Sep 2024 23:11:18 +0200
Message-ID: <c164b4cc-952f-4c43-8d21-10da823ca64d@intel.com>
In-Reply-To: <b015dd94-4b39-4f9a-ad1e-8ec7b7cfe346@intel.com>
On 23.09.2024 14:02, Michal Wajdeczko wrote:
>
> On 21.09.2024 00:29, Tomasz Lis wrote:
>> During post-migration recovery, only MMIO communication to GuC is
>> allowed. But that communication requires initialization.
> shouldn't this be patch 2/4 ie. before actually trying to send anything
GuC is obligated to accept RESFIX_DONE or RESET while in the RESFIX
state, regardless of whether the KMD says "Hi" first or not.
This means the patches are in the correct order.
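
To illustrate the ordering argument, here is a toy user-space model of
the behaviour described above. It is not driver or firmware code: every
name is hypothetical, and the states entered after RESFIX_DONE and RESET
are assumptions made only for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; names and post-RESFIX states are hypothetical. */
enum toy_guc_state { TOY_RESFIX, TOY_RUNNING, TOY_RESET };
enum toy_msg { TOY_MSG_HANDSHAKE, TOY_MSG_RESFIX_DONE, TOY_MSG_RESET };

struct toy_guc {
	enum toy_guc_state state;
	bool handshake_seen;
};

/* Returns 0 if the message is accepted, -1 otherwise. */
static int toy_guc_send(struct toy_guc *guc, enum toy_msg msg)
{
	assert(guc != NULL);

	if (guc->state != TOY_RESFIX)
		return -1;	/* behaviour outside RESFIX is not modeled */

	switch (msg) {
	case TOY_MSG_HANDSHAKE:
		guc->handshake_seen = true;
		return 0;
	case TOY_MSG_RESFIX_DONE:
		/* Accepted whether or not a handshake was seen first. */
		guc->state = TOY_RUNNING;	/* assumed next state */
		return 0;
	case TOY_MSG_RESET:
		guc->state = TOY_RESET;		/* assumed next state */
		return 0;
	}
	return -1;
}
```

In this model, sending TOY_MSG_RESFIX_DONE to a freshly migrated
toy_guc succeeds even when handshake_seen is still false, which is why
patch 2/4 can stand on its own before the handshake is added.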
>
>> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_sriov_vf.c | 40 ++++++++++++++++++++++++++++++++
>> 1 file changed, 40 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index 459fa936aaba..3cea2d21525f 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -22,6 +22,36 @@ void xe_sriov_vf_init_early(struct xe_device *xe)
>> INIT_WORK(&xe->sriov.vf.migration_worker, migration_worker_func);
>> }
>>
>> +/**
>> + * vf_post_migration_reinit_guc - Re-initialize GuC communication.
>> + * @xe: the &xe_device struct instance
>> + *
>> + * After migration, we need to reestablish communication with GuC and
>> + * re-query all VF configuration to make sure they match previous
>> + * provisioning. Note that most of VF provisioning shall be the same,
>> + * except GGTT range, since GGTT is not virtualized per-VF.
>> + *
>> + * Returns: 0 if the operation completed successfully, or a negative error
> correct tag is "Return:" see [1]
>
> [1] https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
ack
>
>> + * code otherwise.
>> + */
>> +static int vf_post_migration_reinit_guc(struct xe_device *xe)
>> +{
>> + struct xe_gt *gt;
>> + unsigned int id;
>> + int err, ret;
>> +
>> + err = 0;
>> + xe_pm_runtime_get(xe);
> again, maybe PM can be done once in vf_post_migration_recovery()
We could, but we do not need to hold the runtime PM reference during
fixups. From what I understand, the finer the granularity, the better.
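
A rough user-space sketch of the two options, to make the trade-off
concrete. This is a toy model, not driver code: all names are
hypothetical, and it assumes that the fixup step itself does not need
the reference and that the RESFIX_DONE step takes its own.

```c
#include <assert.h>

/* Toy model of runtime PM scope; all names are hypothetical. */
struct toy_dev {
	int pm_refs;		/* outstanding runtime PM references */
	int steps_pm_held;	/* recovery steps run with a reference held */
};

static void toy_pm_get(struct toy_dev *d)
{
	d->pm_refs++;
}

static void toy_pm_put(struct toy_dev *d)
{
	assert(d->pm_refs > 0);	/* every put must pair with a get */
	d->pm_refs--;
}

/* Stand-in for one recovery step; only records whether PM was held. */
static void toy_step(struct toy_dev *d)
{
	if (d->pm_refs > 0)
		d->steps_pm_held++;
}

/* Reference taken only around the steps that talk to GuC. */
static void toy_recovery_fine(struct toy_dev *d)
{
	toy_pm_get(d);
	toy_step(d);	/* MMIO handshake */
	toy_pm_put(d);

	toy_step(d);	/* fixups, assumed not to need the reference */

	toy_pm_get(d);
	toy_step(d);	/* RESFIX_DONE notification */
	toy_pm_put(d);
}

/* Single reference held across the whole recovery. */
static void toy_recovery_coarse(struct toy_dev *d)
{
	toy_pm_get(d);
	toy_step(d);	/* MMIO handshake */
	toy_step(d);	/* fixups */
	toy_step(d);	/* RESFIX_DONE notification */
	toy_pm_put(d);
}
```

Both variants leave pm_refs balanced at zero; the fine-grained one just
holds the reference for fewer steps, at the cost of an extra get/put
pair.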
>
>> + for_each_gt(gt, xe, id) {
>> + ret = xe_gt_sriov_vf_bootstrap(gt);
>> + if (!err)
>> + err = ret;
>> + }
>> + xe_pm_runtime_put(xe);
>> +
>> + return err;
> do we care about sending a reset to those GuCs that successfully
> completed handshake or we assume that going wedge is sufficient?
On wedge, all we should do is silence the hardware. If left in the
RESFIX state, the hardware is silent.
We can't guarantee that GuCs won't be in the RESFIX state anyway,
because another migration might happen after the wedging.
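
For reference, the resulting failure path can be modeled in user space
roughly as below. This is a toy sketch with hypothetical names: the
results array stands in for what the per-GT bootstrap call would return,
and "wedged" is reduced to a return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Toy model: attempt every GT, remember only the first failure.
 * results[i] stands in for the return value of a per-GT bootstrap.
 */
static int toy_reinit_all(const int *results, size_t count, size_t *attempted)
{
	int err = 0;
	size_t i;

	assert(results != NULL && attempted != NULL);

	*attempted = 0;
	for (i = 0; i < count; i++) {
		int ret = results[i];

		(*attempted)++;
		if (!err)
			err = ret;	/* keep the first error, keep iterating */
	}
	return err;
}

/* Returns 1 if the toy device ends up wedged, 0 if recovery completed. */
static int toy_recovery(const int *results, size_t count, size_t *attempted)
{
	if (toy_reinit_all(results, count, attempted))
		return 1;	/* wedge; no reset is sent to GTs that succeeded */
	return 0;
}
```

With results of { 0, -EIO, -ETIMEDOUT }, all three GTs are attempted,
-EIO is the error reported, and the device is wedged without touching
the GT whose handshake succeeded.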
-Tomasz
>
>> +}
>> +
>> /*
>> * vf_post_migration_notify_resfix_done - Notify all GuCs about resource fixups apply finished.
>> * @xe: the &xe_device struct instance
>> @@ -44,10 +74,20 @@ static void vf_post_migration_notify_resfix_done(struct xe_device *xe)
>>
>> static void vf_post_migration_recovery(struct xe_device *xe)
>> {
>> + int err;
>> +
>> drm_dbg(&xe->drm, "migration recovery in progress\n");
>> + err = vf_post_migration_reinit_guc(xe);
>> + if (unlikely(err))
>> + goto fail;
>> +
>> /* FIXME: add the recovery steps */
>> vf_post_migration_notify_resfix_done(xe);
>> drm_notice(&xe->drm, "migration recovery completed\n");
>> + return;
>> +fail:
>> + drm_err(&xe->drm, "migration recovery failed (%pe)\n", ERR_PTR(err));
>> + xe_device_declare_wedged(xe);
>> }
>>
>> static void migration_worker_func(struct work_struct *w)