From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Tomasz Lis <tomasz.lis@intel.com>, intel-xe@lists.freedesktop.org
Cc: "Michał Winiarski" <michal.winiarski@intel.com>
Subject: Re: [PATCH 2/4] drm/xe/vf: Send RESFIX_DONE message at end of VF restore
Date: Mon, 23 Sep 2024 13:55:07 +0200 [thread overview]
Message-ID: <02158df1-ea94-4fb2-a649-3e3f38ff8d64@intel.com> (raw)
In-Reply-To: <20240920222926.846985-3-tomasz.lis@intel.com>
On 21.09.2024 00:29, Tomasz Lis wrote:
> After restore, GuC will not answer to any messages from VF KMD until
> fixups are applied. When that is done, VF KMD sends RESFIX_DONE
> message to GuC, at which point GuC resumes normal operation.
>
> This patch implements sending the RESFIX_DONE message at end of
> post-migration recovery.
>
> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
> ---
> .../gpu/drm/xe/abi/guc_actions_sriov_abi.h | 38 +++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 33 ++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_sriov_vf.h | 1 +
> drivers/gpu/drm/xe/xe_sriov_vf.c | 23 +++++++++++
> 4 files changed, 95 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h
> index b6a1852749dd..74874bbba10c 100644
> --- a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h
> @@ -501,6 +501,44 @@
> #define VF2GUC_VF_RESET_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN
> #define VF2GUC_VF_RESET_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0
>
> +/**
> + * DOC: VF2GUC_NOTIFY_RESFIX_DONE
> + *
> + * This action is used by VF to notify the GuC that the VF KMD has completed
> + * post-migration recovery steps.
> + *
> + * This message must be sent as `MMIO HXG Message`_.
> + *
> + * +---+-------+--------------------------------------------------------------+
> + * | | Bits | Description |
> + * +===+=======+==============================================================+
> + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ |
> + * | +-------+--------------------------------------------------------------+
> + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ |
> + * | +-------+--------------------------------------------------------------+
> + * | | 27:16 | DATA0 = MBZ |
> + * | +-------+--------------------------------------------------------------+
> + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE` = 0x5508 |
> + * +---+-------+--------------------------------------------------------------+
> + *
> + * +---+-------+--------------------------------------------------------------+
> + * | | Bits | Description |
> + * +===+=======+==============================================================+
> + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ |
> + * | +-------+--------------------------------------------------------------+
> + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ |
> + * | +-------+--------------------------------------------------------------+
> + * | | 27:0 | DATA0 = MBZ |
> + * +---+-------+--------------------------------------------------------------+
> + */
> +#define GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE 0x5508
please make it unsigned (like all newer definitions) 0x5508u
> +
> +#define VF2GUC_NOTIFY_RESFIX_DONE_REQUEST_MSG_LEN GUC_HXG_REQUEST_MSG_MIN_LEN
> +#define VF2GUC_NOTIFY_RESFIX_DONE_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0
> +
> +#define VF2GUC_NOTIFY_RESFIX_DONE_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN
> +#define VF2GUC_NOTIFY_RESFIX_DONE_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0
> +
> /**
> * DOC: VF2GUC_QUERY_SINGLE_KLV
> *
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index d3baba50f085..08b5f6912923 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -223,6 +223,39 @@ int xe_gt_sriov_vf_bootstrap(struct xe_gt *gt)
> return 0;
> }
>
> +static int guc_action_vf_notify_resfix_done(struct xe_guc *guc)
> +{
> + u32 request[GUC_HXG_REQUEST_MSG_MIN_LEN] = {
> + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
> + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
> + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE),
> + };
> + int ret;
> +
> + ret = xe_guc_mmio_send(guc, request, ARRAY_SIZE(request));
> +
> + return ret > 0 ? -EPROTO : ret;
> +}
> +
> +/**
> + * xe_gt_sriov_vf_notify_resfix_done - Notify GuC about resource fixups apply completed.
> + * @gt: the &xe_gt struct instance linked to target GuC
don't forget about "Return:"
> + */
> +int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt)
> +{
> + struct xe_guc *guc = >->uc.guc;
> + int err;
> +
> + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
> +
> + err = guc_action_vf_notify_resfix_done(guc);
> + if (unlikely(err))
> + xe_gt_sriov_err(gt, "Failed to notify GuC about resource fixup done (%pe)\n",
> + ERR_PTR(err));
> +
> + return err;
> +}
> +
> static int guc_action_query_single_klv(struct xe_guc *guc, u32 key,
> u32 *value, u32 value_len)
> {
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> index e541ce57bec2..97e8c76a641a 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
> @@ -17,6 +17,7 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt);
> int xe_gt_sriov_vf_connect(struct xe_gt *gt);
> int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
> int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt);
> +int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt);
>
> u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt);
> u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
> index b068c57b2bdc..459fa936aaba 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
> @@ -7,8 +7,10 @@
>
> #include "xe_assert.h"
> #include "xe_device.h"
> +#include "xe_gt_sriov_vf.h"
> #include "xe_guc_ct.h"
> #include "xe_module.h"
> +#include "xe_pm.h"
> #include "xe_sriov.h"
> #include "xe_sriov_vf.h"
> #include "xe_sriov_printk.h"
> @@ -20,10 +22,31 @@ void xe_sriov_vf_init_early(struct xe_device *xe)
> INIT_WORK(&xe->sriov.vf.migration_worker, migration_worker_func);
> }
>
> +/*
> + * vf_post_migration_notify_resfix_done - Notify all GuCs about resource fixups apply finished.
> + * @xe: the &xe_device struct instance
> + */
> +static void vf_post_migration_notify_resfix_done(struct xe_device *xe)
> +{
> + struct xe_gt *gt;
> + unsigned int id;
> + int err, num_sent = 0;
> +
> + xe_pm_runtime_get(xe);
maybe pm_get/put can done once inside the top level recovery function:
vf_post_migration_recovery()
> + for_each_gt(gt, xe, id) {
> + err = xe_gt_sriov_vf_notify_resfix_done(gt);
> + if (!err)
> + num_sent++;
> + }
> + xe_pm_runtime_put(xe);
> + drm_dbg(&xe->drm, "sent %d VF resource fixups done notifications\n", num_sent);
hmm, so what's the plan for handling the cases when one or all
notify-fixup-done fail? are we going wedge or will silently ignore like
here?
> +}
> +
> static void vf_post_migration_recovery(struct xe_device *xe)
> {
> drm_dbg(&xe->drm, "migration recovery in progress\n");
> /* FIXME: add the recovery steps */
> + vf_post_migration_notify_resfix_done(xe);
hmm, shouldn't we have some kind of guard to actually track whether we
succeed with the fixups and only then send notify-done?
otherwise in addition that we're already cheating the user with below
message 'recovery-completed', now we cheating the GuC that 'fixup-done'
there likely always be a cases where something went wrong, so we either
end up with reset or wedge, so maybe we should also start with that and
only send 'fixup-done' when passing all post-migration steps
> drm_notice(&xe->drm, "migration recovery completed\n");
> }
>
next prev parent reply other threads:[~2024-09-23 11:55 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-20 22:29 [PATCH 0/4] drm/xe/vf: Post-migration recovery worker basis Tomasz Lis
2024-09-20 22:29 ` [PATCH 1/4] drm/xe/vf: React to MIGRATED interrupt Tomasz Lis
2024-09-23 8:39 ` Michal Wajdeczko
2024-09-20 22:29 ` [PATCH 2/4] drm/xe/vf: Send RESFIX_DONE message at end of VF restore Tomasz Lis
2024-09-23 11:55 ` Michal Wajdeczko [this message]
2024-09-23 20:52 ` Lis, Tomasz
2024-09-20 22:29 ` [PATCH 3/4] drm/xe/vf: Start post-migration fixups with GuC MMIO handshake Tomasz Lis
2024-09-23 12:02 ` Michal Wajdeczko
2024-09-23 21:11 ` Lis, Tomasz
2024-09-20 22:29 ` [PATCH 4/4] drm/xe/vf: Defer fixups if migrated twice fast Tomasz Lis
2024-09-20 22:34 ` ✓ CI.Patch_applied: success for drm/xe/vf: Post-migration recovery worker basis Patchwork
2024-09-20 22:35 ` ✗ CI.checkpatch: warning " Patchwork
2024-09-20 22:35 ` ✗ CI.KUnit: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=02158df1-ea94-4fb2-a649-3e3f38ff8d64@intel.com \
--to=michal.wajdeczko@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=michal.winiarski@intel.com \
--cc=tomasz.lis@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox