From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Raag Jadav <raag.jadav@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: <matthew.brost@intel.com>, <rodrigo.vivi@intel.com>,
<thomas.hellstrom@linux.intel.com>, <riana.tauro@intel.com>,
<michal.wajdeczko@intel.com>, <matthew.d.roper@intel.com>,
<michal.winiarski@intel.com>, <matthew.auld@intel.com>,
<maarten@lankhorst.se>, <jani.nikula@intel.com>,
<lukasz.laguna@intel.com>, <zhanjun.dong@intel.com>,
<lukas@wunner.de>
Subject: Re: [PATCH v5 3/9] drm/xe/gt: Introduce FLR helpers
Date: Wed, 15 Apr 2026 09:25:14 -0700
Message-ID: <9eb9aa1e-92fd-4e8a-991b-a94599600c4e@intel.com>
In-Reply-To: <20260406140722.154445-4-raag.jadav@intel.com>
On 4/6/2026 7:07 AM, Raag Jadav wrote:
> In preparation of usecases which require preparing/re-initializing GT and
> all its uCs before/after PCIe FLR, introduce flr_prepare/done() helpers.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
> v2: Add kernel doc (Matthew Brost)
> v4: Teardown exec queues instead of mangling scheduler pending list (Matthew Brost)
> ---
> drivers/gpu/drm/xe/xe_gsc.c | 14 ++++++++++++
> drivers/gpu/drm/xe/xe_gsc.h | 1 +
> drivers/gpu/drm/xe/xe_gt.c | 37 ++++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt.h | 2 ++
> drivers/gpu/drm/xe/xe_gt_types.h | 9 ++++++++
> drivers/gpu/drm/xe/xe_guc.c | 29 +++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_guc.h | 2 ++
> drivers/gpu/drm/xe/xe_huc.c | 14 ++++++++++++
> drivers/gpu/drm/xe/xe_huc.h | 1 +
> drivers/gpu/drm/xe/xe_uc.c | 37 ++++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_uc.h | 2 ++
> 11 files changed, 148 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
> index e5c234f3d795..7bf6a90ea1a2 100644
> --- a/drivers/gpu/drm/xe/xe_gsc.c
> +++ b/drivers/gpu/drm/xe/xe_gsc.c
> @@ -552,6 +552,20 @@ void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc)
> flush_work(&gsc->work);
> }
>
> +/**
> + * xe_gsc_flr_done() - Re-initialize GSC after FLR
> + * @gsc: The GSC object
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_gsc_flr_done(struct xe_gsc *gsc)
The function name is slightly unclear: it reads as if you're querying
whether the FLR process is done or not. Maybe just call it xe_gsc_reinit
or xe_gsc_reinit_post_flr?
Same for the other similar functions.
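To make the naming suggestion concrete, here is a minimal standalone
sketch of the helper under the second proposed name. The sketch_* types
and sketch_uc_fw_reinit() are hypothetical stand-ins so that it compiles
outside the driver; only the early-return-then-reinit shape is taken
from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not the real xe_uc_fw / xe_gsc definitions. */
struct sketch_uc_fw {
	bool loadable;
	int reinit_calls;
};

struct sketch_gsc {
	struct sketch_uc_fw fw;
};

/* Stand-in for xe_uc_fw_reinit(); it only counts how often it is called. */
static int sketch_uc_fw_reinit(struct sketch_uc_fw *fw)
{
	fw->reinit_calls++;
	return 0;
}

/*
 * Same shape as the patch's xe_gsc_flr_done(), but the name now says
 * what the function does (re-initialize) and when (after FLR).
 */
static int sketch_gsc_reinit_post_flr(struct sketch_gsc *gsc)
{
	if (!gsc->fw.loadable)
		return 0;

	return sketch_uc_fw_reinit(&gsc->fw);
}
```

With a name like this, a call site reads as an action rather than as a
status query.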
> +{
> + if (!xe_uc_fw_is_loadable(&gsc->fw))
> + return 0;
> +
> + return xe_uc_fw_reinit(&gsc->fw);
> +}
> +
> void xe_gsc_stop_prepare(struct xe_gsc *gsc)
> {
> struct xe_gt *gt = gsc_to_gt(gsc);
> diff --git a/drivers/gpu/drm/xe/xe_gsc.h b/drivers/gpu/drm/xe/xe_gsc.h
> index b8b8e0810ad9..8b7fd98f0be6 100644
> --- a/drivers/gpu/drm/xe/xe_gsc.h
> +++ b/drivers/gpu/drm/xe/xe_gsc.h
> @@ -13,6 +13,7 @@ struct xe_gsc;
> struct xe_gt;
> struct xe_hw_engine;
>
> +int xe_gsc_flr_done(struct xe_gsc *gsc);
> int xe_gsc_init(struct xe_gsc *gsc);
> int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc);
> void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc);
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 8a31c963c372..c395d8cc3b5a 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -991,6 +991,43 @@ void xe_gt_reset_async(struct xe_gt *gt)
> xe_pm_runtime_put(gt_to_xe(gt));
> }
>
> +static void xe_gt_flr_prepare_work(struct work_struct *w)
> +{
> + struct xe_gt *gt = container_of(w, typeof(*gt), flr.worker);
> +
> + xe_uc_flr_prepare(&gt->uc);
> +}
> +
> +/**
> + * xe_gt_flr_prepare() - Prepare GT for FLR
> + * @gt: the GT object
> + *
> + * Prepare all GT uCs for FLR.
> + */
> +void xe_gt_flr_prepare(struct xe_gt *gt)
> +{
> + /*
> + * We'll be tearing down exec queues which signals all fences and frees the
> + * jobs but all of that happens asynchronously, so make sure we don't disrupt
> + * the scheduler while jobs are still in-flight.
> + */
> + INIT_WORK_ONSTACK(&gt->flr.worker, xe_gt_flr_prepare_work);
> + queue_work(gt->ordered_wq, &gt->flr.worker);
> + flush_work(&gt->flr.worker);
> + destroy_work_on_stack(&gt->flr.worker);
> +}
> +
> +/**
> + * xe_gt_flr_done() - Re-initialize GT after FLR
> + * @gt: the GT object
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_gt_flr_done(struct xe_gt *gt)
> +{
> + return xe_uc_flr_done(&gt->uc);
> +}
> +
> void xe_gt_suspend_prepare(struct xe_gt *gt)
> {
> CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index de7e47763411..5e6e4eb09efe 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -63,6 +63,8 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt);
> */
> void xe_gt_record_user_engines(struct xe_gt *gt);
>
> +int xe_gt_flr_done(struct xe_gt *gt);
> +void xe_gt_flr_prepare(struct xe_gt *gt);
> void xe_gt_suspend_prepare(struct xe_gt *gt);
> int xe_gt_suspend(struct xe_gt *gt);
> void xe_gt_shutdown(struct xe_gt *gt);
> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> index 8b55cf25a75f..f6694cc90582 100644
> --- a/drivers/gpu/drm/xe/xe_gt_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> @@ -201,6 +201,15 @@ struct xe_gt {
> struct work_struct worker;
> } reset;
>
> + /** @flr: state for FLR */
> + struct {
> + /**
> + * @flr.worker: worker for FLR to be done async allowing to safely
> + * flush all code paths
> + */
> + struct work_struct worker;
> + } flr;
> +
> /** @tlb_inval: TLB invalidation state */
> struct xe_tlb_inval tlb_inval;
>
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index e762eada21db..76a3f0904159 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -1689,6 +1689,35 @@ void xe_guc_sanitize(struct xe_guc *guc)
> xe_guc_submit_disable(guc);
> }
>
> +/**
> + * xe_guc_flr_prepare() - Prepare GuC for FLR
> + * @guc: The GuC object
> + *
> + * Stop GuC submission and tear down exec queues.
> + */
> +void xe_guc_flr_prepare(struct xe_guc *guc)
> +{
> + if (!xe_uc_fw_is_loadable(&guc->fw))
> + return;
> +
> + xe_guc_submit_stop(guc);
> + xe_guc_submit_pause_abort(guc);
> +}
> +
> +/**
> + * xe_guc_flr_done() - Re-initialize GuC after FLR
> + * @guc: The GuC object
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_guc_flr_done(struct xe_guc *guc)
> +{
> + if (!xe_uc_fw_is_loadable(&guc->fw))
> + return 0;
> +
> + return xe_uc_fw_reinit(&guc->fw);
> +}
> +
> int xe_guc_reset_prepare(struct xe_guc *guc)
> {
> return xe_guc_submit_reset_prepare(guc);
> diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h
> index 02514914f404..1fcc623cf24e 100644
> --- a/drivers/gpu/drm/xe/xe_guc.h
> +++ b/drivers/gpu/drm/xe/xe_guc.h
> @@ -32,6 +32,8 @@
> struct drm_printer;
>
> void xe_guc_comm_init_early(struct xe_guc *guc);
> +int xe_guc_flr_done(struct xe_guc *guc);
> +void xe_guc_flr_prepare(struct xe_guc *guc);
> int xe_guc_init_noalloc(struct xe_guc *guc);
> int xe_guc_init(struct xe_guc *guc);
> int xe_guc_init_post_hwconfig(struct xe_guc *guc);
> diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c
> index 57afe21444b1..96c1bbb6f52c 100644
> --- a/drivers/gpu/drm/xe/xe_huc.c
> +++ b/drivers/gpu/drm/xe/xe_huc.c
> @@ -296,6 +296,20 @@ void xe_huc_sanitize(struct xe_huc *huc)
> xe_uc_fw_sanitize(&huc->fw);
> }
>
> +/**
> + * xe_huc_flr_done() - Re-initialize HuC after FLR
> + * @huc: The HuC object
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_huc_flr_done(struct xe_huc *huc)
> +{
> + if (!xe_uc_fw_is_loadable(&huc->fw))
> + return 0;
> +
> + return xe_uc_fw_reinit(&huc->fw);
> +}
> +
> void xe_huc_print_info(struct xe_huc *huc, struct drm_printer *p)
> {
> struct xe_gt *gt = huc_to_gt(huc);
> diff --git a/drivers/gpu/drm/xe/xe_huc.h b/drivers/gpu/drm/xe/xe_huc.h
> index fa1c45e70443..7600ea196908 100644
> --- a/drivers/gpu/drm/xe/xe_huc.h
> +++ b/drivers/gpu/drm/xe/xe_huc.h
> @@ -17,6 +17,7 @@ enum xe_huc_auth_types {
> XE_HUC_AUTH_TYPES_COUNT
> };
>
> +int xe_huc_flr_done(struct xe_huc *huc);
> int xe_huc_init(struct xe_huc *huc);
> int xe_huc_init_post_hwconfig(struct xe_huc *huc);
> int xe_huc_upload(struct xe_huc *huc);
> diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> index 75091bde0d50..e41aa95a4322 100644
> --- a/drivers/gpu/drm/xe/xe_uc.c
> +++ b/drivers/gpu/drm/xe/xe_uc.c
> @@ -15,6 +15,7 @@
> #include "xe_guc_pc.h"
> #include "xe_guc_rc.h"
> #include "xe_guc_engine_activity.h"
> +#include "xe_guc_submit.h"
> #include "xe_huc.h"
> #include "xe_sriov.h"
> #include "xe_wopcm.h"
> @@ -275,6 +276,42 @@ static void uc_reset_wait(struct xe_uc *uc)
> goto again;
> }
>
> +/**
> + * xe_uc_flr_prepare() - Prepare uCs for FLR
> + * @uc: The uC object
> + *
> + * Tear down pending work and stop all uCs.
> + */
> +void xe_uc_flr_prepare(struct xe_uc *uc)
> +{
> + xe_gsc_wait_for_worker_completion(&uc->gsc);
> + xe_uc_reset_prepare(uc);
> + xe_guc_flr_prepare(&uc->guc);
> + xe_uc_stop(uc);
> + xe_uc_sanitize(uc);
Note that xe_uc_sanitize purposely does not clear the GSC state, because
GSC survives both reset and D3Hot. FLR, however, does reset the GSC, so
its state needs to be cleared here.
We currently do not support GSC on DGFX, so as long as FLR support is
limited to DGFX this is not an issue. If you want to skip adding GSC
support here, please at least add an assert so that we catch it if we
ever enable FLR for iGFX.
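A minimal standalone sketch of the kind of assert meant above. The
sketch_* types and function names are hypothetical stand-ins, and plain
assert() is used only so that the example builds outside the kernel;
which driver assert macro fits this call site is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver structures, for illustration only. */
struct sketch_device {
	bool is_dgfx;		/* discrete GPU: GSC is not supported there */
};

struct sketch_uc {
	struct sketch_device *xe;
	bool prepared;
};

/* FLR resets the GSC too, so skipping the GSC cleanup is only safe on DGFX. */
static bool sketch_flr_can_skip_gsc(const struct sketch_uc *uc)
{
	return uc->xe->is_dgfx;
}

static void sketch_uc_flr_prepare(struct sketch_uc *uc)
{
	/* Trips as soon as FLR is enabled on a platform that has GSC. */
	assert(sketch_flr_can_skip_gsc(uc));

	/* ... the existing prepare/stop/sanitize steps would follow here ... */
	uc->prepared = true;
}
```

The point is only that the DGFX-only assumption is checked in code
rather than left implicit.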
Daniele
> +}
> +
> +/**
> + * xe_uc_flr_done() - Re-initialize uCs after FLR
> + * @uc: The uC object
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_uc_flr_done(struct xe_uc *uc)
> +{
> + int ret;
> +
> + ret = xe_guc_flr_done(&uc->guc);
> + if (ret)
> + return ret;
> +
> + ret = xe_huc_flr_done(&uc->huc);
> + if (ret)
> + return ret;
> +
> + return xe_gsc_flr_done(&uc->gsc);
> +}
> +
> void xe_uc_suspend_prepare(struct xe_uc *uc)
> {
> xe_gsc_wait_for_worker_completion(&uc->gsc);
> diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
> index 255a54a8f876..1756821edea1 100644
> --- a/drivers/gpu/drm/xe/xe_uc.h
> +++ b/drivers/gpu/drm/xe/xe_uc.h
> @@ -8,6 +8,8 @@
>
> struct xe_uc;
>
> +int xe_uc_flr_done(struct xe_uc *uc);
> +void xe_uc_flr_prepare(struct xe_uc *uc);
> int xe_uc_init_noalloc(struct xe_uc *uc);
> int xe_uc_init(struct xe_uc *uc);
> int xe_uc_init_post_hwconfig(struct xe_uc *uc);