From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Raag Jadav <raag.jadav@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: <matthew.brost@intel.com>, <rodrigo.vivi@intel.com>,
<thomas.hellstrom@linux.intel.com>, <riana.tauro@intel.com>,
<michal.wajdeczko@intel.com>, <matthew.d.roper@intel.com>,
<michal.winiarski@intel.com>, <matthew.auld@intel.com>,
<maarten@lankhorst.se>, <jani.nikula@intel.com>,
<lukasz.laguna@intel.com>, <zhanjun.dong@intel.com>,
<lukas@wunner.de>
Subject: Re: [PATCH v5 7/9] drm/xe/exec_queue: Introduce xe_exec_queue_reinit()
Date: Wed, 15 Apr 2026 09:10:29 -0700
Message-ID: <b30043f7-9143-4cd9-bfbd-38b6c5077153@intel.com>
In-Reply-To: <20260406140722.154445-8-raag.jadav@intel.com>
On 4/6/2026 7:07 AM, Raag Jadav wrote:
> In preparation for use cases which require re-initializing an exec queue
> after PCIe FLR, introduce the xe_exec_queue_reinit() helper. All the exec
> queue LRCs already exist, but the context is lost on PCIe FLR and needs
> re-initialization.

Isn't this potentially problematic for userspace? If applications have
state saved in their LRCs, that state would be lost without any way for
the user to know. New submissions on those contexts might end up
producing incorrect output without explanation.

IMO it'd be better to just kill all the contexts and be done with it.
FLR is a full reset and I don't think apps are supposed to survive it
without noticing.

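To make the concern concrete, here is a rough userspace toy model, not driver code. Every name in it (toy_ctx, toy_ctx_reinit(), toy_ctx_kill(), toy_ctx_submit()) is hypothetical and has no counterpart in xe, and the choice of -ECANCELED for a killed context is an assumption made only for illustration. It contrasts a silent re-init, where later submissions succeed but compute on lost state, with killing the context, where later submissions fail visibly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-queue hardware context (not struct xe_lrc). */
struct toy_ctx {
	uint32_t saved_state;	/* state built up by earlier submissions */
	bool banned;		/* set once the context has been killed */
};

/* Option A: re-initialize in place. Saved state is lost, nothing is reported. */
static void toy_ctx_reinit(struct toy_ctx *ctx)
{
	ctx->saved_state = 0;
}

/* Option B: kill the context so that later submissions are rejected. */
static void toy_ctx_kill(struct toy_ctx *ctx)
{
	ctx->saved_state = 0;
	ctx->banned = true;
}

/* A submission whose result depends on the state saved in the context. */
static int toy_ctx_submit(struct toy_ctx *ctx, uint32_t input, uint32_t *out)
{
	assert(ctx && out);

	if (ctx->banned)
		return -ECANCELED;	/* assumed error code, for illustration only */

	*out = ctx->saved_state + input;
	return 0;
}
```

In this model, a context that is re-initialized after the reset keeps accepting work and returns a result that differs from the one it would have produced before the reset, while a killed context returns an error the application can act on.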
Daniele
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
> v2: Re-initialize migrate context (Matthew Brost)
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 37 ++++++++++++++++++++++++++----
> drivers/gpu/drm/xe/xe_exec_queue.h | 1 +
> drivers/gpu/drm/xe/xe_lrc.c | 17 ++++++++++++++
> drivers/gpu/drm/xe/xe_lrc.h | 2 ++
> 4 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index b287d0e0e60a..dd99bf766926 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -331,9 +331,8 @@ static void __xe_exec_queue_fini(struct xe_exec_queue *q)
> xe_lrc_put(q->lrc[i]);
> }
>
> -static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
> +static u32 xe_lrc_init_flags(struct xe_exec_queue *q, u32 exec_queue_flags)
> {
> - int i, err;
> u32 flags = 0;
>
> /*
> @@ -356,6 +355,13 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
> if (q->flags & EXEC_QUEUE_FLAG_DISABLE_STATE_CACHE_PERF_FIX)
> flags |= XE_LRC_DISABLE_STATE_CACHE_PERF_FIX;
>
> + return flags;
> +}
> +
> +static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
> +{
> + int i, err;
> +
> err = q->ops->init(q);
> if (err)
> return err;
> @@ -379,8 +385,8 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>
> marker = xe_gt_sriov_vf_wait_valid_ggtt(q->gt);
>
> - lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state,
> - xe_lrc_ring_size(), q->msix_vec, flags);
> + lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state, xe_lrc_ring_size(),
> + q->msix_vec, xe_lrc_init_flags(q, exec_queue_flags));
> if (IS_ERR(lrc)) {
> err = PTR_ERR(lrc);
> goto err_lrc;
> @@ -402,6 +408,29 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
> return err;
> }
>
> +/**
> + * xe_exec_queue_reinit() - Re-initialize exec queue
> + * @q: exec queue to re-initialize
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_exec_queue_reinit(struct xe_exec_queue *q)
> +{
> +	int i, err;
> +
> +	/* Re-initialize submission backend */
> +	q->ops->reinit(q);
> +
> +	for (i = 0; i < q->width; i++) {
> +		err = xe_lrc_reinit(q->lrc[i], q->hwe, q->vm, q->replay_state,
> +				    q->msix_vec, xe_lrc_init_flags(q, q->flags));
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> /**
> * xe_exec_queue_create() - Create an exec queue
> * @xe: Xe device
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a82d99bd77bc..445867d4da26 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -34,6 +34,7 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe,
> void xe_exec_queue_fini(struct xe_exec_queue *q);
> void xe_exec_queue_destroy(struct kref *ref);
> void xe_exec_queue_assign_name(struct xe_exec_queue *q, u32 instance);
> +int xe_exec_queue_reinit(struct xe_exec_queue *q);
>
> static inline struct xe_exec_queue *
> xe_exec_queue_get_unless_zero(struct xe_exec_queue *q)
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index 9d12a0d2f0b5..a6421ac3765b 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -1593,6 +1593,23 @@ static int xe_lrc_ctx_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct
> return err;
> }
>
> +/**
> + * xe_lrc_reinit() - Re-initialize LRC
> + * @lrc: Pointer to the LRC
> + * @hwe: Hardware Engine
> + * @vm: The VM (address space)
> + * @replay_state: GPU hang replay state
> + * @msix_vec: MSI-X interrupt vector (for platforms that support it)
> + * @init_flags: LRC initialization flags
> + *
> + * Returns: 0 on success, negative error code otherwise.
> + */
> +int xe_lrc_reinit(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm,
> +		  void *replay_state, u16 msix_vec, u32 init_flags)
> +{
> +	return xe_lrc_ctx_init(lrc, hwe, vm, replay_state, msix_vec, init_flags);
> +}
> +
> static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm,
> void *replay_state, u32 ring_size, u16 msix_vec, u32 init_flags)
> {
> diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
> index e7c975f9e2d9..514355ce3d6a 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.h
> +++ b/drivers/gpu/drm/xe/xe_lrc.h
> @@ -53,6 +53,8 @@ struct xe_lrc_snapshot {
>
> struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
> void *replay_state, u32 ring_size, u16 msix_vec, u32 flags);
> +int xe_lrc_reinit(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm,
> + void *replay_state, u16 msix_vec, u32 init_flags);
> void xe_lrc_destroy(struct kref *ref);
>
> /**