public inbox for intel-xe@lists.freedesktop.org
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Raag Jadav <raag.jadav@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: <matthew.brost@intel.com>, <rodrigo.vivi@intel.com>,
	<thomas.hellstrom@linux.intel.com>, <riana.tauro@intel.com>,
	<michal.wajdeczko@intel.com>, <matthew.d.roper@intel.com>,
	<michal.winiarski@intel.com>, <matthew.auld@intel.com>,
	<maarten@lankhorst.se>, <jani.nikula@intel.com>,
	<lukasz.laguna@intel.com>, <zhanjun.dong@intel.com>,
	<lukas@wunner.de>
Subject: Re: [PATCH v5 7/9] drm/xe/exec_queue: Introduce xe_exec_queue_reinit()
Date: Wed, 15 Apr 2026 10:02:54 -0700
Message-ID: <a02d2d11-e3db-4752-8a2a-26e36c403e97@intel.com>
In-Reply-To: <1f83cf41-b901-430e-8583-49f2a94b64e4@intel.com>



On 4/15/2026 9:48 AM, Daniele Ceraolo Spurio wrote:
>
>
> On 4/15/2026 9:10 AM, Daniele Ceraolo Spurio wrote:
>>
>>
>> On 4/6/2026 7:07 AM, Raag Jadav wrote:
>>> In preparation for use cases that require re-initializing an exec queue
>>> after PCIe FLR, introduce the xe_exec_queue_reinit() helper. All the exec
>>> queue LRCs already exist, but the context is lost on PCIe FLR and needs
>>> re-initialization.
>>
>> Isn't this potentially problematic for userspace? If they have state 
>> saved in their LRCs, that state would be lost without any way for the 
>> user to know. New submissions on those contexts might end up giving 
>> incorrect output without explanation.
>> IMO it'd be better to just kill all the contexts and be done with it. 
>> FLR is a full reset and I don't think apps are supposed to survive it 
>> without noticing.
>
> Just realized, looking at the follow-up patches, that this is only 
> called for the migration queue, which is kernel-owned. If we're only 
> expecting to re-init kernel queues, that needs to be documented and we 
> need an assert in the code.

Sorry about the triple email, I keep finding new questions while 
looking at the other patches.
What about the other kernel-owned queues apart from the migration ones? 
E.g., we have a couple for GSC and PXP. Those features are not supported 
on discrete, but we should at least add asserts to make sure that we 
can't enable FLR on integrated without handling those queues.
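
To make the two asserts concrete, here is a rough userspace sketch of the checks I have in mind. It is not driver code: every name in it (the mock_* types and helpers, MOCK_EXEC_QUEUE_FLAG_KERNEL, the gsc_pxp_queues counter) is a hypothetical stand-in, and it returns error codes only so the checks can be exercised outside the kernel. In the driver I'd expect these to be asserts instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the driver types; every name here is hypothetical. */
#define MOCK_EXEC_QUEUE_FLAG_KERNEL (1u << 0)

struct mock_device {
	bool is_discrete;            /* true for discrete GPUs */
	unsigned int gsc_pxp_queues; /* kernel-owned queues with no re-init path */
};

struct mock_exec_queue {
	const struct mock_device *xe;
	uint32_t flags;
	unsigned int reinit_count;   /* how many times the queue was re-initialized */
};

/* FLR is only allowed when no kernel-owned queue would be left unhandled. */
static bool mock_flr_allowed(const struct mock_device *xe)
{
	assert(xe != NULL);
	return xe->is_discrete && xe->gsc_pxp_queues == 0;
}

/* Re-initialize a queue after FLR; only kernel-owned queues are accepted. */
static int mock_exec_queue_reinit(struct mock_exec_queue *q)
{
	assert(q != NULL);

	/* Assumption: user-owned queues are never re-initialized, their state is gone. */
	if (!(q->flags & MOCK_EXEC_QUEUE_FLAG_KERNEL))
		return -EINVAL;

	/* Assumption: GSC/PXP queues have no re-init path, so refuse if any exist. */
	if (!mock_flr_allowed(q->xe))
		return -EOPNOTSUPP;

	q->reinit_count++;
	return 0;
}
```

The point of the second check is that it fails loudly if someone later enables FLR on a platform that has GSC or PXP queues, instead of silently leaving those queues with a lost context.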

>
>>
>> Daniele
>>
>>>
>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>> ---
>>> v2: Re-initialize migrate context (Matthew Brost)
>>> ---
>>>   drivers/gpu/drm/xe/xe_exec_queue.c | 37 ++++++++++++++++++++++++++----
>>>   drivers/gpu/drm/xe/xe_exec_queue.h |  1 +
>>>   drivers/gpu/drm/xe/xe_lrc.c        | 17 ++++++++++++++
>>>   drivers/gpu/drm/xe/xe_lrc.h        |  2 ++
>>>   4 files changed, 53 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> index b287d0e0e60a..dd99bf766926 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> @@ -331,9 +331,8 @@ static void __xe_exec_queue_fini(struct xe_exec_queue *q)
>>>           xe_lrc_put(q->lrc[i]);
>>>   }
>>>
>>> -static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>> +static u32 xe_lrc_init_flags(struct xe_exec_queue *q, u32 exec_queue_flags)
>>>   {
>>> -    int i, err;
>>>       u32 flags = 0;
>>>
>>>       /*
>>> @@ -356,6 +355,13 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>>       if (q->flags & EXEC_QUEUE_FLAG_DISABLE_STATE_CACHE_PERF_FIX)
>>>           flags |= XE_LRC_DISABLE_STATE_CACHE_PERF_FIX;
>>>
>>> +    return flags;
>>> +}
>>> +
>>> +static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>> +{
>>> +    int i, err;
>>> +
>>>       err = q->ops->init(q);
>>>       if (err)
>>>           return err;
>>> @@ -379,8 +385,8 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>>
>>>               marker = xe_gt_sriov_vf_wait_valid_ggtt(q->gt);
>>>
>>> -            lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state,
>>> -                        xe_lrc_ring_size(), q->msix_vec, flags);
>>> +            lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state, xe_lrc_ring_size(),
>>> +                        q->msix_vec, xe_lrc_init_flags(q, exec_queue_flags));
>>>               if (IS_ERR(lrc)) {
>>>                   err = PTR_ERR(lrc);
>>>                   goto err_lrc;
>>> @@ -402,6 +408,29 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>>       return err;
>>>   }
>>>
>>> +/**
>>> + * xe_exec_queue_reinit() - Re-initialize exec queue
>>> + * @q: exec queue to re-initialize
>>> + *
>>> + * Returns: 0 on success, negative error code otherwise.
>>> + */
>>> +int xe_exec_queue_reinit(struct xe_exec_queue *q)
>>> +{
>>> +    int i, err;
>>> +
>>> +    /* Re-initialize submission backend */
>>> +    q->ops->reinit(q);
>>> +
>>> +    for (i = 0; i < q->width; i++) {
>>> +        err = xe_lrc_reinit(q->lrc[i], q->hwe, q->vm, q->replay_state,
>>> +                    q->msix_vec, xe_lrc_init_flags(q, q->flags));
>>> +        if (err)
>>> +            return err;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   /**
>>>    * xe_exec_queue_create() - Create an exec queue
>>>    * @xe: Xe device
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>>> index a82d99bd77bc..445867d4da26 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>>> @@ -34,6 +34,7 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe,
>>>   void xe_exec_queue_fini(struct xe_exec_queue *q);
>>>   void xe_exec_queue_destroy(struct kref *ref);
>>>   void xe_exec_queue_assign_name(struct xe_exec_queue *q, u32 instance);
>>> +int xe_exec_queue_reinit(struct xe_exec_queue *q);
>>>
>>>   static inline struct xe_exec_queue *
>>>   xe_exec_queue_get_unless_zero(struct xe_exec_queue *q)
>>> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
>>> index 9d12a0d2f0b5..a6421ac3765b 100644
>>> --- a/drivers/gpu/drm/xe/xe_lrc.c
>>> +++ b/drivers/gpu/drm/xe/xe_lrc.c
>>> @@ -1593,6 +1593,23 @@ static int xe_lrc_ctx_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct
>>>       return err;
>>>   }
>>>
>>> +/**
>>> + * xe_lrc_reinit() - Re-initialize LRC
>>> + * @lrc: Pointer to the LRC
>>> + * @hwe: Hardware Engine
>>> + * @vm: The VM (address space)
>>> + * @replay_state: GPU hang replay state
>>> + * @msix_vec: MSI-X interrupt vector (for platforms that support it)
>>> + * @init_flags: LRC initialization flags
>>> + *
>>> + * Returns: 0 on success, negative error code otherwise.
>>> + */
>>> +int xe_lrc_reinit(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm,
>>> +          void *replay_state, u16 msix_vec, u32 init_flags)
>>> +{
>>> +    return xe_lrc_ctx_init(lrc, hwe, vm, replay_state, msix_vec, init_flags);
>>> +}
>>> +
>>>   static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm,
>>>                  void *replay_state, u32 ring_size, u16 msix_vec, u32 init_flags)
>>>   {
>>> diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
>>> index e7c975f9e2d9..514355ce3d6a 100644
>>> --- a/drivers/gpu/drm/xe/xe_lrc.h
>>> +++ b/drivers/gpu/drm/xe/xe_lrc.h
>>> @@ -53,6 +53,8 @@ struct xe_lrc_snapshot {
>>>
>>>   struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm,
>>>                    void *replay_state, u32 ring_size, u16 msix_vec, u32 flags);
>>> +int xe_lrc_reinit(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm,
>>> +          void *replay_state, u16 msix_vec, u32 init_flags);
>>>   void xe_lrc_destroy(struct kref *ref);
>>>
>>>   /**
>>
>

