Intel-XE Archive on lore.kernel.org
From: Raag Jadav <raag.jadav@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>
Cc: intel-xe@lists.freedesktop.org, matthew.brost@intel.com,
	rodrigo.vivi@intel.com, thomas.hellstrom@linux.intel.com,
	riana.tauro@intel.com, michal.wajdeczko@intel.com,
	michal.winiarski@intel.com
Subject: Re: [PATCH v1 0/6] Introduce Xe PCIe FLR
Date: Fri, 27 Feb 2026 05:18:04 +0100
Message-ID: <aaEa_NrpBGVqkH7e@black.igk.intel.com>
In-Reply-To: <20260226221225.GX4694@mdroper-desk1.amr.corp.intel.com>

On Thu, Feb 26, 2026 at 02:12:25PM -0800, Matt Roper wrote:
> On Tue, Feb 24, 2026 at 03:55:13PM +0530, Raag Jadav wrote:
> > Here's my humble attempt at introducing PCIe FLR support in the xe
> > driver. This is of course a half-baked implementation, limited to
> > reloading uC firmwares. It needs to be extended for a lot of different
> > components which I've skipped here for my lack of competence, so feel
> > free to join in and support them.
> > 
> > PS: This works enough to allow a single exec test run after FLR but it
> > follows with a GuC crash on subsequent runs which I'm still investigating.
> 
> Probably a dumb question since FLR while the driver is bound isn't an
> area I've considered:  if we do a PCI FLR, is there *anything* in the
> driver state that would still be relevant and useful to carry forward
> after the reset?  I believe at the hardware level vram gets wiped, all
> registers in the BAR go back to power-up defaults, etc., right?  If
> there's no state that we can meaningfully carry forward post-reset, then
> couldn't that be handled by destroying the whole xe_device (and
> releasing all of its resources), and then starting over with
> xe_pci_probe() to initialize a new one from scratch?

As an alternative, yes, but that's not the userspace contract for FLR.
If the user wants to reload the driver, there is a separate path for
that, i.e. unbind + bind. Winiarski gives a detailed explanation in [1].

[1] https://lore.kernel.org/intel-xe/forn7m5f2m6bwpspktrmjzvxcezcmoqyuuclu64x77uxdo5c5u@fcg3kphdb5re/
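
For illustration only, the two userspace paths contrasted above can be
sketched against the standard PCI sysfs files. This is a minimal sketch
and not part of the series: the helper names, the example device address
and the driver name "xe" are assumptions, and writing to "reset" triggers
whichever reset method the kernel selects for the device, which is FLR
only if that method is available and chosen.

```python
import os


def pci_paths(bdf, driver="xe", root="/sys"):
    """Build the sysfs paths used below (driver name "xe" is an assumption)."""
    dev = os.path.join(root, "bus", "pci", "devices", bdf)
    drv = os.path.join(root, "bus", "pci", "drivers", driver)
    return {
        "reset": os.path.join(dev, "reset"),
        "unbind": os.path.join(drv, "unbind"),
        "bind": os.path.join(drv, "bind"),
    }


def _write(path, value):
    # sysfs attributes are driven by plain writes; root privileges are needed
    with open(path, "w") as f:
        f.write(value)


def reset_in_place(bdf, root="/sys"):
    """Ask the kernel to reset the device while its driver stays bound."""
    _write(pci_paths(bdf, root=root)["reset"], "1")


def reload_driver(bdf, driver="xe", root="/sys"):
    """Tear the driver instance down and probe it again (unbind + bind)."""
    paths = pci_paths(bdf, driver, root)
    _write(paths["unbind"], bdf)
    _write(paths["bind"], bdf)
```

Calling reset_in_place("0000:03:00.0") corresponds to the reset case
discussed in this thread, where clients and their descriptors are expected
to survive. reload_driver("0000:03:00.0") is the unbind + bind path, which
starts over from probe.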

> I guess on an igpu all of our data is in smem and only the stolen memory
> gets wiped, so an FLR is a bit less destructive.  But on a dgpu I'm not
> sure how much continuation is really possible?

The expectation is to give the user back working hardware. Since VRAM is
lost, the user may need to recreate memory contents, but we keep the
clients and their descriptors intact.

On a side note, this implementation is meant as a stepping stone, to be
reused for other use cases in future products.

Raag

> > Raag Jadav (6):
> >   drm/xe/uc_fw: Allow reloading firmware
> >   drm/xe/uc: Introduce FLR helpers
> >   drm/xe/irq: Introduce FLR helper
> >   drm/xe: Introduce xe_device_assert_lmem_ready()
> >   drm/xe/bo_evict: Introduce xe_bo_restore_map()
> >   drm/xe/pci: Introduce PCIe FLR
> > 
> >  drivers/gpu/drm/xe/Makefile      |   1 +
> >  drivers/gpu/drm/xe/xe_bo_evict.c |  34 +++++--
> >  drivers/gpu/drm/xe/xe_bo_evict.h |   2 +
> >  drivers/gpu/drm/xe/xe_device.c   |   4 +-
> >  drivers/gpu/drm/xe/xe_device.h   |   1 +
> >  drivers/gpu/drm/xe/xe_gsc.c      |  15 ++++
> >  drivers/gpu/drm/xe/xe_gsc.h      |   1 +
> >  drivers/gpu/drm/xe/xe_gt.c       |  10 +++
> >  drivers/gpu/drm/xe/xe_gt.h       |   2 +
> >  drivers/gpu/drm/xe/xe_guc.c      |  16 ++++
> >  drivers/gpu/drm/xe/xe_guc.h      |   1 +
> >  drivers/gpu/drm/xe/xe_huc.c      |  16 ++++
> >  drivers/gpu/drm/xe/xe_huc.h      |   1 +
> >  drivers/gpu/drm/xe/xe_irq.c      |   7 +-
> >  drivers/gpu/drm/xe/xe_irq.h      |   1 +
> >  drivers/gpu/drm/xe/xe_pci.c      |   1 +
> >  drivers/gpu/drm/xe/xe_pci.h      |   2 +
> >  drivers/gpu/drm/xe/xe_pci_err.c  | 147 +++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_uc.c       |  24 +++++
> >  drivers/gpu/drm/xe/xe_uc.h       |   2 +
> >  drivers/gpu/drm/xe/xe_uc_fw.c    |  33 +++----
> >  drivers/gpu/drm/xe/xe_uc_fw.h    |   1 +
> >  22 files changed, 295 insertions(+), 27 deletions(-)
> >  create mode 100644 drivers/gpu/drm/xe/xe_pci_err.c
> > 
> > -- 
> > 2.43.0
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation
