* [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
@ 2025-11-24 23:08 Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV " Michał Winiarski
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-24 23:08 UTC (permalink / raw)
To: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
Michal Wajdeczko
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig,
Michał Winiarski
Hi,
We're now at v6, thanks for all the review feedback.
The first 24 patches are already merged through the drm-tip tree, and I
hope we can get the remaining ones in through the VFIO tree.
There are no major changes worth highlighting in this revision. The full
changelog can be found below.
Cover letter from the previous revision:
Xe is a DRM driver supporting Intel GPUs; for SR-IOV capable
devices, it enables the creation of SR-IOV VFs.
This series adds the xe-vfio-pci driver variant, which interacts with the
Xe driver to control VF device state and read/write migration data,
extending the regular vfio-pci functionality with the VFIO migration
capability.
The driver doesn't expose PRE_COPY support, as currently supported
hardware lacks the capability to track dirty pages.
While the Xe driver already had the capability to manage VF device state,
management of migration data needed to be implemented and constitutes
the majority of the series.
The migration data is processed asynchronously by the Xe driver, and is
organized into multiple migration data packet types representing the
hardware interfaces of the device (GGTT / MMIO / GuC FW / VRAM).
Since the VRAM can potentially be larger than the available system memory,
it is copied in multiple chunks. The metadata needed for migration
compatibility decisions is added as part of the descriptor packet
(currently limited to PCI device ID / revision).
The Xe driver abstracts away the internals of packet processing and takes
care of tracking the position within individual packets.
The API exported to VFIO is similar to the API VFIO exports to
userspace: a simple .read()/.write().
Note that some of the VF resources are not virtualized (e.g. GGTT - the
GFX device global virtual address space). This means that the VF driver
needs to be aware that migration has occurred in order to properly
relocate (patching or re-emitting data that contains references to GGTT
addresses) before resuming operation.
The code to handle that is already present in upstream Linux and in
production VF drivers for other OSes.
Links to previous revisions for reference.
v1:
https://lore.kernel.org/lkml/20251011193847.1836454-1-michal.winiarski@intel.com/
v2:
https://lore.kernel.org/lkml/20251021224133.577765-1-michal.winiarski@intel.com/
v3:
https://lore.kernel.org/lkml/20251030203135.337696-1-michal.winiarski@intel.com/
v4:
https://lore.kernel.org/lkml/20251105151027.540712-1-michal.winiarski@intel.com/
v5:
https://lore.kernel.org/lkml/20251111010439.347045-1-michal.winiarski@intel.com/
v5 -> v6:
* Exclude the patches already merged through drm-tip
* Add logging when migration is enabled in debug mode (Michał)
* Rename the xe_pf_get_pf helper (Michał)
* Don't use "vendor specific" (yet again) (Michał)
* Kerneldoc tweaks (Michał)
* Use guard(xe_pm_runtime_noresume) instead of assert (Michał)
* Check for num_vfs rather than total_vfs (Michał)
v4 -> v5:
* Require GuC version >= 70.54.0
* Fix VFIO migration migf disable
* Fix null-ptr-deref on save_read error
* Don't use "vendor specific" (again) (Kevin)
* Introduce xe_sriov_packet_types.h (Michał)
* Kernel-doc fixes (Michał)
* Use tile_id / gt_id instead of tile / gt in packet header (Michał)
* Don't use struct_group() in packet (Michał)
* And other, more minor changes
v3 -> v4:
* Add error handling on data_read / data_write path
* Don't match on PCI class, use PCI_DRIVER_OVERRIDE_DEVICE_VFIO helper
instead (Lucas De Marchi)
* Use proper node VMA size inside GGTT save / restore helper (Michał)
* Improve data tracking set_bit / clear_bit wrapper names (Michał)
* Improve packet dump helper (Michał)
* Use drmm for migration mutex init (Michał)
* Rename the pf_device access helper (Michał)
* Use non-interruptible sleep in VRAM copy (Matt)
* Rename xe_sriov_migration_data to xe_sriov_packet along with relevant
functions (Michał)
* Rename per-vf device-level data to xe_sriov_migration_state (Michał)
* Use struct name that matches component name instead of anonymous
struct (Michał)
* Don't add XE_GT_SRIOV_STATE_MAX to state enum, use a helper macro
instead (Michał)
* Kernel-doc fixes (Michał)
v2 -> v3:
* Bind xe-vfio-pci to specific devices instead of using vendor and
class (Christoph Hellwig / Jason Gunthorpe)
* Don't refer to the driver as "vendor specific" (Christoph)
* Use pci_iov_get_pf_drvdata and change the interface to take xe_device
(Jason)
* Update the RUNNING_P2P comment (Jason / Kevin Tian)
* Add state_mutex to protect device state transitions (Kevin)
* Implement .error_detected (Kevin)
* Drop redundant comments (Kevin)
* Explain 1-based indexing and wait_flr_done (Kevin)
* Add a missing get_file() (Kevin)
* Drop redundant state transitions when p2p is supported (Kevin)
* Update run/stop naming to match other drivers (Kevin)
* Fix error state handling (Kevin)
* Fix SAVE state diagram rendering (Michał Wajdeczko)
* Control state machine flipping PROCESS / WAIT logic (Michał Wajdeczko)
* Drop GUC / GGTT / MMIO / VRAM from SAVE control state machine
* Use devm instead of drmm for migration-related allocations (Michał)
* Use GGTT node for size calculations (Michał)
* Use mutex guards consistently (Michał)
* Fix build break on 32-bit (lkp)
* Kernel-doc updates (Michał)
* And other, more minor changes
v1 -> v2:
* Do not require debug flag to support migration on PTL/BMG
* Fix PCI class match on VFIO side
* Reorganized PF Control state machine (Michał Wajdeczko)
* Kerneldoc tidying (Michał Wajdeczko)
* Return NULL instead of -ENODATA for produce/consume (Michał Wajdeczko)
* guc_buf s/sync/sync_read (Matt Brost)
* Squash patch 03 (Matt Brost)
* Assert on PM ref instead of taking it (Matt Brost)
* Remove CCS completely (Matt Brost)
* Return ptr on guc_buf_sync_read (Michał Wajdeczko)
* Define default guc_buf size (Michał Wajdeczko)
* Drop CONFIG_PCI_IOV=n stubs where not needed (Michał Wajdeczko)
* And other, more minor changes
Michał Winiarski (4):
drm/xe/pf: Enable SR-IOV VF migration
drm/xe/pci: Introduce a helper to allow VF access to PF xe_device
drm/xe/pf: Export helpers for VFIO
vfio/xe: Add device specific vfio_pci driver variant for Intel
graphics
MAINTAINERS | 7 +
drivers/gpu/drm/xe/Makefile | 2 +
drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c | 9 +
drivers/gpu/drm/xe/xe_pci.c | 17 +
drivers/gpu/drm/xe/xe_pci.h | 3 +
drivers/gpu/drm/xe/xe_sriov_pf_migration.c | 35 +-
drivers/gpu/drm/xe/xe_sriov_pf_migration.h | 1 +
.../gpu/drm/xe/xe_sriov_pf_migration_types.h | 4 +-
drivers/gpu/drm/xe/xe_sriov_vfio.c | 276 +++++++++
drivers/vfio/pci/Kconfig | 2 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/xe/Kconfig | 12 +
drivers/vfio/pci/xe/Makefile | 3 +
drivers/vfio/pci/xe/main.c | 568 ++++++++++++++++++
include/drm/intel/xe_sriov_vfio.h | 30 +
15 files changed, 964 insertions(+), 7 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_sriov_vfio.c
create mode 100644 drivers/vfio/pci/xe/Kconfig
create mode 100644 drivers/vfio/pci/xe/Makefile
create mode 100644 drivers/vfio/pci/xe/main.c
create mode 100644 include/drm/intel/xe_sriov_vfio.h
--
2.51.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV VF migration
2025-11-24 23:08 [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Michał Winiarski
@ 2025-11-24 23:08 ` Michał Winiarski
2025-11-25 14:26 ` Michal Wajdeczko
2025-11-24 23:08 ` [PATCH v6 2/4] drm/xe/pci: Introduce a helper to allow VF access to PF xe_device Michał Winiarski
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Michał Winiarski @ 2025-11-24 23:08 UTC (permalink / raw)
To: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
Michal Wajdeczko
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig,
Michał Winiarski
All of the necessary building blocks are now in place to support SR-IOV
VF migration.
Flip the enable/disable logic to match the VF code and disable the feature
only on platforms that don't meet the necessary prerequisites.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c | 9 +++++
drivers/gpu/drm/xe/xe_sriov_pf_migration.c | 35 ++++++++++++++++---
drivers/gpu/drm/xe/xe_sriov_pf_migration.h | 1 +
.../gpu/drm/xe/xe_sriov_pf_migration_types.h | 4 +--
4 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
index d5d918ddce4fe..3174a8dee779e 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
@@ -17,6 +17,7 @@
#include "xe_gt_sriov_pf_helpers.h"
#include "xe_gt_sriov_pf_migration.h"
#include "xe_gt_sriov_printk.h"
+#include "xe_guc.h"
#include "xe_guc_buf.h"
#include "xe_guc_ct.h"
#include "xe_migrate.h"
@@ -1023,6 +1024,12 @@ static void action_ring_cleanup(void *arg)
ptr_ring_cleanup(r, destroy_pf_packet);
}
+static void pf_gt_migration_check_support(struct xe_gt *gt)
+{
+ if (GUC_FIRMWARE_VER(&gt->uc.guc) < MAKE_GUC_VER(70, 54, 0))
+ xe_sriov_pf_migration_disable(gt_to_xe(gt), "requires GuC version >= 70.54.0");
+}
+
/**
* xe_gt_sriov_pf_migration_init() - Initialize support for VF migration.
* @gt: the &xe_gt
@@ -1039,6 +1046,8 @@ int xe_gt_sriov_pf_migration_init(struct xe_gt *gt)
xe_gt_assert(gt, IS_SRIOV_PF(xe));
+ pf_gt_migration_check_support(gt);
+
if (!pf_migration_supported(gt))
return 0;
diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
index de06cc690fc81..6c4b16409cc9a 100644
--- a/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
+++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
@@ -46,13 +46,37 @@ bool xe_sriov_pf_migration_supported(struct xe_device *xe)
{
xe_assert(xe, IS_SRIOV_PF(xe));
- return xe->sriov.pf.migration.supported;
+ return IS_ENABLED(CONFIG_DRM_XE_DEBUG) || !xe->sriov.pf.migration.disabled;
}
-static bool pf_check_migration_support(struct xe_device *xe)
+/**
+ * xe_sriov_pf_migration_disable() - Turn off SR-IOV VF migration support on PF.
+ * @xe: the &xe_device instance.
+ * @fmt: format string for the log message, to be combined with following VAs.
+ */
+void xe_sriov_pf_migration_disable(struct xe_device *xe, const char *fmt, ...)
+{
+ struct va_format vaf;
+ va_list va_args;
+
+ xe_assert(xe, IS_SRIOV_PF(xe));
+
+ va_start(va_args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &va_args;
+ xe_sriov_notice(xe, "migration %s: %pV\n",
+ IS_ENABLED(CONFIG_DRM_XE_DEBUG) ?
+ "missing prerequisite" : "disabled",
+ &vaf);
+ va_end(va_args);
+
+ xe->sriov.pf.migration.disabled = true;
+}
+
+static void pf_migration_check_support(struct xe_device *xe)
{
- /* XXX: for now this is for feature enabling only */
- return IS_ENABLED(CONFIG_DRM_XE_DEBUG);
+ if (!xe_device_has_memirq(xe))
+ xe_sriov_pf_migration_disable(xe, "requires memory-based IRQ support");
}
static void pf_migration_cleanup(void *arg)
@@ -77,7 +101,8 @@ int xe_sriov_pf_migration_init(struct xe_device *xe)
xe_assert(xe, IS_SRIOV_PF(xe));
- xe->sriov.pf.migration.supported = pf_check_migration_support(xe);
+ pf_migration_check_support(xe);
+
if (!xe_sriov_pf_migration_supported(xe))
return 0;
diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration.h b/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
index b806298a0bb62..f8f408df84813 100644
--- a/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
+++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
@@ -14,6 +14,7 @@ struct xe_sriov_packet;
int xe_sriov_pf_migration_init(struct xe_device *xe);
bool xe_sriov_pf_migration_supported(struct xe_device *xe);
+void xe_sriov_pf_migration_disable(struct xe_device *xe, const char *fmt, ...);
int xe_sriov_pf_migration_restore_produce(struct xe_device *xe, unsigned int vfid,
struct xe_sriov_packet *data);
struct xe_sriov_packet *
diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h b/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
index 363d673ee1dd5..7d9a8a278d915 100644
--- a/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
+++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
@@ -14,8 +14,8 @@
* struct xe_sriov_pf_migration - Xe device level VF migration data
*/
struct xe_sriov_pf_migration {
- /** @supported: indicates whether VF migration feature is supported */
- bool supported;
+ /** @disabled: indicates whether VF migration feature is disabled */
+ bool disabled;
};
/**
--
2.51.2
* [PATCH v6 2/4] drm/xe/pci: Introduce a helper to allow VF access to PF xe_device
2025-11-24 23:08 [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV " Michał Winiarski
@ 2025-11-24 23:08 ` Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO Michał Winiarski
` (2 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-24 23:08 UTC (permalink / raw)
To: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
Michal Wajdeczko
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig,
Michał Winiarski
In certain scenarios (such as VF migration), the VF driver needs to
interact with the PF driver.
Add a helper to allow the VF driver to access the PF xe_device.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
drivers/gpu/drm/xe/xe_pci.c | 17 +++++++++++++++++
drivers/gpu/drm/xe/xe_pci.h | 3 +++
2 files changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index cd03b4b3ebdbd..b27f6364faa0f 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -1224,6 +1224,23 @@ static struct pci_driver xe_pci_driver = {
#endif
};
+/**
+ * xe_pci_to_pf_device() - Get PF &xe_device.
+ * @pdev: the VF &pci_dev device
+ *
+ * Return: pointer to the PF &xe_device, or NULL if the PF is not bound to the xe driver.
+ */
+struct xe_device *xe_pci_to_pf_device(struct pci_dev *pdev)
+{
+ struct drm_device *drm;
+
+ drm = pci_iov_get_pf_drvdata(pdev, &xe_pci_driver);
+ if (IS_ERR(drm))
+ return NULL;
+
+ return to_xe_device(drm);
+}
+
int xe_register_pci_driver(void)
{
return pci_register_driver(&xe_pci_driver);
diff --git a/drivers/gpu/drm/xe/xe_pci.h b/drivers/gpu/drm/xe/xe_pci.h
index 611c1209b14cc..11bcc5fe2c5b9 100644
--- a/drivers/gpu/drm/xe/xe_pci.h
+++ b/drivers/gpu/drm/xe/xe_pci.h
@@ -6,7 +6,10 @@
#ifndef _XE_PCI_H_
#define _XE_PCI_H_
+struct pci_dev;
+
int xe_register_pci_driver(void);
void xe_unregister_pci_driver(void);
+struct xe_device *xe_pci_to_pf_device(struct pci_dev *pdev);
#endif
--
2.51.2
* [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO
2025-11-24 23:08 [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV " Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 2/4] drm/xe/pci: Introduce a helper to allow VF access to PF xe_device Michał Winiarski
@ 2025-11-24 23:08 ` Michał Winiarski
2025-11-25 14:38 ` Michal Wajdeczko
2025-11-25 18:34 ` Alex Williamson
2025-11-24 23:08 ` [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics Michał Winiarski
2025-11-25 20:13 ` [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Alex Williamson
4 siblings, 2 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-24 23:08 UTC (permalink / raw)
To: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
Michal Wajdeczko
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig,
Michał Winiarski
Device specific VFIO driver variant for Xe will implement VF migration.
Export everything that's needed for migration ops.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
drivers/gpu/drm/xe/Makefile | 2 +
drivers/gpu/drm/xe/xe_sriov_vfio.c | 276 +++++++++++++++++++++++++++++
include/drm/intel/xe_sriov_vfio.h | 30 ++++
3 files changed, 308 insertions(+)
create mode 100644 drivers/gpu/drm/xe/xe_sriov_vfio.c
create mode 100644 include/drm/intel/xe_sriov_vfio.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index b848da79a4e18..0938b00a4c7fe 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -184,6 +184,8 @@ xe-$(CONFIG_PCI_IOV) += \
xe_sriov_pf_sysfs.o \
xe_tile_sriov_pf_debugfs.o
+xe-$(CONFIG_XE_VFIO_PCI) += xe_sriov_vfio.o
+
# include helpers for tests even when XE is built-in
ifdef CONFIG_DRM_XE_KUNIT_TEST
xe-y += tests/xe_kunit_helpers.o
diff --git a/drivers/gpu/drm/xe/xe_sriov_vfio.c b/drivers/gpu/drm/xe/xe_sriov_vfio.c
new file mode 100644
index 0000000000000..785f9a5027d10
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sriov_vfio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <drm/intel/xe_sriov_vfio.h>
+#include <linux/cleanup.h>
+
+#include "xe_pci.h"
+#include "xe_pm.h"
+#include "xe_sriov_pf_control.h"
+#include "xe_sriov_pf_helpers.h"
+#include "xe_sriov_pf_migration.h"
+
+/**
+ * xe_sriov_vfio_get_pf() - Get PF &xe_device.
+ * @pdev: the VF &pci_dev device
+ *
+ * Return: pointer to the PF &xe_device, or NULL if the PF is not bound to the xe driver.
+ */
+struct xe_device *xe_sriov_vfio_get_pf(struct pci_dev *pdev)
+{
+ return xe_pci_to_pf_device(pdev);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_get_pf, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_migration_supported() - Check if migration is supported.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ *
+ * Return: true if migration is supported, false otherwise.
+ */
+bool xe_sriov_vfio_migration_supported(struct xe_device *xe)
+{
+ if (!IS_SRIOV_PF(xe))
+ return false;
+
+ return xe_sriov_pf_migration_supported(xe);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_migration_supported, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_wait_flr_done() - Wait for VF FLR completion.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * This function will wait until VF FLR is processed by PF on all tiles (or
+ * until timeout occurs).
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_wait_flr_done(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_wait_flr(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_wait_flr_done, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_suspend_device() - Suspend VF.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * This function will pause VF on all tiles/GTs.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_suspend_device(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_pause_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_suspend_device, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_resume_device() - Resume VF.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * This function will resume VF on all tiles.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_resume_device(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_resume_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_device, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_stop_copy_enter() - Initiate a VF device migration data save.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_stop_copy_enter(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_trigger_save_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_enter, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_stop_copy_exit() - Finish a VF device migration data save.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_stop_copy_exit(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_finish_save_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_exit, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_resume_data_enter() - Initiate a VF device migration data restore.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_resume_data_enter(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_trigger_restore_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_data_enter, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_resume_data_exit() - Finish a VF device migration data restore.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_resume_data_exit(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_finish_restore_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_data_exit, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_error() - Move VF device to error state.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * Reset is needed to move it out of error state.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_error(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_control_stop_vf(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_error, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_data_read() - Read migration data from the VF device.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ * @buf: start address of userspace buffer
+ * @len: requested read size from userspace
+ *
+ * Return: number of bytes that have been successfully read,
+ * 0 if no more migration data is available, -errno on failure.
+ */
+ssize_t xe_sriov_vfio_data_read(struct xe_device *xe, unsigned int vfid,
+ char __user *buf, size_t len)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_migration_read(xe, vfid, buf, len);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_data_read, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_data_write() - Write migration data to the VF device.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ * @buf: start address of userspace buffer
+ * @len: requested write size from userspace
+ *
+ * Return: number of bytes that have been successfully written, -errno on failure.
+ */
+ssize_t xe_sriov_vfio_data_write(struct xe_device *xe, unsigned int vfid,
+ const char __user *buf, size_t len)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_migration_write(xe, vfid, buf, len);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_data_write, "xe-vfio-pci");
+
+/**
+ * xe_sriov_vfio_stop_copy_size() - Get a size estimate of VF device migration data.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * Return: migration data size in bytes or a negative error code on failure.
+ */
+ssize_t xe_sriov_vfio_stop_copy_size(struct xe_device *xe, unsigned int vfid)
+{
+ if (!IS_SRIOV_PF(xe))
+ return -EPERM;
+ if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
+ return -EINVAL;
+
+ guard(xe_pm_runtime_noresume)(xe);
+
+ return xe_sriov_pf_migration_size(xe, vfid);
+}
+EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_size, "xe-vfio-pci");
diff --git a/include/drm/intel/xe_sriov_vfio.h b/include/drm/intel/xe_sriov_vfio.h
new file mode 100644
index 0000000000000..bcd7085a81c55
--- /dev/null
+++ b/include/drm/intel/xe_sriov_vfio.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_SRIOV_VFIO_H_
+#define _XE_SRIOV_VFIO_H_
+
+#include <linux/types.h>
+
+struct pci_dev;
+struct xe_device;
+
+struct xe_device *xe_sriov_vfio_get_pf(struct pci_dev *pdev);
+bool xe_sriov_vfio_migration_supported(struct xe_device *xe);
+int xe_sriov_vfio_wait_flr_done(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_suspend_device(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_resume_device(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_stop_copy_enter(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_stop_copy_exit(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_resume_data_enter(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_resume_data_exit(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_vfio_error(struct xe_device *xe, unsigned int vfid);
+ssize_t xe_sriov_vfio_data_read(struct xe_device *xe, unsigned int vfid,
+ char __user *buf, size_t len);
+ssize_t xe_sriov_vfio_data_write(struct xe_device *xe, unsigned int vfid,
+ const char __user *buf, size_t len);
+ssize_t xe_sriov_vfio_stop_copy_size(struct xe_device *xe, unsigned int vfid);
+
+#endif
--
2.51.2
* [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics
2025-11-24 23:08 [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Michał Winiarski
` (2 preceding siblings ...)
2025-11-24 23:08 ` [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO Michał Winiarski
@ 2025-11-24 23:08 ` Michał Winiarski
2025-11-25 20:08 ` Alex Williamson
2025-11-25 20:13 ` [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Alex Williamson
4 siblings, 1 reply; 19+ messages in thread
From: Michał Winiarski @ 2025-11-24 23:08 UTC (permalink / raw)
To: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
Michal Wajdeczko
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig,
Michał Winiarski
In addition to the generic VFIO PCI functionality, the driver implements
the VFIO migration uAPI, allowing userspace to enable migration for Intel
Graphics SR-IOV Virtual Functions.
The driver binds to the VF device and uses the API exposed by the Xe driver
to transfer the VF migration data under the control of the PF device.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
MAINTAINERS | 7 +
drivers/vfio/pci/Kconfig | 2 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/xe/Kconfig | 12 +
drivers/vfio/pci/xe/Makefile | 3 +
drivers/vfio/pci/xe/main.c | 568 +++++++++++++++++++++++++++++++++++
6 files changed, 594 insertions(+)
create mode 100644 drivers/vfio/pci/xe/Kconfig
create mode 100644 drivers/vfio/pci/xe/Makefile
create mode 100644 drivers/vfio/pci/xe/main.c
diff --git a/MAINTAINERS b/MAINTAINERS
index acc951f122eaf..adb5aa9cd29e9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27025,6 +27025,13 @@ L: virtualization@lists.linux.dev
S: Maintained
F: drivers/vfio/pci/virtio
+VFIO XE PCI DRIVER
+M: Michał Winiarski <michal.winiarski@intel.com>
+L: kvm@vger.kernel.org
+L: intel-xe@lists.freedesktop.org
+S: Supported
+F: drivers/vfio/pci/xe
+
VGA_SWITCHEROO
R: Lukas Wunner <lukas@wunner.de>
S: Maintained
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 2b0172f546652..c100f0ab87f2d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -67,4 +67,6 @@ source "drivers/vfio/pci/nvgrace-gpu/Kconfig"
source "drivers/vfio/pci/qat/Kconfig"
+source "drivers/vfio/pci/xe/Kconfig"
+
endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index cf00c0a7e55c8..f5d46aa9347b9 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -19,3 +19,5 @@ obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu/
obj-$(CONFIG_QAT_VFIO_PCI) += qat/
+
+obj-$(CONFIG_XE_VFIO_PCI) += xe/
diff --git a/drivers/vfio/pci/xe/Kconfig b/drivers/vfio/pci/xe/Kconfig
new file mode 100644
index 0000000000000..4253f2a86ca1f
--- /dev/null
+++ b/drivers/vfio/pci/xe/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config XE_VFIO_PCI
+ tristate "VFIO support for Intel Graphics"
+ depends on DRM_XE
+ select VFIO_PCI_CORE
+ help
+ This option enables device specific VFIO driver variant for Intel Graphics.
+ In addition to generic VFIO PCI functionality, it implements VFIO
+ migration uAPI allowing userspace to enable migration for
+ Intel Graphics SR-IOV Virtual Functions supported by the Xe driver.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/xe/Makefile b/drivers/vfio/pci/xe/Makefile
new file mode 100644
index 0000000000000..13aa0fd192cd4
--- /dev/null
+++ b/drivers/vfio/pci/xe/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_XE_VFIO_PCI) += xe-vfio-pci.o
+xe-vfio-pci-y := main.o
diff --git a/drivers/vfio/pci/xe/main.c b/drivers/vfio/pci/xe/main.c
new file mode 100644
index 0000000000000..ce0ed82ee4d31
--- /dev/null
+++ b/drivers/vfio/pci/xe/main.c
@@ -0,0 +1,568 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/delay.h>
+#include <linux/file.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/sizes.h>
+#include <linux/types.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+
+#include <drm/intel/xe_sriov_vfio.h>
+#include <drm/intel/pciids.h>
+
+struct xe_vfio_pci_migration_file {
+ struct file *filp;
+ /* serializes accesses to migration data */
+ struct mutex lock;
+ bool disabled;
+ struct xe_vfio_pci_core_device *xe_vdev;
+};
+
+struct xe_vfio_pci_core_device {
+ struct vfio_pci_core_device core_device;
+ struct xe_device *xe;
+ /* PF internal control uses vfid index starting from 1 */
+ unsigned int vfid;
+ u8 migrate_cap:1;
+ u8 deferred_reset:1;
+ /* protects migration state */
+ struct mutex state_mutex;
+ enum vfio_device_mig_state mig_state;
+ /* protects the reset_done flow */
+ spinlock_t reset_lock;
+ struct xe_vfio_pci_migration_file *migf;
+};
+
+#define xe_vdev_to_dev(xe_vdev) (&(xe_vdev)->core_device.pdev->dev)
+
+static void xe_vfio_pci_disable_file(struct xe_vfio_pci_migration_file *migf)
+{
+ mutex_lock(&migf->lock);
+ migf->disabled = true;
+ mutex_unlock(&migf->lock);
+}
+
+static void xe_vfio_pci_put_file(struct xe_vfio_pci_core_device *xe_vdev)
+{
+ xe_vfio_pci_disable_file(xe_vdev->migf);
+ fput(xe_vdev->migf->filp);
+ xe_vdev->migf = NULL;
+}
+
+static void xe_vfio_pci_reset(struct xe_vfio_pci_core_device *xe_vdev)
+{
+ if (xe_vdev->migf)
+ xe_vfio_pci_put_file(xe_vdev);
+
+ xe_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+}
+
+static void xe_vfio_pci_state_mutex_lock(struct xe_vfio_pci_core_device *xe_vdev)
+{
+ mutex_lock(&xe_vdev->state_mutex);
+}
+
+/*
+ * This function is called in all state_mutex unlock cases to
+ * handle a 'deferred_reset', if one exists.
+ */
+static void xe_vfio_pci_state_mutex_unlock(struct xe_vfio_pci_core_device *xe_vdev)
+{
+again:
+ spin_lock(&xe_vdev->reset_lock);
+ if (xe_vdev->deferred_reset) {
+ xe_vdev->deferred_reset = false;
+ spin_unlock(&xe_vdev->reset_lock);
+ xe_vfio_pci_reset(xe_vdev);
+ goto again;
+ }
+ mutex_unlock(&xe_vdev->state_mutex);
+ spin_unlock(&xe_vdev->reset_lock);
+}
+
+static void xe_vfio_pci_reset_done(struct pci_dev *pdev)
+{
+ struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
+ int ret;
+
+ if (!xe_vdev->vfid)
+ return;
+
+ /*
+ * VF FLR requires additional processing done by the PF driver.
+ * The processing is done after the FLR is already finished from the
+ * PCIe perspective.
+ * To avoid a scenario where the VF is used while PF processing is
+ * still in progress, an additional synchronization point is needed.
+ */
+ ret = xe_sriov_vfio_wait_flr_done(xe_vdev->xe, xe_vdev->vfid);
+ if (ret)
+ dev_err(&pdev->dev, "Failed to wait for FLR: %d\n", ret);
+
+ if (!xe_vdev->migrate_cap)
+ return;
+
+ /*
+ * As the higher VFIO layers hold locks across reset and use those
+ * same locks together with the mm_lock, we need to prevent an ABBA
+ * deadlock between the state_mutex and the mm_lock.
+ * If the state_mutex is already taken, we defer the cleanup work to
+ * the unlock flow of the other running context.
+ */
+ spin_lock(&xe_vdev->reset_lock);
+ xe_vdev->deferred_reset = true;
+ if (!mutex_trylock(&xe_vdev->state_mutex)) {
+ spin_unlock(&xe_vdev->reset_lock);
+ return;
+ }
+ spin_unlock(&xe_vdev->reset_lock);
+ xe_vfio_pci_state_mutex_unlock(xe_vdev);
+
+ xe_vfio_pci_reset(xe_vdev);
+}
+
+static const struct pci_error_handlers xe_vfio_pci_err_handlers = {
+ .reset_done = xe_vfio_pci_reset_done,
+ .error_detected = vfio_pci_core_aer_err_detected,
+};
+
+static int xe_vfio_pci_open_device(struct vfio_device *core_vdev)
+{
+ struct xe_vfio_pci_core_device *xe_vdev =
+ container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+ struct vfio_pci_core_device *vdev = &xe_vdev->core_device;
+ int ret;
+
+ ret = vfio_pci_core_enable(vdev);
+ if (ret)
+ return ret;
+
+ vfio_pci_core_finish_enable(vdev);
+
+ return 0;
+}
+
+static int xe_vfio_pci_release_file(struct inode *inode, struct file *filp)
+{
+ struct xe_vfio_pci_migration_file *migf = filp->private_data;
+
+ xe_vfio_pci_disable_file(migf);
+ mutex_destroy(&migf->lock);
+ kfree(migf);
+
+ return 0;
+}
+
+static ssize_t xe_vfio_pci_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos)
+{
+ struct xe_vfio_pci_migration_file *migf = filp->private_data;
+ ssize_t ret;
+
+ if (pos)
+ return -ESPIPE;
+
+ mutex_lock(&migf->lock);
+ if (migf->disabled) {
+ mutex_unlock(&migf->lock);
+ return -ENODEV;
+ }
+
+ ret = xe_sriov_vfio_data_read(migf->xe_vdev->xe, migf->xe_vdev->vfid, buf, len);
+ mutex_unlock(&migf->lock);
+
+ return ret;
+}
+
+static const struct file_operations xe_vfio_pci_save_fops = {
+ .owner = THIS_MODULE,
+ .read = xe_vfio_pci_save_read,
+ .release = xe_vfio_pci_release_file,
+ .llseek = noop_llseek,
+};
+
+static ssize_t xe_vfio_pci_resume_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct xe_vfio_pci_migration_file *migf = filp->private_data;
+ ssize_t ret;
+
+ if (pos)
+ return -ESPIPE;
+
+ mutex_lock(&migf->lock);
+ if (migf->disabled) {
+ mutex_unlock(&migf->lock);
+ return -ENODEV;
+ }
+
+ ret = xe_sriov_vfio_data_write(migf->xe_vdev->xe, migf->xe_vdev->vfid, buf, len);
+ mutex_unlock(&migf->lock);
+
+ return ret;
+}
+
+static const struct file_operations xe_vfio_pci_resume_fops = {
+ .owner = THIS_MODULE,
+ .write = xe_vfio_pci_resume_write,
+ .release = xe_vfio_pci_release_file,
+ .llseek = noop_llseek,
+};
+
+static const char *vfio_dev_state_str(u32 state)
+{
+ switch (state) {
+ case VFIO_DEVICE_STATE_RUNNING: return "running";
+ case VFIO_DEVICE_STATE_RUNNING_P2P: return "running_p2p";
+ case VFIO_DEVICE_STATE_STOP_COPY: return "stopcopy";
+ case VFIO_DEVICE_STATE_STOP: return "stop";
+ case VFIO_DEVICE_STATE_RESUMING: return "resuming";
+ case VFIO_DEVICE_STATE_ERROR: return "error";
+ default: return "";
+ }
+}
+
+enum xe_vfio_pci_file_type {
+ XE_VFIO_FILE_SAVE = 0,
+ XE_VFIO_FILE_RESUME,
+};
+
+static struct xe_vfio_pci_migration_file *
+xe_vfio_pci_alloc_file(struct xe_vfio_pci_core_device *xe_vdev,
+ enum xe_vfio_pci_file_type type)
+{
+ struct xe_vfio_pci_migration_file *migf;
+ const struct file_operations *fops;
+ int flags;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ fops = type == XE_VFIO_FILE_SAVE ? &xe_vfio_pci_save_fops : &xe_vfio_pci_resume_fops;
+ flags = type == XE_VFIO_FILE_SAVE ? O_RDONLY : O_WRONLY;
+ migf->filp = anon_inode_getfile("xe_vfio_mig", fops, migf, flags);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+
+ mutex_init(&migf->lock);
+ migf->xe_vdev = xe_vdev;
+ xe_vdev->migf = migf;
+
+ stream_open(migf->filp->f_inode, migf->filp);
+
+ return migf;
+}
+
+static struct file *
+xe_vfio_set_state(struct xe_vfio_pci_core_device *xe_vdev, u32 new)
+{
+ u32 cur = xe_vdev->mig_state;
+ int ret;
+
+ dev_dbg(xe_vdev_to_dev(xe_vdev),
+ "state: %s->%s\n", vfio_dev_state_str(cur), vfio_dev_state_str(new));
+
+ /*
+ * "STOP" handling is reused for "RUNNING_P2P", as the device doesn't
+ * have the capability to selectively block outgoing p2p DMA transfers.
+ * While the device still allows BAR accesses when the VF is stopped,
+ * it does not process any new workload requests, effectively stopping
+ * all outgoing DMA transfers (not just p2p).
+ * Any VRAM / MMIO accesses occurring during "RUNNING_P2P" are kept and
+ * will be migrated to the target VF during stop-copy.
+ */
+ if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+ ret = xe_sriov_vfio_suspend_device(xe_vdev->xe, xe_vdev->vfid);
+ if (ret)
+ goto err;
+
+ return NULL;
+ }
+
+ if ((cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_STOP) ||
+ (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RUNNING_P2P))
+ return NULL;
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
+ ret = xe_sriov_vfio_resume_device(xe_vdev->xe, xe_vdev->vfid);
+ if (ret)
+ goto err;
+
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
+ struct xe_vfio_pci_migration_file *migf;
+
+ migf = xe_vfio_pci_alloc_file(xe_vdev, XE_VFIO_FILE_SAVE);
+ if (IS_ERR(migf)) {
+ ret = PTR_ERR(migf);
+ goto err;
+ }
+ get_file(migf->filp);
+
+ ret = xe_sriov_vfio_stop_copy_enter(xe_vdev->xe, xe_vdev->vfid);
+ if (ret) {
+ fput(migf->filp);
+ goto err;
+ }
+
+ return migf->filp;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) {
+ if (xe_vdev->migf)
+ xe_vfio_pci_put_file(xe_vdev);
+
+ ret = xe_sriov_vfio_stop_copy_exit(xe_vdev->xe, xe_vdev->vfid);
+ if (ret)
+ goto err;
+
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
+ struct xe_vfio_pci_migration_file *migf;
+
+ migf = xe_vfio_pci_alloc_file(xe_vdev, XE_VFIO_FILE_RESUME);
+ if (IS_ERR(migf)) {
+ ret = PTR_ERR(migf);
+ goto err;
+ }
+ get_file(migf->filp);
+
+ ret = xe_sriov_vfio_resume_data_enter(xe_vdev->xe, xe_vdev->vfid);
+ if (ret) {
+ fput(migf->filp);
+ goto err;
+ }
+
+ return migf->filp;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
+ if (xe_vdev->migf)
+ xe_vfio_pci_put_file(xe_vdev);
+
+ ret = xe_sriov_vfio_resume_data_exit(xe_vdev->xe, xe_vdev->vfid);
+ if (ret)
+ goto err;
+
+ return NULL;
+ }
+
+ WARN(true, "Unknown state transition %d->%d", cur, new);
+ return ERR_PTR(-EINVAL);
+
+err:
+ dev_dbg(xe_vdev_to_dev(xe_vdev),
+ "Failed to transition state: %s->%s err=%d\n",
+ vfio_dev_state_str(cur), vfio_dev_state_str(new), ret);
+ return ERR_PTR(ret);
+}
+
+static struct file *
+xe_vfio_pci_set_device_state(struct vfio_device *core_vdev,
+ enum vfio_device_mig_state new_state)
+{
+ struct xe_vfio_pci_core_device *xe_vdev =
+ container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+ enum vfio_device_mig_state next_state;
+ struct file *f = NULL;
+ int ret;
+
+ xe_vfio_pci_state_mutex_lock(xe_vdev);
+ while (new_state != xe_vdev->mig_state) {
+ ret = vfio_mig_get_next_state(core_vdev, xe_vdev->mig_state,
+ new_state, &next_state);
+ if (ret) {
+ xe_sriov_vfio_error(xe_vdev->xe, xe_vdev->vfid);
+ f = ERR_PTR(ret);
+ break;
+ }
+ f = xe_vfio_set_state(xe_vdev, next_state);
+ if (IS_ERR(f))
+ break;
+
+ xe_vdev->mig_state = next_state;
+
+ /* Multiple state transitions with non-NULL file in the middle */
+ if (f && new_state != xe_vdev->mig_state) {
+ fput(f);
+ f = ERR_PTR(-EINVAL);
+ break;
+ }
+ }
+ xe_vfio_pci_state_mutex_unlock(xe_vdev);
+
+ return f;
+}
+
+static int xe_vfio_pci_get_device_state(struct vfio_device *core_vdev,
+ enum vfio_device_mig_state *curr_state)
+{
+ struct xe_vfio_pci_core_device *xe_vdev =
+ container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+
+ xe_vfio_pci_state_mutex_lock(xe_vdev);
+ *curr_state = xe_vdev->mig_state;
+ xe_vfio_pci_state_mutex_unlock(xe_vdev);
+
+ return 0;
+}
+
+static int xe_vfio_pci_get_data_size(struct vfio_device *vdev,
+ unsigned long *stop_copy_length)
+{
+ struct xe_vfio_pci_core_device *xe_vdev =
+ container_of(vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+
+ xe_vfio_pci_state_mutex_lock(xe_vdev);
+ *stop_copy_length = xe_sriov_vfio_stop_copy_size(xe_vdev->xe, xe_vdev->vfid);
+ xe_vfio_pci_state_mutex_unlock(xe_vdev);
+
+ return 0;
+}
+
+static const struct vfio_migration_ops xe_vfio_pci_migration_ops = {
+ .migration_set_state = xe_vfio_pci_set_device_state,
+ .migration_get_state = xe_vfio_pci_get_device_state,
+ .migration_get_data_size = xe_vfio_pci_get_data_size,
+};
+
+static void xe_vfio_pci_migration_init(struct xe_vfio_pci_core_device *xe_vdev)
+{
+ struct vfio_device *core_vdev = &xe_vdev->core_device.vdev;
+ struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
+ struct xe_device *xe = xe_sriov_vfio_get_pf(pdev);
+ int ret;
+
+ if (!xe)
+ return;
+ if (!xe_sriov_vfio_migration_supported(xe))
+ return;
+
+ ret = pci_iov_vf_id(pdev);
+ if (ret < 0)
+ return;
+
+ mutex_init(&xe_vdev->state_mutex);
+ spin_lock_init(&xe_vdev->reset_lock);
+
+ /* PF internal control uses vfid index starting from 1 */
+ xe_vdev->vfid = ret + 1;
+ xe_vdev->xe = xe;
+ xe_vdev->migrate_cap = true;
+
+ core_vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
+ core_vdev->mig_ops = &xe_vfio_pci_migration_ops;
+}
+
+static void xe_vfio_pci_migration_fini(struct xe_vfio_pci_core_device *xe_vdev)
+{
+ if (!xe_vdev->migrate_cap)
+ return;
+
+ mutex_destroy(&xe_vdev->state_mutex);
+}
+
+static int xe_vfio_pci_init_dev(struct vfio_device *core_vdev)
+{
+ struct xe_vfio_pci_core_device *xe_vdev =
+ container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+
+ xe_vfio_pci_migration_init(xe_vdev);
+
+ return vfio_pci_core_init_dev(core_vdev);
+}
+
+static void xe_vfio_pci_release_dev(struct vfio_device *core_vdev)
+{
+ struct xe_vfio_pci_core_device *xe_vdev =
+ container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
+
+ xe_vfio_pci_migration_fini(xe_vdev);
+}
+
+static const struct vfio_device_ops xe_vfio_pci_ops = {
+ .name = "xe-vfio-pci",
+ .init = xe_vfio_pci_init_dev,
+ .release = xe_vfio_pci_release_dev,
+ .open_device = xe_vfio_pci_open_device,
+ .close_device = vfio_pci_core_close_device,
+ .ioctl = vfio_pci_core_ioctl,
+ .device_feature = vfio_pci_core_ioctl_feature,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .match_token_uuid = vfio_pci_core_match_token_uuid,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
+};
+
+static int xe_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct xe_vfio_pci_core_device *xe_vdev;
+ int ret;
+
+ xe_vdev = vfio_alloc_device(xe_vfio_pci_core_device, core_device.vdev, &pdev->dev,
+ &xe_vfio_pci_ops);
+ if (IS_ERR(xe_vdev))
+ return PTR_ERR(xe_vdev);
+
+ dev_set_drvdata(&pdev->dev, &xe_vdev->core_device);
+
+ ret = vfio_pci_core_register_device(&xe_vdev->core_device);
+ if (ret) {
+ vfio_put_device(&xe_vdev->core_device.vdev);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void xe_vfio_pci_remove(struct pci_dev *pdev)
+{
+ struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
+
+ vfio_pci_core_unregister_device(&xe_vdev->core_device);
+ vfio_put_device(&xe_vdev->core_device.vdev);
+}
+
+#define INTEL_PCI_VFIO_DEVICE(_id) { \
+ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, (_id)) \
+}
+
+static const struct pci_device_id xe_vfio_pci_table[] = {
+ INTEL_PTL_IDS(INTEL_PCI_VFIO_DEVICE),
+ INTEL_WCL_IDS(INTEL_PCI_VFIO_DEVICE),
+ INTEL_BMG_IDS(INTEL_PCI_VFIO_DEVICE),
+ {}
+};
+MODULE_DEVICE_TABLE(pci, xe_vfio_pci_table);
+
+static struct pci_driver xe_vfio_pci_driver = {
+ .name = "xe-vfio-pci",
+ .id_table = xe_vfio_pci_table,
+ .probe = xe_vfio_pci_probe,
+ .remove = xe_vfio_pci_remove,
+ .err_handler = &xe_vfio_pci_err_handlers,
+ .driver_managed_dma = true,
+};
+module_pci_driver(xe_vfio_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Michał Winiarski <michal.winiarski@intel.com>");
+MODULE_DESCRIPTION("VFIO PCI driver with migration support for Intel Graphics");
--
2.51.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV VF migration
2025-11-24 23:08 ` [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV " Michał Winiarski
@ 2025-11-25 14:26 ` Michal Wajdeczko
2025-11-26 22:07 ` Michał Winiarski
0 siblings, 1 reply; 19+ messages in thread
From: Michal Wajdeczko @ 2025-11-25 14:26 UTC (permalink / raw)
To: Michał Winiarski, Alex Williamson, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Jason Gunthorpe,
Yishai Hadas, Kevin Tian, Shameer Kolothum, intel-xe,
linux-kernel, kvm, Matthew Brost
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On 11/25/2025 12:08 AM, Michał Winiarski wrote:
> All of the necessary building blocks are now in place to support SR-IOV
> VF migration.
> Flip the enable/disable logic to match VF code and disable the feature
> only for platforms that don't meet the necessary prerequisites.
>
I guess you should mention that "to allow more testing and experiments,
on DEBUG builds any missing prerequisites will be ignored"
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c | 9 +++++
> drivers/gpu/drm/xe/xe_sriov_pf_migration.c | 35 ++++++++++++++++---
> drivers/gpu/drm/xe/xe_sriov_pf_migration.h | 1 +
> .../gpu/drm/xe/xe_sriov_pf_migration_types.h | 4 +--
> 4 files changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> index d5d918ddce4fe..3174a8dee779e 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> @@ -17,6 +17,7 @@
> #include "xe_gt_sriov_pf_helpers.h"
> #include "xe_gt_sriov_pf_migration.h"
> #include "xe_gt_sriov_printk.h"
> +#include "xe_guc.h"
> #include "xe_guc_buf.h"
> #include "xe_guc_ct.h"
> #include "xe_migrate.h"
> @@ -1023,6 +1024,12 @@ static void action_ring_cleanup(void *arg)
> ptr_ring_cleanup(r, destroy_pf_packet);
> }
>
> +static void pf_gt_migration_check_support(struct xe_gt *gt)
> +{
> + if (GUC_FIRMWARE_VER(>->uc.guc) < MAKE_GUC_VER(70, 54, 0))
> + xe_sriov_pf_migration_disable(gt_to_xe(gt), "requires GuC version >= 70.54.0");
> +}
> +
> /**
> * xe_gt_sriov_pf_migration_init() - Initialize support for VF migration.
> * @gt: the &xe_gt
> @@ -1039,6 +1046,8 @@ int xe_gt_sriov_pf_migration_init(struct xe_gt *gt)
>
> xe_gt_assert(gt, IS_SRIOV_PF(xe));
>
> + pf_gt_migration_check_support(gt);
> +
> if (!pf_migration_supported(gt))
> return 0;
>
> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
> index de06cc690fc81..6c4b16409cc9a 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
> +++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
> @@ -46,13 +46,37 @@ bool xe_sriov_pf_migration_supported(struct xe_device *xe)
> {
> xe_assert(xe, IS_SRIOV_PF(xe));
>
> - return xe->sriov.pf.migration.supported;
> + return IS_ENABLED(CONFIG_DRM_XE_DEBUG) || !xe->sriov.pf.migration.disabled;
> }
>
> -static bool pf_check_migration_support(struct xe_device *xe)
> +/**
> + * xe_sriov_pf_migration_disable() - Turn off SR-IOV VF migration support on PF.
> + * @xe: the &xe_device instance.
> + * @fmt: format string for the log message, to be combined with following VAs.
> + */
> +void xe_sriov_pf_migration_disable(struct xe_device *xe, const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list va_args;
> +
> + xe_assert(xe, IS_SRIOV_PF(xe));
> +
> + va_start(va_args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &va_args;
> + xe_sriov_notice(xe, "migration %s: %pV\n",
> + IS_ENABLED(CONFIG_DRM_XE_DEBUG) ?
> + "missing prerequisite" : "disabled",
> + &vaf);
> + va_end(va_args);
> +
> + xe->sriov.pf.migration.disabled = true;
> +}
> +
> +static void pf_migration_check_support(struct xe_device *xe)
> {
> - /* XXX: for now this is for feature enabling only */
> - return IS_ENABLED(CONFIG_DRM_XE_DEBUG);
> + if (!xe_device_has_memirq(xe))
> + xe_sriov_pf_migration_disable(xe, "requires memory-based IRQ support");
> }
>
> static void pf_migration_cleanup(void *arg)
> @@ -77,7 +101,8 @@ int xe_sriov_pf_migration_init(struct xe_device *xe)
>
> xe_assert(xe, IS_SRIOV_PF(xe));
>
> - xe->sriov.pf.migration.supported = pf_check_migration_support(xe);
> + pf_migration_check_support(xe);
> +
> if (!xe_sriov_pf_migration_supported(xe))
> return 0;
>
> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration.h b/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
> index b806298a0bb62..f8f408df84813 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
> +++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
> @@ -14,6 +14,7 @@ struct xe_sriov_packet;
>
> int xe_sriov_pf_migration_init(struct xe_device *xe);
> bool xe_sriov_pf_migration_supported(struct xe_device *xe);
> +void xe_sriov_pf_migration_disable(struct xe_device *xe, const char *fmt, ...);
> int xe_sriov_pf_migration_restore_produce(struct xe_device *xe, unsigned int vfid,
> struct xe_sriov_packet *data);
> struct xe_sriov_packet *
> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h b/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
> index 363d673ee1dd5..7d9a8a278d915 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
> +++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
> @@ -14,8 +14,8 @@
> * struct xe_sriov_pf_migration - Xe device level VF migration data
> */
> struct xe_sriov_pf_migration {
> - /** @supported: indicates whether VF migration feature is supported */
> - bool supported;
> + /** @disabled: indicates whether VF migration feature is disabled */
> + bool disabled;
> };
>
> /**
otherwise lgtm,
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
* Re: [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO
2025-11-24 23:08 ` [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO Michał Winiarski
@ 2025-11-25 14:38 ` Michal Wajdeczko
2025-11-26 22:07 ` Michał Winiarski
2025-11-25 18:34 ` Alex Williamson
1 sibling, 1 reply; 19+ messages in thread
From: Michal Wajdeczko @ 2025-11-25 14:38 UTC (permalink / raw)
To: Michał Winiarski, Alex Williamson, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Jason Gunthorpe,
Yishai Hadas, Kevin Tian, Shameer Kolothum, intel-xe,
linux-kernel, kvm, Matthew Brost
Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On 11/25/2025 12:08 AM, Michał Winiarski wrote:
> Device specific VFIO driver variant for Xe will implement VF migration.
> Export everything that's needed for migration ops.
>
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 2 +
> drivers/gpu/drm/xe/xe_sriov_vfio.c | 276 +++++++++++++++++++++++++++++
> include/drm/intel/xe_sriov_vfio.h | 30 ++++
> 3 files changed, 308 insertions(+)
> create mode 100644 drivers/gpu/drm/xe/xe_sriov_vfio.c
> create mode 100644 include/drm/intel/xe_sriov_vfio.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index b848da79a4e18..0938b00a4c7fe 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -184,6 +184,8 @@ xe-$(CONFIG_PCI_IOV) += \
> xe_sriov_pf_sysfs.o \
> xe_tile_sriov_pf_debugfs.o
>
> +xe-$(CONFIG_XE_VFIO_PCI) += xe_sriov_vfio.o
hmm, shouldn't we also check for CONFIG_PCI_IOV ?
otherwise, some PF functions might not be available
or is there some other implicit rule in Kconfig?
> +
> # include helpers for tests even when XE is built-in
> ifdef CONFIG_DRM_XE_KUNIT_TEST
> xe-y += tests/xe_kunit_helpers.o
> diff --git a/drivers/gpu/drm/xe/xe_sriov_vfio.c b/drivers/gpu/drm/xe/xe_sriov_vfio.c
> new file mode 100644
> index 0000000000000..785f9a5027d10
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_sriov_vfio.c
> @@ -0,0 +1,276 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <drm/intel/xe_sriov_vfio.h>
> +#include <linux/cleanup.h>
> +
> +#include "xe_pci.h"
> +#include "xe_pm.h"
> +#include "xe_sriov_pf_control.h"
> +#include "xe_sriov_pf_helpers.h"
> +#include "xe_sriov_pf_migration.h"
> +
> +/**
> + * xe_sriov_vfio_get_pf() - Get PF &xe_device.
> + * @pdev: the VF &pci_dev device
> + *
> + * Return: pointer to PF &xe_device, NULL otherwise.
> + */
> +struct xe_device *xe_sriov_vfio_get_pf(struct pci_dev *pdev)
> +{
> + return xe_pci_to_pf_device(pdev);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_get_pf, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_migration_supported() - Check if migration is supported.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + *
> + * Return: true if migration is supported, false otherwise.
> + */
> +bool xe_sriov_vfio_migration_supported(struct xe_device *xe)
> +{
hmm, I'm wondering if maybe we should also check for NULL xe in all those
functions, as the above helper function might return NULL in some unlikely case
but maybe this is too defensive
> + if (!IS_SRIOV_PF(xe))
> + return false;
> +
> + return xe_sriov_pf_migration_supported(xe);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_migration_supported, "xe-vfio-pci");
> +
everything else lgtm, so:
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
* Re: [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO
2025-11-24 23:08 ` [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO Michał Winiarski
2025-11-25 14:38 ` Michal Wajdeczko
@ 2025-11-25 18:34 ` Alex Williamson
2025-11-26 18:21 ` Michał Winiarski
1 sibling, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2025-11-25 18:34 UTC (permalink / raw)
To: Michał Winiarski
Cc: Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Matthew Brost, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, 25 Nov 2025 00:08:40 +0100
Michał Winiarski <michal.winiarski@intel.com> wrote:
> +/**
> + * xe_sriov_vfio_wait_flr_done() - Wait for VF FLR completion.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * This function will wait until VF FLR is processed by PF on all tiles (or
> + * until timeout occurs).
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_wait_flr_done(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_wait_flr(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_wait_flr_done, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_suspend_device() - Suspend VF.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * This function will pause VF on all tiles/GTs.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_suspend_device(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_pause_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_suspend_device, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_resume_device() - Resume VF.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * This function will resume VF on all tiles.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_resume_device(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_resume_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_device, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_stop_copy_enter() - Initiate a VF device migration data save.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_stop_copy_enter(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_trigger_save_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_enter, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_stop_copy_exit() - Finish a VF device migration data save.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_stop_copy_exit(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_finish_save_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_exit, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_resume_data_enter() - Initiate a VF device migration data restore.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_resume_data_enter(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_trigger_restore_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_data_enter, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_resume_data_exit() - Finish a VF device migration data restore.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_resume_data_exit(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_finish_restore_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_data_exit, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_error() - Move VF device to error state.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * Reset is needed to move it out of error state.
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int xe_sriov_vfio_error(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_control_stop_vf(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_error, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_data_read() - Read migration data from the VF device.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + * @buf: start address of userspace buffer
> + * @len: requested read size from userspace
> + *
> + * Return: number of bytes that have been successfully read,
> + * 0 if no more migration data is available, -errno on failure.
> + */
> +ssize_t xe_sriov_vfio_data_read(struct xe_device *xe, unsigned int vfid,
> + char __user *buf, size_t len)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_migration_read(xe, vfid, buf, len);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_data_read, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_data_write() - Write migration data to the VF device.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + * @buf: start address of userspace buffer
> + * @len: requested write size from userspace
> + *
> + * Return: number of bytes that has been successfully written, -errno on failure.
> + */
> +ssize_t xe_sriov_vfio_data_write(struct xe_device *xe, unsigned int vfid,
> + const char __user *buf, size_t len)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_migration_write(xe, vfid, buf, len);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_data_write, "xe-vfio-pci");
> +
> +/**
> + * xe_sriov_vfio_stop_copy_size() - Get a size estimate of VF device migration data.
> + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> + * @vfid: the VF identifier (can't be 0)
> + *
> + * Return: migration data size in bytes or a negative error code on failure.
> + */
> +ssize_t xe_sriov_vfio_stop_copy_size(struct xe_device *xe, unsigned int vfid)
> +{
> + if (!IS_SRIOV_PF(xe))
> + return -EPERM;
> + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> + return -EINVAL;
> +
> + guard(xe_pm_runtime_noresume)(xe);
> +
> + return xe_sriov_pf_migration_size(xe, vfid);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_size, "xe-vfio-pci");
The duplicated testing and identical structure of most of the above
functions suggest a helper, if not full-on definition by macro.
Thanks,
Alex
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics
2025-11-24 23:08 ` [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics Michał Winiarski
@ 2025-11-25 20:08 ` Alex Williamson
2025-11-26 11:59 ` Michał Winiarski
0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2025-11-25 20:08 UTC (permalink / raw)
To: Michał Winiarski
Cc: Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Matthew Brost, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, 25 Nov 2025 00:08:41 +0100
Michał Winiarski <michal.winiarski@intel.com> wrote:
> In addition to generic VFIO PCI functionality, the driver implements
> VFIO migration uAPI, allowing userspace to enable migration for Intel
> Graphics SR-IOV Virtual Functions.
> The driver binds to VF device and uses API exposed by Xe driver to
> transfer the VF migration data under the control of PF device.
>
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> MAINTAINERS | 7 +
> drivers/vfio/pci/Kconfig | 2 +
> drivers/vfio/pci/Makefile | 2 +
> drivers/vfio/pci/xe/Kconfig | 12 +
> drivers/vfio/pci/xe/Makefile | 3 +
> drivers/vfio/pci/xe/main.c | 568 +++++++++++++++++++++++++++++++++++
> 6 files changed, 594 insertions(+)
> create mode 100644 drivers/vfio/pci/xe/Kconfig
> create mode 100644 drivers/vfio/pci/xe/Makefile
> create mode 100644 drivers/vfio/pci/xe/main.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index acc951f122eaf..adb5aa9cd29e9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27025,6 +27025,13 @@ L: virtualization@lists.linux.dev
> S: Maintained
> F: drivers/vfio/pci/virtio
>
> +VFIO XE PCI DRIVER
> +M: Michał Winiarski <michal.winiarski@intel.com>
> +L: kvm@vger.kernel.org
> +L: intel-xe@lists.freedesktop.org
> +S: Supported
> +F: drivers/vfio/pci/xe
> +
> VGA_SWITCHEROO
> R: Lukas Wunner <lukas@wunner.de>
> S: Maintained
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 2b0172f546652..c100f0ab87f2d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -67,4 +67,6 @@ source "drivers/vfio/pci/nvgrace-gpu/Kconfig"
>
> source "drivers/vfio/pci/qat/Kconfig"
>
> +source "drivers/vfio/pci/xe/Kconfig"
> +
> endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index cf00c0a7e55c8..f5d46aa9347b9 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -19,3 +19,5 @@ obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu/
>
> obj-$(CONFIG_QAT_VFIO_PCI) += qat/
> +
> +obj-$(CONFIG_XE_VFIO_PCI) += xe/
> diff --git a/drivers/vfio/pci/xe/Kconfig b/drivers/vfio/pci/xe/Kconfig
> new file mode 100644
> index 0000000000000..4253f2a86ca1f
> --- /dev/null
> +++ b/drivers/vfio/pci/xe/Kconfig
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config XE_VFIO_PCI
> + tristate "VFIO support for Intel Graphics"
> + depends on DRM_XE
> + select VFIO_PCI_CORE
> + help
> + This option enables device specific VFIO driver variant for Intel Graphics.
> + In addition to generic VFIO PCI functionality, it implements VFIO
> + migration uAPI allowing userspace to enable migration for
> + Intel Graphics SR-IOV Virtual Functions supported by the Xe driver.
> +
> + If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/xe/Makefile b/drivers/vfio/pci/xe/Makefile
> new file mode 100644
> index 0000000000000..13aa0fd192cd4
> --- /dev/null
> +++ b/drivers/vfio/pci/xe/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_XE_VFIO_PCI) += xe-vfio-pci.o
> +xe-vfio-pci-y := main.o
> diff --git a/drivers/vfio/pci/xe/main.c b/drivers/vfio/pci/xe/main.c
> new file mode 100644
> index 0000000000000..ce0ed82ee4d31
> --- /dev/null
> +++ b/drivers/vfio/pci/xe/main.c
> @@ -0,0 +1,568 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/anon_inodes.h>
> +#include <linux/delay.h>
> +#include <linux/file.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/sizes.h>
> +#include <linux/types.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +
> +#include <drm/intel/xe_sriov_vfio.h>
> +#include <drm/intel/pciids.h>
> +
> +struct xe_vfio_pci_migration_file {
> + struct file *filp;
> + /* serializes accesses to migration data */
> + struct mutex lock;
> + bool disabled;
Move to the end to avoid a hole? Unless you know the mutex leaves a gap.
Maybe also use a u8 bitfield for consistency with the flags in the
struct below.
> + struct xe_vfio_pci_core_device *xe_vdev;
> +};
> +
> +struct xe_vfio_pci_core_device {
> + struct vfio_pci_core_device core_device;
> + struct xe_device *xe;
> + /* PF internal control uses vfid index starting from 1 */
> + unsigned int vfid;
> + u8 migrate_cap:1;
> + u8 deferred_reset:1;
> + /* protects migration state */
> + struct mutex state_mutex;
> + enum vfio_device_mig_state mig_state;
> + /* protects the reset_done flow */
> + spinlock_t reset_lock;
> + struct xe_vfio_pci_migration_file *migf;
> +};
> +
> +#define xe_vdev_to_dev(xe_vdev) (&(xe_vdev)->core_device.pdev->dev)
> +
> +static void xe_vfio_pci_disable_file(struct xe_vfio_pci_migration_file *migf)
> +{
> + mutex_lock(&migf->lock);
> + migf->disabled = true;
> + mutex_unlock(&migf->lock);
> +}
> +
> +static void xe_vfio_pci_put_file(struct xe_vfio_pci_core_device *xe_vdev)
> +{
> + xe_vfio_pci_disable_file(xe_vdev->migf);
> + fput(xe_vdev->migf->filp);
> + xe_vdev->migf = NULL;
> +}
> +
> +static void xe_vfio_pci_reset(struct xe_vfio_pci_core_device *xe_vdev)
> +{
> + if (xe_vdev->migf)
> + xe_vfio_pci_put_file(xe_vdev);
> +
> + xe_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> +}
> +
> +static void xe_vfio_pci_state_mutex_lock(struct xe_vfio_pci_core_device *xe_vdev)
> +{
> + mutex_lock(&xe_vdev->state_mutex);
> +}
> +
> +/*
> + * This function is called in all state_mutex unlock cases to
> + * handle a 'deferred_reset' if exists.
> + */
> +static void xe_vfio_pci_state_mutex_unlock(struct xe_vfio_pci_core_device *xe_vdev)
> +{
> +again:
> + spin_lock(&xe_vdev->reset_lock);
> + if (xe_vdev->deferred_reset) {
> + xe_vdev->deferred_reset = false;
> + spin_unlock(&xe_vdev->reset_lock);
> + xe_vfio_pci_reset(xe_vdev);
> + goto again;
> + }
> + mutex_unlock(&xe_vdev->state_mutex);
> + spin_unlock(&xe_vdev->reset_lock);
> +}
> +
> +static void xe_vfio_pci_reset_done(struct pci_dev *pdev)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
> + int ret;
> +
> + if (!xe_vdev->vfid)
> + return;
> +
> + /*
> + * VF FLR requires additional processing done by PF driver.
> + * The processing is done after FLR is already finished from PCIe
> + * perspective.
> + * In order to avoid a scenario where VF is used while PF processing
> + * is still in progress, additional synchronization point is needed.
> + */
> + ret = xe_sriov_vfio_wait_flr_done(xe_vdev->xe, xe_vdev->vfid);
> + if (ret)
> + dev_err(&pdev->dev, "Failed to wait for FLR: %d\n", ret);
> +
> + if (!xe_vdev->migrate_cap)
> + return;
It seems like the above is intended to cause a stall for all VFs,
regardless of migration support, but vfid and xe are only set for VFs
supporting migration. Maybe that much needs to be pulled out of
migration_init into init_dev, which would give the migrate_cap flag a
purpose where it otherwise seems redundant with testing xe or vfid.
> +
> + /*
> + * As the higher VFIO layers are holding locks across reset and using
> + * those same locks with the mm_lock we need to prevent ABBA deadlock
> + * with the state_mutex and mm_lock.
> + * In case the state_mutex was taken already we defer the cleanup work
> + * to the unlock flow of the other running context.
> + */
> + spin_lock(&xe_vdev->reset_lock);
> + xe_vdev->deferred_reset = true;
> + if (!mutex_trylock(&xe_vdev->state_mutex)) {
> + spin_unlock(&xe_vdev->reset_lock);
> + return;
> + }
> + spin_unlock(&xe_vdev->reset_lock);
> + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> +
> + xe_vfio_pci_reset(xe_vdev);
> +}
> +
> +static const struct pci_error_handlers xe_vfio_pci_err_handlers = {
> + .reset_done = xe_vfio_pci_reset_done,
> + .error_detected = vfio_pci_core_aer_err_detected,
> +};
> +
> +static int xe_vfio_pci_open_device(struct vfio_device *core_vdev)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev =
> + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> + struct vfio_pci_core_device *vdev = &xe_vdev->core_device;
> + int ret;
> +
> + ret = vfio_pci_core_enable(vdev);
> + if (ret)
> + return ret;
> +
> + vfio_pci_core_finish_enable(vdev);
> +
> + return 0;
> +}
Typically migration drivers set the initial RUNNING mig_state in their
open_device function; are we implicitly relying on the reset_done
callback for this instead?
> +
> +static int xe_vfio_pci_release_file(struct inode *inode, struct file *filp)
> +{
> + struct xe_vfio_pci_migration_file *migf = filp->private_data;
> +
> + xe_vfio_pci_disable_file(migf);
What does calling the above accomplish? If something is racing access,
setting disabled immediately before we destroy the lock and free the
object isn't going to solve anything.
> + mutex_destroy(&migf->lock);
> + kfree(migf);
> +
> + return 0;
> +}
> +
> +static ssize_t xe_vfio_pci_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos)
> +{
> + struct xe_vfio_pci_migration_file *migf = filp->private_data;
> + ssize_t ret;
> +
> + if (pos)
> + return -ESPIPE;
> +
> + mutex_lock(&migf->lock);
> + if (migf->disabled) {
> + mutex_unlock(&migf->lock);
> + return -ENODEV;
> + }
> +
> + ret = xe_sriov_vfio_data_read(migf->xe_vdev->xe, migf->xe_vdev->vfid, buf, len);
> + mutex_unlock(&migf->lock);
> +
> + return ret;
> +}
> +
> +static const struct file_operations xe_vfio_pci_save_fops = {
> + .owner = THIS_MODULE,
> + .read = xe_vfio_pci_save_read,
> + .release = xe_vfio_pci_release_file,
> + .llseek = noop_llseek,
> +};
> +
> +static ssize_t xe_vfio_pci_resume_write(struct file *filp, const char __user *buf,
> + size_t len, loff_t *pos)
> +{
> + struct xe_vfio_pci_migration_file *migf = filp->private_data;
> + ssize_t ret;
> +
> + if (pos)
> + return -ESPIPE;
> +
> + mutex_lock(&migf->lock);
> + if (migf->disabled) {
> + mutex_unlock(&migf->lock);
> + return -ENODEV;
> + }
> +
> + ret = xe_sriov_vfio_data_write(migf->xe_vdev->xe, migf->xe_vdev->vfid, buf, len);
> + mutex_unlock(&migf->lock);
> +
> + return ret;
> +}
> +
> +static const struct file_operations xe_vfio_pci_resume_fops = {
> + .owner = THIS_MODULE,
> + .write = xe_vfio_pci_resume_write,
> + .release = xe_vfio_pci_release_file,
> + .llseek = noop_llseek,
> +};
> +
> +static const char *vfio_dev_state_str(u32 state)
> +{
> + switch (state) {
> + case VFIO_DEVICE_STATE_RUNNING: return "running";
> + case VFIO_DEVICE_STATE_RUNNING_P2P: return "running_p2p";
> + case VFIO_DEVICE_STATE_STOP_COPY: return "stopcopy";
> + case VFIO_DEVICE_STATE_STOP: return "stop";
> + case VFIO_DEVICE_STATE_RESUMING: return "resuming";
> + case VFIO_DEVICE_STATE_ERROR: return "error";
> + default: return "";
> + }
> +}
> +
> +enum xe_vfio_pci_file_type {
> + XE_VFIO_FILE_SAVE = 0,
> + XE_VFIO_FILE_RESUME,
> +};
> +
> +static struct xe_vfio_pci_migration_file *
> +xe_vfio_pci_alloc_file(struct xe_vfio_pci_core_device *xe_vdev,
> + enum xe_vfio_pci_file_type type)
> +{
> + struct xe_vfio_pci_migration_file *migf;
> + const struct file_operations *fops;
> + int flags;
> +
> + migf = kzalloc(sizeof(*migf), GFP_KERNEL);
GFP_KERNEL_ACCOUNT
> + if (!migf)
> + return ERR_PTR(-ENOMEM);
> +
> + fops = type == XE_VFIO_FILE_SAVE ? &xe_vfio_pci_save_fops : &xe_vfio_pci_resume_fops;
> + flags = type == XE_VFIO_FILE_SAVE ? O_RDONLY : O_WRONLY;
> + migf->filp = anon_inode_getfile("xe_vfio_mig", fops, migf, flags);
> + if (IS_ERR(migf->filp)) {
> + kfree(migf);
> + return ERR_CAST(migf->filp);
> + }
> +
> + mutex_init(&migf->lock);
> + migf->xe_vdev = xe_vdev;
> + xe_vdev->migf = migf;
> +
> + stream_open(migf->filp->f_inode, migf->filp);
> +
> + return migf;
> +}
> +
> +static struct file *
> +xe_vfio_set_state(struct xe_vfio_pci_core_device *xe_vdev, u32 new)
> +{
> + u32 cur = xe_vdev->mig_state;
> + int ret;
> +
> + dev_dbg(xe_vdev_to_dev(xe_vdev),
> + "state: %s->%s\n", vfio_dev_state_str(cur), vfio_dev_state_str(new));
> +
> + /*
> + * "STOP" handling is reused for "RUNNING_P2P", as the device doesn't
> + * have the capability to selectively block outgoing p2p DMA transfers.
> + * While the device is allowing BAR accesses when the VF is stopped, it
> + * is not processing any new workload requests, effectively stopping
> + * any outgoing DMA transfers (not just p2p).
> + * Any VRAM / MMIO accesses occurring during "RUNNING_P2P" are kept and
> + * will be migrated to target VF during stop-copy.
> + */
> + if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
> + ret = xe_sriov_vfio_suspend_device(xe_vdev->xe, xe_vdev->vfid);
> + if (ret)
> + goto err;
> +
> + return NULL;
> + }
> +
> + if ((cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_STOP) ||
> + (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RUNNING_P2P))
> + return NULL;
> +
> + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
> + ret = xe_sriov_vfio_resume_device(xe_vdev->xe, xe_vdev->vfid);
> + if (ret)
> + goto err;
> +
> + return NULL;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
> + struct xe_vfio_pci_migration_file *migf;
> +
> + migf = xe_vfio_pci_alloc_file(xe_vdev, XE_VFIO_FILE_SAVE);
> + if (IS_ERR(migf)) {
> + ret = PTR_ERR(migf);
> + goto err;
> + }
> + get_file(migf->filp);
> +
> + ret = xe_sriov_vfio_stop_copy_enter(xe_vdev->xe, xe_vdev->vfid);
> + if (ret) {
> + fput(migf->filp);
> + goto err;
> + }
> +
> + return migf->filp;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) {
> + if (xe_vdev->migf)
> + xe_vfio_pci_put_file(xe_vdev);
> +
> + ret = xe_sriov_vfio_stop_copy_exit(xe_vdev->xe, xe_vdev->vfid);
> + if (ret)
> + goto err;
> +
> + return NULL;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
> + struct xe_vfio_pci_migration_file *migf;
> +
> + migf = xe_vfio_pci_alloc_file(xe_vdev, XE_VFIO_FILE_RESUME);
> + if (IS_ERR(migf)) {
> + ret = PTR_ERR(migf);
> + goto err;
> + }
> + get_file(migf->filp);
> +
> + ret = xe_sriov_vfio_resume_data_enter(xe_vdev->xe, xe_vdev->vfid);
> + if (ret) {
> + fput(migf->filp);
> + goto err;
> + }
> +
> + return migf->filp;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
> + if (xe_vdev->migf)
> + xe_vfio_pci_put_file(xe_vdev);
> +
> + ret = xe_sriov_vfio_resume_data_exit(xe_vdev->xe, xe_vdev->vfid);
> + if (ret)
> + goto err;
> +
> + return NULL;
> + }
> +
> + WARN(true, "Unknown state transition %d->%d", cur, new);
> + return ERR_PTR(-EINVAL);
> +
> +err:
> + dev_dbg(xe_vdev_to_dev(xe_vdev),
> + "Failed to transition state: %s->%s err=%d\n",
> + vfio_dev_state_str(cur), vfio_dev_state_str(new), ret);
> + return ERR_PTR(ret);
> +}
> +
> +static struct file *
> +xe_vfio_pci_set_device_state(struct vfio_device *core_vdev,
> + enum vfio_device_mig_state new_state)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev =
> + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> + enum vfio_device_mig_state next_state;
> + struct file *f = NULL;
> + int ret;
> +
> + xe_vfio_pci_state_mutex_lock(xe_vdev);
> + while (new_state != xe_vdev->mig_state) {
> + ret = vfio_mig_get_next_state(core_vdev, xe_vdev->mig_state,
> + new_state, &next_state);
> + if (ret) {
> + xe_sriov_vfio_error(xe_vdev->xe, xe_vdev->vfid);
> + f = ERR_PTR(ret);
> + break;
> + }
> + f = xe_vfio_set_state(xe_vdev, next_state);
> + if (IS_ERR(f))
> + break;
> +
> + xe_vdev->mig_state = next_state;
> +
> + /* Multiple state transitions with non-NULL file in the middle */
> + if (f && new_state != xe_vdev->mig_state) {
> + fput(f);
> + f = ERR_PTR(-EINVAL);
> + break;
> + }
> + }
> + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> +
> + return f;
> +}
> +
> +static int xe_vfio_pci_get_device_state(struct vfio_device *core_vdev,
> + enum vfio_device_mig_state *curr_state)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev =
> + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> +
> + xe_vfio_pci_state_mutex_lock(xe_vdev);
> + *curr_state = xe_vdev->mig_state;
> + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> +
> + return 0;
> +}
> +
> +static int xe_vfio_pci_get_data_size(struct vfio_device *vdev,
> + unsigned long *stop_copy_length)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev =
> + container_of(vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> +
> + xe_vfio_pci_state_mutex_lock(xe_vdev);
> + *stop_copy_length = xe_sriov_vfio_stop_copy_size(xe_vdev->xe, xe_vdev->vfid);
> + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> +
> + return 0;
> +}
> +
> +static const struct vfio_migration_ops xe_vfio_pci_migration_ops = {
> + .migration_set_state = xe_vfio_pci_set_device_state,
> + .migration_get_state = xe_vfio_pci_get_device_state,
> + .migration_get_data_size = xe_vfio_pci_get_data_size,
> +};
> +
> +static void xe_vfio_pci_migration_init(struct xe_vfio_pci_core_device *xe_vdev)
> +{
> + struct vfio_device *core_vdev = &xe_vdev->core_device.vdev;
> + struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
> + struct xe_device *xe = xe_sriov_vfio_get_pf(pdev);
> + int ret;
> +
> + if (!xe)
> + return;
> + if (!xe_sriov_vfio_migration_supported(xe))
> + return;
As above, the ordering here seems wrong if FLR is expecting vfid and xe
to be set independent of migration support.
> +
> + ret = pci_iov_vf_id(pdev);
> + if (ret < 0)
> + return;
Maybe this is just defensive, but @xe being non-NULL verifies @pdev is
a VF bound to &xe_pci_driver, so we could pretty safely just use
'pci_iov_vf_id(pdev) + 1' below. Thanks,
Alex
> +
> + mutex_init(&xe_vdev->state_mutex);
> + spin_lock_init(&xe_vdev->reset_lock);
> +
> + /* PF internal control uses vfid index starting from 1 */
> + xe_vdev->vfid = ret + 1;
> + xe_vdev->xe = xe;
> + xe_vdev->migrate_cap = true;
> +
> + core_vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
> + core_vdev->mig_ops = &xe_vfio_pci_migration_ops;
> +}
> +
> +static void xe_vfio_pci_migration_fini(struct xe_vfio_pci_core_device *xe_vdev)
> +{
> + if (!xe_vdev->migrate_cap)
> + return;
> +
> + mutex_destroy(&xe_vdev->state_mutex);
> +}
> +
> +static int xe_vfio_pci_init_dev(struct vfio_device *core_vdev)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev =
> + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> +
> + xe_vfio_pci_migration_init(xe_vdev);
> +
> + return vfio_pci_core_init_dev(core_vdev);
> +}
> +
> +static void xe_vfio_pci_release_dev(struct vfio_device *core_vdev)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev =
> + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> +
> + xe_vfio_pci_migration_fini(xe_vdev);
> +}
> +
> +static const struct vfio_device_ops xe_vfio_pci_ops = {
> + .name = "xe-vfio-pci",
> + .init = xe_vfio_pci_init_dev,
> + .release = xe_vfio_pci_release_dev,
> + .open_device = xe_vfio_pci_open_device,
> + .close_device = vfio_pci_core_close_device,
> + .ioctl = vfio_pci_core_ioctl,
> + .device_feature = vfio_pci_core_ioctl_feature,
> + .read = vfio_pci_core_read,
> + .write = vfio_pci_core_write,
> + .mmap = vfio_pci_core_mmap,
> + .request = vfio_pci_core_request,
> + .match = vfio_pci_core_match,
> + .match_token_uuid = vfio_pci_core_match_token_uuid,
> + .bind_iommufd = vfio_iommufd_physical_bind,
> + .unbind_iommufd = vfio_iommufd_physical_unbind,
> + .attach_ioas = vfio_iommufd_physical_attach_ioas,
> + .detach_ioas = vfio_iommufd_physical_detach_ioas,
> +};
> +
> +static int xe_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev;
> + int ret;
> +
> + xe_vdev = vfio_alloc_device(xe_vfio_pci_core_device, core_device.vdev, &pdev->dev,
> + &xe_vfio_pci_ops);
> + if (IS_ERR(xe_vdev))
> + return PTR_ERR(xe_vdev);
> +
> + dev_set_drvdata(&pdev->dev, &xe_vdev->core_device);
> +
> + ret = vfio_pci_core_register_device(&xe_vdev->core_device);
> + if (ret) {
> + vfio_put_device(&xe_vdev->core_device.vdev);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static void xe_vfio_pci_remove(struct pci_dev *pdev)
> +{
> + struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
> +
> + vfio_pci_core_unregister_device(&xe_vdev->core_device);
> + vfio_put_device(&xe_vdev->core_device.vdev);
> +}
> +
> +#define INTEL_PCI_VFIO_DEVICE(_id) { \
> + PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, (_id)) \
> +}
> +
> +static const struct pci_device_id xe_vfio_pci_table[] = {
> + INTEL_PTL_IDS(INTEL_PCI_VFIO_DEVICE),
> + INTEL_WCL_IDS(INTEL_PCI_VFIO_DEVICE),
> + INTEL_BMG_IDS(INTEL_PCI_VFIO_DEVICE),
> + {}
> +};
> +MODULE_DEVICE_TABLE(pci, xe_vfio_pci_table);
> +
> +static struct pci_driver xe_vfio_pci_driver = {
> + .name = "xe-vfio-pci",
> + .id_table = xe_vfio_pci_table,
> + .probe = xe_vfio_pci_probe,
> + .remove = xe_vfio_pci_remove,
> + .err_handler = &xe_vfio_pci_err_handlers,
> + .driver_managed_dma = true,
> +};
> +module_pci_driver(xe_vfio_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Michał Winiarski <michal.winiarski@intel.com>");
> +MODULE_DESCRIPTION("VFIO PCI driver with migration support for Intel Graphics");
* Re: [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
2025-11-24 23:08 [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Michał Winiarski
` (3 preceding siblings ...)
2025-11-24 23:08 ` [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics Michał Winiarski
@ 2025-11-25 20:13 ` Alex Williamson
2025-11-26 1:20 ` Matthew Brost
4 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2025-11-25 20:13 UTC (permalink / raw)
To: Michał Winiarski
Cc: Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Matthew Brost, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, 25 Nov 2025 00:08:37 +0100
Michał Winiarski <michal.winiarski@intel.com> wrote:
> Hi,
>
> We're now at v6, thanks for all the review feedback.
>
> First 24 patches are now already merged through drm-tip tree, and I hope
> we can get the remaining ones through the VFIO tree.
Are all those dependencies in a topic branch somewhere? Otherwise to
go in through vfio would mean we need to rebase our next branch after
drm is merged. LPC is happening during this merge window, so we may
not be able to achieve that leniency in ordering. Is the better
approach to get acks on the variant driver and funnel the whole thing
through the drm tree? Thanks,
Alex
* Re: [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
2025-11-25 20:13 ` [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Alex Williamson
@ 2025-11-26 1:20 ` Matthew Brost
2025-11-26 11:38 ` Thomas Hellström
0 siblings, 1 reply; 19+ messages in thread
From: Matthew Brost @ 2025-11-26 1:20 UTC (permalink / raw)
To: Alex Williamson
Cc: Michał Winiarski, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> On Tue, 25 Nov 2025 00:08:37 +0100
> Michał Winiarski <michal.winiarski@intel.com> wrote:
>
> > Hi,
> >
> > We're now at v6, thanks for all the review feedback.
> >
> > First 24 patches are now already merged through drm-tip tree, and I hope
> > we can get the remaining ones through the VFIO tree.
>
> Are all those dependencies in a topic branch somewhere? Otherwise to
> go in through vfio would mean we need to rebase our next branch after
> drm is merged. LPC is happening during this merge window, so we may
> not be able to achieve that leniency in ordering. Is the better
> approach to get acks on the variant driver and funnel the whole thing
> through the drm tree? Thanks,
+1 on merging through drm if the VFIO maintainers are ok with this. I've
done this for various drm external changes in the past with maintainer
acks.
Matt
>
> Alex
* Re: [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
2025-11-26 1:20 ` Matthew Brost
@ 2025-11-26 11:38 ` Thomas Hellström
2025-11-26 11:39 ` Thomas Hellström
2025-11-26 14:46 ` Michał Winiarski
0 siblings, 2 replies; 19+ messages in thread
From: Thomas Hellström @ 2025-11-26 11:38 UTC (permalink / raw)
To: Matthew Brost, Alex Williamson
Cc: Michał Winiarski, Lucas De Marchi, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Michal Wajdeczko, dri-devel,
Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin, David Airlie,
Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > On Tue, 25 Nov 2025 00:08:37 +0100
> > Michał Winiarski <michal.winiarski@intel.com> wrote:
> >
> > > Hi,
> > >
> > > We're now at v6, thanks for all the review feedback.
> > >
> > > First 24 patches are now already merged through drm-tip tree, and
> > > I hope
> > > we can get the remaining ones through the VFIO tree.
> >
> > Are all those dependencies in a topic branch somewhere? Otherwise
> > to
> > go in through vfio would mean we need to rebase our next branch
> > after
> > drm is merged. LPC is happening during this merge window, so we
> > may
> > not be able to achieve that leniency in ordering. Is the better
> > approach to get acks on the variant driver and funnel the whole
> > thing
> > through the drm tree? Thanks,
>
> +1 on merging through drm if VFIO maintainers are ok with this. I've
> done this for various drm external changes in the past with
> maintainers
> acks.
>
> Matt
@Michal Winiarski
Do these patches depend on any other VFIO changes that are queued
for 6.19?
If not, and with proper VFIO acks, I could ask Dave / Sima to allow this
for the drm-xe-next-fixes pull. Then I would also need a strong
justification for it being in 6.19 rather than in 7.0.
Otherwise we'd need to have the VFIO changes it depends on in a topic
branch, or target this for 7.0 and hold off the merge until we can
backmerge 6.9-rc1.
Thanks,
Thomas
>
> >
> > Alex
* Re: [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
2025-11-26 11:38 ` Thomas Hellström
@ 2025-11-26 11:39 ` Thomas Hellström
2025-11-26 14:46 ` Michał Winiarski
1 sibling, 0 replies; 19+ messages in thread
From: Thomas Hellström @ 2025-11-26 11:39 UTC (permalink / raw)
To: Matthew Brost, Alex Williamson
Cc: Michał Winiarski, Lucas De Marchi, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Michal Wajdeczko, dri-devel,
Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin, David Airlie,
Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Wed, 2025-11-26 at 12:38 +0100, Thomas Hellström wrote:
> On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> > On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > > On Tue, 25 Nov 2025 00:08:37 +0100
> > > Michał Winiarski <michal.winiarski@intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We're now at v6, thanks for all the review feedback.
> > > >
> > > > First 24 patches are now already merged through drm-tip tree,
> > > > and
> > > > I hope
> > > > we can get the remaining ones through the VFIO tree.
> > >
> > > Are all those dependencies in a topic branch somewhere?
> > > Otherwise
> > > to
> > > go in through vfio would mean we need to rebase our next branch
> > > after
> > > drm is merged. LPC is happening during this merge window, so we
> > > may
> > > not be able to achieve that leniency in ordering. Is the better
> > > approach to get acks on the variant driver and funnel the whole
> > > thing
> > > through the drm tree? Thanks,
> >
> > +1 on merging through drm if VFIO maintainers are ok with this.
> > I've
> > done this for various drm external changes in the past with
> > maintainers
> > acks.
> >
> > Matt
>
> @Michal Winiarski
>
> Are these patches depending on any other VFIO changes that are queued
> for 6.19?
>
> If not and with proper VFIO acks, I could ask Dave / Sima to allow
> this
> for drm-xe-next-fixes pull. Then I also would need a strong
> justification for it being in 6.19 rather in 7.0.
>
> Otherwise we'd need to have the VFIO changes it depends on in a topic
> branch, or target this for 7.0 and hold off the merge until we can
> backmerge 6.9-rc1.
6.19-rc1
/Thomas
>
> Thanks,
> Thomas
>
>
> >
> > >
> > > Alex
>
* Re: [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics
2025-11-25 20:08 ` Alex Williamson
@ 2025-11-26 11:59 ` Michał Winiarski
0 siblings, 0 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-26 11:59 UTC (permalink / raw)
To: Alex Williamson
Cc: Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Matthew Brost, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, Nov 25, 2025 at 01:08:14PM -0700, Alex Williamson wrote:
> On Tue, 25 Nov 2025 00:08:41 +0100
> Michał Winiarski <michal.winiarski@intel.com> wrote:
>
> > In addition to generic VFIO PCI functionality, the driver implements
> > VFIO migration uAPI, allowing userspace to enable migration for Intel
> > Graphics SR-IOV Virtual Functions.
> > The driver binds to VF device and uses API exposed by Xe driver to
> > transfer the VF migration data under the control of PF device.
> >
> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> > Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > ---
> > MAINTAINERS | 7 +
> > drivers/vfio/pci/Kconfig | 2 +
> > drivers/vfio/pci/Makefile | 2 +
> > drivers/vfio/pci/xe/Kconfig | 12 +
> > drivers/vfio/pci/xe/Makefile | 3 +
> > drivers/vfio/pci/xe/main.c | 568 +++++++++++++++++++++++++++++++++++
> > 6 files changed, 594 insertions(+)
> > create mode 100644 drivers/vfio/pci/xe/Kconfig
> > create mode 100644 drivers/vfio/pci/xe/Makefile
> > create mode 100644 drivers/vfio/pci/xe/main.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index acc951f122eaf..adb5aa9cd29e9 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -27025,6 +27025,13 @@ L: virtualization@lists.linux.dev
> > S: Maintained
> > F: drivers/vfio/pci/virtio
> >
> > +VFIO XE PCI DRIVER
> > +M: Michał Winiarski <michal.winiarski@intel.com>
> > +L: kvm@vger.kernel.org
> > +L: intel-xe@lists.freedesktop.org
> > +S: Supported
> > +F: drivers/vfio/pci/xe
> > +
> > VGA_SWITCHEROO
> > R: Lukas Wunner <lukas@wunner.de>
> > S: Maintained
> > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > index 2b0172f546652..c100f0ab87f2d 100644
> > --- a/drivers/vfio/pci/Kconfig
> > +++ b/drivers/vfio/pci/Kconfig
> > @@ -67,4 +67,6 @@ source "drivers/vfio/pci/nvgrace-gpu/Kconfig"
> >
> > source "drivers/vfio/pci/qat/Kconfig"
> >
> > +source "drivers/vfio/pci/xe/Kconfig"
> > +
> > endmenu
> > diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> > index cf00c0a7e55c8..f5d46aa9347b9 100644
> > --- a/drivers/vfio/pci/Makefile
> > +++ b/drivers/vfio/pci/Makefile
> > @@ -19,3 +19,5 @@ obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> > obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu/
> >
> > obj-$(CONFIG_QAT_VFIO_PCI) += qat/
> > +
> > +obj-$(CONFIG_XE_VFIO_PCI) += xe/
> > diff --git a/drivers/vfio/pci/xe/Kconfig b/drivers/vfio/pci/xe/Kconfig
> > new file mode 100644
> > index 0000000000000..4253f2a86ca1f
> > --- /dev/null
> > +++ b/drivers/vfio/pci/xe/Kconfig
> > @@ -0,0 +1,12 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +config XE_VFIO_PCI
> > + tristate "VFIO support for Intel Graphics"
> > + depends on DRM_XE
> > + select VFIO_PCI_CORE
> > + help
> > + This option enables device specific VFIO driver variant for Intel Graphics.
> > + In addition to generic VFIO PCI functionality, it implements VFIO
> > + migration uAPI allowing userspace to enable migration for
> > + Intel Graphics SR-IOV Virtual Functions supported by the Xe driver.
> > +
> > + If you don't know what to do here, say N.
> > diff --git a/drivers/vfio/pci/xe/Makefile b/drivers/vfio/pci/xe/Makefile
> > new file mode 100644
> > index 0000000000000..13aa0fd192cd4
> > --- /dev/null
> > +++ b/drivers/vfio/pci/xe/Makefile
> > @@ -0,0 +1,3 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +obj-$(CONFIG_XE_VFIO_PCI) += xe-vfio-pci.o
> > +xe-vfio-pci-y := main.o
> > diff --git a/drivers/vfio/pci/xe/main.c b/drivers/vfio/pci/xe/main.c
> > new file mode 100644
> > index 0000000000000..ce0ed82ee4d31
> > --- /dev/null
> > +++ b/drivers/vfio/pci/xe/main.c
> > @@ -0,0 +1,568 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include <linux/anon_inodes.h>
> > +#include <linux/delay.h>
> > +#include <linux/file.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/sizes.h>
> > +#include <linux/types.h>
> > +#include <linux/vfio.h>
> > +#include <linux/vfio_pci_core.h>
> > +
> > +#include <drm/intel/xe_sriov_vfio.h>
> > +#include <drm/intel/pciids.h>
> > +
> > +struct xe_vfio_pci_migration_file {
> > + struct file *filp;
> > + /* serializes accesses to migration data */
> > + struct mutex lock;
> > + bool disabled;
>
> Move to the end to avoid a hole? Unless you know mutex leaves a gap.
> Maybe also use a bitfield u8 for consistency to flags in below struct.
I'll move it and switch to a u8 bitfield.
>
> > + struct xe_vfio_pci_core_device *xe_vdev;
> > +};
> > +
> > +struct xe_vfio_pci_core_device {
> > + struct vfio_pci_core_device core_device;
> > + struct xe_device *xe;
> > + /* PF internal control uses vfid index starting from 1 */
> > + unsigned int vfid;
> > + u8 migrate_cap:1;
> > + u8 deferred_reset:1;
> > + /* protects migration state */
> > + struct mutex state_mutex;
> > + enum vfio_device_mig_state mig_state;
> > + /* protects the reset_done flow */
> > + spinlock_t reset_lock;
> > + struct xe_vfio_pci_migration_file *migf;
> > +};
> > +
> > +#define xe_vdev_to_dev(xe_vdev) (&(xe_vdev)->core_device.pdev->dev)
> > +
> > +static void xe_vfio_pci_disable_file(struct xe_vfio_pci_migration_file *migf)
> > +{
> > + mutex_lock(&migf->lock);
> > + migf->disabled = true;
> > + mutex_unlock(&migf->lock);
> > +}
> > +
> > +static void xe_vfio_pci_put_file(struct xe_vfio_pci_core_device *xe_vdev)
> > +{
> > + xe_vfio_pci_disable_file(xe_vdev->migf);
> > + fput(xe_vdev->migf->filp);
> > + xe_vdev->migf = NULL;
> > +}
> > +
> > +static void xe_vfio_pci_reset(struct xe_vfio_pci_core_device *xe_vdev)
> > +{
> > + if (xe_vdev->migf)
> > + xe_vfio_pci_put_file(xe_vdev);
> > +
> > + xe_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> > +}
> > +
> > +static void xe_vfio_pci_state_mutex_lock(struct xe_vfio_pci_core_device *xe_vdev)
> > +{
> > + mutex_lock(&xe_vdev->state_mutex);
> > +}
> > +
> > +/*
> > + * This function is called in all state_mutex unlock cases to
> > + * handle a 'deferred_reset' if exists.
> > + */
> > +static void xe_vfio_pci_state_mutex_unlock(struct xe_vfio_pci_core_device *xe_vdev)
> > +{
> > +again:
> > + spin_lock(&xe_vdev->reset_lock);
> > + if (xe_vdev->deferred_reset) {
> > + xe_vdev->deferred_reset = false;
> > + spin_unlock(&xe_vdev->reset_lock);
> > + xe_vfio_pci_reset(xe_vdev);
> > + goto again;
> > + }
> > + mutex_unlock(&xe_vdev->state_mutex);
> > + spin_unlock(&xe_vdev->reset_lock);
> > +}
> > +
> > +static void xe_vfio_pci_reset_done(struct pci_dev *pdev)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
> > + int ret;
> > +
> > + if (!xe_vdev->vfid)
> > + return;
> > +
> > + /*
> > + * VF FLR requires additional processing done by PF driver.
> > + * The processing is done after FLR is already finished from PCIe
> > + * perspective.
> > + * In order to avoid a scenario where VF is used while PF processing
> > + * is still in progress, additional synchronization point is needed.
> > + */
> > + ret = xe_sriov_vfio_wait_flr_done(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret)
> > + dev_err(&pdev->dev, "Failed to wait for FLR: %d\n", ret);
> > +
> > + if (!xe_vdev->migrate_cap)
> > + return;
>
> It seems like the above is intended to cause a stall for all VFs,
> regardless of migration support, but vfid and xe are only set for VFs
> supporting migration. Maybe that much needs to be pulled out of
> migration_init into init_dev, which then gives the migrate_cap flag
> purpose where it otherwise seems redundant to testing xe or vfid.
Yeah - I'll remove migrate_cap and test for vfid instead.
The test for xe_vdev->vfid at the top of the function will be replaced
with a check for pdev->is_virtfn, as we do want to exit early in case
xe-vfio-pci was bound to a native PCI device (not a VF).
>
> > +
> > + /*
> > + * As the higher VFIO layers are holding locks across reset and using
> > + * those same locks with the mm_lock we need to prevent ABBA deadlock
> > + * with the state_mutex and mm_lock.
> > + * In case the state_mutex was taken already we defer the cleanup work
> > + * to the unlock flow of the other running context.
> > + */
> > + spin_lock(&xe_vdev->reset_lock);
> > + xe_vdev->deferred_reset = true;
> > + if (!mutex_trylock(&xe_vdev->state_mutex)) {
> > + spin_unlock(&xe_vdev->reset_lock);
> > + return;
> > + }
> > + spin_unlock(&xe_vdev->reset_lock);
> > + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> > +
> > + xe_vfio_pci_reset(xe_vdev);
> > +}
> > +
> > +static const struct pci_error_handlers xe_vfio_pci_err_handlers = {
> > + .reset_done = xe_vfio_pci_reset_done,
> > + .error_detected = vfio_pci_core_aer_err_detected,
> > +};
> > +
> > +static int xe_vfio_pci_open_device(struct vfio_device *core_vdev)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev =
> > + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> > + struct vfio_pci_core_device *vdev = &xe_vdev->core_device;
> > + int ret;
> > +
> > + ret = vfio_pci_core_enable(vdev);
> > + if (ret)
> > + return ret;
> > +
> > + vfio_pci_core_finish_enable(vdev);
> > +
> > + return 0;
> > +}
>
> Typically migration drivers are setting the initial RUNNING mig_state
> in their open_device function, are we implicitly relying on the
> reset_done callback for this instead?
We are relying on reset_done, and we now want to make it explicit.
I'll add proper handling here and in the close path.
>
> > +
> > +static int xe_vfio_pci_release_file(struct inode *inode, struct file *filp)
> > +{
> > + struct xe_vfio_pci_migration_file *migf = filp->private_data;
> > +
> > + xe_vfio_pci_disable_file(migf);
>
> What does calling the above accomplish? If something is racing access,
> setting disabled immediately before we destroy the lock and free the
> object isn't going to solve anything.
I think we can safely remove it - IIUC, the upper layers are taking care
of the race by taking a ref as part of read/write.
I'll do that.
>
> > + mutex_destroy(&migf->lock);
> > + kfree(migf);
> > +
> > + return 0;
> > +}
> > +
> > +static ssize_t xe_vfio_pci_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos)
> > +{
> > + struct xe_vfio_pci_migration_file *migf = filp->private_data;
> > + ssize_t ret;
> > +
> > + if (pos)
> > + return -ESPIPE;
> > +
> > + mutex_lock(&migf->lock);
> > + if (migf->disabled) {
> > + mutex_unlock(&migf->lock);
> > + return -ENODEV;
> > + }
> > +
> > + ret = xe_sriov_vfio_data_read(migf->xe_vdev->xe, migf->xe_vdev->vfid, buf, len);
> > + mutex_unlock(&migf->lock);
> > +
> > + return ret;
> > +}
> > +
> > +static const struct file_operations xe_vfio_pci_save_fops = {
> > + .owner = THIS_MODULE,
> > + .read = xe_vfio_pci_save_read,
> > + .release = xe_vfio_pci_release_file,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +static ssize_t xe_vfio_pci_resume_write(struct file *filp, const char __user *buf,
> > + size_t len, loff_t *pos)
> > +{
> > + struct xe_vfio_pci_migration_file *migf = filp->private_data;
> > + ssize_t ret;
> > +
> > + if (pos)
> > + return -ESPIPE;
> > +
> > + mutex_lock(&migf->lock);
> > + if (migf->disabled) {
> > + mutex_unlock(&migf->lock);
> > + return -ENODEV;
> > + }
> > +
> > + ret = xe_sriov_vfio_data_write(migf->xe_vdev->xe, migf->xe_vdev->vfid, buf, len);
> > + mutex_unlock(&migf->lock);
> > +
> > + return ret;
> > +}
> > +
> > +static const struct file_operations xe_vfio_pci_resume_fops = {
> > + .owner = THIS_MODULE,
> > + .write = xe_vfio_pci_resume_write,
> > + .release = xe_vfio_pci_release_file,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +static const char *vfio_dev_state_str(u32 state)
> > +{
> > + switch (state) {
> > + case VFIO_DEVICE_STATE_RUNNING: return "running";
> > + case VFIO_DEVICE_STATE_RUNNING_P2P: return "running_p2p";
> > + case VFIO_DEVICE_STATE_STOP_COPY: return "stopcopy";
> > + case VFIO_DEVICE_STATE_STOP: return "stop";
> > + case VFIO_DEVICE_STATE_RESUMING: return "resuming";
> > + case VFIO_DEVICE_STATE_ERROR: return "error";
> > + default: return "";
> > + }
> > +}
> > +
> > +enum xe_vfio_pci_file_type {
> > + XE_VFIO_FILE_SAVE = 0,
> > + XE_VFIO_FILE_RESUME,
> > +};
> > +
> > +static struct xe_vfio_pci_migration_file *
> > +xe_vfio_pci_alloc_file(struct xe_vfio_pci_core_device *xe_vdev,
> > + enum xe_vfio_pci_file_type type)
> > +{
> > + struct xe_vfio_pci_migration_file *migf;
> > + const struct file_operations *fops;
> > + int flags;
> > +
> > + migf = kzalloc(sizeof(*migf), GFP_KERNEL);
>
> GFP_KERNEL_ACCOUNT
Ok.
>
> > + if (!migf)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + fops = type == XE_VFIO_FILE_SAVE ? &xe_vfio_pci_save_fops : &xe_vfio_pci_resume_fops;
> > + flags = type == XE_VFIO_FILE_SAVE ? O_RDONLY : O_WRONLY;
> > + migf->filp = anon_inode_getfile("xe_vfio_mig", fops, migf, flags);
> > + if (IS_ERR(migf->filp)) {
> > + kfree(migf);
> > + return ERR_CAST(migf->filp);
> > + }
> > +
> > + mutex_init(&migf->lock);
> > + migf->xe_vdev = xe_vdev;
> > + xe_vdev->migf = migf;
> > +
> > + stream_open(migf->filp->f_inode, migf->filp);
> > +
> > + return migf;
> > +}
> > +
> > +static struct file *
> > +xe_vfio_set_state(struct xe_vfio_pci_core_device *xe_vdev, u32 new)
> > +{
> > + u32 cur = xe_vdev->mig_state;
> > + int ret;
> > +
> > + dev_dbg(xe_vdev_to_dev(xe_vdev),
> > + "state: %s->%s\n", vfio_dev_state_str(cur), vfio_dev_state_str(new));
> > +
> > + /*
> > + * "STOP" handling is reused for "RUNNING_P2P", as the device doesn't
> > + * have the capability to selectively block outgoing p2p DMA transfers.
> > + * While the device is allowing BAR accesses when the VF is stopped, it
> > + * is not processing any new workload requests, effectively stopping
> > + * any outgoing DMA transfers (not just p2p).
> > + * Any VRAM / MMIO accesses occurring during "RUNNING_P2P" are kept and
> > + * will be migrated to target VF during stop-copy.
> > + */
> > + if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
> > + ret = xe_sriov_vfio_suspend_device(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret)
> > + goto err;
> > +
> > + return NULL;
> > + }
> > +
> > + if ((cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_STOP) ||
> > + (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RUNNING_P2P))
> > + return NULL;
> > +
> > + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
> > + ret = xe_sriov_vfio_resume_device(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret)
> > + goto err;
> > +
> > + return NULL;
> > + }
> > +
> > + if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
> > + struct xe_vfio_pci_migration_file *migf;
> > +
> > + migf = xe_vfio_pci_alloc_file(xe_vdev, XE_VFIO_FILE_SAVE);
> > + if (IS_ERR(migf)) {
> > + ret = PTR_ERR(migf);
> > + goto err;
> > + }
> > + get_file(migf->filp);
> > +
> > + ret = xe_sriov_vfio_stop_copy_enter(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret) {
> > + fput(migf->filp);
> > + goto err;
> > + }
> > +
> > + return migf->filp;
> > + }
> > +
> > + if (cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) {
> > + if (xe_vdev->migf)
> > + xe_vfio_pci_put_file(xe_vdev);
> > +
> > + ret = xe_sriov_vfio_stop_copy_exit(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret)
> > + goto err;
> > +
> > + return NULL;
> > + }
> > +
> > + if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
> > + struct xe_vfio_pci_migration_file *migf;
> > +
> > + migf = xe_vfio_pci_alloc_file(xe_vdev, XE_VFIO_FILE_RESUME);
> > + if (IS_ERR(migf)) {
> > + ret = PTR_ERR(migf);
> > + goto err;
> > + }
> > + get_file(migf->filp);
> > +
> > + ret = xe_sriov_vfio_resume_data_enter(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret) {
> > + fput(migf->filp);
> > + goto err;
> > + }
> > +
> > + return migf->filp;
> > + }
> > +
> > + if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
> > + if (xe_vdev->migf)
> > + xe_vfio_pci_put_file(xe_vdev);
> > +
> > + ret = xe_sriov_vfio_resume_data_exit(xe_vdev->xe, xe_vdev->vfid);
> > + if (ret)
> > + goto err;
> > +
> > + return NULL;
> > + }
> > +
> > + WARN(true, "Unknown state transition %d->%d", cur, new);
> > + return ERR_PTR(-EINVAL);
> > +
> > +err:
> > + dev_dbg(xe_vdev_to_dev(xe_vdev),
> > + "Failed to transition state: %s->%s err=%d\n",
> > + vfio_dev_state_str(cur), vfio_dev_state_str(new), ret);
> > + return ERR_PTR(ret);
> > +}
> > +
> > +static struct file *
> > +xe_vfio_pci_set_device_state(struct vfio_device *core_vdev,
> > + enum vfio_device_mig_state new_state)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev =
> > + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> > + enum vfio_device_mig_state next_state;
> > + struct file *f = NULL;
> > + int ret;
> > +
> > + xe_vfio_pci_state_mutex_lock(xe_vdev);
> > + while (new_state != xe_vdev->mig_state) {
> > + ret = vfio_mig_get_next_state(core_vdev, xe_vdev->mig_state,
> > + new_state, &next_state);
> > + if (ret) {
> > + xe_sriov_vfio_error(xe_vdev->xe, xe_vdev->vfid);
> > + f = ERR_PTR(ret);
> > + break;
> > + }
> > + f = xe_vfio_set_state(xe_vdev, next_state);
> > + if (IS_ERR(f))
> > + break;
> > +
> > + xe_vdev->mig_state = next_state;
> > +
> > + /* Multiple state transitions with non-NULL file in the middle */
> > + if (f && new_state != xe_vdev->mig_state) {
> > + fput(f);
> > + f = ERR_PTR(-EINVAL);
> > + break;
> > + }
> > + }
> > + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> > +
> > + return f;
> > +}
> > +
> > +static int xe_vfio_pci_get_device_state(struct vfio_device *core_vdev,
> > + enum vfio_device_mig_state *curr_state)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev =
> > + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> > +
> > + xe_vfio_pci_state_mutex_lock(xe_vdev);
> > + *curr_state = xe_vdev->mig_state;
> > + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> > +
> > + return 0;
> > +}
> > +
> > +static int xe_vfio_pci_get_data_size(struct vfio_device *vdev,
> > + unsigned long *stop_copy_length)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev =
> > + container_of(vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> > +
> > + xe_vfio_pci_state_mutex_lock(xe_vdev);
> > + *stop_copy_length = xe_sriov_vfio_stop_copy_size(xe_vdev->xe, xe_vdev->vfid);
> > + xe_vfio_pci_state_mutex_unlock(xe_vdev);
> > +
> > + return 0;
> > +}
> > +
> > +static const struct vfio_migration_ops xe_vfio_pci_migration_ops = {
> > + .migration_set_state = xe_vfio_pci_set_device_state,
> > + .migration_get_state = xe_vfio_pci_get_device_state,
> > + .migration_get_data_size = xe_vfio_pci_get_data_size,
> > +};
> > +
> > +static void xe_vfio_pci_migration_init(struct xe_vfio_pci_core_device *xe_vdev)
> > +{
> > + struct vfio_device *core_vdev = &xe_vdev->core_device.vdev;
> > + struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
> > + struct xe_device *xe = xe_sriov_vfio_get_pf(pdev);
> > + int ret;
> > +
> > + if (!xe)
> > + return;
> > + if (!xe_sriov_vfio_migration_supported(xe))
> > + return;
>
> As above, ordering here seems wrong if FLR is expecting vfid and xe set
> independent of support migration.
>
> > +
> > + ret = pci_iov_vf_id(pdev);
> > + if (ret < 0)
> > + return;
>
> Maybe this is just defensive, but @xe being non-NULL verifies @pdev is
> a VF bound to &xe_pci_driver, so we could pretty safely just use
> 'pci_iov_vf_id(pdev) + 1' below. Thanks,
It's a result of review feedback from the previous revision, but in that
revision the xe_sriov_vfio_get_pf() helper didn't exist. I'll use it
directly below.
Thanks,
-Michał
>
> Alex
>
> > +
> > + mutex_init(&xe_vdev->state_mutex);
> > + spin_lock_init(&xe_vdev->reset_lock);
> > +
> > + /* PF internal control uses vfid index starting from 1 */
> > + xe_vdev->vfid = ret + 1;
> > + xe_vdev->xe = xe;
> > + xe_vdev->migrate_cap = true;
> > +
> > + core_vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
> > + core_vdev->mig_ops = &xe_vfio_pci_migration_ops;
> > +}
> > +
> > +static void xe_vfio_pci_migration_fini(struct xe_vfio_pci_core_device *xe_vdev)
> > +{
> > + if (!xe_vdev->migrate_cap)
> > + return;
> > +
> > + mutex_destroy(&xe_vdev->state_mutex);
> > +}
> > +
> > +static int xe_vfio_pci_init_dev(struct vfio_device *core_vdev)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev =
> > + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> > +
> > + xe_vfio_pci_migration_init(xe_vdev);
> > +
> > + return vfio_pci_core_init_dev(core_vdev);
> > +}
> > +
> > +static void xe_vfio_pci_release_dev(struct vfio_device *core_vdev)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev =
> > + container_of(core_vdev, struct xe_vfio_pci_core_device, core_device.vdev);
> > +
> > + xe_vfio_pci_migration_fini(xe_vdev);
> > +}
> > +
> > +static const struct vfio_device_ops xe_vfio_pci_ops = {
> > + .name = "xe-vfio-pci",
> > + .init = xe_vfio_pci_init_dev,
> > + .release = xe_vfio_pci_release_dev,
> > + .open_device = xe_vfio_pci_open_device,
> > + .close_device = vfio_pci_core_close_device,
> > + .ioctl = vfio_pci_core_ioctl,
> > + .device_feature = vfio_pci_core_ioctl_feature,
> > + .read = vfio_pci_core_read,
> > + .write = vfio_pci_core_write,
> > + .mmap = vfio_pci_core_mmap,
> > + .request = vfio_pci_core_request,
> > + .match = vfio_pci_core_match,
> > + .match_token_uuid = vfio_pci_core_match_token_uuid,
> > + .bind_iommufd = vfio_iommufd_physical_bind,
> > + .unbind_iommufd = vfio_iommufd_physical_unbind,
> > + .attach_ioas = vfio_iommufd_physical_attach_ioas,
> > + .detach_ioas = vfio_iommufd_physical_detach_ioas,
> > +};
> > +
> > +static int xe_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev;
> > + int ret;
> > +
> > + xe_vdev = vfio_alloc_device(xe_vfio_pci_core_device, core_device.vdev, &pdev->dev,
> > + &xe_vfio_pci_ops);
> > + if (IS_ERR(xe_vdev))
> > + return PTR_ERR(xe_vdev);
> > +
> > + dev_set_drvdata(&pdev->dev, &xe_vdev->core_device);
> > +
> > + ret = vfio_pci_core_register_device(&xe_vdev->core_device);
> > + if (ret) {
> > + vfio_put_device(&xe_vdev->core_device.vdev);
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void xe_vfio_pci_remove(struct pci_dev *pdev)
> > +{
> > + struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
> > +
> > + vfio_pci_core_unregister_device(&xe_vdev->core_device);
> > + vfio_put_device(&xe_vdev->core_device.vdev);
> > +}
> > +
> > +#define INTEL_PCI_VFIO_DEVICE(_id) { \
> > + PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, (_id)) \
> > +}
> > +
> > +static const struct pci_device_id xe_vfio_pci_table[] = {
> > + INTEL_PTL_IDS(INTEL_PCI_VFIO_DEVICE),
> > + INTEL_WCL_IDS(INTEL_PCI_VFIO_DEVICE),
> > + INTEL_BMG_IDS(INTEL_PCI_VFIO_DEVICE),
> > + {}
> > +};
> > +MODULE_DEVICE_TABLE(pci, xe_vfio_pci_table);
> > +
> > +static struct pci_driver xe_vfio_pci_driver = {
> > + .name = "xe-vfio-pci",
> > + .id_table = xe_vfio_pci_table,
> > + .probe = xe_vfio_pci_probe,
> > + .remove = xe_vfio_pci_remove,
> > + .err_handler = &xe_vfio_pci_err_handlers,
> > + .driver_managed_dma = true,
> > +};
> > +module_pci_driver(xe_vfio_pci_driver);
> > +
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Michał Winiarski <michal.winiarski@intel.com>");
> > +MODULE_DESCRIPTION("VFIO PCI driver with migration support for Intel Graphics");
>
* Re: [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
2025-11-26 11:38 ` Thomas Hellström
2025-11-26 11:39 ` Thomas Hellström
@ 2025-11-26 14:46 ` Michał Winiarski
2025-11-26 15:40 ` Alex Williamson
1 sibling, 1 reply; 19+ messages in thread
From: Michał Winiarski @ 2025-11-26 14:46 UTC (permalink / raw)
To: Thomas Hellström
Cc: Matthew Brost, Alex Williamson, Lucas De Marchi, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Michal Wajdeczko, dri-devel,
Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin, David Airlie,
Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Wed, Nov 26, 2025 at 12:38:34PM +0100, Thomas Hellström wrote:
> On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> > On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > > On Tue, 25 Nov 2025 00:08:37 +0100
> > > Michał Winiarski <michal.winiarski@intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We're now at v6, thanks for all the review feedback.
> > > >
> > > > First 24 patches are now already merged through drm-tip tree, and
> > > > I hope
> > > > we can get the remaining ones through the VFIO tree.
> > >
> > > Are all those dependencies in a topic branch somewhere? Otherwise
> > > to
> > > go in through vfio would mean we need to rebase our next branch
> > > after
> > > drm is merged. LPC is happening during this merge window, so we
> > > may
> > > not be able to achieve that leniency in ordering. Is the better
> > > approach to get acks on the variant driver and funnel the whole
> > > thing
> > > through the drm tree? Thanks,
> >
> > +1 on merging through drm if VFIO maintainers are ok with this. I've
> > done this for various drm external changes in the past with
> > maintainers
> > acks.
> >
> > Matt
>
> @Michal Winiarski
>
> Are these patches depending on any other VFIO changes that are queued
> for 6.19?
No, there's a series that I'm working on in parallel:
https://lore.kernel.org/lkml/20251120123647.3522082-1-michal.winiarski@intel.com/
which may change the VFIO driver that's part of this series.
But I believe that change could go in through fixes, once we have all the
pieces in place as part of a 6.19-rc release.
>
> If not and with proper VFIO acks, I could ask Dave / Sima to allow this
> for drm-xe-next-fixes pull. Then I also would need a strong
> justification for it being in 6.19 rather in 7.0.
>
> Otherwise we'd need to have the VFIO changes it depends on in a topic
> branch, or target this for 7.0 and hold off the merge until we can
> backmerge 6.9-rc1.
Unless Alex has a different opinion, I think the justification would be
that this is just a matter of logistics - merging through DRM would
simply be an easier process than merging through VFIO. The end result
would be the same.
Thanks,
-Michał
>
> Thanks,
> Thomas
>
>
> >
> > >
> > > Alex
>
* Re: [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration
2025-11-26 14:46 ` Michał Winiarski
@ 2025-11-26 15:40 ` Alex Williamson
0 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2025-11-26 15:40 UTC (permalink / raw)
To: Michał Winiarski
Cc: Thomas Hellström, Matthew Brost, Lucas De Marchi,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Wed, 26 Nov 2025 15:46:43 +0100
Michał Winiarski <michal.winiarski@intel.com> wrote:
> On Wed, Nov 26, 2025 at 12:38:34PM +0100, Thomas Hellström wrote:
> > On Tue, 2025-11-25 at 17:20 -0800, Matthew Brost wrote:
> > > On Tue, Nov 25, 2025 at 01:13:15PM -0700, Alex Williamson wrote:
> > > > On Tue, 25 Nov 2025 00:08:37 +0100
> > > > Michał Winiarski <michal.winiarski@intel.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We're now at v6, thanks for all the review feedback.
> > > > >
> > > > > First 24 patches are now already merged through drm-tip tree, and
> > > > > I hope
> > > > > we can get the remaining ones through the VFIO tree.
> > > >
> > > > Are all those dependencies in a topic branch somewhere? Otherwise
> > > > to
> > > > go in through vfio would mean we need to rebase our next branch
> > > > after
> > > > drm is merged. LPC is happening during this merge window, so we
> > > > may
> > > > not be able to achieve that leniency in ordering. Is the better
> > > > approach to get acks on the variant driver and funnel the whole
> > > > thing
> > > > through the drm tree? Thanks,
> > >
> > > +1 on merging through drm if VFIO maintainers are ok with this. I've
> > > done this for various drm external changes in the past with
> > > maintainers
> > > acks.
> > >
> > > Matt
> >
> > @Michal Winiarski
> >
> > Are these patches depending on any other VFIO changes that are queued
> > for 6.19?
>
> No, there's a series that I'm working on in parallel:
> https://lore.kernel.org/lkml/20251120123647.3522082-1-michal.winiarski@intel.com/
>
> Which will potentially change the VFIO driver that's part of this
> series.
> But I believe that this could go through fixes, after we have all the
> pieces in place as part of 6.19-rc release.
6.19-rc or 6.19+1 - it depends on to what extent we decide the other
variant drivers have this same problem. This driver has worked around
it in the traditional way though, and I don't think it needs to be
delayed for a universal helper.
> > If not and with proper VFIO acks, I could ask Dave / Sima to allow this
> > for drm-xe-next-fixes pull. Then I also would need a strong
> > justification for it being in 6.19 rather in 7.0.
> >
> > Otherwise we'd need to have the VFIO changes it depends on in a topic
> > branch, or target this for 7.0 and hold off the merge until we can
> > backmerge 6.9-rc1.
>
> Unless Alex has a different opinion, I think the justification would be
> that this is just a matter of logistics - merging through DRM would just
> be a simpler process than merging through VFIO. End result would be the
> same.
Yes, the result is the same; the logistics of waiting for the drm-next
merge, rebasing, and sending a 2nd vfio pull request are the overhead.
The easier route through drm still depends on getting full acks on this
and whether drm will take it. Thanks,
Alex
* Re: [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO
2025-11-25 18:34 ` Alex Williamson
@ 2025-11-26 18:21 ` Michał Winiarski
0 siblings, 0 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-26 18:21 UTC (permalink / raw)
To: Alex Williamson
Cc: Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
Jason Gunthorpe, Yishai Hadas, Kevin Tian, Shameer Kolothum,
intel-xe, linux-kernel, kvm, Matthew Brost, Michal Wajdeczko,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, Nov 25, 2025 at 11:34:03AM -0700, Alex Williamson wrote:
> On Tue, 25 Nov 2025 00:08:40 +0100
> Michał Winiarski <michal.winiarski@intel.com> wrote:
> > +/**
> > + * xe_sriov_vfio_wait_flr_done() - Wait for VF FLR completion.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * This function will wait until VF FLR is processed by PF on all tiles (or
> > + * until timeout occurs).
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_wait_flr_done(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_wait_flr(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_wait_flr_done, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_suspend_device() - Suspend VF.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * This function will pause VF on all tiles/GTs.
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_suspend_device(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_pause_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_suspend_device, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_resume_device() - Resume VF.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * This function will resume VF on all tiles.
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_resume_device(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_resume_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_device, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_stop_copy_enter() - Initiate a VF device migration data save.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_stop_copy_enter(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_trigger_save_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_enter, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_stop_copy_exit() - Finish a VF device migration data save.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_stop_copy_exit(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_finish_save_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_exit, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_resume_data_enter() - Initiate a VF device migration data restore.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_resume_data_enter(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_trigger_restore_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_data_enter, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_resume_data_exit() - Finish a VF device migration data restore.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_resume_data_exit(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_finish_restore_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_resume_data_exit, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_error() - Move VF device to error state.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * Reset is needed to move it out of error state.
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_sriov_vfio_error(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_control_stop_vf(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_error, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_data_read() - Read migration data from the VF device.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + * @buf: start address of userspace buffer
> > + * @len: requested read size from userspace
> > + *
> > + * Return: number of bytes that have been successfully read,
> > + *         0 if no more migration data is available, -errno on failure.
> > + */
> > +ssize_t xe_sriov_vfio_data_read(struct xe_device *xe, unsigned int vfid,
> > + char __user *buf, size_t len)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_migration_read(xe, vfid, buf, len);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_data_read, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_data_write() - Write migration data to the VF device.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + * @buf: start address of userspace buffer
> > + * @len: requested write size from userspace
> > + *
> > + * Return: number of bytes that have been successfully written, -errno on failure.
> > + */
> > +ssize_t xe_sriov_vfio_data_write(struct xe_device *xe, unsigned int vfid,
> > + const char __user *buf, size_t len)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_migration_write(xe, vfid, buf, len);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_data_write, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_stop_copy_size() - Get a size estimate of VF device migration data.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + * @vfid: the VF identifier (can't be 0)
> > + *
> > + * Return: migration data size in bytes or a negative error code on failure.
> > + */
> > +ssize_t xe_sriov_vfio_stop_copy_size(struct xe_device *xe, unsigned int vfid)
> > +{
> > + if (!IS_SRIOV_PF(xe))
> > + return -EPERM;
> > + if (vfid == PFID || vfid > xe_sriov_pf_num_vfs(xe))
> > + return -EINVAL;
> > +
> > + guard(xe_pm_runtime_noresume)(xe);
> > +
> > + return xe_sriov_pf_migration_size(xe, vfid);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_stop_copy_size, "xe-vfio-pci");
>
> The duplicated testing and identical structure of most of the above
> functions suggest a helper, if not a full-on definition by macro.
> Thanks,
>
> Alex
I'll convert it to use a macro definition for everything except
xe_sriov_vfio_data_write/xe_sriov_vfio_data_read.
Thanks,
-Michał
* Re: [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO
2025-11-25 14:38 ` Michal Wajdeczko
@ 2025-11-26 22:07 ` Michał Winiarski
0 siblings, 0 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-26 22:07 UTC (permalink / raw)
To: Michal Wajdeczko
Cc: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, Nov 25, 2025 at 03:38:17PM +0100, Michal Wajdeczko wrote:
>
>
> On 11/25/2025 12:08 AM, Michał Winiarski wrote:
> > Device specific VFIO driver variant for Xe will implement VF migration.
> > Export everything that's needed for migration ops.
> >
> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile | 2 +
> > drivers/gpu/drm/xe/xe_sriov_vfio.c | 276 +++++++++++++++++++++++++++++
> > include/drm/intel/xe_sriov_vfio.h | 30 ++++
> > 3 files changed, 308 insertions(+)
> > create mode 100644 drivers/gpu/drm/xe/xe_sriov_vfio.c
> > create mode 100644 include/drm/intel/xe_sriov_vfio.h
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index b848da79a4e18..0938b00a4c7fe 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -184,6 +184,8 @@ xe-$(CONFIG_PCI_IOV) += \
> > xe_sriov_pf_sysfs.o \
> > xe_tile_sriov_pf_debugfs.o
> >
> > +xe-$(CONFIG_XE_VFIO_PCI) += xe_sriov_vfio.o
>
> hmm, shouldn't we also check for CONFIG_PCI_IOV ?
> otherwise, some PF functions might not be available
> or there some other implicit rule in Kconfig?
I did compile-test without CONFIG_PCI_IOV at some point, and it seems to
build fine for me.
But yeah - it should probably be pulled under CONFIG_PCI_IOV just like
other SR-IOV related files.
I'll do that (+ stubs for when CONFIG_PCI_IOV is disabled).
>
> > +
> > # include helpers for tests even when XE is built-in
> > ifdef CONFIG_DRM_XE_KUNIT_TEST
> > xe-y += tests/xe_kunit_helpers.o
> > diff --git a/drivers/gpu/drm/xe/xe_sriov_vfio.c b/drivers/gpu/drm/xe/xe_sriov_vfio.c
> > new file mode 100644
> > index 0000000000000..785f9a5027d10
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_sriov_vfio.c
> > @@ -0,0 +1,276 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2025 Intel Corporation
> > + */
> > +
> > +#include <drm/intel/xe_sriov_vfio.h>
> > +#include <linux/cleanup.h>
> > +
> > +#include "xe_pci.h"
> > +#include "xe_pm.h"
> > +#include "xe_sriov_pf_control.h"
> > +#include "xe_sriov_pf_helpers.h"
> > +#include "xe_sriov_pf_migration.h"
> > +
> > +/**
> > + * xe_sriov_vfio_get_pf() - Get PF &xe_device.
> > + * @pdev: the VF &pci_dev device
> > + *
> > + * Return: pointer to PF &xe_device, NULL otherwise.
> > + */
> > +struct xe_device *xe_sriov_vfio_get_pf(struct pci_dev *pdev)
> > +{
> > + return xe_pci_to_pf_device(pdev);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_get_pf, "xe-vfio-pci");
> > +
> > +/**
> > + * xe_sriov_vfio_migration_supported() - Check if migration is supported.
> > + * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
> > + *
> > + * Return: true if migration is supported, false otherwise.
> > + */
> > +bool xe_sriov_vfio_migration_supported(struct xe_device *xe)
> > +{
>
> hmm, I'm wondering if maybe we should also check for NULL xe in all those
> functions, as above helper function might return NULL in some unlikely case
>
> but maybe this is too defensive
I think it's too defensive.
xe_sriov_vfio_get_pf() is used in one place, and its return value is
checked. Worst case, a missing check would be caught early, as it would
explode immediately with a NULL-ptr-deref.
>
> > + if (!IS_SRIOV_PF(xe))
> > + return false;
> > +
> > + return xe_sriov_pf_migration_supported(xe);
> > +}
> > +EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_migration_supported, "xe-vfio-pci");
> > +
>
> everything else lgtm, so:
>
> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>
Thanks,
-Michał
* Re: [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV VF migration
2025-11-25 14:26 ` Michal Wajdeczko
@ 2025-11-26 22:07 ` Michał Winiarski
0 siblings, 0 replies; 19+ messages in thread
From: Michał Winiarski @ 2025-11-26 22:07 UTC (permalink / raw)
To: Michal Wajdeczko
Cc: Alex Williamson, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, Jason Gunthorpe, Yishai Hadas, Kevin Tian,
Shameer Kolothum, intel-xe, linux-kernel, kvm, Matthew Brost,
dri-devel, Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin,
David Airlie, Simona Vetter, Lukasz Laguna, Christoph Hellwig
On Tue, Nov 25, 2025 at 03:26:38PM +0100, Michal Wajdeczko wrote:
>
>
> On 11/25/2025 12:08 AM, Michał Winiarski wrote:
> > All of the necessary building blocks are now in place to support SR-IOV
> > VF migration.
> > Flip the enable/disable logic to match VF code and disable the feature
> > only for platforms that don't meet the necessary prerequisites.
> >
>
> I guess you should mention that "to allow more testing and experiments,
> on DEBUG builds any missing prerequisites will be ignored"
Sure, let's add it.
>
> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c | 9 +++++
> > drivers/gpu/drm/xe/xe_sriov_pf_migration.c | 35 ++++++++++++++++---
> > drivers/gpu/drm/xe/xe_sriov_pf_migration.h | 1 +
> > .../gpu/drm/xe/xe_sriov_pf_migration_types.h | 4 +--
> > 4 files changed, 42 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > index d5d918ddce4fe..3174a8dee779e 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > @@ -17,6 +17,7 @@
> > #include "xe_gt_sriov_pf_helpers.h"
> > #include "xe_gt_sriov_pf_migration.h"
> > #include "xe_gt_sriov_printk.h"
> > +#include "xe_guc.h"
> > #include "xe_guc_buf.h"
> > #include "xe_guc_ct.h"
> > #include "xe_migrate.h"
> > @@ -1023,6 +1024,12 @@ static void action_ring_cleanup(void *arg)
> > ptr_ring_cleanup(r, destroy_pf_packet);
> > }
> >
> > +static void pf_gt_migration_check_support(struct xe_gt *gt)
> > +{
> > + if (GUC_FIRMWARE_VER(&gt->uc.guc) < MAKE_GUC_VER(70, 54, 0))
> > + xe_sriov_pf_migration_disable(gt_to_xe(gt), "requires GuC version >= 70.54.0");
> > +}
> > +
> > /**
> > * xe_gt_sriov_pf_migration_init() - Initialize support for VF migration.
> > * @gt: the &xe_gt
> > @@ -1039,6 +1046,8 @@ int xe_gt_sriov_pf_migration_init(struct xe_gt *gt)
> >
> > xe_gt_assert(gt, IS_SRIOV_PF(xe));
> >
> > + pf_gt_migration_check_support(gt);
> > +
> > if (!pf_migration_supported(gt))
> > return 0;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
> > index de06cc690fc81..6c4b16409cc9a 100644
> > --- a/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
> > +++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration.c
> > @@ -46,13 +46,37 @@ bool xe_sriov_pf_migration_supported(struct xe_device *xe)
> > {
> > xe_assert(xe, IS_SRIOV_PF(xe));
> >
> > - return xe->sriov.pf.migration.supported;
> > + return IS_ENABLED(CONFIG_DRM_XE_DEBUG) || !xe->sriov.pf.migration.disabled;
> > }
> >
> > -static bool pf_check_migration_support(struct xe_device *xe)
> > +/**
> > + * xe_sriov_pf_migration_disable() - Turn off SR-IOV VF migration support on PF.
> > + * @xe: the &xe_device instance.
> > + * @fmt: format string for the log message, to be combined with following VAs.
> > + */
> > +void xe_sriov_pf_migration_disable(struct xe_device *xe, const char *fmt, ...)
> > +{
> > + struct va_format vaf;
> > + va_list va_args;
> > +
> > + xe_assert(xe, IS_SRIOV_PF(xe));
> > +
> > + va_start(va_args, fmt);
> > + vaf.fmt = fmt;
> > + vaf.va = &va_args;
> > + xe_sriov_notice(xe, "migration %s: %pV\n",
> > + IS_ENABLED(CONFIG_DRM_XE_DEBUG) ?
> > + "missing prerequisite" : "disabled",
> > + &vaf);
> > + va_end(va_args);
> > +
> > + xe->sriov.pf.migration.disabled = true;
> > +}
> > +
> > +static void pf_migration_check_support(struct xe_device *xe)
> > {
> > - /* XXX: for now this is for feature enabling only */
> > - return IS_ENABLED(CONFIG_DRM_XE_DEBUG);
> > + if (!xe_device_has_memirq(xe))
> > + xe_sriov_pf_migration_disable(xe, "requires memory-based IRQ support");
> > }
> >
> > static void pf_migration_cleanup(void *arg)
> > @@ -77,7 +101,8 @@ int xe_sriov_pf_migration_init(struct xe_device *xe)
> >
> > xe_assert(xe, IS_SRIOV_PF(xe));
> >
> > - xe->sriov.pf.migration.supported = pf_check_migration_support(xe);
> > + pf_migration_check_support(xe);
> > +
> > if (!xe_sriov_pf_migration_supported(xe))
> > return 0;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration.h b/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
> > index b806298a0bb62..f8f408df84813 100644
> > --- a/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
> > +++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration.h
> > @@ -14,6 +14,7 @@ struct xe_sriov_packet;
> >
> > int xe_sriov_pf_migration_init(struct xe_device *xe);
> > bool xe_sriov_pf_migration_supported(struct xe_device *xe);
> > +void xe_sriov_pf_migration_disable(struct xe_device *xe, const char *fmt, ...);
> > int xe_sriov_pf_migration_restore_produce(struct xe_device *xe, unsigned int vfid,
> > struct xe_sriov_packet *data);
> > struct xe_sriov_packet *
> > diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h b/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
> > index 363d673ee1dd5..7d9a8a278d915 100644
> > --- a/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sriov_pf_migration_types.h
> > @@ -14,8 +14,8 @@
> > * struct xe_sriov_pf_migration - Xe device level VF migration data
> > */
> > struct xe_sriov_pf_migration {
> > - /** @supported: indicates whether VF migration feature is supported */
> > - bool supported;
> > + /** @disabled: indicates whether VF migration feature is disabled */
> > + bool disabled;
> > };
> >
> > /**
>
> otherwise lgtm,
>
> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>
Thanks,
-Michał
Thread overview: 19+ messages
2025-11-24 23:08 [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 1/4] drm/xe/pf: Enable SR-IOV " Michał Winiarski
2025-11-25 14:26 ` Michal Wajdeczko
2025-11-26 22:07 ` Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 2/4] drm/xe/pci: Introduce a helper to allow VF access to PF xe_device Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 3/4] drm/xe/pf: Export helpers for VFIO Michał Winiarski
2025-11-25 14:38 ` Michal Wajdeczko
2025-11-26 22:07 ` Michał Winiarski
2025-11-25 18:34 ` Alex Williamson
2025-11-26 18:21 ` Michał Winiarski
2025-11-24 23:08 ` [PATCH v6 4/4] vfio/xe: Add device specific vfio_pci driver variant for Intel graphics Michał Winiarski
2025-11-25 20:08 ` Alex Williamson
2025-11-26 11:59 ` Michał Winiarski
2025-11-25 20:13 ` [PATCH v6 0/4] vfio/xe: Add driver variant for Xe VF migration Alex Williamson
2025-11-26 1:20 ` Matthew Brost
2025-11-26 11:38 ` Thomas Hellström
2025-11-26 11:39 ` Thomas Hellström
2025-11-26 14:46 ` Michał Winiarski
2025-11-26 15:40 ` Alex Williamson