public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files
@ 2026-03-23 23:57 David Matlack
  2026-03-23 23:57 ` [PATCH v3 01/24] liveupdate: Export symbols needed by modules David Matlack
                   ` (24 more replies)
  0 siblings, 25 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

This series can be found on GitHub:

  https://github.com/dmatlack/linux/tree/liveupdate/vfio/cdev/v3

This series adds the base support to preserve a VFIO device file across
a Live Update. "Base support" means that this allows userspace to
safely preserve a VFIO device file with LIVEUPDATE_SESSION_PRESERVE_FD
and retrieve it with  LIVEUPDATE_SESSION_RETRIEVE_FD, but the device
itself is not preserved in a fully running state across Live Update.

This series aims to provide a foundation on which to build the rest of
the device preservation infrastructure, including:

 1. Preservation of iommufd files [1]
 2. Preservation of IOMMU driver state
 3. Preservation of PCI state (BAR resources, device state, bridge state, ...)
 4. Preservation of vfio-pci driver state

Steps 1 and 2 are already in-progress on the mailing list. We are
working on a detailed roadmap for steps 3 and 4.

Testing
-------

The patches at the end of this series provide comprehensive selftests
for the new code added by this series. The selftests have been validated
in both a VM environment using a virtio-net PCIe device, and in a
baremetal environment on an Intel EMR server with an Intel DSA PCIe
device.

Here is an example of how to run the new selftests:

vfio_pci_liveupdate_uapi_test:

  $ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0
  $ tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test 0000:00:04.0
  $ tools/testing/selftests/vfio/scripts/cleanup.sh

vfio_pci_liveupdate_kexec_test:

  $ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0
  $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 1 0000:00:04.0
  $ kexec ...

  $ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0
  $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 2 0000:00:04.0
  $ tools/testing/selftests/vfio/scripts/cleanup.sh

It is also possible to run vfio_pci_liveupdate_kexec_test multiple times
to preserve multiple devices simultaneously across a Live Update. This
series has been tested with up to 8 devices concurrently preserved.

Dependencies
------------

This series is built on top of v7.0-rc4 plus a series from Pasha
Tatashin to fix the module refcounting in FLB:

  https://lore.kernel.org/lkml/20260318141637.1870220-10-pasha.tatashin@soleen.com/

Changelog
---------

v3:
 - Add logging & documentation for pci=assign-busses overrides (Pranjal)
 - Use 2026 in drivers/pci/liveupdate.c (Pranjal)
 - Use 2026 in drivers/vfio/pci/vfio_pci_liveupdate.c (Pranjal)
 - Drop incoming/outgoing from PCI APIs (Sami)
 - Eliminate duplicate extern declarations for vfio_device_fops
   (Pranjal)
 - Keep struct vfio_device_file private (Pranjal)
 - Add comment about not supporting hot-plug (Pranjal)
 - Add comment about not supporting VFs (Sami)
 - Better error handling for liveupdate_flb_get_incoming() (Pranjal)
 - Remove liveupdate_enabled() checks (Zhu)
 - Remove liveupdate_enabled() checks in vfio_pci_liveupdate_init() (Pranjal)
 - Drop IOMMU reference from bus number commit message (Bjorn)
 - Add fabric rationale to commit message (Jason)
 - Swap incoming ... outgoing ordering in commit message (Bjorn)
 - Use vfio_device_cdev_opened() instead of df->group (Alex)
 - Add comments for CONFIG_VFIO_PCI_ZDEV_KVM (Alex)
 - Add comments for vfio_pci_is_intel_display() (Alex)
 - Use pci_dev_try_lock() in freeze (Alex)
 - Fix device reset locking in freeze() (me)
 - Use u32 for domain in PCI (Bjorn)
 - Use u32 for domain in VFIO (Bjorn)
 - Make pci_liveupdate_incoming_nr_devices() private to drivers/pci/ (Bjorn)
 - Fix dev->liveupdate_incoming readability (Bjorn)
 - Take pci_ser_delete() out of WARN_ON_ONCE() (Bjorn)
 - Drop reference to userspace & files from PCI commit message (Bjorn)
 - Rename __vfio_device_fops_cdev_open() to vfio_device_cdev_open_file() (Alex)
 - Fix NULL pointer dereference in release() (Alex)
 - Handle return value of pci_liveupdate_outgoing_preserve() (Alex)
 - Make pci_liveupdate_unregister_fh() unabe to fail. (Alex)
 - Move vfio_liveupdate_incoming_is_preserved() to drivers/vfio/vfio.h (Alex)
 - Add vfio_pci_core_probe_reset() (Alex)
 - Forward declare ser struct in include/linux/vfio_pci_core.h (Alex)
 - Bump compatibility string when adding reset_works (Alex)
 - How will userspace detect partial preservation? (Alex)
 - Require single device per iommu_group (Jason)
 - Rename pci_liveupdate_register_fh() to pci_liveupdate_register_flb() (Vipin)
 - Use ksft_exit_skip() and SKIP() (Vipin)
 - Move documentation to code (Vipin)
 - Use __u64 instead of int for token in Live Update selftest helpers (Gemini)
 - Add documentation for drivers/pci/liveupdate.c (me)

v2: https://lore.kernel.org/kvm/20260129212510.967611-1-dmatlack@google.com/

v1: https://lore.kernel.org/kvm/20251126193608.2678510-1-dmatlack@google.com/

rfc: https://lore.kernel.org/kvm/20251018000713.677779-1-vipinsh@google.com/

Cc: Pranjal Shrivastava <praan@google.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Adithya Jayachandran <ajayachandra@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: William Tu <witu@nvidia.com>
Cc: Jacob Pan <jacob.pan@linux.microsoft.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Josh Hilke <jrhilke@google.com>
Cc: David Rientjes <rientjes@google.com>

[1] https://lore.kernel.org/linux-iommu/20251202230303.1017519-1-skhawaja@google.com/

David Matlack (15):
  liveupdate: Export symbols needed by modules
  PCI: Add API to track PCI devices preserved across Live Update
  PCI: Require Live Update preserved devices are in singleton
    iommu_groups
  PCI: Inherit bus numbers from previous kernel during Live Update
  docs: liveupdate: Add documentation for PCI
  vfio/pci: Notify PCI subsystem about devices preserved across Live
    Update
  vfio: Enforce preserved devices are retrieved via
    LIVEUPDATE_SESSION_RETRIEVE_FD
  vfio/pci: Store incoming Live Update state in struct
    vfio_pci_core_device
  docs: liveupdate: Add documentation for VFIO PCI
  vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED
  vfio: selftests: Add vfio_pci_liveupdate_uapi_test
  vfio: selftests: Expose iommu_modes to tests
  vfio: selftests: Expose low-level helper routines for setting up
    struct vfio_pci_device
  vfio: selftests: Verify that opening VFIO device fails during Live
    Update
  vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test

Vipin Sharma (9):
  vfio/pci: Register a file handler with Live Update Orchestrator
  vfio/pci: Preserve vfio-pci device files across Live Update
  vfio/pci: Retrieve preserved device files after Live Update
  vfio/pci: Skip reset of preserved device after Live Update
  selftests/liveupdate: Move luo_test_utils.* into a reusable library
  selftests/liveupdate: Add helpers to preserve/retrieve FDs
  vfio: selftests: Build liveupdate library in VFIO selftests
  vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD
  vfio: selftests: Add vfio_pci_liveupdate_kexec_test

 Documentation/PCI/liveupdate.rst              |  23 +
 .../admin-guide/kernel-parameters.txt         |   6 +-
 Documentation/core-api/liveupdate.rst         |   2 +
 .../driver-api/vfio_pci_liveupdate.rst        |  23 +
 MAINTAINERS                                   |   2 +
 drivers/pci/Kconfig                           |  11 +
 drivers/pci/Makefile                          |   1 +
 drivers/pci/liveupdate.c                      | 415 ++++++++++++++++++
 drivers/pci/pci.h                             |  14 +
 drivers/pci/probe.c                           |  37 +-
 drivers/vfio/device_cdev.c                    |  63 ++-
 drivers/vfio/group.c                          |   9 +
 drivers/vfio/pci/Kconfig                      |  11 +
 drivers/vfio/pci/Makefile                     |   1 +
 drivers/vfio/pci/vfio_pci.c                   |  14 +-
 drivers/vfio/pci/vfio_pci_core.c              |  90 ++--
 drivers/vfio/pci/vfio_pci_liveupdate.c        | 328 ++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h              |  18 +
 drivers/vfio/vfio.h                           |  18 +
 drivers/vfio/vfio_main.c                      |  16 +-
 include/linux/kho/abi/pci.h                   |  62 +++
 include/linux/kho/abi/vfio_pci.h              |  45 ++
 include/linux/pci.h                           |  41 ++
 include/linux/vfio.h                          |  13 +
 include/linux/vfio_pci_core.h                 |   2 +
 kernel/liveupdate/luo_core.c                  |   1 +
 kernel/liveupdate/luo_file.c                  |   2 +
 tools/testing/selftests/liveupdate/.gitignore |   1 +
 tools/testing/selftests/liveupdate/Makefile   |  14 +-
 .../include/libliveupdate.h}                  |  11 +-
 .../selftests/liveupdate/lib/libliveupdate.mk |  20 +
 .../{luo_test_utils.c => lib/liveupdate.c}    |  43 +-
 .../selftests/liveupdate/luo_kexec_simple.c   |   2 +-
 .../selftests/liveupdate/luo_multi_session.c  |   2 +-
 tools/testing/selftests/vfio/Makefile         |  23 +-
 .../vfio/lib/include/libvfio/iommu.h          |   2 +
 .../lib/include/libvfio/vfio_pci_device.h     |   8 +
 tools/testing/selftests/vfio/lib/iommu.c      |   4 +-
 .../selftests/vfio/lib/vfio_pci_device.c      |  60 ++-
 .../vfio/vfio_pci_liveupdate_kexec_test.c     | 256 +++++++++++
 .../vfio/vfio_pci_liveupdate_uapi_test.c      |  93 ++++
 41 files changed, 1715 insertions(+), 92 deletions(-)
 create mode 100644 Documentation/PCI/liveupdate.rst
 create mode 100644 Documentation/driver-api/vfio_pci_liveupdate.rst
 create mode 100644 drivers/pci/liveupdate.c
 create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
 create mode 100644 include/linux/kho/abi/pci.h
 create mode 100644 include/linux/kho/abi/vfio_pci.h
 rename tools/testing/selftests/liveupdate/{luo_test_utils.h => lib/include/libliveupdate.h} (80%)
 create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk
 rename tools/testing/selftests/liveupdate/{luo_test_utils.c => lib/liveupdate.c} (89%)
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c


base-commit: 3251ac3df4374e4e94e4fbecf49ad1573933018a
prerequisite-patch-id: 37ebd38e2247ccb02e6a6c7543a378534d69a038
prerequisite-patch-id: 19c8469a9ae5cd13618481dd75012444330f80f9
prerequisite-patch-id: 2ba04f598993e2c2d941c4cd3f2dc1e98905d68b
prerequisite-patch-id: fcab928f6ee32a145667822ceca5e4f1f567d530
prerequisite-patch-id: 6d11347278609426baa9eacc1726b32b48d09a25
prerequisite-patch-id: 94a9e8a8cb6004e12de90fcc0068e8a8b12652de
prerequisite-patch-id: d658019a7ac7c82ebe4a6c6086457e27460174d3
prerequisite-patch-id: 41e68c9fc8e8c5e493497e87ca13577b3167cf80
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 01/24] liveupdate: Export symbols needed by modules
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Export liveupdate_enabled(), liveupdate_register_file_handler(), and
liveupdate_unregister_file_handler(). All of these will be used by
vfio-pci in a subsequent commit, which can be built as a module.

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 kernel/liveupdate/luo_core.c | 1 +
 kernel/liveupdate/luo_file.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
index dda7bb57d421..59d7793d9444 100644
--- a/kernel/liveupdate/luo_core.c
+++ b/kernel/liveupdate/luo_core.c
@@ -255,6 +255,7 @@ bool liveupdate_enabled(void)
 {
 	return luo_global.enabled;
 }
+EXPORT_SYMBOL_GPL(liveupdate_enabled);
 
 /**
  * DOC: LUO ioctl Interface
diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
index a38ea4975824..cdc48d49e5e5 100644
--- a/kernel/liveupdate/luo_file.c
+++ b/kernel/liveupdate/luo_file.c
@@ -866,6 +866,7 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(liveupdate_register_file_handler);
 
 /**
  * liveupdate_unregister_file_handler - Unregister a liveupdate file handler
@@ -884,3 +885,4 @@ void liveupdate_unregister_file_handler(struct liveupdate_file_handler *fh)
 		list_del(&ACCESS_PRIVATE(fh, list));
 	}
 }
+EXPORT_SYMBOL_GPL(liveupdate_unregister_file_handler);
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
  2026-03-23 23:57 ` [PATCH v3 01/24] liveupdate: Export symbols needed by modules David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-25 20:06   ` David Matlack
  2026-03-25 23:12   ` Bjorn Helgaas
  2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Add an API to enable the PCI subsystem to participate in a Live Update
and track all devices that are being preserved by drivers. Since this
support is still under development, hide it behind a new Kconfig
PCI_LIVEUPDATE that is marked experimental.

This API will be used in subsequent commits by the vfio-pci driver to
preserve VFIO devices across Live Update.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/pci/Kconfig         |  11 ++
 drivers/pci/Makefile        |   1 +
 drivers/pci/liveupdate.c    | 380 ++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h           |  14 ++
 drivers/pci/probe.c         |   2 +
 include/linux/kho/abi/pci.h |  62 ++++++
 include/linux/pci.h         |  41 ++++
 7 files changed, 511 insertions(+)
 create mode 100644 drivers/pci/liveupdate.c
 create mode 100644 include/linux/kho/abi/pci.h

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index e3f848ffb52a..05307d89c3f4 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -334,6 +334,17 @@ config VGA_ARB_MAX_GPUS
 	  Reserves space in the kernel to maintain resource locking for
 	  multiple GPUS.  The overhead for each GPU is very small.
 
+config PCI_LIVEUPDATE
+	bool "PCI Live Update Support (EXPERIMENTAL)"
+	depends on PCI && LIVEUPDATE
+	help
+	  Support for preserving PCI devices across a Live Update. This option
+	  should only be enabled by developers working on implementing this
+	  support. Once enough support as landed in the kernel, this option
+	  will no longer be marked EXPERIMENTAL.
+
+	  If unsure, say N.
+
 source "drivers/pci/hotplug/Kconfig"
 source "drivers/pci/controller/Kconfig"
 source "drivers/pci/endpoint/Kconfig"
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 41ebc3b9a518..e8d003cb6757 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_PROC_FS)		+= proc.o
 obj-$(CONFIG_SYSFS)		+= pci-sysfs.o slot.o
 obj-$(CONFIG_ACPI)		+= pci-acpi.o
 obj-$(CONFIG_GENERIC_PCI_IOMAP) += iomap.o
+obj-$(CONFIG_PCI_LIVEUPDATE)	+= liveupdate.o
 endif
 
 obj-$(CONFIG_OF)		+= of.o
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
new file mode 100644
index 000000000000..bec7b3500057
--- /dev/null
+++ b/drivers/pci/liveupdate.c
@@ -0,0 +1,380 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+
+/**
+ * DOC: PCI Live Update
+ *
+ * The PCI subsystem participates in the Live Update process to enable drivers
+ * to preserve their PCI devices across kexec.
+ *
+ * Device preservation across Live Update is built on top of the Live Update
+ * Orchestrator (LUO) support for file preservation across kexec. Userspace
+ * indicates that a device should be preserved by preserving the file associated
+ * with the device with ``ioctl(LIVEUPDATE_SESSION_PRESERVE_FD)``.
+ *
+ * .. note::
+ *    The support for preserving PCI devices across Live Update is currently
+ *    *partial* and should be considered *experimental*. It should only be
+ *    used by developers working on the implementation for the time being.
+ *
+ *    To enable the support, enable ``CONFIG_PCI_LIVEUPDATE``.
+ *
+ * Driver API
+ * ==========
+ *
+ * Drivers that support file-based device preservation must register their
+ * ``liveupdate_file_handler`` with the PCI subsystem by calling
+ * ``pci_liveupdate_register_flb()``. This ensures the PCI subsystem will be
+ * notified whenever a device file is preserved so that ``struct pci_ser``
+ * can be allocated to track all preserved devices. This struct is an ABI
+ * and is eventually handed off to the next kernel via Kexec-Handover (KHO).
+ *
+ * In the "outgoing" kernel (before kexec), drivers should then notify the PCI
+ * subsystem directly whenever the preservation status for a device changes:
+ *
+ *  * ``pci_liveupdate_preserve(pci_dev)``: The device is being preserved.
+ *
+ *  * ``pci_liveupdate_unpreserve(pci_dev)``: The device is no longer being
+ *    preserved (preservation is cancelled).
+ *
+ * In the "incoming" kernel (after kexec), drivers should notify the PCI
+ * subsystem with the following calls:
+ *
+ *  * ``pci_liveupdate_retrieve(pci_dev)``: The device file is being retrieved
+ *    by userspace.
+ *
+ *  * ``pci_liveupdate_finish(pci_dev)``: The device is done participating in
+ *    Live Update. After this point the device may no longer be even associated
+ *    with the same driver.
+ *
+ * Incoming/Outgoing
+ * =================
+ *
+ * The state of each device's participation in Live Update is stored in
+ * ``struct pci_dev``:
+ *
+ *  * ``liveupdate_outgoing``: True if the device is being preserved in the
+ *    outgoing kernel. Set in ``pci_liveupdate_preserve()`` and cleared in
+ *    ``pci_liveupdate_unpreserve()``.
+ *
+ *  * ``liveupdate_incoming``: True if the device is preserved in the incoming
+ *    kernel. Set during probing when the device is first created and cleared
+ *    in ``pci_liveupdate_finish()``.
+ *
+ * Restrictions
+ * ============
+ *
+ * Preserved devices currently have the following restrictions. Each of these
+ * may be relaxed in the future.
+ *
+ *  * The device must not be a Virtual Function (VF).
+ *
+ *  * The device must not be a Physical Function (PF).
+ *
+ * Preservation Behavior
+ * =====================
+ *
+ * The kernel preserves the following state for devices preserved across a Live
+ * Update:
+ *
+ *  * The PCI Segment, Bus, Device, and Function numbers assigned to the device
+ *    are guaranteed to remain the same across Live Update.
+ *
+ * This list will be extended in the future as new support is added.
+ *
+ * Driver Binding
+ * ==============
+ *
+ * It is the driver's responsibility for ensuring that preserved devices are not
+ * released or bound to a different driver for as long as they are preserved. In
+ * practice, this is enforced by LUO taking an extra referenced to the preserved
+ * device file for as long as it is preserved.
+ *
+ * However, there is a window of time in the incoming kernel when a device is
+ * first probed and when userspace retrieves the device file with
+ * ``LIVEUPDATE_SESSION_RETRIEVE_FD`` when the device could be bound to any
+ * driver.
+ *
+ * It is currently userspace's responsibility to ensure that the device is bound
+ * to the correct driver in this window.
+ */
+
+#include <linux/bsearch.h>
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/pci.h>
+#include <linux/liveupdate.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/pci.h>
+#include <linux/sort.h>
+
+#include "pci.h"
+
+static DEFINE_MUTEX(pci_flb_outgoing_lock);
+
+static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
+{
+	struct pci_dev *dev = NULL;
+	int max_nr_devices = 0;
+	struct pci_ser *ser;
+	unsigned long size;
+
+	/*
+	 * Don't both accounting for VFs that could be created after this
+	 * since preserving VFs is not supported yet. Also don't account
+	 * for devices that could be hot-plugged after this since preserving
+	 * hot-plugged devices across Live Update is not yet an expected
+	 * use-case.
+	 */
+	for_each_pci_dev(dev)
+		max_nr_devices++;
+
+	size = struct_size_t(struct pci_ser, devices, max_nr_devices);
+
+	ser = kho_alloc_preserve(size);
+	if (IS_ERR(ser))
+		return PTR_ERR(ser);
+
+	ser->max_nr_devices = max_nr_devices;
+
+	args->obj = ser;
+	args->data = virt_to_phys(ser);
+	return 0;
+}
+
+static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
+{
+	struct pci_ser *ser = args->obj;
+
+	WARN_ON_ONCE(ser->nr_devices);
+	kho_unpreserve_free(ser);
+}
+
+static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
+{
+	args->obj = phys_to_virt(args->data);
+	return 0;
+}
+
+static void pci_flb_finish(struct liveupdate_flb_op_args *args)
+{
+	kho_restore_free(args->obj);
+}
+
+static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
+	.preserve = pci_flb_preserve,
+	.unpreserve = pci_flb_unpreserve,
+	.retrieve = pci_flb_retrieve,
+	.finish = pci_flb_finish,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_flb pci_liveupdate_flb = {
+	.ops = &pci_liveupdate_flb_ops,
+	.compatible = PCI_LUO_FLB_COMPATIBLE,
+};
+
+#define INIT_PCI_DEV_SER(_dev) {		\
+	.domain = pci_domain_nr((_dev)->bus),	\
+	.bdf = pci_dev_id(_dev),		\
+}
+
+static int pci_dev_ser_cmp(const void *__a, const void *__b)
+{
+	const struct pci_dev_ser *a = __a, *b = __b;
+
+	return cmp_int((u64)a->domain << 16 | a->bdf,
+		       (u64)b->domain << 16 | b->bdf);
+}
+
+static struct pci_dev_ser *pci_ser_find(struct pci_ser *ser,
+					struct pci_dev *dev)
+{
+	const struct pci_dev_ser key = INIT_PCI_DEV_SER(dev);
+
+	return bsearch(&key, ser->devices, ser->nr_devices,
+		       sizeof(key), pci_dev_ser_cmp);
+}
+
+static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
+{
+	struct pci_dev_ser *dev_ser;
+	int i;
+
+	dev_ser = pci_ser_find(ser, dev);
+
+	/*
+	 * This should never happen unless there is a kernel bug or
+	 * corruption that causes the state in struct pci_ser to get
+	 * out of sync with struct pci_dev.
+	 */
+	if (pci_WARN_ONCE(dev, !dev_ser, "Cannot find preserved device!"))
+		return;
+
+	for (i = dev_ser - ser->devices; i < ser->nr_devices - 1; i++)
+		ser->devices[i] = ser->devices[i + 1];
+
+	ser->nr_devices--;
+}
+
+int pci_liveupdate_preserve(struct pci_dev *dev)
+{
+	struct pci_dev_ser new = INIT_PCI_DEV_SER(dev);
+	struct pci_ser *ser;
+	int i, ret;
+
+	/* SR-IOV is not supported yet. */
+	if (dev->is_virtfn || dev->is_physfn)
+		return -EINVAL;
+
+	guard(mutex)(&pci_flb_outgoing_lock);
+
+	if (dev->liveupdate_outgoing)
+		return -EBUSY;
+
+	ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
+	if (ret)
+		return ret;
+
+	if (ser->nr_devices == ser->max_nr_devices)
+		return -E2BIG;
+
+	for (i = ser->nr_devices; i > 0; i--) {
+		struct pci_dev_ser *prev = &ser->devices[i - 1];
+		int cmp = pci_dev_ser_cmp(&new, prev);
+
+		/*
+		 * This should never happen unless there is a kernel bug or
+		 * corruption that causes the state in struct pci_ser to get out
+		 * of sync with struct pci_dev.
+		 */
+		if (WARN_ON_ONCE(!cmp))
+			return -EBUSY;
+
+		if (cmp > 0)
+			break;
+
+		ser->devices[i] = *prev;
+	}
+
+	ser->devices[i] = new;
+	ser->nr_devices++;
+	dev->liveupdate_outgoing = true;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_preserve);
+
+void pci_liveupdate_unpreserve(struct pci_dev *dev)
+{
+	struct pci_ser *ser;
+	int ret;
+
+	/* This should never happen unless the caller (driver) is buggy */
+	if (WARN_ON_ONCE(!dev->liveupdate_outgoing))
+		return;
+
+	guard(mutex)(&pci_flb_outgoing_lock);
+
+	ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
+
+	/* This should never happen unless there is a bug in LUO */
+	if (WARN_ON_ONCE(ret))
+		return;
+
+	pci_ser_delete(ser, dev);
+	dev->liveupdate_outgoing = false;
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
+
+static int pci_liveupdate_flb_get_incoming(struct pci_ser **serp)
+{
+	int ret;
+
+	ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)serp);
+
+	/* Live Update is not enabled. */
+	if (ret == -EOPNOTSUPP)
+		return ret;
+
+	/* Live Update is enabled, but there is no incoming FLB data. */
+	if (ret == -ENODATA)
+		return ret;
+
+	/*
+	 * Live Update is enabled and there is incoming FLB data, but none of it
+	 * matches pci_liveupdate_flb.compatible.
+	 *
+	 * This could mean that no PCI FLB data was passed by the previous
+	 * kernel, but it could also mean the previous kernel used a different
+	 * compatibility string (i.e.a different ABI). The latter deserves at
+	 * least a WARN_ON_ONCE() but it cannot be distinguished from the
+	 * former.
+	 */
+	if (ret == -ENOENT) {
+		pr_info_once("PCI: No incoming FLB data detected during Live Update");
+		return ret;
+	}
+
+	/*
+	 * There is incoming FLB data that matches pci_liveupdate_flb.compatible
+	 * but it cannot be retrieved. Proceed with standard initialization as
+	 * if there was not incoming PCI FLB data.
+	 */
+	WARN_ONCE(ret, "PCI: Failed to retrieve incoming FLB data during Live Update");
+	return ret;
+}
+
+u32 pci_liveupdate_incoming_nr_devices(void)
+{
+	struct pci_ser *ser;
+
+	if (pci_liveupdate_flb_get_incoming(&ser))
+		return 0;
+
+	return ser->nr_devices;
+}
+
+void pci_liveupdate_setup_device(struct pci_dev *dev)
+{
+	struct pci_ser *ser;
+
+	if (pci_liveupdate_flb_get_incoming(&ser))
+		return;
+
+	if (!pci_ser_find(ser, dev))
+		return;
+
+	dev->liveupdate_incoming = true;
+}
+
+int pci_liveupdate_retrieve(struct pci_dev *dev)
+{
+	if (!dev->liveupdate_incoming)
+		return -EINVAL;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_retrieve);
+
+void pci_liveupdate_finish(struct pci_dev *dev)
+{
+	dev->liveupdate_incoming = false;
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_finish);
+
+int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
+{
+	return liveupdate_register_flb(fh, &pci_liveupdate_flb);
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_register_flb);
+
+void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
+{
+	liveupdate_unregister_flb(fh, &pci_liveupdate_flb);
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_unregister_flb);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 13d998fbacce..979cb9921340 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1434,4 +1434,18 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
 	(PCI_CONF1_ADDRESS(bus, dev, func, reg) | \
 	 PCI_CONF1_EXT_REG(reg))
 
+#ifdef CONFIG_PCI_LIVEUPDATE
+void pci_liveupdate_setup_device(struct pci_dev *dev);
+u32 pci_liveupdate_incoming_nr_devices(void);
+#else
+static inline void pci_liveupdate_setup_device(struct pci_dev *dev)
+{
+}
+
+static inline u32 pci_liveupdate_incoming_nr_devices(void)
+{
+	return 0;
+}
+#endif
+
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bccc7a4bdd79..c60222d45659 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2064,6 +2064,8 @@ int pci_setup_device(struct pci_dev *dev)
 	if (pci_early_dump)
 		early_dump_pci_device(dev);
 
+	pci_liveupdate_setup_device(dev);
+
 	/* Need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h
new file mode 100644
index 000000000000..7764795f6818
--- /dev/null
+++ b/include/linux/kho/abi/pci.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+
+#ifndef _LINUX_KHO_ABI_PCI_H
+#define _LINUX_KHO_ABI_PCI_H
+
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/**
+ * DOC: PCI File-Lifecycle Bound (FLB) Live Update ABI
+ *
+ * This header defines the ABI for preserving core PCI state across kexec using
+ * Live Update File-Lifecycle Bound (FLB) data.
+ *
+ * This interface is a contract. Any modification to any of the serialization
+ * structs defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the PCI_LUO_FLB_COMPATIBLE string.
+ */
+
+#define PCI_LUO_FLB_COMPATIBLE "pci-v1"
+
+/**
+ * struct pci_dev_ser - Serialized state about a single PCI device.
+ *
+ * @domain: The device's PCI domain number (segment).
+ * @bdf: The device's PCI bus, device, and function number.
+ * @reserved: Reserved (to naturally align struct pci_dev_ser).
+ */
+struct pci_dev_ser {
+	u32 domain;
+	u16 bdf;
+	u16 reserved;
+} __packed;
+
+/**
+ * struct pci_ser - PCI Subsystem Live Update State
+ *
+ * This struct tracks state about all devices that are being preserved across
+ * a Live Update for the next kernel.
+ *
+ * @max_nr_devices: The length of the devices[] flexible array.
+ * @nr_devices: The number of devices that were preserved.
+ * @devices: Flexible array of pci_dev_ser structs for each device. Guaranteed
+ *           to be sorted ascending by domain and bdf.
+ */
+struct pci_ser {
+	u64 max_nr_devices;
+	u64 nr_devices;
+	struct pci_dev_ser devices[];
+} __packed;
+
+/* Ensure all elements of devices[] are naturally aligned. */
+static_assert(offsetof(struct pci_ser, devices) % sizeof(unsigned long) == 0);
+static_assert(sizeof(struct pci_dev_ser) % sizeof(unsigned long) == 0);
+
+#endif /* _LINUX_KHO_ABI_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1c270f1d5123..27ee9846a2fd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -40,6 +40,7 @@
 #include <linux/resource_ext.h>
 #include <linux/msi_api.h>
 #include <uapi/linux/pci.h>
+#include <linux/liveupdate.h>
 
 #include <linux/pci_ids.h>
 
@@ -591,6 +592,10 @@ struct pci_dev {
 	u8		tph_mode;	/* TPH mode */
 	u8		tph_req_type;	/* TPH requester type */
 #endif
+#ifdef CONFIG_PCI_LIVEUPDATE
+	unsigned int	liveupdate_incoming:1;	/* Preserved by previous kernel */
+	unsigned int	liveupdate_outgoing:1;	/* Preserved for next kernel */
+#endif
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
@@ -2871,4 +2876,40 @@ void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 	WARN_ONCE(condition, "%s %s: " fmt, \
 		  dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg)
 
+#ifdef CONFIG_PCI_LIVEUPDATE
+int pci_liveupdate_preserve(struct pci_dev *dev);
+void pci_liveupdate_unpreserve(struct pci_dev *dev);
+int pci_liveupdate_retrieve(struct pci_dev *dev);
+void pci_liveupdate_finish(struct pci_dev *dev);
+int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh);
+void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh);
+#else
+static inline int pci_liveupdate_preserve(struct pci_dev *dev)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void pci_liveupdate_unpreserve(struct pci_dev *dev)
+{
+}
+
+static inline int pci_liveupdate_retrieve(struct pci_dev *dev)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void pci_liveupdate_finish(struct pci_dev *dev)
+{
+}
+
+static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
+{
+}
+#endif
+
 #endif /* LINUX_PCI_H */
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
  2026-03-23 23:57 ` [PATCH v3 01/24] liveupdate: Export symbols needed by modules David Matlack
  2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-24 13:07   ` Yi Liu
  2026-03-25 23:13   ` Bjorn Helgaas
  2026-03-23 23:57 ` [PATCH v3 04/24] PCI: Inherit bus numbers from previous kernel during Live Update David Matlack
                   ` (21 subsequent siblings)
  24 siblings, 2 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Require that Live Update preserved devices are in singleton iommu_groups
during preservation (outgoing kernel) and retrieval (incoming kernel).

PCI devices preserved across Live Update will be allowed to perform
memory transactions throughout the Live Update. Thus IOMMU groups for
preserved devices must remain fixed. Since all current use cases for
Live Update are for PCI devices in singleton iommu_groups, require that
as a starting point. This avoids the complexity of needing to enforce
arbitrary iommu_group topologies while still allowing all current use
cases.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index bec7b3500057..a3dbe06650ff 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -75,6 +75,8 @@
  *
  *  * The device must not be a Physical Function (PF).
  *
+ *  * The device must be the only device in its IOMMU group.
+ *
  * Preservation Behavior
  * =====================
  *
@@ -105,6 +107,7 @@
 
 #include <linux/bsearch.h>
 #include <linux/io.h>
+#include <linux/iommu.h>
 #include <linux/kexec_handover.h>
 #include <linux/kho/abi/pci.h>
 #include <linux/liveupdate.h>
@@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
 	ser->nr_devices--;
 }
 
+static int count_devices(struct device *dev, void *__nr_devices)
+{
+	(*(int *)__nr_devices)++;
+	return 0;
+}
+
+static int pci_liveupdate_validate_iommu_group(struct pci_dev *dev)
+{
+	struct iommu_group *group;
+	int nr_devices = 0;
+
+	group = iommu_group_get(&dev->dev);
+	if (group) {
+		iommu_group_for_each_dev(group, &nr_devices, count_devices);
+		iommu_group_put(group);
+	}
+
+	if (nr_devices != 1) {
+		pci_warn(dev, "Live Update preserved devices must be in singleton iommu groups!");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 int pci_liveupdate_preserve(struct pci_dev *dev)
 {
 	struct pci_dev_ser new = INIT_PCI_DEV_SER(dev);
@@ -232,6 +260,10 @@ int pci_liveupdate_preserve(struct pci_dev *dev)
 	if (dev->is_virtfn || dev->is_physfn)
 		return -EINVAL;
 
+	ret = pci_liveupdate_validate_iommu_group(dev);
+	if (ret)
+		return ret;
+
 	guard(mutex)(&pci_flb_outgoing_lock);
 
 	if (dev->liveupdate_outgoing)
@@ -357,7 +389,7 @@ int pci_liveupdate_retrieve(struct pci_dev *dev)
 	if (!dev->liveupdate_incoming)
 		return -EINVAL;
 
-	return 0;
+	return pci_liveupdate_validate_iommu_group(dev);
 }
 EXPORT_SYMBOL_GPL(pci_liveupdate_retrieve);
 
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 04/24] PCI: Inherit bus numbers from previous kernel during Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (2 preceding siblings ...)
  2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-23 23:57 ` [PATCH v3 05/24] docs: liveupdate: Add documentation for PCI David Matlack
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Inherit bus numbers from the previous kernel during a Live Update when
one or more PCI devices are being preserved, even if pci=assign-busses
is enabled.

During a Live Update, preserved devices will be allowed to continue
performing memory transactions. Thus the kernel cannot change the fabric
topology, including changing bus numbers, since that would requiring
disabling and flushing any memory transactions first.

So if pci=assign-busses is enabled, ignore it during the Live Update and
inherit all bus numbers assigned by the previous kernel. This will not
break users that rely on pci=assign-busses for their system to function
correctly since the system can be assumed to be in a functional state
already if a Live Update is underway. In other words, pci=assign-busses
would establish a functional topology during the initial cold boot, and
then that topology would remain fixed across any subsequent Live
Updates.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../admin-guide/kernel-parameters.txt         |  6 +++-
 drivers/pci/liveupdate.c                      |  5 ++-
 drivers/pci/probe.c                           | 35 ++++++++++++++++---
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 03a550630644..beff9f3f8e3b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5156,7 +5156,11 @@ Kernel parameters
 				explicitly which ones they are.
 		assign-busses	[X86] Always assign all PCI bus
 				numbers ourselves, overriding
-				whatever the firmware may have done.
+				whatever the firmware may have done. Ignored
+				during a Live Update, where the kernel must
+				inherit the PCI topology (including bus numbers)
+				to avoid interrupting ongoing memory
+				transactions of preserved devices.
 		usepirqmask	[X86] Honor the possible IRQ mask stored
 				in the BIOS $PIR table. This is needed on
 				some systems with broken BIOSes, notably
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index a3dbe06650ff..c1251f4f8438 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -84,7 +84,10 @@
  * Update:
  *
  *  * The PCI Segment, Bus, Device, and Function numbers assigned to the device
- *    are guaranteed to remain the same across Live Update.
+ *    are guaranteed to remain the same across Live Update. Note that this is
+ *    true even if pci=assign-busses is set on the command line. The kernel will
+ *    always inherit bus numbers assigned by the previous kernel during a Live
+ *    Update.
  *
  * This list will be extended in the future as new support is added.
  *
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c60222d45659..165056d71e66 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1369,6 +1369,34 @@ bool pci_ea_fixed_busnrs(struct pci_dev *dev, u8 *sec, u8 *sub)
 	return true;
 }
 
+static bool pci_assign_all_busses(void)
+{
+	if (!pcibios_assign_all_busses())
+		return false;
+
+	/*
+	 * During a Live Update, preserved devices are are allowed to continue
+	 * performing memory transactions. Thus the kernel cannot change the
+	 * fabric topology, including changing bus numbers, since that would
+	 * requiring disabling and flushing any memory transactions first.
+	 *
+	 * So if pci=assign-busses is enabled, ignore it during the Live Update
+	 * and inherit all bus numbers assigned by the previous kernel. This
+	 * will not break users that rely on pci=assign-busses for their system
+	 * to function correctly since the system can be assumed to be in a
+	 * functional state already if a Live Update is underway. In other
+	 * words, pci=assign-busses should be used to establish working bus
+	 * numbers during the initial cold boot, and then that topology would
+	 * then remain fixed across any subsequent Live Updates.
+	 */
+	if (pci_liveupdate_incoming_nr_devices()) {
+		pr_info_once("Ignoring pci=assign-busses and inheriting bus numbers during Live Update\n");
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * pci_scan_bridge_extend() - Scan buses behind a bridge
  * @bus: Parent bus the bridge is on
@@ -1396,6 +1424,7 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 				  int max, unsigned int available_buses,
 				  int pass)
 {
+	const bool assign_all_busses = pci_assign_all_busses();
 	struct pci_bus *child;
 	u32 buses;
 	u16 bctl;
@@ -1448,8 +1477,7 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 		goto out;
 	}
 
-	if ((secondary || subordinate) &&
-	    !pcibios_assign_all_busses() && !broken) {
+	if ((secondary || subordinate) && !assign_all_busses && !broken) {
 		unsigned int cmax, buses;
 
 		/*
@@ -1491,8 +1519,7 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 		 * do in the second pass.
 		 */
 		if (!pass) {
-			if (pcibios_assign_all_busses() || broken)
-
+			if (assign_all_busses || broken)
 				/*
 				 * Temporarily disable forwarding of the
 				 * configuration cycles on all bridges in
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 05/24] docs: liveupdate: Add documentation for PCI
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (3 preceding siblings ...)
  2026-03-23 23:57 ` [PATCH v3 04/24] PCI: Inherit bus numbers from previous kernel during Live Update David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-23 23:57 ` [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Add documentation files for the PCI subsystem's participation in Live
Update, generated from the kernel-doc comments the code.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 Documentation/PCI/liveupdate.rst      | 23 +++++++++++++++++++++++
 Documentation/core-api/liveupdate.rst |  1 +
 2 files changed, 24 insertions(+)
 create mode 100644 Documentation/PCI/liveupdate.rst

diff --git a/Documentation/PCI/liveupdate.rst b/Documentation/PCI/liveupdate.rst
new file mode 100644
index 000000000000..04c9b675e8df
--- /dev/null
+++ b/Documentation/PCI/liveupdate.rst
@@ -0,0 +1,23 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+===========================
+PCI Support for Live Update
+===========================
+
+.. kernel-doc:: drivers/pci/liveupdate.c
+   :doc: PCI Live Update
+
+PCI Preservation ABI
+====================
+
+.. kernel-doc:: include/linux/kho/abi/pci.h
+   :doc: PCI File-Lifecycle Bound (FLB) Live Update ABI
+
+.. kernel-doc:: include/linux/kho/abi/pci.h
+   :internal:
+
+See Also
+========
+
+ * :doc:`/core-api/liveupdate`
+ * :doc:`/core-api/kho/index`
diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index 5a292d0f3706..d56a7760978a 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -70,3 +70,4 @@ See Also
 
 - :doc:`Live Update uAPI </userspace-api/liveupdate>`
 - :doc:`/core-api/kho/index`
+- :doc:`PCI </PCI/liveupdate>`
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (4 preceding siblings ...)
  2026-03-23 23:57 ` [PATCH v3 05/24] docs: liveupdate: Add documentation for PCI David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-24 13:07   ` Yi Liu
  2026-03-23 23:57 ` [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update David Matlack
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Register a live update file handler for vfio-pci device files. Add stub
implementations of all required callbacks so that registration does not
fail (i.e. to avoid breaking git-bisect).

This file handler will be extended in subsequent commits to enable a
device bound to vfio-pci to run without interruption while the host is
going through a kexec Live Update.

Put this support behind a new Kconfig VFIO_PCI_LIVEUPDATE that is marked
experimental and default-disabled until more of the device preservation
support has landed in the kernel.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 MAINTAINERS                            |  1 +
 drivers/vfio/pci/Kconfig               | 11 ++++
 drivers/vfio/pci/Makefile              |  1 +
 drivers/vfio/pci/vfio_pci.c            | 12 ++++-
 drivers/vfio/pci/vfio_pci_liveupdate.c | 69 ++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h       | 14 ++++++
 include/linux/kho/abi/vfio_pci.h       | 28 +++++++++++
 7 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
 create mode 100644 include/linux/kho/abi/vfio_pci.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 96ea84948d76..a16a7ecc67a4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27685,6 +27685,7 @@ F:	Documentation/ABI/testing/debugfs-vfio
 F:	Documentation/ABI/testing/sysfs-devices-vfio-dev
 F:	Documentation/driver-api/vfio.rst
 F:	drivers/vfio/
+F:	include/linux/kho/abi/vfio_pci.h
 F:	include/linux/vfio.h
 F:	include/linux/vfio_pci_core.h
 F:	include/uapi/linux/vfio.h
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 1e82b44bda1a..8f087f7b58c3 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -58,6 +58,17 @@ config VFIO_PCI_ZDEV_KVM
 config VFIO_PCI_DMABUF
 	def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER
 
+config VFIO_PCI_LIVEUPDATE
+	bool "VFIO PCI support for Live Update (EXPERIMENTAL)"
+	depends on VFIO_PCI && PCI_LIVEUPDATE
+	help
+	  Support for preserving devices bound to vfio-pci across a Live
+	  Update. This option should only be enabled by developers working on
+	  implementing this support. Once enough support has landed in the
+	  kernel, this option will no longer be marked EXPERIMENTAL.
+
+	  If you don't know what to do here, say N.
+
 source "drivers/vfio/pci/mlx5/Kconfig"
 
 source "drivers/vfio/pci/hisilicon/Kconfig"
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index e0a0757dd1d2..f462df61edb9 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+vfio-pci-$(CONFIG_VFIO_PCI_LIVEUPDATE) += vfio_pci_liveupdate.o
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
 
 obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..41dcbe4ace67 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -170,6 +170,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	ret = vfio_pci_core_register_device(vdev);
 	if (ret)
 		goto out_put_vdev;
+
 	return 0;
 
 out_put_vdev:
@@ -264,10 +265,14 @@ static int __init vfio_pci_init(void)
 
 	vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
 
+	ret = vfio_pci_liveupdate_init();
+	if (ret)
+		return ret;
+
 	/* Register and scan for devices */
 	ret = pci_register_driver(&vfio_pci_driver);
 	if (ret)
-		return ret;
+		goto err_liveupdate_cleanup;
 
 	vfio_pci_fill_ids();
 
@@ -275,12 +280,17 @@ static int __init vfio_pci_init(void)
 		pr_warn("device denylist disabled.\n");
 
 	return 0;
+
+err_liveupdate_cleanup:
+	vfio_pci_liveupdate_cleanup();
+	return ret;
 }
 module_init(vfio_pci_init);
 
 static void __exit vfio_pci_cleanup(void)
 {
 	pci_unregister_driver(&vfio_pci_driver);
+	vfio_pci_liveupdate_cleanup();
 }
 module_exit(vfio_pci_cleanup);
 
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
new file mode 100644
index 000000000000..5ea5af46b159
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Vipin Sharma <vipinsh@google.com>
+ * David Matlack <dmatlack@google.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kho/abi/vfio_pci.h>
+#include <linux/liveupdate.h>
+#include <linux/errno.h>
+
+#include "vfio_pci_priv.h"
+
+static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
+					     struct file *file)
+{
+	return false;
+}
+
+static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
+{
+	return -EOPNOTSUPP;
+}
+
+static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
+{
+}
+
+static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
+{
+	return -EOPNOTSUPP;
+}
+
+static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args)
+{
+}
+
+static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
+	.can_preserve = vfio_pci_liveupdate_can_preserve,
+	.preserve = vfio_pci_liveupdate_preserve,
+	.unpreserve = vfio_pci_liveupdate_unpreserve,
+	.retrieve = vfio_pci_liveupdate_retrieve,
+	.finish = vfio_pci_liveupdate_finish,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler vfio_pci_liveupdate_fh = {
+	.ops = &vfio_pci_liveupdate_file_ops,
+	.compatible = VFIO_PCI_LUO_FH_COMPATIBLE,
+};
+
+int __init vfio_pci_liveupdate_init(void)
+{
+	int ret;
+
+	ret = liveupdate_register_file_handler(&vfio_pci_liveupdate_fh);
+	if (ret && ret != -EOPNOTSUPP)
+		return ret;
+
+	return 0;
+}
+
+void vfio_pci_liveupdate_cleanup(void)
+{
+       liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh);
+}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 27ac280f00b9..cbf46e09da30 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -133,4 +133,18 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
 }
 #endif
 
+#ifdef CONFIG_VFIO_PCI_LIVEUPDATE
+int __init vfio_pci_liveupdate_init(void);
+void vfio_pci_liveupdate_cleanup(void);
+#else
+static inline int vfio_pci_liveupdate_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_pci_liveupdate_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_PCI_LIVEUPDATE */
+
 #endif
diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
new file mode 100644
index 000000000000..e2412b455e61
--- /dev/null
+++ b/include/linux/kho/abi/vfio_pci.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Vipin Sharma <vipinsh@google.com>
+ * David Matlack <dmatlack@google.com>
+ */
+
+#ifndef _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
+#define _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
+
+/**
+ * DOC: VFIO PCI Live Update ABI
+ *
+ * VFIO uses the ABI defined below for preserving device files across a kexec
+ * reboot using LUO.
+ *
+ * Device metadata is serialized into memory which is then handed to the next
+ * kernel via KHO.
+ *
+ * This interface is a contract. Any modification to any of the serialization
+ * structs defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the VFIO_PCI_LUO_FH_COMPATIBLE string.
+ */
+
+#define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
+
+#endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (5 preceding siblings ...)
  2026-03-23 23:57 ` [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
@ 2026-03-23 23:57 ` David Matlack
  2026-03-24 13:08   ` Yi Liu
  2026-03-23 23:58 ` [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after " David Matlack
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:57 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Implement the live update file handler callbacks to preserve a vfio-pci
device across a Live Update. Subsequent commits will enable userspace to
then retrieve this file after the Live Update.

Live Update support is scoped only to cdev files (i.e. not
VFIO_GROUP_GET_DEVICE_FD files).

State about each device is serialized into a new ABI struct
vfio_pci_core_device_ser. The contents of this struct are preserved
across the Live Update to the next kernel using a combination of
Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
Live Update Orchestrator (LUO) to preserve the physical address of the
struct.

For now the only contents of struct vfio_pci_core_device_ser the
device's PCI segment number and BDF, so that the device can be uniquely
identified after the Live Update.

Require that userspace disables interrupts on the device prior to
freeze() so that the device does not send any interrupts until new
interrupt handlers have been set up by the next kernel.

Reset the device and restore its state in the freeze() callback. This
ensures the device can be received by the next kernel in a consistent
state. Eventually this will be dropped and the device can be preserved
across in a running state, but that requires further work in VFIO and
the core PCI layer.

Note that LUO holds a reference to this file when it is preserved. So
VFIO is guaranteed that vfio_df_device_last_close() will not be called
on this device no matter what userspace does.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/pci/vfio_pci.c            |   2 +-
 drivers/vfio/pci/vfio_pci_core.c       |  57 +++++----
 drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_priv.h       |   4 +
 drivers/vfio/vfio_main.c               |   3 +-
 include/linux/kho/abi/vfio_pci.h       |  15 +++
 include/linux/vfio.h                   |   2 +
 7 files changed, 213 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 41dcbe4ace67..351480d13f6e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
 	return 0;
 }
 
-static const struct vfio_device_ops vfio_pci_ops = {
+const struct vfio_device_ops vfio_pci_ops = {
 	.name		= "vfio-pci",
 	.init		= vfio_pci_core_init_dev,
 	.release	= vfio_pci_core_release_dev,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d43745fe4c84..81f941323641 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
 
+void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	struct pci_dev *bridge = pci_upstream_bridge(pdev);
+
+	lockdep_assert_held(&vdev->vdev.dev_set->lock);
+
+	if (!vdev->reset_works)
+		return;
+
+	/*
+	 * Try to get the locks ourselves to prevent a deadlock. The
+	 * success of this is dependent on being able to lock the device,
+	 * which is not always possible.
+	 *
+	 * We cannot use the "try" reset interface here, since that will
+	 * overwrite the previously restored configuration information.
+	 */
+	if (bridge && !pci_dev_trylock(bridge))
+		return;
+
+	if (!pci_dev_trylock(pdev))
+		goto out;
+
+	if (!__pci_reset_function_locked(pdev))
+		vdev->needs_reset = false;
+
+	pci_dev_unlock(pdev);
+out:
+	if (bridge)
+		pci_dev_unlock(bridge);
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
+
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 {
-	struct pci_dev *bridge;
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_dummy_resource *dummy_res, *tmp;
 	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
@@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	 */
 	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
 
-	/*
-	 * Try to get the locks ourselves to prevent a deadlock. The
-	 * success of this is dependent on being able to lock the device,
-	 * which is not always possible.
-	 * We can not use the "try" reset interface here, which will
-	 * overwrite the previously restored configuration information.
-	 */
-	if (vdev->reset_works) {
-		bridge = pci_upstream_bridge(pdev);
-		if (bridge && !pci_dev_trylock(bridge))
-			goto out_restore_state;
-		if (pci_dev_trylock(pdev)) {
-			if (!__pci_reset_function_locked(pdev))
-				vdev->needs_reset = false;
-			pci_dev_unlock(pdev);
-		}
-		if (bridge)
-			pci_dev_unlock(bridge);
-	}
-
-out_restore_state:
+	vfio_pci_core_try_reset(vdev);
 	pci_restore_state(pdev);
 out:
 	pci_disable_device(pdev);
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index 5ea5af46b159..c4ebc7c486e5 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -6,27 +6,178 @@
  * David Matlack <dmatlack@google.com>
  */
 
+/**
+ * DOC: VFIO PCI Preservation via LUO
+ *
+ * VFIO PCI devices can be preserved over a kexec using the Live Update
+ * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
+ * to transfer an in-use device to the next kernel.
+ *
+ * .. note::
+ *    The support for preserving VFIO PCI devices is currently *partial* and
+ *    should be considered *experimental*. It should only be used by developers
+ *    working on expanding the support for the time being.
+ *
+ *    To avoid accidental usage while the support is still experimental, this
+ *    support is hidden behind a default-disable config option
+ *    ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
+ *    become complete, this option will be enabled by default when
+ *    ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
+ *
+ * Usage Example
+ * =============
+ *
+ * VFIO PCI devices can be preserved across a kexec by preserving the file
+ * associated with the device in a LUO session::
+ *
+ *   device_fd = open("/dev/vfio/devices/X");
+ *   ...
+ *   ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
+ *
+ * .. note::
+ *    LUO will hold an extra reference to the device file for as long as it is
+ *    preserved, so there is no way for the file to be destroyed or the device
+ *    to be unbound from the vfio-pci driver while it is preserved.
+ *
+ * Retrieving the file after kexec is not yet supported.
+ *
+ * Restrictions
+ * ============
+ *
+ * The kernel imposes the following restrictions when preserving VFIO devices:
+ *
+ *  * The device must be bound to the ``vfio-pci`` driver.
+ *
+ *  * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
+ *    the future.
+ *
+ *  * The device not be an Intel display device. This may be relaxed in the
+ *    future.
+ *
+ *  * The device file must have been acquired from the VFIO character device,
+ *    not ``VFIO_GROUP_GET_DEVICE_FD``.
+ *
+ *  * The device must have interrupt disable prior to kexec. Failure to disable
+ *    interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
+ *    syscall (to initiate the kexec) to fail.
+ *
+ * Preservation Behavior
+ * =====================
+ *
+ * The eventual goal of this support is to avoid disrupting the workload, state,
+ * or configuration of each preserved device during a Live Update. This would
+ * include allowing the device to perform DMA to preserved memory buffers and
+ * perform P2P DMA to other preserved devices. However, there are many pieces
+ * that still need to land in the kernel.
+ *
+ * For now, VFIO only preserves the following state for for devices:
+ *
+ *  * The PCI Segment, Bus, Device, and Function numbers of the device. The
+ *    kernel guarantees the these will not change across a kexec when a device
+ *    is preserved.
+ *
+ * Since the kernel is not yet prepared to preserve all parts of the device and
+ * its dependencies (such as DMA mappings), VFIO currently resets and restores
+ * preserved devices back into an idle state during kexec, before handing off
+ * control to the next kernel. This will be relaxed in future versions of the
+ * kernel once it is safe to allow the device to keep running across kexec.
+ */
+
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/kexec_handover.h>
 #include <linux/kho/abi/vfio_pci.h>
 #include <linux/liveupdate.h>
 #include <linux/errno.h>
+#include <linux/vfio.h>
 
 #include "vfio_pci_priv.h"
 
 static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
 					     struct file *file)
 {
-	return false;
+	struct vfio_device *device = vfio_device_from_file(file);
+	struct vfio_pci_core_device *vdev;
+	struct pci_dev *pdev;
+
+	if (!device)
+		return false;
+
+	/* Live Update support is limited to cdev files. */
+	if (!vfio_device_cdev_opened(device))
+		return false;
+
+	if (device->ops != &vfio_pci_ops)
+		return false;
+
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+	pdev = vdev->pdev;
+
+	/*
+	 * Don't support specialized vfio-pci devices for now since they haven't
+	 * been tested.
+	 */
+	if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM) || vfio_pci_is_intel_display(pdev))
+		return false;
+
+	return true;
 }
 
 static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 {
-	return -EOPNOTSUPP;
+	struct vfio_device *device = vfio_device_from_file(args->file);
+	struct vfio_pci_core_device_ser *ser;
+	struct vfio_pci_core_device *vdev;
+	struct pci_dev *pdev;
+
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+	pdev = vdev->pdev;
+
+	ser = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(ser))
+		return PTR_ERR(ser);
+
+	ser->bdf = pci_dev_id(pdev);
+	ser->domain = pci_domain_nr(pdev->bus);
+
+	args->serialized_data = virt_to_phys(ser);
+	return 0;
 }
 
 static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
 {
+	kho_unpreserve_free(phys_to_virt(args->serialized_data));
+}
+
+static int vfio_pci_liveupdate_freeze(struct liveupdate_file_op_args *args)
+{
+	struct vfio_device *device = vfio_device_from_file(args->file);
+	struct vfio_pci_core_device *vdev;
+	struct pci_dev *pdev;
+	int ret;
+
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+	pdev = vdev->pdev;
+
+	guard(mutex)(&device->dev_set->lock);
+
+	/*
+	 * Userspace must disable interrupts on the device prior to freeze so
+	 * that the device does not send any interrupts until new interrupt
+	 * handlers have been established by the next kernel.
+	 */
+	if (vdev->irq_type != VFIO_PCI_NUM_IRQS) {
+		pci_err(pdev, "Freeze failed! Interrupts are still enabled.\n");
+		return -EINVAL;
+	}
+
+	ret = pci_load_saved_state(pdev, vdev->pci_saved_state);
+	if (ret)
+		return ret;
+
+	vfio_pci_core_try_reset(vdev);
+	pci_restore_state(pdev);
+	return 0;
 }
 
 static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
@@ -42,6 +193,7 @@ static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
 	.can_preserve = vfio_pci_liveupdate_can_preserve,
 	.preserve = vfio_pci_liveupdate_preserve,
 	.unpreserve = vfio_pci_liveupdate_unpreserve,
+	.freeze = vfio_pci_liveupdate_freeze,
 	.retrieve = vfio_pci_liveupdate_retrieve,
 	.finish = vfio_pci_liveupdate_finish,
 	.owner = THIS_MODULE,
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index cbf46e09da30..fa5c7f544f8a 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -11,6 +11,10 @@
 /* Cap maximum number of ioeventfds per device (arbitrary) */
 #define VFIO_PCI_IOEVENTFD_MAX		1000
 
+extern const struct vfio_device_ops vfio_pci_ops;
+
+void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev);
+
 struct vfio_pci_ioeventfd {
 	struct list_head	next;
 	struct vfio_pci_core_device	*vdev;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 742477546b15..8b222f71bbab 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1436,7 +1436,7 @@ const struct file_operations vfio_device_fops = {
 #endif
 };
 
-static struct vfio_device *vfio_device_from_file(struct file *file)
+struct vfio_device *vfio_device_from_file(struct file *file)
 {
 	struct vfio_device_file *df = file->private_data;
 
@@ -1444,6 +1444,7 @@ static struct vfio_device *vfio_device_from_file(struct file *file)
 		return NULL;
 	return df->device;
 }
+EXPORT_SYMBOL_GPL(vfio_device_from_file);
 
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
index e2412b455e61..876aaf81dd92 100644
--- a/include/linux/kho/abi/vfio_pci.h
+++ b/include/linux/kho/abi/vfio_pci.h
@@ -9,6 +9,9 @@
 #ifndef _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
 #define _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
 
+#include <linux/compiler.h>
+#include <linux/types.h>
+
 /**
  * DOC: VFIO PCI Live Update ABI
  *
@@ -25,4 +28,16 @@
 
 #define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
 
+/**
+ * struct vfio_pci_core_device_ser - Serialized state of a single VFIO PCI
+ * device.
+ *
+ * @domain: The device's PCI domain number (segment).
+ * @bdf: The device's PCI bus, device, and function number.
+ */
+struct vfio_pci_core_device_ser {
+	u32 domain;
+	u16 bdf;
+} __packed;
+
 #endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e90859956514..e9d3ddb715c5 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -81,6 +81,8 @@ struct vfio_device {
 #endif
 };
 
+struct vfio_device *vfio_device_from_file(struct file *file);
+
 /**
  * struct vfio_device_ops - VFIO bus driver device callbacks
  *
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (6 preceding siblings ...)
  2026-03-23 23:57 ` [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-24 13:08   ` Yi Liu
  2026-03-23 23:58 ` [PATCH v3 09/24] vfio/pci: Notify PCI subsystem about devices preserved across " David Matlack
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Enable userspace to retrieve preserved VFIO device files from VFIO after
a Live Update by implementing the retrieve() and finish() file handler
callbacks.

Use an anonymous inode when creating the file, since the retrieved
device file is not opened through any particular cdev inode, and the
cdev inode does not matter in practice.

For now the retrieved file is functionally equivalent a opening the
corresponding VFIO cdev file. Subsequent commits will leverage the
preserved state associated with the retrieved file to preserve bits of
the device across Live Update.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/device_cdev.c             | 59 ++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_liveupdate.c | 52 ++++++++++++++++++++++-
 drivers/vfio/vfio_main.c               | 13 ++++++
 include/linux/vfio.h                   | 11 +++++
 4 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 8ceca24ac136..edf322315a41 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -2,6 +2,7 @@
 /*
  * Copyright (c) 2023 Intel Corporation.
  */
+#include <linux/anon_inodes.h>
 #include <linux/vfio.h>
 #include <linux/iommufd.h>
 
@@ -16,15 +17,10 @@ void vfio_init_device_cdev(struct vfio_device *device)
 	device->cdev.owner = THIS_MODULE;
 }
 
-/*
- * device access via the fd opened by this function is blocked until
- * .open_device() is called successfully during BIND_IOMMUFD.
- */
-int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
+static int vfio_device_cdev_open(struct vfio_device *device, struct file **filep)
 {
-	struct vfio_device *device = container_of(inode->i_cdev,
-						  struct vfio_device, cdev);
 	struct vfio_device_file *df;
+	struct file *file = *filep;
 	int ret;
 
 	/* Paired with the put in vfio_device_fops_release() */
@@ -37,22 +33,67 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 		goto err_put_registration;
 	}
 
-	filep->private_data = df;
+	/*
+	 * Simulate opening the character device using an anonymous inode. The
+	 * returned file has the same properties as a cdev file (e.g. operations
+	 * are blocked until BIND_IOMMUFD is called).
+	 */
+	if (!file) {
+		file = anon_inode_getfile_fmode("[vfio-device-liveupdate]",
+						&vfio_device_fops, NULL,
+						O_RDWR, FMODE_PREAD | FMODE_PWRITE);
+
+		if (IS_ERR(file)) {
+			ret = PTR_ERR(file);
+			goto err_free_device_file;
+		}
+
+		*filep = file;
+	}
+
+	file->private_data = df;
 
 	/*
 	 * Use the pseudo fs inode on the device to link all mmaps
 	 * to the same address space, allowing us to unmap all vmas
 	 * associated to this device using unmap_mapping_range().
 	 */
-	filep->f_mapping = device->inode->i_mapping;
+	file->f_mapping = device->inode->i_mapping;
 
 	return 0;
 
+err_free_device_file:
+	kvfree(df);
 err_put_registration:
 	vfio_device_put_registration(device);
 	return ret;
 }
 
+struct file *vfio_device_liveupdate_cdev_open(struct vfio_device *device)
+{
+	struct file *file = NULL;
+	int ret;
+
+	ret = vfio_device_cdev_open(device, &file);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return file;
+}
+EXPORT_SYMBOL_GPL(vfio_device_liveupdate_cdev_open);
+
+/*
+ * device access via the fd opened by this function is blocked until
+ * .open_device() is called successfully during BIND_IOMMUFD.
+ */
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *file)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+
+	return vfio_device_cdev_open(device, &file);
+}
+
 static void vfio_df_get_kvm_safe(struct vfio_device_file *df)
 {
 	spin_lock(&df->kvm_ref_lock);
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index c4ebc7c486e5..4b83a02401aa 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -39,7 +39,13 @@
  *    preserved, so there is no way for the file to be destroyed or the device
  *    to be unbound from the vfio-pci driver while it is preserved.
  *
- * Retrieving the file after kexec is not yet supported.
+ * After kexec, the preserved VFIO device file can be retrieved from the session
+ * just like any other preserved file::
+ *
+ *   ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg);
+ *   device_fd = arg.fd;
+ *   ...
+ *   ioctl(session_fd, LIVEUPDATE_SESSION_FINISH, ...);
  *
  * Restrictions
  * ============
@@ -85,6 +91,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/file.h>
 #include <linux/kexec_handover.h>
 #include <linux/kho/abi/vfio_pci.h>
 #include <linux/liveupdate.h>
@@ -180,13 +187,53 @@ static int vfio_pci_liveupdate_freeze(struct liveupdate_file_op_args *args)
 	return 0;
 }
 
+static int match_device(struct device *dev, const void *arg)
+{
+	struct vfio_device *device = container_of(dev, struct vfio_device, device);
+	const struct vfio_pci_core_device_ser *ser = arg;
+	struct pci_dev *pdev;
+
+	pdev = dev_is_pci(device->dev) ? to_pci_dev(device->dev) : NULL;
+	if (!pdev)
+		return false;
+
+	return ser->bdf == pci_dev_id(pdev) && ser->domain == pci_domain_nr(pdev->bus);
+}
+
 static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
 {
-	return -EOPNOTSUPP;
+	struct vfio_pci_core_device_ser *ser;
+	struct vfio_device *device;
+	struct file *file;
+	int ret = 0;
+
+	ser = phys_to_virt(args->serialized_data);
+
+	device = vfio_find_device(ser, match_device);
+	if (!device)
+		return -ENODEV;
+
+	file = vfio_device_liveupdate_cdev_open(device);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto out;
+	}
+
+	args->file = file;
+out:
+	/* Drop the reference from vfio_find_device() */
+	put_device(&device->device);
+	return ret;
+}
+
+static bool vfio_pci_liveupdate_can_finish(struct liveupdate_file_op_args *args)
+{
+	return args->retrieve_status > 0;
 }
 
 static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args)
 {
+	kho_restore_free(phys_to_virt(args->serialized_data));
 }
 
 static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
@@ -195,6 +242,7 @@ static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
 	.unpreserve = vfio_pci_liveupdate_unpreserve,
 	.freeze = vfio_pci_liveupdate_freeze,
 	.retrieve = vfio_pci_liveupdate_retrieve,
+	.can_finish = vfio_pci_liveupdate_can_finish,
 	.finish = vfio_pci_liveupdate_finish,
 	.owner = THIS_MODULE,
 };
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8b222f71bbab..e5886235cad4 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -13,6 +13,7 @@
 #include <linux/cdev.h>
 #include <linux/compat.h>
 #include <linux/device.h>
+#include <linux/device/class.h>
 #include <linux/fs.h>
 #include <linux/idr.h>
 #include <linux/iommu.h>
@@ -1766,6 +1767,18 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
 }
 EXPORT_SYMBOL(vfio_dma_rw);
 
+struct vfio_device *vfio_find_device(const void *data, device_match_t match)
+{
+	struct device *device;
+
+	device = class_find_device(vfio.device_class, NULL, data, match);
+	if (!device)
+		return NULL;
+
+	return container_of(device, struct vfio_device, device);
+}
+EXPORT_SYMBOL_GPL(vfio_find_device);
+
 /*
  * Module/class support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e9d3ddb715c5..7384965d15d7 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -393,4 +393,15 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *),
 void vfio_virqfd_disable(struct virqfd **pvirqfd);
 void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
 
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+struct file *vfio_device_liveupdate_cdev_open(struct vfio_device *device);
+#else
+static inline struct file *vfio_device_liveupdate_cdev_open(struct vfio_device *device)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+#endif /* IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) */
+
+struct vfio_device *vfio_find_device(const void *data, device_match_t match);
+
 #endif /* VFIO_H */
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 09/24] vfio/pci: Notify PCI subsystem about devices preserved across Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (7 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after " David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 10/24] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD David Matlack
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Notify the PCI subsystem about devices vfio-pci is preserving across
Live Update by registering the vfio-pci liveupdate file handler with the
PCI subsystem's FLB handler.

Notably this will ensure that devices preserved through vfio-pci will
have their PCI bus numbers preserved across Live Update, allowing VFIO
to use BDF as a key to identify the device across the Live Update and
(in the future) allow the device to continue DMA operations across
the Live Update.

This also enables VFIO to detect that a device was preserved before
userspace first retrieves the file from it, which will be used in
subsequent commits.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 44 +++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index 4b83a02401aa..b960ec3ffbf2 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -67,6 +67,9 @@
  *    interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
  *    syscall (to initiate the kexec) to fail.
  *
+ * In addition, the device must meet all of the restrictions imposed by the
+ * core PCI layer documented at :doc:`/PCI/liveupdate`.
+ *
  * Preservation Behavior
  * =====================
  *
@@ -136,23 +139,37 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 	struct vfio_pci_core_device_ser *ser;
 	struct vfio_pci_core_device *vdev;
 	struct pci_dev *pdev;
+	int ret;
 
 	vdev = container_of(device, struct vfio_pci_core_device, vdev);
 	pdev = vdev->pdev;
 
+	ret = pci_liveupdate_preserve(pdev);
+	if (ret)
+		return ret;
+
 	ser = kho_alloc_preserve(sizeof(*ser));
-	if (IS_ERR(ser))
-		return PTR_ERR(ser);
+	if (IS_ERR(ser)) {
+		ret = PTR_ERR(ser);
+		goto err_unpreserve;
+	}
 
 	ser->bdf = pci_dev_id(pdev);
 	ser->domain = pci_domain_nr(pdev->bus);
 
 	args->serialized_data = virt_to_phys(ser);
 	return 0;
+
+err_unpreserve:
+	pci_liveupdate_unpreserve(pdev);
+	return ret;
 }
 
 static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
 {
+	struct vfio_device *device = vfio_device_from_file(args->file);
+
+	pci_liveupdate_unpreserve(to_pci_dev(device->dev));
 	kho_unpreserve_free(phys_to_virt(args->serialized_data));
 }
 
@@ -213,6 +230,10 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
 	if (!device)
 		return -ENODEV;
 
+	ret = pci_liveupdate_retrieve(to_pci_dev(device->dev));
+	if (ret)
+		goto out;
+
 	file = vfio_device_liveupdate_cdev_open(device);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
@@ -233,6 +254,9 @@ static bool vfio_pci_liveupdate_can_finish(struct liveupdate_file_op_args *args)
 
 static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args)
 {
+	struct vfio_device *device = vfio_device_from_file(args->file);
+
+	pci_liveupdate_finish(to_pci_dev(device->dev));
 	kho_restore_free(phys_to_virt(args->serialized_data));
 }
 
@@ -257,13 +281,23 @@ int __init vfio_pci_liveupdate_init(void)
 	int ret;
 
 	ret = liveupdate_register_file_handler(&vfio_pci_liveupdate_fh);
-	if (ret && ret != -EOPNOTSUPP)
-		return ret;
+	if (ret)
+		goto err_return;
+
+	ret = pci_liveupdate_register_flb(&vfio_pci_liveupdate_fh);
+	if (ret)
+		goto err_unregister;
 
 	return 0;
+
+err_unregister:
+	liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh);
+err_return:
+	return (ret == -EOPNOTSUPP) ? 0 : ret;
 }
 
 void vfio_pci_liveupdate_cleanup(void)
 {
-       liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh);
+	pci_liveupdate_unregister_flb(&vfio_pci_liveupdate_fh);
+	liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh);
 }
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 10/24] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (8 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 09/24] vfio/pci: Notify PCI subsystem about devices preserved across " David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 11/24] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device David Matlack
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Enforce that files for incoming (preserved by previous kernel) VFIO
devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD rather than by
opening the corresponding VFIO character device or via
VFIO_GROUP_GET_DEVICE_FD.

Both of these methods would result in VFIO initializing the device
without access to the preserved state of the device passed by the
previous kernel.

Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/device_cdev.c             |  4 ++++
 drivers/vfio/group.c                   |  9 +++++++++
 drivers/vfio/pci/vfio_pci_liveupdate.c |  6 ++++++
 drivers/vfio/vfio.h                    | 18 ++++++++++++++++++
 4 files changed, 37 insertions(+)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index edf322315a41..6844684a3d8e 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -91,6 +91,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *file)
 	struct vfio_device *device = container_of(inode->i_cdev,
 						  struct vfio_device, cdev);
 
+	/* Device file must be retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD */
+	if (vfio_liveupdate_incoming_is_preserved(device))
+		return -EBUSY;
+
 	return vfio_device_cdev_open(device, &file);
 }
 
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4f15016d2a5f..0fa9761b13d3 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -311,6 +311,15 @@ static int vfio_group_ioctl_get_device_fd(struct vfio_group *group,
 	if (IS_ERR(device))
 		return PTR_ERR(device);
 
+	/*
+	 * This device was preserved across a Live Update. Accessing it via
+	 * VFIO_GROUP_GET_DEVICE_FD is not allowed.
+	 */
+	if (vfio_liveupdate_incoming_is_preserved(device)) {
+		vfio_device_put_registration(device);
+		return -EBUSY;
+	}
+
 	fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
 	if (fd < 0)
 		vfio_device_put_registration(device);
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index b960ec3ffbf2..6f760ace7065 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -47,6 +47,12 @@
  *   ...
  *   ioctl(session_fd, LIVEUPDATE_SESSION_FINISH, ...);
  *
+ * .. note::
+ *    After kexec, if a device was preserved by the previous kernel, attempting
+ *    to open a new file for the device via its character device
+ *    (``/dev/vfio/devices/X``) or via ``VFIO_GROUP_GET_DEVICE_FD`` will fail
+ *    with ``-EBUSY``.
+ *
  * Restrictions
  * ============
  *
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 50128da18bca..8fcc98cf9577 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -11,6 +11,7 @@
 #include <linux/cdev.h>
 #include <linux/module.h>
 #include <linux/vfio.h>
+#include <linux/pci.h>
 
 struct iommufd_ctx;
 struct iommu_group;
@@ -462,4 +463,21 @@ static inline void vfio_device_debugfs_init(struct vfio_device *vdev) { }
 static inline void vfio_device_debugfs_exit(struct vfio_device *vdev) { }
 #endif /* CONFIG_VFIO_DEBUGFS */
 
+#ifdef CONFIG_PCI_LIVEUPDATE
+static inline bool vfio_liveupdate_incoming_is_preserved(struct vfio_device *device)
+{
+	struct device *d = device->dev;
+
+	if (dev_is_pci(d))
+		return to_pci_dev(d)->liveupdate_incoming;
+
+	return false;
+}
+#else
+static inline bool vfio_liveupdate_incoming_is_preserved(struct vfio_device *device)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_LIVEUPDATE */
+
 #endif
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 11/24] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (9 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 10/24] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 12/24] vfio/pci: Skip reset of preserved device after Live Update David Matlack
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Stash a pointer to a device's incoming Live Updated state in struct
vfio_pci_core_device. This will enable subsequent commits to use the
preserved state when initializing the device.

To enable VFIO to safely access this pointer during device enablement,
require that the device is fully enabled before returning true from
can_finish(). This is synchronized by vfio_pci_core.c setting
vdev->liveupdate_incoming_state to NULL under dev_set lock once it's
done using it.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/pci/vfio_pci_core.c       |  2 +-
 drivers/vfio/pci/vfio_pci_liveupdate.c | 17 ++++++++++++++++-
 include/linux/vfio_pci_core.h          |  2 ++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 81f941323641..d7c472cf4729 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -568,7 +568,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
 		vdev->has_vga = true;
 
-
+	vdev->liveupdate_incoming_state = NULL;
 	return 0;
 
 out_free_zdev:
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index 6f760ace7065..8d6681e1d328 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -226,6 +226,7 @@ static int match_device(struct device *dev, const void *arg)
 static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
 {
 	struct vfio_pci_core_device_ser *ser;
+	struct vfio_pci_core_device *vdev;
 	struct vfio_device *device;
 	struct file *file;
 	int ret = 0;
@@ -246,6 +247,9 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
 		goto out;
 	}
 
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+	vdev->liveupdate_incoming_state = ser;
+
 	args->file = file;
 out:
 	/* Drop the reference from vfio_find_device() */
@@ -255,7 +259,18 @@ static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
 
 static bool vfio_pci_liveupdate_can_finish(struct liveupdate_file_op_args *args)
 {
-	return args->retrieve_status > 0;
+	struct vfio_pci_core_device *vdev;
+	struct vfio_device *device;
+
+	if (args->retrieve_status <= 0)
+		return false;
+
+	device = vfio_device_from_file(args->file);
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+
+	/* Check that vdev->liveupdate_incoming_state is no longer in use. */
+	guard(mutex)(&device->dev_set->lock);
+	return !vdev->liveupdate_incoming_state;
 }
 
 static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 2ebba746c18f..0c508dd8d1ac 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -26,6 +26,7 @@
 #define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
 
 struct vfio_pci_core_device;
+struct vfio_pci_core_device_ser;
 struct vfio_pci_region;
 struct p2pdma_provider;
 struct dma_buf_attachment;
@@ -142,6 +143,7 @@ struct vfio_pci_core_device {
 	struct notifier_block	nb;
 	struct rw_semaphore	memory_lock;
 	struct list_head	dmabufs;
+	struct vfio_pci_core_device_ser *liveupdate_incoming_state;
 };
 
 enum vfio_pci_io_width {
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 12/24] vfio/pci: Skip reset of preserved device after Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (10 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 11/24] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 13/24] docs: liveupdate: Add documentation for VFIO PCI David Matlack
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Do not reset the device when a Live Update preserved vfio-pci device is
retrieved and first enabled. vfio_pci_liveupdate_freeze() guarantees the
device is reset prior to Live Update, so there's no reason to reset it
again after Live Update.

Since VFIO normally uses the initial reset to detect if the device
supports function resets, pass that from the previous kernel via
struct vfio_pci_core_dev_ser.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/pci/vfio_pci_core.c       | 31 ++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_liveupdate.c |  4 ++++
 include/linux/kho/abi/vfio_pci.h       |  4 +++-
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d7c472cf4729..849a3b57d56b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -16,6 +16,7 @@
 #include <linux/file.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
+#include <linux/kho/abi/vfio_pci.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
@@ -494,6 +495,30 @@ static const struct dev_pm_ops vfio_pci_core_pm_ops = {
 			   NULL)
 };
 
+static int vfio_pci_core_probe_reset(struct vfio_pci_core_device *vdev)
+{
+	int ret;
+
+	/*
+	 * This device was preserved by the previous kernel across a Live
+	 * Update, so it does not need to be reset and reset_works can be
+	 * inherited from the previous kernel.
+	 */
+	if (vdev->liveupdate_incoming_state) {
+		vdev->reset_works = vdev->liveupdate_incoming_state->reset_works;
+		return 0;
+	}
+
+	ret = pci_try_reset_function(vdev->pdev);
+
+	/* Bail if the device lock cannot be acquired. */
+	if (ret == -EAGAIN)
+		return ret;
+
+	vdev->reset_works = !ret;
+	return 0;
+}
+
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
@@ -514,12 +539,10 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	if (ret)
 		goto out_power;
 
-	/* If reset fails because of the device lock, fail this path entirely */
-	ret = pci_try_reset_function(pdev);
-	if (ret == -EAGAIN)
+	ret = vfio_pci_core_probe_reset(vdev);
+	if (ret)
 		goto out_disable_device;
 
-	vdev->reset_works = !ret;
 	pci_save_state(pdev);
 	vdev->pci_saved_state = pci_store_saved_state(pdev);
 	if (!vdev->pci_saved_state)
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index 8d6681e1d328..874c821bf6eb 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -91,6 +91,9 @@
  *    kernel guarantees the these will not change across a kexec when a device
  *    is preserved.
  *
+ *  * Whether or not the device supports function resets. This is necessary to
+ *    avoid resetting the device after kexec to probe for reset support.
+ *
  * Since the kernel is not yet prepared to preserve all parts of the device and
  * its dependencies (such as DMA mappings), VFIO currently resets and restores
  * preserved devices back into an idle state during kexec, before handing off
@@ -162,6 +165,7 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 
 	ser->bdf = pci_dev_id(pdev);
 	ser->domain = pci_domain_nr(pdev->bus);
+	ser->reset_works = vdev->reset_works;
 
 	args->serialized_data = virt_to_phys(ser);
 	return 0;
diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
index 876aaf81dd92..c057794a044f 100644
--- a/include/linux/kho/abi/vfio_pci.h
+++ b/include/linux/kho/abi/vfio_pci.h
@@ -26,7 +26,7 @@
  * incrementing the version number in the VFIO_PCI_LUO_FH_COMPATIBLE string.
  */
 
-#define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
+#define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v2"
 
 /**
  * struct vfio_pci_core_device_ser - Serialized state of a single VFIO PCI
@@ -34,10 +34,12 @@
  *
  * @domain: The device's PCI domain number (segment).
  * @bdf: The device's PCI bus, device, and function number.
+ * @reset_works: Non-zero if the device supports function resets.
  */
 struct vfio_pci_core_device_ser {
 	u32 domain;
 	u16 bdf;
+	u8 reset_works;
 } __packed;
 
 #endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 13/24] docs: liveupdate: Add documentation for VFIO PCI
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (11 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 12/24] vfio/pci: Skip reset of preserved device after Live Update David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 14/24] selftests/liveupdate: Move luo_test_utils.* into a reusable library David Matlack
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Add documentation for preserving VFIO device files across a Live Update,
generated from the kernel-doc comments in the code.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 Documentation/core-api/liveupdate.rst         |  1 +
 .../driver-api/vfio_pci_liveupdate.rst        | 23 +++++++++++++++++++
 MAINTAINERS                                   |  1 +
 3 files changed, 25 insertions(+)
 create mode 100644 Documentation/driver-api/vfio_pci_liveupdate.rst

diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index d56a7760978a..c55d0d9d1d3b 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -34,6 +34,7 @@ The following types of file descriptors can be preserved
    :maxdepth: 1
 
    ../mm/memfd_preservation
+   ../driver-api/vfio_pci_liveupdate
 
 Public API
 ==========
diff --git a/Documentation/driver-api/vfio_pci_liveupdate.rst b/Documentation/driver-api/vfio_pci_liveupdate.rst
new file mode 100644
index 000000000000..1098b84e5ecd
--- /dev/null
+++ b/Documentation/driver-api/vfio_pci_liveupdate.rst
@@ -0,0 +1,23 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+====================================
+VFIO PCI Device Preservation via LUO
+====================================
+
+.. kernel-doc:: drivers/vfio/pci/vfio_pci_liveupdate.c
+   :doc: VFIO PCI Preservation via LUO
+
+VFIO PCI Preservation ABI
+=========================
+
+.. kernel-doc:: include/linux/kho/abi/vfio_pci.h
+   :doc: VFIO PCI Live Update ABI
+
+.. kernel-doc:: include/linux/kho/abi/vfio_pci.h
+   :internal:
+
+See Also
+========
+
+- :doc:`/core-api/liveupdate`
+- :doc:`/core-api/kho/index`
diff --git a/MAINTAINERS b/MAINTAINERS
index a16a7ecc67a4..a6a31b94a4e8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27684,6 +27684,7 @@ T:	git https://github.com/awilliam/linux-vfio.git
 F:	Documentation/ABI/testing/debugfs-vfio
 F:	Documentation/ABI/testing/sysfs-devices-vfio-dev
 F:	Documentation/driver-api/vfio.rst
+F:	Documentation/driver-api/vfio_pci_liveupdate.rst
 F:	drivers/vfio/
 F:	include/linux/kho/abi/vfio_pci.h
 F:	include/linux/vfio.h
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 14/24] selftests/liveupdate: Move luo_test_utils.* into a reusable library
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (12 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 13/24] docs: liveupdate: Add documentation for VFIO PCI David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 15/24] selftests/liveupdate: Add helpers to preserve/retrieve FDs David Matlack
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Move luo_test_utils.[ch] into a lib/ directory and pull the rules to
build them out into a separate make script. This will enable these
utilities to be also built by and used within other selftests (such as
VFIO) in subsequent commits.

No functional change intended.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/liveupdate/.gitignore |  1 +
 tools/testing/selftests/liveupdate/Makefile   | 14 ++++---------
 .../include/libliveupdate.h}                  |  8 ++++----
 .../selftests/liveupdate/lib/libliveupdate.mk | 20 +++++++++++++++++++
 .../{luo_test_utils.c => lib/liveupdate.c}    |  2 +-
 .../selftests/liveupdate/luo_kexec_simple.c   |  2 +-
 .../selftests/liveupdate/luo_multi_session.c  |  2 +-
 7 files changed, 32 insertions(+), 17 deletions(-)
 rename tools/testing/selftests/liveupdate/{luo_test_utils.h => lib/include/libliveupdate.h} (87%)
 create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk
 rename tools/testing/selftests/liveupdate/{luo_test_utils.c => lib/liveupdate.c} (99%)

diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
index 661827083ab6..18a0c7036cf3 100644
--- a/tools/testing/selftests/liveupdate/.gitignore
+++ b/tools/testing/selftests/liveupdate/.gitignore
@@ -3,6 +3,7 @@
 !/**/
 !*.c
 !*.h
+!*.mk
 !*.sh
 !.gitignore
 !config
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 080754787ede..a060cc21f27f 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -1,7 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-LIB_C += luo_test_utils.c
-
 TEST_GEN_PROGS += liveupdate
 
 TEST_GEN_PROGS_EXTENDED += luo_kexec_simple
@@ -10,25 +8,21 @@ TEST_GEN_PROGS_EXTENDED += luo_multi_session
 TEST_FILES += do_kexec.sh
 
 include ../lib.mk
+include lib/libliveupdate.mk
 
 CFLAGS += $(KHDR_INCLUDES)
 CFLAGS += -Wall -O2 -Wno-unused-function
 CFLAGS += -MD
 
-LIB_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIB_C))
 TEST_O := $(patsubst %, %.o, $(TEST_GEN_PROGS))
 TEST_O += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED))
 
-TEST_DEP_FILES := $(patsubst %.o, %.d, $(LIB_O))
+TEST_DEP_FILES := $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O))
 TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_O))
 -include $(TEST_DEP_FILES)
 
-$(LIB_O): $(OUTPUT)/%.o: %.c
-	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
-
-$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/%: %.o $(LIB_O)
-	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIB_O) $(LDLIBS) -o $@
+$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/%: %.o $(LIBLIVEUPDATE_O)
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
 
-EXTRA_CLEAN += $(LIB_O)
 EXTRA_CLEAN += $(TEST_O)
 EXTRA_CLEAN += $(TEST_DEP_FILES)
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h
similarity index 87%
rename from tools/testing/selftests/liveupdate/luo_test_utils.h
rename to tools/testing/selftests/liveupdate/lib/include/libliveupdate.h
index 90099bf49577..4390a2737930 100644
--- a/tools/testing/selftests/liveupdate/luo_test_utils.h
+++ b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h
@@ -7,13 +7,13 @@
  * Utility functions for LUO kselftests.
  */
 
-#ifndef LUO_TEST_UTILS_H
-#define LUO_TEST_UTILS_H
+#ifndef SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_H
+#define SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_H
 
 #include <errno.h>
 #include <string.h>
 #include <linux/liveupdate.h>
-#include "../kselftest.h"
+#include "../../../kselftest.h"
 
 #define LUO_DEVICE "/dev/liveupdate"
 
@@ -41,4 +41,4 @@ typedef void (*luo_test_stage2_fn)(int luo_fd, int state_session_fd);
 int luo_test(int argc, char *argv[], const char *state_session_name,
 	     luo_test_stage1_fn stage1, luo_test_stage2_fn stage2);
 
-#endif /* LUO_TEST_UTILS_H */
+#endif /* SELFTESTS_LIVEUPDATE_LIB_LIVEUPDATE_H */
diff --git a/tools/testing/selftests/liveupdate/lib/libliveupdate.mk b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
new file mode 100644
index 000000000000..fffd95b085b6
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/lib/libliveupdate.mk
@@ -0,0 +1,20 @@
+include $(top_srcdir)/scripts/subarch.include
+ARCH ?= $(SUBARCH)
+
+LIBLIVEUPDATE_SRCDIR := $(selfdir)/liveupdate/lib
+
+LIBLIVEUPDATE_C := liveupdate.c
+
+LIBLIVEUPDATE_OUTPUT := $(OUTPUT)/libliveupdate
+
+LIBLIVEUPDATE_O := $(patsubst %.c, $(LIBLIVEUPDATE_OUTPUT)/%.o, $(LIBLIVEUPDATE_C))
+
+LIBLIVEUPDATE_O_DIRS := $(shell dirname $(LIBLIVEUPDATE_O) | uniq)
+$(shell mkdir -p $(LIBLIVEUPDATE_O_DIRS))
+
+CFLAGS += -I$(LIBLIVEUPDATE_SRCDIR)/include
+
+$(LIBLIVEUPDATE_O): $(LIBLIVEUPDATE_OUTPUT)/%.o : $(LIBLIVEUPDATE_SRCDIR)/%.c
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
+
+EXTRA_CLEAN += $(LIBLIVEUPDATE_OUTPUT)
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/lib/liveupdate.c
similarity index 99%
rename from tools/testing/selftests/liveupdate/luo_test_utils.c
rename to tools/testing/selftests/liveupdate/lib/liveupdate.c
index 3c8721c505df..60121873f685 100644
--- a/tools/testing/selftests/liveupdate/luo_test_utils.c
+++ b/tools/testing/selftests/liveupdate/lib/liveupdate.c
@@ -21,7 +21,7 @@
 #include <errno.h>
 #include <stdarg.h>
 
-#include "luo_test_utils.h"
+#include <libliveupdate.h>
 
 int luo_open_device(void)
 {
diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
index d7ac1f3dc4cb..786ac93b9ae3 100644
--- a/tools/testing/selftests/liveupdate/luo_kexec_simple.c
+++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
@@ -8,7 +8,7 @@
  * across a single kexec reboot.
  */
 
-#include "luo_test_utils.h"
+#include <libliveupdate.h>
 
 #define TEST_SESSION_NAME "test-session"
 #define TEST_MEMFD_TOKEN 0x1A
diff --git a/tools/testing/selftests/liveupdate/luo_multi_session.c b/tools/testing/selftests/liveupdate/luo_multi_session.c
index 0ee2d795beef..aac24a5f5ce3 100644
--- a/tools/testing/selftests/liveupdate/luo_multi_session.c
+++ b/tools/testing/selftests/liveupdate/luo_multi_session.c
@@ -9,7 +9,7 @@
  * files.
  */
 
-#include "luo_test_utils.h"
+#include <libliveupdate.h>
 
 #define SESSION_EMPTY_1 "multi-test-empty-1"
 #define SESSION_EMPTY_2 "multi-test-empty-2"
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 15/24] selftests/liveupdate: Add helpers to preserve/retrieve FDs
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (13 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 14/24] selftests/liveupdate: Move luo_test_utils.* into a reusable library David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 16/24] vfio: selftests: Build liveupdate library in VFIO selftests David Matlack
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Add helper functions to preserve and retrieve file descriptors from an
LUO session. These will be used be used in subsequent commits to
preserve FDs other than memfd.

No functional change intended.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../liveupdate/lib/include/libliveupdate.h    |  3 ++
 .../selftests/liveupdate/lib/liveupdate.c     | 41 +++++++++++++++----
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h
index 4390a2737930..2b04b3256382 100644
--- a/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h
+++ b/tools/testing/selftests/liveupdate/lib/include/libliveupdate.h
@@ -26,6 +26,9 @@ int luo_create_session(int luo_fd, const char *name);
 int luo_retrieve_session(int luo_fd, const char *name);
 int luo_session_finish(int session_fd);
 
+int luo_session_preserve_fd(int session_fd, int fd, __u64 token);
+int luo_session_retrieve_fd(int session_fd, __u64 token);
+
 int create_and_preserve_memfd(int session_fd, int token, const char *data);
 int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
 
diff --git a/tools/testing/selftests/liveupdate/lib/liveupdate.c b/tools/testing/selftests/liveupdate/lib/liveupdate.c
index 60121873f685..3e070975a3ec 100644
--- a/tools/testing/selftests/liveupdate/lib/liveupdate.c
+++ b/tools/testing/selftests/liveupdate/lib/liveupdate.c
@@ -54,9 +54,35 @@ int luo_retrieve_session(int luo_fd, const char *name)
 	return arg.fd;
 }
 
+int luo_session_preserve_fd(int session_fd, int fd, __u64 token)
+{
+	struct liveupdate_session_preserve_fd arg = {
+		.size = sizeof(arg),
+		.fd = fd,
+		.token = token,
+	};
+
+	if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg))
+		return -errno;
+
+	return 0;
+}
+
+int luo_session_retrieve_fd(int session_fd, __u64 token)
+{
+	struct liveupdate_session_retrieve_fd arg = {
+		.size = sizeof(arg),
+		.token = token,
+	};
+
+	if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg))
+		return -errno;
+
+	return arg.fd;
+}
+
 int create_and_preserve_memfd(int session_fd, int token, const char *data)
 {
-	struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
 	long page_size = sysconf(_SC_PAGE_SIZE);
 	void *map = MAP_FAILED;
 	int mfd = -1, ret = -1;
@@ -75,9 +101,8 @@ int create_and_preserve_memfd(int session_fd, int token, const char *data)
 	snprintf(map, page_size, "%s", data);
 	munmap(map, page_size);
 
-	arg.fd = mfd;
-	arg.token = token;
-	if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
+	ret = luo_session_preserve_fd(session_fd, mfd, token);
+	if (ret)
 		goto out;
 
 	ret = 0;
@@ -92,15 +117,13 @@ int create_and_preserve_memfd(int session_fd, int token, const char *data)
 int restore_and_verify_memfd(int session_fd, int token,
 			     const char *expected_data)
 {
-	struct liveupdate_session_retrieve_fd arg = { .size = sizeof(arg) };
 	long page_size = sysconf(_SC_PAGE_SIZE);
 	void *map = MAP_FAILED;
 	int mfd = -1, ret = -1;
 
-	arg.token = token;
-	if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg) < 0)
-		return -errno;
-	mfd = arg.fd;
+	mfd = luo_session_retrieve_fd(session_fd, token);
+	if (mfd < 0)
+		return mfd;
 
 	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0);
 	if (map == MAP_FAILED)
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 16/24] vfio: selftests: Build liveupdate library in VFIO selftests
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (14 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 15/24] selftests/liveupdate: Add helpers to preserve/retrieve FDs David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 17/24] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED David Matlack
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Import and build liveupdate selftest library in VFIO selftests.

It allows to use liveupdate ioctls in VFIO selftests

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/vfio/Makefile | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 8e90e409e91d..7f3c94da289d 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -20,6 +20,7 @@ TEST_FILES += scripts/setup.sh
 
 include ../lib.mk
 include lib/libvfio.mk
+include ../liveupdate/lib/libliveupdate.mk
 
 CFLAGS += -I$(top_srcdir)/tools/include
 CFLAGS += -MD
@@ -27,11 +28,15 @@ CFLAGS += $(EXTRA_CFLAGS)
 
 LDFLAGS += -pthread
 
-$(TEST_GEN_PROGS): %: %.o $(LIBVFIO_O)
-	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $< $(LIBVFIO_O) $(LDLIBS) -o $@
+LIBS_O := $(LIBVFIO_O)
+LIBS_O += $(LIBLIVEUPDATE_O)
+
+$(TEST_GEN_PROGS): %: %.o $(LIBS_O)
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBS_O) $(LDLIBS) -o $@
 
 TEST_GEN_PROGS_O = $(patsubst %, %.o, $(TEST_GEN_PROGS))
-TEST_DEP_FILES = $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O) $(LIBVFIO_O))
+TEST_DEP_FILES := $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O))
+TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBS_O))
 -include $(TEST_DEP_FILES)
 
 EXTRA_CLEAN += $(TEST_GEN_PROGS_O) $(TEST_DEP_FILES)
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 17/24] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (15 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 16/24] vfio: selftests: Build liveupdate library in VFIO selftests David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 18/24] vfio: selftests: Add vfio_pci_liveupdate_uapi_test David Matlack
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Add Makefile support for TEST_GEN_PROGS_EXTENDED targets. These tests
are not run by default.

TEST_GEN_PROGS_EXTENDED will be used for Live Update selftests in
subsequent commits. These selftests must be run manually because they
require the user/runner to perform additional actions, such as kexec,
during the test.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/vfio/Makefile | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 7f3c94da289d..9d5e390a61b7 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -31,14 +31,17 @@ LDFLAGS += -pthread
 LIBS_O := $(LIBVFIO_O)
 LIBS_O += $(LIBLIVEUPDATE_O)
 
-$(TEST_GEN_PROGS): %: %.o $(LIBS_O)
+$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): %: %.o $(LIBS_O)
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBS_O) $(LDLIBS) -o $@
 
-TEST_GEN_PROGS_O = $(patsubst %, %.o, $(TEST_GEN_PROGS))
-TEST_DEP_FILES := $(patsubst %.o, %.d, $(TEST_GEN_PROGS_O))
+TESTS_O := $(patsubst %, %.o, $(TEST_GEN_PROGS))
+TESTS_O += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED))
+
+TEST_DEP_FILES := $(patsubst %.o, %.d, $(TESTS_O))
 TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBS_O))
 -include $(TEST_DEP_FILES)
 
-EXTRA_CLEAN += $(TEST_GEN_PROGS_O) $(TEST_DEP_FILES)
+EXTRA_CLEAN += $(TESTS_O)
+EXTRA_CLEAN += $(TEST_DEP_FILES)
 
 endif
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 18/24] vfio: selftests: Add vfio_pci_liveupdate_uapi_test
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (16 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 17/24] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 19/24] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD David Matlack
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Add a selftest to exercise preserving a various VFIO files through
/dev/liveupdate. Ensure that VFIO cdev device files can be preserved and
everything else (group-based device files, group files, and container
files) all fail.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/vfio/Makefile         |  1 +
 .../vfio/vfio_pci_liveupdate_uapi_test.c      | 93 +++++++++++++++++++
 2 files changed, 94 insertions(+)
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 9d5e390a61b7..5b6e79593555 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -12,6 +12,7 @@ TEST_GEN_PROGS += vfio_iommufd_setup_test
 TEST_GEN_PROGS += vfio_pci_device_test
 TEST_GEN_PROGS += vfio_pci_device_init_perf_test
 TEST_GEN_PROGS += vfio_pci_driver_test
+TEST_GEN_PROGS += vfio_pci_liveupdate_uapi_test
 
 TEST_FILES += scripts/cleanup.sh
 TEST_FILES += scripts/lib.sh
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c
new file mode 100644
index 000000000000..1d89b08ab0a4
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_uapi_test.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <libliveupdate.h>
+#include <libvfio.h>
+#include <kselftest_harness.h>
+
+static const char *device_bdf;
+
+FIXTURE(vfio_pci_liveupdate_uapi_test) {
+	int luo_fd;
+	int session_fd;
+	struct iommu *iommu;
+	struct vfio_pci_device *device;
+};
+
+FIXTURE_VARIANT(vfio_pci_liveupdate_uapi_test) {
+	const char *iommu_mode;
+};
+
+#define FIXTURE_VARIANT_ADD_IOMMU_MODE(_iommu_mode)			\
+FIXTURE_VARIANT_ADD(vfio_pci_liveupdate_uapi_test, _iommu_mode) {	\
+	.iommu_mode = #_iommu_mode,					\
+}
+
+FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES();
+#undef FIXTURE_VARIANT_ADD_IOMMU_MODE
+
+FIXTURE_SETUP(vfio_pci_liveupdate_uapi_test)
+{
+	self->luo_fd = luo_open_device();
+	ASSERT_GE(self->luo_fd, 0);
+
+	self->session_fd = luo_create_session(self->luo_fd, "session");
+	ASSERT_GE(self->session_fd, 0);
+
+	self->iommu = iommu_init(variant->iommu_mode);
+	self->device = vfio_pci_device_init(device_bdf, self->iommu);
+}
+
+FIXTURE_TEARDOWN(vfio_pci_liveupdate_uapi_test)
+{
+	vfio_pci_device_cleanup(self->device);
+	iommu_cleanup(self->iommu);
+	close(self->session_fd);
+	close(self->luo_fd);
+}
+
+TEST_F(vfio_pci_liveupdate_uapi_test, preserve_device)
+{
+	int ret;
+
+	ret = luo_session_preserve_fd(self->session_fd, self->device->fd, 0);
+
+	/* Preservation should only be supported for VFIO cdev files. */
+	ASSERT_EQ(ret, self->iommu->iommufd ? 0 : -ENOENT);
+}
+
+TEST_F(vfio_pci_liveupdate_uapi_test, preserve_group_fails)
+{
+	int ret;
+
+	if (self->iommu->iommufd)
+		SKIP(return, "iommufd-mode does not have group files");
+
+	ret = luo_session_preserve_fd(self->session_fd, self->device->group_fd, 0);
+	ASSERT_EQ(ret, -ENOENT);
+}
+
+TEST_F(vfio_pci_liveupdate_uapi_test, preserve_container_fails)
+{
+	int ret;
+
+	if (self->iommu->iommufd)
+		SKIP(return, "iommufd-mode does not have container files");
+
+	ret = luo_session_preserve_fd(self->session_fd, self->iommu->container_fd, 0);
+	ASSERT_EQ(ret, -ENOENT);
+}
+
+int main(int argc, char *argv[])
+{
+	int fd;
+
+	fd = luo_open_device();
+	if (fd < 0)
+		ksft_exit_skip("open(%s) failed: %s, skipping\n",
+			       LUO_DEVICE, strerror(errno));
+
+	close(fd);
+
+	device_bdf = vfio_selftests_get_bdf(&argc, argv);
+	return test_harness_run(argc, argv);
+}
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 19/24] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (17 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 18/24] vfio: selftests: Add vfio_pci_liveupdate_uapi_test David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 20/24] vfio: selftests: Add vfio_pci_liveupdate_kexec_test David Matlack
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Use the given VFIO cdev FD to initialize vfio_pci_device in VFIO
selftests. Add the assertion to make sure that passed cdev FD is not
used with legacy VFIO APIs. If VFIO cdev FD is provided then do not open
the device instead use the FD for any interaction with the device.

This API will allow to write selftests where VFIO device FD is preserved
using liveupdate and retrieved later using liveupdate ioctl after kexec.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../lib/include/libvfio/vfio_pci_device.h     |  3 ++
 .../selftests/vfio/lib/vfio_pci_device.c      | 33 ++++++++++++++-----
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
index 2858885a89bb..896dfde88118 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
@@ -38,6 +38,9 @@ struct vfio_pci_device {
 #define dev_info(_dev, _fmt, ...) printf("%s: " _fmt, (_dev)->bdf, ##__VA_ARGS__)
 #define dev_err(_dev, _fmt, ...) fprintf(stderr, "%s: " _fmt, (_dev)->bdf, ##__VA_ARGS__)
 
+struct vfio_pci_device *__vfio_pci_device_init(const char *bdf,
+					       struct iommu *iommu,
+					       int device_fd);
 struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu);
 void vfio_pci_device_cleanup(struct vfio_pci_device *device);
 
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index 4e5871f1ebc3..e9215c712cda 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -340,19 +340,27 @@ static void vfio_device_attach_iommufd_pt(int device_fd, u32 pt_id)
 	ioctl_assert(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &args);
 }
 
-static void vfio_pci_iommufd_setup(struct vfio_pci_device *device, const char *bdf)
+static void vfio_pci_iommufd_setup(struct vfio_pci_device *device,
+				   const char *bdf, int device_fd)
 {
-	const char *cdev_path = vfio_pci_get_cdev_path(bdf);
+	const char *cdev_path;
 
-	device->fd = open(cdev_path, O_RDWR);
-	VFIO_ASSERT_GE(device->fd, 0);
-	free((void *)cdev_path);
+	if (device_fd >= 0) {
+		device->fd = device_fd;
+	} else {
+		cdev_path = vfio_pci_get_cdev_path(bdf);
+		device->fd = open(cdev_path, O_RDWR);
+		VFIO_ASSERT_GE(device->fd, 0);
+		free((void *)cdev_path);
+	}
 
 	vfio_device_bind_iommufd(device->fd, device->iommu->iommufd);
 	vfio_device_attach_iommufd_pt(device->fd, device->iommu->ioas_id);
 }
 
-struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu)
+struct vfio_pci_device *__vfio_pci_device_init(const char *bdf,
+					       struct iommu *iommu,
+					       int device_fd)
 {
 	struct vfio_pci_device *device;
 
@@ -363,10 +371,12 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iomm
 	device->iommu = iommu;
 	device->bdf = bdf;
 
-	if (iommu->mode->container_path)
+	if (iommu->mode->container_path) {
+		VFIO_ASSERT_EQ(device_fd, -1);
 		vfio_pci_container_setup(device, bdf);
-	else
-		vfio_pci_iommufd_setup(device, bdf);
+	} else {
+		vfio_pci_iommufd_setup(device, bdf, device_fd);
+	}
 
 	vfio_pci_device_setup(device);
 	vfio_pci_driver_probe(device);
@@ -374,6 +384,11 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iomm
 	return device;
 }
 
+struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu)
+{
+	return __vfio_pci_device_init(bdf, iommu, /*device_fd=*/-1);
+}
+
 void vfio_pci_device_cleanup(struct vfio_pci_device *device)
 {
 	int i;
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 20/24] vfio: selftests: Add vfio_pci_liveupdate_kexec_test
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (18 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 19/24] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 21/24] vfio: selftests: Expose iommu_modes to tests David Matlack
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

From: Vipin Sharma <vipinsh@google.com>

Add a selftest to exercise preserving a vfio-pci device across a Live
Update. For now the test is extremely simple and just verifies that the
device file can be preserved and retrieved. In the future this test will
be extended to verify more parts about device preservation as they are
implemented.

This test is added to TEST_GEN_PROGS_EXTENDED since it must be run
manually along with a kexec.

To run this test manually:

 $ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0
 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 1 0000:00:04.0

 $ kexec ...   # NOTE: Exact method will be distro-dependent

 $ tools/testing/selftests/vfio/scripts/setup.sh 0000:00:04.0
 $ tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test --stage 2 0000:00:04.0

The second call to setup.sh is necessary because preserved devices are
not bound to a driver after Live Update. Such devices must be manually
bound by userspace after Live Update via driver_override.

This test is considered passing if all commands exit with 0.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/vfio/Makefile         |  4 +
 .../vfio/vfio_pci_liveupdate_kexec_test.c     | 89 +++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 5b6e79593555..792c4245d4f7 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -14,6 +14,10 @@ TEST_GEN_PROGS += vfio_pci_device_init_perf_test
 TEST_GEN_PROGS += vfio_pci_driver_test
 TEST_GEN_PROGS += vfio_pci_liveupdate_uapi_test
 
+# This test must be run manually since it requires the user/automation to
+# perform a kexec during the test.
+TEST_GEN_PROGS_EXTENDED += vfio_pci_liveupdate_kexec_test
+
 TEST_FILES += scripts/cleanup.sh
 TEST_FILES += scripts/lib.sh
 TEST_FILES += scripts/run.sh
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
new file mode 100644
index 000000000000..15b3e3af91d1
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <libliveupdate.h>
+#include <libvfio.h>
+
+static const char *device_bdf;
+
+static char state_session[LIVEUPDATE_SESSION_NAME_LENGTH];
+static char device_session[LIVEUPDATE_SESSION_NAME_LENGTH];
+
+enum {
+	STATE_TOKEN,
+	DEVICE_TOKEN,
+};
+
+static void before_kexec(int luo_fd)
+{
+	struct vfio_pci_device *device;
+	struct iommu *iommu;
+	int session_fd;
+	int ret;
+
+	iommu = iommu_init("iommufd");
+	device = vfio_pci_device_init(device_bdf, iommu);
+
+	create_state_file(luo_fd, state_session, STATE_TOKEN, /*next_stage=*/2);
+
+	session_fd = luo_create_session(luo_fd, device_session);
+	VFIO_ASSERT_GE(session_fd, 0);
+
+	printf("Preserving device in session\n");
+	ret = luo_session_preserve_fd(session_fd, device->fd, DEVICE_TOKEN);
+	VFIO_ASSERT_EQ(ret, 0);
+
+	close(luo_fd);
+	daemonize_and_wait();
+}
+
+static void after_kexec(int luo_fd, int state_session_fd)
+{
+	struct vfio_pci_device *device;
+	struct iommu *iommu;
+	int session_fd;
+	int device_fd;
+	int stage;
+
+	restore_and_read_stage(state_session_fd, STATE_TOKEN, &stage);
+	VFIO_ASSERT_EQ(stage, 2);
+
+	session_fd = luo_retrieve_session(luo_fd, device_session);
+	VFIO_ASSERT_GE(session_fd, 0);
+
+	printf("Finishing the session before retrieving the device (should fail)\n");
+	VFIO_ASSERT_NE(luo_session_finish(session_fd), 0);
+
+	printf("Retrieving the device FD from LUO\n");
+	device_fd = luo_session_retrieve_fd(session_fd, DEVICE_TOKEN);
+	VFIO_ASSERT_GE(device_fd, 0);
+
+	printf("Finishing the session before binding to iommufd (should fail)\n");
+	VFIO_ASSERT_NE(luo_session_finish(session_fd), 0);
+
+	printf("Binding the device to an iommufd and setting it up\n");
+	iommu = iommu_init("iommufd");
+
+	/*
+	 * This will invoke various ioctls on device_fd such as
+	 * VFIO_DEVICE_GET_INFO. So this is a decent sanity test
+	 * that LUO actually handed us back a valid VFIO device
+	 * file and not something else.
+	 */
+	device = __vfio_pci_device_init(device_bdf, iommu, device_fd);
+
+	printf("Finishing the session\n");
+	VFIO_ASSERT_EQ(luo_session_finish(session_fd), 0);
+
+	vfio_pci_device_cleanup(device);
+	iommu_cleanup(iommu);
+}
+
+int main(int argc, char *argv[])
+{
+	device_bdf = vfio_selftests_get_bdf(&argc, argv);
+
+	sprintf(device_session, "device-%s", device_bdf);
+	sprintf(state_session, "state-%s", device_bdf);
+
+	return luo_test(argc, argv, state_session, before_kexec, after_kexec);
+}
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 21/24] vfio: selftests: Expose iommu_modes to tests
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (19 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 20/24] vfio: selftests: Add vfio_pci_liveupdate_kexec_test David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 22/24] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device David Matlack
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Expose the list of iommu_modes to enable tests that want to iterate
through all possible iommu modes.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/vfio/lib/include/libvfio/iommu.h | 2 ++
 tools/testing/selftests/vfio/lib/iommu.c                 | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h b/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
index e9a3386a4719..4b9cbe262159 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
@@ -15,6 +15,8 @@ struct iommu_mode {
 	unsigned long iommu_type;
 };
 
+extern const struct iommu_mode iommu_modes[];
+extern const int nr_iommu_modes;
 extern const char *default_iommu_mode;
 
 struct dma_region {
diff --git a/tools/testing/selftests/vfio/lib/iommu.c b/tools/testing/selftests/vfio/lib/iommu.c
index 035dac069d60..95a494f829d2 100644
--- a/tools/testing/selftests/vfio/lib/iommu.c
+++ b/tools/testing/selftests/vfio/lib/iommu.c
@@ -23,7 +23,7 @@
 const char *default_iommu_mode = MODE_IOMMUFD;
 
 /* Reminder: Keep in sync with FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(). */
-static const struct iommu_mode iommu_modes[] = {
+const struct iommu_mode iommu_modes[] = {
 	{
 		.name = MODE_VFIO_TYPE1_IOMMU,
 		.container_path = "/dev/vfio/vfio",
@@ -49,6 +49,8 @@ static const struct iommu_mode iommu_modes[] = {
 	},
 };
 
+const int nr_iommu_modes = ARRAY_SIZE(iommu_modes);
+
 static const struct iommu_mode *lookup_iommu_mode(const char *iommu_mode)
 {
 	int i;
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 22/24] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (20 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 21/24] vfio: selftests: Expose iommu_modes to tests David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 23/24] vfio: selftests: Verify that opening VFIO device fails during Live Update David Matlack
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Expose a few low-level helper routings for setting up vfio_pci_device
structs. These routines will be used in a subsequent commit to assert
that VFIO_GROUP_GET_DEVICE_FD fails under certain conditions.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../lib/include/libvfio/vfio_pci_device.h     |  5 +++
 .../selftests/vfio/lib/vfio_pci_device.c      | 33 +++++++++++++------
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
index 896dfde88118..2389c7698335 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
@@ -125,4 +125,9 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
 
 const char *vfio_pci_get_cdev_path(const char *bdf);
 
+/* Low-level routines for setting up a struct vfio_pci_device */
+struct vfio_pci_device *vfio_pci_device_alloc(const char *bdf, struct iommu *iommu);
+void vfio_pci_group_setup(struct vfio_pci_device *device);
+void vfio_pci_iommu_setup(struct vfio_pci_device *device);
+
 #endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index e9215c712cda..66ee268110e2 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -220,7 +220,7 @@ static unsigned int vfio_pci_get_group_from_dev(const char *bdf)
 	return group;
 }
 
-static void vfio_pci_group_setup(struct vfio_pci_device *device, const char *bdf)
+void vfio_pci_group_setup(struct vfio_pci_device *device)
 {
 	struct vfio_group_status group_status = {
 		.argsz = sizeof(group_status),
@@ -228,7 +228,7 @@ static void vfio_pci_group_setup(struct vfio_pci_device *device, const char *bdf
 	char group_path[32];
 	int group;
 
-	group = vfio_pci_get_group_from_dev(bdf);
+	group = vfio_pci_get_group_from_dev(device->bdf);
 	snprintf(group_path, sizeof(group_path), "/dev/vfio/%d", group);
 
 	device->group_fd = open(group_path, O_RDWR);
@@ -240,14 +240,12 @@ static void vfio_pci_group_setup(struct vfio_pci_device *device, const char *bdf
 	ioctl_assert(device->group_fd, VFIO_GROUP_SET_CONTAINER, &device->iommu->container_fd);
 }
 
-static void vfio_pci_container_setup(struct vfio_pci_device *device, const char *bdf)
+void vfio_pci_iommu_setup(struct vfio_pci_device *device)
 {
 	struct iommu *iommu = device->iommu;
 	unsigned long iommu_type = iommu->mode->iommu_type;
 	int ret;
 
-	vfio_pci_group_setup(device, bdf);
-
 	ret = ioctl(iommu->container_fd, VFIO_CHECK_EXTENSION, iommu_type);
 	VFIO_ASSERT_GT(ret, 0, "VFIO IOMMU type %lu not supported\n", iommu_type);
 
@@ -257,8 +255,14 @@ static void vfio_pci_container_setup(struct vfio_pci_device *device, const char
 	 * because the IOMMU type is already set.
 	 */
 	(void)ioctl(iommu->container_fd, VFIO_SET_IOMMU, (void *)iommu_type);
+}
 
-	device->fd = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, bdf);
+static void vfio_pci_container_setup(struct vfio_pci_device *device)
+{
+	vfio_pci_group_setup(device);
+	vfio_pci_iommu_setup(device);
+
+	device->fd = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, device->bdf);
 	VFIO_ASSERT_GE(device->fd, 0);
 }
 
@@ -358,9 +362,7 @@ static void vfio_pci_iommufd_setup(struct vfio_pci_device *device,
 	vfio_device_attach_iommufd_pt(device->fd, device->iommu->ioas_id);
 }
 
-struct vfio_pci_device *__vfio_pci_device_init(const char *bdf,
-					       struct iommu *iommu,
-					       int device_fd)
+struct vfio_pci_device *vfio_pci_device_alloc(const char *bdf, struct iommu *iommu)
 {
 	struct vfio_pci_device *device;
 
@@ -371,9 +373,20 @@ struct vfio_pci_device *__vfio_pci_device_init(const char *bdf,
 	device->iommu = iommu;
 	device->bdf = bdf;
 
+	return device;
+}
+
+struct vfio_pci_device *__vfio_pci_device_init(const char *bdf,
+					       struct iommu *iommu,
+					       int device_fd)
+{
+	struct vfio_pci_device *device;
+
+	device = vfio_pci_device_alloc(bdf, iommu);
+
 	if (iommu->mode->container_path) {
 		VFIO_ASSERT_EQ(device_fd, -1);
-		vfio_pci_container_setup(device, bdf);
+		vfio_pci_container_setup(device);
 	} else {
 		vfio_pci_iommufd_setup(device, bdf, device_fd);
 	}
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 23/24] vfio: selftests: Verify that opening VFIO device fails during Live Update
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (21 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 22/24] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-23 23:58 ` [PATCH v3 24/24] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test David Matlack
  2026-03-26 20:43 ` [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Verify that opening a VFIO device through its cdev file and via
VFIO_GROUP_GET_DEVICE_FD both fail with -EBUSY if the device was
preserved across a Live Update. When a device file is preserve across a
Live Update, the file must be retrieved from /dev/liveupdate, not from
VFIO directly.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../vfio/vfio_pci_liveupdate_kexec_test.c     | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
index 15b3e3af91d1..65c48196e44e 100644
--- a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
@@ -36,6 +36,42 @@ static void before_kexec(int luo_fd)
 	daemonize_and_wait();
 }
 
+static void check_open_vfio_device_fails(void)
+{
+	const char *cdev_path = vfio_pci_get_cdev_path(device_bdf);
+	struct vfio_pci_device *device;
+	struct iommu *iommu;
+	int ret, i;
+
+	printf("Checking open(%s) fails\n", cdev_path);
+	ret = open(cdev_path, O_RDWR);
+	VFIO_ASSERT_EQ(ret, -1);
+	VFIO_ASSERT_EQ(errno, EBUSY);
+	free((void *)cdev_path);
+
+	for (i = 0; i < nr_iommu_modes; i++) {
+		if (!iommu_modes[i].container_path)
+			continue;
+
+		iommu = iommu_init(iommu_modes[i].name);
+
+		device = vfio_pci_device_alloc(device_bdf, iommu);
+		vfio_pci_group_setup(device);
+		vfio_pci_iommu_setup(device);
+
+		printf("Checking ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, \"%s\") fails (%s)\n",
+		       device_bdf, iommu_modes[i].name);
+
+		ret = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, device->bdf);
+		VFIO_ASSERT_EQ(ret, -1);
+		VFIO_ASSERT_EQ(errno, EBUSY);
+
+		close(device->group_fd);
+		free(device);
+		iommu_cleanup(iommu);
+	}
+}
+
 static void after_kexec(int luo_fd, int state_session_fd)
 {
 	struct vfio_pci_device *device;
@@ -44,6 +80,8 @@ static void after_kexec(int luo_fd, int state_session_fd)
 	int device_fd;
 	int stage;
 
+	check_open_vfio_device_fails();
+
 	restore_and_read_stage(state_session_fd, STATE_TOKEN, &stage);
 	VFIO_ASSERT_EQ(stage, 2);
 
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 24/24] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (22 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 23/24] vfio: selftests: Verify that opening VFIO device fails during Live Update David Matlack
@ 2026-03-23 23:58 ` David Matlack
  2026-03-26 20:43 ` [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-23 23:58 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Matlack, David Rientjes, Feng Tang,
	Jacob Pan, Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet,
	Josh Hilke, Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

Add a long-running DMA memcpy operation to
vfio_pci_liveupdate_kexec_test so that the device attempts to perform
DMAs continuously during the Live Update.

At this point iommufd preservation is not supported and bus mastering is
not kept enabled on the device during across the kexec, so most of these
DMAs will be dropped. However this test ensures that the current device
preservation support does not lead to system instability or crashes if
the device is active. And once iommufd and bus mastering are preserved,
this test can be relaxed to check that the DMA operations completed
successfully.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 .../vfio/vfio_pci_liveupdate_kexec_test.c     | 129 ++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
index 65c48196e44e..36bddfbb88ed 100644
--- a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
@@ -1,8 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include <linux/sizes.h>
+#include <sys/mman.h>
+
 #include <libliveupdate.h>
 #include <libvfio.h>
 
+#define MEMCPY_SIZE SZ_1G
+#define DRIVER_SIZE SZ_1M
+#define MEMFD_SIZE (MEMCPY_SIZE + DRIVER_SIZE)
+
+static struct dma_region memcpy_region;
 static const char *device_bdf;
 
 static char state_session[LIVEUPDATE_SESSION_NAME_LENGTH];
@@ -11,8 +19,89 @@ static char device_session[LIVEUPDATE_SESSION_NAME_LENGTH];
 enum {
 	STATE_TOKEN,
 	DEVICE_TOKEN,
+	MEMFD_TOKEN,
 };
 
+static void dma_memcpy_one(struct vfio_pci_device *device)
+{
+	void *src = memcpy_region.vaddr, *dst;
+	u64 size;
+
+	size = min_t(u64, memcpy_region.size / 2, device->driver.max_memcpy_size);
+	dst = src + size;
+
+	memset(src, 1, size);
+	memset(dst, 0, size);
+
+	printf("Kicking off 1 DMA memcpy operations of size 0x%lx...\n", size);
+	vfio_pci_driver_memcpy(device,
+			       to_iova(device, src),
+			       to_iova(device, dst),
+			       size);
+
+	VFIO_ASSERT_EQ(memcmp(src, dst, size), 0);
+}
+
+static void dma_memcpy_start(struct vfio_pci_device *device)
+{
+	void *src = memcpy_region.vaddr, *dst;
+	u64 count, size;
+
+	size = min_t(u64, memcpy_region.size / 2, device->driver.max_memcpy_size);
+	dst = src + size;
+
+	/*
+	 * Rough Math: If we assume the device will perform memcpy at a rate of
+	 * 30GB/s then 7200GB of transfers will run for about 4 minutes.
+	 */
+	count = (u64)7200 * SZ_1G / size;
+	count = min_t(u64, count, device->driver.max_memcpy_count);
+
+	memset(src, 1, size / 2);
+	memset(dst, 0, size / 2);
+
+	printf("Kicking off %lu DMA memcpy operations of size 0x%lx...\n", count, size);
+	vfio_pci_driver_memcpy_start(device,
+				     to_iova(device, src),
+				     to_iova(device, dst),
+				     size, count);
+}
+
+static void dma_memfd_map(struct vfio_pci_device *device, int fd)
+{
+	void *vaddr;
+
+	vaddr = mmap(NULL, MEMFD_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
+	VFIO_ASSERT_NE(vaddr, MAP_FAILED);
+
+	memcpy_region.iova = SZ_4G;
+	memcpy_region.size = MEMCPY_SIZE;
+	memcpy_region.vaddr = vaddr;
+	iommu_map(device->iommu, &memcpy_region);
+
+	device->driver.region.iova = memcpy_region.iova + memcpy_region.size;
+	device->driver.region.size = DRIVER_SIZE;
+	device->driver.region.vaddr = vaddr + memcpy_region.size;
+	iommu_map(device->iommu, &device->driver.region);
+}
+
+static void dma_memfd_setup(struct vfio_pci_device *device, int session_fd)
+{
+	int fd, ret;
+
+	fd = memfd_create("dma-buffer", 0);
+	VFIO_ASSERT_GE(fd, 0);
+
+	ret = fallocate(fd, 0, 0, MEMFD_SIZE);
+	VFIO_ASSERT_EQ(ret, 0);
+
+	printf("Preserving memfd of size 0x%x in session\n", MEMFD_SIZE);
+	ret = luo_session_preserve_fd(session_fd, fd, MEMFD_TOKEN);
+	VFIO_ASSERT_EQ(ret, 0);
+
+	dma_memfd_map(device, fd);
+}
+
 static void before_kexec(int luo_fd)
 {
 	struct vfio_pci_device *device;
@@ -32,6 +121,27 @@ static void before_kexec(int luo_fd)
 	ret = luo_session_preserve_fd(session_fd, device->fd, DEVICE_TOKEN);
 	VFIO_ASSERT_EQ(ret, 0);
 
+	dma_memfd_setup(device, session_fd);
+
+	/*
+	 * If the device has a selftests driver, kick off a long-running DMA
+	 * operation to exercise the device trying to DMA during a Live Update.
+	 * Since iommufd preservation is not supported yet, these DMAs should be
+	 * dropped. So this is just looking to verify that the system does not
+	 * fall over and crash as a result of a busy device being preserved.
+	 */
+	if (device->driver.ops) {
+		vfio_pci_driver_init(device);
+		dma_memcpy_start(device);
+
+		/*
+		 * Disable interrupts on the device or freeze() will fail.
+		 * Unfortunately there isn't a way to easily have a test for
+		 * that here since the check happens during shutdown.
+		 */
+		vfio_pci_msix_disable(device);
+	}
+
 	close(luo_fd);
 	daemonize_and_wait();
 }
@@ -78,6 +188,7 @@ static void after_kexec(int luo_fd, int state_session_fd)
 	struct iommu *iommu;
 	int session_fd;
 	int device_fd;
+	int memfd;
 	int stage;
 
 	check_open_vfio_device_fails();
@@ -88,6 +199,10 @@ static void after_kexec(int luo_fd, int state_session_fd)
 	session_fd = luo_retrieve_session(luo_fd, device_session);
 	VFIO_ASSERT_GE(session_fd, 0);
 
+	printf("Retrieving memfd from LUO\n");
+	memfd = luo_session_retrieve_fd(session_fd, MEMFD_TOKEN);
+	VFIO_ASSERT_GE(memfd, 0);
+
 	printf("Finishing the session before retrieving the device (should fail)\n");
 	VFIO_ASSERT_NE(luo_session_finish(session_fd), 0);
 
@@ -109,9 +224,23 @@ static void after_kexec(int luo_fd, int state_session_fd)
 	 */
 	device = __vfio_pci_device_init(device_bdf, iommu, device_fd);
 
+	dma_memfd_map(device, memfd);
+
 	printf("Finishing the session\n");
 	VFIO_ASSERT_EQ(luo_session_finish(session_fd), 0);
 
+	/*
+	 * Once iommufd preservation is supported and the device is kept fully
+	 * running across the Live Update, this should wait for the long-
+	 * running DMA memcpy operation kicked off in before_kexec() to
+	 * complete. But for now we expect the device to be reset so just
+	 * trigger a single memcpy to make sure it's still functional.
+	 */
+	if (device->driver.ops) {
+		vfio_pci_driver_init(device);
+		dma_memcpy_one(device);
+	}
+
 	vfio_pci_device_cleanup(device);
 	iommu_cleanup(iommu);
 }
-- 
2.53.0.983.g0bb29b3bc5-goog



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
  2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
@ 2026-03-24 13:07   ` Yi Liu
  2026-03-24 18:00     ` David Matlack
  2026-03-25 23:13   ` Bjorn Helgaas
  1 sibling, 1 reply; 41+ messages in thread
From: Yi Liu @ 2026-03-24 13:07 UTC (permalink / raw)
  To: David Matlack, Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan,
	Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet, Josh Hilke,
	Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Zhu Yanjun

On 3/24/26 07:57, David Matlack wrote:
> Require that Live Update preserved devices are in singleton iommu_groups
> during preservation (outgoing kernel) and retrieval (incoming kernel).
> 
> PCI devices preserved across Live Update will be allowed to perform
> memory transactions throughout the Live Update. Thus IOMMU groups for
> preserved devices must remain fixed. Since all current use cases for
> Live Update are for PCI devices in singleton iommu_groups, require that
> as a starting point. This avoids the complexity of needing to enforce
> arbitrary iommu_group topologies while still allowing all current use
> cases.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
>   1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> index bec7b3500057..a3dbe06650ff 100644
> --- a/drivers/pci/liveupdate.c
> +++ b/drivers/pci/liveupdate.c
> @@ -75,6 +75,8 @@
>    *
>    *  * The device must not be a Physical Function (PF).
>    *
> + *  * The device must be the only device in its IOMMU group.
> + *
>    * Preservation Behavior
>    * =====================
>    *
> @@ -105,6 +107,7 @@
>   
>   #include <linux/bsearch.h>
>   #include <linux/io.h>
> +#include <linux/iommu.h>
>   #include <linux/kexec_handover.h>
>   #include <linux/kho/abi/pci.h>
>   #include <linux/liveupdate.h>
> @@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
>   	ser->nr_devices--;
>   }
>   
> +static int count_devices(struct device *dev, void *__nr_devices)
> +{
> +	(*(int *)__nr_devices)++;
> +	return 0;
> +}
> +

there was a related discussion on the singleton group check. have you
considered the device_group_immutable_singleton() in below link?

https://lore.kernel.org/linux-iommu/20220421052121.3464100-4-baolu.lu@linux.intel.com/

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator
  2026-03-23 23:57 ` [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
@ 2026-03-24 13:07   ` Yi Liu
  2026-03-24 16:33     ` David Matlack
  0 siblings, 1 reply; 41+ messages in thread
From: Yi Liu @ 2026-03-24 13:07 UTC (permalink / raw)
  To: David Matlack, Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan,
	Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet, Josh Hilke,
	Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Zhu Yanjun

On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh@google.com>
> 
> Register a live update file handler for vfio-pci device files. Add stub
> implementations of all required callbacks so that registration does not
> fail (i.e. to avoid breaking git-bisect).
> 
> This file handler will be extended in subsequent commits to enable a
> device bound to vfio-pci to run without interruption while the host is
> going through a kexec Live Update.
> 
> Put this support behind a new Kconfig VFIO_PCI_LIVEUPDATE that is marked
> experimental and default-disabled until more of the device preservation
> support has landed in the kernel.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Co-developed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   MAINTAINERS                            |  1 +
>   drivers/vfio/pci/Kconfig               | 11 ++++
>   drivers/vfio/pci/Makefile              |  1 +
>   drivers/vfio/pci/vfio_pci.c            | 12 ++++-
>   drivers/vfio/pci/vfio_pci_liveupdate.c | 69 ++++++++++++++++++++++++++
>   drivers/vfio/pci/vfio_pci_priv.h       | 14 ++++++
>   include/linux/kho/abi/vfio_pci.h       | 28 +++++++++++
>   7 files changed, 135 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
>   create mode 100644 include/linux/kho/abi/vfio_pci.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 96ea84948d76..a16a7ecc67a4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27685,6 +27685,7 @@ F:	Documentation/ABI/testing/debugfs-vfio
>   F:	Documentation/ABI/testing/sysfs-devices-vfio-dev
>   F:	Documentation/driver-api/vfio.rst
>   F:	drivers/vfio/
> +F:	include/linux/kho/abi/vfio_pci.h
>   F:	include/linux/vfio.h
>   F:	include/linux/vfio_pci_core.h
>   F:	include/uapi/linux/vfio.h
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 1e82b44bda1a..8f087f7b58c3 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -58,6 +58,17 @@ config VFIO_PCI_ZDEV_KVM
>   config VFIO_PCI_DMABUF
>   	def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER
>   
> +config VFIO_PCI_LIVEUPDATE
> +	bool "VFIO PCI support for Live Update (EXPERIMENTAL)"
> +	depends on VFIO_PCI && PCI_LIVEUPDATE
> +	help
> +	  Support for preserving devices bound to vfio-pci across a Live
> +	  Update. This option should only be enabled by developers working on
> +	  implementing this support. Once enough support has landed in the
> +	  kernel, this option will no longer be marked EXPERIMENTAL.
> +
> +	  If you don't know what to do here, say N.
> +
>   source "drivers/vfio/pci/mlx5/Kconfig"
>   
>   source "drivers/vfio/pci/hisilicon/Kconfig"
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index e0a0757dd1d2..f462df61edb9 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
>   
>   vfio-pci-y := vfio_pci.o
>   vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
> +vfio-pci-$(CONFIG_VFIO_PCI_LIVEUPDATE) += vfio_pci_liveupdate.o
>   obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
>   
>   obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 0c771064c0b8..41dcbe4ace67 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -170,6 +170,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	ret = vfio_pci_core_register_device(vdev);
>   	if (ret)
>   		goto out_put_vdev;
> +

a meaningless line here.

>   	return 0;
>   
>   out_put_vdev:
> @@ -264,10 +265,14 @@ static int __init vfio_pci_init(void)
>   
>   	vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
>   
> +	ret = vfio_pci_liveupdate_init();
> +	if (ret)
> +		return ret;
> +
>   	/* Register and scan for devices */
>   	ret = pci_register_driver(&vfio_pci_driver);
>   	if (ret)
> -		return ret;
> +		goto err_liveupdate_cleanup;
>   
>   	vfio_pci_fill_ids();
>   
> @@ -275,12 +280,17 @@ static int __init vfio_pci_init(void)
>   		pr_warn("device denylist disabled.\n");
>   
>   	return 0;
> +
> +err_liveupdate_cleanup:
> +	vfio_pci_liveupdate_cleanup();
> +	return ret;
>   }
>   module_init(vfio_pci_init);
>   
>   static void __exit vfio_pci_cleanup(void)
>   {
>   	pci_unregister_driver(&vfio_pci_driver);
> +	vfio_pci_liveupdate_cleanup();
>   }
>   module_exit(vfio_pci_cleanup);
>   
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> new file mode 100644
> index 000000000000..5ea5af46b159
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2026, Google LLC.
> + * Vipin Sharma <vipinsh@google.com>
> + * David Matlack <dmatlack@google.com>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/kho/abi/vfio_pci.h>
> +#include <linux/liveupdate.h>
> +#include <linux/errno.h>
> +
> +#include "vfio_pci_priv.h"
> +
> +static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
> +					     struct file *file)
> +{
> +	return false;
> +}
> +
> +static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
> +{
> +}
> +
> +static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static void vfio_pci_liveupdate_finish(struct liveupdate_file_op_args *args)
> +{
> +}
> +
> +static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
> +	.can_preserve = vfio_pci_liveupdate_can_preserve,
> +	.preserve = vfio_pci_liveupdate_preserve,
> +	.unpreserve = vfio_pci_liveupdate_unpreserve,
> +	.retrieve = vfio_pci_liveupdate_retrieve,
> +	.finish = vfio_pci_liveupdate_finish,
> +	.owner = THIS_MODULE,
> +};
> +
> +static struct liveupdate_file_handler vfio_pci_liveupdate_fh = {
> +	.ops = &vfio_pci_liveupdate_file_ops,
> +	.compatible = VFIO_PCI_LUO_FH_COMPATIBLE,
> +};
> +
> +int __init vfio_pci_liveupdate_init(void)
> +{
> +	int ret;
> +
> +	ret = liveupdate_register_file_handler(&vfio_pci_liveupdate_fh);
> +	if (ret && ret != -EOPNOTSUPP)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +void vfio_pci_liveupdate_cleanup(void)
> +{
> +       liveupdate_unregister_file_handler(&vfio_pci_liveupdate_fh);
> +}
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index 27ac280f00b9..cbf46e09da30 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -133,4 +133,18 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
>   }
>   #endif
>   
> +#ifdef CONFIG_VFIO_PCI_LIVEUPDATE
> +int __init vfio_pci_liveupdate_init(void);
> +void vfio_pci_liveupdate_cleanup(void);
> +#else
> +static inline int vfio_pci_liveupdate_init(void)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_pci_liveupdate_cleanup(void)
> +{
> +}
> +#endif /* CONFIG_VFIO_PCI_LIVEUPDATE */
> +
>   #endif
> diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
> new file mode 100644
> index 000000000000..e2412b455e61
> --- /dev/null
> +++ b/include/linux/kho/abi/vfio_pci.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Copyright (c) 2025, Google LLC.

would be nice to update 2025 to 2026 now. :)

Regards,
Yi Liu



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
  2026-03-23 23:57 ` [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update David Matlack
@ 2026-03-24 13:08   ` Yi Liu
  2026-03-24 16:46     ` David Matlack
  0 siblings, 1 reply; 41+ messages in thread
From: Yi Liu @ 2026-03-24 13:08 UTC (permalink / raw)
  To: David Matlack, Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan,
	Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet, Josh Hilke,
	Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Zhu Yanjun

On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh@google.com>
> 
> Implement the live update file handler callbacks to preserve a vfio-pci
> device across a Live Update. Subsequent commits will enable userspace to
> then retrieve this file after the Live Update.
> 
> Live Update support is scoped only to cdev files (i.e. not
> VFIO_GROUP_GET_DEVICE_FD files).
> 
> State about each device is serialized into a new ABI struct
> vfio_pci_core_device_ser. The contents of this struct are preserved
> across the Live Update to the next kernel using a combination of
> Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
> Live Update Orchestrator (LUO) to preserve the physical address of the
> struct.
> 
> For now the only contents of struct vfio_pci_core_device_ser the
> device's PCI segment number and BDF, so that the device can be uniquely
> identified after the Live Update.
> 
> Require that userspace disables interrupts on the device prior to
> freeze() so that the device does not send any interrupts until new
> interrupt handlers have been set up by the next kernel.
> 
> Reset the device and restore its state in the freeze() callback. This
> ensures the device can be received by the next kernel in a consistent
> state. Eventually this will be dropped and the device can be preserved
> across in a running state, but that requires further work in VFIO and
> the core PCI layer.
> 
> Note that LUO holds a reference to this file when it is preserved. So
> VFIO is guaranteed that vfio_df_device_last_close() will not be called
> on this device no matter what userspace does.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Co-developed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   drivers/vfio/pci/vfio_pci.c            |   2 +-
>   drivers/vfio/pci/vfio_pci_core.c       |  57 +++++----
>   drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
>   drivers/vfio/pci/vfio_pci_priv.h       |   4 +
>   drivers/vfio/vfio_main.c               |   3 +-
>   include/linux/kho/abi/vfio_pci.h       |  15 +++
>   include/linux/vfio.h                   |   2 +
>   7 files changed, 213 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 41dcbe4ace67..351480d13f6e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
>   	return 0;
>   }
>   
> -static const struct vfio_device_ops vfio_pci_ops = {
> +const struct vfio_device_ops vfio_pci_ops = {
>   	.name		= "vfio-pci",
>   	.init		= vfio_pci_core_init_dev,
>   	.release	= vfio_pci_core_release_dev,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..81f941323641 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>   }
>   EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
>   
> +void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	struct pci_dev *bridge = pci_upstream_bridge(pdev);
> +
> +	lockdep_assert_held(&vdev->vdev.dev_set->lock);
> +
> +	if (!vdev->reset_works)
> +		return;
> +
> +	/*
> +	 * Try to get the locks ourselves to prevent a deadlock. The
> +	 * success of this is dependent on being able to lock the device,
> +	 * which is not always possible.
> +	 *
> +	 * We cannot use the "try" reset interface here, since that will
> +	 * overwrite the previously restored configuration information.
> +	 */
> +	if (bridge && !pci_dev_trylock(bridge))
> +		return;
> +
> +	if (!pci_dev_trylock(pdev))
> +		goto out;
> +
> +	if (!__pci_reset_function_locked(pdev))
> +		vdev->needs_reset = false;
> +
> +	pci_dev_unlock(pdev);
> +out:
> +	if (bridge)
> +		pci_dev_unlock(bridge);
> +}
> +EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
> +
>   void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   {
> -	struct pci_dev *bridge;
>   	struct pci_dev *pdev = vdev->pdev;
>   	struct vfio_pci_dummy_resource *dummy_res, *tmp;
>   	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
> @@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   	 */
>   	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>   
> -	/*
> -	 * Try to get the locks ourselves to prevent a deadlock. The
> -	 * success of this is dependent on being able to lock the device,
> -	 * which is not always possible.
> -	 * We can not use the "try" reset interface here, which will
> -	 * overwrite the previously restored configuration information.
> -	 */
> -	if (vdev->reset_works) {
> -		bridge = pci_upstream_bridge(pdev);
> -		if (bridge && !pci_dev_trylock(bridge))
> -			goto out_restore_state;
> -		if (pci_dev_trylock(pdev)) {
> -			if (!__pci_reset_function_locked(pdev))
> -				vdev->needs_reset = false;
> -			pci_dev_unlock(pdev);
> -		}
> -		if (bridge)
> -			pci_dev_unlock(bridge);
> -	}
> -
> -out_restore_state:
> +	vfio_pci_core_try_reset(vdev);
>   	pci_restore_state(pdev);
>   out:
>   	pci_disable_device(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> index 5ea5af46b159..c4ebc7c486e5 100644
> --- a/drivers/vfio/pci/vfio_pci_liveupdate.c
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -6,27 +6,178 @@
>    * David Matlack <dmatlack@google.com>
>    */
>   
> +/**
> + * DOC: VFIO PCI Preservation via LUO
> + *
> + * VFIO PCI devices can be preserved over a kexec using the Live Update
> + * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
> + * to transfer an in-use device to the next kernel.
> + *
> + * .. note::
> + *    The support for preserving VFIO PCI devices is currently *partial* and
> + *    should be considered *experimental*. It should only be used by developers
> + *    working on expanding the support for the time being.
> + *
> + *    To avoid accidental usage while the support is still experimental, this
> + *    support is hidden behind a default-disable config option
> + *    ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
> + *    become complete, this option will be enabled by default when
> + *    ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
> + *
> + * Usage Example
> + * =============
> + *
> + * VFIO PCI devices can be preserved across a kexec by preserving the file
> + * associated with the device in a LUO session::
> + *
> + *   device_fd = open("/dev/vfio/devices/X");

/dev/vfio/devices/vfioX

> + *   ...
> + *   ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
> + *
> + * .. note::
> + *    LUO will hold an extra reference to the device file for as long as it is
> + *    preserved, so there is no way for the file to be destroyed or the device
> + *    to be unbound from the vfio-pci driver while it is preserved.
> + *
> + * Retrieving the file after kexec is not yet supported.
> + *
> + * Restrictions
> + * ============
> + *
> + * The kernel imposes the following restrictions when preserving VFIO devices:
> + *
> + *  * The device must be bound to the ``vfio-pci`` driver.
> + *
> + *  * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
> + *    the future.
> + *
> + *  * The device not be an Intel display device. This may be relaxed in the
> + *    future.
> + *
> + *  * The device file must have been acquired from the VFIO character device,
> + *    not ``VFIO_GROUP_GET_DEVICE_FD``.

how about "The device file descriptor must be obtained by opening the 
VFIO device
character device (``/dev/vfio/devices/vfioX``), not via 
``VFIO_GROUP_GET_DEVICE_FD``."?

just be aligned with the below words in vfio.rst.

"Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
user can now acquire a device fd by directly opening a character device 
/dev/vfio/devices/vfioX"

> + *
> + *  * The device must have interrupt disable prior to kexec. Failure to disable
> + *    interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
> + *    syscall (to initiate the kexec) to fail.
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The eventual goal of this support is to avoid disrupting the workload, state,
> + * or configuration of each preserved device during a Live Update. This would
> + * include allowing the device to perform DMA to preserved memory buffers and
> + * perform P2P DMA to other preserved devices. However, there are many pieces
> + * that still need to land in the kernel.
> + *
> + * For now, VFIO only preserves the following state for for devices:
> + *
> + *  * The PCI Segment, Bus, Device, and Function numbers of the device. The
> + *    kernel guarantees the these will not change across a kexec when a device
> + *    is preserved.
> + *
> + * Since the kernel is not yet prepared to preserve all parts of the device and
> + * its dependencies (such as DMA mappings), VFIO currently resets and restores
> + * preserved devices back into an idle state during kexec, before handing off
> + * control to the next kernel. This will be relaxed in future versions of the
> + * kernel once it is safe to allow the device to keep running across kexec.
> + */
> +
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
> +#include <linux/kexec_handover.h>
>   #include <linux/kho/abi/vfio_pci.h>
>   #include <linux/liveupdate.h>
>   #include <linux/errno.h>
> +#include <linux/vfio.h>

maybe follow alphabet order. errno.h would be moved to the top first.

Regards,Yi Liu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after Live Update
  2026-03-23 23:58 ` [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after " David Matlack
@ 2026-03-24 13:08   ` Yi Liu
  2026-03-24 17:05     ` David Matlack
  0 siblings, 1 reply; 41+ messages in thread
From: Yi Liu @ 2026-03-24 13:08 UTC (permalink / raw)
  To: David Matlack, Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan,
	Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet, Josh Hilke,
	Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Zhu Yanjun

On 3/24/26 07:58, David Matlack wrote:
> From: Vipin Sharma <vipinsh@google.com>
> 
> Enable userspace to retrieve preserved VFIO device files from VFIO after
> a Live Update by implementing the retrieve() and finish() file handler
> callbacks.
> 
> Use an anonymous inode when creating the file, since the retrieved
> device file is not opened through any particular cdev inode, and the
> cdev inode does not matter in practice.

do we have a list of struct file fields that do not matter?

> 
> For now the retrieved file is functionally equivalent a opening the
> corresponding VFIO cdev file. Subsequent commits will leverage the
> preserved state associated with the retrieved file to preserve bits of
> the device across Live Update.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Co-developed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   drivers/vfio/device_cdev.c             | 59 ++++++++++++++++++++++----
>   drivers/vfio/pci/vfio_pci_liveupdate.c | 52 ++++++++++++++++++++++-
>   drivers/vfio/vfio_main.c               | 13 ++++++
>   include/linux/vfio.h                   | 11 +++++
>   4 files changed, 124 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 8ceca24ac136..edf322315a41 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -2,6 +2,7 @@
>   /*
>    * Copyright (c) 2023 Intel Corporation.
>    */
> +#include <linux/anon_inodes.h>
>   #include <linux/vfio.h>
>   #include <linux/iommufd.h>
>   
> @@ -16,15 +17,10 @@ void vfio_init_device_cdev(struct vfio_device *device)
>   	device->cdev.owner = THIS_MODULE;
>   }
>   
> -/*
> - * device access via the fd opened by this function is blocked until
> - * .open_device() is called successfully during BIND_IOMMUFD.
> - */
> -int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> +static int vfio_device_cdev_open(struct vfio_device *device, struct file **filep)
>   {
> -	struct vfio_device *device = container_of(inode->i_cdev,
> -						  struct vfio_device, cdev);
>   	struct vfio_device_file *df;
> +	struct file *file = *filep;
>   	int ret;
>   
>   	/* Paired with the put in vfio_device_fops_release() */
> @@ -37,22 +33,67 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>   		goto err_put_registration;
>   	}
>   
> -	filep->private_data = df;
> +	/*
> +	 * Simulate opening the character device using an anonymous inode. The
> +	 * returned file has the same properties as a cdev file (e.g. operations
> +	 * are blocked until BIND_IOMMUFD is called).
> +	 */
> +	if (!file) {
> +		file = anon_inode_getfile_fmode("[vfio-device-liveupdate]",
> +						&vfio_device_fops, NULL,
> +						O_RDWR, FMODE_PREAD | FMODE_PWRITE);
> +
> +		if (IS_ERR(file)) {
> +			ret = PTR_ERR(file);
> +			goto err_free_device_file;
> +		}
> +
> +		*filep = file;
> +	}
> +
> +	file->private_data = df;
>   
>   	/*
>   	 * Use the pseudo fs inode on the device to link all mmaps
>   	 * to the same address space, allowing us to unmap all vmas
>   	 * associated to this device using unmap_mapping_range().
>   	 */
> -	filep->f_mapping = device->inode->i_mapping;
> +	file->f_mapping = device->inode->i_mapping;
>
>   	return 0;
>   
> +err_free_device_file:
> +	kvfree(df);

any reason to use kvfree()?

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator
  2026-03-24 13:07   ` Yi Liu
@ 2026-03-24 16:33     ` David Matlack
  0 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-24 16:33 UTC (permalink / raw)
  To: Yi Liu
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Zhu Yanjun

On Tue, Mar 24, 2026 at 6:00 AM Yi Liu <yi.l.liu@intel.com> wrote:
> On 3/24/26 07:57, David Matlack wrote:

> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -170,6 +170,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >       ret = vfio_pci_core_register_device(vdev);
> >       if (ret)
> >               goto out_put_vdev;
> > +
>
> a meaningless line here.

Will fix in v4, thanks.

> > --- /dev/null
> > +++ b/include/linux/kho/abi/vfio_pci.h
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
>
> would be nice to update 2025 to 2026 now. :)

Oops, missed this one, thanks. Will fix in v4.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
  2026-03-24 13:08   ` Yi Liu
@ 2026-03-24 16:46     ` David Matlack
  0 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-24 16:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Zhu Yanjun

On Tue, Mar 24, 2026 at 6:01 AM Yi Liu <yi.l.liu@intel.com> wrote:
> On 3/24/26 07:57, David Matlack wrote:

> > + * Usage Example
> > + * =============
> > + *
> > + * VFIO PCI devices can be preserved across a kexec by preserving the file
> > + * associated with the device in a LUO session::
> > + *
> > + *   device_fd = open("/dev/vfio/devices/X");
>
> /dev/vfio/devices/vfioX

Will fix in v4.

> > + *  * The device file must have been acquired from the VFIO character device,
> > + *    not ``VFIO_GROUP_GET_DEVICE_FD``.
>
> how about "The device file descriptor must be obtained by opening the
> VFIO device
> character device (``/dev/vfio/devices/vfioX``), not via
> ``VFIO_GROUP_GET_DEVICE_FD``."?
>
> just be aligned with the below words in vfio.rst.
>
> "Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> user can now acquire a device fd by directly opening a character device
> /dev/vfio/devices/vfioX"

Thanks for the suggestion. Here is the wording I have for v4:

  *  * The device file being preserved must have been obtained by
opening the
  *    VFIO character device (``/dev/vfio/devices/vfioX``), not via
  *    ``VFIO_GROUP_GET_DEVICE_FD``.

> > +#include <linux/kexec_handover.h>
> >   #include <linux/kho/abi/vfio_pci.h>
> >   #include <linux/liveupdate.h>
> >   #include <linux/errno.h>
> > +#include <linux/vfio.h>
>
> maybe follow alphabet order. errno.h would be moved to the top first.

I will reorder errno.h to be at the top in the previous patch (where
the alphabetical ordering issue is introduced).


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after Live Update
  2026-03-24 13:08   ` Yi Liu
@ 2026-03-24 17:05     ` David Matlack
  0 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-24 17:05 UTC (permalink / raw)
  To: Yi Liu
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Zhu Yanjun

On 2026-03-24 09:08 PM, Yi Liu wrote:
> On 3/24/26 07:58, David Matlack wrote:
> > From: Vipin Sharma <vipinsh@google.com>
> > 
> > Enable userspace to retrieve preserved VFIO device files from VFIO after
> > a Live Update by implementing the retrieve() and finish() file handler
> > callbacks.
> > 
> > Use an anonymous inode when creating the file, since the retrieved
> > device file is not opened through any particular cdev inode, and the
> > cdev inode does not matter in practice.
> 
> do we have a list of struct file fields that do not matter?

My understanding is that VFIO only cares about these fields in struct
file:

 - private_data: Pointer to struct vfio_device_file
 - f_op: Pointer to vfio_device_fops
 - f_mapping: Pointer to vfio_device->inode->i_mapping

This is based on cross-referencing VFIO_GROUP_GET_DEVICE_FD (which uses
an anonymous inode) and the cdev code.

> > +err_free_device_file:
> > +	kvfree(df);
> 
> any reason to use kvfree()?

No this can be kfree(). Will fix in v4.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
  2026-03-24 13:07   ` Yi Liu
@ 2026-03-24 18:00     ` David Matlack
  2026-03-25 11:12       ` Yi Liu
  0 siblings, 1 reply; 41+ messages in thread
From: David Matlack @ 2026-03-24 18:00 UTC (permalink / raw)
  To: Yi Liu
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Zhu Yanjun

On 2026-03-24 09:07 PM, Yi Liu wrote:
> On 3/24/26 07:57, David Matlack wrote:
> > Require that Live Update preserved devices are in singleton iommu_groups
> > during preservation (outgoing kernel) and retrieval (incoming kernel).
> > 
> > PCI devices preserved across Live Update will be allowed to perform
> > memory transactions throughout the Live Update. Thus IOMMU groups for
> > preserved devices must remain fixed. Since all current use cases for
> > Live Update are for PCI devices in singleton iommu_groups, require that
> > as a starting point. This avoids the complexity of needing to enforce
> > arbitrary iommu_group topologies while still allowing all current use
> > cases.
> > 
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >   drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
> >   1 file changed, 33 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> > index bec7b3500057..a3dbe06650ff 100644
> > --- a/drivers/pci/liveupdate.c
> > +++ b/drivers/pci/liveupdate.c
> > @@ -75,6 +75,8 @@
> >    *
> >    *  * The device must not be a Physical Function (PF).
> >    *
> > + *  * The device must be the only device in its IOMMU group.
> > + *
> >    * Preservation Behavior
> >    * =====================
> >    *
> > @@ -105,6 +107,7 @@
> >   #include <linux/bsearch.h>
> >   #include <linux/io.h>
> > +#include <linux/iommu.h>
> >   #include <linux/kexec_handover.h>
> >   #include <linux/kho/abi/pci.h>
> >   #include <linux/liveupdate.h>
> > @@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
> >   	ser->nr_devices--;
> >   }
> > +static int count_devices(struct device *dev, void *__nr_devices)
> > +{
> > +	(*(int *)__nr_devices)++;
> > +	return 0;
> > +}
> > +
> 
> there was a related discussion on the singleton group check. have you
> considered the device_group_immutable_singleton() in below link?
> 
> https://lore.kernel.org/linux-iommu/20220421052121.3464100-4-baolu.lu@linux.intel.com/

Thanks for the link.

Based on the discussion in the follow-up threads, I think the only check
in that function that is needed on top of what is in this patch to
ensure group immutability is this one:

	/*
	 * The device could be considered to be fully isolated if
	 * all devices on the path from the device to the host-PCI
	 * bridge are protected from peer-to-peer DMA by ACS.
	 */
	if (!pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS))
		return false;

However, this would restrict Live Update support to only device
topologies that have these flags enabled. I am not yet sure if this
would be overly restrictive for the scenarios we care about supporting.

An alternative way to ensure immutability would be to block adding
devices at probe time. i.e. Fail pci_device_group() if the device being
added has liveupdate_incoming=True, or if the group already contains a
device with liveupdate_{incoming,outgoing}=True. We would still need the
check in pci_liveupdate_preserve() to pretect against setting
liveupdate_outgoing=True on a device in a multi-device group.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
  2026-03-24 18:00     ` David Matlack
@ 2026-03-25 11:12       ` Yi Liu
  2026-03-25 17:29         ` David Matlack
  0 siblings, 1 reply; 41+ messages in thread
From: Yi Liu @ 2026-03-25 11:12 UTC (permalink / raw)
  To: David Matlack
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Zhu Yanjun



On 3/25/26 02:00, David Matlack wrote:
> On 2026-03-24 09:07 PM, Yi Liu wrote:
>> On 3/24/26 07:57, David Matlack wrote:
>>> Require that Live Update preserved devices are in singleton iommu_groups
>>> during preservation (outgoing kernel) and retrieval (incoming kernel).
>>>
>>> PCI devices preserved across Live Update will be allowed to perform
>>> memory transactions throughout the Live Update. Thus IOMMU groups for
>>> preserved devices must remain fixed. Since all current use cases for
>>> Live Update are for PCI devices in singleton iommu_groups, require that
>>> as a starting point. This avoids the complexity of needing to enforce
>>> arbitrary iommu_group topologies while still allowing all current use
>>> cases.
>>>
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: David Matlack <dmatlack@google.com>
>>> ---
>>>    drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
>>>    1 file changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
>>> index bec7b3500057..a3dbe06650ff 100644
>>> --- a/drivers/pci/liveupdate.c
>>> +++ b/drivers/pci/liveupdate.c
>>> @@ -75,6 +75,8 @@
>>>     *
>>>     *  * The device must not be a Physical Function (PF).
>>>     *
>>> + *  * The device must be the only device in its IOMMU group.
>>> + *
>>>     * Preservation Behavior
>>>     * =====================
>>>     *
>>> @@ -105,6 +107,7 @@
>>>    #include <linux/bsearch.h>
>>>    #include <linux/io.h>
>>> +#include <linux/iommu.h>
>>>    #include <linux/kexec_handover.h>
>>>    #include <linux/kho/abi/pci.h>
>>>    #include <linux/liveupdate.h>
>>> @@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
>>>    	ser->nr_devices--;
>>>    }
>>> +static int count_devices(struct device *dev, void *__nr_devices)
>>> +{
>>> +	(*(int *)__nr_devices)++;
>>> +	return 0;
>>> +}
>>> +
>>
>> there was a related discussion on the singleton group check. have you
>> considered the device_group_immutable_singleton() in below link?
>>
>> https://lore.kernel.org/linux-iommu/20220421052121.3464100-4-baolu.lu@linux.intel.com/
> 
> Thanks for the link.
> 
> Based on the discussion in the follow-up threads, I think the only check
> in that function that is needed on top of what is in this patch to
> ensure group immutability is this one:
> 
> 	/*
> 	 * The device could be considered to be fully isolated if
> 	 * all devices on the path from the device to the host-PCI
> 	 * bridge are protected from peer-to-peer DMA by ACS.
> 	 */
> 	if (!pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS))
> 		return false;
> 
> However, this would restrict Live Update support to only device
> topologies that have these flags enabled. I am not yet sure if this
> would be overly restrictive for the scenarios we care about supporting.

yes. It's a bit different from that thread in which not only require
singleton group but also need to be immutable.

> An alternative way to ensure immutability would be to block adding
> devices at probe time. i.e. Fail pci_device_group() if the device being
> added has liveupdate_incoming=True, or if the group already contains a
> device with liveupdate_{incoming,outgoing}=True. We would still need the
> check in pci_liveupdate_preserve() to pretect against setting
> liveupdate_outgoing=True on a device in a multi-device group.

this looks good to me. But you'll disallow hotplug-in during liveupdate.
not sure about if any decision w.r.t. hotplug. is it acceptable?

BTW. A question not specific to this patch. If failure happens after
executing kexec, is there any chance to fallback to the prior kernel?

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
@ 2026-03-25 14:51 Liu, Yi L
  0 siblings, 0 replies; 41+ messages in thread
From: Liu, Yi L @ 2026-03-25 14:51 UTC (permalink / raw)
  To: David Matlack
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Graf, Alexander, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov, Chris Li, Dapeng Mi,
	David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Tian, Kevin, kexec@lists.infradead.org, kvm@vger.kernel.org,
	Leon Romanovsky, Leon Romanovsky, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, linux-pci@vger.kernel.org, Li RongQing,
	Lukas Wunner, Elver Marco, Winiarski, Michal, Mike Rapoport,
	Parav Pandit, Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra, Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Vivi, Rodrigo,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Kasireddy, Vivek, William Tu, Yanjun Zhu



On 3/25/26 02:00, David Matlack wrote:
> On 2026-03-24 09:07 PM, Yi Liu wrote:
>> On 3/24/26 07:57, David Matlack wrote:
>>> Require that Live Update preserved devices are in singleton iommu_groups
>>> during preservation (outgoing kernel) and retrieval (incoming kernel).
>>> PCI devices preserved across Live Update will be allowed to perform
>>> memory transactions throughout the Live Update. Thus IOMMU groups for
>>> preserved devices must remain fixed. Since all current use cases for
>>> Live Update are for PCI devices in singleton iommu_groups, require that
>>> as a starting point. This avoids the complexity of needing to enforce
>>> arbitrary iommu_group topologies while still allowing all current use
>>> cases.
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: David Matlack <dmatlack@google.com>
>>> ---
>>>  drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
>>>  1 file changed, 33 insertions(+), 1 deletion(-)
>>> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
>>> index bec7b3500057..a3dbe06650ff 100644
>>> --- a/drivers/pci/liveupdate.c
>>> +++ b/drivers/pci/liveupdate.c
>>> @@ -75,6 +75,8 @@
>>>   *
>>>   *  * The device must not be a Physical Function (PF).
>>>   *
>>> + *  * The device must be the only device in its IOMMU group.
>>> + *
>>>   * Preservation Behavior
>>>   * =====================
>>>   *
>>> @@ -105,6 +107,7 @@
>>>  #include <linux/bsearch.h>
>>>  #include <linux/io.h>
>>> +#include <linux/iommu.h>
>>>  #include <linux/kexec_handover.h>
>>>  #include <linux/kho/abi/pci.h>
>>>  #include <linux/liveupdate.h>
>>> @@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
>>>      ser->nr_devices--;
>>>  }
>>> +static int count_devices(struct device *dev, void *__nr_devices)
>>> +{
>>> +    (*(int *)__nr_devices)++;
>>> +    return 0;
>>> +}
>>> +
>> there was a related discussion on the singleton group check. have you
>> considered the device_group_immutable_singleton() in below link?
>> https://lore.kernel.org/linux-iommu/20220421052121.3464100-4-baolu.lu@linux.intel.com/
> Thanks for the link.
> Based on the discussion in the follow-up threads, I think the only check
> in that function that is needed on top of what is in this patch to
> ensure group immutability is this one:
>   /*
>    * The device could be considered to be fully isolated if
>    * all devices on the path from the device to the host-PCI
>    * bridge are protected from peer-to-peer DMA by ACS.
>    */
>   if (!pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS))
>       return false;
> However, this would restrict Live Update support to only device
> topologies that have these flags enabled. I am not yet sure if this
> would be overly restrictive for the scenarios we care about supporting.

yes. It's a bit different from that thread in which not only require
singleton group but also need to be immutable.

> An alternative way to ensure immutability would be to block adding
> devices at probe time. i.e. Fail pci_device_group() if the device being
> added has liveupdate_incoming=True, or if the group already contains a
> device with liveupdate_{incoming,outgoing}=True. We would still need the
> check in pci_liveupdate_preserve() to pretect against setting
> liveupdate_outgoing=True on a device in a multi-device group.

this looks good to me. But you'll disallow hotplug-in during liveupdate.
not sure about if any decision w.r.t. hotplug. is it acceptable?

BTW. A question not specific to this patch. If failure happens after
executing kexec, is there any chance to fallback to the prior kernel?

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
  2026-03-25 11:12       ` Yi Liu
@ 2026-03-25 17:29         ` David Matlack
  0 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-25 17:29 UTC (permalink / raw)
  To: Yi Liu
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Zhu Yanjun

On 2026-03-25 07:12 PM, Yi Liu wrote:
> 
> 
> On 3/25/26 02:00, David Matlack wrote:
> > On 2026-03-24 09:07 PM, Yi Liu wrote:
> > > On 3/24/26 07:57, David Matlack wrote:
> > > > Require that Live Update preserved devices are in singleton iommu_groups
> > > > during preservation (outgoing kernel) and retrieval (incoming kernel).
> > > > 
> > > > PCI devices preserved across Live Update will be allowed to perform
> > > > memory transactions throughout the Live Update. Thus IOMMU groups for
> > > > preserved devices must remain fixed. Since all current use cases for
> > > > Live Update are for PCI devices in singleton iommu_groups, require that
> > > > as a starting point. This avoids the complexity of needing to enforce
> > > > arbitrary iommu_group topologies while still allowing all current use
> > > > cases.
> > > > 
> > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > > ---
> > > >    drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
> > > >    1 file changed, 33 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> > > > index bec7b3500057..a3dbe06650ff 100644
> > > > --- a/drivers/pci/liveupdate.c
> > > > +++ b/drivers/pci/liveupdate.c
> > > > @@ -75,6 +75,8 @@
> > > >     *
> > > >     *  * The device must not be a Physical Function (PF).
> > > >     *
> > > > + *  * The device must be the only device in its IOMMU group.
> > > > + *
> > > >     * Preservation Behavior
> > > >     * =====================
> > > >     *
> > > > @@ -105,6 +107,7 @@
> > > >    #include <linux/bsearch.h>
> > > >    #include <linux/io.h>
> > > > +#include <linux/iommu.h>
> > > >    #include <linux/kexec_handover.h>
> > > >    #include <linux/kho/abi/pci.h>
> > > >    #include <linux/liveupdate.h>
> > > > @@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
> > > >    	ser->nr_devices--;
> > > >    }
> > > > +static int count_devices(struct device *dev, void *__nr_devices)
> > > > +{
> > > > +	(*(int *)__nr_devices)++;
> > > > +	return 0;
> > > > +}
> > > > +
> > > 
> > > there was a related discussion on the singleton group check. have you
> > > considered the device_group_immutable_singleton() in below link?
> > > 
> > > https://lore.kernel.org/linux-iommu/20220421052121.3464100-4-baolu.lu@linux.intel.com/
> > 
> > Thanks for the link.
> > 
> > Based on the discussion in the follow-up threads, I think the only check
> > in that function that is needed on top of what is in this patch to
> > ensure group immutability is this one:
> > 
> > 	/*
> > 	 * The device could be considered to be fully isolated if
> > 	 * all devices on the path from the device to the host-PCI
> > 	 * bridge are protected from peer-to-peer DMA by ACS.
> > 	 */
> > 	if (!pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS))
> > 		return false;
> > 
> > However, this would restrict Live Update support to only device
> > topologies that have these flags enabled. I am not yet sure if this
> > would be overly restrictive for the scenarios we care about supporting.
> 
> yes. It's a bit different from that thread in which not only require
> singleton group but also need to be immutable.
> 
> > An alternative way to ensure immutability would be to block adding
> > devices at probe time. i.e. Fail pci_device_group() if the device being
> > added has liveupdate_incoming=True, or if the group already contains a
> > device with liveupdate_{incoming,outgoing}=True. We would still need the
> > check in pci_liveupdate_preserve() to pretect against setting
> > liveupdate_outgoing=True on a device in a multi-device group.
> 
> this looks good to me. But you'll disallow hotplug-in during liveupdate.
> not sure about if any decision w.r.t. hotplug. is it acceptable?

Anyone doing hotplug during the middle of a Live Update is asking for
trouble IMO. And it would only prevent a hot-plugged device from coming
up if it were to be added to the iommu_group as an existing preserved
device. I think that is reasonable.

> BTW. A question not specific to this patch. If failure happens after
> executing kexec, is there any chance to fallback to the prior kernel?

There are many failure paths during the reboot() syscall that can return
back to userspace, and then userspace can figure out how to bring the
system (e.g. VMs) back online on the current kernel.

But otherwise, kexec is currently a one way door. Once you kexec, into
the new kernel, you would have to do another Live Update to get back
into the previous kernel.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update
  2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
@ 2026-03-25 20:06   ` David Matlack
  2026-03-25 23:12   ` Bjorn Helgaas
  1 sibling, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-25 20:06 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan,
	Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet, Josh Hilke,
	Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

On 2026-03-23 11:57 PM, David Matlack wrote:

> +static void pci_flb_finish(struct liveupdate_flb_op_args *args)
> +{
> +	kho_restore_free(args->obj);
> +}
> +
> +static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
> +	.preserve = pci_flb_preserve,
> +	.unpreserve = pci_flb_unpreserve,
> +	.retrieve = pci_flb_retrieve,
> +	.finish = pci_flb_finish,
> +	.owner = THIS_MODULE,
> +};
...
> +static int pci_liveupdate_flb_get_incoming(struct pci_ser **serp)
> +{
> +	int ret;
> +
> +	ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)serp);
> +
> +	/* Live Update is not enabled. */
> +	if (ret == -EOPNOTSUPP)
> +		return ret;
> +
> +	/* Live Update is enabled, but there is no incoming FLB data. */
> +	if (ret == -ENODATA)
> +		return ret;
> +
> +	/*
> +	 * Live Update is enabled and there is incoming FLB data, but none of it
> +	 * matches pci_liveupdate_flb.compatible.
> +	 *
> +	 * This could mean that no PCI FLB data was passed by the previous
> +	 * kernel, but it could also mean the previous kernel used a different
> +	 * compatibility string (i.e.a different ABI). The latter deserves at
> +	 * least a WARN_ON_ONCE() but it cannot be distinguished from the
> +	 * former.
> +	 */
> +	if (ret == -ENOENT) {
> +		pr_info_once("PCI: No incoming FLB data detected during Live Update");
> +		return ret;
> +	}
> +
> +	/*
> +	 * There is incoming FLB data that matches pci_liveupdate_flb.compatible
> +	 * but it cannot be retrieved. Proceed with standard initialization as
> +	 * if there was not incoming PCI FLB data.
> +	 */
> +	WARN_ONCE(ret, "PCI: Failed to retrieve incoming FLB data during Live Update");
> +	return ret;
> +}
> +
> +u32 pci_liveupdate_incoming_nr_devices(void)
> +{
> +	struct pci_ser *ser;
> +
> +	if (pci_liveupdate_flb_get_incoming(&ser))
> +		return 0;
> +
> +	return ser->nr_devices;
> +}
> +
> +void pci_liveupdate_setup_device(struct pci_dev *dev)
> +{
> +	struct pci_ser *ser;
> +
> +	if (pci_liveupdate_flb_get_incoming(&ser))
> +		return;
> +
> +	if (!pci_ser_find(ser, dev))
> +		return;
> +
> +	dev->liveupdate_incoming = true;
> +}

There is an inerent race between callers of
liveupdate_flb_get_incoming() and liveupdate_flb_ops.finish(). There is
no way for callers to protect themselves against the finish() callback
running and freeing the incoming FLB after liveupdate_flb_get_incoming()
returns. Sashiko flagged this as well [1].

After some off list discussion with Pasha and Sami, the proposal to fix
this is to have liveupdate_flb_get_incoming() increment the reference
count on the incoming FLB. We will add a liveupdate_flb_put_incoming()
to drop the reference when the caller is done using the incoming FLB.

I plan to include a patch for this in v4.

[1] https://sashiko.dev/#/patchset/20260323235817.1960573-1-dmatlack%40google.com?patch=7974


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update
  2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
  2026-03-25 20:06   ` David Matlack
@ 2026-03-25 23:12   ` Bjorn Helgaas
  2026-03-26 21:39     ` David Matlack
  1 sibling, 1 reply; 41+ messages in thread
From: Bjorn Helgaas @ 2026-03-25 23:12 UTC (permalink / raw)
  To: David Matlack
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

On Mon, Mar 23, 2026 at 11:57:54PM +0000, David Matlack wrote:
> Add an API to enable the PCI subsystem to participate in a Live Update
> and track all devices that are being preserved by drivers. Since this
> support is still under development, hide it behind a new Kconfig
> PCI_LIVEUPDATE that is marked experimental.

Can you list the interfaces being added here, e.g.,

  pci_liveupdate_register_flb() - register driver's liveupdate_file_handler
  pci_liveupdate_unregister_flb()
  pci_liveupdate_preserve() - preserve device across LU kexec
  pci_liveupdate_unpreserve() - cancel device preservation
  pci_liveupdate_retrieve() - not sure?
  pci_liveupdate_finish()

I think it's nice to have an idea of what pieces to look for before
reading the patch.

> This API will be used in subsequent commits by the vfio-pci driver to
> preserve VFIO devices across Live Update.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  drivers/pci/Kconfig         |  11 ++
>  drivers/pci/Makefile        |   1 +
>  drivers/pci/liveupdate.c    | 380 ++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.h           |  14 ++
>  drivers/pci/probe.c         |   2 +
>  include/linux/kho/abi/pci.h |  62 ++++++
>  include/linux/pci.h         |  41 ++++
>  7 files changed, 511 insertions(+)
>  create mode 100644 drivers/pci/liveupdate.c
>  create mode 100644 include/linux/kho/abi/pci.h
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index e3f848ffb52a..05307d89c3f4 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -334,6 +334,17 @@ config VGA_ARB_MAX_GPUS
>  	  Reserves space in the kernel to maintain resource locking for
>  	  multiple GPUS.  The overhead for each GPU is very small.
>  
> +config PCI_LIVEUPDATE
> +	bool "PCI Live Update Support (EXPERIMENTAL)"
> +	depends on PCI && LIVEUPDATE
> +	help
> +	  Support for preserving PCI devices across a Live Update. This option
> +	  should only be enabled by developers working on implementing this
> +	  support. Once enough support as landed in the kernel, this option
> +	  will no longer be marked EXPERIMENTAL.

This would be a good place for a one-sentence explanation of what
"preserving PCI devices" means.  Obviously the physical devices stay
there; what's interesting is that the hardware continues operating
without interruption across the update.

s/support as landed/support has landed/ (maybe no need for this
sentence at all)

> +	  If unsure, say N.
> +
>  source "drivers/pci/hotplug/Kconfig"
>  source "drivers/pci/controller/Kconfig"
>  source "drivers/pci/endpoint/Kconfig"
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 41ebc3b9a518..e8d003cb6757 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_PROC_FS)		+= proc.o
>  obj-$(CONFIG_SYSFS)		+= pci-sysfs.o slot.o
>  obj-$(CONFIG_ACPI)		+= pci-acpi.o
>  obj-$(CONFIG_GENERIC_PCI_IOMAP) += iomap.o
> +obj-$(CONFIG_PCI_LIVEUPDATE)	+= liveupdate.o
>  endif
>  
>  obj-$(CONFIG_OF)		+= of.o
> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> new file mode 100644
> index 000000000000..bec7b3500057
> --- /dev/null
> +++ b/drivers/pci/liveupdate.c
> @@ -0,0 +1,380 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2026, Google LLC.
> + * David Matlack <dmatlack@google.com>
> + */
> +
> +/**
> + * DOC: PCI Live Update
> + *
> + * The PCI subsystem participates in the Live Update process to enable drivers
> + * to preserve their PCI devices across kexec.
> + *
> + * Device preservation across Live Update is built on top of the Live Update
> + * Orchestrator (LUO) support for file preservation across kexec. Userspace
> + * indicates that a device should be preserved by preserving the file associated
> + * with the device with ``ioctl(LIVEUPDATE_SESSION_PRESERVE_FD)``.
> + *
> + * .. note::
> + *    The support for preserving PCI devices across Live Update is currently
> + *    *partial* and should be considered *experimental*. It should only be
> + *    used by developers working on the implementation for the time being.
> + *
> + *    To enable the support, enable ``CONFIG_PCI_LIVEUPDATE``.
> + *
> + * Driver API
> + * ==========
> + *
> + * Drivers that support file-based device preservation must register their
> + * ``liveupdate_file_handler`` with the PCI subsystem by calling
> + * ``pci_liveupdate_register_flb()``. This ensures the PCI subsystem will be
> + * notified whenever a device file is preserved so that ``struct pci_ser``
> + * can be allocated to track all preserved devices. This struct is an ABI
> + * and is eventually handed off to the next kernel via Kexec-Handover (KHO).
> + *
> + * In the "outgoing" kernel (before kexec), drivers should then notify the PCI
> + * subsystem directly whenever the preservation status for a device changes:
> + *
> + *  * ``pci_liveupdate_preserve(pci_dev)``: The device is being preserved.
> + *
> + *  * ``pci_liveupdate_unpreserve(pci_dev)``: The device is no longer being
> + *    preserved (preservation is cancelled).
> + *
> + * In the "incoming" kernel (after kexec), drivers should notify the PCI
> + * subsystem with the following calls:
> + *
> + *  * ``pci_liveupdate_retrieve(pci_dev)``: The device file is being retrieved
> + *    by userspace.

I'm not clear on what this means.  Is this telling the PCI core that
somebody else (userspace?) is doing something?  Why does the PCI core
care?  The name suggests that this interface would retrieve some data
from the PCI core, but that doesn't seem to be what's happening.

> + *
> + *  * ``pci_liveupdate_finish(pci_dev)``: The device is done participating in
> + *    Live Update. After this point the device may no longer be even associated
> + *    with the same driver.

This sets "dev->liveupdate_incoming = false", and the only place we
check that is in pci_liveupdate_retrieve().  In particular, there's
nothing in the driver bind/unbind paths that seems related.  I guess
pci_liveupdate_finish() just means the driver can't call
pci_liveupdate_retrieve() any more?

> + *
> + * Incoming/Outgoing
> + * =================
> + *
> + * The state of each device's participation in Live Update is stored in
> + * ``struct pci_dev``:
> + *
> + *  * ``liveupdate_outgoing``: True if the device is being preserved in the
> + *    outgoing kernel. Set in ``pci_liveupdate_preserve()`` and cleared in
> + *    ``pci_liveupdate_unpreserve()``.
> + *
> + *  * ``liveupdate_incoming``: True if the device is preserved in the incoming
> + *    kernel. Set during probing when the device is first created and cleared
> + *    in ``pci_liveupdate_finish()``.
> + *
> + * Restrictions
> + * ============
> + *
> + * Preserved devices currently have the following restrictions. Each of these
> + * may be relaxed in the future.
> + *
> + *  * The device must not be a Virtual Function (VF).
> + *
> + *  * The device must not be a Physical Function (PF).
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The kernel preserves the following state for devices preserved across a Live
> + * Update:
> + *
> + *  * The PCI Segment, Bus, Device, and Function numbers assigned to the device
> + *    are guaranteed to remain the same across Live Update.
> + *
> + * This list will be extended in the future as new support is added.
> + *
> + * Driver Binding
> + * ==============
> + *
> + * It is the driver's responsibility for ensuring that preserved devices are not
> + * released or bound to a different driver for as long as they are preserved. In
> + * practice, this is enforced by LUO taking an extra referenced to the preserved

s/responsibility for ensuring/responsibility to ensure/
s/referenced/reference/

> + * device file for as long as it is preserved.
> + *
> + * However, there is a window of time in the incoming kernel when a device is
> + * first probed and when userspace retrieves the device file with
> + * ``LIVEUPDATE_SESSION_RETRIEVE_FD`` when the device could be bound to any
> + * driver.

  ... window of time in the incoming kernel between a device being
  probed and userspace retrieving the device file ... when the device
  could be bound ...

I'm not sure what it means to retrieve a device file.  It doesn't
sound like the usual Unix "device file" or "special file" in /dev/,
since those aren't "retrieved".

> + * It is currently userspace's responsibility to ensure that the device is bound
> + * to the correct driver in this window.
> + */
> +
> +#include <linux/bsearch.h>
> +#include <linux/io.h>
> +#include <linux/kexec_handover.h>
> +#include <linux/kho/abi/pci.h>
> +#include <linux/liveupdate.h>
> +#include <linux/mutex.h>
> +#include <linux/mm.h>
> +#include <linux/pci.h>
> +#include <linux/sort.h>
> +
> +#include "pci.h"
> +
> +static DEFINE_MUTEX(pci_flb_outgoing_lock);

It'd be handy if there were some excuse to mention "FLB" and expand it
once in the doc above, since I have no idea what it means or where to
look for it.  Maybe unfortunate that it will be pronounced "flub" ;)

> +static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
> +{
> +	struct pci_dev *dev = NULL;
> +	int max_nr_devices = 0;
> +	struct pci_ser *ser;
> +	unsigned long size;
> +
> +	/*
> +	 * Don't both accounting for VFs that could be created after this
> +	 * since preserving VFs is not supported yet. Also don't account
> +	 * for devices that could be hot-plugged after this since preserving
> +	 * hot-plugged devices across Live Update is not yet an expected
> +	 * use-case.

s/Don't both accounting/Don't bother accounting/ ? not sure of intent

I suspect the important thing here is that this allocates space for
preserving X devices, and each subsequent pci_liveupdate_preserve()
call from a driver uses up one of those slots.

My guess is this is just an allocation issue and from that point of
view there's no actual problem with enabling VFs or hot-adding devices
after this point; it's just that pci_liveupdate_preserve() will fail
after X calls.

> +	 */
> +	for_each_pci_dev(dev)
> +		max_nr_devices++;
> +
> +	size = struct_size_t(struct pci_ser, devices, max_nr_devices);
> +
> +	ser = kho_alloc_preserve(size);
> +	if (IS_ERR(ser))
> +		return PTR_ERR(ser);
> +
> +	ser->max_nr_devices = max_nr_devices;
> +
> +	args->obj = ser;
> +	args->data = virt_to_phys(ser);
> +	return 0;
> +}
> +
> +static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
> +{
> +	struct pci_ser *ser = args->obj;
> +
> +	WARN_ON_ONCE(ser->nr_devices);

I guess this means somebody (userspace?) called .unpreserve() before
all the drivers that had called pci_liveupdate_preserve() have also
called pci_liveupdate_unpreserve()?

If this is userspace-triggerable, maybe it's worth a meaningful
message including one or more of the device IDs from ser->devices[]?

> +	kho_unpreserve_free(ser);
> +}
> +
> +static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
> +{
> +	args->obj = phys_to_virt(args->data);
> +	return 0;
> +}
> +
> +static void pci_flb_finish(struct liveupdate_flb_op_args *args)
> +{
> +	kho_restore_free(args->obj);
> +}
> +
> +static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
> +	.preserve = pci_flb_preserve,
> +	.unpreserve = pci_flb_unpreserve,
> +	.retrieve = pci_flb_retrieve,
> +	.finish = pci_flb_finish,
> +	.owner = THIS_MODULE,
> +};
> +
> +static struct liveupdate_flb pci_liveupdate_flb = {
> +	.ops = &pci_liveupdate_flb_ops,
> +	.compatible = PCI_LUO_FLB_COMPATIBLE,
> +};
> +
> +#define INIT_PCI_DEV_SER(_dev) {		\
> +	.domain = pci_domain_nr((_dev)->bus),	\
> +	.bdf = pci_dev_id(_dev),		\
> +}
> +
> +static int pci_dev_ser_cmp(const void *__a, const void *__b)
> +{
> +	const struct pci_dev_ser *a = __a, *b = __b;
> +
> +	return cmp_int((u64)a->domain << 16 | a->bdf,
> +		       (u64)b->domain << 16 | b->bdf);
> +}
> +
> +static struct pci_dev_ser *pci_ser_find(struct pci_ser *ser,
> +					struct pci_dev *dev)
> +{
> +	const struct pci_dev_ser key = INIT_PCI_DEV_SER(dev);
> +
> +	return bsearch(&key, ser->devices, ser->nr_devices,
> +		       sizeof(key), pci_dev_ser_cmp);
> +}
> +
> +static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
> +{
> +	struct pci_dev_ser *dev_ser;
> +	int i;
> +
> +	dev_ser = pci_ser_find(ser, dev);
> +
> +	/*
> +	 * This should never happen unless there is a kernel bug or
> +	 * corruption that causes the state in struct pci_ser to get
> +	 * out of sync with struct pci_dev.

Corruption can be a bug anywhere and isn't really worth mentioning,
but the "out of sync" part sounds like it glosses over something
important.

I guess this happens if there was no successful
pci_liveupdate_preserve(X) before calling
pci_liveupdate_unpreserve(X)?  That does sound like a kernel bug (I
suppose a VFIO or other driver bug?), and I would just say what
happened directly instead of calling it "out of sync".

> +	 */
> +	if (pci_WARN_ONCE(dev, !dev_ser, "Cannot find preserved device!"))

Seems like an every-time sort of message if this indicates a driver bug?

It's enough of a hassle to convince myself that pci_WARN_ONCE()
returns the value that caused the warning that I would prefer:

  if (!dev_ser) {
    pci_warn(...) or pci_WARN_ONCE(...)
    return;
  }

> +		return;
> +
> +	for (i = dev_ser - ser->devices; i < ser->nr_devices - 1; i++)
> +		ser->devices[i] = ser->devices[i + 1];
> +
> +	ser->nr_devices--;
> +}
> +
> +int pci_liveupdate_preserve(struct pci_dev *dev)
> +{
> +	struct pci_dev_ser new = INIT_PCI_DEV_SER(dev);
> +	struct pci_ser *ser;
> +	int i, ret;
> +
> +	/* SR-IOV is not supported yet. */
> +	if (dev->is_virtfn || dev->is_physfn)
> +		return -EINVAL;
> +
> +	guard(mutex)(&pci_flb_outgoing_lock);
> +
> +	if (dev->liveupdate_outgoing)
> +		return -EBUSY;
> +
> +	ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
> +	if (ret)
> +		return ret;
> +
> +	if (ser->nr_devices == ser->max_nr_devices)
> +		return -E2BIG;
> +
> +	for (i = ser->nr_devices; i > 0; i--) {
> +		struct pci_dev_ser *prev = &ser->devices[i - 1];
> +		int cmp = pci_dev_ser_cmp(&new, prev);
> +
> +		/*
> +		 * This should never happen unless there is a kernel bug or
> +		 * corruption that causes the state in struct pci_ser to get out
> +		 * of sync with struct pci_dev.

Huh.  Same comment as above.  I don't think this is telling me
anything useful.  I guess what happened is we're trying to preserve X
and X is already in "ser", but we should have returned -EBUSY above
for that case.  If we're just saying memory corruption could cause
bugs, I think that's pointless.

Actually I'm not even sure we should check for this.

> +		 */
> +		if (WARN_ON_ONCE(!cmp))
> +			return -EBUSY;
> +
> +		if (cmp > 0)
> +			break;
> +
> +		ser->devices[i] = *prev;
> +	}
> +
> +	ser->devices[i] = new;
> +	ser->nr_devices++;
> +	dev->liveupdate_outgoing = true;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_liveupdate_preserve);
> +
> +void pci_liveupdate_unpreserve(struct pci_dev *dev)
> +{
> +	struct pci_ser *ser;
> +	int ret;
> +
> +	/* This should never happen unless the caller (driver) is buggy */
> +	if (WARN_ON_ONCE(!dev->liveupdate_outgoing))

Why once?  Is there some situation where we could get a flood?  Since
we have a pci_dev, maybe a pci_warn() that would indicate the driver
and device would be more useful?

> +		return;
> +
> +	guard(mutex)(&pci_flb_outgoing_lock);
> +
> +	ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
> +
> +	/* This should never happen unless there is a bug in LUO */
> +	if (WARN_ON_ONCE(ret))

Is LUO completely in-kernel?  I think this warning message would be
kind of obscure if this is something that could be triggered by a
userspace bug.  Also, we do have the pci_dev, which a WARN_ON_ONCE()
doesn't take advantage of at all.

> +		return;
> +
> +	pci_ser_delete(ser, dev);
> +	dev->liveupdate_outgoing = false;
> +}
> +EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
> +
> +static int pci_liveupdate_flb_get_incoming(struct pci_ser **serp)
> +{
> +	int ret;
> +
> +	ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)serp);
> +
> +	/* Live Update is not enabled. */
> +	if (ret == -EOPNOTSUPP)
> +		return ret;
> +
> +	/* Live Update is enabled, but there is no incoming FLB data. */
> +	if (ret == -ENODATA)
> +		return ret;
> +
> +	/*
> +	 * Live Update is enabled and there is incoming FLB data, but none of it
> +	 * matches pci_liveupdate_flb.compatible.
> +	 *
> +	 * This could mean that no PCI FLB data was passed by the previous
> +	 * kernel, but it could also mean the previous kernel used a different
> +	 * compatibility string (i.e.a different ABI). The latter deserves at
> +	 * least a WARN_ON_ONCE() but it cannot be distinguished from the
> +	 * former.

This says both "there is incoming FLB data" and "no PCI FLB data".  I
guess maybe it's possible to have FLB data but no *PCI* FLB data?

s/i.e.a/i.e., /

> +	 */
> +	if (ret == -ENOENT) {
> +		pr_info_once("PCI: No incoming FLB data detected during Live Update");

Not sure "FLB" will be meaningful to users here.  Maybe we could say
something like ("no FLB data compatible with %s\n", pci_liveupdate_flb.compatible)?

> +		return ret;
> +	}
> +
> +	/*
> +	 * There is incoming FLB data that matches pci_liveupdate_flb.compatible
> +	 * but it cannot be retrieved. Proceed with standard initialization as
> +	 * if there was not incoming PCI FLB data.

s/if there was not/if there was no/

> +	 */
> +	WARN_ONCE(ret, "PCI: Failed to retrieve incoming FLB data during Live Update");
> +	return ret;
> +}
> +
> +u32 pci_liveupdate_incoming_nr_devices(void)
> +{
> +	struct pci_ser *ser;
> +
> +	if (pci_liveupdate_flb_get_incoming(&ser))
> +		return 0;

Seems slightly overcomplicated to return various error codes from
pci_liveupdate_flb_get_incoming(), only to throw them away here and
special-case the "return 0".  I think you *could* set
"ser->nr_devices" to zero at entry to
pci_liveupdate_flb_get_incoming() and make this just:

  pci_liveupdate_flb_get_incoming(&ser);
  return ser->nr_devices;

> +	return ser->nr_devices;
> +}
> +
> +void pci_liveupdate_setup_device(struct pci_dev *dev)
> +{
> +	struct pci_ser *ser;
> +
> +	if (pci_liveupdate_flb_get_incoming(&ser))
> +		return;
> +
> +	if (!pci_ser_find(ser, dev))
> +		return;

If pci_liveupdate_flb_get_incoming() set ser->nr_devices to zero at
entry, the bsearch() in pci_ser_find() would return NULL if there were
no devices to search:

  pci_liveupdate_flb_get_incoming(&ser);
  if (!pci_ser_find(ser, dev))
    return;

> +	dev->liveupdate_incoming = true;
> +}
> +
> +int pci_liveupdate_retrieve(struct pci_dev *dev)
> +{
> +	if (!dev->liveupdate_incoming)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_liveupdate_retrieve);
> +
> +void pci_liveupdate_finish(struct pci_dev *dev)
> +{
> +	dev->liveupdate_incoming = false;
> +}
> +EXPORT_SYMBOL_GPL(pci_liveupdate_finish);
> +
> +int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
> +{
> +	return liveupdate_register_flb(fh, &pci_liveupdate_flb);
> +}
> +EXPORT_SYMBOL_GPL(pci_liveupdate_register_flb);
> +
> +void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
> +{
> +	liveupdate_unregister_flb(fh, &pci_liveupdate_flb);
> +}
> +EXPORT_SYMBOL_GPL(pci_liveupdate_unregister_flb);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 13d998fbacce..979cb9921340 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1434,4 +1434,18 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
>  	(PCI_CONF1_ADDRESS(bus, dev, func, reg) | \
>  	 PCI_CONF1_EXT_REG(reg))
>  
> +#ifdef CONFIG_PCI_LIVEUPDATE
> +void pci_liveupdate_setup_device(struct pci_dev *dev);
> +u32 pci_liveupdate_incoming_nr_devices(void);
> +#else
> +static inline void pci_liveupdate_setup_device(struct pci_dev *dev)
> +{
> +}
> +
> +static inline u32 pci_liveupdate_incoming_nr_devices(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #endif /* DRIVERS_PCI_H */
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index bccc7a4bdd79..c60222d45659 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2064,6 +2064,8 @@ int pci_setup_device(struct pci_dev *dev)
>  	if (pci_early_dump)
>  		early_dump_pci_device(dev);
>  
> +	pci_liveupdate_setup_device(dev);
> +
>  	/* Need to have dev->class ready */
>  	dev->cfg_size = pci_cfg_space_size(dev);
>  
> diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h
> new file mode 100644
> index 000000000000..7764795f6818
> --- /dev/null
> +++ b/include/linux/kho/abi/pci.h

It seems like most of include/linux/ is ABI, so does kho/abi/ need to
be separated out in its own directory?

It's kind of unusual for the hierarchy to be this deep, especially
since abi/ is the only thing in include/linux/kho/.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups
  2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
  2026-03-24 13:07   ` Yi Liu
@ 2026-03-25 23:13   ` Bjorn Helgaas
  1 sibling, 0 replies; 41+ messages in thread
From: Bjorn Helgaas @ 2026-03-25 23:13 UTC (permalink / raw)
  To: David Matlack
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

On Mon, Mar 23, 2026 at 11:57:55PM +0000, David Matlack wrote:
> Require that Live Update preserved devices are in singleton iommu_groups
> during preservation (outgoing kernel) and retrieval (incoming kernel).
> 
> PCI devices preserved across Live Update will be allowed to perform
> memory transactions throughout the Live Update. Thus IOMMU groups for
> preserved devices must remain fixed. Since all current use cases for
> Live Update are for PCI devices in singleton iommu_groups, require that
> as a starting point. This avoids the complexity of needing to enforce
> arbitrary iommu_group topologies while still allowing all current use
> cases.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  drivers/pci/liveupdate.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> index bec7b3500057..a3dbe06650ff 100644
> --- a/drivers/pci/liveupdate.c
> +++ b/drivers/pci/liveupdate.c
> @@ -75,6 +75,8 @@
>   *
>   *  * The device must not be a Physical Function (PF).
>   *
> + *  * The device must be the only device in its IOMMU group.
> + *
>   * Preservation Behavior
>   * =====================
>   *
> @@ -105,6 +107,7 @@
>  
>  #include <linux/bsearch.h>
>  #include <linux/io.h>
> +#include <linux/iommu.h>
>  #include <linux/kexec_handover.h>
>  #include <linux/kho/abi/pci.h>
>  #include <linux/liveupdate.h>
> @@ -222,6 +225,31 @@ static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
>  	ser->nr_devices--;
>  }
>  
> +static int count_devices(struct device *dev, void *__nr_devices)
> +{
> +	(*(int *)__nr_devices)++;
> +	return 0;
> +}
> +
> +static int pci_liveupdate_validate_iommu_group(struct pci_dev *dev)
> +{
> +	struct iommu_group *group;
> +	int nr_devices = 0;
> +
> +	group = iommu_group_get(&dev->dev);
> +	if (group) {
> +		iommu_group_for_each_dev(group, &nr_devices, count_devices);
> +		iommu_group_put(group);
> +	}
> +
> +	if (nr_devices != 1) {
> +		pci_warn(dev, "Live Update preserved devices must be in singleton iommu groups!");
> +		return -EINVAL;
> +	}
> +
> +	return 0;

I assume the requirement is that there *is* an iommu_group and also
that dev is the only member.  If so, I think the intent would be a
little clearer as:

    group = iommu_group_get(&dev->dev);
    if (!group)
      goto no_group;

    iommu_group_for_each_dev(group, &nr_devices, count_devices);
    iommu_group_put(group);

    if (nr_devices == 1) {
      return 0;

  no_group:
    pci_warn(...);
    return -EINVAL;

> +}
> +
>  int pci_liveupdate_preserve(struct pci_dev *dev)
>  {
>  	struct pci_dev_ser new = INIT_PCI_DEV_SER(dev);
> @@ -232,6 +260,10 @@ int pci_liveupdate_preserve(struct pci_dev *dev)
>  	if (dev->is_virtfn || dev->is_physfn)
>  		return -EINVAL;
>  
> +	ret = pci_liveupdate_validate_iommu_group(dev);
> +	if (ret)
> +		return ret;
> +
>  	guard(mutex)(&pci_flb_outgoing_lock);
>  
>  	if (dev->liveupdate_outgoing)
> @@ -357,7 +389,7 @@ int pci_liveupdate_retrieve(struct pci_dev *dev)
>  	if (!dev->liveupdate_incoming)
>  		return -EINVAL;
>  
> -	return 0;
> +	return pci_liveupdate_validate_iommu_group(dev);
>  }
>  EXPORT_SYMBOL_GPL(pci_liveupdate_retrieve);
>  
> -- 
> 2.53.0.983.g0bb29b3bc5-goog
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files
  2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
                   ` (23 preceding siblings ...)
  2026-03-23 23:58 ` [PATCH v3 24/24] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test David Matlack
@ 2026-03-26 20:43 ` David Matlack
  24 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-26 20:43 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: Adithya Jayachandran, Alexander Graf, Alex Mastro, Andrew Morton,
	Ankit Agrawal, Arnd Bergmann, Askar Safin, Borislav Petkov (AMD),
	Chris Li, Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan,
	Jason Gunthorpe, Jason Gunthorpe, Jonathan Corbet, Josh Hilke,
	Kees Cook, Kevin Tian, kexec, kvm, Leon Romanovsky,
	Leon Romanovsky, linux-doc, linux-kernel, linux-kselftest,
	linux-mm, linux-pci, Li RongQing, Lukas Wunner, Marco Elver,
	Michał Winiarski, Mike Rapoport, Parav Pandit,
	Pasha Tatashin, Paul E. McKenney, Pawan Gupta,
	Peter Zijlstra (Intel), Pranjal Shrivastava, Pratyush Yadav,
	Raghavendra Rao Ananta, Randy Dunlap, Rodrigo Vivi,
	Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma,
	Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

On Mon, Mar 23, 2026 at 4:58 PM David Matlack <dmatlack@google.com> wrote:
>
> This series can be found on GitHub:
>
>   https://github.com/dmatlack/linux/tree/liveupdate/vfio/cdev/v3
>
> This series adds the base support to preserve a VFIO device file across
> a Live Update. "Base support" means that this allows userspace to
> safely preserve a VFIO device file with LIVEUPDATE_SESSION_PRESERVE_FD
> and retrieve it with  LIVEUPDATE_SESSION_RETRIEVE_FD, but the device
> itself is not preserved in a fully running state across Live Update.

Apologies for the large number of people who got added to the CC list
on this version of the patchset. The changes to
Documentation/admin-guide/kernel-parameters.txt in patch 4 caused
scripts/get_maintainer.pl to CC a number of additional people due to
--git-fallback. I'll fix that in the next version.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update
  2026-03-25 23:12   ` Bjorn Helgaas
@ 2026-03-26 21:39     ` David Matlack
  0 siblings, 0 replies; 41+ messages in thread
From: David Matlack @ 2026-03-26 21:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alex Williamson, Bjorn Helgaas, Adithya Jayachandran,
	Alexander Graf, Alex Mastro, Andrew Morton, Ankit Agrawal,
	Arnd Bergmann, Askar Safin, Borislav Petkov (AMD), Chris Li,
	Dapeng Mi, David Rientjes, Feng Tang, Jacob Pan, Jason Gunthorpe,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Kees Cook,
	Kevin Tian, kexec, kvm, Leon Romanovsky, Leon Romanovsky,
	linux-doc, linux-kernel, linux-kselftest, linux-mm, linux-pci,
	Li RongQing, Lukas Wunner, Marco Elver, Michał Winiarski,
	Mike Rapoport, Parav Pandit, Pasha Tatashin, Paul E. McKenney,
	Pawan Gupta, Peter Zijlstra (Intel), Pranjal Shrivastava,
	Pratyush Yadav, Raghavendra Rao Ananta, Randy Dunlap,
	Rodrigo Vivi, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, Vivek Kasireddy, William Tu, Yi Liu, Zhu Yanjun

On 2026-03-25 06:12 PM, Bjorn Helgaas wrote:

Thank you for the thorough review Bjorn!

> On Mon, Mar 23, 2026 at 11:57:54PM +0000, David Matlack wrote:
> > Add an API to enable the PCI subsystem to participate in a Live Update
> > and track all devices that are being preserved by drivers. Since this
> > support is still under development, hide it behind a new Kconfig
> > PCI_LIVEUPDATE that is marked experimental.
> 
> Can you list the interfaces being added here

Yes will do.

> > +config PCI_LIVEUPDATE
> > +	bool "PCI Live Update Support (EXPERIMENTAL)"
> > +	depends on PCI && LIVEUPDATE
> > +	help
> > +	  Support for preserving PCI devices across a Live Update. This option
> > +	  should only be enabled by developers working on implementing this
> > +	  support. Once enough support as landed in the kernel, this option
> > +	  will no longer be marked EXPERIMENTAL.
> 
> This would be a good place for a one-sentence explanation of what
> "preserving PCI devices" means.  Obviously the physical devices stay
> there; what's interesting is that the hardware continues operating
> without interruption across the update.
> 
> s/support as landed/support has landed/ (maybe no need for this
> sentence at all)

Will do.

> > + * Driver API
> > + * ==========
> > + *
> > + * Drivers that support file-based device preservation must register their
> > + * ``liveupdate_file_handler`` with the PCI subsystem by calling
> > + * ``pci_liveupdate_register_flb()``. This ensures the PCI subsystem will be
> > + * notified whenever a device file is preserved so that ``struct pci_ser``
> > + * can be allocated to track all preserved devices. This struct is an ABI
> > + * and is eventually handed off to the next kernel via Kexec-Handover (KHO).
> > + *
> > + * In the "outgoing" kernel (before kexec), drivers should then notify the PCI
> > + * subsystem directly whenever the preservation status for a device changes:
> > + *
> > + *  * ``pci_liveupdate_preserve(pci_dev)``: The device is being preserved.
> > + *
> > + *  * ``pci_liveupdate_unpreserve(pci_dev)``: The device is no longer being
> > + *    preserved (preservation is cancelled).
> > + *
> > + * In the "incoming" kernel (after kexec), drivers should notify the PCI
> > + * subsystem with the following calls:
> > + *
> > + *  * ``pci_liveupdate_retrieve(pci_dev)``: The device file is being retrieved
> > + *    by userspace.
> 
> I'm not clear on what this means.  Is this telling the PCI core that
> somebody else (userspace?) is doing something?  Why does the PCI core
> care?  The name suggests that this interface would retrieve some data
> from the PCI core, but that doesn't seem to be what's happening.

I think this function can go away in the next version.

I added this so that the PCI core could prevent userspace from
retrieving the preserved file associated with the device from LUO if
the device is not in a singleton IOMMU group (see next patch). But per
the discussion with Yi I am going to move that check to probe time.

> > + *
> > + *  * ``pci_liveupdate_finish(pci_dev)``: The device is done participating in
> > + *    Live Update. After this point the device may no longer be even associated
> > + *    with the same driver.
> 
> This sets "dev->liveupdate_incoming = false", and the only place we
> check that is in pci_liveupdate_retrieve().  In particular, there's
> nothing in the driver bind/unbind paths that seems related.  I guess
> pci_liveupdate_finish() just means the driver can't call
> pci_liveupdate_retrieve() any more?

liveupdate_incoming is used by VFIO in patch 10:

  https://lore.kernel.org/kvm/20260323235817.1960573-11-dmatlack@google.com/

Fundamentally, I think drivers will need to know that the device they
are dealing with was preserved across the Live Update so they can react
accordingly and this is how they know. This feels like an appropriate
responsibility to delegate to the PCI core since it can be common across
all PCI devices, rather than requiring drivers to store their own state
about which devices were preserved. I suspect PCI core will also use
liveupdate_incoming in the future (e.g. to avoid assigning new BARs) as
we implement more of the device preservation.

And in case you are also wondering about liveupdate_outgoing, I forsee
that being used for things like skipping disabling bus mastering in
pci_device_shutdown().

I think it would be a good idea to try to split this patch up, so there
is more breathing room to explain this context in the commit messages.
> 
> > + * device file for as long as it is preserved.
> > + *
> > + * However, there is a window of time in the incoming kernel when a device is
> > + * first probed and when userspace retrieves the device file with
> > + * ``LIVEUPDATE_SESSION_RETRIEVE_FD`` when the device could be bound to any
> > + * driver.
> 
>   ... window of time in the incoming kernel between a device being
>   probed and userspace retrieving the device file ... when the device
>   could be bound ...
> 
> I'm not sure what it means to retrieve a device file.  It doesn't
> sound like the usual Unix "device file" or "special file" in /dev/,
> since those aren't "retrieved".

For the forseeable future, device preservation will be triggered by
userspace preserving a VFIO device file in a LUO session using the ioctl
LIVEUPDATE_SESSION_PRESERVE_FD.  After kexec, userspace retrieves the
preserved file with the ioctl LIVEUPDATE_SESSION_RETRIEVE_FD.

This section would probably make more sense if it talked about VFIO
specifically instead of abstract "files" since that is the currently the
only use-case.

I expect non-VFIO drivers (i.e. "in-kernel") drivers could be supported
eventually but they will likely need a different API.

> > +static DEFINE_MUTEX(pci_flb_outgoing_lock);
> 
> It'd be handy if there were some excuse to mention "FLB" and expand it
> once in the doc above, since I have no idea what it means or where to
> look for it.  Maybe unfortunate that it will be pronounced "flub" ;)

I will add a section explaining FLB to the kerneldoc above.

> > +static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
> > +{
> > +	struct pci_dev *dev = NULL;
> > +	int max_nr_devices = 0;
> > +	struct pci_ser *ser;
> > +	unsigned long size;
> > +
> > +	/*
> > +	 * Don't both accounting for VFs that could be created after this
> > +	 * since preserving VFs is not supported yet. Also don't account
> > +	 * for devices that could be hot-plugged after this since preserving
> > +	 * hot-plugged devices across Live Update is not yet an expected
> > +	 * use-case.
> 
> s/Don't both accounting/Don't bother accounting/ ? not sure of intent

"Don't bother" was the intent.

> I suspect the important thing here is that this allocates space for
> preserving X devices, and each subsequent pci_liveupdate_preserve()
> call from a driver uses up one of those slots.
> 
> My guess is this is just an allocation issue and from that point of
> view there's no actual problem with enabling VFs or hot-adding devices
> after this point; it's just that pci_liveupdate_preserve() will fail
> after X calls.

Yes that is correct.

> > +static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
> > +{
> > +	struct pci_ser *ser = args->obj;
> > +
> > +	WARN_ON_ONCE(ser->nr_devices);
> 
> I guess this means somebody (userspace?) called .unpreserve() before
> all the drivers that had called pci_liveupdate_preserve() have also
> called pci_liveupdate_unpreserve()?
> 
> If this is userspace-triggerable, maybe it's worth a meaningful
> message including one or more of the device IDs from ser->devices[]?

This is not userspace triggerable unless there is a bug in LUO and/or
the driver (VFIO). By the way, that is the case for all of the WARN_ONs
in this commit. They are no userspace-triggerable, they are just there
to catch "this should never happen, there must be a kernel bug" type
issues.

I see that a lot of your comments are about these WARN_ONs so do you
have any general guidance on how I should be handling them?

> > +static void pci_ser_delete(struct pci_ser *ser, struct pci_dev *dev)
> > +{
> > +	struct pci_dev_ser *dev_ser;
> > +	int i;
> > +
> > +	dev_ser = pci_ser_find(ser, dev);
> > +
> > +	/*
> > +	 * This should never happen unless there is a kernel bug or
> > +	 * corruption that causes the state in struct pci_ser to get
> > +	 * out of sync with struct pci_dev.
> 
> Corruption can be a bug anywhere and isn't really worth mentioning,
> but the "out of sync" part sounds like it glosses over something
> important.
> 
> I guess this happens if there was no successful
> pci_liveupdate_preserve(X) before calling
> pci_liveupdate_unpreserve(X)?  That does sound like a kernel bug (I
> suppose a VFIO or other driver bug?), and I would just say what
> happened directly instead of calling it "out of sync".

No not even that would cause this warning to fire because
pci_liveupdate_unpreserve() bails immediately if liveupdate_outgoing
isn't true. This truly should never happen, hence the WARN.

> 
> > +	 */
> > +	if (pci_WARN_ONCE(dev, !dev_ser, "Cannot find preserved device!"))
> 
> Seems like an every-time sort of message if this indicates a driver bug?
> 
> It's enough of a hassle to convince myself that pci_WARN_ONCE()
> returns the value that caused the warning that I would prefer:
> 
>   if (!dev_ser) {
>     pci_warn(...) or pci_WARN_ONCE(...)
>     return;
>   }

For "this should really never happen" warnings, which is the case here,
my preference is to use WARN_ON_ONCE() since you only need to see it
happen once to know there is a bug somewhere, and logging every time can
lead to overwhelmingly interleaved logs if it happens too many times.

> > +	for (i = ser->nr_devices; i > 0; i--) {
> > +		struct pci_dev_ser *prev = &ser->devices[i - 1];
> > +		int cmp = pci_dev_ser_cmp(&new, prev);
> > +
> > +		/*
> > +		 * This should never happen unless there is a kernel bug or
> > +		 * corruption that causes the state in struct pci_ser to get out
> > +		 * of sync with struct pci_dev.
> 
> Huh.  Same comment as above.  I don't think this is telling me
> anything useful.  I guess what happened is we're trying to preserve X
> and X is already in "ser", but we should have returned -EBUSY above
> for that case.  If we're just saying memory corruption could cause
> bugs, I think that's pointless.
> 
> Actually I'm not even sure we should check for this.
> 
> > +		 */
> > +		if (WARN_ON_ONCE(!cmp))
> > +			return -EBUSY;

This is another "this should really never happen" check. I could just
return without warning but this is a sign that something is very wrong
somewhere in the kernel and it is trivial to just add WARN_ON_ONCE() so
that it gets flagged in dmesg. In my experience that can be very helpful
to track down logic bugs during developemt and rare race conditions at
scale in production environments.

> > +void pci_liveupdate_unpreserve(struct pci_dev *dev)
> > +{
> > +	struct pci_ser *ser;
> > +	int ret;
> > +
> > +	/* This should never happen unless the caller (driver) is buggy */
> > +	if (WARN_ON_ONCE(!dev->liveupdate_outgoing))
> 
> Why once?  Is there some situation where we could get a flood?  Since
> we have a pci_dev, maybe a pci_warn() that would indicate the driver
> and device would be more useful?

ONCE because this is a sign of a kernel bug and one instance is enough
to warrant debugging and fixing. Allowing multiple could lead to logs
interleaving, log rotation, and other issues if there is an excessive
amount.

I also chose full WARN_ON_ONCE() over just a warning log line so that
the user gets a backtrace and can see the caller.

I agree that showing the PCI device and driver would be helpful so
pci_WARN_ONCE() would be better.

> > +	ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
> > +
> > +	/* This should never happen unless there is a bug in LUO */
> > +	if (WARN_ON_ONCE(ret))
> 
> Is LUO completely in-kernel? 

Yes

> I think this warning message would be
> kind of obscure if this is something that could be triggered by a
> userspace bug.

This can only be triggered by a kernel bug.

> Also, we do have the pci_dev, which a WARN_ON_ONCE()
> doesn't take advantage of at all.

I'll switch to pci_WARN_ONCE().

> > +	/*
> > +	 * Live Update is enabled and there is incoming FLB data, but none of it
> > +	 * matches pci_liveupdate_flb.compatible.
> > +	 *
> > +	 * This could mean that no PCI FLB data was passed by the previous
> > +	 * kernel, but it could also mean the previous kernel used a different
> > +	 * compatibility string (i.e.a different ABI). The latter deserves at
> > +	 * least a WARN_ON_ONCE() but it cannot be distinguished from the
> > +	 * former.
> 
> This says both "there is incoming FLB data" and "no PCI FLB data".  I
> guess maybe it's possible to have FLB data but no *PCI* FLB data?

Yes, PCI is just the users of File-Lifecycle Bound (FLB) data to
preserve state across Live Update.

> s/i.e.a/i.e., /

Will do.


> > +	 */
> > +	if (ret == -ENOENT) {
> > +		pr_info_once("PCI: No incoming FLB data detected during Live Update");
> 
> Not sure "FLB" will be meaningful to users here.  Maybe we could say
> something like ("no FLB data compatible with %s\n", pci_liveupdate_flb.compatible)?

Good idea, will do!

> > +u32 pci_liveupdate_incoming_nr_devices(void)
> > +{
> > +	struct pci_ser *ser;
> > +
> > +	if (pci_liveupdate_flb_get_incoming(&ser))
> > +		return 0;
> 
> Seems slightly overcomplicated to return various error codes from
> pci_liveupdate_flb_get_incoming(), only to throw them away here and
> special-case the "return 0".  I think you *could* set
> "ser->nr_devices" to zero at entry to
> pci_liveupdate_flb_get_incoming() and make this just:
> 
>   pci_liveupdate_flb_get_incoming(&ser);
>   return ser->nr_devices;

pci_liveupdate_flb_get_incoming() fetches the preserved pci_ser struct
from LUO (the struct that the previous kernel allocated and populated).
If pci_liveupdate_flb_get_incoming() returns an error, it means there
was no struct pci_ser preserved by the previous kernel (or at least not
that the current kernel is compatible with), so we return 0 here to
indicate that 0 devices were preserved.

> > +void pci_liveupdate_setup_device(struct pci_dev *dev)
> > +{
> > +	struct pci_ser *ser;
> > +
> > +	if (pci_liveupdate_flb_get_incoming(&ser))
> > +		return;
> > +
> > +	if (!pci_ser_find(ser, dev))
> > +		return;
> 
> If pci_liveupdate_flb_get_incoming() set ser->nr_devices to zero at
> entry, the bsearch() in pci_ser_find() would return NULL if there were
> no devices to search:
> 
>   pci_liveupdate_flb_get_incoming(&ser);
>   if (!pci_ser_find(ser, dev))
>     return;

I think this is explained by my reply to the previous comment.  If
pci_liveupdate_flb_get_incoming() returns an error then there was no
pci_ser struct passed to use by the previous kernel. Thus we return.

> > diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h
> > new file mode 100644
> > index 000000000000..7764795f6818
> > --- /dev/null
> > +++ b/include/linux/kho/abi/pci.h
> 
> It seems like most of include/linux/ is ABI, so does kho/abi/ need to
> be separated out in its own directory?

include/linux/kho/abi/ contains all of the structs, enums, etc. that are
handed off between kernels during a Live Update. If almost anything
changes in this directory, it breaks our ability to upgrade/downgrade
via Live Update. That's why it's split off into its own directory.

include/linux/ is not part of the Live Update ABI. Changes to those
headers to not affect our ability to upgrade/downgrade via Live Update.

> It's kind of unusual for the hierarchy to be this deep, especially
> since abi/ is the only thing in include/linux/kho/.

Yes I agree, but that is outside the scope of this patchset I think.
This directory already exists.


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2026-03-26 21:39 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
2026-03-23 23:57 ` [PATCH v3 01/24] liveupdate: Export symbols needed by modules David Matlack
2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
2026-03-25 20:06   ` David Matlack
2026-03-25 23:12   ` Bjorn Helgaas
2026-03-26 21:39     ` David Matlack
2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
2026-03-24 13:07   ` Yi Liu
2026-03-24 18:00     ` David Matlack
2026-03-25 11:12       ` Yi Liu
2026-03-25 17:29         ` David Matlack
2026-03-25 23:13   ` Bjorn Helgaas
2026-03-23 23:57 ` [PATCH v3 04/24] PCI: Inherit bus numbers from previous kernel during Live Update David Matlack
2026-03-23 23:57 ` [PATCH v3 05/24] docs: liveupdate: Add documentation for PCI David Matlack
2026-03-23 23:57 ` [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
2026-03-24 13:07   ` Yi Liu
2026-03-24 16:33     ` David Matlack
2026-03-23 23:57 ` [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update David Matlack
2026-03-24 13:08   ` Yi Liu
2026-03-24 16:46     ` David Matlack
2026-03-23 23:58 ` [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after " David Matlack
2026-03-24 13:08   ` Yi Liu
2026-03-24 17:05     ` David Matlack
2026-03-23 23:58 ` [PATCH v3 09/24] vfio/pci: Notify PCI subsystem about devices preserved across " David Matlack
2026-03-23 23:58 ` [PATCH v3 10/24] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD David Matlack
2026-03-23 23:58 ` [PATCH v3 11/24] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device David Matlack
2026-03-23 23:58 ` [PATCH v3 12/24] vfio/pci: Skip reset of preserved device after Live Update David Matlack
2026-03-23 23:58 ` [PATCH v3 13/24] docs: liveupdate: Add documentation for VFIO PCI David Matlack
2026-03-23 23:58 ` [PATCH v3 14/24] selftests/liveupdate: Move luo_test_utils.* into a reusable library David Matlack
2026-03-23 23:58 ` [PATCH v3 15/24] selftests/liveupdate: Add helpers to preserve/retrieve FDs David Matlack
2026-03-23 23:58 ` [PATCH v3 16/24] vfio: selftests: Build liveupdate library in VFIO selftests David Matlack
2026-03-23 23:58 ` [PATCH v3 17/24] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED David Matlack
2026-03-23 23:58 ` [PATCH v3 18/24] vfio: selftests: Add vfio_pci_liveupdate_uapi_test David Matlack
2026-03-23 23:58 ` [PATCH v3 19/24] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD David Matlack
2026-03-23 23:58 ` [PATCH v3 20/24] vfio: selftests: Add vfio_pci_liveupdate_kexec_test David Matlack
2026-03-23 23:58 ` [PATCH v3 21/24] vfio: selftests: Expose iommu_modes to tests David Matlack
2026-03-23 23:58 ` [PATCH v3 22/24] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device David Matlack
2026-03-23 23:58 ` [PATCH v3 23/24] vfio: selftests: Verify that opening VFIO device fails during Live Update David Matlack
2026-03-23 23:58 ` [PATCH v3 24/24] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test David Matlack
2026-03-26 20:43 ` [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
  -- strict thread matches above, loose matches on Subject: below --
2026-03-25 14:51 [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups Liu, Yi L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox