* [PATCH v4 00/11] PCI: liveupdate: PCI core support for Live Update
@ 2026-04-23 21:23 David Matlack
2026-04-23 21:23 ` [PATCH v4 01/11] PCI: liveupdate: Set up FLB handler for the PCI core David Matlack
` (10 more replies)
0 siblings, 11 replies; 16+ messages in thread
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
This series can be found on GitHub:
https://github.com/dmatlack/linux/tree/liveupdate/pci/base/v4
This patch series introduces support in the PCI core for Live Update,
enabling drivers to preserve PCI devices across a kexec-based kernel
update without interrupting the device. This functionality is critical
for minimizing downtime in environments where PCI devices (e.g., those
assigned to VMs via VFIO) must continue operating or maintain state
across a host kernel upgrade.
This series was split off from the VFIO driver series [1] to enable
more rapid iteration on the PCI core changes, add breathing room to
split changes into smaller patches, and add some more functionality.
Series Overview
---------------
This series implements the following to support PCI device preservation
across Live Update:
1. Set up a File-Lifecycle-Bound (FLB) handler to track and preserve
PCI-specific state (struct pci_ser) across Live Update using Kexec
Handover (KHO).
2. Add APIs for drivers to register "outgoing" devices for
preservation and for the PCI core to identify "incoming" preserved
devices during enumeration.
3. Automatically preserve all upstream bridges for any preserved
endpoint. Use reference counting to ensure bridges remain preserved
as long as any downstream device is preserved.
4. Inherit secondary/subordinate bus numbers, ARI Forwarding Enable,
and Access Control Services (ACS) flags from the previous kernel to
ensure a stable routing fabric and consistent IOMMU group
assignments during Live Update.
5. Restrict preservation to devices in immutable singleton IOMMU
groups. Require that all upstream bridges have the necessary ACS
features enabled to prevent IOMMU group changes across the update.
6. Modify the PCI shutdown path to avoid disabling bus mastering on
preserved devices and their upstream bridges, allowing memory
transactions to continue uninterrupted.
7. Provide comprehensive documentation for the FLB API, device
tracking mechanisms, and the division of responsibilities between
the PCI core, drivers, and userspace.
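To make item 3 above concrete, here is a minimal userspace sketch of
the refcounting idea. It is not code from this series: struct
bridge_model_dev and the bridge_model_*() helpers are hypothetical
stand-ins, and the real implementation may differ in locking and in
where the count is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct bridge_model_dev {
	struct bridge_model_dev *upstream; /* parent bridge, NULL on a root bus */
	unsigned int preserve_refs;        /* > 0 means "preserved" */
};

/* Preserve an endpoint and take one reference on every upstream bridge. */
static void bridge_model_preserve(struct bridge_model_dev *dev)
{
	for (; dev; dev = dev->upstream)
		dev->preserve_refs++;
}

/* Drop the references taken by bridge_model_preserve(). */
static void bridge_model_unpreserve(struct bridge_model_dev *dev)
{
	for (; dev; dev = dev->upstream) {
		assert(dev->preserve_refs > 0);
		dev->preserve_refs--;
	}
}

/* A device stays preserved while anything at or below it holds a reference. */
static int bridge_model_is_preserved(const struct bridge_model_dev *dev)
{
	return dev->preserve_refs > 0;
}
```

With two endpoints behind one bridge, the bridge in this model only
stops being preserved after both endpoints have been unpreserved.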
This series could be simplified down to fewer patches by limiting
preservation support to only devices on a root bus. Supporting devices
downstream of bridges could be split off into a follow-up series.
However, since I got bridge preservation working and the series was
fewer than 15 patches, I opted to include it for now.
Dependencies
------------
This series depends on 2 LUO patches to enable refcounting of the
incoming FLB so that it is safe for the PCI core to use
liveupdate_flb_get_incoming() during enumeration.
https://lore.kernel.org/lkml/20260423174032.3140399-1-dmatlack@google.com/
VFIO support for PCI device preservation is built on top of this series.
The following branch on GitHub contains all the patches together to
enable testing (the LUO FLB changes, this series, and the VFIO patches):
https://github.com/dmatlack/linux/tree/liveupdate/pci/base/v4-with-vfio
Testing
-------
This series was tested in combination with the VFIO patches mentioned in
the previous section using the new VFIO selftests:
- vfio_pci_liveupdate_uapi_test
- vfio_pci_liveupdate_kexec_test
Both tests were run in a QEMU-based VM environment, using a
single virtio-net PCIe device behind a PCI-to-PCI bridge as the test
device, and in a baremetal environment on an Intel EMR server, using 8x
Intel DSA PCIe devices (each on a host bridge).
Future Work
-----------
After this series we expect to make further improvements to the PCI core
support for Live Update. Once these are done we plan to drop the
"experimental" verbiage from the PCI_LIVEUPDATE Kconfig help message and
documentation.
- Ensure bridges with downstream preserved devices stay in D0 across
Live Update in case preserved endpoints are doing memory
transactions.
- Preserve BARs of all preserved devices to avoid disrupting P2P
Beyond that we also plan to add support for preserving Virtual Functions
since that is a major use-case for Cloud environments. This will require
keeping SR-IOV enabled on the parent PF across a Live Update.
Changelog
---------
v4:
Enhancements on top of previous series:
- Split "PCI: Add API to track PCI devices preserved across Live
Update" from v3 into 4 separate commits to make reviewing easier (FLB
setup, outgoing device tracking, incoming device tracking, and
documentation for driver binding)
- Use new incoming FLB refcounting to avoid use-after-free bugs during
enumeration
- Use an xarray to speed up lookups of incoming preserved devices
during enumeration
- Use a per-device bit to indicate when secondary and subordinate bus
numbers should be inherited on bridges instead of global data to
avoid races between the 2 passes
- Inherit ARI enablement across Live Update
- Automatically preserve bridges upstream of preserved endpoints so
that ACS flags, ARI enablement, and bus mastering can be kept
constant on bridges across Live Update
- Avoid clearing bus mastering during shutdown on outgoing preserved
devices to avoid disrupting memory transactions being performed by
preserved devices
- Add a MAINTAINERS entry for the new files to support Live Update in
the PCI core
- Add info and debug level logging for various events throughout device
preservation
Changes based on review feedback on v3:
- Fix up typos, wording, documentation gaps, and code style (Bjorn)
- Use pci_WARN_ONCE() where possible (Bjorn)
- Require ACS flags to preserve devices behind bridges so that
singleton IOMMU group topology is guaranteed to remain across Live
Update (Yi)
- Preserve ACS flags (Jason, Alex)
v3: https://lore.kernel.org/kvm/20260323235817.1960573-1-dmatlack@google.com/
v2: https://lore.kernel.org/kvm/20260129212510.967611-1-dmatlack@google.com/
v1: https://lore.kernel.org/kvm/20251126193608.2678510-1-dmatlack@google.com/
rfc: https://lore.kernel.org/kvm/20251018000713.677779-1-vipinsh@google.com/
[1] https://lore.kernel.org/kvm/20260323235817.1960573-1-dmatlack@google.com/
David Matlack (11):
PCI: liveupdate: Set up FLB handler for the PCI core
PCI: liveupdate: Track outgoing preserved PCI devices
PCI: liveupdate: Track incoming preserved PCI devices
PCI: liveupdate: Document driver binding responsibilities
PCI: liveupdate: Inherit bus numbers during Live Update
PCI: liveupdate: Auto-preserve upstream bridges across Live Update
PCI: liveupdate: Inherit ACS flags in incoming preserved devices
PCI: liveupdate: Require preserved devices are in immutable singleton
IOMMU groups
PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges
PCI: liveupdate: Do not disable bus mastering on preserved devices
during kexec
Documentation: PCI: Add documentation for Live Update
Documentation/PCI/index.rst | 1 +
Documentation/PCI/liveupdate.rst | 23 +
.../admin-guide/kernel-parameters.txt | 6 +-
Documentation/core-api/liveupdate.rst | 1 +
MAINTAINERS | 13 +
drivers/iommu/iommu.c | 35 ++
drivers/pci/Kconfig | 14 +
drivers/pci/Makefile | 1 +
drivers/pci/liveupdate.c | 562 ++++++++++++++++++
drivers/pci/pci-driver.c | 31 +-
drivers/pci/pci.c | 22 +-
drivers/pci/pci.h | 13 +
drivers/pci/probe.c | 25 +-
include/linux/iommu.h | 7 +
include/linux/kho/abi/pci.h | 62 ++
include/linux/pci.h | 58 ++
16 files changed, 858 insertions(+), 16 deletions(-)
create mode 100644 Documentation/PCI/liveupdate.rst
create mode 100644 drivers/pci/liveupdate.c
create mode 100644 include/linux/kho/abi/pci.h
base-commit: a13f7eb5b2d5bef886659768680093bec1c0470d
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 01/11] PCI: liveupdate: Set up FLB handler for the PCI core
From: David Matlack @ 2026-04-23 21:23 UTC
Set up a File-Lifecycle-Bound (FLB) handler for the PCI core to enable
it to participate in the preservation of PCI devices across Live Update.
Essentially, this commit enables the PCI core to allocate a struct
(struct pci_ser) and preserve it across a Live Update whenever at least
one device is preserved.
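As a rough illustration of what gets allocated, the following userspace
sketch mirrors the layout this patch adds in include/linux/kho/abi/pci.h
and computes the allocation size for a given number of devices. The
model_* names are hypothetical. The kernel structs are additionally
marked __packed, and the kernel code uses struct_size_t(), which
saturates on overflow; this sketch uses plain arithmetic instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct pci_dev_ser as defined in this patch. */
struct model_pci_dev_ser {
	uint32_t domain;
	uint16_t bdf;
	uint16_t reserved;
};

/* Userspace mirror of struct pci_ser: a header plus a flexible array. */
struct model_pci_ser {
	uint32_t max_nr_devices;
	uint32_t nr_devices;
	struct model_pci_dev_ser devices[];
};

/* Both the header and each entry occupy 8 bytes, so entries stay aligned. */
static_assert(sizeof(struct model_pci_dev_ser) == 8, "entry is 8 bytes");
static_assert(offsetof(struct model_pci_ser, devices) == 8, "header is 8 bytes");

/* Bytes needed for the header plus max_nr_devices entries (no overflow check). */
static size_t model_pci_ser_size(uint32_t max_nr_devices)
{
	return sizeof(struct model_pci_ser) +
	       (size_t)max_nr_devices * sizeof(struct model_pci_dev_ser);
}
```

In this model a system with three PCI devices would need 8 + 3 * 8 = 32
bytes of preserved memory.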
Preserving PCI devices across Live Update is built on top of the Live
Update Orchestrator's (LUO) support for file preservation. Drivers are
expected to expose a file to userspace to represent a single PCI device
and support preservation of that file. This is intended primarily to
support preservation of PCI devices bound to VFIO drivers.
This commit enables drivers to register their liveupdate_file_handler
with the PCI core so that the PCI core can do its own tracking and
enforcement of which devices are preserved.
pci_liveupdate_register_flb(driver_file_handler);
pci_liveupdate_unregister_flb(driver_file_handler);
When the first file (with a handler registered with the PCI core) is
preserved, the PCI core will be notified to allocate its tracking struct
(pci_ser). When the last file is unpreserved (i.e. preservation
cancelled) the PCI core will be notified to free struct pci_ser.
This struct is preserved across a Live Update using KHO and can be
fetched by the PCI core during early boot (e.g. during device
enumeration) so that it knows which devices were preserved.
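The first-file/last-file lifecycle described above can be modeled in
userspace as follows. This is only a sketch under assumptions: struct
model_flb and its helpers are hypothetical, calloc()/free() stand in for
the KHO allocation helpers, and the real notifications are driven by LUO
rather than by direct calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of File-Lifecycle-Bound data; not the LUO API. */
struct model_flb {
	unsigned int nr_files; /* files currently preserved */
	void *obj;             /* tracking state, valid while nr_files > 0 */
};

/* A file was preserved: allocate the tracking state for the first one. */
static int model_flb_file_preserved(struct model_flb *flb, size_t obj_size)
{
	if (flb->nr_files == 0) {
		flb->obj = calloc(1, obj_size);
		if (!flb->obj)
			return -1;
	}
	flb->nr_files++;
	return 0;
}

/* A file was unpreserved: free the tracking state after the last one. */
static void model_flb_file_unpreserved(struct model_flb *flb)
{
	assert(flb->nr_files > 0);
	if (--flb->nr_files == 0) {
		free(flb->obj);
		flb->obj = NULL;
	}
}
```

The tracking object in this model is shared by all preserved files and
only goes away once the last of them has been unpreserved.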
Note that this commit only allocates struct pci_ser and preserves it
across Live Update. A subsequent commit will add an API for drivers to
tell the PCI core exactly which devices are being preserved.
Signed-off-by: David Matlack <dmatlack@google.com>
---
MAINTAINERS | 12 ++++
drivers/pci/Kconfig | 14 ++++
drivers/pci/Makefile | 1 +
drivers/pci/liveupdate.c | 139 ++++++++++++++++++++++++++++++++++++
include/linux/kho/abi/pci.h | 61 ++++++++++++++++
include/linux/pci.h | 15 ++++
6 files changed, 242 insertions(+)
create mode 100644 drivers/pci/liveupdate.c
create mode 100644 include/linux/kho/abi/pci.h
diff --git a/MAINTAINERS b/MAINTAINERS
index c9b7b6f9828e..94af31837375 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20555,6 +20555,18 @@ L: linux-pci@vger.kernel.org
S: Supported
F: Documentation/PCI/pci-error-recovery.rst
+PCI LIVE UPDATE
+M: Bjorn Helgaas <bhelgaas@google.com>
+M: David Matlack <dmatlack@google.com>
+L: linux-pci@vger.kernel.org
+S: Supported
+Q: https://patchwork.kernel.org/project/linux-pci/list/
+B: https://bugzilla.kernel.org
+C: irc://irc.oftc.net/linux-pci
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
+F: drivers/pci/liveupdate.c
+F: include/linux/kho/abi/pci.h
+
PCI MSI DRIVER FOR ALTERA MSI IP
L: linux-pci@vger.kernel.org
S: Orphan
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 33c88432b728..08398cbe970c 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -328,6 +328,20 @@ config VGA_ARB_MAX_GPUS
Reserves space in the kernel to maintain resource locking for
multiple GPUS. The overhead for each GPU is very small.
+config PCI_LIVEUPDATE
+ bool "PCI Live Update Support (EXPERIMENTAL)"
+ depends on PCI && LIVEUPDATE
+ help
+ Enable PCI core support for preserving PCI devices across Live
+ Update. This, in combination with support in a device's driver,
+ enables PCI devices to run and perform memory transactions
+ uninterrupted during a kexec for Live Update.
+
+ This option should only be enabled by developers working on
+ implementing this support.
+
+ If unsure, say N.
+
source "drivers/pci/hotplug/Kconfig"
source "drivers/pci/controller/Kconfig"
source "drivers/pci/endpoint/Kconfig"
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 41ebc3b9a518..e8d003cb6757 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSFS) += pci-sysfs.o slot.o
obj-$(CONFIG_ACPI) += pci-acpi.o
obj-$(CONFIG_GENERIC_PCI_IOMAP) += iomap.o
+obj-$(CONFIG_PCI_LIVEUPDATE) += liveupdate.o
endif
obj-$(CONFIG_OF) += of.o
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
new file mode 100644
index 000000000000..d4fa61625d56
--- /dev/null
+++ b/drivers/pci/liveupdate.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+
+/**
+ * DOC: PCI Live Update
+ *
+ * The PCI subsystem participates in the Live Update process to enable drivers
+ * to preserve their PCI devices across kexec.
+ *
+ * .. note::
+ * The support for preserving PCI devices across Live Update is currently
+ * *partial* and should be considered *experimental*. It should only be
+ * used by developers working on the implementation for the time being.
+ *
+ * To enable the support, enable ``CONFIG_PCI_LIVEUPDATE``.
+ *
+ * File-Lifecycle-Bound (FLB) Data
+ * ===============================
+ *
+ * PCI device preservation across Live Update is built on top of the Live Update
+ * Orchestrator's (LUO) support for file preservation across kexec. Drivers
+ * are expected to expose a file to represent a single PCI device and support
+ * preservation of that file with ``ioctl(LIVEUPDATE_SESSION_PRESERVE_FD)``.
+ * This allows userspace to control the preservation of devices and ensure
+ * proper lifecycle management while a device is preserved. The first intended
+ * use-case is preserving vfio-pci device files.
+ *
+ * The PCI core maintains its own state about what devices are being preserved
+ * across Live Update using a feature called File-Lifecycle-Bound (FLB) data in
+ * LUO. Essentially, this allows the PCI core to allocate struct pci_ser when
+ * the first device (file) is preserved and free it when the last device (file)
+ * is unpreserved. After kexec, the PCI core can fetch the struct pci_ser (which
+ * was constructed by the previous kernel) from LUO at any time (e.g. during
+ * enumeration) so that it knows which devices were preserved.
+ *
+ * To enable the PCI core to be notified whenever a file representing a device
+ * is preserved, drivers must register their struct liveupdate_file_handler with
+ * the PCI core by using the following APIs:
+ *
+ * * ``pci_liveupdate_register_flb(driver_file_handler)``
+ * * ``pci_liveupdate_unregister_flb(driver_file_handler)``
+ */
+
+#define pr_fmt(fmt) "PCI: liveupdate: " fmt
+
+#include <linux/bsearch.h>
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/pci.h>
+#include <linux/liveupdate.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/pci.h>
+#include <linux/sort.h>
+
+static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
+{
+ struct pci_dev *dev = NULL;
+ u32 max_nr_devices = 0;
+ struct pci_ser *ser;
+ unsigned long size;
+
+ /*
+ * Allocate enough space to preserve all of the devices that are
+ * currently present on the system. Extra padding can be added to this
+ * in the future to increase the chances that there is enough room to
+ * preserve devices that are not yet present on the system (e.g. VFs,
+ * hot-plugged devices).
+ */
+ for_each_pci_dev(dev)
+ max_nr_devices++;
+
+ size = struct_size_t(struct pci_ser, devices, max_nr_devices);
+
+ pr_debug("Preserving struct pci_ser with room for %u devices\n",
+ max_nr_devices);
+
+ ser = kho_alloc_preserve(size);
+ if (IS_ERR(ser))
+ return PTR_ERR(ser);
+
+ ser->max_nr_devices = max_nr_devices;
+ ser->nr_devices = 0;
+
+ args->obj = ser;
+ args->data = virt_to_phys(ser);
+ return 0;
+}
+
+static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
+{
+ struct pci_ser *ser = args->obj;
+
+ pr_debug("Unpreserving struct pci_ser\n");
+ WARN_ON_ONCE(ser->nr_devices);
+ kho_unpreserve_free(ser);
+}
+
+static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
+{
+ args->obj = phys_to_virt(args->data);
+ return 0;
+}
+
+static void pci_flb_finish(struct liveupdate_flb_op_args *args)
+{
+ kho_restore_free(args->obj);
+}
+
+static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
+ .preserve = pci_flb_preserve,
+ .unpreserve = pci_flb_unpreserve,
+ .retrieve = pci_flb_retrieve,
+ .finish = pci_flb_finish,
+ .owner = THIS_MODULE,
+};
+
+static struct liveupdate_flb pci_liveupdate_flb = {
+ .ops = &pci_liveupdate_flb_ops,
+ .compatible = PCI_LUO_FLB_COMPATIBLE,
+};
+
+int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
+{
+ pr_debug("Registering file handler \"%s\"\n", fh->compatible);
+ return liveupdate_register_flb(fh, &pci_liveupdate_flb);
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_register_flb);
+
+void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
+{
+ pr_debug("Unregistering file handler \"%s\"\n", fh->compatible);
+ liveupdate_unregister_flb(fh, &pci_liveupdate_flb);
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_unregister_flb);
diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h
new file mode 100644
index 000000000000..5c0e92588c00
--- /dev/null
+++ b/include/linux/kho/abi/pci.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+
+#ifndef _LINUX_KHO_ABI_PCI_H
+#define _LINUX_KHO_ABI_PCI_H
+
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/**
+ * DOC: PCI File-Lifecycle Bound (FLB) Live Update ABI
+ *
+ * This header defines the ABI for preserving core PCI state across kexec using
+ * Live Update File-Lifecycle Bound (FLB) data.
+ *
+ * This interface is a contract. Any modification to any of the serialization
+ * structs defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the PCI_LUO_FLB_COMPATIBLE string.
+ */
+
+#define PCI_LUO_FLB_COMPATIBLE "pci-v1"
+
+/**
+ * struct pci_dev_ser - Serialized state about a single PCI device.
+ *
+ * @domain: The device's PCI domain number (segment).
+ * @bdf: The device's PCI bus, device, and function number.
+ * @reserved: Reserved (to naturally align struct pci_dev_ser).
+ */
+struct pci_dev_ser {
+ u32 domain;
+ u16 bdf;
+ u16 reserved;
+} __packed;
+
+/**
+ * struct pci_ser - PCI Subsystem Live Update State
+ *
+ * This struct tracks state about all devices that are being preserved across
+ * a Live Update for the next kernel.
+ *
+ * @max_nr_devices: The length of the devices[] flexible array.
+ * @nr_devices: The number of devices that were preserved.
+ * @devices: Flexible array of pci_dev_ser structs for each device.
+ */
+struct pci_ser {
+ u32 max_nr_devices;
+ u32 nr_devices;
+ struct pci_dev_ser devices[];
+} __packed;
+
+/* Ensure all elements of devices[] are naturally aligned. */
+static_assert(offsetof(struct pci_ser, devices) % sizeof(unsigned long) == 0);
+static_assert(sizeof(struct pci_dev_ser) % sizeof(unsigned long) == 0);
+
+#endif /* _LINUX_KHO_ABI_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..d70080babd52 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -40,6 +40,7 @@
#include <linux/resource_ext.h>
#include <linux/msi_api.h>
#include <uapi/linux/pci.h>
+#include <linux/liveupdate.h>
#include <linux/pci_ids.h>
@@ -2876,4 +2877,18 @@ void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type);
WARN_ONCE(condition, "%s %s: " fmt, \
dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg)
+#ifdef CONFIG_PCI_LIVEUPDATE
+int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh);
+void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh);
+#else
+static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
+{
+}
+#endif
+
#endif /* LINUX_PCI_H */
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 02/11] PCI: liveupdate: Track outgoing preserved PCI devices
From: David Matlack @ 2026-04-23 21:23 UTC
Add APIs to allow drivers to notify the PCI core of which devices are
being preserved across a Live Update for the next kernel, i.e.
"outgoing" devices.
Drivers must notify the PCI core when devices are preserved so that the
PCI core can update its FLB data (struct pci_ser) and track the list of
outgoing devices. pci_liveupdate_preserve() notifies the PCI core that a
device must be preserved across Live Update. pci_liveupdate_unpreserve()
reverses this (cancels the preservation of the device).
This tracking ensures that the PCI core knows which devices may need
special handling during shutdown and kexec, and that this state can be
handed off to the next kernel.
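The slot search in pci_liveupdate_preserve() probes from index
nr_devices and wraps around, so it finds a free slot on the first probe
as long as nothing has been unpreserved. The userspace sketch below
models only that search. The slot_* names and the fixed-size array are
hypothetical, and the locking and the checks on struct pci_dev are left
out.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_MAX_DEVICES 4

struct slot_dev_ser {
	uint32_t domain;
	uint16_t bdf;
	uint16_t refcount; /* 0 marks a free slot */
};

struct slot_ser {
	uint32_t max_nr_devices;
	uint32_t nr_devices;
	struct slot_dev_ser devices[SLOT_MAX_DEVICES];
};

/*
 * Claim a free slot, probing from index nr_devices and wrapping around.
 * Returns the slot index, or -1 if every slot is in use.
 */
static int slot_preserve(struct slot_ser *ser, uint32_t domain, uint16_t bdf)
{
	uint32_t i;

	if (ser->nr_devices == ser->max_nr_devices)
		return -1;

	for (i = 0; i < ser->max_nr_devices; i++) {
		uint32_t index = (ser->nr_devices + i) % ser->max_nr_devices;
		struct slot_dev_ser *dev_ser = &ser->devices[index];

		if (dev_ser->refcount)
			continue;

		dev_ser->domain = domain;
		dev_ser->bdf = bdf;
		dev_ser->refcount = 1;
		ser->nr_devices++;
		return (int)index;
	}

	return -1;
}

/* Zero a slot so that a later slot_preserve() can reuse it. */
static void slot_unpreserve(struct slot_ser *ser, int index)
{
	assert(ser->devices[index].refcount);
	ser->devices[index] = (struct slot_dev_ser){ 0 };
	ser->nr_devices--;
}
```

After an unpreserve leaves a hole, the probe may have to skip occupied
slots before it wraps around to the hole, which is the non-constant
case that the code comment in the patch alludes to.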
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 101 ++++++++++++++++++++++++++++++++++++
include/linux/kho/abi/pci.h | 7 +--
include/linux/pci.h | 26 ++++++++++
3 files changed, 131 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index d4fa61625d56..2dd8daa2f17c 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -43,6 +43,26 @@
*
* * ``pci_liveupdate_register_flb(driver_file_handler)``
* * ``pci_liveupdate_unregister_flb(driver_file_handler)``
+ *
+ * Device Tracking
+ * ===============
+ *
+ * Drivers must notify the PCI core when specific devices are preserved or
+ * unpreserved with the following APIs:
+ *
+ * * ``pci_liveupdate_preserve(pci_dev)``
+ * * ``pci_liveupdate_unpreserve(pci_dev)``
+ *
+ * This allows the PCI core to keep it's FLB data (struct pci_ser) up to date
+ * with the list of **outgoing** preserved devices for the next kernel.
+ *
+ * Restrictions
+ * ============
+ *
+ * The PCI core enforces the following restrictions on which devices can be
+ * preserved. These may be relaxed in the future:
+ *
+ * * The device cannot be a Virtual Function (VF).
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
@@ -57,6 +77,8 @@
#include <linux/pci.h>
#include <linux/sort.h>
+static DEFINE_MUTEX(pci_flb_outgoing_lock);
+
static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
{
struct pci_dev *dev = NULL;
@@ -124,6 +146,85 @@ static struct liveupdate_flb pci_liveupdate_flb = {
.compatible = PCI_LUO_FLB_COMPATIBLE,
};
+int pci_liveupdate_preserve(struct pci_dev *dev)
+{
+ struct pci_ser *ser;
+ int i, ret;
+
+ guard(mutex)(&pci_flb_outgoing_lock);
+
+ ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
+ if (ret)
+ return ret;
+
+ if (!ser)
+ return -ENOENT;
+
+ if (dev->is_virtfn)
+ return -EINVAL;
+
+ if (dev->liveupdate_outgoing)
+ return -EBUSY;
+
+ if (ser->nr_devices == ser->max_nr_devices)
+ return -ENOSPC;
+
+ for (i = 0; i < ser->max_nr_devices; i++) {
+ /*
+ * Start searching at index ser->nr_devices. This should result
+ * in a constant time search under expected conditions (devices
+ * are not getting unpreserved).
+ */
+ int index = (ser->nr_devices + i) % ser->max_nr_devices;
+ struct pci_dev_ser *dev_ser = &ser->devices[index];
+
+ if (dev_ser->refcount)
+ continue;
+
+ pci_info(dev, "Device will be preserved across next Live Update\n");
+ ser->nr_devices++;
+
+ dev_ser->domain = pci_domain_nr(dev->bus);
+ dev_ser->bdf = pci_dev_id(dev);
+ dev_ser->refcount = 1;
+
+ dev->liveupdate_outgoing = dev_ser;
+ return 0;
+ }
+
+ return -ENOSPC;
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_preserve);
+
+void pci_liveupdate_unpreserve(struct pci_dev *dev)
+{
+ struct pci_dev_ser *dev_ser;
+ struct pci_ser *ser = NULL;
+ int ret;
+
+ guard(mutex)(&pci_flb_outgoing_lock);
+
+ ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
+
+ if (ret || !ser) {
+ pci_warn(dev, "Cannot unpreserve device without outgoing Live Update state\n");
+ return;
+
+ }
+
+ dev_ser = dev->liveupdate_outgoing;
+ if (!dev_ser) {
+ pci_warn(dev, "Cannot unpreserve device that is not preserved\n");
+ return;
+ }
+
+ pci_info(dev, "Device will no longer be preserved across next Live Update\n");
+ ser->nr_devices--;
+ memset(dev_ser, 0, sizeof(*dev_ser));
+ dev->liveupdate_outgoing = NULL;
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
+
int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
{
pr_debug("Registering file handler \"%s\"\n", fh->compatible);
diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h
index 5c0e92588c00..5b4c8d9e462c 100644
--- a/include/linux/kho/abi/pci.h
+++ b/include/linux/kho/abi/pci.h
@@ -23,19 +23,20 @@
* incrementing the version number in the PCI_LUO_FLB_COMPATIBLE string.
*/
-#define PCI_LUO_FLB_COMPATIBLE "pci-v1"
+#define PCI_LUO_FLB_COMPATIBLE "pci-v2"
/**
* struct pci_dev_ser - Serialized state about a single PCI device.
*
* @domain: The device's PCI domain number (segment).
* @bdf: The device's PCI bus, device, and function number.
- * @reserved: Reserved (to naturally align struct pci_dev_ser).
+ * @refcount: Reference count used by the PCI core to keep track of whether it
+ * is done using a device's struct pci_dev_ser.
*/
struct pci_dev_ser {
u32 domain;
u16 bdf;
- u16 reserved;
+ u16 refcount;
} __packed;
/**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d70080babd52..eb94cbd8ab9d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -41,6 +41,7 @@
#include <linux/msi_api.h>
#include <uapi/linux/pci.h>
#include <linux/liveupdate.h>
+#include <linux/kho/abi/pci.h>
#include <linux/pci_ids.h>
@@ -594,6 +595,9 @@ struct pci_dev {
u8 tph_mode; /* TPH mode */
u8 tph_req_type; /* TPH requester type */
#endif
+#ifdef CONFIG_PCI_LIVEUPDATE
+ struct pci_dev_ser *liveupdate_outgoing; /* State preserved for next kernel */
+#endif
};
static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
@@ -2880,6 +2884,14 @@ void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type);
#ifdef CONFIG_PCI_LIVEUPDATE
int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh);
void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh);
+
+int pci_liveupdate_preserve(struct pci_dev *dev);
+void pci_liveupdate_unpreserve(struct pci_dev *dev);
+
+static inline struct pci_dev_ser *pci_liveupdate_outgoing(struct pci_dev *dev)
+{
+ return dev->liveupdate_outgoing;
+}
#else
static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
{
@@ -2889,6 +2901,20 @@ static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh
static inline void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
{
}
+
+static inline int pci_liveupdate_preserve(struct pci_dev *dev)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void pci_liveupdate_unpreserve(struct pci_dev *dev)
+{
+}
+
+static inline struct pci_dev_ser *pci_liveupdate_outgoing(struct pci_dev *dev)
+{
+ return NULL;
+}
#endif
#endif /* LINUX_PCI_H */
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 03/11] PCI: liveupdate: Track incoming preserved PCI devices
From: David Matlack @ 2026-04-23 21:23 UTC
During PCI enumeration, the previous kernel might have passed state about
devices that were preserved across kexec. The PCI core needs to fetch
this state to identify which devices are "incoming" and require special
handling.
Add pci_liveupdate_setup_device() which is called during device setup
to fetch the serialized state (struct pci_ser) from the Live Update
Orchestrator. The first time this happens, pci_flb_retrieve() will run
and convert the array of pci_dev_ser structs into an xarray so that it
can be looked up efficiently.
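The xarray is indexed by a key that places the domain above the 16-bit
BDF, as pci_ser_xa_key() does in this patch. The sketch below shows that
packing in userspace. The model_* helpers are hypothetical, and the BDF
layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0) is
the conventional PCI encoding rather than something this patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits) and function (3 bits) into a BDF. */
static uint16_t model_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Same packing as pci_ser_xa_key(): domain in the bits above the BDF. */
static unsigned long model_xa_key(unsigned long domain, unsigned long bdf)
{
	return domain << 16 | bdf;
}

/* Recover the domain from a key. */
static unsigned long model_key_domain(unsigned long key)
{
	return key >> 16;
}

/* Recover the BDF from a key. */
static uint16_t model_key_bdf(unsigned long key)
{
	return (uint16_t)(key & 0xffff);
}
```

Because the BDF never exceeds 16 bits, two devices map to the same key
only if they share both domain and BDF, provided the shifted domain
fits in an unsigned long.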
If a device is found in the xarray, the PCI core stores a pointer to its
state in dev->liveupdate_incoming and holds a reference to the incoming
FLB until pci_liveupdate_finish() is called by the driver.
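The reference handling described above can be pictured with the
following userspace model, in which each incoming device holds one
reference on the incoming state until its driver calls finish. All
names here are hypothetical. The real refcounting lives in LUO and is
not part of this patch, so treat this as a sketch of the intent rather
than of the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the incoming FLB state and its reference count. */
struct incoming_model {
	unsigned int refs; /* one per device still using the incoming state */
	bool released;     /* set once the last reference is dropped */
};

/* Hypothetical stand-in for the per-device liveupdate_incoming pointer. */
struct incoming_dev_model {
	struct incoming_model *incoming; /* non-NULL while device is incoming */
};

/* Enumeration found the device in the incoming state: take a reference. */
static void incoming_setup_device(struct incoming_dev_model *dev,
				  struct incoming_model *incoming)
{
	incoming->refs++;
	dev->incoming = incoming;
}

/* The driver is done with the incoming state: drop the device's reference. */
static void incoming_finish(struct incoming_dev_model *dev)
{
	struct incoming_model *incoming = dev->incoming;

	if (!incoming)
		return;

	dev->incoming = NULL;
	assert(incoming->refs > 0);
	if (--incoming->refs == 0)
		incoming->released = true;
}
```

In this model the incoming state cannot be released while any device
still points into it, which is the property that guards against
use-after-free during enumeration.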
This ensures proper lifecycle management for incoming preserved devices
and allows the PCI core and drivers to apply specific Live Update
logic to them in subsequent commits.
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 189 ++++++++++++++++++++++++++++++++++++++-
drivers/pci/pci.h | 13 +++
drivers/pci/probe.c | 4 +
include/linux/pci.h | 16 ++++
4 files changed, 218 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 2dd8daa2f17c..e616cecc37c8 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -56,6 +56,20 @@
* This allows the PCI core to keep it's FLB data (struct pci_ser) up to date
* with the list of **outgoing** preserved devices for the next kernel.
*
+ * After kexec, whenever a device is enumerated, the PCI core will check if it
+ * is an **incoming** preserved device (i.e. preserved by the previous kernel)
+ * by checking the incoming FLB data (struct pci_ser).
+ *
+ * Drivers must notify the PCI core when an **incoming** device is done
+ * participating in the incoming Live Update with the following API:
+ *
+ * * ``pci_liveupdate_finish(pci_dev)``
+ *
+ * The PCI core does not enforce any ordering of ``pci_liveupdate_finish()`` and
+ * ``pci_liveupdate_preserve()``, i.e. a PCI device can be **outgoing**
+ * (preserved for the next kernel) and **incoming** (preserved by the previous
+ * kernel) at the same time.
+ *
* Restrictions
* ============
*
@@ -67,7 +81,6 @@
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
-#include <linux/bsearch.h>
#include <linux/io.h>
#include <linux/kexec_handover.h>
#include <linux/kho/abi/pci.h>
@@ -75,10 +88,24 @@
#include <linux/mutex.h>
#include <linux/mm.h>
#include <linux/pci.h>
-#include <linux/sort.h>
+
+#include "pci.h"
static DEFINE_MUTEX(pci_flb_outgoing_lock);
+struct pci_flb_incoming {
+ /* The pci_ser struct passed by the previous kernel. */
+ struct pci_ser *ser;
+
+ /* xarray used to quickly find a device in ser->devices[] */
+ struct xarray xa;
+};
+
+static unsigned long pci_ser_xa_key(unsigned long domain, unsigned long bdf)
+{
+ return domain << 16 | bdf;
+}
+
static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
{
struct pci_dev *dev = NULL;
@@ -124,13 +151,44 @@ static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
{
- args->obj = phys_to_virt(args->data);
+ struct pci_flb_incoming *incoming;
+ int i, ret;
+
+ incoming = kmalloc(sizeof(*incoming), GFP_KERNEL);
+ if (!incoming)
+ return -ENOMEM;
+
+ incoming->ser = phys_to_virt(args->data);
+
+ xa_init(&incoming->xa);
+
+ for (i = 0; i < incoming->ser->max_nr_devices; i++) {
+ struct pci_dev_ser *dev_ser = &incoming->ser->devices[i];
+ unsigned long key;
+
+ if (!dev_ser->refcount)
+ continue;
+
+ key = pci_ser_xa_key(dev_ser->domain, dev_ser->bdf);
+ ret = xa_err(xa_store(&incoming->xa, key, dev_ser, GFP_KERNEL));
+ if (ret) {
+ xa_destroy(&incoming->xa);
+ kfree(incoming);
+ return ret;
+ }
+ }
+
+ args->obj = incoming;
return 0;
}
static void pci_flb_finish(struct liveupdate_flb_op_args *args)
{
- kho_restore_free(args->obj);
+ struct pci_flb_incoming *incoming = args->obj;
+
+ xa_destroy(&incoming->xa);
+ kho_restore_free(incoming->ser);
+ kfree(incoming);
}
static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
@@ -225,6 +283,129 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
}
EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
+static struct xarray *pci_liveupdate_flb_get_incoming(void)
+{
+ struct pci_flb_incoming *incoming;
+ int ret;
+
+ ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&incoming);
+
+ /* Live Update is not enabled. */
+ if (ret == -EOPNOTSUPP)
+ return NULL;
+
+ /* Live Update is enabled, but there is no incoming FLB data. */
+ if (ret == -ENODATA)
+ return NULL;
+
+ /*
+ * Live Update is enabled and there is incoming FLB data, but none of it
+ * matches pci_liveupdate_flb.compatible.
+ *
+ * This could mean that no PCI FLB data was passed by the previous
+ * kernel, but it could also mean the previous kernel used a different
+ * compatibility string (i.e. a different ABI).
+ */
+ if (ret == -ENOENT) {
+ pr_info_once("No incoming FLB matched %s\n", pci_liveupdate_flb.compatible);
+ return NULL;
+ }
+
+ /*
+ * There is incoming FLB data that matches pci_liveupdate_flb.compatible
+ * but it cannot be retrieved.
+ */
+ if (ret) {
+ WARN_ONCE(ret, "Failed to retrieve incoming FLB data\n");
+ return NULL;
+ }
+
+ return &incoming->xa;
+}
+
+static void pci_liveupdate_flb_put_incoming(void)
+{
+ liveupdate_flb_put_incoming(&pci_liveupdate_flb);
+}
+
+void pci_liveupdate_setup_device(struct pci_dev *dev)
+{
+ struct pci_dev_ser *dev_ser;
+ struct xarray *xa;
+ unsigned long key;
+
+ xa = pci_liveupdate_flb_get_incoming();
+ if (!xa)
+ return;
+
+ key = pci_ser_xa_key(pci_domain_nr(dev->bus), pci_dev_id(dev));
+ dev_ser = xa_load(xa, key);
+
+ /* This device was not preserved across Live Update */
+ if (!dev_ser) {
+ pci_liveupdate_flb_put_incoming();
+ return;
+ }
+
+ /*
+ * This device was preserved, but has already been probed and gone
+ * through pci_liveupdate_finish(). This can happen if PCI core probes
+ * the same device multiple times, e.g. due to hotplug.
+ */
+ if (!dev_ser->refcount) {
+ pci_liveupdate_flb_put_incoming();
+ return;
+ }
+
+ pci_info(dev, "Device was preserved by previous kernel across Live Update\n");
+
+ /*
+ * Hold the ref on the incoming FLB until pci_liveupdate_finish() so
+ * that dev_ser does not get freed while it is in use.
+ */
+ dev->liveupdate_incoming = dev_ser;
+}
+
+void pci_liveupdate_cleanup_device(struct pci_dev *dev)
+{
+ /*
+ * Drop the FLB reference acquired in pci_liveupdate_setup_device() if
+ * the device is being cleaned up before pci_liveupdate_finish(), e.g.
+ * due to allocation failure during setup.
+ *
+ * Do not drop dev->liveupdate_incoming->refcount since this device has
+ * not gone through pci_liveupdate_finish() and thus is still an
+ * incoming preserved device.
+ *
+ * Note: This cannot race with pci_liveupdate_finish() since it is only
+ * called in cleanup paths when there are no users of the pci_dev.
+ */
+ if (dev->liveupdate_incoming)
+ pci_liveupdate_flb_put_incoming();
+}
+
+void pci_liveupdate_finish(struct pci_dev *dev)
+{
+ if (!dev->liveupdate_incoming) {
+ pci_warn(dev, "Cannot finish preserving an unpreserved device\n");
+ return;
+ }
+
+ pci_info(dev, "Device is finished participating in Live Update\n");
+
+ /*
+ * Drop the refcount so this device does not get treated as an incoming
+ * device again, e.g. in case pci_liveupdate_setup_device() gets called
+ * again because the device is hot-plugged.
+ */
+ dev->liveupdate_incoming->refcount = 0;
+ dev->liveupdate_incoming = NULL;
+
+ /* Drop this device's reference on the incoming FLB. */
+ pci_liveupdate_flb_put_incoming();
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_finish);
+
int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
{
pr_debug("Registering file handler \"%s\"\n", fh->compatible);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4a14f88e543a..09bab39738d7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1439,4 +1439,17 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
(PCI_CONF1_ADDRESS(bus, dev, func, reg) | \
PCI_CONF1_EXT_REG(reg))
+#ifdef CONFIG_PCI_LIVEUPDATE
+void pci_liveupdate_setup_device(struct pci_dev *dev);
+void pci_liveupdate_cleanup_device(struct pci_dev *dev);
+#else
+static inline void pci_liveupdate_setup_device(struct pci_dev *dev)
+{
+}
+
+static inline void pci_liveupdate_cleanup_device(struct pci_dev *dev)
+{
+}
+#endif
+
#endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b63cd0c310bc..938a28e4a7a0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2069,6 +2069,8 @@ int pci_setup_device(struct pci_dev *dev)
if (pci_early_dump)
early_dump_pci_device(dev);
+ pci_liveupdate_setup_device(dev);
+
/* Need to have dev->class ready */
dev->cfg_size = pci_cfg_space_size(dev);
@@ -2192,6 +2194,7 @@ int pci_setup_device(struct pci_dev *dev)
default: /* unknown header */
pci_err(dev, "unknown header type %02x, ignoring device\n",
dev->hdr_type);
+ pci_liveupdate_cleanup_device(dev);
pci_release_of_node(dev);
return -EIO;
@@ -2490,6 +2493,7 @@ static void pci_release_dev(struct device *dev)
pci_dev = to_pci_dev(dev);
pci_release_capabilities(pci_dev);
+ pci_liveupdate_cleanup_device(pci_dev);
pci_release_of_node(pci_dev);
pcibios_release_device(pci_dev);
pci_bus_put(pci_dev->bus);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb94cbd8ab9d..dd6b26ca9462 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -597,6 +597,7 @@ struct pci_dev {
#endif
#ifdef CONFIG_PCI_LIVEUPDATE
struct pci_dev_ser *liveupdate_outgoing; /* State preserved for next kernel */
+ struct pci_dev_ser *liveupdate_incoming; /* State preserved by previous kernel */
#endif
};
@@ -2887,11 +2888,17 @@ void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh);
int pci_liveupdate_preserve(struct pci_dev *dev);
void pci_liveupdate_unpreserve(struct pci_dev *dev);
+void pci_liveupdate_finish(struct pci_dev *dev);
static inline struct pci_dev_ser *pci_liveupdate_outgoing(struct pci_dev *dev)
{
return dev->liveupdate_outgoing;
}
+
+static inline struct pci_dev_ser *pci_liveupdate_incoming(struct pci_dev *dev)
+{
+ return dev->liveupdate_incoming;
+}
#else
static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
{
@@ -2911,10 +2918,19 @@ static inline void pci_liveupdate_unpreserve(struct pci_dev *dev)
{
}
+static inline void pci_liveupdate_finish(struct pci_dev *dev)
+{
+}
+
static inline struct pci_dev_ser *pci_liveupdate_outgoing(struct pci_dev *dev)
{
return NULL;
}
+
+static inline struct pci_dev_ser *pci_liveupdate_incoming(struct pci_dev *dev)
+{
+ return NULL;
+}
#endif
#endif /* LINUX_PCI_H */
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 04/11] PCI: liveupdate: Document driver binding responsibilities
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Document how driver binding works during a Live Update and what the PCI
core expects of drivers and users. Note that this only describes the
current division of responsibilities, which may change in the future.
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index e616cecc37c8..c0a30d16d9b8 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -77,6 +77,22 @@
* preserved. These may be relaxed in the future:
*
* * The device cannot be a Virtual Function (VF).
+ *
+ * Driver Binding
+ * ==============
+ *
+ * In the outgoing kernel, it is the driver's responsibility to ensure that it
+ * does not release a device between pci_liveupdate_preserve() and
+ * pci_liveupdate_unpreserve().
+ *
+ * In the incoming kernel, it is the driver's responsibility to ensure that it
+ * does not release a preserved device between probe() and
+ * pci_liveupdate_finish().
+ *
+ * It is the user's responsibility to ensure that incoming preserved devices are
+ * bound to the correct driver, i.e. the PCI core does not protect against a
+ * device getting preserved by driver A in the outgoing kernel and then getting
+ * bound to driver B in the incoming kernel.
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 05/11] PCI: liveupdate: Inherit bus numbers during Live Update
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Inherit bus numbers from the previous kernel during a Live Update when
one or more PCI devices are being preserved.
During a Live Update, preserved devices must be allowed to continue
performing memory transactions, so the kernel cannot change the fabric
topology, including bus numbers, since that would require disabling
and flushing any memory transactions first.
To keep things simple, inherit the secondary and subordinate bus numbers
on all bridges if any PCI devices were preserved (i.e. even bridges
without any downstream endpoints that were preserved). This avoids
accidentally assigning a bridge a new window that overlaps with a
preserved device that is downstream of a different bridge.
If a bridge is enumerated with a broken topology or has no bus numbers
set during a Live Update, refuse to assign it new bus numbers and refuse
to enumerate devices below it. This is a safety measure to prevent
topology conflicts.
Make CONFIG_PCI_LIVEUPDATE depend on !CONFIG_CARDBUS, since inheriting
bus numbers on PCI-to-CardBus bridges requires additional work that is
not a priority at the moment.
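
The per-bridge outcome described above can be sketched as a small pure
function. This is only a simplified model under stated assumptions: it
ignores the two-pass structure of pci_scan_bridge_extend(), and every
sketch_* name is hypothetical. "configured" stands for a bridge that
already has a secondary or subordinate bus number, and
"assign_all_busses" stands in for the result of
pcibios_assign_all_busses().

```c
#include <stdbool.h>

enum sketch_bridge_action {
	SKETCH_KEEP_EXISTING,	/* reuse the bus numbers already programmed */
	SKETCH_ASSIGN_NEW,	/* reconfigure the bridge with new bus numbers */
	SKETCH_SKIP_BRIDGE,	/* Live Update: do not enumerate below the bridge */
};

/* Mirrors pci_should_assign_new_buses(): Live Update overrides assign-busses. */
static bool sketch_should_assign_new_buses(bool inherit_buses,
					   bool assign_all_busses)
{
	if (inherit_buses)
		return false;

	return assign_all_busses;
}

static enum sketch_bridge_action
sketch_bridge_decision(bool configured, bool broken, bool inherit_buses,
		       bool assign_all_busses)
{
	bool assign_new = sketch_should_assign_new_buses(inherit_buses,
							 assign_all_busses);

	/* A sane, already-configured bridge keeps its bus numbers. */
	if (configured && !assign_new && !broken)
		return SKETCH_KEEP_EXISTING;

	/* Broken or unconfigured bridge during Live Update: refuse to touch it. */
	if (inherit_buses)
		return SKETCH_SKIP_BRIDGE;

	return SKETCH_ASSIGN_NEW;
}
```

In this model, the only way a bridge is reconfigured is when no Live Update
inheritance is in effect.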
Signed-off-by: David Matlack <dmatlack@google.com>
---
.../admin-guide/kernel-parameters.txt | 6 +++-
drivers/pci/Kconfig | 2 +-
drivers/pci/liveupdate.c | 28 +++++++++++++++++++
drivers/pci/probe.c | 21 +++++++++++---
include/linux/pci.h | 1 +
5 files changed, 52 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cf3807641d89..f412a4b77fb7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5156,7 +5156,11 @@ Kernel parameters
explicitly which ones they are.
assign-busses [X86] Always assign all PCI bus
numbers ourselves, overriding
- whatever the firmware may have done.
+ whatever the firmware may have done. Ignored
+ during a Live Update, where the kernel must
+ inherit the PCI topology (including bus numbers)
+ to avoid interrupting ongoing memory
+ transactions of preserved devices.
usepirqmask [X86] Honor the possible IRQ mask stored
in the BIOS $PIR table. This is needed on
some systems with broken BIOSes, notably
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 08398cbe970c..6ef457ff9d08 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -330,7 +330,7 @@ config VGA_ARB_MAX_GPUS
config PCI_LIVEUPDATE
bool "PCI Live Update Support (EXPERIMENTAL)"
- depends on PCI && LIVEUPDATE
+ depends on PCI && LIVEUPDATE && !CARDBUS
help
Enable PCI core support for preserving PCI devices across Live
Update. This, in combination with support in a device's driver,
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index c0a30d16d9b8..cf8cff134a75 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -93,6 +93,19 @@
* bound to the correct driver, i.e. the PCI core does not protect against a
* device getting preserved by driver A in the outgoing kernel and then getting
* bound to driver B in the incoming kernel.
+ *
+ * BDF Stability
+ * =============
+ *
+ * The PCI core guarantees that incoming preserved devices can be identified by
+ * the same bus, device, and function numbers as prior to kexec. To accomplish
+ * this, the PCI core always inherits the secondary and subordinate bus numbers
+ * assigned to bridges during enumeration, rather than assigning new ones (the
+ * PCI core assumes that the previous kernel established a sane topology).
+ *
+ * If a misconfigured or unconfigured bridge is encountered during enumeration
+ * while there are incoming preserved devices, its secondary and subordinate
+ * bus numbers will be cleared and devices below it will not be enumerated.
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
@@ -354,6 +367,21 @@ void pci_liveupdate_setup_device(struct pci_dev *dev)
if (!xa)
return;
+ /*
+ * During a Live Update, preserved devices are allowed to continue
+ * performing memory transactions. The kernel must not change the fabric
+ * topology, including bus numbers, since that would require disabling
+ * and flushing any memory transactions first.
+ *
+ * To keep things simple, inherit the secondary and subordinate bus
+ * numbers on _all_ bridges if _any_ PCI devices were preserved (i.e.
+ * even bridges without any downstream endpoints that were preserved).
+ * This avoids accidentally assigning a bridge a new window that
+ * overlaps with a preserved device that is downstream of a different
+ * bridge.
+ */
+ dev->liveupdate_inherit_buses = true;
+
key = pci_ser_xa_key(pci_domain_nr(dev->bus), pci_dev_id(dev));
dev_ser = xa_load(xa, key);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 938a28e4a7a0..fa26f4170add 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1374,6 +1374,14 @@ bool pci_ea_fixed_busnrs(struct pci_dev *dev, u8 *sec, u8 *sub)
return true;
}
+static bool pci_should_assign_new_buses(struct pci_dev *dev)
+{
+ if (dev->liveupdate_inherit_buses)
+ return false;
+
+ return pcibios_assign_all_busses();
+}
+
/*
* pci_scan_bridge_extend() - Scan buses behind a bridge
* @bus: Parent bus the bridge is on
@@ -1401,6 +1409,7 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
int max, unsigned int available_buses,
int pass)
{
+ const bool assign_new_buses = pci_should_assign_new_buses(dev);
struct pci_bus *child;
u32 buses;
u16 bctl;
@@ -1453,8 +1462,7 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
goto out;
}
- if ((secondary || subordinate) &&
- !pcibios_assign_all_busses() && !broken) {
+ if ((secondary || subordinate) && !assign_new_buses && !broken) {
unsigned int cmax, buses;
/*
@@ -1496,8 +1504,7 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
* do in the second pass.
*/
if (!pass) {
- if (pcibios_assign_all_busses() || broken)
-
+ if (assign_new_buses || broken)
/*
* Temporarily disable forwarding of the
* configuration cycles on all bridges in
@@ -1511,6 +1518,12 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
goto out;
}
+ if (dev->liveupdate_inherit_buses) {
+ pci_err(dev, "Cannot reconfigure bridge during Live Update!\n");
+ pci_err(dev, "Downstream devices will not be enumerated!\n");
+ goto out;
+ }
+
/* Clear errors */
pci_write_config_word(dev, PCI_STATUS, 0xffff);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index dd6b26ca9462..9a602b322e3c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -511,6 +511,7 @@ struct pci_dev {
unsigned int rom_bar_overlap:1; /* ROM BAR disable broken */
unsigned int rom_attr_enabled:1; /* Display of ROM attribute enabled? */
unsigned int non_mappable_bars:1; /* BARs can't be mapped to user-space */
+ unsigned int liveupdate_inherit_buses:1; /* Inherit bus numbers due to Live Update */
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 06/11] PCI: liveupdate: Auto-preserve upstream bridges across Live Update
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
When a PCI device is preserved across a Live Update, all of its upstream
bridges up to the root port must also be preserved. This enables the PCI
core and any drivers bound to the bridges to manage bridges correctly
across a Live Update.
Notably, this will be used in subsequent commits to ensure that
preserved devices can continue performing memory transactions without a
disruption or change in routing.
To preserve bridges, the PCI core tracks the number of downstream
devices preserved under each bridge using a reference count in struct
pci_dev_ser. This allows a bridge to remain preserved until all its
downstream preserved devices are unpreserved or finish their
participation in the Live Update.
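
The reference counting described above can be modeled in user space roughly
as follows. This is a minimal sketch, not the patch's implementation: the
sketch_* names are hypothetical, the walk is iterative rather than
recursive, and the error handling and serialized-state bookkeeping
(nr_devices, memset() of the slot) are deliberately left out.

```c
#include <stddef.h>

struct sketch_dev {
	struct sketch_dev *upstream;	/* NULL for a device on the root bus */
	unsigned int refcount;		/* 0 means "not preserved" */
};

/* Take one reference on the device and on every bridge above it. */
static void sketch_preserve_path(struct sketch_dev *dev)
{
	for (; dev; dev = dev->upstream)
		dev->refcount++;
}

/* Drop one reference on the device and on every bridge above it. */
static void sketch_unpreserve_path(struct sketch_dev *dev)
{
	for (; dev; dev = dev->upstream)
		if (dev->refcount)
			dev->refcount--;
}

static int sketch_is_preserved(const struct sketch_dev *dev)
{
	return dev->refcount != 0;
}
```

With two endpoints below the same bridge, unpreserving one endpoint leaves
the bridge preserved until the second endpoint is unpreserved as well.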
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 149 +++++++++++++++++++++++++++++----------
1 file changed, 111 insertions(+), 38 deletions(-)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index cf8cff134a75..88125f9a2c6b 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -106,6 +106,18 @@
* If a misconfigured or unconfigured bridge is encountered during enumeration
* while there are incoming preserved devices, its secondary and subordinate
* bus numbers will be cleared and devices below it will not be enumerated.
+ *
+ * PCI-to-PCI Bridges
+ * ==================
+ *
+ * Any PCI-to-PCI bridges upstream of a preserved device are automatically
+ * preserved when the device is preserved. The PCI core keeps track of the
+ * number of downstream devices that are preserved under a bridge so that the
+ * bridge is only unpreserved once all downstream devices are unpreserved.
+ *
+ * This enables the PCI core and any drivers bound to the bridge to participate
+ * in the Live Update so that preserved endpoints can continue issuing memory
+ * transactions during the Live Update.
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
@@ -233,25 +245,14 @@ static struct liveupdate_flb pci_liveupdate_flb = {
.compatible = PCI_LUO_FLB_COMPATIBLE,
};
-int pci_liveupdate_preserve(struct pci_dev *dev)
+static int pci_liveupdate_preserve_device(struct pci_ser *ser, struct pci_dev *dev)
{
- struct pci_ser *ser;
- int i, ret;
-
- guard(mutex)(&pci_flb_outgoing_lock);
-
- ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
- if (ret)
- return ret;
+ int i;
- if (!ser)
- return -ENOENT;
-
- if (dev->is_virtfn)
- return -EINVAL;
-
- if (dev->liveupdate_outgoing)
- return -EBUSY;
+ if (dev->liveupdate_outgoing) {
+ dev->liveupdate_outgoing->refcount++;
+ return 0;
+ }
if (ser->nr_devices == ser->max_nr_devices)
return -ENOSPC;
@@ -281,11 +282,82 @@ int pci_liveupdate_preserve(struct pci_dev *dev)
return -ENOSPC;
}
+
+static void pci_liveupdate_unpreserve_path(struct pci_ser *ser, struct pci_dev *dev)
+{
+ struct pci_dev *upstream_bridge = dev->bus->self;
+ struct pci_dev_ser *dev_ser;
+
+ if (upstream_bridge)
+ pci_liveupdate_unpreserve_path(ser, upstream_bridge);
+
+ dev_ser = dev->liveupdate_outgoing;
+ if (!dev_ser) {
+ pci_warn(dev, "Cannot unpreserve device that is not preserved\n");
+ return;
+ }
+
+ if (--dev_ser->refcount == 0) {
+ pci_info(dev, "Device will no longer be preserved across next Live Update\n");
+ ser->nr_devices--;
+ memset(dev_ser, 0, sizeof(*dev_ser));
+ dev->liveupdate_outgoing = NULL;
+ }
+}
+
+static int pci_liveupdate_preserve_path(struct pci_ser *ser, struct pci_dev *dev)
+{
+ struct pci_dev *upstream_bridge = dev->bus->self;
+ int ret = 0;
+
+ if (upstream_bridge) {
+ ret = pci_liveupdate_preserve_path(ser, upstream_bridge);
+ if (ret)
+ return ret;
+ } else if (!pci_is_root_bus(dev->bus)) {
+ pci_err(dev, "Failed to preserve up to root port\n");
+ return -EINVAL;
+ }
+
+ ret = pci_liveupdate_preserve_device(ser, dev);
+ if (ret)
+ goto err;
+
+ return 0;
+
+err:
+ if (upstream_bridge)
+ pci_liveupdate_unpreserve_path(ser, upstream_bridge);
+
+ return ret;
+}
+
+int pci_liveupdate_preserve(struct pci_dev *dev)
+{
+ struct pci_ser *ser;
+ int ret;
+
+ guard(mutex)(&pci_flb_outgoing_lock);
+
+ ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
+ if (ret)
+ return ret;
+
+ if (!ser)
+ return -ENOENT;
+
+ if (dev->is_virtfn)
+ return -EINVAL;
+
+ if (dev->liveupdate_outgoing)
+ return -EBUSY;
+
+ return pci_liveupdate_preserve_path(ser, dev);
+}
EXPORT_SYMBOL_GPL(pci_liveupdate_preserve);
void pci_liveupdate_unpreserve(struct pci_dev *dev)
{
- struct pci_dev_ser *dev_ser;
struct pci_ser *ser = NULL;
int ret;
@@ -296,19 +368,9 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
if (ret || !ser) {
pci_warn(dev, "Cannot unpreserve device without outgoing Live Update state\n");
return;
-
- }
-
- dev_ser = dev->liveupdate_outgoing;
- if (!dev_ser) {
- pci_warn(dev, "Cannot unpreserve device that is not preserved\n");
- return;
}
- pci_info(dev, "Device will no longer be preserved across next Live Update\n");
- ser->nr_devices--;
- memset(dev_ser, 0, sizeof(*dev_ser));
- dev->liveupdate_outgoing = NULL;
+ pci_liveupdate_unpreserve_path(ser, dev);
}
EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
@@ -428,6 +490,25 @@ void pci_liveupdate_cleanup_device(struct pci_dev *dev)
pci_liveupdate_flb_put_incoming();
}
+static void pci_liveupdate_finish_path(struct pci_dev *dev)
+{
+ struct pci_dev *upstream_bridge = dev->bus->self;
+
+ if (upstream_bridge)
+ pci_liveupdate_finish_path(upstream_bridge);
+
+ /*
+ * Decrement the refcount so this device does not get treated as an
+ * incoming device again, e.g. in case pci_liveupdate_setup_device()
+ * gets called again because the device is hot-plugged.
+ */
+ if (--dev->liveupdate_incoming->refcount)
+ return;
+
+ pci_info(dev, "Device is finished participating in Live Update\n");
+ dev->liveupdate_incoming = NULL;
+}
+
void pci_liveupdate_finish(struct pci_dev *dev)
{
if (!dev->liveupdate_incoming) {
@@ -435,15 +516,7 @@ void pci_liveupdate_finish(struct pci_dev *dev)
return;
}
- pci_info(dev, "Device is finished participating in Live Update\n");
-
- /*
- * Drop the refcount so this device does not get treated as an incoming
- * device again, e.g. in case pci_liveupdate_setup_device() gets called
- * again because the device is hot-plugged.
- */
- dev->liveupdate_incoming->refcount = 0;
- dev->liveupdate_incoming = NULL;
+ pci_liveupdate_finish_path(dev);
/* Drop this device's reference on the incoming FLB. */
pci_liveupdate_flb_put_incoming();
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 07/11] PCI: liveupdate: Inherit ACS flags in incoming preserved devices
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Inherit Access Control Services (ACS) flags on all incoming preserved
devices (endpoints and upstream bridges) during a Live Update.
Inheriting ACS flags avoids changing routing rules while memory
transactions are in flight from preserved devices. This is also strictly
necessary to ensure that IOMMU group assignments do not change across
a Live Update for preserved devices, as changing ACS configurations can
split or merge IOMMU groups.
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 10 ++++++++++
drivers/pci/pci.c | 10 +++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 88125f9a2c6b..a9a89f7bd3e5 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -118,6 +118,16 @@
* This enables the PCI core and any drivers bound to the bridge to participate
* in the Live Update so that preserved endpoints can continue issuing memory
* transactions during the Live Update.
+ *
+ * Handling Preserved Devices
+ * ==========================
+ *
+ * The PCI core treats preserved devices differently than non-preserved devices.
+ * This section enumerates those differences.
+ *
+ * * The PCI core inherits all ACS flags enabled on incoming preserved devices
+ * rather than assigning new ones. This ensures that TLPs are routed the same
+ * way after Live Update and ensures that IOMMU groups do not change.
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8f7cfcc00090..e615b7c3e430 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1017,6 +1017,15 @@ void pci_enable_acs(struct pci_dev *dev)
bool enable_acs = false;
int pos;
+ /*
+ * ACS flags must be inherited from the previous kernel during a Live
+ * Update for preserved devices (which includes endpoints and any
+ * upstream bridges) to avoid changing routing while memory transactions
+ * are in flight.
+ */
+ if (pci_liveupdate_incoming(dev))
+ return;
+
/* If an iommu is present we start with kernel default caps */
if (pci_acs_enable) {
if (pci_dev_specific_enable_acs(dev))
@@ -1041,7 +1050,6 @@ void pci_enable_acs(struct pci_dev *dev)
PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
__pci_config_acs(dev, &caps, config_acs_param, 0, 0);
-
pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
}
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Restrict support for preserving PCI devices across Live Update to
devices in immutable singleton IOMMU groups. A device's group is
considered immutable if all bridges upstream from the device up to the
root port have the required ACS features enabled.
Since ACS flags are inherited across a Live Update on preserved devices
and on every upstream bridge up to the root port, a preserved device
should still be in a singleton IOMMU group in the new kernel after kexec.
This change should still permit all the current use-cases for PCI device
preservation across Live Update, since it is intended to be used in
Cloud environments which should have the required ACS features enabled
for virtualization purposes.
If a device is part of a multi-device IOMMU group, preserving it will
now fail with an error. This restriction may be lifted in the future if
support for preserving multi-device groups is desired.
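The rule described above can be sketched as a small user-space model. This is
only an illustration, not the kernel implementation: struct model_dev, the
MODEL_REQ_ACS mask and model_immutable_singleton() are hypothetical names, and
the group size is supplied directly rather than counted from an iommu_group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mask standing in for the ACS features the check requires. */
#define MODEL_REQ_ACS 0x0fu

/* Hypothetical model of a device and its chain of upstream bridges. */
struct model_dev {
	const struct model_dev *upstream; /* NULL once the root port is passed */
	unsigned int acs_ctrl;            /* ACS features enabled on this device */
	int group_size;                   /* devices sharing this device's group */
};

/*
 * Return true only if the device is alone in its group and every bridge
 * upstream of it has all of the required ACS features enabled.
 */
static bool model_immutable_singleton(const struct model_dev *dev)
{
	const struct model_dev *bridge;

	assert(dev != NULL);

	if (dev->group_size != 1)
		return false;

	for (bridge = dev->upstream; bridge; bridge = bridge->upstream) {
		if ((bridge->acs_ctrl & MODEL_REQ_ACS) != MODEL_REQ_ACS)
			return false;
	}

	return true;
}
```

In this model a single upstream bridge with a missing ACS feature is enough to
reject the device, which mirrors the "immutable" requirement in the text.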
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/iommu/iommu.c | 35 +++++++++++++++++++++++++++++++++++
drivers/pci/liveupdate.c | 6 ++++++
include/linux/iommu.h | 7 +++++++
3 files changed, 48 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 61c12ba78206..782e73a9d45f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1664,6 +1664,41 @@ struct iommu_group *pci_device_group(struct device *dev)
}
EXPORT_SYMBOL_GPL(pci_device_group);
+bool pci_device_group_immutable_singleton(struct pci_dev *dev)
+{
+ struct iommu_group *group;
+ struct group_device *d;
+ struct pci_bus *bus;
+ int nr_devices = 0;
+
+ group = iommu_group_get(&dev->dev);
+ if (!group)
+ return false;
+
+ mutex_lock(&group->mutex);
+
+ for_each_group_device(group, d)
+ nr_devices++;
+
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ if (nr_devices != 1)
+ return false;
+
+ for (bus = dev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
+ if (!bus->self)
+ continue;
+
+ if (!pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
+ return false;
+
+ break;
+ }
+
+ return true;
+}
+
/* Get the IOMMU group for device on fsl-mc bus */
struct iommu_group *fsl_mc_device_group(struct device *dev)
{
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index a9a89f7bd3e5..54a90ff02bdd 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -133,6 +133,7 @@
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
#include <linux/io.h>
+#include <linux/iommu.h>
#include <linux/kexec_handover.h>
#include <linux/kho/abi/pci.h>
#include <linux/liveupdate.h>
@@ -359,6 +360,11 @@ int pci_liveupdate_preserve(struct pci_dev *dev)
if (dev->is_virtfn)
return -EINVAL;
+ if (!pci_device_group_immutable_singleton(dev)) {
+ pci_warn(dev, "Device preservation limited to immutable singleton iommu groups\n");
+ return -EINVAL;
+ }
+
if (dev->liveupdate_outgoing)
return -EBUSY;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e587d4ac4d33..6f5d1dec3f89 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1096,6 +1096,8 @@ extern struct iommu_group *generic_device_group(struct device *dev);
struct iommu_group *fsl_mc_device_group(struct device *dev);
extern struct iommu_group *generic_single_device_group(struct device *dev);
+bool pci_device_group_immutable_singleton(struct pci_dev *dev);
+
/**
* struct iommu_fwspec - per-device IOMMU instance data
* @iommu_fwnode: firmware handle for this device's IOMMU
@@ -1528,6 +1530,11 @@ static inline int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
static inline void pci_dev_reset_iommu_done(struct pci_dev *pdev)
{
}
+
+static inline bool pci_device_group_immutable_singleton(struct pci_dev *dev)
+{
+ return false;
+}
#endif /* CONFIG_IOMMU_API */
#ifdef CONFIG_IRQ_MSI_IOMMU
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 09/11] PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges
2026-04-23 21:23 [PATCH v4 00/11] PCI: liveupdate: PCI core support for Live Update David Matlack
` (7 preceding siblings ...)
2026-04-23 21:23 ` [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups David Matlack
@ 2026-04-23 21:23 ` David Matlack
2026-04-23 21:23 ` [PATCH v4 10/11] PCI: liveupdate: Do not disable bus mastering on preserved devices during kexec David Matlack
2026-04-23 21:23 ` [PATCH v4 11/11] Documentation: PCI: Add documentation for Live Update David Matlack
10 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Inherit the ARI Forwarding Enable on preserved bridges and update
pci_dev->ari_enabled accordingly during a Live Update. This ensures that
the preserved devices on the bridge's secondary bus can be identified
with the same expanded 8-bit function number after a Live Update.
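To illustrate why the bit matters, here is a minimal user-space sketch of how
the 8-bit devfn value is interpreted with and without ARI. The helper names
(model_decode_devfn, model_encode_devfn) and the struct are hypothetical and
exist only for this example. Without ARI the byte splits into a 5-bit device
number and a 3-bit function number; with ARI the device number is implicitly 0
and the whole byte is the function number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a devfn byte. */
struct model_dev_fn {
	unsigned int device;
	unsigned int function;
};

/* Interpret a devfn byte according to whether ARI forwarding is enabled. */
static struct model_dev_fn model_decode_devfn(uint8_t devfn, bool ari_enabled)
{
	struct model_dev_fn out;

	if (ari_enabled) {
		out.device = 0;
		out.function = devfn;
	} else {
		out.device = (devfn >> 3) & 0x1f;
		out.function = devfn & 0x07;
	}

	return out;
}

/* Build a devfn byte; the valid ranges depend on the ARI setting. */
static uint8_t model_encode_devfn(unsigned int device, unsigned int function,
				  bool ari_enabled)
{
	if (ari_enabled) {
		assert(device == 0 && function <= 0xff);
		return (uint8_t)function;
	}

	assert(device <= 0x1f && function <= 0x07);
	return (uint8_t)((device << 3) | function);
}
```

The same byte therefore names a different (device, function) pair depending on
the bridge's setting, so a function number above 7 is only expressible when the
ARI state is carried over.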
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 4 ++++
drivers/pci/pci.c | 12 +++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 54a90ff02bdd..25c86cd4c173 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -128,6 +128,10 @@
* * The PCI core inherits all ACS flags enabled on incoming preserved devices
* rather than assigning new ones. This ensures that TLPs are routed the same
* way after Live Update and ensures that IOMMU groups do not change.
+ *
+ * * The PCI core inherits ARI Forwarding Enable on all bridges with downstream
+ * preserved devices to ensure that all preserved devices on the bridge's
+ * secondary bus are addressable after the Live Update.
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e615b7c3e430..b45539c55c7d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3498,8 +3498,9 @@ void pci_configure_ari(struct pci_dev *dev)
{
u32 cap;
struct pci_dev *bridge;
+ u16 val = 0;
- if (pcie_ari_disabled || !pci_is_pcie(dev) || dev->devfn)
+ if (!pci_is_pcie(dev) || dev->devfn)
return;
bridge = dev->bus->self;
@@ -3510,6 +3511,15 @@ void pci_configure_ari(struct pci_dev *dev)
if (!(cap & PCI_EXP_DEVCAP2_ARI))
return;
+ if (pci_liveupdate_incoming(bridge)) {
+ pcie_capability_read_word(bridge, PCI_EXP_DEVCTL2, &val);
+ bridge->ari_enabled = !!(val & PCI_EXP_DEVCTL2_ARI);
+ return;
+ }
+
+ if (pcie_ari_disabled)
+ return;
+
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI)) {
pcie_capability_set_word(bridge, PCI_EXP_DEVCTL2,
PCI_EXP_DEVCTL2_ARI);
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 10/11] PCI: liveupdate: Do not disable bus mastering on preserved devices during kexec
2026-04-23 21:23 [PATCH v4 00/11] PCI: liveupdate: PCI core support for Live Update David Matlack
` (8 preceding siblings ...)
2026-04-23 21:23 ` [PATCH v4 09/11] PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges David Matlack
@ 2026-04-23 21:23 ` David Matlack
2026-04-23 21:23 ` [PATCH v4 11/11] Documentation: PCI: Add documentation for Live Update David Matlack
10 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Do not disable bus mastering on outgoing preserved devices during
pci_device_shutdown() for kexec.
Preserved devices must be allowed to perform memory transactions during
a Live Update to minimize downtime and ensure continuous operation.
Clearing the bus mastering bit would prevent these devices from issuing
any memory requests while the new kernel boots.
Because bridges upstream of preserved endpoint devices are also
automatically preserved, this change also avoids clearing bus mastering
on them. This is critical because clearing bus mastering on an upstream
bridge prevents the bridge from forwarding memory requests upstream (i.e.
it would prevent the endpoint device from accessing system RAM and doing
peer-to-peer transactions with devices not downstream of the bridge).
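The resulting shutdown policy can be summarized as a pure decision function.
The sketch below is a user-space model, not the kernel code: the enum and
model_should_clear_master() are hypothetical names, and the power states are
ordered here so that D3cold and "unknown" compare greater than D3hot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical power-state ordering used only by this model. */
enum model_pci_state {
	MODEL_D0,
	MODEL_D1,
	MODEL_D2,
	MODEL_D3HOT,
	MODEL_D3COLD,
	MODEL_UNKNOWN,
};

/*
 * Decide whether bus mastering should be cleared at shutdown:
 * only for a kexec, never for a device preserved across Live Update,
 * and never for a device in D3cold or an unknown state.
 */
static bool model_should_clear_master(bool kexec_in_progress,
				      bool liveupdate_outgoing,
				      enum model_pci_state state)
{
	/* Reject values outside the model's enum range. */
	assert(state >= MODEL_D0 && state <= MODEL_UNKNOWN);

	if (!kexec_in_progress)
		return false;

	if (liveupdate_outgoing)
		return false;

	if (state > MODEL_D3HOT)
		return false;

	return true;
}
```

Since upstream bridges of a preserved endpoint are themselves preserved, they
take the same early-return path as the endpoint in this model.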
Signed-off-by: David Matlack <dmatlack@google.com>
---
drivers/pci/liveupdate.c | 4 ++++
drivers/pci/pci-driver.c | 31 ++++++++++++++++++++++---------
2 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 25c86cd4c173..2a4a139623a6 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -132,6 +132,10 @@
* * The PCI core inherits ARI Forwarding Enable on all bridges with downstream
* preserved devices to ensure that all preserved devices on the bridge's
* secondary bus are addressable after the Live Update.
+ *
+ * * The PCI core does not disable bus mastering on outgoing preserved devices
+ * during kexec. This allows preserved devices to issue memory transactions
+ * throughout the Live Update.
*/
#define pr_fmt(fmt) "PCI: liveupdate: " fmt
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d10ece0889f0..05584bc76332 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -531,6 +531,27 @@ static void pci_device_remove(struct device *dev)
pci_dev_put(pci_dev);
}
+/*
+ * Disable bus mastering on the device so that it does not perform memory
+ * transactions during kexec.
+ *
+ * Don't touch devices that are being preserved across kexec for Live
+ * Update or that are in D3cold or unknown states.
+ */
+static void pci_clear_master_for_shutdown(struct pci_dev *pci_dev)
+{
+ if (!kexec_in_progress)
+ return;
+
+ if (pci_liveupdate_outgoing(pci_dev))
+ return;
+
+ if (pci_dev->current_state > PCI_D3hot)
+ return;
+
+ pci_clear_master(pci_dev);
+}
+
static void pci_device_shutdown(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -541,15 +562,7 @@ static void pci_device_shutdown(struct device *dev)
if (drv && drv->shutdown)
drv->shutdown(pci_dev);
- /*
- * If this is a kexec reboot, turn off Bus Master bit on the
- * device to tell it to not continue to do DMA. Don't touch
- * devices in D3cold or unknown states.
- * If it is not a kexec reboot, firmware will hit the PCI
- * devices with big hammer and stop their DMA any way.
- */
- if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
- pci_clear_master(pci_dev);
+ pci_clear_master_for_shutdown(pci_dev);
}
#ifdef CONFIG_PM_SLEEP
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* [PATCH v4 11/11] Documentation: PCI: Add documentation for Live Update
2026-04-23 21:23 [PATCH v4 00/11] PCI: liveupdate: PCI core support for Live Update David Matlack
` (9 preceding siblings ...)
2026-04-23 21:23 ` [PATCH v4 10/11] PCI: liveupdate: Do not disable bus mastering on preserved devices during kexec David Matlack
@ 2026-04-23 21:23 ` David Matlack
10 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2026-04-23 21:23 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
Add documentation files for the PCI subsystem's participation in Live
Update.
These documentation files are generated from the kernel-doc comments
in the PCI Live Update source code. They describe the File-Lifecycle
Bound (FLB) API, the device tracking API, and the specific policies
applied to preserved devices (such as bus number inheritance and bus
mastering preservation).
Signed-off-by: David Matlack <dmatlack@google.com>
---
Documentation/PCI/index.rst | 1 +
Documentation/PCI/liveupdate.rst | 23 +++++++++++++++++++++++
Documentation/core-api/liveupdate.rst | 1 +
MAINTAINERS | 1 +
4 files changed, 26 insertions(+)
create mode 100644 Documentation/PCI/liveupdate.rst
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 5d720d2a415e..23fb737ac969 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -20,3 +20,4 @@ PCI Bus Subsystem
controller/index
boot-interrupts
tph
+ liveupdate
diff --git a/Documentation/PCI/liveupdate.rst b/Documentation/PCI/liveupdate.rst
new file mode 100644
index 000000000000..04c9b675e8df
--- /dev/null
+++ b/Documentation/PCI/liveupdate.rst
@@ -0,0 +1,23 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+===========================
+PCI Support for Live Update
+===========================
+
+.. kernel-doc:: drivers/pci/liveupdate.c
+ :doc: PCI Live Update
+
+PCI Preservation ABI
+====================
+
+.. kernel-doc:: include/linux/kho/abi/pci.h
+ :doc: PCI File-Lifecycle Bound (FLB) Live Update ABI
+
+.. kernel-doc:: include/linux/kho/abi/pci.h
+ :internal:
+
+See Also
+========
+
+ * :doc:`/core-api/liveupdate`
+ * :doc:`/core-api/kho/index`
diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index 5a292d0f3706..d56a7760978a 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -70,3 +70,4 @@ See Also
- :doc:`Live Update uAPI </userspace-api/liveupdate>`
- :doc:`/core-api/kho/index`
+- :doc:`PCI </PCI/liveupdate>`
diff --git a/MAINTAINERS b/MAINTAINERS
index 94af31837375..42dbac2c2ed3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20564,6 +20564,7 @@ Q: https://patchwork.kernel.org/project/linux-pci/list/
B: https://bugzilla.kernel.org
C: irc://irc.oftc.net/linux-pci
T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
+F: Documentation/PCI/liveupdate.rst
F: drivers/pci/liveupdate.c
F: include/linux/kho/abi/pci.h
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
* Re: [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups
2026-04-23 21:23 ` [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups David Matlack
@ 2026-04-23 22:10 ` David Matlack
2026-04-23 22:52 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: David Matlack @ 2026-04-23 22:10 UTC (permalink / raw)
To: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
Jason Gunthorpe, Joerg Roedel, Jonathan Corbet, Josh Hilke,
Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit,
Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Robin Murphy,
Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Will Deacon,
William Tu, Yi Liu
On Thu, Apr 23, 2026 at 2:23 PM David Matlack <dmatlack@google.com> wrote:
>
> Restrict support for preserving PCI devices across Live Update to
> devices in immutable singleton IOMMU groups. A device's group is
> considered immutable if all bridges upstream from the device up to the
> root port have the required ACS features enabled.
>
> Since ACS flags are inherited across a Live Update for preserved devices
> and all the way up to the root port, the preserved device should be in a
> singleton IOMMU group after kexec in the new kernel.
>
> This change should still permit all the current use-cases for PCI device
> preservation across Live Update, since it is intended to be used in
> Cloud environments which should have the required ACS features enabled
> for virtualization purposes.
>
> If a device is part of a multi-device IOMMU group, preserving it will
> now fail with an error. This restriction may be lifted in the future if
> support for preserving multi-device groups is desired.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
Jason, do you think requiring singleton iommu groups is still
necessary/useful now that this series preserves ACS flags on preserved
devices and upstream bridges?
* Re: [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups
2026-04-23 22:10 ` David Matlack
@ 2026-04-23 22:52 ` Jason Gunthorpe
2026-04-23 23:09 ` David Matlack
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-04-23 22:52 UTC (permalink / raw)
To: David Matlack
Cc: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan, Joerg Roedel,
Jonathan Corbet, Josh Hilke, Leon Romanovsky, Lukas Wunner,
Mike Rapoport, Parav Pandit, Pasha Tatashin, Pranjal Shrivastava,
Pratyush Yadav, Robin Murphy, Saeed Mahameed, Samiullah Khawaja,
Shuah Khan, Will Deacon, William Tu, Yi Liu
On Thu, Apr 23, 2026 at 03:10:55PM -0700, David Matlack wrote:
> On Thu, Apr 23, 2026 at 2:23 PM David Matlack <dmatlack@google.com> wrote:
> >
> > Restrict support for preserving PCI devices across Live Update to
> > devices in immutable singleton IOMMU groups. A device's group is
> > considered immutable if all bridges upstream from the device up to the
> > root port have the required ACS features enabled.
> >
> > Since ACS flags are inherited across a Live Update for preserved devices
> > and all the way up to the root port, the preserved device should be in a
> > singleton IOMMU group after kexec in the new kernel.
> >
> > This change should still permit all the current use-cases for PCI device
> > preservation across Live Update, since it is intended to be used in
> > Cloud environments which should have the required ACS features enabled
> > for virtualization purposes.
> >
> > If a device is part of a multi-device IOMMU group, preserving it will
> > now fail with an error. This restriction may be lifted in the future if
> > support for preserving multi-device groups is desired.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
>
> Jason, do you think requiring singleton iommu groups is still
> necessary/useful now that this series preserves ACS flags on preserved
> devices and upstream bridges?
I have forgotten why we introduced that? There are a lot of funky
things about iommu groups that might be important upon restoration..
Like if you preserve one group member but not the other what do you do?
Even if you have ACS flags there are cases where groups are still
aliasing DMA..
Frankly, multi-device iommu groups don't even work fully last time we
tried to use them in a VMM. So I think I would not expect them to ever
intersect with live update. Blocking something tricky you can't test
does seem like a reasonable thing.
Jason
* Re: [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups
2026-04-23 22:52 ` Jason Gunthorpe
@ 2026-04-23 23:09 ` David Matlack
2026-04-23 23:27 ` Samiullah Khawaja
0 siblings, 1 reply; 16+ messages in thread
From: David Matlack @ 2026-04-23 23:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan, Joerg Roedel,
Jonathan Corbet, Josh Hilke, Leon Romanovsky, Lukas Wunner,
Mike Rapoport, Parav Pandit, Pasha Tatashin, Pranjal Shrivastava,
Pratyush Yadav, Robin Murphy, Saeed Mahameed, Samiullah Khawaja,
Shuah Khan, Will Deacon, William Tu, Yi Liu
On Thu, Apr 23, 2026 at 3:53 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Apr 23, 2026 at 03:10:55PM -0700, David Matlack wrote:
> > On Thu, Apr 23, 2026 at 2:23 PM David Matlack <dmatlack@google.com> wrote:
> > >
> > > Restrict support for preserving PCI devices across Live Update to
> > > devices in immutable singleton IOMMU groups. A device's group is
> > > considered immutable if all bridges upstream from the device up to the
> > > root port have the required ACS features enabled.
> > >
> > > Since ACS flags are inherited across a Live Update for preserved devices
> > > and all the way up to the root port, the preserved device should be in a
> > > singleton IOMMU group after kexec in the new kernel.
> > >
> > > This change should still permit all the current use-cases for PCI device
> > > preservation across Live Update, since it is intended to be used in
> > > Cloud environments which should have the required ACS features enabled
> > > for virtualization purposes.
> > >
> > > If a device is part of a multi-device IOMMU group, preserving it will
> > > now fail with an error. This restriction may be lifted in the future if
> > > support for preserving multi-device groups is desired.
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> >
> > Jason, do you think requiring singleton iommu groups is still
> > necessary/useful now that this series preserves ACS flags on preserved
> > devices and upstream bridges?
>
> I have forgotten why we introduced that? There are a lot of funky
> things about iommu groups that might be important upon restoration..
You had originally suggested it in this thread:
https://lore.kernel.org/kvm/20260301192236.GQ5933@nvidia.com/
> Like if you preserve one group member but not the other what do you do?
Yeah I imagine there could be some tricky cases there...
I wonder if PCI core is the right layer to enforce this. Maybe this
fits better into Sami's IOMMU core series since that is where all
those tricky cases will be (I imagine?).
> Even if you have ACS flags there are cases where groups are still
> aliasing DMA..
Hm, if a DMA alias can be created after boot time enumeration even
with the REQ_ACS_FLAGS check, then
pci_device_group_immutable_singleton() is not really immutable.
> Frankly, multi-device iommu groups don't even work fully last time we
> tried to use them in a VMM. So I think I would not expect them to ever
> intersect with live update. Blocking something tricky you can't test
> does seem like a reasonable thing.
* Re: [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups
2026-04-23 23:09 ` David Matlack
@ 2026-04-23 23:27 ` Samiullah Khawaja
0 siblings, 0 replies; 16+ messages in thread
From: Samiullah Khawaja @ 2026-04-23 23:27 UTC (permalink / raw)
To: David Matlack
Cc: Jason Gunthorpe, iommu, kexec, linux-doc, linux-kernel, linux-mm,
linux-pci, Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan, Joerg Roedel,
Jonathan Corbet, Josh Hilke, Leon Romanovsky, Lukas Wunner,
Mike Rapoport, Parav Pandit, Pasha Tatashin, Pranjal Shrivastava,
Pratyush Yadav, Robin Murphy, Saeed Mahameed, Shuah Khan,
Will Deacon, William Tu, Yi Liu
On Thu, Apr 23, 2026 at 04:09:01PM -0700, David Matlack wrote:
>On Thu, Apr 23, 2026 at 3:53 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Thu, Apr 23, 2026 at 03:10:55PM -0700, David Matlack wrote:
>> > On Thu, Apr 23, 2026 at 2:23 PM David Matlack <dmatlack@google.com> wrote:
>> > >
>> > > Restrict support for preserving PCI devices across Live Update to
>> > > devices in immutable singleton IOMMU groups. A device's group is
>> > > considered immutable if all bridges upstream from the device up to the
>> > > root port have the required ACS features enabled.
>> > >
>> > > Since ACS flags are inherited across a Live Update for preserved devices
>> > > and all the way up to the root port, the preserved device should be in a
>> > > singleton IOMMU group after kexec in the new kernel.
>> > >
>> > > This change should still permit all the current use-cases for PCI device
>> > > preservation across Live Update, since it is intended to be used in
>> > > Cloud environments which should have the required ACS features enabled
>> > > for virtualization purposes.
>> > >
>> > > If a device is part of a multi-device IOMMU group, preserving it will
>> > > now fail with an error. This restriction may be lifted in the future if
>> > > support for preserving multi-device groups is desired.
>> > >
>> > > Signed-off-by: David Matlack <dmatlack@google.com>
>> >
>> > Jason, do you think requiring singleton iommu groups is still
>> > necessary/useful now that this series preserves ACS flags on preserved
>> > devices and upstream bridges?
>>
>> I have forgotten why we introduced that? There are a lot of funky
>> things about iommu groups that might be important upon restoration..
>
>You had originally suggested it in this thread:
>
> https://lore.kernel.org/kvm/20260301192236.GQ5933@nvidia.com/
>
>> Like if you preserve one group member but not the other what do you do?
>
>Yeah I imagine there could be some tricky cases there...
>
>I wonder if PCI core is the right layer to enforce this. Maybe this
>fits better into Sami's IOMMU core series since that is where all
>those tricky cases will be (I imagine?).
+1
Also, I think this should probably be checked by iommufd and invoked
through vfio cdev. Basically, when vfio cdev calls into iommufd to
preserve IOMMU-specific aspects of the device (PASID table, etc.), iommufd
can check this and return an error.
>
>> Even if you have ACS flags there are cases where groups are still
>> aliasing DMA..
>
>Hm, if a DMA alias can be created after boot time enumeration even
>with the REQ_ACS_FLAGS check, then
>pci_device_group_immutable_singleton() is not really immutable.
>
>
>
>> Frankly, multi-device iommu groups don't even work fully last time we
>> tried to use them in a VMM. So I think I would not expect them to ever
>> intersect with live update. Blocking something tricky you can't test
>> does seem like a reasonable thing.
end of thread, other threads:[~2026-04-23 23:27 UTC | newest]
Thread overview: 16+ messages
2026-04-23 21:23 [PATCH v4 00/11] PCI: liveupdate: PCI core support for Live Update David Matlack
2026-04-23 21:23 ` [PATCH v4 01/11] PCI: liveupdate: Set up FLB handler for the PCI core David Matlack
2026-04-23 21:23 ` [PATCH v4 02/11] PCI: liveupdate: Track outgoing preserved PCI devices David Matlack
2026-04-23 21:23 ` [PATCH v4 03/11] PCI: liveupdate: Track incoming " David Matlack
2026-04-23 21:23 ` [PATCH v4 04/11] PCI: liveupdate: Document driver binding responsibilities David Matlack
2026-04-23 21:23 ` [PATCH v4 05/11] PCI: liveupdate: Inherit bus numbers during Live Update David Matlack
2026-04-23 21:23 ` [PATCH v4 06/11] PCI: liveupdate: Auto-preserve upstream bridges across " David Matlack
2026-04-23 21:23 ` [PATCH v4 07/11] PCI: liveupdate: Inherit ACS flags in incoming preserved devices David Matlack
2026-04-23 21:23 ` [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups David Matlack
2026-04-23 22:10 ` David Matlack
2026-04-23 22:52 ` Jason Gunthorpe
2026-04-23 23:09 ` David Matlack
2026-04-23 23:27 ` Samiullah Khawaja
2026-04-23 21:23 ` [PATCH v4 09/11] PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges David Matlack
2026-04-23 21:23 ` [PATCH v4 10/11] PCI: liveupdate: Do not disable bus mastering on preserved devices during kexec David Matlack
2026-04-23 21:23 ` [PATCH v4 11/11] Documentation: PCI: Add documentation for Live Update David Matlack