linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem
@ 2025-07-28  8:24 Chris Li
  2025-07-28  8:24 ` [PATCH RFC 01/25] PCI/LUO: Register with Liveupdate Orchestrator Chris Li
                   ` (24 more replies)
  0 siblings, 25 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

The LUO PCI subsystem is based on the LUO V2 series.
https://lore.kernel.org/lkml/20250515182322.117840-1-pasha.tatashin@soleen.com/

It registers PCI as a LUO subsystem and forwards the liveupdate
callbacks to the devices. struct dev_liveupdate has been added to struct
device to keep track of the liveupdate-related context.

A device can be marked as requested for liveupdate during the normal
state.

In the prepare() callback, the PCI core builds a list of the PCI devices
for liveupdate based on the PCI device dependencies:
1) A VF device depends on its PF device for the SR-IOV function.
   The PF device needs to restore the number of VFs.
2) A requested device depends on the PCI bridge it sits behind to preserve
   the bridge's bus master setting, all the way up to the root bridge. If
   bus mastering has been disabled on a bridge, DMA on its child devices
   will be affected.
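
The two dependency rules above can be sketched in user space as follows.
This is only an illustration under stated assumptions: struct model_dev,
mark_bridges() and mark_dependencies() are hypothetical names and are not
part of this series, which marks dependencies while walking the bus tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a PCI device; not struct pci_dev. */
struct model_dev {
	struct model_dev *parent;	/* upstream bridge, NULL at the root */
	struct model_dev *physfn;	/* owning PF if this is a VF, else NULL */
	bool requested;
	bool depended;
};

/* Mark every upstream bridge of @dev, up to the root, as depended. */
static void mark_bridges(struct model_dev *dev)
{
	for (struct model_dev *p = dev->parent; p; p = p->parent)
		p->depended = true;
}

/* Apply rules 1) and 2) for one requested device. */
static void mark_dependencies(struct model_dev *dev)
{
	if (!dev->requested)
		return;
	if (dev->physfn) {
		dev->physfn->depended = true;	/* rule 1: VF -> PF */
		mark_bridges(dev->physfn);	/* the PF needs its bridges too */
	}
	mark_bridges(dev);			/* rule 2: device -> bridges */
}
```

A requested VF therefore pulls in its PF and every bridge between either
function and the root, while unrelated devices stay unmarked.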

The list of liveupdate devices is used for the prepare(), cancel(),
freeze() and finish() callbacks.

The PCI subsystem will preserve the driver name for each liveupdate PCI
device and only probe that driver after kexec boot up.

It also saves the number of VFs for the live updated PF device. The PF
driver will be responsible for restoring the number of VFs.

Preserving the PCI device state across kexec boot up requires significant
changes to the device probing logic. After a liveupdate kexec, the device
can no longer be initialized from a fresh start. It needs to adopt the
already initialized state from the previous kernel.

Currently the series uses the pci_lu_adopt() function to detect whether
the device is under liveupdate, and then skips the device initialization
writes if needed. That part of the code is pretty invasive and is spread
across many PCI device initialization code paths. I am open to suggestions
on how it can be done more cleanly.

After kexec boot up, the PF device will probe before the VFs. Inside the
PF driver probe(), the PF driver will restore the number of VFs and create
the VF devices. Then the VF driver's probe() will be called.

Disclaimer:
The data preservation format is not final. It currently uses C structs
directly. It does not deal with version changes of the data format yet. I
do have some ideas on how to address the versioning of the data layout.
Those will be outside the scope of this series.

Testing:
Testing was done with Intel diorite NVMe VF device 8086:1457. Bind the PF
to the pci-lu-stub-pf driver and the VF to the pci-lu-stub driver. The VF
is marked as requested:
[  317.393914] pci-lu-stub 0000:09:00.0: Marking device as live update requested

Now perform luo prepare. The PCI subsystem builds the liveupdate device
list starting from the PCI root bridge. The PF device and the PCI bridge
are marked as depended.
[  330.870750] pci-lu-stub 0000:09:00.0: PCI liveupdate: collect liveupdate device: [requested]
[  330.879214] pci_bus 0000:09: PCI liveupdate: collect liveupdate bus 0000:09
[  330.886219] pci-lu-stub-pf 0000:05:00.1: PCI liveupdate: collect liveupdate device: [depended]
[  330.894845] pci_bus 0000:05: PCI liveupdate: collect liveupdate bus 0000:05
[  330.901829] pcieport 0000:04:01.0: PCI liveupdate: collect liveupdate device: [depended]
[  330.909944] pci_bus 0000:04: PCI liveupdate: collect liveupdate bus 0000:04
[  330.916933] pci-lu-stub 0000:09:00.0: pci_lu_stub_prepare(): data: 0x1eaf1c000
[  330.924174] pci-lu-stub-pf 0000:05:00.1: pci_lu_stub_prepare(): data: 0x1a2abe000
[  330.931678] PCI liveupdate: prepare data[23654a000]
[  330.936587] luo_core: Switched from [normal] to [prepared] state

After the kexec reboot, the liveupdate devices are probed and restore the
live update context.
[    3.628261] pci 0000:04:01.0: PCI liveupdate: liveupdate restore [depended] driver: pcieport data: [0] num_vfs: 0
[    4.769292] pci 0000:05:00.1: PCI liveupdate: liveupdate restore [depended] driver: pci-lu-stub-pf data: [1a2abe000] num_vfs: 4
[   16.811848] pci 0000:09:00.0: PCI liveupdate: liveupdate restore [requested] driver: pci-lu-stub data: [1eaf1c000] num_vfs: 0

Perform luo finish to switch from the updated state to the normal state.
The reserved folio will be freed.
[  287.836486] PCI liveupdate: finish data[23654a000]
[  287.841309] pci-lu-stub-pf 0000:05:00.1: pci_lu_stub_finish(): data: 0x1a2abe000
[  287.848733] pci-lu-stub 0000:09:00.0: pci_lu_stub_finish(): data: 0x1eaf1c000
[  287.855897] luo_core: Switched from [updated] to [normal] state

Signed-off-by: Chris Li <chrisl@kernel.org>
---
Chris Li (14):
      PCI/LUO: Register with Liveupdate Orchestrator
      PCI/LUO: Add struct dev_liveupdate
      PCI/LUO: Create requested liveupdate device list
      PCI/LUO: Forward prepare()/freeze()/cancel() callbacks to driver
      PCI/LUO: Restore state at PCI enumeration
      PCI/LUO: Forward finish callbacks to drivers
      PCI/LUO: Save and restore driver name
      PCI/LUO: Add liveupdate to pcieport driver
      PCI/LUO: Save SR-IOV number of VF
      PCI/LUO: Add pci_liveupdate_get_driver_data()
      PCI: pci-lu-stub: Add a stub driver for Live Update testing
      PCI/LUO: Track liveupdate buses
      PCI/LUO: Avoid write to liveupdate devices at boot
      PCI: pci-lu-pf-stub: Add a PF stub driver for Live Update testing

David Matlack (1):
      PCI/LUO: Clean up PCI_SER_GET()

Jason Miu (10):
      PCI/LUO: Save struct pci_dev info during prepare phase
      PCI/LUO: Check the device function numbers in restoration
      PCI/LUO: Restore power state of a PCI device
      PCI/LUO: Restore PM related fields
      PCI/LUO: Restore the pme_poll flag
      PCI/LUO: Restore the no_d3cold flag
      PCI/LUO: Restore pci_dev fields during probe
      PCI/LUO: Save and restore the PCI resource
      PCI/LUO: Save PCI bus and host bridge states
      PCI/LUO: Check the PCI bus state after restoration

 drivers/pci/Kconfig            |  10 +
 drivers/pci/Makefile           |   2 +
 drivers/pci/ats.c              |   7 +-
 drivers/pci/bus.c              |   5 +
 drivers/pci/iov.c              |  58 ++--
 drivers/pci/liveupdate.c       | 707 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/msi/msi.c          |  32 +-
 drivers/pci/msi/pcidev_msi.c   |   4 +-
 drivers/pci/pci-acpi.c         |   3 +
 drivers/pci/pci-lu-stub.c      | 216 +++++++++++++
 drivers/pci/pci.c              | 105 ++++--
 drivers/pci/pci.h              |  70 ++++
 drivers/pci/pcie/aspm.c        |   7 +-
 drivers/pci/pcie/pme.c         |  11 +-
 drivers/pci/pcie/portdrv.c     |  13 +
 drivers/pci/probe.c            |  92 ++++--
 drivers/pci/setup-bus.c        |  10 +-
 include/linux/dev_liveupdate.h |  64 ++++
 include/linux/device.h         |  15 +
 include/linux/device/driver.h  |   6 +
 include/linux/pci.h            |   9 +
 21 files changed, 1352 insertions(+), 94 deletions(-)
---
base-commit: 57fb5d5e70ca837e0cf3e38c59112cce460b643d
change-id: 20250724-luo-pci-1291890b710f

Best regards,
-- 
Chris Li <chrisl@kernel.org>



* [PATCH RFC 01/25] PCI/LUO: Register with Liveupdate Orchestrator
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 02/25] PCI/LUO: Add struct dev_liveupdate Chris Li
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Register the PCI subsystem with the Liveupdate Orchestrator
and provide no-op liveupdate callbacks.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/Makefile     |  1 +
 drivers/pci/liveupdate.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 67647f1880fb8fb0629d680398f5b88d69aac660..aa1bac7aed7d12c641a6b55e56176fb3cdde4c91 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_PCI_DOE)		+= doe.o
 obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
 obj-$(CONFIG_PCI_NPEM)		+= npem.o
 obj-$(CONFIG_PCIE_TPH)		+= tph.o
+obj-$(CONFIG_LIVEUPDATE)	+= liveupdate.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
new file mode 100644
index 0000000000000000000000000000000000000000..86b4f3a2fb44781c6e323ba029db510450556fa9
--- /dev/null
+++ b/drivers/pci/liveupdate.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Chris Li <chrisl@kernel.org>
+ */
+
+#define pr_fmt(fmt) "PCI liveupdate: " fmt
+
+#include <linux/liveupdate.h>
+
+#define PCI_SUBSYSTEM_NAME "pci"
+
+static int pci_liveupdate_prepare(void *arg, u64 *data)
+{
+	pr_info("prepare data[%llx]\n", *data);
+	return 0;
+}
+
+static int pci_liveupdate_freeze(void *arg, u64 *data)
+{
+	pr_info("freeze data[%llx]\n", *data);
+	return 0;
+}
+
+static void pci_liveupdate_cancel(void *arg, u64 data)
+{
+	pr_info("cancel data[%llx]\n", data);
+}
+
+static void pci_liveupdate_finish(void *arg, u64 data)
+{
+	pr_info("finish data[%llx]\n", data);
+}
+
+struct liveupdate_subsystem pci_liveupdate_ops = {
+	.prepare = pci_liveupdate_prepare,
+	.freeze = pci_liveupdate_freeze,
+	.cancel = pci_liveupdate_cancel,
+	.finish = pci_liveupdate_finish,
+	.name = PCI_SUBSYSTEM_NAME,
+};
+
+static int __init pci_liveupdate_init(void)
+{
+	int ret;
+
+	ret = liveupdate_register_subsystem(&pci_liveupdate_ops);
+	if (ret && liveupdate_state_updated())
+		panic("PCI liveupdate: Register subsystem failed: %d", ret);
+	WARN(ret, "PCI liveupdate: Register subsystem failed %d", ret);
+	return 0;
+}
+late_initcall_sync(pci_liveupdate_init);

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 02/25] PCI/LUO: Add struct dev_liveupdate
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
  2025-07-28  8:24 ` [PATCH RFC 01/25] PCI/LUO: Register with Liveupdate Orchestrator Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 03/25] PCI/LUO: Create requested liveupdate device list Chris Li
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Introduce struct dev_liveupdate and add it to struct device.

Use the new struct to track a device's liveupdate states.
- requested: The device is requested for a live update.
- depended:  One of the child devices is requested for live update.

When the device is requested, dev->lu.requested is set to 1.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c       |  3 +++
 include/linux/dev_liveupdate.h | 35 +++++++++++++++++++++++++++++++++++
 include/linux/device.h         |  6 ++++++
 3 files changed, 44 insertions(+)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 86b4f3a2fb44781c6e323ba029db510450556fa9..1c69adf412255c8ee5bc6db588ff04b1642e8e19 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -6,8 +6,11 @@
  */
 
 #define pr_fmt(fmt) "PCI liveupdate: " fmt
+#define dev_fmt(fmt) "PCI liveupdate: " fmt
 
+#include <linux/types.h>
 #include <linux/liveupdate.h>
+#include "pci.h"
 
 #define PCI_SUBSYSTEM_NAME "pci"
 
diff --git a/include/linux/dev_liveupdate.h b/include/linux/dev_liveupdate.h
new file mode 100644
index 0000000000000000000000000000000000000000..057407c030b0872bfa8cd666e6ffc305f7aa4083
--- /dev/null
+++ b/include/linux/dev_liveupdate.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ * Chris Li <chrisl@kernel.org>
+ */
+#ifndef _LINUX_DEV_LIVEUPDATE_H
+#define _LINUX_DEV_LIVEUPDATE_H
+
+#include <linux/liveupdate.h>
+
+#ifdef CONFIG_LIVEUPDATE
+
+/**
+ * struct dev_liveupdate - Device state for live update operations
+ * @lu_next:	List head for linking the device into live update
+ *		related lists (e.g., a list of devices participating
+ *		in a live update sequence).
+ * @requested:	Set if a live update has been requested for this
+ *		device (i.e. device will participate in live update).
+ * @depended:	Set if the device participtate the live update due to
+ *		one of its child device is requested in live update.
+ *
+ * This structure holds the state information required for performing
+ * live update operations on a device. It is embedded within a struct device.
+ */
+struct dev_liveupdate {
+	struct list_head lu_next;
+	bool requested:1;
+	bool depended:1;
+};
+
+#endif /* CONFIG_LIVEUPDATE */
+#endif /* _LINUX_DEV_LIVEUPDATE_H */
diff --git a/include/linux/device.h b/include/linux/device.h
index 4940db137fffff4ceacf819b32433a0f4898b125..4aee7912218865168a73fe4c6d3a82646b8dd86f 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -21,6 +21,7 @@
 #include <linux/lockdep.h>
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/dev_liveupdate.h>
 #include <linux/mutex.h>
 #include <linux/pm.h>
 #include <linux/atomic.h>
@@ -508,6 +509,7 @@ struct device_physical_location {
  * @pm_domain:	Provide callbacks that are executed during system suspend,
  * 		hibernation, system resume and during runtime PM transitions
  * 		along with subsystem-level and driver-level callbacks.
+ * @lu:		Live update state.
  * @em_pd:	device's energy model performance domain
  * @pins:	For device pin management.
  *		See Documentation/driver-api/pin-control.rst for details.
@@ -603,6 +605,10 @@ struct device {
 	struct dev_pm_info	power;
 	struct dev_pm_domain	*pm_domain;
 
+#ifdef CONFIG_LIVEUPDATE
+	struct dev_liveupdate	lu;
+#endif
+
 #ifdef CONFIG_ENERGY_MODEL
 	struct em_perf_domain	*em_pd;
 #endif

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 03/25] PCI/LUO: Create requested liveupdate device list
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
  2025-07-28  8:24 ` [PATCH RFC 01/25] PCI/LUO: Register with Liveupdate Orchestrator Chris Li
  2025-07-28  8:24 ` [PATCH RFC 02/25] PCI/LUO: Add struct dev_liveupdate Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 04/25] PCI/LUO: Forward prepare()/freeze()/cancel() callbacks to driver Chris Li
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

In the PCI subsystem prepare callback, create the
requested device list as per the following rules:
- If the device is liveupdate requested, then the parent device will
  also be added to the list as "depended".
- If an SR-IOV VF device is liveupdate requested, then its corresponding
  PF device will also be added to the list as "depended".

The PCI root buses and their child bus lists form a tree of all PCI
buses. The tree is walked in postorder, so that a device on a child bus
can mark its parent bridge as "depended" before the parent bus is
processed.

Note that a VF is always created after its PF. Walk pci_bus->devices in
reverse order so that the VF is visited first and can mark the PF as
"depended".

During the postorder traversal of the bus tree, with the devices on each
bus enumerated in reverse order, every device marked either requested or
depended is added to the requested device list.
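
The walk described above can be modeled in user space roughly as below. It
is a sketch under assumptions: the model_* types, collect_bus() and
build_list() are hypothetical, fixed-size arrays replace the kernel lists,
and the tree is assumed to hold at most MAX_ITEMS buses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 16	/* assumption: small fixed-size model */

/* Hypothetical stand-ins for struct pci_dev and struct pci_bus. */
struct model_dev {
	struct model_dev *parent;	/* bridge leading to this device's bus */
	struct model_dev *physfn;	/* PF for a VF, otherwise NULL */
	bool requested, depended;
};

struct model_bus {
	struct model_bus *children[MAX_ITEMS];
	size_t nr_children;
	struct model_dev *devices[MAX_ITEMS];	/* creation order: PF before VF */
	size_t nr_devices;
	bool visited;
};

/* Walk one bus in reverse so a VF marks its PF before the PF is examined. */
static size_t collect_bus(struct model_bus *bus, struct model_dev **out, size_t n)
{
	for (size_t i = bus->nr_devices; i-- > 0; ) {
		struct model_dev *d = bus->devices[i];

		if (!d->requested && !d->depended)
			continue;
		if (d->physfn)
			d->physfn->depended = true;
		if (d->parent)
			d->parent->depended = true;
		out[n++] = d;
	}
	return n;
}

/* Postorder walk with an explicit stack; returns the number collected. */
static size_t build_list(struct model_bus *root, struct model_dev **out)
{
	struct model_bus *stack[MAX_ITEMS];
	size_t top = 0, n = 0;

	stack[top++] = root;
	while (top) {
		struct model_bus *bus = stack[top - 1];

		if (!bus->visited && bus->nr_children) {
			for (size_t i = 0; i < bus->nr_children; i++)
				stack[top++] = bus->children[i];
			bus->visited = true;	/* children pushed, revisit later */
			continue;
		}
		n = collect_bus(bus, out, n);
		bus->visited = false;		/* leave the tree clean */
		top--;
	}
	return n;
}
```

With a requested VF and its PF on a child bus behind one bridge, the model
collects the VF first, then the PF, then the bridge.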

This list of devices will be used in the next change to forward the
liveupdate callbacks to individual devices.

Note that build_liveupdate_devices() returns the number of devices it
added to the requested device list. This will be used in a subsequent
commit so that the PCI subsystem can calculate what size folio to allocate
for its saved state.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.c     |  1 +
 drivers/pci/probe.c            |  4 ++-
 include/linux/dev_liveupdate.h |  4 +++
 include/linux/device.h         |  9 +++++
 5 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 1c69adf412255c8ee5bc6db588ff04b1642e8e19..73cf13f58382d62969844ae6dd6160b1a77f844b 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -14,9 +14,90 @@
 
 #define PCI_SUBSYSTEM_NAME "pci"
 
+static void stack_push_buses(struct list_head *stack, struct list_head *buses)
+{
+	struct pci_bus *bus;
+
+	list_for_each_entry(bus, buses, node)
+		list_move_tail(&bus->dev.lu.lu_next, stack);
+}
+
+static void requested_devices_add(struct device *dev, struct list_head *head)
+{
+	dev_info(dev, "collect liveupdate device:%s%s\n",
+		 dev->lu.depended ? " [depended]" : "",
+		 dev->lu.requested ? " [requested]" : "");
+	list_move_tail(&dev->lu.lu_next, head);
+}
+
+static int collect_bus_devices_reverse(struct pci_bus *bus, struct list_head *head)
+{
+	struct pci_dev *pdev;
+	int count = 0;
+
+	list_for_each_entry_reverse(pdev, &bus->devices, bus_list) {
+		if (pdev->dev.lu.requested || pdev->dev.lu.depended) {
+			if (pdev->is_virtfn)
+				pdev->physfn->dev.lu.depended = 1;
+			if (pdev->dev.parent)
+				pdev->dev.parent->lu.depended = 1;
+			requested_devices_add(&pdev->dev, head);
+			count++;
+		}
+	}
+	return count;
+}
+
+static int build_liveupdate_devices(struct list_head *head)
+{
+	LIST_HEAD(bus_stack);
+	int count = 0;
+
+	stack_push_buses(&bus_stack, &pci_root_buses);
+
+	while (!list_empty(&bus_stack)) {
+		struct device *busdev;
+		struct pci_bus *bus;
+
+		busdev = list_last_entry(&bus_stack, struct device, lu.lu_next);
+		bus = to_pci_bus(busdev);
+		if (!busdev->lu.visited && !list_empty(&bus->children)) {
+			stack_push_buses(&bus_stack, &bus->children);
+			busdev->lu.visited = 1;
+			continue;
+		}
+
+		count += collect_bus_devices_reverse(bus, head);
+		busdev->lu.visited = 0;
+		list_del_init(&busdev->lu.lu_next);
+	}
+	return count;
+}
+
+static void cleanup_liveupdate_devices(struct list_head *head)
+{
+	struct device *d, *n;
+
+	list_for_each_entry_safe(d, n, head, lu.lu_next) {
+		d->lu.depended = 0;
+		list_del_init(&d->lu.lu_next);
+	}
+}
+
 static int pci_liveupdate_prepare(void *arg, u64 *data)
 {
+	LIST_HEAD(requested_devices);
+
 	pr_info("prepare data[%llx]\n", *data);
+
+	pci_lock_rescan_remove();
+	down_write(&pci_bus_sem);
+
+	build_liveupdate_devices(&requested_devices);
+	cleanup_liveupdate_devices(&requested_devices);
+
+	up_write(&pci_bus_sem);
+	pci_unlock_rescan_remove();
 	return 0;
 }
 
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index e8318fd5f6ed537a1b236a3a0f054161d5710abd..0e9ef387182856771d857181d88f376632b46f0d 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -304,6 +304,7 @@ static int pcie_device_init(struct pci_dev *pdev, int service, int irq)
 	device = &pcie->device;
 	device->bus = &pcie_port_bus_type;
 	device->release = release_pcie_device;	/* callback to free pcie dev */
+	dev_liveupdate_init(device);
 	dev_set_name(device, "%s:pcie%03x",
 		     pci_name(pdev),
 		     get_descriptor_id(pci_pcie_type(pdev), service));
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c67fc1655e0057b3b96b4098e6630..dddd7ebc03d1a6e6ee456e0bf02ab9833a819509 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -614,6 +614,7 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	INIT_LIST_HEAD(&b->devices);
 	INIT_LIST_HEAD(&b->slots);
 	INIT_LIST_HEAD(&b->resources);
+	dev_liveupdate_init(&b->dev);
 	b->max_bus_speed = PCI_SPEED_UNKNOWN;
 	b->cur_bus_speed = PCI_SPEED_UNKNOWN;
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
@@ -1985,6 +1986,7 @@ int pci_setup_device(struct pci_dev *dev)
 	dev->sysdata = dev->bus->sysdata;
 	dev->dev.parent = dev->bus->bridge;
 	dev->dev.bus = &pci_bus_type;
+	dev_liveupdate_init(&dev->dev);
 	dev->hdr_type = hdr_type & 0x7f;
 	dev->multifunction = !!(hdr_type & 0x80);
 	dev->error_state = pci_channel_io_normal;
@@ -3184,7 +3186,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 		return NULL;
 
 	bridge->dev.parent = parent;
-
+	dev_liveupdate_init(&bridge->dev);
 	list_splice_init(resources, &bridge->windows);
 	bridge->sysdata = sysdata;
 	bridge->busnr = bus;
diff --git a/include/linux/dev_liveupdate.h b/include/linux/dev_liveupdate.h
index 057407c030b0872bfa8cd666e6ffc305f7aa4083..6b45452c8f1420b59ed3ce954a1623fd472045f4 100644
--- a/include/linux/dev_liveupdate.h
+++ b/include/linux/dev_liveupdate.h
@@ -21,6 +21,9 @@
  *		device (i.e. device will participate in live update).
  * @depended:	Set if the device participtate the live update due to
  *		one of its child device is requested in live update.
+ * @visited:	Only used by the bus devices when travese the PCI buses
+ *		to build the liveupdate devices list. Set if the child
+ *		buses have been pushed into the pending stack.
  *
  * This structure holds the state information required for performing
  * live update operations on a device. It is embedded within a struct device.
@@ -29,6 +32,7 @@ struct dev_liveupdate {
 	struct list_head lu_next;
 	bool requested:1;
 	bool depended:1;
+	bool visited:1;
 };
 
 #endif /* CONFIG_LIVEUPDATE */
diff --git a/include/linux/device.h b/include/linux/device.h
index 4aee7912218865168a73fe4c6d3a82646b8dd86f..e0b35c723239f1254a3b6152f433e0412cd3fb34 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1174,4 +1174,13 @@ void device_link_wait_removal(void);
 #define MODULE_ALIAS_CHARDEV_MAJOR(major) \
 	MODULE_ALIAS("char-major-" __stringify(major) "-*")
 
+#ifdef CONFIG_LIVEUPDATE
+static inline void dev_liveupdate_init(struct device *dev)
+{
+	INIT_LIST_HEAD(&dev->lu.lu_next);
+}
+#else
+static inline void dev_liveupdate_init(struct device *dev) {}
+#endif
+
 #endif /* _DEVICE_H_ */

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 04/25] PCI/LUO: Forward prepare()/freeze()/cancel() callbacks to driver
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (2 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 03/25] PCI/LUO: Create requested liveupdate device list Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 05/25] PCI/LUO: Restore state at PCI enumeration Chris Li
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

After the list of preserved devices is constructed, the PCI subsystem can
now forward the liveupdate request to the driver.

The PCI subsystem saves and restores a u64 data value through the LUO
callbacks. For each device, the PCI subsystem preserves a struct
pci_dev_ser, which contains the path (domain + bus + devfn), the requested
flag and a per-device u64 data value.
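
As an illustration of the path layout, the sketch below packs and unpacks
domain, bus and devfn the same way as (domain << 16) | (bus << 8) | devfn.
The helper names are hypothetical and a 16-bit domain is assumed; the
patch itself builds the value from pci_domain_nr() and pci_dev_id().

```c
#include <assert.h>
#include <stdint.h>

/* Pack domain (16 bits), bus (8 bits) and devfn (8 bits) into one u32. */
static uint32_t path_encode(uint16_t domain, uint8_t bus, uint8_t slot, uint8_t fn)
{
	/* devfn is slot in the upper 5 bits and function in the lower 3 */
	uint8_t devfn = (uint8_t)(((slot & 0x1f) << 3) | (fn & 0x07));

	return ((uint32_t)domain << 16) | ((uint32_t)bus << 8) | devfn;
}

static uint16_t path_domain(uint32_t path) { return (uint16_t)(path >> 16); }
static uint8_t path_bus(uint32_t path) { return (path >> 8) & 0xff; }
static uint8_t path_slot(uint32_t path) { return (path >> 3) & 0x1f; }
static uint8_t path_fn(uint32_t path) { return path & 0x07; }
```

Under this layout, device 0000:05:00.1 would encode as 0x0501 and
0000:04:01.0 as 0x0408.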

The device driver uses this u64 data area to store the device driver
state. The device live update callbacks look very similar to the LUO
subsystem callbacks, with the "void *arg" changed to "struct device *dev".

In the prepare callback, the PCI subsystem allocates and then preserves a
folio big enough to hold the count and an array with the state of every
requested device (struct pci_dev_ser).
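
The sizing can be sketched in user space as below. The structs mirror the
layout in this patch with fixed-width types; MODEL_PAGE_SIZE, pci_ser_size()
and model_get_order() are hypothetical stand-ins (the patch uses
get_order()), and 4 KiB pages are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* User-space mirrors of the serialized layout in this patch. */
struct pci_dev_ser {
	uint32_t path;		/* domain + bus + slot + fn */
	uint8_t requested;
	uint64_t driver_data;	/* driver data */
};

struct pci_ser {
	uint32_t count;
	struct pci_dev_ser devs[];
};

/* Bytes needed for the header plus @count device entries. */
static size_t pci_ser_size(size_t count)
{
	return sizeof(struct pci_ser) + count * sizeof(struct pci_dev_ser);
}

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

For the three devices in the cover letter test, the state fits well within
a single page, so an order-0 folio would be enough.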

The PCI subsystem just forwards the liveupdate callback, with the u64
data pointing to the u64 field of the device's entry in the state array.

If a device fails the prepare callback, all previous devices that have
already successfully finished the prepare callback will get the cancel
callback to clean up the saved state. That cleanup is the special case
in which only part of the list is walked.
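
The partial rollback can be illustrated with the user-space sketch below.
All names are hypothetical: struct model_dev only borrows the prepare() and
cancel() signatures from struct dev_liveupdate_ops, and plain arrays stand
in for the device list and the state array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device callbacks, modeled on struct dev_liveupdate_ops. */
struct model_dev {
	int (*prepare)(struct model_dev *dev, uint64_t *data);
	void (*cancel)(struct model_dev *dev, uint64_t data);
	int prepared;	/* bookkeeping for the example only */
};

/*
 * Prepare devices in order. On the first failure, cancel only the
 * devices that already succeeded, then report the error.
 */
static int prepare_all(struct model_dev **devs, uint64_t *data, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = devs[i]->prepare ? devs[i]->prepare(devs[i], &data[i]) : 0;

		if (ret) {
			for (size_t j = 0; j < i; j++)
				if (devs[j]->cancel)
					devs[j]->cancel(devs[j], data[j]);
			return ret;
		}
	}
	return 0;
}

/* Example callbacks used to exercise the rollback path. */
static int prepare_ok(struct model_dev *dev, uint64_t *data)
{
	dev->prepared = 1;
	*data = 0x1000;	/* placeholder for a preserved-state address */
	return 0;
}

static int prepare_fail(struct model_dev *dev, uint64_t *data)
{
	(void)dev;
	(void)data;
	return -1;
}

static void cancel_prepared(struct model_dev *dev, uint64_t data)
{
	(void)data;
	dev->prepared = 0;
}
```

Devices after the failing one never see prepare() or cancel(), which is
why only part of the list is walked in this case.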

In other live update callbacks, all the devices in the preserved device
list will get the callback with their own u64 data field.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c       | 203 +++++++++++++++++++++++++++++++++++++++--
 include/linux/dev_liveupdate.h |  23 +++++
 include/linux/device/driver.h  |   6 ++
 3 files changed, 223 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 73cf13f58382d62969844ae6dd6160b1a77f844b..bbff9b314f99185dfe8941b711cdf0db16b1ed8a 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -9,11 +9,25 @@
 #define dev_fmt(fmt) "PCI liveupdate: " fmt
 
 #include <linux/types.h>
+#include <linux/kexec_handover.h>
 #include <linux/liveupdate.h>
 #include "pci.h"
 
 #define PCI_SUBSYSTEM_NAME "pci"
 
+static LIST_HEAD(preserved_devices);
+
+struct pci_dev_ser {
+	u32	path;		/* domain + bus + slot + fn */
+	u8	requested;
+	u64	driver_data;	/* driver data */
+};
+
+struct pci_ser {
+	u32 count;
+	struct pci_dev_ser devs[];
+};
+
 static void stack_push_buses(struct list_head *stack, struct list_head *buses)
 {
 	struct pci_bus *bus;
@@ -74,42 +88,213 @@ static int build_liveupdate_devices(struct list_head *head)
 	return count;
 }
 
+static void dev_cleanup_liveupdate(struct device *dev)
+{
+	dev->lu.depended = 0;
+	list_del_init(&dev->lu.lu_next);
+}
+
 static void cleanup_liveupdate_devices(struct list_head *head)
 {
 	struct device *d, *n;
 
-	list_for_each_entry_safe(d, n, head, lu.lu_next) {
-		d->lu.depended = 0;
-		list_del_init(&d->lu.lu_next);
+	list_for_each_entry_safe(d, n, head, lu.lu_next)
+		dev_cleanup_liveupdate(d);
+}
+
+static void cleanup_liveupdate_state(struct pci_ser *pci_state)
+{
+	struct folio *folio = virt_to_folio(pci_state);
+
+	kho_unpreserve_folio(folio);
+	folio_put(folio);
+}
+
+static void pci_call_cancel(struct pci_ser *pci_state)
+{
+	struct pci_dev_ser *si = pci_state->devs;
+	struct device *dev, *next;
+
+	list_for_each_entry_safe(dev, next, &preserved_devices, lu.lu_next) {
+		struct pci_dev_ser *s = si++;
+
+		if (!dev->driver)
+			panic("PCI liveupdate cancel: %s has no driver", dev_name(dev));
+		if (!dev->driver->lu)
+			panic("PCI liveupdate cancel: %s driver %s does not support liveupdate",
+			      dev_name(dev), dev->driver->name ? : "(null name)");
+		if (dev->driver->lu->cancel)
+			dev->driver->lu->cancel(dev, s->driver_data);
+		dev_cleanup_liveupdate(dev);
 	}
 }
 
-static int pci_liveupdate_prepare(void *arg, u64 *data)
+static int pci_get_device_path(struct pci_dev *pdev)
+{
+	return (pci_domain_nr(pdev->bus) << 16) | pci_dev_id(pdev);
+}
+
+static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	s->path = pci_get_device_path(pdev);
+	s->requested = dev->lu.requested;
+	return 0;
+}
+
+static int pci_call_prepare(struct pci_ser *pci_state,
+			    struct list_head *devices)
+{
+	struct pci_dev_ser *pdev_state_current = pci_state->devs;
+	struct device *dev, *next;
+	int ret;
+	char *reason;
+
+	list_for_each_entry_safe(dev, next, devices, lu.lu_next) {
+		struct pci_dev_ser *s = pdev_state_current++;
+
+		if (!dev->driver) {
+			reason = "no driver";
+			ret = -ENOENT;
+			goto cancel;
+		}
+		if (!dev->driver->lu) {
+			reason = "driver does not support liveupdate";
+			ret = -EPERM;
+			goto cancel;
+		}
+		ret = pci_save_device_state(dev, s);
+		if (ret) {
+			reason = "save device state failed";
+			goto cancel;
+		}
+		if (dev->driver->lu->prepare) {
+			ret = dev->driver->lu->prepare(dev, &s->driver_data);
+			if (ret) {
+				reason = "prepare() failed";
+				goto cancel;
+			}
+		}
+		list_move_tail(&dev->lu.lu_next, &preserved_devices);
+	}
+	return 0;
+
+cancel:
+	dev_err(dev, "luo prepare failed %d (%s)\n", ret, reason);
+	pci_call_cancel(pci_state);
+	return ret;
+}
+
+static int __pci_liveupdate_prepare(void *arg, u64 *data)
 {
 	LIST_HEAD(requested_devices);
+	struct pci_ser *pci_state;
+	int ret;
+	int count = build_liveupdate_devices(&requested_devices);
+	int size = sizeof(*pci_state) + sizeof(pci_state->devs[0]) * count;
+	int order = get_order(size);
+	struct folio *folio;
 
-	pr_info("prepare data[%llx]\n", *data);
+	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
+	if (!folio) {
+		ret = -ENOMEM;
+		goto cleanup_device;
+	}
 
-	pci_lock_rescan_remove();
-	down_write(&pci_bus_sem);
+	pci_state = folio_address(folio);
+	pci_state->count = count;
+
+	ret = kho_preserve_folio(folio);
+	if (ret) {
+		pr_err("liveupdate_preserve_folio failed\n");
+		goto release_folio;
+	}
+
+	ret = pci_call_prepare(pci_state, &requested_devices);
+	if (ret)
+		goto unpreserve;
 
-	build_liveupdate_devices(&requested_devices);
+	*data = __pa(pci_state);
+	pr_info("prepare data[%llx]\n", *data);
+	return 0;
+
+unpreserve:
+	kho_unpreserve_folio(folio);
+release_folio:
+	folio_put(folio);
+cleanup_device:
 	cleanup_liveupdate_devices(&requested_devices);
+	return ret;
+}
 
+static int pci_liveupdate_prepare(void *arg, u64 *data)
+{
+	int ret;
+
+	pci_lock_rescan_remove();
+	down_write(&pci_bus_sem);
+	ret = __pci_liveupdate_prepare(arg, data);
 	up_write(&pci_bus_sem);
 	pci_unlock_rescan_remove();
+	return ret;
+}
+
+static int pci_call_freeze(struct pci_ser *pci_state, struct list_head *devlist)
+{
+	struct pci_dev_ser *n = pci_state->devs;
+	struct device *dev;
+	int ret = 0;
+
+	list_for_each_entry(dev, devlist, lu.lu_next) {
+		struct pci_dev_ser *s = n++;
+
+		if (!dev->driver) {
+			if (!dev->parent)
+				continue;
+			panic("PCI liveupdate freeze: %s has no driver", dev_name(dev));
+		}
+		if (!dev->driver->lu->freeze)
+			continue;
+		ret = dev->driver->lu->freeze(dev, &s->driver_data);
+		if (ret) {
+			dev_err(dev, "luo freeze failed %d\n", ret);
+			pci_call_cancel(pci_state);
+			return ret;
+		}
+	}
 	return 0;
 }
 
 static int pci_liveupdate_freeze(void *arg, u64 *data)
 {
+	struct pci_ser *pci_state = phys_to_virt(*data);
+	int ret;
+
 	pr_info("freeze data[%llx]\n", *data);
-	return 0;
+	pci_lock_rescan_remove();
+	down_write(&pci_bus_sem);
+
+	ret = pci_call_freeze(pci_state, &preserved_devices);
+
+	up_write(&pci_bus_sem);
+	pci_unlock_rescan_remove();
+	return ret;
 }
 
 static void pci_liveupdate_cancel(void *arg, u64 data)
 {
+	struct pci_ser *pci_state = phys_to_virt(data);
+
 	pr_info("cancel data[%llx]\n", data);
+	pci_lock_rescan_remove();
+	down_write(&pci_bus_sem);
+
+	pci_call_cancel(pci_state);
+	cleanup_liveupdate_state(pci_state);
+
+	up_write(&pci_bus_sem);
+	pci_unlock_rescan_remove();
 }
 
 static void pci_liveupdate_finish(void *arg, u64 data)
diff --git a/include/linux/dev_liveupdate.h b/include/linux/dev_liveupdate.h
index 6b45452c8f1420b59ed3ce954a1623fd472045f4..fa664976f9f5e90b8a5a17cfbed8bd2fdc87b7a1 100644
--- a/include/linux/dev_liveupdate.h
+++ b/include/linux/dev_liveupdate.h
@@ -12,6 +12,8 @@
 
 #ifdef CONFIG_LIVEUPDATE
 
+struct device;
+
 /**
  * struct dev_liveupdate - Device state for live update operations
  * @lu_next:	List head for linking the device into live update
@@ -35,5 +37,26 @@ struct dev_liveupdate {
 	bool visited:1;
 };
 
+/**
+ * struct dev_liveupdate_ops - Live Update callback functions
+ * @prepare:     Prepare device for the upcoming state transition. Driver and
+ *               buses should save the necessary device state.
+ * @freeze:      A final notification before the system jumps to the new kernel.
+ *               Called from reboot() syscall.
+ * @cancel:      Cancel the live update process. Driver should clean
+ *               up any saved state if necessary.
+ * @finish:      The system has completed a transition. Drivers and buses should
+ *               have already restored the previously saved device state.
+ *               Clean-up any saved state or reset unreclaimed device.
+ *
+ * This structure is used by drivers and buses to hold the callback from LUO.
+ */
+struct dev_liveupdate_ops {
+	int (*prepare)(struct device *dev, u64 *data);
+	int (*freeze)(struct device *dev, u64 *data);
+	void (*cancel)(struct device *dev, u64 data);
+	void (*finish)(struct device *dev, u64 data);
+};
+
 #endif /* CONFIG_LIVEUPDATE */
 #endif /* _LINUX_DEV_LIVEUPDATE_H */
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index cd8e0f0a634be9ea63ff22e89d66ada3b1a9eaf2..b2ba469cc3065a412f02230c62e811af19c4d2c6 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -19,6 +19,7 @@
 #include <linux/pm.h>
 #include <linux/device/bus.h>
 #include <linux/module.h>
+#include <linux/dev_liveupdate.h>
 
 /**
  * enum probe_type - device driver probe type to try
@@ -80,6 +81,8 @@ enum probe_type {
  *		it is bound to the driver.
  * @pm:		Power management operations of the device which matched
  *		this driver.
+ * @lu:		Live update callbacks. They notify the device of the live
+ *		update state and allow the device to be preserved across reboot.
  * @coredump:	Called when sysfs entry is written to. The device driver
  *		is expected to call the dev_coredump API resulting in a
  *		uevent.
@@ -116,6 +119,9 @@ struct device_driver {
 	const struct attribute_group **dev_groups;
 
 	const struct dev_pm_ops *pm;
+#ifdef CONFIG_LIVEUPDATE
+	const struct dev_liveupdate_ops *lu;
+#endif
 	void (*coredump) (struct device *dev);
 
 	struct driver_private *p;

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 05/25] PCI/LUO: Restore state at PCI enumeration
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (3 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 04/25] PCI/LUO: Forward prepare()/freeze()/cancel() callbacks to driver Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 06/25] PCI/LUO: Forward finish callbacks to drivers Chris Li
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Add a PCI device saved state member to indicate the device is requested
vs depended.

Restore the PCI subsystem saved state folio during PCI enumeration.

When a new PCI device is created, restore the per device state pointer
into the dev->lu.dev_state if the device is found in the saved
devices array, by matching the device path.
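
The lookup described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct saved_dev, struct
dev_lu, find_saved() and restore_dev() are hypothetical stand-ins for
struct pci_dev_ser, struct dev_liveupdate and the restore helpers, not
the code in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct pci_dev_ser. */
struct saved_dev {
	uint32_t path;        /* domain + bus + slot + fn */
	uint8_t requested;    /* 1 = requested, 0 = depended */
	uint64_t driver_data;
};

/* Hypothetical, simplified stand-in for struct dev_liveupdate. */
struct dev_lu {
	const struct saved_dev *dev_state;
	unsigned int requested:1;
	unsigned int depended:1;
};

/* Linear search of the saved array by device path; NULL if not preserved. */
const struct saved_dev *find_saved(const struct saved_dev *devs,
				   size_t count, uint32_t path)
{
	for (size_t i = 0; i < count; i++)
		if (devs[i].path == path)
			return &devs[i];
	return NULL;
}

/* Returns 1 if saved state was attached to the device, 0 otherwise. */
int restore_dev(struct dev_lu *lu, const struct saved_dev *devs,
		size_t count, uint32_t path)
{
	const struct saved_dev *s = find_saved(devs, count, path);

	if (!s)
		return 0;
	lu->dev_state = s;
	if (s->requested)
		lu->requested = 1;
	else
		lu->depended = 1;
	return 1;
}
```

A device whose path is not in the saved array is left untouched, which
matches the behaviour of a device that did not take part in the live
update.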

Also restore dev->lu.requested or dev->lu.depended based on the saved
"requested" field. Add such devices to the "probe_devices" list.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c       | 54 ++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h              |  6 +++++
 drivers/pci/probe.c            |  2 ++
 include/linux/dev_liveupdate.h |  2 ++
 4 files changed, 64 insertions(+)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index bbff9b314f99185dfe8941b711cdf0db16b1ed8a..4d13071f5edd6520adb64003262f08d1f79e26c4 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -16,6 +16,7 @@
 #define PCI_SUBSYSTEM_NAME "pci"
 
 static LIST_HEAD(preserved_devices);
+static LIST_HEAD(probe_devices);
 
 struct pci_dev_ser {
 	u32	path;		/* domain + bus + slot + fn */
@@ -91,6 +92,7 @@ static int build_liveupdate_devices(struct list_head *head)
 static void dev_cleanup_liveupdate(struct device *dev)
 {
 	dev->lu.depended = 0;
+	dev->lu.dev_state = NULL;
 	list_del_init(&dev->lu.lu_next);
 }
 
@@ -310,6 +312,58 @@ struct liveupdate_subsystem pci_liveupdate_ops = {
 	.name = PCI_SUBSYSTEM_NAME,
 };
 
+static struct pci_ser *pci_state_get(void)
+{
+	static struct pci_ser *pci_state;
+	struct folio *folio;
+	phys_addr_t data = 0;
+	int ret;
+
+	if (pci_state)
+		return pci_state;
+
+	ret = liveupdate_get_subsystem_data(&pci_liveupdate_ops, &data);
+	if (ret || !data)
+		panic("PCI liveupdate: get subsystem data: [%llx] ret: %d", data, ret);
+
+	folio = kho_restore_folio(data);
+	if (!folio)
+		panic("PCI liveupdate: restore folio from %llx failed", data);
+
+	/* Cache the value for future callers. */
+	pci_state = folio_address(folio);
+	return pci_state;
+}
+
+static void pci_dev_do_restore(struct pci_dev *dev, struct pci_dev_ser *s)
+{
+	dev->dev.lu.dev_state = s;
+	if (s->requested)
+		dev->dev.lu.requested = 1;
+	else
+		dev->dev.lu.depended = 1;
+	pci_info(dev, "liveupdate restore [%s] data: [%llx]\n",
+		 s->requested ? "requested" : "depended",
+		 s->driver_data);
+	list_move_tail(&dev->dev.lu.lu_next, &probe_devices);
+}
+
+void pci_liveupdate_restore(struct pci_dev *dev)
+{
+	int path;
+	struct pci_dev_ser *s, *end;
+
+	if (!liveupdate_state_updated())
+		return;
+
+	path = pci_get_device_path(dev);
+	s = pci_state_get()->devs;
+	end = s + pci_state_get()->count;
+	for (; s < end; s++)
+		if (s->path == path)
+			return pci_dev_do_restore(dev, s);
+}
+
 static int __init pci_liveupdate_init(void)
 {
 	int ret;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12215ee72afb682b669c0e3a582b5379828e70c4..c9a7383753949994e031dc362920286a475fe2ab 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1159,4 +1159,10 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
 	(PCI_CONF1_ADDRESS(bus, dev, func, reg) | \
 	 PCI_CONF1_EXT_REG(reg))
 
+#ifdef CONFIG_LIVEUPDATE
+void pci_liveupdate_restore(struct pci_dev *dev);
+#else
+static inline void pci_liveupdate_restore(struct pci_dev *dev) {}
+#endif
+
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index dddd7ebc03d1a6e6ee456e0bf02ab9833a819509..a0605af1a699cd07b09897172803dcba1d2da9f9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2017,6 +2017,8 @@ int pci_setup_device(struct pci_dev *dev)
 	if (pci_early_dump)
 		early_dump_pci_device(dev);
 
+	pci_liveupdate_restore(dev);
+
 	/* Need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
diff --git a/include/linux/dev_liveupdate.h b/include/linux/dev_liveupdate.h
index fa664976f9f5e90b8a5a17cfbed8bd2fdc87b7a1..dc65e2b2d92c02bf15440b6745c62cd748721eef 100644
--- a/include/linux/dev_liveupdate.h
+++ b/include/linux/dev_liveupdate.h
@@ -19,6 +19,7 @@ struct device;
  * @lu_next:	List head for linking the device into live update
  *		related lists (e.g., a list of devices participating
  *		in a live update sequence).
+ * @dev_state:	Set to the device state at restore.
  * @requested:	Set if a live update has been requested for this
  *		device (i.e. device will participate in live update).
  * @depended:	Set if the device participtate the live update due to
@@ -32,6 +33,7 @@ struct device;
  */
 struct dev_liveupdate {
 	struct list_head lu_next;
+	void *dev_state;
 	bool requested:1;
 	bool depended:1;
 	bool visited:1;

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 06/25] PCI/LUO: Forward finish callbacks to drivers
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (4 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 05/25] PCI/LUO: Restore state at PCI enumeration Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 07/25] PCI/LUO: Save and restore driver name Chris Li
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

When PCI receives the LUO finish callback, the PCI subsystem forwards the
finish callback to the driver with the driver_data restored via
dev->lu.dev_state.

Tested: In qemu, mark a virtio net device as requested.
	Perform luo prepare then kexec. Verify the new kernel boot up
	dmesg shows the requested device has per device live update state
	restored. Perform liveupdate finish and see the device finish
	callback gets invoked.
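
The forwarding logic can be sketched in userspace C as follows. This is
a minimal illustration, not the kernel code: struct lu_ops, struct dev,
record_finish() and call_finish() are hypothetical names, and the
sketch returns -1 where the patch calls panic().

```c
#include <stddef.h>
#include <stdint.h>

struct dev;

/* Hypothetical mirror of the finish member of struct dev_liveupdate_ops. */
struct lu_ops {
	void (*finish)(struct dev *dev, uint64_t data);
};

/* Hypothetical device: just enough state to show the forwarding. */
struct dev {
	const struct lu_ops *lu;  /* must be non-NULL for a preserved device */
	uint64_t driver_data;     /* value handed over by the previous kernel */
	uint64_t seen;            /* set by the example callback below */
};

/* Example driver callback: records the data it was handed. */
void record_finish(struct dev *dev, uint64_t data)
{
	dev->seen = data;
}

/*
 * Forward finish to every device. Returns the number of callbacks
 * invoked, or -1 if a device has no ops at all.
 */
int call_finish(struct dev *devs, size_t count)
{
	int called = 0;

	for (size_t i = 0; i < count; i++) {
		if (!devs[i].lu)
			return -1;
		if (!devs[i].lu->finish)
			continue;  /* empty ops: supported, nothing to do */
		devs[i].lu->finish(&devs[i], devs[i].driver_data);
		called++;
	}
	return called;
}
```

The distinction between "no ops pointer" and "ops without a finish
member" is the point of the sketch: only the former is treated as an
error.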

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 4d13071f5edd6520adb64003262f08d1f79e26c4..6b85673f4ec20add7e49b04dc44f1bcd868adbdc 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -268,6 +268,29 @@ static int pci_call_freeze(struct pci_ser *pci_state, struct list_head *devlist)
 	return 0;
 }
 
+static void pci_call_finish(struct list_head *devlist)
+{
+	struct device *dev;
+
+	pci_lock_rescan_remove();
+	down_write(&pci_bus_sem);
+
+	list_for_each_entry(dev, devlist, lu.lu_next) {
+		struct pci_dev_ser *s = dev->lu.dev_state;
+
+		if (!dev->driver)
+			panic("PCI luo finish: dev %s does not have driver", dev_name(dev));
+		if (!dev->driver->lu)
+			panic("PCI luo finish: dev %s does not support liveupdate",
+			      dev_name(dev));
+		if (!dev->driver->lu->finish)
+			continue;
+		dev->driver->lu->finish(dev, s->driver_data);
+	}
+	up_write(&pci_bus_sem);
+	pci_unlock_rescan_remove();
+}
+
 static int pci_liveupdate_freeze(void *arg, u64 *data)
 {
 	struct pci_ser *pci_state = phys_to_virt(*data);
@@ -301,7 +324,12 @@ static void pci_liveupdate_cancel(void *arg, u64 data)
 
 static void pci_liveupdate_finish(void *arg, u64 data)
 {
+	struct pci_ser *pci_state = phys_to_virt(data);
+
 	pr_info("finish data[%llx]\n", data);
+	pci_call_finish(&probe_devices);
+	cleanup_liveupdate_devices(&probe_devices);
+	cleanup_liveupdate_state(pci_state);
 }
 
 struct liveupdate_subsystem pci_liveupdate_ops = {

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 07/25] PCI/LUO: Save and restore driver name
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (5 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 06/25] PCI/LUO: Forward finish callbacks to drivers Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 08/25] PCI/LUO: Add liveupdate to pcieport driver Chris Li
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Save the PCI driver name into "struct pci_dev_ser" during the PCI
prepare callback.

After kexec, use driver_set_override() to ensure the device is
bound only to the saved driver.

Clear the override after the finish callback.
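
The bounded copy of the driver name can be sketched in userspace C. The
helper name save_driver_name() and the DRIVER_NAME_LEN macro are
assumptions made for this illustration; the error codes follow the ones
used in pci_save_device_state() in this patch.

```c
#include <errno.h>
#include <string.h>

#define DRIVER_NAME_LEN 63  /* mirrors char driver_name[63] in the patch */

/*
 * Copy name into a fixed-size, NUL-terminated buffer.
 * Returns 0 on success, -ENXIO for a missing name, and -ENOSPC if the
 * name plus its terminating NUL does not fit.
 */
int save_driver_name(char dst[DRIVER_NAME_LEN], const char *name)
{
	size_t len;

	if (!name)
		return -ENXIO;
	len = strlen(name);
	if (len > DRIVER_NAME_LEN - 1)
		return -ENOSPC;
	memcpy(dst, name, len + 1);  /* includes the terminating NUL */
	return 0;
}
```

Rejecting an over-long name rather than truncating it matters here: a
truncated name would later be used as the driver override and could
match no driver, or the wrong one.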

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 36 ++++++++++++++++++++++++++++++++++--
 drivers/pci/pci.h        |  2 ++
 drivers/pci/probe.c      |  2 ++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 6b85673f4ec20add7e49b04dc44f1bcd868adbdc..189827c6111b2c00ebb24404a205cde3f75d33c3 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -21,6 +21,7 @@ static LIST_HEAD(probe_devices);
 struct pci_dev_ser {
 	u32	path;		/* domain + bus + slot + fn */
 	u8	requested;
+	char	driver_name[63];
 	u64	driver_data;	/* driver data */
 };
 
@@ -91,6 +92,10 @@ static int build_liveupdate_devices(struct list_head *head)
 
 static void dev_cleanup_liveupdate(struct device *dev)
 {
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (liveupdate_state_updated())
+		WARN_ON(driver_set_override(dev, &pdev->driver_override, "", 0));
 	dev->lu.depended = 0;
 	dev->lu.dev_state = NULL;
 	list_del_init(&dev->lu.lu_next);
@@ -139,7 +144,13 @@ static int pci_get_device_path(struct pci_dev *pdev)
 static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
+	const char *name = dev->driver->name;
 
+	if (!name)
+		return -ENXIO;
+	if (strlen(name) > sizeof(s->driver_name) - 1)
+		return -ENOSPC;
+	strscpy(s->driver_name, name, sizeof(s->driver_name));
 	s->path = pci_get_device_path(pdev);
 	s->requested = dev->lu.requested;
 	return 0;
@@ -370,9 +381,9 @@ static void pci_dev_do_restore(struct pci_dev *dev, struct pci_dev_ser *s)
 		dev->dev.lu.requested = 1;
 	else
 		dev->dev.lu.depended = 1;
-	pci_info(dev, "liveupdate restore [%s] data: [%llx]\n",
+	pci_info(dev, "liveupdate restore [%s] driver: %s data: [%llx]\n",
 		 s->requested ? "requested" : "depended",
-		 s->driver_data);
+		 s->driver_name, s->driver_data);
 	list_move_tail(&dev->dev.lu.lu_next, &probe_devices);
 }
 
@@ -392,6 +403,27 @@ void pci_liveupdate_restore(struct pci_dev *dev)
 			return pci_dev_do_restore(dev, s);
 }
 
+void pci_liveupdate_override_driver(struct pci_dev *dev)
+{
+	struct pci_dev_ser *s = dev->dev.lu.dev_state;
+	int ret;
+	int len;
+
+	if (!s)
+		return;
+
+	len = strlen(s->driver_name);
+	if (!len)
+		return;
+
+	ret = driver_set_override(&dev->dev,
+				  &dev->driver_override,
+				  s->driver_name, len);
+	if (ret)
+		panic("PCI Liveupdate override driver failed: %s", s->driver_name);
+}
+
+
 static int __init pci_liveupdate_init(void)
 {
 	int ret;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c9a7383753949994e031dc362920286a475fe2ab..b79a18c5e948980fe2ef3f0a10e0d795b1eee6d7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1161,8 +1161,10 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
 
 #ifdef CONFIG_LIVEUPDATE
 void pci_liveupdate_restore(struct pci_dev *dev);
+void pci_liveupdate_override_driver(struct pci_dev *dev);
 #else
 static inline void pci_liveupdate_restore(struct pci_dev *dev) {}
+static inline void pci_liveupdate_override_driver(struct pci_dev *dev) {}
 #endif
 
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a0605af1a699cd07b09897172803dcba1d2da9f9..e41a1bef2083aa9184fd1c894d5de964f19d5c01 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2714,6 +2714,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	/* Set up MSI IRQ domain */
 	pci_set_msi_domain(dev);
 
+	pci_liveupdate_override_driver(dev);
+
 	/* Notifier could use PCI capabilities */
 	ret = device_add(&dev->dev);
 	WARN_ON(ret < 0);

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 08/25] PCI/LUO: Add liveupdate to pcieport driver
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (6 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 07/25] PCI/LUO: Save and restore driver name Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 09/25] PCI/LUO: Save SR-IOV number of VF Chris Li
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

The PCIe port driver is the driver bound to the PCI-PCI bridge.
The PCIe port device is depended on by its PCI child devices.

Add an empty liveupdate callback structure to the pcieport driver to
indicate that this driver supports liveupdate. Otherwise the liveupdate
operation can fail if the PCI-PCI bridge is depended on.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/pcie/portdrv.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 0e9ef387182856771d857181d88f376632b46f0d..fd43c1ebfb9d2852fbc460b0390dd7fb016226d2 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -789,6 +789,15 @@ static const struct pci_error_handlers pcie_portdrv_err_handler = {
 	.mmio_enabled = pcie_portdrv_mmio_enabled,
 };
 
+#ifdef CONFIG_LIVEUPDATE
+
+/*
+ * Empty pcie_port_lu_ops to indicate this driver supports liveupdate.
+ */
+static struct dev_liveupdate_ops pcie_port_lu_ops;
+
+#endif /* CONFIG_LIVEUPDATE */
+
 static struct pci_driver pcie_portdriver = {
 	.name		= "pcieport",
 	.id_table	= port_pci_ids,
@@ -802,6 +811,9 @@ static struct pci_driver pcie_portdriver = {
 	.driver_managed_dma = true,
 
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
+#ifdef CONFIG_LIVEUPDATE
+	.driver.lu	= &pcie_port_lu_ops,
+#endif
 };
 
 static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 09/25] PCI/LUO: Save SR-IOV number of VF
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (7 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 08/25] PCI/LUO: Add liveupdate to pcieport driver Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 10/25] PCI/LUO: Add pci_liveupdate_get_driver_data() Chris Li
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

During the PCI prepare callback, save the SR-IOV number of VFs if the
device is a physical function.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 189827c6111b2c00ebb24404a205cde3f75d33c3..09faba99e9218b443f66060db5142208e22c7dd5 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -21,6 +21,7 @@ static LIST_HEAD(probe_devices);
 struct pci_dev_ser {
 	u32	path;		/* domain + bus + slot + fn */
 	u8	requested;
+	u16	num_vfs;
 	char	driver_name[63];
 	u64	driver_data;	/* driver data */
 };
@@ -153,6 +154,8 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 	strscpy(s->driver_name, name, sizeof(s->driver_name));
 	s->path = pci_get_device_path(pdev);
 	s->requested = dev->lu.requested;
+	if (pdev->sriov && pdev->is_physfn)
+		s->num_vfs = pdev->sriov->num_VFs;
 	return 0;
 }
 
@@ -381,9 +384,9 @@ static void pci_dev_do_restore(struct pci_dev *dev, struct pci_dev_ser *s)
 		dev->dev.lu.requested = 1;
 	else
 		dev->dev.lu.depended = 1;
-	pci_info(dev, "liveupdate restore [%s] driver: %s data: [%llx]\n",
+	pci_info(dev, "liveupdate restore [%s] driver: %s data: [%llx] num_vfs: %d\n",
 		 s->requested ? "requested" : "depended",
-		 s->driver_name, s->driver_data);
+		 s->driver_name, s->driver_data, s->num_vfs);
 	list_move_tail(&dev->dev.lu.lu_next, &probe_devices);
 }
 

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 10/25] PCI/LUO: Add pci_liveupdate_get_driver_data()
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (8 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 09/25] PCI/LUO: Save SR-IOV number of VF Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 11/25] PCI: pci-lu-stub: Add a stub driver for Live Update testing Chris Li
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Similar to liveupdate_get_subsystem_data(), the PCI subsystem
provides pci_liveupdate_get_driver_data() for the driver to
retrieve its driver data while the new kernel boots, in the liveupdate
updated state.

This function returns an error in any other liveupdate state.

For example, vfio-pci will use this API in probe() to access the
liveupdate state from the previous kernel.
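
The state check can be sketched in userspace C. The enum lu_state
values and the struct saved_state type are assumptions made for this
illustration (the real code asks liveupdate_state_updated() instead of
taking a state argument); the error codes match the ones returned by
pci_liveupdate_get_driver_data().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of states; only LU_UPDATED permits the read. */
enum lu_state { LU_NORMAL, LU_PREPARED, LU_FROZEN, LU_UPDATED };

/* Hypothetical stand-in for the per-device saved state. */
struct saved_state {
	uint64_t driver_data;
};

/*
 * Returns 0 and fills *data only in the updated state and only when
 * saved state exists for the device. *data is left untouched on error.
 */
int get_driver_data(enum lu_state state, const struct saved_state *s,
		    uint64_t *data)
{
	if (state != LU_UPDATED)
		return -EINVAL;
	if (!s)
		return -ENOENT;
	*data = s->driver_data;
	return 0;
}
```

A caller such as a probe() routine should therefore check the return
value before using the data, since the output is not written on
failure.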

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 15 +++++++++++++++
 include/linux/pci.h      |  9 +++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 09faba99e9218b443f66060db5142208e22c7dd5..f84c0a455f7055b9b64051b125368fb0f9e6144f 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -406,6 +406,21 @@ void pci_liveupdate_restore(struct pci_dev *dev)
 			return pci_dev_do_restore(dev, s);
 }
 
+int pci_liveupdate_get_driver_data(struct pci_dev *pdev, u64 *data)
+{
+	struct dev_liveupdate *lu = &pdev->dev.lu;
+	struct pci_dev_ser *s = lu->dev_state;
+
+	if (!liveupdate_state_updated())
+		return -EINVAL;
+
+	if (!lu->dev_state)
+		return -ENOENT;
+
+	*data = s->driver_data;
+	return 0;
+}
+
 void pci_liveupdate_override_driver(struct pci_dev *dev)
 {
 	struct pci_dev_ser *s = dev->dev.lu.dev_state;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 05e68f35f39238f8b9ce08df97b384d1c1e89bbe..50296bb04aaa7f2bbd2260f8ec4670533e019e38 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2767,4 +2767,13 @@ void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 	WARN_ONCE(condition, "%s %s: " fmt, \
 		  dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg)
 
+#ifdef CONFIG_LIVEUPDATE
+int pci_liveupdate_get_driver_data(struct pci_dev *pdev, u64 *data);
+#else
+static inline int pci_liveupdate_get_driver_data(struct pci_dev *pdev,
+						 u64 *data)
+{
+	return 0;
+}
+#endif
 #endif /* LINUX_PCI_H */

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 11/25] PCI: pci-lu-stub: Add a stub driver for Live Update testing
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (9 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 10/25] PCI/LUO: Add pci_liveupdate_get_driver_data() Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 12/25] PCI/LUO: Save struct pci_dev info during prepare phase chrisl
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Introduce a new driver, pci-lu-stub, that can be bound to any PCI device
and used to test the PCI subsystem support for Live Update. This driver
gives developers a way to opt-in a device for Live Update and driver
interaction with the PCI subsystem. This driver is only intended for
testing purposes.

In the future this driver can be extended to test other scenarios (such
as failing prepare() on purpose).
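
The round trip the stub driver exercises (record the device ID at
prepare, check it again later) can be sketched in userspace C. The
names make_dev_id(), stub_prepare() and stub_validate() are
hypothetical; the sketch leaves out folio allocation and KHO
preservation and keeps only the serialize-then-validate idea. It
assumes a GCC or Clang compatible compiler for the packed attribute.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of struct pci_lu_stub_ser. */
struct stub_ser {
	uint16_t dev_id;
} __attribute__((packed));

/* Bus number in the high byte, devfn in the low byte. */
uint16_t make_dev_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* "prepare": record which device this serialized state belongs to. */
void stub_prepare(struct stub_ser *ser, uint16_t dev_id)
{
	ser->dev_id = dev_id;
}

/* "freeze"/"probe": the state must belong to the device asking for it. */
int stub_validate(const struct stub_ser *ser, uint16_t dev_id)
{
	return ser->dev_id == dev_id ? 0 : -EINVAL;
}
```

The check is deliberately simple: it only proves that the data handed
back after kexec is the data that was saved for the same device.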

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/Kconfig       |  10 ++++
 drivers/pci/Makefile      |   1 +
 drivers/pci/pci-lu-stub.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9c0e4aaf4e8cb7fecd9f80ac6289b8d854ce03aa..37e44782fa35c64c2eba6a0f6942d44d8003a499 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -327,3 +327,13 @@ source "drivers/pci/switch/Kconfig"
 source "drivers/pci/pwrctrl/Kconfig"
 
 endif
+
+config PCI_LU_STUB
+	tristate "PCI Live Update Stub Driver"
+	depends on LIVEUPDATE
+	help
+	  Say Y or M here if you want to enable support for the Live Update stub
+	  driver. This driver can be used to test the PCI subsystem support for
+	  Live Updates.
+
+	  When in doubt, say N.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index aa1bac7aed7d12c641a6b55e56176fb3cdde4c91..061e98d0411a951573e1996c61ce5a98f2775e53 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
 obj-$(CONFIG_PCI_NPEM)		+= npem.o
 obj-$(CONFIG_PCIE_TPH)		+= tph.o
 obj-$(CONFIG_LIVEUPDATE)	+= liveupdate.o
+obj-$(CONFIG_PCI_LU_STUB) 	+= pci-lu-stub.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
diff --git a/drivers/pci/pci-lu-stub.c b/drivers/pci/pci-lu-stub.c
new file mode 100644
index 0000000000000000000000000000000000000000..ea8142dcb250d31cbf817df957157bc4ec3a876d
--- /dev/null
+++ b/drivers/pci/pci-lu-stub.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kexec_handover.h>
+#include <linux/liveupdate.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+
+struct pci_lu_stub_ser {
+	u16 dev_id;
+} __packed;
+
+static const struct pci_device_id pci_lu_stub_id_table[] = {
+	/* Allow binding to any device but only via driver_override. */
+	{ PCI_DEVICE_DRIVER_OVERRIDE(PCI_ANY_ID, PCI_ANY_ID, 1) },
+	{},
+};
+
+static int validate_folio(struct pci_dev *dev, struct folio *folio)
+{
+	const struct pci_lu_stub_ser *ser = folio_address(folio);
+
+	if (folio_order(folio) != get_order(sizeof(*ser))) {
+		pci_err(dev, "Restored folio has unexpected order %u\n", folio_order(folio));
+		return -ERANGE;
+	}
+
+	if (ser->dev_id != pci_dev_id(dev)) {
+		pci_err(dev, "Restored folio contains unexpected dev_id: 0x%x\n", ser->dev_id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int pci_lu_stub_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct folio *folio;
+	u64 data;
+	int ret;
+
+	if (liveupdate_state_normal()) {
+		pci_info(dev, "Marking device as liveupdate requested\n");
+		dev->dev.lu.requested = 1;
+		return 0;
+	}
+
+	if (!liveupdate_state_updated()) {
+		pci_err(dev, "Unable to handle probe() outside of normal and updated states.\n");
+		return -EOPNOTSUPP;
+	}
+
+	ret = pci_liveupdate_get_driver_data(dev, &data);
+	if (ret) {
+		pci_err(dev, "Failed to get driver data for device (%d)\n", ret);
+		return ret;
+	}
+
+	pci_info(dev, "%s(): data: 0x%llx\n", __func__, data);
+
+	folio = kho_restore_folio(data);
+	if (!folio) {
+		pci_err(dev, "Failed to restore folio at 0x%llx.\n", data);
+		return -ENOENT;
+	}
+
+	return validate_folio(dev, folio);
+}
+
+static void pci_lu_stub_remove(struct pci_dev *dev)
+{
+	WARN_ON(!liveupdate_state_normal());
+	dev->dev.lu.requested = 0;
+}
+
+static int pci_lu_stub_prepare(struct device *dev, u64 *data)
+{
+	struct pci_lu_stub_ser *ser;
+	struct folio *folio;
+	int ret;
+
+	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, get_order(sizeof(*ser)));
+	if (!folio)
+		return -ENOMEM;
+
+	ret = kho_preserve_folio(folio);
+	if (ret) {
+		dev_err(dev, "Failed to preserve folio (%d)\n", ret);
+		folio_put(folio);
+		return ret;
+	}
+
+	ser = folio_address(folio);
+	ser->dev_id = pci_dev_id(to_pci_dev(dev));
+
+	*data = virt_to_phys(ser);
+	dev_info(dev, "%s(): data: 0x%llx\n", __func__, *data);
+	return 0;
+}
+
+static int pci_lu_stub_freeze(struct device *dev, u64 *data)
+{
+	struct folio *folio = pfn_folio(PHYS_PFN(*data));
+
+	dev_info(dev, "%s(): data: 0x%llx\n", __func__, *data);
+	return validate_folio(to_pci_dev(dev), folio);
+}
+
+static void pci_lu_stub_finish(struct device *dev, u64 data)
+{
+	struct folio *folio = pfn_folio(PHYS_PFN(data));
+
+	dev_info(dev, "%s(): data: 0x%llx\n", __func__, data);
+	WARN_ON(validate_folio(to_pci_dev(dev), folio));
+	folio_put(folio);
+}
+
+static void pci_lu_stub_cancel(struct device *dev, u64 data)
+{
+	dev_info(dev, "%s(): data: 0x%llx\n", __func__, data);
+	pci_lu_stub_finish(dev, data);
+}
+
+static struct dev_liveupdate_ops liveupdate_ops = {
+	.prepare	= pci_lu_stub_prepare,
+	.freeze		= pci_lu_stub_freeze,
+	.finish		= pci_lu_stub_finish,
+	.cancel		= pci_lu_stub_cancel,
+};
+
+static struct pci_driver pci_lu_stub_driver = {
+	.name		= "pci-lu-stub",
+	.id_table	= pci_lu_stub_id_table,
+	.probe		= pci_lu_stub_probe,
+	.remove		= pci_lu_stub_remove,
+	.driver.lu	= &liveupdate_ops,
+};
+
+module_pci_driver(pci_lu_stub_driver);
+MODULE_LICENSE("GPL");

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 12/25] PCI/LUO: Save struct pci_dev info during prepare phase
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (10 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 11/25] PCI: pci-lu-stub: Add a stub driver for Live Update testing Chris Li
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 13/25] PCI/LUO: Check the device function numbers in restoration chrisl
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

Some fields in struct pci_dev are mutable during kernel execution, and
their runtime values cannot be reconstructed by reading the PCI config
space registers again. Therefore, save those fields for liveupdate
during the prepare phase so that the next kernel can restore them at
boot time.

The struct pci_dev_ser is packed for making sure the field offsets are
consistent across the kernel images before and after liveupdate.
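
As an illustration only (not part of this patch), the effect of packing
can be sketched in userspace. The struct below is a hypothetical,
reduced copy of the serialized layout using <stdint.h> types, and it
assumes a compiler that supports `__attribute__((packed))` (GCC or
Clang). With packing, each offset is the sum of the preceding field
sizes, so it does not depend on compiler-inserted padding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, reduced copy of the serialized device record.
 * Field names follow the patch; the struct name is made up.
 */
struct demo_dev_ser {
	uint32_t path;			/* offset 0 */
	uint8_t  requested;		/* offset 4 */
	uint16_t num_vfs;		/* offset 5, no padding before it */
	char     driver_name[63];	/* offset 7 */
	uint64_t driver_data;		/* offset 70 */
} __attribute__((packed));

/* Offset of num_vfs inside the packed record. */
size_t demo_offset_num_vfs(void)
{
	return offsetof(struct demo_dev_ser, num_vfs);
}

/* Offset of driver_data inside the packed record. */
size_t demo_offset_driver_data(void)
{
	return offsetof(struct demo_dev_ser, driver_data);
}

/* Total size of the packed record: 4 + 1 + 2 + 63 + 8 bytes. */
size_t demo_record_size(void)
{
	return sizeof(struct demo_dev_ser);
}
```

Without the attribute, a typical ABI would pad `num_vfs` and
`driver_data` to their natural alignment, and that padding is the part
that could differ between two kernel builds.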

We would like to save one more field for the PCI resources, which has
type struct resource. It contains pointers, so it needs extra handling
in the coming patches.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 22 ++++++++++++++--------
 drivers/pci/pci.h        | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index f84c0a455f7055b9b64051b125368fb0f9e6144f..6b1c14d70fd16b0919ca22faae788069f3743708 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -18,14 +18,6 @@
 static LIST_HEAD(preserved_devices);
 static LIST_HEAD(probe_devices);
 
-struct pci_dev_ser {
-	u32	path;		/* domain + bus + slot + fn */
-	u8	requested;
-	u16	num_vfs;
-	char	driver_name[63];
-	u64	driver_data;	/* driver data */
-};
-
 struct pci_ser {
 	u32 count;
 	struct pci_dev_ser devs[];
@@ -156,6 +148,20 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 	s->requested = dev->lu.requested;
 	if (pdev->sriov && pdev->is_physfn)
 		s->num_vfs = pdev->sriov->num_VFs;
+
+	s->devfn = pdev->devfn;
+	s->current_state = pdev->current_state;
+	s->pm_cap = pdev->pm_cap;
+	s->broken_intx_masking = pdev->broken_intx_masking;
+	s->pme_poll = pdev->pme_poll;
+	s->no_d3cold = pdev->no_d3cold;
+	s->wakeup_prepared = pdev->wakeup_prepared;
+	s->skip_bus_pm = pdev->skip_bus_pm;
+	s->ignore_hotplug = pdev->ignore_hotplug;
+	s->hotplug_user_indicators = pdev->hotplug_user_indicators;
+	s->pref_window = pdev->pref_window;
+	s->pref_64_window = pdev->pref_64_window;
+
 	return 0;
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index b79a18c5e948980fe2ef3f0a10e0d795b1eee6d7..2ef12745ee05960878d8d3fe0cdf136f69c8d408 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -489,6 +489,28 @@ struct pci_sriov {
 	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
 };
 
+struct pci_dev_ser {
+	u32	path;		/* domain + bus + slot + fn */
+	u8	requested;
+	u16	num_vfs;
+	char	driver_name[63];
+	u64	driver_data;	/* driver data */
+
+	/* Saved fields from struct pci_dev */
+	u32	devfn;
+	u32	current_state;
+	u8	pm_cap;
+	u32	broken_intx_masking:1;
+	u32	pme_poll:1;
+	u32	no_d3cold:1;
+	u32	wakeup_prepared:1;
+	u32	skip_bus_pm:1;
+	u32	ignore_hotplug:1;
+	u32	hotplug_user_indicators:1;
+	u32	pref_window:1;
+	u32	pref_64_window:1;
+} __packed;
+
 #ifdef CONFIG_PCI_DOE
 void pci_doe_init(struct pci_dev *pdev);
 void pci_doe_destroy(struct pci_dev *pdev);

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 13/25] PCI/LUO: Check the device function numbers in restoration
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (11 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 12/25] PCI/LUO: Save struct pci_dev info during prepare phase chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 14/25] PCI/LUO: Restore power state of a PCI device chrisl
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

After a liveupdate reboot, the device BDF should not have changed from
the previous kernel. If it has, the saved LUO device state cannot be
used, so panic the kernel.
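
For reference, the devfn value being compared packs the slot (device)
number into the upper five bits and the function number into the lower
three. A minimal userspace sketch of that encoding and of the equality
check follows; the `DEMO_*` macros mirror the kernel's PCI_DEVFN(),
PCI_SLOT() and PCI_FUNC(), and `demo_devfn_matches()` is a hypothetical
helper, not a function in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN()/PCI_SLOT()/PCI_FUNC(). */
#define DEMO_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEMO_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define DEMO_FUNC(devfn)	((devfn) & 0x07)

/*
 * Hypothetical helper: true when the devfn saved by the previous
 * kernel equals the devfn enumerated by the new kernel.
 */
bool demo_devfn_matches(uint32_t saved, uint32_t enumerated)
{
	return saved == enumerated;
}
```

In the patch, a mismatch is treated as fatal because every other saved
field is keyed to the same device identity.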

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 6b1c14d70fd16b0919ca22faae788069f3743708..ec2d7917441ceb4e3d7cd8becae41ca215cba7c3 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -393,6 +393,15 @@ static void pci_dev_do_restore(struct pci_dev *dev, struct pci_dev_ser *s)
 	pci_info(dev, "liveupdate restore [%s] driver: %s data: [%llx] num_vfs: %d\n",
 		 s->requested ? "requested" : "depended",
 		 s->driver_name, s->driver_data, s->num_vfs);
+
+	/*
+	 * The devfn got changed since reboot. We cannot restore device
+	 * info preserved by liveupdate
+	 */
+	if (s->devfn != dev->devfn)
+		panic("%s: Device and function numbers are changed from 0x%04x to 0x%04x\n",
+		      __func__, s->devfn, dev->devfn);
+
 	list_move_tail(&dev->dev.lu.lu_next, &probe_devices);
 }
 

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 14/25] PCI/LUO: Restore power state of a PCI device
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (12 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 13/25] PCI/LUO: Check the device function numbers in restoration chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 15/25] PCI/LUO: Restore PM related fields chrisl
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

From the liveupdate saved PCI device state, restore the device power
state.

`pci_dev->current_state` is a cached power state. If the device
driver calls `pci_enable_device()`, this value can be overwritten with
the state read from the PMCSR register (see
`pci_enable_device_flags()`). In future patches, when a driver tries to
enable the PCI device after liveupdate, we should check the device
power state at that moment against the saved value.
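
The restore follows a "saved value or default" pattern. A minimal
userspace sketch of that pattern, under stated assumptions: the
`demo_*` types, the `DEMO_SER_GET()` macro and the
`DEMO_STATE_UNKNOWN` placeholder value are hypothetical stand-ins and
not the kernel definitions.

```c
#include <stddef.h>

/* Placeholder for the "unknown power state" default. */
#define DEMO_STATE_UNKNOWN	5u

/* Hypothetical saved record handed over by the previous kernel. */
struct demo_dev_ser {
	unsigned int current_state;
};

/* Hypothetical device; dev_state is NULL on a normal (cold) boot. */
struct demo_dev {
	void *dev_state;
	unsigned int current_state;
};

/*
 * Return the saved field when a saved record exists, otherwise the
 * default. Arguments and the whole expression are parenthesized so
 * the macro composes safely inside larger expressions.
 */
#define DEMO_SER_GET(dev, field, def)					\
	((dev)->dev_state ?						\
	 ((struct demo_dev_ser *)(dev)->dev_state)->field : (def))

/* Initialize the cached power state the way the setup path would. */
void demo_setup(struct demo_dev *dev)
{
	dev->current_state = DEMO_SER_GET(dev, current_state,
					  DEMO_STATE_UNKNOWN);
}
```

On a cold boot the sketch falls back to the default, so non-liveupdate
behaviour is unchanged.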

Tested: QEMU liveupdate boot test. Trigger the liveupdate to the
        `finish` phase.
Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/pci.h   | 6 ++++++
 drivers/pci/probe.c | 8 ++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2ef12745ee05960878d8d3fe0cdf136f69c8d408..a8acc986a5aac808ec64395d7d946ee036270f5b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1182,9 +1182,15 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
 	 PCI_CONF1_EXT_REG(reg))
 
 #ifdef CONFIG_LIVEUPDATE
+#define PCI_SER_GET(__pci_dev, __var, __def)			\
+	((__pci_dev)->dev.lu.dev_state ?			\
+	 ((struct pci_dev_ser *)(__pci_dev)->dev.lu.dev_state)->__var : (__def))
+
 void pci_liveupdate_restore(struct pci_dev *dev);
 void pci_liveupdate_override_driver(struct pci_dev *dev);
 #else
+#define PCI_SER_GET(__dev, __var, __def) __def
+
 static inline void pci_liveupdate_restore(struct pci_dev *dev) {}
 static inline void pci_liveupdate_override_driver(struct pci_dev *dev) {}
 #endif
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e41a1bef2083aa9184fd1c894d5de964f19d5c01..7dd2cf9f9e110636f8998df22a333638cce25e6b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2030,8 +2030,12 @@ int pci_setup_device(struct pci_dev *dev)
 	if (pci_is_pcie(dev))
 		dev->supported_speeds = pcie_get_supported_speeds(dev);
 
-	/* "Unknown power state" */
-	dev->current_state = PCI_UNKNOWN;
+	/*
+	 * Restore the power state from liveupdate saved state.
+	 * If we are not booted from liveupdate, default
+	 * "Unknown power state".
+	 */
+	dev->current_state = PCI_SER_GET(dev, current_state, PCI_UNKNOWN);
 
 	/* Early fixups, before probing the BARs */
 	pci_fixup_device(pci_fixup_early, dev);

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 15/25] PCI/LUO: Restore PM related fields
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (13 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 14/25] PCI/LUO: Restore power state of a PCI device chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 16/25] PCI/LUO: Restore the pme_poll flag chrisl
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

- pm_cap field:
Restore the liveupdate saved `pm_cap` during the PCI Power Management
initialization.

- skip_bus_pm flag:
The flag skip_bus_pm is used in the PM suspend and resume
operations. Therefore we restore this flag for the device in the PM
init before all the operations.

- wakeup_prepared flag:
Restore the wakeup_prepared flag during the PM initialization.

Tested: QEMU VM boot test.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/pci.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9e42090fb108920995ebe34bd2535a0e23fef7fd..e0e730f7bb3932567815c390088088bd5c56f11e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3209,13 +3209,14 @@ void pci_pm_init(struct pci_dev *dev)
 	u16 pmc;
 
 	device_enable_async_suspend(&dev->dev);
-	dev->wakeup_prepared = false;
 
 	dev->pm_cap = 0;
 	dev->pme_support = 0;
 
-	/* find PCI PM capability in list */
-	pm = pci_find_capability(dev, PCI_CAP_ID_PM);
+	/* Restore PM related fields after live update or find PM capability */
+	pm = PCI_SER_GET(dev, pm_cap, pci_find_capability(dev, PCI_CAP_ID_PM));
+	dev->wakeup_prepared = PCI_SER_GET(dev, wakeup_prepared, false);
+	dev->skip_bus_pm = PCI_SER_GET(dev, skip_bus_pm, dev->skip_bus_pm);
 	if (!pm)
 		goto poweron;
 	/* Check device's ability to generate PME# */

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 16/25] PCI/LUO: Restore the pme_poll flag
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (14 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 15/25] PCI/LUO: Restore PM related fields chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 17/25] PCI/LUO: Restore the no_d3cold flag chrisl
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

Restore the pci_dev pme_poll flag from liveupdate. If the restored
flag is false, the device is in an active state (it was not being PME
polled before the liveupdate reboot), so we do not touch the PCI PME
register of the device.
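
The decision described above can be sketched in userspace as follows.
This is a simplified model, not the kernel code: `struct demo_dev`, its
`pme_enabled` field (standing in for the PME enable bit in hardware)
and the `config_writes` counter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record saved by the previous kernel. */
struct demo_saved {
	bool pme_poll;
};

/* Hypothetical device model. */
struct demo_dev {
	const struct demo_saved *saved;	/* NULL on a normal boot */
	bool pme_poll;
	bool pme_enabled;		/* models the hardware PME enable bit */
	int config_writes;		/* counts modelled register writes */
};

/* Models disabling PME# generation, which writes to the device. */
static void demo_pme_disable(struct demo_dev *dev)
{
	dev->pme_enabled = false;
	dev->config_writes++;
}

/*
 * Restore pme_poll (default true when nothing was saved) and only
 * touch the device when the flag is true.
 */
void demo_pm_init(struct demo_dev *dev)
{
	dev->pme_poll = dev->saved ? dev->saved->pme_poll : true;
	if (dev->pme_poll)
		demo_pme_disable(dev);
}
```

A normal boot keeps the existing behaviour (PME# disabled at init),
while a preserved device with pme_poll cleared sees no write at all.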

Tested: QEMU VM liveupdate reboot, put liveupdate into the finish phase.
Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/pci.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e0e730f7bb3932567815c390088088bd5c56f11e..46fb80dbca590c251fcad3bf2f011a16f6898810 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3263,8 +3263,17 @@ void pci_pm_init(struct pci_dev *dev)
 		 * let the user space enable it to wake up the system as needed.
 		 */
 		device_set_wakeup_capable(&dev->dev, true);
-		/* Disable the PME# generation functionality */
-		pci_pme_active(dev, false);
+
+		dev->pme_poll = PCI_SER_GET(dev, pme_poll, true);
+		/*
+		 * If the restored pme_poll is false, the device was in an
+		 * active state before the liveupdate reboot. Skip the
+		 * pci_pme_active(dev, false) call so that the device's
+		 * PCI_PM_CTRL_PME_ENABLE flag is left untouched and PME#
+		 * generation is not disabled.
+		 */
+		if (dev->pme_poll)
+			pci_pme_active(dev, false);
 	}
 
 	pci_read_config_word(dev, PCI_STATUS, &status);

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 17/25] PCI/LUO: Restore the no_d3cold flag
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (15 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 16/25] PCI/LUO: Restore the pme_poll flag chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 18/25] PCI/LUO: Restore pci_dev fields during probe chrisl
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

When the PCI bus adds a device, restore the saved no_d3cold flag
before the bus does the D3 checking for the bridge. This tells the
bridge the current D3cold availability of the device.

Tested: QEMU VM boot test.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/bus.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 69048869ef1c378454f86091ddb2b59a3c3d53ec..e9c7a6dc643d3534755e4ef5218fb6f90d5dcd65 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -353,6 +353,11 @@ void pci_bus_add_device(struct pci_dev *dev)
 		of_pci_make_dev_node(dev);
 	pci_create_sysfs_dev_files(dev);
 	pci_proc_attach_device(dev);
+	/*
+	 * Restore the no_d3cold flag for the device before we start to update
+	 * the D3 state for the bridge.
+	 */
+	dev->no_d3cold = PCI_SER_GET(dev, no_d3cold, dev->no_d3cold);
 	pci_bridge_d3_update(dev);
 
 	/*

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 18/25] PCI/LUO: Restore pci_dev fields during probe
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (16 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 17/25] PCI/LUO: Restore the no_d3cold flag chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 19/25] PCI/LUO: Track liveupdate buses Chris Li
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

- broken_intx_masking flag:
This flag records whether PCI_COMMAND_INTX_DISABLE is writable. On some
devices the PCI_COMMAND_INTX_DISABLE bit is not writable, and this flag
reports that. The flag is also updated in drivers/pci/quirks.c to fix
up some devices, but those values are static because the checking and
updating are done per device model, so we only restore the flag value
from liveupdate in the PCI device setup.

- pref_window and pref_64_window flags:
Restore the pref_window and pref_64_window flags for a bridge
device. These flags are managed by the function
`pci_read_bridge_windows()` during the PCI device setup. Since we
cannot write the PCI_PREF memory after a liveupdate reboot, we
restore the saved state from liveupdate.

It is expected that the following patches will skip the bridge device
pref window test in a liveupdate boot.

- hotplug_user_indicators flag:
Restore the hotplug_user_indicators flag for a PCI device. This flag
is for managing platform-specific indicators, so restore this
information from liveupdate while setting up the PCI device.

For the flag usage, see commit 576243b3f9ea.

- ignore_hotplug flag:
The flag ignore_hotplug is managed by the function
`pci_ignore_hotplug()`, which is used by PCI drivers during a suspend
operation. We restore this flag when a PCI device is being set up, to
preserve the device state.

Tested: QEMU VM boot test

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/probe.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 7dd2cf9f9e110636f8998df22a333638cce25e6b..d8b80e1c4fb35289208d7c953fb5c1e137a5c1a8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2059,7 +2059,24 @@ int pci_setup_device(struct pci_dev *dev)
 		}
 	}
 
-	dev->broken_intx_masking = pci_intx_mask_broken(dev);
+	/*
+	 * Restore PCI device fields:
+	 * - Broken INTx masking and can't be used
+	 * - Ignore hotplug events
+	 * - Have the SlotCtl indicators controlled exclusively by user sysfs
+	 * - Pref mem window availability of a bridge device
+	 * - Pref mem window is 64-bit
+	 */
+	dev->broken_intx_masking = PCI_SER_GET(dev, broken_intx_masking,
+					       pci_intx_mask_broken(dev));
+	dev->ignore_hotplug = PCI_SER_GET(dev, ignore_hotplug,
+					  dev->ignore_hotplug);
+	dev->hotplug_user_indicators = PCI_SER_GET(dev, hotplug_user_indicators,
+						   dev->hotplug_user_indicators);
+	dev->pref_window = PCI_SER_GET(dev, pref_window,
+				       dev->pref_window);
+	dev->pref_64_window = PCI_SER_GET(dev, pref_64_window,
+					  dev->pref_64_window);
 
 	switch (dev->hdr_type) {		    /* header type */
 	case PCI_HEADER_TYPE_NORMAL:		    /* standard header */

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 19/25] PCI/LUO: Track liveupdate buses
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (17 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 18/25] PCI/LUO: Restore pci_dev fields during probe chrisl
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot Chris Li
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

Add the bus state to the PCI state, after the devs[] array.

Currently, only the domain and bus number are saved for each bus.
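
The resulting layout is a header, a flexible devs[] array, and a second
array placed directly behind it. A minimal userspace sketch of how such
a buffer can be sized and how the second array can be located; all
`demo_*` names are hypothetical, the device record is reduced to two
fields, and calloc() stands in for the folio allocation used in the
patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced, hypothetical per-device record. */
struct demo_dev_ser {
	uint32_t path;
	uint32_t devfn;
};

/* Per-bus record: domain and bus number only. */
struct demo_bus_ser {
	uint16_t domain;
	uint8_t  number;
};

struct demo_ser {
	uint32_t dev_count;
	uint32_t bus_count;
	struct demo_dev_ser devs[];
	/* struct demo_bus_ser buses[bus_count] follows devs[dev_count] */
};

/* Bytes needed for the header plus both arrays. */
size_t demo_ser_size(uint32_t devcnt, uint32_t buscnt)
{
	return sizeof(struct demo_ser) +
	       sizeof(struct demo_dev_ser) * devcnt +
	       sizeof(struct demo_bus_ser) * buscnt;
}

/* The bus array starts right after the last device record. */
struct demo_bus_ser *demo_ser_buses(struct demo_ser *s)
{
	return (struct demo_bus_ser *)(s->devs + s->dev_count);
}

/* Allocate a zeroed buffer and record both counts; NULL on failure. */
struct demo_ser *demo_ser_alloc(uint32_t devcnt, uint32_t buscnt)
{
	struct demo_ser *s = calloc(1, demo_ser_size(devcnt, buscnt));

	if (!s)
		return NULL;
	s->dev_count = devcnt;
	s->bus_count = buscnt;
	return s;
}
```

Because the bus array has no named member, dev_count must be written
before the bus pointer is computed, which matches the order used in the
prepare path.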

Tested: In QEMU, perform liveupdate prepare. Check dmesg for "collect
	liveupdate bus" matching to the liveupdate device's bus.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 68 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 57 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index ec2d7917441ceb4e3d7cd8becae41ca215cba7c3..bc2c166ef494fd0b38cc05500bf0817c0f50fd95 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -16,11 +16,20 @@
 #define PCI_SUBSYSTEM_NAME "pci"
 
 static LIST_HEAD(preserved_devices);
+static LIST_HEAD(preserved_buses);
 static LIST_HEAD(probe_devices);
+static LIST_HEAD(probe_buses);
+
+struct pci_bus_ser {
+	u16	domain;
+	u8	number;
+};
 
 struct pci_ser {
-	u32 count;
+	u32 dev_count;
+	u32 bus_count;
 	struct pci_dev_ser devs[];
+	/* struct pci_bus_ser buses[] */
 };
 
 static void stack_push_buses(struct list_head *stack, struct list_head *buses)
@@ -39,7 +48,7 @@ static void requested_devices_add(struct device *dev, struct list_head *head)
 	list_move_tail(&dev->lu.lu_next, head);
 }
 
-static int collect_bus_devices_reverse(struct pci_bus *bus, struct list_head *head)
+static int collect_buses_and_devices(struct pci_bus *bus, struct list_head *head)
 {
 	struct pci_dev *pdev;
 	int count = 0;
@@ -54,6 +63,13 @@ static int collect_bus_devices_reverse(struct pci_bus *bus, struct list_head *he
 			count++;
 		}
 	}
+	if (count || bus->dev.lu.depended) {
+		if (bus->parent)
+			bus->parent->dev.lu.depended = 1;
+		dev_info(&bus->dev, "collect liveupdate bus %s\n",
+			 dev_name(&bus->dev));
+		list_move_tail(&bus->dev.lu.lu_next, &preserved_buses);
+	}
 	return count;
 }
 
@@ -76,9 +92,11 @@ static int build_liveupdate_devices(struct list_head *head)
 			continue;
 		}
 
-		count += collect_bus_devices_reverse(bus, head);
-		busdev->lu.visited = 0;
+		/* Pop from bus_stack */
 		list_del_init(&busdev->lu.lu_next);
+
+		count += collect_buses_and_devices(bus, head);
+		busdev->lu.visited = 0;
 	}
 	return count;
 }
@@ -102,6 +120,16 @@ static void cleanup_liveupdate_devices(struct list_head *head)
 		dev_cleanup_liveupdate(d);
 }
 
+static void cleanup_liveupdate_buses(struct list_head *head)
+{
+	struct device *b, *n;
+
+	list_for_each_entry_safe(b, n, head, lu.lu_next) {
+		b->lu.depended = 0;
+		list_del_init(&b->lu.lu_next);
+	}
+}
+
 static void cleanup_liveupdate_state(struct pci_ser *pci_state)
 {
 	struct folio *folio = virt_to_folio(pci_state);
@@ -165,16 +193,24 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 	return 0;
 }
 
+static void pci_save_bus_state(struct pci_bus *bus, struct pci_bus_ser *s)
+{
+	s->number = bus->number;
+	s->domain = pci_domain_nr(bus);
+}
+
 static int pci_call_prepare(struct pci_ser *pci_state,
 			    struct list_head *devices)
 {
-	struct pci_dev_ser *pdev_state_current = pci_state->devs;
+	struct pci_dev_ser *dev_state = pci_state->devs;
+	struct pci_bus_ser *bus_state = (struct pci_bus_ser *)
+			(dev_state + pci_state->dev_count);
 	struct device *dev, *next;
 	int ret;
 	char *reason;
 
 	list_for_each_entry_safe(dev, next, devices, lu.lu_next) {
-		struct pci_dev_ser *s = pdev_state_current++;
+		struct pci_dev_ser *s = dev_state++;
 
 		if (!dev->driver) {
 			reason = "no driver";
@@ -200,6 +236,8 @@ static int pci_call_prepare(struct pci_ser *pci_state,
 		}
 		list_move_tail(&dev->lu.lu_next, &preserved_devices);
 	}
+	list_for_each_entry(dev, &preserved_buses, lu.lu_next)
+		pci_save_bus_state(to_pci_bus(dev), bus_state++);
 	return 0;
 
 cancel:
@@ -213,8 +251,10 @@ static int __pci_liveupdate_prepare(void *arg, u64 *data)
 	LIST_HEAD(requested_devices);
 	struct pci_ser *pci_state;
 	int ret;
-	int count = build_liveupdate_devices(&requested_devices);
-	int size = sizeof(*pci_state) + sizeof(pci_state->devs[0]) * count;
+	int devcnt = build_liveupdate_devices(&requested_devices);
+	int buscnt = list_count_nodes(&preserved_buses);
+	int size = sizeof(*pci_state) + sizeof(pci_state->devs[0]) * devcnt
+			+ sizeof(struct pci_bus_ser) * buscnt;
 	int order = get_order(size);
 	struct folio *folio;
 
@@ -225,7 +265,8 @@ static int __pci_liveupdate_prepare(void *arg, u64 *data)
 	}
 
 	pci_state = folio_address(folio);
-	pci_state->count = count;
+	pci_state->dev_count = devcnt;
+	pci_state->bus_count = buscnt;
 
 	ret = kho_preserve_folio(folio);
 	if (ret) {
@@ -247,6 +288,7 @@ static int __pci_liveupdate_prepare(void *arg, u64 *data)
 	folio_put(folio);
 cleanup_device:
 	cleanup_liveupdate_devices(&requested_devices);
+	cleanup_liveupdate_buses(&preserved_buses);
 	return ret;
 }
 
@@ -336,6 +378,7 @@ static void pci_liveupdate_cancel(void *arg, u64 data)
 	down_write(&pci_bus_sem);
 
 	pci_call_cancel(pci_state);
+	cleanup_liveupdate_buses(&preserved_buses);
 	cleanup_liveupdate_state(pci_state);
 
 	up_write(&pci_bus_sem);
@@ -349,6 +392,7 @@ static void pci_liveupdate_finish(void *arg, u64 data)
 	pr_info("finish data[%llx]\n", data);
 	pci_call_finish(&probe_devices);
 	cleanup_liveupdate_devices(&probe_devices);
+	cleanup_liveupdate_buses(&probe_buses);
 	cleanup_liveupdate_state(pci_state);
 }
 
@@ -408,14 +452,16 @@ static void pci_dev_do_restore(struct pci_dev *dev, struct pci_dev_ser *s)
 void pci_liveupdate_restore(struct pci_dev *dev)
 {
 	int path;
+	struct pci_ser *pci_state;
 	struct pci_dev_ser *s, *end;
 
 	if (!liveupdate_state_updated())
 		return;
 
 	path = pci_get_device_path(dev);
-	s = pci_state_get()->devs;
-	end = s + pci_state_get()->count;
+	pci_state = pci_state_get();
+	s = pci_state->devs;
+	end = s + pci_state->dev_count;
 	for (; s < end; s++)
 		if (s->path == path)
 			return pci_dev_do_restore(dev, s);

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (18 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 19/25] PCI/LUO: Track liveupdate buses Chris Li
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28 17:23   ` Thomas Gleixner
  2025-07-28  8:24 ` [PATCH RFC 21/25] PCI/LUO: Save and restore the PCI resource chrisl
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

The liveupdate devices are already initialized by the kernel before the
kexec. During the kexec the devices are still running. Avoid writing to
the liveupdate devices during the new kernel boot up.
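
The diff below applies one recurring pattern: keep updating the
kernel's software state, but skip the config-space write when the
device has been adopted from the previous kernel. A minimal userspace
sketch of that pattern follows. The definition of pci_lu_adopt() is not
shown in this part of the thread, so the sketch substitutes a
hypothetical `demo_lu_adopt()` flag check, and `ctrl_reg` merely models
a hardware register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model. */
struct demo_dev {
	bool adopted;		/* true when preserved across liveupdate */
	uint16_t ctrl_reg;	/* models a config-space control register */
	uint16_t cached_ctrl;	/* software copy kept by the kernel */
};

/* Hypothetical stand-in for the adoption predicate used in the diff. */
static bool demo_lu_adopt(const struct demo_dev *dev)
{
	return dev->adopted;
}

/*
 * Update the control value. The cached copy is always refreshed; the
 * modelled register is written only for devices that are not adopted.
 */
void demo_set_ctrl(struct demo_dev *dev, uint16_t ctrl)
{
	dev->cached_ctrl = ctrl;
	if (!demo_lu_adopt(dev))
		dev->ctrl_reg = ctrl;
}
```

The sketch assumes that the value the new kernel would have written
matches what the running device already holds; if it does not, the
cached copy and the hardware disagree.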

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/ats.c            |  7 ++--
 drivers/pci/iov.c            | 58 ++++++++++++++++++------------
 drivers/pci/msi/msi.c        | 32 ++++++++++++-----
 drivers/pci/msi/pcidev_msi.c |  4 +--
 drivers/pci/pci-acpi.c       |  3 ++
 drivers/pci/pci.c            | 85 +++++++++++++++++++++++++++++---------------
 drivers/pci/pci.h            |  9 ++++-
 drivers/pci/pcie/aspm.c      |  7 ++--
 drivers/pci/pcie/pme.c       | 11 ++++--
 drivers/pci/probe.c          | 43 +++++++++++++++-------
 drivers/pci/setup-bus.c      | 10 +++++-
 11 files changed, 184 insertions(+), 85 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index ec6c8dbdc5e9c9959e822e016ab301bf483713a5..284f43c82593903058dee58ce64b82bad8aed710 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -75,7 +75,9 @@ int pci_prepare_ats(struct pci_dev *dev, int ps)
 
 	dev->ats_stu = ps;
 	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
-	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
+
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_prepare_ats);
@@ -114,7 +116,8 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
 		dev->ats_stu = ps;
 		ctrl |= PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
 	}
-	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
 
 	dev->ats_enabled = 1;
 	return 0;
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 10693b5d7eb66bbbfb9b70ffe6e89eb89c8dc3a3..df27bcf840d9fc0dbce29810e288c1c2b74a70c9 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -85,7 +85,8 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 {
 	struct pci_sriov *iov = dev->sriov;
 
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
@@ -694,10 +695,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 
 	pci_iov_set_numvfs(dev, nr_virtfn);
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
-	pci_cfg_access_lock(dev);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-	msleep(100);
-	pci_cfg_access_unlock(dev);
+	if (!pci_lu_adopt(dev)) {
+		pci_cfg_access_lock(dev);
+		pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+		msleep(100);
+		pci_cfg_access_unlock(dev);
+	}
 
 	rc = sriov_add_vfs(dev, initial);
 	if (rc)
@@ -710,10 +713,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 
 err_pcibios:
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
-	pci_cfg_access_lock(dev);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-	ssleep(1);
-	pci_cfg_access_unlock(dev);
+	if (!pci_lu_adopt(dev)) {
+		pci_cfg_access_lock(dev);
+		pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+		ssleep(1);
+		pci_cfg_access_unlock(dev);
+	}
 
 	pcibios_sriov_disable(dev);
 
@@ -741,11 +746,13 @@ static void sriov_disable(struct pci_dev *dev)
 		return;
 
 	sriov_del_vfs(dev);
-	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
-	pci_cfg_access_lock(dev);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-	ssleep(1);
-	pci_cfg_access_unlock(dev);
+	if (!pci_lu_adopt(dev)) {
+		iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+		pci_cfg_access_lock(dev);
+		pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+		ssleep(1);
+		pci_cfg_access_unlock(dev);
+	}
 
 	pcibios_sriov_disable(dev);
 
@@ -770,7 +777,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
 	u32 sriovbars[PCI_SRIOV_NUM_BARS];
 
 	pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
-	if (ctrl & PCI_SRIOV_CTRL_VFE) {
+	if (!pci_lu_adopt(dev) && ctrl & PCI_SRIOV_CTRL_VFE) {
 		pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
 		ssleep(1);
 	}
@@ -785,7 +792,8 @@ static int sriov_init(struct pci_dev *dev, int pos)
 		ctrl |= PCI_SRIOV_CTRL_ARI;
 
 found:
-	pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
 
 	pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
 	if (!total)
@@ -798,7 +806,8 @@ static int sriov_init(struct pci_dev *dev, int pos)
 		return -EIO;
 
 	pgsz &= ~(pgsz - 1);
-	pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
 
 	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
 	if (!iov)
@@ -904,14 +913,17 @@ static void sriov_restore_state(struct pci_dev *dev)
 	 */
 	ctrl &= ~PCI_SRIOV_CTRL_ARI;
 	ctrl |= iov->ctrl & PCI_SRIOV_CTRL_ARI;
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, ctrl);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, ctrl);
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
 		pci_update_resource(dev, i + PCI_IOV_RESOURCES);
 
-	pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
 	pci_iov_set_numvfs(dev, iov->num_VFs);
-	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
 		msleep(100);
 }
@@ -1013,10 +1025,12 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno)
 	new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK;
 
 	reg = iov->pos + PCI_SRIOV_BAR + 4 * vf_bar;
-	pci_write_config_dword(dev, reg, new);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_dword(dev, reg, new);
 	if (res->flags & IORESOURCE_MEM_64) {
 		new = region.start >> 16 >> 16;
-		pci_write_config_dword(dev, reg + 4, new);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_dword(dev, reg + 4, new);
 	}
 }
 
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 6ede55a7c5e652c80b51b10e58f0290eb6556430..7c40fde1ba0f89ad1d72064ac9e80696faeab426 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -113,7 +113,8 @@ static int pci_setup_msi_context(struct pci_dev *dev)
 
 void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
 {
-	raw_spinlock_t *lock = &to_pci_dev(desc->dev)->msi_lock;
+	struct pci_dev *pci_dev = to_pci_dev(desc->dev);
+	raw_spinlock_t *lock = &pci_dev->msi_lock;
 	unsigned long flags;
 
 	if (!desc->pci.msi_attrib.can_mask)
@@ -122,8 +123,9 @@ void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
 	raw_spin_lock_irqsave(lock, flags);
 	desc->pci.msi_mask &= ~clear;
 	desc->pci.msi_mask |= set;
-	pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->pci.mask_pos,
-			       desc->pci.msi_mask);
+	if (!pci_lu_adopt(pci_dev))
+		pci_write_config_dword(pci_dev, desc->pci.mask_pos,
+				       desc->pci.msi_mask);
 	raw_spin_unlock_irqrestore(lock, flags);
 }
 
@@ -190,6 +192,9 @@ static inline void pci_write_msg_msi(struct pci_dev *dev, struct msi_desc *desc,
 	int pos = dev->msi_cap;
 	u16 msgctl;
 
+	if (pci_lu_adopt(dev))
+		return;
+
 	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
 	msgctl &= ~PCI_MSI_FLAGS_QSIZE;
 	msgctl |= FIELD_PREP(PCI_MSI_FLAGS_QSIZE, desc->pci.msi_attrib.multiple);
@@ -214,6 +219,8 @@ static inline void pci_write_msg_msix(struct msi_desc *desc, struct msi_msg *msg
 
 	if (desc->pci.msi_attrib.is_virtual)
 		return;
+	if (pci_lu_adopt(to_pci_dev(desc->dev)))
+		return;
 	/*
 	 * The specification mandates that the entry is masked
 	 * when the message is modified:
@@ -279,7 +286,8 @@ static void pci_msi_set_enable(struct pci_dev *dev, int enable)
 	control &= ~PCI_MSI_FLAGS_ENABLE;
 	if (enable)
 		control |= PCI_MSI_FLAGS_ENABLE;
-	pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
 }
 
 static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
@@ -553,6 +561,7 @@ static void pci_msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u16 set)
 {
 	u16 ctrl;
 
+	BUG_ON(pci_lu_adopt(dev));
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
 	ctrl &= ~clear;
 	ctrl |= set;
@@ -720,8 +729,9 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	 * registers can be accessed.  Mask all the vectors to prevent
 	 * interrupts coming in before they're fully set up.
 	 */
-	pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
-				    PCI_MSIX_FLAGS_ENABLE);
+	if (!pci_lu_adopt(dev))
+		pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
+					    PCI_MSIX_FLAGS_ENABLE);
 
 	/* Mark it enabled so setup functions can query it */
 	dev->msix_enabled = 1;
@@ -753,14 +763,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		 */
 		msix_mask_all(dev->msix_base, tsize);
 	}
-	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
+	if (!pci_lu_adopt(dev))
+		pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);
 	return 0;
 
 out_disable:
 	dev->msix_enabled = 0;
-	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
+	if (!pci_lu_adopt(dev))
+		pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
 
 	return ret;
 }
@@ -864,6 +876,7 @@ void __pci_restore_msix_state(struct pci_dev *dev)
 	if (!dev->msix_enabled)
 		return;
 
+	BUG_ON(pci_lu_adopt(dev));
 	/* route the table */
 	pci_intx_for_msi(dev, 0);
 	pci_msix_clear_and_set_ctrl(dev, 0,
@@ -898,7 +911,8 @@ void pci_msix_shutdown(struct pci_dev *dev)
 	msi_for_each_desc(desc, &dev->dev, MSI_DESC_ALL)
 		pci_msix_mask(desc);
 
-	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
+	if (!pci_lu_adopt(dev))
+		pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
 	pci_intx_for_msi(dev, 1);
 	dev->msix_enabled = 0;
 	pcibios_alloc_irq(dev);
diff --git a/drivers/pci/msi/pcidev_msi.c b/drivers/pci/msi/pcidev_msi.c
index 5520aff53b5670e70311c63f0f358228bf03c309..f9f682a84a05ef47ff4d85e7d0e724cc7c2f5cdc 100644
--- a/drivers/pci/msi/pcidev_msi.c
+++ b/drivers/pci/msi/pcidev_msi.c
@@ -18,7 +18,7 @@ void pci_msi_init(struct pci_dev *dev)
 		return;
 
 	pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &ctrl);
-	if (ctrl & PCI_MSI_FLAGS_ENABLE) {
+	if (!pci_lu_adopt(dev) && ctrl & PCI_MSI_FLAGS_ENABLE) {
 		pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS,
 				      ctrl & ~PCI_MSI_FLAGS_ENABLE);
 	}
@@ -36,7 +36,7 @@ void pci_msix_init(struct pci_dev *dev)
 		return;
 
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
-	if (ctrl & PCI_MSIX_FLAGS_ENABLE) {
+	if (!pci_lu_adopt(dev) && ctrl & PCI_MSIX_FLAGS_ENABLE) {
 		pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
 				      ctrl & ~PCI_MSIX_FLAGS_ENABLE);
 	}
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index af370628e58393aa0cbdf6d283b3afe33e5effb5..b9e42a1352c87443dd5c4ee9f03bc8a0d343d714 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -172,6 +172,9 @@ static void program_hpx_type0(struct pci_dev *dev, struct hpx_type0 *hpx)
 		hpx = &pci_default_type0;
 	}
 
+	if (pci_lu_adopt(dev))
+		return;
+
 	pci_write_config_byte(dev, PCI_CACHE_LINE_SIZE, hpx->cache_line_size);
 	pci_write_config_byte(dev, PCI_LATENCY_TIMER, hpx->latency_timer);
 	pci_read_config_word(dev, PCI_COMMAND, &pci_cmd);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 46fb80dbca590c251fcad3bf2f011a16f6898810..c1cc723f979ae881cf07ad06e1fa0d472e8b89c6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -218,7 +218,7 @@ int pci_status_get_and_clear_errors(struct pci_dev *pdev)
 		return -EIO;
 
 	status &= PCI_STATUS_ERROR_BITS;
-	if (status)
+	if (status && !pci_lu_adopt(pdev))
 		pci_write_config_word(pdev, PCI_STATUS, status);
 
 	return status;
@@ -628,7 +628,7 @@ u64 pci_get_dsn(struct pci_dev *dev)
 	int pos;
 
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
-	if (!pos)
+	if (!pos && !pci_lu_adopt(dev))
 		return 0;
 
 	/*
@@ -1103,7 +1103,8 @@ static void pci_enable_acs(struct pci_dev *dev)
 			 ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
 	__pci_config_acs(dev, &caps, config_acs_param, 0, 0);
 
-	pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
 }
 
 /**
@@ -1394,7 +1395,8 @@ int pci_power_up(struct pci_dev *dev)
 	 * Force the entire word to 0. This doesn't affect PME_Status, disables
 	 * PME_En, and sets PowerState to 0.
 	 */
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
 
 	/* Mandatory transition delays; see PCI PM 1.2. */
 	if (state == PCI_D3hot)
@@ -1552,7 +1554,8 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
 	pmcsr |= state;
 
 	/* Enter specified state */
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 
 	/* Mandatory power management transition delays; see PCI PM 1.2. */
 	if (state == PCI_D3hot)
@@ -1781,7 +1784,8 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
 		return;
 	cap = (u16 *)&save_state->cap.data[0];
 
-	pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
 }
 
 /**
@@ -2090,7 +2094,7 @@ static int do_pci_enable_device(struct pci_dev *dev, int bars)
 	pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin);
 	if (pin) {
 		pci_read_config_word(dev, PCI_COMMAND, &cmd);
-		if (cmd & PCI_COMMAND_INTX_DISABLE)
+		if (!pci_lu_adopt(dev) && cmd & PCI_COMMAND_INTX_DISABLE)
 			pci_write_config_word(dev, PCI_COMMAND,
 					      cmd & ~PCI_COMMAND_INTX_DISABLE);
 	}
@@ -2248,7 +2252,8 @@ static void do_pci_disable_device(struct pci_dev *dev)
 	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
 	if (pci_command & PCI_COMMAND_MASTER) {
 		pci_command &= ~PCI_COMMAND_MASTER;
-		pci_write_config_word(dev, PCI_COMMAND, pci_command);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_word(dev, PCI_COMMAND, pci_command);
 	}
 
 	pcibios_disable_device(dev);
@@ -2369,7 +2374,8 @@ bool pci_check_pme_status(struct pci_dev *dev)
 		ret = true;
 	}
 
-	pci_write_config_word(dev, pmcsr_pos, pmcsr);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, pmcsr_pos, pmcsr);
 
 	return ret;
 }
@@ -2484,7 +2490,8 @@ static void __pci_pme_active(struct pci_dev *dev, bool enable)
 	if (!enable)
 		pmcsr &= ~PCI_PM_CTRL_PME_ENABLE;
 
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 }
 
 /**
@@ -2506,7 +2513,8 @@ void pci_pme_restore(struct pci_dev *dev)
 		pmcsr &= ~PCI_PM_CTRL_PME_ENABLE;
 		pmcsr |= PCI_PM_CTRL_PME_STATUS;
 	}
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 }
 
 /**
@@ -3587,12 +3595,14 @@ void pci_configure_ari(struct pci_dev *dev)
 		return;
 
 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI)) {
-		pcie_capability_set_word(bridge, PCI_EXP_DEVCTL2,
-					 PCI_EXP_DEVCTL2_ARI);
+		if (!pci_lu_adopt(dev))
+			pcie_capability_set_word(bridge, PCI_EXP_DEVCTL2,
+						 PCI_EXP_DEVCTL2_ARI);
 		bridge->ari_enabled = 1;
 	} else {
-		pcie_capability_clear_word(bridge, PCI_EXP_DEVCTL2,
-					   PCI_EXP_DEVCTL2_ARI);
+		if (!pci_lu_adopt(dev))
+			pcie_capability_clear_word(bridge, PCI_EXP_DEVCTL2,
+						   PCI_EXP_DEVCTL2_ARI);
 		bridge->ari_enabled = 0;
 	}
 }
@@ -4286,7 +4296,8 @@ static void __pci_set_master(struct pci_dev *dev, bool enable)
 	if (cmd != old_cmd) {
 		pci_dbg(dev, "%s bus mastering\n",
 			enable ? "enabling" : "disabling");
-		pci_write_config_word(dev, PCI_COMMAND, cmd);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_word(dev, PCI_COMMAND, cmd);
 	}
 	dev->is_busmaster = enable;
 }
@@ -4416,7 +4427,8 @@ int pci_set_mwi(struct pci_dev *dev)
 	if (!(cmd & PCI_COMMAND_INVALIDATE)) {
 		pci_dbg(dev, "enabling Mem-Wr-Inval\n");
 		cmd |= PCI_COMMAND_INVALIDATE;
-		pci_write_config_word(dev, PCI_COMMAND, cmd);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_word(dev, PCI_COMMAND, cmd);
 	}
 	return 0;
 #endif
@@ -4456,7 +4468,8 @@ void pci_clear_mwi(struct pci_dev *dev)
 	pci_read_config_word(dev, PCI_COMMAND, &cmd);
 	if (cmd & PCI_COMMAND_INVALIDATE) {
 		cmd &= ~PCI_COMMAND_INVALIDATE;
-		pci_write_config_word(dev, PCI_COMMAND, cmd);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_word(dev, PCI_COMMAND, cmd);
 	}
 #endif
 }
@@ -4475,7 +4488,8 @@ void pci_disable_parity(struct pci_dev *dev)
 	pci_read_config_word(dev, PCI_COMMAND, &cmd);
 	if (cmd & PCI_COMMAND_PARITY) {
 		cmd &= ~PCI_COMMAND_PARITY;
-		pci_write_config_word(dev, PCI_COMMAND, cmd);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_word(dev, PCI_COMMAND, cmd);
 	}
 }
 
@@ -4500,7 +4514,8 @@ void pci_intx(struct pci_dev *pdev, int enable)
 	if (new == pci_command)
 		return;
 
-	pci_write_config_word(pdev, PCI_COMMAND, new);
+	if (!pci_lu_adopt(pdev))
+		pci_write_config_word(pdev, PCI_COMMAND, new);
 }
 EXPORT_SYMBOL_GPL(pci_intx);
 
@@ -4648,12 +4663,14 @@ static int pci_pm_reset(struct pci_dev *dev, bool probe)
 
 	csr &= ~PCI_PM_CTRL_STATE_MASK;
 	csr |= PCI_D3hot;
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
 	pci_dev_d3_sleep(dev);
 
 	csr &= ~PCI_PM_CTRL_STATE_MASK;
 	csr |= PCI_D0;
-	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
 	pci_dev_d3_sleep(dev);
 
 	return pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS);
@@ -4959,6 +4976,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	u16 ctrl;
 
+	BUG_ON(pci_lu_adopt(dev));
 	pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
 	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
@@ -5186,7 +5204,8 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
 	 * DMA from the device including MSI/MSI-X interrupts.  For PCI 2.3
 	 * compliant devices, INTx-disable prevents legacy interrupts.
 	 */
-	pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
 }
 
 static void pci_dev_restore(struct pci_dev *dev)
@@ -5897,8 +5916,9 @@ int pcix_set_mmrbc(struct pci_dev *dev, int mmrbc)
 
 		cmd &= ~PCI_X_CMD_MAX_READ;
 		cmd |= FIELD_PREP(PCI_X_CMD_MAX_READ, v);
-		if (pci_write_config_word(dev, cap + PCI_X_CMD, cmd))
-			return -EIO;
+		if (!pci_lu_adopt(dev))
+			if (pci_write_config_word(dev, cap + PCI_X_CMD, cmd))
+				return -EIO;
 	}
 	return 0;
 }
@@ -5960,6 +5980,8 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 		}
 	}
 
+	if (pci_lu_adopt(dev))
+		return 0;
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 						  PCI_EXP_DEVCTL_READRQ, v);
 
@@ -6004,6 +6026,8 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 		return -EINVAL;
 	v = FIELD_PREP(PCI_EXP_DEVCTL_PAYLOAD, v);
 
+	if (pci_lu_adopt(dev))
+		return 0;
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 						  PCI_EXP_DEVCTL_PAYLOAD, v);
 
@@ -6304,7 +6328,8 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
 			cmd |= command_bits;
 		else
 			cmd &= ~command_bits;
-		pci_write_config_word(dev, PCI_COMMAND, cmd);
+		if (!pci_lu_adopt(dev))
+			pci_write_config_word(dev, PCI_COMMAND, cmd);
 	}
 
 	if (!(flags & PCI_VGA_STATE_CHANGE_BRIDGE))
@@ -6320,8 +6345,9 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
 				cmd |= PCI_BRIDGE_CTL_VGA;
 			else
 				cmd &= ~PCI_BRIDGE_CTL_VGA;
-			pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
-					      cmd);
+			if (!pci_lu_adopt(bridge))
+				pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
+						      cmd);
 		}
 		bus = bus->parent;
 	}
@@ -6621,7 +6647,8 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 
 	pci_read_config_word(dev, PCI_COMMAND, &command);
 	command &= ~PCI_COMMAND_MEMORY;
-	pci_write_config_word(dev, PCI_COMMAND, command);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, PCI_COMMAND, command);
 
 	for (i = 0; i <= PCI_ROM_RESOURCE; i++)
 		pci_request_resource_alignment(dev, i, align, resize);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a8acc986a5aac808ec64395d7d946ee036270f5b..bd198227ae3cf687f4ddae76c2f53125681ca91d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1188,11 +1188,18 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
 
 void pci_liveupdate_restore(struct pci_dev *dev);
 void pci_liveupdate_override_driver(struct pci_dev *dev);
+static inline struct pci_dev_ser *pci_lu_adopt(struct pci_dev *dev)
+{
+	return dev->dev.lu.requested ? dev->dev.lu.dev_state : NULL;
+}
 #else
 #define PCI_SER_GET(__dev, __var, __def) __def
 
 static inline void pci_liveupdate_restore(struct pci_dev *dev) {}
 static inline void pci_liveupdate_override_driver(struct pci_dev *dev) {}
+static inline struct pci_dev_ser *pci_lu_adopt(struct pci_dev *dev)
+{
+	return NULL;
+}
 #endif
-
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 29fcb0689a918f9cb123691e1680de5a1af2c115..61f9a443f6ad2bad57d3fc5958e8855117f79598 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -788,7 +788,7 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
 		aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
 }
 
-static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
+static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist, bool lu_restore)
 {
 	struct pci_dev *child = link->downstream, *parent = link->pdev;
 	u32 parent_lnkcap, child_lnkcap;
@@ -812,7 +812,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 		return;
 
 	/* Configure common clock before checking latencies */
-	pcie_aspm_configure_common_clock(link);
+	if (!lu_restore)
+		pcie_aspm_configure_common_clock(link);
 
 	/*
 	 * Re-read upstream/downstream components' register state after
@@ -1130,7 +1131,7 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
 	 * upstream links also because capable state of them can be
 	 * update through pcie_aspm_cap_init().
 	 */
-	pcie_aspm_cap_init(link, blacklist);
+	pcie_aspm_cap_init(link, blacklist, pci_lu_adopt(pdev));
 
 	/* Setup initial Clock PM state */
 	pcie_clkpm_cap_init(link, blacklist);
diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c
index a2daebd9806cd7273ee331406201402a758bd7b8..da093a5ba7ee1f9d20652c71e8e78662fdab176c 100644
--- a/drivers/pci/pcie/pme.c
+++ b/drivers/pci/pcie/pme.c
@@ -53,6 +53,8 @@ struct pcie_pme_service_data {
  */
 void pcie_pme_interrupt_enable(struct pci_dev *dev, bool enable)
 {
+	if (pci_lu_adopt(dev))
+		return;
 	if (enable)
 		pcie_capability_set_word(dev, PCI_EXP_RTCTL,
 					 PCI_EXP_RTCTL_PMEIE);
@@ -344,8 +346,10 @@ static int pcie_pme_probe(struct pcie_device *srv)
 	data->srv = srv;
 	set_service_data(srv, data);
 
-	pcie_pme_interrupt_enable(port, false);
-	pcie_clear_root_pme_status(port);
+	if (!pci_lu_adopt(port)) {
+		pcie_pme_interrupt_enable(port, false);
+		pcie_clear_root_pme_status(port);
+	}
 
 	ret = request_irq(srv->irq, pcie_pme_irq, IRQF_SHARED, "PCIe PME", srv);
 	if (ret) {
@@ -356,7 +360,8 @@ static int pcie_pme_probe(struct pcie_device *srv)
 	pci_info(port, "Signaling with IRQ %d\n", srv->irq);
 
 	pcie_pme_mark_devices(port);
-	pcie_pme_interrupt_enable(port, true);
+	if (!pci_lu_adopt(port))
+		pcie_pme_interrupt_enable(port, true);
 	return 0;
 }
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index d8b80e1c4fb35289208d7c953fb5c1e137a5c1a8..5c30d1d52a96b17a92794756cab5db0972548267 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -358,7 +358,7 @@ static __always_inline void pci_read_bases(struct pci_dev *dev,
 		return;
 
 	/* No printks while decoding is disabled! */
-	if (!dev->mmio_always_on) {
+	if (!pci_lu_adopt(dev) && !dev->mmio_always_on) {
 		pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
 		if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) {
 			pci_write_config_word(dev, PCI_COMMAND,
@@ -366,11 +366,13 @@ static __always_inline void pci_read_bases(struct pci_dev *dev,
 		}
 	}
 
-	__pci_size_stdbars(dev, howmany, PCI_BASE_ADDRESS_0, stdbars);
-	if (rom)
-		__pci_size_rom(dev, rom, &rombar);
+	if (!pci_lu_adopt(dev)) {
+		__pci_size_stdbars(dev, howmany, PCI_BASE_ADDRESS_0, stdbars);
+		if (rom)
+			__pci_size_rom(dev, rom, &rombar);
+	}
 
-	if (!dev->mmio_always_on &&
+	if (!pci_lu_adopt(dev) && !dev->mmio_always_on &&
 	    (orig_cmd & PCI_COMMAND_DECODE_ENABLE))
 		pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
 
@@ -1269,8 +1271,9 @@ static void pci_enable_rrs_sv(struct pci_dev *pdev)
 	/* Enable Configuration RRS Software Visibility if supported */
 	pcie_capability_read_word(pdev, PCI_EXP_RTCAP, &root_cap);
 	if (root_cap & PCI_EXP_RTCAP_RRS_SV) {
-		pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
-					 PCI_EXP_RTCTL_RRS_SVE);
+		if (!pci_lu_adopt(pdev))
+			pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
+						 PCI_EXP_RTCTL_RRS_SVE);
 		pdev->config_rrs_sv = 1;
 	}
 }
@@ -1384,8 +1387,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 	 * bus errors in some architectures.
 	 */
 	pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl);
-	pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
-			      bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
+				      bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
 
 	if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
 	    !is_cardbus && !broken) {
@@ -1404,6 +1408,10 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 		 * more than one bridge. The second case can happen with
 		 * the i450NX chipset.
 		 */
+		if (pci_lu_adopt(dev)) {
+			/* Verify bus number here */
+		}
+
 		child = pci_find_bus(pci_domain_nr(bus), secondary);
 		if (!child) {
 			child = pci_add_new_bus(bus, dev, secondary);
@@ -1558,7 +1566,8 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 	/* Clear errors in the Secondary Status Register */
 	pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff);
 
-	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
+	if (!pci_lu_adopt(dev))
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
 
 	pm_runtime_put(&dev->dev);
 
@@ -2035,7 +2044,10 @@ int pci_setup_device(struct pci_dev *dev)
 	 * If we are not booted from liveupdate, default
 	 * "Unknown power state".
 	 */
-	dev->current_state = PCI_SER_GET(dev, current_state, PCI_UNKNOWN);
+	if (pci_lu_adopt(dev))
+		dev->current_state = 0; /* FIXME */
+	else
+		dev->current_state = PCI_SER_GET(dev, current_state, PCI_UNKNOWN);
 
 	/* Early fixups, before probing the BARs */
 	pci_fixup_device(pci_fixup_early, dev);
@@ -2075,7 +2087,8 @@ int pci_setup_device(struct pci_dev *dev)
 						   dev->hotplug_user_indicators);
 	dev->pref_window = PCI_SER_GET(dev, pref_window,
 				       dev->pref_window);
-	dev->pref_64_window = PCI_SER_GET(dev, pref_64_window,
+	if (!pci_lu_adopt(dev))
+		dev->pref_64_window = PCI_SER_GET(dev, pref_64_window,
 					  dev->pref_64_window);
 
 	switch (dev->hdr_type) {		    /* header type */
@@ -2269,6 +2282,10 @@ int pci_configure_extended_tags(struct pci_dev *dev, void *ign)
 	if (!host)
 		return 0;
 
+
+	if (pci_lu_adopt(dev))
+		return 0;
+
 	/*
 	 * If some device in the hierarchy doesn't handle Extended Tags
 	 * correctly, make sure they're disabled.
@@ -2373,7 +2390,7 @@ static void pci_configure_serr(struct pci_dev *dev)
 		 * endpoint unless SERR# forwarding is enabled.
 		 */
 		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
-		if (!(control & PCI_BRIDGE_CTL_SERR)) {
+		if (!pci_lu_adopt(dev) && !(control & PCI_BRIDGE_CTL_SERR)) {
 			control |= PCI_BRIDGE_CTL_SERR;
 			pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
 		}
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 07c3d021a47ec794aaae13e1c12a667cfb47cb45..276a62c6957218c0c89d8881b1a4d6f6d419dacf 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -706,6 +706,9 @@ static void pci_setup_bridge_io(struct pci_dev *bridge)
 		io_upper16 = 0;
 		l = 0x00f0;
 	}
+
+	if (pci_lu_adopt(bridge))
+		return;
 	/* Temporarily disable the I/O range before updating PCI_IO_BASE */
 	pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff);
 	/* Update lower 16 bits of I/O base/limit */
@@ -732,6 +735,8 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
 	} else {
 		l = 0x0000fff0;
 	}
+	if (pci_lu_adopt(bridge))
+		return;
 	pci_write_config_dword(bridge, PCI_MEMORY_BASE, l);
 }
 
@@ -765,6 +770,8 @@ static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 	} else {
 		l = 0x0000fff0;
 	}
+	if (pci_lu_adopt(bridge))
+		return;
 	pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l);
 
 	/* Set the upper 32 bits of PREF base & limit */
@@ -787,7 +794,8 @@ static void __pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	if (type & IORESOURCE_PREFETCH)
 		pci_setup_bridge_mmio_pref(bridge);
 
-	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
+	if (!pci_lu_adopt(bridge))
+		pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
 void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)

-- 
2.50.1.487.gc89ff58d15-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC 21/25] PCI/LUO: Save and restore the PCI resource
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (19 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot Chris Li
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 22/25] PCI/LUO: Save PCI bus and host bridge states chrisl
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

Preserve the resource array of pci_dev in pci_dev_ser as an array
of `struct pci_resource_ser`. This array saves all resource regions
claimed by a PCI device in the LUO prepare phase.

When a PCI device is set up after a liveupdate reboot, the PCI core
normally reads and writes the PCI BARs with pci_read_bases() to probe
the available resource regions. Check whether liveupdate is enabled and
the resource has been preserved. If so, restore the resource data
structure instead of accessing the hardware.
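
The save/reclaim flow described above can be sketched in plain userspace
C. This is only an illustration under simplifying assumptions, not the
code in this patch: `struct res`, `struct res_ser`, `NUM_RES`,
`save_resources()` and `reclaim_resources()` are hypothetical stand-ins
for struct resource, `struct pci_resource_ser`, DEVICE_COUNT_RESOURCE,
pci_save_device_state() and pci_liveupdate_reclaim_resource(). Unlike
the patch, the reclaim helper returns the number of restored regions so
the result is easy to check.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_RES       4   /* hypothetical stand-in for DEVICE_COUNT_RESOURCE */
#define SER_NAME_SIZE 64  /* mirrors PCI_RESOURCE_SER_NAME_SIZE */

/* Simplified stand-in for struct resource. */
struct res {
	uint64_t start, end;
	const char *name;	/* NULL or "" means the region is unclaimed */
	uint64_t flags;
};

/* Simplified stand-in for struct pci_resource_ser. */
struct res_ser {
	uint64_t start, end;
	char name[SER_NAME_SIZE];
	uint64_t flags;
};

/* Prepare phase: copy each claimed region and tag it with the device name. */
static void save_resources(const char *dev_name, const struct res *r,
			   struct res_ser *s)
{
	memset(s, 0, sizeof(*s) * NUM_RES);
	for (int i = 0; i < NUM_RES; i++) {
		if (!r[i].name || r[i].name[0] == '\0')
			continue;	/* not claimed, leave the slot zeroed */
		s[i].start = r[i].start;
		s[i].end = r[i].end;
		s[i].flags = r[i].flags;
		snprintf(s[i].name, sizeof(s[i].name), "%s", dev_name);
	}
}

/*
 * After kexec: restore the regions tagged with this device name.
 * Returns the number of restored regions, or -1 when there is no
 * preserved state, in which case the caller falls back to BAR probing.
 */
static int reclaim_resources(const char *dev_name, const struct res_ser *s,
			     struct res *r)
{
	int restored = 0;

	if (!s)
		return -1;
	for (int i = 0; i < NUM_RES; i++) {
		if (strcmp(s[i].name, dev_name) != 0)
			continue;	/* slot was not claimed when saved */
		r[i].start = s[i].start;
		r[i].end = s[i].end;
		r[i].flags = s[i].flags;
		r[i].name = dev_name;
		restored++;
	}
	return restored;
}
```

A caller would probe the hardware only when `reclaim_resources()`
returns a negative value, which mirrors the
`if (pci_liveupdate_reclaim_resource(dev) < 0)` check in probe.c below.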

Tested:
  - QEMU VM boot test. Save and restore a pf-test driver.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/pci/pci.h        | 17 ++++++++++++++++
 drivers/pci/probe.c      | 18 ++++++++++++++---
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index bc2c166ef494fd0b38cc05500bf0817c0f50fd95..7fda7e4d409adce6bf92ef7af1167f7bda302c7e 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -166,10 +166,12 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	const char *name = dev->driver->name;
+	int i;
 
 	if (!name)
 		return -ENXIO;
-	if (strlen(name) > sizeof(s->driver_name) - 1)
+	if ((strlen(name) > sizeof(s->driver_name) - 1) ||
+	    (strlen(name) > sizeof(s->resource[0].name) - 1))
 		return -ENOSPC;
 	strscpy(s->driver_name, name, sizeof(s->driver_name));
 	s->path = pci_get_device_path(pdev);
@@ -190,6 +192,28 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 	s->pref_window = pdev->pref_window;
 	s->pref_64_window = pdev->pref_64_window;
 
+	/*
+	 * Per PCIe r4.0, sec 9.3.4.1.11, the VF BARs are all RO Zero,
+	 * no need to preserve the resource.
+	 */
+	if (pdev->is_virtfn)
+		return 0;
+
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		/* This resource region is not claimed by this device, skip. */
+		if ((pdev->resource[i].name == NULL) ||
+		    (strlen(pdev->resource[i].name) == 0))
+			continue;
+
+		s->resource[i].start = pdev->resource[i].start;
+		s->resource[i].end = pdev->resource[i].end;
+		s->resource[i].flags = pdev->resource[i].flags;
+		s->resource[i].desc = pdev->resource[i].desc;
+
+		strscpy((char *)s->resource[i].name, pci_name(pdev),
+			sizeof(s->resource[i].name));
+	}
+
 	return 0;
 }
 
@@ -502,6 +526,32 @@ void pci_liveupdate_override_driver(struct pci_dev *dev)
 		panic("PCI Liveupdate override driver failed: %s", s->driver_name);
 }
 
+int pci_liveupdate_reclaim_resource(struct pci_dev *dev)
+{
+	const char *name = pci_name(dev);
+	int i;
+
+	if (!dev->dev.lu.dev_state)
+		return -EINVAL;
+
+	if (dev->is_virtfn)
+		return 0;
+
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		/* This resource region was not claimed, skip.  */
+		if (strncmp(PCI_SER_GET(dev, resource[i].name, ""), name,
+				strlen(name)) != 0)
+			continue;
+
+		dev->resource[i].start = PCI_SER_GET(dev, resource[i].start, 0);
+		dev->resource[i].end = PCI_SER_GET(dev, resource[i].end, 0);
+		dev->resource[i].name = pci_name(dev);
+		dev->resource[i].flags = PCI_SER_GET(dev, resource[i].flags, 0);
+		dev->resource[i].desc = PCI_SER_GET(dev, resource[i].desc, 0);
+	}
+
+	return 0;
+}
 
 static int __init pci_liveupdate_init(void)
 {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index bd198227ae3cf687f4ddae76c2f53125681ca91d..7af32edb128faef9c5e2665ca5055374f7fd30ea 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -489,6 +489,19 @@ struct pci_sriov {
 	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
 };
 
+#define PCI_RESOURCE_SER_NAME_SIZE 64
+struct pci_resource_ser {
+	u64 start;
+	u64 end;
+	const char name[PCI_RESOURCE_SER_NAME_SIZE];
+	u64 flags;
+	u64 desc;
+	/*
+	 * The PCI resource is not nested. We do not need to preserve
+	 * the parent, sibling, child pointers in the original struct resource.
+	 */
+} __packed;
+
 struct pci_dev_ser {
 	u32	path;		/* domain + bus + slot + fn */
 	u8	requested;
@@ -509,6 +522,7 @@ struct pci_dev_ser {
 	u32	hotplug_user_indicators:1;
 	u32	pref_window:1;
 	u32	pref_64_window:1;
+	struct pci_resource_ser resource[DEVICE_COUNT_RESOURCE];
 } __packed;
 
 #ifdef CONFIG_PCI_DOE
@@ -1192,6 +1206,7 @@ static inline struct pci_dev_ser *pci_lu_adopt(struct pci_dev *dev)
 {
 	return dev->dev.lu.requested ? dev->dev.lu.dev_state : NULL;
 }
+int pci_liveupdate_reclaim_resource(struct pci_dev *dev);
 #else
 #define PCI_SER_GET(__dev, __var, __def) __def
 
@@ -1201,5 +1216,7 @@ static inline struct pci_dev_ser *pci_lu_adopt(struct pci_dev *dev)
 {
 	return NULL;
 }
+static inline int pci_liveupdate_reclaim_resource(
+	struct pci_dev *dev) { return -ENXIO; }
 #endif
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5c30d1d52a96b17a92794756cab5db0972548267..a101a44956821e5e81c6b063e6aab7db49a4cf7f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2096,7 +2096,13 @@ int pci_setup_device(struct pci_dev *dev)
 		if (class == PCI_CLASS_BRIDGE_PCI)
 			goto bad;
 		pci_read_irq(dev);
-		pci_read_bases(dev, PCI_STD_NUM_BARS, PCI_ROM_ADDRESS);
+
+		/*
+		 * If we can reclaim the resource from liveupdate preserved data,
+		 * do not access the hardware.
+		 */
+		if (pci_liveupdate_reclaim_resource(dev) < 0)
+			pci_read_bases(dev, PCI_STD_NUM_BARS, PCI_ROM_ADDRESS);
 
 		pci_subsystem_ids(dev, &dev->subsystem_vendor, &dev->subsystem_device);
 
@@ -2152,7 +2158,10 @@ int pci_setup_device(struct pci_dev *dev)
 		 */
 		pci_read_irq(dev);
 		dev->transparent = ((dev->class & 0xff) == 1);
-		pci_read_bases(dev, 2, PCI_ROM_ADDRESS1);
+
+		if (pci_liveupdate_reclaim_resource(dev) < 0)
+			pci_read_bases(dev, 2, PCI_ROM_ADDRESS1);
+
 		pci_read_bridge_windows(dev);
 		set_pcie_hotplug_bridge(dev);
 		pos = pci_find_capability(dev, PCI_CAP_ID_SSVID);
@@ -2166,7 +2175,10 @@ int pci_setup_device(struct pci_dev *dev)
 		if (class != PCI_CLASS_BRIDGE_CARDBUS)
 			goto bad;
 		pci_read_irq(dev);
-		pci_read_bases(dev, 1, 0);
+
+		if (pci_liveupdate_reclaim_resource(dev) < 0)
+			pci_read_bases(dev, 1, 0);
+
 		pci_read_config_word(dev, PCI_CB_SUBSYSTEM_VENDOR_ID, &dev->subsystem_vendor);
 		pci_read_config_word(dev, PCI_CB_SUBSYSTEM_ID, &dev->subsystem_device);
 		break;

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 22/25] PCI/LUO: Save PCI bus and host bridge states
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (20 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 21/25] PCI/LUO: Save and restore the PCI resource chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 23/25] PCI/LUO: Check the PCI bus state after restoration chrisl
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

In the LUO prepare phase, save the PCI bus and host bridge states.

For a PCI bus, save the domain and bus numbers. Save the bridge types.
Save the upstream bus domain and bus numbers so we can verify the
relationship in the later restoration phase.

If the current bridge is a host bridge, also save the PCI bridge
resources. This is not needed for other PCI bridges because their
resources are already preserved by the associated struct pci_dev.

Tested:
  - QEMU VM boot test, preserve device with pci-lu-stub

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 60 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 7fda7e4d409adce6bf92ef7af1167f7bda302c7e..be22af7a2db3a9bb06d8e100603a59f11b7fa5f8 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -20,9 +20,20 @@ static LIST_HEAD(preserved_buses);
 static LIST_HEAD(probe_devices);
 static LIST_HEAD(probe_buses);
 
+enum pci_bus_ser_bridge_type {
+	PCI_BUS_SER_NULL_BRIDGE, /* virtual bus */
+	PCI_BUS_SER_PCI_HOST_BRIDGE,
+	PCI_BUS_SER_PCI_BRIDGE,
+};
+
 struct pci_bus_ser {
 	u16	domain;
 	u8	number;
+	u16	parent_domain;
+	u8	parent_number;
+	enum pci_bus_ser_bridge_type bridge_type;
+	/* For a root bus, saves the host bridge PCI bridge resource */
+	struct pci_resource_ser resource[PCI_BRIDGE_RESOURCE_NUM];
 };
 
 struct pci_ser {
@@ -162,6 +173,16 @@ static int pci_get_device_path(struct pci_dev *pdev)
 	return (pci_domain_nr(pdev->bus) << 16) | pci_dev_id(pdev);
 }
 
+static void save_device_resource(struct pci_resource_ser *dest,
+				 struct resource *src)
+{
+	strscpy((char *)dest->name, src->name, sizeof(dest->name));
+	dest->start = src->start;
+	dest->end = src->end;
+	dest->flags = src->flags;
+	dest->desc = src->desc;
+}
+
 static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -205,13 +226,7 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 		    (strlen(pdev->resource[i].name) == 0))
 			continue;
 
-		s->resource[i].start = pdev->resource[i].start;
-		s->resource[i].end = pdev->resource[i].end;
-		s->resource[i].flags = pdev->resource[i].flags;
-		s->resource[i].desc = pdev->resource[i].desc;
-
-		strscpy((char *)s->resource[i].name, pci_name(pdev),
-			sizeof(s->resource[i].name));
+		save_device_resource(s->resource + i, pdev->resource + i);
 	}
 
 	return 0;
@@ -219,8 +234,37 @@ static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 
 static void pci_save_bus_state(struct pci_bus *bus, struct pci_bus_ser *s)
 {
-	s->number = bus->number;
+	int i;
+
 	s->domain = pci_domain_nr(bus);
+	s->number = bus->number;
+	if (bus->parent) {
+		s->parent_domain = pci_domain_nr(bus->parent);
+		s->parent_number = bus->parent->number;
+	}
+
+	/* This bus is a virtual bus if no physical bridge is being referred. */
+	if (!bus->bridge) {
+		s->bridge_type = PCI_BUS_SER_NULL_BRIDGE;
+		return;
+	}
+
+	if (!pci_is_root_bus(bus)) {
+		s->bridge_type = PCI_BUS_SER_PCI_BRIDGE;
+		return;
+	}
+
+	/* This bridge is a PCI host bridge. Saves its resource. */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+		/* This resource region is not claimed, skip. */
+		if ((bus->resource[i] == NULL) ||
+		    (bus->resource[i]->name == NULL) ||
+		    (strlen(bus->resource[i]->name) == 0))
+			continue;
+
+		save_device_resource(s->resource + i, bus->resource[i]);
+	}
+	s->bridge_type = PCI_BUS_SER_PCI_HOST_BRIDGE;
 }
 
 static int pci_call_prepare(struct pci_ser *pci_state,

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 23/25] PCI/LUO: Check the PCI bus state after restoration
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (21 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 22/25] PCI/LUO: Save PCI bus and host bridge states chrisl
@ 2025-07-28  8:24 ` chrisl
  2025-07-28  8:24 ` [PATCH RFC 24/25] PCI: pci-lu-pf-stub: Add a PF stub driver for Live Update testing Chris Li
  2025-07-28  8:24 ` [PATCH RFC 25/25] PCI/LUO: Clean up PCI_SER_GET() chrisl
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: Jason Miu <jasonmiu@google.com>

After the LUO reboot, check whether the bus topology associated with the
current PCI device matches the bus states saved in LUO. We want to
verify:
- The domain and bus numbers.
- The parent bus domain and number.
- The bus type, which can be PCI-PCI bridge, host bridge, or virtual bus.
- The PCI bridge resources of a host bridge. Unlike a PCI-PCI bridge,
  whose resources are restored from the PCI bridge device, for a host
  bridge we check whether its resources have changed since the last boot.

Tested:
  - QEMU VM liveupdate boot test with pci-lu-stub

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/liveupdate.c | 123 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 110 insertions(+), 13 deletions(-)
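
The lookup and the parent check described above can be sketched as a
small user-space model. This is an illustration only, not kernel code:
the model_* types and names are hypothetical stand-ins for struct pci_bus
and struct pci_bus_ser, and the sketch returns an error code where the
patch calls panic(), purely so that it can run outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Live bus as seen by the new kernel; only the fields the check reads. */
struct model_bus {
	uint16_t domain;
	uint8_t number;
	const struct model_bus *parent;	/* NULL for a root bus */
};

/* Serialized bus record handed over by the previous kernel. */
struct model_bus_ser {
	uint16_t domain;
	uint8_t number;
	uint16_t parent_domain;
	uint8_t parent_number;
};

enum model_check_result {
	MODEL_BUS_OK,
	MODEL_BUS_NOT_PRESERVED,
	MODEL_BUS_PARENT_CHANGED,
};

/* Linear search by (domain, number), as get_saved_pci_bus_state() does. */
static const struct model_bus_ser *
model_find_bus(const struct model_bus_ser *saved, size_t count,
	       uint16_t domain, uint8_t number)
{
	for (size_t i = 0; i < count; i++) {
		if (saved[i].domain == domain && saved[i].number == number)
			return &saved[i];
	}
	return NULL;
}

/* Compare the live bus against its saved record. */
static enum model_check_result
model_check_bus(const struct model_bus *bus,
		const struct model_bus_ser *saved, size_t count)
{
	const struct model_bus_ser *s =
		model_find_bus(saved, count, bus->domain, bus->number);

	if (!s)
		return MODEL_BUS_NOT_PRESERVED;

	/* A root bus has no parent, so there is nothing to compare. */
	if (bus->parent &&
	    (bus->parent->domain != s->parent_domain ||
	     bus->parent->number != s->parent_number))
		return MODEL_BUS_PARENT_CHANGED;

	return MODEL_BUS_OK;
}
```

A caller would run model_check_bus() on the bus of each restored device
and decide what to do with a non-OK result; the bridge type and host
bridge resource comparisons from the patch are left out to keep the
sketch short.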

diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index be22af7a2db3a9bb06d8e100603a59f11b7fa5f8..739de5f655dba04024c9cf8db2bf6ea5e136cf5f 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -43,6 +43,45 @@ struct pci_ser {
 	/* struct pci_bus_ser buses[] */
 };
 
+static struct pci_bus_ser *get_saved_pci_bus_state(struct pci_ser *state,
+						   u16 domain, u8 number)
+{
+	int i;
+	struct pci_bus_ser *bus_state =
+		(struct pci_bus_ser *)(state->devs + state->dev_count);
+
+	for (i = 0; i < state->bus_count; i++, bus_state++) {
+		if (bus_state->domain == domain &&
+		    bus_state->number == number)
+			return bus_state;
+	}
+
+	return NULL;
+}
+
+static enum pci_bus_ser_bridge_type get_bus_ser_bridge_type(struct pci_bus *bus)
+{
+	/* This bus is a virtual bus if no physical bridge is being referred. */
+	if (!bus->bridge)
+		return PCI_BUS_SER_NULL_BRIDGE;
+
+	return pci_is_root_bus(bus) ?
+		PCI_BUS_SER_PCI_HOST_BRIDGE : PCI_BUS_SER_PCI_BRIDGE;
+}
+
+static char *bus_ser_bridge_type_to_string(enum pci_bus_ser_bridge_type bt)
+{
+	switch (bt) {
+	case PCI_BUS_SER_NULL_BRIDGE:
+		return "PCI_BUS_SER_NULL_BRIDGE";
+	case PCI_BUS_SER_PCI_BRIDGE:
+		return "PCI_BUS_SER_PCI_BRIDGE";
+	case PCI_BUS_SER_PCI_HOST_BRIDGE:
+		return "PCI_BUS_SER_PCI_HOST_BRIDGE";
+	}
+	return "PCI_BUS_SER_INVALID";
+}
+
 static void stack_push_buses(struct list_head *stack, struct list_head *buses)
 {
 	struct pci_bus *bus;
@@ -183,6 +222,71 @@ static void save_device_resource(struct pci_resource_ser *dest,
 	dest->desc = src->desc;
 }
 
+static void check_saved_bus_state(struct pci_dev *dev, struct pci_ser *pci_state)
+{
+	int i;
+	const struct resource *res;
+	const struct pci_resource_ser *saved_res;
+	const struct pci_bus_ser *bus_state =
+		get_saved_pci_bus_state(pci_state,
+					pci_domain_nr(dev->bus),
+					dev->bus->number);
+	struct pci_bus *bus = dev->bus;
+
+	if (!bus_state) {
+		panic("The bus of PCI device %s was not preserved by Liveupdate",
+		      pci_name(dev));
+	}
+
+	if (get_bus_ser_bridge_type(bus) != bus_state->bridge_type) {
+		panic("The bus (%04x:%02x) bridge type (%s) of PCI device (%s) is changed. "
+		      "Liveupdate preserved %s",
+		      pci_domain_nr(bus), bus->number,
+		      bus_ser_bridge_type_to_string(get_bus_ser_bridge_type(bus)),
+		      pci_name(dev),
+		      bus_ser_bridge_type_to_string(bus_state->bridge_type));
+	}
+
+	if (bus->parent) {
+		if (pci_domain_nr(bus->parent) != bus_state->parent_domain ||
+		    bus->parent->number != bus_state->parent_number) {
+			panic("The parent bus (%04x:%02x) of PCI device (%s) is changed. "
+			      "Liveupdate preserved %04x:%02x",
+			      pci_domain_nr(bus->parent), bus->parent->number,
+			      pci_name(dev),
+			      bus_state->parent_domain, bus_state->parent_number);
+		}
+
+		/* Checks for a PCI-PCI bridge or virtual bus end here. */
+		return;
+	}
+
+	/* This is a host bridge device */
+	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
+
+		res = bus->resource[i];
+		if (res == NULL ||
+		    res->name == NULL ||
+		    strlen(res->name) == 0)
+			continue;
+
+		/* check its PCI bridge resource */
+		saved_res = &bus_state->resource[i];
+		if (res->start != saved_res->start ||
+		    res->end != saved_res->end ||
+		    res->flags != saved_res->flags ||
+		    res->desc != saved_res->desc ||
+		    strncmp(res->name, saved_res->name, sizeof(saved_res->name)) != 0) {
+			panic("Host bridge resource %pr is changed. "
+			      "Liveupdate preserved "
+			      "[mem 0x%016llx-0x%016llx flags 0x%016llx desc 0x%016llx name %s]",
+			      res,
+			      saved_res->start, saved_res->end,
+			      saved_res->flags, saved_res->desc, saved_res->name);
+		}
+	}
+}
+
 static int pci_save_device_state(struct device *dev, struct pci_dev_ser *s)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -238,22 +342,13 @@ static void pci_save_bus_state(struct pci_bus *bus, struct pci_bus_ser *s)
 
 	s->domain = pci_domain_nr(bus);
 	s->number = bus->number;
+	s->bridge_type = get_bus_ser_bridge_type(bus);
+
 	if (bus->parent) {
 		s->parent_domain = pci_domain_nr(bus->parent);
 		s->parent_number = bus->parent->number;
 	}
 
-	/* This bus is a virtual bus if no physical bridge is being referred. */
-	if (!bus->bridge) {
-		s->bridge_type = PCI_BUS_SER_NULL_BRIDGE;
-		return;
-	}
-
-	if (!pci_is_root_bus(bus)) {
-		s->bridge_type = PCI_BUS_SER_PCI_BRIDGE;
-		return;
-	}
-
 	/* This bridge is a PCI host bridge. Saves its resource. */
 	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
 		/* This resource region is not claimed, skip. */
@@ -264,7 +359,6 @@ static void pci_save_bus_state(struct pci_bus *bus, struct pci_bus_ser *s)
 
 		save_device_resource(s->resource + i, bus->resource[i]);
 	}
-	s->bridge_type = PCI_BUS_SER_PCI_HOST_BRIDGE;
 }
 
 static int pci_call_prepare(struct pci_ser *pci_state,
@@ -531,8 +625,11 @@ void pci_liveupdate_restore(struct pci_dev *dev)
 	s = pci_state->devs;
 	end = s + pci_state->dev_count;
 	for (; s < end; s++)
-		if (s->path == path)
+		if (s->path == path) {
+			/* If the bus state checking fails, kernel panics */
+			check_saved_bus_state(dev, pci_state);
 			return pci_dev_do_restore(dev, s);
+		}
 }
 
 int pci_liveupdate_get_driver_data(struct pci_dev *pdev, u64 *data)

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 24/25] PCI: pci-lu-pf-stub: Add a PF stub driver for Live Update testing
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (22 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 23/25] PCI/LUO: Check the PCI bus state after restoration chrisl
@ 2025-07-28  8:24 ` Chris Li
  2025-07-28  8:24 ` [PATCH RFC 25/25] PCI/LUO: Clean up PCI_SER_GET() chrisl
  24 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

The pci-lu-stub driver always requests the device in probe(). However, a
PF device might be added to the liveupdate device list because of the
"depended" bit rather than the "requested" bit.

Create the pci-lu-stub-pf driver based on the pci-lu-stub driver; it does
not request the device at probe().

For PF device, also restore the number of VFs at probe().

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/pci-lu-stub.c | 85 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 81 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci-lu-stub.c b/drivers/pci/pci-lu-stub.c
index ea8142dcb250d31cbf817df957157bc4ec3a876d..ff6230102b83ff3ad646c23b79d4e1b6de58b43f 100644
--- a/drivers/pci/pci-lu-stub.c
+++ b/drivers/pci/pci-lu-stub.c
@@ -5,6 +5,8 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 
+#include "pci.h"
+
 struct pci_lu_stub_ser {
 	u16 dev_id;
 } __packed;
@@ -32,15 +34,47 @@ static int validate_folio(struct pci_dev *dev, struct folio *folio)
 	return 0;
 }
 
-static int pci_lu_stub_probe(struct pci_dev *dev, const struct pci_device_id *id)
+static bool is_pf_driver(struct pci_dev *dev)
+{
+	return pci_get_drvdata(dev);
+}
+
+static int check_lu_flags(struct pci_dev *dev, bool is_pf)
+{
+	struct dev_liveupdate *lu = &dev->dev.lu;
+	bool expect_requested = !is_pf;
+	bool expect_depended = is_pf;
+
+	if (lu->requested != expect_requested) {
+		pci_err(dev, "Device requested bit %d not match expected %d\n",
+			lu->requested, expect_requested);
+		return -EINVAL;
+	}
+
+	if (lu->depended != expect_depended) {
+		pci_err(dev, "Device depended bit %d not match expected %d\n",
+			lu->depended, expect_depended);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int __pci_lu_stub_probe(struct pci_dev *dev, const struct pci_device_id *id,
+			       bool is_pf)
 {
 	struct folio *folio;
 	u64 data;
 	int ret;
+	int vfs;
+	struct dev_liveupdate *lu = &dev->dev.lu;
+	struct pci_dev_ser *s;
 
+	pci_set_drvdata(dev, (void *)(intptr_t) is_pf);
 	if (liveupdate_state_normal()) {
-		pci_info(dev, "Marking device as liveupdate requested\n");
-		dev->dev.lu.requested = 1;
+		if (!is_pf) {
+			pci_info(dev, "Marking device as liveupdate requested\n");
+			lu->requested = 1;
+		}
 		return 0;
 	}
 
@@ -49,6 +83,10 @@ static int pci_lu_stub_probe(struct pci_dev *dev, const struct pci_device_id *id
 		return -EOPNOTSUPP;
 	}
 
+	ret = check_lu_flags(dev, is_pf);
+	if (ret)
+		return ret;
+
 	ret = pci_liveupdate_get_driver_data(dev, &data);
 	if (ret) {
 		pci_err(dev, "Failed to get driver data for device (%d)\n", ret);
@@ -63,7 +101,31 @@ static int pci_lu_stub_probe(struct pci_dev *dev, const struct pci_device_id *id
 		return -ENOENT;
 	}
 
-	return validate_folio(dev, folio);
+	ret = validate_folio(dev, folio);
+	if (ret)
+		return ret;
+
+	s = lu->dev_state;
+	vfs = s->num_vfs;
+	if (dev->is_physfn && vfs) {
+		ret = pci_sriov_configure_simple(dev, vfs);
+		if (vfs != ret) {
+			pci_err(dev, "Failed to restore num VFs %d got %d\n",
+				vfs, ret);
+			return (ret < 0) ? ret : -EAGAIN;
+		}
+	}
+	return 0;
+}
+
+static int pci_lu_stub_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	return __pci_lu_stub_probe(dev, id, false);
+}
+
+static int pci_lu_stub_pf_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	return __pci_lu_stub_probe(dev, id, true);
 }
 
 static void pci_lu_stub_remove(struct pci_dev *dev)
@@ -74,10 +136,15 @@ static void pci_lu_stub_remove(struct pci_dev *dev)
 
 static int pci_lu_stub_prepare(struct device *dev, u64 *data)
 {
+	struct pci_dev *pdev = to_pci_dev(dev);
 	struct pci_lu_stub_ser *ser;
 	struct folio *folio;
 	int ret;
 
+	ret = check_lu_flags(pdev, is_pf_driver(pdev));
+	if (ret)
+		return ret;
+
 	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, get_order(sizeof(*ser)));
 	if (!folio)
 		return -ENOMEM;
@@ -135,5 +202,15 @@ static struct pci_driver pci_lu_stub_driver = {
 	.driver.lu	= &liveupdate_ops,
 };
 
+static struct pci_driver pci_lu_stub_pf_driver = {
+	.name		= "pci-lu-stub-pf",
+	.id_table	= pci_lu_stub_id_table,
+	.probe		= pci_lu_stub_pf_probe,
+	.remove		= pci_lu_stub_remove,
+	.sriov_configure = pci_sriov_configure_simple,
+	.driver.lu	= &liveupdate_ops,
+};
+
 module_pci_driver(pci_lu_stub_driver);
+module_pci_driver(pci_lu_stub_pf_driver);
 MODULE_LICENSE("GPL");

-- 
2.50.1.487.gc89ff58d15-goog



* [PATCH RFC 25/25] PCI/LUO: Clean up PCI_SER_GET()
  2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
                   ` (23 preceding siblings ...)
  2025-07-28  8:24 ` [PATCH RFC 24/25] PCI: pci-lu-pf-stub: Add a PF stub driver for Live Update testing Chris Li
@ 2025-07-28  8:24 ` chrisl
  24 siblings, 0 replies; 34+ messages in thread
From: chrisl @ 2025-07-28  8:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

From: David Matlack <dmatlack@google.com>

Refactor PCI_SER_GET() to be more readable by storing the pointer
to struct pci_dev_ser in an intermediate variable and adding a helper
function to_pci_dev_ser().

Change pci_lu_adopt() to return a boolean since it is only used to check
if a device has preserved state.

Opportunistically fix the formatting on the static inline prototype of
pci_liveupdate_reclaim_resource() as well.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Chris Li <chrisl@kernel.org>
---
 drivers/pci/pci.h | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)
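
The two macro shapes can be compared in a small user-space sketch. It is
an illustration only: the model_* types and MODEL_SER_GET* names are
hypothetical stand-ins, and the statement expression is a GNU C
extension, so this assumes GCC or Clang. The sketch shows one
consequence of the rewrite that follows from C operator precedence: the
old bare ternary is not parenthesized, so an operator written after the
macro binds to the default value only. Whether any existing caller is
affected is not visible from this patch.

```c
#include <stddef.h>

struct model_dev_ser { int num_vfs; };
struct model_dev { struct model_dev_ser *dev_state; };

static inline struct model_dev_ser *to_model_dev_ser(struct model_dev *dev)
{
	return dev->dev_state;
}

/* Shape of the macro before the patch: a bare, unparenthesized ternary. */
#define MODEL_SER_GET_OLD(dev_, field_, default_)		\
	(dev_->dev_state) ?					\
	((struct model_dev_ser *)dev_->dev_state)->field_ : default_

/* Shape after the patch: a statement expression with a local pointer. */
#define MODEL_SER_GET(dev_, field_, default_) ({		\
	struct model_dev_ser *ser_ = to_model_dev_ser(dev_);	\
								\
	ser_ ? ser_->field_ : (default_);			\
})
```

With a device whose saved num_vfs is 4, MODEL_SER_GET(p, num_vfs, 0) + 1
evaluates to 5, while MODEL_SER_GET_OLD(p, num_vfs, 0) + 1 parses as
cond ? 4 : (0 + 1) and evaluates to 4.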

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 7af32edb128faef9c5e2665ca5055374f7fd30ea..d092cea96dc22cca5d3526c720cfb8b330c47683 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1196,27 +1196,37 @@ static inline int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int inde
 	 PCI_CONF1_EXT_REG(reg))
 
 #ifdef CONFIG_LIVEUPDATE
-#define PCI_SER_GET(__pci_dev, __var, __def)			\
-	(__pci_dev->dev.lu.dev_state) ?				\
-	((struct pci_dev_ser *)__pci_dev->dev.lu.dev_state)->__var : __def
-
 void pci_liveupdate_restore(struct pci_dev *dev);
 void pci_liveupdate_override_driver(struct pci_dev *dev);
-static inline struct pci_dev_ser *pci_lu_adopt(struct pci_dev *dev)
+static inline struct pci_dev_ser *to_pci_dev_ser(struct pci_dev *dev)
+{
+	return dev->dev.lu.dev_state;
+}
+static inline bool pci_lu_adopt(struct pci_dev *dev)
 {
-	return dev->dev.lu.requested ? dev->dev.lu.dev_state : NULL;
+	return dev->dev.lu.requested && to_pci_dev_ser(dev);
 }
 int pci_liveupdate_reclaim_resource(struct pci_dev *dev);
 #else
-#define PCI_SER_GET(__dev, __var, __def) __def
-
 static inline void pci_liveupdate_restore(struct pci_dev *dev) {}
 static inline void pci_liveupdate_override_driver(struct pci_dev *dev) {}
-static inline struct pci_dev_ser *pci_lu_adopt(struct pci_dev *dev)
+static inline struct pci_dev_ser *to_pci_dev_ser(struct pci_dev *dev)
 {
 	return NULL;
 }
-static inline int pci_liveupdate_reclaim_resource(
-	struct pci_dev *dev) { return -ENXIO; }
+static inline bool pci_lu_adopt(struct pci_dev *dev)
+{
+	return false;
+}
+static inline int pci_liveupdate_reclaim_resource(struct pci_dev *dev)
+{
+	return -ENXIO;
+}
 #endif
+
+#define PCI_SER_GET(__pci_dev, __field, __default) ({			\
+	struct pci_dev_ser *__ser = to_pci_dev_ser(__pci_dev);		\
+									\
+	__ser ? __ser->__field : __default;				\
+})
 #endif /* DRIVERS_PCI_H */

-- 
2.50.1.487.gc89ff58d15-goog



* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-28  8:24 ` [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot Chris Li
@ 2025-07-28 17:23   ` Thomas Gleixner
  2025-07-28 23:50     ` Jason Gunthorpe
  2025-07-30  1:51     ` Chris Li
  0 siblings, 2 replies; 34+ messages in thread
From: Thomas Gleixner @ 2025-07-28 17:23 UTC (permalink / raw)
  To: Chris Li, Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown
  Cc: linux-kernel, linux-pci, linux-acpi, David Matlack,
	Pasha Tatashin, Jason Miu, Vipin Sharma, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, William Tu, Mike Rapoport,
	Chris Li, Jason Gunthorpe, Leon Romanovsky

On Mon, Jul 28 2025 at 01:24, Chris Li wrote:
> The liveupdate devices are already initialized by the kernel before the
> kexec. During the kexec the device is still running. Avoid write to the
> liveupdate devices during the new kernel boot up.

This change log is way too meager for this kind of change.

 1) You want to explain in detail how this works.

    "initialized by the kernel before the kexec" is as vague as it gets.

 2) Avoid write ....

    Again this lacks any information how this is supposed to work correctly.

>  drivers/pci/ats.c            |  7 ++--
>  drivers/pci/iov.c            | 58 ++++++++++++++++++------------
>  drivers/pci/msi/msi.c        | 32 ++++++++++++-----
>  drivers/pci/msi/pcidev_msi.c |  4 +--
>  drivers/pci/pci-acpi.c       |  3 ++
>  drivers/pci/pci.c            | 85 +++++++++++++++++++++++++++++---------------
>  drivers/pci/pci.h            |  9 ++++-
>  drivers/pci/pcie/aspm.c      |  7 ++--
>  drivers/pci/pcie/pme.c       | 11 ++++--
>  drivers/pci/probe.c          | 43 +++++++++++++++-------
>  drivers/pci/setup-bus.c      | 10 +++++-

Then you sprinkle this stuff into files, which have completely different
purposes, without any explanation for the particular instances why they
are supposed to be correct and how this works.

I'm just looking at the MSI parts, as I have no expertise with the rest.

> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 6ede55a7c5e652c80b51b10e58f0290eb6556430..7c40fde1ba0f89ad1d72064ac9e80696faeab426 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -113,7 +113,8 @@ static int pci_setup_msi_context(struct pci_dev *dev)
>  
>  void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
>  {
> -	raw_spinlock_t *lock = &to_pci_dev(desc->dev)->msi_lock;
> +	struct pci_dev *pci_dev = to_pci_dev(desc->dev);
> +	raw_spinlock_t *lock = &pci_dev->msi_lock;
>  	unsigned long flags;
>  
>  	if (!desc->pci.msi_attrib.can_mask)
> @@ -122,8 +123,9 @@ void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
>  	raw_spin_lock_irqsave(lock, flags);
>  	desc->pci.msi_mask &= ~clear;
>  	desc->pci.msi_mask |= set;
> -	pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->pci.mask_pos,
> -			       desc->pci.msi_mask);
> +	if (!pci_lu_adopt(pci_dev))
> +		pci_write_config_dword(pci_dev, desc->pci.mask_pos,
> +				       desc->pci.msi_mask);

This results in inconsistent state, which is a bad idea to begin
with. How is cached software state and hardware state going to be
brought in sync at some point?

If you analyzed all places, which actually depend on hardware state and
make decisions based on it, for correctness, then you failed to provide
that analysis. If not, no comment.
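
The divergence in question can be made concrete with a minimal
user-space sketch. It is not kernel code: struct model_msi and its
fields are hypothetical stand-ins, with cached_mask playing the role of
desc->pci.msi_mask, hw_mask the device's mask register, and adopted the
result of pci_lu_adopt().

```c
#include <stdbool.h>
#include <stdint.h>

struct model_msi {
	uint32_t cached_mask;	/* software copy of the mask bits */
	uint32_t hw_mask;	/* stands in for the device register */
	bool adopted;		/* device was preserved across kexec */
};

/* Mirrors the patched update: the cache always changes, the write is skipped. */
static void model_update_mask(struct model_msi *m, uint32_t clear, uint32_t set)
{
	m->cached_mask &= ~clear;
	m->cached_mask |= set;
	if (!m->adopted)
		m->hw_mask = m->cached_mask;
}

/* True when software and "hardware" agree on the mask. */
static bool model_in_sync(const struct model_msi *m)
{
	return m->cached_mask == m->hw_mask;
}
```

After a single model_update_mask(&m, 0, 1) on an adopted device,
model_in_sync() returns false, and nothing in the sketch (or in the
quoted hunk) ever brings the two values back together.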

>  	raw_spin_unlock_irqrestore(lock, flags);
>  }
>  
> @@ -190,6 +192,9 @@ static inline void pci_write_msg_msi(struct pci_dev *dev, struct msi_desc *desc,
>  	int pos = dev->msi_cap;
>  	u16 msgctl;
>  
> +	if (pci_lu_adopt(dev))
> +		return;
> +
>  	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
>  	msgctl &= ~PCI_MSI_FLAGS_QSIZE;
>  	msgctl |= FIELD_PREP(PCI_MSI_FLAGS_QSIZE, desc->pci.msi_attrib.multiple);
> @@ -214,6 +219,8 @@ static inline void pci_write_msg_msix(struct msi_desc *desc, struct msi_msg *msg
>  
>  	if (desc->pci.msi_attrib.is_virtual)
>  		return;
> +	if (pci_lu_adopt(to_pci_dev(desc->dev)))
> +		return;

So you don't allow the new kernel to write the MSI message, but the
interrupt subsystem has this new message and there are places which
utilize that cached message. How is this supposed to work?

>  	/*
>  	 * The specification mandates that the entry is masked
>  	 * when the message is modified:
> @@ -279,7 +286,8 @@ static void pci_msi_set_enable(struct pci_dev *dev, int enable)
>  	control &= ~PCI_MSI_FLAGS_ENABLE;
>  	if (enable)
>  		control |= PCI_MSI_FLAGS_ENABLE;
> -	pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
> +	if (!pci_lu_adopt(dev))
> +		pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);

The placement of these conditionals is arbitrary. Some are at the beginning
of a function, others just block the write. Is that based on some logic or
were the places selected by shabby AI queries?

>  static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
> @@ -553,6 +561,7 @@ static void pci_msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u16 set)
>  {
>  	u16 ctrl;
>  
> +	BUG_ON(pci_lu_adopt(dev));

Not going to happen. BUG() is only appropriate when there is absolutely
no way to handle a situation. This is as undocumented as everything else
here.

>  	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
>  	ctrl &= ~clear;
>  	ctrl |= set;
> @@ -720,8 +729,9 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  	 * registers can be accessed.  Mask all the vectors to prevent
>  	 * interrupts coming in before they're fully set up.
>  	 */
> -	pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
> -				    PCI_MSIX_FLAGS_ENABLE);
> +	if (!pci_lu_adopt(dev))
> +		pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
> +					    PCI_MSIX_FLAGS_ENABLE);

And for enhanced annoyance you sprinkle this condition everywhere into
the code and then BUG() when you missed an instance. Because putting it
into the function which is invoked a gazillion of times would be too
obvious, right? That would at least be tasteful, but that's not the
primary problem of all this.

Sprinkling these conditionals all over the place is absolutely
unmaintainable, error prone and burdens everyone with this insanity and
the related hard to chase bugs.

Especially as there is no concept behind this and zero documentation how
any of this should work or even be remotely correct.

Before you start the next hackery, please sit down and write up coherent
explanations:

  What is the general concept of this?

  What is the exact state in which a device is left when the old kernel
  jumps into the new kernel?

  What is the state of the MSI[-X] or legacy PCI interrupts at this
  point?

  Can the device raise interrupts during the transition from the old to
  the new kernel?

  How is the "live" state of the device reflected and restored
  throughout the interrupt subsystem?

  How is the device driver supposed to attach to the same interrupt
  state as before?

  How are the potentially different Linux interrupt numbers mapped to
  the previous state?

Before this materializes and is agreed on, this is not going anywhere.

Thanks,

        tglx


* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-28 17:23   ` Thomas Gleixner
@ 2025-07-28 23:50     ` Jason Gunthorpe
  2025-07-30  4:13       ` Chris Li
  2025-07-30  1:51     ` Chris Li
  1 sibling, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-07-28 23:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chris Li, Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown, linux-kernel, linux-pci, linux-acpi,
	David Matlack, Pasha Tatashin, Jason Miu, Vipin Sharma,
	Saeed Mahameed, Adithya Jayachandran, Parav Pandit, William Tu,
	Mike Rapoport, Leon Romanovsky

On Mon, Jul 28, 2025 at 07:23:03PM +0200, Thomas Gleixner wrote:
> On Mon, Jul 28 2025 at 01:24, Chris Li wrote:
> > The liveupdate devices are already initialized by the kernel before the
> > kexec. During the kexec the device is still running. Avoid write to the
> > liveupdate devices during the new kernel boot up.
> 
> This change log is way too meager for this kind of change.
> 
>  1) You want to explain in detail how this works.
> 
>     "initialized by the kernel before the kexec" is as vague as it gets.
> 
>  2) Avoid write ....
> 
>     Again this lacks any information how this is supposed to work correctly.
> 
> >  drivers/pci/ats.c            |  7 ++--
> >  drivers/pci/iov.c            | 58 ++++++++++++++++++------------
> >  drivers/pci/msi/msi.c        | 32 ++++++++++++-----
> >  drivers/pci/msi/pcidev_msi.c |  4 +--
> >  drivers/pci/pci-acpi.c       |  3 ++
> >  drivers/pci/pci.c            | 85 +++++++++++++++++++++++++++++---------------
> >  drivers/pci/pci.h            |  9 ++++-
> >  drivers/pci/pcie/aspm.c      |  7 ++--
> >  drivers/pci/pcie/pme.c       | 11 ++++--
> >  drivers/pci/probe.c          | 43 +++++++++++++++-------
> >  drivers/pci/setup-bus.c      | 10 +++++-
> 
> Then you sprinkle this stuff into files, which have completely different
> purposes, without any explanation for the particular instances why they
> are supposed to be correct and how this works.

Yeah, everything needs to be very carefully explained.

For instance I'm not sure we should be doing *anything* to the
MSI. Why did you think so?

MSI should be fully cleared by the new kernel and the new VFIO should
re-establish all the MSI routing from scratch as part of adopting the
device. We already accept that any interrupts are lost during the
kexec process so what reason is there to do anything except start up the
new kernel with a fully disabled MSI and cleared MSI?

If otherwise it should be explained why we can't work this way - and
then explain how the new kernel will adopt the inherited operating MSI
(hint: I doubt it can) without disrupting it.

Same remark for everything. Explain in the commits and perhaps a well
placed comment why anything needs to be done and why exactly we can't
use the cold boot flow for each item.

eg "we can't use the cold boot flow for BAR sizing because BAR sizing
requires changing the BAR register and that will break ongoing P2P
DMAs"

"we can't use the cold boot flow for bridge windows because changing
the bridge windows in any way will break ongoing P2P DMAs" (though you
also need to explain why the cold boot flow would change the bridge
windows)

etc etc.

There is also some complication here as the iommu driver technically
owns some of the PCI state, and we really don't want the PCI Core to
change it, but we do need the iommu driver to affirm what the in-use
state should be because it is responsible to clean it up.

This may actually require some restructuring of the iommu driver/pci
core interfaces to switch from an enable/disable language to a 'target
state' language. Ie "ATS shall be on and ATS page size shall be X".
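
One possible reading of a 'target state' interface, as a user-space
sketch. Everything here is hypothetical (the model_ats_* names, the
fields, the write counter); the mail does not specify an interface, and
this is not an existing kernel API. The point is only that the caller
states the desired end state and the core touches the device when, and
only when, the current state differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the IOMMU driver wants, not the steps to get there. */
struct model_ats_target {
	bool enabled;
	uint8_t page_shift;
};

/* Simulated device state, possibly inherited from the previous kernel. */
struct model_ats_hw {
	bool enabled;
	uint8_t page_shift;
	unsigned int writes;	/* counts simulated config writes */
};

/*
 * Bring the device to the target state. Returns true if a write was
 * issued, false if the inherited state already matched.
 */
static bool model_ats_set_target(struct model_ats_hw *hw,
				 const struct model_ats_target *t)
{
	if (hw->enabled == t->enabled && hw->page_shift == t->page_shift)
		return false;

	hw->enabled = t->enabled;
	hw->page_shift = t->page_shift;
	hw->writes++;
	return true;
}
```

With an inherited device that already has ATS on with the requested page
size, model_ats_set_target() issues no write, which is the behaviour a
preserved device needs; a cold-booted device with ATS off would get
exactly one.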

This series is very big, so I would probably try to break it up into
smaller chunks. Like you don't need to preserve bridge windows and
BARs if you don't support P2P. You don't need to worry about ATS and
PASID if you don't support those, etc, etc.

Yes, in the end all needs to be supported, but going bit by bit will
be easier for people to understand. Basic VFIO support with a basic
IOMMU using basic PCI with no P2P is the simplest thing you can do,
and I think it needs surprisingly little preservation.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-28 17:23   ` Thomas Gleixner
  2025-07-28 23:50     ` Jason Gunthorpe
@ 2025-07-30  1:51     ` Chris Li
  2025-07-31 15:01       ` Jason Gunthorpe
  1 sibling, 1 reply; 34+ messages in thread
From: Chris Li @ 2025-07-30  1:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Greg Kroah-Hartman, Rafael J. Wysocki,
	Danilo Krummrich, Len Brown, linux-kernel, linux-pci, linux-acpi,
	David Matlack, Pasha Tatashin, Jason Miu, Vipin Sharma,
	Saeed Mahameed, Adithya Jayachandran, Parav Pandit, William Tu,
	Mike Rapoport, Jason Gunthorpe, Leon Romanovsky

Hi Thomas,

On Mon, Jul 28, 2025 at 10:23 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, Jul 28 2025 at 01:24, Chris Li wrote:
> > The liveupdate devices are already initialized by the kernel before the
> > kexec. During the kexec the device is still running. Avoid write to the
> > liveupdate devices during the new kernel boot up.
>
> This change log is way too meager for this kind of change.

I agree with you. I mention it in the cover letter, I do expect this
part of change to be controversial. This RFC series is just to kick
off the discussion for PCI device liveupdate.

>  1) You want to explain in detail how this works.
>     "initialized by the kernel before the kexec" is as vague as it gets.

Agree. Sorry I haven't included more documents in this series. Working on it.

>
>  2) Avoid write ....
>
>     Again this lacks any information how this is supposed to work correctly.

I guess I haven't presented the big picture of how liveupdate works
with a PCI device.

Let's start with the background on why we want to do this. We want to
upgrade a host kernel while a VM is running with a GPU device
attached to it via vfio_pci. We want to upgrade the host kernel in a
way that lets the VM continue without being shut down and restarted.
The VM will pause during the host kernel kexec. The GPU device
will continue running and doing DMA without pausing. The VM will not be
able to handle interrupts until the new kernel has finished booting and
resumed the VM.

Pasha's LUO series already defines the liveupdate states, with
callbacks associated with each state.

https://lore.kernel.org/lkml/20250515182322.117840-1-pasha.tatashin@soleen.com/

I copy paste some of Pasha's LUO state here:
==========quote==========
LUO State Machine and Events:

NORMAL:   Default operational state.
PREPARED: Initial preparation complete after LIVEUPDATE_PREPARE
          event. Subsystems have saved initial state.
FROZEN:   Final "blackout window" state after LIVEUPDATE_FREEZE
          event, just before kexec. Workloads must be suspended.
UPDATED:  Next kernel has booted via live update. Awaiting restoration
          and LIVEUPDATE_FINISH.

Events:
LIVEUPDATE_PREPARE: Prepare for reboot, serialize state.
LIVEUPDATE_FREEZE:  Final opportunity to save state before kexec.
LIVEUPDATE_FINISH:  Post-reboot cleanup in the next kernel.
LIVEUPDATE_CANCEL:  Abort prepare or freeze, revert changes.
==========quote ends ===========
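
The quoted states and events can be sketched as a small transition
function. This is an illustration only, not the LUO implementation: the
enum names and luo_step() are hypothetical, and the FROZEN to UPDATED
transition is not modelled because it happens through kexec rather than
through an event.

```c
#include <assert.h>

/* Hypothetical names mirroring the quoted LUO states and events. */
enum luo_state { LUO_NORMAL, LUO_PREPARED, LUO_FROZEN, LUO_UPDATED };
enum luo_event { LUO_EV_PREPARE, LUO_EV_FREEZE, LUO_EV_FINISH, LUO_EV_CANCEL };

/*
 * Apply one event. Returns 0 and updates *state on a valid transition,
 * -1 (state untouched) if the event is not allowed in this state.
 */
static int luo_step(enum luo_state *state, enum luo_event ev)
{
	assert(state);

	switch (ev) {
	case LUO_EV_PREPARE:		/* NORMAL -> PREPARED */
		if (*state != LUO_NORMAL)
			return -1;
		*state = LUO_PREPARED;
		return 0;
	case LUO_EV_FREEZE:		/* PREPARED -> FROZEN */
		if (*state != LUO_PREPARED)
			return -1;
		*state = LUO_FROZEN;
		return 0;
	case LUO_EV_CANCEL:		/* abort prepare or freeze */
		if (*state != LUO_PREPARED && *state != LUO_FROZEN)
			return -1;
		*state = LUO_NORMAL;
		return 0;
	case LUO_EV_FINISH:		/* UPDATED -> NORMAL, in the new kernel */
		if (*state != LUO_UPDATED)
			return -1;
		*state = LUO_NORMAL;
		return 0;
	}
	return -1;
}
```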

The PCI core will register as a subsystem to LUO and
participate in the LUO callbacks.
1) In NORMAL state:
The PCI device will register to the PCI subsystem by setting the
pci_dev->dev.lu.requested flag.

2) PREPARE callback. The PCI subsystem will build the list of the PCI
devices using the PCI device dependency. VF depends on PF, PCI devices
depend on the parent bridge.

The PCI subsystem will save the struct pci_dev part of the PCI device
state, then forward the prepare callback to the PCI devices to
serialize the PCI device driver state. The VM is still running but
with some limitations, e.g. it can't create new DMA mappings and can't
attach to an additional vfio_pci device.

3) FREEZE callback: VM is paused. Last chance for the PCI device to
serialize the device state.

4) kexec booting up the new kernel.

5) PCI device enumeration and probing. Find the PCI device in the
serialized preserved device list and restore the serialized data
pointer for the PCI device. The PF device probe() restores the number
of VFs and creates the VFs, then the VF devices probe().

6) VM re-attach to the requested PCI device via vfio_pci.

7) FINISH callback. PCI subsystem and PCI devices free their preserved
serialized data. The system goes back to the NORMAL state.

8) VM resume running.
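
The dependency walk in step 2) can be sketched in user space as
follows. This is an assumption-laden illustration, not the code in the
series: struct lu_dev, lu_mark() and lu_build_list() are hypothetical,
devices live in a flat array, and "parent" and "pf" are array indices
standing in for the upstream bridge and the owning PF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_DEV (-1)

/* Hypothetical device model; not struct pci_dev. */
struct lu_dev {
	int parent;	/* index of the upstream bridge, NO_DEV at the root */
	int pf;		/* index of the owning PF for a VF, else NO_DEV */
	bool requested;	/* device asked to be preserved */
	bool on_list;	/* result: device is on the liveupdate list */
};

/* Mark a device, its PF (if it is a VF) and every bridge above it. */
static void lu_mark(struct lu_dev *devs, int i)
{
	while (i != NO_DEV && !devs[i].on_list) {
		devs[i].on_list = true;
		if (devs[i].pf != NO_DEV)
			lu_mark(devs, devs[i].pf);
		i = devs[i].parent;
	}
}

/* Build the list from the requested devices; returns its length. */
static size_t lu_build_list(struct lu_dev *devs, size_t n)
{
	size_t i, count = 0;

	assert(devs || n == 0);
	for (i = 0; i < n; i++)
		devs[i].on_list = false;
	for (i = 0; i < n; i++)
		if (devs[i].requested)
			lu_mark(devs, (int)i);
	for (i = 0; i < n; i++)
		count += devs[i].on_list;
	return count;
}
```

Requesting a single VF pulls in its PF and every bridge up to the root,
while unrelated devices stay off the list.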

>
> >  drivers/pci/ats.c            |  7 ++--
> >  drivers/pci/iov.c            | 58 ++++++++++++++++++------------
> >  drivers/pci/msi/msi.c        | 32 ++++++++++++-----
> >  drivers/pci/msi/pcidev_msi.c |  4 +--
> >  drivers/pci/pci-acpi.c       |  3 ++
> >  drivers/pci/pci.c            | 85 +++++++++++++++++++++++++++++---------------
> >  drivers/pci/pci.h            |  9 ++++-
> >  drivers/pci/pcie/aspm.c      |  7 ++--
> >  drivers/pci/pcie/pme.c       | 11 ++++--
> >  drivers/pci/probe.c          | 43 +++++++++++++++-------
> >  drivers/pci/setup-bus.c      | 10 +++++-
>
> Then you sprinkle this stuff into files, which have completely different
> purposes, without any explanation for the particular instances why they
> are supposed to be correct and how this works.

They follow a pattern that the original kernel needs to write to the
device and change the device state. The liveupdate device needs to
maintain the previous state not changed, therefore needs to prevent
such write initialization in liveupdate case.

I can certainly split it into more patches and group them by functions
in the later series.
This patch does it wholesale just to demonstrate what needs to
happen to live update a device.

>
> I'm just looking at the MSI parts, as I have no expertise with the rest.

Thank you for your feedback, that is very helpful.

>
> > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > index 6ede55a7c5e652c80b51b10e58f0290eb6556430..7c40fde1ba0f89ad1d72064ac9e80696faeab426 100644
> > --- a/drivers/pci/msi/msi.c
> > +++ b/drivers/pci/msi/msi.c
> > @@ -113,7 +113,8 @@ static int pci_setup_msi_context(struct pci_dev *dev)
> >
> >  void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
> >  {
> > -     raw_spinlock_t *lock = &to_pci_dev(desc->dev)->msi_lock;
> > +     struct pci_dev *pci_dev = to_pci_dev(desc->dev);
> > +     raw_spinlock_t *lock = &pci_dev->msi_lock;
> >       unsigned long flags;
> >
> >       if (!desc->pci.msi_attrib.can_mask)
> > @@ -122,8 +123,9 @@ void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
> >       raw_spin_lock_irqsave(lock, flags);
> >       desc->pci.msi_mask &= ~clear;
> >       desc->pci.msi_mask |= set;
> > -     pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->pci.mask_pos,
> > -                            desc->pci.msi_mask);
> > +     if (!pci_lu_adopt(pci_dev))
> > +             pci_write_config_dword(pci_dev, desc->pci.mask_pos,
> > +                                    desc->pci.msi_mask);
>
> This results in inconsistent state, which is a bad idea to begin
> with. How is cached software state and hardware state going to be
> brought in sync at some point?

Yes, to make the interrupt fully working we need to tell the new
kernel about the previous kernel's interrupt descriptor in IOMMU etc.
As it is, the liveupdate device interrupt is not fully working yet.
David is working on the interrupt and later there will be an interrupt
series to make interrupt working with liveupdate devices. This is just
the first baby step.

>
> If you analyzed all places, which actually depend on hardware state and
> make decisions based on it, for correctness, then you failed to provide
> that analysis. If not, no comment.

Let me clarify. Avoiding writes to devices applies only to
liveupdate devices, and only between FREEZE and FINISH. After the LUO
finish(), LUO is back in the normal state and the device can be
written again as normal, most likely by the VM. We don't want the
device state to change between FREEZE and FINISH.

>
> >       raw_spin_unlock_irqrestore(lock, flags);
> >  }
> >
> > @@ -190,6 +192,9 @@ static inline void pci_write_msg_msi(struct pci_dev *dev, struct msi_desc *desc,
> >       int pos = dev->msi_cap;
> >       u16 msgctl;
> >
> > +     if (pci_lu_adopt(dev))
> > +             return;
> > +
> >       pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
> >       msgctl &= ~PCI_MSI_FLAGS_QSIZE;
> >       msgctl |= FIELD_PREP(PCI_MSI_FLAGS_QSIZE, desc->pci.msi_attrib.multiple);
> > @@ -214,6 +219,8 @@ static inline void pci_write_msg_msix(struct msi_desc *desc, struct msi_msg *msg
> >
> >       if (desc->pci.msi_attrib.is_virtual)
> >               return;
> > +     if (pci_lu_adopt(to_pci_dev(desc->dev)))
> > +             return;
>
> So you don't allow the new kernel to write the MSI message, but the
> interrupt subsystem has this new message and there are places which
> utilize that cached message. How is this supposed to work?

We don't allow the PCI subsystem or driver to write the MSI message
before FINISH.
There are two possible ways. 1) Have someone save the incoming MSI
messages somehow, and re-deliver them after the FINISH call. 2) Don't
save the MSI messages between FREEZE and FINISH. At finish, deliver one
spurious interrupt to the device driver, so the device driver has
a chance to check if there is any pending work that needs to be done.
It is possible that no MSI has been dropped and the driver finds
there is nothing that needs to be done. We expect the driver can
tolerate such a one-time spurious interrupt. Because spurious
interrupts can happen for other reasons, that should be fine? Let
me know if there is a case where this kind of spurious interrupt can
cause a problem, we are very interested to know.
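
Option 2) relies on handlers that look at device-visible state rather
than counting interrupts. A minimal user-space sketch of such a
handler, under the assumption that the device publishes its progress in
a completion ring via DMA; struct ring and ring_irq_handler() are
hypothetical and not taken from any driver:

```c
#include <assert.h>

/* Hypothetical completion ring shared between device and driver. */
struct ring {
	unsigned int prod;	/* advanced by the device via DMA */
	unsigned int cons;	/* advanced by the driver */
	unsigned int handled;	/* total completions processed */
};

/*
 * Interrupt handler that drains whatever is pending. It does not care
 * how many interrupts were raised or lost: zero pending entries is a
 * harmless no-op, so a spurious call is tolerated.
 */
static unsigned int ring_irq_handler(struct ring *r)
{
	unsigned int n = 0;

	assert(r);
	while (r->cons != r->prod) {
		r->cons++;
		n++;
	}
	r->handled += n;
	return n;
}
```

If the device completed three entries during the blackout and all three
MSIs were dropped, a single injected interrupt at finish() still drains
all three; a second, truly spurious call finds nothing to do.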

>
> >       /*
> >        * The specification mandates that the entry is masked
> >        * when the message is modified:
> > @@ -279,7 +286,8 @@ static void pci_msi_set_enable(struct pci_dev *dev, int enable)
> >       control &= ~PCI_MSI_FLAGS_ENABLE;
> >       if (enable)
> >               control |= PCI_MSI_FLAGS_ENABLE;
> > -     pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
> > +     if (!pci_lu_adopt(dev))
> > +             pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
>
> The placement of these conditionals is arbitrary. Some are the begin of
> a function, others just block the write. Is that based on some logic or
> were the places selected by shabby AI queries?
They all can be converted to the pattern:
if (!pci_lu_adopt(dev))
      pci_write_config_xxx().

Sometimes I choose to return early if there are multiple writes but no
data stored in struct pci_dev, mostly to reduce the number of
if (!pci_lu_adopt(dev)) checks. I am not satisfied with this change yet.
The goal of this patch is to show what effect needs to happen; we can
discuss better ways to do it.
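
A user-space sketch of the guarded-write shape, modelled on the
pci_msi_update_mask() hunk quoted earlier. struct mock_dev and the
mock_* helpers are hypothetical stand-ins, not kernel API; the point is
only to show what the pattern does to the cached and hardware copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; not struct pci_dev. */
struct mock_dev {
	bool lu_adopt;		/* device inherited from the previous kernel */
	uint32_t hw_mask;	/* what the "hardware" register holds */
	uint32_t cached_mask;	/* software copy, like desc->pci.msi_mask */
};

/* The cache is always updated; the hardware only if not adopted. */
static void mock_update_mask(struct mock_dev *dev, uint32_t clear, uint32_t set)
{
	assert(dev);
	dev->cached_mask &= ~clear;
	dev->cached_mask |= set;
	if (!dev->lu_adopt)
		dev->hw_mask = dev->cached_mask;
}

/* True when the software and hardware views agree. */
static bool mock_mask_in_sync(const struct mock_dev *dev)
{
	return dev->hw_mask == dev->cached_mask;
}
```

For an adopted device the two copies diverge after the first update,
which is the inconsistency raised in the review; any final design has
to say when and how they are brought back in sync.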

>
> >  static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
> > @@ -553,6 +561,7 @@ static void pci_msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u16 set)
> >  {
> >       u16 ctrl;
> >
> > +     BUG_ON(pci_lu_adopt(dev));
>
> Not going to happen. BUG() is only appropriate when there is absolutely
> no way to handle a situation. This is as undocumented as everything else
> here.

Agree. This is some development/debug stuff left over. I haven't
encountered pci_msix_clear_and_set_ctrl() in my test. I will remove the
BUG_ON() in the next version.

>
> >       pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
> >       ctrl &= ~clear;
> >       ctrl |= set;
> > @@ -720,8 +729,9 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
> >        * registers can be accessed.  Mask all the vectors to prevent
> >        * interrupts coming in before they're fully set up.
> >        */
> > -     pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
> > -                                 PCI_MSIX_FLAGS_ENABLE);
> > +     if (!pci_lu_adopt(dev))
> > +             pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
> > +                                         PCI_MSIX_FLAGS_ENABLE);
>
> And for enhanced annoyance you sprinkle this condition everywhere into
> the code and then BUG() when you missed an instance. Because putting it
> into the function which is invoked a gazillion of times would be too
> obvious, right? That would at least be tasteful, but that's not the
> primary problem of all this.
>
> Sprinkling these conditionals all over the place is absolutely
> unmaintainable, error prone and burdens everyone with this insanity and
> the related hard to chase bugs.

If you prefer, I can move them all into pci_write_config_xxx(). We
actually started with pci_write_config_xxx(). But that solution has its
own problems as well. For starters, the function name no longer reflects
what the function actually does. There is also the complicated
case where liveupdate does need to write some config registers but not
others. e.g. From the live update point of view, PF devices
shouldn't write to SR-IOV related registers that change the number of
VF devices. But PF devices should be able to tolerate some other config
space writes, because the VM is not using the PF device. The PF device
state can be changed without impacting the VM.
It is going to be unmaintainable to put complicated logic inside
pci_write_config_xxx() that decides, depending on the caller and the
state, what is allowed and what is not.

I can discuss and try different approaches to address this problem. I
understand it is a hard problem. I don't have a perfect solution
without cons. This is just the first baby step to demonstrate what is
the resulting effect we want. Then we can shape the code to our
liking. I am happy to explore other approaches as well.

>
> Especially as there is no concept behind this and zero documentation how
> any of this should work or even be remotely correct.

I hope the above description can help you understand better why we
want to do it and the approach we take. I am happy to answer questions
if you have any. Mind you that I don't have all the answers. It is
part of the journey to find the best solution.
>
> Before you start the next hackery, please sit down and write up coherent
> explanations:
>
>   What is the general concept of this?

See above.

>
>   What is the exact state in which a device is left when the old kernel
>   jumps into the new kernel?

The device can still DMA to the regions mapped as of PREPARE and raise
interrupts. The interrupt handler will not be able to run during the
kexec blackout period (between freeze and finish). Other than the state
stored in the device, there is also PCI subsystem and device driver
state serialized in the preserved folio for the next kernel to
interpret.

>   What is the state of the MSI[-X] or legacy PCI interrupts at this
>   point?

The current approach is that, just drop the interrupt during black out
period (between freeze and finish) then deliver a spurious interrupt
to the device at finish(), that gives the device driver a chance to
perform the interrupt handler action which can't happen in black out.

>
>   Can the device raise interrupts during the transition from the old to
>   the new kernel?

Yes, it can raise interrupts, but the interrupt handler won't be able
to run during the blackout.
After finish() it is business as usual.
>
>   How is the "live" state of the device reflected and restored
>   throughout the interrupt subsystem?

Those are very good questions. The current approach just drops them and
uses the spurious interrupt to catch up in the end.
>
>   How is the device driver supposed to attach to the same interrupt
>   state as before?

We can't if we did not save the interrupt state changed during black
out. Current approach is just using a spurious interrupt to catch up
in finish().

>
>   How are the potentially different Linux interrupt numbers mapped to
>   the previous state?
The IRQ number will remain the same across kexec. However the
interrupt descriptor address might have changed in the new kernel. We
need to save some of the interrupt descriptor and interrupt state into
the preserved folio for the next kernel to rebuild. To be continued in
the interrupt series. Not covered by this patch series yet.
>
> Before this materializes and is agreed on, this is not going anywhere.

Those are very good questions. Hopefully I have answered some of it.
Please let me know if you have more questions I can clarify.

Again this is just an RFC to show the resulting effect we
want to get from the PCI device liveupdate. It is not complete nor
perfect. I am happy to explore different approaches.

Thanks for the questions. I still owe you a write up document for the
PCI device liveupdate. I will work on that.

Hope that helps explain some of the background and approach. It is not
a substitute for the document. I am working on that and will include
it in the next version.

Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-28 23:50     ` Jason Gunthorpe
@ 2025-07-30  4:13       ` Chris Li
  0 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-07-30  4:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Gleixner, Bjorn Helgaas, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Len Brown, linux-kernel,
	linux-pci, linux-acpi, David Matlack, Pasha Tatashin, Jason Miu,
	Vipin Sharma, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
	William Tu, Mike Rapoport, Leon Romanovsky, Samiullah Khawaja

On Mon, Jul 28, 2025 at 4:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > Then you sprinkle this stuff into files, which have completely different
> > purposes, without any explanation for the particular instances why they
> > are supposed to be correct and how this works.
>
> Yeah, everything needs to be very carefully explained.

Agree. I did some explanation in my last email reply to Thomas. Will
add a document for the next version.

>
> For instance I'm not sure we should be doing *anything* to the
> MSI. Why did you think so?
>
> MSI should be fully cleared by the new kernel and the new VFIO should
> re-establish all the MSI routing from scratch as part of adopting the
> device. We already accept that any interrupts are lost during the
> kexec process so what reason is there to do anything except start up the
> new kernel with a fully disabled MSI and cleared MSI?

The current approach is that we fake/inject a spurious interrupt to
the device to allow the device driver to have a chance to process any
pending action for the interrupt. There is also a possibility there is
nothing the device driver needs to do due to no interrupt having ever
triggered in the kexec window.  We expect the driver can tolerate that
spurious interrupt.

The alternative is to try to (partially) process the interrupt during
kexec, e.g. remember which IRQs had an interrupt triggered. It would
make things much more complicated. Invoking an interrupt handler in the
early boot stage, before the IOMMU is set up, is very tricky.
>
> If otherwise it should be explained why we can't work this way - and
> then explain how the new kernel will adopt the inherited operating MSI
> (hint: I doubt it can) without disrupting it.

Agree.

>
> Same remark for everything. Explain in the commits and perhaps a well
> placed comment why anything needs to be done and why exactly we can't
> use the cold boot flow for each item.

We certainly can do that.

I am trying to see if we can agree on the VFIO_PCI device used by the
VM. We don't want any config space register to change during the
liveupdate kexec (before finish). We can certainly change what config
space register might or might not break stuff. But it is going to be
very hard to test and verify what can break if we change this.

If we can draw a line and say that there are no config space writes to
the device between freeze and finish, it is much easier to reason from
the device's point of view that the device should continue working. The
device has no way of knowing the host kernel has been changed. The
device has only a limited view: its config space and the DMA area it
can read/write to. If we preserve enough stuff, the device should
continue working. For most devices, we can reason with the
model that keeping the status quo will not break things.

There is an obvious exception to that, e.g. if the device has a
watchdog timer it needs to kick at regular intervals, if that interval
is shorter than the kexec cycle. It should be pretty rare and we can
deal with those when we actually encounter one.

>
> eg "we can't use the cold boot flow for BAR sizing because BAR sizing
> requires changing the BAR register and that will break ongoing P2P
> DMAs"
>
> "we can't use the cold boot flow for bridge windows because changing
> the bridge windows in any way will break ongoing P2P DMAs" (though you
> also need to explain why the cold boot flow would change the bridge
> windows)
>
> etc etc.

For some config space registers it will be hard to be sure whether
changing them will break things or not.
e.g. The BAR base address register: if we change it to a new memory
region, and all follow-up writes to the device use the new BAR address,
should things continue working? There will be a lot of corner cases like
this; it is much easier to just avoid changing anything to keep things
consistent.

>
> There is also some complication here as the iommu driver technically
> owns some of the PCI state, and we really don't want the PCI Core to
> change it, but we do need the iommu driver to affirm what the in-use
> state should be because it is responsible to clean it up.

Yes, there is overlap between PCI and IOMMU, more than just config
space write. The IOMMU needs to know which PCI device participates,
which set of groups it needs to save. CC Samiullah here, he knows more
about the IOMMU side of the liveupdate than I do.

> This may actually require some restructuring of the iommu driver/pci
> core interfaces to switch from an enable/disable language to a 'target
> state' language. Ie "ATS shall be on and ATS page size shall be X".
>
Ack.

I have some ideas to make the PCI initialization cleaner for this
usage as well. Instead of directly initializing and turning on features
when they are found, we can do it in 3 stages:
1) Enumerate the PCI capabilities and get the list of capabilities
available, but don't turn them on yet.
2) Determine which capabilities need to be turned on/off. For normal
initialization without liveupdate, the current behavior mostly turns on
whatever can be turned on. For liveupdate devices, the on/off state
would be inherited from what the previous kernel hands off to the new
kernel, by either a) reading the device state (assuming reading the
state is possible and does not change the device state) or b) having the
previous kernel save the state into a preserved folio and the new kernel
read the state from that folio.
3) Perform the action to turn capabilities on/off according to the
result from 2). For liveupdate devices the most common case is to skip
the write, so it will be a noop. For normal initialization without
liveupdate, it will turn on the capability.
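
The three stages can be sketched as three small functions. This is a
user-space illustration under stated assumptions, not a proposal for
the actual kernel code: enum cap_id, struct cap_plan and the caps_*()
helpers are hypothetical, capabilities are reduced to on/off bits, and
"preserved" stands in for state read back from a preserved folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability set, reduced to on/off for illustration. */
enum cap_id { CAP_MSI, CAP_ATS, CAP_PASID, CAP_COUNT };

struct cap_plan {
	bool present[CAP_COUNT];	/* stage 1: what the device offers */
	bool hw_on[CAP_COUNT];		/* what the device currently has on */
	bool want_on[CAP_COUNT];	/* stage 2: the decided target */
};

/* Stage 1: record which capabilities exist, without touching them. */
static void caps_enumerate(struct cap_plan *p, unsigned int present_mask)
{
	assert(p);
	for (int c = 0; c < CAP_COUNT; c++)
		p->present[c] = (present_mask >> c) & 1u;
}

/*
 * Stage 2: decide the target. Cold boot turns on what is available.
 * Liveupdate inherits from the preserved state if given, else from
 * the state read back from the device.
 */
static void caps_decide(struct cap_plan *p, bool liveupdate,
			const bool *preserved)
{
	assert(p);
	for (int c = 0; c < CAP_COUNT; c++) {
		if (!p->present[c])
			p->want_on[c] = false;
		else if (!liveupdate)
			p->want_on[c] = true;
		else
			p->want_on[c] = preserved ? preserved[c] : p->hw_on[c];
	}
}

/* Stage 3: write only where current and target differ. */
static unsigned int caps_apply(struct cap_plan *p)
{
	unsigned int writes = 0;

	assert(p);
	for (int c = 0; c < CAP_COUNT; c++) {
		if (!p->present[c] || p->hw_on[c] == p->want_on[c])
			continue;
		p->hw_on[c] = p->want_on[c];
		writes++;
	}
	return writes;
}
```

For a liveupdate device whose target is inherited from its current
state, stage 3 performs no writes; for a cold boot the same stage turns
on whatever was found but is not yet enabled.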

> This series is very big, so I would probably try to break it up into
> smaller chunks. Like you don't need to preserve bridge windows and
> BARs if you don't support P2P. You don't need to worry about ATS and
> PASID if you don't support those, etc, etc.

Yes, I can break it to smaller chunks.

One of the deliverables of this patch series is that I can test the
liveupdate with the pci-lu-stub and pci-lub-stub-pf driver, with an
additional patch to verify that no PCI config space write is performed
on the requested PCI device during shutdown and kexec boot up.

> Yes, in the end all needs to be supported, but going bit by bit will
> be easier for people to understand. Basic VFIO support with a basic
> IOMMU using basic PCI with no P2P is the simplest thing you can do,
> and I think it needs surprisingly little preservation.

Yes, that is certainly possible ;-)

Because I am working on the PCI side of the liveupdate, there are
other developers working on VFIO and IOMMU depending on my PCI
changes. From the project development point of view the PCI change
needs to happen first, to unblock others. That is how I get here.

I can certainly break it down to smaller chunks.

Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-30  1:51     ` Chris Li
@ 2025-07-31 15:01       ` Jason Gunthorpe
  2025-08-01 23:04         ` Chris Li
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-07-31 15:01 UTC (permalink / raw)
  To: Chris Li
  Cc: Thomas Gleixner, Bjorn Helgaas, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Len Brown, linux-kernel,
	linux-pci, linux-acpi, David Matlack, Pasha Tatashin, Jason Miu,
	Vipin Sharma, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
	William Tu, Mike Rapoport, Leon Romanovsky

On Tue, Jul 29, 2025 at 06:51:27PM -0700, Chris Li wrote:

> They follow a pattern that the original kernel needs to write to the
> device and change the device state. The liveupdate device needs to
> maintain the previous state not changed, therefore needs to prevent
> such write initialization in liveupdate case.

No, I fundamentally reject this position and your testing methodology.

The new kernel *should* be writing to config space and it *should* be
doing things like clearing and gaining control over MSI. It is fully
wrong to be blocking it like you are doing just to satisfy some
incorrect qemu based test checking for no config access.

Only some config accesses are bad. Each and every "bad" one needs to be
clearly explained *why* it is bad and only then mitigated.

Most mitigations are far harder than just if'ing around the config
write. My ATS/PASID/etc example for instance.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-07-31 15:01       ` Jason Gunthorpe
@ 2025-08-01 23:04         ` Chris Li
  2025-08-02 13:50           ` Jason Gunthorpe
  0 siblings, 1 reply; 34+ messages in thread
From: Chris Li @ 2025-08-01 23:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Gleixner, Bjorn Helgaas, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Len Brown, linux-kernel,
	linux-pci, linux-acpi, David Matlack, Pasha Tatashin, Jason Miu,
	Vipin Sharma, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
	William Tu, Mike Rapoport, Leon Romanovsky, Junaid Shahid

On Thu, Jul 31, 2025 at 8:02 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jul 29, 2025 at 06:51:27PM -0700, Chris Li wrote:
>
> > They follow a pattern that the original kernel needs to write to the
> > device and change the device state. The liveupdate device needs to
> > maintain the previous state not changed, therefore needs to prevent
> > such write initialization in liveupdate case.
>
> No, I fundamentally reject this position and your testing methodology.
>
> The new kernel *should* be writing to config space and it *should* be
> doing things like clearing and gaining control over MSI. It is fully
> wrong to be blocking it like you are doing just to satisfy some
> incorrect qemu based test checking for no config access.

First of all, let me clarify that the PCI PF and VF tests I mention in
the cover letter are run on the real data center servers, not qemu.
QEMU does not have the correct IOMMU simulation for my workstation
anyway. I do use qemu in development to quickly check if I screwed up
something badly. The real test is always on the real machine. Our
internal test dashboard has reached a high two digit number now, all
with real hardware.

With that out of the way, let me explain why we did it the way we did.
I believe you and I eventually want the same thing, just by different
ways of getting there. I am also working on a series that allows fine
grained control of PCI preservation. It allows the driver to select
exactly what needs to be preserved, rather than the current
"preserved" vs "depended" control. With the fine grained control, it can
basically do what you described: allow the new kernel to write to the
config space it doesn't want to preserve. However this RFC series is
already getting very long, which is why I did not include the fine
grained control series in this RFC. Keep in mind that this is just an
RFC. I want to demonstrate the problem space and what source code needs
to be modified in order to preserve all config space. It is not the
final version that gets merged. Your feedback is important to us.

My philosophy is that the LUO PCI subsystem is for service of the PCI
device driver. Ultimately it is the PCI device driver who decides what
part of the config space they want to preserve or overwrite. The PCI
layer is just there to facilitate that service.

Regarding the testing. There are many different tests we can write and
run. Preserving all config space is just one of them.  We also have
other tests that partially preserve the config space and write to some
config as it needs to. That is why I need to have the fine grain
control series.

If you still think it is unjustifiable to have one test try to
preserve all config space for liveupdate. Please elaborate your
reasoning. I am very curious.
With the fine grained control we let the driver decide what the driver
wants to preserve vs not, will that remove your objection?

> Only some config accesses are bad. Each and every "bad" one needs to be
> clearly explained *why* it is bad and only then mitigated.

That is exactly the reason why we have the conservative test that
preserves every config space test as a starting point. It does not
mean that is the ending point.  We also have tests that preserve only
the part of the config space the driver actually needs. When things
break, we can quickly compare to find out which unpreserved register
breaks which device. This incremental approach is very effective for
dealing with very complex devices.

Another constraint is that the data center servers are dependent on
the network device able to connect to the network appropriately. Take
diorite NIC  for example, if I try only preserving ATS/PASID did not
finish the rest of liveupdate, the nic wasn't able to boot up and
connect to the network all the way. Even if the test passes for the
ATS part, the overall test fails because the server is not back online. I
can't include that test into the test dashboard, because it brings
down the server. The only way to recover from that is rebooting the
server, which takes a long time for a big server. I can only keep that
non-passing test as my own private developing test, not the regression
test set.

That is the reason we need to have some conservative tests passing
first, then expand to the more risky tests. We are actually quickly
expanding our test metrics for doing more and more interesting (and
risky) stuff.

I hope that clarifies the eventual end goal and the development
approach we take.

> Most mitigations are far harder than just if'ing around the config
> write. My ATS/PASID/etc example for instance.

Exactly why we can't add those risky(non working) tests into the
dashboard before the conservative passing one.

Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-08-01 23:04         ` Chris Li
@ 2025-08-02 13:50           ` Jason Gunthorpe
  2025-08-07  0:50             ` Chris Li
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-08-02 13:50 UTC (permalink / raw)
  To: Chris Li
  Cc: Thomas Gleixner, Bjorn Helgaas, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Len Brown, linux-kernel,
	linux-pci, linux-acpi, David Matlack, Pasha Tatashin, Jason Miu,
	Vipin Sharma, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
	William Tu, Mike Rapoport, Leon Romanovsky, Junaid Shahid

On Fri, Aug 01, 2025 at 04:04:39PM -0700, Chris Li wrote:
> My philosophy is that the LUO PCI subsystem is for service of the PCI
> device driver. Ultimately it is the PCI device driver who decides what
> part of the config space they want to preserve or overwrite. The PCI
> layer is just there to facilitate that service.

I don't think this makes any sense at all. There is nothing the device
driver can contribute here.
 
> If you still think it is unjustifiable to have one test try to
> preserve all config space for liveupdate. 

I do think it is unjustifiable, it is architecturally wrong. You only
should be preserving the absolute bare minimum of config space bits
and everything else should be rewritten by the next kernel in the
normal way. This MSI is a prime example of a nonsensical outcome if
you take the position the config space should not be written to.

> > Only some config accesses are bad. Each and every "bad" one needs to be
> > clearly explained *why* it is bad and only then mitigated.
> 
> That is exactly the reason why we have the conservative test that
> preserves every config space test as a starting point. 

That is completely the opposite of what I said. Preserving everything
is giving up on the harder job of identifying which bits cannot be
changed, explaining why they can't be changed, and then mitigating
only those things.

> Another constraint is that the data center servers depend on the
> network device being able to connect to the network properly. Take the
> diorite NIC for example: when I tried preserving only ATS/PASID without
> finishing the rest of liveupdate, the NIC wasn't able to boot up and
> connect to the network all the way. Even if the test passes for the
> ATS part, the overall test fails because the server is not back online.
> I can't include that test in the test dashboard, because it brings
> down the server. The only way to recover from that is rebooting the
> server, which takes a long time for a big server. I can only keep that
> non-passing test as my own private development test, not in the
> regression test set.

I have no idea what this is trying to say and it sounds like you also
can't explain exactly what is "wrong" and justify why things are being
preserved.

Again, your series should be starting simpler. Preserve the dumbest
simplest PCI configuration. Certainly no switches, P2P, ATS or
PASID. When that is working you can then add on more complex PCI
features piece by piece.

Jason


* Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
  2025-08-02 13:50           ` Jason Gunthorpe
@ 2025-08-07  0:50             ` Chris Li
  0 siblings, 0 replies; 34+ messages in thread
From: Chris Li @ 2025-08-07  0:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Gleixner, Bjorn Helgaas, Greg Kroah-Hartman,
	Rafael J. Wysocki, Danilo Krummrich, Len Brown, linux-kernel,
	linux-pci, linux-acpi, David Matlack, Pasha Tatashin, Jason Miu,
	Vipin Sharma, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
	William Tu, Mike Rapoport, Leon Romanovsky, Junaid Shahid

Hi Jason,

Thanks for your feedback.

On Sat, Aug 2, 2025 at 6:50 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Aug 01, 2025 at 04:04:39PM -0700, Chris Li wrote:
> > My philosophy is that the LUO PCI subsystem is for service of the PCI
> > device driver. Ultimately it is the PCI device driver who decides what
> > part of the config space they want to preserve or overwrite. The PCI
> > layer is just there to facilitate that service.
>
> I don't think this makes any sense at all. There is nothing the device
> driver can contribute here.

My thinking is that the device driver owner knows a lot more about the
device internals, e.g. why it needs to preserve this or that register,
whereas the PCI layer might not know much about the internal device
behavior.

> > If you still think it is unjustifiable to have one test try to
> > preserve all config space for liveupdate.
>
> I do think it is unjustifiable, it is architecturally wrong. You only
> should be preserving the absolute bare minimum of config space bits
> and everything else should be rewritten by the next kernel in the
> normal way. This MSI is a prime example of a nonsensical outcome if
> you take the position the config space should not be written to.

OK. Let me rework the V2 with your approach.

>
> > > Only some config accesses are bad. Each and every "bad" one needs to be
> > > clearly explained *why* it is bad and only then mitigated.
> >
> > That is exactly the reason why we have the conservative test that
> > preserves every config space test as a starting point.
>
> That is completely the opposite of what I said. Preserving everything
> is giving up on the harder job of identifying which bits cannot be
> changed, explaining why they can't be changed, and then mitigating
> only those things.

We can still preserve everything and then work backwards to preserve
less. As I said, I will rework V2 with your approach, preserving the
bare minimum as the starting point.

> > Another constraint is that the data center servers depend on the
> > network device being able to connect to the network properly. Take the
> > diorite NIC for example: when I tried preserving only ATS/PASID without
> > finishing the rest of liveupdate, the NIC wasn't able to boot up and
> > connect to the network all the way. Even if the test passes for the
> > ATS part, the overall test fails because the server is not back online.
> > I can't include that test in the test dashboard, because it brings
> > down the server. The only way to recover from that is rebooting the
> > server, which takes a long time for a big server. I can only keep that
> > non-passing test as my own private development test, not in the
> > regression test set.
>
> I have no idea what this is trying to say and it sounds like you also
> can't explain exactly what is "wrong" and justify why things are being
> preserved.

I know which register is causing the trouble, but I think we follow
different philosophies and are addressing the problem from opposite
ends. Another consideration is the device testing matrix. A kexec with
device liveupdate is a rare event. Re-initializing that much device
state might trigger some very rare bug in the device or firmware. So a
failure might be due to the device's internal implementation, even
where the PCI spec says otherwise or leaves the behavior undefined.

Anyway, let me do it your way in V2 then.

> Again, your series should be starting simpler. Preserve the dumbest
> simplest PCI configuration. Certainly no switches, P2P, ATS or
> PASID. When that is working you can then add on more complex PCI
> features piece by piece.

With V1, the patch series deliverable was to have an Intel diorite
NVMe device preserve every config space access, and to pass that to the
vfio and iommu people so they can build the vfio and iommu support on
top of it. Let's forget about V1.

With V2 I want to start from the minimal end: no switches, P2P, ATS or
PASID. I need some help defining the deliverable for such a minimal
preservation. E.g. is it enough to read back the config values, see
that they are unchanged, and call it a day? Or should I expect to see
the device fully initialized and usable by user space? Will the device
need to perform any DMA? Interrupts?

I will probably pick a device that is as simple as possible and is
attached directly to the root PCI host bridge, not behind a PCI-PCI
bridge. Maybe with no interrupts as the first step. One possibility is
using the Intel DSA device that does the DMA streaming.

If you have any other feedback on the candidate device and deliverable
test for V2, I am looking forward to it.

Thanks.

Chris


Thread overview: 34+ messages
2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
2025-07-28  8:24 ` [PATCH RFC 01/25] PCI/LUO: Register with Liveupdate Orchestrator Chris Li
2025-07-28  8:24 ` [PATCH RFC 02/25] PCI/LUO: Add struct dev_liveupdate Chris Li
2025-07-28  8:24 ` [PATCH RFC 03/25] PCI/LUO: Create requested liveupdate device list Chris Li
2025-07-28  8:24 ` [PATCH RFC 04/25] PCI/LUO: Forward prepare()/freeze()/cancel() callbacks to driver Chris Li
2025-07-28  8:24 ` [PATCH RFC 05/25] PCI/LUO: Restore state at PCI enumeration Chris Li
2025-07-28  8:24 ` [PATCH RFC 06/25] PCI/LUO: Forward finish callbacks to drivers Chris Li
2025-07-28  8:24 ` [PATCH RFC 07/25] PCI/LUO: Save and restore driver name Chris Li
2025-07-28  8:24 ` [PATCH RFC 08/25] PCI/LUO: Add liveupdate to pcieport driver Chris Li
2025-07-28  8:24 ` [PATCH RFC 09/25] PCI/LUO: Save SR-IOV number of VF Chris Li
2025-07-28  8:24 ` [PATCH RFC 10/25] PCI/LUO: Add pci_liveupdate_get_driver_data() Chris Li
2025-07-28  8:24 ` [PATCH RFC 11/25] PCI: pci-lu-stub: Add a stub driver for Live Update testing Chris Li
2025-07-28  8:24 ` [PATCH RFC 12/25] PCI/LUO: Save struct pci_dev info during prepare phase chrisl
2025-07-28  8:24 ` [PATCH RFC 13/25] PCI/LUO: Check the device function numbers in restoration chrisl
2025-07-28  8:24 ` [PATCH RFC 14/25] PCI/LUO: Restore power state of a PCI device chrisl
2025-07-28  8:24 ` [PATCH RFC 15/25] PCI/LUO: Restore PM related fields chrisl
2025-07-28  8:24 ` [PATCH RFC 16/25] PCI/LUO: Restore the pme_poll flag chrisl
2025-07-28  8:24 ` [PATCH RFC 17/25] PCI/LUO: Restore the no_d3cold flag chrisl
2025-07-28  8:24 ` [PATCH RFC 18/25] PCI/LUO: Restore pci_dev fields during probe chrisl
2025-07-28  8:24 ` [PATCH RFC 19/25] PCI/LUO: Track liveupdate buses Chris Li
2025-07-28  8:24 ` [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot Chris Li
2025-07-28 17:23   ` Thomas Gleixner
2025-07-28 23:50     ` Jason Gunthorpe
2025-07-30  4:13       ` Chris Li
2025-07-30  1:51     ` Chris Li
2025-07-31 15:01       ` Jason Gunthorpe
2025-08-01 23:04         ` Chris Li
2025-08-02 13:50           ` Jason Gunthorpe
2025-08-07  0:50             ` Chris Li
2025-07-28  8:24 ` [PATCH RFC 21/25] PCI/LUO: Save and restore the PCI resource chrisl
2025-07-28  8:24 ` [PATCH RFC 22/25] PCI/LUO: Save PCI bus and host bridge states chrisl
2025-07-28  8:24 ` [PATCH RFC 23/25] PCI/LUO: Check the PCI bus state after restoration chrisl
2025-07-28  8:24 ` [PATCH RFC 24/25] PCI: pci-lu-pf-stub: Add a PF stub driver for Live Update testing Chris Li
2025-07-28  8:24 ` [PATCH RFC 25/25] PCI/LUO: Clean up PCI_SER_GET() chrisl
