* [RFC PATCH 00/15] iommu: Add live update state preservation
@ 2025-09-28 19:06 Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 01/15] iommu/vt-d: Register with Live Update Orchestrator Samiullah Khawaja
` (14 more replies)
0 siblings, 15 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Hi,
This RFC patch series introduces a mechanism for IOMMU state
preservation across live update, using the Intel VT-d driver as the
initial example implementation and demonstration platform.
For background on Kexec Handover (KHO) and the Live Update Orchestrator
(LUO), please see the following LWN article:
https://lwn.net/Articles/1033364/
This work is based on the LUOv3 patch series listed below. Please find
the details of various live update states, file descriptor and subsystem
preservation callbacks, and memory preservation mechanisms in the LUOv3
series.
https://lore.kernel.org/all/20250807014442.3829950-1-pasha.tatashin@soleen.com/
The kernel tree with all dependencies is uploaded to the following
GitHub location:
https://github.com/googleprodkernel/linux-liveupdate/tree/iommu/rfc-v1
Overall Goals:
The goal of this effort is to preserve the iommufd-managed IOMMU domains
of devices marked for preservation. This allows the DMA mappings and
IOMMU context of a device assigned to a VM to be maintained across a
live update.
Ultimately, this will be achieved by preserving the IOMMU page tables,
the IOMMU root table, and the relevant context entries across live update.
Current Implementation, Scope and Limitations:
This RFC provides foundational mechanisms and demonstrates the
end-to-end workflow. It only implements the preservation of the minimum
IOMMU state, which includes the root table and context tables.
Specifically, it includes:
- Registration of the Intel VT-d IOMMU driver with the Live Update
Orchestrator.
- Registration of iommufd as a file handler with Live Update
Orchestrator.
- A subsystem-wide rw_semaphore to protect live update state and
operations.
- An API, iommu_domain_preserve(), to preserve IOMMU domains. Currently
it only marks them as preserved.
- Implementation for preserving and restoring the Intel IOMMU root and
context tables.
- A selftest to validate the end-to-end preservation and restoration of
an iommufd file descriptor.
This version does not yet preserve the DMA mappings (page tables)
themselves. This means that ongoing DMA from a device will not continue
to work across the live update. This is a known limitation that will be
addressed in future work.
It is important to note that the preservation of the device state itself
is outside the scope of this series.
The series also does not yet include a versioning scheme for the
persisted state; this will be added later.
Target Architectural Overview:
The target architecture for IOMMU state preservation across a live
update involves coordination between the Live Update Orchestrator,
iommufd, and the IOMMU drivers.
The core design uses the Live Update Orchestrator's file descriptor
preservation mechanism to preserve iommufd file descriptors. During
preservation, the LUO prepare callback for an iommufd walks through the
IOMMU domains it manages to identify the ones associated with devices
marked for preservation. Once identified, Generic Page Table support
will be used to preserve the page tables of these domains. The domains
are then marked as preserved.
The Live Update Orchestrator's subsystem mechanism will be used to
preserve the IOMMU context entries and the associated root table.
It is important to note that the preservation of the device state is
outside the scope of this patch series. This series focuses solely on
the IOMMU subsystem's role in supporting live update for such preserved
devices.
Critical Design Considerations:
After a live update, the IOMMU domain can be restored using one of two
approaches:
1. Reuse the preserved page tables:
During boot the next kernel can prepare the new domain reusing the
existing preserved page tables and reattach the devices to it. The
restored domain can be retrieved and reclaimed when the iommufd file
descriptor is restored.
2. Hotswap a new domain on finish:
During boot the next kernel can set up domains for all the preserved
devices without updating context entries, so these devices can keep
using the old preserved page tables. The userspace VMM can restore the
iommufd, create an IOAS/HWPT, attach devices to it, and set up DMA
mappings. Once the Live Update Orchestrator moves to the finish state,
the context entries of the preserved devices can be updated and replaced
with the new IOMMU domains and page tables that are built in the new
kernel.
I am inclined towards the "Hotswap" approach, as it involves restoring
the minimum state from the previous kernel and lets user space
regenerate the mappings. This provides a clean way of discarding the old
kernel state and using the new kernel data structures. I will share more
details on the specifics of this approach in future versions of this
series.
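The ordering behind the "Hotswap" approach can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
the toy_* names and the single "root" value per device are hypothetical
stand-ins for context entries and page-table roots, and none of this is
code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model of one preserved device; every name here is hypothetical. */
struct toy_device {
	uint64_t ctx_root;	/* page-table root the context entry points at */
	uint64_t pending_root;	/* root built by the new kernel, 0 if none yet */
};

/* Boot: the context entry is restored as-is from the previous kernel. */
static void toy_restore(struct toy_device *dev, uint64_t preserved_root)
{
	dev->ctx_root = preserved_root;
	dev->pending_root = 0;
}

/* Attach: remember the new domain, but leave the context entry alone. */
static void toy_attach_deferred(struct toy_device *dev, uint64_t new_root)
{
	assert(new_root != 0);
	dev->pending_root = new_root;
}

/*
 * FINISH: point the context entry at the new root and hand back the old
 * root so the caller can release the preserved page tables.
 */
static int toy_finish(struct toy_device *dev, uint64_t *old_root)
{
	if (!dev->pending_root)
		return -EINVAL;	/* user space never rebuilt the mappings */

	*old_root = dev->ctx_root;
	dev->ctx_root = dev->pending_root;
	dev->pending_root = 0;
	return 0;
}
```

In this model the device keeps translating through the preserved root
until toy_finish() succeeds, which is the property that makes the old
kernel's state easy to discard afterwards.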
High-Level Sequence Flow:
The following diagrams illustrate the high-level interactions during the
preservation phase. The diagrams also contain parts that are not
implemented in this series.
Prepare:
Before live update, the PREPARE event of the Live Update Orchestrator
invokes the callbacks of the registered file and subsystem handlers.
Userspace (VMM) | LUO | iommufd | IOMMU Core | Driver
-----------------|---------|-----------------|-----------------|--------
| | | |
Preserve iommufd | | | |
-----------------> | | |
| register| | |
<----------------- | | |
| | | |
| | | |
PREPARE | | | |
-----------------> | | |
| | | |
| Call FS | | |
| handle | | |
|---------> | |
| | Preserve Domain | |
| |-----------------> |
| | | Preserve using |
| | | Generic-Page |
| | | Tables |
| | |----------------->
| | | | Preserve
| | | | Domain
| | <------------------
| <------------------ |
| | Return phys | |
| save | Address of | |
<---------- state | |
| | | |
| | | |
| subsys | | |
| handle | | |
|--------------------------------------------->
| | | | Save iommu
| | | | state
| | | |
| | | | Return phys
| | | | Address of
| | | | state
| <------------------------------------
| save | | |
Restore:
After a live update, the preserved state is restored during boot and/or
when userspace retrieves the preserved FDs.
Userspace (VMM) | LUO | iommufd | IOMMU Core | Driver
-----------------|---------|-----------------|-----------------|--------
| | | | Init
| | | |
| | | | get phys
| | | | address
| <------------------------------------
| Return | | |
| addr | | |
| ------------------------------------>
| | | | Restore root
| | | | table
| | | |
Retrieve iommufd | | | |
-----------------> Call FS | | |
| handle | | |
|---------> | |
| | Restore | |
<---------- | |
| | | |
Attach IOAS | | | |
---------------------------> | |
| | Attach | |
| ------------------> |
| | | attach |
| | ------------------> Attach domain
| | | | w/o context
| | | | update
| | <------------------
<---------------------------- |
| | | |
| | | |
FINISH | | | |
-----------------> | | |
|FS handle| | |
----------> | |
| | Hotswap context | |
| ------------------> |
| | | Update Context |
| | |----------------->
| | | | Update
| | | | Context
| | Release old <------------------
| | page tables | |
| <------------------ |
| | | |
Tested:
This series was tested using QEMU with virtual IOMMU (VT-d) support. The
workflow was validated using a guest with a virtio-net device bound to
the vfio-pci driver.
The new iommufd_liveupdate selftest was used to verify the end-to-end
preservation logic:
1. The selftest is run for the first time. It opens the VFIO device,
attaches it to an iommufd instance, and then uses the
LIVEUPDATE_IOCTL_FD_PRESERVE ioctl to mark the iommufd file descriptor
for preservation.
2. The test then triggers the LIVEUPDATE_PREPARE event, which in turn
triggers the preservation of the iommufd instance and the IOMMU
state.
3. The guest is rebooted using kexec.
4. After reboot, the selftest is run a second time. It detects the
LIVEUPDATE_STATE_UPDATED state and restores the iommufd file
descriptor via the LIVEUPDATE_IOCTL_FD_RESTORE ioctl.
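The four steps above walk the orchestrator through a sequence of states.
A toy state machine can make that sequence explicit. It is a sketch
only: the TOY_* states and events are hypothetical names, and the
transition table is a simplified assumption based on the prepare,
cancel, freeze and finish callbacks mentioned in this series, not the
LUO implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; "updated" is what the second selftest run sees. */
enum toy_state { TOY_NORMAL, TOY_PREPARED, TOY_FROZEN, TOY_UPDATED };

/* Hypothetical events; TOY_EV_KEXEC stands for booting the next kernel. */
enum toy_event {
	TOY_EV_PREPARE,
	TOY_EV_CANCEL,
	TOY_EV_FREEZE,
	TOY_EV_KEXEC,
	TOY_EV_FINISH,
};

/* Apply one event; returns 0 on a valid transition, -EINVAL otherwise. */
static int toy_step(enum toy_state *state, enum toy_event ev)
{
	switch (*state) {
	case TOY_NORMAL:
		if (ev == TOY_EV_PREPARE) {
			*state = TOY_PREPARED;
			return 0;
		}
		break;
	case TOY_PREPARED:
		if (ev == TOY_EV_CANCEL) {
			*state = TOY_NORMAL;
			return 0;
		}
		if (ev == TOY_EV_FREEZE) {
			*state = TOY_FROZEN;
			return 0;
		}
		break;
	case TOY_FROZEN:
		if (ev == TOY_EV_KEXEC) {
			*state = TOY_UPDATED;
			return 0;
		}
		break;
	case TOY_UPDATED:
		if (ev == TOY_EV_FINISH) {
			*state = TOY_NORMAL;
			return 0;
		}
		break;
	}
	return -EINVAL;
}
```

In terms of the selftest, the first run corresponds to the PREPARE
transition, the kexec reboot covers the freeze and boot of the next
kernel, and the second run observes the updated state before finish.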
Future Work:
This RFC is the foundation for a more complete solution. The planned
next steps are:
- Implement the chosen page table preservation and restoration strategy
(Hotswap or Reuse).
- Keep the IOMMU translation enabled during shutdown.
- Add support for preserving PASID tables for devices that use them.
- Implement a versioning scheme for serialized data to ensure
compatibility across kernel versions.
- Extend support to other IOMMU architectures (e.g., AMD-Vi, Arm SMMUv3).
I am looking forward to feedback on this initial approach and the target
architecture.
Samiullah Khawaja (12):
iommu/vt-d: Register with Live Update Orchestrator
iommu: Add rw_semaphore to serialize live update state
iommu/vt-d: Prevent hotplugs when live update state is not normal
iommu: Add preserve iommu_domain op
iommu: Introduce API to preserve iommu domain
iommu/vt-d: Add stub intel iommu domain preserve op
iommu/vt-d: Add implementation of live update prepare callback
iommu/vt-d: Implement live update preserve_iommu_context
iommu/vt-d: Add live update freeze callback
iommu/vt-d: Restore iommu root_table and context on live update
iommu/vt-d: sanitize restored root table and iommu contexts
iommufd/selftest: Add test to verify iommufd preservation
YiFei Zhu (3):
iommufd: Add basic skeleton based on liveupdate_file_handle
iommufd-luo: Implement basic prepare/cancel/finish/retrieve using
folios
iommufd: Persist iommu domains for live update
MAINTAINERS | 2 +
drivers/iommu/intel/Makefile | 1 +
drivers/iommu/intel/dmar.c | 9 +
drivers/iommu/intel/iommu.c | 15 +-
drivers/iommu/intel/iommu.h | 9 +
drivers/iommu/intel/liveupdate.c | 401 ++++++++++++++++++
drivers/iommu/iommu.c | 24 ++
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/iommufd_private.h | 27 ++
drivers/iommu/iommufd/liveupdate.c | 236 +++++++++++
drivers/iommu/iommufd/main.c | 16 +-
include/linux/iommu.h | 22 +
tools/testing/selftests/iommu/Makefile | 1 +
.../selftests/iommu/iommufd_liveupdate.c | 196 +++++++++
14 files changed, 956 insertions(+), 4 deletions(-)
create mode 100644 drivers/iommu/intel/liveupdate.c
create mode 100644 drivers/iommu/iommufd/liveupdate.c
create mode 100644 tools/testing/selftests/iommu/iommufd_liveupdate.c
base-commit: 454219033bd8093293af8fbd4de47142530bdedc
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply [flat|nested] 53+ messages in thread
* [RFC PATCH 01/15] iommu/vt-d: Register with Live Update Orchestrator
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 02/15] iommu: Add rw_semaphore to serialize live update state Samiullah Khawaja
` (13 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Register the Intel IOMMU driver with the Live Update Orchestrator as a
subsystem. Add stub implementations of the prepare, cancel and finish
callbacks.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
MAINTAINERS | 2 ++
drivers/iommu/intel/Makefile | 1 +
drivers/iommu/intel/liveupdate.c | 45 ++++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+)
create mode 100644 drivers/iommu/intel/liveupdate.c
diff --git a/MAINTAINERS b/MAINTAINERS
index baeda8a526aa..e038cdd6aa41 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14228,6 +14228,7 @@ F: tools/testing/selftests/livepatch/
LIVE UPDATE
M: Pasha Tatashin <pasha.tatashin@soleen.com>
R: Pratyush Yadav <pratyush@kernel.org>
+R: Samiullah Khawaja <skhawaja@google.com>
L: linux-kernel@vger.kernel.org
S: Maintained
F: Documentation/ABI/testing/sysfs-kernel-liveupdate
@@ -14235,6 +14236,7 @@ F: Documentation/admin-guide/liveupdate.rst
F: Documentation/core-api/liveupdate.rst
F: Documentation/mm/memfd_preservation.rst
F: Documentation/userspace-api/liveupdate.rst
+F: drivers/iommu/intel/liveupdate.c
F: include/linux/liveupdate.h
F: include/uapi/linux/liveupdate.h
F: kernel/liveupdate/
diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index ada651c4a01b..58922d580c79 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
obj-$(CONFIG_INTEL_IOMMU_SVM) += svm.o
obj-$(CONFIG_IRQ_REMAP) += irq_remapping.o
obj-$(CONFIG_INTEL_IOMMU_PERF_EVENTS) += perfmon.o
+obj-$(CONFIG_LIVEUPDATE) += liveupdate.o
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
new file mode 100644
index 000000000000..d73d780d7e19
--- /dev/null
+++ b/drivers/iommu/intel/liveupdate.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, Google LLC
+ * Author: Samiullah Khawaja <skhawaja@google.com>
+ */
+
+#define pr_fmt(fmt) "iommu: liveupdate: " fmt
+
+#include <linux/liveupdate.h>
+#include <linux/module.h>
+
+static int intel_liveupdate_prepare(struct liveupdate_subsystem *handle, u64 *data)
+{
+ pr_warn("Not implemented\n");
+ return 0;
+}
+
+static void intel_liveupdate_cancel(struct liveupdate_subsystem *handle, u64 data)
+{
+ pr_warn("Not implemented\n");
+}
+
+static void intel_liveupdate_finish(struct liveupdate_subsystem *handle, u64 data)
+{
+ pr_warn("Not implemented\n");
+}
+
+static struct liveupdate_subsystem_ops intel_liveupdate_subsystem_ops = {
+ .prepare = intel_liveupdate_prepare,
+ .finish = intel_liveupdate_finish,
+ .cancel = intel_liveupdate_cancel,
+};
+
+static struct liveupdate_subsystem intel_liveupdate_subsystem = {
+ .name = "intel-iommu",
+ .ops = &intel_liveupdate_subsystem_ops,
+};
+
+static int __init intel_liveupdate_init(void)
+{
+ WARN_ON_ONCE(liveupdate_register_subsystem(&intel_liveupdate_subsystem));
+ return 0;
+}
+
+late_initcall(intel_liveupdate_init);
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [RFC PATCH 02/15] iommu: Add rw_semaphore to serialize live update state
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 01/15] iommu/vt-d: Register with Live Update Orchestrator Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal Samiullah Khawaja
` (12 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Introduce a rw_semaphore to protect the IOMMU live update state.
When a live update operation (prepare, cancel, etc.) is in progress, the
underlying state of the IOMMU subsystem (e.g., the set of active
hardware units, the state of preserved domains) must not be changed by
concurrent events.
This semaphore acts as a subsystem-wide lock to serialize the LUO
callbacks against any other code path that might modify this state,
such as IOMMU hardware hotplug.
The LUO callbacks take a write lock, as they modify the live update
state. Other code paths should take a read lock to check if a live
update is in progress.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/iommu.c | 4 ++++
include/linux/iommu.h | 9 +++++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 060ebe330ee1..bfa7c8653720 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2052,6 +2052,10 @@ struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
}
EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
+#ifdef CONFIG_LIVEUPDATE
+DECLARE_RWSEM(liveupdate_state_rwsem);
+#endif
+
void iommu_domain_free(struct iommu_domain *domain)
{
switch (domain->cookie_type) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c30d12e16473..d23d078f7c18 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -598,6 +598,15 @@ __iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
__iommu_copy_struct_to_user(user_data, ksrc, data_type, sizeof(*ksrc), \
offsetofend(typeof(*ksrc), min_last))
+#ifdef CONFIG_LIVEUPDATE
+extern struct rw_semaphore liveupdate_state_rwsem;
+#define guard_liveupdate_state_read() guard(rwsem_read)(&liveupdate_state_rwsem)
+#define guard_liveupdate_state_write() guard(rwsem_write)(&liveupdate_state_rwsem)
+#else
+#define guard_liveupdate_state_read()
+#define guard_liveupdate_state_write()
+#endif /* CONFIG_LIVEUPDATE */
+
/**
* struct iommu_ops - iommu ops and capabilities
* @capable: check capability
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related [flat|nested] 53+ messages in thread
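The guard_liveupdate_state_read()/write() macros in patch 02 build on
the kernel's scope-based guard() helpers, which rely on the compiler's
cleanup attribute to drop the lock on every return path. The following
userspace sketch imitates that pattern with a toy, single-threaded lock
that only counts holders. All toy_* names are hypothetical, and the
example assumes GCC or Clang for __attribute__((cleanup)).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a rw_semaphore; it does no real locking. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static void toy_down_read(struct toy_rwsem *sem)
{
	assert(!sem->writer);
	sem->readers++;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void toy_read_guard_exit(struct toy_rwsem **sem)
{
	toy_up_read(*sem);
}

/* Take the read lock now and release it automatically at scope exit. */
#define toy_guard_read(sem)						\
	struct toy_rwsem *toy_guard					\
		__attribute__((cleanup(toy_read_guard_exit), unused)) =	\
		(toy_down_read(sem), (sem))

/* Shaped like a hotplug path: bail out early unless the state is normal. */
static int toy_hotplug(struct toy_rwsem *sem, bool state_normal)
{
	toy_guard_read(sem);

	if (!state_normal)
		return -EBUSY;	/* the guard still releases the read lock */

	assert(sem->readers == 1);	/* work happens under the lock */
	return 0;
}
```

The point of the pattern is that the early -EBUSY return needs no
explicit unlock, which keeps call sites such as a hotplug handler short.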
* [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 01/15] iommu/vt-d: Register with Live Update Orchestrator Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 02/15] iommu: Add rw_semaphore to serialize live update state Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-29 15:51 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 04/15] iommu: Add preserve iommu_domain op Samiullah Khawaja
` (11 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Hotplugs should not be allowed when the live update state is not normal.
A non-normal state means that we have either preserved the state of the
IOMMU hardware units or are restoring the preserved state.
The live update semaphore read lock should be taken before checking the
live update state.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/dmar.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index ec975c73cfe6..248bc7e9b035 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -26,6 +26,7 @@
#include <linux/dmi.h>
#include <linux/slab.h>
#include <linux/iommu.h>
+#include <linux/liveupdate.h>
#include <linux/numa.h>
#include <linux/limits.h>
#include <asm/irq_remapping.h>
@@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
if (tmp == NULL)
return 0;
+ guard_liveupdate_state_read();
+ if (!liveupdate_state_normal())
+ return -EBUSY;
+
down_write(&dmar_global_lock);
if (insert)
ret = dmar_hotplug_insert(tmp);
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [RFC PATCH 04/15] iommu: Add preserve iommu_domain op
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (2 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain Samiullah Khawaja
` (10 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Add an optional preserve iommu_domain op that can be implemented by the
iommu drivers to preserve the iommu domain.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
include/linux/iommu.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d23d078f7c18..40801d8eac61 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -758,6 +758,8 @@ struct iommu_ops {
* specific mechanisms.
* @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
* @free: Release the domain after use.
+ * @preserve: Preserve the iommu domain for liveupdate.
+ * Returns 0 on success, a negative errno on failure.
*/
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
@@ -787,6 +789,7 @@ struct iommu_domain_ops {
unsigned long quirks);
void (*free)(struct iommu_domain *domain);
+ int (*preserve)(struct iommu_domain *domain);
};
/**
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (3 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 04/15] iommu: Add preserve iommu_domain op Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-29 15:54 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 06/15] iommu/vt-d: Add stub intel iommu domain preserve op Samiullah Khawaja
` (9 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Add an API that can be called by iommu users to preserve an iommu
domain. Currently it only marks the iommu_domain as preserved.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/iommu.c | 20 ++++++++++++++++++++
include/linux/iommu.h | 10 ++++++++++
2 files changed, 30 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bfa7c8653720..2e6e9c3f26ec 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2002,6 +2002,10 @@ static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
domain->owner = ops;
if (!domain->ops)
domain->ops = ops->default_domain_ops;
+
+#ifdef CONFIG_LIVEUPDATE
+ atomic_set(&domain->preserved, 0);
+#endif
}
static struct iommu_domain *
@@ -2054,6 +2058,22 @@ EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
#ifdef CONFIG_LIVEUPDATE
DECLARE_RWSEM(liveupdate_state_rwsem);
+
+int iommu_domain_preserve(struct iommu_domain *domain)
+{
+ int ret;
+
+ lockdep_assert_held(&liveupdate_state_rwsem);
+ if (!domain->ops->preserve)
+ return -EOPNOTSUPP;
+
+ ret = domain->ops->preserve(domain);
+ if (!ret)
+ atomic_set(&domain->preserved, 1);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_domain_preserve);
#endif
void iommu_domain_free(struct iommu_domain *domain)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 40801d8eac61..aafd06134f5c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
#include <linux/err.h>
#include <linux/of.h>
#include <linux/iova_bitmap.h>
+#include <linux/atomic.h>
#include <uapi/linux/iommufd.h>
#define IOMMU_READ (1 << 0)
@@ -248,6 +249,10 @@ struct iommu_domain {
struct list_head next;
};
};
+
+#ifdef CONFIG_LIVEUPDATE
+ atomic_t preserved;
+#endif
};
static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -915,6 +920,11 @@ static inline struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
{
return iommu_paging_domain_alloc_flags(dev, 0);
}
+
+#ifdef CONFIG_LIVEUPDATE
+int iommu_domain_preserve(struct iommu_domain *domain);
+#endif
+
extern void iommu_domain_free(struct iommu_domain *domain);
extern int iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related [flat|nested] 53+ messages in thread
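The control flow of iommu_domain_preserve() in patch 05 (an optional
driver op, with the domain marked as preserved only when the op
succeeds) can be modelled in plain C. This is a userspace sketch with
hypothetical toy_* types. It uses a bool where the patch uses an
atomic_t and omits the lockdep assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_domain;

struct toy_domain_ops {
	/* Optional: drivers without preservation support leave this NULL. */
	int (*preserve)(struct toy_domain *domain);
};

struct toy_domain {
	const struct toy_domain_ops *ops;
	bool preserved;
};

/* Call the driver op if present; mark the domain only on success. */
static int toy_domain_preserve(struct toy_domain *domain)
{
	int ret;

	if (!domain->ops->preserve)
		return -EOPNOTSUPP;

	ret = domain->ops->preserve(domain);
	if (!ret)
		domain->preserved = true;

	return ret;
}

/* Two hypothetical driver callbacks used to exercise both outcomes. */
static int toy_preserve_ok(struct toy_domain *domain)
{
	(void)domain;
	return 0;
}

static int toy_preserve_fail(struct toy_domain *domain)
{
	(void)domain;
	return -ENOMEM;
}
```

A caller can therefore distinguish "driver cannot preserve" (-EOPNOTSUPP)
from "driver tried and failed" (any other negative errno), and the
preserved flag stays clear in both cases.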
* [RFC PATCH 06/15] iommu/vt-d: Add stub intel iommu domain preserve op
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (4 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 07/15] iommu/vt-d: Add implementation of live update prepare callback Samiullah Khawaja
` (8 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Add a stub implementation of iommu domain preserve for the Intel IOMMU
driver. This is required so that the iommu_domain is marked as
preserved.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/iommu.c | 6 ++++++
drivers/iommu/intel/iommu.h | 4 ++++
drivers/iommu/intel/liveupdate.c | 9 +++++++++
3 files changed, 19 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e236c7ec221f..7035ffca020f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4472,6 +4472,9 @@ const struct iommu_domain_ops intel_fs_paging_domain_ops = {
.iova_to_phys = intel_iommu_iova_to_phys,
.free = intel_iommu_domain_free,
.enforce_cache_coherency = intel_iommu_enforce_cache_coherency_fs,
+#ifdef CONFIG_LIVEUPDATE
+ .preserve = intel_iommu_domain_liveupdate_preserve,
+#endif
};
const struct iommu_domain_ops intel_ss_paging_domain_ops = {
@@ -4485,6 +4488,9 @@ const struct iommu_domain_ops intel_ss_paging_domain_ops = {
.iova_to_phys = intel_iommu_iova_to_phys,
.free = intel_iommu_domain_free,
.enforce_cache_coherency = intel_iommu_enforce_cache_coherency_ss,
+#ifdef CONFIG_LIVEUPDATE
+ .preserve = intel_iommu_domain_liveupdate_preserve,
+#endif
};
const struct iommu_ops intel_iommu_ops = {
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 3056583d7f56..6b69232efffd 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1345,6 +1345,10 @@ static inline int iopf_for_domain_replace(struct iommu_domain *new,
return 0;
}
+#ifdef CONFIG_LIVEUPDATE
+int intel_iommu_domain_liveupdate_preserve(struct iommu_domain *domain);
+#endif
+
#ifdef CONFIG_INTEL_IOMMU_SVM
void intel_svm_check(struct intel_iommu *iommu);
struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
index d73d780d7e19..a15feef4d9ca 100644
--- a/drivers/iommu/intel/liveupdate.c
+++ b/drivers/iommu/intel/liveupdate.c
@@ -9,6 +9,14 @@
#include <linux/liveupdate.h>
#include <linux/module.h>
+#include "iommu.h"
+
+int intel_iommu_domain_liveupdate_preserve(struct iommu_domain *domain)
+{
+ pr_warn("Not implemented\n");
+ return 0;
+}
+
static int intel_liveupdate_prepare(struct liveupdate_subsystem *handle, u64 *data)
{
pr_warn("Not implemented\n");
--
2.51.0.536.g15c5d4f767-goog
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [RFC PATCH 07/15] iommu/vt-d: Add implementation of live update prepare callback
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (5 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 06/15] iommu/vt-d: Add stub intel iommu domain preserve op Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 08/15] iommu/vt-d: Implement live update preserve_iommu_context Samiullah Khawaja
` (7 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
The live update prepare callback preserves the IOMMUs of the devices
that are attached to a preserved iommu domain. It does this only for
PCIe devices, by iterating through all the PCIe devices and checking
whether the attached iommu domain is preserved.
Only stub implementations of the IOMMU and device preservation are added
in this commit.
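The prepare path in this patch appears to size its serialized state with
a counting pass (arrays still NULL) before allocating one buffer that
holds a header followed by two arrays. The userspace sketch below shows
that count-then-fill layout under simplifying assumptions: the toy_*
names are hypothetical, calloc() stands in for the preserved folio, and
ordinary pointers are used where the patch stores addresses meant to
survive kexec.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_UNITS 8

struct toy_unit { uint64_t id; };
struct toy_dev { uint64_t bdf; };

struct toy_input {
	uint64_t bdf;
	unsigned int unit;	/* which IOMMU unit serves the device */
	int preserved;		/* non-zero if its domain is preserved */
};

struct toy_ser {
	uint64_t nr_units;
	uint64_t nr_devs;
	struct toy_unit *units;	/* NULL during the counting pass */
	struct toy_dev *devs;	/* NULL during the counting pass */
};

/* Count preserved devices and their units; fill entries if arrays exist. */
static void toy_collect(struct toy_ser *ser, const struct toy_input *in,
			size_t n)
{
	unsigned int seen = 0;	/* bit u set: unit u has a preserved device */

	ser->nr_devs = 0;
	ser->nr_units = 0;

	for (size_t i = 0; i < n; i++) {
		if (!in[i].preserved)
			continue;
		assert(in[i].unit < TOY_MAX_UNITS);
		if (ser->devs)
			ser->devs[ser->nr_devs].bdf = in[i].bdf;
		ser->nr_devs++;
		seen |= 1u << in[i].unit;
	}

	for (unsigned int u = 0; u < TOY_MAX_UNITS; u++) {
		if (!(seen & (1u << u)))
			continue;
		if (ser->units)
			ser->units[ser->nr_units].id = u;
		ser->nr_units++;
	}
}

/* Pass 1 counts, then one allocation holds header + units + devices. */
static struct toy_ser *toy_serialize(const struct toy_input *in, size_t n)
{
	struct toy_ser count = { 0 };
	struct toy_ser *ser;
	size_t sz;

	toy_collect(&count, in, n);

	sz = sizeof(*ser) + count.nr_units * sizeof(struct toy_unit) +
	     count.nr_devs * sizeof(struct toy_dev);
	ser = calloc(1, sz);
	if (!ser)
		return NULL;

	ser->units = (struct toy_unit *)(ser + 1);
	ser->devs = (struct toy_dev *)(ser->units + count.nr_units);
	toy_collect(ser, in, n);	/* pass 2 fills the arrays */
	return ser;
}
```

Keeping everything in one contiguous allocation means only a single
region has to be handed over to the next kernel, at the cost of walking
the device list twice.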
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/dmar.c | 4 +
drivers/iommu/intel/iommu.h | 3 +
drivers/iommu/intel/liveupdate.c | 150 ++++++++++++++++++++++++++++++-
3 files changed, 156 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 248bc7e9b035..cd6ce519c1da 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1101,6 +1101,10 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
ida_init(&iommu->domain_ida);
mutex_init(&iommu->did_lock);
+#ifdef CONFIG_LIVEUPDATE
+ atomic_set(&iommu->preserved, 0);
+#endif
+
ver = readl(iommu->reg + DMAR_VER_REG);
pr_info("%s: reg_base_addr %llx ver %d:%d cap %llx ecap %llx\n",
iommu->name,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 6b69232efffd..93ac55eb49f0 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -758,6 +758,9 @@ struct intel_iommu {
void *perf_statistic;
struct iommu_pmu *pmu;
+#ifdef CONFIG_LIVEUPDATE
+ atomic_t preserved;
+#endif
};
/* PCI domain-device relationship */
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
index a15feef4d9ca..94aabf025a60 100644
--- a/drivers/iommu/intel/liveupdate.c
+++ b/drivers/iommu/intel/liveupdate.c
@@ -6,24 +6,172 @@
#define pr_fmt(fmt) "iommu: liveupdate: " fmt
+#include <linux/kexec_handover.h>
#include <linux/liveupdate.h>
#include <linux/module.h>
+#include <linux/pci.h>
#include "iommu.h"
+struct iommu_unit_ser {
+ u64 phys_addr;
+ u64 root_table;
+};
+
+struct device_ser {
+ u64 bdf;
+ u64 pasid_table;
+ u64 pasid_order;
+ u64 iommu_phys;
+};
+
+struct iommu_ser {
+ u64 nr_iommus;
+ u64 nr_devices;
+
+ union {
+ u64 iommu_units_phys;
+ struct iommu_unit_ser *iommu_units;
+ };
+
+ union {
+ u64 devices_phys;
+ struct device_ser *devices;
+ };
+};
+
int intel_iommu_domain_liveupdate_preserve(struct iommu_domain *domain)
{
pr_warn("Not implemented\n");
return 0;
+}
+
+static bool is_device_domain_preserved(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ return atomic_read(&info->domain->domain.preserved) == 1;
}
-static int intel_liveupdate_prepare(struct liveupdate_subsystem *handle, u64 *data)
+static int preserve_device_state(struct pci_dev *dev, struct device_ser *ser)
{
pr_warn("Not implemented\n");
return 0;
}
+static int preserve_iommu_state(struct intel_iommu *iommu,
+ struct iommu_unit_ser *ser)
+{
+ pr_warn("Not implemented\n");
+ return 0;
+}
+
+static void unpreserve_state(struct iommu_ser *ser)
+{
+ pr_warn("Not implemented\n");
+}
+
+static int preserve_state(struct iommu_ser *ser)
+{
+ struct device_domain_info *info;
+ struct pci_dev *pdev = NULL;
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+ int ret = 0;
+
+ for_each_pci_dev(pdev) {
+ if (!is_device_domain_preserved(&pdev->dev))
+ continue;
+
+ info = dev_iommu_priv_get(&pdev->dev);
+ if (!info)
+ return -EINVAL;
+
+ if (ser->devices)
+ ret = preserve_device_state(pdev, &ser->devices[ser->nr_devices]);
+
+ if (ret)
+ return ret;
+
+ atomic_set(&info->iommu->preserved, 1);
+ ser->nr_devices++;
+ }
+
+ for_each_iommu(iommu, drhd) {
+ if (!atomic_read(&iommu->preserved))
+ continue;
+
+ atomic_set(&iommu->preserved, 0);
+ if (ser->iommu_units)
+ ret = preserve_iommu_state(iommu, &ser->iommu_units[ser->nr_iommus]);
+
+ if (ret)
+ return ret;
+
+ ser->nr_iommus++;
+ }
+
+ return 0;
+}
+
+static struct iommu_ser *alloc_preserve_state_mem(void)
+{
+ struct iommu_ser *ser_ptr;
+ struct iommu_ser ser;
+ struct folio *folio;
+ size_t sz;
+ int ret;
+
+ memset(&ser, 0, sizeof(ser));
+ ret = preserve_state(&ser);
+ if (ret)
+ goto error;
+
+ sz = sizeof(struct iommu_ser) +
+ (ser.nr_iommus * sizeof(struct iommu_unit_ser)) +
+ (ser.nr_devices * sizeof(struct device_ser));
+
+ folio = folio_alloc(GFP_KERNEL, get_order(sz));
+ if (!folio)
+ return ERR_PTR(-ENOMEM);
+
+ ret = kho_preserve_folio(folio);
+ if (ret)
+ goto error_preserve;
+
+ ser_ptr = folio_address(folio);
+ memset(ser_ptr, 0, sz);
+ ser_ptr->iommu_units = (void *)(ser_ptr + 1);
+ ser_ptr->devices = (void *)(ser_ptr->iommu_units + ser.nr_iommus);
+
+ return ser_ptr;
+
+error_preserve:
+ folio_put(folio);
+error:
+ return ERR_PTR(ret);
+}
+
+static int intel_liveupdate_prepare(struct liveupdate_subsystem *handle, u64 *data)
+{
+ struct iommu_ser *ser;
+ int ret;
+
+ guard_liveupdate_state_write();
+ ser = alloc_preserve_state_mem();
+ if (IS_ERR(ser))
+ return PTR_ERR(ser);
+
+ ret = preserve_state(ser);
+ if (ret)
+ unpreserve_state(ser);
+
+ if (!ret)
+ *data = __pa(ser);
+
+ return ret;
+}
+
static void intel_liveupdate_cancel(struct liveupdate_subsystem *handle, u64 data)
{
pr_warn("Not implemented\n");
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 08/15] iommu/vt-d: Implement live update preserve_iommu_context
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (6 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 07/15] iommu/vt-d: Add implementation of live update prepare callback Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-29 15:57 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 09/15] iommu/vt-d: Add live update freeze callback Samiullah Khawaja
` (6 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Add implementation of preserve_iommu_context to preserve the root_table
and the associated context tables. Also mark the iommu unit as
preserved.
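The preserve-then-roll-back-on-failure pattern described above can be sketched in user space as follows. This is only an illustration under stated assumptions: NR_SLOTS, struct slot, mock_preserve() and mock_unpreserve() are hypothetical stand-ins for ROOT_ENTRY_NR, the context tables, kho_preserve_folio() and kho_unpreserve_folio(); none of them are part of this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 8 /* hypothetical stand-in for ROOT_ENTRY_NR */

/* One slot models one context table that may or may not be allocated. */
struct slot {
	bool present;
	bool preserved;
};

/* Hypothetical stand-in for kho_preserve_folio(); fails at index fail_at. */
static int mock_preserve(struct slot *s, int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	s->preserved = true;
	return 0;
}

/* Hypothetical stand-in for kho_unpreserve_folio(). */
static void mock_unpreserve(struct slot *s)
{
	assert(s->preserved);
	s->preserved = false;
}

/* Preserve every present slot; on failure undo the ones already done. */
static int preserve_all(struct slot *slots, int fail_at)
{
	int i, ret;

	for (i = 0; i < NR_SLOTS; i++) {
		if (!slots[i].present)
			continue;
		ret = mock_preserve(&slots[i], i, fail_at);
		if (ret)
			goto rollback;
	}
	return 0;

rollback:
	/* Slot i itself was never preserved, so start at i - 1. */
	while (--i >= 0) {
		if (slots[i].preserved)
			mock_unpreserve(&slots[i]);
	}
	return ret;
}
```

Passing a negative fail_at makes every preserve call succeed; any other value forces a failure at that index so the rollback path can be exercised.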
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/iommu.c | 2 -
drivers/iommu/intel/iommu.h | 1 +
drivers/iommu/intel/liveupdate.c | 80 +++++++++++++++++++++++++++++++-
3 files changed, 79 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7035ffca020f..caac4fd9a51e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -67,8 +67,6 @@ static int force_on = 0;
static int intel_iommu_tboot_noforce;
static int no_platform_optin;
-#define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
-
/*
* Take a root_entry and return the Lower Context Table Pointer (LCTP)
* if marked present.
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 93ac55eb49f0..273d40812d09 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -556,6 +556,8 @@ struct root_entry {
u64 lo;
u64 hi;
};
+
+#define ROOT_ENTRY_NR (VTD_PAGE_SIZE / sizeof(struct root_entry))
/*
* low 64 bits:
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
index 94aabf025a60..fb214736aa3c 100644
--- a/drivers/iommu/intel/liveupdate.c
+++ b/drivers/iommu/intel/liveupdate.c
@@ -59,11 +59,87 @@ static int preserve_device_state(struct pci_dev *dev, struct device_ser *ser)
return 0;
}
+static int unpreserve_iommu_context(struct intel_iommu *iommu, int end)
+{
+ struct context_entry *context;
+ int i;
+
+ if (end < 0)
+ end = ROOT_ENTRY_NR;
+
+ for (i = 0; i < end; i++) {
+ context = iommu_context_addr(iommu, i, 0, 0);
+ if (context)
+ WARN_ON_ONCE(kho_unpreserve_folio(virt_to_folio(context)));
+
+ if (!sm_supported(iommu))
+ continue;
+
+ context = iommu_context_addr(iommu, i, 0x80, 0);
+ if (context)
+ WARN_ON_ONCE(kho_unpreserve_folio(virt_to_folio(context)));
+ }
+
+ return 0;
+}
+
+static int preserve_iommu_context(struct intel_iommu *iommu)
+{
+ struct context_entry *context;
+ int ret;
+ int i;
+
+ for (i = 0; i < ROOT_ENTRY_NR; i++) {
+ context = iommu_context_addr(iommu, i, 0, 0);
+ if (context) {
+ ret = kho_preserve_folio(virt_to_folio(context));
+ if (ret)
+ goto error;
+ }
+
+ if (!sm_supported(iommu))
+ continue;
+
+ context = iommu_context_addr(iommu, i, 0x80, 0);
+ if (context) {
+ ret = kho_preserve_folio(virt_to_folio(context));
+ if (ret)
+ goto error_sm;
+ }
+ }
+
+ return 0;
+
+error_sm:
+ context = iommu_context_addr(iommu, i, 0, 0);
+ WARN_ON_ONCE(kho_unpreserve_folio(virt_to_folio(context)));
+error:
+ WARN_ON_ONCE(unpreserve_iommu_context(iommu, i));
+ return ret;
+}
+
static int preserve_iommu_state(struct intel_iommu *iommu,
struct iommu_unit_ser *ser)
{
- pr_warn("Not implemented\n");
- return 0;
+ int ret;
+
+ spin_lock(&iommu->lock);
+ ret = preserve_iommu_context(iommu);
+ if (ret)
+ goto error;
+
+ ret = kho_preserve_folio(virt_to_folio(iommu->root_entry));
+ if (ret) {
+ unpreserve_iommu_context(iommu, -1);
+ goto error;
+ }
+
+ ser->phys_addr = iommu->reg_phys;
+ ser->root_table = __pa(iommu->root_entry);
+ atomic_set(&iommu->preserved, 1);
+error:
+ spin_unlock(&iommu->lock);
+ return ret;
}
static void unpreserve_state(struct iommu_ser *ser)
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 09/15] iommu/vt-d: Add live update freeze callback
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (7 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 08/15] iommu/vt-d: Implement live update preserve_iommu_context Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-29 15:58 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 10/15] iommu/vt-d: Restore iommu root_table and context on live update Samiullah Khawaja
` (5 subsequent siblings)
14 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
The iommu_ser needs to be updated during freeze to record the physical
addresses of the iommu_units and devices arrays, as the virtual
addresses will not be valid in the next kernel after kexec.
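The pointer-to-address conversion can be illustrated with a small user-space sketch. User space has no __pa()/__va(), so this sketch assumes a base-relative offset as a stand-in for the physical address; struct blob, struct unit, blob_freeze(), blob_thaw() and roundtrip_demo() are hypothetical names and not part of this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct unit {
	uint64_t reg_phys;
	uint64_t root_table;
};

/* Header followed by the unit array in the same buffer. */
struct blob {
	uint64_t nr_units;
	union {
		uint64_t units_off;	/* valid while frozen */
		struct unit *units;	/* valid while live */
	};
};

/* Freeze: replace the live pointer with a base-relative offset. */
static void blob_freeze(struct blob *b)
{
	b->units_off = (uint64_t)((char *)b->units - (char *)b);
}

/* Thaw: rebuild the pointer from the offset and the new base. */
static void blob_thaw(struct blob *b)
{
	b->units = (struct unit *)((char *)b + b->units_off);
}

/*
 * Build a blob, freeze it, copy it to a different address (standing in
 * for the next kernel's mapping), thaw it and read a value back.
 */
static uint64_t roundtrip_demo(uint64_t value)
{
	size_t sz = sizeof(struct blob) + 2 * sizeof(struct unit);
	struct blob *prev = calloc(1, sz);
	struct blob *next = calloc(1, sz);
	uint64_t out;

	assert(prev && next);
	prev->nr_units = 2;
	prev->units = (struct unit *)(prev + 1);
	prev->units[1].root_table = value;

	blob_freeze(prev);
	memcpy(next, prev, sz);
	free(prev);		/* pointers into prev are now dangling */

	blob_thaw(next);
	out = next->units[1].root_table;
	free(next);
	return out;
}
```

The union mirrors the layout used by iommu_ser: the same 64-bit slot holds a pointer while the structure is live and an address-like value once it is frozen.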
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/liveupdate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
index fb214736aa3c..a7d9b07aaada 100644
--- a/drivers/iommu/intel/liveupdate.c
+++ b/drivers/iommu/intel/liveupdate.c
@@ -258,10 +258,21 @@ static void intel_liveupdate_finish(struct liveupdate_subsystem *handle, u64 dat
pr_warn("Not implemented\n");
}
+static int intel_liveupdate_freeze(struct liveupdate_subsystem *handle, u64 *data)
+{
+ struct iommu_ser *ser = __va(*data);
+
+ ser->iommu_units_phys = __pa(ser->iommu_units);
+ ser->devices_phys = __pa(ser->devices);
+
+ return 0;
+}
+
static struct liveupdate_subsystem_ops intel_liveupdate_subsystem_ops = {
.prepare = intel_liveupdate_prepare,
.finish = intel_liveupdate_finish,
.cancel = intel_liveupdate_cancel,
+ .freeze = intel_liveupdate_freeze,
};
static struct liveupdate_subsystem intel_liveupdate_subsystem = {
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 10/15] iommu/vt-d: Restore iommu root_table and context on live update
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (8 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 09/15] iommu/vt-d: Add live update freeze callback Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 11/15] iommufd: Add basic skeleton based on liveupdate_file_handle Samiullah Khawaja
` (4 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
During boot, if the live update state is updated, check the iommu live
update state to see whether the state of any iommu hardware unit was
preserved before the live update. If preserved state is available for
an iommu hardware unit, restore the root_table and the iommu context
from it.
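The restore decision described above amounts to a lookup by register base with a fallback to a fresh allocation. A minimal user-space sketch, assuming hypothetical names (struct unit_state, find_unit(), pick_root_table()) and made-up addresses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct unit_state {
	uint64_t reg_phys;	/* register base identifying the unit */
	uint64_t root_table;	/* preserved root table address */
};

/* Linear search keyed by the unit's register base. */
static const struct unit_state *
find_unit(const struct unit_state *units, size_t nr, uint64_t reg_phys)
{
	size_t i;

	assert(units || nr == 0);
	for (i = 0; i < nr; i++) {
		if (units[i].reg_phys == reg_phys)
			return &units[i];
	}
	return NULL;
}

/* Use the preserved root table if there is one, else the fresh one. */
static uint64_t pick_root_table(const struct unit_state *units, size_t nr,
				uint64_t reg_phys, uint64_t fresh)
{
	const struct unit_state *u = find_unit(units, nr, reg_phys);

	return u ? u->root_table : fresh;
}
```

A unit with no preserved entry simply takes the normal allocation path, which matches the fall-through behaviour the commit message describes.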
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/iommu.c | 7 +++
drivers/iommu/intel/iommu.h | 1 +
drivers/iommu/intel/liveupdate.c | 92 +++++++++++++++++++++++++++++++-
3 files changed, 99 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index caac4fd9a51e..245316db3650 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -984,6 +984,13 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
{
struct root_entry *root;
+#ifdef CONFIG_LIVEUPDATE
+ if (!intel_iommu_liveupdate_restore_root_table(iommu) &&
+ iommu->root_entry) {
+ __iommu_flush_cache(iommu, iommu->root_entry, ROOT_SIZE);
+ return 0;
+ }
+#endif
root = iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC, SZ_4K);
if (!root) {
pr_err("Allocating root entry for %s failed\n",
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 273d40812d09..6119a638c530 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1351,6 +1351,7 @@ static inline int iopf_for_domain_replace(struct iommu_domain *new,
#ifdef CONFIG_LIVEUPDATE
int intel_iommu_domain_liveupdate_preserve(struct iommu_domain *domain);
+int intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu);
#endif
#ifdef CONFIG_INTEL_IOMMU_SVM
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
index a7d9b07aaada..755325a5225c 100644
--- a/drivers/iommu/intel/liveupdate.c
+++ b/drivers/iommu/intel/liveupdate.c
@@ -253,9 +253,11 @@ static void intel_liveupdate_cancel(struct liveupdate_subsystem *handle, u64 dat
pr_warn("Not implemented\n");
}
+static struct iommu_ser *serialized_state;
+
static void intel_liveupdate_finish(struct liveupdate_subsystem *handle, u64 data)
{
- pr_warn("Not implemented\n");
+ serialized_state = NULL;
}
static int intel_liveupdate_freeze(struct liveupdate_subsystem *handle, u64 *data)
@@ -280,6 +282,94 @@ static struct liveupdate_subsystem intel_liveupdate_subsystem = {
.ops = &intel_liveupdate_subsystem_ops,
};
+static struct iommu_ser *get_liveupdate_state(void)
+{
+ struct iommu_ser *ser;
+ u64 data;
+ int ret;
+
+ if (serialized_state)
+ return serialized_state;
+
+ ret = liveupdate_get_subsystem_data(&intel_liveupdate_subsystem, &data);
+ if (WARN_ON_ONCE(ret))
+ return NULL;
+
+ if (!kho_restore_folio(data))
+ return NULL;
+
+ ser = __va(data);
+ ser->iommu_units = __va(ser->iommu_units_phys);
+ ser->devices = __va(ser->devices_phys);
+ serialized_state = ser;
+
+ return ser;
+}
+
+static int restore_iommu_context(struct intel_iommu *iommu)
+{
+ struct context_entry *context;
+ int i, ret = 0;
+
+ for (i = 0; i < ROOT_ENTRY_NR; i++) {
+ context = iommu_context_addr(iommu, i, 0, 0);
+ if (context)
+ BUG_ON(!kho_restore_folio(virt_to_phys(context)));
+
+ if (!sm_supported(iommu))
+ continue;
+
+ context = iommu_context_addr(iommu, i, 0x80, 0);
+ if (context)
+ BUG_ON(!kho_restore_folio(virt_to_phys(context)));
+ }
+
+ return ret;
+}
+
+static struct iommu_unit_ser *get_iommu_unit_state(struct iommu_ser *ser, u64 reg_phys)
+{
+ int i;
+
+ for (i = 0; i < ser->nr_iommus; ++i) {
+ if (ser->iommu_units[i].phys_addr == reg_phys)
+ return &ser->iommu_units[i];
+ }
+
+ return NULL;
+}
+
+int intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu)
+{
+ struct iommu_unit_ser *iser;
+ struct iommu_ser *ser;
+ int ret;
+
+ if (!liveupdate_state_updated())
+ return -EINVAL;
+
+ ser = get_liveupdate_state();
+ if (!ser)
+ return -EINVAL;
+
+ iser = get_iommu_unit_state(ser, iommu->reg_phys);
+ if (!iser)
+ return -EINVAL;
+
+ iommu->root_entry = __va(iser->root_table);
+
+ ret = restore_iommu_context(iommu);
+ if (ret) {
+ WARN_ONCE(ret, "Cannot restore iommu [%llx] root context\n", iommu->reg_phys);
+ folio_put(virt_to_folio(iommu->root_entry));
+ iommu->root_entry = NULL;
+ }
+ pr_info("Restored IOMMU[0x%llx] Root Table at: 0x%llx\n",
+ iommu->reg_phys, iser->root_table);
+
+ return ret;
+}
+
static int __init intel_liveupdate_init(void)
{
WARN_ON_ONCE(liveupdate_register_subsystem(&intel_liveupdate_subsystem));
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 11/15] iommufd: Add basic skeleton based on liveupdate_file_handle
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (9 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 10/15] iommu/vt-d: Restore iommu root_table and context on live update Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 12/15] iommufd-luo: Implement basic prepare/cancel/finish/retrieve using folios Samiullah Khawaja
` (3 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: YiFei Zhu, Samiullah Khawaja, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
From: YiFei Zhu <zhuyifei@google.com>
No functionality is implemented in this commit. It only registers and
unregisters the struct liveupdate_file_handler for iommufd.
All operations are stubs that either return an error or are no-ops.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/iommufd_private.h | 15 ++++++
drivers/iommu/iommufd/liveupdate.c | 68 +++++++++++++++++++++++++
drivers/iommu/iommufd/main.c | 14 ++++-
4 files changed, 97 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/iommufd/liveupdate.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..f37830ff7229 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
iommufd_driver-y := driver.o
obj-$(CONFIG_IOMMUFD_DRIVER_CORE) += iommufd_driver.o
+obj-$(CONFIG_LIVEUPDATE) += liveupdate.o
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 0da2a81eedfa..faf48ca9e555 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -702,6 +702,21 @@ iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
struct iommufd_vdevice, obj);
}
+#ifdef CONFIG_LIVEUPDATE
+int iommufd_liveupdate_register_lufs(void);
+int iommufd_liveupdate_unregister_lufs(void);
+#else
+static inline int iommufd_liveupdate_register_lufs(void)
+{
+ return 0;
+}
+
+static inline int iommufd_liveupdate_unregister_lufs(void)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/liveupdate.c b/drivers/iommu/iommufd/liveupdate.c
new file mode 100644
index 000000000000..6d2a64966335
--- /dev/null
+++ b/drivers/iommu/iommufd/liveupdate.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/iommufd.h>
+#include <linux/liveupdate.h>
+
+#include "iommufd_private.h"
+
+static int iommufd_liveupdate_prepare(struct liveupdate_file_handler *handler,
+ struct file *file, u64 *data)
+{
+ return -EOPNOTSUPP;
+}
+
+static int iommufd_liveupdate_freeze(struct liveupdate_file_handler *handler,
+ struct file *file, u64 *data)
+{
+ /* No-Op; everything should be made read-only */
+ return 0;
+}
+
+static void iommufd_liveupdate_cancel(struct liveupdate_file_handler *handler,
+ struct file *file, u64 data)
+{
+}
+
+static void iommufd_liveupdate_finish(struct liveupdate_file_handler *handler,
+ struct file *file, u64 data, bool reclaimed)
+{
+}
+
+static int iommufd_liveupdate_retrieve(struct liveupdate_file_handler *handler,
+ u64 data, struct file **file_p)
+{
+ return -EOPNOTSUPP;
+}
+
+static bool iommufd_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
+ struct file *file)
+{
+ return false;
+}
+
+static struct liveupdate_file_ops iommufd_lu_file_ops = {
+ .prepare = iommufd_liveupdate_prepare,
+ .freeze = iommufd_liveupdate_freeze,
+ .cancel = iommufd_liveupdate_cancel,
+ .finish = iommufd_liveupdate_finish,
+ .retrieve = iommufd_liveupdate_retrieve,
+ .can_preserve = iommufd_liveupdate_can_preserve,
+};
+
+static struct liveupdate_file_handler iommufd_lu_handler = {
+ .compatible = "iommufd-v1",
+ .ops = &iommufd_lu_file_ops,
+};
+
+int iommufd_liveupdate_register_lufs(void)
+{
+ return liveupdate_register_file_handler(&iommufd_lu_handler);
+}
+
+int iommufd_liveupdate_unregister_lufs(void)
+{
+ return liveupdate_unregister_file_handler(&iommufd_lu_handler);
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 15af7ced0501..b3bf65bc8da4 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -723,11 +723,21 @@ static int __init iommufd_init(void)
if (ret)
goto err_misc;
}
+
+ if (IS_ENABLED(CONFIG_LIVEUPDATE)) {
+ ret = iommufd_liveupdate_register_lufs();
+ if (ret)
+ goto err_vfio_misc;
+ }
+
ret = iommufd_test_init();
if (ret)
- goto err_vfio_misc;
+ goto err_lufs;
return 0;
+err_lufs:
+ if (IS_ENABLED(CONFIG_LIVEUPDATE))
+ iommufd_liveupdate_unregister_lufs();
err_vfio_misc:
if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER))
misc_deregister(&vfio_misc_dev);
@@ -739,6 +749,8 @@ static int __init iommufd_init(void)
static void __exit iommufd_exit(void)
{
iommufd_test_exit();
+ if (IS_ENABLED(CONFIG_LIVEUPDATE))
+ iommufd_liveupdate_unregister_lufs();
if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER))
misc_deregister(&vfio_misc_dev);
misc_deregister(&iommu_misc_dev);
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 12/15] iommufd-luo: Implement basic prepare/cancel/finish/retrieve using folios
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (10 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 11/15] iommufd: Add basic skeleton based on liveupdate_file_handle Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 13/15] iommufd: Persist iommu domains for live update Samiullah Khawaja
` (2 subsequent siblings)
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: YiFei Zhu, Samiullah Khawaja, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
From: YiFei Zhu <zhuyifei@google.com>
The actual serialization and de-serialization are implemented in
follow-up commits.
- On prepare, a single folio is created and preserved to store
all the structs.
- On cancel, the folio is unpreserved and freed.
- On retrieve, the folio is restored, then an fd with anon_inode
is created, with data pointing to the folio.
- On finish, the folio is freed.
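The folio lifecycle listed above can be read as a small state machine. The sketch below is a simplified model, not the implementation: enum blob_state and the blob_*() helpers are hypothetical, and the assumption that finish may also run on a never-retrieved folio is taken from the "fd not reclaimed" branch in the diff.

```c
#include <assert.h>

enum blob_state { BLOB_NONE, BLOB_PRESERVED, BLOB_RESTORED, BLOB_FREED };

/* prepare: allocate and preserve the folio. */
static int blob_prepare(enum blob_state *s)
{
	if (*s != BLOB_NONE)
		return -1;
	*s = BLOB_PRESERVED;
	return 0;
}

/* cancel: unpreserve and free, only valid before kexec. */
static int blob_cancel(enum blob_state *s)
{
	if (*s != BLOB_PRESERVED)
		return -1;
	*s = BLOB_FREED;
	return 0;
}

/* retrieve: restore the folio in the next kernel. */
static int blob_retrieve(enum blob_state *s)
{
	if (*s != BLOB_PRESERVED)
		return -1;
	*s = BLOB_RESTORED;
	return 0;
}

/* finish: free the folio, restoring it first if it was never retrieved. */
static int blob_finish(enum blob_state *s)
{
	if (*s == BLOB_PRESERVED)
		*s = BLOB_RESTORED;
	if (*s != BLOB_RESTORED)
		return -1;
	*s = BLOB_FREED;
	return 0;
}
```

Modelling the transitions this way makes it easy to see which callbacks are mutually exclusive: cancel and retrieve both consume the preserved state, so only one of them can succeed for a given folio.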
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/iommufd/iommufd_private.h | 12 +++
drivers/iommu/iommufd/liveupdate.c | 127 +++++++++++++++++++++++-
drivers/iommu/iommufd/main.c | 2 +-
3 files changed, 137 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index faf48ca9e555..dfa17bfc9933 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -19,6 +19,9 @@ struct iommu_domain;
struct iommu_group;
struct iommu_option;
struct iommufd_device;
+struct iommufd_lu;
+
+extern const struct file_operations iommufd_fops;
struct iommufd_sw_msi_map {
struct list_head sw_msi_item;
@@ -55,6 +58,10 @@ struct iommufd_ctx {
/* Compatibility with VFIO no iommu */
u8 no_iommu_mode;
struct iommufd_ioas *vfio_ioas;
+
+#ifdef CONFIG_LIVEUPDATE
+ struct iommufd_lu *lu;
+#endif
};
/* Entry for iommufd_ctx::mt_mmap */
@@ -703,6 +710,11 @@ iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
}
#ifdef CONFIG_LIVEUPDATE
+struct iommufd_lu {
+ /* Only valid in restore, for lifetime purposes */
+ struct folio *folio_lu;
+};
+
int iommufd_liveupdate_register_lufs(void);
int iommufd_liveupdate_unregister_lufs(void);
#else
diff --git a/drivers/iommu/iommufd/liveupdate.c b/drivers/iommu/iommufd/liveupdate.c
index 6d2a64966335..1bdd5a82af90 100644
--- a/drivers/iommu/iommufd/liveupdate.c
+++ b/drivers/iommu/iommufd/liveupdate.c
@@ -2,16 +2,52 @@
#define pr_fmt(fmt) "iommufd: " fmt
+#include <linux/anon_inodes.h>
#include <linux/file.h>
#include <linux/iommufd.h>
+#include <linux/kexec_handover.h>
#include <linux/liveupdate.h>
+#include <linux/mm.h>
#include "iommufd_private.h"
static int iommufd_liveupdate_prepare(struct liveupdate_file_handler *handler,
struct file *file, u64 *data)
{
- return -EOPNOTSUPP;
+ struct iommufd_ctx *ictx = iommufd_ctx_from_file(file);
+ struct iommufd_lu *iommufd_lu;
+ struct folio *folio_lu;
+ size_t serial_size;
+ int rc;
+
+ if (IS_ERR(ictx))
+ return PTR_ERR(ictx);
+
+ serial_size = sizeof(*iommufd_lu);
+
+ folio_lu = folio_alloc(GFP_KERNEL, get_order(serial_size));
+ if (!folio_lu) {
+ rc = -ENOMEM;
+ goto err_ctx_put;
+ }
+
+ iommufd_lu = folio_address(folio_lu);
+
+ rc = kho_preserve_folio(folio_lu);
+ if (rc)
+ goto err_folio_put;
+
+ *data = virt_to_phys(iommufd_lu);
+
+ iommufd_ctx_put(ictx);
+ return 0;
+
+err_folio_put:
+ folio_put(folio_lu);
+
+err_ctx_put:
+ iommufd_ctx_put(ictx);
+ return rc;
}
static int iommufd_liveupdate_freeze(struct liveupdate_file_handler *handler,
@@ -24,23 +60,108 @@ static int iommufd_liveupdate_freeze(struct liveupdate_file_handler *handler,
static void iommufd_liveupdate_cancel(struct liveupdate_file_handler *handler,
struct file *file, u64 data)
{
+ struct iommufd_ctx *ictx = iommufd_ctx_from_file(file);
+ struct folio *folio_lu;
+
+ if (WARN_ON(IS_ERR(ictx)))
+ return;
+
+ folio_lu = pfn_folio(PHYS_PFN(data));
+ WARN_ON(kho_unpreserve_folio(folio_lu));
+ folio_put(folio_lu);
+
+ iommufd_ctx_put(ictx);
}
static void iommufd_liveupdate_finish(struct liveupdate_file_handler *handler,
struct file *file, u64 data, bool reclaimed)
{
+ struct iommufd_lu *iommufd_lu;
+ struct iommufd_ctx *ictx;
+ struct folio *folio_lu;
+
+ if (!reclaimed || !file) {
+ pr_warn("%s: fd not reclaimed\n", __func__);
+
+ folio_lu = kho_restore_folio(data);
+ if (WARN_ON_ONCE(IS_ERR_OR_NULL(folio_lu)))
+ return;
+
+ iommufd_lu = folio_address(folio_lu);
+ } else {
+ ictx = iommufd_ctx_from_file(file);
+ iommufd_lu = ictx->lu;
+ ictx->lu = NULL;
+ iommufd_ctx_put(ictx);
+ }
+
+ folio_put(iommufd_lu->folio_lu);
}
static int iommufd_liveupdate_retrieve(struct liveupdate_file_handler *handler,
u64 data, struct file **file_p)
{
- return -EOPNOTSUPP;
+ struct iommufd_lu *iommufd_lu;
+ struct iommufd_ctx *ictx;
+ struct folio *folio_lu;
+ struct file *file;
+ int rc;
+
+ folio_lu = kho_restore_folio(data);
+ if (IS_ERR_OR_NULL(folio_lu))
+ return -EFAULT;
+
+ iommufd_lu = folio_address(folio_lu);
+ iommufd_lu->folio_lu = folio_lu;
+
+ file = anon_inode_create_getfile("iommufd", &iommufd_fops,
+ NULL, O_RDWR, NULL);
+ if (IS_ERR(file)) {
+ rc = PTR_ERR(file);
+ goto err_folio_put;
+ }
+
+ rc = iommufd_fops.open(file->f_inode, file);
+ if (rc)
+ goto err_fput;
+
+ ictx = iommufd_ctx_from_file(file);
+ if (WARN_ON(IS_ERR(ictx))) {
+ rc = PTR_ERR(ictx);
+ goto err_fput;
+ }
+
+ if (WARN_ON(ictx->lu)) {
+ rc = -EEXIST;
+ goto err_ctx_put;
+ }
+ ictx->lu = iommufd_lu;
+
+ iommufd_ctx_put(ictx);
+
+ *file_p = file;
+
+ return 0;
+
+err_ctx_put:
+ iommufd_ctx_put(ictx);
+err_fput:
+ fput(file);
+err_folio_put:
+ folio_put(folio_lu);
+ return rc;
}
static bool iommufd_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
struct file *file)
{
- return false;
+ struct iommufd_ctx *ictx = iommufd_ctx_from_file(file);
+
+ if (IS_ERR(ictx))
+ return false;
+
+ iommufd_ctx_put(ictx);
+ return true;
}
static struct liveupdate_file_ops iommufd_lu_file_ops = {
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index b3bf65bc8da4..a8b6daaca11f 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -577,7 +577,7 @@ static int iommufd_fops_mmap(struct file *filp, struct vm_area_struct *vma)
return rc;
}
-static const struct file_operations iommufd_fops = {
+const struct file_operations iommufd_fops = {
.owner = THIS_MODULE,
.open = iommufd_fops_open,
.release = iommufd_fops_release,
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (11 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 12/15] iommufd-luo: Implement basic prepare/cancel/finish/retrieve using folios Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-29 16:00 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 14/15] iommu/vt-d: sanitize restored root table and iommu contexts Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 15/15] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja
14 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: YiFei Zhu, Samiullah Khawaja, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
From: YiFei Zhu <zhuyifei@google.com>
Iterate through all the IOAS objects and the underlying hwpt_paging
objects. Persist each iommu domain using the iommu_domain_preserve API.
This is temporary, as only the domains attached to the persisted devices
need to be preserved.
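The nested iteration, with the per-IOAS lock released on both the success and the error path, can be sketched in user space as below. All names are hypothetical: lock_depth stands in for ioas->mutex, domain_preserve() for iommu_domain_preserve(), and the fixed-size pointer array for the hwpt list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAINS 4

struct domain {
	bool can_preserve;
	bool preserved;
};

struct ioas {
	int lock_depth;				/* stand-in for ioas->mutex */
	struct domain *domains[MAX_DOMAINS];	/* NULL entries are skipped */
};

/* Hypothetical stand-in for iommu_domain_preserve(). */
static int domain_preserve(struct domain *d)
{
	if (!d->can_preserve)
		return -1;
	d->preserved = true;
	return 0;
}

/* Walk every IOAS and preserve each domain while holding its lock. */
static int save_all(struct ioas *list, size_t nr)
{
	size_t i, j;
	int rc;

	for (i = 0; i < nr; i++) {
		list[i].lock_depth++;			/* lock */
		for (j = 0; j < MAX_DOMAINS; j++) {
			struct domain *d = list[i].domains[j];

			if (!d)
				continue;
			rc = domain_preserve(d);
			if (rc) {
				list[i].lock_depth--;	/* unlock on error */
				return rc;
			}
		}
		list[i].lock_depth--;			/* unlock */
	}
	return 0;
}
```

The sketch, like the error path in the diff, only releases the lock on failure and does not undo domains that were preserved before the failing one.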
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/iommufd/liveupdate.c | 47 ++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/drivers/iommu/iommufd/liveupdate.c b/drivers/iommu/iommufd/liveupdate.c
index 1bdd5a82af90..0af0c6fadff1 100644
--- a/drivers/iommu/iommufd/liveupdate.c
+++ b/drivers/iommu/iommufd/liveupdate.c
@@ -8,9 +8,52 @@
#include <linux/kexec_handover.h>
#include <linux/liveupdate.h>
#include <linux/mm.h>
+#include <linux/pci.h>
#include "iommufd_private.h"
+static int iommufd_save_ioas(struct iommufd_ctx *ictx,
+ struct iommufd_lu *iommufd_lu)
+{
+ struct iommufd_hwpt_paging *hwpt_paging;
+ struct iommufd_ioas *ioas = NULL;
+ struct iommufd_object *obj;
+ unsigned long index;
+ int rc;
+
+ /* Iterate each ioas. */
+ xa_for_each(&ictx->objects, index, obj) {
+ if (obj->type != IOMMUFD_OBJ_IOAS)
+ continue;
+
+ ioas = (struct iommufd_ioas *)obj;
+ mutex_lock(&ioas->mutex);
+
+ /*
+ * TODO: Iterate over each device of this iommufd and only save
+ * hwpt/domain if the device is persisted.
+ */
+ list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
+ if (!hwpt_paging->common.domain)
+ continue;
+
+ rc = iommu_domain_preserve(hwpt_paging->common.domain);
+ if (rc)
+ goto err;
+ }
+
+ mutex_unlock(&ioas->mutex);
+ ioas = NULL;
+ }
+
+ return 0;
+
+err:
+ if (ioas)
+ mutex_unlock(&ioas->mutex);
+ return rc;
+}
+
static int iommufd_liveupdate_prepare(struct liveupdate_file_handler *handler,
struct file *file, u64 *data)
{
@@ -33,6 +76,10 @@ static int iommufd_liveupdate_prepare(struct liveupdate_file_handler *handler,
iommufd_lu = folio_address(folio_lu);
+ rc = iommufd_save_ioas(ictx, iommufd_lu);
+ if (rc)
+ goto err_folio_put;
+
rc = kho_preserve_folio(folio_lu);
if (rc)
goto err_folio_put;
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 14/15] iommu/vt-d: sanitize restored root table and iommu contexts
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (12 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 13/15] iommufd: Persist iommu domains for live update Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 15/15] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
The persisted root table will contain context entries from the previous
kernel. Sanitize the context tables reachable from the root table by
setting all their entries to zero. This is temporary; the context
entries for the persisted devices need to be kept intact.
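The sanitize step boils down to zeroing every table that is present and skipping the slots that are empty. A minimal user-space sketch, assuming hypothetical sizes (NR_ROOT, TABLE_WORDS) and a hypothetical struct table in place of the real context tables:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_ROOT 4	/* hypothetical; far smaller than ROOT_ENTRY_NR */
#define TABLE_WORDS 8	/* hypothetical table size in 64-bit words */

struct table {
	uint64_t words[TABLE_WORDS];
};

/* Zero every present table and return how many were cleared. */
static size_t sanitize_tables(struct table *tables[NR_ROOT])
{
	size_t i, cleared = 0;

	assert(tables != NULL);
	for (i = 0; i < NR_ROOT; i++) {
		if (!tables[i])
			continue;
		memset(tables[i], 0, sizeof(*tables[i]));
		cleared++;
	}
	return cleared;
}
```

Keeping the entries of preserved devices, as the TODO in the diff notes, would need a per-entry check in place of the blanket memset().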
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
drivers/iommu/intel/liveupdate.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
index 755325a5225c..3783632cf634 100644
--- a/drivers/iommu/intel/liveupdate.c
+++ b/drivers/iommu/intel/liveupdate.c
@@ -306,6 +306,26 @@ static struct iommu_ser *get_liveupdate_state(void)
return ser;
}
+static void sanitize_iommu_context(struct intel_iommu *iommu)
+{
+ struct context_entry *context;
+ int i;
+
+ /* TODO: Keep the context entries for the preserved devices. */
+ for (i = 0; i < ROOT_ENTRY_NR; i++) {
+ context = iommu_context_addr(iommu, i, 0, 0);
+ if (context)
+ memset(context, 0, PAGE_SIZE);
+
+ if (!sm_supported(iommu))
+ continue;
+
+ context = iommu_context_addr(iommu, i, 0x80, 0);
+ if (context)
+ memset(context, 0, PAGE_SIZE);
+ }
+}
+
static int restore_iommu_context(struct intel_iommu *iommu)
{
struct context_entry *context;
@@ -324,6 +344,8 @@ static int restore_iommu_context(struct intel_iommu *iommu)
BUG_ON(!kho_restore_folio(virt_to_phys(context)));
}
+ sanitize_iommu_context(iommu);
+
return ret;
}
--
2.51.0.536.g15c5d4f767-goog
* [RFC PATCH 15/15] iommufd/selftest: Add test to verify iommufd preservation
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
` (13 preceding siblings ...)
2025-09-28 19:06 ` [RFC PATCH 14/15] iommu/vt-d: sanitize restored root table and iommu contexts Samiullah Khawaja
@ 2025-09-28 19:06 ` Samiullah Khawaja
14 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-28 19:06 UTC (permalink / raw)
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, Jason Gunthorpe, iommu
Cc: Samiullah Khawaja, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
Test iommufd preservation by setting up an iommufd and preserving it
across a live update. The test takes the VFIO cdev path of a device bound
to the vfio-pci driver and binds it to the iommufd being preserved.
Note that the helper functions setup_cdev, open_iommufd, and
setup_iommufd will be replaced with the VFIO selftest library. Similarly,
the helper functions defined to open and interface with the Live Update
Orchestrator device will be replaced with a common helper library.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
tools/testing/selftests/iommu/Makefile | 1 +
.../selftests/iommu/iommufd_liveupdate.c | 196 ++++++++++++++++++
2 files changed, 197 insertions(+)
create mode 100644 tools/testing/selftests/iommu/iommufd_liveupdate.c
diff --git a/tools/testing/selftests/iommu/Makefile b/tools/testing/selftests/iommu/Makefile
index 84abeb2f0949..42c962c5e612 100644
--- a/tools/testing/selftests/iommu/Makefile
+++ b/tools/testing/selftests/iommu/Makefile
@@ -6,5 +6,6 @@ LDLIBS += -lcap
TEST_GEN_PROGS :=
TEST_GEN_PROGS += iommufd
TEST_GEN_PROGS += iommufd_fail_nth
+TEST_GEN_PROGS += iommufd_liveupdate
include ../lib.mk
diff --git a/tools/testing/selftests/iommu/iommufd_liveupdate.c b/tools/testing/selftests/iommu/iommufd_liveupdate.c
new file mode 100644
index 000000000000..1003d0cf2cae
--- /dev/null
+++ b/tools/testing/selftests/iommu/iommufd_liveupdate.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Samiullah Khawaja <skhawaja@google.com>
+ */
+
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+#include <unistd.h>
+
+#define __EXPORTED_HEADERS__
+#include <linux/liveupdate.h>
+#include <linux/iommufd.h>
+#include <linux/types.h>
+#include <linux/vfio.h>
+
+#include "../kselftest.h"
+
+#define ksft_assert(condition) \
+ do { if (!(condition)) \
+ ksft_exit_fail_msg("Failed: %s at %s %d\n", \
+ #condition, __FILE__, __LINE__); } while (0)
+
+int setup_cdev(const char *vfio_cdev_path)
+{
+ int cdev_fd;
+
+ cdev_fd = open(vfio_cdev_path, O_RDWR);
+ if (cdev_fd < 0)
+ ksft_exit_skip("Failed to open VFIO cdev: %s\n", vfio_cdev_path);
+
+ return cdev_fd;
+}
+
+int open_iommufd(void)
+{
+ int iommufd;
+
+ iommufd = open("/dev/iommu", O_RDWR);
+ if (iommufd < 0)
+ ksft_exit_skip("Failed to open /dev/iommu. IOMMUFD support not enabled.\n");
+
+ return iommufd;
+}
+
+int setup_iommufd(int iommufd, int cdev_fd)
+{
+ int ret;
+
+ struct vfio_device_bind_iommufd bind = {
+ .argsz = sizeof(bind),
+ .flags = 0,
+ };
+ struct iommu_ioas_alloc alloc_data = {
+ .size = sizeof(alloc_data),
+ .flags = 0,
+ };
+ struct vfio_device_attach_iommufd_pt attach_data = {
+ .argsz = sizeof(attach_data),
+ .flags = 0,
+ };
+
+ bind.iommufd = iommufd;
+ ret = ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+ ksft_assert(!ret);
+
+ ret = ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+ ksft_assert(!ret);
+
+ attach_data.pt_id = alloc_data.out_ioas_id;
+ ret = ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+ ksft_assert(!ret);
+
+ return ret;
+}
+
+int open_liveupdate_orchestrator(void)
+{
+ int luo;
+
+ luo = open("/dev/liveupdate", O_RDWR);
+ ksft_assert(luo > 0);
+
+ return luo;
+}
+
+__u32 liveupdate_get_state(int luo)
+{
+ struct liveupdate_ioctl_get_state state;
+ int ret;
+
+ state.size = sizeof(state);
+ ret = ioctl(luo, LIVEUPDATE_IOCTL_GET_STATE, &state);
+ ksft_assert(!ret);
+
+ return state.state;
+}
+
+bool liveupdate_state_normal(int luo)
+{
+ return liveupdate_get_state(luo) == LIVEUPDATE_STATE_NORMAL;
+}
+
+bool liveupdate_state_updated(int luo)
+{
+ return liveupdate_get_state(luo) == LIVEUPDATE_STATE_UPDATED;
+}
+
+int liveupdate_set_event(int luo, enum liveupdate_event ev)
+{
+ struct liveupdate_ioctl_set_event event;
+ int ret;
+
+ event.event = ev;
+ event.size = sizeof(event);
+
+ ret = ioctl(luo, LIVEUPDATE_IOCTL_SET_EVENT, &event);
+ ksft_assert(!ret);
+
+ return ret;
+}
+
+int liveupdate_preserve_iommufd(int luo, int iommufd, int token)
+{
+ struct liveupdate_ioctl_fd_preserve preserve;
+ int ret;
+
+ preserve.fd = iommufd;
+ preserve.token = token;
+ preserve.size = sizeof(preserve);
+
+ ret = ioctl(luo, LIVEUPDATE_IOCTL_FD_PRESERVE, &preserve);
+ ksft_assert(!ret);
+
+ return ret;
+}
+
+int liveupdate_restore_iommufd(int luo, int token)
+{
+ struct liveupdate_ioctl_fd_restore restore;
+ int ret;
+
+ restore.token = token;
+ restore.size = sizeof(restore);
+
+ ret = ioctl(luo, LIVEUPDATE_IOCTL_FD_RESTORE, &restore);
+ ksft_assert(!ret);
+ ksft_assert(restore.fd > 0);
+
+ return restore.fd;
+}
+
+int main(int argc, char *argv[])
+{
+ int iommufd, cdev_fd, luo, ret;
+ const int token = 0x123456;
+
+ if (argc < 2) {
+ printf("Usage: ./iommufd_liveupdate <vfio_cdev_path>\n");
+ return 1;
+ }
+
+ cdev_fd = setup_cdev(argv[1]);
+
+ luo = open_liveupdate_orchestrator();
+ ksft_assert(luo > 0);
+
+ if (liveupdate_state_normal(luo))
+ iommufd = open_iommufd();
+ else if (liveupdate_state_updated(luo))
+ iommufd = liveupdate_restore_iommufd(luo, token);
+ else
+ ksft_exit_fail_msg("Test can only run when LUO state is normal or updated\n");
+
+ ret = setup_iommufd(iommufd, cdev_fd);
+ ksft_assert(!ret);
+
+ if (liveupdate_state_normal(luo)) {
+ ret = liveupdate_preserve_iommufd(luo, iommufd, token);
+ ksft_assert(!ret);
+
+ ret = liveupdate_set_event(luo, LIVEUPDATE_PREPARE);
+ ksft_assert(!ret);
+
+ while (1)
+ sleep(5);
+ } else {
+ ret = liveupdate_set_event(luo, LIVEUPDATE_FINISH);
+ ksft_assert(!ret);
+ }
+
+ return 0;
+}
+
--
2.51.0.536.g15c5d4f767-goog
* Re: [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
2025-09-28 19:06 ` [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal Samiullah Khawaja
@ 2025-09-29 15:51 ` Jason Gunthorpe
2025-09-29 16:50 ` Pasha Tatashin
2025-09-29 17:21 ` Samiullah Khawaja
0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-29 15:51 UTC (permalink / raw)
To: Samiullah Khawaja, Pasha Tatashin
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, iommu,
Robin Murphy, Pratyush Yadav, Kevin Tian, linux-kernel,
Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Sun, Sep 28, 2025 at 07:06:11PM +0000, Samiullah Khawaja wrote:
> Hotplugs should not be allowed when the live update state is not normal.
> This means either we have preserved the state of IOMMU hardware units or
> restoring the preserved state.
>
> The live update semaphore read lock should be taken before checking the
> live update state.
>
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> ---
> drivers/iommu/intel/dmar.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index ec975c73cfe6..248bc7e9b035 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -26,6 +26,7 @@
> #include <linux/dmi.h>
> #include <linux/slab.h>
> #include <linux/iommu.h>
> +#include <linux/liveupdate.h>
> #include <linux/numa.h>
> #include <linux/limits.h>
> #include <asm/irq_remapping.h>
> @@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
> if (tmp == NULL)
> return 0;
>
> + guard_liveupdate_state_read();
> + if (!liveupdate_state_normal())
> + return -EBUSY;
Pasha, this is madness!
Exactly why I said we should not have these crazy globals, people are
just going to sprinkle them randomly everywhere with no possible way
of ever understanding why or what they even are supposed to protect!
There is no reason to block hotplug. Do the locking and state tracking
properly so you only manage the instances that need to participate in
luo because they are linked to already plugged devices that are also
participating in luo.
Jason
* Re: [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain
2025-09-28 19:06 ` [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain Samiullah Khawaja
@ 2025-09-29 15:54 ` Jason Gunthorpe
2025-09-29 18:11 ` Samiullah Khawaja
0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-29 15:54 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Sun, Sep 28, 2025 at 07:06:13PM +0000, Samiullah Khawaja wrote:
> Add an API that can be called by the iommu users to preserve iommu
> domain. Currently it only marks the iommu_domain as preserved.
Merge it with the previous patch
> +#ifdef CONFIG_LIVEUPDATE
> + atomic_set(&domain->preserved, 0);
> +#endif
The memory is kzallocated, I don't think this is needed
> +int iommu_domain_preserve(struct iommu_domain *domain)
> +{
I expect this to accept some kind of luo pointer to signal what stream
the domain is part of.
Domains are linked to iommufd's which are linked to luo sessions. This
all needs to be carefully conveyed down to all the lower levels.
I also expect preserve to return some kind of handle that the caller
can hide away to deserialize.
> + lockdep_assert_held(&liveupdate_state_rwsem);
> + if (!domain->ops->preserve)
> + return -EOPNOTSUPP;
> +
> + ret = domain->ops->preserve(domain);
> + if (!ret)
> + atomic_set(&domain->preserved, 1);
And if we have a caller handle then there is probably no reason to
have this state tracking atomic.
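To make the shape being asked for concrete, here is a minimal user-space sketch, not the real kernel API. Every name in it (struct lu_session, struct domain, domain_preserve, demo_preserve) is a hypothetical stand-in, and using the page-table root as the handle is only an assumption for illustration. The point is that preserve takes the session and hands back a handle the caller stores, so no per-domain "preserved" flag is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "luo pointer" naming the session/stream. */
struct lu_session {
	int id;
};

struct domain;

/* Hypothetical ops table: preserve reports an opaque handle on success. */
struct domain_ops {
	int (*preserve)(struct domain *d, struct lu_session *s,
			uint64_t *handle);
};

struct domain {
	const struct domain_ops *ops;
	uint64_t pgtable_root;	/* illustrative state to serialize */
};

/*
 * Core-level wrapper: the caller passes the session down and gets a handle
 * back. The caller keeps the handle for deserialization, so the domain
 * itself carries no "preserved" state.
 */
static int domain_preserve(struct domain *d, struct lu_session *s,
			   uint64_t *handle)
{
	assert(d && s && handle);

	if (!d->ops || !d->ops->preserve)
		return -EOPNOTSUPP;

	return d->ops->preserve(d, s, handle);
}

/* Toy driver callback: uses the page-table root as the handle. */
static int demo_preserve(struct domain *d, struct lu_session *s,
			 uint64_t *handle)
{
	(void)s;	/* a real driver would record state in the session */
	*handle = d->pgtable_root;
	return 0;
}

static const struct domain_ops demo_ops = {
	.preserve = demo_preserve,
};
```

With something like this, iommufd would store the returned handle in its own serialized data and pass it back on restore.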
Jason
* Re: [RFC PATCH 08/15] iommu/vt-d: Implement live update preserve_iommu_context
2025-09-28 19:06 ` [RFC PATCH 08/15] iommu/vt-d: Implement live update preserve_iommu_context Samiullah Khawaja
@ 2025-09-29 15:57 ` Jason Gunthorpe
0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-29 15:57 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Sun, Sep 28, 2025 at 07:06:16PM +0000, Samiullah Khawaja wrote:
> +static int unpreserve_iommu_context(struct intel_iommu *iommu, int end)
> +{
> + struct context_entry *context;
> + int i;
> +
> + if (end < 0)
> + end = ROOT_ENTRY_NR;
> +
> + for (i = 0; i < end; i++) {
> + context = iommu_context_addr(iommu, i, 0, 0);
> + if (context)
> + WARN_ON_ONCE(kho_unpreserve_folio(virt_to_folio(context)));
Wrong function. IIRC all of these allocations came from iommu-pages.c
and have struct page metadata.
iommu-pages needs to participate in restoring them and put back its
struct page information.
> static int preserve_iommu_state(struct intel_iommu *iommu,
> struct iommu_unit_ser *ser)
> {
> - pr_warn("Not implemented\n");
> - return 0;
> + int ret;
> +
> + spin_lock(&iommu->lock);
> + ret = preserve_iommu_context(iommu);
> + if (ret)
> + goto error;
> +
> + ret = kho_preserve_folio(virt_to_folio(iommu->root_entry));
> + if (ret) {
> + unpreserve_iommu_context(iommu, -1);
> + goto error;
> + }
> +
> + ser->phys_addr = iommu->reg_phys;
> + ser->root_table = __pa(iommu->root_entry);
> + atomic_set(&iommu->preserved, 1);
Why all these atomics??
Also most probably this should all be flowing through the core code as
I think the core code has to generically prevent attach/detach/probe
from happening once serialization has started.
Jason
* Re: [RFC PATCH 09/15] iommu/vt-d: Add live update freeze callback
2025-09-28 19:06 ` [RFC PATCH 09/15] iommu/vt-d: Add live update freeze callback Samiullah Khawaja
@ 2025-09-29 15:58 ` Jason Gunthorpe
0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-29 15:58 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Sun, Sep 28, 2025 at 07:06:17PM +0000, Samiullah Khawaja wrote:
> +static int intel_liveupdate_freeze(struct liveupdate_subsystem *handle, u64 *data)
> +{
> + struct iommu_ser *ser = __va(*data);
> +
> + ser->iommu_units_phys = __pa(ser->iommu_units);
> + ser->devices_phys = __pa(ser->devices);
Why didn't this happen at an earlier stage? It makes no sense to have
some hacky naked __pa here
Jason
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-28 19:06 ` [RFC PATCH 13/15] iommufd: Persist iommu domains for live update Samiullah Khawaja
@ 2025-09-29 16:00 ` Jason Gunthorpe
2025-09-29 17:32 ` Samiullah Khawaja
2025-09-30 13:07 ` Pasha Tatashin
0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-29 16:00 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> + struct iommufd_lu *iommufd_lu)
> +{
> + struct iommufd_hwpt_paging *hwpt_paging;
> + struct iommufd_ioas *ioas = NULL;
> + struct iommufd_object *obj;
> + unsigned long index;
> + int rc;
> +
> + /* Iterate each ioas. */
> + xa_for_each(&ictx->objects, index, obj) {
> + if (obj->type != IOMMUFD_OBJ_IOAS)
> + continue;
Wrong locking
> +
> + ioas = (struct iommufd_ioas *)obj;
> + mutex_lock(&ioas->mutex);
> +
> + /*
> + * TODO: Iterate over each device of this iommufd and only save
> + * hwpt/domain if the device is persisted.
> + */
> + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> + if (!hwpt_paging->common.domain)
> + continue;
I don't think this should be automatic. The user should directly
serialize/unserialize HWPTs by ID.
Jason
* Re: [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
2025-09-29 15:51 ` Jason Gunthorpe
@ 2025-09-29 16:50 ` Pasha Tatashin
2025-09-29 17:21 ` Samiullah Khawaja
1 sibling, 0 replies; 53+ messages in thread
From: Pasha Tatashin @ 2025-09-29 16:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Mon, Sep 29, 2025 at 11:51 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:11PM +0000, Samiullah Khawaja wrote:
> > Hotplugs should not be allowed when the live update state is not normal.
> > This means either we have preserved the state of IOMMU hardware units or
> > restoring the preserved state.
> >
> > The live update semaphore read lock should be taken before checking the
> > live update state.
> >
> > Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> > ---
> > drivers/iommu/intel/dmar.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index ec975c73cfe6..248bc7e9b035 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -26,6 +26,7 @@
> > #include <linux/dmi.h>
> > #include <linux/slab.h>
> > #include <linux/iommu.h>
> > +#include <linux/liveupdate.h>
> > #include <linux/numa.h>
> > #include <linux/limits.h>
> > #include <asm/irq_remapping.h>
> > @@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
> > if (tmp == NULL)
> > return 0;
> >
> > + guard_liveupdate_state_read();
> > + if (!liveupdate_state_normal())
> > + return -EBUSY;
>
> Pasha, this is madness!
>
> Exactly why I said we should not have these crazy globals, people are
> just going to sprinkle them randomly everywhere with no possible way
We now have per session "state", so presumably, LUO should provide an interface:
"struct file" -> session LUO state.
We should probably add interfaces like these:
liveupdate_is_preserved(struct file *) -> return true if file is preserved.
liveupdate_state(struct file *) -> returns the current state (or
LIVEUPDATE_STATE_UNDEFINED if unpreserved) for the session to which
this FD belongs (in the future we could improve this to per-FD
granularity if needed, but I think per-session is going to be
scalable enough).
liveupdate_state_read_enter(struct file *) -> to protect state
transition for the session to which this file belongs.
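A rough user-space model of the first two lookups, only to illustrate the intended semantics. This is a sketch, not LUO code: struct lu_file stands in for struct file, the lu_* names and the enum values are hypothetical, and the direct session pointer is an assumption (the real lookup would go through LUO's own bookkeeping). The read_enter locking is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-session states; the names are illustrative only. */
enum lu_state {
	LU_STATE_UNDEFINED,	/* file does not belong to any session */
	LU_STATE_NORMAL,
	LU_STATE_PREPARED,
	LU_STATE_UPDATED,
};

struct lu_session {
	enum lu_state state;
};

/* Stand-in for struct file; NULL session means "not preserved". */
struct lu_file {
	struct lu_session *session;
};

/* Models liveupdate_is_preserved(struct file *). */
static bool lu_is_preserved(const struct lu_file *file)
{
	assert(file != NULL);
	return file->session != NULL;
}

/* Models liveupdate_state(struct file *): state is per session, not global. */
static enum lu_state lu_state(const struct lu_file *file)
{
	if (!lu_is_preserved(file))
		return LU_STATE_UNDEFINED;

	return file->session->state;
}
```

A caller would then ask about the file it actually holds instead of consulting a global state, which is what makes the hotplug check above unnecessary for devices that are not part of any session.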
> of ever understanding why or what they even are supposed to protect!
>
> There is no reason to block hotplug. Do the locking and state tracking
This makes sense, adding a new device should be fine.
> properly so you only manage the instances that need to participate in
> luo because they are linked to already plugged devices that are also
> participating in luo.
Pasha
* Re: [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
2025-09-29 15:51 ` Jason Gunthorpe
2025-09-29 16:50 ` Pasha Tatashin
@ 2025-09-29 17:21 ` Samiullah Khawaja
1 sibling, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-29 17:21 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Mon, Sep 29, 2025 at 8:51 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:11PM +0000, Samiullah Khawaja wrote:
> > Hotplugs should not be allowed when the live update state is not normal.
> > This means either we have preserved the state of IOMMU hardware units or
> > restoring the preserved state.
> >
> > The live update semaphore read lock should be taken before checking the
> > live update state.
> >
> > Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> > ---
> > drivers/iommu/intel/dmar.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index ec975c73cfe6..248bc7e9b035 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -26,6 +26,7 @@
> > #include <linux/dmi.h>
> > #include <linux/slab.h>
> > #include <linux/iommu.h>
> > +#include <linux/liveupdate.h>
> > #include <linux/numa.h>
> > #include <linux/limits.h>
> > #include <asm/irq_remapping.h>
> > @@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
> > if (tmp == NULL)
> > return 0;
> >
> > + guard_liveupdate_state_read();
> > + if (!liveupdate_state_normal())
> > + return -EBUSY;
>
> Pasha, this is madness!
>
> Exactly why I said we should not have these crazy globals, people are
> just going to sprinkle them randomly everywhere with no possible way
> of ever understanding why or what they even are supposed to protect!
>
> There is no reason to block hotplug. Do the locking and state tracking
> properly so you only manage the instances that need to participate in
> luo because they are linked to already plugged devices that are also
> participating in luo.
Agreed.
I'll rework this and do proper state tracking once I rebase on top of
LUOv4 for next revision.
>
> Jason
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-29 16:00 ` Jason Gunthorpe
@ 2025-09-29 17:32 ` Samiullah Khawaja
2025-09-29 17:43 ` Jason Gunthorpe
2025-09-29 17:45 ` Pasha Tatashin
2025-09-30 13:07 ` Pasha Tatashin
1 sibling, 2 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-29 17:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Mon, Sep 29, 2025 at 9:00 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > + struct iommufd_lu *iommufd_lu)
> > +{
> > + struct iommufd_hwpt_paging *hwpt_paging;
> > + struct iommufd_ioas *ioas = NULL;
> > + struct iommufd_object *obj;
> > + unsigned long index;
> > + int rc;
> > +
> > + /* Iterate each ioas. */
> > + xa_for_each(&ictx->objects, index, obj) {
> > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > + continue;
>
> Wrong locking
>
> > +
> > + ioas = (struct iommufd_ioas *)obj;
> > + mutex_lock(&ioas->mutex);
> > +
> > + /*
> > + * TODO: Iterate over each device of this iommufd and only save
> > + * hwpt/domain if the device is persisted.
> > + */
> > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > + if (!hwpt_paging->common.domain)
> > + continue;
>
> I don't think this should be automatic. The user should directly
> serialize/unserialize HWPTs by ID.
Interesting. So the user should be able to serialize/unserialize HWPTs
before the Live Update PREPARE event? But what if a device was marked
for preservation but the user never serialized the attached HWPT,
would that be considered an error during LUO PREPARE or should iommufd
serialize the remaining HWPTs here?
>
> Jason
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-29 17:32 ` Samiullah Khawaja
@ 2025-09-29 17:43 ` Jason Gunthorpe
2025-09-29 17:45 ` Pasha Tatashin
1 sibling, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-29 17:43 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Mon, Sep 29, 2025 at 10:32:22AM -0700, Samiullah Khawaja wrote:
> On Mon, Sep 29, 2025 at 9:00 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > + struct iommufd_lu *iommufd_lu)
> > > +{
> > > + struct iommufd_hwpt_paging *hwpt_paging;
> > > + struct iommufd_ioas *ioas = NULL;
> > > + struct iommufd_object *obj;
> > > + unsigned long index;
> > > + int rc;
> > > +
> > > + /* Iterate each ioas. */
> > > + xa_for_each(&ictx->objects, index, obj) {
> > > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > > + continue;
> >
> > Wrong locking
> >
> > > +
> > > + ioas = (struct iommufd_ioas *)obj;
> > > + mutex_lock(&ioas->mutex);
> > > +
> > > + /*
> > > + * TODO: Iterate over each device of this iommufd and only save
> > > + * hwpt/domain if the device is persisted.
> > > + */
> > > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > + if (!hwpt_paging->common.domain)
> > > + continue;
> >
> > I don't think this should be automatic. The user should directly
> > serialize/unserialize HWPTs by ID.
> Interesting. So the user should be able to serialize/unserialize HWPTs
> before the Live Update PREPARE event? But what if a device was marked
> for preservation but the user never serialized the attached HWPT,
> would that be considered an error during LUO PREPARE or should iommufd
> serialize the remaining HWPTs here?
yes that would be an error
I also think your patch series is a bit upside down, you should
present the iommufd and core pieces first, then come with a driver
implementation last.
It will be easier to understand the context than having a driver
implementation appear out of nowhere with no callers.
And everything should be driven by iommufd in this step, the iommu
driver should not be magically auto-preserving itself. Just preserve
the drivers linked to devices being preserved by iommufd.
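As a sketch of the rule above (a preserved device whose attached HWPT was never explicitly serialized is an error at prepare time), something along these lines could work. It is a user-space model under stated assumptions: struct hwpt, struct device and check_prepare are hypothetical names, and -EINVAL is only a placeholder for whatever error code ends up being used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a HWPT the user may preserve by ID. */
struct hwpt {
	unsigned int id;
	bool preserved;		/* set when the user serialized this HWPT */
};

/* Hypothetical model of a device bound to the iommufd. */
struct device {
	bool preserved;		/* device was marked for preservation */
	const struct hwpt *hwpt;	/* attached HWPT, or NULL */
};

/*
 * PREPARE-time check: nothing is serialized automatically. A preserved
 * device attached to a HWPT that was not preserved fails the prepare step.
 */
static int check_prepare(const struct device *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].preserved)
			continue;
		if (devs[i].hwpt && !devs[i].hwpt->preserved)
			return -EINVAL;	/* placeholder error code */
	}

	return 0;
}
```

Devices that are not marked for preservation are skipped entirely, so their HWPTs and the iommu instances behind them never enter the picture.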
Jason
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-29 17:32 ` Samiullah Khawaja
2025-09-29 17:43 ` Jason Gunthorpe
@ 2025-09-29 17:45 ` Pasha Tatashin
1 sibling, 0 replies; 53+ messages in thread
From: Pasha Tatashin @ 2025-09-29 17:45 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Jason Gunthorpe, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Mon, Sep 29, 2025 at 1:32 PM Samiullah Khawaja <skhawaja@google.com> wrote:
>
> On Mon, Sep 29, 2025 at 9:00 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > + struct iommufd_lu *iommufd_lu)
> > > +{
> > > + struct iommufd_hwpt_paging *hwpt_paging;
> > > + struct iommufd_ioas *ioas = NULL;
> > > + struct iommufd_object *obj;
> > > + unsigned long index;
> > > + int rc;
> > > +
> > > + /* Iterate each ioas. */
> > > + xa_for_each(&ictx->objects, index, obj) {
> > > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > > + continue;
> >
> > Wrong locking
> >
> > > +
> > > + ioas = (struct iommufd_ioas *)obj;
> > > + mutex_lock(&ioas->mutex);
> > > +
> > > + /*
> > > + * TODO: Iterate over each device of this iommufd and only save
> > > + * hwpt/domain if the device is persisted.
> > > + */
> > > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > + if (!hwpt_paging->common.domain)
> > > + continue;
> >
> > I don't think this should be automatic. The user should directly
> > serialize/unserialize HWPTs by ID.
> Interesting. So the user should be able to serialize/unserialize HWPTs
> before the Live Update PREPARE event? But what if a device was marked
> for preservation but the user never serialized the attached HWPT,
> would that be considered an error during LUO PREPARE or should iommufd
> serialize the remaining HWPTs here?
Users ~can~ serialize their sessions before the system-wide prepare
event. During the prepare event, all unserialized sessions and their FDs
are going to be serialized anyway.
Pasha
> >
> > Jason
* Re: [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain
2025-09-29 15:54 ` Jason Gunthorpe
@ 2025-09-29 18:11 ` Samiullah Khawaja
0 siblings, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-29 18:11 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
Pasha Tatashin, iommu, Robin Murphy, Pratyush Yadav, Kevin Tian,
linux-kernel, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
Leon Romanovsky, William Tu, Vipin Sharma, dmatlack, zhuyifei,
Chris Li, praan
On Mon, Sep 29, 2025 at 8:54 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:13PM +0000, Samiullah Khawaja wrote:
> > Add an API that can be called by the iommu users to preserve iommu
> > domain. Currently it only marks the iommu_domain as preserved.
>
> Merge it with the previous patch
>
> > +#ifdef CONFIG_LIVEUPDATE
> > + atomic_set(&domain->preserved, 0);
> > +#endif
>
> The memory is kzallocated, I don't think this is needed
>
> > +int iommu_domain_preserve(struct iommu_domain *domain)
> > +{
>
> I expect this to accept some kind of luo pointer to signal what stream
> the domain is part of.
>
> Domains are linked to iommufd's which are linked to luo sessions. This
> all needs to be carefully conveyed down to all the lower levels.
>
Agreed. Currently this is based on LUOv3 and once I rebase on top of
LUOv4 I will rework all this.
> I also expect preserve to return some kind of handle that the caller
> can hide away to deserialize.
>
> > + lockdep_assert_held(&liveupdate_state_rwsem);
> > + if (!domain->ops->preserve)
> > + return -EOPNOTSUPP;
> > +
> > + ret = domain->ops->preserve(domain);
> > + if (!ret)
> > + atomic_set(&domain->preserved, 1);
>
> And if we have a caller handle then there is probably no reason to
> have this state tracking atomic.
Yes, all the domain states can be serialized and backed by the caller
handle. But we will still need to mark the backing iommus for
preservation, because the state of preserved iommus will be needed for
restoration during iommu init during boot. That is why I have a LUO
subsystem registered for IOMMU.
>
> Jason
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-29 16:00 ` Jason Gunthorpe
2025-09-29 17:32 ` Samiullah Khawaja
@ 2025-09-30 13:07 ` Pasha Tatashin
2025-09-30 13:59 ` Jason Gunthorpe
1 sibling, 1 reply; 53+ messages in thread
From: Pasha Tatashin @ 2025-09-30 13:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > + struct iommufd_lu *iommufd_lu)
> > +{
> > + struct iommufd_hwpt_paging *hwpt_paging;
> > + struct iommufd_ioas *ioas = NULL;
> > + struct iommufd_object *obj;
> > + unsigned long index;
> > + int rc;
> > +
> > + /* Iterate each ioas. */
> > + xa_for_each(&ictx->objects, index, obj) {
> > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > + continue;
>
> Wrong locking
>
> > +
> > + ioas = (struct iommufd_ioas *)obj;
> > + mutex_lock(&ioas->mutex);
> > +
> > + /*
> > + * TODO: Iterate over each device of this iommufd and only save
> > + * hwpt/domain if the device is persisted.
> > + */
> > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > + if (!hwpt_paging->common.domain)
> > + continue;
>
> I don't think this should be automatic. The user should directly
> serialize/unserialize HWPTs by ID.
Why not? The Live Update uAPI is handled through FDs, and both iommufd
and vfiofd have to be preserved; I assume we can automatically
determine the hwpt to be preserved through dependencies. Why would we
delegate this to the user?
Pasha
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 13:07 ` Pasha Tatashin
@ 2025-09-30 13:59 ` Jason Gunthorpe
2025-09-30 15:09 ` Pasha Tatashin
2025-09-30 20:02 ` Samiullah Khawaja
0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-30 13:59 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 09:07:48AM -0400, Pasha Tatashin wrote:
> On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > + struct iommufd_lu *iommufd_lu)
> > > +{
> > > + struct iommufd_hwpt_paging *hwpt_paging;
> > > + struct iommufd_ioas *ioas = NULL;
> > > + struct iommufd_object *obj;
> > > + unsigned long index;
> > > + int rc;
> > > +
> > > + /* Iterate each ioas. */
> > > + xa_for_each(&ictx->objects, index, obj) {
> > > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > > + continue;
> >
> > Wrong locking
> >
> > > +
> > > + ioas = (struct iommufd_ioas *)obj;
> > > + mutex_lock(&ioas->mutex);
> > > +
> > > + /*
> > > + * TODO: Iterate over each device of this iommufd and only save
> > > + * hwpt/domain if the device is persisted.
> > > + */
> > > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > + if (!hwpt_paging->common.domain)
> > > + continue;
> >
> > I don't think this should be automatic. The user should directly
> > serialize/unserialize HWPTs by ID.
>
> Why not? Live Updated uAPI is handled through FDs, and both iommufd
> and vfiofd have to be preserved; I assume we can automatically
> determine the hwpt to be preserved through dependencies. Why would we
> delegate this to the user?
There are HWPTs outside the IOAS so it is inconsistent.
We are not going to reconstruct the IOAS.
The IDR ids of the HWPT may not be available on restore (we cannot
make this ABI), so without userspace expressly labeling them and
recovering the new IDR ids it doesn't work.
Finally we expect to discard the preserved HWPTs and replace them with
rebuilt ones, at least as a first step. Userspace needs to sequence all
of this.
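A minimal sketch of the labeling idea above, assuming a userspace-chosen token is what survives the update while the object ID is reassigned by the new kernel. None of this is from the patch series; `lu_hwpt_table`, `lu_preserve_hwpt()` and `lu_restore_hwpt()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LU_MAX_HWPTS 8

/* One preserved HWPT: a userspace-chosen label plus the ID it had. */
struct lu_hwpt_entry {
	uint64_t token;   /* stable label chosen by userspace */
	uint32_t old_id;  /* ID in the outgoing kernel, NOT reused on restore */
	int in_use;
};

struct lu_hwpt_table {
	struct lu_hwpt_entry e[LU_MAX_HWPTS];
};

/* Before kexec: userspace explicitly names the HWPT it wants preserved. */
static int lu_preserve_hwpt(struct lu_hwpt_table *t, uint64_t token,
			    uint32_t hwpt_id)
{
	size_t i;

	assert(t);
	for (i = 0; i < LU_MAX_HWPTS; i++)
		if (t->e[i].in_use && t->e[i].token == token)
			return -1; /* duplicate label */
	for (i = 0; i < LU_MAX_HWPTS; i++) {
		if (!t->e[i].in_use) {
			t->e[i] = (struct lu_hwpt_entry){ token, hwpt_id, 1 };
			return 0;
		}
	}
	return -1; /* table full */
}

/*
 * After kexec: look the HWPT up by token and hand back whatever ID the
 * new kernel allocates. The old ID is never consulted, so it does not
 * become ABI. An entry can be restored only once.
 */
static int lu_restore_hwpt(struct lu_hwpt_table *t, uint64_t token,
			   uint32_t *next_id, uint32_t *out_id)
{
	size_t i;

	assert(t && next_id && out_id);
	for (i = 0; i < LU_MAX_HWPTS; i++) {
		if (t->e[i].in_use && t->e[i].token == token) {
			*out_id = (*next_id)++;
			t->e[i].in_use = 0;
			return 0;
		}
	}
	return -1; /* unknown or already restored */
}
```

Userspace would keep the token in its own saved state and use the returned ID for every later call against the restored object.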
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 13:59 ` Jason Gunthorpe
@ 2025-09-30 15:09 ` Pasha Tatashin
2025-09-30 16:31 ` Jason Gunthorpe
2025-09-30 20:02 ` Samiullah Khawaja
1 sibling, 1 reply; 53+ messages in thread
From: Pasha Tatashin @ 2025-09-30 15:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 9:59 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 09:07:48AM -0400, Pasha Tatashin wrote:
> > On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > > + struct iommufd_lu *iommufd_lu)
> > > > +{
> > > > + struct iommufd_hwpt_paging *hwpt_paging;
> > > > + struct iommufd_ioas *ioas = NULL;
> > > > + struct iommufd_object *obj;
> > > > + unsigned long index;
> > > > + int rc;
> > > > +
> > > > + /* Iterate each ioas. */
> > > > + xa_for_each(&ictx->objects, index, obj) {
> > > > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > > > + continue;
> > >
> > > Wrong locking
> > >
> > > > +
> > > > + ioas = (struct iommufd_ioas *)obj;
> > > > + mutex_lock(&ioas->mutex);
> > > > +
> > > > + /*
> > > > + * TODO: Iterate over each device of this iommufd and only save
> > > > + * hwpt/domain if the device is persisted.
> > > > + */
> > > > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > > + if (!hwpt_paging->common.domain)
> > > > + continue;
> > >
> > > I don't think this should be automatic. The user should directly
> > > serialize/unserialize HWPTs by ID.
> >
> > Why not? Live Updated uAPI is handled through FDs, and both iommufd
> > and vfiofd have to be preserved; I assume we can automatically
> > determine the hwpt to be preserved through dependencies. Why would we
> > delegate this to the user?
>
> There are HWPTs outside the IOAS so it is inconsisent.
>
> We are not going to reconstruct the IOAS.
>
> The IDR ids of the HWPT may not be available on restore (we cannot
> make this ABI), so without userspace expressly labeling them and
> recovering the new IDR ids it doesn't work.
>
> Finally we expect to discard the preserved HWPTs and replace them we
> rebuilt ones at least as a first step. Userspace needs to sequence all
> of this..
The way LUOv4 is implemented, "LUO sessions" always participate in
live update. Once a user adds file descriptors to a session, that
session and its contents are automatically carried across multiple
consecutive live updates. The user only needs to act if they explicitly
want to remove an FD and opt out of preservation, or close the session.
This is consistent and convenient for long-running VMs that should
survive by default.
I was hoping for a similar "preserve by default" or "opt-in-once"
model for iommufd objects that are put into the LUO session to avoid a
flurry of IOCTLs to re-register before every single live update.
On the other hand, userspace still has to issue IOCTLs after retrieval
to bring the restored FDs and associated objects back to a workable
state. Perhaps, we could do something like "Yes, I'm actively using
this object again, so please preserve it if another live update
happens." ?
Pasha
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 15:09 ` Pasha Tatashin
@ 2025-09-30 16:31 ` Jason Gunthorpe
0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-30 16:31 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 11:09:59AM -0400, Pasha Tatashin wrote:
>
> The way LUOv4 is implemented, "LUO sessions" are always participating
> LU. Once a user adds file descriptors to a session, that session and
> its contents are automatically carried across multiple consecutive
> live updates. The user only needs to act if they explicitly want to
> remove an FD and opt-out of preservation, or close session. This is
> consistent and convenient for long-running VM that should survive by
> default.
I don't think this is a good idea. Each kernel should decide on its
own what and how things get included and manage the labels, from
scratch.
If you do this then a lot more stuff becomes ABI and I think it will
turn into a huge PITA.
The userspace already has to have the code to set up the luo if it is
on a clean reboot - what is the point of not running that every time?
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 13:59 ` Jason Gunthorpe
2025-09-30 15:09 ` Pasha Tatashin
@ 2025-09-30 20:02 ` Samiullah Khawaja
2025-09-30 21:05 ` Jason Gunthorpe
1 sibling, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-30 20:02 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 6:59 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 09:07:48AM -0400, Pasha Tatashin wrote:
> > On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > > + struct iommufd_lu *iommufd_lu)
> > > > +{
> > > > + struct iommufd_hwpt_paging *hwpt_paging;
> > > > + struct iommufd_ioas *ioas = NULL;
> > > > + struct iommufd_object *obj;
> > > > + unsigned long index;
> > > > + int rc;
> > > > +
> > > > + /* Iterate each ioas. */
> > > > + xa_for_each(&ictx->objects, index, obj) {
> > > > + if (obj->type != IOMMUFD_OBJ_IOAS)
> > > > + continue;
> > >
> > > Wrong locking
> > >
> > > > +
> > > > + ioas = (struct iommufd_ioas *)obj;
> > > > + mutex_lock(&ioas->mutex);
> > > > +
> > > > + /*
> > > > + * TODO: Iterate over each device of this iommufd and only save
> > > > + * hwpt/domain if the device is persisted.
> > > > + */
> > > > + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > > + if (!hwpt_paging->common.domain)
> > > > + continue;
> > >
> > > I don't think this should be automatic. The user should directly
> > > serialize/unserialize HWPTs by ID.
> >
> > Why not? Live Updated uAPI is handled through FDs, and both iommufd
> > and vfiofd have to be preserved; I assume we can automatically
> > determine the hwpt to be preserved through dependencies. Why would we
> > delegate this to the user?
>
> There are HWPTs outside the IOAS so it is inconsisent.
This makes sense. But if I understand correctly, a HWPT should be
associated one way or another with a preserved device or IOAS. Also the
nested ones will have a parent HWPT. Can we not look at the dependencies
here and find the HWPTs that need to be preserved?
>
> We are not going to reconstruct the IOAS.
>
> The IDR ids of the HWPT may not be available on restore (we cannot
> make this ABI), so without userspace expressly labeling them and
> recovering the new IDR ids it doesn't work.
>
> Finally we expect to discard the preserved HWPTs and replace them we
> rebuilt ones at least as a first step. Userspace needs to sequence all
> of this..
But if we discard the old HWPTs and replace them with the new ones, we
shouldn't need labeling of the old HWPTs? We would definitely need to
sequence the replacement and discard of the old ones, but that can
also be inferred through the dependencies between the new HWPTs?
>
> Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 20:02 ` Samiullah Khawaja
@ 2025-09-30 21:05 ` Jason Gunthorpe
2025-09-30 23:15 ` Samiullah Khawaja
0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2025-09-30 21:05 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 01:02:31PM -0700, Samiullah Khawaja wrote:
> > There are HWPTs outside the IOAS so it is inconsisent.
>
> This makes sense. But if I understand correctly a HWPT should be
> associated one way or another to a preserved device or IOAS. Also the
> nested ones will have parent HWPT. Can we not look at the dependencies
> here and find the HWPTs that need to preserved.
Maybe in some capacity, but I would say it is more a matter of not
allowing things to be preserved if they depend on things that are not
already preserved somehow.
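To illustrate that rule, here is a rough sketch in which preserving an object is refused unless the object it depends on is already preserved. The `lu_obj` type and `lu_obj_preserve()` are hypothetical, and a single `parent` pointer is an assumption standing in for whatever dependency tracking iommufd would actually use:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an iommufd object taking part in live update. */
struct lu_obj {
	struct lu_obj *parent; /* e.g. parent HWPT of a nested HWPT, or NULL */
	bool preserved;
};

/*
 * Refuse to preserve an object whose dependency is not preserved yet.
 * Nothing is inferred or preserved implicitly; the caller has to
 * preserve the parent first.
 */
static int lu_obj_preserve(struct lu_obj *obj)
{
	assert(obj);
	if (obj->parent && !obj->parent->preserved)
		return -EINVAL;
	obj->preserved = true;
	return 0;
}
```

With this shape, userspace that preserves a nested HWPT before its parent gets an error instead of a silently incomplete preservation.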
> > Finally we expect to discard the preserved HWPTs and replace them we
> > rebuilt ones at least as a first step. Userspace needs to sequence all
> > of this..
>
> But if we discard the old HWPTs and replace them with the new ones, we
> shouldn't need labeling of the old HWPTs? We would definitely need to
> sequence the replacement and discard of the old ones, but that can
> also be inferred through the dependencies between the new HWPTs?
It depends how this ends up being designed and who is responsible to
free the restored iommu_domain.
The iommu core code should be restoring the iommu_domain as soon as
the attached device is plugged in and attaching the preserved domain
instead of something else during the device probe sequence.
This logic should not be in drivers.
From there you either put the hwpt back into iommufd and have it free
the iommu_domain when it destroys the hwpt.
Or you have the iommu core code free the iommu_domain at some point
after iommufd has replaced the attachment with a new iommu_domain?
I'm not sure which is the better option.
Also there is an interesting behavior to note that if the iommu driver
restores a domain then it will also prevent a non-vfio driver from
binding to that device.
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 21:05 ` Jason Gunthorpe
@ 2025-09-30 23:15 ` Samiullah Khawaja
2025-10-01 11:47 ` Jason Gunthorpe
0 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-09-30 23:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 2:05 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 01:02:31PM -0700, Samiullah Khawaja wrote:
> > > There are HWPTs outside the IOAS so it is inconsisent.
> >
> > This makes sense. But if I understand correctly a HWPT should be
> > associated one way or another to a preserved device or IOAS. Also the
> > nested ones will have parent HWPT. Can we not look at the dependencies
> > here and find the HWPTs that need to preserved.
>
> Maybe in some capacity, but I would say more of don't allow preserving
> things that depend on things not already preserved somehow.
I agree. I think this makes sense. Users can explicitly indicate that
they want to preserve HWPTs and iommufd can enforce the dependencies.
>
> > > Finally we expect to discard the preserved HWPTs and replace them we
> > > rebuilt ones at least as a first step. Userspace needs to sequence all
> > > of this..
> >
> > But if we discard the old HWPTs and replace them with the new ones, we
> > shouldn't need labeling of the old HWPTs? We would definitely need to
> > sequence the replacement and discard of the old ones, but that can
> > also be inferred through the dependencies between the new HWPTs?
>
> It depends how this ends up being designed and who is responsible to
> free the restored iommu_domain.
Agreed. I think it depends on how much is restored from the previous
kernel. Discussed further below inline.
>
> The iommu core code should be restoring the iommu_domain as soon as
> the attached device is plugged in and attaching the preserved domain
> instead of something else during the device probe sequence
>
> This logic should not be in drivers.
>
> From there you either put the hwpt back into iommufd and have it free
> the iommu_domain when it destroys the hwpt
>
> Or you have the iommu core code free the iommu_domain at some point
> after iommufd has replaced the attachment with a new iommu_domain?
But we cannot do the replacement during domain attachment because
userspace might not have fully prepared the new domain with all the
required DMA mappings. Replace during LUO finish?
This is actually very close to what I had in mind for the "Hotswap"
model. My thought was:
1. During boot, the IOMMU core sets up a default domain but doesn't
program the context entries for the preserved device. The hardware
keeps on using the old preserved tables.
2. Userspace restores the iommufd, creates a new HWPT/domain and
populates mappings.
3. On FINISH, the IOMMU core updates the context entries of preserved
devices to point to the new domain.
I have a sequence diagram for this in the cover letter also.
I understand the desire to have the preserved iommu domain be restored
during boot so the device has a default domain and there is an owner
of the attached restored domain, but that would prevent iommufd
from cooking a clean new domain.
Maybe we can refine the "Hotswap" model I had in mind. Basically on
boot the core restores the preserved iommu domain, but the core lets
iommufd attach a new domain with preserved devices without replacing
the underlying context entries? The core replaces the context entries
when the iommufd indicates that the domain is fully prepared (during
luo finish).
>
> I'm not sure which is a better option..
>
> Also there is an interesting behavior to note that if the iommu driver
> restores a domain then it will also prevent a non-vfio driver from
> binding to that device.
Agreed. I think in the "Hotswap" approach I discussed above, if we
don't restore the domain, the core can just commit the context entries
of the new default domain if a non-vfio driver is bound to the device.
>
> Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-09-30 23:15 ` Samiullah Khawaja
@ 2025-10-01 11:47 ` Jason Gunthorpe
2025-10-01 19:28 ` Pasha Tatashin
2025-10-02 1:00 ` Samiullah Khawaja
0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-01 11:47 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Tue, Sep 30, 2025 at 04:15:43PM -0700, Samiullah Khawaja wrote:
> > The iommu core code should be restoring the iommu_domain as soon as
> > the attached device is plugged in and attaching the preserved domain
> > instead of something else during the device probe sequence
> >
> > This logic should not be in drivers.
> >
> > From there you either put the hwpt back into iommufd and have it free
> > the iommu_domain when it destroys the hwpt
> >
> > Or you have the iommu core code free the iommu_domain at some point
> > after iommufd has replaced the attachment with a new iommu_domain?
>
> But we cannot do the replacement during domain attachment because
> userspace might not have fully prepared the new domain with all the
> required DMA mappings. Replace during LUO finish?
The idea is the kernel will restore the iommu_domain during early boot
in the iommu_core and then attach it. This should "rewrite" the IOMMU
HW context for that device with identical content. Drivers must be
enhanced to support this hitless rewrite (AMD and ARM are already
done).
At this point the kernel is operating normally with a normal domain
and a normal driver, no special luo stuff.
Later iommufd will come along and establish a HWPT that has an
identical translation. Then we replace the luo domain with the new
HWPT and free the luo domain.
> 1. During boot, the IOMMU core sets up a default domain but doesn't
> program the context entries for the preserved device. The hardware
> keeps on using the old preserved tables.
When the iommu driver first starts up it can take over the context
memory from the predecessor kernel. But it has to go through it and
clear out most of the context entries.
Only context entries belonging to devices marked for preservation
should be kept unchanged.
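A rough sketch of that sanitizing pass, assuming a single 256-slot context table indexed by devfn and a caller-provided array saying which devices were marked for preservation. The entry layout and the `ctx_table_sanitize()` name are illustrative only; a real driver would also need the cache flushes and invalidations that are not modelled here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CTX_ENTRIES 256 /* one slot per devfn on a bus */

/* Opaque 128-bit context entry; the real bit layout is not modelled. */
struct ctx_entry {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Walk a context table inherited from the previous kernel. Slots that
 * belong to preserved devices are left untouched so that ongoing DMA
 * keeps translating; every other slot is cleared.
 * Returns the number of slots that were left untouched.
 */
static unsigned int ctx_table_sanitize(struct ctx_entry *tbl,
				       const bool *preserved)
{
	unsigned int devfn, kept = 0;

	assert(tbl && preserved);
	for (devfn = 0; devfn < CTX_ENTRIES; devfn++) {
		if (preserved[devfn]) {
			kept++;
			continue;
		}
		memset(&tbl[devfn], 0, sizeof(tbl[devfn]));
	}
	return kept;
}
```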
Later we probe the struct device to the iommu and do as I said above
to restore consistency.
> 2. Userspace restores the iommufd, creates a new HWPT/domain and
> populates mappings.
Yes
> 3. On FINISH, the IOMMU core updates the context entries of preserved
> devices to point to the new domain.
No, finish should never do anything on the restore path, IMHO. User
should directly attach the newly created HWPT when it is ready.
> I understand the desire to have the preserved iommu domain be restored
> during boot so the device has a default domain and there is an owner
> of the attached restored domain, but that would prevent the iommfud
> from cooking a clean new domain.
The "default domain" is the "DMA API domain" and it always has to be
created and set up. The change here is that, instead of attaching the
default domain, we attach the luo restored domain at early boot.
This sets the device into an "owned" mode but vfio can still attach
and nothing prevents iommufd from building a new hwpt and attaching
it.
> Maybe we can refine the "Hotswap" model I had in mind. Basically on
> boot the core restores the preserved iommu domain, but core lets
> iommufd attach a new domain with preserved devices without replacing
> the underlying context entries?
Replace the context entries. If everything is working properly the
preserved domain should compute an identical context entry, so no
reason to not just "replace" it which should be a NOP.
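The "replace is a NOP" argument can be sketched as follows. The entry encoding below is made up for illustration and is not the VT-d context entry format; `lu_domain`, `hw_ctx_compute()` and `hw_ctx_install()` are hypothetical names. The only point is that an entry recomputed from a faithfully restored domain compares equal to what the hardware is already using:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware context entry. */
struct hw_ctx {
	uint64_t lo;
	uint64_t hi;
};

/* The parts of a domain that determine the context entry in this model. */
struct lu_domain {
	uint64_t pgd_phys;  /* page-aligned address of the top-level table */
	uint16_t domain_id;
};

/* Made-up encoding: bit 0 = present, table address in lo, domain id in hi. */
static struct hw_ctx hw_ctx_compute(const struct lu_domain *d)
{
	struct hw_ctx e;

	e.lo = (d->pgd_phys & ~0xfffULL) | 1;
	e.hi = d->domain_id;
	return e;
}

/*
 * Attach a domain to a slot. Returns true if the hardware-visible entry
 * had to change, false if the recomputed entry is identical and the
 * "rewrite" is a no-op.
 */
static bool hw_ctx_install(struct hw_ctx *slot, const struct lu_domain *d)
{
	struct hw_ctx want;

	assert(slot && d);
	want = hw_ctx_compute(d);
	if (slot->lo == want.lo && slot->hi == want.hi)
		return false;
	*slot = want;
	return true;
}
```

If the restored domain ever computes a different entry than the one inherited from the previous kernel, that would show up as a real write here, which is a sign the preserved state was not reproduced exactly.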
> > Also there is an interesting behavior to note that if the iommu driver
> > restores a domain then it will also prevent a non-vfio driver from
> > binding to that device.
>
> Agreed. I think in the "Hotswap" approach I discussed above, if we
> don't restore the domain, the core can just commit the context entries
> of the new default domain if a non-vfio driver is bound to the device.
As I said, the owned nature of the device will prevent attaching a
non-vfio driver in the first place.
So the only path forward for userspace is to attach vfio, and then
iommufd should take over that luo restored iommu_domain and eventually
free it.
You might consider that finish should de-own the device if vfio didn't
claim it. But that is a bit tricky since it needs a FLR before the
domains can be switched around.
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-01 11:47 ` Jason Gunthorpe
@ 2025-10-01 19:28 ` Pasha Tatashin
2025-10-02 11:57 ` Jason Gunthorpe
2025-10-02 1:00 ` Samiullah Khawaja
1 sibling, 1 reply; 53+ messages in thread
From: Pasha Tatashin @ 2025-10-01 19:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
> > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > devices to point to the new domain.
>
> No, finish should never do anything on the restore path, IMHO. User
> should directly attach the newly created HWPT when it is ready.
But, finish is our indicator that a particular session (VM) is out of
blackout, and now we are free to do slow things, such as
re-allocating/recreating page tables. Why start it before a VM is out
of blackout?
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-01 11:47 ` Jason Gunthorpe
2025-10-01 19:28 ` Pasha Tatashin
@ 2025-10-02 1:00 ` Samiullah Khawaja
2025-10-02 13:41 ` Jason Gunthorpe
1 sibling, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-10-02 1:00 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Wed, Oct 1, 2025 at 4:47 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 04:15:43PM -0700, Samiullah Khawaja wrote:
>
> > > The iommu core code should be restoring the iommu_domain as soon as
> > > the attached device is plugged in and attaching the preserved domain
> > > instead of something else during the device probe sequence
> > >
> > > This logic should not be in drivers.
> > >
> > > From there you either put the hwpt back into iommufd and have it free
> > > the iommu_domain when it destroys the hwpt
> > >
> > > Or you have the iommu core code free the iommu_domain at some point
> > > after iommufd has replaced the attachment with a new iommu_domain?
> >
> > But we cannot do the replacement during domain attachment because
> > userspace might not have fully prepared the new domain with all the
> > required DMA mappings. Replace during LUO finish?
>
> The idea is the kernel will restore the iommu_domain during early boot
> in the iommu_core and then attach it. This should "rewrite" the IOMMU
> HW context for that device with identical content. Drivers must be
> enhanced to support this hitless rewrite (AMD and ARM are already
> done).
>
> At this point the kernel is operating normally with a normal domain
> and a normal driver, no special luo stuff.
>
> Later iommufd will come along and establish a HWPT that has an
> identical translation. Then we replace the luo domain with the new
> HWPT and free the luo domain.
>
> > 1. During boot, the IOMMU core sets up a default domain but doesn't
> > program the context entries for the preserved device. The hardware
> > keeps on using the old preserved tables.
>
> When the iommu driver first starts up it can take over the context
> memory from the predecessor kernel. But it has to go through it and
> clear out most of the context entries.
>
> Only context entries belonging to devices marked for preservation
> should be kept unchanged.
Agreed. We have to sanitize these and remove unused entries. I think
the same goes for any PASID tables.
>
> Later we probe the struct device to the iommu and do as I said above
> to restore consistency.
>
> > 2. Userspace restores the iommufd, creates a new HWPT/domain and
> > populates mappings.
>
> Yes
>
> > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > devices to point to the new domain.
>
> No, finish should never do anything on the restore path, IMHO. User
> should directly attach the newly created HWPT when it is ready.
Makes sense. But if the user never replaces the restored iommu_domain
with a new HWPT, we will have to discard the old (restored) domain on
finish since it doesn't have any associated HWPT. I see you already
hinted at this below. This needs to be handled carefully considering
the vfio cdev FD state also. Discussed further below.
>
> > I understand the desire to have the preserved iommu domain be restored
> > during boot so the device has a default domain and there is an owner
> > of the attached restored domain, but that would prevent the iommfud
> > from cooking a clean new domain.
>
> The "default domain" is the "DMA API domain" and it has to be created
> and setup always. The change here is instead of attaching the default
> domain we attach the luo restored domain at early boot.
Oh... I meant the group->domain instead of group->default_domain.
Should have written active domain instead of default domain.
>
> This sets the device into an "owned" mode but vfio can still attach
> and nothing prevents iommufd from building a new hwpt and attaching
> it.
This is the part that I was concerned about since I was looking into
the auto_domain. Users that attach to ioas directly and use
auto_domain would not be able to restore the mappings before attaching
to the device. But users that use HWPT directly should be able to
prepare a new domain and hotswap when ready. But I think a new
interface can be built to support IOAS only use cases also. We can
revisit this later.
>
> > Maybe we can refine the "Hotswap" model I had in mind. Basically on
> > boot the core restores the preserved iommu domain, but core lets
> > iommufd attach a new domain with preserved devices without replacing
> > the underlying context entries?
>
> Replace the context entries. If everything is working properly the
> preserved domain should compute an identical context entry, so no
> reason to not just "replace" it which should be a NOP.
>
> > > Also there is an interesting behavior to note that if the iommu driver
> > > restores a domain then it will also prevent a non-vfio driver from
> > > binding to that device.
> >
> > Agreed. I think in the "Hotswap" approach I discussed above, if we
> > don't restore the domain, the core can just commit the context entries
> > of the new default domain if a non-vfio driver is bound to the device.
>
> As I said, the owned nature of the device will prevent attaching a
> non-vfio driver in the first place.
>
> So the only path forward for userspace is to attach vfio, and then
> iommufd should take over that luo restored iommu_domain and eventually
> free it.
>
> You might consider that finish should de-own the device if vfio didn't
> claim it. But that is a bit tricky since it needs a FLR before the
> domains can be switched around.
That's a good point. But it might be tricky since the ownership of the
device is with the vfio cdev FD. So if vfio cdev FD is never
restored/reclaimed the device can be FLR'd. iommufd will follow along
and discard the domain.
The more interesting case might be where cdev is restored and bound to
iommufd but the user never recreates and hotswaps a new HWPT. In this
case we can discard the restored iommu_domain and replace it with the
blocking domain as it should have been if the device was not
preserved.
>
> Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-01 19:28 ` Pasha Tatashin
@ 2025-10-02 11:57 ` Jason Gunthorpe
2025-10-02 14:43 ` Pasha Tatashin
0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 11:57 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Wed, Oct 01, 2025 at 03:28:56PM -0400, Pasha Tatashin wrote:
> > > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > > devices to point to the new domain.
> >
> > No, finish should never do anything on the restore path, IMHO. User
> > should directly attach the newly created HWPT when it is ready.
>
> But, finish is our indicator that a particular session (VM) is out of
> blackout, and now we are free to do slow things, such as
> re-allocating/recreating page tables. Why start it before a VM is out
> of blackout?
Things should be paired. The suspend side is:
  start luo - "brown out" - kernel does basically nothing as the luo is empty
  add all sorts of things to sessions
  finish - kernel does last minute things
While the resume is the symmetric opposite:
  kexec boot - kernel restores the critical stuff it needs to boot to
               userspace
  userspace does all sorts of stuff and gets things out of the sessions
  finish - luo should be empty now as everything was taken out by
           userspace
I think when things come out of luo they should be fully operational
immediately.
Finish on resume shouldn't indicate anything specific beyond the fact
that the luo should be empty and everything should have been restored.
It isn't like finish on pre-kexec.
Userspace decides how it sequences things and what steps it takes
before ending blackout and resuming the VM.
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 1:00 ` Samiullah Khawaja
@ 2025-10-02 13:41 ` Jason Gunthorpe
2025-10-02 14:59 ` Pasha Tatashin
2025-10-02 17:03 ` Samiullah Khawaja
0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 13:41 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Wed, Oct 01, 2025 at 06:00:58PM -0700, Samiullah Khawaja wrote:
> > No, finish should never do anything on the restore path, IMHO. User
> > should directly attach the newly created HWPT when it is ready.
>
> Makes sense. But if the user never replaces the restored iommu_domain
> with a new HWPT, we will have to discard the old (restored) domain on
> finish since it doesn't have any associated HWPT. I see you already
> hinted at this below. This needs to be handled carefully considering
> the vfio cdev FD state also. Discussed further below.
I think the simplest thing is the domain exists forever until
userspace attaches an iommufd, takes ownership of it and frees it.
Nothing to do with finish.
While the domain is attached iommu_device_use_default_domain() will
fail.
> This is the part that I was concerned about since I was looking into
> the auto_domain. Users that attach to ioas directly and use
> auto_domain would not be able to restore the mappings before attaching
> to the device.
IMHO luo users need to be sophisticated enough to avoid auto_domain.
> That's a good point. But it might be tricky since the ownership of the
> device is with the vfio cdev FD. So if vfio cdev FD is never
> restored/reclaimed the device can be FLR'd. iommufd will follow along
> and discard the domain.
Honestly, I keep wanting things to be kept as simple as possible with
as few exception flows as necessary.
If we make it so that iommu_device_claim_dma_owner() is aware of luo
and the only way vfio can get ownership is if it is also restoring the
luo session then that sounds perfect.
Attaching a non-luo VFIO would be blocked by the kernel so we never
get these inconsistencies.
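The ownership rule described above could look roughly like the following
user-space C sketch. It is only an illustration under stated assumptions:
`struct demo_device`, `demo_claim_dma_owner()` and
`demo_use_default_domain()` are hypothetical stand-ins, not the kernel's
iommu_device_claim_dma_owner() or iommu_device_use_default_domain(), and
the error codes are a guess at what a real implementation might return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; not the kernel's struct device. */
struct demo_device {
	bool restored_domain_attached; /* preserved iommu_domain from the old kernel */
	const void *owner;             /* current DMA owner cookie, NULL if none */
};

/* Models the proposed LUO-aware ownership claim (assumed semantics). */
int demo_claim_dma_owner(struct demo_device *dev, const void *owner,
			 bool restoring_luo_session)
{
	assert(owner != NULL);
	if (dev->owner && dev->owner != owner)
		return -EBUSY;
	/* A preserved device may only be claimed while restoring its session. */
	if (dev->restored_domain_attached && !restoring_luo_session)
		return -EPERM;
	dev->owner = owner;
	return 0;
}

/* Models a kernel driver asking to use the default domain. */
int demo_use_default_domain(const struct demo_device *dev)
{
	/* Fails while the restored domain is attached or user DMA owns the device. */
	if (dev->restored_domain_attached || dev->owner)
		return -EBUSY;
	return 0;
}
```

With this shape, a VFIO open that is not part of the session restore is
rejected up front, so no later code has to detect and unwind a device
that was claimed while its restored domain was still attached.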
> The more interesting case might be where cdev is restored and bound to
> iommufd but the user never recreates and hotswaps a new HWPT. In this
> case we can discard the restored iommu_domain and replace it with the
> blocking domain as it should have been if the device was not
> preserved.
Maybe the HWPT has to be auto-created inside the iommufd as soon as it
is attached. The "restore" ioctl would just return back the ID of this
already created HWPT.
Again, this seems to avoid special cases as once we exit the special
luo mode of iommu_device_claim_dma_owner() iommufd is always
responsible for the iommu_domain.
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 11:57 ` Jason Gunthorpe
@ 2025-10-02 14:43 ` Pasha Tatashin
2025-10-02 15:10 ` Jason Gunthorpe
0 siblings, 1 reply; 53+ messages in thread
From: Pasha Tatashin @ 2025-10-02 14:43 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 7:57 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 01, 2025 at 03:28:56PM -0400, Pasha Tatashin wrote:
> > > > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > > > devices to point to the new domain.
> > >
> > > No, finish should never do anything on the restore path, IMHO. User
> > > should directly attach the newly created HWPT when it is ready.
> >
> > But, finish is our indicator that a particular session (VM) is out of
> > blackout, and now we are free to do slow things, such as
> > re-allocating/recreating page tables. Why start it before a VM is out
> > of blackout?
>
> Things should be paired.. The suspend side is
>
> start luo - "brown out" - kernel does basically nothing as the luo is empty
> add all sorts of things to sessions
> finish - kernel does last minute things
>
> While the resume is the symmetric opposite:
>
> kexec boot - kernel restores the critical stuff it needs to boot to
> userspace
> userspace does all sorts of stuff and gets things out of the sessions
> finish - luo should be empty now as everything was taken out by
> userspace
I see, so you are proposing that finish() is basically a no-op for
IOMMU as long as everything was properly reclaimed by userspace.
> I think when things come out of luo they should be fully operational
> immediately.
I agree. Once we are in "normal" mode, we should be done with all
live-update specifics. In this state, the kernel must be fully
operational without limitations or pending background work that could
reduce VM performance. Also, if any session was not reclaimed before
finish(), it and all resources associated with it should be terminated
during finish().
> Finish on resume shouldn't indicate anything specific beyond the luo
> should be empty and everything should have been restored. It isn't
> like finish on pre-kexec.
>
> Userspace decides how it sequences things and what steps it takes
> before ending blackout and resuming the VM.
This is a fair statement: userspace knows when vCPUs are resumed and
can decide when to do the HWPT swap. Following that logic, what if we
provide a specific ioctl() to perform the swap? Userspace could then
call that ioctl() prior to finish(), and during the finish() callback,
we would only need to do a quick sanity check that everything is in
order (i.e., resources were retrieved and the HWPTs were swapped).
What do we do if the user reclaimed iommufd but did not swap the HWPT
or did not perform some other ioctl() before finish()? Do we simply
print a kernel warning and let it be, or force the swap during finish
before going into normal mode?
Pasha
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 13:41 ` Jason Gunthorpe
@ 2025-10-02 14:59 ` Pasha Tatashin
2025-10-02 17:03 ` Samiullah Khawaja
1 sibling, 0 replies; 53+ messages in thread
From: Pasha Tatashin @ 2025-10-02 14:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 9:41 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 01, 2025 at 06:00:58PM -0700, Samiullah Khawaja wrote:
> > > No, finish should never do anything on the restore path, IMHO. User
> > > should directly attach the newly created HWPT when it is ready.
> >
> > Makes sense. But if the user never replaces the restored iommu_domain
> > with a new HWPT, we will have to discard the old (restored) domain on
> > finish since it doesn't have any associated HWPT. I see you already
> > hinted at this below. This needs to be handled carefully considering
> > the vfio cdev FD state also. Discussed further below.
>
> I think the simplest thing is the domain exists forever until
> userspace attaches an iommufd, takes ownership of it and frees it.
> Nothing to do with finish.
>
> While the domain is attached iommu_device_use_default_domain() will
> fail.
Ah you answered my question from my previous email, let me talk to Sami.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 14:43 ` Pasha Tatashin
@ 2025-10-02 15:10 ` Jason Gunthorpe
2025-10-02 19:29 ` Samiullah Khawaja
0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 15:10 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 02, 2025 at 10:43:45AM -0400, Pasha Tatashin wrote:
> > Finish on resume shouldn't indicate anything specific beyond the luo
> > should be empty and everything should have been restored. It isn't
> > like finish on pre-kexec.
> >
> > Userspace decides how it sequences things and what steps it takes
> > before ending blackout and resuming the VM.
>
> This is a fair statement: userspace knows when vCPUs are resumed and
> can decide when to do the HWPT swap. Following that logic, what if we
> provide a specific ioctl() to perform the swap?
Yeah, that is what I've been talking about. The ioctl already exists
in iommufd..
> What do we do if the user reclaimed iommufd but did not swap HWPT or
> did not perform some other ioctl() before finish(), simply print a
> kernel warning and let it be, or force swapping during finish before
> going into normal mode?
The problem we haven't discussed how to solve is the linkage between
the iommu_domain and the memfd.
The preserved iommu_domain refers to memory owned by the memfd, and
the pins don't get restored until the iommufd starts and generates new
pins. Thus we need to keep the memfd in a frozen state.
Maybe that is the real use case for finish - things like memfd remain
frozen until finish concludes.
However, in keeping with the keep-it-simple theme, finish can just not
succeed if there are stray objects floating around that userspace has
not cleaned up. E.g. a simple refcount that each iommu_domain
decrements when it is destroyed.
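As a rough illustration of the refcount idea above (not code from the
series), here is a minimal user-space C sketch. `struct luo_session`,
`luo_session_get()`, `luo_session_put()` and `luo_session_finish()` are
hypothetical names, and -EBUSY is an assumed error code. Each restored
object takes a reference, destroying it drops the reference, and finish
does no cleanup of its own: it only refuses while anything is left.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a LUO session on the restore side. */
struct luo_session {
	unsigned int outstanding; /* restored objects userspace has not destroyed */
	int finished;             /* set once finish has succeeded */
};

/* Called when a preserved object (e.g. a restored iommu_domain) comes back. */
void luo_session_get(struct luo_session *s)
{
	s->outstanding++;
}

/* Called when userspace destroys that object. */
void luo_session_put(struct luo_session *s)
{
	assert(s->outstanding > 0);
	s->outstanding--;
}

/* finish performs no restore work; it only checks nothing is left behind. */
int luo_session_finish(struct luo_session *s)
{
	if (s->outstanding)
		return -EBUSY;
	s->finished = 1;
	return 0;
}
```

The point of the design is that cleanup stays with userspace, which can
retry finish after destroying whatever it forgot.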
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 13:41 ` Jason Gunthorpe
2025-10-02 14:59 ` Pasha Tatashin
@ 2025-10-02 17:03 ` Samiullah Khawaja
2025-10-02 17:37 ` Jason Gunthorpe
1 sibling, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-10-02 17:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 6:41 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 01, 2025 at 06:00:58PM -0700, Samiullah Khawaja wrote:
> > > No, finish should never do anything on the restore path, IMHO. User
> > > should directly attach the newly created HWPT when it is ready.
> >
> > Makes sense. But if the user never replaces the restored iommu_domain
> > with a new HWPT, we will have to discard the old (restored) domain on
> > finish since it doesn't have any associated HWPT. I see you already
> > hinted at this below. This needs to be handled carefully considering
> > the vfio cdev FD state also. Discussed further below.
>
> I think the simplest thing is the domain exists forever until
> userspace attaches an iommufd, takes ownership of it and frees it.
> Nothing to do with finish.
Hmm.. I think this is tricky. There needs to be a way to clean up and
discard the old state if the userspace doesn't need it. And I think
the LUO (session) FINISH event is that trigger. Basically if the LUO
session manager (VMM or LUOD) decides that the finish needs to happen
and the iommufd (or the underlying HWPTs) are not restored, it means
that LUOD has decided that the VM is not going to come up and the
preserved state and resources (domain, device, memory) need to be
freed/released. If we don't do this in "FINISH" then the system will
be in a stuck state and the VM scheduler cannot schedule another VM
using the same device and resources.
>
> While the domain is attached iommu_device_use_default_domain() will
> fail.
Yes this makes sense.
>
> > This is the part that I was concerned about since I was looking into
> > the auto_domain. Users that attach to ioas directly and use
> > auto_domain would not be able to restore the mappings before attaching
> > to the device.
>
> IMHO luo users need to be sophisticated enough to avoid auto_domain.
Agreed.
>
> > That's a good point. But it might be tricky since the ownership of the
> > device is with the vfio cdev FD. So if vfio cdev FD is never
> > restored/reclaimed the device can be FLR'd. iommufd will follow along
> > and discard the domain.
>
> Honestly, I keep wanting things to be kept as simple as possible with
> as few exception flows as necessary.
>
> If we make it so that iommu_device_claim_dma_owner() is aware of luo
> and the only way vfio can get ownership is if it is also restoring the
> luo session then that sounds perfect.
>
> Attaching a non-luo VFIO would be blocked by the kernel so we never
> get these inconsistencies.
>
> > The more interesting case might be where cdev is restored and bound to
> > iommufd but the user never recreates and hotswaps a new HWPT. In this
> > case we can discard the restored iommu_domain and replace it with the
> > blocking domain as it should have been if the device was not
> > preserved.
>
> Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> is attached. The "restore" ioctl would just return back the ID of this
> already created HWPT.
Once we return the ID, do we make this HWPT mutable? Or is this
re-created HWPT just a handle to keep the domain ownership?
I think if we make it mutable, this will really complicate the design
and we will get into the sanity checking about attach/detach and
map/unmap calls on this HWPT. I think keeping the restored domain
attached to the preserved device until it is hotswapped with a new
HWPT is cleaner and simpler as you desire it to be.
I think if we consider FINISH a point where everything is supposed to
be reclaimed or discarded then this problem is solved. This should
also allow LUOD to clean up the resources and create new VMs using the
same device and resources. I see you suggested in the other thread
with Pasha that we can make FINISH fail if things are not reclaimed, I
think that also means that the system would be stuck in this state
indefinitely. Maybe this is correct since the domain is owned by VFIO
and needs to be released by it.
>
> Again, this seems to avoid special cases as once we exit the special
> luo mode of iommu_device_claim_dma_owner() iommufd is always
> responsible for the iommu_domain.
>
> Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 17:03 ` Samiullah Khawaja
@ 2025-10-02 17:37 ` Jason Gunthorpe
2025-10-02 18:08 ` Samiullah Khawaja
2025-10-10 1:28 ` Pasha Tatashin
0 siblings, 2 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 17:37 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > I think the simplest thing is the domain exists forever until
> > userspace attaches an iommufd, takes ownership of it and frees it.
> > Nothing to do with finish.
>
> Hmm.. I think this is tricky. There needs to be a way to clean up and
> discard the old state if the userspace doesn't need it.
Why?
Isn't "userspace doesn't need it" some extremely weird unused corner
case?
This should not be automatic or divorced from userspace, if the
operator would like to switch something out of LUO then they should
have userspace that co-ordinates this. Receive the iommufd, close it,
install a normal kernel driver.
Why make special code in the kernel to sequence this automatically?
> session manager (VMM or LUOD) decides that the finish needs to happen
> and the iommufd (or the underlying HWPTs) are not restored, it means
> that LUOD has decided that the VM is not going to come up and the
> preserved state and resources (domain, device, memory) need to be
> freed/released.
I've been assuming if luo fails so catastrophically the whole node
would reboot to recover.
Is there really a case where you might say a kexec happens and a
single VM out of many doesn't survive? Seems weird..
So to repeat above, if this is something people want then the
userspace should complete luo restoring the failed vm and then turn
around and free up all the resources. Why should the kernel
automatically do the same operations?
Maybe userspace needs some contingency flow where there is a dedicated
reaper program for a luo session. The VMM crashes during restore, OK,
we pass the luo FD to a reaper and it cleans up the objects in the
session and closes it.
> > Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> > is attached. The "restore" ioctl would just return back the ID of this
> > already created HWPT.
>
> Once we return the ID, do we make this HWPT mutable? Or is this
> re-created HWPT just a handle to keep the domain ownership?
That's a bigger question..
To start with, I was imagining that the restored iommu_domain is
immutable, e.g. it does not have map and unmap operations. It never
becomes mutable.
As I outlined this special luo immutable domain is then attached
during early boot, which should be a NOP, and gets turned into a HWPT
during iommufd restoration. The only thing userspace should be able to
do with that HWPT handle is destroy it after replacing it.
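The lifecycle described above can be sketched in user-space C as
follows. This is an assumption-laden illustration, not iommufd code:
`struct demo_hwpt` and the `demo_hwpt_*()` helpers are hypothetical, and
the error codes are only plausible choices. The restored HWPT rejects
map requests, cannot be destroyed while a device is still attached, and
becomes destroyable once the device has been moved to a new HWPT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical HWPT handle; not the iommufd object. */
struct demo_hwpt {
	bool restored;  /* wraps a preserved iommu_domain; never mutable */
	int attached;   /* number of devices currently attached */
	bool destroyed;
};

/* Mapping is refused on the restored, immutable HWPT. */
int demo_hwpt_map(const struct demo_hwpt *hwpt)
{
	if (hwpt->restored)
		return -EOPNOTSUPP;
	return 0; /* real mapping work elided */
}

/* Move one device from the old HWPT to a freshly created one. */
int demo_hwpt_replace(struct demo_hwpt *old, struct demo_hwpt *next)
{
	if (next->restored || next->destroyed)
		return -EINVAL;
	assert(old->attached > 0);
	old->attached--;
	next->attached++;
	return 0;
}

/* Destroy is the only operation the restored HWPT supports, once unused. */
int demo_hwpt_destroy(struct demo_hwpt *hwpt)
{
	if (hwpt->attached)
		return -EBUSY;
	hwpt->destroyed = true;
	return 0;
}
```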
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 17:37 ` Jason Gunthorpe
@ 2025-10-02 18:08 ` Samiullah Khawaja
2025-10-10 1:28 ` Pasha Tatashin
1 sibling, 0 replies; 53+ messages in thread
From: Samiullah Khawaja @ 2025-10-02 18:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 10:37 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > > I think the simplest thing is the domain exists forever until
> > > userspace attaches an iommufd, takes ownership of it and frees it.
> > > Nothing to do with finish.
> >
> > Hmm.. I think this is tricky. There needs to be a way to clean up and
> > discard the old state if the userspace doesn't need it.
>
> Why?
>
> Isn't "userspace doesn't need it" some extremely weird unused corner
> case?
>
> This should not be automatic or divorced from userspace, if the
> operator would like to switch something out of LUO then they should
> have userspace that co-ordinates this. Receive the iommufd, close it,
> install a normal kernel driver.
>
> Why make special code in the kernel to sequence this automatically?
>
> > session manager (VMM or LUOD) decides that the finish needs to happen
> > and the iommufd (or the underlying HWPTs) are not restored, it means
> > that LUOD has decided that the VM is not going to come up and the
> > preserved state and resources (domain, device, memory) need to be
> > freed/released.
>
> I've been assuming if luo fails so catastrophically the whole node
> would reboot to recover.
>
> Is there really a case where you might say a kexec happens and a
> single VM out of many doesn't survive? Seems weird..
>
> So to repeat above, if this is something people want then the
> userspace should complete luo restoring the failed vm and then turn
> around and free up all the resources. Why should the kernel
> automatically do the same operations?
>
> Maybe userspace needs some contingency flow where there is a dedicated
> reaper program for a luo session. The VMM crashes during restore, OK,
> we pass the luo FD to a reaper and it cleans up the objects in the
> session and closes it.
These are all great points. I agree, it makes sense. It keeps the
FINISH lightweight and makes the domain ownership model very clean. I
will further discuss the memfd dependency scenario in the other
thread.
>
> > > Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> > > is attached. The "restore" ioctl would just return back the ID of this
> > > already created HWPT.
> >
> > Once we return the ID, do we make this HWPT mutable? Or is this
> > re-created HWPT just a handle to keep the domain ownership?
>
> That's a bigger question..
>
> For starting I was imagining that the restored iommu_domain was
> immutable, eg it does not have map and unmap operations. It never
> becomes mutable.
>
> As I outlined this special luo immutable domain is then attached
> during early boot, which should be a NOP, and gets turned into a HWPT
> during iommufd restoration. The only thing userspace should be able to
> do with that HWPT handle is destroy it after replacing it.
Okay, this is great. An immutable HWPT associated with the restored
iommu_domain confirms my intuition that this is just a handle to the
underlying domain. The user can destroy it when it is replaced, or
when iommufd is closed without HWPT replacement.
>
> Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 15:10 ` Jason Gunthorpe
@ 2025-10-02 19:29 ` Samiullah Khawaja
2025-10-02 21:12 ` Jason Gunthorpe
0 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-10-02 19:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 8:10 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 02, 2025 at 10:43:45AM -0400, Pasha Tatashin wrote:
> > > Finish on resume shouldn't indicate anything specific beyond the luo
> > > should be empty and everything should have been restored. It isn't
> > > like finish on pre-kexec.
> > >
> > > Userspace decides how it sequences things and what steps it takes
> > > before ending blackout and resuming the VM.
> >
> > This is a fair statement: userspace knows when vCPUs are resumed and
> > can decide when to do the HWPT swap. Following that logic, what if we
> > provide a specific ioctl() to perform the swap?
>
> Yeah, that is what I've been talking about. The ioctl already exists
> in iommufd..
Yes, I agree. We can use the existing ioctl and the hotswap happens
when userspace attaches the new HWPT to the device. That has been my
understanding as well.
Userspace should indeed have full autonomy to perform the hotswap
whenever the VMM (and HWPT) is ready.
>
> > What do we do if the user reclaimed iommufd but did not swap HWPT or
> > did not perform some other ioctl() before finish(), simply print a
> > kernel warning and let it be, or force swapping during finish before
> > going into normal mode?
>
> The problem we haven't discussed how to solve is the linkage between
> the iommu_domain and the memfd.
>
> Since the preserved iommu_domain is referring to memory owned by the
> memfd and the pins don't get restored until the iommufd starts and
> generates new pins. Thus we need to keep the memfd in a frozen state.
Yes, there are dependencies between preserved FDs, and we need to
consider them during LUO PREPARE (preservation). We can use an LUO
helper in the can_preserve callback to check if a dependency is also
going to be preserved. I discuss the restore part below.
>
> Maybe that is the real use case for finish - things like memfd remain
> frozen until finish concludes.
Yes, for memfd LUO file_handler, maybe that is the purpose of FINISH.
But that gets us into the discussion of whether a dependency is
ready/allowed to mutate and FINISH. How would LUO file_handler of a
dependency know that it is safe to mutate/finish? Maybe LUO calls the
iommufd FINISH first and if it fails the dependencies don't get a
FINISH call.
I had a quick discussion with Pasha to see how LUO can help with FD
dependencies and FINISH order. Perhaps we need a new LUO API that
iommufd can call before live update, explicitly telling LUO that it
depends on an FD that is going to be preserved.
>
> However, keeping with the keep it simple theme, finish can just not
> succeed if there are stray objects that userspace has not cleaned up
> floating around. Eg a simple refcount and iommu_domain decrs it when
> it is destroyed.
>
> Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 19:29 ` Samiullah Khawaja
@ 2025-10-02 21:12 ` Jason Gunthorpe
2025-10-02 21:30 ` Pasha Tatashin
0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 21:12 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 02, 2025 at 12:29:25PM -0700, Samiullah Khawaja wrote:
> I had a quick discussion with Pasha to see how LUO can help with FD
> dependencies and FINISH order. Perhaps we need a new LUO API that
> iommufd can call before live update, explicitly telling LUO that it
> depends on an FD that is going to be preserved.
Keeping track of a dependency graph is possible.
But I wonder if it is really needed to be fine grained.
If a memfd remains frozen until finish, and finish can't happen until
all luo objects that are internally referring to outside memory
indicate they are done, don't we get the same outcome?
Is there a reason a specific memfd should be unfrozen before finish?
Maybe finish is too broad grained? What if each session had a finish?
All the objects in the session are cleaned up, invoke the session
finish and the memfd's in the session unfreeze?
Otherwise to build a dependency graph we'd need things like
iommu_domain to record all the memfds/etc stored within it and
preserve that and so on. This information has to come from the IOAS in
iommufd so it is quite a bit more weirdness to inject.
Whereas if we have the preserving iommufd do a sequence where it
pushes all the ioas pages (memfd/etc) to luo, and only then permits
the hwpt to be preserved to the same session, we get the same basic
tracking without needing to store a graph.
Donno...
Jason
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 21:12 ` Jason Gunthorpe
@ 2025-10-02 21:30 ` Pasha Tatashin
2025-10-02 22:58 ` Jason Gunthorpe
0 siblings, 1 reply; 53+ messages in thread
From: Pasha Tatashin @ 2025-10-02 21:30 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
> Maybe finish is too broad grained? What if each session had a finish?
> All the objects in the session are cleaned up, invoke the session
> finish and the memfd's in the session unfreeze?
All sessions have their own finish:
https://lore.kernel.org/all/20250929010321.3462457-15-pasha.tatashin@soleen.com
LIVEUPDATE_SESSION_SET_EVENT
Each session can go into a "finished" state independently. However, I
am still thinking about whether a dependency graph is needed. I feel
that if we require FDs to be added to a session in a specific order
(i.e., dependencies must be added first), and every subsequent FD
checks that all prerequisites are already in the session via the
existing can_preserve() callback, we should be okay, as long as we
finish() them in reverse order.
There are two issues:
1. What do we do with LIVEUPDATE_SESSION_UNPRESERVE_FD ?
We can simply remove this IOCTL altogether. Stuff can be unpreserved
by simply closing the session FD.
2. Remembering this order on the way back. Since we are using the
token as an iterator, that is not going to work unless the graph is
also preserved. However, now that we have sessions and the token
values are independent for each session, I am thinking we can go back
to the model where the kernel issues tokens when FDs are preserved, as
each session will always start from token=0. This way FD preservation
order and token order will always match.
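The ordering scheme proposed above can be sketched in user-space C as
follows. Everything here is hypothetical (`struct demo_session`,
`demo_preserve_fd()`, `demo_finish_order()`, the single-dependency
model and the error codes); it only illustrates the idea that
session-local, kernel-issued tokens record preservation order, that a
can_preserve()-style check rejects an FD whose prerequisite is not yet
in the session, and that finish walks the tokens in reverse.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_FDS 8
#define DEMO_NO_DEP  (-1)

struct demo_entry {
	int fd;     /* preserved file descriptor */
	int dep_fd; /* FD it depends on, or DEMO_NO_DEP */
};

/* Hypothetical session: the array index doubles as the issued token. */
struct demo_session {
	struct demo_entry entries[DEMO_MAX_FDS];
	int count; /* next token; a new session starts at 0 */
};

static int demo_find(const struct demo_session *s, int fd)
{
	for (int i = 0; i < s->count; i++)
		if (s->entries[i].fd == fd)
			return i;
	return -1;
}

/* Returns the issued token, or a negative errno. */
int demo_preserve_fd(struct demo_session *s, int fd, int dep_fd)
{
	assert(s->count >= 0 && s->count <= DEMO_MAX_FDS);
	if (s->count == DEMO_MAX_FDS)
		return -ENOSPC;
	if (demo_find(s, fd) >= 0)
		return -EEXIST;
	/* can_preserve()-style check: the prerequisite must already be present. */
	if (dep_fd != DEMO_NO_DEP && demo_find(s, dep_fd) < 0)
		return -ENOENT;
	s->entries[s->count].fd = fd;
	s->entries[s->count].dep_fd = dep_fd;
	return s->count++;
}

/* Fills out[] with FDs in finish order (reverse of tokens); returns count. */
int demo_finish_order(const struct demo_session *s, int *out)
{
	int n = 0;

	for (int i = s->count - 1; i >= 0; i--)
		out[n++] = s->entries[i].fd;
	return n;
}
```

For example, preserving a memfd first and the iommufd that depends on it
second yields tokens 0 and 1, and finish visits the iommufd before the
memfd. Whether reverse-order finish is sufficient for iommu_domains is
exactly what the replies in this thread debate.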
Pasha
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 21:30 ` Pasha Tatashin
@ 2025-10-02 22:58 ` Jason Gunthorpe
2025-10-02 23:56 ` Samiullah Khawaja
0 siblings, 1 reply; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 22:58 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 02, 2025 at 05:30:53PM -0400, Pasha Tatashin wrote:
> > Maybe finish is too broad grained? What if each session had a finish?
> > All the objects in the session are cleaned up, invoke the session
> > finish and the memfd's in the session unfreeze?
>
> All sessions have their own finish:
> https://lore.kernel.org/all/20250929010321.3462457-15-pasha.tatashin@soleen.com
> LIVEUPDATE_SESSION_SET_EVENT
>
> Each session can go into a "finished" state independently. However, I
> am still thinking about whether a dependency graph is needed. I feel
> that if we require FDs to be added to a session in a specific order
> (i.e., dependencies must be added first), and every subsequent FD
> checks that all prerequisites are already in the session via the
> existing can_preserve() callback, we should be okay, as long as we
> finish() them in reverse order.
I don't think it is quite that simple, like "finishing" an
iommu_domain cannot reconnect it back to the memfd. The only way to
finish it in the current sketch is to delete it.
So if you have a notion of when finish is disallowed and when it is
actually finished, maybe the order doesn't matter?
eg it doesn't matter what order we unfreeze memfds in.
This sort of assumes that something outside luo is still ensuring that
no disallowed operations are happening to the objects. eg nobody is
trying to ftruncate a memfd.
But I don't quite know what other objects besides memfd are going to
have this special frozen state??
> There are two issues:
> 1. What do we do with LIVEUPDATE_SESSION_UNPRESERVE_FD ?
> We can simply remove this IOCTL altogether. Stuff can be unpreserved
> by simply closing the session FD.
This is for serialize error handling? It does make sense if some sub
component of a session fails to serialize you'd just give up and close
the whole session.
> 2. Remembering this order on the way back, and since we are using the
> token as an iterator, that is not going to work, unless the graph is
> also preserved. However, now that we have sessions and the token
> values are independent for each session, I am thinking we can go back
> to the model where the kernel issues tokens when FDs are preserved, as
> each session will always start from token=0. This way FD preservation
> order and token order will always match.
You could just encode a preservation order number in a separate field?
Jason
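The separate-field idea above could look roughly like the following.
This is only a sketch under stated assumptions: struct preserved_fd,
struct session, session_preserve() and the fixed-size array are
hypothetical names made up for illustration, not part of the LUO series.
Userspace keeps choosing the token, and the order is recorded next to it
when the FD is preserved.

```c
#include <assert.h>
#include <stdint.h>

#define SESSION_MAX_FDS 16

/* Hypothetical record for one preserved FD; not a LUO structure. */
struct preserved_fd {
	uint64_t token;	/* chosen by userspace, any value */
	uint32_t order;	/* assigned when the FD is preserved */
};

/* Hypothetical per-session bookkeeping. */
struct session {
	struct preserved_fd fds[SESSION_MAX_FDS];
	uint32_t count;	/* also the next order number; starts at 0 */
};

/* Record a token and return its preservation order, or -1 if full. */
static int session_preserve(struct session *s, uint64_t token)
{
	if (s->count >= SESSION_MAX_FDS)
		return -1;
	s->fds[s->count].token = token;
	s->fds[s->count].order = s->count;
	return (int)s->count++;
}
```

With the order stored in its own field, the token no longer has to
double as an iterator, so restore and finish can walk the entries by
order regardless of what token values userspace picked.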
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 22:58 ` Jason Gunthorpe
@ 2025-10-02 23:56 ` Samiullah Khawaja
2025-10-03 12:09 ` Jason Gunthorpe
0 siblings, 1 reply; 53+ messages in thread
From: Samiullah Khawaja @ 2025-10-02 23:56 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 3:58 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 02, 2025 at 05:30:53PM -0400, Pasha Tatashin wrote:
> > > Maybe finish is too broad grained? What if each session had a finish?
> > > All the objects in the session are cleaned up, invoke the session
> > > finish and the memfd's in the session unfreeze?
> >
> > All sessions have their own finish:
> > https://lore.kernel.org/all/20250929010321.3462457-15-pasha.tatashin@soleen.com
> > LIVEUPDATE_SESSION_SET_EVENT
> >
> > Each session can go into a "finished" state independently. However, I
> > am still thinking about whether a dependency graph is needed. I feel
> > that if we require FDs to be added to a session in a specific order
> > (i.e., dependencies must be added first), and every subsequent FD
> > checks that all prerequisites are already in the session via the
> > existing can_preserve() callback, we should be okay, as long as we
> > finish() them in reverse order.
>
> I don't think it is quite that simple, like "finishing" an
> iommu_domain cannot reconnect it back to the memfd. The only way to
> finish it in the current sketch is to delete it.
Agreed. But I think we don't need to reconnect the iommu_domain back
to the memfd it depended on. All we need to ensure is that the memfd
remains immutable until the new HWPT replaces the old one that is
pointing to the restored iommu_domain. Until that replacement is done,
iommufd's FINISH callback would keep failing, which would prevent its
dependencies (like memfd) from receiving their FINISH calls and so it
keeps them immutable.
>
> So if you have a notion that finish is disallowed and when it is
> actually finished maybe the order doesn't matter?
I think FINISH for FDs in a SESSION is not atomic. If a dependency
memfd gets its FINISH call first, it might make itself mutable before
the iommufd FINISH callback fails because old HWPT is not replaced
yet. By then, it would be too late; the memfd has already become
mutable. That is why order would be needed.
>
> eg it doesn't matter what order we unfreeze memfds in.
>
> This sort of assumes that something outside luo is still ensuring that
> no disallowed operations are happening to the objects. eg nobody is
> trying to ftruncate a memfd.
>
> But I don't quite know what other objects besides memfd are going to
> have this special frozen state??
>
> > There are two issues:
> > 1. What do we do with LIVEUPDATE_SESSION_UNPRESERVE_FD ?
> > We can simply remove this IOCTL altogether. Stuff can be unpreserved
> > by simply closing session FD.
>
> This is for serialize error handling? It does make sense if some sub
> component of a session fails to serialize you'd just give up and close
> the whole session.
>
> > 2. Remembering this order on the way back, and since we are using the
> > token as an iterator, that is not going to work, unless the graph is
> > also preserved. However, now that we have sessions and the token
> > values are independent for each session, I am thinking we can go back
> > to the model where the kernel issues tokens when FDs are preserved, as
> > each session will always start from token=0. This way FD preservation
> > order and token order will always match.
>
> You could just encode a preservation order number in a separate field?
>
> Jason
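To make the ordering argument above concrete, here is a minimal sketch
of a reverse-order finish that stops at the first failure. All names
(struct obj, can_finish, session_finish()) are hypothetical and only
model the behaviour described in this mail; they are not the LUO or
iommufd callbacks. Index 0 is the dependency (the memfd) and the last
index is the dependent object (iommufd).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one preserved object in a session. */
struct obj {
	bool frozen;		/* still immutable */
	bool can_finish;	/* false models "old HWPT not replaced yet" */
};

/*
 * Finish objects in reverse preservation order. Dependencies were
 * added first, so they are visited last. Stop at the first object
 * that refuses to finish, leaving everything before it frozen.
 */
static int session_finish(struct obj *objs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (!objs[i].frozen)
			continue;	/* finished by an earlier attempt */
		if (!objs[i].can_finish)
			return -1;	/* dependencies stay frozen */
		objs[i].frozen = false;
	}
	return 0;
}
```

In this model a failing iommufd finish returns before the memfd is
touched, so the memfd stays immutable and userspace can retry once the
HWPT has been replaced.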
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 23:56 ` Samiullah Khawaja
@ 2025-10-03 12:09 ` Jason Gunthorpe
0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-03 12:09 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Pasha Tatashin, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 02, 2025 at 04:56:57PM -0700, Samiullah Khawaja wrote:
> > So if you have a notion that finish is disallowed and when it is
> > actually finished maybe the order doesn't matter?
>
> I think FINISH for FDs in a SESSION is not atomic. If a dependency
> memfd gets its FINISH call first, it might make itself mutable before
> the iommufd FINISH callback fails because old HWPT is not replaced
> yet. By then, it would be too late; the memfd has already become
> mutable. That is why order would be needed.
I'm thinking of having a counter in the session and the iommu_domain
holds it elevated until it is destroyed. Finish can't even start until
the counter is 0.
If the counter is 0 then it is fine to unfreeze all the remaining
objects in any order.
Jason
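A rough sketch of that counter, assuming a plain integer in place of
whatever refcount primitive a real implementation would use. The names
struct session, session_pin(), session_unpin() and session_finish() are
hypothetical and do not come from the LUO series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session state; not a LUO structure. */
struct session {
	unsigned int pinned;	/* restored iommu_domains not yet destroyed */
	bool finished;
};

/* Taken when a restored iommu_domain is created. */
static void session_pin(struct session *s)
{
	s->pinned++;
}

/* Dropped when that iommu_domain is destroyed. */
static void session_unpin(struct session *s)
{
	assert(s->pinned > 0);
	s->pinned--;
}

/* Finish cannot even start while any domain still pins the session. */
static int session_finish(struct session *s)
{
	if (s->pinned)
		return -1;
	s->finished = true;	/* remaining objects may unfreeze in any order */
	return 0;
}
```

Because the gate is checked before any object is touched, no ordering
between the remaining objects needs to be recorded in this model.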
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-02 17:37 ` Jason Gunthorpe
2025-10-02 18:08 ` Samiullah Khawaja
@ 2025-10-10 1:28 ` Pasha Tatashin
2025-10-10 14:24 ` Jason Gunthorpe
1 sibling, 1 reply; 53+ messages in thread
From: Pasha Tatashin @ 2025-10-10 1:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 2, 2025 at 1:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > > I think the simplest thing is the domain exists forever until
> > > userspace attaches an iommufd, takes ownership of it and frees it.
> > > Nothing to do with finish.
> >
> > Hmm.. I think this is tricky. There needs to be a way to clean up and
> > discard the old state if the userspace doesn't need it.
>
> Why?
>
> Isn't "userspace doesn't need it" some extremely weird unused corner
> case?
It might be a corner case, but at cloud scale, even rare cases happen.
For example, if four VMs are resumed and one crashes while retrieving
half of its resources, we can't simply reboot the machine because of
that. We must have a way to recover the machine to a normal state,
even if some resources are not reclaimed. I would say that finish must
be properly backward-ordered, but we still should release resources
that are not reclaimed during finish, as well as those that were
reclaimed but later closed.
Pasha
* Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
2025-10-10 1:28 ` Pasha Tatashin
@ 2025-10-10 14:24 ` Jason Gunthorpe
0 siblings, 0 replies; 53+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 14:24 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, David Woodhouse, Lu Baolu, Joerg Roedel,
Will Deacon, iommu, YiFei Zhu, Robin Murphy, Pratyush Yadav,
Kevin Tian, linux-kernel, Saeed Mahameed, Adithya Jayachandran,
Parav Pandit, Leon Romanovsky, William Tu, Vipin Sharma, dmatlack,
Chris Li, praan
On Thu, Oct 09, 2025 at 09:28:44PM -0400, Pasha Tatashin wrote:
> On Thu, Oct 2, 2025 at 1:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > > > I think the simplest thing is the domain exists forever until
> > > > userspace attaches an iommufd, takes ownership of it and frees it.
> > > > Nothing to do with finish.
> > >
> > > Hmm.. I think this is tricky. There needs to be a way to clean up and
> > > discard the old state if the userspace doesn't need it.
> >
> > Why?
> >
> > Isn't "userspace doesn't need it" some extremely weird unused corner
> > case?
>
> It might be a corner case, but at cloud scale, even rare cases happen.
> For example, if four VMs are resumed and one crashes while retrieving
> half of its resources, we can't simply reboot the machine because of
> that. We must have a way to recover the machine to a normal state,
> even if some resources are not reclaimed. I would say that finish must
> be properly backward-ordered, but we still should release resources
> that are not reclaimed during finish, as well as those that were
> reclaimed but later closed.
Sure, but as I said, userspace should deal with most of this, and I
think we should lean into the idea that the worst error flows end up
"leaking" resources. They are not actually leaked, the luo still holds
them and userspace could still try again later to restore and free them.
They will get cleaned up on the next kexec, and kexec to recover from a
partially failed kexec is not an unreasonable plan...
This means thinking carefully about the userspace restore sequence so it
is more reliable. Like don't restore the memfd as the first thing :)
Only if there are real measurements that this is not sufficient would I
think about teaching the kernel to do a non-restore flow where it
directly destroys the object in a way that cannot fail. Eg the memfd
can directly free the page list instead of allocating an xarray. This
is a lot more complex error path code to add to the kernel so let's not
do it without a strong justification.
You also can't do it until something sequences the vfio and iommufd
parts to unfreeze the memfd, and these are very complicated error flows
as well.
Jason
end of thread, other threads:[~2025-10-10 14:24 UTC | newest]
Thread overview: 53+ messages
2025-09-28 19:06 [RFC PATCH 00/15] iommu: Add live update state preservation Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 01/15] iommu/vt-d: Register with Live Update Orchestrator Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 02/15] iommu: Add rw_semaphore to serialize live update state Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal Samiullah Khawaja
2025-09-29 15:51 ` Jason Gunthorpe
2025-09-29 16:50 ` Pasha Tatashin
2025-09-29 17:21 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 04/15] iommu: Add preserve iommu_domain op Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 05/15] iommu: Introduce API to preserve iommu domain Samiullah Khawaja
2025-09-29 15:54 ` Jason Gunthorpe
2025-09-29 18:11 ` Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 06/15] iommu/vt-d: Add stub intel iommu domain preserve op Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 07/15] iommu/vt-d: Add implementation of live update prepare callback Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 08/15] iommu/vt-d: Implement live update preserve_iommu_context Samiullah Khawaja
2025-09-29 15:57 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 09/15] iommu/vt-d: Add live update freeze callback Samiullah Khawaja
2025-09-29 15:58 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 10/15] iommu/vt-d: Restore iommu root_table and context on live update Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 11/15] iommufd: Add basic skeleton based on liveupdate_file_handle Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 12/15] iommufd-luo: Implement basic prepare/cancel/finish/retrieve using folios Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 13/15] iommufd: Persist iommu domains for live update Samiullah Khawaja
2025-09-29 16:00 ` Jason Gunthorpe
2025-09-29 17:32 ` Samiullah Khawaja
2025-09-29 17:43 ` Jason Gunthorpe
2025-09-29 17:45 ` Pasha Tatashin
2025-09-30 13:07 ` Pasha Tatashin
2025-09-30 13:59 ` Jason Gunthorpe
2025-09-30 15:09 ` Pasha Tatashin
2025-09-30 16:31 ` Jason Gunthorpe
2025-09-30 20:02 ` Samiullah Khawaja
2025-09-30 21:05 ` Jason Gunthorpe
2025-09-30 23:15 ` Samiullah Khawaja
2025-10-01 11:47 ` Jason Gunthorpe
2025-10-01 19:28 ` Pasha Tatashin
2025-10-02 11:57 ` Jason Gunthorpe
2025-10-02 14:43 ` Pasha Tatashin
2025-10-02 15:10 ` Jason Gunthorpe
2025-10-02 19:29 ` Samiullah Khawaja
2025-10-02 21:12 ` Jason Gunthorpe
2025-10-02 21:30 ` Pasha Tatashin
2025-10-02 22:58 ` Jason Gunthorpe
2025-10-02 23:56 ` Samiullah Khawaja
2025-10-03 12:09 ` Jason Gunthorpe
2025-10-02 1:00 ` Samiullah Khawaja
2025-10-02 13:41 ` Jason Gunthorpe
2025-10-02 14:59 ` Pasha Tatashin
2025-10-02 17:03 ` Samiullah Khawaja
2025-10-02 17:37 ` Jason Gunthorpe
2025-10-02 18:08 ` Samiullah Khawaja
2025-10-10 1:28 ` Pasha Tatashin
2025-10-10 14:24 ` Jason Gunthorpe
2025-09-28 19:06 ` [RFC PATCH 14/15] iommu/vt-d: sanitize restored root table and iommu contexts Samiullah Khawaja
2025-09-28 19:06 ` [RFC PATCH 15/15] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja