* [PATCH v2 00/16] iommu: Add live update state preservation
From: Samiullah Khawaja @ 2026-04-27 17:56 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Jason Gunthorpe
  Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson,
	Shuah Khan, iommu, linux-kernel, kvm, Saeed Mahameed,
	Adithya Jayachandran, Parav Pandit, Leon Romanovsky, William Tu,
	Pratyush Yadav, Pasha Tatashin, David Matlack, Andrew Morton,
	Chris Li, Pranjal Shrivastava, Vipin Sharma, YiFei Zhu

Hi,

This patch series introduces a mechanism for preserving IOMMU state
across live update, including an implementation for the Intel VT-d
driver.

Please take a look at the following LWN article to learn about KHO and
Live Update Orchestrator:

https://lwn.net/Articles/1033364/

This work depends on:

- VFIO CDEV preservation series (v3):
  https://lore.kernel.org/kvm/20260323235817.1960573-1-dmatlack@google.com/
- PCI device preservation series (v4):
  https://lore.kernel.org/all/20260423212316.3431746-1-dmatlack@google.com/
- LUO series to add FLB refcounting:
  https://lore.kernel.org/lkml/20260423174032.3140399-1-dmatlack@google.com/

The kernel tree with all dependencies is uploaded to the following
GitHub location:

https://github.com/samikhawaja/linux/tree/iommu/phase1-v2

Overall Goals:

The goal of this effort is to preserve the IOMMU domains, managed by
iommufd, attached to devices preserved through VFIO cdev. This allows
DMA mappings and IOMMU context of a device assigned to a VM to be
maintained across a kexec live update.

This is achieved by preserving IOMMU page tables using Generic Page
Table support, IOMMU root table and the relevant context entries across
live update.

The functionality in the previously sent RFC is split into two phases,
and this series implements Phase 1. Phase 1 implements the following
functionality:

  - Foundational work in IOMMU core and VT-d driver to preserve and
    restore IOMMU translation units, IOMMU domains and devices across
    liveupdate kexec.
  - The preservation is triggered by preserving the vfio cdev FD and its
    bound iommufd FD in a live update session.
  - An HWPT (and its backing IOMMU domain) is preserved only if all of
    its DMA mappings are file type. The memfd used for such a mapping
    must also be sealed with F_SEAL_SEAL at mapping time.
  - During live update boot, the state of the preserved Intel VT-d
    units, IOMMU domains and devices is restored.
  - The restored IOMMU domains are reattached to the preserved devices
    during early boot.
  - The DMA ownership of restored devices is also claimed during
    live update boot. This means that any attempt to bind a non-vfio
    driver to them, or to bind a new iommufd to them, will fail.
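
As an illustration of the memfd requirement above, the following is a
minimal userspace sketch. It assumes the seal in question is the memfd
F_SEAL_SEAL seal; the helper names create_sealed_memfd() and
memfd_is_seal_sealed() are hypothetical and are not part of this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a memfd of the given size and apply F_SEAL_SEAL, which
 * prevents any further seals from being added to the file.
 * Returns the fd on success, -1 on failure (errno is set).
 */
static int create_sealed_memfd(const char *name, off_t size)
{
	int fd;

	assert(name != NULL);

	fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	if (ftruncate(fd, size) < 0 ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Returns non-zero if F_SEAL_SEAL is set on the memfd. */
static int memfd_is_seal_sealed(int fd)
{
	int seals = fcntl(fd, F_GET_SEALS);

	return seals >= 0 && (seals & F_SEAL_SEAL);
}
```

A VMM would create its guest memory this way before establishing the
file type DMA mapping. Any other seals it needs have to be added in the
same F_ADD_SEALS call or before it, since nothing can be added afterwards.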

Architectural Overview:

The target architecture for IOMMU state preservation across a live
update involves coordination between the Live Update Orchestrator,
iommufd, and the IOMMU drivers.

The core design uses the Live Update Orchestrator's file descriptor
preservation mechanism to preserve iommufd file descriptors. The user
marks the iommufd HWPTs for preservation using a new ioctl added in this
series. Once done, the preservation of iommufd inside an LUO session is
triggered using LUO ioctls. During preservation, the LUO preserve
callback for an iommufd walks through the HWPTs it manages to identify
the ones that need to be preserved. Once identified, a new IOMMU core
API is used to preserve the iommu domain. The IOMMU core uses Generic
Page Table to preserve the page tables of these domains. The domains are
then marked as preserved.
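
The walk described above can be sketched with a simplified, standalone
model. The struct and function below are hypothetical and are not the
data structures used by this series. Failing the whole operation when a
marked HWPT is not eligible is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an HWPT; not the structure used by iommufd. */
struct hwpt_model {
	unsigned int id;
	bool marked;           /* set earlier through the mark ioctl */
	bool file_backed_only; /* every DMA mapping is file type */
	bool preserved;        /* set once the domain is preserved */
};

/*
 * Walk all HWPTs and preserve the marked ones. Returns the number of
 * preserved HWPTs, or -1 if a marked HWPT is not eligible, in which
 * case the HWPTs preserved so far are unwound.
 */
static int preserve_marked_hwpts(struct hwpt_model *hwpts, size_t n)
{
	size_t i;
	int count = 0;

	assert(hwpts != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!hwpts[i].marked)
			continue;
		if (!hwpts[i].file_backed_only)
			goto unwind;
		/* The real flow would call into the IOMMU core here. */
		hwpts[i].preserved = true;
		count++;
	}
	return count;

unwind:
	while (i-- > 0)
		hwpts[i].preserved = false;
	return -1;
}
```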

When the user triggers the preservation of a VFIO cdev that is attached
to an iommufd that is preserved, the device attachment state of that
VFIO cdev is also preserved using an API exported by iommufd. IOMMUFD
fetches all the information that needs to be preserved and calls the
IOMMU core API to preserve the device state. The IOMMU core also
preserves the state of the IOMMU that is associated with this device.

The IOMMU core has an LUO FLB registered with the iommufd LUO file
handler, so the preserved IOMMU domain and IOMMU hardware unit state is
available during boot for early restore in the next kernel.

During boot, the driver fetches the preserved state from the IOMMU core
and restores the state of preserved IOMMUs. Later, when the IOMMU core goes
through the devices and probes them, the iommu domains of preserved
devices are restored and the preserved devices are attached to them.
During attachment, the DMA ownership of these devices is also claimed.
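
The effect of claiming DMA ownership during restore can be sketched with
a small standalone model. All names below are hypothetical and only
illustrate why a later claim by a different owner fails. This is not the
IOMMU core implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device as seen during probe. */
struct dev_model {
	bool preserved;        /* device was preserved by the old kernel */
	const void *dma_owner; /* NULL while nobody owns the device */
	int domain_id;         /* -1 while no domain is attached */
};

/* The address of this object serves as a unique owner cookie. */
static const char liveupdate_owner = 0;

/* On probe, reattach the preserved domain and claim DMA ownership. */
static void restore_on_probe(struct dev_model *dev, int preserved_domain_id)
{
	assert(dev != NULL);

	if (!dev->preserved)
		return;
	dev->domain_id = preserved_domain_id;
	dev->dma_owner = &liveupdate_owner;
}

/* A later claim by a different owner is refused with -EBUSY. */
static int claim_dma_owner(struct dev_model *dev, const void *owner)
{
	assert(dev != NULL && owner != NULL);

	if (dev->dma_owner && dev->dma_owner != owner)
		return -EBUSY;
	dev->dma_owner = owner;
	return 0;
}
```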

Tested:

The new iommufd_liveupdate_kexec_test selftest was used to verify the
preservation logic. It was tested using QEMU with virtual IOMMU (VT-d)
support and a virtio PCIe device bound to the vfio-pci driver.

It was also tested on an Intel machine with a DSA device bound to the
vfio-pci driver.

The following steps were used for verification:

- Bind the test device to the vfio-pci driver.
- Run the test on the machine:

  ./iommufd_liveupdate_kexec_test <vfio-cdev-path>

- Trigger kexec.
- After reboot, try binding the device to a non-vfio PCI driver:

  echo <device bdf> > /sys/bus/pci/drivers/pci-pf-stub/bind

- This should fail with "Device or resource busy".
- Bind the device to the vfio-pci driver and run the test again:

  ./iommufd_liveupdate_kexec_test <vfio-cdev-path> --stage 2

- The test verifies that the device cannot be bound to a new iommufd and
  that the session cannot be finished.
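
The manual bind check in the steps above could also be scripted. The
helper below is a minimal sketch. try_bind() is a hypothetical name and
is not part of the selftest in this series.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a device BDF to a driver's sysfs "bind" file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int try_bind(const char *bind_path, const char *bdf)
{
	ssize_t ret;
	int fd, err = 0;

	assert(bind_path != NULL && bdf != NULL);

	fd = open(bind_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ret = write(fd, bdf, strlen(bdf));
	if (ret < 0)
		err = -errno;

	close(fd);
	return err;
}
```

After the live update, calling try_bind() on the pci-pf-stub bind file
with the BDF of the preserved device is expected to return -EBUSY,
matching the "Device or resource busy" error described above.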

Future Work:

- Phase 2 with IOMMUFD restore to reclaim the preserved vfio cdev and
  restore the preserved HWPTs.
- Full support for PASID preservation.
- Nested IOMMU preservation.
- Extend support to other IOMMU architectures (e.g., AMD-Vi, Arm SMMUv3).
- DMA alloc preservation support (buddy allocator only):
  https://github.com/samikhawaja/linux/tree/dma-alloc-preserve

Roadmap:

  The doc below breaks the overall work down into the patch series
  needed to complete the live update feature from the IOMMU perspective:
  https://docs.google.com/document/d/1enDn-uPE9U77U-xHEnzn6HHGKiePSAtMIP8EDU3NO0M

High-Level Sequence Flow:

The following diagrams illustrate the high-level interactions during the
preservation and restore phases. Note that function names in the
diagrams are abbreviated to save horizontal space.

Preserve:

Before a live update, the PREPARE event of the Live Update Orchestrator
invokes the callbacks of the registered file and subsystem handlers.

 Userspace (VMM) | LUO Core |    iommufd    |  IOMMU Core   | IOMMU Driver
-----------------|----------|---------------|---------------|-------------
                 |          |               |               |
MARK_HWPT        |          |               |               |
--------------------------->                |               |
                 |          | Mark HWPT for |               |
                 |          | preservation  |               |
                 |          |               |               |
PRESERVE         |          |               |               |
 iommufd_fd      |          |               |               |
----------------->          |               |               |
                 | preserve |               |               |
                 |---------->               |               |
                 |          | For each HWPT |               |
                 |          |-------------->                |
                 |          |               | domain_presrv |
                 |          |               |-------------->
                 |          |               |               | gpt(preserve)
                 |          |               |<--------------|
                 |          |<--------------|               |
                 |<---------|               |               |
                 |          |               |               |
...              |          |               |               |
                 |          |               |               |
PRESERVE         |          |               |               |
 vfio_cdev_fd    |          |               |               |
----------------->          |               |               |
                 | preserve |               |               |
                 |---------->               |               |
                 |          |               |               |
                 |          | iommu_preserv |               |
                 |          | _device()     |               |
                 |          |-------------->                |
                 |          |               | preserve      |
                 |          |               | (iommu_hw)    |
                 |          |               |-------------->
                 |          |               |               | preserve(root)
                 |          |               |               | preserve(pasid)
                 |          |               |<--------------|
                 |          |               |               |
                 |          |               | preserve      |
                 |          |               | _device(dev)  |
                 |          |               |-------------->
                 |          |               |               |
                 |          |               |<--------------|
                 |          |<--------------|               |
                 |<---------|               |               |

Restore:

After a live update, the preserved state is restored during boot.

 Userspace (VMM) | LUO Core |    iommufd    |  IOMMU Core   | IOMMU Driver
-----------------|----------|---------------|---------------|-------------
                 |          |               |               |
                 |          |               |               | Restore
                 |          |               |               | Root, DIDs
                 |          |               |               |
                 |          |               |               | Register
                 |          |               | probe devices |
                 |          |               |               |
                 |          |               | restore       |
                 |          |               | domain        |
                 |          |               |-------------->
                 |          |               |               | restore
                 |          |               | reattach      |
                 |          |               | domain        |
                 |          |               |-------------->
                 |          |               |               |


Looking forward to your feedback on this.

Changelog:

v2:
  - Move IOMMU_LIVEUPDATE under IOMMU_SUPPORT dependencies.
  - Update copyright year to 2026.
  - Add Kernel-doc for FLB struct.
  - Add an ASCII diagram for FLB memory layout.
  - Change compatibility to iommu-liveupdate-v1.
  - Rename structs to more reasonable names.
  - Add comment explaining the rationale for BUG_ON during restoration.
  - Rename 'did' to 'attachment_id'.
  - Use phys_to_virt() and virt_to_phys() consistently.
  - Create separate functions for FLB unpreserve and folio_put.
  - Return the virtual address from the restore_array function.
  - Free serialized state on finish without checking for NULL.
  - Rename 'iommu-lu.h' to 'iommu-liveupdate.h'.
  - Suffix max with _per_page (iommu_max_objs_per_page).
  - Move unused helpers to the patches that actually need them.
  - Rename reserve_obj_ser to alloc_object_ser.
  - Only allow vfio drivers to use preserve_device APIs.
  - Move ops declarations under CONFIG_IOMMU_LIVEUPDATE guards.
  - Add lockdep_assert_held() validations in locked functions.
  - Rename device_ser_match to match_device_ser.
  - Create iommu_folio_update_stats(folio, nr_pages) helper.
  - Explicitly set incoherent flag to false for restored pages.
  - Do not use unpreserve_pages() function for error handling.
  - Preserve vasz and sign_extend of a domain.
  - Make the restored domain free-only.
  - Ignore error of pt_descend() during restore as it should never fail (dead code).
  - On domain restore, clear all features except the preserved.
  - Add a KUnit test to verify that a restored domain can be freed with zeroed features.
  - Bypass paging_domain_compatible() checks for restored domains.
  - Use an explicit if-check for preserved state to clear the context entries.
  - Rename clean_context to clear_unpreserved_context_entries.
  - Use "DMAR:" in pr_fmt instead of "iommu:".
  - Pass ROOT_ENTRY_NR directly into unpreserve functions.
  - Remove IOMMU lock during preserve as not needed.
  - Iterate all devices when clearing context entries instead of just PCIe devices.
  - Do not populate empty unpreserve_device callbacks if not needed.
  - Rename unpreserve_iommu_context to unpreserve_iommu_context_table.
  - Check the associated IOMMU in _restore_used_domain_id.
  - Add a comment that MMIO base is used as token for identifying IOMMU.
  - Add a comment in _restore_used_domain_ids that IDA allocation can safely fail.
  - Mark IOMMU state restored after restoring to prevent double restoration and UAF.
  - Fix domain leak in iommu_restore_domain error path.
  - Check DID start range before allocation.
  - Add lockdep_assert_held in group domain restoration path.
  - Add WARN_ON to catch unexpected group ownership during device probe.
  - Add comment explaining group ownership setup and reclaim during device probe.
  - Cleanup the full PASID directory instead of only 4K.
  - Rename ioctl to IOMMU_HWPT_LIVEUPDATE_MARK_PRESERVE.
  - Use XArray to save the HWPT liveupdate marks.
  - Add the ioctl struct to ucmd_buffer.
  - Make the ioctl 64-bit aligned (struct padding).
  - Add UAPI documentation stating that HWPTs cannot be unmarked.
  - Add UAPI documentation stating that only file-based mappings are allowed.
  - Rename structs to use "liveupdate" instead of "lu".
  - Rework HWPT preservation to prevent TOCTOU and mutex lock inside XA spin lock.
  - Check preserved pages at the IOPT level, not the HWPT level.
  - Return -EOPNOTSUPP if there are PASID attachments.
  - Do not allow detach (or attach) once the device is preserved.
  - Add MODULE_IMPORT_NS("IOMMUFD") in vfio_pci_liveupdate.c.
  - Make token u64.
  - Modify selftest to use live update kexec_test pattern.
  - Use helper macros in test instead of repeating ksft_assert.
  - Rename setup_cdev to open_cdev.
  - Define constants (tokens) as u64 to avoid ABI warnings.

v1: https://lore.kernel.org/all/20260203220948.2176157-1-skhawaja@google.com/

rfcv2: https://lore.kernel.org/all/20251202230303.1017519-1-skhawaja@google.com/

rfcv1: https://lore.kernel.org/all/20250928190624.3735830-1-skhawaja@google.com/

Pasha Tatashin (1):
  liveupdate: luo_file: Add internal APIs for file preservation

Samiullah Khawaja (13):
  iommu: Implement IOMMU Live update FLB callbacks
  iommu: Implement IOMMU domain preservation
  iommu: Implement device and IOMMU HW preservation
  iommu/pages: Add APIs to preserve/unpreserve/restore iommu pages
  iommupt: Implement preserve/unpreserve/restore callbacks
  iommu/vt-d: Implement device and iommu preserve/unpreserve ops
  iommu: Add APIs to get iommu and device preserved state
  iommu/vt-d: Restore IOMMU state and reclaimed domain ids
  iommu: Restore and reattach preserved domains to devices
  iommu/vt-d: preserve PASID table of preserved device
  iommufd: Add APIs to preserve/unpreserve a vfio cdev
  vfio/pci: Preserve the iommufd state of the vfio cdev
  iommufd/selftest: Add test to verify iommufd preservation

YiFei Zhu (2):
  iommufd: Implement ioctl to mark HWPT for preservation
  iommufd: Persist iommu hardware pagetables for live update

 MAINTAINERS                                   |  12 +
 drivers/iommu/Kconfig                         |  12 +
 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/generic_pt/iommu_pt.h           | 131 ++++
 drivers/iommu/generic_pt/kunit_iommu_pt.h     |  28 +
 drivers/iommu/intel/Makefile                  |   1 +
 drivers/iommu/intel/iommu.c                   | 159 ++++-
 drivers/iommu/intel/iommu.h                   |  50 +-
 drivers/iommu/intel/liveupdate.c              | 337 ++++++++++
 drivers/iommu/intel/nested.c                  |   2 +-
 drivers/iommu/intel/pasid.c                   |   7 +-
 drivers/iommu/intel/pasid.h                   |   9 +
 drivers/iommu/iommu-pages.c                   | 108 +++-
 drivers/iommu/iommu-pages.h                   |  30 +
 drivers/iommu/iommu.c                         |  79 ++-
 drivers/iommu/iommufd/Makefile                |   1 +
 drivers/iommu/iommufd/device.c                | 102 +++
 drivers/iommu/iommufd/io_pagetable.c          |  11 +
 drivers/iommu/iommufd/io_pagetable.h          |   1 +
 drivers/iommu/iommufd/iommufd_private.h       |  46 ++
 drivers/iommu/iommufd/liveupdate.c            | 339 ++++++++++
 drivers/iommu/iommufd/main.c                  |  19 +-
 drivers/iommu/iommufd/pages.c                 |   7 +
 drivers/iommu/liveupdate.c                    | 592 ++++++++++++++++++
 drivers/vfio/device_cdev.c                    |  10 +
 drivers/vfio/pci/vfio_pci_liveupdate.c        |  33 +-
 include/linux/generic_pt/iommu.h              |  19 +-
 include/linux/iommu-liveupdate.h              | 156 +++++
 include/linux/iommu.h                         |  47 ++
 include/linux/iommufd.h                       |  29 +
 include/linux/kho/abi/iommu.h                 | 249 ++++++++
 include/linux/kho/abi/iommufd.h               |  51 ++
 include/linux/liveupdate.h                    |  21 +
 include/uapi/linux/iommufd.h                  |  26 +
 kernel/liveupdate/luo_file.c                  |  69 ++
 kernel/liveupdate/luo_internal.h              |  17 +
 tools/testing/selftests/iommu/Makefile        |  12 +
 .../iommu/iommufd_liveupdate_kexec_test.c     | 239 +++++++
 38 files changed, 3016 insertions(+), 46 deletions(-)
 create mode 100644 drivers/iommu/intel/liveupdate.c
 create mode 100644 drivers/iommu/iommufd/liveupdate.c
 create mode 100644 drivers/iommu/liveupdate.c
 create mode 100644 include/linux/iommu-liveupdate.h
 create mode 100644 include/linux/kho/abi/iommu.h
 create mode 100644 include/linux/kho/abi/iommufd.h
 create mode 100644 tools/testing/selftests/iommu/iommufd_liveupdate_kexec_test.c


base-commit: 2a4c0c11c0193889446cdb6f1540cc2b9aff97dd
-- 
2.54.0.545.g6539524ca2-goog

