public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: James Gowans <jgowans@amazon.com>
To: <linux-kernel@vger.kernel.org>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"Krzysztof Wilczyński" <kw@linux.com>,
	"Will Deacon" <will@kernel.org>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>,
	iommu@lists.linux.dev, "Sean Christopherson" <seanjc@google.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	kvm@vger.kernel.org, "David Woodhouse" <dwmw2@infradead.org>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Alexander Graf" <graf@amazon.de>,
	anthony.yznaga@oracle.com, steven.sistare@oracle.com,
	nh-open-source@amazon.com, "Saenz Julienne,
	Nicolas" <nsaenz@amazon.es>
Subject: [RFC PATCH 00/13] Support iommu(fd) persistence for live update
Date: Mon, 16 Sep 2024 13:30:49 +0200	[thread overview]
Message-ID: <20240916113102.710522-1-jgowans@amazon.com> (raw)

Live update is a mechanism to support updating a hypervisor in a way
that has limited impact to running virtual machines. This is done by
pausing/serialising running VMs, kexec-ing into a new kernel, starting
new VMM processes and then deserialising/resuming the VMs so that they
continue running from where they were. When the VMs have DMA devices
assigned to them, the IOMMU state and page tables needs to be persisted
so that DMA transactions can continue across kexec.

Currently there is no mechanism in Linux to be able to continue running DMA
across kexec, and pick up handles to the original DMA mappings after
kexec. We are looking for a path to be able to support this capability
which is necessary for live update.  In this RFC patch series a
potential solution is sketched out, and we are looking for feedback on
the approach and userspace interfaces.

This RFC is intended to serve as the discussion ground for a Linux
Plumbers Conf 2024 session on iommu persistence:
https://lpc.events/event/18/contributions/1686/
... and a BoF session on memory persistence in general:
https://lpc.events/event/18/contributions/1970/
Please join those to further this discussion.

The concept is as follows:
IOMMUFDs are marked as persistent via a new option.  When a struct
iommu_domain is allocated by a iommufd, that iommu_domain also gets
marked as persistend.  Before kexec iommufd serialises the metadata for
all persistent domains to the KHO device tree blob. Similarly the IOMMU
platform driver marks all of the page table pages as persistent.  After
kexec the persistent IOMMUFDs are expose to userspace in sysfs.  Fresh
IOMMUFD objects are built up from the data which was passed across in
KHO. Userspace can open these sysfs files to get a handle on the IOMMUFD
again. Iommufd ensures that only persistent memory can be mapped into
persistent address spaces.

This depends on KHO as the foundational framework for passing data
across kexec and for marking pages as persistent:
https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com/

It also depends on guestmemfs as the provider of persistent guest RAM to
be mapped into the IOMMU:
https://lore.kernel.org/all/20240805093245.889357-1-jgowans@amazon.com/

The code is not a complete solution, it is more of a sketch to show the
moving parts and proposed interfaces to gather feedback and have the
discussion. Only a small portion of the IOMMUFD object and dmar_domains
are serialised; it is necessary to figure out all the data which needs
to be serialised to have a fully working deserialised object.

Sign-offs are omitted to make it clear that this is not for merging yet.

Adding maintainers from IOMMU drivers, iommufd, kexec and KVM (seeing as
this is designed for live update of a hypervisor specifically), and
others who have engaged with the topic of memory persistence previously.

James Gowans (13):
  iommufd: Support marking and tracking persistent iommufds
  iommufd: Add plumbing for KHO (de)serialise
  iommu/intel: zap context table entries on kexec
  iommu: Support marking domains as persistent on alloc
  iommufd: Serialise persisted iommufds and ioas
  iommufd: Expose persistent iommufd IDs in sysfs
  iommufd: Re-hydrate a usable iommufd ctx from sysfs
  intel-iommu: Add serialise and deserialise boilerplate
  intel-iommu: Serialise dmar_domain on KHO activaet
  intel-iommu: Re-hydrate persistent domains after kexec
  iommu: Add callback to restore persisted iommu_domain
  iommufd, guestmemfs: Ensure persistent file used for persistent DMA
  iommufd, guestmemfs: Pin files when mapped for persistent DMA

 drivers/iommu/amd/iommu.c                   |   4 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   3 +-
 drivers/iommu/intel/Makefile                |   1 +
 drivers/iommu/intel/dmar.c                  |   1 +
 drivers/iommu/intel/iommu.c                 |  84 ++++++--
 drivers/iommu/intel/iommu.h                 |  31 +++
 drivers/iommu/intel/serialise.c             | 174 ++++++++++++++++
 drivers/iommu/iommufd/Makefile              |   1 +
 drivers/iommu/iommufd/hw_pagetable.c        |   5 +-
 drivers/iommu/iommufd/io_pagetable.c        |   2 +-
 drivers/iommu/iommufd/ioas.c                |  26 +++
 drivers/iommu/iommufd/iommufd_private.h     |  34 ++++
 drivers/iommu/iommufd/main.c                |  75 ++++++-
 drivers/iommu/iommufd/selftest.c            |   1 +
 drivers/iommu/iommufd/serialise.c           | 213 ++++++++++++++++++++
 fs/guestmemfs/file.c                        |  25 +++
 fs/guestmemfs/guestmemfs.h                  |   1 +
 fs/guestmemfs/inode.c                       |   4 +
 include/linux/guestmemfs.h                  |  15 ++
 include/linux/iommu.h                       |  16 +-
 include/uapi/linux/iommufd.h                |   5 +
 21 files changed, 698 insertions(+), 23 deletions(-)
 create mode 100644 drivers/iommu/intel/serialise.c
 create mode 100644 drivers/iommu/iommufd/serialise.c

-- 
2.34.1


             reply	other threads:[~2024-09-16 11:31 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-16 11:30 James Gowans [this message]
2024-09-16 11:30 ` [RFC PATCH 01/13] iommufd: Support marking and tracking persistent iommufds James Gowans
2024-09-16 11:30 ` [RFC PATCH 02/13] iommufd: Add plumbing for KHO (de)serialise James Gowans
2024-09-16 11:30 ` [RFC PATCH 03/13] iommu/intel: zap context table entries on kexec James Gowans
2024-10-03 13:27   ` Jason Gunthorpe
2024-09-16 11:30 ` [RFC PATCH 04/13] iommu: Support marking domains as persistent on alloc James Gowans
2024-09-16 11:30 ` [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas James Gowans
2024-10-02 18:55   ` Jason Gunthorpe
2024-10-07  8:39     ` Gowans, James
2024-10-07  8:47       ` David Woodhouse
2024-10-07  8:57         ` Gowans, James
2024-10-07 15:01           ` Jason Gunthorpe
2024-10-09 11:44             ` Gowans, James
2024-10-09 12:28               ` Jason Gunthorpe
2024-10-10 15:12                 ` Gowans, James
2024-10-10 15:32                   ` Jason Gunthorpe
2024-10-07 15:11         ` Jason Gunthorpe
2024-10-07 15:16       ` Jason Gunthorpe
2024-10-16 22:20   ` Jacob Pan
2024-10-28 16:03     ` Jacob Pan
2024-11-02 10:22       ` Gowans, James
2024-11-04 13:00         ` Jason Gunthorpe
2024-11-06 19:18           ` Jacob Pan
2024-09-16 11:30 ` [RFC PATCH 06/13] iommufd: Expose persistent iommufd IDs in sysfs James Gowans
2024-09-16 11:30 ` [RFC PATCH 07/13] iommufd: Re-hydrate a usable iommufd ctx from sysfs James Gowans
2024-09-16 11:30 ` [RFC PATCH 08/13] intel-iommu: Add serialise and deserialise boilerplate James Gowans
2024-09-16 11:30 ` [RFC PATCH 09/13] intel-iommu: Serialise dmar_domain on KHO activaet James Gowans
2024-09-16 11:30 ` [RFC PATCH 10/13] intel-iommu: Re-hydrate persistent domains after kexec James Gowans
2024-09-16 11:31 ` [RFC PATCH 11/13] iommu: Add callback to restore persisted iommu_domain James Gowans
2024-10-03 13:33   ` Jason Gunthorpe
2024-09-16 11:31 ` [RFC PATCH 12/13] iommufd, guestmemfs: Ensure persistent file used for persistent DMA James Gowans
2024-10-03 13:36   ` Jason Gunthorpe
2024-09-16 11:31 ` [RFC PATCH 13/13] iommufd, guestmemfs: Pin files when mapped " James Gowans

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240916113102.710522-1-jgowans@amazon.com \
    --to=jgowans@amazon.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=graf@amazon.de \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kw@linux.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=madvenka@linux.microsoft.com \
    --cc=nh-open-source@amazon.com \
    --cc=nsaenz@amazon.es \
    --cc=pbonzini@redhat.com \
    --cc=robin.murphy@arm.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=steven.sistare@oracle.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox