From: "Gowans, James" <jgowans@amazon.com>
To: "jgg@ziepe.ca" <jgg@ziepe.ca>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"kw@linux.com" <kw@linux.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"madvenka@linux.microsoft.com" <madvenka@linux.microsoft.com>,
"anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
"nh-open-source@amazon.com" <nh-open-source@amazon.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"seanjc@google.com" <seanjc@google.com>,
"Saenz Julienne, Nicolas" <nsaenz@amazon.es>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"dwmw2@infradead.org" <dwmw2@infradead.org>,
"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
"Graf (AWS), Alexander" <graf@amazon.de>,
"will@kernel.org" <will@kernel.org>,
"joro@8bytes.org" <joro@8bytes.org>
Subject: Re: [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas
Date: Mon, 7 Oct 2024 08:39:53 +0000
Message-ID: <d6328467adc9b7512f6dd88a6f8f843b8efdc154.camel@amazon.com>
In-Reply-To: <20241002185520.GL1369530@ziepe.ca>
On Wed, 2024-10-02 at 15:55 -0300, Jason Gunthorpe wrote:
> On Mon, Sep 16, 2024 at 01:30:54PM +0200, James Gowans wrote:
> > Now actually implementing the serialise callback for iommufd.
> > On KHO activate, iterate through all persisted domains and write their
> > metadata to the device tree format. For now just a few fields are
> > serialised to demonstrate the concept. To actually make this useful a
> > lot more fields and related objects will need to be serialised too.
>
> But isn't that a rather difficult problem? The "a lot more fields"
> include things like pointers to the mm struct, the user_struct and
> task_struct, then all the pinning accounting as well.
>
> Coming work extends this to memfds and more is coming. I would expect
> this KHO stuff to use the memfd-like path to access the physical VM
> memory too.
>
> I think expecting to serialize and restore everything like this is
> probably much too complicated.

On reflection I think you're right: this will be complex from both a
development and a maintenance perspective, since we have to make sure we
serialise all the necessary state and reconstruct it correctly. It gets
even more complex when structs are refactored or changed across kernel
versions.

An important requirement of this functionality is the ability to kexec
between different kernel versions, including going back to an older
kernel version in the case of a rollback.

So, let's look at other options:

>
> If you could just retain a small portion and then directly reconstruct
> the missing parts it seems like it would be more maintainable.

I think we have three possible approaches here, two of them
alternatives to what this RFC does:

1. What this RFC is sketching out: serialising fields from the structs
and setting those fields again on deserialise. As you point out this
will be complicated.

2. Get userspace to do the work: userspace needs to re-do the ioctls
after kexec to reconstruct the objects. My main issue with this approach
is that the kernel needs to take some sort of trust-but-verify approach
to ensure that userspace constructs everything the same way after kexec
as it was before kexec. We don't want to end up in a state where the
iommufd objects don't match the persisted page tables.

3. Serialise and replay the ioctls. Ioctl APIs and payloads should
(must?) be stable across kernel versions. If IOMMUFD records the ioctls
executed by userspace then it could replay them as part of deserialise
and give userspace a handle to the resulting objects after kexec. This
way we are guaranteed consistent iommufd / IOAS objects. By "consistent"
I mean they are the same as before kexec and match the persisted page
tables. Having the kernel do this means we don't need to depend on
userspace doing the correct thing.
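
To make option 3 slightly more concrete, here is a rough userspace C
sketch of the record-and-replay idea. It is only a toy model under
stated assumptions, not a proposal for the real data structures: the
command numbers, struct ioctl_record, struct ioctl_log, struct
toy_state, do_and_record() and replay_log() are all hypothetical names
invented for illustration and are not iommufd APIs. It also ignores
things a real implementation would have to handle, such as object IDs,
unmap/destroy operations and the on-disk (KHO) encoding of the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command numbers; these are NOT real iommufd ioctls. */
enum toy_cmd { TOY_IOAS_ALLOC = 1, TOY_IOAS_MAP = 2 };

/* One recorded ioctl: the command plus a copy of its fixed-size payload. */
struct ioctl_record {
	uint32_t cmd;
	uint64_t iova;
	uint64_t length;
};

#define MAX_RECORDS 16

/* Assumption: this log is the only thing carried across kexec. */
struct ioctl_log {
	size_t count;
	struct ioctl_record recs[MAX_RECORDS];
};

/* Toy stand-in for the iommufd/IOAS state built up by the ioctls. */
struct toy_state {
	unsigned int nr_ioas;
	uint64_t mapped_bytes;
};

/* Apply one command to the state; returns 0, or -1 for an unknown command. */
int apply_cmd(struct toy_state *st, const struct ioctl_record *rec)
{
	switch (rec->cmd) {
	case TOY_IOAS_ALLOC:
		st->nr_ioas++;
		return 0;
	case TOY_IOAS_MAP:
		st->mapped_bytes += rec->length;
		return 0;
	default:
		return -1;
	}
}

/* Before kexec: run the command and log it only if it succeeded. */
int do_and_record(struct toy_state *st, struct ioctl_log *log,
		  const struct ioctl_record *rec)
{
	if (log->count >= MAX_RECORDS)
		return -1;
	if (apply_cmd(st, rec))
		return -1;
	log->recs[log->count++] = *rec;
	return 0;
}

/* After kexec: start from empty state and replay the log in order. */
int replay_log(struct toy_state *st, const struct ioctl_log *log)
{
	size_t i;

	assert(log->count <= MAX_RECORDS);
	memset(st, 0, sizeof(*st));
	for (i = 0; i < log->count; i++)
		if (apply_cmd(st, &log->recs[i]))
			return -1;
	return 0;
}
```

The point of the sketch is that the state after replay_log() is derived
only from the recorded payloads, so it cannot diverge from what was set
up before kexec as long as the same commands keep the same meaning
across kernel versions.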
What do you think of this 3rd approach? I can try to sketch it out and
send another RFC if you think it sounds reasonable.
>
> I.e. "recover" a HWPT from a KHO on a manually created IOAS with the
> right "memfd" for the backing storage. Then the recovery can just
> validate that things are correct and adopt the iommu_domain as the
> hwpt.

This sounds more like option 2, where we expect userspace to re-drive
the ioctls and the kernel verifies that they carry the same payloads as
before kexec, so that the iommufd objects are consistent with the
persisted page tables.

If the kernel is doing the verification anyway, wouldn't it be better
for the kernel to do the ioctl work itself and give the resulting
objects to userspace?
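
For comparison, the verification that option 2 implies could look
roughly like the C sketch below. Again this is a toy model under
assumptions, not real kernel code: struct persisted_map,
verify_and_adopt() and all_adopted() are hypothetical names, and a
mapping is reduced to an (iova, length) pair, whereas a real check would
also have to cover the backing memory and permissions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one persisted mapping; not a kernel struct. */
struct persisted_map {
	uint64_t iova;
	uint64_t length;
	int adopted;	/* set once userspace has re-created this mapping */
};

/*
 * Check a map request re-issued by userspace after kexec against the
 * persisted set. Returns 0 and marks the entry adopted on an exact match,
 * -EEXIST if that entry was already adopted, -EINVAL if nothing matches.
 */
int verify_and_adopt(struct persisted_map *maps, size_t n,
		     uint64_t iova, uint64_t length)
{
	size_t i;

	assert(maps != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (maps[i].iova != iova || maps[i].length != length)
			continue;
		if (maps[i].adopted)
			return -EEXIST;
		maps[i].adopted = 1;
		return 0;
	}
	return -EINVAL;
}

/* Restore is complete only once every persisted mapping has been adopted. */
int all_adopted(const struct persisted_map *maps, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!maps[i].adopted)
			return 0;
	return 1;
}
```

Even in this reduced form the kernel has to keep the full persisted
description around in order to check against it, which is why replaying
it directly looks attractive to me.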
>
> Eventually you'll want this to work for the viommus as well, and that
> seems like a lot more tricky complexity..
>
> Jason