From: Jason Gunthorpe <jgg@ziepe.ca>
To: "Gowans, James" <jgowans@amazon.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"kw@linux.com" <kw@linux.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"madvenka@linux.microsoft.com" <madvenka@linux.microsoft.com>,
"anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
"nh-open-source@amazon.com" <nh-open-source@amazon.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"seanjc@google.com" <seanjc@google.com>,
"Saenz Julienne, Nicolas" <nsaenz@amazon.es>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"dwmw2@infradead.org" <dwmw2@infradead.org>,
"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
"Graf (AWS), Alexander" <graf@amazon.de>,
"will@kernel.org" <will@kernel.org>,
"joro@8bytes.org" <joro@8bytes.org>,
"maz@kernel.org" <maz@kernel.org>
Subject: Re: [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas
Date: Thu, 10 Oct 2024 12:32:48 -0300 [thread overview]
Message-ID: <20241010153248.GH762027@ziepe.ca> (raw)
In-Reply-To: <673df8a09723d3398ca9e9c638893547b0b0ec63.camel@amazon.com>
On Thu, Oct 10, 2024 at 03:12:09PM +0000, Gowans, James wrote:
> > If this little issue already scares you then I don't think I want to
> > see you serialize anything more complex, there are endless scenarios
> > for compatibility problems :\
>
> The things that scare me is some subtle page table difference which
> causes silent data corruption... This is one of the reasons I liked re-
> using the existing tables, there is no way for this sort of subtle bug
> to happen.
> > > If we say that to be safe/correct in the general case then it is
> > > necessary for the translations to be *exactly* the same before and after
> > > kexec, is there any benefit to building new translation tables and
> > > switching to them? We may as well continue to use the exact same page
> > > tables and construct iommufd objects (IOAS, etc) to match.
> >
> > The benifit is principally that you did all the machinery to get up to
> > that point, including re-pinning and so forth all the memory, instead
> > of trying to magically recover that additional state.
> >
> > This is the philosophy that you replay instead of de-serialize, so you
> > have to replay into a page table at some level to make that work.
>
> We could have some "skip_pgtable_update" flag which the replay machinery
> sets, allowing IOMMUFD to create fresh objects internally and leave the
> page tables alone?
The point made before was that iommufd hard depends on the content of
the iommu_domain for correctness since it uses it as the storage for
the PFNs.
Making an assumption that the prior kernle domain matches what iommufd
requires opens up the easy possibility of hypervisor kernel
corruption.
I think this is a bad direction..
You have to at least validate that userspace has set things up in a
way that is consistent with the prior domain before adopting it.
It would be easier to understand this if the performance costs of
doing such a validation was more understood. Perhaps it can be
optimized somehow.
> > > then it would be useful to avoid rebuilding identical tables. Maybe it
> > > ends up being in the "warm" path - the VM can start running but will
> > > sleep if taking a page fault before IOMMUFD is re-initalised...
> >
> > I didn't think you'd support page faults? There are bigger issues here
> > if you expect to have a vIOMMU in the guest.
>
> vIOMMU is one case, but another is memory oversubscription. With PRI/ATS
> we can oversubscribe memory which is DMA mapped. In that case a page
> fault would be a blocking operation until IOMMUFD is all set up and
> ready to go. I suspect there will be benefit in getting this fast, but
> as long as we have a path to optimise it in future I'm totally fine to
> start with re-creating everything.
Yes, this is true, but if you intend to do this kind of manipulation
of the page table then it really should be in the exact format the new
kernel is tested to understand. Expecting the new kernel to interwork
with the old kernel's page table is likely to be OK, but also along
the same lines of your fear there could be differences :\
Still, PRI/ATS for backing guests storage is a pretty advanced
concept, we don't have support for that yet.
Jason
next prev parent reply other threads:[~2024-10-10 15:32 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-16 11:30 [RFC PATCH 00/13] Support iommu(fd) persistence for live update James Gowans
2024-09-16 11:30 ` [RFC PATCH 01/13] iommufd: Support marking and tracking persistent iommufds James Gowans
2024-09-16 11:30 ` [RFC PATCH 02/13] iommufd: Add plumbing for KHO (de)serialise James Gowans
2024-09-16 11:30 ` [RFC PATCH 03/13] iommu/intel: zap context table entries on kexec James Gowans
2024-10-03 13:27 ` Jason Gunthorpe
2024-09-16 11:30 ` [RFC PATCH 04/13] iommu: Support marking domains as persistent on alloc James Gowans
2024-09-16 11:30 ` [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas James Gowans
2024-10-02 18:55 ` Jason Gunthorpe
2024-10-07 8:39 ` Gowans, James
2024-10-07 8:47 ` David Woodhouse
2024-10-07 8:57 ` Gowans, James
2024-10-07 15:01 ` Jason Gunthorpe
2024-10-09 11:44 ` Gowans, James
2024-10-09 12:28 ` Jason Gunthorpe
2024-10-10 15:12 ` Gowans, James
2024-10-10 15:32 ` Jason Gunthorpe [this message]
2024-10-07 15:11 ` Jason Gunthorpe
2024-10-07 15:16 ` Jason Gunthorpe
2024-10-16 22:20 ` Jacob Pan
2024-10-28 16:03 ` Jacob Pan
2024-11-02 10:22 ` Gowans, James
2024-11-04 13:00 ` Jason Gunthorpe
2024-11-06 19:18 ` Jacob Pan
2024-09-16 11:30 ` [RFC PATCH 06/13] iommufd: Expose persistent iommufd IDs in sysfs James Gowans
2024-09-16 11:30 ` [RFC PATCH 07/13] iommufd: Re-hydrate a usable iommufd ctx from sysfs James Gowans
2024-09-16 11:30 ` [RFC PATCH 08/13] intel-iommu: Add serialise and deserialise boilerplate James Gowans
2024-09-16 11:30 ` [RFC PATCH 09/13] intel-iommu: Serialise dmar_domain on KHO activaet James Gowans
2024-09-16 11:30 ` [RFC PATCH 10/13] intel-iommu: Re-hydrate persistent domains after kexec James Gowans
2024-09-16 11:31 ` [RFC PATCH 11/13] iommu: Add callback to restore persisted iommu_domain James Gowans
2024-10-03 13:33 ` Jason Gunthorpe
2024-09-16 11:31 ` [RFC PATCH 12/13] iommufd, guestmemfs: Ensure persistent file used for persistent DMA James Gowans
2024-10-03 13:36 ` Jason Gunthorpe
2024-09-16 11:31 ` [RFC PATCH 13/13] iommufd, guestmemfs: Pin files when mapped " James Gowans
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241010153248.GH762027@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=anthony.yznaga@oracle.com \
--cc=baolu.lu@linux.intel.com \
--cc=dwmw2@infradead.org \
--cc=graf@amazon.de \
--cc=iommu@lists.linux.dev \
--cc=jgowans@amazon.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=kw@linux.com \
--cc=linux-kernel@vger.kernel.org \
--cc=madvenka@linux.microsoft.com \
--cc=maz@kernel.org \
--cc=nh-open-source@amazon.com \
--cc=nsaenz@amazon.es \
--cc=pbonzini@redhat.com \
--cc=robin.murphy@arm.com \
--cc=rppt@kernel.org \
--cc=seanjc@google.com \
--cc=steven.sistare@oracle.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox