public inbox for kvm@vger.kernel.org
From: Jason Gunthorpe <jgg@ziepe.ca>
To: "Gowans, James" <jgowans@amazon.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"kw@linux.com" <kw@linux.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"madvenka@linux.microsoft.com" <madvenka@linux.microsoft.com>,
	"anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"nh-open-source@amazon.com" <nh-open-source@amazon.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"seanjc@google.com" <seanjc@google.com>,
	"Saenz Julienne, Nicolas" <nsaenz@amazon.es>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	"dwmw2@infradead.org" <dwmw2@infradead.org>,
	"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
	"Graf (AWS), Alexander" <graf@amazon.de>,
	"will@kernel.org" <will@kernel.org>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"maz@kernel.org" <maz@kernel.org>
Subject: Re: [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas
Date: Wed, 9 Oct 2024 09:28:30 -0300
Message-ID: <20241009122830.GF762027@ziepe.ca>
In-Reply-To: <b76aa005c0fb75199cbb1fa0790858b9c808c90a.camel@amazon.com>

On Wed, Oct 09, 2024 at 11:44:30AM +0000, Gowans, James wrote:

> Okay, but in general this still means that the page tables must have
> exactly the same translations if we try to switch from one set to
> another. If it is possible to change translations then translation table
> entries could be created at different granularity (PTE, PMD, PUD) level
> which would violate this requirement. 

Yes, but we strive to build page tables consistently, and it isn't that
often that we get new features that would change the layout (the contig
bit, for instance). I'd suggest that in these cases you add some
creation flag to the HWPT that can inhibit the new feature, and your VMM
will deal with it.
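
As a rough illustration of the creation-flag idea, here is a toy Python
model, not kernel code. HwptFlags, NO_CONTIG, leaf_sizes() and
map_range() are made-up names, not iommufd uAPI; 64K stands in for a
contig-style leaf size added by a newer kernel; physical alignment is
ignored, and iova/length are assumed to be 4K aligned.

```python
from enum import IntFlag

# Leaf sizes of a hypothetical 4K-granule format. 64K stands in for a
# "contiguous hint" leaf that only a newer kernel knows how to create.
SZ_4K, SZ_64K, SZ_2M = 4 << 10, 64 << 10, 2 << 20


class HwptFlags(IntFlag):  # hypothetical creation flags, not a real uAPI
    NONE = 0
    NO_CONTIG = 1


def leaf_sizes(flags):
    """Leaf sizes the table builder may use, largest first."""
    sizes = [SZ_2M, SZ_64K, SZ_4K]
    if flags & HwptFlags.NO_CONTIG:
        sizes.remove(SZ_64K)  # the flag inhibits the newer feature
    return sizes


def map_range(iova, length, flags=HwptFlags.NONE):
    """Return [(iova, size)] leaves, greedily using the largest size that fits."""
    leaves = []
    end = iova + length
    while iova < end:
        size = next(s for s in leaf_sizes(flags)
                    if iova % s == 0 and iova + s <= end)
        leaves.append((iova, size))
        iova += size
    return leaves
```

With the flag set, a 64K range comes out as sixteen 4K leaves, which is
the layout a kernel without the feature would have produced.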

Or you sweep the table and manually split/join entries to deal with
BBM < level 2. Generic pt will have code to do all of this, so it is
not that bad.

If this little issue already scares you then I don't think I want to
see you serialize anything more complex; there are endless scenarios
for compatibility problems :\

> It's also possible for different IOMMU driver versions to set up the
> same translations, but at different page table levels. Perhaps an older
> version did not coalesce some PTEs, but a newer version does coalesce.
> Would the same translations but at a different size violate BBM?

Yes, that is the only thing that violates BBM.
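
To make the distinction concrete, here is a toy Python check, not kernel
code. It assumes leaves are described as hypothetical (iova, pa, size)
tuples and separates "different translation" from "same translation at
a different leaf size", the latter being the case discussed here.

```python
PAGE = 4096


def flatten(leaves):
    """Expand (iova, pa, size) leaves into a per-4K {iova: pa} translation."""
    return {iova + off: pa + off
            for iova, pa, size in leaves
            for off in range(0, size, PAGE)}


def classify(old, new):
    """Compare two page table layouts given as lists of leaves."""
    if flatten(old) != flatten(new):
        return "different translation"
    if set(old) != set(new):
        # Every IOVA resolves to the same PA, but through leaves of a
        # different size: the case that matters for BBM.
        return "same translation, different leaf sizes"
    return "identical"


# An older layout made of sixteen 4K leaves versus a newer one that
# coalesced them into a single 64K leaf.
old = [(i * PAGE, 0x100000 + i * PAGE, PAGE) for i in range(16)]
new = [(0, 0x100000, 16 * PAGE)]
```

Here classify(old, new) reports "same translation, different leaf
sizes", while classify(old, old) reports "identical".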

> If we say that to be safe/correct in the general case then it is
> necessary for the translations to be *exactly* the same before and after
> kexec, is there any benefit to building new translation tables and
> switching to them? We may as well continue to use the exact same page
> tables and construct iommufd objects (IOAS, etc) to match.

The benefit is principally that you ran all the machinery to get up to
that point, including re-pinning all the memory and so forth, instead
of trying to magically recover that additional state.

This is the philosophy that you replay instead of de-serialize, so you
have to replay into a page table at some level to make that work.
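
A toy Python sketch of what replay means here, not kernel code. Ioas,
map() and replay() are made-up names, and the pin bookkeeping is heavily
simplified. The point is that the side state (pins) is rebuilt by the
normal map path and never has to be carried across kexec.

```python
class Ioas:
    """Toy address space: mappings plus the pin state that mapping creates."""

    def __init__(self):
        self.mappings = {}  # iova -> (length, backing)
        self.pinned = {}    # backing -> pin count

    def map(self, iova, length, backing):
        # The ordinary map path pins the backing memory as a side effect.
        self.pinned[backing] = self.pinned.get(backing, 0) + 1
        self.mappings[iova] = (length, backing)


def replay(log):
    """Rebuild an Ioas by re-issuing each recorded map call."""
    ioas = Ioas()
    for iova, length, backing in log:
        ioas.map(iova, length, backing)
    return ioas


# Only the list of map requests crosses kexec in this model.
log = [(0x0, 0x200000, "guest-ram"), (0x200000, 0x1000, "guest-ram")]
restored = replay(log)
```

After the replay, restored.pinned holds two pins on "guest-ram" even
though no pin state was ever serialized.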

> There is also a performance consideration here: when doing live update
> every millisecond of down time matters. I'm not sure if this iommufd re-
> initialisation will end up being in the hot path of things that need to
> be done before the VM can start running again. 

As we talked about in the session, your KVM can start running
immediately; you don't need iommufd to be fully set up.

You only need iommufd fully working again if you intend to do certain
operations, like memory hotplug or something that requires an address
map change. So you can operate in a degraded state that is largely
invisible to the guest while recovering this stuff. It shouldn't be on
your critical path.
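
As a toy Python model of that degraded state, not kernel or VMM code:
the Vm class and its methods are made up, and deferring map changes
until recovery finishes is only one possible policy, assumed here for
illustration.

```python
class Vm:
    """Toy VMM state: vCPUs run at once, map changes wait for iommufd."""

    def __init__(self):
        self.iommufd_restored = False
        self.deferred = []

    def run_vcpus(self):
        # Does not depend on iommufd having been rebuilt.
        return "running"

    def change_address_map(self, op):
        # E.g. memory hotplug: needs a working iommufd, so hold it back
        # while recovery is still in progress.
        if not self.iommufd_restored:
            self.deferred.append(op)
            return "deferred"
        return "applied"

    def finish_restore(self):
        """Mark iommufd usable and hand back the operations that waited."""
        self.iommufd_restored = True
        done, self.deferred = self.deferred, []
        return done
```

In this model only change_address_map() ever waits on recovery;
run_vcpus() never does.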

> then it would be useful to avoid rebuilding identical tables. Maybe it
> ends up being in the "warm" path - the VM can start running but will
> sleep if taking a page fault before IOMMUFD is re-initialised...

I didn't think you'd support page faults? There are bigger issues here
if you expect to have a vIOMMU in the guest.

Jason

Thread overview: 33+ messages
2024-09-16 11:30 [RFC PATCH 00/13] Support iommu(fd) persistence for live update James Gowans
2024-09-16 11:30 ` [RFC PATCH 01/13] iommufd: Support marking and tracking persistent iommufds James Gowans
2024-09-16 11:30 ` [RFC PATCH 02/13] iommufd: Add plumbing for KHO (de)serialise James Gowans
2024-09-16 11:30 ` [RFC PATCH 03/13] iommu/intel: zap context table entries on kexec James Gowans
2024-10-03 13:27   ` Jason Gunthorpe
2024-09-16 11:30 ` [RFC PATCH 04/13] iommu: Support marking domains as persistent on alloc James Gowans
2024-09-16 11:30 ` [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas James Gowans
2024-10-02 18:55   ` Jason Gunthorpe
2024-10-07  8:39     ` Gowans, James
2024-10-07  8:47       ` David Woodhouse
2024-10-07  8:57         ` Gowans, James
2024-10-07 15:01           ` Jason Gunthorpe
2024-10-09 11:44             ` Gowans, James
2024-10-09 12:28               ` Jason Gunthorpe [this message]
2024-10-10 15:12                 ` Gowans, James
2024-10-10 15:32                   ` Jason Gunthorpe
2024-10-07 15:11         ` Jason Gunthorpe
2024-10-07 15:16       ` Jason Gunthorpe
2024-10-16 22:20   ` Jacob Pan
2024-10-28 16:03     ` Jacob Pan
2024-11-02 10:22       ` Gowans, James
2024-11-04 13:00         ` Jason Gunthorpe
2024-11-06 19:18           ` Jacob Pan
2024-09-16 11:30 ` [RFC PATCH 06/13] iommufd: Expose persistent iommufd IDs in sysfs James Gowans
2024-09-16 11:30 ` [RFC PATCH 07/13] iommufd: Re-hydrate a usable iommufd ctx from sysfs James Gowans
2024-09-16 11:30 ` [RFC PATCH 08/13] intel-iommu: Add serialise and deserialise boilerplate James Gowans
2024-09-16 11:30 ` [RFC PATCH 09/13] intel-iommu: Serialise dmar_domain on KHO activaet James Gowans
2024-09-16 11:30 ` [RFC PATCH 10/13] intel-iommu: Re-hydrate persistent domains after kexec James Gowans
2024-09-16 11:31 ` [RFC PATCH 11/13] iommu: Add callback to restore persisted iommu_domain James Gowans
2024-10-03 13:33   ` Jason Gunthorpe
2024-09-16 11:31 ` [RFC PATCH 12/13] iommufd, guestmemfs: Ensure persistent file used for persistent DMA James Gowans
2024-10-03 13:36   ` Jason Gunthorpe
2024-09-16 11:31 ` [RFC PATCH 13/13] iommufd, guestmemfs: Pin files when mapped " James Gowans
