public inbox for kvm@vger.kernel.org
From: Jason Gunthorpe <jgg@nvidia.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>,
	iommu@lists.linux.dev, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, alex.williamson@redhat.com,
	pbonzini@redhat.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, dwmw2@infradead.org,
	yi.l.liu@intel.com
Subject: Re: [RFC PATCH 00/42] Sharing KVM TDP to IOMMU
Date: Mon, 4 Dec 2023 15:50:55 -0400
Message-ID: <20231204195055.GA2692119@nvidia.com>
In-Reply-To: <ZW4nCUS9VDk0DycG@google.com>

On Mon, Dec 04, 2023 at 11:22:49AM -0800, Sean Christopherson wrote:
> On Mon, Dec 04, 2023, Jason Gunthorpe wrote:
> > On Mon, Dec 04, 2023 at 09:00:55AM -0800, Sean Christopherson wrote:
> > 
> > > There are more approaches beyond having IOMMUFD and KVM be
> > > completely separate entities.  E.g. extract the bulk of KVM's "TDP
> > > MMU" implementation to common code so that IOMMUFD doesn't need to
> > > reinvent the wheel.
> > 
> > We've pretty much done this already, it is called "hmm" and it is what
> > the IO world uses. Merging/splitting huge page is just something that
> > needs some coding in the page table code, that people want for other
> > reasons anyhow.
> 
> Not really.  HMM is a wildly different implementation than KVM's TDP MMU.  At a
> glance, HMM is basically a variation on the primary MMU, e.g. deals with VMAs,
> runs under mmap_lock (or per-VMA locks?), and faults memory into the primary MMU
> while walking the "secondary" HMM page tables.

hmm supports the essential idea of shadowing parts of the primary
MMU. This is a big chunk of what kvm is doing, just differently.
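Both hmm and KVM share the same core mirroring loop: populate a secondary table from the primary MMU on fault, and tear entries down when the primary MMU sends an invalidation. A toy userspace model of that loop (all names invented for illustration; none of this is a real hmm or KVM API):

```c
#include <assert.h>
#include <stdint.h>

#define MIR_NR_PAGES 8
#define MIR_UNMAPPED UINT64_MAX

/* "Primary MMU" state, and the secondary table mirroring it. */
static uint64_t mir_primary[MIR_NR_PAGES];
static uint64_t mir_shadow[MIR_NR_PAGES];

static void mir_init(void)
{
	for (int i = 0; i < MIR_NR_PAGES; i++) {
		mir_primary[i] = 0x1000 + i;	/* primary has it all mapped */
		mir_shadow[i] = MIR_UNMAPPED;	/* secondary starts empty */
	}
}

/* Fault path: pull the translation from the primary into the shadow. */
static uint64_t mir_fault(int page)
{
	mir_shadow[page] = mir_primary[page];
	return mir_shadow[page];
}

/* Invalidation callback: the primary changed, so the shadow drops its
 * copy and re-faults later.  This is the part both hmm and KVM have to
 * implement, each in its own way. */
static void mir_invalidate(int page)
{
	mir_shadow[page] = MIR_UNMAPPED;
}
```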

> KVM's TDP MMU (and all of KVM's flavors of MMUs) is much more of a pure secondary
> MMU.  The core of a KVM MMU maps GFNs to PFNs, the intermediate steps that involve
> the primary MMU are largely orthogonal.  E.g. getting a PFN from guest_memfd
> instead of the primary MMU essentially boils down to invoking kvm_gmem_get_pfn()
> instead of __gfn_to_pfn_memslot(), the MMU proper doesn't care how the PFN was
> resolved.  I.e. 99% of KVM's MMU logic has no interaction with the primary MMU.

Hopefully the memfd stuff will be generalized so we can use it in
iommufd too, without relying on KVM. At least the first basic pieces
should be doable fairly soon.
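Sean's point that "the MMU proper doesn't care how the PFN was resolved" can be modeled in a few lines: the map path takes a pluggable resolver, so swapping __gfn_to_pfn_memslot() for kvm_gmem_get_pfn() leaves the page table code untouched. A toy sketch (the toy_* names are invented; only the two kernel function names above are real):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_GFNS 16
#define TOY_INVALID UINT64_MAX

/* Toy "secondary MMU": a flat GFN -> PFN table. */
static uint64_t toy_tdp[TOY_NR_GFNS];

/* The resolver is pluggable: the map path doesn't care where PFNs
 * come from. */
typedef uint64_t (*toy_pfn_resolver)(uint64_t gfn);

/* Stand-in for a primary-MMU-backed lookup (__gfn_to_pfn_memslot). */
static uint64_t toy_resolve_via_primary(uint64_t gfn)
{
	return 0x1000 + gfn;	/* pretend the primary MMU handed us this */
}

/* Stand-in for a guest_memfd-backed lookup (kvm_gmem_get_pfn). */
static uint64_t toy_resolve_via_gmem(uint64_t gfn)
{
	return 0x2000 + gfn;	/* pretend guest_memfd handed us this */
}

/* The "MMU proper": identical no matter which resolver is plugged in. */
static uint64_t toy_map_gfn(uint64_t gfn, toy_pfn_resolver resolve)
{
	if (gfn >= TOY_NR_GFNS)
		return TOY_INVALID;
	toy_tdp[gfn] = resolve(gfn);
	return toy_tdp[gfn];
}
```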

> I'm not advocating mirroring/copying/shadowing page tables between KVM and the
> IOMMU.  I'm suggesting managing IOMMU page tables mostly independently, but reusing
> KVM code to do so.

I guess from my POV, if KVM has two copies of the logically same radix
tree then that is fine too.

> Yes, sharing page tables will Just Work for faulting in memory, but the downside
> is that _when_, not if, KVM modifies PTEs for whatever reason, those modifications
> will also impact the IO path.  My understanding is that IO page faults are at least
> an order of magnitude more expensive than CPU page faults.  That means that what's
> optimal for CPU page tables may not be optimal, or even _viable_, for IOMMU page
> tables.

Yes, you wouldn't want to use some of KVM's current techniques in a
shared mode.
 
> E.g. based on our conversation at LPC, write-protecting guest memory to do dirty
> logging is not a viable option for the IOMMU because the latency of the resulting
> IOPF is too high.  Forcing KVM to use D-bit dirty logging for CPUs just because
> the VM has passthrough (mediated?) devices would be likely a
> non-starter.

Yes
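The dirty-logging trade-off above is easy to model: write-protection pays a fault per dirtied page (intolerable when that fault is an IOPF), while D-bit logging pays no faults but must be harvested later. A toy sketch of just that cost difference (dl_* names are invented; this models the policy, not any real PTE format):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DL_NR_PAGES 8

struct dl_pte {
	bool writable;	/* clear => a write takes a (slow) fault first */
	bool dirty;	/* D-bit: set by "hardware" on write, no fault */
};

struct dl_state {
	struct dl_pte pte[DL_NR_PAGES];
	unsigned int faults;	/* faults taken since logging started */
};

/* Write-protect scheme: drop write permission everywhere. */
static void dl_start_wrprot(struct dl_state *s)
{
	for (int i = 0; i < DL_NR_PAGES; i++)
		s->pte[i] = (struct dl_pte){ .writable = false };
	s->faults = 0;
}

/* D-bit scheme: leave pages writable, just clear the dirty bits. */
static void dl_start_dbit(struct dl_state *s)
{
	for (int i = 0; i < DL_NR_PAGES; i++)
		s->pte[i] = (struct dl_pte){ .writable = true };
	s->faults = 0;
}

/* A guest (or device) write: faults iff the PTE is write-protected. */
static void dl_write(struct dl_state *s, int page)
{
	if (!s->pte[page].writable) {
		s->faults++;		/* the expensive part for an IOPF */
		s->pte[page].writable = true;
	}
	s->pte[page].dirty = true;
}
```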

> One of my biggest concerns with sharing page tables between KVM and IOMMUs is that
> we will end up having to revert/reject changes that benefit KVM's usage due to
> regressing the IOMMU usage.

It is certainly a strong argument.

> I'm not suggesting full blown mirroring, all I'm suggesting is a fire-and-forget
> notifier for KVM to tell IOMMUFD "I've faulted in GFN A, you might want to do the
> same".

If we say the only thing this works with is the memfd version of KVM,
could we design the memfd stuff to avoid the mirroring challenges
that normal VMAs have?
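The fire-and-forget shape Sean describes is tiny: KVM faults a GFN in, pokes a registered callback with "GFN A is now mapped", and never waits for or depends on a reply. A toy sketch (ff_* names are invented):

```c
#include <assert.h>
#include <stdint.h>

#define FF_NR_GFNS 8
#define FF_UNMAPPED UINT64_MAX

static uint64_t ff_kvm_tdp[FF_NR_GFNS];	/* KVM's own table */
static uint64_t ff_iommu_pt[FF_NR_GFNS];	/* the consumer's table */

/* Fire-and-forget: KVM tells the consumer a GFN was faulted in, then
 * moves on.  The consumer may map it eagerly or ignore the hint. */
typedef void (*ff_notifier)(uint64_t gfn, uint64_t pfn);

static ff_notifier ff_registered;

static void ff_iommufd_hint(uint64_t gfn, uint64_t pfn)
{
	ff_iommu_pt[gfn] = pfn;	/* eagerly mirror the mapping */
}

static void ff_kvm_fault_in(uint64_t gfn, uint64_t pfn)
{
	ff_kvm_tdp[gfn] = pfn;
	if (ff_registered)
		ff_registered(gfn, pfn);	/* no reply expected */
}
```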

> It wouldn't even necessarily need to be a notifier per se, e.g. if we taught KVM
> to manage IOMMU page tables, then KVM could simply install mappings for multiple
> sets of page tables as appropriate.

This somehow feels more achievable to me since KVM already has all the
code to handle multiple TDPs; having two parallel ones is probably
much easier than trying to weld KVM to a different page table
implementation through some kind of loosely coupled notifier.
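The "two parallel TDPs, one owner" idea sketches out like this: one fault path installs into both tables in lockstep, but each copy can still diverge when policy differs, e.g. KVM zapping its CPU copy for dirty logging without perturbing the IO path. A toy model (pt_* names are invented):

```c
#include <assert.h>
#include <stdint.h>

#define PT_NR_GFNS 8
#define PT_UNMAPPED UINT64_MAX

/* Two logically-identical trees, managed by one owner (here, "KVM"). */
static uint64_t pt_cpu[PT_NR_GFNS];	/* the TDP/EPT the vCPUs use */
static uint64_t pt_iommu[PT_NR_GFNS];	/* the parallel IOMMU copy */

static void pt_init(void)
{
	for (int i = 0; i < PT_NR_GFNS; i++)
		pt_cpu[i] = pt_iommu[i] = PT_UNMAPPED;
}

/* One fault path installs into both tables in lockstep... */
static void pt_install(uint64_t gfn, uint64_t pfn)
{
	pt_cpu[gfn] = pfn;
	pt_iommu[gfn] = pfn;
}

/* ...but the copies can diverge when policy differs: KVM can zap its
 * CPU copy (say, for dirty logging) without touching the IO path. */
static void pt_zap_cpu_only(uint64_t gfn)
{
	pt_cpu[gfn] = PT_UNMAPPED;
}
```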

Jason

