From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: iommu@lists.linux.dev, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, alex.williamson@redhat.com,
jgg@nvidia.com, pbonzini@redhat.com, joro@8bytes.org,
will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com,
baolu.lu@linux.intel.com, dwmw2@infradead.org,
yi.l.liu@intel.com
Subject: Re: [RFC PATCH 00/42] Sharing KVM TDP to IOMMU
Date: Mon, 4 Dec 2023 09:00:55 -0800
Message-ID: <ZW4Fx2U80L1PJKlh@google.com>
In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com>
On Sat, Dec 02, 2023, Yan Zhao wrote:
> This RFC series proposes a framework to resolve IO page faults (IOPF) by
> sharing the KVM TDP (Two Dimensional Paging) page table with the IOMMU as
> its stage 2 paging structure, so that IOPF on the IOMMU's stage 2 paging
> structure can be supported.
>
> Previously, all guest pages had to be pinned and mapped in the IOMMU stage 2
> paging structures once a pass-through device was attached, even if the device
> has IOPF capability. Pinning all guest memory can be avoided when IOPF
> handling for the stage 2 paging structure is supported and only IOPF-capable
> devices are attached to a VM.
>
> There are 2 approaches to support IOPF on IOMMU stage 2 paging structures:
> - Supported by IOMMUFD/IOMMU alone
> IOMMUFD handles IO page faults on the stage-2 HWPT by calling GUP and then
> iommu_map() to set up IOVA mappings. (The IOAS is still required to track
> GPA to HVA, but page pinning/unpinning needs to be skipped.)
> Then, upon MMU notifier events on the host primary MMU, iommu_unmap() is
> called to adjust IOVA mappings accordingly.
> The IOMMU driver needs to support unmapping sub-ranges of a previously
> mapped range and to handle huge page merge and split atomically. [1][2]
>
> - Sharing KVM TDP
> IOMMUFD sets the root of the KVM TDP page table (EPT/NPT in x86) as the
> root of the IOMMU stage 2 paging structure, and routes IO page faults to
> KVM. (This assumes that the IOMMU hardware supports the same stage-2 page
> table format as the CPU.)
> In this model the page table is centrally managed by KVM (MMU notifier,
> page mapping, subpage unmapping, atomic huge page split/merge, etc.),
> while IOMMUFD only needs to invalidate the IOTLB/devTLB properly.
There are more approaches beyond having IOMMUFD and KVM be completely separate
entities. E.g. extract the bulk of KVM's "TDP MMU" implementation to common code
so that IOMMUFD doesn't need to reinvent the wheel.
> Currently, there's no upstream code available to support stage 2 IOPF yet.
>
> This RFC chooses to implement the "Sharing KVM TDP" approach, which has the
> following main benefits:
Please list out the pros and cons for each. In the cons column for piggybacking
KVM's page tables:
- *Significantly* increases the complexity in KVM
- Puts constraints on what KVM can/can't do in the future (see the movement
of SPTE_MMU_PRESENT).
- Subjects IOMMUFD to all of KVM's historical baggage, e.g. the memslot deletion
mess, the truly nasty MTRR emulation (which I still hope to delete), the NX
hugepage mitigation, etc.
Please also explain the intended/expected/targeted use cases. E.g. if the main
use case is for device passthrough to slice-of-hardware VMs that aren't memory
oversubscribed,
> - Unified page table management
> The complexity of allocating guest pages per GPA, registering with the MMU
> notifier on host primary MMU, sub-page unmapping, atomic page merge/split
Please find different terminology than "sub-page". With Sub-Page Protection, Intel
has more or less established "sub-page" to mean "less than 4KiB granularity". But
that can't possibly be what you mean here, because KVM doesn't support (un)mapping
memory at <4KiB granularity. Based on context above, I assume you mean "unmapping
arbitrary pages within a given range".
> is handled entirely on the KVM side, which has been doing that well for a
> long time.
>
> - Reduced page faults:
> Only one page fault is triggered for a single GPA, whether it is caused by
> an IO access or by a vCPU access (compared to one IO page fault for DMA
> plus one CPU page fault for the vCPU in the non-shared approach).
This would be relatively easy to solve with bi-directional notifiers, i.e. KVM
notifies IOMMUFD when a vCPU faults in a page, and vice versa.
> - Reduced memory consumption:
> The memory for one set of page tables is saved.
I'm not convinced that memory consumption is all that interesting. If a VM is
mapping the majority of its memory into a device, then odds are good that the
guest is backed with at least 2MiB pages, if not 1GiB pages, at which point the
memory overhead for page tables is quite small, especially relative to the
total memory overhead for such systems.
If a VM is mapping only a small subset of its memory into devices, then the IOMMU
page tables should be sparsely populated, i.e. won't consume much memory.