From: Isaku Yamahata <isaku.yamahata@intel.com>
To: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com,
erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com,
dmatlack@google.com
Subject: Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion
Date: Wed, 15 May 2024 11:03:24 -0700 [thread overview]
Message-ID: <20240515180324.GF168153@ls.amr.corp.intel.com> (raw)
In-Reply-To: <20240515005952.3410568-3-rick.p.edgecombe@intel.com>
On Tue, May 14, 2024 at 05:59:38PM -0700,
Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
>
> Introduce a per-memslot flag KVM_MEM_ZAP_LEAFS_ONLY to permit zap only leaf
> SPTEs when deleting a memslot.
>
> Today "zapping only memslot leaf SPTEs" on memslot deletion is not done.
> Instead KVM will invalidate all old TDPs (i.e. EPT for Intel or NPT for
> AMD) and generate fresh new TDPs based on the new memslot layout. This is
> because zapping and re-generating TDPs is low overhead for most use cases,
> and more importantly, it's due to a bug [1] which caused VM instability
> when a VM is with Nvidia Geforce GPU assigned.
>
> There's a previous attempt [2] to introduce a per-VM flag to workaround bug
> [1] by only allowing "zapping only memslot leaf SPTEs" for specific VMs.
> However, [2] was not merged due to lacking of a clear explanation of
> exactly what is broken [3] and it's not wise to "have a bug that is known
> to happen when you enable the capability".
>
> However, for some specific scenarios, e.g. TDX, invalidating and
> re-generating a new page table is not viable for reasons:
> - TDX requires root page of private page table remains unaltered throughout
> the TD life cycle.
> - TDX mandates that leaf entries in private page table must be zapped prior
> to non-leaf entries.
>
> So, Sean re-considered about introducing a per-VM flag or per-memslot flag
> again for VMs like TDX. [4]
>
> This patch is an implementation of per-memslot flag.
> Compared to per-VM flag approach,
> Pros:
> (1) By allowing userspace to control the zapping behavior in fine-grained
> granularity, optimizations for specific use cases can be developed
> without future kernel changes.
> (2) Allows developing new zapping behaviors without risking regressions by
> changing KVM behavior, as seen previously.
>
> Cons:
> (1) Users need to ensure all necessary memslots are with flag
> KVM_MEM_ZAP_LEAFS_ONLY set.e.g. QEMU needs to ensure all GUEST_MEMFD
> memslot is with ZAP_LEAFS_ONLY flag for TDX VM.
> (2) Opens up the possibility that userspace could configure memslots for
> normal VM in such a way that the bug [1] is seen.
>
> However, one thing deserves noting for TDX, is that TDX may potentially
> meet bug [1] for either per-memslot flag or per-VM flag approach, since
> there's a usage in radar to assign an untrusted & passthrough GPU device
> in TDX. If that happens, it can be treated as a bug (not regression) and
> fixed accordingly.
>
> An alternative approach we can also consider is to always invalidate &
> rebuild all shared page tables and zap only memslot leaf SPTEs for mirrored
> and private page tables on memslot deletion. This approach could exempt TDX
> from bug [1] when "untrusted & passthrough" devices are involved. But
> downside is that this approach requires creating new very specific KVM
> zapping ABI that could limit future changes in the same way that the bug
> did for normal VMs.
>
> Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com [1]
> Link: https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#mabc0119583dacf621025e9d873c85f4fbaa66d5c [2]
> Link: https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#m1839c85392a7a022df9e507876bb241c022c4f06 [3]
> Link: https://lore.kernel.org/kvm/ZhSYEVCHqSOpVKMh@google.com [4]
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> TDX MMU Part 1:
> - New patch
> ---
> arch/x86/kvm/mmu/mmu.c | 30 +++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c | 17 +++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/kvm_main.c | 5 ++++-
> 4 files changed, 51 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 61982da8c8b2..4a8e819794db 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6962,10 +6962,38 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
> kvm_mmu_zap_all(kvm);
> }
>
> +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> + if (KVM_BUG_ON(!tdp_mmu_enabled, kvm))
> + return;
> +
> + write_lock(&kvm->mmu_lock);
> +
> + /*
> + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> + * case scenario we'll have unused shadow pages lying around until they
> + * are recycled due to age or when the VM is destroyed.
> + */
> + struct kvm_gfn_range range = {
> + .slot = slot,
> + .start = slot->base_gfn,
> + .end = slot->base_gfn + slot->npages,
> + .may_block = true,
> + };
nit: move this up at the beginning of this function.
Compiler didn't complain?
> +
> + if (kvm_tdp_mmu_unmap_gfn_range(kvm, &range, false))
> + kvm_flush_remote_tlbs(kvm);
> +
> + write_unlock(&kvm->mmu_lock);
> +}
> +
> void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot)
> {
> - kvm_mmu_zap_all_fast(kvm);
> + if (slot->flags & KVM_MEM_ZAP_LEAFS_ONLY)
> + kvm_mmu_zap_memslot_leafs(kvm, slot);
> + else
> + kvm_mmu_zap_all_fast(kvm);
> }
>
> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7c593a081eba..4b3ec2ec79e9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12952,6 +12952,23 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
> return -EINVAL;
>
> + /*
> + * Since TDX private pages requires re-accepting after zap,
> + * and TDX private root page should not be zapped, TDX requires
> + * memslots for private memory must have flag
> + * KVM_MEM_ZAP_LEAFS_ONLY set too, so that only leaf SPTEs of
> + * the deleting memslot will be zapped and SPTEs in other
> + * memslots would not be affected.
> + */
> + if (kvm->arch.vm_type == KVM_X86_TDX_VM &&
> + (new->flags & KVM_MEM_GUEST_MEMFD) &&
> + !(new->flags & KVM_MEM_ZAP_LEAFS_ONLY))
> + return -EINVAL;
> +
> + /* zap-leafs-only works only when TDP MMU is enabled for now */
> + if ((new->flags & KVM_MEM_ZAP_LEAFS_ONLY) && !tdp_mmu_enabled)
> + return -EINVAL;
> +
> return kvm_alloc_memslot_metadata(kvm, new);
> }
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index aee67912e71c..d53648c19b26 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -51,6 +51,7 @@ struct kvm_userspace_memory_region2 {
> #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
> #define KVM_MEM_READONLY (1UL << 1)
> #define KVM_MEM_GUEST_MEMFD (1UL << 2)
> +#define KVM_MEM_ZAP_LEAFS_ONLY (1UL << 3)
If we make this uAPI, please update Documentation/virt/kvm/api.rst too.
>
> /* for KVM_IRQ_LINE */
> struct kvm_irq_level {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 81b90bf03f2f..1b1ffb6fc786 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1568,6 +1568,8 @@ static int check_memory_region_flags(struct kvm *kvm,
> if (kvm_arch_has_private_mem(kvm))
> valid_flags |= KVM_MEM_GUEST_MEMFD;
>
> + valid_flags |= KVM_MEM_ZAP_LEAFS_ONLY;
> +
This is arch common code. We need a guard for other arch (non-x86).
Also feature enumeration. KVM_CAP_USER_MEMORY2 can be used?
--
Isaku Yamahata <isaku.yamahata@intel.com>
next prev parent reply other threads:[~2024-05-15 18:03 UTC|newest]
Thread overview: 152+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-15 0:59 [PATCH 00/16] TDX MMU prep series part 1 Rick Edgecombe
2024-05-15 0:59 ` [PATCH 01/16] KVM: x86: Add a VM type define for TDX Rick Edgecombe
2024-05-15 0:59 ` [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion Rick Edgecombe
2024-05-15 13:24 ` Huang, Kai
2024-05-15 19:09 ` Sean Christopherson
2024-05-15 19:23 ` Edgecombe, Rick P
2024-05-15 20:05 ` Sean Christopherson
2024-05-15 20:53 ` Edgecombe, Rick P
2024-05-15 22:47 ` Sean Christopherson
2024-05-15 23:06 ` Huang, Kai
2024-05-15 23:20 ` Sean Christopherson
2024-05-15 23:36 ` Huang, Kai
2024-05-16 1:12 ` Xiaoyao Li
2024-05-17 15:30 ` Paolo Bonzini
2024-05-22 1:29 ` Yan Zhao
2024-05-22 2:31 ` Sean Christopherson
2024-05-22 6:48 ` Yan Zhao
2024-05-22 15:45 ` Paolo Bonzini
2024-05-24 1:50 ` Yan Zhao
2024-05-15 23:56 ` Edgecombe, Rick P
2024-05-16 2:21 ` Edgecombe, Rick P
2024-05-16 3:56 ` Yan Zhao
2024-05-17 15:27 ` Paolo Bonzini
2024-05-17 15:25 ` Paolo Bonzini
2024-05-15 18:03 ` Isaku Yamahata [this message]
2024-05-15 0:59 ` [PATCH 03/16] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-05-17 7:44 ` Chao Gao
2024-05-17 9:08 ` Isaku Yamahata
2024-05-15 0:59 ` [PATCH 04/16] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA Rick Edgecombe
2024-05-15 22:34 ` Huang, Kai
2024-05-15 23:21 ` Edgecombe, Rick P
2024-05-15 23:31 ` Huang, Kai
2024-05-15 23:38 ` Edgecombe, Rick P
2024-05-15 23:44 ` Huang, Kai
2024-05-15 23:59 ` Edgecombe, Rick P
2024-05-16 0:12 ` Huang, Kai
2024-05-16 0:19 ` Edgecombe, Rick P
2024-05-16 0:25 ` Huang, Kai
2024-05-16 0:35 ` Edgecombe, Rick P
2024-05-16 1:04 ` Huang, Kai
2024-05-16 1:20 ` Edgecombe, Rick P
2024-05-16 1:40 ` Huang, Kai
2024-05-16 5:52 ` Yan Zhao
2024-05-18 0:25 ` Edgecombe, Rick P
2024-05-16 23:08 ` Edgecombe, Rick P
2024-05-17 0:37 ` Huang, Kai
2024-05-17 1:51 ` Edgecombe, Rick P
2024-05-17 4:26 ` Huang, Kai
2024-05-17 21:12 ` Edgecombe, Rick P
2024-05-15 0:59 ` [PATCH 05/16] KVM: Add member to struct kvm_gfn_range for target alias Rick Edgecombe
2024-05-17 20:58 ` Edgecombe, Rick P
2024-05-15 0:59 ` [PATCH 06/16] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role Rick Edgecombe
2024-05-15 0:59 ` [PATCH 07/16] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page Rick Edgecombe
2024-05-15 0:59 ` [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is called for TDX Rick Edgecombe
2024-05-15 13:27 ` Huang, Kai
2024-05-15 15:22 ` Edgecombe, Rick P
2024-05-15 23:14 ` Huang, Kai
2024-05-15 15:34 ` Sean Christopherson
2024-05-15 15:49 ` Edgecombe, Rick P
2024-05-15 15:56 ` Edgecombe, Rick P
2024-05-15 16:02 ` Sean Christopherson
2024-05-15 16:12 ` Edgecombe, Rick P
2024-05-15 18:09 ` Sean Christopherson
2024-05-15 18:22 ` Edgecombe, Rick P
2024-05-15 19:48 ` Sean Christopherson
2024-05-15 20:32 ` Edgecombe, Rick P
2024-05-15 23:26 ` Sean Christopherson
2024-05-15 16:22 ` Isaku Yamahata
2024-05-15 22:17 ` Huang, Kai
2024-05-15 23:14 ` Edgecombe, Rick P
2024-05-15 23:38 ` Huang, Kai
2024-05-16 0:13 ` Edgecombe, Rick P
2024-05-16 0:27 ` Isaku Yamahata
2024-05-16 1:11 ` Huang, Kai
2024-05-16 0:15 ` Isaku Yamahata
2024-05-16 0:52 ` Edgecombe, Rick P
2024-05-16 1:21 ` Huang, Kai
2024-05-16 17:27 ` Isaku Yamahata
2024-05-16 21:46 ` Edgecombe, Rick P
2024-05-16 22:23 ` Huang, Kai
2024-05-16 22:38 ` Edgecombe, Rick P
2024-05-16 23:16 ` Huang, Kai
2024-05-15 0:59 ` [PATCH 09/16] KVM: x86/mmu: Make kvm_tdp_mmu_alloc_root() return void Rick Edgecombe
2024-05-15 0:59 ` [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU Rick Edgecombe
2024-05-15 17:35 ` Isaku Yamahata
2024-05-15 18:00 ` Edgecombe, Rick P
2024-05-16 0:52 ` Huang, Kai
2024-05-16 1:27 ` Edgecombe, Rick P
2024-05-16 2:07 ` Huang, Kai
2024-05-16 2:57 ` Edgecombe, Rick P
2024-05-16 13:04 ` Huang, Kai
2024-05-16 16:36 ` Edgecombe, Rick P
2024-05-16 19:42 ` Isaku Yamahata
2024-05-17 2:35 ` Edgecombe, Rick P
2024-05-17 9:03 ` Isaku Yamahata
2024-05-17 18:16 ` Edgecombe, Rick P
2024-05-17 19:16 ` Isaku Yamahata
2024-05-20 23:32 ` Isaku Yamahata
2024-05-21 15:07 ` Edgecombe, Rick P
2024-05-21 16:15 ` Isaku Yamahata
2024-05-22 22:34 ` Isaku Yamahata
2024-05-22 23:09 ` Edgecombe, Rick P
2024-05-22 23:47 ` Isaku Yamahata
2024-05-22 23:50 ` Edgecombe, Rick P
2024-05-23 0:01 ` Isaku Yamahata
2024-05-23 18:27 ` Edgecombe, Rick P
2024-05-24 7:55 ` Isaku Yamahata
2024-05-28 16:27 ` Edgecombe, Rick P
2024-05-28 17:47 ` Paolo Bonzini
2024-05-29 2:13 ` Edgecombe, Rick P
2024-05-29 7:25 ` Paolo Bonzini
2024-05-31 14:11 ` Isaku Yamahata
2024-05-28 17:43 ` Paolo Bonzini
2024-05-28 17:16 ` Paolo Bonzini
2024-05-28 18:29 ` Edgecombe, Rick P
2024-05-29 1:06 ` Isaku Yamahata
2024-05-29 1:51 ` Edgecombe, Rick P
2024-05-17 2:36 ` Huang, Kai
2024-05-17 8:14 ` Isaku Yamahata
2024-05-18 5:42 ` Huang, Kai
2024-05-18 15:41 ` Edgecombe, Rick P
2024-05-20 10:38 ` Huang, Kai
2024-05-20 18:58 ` Isaku Yamahata
2024-05-20 19:02 ` Edgecombe, Rick P
2024-05-20 23:39 ` Edgecombe, Rick P
2024-05-21 2:25 ` Isaku Yamahata
2024-05-21 2:57 ` Edgecombe, Rick P
2024-05-20 22:34 ` Huang, Kai
2024-05-16 1:48 ` Isaku Yamahata
2024-05-16 2:00 ` Edgecombe, Rick P
2024-05-16 2:10 ` Huang, Kai
2024-05-28 16:59 ` Paolo Bonzini
2024-05-16 17:10 ` Isaku Yamahata
2024-05-23 23:14 ` Edgecombe, Rick P
2024-05-24 8:20 ` Isaku Yamahata
2024-05-28 21:48 ` Edgecombe, Rick P
2024-05-29 1:16 ` Isaku Yamahata
2024-05-29 1:50 ` Edgecombe, Rick P
2024-05-29 2:20 ` Isaku Yamahata
2024-05-29 2:29 ` Edgecombe, Rick P
2024-05-28 20:54 ` Edgecombe, Rick P
2024-05-29 1:24 ` Isaku Yamahata
2024-05-28 23:06 ` Edgecombe, Rick P
2024-05-29 1:57 ` Isaku Yamahata
2024-05-29 2:13 ` Edgecombe, Rick P
2024-05-29 16:55 ` Isaku Yamahata
2024-05-15 0:59 ` [PATCH 11/16] KVM: x86/tdp_mmu: Extract root invalid check from tdx_mmu_next_root() Rick Edgecombe
2024-05-15 0:59 ` [PATCH 12/16] KVM: x86/tdp_mmu: Introduce KVM MMU root types to specify page table type Rick Edgecombe
2024-05-15 0:59 ` [PATCH 13/16] KVM: x86/tdp_mmu: Introduce shared, private KVM MMU root types Rick Edgecombe
2024-05-15 0:59 ` [PATCH 14/16] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots() Rick Edgecombe
2024-05-15 0:59 ` [PATCH 15/16] KVM: x86/tdp_mmu: Make mmu notifier callbacks to check kvm_process Rick Edgecombe
2024-05-15 0:59 ` [PATCH 16/16] KVM: x86/tdp_mmu: Invalidate correct roots Rick Edgecombe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240515180324.GF168153@ls.amr.corp.intel.com \
--to=isaku.yamahata@intel.com \
--cc=dmatlack@google.com \
--cc=erdemaktas@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=sagis@google.com \
--cc=seanjc@google.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox