From: Isaku Yamahata <isaku.yamahata@intel.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "Zhao, Yan Y" <yan.y.zhao@intel.com>,
"Gao, Chao" <chao.gao@intel.com>,
"seanjc@google.com" <seanjc@google.com>,
"Huang, Kai" <kai.huang@intel.com>,
"sagis@google.com" <sagis@google.com>,
"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Aktas, Erdem" <erdemaktas@google.com>,
"dmatlack@google.com" <dmatlack@google.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
isaku.yamahata@linux.intel.com
Subject: Re: [PATCH v3 17/17] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots()
Date: Thu, 18 Jul 2024 08:28:23 -0700 [thread overview]
Message-ID: <20240718152823.GG1900928@ls.amr.corp.intel.com> (raw)
In-Reply-To: <0e026a5d31e7e3a3f0eb07a2b75cb4f659d014d5.camel@intel.com>
On Fri, Jun 21, 2024 at 07:08:22PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:
> On Fri, 2024-06-21 at 15:10 +0800, Yan Zhao wrote:
> > On Wed, Jun 19, 2024 at 03:36:14PM -0700, Rick Edgecombe wrote:
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index 630e6b6d4bf2..a1ab67a4f41f 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -37,7 +37,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> > > * for zapping and thus puts the TDP MMU's reference to each root,
> > > i.e.
> > > * ultimately frees all roots.
> > > */
> > > - kvm_tdp_mmu_invalidate_all_roots(kvm);
> > > + kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS);
> > all roots (mirror + direct) are invalidated here.
>
> Right.
> >
> > > kvm_tdp_mmu_zap_invalidated_roots(kvm);
> > kvm_tdp_mmu_zap_invalidated_roots() will zap invalidated mirror root with
> > mmu_lock held for read, which should trigger KVM_BUG_ON() in
> > __tdp_mmu_set_spte_atomic(), which assumes "atomic zapping don't operate on
> > mirror roots".
> >
> > But up to now, the KVM_BUG_ON() is not triggered because
> > kvm_mmu_notifier_release() is called earlier than kvm_destroy_vm() (as in
> > below
> > call trace), and kvm_arch_flush_shadow_all() in kvm_mmu_notifier_release() has
> > zapped all mirror SPTEs before kvm_mmu_uninit_vm() called in kvm_destroy_vm().
> >
> >
> > kvm_mmu_notifier_release
> > kvm_flush_shadow_all
> > kvm_arch_flush_shadow_all
> > static_call_cond(kvm_x86_flush_shadow_all_private)(kvm);
> > kvm_mmu_zap_all ==>hold mmu_lock for write
> > kvm_tdp_mmu_zap_all ==>zap KVM_ALL_ROOTS with mmu_lock held for write
> >
> > kvm_destroy_vm
> > kvm_arch_destroy_vm
> > kvm_mmu_uninit_vm
> > kvm_mmu_uninit_tdp_mmu
> > kvm_tdp_mmu_invalidate_roots ==>invalid all KVM_VALID_ROOTS
> > kvm_tdp_mmu_zap_invalidated_roots ==> zap all roots with mmu_lock held
> > for read
> >
> >
> > A question is that kvm_mmu_notifier_release(), as a callback of primary MMU
> > notifier, why does it zap mirrored tdp when all other callbacks are with
> > KVM_FILTER_SHARED?
> >
> > Could we just zap all KVM_DIRECT_ROOTS (valid | invalid) in
> > kvm_mmu_notifier_release() and move mirrord tdp related stuffs from
> > kvm_arch_flush_shadow_all() to kvm_mmu_uninit_tdp_mmu(), ensuring mmu_lock is
> > held for write?
>
> Sigh, thanks for flagging this. I agree it seems weird to free private memory
> from an MMU notifier callback. I also found this old thread where Sean NAKed the
> current approach (free hkid in mmu release):
> https://lore.kernel.org/kvm/ZN+1QHGa6ltpQxZn@google.com/#t
>
> One challenge is that flush_shadow_all_private() needs to be done before
> kvm_destroy_vcpus(), where it gets into tdx_vcpu_free(). So kvm_mmu_uninit_vm()
> is too late. Perhaps this is why it was shoved into mmu notifier release (which
> happens long before as you noted). Isaku, do you recall any other reasons?
Although it's a bit late, it's for record.
It's to optimize the destruction Secure-EPT. Free HKID early and destruct
Secure-EPT by TDH.PHYMEM.PAGE.RECLAIM(). QEMU doesn't close any KVM file
descriptors on exit. (gmem fd references KVM VM fd. so vm destruction happens
after all gmem fds are closed. Closing gmem fd causes secure-EPT zapping befure
releasing HKID.)
Because we're ignoring such optimization for now, we can simply defer releasing
HKID following Seans's call.
> But static_call_cond(kvm_x86_vm_destroy) happens before kvm_destroy_vcpus, so we
> could maybe actually just do the tdx_mmu_release_hkid() part there. Then drop
> the flush_shadow_all_private x86 op. See the (not thoroughly checked) diff at
> the bottom of this mail.
Yep, we can release HKID at vm destruction with potential too slow zapping of
Secure-EPT. The following change basically looks good to me.
(The callback for Secure-EPT can be simplified.)
Thanks,
> But most of what is being discussed is in future patches where it starts to get
> into the TDX module interaction. So I wonder if we should drop this patch 17
> from "part 1" and include it with the next series so it can all be considered
> together.
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-
> ops.h
> index 2adf36b74910..3927731aa947 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -23,7 +23,6 @@ KVM_X86_OP(has_emulated_msr)
> KVM_X86_OP(vcpu_after_set_cpuid)
> KVM_X86_OP_OPTIONAL(vm_enable_cap)
> KVM_X86_OP(vm_init)
> -KVM_X86_OP_OPTIONAL(flush_shadow_all_private)
> KVM_X86_OP_OPTIONAL(vm_destroy)
> KVM_X86_OP_OPTIONAL(vm_free)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8a72e5873808..8b2b79b39d0f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1647,7 +1647,6 @@ struct kvm_x86_ops {
> unsigned int vm_size;
> int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
> int (*vm_init)(struct kvm *kvm);
> - void (*flush_shadow_all_private)(struct kvm *kvm);
> void (*vm_destroy)(struct kvm *kvm);
> void (*vm_free)(struct kvm *kvm);
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e1299eb03e63..4deeeac14324 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6446,7 +6446,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> * lead to use-after-free.
> */
> if (tdp_mmu_enabled)
> - kvm_tdp_mmu_zap_invalidated_roots(kvm);
> + kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
> }
>
> static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> @@ -6977,13 +6977,6 @@ static void kvm_mmu_zap_all(struct kvm *kvm)
>
> void kvm_arch_flush_shadow_all(struct kvm *kvm)
> {
> - /*
> - * kvm_mmu_zap_all() zaps both private and shared page tables. Before
> - * tearing down private page tables, TDX requires some TD resources to
> - * be destroyed (i.e. keyID must have been reclaimed, etc). Invoke
> - * kvm_x86_flush_shadow_all_private() for this.
> - */
> - static_call_cond(kvm_x86_flush_shadow_all_private)(kvm);
> kvm_mmu_zap_all(kvm);
> }
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 68dfcdb46ab7..9e8b012aa8cc 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -38,7 +38,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> * ultimately frees all roots.
> */
> kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS);
> - kvm_tdp_mmu_zap_invalidated_roots(kvm);
> + kvm_tdp_mmu_zap_invalidated_roots(kvm, false);
>
> WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
> WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
> @@ -1057,7 +1057,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
> * KVM_RUN is unreachable, i.e. no vCPUs will ever service the request.
> */
> lockdep_assert_held_write(&kvm->mmu_lock);
> - for_each_tdp_mmu_root_yield_safe(kvm, root)
> + __for_each_tdp_mmu_root_yield_safe(kvm, root, -1, KVM_DIRECT_ROOTS)
> tdp_mmu_zap_root(kvm, root, false);
> }
>
> @@ -1065,11 +1065,14 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
> * Zap all invalidated roots to ensure all SPTEs are dropped before the "fast
> * zap" completes.
> */
> -void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
> +void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm, bool shared)
> {
> struct kvm_mmu_page *root;
>
> - read_lock(&kvm->mmu_lock);
> + if (shared)
> + read_lock(&kvm->mmu_lock);
> + else
> + write_lock(&kvm->mmu_lock);
>
> for_each_tdp_mmu_root_yield_safe(kvm, root) {
> if (!root->tdp_mmu_scheduled_root_to_zap)
> @@ -1087,7 +1090,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
> * that may be zapped, as such entries are associated with the
> * ASID on both VMX and SVM.
> */
> - tdp_mmu_zap_root(kvm, root, true);
> + tdp_mmu_zap_root(kvm, root, shared);
>
> /*
> * The referenced needs to be put *after* zapping the root, as
> @@ -1097,7 +1100,10 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
> kvm_tdp_mmu_put_root(kvm, root);
> }
>
> - read_unlock(&kvm->mmu_lock);
> + if (shared)
> + read_unlock(&kvm->mmu_lock);
> + else
> + write_unlock(&kvm->mmu_lock);
> }
>
> /*
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 56741d31048a..7927fa4a96e0 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -68,7 +68,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page
> *sp);
> void kvm_tdp_mmu_zap_all(struct kvm *kvm);
> void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm,
> enum kvm_tdp_mmu_root_types root_types);
> -void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
> +void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm, bool shared);
>
> int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index b6828e35eb17..3f9bfcd3e152 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -98,16 +98,12 @@ static int vt_vm_init(struct kvm *kvm)
> return vmx_vm_init(kvm);
> }
>
> -static void vt_flush_shadow_all_private(struct kvm *kvm)
> -{
> - if (is_td(kvm))
> - tdx_mmu_release_hkid(kvm);
> -}
> -
> static void vt_vm_destroy(struct kvm *kvm)
> {
> - if (is_td(kvm))
> + if (is_td(kvm)) {
> + tdx_mmu_release_hkid(kvm);
> return;
> + }
>
> vmx_vm_destroy(kvm);
> }
> @@ -980,7 +976,6 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .vm_size = sizeof(struct kvm_vmx),
> .vm_enable_cap = vt_vm_enable_cap,
> .vm_init = vt_vm_init,
> - .flush_shadow_all_private = vt_flush_shadow_all_private,
> .vm_destroy = vt_vm_destroy,
> .vm_free = vt_vm_free,
>
>
--
Isaku Yamahata <isaku.yamahata@intel.com>
next prev parent reply other threads:[~2024-07-18 15:28 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-19 22:35 [PATCH v3 00/17] TDX MMU prep series part 1 Rick Edgecombe
2024-06-19 22:35 ` [PATCH v3 01/17] KVM: x86/tdp_mmu: Rename REMOVED_SPTE to FROZEN_SPTE Rick Edgecombe
2024-06-19 22:35 ` [PATCH v3 02/17] KVM: Add member to struct kvm_gfn_range for target alias Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 03/17] KVM: x86: Add a VM type define for TDX Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 04/17] KVM: x86/mmu: Add an external pointer to struct kvm_mmu_page Rick Edgecombe
2024-07-03 7:03 ` Yan Zhao
2024-06-19 22:36 ` [PATCH v3 05/17] KVM: x86/mmu: Add an is_mirror member for union kvm_mmu_page_role Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 06/17] KVM: x86/mmu: Make kvm_tdp_mmu_alloc_root() return void Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 07/17] KVM: x86/tdp_mmu: Take struct kvm in iter loops Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 08/17] KVM: x86/tdp_mmu: Take a GFN in kvm_tdp_mmu_fast_pf_get_last_sptep() Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 09/17] KVM: x86/mmu: Support GFN direct bits Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 10/17] KVM: x86/tdp_mmu: Extract root invalid check from tdx_mmu_next_root() Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 11/17] KVM: x86/tdp_mmu: Introduce KVM MMU root types to specify page table type Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 12/17] KVM: x86/tdp_mmu: Take root in tdp_mmu_for_each_pte() Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 13/17] KVM: x86/tdp_mmu: Support mirror root for TDP MMU Rick Edgecombe
2024-06-24 8:30 ` Yan Zhao
2024-06-25 0:51 ` Edgecombe, Rick P
2024-06-25 5:43 ` Yan Zhao
2024-06-25 20:33 ` Edgecombe, Rick P
2024-06-26 5:05 ` Yan Zhao
2024-07-03 19:40 ` Edgecombe, Rick P
2024-07-04 8:09 ` Yan Zhao
2024-07-09 22:36 ` Edgecombe, Rick P
2024-07-04 8:51 ` Yan Zhao
2024-07-09 22:38 ` Edgecombe, Rick P
2024-07-11 23:54 ` Edgecombe, Rick P
2024-07-12 1:42 ` Yan Zhao
2024-06-19 22:36 ` [PATCH v3 14/17] KVM: x86/tdp_mmu: Propagate attr_filter to MMU notifier callbacks Rick Edgecombe
2024-06-19 22:36 ` [PATCH v3 15/17] KVM: x86/tdp_mmu: Propagate building mirror page tables Rick Edgecombe
2024-06-20 5:15 ` Binbin Wu
2024-06-24 23:52 ` Edgecombe, Rick P
2024-06-19 22:36 ` [PATCH v3 16/17] KVM: x86/tdp_mmu: Propagate tearing down " Rick Edgecombe
2024-06-20 8:44 ` Binbin Wu
2024-06-24 23:55 ` Edgecombe, Rick P
2024-06-19 22:36 ` [PATCH v3 17/17] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots() Rick Edgecombe
2024-06-21 7:10 ` Yan Zhao
2024-06-21 19:08 ` Edgecombe, Rick P
2024-06-24 8:29 ` Yan Zhao
2024-06-24 23:15 ` Edgecombe, Rick P
2024-06-25 6:14 ` Yan Zhao
2024-06-25 20:56 ` Edgecombe, Rick P
2024-06-26 2:25 ` Yan Zhao
2024-07-03 20:00 ` Edgecombe, Rick P
2024-07-05 1:16 ` Yan Zhao
2024-07-09 22:52 ` Edgecombe, Rick P
2024-07-18 15:28 ` Isaku Yamahata [this message]
2024-07-18 15:55 ` Edgecombe, Rick P
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240718152823.GG1900928@ls.amr.corp.intel.com \
--to=isaku.yamahata@intel.com \
--cc=chao.gao@intel.com \
--cc=dmatlack@google.com \
--cc=erdemaktas@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=isaku.yamahata@linux.intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=sagis@google.com \
--cc=seanjc@google.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox