public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	dave.hansen@linux.intel.com,  rick.p.edgecombe@intel.com,
	kai.huang@intel.com, adrian.hunter@intel.com,
	 reinette.chatre@intel.com, xiaoyao.li@intel.com,
	tony.lindgren@intel.com,  binbin.wu@linux.intel.com,
	dmatlack@google.com, isaku.yamahata@intel.com,
	 isaku.yamahata@gmail.com, nik.borisov@suse.com,
	linux-kernel@vger.kernel.org,  x86@kernel.org
Subject: Re: [RFC PATCH 2/2] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal
Date: Wed, 18 Dec 2024 08:10:48 -0800	[thread overview]
Message-ID: <Z2L0CPLKg9dh6imZ@google.com> (raw)
In-Reply-To: <Z2JhXfA14UjC1/fs@yzhao56-desk.sh.intel.com>

On Wed, Dec 18, 2024, Yan Zhao wrote:
> On Tue, Dec 17, 2024 at 03:29:03PM -0800, Sean Christopherson wrote:
> > On Thu, Nov 21, 2024, Yan Zhao wrote:
> > > For tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove(),
> > > 
> > > - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only once.
> > > - During the retry, kick off all vCPUs and prevent any vCPU from entering
> > >   to avoid potential contentions.
> > > 
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  2 ++
> > >  arch/x86/kvm/vmx/tdx.c          | 49 +++++++++++++++++++++++++--------
> > >  2 files changed, 40 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 521c7cf725bc..bb7592110337 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -123,6 +123,8 @@
> > >  #define KVM_REQ_HV_TLB_FLUSH \
> > >  	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > >  #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
> > > +#define KVM_REQ_NO_VCPU_ENTER_INPROGRESS \
> > > +	KVM_ARCH_REQ_FLAGS(33, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > >  
> > >  #define CR0_RESERVED_BITS                                               \
> > >  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index 60d9e9d050ad..ed6b41bbcec6 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -311,6 +311,20 @@ static void tdx_clear_page(unsigned long page_pa)
> > >  	__mb();
> > >  }
> > >  
> > > +static void tdx_no_vcpus_enter_start(struct kvm *kvm)
> > > +{
> > > +	kvm_make_all_cpus_request(kvm, KVM_REQ_NO_VCPU_ENTER_INPROGRESS);
> > 
> > I vote for making this a common request with a more succient name, e.g. KVM_REQ_PAUSE.
> KVM_REQ_PAUSE looks good to me. But will the "pause" cause any confusion with
> the guest's pause state?

Maybe?

> > And with appropriate helpers in common code.  I could have sworn I floated this
> > idea in the past for something else, but apparently not.  The only thing I can
> Yes, you suggested me to implement it via a request, similar to
> KVM_REQ_MCLOCK_INPROGRESS. [1].
> (I didn't add your suggested-by tag in this patch because it's just an RFC).
> 
> [1] https://lore.kernel.org/kvm/ZuR09EqzU1WbQYGd@google.com/
> 
> > find is an old arm64 version for pausing vCPUs to emulated.  Hmm, maybe I was
> > thinking of KVM_REQ_OUTSIDE_GUEST_MODE?
> KVM_REQ_OUTSIDE_GUEST_MODE just kicks vCPUs outside guest mode, it does not set
> a bit in vcpu->requests to prevent later vCPUs entering.

Yeah, I was mostly just talking to myself. :-)

> > Anyways, I don't see any reason to make this an arch specific request.
> After making it non-arch specific, probably we need an atomic counter for the
> start/stop requests in the common helpers. So I just made it TDX-specific to
> keep it simple in the RFC.

Oh, right.  I didn't consider the complications with multiple users.  Hrm.

Actually, this doesn't need to be a request.  KVM_REQ_OUTSIDE_GUEST_MODE will
forces vCPUs to exit, at which point tdx_vcpu_run() can return immediately with
EXIT_FASTPATH_EXIT_HANDLED, which is all that kvm_vcpu_exit_request() does.  E.g.
have the zap side set wait_for_sept_zap before blasting the request to all vCPU,
and then in tdx_vcpu_run():

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b49dcf32206b..508ad6462e6d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -921,6 +921,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
                return EXIT_FASTPATH_NONE;
        }
 
+       if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap)))
+               return EXIT_FASTPATH_EXIT_HANDLED;
+
        trace_kvm_entry(vcpu, force_immediate_exit);
 
        if (pi_test_on(&tdx->pi_desc)) {


Ooh, but there's a subtle flaw with that approach.  Unlike kvm_vcpu_exit_request(),
the above check would obviously happen before entry to the guest, which means that,
in theory, KVM needs to goto cancel_injection to re-request req_immediate_exit and
cancel injection:

	if (req_immediate_exit)
		kvm_make_request(KVM_REQ_EVENT, vcpu);
	kvm_x86_call(cancel_injection)(vcpu);

But!  This actually an opportunity to harden KVM.  Because the TDX module doesn't
guarantee entry, it's already possible for KVM to _think_ it completely entry to
the guest without actually having done so.  It just happens to work because KVM
never needs to force an immediate exit for TDX, and can't do direct injection,
i.e. can "safely" skip the cancel_injection path.

So, I think can and should go with the above suggestion, but also add a WARN on
req_immediate_exit being set, because TDX ignores it entirely.

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b49dcf32206b..e23cd8231144 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -914,6 +914,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
        struct vcpu_tdx *tdx = to_tdx(vcpu);
        struct vcpu_vt *vt = to_tdx(vcpu);
 
+       /* <comment goes here> */
+       WARN_ON_ONCE(force_immediate_exit);
+
        /* TDX exit handle takes care of this error case. */
        if (unlikely(tdx->state != VCPU_TD_STATE_INITIALIZED)) {
                tdx->vp_enter_ret = TDX_SW_ERROR;
@@ -921,6 +924,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
                return EXIT_FASTPATH_NONE;
        }
 
+       if (unlikely(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap))
+               return EXIT_FASTPATH_EXIT_HANDLED;
+
        trace_kvm_entry(vcpu, force_immediate_exit);
 
        if (pi_test_on(&tdx->pi_desc)) {

  reply	other threads:[~2024-12-18 16:10 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-12  7:33 [PATCH v2 00/24] TDX MMU Part 2 Yan Zhao
2024-11-12  7:34 ` [PATCH v2 01/24] KVM: x86/mmu: Implement memslot deletion for TDX Yan Zhao
2024-11-12  7:34 ` [PATCH v2 02/24] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Yan Zhao
2024-11-12  7:35 ` [PATCH v2 03/24] KVM: x86/mmu: Do not enable page track for TD guest Yan Zhao
2024-11-12  7:35 ` [PATCH v2 04/24] KVM: VMX: Split out guts of EPT violation to common/exposed function Yan Zhao
2024-11-12  7:35 ` [PATCH v2 05/24] KVM: VMX: Teach EPT violation helper about private mem Yan Zhao
2024-11-12  7:35 ` [PATCH v2 06/24] KVM: TDX: Add accessors VMX VMCS helpers Yan Zhao
2024-11-12  7:36 ` [PATCH v2 07/24] KVM: TDX: Add load_mmu_pgd method for TDX Yan Zhao
2024-11-12  7:36 ` [PATCH v2 08/24] KVM: TDX: Set gfn_direct_bits to shared bit Yan Zhao
2024-11-12  7:36 ` [PATCH v2 09/24] x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_sept_add() to add SEPT pages Yan Zhao
2024-11-12  7:36 ` [PATCH v2 10/24] x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages Yan Zhao
2024-11-12  7:36 ` [PATCH v2 11/24] x86/virt/tdx: Add SEAMCALL wrappers to manage TDX TLB tracking Yan Zhao
2024-11-12  7:36 ` [PATCH v2 12/24] x86/virt/tdx: Add SEAMCALL wrappers to remove a TD private page Yan Zhao
2024-11-12  7:37 ` [PATCH v2 13/24] x86/virt/tdx: Add SEAMCALL wrappers for TD measurement of initial contents Yan Zhao
2024-11-12  7:37 ` [PATCH v2 14/24] KVM: TDX: Require TDP MMU and mmio caching for TDX Yan Zhao
2024-11-12  7:37 ` [PATCH v2 15/24] KVM: x86/mmu: Add setter for shadow_mmio_value Yan Zhao
2024-11-12  7:37 ` [PATCH v2 16/24] KVM: TDX: Set per-VM shadow_mmio_value to 0 Yan Zhao
2024-11-12  7:37 ` [PATCH v2 17/24] KVM: TDX: Handle TLB tracking for TDX Yan Zhao
2024-11-12  7:38 ` [PATCH v2 18/24] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Yan Zhao
2024-11-12  7:38 ` [PATCH v2 19/24] KVM: TDX: Implement hook to get max mapping level of private pages Yan Zhao
2024-11-12  7:38 ` [PATCH v2 20/24] KVM: x86/mmu: Export kvm_tdp_map_page() Yan Zhao
2024-11-12  7:38 ` [PATCH v2 21/24] KVM: TDX: Add an ioctl to create initial guest memory Yan Zhao
2024-11-27 18:08   ` Nikolay Borisov
2024-11-28  2:20     ` Yan Zhao
2024-11-12  7:38 ` [PATCH v2 22/24] KVM: TDX: Finalize VM initialization Yan Zhao
2024-12-24 14:31   ` Paolo Bonzini
2025-01-07  7:44     ` Yan Zhao
2025-01-07 14:02       ` Paolo Bonzini
2025-01-08  2:18         ` Yan Zhao
2024-11-12  7:38 ` [PATCH v2 23/24] KVM: TDX: Handle vCPU dissociation Yan Zhao
2024-11-12  7:39 ` [PATCH v2 24/24] [HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Yan Zhao
2024-11-21 11:51 ` [RFC PATCH 0/2] SEPT SEAMCALL retry proposal Yan Zhao
2024-11-21 11:56   ` [RFC PATCH 1/2] KVM: TDX: Retry in TDX when installing TD private/sept pages Yan Zhao
2024-11-21 11:57   ` [RFC PATCH 2/2] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal Yan Zhao
2024-11-26  0:47     ` Edgecombe, Rick P
2024-11-26  6:39       ` Yan Zhao
2024-12-17 23:29     ` Sean Christopherson
2024-12-18  5:45       ` Yan Zhao
2024-12-18 16:10         ` Sean Christopherson [this message]
2024-12-19  1:52           ` Yan Zhao
2024-12-19  2:39             ` Sean Christopherson
2024-12-19  3:03               ` Yan Zhao
2024-11-26  0:46   ` [RFC PATCH 0/2] SEPT SEAMCALL retry proposal Edgecombe, Rick P
2024-11-26  6:24     ` Yan Zhao
2024-12-13  1:01   ` Yan Zhao
2024-12-17 17:00   ` Edgecombe, Rick P
2024-12-17 23:18     ` Sean Christopherson
2024-12-10 18:21 ` [PATCH v2 00/24] TDX MMU Part 2 Paolo Bonzini
2024-12-24 14:33 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2L0CPLKg9dh6imZ@google.com \
    --to=seanjc@google.com \
    --cc=adrian.hunter@intel.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=tony.lindgren@intel.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox