From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
Yuan Yao <yuan.yao@intel.com>, Kai Huang <kai.huang@intel.com>,
"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"dmatlack@google.com" <dmatlack@google.com>,
"nik.borisov@suse.com" <nik.borisov@suse.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Tue, 8 Oct 2024 07:51:13 -0700 [thread overview]
Message-ID: <ZwVG4bQ4g5Tm2jrt@google.com> (raw)
In-Reply-To: <ZvPrqMj1BWrkkwqN@yzhao56-desk.sh.intel.com>
On Wed, Sep 25, 2024, Yan Zhao wrote:
> On Sat, Sep 14, 2024 at 05:27:32PM +0800, Yan Zhao wrote:
> > On Fri, Sep 13, 2024 at 10:23:00AM -0700, Sean Christopherson wrote:
> > > On Fri, Sep 13, 2024, Yan Zhao wrote:
> > > > This is a lock status report of TDX module for current SEAMCALL retry issue
> > > > based on code in TDX module public repo https://github.com/intel/tdx-module.git
> > > > branch TDX_1.5.05.
> > > >
> > > > TL;DR:
> > > > - tdh_mem_track() can contend with tdh_vp_enter().
> > > > - tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.
> > >
> > > The zero-step logic seems to be the most problematic. E.g. if KVM is trying to
> > > install a page on behalf of two vCPUs, and KVM resumes the guest if it encounters
> > > a FROZEN_SPTE when building the non-leaf SPTEs, then one of the vCPUs could
> > > trigger the zero-step mitigation if the vCPU that "wins" and gets delayed for
> > > whatever reason.
> > >
> > > Since FROZEN_SPTE is essentially bit-spinlock with a reaaaaaly slow slow-path,
> > > what if instead of resuming the guest if a page fault hits FROZEN_SPTE, KVM retries
> > > the fault "locally", i.e. _without_ redoing tdh_vp_enter() to see if the vCPU still
> > > hits the fault?
> > >
> > > For non-TDX, resuming the guest and letting the vCPU retry the instruction is
> > > desirable because in many cases, the winning task will install a valid mapping
> > > before KVM can re-run the vCPU, i.e. the fault will be fixed before the
> > > instruction is re-executed. In the happy case, that provides optimal performance
> > > as KVM doesn't introduce any extra delay/latency.
> > >
> > > But for TDX, the math is different as the cost of a re-hitting a fault is much,
> > > much higher, especially in light of the zero-step issues.
> > >
> > > E.g. if the TDP MMU returns a unique error code for the frozen case, and
> > > kvm_mmu_page_fault() is modified to return the raw return code instead of '1',
> > > then the TDX EPT violation path can safely retry locally, similar to the do-while
> > > loop in kvm_tdp_map_page().
> > >
> > > The only part I don't like about this idea is having two "retry" return values,
> > > which creates the potential for bugs due to checking one but not the other.
> > >
> > > Hmm, that could be avoided by passing a bool pointer as an out-param to communicate
> > > to the TDX S-EPT fault handler that the SPTE is frozen. I think I like that
> > > option better even though the out-param is a bit gross, because it makes it more
> > > obvious that the "frozen_spte" is a special case that doesn't need attention for
> > > most paths.
> > Good idea.
> > But could we extend it a bit more to allow TDX's EPT violation handler to also
> > retry directly when tdh_mem_sept_add()/tdh_mem_page_aug() returns BUSY?
> I'm asking this because merely avoiding invoking tdh_vp_enter() in vCPUs seeing
> FROZEN_SPTE might not be enough to prevent zero step mitigation.
The goal isn't to make it completely impossible for zero-step to fire, it's to
make it so that _if_ zero-step fires, KVM can report the error to userspace without
having to retry, because KVM _knows_ that advancing past the zero-step isn't
something KVM can solve.
: I'm not worried about any performance hit with zero-step, I'm worried about KVM
: not being able to differentiate between a KVM bug and guest interference. The
: goal with a local retry is to make it so that KVM _never_ triggers zero-step,
: unless there is a bug somewhere. At that point, if zero-step fires, KVM can
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
: report the error to userspace instead of trying to suppress guest activity, and
: potentially from other KVM tasks too.
In other words, for the selftest you crafted, KVM reporting an error to userspace
due to zero-step would be working as intended.
> E.g. in below selftest with a TD configured with pending_ve_disable=N,
> zero step mitigation can be triggered on a vCPU that is stuck in EPT violation
> vm exit for more than 6 times (due to that user space does not do memslot
> conversion correctly).
>
> So, if vCPU A wins the chance to call tdh_mem_page_aug(), the SEAMCALL may
> contend with zero step mitigation code in tdh_vp_enter() in vCPU B stuck
> in EPT violation vm exits.
next prev parent reply other threads:[~2024-10-08 14:51 UTC|newest]
Thread overview: 139+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-04 3:07 [PATCH 00/21] TDX MMU Part 2 Rick Edgecombe
2024-09-04 3:07 ` [PATCH 01/21] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-09-09 13:44 ` Paolo Bonzini
2024-09-09 21:06 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 02/21] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-09-09 13:51 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 03/21] KVM: x86/mmu: Do not enable page track for TD guest Rick Edgecombe
2024-09-09 13:53 ` Paolo Bonzini
2024-09-09 21:07 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 04/21] KVM: VMX: Split out guts of EPT violation to common/exposed function Rick Edgecombe
2024-09-09 13:57 ` Paolo Bonzini
2024-09-09 16:07 ` Sean Christopherson
2024-09-10 7:36 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 05/21] KVM: VMX: Teach EPT violation helper about private mem Rick Edgecombe
2024-09-09 13:59 ` Paolo Bonzini
2024-09-11 8:52 ` Chao Gao
2024-09-11 16:29 ` Edgecombe, Rick P
2024-09-12 0:39 ` Huang, Kai
2024-09-12 13:58 ` Sean Christopherson
2024-09-12 14:43 ` Edgecombe, Rick P
2024-09-12 14:46 ` Paolo Bonzini
2024-09-12 1:19 ` Huang, Kai
2024-09-04 3:07 ` [PATCH 06/21] KVM: TDX: Add accessors VMX VMCS helpers Rick Edgecombe
2024-09-09 14:19 ` Paolo Bonzini
2024-09-09 21:29 ` Edgecombe, Rick P
2024-09-10 10:48 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 07/21] KVM: TDX: Add load_mmu_pgd method for TDX Rick Edgecombe
2024-09-11 2:48 ` Chao Gao
2024-09-11 2:49 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 08/21] KVM: TDX: Set gfn_direct_bits to shared bit Rick Edgecombe
2024-09-09 15:21 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Rick Edgecombe
2024-09-06 1:41 ` Huang, Kai
2024-09-09 20:25 ` Edgecombe, Rick P
2024-09-09 15:25 ` Paolo Bonzini
2024-09-09 20:22 ` Edgecombe, Rick P
2024-09-09 21:11 ` Sean Christopherson
2024-09-09 21:23 ` Sean Christopherson
2024-09-09 22:34 ` Edgecombe, Rick P
2024-09-09 23:58 ` Sean Christopherson
2024-09-10 0:50 ` Edgecombe, Rick P
2024-09-10 1:46 ` Sean Christopherson
2024-09-11 1:17 ` Huang, Kai
2024-09-11 2:48 ` Edgecombe, Rick P
2024-09-11 22:55 ` Huang, Kai
2024-09-10 13:15 ` Paolo Bonzini
2024-09-10 13:57 ` Sean Christopherson
2024-09-10 15:16 ` Paolo Bonzini
2024-09-10 15:57 ` Sean Christopherson
2024-09-10 16:28 ` Edgecombe, Rick P
2024-09-10 17:42 ` Sean Christopherson
2024-09-13 8:36 ` Yan Zhao
2024-09-13 17:23 ` Sean Christopherson
2024-09-13 19:19 ` Edgecombe, Rick P
2024-09-13 22:18 ` Sean Christopherson
2024-09-14 9:27 ` Yan Zhao
2024-09-15 9:53 ` Yan Zhao
2024-09-17 1:31 ` Huang, Kai
2024-09-25 10:53 ` Yan Zhao
2024-10-08 14:51 ` Sean Christopherson [this message]
2024-10-10 5:23 ` Yan Zhao
2024-10-10 17:33 ` Sean Christopherson
2024-10-10 21:53 ` Edgecombe, Rick P
2024-10-11 2:30 ` Yan Zhao
2024-10-14 10:54 ` Huang, Kai
2024-10-14 17:36 ` Edgecombe, Rick P
2024-10-14 23:03 ` Huang, Kai
2024-10-15 1:24 ` Edgecombe, Rick P
2024-10-11 2:06 ` Yan Zhao
2024-10-16 14:13 ` Yan Zhao
2024-09-17 2:11 ` Huang, Kai
2024-09-13 19:19 ` Edgecombe, Rick P
2024-09-14 10:00 ` Yan Zhao
2024-09-04 3:07 ` [PATCH 10/21] KVM: TDX: Require TDP MMU and mmio caching for TDX Rick Edgecombe
2024-09-09 15:26 ` Paolo Bonzini
2024-09-12 0:15 ` Huang, Kai
2024-09-04 3:07 ` [PATCH 11/21] KVM: x86/mmu: Add setter for shadow_mmio_value Rick Edgecombe
2024-09-09 15:33 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 12/21] KVM: TDX: Set per-VM shadow_mmio_value to 0 Rick Edgecombe
2024-09-09 15:33 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 13/21] KVM: TDX: Handle TLB tracking for TDX Rick Edgecombe
2024-09-10 8:16 ` Paolo Bonzini
2024-09-10 23:49 ` Edgecombe, Rick P
2024-10-14 6:34 ` Yan Zhao
2024-09-11 6:25 ` Xu Yilun
2024-09-11 17:28 ` Edgecombe, Rick P
2024-09-12 4:54 ` Yan Zhao
2024-09-12 14:44 ` Edgecombe, Rick P
2024-09-12 7:47 ` Xu Yilun
2024-09-04 3:07 ` [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Rick Edgecombe
2024-09-06 2:10 ` Huang, Kai
2024-09-09 21:03 ` Edgecombe, Rick P
2024-09-10 1:52 ` Yan Zhao
2024-09-10 9:33 ` Paolo Bonzini
2024-09-10 23:58 ` Edgecombe, Rick P
2024-09-11 1:05 ` Yan Zhao
2024-10-30 3:03 ` Binbin Wu
2024-11-04 9:09 ` Yan Zhao
2024-09-04 3:07 ` [PATCH 15/21] KVM: TDX: Implement hook to get max mapping level of private pages Rick Edgecombe
2024-09-10 10:17 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 16/21] KVM: TDX: Premap initial guest memory Rick Edgecombe
2024-09-10 10:24 ` Paolo Bonzini
2024-09-11 0:19 ` Edgecombe, Rick P
2024-09-13 13:33 ` Adrian Hunter
2024-09-13 19:49 ` Edgecombe, Rick P
2024-09-10 10:49 ` Paolo Bonzini
2024-09-11 0:30 ` Edgecombe, Rick P
2024-09-11 10:39 ` Paolo Bonzini
2024-09-11 16:36 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 17/21] KVM: TDX: MTRR: implement get_mt_mask() for TDX Rick Edgecombe
2024-09-10 10:04 ` Paolo Bonzini
2024-09-10 14:05 ` Sean Christopherson
2024-09-04 3:07 ` [PATCH 18/21] KVM: x86/mmu: Export kvm_tdp_map_page() Rick Edgecombe
2024-09-10 10:02 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory Rick Edgecombe
2024-09-04 4:53 ` Yan Zhao
2024-09-04 14:01 ` Edgecombe, Rick P
2024-09-06 16:30 ` Edgecombe, Rick P
2024-09-09 1:29 ` Yan Zhao
2024-09-10 10:13 ` Paolo Bonzini
2024-09-11 0:11 ` Edgecombe, Rick P
2024-09-04 13:56 ` Edgecombe, Rick P
2024-09-10 10:16 ` Paolo Bonzini
2024-09-11 0:12 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 20/21] KVM: TDX: Finalize VM initialization Rick Edgecombe
2024-09-04 15:37 ` Adrian Hunter
2024-09-04 16:09 ` Edgecombe, Rick P
2024-09-10 10:33 ` Paolo Bonzini
2024-09-10 11:15 ` Adrian Hunter
2024-09-10 11:28 ` Paolo Bonzini
2024-09-10 11:31 ` Adrian Hunter
2024-09-10 10:25 ` Paolo Bonzini
2024-09-10 11:54 ` Adrian Hunter
2024-09-04 3:07 ` [PATCH 21/21] KVM: TDX: Handle vCPU dissociation Rick Edgecombe
2024-09-09 15:41 ` Paolo Bonzini
2024-09-09 23:30 ` Edgecombe, Rick P
2024-09-10 10:45 ` Paolo Bonzini
2024-09-11 0:17 ` Edgecombe, Rick P
2024-11-04 9:45 ` Yan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZwVG4bQ4g5Tm2jrt@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nik.borisov@suse.com \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yuan.yao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).