public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Yuan Yao <yuan.yao@intel.com>,  Kai Huang <kai.huang@intel.com>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
	 Yan Y Zhao <yan.y.zhao@intel.com>,
	"dmatlack@google.com" <dmatlack@google.com>,
	 "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"nik.borisov@suse.com" <nik.borisov@suse.com>,
	 "pbonzini@redhat.com" <pbonzini@redhat.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Mon, 9 Sep 2024 16:58:19 -0700	[thread overview]
Message-ID: <Zt-LmzUSyljHGcMO@google.com> (raw)
In-Reply-To: <72ef77d580d2f16f0b04cbb03235109f5bde48dd.camel@intel.com>

On Mon, Sep 09, 2024, Rick P Edgecombe wrote:
> On Mon, 2024-09-09 at 14:23 -0700, Sean Christopherson wrote:
> > > In general, I am _very_ opposed to blindly retrying an SEPT SEAMCALL,
> > > ever.  For its operations, I'm pretty sure the only sane approach is for
> > > KVM to ensure there will be no contention.  And if the TDX module's
> > > single-step protection spuriously kicks in, KVM exits to userspace.  If
> > > the TDX module can't/doesn't/won't communicate that it's mitigating
> > > single-step, e.g. so that KVM can forward the information to userspace,
> > > then that's a TDX module problem to solve.
> > > 
> > > > Per the docs, in general the VMM is supposed to retry SEAMCALLs that
> > > > return TDX_OPERAND_BUSY.
> > > 
> > > IMO, that's terrible advice.  SGX has similar behavior, where the xucode
> > > "module" signals #GP if there's a conflict.  #GP is obviously far, far
> > > worse as it lacks the precision that would help software understand
> > > exactly what went wrong, but I think one of the better decisions we made
> > > with the SGX driver was to have a "zero tolerance" policy where the
> > > driver would _never_ retry due to a potential resource conflict, i.e.
> > > that any conflict in the module would be treated as a kernel bug.
> 
> Thanks for the analysis. The direction seems reasonable to me for this lock in
> particular. We need to do some analysis on how much the existing mmu_lock can
> protects us. 

I would operate under the assumption that it provides SEPT no meaningful protection.
I think I would even go so far as to say that it is a _requirement_ that mmu_lock
does NOT provide the ordering required by SEPT, because I do not want to take on
any risk (due to SEPT constraints) that would limit KVM's ability to do things
while holding mmu_lock for read.

> Maybe sprinkle some asserts for documentation purposes.

Not sure I understand, assert on what?

> For the general case of TDX_OPERAND_BUSY, there might be one wrinkle. The guest
> side operations can take the locks too. From "Base Architecture Specification":
> "
> Host-Side (SEAMCALL) Operation
> ------------------------------
> The host VMM is expected to retry host-side operations that fail with a
> TDX_OPERAND_BUSY status. The host priority mechanism helps guarantee that at
> most after a limited time (the longest guest-side TDX module flow) there will be
> no contention with a guest TD attempting to acquire access to the same resource.
> 
> Lock operations process the HOST_PRIORITY bit as follows:
>    - A SEAMCALL (host-side) function that fails to acquire a lock sets the lock’s
>    HOST_PRIORITY bit and returns a TDX_OPERAND_BUSY status to the host VMM. It is
>    the host VMM’s responsibility to re-attempt the SEAMCALL function until is
>    succeeds; otherwise, the HOST_PRIORITY bit remains set, preventing the guest TD
>    from acquiring the lock.
>    - A SEAMCALL (host-side) function that succeeds to acquire a lock clears the
>    lock’s HOST_PRIORITY bit.

*sigh*

> Guest-Side (TDCALL) Operation
> -----------------------------
> A TDCALL (guest-side) function that attempt to acquire a lock fails if
> HOST_PRIORITY is set to 1; a TDX_OPERAND_BUSY status is returned to the guest.
> The guest is expected to retry the operation.
> 
> Guest-side TDCALL flows that acquire a host priority lock have an upper bound on
> the host-side latency for that lock; once a lock is acquired, the flow either
> releases within a fixed upper time bound, or periodically monitor the
> HOST_PRIORITY flag to see if the host is attempting to acquire the lock.
> "
> 
> So KVM can't fully prevent TDX_OPERAND_BUSY with KVM side locks, because it is
> involved in sorting out contention between the guest as well. We need to double
> check this, but I *think* this HOST_PRIORITY bit doesn't come into play for the
> functionality we need to exercise for base support.
> 
> The thing that makes me nervous about retry based solution is the potential for
> some kind deadlock like pattern. Just to gather your opinion, if there was some
> SEAMCALL contention that couldn't be locked around from KVM, but came with some
> strong well described guarantees, would a retry loop be hard NAK still?

I don't know.  It would depend on what operations can hit BUSY, and what the
alternatives are.  E.g. if we can narrow down the retry paths to a few select
cases where it's (a) expected, (b) unavoidable, and (c) has minimal risk of
deadlock, then maybe that's the least awful option.

What I don't think KVM should do is blindly retry N number of times, because
then there are effectively no rules whatsoever.  E.g. if KVM is tearing down a
VM then KVM should assert on immediate success.  And if KVM is handling a fault
on behalf of a vCPU, then KVM can and should resume the guest and let it retry.
Ugh, but that would likely trigger the annoying "zero-step mitigation" crap.

What does this actually mean in practice?  What's the threshold, is the VM-Enter
error uniquely identifiable, and can KVM rely on HOST_PRIORITY to be set if KVM
runs afoul of the zero-step mitigation?

  After a pre-determined number of such EPT violations occur on the same instruction,
  the TDX module starts tracking the GPAs that caused Secure EPT faults and fails
  further host VMM attempts to enter the TD VCPU unless previously faulting private
  GPAs are properly mapped in the Secure EPT.

If HOST_PRIORITY is set, then one idea would be to resume the guest if there's
SEPT contention on a fault, and then _if_ the zero-step mitigation is triggered,
kick all vCPUs (via IPI) to ensure that the contended SEPT entry is unlocked and
can't be re-locked by the guest.  That would allow KVM to guarantee forward
progress without an arbitrary retry loop in the TDP MMU.

Similarly, if KVM needs to zap a SPTE and hits BUSY, kick all vCPUs to ensure the
one and only retry is guaranteed to succeed.

  reply	other threads:[~2024-09-09 23:58 UTC|newest]

Thread overview: 139+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-04  3:07 [PATCH 00/21] TDX MMU Part 2 Rick Edgecombe
2024-09-04  3:07 ` [PATCH 01/21] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-09-09 13:44   ` Paolo Bonzini
2024-09-09 21:06     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 02/21] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-09-09 13:51   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 03/21] KVM: x86/mmu: Do not enable page track for TD guest Rick Edgecombe
2024-09-09 13:53   ` Paolo Bonzini
2024-09-09 21:07     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 04/21] KVM: VMX: Split out guts of EPT violation to common/exposed function Rick Edgecombe
2024-09-09 13:57   ` Paolo Bonzini
2024-09-09 16:07   ` Sean Christopherson
2024-09-10  7:36     ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 05/21] KVM: VMX: Teach EPT violation helper about private mem Rick Edgecombe
2024-09-09 13:59   ` Paolo Bonzini
2024-09-11  8:52   ` Chao Gao
2024-09-11 16:29     ` Edgecombe, Rick P
2024-09-12  0:39   ` Huang, Kai
2024-09-12 13:58     ` Sean Christopherson
2024-09-12 14:43       ` Edgecombe, Rick P
2024-09-12 14:46         ` Paolo Bonzini
2024-09-12  1:19   ` Huang, Kai
2024-09-04  3:07 ` [PATCH 06/21] KVM: TDX: Add accessors VMX VMCS helpers Rick Edgecombe
2024-09-09 14:19   ` Paolo Bonzini
2024-09-09 21:29     ` Edgecombe, Rick P
2024-09-10 10:48       ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 07/21] KVM: TDX: Add load_mmu_pgd method for TDX Rick Edgecombe
2024-09-11  2:48   ` Chao Gao
2024-09-11  2:49     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 08/21] KVM: TDX: Set gfn_direct_bits to shared bit Rick Edgecombe
2024-09-09 15:21   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Rick Edgecombe
2024-09-06  1:41   ` Huang, Kai
2024-09-09 20:25     ` Edgecombe, Rick P
2024-09-09 15:25   ` Paolo Bonzini
2024-09-09 20:22     ` Edgecombe, Rick P
2024-09-09 21:11       ` Sean Christopherson
2024-09-09 21:23         ` Sean Christopherson
2024-09-09 22:34           ` Edgecombe, Rick P
2024-09-09 23:58             ` Sean Christopherson [this message]
2024-09-10  0:50               ` Edgecombe, Rick P
2024-09-10  1:46                 ` Sean Christopherson
2024-09-11  1:17               ` Huang, Kai
2024-09-11  2:48                 ` Edgecombe, Rick P
2024-09-11 22:55                   ` Huang, Kai
2024-09-10 13:15         ` Paolo Bonzini
2024-09-10 13:57           ` Sean Christopherson
2024-09-10 15:16             ` Paolo Bonzini
2024-09-10 15:57               ` Sean Christopherson
2024-09-10 16:28                 ` Edgecombe, Rick P
2024-09-10 17:42                   ` Sean Christopherson
2024-09-13  8:36                     ` Yan Zhao
2024-09-13 17:23                       ` Sean Christopherson
2024-09-13 19:19                         ` Edgecombe, Rick P
2024-09-13 22:18                           ` Sean Christopherson
2024-09-14  9:27                         ` Yan Zhao
2024-09-15  9:53                           ` Yan Zhao
2024-09-17  1:31                             ` Huang, Kai
2024-09-25 10:53                           ` Yan Zhao
2024-10-08 14:51                             ` Sean Christopherson
2024-10-10  5:23                               ` Yan Zhao
2024-10-10 17:33                                 ` Sean Christopherson
2024-10-10 21:53                                   ` Edgecombe, Rick P
2024-10-11  2:30                                     ` Yan Zhao
2024-10-14 10:54                                     ` Huang, Kai
2024-10-14 17:36                                       ` Edgecombe, Rick P
2024-10-14 23:03                                         ` Huang, Kai
2024-10-15  1:24                                           ` Edgecombe, Rick P
2024-10-11  2:06                                   ` Yan Zhao
2024-10-16 14:13                                   ` Yan Zhao
2024-09-17  2:11                         ` Huang, Kai
2024-09-13 19:19                       ` Edgecombe, Rick P
2024-09-14 10:00                         ` Yan Zhao
2024-09-04  3:07 ` [PATCH 10/21] KVM: TDX: Require TDP MMU and mmio caching for TDX Rick Edgecombe
2024-09-09 15:26   ` Paolo Bonzini
2024-09-12  0:15   ` Huang, Kai
2024-09-04  3:07 ` [PATCH 11/21] KVM: x86/mmu: Add setter for shadow_mmio_value Rick Edgecombe
2024-09-09 15:33   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 12/21] KVM: TDX: Set per-VM shadow_mmio_value to 0 Rick Edgecombe
2024-09-09 15:33   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 13/21] KVM: TDX: Handle TLB tracking for TDX Rick Edgecombe
2024-09-10  8:16   ` Paolo Bonzini
2024-09-10 23:49     ` Edgecombe, Rick P
2024-10-14  6:34     ` Yan Zhao
2024-09-11  6:25   ` Xu Yilun
2024-09-11 17:28     ` Edgecombe, Rick P
2024-09-12  4:54       ` Yan Zhao
2024-09-12 14:44         ` Edgecombe, Rick P
2024-09-12  7:47       ` Xu Yilun
2024-09-04  3:07 ` [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Rick Edgecombe
2024-09-06  2:10   ` Huang, Kai
2024-09-09 21:03     ` Edgecombe, Rick P
2024-09-10  1:52       ` Yan Zhao
2024-09-10  9:33       ` Paolo Bonzini
2024-09-10 23:58         ` Edgecombe, Rick P
2024-09-11  1:05           ` Yan Zhao
2024-10-30  3:03   ` Binbin Wu
2024-11-04  9:09     ` Yan Zhao
2024-09-04  3:07 ` [PATCH 15/21] KVM: TDX: Implement hook to get max mapping level of private pages Rick Edgecombe
2024-09-10 10:17   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 16/21] KVM: TDX: Premap initial guest memory Rick Edgecombe
2024-09-10 10:24   ` Paolo Bonzini
2024-09-11  0:19     ` Edgecombe, Rick P
2024-09-13 13:33       ` Adrian Hunter
2024-09-13 19:49         ` Edgecombe, Rick P
2024-09-10 10:49   ` Paolo Bonzini
2024-09-11  0:30     ` Edgecombe, Rick P
2024-09-11 10:39       ` Paolo Bonzini
2024-09-11 16:36         ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 17/21] KVM: TDX: MTRR: implement get_mt_mask() for TDX Rick Edgecombe
2024-09-10 10:04   ` Paolo Bonzini
2024-09-10 14:05     ` Sean Christopherson
2024-09-04  3:07 ` [PATCH 18/21] KVM: x86/mmu: Export kvm_tdp_map_page() Rick Edgecombe
2024-09-10 10:02   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory Rick Edgecombe
2024-09-04  4:53   ` Yan Zhao
2024-09-04 14:01     ` Edgecombe, Rick P
2024-09-06 16:30       ` Edgecombe, Rick P
2024-09-09  1:29         ` Yan Zhao
2024-09-10 10:13         ` Paolo Bonzini
2024-09-11  0:11           ` Edgecombe, Rick P
2024-09-04 13:56   ` Edgecombe, Rick P
2024-09-10 10:16   ` Paolo Bonzini
2024-09-11  0:12     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 20/21] KVM: TDX: Finalize VM initialization Rick Edgecombe
2024-09-04 15:37   ` Adrian Hunter
2024-09-04 16:09     ` Edgecombe, Rick P
2024-09-10 10:33     ` Paolo Bonzini
2024-09-10 11:15       ` Adrian Hunter
2024-09-10 11:28         ` Paolo Bonzini
2024-09-10 11:31         ` Adrian Hunter
2024-09-10 10:25   ` Paolo Bonzini
2024-09-10 11:54     ` Adrian Hunter
2024-09-04  3:07 ` [PATCH 21/21] KVM: TDX: Handle vCPU dissociation Rick Edgecombe
2024-09-09 15:41   ` Paolo Bonzini
2024-09-09 23:30     ` Edgecombe, Rick P
2024-09-10 10:45   ` Paolo Bonzini
2024-09-11  0:17     ` Edgecombe, Rick P
2024-11-04  9:45     ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zt-LmzUSyljHGcMO@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yuan.yao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox