public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 Yan Y Zhao <yan.y.zhao@intel.com>, Yuan Yao <yuan.yao@intel.com>,
	 "nik.borisov@suse.com" <nik.borisov@suse.com>,
	"dmatlack@google.com" <dmatlack@google.com>,
	 Kai Huang <kai.huang@intel.com>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Tue, 10 Sep 2024 08:57:02 -0700	[thread overview]
Message-ID: <ZuBsTlbrlD6NHyv1@google.com> (raw)
In-Reply-To: <CABgObfayLGyWKERXkU+0gjeUg=Sp3r7GEQU=+13sUMpo36weWg@mail.gmail.com>

On Tue, Sep 10, 2024, Paolo Bonzini wrote:
> On Tue, Sep 10, 2024 at 3:58 PM Sean Christopherson <seanjc@google.com> wrote:
> > On Tue, Sep 10, 2024, Paolo Bonzini wrote:
> > No, because that defeates the purpose of having mmu_lock be a rwlock.
> 
> But if this part of the TDX module is wrapped in a single big
> try_lock, there's no difference in spinning around busy seamcalls, or
> doing spin_lock(&kvm->arch.seamcall_lock). All of them hit contention
> in the same way.  With respect to FROZEN_SPTE...
>
> > > This way we know that "busy" errors must come from the guest and have set
> > > HOST_PRIORITY.
> >
> > We should be able to achieve that without a VM-wide spinlock.  My thought (from
> > v11?) was to effectively use the FROZEN_SPTE bit as a per-SPTE spinlock, i.e. keep
> > it set until the SEAMCALL completes.
> 
> Only if the TDX module returns BUSY per-SPTE (as suggested by 18.1.3,
> which documents that the TDX module returns TDX_OPERAND_BUSY on a
> CMPXCHG failure). If it returns BUSY per-VM, FROZEN_SPTE is not enough
> to prevent contention in the TDX module.

Looking at the TDX module code, things like (UN)BLOCK and REMOVE take a per-VM
lock in write mode, but ADD, AUG, and PROMOTE/DEMOTE take the lock in read mode.

So for the operations that KVM can do in parallel, the locking should effectively
be per-entry.  Because KVM will never throw away an entire S-EPT root, zapping
SPTEs will need to be done while holding mmu_lock for write, i.e. KVM shouldn't
have problems with host tasks competing for the TDX module's VM-wide lock.

> If we want to be a bit more optimistic, let's do something more
> sophisticated, like only take the lock after the first busy reply. But
> the spinlock is the easiest way to completely remove host-induced
> TDX_OPERAND_BUSY, and only have to deal with guest-induced ones.

I am not convinced that's necessary or a good idea.  I worry that doing so would
just kick the can down the road, and potentially make the problems harder to solve,
e.g. because we'd have to worry about regressing existing setups.

> > > It is still kinda bad that guests can force the VMM to loop, but the VMM can
> > > always say enough is enough.  In other words, let's assume that a limit of
> > > 16 is probably appropriate but we can also increase the limit and crash the
> > > VM if things become ridiculous.
> >
> > 2 :-)
> >
> > One try that guarantees no other host task is accessing the S-EPT entry, and a
> > second try after blasting IPI to kick vCPUs to ensure no guest-side task has
> > locked the S-EPT entry.
> 
> Fair enough. Though in principle it is possible to race and have the
> vCPU re-run and re-issue a TDG call before KVM re-issues the TDH call.

My limit of '2' is predicated on the lock being a "host priority" lock, i.e. that
kicking vCPUs would ensure the lock has been dropped and can't be re-acquired by
the guest.

> So I would make it 5 or so just to be safe.
> 
> > My concern with an arbitrary retry loop is that we'll essentially propagate the
> > TDX module issues to the broader kernel.  Each of those SEAMCALLs is slooow, so
> > retrying even ~20 times could exceed the system's tolerances for scheduling, RCU,
> > etc...
> 
> How slow are the failed ones? The number of retries is essentially the
> cost of successful seamcall / cost of busy seamcall.

I haven't measured, but would be surprised if it's less than 2000 cycles.

> If HOST_PRIORITY works, even a not-small-but-not-huge number of
> retries would be better than the IPIs. IPIs are not cheap either.

Agreed, but we also need to account for the operations that are conflicting.
E.g. if KVM is trying to zap a S-EPT that the guest is accessing, then busy waiting
for the to-be-zapped S-EPT entry to be available doesn't make much sense.

> > > For zero step detection, my reading is that it's TDH.VP.ENTER that fails;
> > > not any of the MEM seamcalls.  For that one to be resolved, it should be
> > > enough to do take and release the mmu_lock back to back, which ensures that
> > > all pending critical sections have completed (that is,
> > > "write_lock(&kvm->mmu_lock); write_unlock(&kvm->mmu_lock);").  And then
> > > loop.  Adding a vCPU stat for that one is a good idea, too.
> >
> > As above and in my discussion with Rick, I would prefer to kick vCPUs to force
> > forward progress, especially for the zero-step case.  If KVM gets to the point
> > where it has retried TDH.VP.ENTER on the same fault so many times that zero-step
> > kicks in, then it's time to kick and wait, not keep retrying blindly.
> 
> Wait, zero-step detection should _not_ affect TDH.MEM latency. Only
> TDH.VP.ENTER is delayed.

Blocked, not delayed.  Yes, it's TDH.VP.ENTER that "fails", but to get past
TDH.VP.ENTER, KVM needs to resolve the underlying fault, i.e. needs to guarantee
forward progress for TDH.MEM (or whatever the operations are called).

Though I wonder, are there any operations guest/host operations that can conflict
if the vCPU is faulting?  Maybe this particular scenario is a complete non-issue.

> If it is delayed to the point of failing, we can do write_lock/write_unlock()
> in the vCPU entry path.

I was thinking that KVM could set a flag (another synthetic error code bit?) to
tell the page fault handler that it needs to kick vCPUs.  But as above, it might
be unnecessary.

> My issue is that, even if we could make it a bit better by looking at
> the TDX module source code, we don't have enough information to make a
> good choice.  For now we should start with something _easy_, even if
> it may not be the greatest.

I am not opposed to an easy/simple solution, but I am very much opposed to
implementing a retry loop without understanding _exactly_ when and why it's
needed.

  reply	other threads:[~2024-09-10 15:57 UTC|newest]

Thread overview: 139+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-04  3:07 [PATCH 00/21] TDX MMU Part 2 Rick Edgecombe
2024-09-04  3:07 ` [PATCH 01/21] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-09-09 13:44   ` Paolo Bonzini
2024-09-09 21:06     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 02/21] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-09-09 13:51   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 03/21] KVM: x86/mmu: Do not enable page track for TD guest Rick Edgecombe
2024-09-09 13:53   ` Paolo Bonzini
2024-09-09 21:07     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 04/21] KVM: VMX: Split out guts of EPT violation to common/exposed function Rick Edgecombe
2024-09-09 13:57   ` Paolo Bonzini
2024-09-09 16:07   ` Sean Christopherson
2024-09-10  7:36     ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 05/21] KVM: VMX: Teach EPT violation helper about private mem Rick Edgecombe
2024-09-09 13:59   ` Paolo Bonzini
2024-09-11  8:52   ` Chao Gao
2024-09-11 16:29     ` Edgecombe, Rick P
2024-09-12  0:39   ` Huang, Kai
2024-09-12 13:58     ` Sean Christopherson
2024-09-12 14:43       ` Edgecombe, Rick P
2024-09-12 14:46         ` Paolo Bonzini
2024-09-12  1:19   ` Huang, Kai
2024-09-04  3:07 ` [PATCH 06/21] KVM: TDX: Add accessors VMX VMCS helpers Rick Edgecombe
2024-09-09 14:19   ` Paolo Bonzini
2024-09-09 21:29     ` Edgecombe, Rick P
2024-09-10 10:48       ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 07/21] KVM: TDX: Add load_mmu_pgd method for TDX Rick Edgecombe
2024-09-11  2:48   ` Chao Gao
2024-09-11  2:49     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 08/21] KVM: TDX: Set gfn_direct_bits to shared bit Rick Edgecombe
2024-09-09 15:21   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Rick Edgecombe
2024-09-06  1:41   ` Huang, Kai
2024-09-09 20:25     ` Edgecombe, Rick P
2024-09-09 15:25   ` Paolo Bonzini
2024-09-09 20:22     ` Edgecombe, Rick P
2024-09-09 21:11       ` Sean Christopherson
2024-09-09 21:23         ` Sean Christopherson
2024-09-09 22:34           ` Edgecombe, Rick P
2024-09-09 23:58             ` Sean Christopherson
2024-09-10  0:50               ` Edgecombe, Rick P
2024-09-10  1:46                 ` Sean Christopherson
2024-09-11  1:17               ` Huang, Kai
2024-09-11  2:48                 ` Edgecombe, Rick P
2024-09-11 22:55                   ` Huang, Kai
2024-09-10 13:15         ` Paolo Bonzini
2024-09-10 13:57           ` Sean Christopherson
2024-09-10 15:16             ` Paolo Bonzini
2024-09-10 15:57               ` Sean Christopherson [this message]
2024-09-10 16:28                 ` Edgecombe, Rick P
2024-09-10 17:42                   ` Sean Christopherson
2024-09-13  8:36                     ` Yan Zhao
2024-09-13 17:23                       ` Sean Christopherson
2024-09-13 19:19                         ` Edgecombe, Rick P
2024-09-13 22:18                           ` Sean Christopherson
2024-09-14  9:27                         ` Yan Zhao
2024-09-15  9:53                           ` Yan Zhao
2024-09-17  1:31                             ` Huang, Kai
2024-09-25 10:53                           ` Yan Zhao
2024-10-08 14:51                             ` Sean Christopherson
2024-10-10  5:23                               ` Yan Zhao
2024-10-10 17:33                                 ` Sean Christopherson
2024-10-10 21:53                                   ` Edgecombe, Rick P
2024-10-11  2:30                                     ` Yan Zhao
2024-10-14 10:54                                     ` Huang, Kai
2024-10-14 17:36                                       ` Edgecombe, Rick P
2024-10-14 23:03                                         ` Huang, Kai
2024-10-15  1:24                                           ` Edgecombe, Rick P
2024-10-11  2:06                                   ` Yan Zhao
2024-10-16 14:13                                   ` Yan Zhao
2024-09-17  2:11                         ` Huang, Kai
2024-09-13 19:19                       ` Edgecombe, Rick P
2024-09-14 10:00                         ` Yan Zhao
2024-09-04  3:07 ` [PATCH 10/21] KVM: TDX: Require TDP MMU and mmio caching for TDX Rick Edgecombe
2024-09-09 15:26   ` Paolo Bonzini
2024-09-12  0:15   ` Huang, Kai
2024-09-04  3:07 ` [PATCH 11/21] KVM: x86/mmu: Add setter for shadow_mmio_value Rick Edgecombe
2024-09-09 15:33   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 12/21] KVM: TDX: Set per-VM shadow_mmio_value to 0 Rick Edgecombe
2024-09-09 15:33   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 13/21] KVM: TDX: Handle TLB tracking for TDX Rick Edgecombe
2024-09-10  8:16   ` Paolo Bonzini
2024-09-10 23:49     ` Edgecombe, Rick P
2024-10-14  6:34     ` Yan Zhao
2024-09-11  6:25   ` Xu Yilun
2024-09-11 17:28     ` Edgecombe, Rick P
2024-09-12  4:54       ` Yan Zhao
2024-09-12 14:44         ` Edgecombe, Rick P
2024-09-12  7:47       ` Xu Yilun
2024-09-04  3:07 ` [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Rick Edgecombe
2024-09-06  2:10   ` Huang, Kai
2024-09-09 21:03     ` Edgecombe, Rick P
2024-09-10  1:52       ` Yan Zhao
2024-09-10  9:33       ` Paolo Bonzini
2024-09-10 23:58         ` Edgecombe, Rick P
2024-09-11  1:05           ` Yan Zhao
2024-10-30  3:03   ` Binbin Wu
2024-11-04  9:09     ` Yan Zhao
2024-09-04  3:07 ` [PATCH 15/21] KVM: TDX: Implement hook to get max mapping level of private pages Rick Edgecombe
2024-09-10 10:17   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 16/21] KVM: TDX: Premap initial guest memory Rick Edgecombe
2024-09-10 10:24   ` Paolo Bonzini
2024-09-11  0:19     ` Edgecombe, Rick P
2024-09-13 13:33       ` Adrian Hunter
2024-09-13 19:49         ` Edgecombe, Rick P
2024-09-10 10:49   ` Paolo Bonzini
2024-09-11  0:30     ` Edgecombe, Rick P
2024-09-11 10:39       ` Paolo Bonzini
2024-09-11 16:36         ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 17/21] KVM: TDX: MTRR: implement get_mt_mask() for TDX Rick Edgecombe
2024-09-10 10:04   ` Paolo Bonzini
2024-09-10 14:05     ` Sean Christopherson
2024-09-04  3:07 ` [PATCH 18/21] KVM: x86/mmu: Export kvm_tdp_map_page() Rick Edgecombe
2024-09-10 10:02   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory Rick Edgecombe
2024-09-04  4:53   ` Yan Zhao
2024-09-04 14:01     ` Edgecombe, Rick P
2024-09-06 16:30       ` Edgecombe, Rick P
2024-09-09  1:29         ` Yan Zhao
2024-09-10 10:13         ` Paolo Bonzini
2024-09-11  0:11           ` Edgecombe, Rick P
2024-09-04 13:56   ` Edgecombe, Rick P
2024-09-10 10:16   ` Paolo Bonzini
2024-09-11  0:12     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 20/21] KVM: TDX: Finalize VM initialization Rick Edgecombe
2024-09-04 15:37   ` Adrian Hunter
2024-09-04 16:09     ` Edgecombe, Rick P
2024-09-10 10:33     ` Paolo Bonzini
2024-09-10 11:15       ` Adrian Hunter
2024-09-10 11:28         ` Paolo Bonzini
2024-09-10 11:31         ` Adrian Hunter
2024-09-10 10:25   ` Paolo Bonzini
2024-09-10 11:54     ` Adrian Hunter
2024-09-04  3:07 ` [PATCH 21/21] KVM: TDX: Handle vCPU dissociation Rick Edgecombe
2024-09-09 15:41   ` Paolo Bonzini
2024-09-09 23:30     ` Edgecombe, Rick P
2024-09-10 10:45   ` Paolo Bonzini
2024-09-11  0:17     ` Edgecombe, Rick P
2024-11-04  9:45     ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZuBsTlbrlD6NHyv1@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yuan.yao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox