From: Sean Christopherson <seanjc@google.com>
To: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
Yuan Yao <yuan.yao@intel.com>, Kai Huang <kai.huang@intel.com>,
"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Yan Y Zhao <yan.y.zhao@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"dmatlack@google.com" <dmatlack@google.com>,
"nik.borisov@suse.com" <nik.borisov@suse.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Tue, 10 Sep 2024 10:42:20 -0700 [thread overview]
Message-ID: <ZuCE_KtmXNi0qePb@google.com> (raw)
In-Reply-To: <655170f6a09ad892200cd033efe5498a26504fec.camel@intel.com>
On Tue, Sep 10, 2024, Rick P Edgecombe wrote:
> On Tue, 2024-09-10 at 08:57 -0700, Sean Christopherson wrote:
> > > Only if the TDX module returns BUSY per-SPTE (as suggested by 18.1.3,
> > > which documents that the TDX module returns TDX_OPERAND_BUSY on a
> > > CMPXCHG failure). If it returns BUSY per-VM, FROZEN_SPTE is not enough
> > > to prevent contention in the TDX module.
> >
> > Looking at the TDX module code, things like (UN)BLOCK and REMOVE take a per-VM
> > lock in write mode, but ADD, AUG, and PROMOTE/DEMOTE take the lock in read
> > mode.
>
> AUG does take other locks as exclusive:
> https://github.com/intel/tdx-module/blob/tdx_1.5/src/vmm_dispatcher/api_calls/tdh_mem_page_aug.c
Only a lock on the underlying physical page. guest_memfd should prevent mapping
the same HPA into multiple GPAs, and FROZEN_SPTE should prevent two vCPUs from
concurrently AUGing the same GPA+HPA.
> I count 5 locks in total as well. I think trying to mirror the locking in KVM
> will be an uphill battle.
I don't want to mirror the locking, I want to understand and document the
expectations and rules. "Throw 16 noodles and hope one sticks" is not a recipe
for success.
> > So for the operations that KVM can do in parallel, the locking should
> > effectively
> > be per-entry. Because KVM will never throw away an entire S-EPT root, zapping
> > SPTEs will need to be done while holding mmu_lock for write, i.e. KVM
> > shouldn't
> > have problems with host tasks competing for the TDX module's VM-wide lock.
> >
> > > If we want to be a bit more optimistic, let's do something more
> > > sophisticated, like only take the lock after the first busy reply. But
> > > the spinlock is the easiest way to completely remove host-induced
> > > TDX_OPERAND_BUSY, and only have to deal with guest-induced ones.
> >
> > I am not convinced that's necessary or a good idea. I worry that doing so
> > would
> > just kick the can down the road, and potentially make the problems harder to
> > solve,
> > e.g. because we'd have to worry about regressing existing setups.
> >
> > > > > It is still kinda bad that guests can force the VMM to loop, but the VMM
> > > > > can
> > > > > always say enough is enough. In other words, let's assume that a limit
> > > > > of
> > > > > 16 is probably appropriate but we can also increase the limit and crash
> > > > > the
> > > > > VM if things become ridiculous.
> > > >
> > > > 2 :-)
> > > >
> > > > One try that guarantees no other host task is accessing the S-EPT entry,
> > > > and a
> > > > second try after blasting IPI to kick vCPUs to ensure no guest-side task
> > > > has
> > > > locked the S-EPT entry.
> > >
> > > Fair enough. Though in principle it is possible to race and have the
> > > vCPU re-run and re-issue a TDG call before KVM re-issues the TDH call.
> >
> > My limit of '2' is predicated on the lock being a "host priority" lock,
> > i.e. that kicking vCPUs would ensure the lock has been dropped and can't
> > be re-acquired by the guest.
>
> So kicking would be to try to break loose any deadlock we encountered? It sounds
> like the kind of kludge that could be hard to remove.
No, the intent of the kick would be to wait for vCPUs to exit, which in turn
guarantees that any locks held by vCPUs have been dropped. Again, this idea is
predicated on the lock being "host priority", i.e. that vCPUs can't re-take the
lock before KVM.
next prev parent reply other threads:[~2024-09-10 17:42 UTC|newest]
Thread overview: 139+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-04 3:07 [PATCH 00/21] TDX MMU Part 2 Rick Edgecombe
2024-09-04 3:07 ` [PATCH 01/21] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-09-09 13:44 ` Paolo Bonzini
2024-09-09 21:06 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 02/21] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-09-09 13:51 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 03/21] KVM: x86/mmu: Do not enable page track for TD guest Rick Edgecombe
2024-09-09 13:53 ` Paolo Bonzini
2024-09-09 21:07 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 04/21] KVM: VMX: Split out guts of EPT violation to common/exposed function Rick Edgecombe
2024-09-09 13:57 ` Paolo Bonzini
2024-09-09 16:07 ` Sean Christopherson
2024-09-10 7:36 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 05/21] KVM: VMX: Teach EPT violation helper about private mem Rick Edgecombe
2024-09-09 13:59 ` Paolo Bonzini
2024-09-11 8:52 ` Chao Gao
2024-09-11 16:29 ` Edgecombe, Rick P
2024-09-12 0:39 ` Huang, Kai
2024-09-12 13:58 ` Sean Christopherson
2024-09-12 14:43 ` Edgecombe, Rick P
2024-09-12 14:46 ` Paolo Bonzini
2024-09-12 1:19 ` Huang, Kai
2024-09-04 3:07 ` [PATCH 06/21] KVM: TDX: Add accessors VMX VMCS helpers Rick Edgecombe
2024-09-09 14:19 ` Paolo Bonzini
2024-09-09 21:29 ` Edgecombe, Rick P
2024-09-10 10:48 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 07/21] KVM: TDX: Add load_mmu_pgd method for TDX Rick Edgecombe
2024-09-11 2:48 ` Chao Gao
2024-09-11 2:49 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 08/21] KVM: TDX: Set gfn_direct_bits to shared bit Rick Edgecombe
2024-09-09 15:21 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Rick Edgecombe
2024-09-06 1:41 ` Huang, Kai
2024-09-09 20:25 ` Edgecombe, Rick P
2024-09-09 15:25 ` Paolo Bonzini
2024-09-09 20:22 ` Edgecombe, Rick P
2024-09-09 21:11 ` Sean Christopherson
2024-09-09 21:23 ` Sean Christopherson
2024-09-09 22:34 ` Edgecombe, Rick P
2024-09-09 23:58 ` Sean Christopherson
2024-09-10 0:50 ` Edgecombe, Rick P
2024-09-10 1:46 ` Sean Christopherson
2024-09-11 1:17 ` Huang, Kai
2024-09-11 2:48 ` Edgecombe, Rick P
2024-09-11 22:55 ` Huang, Kai
2024-09-10 13:15 ` Paolo Bonzini
2024-09-10 13:57 ` Sean Christopherson
2024-09-10 15:16 ` Paolo Bonzini
2024-09-10 15:57 ` Sean Christopherson
2024-09-10 16:28 ` Edgecombe, Rick P
2024-09-10 17:42 ` Sean Christopherson [this message]
2024-09-13 8:36 ` Yan Zhao
2024-09-13 17:23 ` Sean Christopherson
2024-09-13 19:19 ` Edgecombe, Rick P
2024-09-13 22:18 ` Sean Christopherson
2024-09-14 9:27 ` Yan Zhao
2024-09-15 9:53 ` Yan Zhao
2024-09-17 1:31 ` Huang, Kai
2024-09-25 10:53 ` Yan Zhao
2024-10-08 14:51 ` Sean Christopherson
2024-10-10 5:23 ` Yan Zhao
2024-10-10 17:33 ` Sean Christopherson
2024-10-10 21:53 ` Edgecombe, Rick P
2024-10-11 2:30 ` Yan Zhao
2024-10-14 10:54 ` Huang, Kai
2024-10-14 17:36 ` Edgecombe, Rick P
2024-10-14 23:03 ` Huang, Kai
2024-10-15 1:24 ` Edgecombe, Rick P
2024-10-11 2:06 ` Yan Zhao
2024-10-16 14:13 ` Yan Zhao
2024-09-17 2:11 ` Huang, Kai
2024-09-13 19:19 ` Edgecombe, Rick P
2024-09-14 10:00 ` Yan Zhao
2024-09-04 3:07 ` [PATCH 10/21] KVM: TDX: Require TDP MMU and mmio caching for TDX Rick Edgecombe
2024-09-09 15:26 ` Paolo Bonzini
2024-09-12 0:15 ` Huang, Kai
2024-09-04 3:07 ` [PATCH 11/21] KVM: x86/mmu: Add setter for shadow_mmio_value Rick Edgecombe
2024-09-09 15:33 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 12/21] KVM: TDX: Set per-VM shadow_mmio_value to 0 Rick Edgecombe
2024-09-09 15:33 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 13/21] KVM: TDX: Handle TLB tracking for TDX Rick Edgecombe
2024-09-10 8:16 ` Paolo Bonzini
2024-09-10 23:49 ` Edgecombe, Rick P
2024-10-14 6:34 ` Yan Zhao
2024-09-11 6:25 ` Xu Yilun
2024-09-11 17:28 ` Edgecombe, Rick P
2024-09-12 4:54 ` Yan Zhao
2024-09-12 14:44 ` Edgecombe, Rick P
2024-09-12 7:47 ` Xu Yilun
2024-09-04 3:07 ` [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Rick Edgecombe
2024-09-06 2:10 ` Huang, Kai
2024-09-09 21:03 ` Edgecombe, Rick P
2024-09-10 1:52 ` Yan Zhao
2024-09-10 9:33 ` Paolo Bonzini
2024-09-10 23:58 ` Edgecombe, Rick P
2024-09-11 1:05 ` Yan Zhao
2024-10-30 3:03 ` Binbin Wu
2024-11-04 9:09 ` Yan Zhao
2024-09-04 3:07 ` [PATCH 15/21] KVM: TDX: Implement hook to get max mapping level of private pages Rick Edgecombe
2024-09-10 10:17 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 16/21] KVM: TDX: Premap initial guest memory Rick Edgecombe
2024-09-10 10:24 ` Paolo Bonzini
2024-09-11 0:19 ` Edgecombe, Rick P
2024-09-13 13:33 ` Adrian Hunter
2024-09-13 19:49 ` Edgecombe, Rick P
2024-09-10 10:49 ` Paolo Bonzini
2024-09-11 0:30 ` Edgecombe, Rick P
2024-09-11 10:39 ` Paolo Bonzini
2024-09-11 16:36 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 17/21] KVM: TDX: MTRR: implement get_mt_mask() for TDX Rick Edgecombe
2024-09-10 10:04 ` Paolo Bonzini
2024-09-10 14:05 ` Sean Christopherson
2024-09-04 3:07 ` [PATCH 18/21] KVM: x86/mmu: Export kvm_tdp_map_page() Rick Edgecombe
2024-09-10 10:02 ` Paolo Bonzini
2024-09-04 3:07 ` [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory Rick Edgecombe
2024-09-04 4:53 ` Yan Zhao
2024-09-04 14:01 ` Edgecombe, Rick P
2024-09-06 16:30 ` Edgecombe, Rick P
2024-09-09 1:29 ` Yan Zhao
2024-09-10 10:13 ` Paolo Bonzini
2024-09-11 0:11 ` Edgecombe, Rick P
2024-09-04 13:56 ` Edgecombe, Rick P
2024-09-10 10:16 ` Paolo Bonzini
2024-09-11 0:12 ` Edgecombe, Rick P
2024-09-04 3:07 ` [PATCH 20/21] KVM: TDX: Finalize VM initialization Rick Edgecombe
2024-09-04 15:37 ` Adrian Hunter
2024-09-04 16:09 ` Edgecombe, Rick P
2024-09-10 10:33 ` Paolo Bonzini
2024-09-10 11:15 ` Adrian Hunter
2024-09-10 11:28 ` Paolo Bonzini
2024-09-10 11:31 ` Adrian Hunter
2024-09-10 10:25 ` Paolo Bonzini
2024-09-10 11:54 ` Adrian Hunter
2024-09-04 3:07 ` [PATCH 21/21] KVM: TDX: Handle vCPU dissociation Rick Edgecombe
2024-09-09 15:41 ` Paolo Bonzini
2024-09-09 23:30 ` Edgecombe, Rick P
2024-09-10 10:45 ` Paolo Bonzini
2024-09-11 0:17 ` Edgecombe, Rick P
2024-11-04 9:45 ` Yan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZuCE_KtmXNi0qePb@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nik.borisov@suse.com \
--cc=pbonzini@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yuan.yao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox