kvm.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 Yan Y Zhao <yan.y.zhao@intel.com>, Yuan Yao <yuan.yao@intel.com>,
	 "nik.borisov@suse.com" <nik.borisov@suse.com>,
	"dmatlack@google.com" <dmatlack@google.com>,
	 Kai Huang <kai.huang@intel.com>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Tue, 10 Sep 2024 06:57:54 -0700
Message-ID: <ZuBQYvY6Ib4ZYBgx@google.com>
In-Reply-To: <1bbe3a78-8746-4db9-a96c-9dc5f1190f16@redhat.com>

On Tue, Sep 10, 2024, Paolo Bonzini wrote:
> On 9/9/24 23:11, Sean Christopherson wrote:
> > In general, I am _very_ opposed to blindly retrying an SEPT SEAMCALL, ever.  For
> > its operations, I'm pretty sure the only sane approach is for KVM to ensure there
> > will be no contention.  And if the TDX module's single-step protection spuriously
> > kicks in, KVM exits to userspace.  If the TDX module can't/doesn't/won't communicate
> > that it's mitigating single-step, e.g. so that KVM can forward the information
> > to userspace, then that's a TDX module problem to solve.
> 
> In principle I agree but we also need to be pragmatic.  Exiting to userspace
> may not be practical in all flows, for example.
> 
> First of all, we can add a spinlock around affected seamcalls.

No, because that defeats the purpose of having mmu_lock be a rwlock.

> This way we know that "busy" errors must come from the guest and have set
> HOST_PRIORITY.
 
We should be able to achieve that without a VM-wide spinlock.  My thought (from
v11?) was to effectively use the FROZEN_SPTE bit as a per-SPTE spinlock, i.e. keep
it set until the SEAMCALL completes.
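
For illustration, a minimal user-space sketch of the per-SPTE lock idea, assuming C11 atomics stand in for KVM's SPTE accessors: the entry is swapped to a frozen marker with compare-and-exchange, kept frozen for the whole SEAMCALL, and only then overwritten with its final value. FROZEN_SPTE_SKETCH, set_spte_with_seamcall() and demo_seamcall() are hypothetical names; KVM's real FROZEN_SPTE encoding and TDP MMU helpers differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker; KVM's real FROZEN_SPTE value is different. */
#define FROZEN_SPTE_SKETCH 0x5a0ULL

/* Stand-in for an S-EPT SEAMCALL: returns 0 on success, nonzero on error. */
typedef uint64_t (*sept_op_fn)(void *ctx);

/*
 * Claim the entry by swapping in the frozen marker. Fails if any other
 * task changed or froze the SPTE since @old_spte was read.
 */
static bool try_freeze_spte(_Atomic uint64_t *sptep, uint64_t old_spte)
{
	return atomic_compare_exchange_strong(sptep, &old_spte,
					      FROZEN_SPTE_SKETCH);
}

/*
 * Hold the "lock" (frozen SPTE) across the whole SEAMCALL, then publish the
 * final value: the new SPTE on success, the old one if the SEAMCALL failed.
 */
static int set_spte_with_seamcall(_Atomic uint64_t *sptep, uint64_t old_spte,
				  uint64_t new_spte, sept_op_fn op, void *ctx)
{
	uint64_t err;

	if (!try_freeze_spte(sptep, old_spte))
		return -EBUSY;	/* someone else owns the entry; retry the fault */

	err = op(ctx);
	atomic_store(sptep, err ? old_spte : new_spte);
	return err ? -EIO : 0;
}

/* Demo SEAMCALL: checks that the entry really is frozen while it runs. */
static uint64_t demo_seamcall(void *ctx)
{
	_Atomic uint64_t *sptep = ctx;

	assert(atomic_load(sptep) == FROZEN_SPTE_SKETCH);
	return 0;
}
```

A second task that read the same old value fails the compare-and-exchange and backs off, so contention is resolved per entry rather than with a VM-wide lock.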

> It is still kinda bad that guests can force the VMM to loop, but the VMM can
> always say enough is enough.  In other words, let's assume that a limit of
> 16 is probably appropriate but we can also increase the limit and crash the
> VM if things become ridiculous.
> 
> Something like this:
> 
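> 	static u32 max = 16;
> 	struct tdx_module_args args_in;
> 	int retry = 0;
> 	u64 ret;
>
> 	spin_lock(&kvm->arch.seamcall_lock);
> 	for (;;) {
> 		/* seamcall_ret() clobbers its args, so retry from the caller's copy */
> 		args_in = *in;
> 		ret = seamcall_ret(op, &args_in);
> 		if (ret != TDX_ERROR_SEPT_BUSY)	/* TDX_OPERAND_BUSY on operand SEPT */
> 			break;
> 		if (++retry == 1) {
> 			/* protected by the same seamcall_lock */
> 			kvm->stat.retried_seamcalls++;
> 		} else if (retry == READ_ONCE(max)) {
> 			pr_warn("Exceeded %d retries for S-EPT operation\n", retry);
> 			if (KVM_BUG_ON(retry == 1024, kvm)) {
> 				pr_err("Crashing due to lock contention in the TDX module\n");
> 				break;
> 			}
> 			cmpxchg(&max, retry, retry * 2);
> 		}
> 	}
> 	spin_unlock(&kvm->arch.seamcall_lock);
> 	*in = args_in;	/* hand the SEAMCALL outputs back to the caller */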
> 
> This way we can do some testing and figure out a useful limit.

2 :-)

One try that guarantees no other host task is accessing the S-EPT entry, and a
second try after blasting IPIs to kick vCPUs to ensure no guest-side task has
locked the S-EPT entry.
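
The two-attempt policy could be sketched as follows, assuming the first attempt already runs with host-side contention excluded and that kicking vCPUs out of the guest releases any guest-side hold on the entry. sept_seamcall_two_tries(), the SKETCH_* status codes and the demo_* helpers are hypothetical and do not reflect the TDX module's error encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes; not the TDX module's real encoding. */
#define SKETCH_SUCCESS   0ULL
#define SKETCH_SEPT_BUSY 1ULL

/*
 * Attempt 1 runs with host-side contention already excluded. Attempt 2 runs
 * after every vCPU has been kicked out of the guest, so no guest-side task
 * can hold the S-EPT entry either. BUSY on attempt 2 means a bug.
 */
static uint64_t sept_seamcall_two_tries(uint64_t (*seamcall)(void *ctx),
					void (*kick_all_vcpus)(void *ctx),
					void *ctx, bool *bug)
{
	uint64_t ret = seamcall(ctx);

	*bug = false;
	if (ret != SKETCH_SEPT_BUSY)
		return ret;

	kick_all_vcpus(ctx);
	ret = seamcall(ctx);
	/* In KVM this would be a KVM_BUG_ON()-style failure. */
	*bug = (ret == SKETCH_SEPT_BUSY);
	return ret;
}

/* Demo VM: the guest holds the entry until its vCPUs are kicked. */
struct demo_vm {
	bool guest_holds_entry;
	int calls;
};

static uint64_t demo_seamcall(void *ctx)
{
	struct demo_vm *vm = ctx;

	vm->calls++;
	return vm->guest_holds_entry ? SKETCH_SEPT_BUSY : SKETCH_SUCCESS;
}

static void demo_kick_all_vcpus(void *ctx)
{
	struct demo_vm *vm = ctx;

	/* Modeling assumption: exiting the guest drops its hold on the entry. */
	vm->guest_holds_entry = false;
}
```

The SEAMCALL count is bounded at two, so the cost of a contended operation stays predictable instead of scaling with an arbitrary retry limit.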

My concern with an arbitrary retry loop is that we'll essentially propagate the
TDX module issues to the broader kernel.  Each of those SEAMCALLs is slooow, so
retrying even ~20 times could exceed the system's tolerances for scheduling, RCU,
etc...

> For zero step detection, my reading is that it's TDH.VP.ENTER that fails,
> not any of the MEM seamcalls.  For that one to be resolved, it should be
> enough to take and release the mmu_lock back to back, which ensures that
> all pending critical sections have completed (that is,
> "write_lock(&kvm->mmu_lock); write_unlock(&kvm->mmu_lock);").  And then
> loop.  Adding a vCPU stat for that one is a good idea, too.

As above and in my discussion with Rick, I would prefer to kick vCPUs to force
forward progress, especially for the zero-step case.  If KVM gets to the point
where it has retried TDH.VP.ENTER on the same fault so many times that zero-step
kicks in, then it's time to kick and wait, not keep retrying blindly.

There is still risk of a hang, e.g. if a CPU fails to respond to the IPI, but
that's a possibility that always exists.  Kicking vCPUs allows KVM to know with
100% certainty that a SEAMCALL should succeed.

Hrm, the wrinkle is that if we want to guarantee success, the vCPU kick would
need to happen when the SPTE is frozen, to ensure some other host task doesn't
"steal" the lock.
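
A sketch of the ordering this implies, again assuming C11 atomics for the SPTE and hypothetical stand-ins for the SEAMCALL and the IPI: the entry is frozen before the first attempt and stays frozen across the kick and the retry, so no other host task can claim it in between. frozen_two_try_update(), struct sept_ops and the demo_* helpers are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FROZEN_SPTE_SKETCH 0x5a0ULL	/* hypothetical frozen marker */
#define SKETCH_SEPT_BUSY   1ULL		/* hypothetical BUSY status; 0 = success */

struct sept_ops {
	uint64_t (*seamcall)(void *ctx);	/* stand-in for the S-EPT SEAMCALL */
	void (*kick_all_vcpus)(void *ctx);	/* stand-in for IPI + wait */
};

static int frozen_two_try_update(_Atomic uint64_t *sptep, uint64_t old_spte,
				 uint64_t new_spte, const struct sept_ops *ops,
				 void *ctx)
{
	uint64_t ret;

	/* Freeze first: no other host task can claim the entry after this. */
	if (!atomic_compare_exchange_strong(sptep, &old_spte, FROZEN_SPTE_SKETCH))
		return -EBUSY;

	ret = ops->seamcall(ctx);
	if (ret == SKETCH_SEPT_BUSY) {
		/* Kick while still frozen so the retry cannot be "stolen". */
		ops->kick_all_vcpus(ctx);
		ret = ops->seamcall(ctx);
	}

	/* Unfreeze only after the final attempt. */
	atomic_store(sptep, ret ? old_spte : new_spte);
	return ret ? -EIO : 0;
}

/* Demo state: records whether the SPTE was frozen when the kick happened. */
struct demo_ctx {
	_Atomic uint64_t *sptep;
	bool guest_holds_entry;
	bool frozen_during_kick;
};

static uint64_t demo_seamcall(void *p)
{
	struct demo_ctx *c = p;

	return c->guest_holds_entry ? SKETCH_SEPT_BUSY : 0;
}

static void demo_kick_all_vcpus(void *p)
{
	struct demo_ctx *c = p;

	c->frozen_during_kick = atomic_load(c->sptep) == FROZEN_SPTE_SKETCH;
	/* Modeling assumption: exiting the guest drops its hold on the entry. */
	c->guest_holds_entry = false;
}
```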


Thread overview: 139+ messages
2024-09-04  3:07 [PATCH 00/21] TDX MMU Part 2 Rick Edgecombe
2024-09-04  3:07 ` [PATCH 01/21] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-09-09 13:44   ` Paolo Bonzini
2024-09-09 21:06     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 02/21] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-09-09 13:51   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 03/21] KVM: x86/mmu: Do not enable page track for TD guest Rick Edgecombe
2024-09-09 13:53   ` Paolo Bonzini
2024-09-09 21:07     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 04/21] KVM: VMX: Split out guts of EPT violation to common/exposed function Rick Edgecombe
2024-09-09 13:57   ` Paolo Bonzini
2024-09-09 16:07   ` Sean Christopherson
2024-09-10  7:36     ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 05/21] KVM: VMX: Teach EPT violation helper about private mem Rick Edgecombe
2024-09-09 13:59   ` Paolo Bonzini
2024-09-11  8:52   ` Chao Gao
2024-09-11 16:29     ` Edgecombe, Rick P
2024-09-12  0:39   ` Huang, Kai
2024-09-12 13:58     ` Sean Christopherson
2024-09-12 14:43       ` Edgecombe, Rick P
2024-09-12 14:46         ` Paolo Bonzini
2024-09-12  1:19   ` Huang, Kai
2024-09-04  3:07 ` [PATCH 06/21] KVM: TDX: Add accessors VMX VMCS helpers Rick Edgecombe
2024-09-09 14:19   ` Paolo Bonzini
2024-09-09 21:29     ` Edgecombe, Rick P
2024-09-10 10:48       ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 07/21] KVM: TDX: Add load_mmu_pgd method for TDX Rick Edgecombe
2024-09-11  2:48   ` Chao Gao
2024-09-11  2:49     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 08/21] KVM: TDX: Set gfn_direct_bits to shared bit Rick Edgecombe
2024-09-09 15:21   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Rick Edgecombe
2024-09-06  1:41   ` Huang, Kai
2024-09-09 20:25     ` Edgecombe, Rick P
2024-09-09 15:25   ` Paolo Bonzini
2024-09-09 20:22     ` Edgecombe, Rick P
2024-09-09 21:11       ` Sean Christopherson
2024-09-09 21:23         ` Sean Christopherson
2024-09-09 22:34           ` Edgecombe, Rick P
2024-09-09 23:58             ` Sean Christopherson
2024-09-10  0:50               ` Edgecombe, Rick P
2024-09-10  1:46                 ` Sean Christopherson
2024-09-11  1:17               ` Huang, Kai
2024-09-11  2:48                 ` Edgecombe, Rick P
2024-09-11 22:55                   ` Huang, Kai
2024-09-10 13:15         ` Paolo Bonzini
2024-09-10 13:57           ` Sean Christopherson [this message]
2024-09-10 15:16             ` Paolo Bonzini
2024-09-10 15:57               ` Sean Christopherson
2024-09-10 16:28                 ` Edgecombe, Rick P
2024-09-10 17:42                   ` Sean Christopherson
2024-09-13  8:36                     ` Yan Zhao
2024-09-13 17:23                       ` Sean Christopherson
2024-09-13 19:19                         ` Edgecombe, Rick P
2024-09-13 22:18                           ` Sean Christopherson
2024-09-14  9:27                         ` Yan Zhao
2024-09-15  9:53                           ` Yan Zhao
2024-09-17  1:31                             ` Huang, Kai
2024-09-25 10:53                           ` Yan Zhao
2024-10-08 14:51                             ` Sean Christopherson
2024-10-10  5:23                               ` Yan Zhao
2024-10-10 17:33                                 ` Sean Christopherson
2024-10-10 21:53                                   ` Edgecombe, Rick P
2024-10-11  2:30                                     ` Yan Zhao
2024-10-14 10:54                                     ` Huang, Kai
2024-10-14 17:36                                       ` Edgecombe, Rick P
2024-10-14 23:03                                         ` Huang, Kai
2024-10-15  1:24                                           ` Edgecombe, Rick P
2024-10-11  2:06                                   ` Yan Zhao
2024-10-16 14:13                                   ` Yan Zhao
2024-09-17  2:11                         ` Huang, Kai
2024-09-13 19:19                       ` Edgecombe, Rick P
2024-09-14 10:00                         ` Yan Zhao
2024-09-04  3:07 ` [PATCH 10/21] KVM: TDX: Require TDP MMU and mmio caching for TDX Rick Edgecombe
2024-09-09 15:26   ` Paolo Bonzini
2024-09-12  0:15   ` Huang, Kai
2024-09-04  3:07 ` [PATCH 11/21] KVM: x86/mmu: Add setter for shadow_mmio_value Rick Edgecombe
2024-09-09 15:33   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 12/21] KVM: TDX: Set per-VM shadow_mmio_value to 0 Rick Edgecombe
2024-09-09 15:33   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 13/21] KVM: TDX: Handle TLB tracking for TDX Rick Edgecombe
2024-09-10  8:16   ` Paolo Bonzini
2024-09-10 23:49     ` Edgecombe, Rick P
2024-10-14  6:34     ` Yan Zhao
2024-09-11  6:25   ` Xu Yilun
2024-09-11 17:28     ` Edgecombe, Rick P
2024-09-12  4:54       ` Yan Zhao
2024-09-12 14:44         ` Edgecombe, Rick P
2024-09-12  7:47       ` Xu Yilun
2024-09-04  3:07 ` [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Rick Edgecombe
2024-09-06  2:10   ` Huang, Kai
2024-09-09 21:03     ` Edgecombe, Rick P
2024-09-10  1:52       ` Yan Zhao
2024-09-10  9:33       ` Paolo Bonzini
2024-09-10 23:58         ` Edgecombe, Rick P
2024-09-11  1:05           ` Yan Zhao
2024-10-30  3:03   ` Binbin Wu
2024-11-04  9:09     ` Yan Zhao
2024-09-04  3:07 ` [PATCH 15/21] KVM: TDX: Implement hook to get max mapping level of private pages Rick Edgecombe
2024-09-10 10:17   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 16/21] KVM: TDX: Premap initial guest memory Rick Edgecombe
2024-09-10 10:24   ` Paolo Bonzini
2024-09-11  0:19     ` Edgecombe, Rick P
2024-09-13 13:33       ` Adrian Hunter
2024-09-13 19:49         ` Edgecombe, Rick P
2024-09-10 10:49   ` Paolo Bonzini
2024-09-11  0:30     ` Edgecombe, Rick P
2024-09-11 10:39       ` Paolo Bonzini
2024-09-11 16:36         ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 17/21] KVM: TDX: MTRR: implement get_mt_mask() for TDX Rick Edgecombe
2024-09-10 10:04   ` Paolo Bonzini
2024-09-10 14:05     ` Sean Christopherson
2024-09-04  3:07 ` [PATCH 18/21] KVM: x86/mmu: Export kvm_tdp_map_page() Rick Edgecombe
2024-09-10 10:02   ` Paolo Bonzini
2024-09-04  3:07 ` [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory Rick Edgecombe
2024-09-04  4:53   ` Yan Zhao
2024-09-04 14:01     ` Edgecombe, Rick P
2024-09-06 16:30       ` Edgecombe, Rick P
2024-09-09  1:29         ` Yan Zhao
2024-09-10 10:13         ` Paolo Bonzini
2024-09-11  0:11           ` Edgecombe, Rick P
2024-09-04 13:56   ` Edgecombe, Rick P
2024-09-10 10:16   ` Paolo Bonzini
2024-09-11  0:12     ` Edgecombe, Rick P
2024-09-04  3:07 ` [PATCH 20/21] KVM: TDX: Finalize VM initialization Rick Edgecombe
2024-09-04 15:37   ` Adrian Hunter
2024-09-04 16:09     ` Edgecombe, Rick P
2024-09-10 10:33     ` Paolo Bonzini
2024-09-10 11:15       ` Adrian Hunter
2024-09-10 11:28         ` Paolo Bonzini
2024-09-10 11:31         ` Adrian Hunter
2024-09-10 10:25   ` Paolo Bonzini
2024-09-10 11:54     ` Adrian Hunter
2024-09-04  3:07 ` [PATCH 21/21] KVM: TDX: Handle vCPU dissociation Rick Edgecombe
2024-09-09 15:41   ` Paolo Bonzini
2024-09-09 23:30     ` Edgecombe, Rick P
2024-09-10 10:45   ` Paolo Bonzini
2024-09-11  0:17     ` Edgecombe, Rick P
2024-11-04  9:45     ` Yan Zhao