kvm.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "pbonzini@redhat.com" <pbonzini@redhat.com>,
	Vishal Annapurve <vannapurve@google.com>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Yan Y Zhao <yan.y.zhao@intel.com>,
	 "michael.roth@amd.com" <michael.roth@amd.com>
Subject: Re: [RFC PATCH 09/12] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Date: Thu, 28 Aug 2025 16:17:57 -0700
Message-ID: <aLDjpe31-w6md-GV@google.com>
In-Reply-To: <68b0d2fb207cc_27c6d294e1@iweiny-mobl.notmuch>

On Thu, Aug 28, 2025, Ira Weiny wrote:
> Edgecombe, Rick P wrote:
> > On Thu, 2025-08-28 at 13:26 -0700, Sean Christopherson wrote:
> > > Me confused.  This is pre-boot, not the normal fault path, i.e. blocking other
> > > operations is not a concern.
> > 
> > Just was my recollection of the discussion. I found it:
> > https://lore.kernel.org/lkml/Zbrj5WKVgMsUFDtb@google.com/
> > 
> > > 
> > > If tdh_mr_extend() is too heavy for a non-preemptible section, then the current
> > > code is also broken in the sense that there are no cond_resched() calls.  The
> > > vast majority of TDX hosts will be using non-preemptible kernels, so without an
> > > explicit cond_resched(), there's no practical difference between extending the
> > > measurement under mmu_lock versus outside of mmu_lock.
> > > 
> > > _If_ we need/want to do tdh_mr_extend() outside of mmu_lock, we can and should
> > > still do tdh_mem_page_add() under mmu_lock.
> > 
> > I just did a quick test and we should be on the order of <1 ms per page for the
> > full loop. I can try to get some more formal test data if it matters. But that
> > doesn't sound too horrible?
> > 
> > tdh_mr_extend() outside MMU lock is tempting because it doesn't *need* to be
> > inside it.
> 
> I'm probably not following this conversation, so stupid question:  It
> doesn't need to be in the lock because user space should not be setting up
> memory and extending the measurement in an asynchronous way.  Is that
> correct?

No, from userspace's perspective ADD+MEASURE is fully serialized.  ADD "needs"
to be under mmu_lock to guarantee consistency between the mirror EPT and the
"real" S-EPT entries.  E.g. if ADD is done after the fact, then KVM can end up
with a PRESENT M-EPT entry but a corresponding S-EPT entry that is !PRESENT.
That causes a pile of problems because it breaks KVM's fundamental assumption
that M-EPT and S-EPT entries are updated in lock-step.
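
To make the lock-step argument concrete, here is a minimal userspace sketch. It
is a toy model, not KVM code: struct toy_vm, toy_page_add(), toy_map_locked(),
toy_map_mirror_only() and toy_tables_consistent() are all hypothetical names,
and the mmu_locked flag only marks where mmu_lock would be held. It shows that
updating both tables inside one critical section preserves the invariant, while
deferring ADD leaves a PRESENT M-EPT entry with a !PRESENT S-EPT entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_GFNS 8

/* Toy stand-ins: one "present" bit per guest frame in each table. */
struct toy_vm {
	bool mmu_locked;	/* marks where mmu_lock would be held */
	bool mirror[NR_GFNS];	/* mirror EPT (M-EPT), maintained by KVM */
	bool sept[NR_GFNS];	/* "real" S-EPT, maintained via the TDX-Module */
};

/* Hypothetical stand-in for the ADD operation; returns 0 on success. */
static int toy_page_add(struct toy_vm *vm, size_t gfn)
{
	assert(gfn < NR_GFNS);
	vm->sept[gfn] = true;
	return 0;
}

/* True if every PRESENT M-EPT entry has a PRESENT S-EPT entry. */
static bool toy_tables_consistent(const struct toy_vm *vm)
{
	for (size_t i = 0; i < NR_GFNS; i++) {
		if (vm->mirror[i] && !vm->sept[i])
			return false;
	}
	return true;
}

/* ADD inside the critical section: both tables change before unlock. */
static int toy_map_locked(struct toy_vm *vm, size_t gfn)
{
	int ret;

	vm->mmu_locked = true;
	ret = toy_page_add(vm, gfn);
	if (!ret)
		vm->mirror[gfn] = true;
	vm->mmu_locked = false;
	return ret;
}

/* ADD deferred: the mirror entry becomes visible before the S-EPT entry. */
static void toy_map_mirror_only(struct toy_vm *vm, size_t gfn)
{
	assert(gfn < NR_GFNS);
	vm->mmu_locked = true;
	vm->mirror[gfn] = true;
	vm->mmu_locked = false;
	/* Anything that runs from here on sees M-EPT PRESENT, S-EPT !PRESENT. */
}
```

In this model the tables stay consistent after toy_map_locked(), become
inconsistent after toy_map_mirror_only(), and only recover once the deferred
toy_page_add() runs.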

TDH_MR_EXTEND doesn't have the same consistency issue.  If it fails, the
only thing that's left in a bad state is the measurement.  That's obviously not
ideal either, but we can handle that by forcefully terminating the VM, without
opening up KVM to edge cases that would otherwise be impossible.
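
As a rough illustration of that failure handling, the sketch below models an
EXTEND failure as fatal to the VM only. Everything here is hypothetical
(struct toy_td, toy_mr_extend(), toy_add_and_measure(), the error-injection
flag), and returning -EIO once the VM is dead is an assumption of the sketch,
not something taken from the actual KVM code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_td {
	bool page_added;	/* page tables are consistent once this is set */
	bool vm_dead;		/* models forcefully terminating the VM */
};

/* Hypothetical stand-in for TDH_MR_EXTEND; 'fail' injects an error. */
static int toy_mr_extend(bool fail)
{
	return fail ? -1 : 0;
}

/* ADD then MEASURE, fully serialized from the caller's point of view. */
static int toy_add_and_measure(struct toy_td *td, bool extend_fails)
{
	assert(td != NULL);

	/* A terminated VM accepts no further operations (assumed -EIO). */
	if (td->vm_dead)
		return -EIO;

	/* ADD would run under mmu_lock; locking is not modelled here. */
	td->page_added = true;

	/*
	 * If EXTEND fails, only the measurement is in a bad state.  The
	 * page-table state above stays valid, so the response is to kill
	 * the VM rather than to unwind the mapping.
	 */
	if (toy_mr_extend(extend_fails)) {
		td->vm_dead = true;
		return -EIO;
	}
	return 0;
}
```

After an injected failure, page_added is still true and every later call is
rejected, which mirrors the idea that the damage is confined to the
measurement.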

> > But maybe a better reason is that we could better handle errors
> > outside the fault. (i.e. no 5 line comment about why not to return an error in
> > tdx_mem_page_add() due to code in another file).
> > 
> > I wonder if Yan can give an analysis of any zapping races if we do that.
> 
> When you say analysis, you mean detecting user space did something wrong
> and failing gracefully?  Is that correct?

More specifically, whether or not KVM can WARN without the WARN being user
triggerable.  Kernel policy is that WARNs must not be triggerable absent kernel,
hardware, or firmware bugs.  What we're trying to figure out is if there's a
flow that can be triggered by userspace (misbehaving or not) that would trip a
WARN even if KVM is operating as expected.  I'm pretty sure the answer is "no".

Oh, and WARNing here is desirable, because it improves the chances of detecting
a fatal-to-the-VM bug, e.g. in KVM and/or in the TDX-Module.
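
The policy can be sketched in userspace as follows. toy_warn_on() is a
hypothetical stand-in for the kernel's WARN_ON(), and toy_handle_request() with
its two parameters is invented purely for illustration: input that userspace
controls is rejected quietly, and the WARN is reserved for a condition that
should only be reachable through a kernel, hardware, or firmware bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int toy_warn_count;

/* Hypothetical stand-in for WARN_ON(): log, count, return the condition. */
static bool toy_warn_on(bool cond, const char *what)
{
	if (cond) {
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * 'user_arg' is fully controlled by userspace.  'invariant_broken' models
 * internal state that userspace cannot influence if the code is correct.
 */
static int toy_handle_request(int user_arg, bool invariant_broken)
{
	/* Bad userspace input: fail gracefully, never WARN. */
	if (user_arg < 0)
		return -EINVAL;

	/* Only a bug gets here with the invariant broken, so WARN loudly. */
	if (toy_warn_on(invariant_broken, "internal invariant violated"))
		return -EIO;

	return 0;
}
```

The question discussed above then becomes whether any sequence of userspace
calls could reach the second check with invariant_broken set while the code
itself is behaving as designed.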

Thread overview: 85+ messages
2025-08-27  0:05 [RFC PATCH 00/12] KVM: x86/mmu: TDX post-populate cleanups Sean Christopherson
2025-08-27  0:05 ` [RFC PATCH 01/12] KVM: TDX: Drop PROVE_MMU=y sanity check on to-be-populated mappings Sean Christopherson
2025-08-27  8:14   ` Yan Zhao
2025-08-28  0:37   ` Ira Weiny
2025-08-28  2:13   ` Huang, Kai
2025-08-27  0:05 ` [RFC PATCH 02/12] KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMU Sean Christopherson
2025-08-27  8:25   ` Yan Zhao
2025-08-28  0:54     ` Edgecombe, Rick P
2025-08-28  1:26       ` Edgecombe, Rick P
2025-08-28  6:23         ` Yan Zhao
2025-08-28 19:40           ` Sean Christopherson
2025-08-29  1:16             ` Yan Zhao
2025-09-01  0:39               ` Yan Zhao
2025-08-28  6:55       ` Yan Zhao
2025-08-28  0:40   ` Ira Weiny
2025-08-28  1:51     ` Edgecombe, Rick P
2025-08-28 19:57       ` Sean Christopherson
2025-08-27  0:05 ` [RFC PATCH 03/12] Revert "KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU" Sean Christopherson
2025-08-27  0:05 ` [RFC PATCH 04/12] KVM: x86/mmu: Rename kvm_tdp_map_page() to kvm_tdp_prefault_page() Sean Christopherson
2025-08-28  2:01   ` Edgecombe, Rick P
2025-08-28 18:50     ` Sean Christopherson
2025-08-28 19:04       ` Edgecombe, Rick P
2025-08-27  0:05 ` [RFC PATCH 05/12] KVM: TDX: Drop superfluous page pinning in S-EPT management Sean Christopherson
2025-08-27  8:33   ` Yan Zhao
2025-08-28  2:05     ` Edgecombe, Rick P
2025-08-28 20:16       ` Sean Christopherson
2025-08-28  0:36   ` Ira Weiny
2025-08-28  7:08     ` Yan Zhao
2025-08-28 15:54       ` Ira Weiny
2025-08-28  2:45   ` Huang, Kai
2025-08-27  0:05 ` [RFC PATCH 06/12] KVM: TDX: Return -EIO, not -EINVAL, on a KVM_BUG_ON() condition Sean Christopherson
2025-08-27  8:39   ` Yan Zhao
2025-08-27 17:26     ` Sean Christopherson
2025-08-28  2:11   ` Edgecombe, Rick P
2025-08-28 19:21     ` Sean Christopherson
2025-08-28 20:13       ` Edgecombe, Rick P
2025-08-28 21:00         ` Sean Christopherson
2025-08-28 21:19           ` Edgecombe, Rick P
2025-08-28 21:34             ` Sean Christopherson
2025-08-28 15:03   ` Ira Weiny
2025-08-27  0:05 ` [RFC PATCH 07/12] KVM: TDX: Avoid a double-KVM_BUG_ON() in tdx_sept_zap_private_spte() Sean Christopherson
2025-08-28  2:19   ` Edgecombe, Rick P
2025-08-28 14:50     ` Edgecombe, Rick P
2025-08-29  1:10       ` Yan Zhao
2025-08-28 15:02   ` Ira Weiny
2025-08-27  0:05 ` [RFC PATCH 08/12] KVM: TDX: Use atomic64_dec_return() instead of a poor equivalent Sean Christopherson
2025-08-28  2:56   ` Edgecombe, Rick P
2025-08-28  6:48     ` Yan Zhao
2025-08-28 19:14       ` Edgecombe, Rick P
2025-08-28 22:33         ` Sean Christopherson
2025-08-28 23:18           ` Edgecombe, Rick P
2025-08-28 15:03   ` Ira Weiny
2025-08-27  0:05 ` [RFC PATCH 09/12] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller Sean Christopherson
2025-08-27  9:02   ` Yan Zhao
2025-08-27 19:08     ` Sean Christopherson
2025-08-28  3:13       ` Edgecombe, Rick P
2025-08-28  5:56         ` Yan Zhao
2025-08-28 19:08           ` Edgecombe, Rick P
2025-08-28  5:43       ` Yan Zhao
2025-08-28 17:00         ` Sean Christopherson
2025-08-28 18:52           ` Edgecombe, Rick P
2025-08-28 20:26             ` Sean Christopherson
2025-08-28 21:33               ` Edgecombe, Rick P
2025-08-28 21:57                 ` Sean Christopherson
2025-08-28 23:17                   ` Edgecombe, Rick P
2025-08-29  6:08                   ` Yan Zhao
2025-08-28 22:06                 ` Ira Weiny
2025-08-28 23:17                   ` Sean Christopherson [this message]
2025-08-29  0:35                     ` Ira Weiny
2025-08-29  6:06                 ` Yan Zhao
2025-08-28 21:44             ` Sean Christopherson
2025-08-29  2:42             ` Binbin Wu
2025-08-29  2:31           ` Yan Zhao
2025-08-29  6:33             ` Yan Zhao
2025-08-28 15:30       ` Ira Weiny
2025-08-28 15:28     ` Ira Weiny
2025-08-27  0:05 ` [RFC PATCH 10/12] KVM: TDX: Assert that slots_lock is held when nr_premapped is accessed Sean Christopherson
2025-08-27  0:05 ` [RFC PATCH 11/12] KVM: TDX: Track nr_premapped as an "unsigned long", not an "atomic64_t" Sean Christopherson
2025-08-27  9:12   ` Yan Zhao
2025-08-27  0:05 ` [RFC PATCH 12/12] KVM: TDX: Rename nr_premapped to nr_pending_tdh_mem_page_adds Sean Christopherson
2025-08-27  9:22   ` Yan Zhao
2025-08-28 15:23   ` Ira Weiny
2025-08-27  9:48 ` [RFC PATCH 00/12] KVM: x86/mmu: TDX post-populate cleanups Yan Zhao
2025-08-28 19:01 ` Edgecombe, Rick P
2025-08-28 23:19   ` Sean Christopherson
