From: Ira Weiny <ira.weiny@intel.com>
To: Sean Christopherson <seanjc@google.com>, Ira Weiny <ira.weiny@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"Vishal Annapurve" <vannapurve@google.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Yan Y Zhao <yan.y.zhao@intel.com>,
"michael.roth@amd.com" <michael.roth@amd.com>
Subject: Re: [RFC PATCH 09/12] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Date: Thu, 28 Aug 2025 19:35:16 -0500
Message-ID: <68b0f5c4e6716_293b32946@iweiny-mobl.notmuch>
In-Reply-To: <aLDjpe31-w6md-GV@google.com>

Sean Christopherson wrote:
> On Thu, Aug 28, 2025, Ira Weiny wrote:
> > Edgecombe, Rick P wrote:
> > > On Thu, 2025-08-28 at 13:26 -0700, Sean Christopherson wrote:
> > > > Me confused. This is pre-boot, not the normal fault path, i.e. blocking other
> > > > operations is not a concern.
> > >
> > > Just was my recollection of the discussion. I found it:
> > > https://lore.kernel.org/lkml/Zbrj5WKVgMsUFDtb@google.com/
> > >
> > > >
> > > > If tdh_mr_extend() is too heavy for a non-preemptible section, then the current
> > > > code is also broken in the sense that there are no cond_resched() calls. The
> > > > vast majority of TDX hosts will be using non-preemptible kernels, so without an
> > > > explicit cond_resched(), there's no practical difference between extending the
> > > > measurement under mmu_lock versus outside of mmu_lock.
> > > >
> > > > _If_ we need/want to do tdh_mr_extend() outside of mmu_lock, we can and should
> > > > still do tdh_mem_page_add() under mmu_lock.
> > >
> > > I just did a quick test and we should be on the order of <1 ms per page for the
> > > full loop. I can try to get some more formal test data if it matters. But that
> > > doesn't sound too horrible?
> > >
> > > tdh_mr_extend() outside MMU lock is tempting because it doesn't *need* to be
> > > inside it.
> >
> > I'm probably not following this conversation, so stupid question: It
> > doesn't need to be in the lock because user space should not be setting up
> > memory and extending the measurement in an asynchronous way. Is that
> > correct?
>
> No, from userspace's perspective ADD+MEASURE is fully serialized. ADD "needs"
> to be under mmu_lock to guarantee consistency between the mirror EPT and the
> "real" S-EPT entries. E.g. if ADD is done after the fact, then KVM can end up
> with a PRESENT M-EPT entry but a corresponding S-EPT entry that is !PRESENT.
> That causes a pile of problems because it breaks KVM's fundamental assumption
> that M-EPT and S-EPT entries are updated in lock-step.

Ok yes, I think I worded my query incorrectly, but this makes things clear.
Thanks!

>
> TDH_MR_EXTEND doesn't have the same consistency issue. If it fails, the
> only thing that's left in a bad state is the measurement. That's obviously not
> ideal either, but we can handle that by forcefully terminating the VM, without
> opening up KVM to edge cases that would otherwise be impossible.
>
> > > But maybe a better reason is that we could better handle errors
> > > outside the fault. (i.e. no 5 line comment about why not to return an error in
> > > tdx_mem_page_add() due to code in another file).
> > >
> > > I wonder if Yan can give an analysis of any zapping races if we do that.
> >
> > When you say analysis, you mean detecting user space did something wrong
> > and failing gracefully? Is that correct?
>
> More specifically, whether or not KVM can WARN without the WARN being user
> triggerable. Kernel policy is that WARNs must not be triggerable absent kernel,
> hardware, or firmware bugs. What we're trying to figure out is if there's a
> flow that can be triggered by userspace (misbehaving or not) that would trip a
> WARN even if KVM is operating as expected. I'm pretty sure the answer is "no".
>
> Oh, and WARNing here is desirable, because it improves the chances of detecting
> a fatal-to-the-VM bug, e.g. in KVM and/or in the TDX-Module.

OK... In other areas of the kernel, if the user misbehaves, it is
reasonable to fail the operation. I would think that being fatal to the VM
would be fine if, for example, QEMU did not properly synchronize ADD,
measurement, and finalize. Am I wrong in that assumption?

Ira