From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: "binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Xiaoyao Li <xiaoyao.li@intel.com>,
Yan Y Zhao <yan.y.zhao@intel.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
Adrian Hunter <adrian.hunter@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Reinette Chatre <reinette.chatre@intel.com>,
Weijiang Yang <weijiang.yang@intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
Isaku Yamahata <isaku.yamahata@intel.com>,
"dmatlack@google.com" <dmatlack@google.com>,
"nik.borisov@suse.com" <nik.borisov@suse.com>,
"tony.lindgren@linux.intel.com" <tony.lindgren@linux.intel.com>,
Chao Gao <chao.gao@intel.com>,
Rick P Edgecombe <rick.p.edgecombe@intel.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH 0/7] KVM: TDX: TD vcpu enter/exit
Date: Mon, 25 Nov 2024 14:51:52 -0800 [thread overview]
Message-ID: <Z0T_iPdmtpjrc14q@google.com> (raw)
In-Reply-To: <735d3a560046e4a7a9f223dc5688dcf1730280c5.camel@intel.com>
On Mon, Nov 25, 2024, Kai Huang wrote:
> On Mon, 2024-11-25 at 07:19 -0800, Sean Christopherson wrote:
> > On Mon, Nov 25, 2024, Binbin Wu wrote:
> > > On 11/22/2024 4:14 AM, Adrian Hunter wrote:
> > > [...]
> > > > - tdx_vcpu_enter_exit() calls guest_state_enter_irqoff()
> > > > and guest_state_exit_irqoff() which comments say should be
> > > > called from non-instrumentable code but noinst was removed
> > > > at Sean's suggestion:
> > > > https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@google.com/
> > > > noinstr is also needed to retain NMI-blocking by avoiding
> > > > instrumented code that leads to an IRET which unblocks NMIs.
> > > > A later patch set will deal with NMI VM-exits.
> > > >
> > > In https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@google.com, Sean mentioned:
> > > "The reason the VM-Enter flows for VMX and SVM need to be noinstr is they do things
> > > like load the guest's CR2, and handle NMI VM-Exits with NMIs blocks. None of
> > > that applies to TDX. Either that, or there are some massive bugs lurking due to
> > > missing code."
> > >
> > > I don't understand why handle NMI VM-Exits with NMIs blocks doesn't apply to
> > > TDX. IIUIC, similar to VMX, TDX also needs to handle the NMI VM-exit in the
> > > noinstr section to avoid the unblock of NMIs due to instrumentation-induced
> > > fault.
> >
> > With TDX, SEAMCALL is mechnically a VM-Exit. KVM is the "guest" running in VMX
> > root mode, and the TDX-Module is the "host", running in SEAM root mode.
> >
> > And for TDH.VP.ENTER, if a hardware NMI arrives with the TDX guest is active,
> > the initial NMI VM-Exit, which consumes the NMI and blocks further NMIs, goes
> > from SEAM non-root to SEAM root. The SEAMRET from SEAM root to VMX root (KVM)
> > is effectively a VM-Enter, and does NOT block NMIs in VMX root (at least, AFAIK).
> >
> > So trying to handle the NMI "exit" in a noinstr section is pointless because NMIs
> > are never blocked.
>
> I thought NMI remains being blocked after SEAMRET?
No, because NMIs weren't blocked at SEAMCALL.
> The TDX CPU architecture extension spec says:
>
> "
> On transition to SEAM VMX root operation, the processor can inhibit NMI and SMI.
> While inhibited, if these events occur, then they are tailored to stay pending
> and be delivered when the inhibit state is removed. NMI and external interrupts
> can be uninhibited in SEAM VMX-root operation. In SEAM VMX-root operation,
> MSR_INTR_PENDING can be read to help determine status of any pending events.
>
> On transition to SEAM VMX non-root operation using a VM entry, NMI and INTR
> inhibit states are, by design, updated based on the configuration of the TD VMCS
> used to perform the VM entry.
>
> ...
>
> On transition to legacy VMX root operation using SEAMRET, the NMI and SMI
> inhibit state can be restored to the inhibit state at the time of the previous
> SEAMCALL and any pending NMI/SMI would be delivered if not inhibited.
> "
>
> Here NMI is inhibited in SEAM VMX root, but is never inhibited in VMX root.
Yep.
> And the last paragraph does say "any pending NMI would be delivered if not
> inhibited".
That's referring to the scenario where an NMI becomes pending while the CPU is in
SEAM, i.e. has NMIs blocked.
> But I thought this applies to the case when "NMI happens in SEAM VMX root", but
> not "NMI happens in SEAM VMX non-root"? I thought the NMI is already
> "delivered" when CPU is in "SEAM VMX non-root", but I guess I was wrong here..
When an NMI happens in non-root, the NMI is acknowledged by the CPU prior to
performing VM-Exit. In regular VMX, NMIs are blocked after such VM-Exits. With
TDX, that blocking happens for SEAM root, but the SEAMRET back to VMX root will
load interruptibility from the SEAMCALL VMCS, and I don't see any code in the
TDX-Module that propagates that blocking to SEAMCALL VMCS.
Hmm, actually, this means that TDX has a causality inversion, which may become
visible with FRED's NMI source reporting. E.g. NMI X arrives in SEAM non-root
and triggers a VM-Exit. NMI X+1 becomes pending while SEAM root is active.
TDX-Module SEAMRETs to VMX root, NMIs are unblocked, and so NMI X+1 is delivered
and handled before NMI X.
So the TDX-Module needs something like this:
diff --git a/src/td_transitions/td_exit.c b/src/td_transitions/td_exit.c
index eecfb2e..b5c17c3 100644
--- a/src/td_transitions/td_exit.c
+++ b/src/td_transitions/td_exit.c
@@ -527,6 +527,11 @@ void td_vmexit_to_vmm(uint8_t vcpu_state, uint8_t last_td_exit, uint64_t scrub_m
load_xmms_by_mask(tdvps_ptr, xmm_select);
}
+ if (<is NMI VM-Exit => SEAMRET)
+ {
+ set_guest_inter_blocking_by_nmi();
+ }
+
// 7. Run the common SEAMRET routine.
tdx_vmm_post_dispatching();
and then KVM should indeed handle NMI exits prior to leaving the noinstr section.
> > TDX is also different because KVM isn't responsible for context switching guest
> > state. Specifically, CR2 is managed by the TDX Module, and so there is no window
> > where KVM runs with guest CR2, and thus there is no risk of clobbering guest CR2
> > with a host value, e.g. due to take a #PF due instrumentation triggering something.
> >
> > All that said, I did forget that code that runs between guest_state_enter_irqoff()
> > and guest_state_exit_irqoff() can't be instrumeneted. And at least as of patch 2
> > in this series, the simplest way to make that happen is to tag tdx_vcpu_enter_exit()
> > as noinstr. Just please make sure nothing else is added in the noinstr section
> > unless it absolutely needs to be there.
>
> If NMI is not a concern, is below also an option?
No, because instrumentation needs to be prohibited for the entire time between
guest_state_enter_irqoff() and guest_state_exit_irqoff().
> guest_state_enter_irqoff();
>
> instructmentation_begin();
> tdh_vp_enter();
> instructmentation_end();
>
> guest_state_exit_irqoff();
>
next prev parent reply other threads:[~2024-11-25 22:51 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-21 20:14 [PATCH 0/7] KVM: TDX: TD vcpu enter/exit Adrian Hunter
2024-11-21 20:14 ` [PATCH RFC 1/7] x86/virt/tdx: Add SEAMCALL wrapper to enter/exit TDX guest Adrian Hunter
2024-11-22 11:10 ` Adrian Hunter
2024-11-22 16:33 ` Dave Hansen
2024-11-25 13:40 ` Adrian Hunter
2024-11-28 11:13 ` Adrian Hunter
2024-12-04 15:58 ` Adrian Hunter
2024-12-11 18:43 ` Adrian Hunter
2024-12-13 15:45 ` Adrian Hunter
2024-12-13 16:16 ` Dave Hansen
2024-12-13 16:30 ` Adrian Hunter
2024-12-13 16:44 ` Dave Hansen
2024-11-22 16:26 ` Dave Hansen
2024-11-22 17:29 ` Edgecombe, Rick P
2024-11-25 13:43 ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 2/7] KVM: TDX: Implement TDX vcpu enter/exit path Adrian Hunter
2024-11-22 5:23 ` Xiaoyao Li
2024-11-22 5:56 ` Binbin Wu
2024-11-22 14:33 ` Adrian Hunter
2024-11-28 5:56 ` Yan Zhao
2024-11-28 6:26 ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 3/7] KVM: TDX: vcpu_run: save/restore host state(host kernel gs) Adrian Hunter
2024-11-25 14:12 ` Nikolay Borisov
2024-11-26 16:15 ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD Adrian Hunter
2024-11-22 5:49 ` Chao Gao
2024-11-25 11:10 ` Adrian Hunter
2024-11-26 2:20 ` Chao Gao
2024-11-28 6:50 ` Adrian Hunter
2024-12-02 2:52 ` Chao Gao
2024-12-02 6:36 ` Adrian Hunter
2024-12-17 16:09 ` Sean Christopherson
2024-12-20 15:22 ` Adrian Hunter
2024-12-20 16:22 ` Sean Christopherson
2024-12-20 21:24 ` PKEY syscall number for selftest? (was: [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD) Sean Christopherson
2025-01-27 17:09 ` Sean Christopherson
2025-01-03 18:16 ` [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD Adrian Hunter
2025-01-09 19:11 ` Sean Christopherson
2025-01-10 14:50 ` Adrian Hunter
2025-01-10 17:30 ` Sean Christopherson
2025-01-14 20:04 ` Adrian Hunter
2025-01-15 2:28 ` Sean Christopherson
2025-01-13 19:28 ` Adrian Hunter
2025-01-13 23:47 ` Sean Christopherson
2024-11-25 11:34 ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 5/7] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr Adrian Hunter
2024-11-21 20:14 ` [PATCH 6/7] KVM: TDX: restore user ret MSRs Adrian Hunter
2024-11-21 20:14 ` [PATCH 7/7] KVM: TDX: Add TSX_CTRL msr into uret_msrs list Adrian Hunter
2024-11-22 3:27 ` Chao Gao
2024-11-27 14:00 ` Sean Christopherson
2024-11-29 11:39 ` Adrian Hunter
2024-12-02 19:07 ` Sean Christopherson
2024-12-02 19:24 ` Edgecombe, Rick P
2024-12-03 0:34 ` Sean Christopherson
2024-12-03 17:34 ` Edgecombe, Rick P
2024-12-03 19:17 ` Adrian Hunter
2024-12-04 1:25 ` Chao Gao
2024-12-04 6:18 ` Adrian Hunter
2024-12-04 6:37 ` Chao Gao
2024-12-04 6:57 ` Adrian Hunter
2024-12-04 11:13 ` Chao Gao
2024-12-04 11:55 ` Adrian Hunter
2024-12-04 15:33 ` Xiaoyao Li
2024-12-04 23:51 ` Edgecombe, Rick P
2024-12-05 17:31 ` Adrian Hunter
2024-12-06 3:37 ` Xiaoyao Li
2024-12-06 14:40 ` Adrian Hunter
2024-12-09 2:46 ` Xiaoyao Li
2024-12-09 7:08 ` Adrian Hunter
2024-12-10 2:45 ` Xiaoyao Li
2024-12-04 23:40 ` Edgecombe, Rick P
2024-11-25 1:25 ` [PATCH 0/7] KVM: TDX: TD vcpu enter/exit Binbin Wu
2024-11-25 15:19 ` Sean Christopherson
2024-11-25 19:50 ` Huang, Kai
2024-11-25 22:51 ` Sean Christopherson [this message]
2024-11-26 1:43 ` Huang, Kai
2024-11-26 1:44 ` Binbin Wu
2024-11-26 3:52 ` Huang, Kai
2024-11-26 5:29 ` Binbin Wu
2024-11-26 5:37 ` Huang, Kai
2024-11-26 21:41 ` Sean Christopherson
2024-12-10 18:23 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z0T_iPdmtpjrc14q@google.com \
--to=seanjc@google.com \
--cc=adrian.hunter@intel.com \
--cc=binbin.wu@linux.intel.com \
--cc=chao.gao@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dmatlack@google.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nik.borisov@suse.com \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=tony.lindgren@linux.intel.com \
--cc=weijiang.yang@intel.com \
--cc=x86@kernel.org \
--cc=xiaoyao.li@intel.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.