From: Sean Christopherson <seanjc@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
x86@kernel.org, linux-kernel@vger.kernel.org,
pbonzini@redhat.com, erdemaktas@google.com,
ackerleytng@google.com, jxgao@google.com, sagis@google.com,
oupton@google.com, pgonda@google.com, kirill@shutemov.name,
dave.hansen@linux.intel.com, linux-coco@lists.linux.dev,
chao.p.peng@linux.intel.com, isaku.yamahata@gmail.com
Subject: Re: [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt
Date: Tue, 28 Jan 2025 16:45:35 -0800 [thread overview]
Message-ID: <Z5l6L3Hen9_Y3SGC@google.com> (raw)
In-Reply-To: <CAGtprH_1hc35um_47n=TMJzXAvpa7LY6kyXUafjJt3sPCosC_w@mail.gmail.com>
On Tue, Jan 28, 2025, Vishal Annapurve wrote:
> On Tue, Jan 28, 2025 at 2:08 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > Do you have any thoughts on why nobody has hit this up to now? Are TDX
> > users not enabling PARAVIRT_XXL? Not using ACPI?
>
> This has been a long-standing issue which would show up visibly with
> certain workloads where vcpus hit idle loop quite often during the
> runtime. I was only able to recently spend some time towards
> understanding the cause properly.
...
> > > @@ -1083,6 +1089,15 @@ void __init tdx_early_init(void)
> > > x86_platform.guest.enc_kexec_begin = tdx_kexec_begin;
> > > x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
> > >
> > > +#ifdef CONFIG_PARAVIRT_XXL
> > > + /*
> > > + * halt instruction execution is not atomic for TDX VMs as it generates
> > > + * #VEs, so otherwise "safe" halt invocations which cause interrupts to
> > > + * get enabled right after halt instruction don't work for TDX VMs.
> > > + */
> > > + pv_ops.irq.safe_halt = tdx_safe_halt;
> > > +#endif
> > The basic bug here was that there was a path to a hlt instruction that
> > folks didn't realize. This patch fixes the basic bug and gives us a nice
> > warning if there are additional paths that weren't imagined.
> >
> > But it doesn't really help us audit the code to make it clear that TDX
> > guest kernel's _can't_ screw up hlt again the same way. This, for
> > instance would make it pretty clear:
> >
> > static __always_inline void native_safe_halt(void)
> > {
> > if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> > tdx_safe_halt();
This incorrectly assumes the hypervisor is intercepting HLT. If the VM is given
a slice of hardware, HLT-exiting may be disabled, in which case it's desirable
for the guest to natively execute HLT, as the latencies to get in and out of "HLT"
are lower, especially for TDX guests. Such a VM would hopefully have MONITOR/MWAIT
available as well, but even if that were the case, the admin could select HLT for
idling.
Ugh, and I see that bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
overrides default_idle(). The kernel really shouldn't do that, because odds are
decent that any TDX guest will have direct access to HLT. The best approach I
can think of would be to patch x86_idle() to tdx_safe_halt() if and only if a HLT
#VE is taken. The tricky part would be delaying the update until it's safe to do
so.
As for taking a #VE, the exception itself is fine (assuming the kernel isn't off
the rails and using a trap gate :-D). The issue is likely that RFLAGS.IF=1 on
the stack, and so the call to cond_local_irq_enable() enables IRQs before making
the hypercall. E.g. no one has complained about #VC, because exc_vmm_communication()
doesn't enable IRQs.
Off the top of my head, I can't think of any flows that would do HLT with IRQs
fully enabled. Even PV spinlocks use safe_halt(), e.g. in kvm_wait(), so I don't
think there's any value in trying to precisely identify that it's a safe HLT?
E.g. this should fix the immediate problem, and then ideally someone would make
TDX guests play nice with native HLT.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 2dbadf347b5f..c60659468894 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -78,6 +78,8 @@
#include <asm/proto.h>
+#include <uapi/asm/vmx.h>
+
DECLARE_BITMAP(system_vectors, NR_VECTORS);
__always_inline int is_valid_bugaddr(unsigned long addr)
@@ -1424,7 +1426,14 @@ DEFINE_IDTENTRY(exc_virtualization_exception)
*/
tdx_get_ve_info(&ve);
- cond_local_irq_enable(regs);
+ /*
+ * Don't enable IRQs on #VE due to HLT, as the HLT was likely executed
+ * in an STI-shadow, e.g. by safe_halt(). For safe HLT, IRQs need to
+ * remain disabled until the TDCALL to request HLT emulation, so that
+ * pending IRQs are correctly treated as wake events.
+ */
+ if (ve.exit_reason != EXIT_REASON_HLT)
+ cond_local_irq_enable(regs);
/*
* If tdx_handle_virt_exception() could not process
@@ -1433,7 +1442,8 @@ DEFINE_IDTENTRY(exc_virtualization_exception)
if (!tdx_handle_virt_exception(regs, &ve))
ve_raise_fault(regs, 0, ve.gla);
- cond_local_irq_disable(regs);
+ if (ve.exit_reason != EXIT_REASON_HLT)
+ cond_local_irq_disable(regs);
}
#endif
next prev parent reply other threads:[~2025-01-29 0:45 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 21:36 [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt Vishal Annapurve
2025-01-28 22:08 ` Dave Hansen
2025-01-28 23:10 ` Vishal Annapurve
2025-01-29 0:45 ` Sean Christopherson [this message]
2025-01-29 12:10 ` Kirill A. Shutemov
2025-01-29 13:55 ` Sean Christopherson
2025-01-29 14:00 ` Sean Christopherson
2025-01-29 18:35 ` Vishal Annapurve
2025-01-29 10:37 ` Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5l6L3Hen9_Y3SGC@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=chao.p.peng@linux.intel.com \
--cc=dave.hansen@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=erdemaktas@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=jxgao@google.com \
--cc=kirill@shutemov.name \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=oupton@google.com \
--cc=pbonzini@redhat.com \
--cc=pgonda@google.com \
--cc=sagis@google.com \
--cc=vannapurve@google.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox