public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "seanjc@google.com" <seanjc@google.com>,
	"Nowicki, Robert" <robert.nowicki@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Swierszcz, Igor" <Igor.Swierszcz@intel.com>
Subject: Re: [PATCH] x86/tdx, KVM: fix HKID leak when kexec is initiated with active TDs
Date: Wed, 22 Apr 2026 14:29:08 +0000	[thread overview]
Message-ID: <6aa7f5e57ffa05cb068956c643d9e2be7630b285.camel@intel.com> (raw)
In-Reply-To: <aejJup_I4wM0Rb93@google.com>

On Wed, 2026-04-22 at 06:14 -0700, Sean Christopherson wrote:
> On Wed, Apr 22, 2026, Robert Nowicki wrote:
> > When kexec is initiated while TDs are running, vCPU threads can be
> > mid-TDH.VP.ENTER on other CPUs when tdx_shutdown() fires. The TDX
> > module rejects TDH.MNG.VPFLUSHDONE for a VP in RUNNING state, leaving
> > the HKID in a leaked state:
> > 
> >    kvm_intel: tdh_mng_vpflushdone() failed. HKID 33 is leaked.
> > 
> > Fix this by introducing a quiescing flag set at the start of
> > tdx_shutdown(). KVM's tdx_vcpu_run() checks the flag and returns
> > EXIT_FASTPATH_NONE before attempting TDH.VP.ENTER. After setting the
> > flag, tdx_shutdown() calls on_each_cpu(tdx_seam_sync) with wait=1 to
> > ensure any CPU currently inside TDH.VP.ENTER has exited SEAM before
> > tdx_sys_disable() is called.
> > 
> > Fixes: 58171ae22e11 ("x86/tdx: Disable the TDX module during kexec and
> > kdump")
> 
> Please don't post seemingly standalone patches for code that hasn't yet been
> merged, it's quite confusing.

+1. Robert, we try to coordinate public Linux TDX work internally before posting
because there is so much of it, it gets confusing to community/maintainers.
Please check in with the Linux TDX developers before posing TDX patches so we
can have a cohesive effort.

> 
> >   u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
> >   u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
> > @@ -206,6 +207,7 @@ static inline u32 tdx_get_nr_guest_keyids(void) { return
> > 0; }
> >   static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL;
> > }
> >   static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return
> > NULL; }
> >   static inline void tdx_sys_disable(void) { }
> > +static inline bool tdx_kexec_quiescing(void) { return false; }
> >   #endif	/* CONFIG_INTEL_TDX_HOST */
> >   
> >   #endif /* !__ASSEMBLER__ */
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 50a5cfdbd33e..2d658db7700d 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1053,6 +1053,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64
> > run_flags)
> >   	struct vcpu_tdx *tdx = to_tdx(vcpu);
> >   	struct vcpu_vt *vt = to_vt(vcpu);
> >   
> > +	if (unlikely(tdx_kexec_quiescing()))

There is essentially an existing kexec race, where vmxoff happens when SEAMCALLs
could still happen. It goes back to the first TDX kexec support (i.e. not
introduced by vmxon refactor). VMX KVM has some spurious logic to handle
something similar for normal VMs, but TDX doesn't. 

I don't see why this TDH.MNG.VPFLUSHDONE case is special. If the TDX module is
shutdown and the old kernel is going away, how is anything leaked other than the
normal type of leakage that happens during kexec? So I think maybe this is just
the known vmxoff seamcall race, with the specific case observed generating a
message about leaking.

Also, not sure how handling VP.ENTER would prevent the VPFLUSHDONE call from
meeting an error and emitting the same message. If the TDX module is shutdown...

> 
> Requiring KVM to check a global on every entry is pretty ugly, especially
> since this is for a very rare scenario (in terms of number of entries).  And
> forcing KVM to do a CALL+RET to check an almost-never-set flag is especially
> ugly.
> 
> Why not handle this entirely in tdx_shutdown_cpu()?  E.g. have the last CPU
> through disable TDX, and hld all the CPUs hostage until that's done.  It's not
> the prettiest thing in the world, but it's entirely self-contained.
> 
> static void tdx_shutdown_cpu(void *__nr_cpus_remaining)
> {
> 	atomic_t *nr_cpus_remaining = __nr_cpus_remaining;
> 
> 	if (!atomic_add_unless(nr_cpus_remaining, -1, 1)) {
> 		tdx_sys_disable();
> 		atomic_set(nr_cpus_remaining, 0);
> 	}
> 
> 	x86_virt_put_ref(X86_FEATURE_VMX);
> 
> 	while (!atomic_read(nr_cpus_remaining))
> 		cpu_relax();
> }
> 
> static void tdx_shutdown(void *ign)
> {
> 	atomic_t nr_cpus_remaining = ATOMIC_INIT(num_online_cpus());
> 
> 	on_each_cpu(tdx_shutdown_cpu, &nr_cpus_remaining, 1);
> }

After vmxoff happens, the SEAMCALLs will just meet other errors. The wrappers
will morph the vmxoff condition into a SW error that much of the TDX code can't
handle either. So it doesn't help the problem I'm afraid.

It would be my preference to fix the existing issue separately than this series.
This series makes kexec way more functional for TDX, and the worst case AFAICT
is a splat in an otherwise successful kexec. So a non-critical and existing
problem.

Kai and I were previously kicking around some ideas about the general case
problem. It somehow missed our cleanup list, but I just added it.

      parent reply	other threads:[~2026-04-22 14:29 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-23 20:59 [PATCH v2 0/5] Fuller TDX kexec support Vishal Verma
2026-03-23 20:59 ` [PATCH v2 1/5] x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h> Vishal Verma
2026-03-24  9:49   ` Chao Gao
2026-03-31 19:30   ` Sean Christopherson
2026-03-31 21:46     ` Edgecombe, Rick P
2026-03-23 20:59 ` [PATCH v2 2/5] x86/virt/tdx: Pull kexec cache flush logic into arch/x86 Vishal Verma
2026-03-24 10:03   ` Chao Gao
2026-03-30 11:42   ` Kiryl Shutsemau
2026-03-31 19:22   ` Sean Christopherson
2026-03-31 22:21     ` Edgecombe, Rick P
2026-03-31 23:04       ` Sean Christopherson
2026-03-31 23:29         ` Edgecombe, Rick P
2026-04-01 15:03         ` Dave Hansen
2026-04-01 17:42           ` H. Peter Anvin
2026-04-01 18:12             ` Sean Christopherson
2026-04-01 18:30               ` Dave Hansen
2026-03-23 20:59 ` [PATCH v2 3/5] x86/virt/tdx: Add SEAMCALL wrapper for TDH.SYS.DISABLE Vishal Verma
2026-03-23 21:54   ` Verma, Vishal L
2026-03-23 22:40   ` Huang, Kai
2026-03-24 10:18   ` Chao Gao
2026-03-30 11:58   ` Kiryl Shutsemau
2026-03-30 19:25     ` Edgecombe, Rick P
2026-03-31 12:18       ` Kiryl Shutsemau
2026-03-31 18:22         ` Verma, Vishal L
2026-03-31 21:36           ` Edgecombe, Rick P
2026-04-01  9:26             ` Kiryl Shutsemau
2026-04-01 14:24             ` Dave Hansen
2026-03-23 20:59 ` [PATCH v2 4/5] x86/tdx: Disable the TDX module during kexec and kdump Vishal Verma
2026-03-23 22:41   ` Huang, Kai
2026-03-30 12:03   ` Kiryl Shutsemau
2026-03-23 20:59 ` [PATCH v2 5/5] x86/virt/tdx: Remove kexec docs Vishal Verma
2026-03-23 22:41   ` Huang, Kai
2026-03-30 12:04   ` Kiryl Shutsemau
2026-04-22 12:45 ` [PATCH] x86/tdx, KVM: fix HKID leak when kexec is initiated with active TDs Nowicki, Robert
2026-04-22 13:14   ` Sean Christopherson
     [not found]     ` <CH3PR11MB843450CBD154D42B31D15E93832D2@CH3PR11MB8434.namprd11.prod.outlook.com>
2026-04-22 13:34       ` Sean Christopherson
2026-04-22 14:29     ` Edgecombe, Rick P [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6aa7f5e57ffa05cb068956c643d9e2be7630b285.camel@intel.com \
    --to=rick.p.edgecombe@intel.com \
    --cc=Igor.Swierszcz@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=robert.nowicki@intel.com \
    --cc=seanjc@google.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox