From: "Huang, Kai" <kai.huang@intel.com>
To: "seanjc@google.com" <seanjc@google.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"ashish.kalra@amd.com" <ashish.kalra@amd.com>,
"Hansen, Dave" <dave.hansen@intel.com>,
"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
"kas@kernel.org" <kas@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"dwmw@amazon.co.uk" <dwmw@amazon.co.uk>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"Chatre, Reinette" <reinette.chatre@intel.com>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"nik.borisov@suse.com" <nik.borisov@suse.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"hpa@zytor.com" <hpa@zytor.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"sagis@google.com" <sagis@google.com>,
"Chen, Farrah" <farrah.chen@intel.com>,
"bp@alien8.de" <bp@alien8.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
"Gao, Chao" <chao.gao@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v6 7/7] KVM: TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
Date: Thu, 14 Aug 2025 22:19:10 +0000
Message-ID: <d2e33db367b503dde2f342de3cedb3b8fa29cc42.camel@intel.com>
In-Reply-To: <aJ4kWcuyNIpCnaXE@google.com>
On Thu, 2025-08-14 at 11:00 -0700, Sean Christopherson wrote:
> On Thu, Aug 14, 2025, Rick P Edgecombe wrote:
> > On Thu, 2025-08-14 at 06:54 -0700, Sean Christopherson wrote:
> > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > > index 66744f5768c8..1bc6f52e0cd7 100644
> > > > --- a/arch/x86/kvm/vmx/tdx.c
> > > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > > @@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void)
> > > > tdx_flush_vp(&arg);
> > > > }
> > > > local_irq_restore(flags);
> > > > +
> > > > + /*
> > > > + * No more TDX activity on this CPU from here. Flush cache to
> > > > + * avoid having to do WBINVD in stop_this_cpu() during kexec.
> > > > + *
> > > > + * Kexec calls native_stop_other_cpus() to stop remote CPUs
> > > > + * before booting to new kernel, but that code has a "race"
> > > > + * when the normal REBOOT IPI times out and NMIs are sent to
> > > > + * remote CPUs to stop them. Doing WBINVD in stop_this_cpu()
> > > > + * could potentially increase the possibility of the "race".
>
> Why is that race problematic? The changelog just says
>
> : However, the native_stop_other_cpus() and stop_this_cpu() have a "race"
> : which is extremely rare to happen but could cause the system to hang.
> : even
> : Specifically, the native_stop_other_cpus() firstly sends normal reboot
> : IPI to remote CPUs and waits one second for them to stop. If that times
> : out, native_stop_other_cpus() then sends NMIs to remote CPUs to stop
> : them.
>
> without explaining how that can cause a system hang.
Thanks for the review, Sean.
The race is that the kexec-ing CPU could jump to the second kernel before the
other CPUs have fully stopped.
In patch 3 I added a link in the changelog that explains the race:
https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/
Please see "[*] The "race" in native_stop_other_cpus()" part.
I will put the link in the changelog of this patch too.
>
> > > > + */
> > > > + tdx_cpu_flush_cache();
> > >
> > > IIUC, this can be:
> > >
> > > > 	if (IS_ENABLED(CONFIG_KEXEC))
> > > > 		tdx_cpu_flush_cache();
> > >
> >
> > No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST
> > kernels will be the minority. Seems like an opportunity to simplify the code.
>
> Reducing the number of lines of code is not always a simplification. IMO, not
> checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment
> (and/or the massive changelog) will be left wondering why there's a bunch of
> documentation that talks about kexec, but no hint of kexec considerations in the
> code.
I think we can use 'kexec_in_progress', which is even better than
IS_ENABLED(CONFIG_KEXEC) IMHO.
When CONFIG_KEXEC is on, 'kexec_in_progress' is only set when kexec is
actually happening, so tdx_cpu_flush_cache() will only be called for
kexec. When CONFIG_KEXEC (CONFIG_KEXEC_CORE) is off, 'kexec_in_progress'
is a macro defined to false, so I suppose the compiler can optimize the
call out in that case too.
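Roughly what I have in mind, as a userspace sketch rather than the actual
patch. 'kexec_in_progress' and tdx_cpu_flush_cache() are the names from this
thread, but both are mocked here; 'nr_flushes' and
disable_virtualization_cpu_tail() are hypothetical names used only for
illustration. Build with -DCONFIG_KEXEC_CORE to model a kexec-capable kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-ins, NOT the kernel code.  The #ifdef mirrors how
 * 'kexec_in_progress' is either a real variable or a constant.
 */
#ifdef CONFIG_KEXEC_CORE
bool kexec_in_progress;			/* set once kexec actually starts */
#else
#define kexec_in_progress false		/* constant, so the branch can fold away */
#endif

int nr_flushes;				/* hypothetical counter, demo only */

/* Mock of the WBINVD-based helper; it only counts calls. */
void tdx_cpu_flush_cache(void)
{
	nr_flushes++;
}

/* Hypothetical tail of tdx_disable_virtualization_cpu() with the check. */
void disable_virtualization_cpu_tail(void)
{
	/* No more TDX activity on this CPU; flush only when kexec-ing. */
	if (kexec_in_progress)
		tdx_cpu_flush_cache();
}
```

With the variable form the flush only runs once 'kexec_in_progress' has been
set. With the macro form the condition is a compile-time constant, so the
call is dead code, and the kexec consideration still stays visible at the
call site.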
Any comments?