From: Chao Gao <chao.gao@intel.com>
To: Kai Huang <kai.huang@intel.com>
Cc: <dave.hansen@intel.com>, <bp@alien8.de>, <tglx@linutronix.de>,
<peterz@infradead.org>, <mingo@redhat.com>, <hpa@zytor.com>,
<thomas.lendacky@amd.com>, <x86@kernel.org>, <kas@kernel.org>,
<rick.p.edgecombe@intel.com>, <dwmw@amazon.co.uk>,
<linux-kernel@vger.kernel.org>, <pbonzini@redhat.com>,
<seanjc@google.com>, <kvm@vger.kernel.org>,
<reinette.chatre@intel.com>, <isaku.yamahata@intel.com>,
<dan.j.williams@intel.com>, <ashish.kalra@amd.com>,
<nik.borisov@suse.com>, <sagis@google.com>,
"Farrah Chen" <farrah.chen@intel.com>,
Binbin Wu <binbin.wu@linux.intel.com>
Subject: Re: [PATCH v5 7/7] KVM: TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
Date: Fri, 1 Aug 2025 16:30:58 +0800
Message-ID: <aIx7Qlpi1Y/VsRVY@intel.com>
In-Reply-To: <c29f7a3348a95f687c83ac965ebc92ff5f253e87.1753679792.git.kai.huang@intel.com>
On Tue, Jul 29, 2025 at 12:28:41AM +1200, Kai Huang wrote:
>On TDX platforms, during kexec, the kernel needs to make sure there are
>no dirty cachelines of TDX private memory before booting to the new
>kernel to avoid silent memory corruption to the new kernel.
>
>During kexec, the kexec-ing CPU first invokes native_stop_other_cpus()
>to stop all remote CPUs before booting to the new kernel. The remote
>CPUs will then execute stop_this_cpu() to stop themselves.
>
>The kernel has a percpu boolean to indicate whether the cache of a CPU
>may be in an incoherent state. In stop_this_cpu(), the kernel does WBINVD
>if that percpu boolean is true.
>
>TDX turns on that percpu boolean on a CPU when the kernel does SEAMCALL.
>This makes sure the caches will be flushed during kexec.
>
>However, native_stop_other_cpus() and stop_this_cpu() have a "race"
>that is extremely rare but could cause the system to hang.
>
>Specifically, native_stop_other_cpus() first sends a normal reboot
>IPI to remote CPUs and waits one second for them to stop. If that times
>out, native_stop_other_cpus() then sends NMIs to remote CPUs to stop
>them.
>
>The aforementioned race happens when NMIs are sent. Doing WBINVD in
>stop_this_cpu() makes each CPU take longer to stop and increases
>the chance of the race happening.
>
>Explicitly flush the cache in tdx_disable_virtualization_cpu(), after
>which no more TDX activity can happen on this CPU. This moves the WBINVD
>to an earlier stage than stop_this_cpu(), avoiding a possibly lengthy
>operation at a time where it could cause this race.
>
>Signed-off-by: Kai Huang <kai.huang@intel.com>
>Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>Tested-by: Farrah Chen <farrah.chen@intel.com>
>Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Flushing cache after disabling virtualization looks clean. So,
Reviewed-by: Chao Gao <chao.gao@intel.com>
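
For readers following the thread, the ordering argument in the quoted description can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel code: every name below (NR_CPUS_MODEL, cache_incoherent[], model_wbinvd(), model_seamcall(), model_disable_virtualization_cpu(), model_stop_this_cpu()) is hypothetical, a plain array stands in for the percpu boolean, and a counter stands in for the real WBINVD instruction. It also assumes the flag is cleared once the cache has been flushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU count for this model only. */
#define NR_CPUS_MODEL 4

/* Stand-in for the percpu "cache may be incoherent" boolean. */
static bool cache_incoherent[NR_CPUS_MODEL];

/* Counts how many times each modeled CPU "executed WBINVD". */
static int wbinvd_count[NR_CPUS_MODEL];

/* Stand-in for WBINVD: record the flush and mark the cache coherent
 * (clearing the flag here is an assumption of this model). */
static void model_wbinvd(int cpu)
{
	wbinvd_count[cpu]++;
	cache_incoherent[cpu] = false;
}

/* Any SEAMCALL marks this CPU's cache as possibly incoherent. */
static void model_seamcall(int cpu)
{
	cache_incoherent[cpu] = true;
}

/* Flush only if needed; returns true when a flush was performed. */
static bool model_flush_if_incoherent(int cpu)
{
	if (!cache_incoherent[cpu])
		return false;
	model_wbinvd(cpu);
	return true;
}

/* Early point: no more TDX activity is expected on this CPU after
 * virtualization is disabled, so the flush is done here. */
static void model_disable_virtualization_cpu(int cpu)
{
	model_flush_if_incoherent(cpu);
}

/* Late point: the stop path still checks the flag as a fallback.
 * Returns true if the lengthy flush had to run in this path. */
static bool model_stop_this_cpu(int cpu)
{
	return model_flush_if_incoherent(cpu);
}
```

In this model, a CPU that made a SEAMCALL and then went through model_disable_virtualization_cpu() reaches model_stop_this_cpu() with the flag already clear, so the stop path does not pay for the flush. A CPU that skipped the early flush still gets it in the stop path.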
Thread overview: 20+ messages
2025-07-28 12:28 [PATCH v5 0/7] TDX host: kexec/kdump support Kai Huang
2025-07-28 12:28 ` [PATCH v5 1/7] x86/kexec: Consolidate relocate_kernel() function parameters Kai Huang
2025-08-06 6:53 ` Huang, Kai
2025-08-06 13:00 ` Tom Lendacky
2025-08-06 22:29 ` Huang, Kai
2025-07-28 12:28 ` [PATCH v5 2/7] x86/sme: Use percpu boolean to control WBINVD during kexec Kai Huang
2025-07-28 12:28 ` [PATCH v5 3/7] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL Kai Huang
2025-08-01 8:23 ` Chao Gao
2025-08-04 12:47 ` Huang, Kai
2025-08-12 0:51 ` Edgecombe, Rick P
2025-08-12 1:32 ` Huang, Kai
2025-08-12 1:34 ` Edgecombe, Rick P
2025-08-12 2:03 ` Huang, Kai
2025-08-14 0:09 ` Huang, Kai
2025-07-28 12:28 ` [PATCH v5 4/7] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum Kai Huang
2025-07-28 12:28 ` [PATCH v5 5/7] x86/virt/tdx: Remove the !KEXEC_CORE dependency Kai Huang
2025-07-28 12:28 ` [PATCH v5 6/7] x86/virt/tdx: Update the kexec section in the TDX documentation Kai Huang
2025-07-28 12:28 ` [PATCH v5 7/7] KVM: TDX: Explicitly do WBINVD when no more TDX SEAMCALLs Kai Huang
2025-08-01 8:30 ` Chao Gao [this message]
2025-08-04 12:48 ` Huang, Kai