From: Kai Huang <kai.huang@intel.com>
To: dave.hansen@linux.intel.com, seanjc@google.com,
pbonzini@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com
Cc: tglx@kernel.org, bp@alien8.de, mingo@redhat.com, x86@kernel.org,
hpa@zytor.com, linux-kernel@vger.kernel.org,
Kai Huang <kai.huang@intel.com>,
stable@vger.kernel.org, Vishal Verma <vishal.l.verma@intel.com>,
Nikolay Borisov <nik.borisov@suse.com>
Subject: [PATCH v3] x86/virt/tdx: Fix lockdep assertion failure in cache flush for kexec
Date: Wed, 8 Apr 2026 11:33:33 +1200
Message-ID: <20260407233333.1608820-1-kai.huang@intel.com>
TDX can leave the cache in an incoherent state for the memory it uses.
During kexec, the kernel does a WBINVD on all CPUs that may have an
incoherent cache before memory gets reused in the second kernel.
The kernel tracks the need for the kexec-time WBINVD generically (i.e.,
not just for TDX) in a per-cpu boolean. The TDX core also provides a
function, tdx_cpu_flush_cache_for_kexec(), to do the WBINVD earlier so
that the kexec-time WBINVD can be skipped. This function manipulates
the per-cpu boolean, and it has a lockdep_assert_preemption_disabled()
to check that the context cannot be preempted.
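The flag handling described above can be sketched as a small userspace
model. This is an illustration only, not kernel code: the per-cpu boolean
is modeled as a plain array indexed by a hypothetical `cpu` argument,
model_wbinvd() merely counts calls instead of executing WBINVD, and all
model_* names are made up for this sketch.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Models the per-cpu boolean: true if this CPU's cache may be incoherent. */
static bool model_cache_state_incoherent[MODEL_NR_CPUS];

/* Counts modeled WBINVD executions per CPU (stand-in for the instruction). */
static int model_wbinvd_count[MODEL_NR_CPUS];

void model_wbinvd(int cpu)
{
	model_wbinvd_count[cpu]++;
}

/* Models TDX activity on a CPU marking its cache as possibly incoherent. */
void model_mark_incoherent(int cpu)
{
	model_cache_state_incoherent[cpu] = true;
}

/* Models the early flush: flush only if needed, then clear the flag. */
void model_flush_cache_for_kexec(int cpu)
{
	if (!model_cache_state_incoherent[cpu])
		return;

	model_wbinvd(cpu);
	model_cache_state_incoherent[cpu] = false;
}

/* Models the kexec-time pass: returns true if a WBINVD was still needed. */
bool model_kexec_time_flush(int cpu)
{
	if (!model_cache_state_incoherent[cpu])
		return false;

	model_wbinvd(cpu);
	model_cache_state_incoherent[cpu] = false;
	return true;
}
```

In this model, a CPU that already ran the early flush has a clear flag, so
the kexec-time pass finds nothing left to do on it.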
This function is called from both KVM's syscore ops shutdown() handler
(which is called at an early stage in the kexec path) and the module
unload path. The lockdep assert passes in the kexec path since the
shutdown() handler calls that function in IRQ context. However, the
module unload path invokes it via the CPUHP callback which runs that
function in a per-cpu thread with preemption enabled. This triggers the
lockdep warning when unloading the KVM module:
  IS_ENABLED(CONFIG_PREEMPT_COUNT) && __lockdep_enabled && (preempt_count() == 0 && this_cpu_read(hardirqs_enabled))
  WARNING: arch/x86/virt/vmx/tdx/tdx.c:1875 at tdx_cpu_flush_cache_for_kexec+0x36/0x60, CPU#0: cpuhp/0/22
  ...
  Call Trace:
   <TASK>
   vt_disable_virtualization_cpu+0x1c/0x30 [kvm_intel]
   kvm_arch_disable_virtualization_cpu+0x12/0x80 [kvm]
   kvm_offline_cpu+0x24/0x40 [kvm]
   cpuhp_invoke_callback+0x1b0/0x740
   ...
The lockdep warning is actually a false positive, though, because the
CPUHP callback guarantees that the function runs on the same CPU even
when preemption is enabled. In other words, the lockdep assert of
preemption being disabled is overkill.
Remove the overkill lockdep_assert_preemption_disabled(), and change
this_cpu_{read|write}() to __this_cpu_{read|write}(), which provide a
more appropriate check when CONFIG_DEBUG_PREEMPT is enabled: they warn
only if the context could actually be moved to another CPU in the
middle of the operation.
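The difference between the two checks can be sketched as a simplified
userspace model. This is an illustration under stated assumptions, not
kernel code: struct model_ctx and all model_* names are hypothetical, the
lockdep condition follows the expression printed in the warning above,
and the list of conditions accepted by the CONFIG_DEBUG_PREEMPT check
reflects my understanding of the kernel's debug helper, with its early
boot exemption omitted. The kexec shutdown path is modeled as running
with interrupts disabled, which is also an assumption.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the execution context, for illustration only. */
struct model_ctx {
	int preempt_count;	/* non-zero: preemption disabled */
	bool irqs_disabled;
	bool percpu_thread;	/* kthread bound to exactly one CPU */
	bool migration_disabled;
};

/* Models lockdep_assert_preemption_disabled(): true means "would WARN". */
bool model_lockdep_assert_warns(const struct model_ctx *c)
{
	return c->preempt_count == 0 && !c->irqs_disabled;
}

/* Models the CONFIG_DEBUG_PREEMPT check: true means "would complain". */
bool model_debug_preempt_warns(const struct model_ctx *c)
{
	if (c->preempt_count)
		return false;
	if (c->irqs_disabled)
		return false;
	if (c->percpu_thread)		/* cannot run on any other CPU */
		return false;
	if (c->migration_disabled)
		return false;
	return true;
}

/* kexec path: modeled as interrupts disabled (assumption). */
const struct model_ctx model_kexec_shutdown = { .irqs_disabled = true };

/* Module unload path: CPUHP per-cpu thread, preemption and IRQs enabled. */
const struct model_ctx model_cpuhp_thread = { .percpu_thread = true };

/* Ordinary preemptible task that can migrate between CPUs. */
const struct model_ctx model_plain_task = { 0 };
```

In this model the lockdep-style check flags the CPUHP thread even though
it cannot migrate, while the DEBUG_PREEMPT-style check accepts it and
still flags an ordinary preemptible task.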
Fixes: 61221d07e815 ("KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs")
Cc: stable@vger.kernel.org
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
 - Add Kirill's tag (Thanks!)
 - Address Dave's comments (provided in internal chat) about the changelog
   being not concise enough to get the main points:

   1) The code manipulates a per-cpu variable
   2) There is a preempt disable check for #1
   3) The preempt check passes in the kexec path, but WARN()s in the "KVM unload"
   4) The KVM unload path is actually OK so #3 is a false positive

v2: https://lore.kernel.org/lkml/20260312100009.924136-1-kai.huang@intel.com/
---
arch/x86/virt/vmx/tdx/tdx.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 8b8e165a2001..6f6be1df4b78 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1872,9 +1872,7 @@ EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
 #ifdef CONFIG_KEXEC_CORE
 void tdx_cpu_flush_cache_for_kexec(void)
 {
-	lockdep_assert_preemption_disabled();
-
-	if (!this_cpu_read(cache_state_incoherent))
+	if (!__this_cpu_read(cache_state_incoherent))
 		return;
 
 	/*
@@ -1883,7 +1881,7 @@ void tdx_cpu_flush_cache_for_kexec(void)
 	 * there should be no more SEAMCALLs on this CPU.
 	 */
 	wbinvd();
-	this_cpu_write(cache_state_incoherent, false);
+	__this_cpu_write(cache_state_incoherent, false);
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_cpu_flush_cache_for_kexec);
 #endif
base-commit: 87d034b5b9f36c66bf02af587fb6935af88ffbf1
--
2.53.0