public inbox for stable@vger.kernel.org
From: "Huang, Kai" <kai.huang@intel.com>
To: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kas@kernel.org" <kas@kernel.org>,
	"seanjc@google.com" <seanjc@google.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Cc: "bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"tglx@kernel.org" <tglx@kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>
Subject: Re: [PATCH] x86/virt/tdx: Fix lockdep assertion failure in cache flush for kexec
Date: Tue, 10 Mar 2026 07:19:17 +0000
Message-ID: <88b3637c84737136da1fe373cde43801845bd062.camel@intel.com>
In-Reply-To: <e762ca34d2e3f3555490e158cab82292c6122857.camel@intel.com>

On Mon, 2026-03-09 at 16:38 +0000, Edgecombe, Rick P wrote:
> On Mon, 2026-03-02 at 23:22 +1300, Kai Huang wrote:
> > TDX can leave the cache in an incoherent state for the memory it
> > uses. During kexec the kernel does a WBINVD for each CPU before
> > memory gets reused in the second kernel.
> > 
> > There were two considerations for where this WBINVD should happen.
> > To handle cases where the cache might become incoherent while the
> > kexec is in its initial stages, the flush needs to happen late in the
> > kexec path, when the kexecing CPU stops all remote CPUs.  However,
> > the later kexec process is sensitive to existing races, so to avoid
> > perturbing that operation, it is better to do it earlier.
> > 
> > The existing solution is to track the need for the kexec time WBINVD
> > generically (i.e., not just for TDX) in a per-cpu var.  The late
> > invocation only happens if the earlier TDX specific logic in
> > tdx_cpu_flush_cache_for_kexec() didn’t take care of the work.  This
> > earlier WBINVD logic was built into KVM’s existing syscore ops
> > shutdown() handler, which is called earlier in the kexec path.
> > 
> > However, this accidentally added it to KVM’s unload path as well
> > (also the "error path" when bringing up TDX during KVM module load),
> > which uses the same internal functions.  This makes some sense too,
> > though, because if KVM is getting unloaded, TDX cache affecting
> > operations will likely cease.  So it is a good point to do the work
> > before KVM is unloaded and won't have a chance to handle the shutdown
> > operation in the future.
> > 
> > Unfortunately this KVM unload invocation triggers a lockdep warning
> > in tdx_cpu_flush_cache_for_kexec().  Since
> > tdx_cpu_flush_cache_for_kexec() is doing WBINVD on a specific CPU, it
> > has an assert for preemption being disabled.  This works fine for the
> > kexec time invocation, but the KVM unload path calls this as part of
> > a CPUHP callback for which, despite always executing on the target
> > CPU, preemption is not disabled.
> > 
> > It might be better to add the earlier invocation logic to a dedicated
> > arch/x86 TDX syscore shutdown() handler, but to make the fix more
> > backport friendly, just adjust the lockdep assert in
> > tdx_cpu_flush_cache_for_kexec().
> > 
> > The real requirement is that tdx_cpu_flush_cache_for_kexec() runs
> > entirely on one CPU.  It is fine for it to be preempted in the middle
> > as long as it is not rescheduled onto another CPU.
> > 
> > Remove the overly strict lockdep_assert_preemption_disabled(), and
> > change this_cpu_{read|write}() to __this_cpu_{read|write}(), which
> > provide a more appropriate check (when CONFIG_DEBUG_PREEMPT is
> > enabled): they verify that the context cannot be migrated to another
> > CPU partway through.
> > 
> > Fixes: 61221d07e815 ("KVM/TDX: Explicitly do WBINVD when no more TDX
> > SEAMCALLs")
> > Cc: stable@vger.kernel.org
> > Reported-by: Vishal Verma <vishal.l.verma@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > Tested-by: Vishal Verma <vishal.l.verma@intel.com>
> 
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> 
> But this issue is also solved by:
> https://lore.kernel.org/kvm/20260307010358.819645-3-rick.p.edgecombe@intel.com/

That patch depends on Sean's series to move VMXON to x86 core, so it's not
stable-friendly.

> 
> I guess that these changes are correct in either case. There is no need
> for the stricter asserts. But depending on the order the log would be
> confusing in the history when it talks about lockdep warnings. So we'll
> have to keep an eye on things. If this goes first, then it's fine.

I see.  Will keep this in mind.
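
For readers without the patch at hand, the difference between the two checks
can be modeled in plain userspace C.  This is only a sketch under stated
assumptions: every type and helper below (ctx_model, strict_check_ok(),
pinned_check_ok(), flush_cache_model(), and so on) is hypothetical and is not
a kernel API.  It assumes the strict assertion accepts only "preemption or
IRQs disabled", that the CONFIG_DEBUG_PREEMPT check behind __this_cpu_*() also
accepts a per-CPU kthread or a migration-disabled task, and that the CPUHP
callback runs in such a per-CPU kthread.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of these names exist in the kernel. */
struct ctx_model {
	int  preempt_count;      /* non-zero: preemption is disabled */
	bool irqs_disabled;      /* interrupts are off on this CPU */
	bool percpu_thread;      /* kthread bound to exactly one CPU */
	bool migration_disabled; /* preemptible, but cannot change CPU */
};

/* Models the strict assertion: preemption (or IRQs) must be disabled. */
static bool strict_check_ok(const struct ctx_model *c)
{
	return c->preempt_count != 0 || c->irqs_disabled;
}

/* Models the relaxed check: any condition that pins the task passes. */
static bool pinned_check_ok(const struct ctx_model *c)
{
	return c->preempt_count != 0 || c->irqs_disabled ||
	       c->percpu_thread || c->migration_disabled;
}

#define NR_CPUS_MODEL 4

/* Stand-in for the per-CPU "cache may be incoherent" flag. */
static bool cache_incoherent_model[NR_CPUS_MODEL];
/* Counts simulated WBINVD invocations. */
static int wbinvd_calls_model;

/*
 * Models the flush: it requires only that the caller is pinned to `cpu`,
 * and it does nothing if an earlier call already handled this CPU.
 */
static void flush_cache_model(const struct ctx_model *c, int cpu)
{
	assert(pinned_check_ok(c));

	if (!cache_incoherent_model[cpu])
		return;

	wbinvd_calls_model++;                /* stands in for WBINVD */
	cache_incoherent_model[cpu] = false; /* later callers skip the flush */
}
```

In this model a context with only percpu_thread set fails strict_check_ok()
but passes pinned_check_ok(), which mirrors the KVM unload case described in
the changelog, and a second flush_cache_model() call on the same CPU does not
repeat the flush.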

> 
> You know, it might have helped to include the splat if you end up with
> a v2.

I thought the lockdep warning would be obvious even without the actual splat,
but fine, I can include the splat if a v2 is needed.

Hi Sean, Paolo, Kirill,

It would be good to merge this upstream and backport it to stable.  I'd
appreciate an ack if it looks good to you.  Thanks.

 


