linux-coco.lists.linux.dev archive mirror
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Kai Huang <kai.huang@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Chao Gao <chao.gao@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Farrah Chen <farrah.chen@intel.com>,
	Sasha Levin <sashal@kernel.org>,
	kas@kernel.org, isaku.yamahata@intel.com,
	alexandre.f.demers@gmail.com, thuth@redhat.com,
	vannapurve@google.com, adrian.hunter@intel.com, x86@kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org
Subject: [PATCH AUTOSEL 6.17] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
Date: Sat, 25 Oct 2025 11:59:19 -0400	[thread overview]
Message-ID: <20251025160905.3857885-328-sashal@kernel.org> (raw)
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>

From: Kai Huang <kai.huang@intel.com>

[ Upstream commit 10df8607bf1a22249d21859f56eeb61e9a033313 ]

On TDX platforms, dirty cacheline aliases with and without encryption
bits can coexist, and the CPU can flush them back to memory in random
order.  During kexec, the caches must be flushed before jumping to the
new kernel; otherwise the dirty cachelines could silently corrupt the
memory used by the new kernel because of their different encryption
properties.

A percpu boolean is used to mark whether the cache of a given CPU may be
in an incoherent state, and kexec performs WBINVD on the CPUs that have
that boolean set.

For TDX, only the TDX module or the TDX guests can generate dirty
cachelines of TDX private memory, i.e., they are only generated when the
kernel does a SEAMCALL.

Set that boolean when the kernel does SEAMCALL so that kexec can flush
the cache correctly.

The kernel provides both the __seamcall*() assembly functions and the
seamcall*() wrappers, which additionally retry in a loop when the TDX
module reports that it has run out of entropy.  Most SEAMCALLs are made
through seamcall*(); the exceptions are TDH.VP.ENTER and
TDH.PHYMEM.PAGE.RDMD, which call the __seamcall*() variants directly.

To cover the two special cases, add a new __seamcall_dirty_cache()
helper which only sets the percpu boolean and calls the __seamcall*(),
and change the special cases to use the new helper.  To cover all other
SEAMCALLs, change seamcall*() to call the new helper.

SEAMCALLs invoked via seamcall*() can be made from both task context and
IRQ-disabled context.  Given that a SEAMCALL is just a lengthy
instruction (e.g., thousands of cycles) from the kernel's point of view
and preempt_{disable|enable}() is cheap by comparison, just
unconditionally disable preemption while setting the boolean and making
the SEAMCALL.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Link: https://lore.kernel.org/all/20250901160930.1785244-4-pbonzini%40redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

Why this fixes a real bug
- TDX can leave dirty cachelines for private memory that alias the same
  memory with and without encryption bits. If kexec stops a CPU during a
  SEAMCALL, its dirty private cachelines can later be written back in an
  arbitrary order and silently corrupt the new kernel’s memory.
  Marking the CPU’s cache state as “incoherent” before executing
  SEAMCALL ensures kexec will WBINVD on that CPU and avoid corruption.

What changed (key points with code references)
- New helper marks per-CPU cache incoherent before any SEAMCALL:
  - arch/x86/include/asm/tdx.h:111 sets
    `this_cpu_write(cache_state_incoherent, true)` in
    `__seamcall_dirty_cache()` and asserts preemption is disabled (lines
    111–128).
- Wrap all `seamcall*()` paths with preemption-disabled critical
  section:
  - arch/x86/include/asm/tdx.h:130–147 uses
    `preempt_disable()/preempt_enable()` in `sc_retry()` so the same CPU
    that sets the flag executes the SEAMCALL, avoiding migration races.
- Convert special direct callers to use the new helper:
  - arch/x86/virt/vmx/tdx/tdx.c:1271 changes `paddr_is_tdx_private()` to
    call `__seamcall_dirty_cache(__seamcall_ret, TDH_PHYMEM_PAGE_RDMD,
    ...)`.
  - arch/x86/virt/vmx/tdx/tdx.c:1522 changes `tdh_vp_enter()` to call
    `__seamcall_dirty_cache(__seamcall_saved_ret, TDH_VP_ENTER, ...)`.
- Consumers of the per-CPU flag during kexec/CPU stop:
  - arch/x86/kernel/process.c:99 defines `cache_state_incoherent` and
    uses it in `stop_this_cpu()` to WBINVD if set
    (arch/x86/kernel/process.c:840).
  - arch/x86/kernel/machine_kexec_64.c:449 sets
    `RELOC_KERNEL_CACHE_INCOHERENT` when the per-CPU flag is set so
    `relocate_kernel_64.S` executes WBINVD (relocate path).
  - The TDX-specific flush routine will WBINVD and clear the flag if
    needed (arch/x86/virt/vmx/tdx/tdx.c:1872–1887).
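
The producer side described in the list above can be sketched as a small
userspace model. This is only an illustration of the ordering, not kernel
code: every `model_*` name, the retry budget, and the status value are
hypothetical stand-ins (the real code uses `this_cpu_write()`,
`preempt_disable()/preempt_enable()`, `lockdep_assert_preemption_disabled()`
and `TDX_RND_NO_ENTROPY`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4
/* Placeholder status code; NOT the real TDX_RND_NO_ENTROPY value. */
#define MODEL_RND_NO_ENTROPY 1ULL
#define MODEL_RETRIES 3 /* arbitrary retry budget for this model */

typedef uint64_t (*model_sc_func_t)(uint64_t fn, void *args);

static bool model_cache_incoherent[MODEL_NR_CPUS]; /* models the per-CPU boolean */
static int model_current_cpu;   /* CPU the modelled task runs on */
static int model_preempt_count; /* > 0 means "preemption disabled" */

/* Test double for a SEAMCALL: reports "no entropy" a set number of times. */
static int model_entropy_failures;
static int model_calls;
static bool model_flag_seen_during_call;

static uint64_t model_fake_seamcall(uint64_t fn, void *args)
{
	(void)fn;
	(void)args;
	model_calls++;
	/* Anything that stops the CPU at this point already finds the flag set. */
	model_flag_seen_during_call = model_cache_incoherent[model_current_cpu];
	if (model_entropy_failures > 0) {
		model_entropy_failures--;
		return MODEL_RND_NO_ENTROPY;
	}
	return 0;
}

/* Mark this CPU's cache incoherent first, then make the call. */
static uint64_t model_seamcall_dirty_cache(model_sc_func_t func, uint64_t fn,
					   void *args)
{
	assert(model_preempt_count > 0); /* stands in for the lockdep assertion */
	model_cache_incoherent[model_current_cpu] = true;
	return func(fn, args);
}

/* Retry wrapper: flag write and call share one preemption-disabled section. */
static uint64_t model_sc_retry(model_sc_func_t func, uint64_t fn, void *args)
{
	int retry = MODEL_RETRIES;
	uint64_t ret;

	do {
		model_preempt_count++; /* models preempt_disable() */
		ret = model_seamcall_dirty_cache(func, fn, args);
		model_preempt_count--; /* models preempt_enable() */
	} while (ret == MODEL_RND_NO_ENTROPY && --retry);

	return ret;
}
```

Calling `model_sc_retry(model_fake_seamcall, 0, NULL)` with
`model_current_cpu` set to some CPU marks only that CPU's entry in
`model_cache_incoherent`, and the fake SEAMCALL observes the flag as
already set while it runs.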

Why it’s safe to backport
- Scope-limited: touches only TDX host paths and the seamcall wrappers;
  no ABI or architectural changes.
- Minimal risk: setting a per-CPU boolean and wrapping SEAMCALLs with
  preempt disable. SEAMCALLs are long; added preemption control is
  negligible overhead and avoids CPU migration races.
- Correctness across contexts: SEAMCALLs can happen with IRQs disabled;
  the helper asserts preemption is off, and the wrappers explicitly
  ensure it. The two special direct-call sites run in contexts where
  IRQs are off or preemption is already disabled.
- Aligns with existing kexec logic: Stable trees already check
  `cache_state_incoherent` during CPU stop and relocation
  (arch/x86/kernel/process.c:840,
  arch/x86/kernel/machine_kexec_64.c:449).

Dependencies/assumptions for stable trees
- Requires the per-CPU `cache_state_incoherent` infrastructure and kexec
  consumers:
  - Declaration: arch/x86/include/asm/processor.h:734
  - Definition/usage: arch/x86/kernel/process.c:99,
    arch/x86/kernel/process.c:840
  - Kexec integration: arch/x86/kernel/machine_kexec_64.c:449 and
    arch/x86/kernel/relocate_kernel_64.S (WBINVD when
    `RELOC_KERNEL_CACHE_INCOHERENT` set)
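
The consumer side that this patch relies on can be modelled the same way.
The sketch below assumes only what is stated above (a CPU being stopped
for kexec performs WBINVD if its flag is set); the `kx_*` names are
hypothetical, and the flush is replaced by a counter so the behaviour can
be checked in userspace.

```c
#include <assert.h>
#include <stdbool.h>

#define KX_NR_CPUS 4

static bool kx_cache_incoherent[KX_NR_CPUS]; /* models the per-CPU boolean */
static int kx_wbinvd_count[KX_NR_CPUS];      /* counts modelled WBINVDs */

/* Stand-in for the WBINVD instruction: just record that it "ran". */
static void kx_wbinvd(int cpu)
{
	kx_wbinvd_count[cpu]++;
}

/* Models one CPU being stopped: flush only if the cache may be incoherent. */
static bool kx_stop_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < KX_NR_CPUS);
	if (!kx_cache_incoherent[cpu])
		return false;
	kx_wbinvd(cpu);
	return true;
}

/* Models kexec stopping every CPU; returns how many CPUs were flushed. */
static int kx_stop_all_cpus(void)
{
	int flushed = 0;

	for (int cpu = 0; cpu < KX_NR_CPUS; cpu++) {
		if (kx_stop_cpu(cpu))
			flushed++;
	}
	return flushed;
}
```

In this model, a CPU that never made a SEAMCALL keeps its flag clear and
skips the flush, which is why the flag has to be set on every SEAMCALL
path for the kexec-time check to be sufficient.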

Summary
- This is a focused, low-risk bugfix preventing silent memory corruption
  on TDX hosts during kexec by correctly marking and subsequently
  flushing CPUs that might have generated dirty private cachelines
  during SEAMCALLs. It satisfies stable backport criteria (user-visible
  correctness fix, minimal change, localized impact).

 arch/x86/include/asm/tdx.h  | 25 ++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.c |  4 ++--
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 7ddef3a698668..0922265c6bdcb 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -102,10 +102,31 @@ u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
 u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
 void tdx_init(void);
 
+#include <linux/preempt.h>
 #include <asm/archrandom.h>
+#include <asm/processor.h>
 
 typedef u64 (*sc_func_t)(u64 fn, struct tdx_module_args *args);
 
+static __always_inline u64 __seamcall_dirty_cache(sc_func_t func, u64 fn,
+						  struct tdx_module_args *args)
+{
+	lockdep_assert_preemption_disabled();
+
+	/*
+	 * SEAMCALLs are made to the TDX module and can generate dirty
+	 * cachelines of TDX private memory.  Mark cache state incoherent
+	 * so that the cache can be flushed during kexec.
+	 *
+	 * This needs to be done before actually making the SEAMCALL,
+	 * because kexec-ing CPU could send NMI to stop remote CPUs,
+	 * in which case even disabling IRQ won't help here.
+	 */
+	this_cpu_write(cache_state_incoherent, true);
+
+	return func(fn, args);
+}
+
 static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
 			   struct tdx_module_args *args)
 {
@@ -113,7 +134,9 @@ static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
 	u64 ret;
 
 	do {
-		ret = func(fn, args);
+		preempt_disable();
+		ret = __seamcall_dirty_cache(func, fn, args);
+		preempt_enable();
 	} while (ret == TDX_RND_NO_ENTROPY && --retry);
 
 	return ret;
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c7a9a087ccaf5..3ea6f587c81a3 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1266,7 +1266,7 @@ static bool paddr_is_tdx_private(unsigned long phys)
 		return false;
 
 	/* Get page type from the TDX module */
-	sret = __seamcall_ret(TDH_PHYMEM_PAGE_RDMD, &args);
+	sret = __seamcall_dirty_cache(__seamcall_ret, TDH_PHYMEM_PAGE_RDMD, &args);
 
 	/*
 	 * The SEAMCALL will not return success unless there is a
@@ -1522,7 +1522,7 @@ noinstr __flatten u64 tdh_vp_enter(struct tdx_vp *td, struct tdx_module_args *ar
 {
 	args->rcx = tdx_tdvpr_pa(td);
 
-	return __seamcall_saved_ret(TDH_VP_ENTER, args);
+	return __seamcall_dirty_cache(__seamcall_saved_ret, TDH_VP_ENTER, args);
 }
 EXPORT_SYMBOL_GPL(tdh_vp_enter);
 
-- 
2.51.0

