From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Adrian Hunter <adrian.hunter@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
pbonzini@redhat.com, seanjc@google.com, vannapurve@google.com
Cc: Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
x86@kernel.org, H Peter Anvin <hpa@zytor.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
rick.p.edgecombe@intel.com, kas@kernel.org, kai.huang@intel.com,
reinette.chatre@intel.com, tony.lindgren@linux.intel.com,
binbin.wu@linux.intel.com, isaku.yamahata@intel.com,
yan.y.zhao@intel.com, chao.gao@intel.com
Subject: Re: [PATCH V4 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
Date: Wed, 23 Jul 2025 20:31:55 +0800
Message-ID: <a5bb2763-c28a-4b3e-a252-ebc1cd440e0b@intel.com>
In-Reply-To: <20250723120539.122752-3-adrian.hunter@intel.com>
On 7/23/2025 8:05 PM, Adrian Hunter wrote:
> Avoid clearing reclaimed TDX private pages unless the platform is affected
> by the X86_BUG_TDX_PW_MCE erratum. This significantly reduces VM shutdown
> time on unaffected systems.
>
> Background
>
> KVM currently clears reclaimed TDX private pages using MOVDIR64B, which:
>
> - Clears the TD Owner bit (which identifies TDX private memory) and
>   integrity metadata without triggering integrity violations.
> - Clears poison from cache lines without consuming it, avoiding MCEs on
>   access (refer to TDX Module Base spec. 1348549-006US section 6.5.
>   Handling Machine Check Events during Guest TD Operation).
>
> The TDX module also uses MOVDIR64B to initialize private pages before use.
> If cache flushing is needed, it sets TDX_FEATURES.CLFLUSH_BEFORE_ALLOC.
> However, KVM currently flushes unconditionally, refer to commit 94c477a751c7b
> ("x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages").
>
> In contrast, when private pages are reclaimed, the TDX Module handles
> flushing via the TDH.PHYMEM.CACHE.WB SEAMCALL.
>
> Problem
>
> Clearing all private pages during VM shutdown is costly. For guests
> with a large amount of memory it can take minutes.
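
To put a rough number on the cost described above: the clearing loop in this
patch advances 64 bytes per MOVDIR64B, so the store count scales linearly with
guest memory. The helper below is only a back-of-the-envelope sketch;
movdir64b_stores() is a hypothetical name, not a kernel function, and the
1 TiB guest size in the usage note is an assumed example.

```c
#include <stdint.h>

/* Each MOVDIR64B writes one 64-byte chunk (see the loop step in the diff). */
#define MOVDIR64B_BYTES 64ULL

/*
 * Hypothetical helper, for illustration only: number of 64-byte stores
 * needed to clear `bytes` of reclaimed private memory.
 */
static uint64_t movdir64b_stores(uint64_t bytes)
{
	return bytes / MOVDIR64B_BYTES;
}
```

For an assumed 1 TiB guest, movdir64b_stores(1ULL << 40) evaluates to 2^34,
i.e. 17179869184 stores.
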
>
> Solution
>
> The TDX Module Base Architecture spec. documents that private pages reclaimed
> from a TD should be initialized using MOVDIR64B, in order to avoid
> integrity violation or TD bit mismatch detection when later being read
> using a shared HKID, refer to the April 2025 spec. "Page Initialization" in
> section "8.6.2. Platforms not Using ACT: Required Cache Flush and
> Initialization by the Host VMM".
>
> That is an overstatement and will be clarified in coming versions of the
> spec. In fact, as outlined in "Table 16.2: Non-ACT Platforms Checks on
> Memory" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
> Mode" in the same spec, there is no issue accessing such reclaimed pages
> using a shared key that does not have integrity enabled. Linux always uses
> KeyID 0 which never has integrity enabled. KeyID 0 is also the TME KeyID
> which disallows integrity, refer to the "TME Policy/Encryption Algorithm" bit
> description in the "Intel Architecture Memory Encryption Technologies" spec
> version 1.6 April 2025. So there is no need to clear pages to avoid
> integrity violations.
>
> There remains a risk of poison consumption. However, in the context of
> TDX, it is expected that there would be a machine check associated with the
> original poisoning. On some platforms that results in a panic. However,
> platforms may support the "SEAM_NR" Machine Check capability, in which case
> the Linux machine check handler marks the page as poisoned, which prevents it
> from being allocated anymore, refer to commit 7911f145de5fe ("x86/mce:
> Implement recovery for errors in TDX/SEAM non-root mode").
>
> Improvement
>
> By skipping the clearing step on unaffected platforms, shutdown time
> can improve by up to 40%.
>
> On platforms with the X86_BUG_TDX_PW_MCE erratum (SPR and EMR), continue
> clearing because these platforms may trigger poison on partial writes to
> previously-private pages, even with KeyID 0, refer to commit 1e536e1068970
> ("x86/cpu: Detect TDX partial write machine check erratum").
>
> Reviewed-by: Kirill A. Shutemov <kas@kernel.org>
> Acked-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>
>
> Changes in V4:
>
> Add TDX Module Base spec. version (Rick)
> Add Rick's Rev'd-by
>
> Changes in V3:
>
> Remove "flush cache" comments (Rick)
> Update function comment to better relate to "quirk" naming (Rick)
> Add "via MOVDIR64B" to comment (Xiaoyao)
> Add Rev'd-by, Ack'd-by tags
>
> Changes in V2:
>
> Improve the comment
>
>
> arch/x86/virt/vmx/tdx/tdx.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index fc8d8e444f15..ef22fc2b9af0 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -633,15 +633,19 @@ static int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list,
>  }
>
>  /*
> - * Convert TDX private pages back to normal by using MOVDIR64B to
> - * clear these pages.  Note this function doesn't flush cache of
> - * these TDX private pages.  The caller should make sure of that.
> + * Convert TDX private pages back to normal by using MOVDIR64B to clear these
> + * pages. Typically, any write to the page will convert it from TDX private back
> + * to normal kernel memory. Systems with the X86_BUG_TDX_PW_MCE erratum need to
> + * do the conversion explicitly via MOVDIR64B.
>   */
>  static void tdx_quirk_reset_paddr(unsigned long base, unsigned long size)
>  {
>  	const void *zero_page = (const void *)page_address(ZERO_PAGE(0));
>  	unsigned long phys, end;
>
> +	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
> +		return;
> +
>  	end = base + size;
>  	for (phys = base; phys < end; phys += 64)
>  		movdir64b(__va(phys), zero_page);
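
For readers following along outside the kernel tree, the gating in the hunk
above can be modelled in userspace roughly as follows. This is a sketch under
stated assumptions, not the kernel code: model_quirk_reset() and the
has_pw_mce_bug parameter are hypothetical stand-ins for
tdx_quirk_reset_paddr() and boot_cpu_has_bug(X86_BUG_TDX_PW_MCE), and memset()
stands in for movdir64b(). Unlike the kernel loop, the model also refuses to
write past a buffer whose size is not a multiple of 64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 64 /* MOVDIR64B writes 64 bytes at a time */

/*
 * Userspace model only. `has_pw_mce_bug` is a hypothetical stand-in for
 * boot_cpu_has_bug(X86_BUG_TDX_PW_MCE); memset() stands in for movdir64b().
 * Returns the number of 64-byte chunks that were cleared.
 */
static size_t model_quirk_reset(unsigned char *buf, size_t size,
				bool has_pw_mce_bug)
{
	size_t off, chunks = 0;

	/* Unaffected platform: skip the explicit clearing entirely. */
	if (!has_pw_mce_bug)
		return 0;

	/* Affected platform: clear the range one 64-byte chunk at a time. */
	for (off = 0; off + CHUNK <= size; off += CHUNK) {
		memset(buf + off, 0, CHUNK);
		chunks++;
	}

	return chunks;
}
```

With the flag false the buffer is left untouched and the cost is a single
branch; with the flag true a 4096-byte page takes 64 chunk writes.
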