From: Chao Gao <chao.gao@intel.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
<seanjc@google.com>, <vannapurve@google.com>,
Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, <x86@kernel.org>,
"H Peter Anvin" <hpa@zytor.com>, <linux-kernel@vger.kernel.org>,
<kvm@vger.kernel.org>, <rick.p.edgecombe@intel.com>,
<kirill.shutemov@linux.intel.com>, <kai.huang@intel.com>,
<reinette.chatre@intel.com>, <xiaoyao.li@intel.com>,
<tony.lindgren@linux.intel.com>, <binbin.wu@linux.intel.com>,
<isaku.yamahata@intel.com>, <yan.y.zhao@intel.com>
Subject: Re: [PATCH V2 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
Date: Mon, 7 Jul 2025 11:16:12 +0800
Message-ID: <aGs7/C0W58nEUVNk@intel.com>
In-Reply-To: <20250703153712.155600-3-adrian.hunter@intel.com>

On Thu, Jul 03, 2025 at 06:37:12PM +0300, Adrian Hunter wrote:
>Avoid clearing reclaimed TDX private pages unless the platform is affected
>by the X86_BUG_TDX_PW_MCE erratum. This significantly reduces VM shutdown
>time on unaffected systems.
>
>Background
>
>KVM currently clears reclaimed TDX private pages using MOVDIR64B, which:
>
> - Clears the TD Owner bit (which identifies TDX private memory) and
> integrity metadata without triggering integrity violations.
> - Clears poison from cache lines without consuming it, avoiding MCEs on
> access (refer to TDX Module Base spec. 16.5. Handling Machine Check
> Events during Guest TD Operation).
>
>The TDX module also uses MOVDIR64B to initialize private pages before use.
>If cache flushing is needed, it sets TDX_FEATURES.CLFLUSH_BEFORE_ALLOC.
>However, KVM currently flushes unconditionally, refer to commit 94c477a751c7b
>("x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages").
>
>In contrast, when private pages are reclaimed, the TDX Module handles
>flushing via the TDH.PHYMEM.CACHE.WB SEAMCALL.
>
>Problem
>
>Clearing all private pages during VM shutdown is costly. For guests
>with a large amount of memory it can take minutes.
>
>Solution
>
>TDX Module Base Architecture spec. documents that private pages reclaimed
>from a TD should be initialized using MOVDIR64B, in order to avoid
>integrity violation or TD bit mismatch detection when later being read
>using a shared HKID, refer to April 2025 spec. "Page Initialization" in
>section "8.6.2. Platforms not Using ACT: Required Cache Flush and
>Initialization by the Host VMM".
>
>That is an overstatement and will be clarified in coming versions of the
>spec. In fact, as outlined in "Table 16.2: Non-ACT Platforms Checks on
>Memory" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
>Mode" in the same spec, there is no issue accessing such reclaimed pages
>using a shared key that does not have integrity enabled. Linux always uses
>KeyID 0 which never has integrity enabled. KeyID 0 is also the TME KeyID
>which disallows integrity, refer to "TME Policy/Encryption Algorithm" bit
>description in "Intel Architecture Memory Encryption Technologies" spec
>version 1.6 April 2025. So there is no need to clear pages to avoid
>integrity violations.
>
>There remains a risk of poison consumption. However, in the context of
>TDX, it is expected that there would be a machine check associated with the
>original poisoning. On some platforms that results in a panic. However
>platforms may support "SEAM_NR" Machine Check capability, in which case
>Linux machine check handler marks the page as poisoned, which prevents it
>from being allocated anymore, refer to commit 7911f145de5fe ("x86/mce:
>Implement recovery for errors in TDX/SEAM non-root mode").

Even on a CPU with SEAM_NR and without X86_BUG_TDX_PW_MCE, is there still a
risk of poisoned memory being returned to the host kernel? A #MCE is raised
only when poison is consumed, so if a poisoned page is never consumed in SEAM
non-root mode, no #MCE occurs and the commit mentioned above never marks the
page as poisoned.

Such a reclaimed poisoned page could then be reused by the host and
potentially cause a kernel panic. While WBINVD could help, we believe it's
not worth it because it would slow down the vast majority of cases. Is my
understanding correct?
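
To make the case concrete, below is a toy user-space model of the reclaim
policy in this patch. It is not kernel code: every type and function name is
hypothetical. It assumes, as the commit message states, that MOVDIR64B
clearing removes poison without consuming it, and that with SEAM_NR a
consumed poison gets the page marked so it is never allocated again. The
non-SEAM_NR panic path is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy user-space model of the reclaim policy discussed in this thread.
 * NOT kernel code: every type and function here is hypothetical.
 */
struct model_cpu {
	bool has_seam_nr;    /* SEAM_NR machine check capability */
	bool has_pw_mce_bug; /* stands in for X86_BUG_TDX_PW_MCE */
};

struct model_page {
	bool poisoned;        /* page holds poison from an earlier error */
	bool marked_hwpoison; /* MCE handler marked it; never allocated again */
};

/* The guest reads the page in SEAM non-root mode. */
void guest_consumes(const struct model_cpu *cpu, struct model_page *pg)
{
	assert(cpu && pg);
	/*
	 * #MCE fires only when poison is consumed; with SEAM_NR the
	 * handler marks the page. Without SEAM_NR the real system may
	 * panic instead, which this model does not represent.
	 */
	if (pg->poisoned && cpu->has_seam_nr)
		pg->marked_hwpoison = true;
}

/* Reclaim as proposed by the patch: clear only on affected CPUs. */
void reclaim(const struct model_cpu *cpu, struct model_page *pg)
{
	assert(cpu && pg);
	if (cpu->has_pw_mce_bug)
		pg->poisoned = false; /* MOVDIR64B overwrites the poison */
}

/* True if the host could later allocate the page and consume poison. */
bool host_may_consume_poison(const struct model_page *pg)
{
	assert(pg);
	return pg->poisoned && !pg->marked_hwpoison;
}

/* One poisoned page, optionally touched by the guest, then reclaimed. */
bool poisoned_page_reaches_host(bool seam_nr, bool pw_mce_bug,
				bool guest_touches)
{
	struct model_cpu cpu = { .has_seam_nr = seam_nr,
				 .has_pw_mce_bug = pw_mce_bug };
	struct model_page pg = { .poisoned = true, .marked_hwpoison = false };

	if (guest_touches)
		guest_consumes(&cpu, &pg);
	reclaim(&cpu, &pg);
	return host_may_consume_poison(&pg);
}
```

In this model, poisoned_page_reaches_host(true, false, false) returns true,
which is the case I'm asking about. If the guest consumed the poison, or the
erratum forces clearing, it returns false.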