From: "Huang, Kai" <kai.huang@intel.com>
To: "Annapurve, Vishal" <vannapurve@google.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"Hunter, Adrian" <adrian.hunter@intel.com>,
"seanjc@google.com" <seanjc@google.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Li, Xiaoyao" <xiaoyao.li@intel.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>,
"Luck, Tony" <tony.luck@intel.com>,
"tony.lindgren@linux.intel.com" <tony.lindgren@linux.intel.com>,
"binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>,
"Chatre, Reinette" <reinette.chatre@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"hpa@zytor.com" <hpa@zytor.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
"bp@alien8.de" <bp@alien8.de>, "Gao, Chao" <chao.gao@intel.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH V2 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
Date: Mon, 7 Jul 2025 02:09:34 +0000
Message-ID: <0ad0d81bf5465c5eb21166d06b1bf077dd2bd941.camel@intel.com>
In-Reply-To: <20250703153712.155600-3-adrian.hunter@intel.com>
On Thu, 2025-07-03 at 18:37 +0300, Hunter, Adrian wrote:
> Avoid clearing reclaimed TDX private pages unless the platform is affected
> by the X86_BUG_TDX_PW_MCE erratum. This significantly reduces VM shutdown
> time on unaffected systems.
>
> Background
>
> KVM currently clears reclaimed TDX private pages using MOVDIR64B, which:
>
> - Clears the TD Owner bit (which identifies TDX private memory) and
> integrity metadata without triggering integrity violations.
> - Clears poison from cache lines without consuming it, avoiding MCEs on
> access (refer to TDX Module Base spec. section 16.5, "Handling Machine
> Check Events during Guest TD Operation").
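
A minimal userspace sketch of the clearing pass described above, assuming 4 KiB pages and 64-byte cache lines. The helper names store_full_line() and clear_page_by_lines() are hypothetical. memcpy() only models MOVDIR64B's full-line write granularity, not its effects on the TD Owner bit, integrity metadata or poison; the kernel issues the real instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096      /* assumed page size: 4 KiB */
#define CACHELINE_SIZE 64   /* MOVDIR64B writes one full 64-byte line */

/*
 * Hypothetical stand-in for a single MOVDIR64B store. memcpy() models only
 * the "whole 64-byte line at once" granularity; it does NOT reproduce the
 * hardware effects on the TD Owner bit, integrity metadata or poison.
 */
static void store_full_line(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, CACHELINE_SIZE);
}

/* Clear one page a full cache line at a time; returns the number of stores. */
static size_t clear_page_by_lines(uint8_t *page)
{
	static const uint8_t zeros[CACHELINE_SIZE];  /* zero-initialized source */
	size_t stores = 0;

	for (size_t off = 0; off < PAGE_SIZE; off += CACHELINE_SIZE) {
		store_full_line(page + off, zeros);
		stores++;
	}
	return stores;
}
```

Usage: filling a buffer with 0xAA and calling clear_page_by_lines() leaves it all zero after 4096 / 64 = 64 line-sized stores, which is why the cost scales with guest memory size.
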
>
> The TDX module also uses MOVDIR64B to initialize private pages before use.
> If cache flushing is needed before allocation, the module indicates this
> by setting TDX_FEATURES.CLFLUSH_BEFORE_ALLOC. However, KVM currently
> flushes unconditionally; refer to commit 94c477a751c7b ("x86/virt/tdx:
> Add SEAMCALL wrappers to add TD private pages").
>
> In contrast, when private pages are reclaimed, the TDX Module handles
> flushing via the TDH.PHYMEM.CACHE.WB SEAMCALL.
>
> Problem
>
> Clearing all private pages during VM shutdown is costly. For guests
> with a large amount of memory it can take minutes.
>
> Solution
>
> The TDX Module Base Architecture spec. states that private pages reclaimed
> from a TD should be initialized using MOVDIR64B, to avoid integrity
> violations or TD bit mismatch detection when they are later read using a
> shared HKID; refer to the April 2025 spec., "Page Initialization" in
> section "8.6.2. Platforms not Using ACT: Required Cache Flush and
> Initialization by the Host VMM".
>
> That is an overstatement and will be clarified in coming versions of the
> spec. In fact, as outlined in "Table 16.2: Non-ACT Platforms Checks on
> Memory" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
> Mode" in the same spec, there is no issue accessing such reclaimed pages
> using a shared key that does not have integrity enabled. Linux always uses
> KeyID 0, which never has integrity enabled. KeyID 0 is also the TME KeyID,
> which disallows integrity; refer to the "TME Policy/Encryption Algorithm"
> bit description in the "Intel Architecture Memory Encryption Technologies"
> spec, version 1.6, April 2025. So there is no need to clear pages to avoid
> integrity violations.
>
> There remains a risk of poison consumption. However, in the context of
> TDX, a machine check is expected to accompany the original poisoning.
> On some platforms that results in a panic. Other platforms may support
> the "SEAM_NR" Machine Check capability, in which case the Linux machine
> check handler marks the page as poisoned, which prevents it from being
> allocated again; refer to commit 7911f145de5fe ("x86/mce: Implement
> recovery for errors in TDX/SEAM non-root mode").
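
A toy model of why SEAM_NR recovery makes skipping the clear tolerable: once a page is flagged as poisoned, the allocator never hands it out again. In Linux this role is played by memory_failure() and the PG_hwpoison page flag; the struct page_state, mark_page_poisoned(), free_page_model() and alloc_page_model() names below are hypothetical and do not reflect the real page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8  /* tiny hypothetical pool */

/* Hypothetical per-page state; loosely mirrors a "poisoned" page flag. */
struct page_state {
	bool in_use;
	bool poisoned;
};

static struct page_state pages[NR_PAGES];  /* zero-initialized: all free */

/* Hypothetical machine-check hook: record that a page holds poison. */
static void mark_page_poisoned(size_t pfn)
{
	pages[pfn].poisoned = true;
}

/* Return a page to the pool, e.g. after it is reclaimed from a TD. */
static void free_page_model(size_t pfn)
{
	pages[pfn].in_use = false;
}

/* Hand out the first free, non-poisoned page; -1 if none is available. */
static long alloc_page_model(void)
{
	for (size_t pfn = 0; pfn < NR_PAGES; pfn++) {
		if (!pages[pfn].in_use && !pages[pfn].poisoned) {
			pages[pfn].in_use = true;
			return (long)pfn;
		}
	}
	return -1;
}
```

Usage: allocate page 0, mark it poisoned, free it, and the next allocation skips it and returns page 1, so the unpoisoned-but-uncleared page is never reused.
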
>
> Improvement
>
> By skipping the clearing step on unaffected platforms, shutdown time
> can improve by up to 40%.
>
> On platforms with the X86_BUG_TDX_PW_MCE erratum (SPR and EMR), continue
> clearing because these platforms may trigger poison on partial writes to
> previously-private pages, even with KeyID 0; refer to commit 1e536e1068970
> ("x86/cpu: Detect TDX partial write machine check erratum").
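
A sketch of the decision described above, assuming a plain boolean in place of the kernel's boot_cpu_has_bug(X86_BUG_TDX_PW_MCE) check. The names cpu_has_tdx_pw_mce_bug and reset_reclaimed_page() are hypothetical, memset() stands in for the MOVDIR64B loop, and this is not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumed page size: 4 KiB */

/* Hypothetical stand-in for boot_cpu_has_bug(X86_BUG_TDX_PW_MCE). */
static bool cpu_has_tdx_pw_mce_bug;

/*
 * Reset a page reclaimed from a TD. Only platforms with the partial-write
 * erratum still get the clearing pass; elsewhere the page is left as-is,
 * relying on KeyID 0 reads being safe. Returns true if the page was cleared.
 */
static bool reset_reclaimed_page(uint8_t *page)
{
	if (!cpu_has_tdx_pw_mce_bug)
		return false;            /* unaffected platform: skip the costly clear */

	memset(page, 0, PAGE_SIZE);  /* models the MOVDIR64B clearing loop */
	return true;
}
```

Usage: with the flag clear, the page contents are untouched and the function returns false; with it set, the page is zeroed. Keeping the check inside the reset helper means callers on the shutdown path need no per-platform logic.
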
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>
Acked-by: Kai Huang <kai.huang@intel.com>
Thread overview: 21+ messages
2025-07-03 15:37 [PATCH V2 0/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
2025-07-03 15:37 ` [PATCH V2 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page() Adrian Hunter
2025-07-03 16:34 ` Kirill A. Shutemov
2025-07-04 6:44 ` Binbin Wu
2025-07-04 15:33 ` Xiaoyao Li
2025-07-07 2:08 ` Huang, Kai
2025-07-07 17:31 ` Edgecombe, Rick P
2025-07-03 15:37 ` [PATCH V2 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
2025-07-03 16:44 ` Kirill A. Shutemov
2025-07-03 17:06 ` Vishal Annapurve
2025-07-04 5:37 ` Adrian Hunter
2025-07-07 23:31 ` Vishal Annapurve
2025-07-04 1:32 ` Xiaoyao Li
2025-07-04 6:44 ` Binbin Wu
2025-07-07 2:09 ` Huang, Kai [this message]
2025-07-07 3:16 ` Chao Gao
2025-07-07 4:23 ` Dave Hansen
2025-07-07 7:15 ` Chao Gao
2025-07-07 11:39 ` Huang, Kai
2025-07-07 14:32 ` Dave Hansen
2025-07-07 18:15 ` Edgecombe, Rick P