* [PATCH 0/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
@ 2025-07-03 11:40 Adrian Hunter
2025-07-03 11:40 ` [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page() Adrian Hunter
2025-07-03 11:40 ` [PATCH 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
0 siblings, 2 replies; 7+ messages in thread
From: Adrian Hunter @ 2025-07-03 11:40 UTC (permalink / raw)
To: Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
Hi
Here are 2 small self-explanatory patches related to clearing TDX private
pages.
Patch 1 is a minor tidy-up.
In patch 2, by skipping the clearing step, shutdown time can improve by
up to 40%.
Adrian Hunter (2):
x86/tdx: Eliminate duplicate code in tdx_clear_page()
x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
arch/x86/include/asm/tdx.h | 2 ++
arch/x86/kvm/vmx/tdx.c | 19 -------------------
arch/x86/virt/vmx/tdx/tdx.c | 15 +++++++++++++++
3 files changed, 17 insertions(+), 19 deletions(-)
Regards
Adrian
* [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page()
2025-07-03 11:40 [PATCH 0/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
@ 2025-07-03 11:40 ` Adrian Hunter
2025-07-03 14:14 ` Dave Hansen
2025-07-03 11:40 ` [PATCH 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
1 sibling, 1 reply; 7+ messages in thread
From: Adrian Hunter @ 2025-07-03 11:40 UTC (permalink / raw)
To: Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
tdx_clear_page() and reset_tdx_pages() duplicate the TDX page clearing
logic. Keep the tdx_clear_page() prototype but call reset_tdx_pages() for
the implementation.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
arch/x86/include/asm/tdx.h | 2 ++
arch/x86/kvm/vmx/tdx.c | 19 -------------------
arch/x86/virt/vmx/tdx/tdx.c | 6 ++++++
3 files changed, 8 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 7ddef3a69866..e94eb0711bf1 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -131,6 +131,8 @@ int tdx_guest_keyid_alloc(void);
u32 tdx_get_nr_guest_keyids(void);
void tdx_guest_keyid_free(unsigned int keyid);
+void tdx_clear_page(struct page *page);
+
struct tdx_td {
/* TD root structure: */
struct page *tdr_page;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a08e7055d1db..098e4ee574bb 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -276,25 +276,6 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
vcpu->cpu = -1;
}
-static void tdx_clear_page(struct page *page)
-{
- const void *zero_page = (const void *) page_to_virt(ZERO_PAGE(0));
- void *dest = page_to_virt(page);
- unsigned long i;
-
- /*
- * The page could have been poisoned. MOVDIR64B also clears
- * the poison bit so the kernel can safely use the page again.
- */
- for (i = 0; i < PAGE_SIZE; i += 64)
- movdir64b(dest + i, zero_page);
- /*
- * MOVDIR64B store uses WC buffer. Prevent following memory reads
- * from seeing potentially poisoned cache.
- */
- __mb();
-}
-
static void tdx_no_vcpus_enter_start(struct kvm *kvm)
{
struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c7a9a087ccaf..17a7a599facd 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -654,6 +654,12 @@ static void reset_tdx_pages(unsigned long base, unsigned long size)
mb();
}
+void tdx_clear_page(struct page *page)
+{
+ reset_tdx_pages(page_to_phys(page), PAGE_SIZE);
+}
+EXPORT_SYMBOL_GPL(tdx_clear_page);
+
static void tdmr_reset_pamt(struct tdmr_info *tdmr)
{
tdmr_do_pamt_func(tdmr, reset_tdx_pages);
--
2.48.1
* [PATCH 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
2025-07-03 11:40 [PATCH 0/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
2025-07-03 11:40 ` [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page() Adrian Hunter
@ 2025-07-03 11:40 ` Adrian Hunter
2025-07-03 14:25 ` Dave Hansen
1 sibling, 1 reply; 7+ messages in thread
From: Adrian Hunter @ 2025-07-03 11:40 UTC (permalink / raw)
To: Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
Avoid clearing reclaimed TDX private pages unless the platform is affected
by the X86_BUG_TDX_PW_MCE erratum. This significantly reduces VM shutdown
time on unaffected systems.
Background
KVM currently clears reclaimed TDX private pages using MOVDIR64B, which:
- Clears the TD Owner bit (which identifies TDX private memory) and
integrity metadata without triggering integrity violations.
- Clears poison from cache lines without consuming it, avoiding MCEs on
access (refer to the TDX Module Base spec, section 16.5, "Handling
Machine Check Events during Guest TD Operation").
The TDX module also uses MOVDIR64B to initialize private pages before use.
If cache flushing is needed, it sets TDX_FEATURES.CLFLUSH_BEFORE_ALLOC.
However, KVM currently flushes unconditionally; refer to commit
94c477a751c7b ("x86/virt/tdx: Add SEAMCALL wrappers to add TD private
pages").
In contrast, when private pages are reclaimed, the TDX Module handles
flushing via the TDH.PHYMEM.CACHE.WB SEAMCALL.
Problem
Clearing all private pages during VM shutdown is costly. For guests
with a large amount of memory it can take minutes.
Solution
The TDX Module Base Architecture spec documents that private pages
reclaimed from a TD should be initialized using MOVDIR64B, in order to
avoid integrity violation or TD bit mismatch detection when later being
read using a shared HKID; refer to the April 2025 spec, "Page
Initialization" in section "8.6.2. Platforms not Using ACT: Required Cache
Flush and Initialization by the Host VMM".
That is an overstatement and will be clarified in coming versions of the
spec. In fact, as outlined in "Table 16.2: Non-ACT Platforms Checks on
Memory" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
Mode" in the same spec, there is no issue accessing such reclaimed pages
using a shared key that does not have integrity enabled. Linux always uses
KeyID 0 which never has integrity enabled. KeyID 0 is also the TME KeyID
which disallows integrity; refer to the "TME Policy/Encryption Algorithm"
bit description in the "Intel Architecture Memory Encryption Technologies"
spec, version 1.6, April 2025. So there is no need to clear pages to avoid
integrity violations.
There remains a risk of poison consumption. However, in the context of
TDX, it is expected that there would be a machine check associated with the
original poisoning. On some platforms that results in a panic. However,
platforms may support the "SEAM_NR" Machine Check capability, in which case
the Linux machine check handler marks the page as poisoned, which prevents
it from being allocated anymore; refer to commit 7911f145de5fe ("x86/mce:
Implement recovery for errors in TDX/SEAM non-root mode").
Improvement
By skipping the clearing step on unaffected platforms, shutdown time
can improve by up to 40%.
On platforms with the X86_BUG_TDX_PW_MCE erratum (SPR and EMR), continue
clearing because these platforms may trigger poison on partial writes to
previously-private pages, even with KeyID 0; refer to commit 1e536e1068970
("x86/cpu: Detect TDX partial write machine check erratum").
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
arch/x86/virt/vmx/tdx/tdx.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 17a7a599facd..a030679c9239 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -642,6 +642,15 @@ static void reset_tdx_pages(unsigned long base, unsigned long size)
const void *zero_page = (const void *)page_address(ZERO_PAGE(0));
unsigned long phys, end;
+ /*
+ * Linux uses only KeyID 0 which can read or write pages
+ * that were formerly TDX private pages, without poisoning
+ * memory. However on platforms with the partial-write errata,
+ * poisoning still happens for partial writes.
+ */
+ if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
+ return;
+
end = base + size;
for (phys = base; phys < end; phys += 64)
movdir64b(__va(phys), zero_page);
--
2.48.1
* Re: [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page()
2025-07-03 11:40 ` [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page() Adrian Hunter
@ 2025-07-03 14:14 ` Dave Hansen
2025-07-03 14:39 ` Adrian Hunter
0 siblings, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2025-07-03 14:14 UTC (permalink / raw)
To: Adrian Hunter, Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
On 7/3/25 04:40, Adrian Hunter wrote:
> tdx_clear_page() and reset_tdx_pages() duplicate the TDX page clearing
> logic. Keep the tdx_clear_page() prototype but call reset_tdx_pages() for
> the implementation.
Why keep the old prototype?
* Re: [PATCH 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
2025-07-03 11:40 ` [PATCH 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present Adrian Hunter
@ 2025-07-03 14:25 ` Dave Hansen
0 siblings, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2025-07-03 14:25 UTC (permalink / raw)
To: Adrian Hunter, Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
On 7/3/25 04:40, Adrian Hunter wrote:
> @@ -642,6 +642,15 @@ static void reset_tdx_pages(unsigned long base, unsigned long size)
> const void *zero_page = (const void *)page_address(ZERO_PAGE(0));
> unsigned long phys, end;
If the only possible use for reset_tdx_pages() is for the erratum, then
it needs to be in the function name.
> + /*
> + * Linux uses only KeyID 0 which can read or write pages
> + * that were formerly TDX private pages, without poisoning
> + * memory. However on platforms with the partial-write errata,
> + * poisoning still happens for partial writes.
> + */
> + if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
> + return;
Would this be more clear?
/*
* Typically, any write to the page will convert it from TDX
* private back to normal kernel memory. Systems with the
* erratum need to do the conversion explicitly.
*/
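The gating under discussion can be modelled in userspace. This is only a
minimal sketch under stated assumptions, not the kernel implementation:
every model_* name is hypothetical, memcpy() of a zeroed 64-byte line stands
in for MOVDIR64B, and the model_has_pw_mce_bug flag stands in for
boot_cpu_has_bug(X86_BUG_TDX_PW_MCE). It does not model poison or the TD
Owner bit; it only shows that the clearing loop is skipped entirely unless
the erratum flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_LINE_SIZE 64

/* Hypothetical stand-in for boot_cpu_has_bug(X86_BUG_TDX_PW_MCE). */
static bool model_has_pw_mce_bug;

/*
 * Userspace model of the patched reset_tdx_pages().
 * size is assumed to be a multiple of MODEL_LINE_SIZE.
 * Returns the number of 64-byte lines that were written.
 */
static size_t model_reset_tdx_pages(unsigned char *base, size_t size)
{
	static const unsigned char zero_line[MODEL_LINE_SIZE];
	size_t off, lines = 0;

	/* Unaffected platforms: leave the memory untouched. */
	if (!model_has_pw_mce_bug)
		return 0;

	/* Affected platforms: overwrite every line (memcpy models MOVDIR64B). */
	for (off = 0; off < size; off += MODEL_LINE_SIZE) {
		memcpy(base + off, zero_line, MODEL_LINE_SIZE);
		lines++;
	}
	return lines;
}
```

With the flag clear, a call on a 4096-byte buffer returns 0 and leaves the
buffer as it was; with the flag set it writes 64 lines and the buffer ends
up zeroed.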
* Re: [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page()
2025-07-03 14:14 ` Dave Hansen
@ 2025-07-03 14:39 ` Adrian Hunter
2025-07-03 14:53 ` Dave Hansen
0 siblings, 1 reply; 7+ messages in thread
From: Adrian Hunter @ 2025-07-03 14:39 UTC (permalink / raw)
To: Dave Hansen, Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
On 03/07/2025 17:14, Dave Hansen wrote:
> On 7/3/25 04:40, Adrian Hunter wrote:
>> tdx_clear_page() and reset_tdx_pages() duplicate the TDX page clearing
>> logic. Keep the tdx_clear_page() prototype but call reset_tdx_pages() for
>> the implementation.
>
> Why keep the old prototype?
What prototype would you prefer?
* Re: [PATCH 1/2] x86/tdx: Eliminate duplicate code in tdx_clear_page()
2025-07-03 14:39 ` Adrian Hunter
@ 2025-07-03 14:53 ` Dave Hansen
0 siblings, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2025-07-03 14:53 UTC (permalink / raw)
To: Adrian Hunter, Dave Hansen, pbonzini, seanjc, vannapurve
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar, x86,
H Peter Anvin, linux-kernel, kvm, rick.p.edgecombe,
kirill.shutemov, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, isaku.yamahata, yan.y.zhao, chao.gao
On 7/3/25 07:39, Adrian Hunter wrote:
> On 03/07/2025 17:14, Dave Hansen wrote:
>> On 7/3/25 04:40, Adrian Hunter wrote:
>>> tdx_clear_page() and reset_tdx_pages() duplicate the TDX page clearing
>>> logic. Keep the tdx_clear_page() prototype but call reset_tdx_pages() for
>>> the implementation.
>> Why keep the old prototype?
> What prototype would you prefer?
My first reaction is that there should be _one_ function that does the
erratum-related clearing. I'd rather have the KVM sites use the slightly
cumbersome paddr/size arguments than have two functions.
Worst case, have two functions that have similar names to make it clear
that they are doing the same thing.
It is *NOT* clear that:
tdx_clear_page()
and
reset_tdx_pages()
are doing the exact same thing.
I think I'd probably rename reset_tdx_pages() to tdx_quirk_reset_paddr() and
then use it in KVM like this:
tdx_quirk_reset_paddr(page_to_phys(page), PAGE_SIZE);
The alternative would be to retain a function that keeps the 'struct
page' as an argument. Something like:
tdx_quirk_reset_paddr(unsigned long base, unsigned long size)
and
tdx_quirk_reset_page(struct page *page)
But the way proposed in this patch is very much a half measure.
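The second alternative (a range-based function plus a thin page-based
wrapper with a matching name) might look roughly like the userspace sketch
below. It is an illustration only, under these assumptions: struct
model_page is a hypothetical stand-in for the kernel's struct page, the
model_* functions mirror the suggested tdx_quirk_reset_paddr() and
tdx_quirk_reset_page() names without being kernel code, and memcpy() of a
zeroed 64-byte line stands in for MOVDIR64B.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_LINE_SIZE 64

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

/*
 * Range-based variant: zero [base, base + size) one 64-byte line at a
 * time. size is assumed to be a multiple of MODEL_LINE_SIZE.
 */
static void model_quirk_reset_paddr(unsigned char *base, size_t size)
{
	static const unsigned char zero_line[MODEL_LINE_SIZE];
	size_t off;

	for (off = 0; off < size; off += MODEL_LINE_SIZE)
		memcpy(base + off, zero_line, MODEL_LINE_SIZE);
}

/* Page-based variant: a thin wrapper, so there is only one clearing loop. */
static void model_quirk_reset_page(struct model_page *page)
{
	model_quirk_reset_paddr(page->data, MODEL_PAGE_SIZE);
}
```

Because the wrapper only forwards to the range-based function, both entry
points leave a page in the same state, and the shared name prefix makes that
relationship visible at the call sites.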