From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com,
kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com,
seanjc@google.com, tglx@kernel.org, vannapurve@google.com,
x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com,
kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCH v6 03/11] x86/virt/tdx: Add tdx_alloc/free_control_page() helpers
Date: Mon, 25 May 2026 19:35:07 -0700
Message-ID: <20260526023515.288829-4-rick.p.edgecombe@intel.com>
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Add helpers to use when allocating or preparing pages that are handed to
the TDX-Module for use as control/S-EPT pages, and thus need Dynamic PAMT
adjustments.
The TDX module tracks some state for each page of physical memory that it
might use. It calls this state the PAMT. It includes separate state for
each page size a physical page could be utilized at within the TDX module
(1GB, 2MB, 4KB). In Dynamic PAMT, only the 4KB page size state is
allocated dynamically. So for pages that TDX will use as 2MB physically
contiguous pages, Dynamic PAMT backing is not needed.
KVM will need to hand pages to the TDX module that it will use at 4KB
granularity. So these pages will need Dynamic PAMT backing added before
they are used by the TDX module, and removed afterwards.
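As an illustration of the 4KB-versus-2MB relationship described above, here is a minimal userspace sketch of the address arithmetic, modeled on pamt_2mb_arg() in this patch. The SKETCH_* names and values are assumptions made for this example (in particular, the encoding of the 2MB level as 1); this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants. Assumptions for this sketch, not kernel definitions. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SIZE   (1ULL << 21)  /* 2MB */
#define SKETCH_PS_2M      1ULL          /* assumed encoding of the 2MB level */

/* Round the page's physical address down to 2MB and tag it with the level. */
static uint64_t sketch_pamt_2mb_arg(uint64_t pfn)
{
	uint64_t hpa, hpa_2mb;

	/* The shift below must not drop high bits of the PFN. */
	assert((pfn >> (64 - SKETCH_PAGE_SHIFT)) == 0);

	hpa = pfn << SKETCH_PAGE_SHIFT;
	hpa_2mb = hpa & ~(SKETCH_PMD_SIZE - 1);

	return hpa_2mb | SKETCH_PS_2M;
}

/* Two 4KB pages yield the same argument iff they sit in the same 2MB region. */
static int sketch_same_2mb_region(uint64_t pfn_a, uint64_t pfn_b)
{
	return sketch_pamt_2mb_arg(pfn_a) == sketch_pamt_2mb_arg(pfn_b);
}
```

With these assumed values, PFNs 0x200 and 0x3ff land in the same 2MB region, while PFN 0x400 starts the next one.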
Add tdx_alloc_control_page() and tdx_free_control_page() to handle both
page allocation and Dynamic PAMT installation. Make them behave like
normal alloc/free functions where allocation can fail in the case of no
memory, but free (with any necessary Dynamic PAMT release) always
succeeds. Do this so they can support the existing TDX flows that require
teardowns to succeed.
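The alloc/free contract described above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: malloc() stands in for the page allocator, and the sketch_* functions and the sketch_backing_fails knob are hypothetical stand-ins for the Dynamic PAMT install/release, not the helpers added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob standing in for a failed backing install. */
static bool sketch_backing_fails;
/* Number of pages that currently hold backing, for leak checking. */
static int sketch_backed_pages;

static int sketch_backing_get(void *page)
{
	(void)page;
	if (sketch_backing_fails)
		return -1;
	sketch_backed_pages++;
	return 0;
}

/* Release never reports failure to the caller. */
static void sketch_backing_put(void *page)
{
	(void)page;
	assert(sketch_backed_pages > 0);
	sketch_backed_pages--;
}

/* Allocation may fail; on failure nothing is left allocated. */
static void *sketch_alloc_control_page(void)
{
	void *page = malloc(4096);

	if (!page)
		return NULL;

	if (sketch_backing_get(page)) {
		free(page);
		return NULL;
	}

	return page;
}

/* Free always succeeds and tolerates NULL, like a normal free function. */
static void sketch_free_control_page(void *page)
{
	if (!page)
		return;

	sketch_backing_put(page);
	free(page);
}
```

The point of the shape is that a teardown path can call the free function unconditionally, with no error return to handle.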
Also create tdx_pamt_get/put() to handle installing Dynamic PAMT 4KB
backing for pages that are already allocated (such as KVM's use of S-EPT
page tables or guest private memory). Have them take a pfn instead of a
struct page, as future changes will want to use these helpers for guest
pages which are tracked by PFN.
Unlike some other SEAMCALLs, don't CLFLUSH the Dynamic PAMT pages handed
to the TDX module. The TDX docs specify that the flush is only needed on
"TD private memory or TD control structure page".
Since userspace will be able to trigger these allocations easily, account
the memory.
Leave logic to handle concurrency issues for future changes.
Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7 Sashiko:claude-opus-4-6
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v6:
The major change was to split the concurrency handling out into a future
patch, which makes it easier to explain in the log. This patch is the basic
functionality. The next patch adds the simple version of the concurrency
handling and explains why. The other big change was to get rid of the
dynamically sized DPAMT backing support, which was not based on a formal
spec.
Details:
- Split out concurrency stuff into next patch because the log was too long
- Switch to fixed size pamt page arrays (Nikolay)
- Rename tdx_alloc_page()/tdx_free_page() to tdx_alloc_control_page()/
tdx_free_control_page() to reflect control/S-EPT purpose (Sean)
- Take gfp from the caller in tdx_alloc_control_page() (Sean)
- Narrow external API: make tdx_pamt_get()/tdx_pamt_put() static and
export only tdx_alloc_control_page()/tdx_free_control_page() (note:
dropped inline helpers since the discussion on Sean's series resulted
in them not being needed)
- Switch EXPORT_SYMBOL_GPL to EXPORT_SYMBOL_FOR_KVM (Sean)
- Use WARN_ON_ONCE() instead of pr_err() for TDX module failures (Sean)
- Fold alloc_pamt_array()/free_pamt_array() helpers back in and fix the
error-unwind index bug (dpamt_pages[i] -> [j])
- Adjustments after the struct page -> pfn conversion
- Adjustments from dropping error helper patches
- Make the free error paths more normal
- Drop gfp_t arg in tdx_alloc_control_page(). In Sean's mega v5, it
was needed because the kvm_mmu_memory_cache had a gfp_t that had to be
passed somewhere. But this was still weird because that version didn't
use the gfp_t when allocating the DPAMT pages. And in the end all the
callers pass GFP_KERNEL_ACCOUNT. So just drop the arg.
- Log tweaks
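
For reviewers looking at the error-unwind fix mentioned in the list above, here is a minimal userspace sketch of the pattern. It assumes malloc() in place of alloc_page(), and the sketch_* names and the fault-injection knob are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_CNT 2	/* mirrors TDX_DPAMT_ENTRY_PAGE_CNT */

/* Hypothetical fault injection: fail the allocation at this index, -1 = never. */
static int sketch_fail_at = -1;
/* Outstanding allocations, for leak checking. */
static int sketch_live_allocs;

static void *sketch_alloc_page(int idx)
{
	void *p;

	if (idx == sketch_fail_at)
		return NULL;

	p = malloc(4096);
	if (p)
		sketch_live_allocs++;
	return p;
}

static void sketch_free_page(void *p)
{
	assert(p);
	free(p);
	sketch_live_allocs--;
}

static int sketch_alloc_array(void **pages)
{
	int i, j;

	for (i = 0; i < SKETCH_PAGE_CNT; i++) {
		pages[i] = sketch_alloc_page(i);
		if (!pages[i])
			goto err;
	}

	return 0;
err:
	/* Free with [j]: i is the index that failed and holds NULL. */
	for (j = 0; j < i; j++)
		sketch_free_page(pages[j]);
	return -ENOMEM;
}

static void sketch_free_array(void **pages)
{
	for (int i = 0; i < SKETCH_PAGE_CNT; i++)
		sketch_free_page(pages[i]);
}
```

Indexing the unwind loop with [i] instead of [j] would repeatedly hand the failed (NULL) slot to the free routine and leak the pages that were actually allocated.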
---
arch/x86/include/asm/tdx.h | 7 ++
arch/x86/virt/vmx/tdx/tdx.c | 159 ++++++++++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 2 +
3 files changed, 168 insertions(+)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 82dc27aecf297..74e75db5728c7 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -37,6 +37,7 @@
#include <uapi/asm/mce.h>
#include <asm/tdx_global_metadata.h>
+#include <linux/mm.h>
#include <linux/pgtable.h>
/*
@@ -160,6 +161,12 @@ void tdx_guest_keyid_free(unsigned int keyid);
void tdx_quirk_reset_paddr(unsigned long base, unsigned long size);
+/* Number of PAMT pages to be provided to TDX module per 2MB region of PA */
+#define TDX_DPAMT_ENTRY_PAGE_CNT 2
+
+struct page *tdx_alloc_control_page(void);
+void tdx_free_control_page(struct page *page);
+
struct tdx_td {
/* TD root structure: */
struct page *tdr_page;
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9ebd192cb5c17..9e0812d87ab06 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1919,6 +1919,165 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, kvm_pfn_t pfn)
}
EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
+static int alloc_pamt_array(struct page **pamt_pages)
+{
+ int i, j;
+
+ for (i = 0; i < TDX_DPAMT_ENTRY_PAGE_CNT; i++) {
+ pamt_pages[i] = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!pamt_pages[i])
+ goto err;
+ }
+
+ return 0;
+err:
+ for (j = 0; j < i; j++)
+ __free_page(pamt_pages[j]);
+ return -ENOMEM;
+}
+
+static void free_pamt_array(struct page **pamt_pages)
+{
+ for (int i = 0; i < TDX_DPAMT_ENTRY_PAGE_CNT; i++) {
+ /*
+ * Reset pages unconditionally to cover cases
+ * where they were passed to the TDX module.
+ */
+ tdx_quirk_reset_paddr(page_to_phys(pamt_pages[i]), PAGE_SIZE);
+
+ __free_page(pamt_pages[i]);
+ }
+}
+
+/*
+ * Calculate the arg needed for operating on the DPAMT backing for
+ * a given 4KB page.
+ */
+static u64 pamt_2mb_arg(kvm_pfn_t pfn)
+{
+ unsigned long hpa_2mb = ALIGN_DOWN(pfn << PAGE_SHIFT, PMD_SIZE);
+
+ return hpa_2mb | TDX_PS_2M;
+}
+
+/* Add PAMT backing for the given page. */
+static u64 tdh_phymem_pamt_add(kvm_pfn_t pfn, struct page **pamt_pages)
+{
+ struct tdx_module_args args = {
+ .rcx = pamt_2mb_arg(pfn),
+ .rdx = page_to_phys(pamt_pages[0]),
+ .r8 = page_to_phys(pamt_pages[1]),
+ };
+
+ return seamcall(TDH_PHYMEM_PAMT_ADD, &args);
+}
+
+/* Remove PAMT backing for the given page. */
+static u64 tdh_phymem_pamt_remove(kvm_pfn_t pfn, struct page **pamt_pages)
+{
+ struct tdx_module_args args = {
+ .rcx = pamt_2mb_arg(pfn),
+ };
+ u64 ret;
+
+ ret = seamcall_ret(TDH_PHYMEM_PAMT_REMOVE, &args);
+ if (ret)
+ return ret;
+
+ /* Copy PAMT pages out of the struct per the TDX ABI */
+ pamt_pages[0] = phys_to_page(args.rdx);
+ pamt_pages[1] = phys_to_page(args.r8);
+
+ return 0;
+}
+
+/* Allocate PAMT memory for the given page */
+static int tdx_pamt_get(kvm_pfn_t pfn)
+{
+ struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT];
+ u64 tdx_status;
+ int ret;
+
+ if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+ return 0;
+
+ ret = alloc_pamt_array(pamt_pages);
+ if (ret)
+ return ret;
+
+ tdx_status = tdh_phymem_pamt_add(pfn, pamt_pages);
+ if (tdx_status != TDX_SUCCESS) {
+ ret = -EIO;
+ goto out_free;
+ }
+
+ return 0;
+out_free:
+ free_pamt_array(pamt_pages);
+ return ret;
+}
+
+/* Free PAMT memory for the given page */
+static void tdx_pamt_put(kvm_pfn_t pfn)
+{
+ struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT] = {};
+ u64 tdx_status;
+
+ if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+ return;
+
+ tdx_status = tdh_phymem_pamt_remove(pfn, pamt_pages);
+
+ /*
+ * Don't free pamt_pages as it could hold garbage when
+ * tdh_phymem_pamt_remove() fails. Don't panic/BUG_ON(), as
+ * there is no risk of data corruption, but do yell loudly as
+ * failure indicates a kernel bug, memory is being leaked, and
+ * the dangling PAMT entry may cause future operations to fail.
+ */
+ if (WARN_ON_ONCE(tdx_status != TDX_SUCCESS))
+ return;
+
+ free_pamt_array(pamt_pages);
+}
+
+/*
+ * Return a page that can be gifted to the TDX-Module for use as a "control"
+ * page, i.e. pages that are used for control and S-EPT structures for a given
+ * TDX guest, and bound to said guest's HKID and thus obtain TDX protections,
+ * including PAMT tracking.
+ */
+struct page *tdx_alloc_control_page(void)
+{
+ struct page *page;
+
+ page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!page)
+ return NULL;
+
+ if (tdx_pamt_get(page_to_pfn(page))) {
+ __free_page(page);
+ return NULL;
+ }
+
+ return page;
+}
+EXPORT_SYMBOL_FOR_KVM(tdx_alloc_control_page);
+
+/*
+ * Free a page that was gifted to the TDX-Module for use as a control/S-EPT
+ * page. After this, the page is no longer protected by TDX.
+ */
+void tdx_free_control_page(struct page *page)
+{
+ if (!page)
+ return;
+
+ tdx_pamt_put(page_to_pfn(page));
+ __free_page(page);
+}
+EXPORT_SYMBOL_FOR_KVM(tdx_free_control_page);
+
#ifdef CONFIG_KEXEC_CORE
void tdx_cpu_flush_cache_for_kexec(void)
{
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index dde219c823b41..8c39dde347cc2 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -46,6 +46,8 @@
#define TDH_PHYMEM_PAGE_WBINVD 41
#define TDH_VP_WR 43
#define TDH_SYS_CONFIG 45
+#define TDH_PHYMEM_PAMT_ADD 58
+#define TDH_PHYMEM_PAMT_REMOVE 59
/*
* SEAMCALL leaf:
--
2.54.0