linux-coco.lists.linux.dev archive mirror
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com,
	kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv2 12/12] Documentation/x86: Add documentation for TDX's Dynamic PAMT
Date: Mon,  9 Jun 2025 22:13:40 +0300
Message-ID: <20250609191340.2051741-13-kirill.shutemov@linux.intel.com>
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

Expand TDX documentation to include information on the Dynamic PAMT
feature.

The new section explains Dynamic PAMT support in the TDX module and how
it is enabled on the kernel side.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/arch/x86/tdx.rst | 108 +++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst
index 719043cd8b46..a1dc50dd6f57 100644
--- a/Documentation/arch/x86/tdx.rst
+++ b/Documentation/arch/x86/tdx.rst
@@ -99,6 +99,114 @@ initialize::
 
   [..] virt/tdx: module initialization failed ...
 
+Dynamic PAMT
+------------
+
+Dynamic PAMT support in TDX module
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Dynamic PAMT is a TDX feature that allows the VMM to allocate PAMT_4K as
+needed. PAMT_1G and PAMT_2M are still allocated statically at TDX module
+initialization. At init, the PAMT_4K allocation is replaced with
+PAMT_PAGE_BITMAP, which currently requires one bit of memory per 4K page.
+
+The VMM is responsible for allocating and freeing PAMT_4K. Two new
+SEAMCALLs exist for this: TDH.PHYMEM.PAMT.ADD and TDH.PHYMEM.PAMT.REMOVE.
+They add and remove PAMT memory in the form of a page pair. The two pages
+are not required to be contiguous.
+
+A page pair supplied via TDH.PHYMEM.PAMT.ADD covers the specified 2M region.
+It allows any 4K page in the region to be used by the TDX module.
+
+With Dynamic PAMT, a number of SEAMCALLs can now fail due to missing PAMT
+memory (TDX_MISSING_PAMT_PAGE_PAIR):
+
+ - TDH.MNG.CREATE
+ - TDH.MNG.ADDCX
+ - TDH.VP.ADDCX
+ - TDH.VP.CREATE
+ - TDH.MEM.PAGE.ADD
+ - TDH.MEM.PAGE.AUG
+ - TDH.MEM.PAGE.DEMOTE
+ - TDH.MEM.PAGE.RELOCATE
+
+Basically, any memory supplied to a TD has to be backed by PAMT
+memory.
+
+Once no TD uses the 2M range, the PAMT page pair can be reclaimed with
+TDH.PHYMEM.PAMT.REMOVE.
+
+The TDX module tracks PAMT memory usage and can give the VMM a hint that
+PAMT memory can be removed. Such a hint is provided by all SEAMCALLs that
+remove memory from a TD:
+
+ - TDH.MEM.SEPT.REMOVE
+ - TDH.MEM.PAGE.REMOVE
+ - TDH.MEM.PAGE.PROMOTE
+ - TDH.MEM.PAGE.RELOCATE
+ - TDH.PHYMEM.PAGE.RECLAIM
+
+With Dynamic PAMT, TDH.MEM.PAGE.DEMOTE takes a PAMT page pair as an
+additional input to populate PAMT_4K on split. TDH.MEM.PAGE.PROMOTE
+returns the PAMT page pair that is no longer needed.
+
+PAMT memory is a global resource and is not tied to a specific TD. The TDX
+module maintains PAMT memory in a radix tree addressed by physical address.
+Each entry in the tree can be locked with a shared or an exclusive lock.
+Any modification of the tree requires an exclusive lock.
+
+Any SEAMCALL that takes an explicit HPA as an argument walks the tree,
+taking a shared lock on entries. This is required to make sure that the
+page pointed to by the HPA is of a compatible type for the usage.
+
+TDCALLs don't take PAMT locks as none of them take an HPA as an argument.
+
+Dynamic PAMT enabling in kernel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The kernel maintains a refcount for every 2M region, updated through two
+helpers: tdx_pamt_get() and tdx_pamt_put().
+
+The refcount represents the number of users of the PAMT memory in the
+region. The kernel calls TDH.PHYMEM.PAMT.ADD on the 0->1 transition and
+TDH.PHYMEM.PAMT.REMOVE on the 1->0 transition.
+
+The function tdx_alloc_page() allocates a new page and ensures that it is
+backed by PAMT memory. Pages allocated in this manner are ready to be used
+for a TD. The function tdx_free_page() frees the page and releases the
+PAMT memory for the 2M region if it is no longer needed.
+
+PAMT memory is allocated as part of TD init and vCPU init, when populating
+the SEPT tree, and when adding guest memory (both during TD build and via
+AUG on accept). Splitting a 2M page into 4K pages also requires PAMT memory.
+
+PAMT memory is removed on reclaim of control pages and guest memory.
+
+Populating PAMT memory on fault and on split is tricky as the kernel cannot
+allocate memory from the context where it is needed. These code paths use
+pre-allocated PAMT memory pools.
+
+Previous attempt at Dynamic PAMT enabling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The initial attempt at kernel enabling was quite different. It was built
+around lazy PAMT allocation: only trying to add a PAMT page pair if a
+SEAMCALL fails due to a missing PAMT and reclaiming it based on hints
+provided by the TDX module.
+
+The motivation was to avoid duplicating the PAMT memory refcounting that
+the TDX module does on the kernel side.
+
+This approach is inherently more racy as there is no serialization of
+PAMT memory add/remove against SEAMCALLs that add/remove memory for a TD.
+Such serialization would require global locking, which is not feasible.
+
+This approach worked, but at some point it became clear that it could not
+be robust as long as the kernel avoids TDX_OPERAND_BUSY loops.
+TDX_OPERAND_BUSY will occur as a result of the races mentioned above.
+
+This approach was abandoned in favor of explicit refcounting.
+
 TDX Interaction to Other Kernel Components
 ------------------------------------------
 
-- 
2.47.2
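
Illustration (not part of the patch): the refcounting rule described in
"Dynamic PAMT enabling in kernel" can be sketched in plain userspace C.
This is only a toy model under stated assumptions. The names
sketch_pamt_get(), sketch_pamt_put(), stub_pamt_add(), stub_pamt_remove(),
NR_REGIONS and the counters are hypothetical and are not the kernel's
tdx_pamt_get()/tdx_pamt_put() implementation. The stubs merely count
where TDH.PHYMEM.PAMT.ADD and TDH.PHYMEM.PAMT.REMOVE would be issued, and
the sketch omits the locking and page-pair allocation a real
implementation needs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; not taken from the patch. */
#define PAMT_REGION_SHIFT 21   /* one refcount per 2M region */
#define NR_REGIONS        8    /* toy physical address space of 16M */

static int pamt_refcount[NR_REGIONS]; /* users of PAMT memory per region */
static int nr_pamt_add;               /* times the ADD stub was reached */
static int nr_pamt_remove;            /* times the REMOVE stub was reached */

/* Stand-in for TDH.PHYMEM.PAMT.ADD: only records that it would be issued. */
static void stub_pamt_add(unsigned int region)
{
	(void)region;
	nr_pamt_add++;
}

/* Stand-in for TDH.PHYMEM.PAMT.REMOVE: only records that it would be issued. */
static void stub_pamt_remove(unsigned int region)
{
	(void)region;
	nr_pamt_remove++;
}

/* Map a host physical address to the index of its 2M region. */
static unsigned int hpa_to_region(uint64_t hpa)
{
	unsigned int region = (unsigned int)(hpa >> PAMT_REGION_SHIFT);

	assert(region < NR_REGIONS);
	return region;
}

/* Take a reference; the first user of a region triggers the ADD stub. */
static void sketch_pamt_get(uint64_t hpa)
{
	unsigned int region = hpa_to_region(hpa);

	if (pamt_refcount[region]++ == 0)
		stub_pamt_add(region);      /* 0->1 transition */
}

/* Drop a reference; the last user of a region triggers the REMOVE stub. */
static void sketch_pamt_put(uint64_t hpa)
{
	unsigned int region = hpa_to_region(hpa);

	assert(pamt_refcount[region] > 0);
	if (--pamt_refcount[region] == 0)
		stub_pamt_remove(region);   /* 1->0 transition */
}
```

In this model, two sketch_pamt_get() calls for addresses inside the same
2M region reach the ADD stub once, and the REMOVE stub runs only after
both references are dropped with sketch_pamt_put().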

