linux-coco.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kas@kernel.org" <kas@kernel.org>,
	"seanjc@google.com" <seanjc@google.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Cc: "Gao, Chao" <chao.gao@intel.com>, "bp@alien8.de" <bp@alien8.de>,
	"Huang, Kai" <kai.huang@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCHv2 00/12] TDX: Enable Dynamic PAMT
Date: Fri, 8 Aug 2025 23:18:40 +0000	[thread overview]
Message-ID: <d432b8b7cfc413001c743805787990fe0860e780.camel@intel.com> (raw)
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

On Mon, 2025-06-09 at 22:13 +0300, Kirill A. Shutemov wrote:
> Dynamic PAMT enabling in kernel
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Kernel maintains refcounts for every 2M regions with two helpers
> tdx_pamt_get() and tdx_pamt_put().
> 
> The refcount represents number of users for the PAMT memory in the region.
> Kernel calls TDH.PHYMEM.PAMT.ADD on 0->1 transition and
> TDH.PHYMEM.PAMT.REMOVE on transition 1->0.
> 
> The function tdx_alloc_page() allocates a new page and ensures that it is
> backed by PAMT memory. Pages allocated in this manner are ready to be used
> for a TD. The function tdx_free_page() frees the page and releases the
> PAMT memory for the 2M region if it is no longer needed.
> 
> PAMT memory gets allocated as part of TD init, VCPU init, on populating
> SEPT tree and adding guest memory (both during TD build and via AUG on
> accept). Splitting 2M page into 4K also requires PAMT memory.
> 
> PAMT memory removed on reclaim of control pages and guest memory.
> 
> Populating PAMT memory on fault and on split is tricky as kernel cannot
> allocate memory from the context where it is needed. These code paths use
> pre-allocated PAMT memory pools.
> 
> Previous attempt on Dynamic PAMT enabling
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The initial attempt at kernel enabling was quite different. It was built
> around lazy PAMT allocation: only trying to add a PAMT page pair if a
> SEAMCALL fails due to a missing PAMT and reclaiming it based on hints
> provided by the TDX module.
> 
> The motivation was to avoid duplicating the PAMT memory refcounting that
> the TDX module does on the kernel side.
> 
> This approach is inherently more racy as there is no serialization of
> PAMT memory add/remove against SEAMCALLs that add/remove memory for a TD.
> Such serialization would require global locking, which is not feasible.
> 
> This approach worked, but at some point it became clear that it could not
> be robust as long as the kernel avoids TDX_OPERAND_BUSY loops.
> TDX_OPERAND_BUSY will occur as a result of the races mentioned above.
> 
> This approach was abandoned in favor of explicit refcounting.

On closer inspection this new solution also has global locking. It
opportunistically checks to see if there is already a refcount, but otherwise
when a PAMT page actually has to be added there is a global spinlock while the
PAMT add/remove SEAMCALL is made. I guess this is going to get taken somewhere
around once per 512 4k private pages, but when it does it has some less ideal
properties:
 - Cache line bouncing of the lock between all TDs on the host
 - An global exclusive lock deep inside the TDP MMU shared lock fault path
 - Contend heavily when two TD's shutting down at the same time?

As for why not only do the lock as a backup option like the kick+lock solution
in KVM, the problem would be losing the refcount race and ending up with a PAMT
page getting released early.

As far as TDX module locking is concerned (i.e. BUSY error codes from pamt
add/remove), it seems this would only happen if pamt add/remove operate
simultaneously on the same 2MB HPA region. That is completely prevented by the
refcount and global lock, but it's a bit heavyweight. It prevents simultaneously
adding totally separate 2MB regions when we only would need to prevent
simultaneously operating on the same 2MB region.

I don't see any other reason for the global spin lock, Kirill was that it? Did
you consider also adding a lock per 2MB region, like the refcount? Or any other
granularity of lock besides global? Not saying global is definitely the wrong
choice, but seems arbitrary if I got the above right.


  parent reply	other threads:[~2025-08-08 23:18 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-09 19:13 [PATCHv2 00/12] TDX: Enable Dynamic PAMT Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 01/12] x86/tdx: Consolidate TDX error handling Kirill A. Shutemov
2025-06-25 17:58   ` Dave Hansen
2025-06-25 20:58     ` Edgecombe, Rick P
2025-06-25 21:27       ` Sean Christopherson
2025-06-25 21:46         ` Edgecombe, Rick P
2025-06-26  9:25         ` kirill.shutemov
2025-06-26 14:46           ` Dave Hansen
2025-06-26 15:51             ` Sean Christopherson
2025-06-26 16:59               ` Dave Hansen
2025-06-27 10:42                 ` kirill.shutemov
2025-07-30 18:32                 ` Edgecombe, Rick P
2025-07-31 23:31                   ` Sean Christopherson
2025-07-31 23:46                     ` Edgecombe, Rick P
2025-07-31 23:53                       ` Sean Christopherson
2025-08-01 15:03                         ` Edgecombe, Rick P
2025-08-06 15:19                           ` Sean Christopherson
2025-06-26  0:05     ` Huang, Kai
2025-07-30 18:33       ` Edgecombe, Rick P
2025-06-09 19:13 ` [PATCHv2 02/12] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT Kirill A. Shutemov
2025-06-25 18:06   ` Dave Hansen
2025-06-26  9:25     ` Kirill A. Shutemov
2025-07-31  1:06     ` Edgecombe, Rick P
2025-07-31  4:10       ` Huang, Kai
2025-06-26 11:08   ` Huang, Kai
2025-06-27 10:42     ` kirill.shutemov
2025-06-09 19:13 ` [PATCHv2 03/12] x86/virt/tdx: Allocate reference counters for PAMT memory Kirill A. Shutemov
2025-06-25 19:26   ` Dave Hansen
2025-06-27 11:27     ` Kirill A. Shutemov
2025-06-27 14:03       ` Dave Hansen
2025-06-26  0:53   ` Huang, Kai
2025-06-26  4:48     ` Huang, Kai
2025-06-27 11:35     ` kirill.shutemov
2025-06-09 19:13 ` [PATCHv2 04/12] x86/virt/tdx: Add tdx_alloc/free_page() helpers Kirill A. Shutemov
2025-06-10  2:36   ` Chao Gao
2025-06-10 14:51     ` [PATCHv2.1 " Kirill A. Shutemov
2025-06-25 18:01       ` Dave Hansen
2025-06-25 20:09     ` [PATCHv2 " Dave Hansen
2025-06-26  0:46       ` Chao Gao
2025-06-25 20:02   ` Dave Hansen
2025-06-27 13:00     ` Kirill A. Shutemov
2025-06-27  7:49   ` Adrian Hunter
2025-06-27 13:03     ` Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 05/12] KVM: TDX: Allocate PAMT memory in __tdx_td_init() Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 06/12] KVM: TDX: Allocate PAMT memory in tdx_td_vcpu_init() Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 07/12] KVM: TDX: Preallocate PAMT pages to be used in page fault path Kirill A. Shutemov
2025-06-26 11:21   ` Huang, Kai
2025-07-10  1:34   ` Edgecombe, Rick P
2025-07-10  7:49     ` kirill.shutemov
2025-06-09 19:13 ` [PATCHv2 08/12] KVM: TDX: Handle PAMT allocation in " Kirill A. Shutemov
2025-06-12 12:19   ` Chao Gao
2025-06-12 13:05     ` [PATCHv2.1 " Kirill A. Shutemov
2025-06-25 22:38   ` [PATCHv2 " Edgecombe, Rick P
2025-07-09 14:29     ` kirill.shutemov
2025-07-10  1:33   ` Edgecombe, Rick P
2025-07-10  8:45     ` kirill.shutemov
2025-08-21 19:21   ` Sagi Shahar
2025-08-21 19:35     ` Edgecombe, Rick P
2025-08-21 19:53       ` Sagi Shahar
2025-06-09 19:13 ` [PATCHv2 09/12] KVM: TDX: Reclaim PAMT memory Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 10/12] [NOT-FOR-UPSTREAM] x86/virt/tdx: Account PAMT memory and print it in /proc/meminfo Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 11/12] x86/virt/tdx: Enable Dynamic PAMT Kirill A. Shutemov
2025-06-09 19:13 ` [PATCHv2 12/12] Documentation/x86: Add documentation for TDX's " Kirill A. Shutemov
2025-06-25 13:25 ` [PATCHv2 00/12] TDX: Enable " Kirill A. Shutemov
2025-06-25 22:49 ` Edgecombe, Rick P
2025-06-27 13:05   ` kirill.shutemov
2025-08-08 23:18 ` Edgecombe, Rick P [this message]
2025-08-11  6:31   ` kas
2025-08-11 22:30     ` Edgecombe, Rick P
2025-08-12  2:02       ` Sean Christopherson
2025-08-12  2:31         ` Vishal Annapurve
2025-08-12  8:04           ` kas
2025-08-12 15:12             ` Edgecombe, Rick P
2025-08-12 16:15               ` Sean Christopherson
2025-08-12 18:39                 ` Edgecombe, Rick P
2025-08-12 22:00                   ` Vishal Annapurve
2025-08-12 23:34                     ` Edgecombe, Rick P
2025-08-13  0:18                       ` Vishal Annapurve
2025-08-13  0:51                         ` Edgecombe, Rick P
2025-08-12 18:44                 ` Vishal Annapurve
2025-08-13  8:09                 ` Kiryl Shutsemau
2025-08-13  7:49               ` Kiryl Shutsemau
2025-08-12  8:03         ` kas
2025-08-13 22:43         ` Edgecombe, Rick P
2025-08-13 23:31           ` Dave Hansen
2025-08-14  0:14             ` Edgecombe, Rick P
2025-08-14 10:55               ` Kiryl Shutsemau
2025-08-15  1:03                 ` Edgecombe, Rick P
2025-08-20 15:31                   ` Sean Christopherson
2025-08-20 16:35                     ` Edgecombe, Rick P

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d432b8b7cfc413001c743805787990fe0860e780.camel@intel.com \
    --to=rick.p.edgecombe@intel.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).