From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com,
	kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com,
	seanjc@google.com, tglx@kernel.org, vannapurve@google.com,
	x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com,
	kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCH v6 06/11] x86/virt/tdx: Optimize tdx_pamt_get/put()
Date: Mon, 25 May 2026 19:35:10 -0700
Message-ID: <20260526023515.288829-7-rick.p.edgecombe@intel.com>
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The Dynamic PAMT get/put helpers use a global spinlock to serialize all
refcount updates and SEAMCALL invocations. This gives correct behavior for
concurrent callers, but leads to contention. It is especially bad from the
KVM side, which is designed to allow faulting in EPT under a shared lock.
With the global spinlock, not only is the lock exclusive, but it is also
shared by all TDs instead of being scoped to a single one.

But taking the global lock each time is unnecessary. Only the 0->1 and
1->0 refcount transitions need the lock (to pair with the SEAMCALLs that
add and remove the Dynamic PAMT pages). The common case of incrementing a
non-zero refcount, or decrementing one that stays non-zero, can be done
locklessly.

So create a fast path and a slow path. Check the refcount outside the lock
and only take the lock for the slow path (0->1 and 1->0 transitions).
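
As a rough userspace illustration of the fast/slow split on the get side
(not the kernel code), the sketch below models the refcount with a C11
atomic_int and the global lock with a toy spinlock. The names
inc_not_zero(), toy_lock(), toy_unlock(), model_get() and backing_adds are
made up for this example. inc_not_zero() only mimics what the kernel's
atomic_inc_not_zero() does, and the backing_adds counter stands in for the
PAMT.ADD SEAMCALL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for the global pamt_lock (assumption). */
static atomic_flag pamt_lock = ATOMIC_FLAG_INIT;

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Counts simulated "add backing pages" operations (hypothetical). */
static int backing_adds;

/* Increment *v unless it is zero; return true if incremented. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

void model_get(atomic_int *refcount)
{
	/* Fast path: backing already present, no lock needed. */
	if (inc_not_zero(refcount))
		return;

	/* Slow path: possibly the 0->1 transition. */
	toy_lock(&pamt_lock);
	if (atomic_load(refcount)) {
		/* Another caller did 0->1 before we got the lock. */
		atomic_fetch_add(refcount, 1);
	} else {
		backing_adds++;			/* stand-in for the add */
		atomic_store(refcount, 1);	/* set the refcount 0->1 */
	}
	/* Either way the caller now holds a reference. */
	assert(atomic_load(refcount) >= 1);
	toy_unlock(&pamt_lock);
}
```

Starting from a zero refcount, two calls to model_get() leave the count at
2 with a single simulated add, and only the first call takes the lock.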

On the put side, use atomic_dec_and_lock() so that the 1->0 refcount
adjustment and the lock acquisition are atomic. Otherwise a 'get' could
slip in between them and the Dynamic PAMT could be freed incorrectly. On
the get side there is no primitive that combines the refcount adjustment
and the lock acquisition, so check the refcount again inside the lock.
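
A rough userspace model of the put side, for illustration only. It assumes
a C11 atomic_int for the refcount and a toy spinlock for the global lock.
dec_and_lock(), toy_lock(), toy_unlock(), model_put() and backing_removes
are names invented for this sketch. dec_and_lock() imitates the semantics
of atomic_dec_and_lock(): decrement locklessly unless the result would be
zero, otherwise take the lock first and do the final decrement under it,
returning true with the lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for the global pamt_lock (assumption). */
static atomic_flag pamt_lock = ATOMIC_FLAG_INIT;

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Counts simulated "remove backing pages" operations (hypothetical). */
static int backing_removes;

/*
 * Decrement *v. Return true, with the lock held, only if the count
 * reached zero. The 1->0 step never happens outside the lock.
 */
static bool dec_and_lock(atomic_int *v, atomic_flag *lock)
{
	int old = atomic_load(v);

	/* Fast path: decrement locklessly while the result stays > 0. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return false;
	}

	/* Slow path: take the lock, then decrement under it. */
	toy_lock(lock);
	old = atomic_fetch_sub(v, 1);
	assert(old > 0);	/* a put without a matching get is a bug */
	if (old == 1)
		return true;	/* reached zero, caller must unlock */
	toy_unlock(lock);
	return false;
}

void model_put(atomic_int *refcount)
{
	/* More than one reference: just drop ours and leave. */
	if (!dec_and_lock(refcount, &pamt_lock))
		return;

	backing_removes++;	/* stand-in for the remove */
	toy_unlock(&pamt_lock);
}
```

Because the final decrement is done with the lock held, a concurrent get
either increments the count before it can reach zero, or finds it at zero
and has to wait for the lock before adding the backing pages again.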

Assisted-by: GitHub Copilot:claude-opus-4-6
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v6:
 - Fix "tdx_pamt_add()" typo to "tdx_pamt_get()" in lost-race comment
 - Fix error path bug: set ret = -EIO and use WARN_ON_ONCE() instead of
   pr_err() for unexpected PAMT.ADD failures (Sean)
 - Use "set the refcount 0->1" wording to match atomic_set() usage
 - Wrap comments to 80 columns
 - Switch to atomic_dec_and_lock() and remove handling of races that are
   no longer needed as a result. Adjust comments as appropriate. (Dave)
 - Adjustments from dropping error helper patches
v4:
 - Use atomic_set() in the HPA_RANGE_NOT_FREE case (Kiryl)
 - Log, comment typos (Binbin)
 - Move PAMT page allocation after refcount check in tdx_pamt_get() to
   avoid an alloc/free in the common path.

v3:
 - Split out optimization from “x86/virt/tdx: Add tdx_alloc/free_page() helpers”
 - Remove edge case handling that I could not find a reason for
 - Write log
---
 arch/x86/virt/vmx/tdx/tdx.c | 102 +++++++++++++++++++++---------------
 1 file changed, 61 insertions(+), 41 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 50333eb96efa6..c41c632a4cdf2 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -2057,32 +2057,50 @@ static int tdx_pamt_get(kvm_pfn_t pfn)
 	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
 		return 0;
 
+	pamt_refcount = tdx_find_pamt_refcount(pfn);
+
+	/*
+	 * If the pamt page is already added (i.e. refcount >= 1),
+	 * then just increment the refcount.
+	 */
+	if (atomic_inc_not_zero(pamt_refcount))
+		return 0;
+
 	ret = alloc_pamt_array(pamt_pages);
 	if (ret)
 		return ret;
 
-	pamt_refcount = tdx_find_pamt_refcount(pfn);
+	spin_lock(&pamt_lock);
 
-	scoped_guard(spinlock, &pamt_lock) {
-		/*
-		 * If the pamt page is already added (i.e. refcount >= 1),
-		 * then just increment the refcount.
-		 */
-		if (atomic_read(pamt_refcount)) {
-			atomic_inc(pamt_refcount);
-			goto out_free;
-		}
-
-		/* Try to add the pamt page and take the refcount 0->1. */
-		tdx_status = tdh_phymem_pamt_add(pfn, pamt_pages);
-		if (WARN_ON_ONCE(tdx_status != TDX_SUCCESS)) {
-			ret = -EIO;
-			goto out_free;
-		}
-
-		atomic_set(pamt_refcount, 1);
+	/*
+	 * Unlike tdx_pamt_put() which uses atomic_dec_and_lock() to
+	 * atomically handle the 1->0 transition, the get side has no
+	 * equivalent combined primitive for 0->1. Recheck under the
+	 * lock since another get may have already done the 0->1
+	 * transition after both saw atomic_inc_not_zero() fail.
+	 */
+	if (atomic_read(pamt_refcount)) {
+		atomic_inc(pamt_refcount);
+		spin_unlock(&pamt_lock);
+		goto out_free;
 	}
 
+	tdx_status = tdh_phymem_pamt_add(pfn, pamt_pages);
+	if (tdx_status == TDX_SUCCESS) {
+		/*
+		 * The refcount is zero, and this locked path is the
+		 * only way to increase it from 0->1.
+		 */
+		atomic_set(pamt_refcount, 1);
+	} else {
+		WARN_ON_ONCE(1);
+		ret = -EIO;
+		spin_unlock(&pamt_lock);
+		goto out_free;
+	}
+
+	spin_unlock(&pamt_lock);
+
 	return 0;
 out_free:
 	free_pamt_array(pamt_pages);
@@ -2104,32 +2122,34 @@ static void tdx_pamt_put(kvm_pfn_t pfn)
 
 	pamt_refcount = tdx_find_pamt_refcount(pfn);
 
-	scoped_guard(spinlock, &pamt_lock) {
+	/*
+	 * If there is more than 1 reference on the pamt page, don't
+	 * remove it yet. Just decrement the refcount.
+	 */
+	if (!atomic_dec_and_lock(pamt_refcount, &pamt_lock))
+		return;
+
+	tdx_status = tdh_phymem_pamt_remove(pfn, pamt_pages);
+
+	/*
+	 * Don't free pamt_pages as it could hold garbage when
+	 * tdh_phymem_pamt_remove() fails.  Don't panic/BUG_ON(), as
+	 * there is no risk of data corruption, but do yell loudly as
+	 * failure indicates a kernel bug, memory is being leaked, and
+	 * the dangling PAMT entry may cause future operations to fail.
+	 */
+	if (WARN_ON_ONCE(tdx_status != TDX_SUCCESS)) {
 		/*
-		 * If the there are more than 1 references on the pamt page,
-		 * don't remove it yet. Just decrement the refcount.
+		 * atomic_dec_and_lock() already decremented it to 0,
+		 * but the PAMT entry still exists since REMOVE failed.
 		 */
-		if (atomic_read(pamt_refcount) > 1) {
-			atomic_dec(pamt_refcount);
-			return;
-		}
-
-		/* Try to remove the pamt page and take the refcount 1->0. */
-		tdx_status = tdh_phymem_pamt_remove(pfn, pamt_pages);
-
-		/*
-		 * Don't free pamt_pages as it could hold garbage when
-		 * tdh_phymem_pamt_remove() fails.  Don't panic/BUG_ON(), as
-		 * there is no risk of data corruption, but do yell loudly as
-		 * failure indicates a kernel bug, memory is being leaked, and
-		 * the dangling PAMT entry may cause future operations to fail.
-		 */
-		if (WARN_ON_ONCE(tdx_status != TDX_SUCCESS))
-			return;
-
-		atomic_set(pamt_refcount, 0);
+		atomic_set(pamt_refcount, 1);
+		spin_unlock(&pamt_lock);
+		return;
 	}
 
+	spin_unlock(&pamt_lock);
+
 	free_pamt_array(pamt_pages);
 }
 
-- 
2.54.0

