From: Binbin Wu <binbin.wu@linux.intel.com>
To: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: kas@kernel.org, bp@alien8.de, chao.gao@intel.com,
dave.hansen@linux.intel.com, isaku.yamahata@intel.com,
kai.huang@intel.com, kvm@vger.kernel.org,
linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
mingo@redhat.com, pbonzini@redhat.com, seanjc@google.com,
tglx@linutronix.de, x86@kernel.org, yan.y.zhao@intel.com,
vannapurve@google.com,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v3 08/16] x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
Date: Wed, 24 Sep 2025 14:15:11 +0800
Message-ID: <86ab9923-624d-4950-abea-46780e94c6ce@linux.intel.com>
In-Reply-To: <20250918232224.2202592-9-rick.p.edgecombe@intel.com>
On 9/19/2025 7:22 AM, Rick Edgecombe wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Optimize the PAMT alloc/free helpers to avoid taking the global lock when
> possible.
>
> The recently introduced PAMT alloc/free helpers maintain a refcount to
> keep track of when it is ok to reclaim and free a 4KB PAMT page. This
> refcount is protected by a global lock in order to guarantee that races
> don’t result in the PAMT getting freed while another caller requests it
> be mapped. But a global lock is a bit heavyweight, especially since the
> refcounts can be (already are) updated atomically.
>
> A simple approach would be to increment/decrement the refcount outside of
> the lock before actually adjusting the PAMT, and only adjust the PAMT if
> the refcount transitions from/to 0. This would correctly allocate and free
> the PAMT page without getting out of sync. But there it leaves a race
> where a simultaneous caller could see the refcount already incremented and
> return before it is actually mapped.
>
> So treat the refcount 0->1 case as a special case. On add, if the refcount
> is zero *don’t* increment the refcount outside the lock (to 1). Always
> take the lock in that case and only set the refcount to 1 after the PAMT
> is actually added. This way simultaneous adders, when PAMT is not
> installed yet, will take the slow lock path.
>
> On the 1->0 case, it is ok to return from tdx_pamt_put() when the DPAMT is
> not actually freed yet, so the basic approach works. Just decrement the
> refcount before taking the lock. Only do the lock and removal of the PAMT
> when the refcount goes to zero.
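
To check my reading of the scheme above, it boils down to something like
the below for me. This is a simplified model for discussion only: the
per-2M-range refcount lookup, the page pre-allocation and the actual
SEAMCALLs are stubbed out, error handling is dropped, and the names are
mine, not the patch's helpers.

#include <linux/atomic.h>
#include <linux/spinlock.h>

static atomic_t pamt_refcount;		/* stands in for the per-2MB-range refcount */
static DEFINE_SPINLOCK(pamt_lock);	/* stands in for the global pamt_lock */

static int model_pamt_get(void)
{
	/* Fast path: PAMT already installed, just take another reference. */
	if (atomic_inc_not_zero(&pamt_refcount))
		return 0;

	spin_lock(&pamt_lock);
	/* Re-check under the lock; another CPU may have installed it. */
	if (atomic_read(&pamt_refcount)) {
		atomic_inc(&pamt_refcount);
		goto out;
	}
	/* PAMT.ADD would go here; publish the reference only on success. */
	atomic_set(&pamt_refcount, 1);
out:
	spin_unlock(&pamt_lock);
	return 0;
}

static void model_pamt_put(void)
{
	/* Fast path: drop a reference; only the last one removes the PAMT. */
	if (!atomic_dec_and_test(&pamt_refcount))
		return;

	spin_lock(&pamt_lock);
	/* A racing get() may have re-taken the range; leave it mapped. */
	if (atomic_read(&pamt_refcount))
		goto out;
	/* PAMT.REMOVE would go here. */
out:
	spin_unlock(&pamt_lock);
}

If that matches your intent, the rest of my comments below should make
sense.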
>
> There is an asymmetry between tdx_pamt_get() and tdx_pamt_put() in that
> tdx_pamt_put() goes 1->0 outside the lock, but tdx_pamt_put() does 0-1
Should the second "tdx_pamt_put()" here be tdx_pamt_get()? I.e.,
tdx_pamt_put() goes 1->0 outside the lock, but tdx_pamt_get() does 0->1
inside the lock.
> inside the lock. Because of this, there is a special race where
> tdx_pamt_put() could decrement the refcount to zero before the PAMT is
> actually removed, and tdx_pamt_get() could try to do a PAMT.ADD when the
> page is already mapped. Luckily the TDX module will return a special
> error that tells us we hit this case. So handle it specially by looking
> for the error code.
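
If I follow, the race the TDX_HPA_RANGE_NOT_FREE handling covers is
roughly this interleaving (described in terms of the patch's helpers,
simplified):

/*
 * CPU A: tdx_pamt_put()              CPU B: tdx_pamt_get()
 *
 * atomic_dec_and_test(), 1 -> 0
 *                                    atomic_inc_not_zero() fails (== 0)
 *                                    lock pamt_lock
 *                                    atomic_read() == 0
 *                                    PAMT.ADD -> TDX_HPA_RANGE_NOT_FREE
 *                                      (the PAMT page is still mapped)
 *                                    atomic_inc(), 0 -> 1
 *                                    unlock pamt_lock
 * lock pamt_lock
 * atomic_read() != 0, so the pending
 * PAMT.REMOVE is skipped and the
 * mapping survives
 * unlock pamt_lock
 */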
>
> The optimization is a little special, so make the code extra commented
> and verbose.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> [Clean up code, update log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> v3:
> - Split out optimization from “x86/virt/tdx: Add tdx_alloc/free_page() helpers”
> - Remove edge case handling that I could not find a reason for
> - Write log
> ---
> arch/x86/include/asm/shared/tdx_errno.h | 2 ++
> arch/x86/virt/vmx/tdx/tdx.c | 46 +++++++++++++++++++++----
> 2 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/shared/tdx_errno.h b/arch/x86/include/asm/shared/tdx_errno.h
> index 49ab7ecc7d54..4bc0b9c9e82b 100644
> --- a/arch/x86/include/asm/shared/tdx_errno.h
> +++ b/arch/x86/include/asm/shared/tdx_errno.h
> @@ -21,6 +21,7 @@
> #define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000ULL
> #define TDX_RND_NO_ENTROPY 0x8000020300000000ULL
> #define TDX_PAGE_METADATA_INCORRECT 0xC000030000000000ULL
> +#define TDX_HPA_RANGE_NOT_FREE 0xC000030400000000ULL
> #define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL
> #define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL
> #define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL
> @@ -100,6 +101,7 @@ DEFINE_TDX_ERRNO_HELPER(TDX_SUCCESS);
> DEFINE_TDX_ERRNO_HELPER(TDX_RND_NO_ENTROPY);
> DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_INVALID);
> DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_BUSY);
> +DEFINE_TDX_ERRNO_HELPER(TDX_HPA_RANGE_NOT_FREE);
> DEFINE_TDX_ERRNO_HELPER(TDX_VCPU_NOT_ASSOCIATED);
> DEFINE_TDX_ERRNO_HELPER(TDX_FLUSHVP_NOT_DONE);
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index af73b6c2e917..c25e238931a7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -2117,7 +2117,7 @@ int tdx_pamt_get(struct page *page)
> u64 pamt_pa_array[MAX_DPAMT_ARG_SIZE];
> atomic_t *pamt_refcount;
> u64 tdx_status;
> - int ret;
> + int ret = 0;
>
> if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
> return 0;
> @@ -2128,14 +2128,40 @@ int tdx_pamt_get(struct page *page)
>
> pamt_refcount = tdx_find_pamt_refcount(hpa);
>
> + if (atomic_inc_not_zero(pamt_refcount))
> + goto out_free;
> +
> scoped_guard(spinlock, &pamt_lock) {
> - if (atomic_read(pamt_refcount))
> + /*
> + * Lost race to other tdx_pamt_add(). Other task has already allocated
> + * PAMT memory for the HPA.
> + */
> + if (atomic_read(pamt_refcount)) {
> + atomic_inc(pamt_refcount);
> goto out_free;
> + }
>
> tdx_status = tdh_phymem_pamt_add(hpa | TDX_PS_2M, pamt_pa_array);
>
> if (IS_TDX_SUCCESS(tdx_status)) {
> + /*
> + * The refcount is zero, and this locked path is the only way to
> + * increase it from 0-1. If the PAMT.ADD was successful, set it
> + * to 1 (obviously).
> + */
> + atomic_set(pamt_refcount, 1);
> + } else if (IS_TDX_HPA_RANGE_NOT_FREE(tdx_status)) {
> + /*
> + * Less obviously, another CPU's call to tdx_pamt_put() could have
> + * decremented the refcount before entering its lock section.
> + * In this case, the PAMT is not actually removed yet. Luckily
> + * TDX module tells about this case, so increment the refcount
> + * 0-1, so tdx_pamt_put() skips its pending PAMT.REMOVE.
> + *
> + * The call didn't need the pages though, so free them.
> + */
> atomic_inc(pamt_refcount);
> + goto out_free;
> } else {
> pr_err("TDH_PHYMEM_PAMT_ADD failed: %#llx\n", tdx_status);
> goto out_free;
> @@ -2167,15 +2193,23 @@ void tdx_pamt_put(struct page *page)
>
> pamt_refcount = tdx_find_pamt_refcount(hpa);
>
> + /*
> + * Unlike the paired call in tdx_pamt_get(), decrement the refcount
> + * outside the lock even if it's not the special 0<->1 transition.
Should "even if it's not" be "even if it's"? The decrement here happens
outside the lock even when it is the special 1->0 transition.
> + * See special logic around HPA_RANGE_NOT_FREE in tdx_pamt_get().
> + */
> + if (!atomic_dec_and_test(pamt_refcount))
> + return;
> +
> scoped_guard(spinlock, &pamt_lock) {
> - if (!atomic_read(pamt_refcount))
> + /* Lost race with tdx_pamt_get() */
> + if (atomic_read(pamt_refcount))
> return;
>
> tdx_status = tdh_phymem_pamt_remove(hpa | TDX_PS_2M, pamt_pa_array);
>
> - if (IS_TDX_SUCCESS(tdx_status)) {
> - atomic_dec(pamt_refcount);
> - } else {
> + if (!IS_TDX_SUCCESS(tdx_status)) {
> + atomic_inc(pamt_refcount);
> pr_err("TDH_PHYMEM_PAMT_REMOVE failed: %#llx\n", tdx_status);
> return;
> }