linux-coco.lists.linux.dev archive mirror
From: Binbin Wu <binbin.wu@linux.intel.com>
To: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: kas@kernel.org, bp@alien8.de, chao.gao@intel.com,
	dave.hansen@linux.intel.com, isaku.yamahata@intel.com,
	kai.huang@intel.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
	mingo@redhat.com, pbonzini@redhat.com, seanjc@google.com,
	tglx@linutronix.de, x86@kernel.org, yan.y.zhao@intel.com,
	vannapurve@google.com,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v3 08/16] x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
Date: Wed, 24 Sep 2025 14:15:11 +0800
Message-ID: <86ab9923-624d-4950-abea-46780e94c6ce@linux.intel.com>
In-Reply-To: <20250918232224.2202592-9-rick.p.edgecombe@intel.com>



On 9/19/2025 7:22 AM, Rick Edgecombe wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Optimize the PAMT alloc/free helpers to avoid taking the global lock when
> possible.
>
> The recently introduced PAMT alloc/free helpers maintain a refcount to
> keep track of when it is ok to reclaim and free a 4KB PAMT page. This
> refcount is protected by a global lock in order to guarantee that races
> don’t result in the PAMT getting freed while another caller requests it
> be mapped. But a global lock is a bit heavyweight, especially since the
> refcounts can be (already are) updated atomically.
>
> A simple approach would be to increment/decrement the refcount outside of
> the lock before actually adjusting the PAMT, and only adjust the PAMT if
> the refcount transitions from/to 0. This would correctly allocate and free
> the PAMT page without getting out of sync. But there it leaves a race
> where a simultaneous caller could see the refcount already incremented and
> return before it is actually mapped.
>
> So treat the refcount 0->1 case as a special case. On add, if the refcount
> is zero *don’t* increment the refcount outside the lock (to 1). Always
> take the lock in that case and only set the refcount to 1 after the PAMT
> is actually added. This way simultaneous adders, when PAMT is not
> installed yet, will take the slow lock path.
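
For readers following along: the "never go 0->1 outside the lock" rule is what
atomic_inc_not_zero() gives the fast path. The block below is only a userspace
sketch of that primitive using C11 atomics, not the kernel implementation; the
names inc_not_zero() and check_inc_not_zero() are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's atomic_inc_not_zero():
 * increment the counter only if it is currently non-zero.
 */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	/* Retry until the increment lands or the counter is seen at zero. */
	while (old != 0) {
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
		/* A failed CAS reloaded 'old'; the loop re-checks it. */
	}
	return false;
}

/* Small self-check; call it from main() when trying the sketch out. */
void check_inc_not_zero(void)
{
	atomic_int refcount = 0;

	/* At zero the fast path must refuse, so the caller takes the lock. */
	assert(!inc_not_zero(&refcount));
	assert(atomic_load(&refcount) == 0);

	/* Once the locked path has set the count to 1, the fast path works. */
	atomic_store(&refcount, 1);
	assert(inc_not_zero(&refcount));
	assert(atomic_load(&refcount) == 2);
}
```

A caller that gets false back is the one that has to take the lock, which is
how simultaneous adders end up on the slow path while PAMT is not installed.
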
>
> On the 1->0 case, it is ok to return from tdx_pamt_put() when the DPAMT is
> not actually freed yet, so the basic approach works. Just decrement the
> refcount before  taking the lock. Only do the lock and removal of the PAMT
> when the refcount goes to zero.
>
> There is an asymmetry between tdx_pamt_get() and tdx_pamt_put() in that
> tdx_pamt_put() goes 1->0 outside the lock, but tdx_pamt_put() does 0-1
                                                      ^
                                       Should this be tdx_pamt_get()?
> inside the lock. Because of this, there is a special race where
> tdx_pamt_put() could decrement the refcount to zero before the PAMT is
> actually removed, and tdx_pamt_get() could try to do a PAMT.ADD when the
> page is already mapped. Luckily the TDX module will tell return a special
> error that tells us we hit this case. So handle it specially by looking
> for the error code.
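
To convince myself of the ordering, I replayed this race in a small
single-threaded model. It is only a sketch under assumptions: the struct, the
enum and every model_* name are hypothetical, a bool stands in for the TDX
module's PAMT state, and the two "CPUs" are just sequential steps with no real
locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: none of these names come from the patch. */
enum model_status { MODEL_SUCCESS, MODEL_HPA_RANGE_NOT_FREE };

struct model_range {
	atomic_int refcount;	/* stands in for the per-range pamt_refcount */
	bool pamt_installed;	/* stands in for TDX module state */
};

/* Stand-in for PAMT.ADD: refuses if the PAMT is already installed. */
static enum model_status model_pamt_add(struct model_range *r)
{
	if (r->pamt_installed)
		return MODEL_HPA_RANGE_NOT_FREE;
	r->pamt_installed = true;
	return MODEL_SUCCESS;
}

/* Locked slow path of get, entered because the fast path saw zero. */
static void model_get_slow(struct model_range *r)
{
	if (atomic_load(&r->refcount)) {	/* lost race to another get */
		atomic_fetch_add(&r->refcount, 1);
		return;
	}
	if (model_pamt_add(r) == MODEL_SUCCESS)
		atomic_store(&r->refcount, 1);
	else	/* put dropped the count already, removal still pending */
		atomic_fetch_add(&r->refcount, 1);
}

/* Locked slow path of put, entered after put dropped the count to zero. */
static void model_put_slow(struct model_range *r)
{
	if (atomic_load(&r->refcount))	/* a get revived the range */
		return;
	assert(r->pamt_installed);	/* only remove what is installed */
	r->pamt_installed = false;	/* stands in for PAMT.REMOVE */
}

/* Replays the interleaving from the commit message, one step at a time. */
bool model_replay_race(void)
{
	struct model_range r = { .refcount = 1, .pamt_installed = true };

	/* CPU A: put drops 1->0 but has not taken the lock yet. */
	int old = atomic_fetch_sub(&r.refcount, 1);

	/* CPU B: get saw zero, took the lock, and hits "not free". */
	model_get_slow(&r);

	/* CPU A: now takes the lock and has to skip the removal. */
	model_put_slow(&r);

	return old == 1 && r.pamt_installed && atomic_load(&r.refcount) == 1;
}
```

In this model the range ends up with the PAMT still installed and a refcount
of 1, which matches what the commit message describes.
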
>
> The optimization is a little special, so make the code extra commented
> and verbose.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> [Clean up code, update log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> v3:
>   - Split out optimization from “x86/virt/tdx: Add tdx_alloc/free_page() helpers”
>   - Remove edge case handling that I could not find a reason for
>   - Write log
> ---
>   arch/x86/include/asm/shared/tdx_errno.h |  2 ++
>   arch/x86/virt/vmx/tdx/tdx.c             | 46 +++++++++++++++++++++----
>   2 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/shared/tdx_errno.h b/arch/x86/include/asm/shared/tdx_errno.h
> index 49ab7ecc7d54..4bc0b9c9e82b 100644
> --- a/arch/x86/include/asm/shared/tdx_errno.h
> +++ b/arch/x86/include/asm/shared/tdx_errno.h
> @@ -21,6 +21,7 @@
>   #define TDX_PREVIOUS_TLB_EPOCH_BUSY		0x8000020100000000ULL
>   #define TDX_RND_NO_ENTROPY			0x8000020300000000ULL
>   #define TDX_PAGE_METADATA_INCORRECT		0xC000030000000000ULL
> +#define TDX_HPA_RANGE_NOT_FREE			0xC000030400000000ULL
>   #define TDX_VCPU_NOT_ASSOCIATED			0x8000070200000000ULL
>   #define TDX_KEY_GENERATION_FAILED		0x8000080000000000ULL
>   #define TDX_KEY_STATE_INCORRECT			0xC000081100000000ULL
> @@ -100,6 +101,7 @@ DEFINE_TDX_ERRNO_HELPER(TDX_SUCCESS);
>   DEFINE_TDX_ERRNO_HELPER(TDX_RND_NO_ENTROPY);
>   DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_INVALID);
>   DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_BUSY);
> +DEFINE_TDX_ERRNO_HELPER(TDX_HPA_RANGE_NOT_FREE);
>   DEFINE_TDX_ERRNO_HELPER(TDX_VCPU_NOT_ASSOCIATED);
>   DEFINE_TDX_ERRNO_HELPER(TDX_FLUSHVP_NOT_DONE);
>   
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index af73b6c2e917..c25e238931a7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -2117,7 +2117,7 @@ int tdx_pamt_get(struct page *page)
>   	u64 pamt_pa_array[MAX_DPAMT_ARG_SIZE];
>   	atomic_t *pamt_refcount;
>   	u64 tdx_status;
> -	int ret;
> +	int ret = 0;
>   
>   	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
>   		return 0;
> @@ -2128,14 +2128,40 @@ int tdx_pamt_get(struct page *page)
>   
>   	pamt_refcount = tdx_find_pamt_refcount(hpa);
>   
> +	if (atomic_inc_not_zero(pamt_refcount))
> +		goto out_free;
> +
>   	scoped_guard(spinlock, &pamt_lock) {
> -		if (atomic_read(pamt_refcount))
> +		/*
> +		 * Lost race to other tdx_pamt_add(). Other task has already allocated
> +		 * PAMT memory for the HPA.
> +		 */
> +		if (atomic_read(pamt_refcount)) {
> +			atomic_inc(pamt_refcount);
>   			goto out_free;
> +		}
>   
>   		tdx_status = tdh_phymem_pamt_add(hpa | TDX_PS_2M, pamt_pa_array);
>   
>   		if (IS_TDX_SUCCESS(tdx_status)) {
> +			/*
> +			 * The refcount is zero, and this locked path is the only way to
> +			 * increase it from 0-1. If the PAMT.ADD was successful, set it
> +			 * to 1 (obviously).
> +			 */
> +			atomic_set(pamt_refcount, 1);
> +		} else if (IS_TDX_HPA_RANGE_NOT_FREE(tdx_status)) {
> +			/*
> +			 * Less obviously, another CPU's call to tdx_pamt_put() could have
> +			 * decremented the refcount before entering its lock section.
> +			 * In this case, the PAMT is not actually removed yet. Luckily
> +			 * TDX module tells about this case, so increment the refcount
> +			 * 0-1, so tdx_pamt_put() skips its pending PAMT.REMOVE.
> +			 *
> +			 * The call didn't need the pages though, so free them.
> +			 */
>   			atomic_inc(pamt_refcount);
> +			goto out_free;
>   		} else {
>   			pr_err("TDH_PHYMEM_PAMT_ADD failed: %#llx\n", tdx_status);
>   			goto out_free;
> @@ -2167,15 +2193,23 @@ void tdx_pamt_put(struct page *page)
>   
>   	pamt_refcount = tdx_find_pamt_refcount(hpa);
>   
> +	/*
> +	 * Unlike the paired call in tdx_pamt_get(), decrement the refcount
> +	 * outside the lock even if it's not the special 0<->1 transition.
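Typo? "even if it's not" should probably read "even if it's": the decrement
happens outside the lock even for the special 1->0 transition.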

> +	 * See special logic around HPA_RANGE_NOT_FREE in tdx_pamt_get().
> +	 */
> +	if (!atomic_dec_and_test(pamt_refcount))
> +		return;
> +
>   	scoped_guard(spinlock, &pamt_lock) {
> -		if (!atomic_read(pamt_refcount))
> +		/* Lost race with tdx_pamt_get() */
> +		if (atomic_read(pamt_refcount))
>   			return;
>   
>   		tdx_status = tdh_phymem_pamt_remove(hpa | TDX_PS_2M, pamt_pa_array);
>   
> -		if (IS_TDX_SUCCESS(tdx_status)) {
> -			atomic_dec(pamt_refcount);
> -		} else {
> +		if (!IS_TDX_SUCCESS(tdx_status)) {
> +			atomic_inc(pamt_refcount);
>   			pr_err("TDH_PHYMEM_PAMT_REMOVE failed: %#llx\n", tdx_status);
>   			return;
>   		}

