From: Sagi Shahar
Date: Thu, 21 Aug 2025 14:21:47 -0500
Subject: Re: [PATCHv2 08/12] KVM: TDX: Handle PAMT allocation in fault path
To: "Kirill A. Shutemov"
Cc: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com,
	rick.p.edgecombe@intel.com, isaku.yamahata@intel.com,
	kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org
In-Reply-To: <20250609191340.2051741-9-kirill.shutemov@linux.intel.com>
References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>
	<20250609191340.2051741-9-kirill.shutemov@linux.intel.com>

On Mon, Jun 9, 2025 at 2:16 PM Kirill A. Shutemov wrote:
>
> There are two distinct cases when the kernel needs to allocate PAMT
> memory in the fault path: for SEPT page tables in tdx_sept_link_private_spt()
> and for leaf pages in tdx_sept_set_private_spte().
>
> These code paths run in atomic context. Use a pre-allocated per-VCPU
> pool for memory allocations.
>
> Signed-off-by: Kirill A. Shutemov
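
Using a per-vCPU kvm_mmu_memory_cache as the pre-allocated pool makes
sense to me. IIUC the cache gets topped up in sleepable context before
the fault path drains it via kvm_mmu_memory_cache_alloc(). Roughly
something like this on the top-up side (untested sketch only;
tdx_topup_pamt_cache() is a made-up name and the actual call site in
this series may differ):

	/* Sketch: refill the per-vCPU PAMT cache while we can still sleep. */
	static int tdx_topup_pamt_cache(struct kvm_vcpu *vcpu)
	{
		/* One PAMT allocation consumes tdx_nr_pamt_pages() pages. */
		return kvm_mmu_topup_memory_cache(&vcpu->arch.pamt_page_cache,
						  tdx_nr_pamt_pages());
	}
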
> ---
>  arch/x86/include/asm/tdx.h  |  4 ++++
>  arch/x86/kvm/vmx/tdx.c      | 40 ++++++++++++++++++++++++++++++++-----
>  arch/x86/virt/vmx/tdx/tdx.c | 21 +++++++++++++------
>  virt/kvm/kvm_main.c         |  1 +
>  4 files changed, 55 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 47092eb13eb3..39f8dd7e0f06 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -116,6 +116,10 @@ u32 tdx_get_nr_guest_keyids(void);
>  void tdx_guest_keyid_free(unsigned int keyid);
>
>  int tdx_nr_pamt_pages(void);
> +int tdx_pamt_get(struct page *page, enum pg_level level,
> +		 struct page *(alloc)(void *data), void *data);
> +void tdx_pamt_put(struct page *page, enum pg_level level);
> +
>  struct page *tdx_alloc_page(void);
>  void tdx_free_page(struct page *page);
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 36c3c9f8a62c..bc9bc393f866 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1537,11 +1537,26 @@ static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
>  	return 0;
>  }
>
> +static struct page *tdx_alloc_pamt_page_atomic(void *data)
> +{
> +	struct kvm_vcpu *vcpu = data;
> +	void *p;
> +
> +	p = kvm_mmu_memory_cache_alloc(&vcpu->arch.pamt_page_cache);
> +	return virt_to_page(p);
> +}
> +
>  int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>  			      enum pg_level level, kvm_pfn_t pfn)
>  {
> +	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
>  	struct page *page = pfn_to_page(pfn);
> +	int ret;
> +
> +	ret = tdx_pamt_get(page, level, tdx_alloc_pamt_page_atomic, vcpu);
> +	if (ret)
> +		return ret;

tdx_pamt_get() can return a non-zero value on success, e.g. it returns
1 when tdx_pamt_add() loses the race. Shouldn't we check for (ret < 0)
here and in the cases below?
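
Something like this (untested, just to illustrate the suggestion):

	ret = tdx_pamt_get(page, level, tdx_alloc_pamt_page_atomic, vcpu);
	if (ret < 0)
		return ret;
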
>
>  	/* TODO: handle large pages. */
>  	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> @@ -1562,10 +1577,16 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>  	 * barrier in tdx_td_finalize().
>  	 */
>  	smp_rmb();
> -	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
> -		return tdx_mem_page_aug(kvm, gfn, level, page);
>
> -	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
> +	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
> +		ret = tdx_mem_page_aug(kvm, gfn, level, page);
> +	else
> +		ret = tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
> +
> +	if (ret)
> +		tdx_pamt_put(page, level);
> +
> +	return ret;
>  }
>
>  static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> @@ -1622,17 +1643,26 @@ int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
>  			      enum pg_level level, void *private_spt)
>  {
>  	int tdx_level = pg_level_to_tdx_sept_level(level);
> -	gpa_t gpa = gfn_to_gpa(gfn);
> +	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  	struct page *page = virt_to_page(private_spt);
> +	gpa_t gpa = gfn_to_gpa(gfn);
>  	u64 err, entry, level_state;
> +	int ret;
> +
> +	ret = tdx_pamt_get(page, PG_LEVEL_4K, tdx_alloc_pamt_page_atomic, vcpu);
> +	if (ret)
> +		return ret;
>
>  	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, tdx_level, page, &entry,
>  			       &level_state);
> -	if (unlikely(tdx_operand_busy(err)))
> +	if (unlikely(tdx_operand_busy(err))) {
> +		tdx_pamt_put(page, PG_LEVEL_4K);
>  		return -EBUSY;
> +	}
>
>  	if (KVM_BUG_ON(err, kvm)) {
>  		pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state);
> +		tdx_pamt_put(page, PG_LEVEL_4K);
>  		return -EIO;
>  	}
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 4f9eaba4af4a..d4b50b6428fa 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -2067,10 +2067,16 @@ static void tdx_free_pamt_pages(struct list_head *pamt_pages)
>  	}
>  }
>
> -static int tdx_alloc_pamt_pages(struct list_head *pamt_pages)
> +static int tdx_alloc_pamt_pages(struct list_head *pamt_pages,
> +				struct page *(alloc)(void *data), void *data)
>  {
>  	for (int i = 0; i < tdx_nr_pamt_pages(); i++) {
> -		struct page *page = alloc_page(GFP_KERNEL);
> +		struct page *page;
> +
> +		if (alloc)
> +			page = alloc(data);
> +		else
> +			page = alloc_page(GFP_KERNEL);
>  		if (!page)
>  			goto fail;
>  		list_add(&page->lru, pamt_pages);
> @@ -2115,7 +2121,8 @@ static int tdx_pamt_add(atomic_t *pamt_refcount, unsigned long hpa,
>  	return 0;
>  }
>
> -static int tdx_pamt_get(struct page *page, enum pg_level level)
> +int tdx_pamt_get(struct page *page, enum pg_level level,
> +		 struct page *(alloc)(void *data), void *data)
>  {
>  	unsigned long hpa = page_to_phys(page);
>  	atomic_t *pamt_refcount;
> @@ -2134,7 +2141,7 @@ static int tdx_pamt_get(struct page *page, enum pg_level level)
>  	if (atomic_inc_not_zero(pamt_refcount))
>  		return 0;
>
> -	if (tdx_alloc_pamt_pages(&pamt_pages))
> +	if (tdx_alloc_pamt_pages(&pamt_pages, alloc, data))
>  		return -ENOMEM;
>
>  	ret = tdx_pamt_add(pamt_refcount, hpa, &pamt_pages);
> @@ -2143,8 +2150,9 @@ static int tdx_pamt_get(struct page *page, enum pg_level level)
>
>  	return ret >= 0 ? 0 : ret;
>  }
> +EXPORT_SYMBOL_GPL(tdx_pamt_get);
>
> -static void tdx_pamt_put(struct page *page, enum pg_level level)
> +void tdx_pamt_put(struct page *page, enum pg_level level)
>  {
>  	unsigned long hpa = page_to_phys(page);
>  	atomic_t *pamt_refcount;
> @@ -2179,6 +2187,7 @@ static void tdx_pamt_put(struct page *page, enum pg_level level)
>
>  	tdx_free_pamt_pages(&pamt_pages);
>  }
> +EXPORT_SYMBOL_GPL(tdx_pamt_put);
>
>  struct page *tdx_alloc_page(void)
>  {
> @@ -2188,7 +2197,7 @@ struct page *tdx_alloc_page(void)
>  	if (!page)
>  		return NULL;
>
> -	if (tdx_pamt_get(page, PG_LEVEL_4K)) {
> +	if (tdx_pamt_get(page, PG_LEVEL_4K, NULL, NULL)) {
>  		__free_page(page);
>  		return NULL;
>  	}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index eec82775c5bf..6add012532a0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -436,6 +436,7 @@ void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
>  	BUG_ON(!p);
>  	return p;
>  }
> +EXPORT_SYMBOL_GPL(kvm_mmu_memory_cache_alloc);
>  #endif
>
>  static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> --
> 2.47.2
>
>