linux-coco.lists.linux.dev archive mirror
From: "Huang, Kai" <kai.huang@intel.com>
To: "kirill.shutemov@linux.intel.com" <kirill.shutemov@linux.intel.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"x86@kernel.org" <x86@kernel.org>, "bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>
Subject: Re: [RFC, PATCH 08/12] KVM: x86/tdp_mmu: Add phys_prepare() and phys_cleanup() to kvm_x86_ops
Date: Thu, 5 Jun 2025 22:21:46 +0000
Message-ID: <46e0a089ea78613be5f0287eeca449231731f824.camel@intel.com>
In-Reply-To: <wwftow6boiueqbzrbfpedxs3e3ioelx3aqmsblzal6kxqdt3d5@dljyaozrfiry>

On Thu, 2025-06-05 at 16:01 +0300, kirill.shutemov@linux.intel.com wrote:
> On Fri, May 23, 2025 at 03:00:56PM +0300, kirill.shutemov@linux.intel.com wrote:
> > On Wed, May 14, 2025 at 12:00:17AM +0000, Huang, Kai wrote:
> > > On Mon, 2025-05-12 at 12:55 +0300, Kirill A. Shutemov wrote:
> > > > On Fri, May 09, 2025 at 09:25:58AM +0800, Yan Zhao wrote:
> > > > > On Thu, May 08, 2025 at 04:23:56PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Tue, May 06, 2025 at 07:55:17PM +0800, Yan Zhao wrote:
> > > > > > > On Fri, May 02, 2025 at 04:08:24PM +0300, Kirill A. Shutemov wrote:
> > > > > > > > The functions kvm_x86_ops::link_external_spt() and
> > > > > > > > kvm_x86_ops::set_external_spte() are used to assign new memory to a VM.
> > > > > > > > When using TDX with Dynamic PAMT enabled, the assigned memory must be
> > > > > > > > covered by PAMT.
> > > > > > > > 
> > > > > > > > The new function kvm_x86_ops::phys_prepare() is called before
> > > > > > > > link_external_spt() and set_external_spte() to ensure that the memory is
> > > > > > > > ready to be assigned to the virtual machine. In the case of TDX, it
> > > > > > > > makes sure that the memory is covered by PAMT.
> > > > > > > > 
> > > > > > > > kvm_x86_ops::phys_prepare() is called in a context where struct kvm_vcpu
> > > > > > > > is available, allowing the implementation to allocate memory from a
> > > > > > > > per-VCPU pool.
> > > > > > > > 
> > > > > > > Why not invoke phys_prepare() and phys_cleanup() in set_external_spte_present()?
> > > > > > > Or in tdx_sept_set_private_spte()/tdx_sept_link_private_spt()?
> > > > > > 
> > > > > > Because the memory pool we allocated from is per-vcpu and we lost access
> > > > > > to vcpu by then. And not all callers provide vcpu.
> > > > > Maybe we can get vcpu via kvm_get_running_vcpu(), as in [1].
> > > > > Then for callers not providing vcpu (where vcpu is NULL), we can use per-KVM
> > > > > cache? 
> > > > 
> > > > Hm. I was not aware of kvm_get_running_vcpu(). Will play with it, thanks.
> > > 
> > > I am not sure why per-vcpu cache matters.
> > > 
> > > For non-leaf SEPT pages, AFAICT the "vcpu->arch.mmu_external_spt_cache" is just
> > > an empty cache, and eventually __get_free_page() is used to allocate in:
> > >
> > >   sp->external_spt = 
> > > 	kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
> > > 
> > > So why don't we actually create a kmem_cache for it with an actual 'ctor', and
> > > call tdx_alloc_page() in that?  This makes sure that when the "external_spt" is
> > > allocated, the underlying PAMT entry is there.
> > 
> > I looked closer at this, and while it is a good idea, a ctor in kmem_cache
> > cannot fail, which makes this approach not viable.
> > 
> > I guess we can add a constructor directly into struct kvm_mmu_memory_cache.
> > Let me play with this.
> 
> I failed to make it work.
> 
> We need to have a destructor paired with the constructor that would do
> PAMT-aware freeing, and redirect all free paths to it. It requires
> substantial rework. I don't think it is worth the effort.
> 
> Will do manual PAMT management for SPT in TDX code.

Thanks for the effort.

Maybe something below?

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index db8f33e4de62..48732270bff0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -164,8 +164,10 @@ static inline bool is_mirror_sp(const struct kvm_mmu_page *sp)
        return sp->role.is_mirror;
 }
 
-static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static inline int kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
+       int r;
+
        /*
         * external_spt is allocated for TDX module to hold private EPT mappings,
         * TDX module will initialize the page by itself.
@@ -173,6 +175,14 @@ static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_
         * KVM only interacts with sp->spt for private EPT operations.
         */
        sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
+
+       r = tdx_pamt_get(virt_to_page(sp->external_spt));
+       if (r) {
+               free_page((unsigned long)sp->external_spt);
+               sp->external_spt = NULL;
+       }
+
+       return r;
 }
 
 static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct kvm_mmu_page *root)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7f3d7229b2c1..2d3a716d9195 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -55,7 +55,10 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 
 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
-       free_page((unsigned long)sp->external_spt);
+       if (sp->external_spt) {
+               tdx_pamt_put(virt_to_page(sp->external_spt));
+               free_page((unsigned long)sp->external_spt);
+       }
        free_page((unsigned long)sp->spt);
        kmem_cache_free(mmu_page_header_cache, sp);
 }
@@ -1277,8 +1280,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
                 */
                sp = tdp_mmu_alloc_sp(vcpu);
                tdp_mmu_init_child_sp(sp, &iter);
-               if (is_mirror_sp(sp))
-                       kvm_mmu_alloc_external_spt(vcpu, sp);
+               if (is_mirror_sp(sp)) {
+                       r = kvm_mmu_alloc_external_spt(vcpu, sp);
+                       if (r) {
+                               tdp_mmu_free_sp(sp);
+                               goto retry;
+                       }
+               }
 
                sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
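
The part that needs care is the error path: when tdx_pamt_get() fails,
the page is freed in kvm_mmu_alloc_external_spt(), and the caller then
runs tdp_mmu_free_sp(), which must not free the page or drop a PAMT
reference a second time. Below is a rough userspace model of that
pairing, only to illustrate the invariant. It is not kernel code:
pamt_get(), pamt_put(), struct sp_model, alloc_external_spt() and
free_sp() are hypothetical stand-ins, malloc()/free() stand in for the
page allocator, and pamt_fail_next is just a knob to force a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model state, not part of KVM or the TDX code. */
static int pamt_refs;      /* outstanding PAMT references */
static int pamt_fail_next; /* knob: make the next pamt_get() fail */

/* Stand-in for tdx_pamt_get(): takes a reference or fails. */
static int pamt_get(void *page)
{
	(void)page;
	if (pamt_fail_next) {
		pamt_fail_next = 0;
		return -ENOMEM;
	}
	pamt_refs++;
	return 0;
}

/* Stand-in for tdx_pamt_put(): drops a reference taken earlier. */
static void pamt_put(void *page)
{
	(void)page;
	assert(pamt_refs > 0);
	pamt_refs--;
}

/* Minimal stand-in for struct kvm_mmu_page. */
struct sp_model {
	void *external_spt;
};

/* Models kvm_mmu_alloc_external_spt() with the PAMT reference added. */
static int alloc_external_spt(struct sp_model *sp)
{
	int r;

	sp->external_spt = malloc(4096);
	if (!sp->external_spt)
		return -ENOMEM;

	r = pamt_get(sp->external_spt);
	if (r) {
		free(sp->external_spt);
		/* Clear the pointer so free_sp() skips this page. */
		sp->external_spt = NULL;
	}
	return r;
}

/* Models the external_spt part of tdp_mmu_free_sp(). */
static void free_sp(struct sp_model *sp)
{
	if (sp->external_spt) {
		pamt_put(sp->external_spt);
		free(sp->external_spt);
		sp->external_spt = NULL;
	}
}
```

With the pointer cleared on failure, calling free_sp() after a failed
alloc_external_spt() is a no-op, and every successful allocation is
matched by exactly one put.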

