From: Vishal Annapurve <vannapurve@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Ira Weiny <ira.weiny@intel.com>, Yan Zhao <yan.y.zhao@intel.com>,
Michael Roth <michael.roth@amd.com>,
pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
kai.huang@intel.com, adrian.hunter@intel.com,
reinette.chatre@intel.com, xiaoyao.li@intel.com,
tony.lindgren@intel.com, binbin.wu@linux.intel.com,
dmatlack@google.com, isaku.yamahata@intel.com, david@redhat.com,
ackerleytng@google.com, tabba@google.com, chao.p.peng@intel.com
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()
Date: Tue, 5 Aug 2025 07:30:17 -0700
Message-ID: <CAGtprH-VcO4VJkgRZJPS8SEpdO6Xsjmw3CeGYCFchjkdROoMLg@mail.gmail.com>
In-Reply-To: <CAGtprH9ELoYmwA+brSx-kWH5qSK==u8huW=4otEZ5evu_GTvtQ@mail.gmail.com>
On Mon, Aug 4, 2025 at 6:20 PM Vishal Annapurve <vannapurve@google.com> wrote:
>
> On Mon, Aug 4, 2025 at 5:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > > IIUC, the suggestion in the link is to abandon kvm_gmem_populate().
> > > > > > For TDX, it means adopting the approach in this RFC patch, right?
> > > > > Yes, IMO this RFC is following the right approach as posted.
> >
> > I don't think we want to abandon kvm_gmem_populate(). Unless I'm missing something,
> > SNP has the same AB-BA problem as TDX. The copy_from_user() on @src can trigger
> > a page fault, and resolving the page fault may require taking mm->mmap_lock.
> >
> > Fundamentally, TDX and SNP are doing the same thing: copying from source to guest
> > memory. The only differences are in the mechanics of the copy+encrypt, everything
> > else is the same. I.e. I don't expect that we'll find a magic solution that works
> > well for one and not the other.
> >
> > I also don't want to end up with wildly different ABI for SNP vs. everything else.
> > E.g. cond_resched() needs to be called if the to-be-initialized range is large,
> > which means dropping mmu_lock between pages, whereas kvm_gmem_populate() can
> > yield without dropping invalidate_lock, which means that the behavior of populating
> > guest_memfd memory will be quite different with respect to guest_memfd operations.
>
> I would think that TDX/CCA VMs [1] will run into similar behavior
> of needing to simulate stage2 faults, i.e. KVM will end up taking
> and dropping mmu_lock for each page anyway, at least for these two
> platforms.
>
> [1] https://lore.kernel.org/kvm/20250611104844.245235-5-steven.price@arm.com/
> (rmi_rtt_create())
>
> >
> > Pulling in the RFC text:
> >
> > : I think the only different scenario is SNP, where the host must write
> > : initial contents to guest memory.
> > :
> > : Will this work for all cases CCA/SNP/TDX during initial memory
> > : population from within KVM:
> > : 1) Simulate stage2 fault
> > : 2) Take a KVM mmu read lock
> >
> > Doing all of this under mmu_lock is pretty much a non-starter.

Looking closer at the CPU <-> PSP communication, which is not implemented
to work within an atomic context, I now agree that this wouldn't work
for SNP VMs.

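
To make the constraint concrete, here is a minimal userspace C model, not
kernel code. It assumes a hypothetical psp_issue_command() that may sleep,
and it stands in for mmu_lock with a plain depth counter. Every name below
is illustrative; none of them come from KVM or the PSP driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for "atomic context", e.g. mmu_lock held. */
static int atomic_depth;

static void model_mmu_read_lock(void)
{
	atomic_depth++;
}

static void model_mmu_read_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/*
 * Hypothetical PSP command. It may sleep, so the model refuses to run
 * it in atomic context (a real kernel would warn via might_sleep()).
 */
static int psp_issue_command(void)
{
	if (atomic_depth > 0)
		return -EBUSY;
	return 0;
}

/* Encryption step issued while the lock is held: rejected by the model. */
static int populate_under_lock(void)
{
	int ret;

	model_mmu_read_lock();
	ret = psp_issue_command();
	model_mmu_read_unlock();
	return ret;
}

/* Same command issued with no spinning lock held: allowed. */
static int populate_outside_lock(void)
{
	return psp_issue_command();
}
```

In this model populate_under_lock() fails and populate_outside_lock()
succeeds, which is the shape of the problem for SNP: the firmware call has
to happen outside mmu_lock.
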
> >
> > : 3) Check that the needed gpa is mapped in EPT/NPT entries
> >
> > No, KVM's page tables are not the source of truth. S-EPT is a special snowflake,
> > and I'd like to avoid foisting the same requirements on NPT.
>
> I agree this would be a new requirement.
>
> >
> > : 4) For SNP, if src != null, make the target pfn to be shared, copy
> > : contents and then make the target pfn back to private.
> >
> > Copying from userspace under spinlock (rwlock) is illegal, as accessing userspace
> > memory might_fault() and thus might_sleep().
>
> I would think that a combination of get_user_pages() and
> kmap_local_pfn() will prevent this situation of might_fault().
>
> >
> > : 5) For TDX, if src != null, pass the same address for source and
> > : target (likely this works for CCA too)
> > : 6) Invoke appropriate memory encryption operations
> > : 7) measure contents
> > : 8) release the KVM mmu read lock
> > :
> > : If this scheme works, ideally we should also not call RMP table
> > : population logic from guest_memfd, but from KVM NPT fault handling
> > : logic directly (a bit of cosmetic change).
> >
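
As a rough illustration of the get_user_pages() plus kmap_local_pfn() idea
quoted above, the following userspace C model separates the step that may
fault from the step that runs under the lock. pin_source(),
copy_under_lock() and populate_one_page() are hypothetical names, the flag
merely stands in for mmu_lock, and the malloc()+memcpy() "pin" is only an
approximation of pinning a user page. Nothing here reflects actual KVM code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for mmu_lock being held. */
static int model_lock_held;

/*
 * Phase 1 (may sleep): resolve the source before taking the lock.
 * In the kernel this is where get_user_pages() would fault in and pin
 * the page; the model just takes a private copy of it.
 */
static void *pin_source(const void *user_src)
{
	void *pinned;

	assert(!model_lock_held);	/* faulting is only legal here */
	pinned = malloc(MODEL_PAGE_SIZE);
	if (pinned)
		memcpy(pinned, user_src, MODEL_PAGE_SIZE);
	return pinned;
}

/*
 * Phase 2 (must not sleep): copy from the already resident page into
 * the target while the lock is held.
 */
static void copy_under_lock(void *dst, const void *pinned)
{
	model_lock_held = 1;
	memcpy(dst, pinned, MODEL_PAGE_SIZE);
	model_lock_held = 0;
}

static int populate_one_page(void *dst, const void *user_src)
{
	void *pinned = pin_source(user_src);

	if (!pinned)
		return -1;
	copy_under_lock(dst, pinned);
	free(pinned);			/* models unpin_user_page() */
	return 0;
}
```

The point of the split is that any page fault on the source is taken in
phase 1, so phase 2 only touches memory that is already resident.
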