Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Fuad Tabba <tabba@google.com>,
	kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	 linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org,
	 akpm@linux-foundation.org, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com,  jarkko@kernel.org,
	amoorthy@google.com, dmatlack@google.com,
	 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	 vannapurve@google.com, ackerleytng@google.com,
	mail@maciej.szmigiero.name,  david@redhat.com,
	michael.roth@amd.com, wei.w.wang@intel.com,
	 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com,  quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com,
	 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	jgg@nvidia.com,  rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com, hughd@google.com,  jthoughton@google.com,
	peterx@redhat.com, pankaj.gupta@amd.com,  ira.weiny@intel.com
Subject: Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
Date: Tue, 22 Jul 2025 08:54:25 -0700	[thread overview]
Message-ID: <aH-0MdNJbH19Mhm3@google.com> (raw)
In-Reply-To: <13654746-3edc-4e4a-ac4f-fa281b83b2ae@intel.com>

On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> On 7/22/2025 10:37 PM, Sean Christopherson wrote:
> > On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> > > On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
> > > > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > > > +/*
> > > > > + * CoCo VMs with hardware support that use guest_memfd only for
> > > > > backing private
> > > > > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping
> > > > > enabled.
> > > > > + */
> > > > > +#define kvm_arch_supports_gmem_mmap(kvm)        \
> > > > > +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
> > > > > +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> > > > 
> > > > I want to share the findings when I do the POC to enable gmem mmap in QEMU.
> > > > 
> > > > Actually, QEMU can use gmem with mmap support as the normal memory even
> > > > without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
> > > > on KVM_SET_USER_MEMORY_REGION2.
> > > > 
> > > > Since the gmem is mmapable, QEMU can pass the userspace addr got from
> > > > mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
> > > > works well for non-coco VMs on x86.
> > > 
> > > one more findings.
> > > 
> > > I tested with QEMU by creating normal (non-private) memory with mmapable
> > > guest memfd, and enforcily passing the fd of the gmem to struct
> > > kvm_userspace_memory_region2 when QEMU sets up memory region.
> > > 
> > > It hits the kvm_gmem_bind() error since QEMU tries to back different GPA
> > > region with the same gmem.
> > > 
> > > So, the question is do we want to allow the multi-binding for shared-only
> > > gmem?
> > 
> > Can you elaborate, maybe with code?  I don't think I fully understand the setup.
> 
> well, I haven't fully sorted it out. Just share what I get so far.
> 
> the problem hit when SMM is enabled (which is enabled by default).
> 
> - The trace of "-machine q35,smm=off":
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f57b3fff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f5840a00000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000
> ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x2 gpa=0xc0000 size=0x20000
> ua=0x7f5841000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000 size=0x20000
> ua=0x7f5840de0000 guest_memfd=-1 guest_memfd_offset=0x3e0000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#5 flags=0x4 gpa=0x100000
> size=0x7ff00000 ua=0x7f57340ff000 guest_memfd=15 guest_memfd_offset=0x100000
> ret=0
> 
> - The trace of "-machine q35"
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f8faffff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f902ffff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f90bd000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x4 gpa=0xfeda0000 size=0x20000
> ua=0x7f8fb009f000 guest_memfd=15 guest_memfd_offset=0xa0000 ret=-22
> qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION2
> failed, slot=3, start=0xfeda0000, size=0x20000, flags=0x4, guest_memfd=15,
> guest_memfd_offset=0xa0000: Invalid argument
> kvm_set_phys_mem: error registering slot: Invalid argument
> 
> 
> where QEMU tries to setup the memory region for [0xfeda0000, +0x20000],
> which is back'ed by gmem (fd is 15) allocated for normal RAM, from offset
> 0xa0000.
> 
> What I have tracked down in QEMU is mch_realize(), where it sets up some
> memory region starting from 0xfeda0000.

Oh yay, SMM.  The problem lies in memory regions that are aliased into low memory
(IIRC, there's at least one other such scenario, but don't quote me on that).
For SMRAM, when the "high" SMRAM location (0xfeda0000) is enabled, the "legacy"
SMRAM location (0xa0000) gets remapped (aliased in QEMU's vernacular) to the
high location, resulting in two CPU physical addresses pointing at the same
underyling memory[*].  From KVM's perspective, that means two GPA ranges pointing
at the same HVA.

As for whether or not we want to support such madness...  I'd definitely say "not
now", and probably not ever.  Emulating SMM puts the VMM *firmly* in the TCB of
the guest, and so guest_memfd benefits like not having to map guest memory into
userspace pretty much go out the window.  For such a use case, I don't think it's
unreasonable to require QEMU (or any other VMM) to map the aliases via HVA only,
i.e. to not take full advantage of guest_memfd.

[*] https://opensecuritytraining.info/IntroBIOS_files/Day1_08_Advanced%20x86%20-%20BIOS%20and%20SMM%20Internals%20-%20SMRAM.pdf

next prev parent reply	other threads:[~2025-07-22 15:54 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-17 16:27 [PATCH v15 00/21] KVM: Enable host userspace mapping for guest_memfd-backed memory for non-CoCo VMs Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 01/21] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-07-21 15:17   ` Sean Christopherson
2025-07-21 15:26     ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 02/21] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-07-21 16:44   ` Sean Christopherson
2025-07-21 16:51     ` Fuad Tabba
2025-07-21 17:33       ` Sean Christopherson
2025-07-22  9:29         ` Fuad Tabba
2025-07-22 15:58           ` Sean Christopherson
2025-07-22 16:01             ` Fuad Tabba
2025-07-22 23:42               ` Sean Christopherson
2025-07-23  9:22                 ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 03/21] KVM: Introduce kvm_arch_supports_gmem() Fuad Tabba
2025-07-18  1:42   ` Xiaoyao Li
2025-07-21 14:47     ` Sean Christopherson
2025-07-21 14:55     ` Fuad Tabba
2025-07-21 16:44   ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 04/21] KVM: x86: Introduce kvm->arch.supports_gmem Fuad Tabba
2025-07-21 16:45   ` Sean Christopherson
2025-07-21 17:00     ` Fuad Tabba
2025-07-21 19:09       ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 05/21] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 06/21] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 07/21] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 08/21] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-07-18  2:56   ` Xiaoyao Li
2025-07-17 16:27 ` [PATCH v15 09/21] KVM: guest_memfd: Track guest_memfd mmap support in memslot Fuad Tabba
2025-07-18  3:33   ` Xiaoyao Li
2025-07-17 16:27 ` [PATCH v15 10/21] KVM: x86/mmu: Generalize private_max_mapping_level x86 op to max_mapping_level Fuad Tabba
2025-07-18  6:19   ` Xiaoyao Li
2025-07-21 19:46   ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 11/21] KVM: x86/mmu: Allow NULL-able fault in kvm_max_private_mapping_level Fuad Tabba
2025-07-18  5:10   ` Xiaoyao Li
2025-07-21 23:17     ` Sean Christopherson
2025-07-22  5:35       ` Xiaoyao Li
2025-07-22 11:08         ` Fuad Tabba
2025-07-22 14:32           ` Sean Christopherson
2025-07-22 15:30             ` Fuad Tabba
2025-07-22 10:35       ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 12/21] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-07-18  5:32   ` Xiaoyao Li
2025-07-18  5:57     ` Xiaoyao Li
2025-07-17 16:27 ` [PATCH v15 13/21] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-07-18  6:09   ` Xiaoyao Li
2025-07-21 16:47   ` Sean Christopherson
2025-07-21 16:56     ` Fuad Tabba
2025-07-22  5:41     ` Xiaoyao Li
2025-07-22  8:43       ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type Fuad Tabba
2025-07-18  6:10   ` Xiaoyao Li
2025-07-21 12:22   ` Xiaoyao Li
2025-07-21 12:41     ` Fuad Tabba
2025-07-21 13:45     ` Vishal Annapurve
2025-07-21 14:42       ` Xiaoyao Li
2025-07-21 14:42       ` Sean Christopherson
2025-07-21 15:07         ` Xiaoyao Li
2025-07-21 17:29           ` Sean Christopherson
2025-07-21 20:33             ` Vishal Annapurve
2025-07-21 22:21               ` Sean Christopherson
2025-07-21 23:50                 ` Vishal Annapurve
2025-07-22 14:35                   ` Sean Christopherson
2025-07-23 14:08                     ` Vishal Annapurve
2025-07-23 14:43                       ` Sean Christopherson
2025-07-23 14:46                         ` David Hildenbrand
2025-07-22 14:28     ` Xiaoyao Li
2025-07-22 14:37       ` Sean Christopherson
2025-07-22 15:31         ` Xiaoyao Li
2025-07-22 15:50           ` David Hildenbrand
2025-07-22 15:54           ` Sean Christopherson [this message]
2025-07-17 16:27 ` [PATCH v15 15/21] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 16/21] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-07-22 12:31   ` Kunwu Chan
2025-07-23  8:20     ` Marc Zyngier
2025-07-23 11:44       ` Kunwu Chan
2025-07-23  8:26   ` Marc Zyngier
2025-07-17 16:27 ` [PATCH v15 17/21] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Fuad Tabba
2025-07-23  8:29   ` Marc Zyngier
2025-07-17 16:27 ` [PATCH v15 18/21] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-07-23  8:33   ` Marc Zyngier
2025-07-23  9:18     ` Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 19/21] KVM: Introduce the KVM capability KVM_CAP_GMEM_MMAP Fuad Tabba
2025-07-18  6:14   ` Xiaoyao Li
2025-07-21 17:31   ` Sean Christopherson
2025-07-17 16:27 ` [PATCH v15 20/21] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-07-17 16:27 ` [PATCH v15 21/21] KVM: selftests: guest_memfd mmap() test when mmap is supported Fuad Tabba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aH-0MdNJbH19Mhm3@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=fvdl@google.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=ira.weiny@intel.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=pankaj.gupta@amd.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_eberman@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=shuah@kernel.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).