Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Sean Christopherson <seanjc@google.com>, Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org, xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	vannapurve@google.com, ackerleytng@google.com,
	mail@maciej.szmigiero.name, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com,
	quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
	quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
	quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, catalin.marinas@arm.com,
	james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
	rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
	hughd@google.com, jthoughton@google.com, peterx@redhat.com,
	pankaj.gupta@amd.com, ira.weiny@intel.com
Subject: Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
Date: Wed, 18 Jun 2025 11:25:52 +0200	[thread overview]
Message-ID: <9f252a62-9232-4048-ac10-0c6cdf2154fe@redhat.com> (raw)
In-Reply-To: <aEySD5XoxKbkcuEZ@google.com>


>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index d00b85cb168c..cb19150fd595 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
>>   #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>>   
>>   #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
>> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1ULL << 0)
> 

Coming back to this part of the initial mail:

> I find the SUPPORT_SHARED terminology to be super confusing.  I had to dig quite
> deep to undesrtand that "support shared" actually mean "userspace explicitly
> enable sharing on _this_ guest_memfd instance".  E.g. I was surprised to see
> 
> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate.  But even that is
> weird to me.  For non-CoCo VMs, there is no concept of shared vs. private.  What's
> novel and notable is that the memory is _mappable_.  Yeah, yeah, pKVM's use case
> is to share memory, but that's a _use case_, not the property of guest_memfd that
> is being controlled by userspace.

Looking back, it would all have made more sense if one would have to 
explicitly request support for "private" memory, and the non-private 
(ordinary?) would have been the default ... :)


> 
> And kvm_gmem_memslot_supports_shared() is even worse.  It's simply that the
> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
> instance is the _only_ entry point to the memslot.
> 
> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> KVM_MEMSLOT_GUEST_MEMFD_ONLY.  That will make code like this:

As raised, GUEST_MEMFD_FLAG_MMAPPABLE / GUEST_MEMFD_FLAG_MMAP or sth. 
like that that spells out "mmap" would be better IMHO. Better than 
talking about faultability or mappability. (fault into what? map into 
what? mmap() cleanly implies user space)

That means that we have "mmap() support implies support for non-private 
memory", I can live with that, although some code might end up being a 
bit confusing (e.g., that's why you proposed kvm_is_memslot_gmem_only() 
below).

> 
> 	if (kvm_slot_has_gmem(slot) &&
> 	    (kvm_gmem_memslot_supports_shared(slot) ||
> 	     kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> 		return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> 	}
> 
> much more intutive:
> 
> 	if (kvm_is_memslot_gmem_only(slot) ||
> 	    kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> 		return kvm_gmem_max_mapping_level(slot, gfn, max_level);

Hiding the details in such a helper makes it easier to digest.

> 
> And then have kvm_gmem_mapping_order() do:
> 
> 	WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> 	return 0;
> 
>>   struct kvm_create_guest_memfd {
>>   	__u64 size;
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 559c93ad90be..e90884f74404 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>>   config HAVE_KVM_ARCH_GMEM_INVALIDATE
>>          bool
>>          depends on KVM_GMEM
>> +
>> +config KVM_GMEM_SHARED_MEM
>> +       select KVM_GMEM
>> +       bool
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 6db515833f61..06616b6b493b 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>>   	return gfn - slot->base_gfn + slot->gmem.pgoff;
>>   }
>>   
>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>> +{
>> +	const u64 flags = (u64)inode->i_private;
>> +
>> +	if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
>> +		return false;
>> +
>> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>> +}
>> +
>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> 
> And to my point about "shared", this is also very confusing, because there are
> zero checks in here about shared vs. private.

I assume it's simply a "kvm_gmem_fault" right now.

> 
>> +{
>> +	struct inode *inode = file_inode(vmf->vma->vm_file);
>> +	struct folio *folio;
>> +	vm_fault_t ret = VM_FAULT_LOCKED;
>> +
>> +	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>> +		return VM_FAULT_SIGBUS;
>> +
>> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>> +	if (IS_ERR(folio)) {
>> +		int err = PTR_ERR(folio);
>> +
>> +		if (err == -EAGAIN)
>> +			return VM_FAULT_RETRY;
>> +
>> +		return vmf_error(err);
>> +	}
>> +
>> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
>> +		ret = VM_FAULT_SIGBUS;
>> +		goto out_folio;
>> +	}
>> +
>> +	if (!folio_test_uptodate(folio)) {
>> +		clear_highpage(folio_page(folio, 0));
>> +		kvm_gmem_mark_prepared(folio);
>> +	}
>> +
>> +	vmf->page = folio_file_page(folio, vmf->pgoff);
>> +
>> +out_folio:
>> +	if (ret != VM_FAULT_LOCKED) {
>> +		folio_unlock(folio);
>> +		folio_put(folio);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
>> +	.fault = kvm_gmem_fault_shared,
>> +};
>> +
>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	if (!kvm_gmem_supports_shared(file_inode(file)))
>> +		return -ENODEV;
>> +
>> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>> +	    (VM_SHARED | VM_MAYSHARE)) {
> 
> And the SHARED terminology gets really confusing here, due to colliding with the
> existing notion of SHARED file mappings.
> 

kvm_gmem_supports_mmap() would be as clear as it gets for this case here.

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2025-06-18  9:26 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-06-13 13:57   ` Ackerley Tng
2025-06-13 20:35   ` Sean Christopherson
2025-06-16  7:13     ` Fuad Tabba
2025-06-16 14:20       ` David Hildenbrand
2025-06-24 20:51     ` Ackerley Tng
2025-06-25  6:33       ` Roy, Patrick
2025-06-11 13:33 ` [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-12 16:16   ` Shivank Garg
2025-06-13 21:03   ` Sean Christopherson
2025-06-13 21:18     ` David Hildenbrand
2025-06-13 22:48     ` Sean Christopherson
2025-06-16  6:52     ` Fuad Tabba
2025-06-16 14:16       ` David Hildenbrand
2025-06-17 23:04       ` Sean Christopherson
2025-06-18 11:18         ` Fuad Tabba
2025-06-16 13:44     ` Ira Weiny
2025-06-16 14:03       ` David Hildenbrand
2025-06-16 14:16         ` Fuad Tabba
2025-06-16 14:25           ` David Hildenbrand
2025-06-18  0:40             ` Sean Christopherson
2025-06-18  8:15               ` David Hildenbrand
2025-06-18  9:20                 ` Xiaoyao Li
2025-06-18  9:27                   ` David Hildenbrand
2025-06-18  9:44                     ` Xiaoyao Li
2025-06-18  9:59                       ` David Hildenbrand
2025-06-18 10:42                         ` Xiaoyao Li
2025-06-18 11:14                           ` David Hildenbrand
2025-06-18 12:17                             ` Xiaoyao Li
2025-06-18 13:16                               ` David Hildenbrand
2025-06-19  1:48                 ` Sean Christopherson
2025-06-19  1:50                   ` Sean Christopherson
2025-06-18  9:25     ` David Hildenbrand [this message]
2025-06-25 21:47   ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-06-13 22:08   ` Sean Christopherson
2025-06-24 23:40     ` Ackerley Tng
2025-06-27 15:01       ` Ackerley Tng
2025-06-30  8:07         ` Fuad Tabba
2025-06-30 14:44           ` Ackerley Tng
2025-06-30 15:08             ` Fuad Tabba
2025-06-30 19:26               ` Shivank Garg
2025-06-30 20:03                 ` David Hildenbrand
2025-07-01 14:15                   ` Ackerley Tng
2025-07-01 14:44                     ` David Hildenbrand
2025-07-08  0:05                       ` Sean Christopherson
2025-07-08 13:44                         ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-06-12 17:33   ` James Houghton
2025-06-11 13:33 ` [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-06-12 16:24   ` Shivank Garg
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-06-12 16:23   ` Shivank Garg
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
2025-06-24 10:02   ` Fuad Tabba
2025-06-24 10:16     ` David Hildenbrand
2025-06-24 10:25       ` Fuad Tabba
2025-06-24 11:44         ` David Hildenbrand
2025-06-24 11:58           ` Fuad Tabba
2025-06-24 17:50             ` Sean Christopherson
2025-06-25  8:00               ` Fuad Tabba
2025-06-25 14:07                 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9f252a62-9232-4048-ac10-0c6cdf2154fe@redhat.com \
    --to=david@redhat.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=dmatlack@google.com \
    --cc=fvdl@google.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=ira.weiny@intel.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=pankaj.gupta@amd.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_eberman@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).