All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Marc Zyngier <maz@kernel.org>,
	 Oliver Upton <oliver.upton@linux.dev>,
	Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	 Ira Weiny <ira.weiny@intel.com>, Gavin Shan <gshan@redhat.com>,
	Shivank Garg <shivankg@amd.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Xiaoyao Li <xiaoyao.li@intel.com>,
	 David Hildenbrand <david@redhat.com>,
	Fuad Tabba <tabba@google.com>,
	 Ackerley Tng <ackerleytng@google.com>,
	Tao Chan <chentao@kylinos.cn>,
	 James Houghton <jthoughton@google.com>
Subject: [PATCH v17 10/24] KVM: guest_memfd: Add plumbing to host to map guest_memfd pages
Date: Tue, 29 Jul 2025 15:54:41 -0700	[thread overview]
Message-ID: <20250729225455.670324-11-seanjc@google.com> (raw)
In-Reply-To: <20250729225455.670324-1-seanjc@google.com>

From: Fuad Tabba <tabba@google.com>

Introduce the core infrastructure to enable host userspace to mmap()
guest_memfd-backed memory. This is needed for several evolving KVM use
cases:

* Non-CoCo VM backing: Allows VMMs like Firecracker to run guests
  entirely backed by guest_memfd, even for non-CoCo VMs [1]. This
  provides a unified memory management model and simplifies guest memory
  handling.

* Direct map removal for enhanced security: This is an important step
  for direct map removal of guest memory [2]. By allowing host userspace
  to fault in guest_memfd pages directly, we can avoid maintaining host
  kernel direct maps of guest memory. This provides additional hardening
  against Spectre-like transient execution attacks by removing a
  potential attack surface within the kernel.

* Future guest_memfd features: This also lays the groundwork for future
  enhancements to guest_memfd, such as supporting huge pages and
  enabling in-place sharing of guest memory with the host for CoCo
  platforms that permit it [3].

Enable the basic mmap and fault handling logic within guest_memfd, but
hold off on allow userspace to actually do mmap() until the architecture
support is also in place.

[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c       | 11 +++++++
 include/linux/kvm_host.h |  4 +++
 virt/kvm/guest_memfd.c   | 70 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c4..e5cd54ba1eaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13518,6 +13518,16 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+/*
+ * KVM doesn't yet support mmap() on guest_memfd for VMs with private memory
+ * (the private vs. shared tracking needs to be moved into guest_memfd).
+ */
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return !kvm_arch_has_private_mem(kvm);
+}
+
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
 {
@@ -13531,6 +13541,7 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 	kvm_x86_call(gmem_invalidate)(start, end);
 }
 #endif
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4d1c44622056..26bad600f9fa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -726,6 +726,10 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 }
 #endif
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index a99e11b8b77f..67e7cd7210ef 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,72 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static bool kvm_gmem_supports_mmap(struct inode *inode)
+{
+	return false;
+}
+
+static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+		return VM_FAULT_SIGBUS;
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			return VM_FAULT_RETRY;
+
+		return vmf_error(err);
+	}
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault_user_mapping,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!kvm_gmem_supports_mmap(file_inode(file)))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -391,6 +456,11 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr	= kvm_gmem_setattr,
 };
 
+bool __weak kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return true;
+}
+
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
 	const char *anon_name = "[kvm-gmem]";
-- 
2.50.1.552.g942d659e1b-goog


  parent reply	other threads:[~2025-07-29 22:55 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-29 22:54 [PATCH v17 00/24] KVM: Enable mmap() for guest_memfd Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 01/24] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GUEST_MEMFD Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 02/24] KVM: x86: Have all vendor neutral sub-configs depend on KVM_X86, not just KVM Sean Christopherson
2025-07-31  8:08   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 03/24] KVM: x86: Select KVM_GENERIC_PRIVATE_MEM directly from KVM_SW_PROTECTED_VM Sean Christopherson
2025-07-31  8:08   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 04/24] KVM: x86: Select TDX's KVM_GENERIC_xxx dependencies iff CONFIG_KVM_INTEL_TDX=y Sean Christopherson
2025-07-31  8:07   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 05/24] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 06/24] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 07/24] KVM: Fix comments that refer to slots_lock Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 08/24] KVM: Fix comment that refers to kvm uapi header path Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 09/24] KVM: x86: Enable KVM_GUEST_MEMFD for all 64-bit builds Sean Christopherson
2025-07-29 22:54 ` Sean Christopherson [this message]
2025-07-29 22:54 ` [PATCH v17 11/24] KVM: guest_memfd: Track guest_memfd mmap support in memslot Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 12/24] KVM: x86/mmu: Rename .private_max_mapping_level() to .gmem_max_mapping_level() Sean Christopherson
2025-07-31  8:15   ` Fuad Tabba
2025-07-31  8:29     ` David Hildenbrand
2025-07-31  8:33       ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 13/24] KVM: x86/mmu: Hoist guest_memfd max level/order helpers "up" in mmu.c Sean Christopherson
2025-07-31  7:59   ` David Hildenbrand
2025-07-31  8:06   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 14/24] KVM: x86/mmu: Enforce guest_memfd's max order when recovering hugepages Sean Christopherson
2025-07-30  7:33   ` Xiaoyao Li
2025-07-31  8:06     ` David Hildenbrand
2025-07-31  8:10   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 15/24] KVM: x86/mmu: Extend guest_memfd's max mapping level to shared mappings Sean Christopherson
2025-07-30  7:36   ` Xiaoyao Li
2025-07-31  8:01   ` David Hildenbrand
2025-07-31  8:05   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 16/24] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Sean Christopherson
2025-07-30  7:37   ` Xiaoyao Li
2025-07-29 22:54 ` [PATCH v17 17/24] KVM: arm64: Refactor user_mem_abort() Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 18/24] KVM: arm64: Handle guest_memfd-backed guest page faults Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 19/24] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 20/24] KVM: arm64: Enable support for guest_memfd backed memory Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 21/24] KVM: Allow and advertise support for host mmap() on guest_memfd files Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 22/24] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test Sean Christopherson
2025-07-30 11:04   ` Xiaoyao Li
2025-07-29 22:54 ` [PATCH v17 23/24] KVM: selftests: guest_memfd mmap() test when mmap is supported Sean Christopherson
2025-07-30 11:39   ` Xiaoyao Li
2025-07-30 12:57     ` Sean Christopherson
2025-07-31  7:49       ` Xiaoyao Li
2025-08-07  8:12   ` Shivank Garg
2025-07-29 22:54 ` [PATCH v17 24/24] KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory Sean Christopherson
2025-07-30  8:20   ` Xiaoyao Li
2025-07-30 15:51   ` Fuad Tabba
2026-03-30  6:21   ` Zenghui Yu
2026-04-17 16:47     ` Sean Christopherson
2026-05-12  7:28       ` Zenghui Yu
2026-05-12 15:53         ` Sean Christopherson
2025-07-30 21:34 ` [PATCH v17 00/24] KVM: Enable mmap() for guest_memfd Ackerley Tng
2025-07-30 22:44   ` Ackerley Tng
2025-08-27  8:43 ` Paolo Bonzini
2025-08-27 12:57   ` Sean Christopherson
2025-08-27 13:08   ` Marc Zyngier
2025-08-27 13:11     ` Paolo Bonzini
2025-08-27 13:14       ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250729225455.670324-11-seanjc@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=chentao@kylinos.cn \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=shivankg@amd.com \
    --cc=tabba@google.com \
    --cc=vbabka@suse.cz \
    --cc=xiaoyao.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.