From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 29 Jul 2025 15:54:41 -0700
In-Reply-To: <20250729225455.670324-1-seanjc@google.com>
Mime-Version: 1.0
References: <20250729225455.670324-1-seanjc@google.com>
X-Mailer: git-send-email 2.50.1.552.g942d659e1b-goog
Message-ID: <20250729225455.670324-11-seanjc@google.com>
Subject: [PATCH v17 10/24] KVM: guest_memfd: Add plumbing to host to map guest_memfd pages
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, Ira Weiny, Gavin Shan,
 Shivank Garg, Vlastimil Babka, Xiaoyao Li, David Hildenbrand, Fuad Tabba,
 Ackerley Tng, Tao Chan, James Houghton
Content-Type: text/plain; charset="UTF-8"
Reply-To: Sean Christopherson

From: Fuad Tabba

Introduce the core infrastructure to enable host userspace to mmap()
guest_memfd-backed memory. This is needed for several evolving KVM use
cases:

* Non-CoCo VM backing: Allows VMMs like Firecracker to run guests
  entirely backed by guest_memfd, even for non-CoCo VMs [1]. This
  provides a unified memory management model and simplifies guest memory
  handling.

* Direct map removal for enhanced security: This is an important step
  for direct map removal of guest memory [2]. By allowing host userspace
  to fault in guest_memfd pages directly, we can avoid maintaining host
  kernel direct maps of guest memory. This provides additional hardening
  against Spectre-like transient execution attacks by removing a
  potential attack surface within the kernel.

* Future guest_memfd features: This also lays the groundwork for future
  enhancements to guest_memfd, such as supporting huge pages and
  enabling in-place sharing of guest memory with the host for CoCo
  platforms that permit it [3].

Enable the basic mmap and fault handling logic within guest_memfd, but
hold off on allowing userspace to actually do mmap() until the
architecture support is also in place.

[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u

Reviewed-by: Gavin Shan
Reviewed-by: Shivank Garg
Acked-by: David Hildenbrand
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Signed-off-by: Fuad Tabba
Reviewed-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c       | 11 +++++++
 include/linux/kvm_host.h |  4 +++
 virt/kvm/guest_memfd.c   | 70 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c4..e5cd54ba1eaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13518,6 +13518,16 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+/*
+ * KVM doesn't yet support mmap() on guest_memfd for VMs with private memory
+ * (the private vs. shared tracking needs to be moved into guest_memfd).
+ */
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return !kvm_arch_has_private_mem(kvm);
+}
+
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
 {
@@ -13531,6 +13541,7 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 	kvm_x86_call(gmem_invalidate)(start, end);
 }
 #endif
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4d1c44622056..26bad600f9fa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -726,6 +726,10 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 }
 #endif
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index a99e11b8b77f..67e7cd7210ef 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,72 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static bool kvm_gmem_supports_mmap(struct inode *inode)
+{
+	return false;
+}
+
+static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+		return VM_FAULT_SIGBUS;
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			return VM_FAULT_RETRY;
+
+		return vmf_error(err);
+	}
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault_user_mapping,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!kvm_gmem_supports_mmap(file_inode(file)))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -391,6 +456,11 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr	= kvm_gmem_setattr,
 };
 
+bool __weak kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return true;
+}
+
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
 	const char *anon_name = "[kvm-gmem]";
-- 
2.50.1.552.g942d659e1b-goog
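
The two range and flag checks in the diff, the shared-mapping requirement in kvm_gmem_mmap() and the file-size bound in kvm_gmem_fault_user_mapping(), can be modeled outside the kernel to make the accepted and rejected cases concrete. The following is only a standalone sketch: every MODEL_* constant and model_* function is a hypothetical stand-in introduced for illustration, not kernel API, and the constants are assumed values for the model rather than definitions taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for kernel definitions, assumed for this model only.
 * The MODEL_ prefix marks them as illustrative, not as the kernel's API.
 */
#define MODEL_PAGE_SHIFT	12
#define MODEL_VM_SHARED		0x08u
#define MODEL_VM_MAYSHARE	0x80u

/*
 * Models the flag test in kvm_gmem_mmap(): the mapping is accepted only when
 * both the shared and may-share bits are set. In the patch, any other
 * combination is rejected with -EINVAL.
 */
static bool model_mmap_flags_ok(unsigned long vm_flags)
{
	const unsigned long want = MODEL_VM_SHARED | MODEL_VM_MAYSHARE;

	return (vm_flags & want) == want;
}

/*
 * Models the bounds test in kvm_gmem_fault_user_mapping(): a fault at page
 * offset pgoff is in range only if its byte offset lies below the file size.
 * In the patch, an out-of-range fault returns VM_FAULT_SIGBUS.
 */
static bool model_fault_in_bounds(uint64_t pgoff, int64_t i_size)
{
	return ((int64_t)pgoff << MODEL_PAGE_SHIFT) < i_size;
}
```

With a 4096-byte file in this model, model_fault_in_bounds(0, 4096) holds while model_fault_in_bounds(1, 4096) does not, which corresponds to the SIGBUS path for a fault on the first page past the end of the file.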