From: Fuad Tabba
Date: Wed, 9 Jul 2025 11:59:34 +0100
Subject: [PATCH v13 08/20] KVM: guest_memfd: Allow host to map guest_memfd pages
Message-ID: <20250709105946.4009897-9-tabba@google.com>
In-Reply-To: <20250709105946.4009897-1-tabba@google.com>
References: <20250709105946.4009897-1-tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
	kvmarm@lists.linux.dev
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
	anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	vannapurve@google.com, ackerleytng@google.com,
	mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
	rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
	hughd@google.com, jthoughton@google.com, peterx@redhat.com,
	pankaj.gupta@amd.com, ira.weiny@intel.com, tabba@google.com
Content-Type: text/plain; charset="UTF-8"
Introduce the core infrastructure to enable host userspace to mmap()
guest_memfd-backed memory. This is needed for several evolving KVM use
cases:

* Non-CoCo VM backing: Allows VMMs like Firecracker to run guests
  entirely backed by guest_memfd, even for non-CoCo VMs [1]. This
  provides a unified memory management model and simplifies guest
  memory handling.
* Direct map removal for enhanced security: This is an important step
  for direct map removal of guest memory [2]. By allowing host
  userspace to fault in guest_memfd pages directly, we can avoid
  maintaining host kernel direct maps of guest memory. This provides
  additional hardening against Spectre-like transient execution
  attacks by removing a potential attack surface within the kernel.

* Future guest_memfd features: This also lays the groundwork for
  future enhancements to guest_memfd, such as supporting huge pages
  and enabling in-place sharing of guest memory with the host for
  CoCo platforms that permit it [3].

Therefore, enable the basic mmap and fault handling logic within
guest_memfd. However, this functionality is not yet exposed to
userspace and remains inactive until two conditions are met in
subsequent patches:

* Kconfig gate (CONFIG_KVM_GMEM_SUPPORTS_MMAP): A new Kconfig option,
  KVM_GMEM_SUPPORTS_MMAP, is introduced later in this series. This
  option gates the compilation and availability of this mmap
  functionality at a system level. While the code changes in this
  patch might seem small, the Kconfig option is introduced to
  explicitly signal the intent to enable this new capability and to
  provide a clear compile-time switch for it. It also helps ensure
  that the necessary architecture-specific glue (like
  kvm_arch_supports_gmem_mmap) is properly defined.

* Per-instance opt-in (GUEST_MEMFD_FLAG_MMAP): On a per-instance
  basis, this functionality is enabled by the guest_memfd flag
  GUEST_MEMFD_FLAG_MMAP, which will be set in the
  KVM_CREATE_GUEST_MEMFD ioctl. This flag is crucial because when host
  userspace maps guest_memfd pages, KVM must *not* manage these memory
  regions in the same way it does for traditional KVM memory slots.
  The presence of GUEST_MEMFD_FLAG_MMAP on a guest_memfd instance
  allows mmap() and faulting of guest_memfd memory to host userspace.
Additionally, it informs KVM to always consume guest faults to this
memory from guest_memfd, regardless of whether it is a shared or a
private fault. This opt-in mechanism ensures compatibility and
prevents conflicts with existing KVM memory management.

This is a per-guest_memfd flag rather than a per-memslot or per-VM
capability because the ability to mmap directly applies to the
specific guest_memfd object, regardless of how it might be used within
various memory slots or VMs.

[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u

Reviewed-by: Gavin Shan
Reviewed-by: Shivank Garg
Acked-by: David Hildenbrand
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Signed-off-by: Fuad Tabba
---
 include/linux/kvm_host.h | 13 +++++++
 include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig         |  4 +++
 virt/kvm/guest_memfd.c   | 73 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1ec71648824c..9ac21985f3b5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -740,6 +740,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 }
 #endif
 
+/*
+ * Returns true if this VM supports mmap() in guest_memfd.
+ *
+ * Arch code must define kvm_arch_supports_gmem_mmap if support for guest_memfd
+ * is enabled.
+ */
+#if !defined(kvm_arch_supports_gmem_mmap)
+static inline bool kvm_arch_supports_gmem_mmap(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 37891580d05d..c71348db818f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1592,6 +1592,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_MMAP	(1ULL << 0)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 559c93ad90be..fa4acbedb953 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
        depends on KVM_GMEM
+
+config KVM_GMEM_SUPPORTS_MMAP
+       select KVM_GMEM
+       bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6db515833f61..07a4b165471d 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static bool kvm_gmem_supports_mmap(struct inode *inode)
+{
+	const u64 flags = (u64)inode->i_private;
+
+	if (!IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP))
+		return false;
+
+	return flags & GUEST_MEMFD_FLAG_MMAP;
+}
+
+static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+		return VM_FAULT_SIGBUS;
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			return VM_FAULT_RETRY;
+
+		return vmf_error(err);
+	}
+
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault_user_mapping,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if (!kvm_gmem_supports_mmap(file_inode(file)))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
+	if (kvm_arch_supports_gmem_mmap(kvm))
+		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
-- 
2.50.0.727.gbf7dc18ff4-goog
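To make the double gating explicit, the flag validation this patch adds to kvm_gmem_create() reduces to the following standalone model. This is an illustrative sketch, not kernel code; check_gmem_flags and its arch_supports_mmap parameter are hypothetical stand-ins for the patched code path and for kvm_arch_supports_gmem_mmap():

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)

/*
 * Model of the flag check in kvm_gmem_create(): GUEST_MEMFD_FLAG_MMAP
 * is only a valid flag when the architecture opts in; any unknown or
 * unsupported flag makes the whole ioctl fail with -EINVAL.
 */
static int check_gmem_flags(uint64_t flags, bool arch_supports_mmap)
{
	uint64_t valid_flags = 0;

	if (arch_supports_mmap)
		valid_flags |= GUEST_MEMFD_FLAG_MMAP;

	if (flags & ~valid_flags)
		return -EINVAL;

	return 0;
}
```

The effect is that KVM_CREATE_GUEST_MEMFD with GUEST_MEMFD_FLAG_MMAP fails cleanly on architectures that have not defined kvm_arch_supports_gmem_mmap(), while plain guest_memfd creation is unaffected.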