From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Kalyazin, Nikita" <kalyazin@amazon.co.uk>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"kernel@xen0n.name" <kernel@xen0n.name>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"loongarch@lists.linux.dev" <loongarch@lists.linux.dev>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"maz@kernel.org" <maz@kernel.org>,
"oupton@kernel.org" <oupton@kernel.org>,
"joey.gouly@arm.com" <joey.gouly@arm.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"will@kernel.org" <will@kernel.org>,
"seanjc@google.com" <seanjc@google.com>,
"tglx@kernel.org" <tglx@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"luto@kernel.org" <luto@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"willy@infradead.org" <willy@infradead.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"lorenzo.stoakes@oracle.com" <lorenzo.stoakes@oracle.com>,
"vbabka@kernel.org" <vbabka@kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"surenb@google.com" <surenb@google.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"andrii@kernel.org" <andrii@kernel.org>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"song@kernel.org" <song@kernel.org>,
"yonghong.song@linux.dev" <yonghong.song@linux.dev>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"sdf@fomichev.me" <sdf@fomichev.me>,
"haoluo@google.com" <haoluo@google.com>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"jannh@google.com" <jannh@google.com>,
"pfalcato@suse.de" <pfalcato@suse.de>,
"skhan@linuxfoundation.org" <skhan@linuxfoundation.org>,
"riel@surriel.com" <riel@surriel.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"jgross@suse.com" <jgross@suse.com>,
"yu-cheng.yu@intel.com" <yu-cheng.yu@intel.com>,
"kas@kernel.org" <kas@kernel.org>,
"coxu@redhat.com" <coxu@redhat.com>,
"kevin.brodsky@arm.com" <kevin.brodsky@arm.com>,
"ackerleytng@google.com" <ackerleytng@google.com>,
"yosry@kernel.org" <yosry@kernel.org>,
"ajones@ventanamicro.com" <ajones@ventanamicro.com>,
"maobibo@loongson.cn" <maobibo@loongson.cn>,
"tabba@google.com" <tabba@google.com>,
"prsampat@amd.com" <prsampat@amd.com>,
"wu.fei9@sanechips.com.cn" <wu.fei9@sanechips.com.cn>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"jmattson@google.com" <jmattson@google.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"agordeev@linux.ibm.com" <agordeev@linux.ibm.com>,
"alex@ghiti.fr" <alex@ghiti.fr>,
"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
"borntraeger@linux.ibm.com" <borntraeger@linux.ibm.com>,
"chenhuacai@kernel.org" <chenhuacai@kernel.org>,
"dev.jain@arm.com" <dev.jain@arm.com>,
"gor@linux.ibm.com" <gor@linux.ibm.com>,
"hca@linux.ibm.com" <hca@linux.ibm.com>,
"palmer@dabbelt.com" <palmer@dabbelt.com>,
"pjw@kernel.org" <pjw@kernel.org>,
"shijie@os.amperecomputing.com" <shijie@os.amperecomputing.com>,
"svens@linux.ibm.com" <svens@linux.ibm.com>,
"thuth@redhat.com" <thuth@redhat.com>,
"wyihan@google.com" <wyihan@google.com>,
"yang@os.amperecomputing.com" <yang@os.amperecomputing.com>,
"Jonathan.Cameron@huawei.com" <Jonathan.Cameron@huawei.com>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"urezki@gmail.com" <urezki@gmail.com>,
"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
"gerald.schaefer@linux.ibm.com" <gerald.schaefer@linux.ibm.com>,
"jiayuan.chen@shopee.com" <jiayuan.chen@shopee.com>,
"lenb@kernel.org" <lenb@kernel.org>,
"osalvador@suse.de" <osalvador@suse.de>,
"pavel@kernel.org" <pavel@kernel.org>,
"rafael@kernel.org" <rafael@kernel.org>,
"vannapurve@google.com" <vannapurve@google.com>,
"jackmanb@google.com" <jackmanb@google.com>,
"aneesh.kumar@kernel.org" <aneesh.kumar@kernel.org>,
"patrick.roy@linux.dev" <patrick.roy@linux.dev>,
"Thomson, Jack" <jackabt@amazon.co.uk>,
"Itazuri, Takahiro" <itazur@amazon.co.uk>,
"Manwaring, Derek" <derekmn@amazon.com>
Subject: Re: [PATCH v11 10/16] KVM: guest_memfd: Add flag to remove from direct map
Date: Mon, 23 Mar 2026 19:05:11 +0100
Message-ID: <50bfaeb5-551e-403f-bd00-a7d8b6bbf6e2@kernel.org>
In-Reply-To: <20260317141031.514-11-kalyazin@amazon.com>
On 3/17/26 15:12, Kalyazin, Nikita wrote:
> From: Patrick Roy <patrick.roy@linux.dev>
>
> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
> ioctl. When set, guest_memfd folios will be removed from the direct map
> after preparation, with direct map entries only restored when the folios
> are freed.
>
> To ensure these folios do not end up in places where the kernel cannot
> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
>
> Note that this flag causes removal of direct map entries for all
> guest_memfd folios independent of whether they are "shared" or "private"
> (although current guest_memfd only supports either all folios in the
> "shared" state, or all folios in the "private" state if
> GUEST_MEMFD_FLAG_MMAP is not set). The use case for also removing direct
> map entries of the shared parts of guest_memfd is a special type of
> non-CoCo VM, where host userspace is trusted to have access to all of
> guest memory, but where Spectre-style transient execution attacks
> through the host kernel's direct map should still be mitigated. In this
> setup, KVM retains access to guest memory via userspace mappings of
> guest_memfd, which are reflected back into KVM's memslots via
> userspace_addr. This is needed for things like MMIO emulation on x86_64
> to work.
>
> Direct map entries are zapped right before guest or userspace mappings
> of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
> kvm_gmem_get_pfn() [called from the KVM MMU code]. The only place where
> a gmem folio can be allocated without being mapped anywhere is
> kvm_gmem_populate(), where handling potential failures of direct map
> removal is not possible (by the time direct map removal is attempted,
> the folio is already marked as prepared, meaning attempting to re-try
> kvm_gmem_populate() would just result in -EEXIST without fixing up the
> direct map state). These folios are then removed from the direct map
> upon kvm_gmem_get_pfn(), e.g. when they are mapped into the guest later.
>
> Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
If you changed this patch significantly, you should likely add a
Co-developed-by: Nikita Kalyazin <kalyazin@amazon.com>
above your SoB.
(This applies to other patches as well, please double-check.)
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> ---
> Documentation/virt/kvm/api.rst | 21 ++++++-----
> include/linux/kvm_host.h | 3 ++
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/guest_memfd.c | 67 ++++++++++++++++++++++++++++++++--
> 4 files changed, 79 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 032516783e96..8feec77b03fe 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6439,15 +6439,18 @@ a single guest_memfd file, but the bound ranges must not overlap).
> The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
> specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
>
> -  ============================ ================================================
> -  GUEST_MEMFD_FLAG_MMAP        Enable using mmap() on the guest_memfd file
> -                               descriptor.
> -  GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
> -                               KVM_CREATE_GUEST_MEMFD (memory files created
> -                               without INIT_SHARED will be marked private).
> -                               Shared memory can be faulted into host userspace
> -                               page tables. Private memory cannot.
> -  ============================ ================================================
> +  ============================== ================================================
> +  GUEST_MEMFD_FLAG_MMAP          Enable using mmap() on the guest_memfd file
> +                                 descriptor.
> +  GUEST_MEMFD_FLAG_INIT_SHARED   Make all memory in the file shared during
> +                                 KVM_CREATE_GUEST_MEMFD (memory files created
> +                                 without INIT_SHARED will be marked private).
> +                                 Shared memory can be faulted into host userspace
> +                                 page tables. Private memory cannot.
> +  GUEST_MEMFD_FLAG_NO_DIRECT_MAP The guest_memfd instance will unmap the memory
> +                                 backing it from the kernel's address space
> +                                 before passing it off to userspace or the guest.
> +  ============================== ================================================
>
> When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ce8c5fdf2752..c95747e2278c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -738,6 +738,9 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
> 	if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
> 		flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
>
> +	if (!kvm || kvm_arch_gmem_supports_no_direct_map(kvm))
> +		flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
> +
> 	return flags;
> }
> #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 80364d4dbebb..d864f67efdb7 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1642,6 +1642,7 @@ struct kvm_memory_attributes {
> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> #define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
> #define GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)
> +#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 2)
>
> struct kvm_create_guest_memfd {
> 	__u64 size;
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 651649623448..c9344647579c 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -7,6 +7,7 @@
> #include <linux/mempolicy.h>
> #include <linux/pseudo_fs.h>
> #include <linux/pagemap.h>
> +#include <linux/set_memory.h>
>
> #include "kvm_mm.h"
>
> @@ -76,6 +77,35 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
> 	return 0;
> }
>
> +#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
> +
> +static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
> +{
> +	return ((u64)folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
> +}
> +
> +static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
> +{
> +	u64 gmem_flags = GMEM_I(folio_inode(folio))->flags;
> +	int r = 0;
> +
> +	if (kvm_gmem_folio_no_direct_map(folio) || !(gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
The function is only called when
kvm_gmem_no_direct_map(folio_inode(folio))
is true. Does it really make sense to check for GUEST_MEMFD_FLAG_NO_DIRECT_MAP
again? If anything, shouldn't it be a warning if
GUEST_MEMFD_FLAG_NO_DIRECT_MAP is not set?
Further, kvm_gmem_folio_zap_direct_map() relies on the folio lock for
synchronization, right? Might be worth pointing that out somehow (e.g.,
a lockdep check if possible).
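To make the two suggestions concrete, here is a minimal userspace sketch, not kernel code. Everything prefixed with model_ or MODEL_ is hypothetical and only mirrors names from the patch: model_warn_on() stands in for something like WARN_ON_ONCE(), and the assert() on the locked field stands in for whatever in-kernel check fits (possibly a folio_test_locked()-based warning, since the folio lock may not be visible to lockdep). The inode flag is treated as a precondition that warns when violated, and only the per-folio bit is used for the early return.

```c
/* Userspace model only: none of these types or helpers are the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_GMEM_FLAG_NO_DIRECT_MAP	(1ULL << 2)	/* mirrors the uapi bit in the patch */
#define MODEL_FOLIO_NO_DIRECT_MAP	(1ULL << 0)	/* mirrors KVM_GMEM_FOLIO_NO_DIRECT_MAP */

struct model_folio {
	uint64_t inode_flags;	/* stands in for GMEM_I(folio_inode(folio))->flags */
	uintptr_t private;	/* stands in for folio->private */
	bool locked;		/* stands in for the folio lock */
	bool in_direct_map;	/* whether the modelled direct map entry exists */
};

static int model_warn_count;

/* Hypothetical stand-in for a kernel warning: count, report, return cond. */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond) {
		model_warn_count++;
		fprintf(stderr, "WARN: %s\n", what);
	}
	return cond;
}

static int model_zap_direct_map(struct model_folio *folio)
{
	/* State the locking rule explicitly instead of leaving it implicit. */
	assert(folio->locked);

	/* Callers already checked the inode flag, so a miss is a bug. */
	if (model_warn_on(!(folio->inode_flags & MODEL_GMEM_FLAG_NO_DIRECT_MAP),
			  "zap requested without NO_DIRECT_MAP"))
		return 0;

	/* Already zapped: nothing to do. */
	if (folio->private & MODEL_FOLIO_NO_DIRECT_MAP)
		return 0;

	folio->in_direct_map = false;
	folio->private |= MODEL_FOLIO_NO_DIRECT_MAP;
	return 0;
}
```

In this shape the redundant flag test no longer silently hides a caller bug, and the locking assumption is checked at the one place that depends on it.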
> +		goto out;
> +
> +	r = folio_zap_direct_map(folio);
> +	if (!r)
> +		folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
> +
> +out:
> +	return r;
> +}
> +
> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
> +{
kvm_gmem_folio_zap_direct_map() may be called on folios that already
have the direct map removed, whereas kvm_gmem_folio_restore_direct_map()
must not be called if the direct map was already restored.
Should we make that more consistent?
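One way to read "more consistent" is to make both helpers tolerant of repeated calls. The following is a userspace sketch under that assumption, not the patch's code: the sym_ names are hypothetical, and the counters only exist so the behaviour can be observed. The opposite choice (both helpers warning on a repeated call) would be equally symmetric.

```c
/* Userspace model only: sym_* names are illustrative, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYM_FOLIO_NO_DIRECT_MAP	(1ULL << 0)	/* mirrors KVM_GMEM_FOLIO_NO_DIRECT_MAP */

struct sym_folio {
	uintptr_t private;	/* stands in for folio->private */
	int zap_calls;		/* times the underlying zap actually ran */
	int restore_calls;	/* times the underlying restore actually ran */
};

static bool sym_folio_no_direct_map(const struct sym_folio *folio)
{
	return folio->private & SYM_FOLIO_NO_DIRECT_MAP;
}

/* Tolerates folios whose direct map entry is already gone. */
static void sym_zap(struct sym_folio *folio)
{
	if (sym_folio_no_direct_map(folio))
		return;
	folio->zap_calls++;
	folio->private |= SYM_FOLIO_NO_DIRECT_MAP;
}

/* Mirrors sym_zap(): tolerates folios that are already restored. */
static void sym_restore(struct sym_folio *folio)
{
	if (!sym_folio_no_direct_map(folio))
		return;
	folio->restore_calls++;
	folio->private &= ~(uintptr_t)SYM_FOLIO_NO_DIRECT_MAP;
}
```

With both helpers keyed on the same per-folio bit, callers would not need to track which of the two is safe to call twice.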
Hoping Sean can find some time to review
--
Cheers,
David