* [PATCH v7 0/9] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
@ 2025-03-18 16:18 Fuad Tabba
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba
Main changes since v6 [1]:
- Fix and simplify handling of final folio_put() callback in case
underlying file is no longer associated with guestmem, e.g., KVM
module unloaded (Ackerley, Vlastimil)
- Removed dependency on folio lock when not needed (Ackerley)
- Carried acks (DavidH, Vlastimil)
- Rebased on Linux 6.14-rc7
The purpose of this series is to serve as a base for _restricted_ mmap()
support for guest_memfd backed memory at the host [2]. It allows
experimentation with what that support would be like in the safe
environment of software and non-confidential VM types.
For more background and for how to test this series, please refer to v2
[3]. Note that an updated version of kvmtool that works with this series
is available here [4].
I'm done respinning the series that tracks folio sharing [5]. I'll post
that one right after this one.
Cheers,
/fuad
[1] https://lore.kernel.org/all/20250312175824.1809636-1-tabba@google.com/
[2] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/
[3] https://lore.kernel.org/all/20250129172320.950523-1-tabba@google.com/
[4] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-6.14
[5] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/
Fuad Tabba (9):
mm: Consolidate freeing of typed folios on final folio_put()
KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
KVM: guest_memfd: Allow host to map guest_memfd() pages
KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed
memory
KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd
shared memory
KVM: arm64: Refactor user_mem_abort() calculation of force_pte
KVM: arm64: Handle guest_memfd()-backed guest page faults
KVM: arm64: Enable mapping guest_memfd in arm64
KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is
allowed
arch/arm64/include/asm/kvm_host.h | 12 ++
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 76 +++++++-----
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/kvm/Kconfig | 3 +-
include/linux/kvm_host.h | 23 +++-
include/linux/page-flags.h | 31 +++++
include/uapi/linux/kvm.h | 1 +
mm/debug.c | 1 +
mm/swap.c | 52 ++++++++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 75 +++++++++++-
virt/kvm/Kconfig | 4 +
virt/kvm/guest_memfd.c | 109 ++++++++++++++++++
virt/kvm/kvm_main.c | 9 +-
15 files changed, 357 insertions(+), 46 deletions(-)
base-commit: 4701f33a10702d5fc577c32434eb62adde0a1ae1
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 1/9] mm: Consolidate freeing of typed folios on final folio_put()
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
Some folio types, such as hugetlb, handle the freeing of their
own folios. Moreover, guest_memfd will need to be notified once a
folio's reference count reaches 0, to facilitate shared-to-private
folio conversion, without the folio actually being freed at that
point.

As a first step towards that, this patch consolidates the freeing
of folios that have a type. The first user is hugetlb folios;
later in this series, guest_memfd will become the second.
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/page-flags.h | 15 +++++++++++++++
mm/swap.c | 23 ++++++++++++++++++-----
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 36d283552f80..6dc2494bd002 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -953,6 +953,21 @@ static inline bool page_has_type(const struct page *page)
return page_mapcount_is_type(data_race(page->page_type));
}
+static inline int page_get_type(const struct page *page)
+{
+ return page->page_type >> 24;
+}
+
+static inline bool folio_has_type(const struct folio *folio)
+{
+ return page_has_type(&folio->page);
+}
+
+static inline int folio_get_type(const struct folio *folio)
+{
+ return page_get_type(&folio->page);
+}
+
#define FOLIO_TYPE_OPS(lname, fname) \
static __always_inline bool folio_test_##fname(const struct folio *folio) \
{ \
diff --git a/mm/swap.c b/mm/swap.c
index fc8281ef4241..47bc1bb919cc 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -94,6 +94,19 @@ static void page_cache_release(struct folio *folio)
unlock_page_lruvec_irqrestore(lruvec, flags);
}
+static void free_typed_folio(struct folio *folio)
+{
+ switch (folio_get_type(folio)) {
+#ifdef CONFIG_HUGETLBFS
+ case PGTY_hugetlb:
+ free_huge_folio(folio);
+ return;
+#endif
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+
void __folio_put(struct folio *folio)
{
if (unlikely(folio_is_zone_device(folio))) {
@@ -101,8 +114,8 @@ void __folio_put(struct folio *folio)
return;
}
- if (folio_test_hugetlb(folio)) {
- free_huge_folio(folio);
+ if (unlikely(folio_has_type(folio))) {
+ free_typed_folio(folio);
return;
}
@@ -966,13 +979,13 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
if (!folio_ref_sub_and_test(folio, nr_refs))
continue;
- /* hugetlb has its own memcg */
- if (folio_test_hugetlb(folio)) {
+ if (unlikely(folio_has_type(folio))) {
+ /* typed folios have their own memcg, if any */
if (lruvec) {
unlock_page_lruvec_irqrestore(lruvec, flags);
lruvec = NULL;
}
- free_huge_folio(folio);
+ free_typed_folio(folio);
continue;
}
folio_unqueue_deferred_split(folio);
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 2/9] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
Before transitioning a guest_memfd folio to unshared, thereby
disallowing access by the host and allowing the hypervisor to
transition its view of the guest page to private, we need to be
sure that the host doesn't have any references to the folio.

This patch introduces a new folio type for guest_memfd folios. It
isn't activated in this series, but is a placeholder that
facilitates the code in the subsequent patch series. It will be
used in the future to register a callback that informs the
guest_memfd subsystem when the last reference to a folio is
dropped, thereby establishing that the host holds no remaining
references.

This patch also introduces the configuration option
KVM_GMEM_SHARED_MEM, which toggles support for mapping
guest_memfd shared memory at the host.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
include/linux/kvm_host.h | 4 ++++
include/linux/page-flags.h | 16 ++++++++++++++++
mm/debug.c | 1 +
mm/swap.c | 29 +++++++++++++++++++++++++++++
virt/kvm/Kconfig | 4 ++++
virt/kvm/guest_memfd.c | 8 ++++++++
6 files changed, 62 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f34f4cfaa513..3ad0719bfc4f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2571,4 +2571,8 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
struct kvm_pre_fault_memory *range);
#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+void kvm_gmem_handle_folio_put(struct folio *folio);
+#endif
+
#endif
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6dc2494bd002..daeee9a38e4c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -933,6 +933,7 @@ enum pagetype {
PGTY_slab = 0xf5,
PGTY_zsmalloc = 0xf6,
PGTY_unaccepted = 0xf7,
+ PGTY_guestmem = 0xf8,
PGTY_mapcount_underflow = 0xff
};
@@ -1082,6 +1083,21 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
FOLIO_TEST_FLAG_FALSE(hugetlb)
#endif
+/*
+ * guestmem folios are used to back VM memory as managed by guest_memfd. Once
+ * the last reference is put, instead of freeing these folios back to the page
+ * allocator, they are returned to guest_memfd.
+ *
+ * For now, guestmem will only be set on these folios as long as they cannot be
+ * mapped to user space ("private state"), with the plan of always setting that
+ * type once typed folios can be mapped to user space cleanly.
+ */
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+FOLIO_TYPE_OPS(guestmem, guestmem)
+#else
+FOLIO_TEST_FLAG_FALSE(guestmem)
+#endif
+
PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
/*
diff --git a/mm/debug.c b/mm/debug.c
index 8d2acf432385..08bc42c6cba8 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -56,6 +56,7 @@ static const char *page_type_names[] = {
DEF_PAGETYPE_NAME(table),
DEF_PAGETYPE_NAME(buddy),
DEF_PAGETYPE_NAME(unaccepted),
+ DEF_PAGETYPE_NAME(guestmem),
};
static const char *page_type_name(unsigned int page_type)
diff --git a/mm/swap.c b/mm/swap.c
index 47bc1bb919cc..d8fda3948684 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -38,6 +38,10 @@
#include <linux/local_lock.h>
#include <linux/buffer_head.h>
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+#include <linux/kvm_host.h>
+#endif
+
#include "internal.h"
#define CREATE_TRACE_POINTS
@@ -94,6 +98,26 @@ static void page_cache_release(struct folio *folio)
unlock_page_lruvec_irqrestore(lruvec, flags);
}
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+static void gmem_folio_put(struct folio *folio)
+{
+ /*
+ * Perform the callback only as long as the KVM module is still loaded.
+ * As long as the folio mapping is set, the folio is associated with a
+ * guest_memfd inode.
+ */
+ if (folio->mapping)
+ kvm_gmem_handle_folio_put(folio);
+
+ /*
+ * If there are no references to the folio left, it's not associated
+ * with a guest_memfd inode anymore.
+ */
+ if (folio_ref_count(folio) == 0)
+ __folio_put(folio);
+}
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
static void free_typed_folio(struct folio *folio)
{
switch (folio_get_type(folio)) {
@@ -101,6 +125,11 @@ static void free_typed_folio(struct folio *folio)
case PGTY_hugetlb:
free_huge_folio(folio);
return;
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+ case PGTY_guestmem:
+ gmem_folio_put(folio);
+ return;
#endif
default:
WARN_ON_ONCE(1);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..4e759e8020c5 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
config HAVE_KVM_ARCH_GMEM_INVALIDATE
bool
depends on KVM_PRIVATE_MEM
+
+config KVM_GMEM_SHARED_MEM
+ select KVM_PRIVATE_MEM
+ bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..5fc414becae5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,6 +13,14 @@ struct kvm_gmem {
struct list_head entry;
};
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+ WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
/**
* folio_file_pfn - like folio_file_page, but return a pfn.
* @folio: The folio which contains this index.
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
Add support for mmap() and fault() for guest_memfd backed memory
in the host for VMs that support in-place conversion between
shared and private. To that end, this patch adds the ability to
check whether the VM type supports in-place conversion, and only
allows mapping its memory if that's the case.
Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM. When queried
on a VM, it indicates that the VM supports shared memory in
guest_memfd; when queried on the system, it indicates that the
host can create VMs that support shared memory. Supporting shared
memory implies that memory can be mapped at the host while it is
shared.
This is controlled by the KVM_GMEM_SHARED_MEM configuration
option.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 11 +++++
include/uapi/linux/kvm.h | 1 +
virt/kvm/guest_memfd.c | 101 +++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 4 ++
4 files changed, 117 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3ad0719bfc4f..601bbcaa5e41 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -728,6 +728,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
}
#endif
+/*
+ * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
+ * private memory is enabled and it supports in-place shared/private conversion.
+ */
+#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
+static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
+{
+ return false;
+}
+#endif
+
#ifndef kvm_arch_has_readonly_mem
static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
{
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 45e6d8fca9b9..117937a895da 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -929,6 +929,7 @@ struct kvm_enable_cap {
#define KVM_CAP_PRE_FAULT_MEMORY 236
#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
#define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_GMEM_SHARED_MEM 239
struct kvm_irq_routing_irqchip {
__u32 irqchip;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5fc414becae5..fbf89e643add 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -320,7 +320,108 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
return gfn - slot->base_gfn + slot->gmem.pgoff;
}
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+static bool kvm_gmem_offset_is_shared(struct file *file, pgoff_t index)
+{
+ struct kvm_gmem *gmem = file->private_data;
+
+
+ /* For now, VMs that support shared memory share all their memory. */
+ return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
+}
+
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+ struct inode *inode = file_inode(vmf->vma->vm_file);
+ struct folio *folio;
+ vm_fault_t ret = VM_FAULT_LOCKED;
+
+ filemap_invalidate_lock_shared(inode->i_mapping);
+
+ folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+ if (IS_ERR(folio)) {
+ int err = PTR_ERR(folio);
+
+ if (err == -EAGAIN)
+ ret = VM_FAULT_RETRY;
+ else
+ ret = vmf_error(err);
+
+ goto out_filemap;
+ }
+
+ if (folio_test_hwpoison(folio)) {
+ ret = VM_FAULT_HWPOISON;
+ goto out_folio;
+ }
+
+ if (!kvm_gmem_offset_is_shared(vmf->vma->vm_file, vmf->pgoff)) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_folio;
+ }
+
+ /*
+ * Shared folios would not be marked as "guestmem" so far, and we only
+ * expect shared folios at this point.
+ */
+ if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_folio;
+ }
+
+ /* No support for huge pages. */
+ if (WARN_ON_ONCE(folio_test_large(folio))) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_folio;
+ }
+
+ if (!folio_test_uptodate(folio)) {
+ clear_highpage(folio_page(folio, 0));
+ kvm_gmem_mark_prepared(folio);
+ }
+
+ vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+ if (ret != VM_FAULT_LOCKED) {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+out_filemap:
+ filemap_invalidate_unlock_shared(inode->i_mapping);
+
+ return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+ .fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct kvm_gmem *gmem = file->private_data;
+
+ if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
+ return -ENODEV;
+
+ if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+ (VM_SHARED | VM_MAYSHARE)) {
+ return -EINVAL;
+ }
+
+ file_accessed(file);
+ vm_flags_set(vma, VM_DONTDUMP);
+ vma->vm_ops = &kvm_gmem_vm_ops;
+
+ return 0;
+}
+#else
+#define kvm_gmem_mmap NULL
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
static struct file_operations kvm_gmem_fops = {
+ .mmap = kvm_gmem_mmap,
.open = generic_file_open,
.release = kvm_gmem_release,
.fallocate = kvm_gmem_fallocate,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba0327e2d0d3..38f0f402ea46 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4830,6 +4830,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#ifdef CONFIG_KVM_PRIVATE_MEM
case KVM_CAP_GUEST_MEMFD:
return !kvm || kvm_arch_has_private_mem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+ case KVM_CAP_GMEM_SHARED_MEM:
+ return !kvm || kvm_arch_gmem_supports_shared_mem(kvm);
#endif
default:
break;
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
For VMs that allow sharing guest_memfd backed memory in-place,
handle that memory the same as "private" guest_memfd memory. This
means that faulting that memory in the host or in the guest will
go through the guest_memfd subsystem.
Note that the word "private" in the name of the function
kvm_mem_is_private() doesn't necessarily mean that the memory
isn't shared; the name is a result of the history and evolution
of guest_memfd and the various names it has received. In effect,
this function multiplexes between the path of a normal page fault
and the path of a guest_memfd backed page fault.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 601bbcaa5e41..3d5595a71a2a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
#else
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
- return false;
+ return kvm_arch_gmem_supports_shared_mem(kvm) &&
+ kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 5/9] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
The KVM_X86_SW_PROTECTED_VM type is meant for experimentation and
does not have any underlying support for protected guests. This
makes it a good candidate for testing mapping shared memory.
Therefore, when the kconfig option is enabled, mark
KVM_X86_SW_PROTECTED_VM as supporting shared memory.
This means that this memory is considered by guest_memfd to be
shared with the host, with the possibility of in-place conversion
between shared and private. This allows the host to map and fault
in guest_memfd memory belonging to this VM type.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 5 +++++
arch/x86/kvm/Kconfig | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32ae3aa50c7e..b874e54a5ee4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2246,8 +2246,13 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_PRIVATE_MEM
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+
+#define kvm_arch_gmem_supports_shared_mem(kvm) \
+ (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) && \
+ ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
#else
#define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_gmem_supports_shared_mem(kvm) false
#endif
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ea2c4f21c1ca..22d1bcdaad58 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -45,7 +45,8 @@ config KVM_X86
select HAVE_KVM_PM_NOTIFIER if PM
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_GENERIC_PRE_FAULT_MEMORY
- select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+ select KVM_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+ select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
select KVM_WERROR if WERROR
config KVM
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 6/9] KVM: arm64: Refactor user_mem_abort() calculation of force_pte
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
To simplify the code and to make the assumptions clearer,
refactor user_mem_abort() by immediately setting force_pte to
true if the conditions are met. Also, remove the comment about
logging_active being guaranteed to never be true for VM_PFNMAP
memslots, since it's not technically correct right now.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/mmu.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1f55b0c7b11d..887ffa1f5b14 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1460,7 +1460,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
bool fault_is_perm)
{
int ret = 0;
- bool write_fault, writable, force_pte = false;
+ bool write_fault, writable;
bool exec_fault, mte_allowed;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
@@ -1472,6 +1472,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn_t gfn;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
+ bool force_pte = logging_active || is_protected_kvm_enabled();
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
@@ -1521,16 +1522,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
- /*
- * logging_active is guaranteed to never be true for VM_PFNMAP
- * memslots.
- */
- if (logging_active || is_protected_kvm_enabled()) {
- force_pte = true;
+ if (force_pte)
vma_shift = PAGE_SHIFT;
- } else {
+ else
vma_shift = get_vma_page_shift(vma, hva);
- }
switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 7/9] KVM: arm64: Handle guest_memfd()-backed guest page faults
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
Add arm64 support for handling guest page faults on guest_memfd
backed memslots.
For now, the fault granule is restricted to PAGE_SIZE.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/mmu.c | 65 +++++++++++++++++++++++++++-------------
include/linux/kvm_host.h | 5 ++++
virt/kvm/kvm_main.c | 5 ----
3 files changed, 50 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 887ffa1f5b14..adb0681fc1c6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1454,6 +1454,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
return vma->vm_flags & VM_MTE_ALLOWED;
}
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, bool write_fault, bool *writable,
+ struct page **page, bool is_private)
+{
+ kvm_pfn_t pfn;
+ int ret;
+
+ if (!is_private)
+ return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+
+ *writable = false;
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
+ if (!ret) {
+ *writable = !memslot_is_readonly(slot);
+ return pfn;
+ }
+
+ if (ret == -EHWPOISON)
+ return KVM_PFN_ERR_HWPOISON;
+
+ return KVM_PFN_ERR_NOSLOT_MASK;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1461,19 +1485,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
{
int ret = 0;
bool write_fault, writable;
- bool exec_fault, mte_allowed;
+ bool exec_fault, mte_allowed = false;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
phys_addr_t ipa = fault_ipa;
struct kvm *kvm = vcpu->kvm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma = NULL;
short vma_shift;
void *memcache;
- gfn_t gfn;
+ gfn_t gfn = ipa >> PAGE_SHIFT;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
- bool force_pte = logging_active || is_protected_kvm_enabled();
- long vma_pagesize, fault_granule;
+ bool is_gmem = kvm_mem_is_private(kvm, gfn);
+ bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
+ long vma_pagesize, fault_granule = PAGE_SIZE;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
struct page *page;
@@ -1510,16 +1535,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return ret;
}
+ mmap_read_lock(current->mm);
+
/*
* Let's check if we will get back a huge page backed by hugetlbfs, or
* get block mapping for device MMIO region.
*/
- mmap_read_lock(current->mm);
- vma = vma_lookup(current->mm, hva);
- if (unlikely(!vma)) {
- kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
- mmap_read_unlock(current->mm);
- return -EFAULT;
+ if (!is_gmem) {
+ vma = vma_lookup(current->mm, hva);
+ if (unlikely(!vma)) {
+ kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+ mmap_read_unlock(current->mm);
+ return -EFAULT;
+ }
+
+ vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+ mte_allowed = kvm_vma_mte_allowed(vma);
}
if (force_pte)
@@ -1590,18 +1621,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
ipa &= ~(vma_pagesize - 1);
}
- gfn = ipa >> PAGE_SHIFT;
- mte_allowed = kvm_vma_mte_allowed(vma);
-
- vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
-
/* Don't use the VMA after the unlock -- it may have vanished */
vma = NULL;
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or __kvm_faultin_pfn() become stale prior to
- * acquiring kvm->mmu_lock.
+ * vma_lookup() or faultin_pfn() become stale prior to acquiring
+ * kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
* with the smp_wmb() in kvm_mmu_invalidate_end().
@@ -1609,8 +1635,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
- pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
- &writable, &page);
+ pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3d5595a71a2a..ec3bedc18eab 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
return gfn_to_memslot(kvm, gfn)->id;
}
+static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
+{
+ return slot->flags & KVM_MEM_READONLY;
+}
+
static inline gfn_t
hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
{
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 38f0f402ea46..3e40acb9f5c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2624,11 +2624,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
return size;
}
-static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
-{
- return slot->flags & KVM_MEM_READONLY;
-}
-
static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
gfn_t *nr_pages, bool write)
{
--
2.49.0.rc1.451.g8f38331e32-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v7 8/9] KVM: arm64: Enable mapping guest_memfd in arm64
2025-03-18 16:18 [PATCH v7 0/9] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (6 preceding siblings ...)
2025-03-18 16:18 ` [PATCH v7 7/9] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2025-03-18 16:18 ` Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 9/9] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
8 siblings, 0 replies; 35+ messages in thread
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba
Enable mapping guest_memfd on arm64. For now, this applies to all
arm64 VMs that use guest_memfd. In the future, new VM types can
restrict this via kvm_arch_gmem_supports_shared_mem().
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/kvm/Kconfig | 1 +
2 files changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d919557af5e5..4440b2334a05 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1543,4 +1543,16 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
#define kvm_has_s1poe(k) \
(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
+#ifdef CONFIG_KVM_PRIVATE_MEM
+static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+{
+ return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM);
+}
+
+static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
+{
+ return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
+}
+#endif /* CONFIG_KVM_PRIVATE_MEM */
+
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index ead632ad01b4..4830d8805bed 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
select HAVE_KVM_VCPU_RUN_PID_CHANGE
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
+ select KVM_GMEM_SHARED_MEM
help
Support hosting virtualized guest machines.
--
2.49.0.rc1.451.g8f38331e32-goog
* [PATCH v7 9/9] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
2025-03-18 16:18 [PATCH v7 0/9] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (7 preceding siblings ...)
2025-03-18 16:18 ` [PATCH v7 8/9] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-03-18 16:18 ` Fuad Tabba
2025-04-01 17:25 ` Ackerley Tng
8 siblings, 1 reply; 35+ messages in thread
From: Fuad Tabba @ 2025-03-18 16:18 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba
Expand the guest_memfd selftests to include testing mapping guest
memory for VM types that support it.
Also, build the guest_memfd selftest for arm64.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 75 +++++++++++++++++--
2 files changed, 70 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 4277b983cace..c9a3f30e28dd 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -160,6 +160,7 @@ TEST_GEN_PROGS_arm64 += coalesced_io_test
TEST_GEN_PROGS_arm64 += demand_paging_test
TEST_GEN_PROGS_arm64 += dirty_log_test
TEST_GEN_PROGS_arm64 += dirty_log_perf_test
+TEST_GEN_PROGS_arm64 += guest_memfd_test
TEST_GEN_PROGS_arm64 += guest_print_test
TEST_GEN_PROGS_arm64 += get-reg-list
TEST_GEN_PROGS_arm64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..38c501e49e0e 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,48 @@ static void test_file_read_write(int fd)
"pwrite on a guest_mem fd should fail");
}
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t total_size)
{
+ size_t page_size = getpagesize();
+ const char val = 0xaa;
+ char *mem;
+ int ret;
+ int i;
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "mmaping() guest memory should pass.");
+
+ memset(mem, val, total_size);
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(mem[i], val);
+
+ ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+ page_size);
+ TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+ for (i = 0; i < page_size; i++)
+ TEST_ASSERT_EQ(mem[i], 0x00);
+ for (; i < total_size; i++)
+ TEST_ASSERT_EQ(mem[i], val);
+
+ memset(mem, val, total_size);
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(mem[i], val);
+
+ ret = munmap(mem, total_size);
+ TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t total_size)
+{
+ size_t page_size = getpagesize();
char *mem;
mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT_EQ(mem, MAP_FAILED);
}
static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -170,19 +206,27 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
close(fd1);
}
-int main(int argc, char *argv[])
+unsigned long get_shared_type(void)
{
- size_t page_size;
+#ifdef __x86_64__
+ return KVM_X86_SW_PROTECTED_VM;
+#endif
+ return 0;
+}
+
+void test_vm_type(unsigned long type, bool is_shared)
+{
+ struct kvm_vm *vm;
size_t total_size;
+ size_t page_size;
int fd;
- struct kvm_vm *vm;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
page_size = getpagesize();
total_size = page_size * 4;
- vm = vm_create_barebones();
+ vm = vm_create_barebones_type(type);
test_create_guest_memfd_invalid(vm);
test_create_guest_memfd_multiple(vm);
@@ -190,10 +234,29 @@ int main(int argc, char *argv[])
fd = vm_create_guest_memfd(vm, total_size, 0);
test_file_read_write(fd);
- test_mmap(fd, page_size);
+
+ if (is_shared)
+ test_mmap_allowed(fd, total_size);
+ else
+ test_mmap_denied(fd, total_size);
+
test_file_size(fd, page_size, total_size);
test_fallocate(fd, page_size, total_size);
test_invalid_punch_hole(fd, page_size, total_size);
close(fd);
+ kvm_vm_release(vm);
+}
+
+int main(int argc, char *argv[])
+{
+#ifndef __aarch64__
+ /* For now, arm64 only supports shared guest memory. */
+ test_vm_type(VM_TYPE_DEFAULT, false);
+#endif
+
+ if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
+ test_vm_type(get_shared_type(), true);
+
+ return 0;
}
--
2.49.0.rc1.451.g8f38331e32-goog
* Re: [PATCH v7 5/9] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory
2025-03-18 16:18 ` [PATCH v7 5/9] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory Fuad Tabba
@ 2025-03-26 14:42 ` kernel test robot
0 siblings, 0 replies; 35+ messages in thread
From: kernel test robot @ 2025-03-26 14:42 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: llvm, oe-kbuild-all, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang
Hi Fuad,
kernel test robot noticed the following build errors:
[auto build test ERROR on 4701f33a10702d5fc577c32434eb62adde0a1ae1]
url: https://github.com/intel-lab-lkp/linux/commits/Fuad-Tabba/mm-Consolidate-freeing-of-typed-folios-on-final-folio_put/20250319-003340
base: 4701f33a10702d5fc577c32434eb62adde0a1ae1
patch link: https://lore.kernel.org/r/20250318161823.4005529-6-tabba%40google.com
patch subject: [PATCH v7 5/9] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory
config: x86_64-randconfig-078-20250326 (https://download.01.org/0day-ci/archive/20250326/202503262202.WSGvs92t-lkp@intel.com/config)
compiler: clang version 20.1.1 (https://github.com/llvm/llvm-project 424c2d9b7e4de40d0804dd374721e6411c27d1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250326/202503262202.WSGvs92t-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503262202.WSGvs92t-lkp@intel.com/
All errors (new ones prefixed by >>):
>> ld.lld: error: undefined symbol: kvm_gmem_handle_folio_put
>>> referenced by swap.c:110 (mm/swap.c:110)
>>> mm/swap.o:(free_typed_folio) in archive vmlinux.a
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v7 9/9] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
2025-03-18 16:18 ` [PATCH v7 9/9] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-04-01 17:25 ` Ackerley Tng
2025-04-02 8:56 ` Fuad Tabba
0 siblings, 1 reply; 35+ messages in thread
From: Ackerley Tng @ 2025-04-01 17:25 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
tabba
Fuad Tabba <tabba@google.com> writes:
> Expand the guest_memfd selftests to include testing mapping guest
> memory for VM types that support it.
>
> Also, build the guest_memfd selftest for arm64.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../testing/selftests/kvm/guest_memfd_test.c | 75 +++++++++++++++++--
> 2 files changed, 70 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index 4277b983cace..c9a3f30e28dd 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -160,6 +160,7 @@ TEST_GEN_PROGS_arm64 += coalesced_io_test
> TEST_GEN_PROGS_arm64 += demand_paging_test
> TEST_GEN_PROGS_arm64 += dirty_log_test
> TEST_GEN_PROGS_arm64 += dirty_log_perf_test
> +TEST_GEN_PROGS_arm64 += guest_memfd_test
> TEST_GEN_PROGS_arm64 += guest_print_test
> TEST_GEN_PROGS_arm64 += get-reg-list
> TEST_GEN_PROGS_arm64 += kvm_create_max_vcpus
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index ce687f8d248f..38c501e49e0e 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -34,12 +34,48 @@ static void test_file_read_write(int fd)
> "pwrite on a guest_mem fd should fail");
> }
>
> -static void test_mmap(int fd, size_t page_size)
> +static void test_mmap_allowed(int fd, size_t total_size)
> {
> + size_t page_size = getpagesize();
> + const char val = 0xaa;
> + char *mem;
> + int ret;
> + int i;
Was using this test with hugetlb patches - int i was overflowing and
causing test failures. Perhaps use size_t i for this instead?
* Re: [PATCH v7 9/9] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
2025-04-01 17:25 ` Ackerley Tng
@ 2025-04-02 8:56 ` Fuad Tabba
0 siblings, 0 replies; 35+ messages in thread
From: Fuad Tabba @ 2025-04-02 8:56 UTC (permalink / raw)
To: Ackerley Tng
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
Hi Ackerley,
On Tue, 1 Apr 2025 at 18:25, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Expand the guest_memfd selftests to include testing mapping guest
> > memory for VM types that support it.
> >
> > Also, build the guest_memfd selftest for arm64.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > tools/testing/selftests/kvm/Makefile.kvm | 1 +
> > .../testing/selftests/kvm/guest_memfd_test.c | 75 +++++++++++++++++--
> > 2 files changed, 70 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> > index 4277b983cace..c9a3f30e28dd 100644
> > --- a/tools/testing/selftests/kvm/Makefile.kvm
> > +++ b/tools/testing/selftests/kvm/Makefile.kvm
> > @@ -160,6 +160,7 @@ TEST_GEN_PROGS_arm64 += coalesced_io_test
> > TEST_GEN_PROGS_arm64 += demand_paging_test
> > TEST_GEN_PROGS_arm64 += dirty_log_test
> > TEST_GEN_PROGS_arm64 += dirty_log_perf_test
> > +TEST_GEN_PROGS_arm64 += guest_memfd_test
> > TEST_GEN_PROGS_arm64 += guest_print_test
> > TEST_GEN_PROGS_arm64 += get-reg-list
> > TEST_GEN_PROGS_arm64 += kvm_create_max_vcpus
> > diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> > index ce687f8d248f..38c501e49e0e 100644
> > --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> > +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> > @@ -34,12 +34,48 @@ static void test_file_read_write(int fd)
> > "pwrite on a guest_mem fd should fail");
> > }
> >
> > -static void test_mmap(int fd, size_t page_size)
> > +static void test_mmap_allowed(int fd, size_t total_size)
> > {
> > + size_t page_size = getpagesize();
> > + const char val = 0xaa;
> > + char *mem;
> > + int ret;
> > + int i;
>
> Was using this test with hugetlb patches - int i was overflowing and
> causing test failures. Perhaps use size_t i for this instead?
I need to think big, or rather, huge (sorry :)). Will fix this.
Cheers,
/fuad
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-03-18 16:18 ` [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-04-08 12:04 ` Shivank Garg
2025-04-08 13:17 ` Fuad Tabba
2025-04-08 16:58 ` Ackerley Tng
2025-04-14 10:06 ` David Hildenbrand
1 sibling, 2 replies; 35+ messages in thread
From: Shivank Garg @ 2025-04-08 12:04 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
Hi Fuad,
On 3/18/2025 9:48 PM, Fuad Tabba wrote:
> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private. To that end, this patch adds the ability to
> check whether the VM type supports in-place conversion, and only
> allows mapping its memory if that's the case.
>
> Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that the VM supports shared memory in guest_memfd, or
> that the host can create VMs that support shared memory.
> Supporting shared memory implies that memory can be mapped when
> shared with the host.
>
> This is controlled by the KVM_GMEM_SHARED_MEM configuration
> option.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
...
...
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct kvm_gmem *gmem = file->private_data;
> +
> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> + return -ENODEV;
> +
> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> + (VM_SHARED | VM_MAYSHARE)) {
> + return -EINVAL;
> + }
> +
> + file_accessed(file);
As it is not directly visible to userspace, do we need to update the
file's access time via file_accessed()?
> + vm_flags_set(vma, VM_DONTDUMP);
> + vma->vm_ops = &kvm_gmem_vm_ops;
> +
> + return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
> static struct file_operations kvm_gmem_fops = {
> + .mmap = kvm_gmem_mmap,
> .open = generic_file_open,
> .release = kvm_gmem_release,
> .fallocate = kvm_gmem_fallocate,
Thanks,
Shivank
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-08 12:04 ` Shivank Garg
@ 2025-04-08 13:17 ` Fuad Tabba
2025-04-08 16:58 ` Ackerley Tng
1 sibling, 0 replies; 35+ messages in thread
From: Fuad Tabba @ 2025-04-08 13:17 UTC (permalink / raw)
To: Shivank Garg
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
Hi Shivank,
On Tue, 8 Apr 2025 at 13:04, Shivank Garg <shivankg@amd.com> wrote:
>
> Hi Fuad,
>
> On 3/18/2025 9:48 PM, Fuad Tabba wrote:
> > Add support for mmap() and fault() for guest_memfd backed memory
> > in the host for VMs that support in-place conversion between
> > shared and private. To that end, this patch adds the ability to
> > check whether the VM type supports in-place conversion, and only
> > allows mapping its memory if that's the case.
> >
> > Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> > indicates that the VM supports shared memory in guest_memfd, or
> > that the host can create VMs that support shared memory.
> > Supporting shared memory implies that memory can be mapped when
> > shared with the host.
> >
> > This is controlled by the KVM_GMEM_SHARED_MEM configuration
> > option.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
>
> ...
> ...
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + struct kvm_gmem *gmem = file->private_data;
> > +
> > + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
> > + return -ENODEV;
> > +
> > + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > + (VM_SHARED | VM_MAYSHARE)) {
> > + return -EINVAL;
> > + }
> > +
> > + file_accessed(file);
>
> As it is not directly visible to userspace, do we need to update the
> file's access time via file_accessed()?
Makes sense.
Thanks!
/fuad
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-08 12:04 ` Shivank Garg
2025-04-08 13:17 ` Fuad Tabba
@ 2025-04-08 16:58 ` Ackerley Tng
2025-04-09 7:17 ` Shivank Garg
1 sibling, 1 reply; 35+ messages in thread
From: Ackerley Tng @ 2025-04-08 16:58 UTC (permalink / raw)
To: Shivank Garg, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, david, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
Shivank Garg <shivankg@amd.com> writes:
> Hi Fuad,
>
> On 3/18/2025 9:48 PM, Fuad Tabba wrote:
>> Add support for mmap() and fault() for guest_memfd backed memory
>> in the host for VMs that support in-place conversion between
>> shared and private. To that end, this patch adds the ability to
>> check whether the VM type supports in-place conversion, and only
>> allows mapping its memory if that's the case.
>>
>> Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
>> indicates that the VM supports shared memory in guest_memfd, or
>> that the host can create VMs that support shared memory.
>> Supporting shared memory implies that memory can be mapped when
>> shared with the host.
>>
>> This is controlled by the KVM_GMEM_SHARED_MEM configuration
>> option.
>>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>
> ...
> ...
>> +
>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> + struct kvm_gmem *gmem = file->private_data;
>> +
>> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
>> + return -ENODEV;
>> +
>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>> + (VM_SHARED | VM_MAYSHARE)) {
>> + return -EINVAL;
>> + }
>> +
>> + file_accessed(file);
>
> As it is not directly visible to userspace, do we need to update the
> file's access time via file_accessed()?
>
Could you explain a little more about this being directly visible to
userspace?
IIUC generic_fillattr(), which guest_memfd uses, will fill stat->atime
from the inode's atime. file_accessed() will update atime and so this
should be userspace accessible. (Unless I missed something along the way
that blocks the update)
>> + vm_flags_set(vma, VM_DONTDUMP);
>> + vma->vm_ops = &kvm_gmem_vm_ops;
>> +
>> + return 0;
>> +}
>> +#else
>> +#define kvm_gmem_mmap NULL
>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>> +
>> static struct file_operations kvm_gmem_fops = {
>> + .mmap = kvm_gmem_mmap,
>> .open = generic_file_open,
>> .release = kvm_gmem_release,
>> .fallocate = kvm_gmem_fallocate,
>
> Thanks,
> Shivank
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-08 16:58 ` Ackerley Tng
@ 2025-04-09 7:17 ` Shivank Garg
2025-04-10 22:44 ` Ackerley Tng
0 siblings, 1 reply; 35+ messages in thread
From: Shivank Garg @ 2025-04-09 7:17 UTC (permalink / raw)
To: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, david, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
On 4/8/2025 10:28 PM, Ackerley Tng wrote:
> Shivank Garg <shivankg@amd.com> writes:
>
>> Hi Fuad,
>>
>> On 3/18/2025 9:48 PM, Fuad Tabba wrote:
>>> Add support for mmap() and fault() for guest_memfd backed memory
>>> in the host for VMs that support in-place conversion between
>>> shared and private. To that end, this patch adds the ability to
>>> check whether the VM type supports in-place conversion, and only
>>> allows mapping its memory if that's the case.
>>>
>>> Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
>>> indicates that the VM supports shared memory in guest_memfd, or
>>> that the host can create VMs that support shared memory.
>>> Supporting shared memory implies that memory can be mapped when
>>> shared with the host.
>>>
>>> This is controlled by the KVM_GMEM_SHARED_MEM configuration
>>> option.
>>>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>
>> ...
>> ...
>>> +
>>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>> +{
>>> + struct kvm_gmem *gmem = file->private_data;
>>> +
>>> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
>>> + return -ENODEV;
>>> +
>>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>>> + (VM_SHARED | VM_MAYSHARE)) {
>>> + return -EINVAL;
>>> + }
>>> +
>>> + file_accessed(file);
>>
>> As it is not directly visible to userspace, do we need to update the
>> file's access time via file_accessed()?
>>
>
> Could you explain a little more about this being directly visible to
> userspace?
>
> IIUC generic_fillattr(), which guest_memfd uses, will fill stat->atime
> from the inode's atime. file_accessed() will update atime and so this
> should be userspace accessible. (Unless I missed something along the way
> that blocks the update)
>
By visibility to userspace, I meant that guest_memfd is in-memory and not
directly exposed to users as a traditional file would be.
Yes, theoretically atime is accessible to user, but is it actually useful for
guest_memfd, and do users track atime in this context? In my understanding,
this might be unnecessary unless we want to maintain it for VFS consistency.
My analysis of the call flow:
fstat() -> vfs_fstat() -> vfs_getattr() -> vfs_getattr_nosec() -> kvm_gmem_getattr()
I couldn't find any kernel-side consumers of inode's atime or instances where
it's being used for any internal purposes.
Searching for examples, I found secretmem_mmap() skips file_accessed().
Also as side note, I believe the kvm_gmem_getattr() ops implementation
might be unnecessary here.
Since kvm_gmem_getattr() is simply calling generic_fillattr() without
any special handling, couldn't we just use the default implementation?
int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
...
if (inode->i_op->getattr)
return inode->i_op->getattr(idmap, path, stat,
request_mask,
query_flags);
generic_fillattr(idmap, request_mask, inode, stat);
return 0;
}
I might be missing some context. Please feel free to correct me
if I've misunderstood anything.
Thanks,
Shivank
>>> + vm_flags_set(vma, VM_DONTDUMP);
>>> + vma->vm_ops = &kvm_gmem_vm_ops;
>>> +
>>> + return 0;
>>> +}
>>> +#else
>>> +#define kvm_gmem_mmap NULL
>>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>>> +
>>> static struct file_operations kvm_gmem_fops = {
>>> + .mmap = kvm_gmem_mmap,
>>> .open = generic_file_open,
>>> .release = kvm_gmem_release,
>>> .fallocate = kvm_gmem_fallocate,
>>
>> Thanks,
>> Shivank
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-09 7:17 ` Shivank Garg
@ 2025-04-10 22:44 ` Ackerley Tng
2025-04-11 10:34 ` Shivank Garg
0 siblings, 1 reply; 35+ messages in thread
From: Ackerley Tng @ 2025-04-10 22:44 UTC (permalink / raw)
To: Shivank Garg, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, david, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
Shivank Garg <shivankg@amd.com> writes:
> On 4/8/2025 10:28 PM, Ackerley Tng wrote:
>> Shivank Garg <shivankg@amd.com> writes:
>>
>>> Hi Fuad,
>>>
>>> On 3/18/2025 9:48 PM, Fuad Tabba wrote:
>>>> Add support for mmap() and fault() for guest_memfd backed memory
>>>> in the host for VMs that support in-place conversion between
>>>> shared and private. To that end, this patch adds the ability to
>>>> check whether the VM type supports in-place conversion, and only
>>>> allows mapping its memory if that's the case.
>>>>
>>>> Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
>>>> indicates that the VM supports shared memory in guest_memfd, or
>>>> that the host can create VMs that support shared memory.
>>>> Supporting shared memory implies that memory can be mapped when
>>>> shared with the host.
>>>>
>>>> This is controlled by the KVM_GMEM_SHARED_MEM configuration
>>>> option.
>>>>
>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>
>>> ...
>>> ...
>>>> +
>>>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>>> +{
>>>> + struct kvm_gmem *gmem = file->private_data;
>>>> +
>>>> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
>>>> + return -ENODEV;
>>>> +
>>>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>>>> + (VM_SHARED | VM_MAYSHARE)) {
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + file_accessed(file);
>>>
>>> As it is not directly visible to userspace, do we need to update the
>>> file's access time via file_accessed()?
>>>
>>
>> Could you explain a little more about this being directly visible to
>> userspace?
>>
>> IIUC generic_fillattr(), which guest_memfd uses, will fill stat->atime
>> from the inode's atime. file_accessed() will update atime and so this
>> should be userspace accessible. (Unless I missed something along the way
>> that blocks the update)
>>
>
> By visibility to userspace, I meant that guest_memfd is in-memory and not
> directly exposed to users as a traditional file would be.
shmem is also in-memory and updates atime, so I guess being
in-memory doesn't mean we shouldn't update atime.
guest_memfd is not quite traditional, but would the mmap patches Fuad is
working on now qualify the guest_memfd as more traditional?
>
> Yes, theoretically atime is accessible to user, but is it actually useful for
> guest_memfd, and do users track atime in this context? In my understanding,
> this might be unnecessary unless we want to maintain it for VFS consistency.
>
> My analysis of the call flow:
> fstat() -> vfs_fstat() -> vfs_getattr() -> vfs_getattr_nosec() -> kvm_gmem_getattr()
> I couldn't find any kernel-side consumers of inode's atime or instances where
> it's being used for any internal purposes.
>
> Searching for examples, I found secretmem_mmap() skips file_accessed().
>
I guess I'm okay both ways, I don't think I have a use case for reading
atime, but I assumed VFS consistency is a good thing.
>
> Also as side note, I believe the kvm_gmem_getattr() ops implementation
> might be unnecessary here.
> Since kvm_gmem_getattr() is simply calling generic_fillattr() without
> any special handling, couldn't we just use the default implementation?
>
> int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
> u32 request_mask, unsigned int query_flags)
> {
> ...
> if (inode->i_op->getattr)
> return inode->i_op->getattr(idmap, path, stat,
> request_mask,
> query_flags);
>
> generic_fillattr(idmap, request_mask, inode, stat);
> return 0;
> }
I noticed this too. I agree that we could actually just use
generic_fillattr() by not specifying ->getattr().
>
> I might be missing some context. Please feel free to correct me
> if I've misunderstood anything.
>
> Thanks,
> Shivank
>
>>>> + vm_flags_set(vma, VM_DONTDUMP);
>>>> + vma->vm_ops = &kvm_gmem_vm_ops;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +#else
>>>> +#define kvm_gmem_mmap NULL
>>>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>>>> +
>>>> static struct file_operations kvm_gmem_fops = {
>>>> + .mmap = kvm_gmem_mmap,
>>>> .open = generic_file_open,
>>>> .release = kvm_gmem_release,
>>>> .fallocate = kvm_gmem_fallocate,
>>>
>>> Thanks,
>>> Shivank
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-10 22:44 ` Ackerley Tng
@ 2025-04-11 10:34 ` Shivank Garg
0 siblings, 0 replies; 35+ messages in thread
From: Shivank Garg @ 2025-04-11 10:34 UTC (permalink / raw)
To: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, david, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
[-- Attachment #1: Type: text/plain, Size: 4125 bytes --]
On 4/11/2025 4:14 AM, Ackerley Tng wrote:
> Shivank Garg <shivankg@amd.com> writes:
>
>> On 4/8/2025 10:28 PM, Ackerley Tng wrote:
>>> Shivank Garg <shivankg@amd.com> writes:
>>>
>>>> Hi Fuad,
>>>>
>>>> On 3/18/2025 9:48 PM, Fuad Tabba wrote:
>>>>> Add support for mmap() and fault() for guest_memfd backed memory
>>>>> in the host for VMs that support in-place conversion between
>>>>> shared and private. To that end, this patch adds the ability to
>>>>> check whether the VM type supports in-place conversion, and only
>>>>> allows mapping its memory if that's the case.
>>>>>
>>>>> Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
>>>>> indicates that the VM supports shared memory in guest_memfd, or
>>>>> that the host can create VMs that support shared memory.
>>>>> Supporting shared memory implies that memory can be mapped when
>>>>> shared with the host.
>>>>>
>>>>> This is controlled by the KVM_GMEM_SHARED_MEM configuration
>>>>> option.
>>>>>
>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>>
>>>> ...
>>>> ...
>>>>> +
>>>>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>>>> +{
>>>>> + struct kvm_gmem *gmem = file->private_data;
>>>>> +
>>>>> + if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
>>>>> + return -ENODEV;
>>>>> +
>>>>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>>>>> + (VM_SHARED | VM_MAYSHARE)) {
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + file_accessed(file);
>>>>
>>>> As it is not directly visible to userspace, do we need to update the
>>>> file's access time via file_accessed()?
>>>>
>>>
>>> Could you explain a little more about this being directly visible to
>>> userspace?
>>>
>>> IIUC generic_fillattr(), which guest_memfd uses, will fill stat->atime
>>> from the inode's atime. file_accessed() will update atime and so this
>>> should be userspace accessible. (Unless I missed something along the way
>>> that blocks the update)
>>>
>>
>> By visibility to userspace, I meant that guest_memfd is in-memory and not
>> directly exposed to users as a traditional file would be.
>
> shmem is also in-memory and updates atime, so I guess being
> in-memory doesn't mean we shouldn't update atime.
>
> guest_memfd is not quite traditional, but would the mmap patches Fuad is
> working on now qualify the guest_memfd as more traditional?
>
>>
>> Yes, theoretically atime is accessible to user, but is it actually useful for
>> guest_memfd, and do users track atime in this context? In my understanding,
>> this might be unnecessary unless we want to maintain it for VFS consistency.
>>
>> My analysis of the call flow:
>> fstat() -> vfs_fstat() -> vfs_getattr() -> vfs_getattr_nosec() -> kvm_gmem_getattr()
>> I couldn't find any kernel-side consumers of inode's atime or instances where
>> it's being used for any internal purposes.
>>
>> Searching for examples, I found secretmem_mmap() skips file_accessed().
>>
>
> I guess I'm okay both ways, I don't think I have a use case for reading
> atime, but I assumed VFS consistency is a good thing.
I'm happy to go with whatever maintainers think best for file_accessed().
>>
>> Also as side note, I believe the kvm_gmem_getattr() ops implementation
>> might be unnecessary here.
>> Since kvm_gmem_getattr() is simply calling generic_fillattr() without
>> any special handling, couldn't we just use the default implementation?
>>
>> int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>> u32 request_mask, unsigned int query_flags)
>> {
>> ...
>> if (inode->i_op->getattr)
>> return inode->i_op->getattr(idmap, path, stat,
>> request_mask,
>> query_flags);
>>
>> generic_fillattr(idmap, request_mask, inode, stat);
>> return 0;
>> }
>
> I noticed this too. I agree that we could actually just use
> generic_fillattr() by not specifying ->getattr().
I’ve attached a patch to remove it, letting the VFS default
handle attribute queries. Please let me know if it looks good or
needs tweaks.
[-- Attachment #2: 0001-KVM-guest_memfd-Remove-redundant-kvm_gmem_getattr.patch --]
[-- Type: text/plain, Size: 1272 bytes --]
From bf713f4b7d96dc92960d24537b488582c5521722 Mon Sep 17 00:00:00 2001
From: Shivank Garg <shivankg@amd.com>
Date: Fri, 11 Apr 2025 08:06:21 +0000
Subject: [PATCH] KVM: guest_memfd: Remove redundant kvm_gmem_getattr
Drop kvm_gmem_getattr, which only calls generic_fillattr, and
rely on the VFS default implementation to simplify the code.
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
virt/kvm/guest_memfd.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..7d85cc33c0bb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -382,23 +382,12 @@ static const struct address_space_operations kvm_gmem_aops = {
#endif
};
-static int kvm_gmem_getattr(struct mnt_idmap *idmap, const struct path *path,
- struct kstat *stat, u32 request_mask,
- unsigned int query_flags)
-{
- struct inode *inode = path->dentry->d_inode;
-
- generic_fillattr(idmap, request_mask, inode, stat);
- return 0;
-}
-
static int kvm_gmem_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr)
{
return -EINVAL;
}
static const struct inode_operations kvm_gmem_iops = {
- .getattr = kvm_gmem_getattr,
.setattr = kvm_gmem_setattr,
};
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v7 1/9] mm: Consolidate freeing of typed folios on final folio_put()
2025-03-18 16:18 ` [PATCH v7 1/9] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
@ 2025-04-14 10:00 ` David Hildenbrand
2025-04-14 10:15 ` Fuad Tabba
0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-04-14 10:00 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
On 18.03.25 17:18, Fuad Tabba wrote:
> Some folio types, such as hugetlb, handle freeing their own
> folios. Moreover, guest_memfd will require being notified once a
> folio's reference count reaches 0 to facilitate shared to private
> folio conversion, without the folio actually being freed at that
> point.
>
> As a first step towards that, this patch consolidates freeing
> folios that have a type. The first user is hugetlb folios. Later
> in this patch series, guest_memfd will become the second user of
> this.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
As discussed in the last upstream meeting, we should focus on using the
folio_put() hook only for the post-truncate case where required (e.g.,
re-assemble hugetlb).
For shared->private conversion doing it synchronously (unmap, try
freezing refcount) and failing if impossible to signal user space to
retry is a better first approach.
So this patch will be dropped from your series for now, correct?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 2/9] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
2025-03-18 16:18 ` [PATCH v7 2/9] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages Fuad Tabba
@ 2025-04-14 10:01 ` David Hildenbrand
0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-04-14 10:01 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
On 18.03.25 17:18, Fuad Tabba wrote:
> Before transitioning a guest_memfd folio to unshared, thereby
> disallowing access by the host and allowing the hypervisor to
> transition its view of the guest page as private, we need to be
> sure that the host doesn't have any references to the folio.
>
> This patch introduces a new type for guest_memfd folios, which
> isn't activated in this series but is here as a placeholder and
> to facilitate the code in the subsequent patch series. This will
> be used in the future to register a callback that informs the
> guest_memfd subsystem when the last reference is dropped,
> therefore knowing that the host doesn't have any remaining
> references.
>
> This patch also introduces the configuration option,
> KVM_GMEM_SHARED_MEM, which toggles support for mapping
> guest_memfd shared memory at the host.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
Same with this patch for now.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-03-18 16:18 ` [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-04-08 12:04 ` Shivank Garg
@ 2025-04-14 10:06 ` David Hildenbrand
2025-04-14 10:15 ` Fuad Tabba
1 sibling, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-04-14 10:06 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
> + /*
> + * Shared folios would not be marked as "guestmem" so far, and we only
> + * expect shared folios at this point.
> + */
> + if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
> + ret = VM_FAULT_SIGBUS;
> + goto out_folio;
> + }
With that dropped for now
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 1/9] mm: Consolidate freeing of typed folios on final folio_put()
2025-04-14 10:00 ` David Hildenbrand
@ 2025-04-14 10:15 ` Fuad Tabba
0 siblings, 0 replies; 35+ messages in thread
From: Fuad Tabba @ 2025-04-14 10:15 UTC (permalink / raw)
To: David Hildenbrand
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
Hi David,
On Mon, 14 Apr 2025 at 11:00, David Hildenbrand <david@redhat.com> wrote:
>
> On 18.03.25 17:18, Fuad Tabba wrote:
> > Some folio types, such as hugetlb, handle freeing their own
> > folios. Moreover, guest_memfd will require being notified once a
> > folio's reference count reaches 0 to facilitate shared to private
> > folio conversion, without the folio actually being freed at that
> > point.
> >
> > As a first step towards that, this patch consolidates freeing
> > folios that have a type. The first user is hugetlb folios. Later
> > in this patch series, guest_memfd will become the second user of
> > this.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
>
> As discussed in the last upstream meeting, we should focus on using the
> folio_put() hook only for the post-truncate case where required (e.g.,
> re-assemble hugetlb).
>
> For shared->private conversion doing it synchronously (unmap, try
> freezing refcount) and failing if impossible to signal user space to
> retry is a better first approach.
>
> So this patch will be dropped from your series for now, correct?
Yes it will.
Thanks,
/fuad
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages
2025-04-14 10:06 ` David Hildenbrand
@ 2025-04-14 10:15 ` Fuad Tabba
0 siblings, 0 replies; 35+ messages in thread
From: Fuad Tabba @ 2025-04-14 10:15 UTC (permalink / raw)
To: David Hildenbrand
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
On Mon, 14 Apr 2025 at 11:06, David Hildenbrand <david@redhat.com> wrote:
>
>
> > + /*
> > + * Shared folios would not be marked as "guestmem" so far, and we only
> > + * expect shared folios at this point.
> > + */
> > + if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
> > + ret = VM_FAULT_SIGBUS;
> > + goto out_folio;
> > + }
>
> With that dropped for now
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks!
/fuad
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-03-18 16:18 ` [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory Fuad Tabba
@ 2025-04-14 11:51 ` David Hildenbrand
2025-04-14 16:03 ` Fuad Tabba
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-04-14 11:51 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx
On 18.03.25 17:18, Fuad Tabba wrote:
> For VMs that allow sharing guest_memfd backed memory in-place,
> handle that memory the same as "private" guest_memfd memory. This
> means that faulting that memory in the host or in the guest will
> go through the guest_memfd subsystem.
>
> Note that the word "private" in the name of the function
> kvm_mem_is_private() doesn't necessarily indicate that the memory
> isn't shared, but is due to the history and evolution of
> guest_memfd and the various names it has received. In effect,
> this function is used to multiplex between the path of a normal
> page fault and the path of a guest_memfd backed page fault.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> include/linux/kvm_host.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 601bbcaa5e41..3d5595a71a2a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> #else
> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> {
> - return false;
> + return kvm_arch_gmem_supports_shared_mem(kvm) &&
> + kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
> }
> #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>
I've been thinking long about this, and was wondering if we should instead
clean up the code to decouple the "private" from gmem handling first.
I know, this was already discussed a couple of times, but faking that
shared memory is private looks odd.
I played with the code to start cleaning this up. I ended up with the following
gmem-terminology cleanup patches (not even compile tested)
KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
KVM: x86: generalize private fault lookups to "gmem" fault lookups
https://github.com/davidhildenbrand/linux/tree/gmem_shared_prep
On top of that, I was wondering if we could look into doing something like
the following. It would also allow for pulling pages out of gmem for
existing SW-protected VMs once they enable shared memory for GMEM IIUC.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 08eebd24a0e18..6f878cab0f466 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
{
int max_order, r;
- if (!kvm_slot_has_gmem(fault->slot)) {
- kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
- return -EFAULT;
- }
-
r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
&fault->refcounted_page, &max_order);
if (r) {
@@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
unsigned int foll = fault->write ? FOLL_WRITE : 0;
+ bool use_gmem = false;
+
+ if (fault->is_private) {
+ if (!kvm_slot_has_gmem(fault->slot)) {
+ kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+ return -EFAULT;
+ }
+ use_gmem = true;
+ } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
+ use_gmem = true;
+ }
- if (fault->is_private)
+ if (use_gmem)
return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
foll |= FOLL_NOWAIT;
That is, we'd not claim that things are private when they are not, but instead
teach the code about shared memory coming from gmem.
There might be some more missing, just throwing it out there if I am completely off.
--
Cheers,
David / dhildenb
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 11:51 ` David Hildenbrand
@ 2025-04-14 16:03 ` Fuad Tabba
2025-04-14 19:42 ` David Hildenbrand
2025-04-14 18:07 ` Ackerley Tng
2025-04-16 12:30 ` Patrick Roy
2 siblings, 1 reply; 35+ messages in thread
From: Fuad Tabba @ 2025-04-14 16:03 UTC (permalink / raw)
To: David Hildenbrand
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
Hi David,
On Mon, 14 Apr 2025 at 12:51, David Hildenbrand <david@redhat.com> wrote:
>
> On 18.03.25 17:18, Fuad Tabba wrote:
> > For VMs that allow sharing guest_memfd backed memory in-place,
> > handle that memory the same as "private" guest_memfd memory. This
> > means that faulting that memory in the host or in the guest will
> > go through the guest_memfd subsystem.
> >
> > Note that the word "private" in the name of the function
> > kvm_mem_is_private() doesn't necessarily indicate that the memory
> > isn't shared, but is due to the history and evolution of
> > guest_memfd and the various names it has received. In effect,
> > this function is used to multiplex between the path of a normal
> > page fault and the path of a guest_memfd backed page fault.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > include/linux/kvm_host.h | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 601bbcaa5e41..3d5595a71a2a 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > #else
> > static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > {
> > - return false;
> > + return kvm_arch_gmem_supports_shared_mem(kvm) &&
> > + kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
> > }
> > #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> >
>
> I've been thinking long about this, and was wondering if we should instead
> clean up the code to decouple the "private" from gmem handling first.
>
> I know, this was already discussed a couple of times, but faking that
> shared memory is private looks odd.
I agree. I've been wanting to do that as part of a separate series,
since renaming discussions sometimes tend to take a disproportionate
amount of time. But the confusion the current naming (and overloading
of terms) is causing is probably worse.
>
> I played with the code to start cleaning this up. I ended up with the following
> gmem-terminology cleanup patches (not even compile tested)
>
> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
> KVM: x86: generalize private fault lookups to "gmem" fault lookups
>
> https://github.com/davidhildenbrand/linux/tree/gmem_shared_prep
>
> On top of that, I was wondering if we could look into doing something like
> the following. It would also allow for pulling pages out of gmem for
> existing SW-protected VMs once they enable shared memory for GMEM IIUC.
>
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 08eebd24a0e18..6f878cab0f466 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> {
> int max_order, r;
>
> - if (!kvm_slot_has_gmem(fault->slot)) {
> - kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> - return -EFAULT;
> - }
> -
> r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> &fault->refcounted_page, &max_order);
> if (r) {
> @@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> unsigned int foll = fault->write ? FOLL_WRITE : 0;
> + bool use_gmem = false;
> +
> + if (fault->is_private) {
> + if (!kvm_slot_has_gmem(fault->slot)) {
> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> + return -EFAULT;
> + }
> + use_gmem = true;
> + } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
> + use_gmem = true;
> + }
>
> - if (fault->is_private)
> + if (use_gmem)
> return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>
> foll |= FOLL_NOWAIT;
>
>
> That is, we'd not claim that things are private when they are not, but instead
> teach the code about shared memory coming from gmem.
>
> There might be some more missing, just throwing it out there if I am completely off.
For me these changes seem to be reasonable all in all. I might want to
suggest a couple of modifications, but I guess the bigger question is
what the KVM maintainers and guest_memfd's main contributors think.
Also, how do you suggest we go about this? Send out a separate series
first, before continuing with the mapping series? Or have it all as
one big series? It could be something to add to the agenda for
Thursday.
Thanks,
/fuad
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 11:51 ` David Hildenbrand
2025-04-14 16:03 ` Fuad Tabba
@ 2025-04-14 18:07 ` Ackerley Tng
2025-04-14 20:06 ` David Hildenbrand
2025-04-16 12:30 ` Patrick Roy
2 siblings, 1 reply; 35+ messages in thread
From: Ackerley Tng @ 2025-04-14 18:07 UTC (permalink / raw)
To: David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx
David Hildenbrand <david@redhat.com> writes:
> On 18.03.25 17:18, Fuad Tabba wrote:
>> For VMs that allow sharing guest_memfd backed memory in-place,
>> handle that memory the same as "private" guest_memfd memory. This
>> means that faulting that memory in the host or in the guest will
>> go through the guest_memfd subsystem.
>>
>> Note that the word "private" in the name of the function
>> kvm_mem_is_private() doesn't necessarily indicate that the memory
>> isn't shared, but is due to the history and evolution of
>> guest_memfd and the various names it has received. In effect,
>> this function is used to multiplex between the path of a normal
>> page fault and the path of a guest_memfd backed page fault.
>>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> ---
>> include/linux/kvm_host.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 601bbcaa5e41..3d5595a71a2a 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> #else
>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> {
>> - return false;
>> + return kvm_arch_gmem_supports_shared_mem(kvm) &&
>> + kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
>> }
>> #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>>
>
> I've been thinking long about this, and was wondering if we should instead
> clean up the code to decouple the "private" from gmem handling first.
>
Thank you for making this suggestion more concrete, I like the renaming!
> I know, this was already discussed a couple of times, but faking that
> shared memory is private looks odd.
>
> I played with the code to start cleaning this up. I ended up with the following
> gmem-terminology cleanup patches (not even compile tested)
>
> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
Perhaps zooming into this [1] can clarify a lot. In
kvm_mmu_max_mapping_level(), it was
bool is_private = kvm_slot_has_gmem(slot) && kvm_mem_is_private(kvm, gfn);
and now it is
bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
Is this actually saying that the mapping level is to be fully determined
from lpage_info as long as this memslot has gmem and
A. this specific gfn is backed by gmem, or
B. the specific gfn is private?
I noticed some other places where kvm_mem_is_private() is left as-is
[2], is that intentional? Are you not just renaming but splitting out
the two cases A and B?
[1] https://github.com/davidhildenbrand/linux/blob/ac8ae8eb494c3d5a943a4a44c0e9e34b4976895a/arch/x86/kvm/mmu/mmu.c#L3286
[2] https://github.com/davidhildenbrand/linux/blob/ac8ae8eb494c3d5a943a4a44c0e9e34b4976895a/arch/x86/kvm/mmu/mmu.c#L4585
> KVM: x86: generalize private fault lookups to "gmem" fault lookups
>
> https://github.com/davidhildenbrand/linux/tree/gmem_shared_prep
>
> On top of that, I was wondering if we could look into doing something like
> the following. It would also allow for pulling pages out of gmem for
> existing SW-protected VMs once they enable shared memory for GMEM IIUC.
>
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 08eebd24a0e18..6f878cab0f466 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> {
> int max_order, r;
>
> - if (!kvm_slot_has_gmem(fault->slot)) {
> - kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> - return -EFAULT;
> - }
> -
> r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> &fault->refcounted_page, &max_order);
> if (r) {
> @@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> unsigned int foll = fault->write ? FOLL_WRITE : 0;
> + bool use_gmem = false;
> +
> + if (fault->is_private) {
> + if (!kvm_slot_has_gmem(fault->slot)) {
> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> + return -EFAULT;
> + }
> + use_gmem = true;
> + } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
> + use_gmem = true;
> + }
>
> - if (fault->is_private)
> + if (use_gmem)
> return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>
> foll |= FOLL_NOWAIT;
>
>
> That is, we'd not claim that things are private when they are not, but instead
> teach the code about shared memory coming from gmem.
>
> There might be some more missing, just throwing it out there if I am completely off.
>
> --
> Cheers,
>
> David / dhildenb
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 16:03 ` Fuad Tabba
@ 2025-04-14 19:42 ` David Hildenbrand
2025-04-15 13:51 ` Fuad Tabba
0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-04-14 19:42 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
On 14.04.25 18:03, Fuad Tabba wrote:
> Hi David,
>
> On Mon, 14 Apr 2025 at 12:51, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 18.03.25 17:18, Fuad Tabba wrote:
>>> For VMs that allow sharing guest_memfd backed memory in-place,
>>> handle that memory the same as "private" guest_memfd memory. This
>>> means that faulting that memory in the host or in the guest will
>>> go through the guest_memfd subsystem.
>>>
>>> Note that the word "private" in the name of the function
>>> kvm_mem_is_private() doesn't necessarily indicate that the memory
>>> isn't shared, but is due to the history and evolution of
>>> guest_memfd and the various names it has received. In effect,
>>> this function is used to multiplex between the path of a normal
>>> page fault and the path of a guest_memfd backed page fault.
>>>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>> include/linux/kvm_host.h | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index 601bbcaa5e41..3d5595a71a2a 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> #else
>>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> {
>>> - return false;
>>> + return kvm_arch_gmem_supports_shared_mem(kvm) &&
>>> + kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
>>> }
>>> #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>>>
>>
>> I've been thinking long about this, and was wondering if we should instead
>> clean up the code to decouple the "private" from gmem handling first.
>>
>> I know, this was already discussed a couple of times, but faking that
>> shared memory is private looks odd.
>
> I agree. I've been wanting to do that as part of a separate series,
> since renaming discussions sometimes tend to take a disproportionate
> amount of time. But the confusion the current naming (and overloading
> of terms) is causing is probably worse.
Exactly my thoughts. The cleanup diff I was able to come up with is not
too crazy, so it feels feasible to just include the cleanups as a
preparation for mmap() where we introduce the concept of shared memory
in guest_memfd.
>
>>
> >> I played with the code to start cleaning this up. I ended up with the following
>> gmem-terminology cleanup patches (not even compile tested)
>>
>> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
>> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
>> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
>> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
>> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
>> KVM: x86: generalize private fault lookups to "gmem" fault lookups
>>
>> https://github.com/davidhildenbrand/linux/tree/gmem_shared_prep
>>
>> On top of that, I was wondering if we could look into doing something like
>> the following. It would also allow for pulling pages out of gmem for
>> existing SW-protected VMs once they enable shared memory for GMEM IIUC.
>>
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 08eebd24a0e18..6f878cab0f466 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
>> {
>> int max_order, r;
>>
>> - if (!kvm_slot_has_gmem(fault->slot)) {
>> - kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>> - return -EFAULT;
>> - }
>> -
>> r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
>> &fault->refcounted_page, &max_order);
>> if (r) {
>> @@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>> struct kvm_page_fault *fault)
>> {
>> unsigned int foll = fault->write ? FOLL_WRITE : 0;
>> + bool use_gmem = false;
>> +
>> + if (fault->is_private) {
>> + if (!kvm_slot_has_gmem(fault->slot)) {
>> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>> + return -EFAULT;
>> + }
>> + use_gmem = true;
>> + } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
>> + use_gmem = true;
>> + }
>>
>> - if (fault->is_private)
>> + if (use_gmem)
>> return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>>
>> foll |= FOLL_NOWAIT;
>>
>>
>> That is, we'd not claim that things are private when they are not, but instead
>> teach the code about shared memory coming from gmem.
>>
>> There might be some more missing, just throwing it out there if I am completely off.
>
> For me these changes seem to be reasonable all in all. I might want to
> suggest a couple of modifications, but I guess the bigger question is
> what the KVM maintainers and guest_memfd's main contributors think.
I'm afraid we won't get a reply before we officially send it ...
>
> Also, how do you suggest we go about this? Send out a separate series
> first, before continuing with the mapping series? Or have it all as
> one big series? It could be something to add to the agenda for
> Thursday.
... and ideally it would be part of this series. After all, this series
shrunk a bit :)
Feel free to use my commits when helpful: they are still missing
descriptions and probably have other issues. Feel free to turn my SOB
into a Co-developed-by+SOB and make yourself the author.
Alternatively, let me know and I can polish them up and we can discuss
what you have in mind (either here or elsewhere).
I'd suggest we go full-steam on this series to finally get it over the
finish line :)
--
Cheers,
David / dhildenb
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 18:07 ` Ackerley Tng
@ 2025-04-14 20:06 ` David Hildenbrand
2025-04-15 21:50 ` Ackerley Tng
0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-04-14 20:06 UTC (permalink / raw)
To: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx
>> I've been thinking long about this, and was wondering if we should instead
>> clean up the code to decouple the "private" from gmem handling first.
>>
>
> Thank you for making this suggestion more concrete, I like the renaming!
>
Thanks for the fast feedback!
>> I know, this was already discussed a couple of times, but faking that
>> shared memory is private looks odd.
>>
>> I played with the code to start cleaning this up. I ended up with the following
>> gmem-terminology cleanup patches (not even compile tested)
>>
>> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
>> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
>> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
>> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
>> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
>
> Perhaps zooming into this [1] can clarify a lot. In
> kvm_mmu_max_mapping_level(), it was
>
> bool is_private = kvm_slot_has_gmem(slot) && kvm_mem_is_private(kvm, gfn);
>
> and now it is
>
> bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
>
> Is this actually saying that the mapping level is to be fully determined
> from lpage_info as long as this memslot has gmem and
With this change in particular I was not quite sure what to do; maybe it should
stay specific to private memory only? But yeah, the idea was that
kvm_mem_from_gmem() would express:
(a) if guest_memfd only supports private memory, it would translate to
kvm_mem_is_private() -> no change.
(b) with guest_memfd having support for shared memory (+ support being enabled!),
it would only rely on the slot, not gfn information. Because it will all be
consumed from guest_memfd.
This hunk was missing
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d9616ee6acc70..cdcd7ac091b5c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2514,6 +2514,12 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
+static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
+{
+ /* For now, only private memory gets consumed from guest_memfd. */
+ return kvm_mem_is_private(kvm, gfn);
+}
+
>
> A. this specific gfn is backed by gmem, or
> B. the specific gfn is private?
>
> I noticed some other places where kvm_mem_is_private() is left as-is
> [2], is that intentional? Are you not just renaming but splitting out
>> the two cases A and B?
That was the idea, yes.
If we get a private fault and !kvm_mem_is_private(), or a shared fault and
kvm_mem_is_private(), then we should handle it like today.
That is the kvm_mmu_faultin_pfn() case, where we
if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
which can be reached by arch/x86/kvm/svm/svm.c:npf_interception()
if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
error_code |= PFERR_PRIVATE_ACCESS;
In summary: the memory attribute mismatch will be handled as is, but not how
we obtain the gfn.
At least that was the idea (minus the issues in the commit).
What are your thoughts about that direction?
--
Cheers,
David / dhildenb
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 19:42 ` David Hildenbrand
@ 2025-04-15 13:51 ` Fuad Tabba
2025-04-15 17:23 ` David Hildenbrand
0 siblings, 1 reply; 35+ messages in thread
From: Fuad Tabba @ 2025-04-15 13:51 UTC (permalink / raw)
To: David Hildenbrand
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
Hi David,
On Mon, 14 Apr 2025 at 20:42, David Hildenbrand <david@redhat.com> wrote:
>
> On 14.04.25 18:03, Fuad Tabba wrote:
> > Hi David,
> >
> > On Mon, 14 Apr 2025 at 12:51, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 18.03.25 17:18, Fuad Tabba wrote:
> >>> For VMs that allow sharing guest_memfd backed memory in-place,
> >>> handle that memory the same as "private" guest_memfd memory. This
> >>> means that faulting that memory in the host or in the guest will
> >>> go through the guest_memfd subsystem.
> >>>
> >>> Note that the word "private" in the name of the function
> >>> kvm_mem_is_private() doesn't necessarily indicate that the memory
> >>> isn't shared, but is due to the history and evolution of
> >>> guest_memfd and the various names it has received. In effect,
> >>> this function is used to multiplex between the path of a normal
> >>> page fault and the path of a guest_memfd backed page fault.
> >>>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>> include/linux/kvm_host.h | 3 ++-
> >>> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index 601bbcaa5e41..3d5595a71a2a 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>> #else
> >>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>> {
> >>> - return false;
> >>> + return kvm_arch_gmem_supports_shared_mem(kvm) &&
> >>> + kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
> >>> }
> >>> #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> >>>
> >>
> >> I've been thinking long about this, and was wondering if we should instead
> >> clean up the code to decouple the "private" from gmem handling first.
> >>
> >> I know, this was already discussed a couple of times, but faking that
> >> shared memory is private looks odd.
> >
> > I agree. I've been wanting to do that as part of a separate series,
> > since renaming discussions sometimes tend to take a disproportionate
> > amount of time. But the confusion the current naming (and overloading
> > of terms) is causing is probably worse.
>
> Exactly my thoughts. The cleanup diff I was able to come up with is not
> too crazy, so it feels feasible to just include the cleanups as a
> preparation for mmap() where we introduce the concept of shared memory
> in guest_memfd.
>
> >
> >>
> >> I played with the code to start cleaning this up. I ended up with the following
> >> gmem-terminology cleanup patches (not even compile tested)
> >>
> >> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
> >> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
> >> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
> >> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
> >> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
> >> KVM: x86: generalize private fault lookups to "gmem" fault lookups
> >>
> >> https://github.com/davidhildenbrand/linux/tree/gmem_shared_prep
> >>
> >> On top of that, I was wondering if we could look into doing something like
> >> the following. It would also allow for pulling pages out of gmem for
> >> existing SW-protected VMs once they enable shared memory for GMEM IIUC.
> >>
> >>
> >> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >> index 08eebd24a0e18..6f878cab0f466 100644
> >> --- a/arch/x86/kvm/mmu/mmu.c
> >> +++ b/arch/x86/kvm/mmu/mmu.c
> >> @@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> >> {
> >> int max_order, r;
> >>
> >> - if (!kvm_slot_has_gmem(fault->slot)) {
> >> - kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> >> - return -EFAULT;
> >> - }
> >> -
> >> r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> >> &fault->refcounted_page, &max_order);
> >> if (r) {
> >> @@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> >> struct kvm_page_fault *fault)
> >> {
> >> unsigned int foll = fault->write ? FOLL_WRITE : 0;
> >> + bool use_gmem = false;
> >> +
> >> + if (fault->is_private) {
> >> + if (!kvm_slot_has_gmem(fault->slot)) {
> >> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> >> + return -EFAULT;
> >> + }
> >> + use_gmem = true;
> >> + } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
> >> + use_gmem = true;
> >> + }
> >>
> >> - if (fault->is_private)
> >> + if (use_gmem)
> >> return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
> >>
> >> foll |= FOLL_NOWAIT;
> >>
> >>
> >> That is, we'd not claim that things are private when they are not, but instead
> >> teach the code about shared memory coming from gmem.
> >>
> >> There might be some more missing, just throwing it out there if I am completely off.
> >
> > For me these changes seem to be reasonable all in all. I might want to
> > suggest a couple of modifications, but I guess the bigger question is
> > what the KVM maintainers and guest_memfd's main contributors think.
>
> I'm afraid we won't get a reply before we officially send it ...
>
> >
> > Also, how do you suggest we go about this? Send out a separate series
> > first, before continuing with the mapping series? Or have it all as
> > one big series? It could be something to add to the agenda for
> > Thursday.
>
> ... and ideally it would be part of this series. After all, this series
> shrunk a bit :)
True, although Ackerley is working hard on adding more things on top
(mainly selftests though) :) That said, having multiple series
floating around was clearly not the way to go. So yes, this will be
part of this series.
> Feel free to use my commits when helpful: they are still missing
> descriptions and probably have other issues. Feel free to turn my SOB
> into a Co-developed-by+SOB and make yourself the author.
>
> Alternatively, let me know and I can polish them up and we can discuss
> what you have in mind (either here or elsewhere).
>
> I'd suggest we go full-steam on this series to finally get it over the
> finish line :)
Sure. I can take it over from here and bug you whenever I have any questions :)
Cheers,
/fuad
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-15 13:51 ` Fuad Tabba
@ 2025-04-15 17:23 ` David Hildenbrand
0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-04-15 17:23 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx
>>
>> ... and ideally it would be part of this series. After all, this series
>> shrunk a bit :)
>
> True, although Ackerley is working hard on adding more things on top
> (mainly selftests though) :) That said, having multiple series
> floating around was clearly not the way to go. So yes, this will be
> part of this series.
>
>> Feel free to use my commits when helpful: they are still missing
>> descriptions and probably have other issues. Feel free to turn my SOB
>> into a Co-developed-by+SOB and make yourself the author.
>>
>> Alternatively, let me know and I can polish them up and we can discuss
>> what you have in mind (either here or elsewhere).
>>
>> I'd suggest we go full-steam on this series to finally get it over the
>> finish line :)
>
> Sure. I can take it over from here and bug you whenever I have any questions :)
Just let me know if I can be of any help (privately, or on the list).
This has very high priority now for me :)
--
Cheers,
David / dhildenb
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 20:06 ` David Hildenbrand
@ 2025-04-15 21:50 ` Ackerley Tng
2025-04-16 12:53 ` David Hildenbrand
0 siblings, 1 reply; 35+ messages in thread
From: Ackerley Tng @ 2025-04-15 21:50 UTC (permalink / raw)
To: David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx
David Hildenbrand <david@redhat.com> writes:
>>> I've been thinking long about this, and was wondering if we should instead
>>> clean up the code to decouple the "private" from gmem handling first.
>>>
>>
>> Thank you for making this suggestion more concrete, I like the renaming!
>>
>
> Thanks for the fast feedback!
>
>>> I know, this was already discussed a couple of times, but faking that
>>> shared memory is private looks odd.
>>>
>>> I played with the code to start cleaning this up. I ended up with the following
>>> gmem-terminology cleanup patches (not even compile tested)
>>>
>>> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
>>> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
>>> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
>>> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
>>> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
>>
>> Perhaps zooming into this [1] can clarify a lot. In
>> kvm_mmu_max_mapping_level(), it was
>>
>> bool is_private = kvm_slot_has_gmem(slot) && kvm_mem_is_private(kvm, gfn);
>>
>> and now it is
>>
>> bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
>>
>> Is this actually saying that the mapping level is to be fully determined
>> from lpage_info as long as this memslot has gmem and
>
> With this change in particular I was not quite sure what to do, maybe it should
> stay specific to private memory only? But yeah the idea was that
> kvm_mem_from_gmem() would express:
>
> (a) if guest_memfd only supports private memory, it would translate to
> kvm_mem_is_private() -> no change.
>
> (b) with guest_memfd having support for shared memory (+ support being enabled!),
> it would only rely on the slot, not on gfn information, because it will all be
> consumed from guest_memfd.
>
> This hunk was missing
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d9616ee6acc70..cdcd7ac091b5c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2514,6 +2514,12 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> }
> #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>
> +static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> +{
> + /* For now, only private memory gets consumed from guest_memfd. */
> + return kvm_mem_is_private(kvm, gfn);
> +}
> +
>
>
I looked a little deeper and got help from James Houghton on
understanding this too.
Specifically for the usage of kvm_mem_is_private() in
kvm_mmu_max_mapping_level(), the intention there is probably to skip
querying userspace page tables in __kvm_mmu_max_mapping_level() since
private memory will never be faulted into userspace, hence no need to
check.
Hence kvm_mem_is_private() there is really meant to query the
private-ness of the gfn rather than just whether kvm_mem_from_gmem() holds.
But then again, if kvm_mem_from_gmem() holds, guest_memfd should be queried
for max_mapping_level.
shared memory, what page size the page was split to, and what level
it was faulted as. (Exception: if/when guest_memfd supports THP,
depending on how that is done, querying userspace page tables might be
necessary to determine the max_mapping_level)
>>
>> A. this specific gfn is backed by gmem, or
>> B. if the specific gfn is private?
>>
>> I noticed some other places where kvm_mem_is_private() is left as-is
>> [2], is that intentional? Are you not just renaming but splitting out
>> the two cases A and B?
>
> That was the idea, yes.
>
> If we get a private fault and !kvm_mem_is_private(), or a shared fault and
> kvm_mem_is_private(), then we should handle it like today.
>
> That is the kvm_mmu_faultin_pfn() case, where we
>
> if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn)) {
> kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> return -EFAULT;
> }
>
> which can be reached by arch/x86/kvm/svm/svm.c:npf_interception()
>
> if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
> error_code |= PFERR_PRIVATE_ACCESS;
>
> In summary: the memory attribute mismatch will be handled as is, but not how
> we obtain the gfn.
>
> At least that was the idea (-issues in the commit).
>
> What are your thoughts about that direction?
I still like the renaming. :)
I looked into kvm_mem_is_private() and I believe it has the following
uses:
1. Determining max_mapping_level (kvm_mmu_max_mapping_level() and
friends)
2. Querying the kernel's record of private/shared state, which is used
to handle (a) mismatch between fault->private and kernel's record
(handling implicit conversions) (b) how to prefault pages (c)
determining how to fault in KVM_X86_SW_PROTECTED_VMs
So perhaps we could leave kvm_mem_is_private() as not renamed, but as
part of the series introducing mmap and conversions
(CONFIG_KVM_GMEM_SHARED_MEM), we should also have kvm_mem_is_private()
query guest_memfd for shareability status, and perhaps
kvm_mmu_max_mapping_level() could query guest_memfd for page size (after
splitting, etc).
IIUC the maximum mapping level is determined by these factors:
1. Attribute granularity (lpage_info)
2. Page size (guest_memfd for guest_memfd backed memory)
3. Size of mapping in host page table (for non-guest_memfd backed
memory, and important for THP if/when/depending on how guest_memfd
supports THP)
>
> --
> Cheers,
>
> David / dhildenb
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-14 11:51 ` David Hildenbrand
2025-04-14 16:03 ` Fuad Tabba
2025-04-14 18:07 ` Ackerley Tng
@ 2025-04-16 12:30 ` Patrick Roy
2025-04-16 12:41 ` David Hildenbrand
2 siblings, 1 reply; 35+ messages in thread
From: Patrick Roy @ 2025-04-16 12:30 UTC (permalink / raw)
To: David Hildenbrand, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx
Hi!
On Mon, 2025-04-14 at 12:51 +0100, David Hildenbrand wrote:
>
[...]
> On top of that, I was wondering if we could look into doing something like
> the following. It would also allow for pulling pages out of gmem for
> existing SW-protected VMs once they enable shared memory for GMEM IIUC.
>
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 08eebd24a0e18..6f878cab0f466 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> {
> int max_order, r;
>
> - if (!kvm_slot_has_gmem(fault->slot)) {
> - kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> - return -EFAULT;
> - }
> -
> r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> &fault->refcounted_page, &max_order);
> if (r) {
> @@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> unsigned int foll = fault->write ? FOLL_WRITE : 0;
> + bool use_gmem = false;
> +
> + if (fault->is_private) {
> + if (!kvm_slot_has_gmem(fault->slot)) {
> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> + return -EFAULT;
> + }
> + use_gmem = true;
> + } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
> + use_gmem = true;
> + }
>
> - if (fault->is_private)
> + if (use_gmem)
> return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>
> foll |= FOLL_NOWAIT;
>
>
> That is, we'd not claim that things are private when they are not, but instead
> teach the code about shared memory coming from gmem.
>
> There might be some more missing, just throwing it out there if I am completely off.
I think I arrived at the need for this as well while experimenting with
building a Firecracker version that works with my direct map removal
patches.
With this patch series, on ARM, as soon as a memslot has a guest_memfd
associated with it, all guest faults go through kvm_gmem_get_pfn, but on
x86, they go through slot->userspace_addr by default, as
CONFIG_KVM_SW_PROTECTED_VM selects CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES.
There's no real difference between these if slot->userspace_addr can be
GUP'd, but if it's a VMA of a guest_memfd without direct map entries,
faulting through slot->userspace_addr won't work. So on x86 Firecracker
has to formally set the memory attributes to private, while on ARM it
doesn't [1], which is a bit awkward.
David, I couldn't find an implementation of
kvm_slot_has_gmem_with_shared() in the branch you shared, but would it
be something like "slot->userspace_addr points to a gmem VMA,
particularly to a VMA of the gmem that's associated with this memslot,
mapped at the same offset"?
Best,
Patrick
[1]: https://github.com/firecracker-microvm/firecracker/blob/feature/secret-hiding/src/vmm/src/builder.rs#L268
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-16 12:30 ` Patrick Roy
@ 2025-04-16 12:41 ` David Hildenbrand
0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-04-16 12:41 UTC (permalink / raw)
To: Patrick Roy, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx
On 16.04.25 14:30, Patrick Roy wrote:
> Hi!
>
> On Mon, 2025-04-14 at 12:51 +0100, David Hildenbrand wrote:
>>
>
> [...]
>
>> On top of that, I was wondering if we could look into doing something like
>> the following. It would also allow for pulling pages out of gmem for
>> existing SW-protected VMs once they enable shared memory for GMEM IIUC.
>>
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 08eebd24a0e18..6f878cab0f466 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -4495,11 +4495,6 @@ static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
>> {
>> int max_order, r;
>>
>> - if (!kvm_slot_has_gmem(fault->slot)) {
>> - kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>> - return -EFAULT;
>> - }
>> -
>> r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
>> &fault->refcounted_page, &max_order);
>> if (r) {
>> @@ -4518,8 +4513,19 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>> struct kvm_page_fault *fault)
>> {
>> unsigned int foll = fault->write ? FOLL_WRITE : 0;
>> + bool use_gmem = false;
>> +
>> + if (fault->is_private) {
>> + if (!kvm_slot_has_gmem(fault->slot)) {
>> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>> + return -EFAULT;
>> + }
>> + use_gmem = true;
>> + } else if (kvm_slot_has_gmem_with_shared(fault->slot)) {
>> + use_gmem = true;
>> + }
>>
>> - if (fault->is_private)
>> + if (use_gmem)
>> return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>>
>> foll |= FOLL_NOWAIT;
>>
>>
>> That is, we'd not claim that things are private when they are not, but instead
>> teach the code about shared memory coming from gmem.
>>
>> There might be some more missing, just throwing it out there if I am completely off.
>
> I think I arrived at the need for this as well while experimenting with
> building a Firecracker version that works with my direct map removal
> patches.
>
> With this patch series, on ARM, as soon as a memslot has a guest_memfd
> associated with it, all guest faults go through kvm_gmem_get_pfn, but on
> x86, they go through slot->userspace_addr by default, as
> CONFIG_KVM_SW_PROTECTED_VM selects CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES.
> There's no real difference between these if slot->userspace_addr can be
> GUP'd, but if it's a VMA of a guest_memfd without direct map entries,
> faulting through slot->userspace_addr won't work. So on x86 Firecracker
> has to formally set the memory attributes to private, while on ARM it
> doesn't [1], which is a bit awkward.
Exactly. I proposed all that (especially the interactions/expectations
regarding KVM memory slots) as a topic for tomorrow's meeting. See below
on my current idea.
>
> David, I couldn't find an implementation of
> kvm_slot_has_gmem_with_shared() in the branch you shared, but would it
> be something like "slot->userspace_addr points to a gmem VMA,
> particularly to a VMA of the gmem that's associated with this memslot,
> mapped at the same offset"?
That's open for discussion, and the implementation of that will have to
be clarified in the context of this series.
I would have made the KVM slot setup by user space a requirement:
If we enabled "gmem allows for shared memory / mmap" for a MM, then user
space *must* provide a mmap() of that guest_memfd in slot->userspace_addr.
(we might also be able to use a new slot flag to specify that)
With that, we can just say
* In the MMU etc, avoid walking the page tables and just grab the page
from guest_memfd.
* Where not possible (e.g., emulator access as discussed) we can just
access the page in the user space mapping
Does that sound reasonable?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
2025-04-15 21:50 ` Ackerley Tng
@ 2025-04-16 12:53 ` David Hildenbrand
0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-04-16 12:53 UTC (permalink / raw)
To: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx
>>
>
> I looked a little deeper and got help from James Houghton on
> understanding this too.
Great :)
>
> Specifically for the usage of kvm_mem_is_private() in
> kvm_mmu_max_mapping_level(), the intention there is probably to skip
> querying userspace page tables in __kvm_mmu_max_mapping_level() since
> private memory will never be faulted into userspace, hence no need to
> check.
>
> Hence kvm_mem_is_private() there is really meant to query the
> private-ness of the gfn rather than just whether kvm_mem_from_gmem() holds.
>
> But then again, if kvm_mem_from_gmem() holds, guest_memfd should be queried
> for max_mapping_level. guest_memfd would know, for both private and
> shared memory, what page size the page was split to, and what level
> it was faulted as. (Exception: if/when guest_memfd supports THP,
> depending on how that is done, querying userspace page tables might be
> necessary to determine the max_mapping_level)
Okay, so I assume my intuition was right: if we know we can go via the
guest_memfd also for !private memory, then probably no need to consult
the page tables.
Let's discuss that tomorrow in the meeting.
>
>>>
>>> A. this specific gfn is backed by gmem, or
>>> B. if the specific gfn is private?
>>>
>>> I noticed some other places where kvm_mem_is_private() is left as-is
>>> [2], is that intentional? Are you not just renaming but splitting out
>>> the two cases A and B?
>>
>> That was the idea, yes.
>>
>> If we get a private fault and !kvm_mem_is_private(), or a shared fault and
>> kvm_mem_is_private(), then we should handle it like today.
>>
>> That is the kvm_mmu_faultin_pfn() case, where we
>>
>> if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn)) {
>> kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>> return -EFAULT;
>> }
>>
>> which can be reached by arch/x86/kvm/svm/svm.c:npf_interception()
>>
>> if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
>> error_code |= PFERR_PRIVATE_ACCESS;
>>
>> In summary: the memory attribute mismatch will be handled as is, but not how
>> we obtain the gfn.
>>
>> At least that was the idea (-issues in the commit).
>>
>> What are your thoughts about that direction?
>
> I still like the renaming. :)
>
> I looked into kvm_mem_is_private() and I believe it has the following
> uses:
>
> 1. Determining max_mapping_level (kvm_mmu_max_mapping_level() and
> friends)
> 2. Querying the kernel's record of private/shared state, which is used
> to handle (a) mismatch between fault->private and kernel's record
> (handling implicit conversions) (b) how to prefault pages (c)
> determining how to fault in KVM_X86_SW_PROTECTED_VMs
>
> So perhaps we could leave kvm_mem_is_private() as not renamed, but as
> part of the series introducing mmap and conversions
> (CONFIG_KVM_GMEM_SHARED_MEM), we should also have kvm_mem_is_private()
> query guest_memfd for shareability status, and perhaps
> kvm_mmu_max_mapping_level() could query guest_memfd for page size (after
> splitting, etc).
>
Right, that's why I opted to leave kvm_mem_is_private() here and really
only indicate if memory is *actually* private.
> IIUC the maximum mapping level is determined by these factors:
>
> 1. Attribute granularity (lpage_info)
> 2. Page size (guest_memfd for guest_memfd backed memory)
> 3. Size of mapping in host page table (for non-guest_memfd backed
> memory, and important for THP if/when/depending on how guest_memfd
> supports THP)
Right, once private+shared memory comes from guest_memfd, then factor 3
likely no longer applies.
See my reply to Patrick regarding that.
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2025-04-16 12:53 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-18 16:18 [PATCH v7 0/9] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 1/9] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
2025-04-14 10:00 ` David Hildenbrand
2025-04-14 10:15 ` Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 2/9] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages Fuad Tabba
2025-04-14 10:01 ` David Hildenbrand
2025-03-18 16:18 ` [PATCH v7 3/9] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-04-08 12:04 ` Shivank Garg
2025-04-08 13:17 ` Fuad Tabba
2025-04-08 16:58 ` Ackerley Tng
2025-04-09 7:17 ` Shivank Garg
2025-04-10 22:44 ` Ackerley Tng
2025-04-11 10:34 ` Shivank Garg
2025-04-14 10:06 ` David Hildenbrand
2025-04-14 10:15 ` Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory Fuad Tabba
2025-04-14 11:51 ` David Hildenbrand
2025-04-14 16:03 ` Fuad Tabba
2025-04-14 19:42 ` David Hildenbrand
2025-04-15 13:51 ` Fuad Tabba
2025-04-15 17:23 ` David Hildenbrand
2025-04-14 18:07 ` Ackerley Tng
2025-04-14 20:06 ` David Hildenbrand
2025-04-15 21:50 ` Ackerley Tng
2025-04-16 12:53 ` David Hildenbrand
2025-04-16 12:30 ` Patrick Roy
2025-04-16 12:41 ` David Hildenbrand
2025-03-18 16:18 ` [PATCH v7 5/9] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory Fuad Tabba
2025-03-26 14:42 ` kernel test robot
2025-03-18 16:18 ` [PATCH v7 6/9] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 7/9] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 8/9] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
2025-03-18 16:18 ` [PATCH v7 9/9] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-04-01 17:25 ` Ackerley Tng
2025-04-02 8:56 ` Fuad Tabba