* [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
@ 2025-06-11 13:33 Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
` (18 more replies)
0 siblings, 19 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Main changes since v11 [1]:
- Addressed various points of feedback from the last revision.
- Rebased on Linux 6.16-rc1.
This patch series enables mapping of guest_memfd backed memory in the
host. This is useful for VMMs like Firecracker that aim to run guests
entirely backed by guest_memfd [2]. When combined with Patrick's series
for direct map removal [3], this provides additional hardening against
Spectre-like transient execution attacks.
This series also lays the groundwork for restricted mmap() support for
guest_memfd backed memory in the host for Confidential Computing
platforms that permit in-place sharing of guest memory with the host
[4].
Patch breakdown:
Patches 1-7: Primarily refactoring and renaming to decouple the concept
of guest memory being "private" from it being backed by guest_memfd.
Patches 8-9: Add support for in-place shared memory and the ability for
the host to map it. This is gated by a new configuration option, toggled
by a new flag, and advertised to userspace by a new capability
(introduced in patch 16).
Patches 10-15: Implement the x86 and arm64 support for this feature.
Patch 16: Introduces the new capability to advertise this support and
updates the documentation.
Patches 17-18: Add and fix selftests for the new functionality.
For details on how to test this patch series, and on how to boot a guest
that uses the new features, please refer to the instructions in v8 [5], but
use the updated kvmtool for 6.16 [6], since the KVM_CAP_GMEM_SHARED_MEM
number has changed.
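
As a rough illustration of the userspace flow (not a substitute for the
kvmtool instructions above), the untested sketch below creates a guest_memfd
that supports shared memory and maps it in the host. GUEST_MEMFD_FLAG_SUPPORT_SHARED
is only introduced in patch 8 and is therefore defined locally, and error
handling is kept to a minimum:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

#ifndef GUEST_MEMFD_FLAG_SUPPORT_SHARED
#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)	/* from patch 8 */
#endif

int main(void)
{
	struct kvm_create_guest_memfd gmem = {
		.size  = 2UL * 1024 * 1024,
		.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED,
	};
	int kvm_fd, vm_fd, gmem_fd;
	void *mem;

	kvm_fd = open("/dev/kvm", O_RDWR);
	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0) {
		perror("KVM_CREATE_GUEST_MEMFD");
		return 1;
	}

	/* Only allowed when GUEST_MEMFD_FLAG_SUPPORT_SHARED was set. */
	mem = mmap(NULL, gmem.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   gmem_fd, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(mem, 0, gmem.size);	/* host access to guest memory */
	return 0;
}

Binding the resulting fd to a memslot works as before, via
KVM_SET_USER_MEMORY_REGION2 with the KVM_MEM_GUEST_MEMFD flag.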
Cheers,
/fuad
[1] https://lore.kernel.org/all/20250605153800.557144-1-tabba@google.com/
[2] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3] https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
[4] https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
[5] https://lore.kernel.org/all/20250430165655.605595-1-tabba@google.com/
[6] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-basic-6.16
Ackerley Tng (2):
KVM: x86/mmu: Handle guest page faults for guest_memfd with shared
memory
KVM: x86: Consult guest_memfd when computing max_mapping_level
Fuad Tabba (16):
KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
CONFIG_KVM_GENERIC_GMEM_POPULATE
KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
KVM: Fix comments that refer to slots_lock
KVM: Fix comment that refers to kvm uapi header path
KVM: guest_memfd: Allow host to map guest_memfd pages
KVM: guest_memfd: Track shared memory support in memslot
KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs
KVM: arm64: Refactor user_mem_abort()
KVM: arm64: Handle guest_memfd-backed guest page faults
KVM: arm64: Enable host mapping of shared guest_memfd memory
KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
KVM: selftests: guest_memfd mmap() test when mapping is allowed
Documentation/virt/kvm/api.rst | 9 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 189 ++++++++++++----
arch/x86/include/asm/kvm_host.h | 22 +-
arch/x86/kvm/Kconfig | 5 +-
arch/x86/kvm/mmu/mmu.c | 135 ++++++-----
arch/x86/kvm/svm/sev.c | 4 +-
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/x86.c | 4 +-
include/linux/kvm_host.h | 80 +++++--
include/uapi/linux/kvm.h | 2 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 212 +++++++++++++++---
virt/kvm/Kconfig | 14 +-
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/guest_memfd.c | 91 +++++++-
virt/kvm/kvm_main.c | 16 +-
virt/kvm/kvm_mm.h | 4 +-
19 files changed, 630 insertions(+), 169 deletions(-)
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
` (17 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
The option KVM_PRIVATE_MEM enables guest_memfd in general. Subsequent
patches add shared memory support to guest_memfd. Therefore, rename it
to KVM_GMEM to make its purpose clearer.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
include/linux/kvm_host.h | 10 +++++-----
virt/kvm/Kconfig | 8 ++++----
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/kvm_main.c | 4 ++--
virt/kvm/kvm_mm.h | 4 ++--
6 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b4a391929cdb..6e0bbf4c2202 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2269,7 +2269,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
int tdp_max_root_level, int tdp_huge_page_level);
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#else
#define kvm_arch_has_private_mem(kvm) false
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3bde4fb5c6aa..b2c415e81e2e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -601,7 +601,7 @@ struct kvm_memory_slot {
short id;
u16 as_id;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
struct {
/*
* Writes protected by kvm->slots_lock. Acquiring a
@@ -722,7 +722,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
* Arch code must define kvm_arch_has_private_mem if support for private memory
* is enabled.
*/
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
{
return false;
@@ -2527,7 +2527,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
- return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
+ return IS_ENABLED(CONFIG_KVM_GMEM) &&
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
@@ -2537,7 +2537,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
int *max_order);
@@ -2550,7 +2550,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
KVM_BUG_ON(1, kvm);
return -EIO;
}
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 727b542074e7..49df4e32bff7 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -112,19 +112,19 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES
depends on KVM_GENERIC_MMU_NOTIFIER
bool
-config KVM_PRIVATE_MEM
+config KVM_GMEM
select XARRAY_MULTI
bool
config KVM_GENERIC_PRIVATE_MEM
select KVM_GENERIC_MEMORY_ATTRIBUTES
- select KVM_PRIVATE_MEM
+ select KVM_GMEM
bool
config HAVE_KVM_ARCH_GMEM_PREPARE
bool
- depends on KVM_PRIVATE_MEM
+ depends on KVM_GMEM
config HAVE_KVM_ARCH_GMEM_INVALIDATE
bool
- depends on KVM_PRIVATE_MEM
+ depends on KVM_GMEM
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..8d00918d4c8b 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -12,4 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
-kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_KVM_GMEM) += $(KVM)/guest_memfd.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eec82775c5bf..898c3d5a7ba8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4910,7 +4910,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
case KVM_CAP_MEMORY_ATTRIBUTES:
return kvm_supported_mem_attributes(kvm);
#endif
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
case KVM_CAP_GUEST_MEMFD:
return !kvm || kvm_arch_has_private_mem(kvm);
#endif
@@ -5344,7 +5344,7 @@ static long kvm_vm_ioctl(struct file *filp,
case KVM_GET_STATS_FD:
r = kvm_vm_ioctl_get_stats_fd(kvm);
break;
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
case KVM_CREATE_GUEST_MEMFD: {
struct kvm_create_guest_memfd guest_memfd;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..ec311c0d6718 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -67,7 +67,7 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
}
#endif /* HAVE_KVM_PFNCACHE */
-#ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GMEM
void kvm_gmem_init(struct module *module);
int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -91,6 +91,6 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
{
WARN_ON_ONCE(1);
}
-#endif /* CONFIG_KVM_PRIVATE_MEM */
+#endif /* CONFIG_KVM_GMEM */
#endif /* __KVM_MM_H__ */
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
` (16 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
The option KVM_GENERIC_PRIVATE_MEM enables populating a GPA range with
guest data. Rename it to KVM_GENERIC_GMEM_POPULATE to make its purpose
clearer.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/kvm/Kconfig | 4 ++--
include/linux/kvm_host.h | 2 +-
virt/kvm/Kconfig | 2 +-
virt/kvm/guest_memfd.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2eeffcec5382..9151cd82adab 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -46,7 +46,7 @@ config KVM_X86
select HAVE_KVM_PM_NOTIFIER if PM
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_GENERIC_PRE_FAULT_MEMORY
- select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+ select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
select KVM_WERROR if WERROR
config KVM
@@ -157,7 +157,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
- select KVM_GENERIC_PRIVATE_MEM
+ select KVM_GENERIC_GMEM_POPULATE
select HAVE_KVM_ARCH_GMEM_PREPARE
select HAVE_KVM_ARCH_GMEM_INVALIDATE
help
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b2c415e81e2e..7700efc06e35 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2556,7 +2556,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
#endif
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
/**
* kvm_gmem_populate() - Populate/prepare a GPA range with guest data
*
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 49df4e32bff7..559c93ad90be 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -116,7 +116,7 @@ config KVM_GMEM
select XARRAY_MULTI
bool
-config KVM_GENERIC_PRIVATE_MEM
+config KVM_GENERIC_GMEM_POPULATE
select KVM_GENERIC_MEMORY_ATTRIBUTES
select KVM_GMEM
bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..befea51bbc75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -638,7 +638,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
-#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque)
{
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
` (15 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
The function kvm_arch_has_private_mem() indicates whether an architecture
supports guest_memfd. Until now, this support implied the memory was
strictly private.
To decouple guest_memfd support from memory privacy, rename this
function to kvm_arch_supports_gmem().
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 8 ++++----
arch/x86/kvm/mmu/mmu.c | 8 ++++----
include/linux/kvm_host.h | 6 +++---
virt/kvm/kvm_main.c | 6 +++---
4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e0bbf4c2202..3d69da6d2d9e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2270,9 +2270,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_GMEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
#else
-#define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_supports_gmem(kvm) false
#endif
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
@@ -2325,8 +2325,8 @@ enum {
#define HF_SMM_INSIDE_NMI_MASK (1 << 2)
# define KVM_MAX_NR_ADDRESS_SPACES 2
-/* SMM is currently unsupported for guests with private memory. */
-# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
+/* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
+# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)
# define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
#else
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cbc84c6abc2e..e7ecf089780a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4910,7 +4910,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
if (r)
return r;
- if (kvm_arch_has_private_mem(vcpu->kvm) &&
+ if (kvm_arch_supports_gmem(vcpu->kvm) &&
kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
error_code |= PFERR_PRIVATE_ACCESS;
@@ -7707,7 +7707,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
* Zapping SPTEs in this case ensures KVM will reassess whether or not
* a hugepage can be used for affected ranges.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
return false;
if (WARN_ON_ONCE(range->end <= range->start))
@@ -7786,7 +7786,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
* a range that has PRIVATE GFNs, and conversely converting a range to
* SHARED may now allow hugepages.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
return false;
/*
@@ -7842,7 +7842,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
{
int level;
- if (!kvm_arch_has_private_mem(kvm))
+ if (!kvm_arch_supports_gmem(kvm))
return;
for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7700efc06e35..a0e661aa3f8a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -719,11 +719,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
#endif
/*
- * Arch code must define kvm_arch_has_private_mem if support for private memory
+ * Arch code must define kvm_arch_supports_gmem if support for guest_memfd
* is enabled.
*/
-#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
-static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
+static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
{
return false;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 898c3d5a7ba8..6efbea208fa6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1588,7 +1588,7 @@ static int check_memory_region_flags(struct kvm *kvm,
{
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
- if (kvm_arch_has_private_mem(kvm))
+ if (kvm_arch_supports_gmem(kvm))
valid_flags |= KVM_MEM_GUEST_MEMFD;
/* Dirty logging private memory is not currently supported. */
@@ -2419,7 +2419,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
static u64 kvm_supported_mem_attributes(struct kvm *kvm)
{
- if (!kvm || kvm_arch_has_private_mem(kvm))
+ if (!kvm || kvm_arch_supports_gmem(kvm))
return KVM_MEMORY_ATTRIBUTE_PRIVATE;
return 0;
@@ -4912,7 +4912,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#endif
#ifdef CONFIG_KVM_GMEM
case KVM_CAP_GUEST_MEMFD:
- return !kvm || kvm_arch_has_private_mem(kvm);
+ return !kvm || kvm_arch_supports_gmem(kvm);
#endif
default:
break;
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (2 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-13 13:57 ` Ackerley Tng
2025-06-13 20:35 ` Sean Christopherson
2025-06-11 13:33 ` [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
` (14 subsequent siblings)
18 siblings, 2 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
The bool has_private_mem is used to indicate whether guest_memfd is
supported. Rename it to supports_gmem to make its meaning clearer and to
decouple memory being private from guest_memfd.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 4 ++--
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/x86.c | 3 +--
4 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d69da6d2d9e..4bc50c1e21bd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1341,7 +1341,7 @@ struct kvm_arch {
unsigned int indirect_shadow_pages;
u8 mmu_valid_gen;
u8 vm_type;
- bool has_private_mem;
+ bool supports_gmem;
bool has_protected_state;
bool pre_fault_allowed;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
@@ -2270,7 +2270,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_GMEM
-#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
#else
#define kvm_arch_supports_gmem(kvm) false
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e7ecf089780a..c4e10797610c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3488,7 +3488,7 @@ static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault
* on RET_PF_SPURIOUS until the update completes, or an actual spurious
* case might go down the slow path. Either case will resolve itself.
*/
- if (kvm->arch.has_private_mem &&
+ if (kvm->arch.supports_gmem &&
fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
return false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ab9b947dbf4f..67ab05fd3517 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5180,8 +5180,8 @@ static int svm_vm_init(struct kvm *kvm)
(type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
to_kvm_sev_info(kvm)->need_init = true;
- kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
- kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
+ kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
+ kvm->arch.pre_fault_allowed = !kvm->arch.supports_gmem;
}
if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b58a74c1722d..401256ee817f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12778,8 +12778,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return -EINVAL;
kvm->arch.vm_type = type;
- kvm->arch.has_private_mem =
- (type == KVM_X86_SW_PROTECTED_VM);
+ kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
/* Decided by the vendor code for other VM types. */
kvm->arch.pre_fault_allowed =
type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (3 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
` (13 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
The function kvm_slot_can_be_private() is used to check whether a memory
slot is backed by guest_memfd. Rename it to kvm_slot_has_gmem() to make
that clearer and to decouple memory being private from guest_memfd.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/svm/sev.c | 4 ++--
include/linux/kvm_host.h | 2 +-
virt/kvm/guest_memfd.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c4e10797610c..75b7b02cfcb7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3285,7 +3285,7 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn)
{
- bool is_private = kvm_slot_can_be_private(slot) &&
+ bool is_private = kvm_slot_has_gmem(slot) &&
kvm_mem_is_private(kvm, gfn);
return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
@@ -4498,7 +4498,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
{
int max_order, r;
- if (!kvm_slot_can_be_private(fault->slot)) {
+ if (!kvm_slot_has_gmem(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5a69b657dae9..ed85634eb2bd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2319,7 +2319,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
mutex_lock(&kvm->slots_lock);
memslot = gfn_to_memslot(kvm, params.gfn_start);
- if (!kvm_slot_can_be_private(memslot)) {
+ if (!kvm_slot_has_gmem(memslot)) {
ret = -EINVAL;
goto out;
}
@@ -4644,7 +4644,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
}
slot = gfn_to_memslot(kvm, gfn);
- if (!kvm_slot_can_be_private(slot)) {
+ if (!kvm_slot_has_gmem(slot)) {
pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
gpa);
return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a0e661aa3f8a..76b85099da99 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -614,7 +614,7 @@ struct kvm_memory_slot {
#endif
};
-static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+static inline bool kvm_slot_has_gmem(const struct kvm_memory_slot *slot)
{
return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
}
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index befea51bbc75..6db515833f61 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -654,7 +654,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
return -EINVAL;
slot = gfn_to_memslot(kvm, start_gfn);
- if (!kvm_slot_can_be_private(slot))
+ if (!kvm_slot_has_gmem(slot))
return -EINVAL;
file = kvm_gmem_get_file(slot);
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (4 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
` (12 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Fix comments so that they refer to slots_lock instead of slots_locks
(remove trailing s).
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 76b85099da99..aec8e4182a65 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -859,7 +859,7 @@ struct kvm {
struct notifier_block pm_notifier;
#endif
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
- /* Protected by slots_locks (for writes) and RCU (for reads) */
+ /* Protected by slots_lock (for writes) and RCU (for reads) */
struct xarray mem_attr_array;
#endif
char stats_id[KVM_STATS_NAME_SIZE];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6efbea208fa6..d41bcc6a78b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -331,7 +331,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
* All current use cases for flushing the TLBs for a specific memslot
* are related to dirty logging, and many do the TLB flush out of
* mmu_lock. The interaction between the various operations on memslot
- * must be serialized by slots_locks to ensure the TLB flush from one
+ * must be serialized by slots_lock to ensure the TLB flush from one
* operation is observed by any other operation on the same memslot.
*/
lockdep_assert_held(&kvm->slots_lock);
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (5 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
` (11 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
The comment that points to the path where the user-visible memslot flags are
defined refers to an outdated path and contains a typo.
Update the comment to refer to the correct path.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index aec8e4182a65..9a6712151a74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -52,7 +52,7 @@
/*
* The bit 16 ~ bit 31 of kvm_userspace_memory_region::flags are internally
* used in kvm, other bits are visible for userspace which are defined in
- * include/linux/kvm_h.
+ * include/uapi/linux/kvm.h.
*/
#define KVM_MEMSLOT_INVALID (1UL << 16)
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (6 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-12 16:16 ` Shivank Garg
` (2 more replies)
2025-06-11 13:33 ` [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
` (10 subsequent siblings)
18 siblings, 3 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
This patch enables support for shared memory in guest_memfd, including
mapping that memory from host userspace.
This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
flag at creation time.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
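Not part of the patch: a small sketch of the userspace-visible contract of
the new mmap handler, using a hypothetical helper name:

#include <sys/mman.h>

static void *map_gmem_shared(int gmem_fd, size_t size)
{
	/*
	 * Succeeds only if the guest_memfd was created with
	 * GUEST_MEMFD_FLAG_SUPPORT_SHARED (otherwise mmap() fails with
	 * ENODEV) and the mapping is MAP_SHARED (a MAP_PRIVATE request
	 * gets EINVAL). Page faults on the returned mapping are served by
	 * kvm_gmem_fault_shared(), which zeroes folios that have not been
	 * prepared yet.
	 */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    gmem_fd, 0);
}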
include/linux/kvm_host.h | 13 +++++++
include/uapi/linux/kvm.h | 1 +
virt/kvm/Kconfig | 4 +++
virt/kvm/guest_memfd.c | 73 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 91 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9a6712151a74..6b63556ca150 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
}
#endif
+/*
+ * Returns true if this VM supports shared mem in guest_memfd.
+ *
+ * Arch code must define kvm_arch_supports_gmem_shared_mem if support for
+ * guest_memfd is enabled.
+ */
+#if !defined(kvm_arch_supports_gmem_shared_mem)
+static inline bool kvm_arch_supports_gmem_shared_mem(struct kvm *kvm)
+{
+ return false;
+}
+#endif
+
#ifndef kvm_arch_has_readonly_mem
static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
{
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d00b85cb168c..cb19150fd595 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
struct kvm_create_guest_memfd {
__u64 size;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 559c93ad90be..e90884f74404 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
config HAVE_KVM_ARCH_GMEM_INVALIDATE
bool
depends on KVM_GMEM
+
+config KVM_GMEM_SHARED_MEM
+ select KVM_GMEM
+ bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6db515833f61..06616b6b493b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
return gfn - slot->base_gfn + slot->gmem.pgoff;
}
+static bool kvm_gmem_supports_shared(struct inode *inode)
+{
+ const u64 flags = (u64)inode->i_private;
+
+ if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
+ return false;
+
+ return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+}
+
+static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
+{
+ struct inode *inode = file_inode(vmf->vma->vm_file);
+ struct folio *folio;
+ vm_fault_t ret = VM_FAULT_LOCKED;
+
+ if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+ return VM_FAULT_SIGBUS;
+
+ folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+ if (IS_ERR(folio)) {
+ int err = PTR_ERR(folio);
+
+ if (err == -EAGAIN)
+ return VM_FAULT_RETRY;
+
+ return vmf_error(err);
+ }
+
+ if (WARN_ON_ONCE(folio_test_large(folio))) {
+ ret = VM_FAULT_SIGBUS;
+ goto out_folio;
+ }
+
+ if (!folio_test_uptodate(folio)) {
+ clear_highpage(folio_page(folio, 0));
+ kvm_gmem_mark_prepared(folio);
+ }
+
+ vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+ if (ret != VM_FAULT_LOCKED) {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+ return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+ .fault = kvm_gmem_fault_shared,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ if (!kvm_gmem_supports_shared(file_inode(file)))
+ return -ENODEV;
+
+ if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+ (VM_SHARED | VM_MAYSHARE)) {
+ return -EINVAL;
+ }
+
+ vma->vm_ops = &kvm_gmem_vm_ops;
+
+ return 0;
+}
+
static struct file_operations kvm_gmem_fops = {
+ .mmap = kvm_gmem_mmap,
.open = generic_file_open,
.release = kvm_gmem_release,
.fallocate = kvm_gmem_fallocate,
@@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
u64 flags = args->flags;
u64 valid_flags = 0;
+ if (kvm_arch_supports_gmem_shared_mem(kvm))
+ valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
if (flags & ~valid_flags)
return -EINVAL;
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (7 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
` (9 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Add a new internal flag, in the top half of memslot->flags (which is reserved
for internal use in KVM), to track when a guest_memfd-backed slot supports
shared memory.
This avoids repeatedly checking the underlying guest_memfd file for
shared memory support, which requires taking a reference on the file.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
include/linux/kvm_host.h | 11 ++++++++++-
virt/kvm/guest_memfd.c | 2 ++
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6b63556ca150..bba7d2c14177 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -54,7 +54,8 @@
* used in kvm, other bits are visible for userspace which are defined in
* include/uapi/linux/kvm.h.
*/
-#define KVM_MEMSLOT_INVALID (1UL << 16)
+#define KVM_MEMSLOT_INVALID (1UL << 16)
+#define KVM_MEMSLOT_SUPPORTS_GMEM_SHARED (1UL << 17)
/*
* Bit 63 of the memslot generation number is an "update in-progress flag",
@@ -2525,6 +2526,14 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}
+static inline bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
+{
+ if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
+ return false;
+
+ return slot->flags & KVM_MEMSLOT_SUPPORTS_GMEM_SHARED;
+}
+
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
{
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 06616b6b493b..73b0aa2bc45f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -592,6 +592,8 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
*/
WRITE_ONCE(slot->gmem.file, file);
slot->gmem.pgoff = start;
+ if (kvm_gmem_supports_shared(inode))
+ slot->flags |= KVM_MEMSLOT_SUPPORTS_GMEM_SHARED;
xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
filemap_invalidate_unlock(inode->i_mapping);
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (8 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-13 22:08 ` Sean Christopherson
2025-06-11 13:33 ` [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
` (8 subsequent siblings)
18 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
From: Ackerley Tng <ackerleytng@google.com>
For memslots backed by guest_memfd with shared mem support, the KVM MMU
must always fault in pages from guest_memfd, and not from the host
userspace_addr. Update the fault handler to do so.
This patch also refactors related function names for accuracy:
kvm_mem_is_private() returns true only when the current private/shared
state (in the CoCo sense) of the memory is private, and returns false if
the current state is shared explicitly or impicitly, e.g., belongs to a
non-CoCo VM.
kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
fault in not just private memory, but more generally, from guest_memfd.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
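Not part of the patch: a condensed summary of where guest page faults are
served from after this change (with no in-place conversion support yet),
based on fault_from_gmem() and the kvm_mem_is_private() update below:

  memslot backing               fault->is_private   faulted in from
  ----------------------------  -----------------   -------------------
  guest_memfd, shared support   false (implied)     guest_memfd
  guest_memfd, private only     true                guest_memfd
  guest_memfd, private only     false               host userspace_addr
  no guest_memfd                false               host userspace_addr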
arch/x86/kvm/mmu/mmu.c | 38 +++++++++++++++++++++++---------------
include/linux/kvm_host.h | 25 +++++++++++++++++++++++--
2 files changed, 46 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 75b7b02cfcb7..2aab5a00caee 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3291,6 +3291,11 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
}
+static inline bool fault_from_gmem(struct kvm_page_fault *fault)
+{
+ return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
+}
+
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_memory_slot *slot = fault->slot;
@@ -4467,21 +4472,25 @@ static inline u8 kvm_max_level_for_order(int order)
return PG_LEVEL_4K;
}
-static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
- u8 max_level, int gmem_order)
+static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
+ struct kvm_page_fault *fault,
+ int order)
{
- u8 req_max_level;
+ u8 max_level = fault->max_level;
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
- max_level = min(kvm_max_level_for_order(gmem_order), max_level);
+ max_level = min(kvm_max_level_for_order(order), max_level);
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
- req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
- if (req_max_level)
- max_level = min(max_level, req_max_level);
+ if (fault->is_private) {
+ u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
+
+ if (level)
+ max_level = min(max_level, level);
+ }
return max_level;
}
@@ -4493,10 +4502,10 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
r == RET_PF_RETRY, fault->map_writable);
}
-static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
- int max_order, r;
+ int gmem_order, r;
if (!kvm_slot_has_gmem(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
@@ -4504,15 +4513,14 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
}
r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
- &fault->refcounted_page, &max_order);
+ &fault->refcounted_page, &gmem_order);
if (r) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return r;
}
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
- fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
- fault->max_level, max_order);
+ fault->max_level = kvm_max_level_for_fault_and_order(vcpu->kvm, fault, gmem_order);
return RET_PF_CONTINUE;
}
@@ -4522,8 +4530,8 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
{
unsigned int foll = fault->write ? FOLL_WRITE : 0;
- if (fault->is_private)
- return kvm_mmu_faultin_pfn_private(vcpu, fault);
+ if (fault_from_gmem(fault))
+ return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
foll |= FOLL_NOWAIT;
fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bba7d2c14177..8f7069385189 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2547,10 +2547,31 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
+/*
+ * Returns true if the given gfn's private/shared status (in the CoCo sense) is
+ * private.
+ *
+ * A return value of false indicates that the gfn is explicitly or implicitly
+ * shared (i.e., non-CoCo VMs).
+ */
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
- return IS_ENABLED(CONFIG_KVM_GMEM) &&
- kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+ struct kvm_memory_slot *slot;
+
+ if (!IS_ENABLED(CONFIG_KVM_GMEM))
+ return false;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
+ /*
+ * Without in-place conversion support, if a guest_memfd memslot
+ * supports shared memory, then all the slot's memory is
+ * considered not private, i.e., implicitly shared.
+ */
+ return false;
+ }
+
+ return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
--
2.50.0.rc0.642.g800a2b2222-goog
* [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (9 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs Fuad Tabba
` (7 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
From: Ackerley Tng <ackerleytng@google.com>
This patch adds kvm_gmem_max_mapping_level(), which always returns
PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
When guest_memfd supports shared memory, max_mapping_level (especially
when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
from recover_huge_pages_range()) should take input from
guest_memfd.
Input from guest_memfd should be taken in these cases:
+ if the memslot supports shared memory (guest_memfd is used for
shared memory, or in future both shared and private memory) or
+ if the memslot is only used for private memory and that gfn is
private.
If the memslot doesn't use guest_memfd, figure out the
max_mapping_level using the host page tables like before.
This patch also refactors and inlines the other call to
__kvm_mmu_max_mapping_level().
In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
provided (if applicable) in fault->max_level. Hence, there is no need
to query guest_memfd.
lpage_info is queried like before, and then if the fault is not from
guest_memfd, adjust fault->req_level based on input from host page
tables.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/kvm/mmu/mmu.c | 87 +++++++++++++++++++++++++---------------
include/linux/kvm_host.h | 11 +++++
virt/kvm/guest_memfd.c | 12 ++++++
3 files changed, 78 insertions(+), 32 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2aab5a00caee..b31c4750d02e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3258,12 +3258,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
return level;
}
-static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
- const struct kvm_memory_slot *slot,
- gfn_t gfn, int max_level, bool is_private)
+static int kvm_lpage_info_max_mapping_level(struct kvm *kvm,
+ const struct kvm_memory_slot *slot,
+ gfn_t gfn, int max_level)
{
struct kvm_lpage_info *linfo;
- int host_level;
max_level = min(max_level, max_huge_page_level);
for ( ; max_level > PG_LEVEL_4K; max_level--) {
@@ -3272,28 +3271,61 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
break;
}
- if (is_private)
- return max_level;
+ return max_level;
+}
+
+static inline u8 kvm_max_level_for_order(int order)
+{
+ BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+ KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+ order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+ order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+ return PG_LEVEL_1G;
+
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
+ gfn_t gfn, int max_level)
+{
+ int max_order;
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
- host_level = host_pfn_mapping_level(kvm, gfn, slot);
- return min(host_level, max_level);
+ max_order = kvm_gmem_mapping_order(slot, gfn);
+ return min(max_level, kvm_max_level_for_order(max_order));
}
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn)
{
- bool is_private = kvm_slot_has_gmem(slot) &&
- kvm_mem_is_private(kvm, gfn);
+ int max_level;
- return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
+ max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
+ if (max_level == PG_LEVEL_4K)
+ return PG_LEVEL_4K;
+
+ if (kvm_slot_has_gmem(slot) &&
+ (kvm_gmem_memslot_supports_shared(slot) ||
+ kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+ return kvm_gmem_max_mapping_level(slot, gfn, max_level);
+ }
+
+ return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
}
static inline bool fault_from_gmem(struct kvm_page_fault *fault)
{
- return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
+ return fault->is_private ||
+ (kvm_slot_has_gmem(fault->slot) &&
+ kvm_gmem_memslot_supports_shared(fault->slot));
}
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -3316,12 +3348,20 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* Enforce the iTLB multihit workaround after capturing the requested
* level, which will be used to do precise, accurate accounting.
*/
- fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
- fault->gfn, fault->max_level,
- fault->is_private);
+ fault->req_level = kvm_lpage_info_max_mapping_level(vcpu->kvm, slot,
+ fault->gfn, fault->max_level);
if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
return;
+ if (!fault_from_gmem(fault)) {
+ int host_level;
+
+ host_level = host_pfn_mapping_level(vcpu->kvm, fault->gfn, slot);
+ fault->req_level = min(fault->req_level, host_level);
+ if (fault->req_level == PG_LEVEL_4K)
+ return;
+ }
+
/*
* mmu_invalidate_retry() was successful and mmu_lock is held, so
* the pmd can't be split from under us.
@@ -4455,23 +4495,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
vcpu->stat.pf_fixed++;
}
-static inline u8 kvm_max_level_for_order(int order)
-{
- BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
- KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
- order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
- order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
- if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
- return PG_LEVEL_1G;
-
- if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
- return PG_LEVEL_2M;
-
- return PG_LEVEL_4K;
-}
-
static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
struct kvm_page_fault *fault,
int order)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8f7069385189..58d7761c2a90 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2574,6 +2574,10 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+ return 0;
+}
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return false;
@@ -2584,6 +2588,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
int *max_order);
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
#else
static inline int kvm_gmem_get_pfn(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2593,6 +2598,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
KVM_BUG_ON(1, kvm);
return -EIO;
}
+static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ BUILD_BUG();
+ return 0;
+}
#endif /* CONFIG_KVM_GMEM */
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 73b0aa2bc45f..ebdb2d8bf57a 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -713,6 +713,18 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
+/*
+ * Returns the mapping order for this @gfn in @slot.
+ *
+ * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
+ * called now.
+ */
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
+
#ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque)
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (10 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
` (6 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Define the architecture-specific macro to enable shared memory support
in guest_memfd for ordinary, i.e., non-CoCo, VM types, specifically
KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
enabled, and mark KVM_X86_DEFAULT_VM as supporting guest_memfd so that
the new flag is usable with the default VM type.
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/x86/include/asm/kvm_host.h | 10 ++++++++++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/x86.c | 3 ++-
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4bc50c1e21bd..7b9ccdd99f32 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2271,8 +2271,18 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_GMEM
#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
+
+/*
+ * CoCo VMs with hardware support that use guest_memfd only for backing private
+ * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
+ */
+#define kvm_arch_supports_gmem_shared_mem(kvm) \
+ (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) && \
+ ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM || \
+ (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
#else
#define kvm_arch_supports_gmem(kvm) false
+#define kvm_arch_supports_gmem_shared_mem(kvm) false
#endif
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 9151cd82adab..29845a286430 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -47,6 +47,7 @@ config KVM_X86
select KVM_GENERIC_HARDWARE_ENABLING
select KVM_GENERIC_PRE_FAULT_MEMORY
select KVM_GENERIC_GMEM_POPULATE if KVM_SW_PROTECTED_VM
+ select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
select KVM_WERROR if WERROR
config KVM
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 401256ee817f..e21f5f2fe059 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12778,7 +12778,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return -EINVAL;
kvm->arch.vm_type = type;
- kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
+ kvm->arch.supports_gmem =
+ type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
/* Decided by the vendor code for other VM types. */
kvm->arch.pre_fault_allowed =
type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort()
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (11 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
` (5 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
To simplify the code and to make the assumptions clearer,
refactor user_mem_abort() by immediately setting force_pte to
true when dirty logging is active, instead of deferring that decision.
Remove the comment about logging_active being guaranteed to never be
true for VM_PFNMAP memslots, since it's not actually correct.
Move code that will be reused in the following patch into separate
functions.
Also apply a few other small tidy-ups.
No functional change intended.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/mmu.c | 100 ++++++++++++++++++++++++-------------------
1 file changed, 55 insertions(+), 45 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2942ec92c5a4..58662e0ef13e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1470,13 +1470,56 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
return vma->vm_flags & VM_MTE_ALLOWED;
}
+static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
+ void **memcache)
+{
+ int min_pages;
+
+ if (!is_protected_kvm_enabled())
+ *memcache = &vcpu->arch.mmu_page_cache;
+ else
+ *memcache = &vcpu->arch.pkvm_memcache;
+
+ if (!topup_memcache)
+ return 0;
+
+ min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+
+ if (!is_protected_kvm_enabled())
+ return kvm_mmu_topup_memory_cache(*memcache, min_pages);
+
+ return topup_hyp_memcache(*memcache, min_pages);
+}
+
+/*
+ * Potentially reduce shadow S2 permissions to match the guest's own S2. For
+ * exec faults, we'd only reach this point if the guest actually allowed it (see
+ * kvm_s2_handle_perm_fault).
+ *
+ * Also encode the level of the original translation in the SW bits of the leaf
+ * entry as a proxy for the span of that translation. This will be retrieved on
+ * TLB invalidation from the guest and used to limit the invalidation scope if a
+ * TTL hint or a range isn't provided.
+ */
+static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
+ enum kvm_pgtable_prot *prot,
+ bool *writable)
+{
+ *writable &= kvm_s2_trans_writable(nested);
+ if (!kvm_s2_trans_readable(nested))
+ *prot &= ~KVM_PGTABLE_PROT_R;
+
+ *prot |= kvm_encode_nested_level(nested);
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
bool fault_is_perm)
{
int ret = 0;
- bool write_fault, writable, force_pte = false;
+ bool topup_memcache;
+ bool write_fault, writable;
bool exec_fault, mte_allowed;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
@@ -1488,6 +1531,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn_t gfn;
kvm_pfn_t pfn;
bool logging_active = memslot_is_logging(memslot);
+ bool force_pte = logging_active;
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
@@ -1505,28 +1549,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
- if (!is_protected_kvm_enabled())
- memcache = &vcpu->arch.mmu_page_cache;
- else
- memcache = &vcpu->arch.pkvm_memcache;
-
/*
* Permission faults just need to update the existing leaf entry,
* and so normally don't require allocations from the memcache. The
* only exception to this is when dirty logging is enabled at runtime
* and a write fault needs to collapse a block entry into a table.
*/
- if (!fault_is_perm || (logging_active && write_fault)) {
- int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
-
- if (!is_protected_kvm_enabled())
- ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
- else
- ret = topup_hyp_memcache(memcache, min_pages);
-
- if (ret)
- return ret;
- }
+ topup_memcache = !fault_is_perm || (logging_active && write_fault);
+ ret = prepare_mmu_memcache(vcpu, topup_memcache, &memcache);
+ if (ret)
+ return ret;
/*
* Let's check if we will get back a huge page backed by hugetlbfs, or
@@ -1540,16 +1572,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
- /*
- * logging_active is guaranteed to never be true for VM_PFNMAP
- * memslots.
- */
- if (logging_active) {
- force_pte = true;
+ if (force_pte)
vma_shift = PAGE_SHIFT;
- } else {
+ else
vma_shift = get_vma_page_shift(vma, hva);
- }
switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED
@@ -1601,7 +1627,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
max_map_size = PAGE_SIZE;
force_pte = (max_map_size == PAGE_SIZE);
- vma_pagesize = min(vma_pagesize, (long)max_map_size);
+ vma_pagesize = min_t(long, vma_pagesize, max_map_size);
}
/*
@@ -1630,7 +1656,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
* with the smp_wmb() in kvm_mmu_invalidate_end().
*/
- mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+ mmu_seq = kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
@@ -1665,24 +1691,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault && device)
return -ENOEXEC;
- /*
- * Potentially reduce shadow S2 permissions to match the guest's own
- * S2. For exec faults, we'd only reach this point if the guest
- * actually allowed it (see kvm_s2_handle_perm_fault).
- *
- * Also encode the level of the original translation in the SW bits
- * of the leaf entry as a proxy for the span of that translation.
- * This will be retrieved on TLB invalidation from the guest and
- * used to limit the invalidation scope if a TTL hint or a range
- * isn't provided.
- */
- if (nested) {
- writable &= kvm_s2_trans_writable(nested);
- if (!kvm_s2_trans_readable(nested))
- prot &= ~KVM_PGTABLE_PROT_R;
-
- prot |= kvm_encode_nested_level(nested);
- }
+ if (nested)
+ adjust_nested_fault_perms(nested, &prot, &writable);
kvm_fault_lock(kvm);
pgt = vcpu->arch.hw_mmu->pgt;
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (12 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-12 17:33 ` James Houghton
2025-06-11 13:33 ` [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
` (4 subsequent siblings)
18 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Add arm64 support for handling guest page faults on guest_memfd backed
memslots. Until guest_memfd supports huge pages, the fault granule is
restricted to PAGE_SIZE.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/mmu.c | 82 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 79 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 58662e0ef13e..71f8b53683e7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1512,6 +1512,78 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
*prot |= kvm_encode_nested_level(nested);
}
+#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
+
+static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+ struct kvm_s2_trans *nested,
+ struct kvm_memory_slot *memslot, bool is_perm)
+{
+ bool write_fault, exec_fault, writable;
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+ enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+ struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+ struct page *page;
+ struct kvm *kvm = vcpu->kvm;
+ void *memcache;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+ int ret;
+
+ ret = prepare_mmu_memcache(vcpu, true, &memcache);
+ if (ret)
+ return ret;
+
+ if (nested)
+ gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
+ else
+ gfn = fault_ipa >> PAGE_SHIFT;
+
+ write_fault = kvm_is_write_fault(vcpu);
+ exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+
+ if (write_fault && exec_fault) {
+ kvm_err("Simultaneous write and execution fault\n");
+ return -EFAULT;
+ }
+
+ if (is_perm && !write_fault && !exec_fault) {
+ kvm_err("Unexpected L2 read permission error\n");
+ return -EFAULT;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
+ if (ret) {
+ kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
+ write_fault, exec_fault, false);
+ return ret;
+ }
+
+ writable = !(memslot->flags & KVM_MEM_READONLY);
+
+ if (nested)
+ adjust_nested_fault_perms(nested, &prot, &writable);
+
+ if (writable)
+ prot |= KVM_PGTABLE_PROT_W;
+
+ if (exec_fault ||
+ (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
+ (!nested || kvm_s2_trans_executable(nested))))
+ prot |= KVM_PGTABLE_PROT_X;
+
+ kvm_fault_lock(kvm);
+ ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
+ __pfn_to_phys(pfn), prot,
+ memcache, flags);
+ kvm_release_faultin_page(kvm, page, !!ret, writable);
+ kvm_fault_unlock(kvm);
+
+ if (writable && !ret)
+ mark_page_dirty_in_slot(kvm, memslot, gfn);
+
+ return ret != -EAGAIN ? ret : 0;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1536,7 +1608,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
struct page *page;
- enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1963,8 +2035,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock;
}
- ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
- esr_fsc_is_permission_fault(esr));
+ if (kvm_slot_has_gmem(memslot))
+ ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+ esr_fsc_is_permission_fault(esr));
+ else
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+ esr_fsc_is_permission_fault(esr));
if (ret == 0)
ret = 1;
out:
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (13 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
` (3 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Enable the host mapping of guest_memfd-backed memory on arm64.
This applies to all current arm64 VM types that support guest_memfd.
Future VM types can restrict this behavior via the
kvm_arch_supports_gmem_shared_mem() hook if needed.
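To illustrate the effect of the new memslot check from userspace, a
hypothetical snippet follows. It assumes the KVM_SET_USER_MEMORY_REGION2
ioctl, the KVM_MEM_GUEST_MEMFD flag and struct
kvm_userspace_memory_region2 from the upstream UAPI headers; none of
these are introduced by this patch, and the guest address used is an
arbitrary example:

#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * On arm64, after this patch, binding a guest_memfd that was created
 * without GUEST_MEMFD_FLAG_SUPPORT_SHARED to a memslot is expected to
 * fail with EINVAL.
 */
static int bind_gmem_slot(int vm_fd, int gmem_fd, void *host_mem,
			  unsigned long long size)
{
	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = 0x80000000ULL,
		.memory_size = size,
		.userspace_addr = (unsigned long long)host_mem,
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = 0,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}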
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_host.h | 4 ++++
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 7 +++++++
3 files changed, 12 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6ce2c5173482..0cd26219a12e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1655,5 +1655,9 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
void check_feature_map(void);
+#ifdef CONFIG_KVM_GMEM
+#define kvm_arch_supports_gmem(kvm) true
+#define kvm_arch_supports_gmem_shared_mem(kvm) IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
+#endif
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 713248f240e0..87120d46919a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -37,6 +37,7 @@ menuconfig KVM
select HAVE_KVM_VCPU_RUN_PID_CHANGE
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
+ select KVM_GMEM_SHARED_MEM
help
Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 71f8b53683e7..55ac03f277e0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2274,6 +2274,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
return -EFAULT;
+ /*
+ * Only support guest_memfd backed memslots with shared memory, since
+ * there aren't any CoCo VMs that support only private memory on arm64.
+ */
+ if (kvm_slot_has_gmem(new) && !kvm_gmem_memslot_supports_shared(new))
+ return -EINVAL;
+
hva = new->userspace_addr;
reg_end = hva + (new->npages << PAGE_SHIFT);
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (14 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
` (2 subsequent siblings)
18 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
This patch introduces the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
indicates that guest_memfd supports shared memory (when enabled via the
GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time). This support is
limited to certain VM types, determined per architecture.
This patch also updates the KVM documentation with details on the new
capability, flag, and other information about support for shared memory
in guest_memfd.
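For a concrete picture of the userspace flow, here is a minimal,
hypothetical sketch of how a VMM might probe the new capability, create
a shared-capable guest_memfd, and map it. It assumes UAPI headers that
already contain the definitions added by this series and is not part of
the patch itself:

#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Returns a host mapping of shared guest_memfd memory, or NULL on failure. */
static void *map_shared_gmem(int vm_fd, size_t size)
{
	struct kvm_create_guest_memfd gmem = {
		.size = size,
		.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED,
	};
	void *mem;
	int gmem_fd;

	/* The capability is per-VM: only some VM types support shared gmem. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) <= 0)
		return NULL;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0)
		return NULL;

	/* mmap() succeeds only because the SUPPORT_SHARED flag was set. */
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
	return mem == MAP_FAILED ? NULL : mem;
}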
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
Documentation/virt/kvm/api.rst | 9 +++++++++
include/uapi/linux/kvm.h | 1 +
virt/kvm/kvm_main.c | 4 ++++
3 files changed, 14 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1bd2d42e6424..4ef3d8482000 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6407,6 +6407,15 @@ most one mapping per page, i.e. binding multiple memory regions to a single
guest_memfd range is not allowed (any number of memory regions can be bound to
a single guest_memfd file, but the bound ranges must not overlap).
+When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
+supports GUEST_MEMFD_FLAG_SUPPORT_SHARED. Setting this flag on guest_memfd
+creation enables mmap() and faulting of guest_memfd memory to host userspace.
+
+When the KVM MMU performs a PFN lookup to service a guest fault and the backing
+guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
+always be consumed from guest_memfd, regardless of whether it is a shared or a
+private fault.
+
See KVM_SET_USER_MEMORY_REGION2 for additional details.
4.143 KVM_PRE_FAULT_MEMORY
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cb19150fd595..c74cf8f73337 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -934,6 +934,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_EL2 240
#define KVM_CAP_ARM_EL2_E2H0 241
#define KVM_CAP_RISCV_MP_STATE_RESET 242
+#define KVM_CAP_GMEM_SHARED_MEM 243
struct kvm_irq_routing_irqchip {
__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d41bcc6a78b0..441c9b53b876 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4913,6 +4913,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#ifdef CONFIG_KVM_GMEM
case KVM_CAP_GUEST_MEMFD:
return !kvm || kvm_arch_supports_gmem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+ case KVM_CAP_GMEM_SHARED_MEM:
+ return !kvm || kvm_arch_supports_gmem_shared_mem(kvm);
#endif
default:
break;
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (15 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-12 16:24 ` Shivank Garg
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
18 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Using hardcoded page size values could cause the test to fail on systems
that have larger pages, e.g., arm64 with 64kB pages. Use getpagesize()
instead.
Also, build the guest_memfd selftest for arm64.
Reviewed-by: David Hildenbrand <david@redhat.com>
Suggested-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/guest_memfd_test.c | 11 ++++++-----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 38b95998e1e6..e11ed9e59ab5 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -172,6 +172,7 @@ TEST_GEN_PROGS_arm64 += arch_timer
TEST_GEN_PROGS_arm64 += coalesced_io_test
TEST_GEN_PROGS_arm64 += dirty_log_perf_test
TEST_GEN_PROGS_arm64 += get-reg-list
+TEST_GEN_PROGS_arm64 += guest_memfd_test
TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
TEST_GEN_PROGS_arm64 += memslot_perf_test
TEST_GEN_PROGS_arm64 += mmu_stress_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..341ba616cf55 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -146,24 +146,25 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
{
int fd1, fd2, ret;
struct stat st1, st2;
+ size_t page_size = getpagesize();
- fd1 = __vm_create_guest_memfd(vm, 4096, 0);
+ fd1 = __vm_create_guest_memfd(vm, page_size, 0);
TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
ret = fstat(fd1, &st1);
TEST_ASSERT(ret != -1, "memfd fstat should succeed");
- TEST_ASSERT(st1.st_size == 4096, "memfd st_size should match requested size");
+ TEST_ASSERT(st1.st_size == page_size, "memfd st_size should match requested size");
- fd2 = __vm_create_guest_memfd(vm, 8192, 0);
+ fd2 = __vm_create_guest_memfd(vm, page_size * 2, 0);
TEST_ASSERT(fd2 != -1, "memfd creation should succeed");
ret = fstat(fd2, &st2);
TEST_ASSERT(ret != -1, "memfd fstat should succeed");
- TEST_ASSERT(st2.st_size == 8192, "second memfd st_size should match requested size");
+ TEST_ASSERT(st2.st_size == page_size * 2, "second memfd st_size should match requested size");
ret = fstat(fd1, &st1);
TEST_ASSERT(ret != -1, "memfd fstat should succeed");
- TEST_ASSERT(st1.st_size == 4096, "first memfd st_size should still match requested size");
+ TEST_ASSERT(st1.st_size == page_size, "first memfd st_size should still match requested size");
TEST_ASSERT(st1.st_ino != st2.st_ino, "different memfd should have different inode numbers");
close(fd2);
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (16 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
@ 2025-06-11 13:33 ` Fuad Tabba
2025-06-12 16:23 ` Shivank Garg
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
18 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-11 13:33 UTC (permalink / raw)
To: kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny, tabba
Expand the guest_memfd selftests to cover mapping guest memory from
host userspace for VM types that support it.
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
.../testing/selftests/kvm/guest_memfd_test.c | 201 ++++++++++++++++--
1 file changed, 180 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 341ba616cf55..5da2ed6277ac 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -13,6 +13,8 @@
#include <linux/bitmap.h>
#include <linux/falloc.h>
+#include <setjmp.h>
+#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
@@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
"pwrite on a guest_mem fd should fail");
}
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
+{
+ const char val = 0xaa;
+ char *mem;
+ size_t i;
+ int ret;
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+ TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
+
+ memset(mem, val, total_size);
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+ ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+ page_size);
+ TEST_ASSERT(!ret, "fallocate the first page should succeed.");
+
+ for (i = 0; i < page_size; i++)
+ TEST_ASSERT_EQ(READ_ONCE(mem[i]), 0x00);
+ for (; i < total_size; i++)
+ TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+ memset(mem, val, page_size);
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+ ret = munmap(mem, total_size);
+ TEST_ASSERT(!ret, "munmap() should succeed.");
+}
+
+static sigjmp_buf jmpbuf;
+void fault_sigbus_handler(int signum)
+{
+ siglongjmp(jmpbuf, 1);
+}
+
+static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
+{
+ struct sigaction sa_old, sa_new = {
+ .sa_handler = fault_sigbus_handler,
+ };
+ size_t map_size = total_size * 4;
+ const char val = 0xaa;
+ char *mem;
+ size_t i;
+ int ret;
+
+ mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
+
+ sigaction(SIGBUS, &sa_new, &sa_old);
+ if (sigsetjmp(jmpbuf, 1) == 0) {
+ memset(mem, 0xaa, map_size);
+ TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
+ }
+ sigaction(SIGBUS, &sa_old, NULL);
+
+ for (i = 0; i < total_size; i++)
+ TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
+
+ ret = munmap(mem, map_size);
+ TEST_ASSERT(!ret, "munmap() should succeed.");
+}
+
+static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
{
char *mem;
mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ TEST_ASSERT_EQ(mem, MAP_FAILED);
}
static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -120,26 +193,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
}
}
-static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
+static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
+ uint64_t guest_memfd_flags,
+ size_t page_size)
{
- size_t page_size = getpagesize();
- uint64_t flag;
size_t size;
int fd;
for (size = 1; size < page_size; size++) {
- fd = __vm_create_guest_memfd(vm, size, 0);
- TEST_ASSERT(fd == -1 && errno == EINVAL,
+ fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
+ TEST_ASSERT(fd < 0 && errno == EINVAL,
"guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
size);
}
-
- for (flag = BIT(0); flag; flag <<= 1) {
- fd = __vm_create_guest_memfd(vm, page_size, flag);
- TEST_ASSERT(fd == -1 && errno == EINVAL,
- "guest_memfd() with flag '0x%lx' should fail with EINVAL",
- flag);
- }
}
static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
@@ -171,30 +237,123 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
close(fd1);
}
-int main(int argc, char *argv[])
+static bool check_vm_type(unsigned long vm_type)
{
- size_t page_size;
+ /*
+ * Not all architectures support KVM_CAP_VM_TYPES. However, those that
+ * support guest_memfd have that support for the default VM type.
+ */
+ if (vm_type == VM_TYPE_DEFAULT)
+ return true;
+
+ return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
+}
+
+static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
+ bool expect_mmap_allowed)
+{
+ struct kvm_vm *vm;
size_t total_size;
+ size_t page_size;
int fd;
- struct kvm_vm *vm;
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+ if (!check_vm_type(vm_type))
+ return;
page_size = getpagesize();
total_size = page_size * 4;
- vm = vm_create_barebones();
+ vm = vm_create_barebones_type(vm_type);
- test_create_guest_memfd_invalid(vm);
test_create_guest_memfd_multiple(vm);
+ test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
- fd = vm_create_guest_memfd(vm, total_size, 0);
+ fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
test_file_read_write(fd);
- test_mmap(fd, page_size);
+
+ if (expect_mmap_allowed) {
+ test_mmap_supported(fd, page_size, total_size);
+ test_fault_overflow(fd, page_size, total_size);
+
+ } else {
+ test_mmap_not_supported(fd, page_size, total_size);
+ }
+
test_file_size(fd, page_size, total_size);
test_fallocate(fd, page_size, total_size);
test_invalid_punch_hole(fd, page_size, total_size);
close(fd);
+ kvm_vm_free(vm);
+}
+
+static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
+ uint64_t expected_valid_flags)
+{
+ size_t page_size = getpagesize();
+ struct kvm_vm *vm;
+ uint64_t flag = 0;
+ int fd;
+
+ if (!check_vm_type(vm_type))
+ return;
+
+ vm = vm_create_barebones_type(vm_type);
+
+ for (flag = BIT(0); flag; flag <<= 1) {
+ fd = __vm_create_guest_memfd(vm, page_size, flag);
+
+ if (flag & expected_valid_flags) {
+ TEST_ASSERT(fd >= 0,
+ "guest_memfd() with flag '0x%lx' should be valid",
+ flag);
+ close(fd);
+ } else {
+ TEST_ASSERT(fd < 0 && errno == EINVAL,
+ "guest_memfd() with flag '0x%lx' should fail with EINVAL",
+ flag);
+ }
+ }
+
+ kvm_vm_free(vm);
+}
+
+static void test_gmem_flag_validity(void)
+{
+ uint64_t non_coco_vm_valid_flags = 0;
+
+ if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
+ non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
+ test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
+
+#ifdef __x86_64__
+ test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
+ test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
+ test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
+ test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
+ test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
+#endif
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+
+ test_gmem_flag_validity();
+
+ test_with_type(VM_TYPE_DEFAULT, 0, false);
+ if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
+ test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
+ true);
+ }
+
+#ifdef __x86_64__
+ test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
+ if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
+ test_with_type(KVM_X86_SW_PROTECTED_VM,
+ GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
+ }
+#endif
}
--
2.50.0.rc0.642.g800a2b2222-goog
^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
@ 2025-06-12 16:16 ` Shivank Garg
2025-06-13 21:03 ` Sean Christopherson
2025-06-25 21:47 ` Ackerley Tng
2 siblings, 0 replies; 75+ messages in thread
From: Shivank Garg @ 2025-06-12 16:16 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny
On 6/11/2025 7:03 PM, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory from host userspace.
>
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> include/linux/kvm_host.h | 13 +++++++
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/Kconfig | 4 +++
> virt/kvm/guest_memfd.c | 73 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 91 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9a6712151a74..6b63556ca150 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -729,6 +729,19 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> }
> #endif
>
> +/*
> + * Returns true if this VM supports shared mem in guest_memfd.
> + *
> + * Arch code must define kvm_arch_supports_gmem_shared_mem if support for
> + * guest_memfd is enabled.
> + */
> +#if !defined(kvm_arch_supports_gmem_shared_mem)
> +static inline bool kvm_arch_supports_gmem_shared_mem(struct kvm *kvm)
> +{
> + return false;
> +}
> +#endif
> +
> #ifndef kvm_arch_has_readonly_mem
> static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d00b85cb168c..cb19150fd595 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>
> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>
> struct kvm_create_guest_memfd {
> __u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..e90884f74404 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> config HAVE_KVM_ARCH_GMEM_INVALIDATE
> bool
> depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> + select KVM_GMEM
> + bool
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..06616b6b493b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> return gfn - slot->base_gfn + slot->gmem.pgoff;
> }
>
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> + const u64 flags = (u64)inode->i_private;
> +
> + if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> + return false;
> +
> + return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> + struct inode *inode = file_inode(vmf->vma->vm_file);
> + struct folio *folio;
> + vm_fault_t ret = VM_FAULT_LOCKED;
> +
> + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> + return VM_FAULT_SIGBUS;
> +
> + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> + if (IS_ERR(folio)) {
> + int err = PTR_ERR(folio);
> +
> + if (err == -EAGAIN)
> + return VM_FAULT_RETRY;
> +
> + return vmf_error(err);
> + }
> +
> + if (WARN_ON_ONCE(folio_test_large(folio))) {
> + ret = VM_FAULT_SIGBUS;
> + goto out_folio;
> + }
> +
> + if (!folio_test_uptodate(folio)) {
> + clear_highpage(folio_page(folio, 0));
> + kvm_gmem_mark_prepared(folio);
> + }
> +
> + vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> + if (ret != VM_FAULT_LOCKED) {
> + folio_unlock(folio);
> + folio_put(folio);
> + }
> +
> + return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> + .fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + if (!kvm_gmem_supports_shared(file_inode(file)))
> + return -ENODEV;
> +
> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> + (VM_SHARED | VM_MAYSHARE)) {
> + return -EINVAL;
> + }
> +
> + vma->vm_ops = &kvm_gmem_vm_ops;
> +
> + return 0;
> +}
> +
> static struct file_operations kvm_gmem_fops = {
> + .mmap = kvm_gmem_mmap,
> .open = generic_file_open,
> .release = kvm_gmem_release,
> .fallocate = kvm_gmem_fallocate,
> @@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> u64 flags = args->flags;
> u64 valid_flags = 0;
>
> + if (kvm_arch_supports_gmem_shared_mem(kvm))
> + valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
> if (flags & ~valid_flags)
> return -EINVAL;
>
LGTM!
Reviewed-by: Shivank Garg <shivankg@amd.com>
Thanks,
Shivank
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-06-12 16:23 ` Shivank Garg
0 siblings, 0 replies; 75+ messages in thread
From: Shivank Garg @ 2025-06-12 16:23 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny
On 6/11/2025 7:03 PM, Fuad Tabba wrote:
> Expand the guest_memfd selftests to include testing mapping guest
> memory for VM types that support it.
>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> .../testing/selftests/kvm/guest_memfd_test.c | 201 ++++++++++++++++--
> 1 file changed, 180 insertions(+), 21 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 341ba616cf55..5da2ed6277ac 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -13,6 +13,8 @@
>
> #include <linux/bitmap.h>
> #include <linux/falloc.h>
> +#include <setjmp.h>
> +#include <signal.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> @@ -34,12 +36,83 @@ static void test_file_read_write(int fd)
> "pwrite on a guest_mem fd should fail");
> }
>
> -static void test_mmap(int fd, size_t page_size)
> +static void test_mmap_supported(int fd, size_t page_size, size_t total_size)
> +{
> + const char val = 0xaa;
> + char *mem;
> + size_t i;
> + int ret;
> +
> + mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> + TEST_ASSERT(mem == MAP_FAILED, "Copy-on-write not allowed by guest_memfd.");
> +
> + mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> + TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> +
> + memset(mem, val, total_size);
> + for (i = 0; i < total_size; i++)
> + TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> +
> + ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
> + page_size);
> + TEST_ASSERT(!ret, "fallocate the first page should succeed.");
> +
> + for (i = 0; i < page_size; i++)
> + TEST_ASSERT_EQ(READ_ONCE(mem[i]), 0x00);
> + for (; i < total_size; i++)
> + TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> +
> + memset(mem, val, page_size);
> + for (i = 0; i < total_size; i++)
> + TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> +
> + ret = munmap(mem, total_size);
> + TEST_ASSERT(!ret, "munmap() should succeed.");
> +}
> +
> +static sigjmp_buf jmpbuf;
> +void fault_sigbus_handler(int signum)
> +{
> + siglongjmp(jmpbuf, 1);
> +}
> +
> +static void test_fault_overflow(int fd, size_t page_size, size_t total_size)
> +{
> + struct sigaction sa_old, sa_new = {
> + .sa_handler = fault_sigbus_handler,
> + };
> + size_t map_size = total_size * 4;
> + const char val = 0xaa;
> + char *mem;
> + size_t i;
> + int ret;
> +
> + mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> + TEST_ASSERT(mem != MAP_FAILED, "mmap() for shared guest memory should succeed.");
> +
> + sigaction(SIGBUS, &sa_new, &sa_old);
> + if (sigsetjmp(jmpbuf, 1) == 0) {
> + memset(mem, 0xaa, map_size);
> + TEST_ASSERT(false, "memset() should have triggered SIGBUS.");
> + }
> + sigaction(SIGBUS, &sa_old, NULL);
> +
> + for (i = 0; i < total_size; i++)
> + TEST_ASSERT_EQ(READ_ONCE(mem[i]), val);
> +
> + ret = munmap(mem, map_size);
> + TEST_ASSERT(!ret, "munmap() should succeed.");
> +}
> +
> +static void test_mmap_not_supported(int fd, size_t page_size, size_t total_size)
> {
> char *mem;
>
> mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> TEST_ASSERT_EQ(mem, MAP_FAILED);
> +
> + mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> + TEST_ASSERT_EQ(mem, MAP_FAILED);
> }
>
> static void test_file_size(int fd, size_t page_size, size_t total_size)
> @@ -120,26 +193,19 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
> }
> }
>
> -static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
> +static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
> + uint64_t guest_memfd_flags,
> + size_t page_size)
> {
> - size_t page_size = getpagesize();
> - uint64_t flag;
> size_t size;
> int fd;
>
> for (size = 1; size < page_size; size++) {
> - fd = __vm_create_guest_memfd(vm, size, 0);
> - TEST_ASSERT(fd == -1 && errno == EINVAL,
> + fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
> + TEST_ASSERT(fd < 0 && errno == EINVAL,
> "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
> size);
> }
> -
> - for (flag = BIT(0); flag; flag <<= 1) {
> - fd = __vm_create_guest_memfd(vm, page_size, flag);
> - TEST_ASSERT(fd == -1 && errno == EINVAL,
> - "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> - flag);
> - }
> }
>
> static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> @@ -171,30 +237,123 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> close(fd1);
> }
>
> -int main(int argc, char *argv[])
> +static bool check_vm_type(unsigned long vm_type)
> {
> - size_t page_size;
> + /*
> + * Not all architectures support KVM_CAP_VM_TYPES. However, those that
> + * support guest_memfd have that support for the default VM type.
> + */
> + if (vm_type == VM_TYPE_DEFAULT)
> + return true;
> +
> + return kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type);
> +}
> +
> +static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
> + bool expect_mmap_allowed)
> +{
> + struct kvm_vm *vm;
> size_t total_size;
> + size_t page_size;
> int fd;
> - struct kvm_vm *vm;
>
> - TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> + if (!check_vm_type(vm_type))
> + return;
>
> page_size = getpagesize();
> total_size = page_size * 4;
>
> - vm = vm_create_barebones();
> + vm = vm_create_barebones_type(vm_type);
>
> - test_create_guest_memfd_invalid(vm);
> test_create_guest_memfd_multiple(vm);
> + test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
>
> - fd = vm_create_guest_memfd(vm, total_size, 0);
> + fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
>
> test_file_read_write(fd);
> - test_mmap(fd, page_size);
> +
> + if (expect_mmap_allowed) {
> + test_mmap_supported(fd, page_size, total_size);
> + test_fault_overflow(fd, page_size, total_size);
> +
> + } else {
> + test_mmap_not_supported(fd, page_size, total_size);
> + }
> +
> test_file_size(fd, page_size, total_size);
> test_fallocate(fd, page_size, total_size);
> test_invalid_punch_hole(fd, page_size, total_size);
>
> close(fd);
> + kvm_vm_free(vm);
> +}
> +
> +static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
> + uint64_t expected_valid_flags)
> +{
> + size_t page_size = getpagesize();
> + struct kvm_vm *vm;
> + uint64_t flag = 0;
> + int fd;
> +
> + if (!check_vm_type(vm_type))
> + return;
> +
> + vm = vm_create_barebones_type(vm_type);
> +
> + for (flag = BIT(0); flag; flag <<= 1) {
> + fd = __vm_create_guest_memfd(vm, page_size, flag);
> +
> + if (flag & expected_valid_flags) {
> + TEST_ASSERT(fd >= 0,
> + "guest_memfd() with flag '0x%lx' should be valid",
> + flag);
> + close(fd);
> + } else {
> + TEST_ASSERT(fd < 0 && errno == EINVAL,
> + "guest_memfd() with flag '0x%lx' should fail with EINVAL",
> + flag);
> + }
> + }
> +
> + kvm_vm_free(vm);
> +}
> +
> +static void test_gmem_flag_validity(void)
> +{
> + uint64_t non_coco_vm_valid_flags = 0;
> +
> + if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
> + non_coco_vm_valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
> + test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, non_coco_vm_valid_flags);
> +
> +#ifdef __x86_64__
> + test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, non_coco_vm_valid_flags);
> + test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, 0);
> + test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, 0);
> + test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, 0);
> + test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, 0);
> +#endif
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
> +
> + test_gmem_flag_validity();
> +
> + test_with_type(VM_TYPE_DEFAULT, 0, false);
> + if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> + test_with_type(VM_TYPE_DEFAULT, GUEST_MEMFD_FLAG_SUPPORT_SHARED,
> + true);
> + }
> +
> +#ifdef __x86_64__
> + test_with_type(KVM_X86_SW_PROTECTED_VM, 0, false);
> + if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
> + test_with_type(KVM_X86_SW_PROTECTED_VM,
> + GUEST_MEMFD_FLAG_SUPPORT_SHARED, true);
> + }
> +#endif
> }
Reviewed-by: Shivank Garg <shivankg@amd.com>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
@ 2025-06-12 16:24 ` Shivank Garg
0 siblings, 0 replies; 75+ messages in thread
From: Shivank Garg @ 2025-06-12 16:24 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
quic_pheragu, catalin.marinas, james.morse, yuzenghui,
oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, pankaj.gupta,
ira.weiny
On 6/11/2025 7:03 PM, Fuad Tabba wrote:
> Using hardcoded page size values could cause the test to fail on systems
> that have larger pages, e.g., arm64 with 64kB pages. Use getpagesize()
> instead.
>
> Also, build the guest_memfd selftest for arm64.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> tools/testing/selftests/kvm/guest_memfd_test.c | 11 ++++++-----
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index 38b95998e1e6..e11ed9e59ab5 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -172,6 +172,7 @@ TEST_GEN_PROGS_arm64 += arch_timer
> TEST_GEN_PROGS_arm64 += coalesced_io_test
> TEST_GEN_PROGS_arm64 += dirty_log_perf_test
> TEST_GEN_PROGS_arm64 += get-reg-list
> +TEST_GEN_PROGS_arm64 += guest_memfd_test
> TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
> TEST_GEN_PROGS_arm64 += memslot_perf_test
> TEST_GEN_PROGS_arm64 += mmu_stress_test
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index ce687f8d248f..341ba616cf55 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -146,24 +146,25 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
> {
> int fd1, fd2, ret;
> struct stat st1, st2;
> + size_t page_size = getpagesize();
>
> - fd1 = __vm_create_guest_memfd(vm, 4096, 0);
> + fd1 = __vm_create_guest_memfd(vm, page_size, 0);
> TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
>
> ret = fstat(fd1, &st1);
> TEST_ASSERT(ret != -1, "memfd fstat should succeed");
> - TEST_ASSERT(st1.st_size == 4096, "memfd st_size should match requested size");
> + TEST_ASSERT(st1.st_size == page_size, "memfd st_size should match requested size");
>
> - fd2 = __vm_create_guest_memfd(vm, 8192, 0);
> + fd2 = __vm_create_guest_memfd(vm, page_size * 2, 0);
> TEST_ASSERT(fd2 != -1, "memfd creation should succeed");
>
> ret = fstat(fd2, &st2);
> TEST_ASSERT(ret != -1, "memfd fstat should succeed");
> - TEST_ASSERT(st2.st_size == 8192, "second memfd st_size should match requested size");
> + TEST_ASSERT(st2.st_size == page_size * 2, "second memfd st_size should match requested size");
>
> ret = fstat(fd1, &st1);
> TEST_ASSERT(ret != -1, "memfd fstat should succeed");
> - TEST_ASSERT(st1.st_size == 4096, "first memfd st_size should still match requested size");
> + TEST_ASSERT(st1.st_size == page_size, "first memfd st_size should still match requested size");
> TEST_ASSERT(st1.st_ino != st2.st_ino, "different memfd should have different inode numbers");
>
> close(fd2);
Reviewed-by: Shivank Garg <shivankg@amd.com>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
@ 2025-06-12 17:33 ` James Houghton
0 siblings, 0 replies; 75+ messages in thread
From: James Houghton @ 2025-06-12 17:33 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, david, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, peterx,
pankaj.gupta, ira.weiny
On Wed, Jun 11, 2025 at 6:34 AM Fuad Tabba <tabba@google.com> wrote:
>
> Add arm64 support for handling guest page faults on guest_memfd backed
> memslots. Until guest_memfd supports huge pages, the fault granule is
> restricted to PAGE_SIZE.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
Thanks Fuad! Hopefully Oliver and/or Marc can take a look at these Arm
patches soon. :)
Feel free to add:
Reviewed-by: James Houghton <jthoughton@google.com>
> ---
> arch/arm64/kvm/mmu.c | 82 ++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 79 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 58662e0ef13e..71f8b53683e7 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1512,6 +1512,78 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> *prot |= kvm_encode_nested_level(nested);
> }
>
> +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> +
> +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> + struct kvm_s2_trans *nested,
> + struct kvm_memory_slot *memslot, bool is_perm)
> +{
> + bool write_fault, exec_fault, writable;
> + enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> + enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> + struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> + struct page *page;
> + struct kvm *kvm = vcpu->kvm;
> + void *memcache;
> + kvm_pfn_t pfn;
> + gfn_t gfn;
> + int ret;
> +
> + ret = prepare_mmu_memcache(vcpu, true, &memcache);
> + if (ret)
> + return ret;
> +
> + if (nested)
> + gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> + else
> + gfn = fault_ipa >> PAGE_SHIFT;
> +
> + write_fault = kvm_is_write_fault(vcpu);
> + exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +
> + if (write_fault && exec_fault) {
> + kvm_err("Simultaneous write and execution fault\n");
> + return -EFAULT;
> + }
> +
> + if (is_perm && !write_fault && !exec_fault) {
> + kvm_err("Unexpected L2 read permission error\n");
> + return -EFAULT;
> + }
> +
> + ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> + if (ret) {
> + kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> + write_fault, exec_fault, false);
> + return ret;
> + }
> +
> + writable = !(memslot->flags & KVM_MEM_READONLY);
> +
> + if (nested)
> + adjust_nested_fault_perms(nested, &prot, &writable);
> +
> + if (writable)
> + prot |= KVM_PGTABLE_PROT_W;
> +
> + if (exec_fault ||
> + (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> + (!nested || kvm_s2_trans_executable(nested))))
> + prot |= KVM_PGTABLE_PROT_X;
> +
> + kvm_fault_lock(kvm);
> + ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> + __pfn_to_phys(pfn), prot,
> + memcache, flags);
> + kvm_release_faultin_page(kvm, page, !!ret, writable);
> + kvm_fault_unlock(kvm);
> +
> + if (writable && !ret)
> + mark_page_dirty_in_slot(kvm, memslot, gfn);
> +
> + return ret != -EAGAIN ? ret : 0;
> +}
> +
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct kvm_s2_trans *nested,
> struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1536,7 +1608,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> struct kvm_pgtable *pgt;
> struct page *page;
> - enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> + enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
>
> if (fault_is_perm)
> fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> @@ -1963,8 +2035,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> goto out_unlock;
> }
>
> - ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> - esr_fsc_is_permission_fault(esr));
> + if (kvm_slot_has_gmem(memslot))
> + ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> + esr_fsc_is_permission_fault(esr));
> + else
> + ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> + esr_fsc_is_permission_fault(esr));
> if (ret == 0)
> ret = 1;
> out:
> --
> 2.50.0.rc0.642.g800a2b2222-goog
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
` (17 preceding siblings ...)
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2025-06-12 17:38 ` David Hildenbrand
2025-06-24 10:02 ` Fuad Tabba
18 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-12 17:38 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta, ira.weiny
On 11.06.25 15:33, Fuad Tabba wrote:
> Main changes since v11 [1]:
> - Addressed various points of feedback from the last revision.
> - Rebased on Linux 6.16-rc1.
Nit: In case you have to resend, it might be worth changing the subject
s/software protected/non-CoCo/ like you did in patch #12.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
@ 2025-06-13 13:57 ` Ackerley Tng
2025-06-13 20:35 ` Sean Christopherson
1 sibling, 0 replies; 75+ messages in thread
From: Ackerley Tng @ 2025-06-13 13:57 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, seanjc, viro, brauner, willy,
akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny, tabba
Fuad Tabba <tabba@google.com> writes:
> The bool has_private_mem is used to indicate whether guest_memfd is
> supported. Rename it to supports_gmem to make its meaning clearer and to
> decouple memory being private from guest_memfd.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 4 ++--
> arch/x86/kvm/mmu/mmu.c | 2 +-
> arch/x86/kvm/svm/svm.c | 4 ++--
> arch/x86/kvm/x86.c | 3 +--
> 4 files changed, 6 insertions(+), 7 deletions(-)
>
This [1] is one recently-merged usage of arch.has_private_mem which
needs to be renamed too.
[1] https://github.com/torvalds/linux/blob/27605c8c0f69e319df156b471974e4e223035378/arch/x86/kvm/vmx/tdx.c#L627
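For illustration only (not a hunk from this series), that assignment in
tdx_vm_init() would need the same treatment as the rest of this patch,
roughly:

	/* arch/x86/kvm/vmx/tdx.c, tdx_vm_init(): sketch of the rename */
	kvm->arch.supports_gmem = true;	/* currently: kvm->arch.has_private_mem = true; */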
[...]
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-06-13 13:57 ` Ackerley Tng
@ 2025-06-13 20:35 ` Sean Christopherson
2025-06-16 7:13 ` Fuad Tabba
2025-06-24 20:51 ` Ackerley Tng
1 sibling, 2 replies; 75+ messages in thread
From: Sean Christopherson @ 2025-06-13 20:35 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Wed, Jun 11, 2025, Fuad Tabba wrote:
> The bool has_private_mem is used to indicate whether guest_memfd is
> supported.
No? This is at best weird, and at worst flat out wrong:
if (kvm->arch.supports_gmem &&
fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
return false;
ditto for this code:
if (kvm_arch_supports_gmem(vcpu->kvm) &&
kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
error_code |= PFERR_PRIVATE_ACCESS;
and for the memory_attributes code. E.g. IIRC, with guest_memfd() mmap support,
private vs. shared will become a property of the guest_memfd inode, i.e. this will
become wrong:
static u64 kvm_supported_mem_attributes(struct kvm *kvm)
{
if (!kvm || kvm_arch_supports_gmem(kvm))
return KVM_MEMORY_ATTRIBUTE_PRIVATE;
return 0;
}
Instead of renaming kvm_arch_has_private_mem() => kvm_arch_supports_gmem(), *add*
kvm_arch_supports_gmem() and then kill off kvm_arch_has_private_mem() once non-x86
usage is gone (i.e. query kvm->arch.has_private_mem directly).
And then rather than rename has_private_mem, either add supports_gmem or do what
you did for kvm_arch_supports_gmem_shared_mem() and explicitly check the VM type.
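Purely as a sketch of that split (names taken from this discussion, not from
the series), the two helpers would coexist along these lines:

	/* CoCo-style private memory; to be killed off once non-x86 users are gone. */
	static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
	{
		return kvm->arch.has_private_mem;
	}

	/* Generic: does this VM support guest_memfd at all? */
	static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
	{
		return kvm->arch.supports_gmem;
	}

Whether supports_gmem is a new field or an explicit VM type check is left
open above.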
> Rename it to supports_gmem to make its meaning clearer and to decouple memory
> being private from guest_memfd.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 4 ++--
> arch/x86/kvm/mmu/mmu.c | 2 +-
> arch/x86/kvm/svm/svm.c | 4 ++--
> arch/x86/kvm/x86.c | 3 +--
> 4 files changed, 6 insertions(+), 7 deletions(-)
This missed the usage in TDX (it's not a staleness problem, because this series
was based on 6.16-rc1, which has the relevant code).
arch/x86/kvm/vmx/tdx.c: In function ‘tdx_vm_init’:
arch/x86/kvm/vmx/tdx.c:627:18: error: ‘struct kvm_arch’ has no member named ‘has_private_mem’
627 | kvm->arch.has_private_mem = true;
| ^
make[5]: *** [scripts/Makefile.build:287: arch/x86/kvm/vmx/tdx.o] Error 1
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-12 16:16 ` Shivank Garg
@ 2025-06-13 21:03 ` Sean Christopherson
2025-06-13 21:18 ` David Hildenbrand
` (4 more replies)
2025-06-25 21:47 ` Ackerley Tng
2 siblings, 5 replies; 75+ messages in thread
From: Sean Christopherson @ 2025-06-13 21:03 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Wed, Jun 11, 2025, Fuad Tabba wrote:
> This patch enables support for shared memory in guest_memfd, including
Please don't lead with "This patch", simply state what changes are being
made as a command.
> mapping that memory from host userspace.
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.
Why? I can see that from the patch.
This changelog is way, way, waaay too light on details. Sorry for jumping in at
the 11th hour, but we've spent what, 2 years working on this?
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d00b85cb168c..cb19150fd595 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>
> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
deep to understand that "support shared" actually means "userspace explicitly
enables sharing on _this_ guest_memfd instance". E.g. I was surprised to see
IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
is to share memory, but that's a _use case_, not the property of guest_memfd that
is being controlled by userspace.
And kvm_gmem_memslot_supports_shared() is even worse. It's not simply that the
memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
instance is the _only_ entry point to the memslot.
So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
if (kvm_slot_has_gmem(slot) &&
(kvm_gmem_memslot_supports_shared(slot) ||
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
return kvm_gmem_max_mapping_level(slot, gfn, max_level);
}
much more intuitive:
if (kvm_is_memslot_gmem_only(slot) ||
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
return kvm_gmem_max_mapping_level(slot, gfn, max_level);
And then have kvm_gmem_mapping_order() do:
WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
return 0;
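A sketch of how kvm_is_memslot_gmem_only() could be backed in the interim,
mapping the suggested name onto the predicates this patch already has (the
eventual KVM_MEMSLOT_GUEST_MEMFD_ONLY flag check is an assumption, not part
of the series):

	static inline bool kvm_is_memslot_gmem_only(const struct kvm_memory_slot *slot)
	{
		/* Interim definition; a memslot flag would replace this. */
		return kvm_slot_has_gmem(slot) &&
		       kvm_gmem_memslot_supports_shared(slot);
	}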
> struct kvm_create_guest_memfd {
> __u64 size;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 559c93ad90be..e90884f74404 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> config HAVE_KVM_ARCH_GMEM_INVALIDATE
> bool
> depends on KVM_GMEM
> +
> +config KVM_GMEM_SHARED_MEM
> + select KVM_GMEM
> + bool
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 6db515833f61..06616b6b493b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> return gfn - slot->base_gfn + slot->gmem.pgoff;
> }
>
> +static bool kvm_gmem_supports_shared(struct inode *inode)
> +{
> + const u64 flags = (u64)inode->i_private;
> +
> + if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> + return false;
> +
> + return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +}
> +
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
And to my point about "shared", this is also very confusing, because there are
zero checks in here about shared vs. private.
> +{
> + struct inode *inode = file_inode(vmf->vma->vm_file);
> + struct folio *folio;
> + vm_fault_t ret = VM_FAULT_LOCKED;
> +
> + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> + return VM_FAULT_SIGBUS;
> +
> + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> + if (IS_ERR(folio)) {
> + int err = PTR_ERR(folio);
> +
> + if (err == -EAGAIN)
> + return VM_FAULT_RETRY;
> +
> + return vmf_error(err);
> + }
> +
> + if (WARN_ON_ONCE(folio_test_large(folio))) {
> + ret = VM_FAULT_SIGBUS;
> + goto out_folio;
> + }
> +
> + if (!folio_test_uptodate(folio)) {
> + clear_highpage(folio_page(folio, 0));
> + kvm_gmem_mark_prepared(folio);
> + }
> +
> + vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> + if (ret != VM_FAULT_LOCKED) {
> + folio_unlock(folio);
> + folio_put(folio);
> + }
> +
> + return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> + .fault = kvm_gmem_fault_shared,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + if (!kvm_gmem_supports_shared(file_inode(file)))
> + return -ENODEV;
> +
> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> + (VM_SHARED | VM_MAYSHARE)) {
And the SHARED terminology gets really confusing here, due to colliding with the
existing notion of SHARED file mappings.
> + return -EINVAL;
> + }
> +
> + vma->vm_ops = &kvm_gmem_vm_ops;
> +
> + return 0;
> +}
> +
> static struct file_operations kvm_gmem_fops = {
> + .mmap = kvm_gmem_mmap,
> .open = generic_file_open,
> .release = kvm_gmem_release,
> .fallocate = kvm_gmem_fallocate,
> @@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> u64 flags = args->flags;
> u64 valid_flags = 0;
>
> + if (kvm_arch_supports_gmem_shared_mem(kvm))
> + valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> +
> if (flags & ~valid_flags)
> return -EINVAL;
>
> --
> 2.50.0.rc0.642.g800a2b2222-goog
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-13 21:03 ` Sean Christopherson
@ 2025-06-13 21:18 ` David Hildenbrand
2025-06-13 22:48 ` Sean Christopherson
` (3 subsequent siblings)
4 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2025-06-13 21:18 UTC (permalink / raw)
To: Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On 13.06.25 23:03, Sean Christopherson wrote:
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>> This patch enables support for shared memory in guest_memfd, including
>
> Please don't lead with "This patch", simply state what changes are being
> made as a command.
Agreed.
>
>> mapping that memory from host userspace.
>
>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>> flag at creation time.
>
> Why? I can see that from the patch.
>
> This changelog is way, way, waaay too light on details.
Agreed.
> Sorry for jumping in at
> the 11th hour, but we've spent what, 2 years working on this?
It's late in Germany on a Friday, so I am probably grumpy, but most of
what you raise (terminology ...) has been discussed plenty of times
before either during review here or during the upstream calls ... :(
Anyhow, happy for your review feedback on this series at this point, but
this is just the perfect time to shut down my computer on a Friday
evening, knowing we will need another 2 years until this is finally
upstream if we keep going like that.
(again, sorry to be grumpy, but this is not the stuff I want to be
reading at the 11th hour)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
@ 2025-06-13 22:08 ` Sean Christopherson
2025-06-24 23:40 ` Ackerley Tng
0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2025-06-13 22:08 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Wed, Jun 11, 2025, Fuad Tabba wrote:
> From: Ackerley Tng <ackerleytng@google.com>
>
> For memslots backed by guest_memfd with shared mem support, the KVM MMU
> must always fault in pages from guest_memfd, and not from the host
> userspace_addr. Update the fault handler to do so.
And with a KVM_MEMSLOT_GUEST_MEMFD_ONLY flag, this becomes super obvious.
> This patch also refactors related function names for accuracy:
This patch. And phrase changelogs as commands.
> kvm_mem_is_private() returns true only when the current private/shared
> state (in the CoCo sense) of the memory is private, and returns false if
> the current state is shared explicitly or implicitly, e.g., belongs to a
> non-CoCo VM.
Again, state changes as commands. For the above, it's not obvious if you're
talking about the existing code versus the state of things after "this patch".
> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
> fault in not just private memory, but more generally, from guest_memfd.
> +static inline u8 kvm_max_level_for_order(int order)
Do not use "inline" for functions that are visible only to the local compilation
unit. "inline" is just a hint, and modern compilers are smart enough to inline
functions when appropriate without a hint.
A longer explanation/rant here: https://lore.kernel.org/all/ZAdfX+S323JVWNZC@google.com
> +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
> + gfn_t gfn, int max_level)
> +{
> + int max_order;
>
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
This is dead code, the one and only caller has *just* checked for this condition.
>
> - host_level = host_pfn_mapping_level(kvm, gfn, slot);
> - return min(host_level, max_level);
> + max_order = kvm_gmem_mapping_order(slot, gfn);
> + return min(max_level, kvm_max_level_for_order(max_order));
> }
...
> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> - u8 max_level, int gmem_order)
> +static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
This is comically verbose. C ain't Java. And having two separate helpers makes
it *really* hard to (a) even see there are TWO helpers in the first place, and
(b) understand how they differ.
Gah, and not your bug, but completely ignoring the RMP in kvm_mmu_max_mapping_level()
is wrong. It "works" because guest_memfd doesn't (yet) support dirty logging,
no one enables the NX hugepage mitigation on AMD hosts.
We could plumb in the pfn and private info, but I don't really see the point,
at least not at this time.
> + struct kvm_page_fault *fault,
> + int order)
> {
> - u8 req_max_level;
> + u8 max_level = fault->max_level;
>
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> - max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> + max_level = min(kvm_max_level_for_order(order), max_level);
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> - req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
> - if (req_max_level)
> - max_level = min(max_level, req_max_level);
> + if (fault->is_private) {
> + u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
Hmm, so the interesting thing here is that (IIRC) the RMP restrictions aren't
just on the private pages, they also apply to the HYPERVISOR/SHARED pages. (Don't
quote me on that).
Regardless, I'm leaning toward dropping the "private" part, and making SNP deal
with the intricacies of the RMP:
/* Some VM types have additional restrictions, e.g. SNP's RMP. */
req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
if (req_max_level)
max_level = min(max_level, req_max_level);
Then we can get to something like:
static int kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
struct kvm_page_fault *fault)
{
int max_level, req_max_level;
max_level = kvm_max_level_for_order(order);
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
if (req_max_level)
max_level = min(max_level, req_max_level);
return max_level;
}
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn)
{
int max_level;
max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
/* TODO: Comment goes here about KVM not supporting this path (yet). */
if (kvm_mem_is_private(kvm, gfn))
return PG_LEVEL_4K;
if (kvm_is_memslot_gmem_only(slot)) {
int order = kvm_gmem_mapping_order(slot, gfn);
return min(max_level, kvm_gmem_max_mapping_level(kvm, order, NULL));
}
return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
}
static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
struct kvm *kvm = vcpu->kvm;
int order, r;
if (!kvm_slot_has_gmem(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
r = kvm_gmem_get_pfn(kvm, fault->slot, fault->gfn, &fault->pfn,
&fault->refcounted_page, &order);
if (r) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return r;
}
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
fault->max_level = kvm_gmem_max_mapping_level(kvm, order, fault);
return RET_PF_CONTINUE;
}
int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
{
int level, rc;
bool assigned;
if (!sev_snp_guest(kvm))
return 0;
if (WARN_ON_ONCE(!fault) || !fault->is_private)
return 0;
rc = snp_lookup_rmpentry(fault->pfn, &assigned, &level);
if (rc || !assigned)
return PG_LEVEL_4K;
return level;
}
> +/*
> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
> + * private.
> + *
> + * A return value of false indicates that the gfn is explicitly or implicitly
> + * shared (i.e., non-CoCo VMs).
> + */
> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> {
> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> + struct kvm_memory_slot *slot;
> +
> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
> + return false;
> +
> + slot = gfn_to_memslot(kvm, gfn);
> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> + /*
> + * Without in-place conversion support, if a guest_memfd memslot
> + * supports shared memory, then all the slot's memory is
> + * considered not private, i.e., implicitly shared.
> + */
> + return false;
Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
mappable guest_memfd. You need to do that no matter what. Then you don't need
to sprinkle special case code all over the place.
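One hypothetical way to enforce that, sketched with
kvm_range_has_mappable_gmem() as an assumed helper, is to reject the
attribute up front in the KVM_SET_MEMORY_ATTRIBUTES path:

	/* Sketch only: keep PRIVATE and host-mappable guest_memfd mutually exclusive. */
	if ((attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE) &&
	    kvm_range_has_mappable_gmem(kvm, start, end))
		return -EINVAL;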
> + }
> +
> + return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> }
> #else
> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> --
> 2.50.0.rc0.642.g800a2b2222-goog
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-13 21:03 ` Sean Christopherson
2025-06-13 21:18 ` David Hildenbrand
@ 2025-06-13 22:48 ` Sean Christopherson
2025-06-16 6:52 ` Fuad Tabba
` (2 subsequent siblings)
4 siblings, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2025-06-13 22:48 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Fri, Jun 13, 2025, Sean Christopherson wrote:
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
...
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>
> And to my point about "shared", this is also very confusing, because there are
> zero checks in here about shared vs. private.
Heh, and amusingly (to me at least), I was the one that suggested this name[*]:
: > static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
:
: This should be something like kvm_gmem_fault_shared() to make it abundantly clear
: what's being done. Because it too me a few looks to realize this is faulting
: memory into host userspace, not into the guest.
Though I don't think my two statements are contradictory. A bare kvm_gmem_fault()
is confusing because it's ambiguous. kvm_gmem_fault_shared() is confusing because
"shared" is (IMO) bad terminology.
E.g. to me, this is much more obvious:
static vm_fault_t kvm_gmem_fault_user(struct vm_fault *vmf)
or even
static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
if we're worried about "user" getting confused with supervisor vs. user in the
guest.
[*] https://lore.kernel.org/all/Z-3UGmcCwJtaP-yF@google.com
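Purely illustrative, here is how that rename would land on the vm_ops hookup
from this patch:

	static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf);

	static const struct vm_operations_struct kvm_gmem_vm_ops = {
		.fault = kvm_gmem_fault_user_mapping,
	};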
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-13 21:03 ` Sean Christopherson
2025-06-13 21:18 ` David Hildenbrand
2025-06-13 22:48 ` Sean Christopherson
@ 2025-06-16 6:52 ` Fuad Tabba
2025-06-16 14:16 ` David Hildenbrand
2025-06-17 23:04 ` Sean Christopherson
2025-06-16 13:44 ` Ira Weiny
2025-06-18 9:25 ` David Hildenbrand
4 siblings, 2 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-16 6:52 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Hi Sean,
On Fri, 13 Jun 2025 at 22:03, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
>
> Please don't lead with "This patch", simply state what changes are being
> made as a command.
Ack.
> > mapping that memory from host userspace.
>
> > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > flag at creation time.
>
> Why? I can see that from the patch.
It's in the patch series, not this patch. Would it help if I rephrase
it along the lines of:
This functionality isn't enabled until the introduction of the
KVM_GMEM_SHARED_MEM Kconfig option, and enabled for a given instance
by the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time, both of
which are introduced in a subsequent patch.
> This changelog is way, way, waaay too light on details. Sorry for jumping in at
> the 11th hour, but we've spent what, 2 years working on this?
I'll expand this. Just to make sure that I include the right details,
are you looking for implementation details, motivation, use cases?
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index d00b85cb168c..cb19150fd595 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
> > #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> >
> > #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>
> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
> deep to understand that "support shared" actually means "userspace explicitly
> enables sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>
> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
> is to share memory, but that's a _use case_, not the property of guest_memfd that
> is being controlled by userspace.
>
> And kvm_gmem_memslot_supports_shared() is even worse. It's not simply that the
> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
> instance is the _only_ entry point to the memslot.
>
> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
>
> if (kvm_slot_has_gmem(slot) &&
> (kvm_gmem_memslot_supports_shared(slot) ||
> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> }
>
> much more intuitive:
>
> if (kvm_is_memslot_gmem_only(slot) ||
> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
>
> And then have kvm_gmem_mapping_order() do:
>
> WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> return 0;
I have no preference really. To me this was intuitive, but I guess I
have been staring at this way too long. If you and all the
stakeholders are happy with your suggested changes, then I am happy
making them :)
> > struct kvm_create_guest_memfd {
> > __u64 size;
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 559c93ad90be..e90884f74404 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> > config HAVE_KVM_ARCH_GMEM_INVALIDATE
> > bool
> > depends on KVM_GMEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > + select KVM_GMEM
> > + bool
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..06616b6b493b 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> > return gfn - slot->base_gfn + slot->gmem.pgoff;
> > }
> >
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > + const u64 flags = (u64)inode->i_private;
> > +
> > + if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> > + return false;
> > +
> > + return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>
> And to my point about "shared", this is also very confusing, because there are
> zero checks in here about shared vs. private.
As you noted in a later email, it was you who suggested this name, but
like I said, I am happy to change it.
> > +{
> > + struct inode *inode = file_inode(vmf->vma->vm_file);
> > + struct folio *folio;
> > + vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> > + return VM_FAULT_SIGBUS;
> > +
> > + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > + if (IS_ERR(folio)) {
> > + int err = PTR_ERR(folio);
> > +
> > + if (err == -EAGAIN)
> > + return VM_FAULT_RETRY;
> > +
> > + return vmf_error(err);
> > + }
> > +
> > + if (WARN_ON_ONCE(folio_test_large(folio))) {
> > + ret = VM_FAULT_SIGBUS;
> > + goto out_folio;
> > + }
> > +
> > + if (!folio_test_uptodate(folio)) {
> > + clear_highpage(folio_page(folio, 0));
> > + kvm_gmem_mark_prepared(folio);
> > + }
> > +
> > + vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > + if (ret != VM_FAULT_LOCKED) {
> > + folio_unlock(folio);
> > + folio_put(folio);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > + .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + if (!kvm_gmem_supports_shared(file_inode(file)))
> > + return -ENODEV;
> > +
> > + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > + (VM_SHARED | VM_MAYSHARE)) {
>
> And the SHARED terminology gets really confusing here, due to colliding with the
> existing notion of SHARED file mappings.
Ack.
Before I respin, let's make sure we're all on the same page in terms
of terminology. Hopefully David can chime in again now that he's had
the weekend to ponder over the latest exchange :)
Thanks,
/fuad
> > + return -EINVAL;
> > + }
> > +
> > + vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > + return 0;
> > +}
> > +
> > static struct file_operations kvm_gmem_fops = {
> > + .mmap = kvm_gmem_mmap,
> > .open = generic_file_open,
> > .release = kvm_gmem_release,
> > .fallocate = kvm_gmem_fallocate,
> > @@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> > u64 flags = args->flags;
> > u64 valid_flags = 0;
> >
> > + if (kvm_arch_supports_gmem_shared_mem(kvm))
> > + valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> > if (flags & ~valid_flags)
> > return -EINVAL;
> >
> > --
> > 2.50.0.rc0.642.g800a2b2222-goog
> >
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-13 20:35 ` Sean Christopherson
@ 2025-06-16 7:13 ` Fuad Tabba
2025-06-16 14:20 ` David Hildenbrand
2025-06-24 20:51 ` Ackerley Tng
1 sibling, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-16 7:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Hi Sean,
On Fri, 13 Jun 2025 at 21:35, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
> > The bool has_private_mem is used to indicate whether guest_memfd is
> > supported.
>
> No? This is at best weird, and at worst flat out wrong:
>
> if (kvm->arch.supports_gmem &&
> fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
> return false;
>
> ditto for this code:
>
> if (kvm_arch_supports_gmem(vcpu->kvm) &&
> kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))i
> error_code |= PFERR_PRIVATE_ACCESS;
>
> and for the memory_attributes code. E.g. IIRC, with guest_memfd() mmap support,
> private vs. shared will become a property of the guest_memfd inode, i.e. this will
> become wrong:
>
> static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> {
> if (!kvm || kvm_arch_supports_gmem(kvm))
> return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>
> return 0;
> }
>
> Instead of renaming kvm_arch_has_private_mem() => kvm_arch_supports_gmem(), *add*
> kvm_arch_supports_gmem() and then kill off kvm_arch_has_private_mem() once non-x86
> usage is gone (i.e. query kvm->arch.has_private_mem directly).
>
> And then rather than rename has_private_mem, either add supports_gmem or do what
> you did for kvm_arch_supports_gmem_shared_mem() and explicitly check the VM type.
Will do.
To make sure we're on the same page, we should add `supports_gmem` and
keep `has_private_mem`, and continue using it for x86 code by querying
it directly once the helpers are added.
> > Rename it to supports_gmem to make its meaning clearer and to decouple memory
> > being private from guest_memfd.
> >
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Reviewed-by: Shivank Garg <shivankg@amd.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > Co-developed-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 4 ++--
> > arch/x86/kvm/mmu/mmu.c | 2 +-
> > arch/x86/kvm/svm/svm.c | 4 ++--
> > arch/x86/kvm/x86.c | 3 +--
> > 4 files changed, 6 insertions(+), 7 deletions(-)
>
> This missed the usage in TDX (it's not a staleness problem, because this series
> was based on 6.16-rc1, which has the relevant code).
>
> arch/x86/kvm/vmx/tdx.c: In function ‘tdx_vm_init’:
> arch/x86/kvm/vmx/tdx.c:627:18: error: ‘struct kvm_arch’ has no member named ‘has_private_mem’
> 627 | kvm->arch.has_private_mem = true;
> | ^
> make[5]: *** [scripts/Makefile.build:287: arch/x86/kvm/vmx/tdx.o] Error 1
I did test and run this before submitting the series. Building it on
x86 with x86_64_defconfig and with allmodconfig passes (I obviously
missed TDX though, apologies for that). I should have grepped for
has_private_mem. That said, if I understood your suggestion correctly,
this problem wouldn't happen again.
Cheers,
/fuad
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-13 21:03 ` Sean Christopherson
` (2 preceding siblings ...)
2025-06-16 6:52 ` Fuad Tabba
@ 2025-06-16 13:44 ` Ira Weiny
2025-06-16 14:03 ` David Hildenbrand
2025-06-18 9:25 ` David Hildenbrand
4 siblings, 1 reply; 75+ messages in thread
From: Ira Weiny @ 2025-06-16 13:44 UTC (permalink / raw)
To: Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Sean Christopherson wrote:
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
> > This patch enables support for shared memory in guest_memfd, including
>
> Please don't lead with "This patch", simply state what changes are being
> made as a command.
>
> > mapping that memory from host userspace.
>
> > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > flag at creation time.
>
> Why? I can see that from the patch.
>
> This changelog is way, way, waaay too light on details. Sorry for jumping in at
> the 11th hour, but we've spent what, 2 years working on this?
>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index d00b85cb168c..cb19150fd595 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
> > #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> >
> > #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> > +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>
> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
> deep to understand that "support shared" actually means "userspace explicitly
> enables sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>
> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
> is to share memory, but that's a _use case_, not the property of guest_memfd that
> is being controlled by userspace.
>
> And kvm_gmem_memslot_supports_shared() is even worse. It's not simply that the
> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
> instance is the _only_ entry point to the memslot.
>
> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
If we are going to change this, FLAG_MAPPABLE is not clear to me either.
The guest can map private memory, right? I see your point about shared
being overloaded with file shared but it would not be the first time a
term is overloaded. kvm_slot_has_gmem() does make a lot of sense.
If it is going to change, how about GUEST_MEMFD_FLAG_USER_MAPPABLE?
Ira
> KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
>
> if (kvm_slot_has_gmem(slot) &&
> (kvm_gmem_memslot_supports_shared(slot) ||
> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> }
>
> much more intuitive:
>
> if (kvm_is_memslot_gmem_only(slot) ||
> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
>
> And then have kvm_gmem_mapping_order() do:
>
> WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> return 0;
>
> > struct kvm_create_guest_memfd {
> > __u64 size;
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 559c93ad90be..e90884f74404 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> > config HAVE_KVM_ARCH_GMEM_INVALIDATE
> > bool
> > depends on KVM_GMEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > + select KVM_GMEM
> > + bool
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 6db515833f61..06616b6b493b 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> > return gfn - slot->base_gfn + slot->gmem.pgoff;
> > }
> >
> > +static bool kvm_gmem_supports_shared(struct inode *inode)
> > +{
> > + const u64 flags = (u64)inode->i_private;
> > +
> > + if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> > + return false;
> > +
> > + return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>
> And to my point about "shared", this is also very confusing, because there are
> zero checks in here about shared vs. private.
>
> > +{
> > + struct inode *inode = file_inode(vmf->vma->vm_file);
> > + struct folio *folio;
> > + vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> > + return VM_FAULT_SIGBUS;
> > +
> > + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > + if (IS_ERR(folio)) {
> > + int err = PTR_ERR(folio);
> > +
> > + if (err == -EAGAIN)
> > + return VM_FAULT_RETRY;
> > +
> > + return vmf_error(err);
> > + }
> > +
> > + if (WARN_ON_ONCE(folio_test_large(folio))) {
> > + ret = VM_FAULT_SIGBUS;
> > + goto out_folio;
> > + }
> > +
> > + if (!folio_test_uptodate(folio)) {
> > + clear_highpage(folio_page(folio, 0));
> > + kvm_gmem_mark_prepared(folio);
> > + }
> > +
> > + vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > + if (ret != VM_FAULT_LOCKED) {
> > + folio_unlock(folio);
> > + folio_put(folio);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > + .fault = kvm_gmem_fault_shared,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + if (!kvm_gmem_supports_shared(file_inode(file)))
> > + return -ENODEV;
> > +
> > + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > + (VM_SHARED | VM_MAYSHARE)) {
>
> And the SHARED terminology gets really confusing here, due to colliding with the
> existing notion of SHARED file mappings.
>
> > + return -EINVAL;
> > + }
> > +
> > + vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > + return 0;
> > +}
> > +
> > static struct file_operations kvm_gmem_fops = {
> > + .mmap = kvm_gmem_mmap,
> > .open = generic_file_open,
> > .release = kvm_gmem_release,
> > .fallocate = kvm_gmem_fallocate,
> > @@ -463,6 +533,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> > u64 flags = args->flags;
> > u64 valid_flags = 0;
> >
> > + if (kvm_arch_supports_gmem_shared_mem(kvm))
> > + valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> > +
> > if (flags & ~valid_flags)
> > return -EINVAL;
> >
> > --
> > 2.50.0.rc0.642.g800a2b2222-goog
> >
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-16 13:44 ` Ira Weiny
@ 2025-06-16 14:03 ` David Hildenbrand
2025-06-16 14:16 ` Fuad Tabba
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-16 14:03 UTC (permalink / raw)
To: Ira Weiny, Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 16.06.25 15:44, Ira Weiny wrote:
> Sean Christopherson wrote:
>> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>>> This patch enables support for shared memory in guest_memfd, including
>>
>> Please don't lead with "This patch", simply state what changes are being
>> made as a command.
>>
>>> mapping that memory from host userspace.
>>
>>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>>> flag at creation time.
>>
>> Why? I can see that from the patch.
>>
>> This changelog is way, way, waaay too light on details. Sorry for jumping in at
>> the 11th hour, but we've spent what, 2 years working on this?
>>
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index d00b85cb168c..cb19150fd595 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
>>> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>>>
>>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>>> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>>
>> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
>> deep to understand that "support shared" actually means "userspace explicitly
>> enabled sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>>
>> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
>> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
>> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
>> is to share memory, but that's a _use case_, not the property of guest_memfd that
>> is being controlled by userspace.
>>
>> And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
>> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
>> instance is the _only_ entry point to the memslot.
>>
>> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
>
> If we are going to change this; FLAG_MAPPABLE is not clear to me either.
> The guest can map private memory, right? I see your point about shared
> being overloaded with file shared but it would not be the first time a
> term is overloaded. kvm_slot_has_gmem() does make a lot of sense.
>
> If it is going to change; how about GUEST_MEMFD_FLAG_USER_MAPPABLE?
If "shared" is not good enough terminology ...
... can we please just find a way to name what this "non-private" memory
is called? That something is mappable into $whatever is not the right
way to look at this IMHO. As raised in the past, we can easily support
read()/write()/etc to this non-private memory.
I'll note, the "non-private" memory in guest-memfd behaves just like ...
the "shared" memory in shmem ... well, or like other memory in memfd.
(which is based on mm/shmem.c).
"Private" is also not the best way to describe the "protected\encrypted"
memory, but that ship has sailed with KVM_MEMORY_ATTRIBUTE_PRIVATE.
I'll further note that in the doc of KVM_SET_USER_MEMORY_REGION2 we talk
about "private" vs "shared" memory ... so that would have to be improved
as well.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-16 14:03 ` David Hildenbrand
@ 2025-06-16 14:16 ` Fuad Tabba
2025-06-16 14:25 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-16 14:16 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ira Weiny, Sean Christopherson, kvm, linux-arm-msm, linux-mm,
kvmarm, pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer,
aou, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On Mon, 16 Jun 2025 at 15:03, David Hildenbrand <david@redhat.com> wrote:
>
> On 16.06.25 15:44, Ira Weiny wrote:
> > Sean Christopherson wrote:
> >> On Wed, Jun 11, 2025, Fuad Tabba wrote:
> >>> This patch enables support for shared memory in guest_memfd, including
> >>
> >> Please don't lead with "This patch", simply state what changes are being
> >> made as a command.
> >>
> >>> mapping that memory from host userspace.
> >>
> >>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> >>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> >>> flag at creation time.
> >>
> >> Why? I can see that from the patch.
> >>
> >> This changelog is way, way, waaay too light on details. Sorry for jumping in at
> >> the 11th hour, but we've spent what, 2 years working on this?
> >>
> >>> Reviewed-by: Gavin Shan <gshan@redhat.com>
> >>> Acked-by: David Hildenbrand <david@redhat.com>
> >>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>> ---
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index d00b85cb168c..cb19150fd595 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
> >>> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> >>>
> >>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> >>> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
> >>
> >> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
> >> deep to understand that "support shared" actually means "userspace explicitly
> >> enabled sharing on _this_ guest_memfd instance". E.g. I was surprised to see
> >>
> >> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
> >> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
> >> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
> >> is to share memory, but that's a _use case_, not the property of guest_memfd that
> >> is being controlled by userspace.
> >>
> >> And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
> >> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
> >> instance is the _only_ entry point to the memslot.
> >>
> >> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> >
> > If we are going to change this; FLAG_MAPPABLE is not clear to me either.
> > The guest can map private memory, right? I see your point about shared
> > being overloaded with file shared but it would not be the first time a
> > term is overloaded. kvm_slot_has_gmem() does make a lot of sense.
> >
> > If it is going to change; how about GUEST_MEMFD_FLAG_USER_MAPPABLE?
>
> If "shared" is not good enough terminology ...
>
> ... can we please just find a way to name what this "non-private" memory
> is called? That something is mappable into $whatever is not the right
> way to look at this IMHO. As raised in the past, we can easily support
> read()/write()/etc to this non-private memory.
>
>
> I'll note, the "non-private" memory in guest-memfd behaves just like ...
> the "shared" memory in shmem ... well, or like other memory in memfd.
> (which is based on mm/shmem.c).
>
> "Private" is also not the best way to describe the "protected\encrypted"
> memory, but that ship has sailed with KVM_MEMORY_ATTRIBUTE_PRIVATE.
>
> I'll further note that in the doc of KVM_SET_USER_MEMORY_REGION2 we talk
> about "private" vs "shared" memory ... so that would have to be improved
> as well.
To add to what David just wrote, V1 of this series used the term
"mappable" [1]. After a few discussions, I thought the consensus was
that "shared" was a more accurate description --- i.e., mappability
was a side effect of it being shared with the host.
One could argue that non-CoCo VMs have no concept of "shared" vs
"private". A different way of looking at it is, non-CoCo VMs have
their state as shared by default.
I don't have a strong opinion. What would be good is if we could agree on
the terminology before I respin this.
Thanks,
/fuad
[1] https://lore.kernel.org/all/20250122152738.1173160-4-tabba@google.com/
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-16 6:52 ` Fuad Tabba
@ 2025-06-16 14:16 ` David Hildenbrand
2025-06-17 23:04 ` Sean Christopherson
1 sibling, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2025-06-16 14:16 UTC (permalink / raw)
To: Fuad Tabba, Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On 16.06.25 08:52, Fuad Tabba wrote:
> Hi Sean,
>
> On Fri, 13 Jun 2025 at 22:03, Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>>> This patch enables support for shared memory in guest_memfd, including
>>
>> Please don't lead with "This patch", simply state what changes are being
>> made as a command.
>
> Ack.
>
>>> mapping that memory from host userspace.
>>
>>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>>> flag at creation time.
>>
>> Why? I can see that from the patch.
>
> It's in the patch series, not this patch. Would it help if I rephrase
> it along the lines of:
>
> This functionality isn't enabled until the introduction of the
> KVM_GMEM_SHARED_MEM Kconfig option, and enabled for a given instance
> by the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time. Both of
> which are introduced in a subsequent patch.
>
>> This changelog is way, way, waaay too light on details. Sorry for jumping in at
>> the 11th hour, but we've spent what, 2 years working on this?
>
> I'll expand this. Just to make sure that I include the right details,
> are you looking for implementation details, motivation, use cases?
>
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index d00b85cb168c..cb19150fd595 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
>>> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>>>
>>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>>> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>>
>> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
>> deep to understand that "support shared" actually means "userspace explicitly
>> enabled sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>>
>> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
>> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
>> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
>> is to share memory, but that's a _use case_, not the property of guest_memfd that
>> is being controlled by userspace.
>>
>> And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
>> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
>> instance is the _only_ entry point to the memslot.
>>
>> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
>> KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
>>
>> if (kvm_slot_has_gmem(slot) &&
>> (kvm_gmem_memslot_supports_shared(slot) ||
>> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
>> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
>> }
>>
>> much more intuitive:
>>
>> if (kvm_is_memslot_gmem_only(slot) ||
>> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
>> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
>>
>> And then have kvm_gmem_mapping_order() do:
>>
>> WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
>> return 0;
>
> I have no preference really. To me this was intuitive, but I guess I
> have been staring at this way too long. If you and all the
> stakeholders are happy with your suggested changes, then I am happy
> making them :)
>
>
>>> struct kvm_create_guest_memfd {
>>> __u64 size;
>>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>>> index 559c93ad90be..e90884f74404 100644
>>> --- a/virt/kvm/Kconfig
>>> +++ b/virt/kvm/Kconfig
>>> @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>>> config HAVE_KVM_ARCH_GMEM_INVALIDATE
>>> bool
>>> depends on KVM_GMEM
>>> +
>>> +config KVM_GMEM_SHARED_MEM
>>> + select KVM_GMEM
>>> + bool
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index 6db515833f61..06616b6b493b 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>>> return gfn - slot->base_gfn + slot->gmem.pgoff;
>>> }
>>>
>>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>>> +{
>>> + const u64 flags = (u64)inode->i_private;
>>> +
>>> + if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
>>> + return false;
>>> +
>>> + return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>>> +}
>>> +
>>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>>
>> And to my point about "shared", this is also very confusing, because there are
>> zero checks in here about shared vs. private.
>
> As you noted in a later email, it was you who suggested this name, but
> like I said, I am happy to change it.
>
>>> +{
>>> + struct inode *inode = file_inode(vmf->vma->vm_file);
>>> + struct folio *folio;
>>> + vm_fault_t ret = VM_FAULT_LOCKED;
>>> +
>>> + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>> + return VM_FAULT_SIGBUS;
>>> +
>>> + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>> + if (IS_ERR(folio)) {
>>> + int err = PTR_ERR(folio);
>>> +
>>> + if (err == -EAGAIN)
>>> + return VM_FAULT_RETRY;
>>> +
>>> + return vmf_error(err);
>>> + }
>>> +
>>> + if (WARN_ON_ONCE(folio_test_large(folio))) {
>>> + ret = VM_FAULT_SIGBUS;
>>> + goto out_folio;
>>> + }
>>> +
>>> + if (!folio_test_uptodate(folio)) {
>>> + clear_highpage(folio_page(folio, 0));
>>> + kvm_gmem_mark_prepared(folio);
>>> + }
>>> +
>>> + vmf->page = folio_file_page(folio, vmf->pgoff);
>>> +
>>> +out_folio:
>>> + if (ret != VM_FAULT_LOCKED) {
>>> + folio_unlock(folio);
>>> + folio_put(folio);
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
>>> + .fault = kvm_gmem_fault_shared,
>>> +};
>>> +
>>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>> +{
>>> + if (!kvm_gmem_supports_shared(file_inode(file)))
>>> + return -ENODEV;
>>> +
>>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>>> + (VM_SHARED | VM_MAYSHARE)) {
>>
>> And the SHARED terminology gets really confusing here, due to colliding with the
>> existing notion of SHARED file mappings.
>
> Ack.
>
> Before I respin, let's make sure we're all on the same page in terms
> of terminology. Hopefully David can chime in again now that he's had
> the weekend to ponder over the latest exchange :)
Fortunately, the naming discussions I have in my spare time (e.g., how
to name my two daughters) are usually easier ;)
As raised in my other reply, talking about mappability is IMHO the wrong
way to look at it.
VM_SHARED is about shared mappings.
shmem is about shared memory.
You can have private mappings of shmem.
You can have shared mappings of shmem. (confusing, right?)
One talks about mmap() semantics (CoW), the other talks about memory
semantics.
This is existing confusion even without guest_memfd making some of its
memory behave like shmem, I'm afraid.
If we want to avoid talking about "shared memory" in the context of
guest_memfd, maybe we can come up with a terminology that describes this
"non-private memory" clearer. (and fixup the existing doc that uses
private vs. shared)
kvm_gmem_supports_ordinary_memory()
kvm_gmem_supports_non_private_memory()
...
which also don't sound that great.
(I don't like kvm_gmem_supports_shared() either, because there it is not
clear what is actually supported to be shared.
kvm_gmem_supports_shared_memory() would be clearer IMHO )
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-16 7:13 ` Fuad Tabba
@ 2025-06-16 14:20 ` David Hildenbrand
0 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2025-06-16 14:20 UTC (permalink / raw)
To: Fuad Tabba, Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
>>> Rename it to supports_gmem to make its meaning clearer and to decouple memory
>>> being private from guest_memfd.
>>>
>>> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>> Reviewed-by: Shivank Garg <shivankg@amd.com>
>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>> Co-developed-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 4 ++--
>>> arch/x86/kvm/mmu/mmu.c | 2 +-
>>> arch/x86/kvm/svm/svm.c | 4 ++--
>>> arch/x86/kvm/x86.c | 3 +--
>>> 4 files changed, 6 insertions(+), 7 deletions(-)
>>
>> This missed the usage in TDX (it's not a staleness problem, because this series
>> was based on 6.16-rc1, which has the relevant code).
>>
>> arch/x86/kvm/vmx/tdx.c: In function ‘tdx_vm_init’:
>> arch/x86/kvm/vmx/tdx.c:627:18: error: ‘struct kvm_arch’ has no member named ‘has_private_mem’
>> 627 | kvm->arch.has_private_mem = true;
>> | ^
>> make[5]: *** [scripts/Makefile.build:287: arch/x86/kvm/vmx/tdx.o] Error 1
>
> I did test and run this before submitting the series. Building it on
> x86 with x86_64_defconfig and with allmodconfig passed (I obviously
> missed TDX though, apologies for that). I should have grepped for
> has_private_mem. That said, if I understood your suggestion correctly,
> this problem wouldn't happen again.
It's interesting that the build bots didn't catch that earlier.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-16 14:16 ` Fuad Tabba
@ 2025-06-16 14:25 ` David Hildenbrand
2025-06-18 0:40 ` Sean Christopherson
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-16 14:25 UTC (permalink / raw)
To: Fuad Tabba
Cc: Ira Weiny, Sean Christopherson, kvm, linux-arm-msm, linux-mm,
kvmarm, pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer,
aou, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, ackerleytng, mail, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta
On 16.06.25 16:16, Fuad Tabba wrote:
> On Mon, 16 Jun 2025 at 15:03, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 16.06.25 15:44, Ira Weiny wrote:
>>> Sean Christopherson wrote:
>>>> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>>>>> This patch enables support for shared memory in guest_memfd, including
>>>>
>>>> Please don't lead with "This patch", simply state what changes are being
>>>> made as a command.
>>>>
>>>>> mapping that memory from host userspace.
>>>>
>>>>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
>>>>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
>>>>> flag at creation time.
>>>>
>>>> Why? I can see that from the patch.
>>>>
>>>> This changelog is way, way, waaay too light on details. Sorry for jumping in at
>>>> the 11th hour, but we've spent what, 2 years working on this?
>>>>
>>>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>>>>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>>>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>>> ---
>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>> index d00b85cb168c..cb19150fd595 100644
>>>>> --- a/include/uapi/linux/kvm.h
>>>>> +++ b/include/uapi/linux/kvm.h
>>>>> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
>>>>> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>>>>>
>>>>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>>>>> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>>>>
>>>> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
>>>> deep to understand that "support shared" actually means "userspace explicitly
>>>> enabled sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>>>>
>>>> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
>>>> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
>>>> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
>>>> is to share memory, but that's a _use case_, not the property of guest_memfd that
>>>> is being controlled by userspace.
>>>>
>>>> And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
>>>> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
>>>> instance is the _only_ entry point to the memslot.
>>>>
>>>> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
>>>
>>> If we are going to change this; FLAG_MAPPABLE is not clear to me either.
>>> The guest can map private memory, right? I see your point about shared
>>> being overloaded with file shared but it would not be the first time a
>>> term is overloaded. kvm_slot_has_gmem() does make a lot of sense.
>>>
>>> If it is going to change; how about GUEST_MEMFD_FLAG_USER_MAPPABLE?
>>
>> If "shared" is not good enough terminology ...
>>
>> ... can we please just find a way to name what this "non-private" memory
>> is called? That something is mappable into $whatever is not the right
>> way to look at this IMHO. As raised in the past, we can easily support
>> read()/write()/etc to this non-private memory.
>>
>>
>> I'll note, the "non-private" memory in guest-memfd behaves just like ...
>> the "shared" memory in shmem ... well, or like other memory in memfd.
>> (which is based on mm/shmem.c).
>>
>> "Private" is also not the best way to describe the "protected\encrypted"
>> memory, but that ship has sailed with KVM_MEMORY_ATTRIBUTE_PRIVATE.
>>
>> I'll further note that in the doc of KVM_SET_USER_MEMORY_REGION2 we talk
>> about "private" vs "shared" memory ... so that would have to be improved
>> as well.
>
> To add to what David just wrote, V1 of this series used the term
> "mappable" [1]. After a few discussions, I thought the consensus was
> that "shared" was a more accurate description --- i.e., mappability
> was a side effect of it being shared with the host.
>
> One could argue that non-CoCo VMs have no concept of "shared" vs
> "private". A different way of looking at it is, non-CoCo VMs have
> their state as shared by default.
All memory of these VMs behaves similarly to other memory-based shared
memory backends (memfd, shmem) in the system, yes. You can map it into
multiple processes and use it like shmem/memfd.
I'm still thinking about another way to call non-private memory ... no
success so far. "ordinary" or "generic" is .... not better.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-16 6:52 ` Fuad Tabba
2025-06-16 14:16 ` David Hildenbrand
@ 2025-06-17 23:04 ` Sean Christopherson
2025-06-18 11:18 ` Fuad Tabba
1 sibling, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2025-06-17 23:04 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Mon, Jun 16, 2025, Fuad Tabba wrote:
> > > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > > flag at creation time.
> >
> > Why? I can see that from the patch.
>
> It's in the patch series, not this patch.
Eh, not really. It doesn't even matter how "Why?" is interpreted, because nothing
in this series covers any of the reasonable interpretations to an acceptable
degree.
These are all the changelogs for generic changes
: This patch enables support for shared memory in guest_memfd, including
: mapping that memory from host userspace.
:
: This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
: and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
: flag at creation time.
: Add a new internal flag in the top half of memslot->flags to track when
: a guest_memfd-backed slot supports shared memory, which is reserved for
: internal use in KVM.
:
: This avoids repeatedly checking the underlying guest_memfd file for
: shared memory support, which requires taking a reference on the file.
the small bit of documentation
+When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
+supports GUEST_MEMFD_FLAG_SUPPORT_SHARED. Setting this flag on guest_memfd
+creation enables mmap() and faulting of guest_memfd memory to host userspace.
+
+When the KVM MMU performs a PFN lookup to service a guest fault and the backing
+guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
+always be consumed from guest_memfd, regardless of whether it is a shared or a
+private fault.
and the cover letter
: The purpose of this series is to allow mapping guest_memfd backed memory
: at the host. This support enables VMMs like Firecracker to run guests
: backed completely by guest_memfd [2]. Combined with Patrick's series for
: direct map removal in guest_memfd [3], this would allow running VMs that
: offer additional hardening against Spectre-like transient execution
: attacks.
:
: This series will also serve as a base for _restricted_ mmap() support
: for guest_memfd backed memory at the host for CoCos that allow sharing
: guest memory in-place with the host [4].
None of those get remotely close to explaining the use cases in sufficient
detail.
Now, it's entirely acceptable, and in this case probably highly preferred, to
link to the relevant use cases, e.g. as opposed to trying to regurgitate and
distill a huge pile of information.
But I want the _changelog_ to do the heavy lifting of capturing the most useful
links and providing context. E.g. to find the motivation for using
guest_memfd to back non-CoCo VMs, I had to follow the [3] link to Patrick's
series, then walk backwards through the versions of _that_ series, and eventually
come across another link in Patrick's very first RFC:
: This RFC series is a rough draft adding support for running
: non-confidential compute VMs in guest_memfd, based on prior discussions
: with Sean [1].
where [1] is the much more helpful:
https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
Now, _I_ am obviously aware of most/all of the use cases and motivations, but
the changelog isn't just for people like me. Far from it; the changelog is most
useful for people that are coming in with _zero_ knowledge and context. Finding
the above link took me quite a bit of effort and digging (and to some extent, I
knew what I was looking for), whereas an explicit reference in the changelog
would (hopefully) take only the few seconds needed to read the blurb and click
the link.
My main argument for why you (and everyone else) should put significant effort
into changelogs (and comments and documentation!) is very simple: writing and
curating a good changelog (comment/documentation) is something the author does
*once*. If the author skimps out on the changelog, then *every* reader has
to do that same work *every* time they dig through this code. We as a community
come out far, far ahead in terms of developer time and understanding by turning a
many-time cost into a one-time cost (and that's not even accounting for the fact
that the author's one-time cost will likely be a _lot_ smaller).
There's obviously a balance to strike. E.g. if the changelog has 50 links, that's
probably going to be counter-productive for most readers. In this case, 5-7-ish
links with (very) brief contextual references is probably the sweet spot.
> Would it help if I rephrase it along the lines of:
>
> This functionality isn't enabled until the introduction of the
> KVM_GMEM_SHARED_MEM Kconfig option, and enabled for a given instance
> by the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time. Both of
> which are introduced in a subsequent patch.
>
> > This changelog is way, way, waaay too light on details. Sorry for jumping in at
> > the 11th hour, but we've spent what, 2 years working on this?
>
> I'll expand this. Just to make sure that I include the right details,
> are you looking for implementation details, motivation, use cases?
Despite my lengthy response, none of the above?
Use cases are good fodder for Documentation and the cover letter, and for *brief*
references in the changelogs. Implementation details generally don't need to be
explained in the changelog, modulo notable gotchas and edge cases that are worth
calling out.
I _am_ looking for the motivation, but I suspect it's not the motivation you have
in mind. I'm not terribly concerned with why you want to implement this
functionality; that should be easy to glean from the Documentation and use case
links.
The motivation I'm looking for is why you're adding CONFIG_KVM_GMEM_SHARED_MEM
and GUEST_MEMFD_FLAG_SUPPORT_SHARED.
E.g. CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES was added because it gates large swaths
of code, uAPI, and a field we don't want to access "accidentally" (mem_attr_array),
and because CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES has a hard dependency on
CONFIG_KVM_GENERIC_MMU_NOTIFIER.
For CONFIG_KVM_GMEM_SHARED_MEM, I'm just not seeing the motivation. It gates
very little code (though that could be slightly changed by wrapping the mmap()
and fault logic in guest_memfd.c), and literally every use is part of a broader
conditional. I.e. it's effectively an optimization.
Ha! And it's actively buggy. Because this will allow shared gmem for DEFAULT_VM,
#define kvm_arch_supports_gmem_shared_mem(kvm) \
(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) && \
((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM || \
(kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
but only if CONFIG_KVM_SW_PROTECTED_VM is selected. That makes no sense. And
that changelog is also sorely lacking. It covers the what, but that's quite
useless, because I can very easily see the what from the code. By covering the
"why" in the changelog, (hopefully) you would have come to the same conclusion
that selecting KVM_GMEM_SHARED_MEM iff KVM_SW_PROTECTED_VM is enabled doesn't
make any sense (because you wouldn't have been able to write a sane justification).
Or, if it somehow does make sense, i.e. if I'm missing something, then that
absolutely needs to in the changelog!
: Define the architecture-specific macro to enable shared memory support
: in guest_memfd for ordinary, i.e., non-CoCo, VM types, specifically
: KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
:
: Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
: enabled.
As for GUEST_MEMFD_FLAG_SUPPORT_SHARED, after digging through the code, I _think_
the reason we need a flag is so that KVM knows to completely ignore the HVA in
the memslot. (a) explaining that (again, for future readers) would be super
helpful, and (b) if there is other motivation for a per-guest_memfd opt-in, then
_that_ is also very interesting.
And for (a), bonus points if you explain why it's a GUEST_MEMFD flag, e.g. as
opposed to a per-VM capability or per-memslot flag. (Though this may be self-
evident to any readers that understand any of this, so definitely optional).
> > So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> > KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
> >
> > if (kvm_slot_has_gmem(slot) &&
> > (kvm_gmem_memslot_supports_shared(slot) ||
> > kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> > return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> > }
> >
> > much more intuitive:
> >
> > if (kvm_is_memslot_gmem_only(slot) ||
> > kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> > return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> >
> > And then have kvm_gmem_mapping_order() do:
> >
> > WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> > return 0;
>
> I have no preference really. To me this was intuitive, but I guess I
> have been staring at this way too long.
I agree that SHARED is intuitive for the pKVM use case (and probably all CoCo use
cases). My objection with the name is that it's misleading/confusing for non-CoCo
VMs (at least for me), and that using SHARED could unnecessarily paint us into a
corner.
Specifically, if there are ever use cases where guest memory is shared between
entities *without* mapping guest memory into host userspace, then we'll be a bit
hosed. Though as is tradition in KVM, I suppose we could just call it
GUEST_MEMFD_FLAG_SUPPORT_SHARED2 ;-)
Regarding CoCo vs. non-CoCo intuition, it's easy enough to discern that
GUEST_MEMFD_FLAG_MAPPABLE is required to do in-place sharing with host userspace.
But IMO it's not easy to glean that GUEST_MEMFD_FLAG_SUPPORT_SHARED is
effectively a hard requirement for non-CoCo x86 VMs purely because many
flows in KVM x86 will fail miserably if KVM can't access guest memory via uaccess,
i.e. if guest memory isn't mapped by host userspace. In other words, it's as much
about working within KVM's existing design (and not losing support for a wide
swath of features) as it is about "sharing" guest memory with host userspace.
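As a concrete reference point for the capability/flag pairing discussed above,
the probe-then-opt-in sequence a VMM would use with the v12 names (which may yet
be renamed) is roughly the sketch below; KVM_CHECK_EXTENSION is existing uAPI,
issued on the VM fd here on the assumption that the capability is advertised per
VM, since support depends on the VM type:

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch, not from the series: only ask for a host-mappable guest_memfd if
 * the kernel advertises support for this VM. */
static int create_mappable_gmem(int vm_fd, __u64 size)
{
	struct kvm_create_guest_memfd args = {
		.size  = size,
		.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED,
	};

	/* For a non-CoCo VMM this is effectively a hard requirement, per above. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) <= 0) {
		errno = EOPNOTSUPP;
		return -1;
	}

	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
}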
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-16 14:25 ` David Hildenbrand
@ 2025-06-18 0:40 ` Sean Christopherson
2025-06-18 8:15 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2025-06-18 0:40 UTC (permalink / raw)
To: David Hildenbrand
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On Mon, Jun 16, 2025, David Hildenbrand wrote:
> On 16.06.25 16:16, Fuad Tabba wrote:
> > On Mon, 16 Jun 2025 at 15:03, David Hildenbrand <david@redhat.com> wrote:
> > > > > IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
> > > > > weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
> > > > > novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
> > > > > is to share memory, but that's a _use case_, not the property of guest_memfd that
> > > > > is being controlled by userspace.
> > > > >
> > > > > And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
> > > > > memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
> > > > > instance is the _only_ entry point to the memslot.
> > > > >
> > > > > So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> > > >
> > > > If we are going to change this; FLAG_MAPPABLE is not clear to me either.
> > > > The guest can map private memory, right? I see your point about shared
> > > > being overloaded with file shared but it would not be the first time a
> > > > term is overloaded. kvm_slot_has_gmem() does make a lot of sense.
> > > >
> > > > If it is going to change; how about GUEST_MEMFD_FLAG_USER_MAPPABLE?
> > >
> > > If "shared" is not good enough terminology ...
> > >
> > > ... can we please just find a way to name what this "non-private" memory
> > > is called?
guest_memfd? Not trying to be cheeky, I genuinely don't understand the need
to come up with a different name. Before CoCo came along, I can't think of a
single time where we felt the need to describe guest memory. There have been
*many* instances of referring to the underlying backing store (e.g. HugeTLB vs.
THP), and many instances where we've needed to talk about the types of mappings
for guest memory, but I can't think of any cases where describing the state of
guest memory itself was ever necessary or even useful.
> > > That something is mappable into $whatever is not the right
> > > way to look at this IMHO.
Why not? Honest question. USER_MAPPABLE is very literal, but I think it's the
right granularity. E.g. we _could_ support read()/write()/etc, but it's not
clear to me that we need/want to. And so why bundle those under SHARED, or any
other one-size-fits-all flag?
> > > As raised in the past, we can easily support read()/write()/etc to this
> > > non-private memory.
> > >
> > > I'll note, the "non-private" memory in guest-memfd behaves just like ...
> > > the "shared" memory in shmem ... well, or like other memory in memfd.
> > > (which is based on mm/shmem.c).
> > >
> > > "Private" is also not the best way to describe the "protected\encrypted"
> > > memory, but that ship has sailed with KVM_MEMORY_ATTRIBUTE_PRIVATE.
Heh, I would argue that ship sailed when TDX called the PTE flag the Shared bit :-)
But yeah, in hindsight, maybe not the greatest name.
> > > I'll further note that in the doc of KVM_SET_USER_MEMORY_REGION2 we talk
> > > about "private" vs "shared" memory ... so that would have to be improved
> > > as well.
> >
> > To add to what David just wrote, V1 of this series used the term
> > "mappable" [1]. After a few discussions, I thought the consensus was
> > that "shared" was a more accurate description --- i.e., mappability
> > was a side effect of it being shared with the host.
As I mentioned in the other thread with respect to sharing between other
entities, simply SHARED doesn't provide sufficient granularity. HOST_SHAREABLE
gets us closer, but I still don't like that because it implies the memory is
100% shareable, e.g. can be accessed just like normal memory.
And for non-CoCo x86 VMs, sharing with host userspace isn't even necessarily the
goal, i.e. "sharing" is a side effect of needing to allow mmap() so that KVM can
continue to function.
> > One could argue that non-CoCo VMs have no concept of "shared" vs
> > "private".
I am that one :-)
> > A different way of looking at it is, non-CoCo VMs have
> > their state as shared by default.
Eh, there has to be another state for there to be a default.
> All memory of these VMs behaves similar to other memory-based shared memory
> backends (memfd, shmem) in the system, yes. You can map it into multiple
> processes and use it like shmem/memfd.
Ya, but that's more because guest_memfd only supports MAP_SHARED, versus KVM
really wanting to truly share the memory with the entire system.
Of course, that's also an argument to some extent against USER_MAPPABLE, because
that name assumes we'll never want to support MAP_PRIVATE. But letting userspace
MAP_PRIVATE guest_memfd would completely defeat the purpose of guest_memfd, so
unless I'm forgetting a wrinkle with MAP_PRIVATE vs. MAP_SHARED, that's an
assumption I'm a-ok making.
If we are really dead set on having SHARED in the name, it could be
GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
to me that's _too_ specific and again somewhat confusing given the unfortunate
private vs. shared usage in CoCo-land. And just playing the odds, I'm fine taking
a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or whatever,
because I think that is comically unlikely to happen.
> I'm still thinking about another way to call non-private memory ... no
> success so far. "ordinary" or "generic" is .... not better.
As above, I don't have the same sense of urgency regarding finding a name for
guest_memfd.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 0:40 ` Sean Christopherson
@ 2025-06-18 8:15 ` David Hildenbrand
2025-06-18 9:20 ` Xiaoyao Li
2025-06-19 1:48 ` Sean Christopherson
0 siblings, 2 replies; 75+ messages in thread
From: David Hildenbrand @ 2025-06-18 8:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On 18.06.25 02:40, Sean Christopherson wrote:
> On Mon, Jun 16, 2025, David Hildenbrand wrote:
>> On 16.06.25 16:16, Fuad Tabba wrote:
>>> On Mon, 16 Jun 2025 at 15:03, David Hildenbrand <david@redhat.com> wrote:
>>>>>> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
>>>>>> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
>>>>>> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
>>>>>> is to share memory, but that's a _use case_, not the property of guest_memfd that
>>>>>> is being controlled by userspace.
>>>>>>
>>>>>> And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
>>>>>> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
>>>>>> instance is the _only_ entry point to the memslot.
>>>>>>
>>>>>> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
>>>>>
>>>>> If we are going to change this; FLAG_MAPPABLE is not clear to me either.
>>>>> The guest can map private memory, right? I see your point about shared
>>>>> being overloaded with file shared but it would not be the first time a
>>>>> term is overloaded. kvm_slot_has_gmem() does make a lot of sense.
>>>>>
>>>>> If it is going to change; how about GUEST_MEMFD_FLAG_USER_MAPPABLE?
>>>>
>>>> If "shared" is not good enough terminology ...
>>>>
>>>> ... can we please just find a way to name what this "non-private" memory
>>>> is called?
>
> guest_memfd? Not trying to be cheeky, I genuinely don't understand the need
> to come up with a different name. Before CoCo came along, I can't think of a
> single time where we felt the need to describe guest memory. There have been
> *many* instances of referring to the underlying backing store (e.g. HugeTLB vs.
> THP), and many instances where we've needed to talk about the types of mappings
> for guest memory, but I can't think of any cases where describing the state of
> guest memory itself was ever necessary or even useful.
>
>>>> That something is mappable into $whatever is not the right
>>>> way to look at this IMHO.
>
> Why not? Honest question. USER_MAPPABLE is very literal, but I think it's the
> right granularity. E.g. we _could_ support read()/write()/etc, but it's not
> clear to me that we need/want to. And so why bundle those under SHARED, or any
> other one-size-fits-all flag?
Let's take a step back. There are various ways to look at this:
1) Indicate support for guest_memfd operations:
"GUEST_MEMFD_FLAG_MMAP": we support the mmap() operation
"GUEST_MEMFD_FLAG_WRITE": we support the write() operation
"GUEST_MEMFD_FLAG_READ": we support the read() operation
...
"GUEST_MEMFD_FLAG_UFFD": we support userfaultfd operations
Absolutely fine with me. In this series, we'd be advertising
GUEST_MEMFD_FLAG_MMAP. Because we support the mmap operation.
Whether the others are ever required remains to be seen [1].
2) Indicating the mmap mapping type (support for MMAP flags)
As you write below, one could indicate that we support
"mmap(MAP_SHARED)" vs "mmap(MAP_PRIVATE)".
I don't think that's required for now, as MAP_SHARED is really the
default that anything that supports mmap() supports. If someone ever
needs MAP_PRIVATE (CoW) support they can add such a flag
(GUEST_MEMFD_FLAG_MMAP_MAP_PRIVATE). I doubt we want that, but who knows.
As expressed elsewhere, the mmap mapping type was never what the
"SHARED" in KVM_GMEM_SHARED_MEM implied.
3) *guest-memfd specific* memory access characteristics
"private (non-accessible, private, secure, protected, ...) vs.
"non-private".
Traditionally, all memory in guest-memfd was private; now we will
make guest_memfd also support non-private memory. As this memory is
"inaccessible" from a host point of view, any access to read/write it
(fault it into user page tables, read(), write(), etc) will fail.
Mempolicy support wanted to support mmap() without that, though [2],
which was one of the reasons I agreed that exposing the access
characteristics (that affect what you can actually mmap() ) made sense.
In the last upstream meeting we agreed that we will not do that, but
rather build on MMAP plus support for non-private memory.
[1]
https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com/T/
[2]
https://lore.kernel.org/linux-mm/20250408112402.181574-1-shivankg@amd.com/
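As a purely illustrative aside, the split between 1) and 3) could be sketched
as below; the bit values and the *_NON_PRIVATE name are invented here, only the
MMAP/READ/WRITE/UFFD names are the ones floated above:

/*
 * Illustration only, not proposed uAPI. Per-operation bits (viewpoint 1)
 * and the memory-access characteristic (viewpoint 3) are orthogonal axes;
 * MAP_SHARED vs. MAP_PRIVATE (viewpoint 2) is a property of the mmap()
 * call itself and would need no creation-time bit.
 */
#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)	/* file supports mmap()        */
#define GUEST_MEMFD_FLAG_READ		(1ULL << 1)	/* file supports read()        */
#define GUEST_MEMFD_FLAG_WRITE		(1ULL << 2)	/* file supports write()       */
#define GUEST_MEMFD_FLAG_UFFD		(1ULL << 3)	/* userfaultfd can be attached */

#define GUEST_MEMFD_FLAG_NON_PRIVATE	(1ULL << 8)	/* memory is not guest-private */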
[...]
>>>> I'll further note that in the doc of KVM_SET_USER_MEMORY_REGION2 we talk
>>>> about "private" vs "shared" memory ... so that would have to be improved
>>>> as well.
>>>
>>> To add to what David just wrote, V1 of this series used the term
>>> "mappable" [1]. After a few discussions, I thought the consensus was
>>> that "shared" was a more accurate description --- i.e., mappability
>>> was a side effect of it being shared with the host.
>
> As I mentioned in the other thread with respect to sharing between other
> entities, simply SHARED doesn't provide sufficient granularity. HOST_SHAREABLE
> gets us closer, but I still don't like that because it implies the memory is
> 100% shareable, e.g. can be accessed just like normal memory.
>
> And for non-CoCo x86 VMs, sharing with host userspace isn't even necessarily the
> goal, i.e. "sharing" is a side effect of needing to allow mmap() so that KVM can
> continue to function.
Does mmap() support imply "support for non-private" memory or does
"support for non-private" imply mmap() support? :)
In this series we went for the latter. If I got you correctly, you argue
for the former.
Maybe both things should simply be separated.
>
>>> One could argue that non-CoCo VMs have no concept of "shared" vs
>>> "private".
>
> I am that one :-)
Well, if the concept of "private" does not exist, I'd argue everything
is "non-private" :)
>
>>> A different way of looking at it is, non-CoCo VMs have
>>> their state as shared by default.
>
> Eh, there has to be another state for there to be a default.
>
>> All memory of these VMs behaves similar to other memory-based shared memory
>> backends (memfd, shmem) in the system, yes. You can map it into multiple
>> processes and use it like shmem/memfd.
>
> Ya, but that's more because guest_memfd only supports MAP_SHARED, versus KVM
> really wanting to truly share the memory with the entire system.
>
> Of course, that's also an argument to some extent against USER_MAPPABLE, because
> that name assumes we'll never want to support MAP_PRIVATE. But letting userspace
> MAP_PRIVATE guest_memfd would completely defeat the purpose of guest_memfd, so
> unless I'm forgetting a wrinkle with MAP_PRIVATE vs. MAP_SHARED, that's an
> assumption I'm a-ok making.
So, first important question, are we okay with adding:
"GUEST_MEMFD_FLAG_MMAP": we support the mmap() operation
>
> If we are really dead set on having SHARED in the name, it could be
> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
> to me that's _too_ specific and again somewhat confusing given the unfortunate
> private vs. shared usage in CoCo-land. And just playing the odds, I'm fine taking
> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or whatever,
> because I think that is comically unlikely to happen.
I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
express "this is not your old guest_memfd that only supports private
memory". And that's what I am struggling with.
Now, if you argue "support for mmap() implies support for non-private
memory", I'm probably okay for that.
I could envision support for non-private memory even without mmap()
support; how useful that might be, I don't know. But that's why I was
arguing that mmap() is just one way to consume non-private memory.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 8:15 ` David Hildenbrand
@ 2025-06-18 9:20 ` Xiaoyao Li
2025-06-18 9:27 ` David Hildenbrand
2025-06-19 1:48 ` Sean Christopherson
1 sibling, 1 reply; 75+ messages in thread
From: Xiaoyao Li @ 2025-06-18 9:20 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>> If we are really dead set on having SHARED in the name, it could be
>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>> to me that's _too_ specific and again somewhat confusing given the
>> unfortunate
>> private vs. shared usage in CoCo-land. And just playing the odds, I'm
>> fine taking
>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>> whatever,
>> because I think that is comically unlikely to happen.
>
> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
> express "this is not your old guest_memfd that only supports private
> memory". And that's what I am struggling with.
Sorry for chiming in.
Per my understanding, (old) guest memfd only means it's the memory that
cannot be accessed by userspace. There should be no shared/private
concept on it.
And "private" is the concept of KVM. Guest memfd can serve as private
memory, is just due to the character of it cannot be accessed from
userspace.
So if the guest memfd can be mmap'ed, then it become userspace
accessable and cannot serve as private memory.
> Now, if you argue "support for mmap() implies support for non-private
> memory", I'm probably okay for that.
I would say support for mmap() implies it cannot be used as private memory.
> I could envision support for non-private memory even without mmap()
> support, how useful that might be, I don't know. But that's why I was
> arguing that we mmap() is just one way to consume non-private memory.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-13 21:03 ` Sean Christopherson
` (3 preceding siblings ...)
2025-06-16 13:44 ` Ira Weiny
@ 2025-06-18 9:25 ` David Hildenbrand
4 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2025-06-18 9:25 UTC (permalink / raw)
To: Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index d00b85cb168c..cb19150fd595 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1570,6 +1570,7 @@ struct kvm_memory_attributes {
>> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>>
>> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>> +#define GUEST_MEMFD_FLAG_SUPPORT_SHARED (1ULL << 0)
>
Coming back to this part of the initial mail:
> I find the SUPPORT_SHARED terminology to be super confusing. I had to dig quite
> deep to understand that "support shared" actually means "userspace explicitly
> enabled sharing on _this_ guest_memfd instance". E.g. I was surprised to see
>
> IMO, GUEST_MEMFD_FLAG_SHAREABLE would be more appropriate. But even that is
> weird to me. For non-CoCo VMs, there is no concept of shared vs. private. What's
> novel and notable is that the memory is _mappable_. Yeah, yeah, pKVM's use case
> is to share memory, but that's a _use case_, not the property of guest_memfd that
> is being controlled by userspace.
Looking back, it would all have made more sense if one had to explicitly
request support for "private" memory, and non-private (ordinary?) memory
had been the default ... :)
>
> And kvm_gmem_memslot_supports_shared() is even worse. It's simply that the
> memslot is bound to a mappable guest_memfd instance, it's that the guest_memfd
> instance is the _only_ entry point to the memslot.
>
> So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
As raised, GUEST_MEMFD_FLAG_MMAPPABLE / GUEST_MEMFD_FLAG_MMAP or something
along those lines that spells out "mmap" would be better IMHO. Better than
talking about faultability or mappability. (fault into what? map into what?
mmap() cleanly implies user space)
That means we end up with "mmap() support implies support for non-private
memory"; I can live with that, although some code might end up being a bit
confusing (e.g., that's why you proposed kvm_is_memslot_gmem_only() below).
>
> if (kvm_slot_has_gmem(slot) &&
> (kvm_gmem_memslot_supports_shared(slot) ||
> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> }
>
> much more intuitive:
>
> if (kvm_is_memslot_gmem_only(slot) ||
> kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> return kvm_gmem_max_mapping_level(slot, gfn, max_level);
Hiding the details in such a helper makes it easier to digest.
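For illustration, a minimal sketch of what such a helper could look like;
this is not the series' actual code, and the internal flag name
KVM_MEMSLOT_GMEM_ONLY is assumed here purely for the example:

	/* Hypothetical sketch, not code from this series. */
	static inline bool kvm_is_memslot_gmem_only(const struct kvm_memory_slot *slot)
	{
		/* True when the bound guest_memfd is the only entry point to the slot. */
		return slot->flags & KVM_MEMSLOT_GMEM_ONLY;
	}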
>
> And then have kvm_gmem_mapping_order() do:
>
> WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> return 0;
>
>> struct kvm_create_guest_memfd {
>> __u64 size;
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 559c93ad90be..e90884f74404 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -128,3 +128,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>> config HAVE_KVM_ARCH_GMEM_INVALIDATE
>> bool
>> depends on KVM_GMEM
>> +
>> +config KVM_GMEM_SHARED_MEM
>> + select KVM_GMEM
>> + bool
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 6db515833f61..06616b6b493b 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -312,7 +312,77 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>> return gfn - slot->base_gfn + slot->gmem.pgoff;
>> }
>>
>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>> +{
>> + const u64 flags = (u64)inode->i_private;
>> +
>> + if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
>> + return false;
>> +
>> + return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>> +}
>> +
>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>
> And to my point about "shared", this is also very confusing, because there are
> zero checks in here about shared vs. private.
I assume it's simply a "kvm_gmem_fault" right now.
>
>> +{
>> + struct inode *inode = file_inode(vmf->vma->vm_file);
>> + struct folio *folio;
>> + vm_fault_t ret = VM_FAULT_LOCKED;
>> +
>> + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>> + return VM_FAULT_SIGBUS;
>> +
>> + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>> + if (IS_ERR(folio)) {
>> + int err = PTR_ERR(folio);
>> +
>> + if (err == -EAGAIN)
>> + return VM_FAULT_RETRY;
>> +
>> + return vmf_error(err);
>> + }
>> +
>> + if (WARN_ON_ONCE(folio_test_large(folio))) {
>> + ret = VM_FAULT_SIGBUS;
>> + goto out_folio;
>> + }
>> +
>> + if (!folio_test_uptodate(folio)) {
>> + clear_highpage(folio_page(folio, 0));
>> + kvm_gmem_mark_prepared(folio);
>> + }
>> +
>> + vmf->page = folio_file_page(folio, vmf->pgoff);
>> +
>> +out_folio:
>> + if (ret != VM_FAULT_LOCKED) {
>> + folio_unlock(folio);
>> + folio_put(folio);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
>> + .fault = kvm_gmem_fault_shared,
>> +};
>> +
>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> + if (!kvm_gmem_supports_shared(file_inode(file)))
>> + return -ENODEV;
>> +
>> + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
>> + (VM_SHARED | VM_MAYSHARE)) {
>
> And the SHARED terminology gets really confusing here, due to colliding with the
> existing notion of SHARED file mappings.
>
kvm_gmem_supports_mmap() would be as clear as it gets for this case here.
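For readers following along, here is a rough userspace sketch of how the
opt-in in this patch is meant to be consumed, written against the v12
names (GUEST_MEMFD_FLAG_SUPPORT_SHARED) and assuming an already-created VM
fd; error handling is omitted and this is not an authoritative example:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <linux/kvm.h>

	/* Rough sketch against the v12 naming; error handling omitted. */
	static void *map_guest_memfd(int vm_fd, uint64_t size)
	{
		struct kvm_create_guest_memfd args = {
			.size  = size,
			.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED, /* opt in at creation */
		};
		int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);

		/* Per the snippet above, kvm_gmem_mmap() only accepts MAP_SHARED mappings. */
		return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
	}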
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 9:20 ` Xiaoyao Li
@ 2025-06-18 9:27 ` David Hildenbrand
2025-06-18 9:44 ` Xiaoyao Li
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-18 9:27 UTC (permalink / raw)
To: Xiaoyao Li, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 18.06.25 11:20, Xiaoyao Li wrote:
> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>> If we are really dead set on having SHARED in the name, it could be
>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>> to me that's _too_ specific and again somewhat confusing given the
>>> unfortunate
>>> private vs. shared usage in CoCo-land. And just playing the odds, I'm
>>> fine taking
>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>> whatever,
>>> because I think that is comically unlikely to happen.
>>
>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>> express "this is not your old guest_memfd that only supports private
>> memory". And that's what I am struggling with.
>
> Sorry for chiming in.
>
> Per my understanding, (old) guest memfd only means it's the memory that
> cannot be accessed by userspace. There should be no shared/private
> concept on it.
>
> And "private" is the concept of KVM. Guest memfd can serve as private
> memory, is just due to the character of it cannot be accessed from
> userspace.
>
> So if the guest memfd can be mmap'ed, then it become userspace
> accessable and cannot serve as private memory.
>
>> Now, if you argue "support for mmap() implies support for non-private
>> memory", I'm probably okay for that.
>
> I would say, support for mmap() implies cannot be used as private memory.
That's not where we're heading with in-place conversion support: you
will have private (inaccessible) and non-private (accessible) parts, and
while guest_memfd will support mmap(), only the accessible parts can
actually be accessed (faulted in, etc.).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 9:27 ` David Hildenbrand
@ 2025-06-18 9:44 ` Xiaoyao Li
2025-06-18 9:59 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Xiaoyao Li @ 2025-06-18 9:44 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 6/18/2025 5:27 PM, David Hildenbrand wrote:
> On 18.06.25 11:20, Xiaoyao Li wrote:
>> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>>> If we are really dead set on having SHARED in the name, it could be
>>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>>> to me that's _too_ specific and again somewhat confusing given the
>>>> unfortunate
>>>> private vs. shared usage in CoCo-land. And just playing the odds, I'm
>>>> fine taking
>>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>>> whatever,
>>>> because I think that is comically unlikely to happen.
>>>
>>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>>> express "this is not your old guest_memfd that only supports private
>>> memory". And that's what I am struggling with.
>>
>> Sorry for chiming in.
>>
>> Per my understanding, (old) guest memfd only means it's the memory that
>> cannot be accessed by userspace. There should be no shared/private
>> concept on it.
>>
>> And "private" is the concept of KVM. Guest memfd can serve as private
>> memory, is just due to the character of it cannot be accessed from
>> userspace.
>>
>> So if the guest memfd can be mmap'ed, then it become userspace
>> accessable and cannot serve as private memory.
>>
>>> Now, if you argue "support for mmap() implies support for non-private
>>> memory", I'm probably okay for that.
>>
>> I would say, support for mmap() implies cannot be used as private memory.
>
> That's not where we're heading with in-place conversion support: you
> will have private (ianccessible) and non-private (accessible) parts, and
> while guest_memfd will support mmap() only the accessible parts can
> actually be accessed (faulted in etc).
That's OK. The guest_memfd can be fine-grained, i.e., different
ranges/parts of it can have different access properties. But one rule
never changes: only a sub-range that is not accessible by userspace can
serve as private memory.
(I haven't read the in-place conversion support patch series, but I
think the private part is not mmap-able, right?)
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 9:44 ` Xiaoyao Li
@ 2025-06-18 9:59 ` David Hildenbrand
2025-06-18 10:42 ` Xiaoyao Li
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-18 9:59 UTC (permalink / raw)
To: Xiaoyao Li, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 18.06.25 11:44, Xiaoyao Li wrote:
> On 6/18/2025 5:27 PM, David Hildenbrand wrote:
>> On 18.06.25 11:20, Xiaoyao Li wrote:
>>> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>>>> If we are really dead set on having SHARED in the name, it could be
>>>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>>>> to me that's _too_ specific and again somewhat confusing given the
>>>>> unfortunate
>>>>> private vs. shared usage in CoCo-land. And just playing the odds, I'm
>>>>> fine taking
>>>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>>>> whatever,
>>>>> because I think that is comically unlikely to happen.
>>>>
>>>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>>>> express "this is not your old guest_memfd that only supports private
>>>> memory". And that's what I am struggling with.
>>>
>>> Sorry for chiming in.
>>>
>>> Per my understanding, (old) guest memfd only means it's the memory that
>>> cannot be accessed by userspace. There should be no shared/private
>>> concept on it.
>>>
>>> And "private" is the concept of KVM. Guest memfd can serve as private
>>> memory, is just due to the character of it cannot be accessed from
>>> userspace.
>>>
>>> So if the guest memfd can be mmap'ed, then it become userspace
>>> accessable and cannot serve as private memory.
>>>
>>>> Now, if you argue "support for mmap() implies support for non-private
>>>> memory", I'm probably okay for that.
>>>
>>> I would say, support for mmap() implies cannot be used as private memory.
>>
>> That's not where we're heading with in-place conversion support: you
>> will have private (ianccessible) and non-private (accessible) parts, and
>> while guest_memfd will support mmap() only the accessible parts can
>> actually be accessed (faulted in etc).
>
> That's OK. The guestmemfd can be fine-grained, i.e., different
> range/part of it can have different access property. But one rule never
> change: only the sub-range is not accessible by userspace can it be
> serve as private memory.
I'm sorry, I don't understand what you are getting at.
You said "So if the guest memfd can be mmap'ed, then it becomes accessible
to userspace and cannot serve as private memory." and I say, with in-place
conversion support, you are wrong.
The whole file can be mmap()ed; that does not tell us anything about
which parts can be private or not.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 9:59 ` David Hildenbrand
@ 2025-06-18 10:42 ` Xiaoyao Li
2025-06-18 11:14 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Xiaoyao Li @ 2025-06-18 10:42 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 6/18/2025 5:59 PM, David Hildenbrand wrote:
> On 18.06.25 11:44, Xiaoyao Li wrote:
>> On 6/18/2025 5:27 PM, David Hildenbrand wrote:
>>> On 18.06.25 11:20, Xiaoyao Li wrote:
>>>> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>>>>> If we are really dead set on having SHARED in the name, it could be
>>>>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>>>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>>>>> to me that's _too_ specific and again somewhat confusing given the
>>>>>> unfortunate
>>>>>> private vs. shared usage in CoCo-land. And just playing the odds,
>>>>>> I'm
>>>>>> fine taking
>>>>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>>>>> whatever,
>>>>>> because I think that is comically unlikely to happen.
>>>>>
>>>>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>>>>> express "this is not your old guest_memfd that only supports private
>>>>> memory". And that's what I am struggling with.
>>>>
>>>> Sorry for chiming in.
>>>>
>>>> Per my understanding, (old) guest memfd only means it's the memory that
>>>> cannot be accessed by userspace. There should be no shared/private
>>>> concept on it.
>>>>
>>>> And "private" is the concept of KVM. Guest memfd can serve as private
>>>> memory, is just due to the character of it cannot be accessed from
>>>> userspace.
>>>>
>>>> So if the guest memfd can be mmap'ed, then it become userspace
>>>> accessable and cannot serve as private memory.
>>>>
>>>>> Now, if you argue "support for mmap() implies support for non-private
>>>>> memory", I'm probably okay for that.
>>>>
>>>> I would say, support for mmap() implies cannot be used as private
>>>> memory.
>>>
>>> That's not where we're heading with in-place conversion support: you
>>> will have private (ianccessible) and non-private (accessible) parts, and
>>> while guest_memfd will support mmap() only the accessible parts can
>>> actually be accessed (faulted in etc).
>>
>> That's OK. The guestmemfd can be fine-grained, i.e., different
>> range/part of it can have different access property. But one rule never
>> change: only the sub-range is not accessible by userspace can it be
>> serve as private memory.
>
> I'm sorry, I don't understand what you are getting at.
>
> You said "So if the guest memfd can be mmap'ed, then it become userspace
> accessable and cannot serve as private memory." and I say, with in-place
> conversion support you are wrong.
>
> The whole file can be mmaped(), that does not tell us anything about
> which parts can be private or not.
So there is nothing preventing userspace from accessing it after a range
is converted to private via KVM_GMEM_CONVERT_PRIVATE, since the whole file
can be mmap()ed?
If so, then in the TDX case userspace can change the TD-owner bit of the
private part by accessing it, and a later guest access will poison it and
trigger a #MC. If the #MC is only delivered to the PCPU that triggers it,
it just leads to the TD guest being killed. If the #MC is broadcast, it
affects others in the system.
I just gave it a try on a real TDX system with in-place conversion. The TD
is killed due to SIGBUS (the host kernel handles the #MC and sends the
SIGBUS). It seems OK if only the TD guest is affected when userspace
accesses the private memory, but I'm not sure whether there is any corner
case that would affect the host.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 10:42 ` Xiaoyao Li
@ 2025-06-18 11:14 ` David Hildenbrand
2025-06-18 12:17 ` Xiaoyao Li
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-18 11:14 UTC (permalink / raw)
To: Xiaoyao Li, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 18.06.25 12:42, Xiaoyao Li wrote:
> On 6/18/2025 5:59 PM, David Hildenbrand wrote:
>> On 18.06.25 11:44, Xiaoyao Li wrote:
>>> On 6/18/2025 5:27 PM, David Hildenbrand wrote:
>>>> On 18.06.25 11:20, Xiaoyao Li wrote:
>>>>> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>>>>>> If we are really dead set on having SHARED in the name, it could be
>>>>>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>>>>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>>>>>> to me that's _too_ specific and again somewhat confusing given the
>>>>>>> unfortunate
>>>>>>> private vs. shared usage in CoCo-land. And just playing the odds,
>>>>>>> I'm
>>>>>>> fine taking
>>>>>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>>>>>> whatever,
>>>>>>> because I think that is comically unlikely to happen.
>>>>>>
>>>>>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>>>>>> express "this is not your old guest_memfd that only supports private
>>>>>> memory". And that's what I am struggling with.
>>>>>
>>>>> Sorry for chiming in.
>>>>>
>>>>> Per my understanding, (old) guest memfd only means it's the memory that
>>>>> cannot be accessed by userspace. There should be no shared/private
>>>>> concept on it.
>>>>>
>>>>> And "private" is the concept of KVM. Guest memfd can serve as private
>>>>> memory, is just due to the character of it cannot be accessed from
>>>>> userspace.
>>>>>
>>>>> So if the guest memfd can be mmap'ed, then it become userspace
>>>>> accessable and cannot serve as private memory.
>>>>>
>>>>>> Now, if you argue "support for mmap() implies support for non-private
>>>>>> memory", I'm probably okay for that.
>>>>>
>>>>> I would say, support for mmap() implies cannot be used as private
>>>>> memory.
>>>>
>>>> That's not where we're heading with in-place conversion support: you
>>>> will have private (ianccessible) and non-private (accessible) parts, and
>>>> while guest_memfd will support mmap() only the accessible parts can
>>>> actually be accessed (faulted in etc).
>>>
>>> That's OK. The guestmemfd can be fine-grained, i.e., different
>>> range/part of it can have different access property. But one rule never
>>> change: only the sub-range is not accessible by userspace can it be
>>> serve as private memory.
>>
>> I'm sorry, I don't understand what you are getting at.
>>
>> You said "So if the guest memfd can be mmap'ed, then it become userspace
>> accessable and cannot serve as private memory." and I say, with in-place
>> conversion support you are wrong.
>>
>> The whole file can be mmaped(), that does not tell us anything about
>> which parts can be private or not.
>
> So there is nothing prevent userspace from accessing it after a range is
> converted to private via KVM_GMEM_CONVERT_PRIVATE since the whole file
> can be mmaped()?
>
> If so, then for TDX case, userspace can change the TD-owner bit of the
> private part by accessing it and later guest access will poison it and
> trigger #MC. If the #MC is only delivered to the PCPU that triggers it,
> it just leads to the TD guest being killed. If the #MC is broadcasted,
> it affects other in the system.
>
> I just give it a try on real TDX system with in-place conversion. The TD
> is killed due to SIGBUS (host kernel handles the #MC and sends the
> SIGBUS). It seems OK if only the TD guest being affected due to
> userspace accesses the private memory. But I'm not sure if there is any
> corner case that will affect the host.
I suggest you go ahead and read all about in-place conversion support,
and how it all relates to the #MC problem you mention here.
Long story short: SIGBUS is triggered by the fault handler, not by the
#MC, because private pages cannot be faulted in and accessed.
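To illustrate that point, a purely conceptual sketch (not code from this
series or from the in-place conversion series); kvm_gmem_is_private() and
kvm_gmem_fault_shared_range() are hypothetical helpers used only to show
where the SIGBUS comes from:

	/* Conceptual sketch only; both helpers below are hypothetical. */
	static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
	{
		struct inode *inode = file_inode(vmf->vma->vm_file);

		/* Private (inaccessible) ranges never make it into user page tables. */
		if (kvm_gmem_is_private(inode, vmf->pgoff))
			return VM_FAULT_SIGBUS;

		/* Only accessible ranges are actually faulted in. */
		return kvm_gmem_fault_shared_range(inode, vmf);
	}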
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-17 23:04 ` Sean Christopherson
@ 2025-06-18 11:18 ` Fuad Tabba
0 siblings, 0 replies; 75+ messages in thread
From: Fuad Tabba @ 2025-06-18 11:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Hi Sean,
On Wed, 18 Jun 2025 at 00:04, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Jun 16, 2025, Fuad Tabba wrote:
> > > > This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> > > > and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> > > > flag at creation time.
> > >
> > > Why? I can see that from the patch.
> >
> > It's in the patch series, not this patch.
>
> Eh, not really. It doesn't even matter how "Why?" is interpreted, because nothing
> in this series covers any of the reasonable interpretations to an acceptable
> degree.
>
> These are all the changelogs for generic changes
>
> : This patch enables support for shared memory in guest_memfd, including
> : mapping that memory from host userspace.
> :
> : This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> : and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> : flag at creation time.
>
> : Add a new internal flag in the top half of memslot->flags to track when
> : a guest_memfd-backed slot supports shared memory, which is reserved for
> : internal use in KVM.
> :
> : This avoids repeatedly checking the underlying guest_memfd file for
> : shared memory support, which requires taking a reference on the file.
>
> the small bit of documentation
>
> +When the capability KVM_CAP_GMEM_SHARED_MEM is supported, the 'flags' field
> +supports GUEST_MEMFD_FLAG_SUPPORT_SHARED. Setting this flag on guest_memfd
> +creation enables mmap() and faulting of guest_memfd memory to host userspace.
> +
> +When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> +guest_memfd has the GUEST_MEMFD_FLAG_SUPPORT_SHARED set, then the fault will
> +always be consumed from guest_memfd, regardless of whether it is a shared or a
> +private fault.
>
> and the cover letter
>
> : The purpose of this series is to allow mapping guest_memfd backed memory
> : at the host. This support enables VMMs like Firecracker to run guests
> : backed completely by guest_memfd [2]. Combined with Patrick's series for
> : direct map removal in guest_memfd [3], this would allow running VMs that
> : offer additional hardening against Spectre-like transient execution
> : attacks.
> :
> : This series will also serve as a base for _restricted_ mmap() support
> : for guest_memfd backed memory at the host for CoCos that allow sharing
> : guest memory in-place with the host [4].
>
> None of those get remotely close to explaining the use cases in sufficient
> detail.
>
> Now, it's entirely acceptable, and in this case probably highly preferred, to
> link to the relevant use cases, e.g. as opposed to trying to regurgitate and
> distill a huge pile of information.
>
> But I want the _changelog_ to do the heavy lifting of capturing the most useful
> links and providing context. E.g. to find the motivation for using
> guest_memfd to back non-CoCo VMs, I had to follow the [3] link to Patrick's
> series, then walk backwards through the versions of _that_ series, and eventually
> come across another link in Patrick's very first RFC:
>
> : This RFC series is a rough draft adding support for running
> : non-confidential compute VMs in guest_memfd, based on prior discussions
> : with Sean [1].
>
> where [1] is the much more helpful:
>
> https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
>
> Now, _I_ am obviously aware of most/all of the use cases and motivations, but
> the changelog isn't just for people like me. Far from it; the changelog is most
> useful for people that are coming in with _zero_ knowledge and context. Finding
> the above link took me quite a bit of effort and digging (and to some extent, I
> knew what I was looking for), whereas an explicit reference in the changelog
> would (hopefully) take only the few seconds needed to read the blurb and click
> the link.
>
> My main argument for why you (and everyone else) should put significant effort
> into changelogs (and comments and documentation!) is very simple: writing and
> curating a good changelog (comment/documentation) is something the author does
> *once*. If the author skimps out on the changelog, then *every* reader is having
> to do that same work *every* time they dig through this code. We as a community
> come out far, far ahead in terms of developer time and understanding by turning a
> many-time cost into a one-time cost (and that's not even accounting for the fact
> that the author's one-time cost will likely be a _lot_ smaller).
>
> There's obviously a balance to strike. E.g. if the changelog has 50 links, that's
> probably going to be counter-productive for most readers. In this case, 5-7-ish
> links with (very) brief contextual references is probably the sweet spot.
>
> > Would it help if I rephrase it along the lines of:
> >
> > This functionality isn't enabled until the introduction of the
> > KVM_GMEM_SHARED_MEM Kconfig option, and enabled for a given instance
> > by the GUEST_MEMFD_FLAG_SUPPORT_SHARED flag at creation time. Both of
> > which are introduced in a subsequent patch.
> >
> > > This changelog is way, way, waaay too light on details. Sorry for jumping in at
> > > the 11th hour, but we've spent what, 2 years working on this?
> >
> > I'll expand this. Just to make sure that I include the right details,
> > are you looking for implementation details, motivation, use cases?
>
> Despite my lengthy response, none of the above?
>
> Use cases are good fodder for Documentation and the cover letter, and for *brief*
> references in the changelogs. Implementation details generally don't need to be
> explained in the changelog, modulo notable gotchas and edge cases that are worth
> calling out.
>
> I _am_ looking for the motivation, but I suspect it's not the motivation you have
> in mind. I'm not terribly concerned with why you want to implement this
> functionality; that should be easy to glean from the Documentation and use case
> links.
>
> The motivation I'm looking for is why you're adding CONFIG_KVM_GMEM_SHARED_MEM
> and GUEST_MEMFD_FLAG_SUPPORT_SHARED.
>
> E.g. CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES was added because it gates large swaths
> of code, uAPI, and a field we don't want to access "accidentally" (mem_attr_array),
> and because CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES has a hard dependency on
> CONFIG_KVM_GENERIC_MMU_NOTIFIER.
>
> For CONFIG_KVM_GMEM_SHARED_MEM, I'm just not seeing the motivation. It gates
> very little code (though that could be slightly changed by wrapping the mmap()
> and fault logic guest_memfd.c), and literally every use is part of a broader
> conditional. I.e. it's effectively an optimization.
>
> Ha! And it's actively buggy. Because this will allow shared gmem for DEFAULT_VM,
>
> #define kvm_arch_supports_gmem_shared_mem(kvm) \
> (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) && \
> ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM || \
> (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
>
> but only if CONFIG_KVM_SW_PROTECTED_VM is selected. That makes no sense. And
> that changelog is also sorely lacking. It covers the what, but that's quite
> useless, because I can very easily see the what from the code. By covering the
> "why" in the changelog, (hopefully) you would have come to the same conclusion
> that selecting KVM_GMEM_SHARED_MEM iff KVM_SW_PROTECTED_VM is enabled doesn't
> make any sense (because you wouldn't have been able to write a sane justification).
>
> Or, if it somehow does make sense, i.e. if I'm missing something, then that
> absolutely needs to in the changelog!
>
> : Define the architecture-specific macro to enable shared memory support
> : in guest_memfd for ordinary, i.e., non-CoCo, VM types, specifically
> : KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
> :
> : Enable the KVM_GMEM_SHARED_MEM Kconfig option if KVM_SW_PROTECTED_VM is
> : enabled.
>
>
> As for GUEST_MEMFD_FLAG_SUPPORT_SHARED, after digging through the code, I _think_
> the reason we need a flag is so that KVM knows to completely ignore the HVA in
> the memslot. (a) explaining that (again, for future readers) would be super
> helpful, and (b) if there is other motivation for a per-guest_memfd opt-in, then
> _that_ is also very interesting.
>
> And for (a), bonus points if you explain why it's a GUEST_MEMFD flag, e.g. as
> opposed to a per-VM capability or per-memslot flag. (Though this may be self-
> evident to any readers that understand any of this, so definitely optional).
I think I see where you're going. I'll try to improve the changelogs
when I respin.
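For readers without the full context of (a): under this series a memslot
that opts in is expected to be served entirely from its guest_memfd, with
the hva side largely ignored. A rough, non-authoritative sketch of binding
such a slot via the existing KVM_SET_USER_MEMORY_REGION2 uAPI (error
handling omitted, assuming <linux/kvm.h>, <sys/ioctl.h> and <stdint.h>):

	/* Rough sketch; error handling omitted. */
	static void bind_gmem_only_slot(int vm_fd, int gmem_fd, uint64_t gpa, uint64_t size)
	{
		struct kvm_userspace_memory_region2 region = {
			.slot               = 0,
			.flags              = KVM_MEM_GUEST_MEMFD,
			.guest_phys_addr    = gpa,
			.memory_size        = size,
			/* Per the discussion above, the hva can be left unused for such slots. */
			.userspace_addr     = 0,
			.guest_memfd        = gmem_fd,
			.guest_memfd_offset = 0,
		};

		ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
	}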
> > > So my vote would be "GUEST_MEMFD_FLAG_MAPPABLE", and then something like
> > > KVM_MEMSLOT_GUEST_MEMFD_ONLY. That will make code like this:
> > >
> > > if (kvm_slot_has_gmem(slot) &&
> > > (kvm_gmem_memslot_supports_shared(slot) ||
> > > kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
> > > return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> > > }
> > >
> > > much more intuitive:
> > >
> > > if (kvm_is_memslot_gmem_only(slot) ||
> > > kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE))
> > > return kvm_gmem_max_mapping_level(slot, gfn, max_level);
> > >
> > > And then have kvm_gmem_mapping_order() do:
> > >
> > > WARN_ON_ONCE(!kvm_slot_has_gmem(slot));
> > > return 0;
> >
> > I have no preference really. To me this was intuitive, but I guess I
> > have been staring at this way too long.
>
> I agree that SHARED is intuitive for the pKVM use case (and probably all CoCo use
> cases). My objection with the name is that it's misleading/confusing for non-CoCo
> VMs (at least for me), and that using SHARED could unnecessarily paint us into a
> corner.
>
> Specifically, if there are ever use cases where guest memory is shared between
> entities *without* mapping guest memory into host userspace, then we'll be a bit
> hosed. Though as is tradition in KVM, I suppose we could just call it
> GUEST_MEMFD_FLAG_SUPPORT_SHARED2 ;-)
>
> Regarding CoCo vs. non-CoCo intuition, it's easy enough to discern that
> GUEST_MEMFD_FLAG_MAPPABLE is required to do in-place sharing with host userspace.
>
> But IMO it's not easy to glean that GUEST_MEMFD_FLAG_SUPPORT_SHARED is
> effectively a hard requirement for non-CoCo x86 VMs purely because many
> flows in KVM x86 will fail miserably if KVM can't access guest memory via uaccess,
> i.e. if guest memory isn't mapped by host userspace. In other words, it's as much
> about working within KVM's existing design (and not losing support for a wide
> swath of features) as it is about "sharing" guest memory with host userspace.
I'll defer answering that to the next email...
/fuad
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 11:14 ` David Hildenbrand
@ 2025-06-18 12:17 ` Xiaoyao Li
2025-06-18 13:16 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Xiaoyao Li @ 2025-06-18 12:17 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 6/18/2025 7:14 PM, David Hildenbrand wrote:
> On 18.06.25 12:42, Xiaoyao Li wrote:
>> On 6/18/2025 5:59 PM, David Hildenbrand wrote:
>>> On 18.06.25 11:44, Xiaoyao Li wrote:
>>>> On 6/18/2025 5:27 PM, David Hildenbrand wrote:
>>>>> On 18.06.25 11:20, Xiaoyao Li wrote:
>>>>>> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>>>>>>> If we are really dead set on having SHARED in the name, it could be
>>>>>>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>>>>>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>>>>>>> to me that's _too_ specific and again somewhat confusing given the
>>>>>>>> unfortunate
>>>>>>>> private vs. shared usage in CoCo-land. And just playing the odds,
>>>>>>>> I'm
>>>>>>>> fine taking
>>>>>>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>>>>>>> whatever,
>>>>>>>> because I think that is comically unlikely to happen.
>>>>>>>
>>>>>>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>>>>>>> express "this is not your old guest_memfd that only supports private
>>>>>>> memory". And that's what I am struggling with.
>>>>>>
>>>>>> Sorry for chiming in.
>>>>>>
>>>>>> Per my understanding, (old) guest memfd only means it's the memory
>>>>>> that
>>>>>> cannot be accessed by userspace. There should be no shared/private
>>>>>> concept on it.
>>>>>>
>>>>>> And "private" is the concept of KVM. Guest memfd can serve as private
>>>>>> memory, is just due to the character of it cannot be accessed from
>>>>>> userspace.
>>>>>>
>>>>>> So if the guest memfd can be mmap'ed, then it become userspace
>>>>>> accessable and cannot serve as private memory.
>>>>>>
>>>>>>> Now, if you argue "support for mmap() implies support for non-
>>>>>>> private
>>>>>>> memory", I'm probably okay for that.
>>>>>>
>>>>>> I would say, support for mmap() implies cannot be used as private
>>>>>> memory.
>>>>>
>>>>> That's not where we're heading with in-place conversion support: you
>>>>> will have private (ianccessible) and non-private (accessible)
>>>>> parts, and
>>>>> while guest_memfd will support mmap() only the accessible parts can
>>>>> actually be accessed (faulted in etc).
>>>>
>>>> That's OK. The guestmemfd can be fine-grained, i.e., different
>>>> range/part of it can have different access property. But one rule never
>>>> change: only the sub-range is not accessible by userspace can it be
>>>> serve as private memory.
>>>
>>> I'm sorry, I don't understand what you are getting at.
>>>
>>> You said "So if the guest memfd can be mmap'ed, then it become userspace
>>> accessable and cannot serve as private memory." and I say, with in-place
>>> conversion support you are wrong.
>>>
>>> The whole file can be mmaped(), that does not tell us anything about
>>> which parts can be private or not.
>>
>> So there is nothing prevent userspace from accessing it after a range is
>> converted to private via KVM_GMEM_CONVERT_PRIVATE since the whole file
>> can be mmaped()?
>>
>> If so, then for TDX case, userspace can change the TD-owner bit of the
>> private part by accessing it and later guest access will poison it and
>> trigger #MC. If the #MC is only delivered to the PCPU that triggers it,
>> it just leads to the TD guest being killed. If the #MC is broadcasted,
>> it affects other in the system.
>>
>> I just give it a try on real TDX system with in-place conversion. The TD
>> is killed due to SIGBUS (host kernel handles the #MC and sends the
>> SIGBUS). It seems OK if only the TD guest being affected due to
>> userspace accesses the private memory. But I'm not sure if there is any
>> corner case that will affect the host.
>
> I suggest you go ahead and read all about in-place conversion support,
> and how it all relates to the #MC problem you mention here.
>
> Long story short: SIGBUS is triggered by the fault handler, not by the
> #MC, because private pages cannot be faulted in and accessed.
>
Sorry for the wrong information, and thanks for your patience!
It's clearer to me now that this series and the in-place conversion work
try to make shared/private a property of guest_memfd. Under that big
picture it looks reasonable to name the flag with "shared"; looking at
this patch alone, Sean's concern makes more sense.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 12:17 ` Xiaoyao Li
@ 2025-06-18 13:16 ` David Hildenbrand
0 siblings, 0 replies; 75+ messages in thread
From: David Hildenbrand @ 2025-06-18 13:16 UTC (permalink / raw)
To: Xiaoyao Li, Sean Christopherson
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta
On 18.06.25 14:17, Xiaoyao Li wrote:
> On 6/18/2025 7:14 PM, David Hildenbrand wrote:
>> On 18.06.25 12:42, Xiaoyao Li wrote:
>>> On 6/18/2025 5:59 PM, David Hildenbrand wrote:
>>>> On 18.06.25 11:44, Xiaoyao Li wrote:
>>>>> On 6/18/2025 5:27 PM, David Hildenbrand wrote:
>>>>>> On 18.06.25 11:20, Xiaoyao Li wrote:
>>>>>>> On 6/18/2025 4:15 PM, David Hildenbrand wrote:
>>>>>>>>> If we are really dead set on having SHARED in the name, it could be
>>>>>>>>> GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or
>>>>>>>>> GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
>>>>>>>>> to me that's _too_ specific and again somewhat confusing given the
>>>>>>>>> unfortunate
>>>>>>>>> private vs. shared usage in CoCo-land. And just playing the odds,
>>>>>>>>> I'm
>>>>>>>>> fine taking
>>>>>>>>> a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or
>>>>>>>>> whatever,
>>>>>>>>> because I think that is comically unlikely to happen.
>>>>>>>>
>>>>>>>> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to
>>>>>>>> express "this is not your old guest_memfd that only supports private
>>>>>>>> memory". And that's what I am struggling with.
>>>>>>>
>>>>>>> Sorry for chiming in.
>>>>>>>
>>>>>>> Per my understanding, (old) guest memfd only means it's the memory
>>>>>>> that
>>>>>>> cannot be accessed by userspace. There should be no shared/private
>>>>>>> concept on it.
>>>>>>>
>>>>>>> And "private" is the concept of KVM. Guest memfd can serve as private
>>>>>>> memory, is just due to the character of it cannot be accessed from
>>>>>>> userspace.
>>>>>>>
>>>>>>> So if the guest memfd can be mmap'ed, then it become userspace
>>>>>>> accessable and cannot serve as private memory.
>>>>>>>
>>>>>>>> Now, if you argue "support for mmap() implies support for non-
>>>>>>>> private
>>>>>>>> memory", I'm probably okay for that.
>>>>>>>
>>>>>>> I would say, support for mmap() implies cannot be used as private
>>>>>>> memory.
>>>>>>
>>>>>> That's not where we're heading with in-place conversion support: you
>>>>>> will have private (ianccessible) and non-private (accessible)
>>>>>> parts, and
>>>>>> while guest_memfd will support mmap() only the accessible parts can
>>>>>> actually be accessed (faulted in etc).
>>>>>
>>>>> That's OK. The guestmemfd can be fine-grained, i.e., different
>>>>> range/part of it can have different access property. But one rule never
>>>>> change: only the sub-range is not accessible by userspace can it be
>>>>> serve as private memory.
>>>>
>>>> I'm sorry, I don't understand what you are getting at.
>>>>
>>>> You said "So if the guest memfd can be mmap'ed, then it become userspace
>>>> accessable and cannot serve as private memory." and I say, with in-place
>>>> conversion support you are wrong.
>>>>
>>>> The whole file can be mmaped(), that does not tell us anything about
>>>> which parts can be private or not.
>>>
>>> So there is nothing prevent userspace from accessing it after a range is
>>> converted to private via KVM_GMEM_CONVERT_PRIVATE since the whole file
>>> can be mmaped()?
>>>
>>> If so, then for TDX case, userspace can change the TD-owner bit of the
>>> private part by accessing it and later guest access will poison it and
>>> trigger #MC. If the #MC is only delivered to the PCPU that triggers it,
>>> it just leads to the TD guest being killed. If the #MC is broadcasted,
>>> it affects other in the system.
>>>
>>> I just give it a try on real TDX system with in-place conversion. The TD
>>> is killed due to SIGBUS (host kernel handles the #MC and sends the
>>> SIGBUS). It seems OK if only the TD guest being affected due to
>>> userspace accesses the private memory. But I'm not sure if there is any
>>> corner case that will affect the host.
>>
>> I suggest you go ahead and read all about in-place conversion support,
>> and how it all relates to the #MC problem you mention here.
>>
>> Long story short: SIGBUS is triggered by the fault handler, not by the
>> #MC, because private pages cannot be faulted in and accessed.
>>
>
> Sorry for the wrong information and thanks for your patience!
No problem. As raised, it would all be clearer if we started out with
guest_memfd behaving just like memfd (mmap() support etc) and then added
support for "private" memory on top.
But that's not how it happened :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-18 8:15 ` David Hildenbrand
2025-06-18 9:20 ` Xiaoyao Li
@ 2025-06-19 1:48 ` Sean Christopherson
2025-06-19 1:50 ` Sean Christopherson
1 sibling, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2025-06-19 1:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On Wed, Jun 18, 2025, David Hildenbrand wrote:
> On 18.06.25 02:40, Sean Christopherson wrote:
> > On Mon, Jun 16, 2025, David Hildenbrand wrote:
> > > On 16.06.25 16:16, Fuad Tabba wrote:
> > > > On Mon, 16 Jun 2025 at 15:03, David Hildenbrand <david@redhat.com> wrote:
> > > > > That something is mappable into $whatever is not the right
> > > > > way to look at this IMHO.
> >
> > Why not? Honest question. USER_MAPPABLE is very literal, but I think it's the
> > right granularity. E.g. we _could_ support read()/write()/etc, but it's not
> > clear to me that we need/want to. And so why bundle those under SHARED, or any
> > other one-size-fits-all flag?
>
> Let's take a step back. There are various ways to look at this:
>
> 1) Indicate support for guest_memfd operations:
>
> "GUEST_MEMFD_FLAG_MMAP": we support the mmap() operation
> "GUEST_MEMFD_FLAG_WRITE": we support the write() operation
> "GUEST_MEMFD_FLAG_READ": we support the read() operation
> ...
> "GUEST_MEMFD_FLAG_UFFD": we support userfaultfd operations
>
>
> Absolutely fine with me. In this series, we'd be advertising
> GUEST_MEMFD_FLAG_MMAP. Because we support the mmap operation.
>
> Whether the others are ever required remains to be seen [1].
Another advantage of granular flags that comes to mind: WRITE (and READ) could
be withdrawn after populating memory, e.g. to harden against unexpected accesses
once the VM has been initialized.
And FWIW, I'm pretty sure it's only MMAP that *needs* userspace to opt-in. If it
weren't for the change in memslot behavior, i.e. to always look at the guest_memfd
fd and ignore the hva, then MMAP wouldn't need a userspace opt-in. Though we
might *want* an opt-in, e.g. for hardening purposes.
> 2) Indicating the mmap mapping type (support for MMAP flags)
>
> As you write below, one could indicate that we support "mmap(MAP_SHARED)" vs
> "mmap(MAP_PRIVATE)".
>
> I don't think that's required for now, as MAP_SHARED is really the default
> that anything that supports mmap() supports. If someone ever needs
> MAP_PRIVATE (CoW) support they can add such a flag
> (GUEST_MEMFD_FLAG_MMAP_MAP_PRIVATE). I doubt we want that, but who knows.
>
> As expressed elsewhere, the mmap mapping type was never what the "SHARED" in
> KVM_GMEM_SHARED_MEM implied.
>
>
> 3) *guest-memfd specific* memory access characteristics
>
> "private (non-accessible, private, secure, protected, ...) vs.
> "non-private".
>
> Traditionally, all memory in guest-memfd was private; now we will make
> guest_memfd also support non-private memory. As this memory is
> "inaccessible" from a host point of view, any access to read/write it (fault
> it into user page tables, read(), write(), etc) will fail.
...
> > As I mentioned in the other thread with respect to sharing between other
> > entities, simply SHARED doesn't provide sufficient granularity. HOST_SHAREABLE
> > gets us closer, but I still don't like that because it implies the memory is
> > 100% shareable, e.g. can be accessed just like normal memory.
> >
> > And for non-CoCo x86 VMs, sharing with host userspace isn't even necessarily the
> > goal, i.e. "sharing" is a side effect of needing to allow mmap() so that KVM can
> > continue to function.
>
> Does mmap() support imply "support for non-private" memory or does "support
> for non-private" imply mmap() support? :)
...
> > Ya, but that's more because guest_memfd only supports MAP_SHARED, versus KVM
> > really wanting to truly share the memory with the entire system.
> > Of course, that's also an argument to some extent against USER_MAPPABLE, because
> > that name assumes we'll never want to support MAP_PRIVATE. But letting userspace
> > MAP_PRIVATE guest_memfd would completely defeat the purpose of guest_memfd, so
> > unless I'm forgetting a wrinkle with MAP_PRIVATE vs. MAP_SHARED, that's an
> > assumption I'm a-ok making.
>
> So, first important question, are we okay with adding:
>
> "GUEST_MEMFD_FLAG_MMAP": we support the mmap() operation
Probably stating the obvious, but yes, I am.
> > If we are really dead set on having SHARED in the name, it could be
> > GUEST_MEMFD_FLAG_USER_MAPPABLE_SHARED or GUEST_MEMFD_FLAG_USER_MAP_SHARED? But
> > to me that's _too_ specific and again somewhat confusing given the unfortunate
> > private vs. shared usage in CoCo-land. And just playing the odds, I'm fine taking
> > a risk of ending up with GUEST_MEMFD_FLAG_USER_MAPPABLE_PRIVATE or whatever,
> > because I think that is comically unlikely to happen.
>
> I think in addition to GUEST_MEMFD_FLAG_MMAP we want something to express
> "this is not your old guest_memfd that only supports private memory". And
> that's what I am struggling with.
>
> Now, if you argue "support for mmap() implies support for non-private
> memory", I'm probably okay for that.
Yep, that's essentially what I'm advocating.
> I could envision support for non-private memory even without mmap() support,
> how useful that might be, I don't know.
It _could_ be very useful, e.g. to have very strong confidence that nothing in
userspace can accidentally clobber guest memory. The problem is that reality gets
in the way, and so unfortunately I don't see this idea ever coming to fruition
(though I really, really like the concept).
> But that's why I was arguing that mmap() is just one way to consume
> non-private memory.
I agree that mmap() is just one way to interact with non-private memory, but
in addition to wanting to avoid having to name "non-private memory", I also want
to avoid bundling all of those ways together. I.e. I want to start with the bare
minimum and add functionality if/when it's needed. Partly so that we don't have
to spend much time thinking about the unsupported methods, but mostly because
adding functionality is almost always way easier than taking it away.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-19 1:48 ` Sean Christopherson
@ 2025-06-19 1:50 ` Sean Christopherson
0 siblings, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2025-06-19 1:50 UTC (permalink / raw)
To: David Hildenbrand
Cc: Fuad Tabba, Ira Weiny, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta
On Wed, Jun 18, 2025, Sean Christopherson wrote:
> On Wed, Jun 18, 2025, David Hildenbrand wrote:
> > > Ya, but that's more because guest_memfd only supports MAP_SHARED, versus KVM
> > > really wanting to truly share the memory with the entire system.
> > > Of course, that's also an argument to some extent against USER_MAPPABLE, because
> > > that name assumes we'll never want to support MAP_PRIVATE. But letting userspace
> > > MAP_PRIVATE guest_memfd would completely defeat the purpose of guest_memfd, so
> > > unless I'm forgetting a wrinkle with MAP_PRIVATE vs. MAP_SHARED, that's an
> > > assumption I'm a-ok making.
> >
> > So, first important question, are we okay with adding:
> >
> > "GUEST_MEMFD_FLAG_MMAP": we support the mmap() operation
>
> Probably stating the obvious, but yes, I am.
Heh, my brain is a bit fried. I didn't realize you were asking about
doing s/GUEST_MEMFD_FLAG_MMAPPABLE/GUEST_MEMFD_FLAG_MMAP until I read your other
mail.
Luckily, I 100% agree that GUEST_MEMFD_FLAG_MMAP is way better.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
@ 2025-06-24 10:02 ` Fuad Tabba
2025-06-24 10:16 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-24 10:02 UTC (permalink / raw)
To: David Hildenbrand, Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Hi,
Before I respin this, I thought I'd outline the planned changes for
V13, especially since it involves a lot of repainting. I hope that by
presenting this first, we could reduce the number of times I'll need
to respin it.
In struct kvm_arch: add bool supports_gmem instead of renaming
has_private_mem
The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
called GUEST_MEMFD_FLAG_MMAP
The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should
be called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
kvm_arch_supports_gmem_shared_mem() should be called
kvm_arch_supports_gmem_mmap()
kvm_gmem_memslot_supports_shared() should be called
kvm_gmem_memslot_supports_mmap()
kvm_gmem_fault_shared(struct vm_fault *vmf) should be called
kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
The capability KVM_CAP_GMEM_SHARED_MEM should be called KVM_CAP_GMEM_MMAP
The Kconfig CONFIG_KVM_GMEM_SHARED_MEM should be called
CONFIG_KVM_GMEM_SUPPORTS_MMAP
Also, what (unless you disagree) will stay the same as V12:
Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM: Since private
implies gmem, and we will have additional flags for MMAP support
Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
private does imply that it has gmem
Thanks,
/fuad
On Thu, 12 Jun 2025 at 18:39, David Hildenbrand <david@redhat.com> wrote:
>
> On 11.06.25 15:33, Fuad Tabba wrote:
> > Main changes since v11 [1]:
> > - Addressed various points of feedback from the last revision.
> > - Rebased on Linux 6.16-rc1.
>
> Nit: In case you have to resend, it might be worth changing the subject
> s/software protected/non-CoCo/ like you did in patch #12.
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-24 10:02 ` Fuad Tabba
@ 2025-06-24 10:16 ` David Hildenbrand
2025-06-24 10:25 ` Fuad Tabba
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-24 10:16 UTC (permalink / raw)
To: Fuad Tabba, Sean Christopherson
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On 24.06.25 12:02, Fuad Tabba wrote:
> Hi,
>
> Before I respin this, I thought I'd outline the planned changes for
> V13, especially since it involves a lot of repainting. I hope that
> by presenting this first, we could reduce the number of times I'll
> need to respin it.
>
> In struct kvm_arch: add bool supports_gmem instead of renaming
> has_private_mem
>
> The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
> called GUEST_MEMFD_FLAG_MMAP
>
> The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should be
> called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
>
> kvm_arch_supports_gmem_shared_mem() should be called
> kvm_arch_supports_gmem_mmap()
>
> kvm_gmem_memslot_supports_shared() should be called
> kvm_gmem_memslot_supports_mmap()
>
> kvm_gmem_fault_shared(struct vm_fault *vmf) should be called
> kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>
> The capability KVM_CAP_GMEM_SHARED_MEM should be called
> KVM_CAP_GMEM_MMAP
>
> The Kconfig CONFIG_KVM_GMEM_SHARED_MEM should be called
> CONFIG_KVM_GMEM_SUPPORTS_MMAP
Works for me.
>
> Also, what (unless you disagree) will stay the same as V12:
>
> Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM: Since private
> implies gmem, and we will have additional flags for MMAP support
Agreed.
>
> Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
> CONFIG_KVM_GENERIC_GMEM_POPULATE
Agreed.
>
> Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
> private does imply that it has gmem
Right. It's a little more tricky in reality, at least with this series:
without in-place conversion, not all gmem can have private memory. But
the places that check kvm_slot_can_be_private() likely only care about
whether this memslot is backed by gmem.
Sean also raised a "kvm_is_memslot_gmem_only()", how did you end up
calling that?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-24 10:16 ` David Hildenbrand
@ 2025-06-24 10:25 ` Fuad Tabba
2025-06-24 11:44 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-24 10:25 UTC (permalink / raw)
To: David Hildenbrand
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta, ira.weiny
Hi David,
On Tue, 24 Jun 2025 at 11:16, David Hildenbrand <david@redhat.com> wrote:
>
> On 24.06.25 12:02, Fuad Tabba wrote:
> > Hi,
> >
> > Before I respin this, I thought I'd outline the planned changes for
> > V13, especially since it involves a lot of repainting. I hope that
> > by presenting this first, we could reduce the number of times I'll
> > need to respin it.
> >
> > In struct kvm_arch: add bool supports_gmem instead of renaming
> > has_private_mem
> >
> > The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
> > called GUEST_MEMFD_FLAG_MMAP
> >
> > The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should be
> > called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
> >
> > kvm_arch_supports_gmem_shared_mem() should be called
> > kvm_arch_supports_gmem_mmap()
> >
> > kvm_gmem_memslot_supports_shared() should be called
> > kvm_gmem_memslot_supports_mmap()
> >
> > kvm_gmem_fault_shared(struct vm_fault *vmf) should be called
> > kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> >
> > The capability KVM_CAP_GMEM_SHARED_MEM should be called
> > KVM_CAP_GMEM_MMAP
> >
> > The Kconfig CONFIG_KVM_GMEM_SHARED_MEM should be called
> > CONFIG_KVM_GMEM_SUPPORTS_MMAP
>
> Works for me.
>
> >
> > Also, what (unless you disagree) will stay the same as V12:
> >
> > Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM: Since private
> > implies gmem, and we will have additional flags for MMAP support
>
> Agreed.
>
> >
> > Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
> > CONFIG_KVM_GENERIC_GMEM_POPULATE
>
> Agreed.
>
> >
> > Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
> > private does imply that it has gmem
>
> Right. It's a little more tricky in reality at least with this series:
> without in-place conversion, not all gmem can have private memory. But
> the places that check kvm_slot_can_be_private() likely only care about
> if this memslot is backed by gmem.
Exactly. Reading the code, all the places that check
kvm_slot_can_be_private() are really checking whether the slot has
gmem. After this series, a caller that is interested in finding out
whether a slot can be private could achieve the same effect by
checking that a gmem slot doesn't support mmap (i.e.,
kvm_slot_has_gmem() && !kvm_arch_supports_gmem_mmap() ). If that
happens, we can reintroduce kvm_slot_can_be_private() as such.
Otherwise, I could keep it and already define it as so (see the sketch
below). What do you think?
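A minimal sketch of that reintroduced helper on top of the new names; this
is not part of the series, and the signature takes kvm explicitly because
the mmap check in this series is per-VM (the current helper takes only the
slot):

	static inline bool kvm_slot_can_be_private(struct kvm *kvm,
						   const struct kvm_memory_slot *slot)
	{
		/* Has gmem, and that gmem is not the mmap()-able, always-shared kind. */
		return kvm_slot_has_gmem(slot) && !kvm_arch_supports_gmem_mmap(kvm);
	}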
> Sean also raised a "kvm_is_memslot_gmem_only()", how did you end up
> calling that?
Good point, I'd missed that. Isn't it true that
kvm_is_memslot_gmem_only() is synonymous (at least for now) with
kvm_gmem_memslot_supports_mmap()?
Thanks,
/fuad
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-24 10:25 ` Fuad Tabba
@ 2025-06-24 11:44 ` David Hildenbrand
2025-06-24 11:58 ` Fuad Tabba
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-24 11:44 UTC (permalink / raw)
To: Fuad Tabba
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta, ira.weiny
On 24.06.25 12:25, Fuad Tabba wrote:
> Hi David,
>
> On Tue, 24 Jun 2025 at 11:16, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 24.06.25 12:02, Fuad Tabba wrote:
>>> Hi,
>>>
>>> Before I respin this, I thought I'd outline the planned changes for
>>> V13, especially since it involves a lot of repainting. I hope that
>>> by presenting this first, we could reduce the number of times I'll
>>> need to respin it.
>>>
>>> In struct kvm_arch: add bool supports_gmem instead of renaming
>>> has_private_mem
>>>
>>> The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
>>> called GUEST_MEMFD_FLAG_MMAP
>>>
>>> The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should be
>>> called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
>>>
>>> kvm_arch_supports_gmem_shared_mem() should be called
>>> kvm_arch_supports_gmem_mmap()
>>>
>>> kvm_gmem_memslot_supports_shared() should be called
>>> kvm_gmem_memslot_supports_mmap()
>>>
>>> kvm_gmem_fault_shared(struct vm_fault *vmf) should be called
>>> kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>>
>>> The capability KVM_CAP_GMEM_SHARED_MEM should be called
>>> KVM_CAP_GMEM_MMAP
>>>
>>> The Kconfig CONFIG_KVM_GMEM_SHARED_MEM should be called
>>> CONFIG_KVM_GMEM_SUPPORTS_MMAP
>>
>> Works for me.
>>
>>>
>>> Also, what (unless you disagree) will stay the same as V12:
>>>
>>> Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM: Since private
>>> implies gmem, and we will have additional flags for MMAP support
>>
>> Agreed.
>>
>>>
>>> Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
>>> CONFIG_KVM_GENERIC_GMEM_POPULATE
>>
>> Agreed.
>>
>>>
>>> Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
>>> private does imply that it has gmem
>>
>> Right. It's a little more tricky in reality at least with this series:
>> without in-place conversion, not all gmem can have private memory. But
>> the places that check kvm_slot_can_be_private() likely only care about
>> if this memslot is backed by gmem.
>
> Exactly. Reading the code, all the places that check
> kvm_slot_can_be_private() are really checking whether the slot has
> gmem. After this series, if a caller is interested in finding out
> whether a slot can be private could achieve the same effect by
> checking that a gmem slot doesn't support mmap (i.e.,
> kvm_slot_has_gmem() && kvm_arch_supports_gmem_mmap() ). If that
> happens, we can reintroduce kvm_slot_can_be_private() as such.
>
> Otherwise, I could keep it and already define it as so. What do you think?
>
>> Sean also raised a "kvm_is_memslot_gmem_only()", how did you end up
>> calling that?
>
> Good point, I'd missed that. Isn't it true that
> kvm_is_memslot_gmem_only() is synonymous (at least for now) with
> kvm_gmem_memslot_supports_mmap()?
Yes. I think having a simple kvm_is_memslot_gmem_only() helper might
make fault handling code easier to read, though.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-24 11:44 ` David Hildenbrand
@ 2025-06-24 11:58 ` Fuad Tabba
2025-06-24 17:50 ` Sean Christopherson
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-24 11:58 UTC (permalink / raw)
To: David Hildenbrand
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve,
ackerleytng, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta, ira.weiny
On Tue, 24 Jun 2025 at 12:44, David Hildenbrand <david@redhat.com> wrote:
>
> On 24.06.25 12:25, Fuad Tabba wrote:
> > Hi David,
> >
> > On Tue, 24 Jun 2025 at 11:16, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 24.06.25 12:02, Fuad Tabba wrote:
> >>> Hi,
> >>>
> >>> Before I respin this, I thought I'd outline the planned changes for
> >>> V13, especially since it involves a lot of repainting. I hope that
> >>> by presenting this first, we could reduce the number of times I'll
> >>> need to respin it.
> >>>
> >>> In struct kvm_arch: add bool supports_gmem instead of renaming
> >>> has_private_mem
> >>>
> >>> The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
> >>> called GUEST_MEMFD_FLAG_MMAP
> >>>
> >>> The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should be
> >>> called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
> >>>
> >>> kvm_arch_supports_gmem_shared_mem() should be called
> >>> kvm_arch_supports_gmem_mmap()
> >>>
> >>> kvm_gmem_memslot_supports_shared() should be called
> >>> kvm_gmem_memslot_supports_mmap()
> >>>
> >>> kvm_gmem_fault_shared(struct vm_fault *vmf) should be called
> >>> kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> >>>
> >>> The capability KVM_CAP_GMEM_SHARED_MEM should be called
> >>> KVM_CAP_GMEM_MMAP
> >>>
> >>> The Kconfig CONFIG_KVM_GMEM_SHARED_MEM should be called
> >>> CONFIG_KVM_GMEM_SUPPORTS_MMAP
> >>
> >> Works for me.
> >>
> >>>
> >>> Also, what (unless you disagree) will stay the same as V12:
> >>>
> >>> Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM: Since private
> >>> implies gmem, and we will have additional flags for MMAP support
> >>
> >> Agreed.
> >>
> >>>
> >>> Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to
> >>> CONFIG_KVM_GENERIC_GMEM_POPULATE
> >>
> >> Agreed.
> >>
> >>>
> >>> Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
> >>> private does imply that it has gmem
> >>
> >> Right. It's a little more tricky in reality at least with this series:
> >> without in-place conversion, not all gmem can have private memory. But
> >> the places that check kvm_slot_can_be_private() likely only care about
> >> if this memslot is backed by gmem.
> >
> > Exactly. Reading the code, all the places that check
> > kvm_slot_can_be_private() are really checking whether the slot has
> > gmem. After this series, if a caller is interested in finding out
> > whether a slot can be private could achieve the same effect by
> > checking that a gmem slot doesn't support mmap (i.e.,
> > kvm_slot_has_gmem() && kvm_arch_supports_gmem_mmap() ). If that
> > happens, we can reintroduce kvm_slot_can_be_private() as such.
> >
> > Otherwise, I could keep it and already define it as so. What do you think?
> >
> >> Sean also raised a "kvm_is_memslot_gmem_only()", how did you end up
> >> calling that?
> >
> > Good point, I'd missed that. Isn't it true that
> > kvm_is_memslot_gmem_only() is synonymous (at least for now) with
> > kvm_gmem_memslot_supports_mmap()?
>
> Yes. I think having a simple kvm_is_memslot_gmem_only() helper might
> make fault handling code easier to read, though.
Ack. So, with that, at least the two of us are in agreement about what
needs to be done for V13. I'll wait until I hear from Sean and
potentially the others before I respin.
Thanks!
/fuad
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-24 11:58 ` Fuad Tabba
@ 2025-06-24 17:50 ` Sean Christopherson
2025-06-25 8:00 ` Fuad Tabba
0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2025-06-24 17:50 UTC (permalink / raw)
To: Fuad Tabba
Cc: David Hildenbrand, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Tue, Jun 24, 2025, Fuad Tabba wrote:
> On Tue, 24 Jun 2025 at 12:44, David Hildenbrand <david@redhat.com> wrote:
> >
> > On 24.06.25 12:25, Fuad Tabba wrote:
> > > Hi David,
> > >
> > > On Tue, 24 Jun 2025 at 11:16, David Hildenbrand <david@redhat.com> wrote:
> > >>
> > >> On 24.06.25 12:02, Fuad Tabba wrote:
> > >>> Hi,
> > >>>
> > >>> Before I respin this, I thought I'd outline the planned changes for
> > >>> V13, especially since it involves a lot of repainting. I hope that
> > >>> by presenting this first, we could reduce the number of times I'll
> > >>> need to respin it.
> > >>>
> > >>> In struct kvm_arch: add bool supports_gmem instead of renaming
> > >>> has_private_mem
> > >>>
> > >>> The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
> > >>> called GUEST_MEMFD_FLAG_MMAP
> > >>>
> > >>> The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should be
> > >>> called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
This one...
> > >>> kvm_arch_supports_gmem_shared_mem() should be called
> > >>> kvm_arch_supports_gmem_mmap()
> > >>>
> > >>> kvm_gmem_memslot_supports_shared() should be called
> > >>> kvm_gmem_memslot_supports_mmap()
...and this one are the only names I don't like. Explanation below.
> > >>> Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
> > >>> private does imply that it has gmem
> > >>
> > >> Right. It's a little more tricky in reality at least with this series:
> > >> without in-place conversion, not all gmem can have private memory. But
> > >> the places that check kvm_slot_can_be_private() likely only care about
> > >> if this memslot is backed by gmem.
> > >
> > > Exactly. Reading the code, all the places that check
> > > kvm_slot_can_be_private() are really checking whether the slot has gmem.
Yeah, I'm fine with this change. There are a few KVM x86 uses where
kvm_slot_can_be_private() is slightly better in a vacuum, but in all but one of
those cases, the check immediately gates a kvm_gmem_xxx() call. I.e. when
looking at the code as a whole, I think kvm_slot_has_gmem() will be easier for
new readers to understand.
The only outlier is kvm_mmu_max_mapping_level(), but that'll probably get ripped
apart by this series, i.e. I'm guessing kvm_slot_has_gmem() will probably work
out better there too.
> > > After this series, if a caller is interested in finding out whether a
> > > slot can be private could achieve the same effect by checking that a gmem
> > > slot doesn't support mmap (i.e., kvm_slot_has_gmem() &&
> > > kvm_arch_supports_gmem_mmap() ). If that happens, we can reintroduce
> > > kvm_slot_can_be_private() as such.
> > >
> > > Otherwise, I could keep it and already define it as so. What do you think?
> > >
> > >> Sean also raised a "kvm_is_memslot_gmem_only()", how did you end up
> > >> calling that?
> > >
> > > Good point, I'd missed that. Isn't it true that
> > > kvm_is_memslot_gmem_only() is synonymous (at least for now) with
> > > kvm_gmem_memslot_supports_mmap()?
> >
> > Yes. I think having a simple kvm_is_memslot_gmem_only() helper might
> > make fault handling code easier to read, though.
Yep, exactly. The fact that a memslot is bound to a guest_memfd instance that
supports mmap() isn't actually what KVM cares about. The important part is that
the userspace_addr in the memslot needs to be ignored when mapping memory into
the guest, because the bound guest_memfd is the single source of truth for guest
mappings.
E.g. userspace could actually point userspace_addr at a completely different
mapping, in which case walking the userspace page tables to get the max mapping
size would be all kinds of wrong.
KVM will still use userspace_addr when accessing guest memory from within KVM,
but that's not dangerous to the host kernel/KVM, only to the guest (and userspace
is firmly in the TCB for that side of things).
So I think KVM_MEMSLOT_IS_GMEM_ONLY and kvm_is_memslot_gmem_only()?
Those names are technically not entirely true, because as above, there is no
guarantee that userspace_addr actually points at the bound guest_memfd. But
for all intents and purposes, that will hold true for all non-buggy setups.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-13 20:35 ` Sean Christopherson
2025-06-16 7:13 ` Fuad Tabba
@ 2025-06-24 20:51 ` Ackerley Tng
2025-06-25 6:33 ` Roy, Patrick
1 sibling, 1 reply; 75+ messages in thread
From: Ackerley Tng @ 2025-06-24 20:51 UTC (permalink / raw)
To: Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>> The bool has_private_mem is used to indicate whether guest_memfd is
>> supported.
>
> No? This is at best weird, and at worst flat out wrong:
>
> if (kvm->arch.supports_gmem &&
> fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
> return false;
>
> ditto for this code:
>
> if (kvm_arch_supports_gmem(vcpu->kvm) &&
> kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))i
> error_code |= PFERR_PRIVATE_ACCESS;
>
> and for the memory_attributes code. E.g. IIRC, with guest_memfd() mmap support,
> private vs. shared will become a property of the guest_memfd inode, i.e. this will
> become wrong:
>
> static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> {
> if (!kvm || kvm_arch_supports_gmem(kvm))
> return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>
> return 0;
> }
>
> Instead of renaming kvm_arch_has_private_mem() => kvm_arch_supports_gmem(), *add*
> kvm_arch_supports_gmem() and then kill off kvm_arch_has_private_mem() once non-x86
> usage is gone (i.e. query kvm->arch.has_private_mem directly).
>
> And then rather than rename has_private_mem, either add supports_gmem or do what
> you did for kvm_arch_supports_gmem_shared_mem() and explicitly check the VM type.
>
IIUC Fuad will be adding bool supports_gmem instead of renaming, but we
haven't discussed which usages will start using the new function.
Let me go over all the changes/usages related to has_private_mem and
supports_gmem.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6e0bbf4c2202..3d69da6d2d9e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2270,9 +2270,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>
>
> #ifdef CONFIG_KVM_GMEM
> -#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
> +#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
> #else
> -#define kvm_arch_has_private_mem(kvm) false
> +#define kvm_arch_supports_gmem(kvm) false
> #endif
>
*The* renaming vs adding-new-function change.
> #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
> @@ -2325,8 +2325,8 @@ enum {
> #define HF_SMM_INSIDE_NMI_MASK (1 << 2)
>
> # define KVM_MAX_NR_ADDRESS_SPACES 2
> -/* SMM is currently unsupported for guests with private memory. */
> -# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 2)
> +/* SMM is currently unsupported for guests with guest_memfd (esp private) memory. */
> +# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_supports_gmem(kvm) ? 1 : 2)
> # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
> # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
> #else
IIUC from the discussion at the guest_memfd call on 2025-05-15, SMM can't be
supported because SMM relies on memory being shared.
This should remain as kvm_arch_has_private_mem() - as long as the VM
supports private memory at all, kvm_arch_nr_memslot_as_ids() should
return 1 (no SMM support).
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index cbc84c6abc2e..e7ecf089780a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4910,7 +4910,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> if (r)
> return r;
>
> - if (kvm_arch_has_private_mem(vcpu->kvm) &&
> + if (kvm_arch_supports_gmem(vcpu->kvm) &&
> kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
> error_code |= PFERR_PRIVATE_ACCESS;
>
If the VM supports private mem and KVM knows the gfn to be private
(whether based on memory attributes or in future, guest_memfd's
shareability), prefault it as private.
Here, technically, the kvm_arch_has_private_mem() check just helps
short-circuit to save deeper lookups, but if kvm_arch_has_private_mem()
is false, kvm_mem_is_private() always returns false anyway.
This should remain as kvm_arch_has_private_mem().
> @@ -7707,7 +7707,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> * Zapping SPTEs in this case ensures KVM will reassess whether or not
> * a hugepage can be used for affected ranges.
> */
> - if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
> + if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
> return false;
>
> if (WARN_ON_ONCE(range->end <= range->start))
Skip setting memory attributes if this kvm doesn't support private
memory.
This should remain as kvm_arch_has_private_mem().
> @@ -7786,7 +7786,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> * a range that has PRIVATE GFNs, and conversely converting a range to
> * SHARED may now allow hugepages.
> */
> - if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
> + if (WARN_ON_ONCE(!kvm_arch_supports_gmem(kvm)))
> return false;
>
> /*
Skip setting memory attributes if this kvm doesn't support private
memory.
This should remain as kvm_arch_has_private_mem().
> @@ -7842,7 +7842,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
> {
> int level;
>
> - if (!kvm_arch_has_private_mem(kvm))
> + if (!kvm_arch_supports_gmem(kvm))
> return;
>
> for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
Skip initializing memory attributes if this kvm doesn't support private
memory, since for now KVM_MEMORY_ATTRIBUTE_PRIVATE is the only memory
attribute.
This should remain as kvm_arch_has_private_mem().
Or perhaps (separately from this series) this check can be changed to
kvm_supported_mem_attributes() != 0.
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7700efc06e35..a0e661aa3f8a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -719,11 +719,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
> #endif
>
> /*
> - * Arch code must define kvm_arch_has_private_mem if support for private memory
> + * Arch code must define kvm_arch_supports_gmem if support for guest_memfd
> * is enabled.
> */
> -#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> -static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> +#if !defined(kvm_arch_supports_gmem) && !IS_ENABLED(CONFIG_KVM_GMEM)
> +static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
> {
> return false;
> }
*The* renaming vs adding-new-function change.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 898c3d5a7ba8..6efbea208fa6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1588,7 +1588,7 @@ static int check_memory_region_flags(struct kvm *kvm,
> {
> u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
>
> - if (kvm_arch_has_private_mem(kvm))
> + if (kvm_arch_supports_gmem(kvm))
> valid_flags |= KVM_MEM_GUEST_MEMFD;
>
> /* Dirty logging private memory is not currently supported. */
This should be renamed - the flag is valid only if guest_memfd is
supported and squarely matches kvm_arch_supports_gmem().
> @@ -2419,7 +2419,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
> #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> {
> - if (!kvm || kvm_arch_has_private_mem(kvm))
> + if (!kvm || kvm_arch_supports_gmem(kvm))
> return KVM_MEMORY_ATTRIBUTE_PRIVATE;
>
> return 0;
This should remain as kvm_arch_has_private_mem().
There's a little issue once mmap support is added: for the VM in general
(i.e., for non-guest_memfd memslots in this VM), KVM_MEMORY_ATTRIBUTE_PRIVATE
should be supported, but should it not be supported specifically for ranges
that belong to guest_memfd-only memslots?
I think kvm_supported_mem_attributes() should respond generically for the
entire VM, so leaving this as kvm_arch_has_private_mem() is correct.
> @@ -4912,7 +4912,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> #endif
> #ifdef CONFIG_KVM_GMEM
> case KVM_CAP_GUEST_MEMFD:
> - return !kvm || kvm_arch_has_private_mem(kvm);
> + return !kvm || kvm_arch_supports_gmem(kvm);
> #endif
> default:
> break;
This should be renamed - the CAP is valid only if guest_memfd is
supported and squarely matches kvm_arch_supports_gmem().
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3d69da6d2d9e..4bc50c1e21bd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1341,7 +1341,7 @@ struct kvm_arch {
> unsigned int indirect_shadow_pages;
> u8 mmu_valid_gen;
> u8 vm_type;
> - bool has_private_mem;
> + bool supports_gmem;
> bool has_protected_state;
> bool pre_fault_allowed;
> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> @@ -2270,7 +2270,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>
>
> #ifdef CONFIG_KVM_GMEM
> -#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
> +#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
> #else
> #define kvm_arch_supports_gmem(kvm) false
> #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e7ecf089780a..c4e10797610c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3488,7 +3488,7 @@ static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault
> * on RET_PF_SPURIOUS until the update completes, or an actual spurious
> * case might go down the slow path. Either case will resolve itself.
> */
> - if (kvm->arch.has_private_mem &&
> + if (kvm->arch.supports_gmem &&
> fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
> return false;
>
This check should remain as a check on has_private_mem.
If the VM supports private memory, skip fast page faults when the fault
type doesn't match KVM's view of the gfn's privacy status.
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ab9b947dbf4f..67ab05fd3517 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5180,8 +5180,8 @@ static int svm_vm_init(struct kvm *kvm)
> (type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
> to_kvm_sev_info(kvm)->need_init = true;
>
> - kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
> - kvm->arch.pre_fault_allowed = !kvm->arch.has_private_mem;
> + kvm->arch.supports_gmem = (type == KVM_X86_SNP_VM);
> + kvm->arch.pre_fault_allowed = !kvm->arch.supports_gmem;
> }
>
> if (!pause_filter_count || !pause_filter_thresh)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b58a74c1722d..401256ee817f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12778,8 +12778,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> return -EINVAL;
>
> kvm->arch.vm_type = type;
> - kvm->arch.has_private_mem =
> - (type == KVM_X86_SW_PROTECTED_VM);
> + kvm->arch.supports_gmem = (type == KVM_X86_SW_PROTECTED_VM);
> /* Decided by the vendor code for other VM types. */
> kvm->arch.pre_fault_allowed =
> type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM;
and
> arch/x86/kvm/vmx/tdx.c: In function ‘tdx_vm_init’:
> arch/x86/kvm/vmx/tdx.c:627:18: error: ‘struct kvm_arch’ has no member named ‘has_private_mem’
> 627 | kvm->arch.has_private_mem = true;
> | ^
> make[5]: *** [scripts/Makefile.build:287: arch/x86/kvm/vmx/tdx.o] Error 1
These three changes make me think that maybe .has_private_mem shouldn't
be a field at all and can be removed.
What if kvm_arch_has_private_mem() for x86 always checks for a specific
list of VM types? Something like this: on x86,
* kvm_arch_has_private_mem() will return true for
KVM_X86_SW_PROTECTED_VM, KVM_X86_SNP_VM and KVM_X86_TDX_VM.
* kvm_arch_supports_gmem() will return true for KVM_X86_SW_PROTECTED_VM,
KVM_X86_SNP_VM and KVM_X86_TDX_VM as well.
After mmap support, kvm_arch_supports_gmem() would also return true for
KVM_X86_DEFAULT_VM, in addition to the original SW_PROTECTED, SNP and
TDX. Something like the sketch below:
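Sketch only, not a patch; it just restates the VM-type lists above in terms
of the existing kvm->arch.vm_type field and the uapi VM-type constants:

	/* x86: derive both predicates from the VM type instead of a bool field. */
	#define kvm_arch_has_private_mem(kvm)				\
		((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||	\
		 (kvm)->arch.vm_type == KVM_X86_SNP_VM ||		\
		 (kvm)->arch.vm_type == KVM_X86_TDX_VM)

	/* After mmap support, gmem additionally covers the default VM type. */
	#define kvm_arch_supports_gmem(kvm)				\
		(kvm_arch_has_private_mem(kvm) ||			\
		 (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)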
Patrick, Nikita, am I right that for KVM_X86_DEFAULT_VM to work with
mmap-able guest_memfd, the usage in page_fault_can_be_fast() need not be
updated, and that patch 10/18 in this series will be sufficient?
>> [...]
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-13 22:08 ` Sean Christopherson
@ 2025-06-24 23:40 ` Ackerley Tng
2025-06-27 15:01 ` Ackerley Tng
0 siblings, 1 reply; 75+ messages in thread
From: Ackerley Tng @ 2025-06-24 23:40 UTC (permalink / raw)
To: Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> For memslots backed by guest_memfd with shared mem support, the KVM MMU
>> must always fault in pages from guest_memfd, and not from the host
>> userspace_addr. Update the fault handler to do so.
>
> And with a KVM_MEMSLOT_GUEST_MEMFD_ONLY flag, this becomes super obvious.
>
>> This patch also refactors related function names for accuracy:
>
> This patch. And phrase changelogs as commands.
>
>> kvm_mem_is_private() returns true only when the current private/shared
>> state (in the CoCo sense) of the memory is private, and returns false if
>> the current state is shared explicitly or implicitly, e.g., belongs to a
>> non-CoCo VM.
>
> Again, state changes as commands. For the above, it's not obvious if you're
> talking about the existing code versus the state of things after "this patch".
>
>
Will fix these, thanks!
>> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
>> fault in not just private memory, but more generally, from guest_memfd.
>
>> +static inline u8 kvm_max_level_for_order(int order)
>
> Do not use "inline" for functions that are visible only to the local compilation
> unit. "inline" is just a hint, and modern compilers are smart enough to inline
> functions when appropriate without a hint.
>
> A longer explanation/rant here: https://lore.kernel.org/all/ZAdfX+S323JVWNZC@google.com
>
Will fix this!
>> +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
>> + gfn_t gfn, int max_level)
>> +{
>> + int max_order;
>>
>> if (max_level == PG_LEVEL_4K)
>> return PG_LEVEL_4K;
>
> This is dead code, the one and only caller has *just* checked for this condition.
>>
>> - host_level = host_pfn_mapping_level(kvm, gfn, slot);
>> - return min(host_level, max_level);
>> + max_order = kvm_gmem_mapping_order(slot, gfn);
>> + return min(max_level, kvm_max_level_for_order(max_order));
>> }
>
> ...
>
>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
>> - u8 max_level, int gmem_order)
>> +static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>
> This is comically verbose. C ain't Java. And having two separate helpers makes
> it *really* hard to (a) even see there are TWO helpers in the first place, and
> (b) understand how they differ.
>
> Gah, and not your bug, but completely ignoring the RMP in kvm_mmu_max_mapping_level()
> is wrong. It "works" because guest_memfd doesn't (yet) support dirty logging,
> no one enables the NX hugepage mitigation on AMD hosts.
>
> We could plumb in the pfn and private info, but I don't really see the point,
> at least not at this time.
>
>> + struct kvm_page_fault *fault,
>> + int order)
>> {
>> - u8 req_max_level;
>> + u8 max_level = fault->max_level;
>>
>> if (max_level == PG_LEVEL_4K)
>> return PG_LEVEL_4K;
>>
>> - max_level = min(kvm_max_level_for_order(gmem_order), max_level);
>> + max_level = min(kvm_max_level_for_order(order), max_level);
>> if (max_level == PG_LEVEL_4K)
>> return PG_LEVEL_4K;
>>
>> - req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
>> - if (req_max_level)
>> - max_level = min(max_level, req_max_level);
>> + if (fault->is_private) {
>> + u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
>
> Hmm, so the interesting thing here is that (IIRC) the RMP restrictions aren't
> just on the private pages, they also apply to the HYPERVISOR/SHARED pages. (Don't
> quote me on that).
>
> Regardless, I'm leaning toward dropping the "private" part, and making SNP deal
> with the intricacies of the RMP:
>
> /* Some VM types have additional restrictions, e.g. SNP's RMP. */
> req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
> if (req_max_level)
> max_level = min(max_level, req_max_level);
>
> Then we can get to something like:
>
> static int kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> struct kvm_page_fault *fault)
> {
> int max_level, req_max_level;
>
> max_level = kvm_max_level_for_order(order);
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
> if (req_max_level)
> max_level = min(max_level, req_max_level);
>
> return max_level;
> }
>
> int kvm_mmu_max_mapping_level(struct kvm *kvm,
> const struct kvm_memory_slot *slot, gfn_t gfn)
> {
> int max_level;
>
> max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> /* TODO: Comment goes here about KVM not supporting this path (yet). */
Which path does KVM not support?
> if (kvm_mem_is_private(kvm, gfn))
> return PG_LEVEL_4K;
>
Just making sure - this suggestion does take into account that
kvm_mem_is_private() will be querying guest_memfd for memory privacy
status, right? So the check below for kvm_is_memslot_gmem_only() will
only be handling the cases where the memory is shared, and only
guest_memfd is used for this gfn?
> if (kvm_is_memslot_gmem_only(slot)) {
> int order = kvm_gmem_mapping_order(slot, gfn);
>
> return min(max_level, kvm_gmem_max_mapping_level(kvm, order, NULL));
> }
>
> return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
> }
>
> static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> struct kvm *kvm = vcpu->kvm;
> int order, r;
>
> if (!kvm_slot_has_gmem(fault->slot)) {
> kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> return -EFAULT;
> }
>
> r = kvm_gmem_get_pfn(kvm, fault->slot, fault->gfn, &fault->pfn,
> &fault->refcounted_page, &order);
> if (r) {
> kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> return r;
> }
>
> fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> fault->max_level = kvm_gmem_max_mapping_level(kvm, order, fault);
>
> return RET_PF_CONTINUE;
> }
>
> int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
> {
> int level, rc;
> bool assigned;
>
> if (!sev_snp_guest(kvm))
> return 0;
>
> if (WARN_ON_ONCE(!fault) || !fault->is_private)
> return 0;
>
> rc = snp_lookup_rmpentry(fault->pfn, &assigned, &level);
> if (rc || !assigned)
> return PG_LEVEL_4K;
>
> return level;
> }
I like this. Thanks for the suggestion, I'll pass Fuad some patch(es)
for v13.
>> +/*
>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> + * private.
>> + *
>> + * A return value of false indicates that the gfn is explicitly or implicitly
>> + * shared (i.e., non-CoCo VMs).
>> + */
>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> {
>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> + struct kvm_memory_slot *slot;
>> +
>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> + return false;
>> +
>> + slot = gfn_to_memslot(kvm, gfn);
>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> + /*
>> + * Without in-place conversion support, if a guest_memfd memslot
>> + * supports shared memory, then all the slot's memory is
>> + * considered not private, i.e., implicitly shared.
>> + */
>> + return false;
>
> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> mappable guest_memfd. You need to do that no matter what.
Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
out. Where do people think we should check the mutual exclusivity?
In kvm_supported_mem_attributes() I'm thinking that we should still allow
the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other, non-guest_memfd-only
gfn ranges. Or do people think we should just disallow
KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
a guest_memfd-only memslot?
If we check mutual exclusivity when handling
kvm_vm_set_memory_attributes(), then whenever any part of the range where
KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
whose slot is guest_memfd-only, the ioctl will return EINVAL. Something
like the sketch below:
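Sketch only: kvm_range_has_gmem_only_memslot() is a made-up name for
illustration, and kvm_is_memslot_gmem_only() is the helper discussed
elsewhere in the thread; everything else is existing memslot iteration
machinery:

	static bool kvm_range_has_gmem_only_memslot(struct kvm *kvm,
						    gfn_t start, gfn_t end)
	{
		int i;

		for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
			struct kvm_memslots *slots = __kvm_memslots(kvm, i);
			struct kvm_memslot_iter iter;

			kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
				if (kvm_is_memslot_gmem_only(iter.slot))
					return true;
			}
		}

		return false;
	}

	/* In the KVM_SET_MEMORY_ATTRIBUTES handler, before doing any work: */
	if ((attrs->attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE) &&
	    kvm_range_has_gmem_only_memslot(kvm, start, end))
		return -EINVAL;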
> Then you don't need
> to sprinkle special case code all over the place.
>
That's true, thanks.
I guess the special-casing will come back when guest_memfd supports
conversions (and stores shareability). At that point, if the memslot is
guest_memfd-only, check with guest_memfd; otherwise, look up memory
attributes with kvm_get_memory_attributes(). Roughly as sketched below:
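(Sketch only; kvm_gmem_is_private() is a hypothetical helper for when
guest_memfd stores shareability, and kvm_is_memslot_gmem_only() is the
helper discussed above.)

	static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
	{
		struct kvm_memory_slot *slot;

		if (!IS_ENABLED(CONFIG_KVM_GMEM))
			return false;

		slot = gfn_to_memslot(kvm, gfn);
		if (kvm_is_memslot_gmem_only(slot))
			return kvm_gmem_is_private(slot, gfn);	/* hypothetical */

		return kvm_get_memory_attributes(kvm, gfn) &
		       KVM_MEMORY_ATTRIBUTE_PRIVATE;
	}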
>> + }
>> +
>> + return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> }
>> #else
>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> --
>> 2.50.0.rc0.642.g800a2b2222-goog
>>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
2025-06-24 20:51 ` Ackerley Tng
@ 2025-06-25 6:33 ` Roy, Patrick
0 siblings, 0 replies; 75+ messages in thread
From: Roy, Patrick @ 2025-06-25 6:33 UTC (permalink / raw)
To: ackerleytng@google.com, Sean Christopherson, Fuad Tabba
Cc: akpm@linux-foundation.org, amoorthy@google.com,
anup@brainfault.org, aou@eecs.berkeley.edu, brauner@kernel.org,
catalin.marinas@arm.com, chao.p.peng@linux.intel.com,
chenhuacai@kernel.org, david@redhat.com, dmatlack@google.com,
fvdl@google.com, hch@infradead.org, hughd@google.com,
ira.weiny@intel.com, isaku.yamahata@gmail.com,
isaku.yamahata@intel.com, james.morse@arm.com, jarkko@kernel.org,
jgg@nvidia.com, jhubbard@nvidia.com, jthoughton@google.com,
keirf@google.com, kirill.shutemov@linux.intel.com,
kvm@vger.kernel.org, kvmarm@lists.linux.dev,
liam.merwick@oracle.com, linux-arm-msm@vger.kernel.org,
linux-mm@kvack.org, mail@maciej.szmigiero.name, maz@kernel.org,
mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
paul.walmsley@sifive.com, pbonzini@redhat.com, peterx@redhat.com,
qperret@google.com, quic_cvanscha@quicinc.com,
quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
rientjes@google.com, Roy, Patrick, shuah@kernel.org,
steven.price@arm.com, suzuki.poulose@arm.com,
vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
xiaoyao.li@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com
Hi Ackerley!
On Tue, 2025-06-24 at 21:51 +0100, Ackerley Tng wrote:> Sean Christopherson <seanjc@google.com> writes:
>
[...]
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 3d69da6d2d9e..4bc50c1e21bd 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1341,7 +1341,7 @@ struct kvm_arch {
>> unsigned int indirect_shadow_pages;
>> u8 mmu_valid_gen;
>> u8 vm_type;
>> - bool has_private_mem;
>> + bool supports_gmem;
>> bool has_protected_state;
>> bool pre_fault_allowed;
>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>> @@ -2270,7 +2270,7 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>>
>>
>> #ifdef CONFIG_KVM_GMEM
>> -#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.has_private_mem)
>> +#define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
>> #else
>> #define kvm_arch_supports_gmem(kvm) false
>> #endif
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index e7ecf089780a..c4e10797610c 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3488,7 +3488,7 @@ static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault
>> * on RET_PF_SPURIOUS until the update completes, or an actual spurious
>> * case might go down the slow path. Either case will resolve itself.
>> */
>> - if (kvm->arch.has_private_mem &&
>> + if (kvm->arch.supports_gmem &&
>> fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
>> return false;
>>
>
> This check should remain as a check on has_private_mem.
>
> If the VM supports private memory, skip fast page faults on fault type
> and KVM memory privacy status mismatches.
...
> Patrick, Nikita, am I right that for KVM_X86_DEFAULT_VM to work with
> mmap-able guest_memfd, the usage in page_fault_can_be_fast() need not be
> updated, and that patch 10/18 in this series will be sufficient?
Yeah, from my understanding, since KVM_X86_DEFAULT_VM does not and won't ever
(?) support private memory in guest_memfd (i.e., it always has to be used in
all-shared mode), the fault->is_private != kvm_mem_is_private(kvm, fault->gfn)
check should never succeed anyway. kvm_mem_is_private() will always return
false, and fault->is_private should always be false, too (unless the guest does
something it should not be doing, and even then the worst case is that we won't
be handling this weirdness "fast").
In my testing with earlier iterations of this series where
page_fault_can_be_fast() was untouched I also never saw any problems related to
page faults on x86.
Best,
Patrick
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-24 17:50 ` Sean Christopherson
@ 2025-06-25 8:00 ` Fuad Tabba
2025-06-25 14:07 ` Sean Christopherson
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-25 8:00 UTC (permalink / raw)
To: Sean Christopherson
Cc: David Hildenbrand, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Thanks Sean,
On Tue, 24 Jun 2025 at 18:50, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jun 24, 2025, Fuad Tabba wrote:
> > On Tue, 24 Jun 2025 at 12:44, David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 24.06.25 12:25, Fuad Tabba wrote:
> > > > Hi David,
> > > >
> > > > On Tue, 24 Jun 2025 at 11:16, David Hildenbrand <david@redhat.com> wrote:
> > > >>
> > > >> On 24.06.25 12:02, Fuad Tabba wrote:
> > > >>> Hi,
> > > >>>
> > > >>> Before I respin this, I thought I'd outline the planned changes for
> > > >>> V13, especially since it involves a lot of repainting. I hope that
> > > >>> by presenting this first, we could reduce the number of times I'll
> > > >>> need to respin it.
> > > >>>
> > > >>> In struct kvm_arch: add bool supports_gmem instead of renaming
> > > >>> has_private_mem
> > > >>>
> > > >>> The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED should be
> > > >>> called GUEST_MEMFD_FLAG_MMAP
> > > >>>
> > > >>> The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED should be
> > > >>> called KVM_MEMSLOT_SUPPORTS_GMEM_MMAP
>
> This one...
>
> > > >>> kvm_arch_supports_gmem_shared_mem() should be called
> > > >>> kvm_arch_supports_gmem_mmap()
> > > >>>
> > > >>> kvm_gmem_memslot_supports_shared() should be called
> > > >>> kvm_gmem_memslot_supports_mmap()
>
> ...and this one are the only names I don't like. Explanation below.
>
> > > >>> Rename kvm_slot_can_be_private() to kvm_slot_has_gmem(): since
> > > >>> private does imply that it has gmem
> > > >>
> > > >> Right. It's a little more tricky in reality at least with this series:
> > > >> without in-place conversion, not all gmem can have private memory. But
> > > >> the places that check kvm_slot_can_be_private() likely only care about
> > > >> if this memslot is backed by gmem.
> > > >
> > > > Exactly. Reading the code, all the places that check
> > > > kvm_slot_can_be_private() are really checking whether the slot has gmem.
>
> Yeah, I'm fine with this change. There are a few KVM x86 uses where
> kvm_slot_can_be_private() is slightly better in a vacuum, but in all but one of
> those cases, the check immediately gates a kvm_gmem_xxx() call. I.e. when
> looking at the code as a whole, I think kvm_slot_has_gmem() will be easier for
> new readers to understand.
>
> The only outlier is kvm_mmu_max_mapping_level(), but that'll probably get ripped
> apart by this series, i.e. I'm guessing kvm_slot_has_gmem() will probably work
> out better there too.
>
> > > > After this series, if a caller is interested in finding out whether a
> > > > slot can be private could achieve the same effect by checking that a gmem
> > > > slot doesn't support mmap (i.e., kvm_slot_has_gmem() &&
> > > > kvm_arch_supports_gmem_mmap() ). If that happens, we can reintroduce
> > > > kvm_slot_can_be_private() as such.
> > > >
> > > > Otherwise, I could keep it and already define it as so. What do you think?
> > > >
> > > >> Sean also raised a "kvm_is_memslot_gmem_only()", how did you end up
> > > >> calling that?
> > > >
> > > > Good point, I'd missed that. Isn't it true that
> > > > kvm_is_memslot_gmem_only() is synonymous (at least for now) with
> > > > kvm_gmem_memslot_supports_mmap()?
> > >
> > > Yes. I think having a simple kvm_is_memslot_gmem_only() helper might
> > > make fault handling code easier to read, though.
>
> Yep, exactly. The fact that a memslot is bound to a guest_memfd instance that
> supports mmap() isn't actually what KVM cares about. The important part is that
> the userspace_addr in the memslot needs to be ignored when mapping memory into
> the guest, because the bound guest_memfd is the single source of truth for guest
> mappings.
>
> E.g. userspace could actually point userspace_addr at a completely different
> mapping, in which case walking the userspace page tables to get the max mapping
> size would be all kinds of wrong.
>
> KVM will still use userspace_addr when access guest memory from within KVM,
> but that's not dangerous to the host kernel/KVM, only to the guest (and userspace
> is firmly in the TCB for that side of things).
>
> So I think KVM_MEMSLOT_IS_GMEM_ONLY and kvm_is_memslot_gmem_only()?
>
> Those names are technically not entirely true, because as above, there is no
> guarantee that userspace_addr actually points at the bound guest_memfd. But
> for all intents and purposes, that will hold true for all non-buggy setups.
Got it. So, to summarize again:
In struct kvm_arch: add `bool supports_gmem` instead of renaming
`bool has_private_mem`
The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED becomes
GUEST_MEMFD_FLAG_MMAP
The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED becomes
KVM_MEMSLOT_GMEM_ONLY
kvm_gmem_memslot_supports_shared() becomes kvm_memslot_is_gmem_only()
kvm_arch_supports_gmem_shared_mem() becomes kvm_arch_supports_gmem_mmap()
kvm_gmem_fault_shared(struct vm_fault *vmf) becomes
kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
The capability KVM_CAP_GMEM_SHARED_MEM becomes KVM_CAP_GMEM_MMAP
The Kconfig CONFIG_KVM_GMEM_SHARED_MEM becomes CONFIG_KVM_GMEM_SUPPORTS_MMAP
What will stay the same as V12:
CONFIG_KVM_PRIVATE_MEM becomes CONFIG_KVM_GMEM
CONFIG_KVM_GENERIC_PRIVATE_MEM becomes CONFIG_KVM_GENERIC_GMEM_POPULATE
kvm_slot_can_be_private() becomes kvm_slot_has_gmem()
Thanks,
/fuad
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
2025-06-25 8:00 ` Fuad Tabba
@ 2025-06-25 14:07 ` Sean Christopherson
0 siblings, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2025-06-25 14:07 UTC (permalink / raw)
To: Fuad Tabba
Cc: David Hildenbrand, kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini,
chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
dmatlack, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
mail, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On Wed, Jun 25, 2025, Fuad Tabba wrote:
> Got it. So, to summarize again:
>
> In struct kvm_arch: add `bool supports_gmem` instead of renaming
> `bool has_private_mem`
>
> The guest_memfd flag GUEST_MEMFD_FLAG_SUPPORT_SHARED becomes
> GUEST_MEMFD_FLAG_MMAP
>
> The memslot internal flag KVM_MEMSLOT_SUPPORTS_GMEM_SHARED becomes
> KVM_MEMSLOT_GMEM_ONLY
>
> kvm_gmem_memslot_supports_shared() becomes kvm_memslot_is_gmem_only()
>
> kvm_arch_supports_gmem_shared_mem() becomes kvm_arch_supports_gmem_mmap()
>
> kvm_gmem_fault_shared(struct vm_fault *vmf) becomes
> kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>
> The capability KVM_CAP_GMEM_SHARED_MEM becomes KVM_CAP_GMEM_MMAP
>
> The Kconfig CONFIG_KVM_GMEM_SHARED_MEM becomes CONFIG_KVM_GMEM_SUPPORTS_MMAP
>
> What will stay the same as V12:
>
> CONFIG_KVM_PRIVATE_MEM becomes CONFIG_KVM_GMEM
>
> CONFIG_KVM_GENERIC_PRIVATE_MEM becomes CONFIG_KVM_GENERIC_GMEM_POPULATE
>
> kvm_slot_can_be_private() becomes kvm_slot_has_gmem()
LGTM!
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-12 16:16 ` Shivank Garg
2025-06-13 21:03 ` Sean Christopherson
@ 2025-06-25 21:47 ` Ackerley Tng
2 siblings, 0 replies; 75+ messages in thread
From: Ackerley Tng @ 2025-06-25 21:47 UTC (permalink / raw)
To: Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm
Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, david, michael.roth, wei.w.wang,
liam.merwick, isaku.yamahata, kirill.shutemov, suzuki.poulose,
steven.price, quic_eberman, quic_mnalajal, quic_tsoni,
quic_svaddagi, quic_cvanscha, quic_pderrin, quic_pheragu,
catalin.marinas, james.morse, yuzenghui, oliver.upton, maz, will,
qperret, keirf, roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl,
hughd, jthoughton, peterx, pankaj.gupta, ira.weiny, tabba
Fuad Tabba <tabba@google.com> writes:
> This patch enables support for shared memory in guest_memfd, including
> mapping that memory from host userspace.
>
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> include/linux/kvm_host.h | 13 +++++++
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/Kconfig | 4 +++
> virt/kvm/guest_memfd.c | 73 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 91 insertions(+)
>
> [...]
>
Just want to call out here that I believe HWpoison handling (and
kvm_gmem_error_folio()) remains correct after this patch. Would still
appreciate a review of the following!
> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> + struct inode *inode = file_inode(vmf->vma->vm_file);
> + struct folio *folio;
> + vm_fault_t ret = VM_FAULT_LOCKED;
> +
> + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> + return VM_FAULT_SIGBUS;
> +
> + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> + if (IS_ERR(folio)) {
> + int err = PTR_ERR(folio);
> +
> + if (err == -EAGAIN)
> + return VM_FAULT_RETRY;
> +
> + return vmf_error(err);
> + }
> +
> + if (WARN_ON_ONCE(folio_test_large(folio))) {
> + ret = VM_FAULT_SIGBUS;
> + goto out_folio;
> + }
> +
> + if (!folio_test_uptodate(folio)) {
> + clear_highpage(folio_page(folio, 0));
> + kvm_gmem_mark_prepared(folio);
> + }
> +
> + vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> + if (ret != VM_FAULT_LOCKED) {
> + folio_unlock(folio);
> + folio_put(folio);
> + }
> +
> + return ret;
> +}
> +
> [...]
This ->fault() callback does not explicitly check for
folio_test_hwpoison(), but further up the call tree, __do_fault() checks
for HWpoison.
If the folio is clean, it is removed from the filemap; the fault is
eventually retried and (hopefully) another, non-HWpoison folio will be
faulted in.
If the folio is dirty, userspace gets a SIGBUS.
kvm_gmem_error_folio() calls kvm_gmem_invalidate_begin(), which only
unmaps KVM_FILTER_PRIVATE, but IIUC that's okay because, after mmap is
introduced:
* non-CoCo VMs will always zap KVM_DIRECT_ROOTS anyway, so the HWpoison
folio is still zapped from guest page tables
* Unmapping from host userspace page tables is handled in
memory_failure(), so the next access will lead to a fault, which
is handled with a SIGBUS in __do_fault()
* CoCo VMs can only use guest_memfd for private pages, so there's no
change there since private pages still get zapped.
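For reference, a paraphrased sketch of the HWpoison handling described above,
applied to the page returned by ->fault(). This is not the exact upstream
code and it leans on mm-internal helpers, so it only makes sense in the
context of mm/memory.c; the authoritative version is __do_fault():

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Sketch only: what the generic fault path does with a HWpoison page
 * returned by a ->fault() handler such as kvm_gmem_fault_shared().
 */
static vm_fault_t hwpoison_after_fault_sketch(struct vm_fault *vmf, vm_fault_t ret)
{
        struct folio *folio;
        vm_fault_t poison;

        if (likely(!PageHWPoison(vmf->page)))
                return ret;

        folio = page_folio(vmf->page);
        /* Default: report the poison, which userspace sees as a SIGBUS. */
        poison = (ret & VM_FAULT_RETRY) ? VM_FAULT_NOPAGE : VM_FAULT_HWPOISON;

        if (ret & VM_FAULT_LOCKED) {
                if (folio_mapped(folio))
                        unmap_mapping_folio(folio);
                /*
                 * A clean folio can be dropped from the filemap, so the
                 * retried fault pulls in a fresh, non-poisoned folio.
                 */
                if (mapping_evict_folio(folio_mapping(folio), folio))
                        poison = VM_FAULT_NOPAGE;
                folio_unlock(folio);
        }
        folio_put(folio);
        vmf->page = NULL;
        return poison;
}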
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-24 23:40 ` Ackerley Tng
@ 2025-06-27 15:01 ` Ackerley Tng
2025-06-30 8:07 ` Fuad Tabba
0 siblings, 1 reply; 75+ messages in thread
From: Ackerley Tng @ 2025-06-27 15:01 UTC (permalink / raw)
To: Sean Christopherson, Fuad Tabba
Cc: kvm, linux-arm-msm, linux-mm, kvmarm, pbonzini, chenhuacai, mpe,
anup, paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
isaku.yamahata, mic, vbabka, vannapurve, mail, david,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Ackerley Tng <ackerleytng@google.com> writes:
> [...]
>>> +/*
>>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>>> + * private.
>>> + *
>>> + * A return value of false indicates that the gfn is explicitly or implicitly
>>> + * shared (i.e., non-CoCo VMs).
>>> + */
>>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>> {
>>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>> + struct kvm_memory_slot *slot;
>>> +
>>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>>> + return false;
>>> +
>>> + slot = gfn_to_memslot(kvm, gfn);
>>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>>> + /*
>>> + * Without in-place conversion support, if a guest_memfd memslot
>>> + * supports shared memory, then all the slot's memory is
>>> + * considered not private, i.e., implicitly shared.
>>> + */
>>> + return false;
>>
>> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
>> mappable guest_memfd. You need to do that no matter what.
>
> Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
> disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
> out. Where do people think we should check the mutual exclusivity?
>
> In kvm_supported_mem_attributes() I'm thinking that we should still allow
> the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
> gfn ranges. Or do people think we should just disallow
> KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
> a guest_memfd-only memslot?
>
> If we check mutual exclusivity when handling
> kvm_vm_set_memory_attributes(), as long as part of the range where
> KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
> whose slot is guest_memfd-only, the ioctl will return EINVAL.
>
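For concreteness, one way such a check could look when handling the memory
attributes ioctl (sketch only; kvm_memslot_is_gmem_only() is the helper name
agreed elsewhere in this thread, not an existing function, and where the check
should live is exactly the open question above):

/*
 * Sketch only: reject KVM_MEMORY_ATTRIBUTE_PRIVATE for any range that
 * overlaps a gmem-only (mappable) memslot.
 */
static bool kvm_range_overlaps_gmem_only_slot(struct kvm *kvm, gfn_t start, gfn_t end)
{
        int i;

        for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
                struct kvm_memslots *slots = __kvm_memslots(kvm, i);
                struct kvm_memslot_iter iter;

                kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
                        if (kvm_memslot_is_gmem_only(iter.slot))
                                return true;
                }
        }
        return false;
}

/* In kvm_vm_set_memory_attributes(): bail out before changing anything. */
if ((attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE) &&
    kvm_range_overlaps_gmem_only_slot(kvm, start, end))
        return -EINVAL;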
At yesterday's (2025-06-26) guest_memfd upstream call discussion:
* Fuad brought up a possible use case where, within the *same* VM, we
want to allow both memslots that do and memslots that do not support
mmap in guest_memfd.
* Shivank suggested a concrete use case for this: the user wants a
guest_memfd memslot that supports mmap just so userspace addresses can
be used as references for specifying memory policy.
* Sean then added that allowing both types of guest_memfd memslots
(supporting and not supporting mmap) gives the user a second layer of
protection, ensuring that memslots the user never expects to mmap can
in fact never be mmap-ed.
I agree it will be useful to allow both guest_memfd memslots that
support and do not support mmap in a single VM.
I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
should not imply that the guest_memfd will provide memory for all guest
faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
For the use case Shivank raised (a guest_memfd memslot that supports
mmap just so userspace addresses can be used as references for
specifying memory policy, for legacy CoCo VMs where shared memory
should still come from other sources), GUEST_MEMFD_FLAG_MMAP will be
set, but KVM can't fault shared memory from guest_memfd. Hence,
GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
Thinking forward, if we want guest_memfd to provide (no-mmap) protection
even for non-CoCo VMs (such that perhaps the initial VM image is
populated and then VM memory should never be mmap-ed at all), we will
want guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP
is not set.
I propose that we should have a single VM-level flag to solve this (in
line with Sean's guideline that we should just move towards what we want
and not support non-existent use cases): something like
KVM_CAP_PREFER_GMEM.
If KVM_CAP_PREFER_GMEM_MEMORY is set,
* memory for any gfn range in a guest_memfd memslot will be requested
from guest_memfd
* any privacy status queries will also be directed to guest_memfd
* KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
guest_memfd.
Here's a table that I set up [1]. I believe the proposed
KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
(columns 1 to 4) correctly.
[1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
> [...]
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-27 15:01 ` Ackerley Tng
@ 2025-06-30 8:07 ` Fuad Tabba
2025-06-30 14:44 ` Ackerley Tng
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-30 8:07 UTC (permalink / raw)
To: Ackerley Tng
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Hi Ackerley,
On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Ackerley Tng <ackerleytng@google.com> writes:
>
> > [...]
>
> >>> +/*
> >>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
> >>> + * private.
> >>> + *
> >>> + * A return value of false indicates that the gfn is explicitly or implicitly
> >>> + * shared (i.e., non-CoCo VMs).
> >>> + */
> >>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >>> {
> >>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
> >>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >>> + struct kvm_memory_slot *slot;
> >>> +
> >>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
> >>> + return false;
> >>> +
> >>> + slot = gfn_to_memslot(kvm, gfn);
> >>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> >>> + /*
> >>> + * Without in-place conversion support, if a guest_memfd memslot
> >>> + * supports shared memory, then all the slot's memory is
> >>> + * considered not private, i.e., implicitly shared.
> >>> + */
> >>> + return false;
> >>
> >> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> >> mappable guest_memfd. You need to do that no matter what.
> >
> > Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
> > disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
> > out. Where do people think we should check the mutual exclusivity?
> >
> > In kvm_supported_mem_attributes() I'm thiking that we should still allow
> > the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
> > gfn ranges. Or do people think we should just disallow
> > KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
> > a guest_memfd-only memslot?
> >
> > If we check mutually exclusivity when handling
> > kvm_vm_set_memory_attributes(), as long as part of the range where
> > KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
> > whose slot is guest_memfd-only, the ioctl will return EINVAL.
> >
>
> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
>
> * Fuad brought up a possible use case where within the *same* VM, we
> want to allow both memslots that supports and does not support mmap in
> guest_memfd.
> * Shivank suggested a concrete use case for this: the user wants a
> guest_memfd memslot that supports mmap just so userspace addresses can
> be used as references for specifying memory policy.
> * Sean then added on that allowing both types of guest_memfd memslots
> (support and not supporting mmap) will allow the user to have a second
> layer of protection and ensure that for some memslots, the user
> expects never to be able to mmap from the memslot.
>
> I agree it will be useful to allow both guest_memfd memslots that
> support and do not support mmap in a single VM.
>
> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
> should not imply that the guest_memfd will provide memory for all guest
> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
>
> For the use case Shivank raised, if the user wants a guest_memfd memslot
> that supports mmap just so userspace addresses can be used as references
> for specifying memory policy for legacy Coco VMs where shared memory
> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
> but KVM can't fault shared memory from guest_memfd. Hence,
> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
>
> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
> even for non-CoCo VMs (such that perhaps initial VM image is populated
> and then VM memory should never be mmap-ed at all), we will want
> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
> not set.
>
> I propose that we should have a single VM-level flag to solve this (in
> line with Sean's guideline that we should just move towards what we want
> and not support non-existent use cases): something like
> KVM_CAP_PREFER_GMEM.
>
> If KVM_CAP_PREFER_GMEM_MEMORY is set,
>
> * memory for any gfn range in a guest_memfd memslot will be requested
> from guest_memfd
> * any privacy status queries will also be directed to guest_memfd
> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
>
> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
> guest_memfd.
>
> Here's a table that I set up [1]. I believe the proposed
> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
> (columns 1 to 4) correctly.
>
> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
I'm not sure this naming helps. What does "prefer" imply here? If the
caller from user space does not prefer, does it mean that they
mind/oppose?
Regarding the use case Shivank mentioned, mmapping for policy: while
the use case is a valid one, the raison d'être of mmap is to map memory
into user space (i.e., to fault it in). I would argue that if you opt
into mmap, you are doing it to be able to access the memory. To me, that
seems like something that merits its own flag, rather than mmap. Also, I
recall that we said that later on, with in-place conversion, that won't
even be necessary. In other words, this would also be trying to solve a
problem that we haven't yet encountered and that we have a solution for
anyway.
I think that, unless anyone disagrees, we should go ahead with the names
we discussed in the last meeting. They seem to be the ones that make
the most sense for the upcoming use cases.
Cheers,
/fuad
> > [...]
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-30 8:07 ` Fuad Tabba
@ 2025-06-30 14:44 ` Ackerley Tng
2025-06-30 15:08 ` Fuad Tabba
0 siblings, 1 reply; 75+ messages in thread
From: Ackerley Tng @ 2025-06-30 14:44 UTC (permalink / raw)
To: Fuad Tabba
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Fuad Tabba <tabba@google.com> writes:
> Hi Ackerley,
>
> On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>> > [...]
>>
>> >>> +/*
>> >>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> >>> + * private.
>> >>> + *
>> >>> + * A return value of false indicates that the gfn is explicitly or implicitly
>> >>> + * shared (i.e., non-CoCo VMs).
>> >>> + */
>> >>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> >>> {
>> >>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> >>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> >>> + struct kvm_memory_slot *slot;
>> >>> +
>> >>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> >>> + return false;
>> >>> +
>> >>> + slot = gfn_to_memslot(kvm, gfn);
>> >>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> >>> + /*
>> >>> + * Without in-place conversion support, if a guest_memfd memslot
>> >>> + * supports shared memory, then all the slot's memory is
>> >>> + * considered not private, i.e., implicitly shared.
>> >>> + */
>> >>> + return false;
>> >>
>> >> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
>> >> mappable guest_memfd. You need to do that no matter what.
>> >
>> > Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
>> > disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
>> > out. Where do people think we should check the mutual exclusivity?
>> >
>> > In kvm_supported_mem_attributes() I'm thiking that we should still allow
>> > the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
>> > gfn ranges. Or do people think we should just disallow
>> > KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
>> > a guest_memfd-only memslot?
>> >
>> > If we check mutually exclusivity when handling
>> > kvm_vm_set_memory_attributes(), as long as part of the range where
>> > KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
>> > whose slot is guest_memfd-only, the ioctl will return EINVAL.
>> >
>>
>> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
>>
>> * Fuad brought up a possible use case where within the *same* VM, we
>> want to allow both memslots that supports and does not support mmap in
>> guest_memfd.
>> * Shivank suggested a concrete use case for this: the user wants a
>> guest_memfd memslot that supports mmap just so userspace addresses can
>> be used as references for specifying memory policy.
>> * Sean then added on that allowing both types of guest_memfd memslots
>> (support and not supporting mmap) will allow the user to have a second
>> layer of protection and ensure that for some memslots, the user
>> expects never to be able to mmap from the memslot.
>>
>> I agree it will be useful to allow both guest_memfd memslots that
>> support and do not support mmap in a single VM.
>>
>> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
>> should not imply that the guest_memfd will provide memory for all guest
>> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
>>
>> For the use case Shivank raised, if the user wants a guest_memfd memslot
>> that supports mmap just so userspace addresses can be used as references
>> for specifying memory policy for legacy Coco VMs where shared memory
>> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
>> but KVM can't fault shared memory from guest_memfd. Hence,
>> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
>>
>> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
>> even for non-CoCo VMs (such that perhaps initial VM image is populated
>> and then VM memory should never be mmap-ed at all), we will want
>> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
>> not set.
>>
>> I propose that we should have a single VM-level flag to solve this (in
>> line with Sean's guideline that we should just move towards what we want
>> and not support non-existent use cases): something like
>> KVM_CAP_PREFER_GMEM.
>>
>> If KVM_CAP_PREFER_GMEM_MEMORY is set,
>>
>> * memory for any gfn range in a guest_memfd memslot will be requested
>> from guest_memfd
>> * any privacy status queries will also be directed to guest_memfd
>> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
>>
>> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
>> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
>> guest_memfd.
>>
>> Here's a table that I set up [1]. I believe the proposed
>> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
>> (columns 1 to 4) correctly.
>>
>> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
>
> I'm not sure this naming helps. What does "prefer" imply here? If the
> caller from user space does not prefer, does it mean that they
> mind/oppose?
>
Sorry, bad naming.
I used "prefer" because some memslots may not have guest_memfd at
all. To clarify, a "guest_memfd memslot" is a memslot that has some
valid guest_memfd fd and offset. The memslot may also have a valid
userspace_addr configured, either mmap-ed from the same guest_memfd fd
or from some other backing memory (for legacy CoCo VMs), or NULL for
userspace_addr.
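(For concreteness, a minimal sketch of such a memslot at the UAPI level; the
slot number, GPA, size, and helper name are illustrative, not from this
series:)

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Sketch only: a "guest_memfd memslot" in the sense above, i.e. a memslot
 * with a valid guest_memfd fd and offset. userspace_addr may be an mmap
 * of the same guest_memfd, some other backing memory (legacy CoCo VMs),
 * or 0. Slot number, GPA, and size are illustrative placeholders.
 */
static int set_gmem_memslot(int vm_fd, int gmem_fd, void *userspace_addr)
{
        struct kvm_userspace_memory_region2 region = {
                .slot = 0,
                .flags = KVM_MEM_GUEST_MEMFD,
                .guest_phys_addr = 0x10000000,
                .memory_size = 0x200000,
                .userspace_addr = (uint64_t)(uintptr_t)userspace_addr,  /* may be 0 */
                .guest_memfd = (uint32_t)gmem_fd,
                .guest_memfd_offset = 0,
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}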
I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
series for all memslots that have some valid guest_memfd fd and offset,
except that, if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be
moved to the VM level.
> Regarding the use case Shivank mentioned, mmaping for policy, while
> the use case is a valid one, the raison d'être of mmap is to map into
> user space (i.e., fault it in). I would argue that if you opt into
> mmap, you are doing it to be able to access it.
The above is in conflict with what was discussed on 2025-06-26 IIUC.
Shivank brought up the case of enabling mmap *only* to be able to set
mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
userspace to only enable mmap but still disable faults, so that userspace
is given additional protection, such that even if a (compromised)
userspace does a private-to-shared conversion, userspace is still not
allowed to fault in the page.
Hence, if we want to support mmapping just for policy and continue to
restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
KVM_MEMSLOT_GMEM_ONLY.
> To me, that seems like
> something that merits its own flag, rather than mmap. Also, I recall
> that we said that later on, with inplace conversion, that won't be
> even necessary.
On x86, as of now I believe we're going with an ioctl that does *not*
check what the guest prefers and goes ahead to perform the
private-to-shared conversion, which then updates shareability.
> In other words, this would also be trying to solve a
> problem that we haven't yet encountered and that we have a solution
> for anyway.
>
So we don't have a solution for the use case where userspace wants to
mmap but never fault for userspace's protection from stray
private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
and KVM_MEMSLOT_GMEM_ONLY.
> I think that, unless anyone disagrees, is to go ahead with the names
> we discussed in the last meeting. They seem to be the ones that make
> the most sense for the upcoming use cases.
>
We could also discuss if we really want to support the use case where
userspace wants to mmap but never fault for userspace's protection from
stray private-to-shared conversions.
> Cheers,
> /fuad
>
>
>
>> > [...]
>>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-30 14:44 ` Ackerley Tng
@ 2025-06-30 15:08 ` Fuad Tabba
2025-06-30 19:26 ` Shivank Garg
0 siblings, 1 reply; 75+ messages in thread
From: Fuad Tabba @ 2025-06-30 15:08 UTC (permalink / raw)
To: Ackerley Tng
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Hi Ackerley,
On Mon, 30 Jun 2025 at 15:44, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Hi Ackerley,
> >
> > On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
> >>
> >> Ackerley Tng <ackerleytng@google.com> writes:
> >>
> >> > [...]
> >>
> >> >>> +/*
> >> >>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
> >> >>> + * private.
> >> >>> + *
> >> >>> + * A return value of false indicates that the gfn is explicitly or implicitly
> >> >>> + * shared (i.e., non-CoCo VMs).
> >> >>> + */
> >> >>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >> >>> {
> >> >>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
> >> >>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> >> >>> + struct kvm_memory_slot *slot;
> >> >>> +
> >> >>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
> >> >>> + return false;
> >> >>> +
> >> >>> + slot = gfn_to_memslot(kvm, gfn);
> >> >>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
> >> >>> + /*
> >> >>> + * Without in-place conversion support, if a guest_memfd memslot
> >> >>> + * supports shared memory, then all the slot's memory is
> >> >>> + * considered not private, i.e., implicitly shared.
> >> >>> + */
> >> >>> + return false;
> >> >>
> >> >> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> >> >> mappable guest_memfd. You need to do that no matter what.
> >> >
> >> > Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
> >> > disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
> >> > out. Where do people think we should check the mutual exclusivity?
> >> >
> >> > In kvm_supported_mem_attributes() I'm thiking that we should still allow
> >> > the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
> >> > gfn ranges. Or do people think we should just disallow
> >> > KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
> >> > a guest_memfd-only memslot?
> >> >
> >> > If we check mutually exclusivity when handling
> >> > kvm_vm_set_memory_attributes(), as long as part of the range where
> >> > KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
> >> > whose slot is guest_memfd-only, the ioctl will return EINVAL.
> >> >
> >>
> >> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
> >>
> >> * Fuad brought up a possible use case where within the *same* VM, we
> >> want to allow both memslots that supports and does not support mmap in
> >> guest_memfd.
> >> * Shivank suggested a concrete use case for this: the user wants a
> >> guest_memfd memslot that supports mmap just so userspace addresses can
> >> be used as references for specifying memory policy.
> >> * Sean then added on that allowing both types of guest_memfd memslots
> >> (support and not supporting mmap) will allow the user to have a second
> >> layer of protection and ensure that for some memslots, the user
> >> expects never to be able to mmap from the memslot.
> >>
> >> I agree it will be useful to allow both guest_memfd memslots that
> >> support and do not support mmap in a single VM.
> >>
> >> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
> >> should not imply that the guest_memfd will provide memory for all guest
> >> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
> >>
> >> For the use case Shivank raised, if the user wants a guest_memfd memslot
> >> that supports mmap just so userspace addresses can be used as references
> >> for specifying memory policy for legacy Coco VMs where shared memory
> >> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
> >> but KVM can't fault shared memory from guest_memfd. Hence,
> >> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
> >>
> >> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
> >> even for non-CoCo VMs (such that perhaps initial VM image is populated
> >> and then VM memory should never be mmap-ed at all), we will want
> >> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
> >> not set.
> >>
> >> I propose that we should have a single VM-level flag to solve this (in
> >> line with Sean's guideline that we should just move towards what we want
> >> and not support non-existent use cases): something like
> >> KVM_CAP_PREFER_GMEM.
> >>
> >> If KVM_CAP_PREFER_GMEM_MEMORY is set,
> >>
> >> * memory for any gfn range in a guest_memfd memslot will be requested
> >> from guest_memfd
> >> * any privacy status queries will also be directed to guest_memfd
> >> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
> >>
> >> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
> >> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
> >> guest_memfd.
> >>
> >> Here's a table that I set up [1]. I believe the proposed
> >> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
> >> (columns 1 to 4) correctly.
> >>
> >> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
> >
> > I'm not sure this naming helps. What does "prefer" imply here? If the
> > caller from user space does not prefer, does it mean that they
> > mind/oppose?
> >
>
> Sorry, bad naming.
>
> I used "prefer" because some memslots may not have guest_memfd at
> all. To clarify, a "guest_memfd memslot" is a memslot that has some
> valid guest_memfd fd and offset. The memslot may also have a valid
> userspace_addr configured, either mmap-ed from the same guest_memfd fd
> or from some other backing memory (for legacy CoCo VMs), or NULL for
> userspace_addr.
>
> I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
> series for all memslots that have some valid guest_memfd fd and offset,
> except if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be moved
> to the VM level.
Regardless of the name, I feel that this functionality at best does
not belong in this series, and potentially adds more confusion.
Userspace should be specific about what it wants, and it knows what
kind of memslots there are in the VM: userspace creates them. In that
case, userspace can either create a legacy memslot (no need for any of
the new flags), or it can create a guest_memfd memslot and then use
any new flags to qualify that. Having a flag/capability that means
something for guest_memfd memslots, but effectively keeps the same
behavior for legacy ones, seems to add more confusion.
> > Regarding the use case Shivank mentioned, mmaping for policy, while
> > the use case is a valid one, the raison d'être of mmap is to map into
> > user space (i.e., fault it in). I would argue that if you opt into
> > mmap, you are doing it to be able to access it.
>
> The above is in conflict with what was discussed on 2025-06-26 IIUC.
>
> Shivank brought up the case of enabling mmap *only* to be able to set
> mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
> userspace to only enable mmap but still disable faults, so that userspace
> is given additional protection, such that even if a (compromised)
> userspace does a private-to-shared conversion, userspace is still not
> allowed to fault in the page.
I don't think there's a conflict :) What I think is that this is
outside the scope of this series, for a few reasons:
- This is prior to the mempolicy work (and is the base for it)
- If we need to, we can add a flag later to restrict mmap faulting
- Once we get in-place conversion, the mempolicy work could use the
ability to disallow mapping for private memory
By actually implementing something now, we would be restricting the
mempolicy work, rather than helping it, since we would effectively be
deciding now how that work should proceed. By keeping this the way it
is now, the mempolicy work can explore various alternatives.
I think we discussed this in the guest_memfd sync of 2025-06-12, and I
think this was roughly our conclusion.
> Hence, if we want to support mmaping just for policy and continue to
> restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
> KVM_MEMSLOT_GMEM_ONLY.
>
> > To me, that seems like
> > something that merits its own flag, rather than mmap. Also, I recall
> > that we said that later on, with inplace conversion, that won't be
> > even necessary.
>
> On x86, as of now I believe we're going with an ioctl that does *not*
> check what the guest prefers and will go ahead to perform the
> private-to-shared conversion, which will go ahead to update
> shareability.
Here I think you're making my case that we're dragging more complexity
from future work/series into this series, since now we're going into
the IOCTLs for the conversion series :)
> > In other words, this would also be trying to solve a
> > problem that we haven't yet encountered and that we have a solution
> > for anyway.
> >
>
> So we don't have a solution for the use case where userspace wants to
> mmap but never fault for userspace's protection from stray
> private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
> and KVM_MEMSLOT_GMEM_ONLY.
>
> > I think that, unless anyone disagrees, is to go ahead with the names
> > we discussed in the last meeting. They seem to be the ones that make
> > the most sense for the upcoming use cases.
> >
>
> We could also discuss if we really want to support the use case where
> userspace wants to mmap but never fault for userspace's protection from
> stray private-to-shared conversions.
I would really rather defer that work to when it's needed. It seems
that we should aim to land this series as soon as possible, since it's
the one blocking much of the future work. As far as I can tell,
nothing here precludes introducing a mechanism to support the case
where userspace wants to mmap but never fault, once it's needed.
This is, I believe, what we agreed on in the sync on 2025-06-26.
Cheers,
/fuad
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-30 15:08 ` Fuad Tabba
@ 2025-06-30 19:26 ` Shivank Garg
2025-06-30 20:03 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Shivank Garg @ 2025-06-30 19:26 UTC (permalink / raw)
To: Fuad Tabba, Ackerley Tng
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On 6/30/2025 8:38 PM, Fuad Tabba wrote:
> Hi Ackerley,
>
> On Mon, 30 Jun 2025 at 15:44, Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> Fuad Tabba <tabba@google.com> writes:
>>
>>> Hi Ackerley,
>>>
>>> On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
>>>>
>>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>>
>>>>> [...]
>>>>
>>>>>>> +/*
>>>>>>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>>>>>>> + * private.
>>>>>>> + *
>>>>>>> + * A return value of false indicates that the gfn is explicitly or implicitly
>>>>>>> + * shared (i.e., non-CoCo VMs).
>>>>>>> + */
>>>>>>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>>>>> {
>>>>>>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>>>>>>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>>>>> + struct kvm_memory_slot *slot;
>>>>>>> +
>>>>>>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>>>>>>> + return false;
>>>>>>> +
>>>>>>> + slot = gfn_to_memslot(kvm, gfn);
>>>>>>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>>>>>>> + /*
>>>>>>> + * Without in-place conversion support, if a guest_memfd memslot
>>>>>>> + * supports shared memory, then all the slot's memory is
>>>>>>> + * considered not private, i.e., implicitly shared.
>>>>>>> + */
>>>>>>> + return false;
>>>>>>
>>>>>> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
>>>>>> mappable guest_memfd. You need to do that no matter what.
>>>>>
>>>>> Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
>>>>> disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
>>>>> out. Where do people think we should check the mutual exclusivity?
>>>>>
>>>>> In kvm_supported_mem_attributes() I'm thiking that we should still allow
>>>>> the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
>>>>> gfn ranges. Or do people think we should just disallow
>>>>> KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
>>>>> a guest_memfd-only memslot?
>>>>>
>>>>> If we check mutually exclusivity when handling
>>>>> kvm_vm_set_memory_attributes(), as long as part of the range where
>>>>> KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
>>>>> whose slot is guest_memfd-only, the ioctl will return EINVAL.
>>>>>
>>>>
>>>> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
>>>>
>>>> * Fuad brought up a possible use case where within the *same* VM, we
>>>> want to allow both memslots that supports and does not support mmap in
>>>> guest_memfd.
>>>> * Shivank suggested a concrete use case for this: the user wants a
>>>> guest_memfd memslot that supports mmap just so userspace addresses can
>>>> be used as references for specifying memory policy.
>>>> * Sean then added on that allowing both types of guest_memfd memslots
>>>> (support and not supporting mmap) will allow the user to have a second
>>>> layer of protection and ensure that for some memslots, the user
>>>> expects never to be able to mmap from the memslot.
>>>>
>>>> I agree it will be useful to allow both guest_memfd memslots that
>>>> support and do not support mmap in a single VM.
>>>>
>>>> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
>>>> should not imply that the guest_memfd will provide memory for all guest
>>>> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
>>>>
>>>> For the use case Shivank raised, if the user wants a guest_memfd memslot
>>>> that supports mmap just so userspace addresses can be used as references
>>>> for specifying memory policy for legacy Coco VMs where shared memory
>>>> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
>>>> but KVM can't fault shared memory from guest_memfd. Hence,
>>>> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
>>>>
>>>> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
>>>> even for non-CoCo VMs (such that perhaps initial VM image is populated
>>>> and then VM memory should never be mmap-ed at all), we will want
>>>> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
>>>> not set.
>>>>
>>>> I propose that we should have a single VM-level flag to solve this (in
>>>> line with Sean's guideline that we should just move towards what we want
>>>> and not support non-existent use cases): something like
>>>> KVM_CAP_PREFER_GMEM.
>>>>
>>>> If KVM_CAP_PREFER_GMEM_MEMORY is set,
>>>>
>>>> * memory for any gfn range in a guest_memfd memslot will be requested
>>>> from guest_memfd
>>>> * any privacy status queries will also be directed to guest_memfd
>>>> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
>>>>
>>>> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
>>>> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
>>>> guest_memfd.
>>>>
>>>> Here's a table that I set up [1]. I believe the proposed
>>>> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
>>>> (columns 1 to 4) correctly.
>>>>
>>>> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
>>>
>>> I'm not sure this naming helps. What does "prefer" imply here? If the
>>> caller from user space does not prefer, does it mean that they
>>> mind/oppose?
>>>
>>
>> Sorry, bad naming.
>>
>> I used "prefer" because some memslots may not have guest_memfd at
>> all. To clarify, a "guest_memfd memslot" is a memslot that has some
>> valid guest_memfd fd and offset. The memslot may also have a valid
>> userspace_addr configured, either mmap-ed from the same guest_memfd fd
>> or from some other backing memory (for legacy CoCo VMs), or NULL for
>> userspace_addr.
>>
>> I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
>> series for all memslots that have some valid guest_memfd fd and offset,
>> except if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be moved
>> to the VM level.
>
> Regardless of the name, I feel that this functionality at best does
> not belong in this series, and potentially adds more confusion.
>
> Userspace should be specific about what it wants, and they know what
> kind of memslots there are in the VM: userspace creates them. In that
> case, userspace can either create a legacy memslot, no need for any of
> the new flags, or it can create a guest_memfd memslot, and then use
> any new flags to qualify that. Having a flag/capability that means
> something for guest_memfd memslots, but effectively keeps the same
> behavior for legacy ones seems to add more confusion.
>
>>> Regarding the use case Shivank mentioned, mmaping for policy, while
>>> the use case is a valid one, the raison d'être of mmap is to map into
>>> user space (i.e., fault it in). I would argue that if you opt into
>>> mmap, you are doing it to be able to access it.
>>
>> The above is in conflict with what was discussed on 2025-06-26 IIUC.
>>
>> Shivank brought up the case of enabling mmap *only* to be able to set
>> mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
>> userspace to only enable mmap but still disable faults, so that userspace
>> is given additional protection, such that even if a (compromised)
>> userspace does a private-to-shared conversion, userspace is still not
>> allowed to fault in the page.
>
> I don't think there's a conflict :) What I think is this is outside
> of the scope of this series for a few reasons:
>
> - This is prior to the mempolicy work (and is the base for it)
> - If we need to, we can add a flag later to restrict mmap faulting
> - Once we get in-place conversion, the mempolicy work could use the
> ability to disallow mapping for private memory
>
> By actually implementing something now, we would be restricting the
> mempolicy work, rather than helping it, since we would effectively be
> deciding now how that work should proceed. By keeping this the way it
> is now, the mempolicy work can explore various alternatives.
>
> I think we discussed this in the guest_memfd sync of 2025-06-12, and I
> think this was roughly our conclusion.
>
>> Hence, if we want to support mmaping just for policy and continue to
>> restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
>> KVM_MEMSLOT_GMEM_ONLY.
>>
>>> To me, that seems like
>>> something that merits its own flag, rather than mmap. Also, I recall
>>> that we said that later on, with inplace conversion, that won't be
>>> even necessary.
>>
>> On x86, as of now I believe we're going with an ioctl that does *not*
>> check what the guest prefers and will go ahead to perform the
>> private-to-shared conversion, which will go ahead to update
>> shareability.
>
> Here I think you're making my case that we're dragging more complexity
> from future work/series into this series, since now we're going into
> the IOCTLs for the conversion series :)
>
>>> In other words, this would also be trying to solve a
>>> problem that we haven't yet encountered and that we have a solution
>>> for anyway.
>>>
>>
>> So we don't have a solution for the use case where userspace wants to
>> mmap but never fault for userspace's protection from stray
>> private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
>> and KVM_MEMSLOT_GMEM_ONLY.
>>
>>> I think that, unless anyone disagrees, is to go ahead with the names
>>> we discussed in the last meeting. They seem to be the ones that make
>>> the most sense for the upcoming use cases.
>>>
>>
>> We could also discuss if we really want to support the use case where
>> userspace wants to mmap but never fault for userspace's protection from
>> stray private-to-shared conversions.
>
> I would really rather defer that work to when it's needed. It seems
> that we should aim to land this series as soon as possible, since it's
> the one blocking much of the future work. As far as I can tell,
> nothing here precludes introducing the mechanism of supporting the
> case where userspace wants to mmap but never fault, once it's needed.
> This was I believe what we had agreed on in the sync on 2025-06-26.
I support this approach.
I think it's more strategic to land the mmap functionality first and then
iterate. We can address those advanced use cases in a separate series.
This follows the same pattern we agreed upon for the NUMA mempolicy support[1]
in the 2025-06-12 sync: merge the initial feature rebased on stage-1, and
handle CoCo/SNP requirements in stage 2.
[1] https://lore.kernel.org/linux-mm/20250618112935.7629-1-shivankg@amd.com
Thanks,
Shivank
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-30 19:26 ` Shivank Garg
@ 2025-06-30 20:03 ` David Hildenbrand
2025-07-01 14:15 ` Ackerley Tng
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-06-30 20:03 UTC (permalink / raw)
To: Shivank Garg, Fuad Tabba, Ackerley Tng
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
On 30.06.25 21:26, Shivank Garg wrote:
> On 6/30/2025 8:38 PM, Fuad Tabba wrote:
>> Hi Ackerley,
>>
>> On Mon, 30 Jun 2025 at 15:44, Ackerley Tng <ackerleytng@google.com> wrote:
>>>
>>> Fuad Tabba <tabba@google.com> writes:
>>>
>>>> Hi Ackerley,
>>>>
>>>> On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
>>>>>
>>>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>>>
>>>>>> [...]
>>>>>
>>>>>>>> +/*
>>>>>>>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>>>>>>>> + * private.
>>>>>>>> + *
>>>>>>>> + * A return value of false indicates that the gfn is explicitly or implicitly
>>>>>>>> + * shared (i.e., non-CoCo VMs).
>>>>>>>> + */
>>>>>>>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>>>>>> {
>>>>>>>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>>>>>>>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>>>>>> + struct kvm_memory_slot *slot;
>>>>>>>> +
>>>>>>>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>>>>>>>> + return false;
>>>>>>>> +
>>>>>>>> + slot = gfn_to_memslot(kvm, gfn);
>>>>>>>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>>>>>>>> + /*
>>>>>>>> + * Without in-place conversion support, if a guest_memfd memslot
>>>>>>>> + * supports shared memory, then all the slot's memory is
>>>>>>>> + * considered not private, i.e., implicitly shared.
>>>>>>>> + */
>>>>>>>> + return false;
>>>>>>>
>>>>>>> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
>>>>>>> mappable guest_memfd. You need to do that no matter what.
>>>>>>
>>>>>> Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
>>>>>> disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
>>>>>> out. Where do people think we should check the mutual exclusivity?
>>>>>>
>>>>>> In kvm_supported_mem_attributes() I'm thiking that we should still allow
>>>>>> the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
>>>>>> gfn ranges. Or do people think we should just disallow
>>>>>> KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
>>>>>> a guest_memfd-only memslot?
>>>>>>
>>>>>> If we check mutually exclusivity when handling
>>>>>> kvm_vm_set_memory_attributes(), as long as part of the range where
>>>>>> KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
>>>>>> whose slot is guest_memfd-only, the ioctl will return EINVAL.
>>>>>>
>>>>>
>>>>> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
>>>>>
>>>>> * Fuad brought up a possible use case where within the *same* VM, we
>>>>> want to allow both memslots that supports and does not support mmap in
>>>>> guest_memfd.
>>>>> * Shivank suggested a concrete use case for this: the user wants a
>>>>> guest_memfd memslot that supports mmap just so userspace addresses can
>>>>> be used as references for specifying memory policy.
>>>>> * Sean then added on that allowing both types of guest_memfd memslots
>>>>> (support and not supporting mmap) will allow the user to have a second
>>>>> layer of protection and ensure that for some memslots, the user
>>>>> expects never to be able to mmap from the memslot.
>>>>>
>>>>> I agree it will be useful to allow both guest_memfd memslots that
>>>>> support and do not support mmap in a single VM.
>>>>>
>>>>> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
>>>>> should not imply that the guest_memfd will provide memory for all guest
>>>>> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
>>>>>
>>>>> For the use case Shivank raised, if the user wants a guest_memfd memslot
>>>>> that supports mmap just so userspace addresses can be used as references
>>>>> for specifying memory policy for legacy Coco VMs where shared memory
>>>>> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
>>>>> but KVM can't fault shared memory from guest_memfd. Hence,
>>>>> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
>>>>>
>>>>> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
>>>>> even for non-CoCo VMs (such that perhaps initial VM image is populated
>>>>> and then VM memory should never be mmap-ed at all), we will want
>>>>> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
>>>>> not set.
>>>>>
>>>>> I propose that we should have a single VM-level flag to solve this (in
>>>>> line with Sean's guideline that we should just move towards what we want
>>>>> and not support non-existent use cases): something like
>>>>> KVM_CAP_PREFER_GMEM.
>>>>>
>>>>> If KVM_CAP_PREFER_GMEM_MEMORY is set,
>>>>>
>>>>> * memory for any gfn range in a guest_memfd memslot will be requested
>>>>> from guest_memfd
>>>>> * any privacy status queries will also be directed to guest_memfd
>>>>> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
>>>>>
>>>>> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
>>>>> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
>>>>> guest_memfd.
>>>>>
>>>>> Here's a table that I set up [1]. I believe the proposed
>>>>> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
>>>>> (columns 1 to 4) correctly.
>>>>>
>>>>> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
>>>>
>>>> I'm not sure this naming helps. What does "prefer" imply here? If the
>>>> caller from user space does not prefer, does it mean that they
>>>> mind/oppose?
>>>>
>>>
>>> Sorry, bad naming.
>>>
>>> I used "prefer" because some memslots may not have guest_memfd at
>>> all. To clarify, a "guest_memfd memslot" is a memslot that has some
>>> valid guest_memfd fd and offset. The memslot may also have a valid
>>> userspace_addr configured, either mmap-ed from the same guest_memfd fd
>>> or from some other backing memory (for legacy CoCo VMs), or NULL for
>>> userspace_addr.
>>>
>>> I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
>>> series for all memslots that have some valid guest_memfd fd and offset,
>>> except if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be moved
>>> to the VM level.
>>
>> Regardless of the name, I feel that this functionality at best does
>> not belong in this series, and potentially adds more confusion.
>>
>> Userspace should be specific about what it wants, and they know what
>> kind of memslots there are in the VM: userspace creates them. In that
>> case, userspace can either create a legacy memslot, no need for any of
>> the new flags, or it can create a guest_memfd memslot, and then use
>> any new flags to qualify that. Having a flag/capability that means
>> something for guest_memfd memslots, but effectively keeps the same
>> behavior for legacy ones seems to add more confusion.
>>
>>>> Regarding the use case Shivank mentioned, mmaping for policy, while
>>>> the use case is a valid one, the raison d'être of mmap is to map into
>>>> user space (i.e., fault it in). I would argue that if you opt into
>>>> mmap, you are doing it to be able to access it.
>>>
>>> The above is in conflict with what was discussed on 2025-06-26 IIUC.
>>>
>>> Shivank brought up the case of enabling mmap *only* to be able to set
>>> mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
>>> userspace to only enable mmap but still disable faults, so that userspace
>>> is given additional protection, such that even if a (compromised)
>>> userspace does a private-to-shared conversion, userspace is still not
>>> allowed to fault in the page.
>>
>> I don't think there's a conflict :) What I think is this is outside
>> of the scope of this series for a few reasons:
>>
>> - This is prior to the mempolicy work (and is the base for it)
>> - If we need to, we can add a flag later to restrict mmap faulting
>> - Once we get in-place conversion, the mempolicy work could use the
>> ability to disallow mapping for private memory
>>
>> By actually implementing something now, we would be restricting the
>> mempolicy work, rather than helping it, since we would effectively be
>> deciding now how that work should proceed. By keeping this the way it
>> is now, the mempolicy work can explore various alternatives.
>>
>> I think we discussed this in the guest_memfd sync of 2025-06-12, and I
>> think this was roughly our conclusion.
>>
>>> Hence, if we want to support mmaping just for policy and continue to
>>> restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
>>> KVM_MEMSLOT_GMEM_ONLY.
>>>
>>>> To me, that seems like
>>>> something that merits its own flag, rather than mmap. Also, I recall
>>>> that we said that later on, with inplace conversion, that won't be
>>>> even necessary.
>>>
>>> On x86, as of now I believe we're going with an ioctl that does *not*
>>> check what the guest prefers and will go ahead to perform the
>>> private-to-shared conversion, which will go ahead to update
>>> shareability.
>>
>> Here I think you're making my case that we're dragging more complexity
>> from future work/series into this series, since now we're going into
>> the IOCTLs for the conversion series :)
>>
>>>> In other words, this would also be trying to solve a
>>>> problem that we haven't yet encountered and that we have a solution
>>>> for anyway.
>>>>
>>>
>>> So we don't have a solution for the use case where userspace wants to
>>> mmap but never fault for userspace's protection from stray
>>> private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
>>> and KVM_MEMSLOT_GMEM_ONLY.
>>>
>>>> I think the way forward, unless anyone disagrees, is to go ahead with
>>>> the names we discussed in the last meeting. They seem to be the ones
>>>> that make the most sense for the upcoming use cases.
>>>>
>>>
>>> We could also discuss if we really want to support the use case where
>>> userspace wants to mmap but never fault for userspace's protection from
>>> stray private-to-shared conversions.
>>
>> I would really rather defer that work to when it's needed. It seems
>> that we should aim to land this series as soon as possible, since it's
>> the one blocking much of the future work. As far as I can tell,
>> nothing here precludes introducing the mechanism of supporting the
>> case where userspace wants to mmap but never fault, once it's needed.
>> This was I believe what we had agreed on in the sync on 2025-06-26.
>
> I support this approach.
Agreed. Let's get this in with the changes requested by Sean applied.
How to use GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM with
legacy mem attributes (-> all memory in guest_memfd private) could be
added later on top, once really required.
As discussed, CoCo VMs that want to support GUEST_MEMFD_FLAG_MMAP will
have to disable legacy mem attributes using a new capability in stage-2.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-06-30 20:03 ` David Hildenbrand
@ 2025-07-01 14:15 ` Ackerley Tng
2025-07-01 14:44 ` David Hildenbrand
0 siblings, 1 reply; 75+ messages in thread
From: Ackerley Tng @ 2025-07-01 14:15 UTC (permalink / raw)
To: David Hildenbrand, Shivank Garg, Fuad Tabba
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
David Hildenbrand <david@redhat.com> writes:
> On 30.06.25 21:26, Shivank Garg wrote:
>> On 6/30/2025 8:38 PM, Fuad Tabba wrote:
>>> Hi Ackerley,
>>>
>>> On Mon, 30 Jun 2025 at 15:44, Ackerley Tng <ackerleytng@google.com> wrote:
>>>>
>>>> Fuad Tabba <tabba@google.com> writes:
>>>>
>>>>> Hi Ackerley,
>>>>>
>>>>> On Fri, 27 Jun 2025 at 16:01, Ackerley Tng <ackerleytng@google.com> wrote:
>>>>>>
>>>>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>>>>
>>>>>>> [...]
>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>>>>>>>>> + * private.
>>>>>>>>> + *
>>>>>>>>> + * A return value of false indicates that the gfn is explicitly or implicitly
>>>>>>>>> + * shared (i.e., non-CoCo VMs).
>>>>>>>>> + */
>>>>>>>>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>>>>>>>> {
>>>>>>>>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>>>>>>>>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>>>>>>>> + struct kvm_memory_slot *slot;
>>>>>>>>> +
>>>>>>>>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>>>>>>>>> + return false;
>>>>>>>>> +
>>>>>>>>> + slot = gfn_to_memslot(kvm, gfn);
>>>>>>>>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>>>>>>>>> + /*
>>>>>>>>> + * Without in-place conversion support, if a guest_memfd memslot
>>>>>>>>> + * supports shared memory, then all the slot's memory is
>>>>>>>>> + * considered not private, i.e., implicitly shared.
>>>>>>>>> + */
>>>>>>>>> + return false;
>>>>>>>>
>>>>>>>> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
>>>>>>>> mappable guest_memfd. You need to do that no matter what.
>>>>>>>
>>>>>>> Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
>>>>>>> disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
>>>>>>> out. Where do people think we should check the mutual exclusivity?
>>>>>>>
>>>>>>> In kvm_supported_mem_attributes() I'm thinking that we should still allow
>>>>>>> the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
>>>>>>> gfn ranges. Or do people think we should just disallow
>>>>>>> KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
>>>>>>> a guest_memfd-only memslot?
>>>>>>>
>>>>>>> If we check mutual exclusivity when handling
>>>>>>> kvm_vm_set_memory_attributes(), as long as part of the range where
>>>>>>> KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
>>>>>>> whose slot is guest_memfd-only, the ioctl will return EINVAL.
>>>>>>>
>>>>>>
>>>>>> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
>>>>>>
>>>>>> * Fuad brought up a possible use case where, within the *same* VM, we
>>>>>> want to allow both memslots that support mmap and memslots that do not
>>>>>> support mmap in guest_memfd.
>>>>>> * Shivank suggested a concrete use case for this: the user wants a
>>>>>> guest_memfd memslot that supports mmap just so userspace addresses can
>>>>>> be used as references for specifying memory policy.
>>>>>> * Sean then added that allowing both types of guest_memfd memslots
>>>>>> (supporting and not supporting mmap) will give the user a second
>>>>>> layer of protection, ensuring that for some memslots the user can
>>>>>> expect never to be able to mmap from the memslot.
>>>>>>
>>>>>> I agree it will be useful to allow both guest_memfd memslots that
>>>>>> support and do not support mmap in a single VM.
>>>>>>
>>>>>> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
>>>>>> should not imply that the guest_memfd will provide memory for all guest
>>>>>> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
>>>>>>
>>>>>> For the use case Shivank raised, if the user wants a guest_memfd memslot
>>>>>> that supports mmap just so userspace addresses can be used as references
>>>>>> for specifying memory policy for legacy CoCo VMs where shared memory
>>>>>> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
>>>>>> but KVM can't fault shared memory from guest_memfd. Hence,
>>>>>> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
>>>>>>
>>>>>> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
>>>>>> even for non-CoCo VMs (such that perhaps initial VM image is populated
>>>>>> and then VM memory should never be mmap-ed at all), we will want
>>>>>> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
>>>>>> not set.
>>>>>>
>>>>>> I propose that we should have a single VM-level flag to solve this (in
>>>>>> line with Sean's guideline that we should just move towards what we want
>>>>>> and not support non-existent use cases): something like
>>>>>> KVM_CAP_PREFER_GMEM.
>>>>>>
>>>>>> If KVM_CAP_PREFER_GMEM_MEMORY is set,
>>>>>>
>>>>>> * memory for any gfn range in a guest_memfd memslot will be requested
>>>>>> from guest_memfd
>>>>>> * any privacy status queries will also be directed to guest_memfd
>>>>>> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
>>>>>>
>>>>>> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
>>>>>> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
>>>>>> guest_memfd.
>>>>>>
>>>>>> Here's a table that I set up [1]. I believe the proposed
>>>>>> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
>>>>>> (columns 1 to 4) correctly.
>>>>>>
>>>>>> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
>>>>>
>>>>> I'm not sure this naming helps. What does "prefer" imply here? If the
>>>>> caller from user space does not prefer, does it mean that they
>>>>> mind/oppose?
>>>>>
>>>>
>>>> Sorry, bad naming.
>>>>
>>>> I used "prefer" because some memslots may not have guest_memfd at
>>>> all. To clarify, a "guest_memfd memslot" is a memslot that has some
>>>> valid guest_memfd fd and offset. The memslot may also have a valid
>>>> userspace_addr configured, either mmap-ed from the same guest_memfd fd
>>>> or from some other backing memory (for legacy CoCo VMs), or NULL for
>>>> userspace_addr.
>>>>
>>>> I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
>>>> series for all memslots that have some valid guest_memfd fd and offset,
>>>> except if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be moved
>>>> to the VM level.
>>>
>>> Regardless of the name, I feel that this functionality at best does
>>> not belong in this series, and potentially adds more confusion.
>>>
>>> Userspace should be specific about what it wants, and they know what
>>> kind of memslots there are in the VM: userspace creates them. In that
>>> case, userspace can either create a legacy memslot, no need for any of
>>> the new flags, or it can create a guest_memfd memslot, and then use
>>> any new flags to qualify that. Having a flag/capability that means
>>> something for guest_memfd memslots, but effectively keeps the same
>>> behavior for legacy ones seems to add more confusion.
>>>
>>>>> Regarding the use case Shivank mentioned, mmaping for policy, while
>>>>> the use case is a valid one, the raison d'être of mmap is to map into
>>>>> user space (i.e., fault it in). I would argue that if you opt into
>>>>> mmap, you are doing it to be able to access it.
>>>>
>>>> The above is in conflict with what was discussed on 2025-06-26 IIUC.
>>>>
>>>> Shivank brought up the case of enabling mmap *only* to be able to set
>>>> mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
>>>> userspace to only enable mmap but still disable faults, so that userspace
>>>> is given additional protection, such that even if a (compromised)
>>>> userspace does a private-to-shared conversion, userspace is still not
>>>> allowed to fault in the page.
>>>
>>> I don't think there's a conflict :) What I think is that this is
>>> outside the scope of this series, for a few reasons:
>>>
>>> - This is prior to the mempolicy work (and is the base for it)
>>> - If we need to, we can add a flag later to restrict mmap faulting
>>> - Once we get in-place conversion, the mempolicy work could use the
>>> ability to disallow mapping for private memory
>>>
>>> By actually implementing something now, we would be restricting the
>>> mempolicy work, rather than helping it, since we would effectively be
>>> deciding now how that work should proceed. By keeping this the way it
>>> is now, the mempolicy work can explore various alternatives.
>>>
>>> I think we discussed this in the guest_memfd sync of 2025-06-12, and I
>>> think this was roughly our conclusion.
>>>
>>>> Hence, if we want to support mmaping just for policy and continue to
>>>> restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
>>>> KVM_MEMSLOT_GMEM_ONLY.
>>>>
>>>>> To me, that seems like
>>>>> something that merits its own flag, rather than mmap. Also, I recall
>>>>> that we said that later on, with in-place conversion, that won't even
>>>>> be necessary.
>>>>
>>>> On x86, as of now I believe we're going with an ioctl that does *not*
>>>> check what the guest prefers and will go ahead and perform the
>>>> private-to-shared conversion, which will then update
>>>> shareability.
>>>
>>> Here I think you're making my case that we're dragging more complexity
>>> from future work/series into this series, since now we're going into
>>> the IOCTLs for the conversion series :)
>>>
>>>>> In other words, this would also be trying to solve a
>>>>> problem that we haven't yet encountered and that we have a solution
>>>>> for anyway.
>>>>>
>>>>
>>>> So we don't have a solution for the use case where userspace wants to
>>>> mmap but never fault for userspace's protection from stray
>>>> private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
>>>> and KVM_MEMSLOT_GMEM_ONLY.
>>>>
>>>>> I think the way forward, unless anyone disagrees, is to go ahead with
>>>>> the names we discussed in the last meeting. They seem to be the ones
>>>>> that make the most sense for the upcoming use cases.
>>>>>
>>>>
>>>> We could also discuss if we really want to support the use case where
>>>> userspace wants to mmap but never fault for userspace's protection from
>>>> stray private-to-shared conversions.
>>>
>>> I would really rather defer that work to when it's needed. It seems
>>> that we should aim to land this series as soon as possible, since it's
>>> the one blocking much of the future work. As far as I can tell,
>>> nothing here precludes introducing the mechanism of supporting the
>>> case where userspace wants to mmap but never fault, once it's needed.
>>> This was I believe what we had agreed on in the sync on 2025-06-26.
>>
>> I support this approach.
>
> Agreed. Let's get this in with the changes requested by Sean applied.
>
> How to use GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM with
> legacy mem attributes (-> all memory in guest_memfd private) could be
> added later on top, once really required.
>
> As discussed, CoCo VMs that want to support GUEST_MEMFD_FLAG_MMAP will
> have to disable legacy mem attributes using a new capability in stage-2.
>
I rewatched the guest_memfd meeting on 2025-06-12. We do want to
support the use case where userspace wants to have mmap (e.g. to set
mempolicy) but does not want to allow faulting into the host.
On 2025-06-12, the conclusion was that the problem will be solved once
guest_memfd supports shareability, and that's because userspace can set
shareability to GUEST, so the memory can't be faulted into the host.
On 2025-06-26, Sean said we want to let userspace have an extra layer of
protection so that memory cannot be faulted in to the host, ever. IOW,
we want to let userspace say that even if there is a stray
private-to-shared conversion, *don't* allow faulting memory into the
host.
The difference is the "extra layer of protection", which should remain
in effect even if there are (stray/unexpected) private-to-shared
conversions to guest_memfd or to KVM. Here's a direct link to the point
in the video where Sean brought this up [1]. I'm really hoping I didn't
misinterpret this!
Let me look ahead a little, since this involves use cases already
brought up though I'm not sure how real they are. I just want to make
sure that in a few patch series' time, we don't end up needing userspace
to use a complex bunch of CAPs and FLAGs.
In this series (mmap support, V12, patch 10/18) [2], to allow
KVM_X86_DEFAULT_VMs to use guest_memfd, I added a `fault_from_gmem()`
helper, which is defined as follows (before the renaming Sean requested):
+static inline bool fault_from_gmem(struct kvm_page_fault *fault)
+{
+ return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
+}
The above is changeable, of course :). The intention is that if the
fault is private, fault from guest_memfd. If GUEST_MEMFD_FLAG_MMAP is
set (KVM_MEMSLOT_GMEM_ONLY will be set on the memslot), fault from
guest_memfd.
If we defer handling GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM
with legacy mem attributes to the future, this helper will probably
become
-static inline bool fault_from_gmem(struct kvm_page_fault *fault)
+static inline bool fault_from_gmem(struct kvm *kvm, struct kvm_page_fault *fault)
 {
- return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
+ return fault->is_private || (kvm_gmem_memslot_supports_shared(fault->slot) &&
+                              !kvm_arch_disable_legacy_private_tracking(kvm));
 }
And on memslot binding we check
if kvm_arch_disable_legacy_private_tracking(kvm) and not GUEST_MEMFD_FLAG_MMAP
return -EINVAL;
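In C, that bind-time check would be something like the sketch below
(kvm_gmem_supports_mmap() and the function name are made up for
illustration; they're not helpers from this series):
static int kvm_gmem_check_memslot_bind(struct kvm *kvm,
				       struct kvm_memory_slot *slot,
				       struct file *gmem_file)
{
	/*
	 * With legacy private tracking disabled at the VM level, shared
	 * memory must also come from guest_memfd, so the guest_memfd being
	 * bound has to have been created with GUEST_MEMFD_FLAG_MMAP.
	 */
	if (kvm_arch_disable_legacy_private_tracking(kvm) &&
	    !kvm_gmem_supports_mmap(gmem_file))
		return -EINVAL;
	return 0;
}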
1. Is that what yall meant?
2. Does this kind of not satisfy the "extra layer of protection"
requirement (if it is a requirement)?
A legacy CoCo VM using guest_memfd only for private memory (shared
memory from say, shmem) and needing to set mempolicy would
* Set GUEST_MEMFD_FLAG_MMAP
* Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
but still be able to send conversion ioctls directly to guest_memfd,
and then be able to fault guest_memfd memory into the host.
3. Now for a use case I've heard of (feel free to tell me this will
never be supported or "we'll deal with it if it comes"): On a
non-CoCo VM, we want to use guest_memfd but not use mmap (and the
initial VM image will be written using write() syscall or something
else).
* Set GUEST_MEMFD_FLAG_MMAP to false
* Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
(it's a non-CoCo VM, weird to do anything to do with private
tracking)
And now we're stuck because fault_from_gmem() will return false all
the time and we can't use memory from guest_memfd.
[1] https://youtu.be/7b5hgKHoZoY?t=1162s
[2] https://lore.kernel.org/all/20250611133330.1514028-11-tabba@google.com/
> --
> Cheers,
>
> David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-07-01 14:15 ` Ackerley Tng
@ 2025-07-01 14:44 ` David Hildenbrand
2025-07-08 0:05 ` Sean Christopherson
0 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2025-07-01 14:44 UTC (permalink / raw)
To: Ackerley Tng, Shivank Garg, Fuad Tabba
Cc: Sean Christopherson, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
>>> I support this approach.
>>
>> Agreed. Let's get this in with the changes requested by Sean applied.
>>
>> How to use GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM with
>> legacy mem attributes (-> all memory in guest_memfd private) could be
>> added later on top, once really required.
>>
>> As discussed, CoCo VMs that want to support GUEST_MEMFD_FLAG_MMAP will
>> have to disable legacy mem attributes using a new capability in stage-2.
>>
>
> I rewatched the guest_memfd meeting on 2025-06-12. We do want to
> support the use case where userspace wants to have mmap (e.g. to set
> mempolicy) but does not want to allow faulting into the host.
>
> On 2025-06-12, the conclusion was that the problem will be solved once
> guest_memfd supports shareability, and that's because userspace can set
> shareability to GUEST, so the memory can't be faulted into the host.
>
> On 2025-06-26, Sean said we want to let userspace have an extra layer of
> protection so that memory cannot be faulted in to the host, ever. IOW,
> we want to let userspace say that even if there is a stray
> private-to-shared conversion, *don't* allow faulting memory into the
> host.
>
> The difference is the "extra layer of protection", which should remain
> in effect even if there are (stray/unexpected) private-to-shared
> conversions to guest_memfd or to KVM. Here's a direct link to the point
> in the video where Sean brought this up [1]. I'm really hoping I didn't
> misinterpret this!
>
> Let me look ahead a little, since this involves use cases already
> brought up though I'm not sure how real they are. I just want to make
> sure that in a few patch series' time, we don't end up needing userspace
> to use a complex bunch of CAPs and FLAGs.
>
> In this series (mmap support, V12, patch 10/18) [2], to allow
> KVM_X86_DEFAULT_VMs to use guest_memfd, I added a `fault_from_gmem()`
> helper, which is defined as follows (before the renaming Sean requested):
>
> +static inline bool fault_from_gmem(struct kvm_page_fault *fault)
> +{
> + return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
> +}
>
> The above is changeable, of course :). The intention is that if the
> fault is private, fault from guest_memfd. If GUEST_MEMFD_FLAG_MMAP is
> set (KVM_MEMSLOT_GMEM_ONLY will be set on the memslot), fault from
> guest_memfd.
>
> If we defer handling GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM
> with legacy mem attributes to the future, this helper will probably
> become
>
> -static inline bool fault_from_gmem(struct kvm_page_fault *fault)
> +static inline bool fault_from_gmem(struct kvm *kvm, struct kvm_page_fault *fault)
>  {
> - return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
> + return fault->is_private || (kvm_gmem_memslot_supports_shared(fault->slot) &&
> +                              !kvm_arch_disable_legacy_private_tracking(kvm));
>  }
>
> And on memslot binding we check
>
> if kvm_arch_disable_legacy_private_tracking(kvm) and not GUEST_MEMFD_FLAG_MMAP
> return -EINVAL;
>
> 1. Is that what yall meant?
My understanding:
CoCo VMs will initially (stage-1) only support !GUEST_MEMFD_FLAG_MMAP.
With stage-2, CoCo VMs will support GUEST_MEMFD_FLAG_MMAP only with
kvm_arch_disable_legacy_private_tracking().
Non-CoCo VMs will only support GUEST_MEMFD_FLAG_MMAP. (no concept of
private)
>
> 2. Does this kind of not satisfy the "extra layer of protection"
> requirement (if it is a requirement)?
>
> A legacy CoCo VM using guest_memfd only for private memory (shared
> memory from say, shmem) and needing to set mempolicy would
>
> * Set GUEST_MEMFD_FLAG_MMAP
> * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
>
> but still be able to send conversion ioctls directly to guest_memfd,
> and then be able to fault guest_memfd memory into the host.
In that configuration, I would expect that all memory in guest_memfd is
private and remains private.
guest_memfd without memory attributes cannot support in-place conversion.
How to achieve that might be interesting: the capability will affect
guest_memfd behavior?
>
> 3. Now for a use case I've heard of (feel free to tell me this will
> never be supported or "we'll deal with it if it comes"): On a
> non-CoCo VM, we want to use guest_memfd but not use mmap (and the
> initial VM image will be written using write() syscall or something
> else).
>
> * Set GUEST_MEMFD_FLAG_MMAP to false
> * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
> (it's a non-CoCo VM, weird to do anything to do with private
> tracking)
>
> And now we're stuck because fault_from_gmem() will return false all
> the time and we can't use memory from guest_memfd.
I think I discussed that with Sean: we would have GUEST_MEMFD_FLAG_WRITE
that will imply everything that GUEST_MEMFD_FLAG_MMAP would imply,
except the actual mmap() support.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-07-01 14:44 ` David Hildenbrand
@ 2025-07-08 0:05 ` Sean Christopherson
2025-07-08 13:44 ` Ackerley Tng
0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2025-07-08 0:05 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ackerley Tng, Shivank Garg, Fuad Tabba, kvm, linux-arm-msm,
linux-mm, kvmarm, pbonzini, chenhuacai, mpe, anup, paul.walmsley,
palmer, aou, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
vbabka, vannapurve, mail, michael.roth, wei.w.wang, liam.merwick,
isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
jthoughton, peterx, pankaj.gupta, ira.weiny
On Tue, Jul 01, 2025, David Hildenbrand wrote:
> > > > I support this approach.
> > >
> > > Agreed. Let's get this in with the changes requested by Sean applied.
> > >
> > > How to use GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM with
> > > legacy mem attributes (-> all memory in guest_memfd private) could be
> > > added later on top, once really required.
> > >
> > > As discussed, CoCo VMs that want to support GUEST_MEMFD_FLAG_MMAP will
> > > have to disable legacy mem attributes using a new capability in stage-2.
> > >
> >
> > I rewatched the guest_memfd meeting on 2025-06-12. We do want to
> > support the use case where userspace wants to have mmap (e.g. to set
> > mempolicy) but does not want to allow faulting into the host.
> >
> > On 2025-06-12, the conclusion was that the problem will be solved once
> > guest_memfd supports shareability, and that's because userspace can set
> > shareability to GUEST, so the memory can't be faulted into the host.
> >
> > On 2025-06-26, Sean said we want to let userspace have an extra layer of
> > protection so that memory cannot be faulted in to the host, ever. IOW,
> > we want to let userspace say that even if there is a stray
> > private-to-shared conversion, *don't* allow faulting memory into the
> > host.
Eh, my comments were more along the lines of "it would be nice if we could have
such protections", not a "we must support this". And I suspect that making the
behavior all-or-nothing for a given guest_memfd wouldn't be very useful, i.e.
that userspace would probably want to be able to prevent accessing a specific
chunk of the gmem instance.
Actually, we can probably get that via mseal(), maybe even for free today? E.g.
mmap() w/ PROT_NONE, mbind(), and then mseal().
So yeah, I think we do nothing for now.
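To illustrate (untested sketch, not a recommendation of exact code: gmem_fd
is assumed to come from KVM_CREATE_GUEST_MEMFD with mmap support enabled,
mbind() is the libnuma-declared wrapper, mseal() needs a 6.10+ kernel and may
have to go through syscall(), and whether the VMA policy actually steers
guest_memfd allocations depends on the mempolicy work):
#define _GNU_SOURCE
#include <numaif.h>       /* mbind(), MPOL_BIND */
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
/* Carry a NUMA policy for guest_memfd without ever making it host-accessible. */
static int gmem_bind_and_seal(int gmem_fd, size_t size, int node)
{
	unsigned long nodemask = 1UL << node;
	void *addr;
	/* PROT_NONE: the VMA exists only to hold the policy, never to fault. */
	addr = mmap(NULL, size, PROT_NONE, MAP_SHARED, gmem_fd, 0);
	if (addr == MAP_FAILED)
		return -1;
	if (mbind(addr, size, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0))
		return -1;
	/* Seal it: no later mprotect()/mremap()/munmap() to regain access. */
	return syscall(SYS_mseal, addr, size, 0);
}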
> > The difference is the "extra layer of protection", which should remain
> > in effect even if there are (stray/unexpected) private-to-shared
> > conversions to guest_memfd or to KVM. Here's a direct link to the point
> > in the video where Sean brought this up [1]. I'm really hoping I didn't
> > misinterpret this!
> >
> > Let me look ahead a little, since this involves use cases already
> > brought up though I'm not sure how real they are. I just want to make
> > sure that in a few patch series' time, we don't end up needing userspace
> > to use a complex bunch of CAPs and FLAGs.
> >
> > In this series (mmap support, V12, patch 10/18) [2], to allow
> > KVM_X86_DEFAULT_VMs to use guest_memfd, I added a `fault_from_gmem()`
> > helper, which is defined as follows (before the renaming Sean requested):
> >
> > +static inline bool fault_from_gmem(struct kvm_page_fault *fault)
> > +{
> > + return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
> > +}
> >
> > The above is changeable, of course :). The intention is that if the
> > fault is private, fault from guest_memfd. If GUEST_MEMFD_FLAG_MMAP is
> > set (KVM_MEMSLOT_GMEM_ONLY will be set on the memslot), fault from
> > guest_memfd.
> >
> > If we defer handling GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM
> > with legacy mem attributes to the future, this helper will probably
> > become
> >
> > -static inline bool fault_from_gmem(struct kvm_page_fault *fault)
> > +static inline bool fault_from_gmem(struct kvm *kvm, struct kvm_page_fault *fault)
> >  {
> > - return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
> > + return fault->is_private || (kvm_gmem_memslot_supports_shared(fault->slot) &&
> > +                              !kvm_arch_disable_legacy_private_tracking(kvm));
> >  }
> >
> > And on memslot binding we check
> >
> > if kvm_arch_disable_legacy_private_tracking(kvm)
I would invert the KVM-internal arch hook, and only have KVM x86's capability refer
to the private memory attribute as legacy (because it simply doesn't exist for
anything else).
> > and not GUEST_MEMFD_FLAG_MMAP
> > return -EINVAL;
> >
> > 1. Is that what yall meant?
I was thinking:
if (kvm_arch_has_private_memory_attribute(kvm) ==
kvm_gmem_mmap(...))
return -EINVAL;
I.e. in addition to requiring mmap() when KVM doesn't track private/shared via
memory attributes, also disallow mmap() when private/shared is tracked via memory
attributes.
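Spelled out, with a made-up kvm_gmem_supports_mmap() helper (and gmem_file)
standing in for the kvm_gmem_mmap(...) check above:
	if (kvm_arch_has_private_memory_attribute(kvm)) {
		/* private/shared tracked via memory attributes: no mmap-able gmem */
		if (kvm_gmem_supports_mmap(gmem_file))
			return -EINVAL;
	} else {
		/* no attribute tracking: the bound guest_memfd must be mmap-able */
		if (!kvm_gmem_supports_mmap(gmem_file))
			return -EINVAL;
	}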
> My understanding:
>
> CoCo VMs will initially (stage-1) only support !GUEST_MEMFD_FLAG_MMAP.
>
> With stage-2, CoCo VMs will support GUEST_MEMFD_FLAG_MMAP only with
> kvm_arch_disable_legacy_private_tracking().
Yep, and everything except x86 will unconditionally return true for
kvm_arch_disable_legacy_private_tracking() (or false if it's inverted as above).
> Non-CoCo VMs will only support GUEST_MEMFD_FLAG_MMAP. (no concept of
> private)
>
> >
> > 2. Does this kind of not satisfy the "extra layer of protection"
> > requirement (if it is a requirement)?
It's not a requirement.
> > A legacy CoCo VM using guest_memfd only for private memory (shared
> > memory from say, shmem) and needing to set mempolicy would
> > * Set GUEST_MEMFD_FLAG_MMAP
I think we should keep it simple as above, and not support mmap() (and therefore
mbind()) with legacy CoCo VMs. Given the double allocation flaws with the legacy
approach, supporting mbind() seems like putting a bandaid on a doomed idea.
> > * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
> > but still be able to send conversion ioctls directly to guest_memfd,
> > and then be able to fault guest_memfd memory into the host.
>
> In that configuration, I would expect that all memory in guest_memfd is
> private and remains private.
>
> guest_memfd without memory attributes cannot support in-place conversion.
>
> How to achieve that might be interesting: the capability will affect
> guest_memfd behavior?
>
> >
> > 3. Now for a use case I've heard of (feel free to tell me this will
> > never be supported or "we'll deal with it if it comes"): On a
> > non-CoCo VM, we want to use guest_memfd but not use mmap (and the
> > initial VM image will be written using write() syscall or something
> > else).
> >
> > * Set GUEST_MEMFD_FLAG_MMAP to false
> > * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
> > (it's a non-CoCo VM, weird to do anything to do with private
> > tracking)
> >
> > And now we're stuck because fault_from_gmem() will return false all
> > the time and we can't use memory from guest_memfd.
Nah, don't support this scenario. Or rather, use mseal() as above. If someone
comes along with a concrete, strong use case for backing non-CoCo VMs and using
mseal() to wall off guest memory doesn't suffice, then they can have the honor
of justifying why KVM needs to take on more complexity. :-)
> I think I discussed that with Sean: we would have GUEST_MEMFD_FLAG_WRITE
> that will imply everything that GUEST_MEMFD_FLAG_MMAP would imply, except
> the actual mmap() support.
Ya, for the write() access or whatever. But there are bigger problems beyond
populating the memory, e.g. a non-CoCo VM won't support private memory, so without
many more changes to redirect KVM to gmem when faulting in guest memory, KVM won't
be able to map any memory into the guest.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
2025-07-08 0:05 ` Sean Christopherson
@ 2025-07-08 13:44 ` Ackerley Tng
0 siblings, 0 replies; 75+ messages in thread
From: Ackerley Tng @ 2025-07-08 13:44 UTC (permalink / raw)
To: Sean Christopherson, David Hildenbrand
Cc: Shivank Garg, Fuad Tabba, kvm, linux-arm-msm, linux-mm, kvmarm,
pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro,
brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
amoorthy, dmatlack, isaku.yamahata, mic, vbabka, vannapurve, mail,
michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat, shuah,
hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton, peterx,
pankaj.gupta, ira.weiny
Sean Christopherson <seanjc@google.com> writes:
> On Tue, Jul 01, 2025, David Hildenbrand wrote:
>> > > > I support this approach.
>> > >
>> > > Agreed. Let's get this in with the changes requested by Sean applied.
>> > >
>> > > How to use GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM with
>> > > legacy mem attributes (-> all memory in guest_memfd private) could be
>> > > added later on top, once really required.
>> > >
>> > > As discussed, CoCo VMs that want to support GUEST_MEMFD_FLAG_MMAP will
>> > > have to disable legacy mem attributes using a new capability in stage-2.
>> > >
>> >
>> > I rewatched the guest_memfd meeting on 2025-06-12. We do want to
>> > support the use case where userspace wants to have mmap (e.g. to set
>> > mempolicy) but does not want to allow faulting into the host.
>> >
>> > On 2025-06-12, the conclusion was that the problem will be solved once
>> > guest_memfd supports shareability, and that's because userspace can set
>> > shareability to GUEST, so the memory can't be faulted into the host.
>> >
>> > On 2025-06-26, Sean said we want to let userspace have an extra layer of
>> > protection so that memory cannot be faulted in to the host, ever. IOW,
>> > we want to let userspace say that even if there is a stray
>> > private-to-shared conversion, *don't* allow faulting memory into the
>> > host.
>
> Eh, my comments were more along the lines of "it would be nice if we could have
> such protections", not a "we must support this". And I suspect that making the
> behavior all-or-nothing for a given guest_memfd wouldn't be very useful, i.e.
> that userspace would probably want to be able to prevent accessing a specific
> chunk of the gmem instance.
>
> Actually, we can probably get that via mseal(), maybe even for free today? E.g.
> mmap() w/ PROT_NONE, mbind(), and then mseal().
>
> So yeah, I think we do nothing for now.
>
>> > The difference is the "extra layer of protection", which should remain
>> > in effect even if there are (stray/unexpected) private-to-shared
>> > conversions to guest_memfd or to KVM. Here's a direct link to the point
>> > in the video where Sean brought this up [1]. I'm really hoping I didn't
>> > misinterpret this!
>> >
>> > Let me look ahead a little, since this involves use cases already
>> > brought up though I'm not sure how real they are. I just want to make
>> > sure that in a few patch series' time, we don't end up needing userspace
>> > to use a complex bunch of CAPs and FLAGs.
>> >
>> > In this series (mmap support, V12, patch 10/18) [2], to allow
>> > KVM_X86_DEFAULT_VMs to use guest_memfd, I added a `fault_from_gmem()`
>> > helper, which is defined as follows (before the renaming Sean requested):
>> >
>> > +static inline bool fault_from_gmem(struct kvm_page_fault *fault)
>> > +{
>> > + return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
>> > +}
>> >
>> > The above is changeable, of course :). The intention is that if the
>> > fault is private, fault from guest_memfd. If GUEST_MEMFD_FLAG_MMAP is
>> > set (KVM_MEMSLOT_GMEM_ONLY will be set on the memslot), fault from
>> > guest_memfd.
>> >
>> > If we defer handling GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM
>> > with legacy mem attributes to the future, this helper will probably
>> > become
>> >
>> > -static inline bool fault_from_gmem(struct kvm_page_fault *fault)
>> > +static inline bool fault_from_gmem(struct kvm *kvm, struct kvm_page_fault *fault)
>> >  {
>> > - return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
>> > + return fault->is_private || (kvm_gmem_memslot_supports_shared(fault->slot) &&
>> > +                              !kvm_arch_disable_legacy_private_tracking(kvm));
>> >  }
>> >
>> > And on memslot binding we check
>> >
>> > if kvm_arch_disable_legacy_private_tracking(kvm)
>
> I would invert the KVM-internal arch hook, and only have KVM x86's capability refer
> to the private memory attribute as legacy (because it simply doesn't exist for
> anything else).
>
>> > and not GUEST_MEMFD_FLAG_MMAP
>> > return -EINVAL;
>> >
>> > 1. Is that what yall meant?
>
> I was thinking:
>
> if (kvm_arch_has_private_memory_attribute(kvm) ==
> kvm_gmem_mmap(...))
> return -EINVAL;
>
> I.e. in addition to requiring mmap() when KVM doesn't track private/shared via
> memory attributes, also disallow mmap() when private/shared is tracked via memory
> attributes.
>
>> My understanding:
>>
>> CoCo VMs will initially (stage-1) only support !GUEST_MEMFD_FLAG_MMAP.
>>
>> With stage-2, CoCo VMs will support GUEST_MEMFD_FLAG_MMAP only with
>> kvm_arch_disable_legacy_private_tracking().
>
> Yep, and everything except x86 will unconditionally return true for
> kvm_arch_disable_legacy_private_tracking() (or false if it's inverted as above).
>
>> Non-CoCo VMs will only support GUEST_MEMFD_FLAG_MMAP. (no concept of
>> private)
>>
>> >
>> > 2. Does this kind of not satisfy the "extra layer of protection"
>> > requirement (if it is a requirement)?
>
> It's not a requirement.
>
>> > A legacy CoCo VM using guest_memfd only for private memory (shared
>> > memory from say, shmem) and needing to set mempolicy would
>> > * Set GUEST_MEMFD_FLAG_MMAP
>
> I think we should keep it simple as above, and not support mmap() (and therefore
> mbind()) with legacy CoCo VMs. Given the double allocation flaws with the legacy
> approach, supporting mbind() seems like putting a bandaid on a doomed idea.
>
>> > * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
>> > but still be able to send conversion ioctls directly to guest_memfd,
>> > and then be able to fault guest_memfd memory into the host.
>>
>> In that configuration, I would expect that all memory in guest_memfd is
>> private and remains private.
>>
>> guest_memfd without memory attributes cannot support in-place conversion.
>>
>> How to achieve that might be interesting: the capability will affect
>> guest_memfd behavior?
>>
>> >
>> > 3. Now for a use case I've heard of (feel free to tell me this will
>> > never be supported or "we'll deal with it if it comes"): On a
>> > non-CoCo VM, we want to use guest_memfd but not use mmap (and the
>> > initial VM image will be written using write() syscall or something
>> > else).
>> >
>> > * Set GUEST_MEMFD_FLAG_MMAP to false
>> > * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
>> > (it's a non-CoCo VM, weird to do anything to do with private
>> > tracking)
>> >
>> > And now we're stuck because fault_from_gmem() will return false all
>> > the time and we can't use memory from guest_memfd.
>
> Nah, don't support this scenario. Or rather, use mseal() as above. If someone
> comes along with a concrete, strong use case for backing non-CoCo VMs and using
> mseal() to wall off guest memory doesn't suffice, then they can have the honor
> of justifying why KVM needs to take on more complexity. :-)
>
>> I think I discussed that with Sean: we would have GUEST_MEMFD_FLAG_WRITE
>> that will imply everything that GUEST_MEMFD_FLAG_MMAP would imply, except
>> the actual mmap() support.
>
> Ya, for the write() access or whatever. But there are bigger problems beyond
> populating the memory, e.g. a non-CoCo VM won't support private memory, so without
> many more changes to redirect KVM to gmem when faulting in guest memory, KVM won't
> be able to map any memory into the guest.
Thanks for clarifying everything above :). Next respin (with Fuad's
help) coming soon!
^ permalink raw reply [flat|nested] 75+ messages in thread
end of thread, other threads:[~2025-07-08 13:44 UTC | newest]
Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-11 13:33 [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 01/18] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 02/18] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 03/18] KVM: Rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 04/18] KVM: x86: Rename kvm->arch.has_private_mem to kvm->arch.supports_gmem Fuad Tabba
2025-06-13 13:57 ` Ackerley Tng
2025-06-13 20:35 ` Sean Christopherson
2025-06-16 7:13 ` Fuad Tabba
2025-06-16 14:20 ` David Hildenbrand
2025-06-24 20:51 ` Ackerley Tng
2025-06-25 6:33 ` Roy, Patrick
2025-06-11 13:33 ` [PATCH v12 05/18] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 06/18] KVM: Fix comments that refer to slots_lock Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 07/18] KVM: Fix comment that refers to kvm uapi header path Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages Fuad Tabba
2025-06-12 16:16 ` Shivank Garg
2025-06-13 21:03 ` Sean Christopherson
2025-06-13 21:18 ` David Hildenbrand
2025-06-13 22:48 ` Sean Christopherson
2025-06-16 6:52 ` Fuad Tabba
2025-06-16 14:16 ` David Hildenbrand
2025-06-17 23:04 ` Sean Christopherson
2025-06-18 11:18 ` Fuad Tabba
2025-06-16 13:44 ` Ira Weiny
2025-06-16 14:03 ` David Hildenbrand
2025-06-16 14:16 ` Fuad Tabba
2025-06-16 14:25 ` David Hildenbrand
2025-06-18 0:40 ` Sean Christopherson
2025-06-18 8:15 ` David Hildenbrand
2025-06-18 9:20 ` Xiaoyao Li
2025-06-18 9:27 ` David Hildenbrand
2025-06-18 9:44 ` Xiaoyao Li
2025-06-18 9:59 ` David Hildenbrand
2025-06-18 10:42 ` Xiaoyao Li
2025-06-18 11:14 ` David Hildenbrand
2025-06-18 12:17 ` Xiaoyao Li
2025-06-18 13:16 ` David Hildenbrand
2025-06-19 1:48 ` Sean Christopherson
2025-06-19 1:50 ` Sean Christopherson
2025-06-18 9:25 ` David Hildenbrand
2025-06-25 21:47 ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 09/18] KVM: guest_memfd: Track shared memory support in memslot Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Fuad Tabba
2025-06-13 22:08 ` Sean Christopherson
2025-06-24 23:40 ` Ackerley Tng
2025-06-27 15:01 ` Ackerley Tng
2025-06-30 8:07 ` Fuad Tabba
2025-06-30 14:44 ` Ackerley Tng
2025-06-30 15:08 ` Fuad Tabba
2025-06-30 19:26 ` Shivank Garg
2025-06-30 20:03 ` David Hildenbrand
2025-07-01 14:15 ` Ackerley Tng
2025-07-01 14:44 ` David Hildenbrand
2025-07-08 0:05 ` Sean Christopherson
2025-07-08 13:44 ` Ackerley Tng
2025-06-11 13:33 ` [PATCH v12 11/18] KVM: x86: Consult guest_memfd when computing max_mapping_level Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 12/18] KVM: x86: Enable guest_memfd shared memory for non-CoCo VMs Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 13/18] KVM: arm64: Refactor user_mem_abort() Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 14/18] KVM: arm64: Handle guest_memfd-backed guest page faults Fuad Tabba
2025-06-12 17:33 ` James Houghton
2025-06-11 13:33 ` [PATCH v12 15/18] KVM: arm64: Enable host mapping of shared guest_memfd memory Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 16/18] KVM: Introduce the KVM capability KVM_CAP_GMEM_SHARED_MEM Fuad Tabba
2025-06-11 13:33 ` [PATCH v12 17/18] KVM: selftests: Don't use hardcoded page sizes in guest_memfd test Fuad Tabba
2025-06-12 16:24 ` Shivank Garg
2025-06-11 13:33 ` [PATCH v12 18/18] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-06-12 16:23 ` Shivank Garg
2025-06-12 17:38 ` [PATCH v12 00/18] KVM: Mapping guest_memfd backed memory at the host for software protected VMs David Hildenbrand
2025-06-24 10:02 ` Fuad Tabba
2025-06-24 10:16 ` David Hildenbrand
2025-06-24 10:25 ` Fuad Tabba
2025-06-24 11:44 ` David Hildenbrand
2025-06-24 11:58 ` Fuad Tabba
2025-06-24 17:50 ` Sean Christopherson
2025-06-25 8:00 ` Fuad Tabba
2025-06-25 14:07 ` Sean Christopherson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).